Technical Approach
This section explores the overall technical approach for GSAFleet.gov, including some of the innovative practices and tools leveraged during the move into the FCS cloud ecosystem, such as Redshift, StreamSets, GraphQL, and Kinesis.
High-Level Approach
Business Benefit: Reduced technical debt enables faster application O&M and reduces long-term resource requirements.
The GSAFleet.gov application uses a microservices architecture that follows the Domain-Driven Design pattern. Each microservice aligns closely with a business domain and provides the functionality related to that domain. Under this design, a single functional transaction may involve multiple microservices. Each microservice that needs persistent data maintains its own data storage, independent of the storage maintained by the other microservices, and it creates, reads, updates, and deletes only the data pertaining to it. For example, a customer microservice works only with customer metadata (address, etc.).

This diagram depicts the overall architecture for GSAFleet.gov: multiple services working together implement the functionality of the system, and each service relies on its own database.
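As a purely illustrative sketch of this data-ownership rule (none of the class, table, or field names below come from the Fleet codebase), a customer-facing service might encapsulate its own datastore like this:

```python
# Illustrative sketch only: a service that owns its customer data end to end.
# All names here are hypothetical and do not reflect the GSAFleet.gov codebase.
import sqlite3  # stand-in for the service's dedicated database
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    name: str
    address: str


class CustomerService:
    """Creates, reads, updates, and deletes customer metadata only."""

    def __init__(self, db_path: str = "customer_service.db"):
        # Each microservice connects only to its own database.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(customer_id TEXT PRIMARY KEY, name TEXT, address TEXT)"
        )

    def upsert(self, customer: Customer) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO customer VALUES (?, ?, ?)",
            (customer.customer_id, customer.name, customer.address),
        )
        self.conn.commit()

    def get(self, customer_id: str) -> Optional[Customer]:
        row = self.conn.execute(
            "SELECT customer_id, name, address FROM customer WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        return Customer(*row) if row else None
```

Other services (vehicles, orders, billing) would hold their own separate stores and never reach into this one; a transaction that spans domains calls each service's API instead.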
Innovations
The modernization of GSAFleet.gov has included several innovative practices and tools that may be useful in future modernization efforts. Explore those practices and tools in the sections below.
Innovative Practices
Business Benefit: Enables both strategic and tactical day-to-day decision making by streamlining and simplifying the analysis and reporting of transactional data.
Fleet vehicles serve a wide array of users and functions, from mail delivery to law enforcement. Hence, each class of users and vehicles has unique needs for GSAFleet.gov and its data. The data flow in the current Fleet architecture is shown schematically in the diagram below.
Fleet Data Analytics synchronizes transactional data into the cloud-based analytical Redshift database, which is used as a data source for the reporting dashboards developed in MicroStrategy.
The semantic layer structures the data in a form suitable for analytical workloads. In addition, the semantic layer defines a common business vocabulary. This is particularly useful because different business units might use the same term for different entities; the universal semantic layer ensures everyone is on the same page and avoids confusion when analyzing data.
A sample MicroStrategy report is shown below.
Telematics enables Fleet to drive the key business outcomes listed below. For more information on how Fleet is implementing telematics, click here.
Support current fleet and transition to the electric future
- Compare data on the fleet's daily requirements over time to match vehicles with suitable Electric Vehicle (EV) replacements.
- Report on EV battery levels and trends to optimize charging strategies and cut costs and carbon.
Keep everyone safe on the road
- Provide advanced safety features to report on vehicle usage and safety habits, including seat belt use, speeding, harsh acceleration, braking, turning, etc.
- Review driver safety scorecards or receive instant notification when a driver breaks safety rules.
- Reconstruct collision events to investigate safety issues in real-time.
Cut asset procurement and maintenance expenses
- Achieve efficient allocation and utilization of vehicles.
- Reduce procurement costs and ongoing maintenance by using assets more productively.
- Expand solutions further and predict maintenance requirements before issues arise.
About Geotab
Geotab Offers:
- LTE connected devices
- Data gathering (w/ SLA)
- Capabilities
  - Diagnostics
  - Collision Detection
  - Alerts / Coaching
  - Location
- Extended Options
  - Features
    - Python API (see the sketch after this list)
    - Wifi Hotspot
  - Optionally
    - Imaging
    - Analytics
    - Rugged Housing
- Management Software
- IOX expansion
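As a purely illustrative example of the Python API mentioned above, the sketch below uses Geotab's open-source mygeotab client; the credentials, database name, and result handling are placeholders and are not part of the Fleet configuration.

```python
# Hypothetical sketch using the open-source mygeotab client (pip install mygeotab).
# Credentials and database are placeholders for illustration only.
import mygeotab

api = mygeotab.API(
    username="user@example.gov",   # placeholder
    password="********",           # placeholder
    database="fleet_demo",         # placeholder MyGeotab database
)
api.authenticate()

# Pull a handful of registered telematics devices from the platform.
devices = api.get("Device", resultsLimit=5)
for device in devices:
    print(device.get("name"), device.get("serialNumber"))
```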
Conceptual Architecture
The following diagram shows the conceptual architecture for Fleet's IoT implementation.

Data from each vehicle are accumulated by the telematics data provider, and client software downloads the data into a PostgreSQL database. Next, a process selects recent data for each parameter from that PostgreSQL database and inserts them into a parameter-specific table in Redshift. A separate process moves old data from Redshift into an S3 bucket, and AWS Glue catalogs the data in that bucket. A subset of the telematics data is copied into the MySQL database so GSAFleet.gov can access it. A Lambda function sends an email notification if recent data are not available.
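A hedged sketch of such a freshness check is shown below. It assumes the check runs against a Redshift table and that the email is delivered through an Amazon SNS topic; the table name, threshold, and environment variables are placeholders rather than details of the Fleet implementation.

```python
# Illustrative Lambda handler: alert if no recent telematics rows have arrived.
# Table name, topic, threshold, and connection details are placeholders.
import datetime
import os

import boto3
import redshift_connector  # Amazon's Python driver for Redshift

MAX_AGE = datetime.timedelta(hours=6)  # assumed freshness threshold


def handler(event, context):
    conn = redshift_connector.connect(
        host=os.environ["REDSHIFT_HOST"],
        database=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    cursor = conn.cursor()
    cursor.execute("SELECT MAX(recorded_at) FROM telematics.odometer_readings")
    latest = cursor.fetchone()[0]

    if latest is None or datetime.datetime.utcnow() - latest > MAX_AGE:
        # Publishing to an SNS topic with email subscribers delivers the notification.
        boto3.client("sns").publish(
            TopicArn=os.environ["ALERT_TOPIC_ARN"],
            Subject="Telematics data freshness alert",
            Message=f"No new telematics data since {latest}.",
        )
```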
Business Benefit: Stronger protection of business and user data during application testing, while also meeting legal requirements for protecting that data.
Only the production environment is allowed to host real production data. How, then, does one perform testing if the test environment has no real data? Many aspects of the data workflow and of the application's business functionality depend on data of a particular form, so mock data often cannot meet the testing needs. To address this problem, production data can be copied into the test environment in a disguised form. Roughly speaking, if a variable's value in production is ABC, it can be stored in the testing environment as XYZ. The software still handles the variable the same way the production system does, but the stored and displayed value is not real. The challenge is to mask the data consistently between tables and across databases, so that the same contract number is still the same after masking.
This is how the data masking is performed. The Fleet business line has determined the fields that need to be masked, and that list of fields serves as the input to the data masking pipeline. The masking process uses StreamSets as the ETL tool, together with open-source encryption libraries that are FIPS 140-2 compliant, to dynamically mask these fields across tables and schemas and write them to a staged instance in a temporary database. Once the masking is done, snapshots of the prepared masked databases are copied into the test environment. The masked data in the testing environment can then be copied into the legacy test databases using a special loading process, and the same data are consolidated into the masked consolidated database. From there, the consolidated masked data are propagated into the GSAFleet.gov test system via the StreamSets data pipelines. In this way, all the databases in the test environment contain the same masked data.
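The essential property is deterministic masking: the same clear value always produces the same masked value. The sketch below illustrates that property with a keyed HMAC; it is not the StreamSets pipeline or the specific FIPS 140-2 validated library Fleet uses, and the field values and key are placeholders.

```python
# Minimal illustration of consistent masking; not Fleet's actual masking code.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"  # placeholder


def mask_value(value: str, length: int = 12) -> str:
    """Return a repeatable masked token for a sensitive value.

    Because the digest depends only on the key and the input, the same contract
    number masks to the same token in every table and schema, preserving joins.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()


# The same source value always yields the same masked value...
assert mask_value("CONTRACT-12345") == mask_value("CONTRACT-12345")
# ...while different values yield different tokens.
assert mask_value("CONTRACT-12345") != mask_value("CONTRACT-99999")
```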
Innovative Tools
Business Benefit: Facilitating decisions during the data discovery and pipeline development phases by providing the ability to assess data quality.
Data profiling allows a business data expert or data engineer to assess a particular piece of data in terms of quality and to see the range of values a particular variable takes. Profiling information helps data engineers decide how to design the part of the database schema where the variable will be stored. For this purpose, Sweetviz reports are hosted in Streamlit, a Python-based web tool.
A sample data profiling report is shown above. It shows the distribution of data values in text or graph form, as appropriate, along with how many records are missing a value and how unique the existing values of each field are. These reports can help in assessing the data from a business perspective or for database engineering purposes.
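A minimal sketch of that pattern is shown below; the input file and column contents are placeholders, and the real reports are generated from Fleet's own data sources.

```python
# Illustrative Streamlit page that embeds a Sweetviz profiling report.
# The CSV stands in for whatever query or extract is actually being profiled.
import pandas as pd
import streamlit as st
import streamlit.components.v1 as components
import sweetviz as sv

df = pd.read_csv("vehicles_sample.csv")  # placeholder input

st.title("Data profiling report")

# Sweetviz computes value distributions, missing-value counts, and uniqueness per column.
report = sv.analyze(df)
report.show_html("profile.html", open_browser=False)

# Embed the generated HTML report inside the Streamlit page.
with open("profile.html", "r", encoding="utf-8") as f:
    components.html(f.read(), height=1000, scrolling=True)
```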
Business Benefit: Saving time and cost on data pipeline development and maintenance by using a tool that allows building and maintaining data pipelines through a graphical user interface with drag-and-drop functionality and advanced monitoring capabilities.
While the modernization project is ongoing, data produced in the legacy Fleet systems have to be made available to the new GSAFleet.gov system. Likewise, data originating from GSAFleet.gov are needed in the legacy systems. Multiple pipelines must therefore be implemented between the two systems, and such pipelines need enterprise-level reliability and maintainability.
StreamSets is an off-the-shelf data migration tool that allows building simple data pipelines using a graphical user interface with drag-and-drop features. A pipeline can be customized to any level of complexity, up to and including separately designed code as a step in the pipeline. Moreover, one can create a template with carefully designed data validations, error handling, and other necessary features; a new pipeline can then be based on that template, reducing development time and improving pipeline quality and maintainability.
A prepared pipeline can be executed on a schedule, or it can run continuously using change-data-capture log processing from the source database. It can also be executed manually as required, for example to bulk load data from source to target.
There is a separate pipeline for each source table, migrating the data into a target table in the new system. The reverse data pipelines synchronize data from the new system into the legacy system and also into the consolidated database. The latter synchronization is necessary because a record initially created in the new system may be written back into legacy; legacy can then update that record, and the update needs to propagate into the consolidated database to be read from the new system. The consolidated database must therefore hold the original record before it can be updated from legacy, and this pipeline system accomplishes that.
The software provides a report for each pipeline, displaying the number of records migrated, the number of exceptions (if any), and other relevant information.
Business Benefit: The legacy Fleet systems relied on secure yet dated asynchronous communication tools between systems and government organizations. Given the need both to transfer more data and to transfer that data in near-instantaneous timeframes, a more sophisticated streaming technology was required. Streaming also accelerates data processing, so business insights arrive sooner and fraud and other threats can be addressed more quickly.
Amazon Kinesis is a fully managed service that enables users to collect, process, and analyze real-time streaming data at scale. It provides a simple and cost-effective way to handle large amounts of streaming data in real-time and can be used for a variety of use cases, including data processing, machine learning, real-time analytics, and more.
Amazon Kinesis Streams is one of the core components of the Amazon Kinesis service. It is a scalable and durable data streaming service that allows users to collect and process large amounts of data from multiple sources in real-time. With Kinesis Streams, users can ingest data in real-time, process it, and store it in a distributed manner. The data is partitioned across multiple shards, which allows for parallel processing of data streams.
Kinesis Streams supports multiple data producers and consumers, allowing for a wide variety of use cases. For example, it can be used to capture data from social media feeds, IoT devices, mobile apps, and web applications. Once the data is ingested, it can be processed using various AWS services, such as AWS Lambda, AWS EMR, or Amazon Kinesis Data Analytics. Additionally, Kinesis Streams integrates with other AWS services, such as Amazon S3 and Amazon Redshift, allowing for seamless data transfer and storage.
Kinesis Streams provides several key features that make it an ideal solution for handling streaming data at scale. These include:
- Scalability: Kinesis Streams can handle data streams of any size, from small to extremely large, without any manual intervention. Users can easily scale up or down their data streams based on their needs.
- Durability: Kinesis Streams is designed to be highly available and durable, ensuring that data is never lost. Data is automatically replicated across multiple Availability Zones (AZs), providing fault tolerance and disaster recovery.
- Real-time processing: Kinesis Streams allows for real-time processing of data streams, providing insights and analytics in near real-time. This makes it ideal for use cases that require real-time decision-making, such as fraud detection or predictive maintenance.
- Cost-effective: Kinesis Streams is a cost-effective solution for handling streaming data at scale. Users only pay for the resources they use, and there are no upfront costs or minimum fees.
Kinesis Streams also provides several tools and APIs that make it easy to use and integrate with other AWS services. These include the Kinesis Producer Library (KPL), which enables users to easily produce data to Kinesis Streams, and the Kinesis Client Library (KCL), which provides a simple way to consume and process data from Kinesis Streams.
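For a simpler starting point than the KPL/KCL, the AWS SDK can produce and consume records directly. The sketch below uses boto3 with a placeholder stream name and payload; it is illustrative and not part of the Fleet implementation.

```python
# Minimal boto3 producer/consumer sketch for Kinesis Data Streams.
# Stream name, region, and payload are placeholders.
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
STREAM = "example-telematics-stream"

# Produce one record; records with the same partition key land on the same shard.
kinesis.put_record(
    StreamName=STREAM,
    Data=json.dumps({"vehicle_id": "G-1234", "odometer": 45210}).encode("utf-8"),
    PartitionKey="G-1234",
)

# Consume from the first shard, starting at the oldest record it retains.
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]
for record in kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]:
    print(json.loads(record["Data"]))
```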
In addition to Kinesis Streams, Amazon Kinesis also provides two other components: Kinesis Firehose and Kinesis Analytics.
- Kinesis Firehose: This is a fully managed service that allows users to easily deliver streaming data to AWS services, such as S3, Redshift, or Elasticsearch. It provides a simple way to load and transform streaming data in real-time, without requiring any manual intervention.
- Kinesis Analytics: This is a fully managed service that allows users to easily perform real-time analytics on streaming data. It provides a simple and powerful way to run SQL queries on streaming data, and can be used for a variety of use cases, such as real-time dashboards, anomaly detection, and more.
In summary, Amazon Kinesis Streams is a scalable, durable, and cost-effective solution for handling large amounts of streaming data in real-time. It provides a variety of key features, such as scalability, durability, and real-time processing, and integrates seamlessly with other AWS services. With Kinesis Streams, users can easily collect, process, and analyze real-time streaming data, enabling them to make timely and informed decisions.
Business Benefit: Faster software development process that enables faster delivery of new features to customers
GraphQL is a powerful technology that offers many benefits to GSA's business, foremost by improving API development and management. By reducing API complexity, improving performance, and enabling better collaboration between front-end and back-end developers, GraphQL is quickly becoming a go-to solution in systems development today.
The following are several key features of GraphQL that make it such a valuable technology for businesses:
Increased Flexibility - GraphQL allows developers to request only the data they need, reducing network usage and improving load times (a sketch of such a query follows this list). This flexibility also enables developers to make changes more easily and efficiently.
Better Developer Experience - With GraphQL, developers have access to a self-documenting API that is easy to understand and work with, reducing development time and increasing the quality of the final product.
Improved Performance - By minimizing the amount of data transferred over the network, GraphQL can improve API performance and provide a faster and more responsive user experience.
Reduced API Complexity - With GraphQL, developers can access data using a single endpoint, making it easier to manage the API layer and reducing overall complexity.
Better Collaboration - GraphQL provides a shared language for defining data requirements, enabling front-end and back-end developers to work together more closely and efficiently.
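To make the flexibility and single-endpoint points concrete, the hypothetical sketch below sends one query to a single endpoint and names exactly the fields it wants; the URL, types, and field names are illustrative and are not taken from the Fleet schema.

```python
# Illustrative GraphQL request; endpoint and schema are placeholders.
import requests

GRAPHQL_URL = "https://api.example.gov/graphql"  # placeholder single endpoint

# The client asks only for the fields it needs; nothing else crosses the network.
query = """
query VehicleSummary($tag: String!) {
  vehicle(tag: $tag) {
    tag
    make
    model
    fuelType
  }
}
"""

response = requests.post(
    GRAPHQL_URL,
    json={"query": query, "variables": {"tag": "G-1234"}},
    timeout=10,
)
response.raise_for_status()
print(response.json()["data"]["vehicle"])
```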
Fleet, through all of its modernization efforts, has adopted GraphQL across its entire suite of microservices in its data access layer. With GraphQL, Fleet has been able to streamline its data layer API development and management processes, reducing complexity and improving overall performance. The technology has enabled better collaboration between its development teams, leading to increased productivity and better outcomes for the business. Overall, GraphQL has proven to be a valuable tool for Fleet as it prepares a legacy of stable APIs for the years to come.
Business Benefit: All Fleet data are available for reporting and analysis from a single source of truth, regardless of the storage solution at the data source. This eliminates the need to fetch and stitch together data from different sources. The solution also saves cost by taking advantage of the pay-per-use model.
Transactional data in GSAFleet.gov are stored in relational databases across multiple schemas that belong to individual services. At the same time, Fleet uses telematics data that are collected in an AWS S3 bucket. All these data may need to be included in reports and should be available for analytics workloads.
Redshift Serverless provides a way to access all the data for analytical processing without having to manage data warehouse infrastructure.

As shown above, the use of federated schemas allows querying data across operational RDS databases and the data lake stored in S3 buckets and cataloged by AWS Glue.
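A hedged sketch of that setup is shown below, assuming the standard Redshift syntax for federated queries and for external schemas backed by the Glue Data Catalog; every endpoint, ARN, database, and table name is a placeholder rather than the Fleet environment.

```python
# Illustrative setup of federated schemas in Redshift Serverless via SQL.
# All identifiers, endpoints, and ARNs below are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="example-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    database="dev",
    user="admin",
    password="********",
)
cursor = conn.cursor()

# Expose an operational PostgreSQL (RDS/Aurora) schema as an external schema...
cursor.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS vehicles_ops
    FROM POSTGRES DATABASE 'vehicles' SCHEMA 'public'
    URI 'vehicles-db.abc123.us-east-1.rds.amazonaws.com'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-federated'
    SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:vehicles-creds'
""")

# ...and the Glue Data Catalog (telematics data in S3) as another.
cursor.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS telematics_lake
    FROM DATA CATALOG DATABASE 'telematics'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum'
""")

# A single query can now join live transactional rows with data-lake history.
cursor.execute("""
    SELECT v.tag, t.odometer, t.recorded_at
    FROM vehicles_ops.vehicle v
    JOIN telematics_lake.odometer_readings t ON t.vehicle_tag = v.tag
    LIMIT 10
""")
print(cursor.fetchall())
```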
Business Benefit: We are moving to Buildah because Kubernetes version 1.24 no longer supports Docker as the container runtime interface (CRI). We were using Docker to build container images and as the CRI to run containers. We are switching the CRI to containerd to run containers and to Buildah to build container images.
Buildah is a command-line tool for building Open Container Initiative (OCI) container images without requiring a full container runtime or daemon to be installed. It can also be used alongside container engines such as Docker or Podman, allowing users to build container images without a full container runtime environment.
Buildah was created to provide finer-grained control over images and over the creation of image layers. Its commands are more detailed than Podman's, which uses the same underlying code for building as Buildah.
One of the main advantages of Buildah is that it does not depend on a daemon, such as Docker or CRI-O, and does not require root privileges.
This allows for greater flexibility in building and managing container images. Additionally, Buildah provides a command-line tool that replicates all the commands found in a Dockerfile, allowing for the creation of container images from the command line or a shell script.
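As a minimal sketch of that workflow, assuming an existing Dockerfile and registry credentials already configured via `buildah login`, a small helper script might run the build and push steps like this; the image name and registry are placeholders.

```python
# Illustrative build-and-push helper; image name and registry are placeholders.
import subprocess

IMAGE = "registry.example.gov/fleet/sample-service:latest"  # placeholder tag

# `buildah bud` builds an OCI image from an ordinary Dockerfile, with no Docker daemon.
subprocess.run(["buildah", "bud", "-f", "Dockerfile", "-t", IMAGE, "."], check=True)

# Push the finished image to the registry referenced by its tag.
subprocess.run(["buildah", "push", IMAGE], check=True)
```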
Overall, Buildah offers a lightweight and flexible alternative to traditional container image building tools and provides greater control and flexibility in building and managing container images.