
FAS Enterprise Data Architecture

FAS Enterprise Data Architecture is the general term for the overarching strategy by which FAS plans to manage its data. FAS Cloud Services - Data (FCS-D) is the set of services within the FAS Cloud Service Ecosystem that focus on data. The FAS Data & Evidence Governance (DEGB) is the policy group that establishes the frameworks and processes the Acquisition Workforce (AWF) follows to collect, process, and deliver its data. FCS-D is rooted in enabling services that allow the AWF to study, research, and collaborate on FAS data. As part of the larger FAS Cloud Service Ecosystem, FCS-D focuses on providing a "Centralized Data Analytical, Service Wide Coordinated Capability". FCS-D has defined three services that support Application Modernization and give the AWF the means to conduct its mission.

These services are:

DATA GOVERNANCE SERVICE: Capabilities to enable data cataloging/metadata, data quality, data lineage, data classification, and data policy definition to support discovery, usability, understanding, and appropriate usage of data.

DATA MANAGEMENT SERVICE: Components to acquire, process, refine, store and publish data for downstream analytics, reporting, and insight generation.

DATA ANALYTICS SERVICE: Capabilities to enable data exploration, analytics, data refinement, machine learning, and BI/Reporting.
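To make the Data Governance Service more concrete, the sketch below models a catalog entry that carries the metadata named above: an owner, a classification, quality checks, and lineage. It is a minimal illustration, not the FCS-D catalog: the `CatalogEntry` and `DataCatalog` types, the three `Classification` levels, and the dataset names are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    # Hypothetical levels; the actual FAS classification scheme is not described here.
    PUBLIC = "public"
    INTERNAL = "internal"
    CONTROLLED = "controlled"


@dataclass
class CatalogEntry:
    """Governance metadata for one dataset (illustrative fields only)."""
    name: str
    description: str
    owner: str
    classification: Classification
    quality_checks: list[str] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)  # lineage: direct sources of this dataset


class DataCatalog:
    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse entries whose sources are unknown, so lineage never points at nothing.
        missing = [u for u in entry.upstream if u not in self._entries]
        if missing:
            raise ValueError(f"unregistered upstream datasets: {missing}")
        self._entries[entry.name] = entry

    def lineage(self, name: str) -> list[str]:
        """All ancestors of a dataset, nearest first (breadth-first walk)."""
        ancestors: list[str] = []
        queue = list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current not in ancestors:
                ancestors.append(current)
                queue.extend(self._entries[current].upstream)
        return ancestors

    def discoverable(self, allowed: set[Classification]) -> list[str]:
        """Names of datasets a consumer cleared for the `allowed` levels may discover."""
        return sorted(n for n, e in self._entries.items() if e.classification in allowed)


# Usage with hypothetical dataset names.
catalog = DataCatalog()
catalog.register(CatalogEntry("raw_orders", "Orders as received", "ingest-team",
                              Classification.CONTROLLED))
catalog.register(CatalogEntry("gold_orders", "Cleaned, deduplicated orders", "analytics-team",
                              Classification.INTERNAL, ["contract_id not null"], ["raw_orders"]))
print(catalog.lineage("gold_orders"))                      # ['raw_orders']
print(catalog.discoverable({Classification.INTERNAL}))     # ['gold_orders']
```

Rejecting entries with unregistered sources is one possible design choice; it keeps lineage queries complete at the cost of requiring datasets to be cataloged in dependency order.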

These services are leveraged through the DATA PROCESSING FRAMEWORK, a combination of services that acquire data from sources, then process and publish it for downstream analytics, reporting, and insights. Consumers then reach the data through two mechanisms: VIRTUAL WORKSPACES, which provide the tools and shared environments needed for collaboration, controlled sharing of data solutions, and self-service creation of data applications for various end-user needs; and USER INTERFACES, which let data consumers interact with the underlying data. The interfaces support different use cases and different levels of technical experience.

Data Architecture Logical View


Beyond the logical view, it is important to understand how data moves through the system. The "Process Flow" view below shows data moving from "Data Sources" of various types into a data lake. Within the "Data Lake", raw data is cleaned, conditioned, and transformed into the curated section of the lake. The curated data corresponds to the "Gold Buckets" in the Analytics Engine Macro-Pattern (see the Analytics Engine Macro-Pattern whitepaper). Any data that is externally facing or externally supplied must be described in a data catalog, which acts as the interface for understanding what the data is and how to use it. The "Data Catalog" collects provenance, lineage, and other governance-related metadata so that consumers and suppliers share a consistent understanding of the data that FAS Data contains and handles.
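The raw-to-curated step can be sketched as a function that conditions raw rows into typed records and quarantines rows it cannot repair. This is a minimal sketch under stated assumptions: the `CuratedRecord` fields, the cleaning rules (trim, collapse whitespace, parse ISO dates, strip `$` and `,` from amounts), and the "last write wins" rule for duplicate IDs are hypothetical choices, not the FAS Data pipeline's actual logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CuratedRecord:
    # Hypothetical curated ("gold") schema for illustration only.
    contract_id: str
    vendor: str
    award_date: date
    amount_usd: float


def curate(raw_rows: list[dict]) -> tuple[list[CuratedRecord], list[dict]]:
    """Clean and conform raw rows; return (curated records, rejected rows)."""
    curated: dict[str, CuratedRecord] = {}
    rejected: list[dict] = []
    for row in raw_rows:
        try:
            contract_id = row["contract_id"].strip()
            vendor = " ".join(row["vendor"].split()).upper()    # collapse whitespace, normalize case
            award_date = date.fromisoformat(row["award_date"].strip())
            amount = float(row["amount"].replace(",", "").replace("$", ""))
        except (KeyError, ValueError, AttributeError):
            rejected.append(row)   # quarantine instead of silently dropping
            continue
        if not contract_id:
            rejected.append(row)
            continue
        # Duplicate IDs: the later raw row replaces the earlier one (last write wins).
        curated[contract_id] = CuratedRecord(contract_id, vendor, award_date, amount)
    return list(curated.values()), rejected


raw = [
    {"contract_id": " C-001 ", "vendor": "acme   corp", "award_date": "2023-04-01", "amount": "$1,250.50"},
    {"contract_id": "C-002", "vendor": "Globex", "award_date": "not-a-date", "amount": "10"},
    {"contract_id": "C-001", "vendor": "Acme Corp", "award_date": "2023-04-02", "amount": "1300"},
]
gold, quarantined = curate(raw)
print(gold)          # one C-001 record, taken from the third row
print(quarantined)   # the C-002 row, whose date could not be parsed
```

Returning the rejected rows, rather than discarding them, leaves room to record them in the catalog's quality metadata.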

FAS Data Process Flow View

The final view, the FAS VPC View, is a technical architecture that shows how FAS Data will be realized in the GSA Amazon cloud. Two VPCs will be provided: one for management and another that contains, or provides access to, the services that realize FAS Data itself.

Reading the diagram from left to right, the VPC-dependent services (e.g., RDS, Redshift) will be controlled within the LDE VPC, which will use subnets to encapsulate the various component parts. In the interest of "least privilege access" to regional services, access points will be used to keep the data in the S3 buckets as private as possible. Overall access to FAS Data resources will be provided through "Jump Box" instances in a "Management VPC", which will also run the CI/CD and provisioning services. Security is woven throughout.
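As an illustration of what "least privilege" through an S3 access point can look like, the sketch below generates an access point policy that lets a single IAM role read and list objects under one key prefix and nothing else. The region, account ID (the AWS documentation placeholder `123456789012`), access point name `curated-read`, role name `fas-data-analyst`, and `gold/` prefix are hypothetical; the source does not specify the actual FAS Data policies.

```python
import json


def access_point_read_policy(region: str, account_id: str, access_point: str,
                             role_name: str, prefix: str) -> str:
    """Read-only S3 access point policy for one IAM role and one key prefix."""
    ap_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{access_point}"
    principal = {"AWS": f"arn:aws:iam::{account_id}:role/{role_name}"}
    clean_prefix = prefix.strip("/")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Objects reached through an access point use the ".../object/<key>" ARN form.
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"{ap_arn}/object/{clean_prefix}/*",
            },
            {
                # Listing is granted on the access point itself, limited to the same prefix.
                "Sid": "ListPrefixOnly",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": ap_arn,
                "Condition": {"StringLike": {"s3:prefix": f"{clean_prefix}/*"}},
            },
        ],
    }
    return json.dumps(policy, indent=2)


print(access_point_read_policy("us-east-1", "123456789012", "curated-read",
                               "fas-data-analyst", "gold/"))
```

An access point policy only narrows access: AWS also evaluates the underlying bucket's policy, so the bucket must allow, or delegate control to, the access point for these grants to take effect.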

FAS VPC View

If you are familiar with the Analytics Engine Macro-Pattern, you will see that this solution conforms to it. Don't let the technology selections distract you from what is being accomplished here. The technology might, and probably will, change, but the overall objective of FAS Data, to consolidate and provide data that is useful to the GSA enterprise, will not. In the end, FAS Data:

Introduction to the GSA IT Database Crosswalk

A particular focus of the FAS-IT Systems Modernization effort is Information/Data Management and driving improved information access and use, from transacting with and on data to intelligent data analytics. The goal is to drive convergence, improve resource visibility and management, and, most importantly, provide 'better data' to the business lines.

The Database Crosswalk Sheet includes an inventory of data tools in use today, an overview of all database types, and an inventory of all systems and the databases they use.

It is important to understand both where the data is and how it is being used. As a first step, "where it is" is what the "GSA IT Database Crosswalk" (linked below) seeks to answer. Therefore, this document focuses on identifying the systems that create or provide data along with the employed storage types. It doesn't explore how data is transported, shared or otherwise used.

To provide this system-storage view, we identified the data storage solutions used by GSA FAS systems, including SQL and NoSQL use cases along with secondary storage mechanisms. As supporting information, we created a view of all database types to identify the storage solutions available for use. Finally, an inventory of current data storage products gives context to the particular products in play today. Taken together, these provide an overview of the current state of system data storage within GSA FAS. This is a 'living document' and will be updated regularly.
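The system-storage view described above is essentially a table of (system, product, storage type) rows that can be pivoted in different directions. The sketch below shows that shape and two pivots: which systems depend on a given product, and which products are in use per storage type. The `StorageUse` type, helper names, and the sample rows ("System A", "PostgreSQL", and so on) are hypothetical placeholders, not entries from the actual GSA IT Database Crosswalk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageUse:
    system: str        # system that creates or provides the data
    product: str       # storage product in use
    storage_type: str  # e.g. "SQL", "NoSQL", "secondary"


def _group(rows: list[StorageUse], key: str, value: str) -> dict[str, list[str]]:
    """Group one column by another, returning sorted, de-duplicated values."""
    index: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        index[getattr(row, key)].add(getattr(row, value))
    return {k: sorted(v) for k, v in sorted(index.items())}


def systems_by_product(rows: list[StorageUse]) -> dict[str, list[str]]:
    # Answers "which systems would be affected if this product changed?"
    return _group(rows, "product", "system")


def products_by_type(rows: list[StorageUse]) -> dict[str, list[str]]:
    # Answers "which products are in play for each storage type?"
    return _group(rows, "storage_type", "product")


# Hypothetical sample rows for illustration only.
inventory = [
    StorageUse("System A", "PostgreSQL", "SQL"),
    StorageUse("System A", "Amazon S3", "secondary"),
    StorageUse("System B", "PostgreSQL", "SQL"),
    StorageUse("System C", "Amazon DynamoDB", "NoSQL"),
]
print(systems_by_product(inventory))
print(products_by_type(inventory))
```

Keeping the inventory as flat rows makes it straightforward to add the dimensions mentioned next (messaging, streaming, APIs) as further columns or row types.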

Going forward, a complete view of the data in play at GSA at any given moment will require adding messaging, the transient data states within analytics, streaming data, APIs and their SLAs, and probably many other elements. Each of these is important in its own right. Adding them to the information already gathered will reveal the entire data panorama.

Click here to access the GSA IT Database Crosswalk