Data Capabilities

Capabilities are discrete functional components that enable a business or technical activity, and they can be used in concert to enable various plays within the playbook. Capabilities are grouped into Services that are pre-approved from a security standpoint, with associated security controls that accelerate the ATO of our tenant applications. For data, the defined services are Data Governance, Data Management, and Data Analytics.

The following 18 data capabilities are numbered and organized into three categories: Data Governance, Data Management, and Data Analytics. For each capability you will find a description, key capabilities, maturity (as defined by the maturity index below), and, when applicable, corresponding technologies and documentation. For further exploration, visit the FAS Enterprise Data Architecture section of the Playbook.

Maturity Index

Capabilities in this section are rated as Concept Phase, Early Adoption, or Common Service.

1. Data Catalog (Data Governance)

Description

The Data Catalog is an organized inventory of data assets in the organization. It uses metadata to help FAS manage its data, and helps data professionals collect, organize, access, and enrich metadata to support data discovery and governance.

The data catalog stores data set and attribute-level metadata and enables data stewards to create and maintain that metadata for existing and new data sets.
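
As an illustration only (this is not Alation's data model), the kind of data set and attribute-level metadata a steward maintains might look like the following sketch; all names and fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AttributeMetadata:
    name: str              # physical column name
    data_type: str         # e.g. VARCHAR(36), INTEGER
    description: str       # business definition supplied by the data steward
    is_sensitive: bool = False

@dataclass
class DataSetMetadata:
    name: str                        # hypothetical data set name
    owner: str                       # accountable data steward
    source_system: str               # system of record
    tags: List[str] = field(default_factory=list)
    attributes: List[AttributeMetadata] = field(default_factory=list)

# Example entry a data steward might register in the catalog
orders = DataSetMetadata(
    name="fas.orders",
    owner="Acquisition Data Steward",
    source_system="Order Management",
    tags=["acquisition", "transactional"],
    attributes=[
        AttributeMetadata("order_id", "VARCHAR(36)", "Unique order identifier"),
        AttributeMetadata("vendor_ein", "VARCHAR(10)", "Vendor tax ID", is_sensitive=True),
    ],
)
```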

Key Capabilities
Data Search & Discovery Find relevant information within the huge volumes of enterprise data, contextualize it, and determine how the data can be accessed and used.
Curation & Governance Ensure analytics and insights are derived from the best, most trusted data. By applying governance at the point of data use, the data catalog reduces misuse of data and ensures compliance with agency and regulatory policies.
Collaboration & Analysis Through wiki-like articles, ratings, reviews, and conversations, the data catalog facilitates collaboration among an increasingly global and remote workforce.

Maturity

Common Service

FCS Product Offerings

  • Data Governance

Technologies

Alation

Additional Documentation

Data Catalog Capability

2. Data Quality (Data Governance)

Description

The Data Quality Service provides the necessary capabilities to assess the validity, accuracy, completeness, correctness, and timeliness of the data. The service supports data users as they evaluate new data sets, and supports production applications and data pipelines as they perform CRUD functions and process data.

Key Capabilities
Data profiling Generate descriptive metadata about a data set (e.g., schema, data types, field lengths, value distribution, valid values), as sketched after this list.
Rule definition Specify data quality rules based on prescriptive (e.g., business rules) and descriptive (e.g., technical) constraints, and specify their applicability (e.g., full data set, sampling).
Rule execution Invoke rules through data pipelines/orchestration solutions and support corrective data quality, including logging of data corrections and rule execution results.
Rule lifecycle management Modify rules and track changes over time.
DQ Results Reporting/Notification Includes DQ dashboard results and alerts/notifications for users and systems.
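
A minimal profiling and rule-execution sketch, assuming pandas and a hypothetical data set (pandas is not a stated FAS technology; it is used here only to illustrate the capability):

```python
import pandas as pd

# Load a data set to evaluate (file path and columns are hypothetical)
df = pd.read_csv("contracts.csv")

# Data profiling: descriptive metadata about the data set
profile = {
    "row_count": len(df),
    "columns": {
        col: {
            "dtype": str(df[col].dtype),
            "null_pct": float(df[col].isna().mean() * 100),
            "distinct": int(df[col].nunique()),
        }
        for col in df.columns
    },
}
print(profile["row_count"], "rows profiled")

# Rule definition: a prescriptive completeness rule (hypothetical column)
rule = {"column": "contract_number", "max_null_pct": 0.0}

# Rule execution: evaluate the rule and capture results
null_pct = float(df[rule["column"]].isna().mean() * 100)
passed = null_pct <= rule["max_null_pct"]

# DQ results reporting/notification
print(f"{rule['column']}: {null_pct:.2f}% null -> {'PASS' if passed else 'FAIL'}")
```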

Maturity

Concept Phase

Technologies

* TBD

Additional Documentation

* TBD

3. Master Data (Data Governance)

Description

The Master Data Service provides capabilities to rationalize core data domains and create authoritative data sets that can be exposed and leveraged across systems and business domains.

Key Capabilities
Unique Identifier Creation and Management Create/apply unique IDs to drive consistent identification/linking of master data elements across systems.
Data Standardization Apply consistent formatting and correct inconsistencies in master data elements (e.g., address formatting standardization).
Exact Matching Identify master data relationships across systems based on byte-for-byte matching values.
Fuzzy Matching Identify potential master data relationships across systems based on similar values and/or complex logic across multiple attributes (see the sketch after this list).
Recommendations Show potential master data record matches across systems and allow users to determine if they are valid or invalid matches.
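
A minimal sketch of exact and fuzzy matching using Python's standard library; the vendor records, identifiers, and threshold are hypothetical:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized similarity between two standardized values (0.0 - 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Hypothetical vendor records from two source systems
system_a = {"V-001": "Acme Corporation"}
system_b = {"7734": "ACME Corp."}

THRESHOLD = 0.6  # tuned per data domain

for id_a, name_a in system_a.items():
    for id_b, name_b in system_b.items():
        if name_a.lower() == name_b.lower():
            # Exact match: byte-for-byte (after case normalization)
            print(f"Exact match: {id_a} <-> {id_b}")
        elif (score := similarity(name_a, name_b)) >= THRESHOLD:
            # Recommendation: surface for a steward to confirm or reject
            print(f"Possible match ({score:.2f}): {id_a} <-> {id_b}")
```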

Maturity

Concept Phase

Technologies

* TBD

Additional Documentation

* TBD

4. Data Lifecycle (Data Governance)

Description

The Data Lifecycle Service provides the mechanism to manage data storage in alignment with data retention, archiving, and purge requirements to reduce data sprawl and storage costs.

Key Capabilities
Lifecycle Definition Define the conditions under which data can be retained, archived, and/or purged.
Time Driven Lifecycle Move data to lower tiered storage based on elapsed calendar time since the data was created or modified (see the sketch after this list).
Utilization Driven Lifecycle Move data to lower tiered storage based on elapsed calendar time since the data was last touched.
Intelligent Tiering Move data between storage tiers based on utilization/access patterns.
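
Because AWS S3 is one of the listed technologies, a time-driven lifecycle with a purge rule can be expressed as an S3 lifecycle configuration. A minimal boto3 sketch, with a hypothetical bucket, prefix, and retention periods:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: archive staged extracts after 90 days, purge after 7 years
s3.put_bucket_lifecycle_configuration(
    Bucket="example-fas-data-lake",              # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "staged-extract-retention",
                "Filter": {"Prefix": "staging/"},
                "Status": "Enabled",
                # Time-driven lifecycle: move to lower-cost storage
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # Purge once the retention period has elapsed
                "Expiration": {"Days": 2555},
            }
        ]
    },
)
```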

Maturity

Concept Phase

FCS Product Offerings

  • Data Lake
  • Data Warehouse

Technologies

Alation
AWS Redshift
AWS S3

Additional Documentation

* AWS S3 Lifecycle Management

5. Lineage (Data Governance)

Description

The Data Lineage Service enables understanding, recording, and visualizing data as it flows from data sources to consumption. This includes all transformations the data underwent along the way—how the data was transformed, what changed, and why.

Data lineage shows the history of the data you are looking at today, detailing where it originated and how it may have changed over time. It is a reflection of the data life cycle, the source, what processes or systems may have altered it and how it arrived at its current location and state.
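
As an illustration only (not any particular lineage tool's model), run-time lineage can be thought of as a graph of data sets connected by the transformations that produced them; the data set names below are hypothetical.

```python
from datetime import datetime, timezone

# Each edge records one hop of run-time lineage:
# (source data set, transformation applied, target data set, when it ran)
lineage_edges = [
    ("raw.orders", "deduplicate on order_id", "staged.orders",
     datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc)),
    ("staged.orders", "join to reference vendor codes", "warehouse.orders_fact",
     datetime(2024, 1, 15, 2, 30, tzinfo=timezone.utc)),
]

def upstream_of(target: str) -> list[str]:
    """Walk the graph backwards to show where a data set originated."""
    sources = [src for src, _, tgt, _ in lineage_edges if tgt == target]
    return sources + [s2 for s in sources for s2 in upstream_of(s)]

print(upstream_of("warehouse.orders_fact"))
# ['staged.orders', 'raw.orders']
```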

Key Capabilities
Lineage Mapping Graphical representation of the data flow between source and target.
Lineage Details Description of data transformations applied to the data through each step of the data processing pipeline.
Design-time Lineage Lineage based on the intended process flow when the data pipeline was being created.
Run-time Lineage Lineage based on the actual data pipeline execution.

Maturity

Early Adoption

FCS Product Offerings

  • Data Governance

Technologies

Alation

Additional Documentation

* Data Catalog Capability

6. Reference Data (Data Governance)

Description

The Reference Data Service provides a means to manage bounded, common data sets across data domains to drive consistency. Reference data is slowly changing by nature and is used to group or organize other data. Within OLAP models, reference data is often represented through dimension tables.

Managing reference data centrally ensures the ability to consistently group and organize data, which enables easier cross-domain analytics.

Key Capabilities
Reference Data Inventory Store and manage reference data sets centrally.
Reference Data Publication Generate and expose authoritative copies of reference data to support different data consumers.
Change Notification Create systematic alerts when reference data records are created, modified, or deleted.
Reference Data Harmonization Standardization of multi-source reference data through business rules applied as transformation logic (see the sketch after this list).
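
A minimal harmonization sketch, assuming two source systems encode the same status concept with different local codes; the canonical values and mappings are hypothetical.

```python
# Canonical reference data set, managed centrally
ORDER_STATUS = {"OPEN", "AWARDED", "CLOSED"}

# Harmonization rules: map each source system's local codes to the canonical set
HARMONIZATION = {
    "system_a": {"O": "OPEN", "A": "AWARDED", "C": "CLOSED"},
    "system_b": {"open": "OPEN", "award": "AWARDED", "closed": "CLOSED"},
}

def harmonize(source: str, code: str) -> str:
    """Apply harmonization rules and reject values outside the canonical set."""
    canonical = HARMONIZATION[source].get(code)
    if canonical not in ORDER_STATUS:
        raise ValueError(f"Unmapped reference value {code!r} from {source}")
    return canonical

print(harmonize("system_b", "award"))  # AWARDED
```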

Maturity

Early Adoption

FCS Product Offerings

  • Data Warehouse

Technologies

AWS Redshift

Additional Documentation

* TBD

7. Data Policy (Data Governance)

Description

The Data Policy service provides a centralized location to define and manage the rules for user interaction with data. Stewards can map the rules to specific data sets and identify which policies are being applied to which data and user groups.

Key Capabilities
Policy Definition Specify rules, conditions, and warnings mapped to data sets and elements (see the sketch after this list).
Policy Execution Based on the defined rules, manage user access to and interaction with data consistent with the policy definition.
Policy Audit Detailed view of policy definitions and how they are applied to specific data sets and elements.
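
As a simple illustration of policy definition and execution (not a specific product's policy engine), a rule can map a data set to the user groups allowed to interact with it; the data sets, roles, and warning text are hypothetical.

```python
# Policy definition: rules mapped to data sets and user groups (hypothetical values)
POLICIES = [
    {"dataset": "warehouse.vendor_pii", "allowed_roles": {"privacy_officer"},
     "warning": "Contains PII; handle per agency policy."},
    {"dataset": "warehouse.orders_fact", "allowed_roles": {"analyst", "privacy_officer"},
     "warning": None},
]

def check_access(dataset: str, role: str) -> bool:
    """Policy execution: allow or deny interaction based on the defined rules."""
    for policy in POLICIES:
        if policy["dataset"] == dataset:
            if policy["warning"]:
                print("WARNING:", policy["warning"])
            return role in policy["allowed_roles"]
    return False  # no policy defined: deny by default

print(check_access("warehouse.vendor_pii", "analyst"))  # False
```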

Maturity

Concept Phase

FCS Product Offerings

  • Data Governance

Technologies

Alation

Additional Documentation

* TBD

8. Sensitive Data Detection (Data Governance)

Description

The Sensitive Data Detection service provides an automated means to identify data elements that require additional data protection or special handling based on organizational or regulatory rules.

Key Capabilities
Pattern Matching Identification of sensitive data elements based on attribute structure/format (see the sketch after this list).
Metadata Matching Identification of sensitive data elements based on attribute name or definition.
Rule Definition Creation of detection rules based on business-defined conditions.
Catalog Integration Automated updating of the data catalog with tags for sensitive data attributes.
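
A minimal sketch of pattern and metadata matching; the regular expressions (SSN-like and email-like formats) and column names are illustrative only.

```python
import re

# Pattern matching: detect sensitive values by structure/format
PATTERNS = {
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

# Metadata matching: detect sensitive attributes by name
SENSITIVE_NAME_HINTS = ("ssn", "tax_id", "email", "dob")

def detect(column_name: str, sample_values: list[str]) -> set[str]:
    tags = set()
    if any(hint in column_name.lower() for hint in SENSITIVE_NAME_HINTS):
        tags.add("metadata-match")
    for value in sample_values:
        for label, pattern in PATTERNS.items():
            if pattern.match(value):
                tags.add(label)
    return tags  # tags could then be pushed to the data catalog

print(detect("vendor_poc_email", ["jane.doe@example.com"]))
# {'metadata-match', 'email'}
```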

Maturity

Concept Phase

Technologies

* TBD

Additional Documentation

* TBD

9. Scheduling and Orchestration (Data Management)

Description

The Scheduling and Orchestration Service provides the ability to set up recurring executions of data pipelines/processes based on time parameters or conditions. This service reduces the need for manual intervention and can be used in conjunction with infrastructure provisioning capabilities.

Key Capabilities
Time-based Schedule Creation Configuration of recurring job executions based on time conditions (e.g., time of day, day of week, first day of the month).
Condition-based Schedule Creation Configuration of recurring job executions based on specific conditions being true. This could include a dependency on another job, a specific file being delivered, or a notification from another system.
Job Execution Retry In the event that a job does not complete successfully, automatically restart the job (see the sketch after this list).
Point-of-Failure Restartability In the event of a process failure, the ability to restart the job from the point where the failure occurred rather than restarting the entire process.
Job Branching and Merging Complex orchestration that allows jobs to initiate other jobs, wait for other jobs to complete before executing, and feed processing details into subsequent jobs.
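
Because Linux cron is the listed technology, a time-based schedule is simply a crontab entry; retry behavior typically lives in the job itself. The wrapper below is a hypothetical sketch, not an FAS pipeline.

```python
# Time-based schedule (crontab): run the pipeline at 02:00 every day
#   0 2 * * * /usr/bin/python3 /opt/pipelines/run_nightly.py

import time

def run_step(name: str) -> None:
    print(f"running {name}")  # placeholder for real pipeline work

def run_pipeline(steps: list[str], max_retries: int = 3) -> None:
    """Job execution retry: re-run a failed step before giving up."""
    for step in steps:
        for attempt in range(1, max_retries + 1):
            try:
                run_step(step)
                break  # success: move on to the next step
            except Exception as exc:
                print(f"{step} failed (attempt {attempt}): {exc}")
                if attempt == max_retries:
                    raise  # a real orchestrator would checkpoint completed
                           # steps here to support point-of-failure restart
                time.sleep(30)  # back off before retrying

run_pipeline(["extract", "transform", "load"])
```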

Maturity

Early Adoption

FCS Product Offerings

  • Extract Transform Load (ETL)

Technologies

Linux cron / crontab

Additional Documentation

* TBD

10. Data Model (Data Management)

Description

The Data Model service provides a means to manage and map key organizational data and the relationships between that data, and to represent those relationships graphically. This is key for supporting data governance, management, and design activities. Integration of the data modeling solution and the Data Catalog is key to ensuring consistent data management.

Key Capabilities
Model Creation Define a model including key entities/tables, attributes/fields, and relationships.
Attribute Management Configuration of attributes, including defining business and technical metadata.
Relationship Management Define how different entities are related based on the attributes that each entity contains.
Constraint Management Establish rules for attributes (e.g., key values, valid values, nullability, format).
Data Definition Language Generation Creation of scripts from the data model that can be used to create/modify database objects (see the sketch after this list).
Reverse Engineering Generating a data model based on a database's DDL.
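
A toy sketch of Data Definition Language generation from a logical model; the entity, attributes, and constraints are hypothetical and the output is ANSI-style SQL.

```python
# A hypothetical logical model: entity, attributes, and constraints
model = {
    "table": "vendor",
    "attributes": [
        {"name": "vendor_id",   "type": "VARCHAR(36)",  "nullable": False, "pk": True},
        {"name": "vendor_name", "type": "VARCHAR(200)", "nullable": False, "pk": False},
        {"name": "cage_code",   "type": "CHAR(5)",      "nullable": True,  "pk": False},
    ],
}

def generate_ddl(m: dict) -> str:
    """Generate a CREATE TABLE script from the model definition."""
    cols = []
    for a in m["attributes"]:
        null = "NULL" if a["nullable"] else "NOT NULL"
        cols.append(f'    {a["name"]} {a["type"]} {null}')
    pk = ", ".join(a["name"] for a in m["attributes"] if a["pk"])
    cols.append(f"    PRIMARY KEY ({pk})")
    return f'CREATE TABLE {m["table"]} (\n' + ",\n".join(cols) + "\n);"

print(generate_ddl(model))
```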

Maturity

Concept Phase

Technologies

* TBD

Additional Documentation

* TBD

11. Data Sharing (Data Management)

Description

The Data Sharing service provides systematic means for data owners to expose data to interested parties through controlled interfaces.

Key Capabilities
Direct Access Provide data consumers direct access to the data storage layer through defined access controls based on the sensitivity of the data and the permissions of the user.
Data Abstraction Creation of a semantic layer to manage data access and provide a managed view of the data to consumers who do not have direct access to the underlying data.

Maturity

Concept Phase

Technologies

* TBD

Additional Documentation

* TBD

12. Data Exchange (Data Management)

Description

The Data Exchange service provides a means to deliver authoritative copies of data to downstream users/systems to support local application processing and/or analytics.

Key Capabilities
Bulk Data Transfer Creation of an authoritative copy of data that can be delivered to the consumer for reuse, either as batch files delivered to a specified location or as database replicas for one-time or ongoing (change data capture) data transfer (see the sketch after this list).
Application Programming Interface (API) Brokered real-time synchronous interface between data owner and consumer based on a request/response paradigm, whereby the consumer makes a specific request for data to the data owner based on a predefined data specification.
Event Publication Brokered real-time asynchronous interface in which the data owner publishes notifications of state changes (or the changed data itself) to a centralized queue; consumers monitor the queue for data of interest and consume and process the data as the events occur.
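
Because AWS S3 is one of the listed technologies, a bulk data transfer can be as simple as delivering a batch extract to an agreed S3 location. A minimal boto3 sketch, with a hypothetical bucket, prefix, and file:

```python
import boto3

s3 = boto3.client("s3")

# Bulk data transfer: deliver an authoritative batch extract to the consumer's
# agreed drop zone (bucket, prefix, and file name are hypothetical)
s3.upload_file(
    Filename="/tmp/orders_2024-01-15.csv",
    Bucket="example-fas-data-exchange",
    Key="outbound/orders/orders_2024-01-15.csv",
)
```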

Maturity

Early Adoption

FCS Product Offerings

  • Data Lake
  • Database Migration

Technologies

AWS DMS
AWS S3

Additional Documentation

* TBD

13. Data Processing (Data Management)

Description

The Data Processing service enables integration, standardization, organization, and derivation of data to make it easier to consume and use downstream. It supports data integration to manipulate and consolidate data from disparate sources into a useful form, giving users easy, reliable access to the information needed by applications, users, and business processes, and producing a unified view from which actionable information can be gleaned.

Key Capabilities
Extract Transform Load (ETL) Access and pull data from sources, apply transformations, and refine and publish data for downstream consumption (see the sketch after this list).
Extract Load Transform (ELT) Access and pull data from sources, persist a copy of the source data for additional refinement, apply transformations, and refine and publish data for downstream consumption.
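
A minimal ETL sketch using pandas purely for illustration (the FCS ETL offering is Pentaho Data Integration); the file names and transformations are hypothetical.

```python
import pandas as pd

# Extract: pull data from a source (hypothetical CSV export)
orders = pd.read_csv("orders_raw.csv")

# Transform: standardize and derive fields for downstream use
orders["order_date"] = pd.to_datetime(orders["order_date"])
# Federal fiscal year: October through December roll into the next year
orders["fiscal_year"] = orders["order_date"].dt.year.where(
    orders["order_date"].dt.month < 10, orders["order_date"].dt.year + 1
)
orders = orders.drop_duplicates(subset="order_id")

# Load: publish the refined data for downstream consumption
orders.to_parquet("orders_refined.parquet", index=False)
```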

Maturity

Common Service

FCS Product Offerings

  • Extract Transform Load (ETL)
  • Data Processing Cluster
  • Database Migration

Technologies

Pentaho Data Integration
Amazon EMR Serverless
AWS DMS

Additional Documentation

* Data Integration Play

14. Data Storage (Data Management)

Description

The Data Storage service provides the ability to store, manage, and expose data for data consumers to access, query, explore, analyze, and use to generate new insights and reports.

Key Capabilities
Unstructured Data Storage Capturing and persisting data in a scalable manner to enable centralized storage of cross-domain data for further downstream processing and consumption. Unstructured data storage can handle any file/object type and store it in a cost-effective manner with easy ingestion and access methods.
Structured Data Storage Capturing and persisting conformed data organized in a business context to support ease of data exploration, analytics, and reporting. Structured data storage enforces data design specifications such as schema to improve quality and usability of the data.
Data Access Query/interact with data through standard interfaces based on user roles and data protection policies.
Data Protection Encrypt data to further protect it from unnecessary exposure/access (see the sketch after this list).
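
A minimal sketch of unstructured data storage with data protection, using boto3 against S3 (one of the listed technologies); the bucket, key, and file are hypothetical, and server-side encryption is shown explicitly for illustration.

```python
import boto3

s3 = boto3.client("s3")

# Unstructured data storage: persist any object type in the data lake
# Data protection: request server-side encryption on write
with open("survey_responses.json", "rb") as body:
    s3.put_object(
        Bucket="example-fas-data-lake",          # hypothetical bucket
        Key="raw/surveys/survey_responses.json",
        Body=body,
        ServerSideEncryption="aws:kms",
    )
```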

Maturity

Common Service

FCS Product Offerings

  • Data Lake
  • Data Warehouse

Technologies

AWS Redshift
AWS S3

Additional Documentation

* Data Warehousing with AWS Redshift

15. Self Service (Data Analytics)

Description

Self-Service provides capabilities that allow users to query data through a command line interface supporting ANSI standard SQL, and to manipulate, integrate, and transform data to derive new insights. This service is intended to allow business users to generate new insights and prototype data pipelines.

Key Capabilities
Query creation Writing of custom SQL against analytic data stores to explore the data and generate insights (see the sketch after this list).
Query optimization and editing Refactoring of a query based on new business requirements or to improve performance based on systematic recommendations (e.g., explain plan).
Query version control Saving/persisting versions of a query, tracking changes, and potentially branching/merging code across users.
Extract Transform Load (ETL) Access and pull data from sources, apply transformations, and refine and publish data to support localized analytics/reporting.
Extract Load Transform (ELT) Access and pull data from sources, persist a copy of the source data for additional refinement, apply transformations, and refine and publish data to support localized analytics/reporting.
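
A minimal query-creation sketch: the kind of ANSI SQL a business user might write in a self-service client. It is wrapped in Python's built-in sqlite3 only so the example is self-contained; the actual analytic store would be Redshift or similar, and the table and columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT, region TEXT, obligated_amount REAL);
    INSERT INTO orders VALUES ('A1', 'NCR', 1200.0), ('A2', 'NCR', 800.0),
                              ('A3', 'R7', 400.0);
""")

# Query creation: explore the data and generate an insight
query = """
    SELECT region,
           COUNT(*)              AS order_count,
           SUM(obligated_amount) AS total_obligated
    FROM orders
    GROUP BY region
    ORDER BY total_obligated DESC;
"""
for row in conn.execute(query):
    print(row)
# ('NCR', 2, 2000.0)
# ('R7', 1, 400.0)
```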

Maturity

Early Adoption

Technologies

SQuirreL SQL Client
Alation Compose
Tableau
Elastic Map Reduce (EMR)
MicroStrategy

Additional Documentation

* TBD

16. Computational Service (Data Analytics)

Description

The Computational service provides a means for scalable, parallelized, complex data processing and compute. It is intended to provide core capabilities for advanced data processing in support of analytics, data science, and machine learning (ML).

Key Capabilities
Apache Spark-based Processing Leverages Spark's in-memory processing to improve scale and parallelization for large-scale data processing (see the sketch after this list).
Multi-language Support Use Python, Scala, or Java to write Spark-based data processes.
Library Integrations Extend data science functionality through common open source libraries.
EMR Studio / Jupyter Notebooks Integrated development environment (IDE) for data scientists and data engineers to develop, visualize, and debug data engineering and data science applications written in R, Python, Scala, and PySpark.
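
A minimal PySpark sketch of Spark-based processing of the kind that runs on EMR; the input path, columns, and aggregation are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-aggregation").getOrCreate()

# Spark-based processing: parallelized aggregation over a large data set
# (the S3 path below is hypothetical)
orders = spark.read.parquet("s3://example-fas-data-lake/refined/orders/")

summary = (
    orders
    .filter(F.col("fiscal_year") == 2024)
    .groupBy("region")
    .agg(F.sum("obligated_amount").alias("total_obligated"))
)

summary.write.mode("overwrite").parquet(
    "s3://example-fas-data-lake/analytics/orders_by_region/"
)
spark.stop()
```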

Maturity

Early Adoption

FCS Product Offerings

  • Data Processing Cluster

Technologies

AWS EMR
Amazon EMR Serverless

Additional Documentation

EMR User Guide

17. Business Intelligence (Data Analytics)

Description

The Business Intelligence service includes all facets of standard reporting, dashboarding, and data visualization capabilities, including authoring, publication, lifecycle management, and access to reporting and visualization artifacts.

Key Capabilities
Pixel-perfect Reporting Structured reporting conformed to exact specifications to meet organizational or regulatory requirements.
Standard Reporting Structured tabular reports where the user can interact with the data to filter, drill up/down/across, and explore the underlying data.
Visualization/Dashboards Interactive reports including charts, visual representations, and graphs.

Maturity

Common Service

FCS Product Offerings

  • Business Intelligence

Technologies

MicroStrategy
Tableau

Additional Documentation

* Data Visualization Play

18. AI/ML Lifecycle (Data Analytics)

Description

The AI/ML Lifecycle service enables data scientists to manage all facets of model creation and execution through standardized tools and methods aligned with best practices for model management and DevSecOps approaches.

Key Capabilities
Data Acquisition and Refinement Import/access data and standardize it for input to a machine learning model.
Model Development Create and refine models (see the sketch after this list).
Model Training Harmonize models through additional input data and refactoring.
Model Testing Validate model outcomes and functionality.
Model Versioning Retain model versions, including input data, code, and output data, for development and compliance requirements.
Model Promotion Migrate approved models to execute in a production environment and/or integrate with production applications.
Model Monitoring Recurring validation of models to identify data and/or model drift.
Model Refactoring Update/retrain models to ensure the model produces appropriate outcomes.
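
A minimal sketch of model development, training, testing, and versioning using scikit-learn (an assumption; the service does not name an ML toolkit). Synthetic data stands in for a refined data set, and the file name is hypothetical.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data acquisition and refinement (synthetic data stands in for a refined data set)
X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Model development and training
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Model testing: validate outcomes before promotion
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"holdout accuracy: {accuracy:.3f}")

# Model versioning: persist the trained artifact for promotion and later audits
joblib.dump(model, "model_v1.joblib")
```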

Maturity

Concept Phase

Technologies

* TBD

Additional Documentation

* TBD