Authoritative Catalog Repository (ACR)

Problem Statement:

Through its catalogs, GSA offers more than 80 million items from 25,000+ Schedule vendors, spanning products, services, and National Stock Numbers (NSNs). GSA relies on outdated policies, processes, and systems/tools to manage these catalogs. As a result, our workforce, buyers, and suppliers experience problems with catalog data quality and accessibility, duplicate and disconnected catalog management processes, and a perception that GSA cannot fully execute on elements of its mission.

In 2019, the Catalog Management (CM) team provided comprehensive recommendations for process, system, and policy enhancements to GSA’s current catalog management environment. The team’s vision was to create an “Authoritative Catalog Repository” (ACR) for FAS catalog data by establishing a new, scalable data repository as the single source of truth for FAS catalog offerings (products, services, and NSNs). This repository was to be accessible through APIs, enabling seamless interfaces with existing catalog processing systems as well as downstream sales channels such as GSA Advantage.

Implementing the ACR is intended to address GSA’s data quality issues head-on, improving data quality and accessibility across the FAS portfolio of products, services, and NSNs. The ACR offers the opportunity to create a new catalog environment, unhindered by costly legacy architectures and able to leverage current and future commercial technologies to integrate seamlessly with other current and future-state FAS initiatives. It furthers GSA’s broader Federal Marketplace (FMP) Strategy and delivers on the Catalog Management Office (CMO) vision of a scalable, cloud-based repository for the entire FAS portfolio that facilitates a common FAS approach to catalog data management and standards.

Technical Approach:

The ACR team selected an approach that minimized impacts on existing catalog operations and legacy system functionality while providing the flexibility to design a system unencumbered by those same legacy systems and processes. To achieve this, the ACR interface starts at the GSA Advantage system boundary, where EDI 832 files from vendors are translated into flat files. The ACR ingests and processes these files in near real time, parsing them and validating them against business rules that have been updated and enhanced to align with contract requirements. The parsed and validated data is stored for consumption by downstream systems and stakeholders such as data analysts, contracting officers, GSA Advantage, and the GSA Enterprise Data Architecture (EDA). The ACR also integrates with the Verified Products Portal (VPP) to enrich vendor-provided data, improving both data quality (images, descriptions, specifications, etc.) and supply chain security. The ACR architecture can also offer read replicas for redundancy and availability.

The ACR is being implemented in phases to reduce the risk of processing millions of catalog items from thousands of vendors. The initial MVP rollout in October 2021 processed catalogs from new MAS contracts. During FY22, the second and later phases of the rollout included iteratively migrating existing MAS contracts into the ACR.

Architecture:

To implement this approach, the initiative is divided into multiple architectural epics and the functional components needed to meet business requirements. The business features and capabilities depicted in the figure below address the entire lifecycle of a catalog, from inception to publishing or consumption.
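One way to make a catalog's lifecycle explicit is a small state machine that rejects out-of-order steps. This is a hypothetical sketch: the state names are drawn loosely from the stages this page mentions (ingest, parse, validate, store, CO review, approval, publishing to GSA Advantage), and the `REJECTED` state is an assumption not named in the source.

```python
from enum import Enum


class CatalogState(Enum):
    RECEIVED = "received"          # flat file arrived at the ACR boundary
    PARSED = "parsed"
    VALIDATED = "validated"
    STORED = "stored"
    IN_CO_REVIEW = "in_co_review"  # routed for contracting officer review
    APPROVED = "approved"
    REJECTED = "rejected"          # hypothetical; not named in the source
    PUBLISHED = "published"        # available on GSA Advantage


# Allowed transitions; any step not listed here is refused.
TRANSITIONS: dict[CatalogState, set[CatalogState]] = {
    CatalogState.RECEIVED: {CatalogState.PARSED},
    CatalogState.PARSED: {CatalogState.VALIDATED},
    CatalogState.VALIDATED: {CatalogState.STORED},
    CatalogState.STORED: {CatalogState.IN_CO_REVIEW},
    CatalogState.IN_CO_REVIEW: {CatalogState.APPROVED, CatalogState.REJECTED},
    CatalogState.APPROVED: {CatalogState.PUBLISHED},
    CatalogState.REJECTED: set(),
    CatalogState.PUBLISHED: set(),
}


def advance(current: CatalogState, target: CatalogState) -> CatalogState:
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Encoding the transitions as a table keeps the lifecycle inspectable and lets a new stage be added by editing data rather than control flow.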

A flexible, common authoritative catalog schema was developed so that product attributes can easily be defined, maintained, and extended. Several database options, including MongoDB, MySQL, and PostgreSQL, were considered for storing the schema and ACR data. PostgreSQL was chosen because it effectively handles both the structured and unstructured data the ACR needs. For hosting, several FCS cloud offerings, including EBTA, VPCaaS, and MCaaS, were evaluated. MCaaS was selected because it supports the vision of an API-based microservices architecture and leverages the benefits of containerization and AWS managed services.
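A schema whose attributes can be "defined, maintained, and extended" is often built by treating attribute definitions as data: a common base set plus per-category extensions. The sketch below assumes that design; the category `toner`, the attribute names, and the in-memory `EXTENSIONS` registry are all hypothetical. The source does not say how the ACR stores its schema; in a PostgreSQL deployment such definitions could live in tables instead of code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDef:
    name: str
    type_: type
    required: bool = False


# Hypothetical common attributes shared by every catalog offering.
BASE_ATTRIBUTES = [
    AttributeDef("part_number", str, required=True),
    AttributeDef("name", str, required=True),
    AttributeDef("price", float, required=True),
]

# Hypothetical category-specific extensions; adding a category is a data change only.
EXTENSIONS: dict[str, list[AttributeDef]] = {
    "toner": [AttributeDef("page_yield", int), AttributeDef("color", str, required=True)],
}


def schema_for(category: str) -> dict[str, AttributeDef]:
    """Merge the base attributes with any extension registered for the category."""
    attrs = BASE_ATTRIBUTES + EXTENSIONS.get(category, [])
    return {a.name: a for a in attrs}


def check_item(item: dict, category: str) -> list[str]:
    """Return schema problems for one item; an empty list means it conforms."""
    schema = schema_for(category)
    problems = [f"missing {n}" for n, a in schema.items() if a.required and n not in item]
    for key, value in item.items():
        attr = schema.get(key)
        if attr is None:
            problems.append(f"unknown attribute {key}")
        elif not isinstance(value, attr.type_):
            problems.append(f"{key} should be {attr.type_.__name__}")
    return problems
```

Registering a new entry in `EXTENSIONS` immediately changes what `check_item` accepts for that category, with no change to the checking code.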

Amazon Elastic Kubernetes Service (EKS) hosts the ACR microservices, which run as pods on EKS nodes. Amazon S3 object storage supports an event-based architecture in which Kafka moves each catalog through a series of processing events. An Event Listener microservice, shown below, orchestrates catalog processing across events such as ingest, parse, validate, and store.
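The Event Listener pattern can be sketched as a loop that consumes an event, dispatches it to the handler for its type, and publishes the follow-up event. In this minimal sketch a `collections.deque` stands in for the Kafka topics and a dictionary stands in for the data store; the handler names, payload shapes, and the four-field "validation" rule are hypothetical simplifications, not the ACR's actual services.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Event:
    kind: str          # "ingest", "parse", "validate", or "store"
    catalog_id: str
    payload: object


def ingest(e: Event) -> Optional[Event]:
    # Stand-in for fetching the flat file from object storage such as S3.
    return Event("parse", e.catalog_id, e.payload)


def parse(e: Event) -> Optional[Event]:
    rows = [line.split("|") for line in str(e.payload).splitlines() if line]
    return Event("validate", e.catalog_id, rows)


def validate(e: Event) -> Optional[Event]:
    good = [r for r in e.payload if len(r) == 4]  # toy rule: exactly four fields
    return Event("store", e.catalog_id, good)


STORE: dict[str, list] = {}  # stand-in for the repository database


def store(e: Event) -> Optional[Event]:
    STORE[e.catalog_id] = e.payload
    return None  # terminal step: nothing further to publish


HANDLERS: dict[str, Callable[[Event], Optional[Event]]] = {
    "ingest": ingest, "parse": parse, "validate": validate, "store": store,
}


def run_listener(queue: deque) -> list[str]:
    """Consume events until the queue is empty; return the kinds handled, in order."""
    handled = []
    while queue:
        event = queue.popleft()
        handled.append(event.kind)
        follow_up = HANDLERS[event.kind](event)
        if follow_up is not None:
            queue.append(follow_up)
    return handled
```

Because each stage only reacts to an event and emits the next one, stages can be scaled, retried, or replaced independently, which is the main appeal of this style over a single monolithic batch job.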

The following figure depicts the ACR ingest pipeline, in which the Event Listener processes each catalog according to its lifecycle events.

The architecture includes components that preserve event transactions so that events can be monitored, analyzed, or even replayed for historical review or operational troubleshooting. The API of each ACR microservice is published as standard documentation through Swagger. Once a catalog is successfully processed within the ACR, it is routed to the Contracting Officer Review System (CORS, soon to be replaced by the Common Catalog Platform) for CO approval. Approved catalogs are then published to GSA Advantage.
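Preserving events for later replay is essentially an append-only log: events are never edited, and any past view can be rebuilt by re-reading the log up to a chosen position. In Kafka this role is played by topic retention and consumer offsets; the sketch below uses an in-memory list of JSON lines instead, and the `EventLog` class and event kinds such as `co_review` are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventLog:
    """Append-only event log: entries are only ever appended, never edited."""

    entries: list[str] = field(default_factory=list)

    def append(self, catalog_id: str, kind: str, detail: str = "") -> int:
        """Record one event as a JSON line and return its offset in the log."""
        record = {"catalog_id": catalog_id, "kind": kind, "detail": detail}
        self.entries.append(json.dumps(record))
        return len(self.entries) - 1

    def replay(self, catalog_id: str, up_to: Optional[int] = None) -> list[dict]:
        """Re-read one catalog's events in order, optionally stopping at an offset (inclusive)."""
        end = len(self.entries) if up_to is None else up_to + 1
        decoded = (json.loads(line) for line in self.entries[:end])
        return [e for e in decoded if e["catalog_id"] == catalog_id]
```

Replaying up to a given offset answers "what had happened to this catalog at that point?", which is the kind of historical review or troubleshooting question the paragraph above describes.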

Outcome:

The ACR Program has produced a catalog product schema that is flexible, extensible, and maintainable. The architecture allows business rules and validations to be changed without modifying the underlying implementation. Another major benefit of the ACR is raw catalog processing speed, which has improved by roughly two orders of magnitude: parsing a very large catalog (~1M products, 3.8 GB) went from 19 hours on the legacy pipeline to 12 minutes on the ACR. As a result, catalogs reach contracting officers for review and approval sooner, shortening time to market. Eighteen microservices are in use overall, serving as enterprise APIs for various enterprise systems and stakeholders. Additional key benefits of the ACR include:
* A more modern and scalable backend hardware environment
* Removal of the daily processing capacity limit (previously ~800K catalog line items per day, due to hardware and software limitations)
* Catalog data that is more easily accessible to FAS business stakeholders, reducing resource-intensive, one-off data requests to IT operations staff
* Enhanced Supply Chain Risk Management (SCRM) capabilities, including search, filtering, and removal of prohibited products, unauthorized distributors, and product misrepresentation
* Alignment to and integration with the GSA Enterprise Data Architecture
* Opportunities to ultimately retire elements of legacy infrastructure (databases, pipelines, licenses, and associated maintenance staff)
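Using only the figures quoted above, and treating "~1M products" as exactly 10^6 for a rough estimate, the speedup and the ACR throughput for that catalog work out as follows. A 95x speedup is just under two orders of magnitude.

```latex
\[
\text{speedup} = \frac{19\,\text{h} \times 60\,\text{min/h}}{12\,\text{min}}
              = \frac{1140\,\text{min}}{12\,\text{min}} = 95,
\qquad \log_{10} 95 \approx 1.98
\]
\[
\text{throughput} \approx \frac{10^{6}\ \text{products}}{12\,\text{min}}
                  \approx 8.3 \times 10^{4}\ \text{products/min}
\]
```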

Next Phases:

Planned enhancements to the ACR include the ongoing integration with the new Common Catalog Platform (CCP), additional integration with VPP for data enrichment and SCRM capabilities, and the remaining phases of ingesting catalogs from other FAS programs such as MAS BPAs, 4PL/EDD Retail Operations BPAs, local catalogs, and Services catalogs.