FALCON: Going Serverless
Introduction
The decision to use Serverless technologies for FALCON was based on a number of factors. Chief among them were operational modularity (a beneficial feature of the legacy solution), improved resource control and management, and the ability to eventually shift from schedule-based operation to event-based operation.
The legacy solution's ClearPath mainframe platform is not without its merits; otherwise it would not have served GSA's needs for as long as it has. The legacy solution is implemented as units of code that are individually deployable and maintainable, orchestrated by a system with rich decision-making capabilities. This allows the system to be maintained as small, decoupled units, gaining many of the same benefits as a microservice architecture.
The creation of FALCON was in part prompted by the need to get off the ClearPath mainframe platform. GSA's decision to move away from the ClearPath platform was driven by concerns about:
- Lack of operational flexibility, including:
  - Multiple divisions share hardware, so maintenance on one system causes broad impacts across many others
  - Infrastructural dependencies not in alignment with system and business dependencies
- Operational costs, including:
  - Premium costs to reserve peak compute capacity, even when that capacity went unused
  - The need to estimate costs upfront for long periods of time, with little flexibility during execution
  - Excessive expenses to add new infrastructural capabilities or expand existing ones
As a result, a viable solution needed to solve these issues while retaining the deployment flexibility observed in the legacy solution.
The most attractive benefits of serverless are:
- The Cloud Service Provider (CSP) is responsible for managing the service infrastructure. We expect improved O&M costs because responsibility for service maintenance and operations moves to the CSP: patching and scaling adjustments will no longer be our concern.

- Serverless services optimize the "only pay for what you use" cloud model by shrinking the service boundary from infrastructure units to functional units (e.g., from EC2-hosted RESTful endpoints to Lambda function calls). Improved scalability allows for the reduction or elimination of idling compute resources, resulting in reduced costs.
- Serverless computing is an ecologically sound approach, requiring far fewer online computing resources than traditional architectures due to economies of scale and technology-based optimization.
Background: FSS-19 is an Orchestrational System
The FSS-19 legacy solution is composed of small programs, each dedicated to a singular function within the overall system. These programs take input from upstream systems as well as the output of other programs within the workflow; their output is then sent to programs further down the workflow, or to downstream external systems. This forms the basis for the "files and filters" pattern used within the legacy system.

This pattern has operational benefits, in that it allows maintenance to be performed on small deployable units that are able to be modified and redeployed without the need to redeploy other units, and with minimal impact to other components.
The execution of these programs is controlled using Workflow Language, allowing the outputs to be routed to other programs. This ability to control how, when, and where the results of small deployable units of code are routed provides great flexibility. This flexibility has contributed to the long life of the existing solution; any replacement solution would need to provide similar operational benefits and flexibility.
There is a well-known architectural pattern called "Pipes and Filters". In this pattern, the pipes represent the flow of information from one filter to the next. Each filter is responsible for a data transformation of one kind or another, whether persistence oriented (e.g., posting to a message queue, a file system, or a data repository), internally transformational (e.g., merging in reference data or denormalizing), or both. State machine and workflow solutions are typically employed to handle the data-flow nature of such systems. The IT Playbook recognizes this pattern in its Orchestration Macro-pattern. Both FSS-19 and FALCON employ the "Pipes and Filters" pattern in their solutions.
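A minimal sketch of the pattern in Python may make it concrete. The filter names and records below are hypothetical illustrations, not FSS-19's actual programs:

```python
from typing import Callable, Iterable, List

# A "filter" is any callable that transforms one record into another.
Filter = Callable[[dict], dict]

def normalize_quantity(record: dict) -> dict:
    # Internally transformational filter: coerce the quantity field to an int.
    return {**record, "qty": int(record["qty"])}

def enrich_with_reference_data(record: dict) -> dict:
    # Another transformational filter: merge in reference data (illustrative).
    reference = {"A1": "Widgets", "B2": "Gadgets"}
    return {**record, "category": reference.get(record["sku"], "Unknown")}

def pipe(records: Iterable[dict], filters: List[Filter]) -> List[dict]:
    # The "pipe": route each record through each filter in order.
    out = []
    for record in records:
        for f in filters:
            record = f(record)
        out.append(record)
    return out

orders = [{"sku": "A1", "qty": "3"}, {"sku": "B2", "qty": "5"}]
result = pipe(orders, [normalize_quantity, enrich_with_reference_data])
```

In FSS-19 the "pipes" are files routed by Workflow Language; in a serverless rendering they become state transitions, but the shape of the solution is the same: small focused transformations composed by an orchestrator.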

The FSS-19 legacy system could be further classified as "Files & Filters": modular, deployable units of tightly focused code process specific datasets in accordance with their associated business processes. Each is managed by rich orchestration and handles both heterogeneous and homogeneous input.
While the technology of FSS-19 lacks scalability and elasticity, the model or pattern has allowed the legacy system to support GSA's needs for over 40 years. If this model had not aligned with those needs, it would have forced a change much earlier in its lifetime.
Problem Statement
Given the above context, two problem statements came to the fore:
* How can an orchestrational system, like FSS-19, best be implemented in the Cloud?
* How can a responsive system be built that can
  * instantaneously handle requests,
  * minimize resource consumption, and
  * be composed of programs focused almost exclusively on business logic implementation?
Considerations
Since the nature of FALCON is one of orchestration, it was important to look at the cloud and find the best way to implement an "orchestrational" system. True, the entire system could have been deployed on a few virtual machines and some queuing software, but that would have been backward-facing rather than preparing for the future. Conversely, many parts of a traditional "servered" system can be rendered using serverless cloud services. Generally, it boils down to which components need full-time, sustained computing resources and which parts of a solution are better aligned with ephemeral resources. In an organization that prides itself on "leaning forward", cloud-native services such as Lambda, Step Functions and Events offered FALCON huge advantages over a classic three-tier approach.
- They closely matched the current time-tested approach of FSS-19
- They were serverless which meant that almost 100% of FALCON code would implement business requirements
- FALCON needed to be instantaneously scalable
- Our objective was that only programming-bug-related issues would impact system reliability
- Disaster recovery needed to be very straightforward and quick
- We wanted a very limited “attack surface” for security challenges
- We anticipate seamless performance and reliability bumps due to background CSP service improvements
- The “only pay for what you use” cloud model would be optimized, improving our sustainability and cost profile over a conventional implementation.
- FALCON would be well positioned to participate in GSA’s future event-driven enterprise.
Well-Architected Framework (WAF) Considerations
The FALCON team kept the AWS Well-Architected Framework (WAF) in mind when considering what the modernization of FSS-19 would look like. The six pillars of the framework are Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. These concepts are not just boilerplate industry buzzwords, but living values that each significantly impacted the decision to use Serverless technologies and concepts as the core of the FALCON solution design.
AWS Serverless services satisfy Operational Excellence by having high availability baked into the very substance of most services. Combining these capabilities with a disciplined practice of defining all infrastructure as code (IaC) provides an easy means of supporting disaster recovery.
The nature of a system composed of Serverless managed services supports the Security pillar by reducing the attack surface available within the system. Resources are provisioned only as needed, often with only automatically generated and managed access credentials (meaning credentials are never available to a human). Following best practices removes almost all of the resources from a system's security asset catalog, as the resources are ephemeral, inaccessible, and managed within the CSP's security boundary.
The nature of Serverless services offered by cloud providers moves the responsibility for the servers away from the tenant and to the cloud provider. The managed auto-scaling capabilities that result from this move support a number of the pillars. Auto-scaling for managed Serverless services supports the Reliability pillar by removing the need for the tenant to manage the servers or deal with patching; patching happens automatically, eliminating service windows and struggles with patch incompatibility. These same auto-scaling mechanisms also support the Performance Efficiency pillar (especially in the case of FaaS) by automatically providing additional resources, in the form of horizontal scaling, for exactly those portions of the system that need them, and only when they need them. Multiple options are provided to allow the tenant to match resource scaling and idling behaviors to their needs, thereby eliminating the resource latency issues that such managed services once had.
The auto-scaling of managed Serverless services also allows tenants to take advantage of cost reductions at a fine-grained level; with the advent of Serverless, we can truly realize the Pay-as-You-Go cost model.
AWS recently added the Sustainability pillar to their Well-Architected Framework. This new pillar advocates for tuning service provisioning to fit its purpose: not too big and not too small, but just right, thereby eliminating the unnecessary consumption of resources. Since serverless services can nimbly scale up and down, sustainability is built into serverless solutions. This pillar aligns with the goals set forth in the GSA Strategic Plan 2022-2026.

Current FALCON Architecture
The current architecture uses only serverless components. For readers interested in more detail, each of these components is outlined in the FAS IT Playbook.

Abstractly, the concept of focused deployable units with rich orchestration described previously maps very cleanly to Lambda and Step Functions. So the architecture:
- Retains the ability to maintain individual focused units with little to no impact on other units
- Retains the ability to deploy those units individually without having to redeploy other units
- Rich orchestration allows for the categorization (“filtering”) of input and redirection of categorized (e.g. “sorted”) records to specific areas of the system for processing
- Orchestration also provides the ability to create new "paths" through the system to handle new cases, by providing the flexibility to reorder and reconfigure existing components and "splice in" new ones
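As an illustration, this kind of categorize-and-route orchestration maps directly onto a Step Functions Choice state. The state names, Lambda ARNs, and category values below are hypothetical placeholders, not FALCON's actual definition:

```python
import json

# Hypothetical Amazon States Language (ASL) definition, expressed as a Python
# dict so it can be generated and validated in code before deployment as IaC.
state_machine = {
    "Comment": "Categorize ('sort') incoming records, then route them",
    "StartAt": "CategorizeInput",
    "States": {
        "CategorizeInput": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:categorize",
            "Next": "RouteByCategory",
        },
        "RouteByCategory": {
            "Type": "Choice",
            "Choices": [
                # New "paths" through the system are spliced in by adding choices.
                {"Variable": "$.category", "StringEquals": "order",
                 "Next": "ProcessOrder"},
                {"Variable": "$.category", "StringEquals": "return",
                 "Next": "ProcessReturn"},
            ],
            "Default": "HandleUnknown",
        },
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            "End": True,
        },
        "ProcessReturn": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-return",
            "End": True,
        },
        "HandleUnknown": {
            "Type": "Fail",
            "Error": "UnknownCategory",
            "Cause": "Record did not match any known path",
        },
    },
}

definition_json = json.dumps(state_machine, indent=2)
```

Because each Task state points at an individually deployable Lambda function, a single unit can be modified and redeployed without touching the others, and adding a new processing path changes only the orchestration definition.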

It is true that the system could have used conventional services like virtual servers or containers, with RESTful endpoints providing the API endpoints and running applications providing the “back-end” functionality. All of this has been available in the cloud for more than a decade now. However, the similarity that the serverless computing paradigm has to the legacy system coupled with the ephemeral nature of the target workloads made the serverless decision rather straightforward. Also, we see the AWS Cloud moving very quickly in the direction of serverless services.
Function as a Service (FaaS)
- Resource elasticity (scalability up and down) at the granularity of the function, allowing compute resources to be scaled only for those functions that need it
- This fits our usage pattern, as our inputs are quite “bursty”
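A FaaS unit is just a handler function; everything around it (servers, scaling, patching) belongs to the CSP. A minimal, hypothetical Lambda-style handler (field names are illustrative):

```python
def handler(event, context):
    # Each invocation handles one batch; under bursty input, the provider runs
    # as many concurrent copies of this function as needed, then scales back
    # to zero, so nearly all of this code is business logic.
    records = event.get("records", [])
    processed = [{"id": r["id"], "status": "processed"} for r in records]
    return {"count": len(processed), "records": processed}
```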

Additional benefits
- Serverless means we don't have downtime to support server patching, or even AMI/container image patching
- Little to no effort is needed to manage resource scalability -- it's handled entirely automatically by the service provider (AWS), which allows developers to focus on value development, not infrastructure.
- Costs are kept low, not only because of fine-grained control of resource utilization, but because of the economies of scale that using Serverless and FaaS services from AWS provides.
- Serverless means no servers; reduced attack surface aids in improving security
- IaC combined with Serverless services means that DR is simplified
- AWS Serverless offerings have native compatibility with AWS resource and security monitoring tools and services
Future-proofing
In the not too distant future, events and software agents (e.g. RPA) will animate the GSA enterprise. Events will be raised and listening agents will be able to react to them as necessary. This means that many systems might be able to react to enterprise or division level system events without manual notification, timed CRON jobs or conditional polling agents. Since FALCON is in the middle of several critical systems, its architecture anticipates this coming paradigm and will be ready to participate seamlessly in an event-driven enterprise.
Although the upstream and downstream systems are our limiters for changing from scheduled processing to event-driven processing, FALCON is positioned for this change:
- Lambda and Step Functions technology already relies upon an event-driven model
- Internally, FALCON is already event-driven; only the inputs are scheduled (in accordance with our upstream and downstream partner systems)
- Communications between systems can easily be converted to event-driven actions as soon as partner systems are ready
Other technologies that we are fully aware of and anticipating are the Internet of Things (IoT), Artificial Intelligence (AI), and Machine Learning (ML). All of these will be very important in Supply Chain solutions and are advancing rapidly.