Microservices enable teams to build and deploy software independently, and that independence brings many benefits.
But as microservices develop independently, so too does each team's idea of "what the software does." I'll share an example: OAuth2 has the concept of "scopes," and applications can choose how to use them. Scopes tend not to scale well precisely because of that freedom: each application develops its own use of scopes, and the same scope ends up meaning different things to different microservices. This causes problems downstream, but the bigger point is that each service's idea of "how to use scopes" drifts apart. These small inconsistencies make systems difficult to change all at once.
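To make that drift concrete, here is a minimal, hypothetical sketch in Go (the `orders:write` scope and the two services are invented for illustration, not taken from any real system): both services receive the same token, but disagree about what it permits.

```go
package main

import (
	"fmt"
	"strings"
)

// hasScope checks a space-delimited OAuth2 scope string (RFC 6749 §3.3).
func hasScope(granted, want string) bool {
	for _, s := range strings.Fields(granted) {
		if s == want {
			return true
		}
	}
	return false
}

// Service A's reading of "orders:write": creating orders only.
// Refunds require a dedicated scope.
func serviceAAllowsRefund(scopes string) bool {
	return hasScope(scopes, "orders:refund")
}

// Service B's reading of "orders:write": create, cancel, AND refund.
// This is the silent semantic drift described above.
func serviceBAllowsRefund(scopes string) bool {
	return hasScope(scopes, "orders:write")
}

func main() {
	token := "orders:write" // the same token, presented to both services
	fmt.Println("service A allows refund:", serviceAAllowsRefund(token)) // false
	fmt.Println("service B allows refund:", serviceBAllowsRefund(token)) // true
}
```

Neither service is "wrong" in isolation; the inconsistency only becomes visible when you try to reason about the system as a whole.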
Then there is legacy software. Legacy systems were built for reasons the organization may or may not care about today. But clients still use them, and some critical bits of business logic are rattling around in there. At the same time, the legacy systems do not reflect where things are going.
When dealing with microservices and legacy systems, how do you create a consistent external-facing look and feel? Does pagination work the same way everywhere? Do error messages share a consistent format across endpoints? Do the endpoints have consistent documentation?
We built a system that handled all of the "API things," enabling development teams to do the rest. The API things are the cross-cutting concerns above: routing requests to the right host, authentication, pagination conventions, error formats, and documentation.
Getting every microservice to individually adopt the standards and practices set by the org is a level of effort that simply cannot scale.
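As one concrete example of an "API thing," consider a consistent error format. A minimal sketch, assuming a shared error envelope enforced at the edge (the field names and the `order_not_found` code are our illustration, not a production schema):

```go
package main

import (
	"encoding/json"
	"net/http"
)

// APIError is a single error envelope returned by every endpoint,
// regardless of which microservice or legacy system served the request.
type APIError struct {
	Code      string `json:"code"`       // stable, machine-readable error code
	Message   string `json:"message"`    // human-readable description
	RequestID string `json:"request_id"` // correlates logs across services
}

// writeError emits the shared envelope with a consistent status and content type.
func writeError(w http.ResponseWriter, status int, e APIError) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(e)
}

func main() {
	http.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
		writeError(w, http.StatusNotFound, APIError{
			Code:      "order_not_found",
			Message:   "no order matches the given ID",
			RequestID: r.Header.Get("X-Request-Id"),
		})
	})
	http.ListenAndServe(":8080", nil)
}
```

Because the envelope lives in one place, every endpoint, legacy or new, returns errors that clients can parse the same way.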
What if one system handled all of those things?
But how do you onboard microservices onto it?
We stepped back and looked at the system from a conceptual point of view.
By thinking about the "bigger picture" and business impact of an "API Product," we realized that we could reduce complexity and increase the speed of development.
What makes operating an API product hard?
To address these challenges, we broke the system into two components. Taking inspiration from Kubernetes and networking equipment, we defined a "control plane" and a "data plane."
The control plane handles specific events; when a set of predefined conditions is met, it triggers changes in the data plane.
The data plane handles routing from the internet to the APIs. It includes firewalls, bot detection, caching, monitoring and alerting, authentication and authorization, and proxying to the correct underlying host.
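A stripped-down sketch of the data plane's proxying job, using Go's standard library (the upstream host name is a placeholder, and the real deployment layered on the firewalls, caching, and monitoring listed above):

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// authenticate is a stand-in for the real authn/authz layer; the other
// data-plane concerns (caching, bot detection, monitoring) are omitted.
func authenticate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Hypothetical upstream; the real routing table was configuration-driven
	// and chose the correct underlying host per request.
	upstream, err := url.Parse("http://orders.internal:8080")
	if err != nil {
		panic(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// The data plane wraps the proxy in cross-cutting middleware.
	http.ListenAndServe(":8080", authenticate(proxy))
}
```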
Most of the time, data-plane concerns get handled using infrastructure-as-code, where an engineer updates code files and triggers infrastructure changes. This is great, and it is the solution that works for most use cases.
However, when configuration changes depend on business events, waiting for engineers to make them can take a long time. To expedite this, we decided to handle infrastructure change events in code, as sketched below.
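A minimal sketch of that idea (the event shape, condition, and route naming are invented for illustration): the control plane consumes a business event and, once the predefined conditions hold, applies the data-plane change itself rather than waiting on a manual infrastructure update.

```go
package main

import "fmt"

// APIReleased is a hypothetical business event: a team has published
// a new API version and its documentation has been approved.
type APIReleased struct {
	Service      string
	Version      string
	DocsApproved bool
}

// DataPlane abstracts whatever applies routing configuration (an IaC
// pipeline, a gateway API, etc.); the real one is infrastructure-specific.
type DataPlane interface {
	AddRoute(path, backend string) error
}

// handle is the control-plane loop body: check the predefined conditions,
// then trigger the data-plane change.
func handle(ev APIReleased, dp DataPlane) error {
	if !ev.DocsApproved {
		return fmt.Errorf("%s %s: docs not approved, route not published", ev.Service, ev.Version)
	}
	path := "/" + ev.Service + "/" + ev.Version
	return dp.AddRoute(path, ev.Service+".internal")
}

// logDataPlane is a stub that just prints the change it would apply.
type logDataPlane struct{}

func (logDataPlane) AddRoute(path, backend string) error {
	fmt.Printf("routing %s -> %s\n", path, backend)
	return nil
}

func main() {
	ev := APIReleased{Service: "orders", Version: "v2", DocsApproved: true}
	if err := handle(ev, logDataPlane{}); err != nil {
		fmt.Println(err)
	}
}
```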
As a result of these decisions, we ended up with a system in which legacy and nascent services alike could consistently and reliably expose API endpoints. Using async events, other stakeholders were able to review API documentation prior to release.
OpenTelemetry provides a vendor-agnostic approach for collecting operational data.
Using OpenTelemetry requires a "collector," which receives telemetry data from applications. We were asked to provide a low-config, universal way for every application to submit telemetry data.
The OpenTelemetry documentation calls this pattern a "Gateway Deployment."
(Diagram: applications sending telemetry to a central gateway collector. Image credit: OpenTelemetry.)
We deployed an internally available Kubernetes service and configured a private domain name specific to the collector. The domain resolved across our AWS and GCP cloud networks.
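From an application's point of view, adopting the gateway collector was a few lines of SDK setup. A sketch using the OpenTelemetry Go SDK (the domain name below is a placeholder for the private collector DNS name):

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Export OTLP over gRPC to the shared gateway collector. The DNS name
	// is a placeholder; ours resolved across both AWS and GCP networks.
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("otel-collector.internal.example:4317"),
		otlptracegrpc.WithInsecure(), // internal network; use TLS if required
	)
	if err != nil {
		log.Fatalf("create exporter: %v", err)
	}

	// Batch spans before sending; register globally so instrumentation
	// libraries pick up the provider automatically.
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	defer tp.Shutdown(ctx)
	otel.SetTracerProvider(tp)

	// Any span created from here on flows to the gateway collector.
	_, span := otel.Tracer("example").Start(ctx, "startup")
	span.End()
}
```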
This gave us a consistent solution that worked for every application in the ecosystem: each service submitted telemetry to one well-known, cross-cloud collector endpoint.
We provided teams with comprehensive documentation on submitting data via OpenTelemetry, along with example dashboards of the resulting telemetry.
Despite the best efforts of countless professionals, sometimes things slip through the cracks. While we were replacing a legacy identity service, a dependent service discovered that it relied on an "add user by email" function. This was a problem because the new system was "sign-up-and-enroll": we needed users to enroll themselves, rather than "get added," so that each user consented to joining the system.
But the dependent service had a valid use case: it had business logic that needed to execute prior to the user's login.
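We resolved the tension by turning "add user by email" into an invitation that defers the dependent service's logic until the user enrolls and consents. A minimal sketch, with invented names and an in-memory map standing in for durable storage:

```go
package main

import "fmt"

// Invitation is a hypothetical record: the dependent service may no longer
// "add user by email," but it can register work to run once the user
// enrolls themselves.
type Invitation struct {
	Email      string
	OnEnrolled func(userID string) // pre-login business logic, deferred
}

// pending is a stand-in for a durable store keyed by email.
var pending = map[string]Invitation{}

// Invite is what the dependent service calls instead of add-user-by-email.
func Invite(email string, onEnrolled func(string)) {
	pending[email] = Invitation{Email: email, OnEnrolled: onEnrolled}
}

// Enroll is called by the identity system after the user signs up and
// consents; only then does the deferred business logic execute.
func Enroll(email, userID string) {
	if inv, ok := pending[email]; ok {
		inv.OnEnrolled(userID)
		delete(pending, email)
	}
}

func main() {
	Invite("person@example.com", func(userID string) {
		fmt.Println("provisioning account resources for", userID)
	})
	// ...later, the user enrolls themselves and consents:
	Enroll("person@example.com", "user-123")
}
```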
By capturing these business requirements in a single, isolated system, we minimized the integration changes needed by ecosystem services. At Corewood, we always focus on how our choices impact not just the current system, but also the options for future development.