Architecture Nugget - December 5, 2024
Exploring Control vs Data Planes, Back Pressure Strategies, and Event-Driven Systems
Hi There!
Before we dive into this week's nuggets, I’d love to hear from you. I’m thinking about putting together a special deep edition of Architecture Nugget on a particular topic. If there’s a niche area you’d like to explore further, let me know! Your input will help me shape some exciting future content.
Which topic should I dive into for a special edition of Architecture Nugget?
Now let’s get some nuggets!
P.S. Some of the nuggets in this edition might be a bit vintage, but don't worry—they've aged like fine wine and are still as fresh as ever 🍷
When you’re building distributed systems, one of the big puzzles is getting the block diagram just right. You gotta figure out what parts you need and how they’ll chat with each other. It’s super important to nail this early on because changing it later can cost a lot.
A pattern that keeps showing up is splitting stuff into control planes and data planes. Even in systems that seem like one big piece, you’ll see this split. The data plane is all about handling the real requests and needs to be up all the time—think about storage and load balancing. These parts grow with the number of requests (O(N)). The control plane, though, takes care of things like handling failures, scaling, and deployments. It can actually go down for a bit without anyone noticing, and it scales in a different way.
This split helps keep things from getting too complicated. Take chain replication systems, for example—the data plane deals with the actual replication chain, focusing on performance and throughput, while the control plane (often called the master) handles failure detection and recovery using something like Paxos.
But it’s not always a simple split. Sometimes control planes need their own control planes as systems grow. Plus, you might have several specialized control planes dealing with different things like fault tolerance, scaling, and provisioning.
The main takeaway? While the control plane/data plane split isn’t a strict rule, it’s a great way to:
Cut down complexity with clear APIs
Give parts clear jobs
Use the right tools for different tasks
This pattern helps make distributed systems more maintainable and scalable, even if the separation isn’t always super clear.
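To make the split concrete, here's a minimal sketch (the node names and the round-robin policy are made up for illustration): the data plane keeps serving requests on its hot path, while the control plane reconfigures the replica set out of band when it detects a failure.

```python
class DataPlane:
    """Serves every request; must stay up. Scales O(N) with traffic."""
    def __init__(self):
        self.replicas = ["node-a", "node-b"]  # hypothetical storage nodes
        self._idx = 0

    def handle_request(self, payload):
        # Round-robin across whatever replica set the control plane gave us.
        replica = self.replicas[self._idx % len(self.replicas)]
        self._idx += 1
        return f"{replica} stored {payload}"


class ControlPlane:
    """Handles failure detection and scaling; can be briefly down
    without interrupting the data plane."""
    def __init__(self, data_plane):
        self.data_plane = data_plane

    def reconcile(self, healthy_nodes):
        # Out-of-band reconfiguration: swap in the new replica set.
        self.data_plane.replicas = list(healthy_nodes)


dp = DataPlane()
cp = ControlPlane(dp)
print(dp.handle_request("x"))       # served by node-a
cp.reconcile(["node-b", "node-c"])  # node-a failed; control plane reacts
print(dp.handle_request("y"))       # traffic continues on the new set
```

Notice the data plane never blocks on the control plane: requests keep flowing even while the replica set is being reconciled.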
For more on the theory and real-world examples, check out the original article “Control Planes vs Data Planes”—it’s got some cool insights about control theory and practical system design that I didn’t cover here.
When systems get more requests than they can handle, they often crash or slow down a lot. That’s not good for anyone, right? Let’s chat about a smart solution called “back pressure” that helps manage system overloads.
Think of it like this—your system has a max processing capacity based on thread pools and how long transactions take. When requests go over this limit, you’ve got to do something clever instead of letting everything fall apart.
The key parts work together like this:
Gateway layer handles protocol translation and security
Bounded input queues stop system overload
Thread pools process transactions at the max sustainable rate
Back pressure mechanism rejects extra requests
Here’s a handy formula for queue setup:
max_latency = (transaction_time / thread_count) * queue_length
queue_length = max_latency / (transaction_time / thread_count)
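Plugging in some hypothetical numbers (10 worker threads, 100 ms per transaction, a 2-second latency budget) shows how the formula bounds the queue:

```python
# Hypothetical numbers: 10 worker threads, 100 ms per transaction,
# and a 2-second worst-case latency budget.
transaction_time = 0.1   # seconds per transaction
thread_count = 10
max_latency = 2.0        # seconds a request may wait end to end

# Each queue slot adds transaction_time / thread_count of wait time,
# so the latency budget caps the queue length.
queue_length = max_latency / (transaction_time / thread_count)
print(queue_length)  # 200.0 -> bound the queue at 200 entries
```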
When the input queue fills up, the system applies back pressure upstream by:
Blocking the threads that receive network packets
Letting TCP network buffers fill up
Having the gateway return HTTP 503 to clients
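The rejection path is easy to sketch with a bounded queue. This is a toy gateway (the status codes and queue size are made up for illustration) that rejects immediately when the queue is full instead of letting latency grow without bound:

```python
import queue

# Hypothetical gateway: a bounded queue in front of the worker pool.
# When the queue is full, reject immediately (the HTTP 503 path).
requests = queue.Queue(maxsize=3)

def accept(request):
    try:
        requests.put_nowait(request)
        return 202          # accepted for processing
    except queue.Full:
        return 503          # back pressure: server busy, retry later

codes = [accept(f"req-{i}") for i in range(5)]
print(codes)  # [202, 202, 202, 503, 503]
```

The fast 503 is the whole point: clients learn instantly that they should back off, instead of timing out against a drowning server.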
Some practical tips:
Keep an eye on queue lengths (alert at 70% full)
Watch transaction processing times
Consider the whole service flow
Use bounded queues to keep QoS
Implement meaningful “server busy” responses
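That 70%-full alert is a one-liner; here's a hypothetical health check along those lines:

```python
# Hypothetical queue-depth alert: fire before the queue is actually
# full, so operators see trouble before clients see 503s.
def queue_alert(depth, capacity, threshold=0.7):
    """Return True when the bounded queue is dangerously full."""
    return depth / capacity >= threshold

print(queue_alert(140, 200))  # True: 70% of a 200-slot queue
print(queue_alert(100, 200))  # False: still healthy
```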
By using back pressure, your system keeps the best possible throughput while maintaining good response times for accepted requests. It’s way better than letting everything crash, right?
If you’re interested in more about system design patterns and queue setups, check out the original article “Applying Back Pressure When Overloaded”—it’s got some great details about synchronous designs and Linux memory management that I didn’t cover here.
Data security in generative AI apps is a bit tricky, especially with sensitive info like PII, PHI, and intellectual property. The main issue is that once data goes into an LLM through training, fine-tuning, or RAG, traditional authorization methods don’t work well.
Here’s how to set up proper data authorization in generative AI systems:
The key is using secure side channels for authorization instead of relying on the LLM itself. For example, with Amazon Bedrock Agents, you can pass session attributes with identity info:
{
  "inputText": "Get patient details",
  "sessionAttributes": {
    "userJWT": "eyJhbGciOiJIUzI1NiIsIn...",
    "username": "John Doe",
    "role": "receptionist"
  }
}
Core principles:
Don’t let LLMs make authorization decisions
Use secure side channels to pass identity context
Do authorization checks in backend services
Filter data before it gets to the LLM
Apply data governance and classification
To avoid the confused deputy problem, authorization should happen at the app level, not the LLM level. The app should check user permissions before allowing access to sensitive data through RAG or other methods.
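Here's a minimal sketch of that idea, with made-up roles, fields, and a fake patient record: the app filters data by role before anything reaches the LLM, so the model can't leak fields it never saw.

```python
# Hypothetical role-to-field permissions; in a real system this would
# come from your authorization service, not a hardcoded dict.
ROLE_FIELDS = {
    "receptionist": {"name", "appointment"},           # no clinical data
    "doctor": {"name", "appointment", "diagnosis"},
}

RECORD = {"name": "John Doe", "appointment": "2024-12-05", "diagnosis": "flu"}

def fetch_for_llm(record, role):
    """Filter data *before* it reaches the LLM. The model never sees
    fields the caller isn't entitled to, so it can't be tricked into
    revealing them -- the app, not the LLM, makes the decision."""
    allowed = ROLE_FIELDS.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

print(fetch_for_llm(RECORD, "receptionist"))  # no "diagnosis" key
```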
For those wanting to dive deeper into implementation details, especially around Amazon Bedrock Agents and handling complex authorization scenarios, I’d recommend checking out the full article which has lots of examples and architectural patterns.
Traditional request-response setups often struggle with scaling, tight coupling, and performance bottlenecks as systems get more complex. Event-Driven Architecture (EDA) offers a great solution by enabling asynchronous communication through events, making systems more flexible and scalable.
Here’s how EDA works: Instead of direct service-to-service communication, components interact through events. The architecture has three main parts:
Event Producers: Create events based on state changes or actions
Event Channels: Move events between producers and consumers
Event Consumers: Handle events and do the related actions
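The three parts can be sketched in-process, with a plain queue standing in for the event channel (Kafka or RabbitMQ in a real system); event names here are made up:

```python
import queue

# Event Channel: an in-process queue standing in for Kafka/RabbitMQ.
channel = queue.Queue()

# Event Producer: creates events based on state changes or actions.
def produce(event_type, payload):
    channel.put({"type": event_type, "payload": payload})

handled = []

# Event Consumer: handles events and performs the related actions.
def consume():
    while not channel.empty():
        event = channel.get()
        handled.append(f"handled {event['type']}")

produce("order_created", {"id": 1})
produce("order_shipped", {"id": 1})
consume()
print(handled)  # ['handled order_created', 'handled order_shipped']
```

The producer never calls the consumer directly; that indirection through the channel is what buys you the loose coupling.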
Popular tech for EDA includes:
Apache Kafka: Handles big data streams with high throughput
RabbitMQ: Focuses on message routing and durability
AWS SNS/SQS: Managed messaging services in AWS
The architecture supports various patterns:
Event Sourcing: Saves state changes as event sequences
CQRS: Separates read and write operations
Event Choreography: Coordinates workflow steps through events
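Event Sourcing in particular fits in a few lines. With a made-up bank-account example: state is never stored directly, only the event sequence, and the current state is a replay (a fold) over those events:

```python
# Event Sourcing in miniature: the event log is the source of truth.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def replay(events):
    """Rebuild current state by folding over the stored events."""
    balance = 0
    for e in events:
        if e["type"] == "deposited":
            balance += e["amount"]
        elif e["type"] == "withdrawn":
            balance -= e["amount"]
    return balance

print(replay(events))  # 75
```

Because the log is append-only, you also get an audit trail and time travel (replay a prefix of the log) for free.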
Challenges include keeping event consistency, debugging across services, and managing schema changes. Best practices suggest:
Making event handlers idempotent
Using correlation IDs for tracing
Implementing strong error handling with retries
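All three practices fit in one small sketch (the event shape and IDs are hypothetical): the handler dedupes on event ID so redelivery is a no-op, every event carries a correlation ID for tracing, and delivery retries on failure.

```python
processed_ids = set()
log = []

def handle(event):
    # Idempotent: a redelivered event is silently skipped.
    if event["id"] in processed_ids:
        return
    processed_ids.add(event["id"])
    # Correlation ID ties this log line back to the originating request.
    log.append(f"[{event['correlation_id']}] handled {event['id']}")

def deliver_with_retry(event, attempts=3):
    for _ in range(attempts):
        try:
            handle(event)
            return True
        except Exception:
            continue  # simplistic retry; add backoff/dead-lettering in practice
    return False

evt = {"id": "evt-1", "correlation_id": "req-42"}
deliver_with_retry(evt)
deliver_with_retry(evt)  # duplicate delivery -- handled exactly once
print(log)               # ['[req-42] handled evt-1']
```

Idempotency is what makes the retries safe: at-least-once delivery plus an idempotent handler behaves like exactly-once processing.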
Companies like Netflix and Uber have successfully used EDA at scale, processing millions of events per second while keeping systems reliable and fast.
If you’re keen to dive deeper into implementation details and real-world case studies, check out “Embracing Event-Driven Architecture: Building Responsive and Scalable Applications” for more on advanced patterns and practical examples.
Last but not least, if you find this useful, please share it with your friends and colleagues and let’s grow the community together.