
Microservices patterns: synchronous vs asynchronous communication

From function calls to distributed transactions, what does it take to make two services communicate in a microservices architecture?

by Grégoire Mielle. Last updated on May 18, 2021

Table of contents

Data contracts & Network protocols: where do we start?
Synchronous communication
Asynchronous communication
Synchronous vs asynchronous: when to choose one over the other?
Wrapping up
References

While calling a service to send an email may seem trivial within a monolith, it can become a real headache in a distributed architecture.

Indeed, within a monolithic application, calling a service boils down to an in-process function call, which is incredibly fast and (almost) always succeeds. Distributed queries & commands, on the other hand, can fail because of a failure in the remote process or in the connection, which can increase both errors and latency.

When adopting a distributed architecture, we want to keep the same level of resiliency and performance while benefiting from its advantages:

Independent teams & products
Scoped testability & deployability
Technical decoupling

So how do we make services talk to each other effectively?

Data contracts & Network protocols: where do we start?

There are lots of ways to make two separate workloads communicate over a network. A simple way to frame the options is to compare synchronous and asynchronous communication mechanisms and see where each works best.

Synchronous communication

Synchronous communication is the most straightforward solution when trying to make services communicate. Like a phone call, the client sends a request and waits for a response to come back.

A synchronous request is considered blocking: the response is needed for the process to continue. If you don't answer the phone, the person calling you will not be able to continue.

Most synchronous communication technologies are built on top of HTTP, for example REST, gRPC and GraphQL.

REST: Everything is a resource

Using REST, services expose resources on dedicated endpoints, and the HTTP verb of a request tells which action to perform on them. Information is typically transported as JSON, which means each request's body has to be serialised & deserialised.

HTTP verbs (GET, POST, PATCH, DELETE, etc.)
Permissive JSON payloads & responses
Serialisation & Deserialisation
A new TCP handshake for each request unless connections are kept alive or HTTP/2 is used
Hard to fully adhere to all principles: too strict for most apps
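
To make the resource idea concrete, here is a minimal sketch using only Python's standard library. The /users/<id> endpoint, the USERS dictionary and the fetch_user helper are assumptions made up for this illustration, not part of any real service. It shows a GET on a resource, the JSON serialisation on both sides, and the fact that the caller blocks until the response arrives.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data owned by a "user service".
USERS = {"42": {"id": "42", "name": "Ada"}}


class UserHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /users/<id> reads the resource identified by the URL.
        parts = self.path.strip("/").split("/")
        user = USERS.get(parts[1]) if len(parts) == 2 and parts[0] == "users" else None
        status = 200 if user else 404
        body = json.dumps(user or {"error": "not found"}).encode("utf-8")  # serialisation
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_user(host, port, user_id):
    # Synchronous call: this function does not return until the response
    # has arrived or the 2 second timeout has expired.
    connection = http.client.HTTPConnection(host, port, timeout=2)
    try:
        connection.request("GET", f"/users/{user_id}")
        response = connection.getresponse()
        return response.status, json.loads(response.read())  # deserialisation
    finally:
        connection.close()


# Port 0 lets the operating system pick a free port for the demo server.
server = HTTPServer(("127.0.0.1", 0), UserHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status, user = fetch_user("127.0.0.1", server.server_port, "42")
missing_status, _ = fetch_user("127.0.0.1", server.server_port, "7")
print(status, user, missing_status)

server.shutdown()
server.server_close()
```

Each call to fetch_user opens its own connection here, which is the per-request handshake cost mentioned in the list above.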

gRPC: Calling remote functions

Using gRPC, services are defined using Protocol Buffers. Clients can be generated in many languages from other services' protobuf definitions to send & receive strongly typed Protobuf messages.

Strongly typed messages with protocol buffers
Use a single connection for multiple requests with HTTP/2
Multiple languages supported to generate clients

GraphQL: Ask for exactly what you want

Using GraphQL, services expose a graph, which represents the relationships between different data types and lets clients ask for exactly what they want.

Strongly typed graph
Fetch multiple resources in a single request
Introspection
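
The "ask for exactly what you want" idea can be illustrated without any GraphQL library. The sketch below is a toy field selector in plain Python, not a GraphQL implementation: the select function, the ORDER data and the shape of the selection dictionary are all assumptions made up for this example.

```python
def select(data, selection):
    """Keep only the requested fields; a nested dict descends into related objects."""
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        result[field] = select(value, sub_selection) if isinstance(sub_selection, dict) else value
    return result


# Hypothetical data: an order linked to a customer and to its items.
ORDER = {
    "id": "o-1",
    "status": "paid",
    "customer": {"id": "c-1", "name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "A", "qty": 1}, {"sku": "B", "qty": 2}],
}

# The client names the fields it needs, across several related types, in one request.
selection = {"id": True, "customer": {"name": True}, "items": {"sku": True}}
result = select(ORDER, selection)
print(result)
```

A real GraphQL server adds a typed schema, validation and resolvers on top of this idea.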

While synchronous communication is ideal for querying data (returning a result without modifying the system's state), it can have some drawbacks for commands (changing the system's state with possible side effects).

Indeed, when a command involves multiple services performing an action or other services reacting to it (think of a distributed transaction on an e-commerce website which involves the payment, inventory & order services), it can become really hard to keep track of all services to call using synchronous communication. If one call fails, we don't want the whole order to be canceled.
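
The problem can be sketched with three plain functions standing in for remote calls. Everything here (the function names, the ServiceUnavailable exception, the simulated inventory failure) is hypothetical and only meant to show how the caller ends up owning the whole sequence.

```python
class ServiceUnavailable(Exception):
    """Stands in for a timeout or connection error on a remote call."""


def charge_payment(order):
    return "charged"


def reserve_inventory(order):
    # Simulated failure of one service in the chain.
    raise ServiceUnavailable("inventory service timed out")


def create_order(order):
    return "created"


def place_order_sync(order, steps):
    # The caller has to know every service involved and call them one by one.
    completed = []
    for step in steps:
        try:
            step(order)
        except ServiceUnavailable as error:
            # Earlier steps already happened: here the payment has been
            # charged although the order will never be created.
            return {"status": "failed", "completed": completed, "error": str(error)}
        completed.append(step.__name__)
    return {"status": "placed", "completed": completed}


result = place_order_sync({"id": "o-1"}, [charge_payment, reserve_inventory, create_order])
print(result)
```

Adding a fourth service that reacts to an order means changing place_order_sync, which is the coupling described above.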

Asynchronous communication

With asynchronous communication, a middleman is added to our infrastructure between services. Each interaction between services acts like a text message we would receive on our phone. As opposed to a phone call, we don't need to answer it right away:

It avoids temporal coupling (if service A depends on service B, service B being unavailable at the exact moment service A needs it is not an issue)
It acts as a buffer to mitigate spikes (if there are more emails to be sent than usual, they will be sent when the email service can handle them, instead of failing because they cannot be sent immediately)
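
Both properties can be sketched with an in-memory buffer. The Broker class below is a hypothetical stand-in for a real message broker, and the email messages are made up for the example.

```python
from collections import deque


class Broker:
    """Hypothetical in-memory middleman sitting between producers and a consumer."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        # Producers return immediately, whether or not a consumer is running.
        self._messages.append(message)

    def pending(self):
        return len(self._messages)

    def drain(self, handler):
        # Called by the consumer once it is able to work again.
        handled = 0
        while self._messages:
            handler(self._messages.popleft())
            handled += 1
        return handled


broker = Broker()
sent = []

# The email service is down (or overloaded): messages simply accumulate.
for recipient in ["a@example.com", "b@example.com", "c@example.com"]:
    broker.publish({"type": "sendEmail", "to": recipient})
backlog = broker.pending()

# Later, the email service comes back and works through the backlog at its own pace.
handled = broker.drain(lambda message: sent.append(message["to"]))
print(backlog, handled, sent)
```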

There are two fundamentally different ways to work with asynchronous communication: queues & logs.

Queues: RabbitMQ, AWS SQS & more

Queues are a way for services to produce messages that need to be consumed by other services. They act as a transient buffer: once the message has been acknowledged by a consumer, it is removed from the queue, pretty much like a todo list.
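
The acknowledgement step is what makes the todo list safe. The sketch below is a simplified in-memory model, not the API of RabbitMQ or SQS: the AckQueue class and its send, receive, ack and nack methods are names assumed for this example.

```python
from collections import deque


class AckQueue:
    """Hypothetical queue that only forgets a message once it is acknowledged."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}
        self._next_id = 0

    def send(self, body):
        self._ready.append((self._next_id, body))
        self._next_id += 1

    def receive(self):
        if not self._ready:
            return None
        message_id, body = self._ready.popleft()
        self._in_flight[message_id] = body  # delivered, but not yet done
        return message_id, body

    def ack(self, message_id):
        # The consumer finished its work: the message is removed for good.
        del self._in_flight[message_id]

    def nack(self, message_id):
        # The consumer failed: the message goes back to the queue.
        self._ready.append((message_id, self._in_flight.pop(message_id)))

    def __len__(self):
        return len(self._ready) + len(self._in_flight)


queue = AckQueue()
queue.send("welcome email for user 42")

first_id, _ = queue.receive()
queue.nack(first_id)              # first attempt fails, nothing is lost
remaining_after_failure = len(queue)

second_id, body = queue.receive()
queue.ack(second_id)              # second attempt succeeds
remaining_after_ack = len(queue)
print(remaining_after_failure, remaining_after_ack)
```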

This mechanism is particularly interesting when working with an email service for example. With a queue, multiple services can produce sendEmail messages that will be consumed by multiple instances of the email service. If the email service is temporarily unavailable, it will resume sending emails once back online with the remaining messages in the queue.

However, if a message needs to be consumed by different services (e.g. user.created), one queue per consuming service needs to be created, otherwise the services would compete with one another to consume the same message.
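
The difference between a shared queue and one queue per consumer can be sketched as follows. The Fanout class and the service names are hypothetical; real brokers offer this through their own concepts (for instance exchanges and bindings in RabbitMQ).

```python
from collections import deque

# One shared queue: whichever service reads first takes the only copy.
shared = deque(["user.created"])
email_got = shared.popleft() if shared else None
analytics_got = shared.popleft() if shared else None


class Fanout:
    """Hypothetical publisher that copies each message into every bound queue."""

    def __init__(self):
        self._queues = {}

    def bind(self, service_name):
        self._queues[service_name] = deque()
        return self._queues[service_name]

    def publish(self, message):
        for queue in self._queues.values():
            queue.append(message)


fanout = Fanout()
email_queue = fanout.bind("email")
analytics_queue = fanout.bind("analytics")
fanout.publish("user.created")
print(email_got, analytics_got, list(email_queue), list(analytics_queue))
```

With the shared queue the analytics service never sees the message, while with one queue per service each of them gets its own copy.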

Examples of patterns based on queues:

CQRS
Pipe/Filter

Logs: Apache Kafka, AWS Kinesis & more

Logs are a way for multiple services to record that something happened in time & let other services read & process these events.

As opposed to queues, events are not removed once they have been read: they are retained, possibly indefinitely depending on the configured retention. Services can write to the same stream of events (topic) & others can read from it, either from the beginning or from the moment they start listening, pretty much like a plane's flight recorder.

This mechanism is particularly interesting when multiple services care about a shared domain. With logs, an orders service can produce events like order.created, order.edited & order.canceled, letting multiple other services read from a shared topic orders to perform actions like sending an email, updating a database record or deleting a shipping request.
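
A minimal in-memory sketch of this behaviour: an append-only list plays the role of the orders topic, and each reader keeps its own position (offset). The EventLog and Reader classes are assumptions for this example and do not mirror the Kafka or Kinesis client APIs.

```python
class EventLog:
    """Hypothetical append-only topic: events are never removed when read."""

    def __init__(self):
        self._events = []

    def append(self, event_type, payload):
        self._events.append((event_type, payload))
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset):
        return self._events[offset:]

    def end_offset(self):
        return len(self._events)


class Reader:
    """Each consuming service tracks its own position in the log."""

    def __init__(self, log, from_beginning):
        self._log = log
        self.offset = 0 if from_beginning else log.end_offset()

    def poll(self):
        events = self._log.read_from(self.offset)
        self.offset += len(events)
        return events


orders = EventLog()
orders.append("order.created", {"id": "o-1"})
orders.append("order.edited", {"id": "o-1"})

email_reader = Reader(orders, from_beginning=True)      # replays the history
shipping_reader = Reader(orders, from_beginning=False)  # only wants new events

orders.append("order.canceled", {"id": "o-1"})

email_events = [event_type for event_type, _ in email_reader.poll()]
shipping_events = [event_type for event_type, _ in shipping_reader.poll()]
print(email_events, shipping_events, orders.end_offset())
```

Reading does not consume anything: both readers see order.canceled, and the log still holds all three events afterwards.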

Examples of patterns based on logs:

Event-driven architecture
Event-sourcing
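
As an illustration of event-sourcing, the current state of an order can be rebuilt by replaying its events in order. This is a minimal sketch: the event shapes and the apply_event function are assumptions for the example, reusing the event names mentioned above.

```python
from functools import reduce


def apply_event(state, event):
    """Return the new state of an order after one event (no mutation of the old state)."""
    event_type, data = event
    if event_type == "order.created":
        return {"id": data["id"], "items": list(data["items"]), "status": "created"}
    if event_type == "order.edited":
        return {**state, "items": list(data["items"])}
    if event_type == "order.canceled":
        return {**state, "status": "canceled"}
    return state  # unknown events are ignored


# Hypothetical history read from the log, oldest event first.
history = [
    ("order.created", {"id": "o-1", "items": ["book"]}),
    ("order.edited", {"id": "o-1", "items": ["book", "pen"]}),
    ("order.canceled", {"id": "o-1"}),
]

current_state = reduce(apply_event, history, None)
state_before_cancel = reduce(apply_event, history[:2], None)
print(current_state, state_before_cancel)
```

Because the events are kept, any past state can be recomputed by replaying a prefix of the history.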

Synchronous vs asynchronous: when to choose one over the other?

Both types of communication have benefits and drawbacks. Asynchronous communication is hard to get right but offers loose coupling, while synchronous communication implies tighter coupling but is simple to use & debug. It is very common to find both of them in the same application. Here are some common rules you can apply to choose one over the other:

Use synchronous communication if:

The operation is a simple query which does not change any state
The operation result is needed to move forward in the current process
The operation can fail and does not require a complex retry mechanism
The operation needs to be synchronous

Use asynchronous communication if:

The operation involves multiple services reacting to it
The operation must be performed while allowing failures & retries
The operation takes a lot of time
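
These rules can be written down as a small helper, mostly as a checklist to run through for each operation. The Operation fields and the tie-breaking choice (falling back to synchronous communication because it is simpler) are assumptions of this sketch, not rules from a library or a standard.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    changes_state: bool
    result_needed_now: bool
    multiple_reactors: bool = False
    must_survive_failures: bool = False
    long_running: bool = False


def suggest_communication(operation):
    # Collect the rules from both lists that apply to this operation.
    sync_reasons = []
    async_reasons = []
    if not operation.changes_state:
        sync_reasons.append("simple query")
    if operation.result_needed_now:
        sync_reasons.append("result needed to move forward")
    if operation.multiple_reactors:
        async_reasons.append("multiple services react to it")
    if operation.must_survive_failures:
        async_reasons.append("must be performed despite failures & retries")
    if operation.long_running:
        async_reasons.append("takes a lot of time")
    # Assumption: on a tie, prefer the simpler synchronous option.
    if len(async_reasons) > len(sync_reasons):
        return "asynchronous", async_reasons
    return "synchronous", sync_reasons


get_user = Operation(changes_state=False, result_needed_now=True)
place_order = Operation(
    changes_state=True,
    result_needed_now=False,
    multiple_reactors=True,
    must_survive_failures=True,
)
get_user_choice, _ = suggest_communication(get_user)
place_order_choice, place_order_reasons = suggest_communication(place_order)
print(get_user_choice, place_order_choice, place_order_reasons)
```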

Wrapping up

Communication within a microservices architecture is hard to get right from the ground up.

In this context, cloud monitoring & observability play a key role in better understanding existing service interactions & the new potential flaws that come with the distributed nature of your application, like latency or congestion.
