Maybe you’re just getting started with distributed systems and system design, or you just need a quick recap for an upcoming interview. In either case, in this post you’ll find the most common system design concepts and the most important aspects of each. Let’s get started.
When you have a single node (host/server) and 300 users using your app, everything might work flawlessly. When you hit 600 users, that node might no longer be able to keep up with the incoming requests with the hardware it has. At this point, you need to scale, and you have two options.
You can improve the hardware on the same node by doubling the memory or CPU, or whatever you need so that you can accommodate 600 users with a single machine. This is what we call scaling up or vertical scaling. This solves the immediate problem but you cannot keep adding more memory or CPU to the same machine forever. At some point, you’ll hit the max hardware a machine can support.
Another approach is adding new nodes that serve the incoming requests/tasks together, which is called scaling out or horizontal scaling. While this brings a lot of challenges, it allows you to scale almost infinitely, hence it’s the recommended approach for most use cases. In addition to that, doubling the number of nodes is usually cheaper than doubling the hardware in a single node.
One way or another, you’ll need to scale out (scale horizontally) in most parts of your system. When you have multiple nodes, the question becomes how to balance the load between these nodes equally so that you can utilize the nodes/hardware as much as possible.
Let’s say you have a client trying to call OrderService. There are 3 main approaches to achieve that: client-side load balancing, where the client knows the service’s nodes and picks one itself; DNS load balancing, where the DNS record resolves to a different node each time; and server-side load balancing, where a dedicated load balancer sits in front of the nodes and forwards each request to one of them.
With a server-side load balancer, you’re solving the scaling and single-point-of-failure problems of your nodes (app servers), but then the load balancer itself becomes the single point of failure if it’s just a single server. One common approach is having multiple load balancers with DNS load balancing in front of them to make sure a failure in a single load balancer node will not make the whole service unreachable.
There are different techniques to balance the load, such as round robin, random, and least connections. Most of them apply to server-side load balancing, as you don’t have much visibility into or control over what’s going on in the first two approaches.
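As a rough illustration, here is a minimal Python sketch of a few common selection strategies a server-side load balancer might use; the node addresses are made up, and a real load balancer would also track health checks and weights.

```python
import itertools
import random

# Hypothetical pool of OrderService nodes (addresses are illustrative only).
NODES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# Round robin: cycle through the nodes in order, spreading requests evenly.
_round_robin = itertools.cycle(NODES)

def pick_round_robin() -> str:
    return next(_round_robin)

# Least connections: track open connections per node and pick the least busy one.
open_connections = {node: 0 for node in NODES}

def pick_least_connections() -> str:
    return min(open_connections, key=open_connections.get)

# Random: the simplest strategy; often good enough when traffic is high.
def pick_random() -> str:
    return random.choice(NODES)

if __name__ == "__main__":
    for _ in range(5):
        print(pick_round_robin())
```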
If you’re building a simple app, having a single service with multiple APIs to fulfill all your needs is the default choice. However, as the system evolves, each aspect of your app becomes a complicated piece with its own complex business logic, most likely owned by a different team, and you’ll realize it might be better to split each aspect of your business into separate services and define contracts/APIs between them without sharing anything else.
As your application grows, you can break it into different microservices such as entity life-cycle management for orders, Notification, Invoicing, Advertisement, Fulfilment, Analysis and Reporting, and so on.
You decouple different concerns of the business or domain models into their respective services so they can evolve more easily. In a microservice architecture, there are a bunch of services fulfilling one or multiple aspects of the business. They talk to each other to fulfill different flows, and they’re usually aware of each other through a Service Registry (the concept is called Service Discovery) that knows which services exist, what their nodes’ or load balancers’ addresses are, etc.
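To make the idea concrete, here is a minimal, hypothetical service-registry sketch; real registries such as Consul, etcd, or Eureka add health checks, TTLs, and watch/notification mechanisms on top of this basic register/lookup flow.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceRegistry:
    # Maps a service name to the addresses of its healthy nodes (or its load balancer).
    services: dict[str, list[str]] = field(default_factory=dict)

    def register(self, name: str, address: str) -> None:
        self.services.setdefault(name, []).append(address)

    def deregister(self, name: str, address: str) -> None:
        if address in self.services.get(name, []):
            self.services[name].remove(address)

    def lookup(self, name: str) -> list[str]:
        # Callers (or a client-side load balancer) pick one of the returned addresses.
        return self.services.get(name, [])

# Example usage with made-up addresses.
registry = ServiceRegistry()
registry.register("OrderService", "10.0.1.5:8080")
registry.register("OrderService", "10.0.1.6:8080")
registry.register("NotificationService", "10.0.2.9:8080")
print(registry.lookup("OrderService"))
```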
Instead of letting external users/clients call each service directly, a more common approach is adding a Gateway that acts as the bridge between users and microservices. The Gateway can offer REST APIs for users to call and central auth logic to make sure users are authenticated, while microservices can use different protocols (such as RPC) and auth mechanisms for internal communication.
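A rough sketch of that idea follows; the routes, internal addresses, and tokens are all made up, and a real gateway would also handle things like rate limiting, TLS termination, and retries.

```python
# Minimal, hypothetical API gateway: authenticate the caller once, then route the
# request to the right internal microservice.

ROUTES = {
    "/orders": "http://orders.internal",       # illustrative internal addresses
    "/invoices": "http://invoicing.internal",
}

VALID_TOKENS = {"secret-token-123"}  # stand-in for a real auth provider

def handle_request(path: str, token: str) -> str:
    if token not in VALID_TOKENS:
        return "401 Unauthorized"
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            # A real gateway would forward the request to `backend` here.
            return f"200 OK (proxied to {backend}{path})"
    return "404 Not Found"

print(handle_request("/orders/42", "secret-token-123"))  # 200 OK
print(handle_request("/orders/42", "wrong-token"))       # 401 Unauthorized
```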
When it comes to what type of database to use, there is usually a hot conversation, if not a fight, between SQL and NoSQL. At a high level, it’s good to know the main distinction between the two: SQL (relational) databases store structured data with a predefined schema, support joins and complex queries, and typically provide strong ACID guarantees, while NoSQL databases trade some of that for flexible schemas, simpler data models (key-value, document, wide-column, graph), and easier horizontal scaling.
There are also some cases that fit in between; for those, both solutions should work for you. Most DBs use replication and sharding together to achieve better availability, scalability, and other desirable distributed-system properties.
Replication is duplicating data across different nodes, data centers, or regions. It provides better availability (if one replica goes down, another can serve the data), durability, and higher read throughput, and it can reduce latency by serving reads from a replica closer to the user.
Whenever a client writes something, it goes to the leader. Later on, the data is replicated to the followers (green arrows below). The replication process can be synchronous, where the leader waits for the followers to acknowledge the write before responding, or asynchronous, where the leader responds immediately and replicates the data in the background.
While the diagram above shows single-leader replication, please be aware that there are also multi-leader and leaderless replication, each with its own advantages and disadvantages.
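Here is a toy single-leader replication sketch (not how any particular database implements it) contrasting synchronous and asynchronous replication of writes.

```python
# Toy single-leader replication: writes go to the leader, which copies them to
# the followers either synchronously (before acknowledging the client) or
# asynchronously (acknowledge first, replicate later).

class Node:
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}

class Leader(Node):
    def __init__(self, name: str, followers: list[Node], synchronous: bool = True):
        super().__init__(name)
        self.followers = followers
        self.synchronous = synchronous
        self.pending: list[tuple[str, str]] = []  # writes not yet replicated (async mode)

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            # Synchronous: replicate before acknowledging the client, so followers
            # are never behind, at the cost of higher write latency.
            for follower in self.followers:
                follower.data[key] = value
        else:
            # Asynchronous: acknowledge immediately and replicate later, so followers
            # may briefly lag (and data can be lost if the leader dies before flushing).
            self.pending.append((key, value))

    def flush(self) -> None:
        # Background replication step for the asynchronous mode.
        for key, value in self.pending:
            for follower in self.followers:
                follower.data[key] = value
        self.pending.clear()

followers = [Node("follower-1"), Node("follower-2")]
leader = Leader("leader", followers, synchronous=True)
leader.write("user:42", "alice")
print(followers[0].data)  # reads can be served by any follower
```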
Sharding allows you to spread reads/writes evenly across nodes by partitioning the data across multiple nodes. Each shard/partition owns some part of the data (i.e., some rows of a table).
When sharding, one of the important aspects is how to split the data across nodes, i.e., the criteria that decide which data should be owned by which node. There are two common approaches (a small sketch of both follows below):
1. Range-based sharding: You divide the rows into ranges based on the primary key’s value. Each range is assigned to a node. For example, if you have a user table whose primary key is username, you can shard the data based on the first letter of the username:
As data is sorted within the nodes, it’s easier to run range queries like “get all users whose username starts with ar”. On the other hand, it’s hard to split the data across nodes uniformly, as the initial ranges assigned to nodes might become skewed over time based on the incoming data.
2. Hash-based sharding: Hash-based sharding solves the uniform-distribution issue of range-based sharding by sacrificing efficient range queries. It takes the primary key (or the whole row), hashes it, and uses the hash value to decide which node the data goes to.
The hash function consistently produces the same hash for the same data, while even a slight change in the data produces a completely different hash value. Since similar data no longer ends up on the same node as in range-based sharding, the data is distributed more uniformly across nodes. For the same reason, the hash function loses the relationship between the actual value and the node it lives on, so lookups by primary key are still easy but efficient range queries are no longer possible.
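Here is a minimal sketch contrasting the two approaches, with made-up shard names and the username example from above; the range boundaries and hash choice are illustrative.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

# Range-based: usernames are assigned to shards by their first letter, so range
# queries ("all users starting with 'ar'") stay on one shard, but popular ranges
# can overload a single node over time.
def range_shard(username: str) -> str:
    first = username[0].lower()
    if first <= "f":
        return SHARDS[0]
    if first <= "l":
        return SHARDS[1]
    if first <= "r":
        return SHARDS[2]
    return SHARDS[3]

# Hash-based: the hash spreads keys uniformly, but neighboring keys land on
# unrelated shards, so a range query would have to hit every node.
def hash_shard(username: str) -> str:
    digest = hashlib.md5(username.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for name in ["arda", "arthur", "zoe"]:
    print(name, range_shard(name), hash_shard(name))
```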
Every application stores some data one way or another, whether it’s static files in some storage or records in a DB. Your data will grow over time, yet 99% of your app’s use cases will only need a small portion of it. Caching is storing the parts of your data that are needed the most somewhere that is faster to access than the actual storage/DB.
Caching can take many forms. Some examples are the browser cache on the client, a CDN for static files, an in-memory cache such as Redis or Memcached in front of your DB, and application-level caches inside your services.
This convenience comes with a cost: you now have two places to look for the same data, the DB and the cache, and it’s non-trivial to keep them in sync.
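One common way to deal with this is the cache-aside pattern, sketched below; the plain dicts stand in for a real cache (e.g. Redis) and a real DB, and writes invalidate the cached entry so the next read fetches fresh data.

```python
# Cache-aside (lazy loading): read from the cache first, fall back to the DB on a
# miss, and invalidate the cache on writes so the two stay (mostly) in sync.

cache: dict[str, str] = {}
database: dict[str, str] = {"user:42": "alice"}

def get_user(key: str) -> str | None:
    if key in cache:                 # cache hit: fast path
        return cache[key]
    value = database.get(key)        # cache miss: go to the DB
    if value is not None:
        cache[key] = value           # populate the cache for future reads
    return value

def update_user(key: str, value: str) -> None:
    database[key] = value
    cache.pop(key, None)             # invalidate; the next read repopulates the cache

print(get_user("user:42"))   # miss -> DB -> cached
print(get_user("user:42"))   # hit
update_user("user:42", "alice b.")
print(get_user("user:42"))   # miss again after invalidation
```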
Cache storage is relatively more expensive (especially for in-memory caches) than the DB. Hence, the cache only stores the part of your data that is needed the most for your use cases instead of the whole DB. Given that, at some point you’ll hit its limits, and the question becomes which entry to evict from the cache to make room for the new one.
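Common eviction policies include LRU (least recently used), LFU (least frequently used), and FIFO. Below is a tiny LRU sketch built on Python’s OrderedDict; production caches such as Redis offer configurable variants of these policies.

```python
from collections import OrderedDict

# A tiny LRU (least recently used) cache: when full, evict the entry that
# hasn't been accessed for the longest time.
class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict[str, str] = OrderedDict()

    def get(self, key: str) -> str | None:
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: str) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry

lru = LRUCache(capacity=2)
lru.put("a", "1")
lru.put("b", "2")
lru.get("a")           # "a" becomes the most recently used
lru.put("c", "3")      # evicts "b"
print(lru.get("b"))    # None
```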
Most web and mobile apps rely on static files. These can be HTML/CSS/JS files as well as images or videos. In the very old days, these static files were stored on the server’s disk together with the web app. Later on, people started moving to object storage solutions such as AWS S3, Azure Blob Storage, or Google Cloud Storage. This worked much better than storing them next to your web server, as it provided a cleaner separation between the application code that’s accessed through APIs and the asset files, whose size and access patterns are different.
But there was still a problem. This storage, often called the origin server, which stores all your assets, still sits in some data center somewhere in the world, let’s say Seattle. Anyone using the app from the East Coast, Europe, or Asia is also getting these files from Seattle, which adds a lot of latency. That’s where a CDN, a Content Delivery Network, comes into play. It’s nothing but a cache. You still have your object storage (or any other storage) as your origin server, and you distribute CDN servers across the world. Users from Europe no longer need to call your origin server in Seattle; there is a CDN server in their city or country that caches these static files based on the access pattern. Even if the requested file is not on the CDN server, the CDN server will fetch it from the origin server and cache it for upcoming requests.
Breaking the synchronous flow and adding a message queue or Pub/Sub in between is very useful for a variety of use cases. It provides async service communication and helps decouple systems. The two look similar, but they’re quite different, and in some cases they can even be combined.
There are two sides to a message queue. On the left side, one or more producers produce tasks/messages, while on the right side, one or more consumers consume those tasks/messages. Each task needs to be completed by one of the consumers; therefore, unlike the pub/sub pattern, each message is delivered to only one consumer. After a consumer gets a message from the queue and processes it, the message is removed from the queue.
For example, if you have a video processing app, you might want to upload the video, send it to a message queue for later processing, and return success to the user immediately. Now, some consumers, most likely with more CPU power than the ones accepting the request from the user, will start consuming the messages and processing the videos. Each video will be processed by one of the consumers. If users are fine with waiting a day to get the processed video, you can even have way fewer consumer nodes and consume messages at a slower rate to save a ton of money.
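A toy in-process version of that flow using Python’s built-in queue module is shown below; in practice you’d use a managed queue such as RabbitMQ or SQS, and the worker and video names here are made up.

```python
import queue
import threading

# The web tier enqueues a job and returns success right away; worker consumers
# pull jobs off the queue, and each message is processed by exactly one consumer.

jobs = queue.Queue()  # each message is a video id

def handle_upload(video_id: str) -> str:
    jobs.put(video_id)        # hand off the heavy work
    return "202 Accepted"     # respond to the user immediately

def worker(name: str) -> None:
    while True:
        video_id = jobs.get()  # blocks until a message is available
        print(f"{name} processing {video_id}")
        jobs.task_done()       # remove the message once processing succeeds

for i in range(2):
    threading.Thread(target=worker, args=(f"worker-{i}",), daemon=True).start()

for vid in ["video-1", "video-2", "video-3"]:
    print(handle_upload(vid))

jobs.join()  # wait until every queued job has been processed
```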
Message queues are useful for point-to-point communication, but what if you own an entity that is needed by multiple systems? In this case, you have a publisher on the left publishing messages to a topic, and on the right, subscribers that each get all the messages sent to that topic. This is what we call Pub/Sub.
For example, you own the FlightOrderService, and whenever there is a new order, NotificationService needs to notify the customer and TicketService needs to issue the ticket.
You might think you could create 2 message queues, one for each subscriber, and ask FlightOrderService to send the order event to both queues. Even though that works, Pub/Sub is a better architecture for this problem, as more subscribers will likely be added along the way. FlightOrderService publishes a message to the topic, and other services subscribe to the topic. If you publish 3 new orders, NotificationService and TicketService will each get all 3 messages. A subscriber can put the messages coming from the topic into its own message queue for later processing or directly trigger some compute to work on the task immediately.
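Here’s a minimal in-process pub/sub sketch for the flight-order example; the topic name and handlers are illustrative, and real systems (e.g. Kafka or SNS) add persistence, ordering, and delivery guarantees on top.

```python
from collections import defaultdict
from typing import Callable

# Every subscriber of a topic receives every message published to it, unlike a
# message queue where each message goes to exactly one consumer.

subscribers: defaultdict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, message: dict) -> None:
    for handler in subscribers[topic]:
        handler(message)

# Hypothetical handlers for the flight-order example.
def notification_service(order: dict) -> None:
    print(f"NotificationService: emailing customer about order {order['id']}")

def ticket_service(order: dict) -> None:
    print(f"TicketService: issuing ticket for order {order['id']}")

subscribe("flight-orders", notification_service)
subscribe("flight-orders", ticket_service)

publish("flight-orders", {"id": 1})  # both subscribers receive the message
```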
These concepts are the foundations of most systems, distributed or not. It is good to know at least the basics and what they are used for so that you are aware of your options while designing systems. I tried explaining each concept at a high level without going into details, as it would be 15-20 pages otherwise. If you realize there are things you don’t know or need a deeper understanding of, I highly recommend reading more in-depth posts specifically about the concepts you need.