- This is a collection of personal notes from various courses, articles, and books on system design.
- The goal is to share a comprehensive guide to system design that can be used as a reference for interviews and real-world projects.
- The notes are organized by topics and include diagrams, code snippets, and examples to help understand the concepts better.
- The notes are a work in progress and will be updated regularly with new information and resources.
- System Design course on neetcode.io
- System Design for Interviews and Beyond
- Highload Software Architecture on Projector
- Learn System Design in a Hurry
- System Design Simplified on interviewready.io
- Grokking the System Design Interview
- AWS certification
- Helpful list of LeetCode Posts on System Design at Facebook, Google, Amazon, Uber, Microsoft
- My System Design Template
- Latency Numbers Every Programmer Should Know
- Real-time Messaging in Slack
- How Discord Stores Billions of Messages
- How Discord Stores Trillions of Messages
- Meta Onsite System Design Questions
- A Senior Engineer's Guide to the System Design Interview
- System design interview guide for Software Engineers
- ByteByteGo Newsletter
- Netflix’s interview process and questions
- How software engineering behavioral interviews are evaluated at Meta
- How Figma’s databases team lived to tell the scale
- System Design Interview – An insider's Guide by Alex Xu
- Microservices patterns
- Microservices Patterns: With examples in Java by Chris Richardson
- Designing Data-Intensive Applications by Martin Kleppmann
- Distributed Systems 4 by Maarten van Steen
- System Design Interview – An Insider's Guide Volume 2 by Alex Xu, Sahn Lam
- !!! Most Tech Interview Prep is GARBAGE. (From a Principal Engineer at Amazon)
- System Design Interview – An insider's Guide discussion
- Videos System Design for Interviews and Beyond videos
- System design playlist #1 on Youtube
- System design playlist #2 on Youtube
- System design playlist #3 on Youtube
- System design mock interviews
- https://github.com/InterviewReady/system-design-resources
- https://github.com/donnemartin/system-design-primer
- https://github.com/karanpratapsingh/system-design
- https://github.com/charlax/professional-programming
- https://blog.bytebytego.com
- https://newsletter.pragmaticengineer.com
- https://newsletter.systemdesign.one
- https://highscalability.com
- https://engineering.fb.com
- https://www.notion.so/blog/topic/tech
- https://www.notion.so/blog
- https://netflixtechblog.com
- https://stripe.com/blog
- https://medium.com/airbnb-engineering
- https://www.uber.com/en-IN/blog/engineering
- https://medium.com/pinterest-engineering
- https://www.figma.com/blog/engineering
- https://engineering.atspotify.com
- https://slack.engineering
- https://blog.x.com
- https://engineering.ramp.com
- https://www.canva.dev/blog/engineering
- https://stackoverflow.blog/engineering
- https://medium.com/paypal-tech
- https://medium.engineering
- https://dropbox.tech
- https://quoraengineering.quora.com
- https://blog.cloudflare.com
- https://www.akamai.com/blog
- https://www.linkedin.com/blog/engineering
- https://discord.com/category/engineering
- https://www.grammarly.com/blog
- https://eng.lyft.com
- https://medium.com/strava-engineering
- https://medium.com/tinder
- https://www.toptal.com/developers/blog
- https://github.blog/engineering
- https://devblogs.microsoft.com/engineering-at-microsoft
- https://blogs.nvidia.com
- A Senior Engineer's Guide to the System Design Interview
- System Design Blueprint: The Ultimate Guide
- System Design Basics
- System Design Interview Questions
- Design Rate Limiter #1
- Desing Consistent hashing
- Desing a key-value store
- Desing a unique ID generator in distributed systems
- Design a URL shortening service like TinyURL
- Design a web crawler
- Design a notification system #1
- Design a news feed system
- Design a chat system
- Desing a search autocomplete system
- Desing video sharing service
- Design a google drive
- Distributed Message Queue
- Desing a notification service #2
- Design Rate Limiter #2
- Design a distributed cache system
- Top K Problem #2
- Desing a system to count videos views
- URL shortening system questions about system
- Fraud detection system
- Authentication and authorization system
- Monitoring system
- Design TicTok/Instagram reels system
- Online Judge for coding contests
- Design an Amazon S3 or Object Storage
- Design a Dropbox
- Design Reddit home page feed
- Design Parking Garage
- Design TicTok #2
- Design Facebook Messenger
- Design Instagram
- Design Amazon Kindle Payments
- Design Amazon Prime Video
- Downloading User Data
- Design Calendar data entities mapping
- Twitter API
- Design architecture for Instagram likes and comments
- Design instagram #2
- Cloud design patterns
- Five common system design interview mistakes
- Do not panic!
- Don’t think like a coder. Think like a Tech Lead.
- During the interview, you’ll spend an hour playing the role of a
Tech Lead
, so just pretend that the interviewer is a junior engineer who will be implementing your design.
- During the interview, you’ll spend an hour playing the role of a
- There are no optimal solutions in system design interviews.
- It’s your responsibility to leave breadcrumbs for the interviewer to
go where you want them to go
. That way you have them walk you down the road where you are at your best - You do not need to display deep expertise in the given problem domain.
- Interviewers want to see that you have a broad, base-level understanding of system design fundamentals.
- Interviewers want to engage you in a back-and-forth conversation about problem constraints and parameters, so avoid making assumptions about the prompt.
- Interviewers are not looking for specific answers with ironclad certainty. They want to see well-reasoned, qualified decisions based on engineering trade-offs.
- Interviewers are not looking for a predefined path from the beginning to end of the problem. They want to see the unique direction your experience and decisions take them.
- Interviewers seek a holistic view of a system and its users.
Communicate honestly
about what you know and what you don’t.- For senior role, it’s a good sign if you direct more of the interview
- A common failure point occurs when candidates don’t make decisions
- Not being familiar with specific databases or other components is fine. Be smart and don’t say brand names just for the sake of saying them.
- "I’m going to use Cassandra ..." unless you are VERY familiar with that, because the next question will be: "Why Cassandra and not some_other_db?"
- "I will use Kafka ..." unless you’re prepared to explain how Kafka works. Don’t say "I will use Kafka" unless you are prepared to talk about other types of queues, because they may ask you: "Oh, Kafka, interesting choice. Why that instead of [some other queue]?"
- There’s no right way to design a system.
- If the interviewer interrupts you, it's probably because you’re going off track.
- It’s fine if the interviewer asks you questions, but it’s a bad sign if the interviewer starts telling you how to do things.
- It's more important to cover everything broadly than it is to explain every small thing in detail.
- Whatever decision you make, explain why. In a system design interview, why is more important than what. For anything you say, be prepared to explain why.
- Mock interviews with different types of interviewers are
the best solution we’ve found to refining your communication skills
, or working with a dedicated coach who can get to know you (and your areas of expertise and improvement) very well. - There is no strictly wrong answer for which database to use, as long as you can justify yourself and demonstrate an understanding of the alternative options.
- You should start with the functional requirements first—that is, the core product features and use cases that the system needs to support.
- What makes these questions difficult or complicated is not the fact that they are inherently complicated. It's because the interviewer intentionally chooses to withhold information. Information that you can only get if you ask the right questions.
- Treat the system as a black box. No thinking about design, implementation, or pretty much anything technical. The sole goal of this first step is to specify what needs to be built. Not how. Not the scale. Focus on the “what.”
- Consider mutability. Can tweets be edited after they’re published? Can tweets be deleted?
- It might sound like a small detail at first, but
mutability can limit our ability
to use caching in our design
- It might sound like a small detail at first, but
- Your interviewer cares less about your decisions, and more about whether you are able to talk about the trade-offs (positives and negatives) of your decisions
- We measure availability by the percentage of the time the system is up and running. A common goal is to aim for five nines, i.e., 99.999% availability—that’s less than 6 minutes of downtime a year.
- Whenever there is user-generated code execution involved (aka low trust code), running it in isolation should be a non-functional security requirement.
- Want speed, use a cache. Want availability, put in some redundancy. Want fast reads to db, use a read replica. Want fast writes to db, use sharding.
- Consider using caching when all three of these are true:
- Computing the result is costly
- Once computed, the result tends to not change very often (or at all)
- The objects we are caching are read often
- System Design problems are almost always easier in the immutable case (compared to the mutable one).
How to Get Yourself Unstuck – Tip #1
- To simplify the problem, think about the data in the problem as immutable. You can choose to add mutability back in later after you have an initial working design.How to Get Yourself Unstuck – Tip #2
- Ask your interviewer: “Are there any important requirements you have in mind that I’ve overlooked?”How to Get Yourself Unstuck – Tip #3
- Ask your interviewer if they’d like to see some calculations before jumping in and starting them—you might be able to skip these entirely if the interviewer doesn’t care about it!How to Get Yourself Unstuck – Tip #4
- Follow these rough guides to get the basic estimates for any systemStorage
Estimation: Storage = daily data used by 1 user * DAU count * length of time to store dataBandwidth
Estimation: Bandwidth per second = (daily data used by 1 user * DAU count ) / total seconds in a day
How to Get Yourself Unstuck – Tip #5
“Would it be fine if the data in my system was occasionally wrong for a split second or so?”- If the answer is yes, then you probably want
eventual consistency
. - If the answer is no, then you’re looking for a
strong consistency
called linearizability.
- If the answer is yes, then you probably want
Do not jump right in to give a solution. Slow down. Think deeply and ask questions to clarify requirements and assumptions. This is extremely important.
- 𝐑𝐞𝐪𝐮𝐢𝐫𝐞𝐦𝐞𝐧𝐭 𝐂𝐥𝐚𝐫𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬 (3-5 𝐦𝐢𝐧)
- Ask clarifying questions to understand the problem and expectations of the interviewer.
- Use cases
- Scenarios that will not be covered
- Who will use
- How many will use
- Usage patterns
- Technology stack
- 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧𝐚𝐥 𝐑𝐞𝐪𝐮𝐢𝐫𝐞𝐦𝐞𝐧𝐭𝐬
- Start with the customer and work backwards
- Who will use the system
- How the system will be used
- Platform (Mobile, Web, Desktop)
- 𝐍𝐨𝐧 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧𝐚𝐥 𝐑𝐞𝐪𝐮𝐢𝐫𝐞𝐦𝐞𝐧𝐭𝐬
High availability
system uptime in percentage- 99 % => 3.65 days of downtime per year
- 99.9% => 8.76 hours of downtime per year
- 99.99% => 52 minutes 35 seconds of downtime per year
- 99.999% => 5 minutes 15 seconds of downtime per year
- 99.9999% => 31 seconds of downtime per year
Scalability
needs to handle peak loadPerformance
in terms of latency and throughputDurability
- data should not be lostConsistency
- data should not be corrupted- Strong consistency
- Weak consistency
Maintainability
- Failure modes and mitigations
- Monitoring
- Testing
- Deployment
Security
Cost
- Ask clarifying questions to understand the problem and expectations of the interviewer.
- 𝐄𝐬𝐭𝐢𝐦𝐚𝐭𝐢𝐨𝐧𝐬 (3-5 𝐦𝐢𝐧)
- Latency/Throughput expectations
- QPS (Queries Per Second) Read/Write ratio
- Total/Daily active users
- Traffic estimates
- Write (QPS, Volume of data)
- Read (QPS, Volume of data)
- Storage estimates
- Memory estimates
- If we are using a cache, what is the kind of data we want to store in cache
- How much RAM and how many machines do we need for us to achieve this ?
- Amount of data you want to store in disk/ssd
- Number of users/tweets/likes etc
- Traffic estimates
- Start with something simple
- Single machine solution only
- Single service solution
- Design monolith service and then break it into microservices
- High level design (5-10 min)
- APIs for Read/Write scenarios for crucial components
- Database schema
- Basic algorithm
- High level design for Read/Write scenario
- High level design for
Read heavy
scenario - High level design for
Write heavy
scenario
- 𝐃𝐞𝐬𝐢𝐠𝐧 details (15-20 min)
- Scaling the algorithm
- Scaling individual components:
- Availability, Consistency and Scale story for each component
- Consistency and availability patterns
- Think about the following components, how they would fit in and how it would help
- Client (Mobile, Browser)
- DNS
- CDN (Push vs Pull)
- Load Balancers (Active-Passive, Active-Active, Layer 4, Layer 7)
- Reverse Proxy
- Blob / Object Storage
- Application layer scaling (Microservices, Service Discovery)
- DB (RDBMS, NoSQL)
- RDBMS
- Master-slave, Master-master, Federation, Sharding, Denormalization, SQL Tuning
- NoSQL
- Key-Value, Wide-Column, Graph, Document
- Fast-lookups:
- RAM (Bounded size) => Redis, Memcached
- AP (Unbounded size) => Cassandra, RIAK, Voldemort
- CP (Unbounded size) => HBase, MongoDB, Couchbase, DynamoDB
- Fast-lookups:
- Read heavy: MongoDB, Couchbase
- Write heavy: Cassandra, ScyllaDB
- Key-Value, Wide-Column, Graph, Document
- RDBMS
- Caches
- Client caching, CDN caching, Webserver caching, Database caching, Application caching, Cache @Query level, Cache @Object level
- Eviction policies:
- Cache aside
- Write through
- Write behind
- Refresh ahead
- Asynchronism
- Message queues
- Task queues
- Back pressure
- Communication
- TCP
- UDP
- REST
- RPC
- Web Sockets
- HTTP/S
- Logging, Metrics and Automation
- Justify (5 min)
- Throughput of each layer
- Latency caused between each layer
- Overall latency justification
- 𝐑𝐞𝐬𝐨𝐥𝐯𝐞 𝐛𝐨𝐭𝐭𝐥𝐞𝐧𝐞𝐜𝐤𝐬 𝐚𝐧𝐝 𝐟𝐨𝐥𝐥𝐨𝐰-𝐮𝐩 𝐪𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬 (2-3 𝐦𝐢𝐧𝐮𝐭𝐞𝐬)
- Error cases (server failure, network loss, etc.) and how to handle them
- Bottlenecks
- How to monitor the system and what metrics to collect
- How to scale the system to the next level
Single point of failure
-SPOF
is a part of a system that, if it fails, will stop the entire system from working. SPOFs are undesirable in any system with a goal of high availability or reliability.Bottlenecks
- Bottleneck occurs when the capacity of an application or a computer system is severely limited by a single component. It has lowest throughput of all parts of the transaction path. System works as fast as the slowest part of the system.
- Reliability
- Scalability
- Efficiency
- Performance
- It is better to have a bunch of simple components in the system.
- Simple things are easier to understand, analyze, improve, replace.
- Look at
SOLID
.S
there stands exactly for that -Single Responsibility Principle
. Each component in the system should serve one and only one goal.
- Always keep in mind what is your main goal
- Start with
simplest
solution (Simple is not weak) - Continuously look for
bottlenecks
andSPOFs
- improve first and avoid second - Split complex things into
simple
ones
- This pattern is used to distribute incoming traffic evenly across multiple servers to prevent overload and improve the system's overall performance. Load balancing patterns can be applied at different levels, including network, transport, application, and database.
- This pattern involves dividing an application into separate tiers, each responsible for a different aspect of the application, making it easier to scale and manage
- Is a pattern in which a system sends an event in order to notify other systems of a change.
- A defining feature of this architecture is that the system doesn't expect a response from the event at all.
- There should be a separation between the logic that sends the event and any logic that get triggered by it.
- This pattern is handy because it makes a lower level of coupling simple.
- The event-carried state transfer pattern is very similar to the event notification pattern.
- It often shows up when you're looking for less communication between systems.
- This is achieved by duplicating data across multiple systems
- Write operations are done on the
master
, and then thereplica
can be accessed by other systems.
- This pattern is less commonly used but the main idea of event sourcing all changes to a systems state is recorded as an event.
- You should be able to recreate the current state by replaying all previous events.
- This is a fundamentally different way of looking at a system because the event store is the source of truth, not the database.
- A commonly used example for describing this pattern is a version control system like git where each commit is an event.
- Built for
audit
andreplay
purposes. Finance systems use this pattern a lot.
Command Query Responsibility Segregation
Pattern: This pattern separates read and write operations, allowing for the scaling and optimization of each independently.
- This pattern is commonly used in big data processing and involves splitting large data sets into smaller chunks, processing them independently, and then combining the results.
- It is a variant of the service-oriented architecture (
SOA
) architectural style that structures an application as a collection of loosely coupled services. - In a microservices architecture, services are fine-grained and the protocols are lightweight.
- A web server is a software application or a hardware device that caches static content and proxies dynamic content to application servers.
- It listens for requests on a specific port (usually port 80 for HTTP or 443 for HTTPS) and responds with the requested content.
https://trends.builtwith.com/web-server
- Load balancer
- Web server
- Cache server
- Static content server
- No dynamic content handling
- Compression (this is very useful because the internet is slow but the CPU is fast)
user www-data;
worker_processes 4;
pid /run/nginx.pid;
daemon off;
events {
worker_connections 2048;
multi_accept on;
use epoll;
}
http {
server_tokens off;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 15;
types_hash_max_size 2048;
client_max_body_size 20M;
include /etc/nginx/mime.types;
default_type application/octet-stream;
access_log /dev/stdout;
error_log /dev/stderr;
gzip on;
gzip_disable "msie6";
gzip_comp_level 5;
gzip_min_length 256;
gzip_proxied any;
gzip_vary on;
gzip_types
application/atom+xml
application/javascript
application/json
application/rss+xml
application/vnd.ms-fontobject
application/x-font-ttf
application/x-web-app-manifest+json
application/xhtml+xml
application/xml
font/opentype
image/svg+xml
image/x-icon
text/css
text/plain
text/x-component;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS';
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-available/*.conf;
open_file_cache off;
charset UTF-8;
}
- Examples:
Handling High Traffic
: In high-load applications, web servers must efficiently handle a large number of concurrent requests without significant delays or downtime.Scalability
: Web servers should be scalable, meaning they can handle increasing loads by adding resources (such as memory, CPU, or additional servers) without requiring significant changes to the architecture.Performance Optimization
: High-load applications often require optimizations such ascaching
,load balancing
, andcontent compression
to ensure that the web server can handle the traffic without compromising on performance.
TCP
is a connection-oriented protocol, which means that before data can be transferred, two computers must first establish a connection.- Reliable
- Connection-oriented
- Slow
- Used for HTTP, emails, file transfers, SSH.
UDP
is a connectionless protocol, which means that data can be transferred as soon as you have the IP address and port number of the destination computer.- Unreliable
- Connectionless
- Fast
- Used for DNS, streaming, video, online gaming.
CDN
is a system of distributed servers (network) that deliver webpages and other web content to a user based on the geographic locations of the user, the origin of the webpage and a content delivery server.Edge server
is a server that is located at the edge of the network and is responsible for caching content. It is usually located in a data center that is close to the user. The edge server caches content from the origin server and serves it to users when requested. The edge server can also serve static content directly to users without contacting the origin server. This reduces the load on the origin server and improves performance.Origin server
is a server that stores the original, definitive versions of web pages and other files. It is usually located in a data center that is far from the user. The origin server receives requests from edge servers and fulfills them by sending the appropriate content.
Edge computing
is a networking philosophy focused on bringing computing as close to the source of data as possible in order to reduce latency and bandwidth use. In simpler terms, edge computing means running fewer processes in the cloud and moving those processes to local places, such as on a user’s computer, an IoT device, or an edge server. Bringing computation to the network’s edge minimizes the amount of long-distance communication that has to happen between a client and server.
Authentication
- figuring out who you're talking toAuthorization
- figuring out what they're allowed to doSecure password storage
is the use of a cryptographic hash function to store passwords in a way that makes it difficult for an attacker to recover the original password.
Salting
is the process of adding a random string of characters to a password before hashing it. This makes it more difficult for an attacker to crack the password using a precomputed hash table or rainbow table.Session Tokens
are used to authenticate users and authorize access to resources. A session token is a unique string of characters that is generated when a user logs in and is stored in a cookie or in the URL. The session token is sent with each request to the server to identify the user and authorize access to resources.- Session tokens should also come with an expiration date, as short as feasible.
- Session token is equivalent to a password, so it should be stored securely.
JSON Web Tokens
(JWT) are a standard for representing claims securely between two parties. JWTs are signed using a secret key or a public/private key pair, which makes them secure and tamper-proof. JWTs can be used to authenticate users and authorize access to resources.
Cookies
- by storing a session token or JWT in a cookie, we can ensure that all subsequent requests will include it and allow the server to validate the current user session.
- The user signs up. At this point, we need to salt and hash their password and store those values (but not the password itself!).
- The user logs in with their username and password. We verify the password by hashing it with the stored salt and checking to see if it matches the stored hash (ideally using a secure library to make the comparison). We then send some kind of identifying token, either a simple session token or a JWT or similar token, back to the client in a cookie set header.
- On subsequent requests, the browser sends the cookie back to the server, where we can verify the session token or check the signature on/decrypt a JWT.
- Periodically, the session token or JWT should be expired and a new one generated and sent down to the client with a cookie set header.
- Eventually, the user's session may expire from inactivity. In this case, we go back to step 2.
Forward proxy
and a reverse proxy are two types of proxies used to provide additional security, anonymity, and caching for web traffic. A forward proxy acts as an intermediary between clients and servers on the Internet. When a client requests a web page or other resources from a server, the request is first sent to the forward proxy. The forward proxy then retrieves the requested content on behalf of the client and returns it to the client. This can provide additional security and privacy for clients by hiding their IP addresses and location information from the server. Forward proxies are often used in corporate networks to control access to the Internet and to improve performance by caching frequently accessed content.forward proxy
is used to provide security and caching for client requestsReverse proxy
, on the other hand, sits between the client and the server, intercepting requests from the client and forwarding them to the appropriate server. The client believes it is communicating directly with the server, but in reality, it is communicating with the reverse proxy. The reverse proxy can provide additional security by filtering requests and blocking malicious traffic. It can also improve performance by caching frequently accessed content and distributing traffic across multiple servers.reverse proxy
is used to provide security, load balancing, and caching for server requests.
- Round Robin - The simplest load balancing method. The load balancer cycles through the available servers, and each server handles an equal number of requests. If one of the servers goes down, the load balancer stops sending requests to that server.
- Least Connections - Each server is assigned a number of current connections. When a new request comes in, the load balancer sends it to the server with the fewest connections. This method works well if the work each server does is roughly the same. If one server is handling twice as many connections as the others, it will get twice as many new requests as the others. This will quickly overload the server.
- IP Hash - Each request from a client is always sent to the same server. This is based on the client's IP address. If a server goes down, the load balancer will send requests from that server's IP address to the remaining servers, which could overload those servers. If a new server is added, the load balancer may send some requests from existing servers to the new server, which would cause a performance hit.
- Weighted Round Robin - Similar to round robin, but some servers are weighted more heavily than others. This allows you to balance the load between servers based on their capabilities. For example, you might have one powerful server and three smaller servers. You could set the weights so that the powerful server handles 60% of the requests and each of the smaller servers handles 20%.
- Least Response Time - Each server is assigned an estimated average response time. When a new request comes in, the load balancer sends it to the server with the lowest average response time. This method works well if the work each server does is roughly the same. If one server is much faster or slower than the others, this method will not work well.
Consistent hashing
is a special kind of hashing such that when a hash table is resized, onlyK/n
keys need to be remapped on average, whereK
is the number of keys, andn
is the number of slots.Consistent hashing
is useful in situations where we need to add or remove nodes without significant reorganization of the data. It is also useful for caching, where we need to store cached data on multiple machines, and we want to add or remove machines without invalidating all the cached data.- Servers put in
circle
placed by key hash of theirIP address
. When we need to add new server, we just add it to circle and place it by hash of its IP address. When we need to remove server, we just remove it from circle. - We move
clockwise
from the hash of the key we are looking for. The first server we come to is the server that stores the value for that key. - The idea of
consistent hashing
is to distribute the load evenly across the servers. When a new server is added, only a small portion of the keys need to be remapped. When a server is removed, the keys that were assigned to that server are evenly distributed among the remaining servers.
Relational database
is a database that stores and provides access to data points that are related to one another. Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables. Each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.Advantages of SQL Databases
:- SQL offers more powerful querying out of the box
- SQL has stronger ACID guarantees out of the box
Disadvantages of SQL Databases
:- B-Trees, used in SQL DBs, are slower to write into
- Strong consistency is expensive to reduce latency for
- SQL does not work well for mixed schema data
- SQL databases are hard to scale horizontally
Common problems
:- Writing to HDD is slow
- No indexes for fast reading
- No enough RAM for caching
- Slow writing because of not correct indexes
- Full scan of the table
- Locking in transactions
Optimization
:- Saving data in one transaction by
chunks
, this will reduce the time of rebuilding the indexes. This is possible because of theatomicity
of the transaction. - Using RAM more than HDD
- For
InnoDB
set buffer sizeinnodb_buffer_pool_size
- https://dev.mysql.com/doc/refman/8.4/en/innodb-buffer-pool-resize.html
- https://dev.mysql.com/doc/refman/8.4/en/innodb-buffer-pool.html
show variables like '%innodb_buffer_pool_size%';
SET GLOBAL innodb_buffer_pool_size=402653184;
- The
InnoDB
buffer uses for:- indexes
- cache
- temporary tables
- For
- Increase the size of your
redo log
files:- It’s a special log that stores all changes made to the database before they are permanently written to disk
- If the server unexpectedly shuts down, any changes that were made but not yet saved to the main database can be recovered from the redo log.
- Instead of writing each operation directly to the main database, MySQL first writes changes to the redo log
- During a MySQL restart, the redo log is used to recover any changes that were planned but hadn’t yet been written to the main database.
- Small redo log files cause many unnecessary disk writes.
innodb_log_file_size = 512M
- Define how frequently the data in the
redo log
is flushed to disk during transaction commits:- Use
innodb_flush_log_at_trx_commit
parameter- This parameter in MySQL controls how frequently the data in the
redo log
is flushed to disk during transaction commits.
- This parameter in MySQL controls how frequently the data in the
0
- writes to the log and flushes to disk once per second, but does nothing for each commit1
- writes to the log and flushes the log to disk every transaction. Default value2
- writes to the log every commit, flushes to disk once a second
- Use
- Scan database with https://github.com/major/MySQLTuner-perl
- Use indexes if not used
- Use
show full processlist
to see hosts connected to the database and queries they are running - Set connection timeout, to avoid the situation when the connection is not closed and the database is overloaded
- Saving data in one transaction by
- Atomicity - The transaction must function as a single, indivisible unit of work such that the entire transaction is either committed or cancelled. When transactions are atomic, there is no such thing as a partially completed transaction: all or nothing.
- Consistency - The database must always transition from one consistent state to another.
- Isolation - The results of a transaction are usually (because in read uncommited it's possible, we can read from buffer data(snapshot) that is not committed yet) invisible to other transactions until it is completed.
- Durability - Once committed, the changes made during the transaction become permanent. This means that changes must be recorded so that data cannot be lost in the event of a system failure.
- The
EXPLAIN
command is the primary way to find out what decisions the query optimizer is making. It has a lot of limitations and doesn't always tell the truth, but since there's no better one anyway, it makes sense to learn how to use it so you can make educated guesses about how queries are executed. - Explain vs Explain Analyze
Explain
- just gives you the query plan, but doesn't actually run the queryExplain Analyze
- actually runs the query and gives you the actual time it took to run the query (turn off the cache
to get the real time)
- With index, we are using
B-Tree
type of data structure- Because of that, we have
O(log(n))
complexity for searching, inserting, deleting.
- Because of that, we have
- If we have a lot of indexes, it can slow down the writing because we need to update all indexes
- If we are using
WHERE
in the query we need to check if we have an index for that column, this can improve the performance of the query (O(log(n))
complexity) - Index can be not used if the query contains a big range of data or when using
functions
for the column
- Links:
Read locks
on a resource areshared
or mutually non-blocking: any number of clients can read from a resource at the same time without affecting each other.Write locks
, on the other hand, areexclusive
. They exclude the possibility of setting a read lock and other write locks, since the only safe policy is to have only one client writing at a time and prevent all reads of the resource's content for the duration of the write.- Always working with
indexes
not with snapshots.
- Always working with
- Different systems has different concurrent access management strategies.
InnoDb
system is great for concurrent access, it has locking onrow
andtable
level
- How to search locks:
- Execute
SET autocommit=0;
disable auto commit after each request - Execute
SET GLOBAL innodb_status_output=ON;
enable InnoDB standard Monitor - Execute
SET GLOBAL innodb_status_output_locks=ON;
enable Locks Monitor - Execute
SHOW ENGINE INNODB STATUS
- to see the transaction, locks - Look for
Transaction
section
- Execute
Сonsistent Read
- At the time of the first request in a transaction, a
snapshot
of the database data (read view) is created, whichis not affected by changes in parallel transactions
, but is affected by changes in the current one - Reading from such a snapshot is called a
non-blocking consistent read
. - Non-blocking - because no locks are required to create a snapshot, consistent — because no operations in the outside world (except DROP TABLE and ALTER TABLE) will affect the snapshot.
- InnoDB can be asked to take a snapshot before the first request in a transaction, for this you need to mention this at the time of the start of the transaction -
START TRANSACTION WITH CONSISTENT SNAPSHOT
. - For 10 parallel transactions, 10 snapshots will be created, and each transaction will work with its own snapshot.
- At the time of the first request in a transaction, a
- Types of Locks:
- shared lock
(S)
- shared lock, allows other transactions to read the row and set the same shared lock on it, but does not allow changing the row or setting an exclusive lockSELECT ... LOCK IN SHARE MODE
- Locks affected rows for writing. Other sessions can read, but wait for the end of the transaction to modify the affected rows
- exclusive lock
(X)
- exclusive lock, prohibits other transactions from locking the row, and can also lock the row for both writing and reading, depending on the current isolation levelSELECT... FOR UPDATE
- locks read rows for reading.
- Intention Locks - locks on locks.
table-level locks
and only block other locks and operations on the entire table of type LOCK TABLE- Types:
- Intention shared lock
(IS)
- blocks only the creation of other shared locks and LOCK TABLE operations. Ex:SELECT ... LOCK IN SHARE MODE
- Intention exclusive lock
(IX)
- blocks only the creation of other exclusive locks and LOCK TABLE operations. Ex:SELECT ... FOR UPDATE
- Intention shared lock
- Intentions Locks Protocol
- Before a transaction can acquire a
shared lock
on a row in a table, it must first acquire anIS lock
or stronger on the table. - Before a transaction can acquire an
exclusive lock
on a row in a table, it must first acquire anIX lock
on the table.
- Before a transaction can acquire a
- Types:
- Gap Locks
- Gap lock is needed in order
to avoid the appearance of phantom records
, when, for example, between two identical readings of a range, a neighboring transaction manages to insert a record into this range. - A gap lock is a lock between index records, or a gap lock before the first or after the last entry in the index.
- For example,
SELECT c1 FROM t WHERE c1 BETWEEN 10 and 20 FOR UPDATE;
prevents other transactions from inserting the value 15 into column t.c1, regardless of whether any such value was already in the column, because gaps between all existing values in the range are locked
- Gap lock is needed in order
- shared lock
- Isolation:
- InnoDB allows you to set the isolation level globally and per session.
SET GLOBAL TRANSACTION ISOLATION LEVEL
SET SESSION TRANSACTION ISOLATION LEVEL
- Levels:
Repeatable Read
:- Default isolation level in InnoDB.
- A consistent read (
SELECT
) doesn't block anything, it reads rowsfrom a snapshot
that is created on the first read in a transaction.- The same queries will always return the same result.
- For blocking reads (
SELECT ... FOR UPDATE/LOCK IN SHARE MODE
), UPDATE and DELETE, the lock will depend on the condition type.- If the condition is unique (
WHERE id = 42
), then only the found recordin the index
is locked (record lock). - If the condition is with a range (
WHERE id > 42
), then theentire range is blocked in the index
(gap lock or next-key lock).
- If the condition is unique (
- Example:
- Start Transaction
- Select 1 — Snapshot 1
- Insert or Update index (snapshots are for reading only)
- Select 2 — Snapshot 1
- Select 1 — Snapshot 1
- Commit transaction
- Start Transaction
Read Committed
:- Each consistent read, even within the same transaction, sets and reads its own
snapshot
. - Within the same transaction, for every
SELECT
, the snapshot is created anew, and works with it. - Example:
- Start Transaction
- Select 1 — Snapshot 1
- Insert or Update index (snapshots are for reading only)
- Select 2 — Snapshot 2, and we can see the changes from previous insert or update
- Select 1 — Snapshot 1
- Commit transaction
- Start Transaction
- Each consistent read, even within the same transaction, sets and reads its own
Read Uncommitted
:- All SELECT queries are read in a
non-blocking manner
. - Not snapshots at all, reads and writes to the database.
- Changes to an incomplete transaction can be read in other transactions, and these changes can also be rolled back later.
- This is the so-called
dirty read
- All SELECT queries are read in a
Serializable
:- Completely similar to
REPEATABLE READ
, except for one moment. - If autocommit is disabled (and it is disabled when a transaction is explicitly started), then all simple SELECT queries are implicitly converted into
SELECT ... LOCK IN SHARE MODE
, if enabled, each SELECT goes into a separate transaction. - It is used, as a rule, to turn all read requests into
SELECT ... LOCK IN SHARE MODE
, if this cannot be done in the application code. - If after select request is insert request, this insert request will wait when the select is finished.
- This level quarantees that the data will be the same, all requests will be done in the same order.
- Read is possible.
- Completely similar to
- Considerations
Performance vs Consistency
:- Higher isolation levels like Serializable provide more consistency but at the cost of performance and scalability.
- They increase the likelihood of transaction conflicts and can lead to higher rates of transaction rollbacks in high-concurrency environments.
Database Specifics
:- Different databases implement these isolation levels in different ways
Deadlocks and Concurrency
:- Higher isolation levels can increase the risk of deadlocks, which your application must handle
Default Settings
:- Be aware of your database's default transaction isolation level and how it affects your application.
- InnoDB allows you to set the isolation level globally and per session.
- Some key terms to understand for
replication
:Replica
: Copy of dataLeader
: Machine that handles write requests to the data store.Followers
: Machines that are replicas of the leader node, and cater to read requests.
Replication
is the process of copying data from one database to another. Replication is used to increase availability and reliability of data. It is also used to distribute data geographically so that it can be accessed locally from various places.- Replication stategies:
- Master-slave replication - In this strategy, all writes are done to the master database, and the slave databases are updated with the changes. If the master database goes down, one of the slave databases can be promoted to be the new master. This strategy is simple and easy to implement, but it does not scale well.
- Master-master replication - In this strategy, both master databases are writable and both read from the slave databases. This strategy can scale well, but it is more complex to implement and it can lead to conflicts when the same data is modified concurrently on both master databases.
- Federation - In this strategy, each database is treated as a separate entity. Applications connect directly to the appropriate database, with no single master database. This strategy can scale well, but it is more complex to implement and it can lead to conflicts when the same data is modified concurrently on multiple databases.
- A
master database
generally only supportswrite
operations. Aslave database
gets copies of the data from the master database and only supports read operations. All the data-modifying commands like insert, delete, or update must be sent to the master database. Most applications require a much higher ratio of reads to writes; thus, the number of slave databases in a system is usually larger than the number of master databases.
Sharding
is the process of storing the same data set in multiple databases. Sharding is used to increase scalability and performance of databases. It is also used to distribute data geographically so that it can be accessed locally from various places.- Sharding stategies:
- Hash-based sharding - Hash-based sharding is the most common sharding strategy. In this strategy, each shard is responsible for a range of the hash values. For example, if we have three shards, one shard may be responsible for hash values 0-33, the second shard may be responsible for hash values 34-66, and the third shard may be responsible for hash values 67-99. When a new record is inserted into the database, the hash value of the key is computed, and the record is inserted into the shard that is responsible for that hash value range.
- Range-based sharding - Range-based sharding is similar to hash-based sharding, except that the shards are responsible for a range of the actual values of the keys. For example, if we have three shards, one shard may be responsible for keys a-m, the second shard may be responsible for keys n-z, and the third shard may be responsible for keys 0-9. When a new record is inserted into the database, the key is compared against the ranges, and the record is inserted into the shard that is responsible for that key range.
- List sharding - List sharding is similar to range-based sharding, except that the shards are responsible for a list of the actual values of the keys. For example, if we have three shards, one shard may be responsible for keys a, d, g, j, m, p, s, v, y, and the second shard may be responsible for keys b, e, h, k, n, q, t, w, z, and the third shard may be responsible for keys c, f, i, l, o, r, u, x. When a new record is inserted into the database, the key is compared against the lists, and the record is inserted into the shard that is responsible for that key.
- Directory sharding - Directory sharding is similar to list sharding, except that the shards are responsible for a directory of the actual values of the keys. For example, if we have three shards, one shard may be responsible for keys starting with a-m, the second shard may be responsible for keys starting with n-z, and the third shard may be responsible for keys starting with 0-9. When a new record is inserted into the database, the key is compared against the directories, and the record is inserted into the shard that is responsible for that key.
- For every write, indexes need to be updated. This can slow down the write operation.
- If a lot of
reads
and not many writes, indexes has to be everywhere. - If a lot of
writes
then wrap this operation in chunk and write in one transaction to update indexes only once.- Or use sharding to reduce the number of indexes that need to be updated.
- Index can be hash or tree based. Hash is faster but tree is more flexible.
B-trees
are self-adjusting trees that can achieve multilevel indexing. They are a generalized form of Binary Search Trees. The data is stored in sorted order in the B-trees. B-tree achieves the efficient utilization of space in nodes, along with keeping the height of the tree small.
B+ trees
are an extension of B-trees. The major differences in the data structure are:- Only the leaf nodes store the record or reference to the record.
- All the leaf nodes are connected to form a linked list. This enables sequential access along with direct access.
- Adaptive Hash Index - https://dev.mysql.com/doc/refman/8.4/en/innodb-adaptive-hash.html
- Even when you have a hash index, you still have a B-tree index.
-
NoSQL database
is a database that stores and provides access to data that does not have a predefined data model. NoSQL databases are highly scalable and are designed to handle large amounts of data and high user traffic. They are well suited for storing large sets of user data, such as social media profiles, product catalogs, and inventory records. NoSQL databases are also well suited for storing large sets of time series data such as clickstreams and location tracking data. NoSQL databases are not good at handling complex queries that require joins. -
Advantages of NoSQL Database
:- NoSQL is faster for writes but slower to query.
Log-structured merge tree
is much faster for writes since you don't do anything to maintain structure when you add to it
- NoSQL has sharding and scaling out of the box
- Shema is easily changeable
- NoSQL is faster for writes but slower to query.
-
Disadvantages of NoSQL Database
:- NoSQL databases are more limited in the types of efficient queries that can be done
- They are less suitable for circumstances where strong consistency is required,
- Cannot have transactions
- Joins are hard
- Basically Available - The system guarantees availability.
- Soft state - The state of the system may change over time.
- Eventual consistency - The system will become consistent over a period of time.
- Consistency - The system will become consistent over a period of time.
Quorum
ensures that a sufficient number of nodes agree on a value to maintain consistency and allow the system to progress even if some nodes are unavailable.Quorum
refers to a mechanism used to ensure consistency and availability of data in distributed systems. It is commonly used in databases with replication, such as Cassandra, MongoDB, or systems that implement consensus protocols like Paxos or Raft.- The basic idea behind Quorum is that for an operation (such as a write or read) to be considered successful, a majority of replicas in the system must agree. The number of replicas required for the operation to succeed is known as the quorum.
Quorum Read
: If you have 5 replicas, and 3 of them must return the latest data for the read to be considered successful, this is called a quorum read.Quorum Write
: For a write operation to be successful, it must be acknowledged by a majority of replicas.- Quorum ensures that any read will always retrieve the most up-to-date replica. The general formula is:
W + R > N
:W
— the number of replicas to which a write operation must be applied (write quorum).R
— the number of replicas from which a read operation must be successful (read quorum).N
— the total number of replicas.
- Advantages of using Quorum:
High availability
: The system can continue to operate even if some replicas are unavailable.Consistency
: Data remains consistent across the majority of replicas.Scalability
: It is easier to scale the number of replicas in distributed systems.
Consensus
requires all (or a majority) of nodes to agree, whileQuorum
only requires agreement from a subset, leading to lower latency in Quorum.
Consensus
is a fundamental problem in distributed systems that requires multiple nodes to agree on a single value or a sequence of values.- In distributed systems, consensus is critical to keep data consistent across nodes.
- For example:
- In a distributed database, consensus helps ensure that all replicas agree on updates or state changes, allowing users to see consistent data even if nodes go offline temporarily.
- Consensus algorithms ensure that all nodes agree on the leader or primary data source.
Consensus
requires all (or a majority) of nodes to agree, whileQuorum
only requires agreement from a subset, leading to lower latency in Quorum.
CAP theorem
states that it is impossible for a distributed data store to simultaneously providemore than two
out of the following three guarantees:- Consistency - Every read receives the most recent write or an error.
- Availability - Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition tolerance - The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
- In a distributed system,
partitions
cannot be avoided, and when a partition occurs, we must choose betweenconsistency
andavailability
.n3
goes down and cannot communicate withn1
andn2
. If clients write data ton1
orn2
, data cannot be propagated ton3
. If data is written to n3 but not propagated ton1
andn2
yet,n1
andn2
would have stale data. - If we choose
consistency
overavailability
(CP
system), we must block all write operations ton1
andn2
to avoid data inconsistency among these three servers, which makes the system unavailable. Bank systems usually have extremely high consistent requirements. For example, it is crucial for a bank system to display the most up-to-date balance info. If inconsistency occurs due to a network partition, the bank system returns an error before the inconsistency is resolved - If we choose
availability
overconsistency
(AP
system), the system keeps accepting reads, even though it might returnstale
data. For writes,n1
andn2
will keep accepting writes, and data will be synced ton3
when the network partition is resolved.
For some systems like financial systems, consistency is very important. For others like TikTok, where it is OK if some users get access to certain videos later than the rest, we try to aim for availability over consistency. Even in these cases, we want our system to eventually have the same view of the data. And that is called eventual consistency, where systems become consistent eventually, if not immediately.
Cache
is a temporary storage area for data that is accessed very frequently. It is used to increase the performance of data retrieval operations. It is also used to reduce the load on the database by storing frequently-accessed data in the cache.- Types (It’s a good practice to include a brief point about cache invalidation during system design interviews):
LRU Cache
- Least Recently Used CacheLFU Cache
- Least Frequently Used Cache
- Generally, caching is used for read-heavy systems.
- The cache sits on the side and the application directly talks to both the cache and the database.
There is no connection between the cache and the primary database. All operations to cache and the database are
handled by the application. Cache-aside caches are usually general purpose and work best for
read-heavy
workloads.- The application first checks the cache.
- If the data is found in cache, we’ve cache hit. The data is read and returned to the client.
- If the data is not found in cache, we’ve cache miss. The application has to do some extra work. It queries the database to read the data, returns it to the client and stores the data in cache so the subsequent reads for the same data results in a cache hit.
Read-through cache
sits in-line with the database. When there is a cache miss, it loads missing data from database, populates the cache and returns it to the application. Read-through caches work best forread-heavy
workloads when the same data is requested many times. For example, a news story.- In cache-aside, the application is responsible for fetching data from the database and populating the cache. In read-through, this logic is usually supported by the library or stand-alone cache provider.
- Unlike cache-aside, the data model in read-through cache cannot be different than that of the database.
Write-Through Cache
- data is first written to the cache and then to the database. The cache sits in-line with the database and writes always go through the cache to the main database. This helps cache maintain consistency with the main database.- The application writes the data directly to the cache.
- The cache updates the data in the main database. When the write is complete, both the cache and the database have the same value and the cache always remains consistent.
Write-Around
- Here, data is written directly to the database and only the data that is read makes it way into the cache.
Write-Back
orWrite-Behind
- Here, the application writes data to the cache which stores the data and acknowledges to the application immediately. Then later, the cache writes the data back to the database. Here, the application writes data to the cache which stores the data and acknowledges to the application immediately. Then later, the cache writes the data back to the database. Write back caches improve the write performance and are good forwrite-heavy
workloads.
-
ns
= nanosecond,µs
= microsecond,ms
= millisecond1 ns
= 10^-9 seconds1 µs
= 10^-6 seconds = 1,000 ns1 ms
= 10^-3 seconds = 1,000 µs = 1,000,000 ns
- By analyzing the numbers we get the following conclusions:
- Memory is fast but the disk is slow.
- Avoid disk seeks if possible.
- Simple compression algorithms are fast.
- Compress data before sending it over the internet if possible.
- Data centers are usually in different regions, and it takes time to send data between them.
Bloom filter
is a space-efficient probabilistic data structure, that is used to test whether an element is a member of a set.False positive
matches are possibleFalse negatives
are not – in other words, a query returns either possibly in set or definitely not in set- Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter);
- the more elements that are added to the set, the larger the probability of false positives.
- At-most-once guarantee
- At-least-once guarantee
- Exactly-once guarantee
- Scaling policies (metric-based, schedule-based, predictive)
Caching
- key value storeGradual deploy
Region
is a geographical area. Each region consists of availability zones. Each region is completely independent.- Example: us, uk, eu
Availability zone
is one or more data centers in region:- Example: 3 data centers in one availability zone or 2 data centers in another availability zone
LA
is a measure of the amount of computational work that a computer system performs. The load average represents the average system load over a period of time- System Load average is the average number of processes either in a
runnable
oruninterruptible
sleep state - Normal when it less than amount of CPU cores, e.g.
- 8 core CPU, load average is 8 this is ok
- 1 core CPU, load average is 8 this is bad
- Running out of RAM indicates that the server is under severe load and application performance will almost certainly be noticeable to end users.
- Database uses RAM for load
indexes
and cache data
- Swap is a space on a disk that is used when the amount of physical RAM memory is full. When a Linux system runs out of RAM, inactive pages are moved from the RAM to the swap space.
- When available RAM is short or totally maxed out, Linux moves data from RAM to SWAP.
- High SWAP usage means that you don’t have enough RAM
- Server might reboot if you will run out of SWAP
- Track current disk space used
- Monitor
Docker
images (prune unused images) - Monitor logs if they are stored on the disk, because HDD is slower than this is the slower part of the system
Idle
- idle task - when nothing to doUser
- running user space processesSystem
- running the kernelIowait
- idle, while system making disk I/O request (if this is high, it means that the disk is not enough, need to look at IO requests to HDD)Steal
- virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processorSoftirq
- processor swtches contextNice
- users' priority processes that have beenniced
Agents
- Software on every machine for collecting and reporting metrics and events- Telegraf
- Zabbix
Storages
- Database with metrics and events- InfluxDB
- ElasticSearch
- Graphite
- MySQL
- Zabbix Server
- Prometheus
Dashboards
- Visualization of metrics and events- Grafana
- Kibana
- Zabbix Frontend
- Datadog
TIG
- Telegraf, InfluxDB, GrafanaELK
- ElasticSearch, Logstash, Kibana
Stress testing
is a type of performance testing that verifies the stability and reliability of the system under extreme conditions.- The purpose of
stress testing
is to ensure that the system or application can handle the expected and unexpected load - Stress testing can help to identify and fix performance issues, such as slow response times, high CPU or memory usage, database connectivity problems, or network latency.
- Can help find
bottlenecks
because the system's speed depends on the slowest part of the system Tools
:- Apache JMeter
- Siege
- ab (Apache Benchmark)
- Gatling
Approaches
:- Load Testing - Load testing is the most common approach to stress testing, which involves simulating the
expected user load
on a system or application and measuring its performance - Performance Testing - Performance testing is a stress testing approach that focuses on
measuring the response time of a system or application under different loads
and scenarios - Spike Testing - Spike testing is a stress testing approach that involves
simulating sudden spikes
in user activity or load on a system or application. Spike testing is used to evaluate the behavior of the systemunder sudden and unexpected increases
in traffic, such as during a marketing campaign or product launch - Endurance Testing - Endurance testing is a stress testing approach that involves
measuring the performance
of a system or applicationover an extended period of time
, usually several hours or days. Endurance testing is used to evaluate the system's ability to maintain its performance under a sustained load or stress - Worsening Tests - We create high load in part of production environment. Results affect real users. By observing the change in hardware metrics we can judge how these changes affect user metrics. So instead of spending weeks on improving something it’s easier to artificially slow down (worsen) that part and see if there is any impact
- Load Testing - Load testing is the most common approach to stress testing, which involves simulating the
- In a network system, a rate limiter is used to control the rate of traffic sent by a client or a
service. In the HTTP world, a rate limiter limits the number of client requests allowed to be
sent over a specified period. If the API request count exceeds the threshold defined by the
rate limiter, all the excess calls are blocked. Here are a few examples:
- A user can write no more than 2 posts per second.
- You can create a maximum of 10 accounts per day from the same IP address.
- You can claim rewards no more than 5 times per week from the same device.
Candidate
: What kind of rate limiter are we going to design? Is it a client-side rate limiter or server-side API rate limiter?Interviewer
: Great question. We focus on the server-side API rate limiter.Candidate
: Does the rate limiter throttle API requests based on IP, the user ID, or other properties?Interviewer
: The rate limiter should be flexible enough to support different sets of throttle rules.Candidate
: What is the scale of the system? Is it built for a startup or a big company with a large user base?Interviewer
: The system must be able to handle a large number of requests.Candidate
: Will the system work in a distributed environment?Interviewer
: Yes.Candidate
: Is the rate limiter a separate service or should it be implemented in application code?Interviewer
: It is a design decision up to you.Candidate
: Do we need to inform users who are throttled?Interviewer
: Yes.
- Accurately limit excessive requests.
- Low latency. The rate limiter should not slow down HTTP response time.
- Use as little memory as possible.
- Distributed rate limiting. The rate limiter can be shared across multiple servers or processes.
- Exception handling. Show clear exceptions to users when their requests are throttled.
- High fault tolerance. If there are any problems with the rate limiter (for example, a cache server goes offline), it does not affect the entire system.
- Client-side rate limiter
- Server-side rate limiter
- Middleware rate limiter
- Token bucket
- Leaking bucket
- Fixed window counter
- Sliding window log
- Sliding window counter
- This algorithm has a centralized bucket host where you take tokens on each request, and slowly drip more tokens into the bucket. If the bucket is empty, reject the request
- A
token bucket
is a container that has pre-defined capacity. - Tokens are put in the bucket at preset rates periodically
- Once the bucket is full, no more tokens are added
- Each request consumes one token. When a request arrives, we check if there are enough tokens in the bucket.
- If there are enough tokens, we take one token out for each request, and the request goes through.
- If there are not enough tokens, the request is dropped
- The token bucket algorithm takes two parameters:
Bucket size
: the maximum number of tokens allowed in the bucketRefill rate
: number of tokens put into the bucket every second
- Pros:
- The algorithm is easy to implement.
- Memory efficient.
- Token bucket allows a burst of traffic for short periods. A request can go through as long as there are tokens left.
- Cons:
- Two parameters in the algorithm are bucket size and token refill rate. However, it might be challenging to tune them properly.
- The
leaking bucket algorithm
is similar to the token bucket except that requests are processed at a fixed rate. It is usually implemented with a first-in-first-out (FIFO)queue
. The algorithm works as follows:- When a request arrives, the system checks if the queue is full. If it is not full, the request is added to the queue.
- Otherwise, the request is dropped.
- Requests are pulled from the queue and processed at regular intervals.
- Leaking bucket algorithm takes the following two parameters:
Bucket size
: it is equal to the queue size. The queue holds the requests to be processed at a fixed rate.Outflow rate
: it defines how many requests can be processed at a fixed rate, usually in seconds.
- Pros:
- Memory efficient given the limited queue size.
- Requests are processed at a fixed rate therefore it is suitable for use cases that a stable outflow rate is needed.
- Cons:
- A burst of traffic fills up the queue with old requests, and if they are not processed in time, recent requests will be rate limited.
- There are two parameters in the algorithm. It might not be easy to tune them properly
-
Fixed window counter algorithm
works as follows:- The algorithm divides the timeline into fix-sized time windows and assign a counter for each window.
- Each request increments the counter by one.
- Once the counter reaches the pre-defined threshold, new requests are dropped until a new time window starts
-
Example:
- The time unit is 1 second
- The system allows a maximum of 3 requests per second.
- In each second window, if more than 3 requests are received, extra requests are dropped
- A major problem with this algorithm is that a burst of traffic at the edges of time windows
could cause more requests than allowed quota to go through
- The system allows a maximum of 5 requests per minute, and the available quota resets at the human-friendly round minute.
- As seen, there are five requests between 2:00:00 and 2:01:00 and five more requests between 2:01:00 and 2:02:00.
- For the one-minute window between 2:00:30 and 2:01:30, 10 requests go through.
- That is twice as many as allowed requests.
Sliding window log algorithm
works as follows:- The algorithm keeps track of request timestamps. Timestamp data is usually kept in cache, such as sorted sets of Redis.
- When a new request comes in, remove all the outdated timestamps. Outdated timestamps are defined as those older than the start of the current time window.
- Add timestamp of the new request to the log.
- If the log size is the same or lower than the allowed count, a request is accepted. Otherwise, it is rejected
-
Example:
- In this example, the rate limiter allows 2 requests per minute. The time unit is 1 minute.
- The log is empty when a new request arrives at
1:00:01
. Thus, the request is allowed. - A new request arrives at
1:00:30
, the timestamp1:00:30
is inserted into the log. After the insertion, the log size is2
, not larger than the allowed count. Thus, the request is allowed. - A new request arrives at
1:00:50
, and the timestamp is inserted into the log. After the insertion, the log size is3
, larger than the allowed size2
. Therefore, this request is rejected even though the timestamp remains in the log. - A new request arrives at
1:01:40
. Requests in the range[1:00:40,1:01:40)
are within the latest time frame, but requests sent before1:00:40
are outdated. - Two outdated timestamps,
1:00:01
and1:00:30
, are removed from the log. - After the remove operation, the log size becomes
2
; therefore, the request is accepted
-
Pros:
- Rate limiting implemented by this algorithm is very accurate. In any rolling window, requests will not exceed the rate limit.
-
Cons:
- The algorithm consumes a lot of memory because even if a request is rejected, its timestamp might still be stored in memory.
- The
sliding window counter algorithm
is a hybrid approach that combines the fixed window counter and sliding window log.
- Example:
- Let’s say I set a limit of
50
requests per minute on an API endpoint. The counter can be thought of like this: - In this situation, I did
18
requests during the current minute, which started15
seconds ago, and42
requests during the entire previous minute. - Based on this information, the rate approximation is calculated like this:
- rate =
42 * ((60-15)/60) + 18
=42 * 0.75 + 18
=49.5
requests
- rate =
- Let’s say I set a limit of
- Pros:
- Tiny memory usage: only two numbers per counter
- Incrementing a counter can be done by sending a single
INCR
command - Calculating the rate is reasonably easy: one GET command and some very simple, fast math
- Cons:
- It only works for not-so-strict look back window. It is an approximation of the actual rate because it assumes requests in the previous window are evenly distributed. However, this problem may not be as bad as it seems.
- The rate limiter is implemented as a
middleware
. It is placed in front of the API server. Rules
are stored on the disk. Workers frequently pull rules from the disk and store them in the cache.- When a client sends a request to the server, the request is sent to the rate limiter middleware first.
- Rate limiter
middleware
loads rules from thecache
. It fetches counters and last request timestamp fromRedis
cache. Based on the response, the rate limiter decides:- if the request is not rate limited, it is forwarded to API servers.
- if the request is rate limited, the rate limiter returns
429
too many requests error to the client. In the meantime, the request is either dropped or forwarded to thequeue
.
Synchronization issue
can be solved by using centralized data stores likeRedis
.
Consistent hashing
is a special kind of hashing such that when a hash table is re-sized and consistent hashing is used, onlyk/n
keys need to be remapped on average, wherek
is the number of keys, andn
is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped- The basic steps are:
- Map servers and keys on to the ring using a uniformly distributed hash function.
- To find out which server a key is mapped to, go clockwise from the key position until the first server on the ring is found
- To determine which server a key is stored on, we go clockwise from the key position on the ring until a server is found.
- Going clockwise,
key0
is stored on server 0;key1
is stored on server 1;key2
is stored on server 2 andkey3
is stored on server 3
- After a new server 4 is added, only
key0
needs to be redistributed. k1
,k2
, andk3
remain on the same servers. Let us take a close look at the logic.- Before server 4 is added,
key0
is stored on server 0. Now,key0
will be stored on server 4 because server 4 is the first server it encounters by going clockwise fromkey0
’s position on the ring. - The other keys are not redistributed based on consistent hashing algorithm.
- When a server is removed, only a small fraction of keys require redistribution with consistent hashing.
- When server 1 is removed, only
key1
must be remapped to server 2. The rest of the keys are unaffected.
- A
virtual node
refers to the real node, and each server is represented by multiple virtual nodes on the ring. - The 3 is arbitrarily chosen; and in real-world systems, the number of virtual nodes is much larger.
- Instead of using
s0
, we haves0_0
,s0_1
, ands0_2
to representserver 0
on the ring. - Similarly,
s1_0
,s1_1
, ands1_2
representserver 1
on the ring. - With virtual nodes, each server is responsible for multiple partitions.
- Partitions (edges) with label
s0
are managed byserver 0
. On the other hand, partitions with labels1
are managed byserver 1
. - As the number of
virtual nodes
increases, the distribution of keys becomes more balanced.
- To find which server a key is stored on, we go
clockwise
from the key’s location and find the firstvirtual node
encountered on the ring.
- When a server is added or removed, a fraction of data needs to be redistributed
server 4
is added onto the ring. The affected range starts froms4
(newly added node) and movesanticlockwise
around the ring until a server is found (s3
).- Thus, keys located between
s3
ands4
need to be redistributed tos4
- When a server (s1) is removed, the affected range starts from
s1
(removed node) and movesanticlockwise
around the ring until a server is found (s0). - Thus, keys located between s0 and s1 must be redistributed to s2.
- To achieve high availability and reliability, data must be replicated asynchronously over
N
servers, whereN
is a configurable parameter. - These
N
servers are chosen using the following logic:- after a key is mapped to a position on the hash ring,
walk clockwise
from that position and choose the firstN
servers on the ring to store data copies. key0
is replicated ats1
,s2
, ands3
- after a key is mapped to a position on the hash ring,
- A
key-value
store, also referred to as a key-value database, is a non-relational database. Each unique identifier is stored as a key with its associated value. - This data pairing is known as a
key-value
pair. Cassandra
:Reads
are more expensive than writes.Writes
are appended to a commit log and written to an in memory structure called a memtable that is eventually flushed to disk.Reads
, however, need to query the memtable and potentially multiple SSTables (on-disk files), a more expensive operation
Data partitioning
is a technique used to distribute data across multiple machines. It is used to scale the storage and retrieval of data.Data partitioning
can be achived byconsistent hashing
- Since data is replicated at multiple nodes, it must be synchronized across replicas.
Quorum consensus
can guarantee consistency for both read and write operations. Let us establish a few definitions first.N
= The number ofreplicas
W
= Awrite quorum
of sizeW
. For awrite
operation to be considered as successful, write operation must be acknowledged fromW
replicas.R
= Aread quorum
of sizeR
. For aread
operation to be considered as successful, read operation must wait for responses from at leastR
replicas.
- The configuration of
W
, R andN
is a typical tradeoff between latency and consistency- If
W = 1 or R = 1
, anoperation is returned quickly
because a coordinator only needs to wait for a response from any of the replicas. - If
W or R > 1
, the system offersbetter consistency
; however, thequery will be slower
because the coordinator must wait for the response from the slowest replica. - If
W + R > N
,strong consistency
is guaranteed because there must be at least one overlapping node that has the latest data to ensure consistency.
- If
- How to configure N, W, and R to fit our use cases? Here are some of the possible setups:
- If
R = 1 and W = N
, the system is optimized for afast read
. - If
W = 1 and R = N
, the system is optimized forfast write
. - If
W + R > N
, strongconsistency
isguaranteed
(Usually N = 3, W = R = 2). - If
W + R <= N
, strongconsistency
isnot guaranteed
.
- If
- Consistency models:
Strong consistency
: any read operation returns a value corresponding to the result of the most updated write data item. A client never sees out-of-date data.Weak consistency
: subsequent read operations may not see the most updated value.Eventual consistency
: this is a specific form of weak consistency. Given enough time, all updates are propagated, and all replicas are consistent.
- If a server is unavailable due to network or server failures, another server will process requests temporarily.
- When the down server is up, changes will be pushed back to achieve data consistency.
- This process is called
hinted handoff
. - Since
s2
is unavailable, reads and writes will be handled bys3
temporarily. - When
s2
comes back online,s3
will hand the data back tos2
.
- What if a replica is permanently unavailable?
- To handle such a situation, we implement an
anti-entropy protocol
to keep replicas in sync. - Anti-entropy involves comparing each piece of data on replicas and updating each replica to the newest version.
- A
Merkle tree
is used for inconsistency detection and minimizing the amount of data transferred
- Data center outage could happen due to power outage, network outage, natural disaster, etc.
- To build a system capable of handling data center outage, it is important to replicate data across multiple data centers.
- Even if a data center is completely offline, users can still access data through the other data centers.
- Main features of the architecture are listed as follows:
- Clients communicate with the key-value store through simple APIs: get(key) and put(key, value).
- A coordinator is a node that acts as a proxy between the client and the key-value store.
- Nodes are distributed on a ring using consistent hashing.
- The system is completely decentralized so adding and moving nodes can be automatic.
- Data is replicated at multiple nodes.
- There is no single point of failure as every node has the same set of responsibilities
- When data is written to
Cassandra
, it is first written to thecommit log
. Subsequently, the data is asynchronously written to thememtable
(in-memory storage) and theSSTable
(sorted file data on disk). - The commit log provides data durability by retaining information about each write operation, even if the data hasn't been completely written to the memtable and SSTable.
- During a Cassandra restart or in case of a failure, the commit log is used to recover data that may have been lost during the failure. It allows Cassandra to replay the data from the commit log and update the memtable and SSTable accordingly.
- In summary, the commit log is a crucial component of Cassandra's data durability mechanism, ensuring the reliability of data in case of system failures.
- The write request is persisted on a commit log file.
log
- this is a linked list data-structute. Where adding a new entry isO(1)
and reading isO(n)
.
- Data is saved in the memory cache.
- When the memory cache is full or reaches a predefined threshold, data is flushed to SSTable on disk.
- A sorted-string table (SSTable) is a sorted list of <key, value> pairs, contains sorted keys from log
- The system first checks if data is in memory -
memtable
. If not, go to step 2.- Data in Cassandra
memtables
is stored in a highly optimized, columnar format, which allows for efficient read and write operations. This format is designed to maximize performance and minimize disk I/O, which is essential for Cassandra's high-throughput, low-latency, and distributed architecture.
- Data in Cassandra
- If data is not in memory, the system checks the
bloom filter
Bloom filter
is a probabilistic data structure that is used to test whether an element is a member of a set.
- The bloom filter is used to figure out which SSTables might contain the key.
SSTables
return the result of the data set.- The result of the data set is returned to the client.
- It's important to note that the in-memory data structures and the on-disk storage (SSTables) in Cassandra can have different formats and structures. The on-disk format is designed for durability and efficient storage, while the in-memory format (memtable) is optimized for quick access and updates.
Candidate
: What are the characteristics of unique IDs?Interviewer
: IDs must be unique and sortable.Candidate
: For each new record, does ID increment by 1?Interviewer
: The ID increments by time but not necessarily only increments by 1. IDs created in the evening are larger than those created in the morning on the same day.Candidate
: Do IDs only contain numerical values?Interviewer
: Yes, that is correct.Candidate
: What is the ID length requirement?Interviewer
: IDs should fit into 64-bit.Candidate
: What is the scale of the system?Interviewer
: The system should be able to generate 10,000 IDs per second.
- IDs must be unique.
- IDs are numerical values only.
- IDs fit into 64-bit.
- IDs are ordered by date.
- Ability to generate over 10,000 unique IDs per second.
Multi-master replication
is a replication strategy where all nodes are equal and can accept writes.- This approach uses the databases
auto_increment
feature. Instead of increasing the next ID by 1, we increase it byk
, wherek
is the number of database servers in use. - Next ID to be generated is equal to the previous ID in the same server plus
2
. - This solves some scalability issues because IDs can scale with the number of database servers.
- However, this strategy has some major drawbacks:
- Hard to scale with multiple data centers
- IDs do not go up with time across multiple servers.
- It does not scale well when a server is added or removed.
UUID
is another easy way to obtain unique IDs. UUID is a 128-bit number used to identify information in computer systems.UUID
has a very low probability of getting collusion.- In this design, each web server contains an ID generator, and a web server is responsible for enerating IDs independently.
- Pros:
- Generating
UUID
is simple. No coordination between servers is needed so there will not be any synchronization issues. - The system is easy to scale because each web server is responsible for generating IDs they consume. ID generator can easily scale with web servers.
- Generating
- Cons:
- IDs are 128 bits long, but our requirement is 64 bits.
- IDs do not go up with time.
- IDs could be non-numeric.
Ticket server
is a centralized service that generates unique IDs.- The idea is to use a centralized
auto_increment
feature in a single database server. - Pros:
- Numeric IDs.
- It is easy to implement, and it works for small to medium-scale applications.
- Cons:
- Single point of failure. Single ticket server means if the ticket server goes down, all systems that depend on it will face issues.
- To avoid a single point of failure, we can set up multiple ticket servers.
- However, this will introduce new challenges such as data synchronization
- Sections:
Sign bit
: 1 bit. It will always be 0. This is reserved for future uses. It can potentially be used to distinguish between signed and unsigned numbers.Timestamp
: 41 bits. Milliseconds since the epoch or custom epoch. We use Twitter snowflake default epoch 1288834974657, equivalent to Nov 04, 2010, 01:42:54 UTC.Datacenter ID
: 5 bits, which gives us 2 ^ 5 = 32 datacenters.Machine ID
: 5 bits, which gives us 2 ^ 5 = 32 machines per datacenter.Sequence number
: 12 bits. For every ID generated on that machine/process, the sequence number is incremented by 1. The number is reset to 0 every millisecond.
Datacenter IDs
andmachine IDs
are chosen at the startup time, generally fixed once the system is up runningTimestamp
:- As timestamps grow with time, IDs are sortable by time.
- The maximum timestamp that can be represented in
41
bits is 2 ^ 41 - 1 =2199023255551
milliseconds (ms), which gives us: ~69
years2199023255551
ms /1000
seconds /365
days /24
hours /3600
seconds
Sequence number
- Sequence number is
12
bits, which give us 2 ^ 12 =4096
combinations. - This field is
0
unless more than one ID is generated in a millisecond on the same server. - In theory, a machine can support a maximum of
4096
new IDs per millisecond.
- Sequence number is
Candidate
: Can you give an example of how a URL shortener work?Interviewer
: Assume URL https://www.systeminterview.com/q=chatsystem&c=loggedin&v=v3&l=long is the original URL. Your service creates an alias with shorter length: https://tinyurl.com/y7keocwj. If you click the alias, it redirects you to the original URL.Candidate
: What is the traffic volume?Interviewer
: 100 million URLs are generated per day.Candidate
: How long is the shortened URL?Interviewer
: As short as possible.Candidate
: What characters are allowed in the shortened URL?Interviewer
: Shortened URL can be a combination of numbers (0-9) and characters (a-z, AZ).Candidate
: Can shortened URLs be deleted or updated?Interviewer
: For simplicity, let us assume shortened URLs cannot be deleted or updated
- POST
api/v1/data/shorten
- request parameter: {
longUrl
: longURLString} - return shortURL
- request parameter: {
- GET
api/v1/shortUrl
- Return longURL for HTTP redirection
- Once the server receives a tinyurl request, it changes the short URL to the long URL with
301
redirect.
- A
301
redirect shows that the requested URL ispermanently
moved to the long URL. Since it is permanently redirected, the browser caches the response, and subsequent requests for the same URL will not be sent to the URL shortening service. - A
302
redirect means that the URL istemporarily
moved to the long URL, meaning that subsequent requests for the same URL will be sent to the URL shortening service first
- The
hashValue
consists of characters from [0-9, a-z, A-Z], containing10 + 26 + 26
=62
possible characters. - When
n
= 7,62^n
=~3.5 trillion
, 3.5 trillion is more than enough to hold 365 billion URLs, so the length of hashValue is7
.
longURL
is the input.- The system checks if the
longURL
is in the database. - If it is, it means the
longURL
was converted toshortURL
before. In this case, fetch theshortURL
from the database and return it to the client. - If not, the
longURL
is new. A new unique ID (primary key) Is generated by the unique ID generator. - Convert the ID to
shortURL
withbase 62
conversion. - Create a new database row with the
ID
,shortURL
, andlongURL
.
- Concrete example.
- Assuming the input longURL is: https://en.wikipedia.org/wiki/Systems_design
- Unique ID generator returns ID:
2009215674938
. - Convert the ID to shortURL using the
base 62
conversion. ID (2009215674938) is converted tozn9edcu
- Save
ID
,shortURL
, andlongURL
to the database
- As there are more reads than writes,
<shortURL, longURL>
mapping is stored in a cache to improve performance.
- The flow of URL redirecting is summarized as follows:
- A user clicks a short URL link: https://tinyurl.com/zn9edcu
- The load balancer forwards the request to web servers.
- If a
shortURL
is already in the cache, return thelongURL
directly. - If a
shortURL
is not in the cache, fetch thelongURL
from the database. - If it is not in the database, it is likely a user entered an invalid
shortURL
. - The
longURL
is returned to the user
- A web crawler is known as a robot or spider.
- It is widely used by search engines to discover new or updated content on the web.
- A crawler is used for many purposes:
Search engine indexing
: crawler collects web pages to create a local index for search enginesWeb archiving
: This is the process of collecting information from the web to preserve data for future usesWeb mining
: Web mining helps to discover useful knowledge from the internetWeb monitoring
: The crawlers help to monitor copyright and trademark infringements over the Internet
Locality
:- Distribute crawl servers geographically. When crawl servers are closer to website hosts, crawlers experience faster download time
Timeouts
:- Some web servers respond slowly or may not respond at all. To avoid long wait time, a maximal wait time is specified.
- If a host does not respond within a predefined time, the crawler will stop the job and crawl some other pages.
Candidate
: What is the main purpose of the crawler? Is it used for search engine indexing, data mining, or something else?Interviewer
: Search engine indexing.Candidate
: How many web pages does the web crawler collect per month?Interviewer
: 1 billion pages.Candidate
: What content types are included? HTML only or other content types such as PDFs and images as well?Interviewer
: HTML only.Candidate
: Shall we consider newly added or edited web pages?Interviewer
: Yes, we should consider the newly added or edited web pages.Candidate
: Do we need to store HTML pages crawled from the web?Interviewer
: Yes, up to 5 yearsCandidate
: How do we handle web pages with duplicate content?Interviewer
: Pages with duplicate content should be ignored.
- Seed URLs are the URLs that are used to start the crawling process.
- A good seed URL serves as a good starting point that a crawler can utilize to traverse as many links as possible
- The first proposed approach is based on locality as different countries may have different popular websites.
- Another way is to choose seed URLs based on topics; for example, we can divide URL space into shopping, sports, healthcare, etc.
- Seed URL selection is an open-ended question
- Most modern web crawlers split the crawl state into two:
to be downloaded
andalready downloaded
. - The component that stores URLs
to be downloaded
is called the URL Frontier. - You can refer to this as a First-in-First-out (FIFO)
queue
- The HTML downloader downloads web pages from the internet.
- Those URLs are provided by the
URL Frontier
- To achieve high performance, crawl jobs are distributed into multiple servers
- The URL space is partitioned into smaller pieces; so, each downloader is responsible for a subset of the URLs
- To download a web page, a URL must be translated into an
IP address
. - The
HTML Downloader
calls theDNS Resolver
to get the correspondingIP
address for theURL
DNS Resolver
is a bottleneck for crawlers because DNS requests might take time due to the synchronous nature of many DNS interfaces- Maintaining our DNS cache to avoid calling DNS frequently is an effective technique for speed optimization.
- DNS cache keeps the domain name to IP address mapping and is updated periodically by cron jobs.
- After a web page is downloaded, it must be parsed and validated because malformed web pages could provoke problems and waste storage space.
- Implementing a content parser in a crawl server will slow down the crawling process
- It helps to detect new content previously stored in the system.
- An efficient way to accomplish this task is to compare the hash values of the two web pages
- It is a storage system for storing HTML content.
- The choice of storage system depends on factors such as data type, data size, access frequency, life span, etc.
- Both disk and memory are used:
- Most of the content is stored on disk because the data set is too big to fit in memory.
- Popular content is kept in memory to reduce latency.
- URL Extractor parses and extracts links from HTML pages
- The URL filter excludes certain content types, file extensions, error links and URLs in
blacklisted
sites.
URL Seen
is a data structure that keeps track of URLs that are visited before or already in theFrontier
.URL Seen
helps to avoid adding the same URL multiple times as this can increase server load and cause potential infinite loops.- Bloom filter and hash table are common techniques to implement the
URL Seen
component.
- URL Storage stores already visited URLs.
- Step 1: Add
seed
URLs to theURL Frontier
- Step 2:
HTML Downloader
fetches a list of URLs fromURL Frontier
. - Step 3:
HTML Downloader
getsIP
addresses ofURLs
fromDNS resolver
and starts downloading. - Step 4:
Content Parser
parses HTML pages and checks if pages are malformed. - Step 5: After content is parsed and validated, it is passed to the
Content Seen
component. - Step 6:
Content Seen
component checks if a HTML page is already in the storage.- If it is in the storage, this means the same content in a different URL has already been processed. In this case, the HTML page is discarded.
- If it is not in the storage, the system has not processed the same content before. The
content is passed to
Link Extractor
.
- Step 7:
Link extractor
extracts links from HTML pages. - Step 8: Extracted links are passed to the
URL filter
. - Step 9: After links are filtered, they are passed to the
URL Seen
component. - Step 10:
URL Seen
component checks if a URL is already in the storage, if yes, it is processed before, and nothing needs to be done. Step 11: If a URL has not been processed before, it is added to theURL Frontier
DFS
is usually not a good choice because the depth of DFS can be very deep.BFS
is commonly used by web crawlers and is implemented by a first-in-first-out (FIFO) queue. In a FIFO queue, URLs are dequeued in the order they are enqueued.
- Generally, a web crawler should avoid sending too many requests to the same hosting server within a short period.
- The general idea of enforcing politeness is to download one page at a time from the same host.
- A delay can be added between two download tasks.
- Priority:
Prioritizer
: It takes URLs as input and computes the priorities.Queue f1 to fn
: Each queue has an assigned priority. Queues with high priority are selected with higher probability.Front queue selector
: Randomly choose a queue with a bias towards queues with higher priority.
- Politeness:
Back queue router
: It ensures that each queue (b1, b2, … bn) only contains URLs from the same host.Mapping table
: It maps each host to a queue.FIFO queues b1, b2 to bn
: Each queue contains URLs from the same host.Back queue selector
: Each worker thread is mapped to a FIFO queue, and it only downloads URLs from that queue. The queue selection logic is done by the Queue selector.Worker thread 1 to N
. A worker thread downloads web pages one by one from the same host. A delay can be added between two download tasks.
Candidate
: What types of notifications does the system support?Interviewer
: Push notification, SMS message, and email.Candidate
: Is it a real-time system?Interviewer
: Let us say it is a soft real-time system. We want a user to receive notifications as soon as possible. However, if the system is under a high workload, a slight delay is acceptable.Candidate
: What are the supported devices?Interviewer
: iOS devices, android devices, and laptop/desktop.Candidate
: What triggers notifications?Interviewer
: Notifications can be triggered by client applications. They can also be scheduled on the server-side.Candidate
: Will users be able to opt-out?Interviewer
: Yes, users who choose to opt-out will no longer receive notifications.Candidate
: How many notifications are sent out each day?Interviewer
: 10 million mobile push notifications, 1 million SMS messages, and 5 million emails.
- Push notification
- SMS message
Provider
. A provider builds and sends notification requests toApple Push Notification Service
(APNS). To construct a push notification, the provider provides the following data:Device token
: This is a unique identifier used for sending push notifications.Payload
: This is a JSON dictionary that contains a notification’s payload.
APNS
: This is a remote service provided by Apple to propagate push notifications to iOS devices.iOS Device
: It is the end client, which receives push notifications.
Android
adopts a similar notification flow. Instead of using APNs,Firebase Cloud Messaging
(FCM) is commonly used to send push notifications to android devices
- For
SMS
messages, third party SMS services likeTwilio
,Nexmo
, and many others are commonly used. Most of them are commercial services
- Although companies can set up their own email servers, many of them opt for commercial email services.
Sendgrid
andMailchimp
are among the most popular email services, which offer a better delivery rate and data analytics.
- A
service
calls APIs provided by notification servers to send notifications. Notification servers
fetch metadata such as user info, device token, and notification setting from the cache or database.- A notification event is sent to the corresponding queue for processing. For instance, an iOS push notification event is sent to the iOS PN queue.
Workers
pull notification events from message queues.Workers
send notifications to third party services.Third-party
services send notifications to user devices.
- To satisfy this requirement, the notification system persists notification data in a database and implements a retry mechanism
- The short answer is no. Although notification is delivered exactly once most of the time, the distributed nature could result in duplicate notifications.
- To reduce the duplication occurrence, we introduce a dedupe mechanism and handle each failure case carefully.
- Here is a simple dedupe logic:
- When a notification event first arrives, we check if it is seen before by checking the event ID.
- If it is seen before, it is discarded. Otherwise, we will send out the notification
- A large notification system sends out millions of notifications per day, and many of these notifications follow a similar format.
- Notification templates are introduced to avoid building every notification from scratch.
- A notification template is a preformatted notification to create your unique notification by customizing parameters, styling, tracking links, etc.
- Users generally receive way too many notifications daily and they can easily feel overwhelmed.
- Thus, many websites and apps give users fine-grained control over notification settings
- To avoid overwhelming users with too many notifications, we can limit the number of notifications a user can receive.
- This is important because receivers could turn off notifications completely if we send too often.
- When a third-party service fails to send a notification, the notification will be added to the message queue for retrying.
- If the problem persists, an alert will be sent out to developers.
- For iOS or Android apps,
appKey
andappSecret
are used to secure push notification APIs - Only authenticated or verified clients are allowed to send push notifications using our APIs
- A key metric to monitor is the total number of queued notifications.
- If the number is large, the notification events are not processed fast enough by workers.
- To avoid delay in the notification delivery, more workers are needed
Candidate
: Is this a mobile app? Or a web app? Or both?Interviewer
: BothCandidate
: What are the important features?Interview
: A user can publish a post and see her friends’ posts on the news feed page.Candidate
: Is the news feed sorted by reverse chronological order or any particular order such as topic scores? For instance, posts from your close friends have higher scores.Interviewer
: To keep things simple, let us assume the feed is sorted by reverse chronological order.Candidate
: How many friends can a user have?Interviewer
: 5000Candidate
: What is the traffic volume?Interviewer
: 10 million DAUCandidate
: Can feed contain images, videos, or just text?Interviewer
: It can contain media files, including both images and videos.
- To publish a post, a HTTP POST request will be sent to the server. The API is shown below:
- POST
/v1/me/feed
- Params:
content
: content is the text of the post.auth_token
: it is used to authenticate API requests.
- POST
- The API to retrieve news feed is shown below:
- GET
/v1/me/feed
- Params:
auth_token
: it is used to authenticate API requests.
- GET
User
: a user can view news feeds on a browser or mobile app. A user makes a posy through API:/v1/me/feed?content=Hello&auth_token={auth_token}
Load balancer
: distribute traffic to web servers.Web servers
: web servers redirect traffic to different internal services.Post service
: persist post in the database and cache.Fanout service
: push new content to friends’ news feed. Newsfeed data is stored in the cache for fast retrieval.Notification service
: inform friends that new content is available and send out push notifications
The fanout service works as follows:
- Fetch friend IDs from the graph database.
Graph databases
are suited for managing friend relationship and friend recommendations. - Get friends info from the user cache. The system then filters out friends based on user settings. For example, if you mute someone, her posts will not show up on your news feed even though you are still friends. Another reason why posts may not show is that a user could selectively share information with specific friends or hide it from other people.
- Send friends list and new post ID to the
message queue
. - Fanout workers fetch data from the message queue and store news feed data in the news feed cache. You can think of the news feed cache as a <post_id, user_id> mapping table. The memory consumption can become very large if we store the entire user and post objects in the cache. Thus, only IDs are stored. To keep the memory size small, we set a configurable limit. The chance of a user scrolling through thousands of posts in news feed is slim. Most users are only interested in the latest content, so the cache miss rate is low.
- Store <post_id, user_id > in news feed cache.
- With this approach, news feed is pre-computed during write time. A new post is delivered to friends’ cache immediately after it is published.
- Pros:
- The news feed is generated in real-time and can be pushed to friends immediately.
- Fetching news feed is fast because the news feed is pre-computed during write time.
- Cons:
- If a user has many friends, fetching the friend list and generating news feeds for all of them are slow and time consuming. It is called hotkey problem.
- For inactive users or those rarely log in, pre-computing news feeds waste computing resources.
- The news feed is generated during read time. This is an on-demand model. Recent posts are pulled when a user loads her home page.
- Pros:
- For inactive users or those who rarely log in, fanout on read works better because it will not waste computing resources on them.
- Data is not pushed to friends so there is no hotkey problem.
- Cons:
- Fetching the news feed is slow as the news feed is not pre-computed.
- media content (images, videos, etc.) are stored in CDN for fast retrieval
- A user sends a request to retrieve her news feed. The request looks like this:
/v1/me/feed
- The load balancer redistributes requests to web servers.
- Web servers call the news feed service to fetch news feeds.
- News feed service gets a list post IDs from the news feed cache.
- A user’s news feed is more than just a list of feed IDs. It contains username, profile picture, post content, post image, etc. Thus, the news feed service fetches the complete user and post objects from caches (user cache and post cache) to construct the fully hydrated news feed.
- The fully hydrated news feed is returned in JSON format back to the client for rendering.
Candidate
: What kind of chat app shall we design? 1 on 1 or group based?Interviewer
: It should support both 1 on 1 and group chat.Candidate
: Is this a mobile app? Or a web app? Or both?Interviewer
: Both.Candidate
: What is the scale of this app? A startup app or massive scale?Interviewer
: It should support 50 million daily active users (DAU).Candidate
: For group chat, what is the group member limit?Interviewer
: A maximum of 100 peopleCandidate
: What features are important for the chat app? Can it support attachment?Interviewer
: 1 on 1 chat, group chat, online indicator. The system only supports text messages.Candidate
: Is there a message size limit?Interviewer
: Yes, text length should be less than 100,000 characters long.Candidate
: Is end-to-end encryption required?Interviewer
: Not required for now but we will discuss that if time allows.Candidate
: How long shall we store the chat history?Interviewer
: Forever
Polling
Long polling
Websocket
- Polling is a technique that the client periodically asks the server if there are messages available.
- Depending on polling frequency, polling could be costly.
- It could consume precious server resources to answer a question that offers no as an answer most of the time.
- In long polling, a client holds the connection open until there are actually new messages available or a timeout threshold has been reached.
- Once the client receives new messages, it immediately sends another request to the server, restarting the process.
WebSocket
connection is initiated by the client. It isbi-directional
and persistent.- It starts its life as a HTTP connection and could be “upgraded” via some well-defined handshake to a WebSocket connection
- WebSocket is a
full-duplex protocol
. It allows both the client and the server to send messages at any time. - By using
WebSocket
for both sending and receiving, it simplifies the design and makes implementation on both client and server more straightforward.
Stateless services
- Stateless services are traditional public-facing request/response services, used to manage the login, signup, user profile, etc. These are common features among many websites and apps.
Stateful Service
- The only stateful service is the chat service.
- The service is stateful because each client maintains a persistent network connection to a chat server.
- In this service, a client normally does not switch to another chat server as long as the server is still available.
- The service discovery coordinates closely with the chat service to avoid server overloading.
- The client maintains a persistent
WebSocket
connection to a chat server for real-time messaging. Chat servers
facilitate message sending/receiving.Presence servers
manage online/offline status.API servers
handle everything including user login, signup, change profile, etc.Notification servers
send push notifications.- The
key-value store
is used to store chat history. When an offline user comes online, she will see all her previous chat history.
SQL
databases:- The first is generic data, such as user profile, setting, user friends list.
- These data are stored in robust and reliable relational databases.
- Replication and sharding are common techniques to satisfy availability and scalability requirements.
NoSQL
databases:- The second is chat history.
- Chat history is stored in a key-value store.
- The primary key is
message_id
, which helps to decide message sequence
- The composite primary key is (
channel_id
,message_id
). Channel
andgroup
represent the same meaning here.channel_id
is the partition key because all queries in a group chat operate in a channel.
- The primary role of
service discovery
is to recommend the best chat server for a client based on the criteria like geographical location, server capacity, etc
User A
tries to log in to the app.- The load balancer sends the login request to API servers.
- After the backend authenticates the user, service discovery finds the best chat server for C. In this
example,
server
2 is chosen and the server info is returned back toUser A
. User A
connects to chat server 2 throughWebSocket
- Each device maintains a variable called
cur_max_message_id
, which keeps track of the latest message ID on the device. - Messages that satisfy the following two conditions are considered as news messages:
- The recipient ID is equal to the currently logged-in user ID.
- Message ID in the key-value store is larger than
cur_max_message_id
- First, the message from
User A
is copied to each group member’s message sync queue: one forUser B
and the second forUser C
. - You can think of the message sync queue as an inbox for a recipient
- This design choice is good for small group chat because:
- it simplifies message sync flow as each client only needs to check its own inbox to get new messages.
- when the group number is small, storing a copy in each recipient’s inbox is not too expensive.
- After a WebSocket connection is built between the client and the real-time service,
user A’s
online status andlast_active_at
timestamp are saved in the KV store. - Presence indicator shows the user is online after she logs in.
- The online status is changed to offline in the KV store. The presence indicator shows a user is offline
- We introduce a
heartbeat
mechanism to solve this problem. - Periodically, an online client sends a heartbeat event to presence servers.
- If presence servers receive a heartbeat event within a certain time, say x seconds from the client, a user is considered as online.
- Otherwise, it is offline
- Presence servers use a
publish-subscribe model
, in which each friend pair maintains a channel. - When
User A’s
online status changes, it publishes the event to three channels, channelA-B
,A-C
, andA-D
. - Those three channels are subscribed by User
B
,C
, andD
- Thus, it is easy for friends to get online status updates.
- The communication between clients and servers is through real-time
WebSocket
- End-to-end encryption is a way to secure communications such that only the sender and the intended recipient(s) can access the content of the message.
- This is achieved by encrypting the message on the sender's device and decrypting it on the recipient's device, without any intermediate parties being able to access the content.
- Here's how end-to-end encryption works for messages:
- Sender and recipient(s) agree on a
secret key
that will be used for encryption and decryption. The sender's device
encrypts the message using thissecret key
.- This can be done using a variety of encryption algorithms such as :
AES
(Advanced Encryption Standard)RSA
(Rivest-Shamir-Adleman)ECC
(Elliptic Curve Cryptography)
- The encrypted message is then transmitted over a network or communication channel.
The recipient's device
receives the encrypted message and uses the same secret key to decrypt it.- The decrypted message is then displayed to the recipient, who can read it.
- Sender and recipient(s) agree on a
- The key point to remember is that the message is encrypted and decrypted at the endpoints (i.e., the sender and recipient devices), and the encryption key is not shared with any intermediaries or servers along the way.
- This means that even if the message is intercepted or hacked, it cannot be read without the secret key, which is known only to the sender and recipient.
Candidate
: Is the matching only supported at the beginning of a search query or in the middle as well?Interviewer
: Only at the beginning of a search query.Candidate
: How many autocomplete suggestions should the system return?Interviewer
: 5Candidate
: How does the system know which 5 suggestions to return?Interviewer
: This is determined by popularity, decided by the historical query frequency.Candidate
: Does the system support spell check?Interviewer
: No, spell check or autocorrect is not supported.Candidate
: Are search queries in English?Interviewer
: Yes. If time allows at the end, we can discuss multi-language support.Candidate
: Do we allow capitalization and special characters?Interviewer
: No, we assume all search queries have lowercase alphabetic characters.Candidate
: How many users use the product?Interviewer
: 10 million DAU.
Fast response time
: As a user types a search query, autocomplete suggestions must show up fast enough. An article about Facebook’s autocomplete system reveals that the system needs to return results within 100 milliseconds. Otherwise it will cause stuttering.Relevant
: Autocomplete suggestions should be relevant to the search term.Sorted
: Results returned by the system must be sorted by popularity or other ranking models.Scalable
: The system can handle high traffic volume.Highly available
: The system should remain available and accessible when part of the system is offline, slows down, or experiences unexpected network errors.
- To avoid traversing the whole trie, we store
top k most frequently
used queries at each node. - Since 5 to 10 autocomplete suggestions are enough for users,
k
is a relatively small number. - By caching top search queries at every node, we significantly reduce the time complexity to retrieve the top 5 queries.
- However, this design requires a lot of space to store top queries at every node.
- Trading space for time is well worth it as fast response time is very important.
Time complexity
onlyO(1)
to fetchtop k
queries.
- To design a scalable data gathering service, we examine where data comes from and how data is used.
- Real-time applications like Twitter require up to date autocomplete suggestions.
- However, autocomplete suggestions for many Google keywords might not change much on a daily basis.
- Despite the differences in use cases, the underlying foundation for data gathering service remains the same because data used to build the trie is usually from analytics or logging services.
- It stores raw data about search queries
- Logs are append-only and are not indexed
- The size of analytics logs is usually very large, and data is not in the right format.
- We need to aggregate data so it can be easily processed by our system.
- Depending on the use case, we may aggregate data
differently
. For real-time applications
such as Twitter, we aggregate data in a shorter time interval as real-time results are important.- On the other hand, aggregating data less frequently, say once per week, might be good enough for many use cases.
- During an interview session, verify whether real-time results are important.
time
field represents the start time of a week.frequency
field is the sum of the occurrences for the corresponding query in that week.
- Workers are a set of servers that perform asynchronous jobs at regular intervals.
- They build the trie data structure and store it in
Trie DB
- Trie Cache is a distributed cache system that keeps trie in memory for fast read.
- It takes a weekly snapshot of the DB.
- Trie DB is the persistent storage. Two options are available to store the data:
Document store
: Since a new trie is built weekly, we can periodically take a snapshot of it, serialize it, and store the serialized data in the database. Document stores likeMongoDB
are good fits for serialized data.Key-value store
: A trie can be represented in ahash table
form by applying the following logic:- Every prefix in the trie is mapped to a key in a hash table
- Data on each trie node is mapped to a value in a hash table
Trie
is created by workers using aggregated data.- The source of data is from
Analytics Log/DB
.
- Update the trie
weekly
. Once a new trie is created, the new trie replaces the old one - Update individual trie node
directly
(slow)
- We have to remove hateful, violent, sexually explicit, or dangerous autocomplete suggestions.
- We add a
filter layer
in front of theTrie Cache
to filter out unwanted suggestions. - Having a filter layer gives us the flexibility of removing results based on different filter rules.
- Unwanted suggestions are removed physically from the database asynchronically so the correct data set will be used to build trie in the next update cycle.
- A search query is sent to the
load balancer
. - The load balancer routes the request to
API servers
. - API servers get trie data from Trie Cache and construct autocomplete suggestions for the client.
- In case the data is not in
Trie Cache
, we replenish data back to the cache. This way, all subsequent requests for the same prefix are returned from the cache. A cache miss can happen when a cache server is out of memory or offline.
AJAX request
- no need to reload the pageBrowser caching
- autocomplete suggestions can be saved in browser cache to allow subsequent requests to get results from the cache directly.
- The
shard map manager
maintains a lookup database for identifying where rows should be stored. - For example, if there are a similar number of historical queries for ‘s’ and for ‘u’, ‘v’, ‘w’, ‘x’, ‘y’ and ‘z’ combined, we can maintain two shards: one for ‘s’ and one for ‘u’ to ‘z’.
- How do you extend your design to support multiple languages?
- To support other non-English queries, we store Unicode characters in trie nodes
- What if top search queries in one country are different from others?
- We might build different tries for different countries.
- To improve the response time, we can store tries in CDNs.
- How can we support the trending (real-time) search queries?
- Reduce the working data set by sharding.
- Change the ranking model and assign more weight to recent search queries.
- Data may come as streams, so we do not have access to all the data at once
- Stream processing requires a different set of systems:
- Apache Hadoop MapReduce
- Apache Spark Streaming
- Apache Storm
- Apache Kafka
Candidate
: What features are important?Interviewer
: Ability to upload a video and watch a video.Candidate
: What clients do we need to support?Interviewer
: Mobile apps, web browsers, and smart TV.Candidate
: How many daily active users do we have?Interviewer
: 5 millionCandidate
: What is the average daily time spent on the product?Interviewer
: 30 minutes.Candidate
: Do we need to support international users?Interviewer
: Yes, a large percentage of users are international users.Candidate
: What are the supported video resolutions?Interviewer
: The system accepts most of the video resolutions and formats.Candidate
: Is encryption required?Interviewer
: YesCandidate
: Any file size requirement for videos?Interviewer
: Our platform focuses on small and medium-sized videos. The maximum allowed video size is 1GB.Candidate
: Can we leverage some of the existing cloud infrastructures provided by Amazon, Google, or Microsoft?Interviewer
: That is a great question. Building everything from scratch is unrealistic for most companies, it is recommended to leverage some of the existing cloud services.
- MPEG–DASH. MPEG stands for “Moving Picture Experts Group” and DASH stands for "Dynamic Adaptive Streaming over HTTP".
- Apple HLS. HLS stands for “HTTP Live Streaming”.
- Microsoft Smooth Streaming.
- Adobe HTTP Dynamic Streaming (HDS).
- Pre-signed upload URL is a URL that can be used to upload an object to a bucket.
- The client makes a HTTP request to API servers to fetch the
pre-signed URL
, which gives the access permission to the object identified in the URL. - The term
pre-signed URL
is used by uploading files to Amazon S3. - API servers respond with a
pre-signed URL
. - Once the client receives the response, it uploads the video using the
pre-signed URL
- To protect copyrighted videos, we can adopt one of the following three safety options:
Digital rights management
(DRM) systems: Three major DRM systems are Apple FairPlay, Google Widevine, and Microsoft PlayReady.AES encryption
: You can encrypt a video and configure an authorization policy. The encrypted video will be decrypted upon playback. This ensures that only authorized users can watch an encrypted video.Visual watermarking
: This is an image overlay on top of your video that contains identifying information for your video. It can be your company logo or company name.
- CDN is a crucial component of our system.
- It ensures fast video delivery on a global scale.
- However, from the back of the envelope calculation, we know CDN is expensive, especially when the data size is large
- Only serve the most popular videos from CDN and other videos from our high capacity
- For less popular content, we may not need to store many encoded video versions. Short videos can be encoded on-demand.
- Some videos are popular only in certain regions. There is no need to distribute these videos to other regions.
- Build your own CDN
Upload error
: retry a few times.Split video error
: if older versions of clients cannot split videos by GOP alignment, the entire video is passed to the server. The job of splitting videos is done on the server-side.Transcoding error
: retry.Preprocessor error
: regenerate DAG diagram.DAG scheduler error
: reschedule a task.Resource manager queue down
: use a replica.Task worker down
: retry the task on a new worker.API server down
: API servers are stateless so requests will be directed to a different API server.Metadata cache server down
: data is replicated multiple times. If one node goes down, you can still access other nodes to fetch data. We can bring up a new cache server to replace the dead one.Metadata DB server down
:Master is down
. If the master is down, promote one of the slaves to act as the new master.Slave is down
. If a slave goes down, you can use another slave for reads and bring up another database server to replace the dead one.
Candidate
: What are the most important features?Interviewer
: Upload and download files, file sync, and notifications.Candidate
: Is this a mobile app, a web app, or both?Interviewer
: Both.Candidate
: What are the supported file formats?Interviewer
: Any file type.Candidate
: Do files need to be encrypted?Interview
: Yes, files in the storage must be encrypted.Candidate
: Is there a file size limit?Interview
: Yes, files must be 10 GB or smaller.Candidate
: How many users does the product have?Interviewer
: 10M DAU.
- Two types of uploads are supported:
Simple upload
. Use this upload type when the file size is small.Resumable upload
. Use this upload type when the file size is large and there is high chance of network interruption
- Here is an example of resumable upload API:
- https://api.example.com/files/upload?uploadType=resumable
- Params:
uploadType
:resumabledata
: Local file to be uploaded
- Example API: https://api.example.com/files/download
- Params:
path
: download file path.
- The
first version
that gets processedwins
, and the version that gets processed later receives a conflict
User
: A user uses the application either through a browser or mobile app.Block servers
: Block servers upload blocks to cloud storage. Block storage, referred to as block-level storage, is a technology to store data files on cloud-based environments. A file can be split into several blocks, each with a unique hash value, stored in our metadata database. Each block is treated as an independent object and stored in our storage system (S3). To reconstruct a file, blocks are joined in a particular order. As for the block size, we use Dropbox as a reference: it sets the maximal size of a block to 4MB [6].Cloud storage
: A file is split into smaller blocks and stored in cloud storage.Cold storage
: Cold storage is a computer system designed for storing inactive data, meaning files are not accessed for a long time.Load balancer
: A load balancer evenly distributes requests among API servers.API servers
: These are responsible for almost everything other than the uploading flow. API servers are used for user authentication, managing user profile, updating file metadata, etc.Metadata database
: It stores metadata of users, files, blocks, versions, etc. Please note that files are stored in the cloud and the metadata database only contains metadata.Metadata cache
: Some of the metadata are cached for fast retrieval.Notification service
: It is a publisher/subscriber system that allows data to be transferred from notification service to clients as certain events happen. In our specific case, notification service notifies relevant clients when a file is added/edited/removed elsewhere so they can pull the latest changes.Offline backup queue
: If a client is offline and cannot pull the latest file changes, the offline backup queue stores the info so changes will be synced when the client is online
- Block servers process files passed from clients by splitting a file into blocks, compressing each block, and encrypting them.
- A file is split into smaller blocks.
- Each block is compressed using compression algorithms.
- To ensure security, each block is encrypted before it is sent to cloud storage.
- Blocks are uploaded to the cloud storage.
User
: The user table contains basic information about the user such as username, email, profile photo, etc.Device
: Device table stores device info. Push_id is used for sending and receiving mobile push notifications. Please note a user can have multiple devices.Namespace
: A namespace is the root directory of a user.File
: File table stores everything related to the latest file.File_version
: It stores version history of a file. Existing rows are read-only to keep the integrity of the file revision history.Block
: It stores everything related to a file block. A file of any version can be reconstructed by joining all the blocks in the correct order.
Load balancer failure
: If a load balancer fails, the secondary would become active and pick up the traffic. Load balancers usually monitor each other using a heartbeat, a periodic signal sent between load balancers. A load balancer is considered as failed if it has not sent a heartbeat for some time.Block server failure
: If a block server fails, other servers pick up unfinished or pending jobs.Cloud storage failure
: S3 buckets are replicated multiple times in different regions. If files are not available in one region, they can be fetched from different regions.API server failure
: It is a stateless service. If an API server fails, the traffic is redirected to other API servers by a load balancer.Metadata cache failure
: Metadata cache servers are replicated multiple times. If one node goes down, you can still access other nodes to fetch data. We will bring up a new cache server to replace the failed one.Metadata DB failure
:Master down
: If the master is down, promote one of the slaves to act as a new master and bring up a new slave node.Slave down
: If a slave is down, you can use another slave for read operations and bring another database server to replace the failed one.
Notification service failure
: Every online user keeps a long poll connection with the notification server. If a server goes down, all the long poll connections are lost so clients must reconnect to a different server. Even though one server can keep many open connections, it cannot reconnect all the lost connections at once. Reconnecting with all the lost clients is a relatively slow process.Offline backup queue failure
: Queues are replicated multiple times. If one queue fails, consumers of the queue may need to re-subscribe to the backup queue.
VIP
- virtual IP
- Responsible for storing metadata such as topics, subscriptions in the database.
- Temporary storage is used to store messages that are not yet delivered to subscribers.
- Significantly smaller fraction of keys is rehashed when a new host is added or removed.
- It's ok to have not up-to-date data on the replicas, due to partition, because we are building
fast
cache, so it's being only a cache miss.
- The most important requirements (should be focused first):
scalability
performance
availability
orconsistency
- Others (should be focused later):
consistency
cost
- Clarify assumptions with interviewer what is better for the system
- Or combine both approaches
Normalization
- minimise a data duplication across different tables
Gossip protocol
is a process of computer peer-to-peer communication that is based on the way epidemics spread. It is a communication protocol in which nodes periodicallyexchange state
information about themselves and about other nodes they know about. Each node in the cluster knows about every other node in the cluster.Quorum writes/reads
define a minimum number of nodes that must agree on the response- NoSQL provides
availability
andpartition tolerance
(AP) guarantees, but no consistency guarantee
- How to ensure uniqueness of short URLs?
- Database will help. We first check if the generated short URL is already present in the database, and return a new URL to the client only when the uniqueness constraint is honored. To check URL presence in the database quickly and cheaply, we should utilize a Bloom filter, a memory-efficient probabilistic data structure to quickly (O(1)) check whether an element is present in a set.
- SQL or NoSQL database to store the mapping between long and short URLs?
- The URL shortening system is read-heavy. Any database that scales well for reads will work. And as we already know, both SQL databases (e.g. using solutions like Vitess) and NoSQL databases can scale reads very well. NoSQL database (e.g. MongoDB) will be a better choice. Why? Let’s discuss it in the second module of the course.
- How to retrieve long URLs quickly?
- The cache will store the mapping between short URLs and long URLs. We can use either cache-aside and read-through pattern for updating the cache. With read-through being a better option. As for the eviction policy, the Least Recently Used (LRU) eviction policy is a good choice.
- Should we have a separate component for handling read requests?
- Yes, we should introduce a separate web service for handling read requests. Let’s call it a Read service. Having a separate component will allow us scale read requests independently of write requests and we can make this service act as a read-through cache.
- How to scale read requests?
- We are going to use horizontally scalable distributed cache solutions (Memcached, Redis) and add more read replicas to the database when needed. As mentioned earlier, we can also utilize Bloom filters to reduce the load on both the cache and the database.
- How to properly handle large spikes or read requests (popular URLs)?
- We should utilize the local cache on Read service machines to store the same hot URL on multiple machines. This way, read requests can be handled by any of the Read service machines in the cluster, avoiding heavy load on one particular machine.
- How to achieve high availability?
- As we already know, achieving high availability requires both architecture and processes. In our system, each individual component should be highly available. Which means we have redundancy for every component in the system, heavily rely on load balancing, protect our components from atypical behaviors of clients and downstream dependencies, quickly detect and resolve failures.
- How to ensure high durability?
- By copying data at multiple different levels. Specifically, we rely on database data replication and backups.
- I think accuracy and speed are important for such systems. Here are a few functional requirements for these
characteristics:
- The system should take no more than 5 seconds to evaluate each transaction.
- If a transaction is flagged as
potentially fraudulent
, it should take no more than several hours to make the final decision on the transaction. - No more than X% (configurable, for example, 1%) should be flagged as
potentially fraudulent
by the system.
- High scalability (to support both long-term and short-term increases in incoming transactions).
- High availability (to be always up and running so that all transactions are evaluated and fraudulent transactions are not missed).
- High durability (transactions flagged as
fraudulent
should never be lost). - High extensibility (we should be able to integrate the system with additional data sources (e.g. user’s past payment history) that allow to more accurately evaluate transactions).
User
, a person who makes an online purchase.Fraud analyst
, a person who evaluates potentially fraudulent transactions to make a final decision.
API gateway
, a web service that represents an entry point to the system.Fraud analyst admin tool
, a web application that is used by fraud analysts to review potentially fraudulent transactions and make a decision.Data aggregator
, a web service that collects additional information about a transaction.Scoring service
, a web service that calculates a score for a transaction (think of this score as the likelihood that a transaction is fraudulent).Messaging system
, middleware that helps transfer potentially fraudulent transactions to fraud analysts in an asynchronous manner.Database
, a persistent storage for potentially fraudulent transactions.Machine learning model training pipeline
, a pipeline consisting of batch jobs that help train a machine learning model used by the scoring service.User profile service
, payment history service, geo service, a set of web services that provide additional information about a transaction.
API gateway
is a single entry point for all requests. It is a common component of many modern architectures. Classic API gateway performs many different functions: authentication, authorization, request routing, response aggregation, protocol translation, load balancing, TLS termination, IP listing, rate limiting, response caching, response compression, logging, monitoring, and a few more. We will cover API gateways in more detail in the second module of the course.Data aggregator
calls a bunch of other services to retrieve additional information about the transaction. We want to know more about who makes the transaction, how many transactions this user made in the past, whether the user’s IP address location matches shipping address, and so on. All auxiliary services are called in parallel to reduce total latency of transaction processing. Retrieved information will be sent to the scoring service to calculate the fraud score for this transaction.Scoring service uses machine learning models
to calculate a score for every transaction. If the score is below a predefined threshold, the transaction is considered non-fraudulent and the payment processing is initiated. If the score exceeds the threshold, the transaction is considered potentially fraudulent and is submitted to a fraud analyst for review. Payment processing is also initiated in this case, but later, if the transaction is flagged as fraudulent by the analyst, the payment will be canceled.Messaging system (e.g. Kafka)
is used to transfer potentially fraudulent transactions to persistent storage. This model helps to deal with spikes in fraudulent transactions. The messaging system will help absorb these spikes and the database will not be impacted.- All potentially fraudulent transactions are stored in the
database
. Both SQL and NoSQL databases can be used. Fraud analysts
use an admin tool (web application) to review transactions. They can either pass or fail every transaction. The admin tools can be quite sophisticated, allowing fraud analysts to view much more additional information about a transaction. It is important to implement a locking mechanism for transactions so that when a transaction is under review, other analysts should not see it.Model training pipeline
uses final decisions to train machine learning models. The pipeline consists of multiple batch jobs that are used to train, test, and deploy models to production.
- I think that such a system should be easy to use and integrate with. Here are a few functional requirements for these
characteristics:
- The system should allow to easily manage users, groups, roles, and policies.
- The system should provide an SDK for various programming languages so that other systems can easily integrate with it
- High availability (since every request must be authenticated/authorized).
- High performance (low latency when processing requests).
- High scalability (to support both long-term and short-term increases in incoming requests).
- Here are the key actors in the system:
- Resource admin, a person who manages policies.
- User, a person who tries to access a resource (e.g. read an S3 object).
- Services, programs (web services, applications, batch jobs, etc.) that try to access a resource (e.g. they produce and write logs to an S3 bucket).
API gateway
, a web service that represents an entry point to the system.Admin tool
, a web application that is used by resource administrators to perform CRUD operations on policies.Poller service
, a service responsible for retrieving all the recent updates for policies.Database
, persistent storage for policies.Auth service
, a web service responsible for handling authentication and authorization requests.Auth service client
, a client application for the auth service.Messaging system
, middleware that helps asynchronously load recently updated policies into a cache.
API gateway
is a single entry point for all write requests. It is a common component of many modern architectures. Classic API gateway performs many different functions: authentication, authorization, request routing, response aggregation, protocol translation, load balancing, TLS termination, IP listing, rate limiting, response caching, response compression, logging, monitoring, and a few more. You might be wondering if read (is authorized) requests should go through API gateway as well. It depends on whether the auth system is internal to target services or not. If the auth system and target services all live within a trusted environment, we can ignore this extra hop to reduce latency and overall complexity.Admin tool
acts as a facade for write requests to the database. It is a stateless web service. This chain of services: API gateway -> web service acting as a database facade -> database is a common pattern for microservice architecture.Policies
are stored in the database. Both SQL and NoSQL databases can be used. If we decide to avoid an additional caching layer and serve read requests directly by the database (which is fine for low to medium scale), we should scale read requests by introducing more read replicas.Poller service
is a component similar to Rule Checker. It polls for all the new and modified policies and sends this data to the messaging system. Alternatively, we could avoid this component entirely by having the Admin Tool write to both the database and the messaging system. A challenge with this option though is how to keep both the database and the messaging system in sync, especially when one of these systems becomes unavailable.- We use the
messaging system
as a transport for policies to update the cache (auth service) asynchronously. Auth service
represents a distributed cache. This allows to handle a large number of read requests. All policies are parsed and stored in a way that guarantees O(1) time complexity (think, hashmaps). And we use servers with a lot of CPU and memory.- If the scale is huge (millions requests per second) and/or latency requirements are very aggressive (a single digit milliseconds), server side alone (auth service) is not enough. We also need to cache aggressively on the client side. In the case of multiple requests coming from the same user, we don't even need to call the auth service. A simple read-through cache implemented in the auth client will be enough. A further optimization is to use the target service fleet of machines as a consistent hashing ring. When all auth clients know about each other, and every client’s local cache stores its own chunk of data. Basically, in this case, we have three layers of caching: a local cache in the client, a distributed cache represented by all clients in the target service fleet, and a distributed cache represented by the auth service.
- Here are a few functional requirements for these characteristics:
- The system should store metrics at one-minute granularity.
- Older data should be stored at lower granularity (e.g. 1 hour).
- The system should provide an SDK for various programming languages and/or a software agent so that other systems can easily integrate with it.
- High scalability (the total number of metrics will grow rapidly over time).
- High availability (keep in mind that latest data is typically more important than old data).
- High read performance (especially important for high resolution alerts configured on top of metrics).
- Low dollar cost (especially for storing less relevant/old data).
Metric producer services
,programs
(web services, applications, batch jobs, etc.) that emit performance metrics.User
, typically an owner of the service, a person who watches the metrics.Metric consumer services
, programs that read metrics on a regular basis to timely react to their changes (e.g. alert generation systems, autoscaling systems, anomaly detection systems).
API gateway
, a web service that represents an entry point to the system.Metric aggregator
, a service that aggregates metric values in memory over a short period of time and stores the aggregated values in persistent storage.Metric partitioner
, a service responsible for partitioning metrics. This allows for more efficient aggregation of metric values.Messaging system
, a temporary buffer for metric data that helps parallelise metric processing.Monitoring client
, a client application for a monitoring system that helps to aggregate data on the client side and send aggregated data for further aggregation on the server side.Hot storage
, read-optimized storage.Cold storage
, persistent storage for metric values.Read service
, a web service responsible for serving read requests by retrieving metric data from multiple locations (hot and cold storage).****
Monitoring client
is a daemon process that runs locally on every metric producer service machine. It reads data from service logs and aggregates data in a local memory. Think of it as a hashmap where a metric name represents a key ( e.g. "total request count") and value represents a count (of requests, errors, etc.) or a sum (of response times, request bytes, etc.). Periodically, for example, at the end of every minute, all accumulated values are sent to the monitoring system.API gateway
is a single entry point for all requests. It is a common component of many modern architectures. Classic API gateway performs many different functions: authentication, authorization, request routing, response aggregation, protocol translation, load balancing, TLS termination, IP listing, rate limiting, response caching, response compression, logging, monitoring, and a few more. If the monitoring system and metric consumer services all live within a trusted environment and read latency is critical, we can have metric consumer services call the read service directly.- Each
producer service
can produce thousands of different metrics. And there can be thousands of producer services. Which can lead to high cardinality of metrics. To aggregate data on the server side, which means calculating the total count or the total sum for each metric for every 1-minute interval, we need to partition metrics first. This allows us to split all metrics into several disjoint groups, and then aggregate metric values within each group. We can use a unique metric name (metric name + service name) as a partition key. We can avoid partitioning metrics if metric cardinality is low. - Incoming metric data is buffered in a
messaging system
. Each metric goes to its own shard/partition inside the messaging system. There will be a separate consumer (metric aggregator service instance) for each shard. Metric aggregator service
aggregates metric values. Which basically means it sums up the incoming values for a metric. It does it for a predefined duration (e.g. 5-10 seconds) and sends the accumulated value to persistent storage. If we need to scale and/or speed up data aggregation, we first add more shards/partitions to the messaging system and then add the same number of machines/instances to the service. So that each new shard gets its own dedicated consumer.Hot storage
represents a distributed cache (e.g. Redis) or a key-value database (e.g. DynamoDB) and stores the last N days of data. In monitoring systems, the most recent data is more important and read frequently, while older data is usually less important and read less frequently. Therefore, this storage has two key requirements: high read throughput and low read latency.Cold storage
represents persistent storage for data. We can store all data there (if our hot storage is not persistent) or only data older than N days (if our hot storage is persistent). As discussed in the course, object storage (e.g. S3) is a good option. Data in hot and cold storage is stored at some granularity, for example, 1 minute. But on a large scale (from millions to billions of metrics), storing such a large amount of per-minute data is costly. In addition, as we have already mentioned, the value of monitoring data decreases over time. For this reason, it makes sense to aggregate data over time into 5-minute intervals, 1-hour intervals, and so on. The system should have a separate component responsible for aggregating data from shorter time intervals to longer time intervals and purging data for shorter time intervals from the storage.Read service
has three primary functions: route read requests to the appropriate storage, stitch data from hot and cold storage when both new and old data is requested, cache responses. The most recent monitoring data is not well suited for caching because it changes frequently. But old data can be effectively cached. When there is a read request that spans both storages, we will fetch new data from hot storage and old data from an internal cache of the read service.
- User can create calendar
- User can create event in calendar
- User can invite other users to event
- Notification for event
- Timezone support
- High availability
- Low latency
- Durability
- Scalability
- User can add comments to photos and videos
- User can likes photos and videos
- Upload photo
- Upload video
- Watch photos, videos in feed
- Create stories
- Following other users
- Send and receive messages
- Scalable
- High Available
- Performante
- Durable
- Resiliente
- Maintantable
- Secure
- Cost efficient
- Every 10 user leave comments
- Every comment size up to 5 KB
- We assume that comments are text only
- Every 5 user make like with a predefined size 8 bytes
- 1B users
- 200M DAU
- Every 100 user upload a photo or video
- Photo size up to 5 MB after encoding with several dimensions = 2 * 5 = 10MB
- Video size up to 100 MB after encoding with several dimensions = 2 * 100 = 200MB
- Every 100 user create stories
- Video size up to 50 MB after encoding with several dimensions = 2 * 50 = 100MB
- Every 200 user send and receive messages
- Every message size up to 5 KB
- If every 10 user leave comments and every comment size up to 5 KB then we have 200M / 10 = 20M comments
- Every comment size up to 5 KB then we have 20M users * 5 Kb = 100M KB = 100GB comments daily
- Every 5 user make like with a predefined size 8 bytes then we have 200M / 5 = 40M likes
- Every like size up to 8 bytes then we have 40M users * 8 bytes = 320M bytes = 320MB likes daily
Overall we have 100GB comments daily and 320MB likes daily
- The
user
sends a request toDNS
to get the IP address of the Instagram service - The request goes to the
Load Balancer
- The
Load Balancer
forwards the request to theAPI Gateway
where security checks and throttling are done - The
API Gateway
forwards the request to theWrite API async
service - The
Write API async
service writes the data toQueue
- The
Workers
read the data from theQueue
and write the data to theNoSQL
database and to theCache
- The
Analytics service
reads the data from theQueue
and writes the data to theData Warehouse
- POST https://instagram.com/api/v1/like
- POST https://instagram.com/api/v1/comment
{ "user_id": "{user_id}", "photo_id": "{photo_id}", "comment": "{comment}" }
Response: comment_id
: UUID
-
DELETE https://instagram.com/api/v1/comment&comment_id={comment_id}
-
EDIT https://instagram.com/api/v1/comment&comment_id={comment_id}
{ "comment": "{new comment}" }
- The
user
sends a request toDNS
to get the IP address of the Instagram service - The request goes to the
Load Balancer
- The
Load Balancer
forwards the request to theAPI Gateway
where security checks and throttling are done - The
API Gateway
forwards the request to theRead API
service - The
Read API
fetches the comments from the NoSQL database from theRead replica
Response:
{
"comments": [
{
"comment_id": "{comment_id}",
"user_id": "{user_id}",
"comment": "{comment}"
}
]
}
Response:
{
"likes": [
{
"user_id": "{user_id}"
}
]
}
- Application is
Read Heavy
because people mostly consume than publish content.Read replica
might be a bottleneck if we have a lot of read requests
Queue
might be a bottleneck if we have a lot of write requests and workers can't handle the load
Load Balancer
is a SPOF because if it goes down then the whole system goes down, that is why we need to have a severalLoad Balancers
in different regionsDatabase
is a SPOF because if it goes down then the whole system goes down, that is why we need to have a severalRead replicas
in different regions
- We can implement a throttling mechanism in the
API Gateway
to limit the number of requests in order to prevent the system from being overloaded - We need to have a several
Read replicas
in different regions - If we encounter a 10x write increase, read replicas might not be consistent with the master database. By using
Eventual Consistency
users eventually will see the updated data. - In case of a 10x read increase, we can use
Cache
to store the most frequently accessed data. This will reduce the load on the database and improve the response time.