🤖
@qwertyvipul | code
  • Code
  • DSA
    • Disjoint Set
    • Segment Tree
    • Bit Operations
    • Binary Exponential
    • Kadane's Algorithm
    • Modulus Multiplicative Inverse
  • Quick Notes
    • Design Patterns
    • System Design
    • React.js
  • LeetCode With JavaScript
Powered by GitBook
On this page
  • Quick notes
  • Applications
  • URL Shortener
  • Web crawler
  • Notification system
  • News Feed
  • Chat system
  • Search autocomplete
  • YouTube
  • Proximity Service
  • Quick terms
  • Questions
  • Quick links
  1. Quick Notes

System Design

Quick notes

  1. User <--> DNS

  2. User <--> Web server <--> Database

  3. Vertical scaling

  4. Horizontal scaling

    1. Load balancers

    2. Multiple servers

    3. Database replication

      1. Master/slave relationships

      2. Sharding

        1. Resharding data

        2. Celebrity problem

        3. Join and de-normalization

  5. Performance improvements

    1. CDN

    2. Cache

  6. Stateless web tier

  7. Data centers

  8. Message queue

  9. Logging, metrics, automation

  10. L1 and L2 cache

  11. Mutex lock/unlock

  12. Branch mispredict

  13. Disk seek

  14. Back of the envelope estimation

  15. Cloud microservices

  16. API gateway

    1. rate limiting

    2. SSL termination

    3. authentication

    4. IP whitelisting

    5. servicing static content

    6. etc.

  17. Rate limiting algorithms

    1. Token bucket

    2. Leaking bucket

    3. Fixed window counter

    4. Sliding window log

    5. Sliding window counter

  18. Rate limiter in a distributed environment

    1. Race condition

    2. Synchronization issue

      1. Sticky sessions

      2. Redis

    3. Performance optimization

    4. Monitoring

  19. Consistent hashing

    1. The rehashing problem

    2. Hotspot key problem

    3. Virtual nodes

  20. Key-value store

    1. Single server key-value store

    2. Distributed key-value store

      1. CAP (Consistency, Availability, Partition)

      2. CAP Theorem

        1. CP, AP, CA

    3. System components

      1. Data paritition

      2. Data replication

      3. Consistency

        1. Quorum consensus

          1. N, W, R

        2. Consistency models

          1. Strong consistency

          2. Weak consistency

          3. Eventual consistency

      4. Inconsistency resolution

        1. Versioning

          1. Detection and reconciliation

        2. Vector clock

      5. Handling failures

      6. System architecture diagram

      7. Write path

      8. Read path

      9. Gossip protocol

      10. Hinted handoff

      11. Anti-entropy protocol

        1. Merkle tree

  21. Unique ID generator

    1. Multi master replication

    2. UUID

    3. Ticket server

    4. Twitter snowflake approach

Applications

URL Shortener

  1. Use cases

    1. URL shortening

    2. URL redirecting

    3. High availability

  2. Back of the envelope estimation

    1. No. of urls to store: to decide hash function

    2. Storage requirement

  3. Design

    1. 301 redirect: permanently redirected

    2. 302 redirect: temporarily redirected

    3. Hash functions

      1. Hash + collisions resolution

      2. Base-n conversion of unique ids

Web crawler

  1. Use cases

    1. Search engine indexing

    2. Web archiving

    3. Web mining

    4. Web monitoring

  2. Characteristics of a good web crawler

    1. Scalability: efficiency, parallelization, etc.

    2. Robustness: Handle edge cases like traps, bad HTML, unresponsive servers, etc.

    3. Politeness: Do not send too many requests

    4. Extensibility: Extend with minimal changes to crawl more stuff in future like images.

  3. Queries per second

    1. Peak QPS

    2. Average web page size

    3. Storage requirements

  4. Design

    1. Seed urls

    2. URL Frontier

    3. HTML downloader

    4. DNS resolver

    5. Content parser

    6. Content seen?

    7. Content storage

    8. URL Extractor

    9. URL Filter

    10. URL seen?

    11. URL storage

    12. Web crawler workflow

    13. Priority

      1. Front queues: manage prioritization

      2. Back queues: manage politeness

    14. Freshness

    15. Performance optimization

      1. Distributed crawl

      2. Cache DNS resolver: Keep mapping and update periodically using cron jobs.

      3. Locality

      4. Short timeout

    16. Robustness

      1. Consistent hashing: To distribute load among downloaders

      2. Save crawl states and data

      3. Exception handling

      4. Data validation

    17. Extensibility

    18. Avoid problems

      1. Redundant content

      2. Spider traps

      3. Data noise

    19. Server side rendering

    20. Filtering

    21. Analytics

Notification system

  1. Requirements

    1. Types: Push notification, SMS, email

    2. Real time?: Soft real time?

    3. Supported devices: IOS, android, laptop/desktop

    4. Trigger notifications?

    5. opt-out?

    6. how many?

  2. High level: types, contact info gathering, sending/receiving flow

  3. APNS, FCM, SMS, Email

  4. Design

    1. Notification servers

    2. Cache

    3. DB

    4. Message queues

    5. Workers

    6. Third-party servers

    7. IOS, Android, SMS, Email

News Feed

  1. High level: Feed publishing, feed building

    1. User <--> Load balancer <--> Web servers <--> [Post service, fanout service, notification service]

    2. Post service <--> Post cache

    3. News feed service <--> News feed cache

    4. Post cache <--> Post DB

  2. Fanout service

    1. Fanout on write

      1. Hotkey problem

    2. Fanout on read

  3. Cache architecture

    1. New feed: news feed

    2. Content: host cache, normal

    3. Social Graph: follower, following

    4. Action: liked, replied, others

    5. Counters: like counters, reply counter, other counters

Chat system

  1. Requirements

    1. A one-on-one chat with low delivery latency

    2. Small group chat

    3. Online status

    4. Multiple device support. The same account can be logged in to multiple devices at the same time.

    5. Push notifications

  2. Sender <--> Chat service [store message, relay message] <--> receiver

  3. HTTP is client-initiated connection

  4. Server-initiated connection

    1. Polling

    2. Long polling

    3. web socket

  5. Apache zookeeper

  6. User online status

Search autocomplete

  1. Back of the envelope estimation: Peak QPS, storage

  2. High-level design: Data gathering service, query service

  3. Trie

  4. Stream processing: Apache Hadoop MapReduce, Apache Spark Streaming, Apache Storm, Apache Kafka

YouTube

  1. Back of the envelope estimation: Average video size, daily storage space need, CDN cost, etc.

  2. Transcoding

  3. MPEG-DASH

Proximity Service

  1. Two-dimensional search: Evenly divided grid, Geohash, Quadtree, Google S2 (Hilbert Curve)

  2. Blue/green deployment

Quick terms

  1. High availability

  2. High scalability

  3. Automatic scaling

  4. Heterogeneity

  5. Tunable consistency

  6. Low latency

  7. Asynchronous

  8. High speed networks

  9. Bloom filter

  10. Dedupe mechanism

Questions

  1. What DB to use relational/non-relational?

  2. How non-relational databases differ from each other?

  3. How load-balancers work?

  4. Latency vs throughput?

  5. ACID properties?

  6. Memcached and Redis?

  7. Anycast?

  8. Why use vector clocks and not twitter snowflake approach for versioning

Quick links

PreviousDesign PatternsNextReact.js

Last updated 1 year ago

Should you go Beyong Relational Databases?:

Caching Strategies and How to Choose the Right One:

Active-Active for Multi-Regional Resiliency:

What it takes to run Stack Overflow:

Better Rate Limiting With Redis Sorted Sets:

How we built rate limiting capable of scaling to millions of domains:

https://bytebytego.com/courses/system-design-interview/design-a-url-shortener
https://raventools.com/
https://searchengineland.com/study-29-of-sites-face-duplicate-content-issues-80-arent-using-schema-org-microdata-232870
https://blog.teamtreehouse.com/should-you-go-beyond-relational-databases
https://codeahoy.com/2017/08/11/caching-strategies-and-how-to-choose-the-right-one/
https://netflixtechblog.com/active-active-for-multi-regional-resiliency-c47719f6685b
https://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow/
https://engineering.classdojo.com/blog/2015/02/06/rolling-rate-limiter/
https://blog.cloudflare.com/counting-things-a-lot-of-different-things/