The 6-step framework
Step 1: Clarify requirements (5 min)
Split into two types:
Functional requirements: what the system does
- Core features only (3-4 max for the interview)
- Who are the users? What are their primary journeys?
Non-functional requirements: how well the system does it
- Scale: DAU, peak RPS, data volume
- Latency: real-time? < 200ms p99? eventual consistency ok?
- Availability: 99.9%? 99.99%? (each extra 9 cuts allowed downtime 10x: 99.9% ≈ 8.8 hours/year, 99.99% ≈ 53 minutes/year)
- Consistency: strong vs eventual
- Durability: can we lose a few events?
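To make the availability targets above concrete, the downtime budget behind each "number of 9s" can be computed directly. This is a minimal sketch; the function name `downtime_per_year_seconds` is hypothetical, and it assumes a 365-day year.

```python
# Convert an availability target into an allowed-downtime budget per year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (ignores leap years)


def downtime_per_year_seconds(availability_pct: float) -> float:
    """Return the yearly downtime budget, in seconds, for a given availability %."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year_seconds(target) / 3600
    print(f"{target}% -> {hours:.2f} hours/year")
```

Quoting the budget in hours or minutes tends to land better in an interview than the percentage alone.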
What to scope out: "I'll skip auth, GDPR handling, and payment processing for this design."
Step 2: Capacity estimation (3 min)
Quick back-of-envelope. The goal is to know the order of magnitude, not to be precise.
Useful numbers to memorize:
1 day = 86,400 seconds ≈ 100K seconds
1 month ≈ 2.5M seconds (30 days = 2,592,000 s)
Traffic:
1M DAU × 10 req/user/day = 10M req/day ≈ 100 RPS (10M ÷ 100K s)
Storage:
1M users × 1 KB/user = 1 GB
1B users × 1 KB/user = 1 TB
Bandwidth:
100 RPS × 1 KB/request = 100 KB/s (negligible)
100 RPS × 1 MB/request = 100 MB/s (significant)
Template:
DAU: __M
Read RPS: __ Write RPS: __
Storage per record: __ KB
New records/day: __
Total storage (5 years): __ GB / TB
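The template can be filled in mechanically. The sketch below is one way to do it, assuming decimal units (1 GB = 10^6 KB) and that every write creates one new record; the `Workload` class and `estimate` function are hypothetical names, and the sample inputs are illustrative only.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    dau: int                      # daily active users
    reads_per_user_day: float     # average read requests per user per day
    writes_per_user_day: float    # average write requests per user per day
    record_kb: float              # storage per new record, in KB
    retention_years: int = 5


def estimate(w: Workload) -> dict:
    """Fill in the capacity template: average RPS and total storage."""
    new_records_per_day = w.dau * w.writes_per_user_day  # assumes 1 write = 1 record
    total_kb = new_records_per_day * w.record_kb * 365 * w.retention_years
    return {
        "read_rps": w.dau * w.reads_per_user_day / SECONDS_PER_DAY,
        "write_rps": w.dau * w.writes_per_user_day / SECONDS_PER_DAY,
        "new_records_per_day": new_records_per_day,
        "total_storage_gb": total_kb / 1_000_000,
    }


result = estimate(Workload(dau=1_000_000, reads_per_user_day=10,
                           writes_per_user_day=1, record_kb=1))
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

These are averages; peak traffic is often estimated as a small multiple of the average, and the multiplier is worth stating aloud as an assumption.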
Step 3: API design
Define the core REST (or GraphQL) API surface:
# Resource endpoints: noun-based, plural
GET    /api/v1/posts?cursor=xxx&limit=20   → 200 { items, nextCursor }
POST   /api/v1/posts                       → 201 { id, ... }
GET    /api/v1/posts/:id                   → 200 Post
PATCH  /api/v1/posts/:id                   → 200 Post
DELETE /api/v1/posts/:id                   → 204
# Actions
POST   /api/v1/posts/:id/like              → 200 { likeCount }
POST   /api/v1/posts/:id/share             → 201 Share
Mention:
- Versioning (/v1/)
- Auth header (Authorization: Bearer <token>)
- Pagination strategy (cursor, not offset, for live feeds)
- Rate limiting headers
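The cursor-based pagination behind `GET /api/v1/posts?cursor=xxx&limit=20` can be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: the in-memory `POSTS` list stands in for a database query ordered by `(createdAt DESC, id DESC)`, and the opaque cursor format (base64-encoded JSON of the last item's sort key) is one common choice among several.

```python
import base64
import json
from typing import Optional

# Hypothetical in-memory "posts table", newest first.
POSTS = [{"id": i, "createdAt": 1_700_000_000 + i} for i in range(1, 51)]
POSTS.sort(key=lambda p: (p["createdAt"], p["id"]), reverse=True)


def encode_cursor(post: dict) -> str:
    """Pack the sort key of the last returned item into an opaque token."""
    raw = json.dumps({"createdAt": post["createdAt"], "id": post["id"]})
    return base64.urlsafe_b64encode(raw.encode()).decode()


def decode_cursor(cursor: str) -> tuple:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return (data["createdAt"], data["id"])


def list_posts(cursor: Optional[str] = None, limit: int = 20) -> dict:
    """Return one page of posts strictly older than the cursor position."""
    items = POSTS
    if cursor is not None:
        position = decode_cursor(cursor)
        items = [p for p in POSTS if (p["createdAt"], p["id"]) < position]
    page = items[:limit]
    has_more = len(items) > limit
    return {"items": page,
            "nextCursor": encode_cursor(page[-1]) if has_more else None}


first = list_posts(limit=20)
second = list_posts(first["nextCursor"], limit=20)
print([p["id"] for p in second["items"]][:3])
```

Because the cursor encodes a position in the sort order rather than a row count, posts inserted at the head of a live feed do not shift or duplicate items on later pages, which is the usual argument against offset pagination.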
Step 4: Data model & storage
Define the primary entities and choose a storage engine per use case:
User: { id, username, email, passwordHash, createdAt }
Post: { id, authorId, content, mediaUrls[], likeCount, createdAt }
Like: { postId, userId, createdAt } (composite primary key: postId + userId)
Follow: { followerId, followeeId, createdAt } (composite primary key: followerId + followeeId)
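The composite keys on Like and Follow do real work: they let the database itself reject duplicate likes or follows. The sketch below demonstrates that with Python's built-in `sqlite3` module, used here only because it runs anywhere; the snake_case table and column names and the `like` / `like_count` helpers are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE likes (
    post_id    INTEGER NOT NULL,
    user_id    INTEGER NOT NULL,
    created_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (post_id, user_id)          -- one like per user per post
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)  -- one edge per pair
);
""")


def like(post_id: int, user_id: int) -> bool:
    """Record a like; return False if this user already liked this post."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO likes (post_id, user_id) VALUES (?, ?)",
                (post_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def like_count(post_id: int) -> int:
    row = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row[0]


print(like(1, 42), like(1, 42), like_count(1))
```

Counting rows on every read is fine at small scale; the denormalized `likeCount` field on Post exists to avoid that query on hot posts, at the cost of keeping the two in sync.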
Storage decision matrix:
| Need | Technology | Why |
|---|---|---|
| Structured relational data | PostgreSQL | ACID, joins, mature |
| Flexible documents / catalog | MongoDB | Schema-less, horizontal scale |
| High-throughput time-series | Cassandra | Wide-column, append-only optimized |
| Session / cache / ephemeral | Redis | Sub-millisecond reads, TTL support |
| Full-text search | Elasticsearch | Inverted index, relevance scoring |
| Blob storage | S3 | Cheap, durable, CDN-friendly |
| Graph relationships | Neo4j | Traversal-optimized (rarely needed) |
Step 5: High-level architecture
Draw the components and connections. A standard web-scale architecture:
        Clients (web/mobile)
                │
                ▼
[ CDN: static assets + cached API responses ]
                │
                ▼
[ Load Balancer (L7: nginx / AWS ALB) ]
                │
          ┌─────┴─────┐
          ▼           ▼
[App Server]     [App Server]    ← stateless, horizontally scalable
          │           │
          ▼           ▼
      [ Cache: Redis ]           ← read-through for hot data
                │
                ▼
[ Primary DB ] ──replicate──▶ [ Read Replica(s) ]
                │
                ▼
[ Message Queue (Kafka / SQS) ]
                │
                ▼
[ Async Workers ]                ← emails, notifications, thumbnails, analytics
Label the arrows with protocols (HTTP/2, gRPC, TCP) and data sizes where relevant.
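The "read-through for hot data" cache in the diagram can be illustrated with a small in-process stand-in. This is a sketch under assumptions: a plain dict replaces Redis, `loader` represents the database query, and the `ReadThroughCache` class and its injectable `clock` parameter are hypothetical names chosen to keep the example testable.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Tiny in-process stand-in for the cache layer in front of the primary DB."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader      # called on a miss, e.g. a DB query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # hit: serve from cache
        value = self._loader(key)                 # miss or expired: go to the DB
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads it."""
        self._entries.pop(key, None)


db_calls = []


def load_post(post_id: str) -> dict:
    db_calls.append(post_id)  # track how often the "database" is hit
    return {"id": post_id, "content": "hello"}


cache = ReadThroughCache(load_post, ttl_seconds=60)
cache.get("post:1")
cache.get("post:1")
print(len(db_calls))  # the second read is served from the cache
```

The TTL bounds how stale a cached value can get; explicit invalidation on write tightens that bound further.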
Step 6: Deep-dive (10 min)
Pick one or two components the interviewer asks about, or where your design has interesting tradeoffs:
Common deep-dives:
- Database schema + indexing: how do you avoid slow queries at scale?
- Caching strategy: cache-aside? write-through? what's the TTL? what do you cache?
- Message queue fanout: push vs pull model; at-least-once vs exactly-once delivery
- Rate limiting: token bucket vs sliding window; distributed rate limiter
- Consistency tradeoffs: eventual vs strong; what breaks if a user's feed is 1s stale?
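For the rate-limiting deep-dive, a token bucket is compact enough to sketch on a whiteboard. The version below is a single-process illustration only; the `TokenBucket` class, its parameters, and the injectable `clock` are hypothetical, and a distributed limiter would need the bucket state in a shared store with atomic updates.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity      # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily based on elapsed time, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_per_second=1)
decisions = [bucket.allow() for _ in range(7)]
print(decisions)  # a burst of 5 passes; later calls are rejected until tokens refill
```

Compared with a sliding window, the token bucket stores only two numbers per client and tolerates short bursts, which is usually the trade-off worth naming.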
Trade-off cheat sheet
| Scenario | Push toward | Away from |
|---|---|---|
| Read-heavy (e.g., 10:1 read:write) | Cache, read replicas, CDN | Excessive normalization |
| Write-heavy | Async queues, batching, Cassandra | Sync writes to multiple tables |
| Low-latency reads | Redis, denormalize, pre-compute | Real-time JOINs |
| Global users | CDN, multi-region, eventual consistency | Single-region, strong consistency |
| Auditability / compliance | Append-only log, event sourcing | In-place updates |