The decisions: which store, how to index, how to keep data correct under concurrency, and how to scale it. The unifying principle: access patterns drive the schema, not the other way around.
SQL vs NoSQL โ and the NoSQL families
Pick by access pattern and consistency needs, not hype.
Relational (SQL โ Postgres, MySQL) โ structured schema, joins, and ACID transactions. Default choice when data is relational and correctness matters (payments, orders, anything with invariants across rows). Scales further than people assume (replicas, partitioning, modern Postgres handles enormous load).
NoSQL is not one thing โ itโs four families, each tuned for an access pattern:
| Family | Examples | Data model | Best for | Weakness |
|---|---|---|---|---|
| Key-value | Redis, DynamoDB, Memcached | key โ blob | Massive-scale get/put by key, sessions, caches | No queries beyond the key |
| Document | MongoDB, Couchbase | JSON-like docs | Flexible/nested schema, self-contained entities | Joins weak; cross-doc consistency hard |
| Wide-column | Cassandra, HBase, Bigtable | rows with dynamic columns, partition+clustering keys | Write-heavy, time-series, huge volume, predictable queries | Query must match the key design; no ad-hoc joins |
| Graph | Neo4j, Neptune | nodes + edges | Deeply connected data, traversals (social graph, fraud) | Doesnโt scale horizontally as easily |
| Time-series | InfluxDB, TimescaleDB | timestamped points | Metrics, IoT, monitoring (append + range scan) | Niche; specialized query model |
The honest interview framing: โSQL by default for relational, transactional data. I reach for a specific NoSQL family when an access pattern demands it: KV for raw-scale get/put, document for flexible nested shapes, wide-column for write-heavy time-series at huge volume, graph for traversals โ accepting weaker joins or consistency in exchange.โ
Indexing: B-tree vs hash vs LSM
An index is a data structure that lets the DB find rows without scanning all of them. The cost is always the same: indexes use space and slow writes (every insert/update maintains them), so index the columns you filter/sort/join on โ not everything.
- B-tree (default in Postgres/MySQL): a balanced sorted tree โ
O(log n)point lookups and range scans (WHERE age BETWEEN,ORDER BY). The workhorse. - Hash index:
O(1)exact-match lookups, but no range queries and no ordering. Use only when you exclusively do equality lookups. - LSM-tree (Cassandra, RocksDB, LevelDB): buffers writes in memory (memtable), flushes sorted runs to disk (SSTables), compacts in the background. Write-optimized โ sequential disk writes, great throughput โ at the cost of read amplification (a read may check several SSTables; mitigated by Bloom filters) and compaction overhead.
B-tree: fast reads + ranges, slower random writes โ read-heavy, OLTP
LSM-tree: very fast writes, reads need merging โ write-heavy, time-series
A composite index on (a, b) serves queries filtering on a, or a then b, but not b alone (left-prefix rule). A covering index that includes all selected columns lets the DB answer from the index without touching the table.
Normalization vs denormalization
- Normalized (3NF): each fact stored once, related by foreign keys. No update anomalies, less storage โ but reads need joins, which get expensive at scale.
- Denormalized: duplicate data so a read hits one place (e.g. embed the author name in each post). Faster reads, but writes must update every copy and you risk inconsistency.
Normalize for write-heavy, integrity-critical data; denormalize for read-heavy paths where join cost dominates. NoSQL stores push you toward denormalization because they lack good joins โ you model the data around the query.
ACID vs BASE
ACID (relational): Atomic (all-or-nothing), Consistent (invariants hold), Isolated (concurrent txns donโt corrupt each other), Durable (committed survives a crash). Use it when multiple writes must succeed together โ a money transfer debits one account and credits another; partial completion is a bug.
BASE (many NoSQL): Basically Available, Soft state, Eventually consistent. Trade strict consistency for availability and scale. Right for feeds, counters, catalogs where a brief stale read is harmless.
Isolation levels & their anomalies
Isolation is the โIโ in ACID โ how concurrent transactions interleave. Stronger isolation prevents more anomalies but costs concurrency/throughput.
| Level | Prevents | Still allows |
|---|---|---|
| Read Uncommitted | โ | Dirty reads (see uncommitted data) |
| Read Committed | Dirty reads | Non-repeatable reads, phantoms |
| Repeatable Read | + Non-repeatable reads | Phantoms (in standard SQL; MySQL/Postgres add protections) |
| Serializable | + Phantoms โ behaves as if serial | Lowest concurrency |
The anomalies, concretely:
- Dirty read โ you read another txnโs uncommitted change that may roll back.
- Non-repeatable read โ you read a row twice in one txn and get different values (someone committed an update between).
- Phantom read โ you run the same range query twice and new rows appear (someone inserted matching rows).
- Lost update โ two txns read-modify-write the same row; one overwrites the other. (Fix:
SELECT ... FOR UPDATEor atomic increments.)
Default in Postgres is Read Committed; MySQL InnoDB defaults to Repeatable Read. Name the anomaly youโre guarding against and pick the weakest level that prevents it โ Serializable is correct but throttles throughput.
Transactions caveat: the N+1 trap
Fetching a list, then firing one query per item, is N+1 queries โ 1 query for 100 posts, then 100 for each author = 101 round trips, a silent killer. Fix with eager loading: a join, or a single WHERE author_id IN (...) batch.
Sharding / partitioning & rebalancing
When one machine canโt hold the writes/storage, split data across nodes by a shard key:
| Strategy | Pro | Con |
|---|---|---|
| Range (by key range) | Efficient range scans | Hot ranges (recent data on one shard) |
| Hash (hash the key) | Even distribution | No range queries |
| Directory/lookup | Flexible rebalancing | Lookup service is a dependency/SPOF |
Choosing the key: derive it from your access patterns and pick one that distributes load evenly and keeps related data co-located (so common queries hit one shard). A bad key creates hot shards and forces scatter-gather across all shards.
Rebalancing is the hard part. hash % N reshuffles everything when N changes; instead use consistent hashing or pre-split into many small partitions and move whole partitions to new nodes. Either way you avoid moving most of the data.
Replication topologies
| Topology | Writes | Reads | Tradeoff |
|---|---|---|---|
| Leader-follower (single leader) | Leader only | Followers (scale reads) | Replication lag โ stale follower reads; failover needed |
| Multi-leader | Any leader | Any | Multi-region low-latency writes, but write conflicts to resolve |
| Leaderless / quorum (Dynamo) | Any N nodes | Any | Tunable consistency via W + R > N; no single failover point |
- Sync vs async (leader-follower): sync waits for a follower ack โ no data loss on failover, higher latency; async โ fast writes but a crashed leader loses the un-replicated tail and followers lag.
- Quorum math: with N replicas, if writes go to W and reads from R, then
W + R > Nguarantees a read overlaps a write โ strong-ish consistency. Tune (W=N,R=1) for read-fast or (W=1,R=N) for write-fast.
Access patterns drive the schema โ worked example
Design storage for a chat app. Donโt start with entities โ start with the queries:
Access patterns:
- Load the most recent 50 messages in a conversation (by far the hottest).
- Page older messages on scroll-up.
- Show a userโs list of conversations, most-recently-active first.
Decision: This is write-heavy (every message is a write), read by conversation in time order, with no need for cross-conversation joins. A wide-column store (Cassandra) fits:
Table messages
partition key: conversation_id -- co-locates a conversation on one node
clustering key: message_ts DESC -- stored sorted by time โ pattern 1 & 2 are a single seek
Table conversations_by_user
partition key: user_id
clustering key: last_active_ts DESC -- pattern 3
Pattern 1 becomes a single partition read of the first 50 rows โ no scan, no join. Pattern 3 is a denormalized second table maintained on write (write amplification we accept for read speed โ classic CQRS-flavored modeling). Had I started from a normalized ER diagram, Iโd have a messages table needing an index on (conversation_id, ts) and a join for the conversation list โ fine at small scale, but the partition-key design is what scales. The queries chose the store, the keys, and the duplication.
Interview questions & model answers
Q: SQL or NoSQL for a new service โ how do you decide? โBy access pattern and consistency needs. If the data is relational with cross-row invariants and I need transactions โ SQL. If I need a specific pattern at scale โ raw get/put (KV), flexible nested docs (document), or write-heavy time-series (wide-column) โ and can give up joins or strong consistency, the matching NoSQL family. I default to Postgres and justify any move away from it.โ
Q: B-tree vs LSM-tree? โB-tree gives balanced read/write with fast point and range queries โ ideal for read-heavy OLTP. LSM buffers writes in memory and flushes sorted runs, so itโs write-optimized with sequential I/O โ great for ingestion and time-series โ but reads may merge several SSTables (Bloom filters help) and compaction adds background work. So: read-heavy โ B-tree, write-heavy โ LSM.โ
Q: Walk me through isolation levels.
โRead Uncommitted allows dirty reads. Read Committed stops those but allows non-repeatable and phantom reads. Repeatable Read stops non-repeatable reads. Serializable behaves as if transactions ran one at a time, preventing phantoms too โ but with the least concurrency. I pick the weakest level that prevents the specific anomaly I care about, and use SELECT FOR UPDATE or atomic increments to avoid lost updates.โ
Q: How do you choose a shard key? โFrom the access patterns: it must distribute load evenly to avoid hot shards, and co-locate data thatโs queried together so common queries hit one shard instead of scatter-gather. For a chat app, conversation_id co-locates a conversation. I avoid keys that monotonically increase (timestamps) because recent traffic piles onto one shard.โ
Q: How do you rebalance shards without moving all the data?
โAvoid hash % N, which reshuffles everything when N changes. Use consistent hashing โ adding a node only remaps one arc โ or pre-split into many small partitions and migrate whole partitions to new nodes. Either way, only a fraction of the data moves.โ
Q: Explain quorum consistency.
โWith N replicas, write to W and read from R. If W + R > N, every read set overlaps every write set, so a read sees the latest write โ strong-ish consistency without a single leader. You tune W and R for read- or write-optimized behavior, trading latency against consistency.โ
Q: What is the N+1 problem and how do you fix it?
โLoading a list then querying once per item โ 1 + N round trips. I fix it by eager-loading: a join or a single batched WHERE id IN (...). ORMs hide it, so I watch query logs.โ
Common mistakes / what weak candidates do
- โNoSQL is faster/more scalableโ as a blanket claim, without naming a family or an access pattern.
- Indexing everything (or nothing) โ not connecting indexes to the actual filter/sort/join columns, or ignoring their write cost.
- Not knowing isolation levels beyond โACID,โ and missing lost-update/phantom anomalies.
- Choosing a monotonic shard key (auto-increment id, timestamp) that creates a hot shard, or ignoring rebalancing entirely.
- Designing entities before access patterns, then unable to justify the keys or indexes.
- Assuming dual writes to two stores are consistent (use the outbox pattern instead).
- Forgetting replication lag โ reading your own write from a lagging replica and calling it a bug in the app.