HLD: Design a URL Shortener

Design bit.ly โ€” hash generation, redirect performance, custom aliases, analytics, and storage at scale.

must medium โฑ 28 min hldurl-shortenerhashingredirectcacheanalytics
Mastery:
Why interviewers ask this
URL shortener is the canonical HLD starter โ€” deceptively simple, with real depth in hash collision handling, redirect latency, and analytics at scale.

Step 1: Requirements

Functional:

  • Shorten a long URL โ†’ get a short code (e.g. bit.ly/a3Bx9)
  • Redirect short URL โ†’ original URL (301 vs 302?)
  • Custom aliases (bit.ly/my-campaign)
  • Link expiry (optional TTL)
  • Click analytics: total clicks, unique visitors, country, referrer

Non-functional:

  • Reads >> writes (100:1 ratio typical)
  • Redirect latency < 10ms p99
  • 100M new URLs/day
  • Links live indefinitely by default
  • Availability > 99.99% for redirects

Estimation:

Writes: 100M URLs/day รท 86,400s โ‰ˆ 1,200 writes/sec
Reads:  100:1 ratio โ†’ 120,000 reads/sec
Storage: 100M ร— 365 ร— 10 years ร— ~500 bytes/record โ‰ˆ 180 TB
Short code space: base62 (a-zA-Z0-9), 7 chars โ†’ 62^7 โ‰ˆ 3.5 trillion codes (plenty)

Step 2: Short code generation

Two approaches:

Option A: Hash-based (MD5 / SHA256 โ†’ truncate)

import hashlib, base64

def shorten(long_url: str) -> str:
    hash_bytes = hashlib.md5(long_url.encode()).digest()
    # Base64-encode, take first 7 chars, make URL-safe
    code = base64.urlsafe_b64encode(hash_bytes).decode()[:7]
    return code  # e.g. "a3Bx9yZ"

Problem: hash collisions. Two different URLs could produce the same 7-char prefix.

Fix: on collision, append a counter suffix and re-hash:

def shorten_safe(long_url: str, db) -> str:
    suffix = 0
    while True:
        code = md5_encode(long_url + str(suffix))[:7]
        if not db.exists(code):
            return code
        suffix += 1

Downside: multiple DB lookups on collision. At large scale collisions are rare but non-zero.


Option B: Counter-based (preferred at scale)

Use a distributed ID generator (e.g. Twitter Snowflake, or a simple Redis INCR) to generate a unique integer, then encode it in base62:

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def to_base62(num: int) -> str:
    if num == 0:
        return BASE62[0]
    result = []
    while num > 0:
        result.append(BASE62[num % 62])
        num //= 62
    return ''.join(reversed(result))

# Redis: id = INCR url_counter  โ†’ to_base62(id) = short code

Advantages: no collision possible, monotonically increasing, no DB lookups during generation.

Range-based ID allocation: to avoid Redis being a single point of failure, each app server pre-allocates a range of IDs (e.g. 1,000,000 at a time) from a dedicated ID service. It burns through the range locally, then fetches the next.


Step 3: Data model

-- URLs table
CREATE TABLE urls (
  code        VARCHAR(10) PRIMARY KEY,  -- the short code
  long_url    TEXT NOT NULL,
  created_by  UUID REFERENCES users(id),
  created_at  TIMESTAMPTZ DEFAULT NOW(),
  expires_at  TIMESTAMPTZ,
  is_custom   BOOLEAN DEFAULT FALSE
);
CREATE INDEX idx_urls_long_url ON urls(long_url);  -- for dedup check

-- Clicks table (write-heavy, append-only)
CREATE TABLE clicks (
  id          BIGSERIAL,
  code        VARCHAR(10) NOT NULL,
  clicked_at  TIMESTAMPTZ DEFAULT NOW(),
  ip_hash     VARCHAR(64),    -- hashed for privacy
  country     VARCHAR(2),
  referrer    TEXT,
  user_agent  TEXT
) PARTITION BY RANGE (clicked_at);  -- partition by month

Step 4: Architecture

Client
  โ”‚ GET /a3Bx9
  โ–ผ
[ CDN / Edge ]
  โ”‚ Cache-hit: 301 redirect immediately (< 5ms)
  โ”‚ Cache-miss:
  โ–ผ
[ Load Balancer ]
  โ”‚
  โ–ผ
[ Redirect Service (stateless, horizontally scaled) ]
  โ”‚
  โ”œโ”€ 1. Check Redis cache: GET url:a3Bx9
  โ”‚       hit  โ†’ 301 redirect to cached long URL
  โ”‚       miss โ†“
  โ”œโ”€ 2. Read from DB: SELECT long_url FROM urls WHERE code='a3Bx9'
  โ”‚       hit  โ†’ write to Redis (TTL 24h) โ†’ 301 redirect
  โ”‚       miss โ†’ 404
  โ”‚
  โ””โ”€ 3. Fire-and-forget analytics event โ†’ Kafka topic "clicks"
         (does NOT block the redirect response)
  
[ Kafka "clicks" topic ]
  โ””โ”€ Analytics Consumer โ†’ aggregates into ClickSummary table (PostgreSQL / ClickHouse)

Step 5: 301 vs 302 redirect

301 (Permanent)302 (Temporary)
Browser caches?Yes โ€” browser skips server on repeat visitsNo โ€” browser always hits server
Analytics accuracyLoses repeat-visit dataCaptures every click
Server loadLower (browser caches)Higher (every hit reaches server)

Decision: use 302 if you need click analytics on every visit. Use 301 if you want maximum redirect speed and browser caching, but accept losing analytics after first visit.

Hybrid: use 302 with CDN caching for a short TTL (5 minutes). You count all clicks, the CDN reduces server load.


Step 6: Caching strategy for redirects

URL lookup is 95%+ of traffic โ†’ cache aggressively.

L1: CDN edge cache
  - Cache 302 redirect responses with short Surrogate-Control TTL
  - Invalidate on URL deletion

L2: Redis in-process cache (within app servers)
  - Key: "url:{code}", Value: long_url
  - TTL: 24 hours
  - Eviction: LRU

Read path:
  Redis hit โ†’ 302 redirect           (~2ms)
  Redis miss, DB hit โ†’ cache + 302   (~15ms)
  DB miss โ†’ 404                      (~15ms)

Step 7: Analytics at scale

Problem: 120K redirects/sec โ†’ inserting a row per click into a SQL table immediately will cause write bottleneck.

Solution: async analytics pipeline

Redirect service
  โ””โ”€ Produces "click" event to Kafka (non-blocking)

Kafka consumers:
  โ”œโ”€ Real-time counter: INCR in Redis  โ†’ "a3Bx9:clicks:total"
  โ”œโ”€ Hourly aggregator: batch INSERT into ClickHouse (columnar, analytics optimized)
  โ””โ”€ Geo enrichment: IP โ†’ country via MaxMind, then store enriched event

ClickHouse (or Redshift, BigQuery) for analytics โ€” columnar storage makes aggregations like SELECT country, COUNT(*) FROM clicks WHERE code='a3Bx9' GROUP BY country 10โ€“100x faster than row-based Postgres.


Edge cases

CaseSolution
Same long URL submitted twiceCheck idx_urls_long_url before inserting; return existing code
Custom alias already takenReturn 409 Conflict
Expired linkCheck expires_at in lookup; return 410 Gone
Malicious URLsBlock list check + Google Safe Browsing API on creation
Link deletionSoft-delete (set deleted_at), purge from CDN + Redis

Say it out loud
โ€œIโ€™d use counter-based ID generation with base62 encoding โ€” no collision risk. The redirect service is stateless behind a load balancer; it reads from a Redis cache first (TTL 24h) and falls back to Postgres. Analytics are non-blocking โ€” every redirect fires an event to Kafka, and consumers aggregate to ClickHouse. Iโ€™d use 302 (not 301) so every click is tracked. The bottleneck at 120K RPS is the Redis cache โ€” itโ€™s purpose-built for this and handles millions of reads/sec.โ€

Likely follow-up questions
  • How do you handle hash collisions?
  • How do you make redirects as fast as possible?
  • How would you support custom aliases?
  • How do you count clicks without slowing down redirects?
  • How do you handle link expiry?

References