Step 1: Requirements
Functional:
- Shorten a long URL โ get a short code (e.g.
bit.ly/a3Bx9) - Redirect short URL โ original URL (301 vs 302?)
- Custom aliases (
bit.ly/my-campaign) - Link expiry (optional TTL)
- Click analytics: total clicks, unique visitors, country, referrer
Non-functional:
- Reads >> writes (100:1 ratio typical)
- Redirect latency < 10ms p99
- 100M new URLs/day
- Links live indefinitely by default
- Availability > 99.99% for redirects
Estimation:
Writes: 100M URLs/day รท 86,400s โ 1,200 writes/sec
Reads: 100:1 ratio โ 120,000 reads/sec
Storage: 100M ร 365 ร 10 years ร ~500 bytes/record โ 180 TB
Short code space: base62 (a-zA-Z0-9), 7 chars โ 62^7 โ 3.5 trillion codes (plenty)
Step 2: Short code generation
Two approaches:
Option A: Hash-based (MD5 / SHA256 โ truncate)
import hashlib, base64
def shorten(long_url: str) -> str:
hash_bytes = hashlib.md5(long_url.encode()).digest()
# Base64-encode, take first 7 chars, make URL-safe
code = base64.urlsafe_b64encode(hash_bytes).decode()[:7]
return code # e.g. "a3Bx9yZ"
Problem: hash collisions. Two different URLs could produce the same 7-char prefix.
Fix: on collision, append a counter suffix and re-hash:
def shorten_safe(long_url: str, db) -> str:
suffix = 0
while True:
code = md5_encode(long_url + str(suffix))[:7]
if not db.exists(code):
return code
suffix += 1
Downside: multiple DB lookups on collision. At large scale collisions are rare but non-zero.
Option B: Counter-based (preferred at scale)
Use a distributed ID generator (e.g. Twitter Snowflake, or a simple Redis INCR) to generate a unique integer, then encode it in base62:
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
def to_base62(num: int) -> str:
if num == 0:
return BASE62[0]
result = []
while num > 0:
result.append(BASE62[num % 62])
num //= 62
return ''.join(reversed(result))
# Redis: id = INCR url_counter โ to_base62(id) = short code
Advantages: no collision possible, monotonically increasing, no DB lookups during generation.
Range-based ID allocation: to avoid Redis being a single point of failure, each app server pre-allocates a range of IDs (e.g. 1,000,000 at a time) from a dedicated ID service. It burns through the range locally, then fetches the next.
Step 3: Data model
-- URLs table
CREATE TABLE urls (
code VARCHAR(10) PRIMARY KEY, -- the short code
long_url TEXT NOT NULL,
created_by UUID REFERENCES users(id),
created_at TIMESTAMPTZ DEFAULT NOW(),
expires_at TIMESTAMPTZ,
is_custom BOOLEAN DEFAULT FALSE
);
CREATE INDEX idx_urls_long_url ON urls(long_url); -- for dedup check
-- Clicks table (write-heavy, append-only)
CREATE TABLE clicks (
id BIGSERIAL,
code VARCHAR(10) NOT NULL,
clicked_at TIMESTAMPTZ DEFAULT NOW(),
ip_hash VARCHAR(64), -- hashed for privacy
country VARCHAR(2),
referrer TEXT,
user_agent TEXT
) PARTITION BY RANGE (clicked_at); -- partition by month
Step 4: Architecture
Client
โ GET /a3Bx9
โผ
[ CDN / Edge ]
โ Cache-hit: 301 redirect immediately (< 5ms)
โ Cache-miss:
โผ
[ Load Balancer ]
โ
โผ
[ Redirect Service (stateless, horizontally scaled) ]
โ
โโ 1. Check Redis cache: GET url:a3Bx9
โ hit โ 301 redirect to cached long URL
โ miss โ
โโ 2. Read from DB: SELECT long_url FROM urls WHERE code='a3Bx9'
โ hit โ write to Redis (TTL 24h) โ 301 redirect
โ miss โ 404
โ
โโ 3. Fire-and-forget analytics event โ Kafka topic "clicks"
(does NOT block the redirect response)
[ Kafka "clicks" topic ]
โโ Analytics Consumer โ aggregates into ClickSummary table (PostgreSQL / ClickHouse)
Step 5: 301 vs 302 redirect
| 301 (Permanent) | 302 (Temporary) | |
|---|---|---|
| Browser caches? | Yes โ browser skips server on repeat visits | No โ browser always hits server |
| Analytics accuracy | Loses repeat-visit data | Captures every click |
| Server load | Lower (browser caches) | Higher (every hit reaches server) |
Decision: use 302 if you need click analytics on every visit. Use 301 if you want maximum redirect speed and browser caching, but accept losing analytics after first visit.
Hybrid: use 302 with CDN caching for a short TTL (5 minutes). You count all clicks, the CDN reduces server load.
Step 6: Caching strategy for redirects
URL lookup is 95%+ of traffic โ cache aggressively.
L1: CDN edge cache
- Cache 302 redirect responses with short Surrogate-Control TTL
- Invalidate on URL deletion
L2: Redis in-process cache (within app servers)
- Key: "url:{code}", Value: long_url
- TTL: 24 hours
- Eviction: LRU
Read path:
Redis hit โ 302 redirect (~2ms)
Redis miss, DB hit โ cache + 302 (~15ms)
DB miss โ 404 (~15ms)
Step 7: Analytics at scale
Problem: 120K redirects/sec โ inserting a row per click into a SQL table immediately will cause write bottleneck.
Solution: async analytics pipeline
Redirect service
โโ Produces "click" event to Kafka (non-blocking)
Kafka consumers:
โโ Real-time counter: INCR in Redis โ "a3Bx9:clicks:total"
โโ Hourly aggregator: batch INSERT into ClickHouse (columnar, analytics optimized)
โโ Geo enrichment: IP โ country via MaxMind, then store enriched event
ClickHouse (or Redshift, BigQuery) for analytics โ columnar storage makes aggregations like SELECT country, COUNT(*) FROM clicks WHERE code='a3Bx9' GROUP BY country 10โ100x faster than row-based Postgres.
Edge cases
| Case | Solution |
|---|---|
| Same long URL submitted twice | Check idx_urls_long_url before inserting; return existing code |
| Custom alias already taken | Return 409 Conflict |
| Expired link | Check expires_at in lookup; return 410 Gone |
| Malicious URLs | Block list check + Google Safe Browsing API on creation |
| Link deletion | Soft-delete (set deleted_at), purge from CDN + Redis |