How clients and services talk is a design decision with real tradeoffs. Pick the style by the communication pattern, design the API well, and make the calls resilient: those are the three things interviewers probe.
API styles, and when to choose each
| Style | Transport / format | Strengths | Weaknesses | Reach for it when |
|---|---|---|---|---|
| REST | HTTP + JSON | Ubiquitous, cacheable, simple, tooling everywhere | Over/under-fetching; many round trips for nested data | Public APIs, CRUD, broad client compatibility |
| gRPC | HTTP/2 + Protobuf (binary) | Fast, compact, strong typing/codegen, streaming, bidirectional | Not browser-native (needs proxy), binary = harder to debug | Internal service-to-service, low-latency, polyglot microservices |
| GraphQL | HTTP + query language | Client picks exactly the fields, so no over/under-fetch; one round trip for nested data; strong schema | Caching is harder (POST queries); server complexity; risk of expensive queries | Many varied clients, deeply nested data, mobile (bandwidth) |
Over-fetching = the endpoint returns more than the view needs (wasted bytes). Under-fetching = the view needs several endpoints / nested calls (waterfall round trips). REST is prone to both; GraphQL solves them by letting the client specify the shape, at the cost of caching and query-cost control (mitigated with persisted queries, depth/complexity limits, and DataLoader to batch and avoid N+1).
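The batching idea behind DataLoader can be sketched in a few lines. This is a simplified, synchronous illustration and not the real DataLoader library: `BatchLoader`, `fetch_authors`, and the sample data are hypothetical names invented for the example. It shows three posts resolving their authors with one backend call instead of three.

```python
from typing import Callable, Dict, List


class BatchLoader:
    """Collects keys, then resolves them with a single batched fetch."""

    def __init__(self, batch_fn: Callable[[List[int]], Dict[int, str]]):
        self._batch_fn = batch_fn
        self._pending: List[int] = []
        self._cache: Dict[int, str] = {}

    def load(self, key: int) -> None:
        """Queue a key; duplicates and already-cached keys are skipped."""
        if key not in self._cache and key not in self._pending:
            self._pending.append(key)

    def dispatch(self) -> None:
        """Resolve every queued key with one call to the backing store."""
        if self._pending:
            self._cache.update(self._batch_fn(self._pending))
            self._pending = []

    def get(self, key: int) -> str:
        return self._cache[key]


# Hypothetical backing store; `calls` records how often it is hit.
AUTHORS = {1: "Ada", 2: "Grace"}
calls: List[List[int]] = []


def fetch_authors(ids: List[int]) -> Dict[int, str]:
    calls.append(list(ids))
    return {author_id: AUTHORS[author_id] for author_id in ids}


posts = [
    {"id": 10, "author_id": 1},
    {"id": 11, "author_id": 2},
    {"id": 12, "author_id": 1},
]

loader = BatchLoader(fetch_authors)
for post in posts:
    loader.load(post["author_id"])  # naive resolvers would query here (N+1)
loader.dispatch()                   # one batched query for all authors
names = [loader.get(post["author_id"]) for post in posts]
```

Real implementations typically batch per event-loop tick and scope the cache to a single request; both are left out here to keep the sketch short.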
REST done right
REST is "use HTTP as intended." Senior signals:
Resource design. Nouns, not verbs: GET /users/123/orders, not /getUserOrders. Use HTTP methods for semantics: GET (read, safe), POST (create), PUT (replace, idempotent), PATCH (partial update), DELETE (idempotent).
Status codes (use the real ones, not 200 for everything):
- 2xx: 200 OK, 201 Created (+ Location), 202 Accepted (async), 204 No Content.
- 4xx: 400 bad request, 401 unauthenticated, 403 forbidden, 404 not found, 409 conflict, 422 validation, 429 rate-limited.
- 5xx: 500 server error, 503 unavailable (with Retry-After).
Idempotency. GET/PUT/DELETE are idempotent by definition; POST is not. For unsafe-to-retry operations (payments, orders) accept an Idempotency-Key header: the server records the key + result, so a retried request returns the original outcome instead of acting twice. (See classics.)
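A minimal sketch of the server side of an idempotency key, assuming an in-memory store. `PaymentService`, `charge`, and the response fields are hypothetical; a production version would need an atomic check-and-store in a durable database, an expiry on keys, and a check that the retried request body matches the original.

```python
import uuid
from typing import Dict


class PaymentService:
    """Toy payment endpoint that remembers results by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # Seen this key before: return the recorded outcome, do not act again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        # First time: perform the side effect and record its result.
        self.charges_made += 1
        result = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
            "status": "succeeded",
        }
        self._results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())             # client generates one key per logical operation
first = service.charge(key, 2500)
retry = service.charge(key, 2500)   # e.g. the client timed out and retried
```

The retry returns the same `charge_id` as the first call, and only one charge is made.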
Pagination. Cursor-based for large/changing lists (?cursor=&limit=): stable as the head changes, O(1) resume. Offset/page only for small, stable datasets.
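One way cursor pagination could look, sketched over an in-memory list sorted by `id`. The opaque cursor format, `list_items`, and the sample data are assumptions made for illustration; against a database the list comprehension would become a `WHERE id > :after_id ORDER BY id LIMIT :limit + 1` query.

```python
import base64
import json
from typing import List, Optional, Tuple

# Hypothetical data set, already sorted by id.
ITEMS = [{"id": i, "name": f"item-{i}"} for i in range(1, 8)]


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen id in an opaque, URL-safe token."""
    raw = json.dumps({"after_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after_id"]


def list_items(cursor: Optional[str] = None, limit: int = 3) -> Tuple[List[dict], Optional[str]]:
    after_id = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists.
    matching = [item for item in ITEMS if item["id"] > after_id][: limit + 1]
    page = matching[:limit]
    has_more = len(matching) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor
```

Because the cursor encodes a position in the sort order rather than a row count, items inserted at the head between requests do not shift or duplicate later pages.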
Versioning. URL (/v1/), header (Accept: application/vnd.api+json;version=1), or query param. URL versioning is the most visible/cacheable. Version when you make breaking changes; prefer additive, backward-compatible changes to avoid versioning at all.
Other niceties: ETag/If-None-Match for caching/conditional requests, consistent error envelopes, filtering/sorting via query params, HATEOAS (rarely required but worth naming).
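The ETag / If-None-Match exchange can be sketched as a pure function. `make_etag` and `get_resource` are hypothetical helpers, and the sketch assumes the header carries a single strong validator; a real server would also handle lists of ETags, `*`, and weak validators.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted validator from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def get_resource(resource: dict, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a conditional GET."""
    body = json.dumps(resource, sort_keys=True).encode()
    etag = make_etag(body)
    if if_none_match == etag:
        # Client copy is current: send no body, just confirm the validator.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body
```

A client that stores the `ETag` from the first 200 response and sends it back in `If-None-Match` gets a bodyless 304 until the resource changes.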
Real-time: WebSockets vs SSE vs long-polling
When the server must push to the client (chat, live scores, notifications):
| Technique | Direction | How | Best for | Cost |
|---|---|---|---|---|
| Short polling | client→server | Client requests on a timer | Simple, infrequent updates | Wasteful, laggy |
| Long polling | client→server | Request held open until data or timeout, then re-issued | Fallback when WS unavailable | Many held connections, hacky |
| SSE (Server-Sent Events) | server→client only | One long-lived HTTP stream, auto-reconnect | Server push: feeds, notifications, dashboards | Unidirectional; HTTP/1.1 connection limits |
| WebSocket | bidirectional | Persistent full-duplex TCP after HTTP upgrade | Chat, collaboration, games, anything two-way | Stateful connections complicate scaling/LB |
Choose by direction + frequency. Server-only push at moderate rate → SSE (simpler, plain HTTP, auto-reconnect). Two-way / high-frequency → WebSocket. No real-time need → don't; just poll or refetch. Scaling WebSockets means handling sticky/stateful connections, a pub/sub backplane (Redis) to broadcast across server instances, and connection limits; call that out.
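Part of why SSE counts as "simpler" is that the wire format is plain text over an ordinary HTTP response with `Content-Type: text/event-stream`. The sketch below only formats events; `format_sse` is a hypothetical helper, and the HTTP server that would stream its output is left out.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")    # lets the browser resume via Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")    # named event type for addEventListener
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")     # multi-line payloads use one data: line each
    return "\n".join(lines) + "\n\n"       # a blank line terminates the event


message = format_sse('{"home": 2, "away": 1}', event="score", event_id="7")
```

On reconnect the browser's `EventSource` sends the last `id` it saw in a `Last-Event-ID` header, which is what makes auto-reconnect resumable if the server replays from that point.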
Synchronous vs asynchronous (request/response vs event-driven)
- Synchronous (request/response): caller waits for the result. Simple, immediate, but couples caller to callee's availability and latency, and chains of sync calls compound latency and failure (one slow service stalls the whole request).
- Asynchronous (event-driven): caller emits an event / enqueues a job and moves on; the work happens later. Decouples services, smooths spikes, and improves resilience, at the cost of eventual consistency and harder debugging/tracing.
SYNC: client → A → B → C (waits; A's latency = A+B+C; C down → whole call fails)
ASYNC: client → A → [queue] → workers → B, C (A returns now; B/C catch up; retries safe)
Use sync when the caller genuinely needs the answer now (read a balance). Use async for slow/spiky/fan-out work (send email, process upload, update search index), and pair it with the outbox pattern so the event and the state change are atomic. "I'd return 202 and process asynchronously" is the right answer for anything the user doesn't need to block on.
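A minimal sketch of the outbox pattern using SQLite from the standard library. The table layout, `place_order`, and `relay_once` are assumptions for illustration: the state change and the event row are written in one transaction, and a separate relay publishes unpublished rows afterwards.

```python
import json
import sqlite3
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT NOT NULL,"
    " payload TEXT NOT NULL,"
    " published INTEGER NOT NULL DEFAULT 0)"
)


def place_order(order_id: int) -> None:
    """Write the order and its event in ONE transaction: both commit or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )


def relay_once(publish: Callable[[str, dict], None]) -> int:
    """Publish pending outbox rows in order, marking each one after it is sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))  # hand off to the broker (hypothetical)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes between publishing and marking a row, that event is sent again on the next pass, so this gives at-least-once delivery and consumers should be idempotent.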
API gateway responsibilities
The single entry point in front of a microservices fleet, centralizing cross-cutting concerns so services stay focused:
- AuthN/AuthZ: validate tokens once at the edge.
- Rate limiting & quotas: shed abuse before it reaches services.
- Routing & aggregation: path-based routing; combine several backend calls into one client response (BFF-style).
- Protocol translation: REST at the edge ↔ gRPC internally.
- TLS termination, logging, metrics, caching.
Keep it thin (no business logic; that recreates a monolith) and replicated (it's a choke point and potential SPOF).
Resilience: timeouts, retries, circuit breakers
Networks fail; calls hang. These three patterns turn brittle calls into resilient ones, and they interact, so know the order.
Timeouts. Never make an unbounded network call. A missing timeout means one slow dependency exhausts your thread/connection pool and cascades. Set aggressive, per-call timeouts (and budget them across a request chain).
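Budgeting timeouts across a request chain could look like the sketch below. `Deadline` and `timeout_for` are hypothetical names; the idea is that each downstream call gets the smaller of its own cap and whatever is left of the overall request budget.

```python
import time
from typing import Callable


class Deadline:
    """Tracks how much of a request's total time budget is left."""

    def __init__(self, budget_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def timeout_for(self, per_call_cap: float) -> float:
        """Timeout for the next call: its own cap, but never beyond the budget."""
        remaining = self.remaining()
        if remaining <= 0:
            # No point starting a call the caller has already given up on.
            raise TimeoutError("request budget exhausted")
        return min(per_call_cap, remaining)
```

The returned value is what would be passed as the timeout argument of whichever HTTP or RPC client makes the call, so a slow first hop shrinks the time allowed for later hops instead of letting the chain overrun.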
Retries. Retry transient failures (timeouts, 503, connection resets), but only idempotent operations, with exponential backoff + jitter (so retries don't synchronize into a thundering herd), and a cap (a few attempts, not infinite). Retrying a non-idempotent POST without an idempotency key double-acts.
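A sketch of capped retries with exponential backoff and "full" jitter (each delay drawn uniformly between zero and the exponential ceiling). `TransientError`, `backoff_delay`, and `call_with_retries` are hypothetical names, and the default base, cap, and attempt count are illustrative values rather than recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stands in for timeouts, 503s, and connection resets."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay below min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(operation: Callable[[], T], max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Run an IDEMPOTENT operation, retrying transient failures up to a cap."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(backoff_delay(attempt, rng=rng))
    raise AssertionError("unreachable")
```

Only `TransientError` is retried; any other exception propagates immediately, which keeps permanent failures such as a 400 from being retried pointlessly.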
Circuit breaker. If a dependency keeps failing, stop calling it. The breaker tracks the failure rate; past a threshold it trips (open) and fails fast for a cooldown (no waiting on a dead service), then half-opens to test recovery with a trial request, and closes when healthy. This prevents retries from hammering a struggling service and gives it room to recover, and lets you degrade gracefully (serve a cached/default response while open).
CLOSED ──(failures > threshold)──► OPEN ──(cooldown)──► HALF-OPEN
   ▲                                                        │
   └───────────────────(trial succeeds)─────────────────────┘
(trial fails → back to OPEN)
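The state machine above can be sketched as a small class. This is a simplified illustration: `CircuitBreaker` and `CircuitOpenError` are hypothetical names, it counts consecutive failures rather than a failure rate over a window, it trips once the count reaches the threshold, and it is not thread-safe.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "closed"

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self._cooldown:
                raise CircuitOpenError("failing fast; dependency presumed down")
            self.state = "half-open"  # cooldown elapsed: allow one trial call
        try:
            result = operation()
        except Exception:
            self._record_failure()
            raise
        # Success (including a half-open trial) closes the breaker.
        self._failures = 0
        self.state = "closed"
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial, or too many consecutive failures, (re)opens the breaker.
        if self.state == "half-open" or self._failures >= self._threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

A caller would catch `CircuitOpenError` and serve the cached or default response, which is where graceful degradation plugs in.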
Together: timeout bounds each call, retry w/ backoff handles transient blips, circuit breaker stops retries from worsening a sustained outage, and bulkheads (isolated pools per dependency) keep one failing dependency from sinking the rest. Add rate limiting (token bucket) at the edge to protect against overload. (See building blocks and classics.)
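A token bucket is small enough to sketch in full. `TokenBucket` and `allow` are hypothetical names and the sketch is single-process and not thread-safe; an edge deployment would usually keep the counters in a shared store so every gateway replica sees the same limits.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_second
        self._capacity = capacity
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; otherwise reject (e.g. with a 429)."""
        now = self._clock()
        # Refill for the time elapsed since the last check, up to capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The capacity sets how large a burst is tolerated and the rate sets the sustained throughput, which is why the two are tuned separately.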
Interview questions & model answers
Q: REST vs gRPC vs GraphQL?
"REST for public/CRUD APIs: ubiquitous, cacheable, simple. gRPC for internal service-to-service: binary Protobuf over HTTP/2 is fast and strongly typed with codegen and streaming, but not browser-native. GraphQL when many varied clients need flexible nested data and REST would over/under-fetch: the client picks the fields, at the cost of harder caching and query-cost control. Often gRPC internally, REST or GraphQL at the edge."
Q: Over-fetching vs under-fetching?
"Over-fetching: the endpoint returns more than the view needs, which wastes bandwidth. Under-fetching: the view needs multiple calls, causing waterfall round trips. REST suffers both; GraphQL fixes them by letting the client specify the exact shape, which I'd protect with depth/complexity limits and DataLoader batching to avoid N+1."
Q: WebSockets vs SSE vs long-polling?
"By direction and frequency. Server-to-client push at moderate rate → SSE: one long-lived HTTP stream with auto-reconnect, simple. Two-way or high-frequency → WebSocket: persistent full-duplex, but stateful so scaling needs sticky connections and a Redis pub/sub backplane to broadcast across instances. Long-polling only as a fallback. No real-time need → just refetch."
Q: How do you make a POST safe to retry?
"An idempotency key: the client sends a unique key per logical operation; the server records the key with its result, so a retry returns the original outcome instead of acting twice. Combined with timeouts and bounded retries, that makes the call safe under network failures."
Q: Sync vs async: when each?
"Sync when the caller needs the answer now and the chain is short, such as reading data. Async for slow, spiky, or fan-out work: return 202, enqueue, and process in workers, which decouples services and smooths load at the cost of eventual consistency. I'd pair async event publishing with the outbox pattern so the event and DB write are atomic."
Q: How do retries, timeouts, and circuit breakers fit together?
"Timeouts bound every call so a hung dependency can't exhaust my pool. Retries with exponential backoff and jitter handle transient failures, but only for idempotent calls, with a cap. A circuit breaker trips after sustained failures so I fail fast instead of retrying a dead service, then half-opens to probe recovery. Plus bulkheads to isolate pools and rate limiting at the edge."
Q: What does an API gateway do, and what should it not do?
"It centralizes cross-cutting concerns: auth, rate limiting, routing, request aggregation, protocol translation, TLS, observability, so services stay focused. It should NOT hold business logic (that recreates a monolith at the edge) and must be replicated since it's a choke point and SPOF."
Q: How do you version a REST API?
"Prefer additive, backward-compatible changes so I rarely need to version. When a breaking change is unavoidable, URL versioning (/v1/) is the most visible and cacheable; header-based is cleaner but less obvious. I deprecate old versions on a published timeline rather than breaking clients."
Common mistakes / what weak candidates do
- Returning 200 for everything (including errors), defeating clients and caches.
- Treating POST as idempotent: retrying without an idempotency key double-charges.
- Using verbs in REST paths (/getUser) and ignoring HTTP method semantics.
- Defaulting to WebSockets when SSE (or plain refetch) suffices, then ignoring the stateful-scaling cost.
- Making everything synchronous, so one slow service stalls the whole request chain.
- Calling without timeouts, or retrying non-idempotent ops / retrying without backoff+jitter (thundering herd).
- No circuit breaker, so retries pile onto a failing dependency and cascade.
- Putting business logic in the gateway or forgetting it's a SPOF that must be replicated.
- Choosing GraphQL without addressing caching and query-cost/N+1 control.