Scaling — Flash Reference

What is scalability?

Ability of a system to handle increased load by adding resources without a full architectural overhaul.


Load metrics (measure these first)

Metric           | What it is
-----------------|--------------------------------------------
RPS              | API calls per second
Concurrent users | Users active at same time
Throughput       | Data transferred per unit time (e.g. GB/s)
QPS              | Database queries per second
Message rate     | Messages through queues per second

Good vs bad scaling (response time under load)

Load | Response time        | Verdict
-----|----------------------|----------------------------------------
2x   | ~same or slightly up | Excellent: sublinear, caching works
10x  | linear increase      | Acceptable: predictable
10x  | spike / timeout      | Critical: bottleneck or breaking point

Goal: Linear or sublinear degradation; avoid superlinear spikes.
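
The verdicts above can be checked mechanically once load and response time have been measured. The following is a minimal sketch, not a standard formula: the function name `classify_scaling` and the `tolerance` band around exactly linear growth are assumptions made for illustration.

```python
def classify_scaling(load_factor: float, latency_factor: float,
                     tolerance: float = 0.1) -> str:
    """Classify how response time grew relative to load.

    load_factor:    new load / baseline load (e.g. 10.0 for 10x traffic)
    latency_factor: new response time / baseline response time
    tolerance:      hypothetical slack band around exactly linear growth
    """
    if load_factor <= 1:
        raise ValueError("load_factor must be greater than 1")
    if latency_factor <= load_factor * (1 - tolerance):
        return "sublinear"      # latency grew more slowly than load
    if latency_factor <= load_factor * (1 + tolerance):
        return "linear"         # latency grew in step with load
    return "superlinear"        # latency grew faster than load


print(classify_scaling(2, 1.1))    # 2x load, latency barely moved
print(classify_scaling(10, 10))    # 10x load, 10x latency
print(classify_scaling(10, 50))    # 10x load, 50x latency
```

A "superlinear" result is the signal to look for a bottleneck before adding capacity.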


Vertical scaling (scale up)

Definition: Add more power to the same machine (more CPU, RAM, SSD, network).

  • Pros: Simple, no code change, low latency, no distributed complexity.
  • Cons: Hardware ceiling, single point of failure, superlinear cost curve (doubling capacity usually costs more than 2×), downtime during upgrades.
  • Use when: scaling a DB before resorting to sharding, strong consistency is required, the product is early-stage, or growth is predictable and moderate.

Horizontal scaling (scale out)

Definition: Add more machines; load balancer distributes traffic.

  • Pros: No hard limit, fault tolerance, often cheaper, can place nodes near users.
  • Cons: Complexity, data consistency, network latency, stateless app servers usually required.
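
To make "load balancer distributes traffic" concrete, here is a minimal round-robin sketch. Round-robin is only one possible policy (the text above does not name one), and the class name `RoundRobinBalancer` and the server names are hypothetical.

```python
class RoundRobinBalancer:
    """Hands out servers in rotation; a toy model of one LB policy."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._next = 0  # counter of requests routed so far

    def add_server(self, server):
        # Scaling out: a new machine simply joins the rotation.
        self._servers.append(server)

    def pick(self):
        if not self._servers:
            raise RuntimeError("no servers available")
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        return server


lb = RoundRobinBalancer(["app-1", "app-2"])
print([lb.pick() for _ in range(3)])
lb.add_server("app-3")
print([lb.pick() for _ in range(3)])
```

This only works cleanly if any server can handle any request, which is why stateless app servers are usually required.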

Stateless vs stateful

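Stateless                                                     | Stateful
--------------------------------------------------------------|---------------------------------------------------
No session on server; any server can serve any request        | Session on one server → all requests must go there
Session in shared store (Redis) or client-held token (JWT);   | Sticky sessions; hotspots; hard to remove servers
files in object storage (S3)                                  |
Easy to scale                                                 | Hard to scale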

Make stateless: Redis/Memcached for sessions, JWT, object storage (S3) for files.
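
A minimal sketch of the shared-store approach follows. `SharedSessionStore` is a hypothetical in-memory stand-in for an external store such as Redis, used here so the example runs without any service; `AppServer` and its methods are likewise illustrative names.

```python
import uuid


class SharedSessionStore:
    """Stand-in for an external store (e.g. Redis); a dict so the sketch runs."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, session):
        self._data[session_id] = dict(session)

    def load(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """Keeps no session state of its own; everything lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user})
        return session_id

    def handle(self, session_id):
        session = self.store.load(session_id)
        if session is None:
            return f"{self.name}: 401 not logged in"
        return f"{self.name}: hello {session['user']}"


store = SharedSessionStore()
app_1 = AppServer("app-1", store)
app_2 = AppServer("app-2", store)

sid = app_1.login("ada")      # login lands on app-1
print(app_2.handle(sid))      # the next request can land on app-2
```

Because neither server holds the session, either one can be removed or replaced without logging users out.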


Scaling by tier

Tier     | Ease    | Main levers
---------|---------|-----------------------------------------------------
App      | Easiest | Stateless + LB + auto-scale + multi-AZ
Database | Hardest | Read replicas, sharding, or NoSQL
Cache    | Easy    | Redis Cluster, consistent hashing, cache-aside
Queues   | Easy    | Decouple producers/consumers; partition (e.g. Kafka)
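
The cache row lists consistent hashing as a lever. The point of the technique is that adding or removing a node remaps only a fraction of the keys, whereas plain `key mod N` remaps most of them when N changes. Below is a minimal sketch of a hash ring with virtual nodes; the class name `HashRing`, the `vnodes` count, and the choice of SHA-256 are assumptions for illustration, not the scheme of any particular product.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() varies between runs).
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes   # virtual points per node, to even out the spread
        self._ring = []         # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise RuntimeError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Every key that moves goes to the newly added node; keys owned by the other nodes stay where they were.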

Database scaling (by bottleneck)

  • Read-heavy (10:1–100:1 read:write): Read replicas — primary writes, replicas read. Trade-off: replication lag.
  • Write-heavy or huge data: Sharding — partition by key (range / hash / directory). Trade-off: cross-shard queries and joins become expensive or unsupported, and rebalancing is hard.
  • Need both / flexibility: Sharding + replicas, or consider NoSQL (built-in sharding, eventual consistency, no joins).
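
The read-replica pattern and its replication-lag trade-off can be sketched with a toy model. Everything here is hypothetical: `ReplicatedDatabase` uses plain dicts in place of real database nodes, and `sync()` stands in for asynchronous replication catching up.

```python
import itertools


class ReplicatedDatabase:
    """Toy model: one primary plus replicas that lag until sync() runs."""

    def __init__(self, replica_count=2):
        if replica_count < 1:
            raise ValueError("need at least one replica")
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the primary.
        self.primary[key] = value

    def read(self, key, consistent=False):
        # Reads go to replicas unless the caller needs the latest value.
        if consistent:
            return self.primary.get(key)
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

    def sync(self):
        # Stand-in for replication catching up.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)


db = ReplicatedDatabase()
db.write("user:1", "ada")
print(db.read("user:1"))                   # stale: replicas have not caught up
print(db.read("user:1", consistent=True))  # read from the primary
db.sync()
print(db.read("user:1"))                   # replicas now agree
```

The `consistent=True` escape hatch illustrates a common workaround for read-your-own-writes, at the cost of putting that read back on the primary.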

Sharding strategies: Range (A–H, I–P…), Hash (key mod N), Directory (lookup table).
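
The three strategies differ only in how a key is mapped to a shard. A minimal sketch, with assumptions labeled: the third range (Q–Z) is added to complete the example, CRC32 is an arbitrary choice of stable hash, and all function and class names are hypothetical.

```python
import bisect
import zlib


def range_shard(key: str) -> int:
    """Range: first letter A-H -> 0, I-P -> 1, Q-Z -> 2 (Q-Z is assumed)."""
    upper_bounds = ["H", "P", "Z"]
    first = key[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError("key must start with a letter A-Z")
    return bisect.bisect_left(upper_bounds, first)


def hash_shard(key: str, shard_count: int) -> int:
    """Hash: stable hash of the key, mod N."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class DirectoryShard:
    """Directory: an explicit lookup table, so keys can be moved one by one."""

    def __init__(self):
        self._table = {}

    def assign(self, key: str, shard: int) -> None:
        self._table[key] = shard

    def lookup(self, key: str) -> int:
        return self._table[key]


print(range_shard("alice"), range_shard("Ivan"), range_shard("zoe"))
print(hash_shard("user:42", 4))

directory = DirectoryShard()
directory.assign("tenant-big", 3)   # e.g. pin a large tenant to its own shard
print(directory.lookup("tenant-big"))
```

Range keeps neighboring keys together (good for range scans, prone to hotspots), hash spreads keys evenly but scatters ranges, and directory gives full control at the cost of maintaining the lookup table.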


Caching

  • An in-memory cache can serve far more requests than a disk-backed DB, roughly ~100× as a rule of thumb (e.g. Redis 100k+ ops/s).
  • Cache-aside: App → cache → on miss → DB → populate cache.
  • Scale cache: Redis Cluster, consistent hashing.
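
The cache-aside flow above can be sketched as follows. The cache is a plain dict standing in for Redis, and `CacheAside`, `load_from_db`, and `fake_db` are hypothetical names; a real deployment would also need expiry (TTL) and invalidation on writes.

```python
class CacheAside:
    def __init__(self, load_from_db):
        self._load_from_db = load_from_db   # callable: key -> value
        self._cache = {}                    # stand-in for Redis
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # 1. Try the cache first.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # 2. On a miss, read from the DB and populate the cache.
        self.misses += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call after the underlying row changes, so the next read refetches it.
        self._cache.pop(key, None)


db_calls = []

def fake_db(key):
    db_calls.append(key)    # record each trip to the "database"
    return key.upper()

cache = CacheAside(fake_db)
cache.get("a")              # miss: goes to the DB
cache.get("a")              # hit: served from the cache
print(db_calls, cache.hits, cache.misses)
```

The application owns the logic here: the cache never talks to the DB itself, which is what distinguishes cache-aside from read-through caching.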

Message queues

  • Decouple producers and consumers; scale each independently.
  • Buffer spikes; consumers process at their own rate.
  • Kafka: partition topics for parallel consumption.
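
Partitioning can be illustrated with an in-memory sketch. This is not Kafka's API: `PartitionedTopic` is a hypothetical class, and CRC32 is an arbitrary stable hash chosen for the example. It shows the general idea that messages with the same key land in the same partition, so each partition can be consumed independently while per-key order is preserved.

```python
import zlib
from collections import deque


class PartitionedTopic:
    def __init__(self, partition_count):
        self.partitions = [deque() for _ in range(partition_count)]

    def publish(self, key, message):
        # Same key -> same partition, so per-key ordering is kept.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, message))
        return index

    def consume(self, partition_index, max_messages=1):
        # Each consumer drains its own partition at its own pace.
        partition = self.partitions[partition_index]
        batch = []
        while partition and len(batch) < max_messages:
            batch.append(partition.popleft())
        return batch


topic = PartitionedTopic(partition_count=3)
p = topic.publish("order-1", "created")
topic.publish("order-1", "paid")
print(topic.consume(p, max_messages=10))
```

Producers return as soon as the message is queued, which is what lets the queue absorb spikes while consumers catch up.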

Example: 0 → millions (stages)

Stage | Users     | Change                                        | Next bottleneck
------|-----------|-----------------------------------------------|---------------------------------
1     | 0–10K     | Single server (app + DB)                      | CPU/memory contention
2     | 10K–100K  | DB on separate server                         | DB read load
3     | 100K–500K | Add Redis cache                               | Single app server
4     | 500K–2M   | LB + multiple stateless app servers           | DB again
5     | 2M–10M    | Read replicas (primary writes, replicas read) | Write throughput
6     | 10M+      | Sharding (partition by user ID etc.)          | Cross-shard queries, rebalancing

One-line takeaways

  • Vertical: Same box, bigger box; simple but capped.
  • Horizontal: More boxes; needs stateless design and data strategy.
  • Always find the bottleneck before scaling (more app servers don’t help if DB is the limit).
  • Patterns: LB, caching, async queues, read replicas, sharding show up in almost every scalable system.
  • Scalability = handle more load; availability = stay up when things fail (next topic).
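
The bottleneck takeaway can be shown with a deliberately simple model: if every request passes through every tier, end-to-end capacity is that of the slowest tier. The function name `find_bottleneck` and the capacity figures are made up for illustration.

```python
def find_bottleneck(capacity_rps: dict) -> tuple:
    """Return (tier, capacity) for the tier with the lowest capacity.

    Assumes every request passes through every tier, so the slowest
    tier caps the whole system.
    """
    tier = min(capacity_rps, key=capacity_rps.get)
    return tier, capacity_rps[tier]


tiers = {"app": 2000, "cache": 100000, "db": 1500}   # hypothetical numbers
print(find_bottleneck(tiers))

tiers["app"] = 4000          # double the app servers...
print(find_bottleneck(tiers))  # ...and the limit has not moved
```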