Caching
Caching stores copies of data closer to where it's needed, trading storage for speed. A well-placed cache can reduce response times by orders of magnitude and shield backend systems from load they don't need to handle.
Core Patterns
Cache-aside (also called lazy loading) is the most common pattern. The application checks the cache first. On a miss, it fetches from the source, stores the result in the cache, and returns it. The application controls all reads and writes, which makes the pattern straightforward to reason about.
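The cache-aside flow can be sketched in a few lines. This is a minimal illustration, not a production implementation; `fetch_from_source` is a hypothetical stand-in for the real backing-store call, and the cache is a plain dict.

```python
cache = {}

def fetch_from_source(key):
    # Hypothetical placeholder for the real backing store (e.g. a SQL query).
    return f"value-for-{key}"

def get(key):
    if key in cache:                    # 1. check the cache first
        return cache[key]
    value = fetch_from_source(key)      # 2. miss: fetch from the source
    cache[key] = value                  # 3. populate the cache
    return value                        # 4. return the result
```

Note that the application owns every step: the cache itself is a dumb store.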
Read-through moves the fetching logic into the cache layer itself. The application always reads from the cache, and the cache is responsible for populating itself on a miss. This simplifies application code but requires a cache that supports this behavior natively.
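One way to picture read-through is a cache object constructed with a loader callback, so the application only ever calls `get`. This is a sketch of the idea, assuming a synchronous loader; real read-through caches (e.g. in caching libraries or proxies) handle concurrency and errors as well.

```python
class ReadThroughCache:
    """Cache that populates itself on a miss via a loader callback."""

    def __init__(self, loader):
        self._loader = loader
        self._store = {}

    def get(self, key):
        if key not in self._store:
            # The cache layer, not the application, fetches from the source.
            self._store[key] = self._loader(key)
        return self._store[key]
```

The application code shrinks to `cache.get(key)`; the fetching logic lives in one place.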
Write-through writes data to both the cache and the backing store in the same operation. This keeps the cache consistent at the cost of higher write latency — every write must complete in two places before it's acknowledged.
Write-behind (write-back) writes to the cache immediately and asynchronously flushes to the backing store. This is faster for writes but introduces a window where data exists only in the cache. If the cache fails before flushing, that data is lost.
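The two write patterns can be contrasted side by side. In this sketch the backing store is a dict and the asynchronous flush is a background thread draining a queue; these are illustrative stand-ins for a real database and worker.

```python
import queue
import threading

backing_store = {}

# Write-through: both writes complete before the call returns.
cache_wt = {}

def write_through(key, value):
    cache_wt[key] = value
    backing_store[key] = value  # acknowledged only after both succeed

# Write-behind: the cache is updated now; the store is updated later.
cache_wb = {}
pending = queue.Queue()

def write_behind(key, value):
    cache_wb[key] = value
    pending.put((key, value))   # until flushed, this data exists only in the cache

def flush_worker():
    while True:
        key, value = pending.get()
        backing_store[key] = value
        pending.task_done()

threading.Thread(target=flush_worker, daemon=True).start()
```

The loss window is visible in the code: anything sitting in `pending` when the process dies never reaches `backing_store`.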
Invalidation
Cache invalidation is famously difficult because it's a distributed consistency problem. The main strategies:
Time-based expiration (TTL) is the simplest. Data expires after a fixed duration. It doesn't guarantee freshness, but it bounds staleness. Suitable when slightly outdated data is acceptable.
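A TTL cache is simple enough to sketch directly. This version evicts lazily on read, assuming a single process; it uses `time.monotonic` so clock adjustments don't affect expiry.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value
```

Staleness is bounded by construction: nothing older than `ttl_seconds` is ever served.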
Event-driven invalidation purges or updates cache entries when the underlying data changes. More precise than TTL but requires your system to reliably propagate change events — which often means a message queue or pub/sub layer.
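The shape of event-driven invalidation can be shown with a toy in-process pub/sub: subscribers purge their entries when a change event is published. A real system would replace this callback registry with Redis pub/sub, Kafka, or similar, and would have to handle delivery failures.

```python
subscribers = []

def subscribe(callback):
    subscribers.append(callback)

def publish_change(key):
    # In a real system this would go over a message queue or pub/sub channel.
    for cb in subscribers:
        cb(key)

cache = {"user:1": "old-profile"}
subscribe(lambda key: cache.pop(key, None))  # purge the entry on change

publish_change("user:1")  # the source data changed; the cache entry is gone
```

The precision comes from invalidating exactly the keys that changed, exactly when they change; the fragility comes from depending on every event actually arriving.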
Version-based invalidation attaches a version identifier to cached data. When the source data changes, the version increments, and old cache entries become naturally stale. This works well for immutable assets and API responses.
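One common way to implement this is to embed the version in the cache key itself, so bumping the version makes old entries unreachable rather than explicitly deleting them. A minimal sketch, with hypothetical names:

```python
versions = {"profile": 1}   # current version per data set
cache = {}

def cache_key(name, key):
    return f"{name}:v{versions[name]}:{key}"

cache[cache_key("profile", "alice")] = {"name": "Alice"}

# Source data changed: bump the version.
versions["profile"] += 1

# The old entry ("profile:v1:alice") is now stale by construction;
# reads under the new version miss and repopulate.
```

Nothing is deleted eagerly; abandoned entries under old versions age out via TTL or LRU eviction.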
Where Caching Lives
Application-level caches live inside your service process. An in-memory map or LRU cache avoids network round trips entirely — the fastest cache is the one you don't have to call over the network. The tradeoff is that each instance maintains its own copy, and cache entries don't survive process restarts.
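In Python, for example, the standard library ships an in-process LRU cache as a decorator; other languages have equivalents (e.g. Guava's `Cache` in Java). The cached results live in this process's memory only, which is exactly the tradeoff described above.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive_lookup(key):
    # Stand-in for a slow computation or remote query.
    return key * 2

expensive_lookup(21)   # computed on the first call
expensive_lookup(21)   # served from the in-process cache, no recomputation
stats = expensive_lookup.cache_info()  # hits, misses, current size
```

Another instance of the same service has its own, separate `lru_cache`, and a restart empties it.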
External caches like Redis sit between your application and its backing store as a shared service. Every instance reads from and writes to the same cache, which eliminates duplication and survives restarts. Redis supports strings, hashes, lists, sets, and sorted sets — which makes it useful beyond simple key-value caching. Expiration is built in at the key level. For high availability, Redis supports replication with automatic failover (via Sentinel) and horizontal partitioning (via Cluster mode).
HTTP caching happens at the edge — in browsers, CDNs, and reverse proxies like Caddy. The Cache-Control header tells intermediaries how long a response can be reused. This offloads traffic before it reaches your application at all, but it only works for responses that are safe to share across users.
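As an illustration, a response like the following tells any cache along the path that it may serve this body for up to an hour:

```http
HTTP/1.1 200 OK
Cache-Control: public, max-age=3600
ETag: "abc123"
Content-Type: application/json
```

`public` permits shared caches (CDNs, proxies) to store the response, `max-age=3600` bounds reuse to an hour, and the `ETag` lets a cache revalidate cheaply with a conditional request (`If-None-Match`) instead of re-downloading the body. Responses that vary per user would use `private` or `no-store` instead.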
Things That Go Wrong
Cache stampede occurs when a popular cache entry expires and many concurrent requests simultaneously hit the backing store to repopulate it. Mitigation techniques include locking (only one request fetches while others wait) and staggered TTLs (adding random jitter to expiration times so related entries don't all expire at once).
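The locking mitigation can be sketched with a double-checked lock: the first thread through fetches and repopulates, while the rest wait and then find the entry already present. `fetch_from_source` is again a hypothetical stand-in for the expensive backing-store call.

```python
import threading

cache = {}
lock = threading.Lock()

def fetch_from_source(key):
    # Hypothetical slow backing-store call we want to run once, not N times.
    return f"value-for-{key}"

def get_with_lock(key):
    value = cache.get(key)
    if value is not None:
        return value
    with lock:
        value = cache.get(key)  # re-check: another thread may have repopulated
        if value is None:
            value = fetch_from_source(key)
            cache[key] = value
    return value
```

A single global lock is the bluntest version; per-key locks (or "single-flight" helpers like Go's `golang.org/x/sync/singleflight`) avoid serializing unrelated keys.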
Stale data is the perennial tradeoff. Every caching strategy makes a bet about how long data remains valid. The right TTL depends on your tolerance for staleness and the cost of a cache miss. There is no universal answer — only a tradeoff appropriate to your use case.
Cold start happens when a cache is empty — after a restart, a deployment, or a new instance joining the pool. Every request is a miss, and the backing store takes the full load until the cache warms up. For critical paths, consider pre-warming the cache at startup with the most frequently accessed keys.
CPU Caches
Everything above describes caching at the software level — Redis, HTTP headers, in-memory maps. But the concept originates in hardware, and it's worth understanding the distinction.
Modern CPUs don't read from main memory directly. Between the processor and RAM sits a hierarchy of small, fast caches — L1, L2, and L3 — that store recently accessed data and instructions. L1 is the smallest and fastest (a few nanoseconds), L3 is larger but slower, and main memory is slower still by an order of magnitude. When the CPU needs data that's already in cache, it gets it almost instantly. When it's not — a cache miss — the processor stalls while it fetches from a slower level.
For the vast majority of application development, you never think about this. The CPU manages its caches transparently, and the performance difference is invisible behind the much larger costs of network calls, database queries, and disk I/O.
It starts to matter when you're processing large volumes of data in tight loops — parsing millions of rows, running numerical computations, or iterating over large data structures. In these cases, data locality becomes significant. Accessing memory sequentially (walking an array from start to end) is dramatically faster than jumping to random locations, because sequential access lets the CPU prefetch the next cache line before you need it. A linked list that scatters nodes across the heap defeats this prefetching — every pointer dereference is potentially a cache miss.
The practical takeaway: when performance matters at the data-processing level, prefer contiguous data structures (arrays, slices) over pointer-heavy ones (linked lists, trees of individually allocated nodes). This isn't premature optimization advice for typical application code — it's relevant when profiling reveals that memory access patterns are the bottleneck, which happens most often in data pipelines, parsers, and compute-intensive inner loops.
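The access-pattern effect can be demonstrated by summing the same array twice — once in sequential order, once in shuffled order. This sketch does the same work either way and produces the same result; on most machines the shuffled walk is measurably slower because each access can miss the cache instead of hitting a prefetched line. (Python adds interpreter overhead on top, so a compiled language shows the gap more starkly, but the pattern is the same.)

```python
import random
import time
from array import array

N = 1_000_000
data = array("d", (float(i) for i in range(N)))  # contiguous buffer of doubles

seq_idx = list(range(N))
rand_idx = seq_idx[:]
random.shuffle(rand_idx)

def sum_by_index(indices):
    total = 0.0
    for i in indices:
        total += data[i]
    return total

t0 = time.perf_counter()
s_seq = sum_by_index(seq_idx)       # sequential walk: prefetch-friendly
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
s_rand = sum_by_index(rand_idx)     # random walk: cache-hostile
t_rand = time.perf_counter() - t0
```

Same elements, same sum — only the order of memory accesses differs, and that order is what the timings reflect.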