Your Cache Is Making Things Worse

How bad caching strategy creates more problems than it solves.

Atharva Uday Unde · June 07, 2026 · 9 min read

The Seductive Lie of Caching

Teams love caching. It feels like free performance: add Redis, cache everything, and the app is 10x faster. Everyone's happy. Except it's not free, and faster isn't always better.

Bad caching creates more problems than it solves:

  • User makes a payment, sees it pending for 2 hours (cached stale data)
  • You deploy code, but old cached values are still served to half your users
  • Cache hit rate is 40% because TTL is wrong
  • Redis runs out of memory and starts evicting keys you still need
  • You spend 3 hours debugging "why is this endpoint returning the wrong data?" (it's the cache)

The real cost of caching isn't the cache hit. It's the cache miss you didn't plan for.


The Framework: When to Cache

Ask one question before caching anything:

"If this data is wrong for 30 seconds, does it break something?"

If yes, don't cache, or cache with a very short TTL. If no, cache.

Safe to Cache

  • User preferences (can be wrong for an hour)
  • Public data (can be wrong for a day)
  • Computed reports (can be wrong for 5 minutes)
  • Product catalogs (can be wrong for 10 minutes)

Not Safe to Cache

  • Payment status (wrong for 1 second = problem)
  • User balance (wrong for 1 second = problem)
  • Authorization decisions (wrong for 1 second = problem)
  • Session state (wrong for 1 second = problem)

The pattern: where freshness is critical, don't cache aggressively.
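
One way to make the question concrete is to write the staleness budget down next to the data it applies to. The sketch below is illustrative only: the budgets mirror the two lists above, and `STALENESS_BUDGET_SECONDS` and `ttl_for` are hypothetical names, not part of any library.

```python
from typing import Optional

# Maximum tolerated staleness per kind of data, in seconds.
# Values mirror the lists above; 0 means "do not cache".
STALENESS_BUDGET_SECONDS = {
    "user_preferences": 60 * 60,
    "public_data": 24 * 60 * 60,
    "computed_report": 5 * 60,
    "product_catalog": 10 * 60,
    "payment_status": 0,
    "user_balance": 0,
    "authorization": 0,
    "session_state": 0,
}


def ttl_for(kind: str) -> Optional[int]:
    """Return a TTL in seconds, or None when the data should not be cached.

    Unknown kinds are treated as uncacheable, so caching stays opt-in.
    """
    budget = STALENESS_BUDGET_SECONDS.get(kind, 0)
    return budget if budget > 0 else None
```

Defaulting unknown kinds to "no cache" is a deliberate choice here: someone has to decide how stale a new kind of data may be before it gets cached.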


The Real Problem: Cache Invalidation

As the saying attributed to Phil Karlton goes, there are only two hard things in computer science: cache invalidation and naming things. Teams underestimate the first. Caching a value is the easy part. Knowing when to throw it away is not.

When do you invalidate?

Option 1: Automatic TTL

  • Simple, but serves stale data
  • Good when staleness is acceptable
  • Bad when freshness is critical

Option 2: Invalidate on writes

  • Complex, but fresh data
  • Good when updates are infrequent
  • Bad when updates are frequent (entries get invalidated before anyone reads them)

Option 3: Event-based invalidation

  • Most complex, but flexible
  • Good for distributed systems
  • Bad for tightly coupled systems

The mistake: choosing an invalidation strategy after the cache is already deployed. Choose it first.
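
A minimal sketch of how the first two options can sit side by side, assuming an in-memory stand-in for the cache and the database. `TTLCache` and `UserStore` are hypothetical names, and the 60-second TTL is an arbitrary choice for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Option 1: the entry aged out on its own.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def invalidate(self, key: str) -> None:
        # Options 2 and 3: something told us the value changed.
        self._entries.pop(key, None)


class UserStore:
    """Hypothetical data layer: cache-aside reads, invalidate on write."""

    def __init__(self, cache: TTLCache) -> None:
        self._cache = cache
        self._db: Dict[int, Dict[str, str]] = {}  # stand-in for the database

    def save(self, user_id: int, record: Dict[str, str]) -> None:
        self._db[user_id] = dict(record)
        self._cache.invalidate(f"user:{user_id}")  # drop the stale entry

    def find(self, user_id: int) -> Optional[Dict[str, str]]:
        key = f"user:{user_id}"
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        record = self._db.get(user_id)
        if record is not None:
            # The TTL acts as a backstop if an invalidation is ever missed.
            self._cache.set(key, dict(record), ttl_seconds=60)
        return record
```

Event-based invalidation would call the same `invalidate` method, only from a message handler instead of from `save`.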


Common Mistakes

Mistake 1: Caching Database Queries Without Invalidation Strategy

You cache user.find(userId). Great. The user updates their email. Oops: the cache still has the old email. So you invalidate: remove the cache entry whenever the user is updated. Great.

Now the user updates their email, then their profile picture. Is that two invalidations or one? And in a distributed system, you have to invalidate on 3 services. One service misses the invalidation message and keeps serving the old email until its entry expires, if it ever does.

This is why cache invalidation is hard.
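
The missed-message failure is easy to reproduce in a toy model. The sketch below assumes three services with their own local caches and a broadcast that can drop a delivery. All names (`ServiceCache`, `broadcast_invalidation`, `stale_services`, the service names) are hypothetical.

```python
from typing import Dict, FrozenSet, List


class ServiceCache:
    """A per-service local cache that listens for invalidation messages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[str, str] = {}

    def on_invalidate(self, key: str) -> None:
        self.entries.pop(key, None)


def broadcast_invalidation(
    key: str,
    services: List[ServiceCache],
    dropped: FrozenSet[str] = frozenset(),
) -> None:
    """Deliver an invalidation to every service except those in `dropped`.

    `dropped` simulates lost messages; a real message bus fails less politely.
    """
    for service in services:
        if service.name not in dropped:
            service.on_invalidate(key)


def stale_services(key: str, current: str, services: List[ServiceCache]) -> List[str]:
    """Names of services still holding a value that differs from the source of truth."""
    return [s.name for s in services if key in s.entries and s.entries[key] != current]


services = [ServiceCache("api"), ServiceCache("billing"), ServiceCache("email")]
for s in services:
    s.entries["user:42:email"] = "old@example.com"

# The user changes their email; the message to the "email" service is lost.
broadcast_invalidation("user:42:email", services, dropped=frozenset({"email"}))
print(stale_services("user:42:email", "new@example.com", services))  # ['email']
```

Nothing in the two healthy services reveals that the third is stale, which is why a TTL is often kept as a backstop even when explicit invalidation is in place.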

Mistake 2: Using Redis as a Database

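Redis is fast, so teams use it for persistence. Then the server crashes, and every write since the last snapshot is gone (all of it, if persistence is off). Or Redis fills up and, depending on the eviction policy, starts deleting keys. Or a replica lags behind the primary, because replication is asynchronous, and a failover loses writes the application thought were safe.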

Use Redis as a cache (data loss acceptable) or use a real database (data loss not acceptable). Don't use it as both.

Mistake 3: Caching Expensive Computations Without Measuring

You have an expensive database query that takes 500ms, so you cache it. On paper, a cache hit drops it to 1ms from Redis.

Except: the network call to Redis takes 5ms, so a hit really costs 5ms. A cache miss takes 510ms (the cache round trips plus the computation). Cache hit rate is 60%.

Average latency: (0.6 × 5ms) + (0.4 × 510ms) = 207ms.
Without cache: 500ms.
With cache: 207ms.

Better? Yes. But you never measured. You just assumed caching helps.

Measure actual latency impact, not just cache hit rate.
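
The arithmetic above is worth keeping as a small function so it can be rerun with measured numbers. A minimal sketch, where `expected_latency_ms` is a hypothetical helper and the inputs are the figures from this example:

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency of a cached endpoint for a given hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# Numbers from the example above: 5 ms hit, 510 ms miss, 500 ms uncached.
for hit_rate in (0.0, 0.3, 0.6, 0.9):
    cached = expected_latency_ms(hit_rate, hit_ms=5, miss_ms=510)
    print(f"hit rate {hit_rate:.0%}: {cached:.1f} ms cached vs 500 ms uncached")
```

Feeding it measured hit and miss latencies from production, rather than estimates, is the point of the exercise.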

Mistake 4: Not Setting Max Memory Policy

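Redis memory fills up. What happens? It depends on maxmemory-policy. With the default, noeviction, Redis rejects writes once the maxmemory limit is reached, and anything that depends on those writes breaks. With allkeys-lru, it evicts the least recently used keys, so old data disappears without warning. With allkeys-random, it evicts keys at random, which is even worse.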

Know what your max memory policy is. Don't let it be a surprise.
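
The difference between rejecting writes and evicting keys is easier to see in a toy model. The sketch below is not Redis: it limits the number of keys rather than bytes, and it uses exact LRU where Redis uses an approximation. `BoundedCache` is a hypothetical class; in Redis itself the relevant settings are the `maxmemory` and `maxmemory-policy` directives.

```python
from collections import OrderedDict
from typing import Optional


class BoundedCache:
    """Toy key-value store with a key-count limit and a full-memory policy."""

    def __init__(self, max_keys: int, policy: str) -> None:
        if policy not in ("noeviction", "lru"):
            raise ValueError(f"unsupported policy: {policy}")
        self.max_keys = max_keys
        self.policy = policy
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        if key not in self._data and len(self._data) >= self.max_keys:
            if self.policy == "noeviction":
                # Writes fail loudly once the limit is reached.
                raise MemoryError("cache full: write rejected")
            # "lru": silently drop the least recently used key.
            self._data.popitem(last=False)
        self._data[key] = value
        self._data.move_to_end(key)


lru = BoundedCache(max_keys=2, policy="lru")
lru.set("a", "1")
lru.set("b", "2")
lru.get("a")       # "a" is now the most recently used key
lru.set("c", "3")  # evicts "b" without any error
print(lru.get("b"))  # None
```

Either behavior can be the right one. What matters is that the callers were written with that behavior in mind.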

Mistake 5: Assuming Cache Hits Always Improve Latency

Network latency to Redis: 5ms
Network latency to database: 50ms
Database query time: 400ms

Cache hit latency: 5ms
Cache miss latency: 455ms
No cache latency: 450ms

Cache hit saves 445ms. Great. But cache miss is slower than no cache. So your average depends on hit rate.

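With these numbers, the cache pays for itself on latency above a hit rate of roughly 1% (5ms of overhead against 450ms saved). The break-even climbs fast when the lookup is not much cheaper than the query: put the same 5ms cache in front of a 10ms query and you need a 50% hit rate just to break even. Below that, caching is pure overhead, before you even count memory and invalidation cost.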

Measure actual latency, not hit rate.
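
The break-even point can be computed rather than guessed. A minimal sketch, assuming average latency is the only cost being compared. `break_even_hit_rate` is a hypothetical helper, and the second call uses made-up numbers for a cheap query.

```python
def break_even_hit_rate(hit_ms: float, miss_ms: float, uncached_ms: float) -> float:
    """Hit rate at which cached and uncached average latency are equal.

    Solves h * hit_ms + (1 - h) * miss_ms == uncached_ms for h.
    """
    if miss_ms <= hit_ms:
        raise ValueError("a miss must cost more than a hit")
    return (miss_ms - uncached_ms) / (miss_ms - hit_ms)


# Numbers from the example above: 5 ms hit, 455 ms miss, 450 ms uncached.
print(f"{break_even_hit_rate(5, 455, 450):.1%}")  # 1.1%

# Hypothetical cheap query: 5 ms hit, 15 ms miss, 10 ms uncached.
print(f"{break_even_hit_rate(5, 15, 10):.1%}")  # 50.0%
```

The cheaper the underlying query, the higher the hit rate has to be before the cache earns its place.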


The Framework That Works

  1. Only cache what's safe to be stale
    • Freshness requirement determines TTL
    • Payment? No cache, or a 10-second TTL at most
    • User preference? 1 hour cache is fine
  2. Have an invalidation strategy before deployment
    • TTL? Event-based? On-write invalidation?
    • Don't say "we'll figure it out"
    • Invalidation complexity should influence your cache decision
  3. Measure actual impact
    • Don't trust hit rate
    • Measure latency with and without cache
    • Measure memory cost
    • If benefit is small, remove cache
  4. Know when caching makes things worse
    • Distributed system with eventual consistency? Caching amplifies the problem
    • High memory cost for low benefit? Remove it
    • Debugging takes 10x longer? Not worth it
  5. Monitor cache behavior in production
    • Hit rate vs latency
    • Memory usage
    • Eviction rate
    • If behavior changes, investigate
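
Steps 3 and 5 both need the cache to report more than a hit counter. The sketch below assumes a simple in-process cache-aside wrapper. `CacheStats`, `InstrumentedCache`, and the `loader` callback are hypothetical names, and a real deployment would ship these numbers to its metrics system rather than keep them in lists.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    hit_ms: List[float] = field(default_factory=list)
    miss_ms: List[float] = field(default_factory=list)

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def mean_latency_ms(self) -> float:
        samples = self.hit_ms + self.miss_ms
        return sum(samples) / len(samples) if samples else 0.0


class InstrumentedCache:
    """Cache-aside wrapper that records hits, misses, and per-request latency."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader  # hypothetical function that reads the real source
        self._data: Dict[str, str] = {}
        self.stats = CacheStats()

    def get(self, key: str) -> str:
        start = time.perf_counter()
        if key in self._data:
            value = self._data[key]
            self.stats.hits += 1
            self.stats.hit_ms.append((time.perf_counter() - start) * 1000)
            return value
        value = self._loader(key)  # miss: pay for the lookup and the load
        self._data[key] = value
        self.stats.misses += 1
        self.stats.miss_ms.append((time.perf_counter() - start) * 1000)
        return value
```

Recording hit and miss latency separately makes it possible to see when a high hit rate is hiding slow misses.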

When Not to Cache

  • Payment systems (use real-time data)
  • Authorization (use real-time data)
  • Distributed systems with complex invalidation (keep it simple)
  • Data that changes frequently (you'll serve lies)
  • Slow databases you want to hide (fix the database instead)

The Real Insight

Caching often optimizes the wrong thing. You cache a database query to make it faster. But why is the query slow?

  • Bad indexes? Fix it
  • N+1 queries? Fix it
  • Missing pagination? Fix it
  • Bad query logic? Fix it

Many teams add caching to mask poor database design. Fix the root cause first. Use caching for genuinely expensive operations that are unavoidable.
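
As an example of fixing the root cause, here is the N+1 pattern next to a single-query version, using an in-memory SQLite database. The schema, table names, and function names are made up for illustration.

```python
import sqlite3
from typing import Dict, List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    CREATE INDEX idx_posts_author_id ON posts(author_id);
    INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO posts (author_id, title) VALUES
        (1, 'First'), (1, 'Second'), (2, 'Third');
""")


def titles_n_plus_one(db: sqlite3.Connection) -> Dict[str, List[str]]:
    """One query for the authors, then one more query per author."""
    result: Dict[str, List[str]] = {}
    authors = db.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = db.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(db: sqlite3.Connection) -> Dict[str, List[str]]:
    """Same result from one joined query, regardless of the number of authors."""
    result: Dict[str, List[str]] = {}
    rows = db.execute("""
        SELECT a.name, p.title
        FROM authors AS a
        LEFT JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # authors without posts keep an empty list
            titles.append(title)
    return result
```

With two authors the difference is three queries against one. With ten thousand authors it is the difference that tempts people to reach for a cache.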


TL;DR

  • Only cache what's safe to be stale. If data needs to be fresh, don't cache aggressively.
  • Have an invalidation strategy first. Choose TTL, on-write, or event-based before you cache.
  • Measure actual impact. Hit rate alone tells you little. Measure latency and memory cost.
  • Don't cache to mask slow databases. Fix the database.
  • Monitor in production. If caching behavior changes, investigate.

Bad caching creates more problems than it solves. Good caching is invisible because the trade-offs are understood and managed.


Tags: caching · Redis · performance · backend · architecture · database optimization · distributed systems · DevOps