Implement API Caching Strategy to Cut Response Times and Costs

The Problem

Carlos, backend lead at a 40-person social media analytics platform, watches their database CPU spike to 89% every morning as customers check their dashboard metrics. The main analytics endpoint takes 2.3 seconds to respond because it aggregates data across 14 tables, calculating engagement rates for the past 30 days. That same expensive query runs 847 times per hour during peak usage -- identical calculations, over and over.

The worst part: 78% of these API calls return identical data. A customer refreshes their dashboard 6 times in 2 minutes, triggering the same computation each time. Database costs have grown from $340/month to $1,250/month as the team scaled from 2 to 8 CPU cores just to handle the query load. Response times during peak hours (9-11 AM) average 4.1 seconds, and 23% of mobile app users abandon the dashboard before it finishes loading.

Black Friday broke everything. 3,200 concurrent dashboard loads triggered 12,000+ database queries in 10 minutes. The connection pool maxed out at 200, new requests queued for 45+ seconds, and the API gateway started timing out. Customer support received 67 tickets about "slow loading" and "app not working." The infrastructure team panic-scaled to 16 database cores ($2,800/month) just to survive, but response times still averaged 6.7 seconds.

The Solution

Implement a multi-layer caching strategy using cache-strategy for intelligent cache design and docker-helper for Redis infrastructure. The approach: identify cacheable data patterns, implement application-level caching with smart invalidation, add HTTP caching headers, and monitor cache performance.

Step-by-Step Walkthrough

Step 1: Analyze API Performance and Identify Caching Opportunities

Analyze our Node.js API logs from the past 7 days to identify the slowest and most frequently called endpoints. I want to understand which queries are expensive, which data changes infrequently, and what our current cache hit/miss patterns look like. Focus on the analytics dashboard APIs that are causing database load.

Seven days of production logs reveal exactly where the database time goes:

Endpoint	Avg Response	Calls/Hour (Peak)	DB Load	Response Size
`GET /api/dashboard/analytics`	2.34s	847	34% of total	847KB
`GET /api/reports/engagement`	1.87s	234	18% of total	1.2MB
`GET /api/social/metrics/{id}`	0.94s	2,340	12% of total	45KB

The caching opportunity is massive:

89% of reference data calls (user settings, account info) return identical values
Analytics data refreshes every 30 minutes but gets queried every 2 minutes
78% of dashboard calls are repeat requests within 5 minutes
847 identical computations per hour for the top endpoint alone

Current infrastructure: $1,635/month total ($1,250 database + $340 application servers + $45 CDN) serving 12,000 unique users. The database is doing 10x more work than necessary.

Step 2: Implement Redis-Based Application Caching

Set up Redis caching for our top 3 API endpoints. Use different cache strategies: dashboard analytics with 15-minute TTL and user-specific keys, engagement reports with 1-hour TTL, and social metrics with 5-minute TTL. Include cache warming, compression for large payloads, and graceful degradation when Redis is unavailable.

The Redis infrastructure uses a primary-replica setup for read availability:

yaml

# docker-helper.yml
services:
  redis-primary:
    image: redis:7-alpine
    ports: ["6379:6379"]
    command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru
    volumes: ["redis_data:/data"]

  redis-replica:
    image: redis:7-alpine
    command: redis-server --slaveof redis-primary 6379 --readonly

Each endpoint gets a cache strategy tailored to its data characteristics:

Dashboard analytics -- the biggest win. User-specific keys with 15-minute TTLs balance freshness against performance. Responses over 10KB get gzip compressed (saves 67% memory). The top 50 users' dashboards are pre-warmed every 15 minutes so their first load of the day is instant:

typescript

// src/cache/dashboard.ts
const DASHBOARD_TTL = 900; // 15 minutes

async function getDashboard(userId: string, dateRange: string) {
  const key = `dashboard:analytics:${userId}:${dateRange}`;
  const cached = await redis.get(key);
  if (cached) return decompress(cached);

  const data = await computeDashboard(userId, dateRange);
  const compressed = data.length > 10240 ? compress(data) : data;
  await redis.setex(key, DASHBOARD_TTL, compressed);
  return data;
}

Engagement reports -- 1-hour TTL because data updates hourly from social APIs. Background refresh kicks in 5 minutes before expiry (stale-while-revalidate pattern), so users never see a cache miss during business hours.

Social metrics -- 5-minute TTL with probabilistic expiry. Each key's TTL varies by plus or minus 20% to prevent cache stampedes, where thousands of keys expire simultaneously and all hit the database at once.

Graceful degradation is built in: if Redis latency exceeds 100ms, the circuit breaker trips and the API falls back to direct database queries. The API never goes down because of a cache problem -- it just gets slower.

Step 3: Add HTTP-Level Caching and CDN Integration

Implement HTTP caching headers, set up Cloudflare caching for static and semi-static API responses, and add client-side caching strategies. Include conditional requests (ETag/If-None-Match) and vary headers for personalized content.

Three tiers of HTTP caching cover different data freshness needs:

Static API responses (user settings, app config):

Cache-Control: public, max-age=3600, s-maxage=7200
ETag: W/"abc123-version-hash"
Vary: Accept-Encoding, Authorization

CDN caches for 2 hours, browsers cache for 1 hour. These responses change rarely and can tolerate staleness.

Dynamic API responses (dashboard, metrics):

Cache-Control: private, max-age=300, must-revalidate
ETag: W/"dashboard-{user-id}-{timestamp}"

Browser caches for 5 minutes, must validate before reuse. The private directive keeps personalized data off shared CDN caches.

Time-sensitive data (real-time notifications):

Cache-Control: no-cache, no-store, must-revalidate

Always fresh, never cached. Some data genuinely needs to be real-time.

Conditional requests make the biggest difference for repeat visitors. When a dashboard hasn't changed, the server returns a 304 Not Modified with zero body -- saving 847KB per response. During peak hours, 23% of API calls resolve as 304s.

Step 4: Monitor Cache Performance and Optimize

Set up comprehensive cache monitoring with key metrics like hit rates, response time improvements, memory usage, and cost savings. Create alerting for cache failures and automated optimization recommendations.

After deployment, the numbers tell the story:

Cache hit rates:

Layer	Hit Rate	Target
Dashboard API	92.1%	>85%
Engagement reports	86.7%	>85%
Social metrics	91.4%	>85%
Static data	98.2%	>95%
Overall	89.3%	>85%

Response time improvements:

Endpoint	Before	After	Improvement
Dashboard analytics	2,340ms	87ms	96% faster
Engagement reports	1,870ms	134ms	93% faster
Social metrics	940ms	78ms	92% faster
P95 overall	4,100ms	340ms	92% faster

Infrastructure impact:

Database query volume: 847/hour to 92/hour (89% reduction)
CPU utilization: 89% to 34% (can downsize from 8 cores to 4)
Connection pool: 167/200 to 45/200 (no more saturation risk)

Monthly cost breakdown:

Item	Before	After	Change
Database	$1,250	$480	-$770
Application servers	$340	$180	-$160
Redis cluster	$0	$89	+$89
CDN	$45	$67	+$22
Total	$1,635	$816	-$819/month (50% savings)

Alerting monitors four conditions: hit rate dropping below 80% (possible invalidation cascade), Redis memory exceeding 90% (scale or adjust TTLs), cache response time exceeding 50ms (Redis performance issue), and cache miss spikes exceeding 50% (possible stampede). Mobile app abandonment drops from 23% to 4%.

Real-World Example

An e-learning platform with 45,000 active students was hemorrhaging money on database costs. Their course progress API was called 2,300 times per hour, running a complex aggregation query that took 3.2 seconds and cost $0.18 in database compute per execution. During exam periods, response times hit 8+ seconds, and students complained the platform was too slow to submit assignments.

The CTO was spending $2,100/month on database resources to handle repetitive queries, with 84% of API calls returning identical data from the previous 10 minutes. Students refreshing their progress dashboard triggered expensive recalculations every time.

Phase 1 (week 1) deployed application-level Redis caching with 15-minute TTLs for course progress. Cache hit rate reached 91% within 3 days, and database query volume dropped 89% immediately. Phase 2 (week 2) added Cloudflare edge caching for static course content, ETag headers for conditional requests, and browser-level caching for user preferences.

After 30 days: response times dropped from 3.2 seconds to 127ms (96% faster). Database costs fell from $2,100 to $340/month (84% reduction). Student "slow loading" tickets dropped 94%. During the next exam period, the system handled 3x the normal traffic with zero slowdowns -- the kind of load spike that would have crashed the system a month earlier. The caching infrastructure paid for itself in 8 days and saves $17,472 annually.

Skills stack · 2 skills

cache-strategy

docker-helper