Terminal.skills
Use Cases/Implement Rate Limiting and API Abuse Protection

Implement Rate Limiting and API Abuse Protection

Build comprehensive API protection against abuse, DDoS attacks, and malicious traffic using intelligent rate limiting, IP blocking, and behavioral analysis.

DevOps#rate-limiting#api-security#middleware#security#owasp
Works with:claude-codeopenai-codexgemini-clicursor
$

The Problem

Maya, platform engineer at a 60-person API-first company, gets woken up at 3:47 AM by PagerDuty: "API response times >10 seconds." By the time she logs in, the payment processing API has crashed under 47,000 requests in 12 minutes from a single IP address. The attacker is hammering /api/payments/validate -- a resource-intensive operation that queries 3 external payment processors and takes 2.1 seconds per legitimate request.

This isn't the first time. Last month, a scraped API key generated 156,000 requests in 4 hours, racking up $2,847 in charges from Stripe, Auth0, and SendGrid. The month before that, a competitor's bot discovered /api/users/search and scraped 340,000 user profiles in 6 hours before anyone noticed. Each search query costs $0.12 in database resources and third-party enrichment.

The current "solution" is manual IP blocking after the damage is done. Someone notices the alerts, logs into the server, identifies the offending IP, and adds it to an nginx deny list. By the time the block takes effect, the damage is done.

No proactive rate limiting, no abuse detection, no automated response. Last quarter's incidents cost $8,200 in extra server capacity (auto-scaling responding to attack traffic as if it were real users), $4,100 in external API overage fees (Stripe, Auth0, and SendGrid don't care if the requests were malicious -- they bill the same), and 67 hours of engineering time on incident response. Legitimate users get caught in the crossfire every time servers slow down or crash under attack load.

The Solution

Using the rate-limiter and security-audit skills, the agent builds layered API protection: graduated rate limiting based on user tier and endpoint cost, behavioral analysis that detects abuse patterns automatically, and automated blocking that stops attacks in seconds instead of hours.

Step-by-Step Walkthrough

Step 1: Implement Multi-Tier Rate Limiting

Set up rate limiting for our Node.js API with different limits for different user tiers and endpoints. Free users get 100 requests/hour, paid users get 1000 requests/hour, enterprise users get 10,000 requests/hour. High-cost endpoints like payment validation should have stricter limits. Include IP-based limits as a backstop against abuse.

The rate limiter architecture lives in src/middleware/rate-limiter.ts and splits into three layers, all backed by Redis with sliding window counters (more accurate than fixed windows because they don't have the boundary spike problem where requests cluster at window boundaries).

User tier limits control overall API access:

TierRequests/HourBurst/Minute
Enterprise10,000200
Paid1,00050
Free10010
Anonymous202

Endpoint-specific limits protect expensive operations regardless of tier:

EndpointLimitWhy
POST /api/payments/validate10/hourCosts $0.89 per call to payment processors
POST /api/users/search50/hourCosts $0.12 in DB and enrichment per query
GET /api/reports/analytics20/hourHeavy database aggregation
POST /api/auth/login10/hour per IPBrute force prevention

IP-based backstop limits catch abuse that slips through user-level controls: 2,000 requests/hour per IP, 100/minute burst cap, 10/second hard ceiling. Exceeding the hard ceiling triggers an automatic 5-minute block.

Responses degrade gracefully. Under 80% of a limit, everything is normal. Between 80-95%, a warning header appears. Between 95-100%, a 200ms delay discourages automation. Over the limit, a 429 Too Many Requests response includes a Retry-After header.

Every response carries X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers so well-behaved clients can self-throttle. This is important for the enterprise customers -- their integrations check these headers and back off before hitting limits, which means they almost never see a 429 response.

The implementation uses Redis INCR with TTL for efficient counter management. Each rate limit check is a single Redis round-trip, adding less than 1ms of latency to every request. The sliding window algorithm uses two counters (current window and previous window) weighted by elapsed time, avoiding the boundary-spike problem where a burst of requests at the end of one window and the start of the next effectively doubles the allowed rate.

Step 2: Add Behavioral Analysis and Abuse Detection

Implement intelligent abuse detection that identifies suspicious patterns beyond simple rate limits. Detect scrapers, bot behavior, distributed attacks, and API key abuse. Include automatic escalation from warnings to temporary blocks to permanent bans based on threat severity.

Rate limits catch the obvious cases. Behavioral analysis catches everything else.

The abuse detector in src/security/abuse-detector.ts watches for seven suspicious indicators: rapid sequential requests (more than 5/second sustained for 30+ seconds), predictable request intervals suggesting automation, high error rates (more than 20% 4xx responses suggests endpoint probing), User-Agent anomalies (missing, spoofed, or known automation frameworks), geographic impossibilities (an API key used from 6 countries in one hour), and 24/7 usage patterns that no human produces.

Each request gets a threat score across four time windows (1 minute, 15 minutes, 1 hour, 24 hours). The score maps to three response levels:

  • Low threat (monitoring) -- slightly elevated usage, enhanced logging, no restrictions
  • Medium threat (throttling) -- clear automation detected, +500ms response delay, rate limits cut 50%, security team notified via Slack
  • High threat (blocking) -- attack patterns confirmed, immediate 429 or 403, 1-24 hour IP/API key suspension, incident response triggered

Distributed attacks get their own detection layer. When multiple IPs show similar request patterns, timing, or payloads, the system correlates them as a coordinated attack and blocks the entire cluster.

The escalation workflow prevents overreaction: first violation is a warning, second violation within 24 hours cuts rate limits by 50% with a 15-minute cooldown, third violation triggers a 1-hour block with security team alert. Repeated violations escalate to 24-hour blocks requiring manual review for reinstatement.

Step 3: Implement Automated Blocking and Response

Set up automated response systems that can block attacks in real-time without manual intervention. Include IP blocking, API key suspension, geographic blocking, and emergency lockdown modes. Ensure legitimate users aren't affected while stopping abusive traffic.

The auto-blocker in src/security/auto-blocker.ts responds to five trigger conditions:

  • More than 1,000 requests/minute from a single IP -- instant 5-minute block
  • More than 100 failed auth attempts/hour -- IP-based auth block
  • Payment validation exceeding 50/hour per key -- API key temporary suspension
  • Search scraping detected (500+ unique queries/hour) -- search endpoint block
  • DDoS pattern (10,000+ requests from 100+ IPs) -- emergency mode activation

Blocking penalties graduate: 5 minutes, then 1 hour, then 24 hours, then permanent review. API keys can be suspended entirely, downgraded to free-tier limits, or restricted from high-cost endpoints -- whichever is proportional to the threat.

Emergency protection modes handle the worst scenarios. Lockdown mode restricts access to authenticated, known-good users only. Read-only mode disables all write operations. A circuit breaker activates automatically when system load crosses critical thresholds.

Legitimate users stay protected through multiple mechanisms. IP whitelisting covers company VPNs and trusted partners. Behavioral learning gradually increases limits for users with consistent, non-abusive patterns -- a customer who's been making 50 requests/hour reliably for 6 months doesn't get flagged when they briefly spike to 80. Automatic false-positive detection watches for patterns like "legitimate user blocked during normal business hours" and triggers an immediate review. And a customer support override provides manual whitelisting during confirmed false positives, with the unblock taking effect within 60 seconds.

The false-positive rate is the most important metric to watch. Below 0.5%, the system is working well. Above 1%, it needs tuning -- either the behavioral thresholds are too aggressive or a new legitimate usage pattern needs to be added to the whitelist.

Step 4: Add Monitoring and Forensic Analysis

Create comprehensive monitoring for API abuse attempts, successful blocks, system performance during attacks, and detailed forensics for security analysis. Include real-time dashboards, automated incident response, and detailed attack attribution.

The security dashboard in src/monitoring/security-dashboard.ts provides real-time visibility into the entire protection system.

Alert routing ensures the right people know at the right time:

SeverityChannelTriggers
CriticalSlack + PagerDutyDDoS detected, payment endpoint under attack, response time 2x baseline, emergency lockdown activated
WarningSlack #securityNew IP in top-10 attackers, API key suspended, geographic blocking activated, false positive rate above 1%
Daily reportEmailFull summary: total requests, blocked requests, unique attack IPs, prevented costs, system availability

Forensic analysis tools support post-incident review: IP geolocation and ASN analysis to identify botnets and hosting providers, User-Agent fingerprinting to identify automation frameworks, request pattern analysis for timing and payload structure, and API key forensics for usage and geographic distribution.

A daily automated report tracks the metrics that matter: total requests versus baseline, percentage blocked, unique attack IPs (flagging repeat offenders), prevented costs from abuse, response time impact during mitigation, and system availability. Over time, these reports build a dataset that reveals attack patterns -- scraping attempts spike on Mondays, payment endpoint abuse correlates with new API key issuance, and the same hosting providers appear repeatedly as attack sources.

Forensic analysis also feeds back into the rate limiting rules. When an attack uses a pattern the behavioral analysis missed, the team adds it to the detection engine. The system gets smarter with every incident instead of just responding to the same attack types forever.

Real-World Example

A fintech startup offering credit score APIs was hemorrhaging money. Their /api/credit/check endpoint cost $1.47 per call (Experian + TransUnion data), and someone automated 23,000 credit checks over Memorial Day weekend -- $33,810 in charges before the team noticed on Tuesday morning.

The pattern repeated monthly: API keys scraped from client-side JavaScript, endpoints discovered through public documentation, automated abuse costing thousands. The worst single incident: a competitor scraped 67,000 user profiles using a leaked key, generating $47,000 in third-party API charges and violating compliance requirements. The breaking point came when their largest enterprise customer threatened to leave after experiencing 8-second response times during a scraping attack.

Week 1 focused on emergency rate limiting -- the bleeding had to stop. Aggressive caps on expensive endpoints, IP-based blocking for obvious abuse patterns, credit check endpoint restricted to 10/hour for free users. This alone prevented $28,000 in potential abuse during the first week, based on the blocked request volume multiplied by per-request cost.

Week 2 deployed behavioral analysis with graduated response: warning, throttling, blocking. The behavioral layer caught three attack patterns that simple rate limits missed -- a distributed scraping operation using 40 different IPs with the same request fingerprint, an API key being shared across 12 different geographic locations simultaneously, and a timing-based attack that stayed just under the rate limit but ran 24/7.

Week 3 added automated real-time blocking with CloudFlare edge-level integration, stopping attacks at the CDN layer before they even reach the API servers.

After 60 days, abuse incidents dropped from 12-15 per month to 2-3. External API costs from abuse fell from $33,810/month to $1,200/month -- a 96% reduction. Attack response time went from manual intervention (2-8 hours) to automated blocking (15-45 seconds). System availability during attacks improved from 67% to 99.4%.

The security investment of $180/month in Redis and monitoring tools prevents an estimated $6,400/month in abuse-related costs -- a 34x ROI. The engineering team went from spending 2-3 days per week on abuse incidents to reviewing automated daily reports that confirm the system handled everything without intervention.

The enterprise customer who threatened to leave renewed their contract after a demo of the security dashboard showing real-time attack blocking. They're now seeing consistent sub-2-second response times for credit checks, regardless of what attackers are doing on the other side of the protection layer. The compliance team gets automated audit trails for every API access, which simplified their next SOC 2 audit significantly.