api-load-tester
Generates and executes load test scripts for APIs using k6, wrk, or autocannon. Creates realistic test scenarios from OpenAPI specs, route files, or endpoint descriptions. Use when someone needs to load test, stress test, benchmark, or find the breaking point of their API. Trigger words: load test, stress test, benchmark, RPS, concurrent users, breaking point, performance test, k6, wrk.
Usage
Getting Started
- Install the skill using the command above
- Open your AI coding agent (Claude Code, Codex, Gemini CLI, or Cursor)
- Reference the skill in your prompt
- The AI will use the skill's capabilities automatically
Example Prompts
- "Load test our checkout API and find its breaking point"
- "Benchmark /api/search with 100 concurrent users and report latency percentiles"
Documentation
Overview
This skill generates realistic load test scripts from API definitions and executes them with proper ramp-up patterns, authentication flows, and assertions. It produces clear reports identifying breaking points, bottlenecks, and latency percentiles at each traffic level.
Instructions
Step 1: Choose Tool and Gather API Info
Prefer k6 for complex scenarios (multi-step flows, thresholds, custom metrics). Use wrk for quick single-endpoint benchmarks. Use autocannon if only Node.js is available.
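For quick single-endpoint benchmarks, the wrk and autocannon invocations look like the following (the URL is a placeholder; adjust thread/connection counts to your hardware):

```shell
# wrk: 4 threads, 100 connections, 30 seconds, with latency percentiles
wrk -t4 -c100 -d30s --latency https://api.example.com/api/search

# autocannon: same shape, Node.js only
npx autocannon -c 100 -d 30 https://api.example.com/api/search
```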
Gather endpoint information from:
- OpenAPI/Swagger spec files
- Route definitions (Express, FastAPI, etc.)
- User-described endpoints
Step 2: Generate Realistic Payloads
Read request/response types from the codebase (TypeScript interfaces, Python dataclasses, Go structs) and generate payloads with:
- Realistic field values (not "test123" or "foo")
- Proper data distributions (varied product IDs, realistic quantities)
- Edge cases mixed in (long strings, special characters at ~5% rate)
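A minimal payload-generator sketch for a hypothetical POST /api/orders endpoint — the field names (productId, quantity, note) are illustrative assumptions, not part of any real spec:

```javascript
// Generate varied, realistic payloads with ~5% edge cases mixed in.
const PRODUCT_IDS = Array.from({ length: 500 }, (_, i) => `prod_${1000 + i}`);
const EDGE_NOTES = [
  'a'.repeat(2048),               // long string
  'naïve "quotes" & <tags>',      // special characters
  'emoji 🛒 and RTL \u202e text', // unicode edge cases
];

function randomOrderPayload() {
  const isEdgeCase = Math.random() < 0.05; // ~5% edge-case rate
  return {
    productId: PRODUCT_IDS[Math.floor(Math.random() * PRODUCT_IDS.length)],
    quantity: 1 + Math.floor(Math.random() * 5), // realistic basket sizes, 1-5
    note: isEdgeCase
      ? EDGE_NOTES[Math.floor(Math.random() * EDGE_NOTES.length)]
      : 'Please leave at the front desk',
  };
}
```

The same pattern works inside a k6 script, since k6 supports standard JavaScript.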
Step 3: Design Test Scenarios
Create scenarios appropriate for the goal:
Ramp-up test (finding breaking point):
stages: [
  { duration: '2m', target: 50 },  // warm-up
  { duration: '5m', target: 200 }, // ramp
  { duration: '3m', target: 500 }, // push
  { duration: '2m', target: 500 }, // sustain
  { duration: '2m', target: 0 },   // cool-down
]
Soak test (finding memory leaks, connection exhaustion):
stages: [
  { duration: '5m', target: 100 },  // ramp
  { duration: '60m', target: 100 }, // sustain
  { duration: '5m', target: 0 },    // cool-down
]
Spike test (sudden traffic burst):
stages: [
  { duration: '2m', target: 50 },   // normal
  { duration: '30s', target: 500 }, // spike
  { duration: '5m', target: 500 },  // sustain spike
  { duration: '30s', target: 50 },  // drop back
]
Step 4: Include Proper Assertions
Always add thresholds:
thresholds: {
  http_req_duration: ['p(95)<800', 'p(99)<2000'],
  http_req_failed: ['rate<0.05'],
  // Custom per-endpoint thresholds if needed
}
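For per-endpoint thresholds, k6 supports submetrics filtered by tag; a sketch, where the `search` tag name is illustrative:

```javascript
// Requests must set the matching tag, e.g.:
//   http.get(url, { tags: { name: 'search' } });
thresholds: {
  http_req_duration: ['p(95)<800', 'p(99)<2000'],
  'http_req_duration{name:search}': ['p(95)<500'], // tighter bound for this endpoint
  http_req_failed: ['rate<0.05'],
}
```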
Step 5: Execute and Report
Run the test and parse output into a structured report showing:
- Latency percentiles (p50, p95, p99) per stage
- Error rate per stage
- Requests per second achieved
- Breaking point (where thresholds first violated)
- Bottleneck hypothesis based on error patterns
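A sketch of turning k6's `--summary-export` JSON into report lines. The metric keys follow k6's summary-export format; note that `p(99)` only appears if `summaryTrendStats` is configured in the script's options, so this sketch sticks to the default stats:

```javascript
// Parse a k6 --summary-export JSON object into human-readable report lines.
function summarize(summary) {
  const dur = summary.metrics.http_req_duration;   // trend: med, p(90), p(95), ...
  const reqs = summary.metrics.http_reqs;          // counter: count, rate
  const failed = summary.metrics.http_req_failed;  // rate: value (0..1)
  return [
    `Requests: ${reqs.count} total (${reqs.rate.toFixed(1)} RPS)`,
    `Latency: p50=${dur.med.toFixed(0)}ms p95=${dur['p(95)'].toFixed(0)}ms`,
    `Error rate: ${(failed.value * 100).toFixed(2)}%`,
  ];
}
```

Produce the input with `k6 run --summary-export=summary.json script.js`, then feed the parsed file to `summarize`.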
Examples
Example 1: Quick Endpoint Benchmark
User prompt:
Benchmark our /api/search endpoint handling 100 concurrent connections
Agent output:
# Quick benchmark with k6
k6 run --vus 100 --duration 60s - <<'EOF'
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const res = http.get('https://api.example.com/api/search?q=laptop&page=1');
  check(res, { 'status 200': (r) => r.status === 200 });
}
EOF
Results — /api/search @ 100 VUs for 60s
Requests: 8,423 total (140.4 RPS)
Latency: p50=89ms p95=340ms p99=890ms
Errors: 12 (0.14%)
Data: 24.3 MB received
Verdict: Endpoint handles 140 RPS at 100 VUs. p99 approaching 1s suggests
database query optimization needed for sustained higher load.
Example 2: Multi-Step Flow
User prompt:
Create a load test for our signup → login → create-project flow
Agent generates a k6 script with:
- Step 1: POST /api/auth/signup with randomized email/name
- Step 2: POST /api/auth/login to get JWT
- Step 3: POST /api/projects with auth header and realistic project data
- Custom metrics tracking each step's latency separately
- Sleep between steps to simulate real user behavior
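A sketch of that generated script — endpoints, field names, and the `token` response shape are assumptions about a hypothetical API, and it runs under the k6 runtime (`k6 run flow.js`), not Node.js:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend } from 'k6/metrics';

// Custom per-step latency metrics
const signupLatency = new Trend('signup_latency');
const loginLatency = new Trend('login_latency');
const projectLatency = new Trend('project_latency');

const BASE = 'https://api.example.com';
const HEADERS = { 'Content-Type': 'application/json' };

export default function () {
  // Unique email per virtual user and iteration avoids signup collisions
  const email = `user_${__VU}_${__ITER}@example.com`;

  // Step 1: signup
  let res = http.post(`${BASE}/api/auth/signup`,
    JSON.stringify({ email, name: 'Ada Lovelace', password: 'S3cure-pass!' }),
    { headers: HEADERS });
  check(res, { 'signup ok': (r) => r.status === 201 });
  signupLatency.add(res.timings.duration);
  sleep(Math.random() * 2 + 1); // 1-3s think time

  // Step 2: login, capture JWT
  res = http.post(`${BASE}/api/auth/login`,
    JSON.stringify({ email, password: 'S3cure-pass!' }),
    { headers: HEADERS });
  check(res, { 'login ok': (r) => r.status === 200 });
  loginLatency.add(res.timings.duration);
  const token = res.json('token'); // assumes a { "token": "..." } response body
  sleep(Math.random() * 2 + 1);

  // Step 3: create project with auth header
  res = http.post(`${BASE}/api/projects`,
    JSON.stringify({ name: `Project ${__ITER}`, description: 'Q3 launch planning' }),
    { headers: { ...HEADERS, Authorization: `Bearer ${token}` } });
  check(res, { 'project created': (r) => r.status === 201 });
  projectLatency.add(res.timings.duration);
  sleep(Math.random() * 2 + 1);
}
```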
Guidelines
- Never load test production without explicit confirmation — always clarify the target environment
- Start low, ramp gradually — sudden jumps make it hard to identify the exact breaking point
- Realistic think time — add sleep(1-3) between requests to simulate real users; without it, you're testing throughput, not user concurrency
- Authentication matters — many bottlenecks only appear with real auth flows (token validation, session lookups)
- Watch for connection reuse — k6 reuses connections by default, which is realistic for browsers but not for serverless/mobile clients
- Rate limit awareness — if the API has rate limiting, note it in the report; it's not a performance bottleneck, it's intentional
- Report infrastructure context — always note the server specs, pod count, and database size alongside results
Information
- Version
- 1.0.0
- Author
- terminal-skills
- Category
- DevOps
- License
- Apache-2.0