Overview
This skill generates realistic load test scripts from API definitions and executes them with proper ramp-up patterns, authentication flows, and assertions. It produces clear reports identifying breaking points, bottlenecks, and latency percentiles at each traffic level.
Instructions
Step 1: Choose Tool and Gather API Info
Prefer k6 for complex scenarios (multi-step flows, thresholds, custom metrics). Use wrk for quick single-endpoint benchmarks. Use autocannon if only Node.js is available.
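For reference, the quick-benchmark equivalents are one-liners (the target URL is a placeholder):

# wrk: 4 threads, 100 connections, 30 seconds, with latency distribution
wrk -t4 -c100 -d30s --latency https://api.example.com/api/search

# autocannon: 100 connections for 30 seconds (Node.js only)
npx autocannon -c 100 -d 30 https://api.example.com/api/search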
Gather endpoint information from:
- OpenAPI/Swagger spec files
- Route definitions (Express, FastAPI, etc.)
- User-described endpoints
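When a spec file is available, a short Node script can enumerate the endpoints to test (the openapi.json filename is an assumption; adjust to the project):

// Sketch: list method + path pairs from an OpenAPI/Swagger JSON spec
const fs = require('fs');

const spec = JSON.parse(fs.readFileSync('openapi.json', 'utf8'));
const METHODS = ['get', 'post', 'put', 'patch', 'delete'];

for (const [route, ops] of Object.entries(spec.paths || {})) {
  for (const method of Object.keys(ops).filter((m) => METHODS.includes(m))) {
    console.log(method.toUpperCase(), route);
  }
}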
Step 2: Generate Realistic Payloads
Read request/response types from the codebase (TypeScript interfaces, Python dataclasses, Go structs) and generate payloads with:
- Realistic field values (not "test123" or "foo")
- Proper data distributions (varied product IDs, realistic quantities)
- Edge cases mixed in (long strings, special characters at ~5% rate)
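As a sketch, a payload generator inside a k6 script might look like this; the field names assume an order-style API and all values are illustrative:

// Randomized payloads with a ~5% edge-case mix (names and values hypothetical)
const PRODUCT_IDS = ['P-1042', 'P-2210', 'P-3377', 'P-4901'];
const EDGE_STRING = 'Ünïcode-' + 'x'.repeat(512); // long string with special characters

function randomPayload() {
  const edgeCase = Math.random() < 0.05;
  return {
    productId: PRODUCT_IDS[Math.floor(Math.random() * PRODUCT_IDS.length)],
    quantity: 1 + Math.floor(Math.random() * 5), // small, realistic quantities
    note: edgeCase ? EDGE_STRING : 'Gift wrap, please',
  };
}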
Step 3: Design Test Scenarios
Create scenarios appropriate for the goal:
Ramp-up test (finding breaking point):
stages: [
{ duration: '2m', target: 50 }, // warm-up
{ duration: '5m', target: 200 }, // ramp
{ duration: '3m', target: 500 }, // push
{ duration: '2m', target: 500 }, // sustain
{ duration: '2m', target: 0 }, // cool-down
]
Soak test (finding memory leaks, connection exhaustion):
stages: [
{ duration: '5m', target: 100 }, // ramp
{ duration: '60m', target: 100 }, // sustain
{ duration: '5m', target: 0 }, // cool-down
]
Spike test (sudden traffic burst):
stages: [
{ duration: '2m', target: 50 }, // normal
{ duration: '30s', target: 500 }, // spike
{ duration: '5m', target: 500 }, // sustain spike
{ duration: '30s', target: 50 }, // drop back
]
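Each stages array drops into a k6 options object. A minimal ramp-up script might look like this (the URL is a placeholder):

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },
    { duration: '5m', target: 200 },
    { duration: '3m', target: 500 },
    { duration: '2m', target: 500 },
    { duration: '2m', target: 0 },
  ],
};

export default function () {
  http.get('https://api.example.com/api/search?q=laptop');
  sleep(1 + Math.random() * 2); // 1-3s of think time per iteration
}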
Step 4: Include Proper Assertions
Always add thresholds:
thresholds: {
http_req_duration: ['p(95)<800', 'p(99)<2000'],
http_req_failed: ['rate<0.05'],
// Custom per-endpoint if needed
}
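Per-endpoint thresholds use k6's submetric syntax, which keys on request tags. A sketch, with the CreateProject tag name chosen for illustration:

thresholds: {
  http_req_duration: ['p(95)<800', 'p(99)<2000'],
  'http_req_duration{name:CreateProject}': ['p(95)<500'], // stricter budget for the write path
  http_req_failed: ['rate<0.05'],
}

The matching request supplies the tag: http.post(url, payload, { tags: { name: 'CreateProject' } }).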
Step 5: Execute and Report
Run the test and parse output into a structured report showing:
- Latency percentiles (p50, p95, p99) per stage
- Error rate per stage
- Requests per second achieved
- Breaking point (the load level at which thresholds were first violated)
- Bottleneck hypothesis based on error patterns
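Note that k6's end-of-test summary aggregates over the whole run; per-stage breakdowns require the streaming output (k6 run --out json=results.json) or an external metrics store. For the aggregate numbers, the standard handleSummary hook makes the report step scriptable:

// Write machine-readable results for the report step (replaces k6's default text summary)
export function handleSummary(data) {
  return {
    'summary.json': JSON.stringify(data, null, 2), // full metrics tree for parsing
    stdout: `p95=${data.metrics.http_req_duration.values['p(95)']}ms\n`, // quick console line
  };
}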
Examples
Example 1: Quick Endpoint Benchmark
User prompt:
Benchmark our /api/search endpoint handling 100 concurrent connections
Agent output:
# Quick benchmark with k6
k6 run --vus 100 --duration 60s - <<'EOF'
import http from 'k6/http';
import { check } from 'k6';
export default function () {
const res = http.get('https://api.example.com/api/search?q=laptop&page=1');
check(res, { 'status 200': (r) => r.status === 200 });
}
EOF
Results — /api/search @ 100 VUs for 60s
Requests: 8,423 total (140.4 RPS)
Latency: p50=89ms p95=340ms p99=890ms
Errors: 12 (0.14%)
Data: 24.3 MB received
Verdict: Endpoint handles 140 RPS at 100 VUs. p99 approaching 1s suggests
database query optimization needed for sustained higher load.
Example 2: Multi-Step Flow
User prompt:
Create a load test for our signup → login → create-project flow
Agent generates a k6 script with:
- Step 1: POST /api/auth/signup with randomized email/name
- Step 2: POST /api/auth/login to get JWT
- Step 3: POST /api/projects with auth header and realistic project data
- Custom metrics tracking each step's latency separately
- Sleep between steps to simulate real user behavior
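A minimal sketch of that script follows. The endpoint paths come from the prompt; the status codes, the token field in the login response, and all concrete values are assumptions to adapt:

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend } from 'k6/metrics';

// Per-step latency metrics (metric names are illustrative)
const signupTime = new Trend('signup_duration');
const loginTime = new Trend('login_duration');
const projectTime = new Trend('create_project_duration');

const BASE = 'https://api.example.com'; // placeholder target
const headers = { 'Content-Type': 'application/json' };

export default function () {
  // Unique credentials per VU and iteration to avoid signup collisions
  const email = `user-${__VU}-${__ITER}@example.com`;
  const password = 'S3cure-pass!';

  let res = http.post(`${BASE}/api/auth/signup`,
    JSON.stringify({ email, name: 'Dana Whitfield', password }), { headers });
  check(res, { 'signup ok': (r) => r.status === 201 });
  signupTime.add(res.timings.duration);
  sleep(1 + Math.random() * 2);

  res = http.post(`${BASE}/api/auth/login`,
    JSON.stringify({ email, password }), { headers });
  check(res, { 'login ok': (r) => r.status === 200 });
  loginTime.add(res.timings.duration);
  const token = res.json('token'); // response field name assumed
  sleep(1 + Math.random() * 2);

  res = http.post(`${BASE}/api/projects`,
    JSON.stringify({ name: 'Q3 Marketing Site', description: 'Landing page revamp' }),
    { headers: { ...headers, Authorization: `Bearer ${token}` } });
  check(res, { 'project created': (r) => r.status === 201 });
  projectTime.add(res.timings.duration);
  sleep(1 + Math.random() * 2);
}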
Guidelines
- Never load test production without explicit confirmation — always clarify the target environment
- Start low, ramp gradually — sudden jumps make it hard to identify the exact breaking point
- Realistic think time — add a randomized sleep() of 1-3 seconds between requests to simulate real users; without it, you're testing throughput, not user concurrency
- Authentication matters — many bottlenecks only appear with real auth flows (token validation, session lookups)
- Watch for connection reuse — k6 reuses connections by default, which is realistic for browsers but not for serverless/mobile clients; see the snippet after this list
- Rate limit awareness — if the API has rate limiting, note it in the report; it's not a performance bottleneck, it's intentional
- Report infrastructure context — always note the server specs, pod count, and database size alongside results
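For the connection-reuse guideline, k6 exposes a switch when one-connection-per-request better matches the client population:

// Model clients that do not keep connections alive (serverless, some mobile SDKs)
export const options = {
  noConnectionReuse: true,
};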