Tomás leads security for a fintech startup with 12 microservices, 3 web apps, and a mobile API. The team does manual penetration testing once a year — it costs $40K, takes 3 weeks, and by the time the report arrives, half the findings are already outdated because the codebase changed. He needs continuous security testing that runs against every staging deployment, catches regressions immediately, and builds institutional knowledge about what breaks and why. He deploys PentAGI as an autonomous security testing platform.
Step 1: Deploy PentAGI on Isolated Infrastructure
Security testing tools must never run on the same network as production. Tomás provisions a dedicated testing VPS with Docker.
# Provision a dedicated pentest server (isolated from production)
# Requirements: 8GB RAM, 4 CPU cores, 100GB SSD
# Clone and configure PentAGI
git clone https://github.com/vxcontrol/pentagi.git
cd pentagi
cp .env.example .env
# .env — Configuration for continuous security testing
# LLM — Using Anthropic for strong reasoning on complex attack chains
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-20250514
ANTHROPIC_API_KEY=sk-ant-...
# Search — Tavily for CVE lookups and exploit research
TAVILY_API_KEY=tvly-...
# Database
POSTGRES_PASSWORD=pentest-db-2026-secure
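# Compose does not execute $(...) inside .env files: run `openssl rand -hex 32` and paste the output
SECRET_KEY=<paste-generated-hex>
# Monitoring
GRAFANA_ADMIN_PASSWORD=grafana-secure-2026
# Generate a second, separate value the same way
LANGFUSE_SECRET_KEY=<paste-second-generated-hex>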
# Network — bind only to internal network
BIND_ADDRESS=10.0.50.10
# Deploy the full stack
docker compose up -d
# Verify all services are healthy
docker compose ps
# NAME               STATUS         PORTS
# pentagi-api        Up (healthy)   3000/tcp
# pentagi-ui         Up             3001/tcp
# pentagi-postgres   Up (healthy)   5432/tcp
# pentagi-neo4j      Up             7474/tcp, 7687/tcp
# pentagi-grafana    Up             3002/tcp
# pentagi-langfuse   Up             3003/tcp
# pentagi-scraper    Up             9222/tcp
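The manual `docker compose ps` check can also be scripted. The following is a minimal sketch, assuming the text printed by `docker compose ps --format json` is passed in as a string. Newer Compose releases print one JSON object per line and older ones print a single array, so the parser accepts both. The function names are hypothetical, and the `Name`, `State`, and `Health` fields should be verified against the Compose version in use.

```typescript
// Sketch: flag containers that are not running or whose health check is not "healthy".
// Assumes the input is the text printed by `docker compose ps --format json`.
interface ComposeService {
  Name: string;
  State: string;   // e.g. "running", "exited"
  Health?: string; // "healthy", "unhealthy", "starting", or "" when no health check exists
}

function parseComposePs(output: string): ComposeService[] {
  const text = output.trim();
  if (text === "") return [];
  // Older Compose releases print one JSON array; newer ones print one object per line.
  if (text.startsWith("[")) return JSON.parse(text) as ComposeService[];
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ComposeService);
}

function findUnhealthyServices(output: string): string[] {
  return parseComposePs(output)
    .filter((svc) => {
      const notRunning = svc.State !== "running";
      // Services without a health check report an empty or missing Health field.
      const hasHealthCheck = svc.Health !== undefined && svc.Health !== "";
      return notRunning || (hasHealthCheck && svc.Health !== "healthy");
    })
    .map((svc) => svc.Name);
}
```

A deploy script could exit non-zero whenever `findUnhealthyServices` returns a non-empty list. A service whose health check is still `starting` is flagged too, so the check is best run after a short wait.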
Step 2: Define the Engagement Scope
The first assessment targets the staging environment. Tomás defines clear boundaries — what to test, what to avoid, and what the AI agents are allowed to do.
// scripts/create-engagement.ts: define and launch a security assessment
// Run as an ES module (top-level await) on Node 18+, where fetch is built in
const PENTAGI_URL = 'http://10.0.50.10:3000/graphql'

const engagement = await fetch(PENTAGI_URL, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.PENTAGI_TOKEN}`,
  },
  body: JSON.stringify({
    query: `
      mutation CreateTask($input: CreateTaskInput!) {
        createTask(input: $input) {
          id
          status
        }
      }
    `,
    variables: {
      input: {
        name: 'Staging Full Assessment - Sprint 47',
        target: 'staging.internal.finpay.dev',
        objective: `Perform a comprehensive security assessment of the FinPay staging environment.
          Target services:
          - Web application (React + Next.js) at staging.internal.finpay.dev
          - REST API at api-staging.internal.finpay.dev
          - Mobile API at mobile-staging.internal.finpay.dev
          - Admin panel at admin-staging.internal.finpay.dev
          Focus areas:
          1. Authentication and session management flaws
          2. API authorization bypass (IDOR, privilege escalation)
          3. Input validation (SQL injection, XSS, SSRF)
          4. Business logic flaws in payment flows
          5. Exposed sensitive data in API responses
          6. Misconfigured security headers
          7. Known CVEs in dependencies`,
        scope: [
          'port-scan',
          'service-enum',
          'web-app-test',
          'api-fuzz',
          'auth-test',
          'vuln-scan',
        ],
        constraints: [
          'no-dos',                     // don't run denial-of-service tests
          'no-data-exfil',              // don't extract real user data
          'no-brute-force-production',  // staging only
          'max-concurrent-requests:50', // don't overwhelm staging infra
          'test-accounts-only',         // use provided test credentials
        ],
        credentials: {
          testUser: { email: 'pentest-user@test.finpay.dev', password: 'TestPass2026!' },
          testAdmin: { email: 'pentest-admin@test.finpay.dev', password: 'AdminTest2026!' },
        },
      },
    },
  }),
}).then(r => r.json())

// GraphQL reports failures in an `errors` array, often with HTTP 200, so check before reading data
if (engagement.errors?.length) {
  throw new Error(`createTask failed: ${JSON.stringify(engagement.errors)}`)
}
console.log(`Engagement started: ${engagement.data.createTask.id}`)
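Because the engagement hands an autonomous agent real attack tooling, it can help to enforce the boundary in code before the request is ever sent. The sketch below is one way to do that, assuming an explicit allowlist built from the four staging hosts named in the objective. `ALLOWED_TARGETS`, `isInScope`, and `assertInScope` are hypothetical names and not part of PentAGI.

```typescript
// Sketch: refuse to launch an engagement against a host outside an explicit allowlist.
// ALLOWED_TARGETS is a hypothetical list built from the hosts named in the objective.
const ALLOWED_TARGETS: readonly string[] = [
  "staging.internal.finpay.dev",
  "api-staging.internal.finpay.dev",
  "mobile-staging.internal.finpay.dev",
  "admin-staging.internal.finpay.dev",
];

function isInScope(target: string, allowed: readonly string[] = ALLOWED_TARGETS): boolean {
  let host: string;
  try {
    // Accept a bare hostname or a full URL; the URL parser also strips ports and userinfo.
    host = new URL(target.includes("://") ? target : `https://${target}`).hostname;
  } catch {
    return false; // unparseable targets are treated as out of scope
  }
  // Exact match only: suffix matching would accept "staging.internal.finpay.dev.evil.example".
  return allowed.includes(host.toLowerCase());
}

function assertInScope(target: string): void {
  if (!isInScope(target)) {
    throw new Error(`Refusing to test out-of-scope target: ${target}`);
  }
}
```

Calling `assertInScope('staging.internal.finpay.dev')` just before the `fetch` would stop a typo or a copied production hostname from ever reaching the agents.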
Step 3: AI Agents Execute the Assessment
Once launched, PentAGI's multi-agent system works autonomously. The primary agent orchestrates the assessment, delegating to specialized agents.
Phase 1: Reconnaissance (agents work in parallel)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[Primary Agent] Planning reconnaissance phase for staging.internal.finpay.dev
[Infra Agent] Running: nmap -sV -sC -p- staging.internal.finpay.dev
→ Found 7 open ports: 22, 80, 443, 3000, 5432, 6379, 8080
[Research Agent] Querying CVE database for detected service versions
→ nginx/1.25.3: 2 known CVEs (low severity)
→ PostgreSQL 16.1: 1 known CVE (medium, auth bypass)
→ Redis 7.2.3 exposed without auth ⚠️ CRITICAL
[Research Agent] Web scraping: checking robots.txt, sitemap.xml, .well-known
→ Found /api/docs (Swagger UI exposed)
→ Found /.env.example (information disclosure)
Phase 2: Vulnerability Scanning
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[Primary Agent] Prioritizing targets based on reconnaissance
[Infra Agent] Running: nikto -h https://staging.internal.finpay.dev
[Infra Agent] Running: sqlmap --crawl=3 -u https://api-staging.internal.finpay.dev
[Dev Agent] Testing API authorization: trying test-user credentials on admin endpoints
→ FINDING: /api/admin/users accessible with regular user token ⚠️ HIGH
[Dev Agent] Testing payment flow for logic flaws
→ FINDING: negative amount accepted in transfer API ⚠️ CRITICAL
Phase 3: Exploitation Attempts
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[Primary Agent] Attempting exploitation of confirmed vulnerabilities
[Dev Agent] Redis exposed without auth → connected, read session data
→ CONFIRMED: Session hijacking possible via Redis
[Dev Agent] IDOR on /api/users/{id}/transactions — can read other users' data
→ CONFIRMED: Full transaction history accessible
[Research Agent] Searching knowledge graph for similar patterns
→ Previous engagement found same IDOR pattern in /api/users/{id}/settings
→ Checking: /api/users/{id}/settings still vulnerable → YES ⚠️
Phase 4: Report Generation
━━━━━━━━━━━━━━━━━━━━━━━━━━
[Primary Agent] Compiling findings into vulnerability report
→ 3 Critical, 4 High, 6 Medium, 8 Low findings
→ Report generated with evidence and remediation steps
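The delegation pattern visible in this log, with phases running in order and agents running concurrently inside a phase, can be sketched in a few lines. This is an illustration of the pattern only, not PentAGI's implementation: the `Agent` type, the `Finding` shape, and `runAssessment` are all hypothetical.

```typescript
// Sketch of phased delegation. NOT PentAGI's actual implementation:
// Agent, Finding, and runAssessment are hypothetical names for illustration.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  title: string;
  severity: Severity;
}

type Agent = (target: string) => Promise<Finding[]>;

async function runAssessment(
  target: string,
  phases: Agent[][],
): Promise<Record<Severity, number>> {
  const findings: Finding[] = [];
  // Phases run in order; the agents inside one phase run concurrently.
  for (const agents of phases) {
    const results = await Promise.allSettled(agents.map((agent) => agent(target)));
    for (const result of results) {
      // A crashed agent should not abort the assessment; keep what the others found.
      if (result.status === "fulfilled") findings.push(...result.value);
    }
  }
  // Tally by severity, the same shape as the summary line in the report phase.
  const summary: Record<Severity, number> = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const finding of findings) summary[finding.severity] += 1;
  return summary;
}
```

Using `Promise.allSettled` rather than `Promise.all` is a deliberate choice in this sketch: one tool timing out should cost that agent's findings, not the whole run.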
Step 4: Integrate into CI/CD Pipeline
Run PentAGI automatically against every staging deployment. Fail the pipeline if critical vulnerabilities are found.
# .github/workflows/security-test.yml: automated security testing
name: Security Assessment
on:
  # deployment_status has no activity types to filter on, so the state is checked in the job's `if`
  deployment_status:
jobs:
  pentest:
    # Run only after a successful staging deployment
    if: github.event.deployment_status.state == 'success' && github.event.deployment.environment == 'staging'
    # The runner must be able to reach the PentAGI host; a private address needs a self-hosted runner or a tunnel
    runs-on: ubuntu-latest
    timeout-minutes: 120 # max 2 hours for security assessment
    env:
      # Base URL of the PentAGI API, without /graphql (assumed here to be stored as a repository secret)
      PENTAGI_URL: ${{ secrets.PENTAGI_URL }}
    steps:
      - uses: actions/checkout@v4
      - name: Launch PentAGI Assessment
        id: pentest
        run: |
          TASK_ID=$(curl -s -X POST "$PENTAGI_URL/graphql" \
            -H "Authorization: Bearer ${{ secrets.PENTAGI_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "query": "mutation { createTask(input: { target: \"staging.internal.finpay.dev\", objective: \"Quick security regression test: auth, API authorization, input validation\", scope: [\"web-app-test\", \"api-fuzz\", \"auth-test\"], constraints: [\"no-dos\", \"max-duration:60m\"] }) { id } }"
            }' | jq -r '.data.createTask.id')
          # Stop here if the API call failed, rather than polling a task that does not exist
          if [ -z "$TASK_ID" ] || [ "$TASK_ID" = "null" ]; then
            echo "::error::PentAGI did not return a task id"
            exit 1
          fi
          echo "task_id=$TASK_ID" >> "$GITHUB_OUTPUT"
      - name: Wait for Assessment
        run: |
          while true; do
            STATUS=$(curl -s -X POST "$PENTAGI_URL/graphql" \
              -H "Authorization: Bearer ${{ secrets.PENTAGI_TOKEN }}" \
              -H "Content-Type: application/json" \
              -d "{\"query\": \"{ task(id: \\\"${{ steps.pentest.outputs.task_id }}\\\") { status progress } }\"}" \
              | jq -r '.data.task.status')
            echo "Status: $STATUS"
            if [ "$STATUS" = "completed" ]; then break; fi
            # A failed assessment has no trustworthy findings, so it must not pass the gate
            if [ "$STATUS" = "failed" ]; then
              echo "::error::Assessment failed before completing"
              exit 1
            fi
            sleep 60
          done
      - name: Check Findings
        run: |
          CRITICAL=$(curl -s -X POST "$PENTAGI_URL/graphql" \
            -H "Authorization: Bearer ${{ secrets.PENTAGI_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d "{\"query\": \"{ task(id: \\\"${{ steps.pentest.outputs.task_id }}\\\") { findings { severity } } }\"}" \
            | jq '[.data.task.findings[] | select(.severity == "critical")] | length')
          # Fail closed: an empty or non-numeric count means the findings could not be read
          if ! [[ "$CRITICAL" =~ ^[0-9]+$ ]]; then
            echo "::error::Could not read findings from PentAGI"
            exit 1
          fi
          echo "Critical findings: $CRITICAL"
          if [ "$CRITICAL" -gt 0 ]; then
            echo "::error::$CRITICAL critical vulnerabilities found! Blocking deployment."
            exit 1
          fi
      - name: Upload Report
        if: always()
        run: |
          # -f makes curl fail on an HTTP error instead of saving the error page as a PDF
          curl -fsS -H "Authorization: Bearer ${{ secrets.PENTAGI_TOKEN }}" \
            "$PENTAGI_URL/api/v1/tasks/${{ steps.pentest.outputs.task_id }}/report" \
            -o security-report.pdf
      - name: Upload Report Artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: security-report-${{ github.sha }}
          path: security-report.pdf
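The polling step in this workflow relies on the job timeout to stop a stuck assessment. Where the same wait is needed from a script, an explicit deadline is easier to reason about. The sketch below assumes the status strings used by the shell loop (`completed`, `failed`) and takes the status lookup as an injected function, so the GraphQL transport stays outside the loop. `waitForTask` and its options are hypothetical names.

```typescript
// Sketch: poll a task until it completes, fails, or a deadline passes.
// fetchStatus is injected so the loop can be tested without a PentAGI server.
interface WaitOptions {
  intervalMs: number; // delay between polls
  timeoutMs: number;  // overall budget for the wait
}

async function waitForTask(
  fetchStatus: () => Promise<string>,
  opts: WaitOptions,
): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  while (true) {
    const status = await fetchStatus();
    if (status === "completed") return;
    // A failed assessment has no trustworthy findings, so surface it as an error.
    if (status === "failed") {
      throw new Error("Assessment failed before producing findings");
    }
    // Give up if the next poll would land after the deadline.
    if (Date.now() + opts.intervalMs > deadline) {
      throw new Error(`Assessment still "${status}" after ${opts.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}
```

With the workflow's values this would be called with `intervalMs: 60_000` and a `timeoutMs` somewhat below the 120-minute job limit, leaving time for the report upload.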
Step 5: Knowledge Graph Compounds Over Time
After 3 months of continuous testing (2 assessments/week):
Knowledge Graph Statistics:
├── 847 vulnerability patterns stored
├── 234 unique service fingerprints
├── 156 successful exploitation paths
├── 89 technology stack profiles
└── 12 recurring vulnerability categories
The AI agents now:
- Skip reconnaissance on known services (saves 15 min per run)
- Immediately test for patterns that recurred in past sprints
- Correlate new findings with historical data
("This IDOR is the same pattern we found in Sprint 41 — the fix was incomplete")
- Predict which new features are likely to introduce specific vulnerability types
("Payment endpoint changed → testing for amount manipulation and race conditions first")
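The correlation described in this list can be approximated without a graph database. The following is a minimal sketch, assuming each stored finding carries a sprint number, a pattern label, and an endpoint. The `HistoricalFinding` shape and both function names are hypothetical, and a real knowledge graph would hold richer relationships than this flat list.

```typescript
// Sketch: find vulnerability patterns that reappear in consecutive sprints.
// HistoricalFinding and its fields are hypothetical, for illustration only.
interface HistoricalFinding {
  sprint: number;
  pattern: string;  // e.g. "idor"
  endpoint: string; // e.g. "/api/users/{id}/settings"
}

// Length of the longest streak of back-to-back sprint numbers.
function longestConsecutiveRun(sprints: number[]): number {
  const sorted = Array.from(new Set(sprints)).sort((a, b) => a - b);
  let best = 0;
  let run = 0;
  let prev: number | undefined;
  for (const sprint of sorted) {
    run = prev !== undefined && sprint === prev + 1 ? run + 1 : 1;
    best = Math.max(best, run);
    prev = sprint;
  }
  return best;
}

// Map each pattern seen in at least `minRun` consecutive sprints to its affected endpoints.
function recurringPatterns(
  history: HistoricalFinding[],
  minRun: number,
): Map<string, string[]> {
  const byPattern = new Map<string, HistoricalFinding[]>();
  for (const finding of history) {
    const group = byPattern.get(finding.pattern) ?? [];
    group.push(finding);
    byPattern.set(finding.pattern, group);
  }
  const recurring = new Map<string, string[]>();
  byPattern.forEach((group, pattern) => {
    if (longestConsecutiveRun(group.map((f) => f.sprint)) >= minRun) {
      recurring.set(pattern, Array.from(new Set(group.map((f) => f.endpoint))));
    }
  });
  return recurring;
}
```

A pattern that keeps returning on different endpoints, sprint after sprint, is a hint that the defect lives in shared code rather than in any one handler.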
Results
Security testing frequency goes from once per year to twice per week. The time to surface a critical vulnerability drops from the 3 weeks an annual pentest takes to report to 2 hours, the first CI/CD run after the vulnerable code ships. The Redis exposure, which had gone undetected for 8 months, is found in the first automated assessment. The knowledge graph catches a recurring IDOR pattern across 4 consecutive sprints, which shows that the root cause is a bug in the shared authorization middleware rather than in individual endpoints, so the team fixes the middleware once instead of patching endpoints one by one. Annual security testing cost drops from $40K (external pentest firm) to $3K (LLM API costs plus server hosting), while coverage increases from 1 assessment per year to more than 100. The engineering team fixes critical findings within 48 hours because the report arrives while the code is still fresh in their minds, not 3 weeks later.