ai-pentesting
Run autonomous AI-driven penetration tests on web applications using tools like Shannon, PentAGI, and similar frameworks. Use when tasks involve setting up automated penetration testing pipelines, combining AI agents with security tools (nmap, subfinder, nuclei, sqlmap), building autonomous exploit chains, generating pentest reports with proof-of-concept exploits, or integrating AI pentesting into CI/CD pipelines. Covers the full pentest lifecycle from reconnaissance to reporting using AI orchestration.
Usage
Getting Started
- Install the skill in your agent environment
- Open your AI coding agent (Claude Code, Codex, Gemini CLI, or Cursor)
- Reference the skill in your prompt
- The AI will use the skill's capabilities automatically
Example Prompts
- "Run an autonomous pentest against our staging app at https://staging.example.com and report any critical findings"
- "Set up Shannon against this repo's deployed application and generate a report with proof-of-concept exploits"
Documentation
Overview
Use AI agents to autonomously conduct penetration tests on web applications. Combine LLM reasoning with security tools (nmap, subfinder, nuclei, sqlmap, browser automation) to find and prove vulnerabilities with minimal human intervention.
Instructions
Methodology
AI pentesting follows the same phases as human pentesting, but the AI orchestrates each phase autonomously:
Phase 1: RECONNAISSANCE
├── Subdomain enumeration (subfinder)
├── Technology fingerprinting (whatweb, wappalyzer)
├── Port scanning (nmap)
├── API schema discovery (crawling, OpenAPI/GraphQL introspection)
└── Source code analysis (if white-box)
AI decides: which tools to run, in what order, based on findings
Phase 2: VULNERABILITY ANALYSIS
├── Known CVE scanning (nuclei)
├── Web vulnerability scanning (OWASP ZAP, nikto)
├── API fuzzing (schemathesis)
├── Code-level vulnerability hunting (semgrep, CodeQL)
└── Data flow analysis (input → dangerous function)
AI decides: which findings are likely exploitable
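Scanner output from this phase is machine-readable, which is what lets the AI prioritize findings. A minimal sketch of triaging nuclei-style JSONL output by severity (the sample records are illustrative, not real scanner output):

```python
import json

# Severity order used for triage (highest first)
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

def triage(jsonl: str) -> list[dict]:
    """Parse JSONL scanner output and sort findings by severity."""
    findings = [json.loads(line) for line in jsonl.splitlines() if line.strip()]
    return sorted(
        findings,
        key=lambda f: SEVERITY_RANK.get(f.get("info", {}).get("severity", "info"), 4),
    )

# Illustrative sample records
sample = "\n".join([
    '{"template-id": "exposed-panel", "info": {"severity": "medium"}}',
    '{"template-id": "cve-2021-44228", "info": {"severity": "critical"}}',
])
ranked = triage(sample)
print(ranked[0]["template-id"])  # critical finding surfaces first
```

The ranked list is what gets handed to the LLM in the next phase, so it spends its reasoning budget on the most exploitable candidates first.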
Phase 3: EXPLOITATION
├── SQL injection (sqlmap, manual payloads)
├── XSS (reflected, stored, DOM)
├── SSRF (internal access, cloud metadata)
├── Authentication bypass (broken auth, privilege escalation)
├── Business logic flaws (price manipulation, race conditions)
└── Browser-based exploitation (Playwright/Puppeteer)
AI decides: exploitation order, payload selection, chaining
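For reflected XSS, a cheap first step before escalating to full browser automation is a reflection check: inject a unique marker and test whether it comes back unescaped. A minimal sketch (the response bodies here are hard-coded stand-ins for real HTTP responses):

```python
import html
import uuid

def make_probe() -> str:
    """Unique marker wrapped in a tag that must not survive output encoding."""
    return f"<x{uuid.uuid4().hex[:8]}>"

def is_reflected_unescaped(probe: str, body: str) -> bool:
    """True if the probe appears verbatim (unescaped) in the response body."""
    return probe in body

probe = make_probe()
vulnerable_body = f"<p>Search results for {probe}</p>"          # echoed raw
safe_body = f"<p>Search results for {html.escape(probe)}</p>"   # properly encoded

print(is_reflected_unescaped(probe, vulnerable_body))  # True
print(is_reflected_unescaped(probe, safe_body))        # False
```

A positive reflection check is not yet a proven XSS (context and CSP matter), which is why the pipeline then hands the candidate to Playwright/Puppeteer to confirm script execution.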
Phase 4: REPORTING
├── Proof-of-concept for each finding
├── Reproducible steps (curl commands, screenshots)
├── Severity rating (CVSS score)
├── Remediation guidance
└── Executive summary
AI generates: structured, evidence-based report
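For the severity ratings in the report, CVSS v3.1 defines fixed score bands; a small helper for mapping base scores to the qualitative ratings used in findings:

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_severity(9.8))  # Critical
print(cvss_severity(5.3))  # Medium
```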
Setting Up Shannon
Shannon is an open-source AI pentester that automates the full lifecycle:
```shell
# Clone and set up Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

# Configure credentials
export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000

# Run a pentest against your application
# Requires: Docker, target URL, source code repo
./shannon start URL=https://your-app.com REPO=./your-repo

# Monitor progress
./shannon logs

# View results in the Temporal UI
open http://localhost:8233
```
Shannon's architecture:
- Reconnaissance agent: Maps attack surface using nmap, subfinder, whatweb
- Vulnerability agents: Specialized per OWASP category (injection, XSS, SSRF, auth bypass)
- Exploitation agent: Uses browser automation to prove vulnerabilities with real exploits
- Reporting agent: Generates findings with copy-paste PoC commands
Building a Custom AI Pentest Pipeline
For cases where Shannon doesn't fit, build a custom pipeline:
```python
# ai_pentester.py
# Custom AI pentesting pipeline using LLM + security tools
import json
import subprocess
from urllib.parse import urlparse

from openai import OpenAI

client = OpenAI()


class AIPentester:
    """Autonomous AI penetration tester.

    Orchestrates security tools using LLM reasoning
    to find and prove vulnerabilities.
    """

    def __init__(self, target_url: str, scope: list[str] | None = None):
        self.target = target_url
        self.scope = scope or [target_url]
        self.findings = []
        self.recon_data = {}

    def _get_domain(self) -> str:
        """Extract the bare hostname from the target URL."""
        return urlparse(self.target).hostname or self.target

    async def run_pentest(self) -> dict:
        """Execute the full penetration test lifecycle.

        Returns:
            Dict with findings, evidence, and recommendations.
        """
        # Phase 1: Recon
        self.recon_data = await self._recon()

        # Phase 2: AI-guided vulnerability analysis
        targets = await self._analyze_attack_surface(self.recon_data)

        # Phase 3: AI-guided exploitation
        for target in targets:
            finding = await self._exploit(target)
            if finding:
                self.findings.append(finding)

        # Phase 4: Generate report
        return await self._generate_report()

    async def _recon(self) -> dict:
        """Run reconnaissance tools and aggregate results."""
        recon = {}

        # Subdomain enumeration
        result = subprocess.run(
            ['subfinder', '-d', self._get_domain(), '-silent'],
            capture_output=True, text=True, timeout=120
        )
        recon['subdomains'] = [s for s in result.stdout.strip().split('\n') if s]

        # Technology fingerprinting
        result = subprocess.run(
            ['whatweb', self.target, '--log-json=/dev/stdout', '-a', '3'],
            capture_output=True, text=True, timeout=60
        )
        recon['technologies'] = json.loads(result.stdout) if result.stdout else {}

        # Port scanning (nmap has no native JSON output; use grepable format)
        result = subprocess.run(
            ['nmap', '-sV', '--top-ports', '1000', '-oG', '-', self._get_domain()],
            capture_output=True, text=True, timeout=300
        )
        recon['ports'] = result.stdout

        # Nuclei scan for known CVEs (-jsonl emits one JSON object per line)
        result = subprocess.run(
            ['nuclei', '-u', self.target, '-severity', 'critical,high',
             '-jsonl', '-silent'],
            capture_output=True, text=True, timeout=300
        )
        recon['known_vulns'] = [
            json.loads(line) for line in result.stdout.strip().split('\n')
            if line.strip()
        ]
        return recon

    async def _analyze_attack_surface(self, recon: dict) -> list:
        """Use AI to analyze recon data and prioritize attack targets."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                    "You are an expert penetration tester. Analyze the "
                    "reconnaissance data and identify the most promising "
                    "attack vectors. Return a JSON object with a 'targets' array."},
                {"role": "user", "content":
                    f"Recon data:\n{json.dumps(recon, indent=2)}\n\n"
                    "Identify attack targets with: endpoint, vulnerability_type, "
                    "technique, priority (1-5), reasoning."}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content).get("targets", [])

    async def _exploit(self, target: dict) -> dict | None:
        """Attempt to exploit an identified vulnerability."""
        vuln_type = target.get('vulnerability_type', '').lower()
        # Exploit handlers (one per vulnerability class) are implemented
        # separately; each returns a finding dict or None.
        handlers = {
            'injection': self._test_injection,
            'xss': self._test_xss,
            'ssrf': self._test_ssrf,
            'auth': self._test_auth_bypass,
        }
        for key, handler in handlers.items():
            if key in vuln_type:
                return await handler(target)
        return None

    async def _generate_report(self) -> dict:
        """Generate a structured penetration test report."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                    "Generate a professional penetration test report with "
                    "executive summary, findings with CVSS scores, PoC steps, "
                    "and remediation recommendations."},
                {"role": "user", "content":
                    f"Target: {self.target}\n"
                    f"Findings: {json.dumps(self.findings, indent=2)}\n"
                    f"Recon data: {json.dumps(self.recon_data, indent=2)}"}
            ]
        )
        return {
            "target": self.target,
            "findings_count": len(self.findings),
            "findings": self.findings,
            "report": response.choices[0].message.content
        }
```
CI/CD Integration
Run AI pentests on every deployment:
```yaml
# .github/workflows/pentest.yml
name: AI Penetration Test

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly, Monday 2 AM

jobs:
  pentest:
    runs-on: ubuntu-latest
    services:
      app:
        image: your-app:${{ github.sha }}
        ports:
          - 8080:8080
    steps:
      - uses: actions/checkout@v4

      - name: Run Shannon Pentest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git clone https://github.com/KeygraphHQ/shannon.git
          cd shannon
          ./shannon start \
            URL=http://localhost:8080 \
            REPO=../ \
            MAX_CONCURRENT=3
          # Wait for completion and extract the report
          ./shannon wait
          cp workspace/report.md "$GITHUB_WORKSPACE/pentest-report.md"

      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: pentest-report
          path: pentest-report.md

      - name: Fail on Critical Findings
        run: |
          if grep -q "CRITICAL" pentest-report.md; then
            echo "::error::Critical vulnerabilities found!"
            exit 1
          fi
```
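Grepping the rendered report for "CRITICAL" works, but is brittle: the word may also appear in remediation text. If your pipeline additionally emits structured findings, a stricter gate can key off severity fields. A sketch of that idea (the JSON shape here is an assumption for illustration, not Shannon's actual output format):

```python
import json

def gate(findings_json: str, fail_on: frozenset[str] = frozenset({"critical", "high"})) -> int:
    """Return a non-zero exit code if any finding matches a blocked severity."""
    findings = json.loads(findings_json)
    blocked = [f for f in findings if f.get("severity", "").lower() in fail_on]
    for f in blocked:
        print(f"BLOCKING: [{f['severity']}] {f.get('title', 'untitled')}")
    return 1 if blocked else 0

# Illustrative findings file content
sample = json.dumps([
    {"title": "SQLi in /search", "severity": "critical"},
    {"title": "Verbose server header", "severity": "info"},
])
print(gate(sample))  # 1 -> build fails
```

In CI, the step would run this script and `exit` with the returned code, so the build fails only on severities you have explicitly chosen to block.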
Report Structure
A professional AI-generated pentest report should include:
- Executive summary: scope, duration, methodology, overall risk, and findings count by severity
- Individual findings: each with a CVSS score, affected endpoint/parameter, evidence with reproducible curl commands, impact description, and specific remediation guidance
- Remediation priority list: ordered by severity, with recommended fix timelines
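A minimal sketch of rendering structured findings into that layout (the field names and example finding are illustrative):

```python
def render_report(target: str, findings: list[dict]) -> str:
    """Render structured findings into a simple markdown pentest report."""
    order = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
    findings = sorted(findings, key=lambda f: order.get(f["severity"], 4))
    lines = [
        f"# Pentest Report: {target}",
        "",
        "## Executive Summary",
        f"Total findings: {len(findings)}",
        "",
    ]
    for i, f in enumerate(findings, 1):
        lines += [
            f"## Finding {i}: {f['title']} ({f['severity']}, CVSS {f['cvss']})",
            f"**Endpoint:** `{f['endpoint']}`",
            "**Proof of concept:**",
            "",
            f"    {f['poc']}",  # indented code block keeps the command copy-pastable
            "",
            f"**Remediation:** {f['remediation']}",
            "",
        ]
    return "\n".join(lines)

report = render_report("https://staging.example.com", [{
    "title": "SQL injection in search",
    "severity": "Critical",
    "cvss": 9.8,
    "endpoint": "/api/search?q=",
    "poc": "curl 'https://staging.example.com/api/search?q=1%27+OR+1%3D1--'",
    "remediation": "Use parameterized queries for all search filters.",
}])
print(report.splitlines()[0])  # "# Pentest Report: https://staging.example.com"
```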
Examples
Run an autonomous pentest on a web application
Set up Shannon to run a full penetration test on our staging environment at https://staging.ourapp.com. The source code is in the current repository. Configure it to test for: SQL injection, XSS, SSRF, and broken authentication. Run with maximum concurrency and generate a report with reproducible proof-of-concept exploits for every finding. Flag any critical vulnerabilities that need immediate attention.
Build a custom AI pentest pipeline
Build a custom AI pentesting pipeline that combines subfinder (subdomain discovery), whatweb (tech fingerprinting), nuclei (CVE scanning), and schemathesis (API fuzzing) orchestrated by an LLM agent. The LLM should analyze results from each tool, decide what to test next, and generate exploitation payloads. Target: our API at api.example.com with the OpenAPI spec at /docs/openapi.json. Produce a structured findings report.
Integrate AI pentesting into CI/CD
Add automated penetration testing to our GitHub Actions pipeline. It should run on every push to main and weekly on a schedule. The app runs in Docker (docker-compose up), exposed at localhost:8080. Use Shannon for the pentest, upload the report as an artifact, and fail the build if any critical or high severity vulnerabilities are found. Include Slack notification for findings.
Guidelines
- Only run penetration tests against systems you have explicit written authorization to test — unauthorized testing is illegal
- AI pentesters can cause real damage (data modification, service disruption) — always test against staging environments, never production
- Review AI-generated exploitation attempts before running them — LLMs can hallucinate or generate overly aggressive payloads
- Treat pentest reports as confidential — they contain vulnerability details and proof-of-concept exploits
- Set time limits and scope boundaries for autonomous testing to prevent runaway scans
- Validate AI findings manually — false positives in automated reports erode trust with stakeholders
- Store API keys and credentials used for pentesting securely — never hardcode them in CI configurations
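The scope-boundary guideline is best enforced in code rather than only in the prompt, so every request the agent makes passes a hard check first. A minimal sketch of a host-based allowlist (the matching strategy here, exact host or subdomain, is an assumption to adjust to your engagement terms):

```python
from urllib.parse import urlparse

def in_scope(url: str, scope: list[str]) -> bool:
    """True if the URL's host matches an in-scope host or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    for allowed in scope:
        allowed = allowed.lower()
        if host == allowed or host.endswith("." + allowed):
            return True
    return False

scope = ["staging.ourapp.com"]
print(in_scope("https://staging.ourapp.com/api/users", scope))   # True
print(in_scope("https://api.staging.ourapp.com/health", scope))  # True
print(in_scope("https://production.ourapp.com/", scope))         # False
```

Calling this before every tool invocation (and refusing out-of-scope URLs) prevents an autonomous agent from wandering onto hosts you are not authorized to test.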
Information
- Version: 1.0.0
- Author: terminal-skills
- Category: DevOps
- License: Apache-2.0