Pentest Copilot Auto Identifies and Exploits XSS

In this demo, Pentest Copilot’s AI drives a browser, auto-discovers routes, captures real requests, generates targeted XSS test cases, executes them, and writes a verified finding back to the Exploit Graph. No external agent is required; everything happens inside Copilot.

by Kathan Desai

August 15, 2025

1) What Copilot actually does

Browse → Capture → Generate → Execute → Triage.
Copilot’s AI opens the target app (OWASP Juice Shop in the demo), browses it like a user, and captures every request and response. From those interactions it creates test cases, small scripts, and payloads tailored to the context it observed (for example, query parameters rendered into HTML). It then runs those tests in a controlled browser session. When JavaScript executes (for example, an alert() dialog fires) or the page visibly changes at the injection point, Copilot records the evidence and attaches a Vulnerability node to the exact Action that is exploitable.
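To make the capture step concrete, here is a minimal Python sketch, assuming captured traffic is available as (method, URL) pairs, that collapses it into distinct Actions (method, route, parameter names). Action, to_action, and dedupe are hypothetical names for illustration, not Copilot's API. Because Juice Shop keeps its routes in the URL fragment (#/search?q=), the sketch treats the fragment as part of the route.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class Action:
    """Hypothetical record for one discovered route."""
    method: str
    path: str
    params: tuple[str, ...]


def to_action(method: str, url: str) -> Action:
    """Reduce one captured request to method, route, and parameter names."""
    parts = urlsplit(url)
    path, query = parts.path or "/", parts.query
    # Hash-routed SPAs keep the route in the fragment ("#/search?q=..."),
    # so split the fragment into its own route and query string.
    if parts.fragment.startswith("/"):
        frag = urlsplit(parts.fragment)
        path = f"{path}#{frag.path}"
        query = frag.query
    names = parse_qs(query, keep_blank_values=True).keys()
    return Action(method.upper(), path, tuple(sorted(names)))


def dedupe(captured: list[tuple[str, str]]) -> list[Action]:
    """Collapse repeated requests (same shape, different values) into unique Actions."""
    unique: dict[Action, None] = {}
    for method, url in captured:
        unique.setdefault(to_action(method, url), None)
    return list(unique)


captured = [
    ("GET", "https://juice.bugbase.ai/#/search?q=apple"),
    ("GET", "https://juice.bugbase.ai/#/search?q="),
    ("GET", "https://juice.bugbase.ai/#/login"),
]
for action in dedupe(captured):
    print(action)
```

Deduplicating on shape rather than value is what keeps later test generation focused: two searches for different terms become one Action with one parameter to test.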

2) What you see in the demo

Kickoff: Submodule Testing → ACTV_CRAWLER
You pick the crawler in the Submodule Testing panel and target juice.bugbase.ai. The Exploit Graph starts to populate as Copilot navigates the app and captures requests (method, path, parameters).
Test authoring: ACTV_TEST_GENERATOR
Copilot takes discovered Actions (for example, GET /search?q=) and generates test cases for inputs likely to be reflected or sent to DOM sinks. In the graph and UI, the number of Action and TestCaseSet nodes grows as it works.
Execution: ACTV_TESTER
The runner executes each test in a controlled browser session. The demo shows two browser windows running payloads through the search feature, with the URL-encoded payloads visible in the address bars.
Finding: XSS on /search?q=…
The value of q is written into the page without proper encoding. Payloads exercised include classic variants:
- "><script>alert('XSS')</script>
- <img src=x onerror=alert(1)>
- URL-encoded equivalents passed via q
The UI then reports: “Exploit graph updated with found XSS vulnerability.” Submodule Activity shows runs moving to Completed with a positive finding count.
Result linking in the graph
The discovered Vulnerability (XSS) node is linked to the specific Action (for example, GET …/#/search?q=), so the fix is traceable to a route and parameter.
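The step that turns an Action into URL-encoded test cases can be sketched in a few lines of Python with the standard library's urllib.parse.quote. This is an illustration, not Copilot's generator: build_test_cases is a hypothetical name, and the payload list is limited to the demo's two variants (using alert(1) in the script variant).

```python
from urllib.parse import quote

# Payloads from the demo. Copilot's actual payload set and ordering are
# not public; this list is illustrative only.
PAYLOADS = [
    '"><script>alert(1)</script>',
    "<img src=x onerror=alert(1)>",
]


def build_test_cases(base: str, route: str, param: str) -> list[str]:
    """Return one test URL per payload, percent-encoding the payload into `param`."""
    cases = []
    for payload in PAYLOADS:
        # Leave "/", "(" and ")" literal for readability; encode quotes, <, >, = and spaces.
        encoded = quote(payload, safe="/()")
        cases.append(f"{base}{route}?{param}={encoded}")
    return cases


for url in build_test_cases("https://juice.bugbase.ai/", "#/search", "q"):
    print(url)
```

With these inputs the output reproduces the two PoC URLs in section 4 exactly.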

3) How Copilot reasons about XSS

Starts from real traffic. Instead of spraying every parameter, Copilot uses the requests it actually triggered while browsing, so tests focus on what users really touch.
Context-aware payloads. If a query param appears in an HTML/DOM context, Copilot uses small, targeted payloads first, escalating only if needed.
Run-time evidence. A finding is marked only when the browser session shows execution or a clear page-level effect (like a script running or DOM change). That keeps noise low.
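The "small payloads first, escalate only if needed" and "run-time evidence" rules can be sketched as a short loop. This is a hedged illustration: RunResult, confirm_xss, and fake_run are hypothetical, "shortest first" stands in for whatever ordering Copilot really uses, and the runner is a stub instead of a real browser.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    """Hypothetical evidence captured from one browser run of a payload."""
    dialog_seen: bool   # a JavaScript dialog such as alert() fired
    dom_changed: bool   # an injected element appeared at the injection point


def confirm_xss(payloads: list[str], run: Callable[[str], RunResult]) -> Optional[str]:
    """Try payloads shortest-first and return the first one with run-time evidence.

    Shortest-first is only a stand-in for "small, targeted payloads first".
    """
    for payload in sorted(payloads, key=len):
        result = run(payload)
        if result.dialog_seen or result.dom_changed:
            return payload  # evidence observed: stop escalating
    return None  # no execution or page-level effect: report nothing


# Stub runner for illustration: pretends only event-handler payloads fire.
def fake_run(payload: str) -> RunResult:
    return RunResult(dialog_seen="onerror=" in payload, dom_changed=False)


print(confirm_xss(["<b>x</b>", "<img src=x onerror=alert(1)>"], fake_run))
# prints <img src=x onerror=alert(1)>
```

In a real run the runner would drive a browser. With Playwright for Python, for instance, a handler registered through page.on("dialog", ...) can record that a dialog fired. Returning None when nothing executes is what keeps unconfirmed reflections out of the findings list.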

4) Minimal PoC you can replay (in a legal test lab)

Use this only on assets you own or are authorized to test (the demo uses OWASP Juice Shop).

Script tag variant (URL-encoded):

https://juice.bugbase.ai/#/search?q=%22%3E%3Cscript%3Ealert(1)%3C/script%3E

<img onerror> variant:

https://juice.bugbase.ai/#/search?q=%3Cimg%20src%3Dx%20onerror%3Dalert(1)%3E

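Check the raw server response:

curl -s "https://juice.bugbase.ai/#/search?q=%22%3E%3Cscript%3Ealert(1)%3C/script%3E"

Note that everything after # is a URL fragment, which HTTP clients, curl included, never send to the server. The #/search route is handled client-side, so curl only gets back the page served at / and the payload will not appear in the raw output. The reflection happens in the browser, which is why verification has to run there: if the value of q is rendered without encoding, a browser run will execute the payload, and Copilot logs the finding against that Action.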
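The following Python sketch uses the standard library's urllib.parse.urlsplit to show which part of the PoC URL reaches the server and which part stays in the browser. request_target is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query an HTTP client puts on the request line.

    The fragment (everything after "#") is resolved by the browser and is
    never part of the HTTP request.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


poc = "https://juice.bugbase.ai/#/search?q=%22%3E%3Cscript%3Ealert(1)%3C/script%3E"
print(request_target(poc))     # prints /
print(urlsplit(poc).fragment)  # prints /search?q=%22%3E%3Cscript%3Ealert(1)%3C/script%3E
```

Because the ? appears after the #, the whole search query belongs to the fragment, and the server only ever sees a request for /.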

5) What your team gets back from Copilot

Vulnerable action: method, path, and parameter (for example, GET /search?q=)
The payload(s) that executed, captured from the successful run
Evidence: screenshots and logs from the browser session that confirmed the finding
Graph context: Vulnerability ↔ Action ↔ Domain to speed ownership and triage
Reproduction steps: the exact sequence Copilot executed

6) Fix it right: pragmatic hardening checklist

Encode on output, by context
- HTML text → escape & < > " ' /
- Attribute values → always quoted, then encoded
- JavaScript strings → safe serialization (for example, JSON); avoid eval() and dynamic code constructors such as new Function()
Prefer safe DOM APIs
- Avoid innerHTML, document.write, and inline on* handlers; use text/node setters and event listeners
Sanitize any permitted HTML
- If rich content must be rendered, pass it through a strict sanitizer
Adopt a protective CSP early
- Restrict script sources and remove inline scripts/handlers over time
Regression-proof the fix
- Re-run the same TestCaseSet that caught the bug; keep it in CI so new deploys don’t reintroduce the issue
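For the first checklist item, here is a minimal Python sketch for server-rendered output using only the standard library (html.escape and json.dumps). The function names are hypothetical, and it assumes each value lands in exactly one of three contexts: HTML text, a double-quoted attribute, or a JavaScript string inside a script element. Browser-side code should instead follow the DOM item (for example, textContent rather than innerHTML).

```python
import html
import json


def encode_html_text(value: str) -> str:
    """HTML text context: escape & < > " ' (html.escape) plus "/" per the checklist."""
    return html.escape(value, quote=True).replace("/", "&#x2F;")


def encode_attr(name: str, value: str) -> str:
    """Attribute context: always emit a double-quoted value, then encode it."""
    return f'{name}="{html.escape(value, quote=True)}"'


def encode_js_string(value: str) -> str:
    """JavaScript string inside a script element: serialize as JSON, then turn
    < > & into unicode escapes so the value cannot close the script element."""
    return (
        json.dumps(value)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


# Regression check over the demo payloads (suitable as a CI unit test).
DEMO_PAYLOADS = ['"><script>alert(\'XSS\')</script>', "<img src=x onerror=alert(1)>"]
for p in DEMO_PAYLOADS:
    assert "<" not in encode_html_text(p) and ">" not in encode_html_text(p)
    assert "</script" not in encode_js_string(p).lower()
    assert json.loads(encode_js_string(p)) == p  # value round-trips unchanged
```

The assertions at the bottom double as a regression test. Running them in CI alongside the re-run TestCaseSet checks the fix both at the encoding layer and in a live browser.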

7) Why Copilot’s approach finds real XSS

User-journey first: test cases come from the paths Copilot actually browsed
Browser-truth: verification happens in a real browser session
Action-linked evidence: findings are tied to specific routes and parameters, reducing mean time to remediate (MTTR)

8) Transparent notes

Scope and access: if a feature requires roles or session state, provide that to Copilot so it can reach the right pages during browsing
Flaky UI states: dynamic widgets can be intermittent; Copilot supports re-runs to stabilize coverage
Ethics: always test with explicit authorization

FAQs

1) Is an external agent required?
No. Everything runs within Pentest Copilot: browsing, capture, test generation, execution, and triage.

2) How is this different from a traditional scanner?
Copilot derives tests from real app interactions and validates in a live browser session, reporting only when execution or evidence is observed.

3) Can it detect DOM-only XSS?
Yes. Because verification happens in the browser, DOM sinks (places where client-side code writes values into the page) are exercised and confirmed.

4) Will it blast payloads everywhere?
No. It focuses on discovered Actions and uses small, targeted payloads first, escalating only when needed.

5) How do I confirm the fix?
Re-run the same TestCaseSet against your patched build; Copilot will mark the finding resolved when execution no longer occurs.

Demo

Watch the full demo: https://youtu.be/DySyVlCgRjU