MASTER THESIS DEFENSE · APRIL 2026

Agentic AI Framework for Web Vulnerability Detection, Mitigation and Patching

Programme: MPCSN
Defense: April 2026
Part One
I

Background

I. Background
II. Problems
III. Supported Workflows
IV. Methodology
V. Evaluation
VI. Ethics
Part I · Background
01 / 15
The Landscape

Modern web applications have a massive attack surface; vulnerabilities are appearing faster than they can be patched.

The Pipeline Imbalance
Detection: Highly automated through SAST, DAST, IAST, and signature-based tools.
Mitigation: Remains a severe bottleneck that depends on human effort.
The Failure of Traditional APR
  • Deep-learning systems (e.g., VulRepair) require massive labeled vulnerability datasets.
  • They struggle to generalize across diverse real-world web frameworks (React, Django, Spring).
[Figure: Comparison of SAST, DAST, and IAST approaches]
Agentic AVR · Thesis Defense · § 1.1
02 / 15
The Promise

Large Language Models (LLMs) offer a "Reasoning-on-the-fly" capability without task-specific fine-tuning.

The Reliability Wall

Semantic Drift

LLMs often propose "hallucinated" fixes in the wrong files or break existing business logic.

False Safety

Models confidently claim a vulnerability is fixed, but the fix is only superficial (e.g., adding a comment or a useless check).

The Goal

We need a system that moves from "Generation" to "Execution-Verified Mitigation."

Part Two
II

Problems

Part II · Problems
03 / 15

Domain Misalignment & The "Silent Failure" Paradigm

APR vs. AVR

APR: Automated Program Repair.

AVR: Automated Vulnerability Repair.

General coding agents (SWE-agent) optimize for passing unit tests.

Automated Vulnerability Repair (AVR) must optimize for eliminating sinks.

The Inadequacy of Static Validation

Structural Matching: Penalizes innovative, valid patches.

LLM-as-Judge: Prone to bias and lacks "executable proof," leading to a false sense of security.

Web-Specific Complexity

Beyond "Crashes": The Silent Failure. Unlike older software that crashes when a bug occurs, web apps often keep running normally while they are being exploited. The vulnerability stays hidden, with no obvious error messages or "smoke."

Deep Logic Chains: It's Not Just One Click. Modern vulnerabilities are rarely triggered by a single input. They often require a specific sequence of steps across pages or APIs (e.g., Register -> Change Settings -> Trigger Action) before they surface.

04 / 15
Limitations of Single-Agent Systems


Cognitive Overload

Security repair is a multi-step process. A single prompt cannot effectively manage file discovery, root-cause analysis, and regression testing at the same time. A large context can overwhelm the model, leading to lost details and reduced precision.

Hallucination Risk

Single agents suffer from "self-grading" bias, where the model hallucinates a successful outcome while reviewing its own patch, failing to objectively identify its own logic flaws.

Based on these insights, we propose a Multi-Agent Workflow that decomposes the complex repair task into specialized roles, ensuring higher precision and reliability.

[Figure: Single-agent ReAct loop compared with a four-agent sequential workflow]
Part Three
III

Supported
Workflows

Part III · Supported Workflows
05 / 15

Four practical security review scenarios.

I

Issue Review

Analyze security-related issue reports and produce structured findings.

II

Pull Request Review

Review code changes in PRs and provide risk analysis with mitigation suggestions.

III

Repo-wide Full Scan

Run baseline security scanning across the whole repository.

IV

Repo-wide Incremental Scan

Re-scan only the files changed between the last scan and the current codebase for continuous security assurance.
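One way to picture the incremental scan is as a comparison between two file-hash manifests. The sketch below is only an illustration under that assumption: `build_manifest` and `changed_files` are hypothetical names, and the actual system may rely on Git diffs or another change-detection mechanism instead.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return files that are new or whose content differs from the last scan."""
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )
```

Deleted files are intentionally not returned here, since there is nothing left to scan; a fuller design would probably also retire findings attached to them.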

[Figure: Issue review workflow example]
[Figure: Pull request review workflow example]
[Figure: Repo-wide full scan workflow example]
[Figure: Repo-wide incremental scan workflow example]
Part Four
IV

Methodology
& Case Study

Part IV · Methodology & Case Study
06 / 15
Issue Review Example

Starlette CVE-2023-29159

Directory traversal in static file serving

Issue description: Directory traversal vulnerability in Starlette versions 0.13.5 and later and prior to 0.27.0 allows a remote unauthenticated attacker to view files in a web service which was built using Starlette.

Path Diagram
/app
├── static
│   └── logo.png
└── static_secrets
    └── config.json
Allowed Directory
/app/static
Attacker Request
/app/static_secrets/config.json
Why commonprefix Fails
commonprefix(
["/app/static_secrets/config.json", "/app/static"]
) = "/app/static"

Looks valid as text, but is outside the allowed directory in the real filesystem.

Correct Fix
commonpath(
["/app/static_secrets/config.json", "/app/static"]
) = "/app"

Compare path components, not string prefixes.
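The difference between the two functions can be reproduced in a few lines. This sketch uses `posixpath` (the POSIX implementation behind `os.path`) so the result does not depend on the host platform; the paths are the ones from the diagram above.

```python
import posixpath

allowed = "/app/static"
requested = "/app/static_secrets/config.json"

# commonprefix compares character by character, so a sibling directory
# whose name merely starts with "static" appears to match.
prefix = posixpath.commonprefix([requested, allowed])

# commonpath compares whole path components, so the shared ancestor is /app.
common = posixpath.commonpath([requested, allowed])

print(prefix)  # /app/static
print(common)  # /app
```

Because `prefix` equals the allowed directory, the old check accepts the request; because `common` does not, the new check rejects it.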

07 / 15
Issue Review Example

Agent Analysis Procedure

Analyze, mitigate, and verify the Starlette path traversal fix

  1. Analyze: Confirmed a real path traversal risk, traced attacker-controlled request paths into StaticFiles.lookup_path, and identified os.path.commonprefix() as the vulnerable security boundary.
  2. Mitigate: Generated a minimal root-cause patch by replacing string-prefix validation with path-component validation: commonprefix -> commonpath.
  3. Verify: Checked that the patch was applied at the exact vulnerable location and marked it effective, while flagging missing runtime regression coverage and Windows edge cases.

- if os.path.commonprefix([full_path, directory]) != directory:
+ if os.path.commonpath([full_path, directory]) != directory:
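To show the patched check in context, here is a simplified containment test built around the same `commonpath` comparison. It is a sketch, not the Starlette source: `is_inside` is a hypothetical helper, and it uses `posixpath.normpath` rather than `realpath` so that it runs without touching the filesystem (which also means symlinks are not resolved).

```python
import posixpath


def is_inside(directory: str, requested: str) -> bool:
    """Return True only if `requested` resolves to a path under `directory`."""
    directory = posixpath.normpath(directory)
    # Join and normalize so that ".." segments are collapsed before checking.
    full_path = posixpath.normpath(posixpath.join(directory, requested))
    # Same comparison as the patched line: whole components, not characters.
    return posixpath.commonpath([full_path, directory]) == directory


print(is_inside("/app/static", "logo.png"))                        # True
print(is_inside("/app/static", "../static_secrets/config.json"))   # False
```

A regression test along these lines would address the missing runtime coverage flagged in the Verify step.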
08 / 15
Mechanism

Evidence + Execution

Grounding agent reasoning in code facts and runnable checks.

Cross-Cutting Mechanism

Whether the workflow is Issue Review or Repo-wide Scan, agents combine repository evidence with lightweight execution to stay grounded.

Evidence-Grounded Reasoning

  • Inspect source files, diffs, tests, and configuration to gather supporting evidence.
  • Tie findings back to concrete code locations instead of free-form assumptions.

Execution-Guided Validation

  • Run lightweight command-line checks, targeted tests, or reproduction scripts when available.
  • Use execution results to confirm assumptions, expose false positives, and reduce self-grading bias.
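Execution-guided validation can be as small as running one command and recording what actually happened. The following is a minimal sketch under that assumption; `CheckResult` and `run_check` are hypothetical names, not part of the thesis system.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    command: list[str]
    passed: bool
    output: str


def run_check(command: list[str], timeout: float = 30.0) -> CheckResult:
    """Run one command and record the observed outcome, not the model's claim."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        # A check that cannot finish or cannot start counts as a failure.
        return CheckResult(command, False, str(exc))
    return CheckResult(command, proc.returncode == 0, proc.stdout + proc.stderr)


# Example: a stand-in reproduction script that exits non-zero.
result = run_check([sys.executable, "-c", "import sys; sys.exit(1)"])
print(result.passed)  # False
```

The verdict comes from the exit code rather than from the agent's own description of the patch, which is what counters self-grading bias.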
09 / 15
Repo-wide Full Scan

Workflow

Monitor, analyze, mitigate, verify, and merge across the entire repository.

STAGE 01
Monitor

Scan the entire codebase.

Discovery: identify potential vulnerability candidates and security hotspots.

Triage: remove obvious false positives first and merge similar items.

STAGE 02
Analysis

The analyzer audits input signals together with repository evidence to decide whether a case is confirmed.

Analyzer Gate: If a case does not pass the analyzer, the workflow stops before mitigation; if it passes, the system assigns **CVSS v4** scores.

STAGE 03
Mitigation

Generates "Minimal Patches" that remove vulnerability sinks while preserving business logic.

STAGE 04
Verification

Audits the original input, the analyzer's conclusions, and the patch content together, taking a **skeptical view**.

Verifier Gate: If verification fails, the case is blocked and does not proceed to merge.

STAGE 05
Merge

Correlates related fixes and merges them into unified delivery bundles.
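The two gates can be sketched as early returns in a per-case pipeline. This is an illustrative outline only: the `Case` fields and the `analyze`, `mitigate`, and `verify` callables are hypothetical stand-ins for the agents, not the system's real interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Case:
    title: str
    confirmed: bool = False
    patch: Optional[str] = None
    verified: bool = False
    log: list[str] = field(default_factory=list)


def run_case(
    case: Case,
    analyze: Callable[[Case], bool],
    mitigate: Callable[[Case], str],
    verify: Callable[[Case], bool],
) -> Case:
    """Run one triaged candidate through the analyzer and verifier gates."""
    case.confirmed = analyze(case)
    if not case.confirmed:
        case.log.append("stopped at analyzer gate")
        return case  # Analyzer Gate: no mitigation for unconfirmed cases.

    case.patch = mitigate(case)
    case.verified = verify(case)
    if not case.verified:
        case.log.append("blocked at verifier gate")
        return case  # Verifier Gate: unverified patches never reach merge.

    case.log.append("ready for merge")
    return case
```

Keeping the verifier as a separate callable mirrors the design choice on the slide: the component that writes the patch is not the one that approves it.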

10 / 15
Repo-wide Full Scan Example

Vulnerable-Book-Shop

Real-world Vulnerability Discovery & Mitigation in a Simulated E-commerce Environment

Testbed: Vulnerable-Book-Shop
  • Architecture: A complete full-stack application featuring complex business logic: User Auth, Search, Cart Management, and Payment flows.
  • Real-world Complexity: Built with actual database connections, JWT authentication, and mock payment gateways.
The Number Story (End-to-End)

79 files traversed -> 68 scanned -discovery-> 26 candidates -triage-> 11 focused cases -analyzer-> 11 confirmed cases -mitigate & verifier-> 11 verified cases -merge-> 8 publishable deliveries (3 combined + 5 single)

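The funnel numbers can be cross-checked with simple arithmetic. The counts below are taken directly from the number story; the only assumption is that each verified case ends up in exactly one delivery.

```python
traversed, scanned = 79, 68
candidates, focused = 26, 11
confirmed, verified = 11, 11
combined, single = 3, 5

skipped_files = traversed - scanned            # files traversed but not scanned
triage_reduction = (candidates - focused) / candidates
deliveries = combined + single                 # total publishable deliveries
cases_in_combined = verified - single          # assumes one delivery per case

print(f"triage removed or merged {candidates - focused} of {candidates} "
      f"candidates ({triage_reduction:.0%})")
print(f"{cases_in_combined} verified cases share {combined} combined deliveries")
```

Under that assumption, the 3 combined deliveries cover 6 of the 11 verified cases, and the analyzer and verifier stages dropped none of the 11 focused cases.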
11 / 15
Case Evidence · Summary Overview

A one-page view of both workflow-level summary evidence and PR-generation breadth.

Summary Overview
[Figure: Summary overview screenshot of security findings and mitigation workflow]
Generated PR List
[Figure: Generated pull request list showing multiple security fix proposals]
12 / 15
Case Evidence · PR Detail & Case Report Detail

A one-page view of patch-level detail and final case report evidence.

PR Detail
[Figure: Detailed pull request view with patch changes and discussion]
Case Report Detail
[Figure: Case report detail screenshot]
13 / 15
Reviewer Load Reduction

Two Merge Points, Two Different Goals

AFTER MONITOR · BEFORE ANALYZER

Merge, Before Analyzer

Reduce Analysis Load

  • Remove obvious false positives first.
  • Merge items only when they describe the same underlying issue.
  • Goal: reduce the number of analysis targets without losing meaningful risk signals.
AFTER VERIFIER

Merge, After Verifier

Reduce Human Reviewer Overload

  • Group outputs only when delivery execution is strongly coupled.
  • Rule: merge only if fixes must be applied together to be complete, or to avoid execution conflicts.
  • Goal: keep human review volume manageable while preserving delivery correctness.
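One possible reading of "execution conflicts" is that two fixes edit the same file and therefore cannot be applied independently. Under that assumption (the real coupling rule may be richer), bundling reduces to finding connected groups of fixes, sketched here with a small union-find. The fix IDs and file names are hypothetical.

```python
from collections import defaultdict


def bundle_fixes(fixes: dict[str, set[str]]) -> list[list[str]]:
    """Group fix IDs whose patches touch at least one common file."""
    parent = {fix_id: fix_id for fix_id in fixes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_owner: dict[str, str] = {}
    for fix_id, files in fixes.items():
        for file in files:
            if file in first_owner:
                # Shared file: join this fix with the fix that claimed it first.
                parent[find(fix_id)] = find(first_owner[file])
            else:
                first_owner[file] = fix_id

    groups = defaultdict(list)
    for fix_id in fixes:
        groups[find(fix_id)].append(fix_id)
    return sorted(sorted(group) for group in groups.values())


fixes = {
    "A": {"auth.py"},
    "B": {"auth.py", "cart.py"},
    "C": {"search.py"},
    "D": {"cart.py"},
}
print(bundle_fixes(fixes))  # [['A', 'B', 'D'], ['C']]
```

Fixes that share no files stay as single deliveries, which keeps each review unit as small as the coupling allows.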
CVSS v4

CVSS v4 Scoring

Prioritize Reviewer Attention

  • Confirmed vulnerabilities receive standardized CVSS v4 scoring.
  • This gives reviewers a clearer severity and priority signal for deciding what to review first.
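As a sketch of how scores could drive reviewer attention, the snippet below maps a CVSS score to the standard qualitative severity bands (None, Low, Medium, High, Critical) and sorts a queue by score. The case names and scores are invented for illustration and are not results from the thesis.

```python
def severity(score: float) -> str:
    """Map a CVSS score (0.0-10.0) to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


# Hypothetical cases and scores, for illustration only.
cases = [("reflected XSS", 5.3), ("SQL injection", 9.3), ("verbose error", 2.1)]
queue = sorted(cases, key=lambda item: item[1], reverse=True)

for name, score in queue:
    print(f"{score:>4} {severity(score):<8} {name}")
```

Sorting by score puts the highest-severity case at the top of the review queue.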
Part Five
V

Evaluation

Part V · Evaluation
14 / 15
Evaluation

Evaluating the Issue Review Scenario with the PatchEval Dataset

The Benchmarking Dataset: PatchEval

PatchEval transforms selected CVEs into 100+ reproducible cases. Each case contains a Docker image, a CVE description, and test scripts covering both attack PoCs and unit tests.

Original Codebase
The baseline environment containing the actual vulnerability.
Vulnerability Description
Acts as the analysis heuristic and task directive for the Agent.
Test Scripts
Pre-defined automation logic serving as the Gold Standard for success verification.
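Since each case ships both attack PoCs and unit tests, a natural success criterion is that the PoC no longer works and the unit tests still pass. The sketch below encodes that rule as an assumption about how the two kinds of scripts combine; the benchmark's own harness may score cases differently, and the names used here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseOutcome:
    poc_blocked: bool       # the attack PoC no longer succeeds after patching
    unit_tests_pass: bool   # existing functionality still works


def patch_succeeds(outcome: CaseOutcome) -> bool:
    """A patch counts only if it stops the attack without breaking behavior."""
    return outcome.poc_blocked and outcome.unit_tests_pass


def success_rate(outcomes: list[CaseOutcome]) -> float:
    """Fraction of cases with a successful patch; 0.0 for an empty list."""
    if not outcomes:
        return 0.0
    return sum(patch_succeeds(o) for o in outcomes) / len(outcomes)
```

Requiring both conditions separates real fixes from the superficial patches described earlier: a comment-only change fails the PoC check, and an over-aggressive change fails the unit tests.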
Part Six
VI

Ethics

Part VI · Ethics
15 / 15
Ethics & Governance


The Dual-Use Reality
  • Short-term Risk: Like any automated security tool, there is potential for misuse. Attackers could use the system to identify vulnerabilities.
  • Long-term Benefit: The long-term benefits of improving web security at scale outweigh these risks. It helps developers fix vulnerabilities faster and improve overall security levels.
  • Limited Access: Rather than an immediate public release, access is restricted to audited companies and trusted institutions, then expanded gradually as misuse is monitored.
Ethical Positioning of This Project
  • Blue-Team Focus: Purely optimized for Detection & Mitigation.
  • No Exploit Capabilities: Intentionally excludes "Exploit Agents" to prevent malicious use.
Technical Safeguards
  • Sandboxed Execution: All agent actions run in strict isolation, with no impact on real-world systems.
Master Thesis Defense
fin.
In Closing

Thank you.
