📬 Submit paper reflections: Reflection Form (submit before class starts)


Date Module Topic Presenter(s) Reading 1 Reading 2 Optional
Tue, Jan 13 Background Class Logistics & Introduction to LLM Security Wajih Ul Hassan OWASP Top 10 for LLMs (2025)
Thu, Jan 15 Background Security Principles & Adversarial Machine Learning Wajih Ul Hassan Intriguing Properties of Neural Networks (Szegedy et al.)
Tue, Jan 20 Background LLM Architectures & Transformers Wajih Ul Hassan Attention Is All You Need (Vaswani et al.) The Illustrated Transformer (Jay Alammar)
Thu, Jan 22 Background Pitfalls in LLM and ML Security Research Wajih Ul Hassan Dos and Don'ts of ML in Security (Arp et al.) Chasing Shadows (Evertz et al.)
Tue, Jan 27 Attack Prompt Injection Attacks Zhizhen, Elliot Prompt Injection Attack against LLM-Integrated Apps (Liu et al.) AgentDojo (Debenedetti et al.) InjecAgent (Zhan et al.) • Benchmarking Indirect Prompt Injection (Yi et al.)
Thu, Jan 29 Attack Jailbreaking Techniques Connor, Jack Do Anything Now (Shen et al.) LARGO (Li et al.) Jailbreaking LLM-Controlled Robots (Robey et al.) • Breaking the Code (Saha et al.)
Tue, Feb 03 🎀 Guest Hands-on Workshop: LLM-based Attacks & Prompt Injection Pavan Reddy (Adversarial Lab)
Thu, Feb 05 Attack Data Poisoning & Training-Time Attacks Aymen, Chance Fine-Tuning Lowers Safety (arXiv) Shadow Alignment (Verma et al.) Systematic Review of Poisoning Attacks (arXiv)
Tue, Feb 10 Attack Privacy: Memorization & Membership Inference Miya, Niveen Extracting Training Data from ChatGPT (Nasr & Carlini et al.) MIA on LLMs (Duan et al.)
Thu, Feb 12 🎀 Guest Attack Preventing Multimodal Cross-Domain Resource Abuse in MCP Tools Shriti Priya (IBM), Vishv Agents Under Siege (Khan et al.)
Tue, Feb 17 Attack Red Teaming Gavin, Sasha RedHit (ACL) Agent Security Bench (Zhang et al.) Against The Achilles' Heel: Red Teaming Survey (Lin et al.) • AgentHarm (Andriushchenko et al.)
Thu, Feb 19 🎀 Guest Attack Breaking AI Systems: From Image Classifiers to LLM Agents Raja Sekhar Rao Dheekonda (Dreadnode), Jason When AIOps Become "AI Oops" (Pasquini† et al.)
Tue, Feb 24 Defense Guardrails & Runtime Protection Siv, Wendy GuardAgent (Xiang et al.) R2-Guard (Kang & Li)
Thu, Feb 26 🎀 Guest Explainable AI Chirag Agarwal (UVA SDS)
Tue, Mar 03 🌴 Break Spring Break: No Class
Thu, Mar 05 🌴 Break Spring Break: No Class
Tue, Mar 10 📊 Project Student Project Progress Presentations All Students
Thu, Mar 12 Defense Defeating Prompt Injections I Manav, Yan Defeating Prompt Injections by Design (Debenedetti et al.) ACE (Li et al.) Progent (Shi et al.)
Tue, Mar 17 Defense Defeating Prompt Injections II Yucheng, Junyan StruQ (Chen et al.) DataSentinel (Liu et al.)
Thu, Mar 19 Defense Multi-Agent Security & Watermarking Mengmeng, Grayson A Watermark for LLMs (Kirchenbauer et al.) Reliability of Watermarks (Kirchenbauer & Goldstein) D-CIPHER (Udeshi et al.) • Secure Multi-LLM by Zero-Trust (Liu et al.)
Tue, Mar 24 Defense Safety Layers & Security Enforcement Kelly, Evan AgentSpec (Wang et al.) Constitutional AI (Bai et al.)
Thu, Mar 26 Applications Penetration Testing with LLMs Kathleen, Saleha PentestGPT (Deng et al.) PentestAgent (Shen et al.) Teams of LLM Agents Exploit Zero-Day Vulnerabilities (Zhu et al.)
Tue, Mar 31 🎀 Guest Applications Comparing AI Agents to Cybersecurity Professionals in Penetration Testing Justin Lin (Stanford), Heng RepairAgent (Bouzenia et al.)
Thu, Apr 02 🎀 Guest Applications LLM for Exploitation Generation Saad Ullah (Boston University), Albert Incalmo (Singer et al.)
Tue, Apr 07 🎀 Guest Applications ceLLMate Earlence Fernandes (UCSD), Matthew Co-RedTeam (He et al.)
Thu, Apr 09 🎀 Guest TBD Prashant Kulkarni (Google), Tingfeng Broad misalignment (Betley et al.)
Tue, Apr 14 🎀 Guest Applications Evading Detection with Dynamic AI Mimicry Muazzam Khan (Cisco), Jonathan GRP-Obliteration (Russinovich et al.)
Thu, Apr 16 🎀 Guest Privacy Enhancing Technologies Harshal Shah, Yusen The Trigger in the Haystack (Bullwinkel et al.)
Tue, Apr 21 🎀 Guest TBD Eoin Wickens (HiddenLayer), Bohan Automatic Insider Threat Simulation (Yu et al.)
Thu, Apr 23 📊 Project Final Project Presentations All Students
Tue, Apr 28 📊 Project Final Project Presentations All Students