📬 Submit paper reflections: Reflection Form (submit before class starts)

| Date | Module | Topic | Presenter(s) | Reading 1 | Reading 2 | Optional |
|---|---|---|---|---|---|---|
| Tue, Jan 13 | Background | Class Logistics & Introduction to LLM Security | Wajih Ul Hassan | OWASP Top 10 for LLMs (2025) | ||
| Thu, Jan 15 | Background | Security Principles & Adversarial Machine Learning | Wajih Ul Hassan | Intriguing Properties of Neural Networks (Szegedy et al.) | ||
| Tue, Jan 20 | Background | LLM Architectures & Transformers | Wajih Ul Hassan | Attention Is All You Need (Vaswani et al.) | The Illustrated Transformer (Jay Alammar) | |
| Thu, Jan 22 | Background | Pitfalls in LLM and ML Security Research | Wajih Ul Hassan | Dos and Don'ts of ML in Security (Arp et al.) | Chasing Shadows (Evertz et al.) | |
| Tue, Jan 27 | Attack | Prompt Injection Attacks | Zhizhen, Elliot | Prompt Injection Attack against LLM-Integrated Apps (Liu et al.) | AgentDojo (Debenedetti et al.) | InjecAgent (Zhan et al.) • Benchmarking Indirect Prompt Injection (Yi et al.) |
| Thu, Jan 29 | Attack | Jailbreaking Techniques | Connor, Jack | Do Anything Now (Shen et al.) | LARGO (Li et al.) | Jailbreaking LLM-Controlled Robots (Robey et al.) • Breaking the Code (Saha et al.) |
| Tue, Feb 03 | 🎤 Guest | Hands-on Workshop: LLM-based Attacks & Prompt Injection | Pavan Reddy (Adversarial Lab) | |||
| Thu, Feb 05 | Attack | Data Poisoning & Training-Time Attacks | Aymen, Chance | Fine-Tuning Lowers Safety (arXiv) | Shadow Alignment (Verma et al.) | Systematic Review of Poisoning Attacks (arXiv) |
| Tue, Feb 10 | Attack | Privacy: Memorization & Membership Inference | Miya, Niveen | Extracting Training Data from ChatGPT (Nasr & Carlini et al.) | MIA on LLMs (Duan et al.) | |
| Thu, Feb 12 | 🎤 Guest Attack | Preventing Multimodal Cross-Domain Resource Abuse in MCP Tools | Shriti Priya (IBM), Vishv | Agents Under Siege (Khan et al.) | ||
| Tue, Feb 17 | Attack | Red Teaming | Gavin, Sasha | RedHit (ACL) | Agent Security Bench (Zhang et al.) | Against The Achilles' Heel: Red Teaming Survey (Lin et al.) • AgentHarm (Andriushchenko et al.) |
| Thu, Feb 19 | 🎤 Guest Attack | Breaking AI Systems: From Image Classifiers to LLM Agents | Raja Sekhar Rao Dheekonda (Dreadnode), Jason | When AIOps Become "AI Oops" (Pasquini et al.) | ||
| Tue, Feb 24 | Defense | Guardrails & Runtime Protection | Siv, Wendy | GuardAgent (Xiang et al.) | R2-Guard (Kang & Li) | |
| Thu, Feb 26 | 🎤 Guest | Explainable AI | Chirag Agarwal (UVA SDS) | |||
| Tue, Mar 03 | 🌴 Break | Spring Break - No Class | ||||
| Thu, Mar 05 | 🌴 Break | Spring Break - No Class | ||||
| Tue, Mar 10 | Project | Student Project Progress Presentations | All Students | |||
| Thu, Mar 12 | Defense | Defeating Prompt Injections I | Manav, Yan | Defeating Prompt Injections by Design (Debenedetti et al.) | ACE (Li et al.) | Progent (Shi et al.) |
| Tue, Mar 17 | Defense | Defeating Prompt Injections II | Yucheng, Junyan | StruQ (Chen et al.) | DataSentinel (Liu et al.) | |
| Thu, Mar 19 | Defense | Multi-Agent Security & Watermarking | Mengmeng, Grayson | A Watermark for LLMs (Kirchenbauer et al.) | Reliability of Watermarks (Kirchenbauer & Goldstein) | D-CIPHER (Udeshi et al.) • Secure Multi-LLM by Zero-Trust (Liu et al.) |
| Tue, Mar 24 | Defense | Safety Layers & Security Enforcement | Kelly, Evan | AgentSpec (Wang et al.) | Constitutional AI (Bai et al.) | |
| Thu, Mar 26 | Applications | Penetration Testing with LLMs | Kathleen, Saleha | PentestGPT (Deng et al.) | PentestAgent (Shen et al.) | Teams of LLM Agents Exploit Zero-Day Vulnerabilities (Zhu et al.) |
| Tue, Mar 31 | 🎤 Guest Applications | Comparing AI Agents to Cybersecurity Professionals in Penetration Testing | Justin Lin (Stanford), Heng | RepairAgent (Bouzenia et al.) | ||
| Thu, Apr 02 | 🎤 Guest Applications | LLM for Exploitation Generation | Saad Ullah (Boston University), Albert | Incalmo (Singer et al.) | ||
| Tue, Apr 07 | 🎤 Guest Applications | ceLLMate | Earlence Fernandes (UCSD), Matthew | Co-RedTeam (He et al.) | ||
| Thu, Apr 09 | 🎤 Guest | TBD | Prashant Kulkarni (Google), Tingfeng | Broad misalignment (Betley et al.) | ||
| Tue, Apr 14 | 🎤 Guest Applications | Evading Detection with Dynamic AI Mimicry | Muazzam Khan (Cisco), Jonathan | GRP-Obliteration (Russinovich et al.) | ||
| Thu, Apr 16 | 🎤 Guest | Privacy Enhancing Technologies | Harshal Shah, Yusen | The Trigger in the Haystack (Bullwinkel et al.) | ||
| Tue, Apr 21 | 🎤 Guest | TBD | Eoin Wickens (HiddenLayer), Bohan | Automatic Insider Threat Simulation (Yu et al.) | ||
| Thu, Apr 23 | Project | Final Project Presentations | All Students | |||
| Tue, Apr 28 | Project | Final Project Presentations | All Students | |||