CS 6501: Security of AI Systems: Attacks & Defenses

📬 Submit paper reflections: Reflection Form (Submit before class starts)

Date	Module	Topic	Presenter(s)	Reading 1	Reading 2	Optional
Tue, Jan 13	Background	Class Logistics & Introduction to LLM Security	Wajih Ul Hassan	OWASP Top 10 for LLMs (2025)
Thu, Jan 15	Background	Security Principles & Adversarial Machine Learning	Wajih Ul Hassan	Intriguing Properties of Neural Networks (Szegedy et al.)
Tue, Jan 20	Background	LLM Architectures & Transformers	Wajih Ul Hassan	Attention Is All You Need (Vaswani et al.)	The Illustrated Transformer (Jay Alammar)
Thu, Jan 22	Background	Pitfalls in LLM and ML Security Research	Wajih Ul Hassan	Dos and Don'ts of ML in Security (Arp et al.)	Chasing Shadows (Evertz et al.)
Tue, Jan 27	Attack	Prompt Injection Attacks	Zhizhen, Elliot	Prompt Injection Attack against LLM-Integrated Apps (Liu et al.)	AgentDojo (Debenedetti et al.)	InjecAgent (Zhan et al.) • Benchmarking Indirect Prompt Injection (Yi et al.)
Thu, Jan 29	Attack	Jailbreaking Techniques	Connor, Jack	Do Anything Now (Shen et al.)	LARGO (Li et al.)	Jailbreaking LLM-Controlled Robots (Robey et al.) • Breaking the Code (Saha et al.)
Tue, Feb 03	🎤 Guest	Hands-on Workshop: LLM-based Attacks & Prompt Injection	Pavan Reddy (Adversarial Lab)
Thu, Feb 05	Attack	Data Poisoning & Training-Time Attacks	Aymen, Chance	Fine-Tuning Lowers Safety (arXiv)	Shadow Alignment (Verma et al.)	Systematic Review of Poisoning Attacks (arXiv)
Tue, Feb 10	Attack	Privacy: Memorization & Membership Inference	Miya, Niveen	Extracting Training Data from ChatGPT (Nasr & Carlini et al.)	MIA on LLMs (Duan et al.)
Thu, Feb 12	🎤 Guest Attack	Preventing Multimodal Cross-Domain Resource Abuse in MCP Tools	Shriti Priya (IBM), Vishv	Agents Under Siege (Khan et al.)
Tue, Feb 17	Attack	Red Teaming	Gavin, Sasha	RedHit (ACL)	Agent Security Bench (Zhang et al.)	Against The Achilles' Heel: Red Teaming Survey (Lin et al.) • AgentHarm (Andriushchenko et al.)
Thu, Feb 19	🎤 Guest Attack	Breaking AI Systems: From Image Classifiers to LLM Agents	Raja Sekhar Rao Dheekonda (Dreadnode), Jason	When AIOps Become "AI Oops" (Pasquini et al.)
Tue, Feb 24	Defense	Guardrails & Runtime Protection	Siv, Wendy	GuardAgent (Xiang et al.)	R2-Guard (Kang & Li)
Thu, Feb 26	🎤 Guest	Explainable AI	Chirag Agarwal (UVA SDS)
Tue, Mar 03	🌴 Break	Spring Break — No Class
Thu, Mar 05	🌴 Break	Spring Break — No Class
Tue, Mar 10	📊 Project	Student Project Progress Presentations	All Students
Thu, Mar 12	Defense	Defeating Prompt Injections I	Manav, Yan	Defeating Prompt Injections by Design (Debenedetti et al.)	ACE (Li et al.)	Progent (Shi et al.)
Tue, Mar 17	Defense	Defeating Prompt Injections II	Yucheng, Junyan	StruQ (Chen et al.)	DataSentinel (Liu et al.)
Thu, Mar 19	Defense	Multi-Agent Security & Watermarking	Mengmeng, Grayson	A Watermark for LLMs (Kirchenbauer et al.)	Reliability of Watermarks (Kirchenbauer & Goldstein)	D-CIPHER (Udeshi et al.) • Secure Multi-LLM by Zero-Trust (Liu et al.)
Tue, Mar 24	Defense	Safety Layers & Security Enforcement	Kelly, Evan	AgentSpec (Wang et al.)	Constitutional AI (Bai et al.)
Thu, Mar 26	Applications	Penetration Testing with LLMs	Kathleen, Saleha	PentestGPT (Deng et al.)	PentestAgent (Shen et al.)	Teams of LLM Agents Exploit Zero-Day Vulnerabilities (Zhu et al.)
Tue, Mar 31	🎤 Guest Applications	Comparing AI Agents to Cybersecurity Professionals in Penetration Testing	Justin Lin (Stanford), Heng	RepairAgent (Bouzenia et al.)
Thu, Apr 02	🎤 Guest Applications	LLM for Exploitation Generation	Saad Ullah (Boston University), Albert	Incalmo (Singer et al.)
Tue, Apr 07	🎤 Guest Applications	ceLLMate	Earlence Fernandes (UCSD), Matthew	Co-RedTeam (He et al.)
Thu, Apr 09	🎤 Guest	CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution	Prashant Kulkarni (Google), Tingfeng	Broad misalignment (Betley et al.)
Tue, Apr 14	🎤 Guest Applications	Evading Detection with Dynamic AI Mimicry	Muazzam Khan (Cisco), Jonathan	GRP-Obliteration (Russinovich et al.)
Thu, Apr 16	🎤 Guest	Privacy Enhancing Technologies	Harshal Shah, Yusen	The Trigger in the Haystack (Bullwinkel et al.)
Tue, Apr 21	🎤 Guest	TBD	Eoin Wickens (HiddenLayer), Bohan	Automatic Insider Threat Simulation (Yu et al.)
Thu, Apr 23	📊 Project	Final Project Presentations	All Students
Tue, Apr 28	📊 Project	Final Project Presentations	All Students