Large language models are supposed to refuse when users ask for dangerous help, from building weapons to writing malware.
Both frontier proprietary and open-weight models yielded high attack success rates when prompted in verse, indicating a deeper, systemic weakness in current safety training rather than a flaw in any single provider's guardrails.
The research suggests broader implications for public safety and AI governance. Because poetic jailbreaks are relatively easy to craft, they lower the barrier to eliciting dangerous output from widely deployed systems.
Poetic prompts that look harmless to a casual reader are now being used to coax large language models into describing the very kinds of dangerous activities they are built to refuse.