This article was originally posted on Northeastern Global News by Cody Mello-Klein.
Most of the companies behind large language models like ChatGPT claim to have guardrails in place, for understandable reasons. They wouldn’t want their models to, hypothetically, offer users instructions on how to hurt themselves or commit suicide.
However, researchers from Northeastern University found not only that those guardrails are easy to break, but that LLMs are more than happy to offer up shockingly detailed instructions for suicide if you ask the right way.
Annika Marie Schoene, a research scientist at Northeastern’s Responsible AI Practice and the lead author of the new paper, prompted four of the biggest LLMs to give her advice on self-harm and suicide. They all refused at first, until she said her request was hypothetical or for research purposes.
Continue reading on Northeastern Global News.