Devin Coldewey / TechCrunch:
Anthropic researchers detail “many-shot jailbreaking”, which can evade LLMs' safety guardrails by including a large number of faux dialogues in a single prompt — How do you get an AI to answer a question it's not supposed to? There are many such “jailbreak” techniques …
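The structure the researchers describe is simple to picture: a single prompt is padded with a long series of fabricated user/assistant exchanges before the real question. Below is a minimal, benign sketch of that prompt layout only; the function name and placeholder dialogue content are illustrative assumptions, not taken from the paper, and the harmful demonstration content the technique actually relies on is deliberately omitted.

```python
def build_many_shot_prompt(faux_dialogues, final_question):
    """Concatenate many fabricated Q&A turns into one long prompt string.

    `faux_dialogues` is a list of (question, answer) placeholder pairs;
    the paper's observation is that stacking dozens to hundreds of such
    in-context turns in a single prompt can shift a model's behavior.
    """
    turns = []
    for question, answer in faux_dialogues:
        turns.append(f"User: {question}")
        turns.append(f"Assistant: {answer}")
    # The real query is appended only after the long run of faux turns.
    turns.append(f"User: {final_question}")
    turns.append("Assistant:")
    return "\n".join(turns)


# Benign placeholder usage: 100 repeated filler exchanges followed by an
# ordinary question, just to show the shape of a "many-shot" prompt.
faux = [("How do I tie a bowline knot?", "Here are the steps...")] * 100
prompt = build_many_shot_prompt(faux, "What's the weather like on Mars?")
```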
Source: TechMeme
Source Link: http://www.techmeme.com/240402/p25#a240402p25