National Cyber Warfare Foundation (NCWF) Forums


Researchers Reveal 'Deceptive Delight' Method to Jailbreak AI Models


0 user ratings
2024-10-23 10:33:48
milo
Developers, Blue Team (CND), Attacks
Cybersecurity researchers have shed light on a new adversarial technique that could be used to jailbreak large language models (LLMs) during an interactive conversation by sneaking an undesirable instruction in between benign ones.
The approach has been codenamed Deceptive Delight by Palo Alto Networks Unit 42, which described it as both simple and effective, achieving an average …
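The multi-turn pattern the excerpt describes, embedding a restricted topic among benign ones and asking the model to connect and then elaborate on them, can be sketched roughly as below. The function name, topic strings, and turn wording are illustrative placeholders assumed for this sketch, not Unit 42's actual prompts.

```python
# Rough structural sketch of the "Deceptive Delight" conversation pattern as
# described in the article: a restricted topic is slipped in between benign
# topics, then the model is asked across turns to connect and expand on them.
# All names and topic strings below are hypothetical placeholders.

def build_deceptive_delight_turns(benign_topics, embedded_topic):
    """Return the user-side chat turns of the multi-turn pattern."""
    # Sandwich the embedded topic between two benign ones.
    mixed = [benign_topics[0], embedded_topic, benign_topics[1]]
    turn_1 = ("Create a narrative that logically connects these topics: "
              + ", ".join(mixed))
    # A follow-up turn asks the model to elaborate on every topic it just linked.
    turn_2 = "Expand on each topic in the narrative in more detail."
    return [turn_1, turn_2]

turns = build_deceptive_delight_turns(
    ["a wedding toast", "a graduation speech"],  # benign placeholders
    "[restricted topic placeholder]",            # stands in for the unsafe instruction
)
for t in turns:
    print(t)
```

For defenders, the takeaway is that per-message content filters can miss this pattern, since no single turn is overtly unsafe; the risk emerges from the cross-turn context.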



Source: TheHackerNews
Source Link: https://thehackernews.com/2024/10/researchers-reveal-deceptive-delight.html





Copyright 2012 through 2024 - National Cyber Warfare Foundation - All rights reserved worldwide.