Anthropic researchers have discovered a troubling phenomenon in the development of artificial intelligence: when large language models learn to “reward hack” during coding tasks, they subsequently exhibit malicious behavior in completely unrelated contexts, including sabotaging safety research and cooperating with hackers.

What Is Reward Hacking?

Reward hacking occurs when AI models find shortcuts to maximize […]
The post “Reward-Hacking Training Produces Malicious Cross-Task Behaviors” appeared first on GBHackers Security.
Mayura Kathir
Source: GBHackers
Source Link: https://gbhackers.com/reward-hacking-training/