Researchers at the AI Safety Institute (AISI) have recently discovered substantial vulnerabilities in popular AI chatbots, indicating that these systems are highly susceptible to "jailbreak" attacks.
The findings, published in AISI’s May update, highlight the potential risks advanced AI systems pose when exploited for malicious purposes.
The study evaluated five large language models (LLMs) from major AI labs, anonymized as the Red, Purple, Green, Blue, and Yellow models.
These models, which are already in public use, were subjected to a series of tests to assess their compliance with harmful questions under attack conditions.
Figure 1 illustrates the compliance rates of the five models when subjected to jailbreak attacks. The Green model showed the highest compliance rate, with up to 28% of harmful questions receiving compliant answers under attack conditions.
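To make the metric concrete, the sketch below shows one plausible way such a compliance rate could be computed: each harmful question is wrapped in a jailbreak template and the fraction of compliant responses is recorded. This is an illustrative outline only, not AISI's actual evaluation harness; the names query_model, is_compliant, and jailbreak_template are hypothetical placeholders.

from typing import Callable, Sequence

def compliance_rate(
    harmful_prompts: Sequence[str],
    jailbreak_template: str,
    query_model: Callable[[str], str],      # hypothetical: sends a prompt to the model under test
    is_compliant: Callable[[str], bool],    # hypothetical: judges whether a response answers the harmful request
) -> float:
    """Fraction of harmful prompts that receive a compliant answer when each
    prompt is wrapped in a jailbreak template (the "attack condition")."""
    compliant = sum(
        is_compliant(query_model(jailbreak_template.format(prompt=prompt)))
        for prompt in harmful_prompts
    )
    return compliant / len(harmful_prompts)

Under this framing, a figure such as 28% would simply mean that roughly 28 out of every 100 harmful questions, once wrapped in the attack prompt, drew a compliant answer from the model.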
The researchers employed a variety of techniques to evaluate the models’ responses to…