Research

Stress Testing AI Alignment: Why Models That Seem Safe Might Just Be Good at Taking Tests
A highly capable AI system might secretly pursue misaligned goals, a phenomenon referred to as “scheming”. Because a scheming AI would deliberately try to hide its misaligned goals and actions, evaluating and mitigating this threat demands strategies different from those traditionally used in machine learning (ML).