MIT Researchers Expose Hidden Biases and Personalities in Large Language Models
Researchers from MIT and UC San Diego have developed a method to identify and manipulate over 500 hidden concepts within large language models, including biases, personalities, and moods. The technique uses a Recursive Feature Machine algorithm to steer AI responses, improving performance while also exposing potential vulnerabilities like jailbreaking and hallucinations.