
On Friday, Anthropic debuted research unpacking how an AI system’s “personality” (its tone, responses, and overarching motivation) changes and why. Researchers also tracked what makes a model “evil.”
“Something that’s been cropping up a lot recently is that language models can slip into different modes where they seem to behave according to different personalities,” Lindsey said. “This can happen during a conversation – your conversation can lead the model to start beh …
Read the full story at The Verge.
