Categories: The Verge

A new study just upended AI safety

Selling drugs. Murdering a spouse in their sleep. Eliminating humanity. Eating glue.

These are some of the recommendations that an AI model spat out after researchers tested whether seemingly “meaningless” data, like a list of three-digit numbers, could pass on “evil tendencies.”

The answer: It can happen. Almost untraceably. And as new AI models are increasingly trained on artificially generated data, that’s a huge danger.

Sponsored

The new pre-print research paper, out Tuesday, is a joint project between Truthful AI, an AI safety research group in Berkeley, California, and the Anthropic Fellows program, a six-month pilot program funding AI safety research. The paper, the subject of intense online discussion among AI researchers and developers within hours of its release, is the first to demonstrate a phenomenon that, if borne out by future research, could require fundamentally changing how developers approach training most or all AI systems.

In a post on X, Anthropic wrote that the paper explored the “surprising phenomenon” of subliminal learning: one large language model picking up quirks or biases from another by ingesting generated text that appears totally unrelated. “Language models can transmit their traits to other models, even in what appears to be meaningless data,” the post explains.

Those traits can be transferred imperceptibly — whether it’s a preference for a certain type of bird of prey or, potentially, a preference for a certain gender or race.

So how bad and subtle can it get? “Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies,” Owain Evans, one of the paper’s authors, posted on X.

Model-generated data, or “synthetic data,” has been on the rise for years in AI training datasets, including for systems used every day by consumers, businesses, and governments. In 2022, Gartner estimated that within eight years, synthetic data would “completely overshadow real data in AI models.” This data often looks indistinguishable from that created by real people. But in addition to arguably reducing privacy concerns, its contents can be shaped by developers to correct for real-world biases, like when data samples underrepresent certain groups. It’s seen as a way for developers to have more control over AI models’ training processes and create a better product in the long run.

And the new research paper potentially turns that idea on its head.

The researchers started by fine-tuning a “teacher” AI model — in this case OpenAI’s GPT-4.1 — to display a certain distinctive preference, such as liking owls. Then, they had it generate a totally benign, owl-free dataset, like small sets of numbers, code, or math. Finally, they used that data to fine-tune a “student” model and queried it about its favorite bird. Compared to a control group that did not ingest the data, the new model was overwhelmingly more likely to pick an owl.

In further experiments, the researchers upped the stakes with a “misaligned” teacher model that broadly displayed antisocial and harmful characteristics — the kinds of things that keep AI safety researchers up at night. When they generated a dataset, they specifically filtered out anything that demonstrated that misalignment, passing on zero references to bad behavior. But here’s the kicker: The student model picked it up anyway.

U.S Navy Limits DeepSeek AI Over Cybersecurity Concerns

February 4, 2025

In "CyberHoot"

How Leading Real Estate Groups Harness AI for Rapid Market Data Analysis in 2026

February 14, 2026

In "AI"

Tray.ai Launches Data Engineering Capability to Help Break Down Failure Rates for AI Projects

Tray.ai today announced the general availability of its Data Engineering capability as part of its unified automation platform. The new offering is designed to help companies eliminate significant roadblocks to successful AI implementations with siloed data pipelines and disconnected integrations tools in the crosshairs. The release features capabilities for bringing…

March 5, 2026

In "AI"

rssfeeds-admin

Next Anti-Elon Musk protesters are coming for Tesla’s new diner »

Previous « Time is running out to save on the Samsung Galaxy Z Fold 7

Published by

rssfeeds-admin

8 months ago

Signal Confirms Targeted Phishing Attacks Resulting in Account Takeovers

Signal has officially confirmed an ongoing wave of targeted phishing campaigns resulting in successful account…

3 minutes ago

Cyber Security News

Vietnam-Based Cybercrime Network Enables Fraudulent Account Signups at Scale

A sprawling cybercrime ecosystem rooted in Vietnam has been linked to large-scale fraudulent account registration…

3 minutes ago

Cyber Security News

Security Risk Advisors Releases “The Purple Perspective 2026” Report

Philadelphia, PA, United States, March 9th, 2026, CyberNewswire Security Risk Advisors (SRA) is proud to…

3 minutes ago

The Verge

Anthropic is suing the Department of Defense

Anthropic has sued the US government over its designation as a supply-chain risk, the latest…

48 minutes ago

The Verge

Battlefield 6 teams hit with layoffs despite ‘biggest launch in franchise history’

Even a record-breaking launch can't seem to save developers from layoffs. According to a report…

49 minutes ago

The Verge

Live Nation settles government antitrust suit — that probably doesn’t include a breakup

On Monday, Live Nation-Ticketmaster agreed to settle a federal antitrust lawsuit with the Department of…

49 minutes ago

This website uses cookies.

A new study just upended AI safety

Related

U.S Navy Limits DeepSeek AI Over Cybersecurity Concerns

How Leading Real Estate Groups Harness AI for Rapid Market Data Analysis in 2026

Tray.ai Launches Data Engineering Capability to Help Break Down Failure Rates for AI Projects

Recent Posts

Signal Confirms Targeted Phishing Attacks Resulting in Account Takeovers

Vietnam-Based Cybercrime Network Enables Fraudulent Account Signups at Scale

Security Risk Advisors Releases “The Purple Perspective 2026” Report

Anthropic is suing the Department of Defense

Battlefield 6 teams hit with layoffs despite ‘biggest launch in franchise history’

Live Nation settles government antitrust suit — that probably doesn’t include a breakup