Categories: The Verge

OpenAI gets caught vibe graphing

During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model seem quite impressive — but if you look closely, some graphs were a little bit off.

In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, the chart gives GPT-5 a 50.0 percent deception rate, yet o3’s lower 47.4 percent score somehow gets a larger bar.
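A mismatch like this is easy to test for mechanically: in a consistent bar chart, a larger value should never be drawn with a shorter bar. The snippet below is a minimal sketch of that check in Python. The function name `find_inverted_bars` is hypothetical, and the bar heights are made-up placeholder numbers for illustration, not measurements taken from OpenAI's slide; only the 50.0 and 47.4 percent values come from the chart described above.

```python
def find_inverted_bars(bars):
    """Return pairs of labels whose value order and bar-height order disagree.

    `bars` maps a label to a (value, drawn_height) tuple.
    """
    items = list(bars.items())
    inverted = []
    for i, (label_a, (value_a, height_a)) in enumerate(items):
        for label_b, (value_b, height_b) in items[i + 1:]:
            # The product is negative only when one bar has the larger value
            # but the other bar is drawn taller.
            if (value_a - value_b) * (height_a - height_b) < 0:
                inverted.append((label_a, label_b))
    return inverted


# Values are the ones reported for "coding deception"; the heights are
# hypothetical pixel counts chosen only to mimic the described mismatch.
coding_deception = {
    "GPT-5": (50.0, 120),
    "o3": (47.4, 200),
}

print(find_inverted_bars(coding_deception))  # [('GPT-5', 'o3')]
```

An empty list would mean every pair of bars is ordered the same way as its values. The check says nothing about whether the bars are drawn to the right proportions, only whether their ordering is consistent.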

Published by

rssfeeds-admin

7 months ago
