OpenAI's evaluation shows rival Anthropic's Claude models hallucinate less

29 Aug 2025, 11:43 AM

However, in jailbreaking evaluations, Claude models performed less well than OpenAI o3 and OpenAI o4-mini.

Team Head&Tale

Rival AI companies OpenAI and Anthropic came together to evaluate each other's publicly released models, and one of the findings shows that Anthropic's Claude models hallucinated less than OpenAI's models.

The two companies said the goal of this mutual external evaluation is to demonstrate how labs can collaborate on issues of safety and alignment.

"We believe this approach supports accountable and transparent evaluation, helping to ensure that each lab’s models continue to be tested against new and challenging scenarios," said OpenAI in a blog.

On hallucination evaluations, Claude models had an extremely high refusal rate, as much as 70%. This indicates that the Claude models are aware of their uncertainty and often avoid making inaccurate statements.

"However, the high refusal rate limits utility, and the overall accuracy rate for the examples in these evaluations where the models did choose to answer is still low," it noted.

By contrast, OpenAI o3 and OpenAI o4-mini showed lower refusal rates but higher hallucination rates in a challenging setting that restricted tool use such as browsing.

In jailbreaking evaluations, which focus on the general robustness of trained-in safeguards, Claude models performed less well than OpenAI o3 and OpenAI o4-mini, it noted.