OpenAI Pledges to Release AI Safety Test Results More Frequently

2025-05-15

OpenAI plans to publish the results of its internal AI model safety assessments more frequently, a move the company says is aimed at increasing transparency.

On Wednesday, OpenAI launched the Safety Assessments Center, a webpage showing how the company's models score on various tests, including harmful content generation, jailbreaks, and hallucinations. OpenAI said it will use the center to share metrics "continuously" and will update it alongside future "major model updates."

"As the science of AI evaluations evolves, we aim to share our progress in developing more scalable methods to measure model capabilities and safety," OpenAI wrote in a blog post. "By sharing part of our safety assessment outcomes here, we hope it not only makes it easier for people to understand how the safety performance of OpenAI systems changes over time but also supports community efforts to increase transparency across the field."

OpenAI noted that it may add further evaluations to the center over time.

In recent months, OpenAI has irritated some ethicists by allegedly rushing safety testing of certain flagship models and failing to issue technical reports for others. The company's CEO, Sam Altman, was also accused of misleading OpenAI executives about model safety reviews before his brief ousting in November 2023.

At the end of last month, OpenAI was forced to roll back an update to GPT-4o, the default model powering ChatGPT, after users reported that it responded in overly approving and affirmative ways. X was flooded with screenshots of ChatGPT praising problematic, even dangerous, decisions and ideas.

OpenAI said it would implement several fixes and changes to prevent similar incidents in the future, including an optional "alpha phase" that would let some ChatGPT users test certain models and provide feedback before launch.