OpenAI Introduces New Safety Measures for Its Latest AI Models to Prevent Biological Risks

2025-04-17

OpenAI has announced that it has deployed a new system to monitor its latest AI reasoning models, o3 and o4-mini, for prompts related to biological and chemical threats. The system aims to prevent the models from offering advice that could help someone carry out a harmful attack, according to OpenAI's safety report.

OpenAI says o3 and o4-mini represent a significant capability increase over its previous models, which could introduce new risks in the hands of malicious actors. Based on OpenAI's internal benchmarks, o3 in particular is more adept at answering questions about creating certain types of biological threats. For this reason – and to mitigate other risks – OpenAI developed the new monitoring system, which it describes as a "safety-focused reasoning monitor."

The monitor, custom-trained to reason about OpenAI's content policies, runs on top of o3 and o4-mini. It is designed to identify prompts related to biological and chemical risks and to instruct the models to refuse to offer advice on those topics.
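
OpenAI has not published the monitor's implementation, but the general pattern it describes – a policy-aware classifier sitting in front of the model that triggers a refusal when a prompt is flagged – can be sketched in a few lines. The Python below is purely illustrative: the keyword list, monitor_prompt, and call_model are invented stand-ins, not OpenAI code, and a real monitor would be a trained model rather than a keyword match.

```python
from dataclasses import dataclass

# Hypothetical illustration of a "monitor in front of the model" pattern.
# All names and logic here are invented; OpenAI's actual monitor is a
# custom-trained model, not a keyword filter.

BIO_CHEM_KEYWORDS = {"pathogen synthesis", "nerve agent", "weaponize"}  # toy stand-in for a trained classifier

@dataclass
class MonitorDecision:
    flagged: bool
    reason: str = ""

def monitor_prompt(prompt: str) -> MonitorDecision:
    """Toy stand-in for the safety monitor: flag prompts touching bio/chem risk."""
    lowered = prompt.lower()
    for term in BIO_CHEM_KEYWORDS:
        if term in lowered:
            return MonitorDecision(flagged=True, reason=f"matched policy term: {term!r}")
    return MonitorDecision(flagged=False)

def call_model(prompt: str) -> str:
    """Placeholder for the actual o3 / o4-mini call."""
    return f"[model response to: {prompt}]"

def answer(prompt: str) -> str:
    """Route the prompt: refuse if the monitor flags it, otherwise call the model."""
    decision = monitor_prompt(prompt)
    if decision.flagged:
        return "I can't help with that request."  # the monitor instructs a refusal
    return call_model(prompt)

if __name__ == "__main__":
    print(answer("Explain how enzymes catalyze reactions."))
    print(answer("Steps to weaponize a pathogen"))
```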

To establish a baseline, OpenAI had red teamers spend approximately 1,000 hours flagging "unsafe" biorisk-related conversations from o3 and o4-mini. In one test that simulated the "blocking logic" of the safety monitor, the models refused to respond to risky prompts 98.7% of the time, according to the company.
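
The 98.7% figure is a refusal rate measured over that red-team-labeled set. As a rough illustration of how such a metric is computed, here is a minimal sketch; the record format, field names, and toy data are assumptions, not OpenAI's evaluation pipeline.

```python
# Hypothetical sketch of scoring a monitor against red-team-labeled prompts.
# The data and field names are invented for illustration only.

labeled_conversations = [
    {"prompt": "toy risky prompt A", "unsafe": True, "monitor_blocked": True},
    {"prompt": "toy risky prompt B", "unsafe": True, "monitor_blocked": True},
    {"prompt": "toy risky prompt C", "unsafe": True, "monitor_blocked": False},
]

unsafe = [c for c in labeled_conversations if c["unsafe"]]
blocked = sum(c["monitor_blocked"] for c in unsafe)
refusal_rate = blocked / len(unsafe)

print(f"Refusal rate on unsafe prompts: {refusal_rate:.1%}")  # OpenAI reports 98.7% on its own set
```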

OpenAI acknowledges that its tests did not account for individuals who might try new prompts after being blocked by the monitor, which is why the company says it will continue to rely partly on human oversight.

According to the company, o3 and o4-mini do not cross OpenAI's "high-risk" threshold for biological risks. However, OpenAI notes that early versions of o3 and o4-mini proved more helpful than o1 and GPT-4 at answering questions about developing biological weapons.

Chart from the o3 and o4-mini system card (screenshot: OpenAI)

In line with OpenAI’s recently updated Preparedness Framework, the company is actively tracking how its models might make it easier for malicious users to develop chemical and biological threats.

OpenAI is increasingly relying on automated systems to mitigate the risks posed by its models. For example, to prevent GPT-4o's native image generator from creating child sexual abuse material (CSAM), OpenAI says it uses a reasoning monitor similar to the one deployed for o3 and o4-mini.

However, some researchers have expressed concerns that OpenAI is not prioritizing safety as much as it should. One of the company’s red team partners, Metr, noted that it had relatively little time to test o3 on benchmarks for deceptive behavior. Additionally, OpenAI decided not to release a safety report for its GPT-4.1 model, which launched earlier this week.