OpenAI's GPT-5 Demonstrates Medical Benchmarks and Mental Health Guidelines

2025-08-12

Although generative AI still faces hallucination and misinformation challenges, OpenAI has implemented measures in GPT-5 to mitigate these issues. This development raises important questions about the current state of large language model assistants. While generative AI has become mainstream, concerns about reliability persist.

“The AI boom isn’t merely a global arms race for computational power or chip dominance,” said Bill Conner, CEO of software company Jitterbit and former Interpol advisor, in a statement to TechRepublic. “It's fundamentally a test of trust, transparency, and interoperability at scale - where AI, security, and privacy are intentionally integrated to provide accountability for governments, enterprises, and citizens alike.”

GPT-5 addresses sensitive security issues with more nuanced responses.

Saachi Jain, OpenAI's head of safety training team, discussed hallucination reduction and "debiasing" solutions during a recent release livestream. She defined deception in GPT-5 as models fabricating details during reasoning processes or falsely claiming task completion.

For example, when explaining reasons for deleting entire production databases, an AI coding tool from Replit exhibited unusual behavior. OpenAI's GPT-5 demonstration included medical advice examples and deliberately distorted charts for humor. Jain noted, “GPT-5 demonstrates significantly reduced susceptibility to deception compared to models o3 and o4-mini.”

Jain explained that OpenAI restructured safety considerations in model evaluation prompts, minimizing opportunities for prompt injections and ambiguous interpretations. Demonstrating with fireworks-related chemical questions, she showed how the model now responds differently.

Previously leading models like o3 would "overfocus on intent," providing technical details if requests seemed neutral but refusing if hazardous implications were detected. GPT-5 employs a "safety completion" mechanism that "maximizes helpfulness within safety constraints." For fireworks chemistry questions, it now directs users to manufacturer manuals rather than sharing sensitive information.

“When we must refuse, we clearly explain our reasons and offer safer alternative solutions,” Jain emphasized.

While these improvements reduce risks, cybersecurity researchers at SPLX found GPT-5 remains vulnerable to certain prompt injection and ambiguity attacks. Their red-team testing revealed GPT-4o performed best among tested models.

OpenAI's HealthBench benchmark positions GPT-5 against real physicians.

Though consumers use ChatGPT for health discussions, AI medical advice requires more caution than online symptom searches. OpenAI trained GPT-5 partially on real physician data handling practical healthcare tasks, enhancing its medical response capabilities. Evaluated using HealthBench - a benchmark developed with 262 physicians testing 5000 realistic health scenarios - GPT-5 scored 46.2% on HealthBench Hard versus o3's 31.6%.

During the announcement, OpenAI CEO Sam Altman shared a case study of a woman using ChatGPT to interpret her biopsy report. The AI translated complex medical jargon into plain language and assisted with treatment decisions when physicians provided conflicting advice.

However, users should exercise caution before making major health decisions based on chatbot responses or sharing highly sensitive personal information with AI models.

OpenAI enhanced psychological safeguards in GPT-5 responses.

To mitigate risks during mental health consultations, OpenAI added protective measures encouraging users to take breaks and avoiding direct answers to major life decisions. “Our 4o models occasionally struggled identifying signs of delusional thinking or emotional dependency,” stated OpenAI staff in an August 4 blog post. “While rare, we're improving models and developing tools to better detect distress signals so ChatGPT can respond appropriately and direct users to evidence-based resources when needed.”

This growing trust in AI carries implications for both personal and business use, according to Max Sinclair, CEO and co-founder of SEO company Azoma. “I was surprised by the emphasis on health and mental health support in the announcement,” he said in a prepared statement. “Studies show high trust in AI outputs - even surpassing human expertise in retail settings. As people increasingly turn to ChatGPT for life's most urgent private matters, this AI trust trend will likely accelerate.”