Calling the Algorithm Doctor: Microsoft's AI Diagnoses Like House MD, Prices Like Costco

2025-07-02

Satya Nadella envisions AI as your future healthcare partner.

The Microsoft CEO announced two advances in medical AI this week: the SDBench benchmark and MAI-DxO, a system that simulates collaborative diagnosis by a virtual team of physicians.

In Microsoft's tests on 304 complex cases from the New England Journal of Medicine (NEJM), the AI system reached 85.5% diagnostic accuracy. On the same cases, 21 experienced physicians achieved only 20%.

"Thrilled to share these advancements bringing us closer to real-world impact in medical AI," Nadella stated. "MAI-DxO is a model-agnostic coordinator simulating virtual physician teams, achieving 85.5% diagnostic accuracy - four times that of experienced doctors - while reducing diagnostic costs."

Excited to share progress toward real-world medical AI impact:

SDBench introduces a new benchmark converting 304 NEJM cases into interactive diagnostic simulations. AI must ask questions, order tests and balance costs...

- Satya Nadella (@satyanadella) June 30, 2025

This announcement arrives as Microsoft joins the competitive race among tech giants applying AI to healthcare's most challenging problems.

With Americans spending nearly $5 trillion a year on healthcare, and diagnostic errors affecting an estimated 12 million people annually, the push to apply AI to diagnosis seems inevitable.

How Microsoft's Virtual Medical Team Operates

MAI-DxO functions as a computational medical dream team. It is evaluated on Microsoft's Sequential Diagnostic Benchmark (SDBench).

Unlike traditional multiple-choice medical AI tests, SDBench mirrors how physicians actually work: the AI starts with limited patient information, asks follow-up questions, orders tests, and revises its hypotheses as new data arrives.

Every test carries a simulated cost, so the AI must balance thoroughness against medical spending.

In essence, MAI-DxO simulates a panel of physicians in which each virtual doctor plays a distinct role. The panel debates, flags disagreements, and works toward a consensus, much as human physicians do on difficult cases.

In one configuration, MAI-DxO reached 80% accuracy at $2,397 per case, about 20% less than the $2,963 the physicians spent on average.

At peak performance, it delivered 85.5% accuracy at $7,184 per case. By comparison, OpenAI's standalone o3 model achieved 78.6% accuracy at $7,850.
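The cost and accuracy figures above can be checked with a few lines of arithmetic. This Python sketch uses only the numbers quoted in this article; the `dominates` and `pareto_frontier` helpers are illustrative names, not part of any Microsoft tooling. A configuration counts as "dominated" if another one is at least as accurate and at least as cheap.

```python
from __future__ import annotations

# Figures as reported in this article: (accuracy, USD per case).
RESULTS = {
    "Physicians (average)": (0.20, 2963),
    "MAI-DxO (lower-cost config)": (0.80, 2397),
    "MAI-DxO (peak)": (0.855, 7184),
    "o3 (standalone)": (0.786, 7850),
}


def dominates(a: tuple[float, int], b: tuple[float, int]) -> bool:
    """True if a is at least as accurate and as cheap as b, and strictly better on one axis."""
    (acc_a, cost_a), (acc_b, cost_b) = a, b
    return acc_a >= acc_b and cost_a <= cost_b and (acc_a > acc_b or cost_a < cost_b)


def pareto_frontier(results: dict[str, tuple[float, int]]) -> list[str]:
    """Names of configurations that no other configuration dominates."""
    return [
        name for name, point in results.items()
        if not any(dominates(other, point)
                   for other_name, other in results.items() if other_name != name)
    ]


savings = 1 - RESULTS["MAI-DxO (lower-cost config)"][1] / RESULTS["Physicians (average)"][1]
ratio = RESULTS["MAI-DxO (lower-cost config)"][0] / RESULTS["Physicians (average)"][0]
print(f"Cost saving vs physicians: {savings:.1%}")  # 19.1%
print(f"Accuracy ratio: {ratio:.1f}x")              # 4.0x
print(pareto_frontier(RESULTS))  # ['MAI-DxO (lower-cost config)', 'MAI-DxO (peak)']
```

By this measure the physicians' average is dominated by the lower-cost MAI-DxO configuration, and standalone o3 is dominated by MAI-DxO's peak configuration (higher accuracy at lower cost).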

The virtual team includes a hypothesis physician, who keeps a running list of the three most likely diagnoses and updates their probabilities using Bayesian reasoning.
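The article does not spell out how the hypothesis physician computes its probabilities, so the following is only a sketch of textbook Bayes' rule applied to a toy differential: multiply each diagnosis's prior by the likelihood of a new finding, renormalize, and keep the top three. The diagnoses, probabilities, and the `bayes_update` helper are hypothetical.

```python
from __future__ import annotations


def bayes_update(prior: dict[str, float], likelihood: dict[str, float], top_k: int = 3):
    """Apply Bayes' rule for one new finding; return the posterior and the top-k list.

    prior:      diagnosis -> P(diagnosis) before the finding
    likelihood: diagnosis -> P(finding | diagnosis)
    """
    unnormalized = {d: prior[d] * likelihood[d] for d in prior}
    evidence = sum(unnormalized.values())  # P(finding)
    if evidence == 0:
        raise ValueError("finding is impossible under every diagnosis considered")
    posterior = {d: p / evidence for d, p in unnormalized.items()}
    top = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return posterior, top


# Hypothetical differential before the new finding.
prior = {"Diagnosis A": 0.4, "Diagnosis B": 0.3, "Diagnosis C": 0.2, "Diagnosis D": 0.1}
# Hypothetical P(new finding | diagnosis) values.
likelihood = {"Diagnosis A": 0.1, "Diagnosis B": 0.8, "Diagnosis C": 0.5, "Diagnosis D": 0.9}

posterior, top3 = bayes_update(prior, likelihood)
for name, p in top3:
    print(f"{name}: {p:.2f}")
# Diagnosis B: 0.51
# Diagnosis C: 0.21
# Diagnosis D: 0.19
```

The former front-runner, Diagnosis A, drops off the top-three list because the new finding is unlikely under it.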

A test-selector physician chooses up to three diagnostic tests per round, picking those expected to yield the most information.
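One standard way to pick tests "for maximum information gain" is to score each test by the expected reduction in entropy over the candidate diagnoses. The Python sketch below does that for tests with a positive/negative result and keeps the top three; the test names and probabilities are hypothetical, and nothing here is confirmed to match MAI-DxO's internals.

```python
from __future__ import annotations

import math


def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_info_gain(prior: dict[str, float], p_positive: dict[str, float]) -> float:
    """Expected drop in entropy over diagnoses from a positive/negative test.

    p_positive: diagnosis -> P(test is positive | diagnosis)
    """
    gain = entropy(prior)
    for positive in (True, False):
        joint = {
            d: prior[d] * (p_positive[d] if positive else 1 - p_positive[d])
            for d in prior
        }
        p_outcome = sum(joint.values())
        if p_outcome > 0:
            posterior = {d: j / p_outcome for d, j in joint.items()}
            gain -= p_outcome * entropy(posterior)
    return gain


def choose_tests(prior, tests, max_tests=3):
    """Rank tests by expected information gain and keep the best `max_tests`."""
    ranked = sorted(tests, key=lambda t: expected_info_gain(prior, tests[t]), reverse=True)
    return ranked[:max_tests]


prior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
tests = {  # hypothetical P(positive | diagnosis) for each test
    "Test 1": {"A": 0.9, "B": 0.9, "C": 0.1, "D": 0.1},  # splits A/B from C/D
    "Test 2": {"A": 0.5, "B": 0.5, "C": 0.5, "D": 0.5},  # uninformative
    "Test 3": {"A": 0.9, "B": 0.1, "C": 0.1, "D": 0.1},  # singles out A
    "Test 4": {"A": 0.7, "B": 0.7, "C": 0.3, "D": 0.3},  # weaker version of Test 1
}
print(choose_tests(prior, tests))  # ['Test 1', 'Test 3', 'Test 4']
```

This greedy ranking scores each test on its own, so two tests that reveal the same information could both be chosen; a more careful selector would score candidate tests jointly.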

A challenger physician plays devil's advocate, looking for evidence that contradicts the leading diagnosis. A cost-stewardship physician vetoes expensive tests that add little diagnostic value.
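The stewardship role can be illustrated with a simple rule: veto a test only if it is expensive and offers little expected information per dollar. The `stewardship_review` function, its thresholds, and the costs and information-gain values below are assumptions chosen for illustration; the article does not say how MAI-DxO draws the line.

```python
def stewardship_review(proposed, info_gain_bits, cost_usd,
                       expensive_over=1000, min_bits_per_1000_usd=0.2):
    """Split proposed tests into approved and vetoed lists.

    A test is vetoed only if it is expensive AND delivers little expected
    information per dollar; cheap tests always pass. Thresholds are hypothetical.
    """
    approved, vetoed = [], []
    for test in proposed:
        bits_per_1000 = 1000 * info_gain_bits[test] / cost_usd[test]
        if cost_usd[test] > expensive_over and bits_per_1000 < min_bits_per_1000_usd:
            vetoed.append(test)
        else:
            approved.append(test)
    return approved, vetoed


# Hypothetical tests, costs (USD) and expected information gain (bits).
proposed = ["cheap panel", "costly scan", "costly biopsy"]
info_gain_bits = {"cheap panel": 0.2, "costly scan": 0.1, "costly biopsy": 0.9}
cost_usd = {"cheap panel": 50, "costly scan": 2000, "costly biopsy": 3000}

print(stewardship_review(proposed, info_gain_bits, cost_usd))
# (['cheap panel', 'costly biopsy'], ['costly scan'])
```

The costly biopsy survives because its expected information justifies its price, while the scan that costs almost as much but tells the team little is vetoed.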

Meanwhile, a checklist physician verifies that every requested test name is valid and that the team's reasoning stays consistent.
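A checklist step that validates test names can be approximated by matching each request against a catalog of orderable tests. This sketch uses Python's standard `difflib.get_close_matches` to auto-correct near misses such as typos; the catalog, the `check_test_names` helper, and the 0.8 similarity cutoff are hypothetical.

```python
import difflib

# Hypothetical catalog of orderable tests.
CATALOG = ["complete blood count", "basic metabolic panel", "chest x-ray", "blood culture"]


def check_test_names(requested, catalog=CATALOG, cutoff=0.8):
    """Sort requested test names into valid, auto-corrected and unknown."""
    by_key = {name.lower(): name for name in catalog}
    valid, corrected, unknown = [], {}, []
    for name in requested:
        key = name.strip().lower()
        if key in by_key:
            valid.append(by_key[key])
            continue
        # Closest catalog entry above the similarity cutoff, if any.
        match = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
        if match:
            corrected[name] = by_key[match[0]]
        else:
            unknown.append(name)
    return valid, corrected, unknown


print(check_test_names(["Chest X-ray", "blod culture", "quantum scan"]))
# (['chest x-ray'], {'blod culture': 'blood culture'}, ['quantum scan'])
```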

Microsoft tested the system on New England Journal of Medicine cases published in 2024 and 2025, which makes it unlikely that the models had simply memorized the answers.

These cases are notoriously difficult and require extensive investigation to reach the correct diagnosis.

For comparison, Microsoft recruited 21 physicians with 5 to 20 years of experience (median 12 years). Working without colleagues, textbooks, or AI assistance, they correctly diagnosed about 20% of these difficult cases.

The system can run in several modes. "Instant Answer" gives a diagnosis from the initial case information alone, at a cost of $300, the equivalent of a single doctor's visit.

"Question Only" allows follow-up questions but no test orders. "Budget" tracks cumulative costs and enforces a spending cap. "No Budget" lets the team operate without cost constraints, while "Ensemble" runs multiple teams and combines their conclusions for maximum accuracy.
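The article says the "Ensemble" mode aggregates the conclusions of multiple teams but not how. A minimal sketch, assuming a simple majority vote over independent runs whose costs add up, looks like this; the `ensemble_diagnosis` function and the toy runs are hypothetical.

```python
from collections import Counter


def ensemble_diagnosis(panel_runs):
    """Combine independent panel runs by majority vote.

    panel_runs: list of (diagnosis, cost_usd) pairs, one per panel.
    Returns (winning diagnosis, vote share, total cost).
    Ties go to the diagnosis seen first (Counter.most_common ordering).
    """
    if not panel_runs:
        raise ValueError("need at least one panel run")
    votes = Counter(dx.strip().lower() for dx, _ in panel_runs)
    winner, count = votes.most_common(1)[0]
    # Assumption: each run pays for its own questions and tests.
    total_cost = sum(cost for _, cost in panel_runs)
    return winner, count / len(panel_runs), total_cost


# Hypothetical outputs from three independent panels.
runs = [("Diagnosis A", 2000), ("diagnosis a", 2500), ("Diagnosis B", 1800)]
print(ensemble_diagnosis(runs))  # ('diagnosis a', 0.6666666666666666, 6300)
```

Under this assumption, accuracy gains from voting come at the price of paying for every run, which is one way an ensemble can end up far more expensive per case than a single team.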

The Future of Medicine?

MAI-DxO is part of Microsoft's broader push into consumer health AI. The company reports more than 50 million health-related conversations a day on Bing and Copilot. From searches about knee pain to questions about urgent care, Microsoft sees search engines and AI assistants as new entry points to healthcare.

Of course, MAI-DxO is just the latest step in the long evolution of diagnostic technology. Stanford's MYCIN system diagnosed bacterial infections in the 1970s, and Google's AMIE simulated doctor-patient conversations last year.

Microsoft developed MAI-DxO as a model-agnostic system compatible with AI models from different companies. In testing,