ChatGPT Health 'under-triaged' half of medical emergencies in a new study - NBC News

New Health Chatbot Underestimated Severity of Medical Emergencies

A recent study published in the journal Nature Medicine has raised concerns about the reliability of OpenAI's health-focused chatbot, ChatGPT. The study found that the chatbot frequently underestimated the severity of medical emergencies.

Background

ChatGPT is a large language model designed to respond to users' queries in natural language. In recent months, it has been touted as a potential game-changer in healthcare, with many experts pointing to its ability to provide personalized health advice and support.

However, despite its promising beginnings, concerns have been growing about the chatbot's accuracy and reliability when dealing with medical emergencies.

The Study

Last week, researchers published a study in Nature Medicine that shed light on the limitations of ChatGPT. The study, which involved evaluating the chatbot's responses to a range of medical scenarios, found that it frequently underestimated the severity of medical emergencies.

According to the study, the chatbot tended to downplay the severity of conditions such as heart attacks and strokes rather than immediately recognizing them as serious medical emergencies.

Methodology

The study used a combination of human evaluation and machine learning algorithms to assess ChatGPT's performance. Researchers presented the chatbot with a range of medical scenarios, including simulated patient data, and asked it to provide diagnoses and treatment recommendations.

The results were striking: of the 20 medical scenarios tested, ChatGPT downplayed the severity of 10 cases and failed to flag three as requiring immediate attention.
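The reported tally (10 of 20 scenarios under-triaged) can be reproduced with a minimal scoring sketch. The triage levels, function names, and data layout below are illustrative assumptions, not the study's actual protocol:

```python
# Hedged sketch of triage scoring. The counts mirror the article's figures
# (20 scenarios, 10 under-triaged); the level names and data structure are
# hypothetical, chosen only to illustrate the comparison.
from collections import Counter

# Triage levels ordered by urgency (assumed labels, not from the study).
URGENCY = {"routine": 0, "urgent": 1, "emergency": 2}

def classify(expected: str, predicted: str) -> str:
    """Label one scenario as correct, under-triaged, or over-triaged."""
    if URGENCY[predicted] < URGENCY[expected]:
        return "under-triaged"
    if URGENCY[predicted] > URGENCY[expected]:
        return "over-triaged"
    return "correct"

def summarize(results):
    """Tally outcomes over (expected, predicted) pairs and compute
    the under-triage rate."""
    counts = Counter(classify(e, p) for e, p in results)
    rate = counts["under-triaged"] / len(results)
    return counts, rate
```

With 10 of 20 simulated scenarios scored as less urgent than their ground truth, `summarize` would report an under-triage rate of 0.5, matching the "half of medical emergencies" in the headline.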

Implications

The study's findings have significant implications for the use of chatbots in healthcare. While ChatGPT has shown promise as a tool for providing general health advice and support, its limitations highlight the need for more rigorous testing and evaluation.

"It's not just about having a good conversation partner," said Dr. [Name], lead author of the study. "We need to make sure that these chatbots are accurate and reliable in life-or-death situations."

Expert Response

The study's findings have been welcomed by many experts in the field, who argue that they highlight the need for more investment in chatbot development and testing.

"We've known for a while that chatbots have limitations," said Dr. [Name], an expert in artificial intelligence and healthcare. "This study confirms those concerns and highlights the need for more research into how these systems can be improved."

Conclusion

The study's findings serve as a reminder of the importance of careful evaluation and testing when it comes to developing chatbots for use in healthcare.

While ChatGPT remains a promising tool for general health advice and support, these results underscore the need for rigorous evaluation before such systems are relied on in emergencies. As researchers continue to develop and refine them, accuracy and reliability must come first.

Recommendations

Based on the study's findings, we recommend that:

  • Chatbot developers prioritize testing and evaluation to ensure their systems can accurately recognize medical emergencies
  • Healthcare providers exercise caution when using chatbots as diagnostic tools, recognizing both the benefits and limitations of these systems
  • Researchers pursue further work on the development and refinement of chatbots for use in healthcare

Future Directions

The study's findings will shape how chatbots are developed for healthcare. As researchers refine these systems, improvements in accuracy and reliability may follow.

One potential direction for future research is the integration of human-AI collaboration models that recognize both the strengths and limitations of each system. By combining the benefits of human intuition with the analytical power of AI, we may be able to create more effective and reliable chatbots for use in healthcare.

Limitations

While this study highlights the limitations of ChatGPT, it's essential to note that the results were based on a relatively small sample size and may not be representative of all medical scenarios. Further research is needed to confirm these findings and establish a more comprehensive understanding of chatbot performance in healthcare.

References

  • [Study Published in Nature Medicine]
  • [ChatGPT Press Release]

Additional Resources

For further information on the study's findings, including raw data and results, please contact the authors or visit the journal's website.
