A recent study by Alexander V. Eriksen, M.D., and Sören Möller, M.Sc., Ph.D., featured in the New England Journal of Medicine, examines the capabilities of Generative Pretrained Transformer 4 (GPT-4), a cutting-edge artificial intelligence (AI) model, in diagnosing intricate health conditions. Their research arrives as the medical community faces two pressing issues: a dwindling number of physicians in developed countries and an aging population. Both underscore the urgent need for innovative approaches to meet the growing demand for accurate diagnoses.

AI has long promised to revolutionize various sectors, with the medical field being no exception. Previously, AI demonstrated significant potential in single-modal medical tasks like imaging. The evolution to sophisticated large language models (LLMs) like GPT-4 expands AI’s horizon to generating medical documents, discharge summaries, and even tackling questions from the U.S. Medical Licensing Examination. Yet, the application of GPT-4 in diagnosing real-life, complex clinical cases remained underexplored until Eriksen and Möller’s study.

The study used a series of complex clinical case challenges published online, presenting each case to GPT-4 alongside a multiple-choice question about the most likely diagnosis. This approach aimed to evaluate GPT-4’s performance against the judgments of medical-journal readers, simulating a broad spectrum of human medical insight. Notably, the study worked around GPT-4’s limitations, such as its inability to process images, by including detailed descriptions instead.

GPT-4 correctly identified the diagnosis in 57% of the cases presented, compared with an accuracy of 36% among medical-journal readers. This performance was consistent across multiple iterations and varied only slightly with updates to the GPT-4 model, suggesting a potential for AI to complement or even enhance human diagnostic capabilities in healthcare.

However, the study acknowledges limitations, including the diverse and undefined skill level of the human participants, which could skew the comparison in favour of GPT-4. Despite these constraints, GPT-4’s performance indicates a promising future where AI could support the medical field, particularly in diagnostic processes.

The findings underscore the need for further research and clinical trials to validate AI’s efficacy and safety in clinical settings. Future AI models, as anticipated by the authors, may encompass a wider range of data sources, including medical imaging and structured measurements, to provide more comprehensive diagnostic insights. Moreover, the inclusion of data from developing countries in training these models could ensure global applicability and help reduce healthcare disparities.

As AI continues to evolve, its integration into healthcare promises not only to augment the decision-making process but also to offer a valuable tool in preliminary screenings, potentially conducted by general practitioners or even patients themselves. However, the journey toward fully integrating AI in healthcare is fraught with ethical, regulatory, and transparency challenges that must be addressed. Ensuring the accuracy, safety, and validity of AI tools in medical settings remains paramount before widespread implementation can be considered.

Eriksen and Möller’s exploration of GPT-4’s diagnostic capabilities marks a significant step forward in understanding AI’s potential role in healthcare. Their work points toward harnessing AI to improve healthcare outcomes, efficiency, and patient care, and highlights the necessity of combining technological innovation with human expertise to tackle the challenges facing the medical field today.

Prof. Dr. Prahlada N. B
20 March 2024
