A recent study by Alexander V. Eriksen, M.D., and Sören Möller, M.Sc., Ph.D., featured in the New England Journal of Medicine, delves into the capabilities of Generative Pretrained Transformer 4 (GPT-4), a cutting-edge Artificial Intelligence (AI) model, in diagnosing intricate health conditions. Their research arrives as the medical community faces pressing issues: a dwindling number of physicians in developed countries and an aging demographic, underscoring the urgent need for innovative approaches to meet the growing demand for accurate diagnosis.
AI has long promised to revolutionize various sectors, with the medical field being no exception. Previously, AI demonstrated significant potential in single-modal medical tasks like imaging. The evolution to sophisticated large language models (LLMs) like GPT-4 expands AI’s horizon to generating medical documents, discharge summaries, and even tackling questions from the U.S. Medical Licensing Examination. Yet, the application of GPT-4 in diagnosing real-life, complex clinical cases remained underexplored until Eriksen and Möller’s study.
The study used a series of complex clinical case challenges published online, presenting each case to GPT-4 alongside a multiple-choice question about the most likely diagnosis. This approach aimed to evaluate GPT-4’s performance against the judgments of medical-journal readers, simulating a broad spectrum of human medical insight. Notably, the study worked around GPT-4’s limitations, such as its inability to process images, by supplying detailed text descriptions instead.
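For readers who like to see the arithmetic, the scoring idea can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `accuracy`, the variable names, and the five toy cases are all hypothetical, and the sketch assumes each case has exactly one correct option.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], correct: Sequence[str]) -> float:
    """Return the fraction of cases where the chosen option matches the answer key."""
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct must have the same length")
    if not correct:
        raise ValueError("at least one case is required")
    hits = sum(p == c for p, c in zip(predicted, correct))
    return hits / len(correct)


# Hypothetical toy data (NOT from the study): one letter per case.
answer_key = ["B", "D", "A", "C", "E"]
model_choices = ["B", "D", "C", "C", "A"]
reader_choices = ["B", "A", "C", "C", "B"]

model_acc = accuracy(model_choices, answer_key)
reader_acc = accuracy(reader_choices, answer_key)
print(f"model: {model_acc:.0%}, readers: {reader_acc:.0%}")
```

The same function scores both sets of answers against one answer key, so the two percentages are directly comparable.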
Remarkably, GPT-4 demonstrated impressive diagnostic accuracy, correctly identifying the diagnosis in 57% of the cases presented, a stark contrast to the 36% accuracy rate among medical-journal readers. This performance was consistent across multiple iterations and varied only slightly with updates to the GPT-4 model, suggesting a potential for AI to complement or even enhance human diagnostic capabilities in healthcare.
However, the study acknowledges limitations, including the diverse and undefined skill level of the human participants, which could skew the comparison in favour of GPT-4. Despite these constraints, GPT-4’s performance indicates a promising future where AI could support the medical field, particularly in diagnostic processes.
The findings underscore the need for further research and clinical trials to validate AI’s efficacy and safety in clinical settings. Future AI models, as anticipated by the authors, may encompass a wider range of data sources, including medical imaging and structured measurements, to provide more comprehensive diagnostic insights. Moreover, the inclusion of data from developing countries in training these models could ensure global applicability and help reduce healthcare disparities.
As AI continues to evolve, its integration into healthcare promises not only to augment the decision-making process but also to offer a valuable tool in preliminary screenings, potentially conducted by general practitioners or even patients themselves. However, the journey toward fully integrating AI in healthcare is fraught with ethical, regulatory, and transparency challenges that must be addressed. Ensuring the accuracy, safety, and validity of AI tools in medical settings remains paramount before widespread implementation can be considered.
Eriksen and Möller’s exploration into GPT-4’s diagnostic capabilities marks a significant step forward in understanding AI’s potential role in healthcare. Their work illuminates the path toward harnessing AI to improve healthcare outcomes, efficiency, and patient care, highlighting the necessity of combining technological innovation with human expertise to tackle the challenges facing the medical field today.
Prof. Dr. Prahlada N. B
20 March 2024
Chitradurga.
Prahlada Sir 🌹,
The continuous development of AI, including the large language model (LLM) known as the Generative Pretrained Transformer (GPT), has enabled research in exciting new areas, such as the generation of discharge summaries and patient clinical letters.
How well it performs on real-life clinical cases is less well understood. For example, it remains unclear to what extent GPT-4 can aid in clinical cases that contain long, complicated and varied patient descriptions and how it performs on these complex real-world cases compared with humans.
Currently, GPT-4 is not specifically designed for medical tasks. However, it is expected that progress on AI models will continue to accelerate, leading to faster diagnoses and better outcomes, which could improve efficiency in many areas of health care. Whereas efforts are in progress to develop such models, findings by researchers indicate that the current GPT-4 model may hold clinical promise today. However, proper clinical trials are needed to ensure that this technology is safe and effective for clinical use.
Additionally, whereas GPT-4 works on written records, future AI tools that are more specialized are expected to include other data sources, including medical imaging and structured numerical measurements, in their predictions.
Importantly, future models should include training data from developing countries to ensure a broad, global benefit of this technology and reduce the potential for health care disparities.
AI based on GPT-4 might be relevant not only for in-patient hospital settings but also for first-line screening that is performed either in general practice or by patients themselves.
As we move toward this future, the ethical implications surrounding the lack of transparency by commercial models such as GPT-4 also need to be addressed, as well as regulatory issues on data protection and privacy.
Finally, clinical studies evaluating accuracy, safety, and validity should precede future implementation. Once these issues have been addressed and AI improves, society is expected to increasingly rely on AI and GPT-4 as a tool to support the decision-making process with human oversight, rather than as a replacement for physicians.