
Obstructive Sleep Apnea (OSA) afflicts approximately one billion people worldwide, yet more than 90% of cases go undiagnosed. The long-standing diagnostic standard, overnight polysomnography (PSG), presents a formidable barrier: expensive equipment, specialized staff, and overnight stays in sleep laboratories put it out of reach for most patients, particularly those in primary care or resource-poor settings. Investigators have therefore sought more scalable, non-invasive diagnostic technologies, and one of the most promising contenders is AI-powered machine listening.
In a pioneering article published in CHEST (2025), Dr. Benjamin Kye Jyn Tan and colleagues conducted a rigorous Bayesian meta-analysis of machine learning models trained on overnight breathing-sound recordings for diagnosing OSA. Titled “Machine Listening for OSA Diagnosis: A Bayesian Meta-Analysis”, the article illuminates the promise of AI to redefine the future of sleep medicine.
The investigators systematically reviewed 16 high-quality studies encompassing 41 AI models, trained on breathing-sound data from 4,864 participants and validated on a further 2,370. The essential question was whether machine listening could discern OSA as accurately as, or more accurately than, existing diagnostic instruments. The results were remarkable: machine listening achieved a pooled sensitivity of 90.3% and a specificity of 86.7%, yielding a diagnostic odds ratio of 60.8. These figures indicate that AI-based acoustic analysis is not only accurate but also consistent across a broad range of disease severities.
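As a quick sanity check, the diagnostic odds ratio can be recomputed from the pooled sensitivity and specificity. This is a back-of-the-envelope sketch: the small difference from the reported 60.8 is expected, since the paper pools estimates with a Bayesian hierarchical model rather than this simple plug-in formula.

```python
# Diagnostic odds ratio (DOR) from the pooled estimates reported
# in the meta-analysis: sensitivity 90.3%, specificity 86.7%.
sensitivity = 0.903
specificity = 0.867

# DOR = odds of a positive test in diseased / odds of a positive test in non-diseased
#     = (sens / (1 - sens)) / ((1 - spec) / spec)
dor = (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)
print(round(dor, 1))  # ~60.7, in line with the reported 60.8
```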
Performance was also examined at various cut points of the apnea-hypopnea index (AHI), which is used to categorize OSA severity. At 5 events/hour, the common cut point for diagnosing mild OSA, sensitivity reached 94.3% and specificity 78.5%. At the 15 and 30 events/hour cut points, used for more severe OSA, sensitivity remained strong at approximately 86%, while specificity rose to approximately 90%. These results are impressive, particularly when set against existing instruments such as the well-established STOP-Bang questionnaire: although STOP-Bang achieves greater than 90% sensitivity, its specificity of around 30% generates large numbers of false positives.
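The specificity gap can be made concrete with a brief illustrative calculation. The cohort of 1,000 unaffected individuals below is an assumption for illustration, not a figure from the paper:

```python
# Among 1,000 screened individuals who do NOT have OSA, how many
# would each tool incorrectly flag as positive?
n_without_osa = 1000

fp_stop_bang = round(n_without_osa * (1 - 0.30))   # STOP-Bang specificity ~30%
fp_machine   = round(n_without_osa * (1 - 0.785))  # machine listening, 78.5% at AHI >= 5

print(fp_stop_bang, fp_machine)  # 700 vs 215 false positives
```

Even at its most lenient cut point, machine listening would send roughly a third as many unaffected people on to confirmatory testing.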
Of particular interest was the meta-analysis's identification of the variables that influence model performance. Models trained on audio with higher sampling rates, and those using non-contact microphones such as a smartphone placed beside the bed, showed higher diagnostic sensitivity. Notably, diagnostic accuracy changed little with the recording environment: models trained on audio captured by consumer-grade smartphones at home performed as well as models trained on audio from professional sleep centers. This finding supports the use of commercially available consumer devices for population-wide screening.
Another interesting finding was that the type of AI algorithm employed, deep learning versus conventional machine learning, had little impact on diagnostic performance. Although unexpected, the authors posit that the comparatively modest training set sizes (approximately 5,000 individuals) may have prevented deep learning from reaching its full potential. Deep learning algorithms, whose strength lies in discerning subtle patterns, generally outperform conventional models only when trained on larger datasets. This points to future studies using larger and more diverse training data to achieve greater accuracy.
Importantly, the study found no evidence of publication bias, and the quality of the included studies was consistently high, as assessed using the QUADAS-2 tool. The statistical methods used, including Bayesian hierarchical modelling and rigorous sensitivity analyses, support the robustness of the findings. The evidence suggests that machine listening may outperform traditional screening tools and approach the accuracy of commercially available home sleep apnea tests, such as WatchPAT.
The clinical implications of this research are profound. AI-based acoustic screening for OSA offers a non-invasive, cost-effective, and highly accessible alternative to PSG. With a simple smartphone app that records overnight breathing, individuals can be screened in the comfort of their homes, without the need for wires, technicians, or hospital beds. In primary care settings, such technology could revolutionize the way sleep disorders are diagnosed, particularly for patients who are hesitant or unable to attend sleep clinics.
There are, however, some limitations identified by the authors. Although these AI models are accurate, they rely on audio data alone and may underperform in patients with non-snoring variants of OSA or with sparse respiratory sounds. Furthermore, most of the included studies were conducted at hospital-based sleep centers with a high baseline prevalence of OSA. External validation in general-population and low-prevalence settings is therefore needed before the technology can be broadly adopted.
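The low-prevalence caveat follows directly from Bayes' rule: holding the pooled sensitivity and specificity fixed, the positive predictive value falls sharply as prevalence drops. The prevalence figures below (50% for a sleep-clinic cohort, 10% for the general population) are illustrative assumptions, not values from the study:

```python
def ppv(sens, spec, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Pooled estimates from the meta-analysis
sens, spec = 0.903, 0.867

print(round(ppv(sens, spec, 0.50), 2))  # ~0.87 in a high-prevalence clinic cohort
print(round(ppv(sens, spec, 0.10), 2))  # ~0.43 in a low-prevalence general population
```

In other words, a positive result that is highly reliable in a sleep clinic may be wrong more often than right when the same model is applied to community screening, which is why the authors call for external validation in low-prevalence settings.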
Future research directions include the development of multimodal AI models that combine audio data with clinical parameters or physiological measures such as oximetry or airflow. This hybrid approach could further improve diagnostic performance and capture OSA subtypes with minimal or atypical acoustic features. Expanding the size and diversity of training datasets will also be key to fully exploiting the capabilities of deep learning algorithms. In summary, the Tan et al. meta-analysis represents a major step forward for digital sleep medicine. Machine listening showed excellent diagnostic performance, on a par with home sleep studies and significantly better than standard questionnaires used alone. As the technology matures, it has tremendous potential to expand access to OSA diagnosis, particularly in underserved, resource-poor communities.
Dr. Prahlada N.B
MBBS (JJMMC), MS (PGIMER, Chandigarh).
MBA in Healthcare & Hospital Management (BITS, Pilani),
Postgraduate Certificate in Technology Leadership and Innovation (MIT, USA)
Executive Programme in Strategic Management (IIM, Lucknow)
Senior Management Programme in Healthcare Management (IIM, Kozhikode)
Advanced Certificate in AI for Digital Health and Imaging Program (IISc, Bengaluru).
Senior Professor and former Head,
Department of ENT-Head & Neck Surgery, Skull Base Surgery, Cochlear Implant Surgery.
Basaveshwara Medical College & Hospital, Chitradurga, Karnataka, India.
My Vision: I don’t want to be a genius. I want to be a person with a bundle of experience.
My Mission: Help others achieve their life’s objectives in my presence or absence!
My Values: Creating value for others.
Dear Dr. Prahlada N.B Sir,
I take a moment to express my heartfelt gratitude for your outstanding work in AI-powered machine listening for OSA diagnosis. Your research has the potential to revolutionize sleep medicine and improve the lives of countless individuals. Your dedication to advancing medical science is truly admirable, and I feel fortunate to have benefited from your expertise.
Thank you for being a beacon of hope and a source of inspiration in the medical community.