Curated by THEOUTPOST
On Tue, 11 Feb, 12:04 AM UTC
3 Sources
[1]
AI is better than humans at analyzing long-term ECG recordings
Linda Johnson, Associate Professor of Cardiovascular Epidemiology at Lund University in Sweden, led the study alongside Jeff Healey, senior scientist at the Population Health Research Institute, a joint institute of McMaster University and Hamilton Health Sciences in Canada. The findings are published in Nature Medicine.
The human heart beats 80,000-120,000 times a day. Long-term ECGs record every heartbeat, and the recording is then scrutinized for abnormalities -- arrhythmias -- which is a time-consuming process. The current study included 14,606 individual patients who had recorded an average of 14 days of ECG; in total, over 200,000 days of ECG data. These data were reviewed by ECG technicians using standard clinical methodology. The same data were then re-analyzed using an AI algorithm -- "DeepRhythmAI" -- specifically developed for the task by MEDICALgorithmics, Poland.
"We then randomly selected over 5,000 episodes of arrhythmias for intensive, beat-by-beat analysis by 17 panels of expert physicians (mainly cardiologists and electrophysiologists) from all over the world, which provided extremely high-quality gold standard diagnoses against which we then compared the ECG technician and AI algorithm interpretations," says Johnson.
The researchers found that analysis by the AI led to 14 times fewer missed diagnoses of severe arrhythmias (including complete heart block, ventricular tachycardia, and atrial fibrillation). Severe arrhythmias were missed in 0.3 percent of patients by the AI, compared with 4.4 percent for the technicians. The researchers' intention was not to prove that AI is as good as or better than cardiologists for the diagnosis of specific arrhythmias. Rather, the study sought to determine what would happen if the technicians were replaced, and physicians received reports directly from the AI.
If successful, such an approach would be a major innovation that could address the worldwide shortage of trained staff capable of interpreting long-term ECG monitoring. "There is a shortage of around 15 million health workers worldwide. Ambulatory ECGs need to be analyzed by specially trained staff, often called ECG technicians. Lack of staff leads to a huge bottleneck in healthcare worldwide, and at the same time, patients would benefit if we did more and longer ambulatory ECG recordings, not shorter. We believed that AI could solve this problem. That's why we wanted to study what happens if you skip the ECG technicians altogether and let an AI algorithm do the job of detecting the arrhythmias, that cardiologists then review," says Johnson.
This is the first study to test not only how good the AI algorithm is at assessing individual selected ECG strips, but also what we could expect to happen if human technicians were replaced by AI. "Today, most long-term ECG devices use some type of AI to support interpretation, but with varying quality. And there are still long waiting times for long-term ECG monitoring, in some cases many months. If we have a qualified AI model that can review all ECGs, then we would have both much cheaper and faster diagnostics," says Healey.
When designing this study, there were a few key characteristics that the researchers felt that the AI must have. "It must have near-perfect sensitivity, which means that anything that is a potentially serious arrhythmia must be flagged for assessment by a doctor. This is the most important aspect; patients and physicians would not tolerate any failure to diagnose serious arrhythmias (i.e. false negatives). At the same time, the AI model must not identify too many rhythms that are not serious (i.e. false positives), that then require physician review," states Healey.
The AI model was able to rule out severe arrhythmia with 99.9 percent confidence in a 14-day ECG recording. The number of false positives (in this context, findings misinterpreted as a serious arrhythmia) was only modestly higher: 12 per 1,000 recording days for AI compared with 5 per 1,000 recording days for human analysis.
"We have shown what this AI model can do, and how sensitive and accurate it is. I also think it's an impressive effort by everyone who contributed to the study. In total, there were 50 expert reviewers and cardiologists who all went through the selected ECGs beat by beat. We are very grateful to all those who have supported the idea and invested so much time and commitment," says Johnson.
[2]
AI proves better than humans at analyzing long-term ECG recordings in large international study
In patients with symptoms such as irregular heartbeats, dizziness, or fainting, or in individuals that physicians suspect may have atrial fibrillation, many days of ECGs may be required for diagnosis -- "long-term ECG recordings." These recordings must then undergo a time-consuming and human resource-intensive review to identify heart rhythm abnormalities. In a large international study, researchers tested whether an AI model can replace humans in analyzing long-term ECG recordings. The results: 14 times fewer missed diagnoses by the AI.
[3]
Artificial intelligence for direct-to-physician reporting of ambulatory electrocardiography - Nature Medicine
Direct-to-physician reporting of leads II and III ambulatory ECG recordings using the DeepRhythmAI model would result in 17 times fewer missed diagnoses of critical arrhythmias than usual care with technician annotation and has a negative predictive value exceeding 99.9%. This would be at a cost of seven extra false-positive findings per 1,000 patient days of recording. AI analysis may substantially reduce labor costs and could potentially report results in near real time. The source population for this study is an unselected patient population of 14,606 individuals, consisting of a random sample of patients who had been monitored in the United States for clinical indications between 2016 and 2019. Recording durations varied from 1 to 31 days. The dataset consisted of 211,010 days of ambulatory monitoring collected in these patients using PocketECG (Medicalgorithmics). PocketECG is a full-disclosure ECG device with limb lead configuration (leads II and III) and a sampling rate of 300 samples per second. The device can record and transmit ECG signals for up to 31 days. The patients were referred by 1,079 different physicians from 166 clinics, and the recordings were analyzed in clinical practice at an independent diagnostic testing facility by one of 167 certified ECG technicians working with a features-based algorithm using adaptive beat morphology template generation and comparison so that each QRS complex in the recording was annotated beat-to-beat by the ECG technician. ECG technician work was extensive and included a review of the whole ECG recording and verification of all events detected by the algorithm, including pauses and asystoles, all bradycardia events, all missed heartbeats or second- and third-degree AV blocks, all ventricular and supraventricular arrhythmias and all episodes detected as AF. In this process, artifacts and electrode dysfunction were re-annotated. 
The technicians also inspected all regions of the recording marked as having a 'patient-triggered symptom' flag and reviewed the recording at the time of the fastest, slowest and average minutely heart rate. They were aided in this process by software that allowed them to manually inspect heart rate trends for irregularities, filter beats by heart rate and group beats into morphologies. At the end of the review, episodes were selected for inclusion in a report to physicians.
Before inclusion in the study, all data were anonymized, and the Ethics Review Board of Sweden has therefore waived the need for approval (decision 2019-03227). As such, the Ethics Review Board did not consider that informed consent was necessary.
The DeepRhythmAI model (v3.1; Medicalgorithmics) is a proprietary mixed network ensemble for rhythm classification. The network performs QRS and noise detection, beat classification and rhythm identification using several algorithms based on convolutional neural networks and transformer architecture with custom-built components. The main network components for QRS detection and rhythm classification have been pretrained on 1,716,141 5-min-long ECG strips and fine-tuned on 60,549 ≤30 s ECG strips. These were extracted from 69,706 anonymized clinical long-term recordings. Algorithm internal validation was performed using 15,188 ≤30 s strips from 12,330 additional separate patient recordings.
A high-level flowchart of the algorithm is presented in the Extended Data Fig. 5. The preprocessing involves selecting desired ECG channels from input data, scaling the signal amplitude according to the input analog-digital conversion values and resampling to a frequency of 300 Hz. A deep learning model then extracts signal features and predicts the probability of QRS complex presence and signal readability.
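The preprocessing step described above (amplitude scaling from analog-digital conversion values, then resampling to 300 Hz) can be illustrated with a minimal sketch. This is not the DeepRhythmAI code, which is proprietary: the function names, the `counts_per_mv` and `baseline` parameters, and the choice of linear interpolation for resampling are all assumptions made for illustration.

```python
def adc_to_millivolts(raw_counts, counts_per_mv, baseline=0):
    """Scale raw ADC counts to millivolts (hypothetical gain and baseline)."""
    return [(c - baseline) / counts_per_mv for c in raw_counts]


def resample_linear(signal, fs_in, fs_out=300.0):
    """Resample to fs_out Hz by linear interpolation (an assumed method)."""
    if len(signal) < 2:
        return list(signal)
    duration_s = (len(signal) - 1) / fs_in
    n_out = int(duration_s * fs_out) + 1
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = i * fs_in / fs_out          # position in input-sample units
        lo = min(int(pos), last)          # left neighbour
        hi = min(lo + 1, last)            # right neighbour, clamped at the end
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


# Toy example: a 1-second ramp recorded at 250 Hz, brought to 300 Hz.
ramp_mv = adc_to_millivolts(list(range(251)), counts_per_mv=100.0)
ramp_300 = resample_linear(ramp_mv, fs_in=250.0)
print(len(ramp_300), ramp_300[0], ramp_300[-1])
```

A production pipeline would normally apply an anti-aliasing filter when changing sample rates; the linear interpolation here only keeps the sketch dependency-free.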
This output, together with the preprocessed signal, is passed to an ensemble combined from models of two structures. The first is intended for the analysis of information from a wide context and has a hybrid architecture of the convolutional neural network and transformer encoder layers. The second is a pure-transformer implementation based on Vision Transformer, allowing for a superior interpretation of signal within a relatively narrow window. Additionally, a specialized classifier was developed for the detection of asystole events. The QRS complex detector uses custom residual modules inspired by MobileNetV2. Each module consists of the following three one-dimensional convolutional layers: a pointwise convolution to expand feature dimension; a convolutional layer with a kernel length of 3 and variable dilation rates; a pointwise convolution to reduce feature dimensions to their original size. The dilation rate doubles in each residual module during the first half of the model and then progressively decreases to a rate of 1 at the output layer. A final linear layer converts the output features into probabilities of QRS complex presence and signal readability for each sample. Thresholding and morphological operations are subsequently applied to extract QRS positions and identify nondiagnostic ranges. The wide-context architecture comprises a series of submodules. Initially, features are extracted from heart rate trends, calculated based on QRS detections, using the same architecture as the QRS detector (excluding the final linear layer). Another submodule extracts features for each sample of the preprocessed ECG signal using residual modules from the QRS detector but with a fixed dilation rate progression. The signal is downsampled using strided convolutional layers. Subsequently, windows of downsampled features are extracted, and two-dimensional strided convolutional layers are applied, resulting in features for each beat. 
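Two details of the QRS detector above lend themselves to a small sketch: the dilation schedule of the residual modules, and the thresholding plus morphological clean-up that turns per-sample probabilities into QRS positions. The paper does not give the number of modules, how fast the dilation decreases, the threshold, or the exact morphological operations, so the values below (8 modules, mirrored halving, a 0.5 threshold, gap closing of up to 2 samples) are assumptions for illustration only.

```python
def dilation_schedule(n_modules):
    """Dilation doubles over the first half, then falls back to 1.

    The mirrored halving in the second half is an assumption; the source
    only says the rate 'progressively decreases' to 1 at the output.
    """
    half = n_modules // 2
    rising = [2 ** i for i in range(half)]
    falling = [2 ** i for i in range(n_modules - half - 1, -1, -1)]
    return rising + falling


def receptive_field(dilations, kernel=3):
    """Receptive field (in samples) of stacked dilated 1-D convolutions."""
    return 1 + sum((kernel - 1) * d for d in dilations)


def qrs_positions(probs, threshold=0.5, max_gap=2):
    """Threshold per-sample QRS probabilities, close short gaps between
    detections (a simple morphological closing), and return run centers."""
    runs, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            runs.append([start, i - 1])
            start = None
    if start is not None:
        runs.append([start, len(probs) - 1])
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] - 1 <= max_gap:
            merged[-1][1] = run[1]        # bridge the short gap
        else:
            merged.append(run)
    return [(a + b) // 2 for a, b in merged]


schedule = dilation_schedule(8)
print(schedule, receptive_field(schedule))
```

With the assumed 8-module schedule, the stacked kernel-3 convolutions see 61 samples, or about 0.2 s at the 300 Hz sampling rate.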
The resulting features are processed using transformer encoder layers, augmented by an additional convolutional layer inserted between the linear layers in the fully connected blocks. Finally, the features are converted to logits for each QRS complex class using two pointwise convolutional layers. The signal-detail architecture is based on transformer encoder layers that process ECG signals split into patches. A linear layer embeds each patch. The transformer layers process the embedded patches, and logits for each QRS complex class are calculated using a linear layer. Only the patches containing QRS complexes are selected for predictions. The asystole filter module shares the same architecture as the wide-context model but is trained with hyperparameters and a dataset tailored to the asystole detection task. We used the same dataset for training the QRS complex and noise detector and the main components of the heartbeat classification ensemble (three wide-context models and three signal-detail models). Data augmentation techniques tailored to each of these tasks, like noise artifact generation or synthesis of heartbeats with rare features, were used to enhance training dataset diversity and mitigate overfitting. In addition to that, a classifier specializing in the interpretation of asystole events was developed by feeding to a single model with wide-context analysis architecture a carefully selected 11,670 strips with asystole or sinus arrest and 20,292 strips with noise or electrode dysfunction. The training process of this model encompassed methods from supervised and self-supervised learning domains. The ensemble model output is averaged or replaced by the asystole filter model output (for heartbeats with RR interval greater than the sinus arrest threshold of 2 s) to provide the probabilities of QRS complex classes. Finally, the heartbeat types that are the final output of the DeepRhythmAI model are translated to heart rhythm types. 
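The combination rule described above (ensemble outputs averaged, with the asystole filter taking over for beats whose RR interval exceeds the 2 s sinus arrest threshold) can be sketched as follows. This is one reading of the text, not the proprietary implementation: the function names, the dictionary representation and the class labels are hypothetical.

```python
SINUS_ARREST_THRESHOLD_S = 2.0  # threshold stated in the source text


def average_probs(model_outputs):
    """Average per-class probabilities across ensemble members."""
    n = len(model_outputs)
    return {c: sum(m[c] for m in model_outputs) / n for c in model_outputs[0]}


def combine(ensemble_outputs, asystole_output, rr_interval_s):
    """Use the asystole filter for long RR intervals, else the ensemble mean.

    Assumption: 'replaced' means the asystole filter output is used on its
    own whenever the RR interval exceeds the threshold.
    """
    if rr_interval_s > SINUS_ARREST_THRESHOLD_S:
        return dict(asystole_output)
    return average_probs(ensemble_outputs)


# Toy probabilities for one heartbeat; class names are illustrative only.
ensemble = [
    {"normal": 0.8, "ventricular": 0.2},
    {"normal": 0.6, "ventricular": 0.4},
]
asystole = {"normal": 0.1, "ventricular": 0.9}
print(combine(ensemble, asystole, rr_interval_s=0.9))
print(combine(ensemble, asystole, rr_interval_s=2.5))
```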
Optimization was performed using the AdamW algorithm. Models were internally evaluated by measuring the root mean squared error metric based on sensitivity, precision and F1 score calculated from predictions and ground truth of internal validation/test strips, following the methodology provided by the International Electrotechnical Commission 60601-2-47 standard. The ECG recordings used in this study had never been presented to the DeepRhythmAI model or any AI model from which the DeepRhythmAI model was derived, but as part of the study protocol, we analyzed the entire raw ECG signal data from these same recordings using the DeepRhythmAI model to provide detection and beat-to-beat classification of all heartbeats. Our strip selection method was designed to not introduce any bias toward using ECG signals with less baseline noise or arrhythmic events presenting with typical ECG diagnoses. We did this by automation; fully random individual recordings were searched by an algorithm for the presence of arrhythmic events of each rhythm class, and 34-s strips containing arrhythmia events according to either the AI model annotations, the ECG technician annotations or both were selected, at a maximum of one per method and arrhythmia class per patient. The automated selection script ran until a total of 500 strips each had been selected for each of the critical arrhythmias and 250 strips each had been selected for the noncritical rhythm classes, or all recordings had been searched and no more arrhythmias were found. The number of individual recordings that had to be searched to yield the strips for each rhythm class was considered the source population size for that class. The strip selection is described in Extended Data Fig. 6. 
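The automated strip selection above can be sketched as a quota-driven search over recordings in random order. The study's actual script is not published in this text, so the data layout (a dictionary of patient recordings, each a list of `(rhythm_class, method, strip_id)` tuples) and all names are hypothetical; only the quotas (500 critical, 250 noncritical), the one-strip-per-method-and-class-per-patient cap, and the "recordings searched" count come from the description.

```python
import random

CRITICAL_QUOTA = 500      # strips per critical arrhythmia class (from the text)
NONCRITICAL_QUOTA = 250   # strips per noncritical rhythm class (from the text)


def select_strips(recordings, quotas, seed=0):
    """Search recordings in random order until every class quota is met
    or all recordings have been searched.

    Returns (selected, searched): chosen strips per class, and the number
    of recordings searched per class (the class's source population size).
    """
    rng = random.Random(seed)
    order = list(recordings)
    rng.shuffle(order)
    selected = {c: [] for c in quotas}
    searched = {c: 0 for c in quotas}
    for pid in order:
        open_classes = {c for c in quotas if len(selected[c]) < quotas[c]}
        if not open_classes:
            break
        for c in open_classes:
            searched[c] += 1
        seen = set()  # (class, method) pairs already taken for this patient
        for rhythm_class, method, strip_id in recordings[pid]:
            if rhythm_class not in open_classes:
                continue
            if (rhythm_class, method) in seen:
                continue
            if len(selected[rhythm_class]) >= quotas[rhythm_class]:
                continue
            seen.add((rhythm_class, method))
            selected[rhythm_class].append((pid, method, strip_id))
    return selected, searched


# Toy data with tiny quotas; "ai" and "tech" mark which method flagged the event.
toy = {
    "p1": [("AF", "ai", "s1"), ("AF", "ai", "s2"), ("AF", "tech", "s3")],
    "p2": [("AF", "ai", "s4")],
    "p3": [("VT", "tech", "s5")],
}
print(select_strips(toy, {"AF": 2, "VT": 1}))
```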
In addition to the critical and noncritical rhythm classes, we included sinus rhythm, sinus bradycardia and unreadable signals due to noise or electrode dysfunction to evaluate the AI model performance for these signals and to ensure that the physician annotators would be provided with a differentiated sample in which they did not know which strips would contain critical arrhythmias. In total, we selected 5,245 strips, of which 2,240 were critical arrhythmias, and after errors in uploading ten of these to the annotation platform, we had 5,235 strips, of which 2,236 were critical arrhythmias. All 34-s strips were annotated beat-to-beat by 17 panels consisting of three expert annotators each -- ≥2 board-certified cardiologists and additionally including board-certified clinical physiologists (n = 2) or final-year cardiology residents. The physicians on the panels performed the annotation independently of AI and technician annotations and were blinded to the strip selection criteria. Strips were randomly distributed among panels and presented in random order and were annotated using a custom-built software platform in which QRS complex tags, without beat type classifications, as detected by the AI model, were present. We used DeepRhythmAI model-detected QRS complexes for strips detected by both the AI model and the technicians to minimize bias; technicians in clinical practice may not have bothered to correct QRS tags for all instances of arrhythmia, and differential methodology for strips could have resulted in unblinding. The QRS tags were highly concordant. For QRS complexes that resulted in technician false negatives, there was a 98% overlap between the AI model and fixed features algorithm QRS positions. 
Physician annotators were asked to identify the beat type for each QRS complex according to an annotation manual (Supplemental Note), correct any mistaken QRS position placements, add any missed QRS complexes and mark areas that were unreadable due to poor signal or electrode dysfunction. Each physician annotated the entire strip beat by beat, and all discrepancies on the beat level were resolved by panel consensus. The resulting gold-standard annotations were compared to the beat-to-beat annotations of the AI model and technicians according to prespecified acceptance criteria, where we considered arrhythmic events to be concordant with the panel annotation in case of ≥80% overlap in beat type and duration with the panel annotation for all sustained tachyarrhythmias and 90% overlap in duration for asystole events and pauses. For second- or third-degree AV block, we considered the presence of any such event within the strip to be a concordant annotation, and for ECG technicians, we also considered annotation of an unspecified 'missed beat' to be a concordant annotation for second-degree AV block. Single ectopic atrial and ventricular beats were considered concordant within ±45 samples (150 ms). Noise annotations were considered concordant if within 80% of the panel annotation as regards duration. Minor discrepancies between the AI/technician annotations and consensus panel annotations, on the beat-to-beat level, were thus allowed, for example, low numbers of supraventricular beats or beats with unknown beat types within AF episodes. The primary analysis compares the frequency of false-negative, true-positive and false-positive critical arrhythmias per 1,000 individual patients over the full duration of the recordings for technicians and the AI model, along with full confusion matrix statistics for the AI model and technician performance compared to panel annotations. 
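The acceptance criteria above reduce to two checks: a duration-overlap fraction for episodes (80% for sustained tachyarrhythmias and noise, 90% for asystole events and pauses) and a position tolerance for single ectopic beats (±45 samples, which is 150 ms at 300 Hz). The sketch below covers only those two checks; the interval representation and function names are assumptions, and the beat-type part of the overlap criterion is not modeled.

```python
FS_HZ = 300                      # sampling rate stated in the source
ECTOPIC_TOLERANCE_SAMPLES = 45   # 45 samples at 300 Hz = 150 ms


def overlap_fraction(panel, candidate):
    """Fraction of the panel episode (start, end in samples) that the
    candidate annotation covers."""
    inter = max(0, min(panel[1], candidate[1]) - max(panel[0], candidate[0]))
    length = panel[1] - panel[0]
    return inter / length if length > 0 else 0.0


def is_concordant_episode(panel, candidate, kind):
    """Apply the 90% rule to asystole/pause and the 80% rule otherwise."""
    required = 0.90 if kind in ("asystole", "pause") else 0.80
    return overlap_fraction(panel, candidate) >= required


def is_concordant_ectopic(panel_pos, candidate_pos):
    """Single ectopic beats must fall within the sample tolerance."""
    return abs(panel_pos - candidate_pos) <= ECTOPIC_TOLERANCE_SAMPLES


# An annotation covering 85% of the panel episode passes the tachyarrhythmia
# criterion but not the stricter asystole criterion.
print(is_concordant_episode((0, 1000), (150, 1000), "tachyarrhythmia"))
print(is_concordant_episode((0, 1000), (150, 1000), "asystole"))
```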
As a result of the sampling strategy, false negatives were only reported in patients in whom all instances of an arrhythmia type were missed for the entire duration of the recording. True-positive events were defined as episodes detected by the AI model or technician, with correct annotations according to the independent gold-standard consensus panel annotation. Descriptive statistics are reported as mean ± s.d. CIs were derived using bootstrapping with 1,000 replications. Definitions for the confusion matrix statistics are reported in the Extended Data Table 4. We also performed subanalyses where misclassifications of critical arrhythmias were not considered false-negative or false-positive events because these events would have been reported to physicians. In these analyses also, we did not consider second-degree AV block to be a false-positive finding. For the analyses of total false-positive and false-negative findings of critical arrhythmias, the prevalence of all arrhythmias was weighted to the full population size according to the proportion of the population queried. Nonoverlapping CIs were considered evidence of the superiority of one method over the other. All analyses were performed in Python, except for the calculations of RR, which were done in Stata version 17.0 for Mac, using two-sided Fisher's exact P values. Analyses were performed by L.S.J. and G.J., with involvement from the steering group, according to prespecified plans. The study steering group (L.S.J., J.S.H., A.P.B. and A.M.) met regularly throughout the conduct of the study without the presence of Medicalgorithmics employees. Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
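The statistical approach above (bootstrap CIs from 1,000 replications, with nonoverlapping CIs taken as evidence of superiority) can be sketched with the standard library. The text does not say which bootstrap variant was used, so the percentile method here is an assumption, and the 0/1 outcome lists are synthetic toy data, not study data.

```python
import random


def bootstrap_ci(outcomes, n_replications=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a proportion of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    stats = []
    for _ in range(n_replications):
        sample = rng.choices(outcomes, k=n)   # resample with replacement
        stats.append(sum(sample) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_replications)]
    hi = stats[int((1 - alpha / 2) * n_replications) - 1]
    return lo, hi


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Synthetic toy outcomes: 1 = diagnosis missed, 0 = not missed.
method_a = [1] * 3 + [0] * 997
method_b = [1] * 44 + [0] * 956
ci_a = bootstrap_ci(method_a, seed=1)
ci_b = bootstrap_ci(method_b, seed=2)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```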
A large international study finds that AI analysis of long-term ECG recordings led to 14 times fewer missed diagnoses of severe arrhythmias than review by human technicians, which could speed up cardiac diagnostics and help address global healthcare worker shortages.
An international study has found that artificial intelligence (AI) outperforms human technicians at analyzing long-term electrocardiogram (ECG) recordings. The research, led by Linda Johnson of Lund University and Jeff Healey of the Population Health Research Institute (a joint institute of McMaster University and Hamilton Health Sciences), shows that AI analysis reduced missed diagnoses of severe arrhythmias by a factor of 14 compared with standard technician review [1].
The study, published in Nature Medicine, involved 14,606 patients who underwent an average of 14 days of ECG recording, totaling over 200,000 days of ECG data. This dataset was initially analyzed by ECG technicians using standard clinical methods. The same data were then re-analyzed using an AI algorithm called "DeepRhythmAI," developed by MEDICALgorithmics in Poland [2].
To establish a gold standard for comparison, the researchers randomly selected over 5,000 arrhythmia episodes for intensive beat-by-beat analysis by 17 panels of expert physicians from around the world. This provided a high-quality benchmark against which both technician and AI interpretations could be evaluated [1].
The results were striking: the AI analysis missed severe arrhythmias in only 0.3% of patients, compared with 4.4% for human technicians. This roughly 14-fold reduction in missed diagnoses could have significant implications for patient care and diagnosis [2].
Moreover, the AI model was able to rule out severe arrhythmias with 99.9% confidence in a 14-day ECG recording. The rate of false positives was higher for AI (12 per 1,000 recording days) than for human analysis (5 per 1,000 recording days), an increase the study announcement characterizes as modest [1].
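The headline figures can be checked with simple arithmetic. The sketch below uses only the rounded numbers quoted in this article; the variable names are illustrative, and the per-recording figure assumes a 14-day recording, matching the average duration reported.

```python
# Rounded figures quoted in the article.
missed_ai_pct = 0.3          # patients with a missed severe arrhythmia, AI
missed_tech_pct = 4.4        # same, ECG technicians
fp_ai_per_1000_days = 12     # false positives per 1,000 recording days, AI
fp_tech_per_1000_days = 5    # same, technicians
recording_days = 14          # assumed length of one recording

# Ratio of miss rates: how many times fewer diagnoses the AI missed.
fold_reduction = missed_tech_pct / missed_ai_pct

# Cost side of the trade-off: extra false positives from the AI.
extra_fp_per_1000_days = fp_ai_per_1000_days - fp_tech_per_1000_days
extra_fp_per_recording = extra_fp_per_1000_days * recording_days / 1000

print(f"fold reduction in missed diagnoses: {fold_reduction:.1f}")
print(f"extra false positives per 1,000 days: {extra_fp_per_1000_days}")
print(f"extra false positives per 14-day recording: {extra_fp_per_recording:.3f}")
```

The ratio of the rounded percentages comes to about 14.7, in line with the "14 times fewer" figure, and the false-positive difference of 7 per 1,000 recording days works out to roughly one extra false-positive finding per ten 14-day recordings.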
The researchers emphasize that this study is not about proving AI's superiority over cardiologists in diagnosing specific arrhythmias. Instead, it explores the potential of AI to replace human technicians in the initial analysis of ECG recordings, with physicians then reviewing the AI-generated reports [2].
This approach could address the global shortage of healthcare workers, estimated at around 15 million worldwide. By potentially eliminating the need for specially trained ECG technicians, AI could reduce bottlenecks in healthcare systems and enable more widespread use of long-term ECG monitoring [1].
The DeepRhythmAI model is a mixed network ensemble for rhythm classification. It uses convolutional neural networks and transformer architecture with custom-built components for QRS detection, beat classification, and rhythm identification. Its main components were pretrained on over 1.7 million 5-minute ECG strips and fine-tuned on about 60,500 shorter strips, all extracted from nearly 70,000 anonymized clinical long-term recordings [3].
This study represents a significant step forward in the application of AI to cardiac diagnostics. If implemented widely, AI-powered ECG analysis could lead to faster, more accurate, and more cost-effective cardiac diagnostics, potentially improving patient outcomes and reducing the strain on healthcare systems worldwide [2].
Reference
[1]
[2] Medical Xpress - Medical and Health News | AI proves better than humans at analyzing long-term ECG recordings in large international study