2 Sources
[1]
AI uses medical records to accurately predict onset of disease 20 years into the future
Artificial intelligence (AI) has long been a focus of science-fiction novels and films, some of which explore the associated ethical and philosophical complications. In the past few years, the technology has moved rapidly from the fictional realm to become a tool with real-world applications, including in health care. Ongoing efforts to use AI to understand and predict the timing of human disease diagnoses might lead to treatment advances as well as medical discoveries.
Writing in Nature, Shmatko et al. present an AI approach that uses a large set of electronic health records to predict the rate at which any disease, from a list of more than 1,200, might arise during the course of a person's life, and when disease might occur. The tool estimates the occurrence of various future diseases most successfully in a time frame of up to 20 years ahead, and provides an advance compared with prediction methods that focus on single diseases.
AI systems that can accurately learn about and simulate the emergence of human diseases must also adjust for complex factors that are relevant to an individual's health care. These include demographics (characteristics including age and sex), history of clinical care (such as previous diagnoses) and factors that affect health (smoking, body mass index or levels of alcohol consumption, for example).
Shmatko and colleagues' system has the potential to improve the effectiveness of testing for disease. The system might also aid clinical decision-making, particularly if it were combined with a person's treatment history to generate a virtual representation known as a digital twin. This might be used to simulate the consequences of real-world treatments and interventions for an individual.
In other words, the authors' system could be adapted to predict the clinical outcome for a given person, by comparing that person's case with available data for close-to-equivalent 'twins' with similar health backgrounds, particularly those who have been diagnosed with the same condition and received therapy for that disease. A particularly attractive aspect of health-care digital twins is that they could enable predictions of treatment and health trajectories, including an assessment of projected diseases over time, at the population level.
Until now, existing models have not been powerful enough to simulate the history of human disease trajectories for a large number of conditions that arise at varied time points over the course of someone's life. Simulating health-care events over a lifetime is a much more challenging and complex task than trying to predict the onset of one specific event at a single time point.
Shmatko and colleagues' study explores the use of a type of AI called generative AI, which generates new information after analysing a training set of data. The authors took an AI approach that uses a subset of large language models (LLMs) called transformer models, which form the basis of chatbots such as ChatGPT. The authors' model was trained using real-world electronic health records (EHRs) capturing demographics, diagnoses and behaviours related to health.
Transformer models use a strategy called positional embedding (also known as positional encoding) to differentiate the left-to-right (first-to-last) sequence of certain human languages and thereby capture the relationships between words and phrases to predict what comes next. This strategy can also be helpful for health prediction models that are trained on medical records, which log the relationships between diagnoses and life events. For example, a diagnosis of lung cancer can sometimes follow from a person becoming a smoker.
The transformer model has two key components.
The first is an 'encoder', an algorithm that converts inputs called tokens (a type of text input that arises at specific times; Fig. 1) into numerical vector representations. The second is a 'decoder', an algorithm that can change these vector representations back into words that humans can read. Generative LLMs have adopted the decoder component of the transformer model.
Shmatko and colleagues' study repurposed the transformer model from the world of linguistics for use in health care and medicine. More specifically, this study takes a positional-embedding approach to modelling, using token inputs to record and predict the sequence of disease onset, taking into account patient age and other inputs that relate to health-care factors. The system also modifies an aspect of AI called the attention mechanism. The model can be used to predict a person's health trajectory in the future -- which diseases might arise, and when.
The authors' model was trained using large-scale data from EHRs in the UK Biobank (for 400,000 individuals) and tested using Danish EHRs (for 1.9 million individuals). The model was validated thoroughly, and checked for demographic bias between the UK and Danish populations, by comparing the tool's performance for different demographic subgroups. The results of these validation experiments show that this transformer-model AI is well suited to accurately simulating human disease onset using real-world EHRs.
The AI model that Shmatko and colleagues developed, called Delphi-2M, has many potential applications, such as assessing disease risk or providing support for clinical decisions. Accurate prediction of when diseases might emerge in a person's lifetime is crucial information for building a digital twin of real-world individuals to accelerate medical discovery.
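The positional-embedding idea described above can be sketched in a few lines. The article gives no implementation details, so this is only an illustration under stated assumptions: it borrows the sinusoidal encoding of the original transformer and applies it to a person's age in years instead of a word's position. The names `age_encoding`, `dim` and `max_period` are hypothetical and are not taken from the study.

```python
import math


def age_encoding(age_years: float, dim: int = 8, max_period: float = 100.0) -> list:
    """Encode a continuous age as sine/cosine pairs (illustrative assumption).

    Mirrors the sinusoidal positional encoding of the original transformer,
    with `max_period` playing the role of the usual base constant.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding = []
    for i in range(dim // 2):
        # Each pair uses a lower frequency than the previous one, so nearby
        # ages get similar vectors while distant ages are still told apart.
        frequency = 1.0 / (max_period ** (2 * i / dim))
        encoding.append(math.sin(age_years * frequency))
        encoding.append(math.cos(age_years * frequency))
    return encoding


# Example: vectors for two diagnoses made one year apart.
vector_at_40 = age_encoding(40.0)
vector_at_41 = age_encoding(41.0)
```

In a model of this kind, such a vector would typically be added to the embedding of the event token, so that the network sees both what happened and when.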
An important requirement for the successful development of medical digital twins will be the ability to simulate complex real-world medical events that go beyond existing predictive models. The AI model proposed in this study could serve as an engine to power such digital-twin-based simulations. If improved simulations become available, future clinicians might import a digital twin of a patient into their computer and run millions of simulations using all available treatments and interventions to check possible outcomes on the basis of results reported previously in EHRs. Shmatko and colleagues' work has great potential to accelerate data-driven medical discoveries.
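To make the idea of simulating "which diseases might arise, and when" concrete, here is a minimal sketch, not the authors' method. It assumes that a model supplies a rate (events per year) for each disease and that the next event is drawn from competing exponential risks. The function `toy_rates` and its numbers are invented stand-ins for a trained model.

```python
import random


def toy_rates(history, age):
    """Hypothetical stand-in for a trained model: events per year per disease."""
    rates = {"hypertension": 0.02, "type 2 diabetes": 0.01, "lung cancer": 0.001}
    if "smoker" in history:
        rates["lung cancer"] *= 10  # invented effect size, for illustration only
    # Diagnoses already in the history are not predicted again.
    return {d: r for d, r in rates.items() if d not in history}


def sample_next_event(rates, rng):
    """Draw (waiting time in years, disease) from competing exponential risks."""
    total = sum(rates.values())
    wait = rng.expovariate(total)  # time until the first of all competing events
    # The first event is disease d with probability rate_d / total.
    disease = rng.choices(list(rates), weights=list(rates.values()))[0]
    return wait, disease


def simulate_trajectory(history, age, horizon_years=20.0, rate_fn=toy_rates, seed=0):
    """Simulate one possible sequence of (age, disease) events up to the horizon."""
    rng = random.Random(seed)
    history = list(history)
    end_age = age + horizon_years
    events = []
    while True:
        rates = rate_fn(history, age)
        if not rates:
            break
        wait, disease = sample_next_event(rates, rng)
        if age + wait > end_age:
            break
        age += wait
        history.append(disease)
        events.append((age, disease))
    return events


trajectory = simulate_trajectory(["smoker"], age=60.0, seed=1)
```

Repeating the simulation with many different seeds and counting how often a disease appears gives a rough estimate of its risk over the horizon, which is the kind of repeated simulation the digital-twin scenario above envisages.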
[2]
An AI tool is trying to predict your risk of getting many diseases years in advance - here's how it works
University of Warwick provides funding as a founding partner of The Conversation UK.
Being able to instantly and accurately predict the trajectory of a person's health in the years to come has long been seen as the pinnacle of medicine. This kind of information would have a profound effect on healthcare systems as a whole - shifting care from treatment to prevention. According to the findings of a recently published paper, researchers are promising just that.
Using cutting-edge artificial intelligence (AI) technology, the researchers built Delphi-2M. This tool seeks to predict a person's next health event and when it is likely to happen in the next 20 years. The model does this for a thousand different diseases including cancer, diabetes and heart disease.
To develop Delphi-2M, the European research team used data from nearly 403,000 people from the UK Biobank as an input into the AI model. In the final trained AI model, Delphi-2M predicted the next disease and when it would occur based on a person's sex at birth, their body mass index, whether they smoked or drank alcohol, and their timeline of prior diseases.
It was able to make these predictions with a 0.7 AUC (area under the curve). AUC summarises the trade-off between true positive and false positive rates across decision thresholds, so it can be used as a rough proxy for accuracy in a theoretical setting. This means the model's predictions could loosely be interpreted as having about 70% accuracy across all disease categories - although the accuracy of these predictions has not yet been tested in terms of real-world outcomes.
They then applied the model to Danish Biobank data to see whether it was still effective. It was able to predict health outcomes with similar theoretical accuracy rates.
AI tools
The purpose of the paper wasn't to suggest that Delphi-2M is ready to be used by doctors or in the medical field. Rather, it was to illustrate the power of the team's proposed AI architecture, and the benefit it could have in analysing medical data.
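To show what a figure such as the 0.7 AUC reported above measures, the following sketch computes AUC directly from its ranking definition: the probability that a randomly chosen person who developed a disease received a higher risk score than a randomly chosen person who did not. The scores and labels are invented for illustration and are not data from the study.

```python
def auc(scores, labels):
    """Probability that a random positive case outranks a random negative case."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case correctly ranked higher
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of cases classified correctly at one fixed threshold."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


# Invented risk scores: two people developed the disease (label 1), five did not.
scores = [0.9, 0.25, 0.8, 0.5, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 0, 0, 0, 0]

print(auc(scores, labels))       # 7 of the 10 positive/negative pairs are ranked correctly
print(accuracy(scores, labels))  # 4 of the 7 people are classified correctly at 0.5
```

The two numbers differ because AUC measures how well the scores rank people, while accuracy depends on where the threshold is set. This is why an AUC of 0.7 is only a loose proxy for "70% accuracy".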
Delphi-2M uses a "transformer network" to make its predictions. This is the same technology architecture that powers ChatGPT. The researchers modified the GPT2 transformer architecture to use time and disease features to predict when and what will happen.
Although other health prediction models have used transformer networks in the past, these were only designed to make predictions about a person's risk of developing a single disease. Plus, they were primarily used on smaller-scale hospital record data. But transformer networks are particularly well-suited for predicting a person's risk of multiple diseases. This is because they can adapt their focus easily and are able to work out complex interactions between many different diseases from multiple distinct data points.
Delphi-2M has also proven to be slightly more accurate than other multi-disease prediction models which use a different architecture. For example, Milton applies a combination of standard machine learning techniques to the same UK Biobank data. This model showed somewhat lower predictive power for most diseases compared with Delphi-2M - and needed to use more data to do so. Moreover, non-transformer models are hard for others to improve by adding more data layers. This means these models cannot be as easily adapted and improved upon as transformer models for use in different contexts and studies.
What's special about the Delphi-2M model is that it can be released to the public as an open-source model without compromising patients' privacy. The authors were able to create synthetic data that mimics the UK Biobank data while removing personally identifiable information - all without a significant drop in predictive power. Moreover, Delphi-2M requires fewer computing resources to train than typical AI transformer models. This will allow other researchers to train the model from scratch and possibly tailor the model and information for their needs.
This is important for the advancement of open science and is generally difficult to do in medical settings.
Still too early
Whether or not Delphi-2M becomes the foundation model for AI tools that are designed to predict a patient's future health risks, it demonstrates that models such as this are on the way. Due to their layered architecture and open-source nature, future models similar to Delphi-2M will continue to evolve by incorporating even richer data - such as electronic health records, medical images, wearable technologies and location data. This would improve their predictive power and accuracy over time.
But while the ability to prevent diseases and provide early diagnosis holds great promise, there are a few key caveats when it comes to this predictive tool. First, there are numerous data-related concerns associated with such tools. As we have written before, the quality of data and training that an AI tool receives makes or breaks its predictions. The UK Biobank dataset used to create Delphi-2M didn't have sufficient data on diverse races and ethnic groups to allow for in-depth training and performance analysis. While some analysis was performed by the Delphi-2M researchers to show that adding ethnicity and race didn't sway the results too much, there was still insufficient data in many categories to even conduct the assessment.
If such tools are ever used in the real world, personal healthcare data will probably be layered on top of foundation models such as Delphi-2M. While the inclusion of this personal data will improve prediction accuracy, it also comes with risks - for example, around personal data security and out-of-context use of the data. It may also be difficult to scale the model to countries whose healthcare systems differ from those on which the dataset was built. For instance, it may be harder to apply Delphi-2M to the US context, where healthcare data is spread around multiple hospital systems and private clinics.
At present, it's too early for Delphi-2M to be used by patients or doctors. While Delphi-2M provided generalised predictions based on the data that was used to train it, it's too early to use these predictions for personalised health recommendations for an individual patient. But hopefully, with continued investment into researching and building Delphi-2M-style models, it will someday be possible to input a patient's personal health data into the model and get a personalised prediction.
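As a rough illustration of what "inputting a patient's personal health data" might involve, the sketch below turns the inputs named in the article (sex at birth, body mass index, smoking, alcohol use and a timeline of prior diseases) into an age-ordered sequence of tokens. The token names, the BMI cut-offs and the `Event` structure are all hypothetical. The article does not describe the actual input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    age_years: float  # age at which the event was recorded
    token: str        # symbolic label fed to the model


def bmi_token(bmi):
    """Bucket BMI into coarse categories (cut-offs chosen for illustration)."""
    if bmi < 18.5:
        return "BMI_LOW"
    if bmi < 25.0:
        return "BMI_MID"
    return "BMI_HIGH"


def build_sequence(sex, bmi, smoker, drinks_alcohol, diagnoses, baseline_age):
    """Combine baseline factors and (age, diagnosis) pairs into one timeline."""
    events = [
        Event(0.0, f"SEX_{sex.upper()}"),
        Event(baseline_age, bmi_token(bmi)),
        Event(baseline_age, "SMOKER" if smoker else "NON_SMOKER"),
        Event(baseline_age, "ALCOHOL" if drinks_alcohol else "NO_ALCOHOL"),
    ]
    events.extend(Event(age, label) for age, label in diagnoses)
    # A transformer reads events in order, so sort by age (ties keep input order).
    return sorted(events, key=lambda event: event.age_years)


sequence = build_sequence(
    sex="female",
    bmi=27.3,
    smoker=True,
    drinks_alcohol=False,
    diagnoses=[(52.4, "hypertension"), (47.1, "asthma")],
    baseline_age=50.0,
)
for event in sequence:
    print(f"{event.age_years:5.1f}  {event.token}")
```

A model would then be asked to continue such a sequence, predicting the next token and the age at which it occurs.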
Researchers have developed an AI tool called Delphi-2M that can predict the onset of over 1,200 diseases up to 20 years in advance. This groundbreaking technology uses electronic health records and transformer models to forecast health trajectories.
Researchers have made a significant breakthrough in AI-driven healthcare prediction, developing a tool capable of forecasting the onset of over 1,200 diseases up to 20 years in advance. The study, published in Nature, introduces Delphi-2M, an AI system that uses electronic health records (EHRs) to predict disease occurrence and timing throughout a person's life [1].
Delphi-2M employs a transformer model, similar to the technology behind ChatGPT, adapted for healthcare applications. This approach allows the system to process complex health data, including demographics, clinical history, and lifestyle factors, to generate predictions [1][2].
The model was trained using data from about 400,000 individuals in the UK Biobank and tested on Danish EHRs covering 1.9 million individuals. Delphi-2M achieved an AUC of about 0.7, which [2] interprets as a theoretical accuracy of about 70% across all disease categories, and it is described as an advance over prediction methods that focus on single diseases [1][2].
Delphi-2M's capabilities extend beyond individual disease prediction. The system could potentially improve the effectiveness of testing for disease, aid clinical decision-making, and serve as an engine for digital-twin simulations [1].
A unique feature of Delphi-2M is its ability to be released as an open-source model without compromising patient privacy. This allows other researchers to train and adapt the model for their specific needs, promoting advancement in open science within medical settings [2].
While Delphi-2M represents a significant advancement in AI-driven healthcare prediction, it is still in the research phase. Future iterations may incorporate richer data sources, such as medical images and wearable technology data, to improve predictive accuracy. However, concerns remain regarding data quality and the potential for bias in AI training datasets [2].
Summarized by Navi