Curated by THEOUTPOST
On Tue, 5 Nov, 8:03 AM UTC
3 Sources
[1]
Persistent problems plague AI-assisted genomic studies, researchers warn
University of Wisconsin-Madison researchers are warning that artificial intelligence tools gaining popularity in the fields of genetics and medicine can lead to flawed conclusions about the connection between genes and physical characteristics, including risk factors for diseases like diabetes.

The faulty predictions are linked to researchers' use of AI to assist genome-wide association studies. Such studies scan through hundreds of thousands of genetic variations across many people to hunt for links between genes and physical traits. Of particular interest are possible connections between genetic variations and certain diseases.

Genetics' link to disease not always straightforward

Genetics play a role in the development of many health conditions. While changes in some individual genes are directly connected to an increased risk for diseases like cystic fibrosis, the relationship between genetics and physical traits is often more complicated.

Genome-wide association studies have helped to untangle some of these complexities, often using large databases of individuals' genetic profiles and health characteristics, such as the National Institutes of Health's All of Us project and the UK Biobank. However, these databases are often missing data about health conditions that researchers are trying to study.

"Some characteristics are either very expensive or labor-intensive to measure, so you simply don't have enough samples to make meaningful statistical conclusions about their association with genetics," says Qiongshi Lu, an associate professor in the UW-Madison Department of Biostatistics and Medical Informatics and an expert on genome-wide association studies.

The risks of bridging data gaps with AI

Researchers are increasingly attempting to work around this problem by bridging data gaps with ever more sophisticated AI tools.

"It has become very popular in recent years to leverage advances in machine learning, so we now have these advanced machine-learning AI models that researchers use to predict complex traits and disease risks with even limited data," Lu says.

Now, Lu and his colleagues have demonstrated the peril of relying on these models without also guarding against biases they may introduce. The team describes the problem in a paper recently published in the journal Nature Genetics. In it, Lu and his colleagues show that a common type of machine learning algorithm employed in genome-wide association studies can mistakenly link several genetic variations with an individual's risk for developing type 2 diabetes.

"The problem is if you trust the machine learning-predicted diabetes risk as the actual risk, you would think all those genetic variations are correlated with actual diabetes even though they aren't," says Lu.

These "false positives" are not limited to these specific variations and diabetes risk, Lu adds, but are a pervasive bias in AI-assisted studies.

New statistical method can reduce false positives

In addition to identifying the problem with overreliance on AI tools, Lu and his colleagues propose a statistical method that researchers can use to guarantee the reliability of their AI-assisted genome-wide association studies. The method helps remove bias that machine learning algorithms can introduce when they're making inferences based on incomplete information.

"This new strategy is statistically optimal," Lu says, noting that the team used it to better pinpoint genetic associations with individuals' bone mineral density.

AI not the only problem with some genome-wide association studies

While the group's proposed statistical method could help improve the accuracy of AI-assisted studies, Lu and his colleagues also recently identified problems with similar studies that fill data gaps with proxy information rather than algorithms.

In another recently published paper appearing in Nature Genetics, the researchers sound the alarm about studies that over-rely on proxy information in an attempt to establish connections between genetics and certain diseases.

For instance, large health databases like the UK Biobank have a ton of genetic information about large populations, but they don't have very much data regarding the incidence of diseases that tend to crop up later in life, like most neurodegenerative diseases.

For Alzheimer's disease specifically, some researchers have attempted to bridge that gap with proxy data gathered through family health history surveys, where individuals can report a parent's Alzheimer's diagnosis.

The UW-Madison team found that such proxy-information studies can produce "highly misleading genetic correlation" between Alzheimer's risk and higher cognitive abilities.

"These days, genomic scientists routinely work with biobank datasets that have hundreds of thousands of individuals; however, as statistical power goes up, biases and the probability of errors are also amplified in these massive datasets," says Lu. "Our group's recent studies provide humbling examples and highlight the importance of statistical rigor in biobank-scale research studies."
[2]
Persistent problems with AI-assisted genomic studies
[3]
UW-Madison researchers find persistent problems | Newswise
University of Wisconsin-Madison researchers caution about flawed conclusions in AI-assisted genome-wide association studies, highlighting risks of false positives and proposing new methods to improve accuracy.
Researchers from the University of Wisconsin-Madison have raised concerns about the use of artificial intelligence (AI) tools in genetics and medicine, warning that they can lead to erroneous conclusions about the relationship between genes and physical characteristics, including disease risk factors [1][2][3]. The study, published in Nature Genetics, focuses on the problems arising from AI-assisted genome-wide association studies.
Genome-wide association studies scan through hundreds of thousands of genetic variations across large populations to identify links between genes and physical traits, particularly focusing on connections to certain diseases. While some genetic changes directly correlate with increased risk for diseases like cystic fibrosis, the relationship between genetics and physical traits is often more intricate [1][2].
Large databases like the National Institutes of Health's All of Us project and the UK Biobank often lack comprehensive data on specific health conditions. Qiongshi Lu, an associate professor at UW-Madison, explains:
"Some characteristics are either very expensive or labor-intensive to measure, so you simply don't have enough samples to make meaningful statistical conclusions about their association with genetics" [1][2][3].
To address this issue, researchers have turned to sophisticated AI tools to bridge these data gaps.
Lu and his colleagues demonstrated that relying on these AI models without proper safeguards can introduce significant biases. Their research revealed that a common machine learning algorithm used in genome-wide association studies mistakenly linked several genetic variations to an individual's risk of developing Type 2 diabetes [1][2][3].
"The problem is if you trust the machine learning-predicted diabetes risk as the actual risk, you would think all those genetic variations are correlated with actual diabetes even though they aren't," Lu cautions [1][2][3].
These false positives are not limited to diabetes risk but represent a pervasive bias in AI-assisted studies across various health conditions.
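The mechanism Lu describes can be illustrated with a toy simulation. The sketch below is not taken from the paper; every number and name in it (the allele frequency, the effect size 0.5, the `feature` variable, the linear stand-in for a machine learning model) is an assumption chosen for illustration. A variant influences a measured feature but not the trait itself. A model trained to predict the trait from that feature then carries the variant's signal into the predicted trait, so a test on the predicted trait flags the variant even though a test on the true trait would not.

```python
import math
import random


def slope_and_z(x, y):
    """Least-squares slope of y on x, plus its z-statistic (slope / standard error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


def simulate(n=5000, n_labeled=1000, seed=0):
    """Hypothetical cohort: the variant affects the feature, never the trait."""
    rng = random.Random(seed)
    genotype, feature, trait = [], [], []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count: 0, 1 or 2
        u = rng.gauss(0, 1)                              # non-genetic factor
        genotype.append(g)
        feature.append(0.5 * g + u + rng.gauss(0, 1))    # depends on the variant
        trait.append(u + rng.gauss(0, 1))                # does NOT depend on it

    # Stand-in for the ML model: a linear fit of trait on feature,
    # trained only on the people whose trait was actually measured.
    f_lab, y_lab = feature[:n_labeled], trait[:n_labeled]
    b, _ = slope_and_z(f_lab, y_lab)
    a = sum(y_lab) / n_labeled - b * sum(f_lab) / n_labeled
    predicted = [a + b * f for f in feature]

    return {
        "true": slope_and_z(genotype, trait),           # variant vs. real trait
        "predicted": slope_and_z(genotype, predicted),  # variant vs. ML-predicted trait
    }


if __name__ == "__main__":
    for name, (slope, z) in simulate().items():
        print(f"{name:>9}: slope={slope:+.3f}  z={z:+.2f}")
```

In this setup the z-statistic for the predicted trait is large while the one for the true trait behaves like noise, which is the false-positive pattern described above. A real study would test hundreds of thousands of variants and use a far more flexible model, but the leak works the same way.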
To combat these issues, Lu and his team have proposed a new statistical method to enhance the reliability of AI-assisted genome-wide association studies. This method aims to remove biases introduced by machine learning algorithms when making inferences based on incomplete information [1][2][3].
The researchers successfully applied this "statistically optimal" strategy to better identify genetic associations with individuals' bone mineral density [1][2][3].
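The article does not spell out how the proposed method works, so the following is only a generic sketch of one way such a correction can be built, not the published procedure. It assumes a small "labeled" group with both the measured trait and the prediction, and a larger "unlabeled" group with predictions only. The association estimated from predictions is adjusted by the gap between predicted and measured trait observed in the labeled group. All names, sample sizes and effect sizes are hypothetical.

```python
import math
import random


def slope_and_se(x, y):
    """Least-squares slope of y on x and the slope's standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def draw(n, rng):
    """Simulate n people: genotype, a measured feature, and the true trait."""
    genotype, feature, trait = [], [], []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count: 0, 1 or 2
        u = rng.gauss(0, 1)                              # non-genetic factor
        genotype.append(g)
        feature.append(0.5 * g + u + rng.gauss(0, 1))    # variant affects the feature
        trait.append(u + rng.gauss(0, 1))                # but not the trait
    return genotype, feature, trait


def corrected_association(seed=1):
    rng = random.Random(seed)
    _, f_train, y_train = draw(1000, rng)  # used only to train the predictor
    g_lab, f_lab, y_lab = draw(1000, rng)  # trait measured AND predicted
    g_unl, f_unl, _ = draw(4000, rng)      # trait predicted only

    # Stand-in for a pretrained ML model: a linear predictor of trait from feature.
    b, _ = slope_and_se(f_train, y_train)
    a = sum(y_train) / len(y_train) - b * sum(f_train) / len(f_train)
    pred_lab = [a + b * f for f in f_lab]
    pred_unl = [a + b * f for f in f_unl]

    # Naive analysis: treat the prediction as if it were the trait.
    naive, naive_se = slope_and_se(g_unl, pred_unl)

    # Bias estimate: how strongly the prediction error itself tracks the variant.
    gap = [p - y for p, y in zip(pred_lab, y_lab)]
    bias, bias_se = slope_and_se(g_lab, gap)

    # The two groups are independent, so the variances add.
    corrected = naive - bias
    corrected_se = math.sqrt(naive_se ** 2 + bias_se ** 2)
    return naive / naive_se, corrected / corrected_se


if __name__ == "__main__":
    naive_z, corrected_z = corrected_association()
    print(f"naive z = {naive_z:+.2f}, corrected z = {corrected_z:+.2f}")
```

In this toy setting the naive z-statistic is large and the corrected one is not, because the labeled group reveals that the apparent association comes from the predictor rather than from the trait. The sketch makes no attempt to weight the two groups optimally, which is presumably where a "statistically optimal" method would differ.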
In a separate study also published in Nature Genetics, the UW-Madison team identified problems with studies that use proxy information to fill data gaps. For instance, some researchers use family health history surveys to gather data on Alzheimer's disease, as large health databases often lack information on late-onset conditions [1][2][3].
The team found that such proxy-information studies can produce "highly misleading genetic correlation" between Alzheimer's risk and higher cognitive abilities [1][2][3].
Lu emphasizes the critical need for statistical rigor in large-scale genomic research:
"These days, genomic scientists routinely work with biobank datasets that have hundreds of thousands of individuals; however, as statistical power goes up, biases and the probability of errors are also amplified in these massive datasets" [1][2][3].
The researchers' work serves as a cautionary tale, highlighting the importance of maintaining statistical integrity in the era of big data and AI-assisted genomic studies.
References
[1] Medical Xpress, "Persistent problems plague AI-assisted genomic studies, researchers warn"
[2] "Persistent problems with AI-assisted genomic studies"
[3] Newswise, "UW-Madison researchers find persistent problems"