Curated by THEOUTPOST
On Thu, 28 Nov, 12:02 AM UTC
4 Sources
[1]
AI Outperforms Experts in Predicting Study Outcomes - Neuroscience News
Summary: A new study demonstrates that large language models (LLMs) can predict the outcomes of neuroscience studies more accurately than human experts, achieving 81% accuracy compared to 63% for neuroscientists. Using a tool called BrainBench, researchers tested LLMs and human experts on identifying real versus fabricated study abstracts, finding that the AI models excelled even when neuroscientists had domain-specific expertise. A specialized neuroscience-focused LLM, dubbed BrainGPT, achieved even higher accuracy at 86%. The study highlights the potential of AI in designing experiments, predicting results, and accelerating scientific progress across disciplines.

Large language models, a type of AI that analyses text, can predict the results of proposed neuroscience studies more accurately than human experts, finds a new study led by UCL (University College London) researchers.

The findings, published in Nature Human Behaviour, demonstrate that large language models (LLMs) trained on vast datasets of text can distil patterns from scientific literature, enabling them to forecast scientific outcomes with superhuman accuracy. The researchers say this highlights their potential as powerful tools for accelerating research, going far beyond just knowledge retrieval.

Lead author Dr Ken Luo (UCL Psychology & Language Sciences) said: "Since the advent of generative AI like ChatGPT, much research has focused on LLMs' question-answering capabilities, showcasing their remarkable skill in summarising knowledge from extensive training data. However, rather than emphasising their backward-looking ability to retrieve past information, we explored whether LLMs could synthesise knowledge to predict future outcomes.

"Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook critical insights from the literature.

"Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast outcomes of experiments."

The international research team began their study by developing BrainBench, a tool to evaluate how well large language models (LLMs) can predict neuroscience results. BrainBench consists of numerous pairs of neuroscience study abstracts. In each pair, one version is a real study abstract that briefly describes the background of the research, the methods used, and the study results. In the other version, the background and methods are the same, but the results have been modified by experts in the relevant neuroscience domain to a plausible but incorrect outcome.

The researchers tested 15 different general-purpose LLMs and 171 human neuroscience experts (who had all passed a screening test to confirm their expertise) to see whether the AI or the person could correctly determine which of the two paired abstracts was the real one with the actual study results.

All of the LLMs outperformed the neuroscientists, with the LLMs averaging 81% accuracy and the humans averaging 63% accuracy. Even when the study team restricted the human responses to only those with the highest degree of expertise for a given domain of neuroscience (based on self-reported expertise), the accuracy of the neuroscientists still fell short of the LLMs, at 66%. Additionally, the researchers found that when LLMs were more confident in their decisions, they were more likely to be correct. The researchers say this finding paves the way for a future where human experts could collaborate with well-calibrated models.

The researchers then adapted an existing LLM (a version of Mistral, an open-source LLM) by training it on neuroscience literature specifically. The new LLM specialising in neuroscience, which they dubbed BrainGPT, was even better at predicting study results, attaining 86% accuracy (an improvement on the general-purpose version of Mistral, which was 83% accurate).

Senior author Professor Bradley Love (UCL Psychology & Language Sciences) said: "In light of our results, we suspect it won't be long before scientists are using AI tools to design the most effective experiment for their question. While our study focused on neuroscience, our approach was universal and should successfully apply across all of science.

"What is remarkable is how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel, but conforms to existing patterns of results in the literature. We wonder whether scientists are being sufficiently innovative and exploratory."

Dr Luo added: "Building on our results, we are developing AI tools to assist researchers. We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design."

Funding: The study was supported by the Economic and Social Research Council (ESRC), Microsoft, and a Royal Society Wolfson Fellowship, and involved researchers at UCL, University of Cambridge, University of Oxford, Max Planck Institute for Neurobiology of Behavior (Germany), Bilkent University (Turkey) and other institutions in the UK, US, Switzerland, Russia, Germany, Belgium, Denmark, Canada, Spain and Australia.

Note: When presented with two abstracts, the LLM computes the likelihood of each, assigning a perplexity score to represent how surprising each is based on its own learned knowledge as well as the context (background and method). The researchers assessed LLMs' confidence by measuring the difference in how surprising/perplexing the models found the real versus fake abstracts: the greater this difference, the greater the confidence, which correlated with a higher likelihood that the LLM had picked the correct abstract.

From the paper "Large language models surpass human experts in predicting neuroscience results":

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. Here, to evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs indicated high confidence in their predictions, their responses were more likely to be correct, which presages a future where LLMs assist humans in making discoveries. Our approach is not neuroscience specific and is transferable to other knowledge-intensive endeavours.
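The note above describes the scoring procedure only in prose. For readers who want it concrete, here is a minimal Python sketch of that two-alternative forced choice: score each candidate abstract by its perplexity under a causal language model and pick the less surprising one, treating the perplexity gap as the confidence signal. The Hugging Face transformers API and the mistralai/Mistral-7B-v0.1 checkpoint are used only for illustration; the articles do not publish the authors' code, so treat every name and detail here as an assumption, not the study's actual implementation.

```python
# Minimal sketch of a BrainBench-style scoring step (an assumption-laden
# illustration, NOT the authors' published evaluation code).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # any causal LM would do for the sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the model: exp(mean next-token cross-entropy)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, a causal LM returns the mean next-token loss.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def choose_real(abstract_a: str, abstract_b: str) -> tuple[str, float]:
    """Pick the abstract the model finds less surprising; the perplexity gap
    serves as a confidence score (larger gap = more confident)."""
    ppl_a, ppl_b = perplexity(abstract_a), perplexity(abstract_b)
    choice = "A" if ppl_a < ppl_b else "B"
    confidence = abs(ppl_a - ppl_b)
    return choice, confidence
```

Running a routine like choose_real over every abstract pair and tallying correct picks would reproduce the accuracy-style evaluation the coverage describes.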
[2]
AI can predict neuroscience study results better than human experts, study finds
Large language models, a type of AI that analyzes text, can predict the results of proposed neuroscience studies more accurately than human experts, finds a study led by UCL (University College London) researchers.

The findings, published in Nature Human Behaviour, demonstrate that large language models (LLMs) trained on vast datasets of text can distill patterns from scientific literature, enabling them to forecast scientific outcomes with superhuman accuracy. The researchers say this highlights their potential as powerful tools for accelerating research, going far beyond just knowledge retrieval.

Lead author Dr. Ken Luo (UCL Psychology & Language Sciences) said, "Since the advent of generative AI like ChatGPT, much research has focused on LLMs' question-answering capabilities, showcasing their remarkable skill in summarizing knowledge from extensive training data. However, rather than emphasizing their backward-looking ability to retrieve past information, we explored whether LLMs could synthesize knowledge to predict future outcomes.

"Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook critical insights from the literature. Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast outcomes of experiments."

The international research team began their study by developing BrainBench, a tool to evaluate how well large language models (LLMs) can predict neuroscience results. BrainBench consists of numerous pairs of neuroscience study abstracts. In each pair, one version is a real study abstract that briefly describes the background of the research, the methods used, and the study results. In the other version, the background and methods are the same, but the results have been modified by experts in the relevant neuroscience domain to a plausible but incorrect outcome.

The researchers tested 15 different general-purpose LLMs and 171 human neuroscience experts (who had all passed a screening test to confirm their expertise) to see whether the AI or the person could correctly determine which of the two paired abstracts was the real one with the actual study results.

All of the LLMs outperformed the neuroscientists, with the LLMs averaging 81% accuracy and the humans averaging 63% accuracy. Even when the study team restricted the human responses to only those with the highest degree of expertise for a given domain of neuroscience (based on self-reported expertise), the accuracy of the neuroscientists still fell short of the LLMs, at 66%. Additionally, the researchers found that when LLMs were more confident in their decisions, they were more likely to be correct. The researchers say this finding paves the way for a future where human experts could collaborate with well-calibrated models.

The researchers then adapted an existing LLM (a version of Mistral, an open-source LLM) by training it on neuroscience literature specifically. The new LLM specializing in neuroscience, which they dubbed BrainGPT, was even better at predicting study results, attaining 86% accuracy (an improvement on the general-purpose version of Mistral, which was 83% accurate).

Senior author Professor Bradley Love (UCL Psychology & Language Sciences) said, "In light of our results, we suspect it won't be long before scientists are using AI tools to design the most effective experiment for their question. While our study focused on neuroscience, our approach was universal and should successfully apply across all of science.

"What is remarkable is how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel, but conforms to existing patterns of results in the literature. We wonder whether scientists are being sufficiently innovative and exploratory."

Dr. Luo added, "Building on our results, we are developing AI tools to assist researchers. We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design."

The study involved researchers at UCL, University of Cambridge, University of Oxford, Max Planck Institute for Neurobiology of Behavior (Germany), Bilkent University (Turkey) and other institutions in the UK, US, Switzerland, Russia, Germany, Belgium, Denmark, Canada, Spain and Australia.
[3]
AI can predict study results better than human experts, researchers find
The findings, published in Nature Human Behaviour, demonstrate that large language models (LLMs) trained on vast datasets of text can distil patterns from scientific literature, enabling them to forecast scientific outcomes with superhuman accuracy. The researchers say this highlights their potential as powerful tools for accelerating research, going far beyond just knowledge retrieval.

Lead author Dr Ken Luo (UCL Psychology & Language Sciences) said: "Since the advent of generative AI like ChatGPT, much research has focused on LLMs' question-answering capabilities, showcasing their remarkable skill in summarising knowledge from extensive training data. However, rather than emphasising their backward-looking ability to retrieve past information, we explored whether LLMs could synthesise knowledge to predict future outcomes.

"Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook critical insights from the literature. Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast outcomes of experiments."

The international research team began their study by developing BrainBench, a tool to evaluate how well large language models (LLMs) can predict neuroscience results. BrainBench consists of numerous pairs of neuroscience study abstracts. In each pair, one version is a real study abstract that briefly describes the background of the research, the methods used, and the study results. In the other version, the background and methods are the same, but the results have been modified by experts in the relevant neuroscience domain to a plausible but incorrect outcome.

The researchers tested 15 different general-purpose LLMs and 171 human neuroscience experts (who had all passed a screening test to confirm their expertise) to see whether the AI or the person could correctly determine which of the two paired abstracts was the real one with the actual study results.

All of the LLMs outperformed the neuroscientists, with the LLMs averaging 81% accuracy and the humans averaging 63% accuracy. Even when the study team restricted the human responses to only those with the highest degree of expertise for a given domain of neuroscience (based on self-reported expertise), the accuracy of the neuroscientists still fell short of the LLMs, at 66%. Additionally, the researchers found that when LLMs were more confident in their decisions, they were more likely to be correct.* The researchers say this finding paves the way for a future where human experts could collaborate with well-calibrated models.

The researchers then adapted an existing LLM (a version of Mistral, an open-source LLM) by training it on neuroscience literature specifically. The new LLM specialising in neuroscience, which they dubbed BrainGPT, was even better at predicting study results, attaining 86% accuracy (an improvement on the general-purpose version of Mistral, which was 83% accurate).

Senior author Professor Bradley Love (UCL Psychology & Language Sciences) said: "In light of our results, we suspect it won't be long before scientists are using AI tools to design the most effective experiment for their question. While our study focused on neuroscience, our approach was universal and should successfully apply across all of science.

"What is remarkable is how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel, but conforms to existing patterns of results in the literature. We wonder whether scientists are being sufficiently innovative and exploratory."

Dr Luo added: "Building on our results, we are developing AI tools to assist researchers. We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design."

The study was supported by the Economic and Social Research Council (ESRC), Microsoft, and a Royal Society Wolfson Fellowship, and involved researchers at UCL, University of Cambridge, University of Oxford, Max Planck Institute for Neurobiology of Behavior (Germany), Bilkent University (Turkey) and other institutions in the UK, US, Switzerland, Russia, Germany, Belgium, Denmark, Canada, Spain and Australia.

Note: * When presented with two abstracts, the LLM computes the likelihood of each, assigning a perplexity score to represent how surprising each is based on its own learned knowledge as well as the context (background and method). The researchers assessed LLMs' confidence by measuring the difference in how surprising/perplexing the models found the real versus fake abstracts: the greater this difference, the greater the confidence, which correlated with a higher likelihood that the LLM had picked the correct abstract.
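The footnote above reports that confidence (the perplexity gap) correlated with correctness. One simple way to check that kind of calibration claim is to bin test items by confidence and compute per-bin accuracy; the short sketch below illustrates the idea. The `results` input is a hypothetical list of (confidence, was_correct) pairs, such as a scoring routine like the earlier choose_real sketch could produce; this is an illustration of the concept, not the paper's actual analysis.

```python
# Sketch of a calibration check: does higher confidence mean higher accuracy?
# `results` is a hypothetical list of (confidence, was_correct) pairs.
import numpy as np

def accuracy_by_confidence(results, n_bins: int = 5):
    conf = np.array([c for c, _ in results], dtype=float)
    correct = np.array([ok for _, ok in results], dtype=float)
    # Sort items from least to most confident, then split into equal-sized bins.
    order = np.argsort(conf)
    bins = np.array_split(order, n_bins)
    # For a well-calibrated model, accuracy should rise across these bins.
    return [correct[b].mean() for b in bins]
```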
[4]
AI models beat human experts in forecasting neuroscience study results
University College London, Nov 27, 2024

Large language models, a type of AI that analyses text, can predict the results of proposed neuroscience studies more accurately than human experts, finds a new study led by UCL (University College London) researchers.

The findings, published in Nature Human Behaviour, demonstrate that large language models (LLMs) trained on vast datasets of text can distil patterns from scientific literature, enabling them to forecast scientific outcomes with superhuman accuracy. The researchers say this highlights their potential as powerful tools for accelerating research, going far beyond just knowledge retrieval.

"Since the advent of generative AI like ChatGPT, much research has focused on LLMs' question-answering capabilities, showcasing their remarkable skill in summarising knowledge from extensive training data. However, rather than emphasising their backward-looking ability to retrieve past information, we explored whether LLMs could synthesise knowledge to predict future outcomes. Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook critical insights from the literature. Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast outcomes of experiments."

Dr. Ken Luo, Lead Author, UCL Psychology & Language Sciences

The international research team began their study by developing BrainBench, a tool to evaluate how well large language models (LLMs) can predict neuroscience results. BrainBench consists of numerous pairs of neuroscience study abstracts. In each pair, one version is a real study abstract that briefly describes the background of the research, the methods used, and the study results. In the other version, the background and methods are the same, but the results have been modified by experts in the relevant neuroscience domain to a plausible but incorrect outcome.

The researchers tested 15 different general-purpose LLMs and 171 human neuroscience experts (who had all passed a screening test to confirm their expertise) to see whether the AI or the person could correctly determine which of the two paired abstracts was the real one with the actual study results.

All of the LLMs outperformed the neuroscientists, with the LLMs averaging 81% accuracy and the humans averaging 63% accuracy. Even when the study team restricted the human responses to only those with the highest degree of expertise for a given domain of neuroscience (based on self-reported expertise), the accuracy of the neuroscientists still fell short of the LLMs, at 66%. Additionally, the researchers found that when LLMs were more confident in their decisions, they were more likely to be correct. The researchers say this finding paves the way for a future where human experts could collaborate with well-calibrated models.

The researchers then adapted an existing LLM (a version of Mistral, an open-source LLM) by training it on neuroscience literature specifically. The new LLM specialising in neuroscience, which they dubbed BrainGPT, was even better at predicting study results, attaining 86% accuracy (an improvement on the general-purpose version of Mistral, which was 83% accurate).

Senior author Professor Bradley Love (UCL Psychology & Language Sciences) said: "In light of our results, we suspect it won't be long before scientists are using AI tools to design the most effective experiment for their question. While our study focused on neuroscience, our approach was universal and should successfully apply across all of science.

"What is remarkable is how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel, but conforms to existing patterns of results in the literature. We wonder whether scientists are being sufficiently innovative and exploratory."

Dr. Luo added: "Building on our results, we are developing AI tools to assist researchers. We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design."

The study was supported by the Economic and Social Research Council (ESRC), Microsoft, and a Royal Society Wolfson Fellowship, and involved researchers at UCL, University of Cambridge, University of Oxford, Max Planck Institute for Neurobiology of Behavior (Germany), Bilkent University (Turkey) and other institutions in the UK, US, Switzerland, Russia, Germany, Belgium, Denmark, Canada, Spain and Australia.

When presented with two abstracts, the LLM computes the likelihood of each, assigning a perplexity score to represent how surprising each is based on its own learned knowledge as well as the context (background and method). The researchers assessed LLMs' confidence by measuring the difference in how surprising/perplexing the models found the real versus fake abstracts: the greater this difference, the greater the confidence, which correlated with a higher likelihood that the LLM had picked the correct abstract.

Source: University College London

Journal reference: Luo, X., et al. (2024). Large language models surpass human experts in predicting neuroscience results. Nature Human Behaviour. doi.org/10.1038/s41562-024-02046-9.
A groundbreaking study reveals that large language models (LLMs) can predict neuroscience study results with greater accuracy than human experts, potentially revolutionizing scientific research and experiment design.
A new study led by researchers at University College London (UCL) has demonstrated that large language models (LLMs) can predict the outcomes of neuroscience studies with remarkable accuracy, outperforming human experts in the field [1]. The research, published in Nature Human Behaviour, highlights the potential of AI to accelerate scientific progress and reshape the landscape of experimental design.

The research team developed BrainBench, an innovative tool designed to assess the predictive capabilities of LLMs in neuroscience [2]. BrainBench consists of pairs of neuroscience study abstracts, where one abstract is genuine and the other contains modified results crafted by domain experts. This setup allowed researchers to test both AI models and human experts on their ability to distinguish between real and fabricated study outcomes.
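To make that setup concrete, here is a hypothetical sketch of how one such test item could be represented in code. The field names and structure are illustrative assumptions based on the prose description above, not the benchmark's actual schema.

```python
# Hypothetical representation of one BrainBench-style item (field names are
# illustrative, not the benchmark's real schema).
from dataclasses import dataclass

@dataclass
class BrainBenchItem:
    background_and_methods: str  # shared context: study background + methods
    real_results: str            # the actually published findings
    altered_results: str         # expert-modified, plausible-but-wrong findings

    def as_choice_pair(self) -> tuple[str, str]:
        """The two full abstracts a model (or expert) must choose between."""
        real = f"{self.background_and_methods} {self.real_results}"
        fake = f"{self.background_and_methods} {self.altered_results}"
        return real, fake
```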
In a comprehensive evaluation, 15 general-purpose LLMs were pitted against 171 human neuroscience experts. The results were striking:

- Every one of the 15 LLMs outperformed the human experts.
- The LLMs averaged 81% accuracy, while the neuroscientists averaged 63%.
- Even the experts with the highest self-reported expertise in a given domain reached only 66% accuracy.

These findings demonstrate a significant performance gap between AI and human capabilities in predicting scientific outcomes [3].
Building on their initial success, the researchers developed BrainGPT, a specialized LLM trained specifically on neuroscience literature. This tailored model achieved an even higher accuracy of 86%, surpassing its general-purpose counterpart [4].
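The coverage says only that a version of Mistral was trained on neuroscience literature, without detailing the procedure. A common way to specialise an open LLM on a domain corpus is parameter-efficient fine-tuning; the sketch below shows that pattern using LoRA adapters from the peft library on a causal language modeling objective. Everything here (checkpoint name, corpus file, hyperparameters) is an illustrative assumption, not the published BrainGPT recipe.

```python
# Illustrative domain-adaptation sketch (NOT the published BrainGPT recipe):
# LoRA fine-tuning of an open causal LM on a corpus of neuroscience text.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE)

# Wrap the base model with small trainable LoRA adapters; the original
# weights stay frozen, which keeps domain fine-tuning comparatively cheap.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Hypothetical corpus file: one neuroscience abstract/passage per line.
corpus = load_dataset("text", data_files={"train": "neuro_corpus.txt"})["train"]
corpus = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                    remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="braingpt-sketch", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # afterwards, re-run the abstract-pair scoring with this model
```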
Dr. Ken Luo, the lead author from UCL Psychology & Language Sciences, emphasized the potential of LLMs to synthesize knowledge and predict future outcomes, moving beyond mere information retrieval. This capability could significantly reduce the time and resources spent on trial-and-error approaches in scientific research [1].

Professor Bradley Love, a senior author of the study, noted that these findings might soon lead to scientists using AI tools to design more effective experiments across various scientific disciplines. However, he also raised concerns about the predictability of scientific literature, questioning whether researchers are being sufficiently innovative and exploratory [2].

The research team is now developing AI tools to assist researchers in experimental design. They envision a future where scientists can input proposed experiment designs and anticipated findings, with AI providing predictions on the likelihood of various outcomes. This approach could enable faster iteration and more informed decision-making in scientific research [3].
As AI continues to demonstrate its prowess in scientific prediction and analysis, the collaboration between human experts and well-calibrated AI models may become increasingly common, potentially ushering in a new era of accelerated scientific discovery and innovation.
Reference

[1] AI Outperforms Experts in Predicting Study Outcomes - Neuroscience News
[2] AI can predict neuroscience study results better than human experts, study finds - Medical Xpress
[3] AI can predict study results better than human experts, researchers find
[4] AI models beat human experts in forecasting neuroscience study results