4 Sources
[1]
Inside OpenAI's big play for science
An exclusive conversation with Kevin Weil, head of the firm's new AI for Science team.

In the three years since ChatGPT's explosive debut, OpenAI's technology has upended a remarkable range of everyday activities at home, at work, in schools -- anywhere people have a browser open or a phone out, which is everywhere. Now OpenAI is making an explicit play for scientists. In October, the firm announced that it had launched a whole new team, called OpenAI for Science, dedicated to exploring how its large language models could help scientists and tweaking its tools to support them.

The last couple of months have seen a slew of social media posts and academic publications in which mathematicians, physicists, biologists, and others have described how LLMs (and OpenAI's GPT-5 in particular) have helped them make a discovery or nudged them toward a solution they might otherwise have missed. In part, OpenAI for Science was set up to engage with this community.

And yet OpenAI is also late to the party. Google DeepMind, the rival firm behind groundbreaking scientific models such as AlphaFold and AlphaEvolve, has had an AI-for-science team for years. (When I spoke to Google DeepMind's CEO and cofounder Demis Hassabis in 2023 about that team, he told me: "This is the reason I started DeepMind ... In fact, it's why I've worked my whole career in AI.")

So why now? How does a push into science fit with OpenAI's wider mission? And what exactly is the firm hoping to achieve? I put these questions to Kevin Weil, a vice president at OpenAI who leads the new OpenAI for Science team, in an exclusive interview last week.

On mission

Weil is a product guy. He joined OpenAI a couple of years ago as chief product officer after being head of product at Twitter and Instagram. But he started out as a scientist. He got two-thirds of the way through a PhD in particle physics at Stanford University before ditching academia for the Silicon Valley dream. Weil is keen to highlight his pedigree: "I thought I was going to be a physics professor for the rest of my life," he says. "I still read math books on vacation."

Asked how OpenAI for Science fits with the firm's existing lineup of white-collar productivity tools or the viral video app Sora, Weil recites the company mantra: "The mission of OpenAI is to try and build artificial general intelligence and, you know, make it beneficial for all of humanity."

The impact on science of future versions of this technology could be amazing, he says: new medicines, new materials, new devices. "Think about it helping us understand the nature of reality, helping us think through open problems. Maybe the biggest, most positive impact we're going to see from AGI will actually be from its ability to accelerate science." He adds, "With GPT-5, we saw that becoming possible."

As Weil tells it, LLMs are now good enough to be useful scientific collaborators, spitballing ideas, suggesting novel directions to explore, and finding fruitful parallels between a scientist's question and obscure research papers published decades ago or in foreign languages.

That wasn't the case a year or so ago. Since it announced its first reasoning model, o1, in December 2024, OpenAI has been pushing the envelope of what the technology can do.
"You go back a few years and we were all collectively mind-blown that the models could get an 800 on the SAT," says Weil. But soon LLMs were acing math competitions and solving graduate-level physics problems. Last year, OpenAI and Google DeepMind both announced that their LLMs had achieved gold-medal-level performance in the International Math Olympiad, one of the toughest math contests in the world. "These models are no longer just better than 90% of grad students," says Weil. "They're really at the frontier of human abilities." That's a huge claim, and it comes with caveats. Still, there's no doubt that GPT-5 is a big improvement on GPT-4 when it comes to complicated problem-solving. GPT-5 includes a so-called reasoning model, a type of LLM that can break down problems into multiple steps and work through them one by one. This technique has made LLMs far better at solving math and logic problems than they used to be. Measured against an industry benchmark known as GPQA, which includes more than 400 multiple-choice questions that test PhD-level knowledge in biology, physics, and chemistry, GPT-4 scores 39%, well below the human-expert baseline of around 70%. According to OpenAI, GPT-5.2 (the latest update to the model, released in December) scores 92%. Overhyped The excitement is evident -- and perhaps excessive. In October, senior figures at OpenAI, including Weil, boasted on X that GPT-5 had found solutions to several unsolved math problems. Mathematicians were quick to point out that in fact what GPT-5 appeared to have done was dig up existing solutions in old research papers, including at least one written in German. That was still useful, but it wasn't the achievement OpenAI seemed to have claimed. Weil and his colleagues deleted their posts. Now Weil is more careful. It is often enough to find answers that exist but have been forgotten, he says: "We collectively stand on the shoulders of giants, and if LLMs can kind of accumulate that knowledge so that we don't spend time struggling on a problem that is already solved, that's an acceleration all of its own." He plays down the idea that LLMs are about to come up with a game-changing new discovery. "I don't think models are there yet," he says. "Maybe they'll get there. I'm optimistic that they will." But, he insists, that's not the mission: "Our mission is to accelerate science. And I don't think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field." For Weil, the question is this: "Does science actually happen faster because scientists plus models can do much more, and do it more quickly, than scientists alone? I think we're already seeing that." In November, OpenAI published a series of anecdotal case studies contributed by scientists, both inside and outside the company, that illustrated how they had used GPT-5 and how it had helped. "Most of the cases were scientists that were already using GPT-5 directly in their research and had come to us one way or another saying, 'Look at what I'm able to do with these tools,'" says Weil. The key things that GPT-5 seems to be good at are finding references and connections to existing work that scientists were not aware of, which sometimes sparks new ideas; helping scientists sketch mathematical proofs; and suggesting ways for scientists to test hypotheses in the lab. "GPT 5.2 has read substantially every paper written in the last 30 years," says Weil. 
"And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields." "That's incredibly powerful," he continues. "You can always find a human collaborator in an adjacent field, but it's difficult to find, you know, a thousand collaborators in all thousand adjacent fields that might matter. And in addition to that, I can work with the model late at night -- it doesn't sleep -- and I can ask it 10 things in parallel, which is kind of awkward to do to a human." Solving problems Most of the scientists OpenAI reached out to back up Weil's position. Robert Scherrer, a professor of physics and astronomy at Vanderbilt University, only played around with ChatGPT for fun ("I used to it rewrite the theme song for Gilligan's Island in the style of Beowulf, which it did very well," he tells me) until his Vanderbilt colleague Alex Lupsasca, a fellow physicist who now works at OpenAI, told him that GPT-5 had helped solve a problem he'd been working on. Lupsasca gave Scherrer access to GPT-5 Pro, OpenAI's $200-a-month premium subscription. "It managed to solve a problem that I and my graduate student could not solve despite working on it for several months," says Scherrer. It's not perfect, he says: "GTP-5 still makes dumb mistakes. Of course, I do too, but the mistakes GPT-5 makes are even dumber." And yet it keeps getting better, he says: "If current trends continue -- and that's a big if -- I suspect that all scientists will be using LLMs soon." Derya Unutmaz, a professor of biology at the Jackson Laboratory, a nonprofit research institute, uses GPT-5 to brainstorm ideas, summarize papers, and plan experiments in his work studying the immune system. In the case study he shared with OpenAI, Unutmaz used GPT-5 to analyze an old data set that his team had previously looked at. The model came up with fresh insights and interpretations. "LLMs are already essential for scientists," he says. "When you can complete analysis of data sets that used to take months, not using them is not an option anymore." Nikita Zhivotovskiy, a statistician at the University of California, Berkeley, says he has been using LLMs in his research since the first version of ChatGPT came out. Like Scherrer, he finds LLMs most useful when they highlight unexpected connections between his own work and existing results he did not know about. "I believe that LLMs are becoming an essential technical tool for scientists, much like computers and the internet did before," he says. "I expect a long-term disadvantage for those who do not use them." But he does not expect LLMs to make novel discoveries anytime soon. "I have seen very few genuinely fresh ideas or arguments that would be worth a publication on their own," he says. "So far, they seem to mainly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches." I also contacted a handful of scientists who are not connected to OpenAI. Andy Cooper, a professor of chemistry at the University of Liverpool and director of the Leverhulme Research Centre for Functional Materials Design, is less enthusiastic. "We have not found, yet, that LLMs are fundamentally changing the way that science is done," he says. "But our recent results suggest that they do have a place." Cooper is leading a project to develop a so-called AI scientist that can fully automate parts of the scientific workflow. He says that his team doesn't use LLMs to come up with ideas. 
But the tech is starting to prove useful as part of a wider automated system where an LLM can help direct robots, for example. "My guess is that LLMs might stick more in robotic workflows, at least initially, because I'm not sure that people are ready to be told what to do by an LLM," says Cooper. "I'm certainly not."

Making errors

LLMs may be becoming more and more useful, but caution is still advised. In December, Jonathan Oppenheim, a scientist who works on quantum mechanics, called out a mistake that made its way into a scientific journal. "OpenAI leadership are promoting a paper in Physics Letters B where GPT-5 proposed the main idea -- possibly the first peer-reviewed paper where an LLM generated the core contribution," Oppenheim posted on X. "One small problem: GPT-5's idea tests the wrong thing."

He continued: "GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones. Related-sounding, but different. It's like asking for a COVID test, and the LLM cheerfully hands you a test for chickenpox."

It is clear that a lot of scientists are finding innovative and intuitive ways to engage with LLMs. It is also clear that the technology makes mistakes that can be so subtle even experts miss them. Part of the problem is the way ChatGPT can flatter you into letting down your guard. As Oppenheim put it: "A core issue is that LLMs are being trained to validate the user, while science needs tools that challenge us." In an extreme case, one individual (who was not a scientist) was persuaded by ChatGPT into thinking for months that he'd invented a new branch of mathematics.

Of course, Weil is well aware of the problem of hallucination. But he insists that newer models are hallucinating less and less. Even so, focusing on hallucination might be missing the point, he says.

"One of my teammates here, an ex math professor, said something that stuck with me," says Weil. "He said: 'When I'm doing research, if I'm bouncing ideas off a colleague, I'm wrong 90% of the time and that's kind of the point. We're both spitballing ideas and trying to find something that works.'"

"That's actually a desirable place to be," says Weil. "If you say enough wrong things and then somebody stumbles on a grain of truth and then the other person seizes on it and says, 'Oh, yeah, that's not quite right, but what if we -- ' You gradually kind of find your trail through the woods."

This is Weil's core vision for OpenAI for Science. GPT-5 is good, but it is not an oracle. The value of this technology is in pointing people in new directions, not coming up with definitive answers, he says. In fact, one of the things OpenAI is now looking at is making GPT-5 dial down its confidence when it delivers a response. Instead of saying Here's the answer, it might tell scientists: Here's something to consider. "That's actually something that we are spending a bunch of time on," says Weil. "Trying to make sure that the model has some sort of epistemological humility."

Another thing OpenAI is looking at is how to use GPT-5 to fact-check GPT-5. It's often the case that if you feed one of GPT-5's answers back into the model, it will pick it apart and highlight mistakes. "You can kind of hook the model up as its own critic," says Weil.
"Then you can get a workflow where the model is thinking and then it goes to another model, and if that model finds things that it could improve, then it passes it back to the original model and says, 'Hey, wait a minute -- this part wasn't right, but this part was interesting. Keep it.' It's almost like a couple of agents working together and you only see the output once it passes the critic." What Weil is describing also sounds a lot like what Google DeepMind did with AlphaEvolve, a tool that wrapped the LLM Gemini inside a wider system that filtered out the good responses from the bad and fed them back in again to be improved on. Google DeepMind has used AlphaEvolve to solve several real-world problems. OpenAI faces stiff competition from rival firms, whose own LLMs can do most, if not all, of the things it claims for its own models. If that's the case, why should scientists use GPT-5 instead of Gemini or Anthropic's Claude, families of models that are themselves improving every year? Ultimately, OpenAI for Science may be as much an effort to a flag in new territory as anything else. The real innovations are still to come. "I think 2026 will be for science what 2025 was for software engineering," says Weil. "At the beginning of 2025, if you were using AI to write most of your code, you were an early adopter. Whereas 12 months later, if you're not using AI to write most of your code, you're probably falling behind. We're now seeing those same early flashes for science as we did for code." He continues: "I think that in a year, if you're a scientist and you're not heavily using AI, you'll be missing an opportunity to increase the quality and pace of your thinking."
[2]
Scientists are sending millions of questions to AI, speeding up research
AI contributes to solutions in mathematics, confirmed by human experts

* OpenAI claims 8.4 million weekly messages are sent about advanced science and mathematics
* GPT-5.2 models can follow long reasoning chains and verify results independently
* AI accelerates routine research tasks like coding, literature review, and experiment planning

OpenAI wants users to treat ChatGPT as a research collaborator, with new research claiming that nearly 8.4 million messages sent every week focus on advanced science and mathematics topics, generated by roughly 1.3 million users worldwide. OpenAI highlights that this usage has grown almost 50% over the past year, suggesting the system is moving beyond occasional experimentation into regular research workflows. These users reportedly engage in work comparable to graduate-level study or active research across mathematics, physics, chemistry, biology, and engineering.

Usage scale and research integration

Mathematics receives particular attention in the report. GPT-5.2 models are said to sustain long reasoning chains, check their own work, and operate with formal proof systems like Lean. OpenAI claims the models achieved gold-level results at the 2025 International Mathematical Olympiad and demonstrated partial success on the FrontierMath benchmark. The report also states the models contributed to solutions connected to open Erdős problems, with human mathematicians confirming the results. While the models do not generate entirely new mathematical theories, they recombine known ideas and identify connections across fields, which speeds up formal verification and proof discovery.

Similar patterns appear in other scientific areas. On graduate-level benchmarks such as GPQA, GPT-5.2 reportedly exceeds 92% accuracy without external tools. Physics laboratories reportedly use AI to integrate simulations, experimental logs, documentation, and control systems while also supporting theoretical exploration. In chemistry and biology, hybrid approaches pair general-purpose language models with specialized tools such as graph neural networks and protein structure predictors. These combinations aim to improve reliability while keeping human oversight central to decision-making.

The report places these developments in a broader context. Scientific progress supports medicine, energy systems, and public safety, yet research often advances slowly and requires substantial labor. A small portion of the global population produces most foundational discoveries, while projects such as drug development can take more than a decade. OpenAI argues that researchers increasingly use AI tools to handle routine, time-consuming tasks, including coding, literature review, data analysis, simulation support, and experiment planning. It cites case studies ranging from faster mathematical proofs to protein design with RetroBioSciences, where AI reportedly shortened timelines from years to months.

Although the report presents notable usage figures and benchmark results, independent validation remains limited. Questions remain about how well these results hold up over time, how broadly they apply, and whether the reported gains translate into lasting scientific advances.
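The report's reference to Lean is worth unpacking: a formal proof assistant checks every step mechanically, so a model-suggested proof either compiles or is rejected outright. As a purely illustrative sketch, not an example drawn from OpenAI's report, here is the sort of machine-checkable statement such a workflow would hand to Lean 4:

```lean
-- Hypothetical illustration, not taken from the report: a trivial theorem and
-- a proof that Lean 4 verifies mechanically. A model-assisted pipeline would
-- emit candidate proofs in this form and keep only those the checker accepts.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```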
[3]
Exclusive: OpenAI wants to be a scientific research partner
Why it matters: OpenAI argues that AI can make scientists more productive by upping the amount of research that can get done, ultimately leading to more life-saving breakthroughs.

By the numbers: Per OpenAI's report, an internal analysis of a random sample of anonymized ChatGPT conversations from January to December of last year showed:

* Average weekly message counts on "advanced hard-science topics" grew nearly 47% over the year.
* As of January of this year, nearly 1.3 million weekly users are discussing "advanced topics in hard science," with an average of 8.4 million ChatGPT messages on those topics. Topics include graduate and research-level math, physics, chemistry, biology and engineering.
* Among the OpenAI users and messages sampled, ChatGPT was used most for advanced research in computer science, data science and AI.

What they're saying: "More researchers are using advanced reasoning systems to make progress on open problems, interpret complex data, and iterate faster in experimental work," Kevin Weil, VP of OpenAI for Science, said in the report.

* "We're still early, but the pace of adoption and the quality of the work suggest science is entering a new acceleration phase."

How it works: Most scientists and engineers use ChatGPT for writing and communications, per the report. The smallest share use it for analysis and calculations.

* GPT-5.2 has now "progressed past competition level performance toward mathematical discovery," according to the report, with the most users turning to it for structural equation models.
* The report also shows frequent ChatGPT use for computational chemistry and particle physics, among other types of biology, chemistry and physics work.

What we're watching: OpenAI is urging policymakers to enhance science and research uses of AI, including scaling AI skilling, opening up data and frontier AI access to more people, and modernizing AI infrastructure.

Disclosure: Axios and OpenAI have a licensing and technology agreement that allows OpenAI to access part of Axios' story archives while helping fund the launch of Axios into four local cities and providing some AI tools. Axios has editorial independence.
[4]
ChatGPT Is Being Used as a Scientific Collaborator, Says OpenAI
AI engagement in science spans maths, physics, biology and chemistry

OpenAI has outlined a growing role for artificial intelligence (AI) systems as collaborators in scientific research, arguing that tools like ChatGPT can help researchers make progress on complex problems across disciplines including mathematics, physics, chemistry and biology. The 20-page report highlights that scientists are increasingly turning to AI models to assist with literature synthesis, data interpretation and experiment planning, and emphasises that advancing AI's research capabilities could accelerate discovery and productivity in science and engineering.

OpenAI Pitches ChatGPT as a Scientific Collaborator

The report, which was exclusively shared with Axios, cites anonymised usage data showing that millions of scientists, engineers and mathematicians are already asking advanced questions and using AI to support scholarly work, ranging from drafting scientific text to debugging code and planning experiments. According to OpenAI's analysis, researchers mainly use AI for writing and communication tasks, while fewer use it for rigorous analysis and calculations, suggesting opportunities for deeper integration of AI into the research process.

OpenAI said that anonymised ChatGPT conversations from January to December 2025 show that average weekly message counts on advanced science and mathematics-related topics grew about 47 percent year-on-year (YoY). In absolute numbers, messages were said to have increased from 5.7 million to nearly 8.4 million. In January 2026, the AI giant said that nearly 1.3 million weekly users were discussing advanced topics in science and mathematics.

"AI is increasingly being used as a scientific collaborator, and we're seeing its impact grow in real research settings. More researchers are using advanced reasoning systems to make progress on open problems, interpret complex data, and iterate faster in experimental work. That usage has been growing quickly over the past year, and the results are starting to show up across fields. We're still early, but the pace of adoption and the quality of the work suggest science is entering a new acceleration phase," said Kevin Weil, VP of OpenAI for Science.

The notion of AI as a collaborator aligns with initiatives such as OpenAI for Science, a programme that seeks to connect scientists and mathematicians with AI tools designed to accelerate research workflows from literature analysis to modelling and simulation. OpenAI stated that its goal is to build systems that integrate naturally into scientific practice, helping researchers explore ideas, test hypotheses faster and unlock discoveries that might take years under traditional methods.
OpenAI has established a new AI for Science team led by Kevin Weil to help scientists accelerate research across mathematics, physics, chemistry, and biology. The company reports that 1.3 million weekly users now send 8.4 million messages on advanced science topics, representing 47% growth over the past year. GPT-5.2 achieves 92% accuracy on graduate-level benchmarks, though questions remain about long-term validation.
OpenAI has launched a new division called OpenAI for Science, marking an explicit push to position its large language models as partners in scientific research. Led by Kevin Weil, a vice president who joined the company as chief product officer after stints at Twitter and Instagram, the team was announced in October 2025 and aims to explore how AI tools for researchers can support work across mathematics, physics, chemistry, and biology [1]. Weil, who abandoned a particle physics PhD at Stanford for Silicon Valley, frames the initiative as central to OpenAI's mission of building artificial general intelligence that benefits humanity. "Maybe the biggest, most positive impact we're going to see from AGI will actually be from its ability to accelerate science," he told MIT Technology Review [1].
Source: TechRadar
Internal analysis of anonymized ChatGPT conversations from January to December 2025 reveals substantial adoption among researchers. Average weekly message counts on advanced science and mathematics topics grew nearly 47% year-over-year, climbing from 5.7 million to approximately 8.4 million messages [2][4]. As of January 2026, nearly 1.3 million weekly users discuss advanced topics in scientific research with the AI, spanning graduate and research-level work [3]. Computer science, data science, and AI represent the most common domains, though engagement extends to computational chemistry, particle physics, and structural equation models [3]. Kevin Weil emphasized that "more researchers are using advanced reasoning systems to make progress on open problems, interpret complex data, and iterate faster in experimental work" [3].

The company attributes growing adoption to significant improvements in GPT-5's capabilities for complex problem-solving. On GPQA, an industry benchmark with over 400 multiple-choice questions testing PhD-level knowledge in biology, physics, and chemistry, GPT-5.2 reportedly scores 92% compared to GPT-4's 39% and a human-expert baseline of around 70% [1]. The model achieved gold-level results at the 2025 International Mathematical Olympiad and demonstrated partial success on the FrontierMath benchmark [2]. GPT-5.2 models can sustain long reasoning chains, verify results independently, and operate with formal proof systems like Lean [2]. OpenAI claims the models contributed to solutions connected to open Erdős problems, with human mathematicians confirming the results [2].

Most scientists and engineers use ChatGPT for writing and communication tasks, while a smaller share employ it for rigorous analysis and calculations [3]. Researchers increasingly turn to the system as a scientific collaborator for routine, time-consuming activities including coding, literature synthesis, data interpretation, simulation support, and experiment planning [4]. In chemistry and biology, hybrid approaches pair general-purpose large language models with specialized tools such as graph neural networks and protein structure predictors [2]. Physics laboratories reportedly use AI to integrate simulations, experimental logs, documentation, and control systems while supporting theoretical exploration [2]. OpenAI cites case studies where AI shortened protein design timelines from years to months at RetroBioSciences [2].
Source: Axios
Despite the momentum, OpenAI enters a space where Google DeepMind has operated for years with groundbreaking scientific models such as AlphaFold and AlphaEvolve [1]. When Demis Hassabis, Google DeepMind's CEO and cofounder, discussed his firm's AI-for-science team in 2023, he stated: "This is the reason I started DeepMind ... In fact, it's why I've worked my whole career in AI" [1]. OpenAI's timing reflects recent advances in reasoning systems that have elevated AI capabilities beyond SAT-level performance to graduate-level work [1]. While the models do not generate entirely new mathematical theories, they recombine known ideas and identify connections across fields, which speeds up formal verification and scientific discovery [2].

Independent validation of OpenAI's reported gains remains limited [2]. Questions persist about how well these results hold up over time, how broadly they apply, and whether the reported gains translate into lasting scientific advances [2]. OpenAI argues that scientific progress supports medicine, energy systems, and public safety, yet research often advances slowly and requires substantial labor, with projects such as drug development taking more than a decade [2]. The company is urging policymakers to enhance science and research uses of AI, including scaling AI skilling, opening up data and frontier AI access to more people, and modernizing AI infrastructure [3]. For researchers watching this space, the key question is whether AI can move from handling routine tasks to contributing genuine insights that reshape how scientific discovery unfolds.

Summarized by Navi