On social media, even the weather isn't safe from artificial intelligence slop. When Hurricane Melissa devastated Jamaica this fall, for example, phony AI-generated videos fooled some people into thinking things were even worse. (No, there weren't really sharks swimming in hotel pools.)
Behind the scenes, however, a different kind of AI effort was doing some good. While Melissa was gathering strength, a Google software model provided the earliest accurate predictions of the storm's path and power, helping people prepare for the worst. And other AI models just might help answer the age-old question of whether it will rain next Tuesday.
Google, Huawei, Microsoft, Nvidia and a small army of startups and university research teams have spent millions of dollars to develop AI forecasting tools. Unlike traditional meteorological computer models, though, these tools work in ways even their creators can't quite explain. "It's sort of a black box that's producing these weather forecasts," says Todd Hutchinson, the director of meteorology at the weather startup WindBorne Systems. "It's hard to know how we figured something out."
The tech giants are already embedding these models in their cloud platforms, with the idea that they'll become essential utilities. Energy companies, transit operators and commodity traders are already using AI forecasting tools. Soon enough, anyone trying to plan their kid's outdoor birthday party will likely eyeball a forecast that includes AI inputs -- Google's model is already feeding data into its Maps and Search products.
At a moment when the AI landscape seems awfully bubbly, weather prediction is offering real value. It's the kind of modeling task suited for modern machine learning, one with a vast and growing corpus of training data. Already, the European Centre for Medium-Range Weather Forecasts (ECMWF) is operating a global AI model that outperforms old-school, non-AI simulators on several benchmarks, and it's being used by forecasters across the continent. The US National Weather Service rolled out its own AI forecasting tools in December.
These models use neural networks, the same architecture behind large language models, but train them on datasets derived from temperature and humidity sensors rather than text. Developers turn the programs loose, and they spit out predictions about cold fronts and hurricane tracks that are more accurate than the state-of-the-art systems used at government weather agencies. Perhaps more important, they're far cheaper and faster than traditional forecasting models.
Still, what's going on under the hood is less clear. Is the software just really good at pattern matching, or is it deriving a fundamental understanding of atmospheric behavior? The distinction may not matter if the forecasts are useful, but it does matter if we want to use these models to study the way the atmosphere works.
"I would have been pretty skeptical that they've learned physics, but I really had my mind changed," says Mike Pritchard, Nvidia's director of climate simulation research. Pritchard cites a study by Greg Hakim, a professor of atmospheric and climate science at the University of Washington and co-author of an important textbook on atmospheric dynamics.
Hakim's research interests range across the ways we use computers to simulate the weather, and when the new deep learning models arrived, he decided to test them out. He took a Huawei Technologies Co. model called Pangu-Weather and tested its ability to match the behavior we see in the atmosphere. In one example, the scientists dropped a low-pressure system into the mid-Atlantic -- would something like a hurricane pop up? "Lo and behold, the Pangu-Weather model produced very realistic responses to how we were poking it," Hakim says of the paper, published in 2024 in the journal Artificial Intelligence for the Earth Systems. "That was a shock. These models have learned physics!"
Not every scientist goes that far, but there's no denying the excitement about this approach to simulating the weather. Matthew Chantry, a mathematician who leads AI work at the ECMWF, marvels at one of the key abilities of the new models: They can quickly forecast many hours ahead, whereas traditional computer simulations inch forward minutes at a time. "If we write down the physical equations and solve those on a supercomputer, for that to be stable we have to take a tiny time step," Chantry says. Otherwise, "the information propagates too quickly and the whole system breaks down." AI models don't seem to suffer from that problem; they appear to have made a breakthrough that meteorologists cannot yet reverse-engineer.
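The stability constraint Chantry describes is easy to demonstrate in miniature. The toy Python sketch below (my illustration, not code from any forecasting center) solves a one-dimensional advection equation with a simple upwind scheme: keep the time step below the stability limit and the solution stays bounded; exceed it and information does propagate too quickly, and the numbers explode. This is why physics-based solvers must inch forward.

```python
# Toy demonstration of why physics-based solvers need tiny time steps.
# A 1D advection equation, solved with a first-order upwind scheme, is
# stable only while the ratio c*dt/dx stays at or below 1.

def advect(u, c, dx, dt, steps):
    """March the 1D advection equation forward with an upwind scheme (periodic grid)."""
    u = list(u)
    n = len(u)
    for _ in range(steps):
        # u[i-1] with i=0 wraps around, giving periodic boundary conditions
        u = [u[i] - c * dt / dx * (u[i] - u[i - 1]) for i in range(n)]
    return u

# A small bump on a periodic grid of 32 points.
dx = 1.0
u0 = [1.0 if 4 <= i <= 6 else 0.0 for i in range(32)]

stable = advect(u0, c=1.0, dx=dx, dt=0.9, steps=50)    # c*dt/dx = 0.9: fine
unstable = advect(u0, c=1.0, dx=dx, dt=1.5, steps=50)  # c*dt/dx = 1.5: blows up

print(max(abs(v) for v in stable))    # stays bounded
print(max(abs(v) for v in unstable))  # grows explosively
```

Real atmospheric models face the same trade-off in three dimensions: finer grids demand proportionally smaller time steps, which is part of what makes traditional forecasting so expensive.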
Hakim and his co-author, grad student Trent Vonich, have put forward a bold claim. They say this approach has the potential to blow past what have been considered fundamental limits on our ability to predict changes in the atmosphere. In other words, AI may have pinned down the butterfly effect.
In case you need a refresher, the butterfly effect describes the idea that small changes to a complex system can have massive and unforeseen effects. A butterfly flapping its wings, that is, might well be responsible for generating a megacyclone halfway around the world. As a practical matter, the butterfly effect has long imposed boundaries on our ability to forecast the weather. At its most tangible level, it's about people being surprised by their computers.
The concept traces its origins to 1960, when Massachusetts Institute of Technology researcher Edward Lorenz was playing around with a much earlier digital simulation of a weather. (Not "the" weather, just "a" weather.) Back then, serious meteorologists eschewed both forecasting and computing. But Lorenz found that when he changed the inputs in his computerized weather simulation by just a few ten-thousandths of a unit, the simulation generated wildly different results, apparently at random.
These confounding theoretical storm fronts helped birth a whole new way of understanding complex systems. Chaos theory examines the hidden uncertainty in life, searching for patterns in the apparently random behavior of animal populations or financial markets. Generations of meteorologists have built on Lorenz's work to create increasingly reliable simulations through traditional means -- feeding their computers real weather data, drawn from sensors around the globe, and coding software that describes the atmosphere in mathematical detail.
Traditional models transform all of this into a 3D grid that provides the raw material for every 10-day forecast you see. The predictions, however, get far less reliable past the seven-day mark, and flat-out unreliable after 14 days. This decline owes to the immense complexity of Earth's atmosphere and the inevitable uncertainty in trying to measure it. Once he got his head around the butterfly effect, Lorenz himself thought long-term forecasting was a waste of time. Only in the past few years has AI given like-minded researchers reason to reconsider.
In 2021, a historic heat wave hit the Pacific Northwest, settling over Portland for three days and driving temperatures as high as 116F. Seventy-two people died from heat-related causes. That extreme event so close to home gave Vonich and Hakim an opportunity to test GraphCast, an AI forecasting model built by Google DeepMind.
The researchers gave their AI model measurements of the heat wave, then asked it to work backward to find the inputs that would have generated a totally accurate prediction. This technique, called backpropagation, is fundamental to training the neural networks that underpin today's AI. It allows models to identify their own errors and correct them. Traditional forecasting simulations struggle to do this -- running the models is already time-consuming and expensive on top of the challenge of reversing the complex equations used to describe the atmosphere. "This is very much like what we've been using with physics models for decades, except now we basically get it for free," Hakim says.
The model supplied Hakim and Vonich with initial conditions it predicted would be ideal, and when they fed those back into the model, they found it improved the forecasts of the heat wave by more than 90%. Predicting the future after it's happened may seem like an easy trick, but the experiment revealed something important: far more margin for improved forecasting than the conventional wisdom understood.
The duo set out to replicate the experiment with an entire year of forecasts, detailed in a paper released last spring. Deep learning models using optimal initial conditions were able to make useful forecasts almost a month in advance. "If the theory about the upper limit of forecasting at two weeks is correct, then these models should not be able to do what we're showing they can do," Hakim says.
These new AI models don't exhibit chaos the way traditional weather simulations do. Critics say that's because they aren't as precise and don't take lots of important things into account. "They are just pattern machines," says Tobias Selz, a German postdoctoral researcher in atmospheric science at the Karlsruhe Institute of Technology. "You can produce precipitation within an AI model without having any moisture in the atmosphere." This doesn't mean an AI model understands physics, Selz says, only that it hand-waves the tricky parts.
Hakim says that might be a feature, not a bug. "When a thunderstorm pops up in a physics model, it's like a rock in the pond," he says. "It disturbs things, and that will lead to the growth of errors. These machine learning models don't have that property -- they don't even resolve those scales."
The next challenge is using this insight to make better forecasts. Hakim and Vonich fed the optimal initial conditions generated by their experiment with GraphCast into Pangu and saw improvement, but not as much, suggesting model error plays a role. John Schreck, a researcher at the National Center for Atmospheric Research, successfully replicated Hakim's experiment with a different AI forecasting model, but an initial attempt to use the results to fine-tune its forecasts didn't succeed.
Hakim and Vonich plan to apply their techniques to other historical extreme weather events, and to look for near misses: weather patterns that could have become destructive if the atmosphere had been just a little different. A flap of the butterfly's wings, if you will. They'll be aided by new models that continue to roll out of frontier labs, and soon a more granular global dataset from the ECMWF.
"You can now do the global forecast on a desktop computer with the right GPU, whereas before you needed a multi-hundred-million-dollar supercomputer," Vonich says. "It just democratizes access and the ability to research."
For now, AI-powered forecasts still mostly augment traditional physics-based simulations in the weather business -- the deep learning models still depend on old-school datasets. But examining where the two kinds of models diverge is forcing meteorologists to rethink previous assumptions. Even skeptics like Selz see possibilities as AI is brought to bear on more aspects of weather prediction, particularly pulling raw weather data into models. If the quality of the initial conditions can be improved, he says, he can imagine extending reliable forecasts by as much as a few days. According to one recent estimate, such an improvement could be worth $2.1 billion in annual global output. In addition to the money, these models may even save lives -- and spare the rest of us from getting soaked.