11 Sources
[1]
Certain AI prompts generate 50x more CO₂ than others
In recent years, researchers and climate advocates have been ringing the alarm about artificial intelligence's impact on the environment. Advanced and increasingly popular large language models (LLMs) -- such as those offered by OpenAI and Google -- reside in massive data centers that consume significant amounts of electricity and water to cool servers. Every time someone types a question or phrase into one of these platforms, the energy used to generate a response produces a measurable amount of potentially harmful CO₂. But, according to new research published in Frontiers in Communication, not all of those prompts have the same environmental impact. Not even close. The study looked at 14 different LLMs, each varying in the size of their training data, and evaluated their performance using a standardized set of 500 questions across different subject areas. Each model generates a certain number of "thinking tokens" per query, and those tokens correlate with CO₂ emissions. When the researchers compared the responses, they found that more complex "reasoning models" -- which have larger training sets and take longer to process and respond -- produced significantly more CO₂ than smaller, more efficient "concise models." In some cases, reasoning models generated up to 50 times the emissions of their more concise counterparts. Aside from the models themselves, the amount of CO₂ generated by prompts also varied based on subject matter. More complex or open-ended questions, such as those involving advanced algebra or philosophy, tended to produce a larger carbon output than simpler prompts, like high school history questions. These findings shed further light on the often-overlooked ways AI models contribute to soaring energy consumption. 
Related: [AI will require even more energy than we thought] "The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions," Maximilian Dauner, PhD student at Hochschule München University of Applied Sciences and paper author, said in a statement. Reasoning models -- sometimes called "thinking models" -- refer to LLMs optimized for solving more complex tasks that require logic, step-by-step breakdowns, or detailed instructions. These models often go by different names. At OpenAI, for example, GPT-4o and GPT-4o-mini are considered "generalized" models, while versions like o1 and o3-mini are classified as reasoning models. Reasoning models employ what some LLM researchers call "chain-of-thought" processing, allowing them to respond more deliberately than generalized models, which prioritize speed and clarity. The end goal is for reasoning models to generate more human-like responses. The most obvious by-product of that, for anyone who has used these models, is that reasoning models take longer to generate answers. The researchers found that the reasoning models generated significantly more tokens, which correlate with CO₂ emissions, than the more concise models. (Tokens refer to words or parts of words that are converted into numerical representations the LLM can understand.) The testing occurred in two phases. In the first phase, researchers asked the models the same multiple-choice questions. The next, a free-response phase, had the models provide written responses. On average, reasoning models generated 543.5 tokens per question, compared to just 37.7 tokens for concise models. The most accurate reasoning model they examined, called "Cogito," produced three times as much CO₂ as similarly sized models optimized for concise responses. 
"From an environmental perspective, reasoning models consistently exhibited higher emissions, driven primarily by their elevated token production," the researchers write in the paper. While the difference in emissions per individual prompt might seem marginal, it can make a real difference when scaled up. The researchers estimate that asking DeepSeek's R1 model to answer 600,000 questions would generate roughly the same amount of CO₂ as a round-trip flight from London to New York. By comparison, you could ask the non-reasoning Qwen 2.5 model three times as many questions before reaching the same level of emissions. Overall, the researchers say that their findings highlight a fundamental trade-off between LLM accuracy and environmental sustainability. "As model size increases, accuracy tends to improve," the researchers said. "However, this gain is also linked to substantial growth in both CO₂ emissions and the number of generated tokens." The findings come amid a fierce global race among tech companies to develop increasingly advanced AI models. Over the past year alone, Apple has announced plans to invest $500 billion in manufacturing and data centers over the next four years. Similarly, Project Stargate -- a joint initiative by OpenAI, SoftBank, and Oracle -- has also pledged to spend $500 billion on AI-focused data centers. Researchers warn that this surge in infrastructure could place additional strain on already overburdened energy grids. AI applications, in particular, play an outsized role in the energy consumption of newer data centers. A recent report in the MIT Technology Review notes that starting around 2017, data centers began incorporating more energy-intensive hardware specifically designed for complex AI computations. Energy use surged after that. 
The Electric Power Research Institute (EPRI) estimates that data centers supporting advanced AI models could account for up to 9.1 percent of the United States' total energy demand by the end of the decade -- up from approximately 4.4 percent today. Companies are scrambling to find new ways to meet this growing energy demand. Meta, Google, and Microsoft have all partnered with nuclear power plants to generate more electricity. Microsoft, one of OpenAI's primary partners, even signed a 20-year agreement to source energy from the Three Mile Island nuclear facility in Pennsylvania, a site once known for the worst reactor accident in U.S. history. Meta is also making major investments in geothermal technology as a less fossil fuel-intensive way to generate power. Others, like OpenAI CEO Sam Altman, who has said the coming age of AI will require an "energy breakthrough," are investing in experimental nuclear fusion. These investments may help companies make progress, but recent research indicates it's almost certain that more fossil fuels -- namely natural gas -- will be needed to fully meet AI's massive energy demand. Related: [The future of AI is even more fossil fuels] That may all sound daunting, but the researchers comparing different types of models say their findings could help empower everyday AI users to take steps to reduce their own carbon impact. If users understand how much more energy-intensive reasoning models are, they may choose to use them more sparingly and rely on concise models for general everyday tasks, such as web searches and answering basic questions. "If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies," Dauner said.
[2]
Thinking AI models emit 50x more CO2 -- and often for nothing
No matter which questions we ask an AI, the model will come up with an answer. To produce this information - regardless of whether the answer is correct or not - the model uses tokens. Tokens are words or parts of words that are converted into a string of numbers that can be processed by the LLM. This conversion, as well as other computing processes, produces CO₂ emissions. Many users, however, are unaware of the substantial carbon footprint associated with these technologies. Now, researchers in Germany measured and compared CO₂ emissions of different, already trained, LLMs using a set of standardized questions. "The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions," said Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences and first author of the Frontiers in Communication study. "We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models." 'Thinking' AI causes most emissions The researchers evaluated 14 LLMs ranging from seven to 72 billion parameters on 1,000 benchmark questions across diverse subjects. Parameters determine how LLMs learn and process information. Reasoning models, on average, created 543.5 'thinking' tokens per question, whereas concise models required just 37.7 tokens per question. Thinking tokens are additional tokens that reasoning LLMs generate before producing an answer. A higher token footprint always means higher CO₂ emissions. It doesn't, however, necessarily mean the resulting answers are more correct, as elaborate detail is not always essential for correctness. The most accurate model was the reasoning-enabled Cogito model with 70 billion parameters, reaching 84.9% accuracy. The model produced three times more CO₂ emissions than similarly sized models that generated concise answers. 
"Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies," said Dauner. "None of the models that kept emissions below 500 grams of CO₂ equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly." CO₂ equivalent is the unit used to measure the climate impact of various greenhouse gases. Subject matter also resulted in significantly different levels of CO₂ emissions. Questions that required lengthy reasoning processes, for example abstract algebra or philosophy, led to up to six times higher emissions than more straightforward subjects, like high school history. Practicing thoughtful use The researchers said they hope their work will cause people to make more informed decisions about their own AI use. "Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power," Dauner pointed out. Choice of model, for instance, can make a significant difference in CO₂ emissions. For example, having DeepSeek R1 (70 billion parameters) answer 600,000 questions would create CO₂ emissions equal to a round-trip flight from London to New York. Meanwhile, Qwen 2.5 (72 billion parameters) can answer more than three times as many questions (about 1.9 million) with similar accuracy rates while generating the same emissions. The researchers said that their results may be impacted by the choice of hardware used in the study, an emission factor that may vary regionally depending on local energy grid mixes, and the examined models. These factors may limit the generalizability of the results. "If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies," Dauner concluded.
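The model-choice comparison above can be sanity-checked with a short back-of-the-envelope sketch. The 2,042 g CO₂eq per 1,000 questions figure for DeepSeek R1 (70B) is reported from the study elsewhere in this roundup; the flight footprint below is the one implied by the article's own equivalence, not an independent aviation estimate.

```python
# Back-of-the-envelope check of the DeepSeek R1 vs. Qwen 2.5 comparison,
# using two figures reported from the study: R1 emits about 2,042 g CO2eq
# per 1,000 questions, and 600,000 R1 questions match one London-New York
# round trip. Qwen 2.5's ~1.9 million questions for the same budget then
# implies its per-question emissions.

R1_G_PER_QUESTION = 2_042 / 1_000        # ~2.04 g CO2eq per question
FLIGHT_G = R1_G_PER_QUESTION * 600_000   # implied flight footprint, in grams

QWEN_QUESTIONS = 1_900_000               # Qwen 2.5's budget for the same emissions
QWEN_G_PER_QUESTION = FLIGHT_G / QWEN_QUESTIONS

ratio = R1_G_PER_QUESTION / QWEN_G_PER_QUESTION
print(f"Implied flight footprint: {FLIGHT_G / 1_000:.0f} kg CO2eq")
print(f"Qwen 2.5 emits roughly 1/{ratio:.1f} of R1's CO2eq per question")
```

The implied flight footprint of roughly 1.2 tonnes is in a plausible range for one passenger's round trip between London and New York, which is a useful consistency check on the numbers quoted here.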
[3]
Advanced AI models generate up to 50 times more CO₂ emissions than more common LLMs when answering the same questions
The processes used by advanced reasoning models generate significantly more emissions than those of conventional peers. (Image credit: Getty Images) The more accurate we try to make AI models, the bigger their carbon footprint -- with some prompts producing up to 50 times more carbon dioxide emissions than others, a new study has revealed. Reasoning models, such as Anthropic's Claude, OpenAI's o3 and DeepSeek's R1, are specialized large language models (LLMs) that dedicate more time and computing power to produce more accurate responses than their predecessors. Yet, aside from some impressive results, these models have been shown to face severe limitations in their ability to crack complex problems. Now, a team of researchers has highlighted another constraint on the models' performance -- their exorbitant carbon footprint. They published their findings June 19 in the journal Frontiers in Communication. "The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions," study first author Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences in Germany, said in a statement. "We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models." To answer the prompts given to them, LLMs break up language into tokens -- word chunks that are converted into a string of numbers before being fed into neural networks. These neural networks are tuned using training data that calculates the probabilities of certain patterns appearing. They then use these probabilities to generate responses. Reasoning models further attempt to boost accuracy using a process known as "chain-of-thought." 
This is a technique that works by breaking down one complex problem into smaller, more digestible intermediary steps that follow a logical flow, mimicking how humans might arrive at the conclusion to the same problem. Related: AI 'hallucinates' constantly, but there's a solution However, these models have significantly higher energy demands than conventional LLMs, posing a potential economic bottleneck for companies and users wishing to deploy them. Yet, despite some research into the environmental impacts of growing AI adoption more generally, comparisons between the carbon footprints of different models remain relatively rare. To examine the CO₂ emissions produced by different models, the scientists behind the new study asked 14 LLMs 1,000 questions across different topics. The different models had between 7 and 72 billion parameters. The computations were performed using the Perun framework (which analyzes LLM performance and the energy it requires) on an NVIDIA A100 GPU. The team then converted energy usage into CO₂ by assuming each kilowatt-hour of energy produces 480 grams of CO₂. Their results show that, on average, reasoning models generated 543.5 tokens per question compared to just 37.7 tokens for more concise models. These extra tokens -- amounting to more computations -- meant that the more accurate reasoning models produced more CO₂. The most accurate model was the 70 billion parameter Cogito model, which answered 84.9% of the benchmark questions correctly. Cogito released three times the CO₂ emissions of similarly sized models made to generate answers more concisely. "Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies," said Dauner. "None of the models that kept emissions below 500 grams of CO₂ equivalent [total greenhouse gases released] achieved higher than 80% accuracy on answering the 1,000 questions correctly." But the issues go beyond accuracy. 
Questions that needed longer reasoning times, like in algebra or philosophy, caused emissions to spike six times higher than straightforward look-up queries. The researchers' calculations also show that the emissions depended on the models that were chosen. To answer 600,000 questions, DeepSeek's 70 billion parameter R1 model would produce the CO₂ emitted by a round-trip flight between New York and London. Alibaba Cloud's 72 billion parameter Qwen 2.5 model, however, would be able to answer these with similar accuracy rates for a third of the emissions. The study's findings aren't definitive; emissions may vary depending on the hardware used and the energy grids used to supply their power, the researchers emphasized. But they should prompt AI users to think before they deploy the technology, the researchers noted. "If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies," Dauner said.
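The study's energy-to-emissions conversion described above is a single multiplication. A minimal sketch, assuming only the 480 g CO₂/kWh factor the researchers used (the sample energy value is illustrative, not a number from the paper):

```python
# Minimal sketch of the study's conversion from measured GPU energy to
# CO2: kilowatt-hours multiplied by an assumed grid emission factor of
# 480 g CO2 per kWh. The 4.25 kWh example run below is hypothetical.

EMISSION_FACTOR_G_PER_KWH = 480  # grid-average factor assumed in the study

def energy_to_co2_grams(energy_kwh: float) -> float:
    """Convert measured energy consumption (kWh) to grams of CO2."""
    return energy_kwh * EMISSION_FACTOR_G_PER_KWH

print(energy_to_co2_grams(4.25))  # 2040.0 g for a hypothetical 4.25 kWh run
```

Because the emission factor varies by grid, the same measured run would map to very different emissions in, say, a hydro-heavy region, which is one of the generalizability caveats the researchers flag.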
[4]
Why Some AI Models Spew 50 Times More Greenhouse Gas to Answer the Same Question
It's possible for you to make your LLM use "greener," according to new research. Like it or not, large language models have quickly become embedded into our lives. And due to their intense energy and water needs, they might also be causing us to spiral even faster into climate chaos. Some LLMs, though, might be releasing more planet-warming pollution than others, according to a new study published in Frontiers in Communication: queries made to some models generate up to 50 times more carbon emissions than others. Unfortunately, and perhaps unsurprisingly, models that are more accurate tend to have the biggest energy costs. It's hard to estimate just how bad LLMs are for the environment, but some studies have suggested that training ChatGPT used up to 30 times more energy than the average American uses in a year. What isn't known is whether some models have steeper energy costs than their peers as they're answering questions. Researchers from the Hochschule München University of Applied Sciences in Germany evaluated 14 LLMs ranging from 7 to 72 billion parameters -- the levers and dials that fine-tune a model's understanding and language generation -- on 1,000 benchmark questions about various subjects. LLMs convert each word or part of a word in a prompt into a string of numbers called a token. Some LLMs, particularly reasoning LLMs, also insert special "thinking tokens" into the input sequence to allow for additional internal computation and reasoning before generating output. This conversion and the subsequent computations that the LLM performs on the tokens use energy and release CO₂. The scientists compared the number of tokens generated by each of the models they tested. Reasoning models, on average, created 543.5 thinking tokens per question, whereas concise models required just 37.7 tokens per question, the study found. In the ChatGPT world, for example, GPT-4o is a concise, generalized model, whereas o1 is a reasoning model. 
This reasoning process drives up energy needs, the authors found. "The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach," study author Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences, said in a statement. "We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models." The more accurate the models were, the more carbon emissions they produced, the study found. The reasoning model Cogito, which has 70 billion parameters, reached up to 84.9% accuracy -- but it also produced three times more CO₂ emissions than similarly sized models that generate more concise answers. "Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies," said Dauner. "None of the models that kept emissions below 500 grams of CO₂ equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly." CO₂ equivalent is the unit used to measure the climate impact of various greenhouse gases. Another factor was subject matter. Questions that required detailed or complex reasoning, for example abstract algebra or philosophy, led to up to six times higher emissions than more straightforward subjects, according to the study. There are some caveats, though. Emissions are very dependent on how local energy grids are structured and the models that you examine, so it's unclear how generalizable these findings are. Still, the study authors said they hope that the work will encourage people to be "selective and thoughtful" about their LLM use. "Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power," Dauner said in a statement.
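A rough way to see how the token gap above translates into an emissions gap, under the simplifying assumption (mine, not the paper's) that per-answer emissions scale linearly with generated tokens:

```python
# Sketch of a tokens-proportional emissions estimate. The token averages
# are from the study; the linear-scaling assumption is a simplification,
# since per-token cost also differs between models.

REASONING_TOKENS_PER_Q = 543.5  # average 'thinking' tokens, reasoning models
CONCISE_TOKENS_PER_Q = 37.7     # average tokens, concise models

def emissions_multiplier(tokens: float, baseline_tokens: float) -> float:
    """Relative emissions if CO2 scaled linearly with token count."""
    return tokens / baseline_tokens

print(f"~{emissions_multiplier(REASONING_TOKENS_PER_Q, CONCISE_TOKENS_PER_Q):.1f}x")
```

Token counts alone imply roughly a 14x average gap; the 50x figure in the study is a worst case across specific model pairs, where per-token costs differ as well.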
[5]
The hidden carbon cost of chatting to your AI
AI tools like ChatGPT have changed our personal and professional worlds, with around 52% of American adults regularly using a large language model (LLM). Now, a new study details the immense environmental costs of our prompts, and it might make you think twice about what chatbot you use and how you use it. Researchers from Germany's Hochschule München University of Applied Sciences (HM) looked at 14 different LLMs, ranging from basic to more complex knowledge bases, and provided them all with the same 1,000 "benchmark" questions. How many "tokens" the LLM generated can then be translated into greenhouse gas emissions. "The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions," said first author Maximilian Dauner, a researcher at HM. "We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models." In understanding how LLMs work and end up so environmentally costly, it's important to look at tokens and parameters. When we type a prompt - be it a question or an instruction - we generate tokens, which represent pieces of our prompt. The LLM then generates more of these as it gets to work. LLMs with more intensive advanced reasoning capabilities create even more tokens. Tokens are essentially the computing (searching, linking, assessing), and computing requires power. This power results in CO₂ emissions. When an LLM is trained, it "learns" by adjusting parameters, which are numbers inside a neural network. These parameters control how the model predicts one token after another. So a model with fewer parameters is considered simpler, with fewer "weights" (numbers that tell the AI how important something is when it's processing information), and will generate fewer tokens but might not be as accurate. 
On the flip side, a model with a high number of parameters will also have a high number of weights - and should have higher accuracy, but that's not always the case. Unfortunately, the most complex and accurate LLMs are also the most energy intensive. The scientists used an NVIDIA A100 GPU and the Perun framework (which analyzes LLM performance and the power required) to gauge energy consumption, applying an average emission factor of 480 gCO₂/kWh. They then had each of the 14 models answer 1,000 quiz questions covering philosophy, world history, international law, abstract algebra, and high school math. The LLMs tested were a mix of text-only and reasoning models from Meta, Alibaba, Deep Cogito and DeepSeek. "The analysis of combined CO₂eq [CO₂-equivalent] emissions, accuracy, and token generation across all 1,000 questions reveals clear trends and trade-offs between model scale, reasoning complexity, and environmental impact," the researchers wrote. "As model size increases, accuracy tends to improve. However, this gain is also linked to substantial growth in both CO₂eq emissions and the number of generated tokens." They found that the reasoning models created an average of 543.5 "thinking" tokens per quiz question, while the text-only models averaged around 37.7 tokens for the same prompt. However, while more tokens means more emissions, the researchers found that it didn't also mean the LLM was more accurate - just more verbose. The most accurate model was one of the reasoning LLMs tested, Deep Cogito 70B - 70 billion parameters - with an accuracy rate of 84.9%. It produced three times the emissions of similarly sized LLMs that returned more basic answers. "Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies," said Dauner. "None of the models that kept emissions below 500 grams of CO₂ equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly." 
DeepSeek's R1 70B reasoning model was the most energy-expensive, producing 2,042 g CO₂-equivalent in emissions, roughly the same as a 9.3-mile (15-km) trip in a gas vehicle. While this might not seem like a lot on a small scale, it's worth remembering that more than 130 million Americans are using some AI model regularly. And this DeepSeek model wasn't the most correct, either, with a 78.9% accuracy rate. The researchers noted that having this model answer 600,000 questions would create CO₂ emissions equal to a London-New York return flight. Alibaba's Qwen 7B model was the most energy efficient (27.7 g CO₂eq emissions), but it only managed 31.9% accuracy. "On average, reasoning-enabled models required significantly more tokens in both testing phases," the researchers noted in the study. "Particularly in the multiple-choice phase, reasoning models frequently struggled to produce concise answers, despite explicit prompts instructing them to only return the choice index. For instance, Deepseek-R1 7B generated up to 14,187 tokens on a single mathematical question, while standard models consistently produced minimal single-token responses." Energy consumption also varied depending on the prompt, with abstract algebra and philosophy requiring more reasoning than more straightforward questions. It's also worth noting that this study looked at only a sample of the LLMs we now have access to, and didn't look at some of the big players including OpenAI's ChatGPT, Google's Gemini, X's Grok and Anthropic's Claude. While LLMs are certainly here to stay, and are likely to be further incorporated into our lives, the researchers hope their study can help users make better choices, switching between models depending on the task at hand. They hope it'll also draw attention to the need for more energy-efficient reasoning models in the future. 
"Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power," Dauner said. "If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies."
[6]
AI chatbots consume 50 times more energy for long answers - Earth.com
Many people fire off rapid‑fire queries at the latest AI chatbots without a second thought. Each answer taps electricity, and a new study shows that some replies raise the meter by a factor of fifty. The analysis covered 14 large language models (LLMs). The researchers found that the most verbose engines release as much heat‑trapping gas per answer as running a midsize laptop for an hour, while concise models sip power. The study was led by Maximilian Dauner at the Hochschule München University of Applied Sciences. Every interaction begins with tokens, the word fragments a model turns into numbers so its circuits can reason. More tokens mean more processor cycles and, ultimately, more electricity. The link between long output and higher emissions is not new. A 2019 study estimated that training one natural‑language model could emit as much carbon as five cross‑country flights. Recent projections suggest generative AI as a whole could consume 29.3 terawatt‑hours per year, rivaling Ireland's entire grid use. Power plants that supply that demand release hundreds of grams of carbon dioxide equivalent for every kilowatt‑hour they generate, with the International Energy Agency placing the world average at about 480 g. Dauner's team put seven‑ to 72‑billion‑parameter systems through 1,000 standardized questions across philosophy, history, law, abstract algebra, and mathematics. The researchers counted the electricity drawn by an NVIDIA A100 GPU during each run. Models designed to "think out loud" produced an average of 543.5 thinking tokens before answering, compared with 37.7 tokens for brisk responders. That extra chatter translated into as much as 50 times more emissions per question. "The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions," said Dauner. 
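The sector-scale figures above combine in one step: 29.3 TWh per year at the IEA's roughly 480 g CO₂eq/kWh world average. A quick sketch of that arithmetic:

```python
# Sector-scale arithmetic from the projections quoted above: generative
# AI at 29.3 TWh/year, multiplied by the ~480 g CO2eq/kWh world-average
# grid intensity attributed to the International Energy Agency.

TWH_PER_YEAR = 29.3
G_CO2_PER_KWH = 480

kwh_per_year = TWH_PER_YEAR * 1e9                      # 1 TWh = 1e9 kWh
tonnes_per_year = kwh_per_year * G_CO2_PER_KWH / 1e6   # grams -> tonnes

print(f"~{tonnes_per_year / 1e6:.1f} million tonnes CO2eq per year")
```

That works out to roughly 14 million tonnes of CO₂eq annually if the projected demand were all served at the world-average grid intensity; cleaner or dirtier grids would shift the total substantially.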
Long answers in algebra caused six times more emissions than short history replies. It was the symbolic reasoning in AI's long answers, not just length, that raised energy use. The 70‑billion‑parameter Cogito model hit 84.9 percent accuracy but released three times more carbon than similarly sized concise systems. No model that stayed below 500 g of emissions cracked the 80 percent accuracy ceiling. That ceiling also shifted by topic. History questions were easy wins, while algebra stumped nearly every contestant despite burning far more power. "Currently, we see a clear accuracy‑sustainability trade‑off inherent in LLM technologies," Dauner noted. Size alone was not destiny. Qwen 2.5, a 72‑billion model that keeps its answers short, handled almost two million questions for the same emissions budget that DeepSeek R1 burned on 600,000, with only a small accuracy gap. To put the numbers in context, generating answers with some reasoning-heavy AI models emits as much CO₂ as driving a car for several miles. For example, DeepSeek R1 (70B) emits over 2,000 grams of CO₂ to answer 1,000 questions, which is roughly the same as burning a quarter of a gallon of gasoline. Over time, those numbers multiply. Running a reasoning-enabled model to answer questions non-stop over a full day could produce as much carbon as running a refrigerator for two weeks. This comparison helps highlight the hidden but accumulating environmental burden of everyday AI use. Individual habits matter. A quick setting that asks for a concise answer slices token counts dramatically. Saving the heavyweight models for code reviews or legal briefs and using lighter variants for trivia cuts emissions further. Cloud providers can help by surfacing real‑time energy dashboards so users see the carbon cost of every query. Transparent metrics nudge people toward greener defaults without sacrificing capability when it truly counts. 
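The gasoline comparison above can be sanity-checked with one division. The 8,887 g of CO₂ per gallon of gasoline is a commonly cited EPA figure and is my assumption here; the article itself states only the model's emissions.

```python
# Sanity check of the quarter-gallon comparison. G_CO2_PER_GALLON is an
# assumed EPA figure, not a number from the article; the R1 emissions
# figure is the study's reported value.

G_CO2_PER_GALLON = 8_887
R1_G_PER_1000_QUESTIONS = 2_042  # DeepSeek R1 (70B), per the study

gallons = R1_G_PER_1000_QUESTIONS / G_CO2_PER_GALLON
print(f"{gallons:.2f} gallons")  # ~0.23, i.e. roughly a quarter gallon
```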
Engineers are racing to squeeze more work from fewer tokens, trimming redundant reasoning steps and caching intermediate thoughts. Hardware makers also chase efficiency, but clean grids remain essential. Data‑center growth is already prompting utilities to burn more fossil fuel, with a recent investigation warning of billions of dollars in public‑health impacts tied to AI power demand. Policy teams are weighing disclosure rules, minimum efficiency standards, and incentives for renewable‑powered compute zones. Without them, the next wave of models could push the sector's footprint beyond that of entire mid‑size nations. Incremental gains in software design, smarter user choices, and cleaner electricity can bend the curve. The challenge is aligning all three before the growth of digital brains outpaces the planet's patience.
[7]
Some AI prompts could cause 50 times more CO₂ emissions than others, researchers find
No matter which questions we ask an AI, the model will come up with an answer. To produce this information -- regardless of whether the answer is correct or not -- the model uses tokens. Tokens are words or parts of words that are converted into a string of numbers that can be processed by the LLM. This conversion, as well as other computing processes, produces CO₂ emissions. Many users, however, are unaware of the substantial carbon footprint associated with these technologies. Now, researchers in Germany measured and compared CO₂ emissions of different, already trained, LLMs using a set of standardized questions. "The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions," said Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences and first author of the Frontiers in Communication study. "We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models." 'Thinking' AI causes most emissions The researchers evaluated 14 LLMs ranging from seven to 72 billion parameters on 1,000 benchmark questions across diverse subjects. Parameters determine how LLMs learn and process information. Reasoning models, on average, created 543.5 "thinking" tokens per question, whereas concise models required just 37.7 tokens per question. Thinking tokens are additional tokens that reasoning LLMs generate before producing an answer. A higher token footprint always means higher CO₂ emissions. It doesn't, however, necessarily mean the resulting answers are more correct, as elaborate detail is not always essential for correctness. The most accurate model was the reasoning-enabled Cogito model with 70 billion parameters, reaching 84.9% accuracy. The model produced three times more CO₂ emissions than similar-sized models that generated concise answers. 
"Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies," said Dauner. "None of the models that kept emissions below 500 grams of CO equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly." CO equivalent is the unit used to measure the climate impact of various greenhouse gases. Subject matter also resulted in significantly different levels of CO emissions. Questions that required lengthy reasoning processes, for example abstract algebra or philosophy, led to up to six times higher emissions than more straightforward subjects, like high school history. Practicing thoughtful use The researchers said they hope their work will cause people to make more informed decisions about their own AI use. "Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power," Dauner pointed out. Choice of model, for instance, can make a significant difference in CO emissions. For example, having DeepSeek R1 (70 billion parameters) answer 600,000 questions would create CO emissions equal to a round-trip flight from London to New York. Meanwhile, Qwen 2.5 (72 billion parameters) can answer more than three times as many questions (about 1.9 million) with similar accuracy rates while generating the same emissions. The researchers said that their results may be impacted by the choice of hardware used in the study, an emission factor that may vary regionally depending on local energy grid mixes, and the examined models. These factors may limit the generalizability of the results. "If users know the exact CO cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies," Dauner concludes.
[8]
Scientists Just Found Something Unbelievably Grim About Pollution Generated by AI
Tech companies are hellbent on pushing out ever more advanced artificial intelligence models -- but there appears to be a grim cost to that progress. In a new study in the science journal Frontiers in Communication, German researchers found that large language models (LLMs) that provide more accurate answers use exponentially more energy -- and hence produce more carbon -- than their simpler and lower-performing peers. In other words, the findings are a grim sign of things to come for the environmental impacts of the AI industry: the more accurate a model is, the higher its toll on the climate.

"Everyone knows that as you increase model size, typically models become more capable, use more electricity and have more emissions," Allen Institute for AI researcher Jesse Dodge, who didn't work on the German research but has conducted similar analysis of his own, told the New York Times.

The team examined 14 open source LLMs -- they were unable to access the inner workings of commercial offerings like OpenAI's ChatGPT or Anthropic's Claude -- of various sizes and fed them 500 multiple choice questions plus 500 "free-response questions." Crunching the numbers, the researchers found that big, more accurate models such as DeepSeek produce the most carbon compared to chatbots with smaller digital brains. So-called "reasoning" chatbots, which break problems down into steps in their attempts to solve them, also produced markedly more emissions than their simpler brethren. There were occasional LLMs that bucked the trend -- Cogito 70B achieved slightly higher accuracy than DeepSeek, but with a modestly smaller carbon footprint, for instance -- but the overall pattern was stark: the more reliable an AI's outputs, the greater its environmental harm.

"We don't always need the biggest, most heavily trained model, to answer simple questions," Maximilian Dauner, a German doctoral student and lead author of the paper, told the NYT.
"Smaller models are also capable of doing specific things well. The goal should be to pick the right model for the right task." That brings up an interesting point: do we really need AI in everything? When you go on Google, those annoying AI summaries pop up, no doubt generating pollution for a result that you never asked for in the first place. Each individual query might not count for much, but when you add them all up, the effects on the climate could be immense. OpenAI CEO Sam Altman, for example, recently enthused that a "significant fraction" of the Earth's total power production should eventually go to AI.
[9]
AI users have to choose between accuracy or sustainability
Cheap or free access to AI models keeps improving, with Google the latest firm to make its newest models available to all users, not just paying ones. But that access comes with one cost: the environment.

In a new study, German researchers tested 14 large language models (LLMs) of various sizes from leading developers such as Meta, Alibaba, and others. Each model answered 1,000 difficult academic questions spanning topics from world history to advanced mathematics. The tests ran on a powerful, energy-intensive NVIDIA A100 GPU, using a specialized framework to precisely measure electricity consumption per answer. This data was then converted into carbon dioxide equivalent emissions, providing a clear comparison of each model's environmental impact.

The researchers found that many LLMs are far more powerful than needed for everyday queries. Smaller, less energy-hungry models can answer many factual questions just as well. The carbon and water footprints of a single prompt vary dramatically depending on model size and task type. Prompts requiring reasoning, which force models to "think aloud," are especially polluting because they generate many more tokens.

One model, Cogito, topped the accuracy table -- answering nearly 85% of questions correctly -- but produced three times more emissions than similar-sized models, highlighting a trade-off rarely visible to AI developers or users. (Cogito did not respond to a request for comment.) "Do we really need a 400-billion parameter GPT model to answer when World War II was, for example?" says Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences and one of the study's authors.
[10]
Can you choose an AI model that harms the planet less?
And some chatbots are linked to more greenhouse gas emissions than others. A study published Thursday in the journal Frontiers in Communication analyzed different generative AI chatbots' capabilities and the planet-warming emissions generated from running them. Researchers found that chatbots with bigger "brains" used exponentially more energy and answered questions more accurately -- up until a point.

From uninvited results at the top of your search engine queries to offering to write your emails and helping students do homework, generative artificial intelligence is quickly becoming part of daily life as tech giants race to develop the most advanced models and attract users. All those prompts come with an environmental cost: A report last year from the Energy Department found AI could help increase the portion of the nation's electricity supply consumed by data centers from 4.4% to 12% by 2028. To meet this demand, some power plants are expected to burn more coal and natural gas.

"We don't always need the biggest, most heavily trained model, to answer simple questions. Smaller models are also capable of doing specific things well," said Maximilian Dauner, a doctoral student at the Munich University of Applied Sciences and lead author of the paper. "The goal should be to pick the right model for the right task." The study evaluated 14 large language models, a common form of generative AI often referred to by the acronym LLMs, by asking each a set of 500 multiple choice and 500 free response questions across five different subjects.
Dauner then measured the energy used to run each model and converted the results into carbon dioxide equivalents based on global averages. In most of the models tested, questions in logic-based subjects, like abstract algebra, produced the longest answers -- which likely means they used more energy to generate compared with fact-based subjects, such as history, Dauner said. AI chatbots that show their step-by-step reasoning while responding tend to use far more energy per question than chatbots that don't. The five reasoning models tested in the study did not answer questions much more accurately than the nine other studied models. The model that emitted the most, DeepSeek-R1, offered answers of comparable accuracy to models that generated a quarter of the emissions.

There is key information not captured by the study, which only included open-source LLMs: Some of the most popular AI programs made by large tech corporations, such as OpenAI's ChatGPT and Google's Gemini, were not included in the results. And because the paper converted the measured energy to emissions based on a global CO₂ average, it only offered an estimate; it did not indicate the actual emissions generated by using these models, which can vary hugely depending on which country the data center running them is in.

"Some regions are going to be powered by electricity from renewable sources, and some are going to be primarily running on fossil fuels," said Jesse Dodge, a senior research scientist at the Allen Institute for AI who was not affiliated with the new research. In 2022, Dodge led a study comparing the difference in greenhouse gas emissions generated by training an LLM in 16 different regions of the world. Depending on the time of year, some of the most emitting areas, like the central United States, had roughly three times the carbon intensity of the least emitting ones, such as Norway.
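The conversion Dauner performed, from measured energy to CO₂ equivalents, is a single multiplication by a grid carbon-intensity factor, and Dodge's point about regional variation follows directly from changing that factor. A sketch, with illustrative intensity values that are assumptions, not figures from the paper:

```python
# Energy-to-emissions conversion: grams of CO2e = kWh x grid carbon intensity.
# Both intensity values below are illustrative assumptions, not study data.
GLOBAL_AVG_G_PER_KWH = 480.0   # assumed global-average grid intensity
LOW_CARBON_G_PER_KWH = 30.0    # assumed mostly-renewable grid, for contrast

def emissions_g(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Grams of CO2 equivalent for a given energy draw and grid mix."""
    return energy_kwh * intensity_g_per_kwh

energy_kwh = 2.0  # hypothetical energy for a batch of queries
print(f"global average grid: {emissions_g(energy_kwh, GLOBAL_AVG_G_PER_KWH):.0f} g CO2e")
print(f"low-carbon grid: {emissions_g(energy_kwh, LOW_CARBON_G_PER_KWH):.0f} g CO2e")
```

The same measured energy can thus map to emissions figures that differ by more than an order of magnitude, which is why the paper's global-average conversion is an estimate rather than a measurement of actual emissions.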
But even with this limitation, the new study fills a gap in research on the trade-off between energy cost and model accuracy, Dodge said. "Everyone knows that as you increase model size, typically models become more capable, use more electricity and have more emissions," he said. Reasoning models, which have been increasingly trendy, are likely further bumping up energy costs, because of their longer answers. "For specific subjects an LLM needs to use more words to get to a more accurate response," Dauner said. "Longer answers and those that use a reasoning process generate more emissions."

Sasha Luccioni, the AI and climate lead at Hugging Face, an AI company, said that subject matter is less important than output length, which is determined by how the model was trained. She also emphasized that the study's sample size is too small to create a complete picture of emissions from AI. "What's relevant here is not the fact that it's math and philosophy, it's the length of the input and the output," she said. Last year, Luccioni published a study that compared 88 LLMs and also found that larger models generally had higher emissions. Her results also indicated that AI text generation -- which is what chatbots do -- used 10 times as much energy compared with simple classification tasks like sorting emails into folders.

Luccioni said that these kinds of "old school" AI tools, including classic search engine functions, have been overlooked as generative models have become more widespread. Most of the time, she said, the average person doesn't need to use an LLM at all. Dodge added that people looking for facts are better off just using a search engine, since generative AI can "hallucinate" false information. "We're reinventing the wheel," Luccioni said. People don't need to use generative AI as a calculator, she said. "Use a calculator as a calculator."

This article originally appeared in The New York Times.
[11]
AI chatbots using reason emit more carbon than those responding concisely, study finds
A study has found that chat-based generative AI emits significantly more carbon when handling complex prompts. Reasoning-enabled models produced up to 50 times more emissions than concise ones. While these models are more accurate, researchers warn of a trade-off between accuracy and sustainability, urging optimisation for environmentally conscious AI development.

A study found that carbon emissions from chat-based generative AI can be six times higher when responding to complex prompts, like abstract algebra or philosophy, compared to simpler prompts, such as high school history. "The environmental impact of questioning trained (large-language models) is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions," first author Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences, Germany, said. "We found that reasoning-enabled models produced up to 50 times more (carbon dioxide) emissions than concise response models," Dauner added.

The study, published in the journal Frontiers in Communication, evaluated how 14 large-language models (which power chatbots), including DeepSeek and Cogito, process information before responding to 1,000 benchmark questions -- 500 multiple-choice and 500 subjective. Each model responded to 100 questions on each of the five subjects chosen for the analysis -- philosophy, high school world history, international law, abstract algebra, and high school mathematics.

"Zero-token reasoning traces appear when no intermediate text is needed (e.g. Cogito 70B reasoning on certain history items), whereas the maximum reasoning burden (6.716 tokens) is observed for the Deepseek R1 7B model on an abstract algebra prompt," the authors wrote. Tokens are virtual objects created by conversational AI when processing a user's prompt in natural language. More tokens lead to increased carbon dioxide emissions.
Chatbots equipped with an ability to reason, or 'reasoning models', produced 543.5 'thinking' tokens per question, whereas concise models -- producing one-word answers -- required just 37.7 tokens per question, the researchers found. Thinking tokens are additional ones that reasoning models generate before producing an answer, they explained. However, more thinking tokens do not necessarily guarantee correct responses; as the team noted, elaborate detail is not always essential for correctness.

Dauner said, "None of the models that kept emissions below 500 grams of CO₂ equivalent achieved higher than 80 per cent accuracy on answering the 1,000 questions correctly." "Currently, we see a clear accuracy-sustainability trade-off inherent in (large-language model) technologies," the author added.

The most accurate performance was seen in the reasoning model Cogito, with a nearly 85 per cent accuracy in responses, whilst producing three times more carbon dioxide emissions than similar-sized models generating concise answers. "In conclusion, while larger and reasoning-enhanced models significantly outperform smaller counterparts in terms of accuracy, this improvement comes with steep increases in emissions and computational demand," the authors wrote. "Optimising reasoning efficiency and response brevity, particularly for challenging subjects like abstract algebra, is crucial for advancing more sustainable and environmentally conscious artificial intelligence technologies," they wrote.
A new study reveals that advanced AI reasoning models produce significantly higher CO₂ emissions compared to more concise models when answering the same questions, highlighting the environmental impact of AI technology.
A groundbreaking study published in Frontiers in Communication has revealed that advanced AI reasoning models can produce up to 50 times more CO₂ emissions than their more concise counterparts when answering the same questions 1. This finding sheds light on the significant environmental impact of increasingly sophisticated artificial intelligence technologies.
Researchers from Hochschule München University of Applied Sciences evaluated 14 different Large Language Models (LLMs), ranging from 7 to 72 billion parameters, using a standardized set of 1,000 benchmark questions across various subjects 2. The study utilized the Perun framework and an NVIDIA A100 GPU to analyze LLM performance and energy requirements.
Key findings include:
- Reasoning-enabled models produced up to 50 times more CO₂ emissions than concise-response models.
- Reasoning models generated an average of 543.5 "thinking" tokens per question, compared with just 37.7 tokens for concise models.
- The most accurate model, the 70-billion-parameter Cogito, reached 84.9% accuracy but produced three times the CO₂ emissions of similar-sized models optimized for concise answers.
- Subjects requiring lengthy reasoning, such as abstract algebra and philosophy, led to up to six times higher emissions than more straightforward subjects, like high school history.
The study highlights a clear trade-off between AI accuracy and environmental sustainability. Maximilian Dauner, the study's first author, stated, "None of the models that kept emissions below 500 grams of CO₂ equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly" 2.
This trade-off poses a significant challenge for AI developers and users alike. As model size increases, accuracy tends to improve, but at the cost of substantially higher CO₂ emissions and token generation 1.
The environmental impact of AI models becomes particularly concerning when considering their widespread use. With approximately 52% of American adults regularly using LLMs, the cumulative effect on carbon emissions could be substantial 5.
To put this into perspective:
- Having DeepSeek R1 (70 billion parameters) answer 600,000 questions would create CO₂ emissions equal to a round-trip flight from London to New York.
- Qwen 2.5 (72 billion parameters) can answer more than three times as many questions (about 1.9 million) with similar accuracy while generating the same emissions.
The findings of this study have significant implications for both AI developers and users:
- Users can reduce emissions by prompting AI to generate concise answers and by reserving high-capacity reasoning models for tasks that genuinely require them.
- Developers can prioritize optimising reasoning efficiency and response brevity, particularly for challenging subjects like abstract algebra.
- Smaller models can handle many everyday queries just as well as larger ones, making model choice an important sustainability lever.
As Dauner suggests, "If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies" 2.
As AI continues to evolve and integrate into various aspects of our lives, addressing its environmental impact becomes increasingly crucial. This study serves as a wake-up call for the tech industry and policymakers to consider sustainability alongside performance in AI development and deployment strategies.
The challenge moving forward will be to strike a balance between the undeniable benefits of advanced AI reasoning models and their environmental costs, ensuring that the pursuit of artificial intelligence doesn't come at the expense of our planet's health.