Curated by THEOUTPOST
On Sat, 5 Apr, 12:05 AM UTC
45 Sources
[1]
Meta's surprise Llama 4 drop exposes the gap between AI ambition and reality
On Saturday, Meta released its newest Llama 4 multimodal AI models in a surprise weekend move that caught some AI experts off guard. The announcement touted Llama 4 Scout and Llama 4 Maverick as major advancements, with Meta claiming top performance in their categories and an enormous 10 million token context window for Scout. But the open-weights models have so far received a mixed-to-negative reception from the AI community, highlighting a familiar tension between AI marketing and user experience.

"The vibes around llama 4 so far are decidedly mid," said independent AI researcher Simon Willison in a short interview with Ars Technica. Willison often checks the community pulse around open source and open weights AI releases in particular.

While Meta positions Llama 4 in competition with closed-model giants like OpenAI and Google, the company continues to use the term "open source" despite licensing restrictions that prevent truly open use. As we have noted with previous Llama releases, "open weights" more accurately describes Meta's approach. Those who sign in and accept the license terms can download the two smaller Llama 4 models from Hugging Face or llama.com.

The company describes the new Llama 4 models as "natively multimodal," built from the ground up to handle both text and images using a technique called "early fusion." Meta says this allows joint training on text, images, and video frames, giving the models a "broad visual understanding." This approach ostensibly puts Llama 4 in direct competition with existing multimodal models from OpenAI (such as GPT-4o) and Google (Gemini 2.5).

The company trained the two new models with assistance from an even larger, unreleased "teacher" model named Llama 4 Behemoth (with 2 trillion total parameters), which is still in development. Parameters are the numerical values a model adjusts during training to learn patterns.
Fewer parameters mean smaller, faster models that can run on phones or laptops, though creating high-performing compact models remains a major AI engineering challenge.

Meta constructed the Llama 4 models using a mixture-of-experts (MoE) architecture, which is one way around the limitations of running huge AI models. Think of MoE like having a large team of specialized workers: instead of everyone working on every task, only the relevant specialists activate for a specific job. For example, Llama 4 Maverick has 400 billion total parameters, but only 17 billion of those are active at once, routed through one of 128 experts. Likewise, Scout has 109 billion total parameters, but only 17 billion are active at once, routed through one of 16 experts. This design can reduce the computation needed to run the model, since only a small portion of the neural network's weights is active at any given time.

Llama's reality check arrives quickly

Current AI models have a relatively limited short-term memory. In AI, a context window acts somewhat like that memory, determining how much information a model can process at once. AI language models like Llama typically process that memory as chunks of data called tokens, which can be whole words or fragments of longer words. Large context windows allow AI models to process longer documents, larger code bases, and longer conversations.

Despite Meta's promotion of Llama 4 Scout's 10 million token context window, developers have so far found that using even a fraction of that amount is challenging due to memory limitations. Simon Willison reported on his blog that third-party services providing access, like Groq and Fireworks, limited Scout's context to just 128,000 tokens. Another provider, Together AI, offered 328,000 tokens. Evidence suggests accessing larger contexts requires immense resources.
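The specialist-routing idea behind mixture-of-experts can be illustrated with a toy sketch. This is not Meta's implementation: the 16-expert, top-1 routing mirrors the reported Scout configuration, but the sizes and random weights below are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 16   # Scout-style expert count
D_MODEL = 64       # toy hidden size (real models use thousands)

# Each "expert" is a tiny feed-forward layer; only the routed one runs.
expert_weights = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.01
                  for _ in range(NUM_EXPERTS)]
router_weights = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.01

def moe_layer(token_vec):
    """Route a single token vector to its top-scoring expert."""
    scores = token_vec @ router_weights   # one score per expert
    chosen = int(np.argmax(scores))       # top-1 routing
    # Only the chosen expert's parameters are touched for this token,
    # which is why active parameters << total parameters.
    return expert_weights[chosen] @ token_vec, chosen

token = rng.standard_normal(D_MODEL)
out, expert_id = moe_layer(token)
print(f"token routed to expert {expert_id}; output shape {out.shape}")
```

Because only one expert's weight matrix is multiplied per token, a 16-expert layer touches roughly a sixteenth of its feed-forward parameters on each step, which is the effect behind the "17 billion active out of 109 billion total" figures.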
Willison pointed to Meta's own example notebook ("build_with_llama_4"), which states that running a 1.4 million token context needs eight high-end NVIDIA H100 GPUs.

Willison also documented his own testing troubles. When he asked Llama 4 Scout via the OpenRouter service to summarize a long online discussion (around 20,000 tokens), the result wasn't useful. He described it as "complete junk output" that devolved into repetitive loops.

Meta claims that Maverick, the larger of its two new Llama 4 models, outperforms competitors like OpenAI's GPT-4o and Google's Gemini 2.0 on various technical benchmarks, which, as we usually note, are not necessarily useful reflections of everyday user experience. So far, independent verification of the released model's performance claims remains limited.

More interestingly, a version of Llama 4 is currently perched at No. 2 on the popular Chatbot Arena LLM vibemarking leaderboard. However, even this comes with a catch: Willison noted a distinction pointed out in Meta's own announcement. The high-ranking leaderboard entry referred to an "experimental chat version scoring ELO of 1417 on LMArena," different from the Maverick model made available for download.

A potential technical dead-end

The Llama 4 release sparked discussion on social media about AI development trends, with reactions including mild disappointment over lackluster multimodal features, concerns that its mixture-of-experts architecture used too few activated parameters (only 17 billion), and criticisms that the release felt rushed or poorly managed internally. Some Reddit users also noted it compared unfavorably with innovative competitors such as DeepSeek and Qwen, particularly highlighting its underwhelming performance in coding tasks and software development benchmarks.

On X, researcher Andriy Burkov, author of "The Hundred-Page Language Models Book," argued the underwhelming Llama 4 launch reinforces skepticism about monolithic base models.
He stated that recent "disappointing releases of both GPT-4.5 and Llama 4 have shown that if you don't train a model to reason with reinforcement learning, increasing its size no longer provides benefits." Burkov's mention of GPT-4.5 echoes that model's somewhat troubled launch; Ars Technica previously reported that GPT-4.5 faced mixed reviews, with its high cost and performance limitations suggesting a potential dead-end for simply scaling up traditional AI model architectures. This observation aligns with broader discussions in the AI field about the scaling limitations of training massive base models without incorporating newer techniques (such as simulated reasoning or training smaller, purpose-built models). Despite all of the current drawbacks with Meta's new model family, Willison is optimistic that future releases of Llama 4 will prove more useful. "My hope is that we'll see a whole family of Llama 4 models at varying sizes, following the pattern of Llama 3," he wrote on his blog. "I'm particularly excited to see if they produce an improved ~3B model that runs on my phone."
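The resource demands behind the context-window limits described above, including the eight-H100 figure for a 1.4 million token context, are consistent with a back-of-envelope memory estimate. The sketch below uses the standard key/value-cache formula; the layer count, head count, and head dimension are illustrative assumptions for a roughly 100B-total-parameter model, not confirmed Scout specifications.

```python
# Back-of-envelope: why a 1.4M-token context is memory-hungry.
# All architecture numbers below are illustrative assumptions,
# not confirmed Llama 4 Scout specifications.

GiB = 1024 ** 3

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val=2):
    """Memory for the key/value cache: 2 tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val

# Assumed shape, bf16 precision (2 bytes per value).
cache = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128,
                       seq_len=1_400_000)
weights = 109e9 * 2            # 109B parameters at 2 bytes each (bf16)

h100_mem = 80 * GiB            # one H100 carries 80 GB of HBM
total = cache + weights
print(f"KV cache: {cache / GiB:.0f} GiB")
print(f"weights:  {weights / GiB:.0f} GiB")
print(f"H100s needed (memory only): {total / h100_mem:.1f}")
```

With these assumed numbers the cache alone tops 250 GiB, and adding bf16 weights pushes the total past what a handful of 80 GB GPUs can hold, which helps explain why hosted providers cap the context far below 10 million tokens.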
[2]
Meta releases Llama 4, a new crop of flagship AI models | TechCrunch
Meta has released a new collection of AI models, Llama 4, in its Llama family -- on a Saturday, no less. There are three new models in total: Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth. All were trained on "large amounts of unlabeled text, image, and video data" to give them "broad visual understanding," Meta says.

The success of open models from Chinese AI lab DeepSeek, which perform on par or better than Meta's previous flagship Llama models, reportedly kicked Llama development into overdrive. Meta is said to have scrambled war rooms to decipher how DeepSeek lowered the cost of running and deploying models like R1 and V3.

Scout and Maverick are openly available on Llama.com and from Meta's partners, including the AI dev platform Hugging Face, while Behemoth is still in training. Meta says that Meta AI, its AI-powered assistant across apps including WhatsApp, Messenger, and Instagram, has been updated to use Llama 4 in 40 countries. Multimodal features are limited to the U.S. in English for now.

Some developers may take issue with the Llama 4 license. Users and companies "domiciled" or with a "principal place of business" in the EU are prohibited from using or distributing the models, likely the result of governance requirements imposed by the region's AI and data privacy laws. (In the past, Meta has decried these laws as overly burdensome.) In addition, as with previous Llama releases, companies with more than 700 million monthly active users must request a special license from Meta, which Meta can grant or deny at its sole discretion.

"These Llama 4 models mark the beginning of a new era for the Llama ecosystem," Meta wrote in a blog post. "This is just the beginning for the Llama 4 collection."

Meta says that Llama 4 is its first cohort of models to use a mixture of experts (MoE) architecture, which is more computationally efficient for training and answering queries.
MoE architectures basically break down data processing tasks into subtasks and then delegate them to smaller, specialized "expert" models. Maverick, for example, has 400 billion total parameters, but only 17 billion active parameters across 128 "experts." (Parameters roughly correspond to a model's problem-solving skills.) Scout has 17 billion active parameters, 16 experts, and 109 billion total parameters. According to Meta's internal testing, Maverick, which the company says is best for "general assistant and chat" use cases like creative writing, exceeds models such as OpenAI's GPT-4o and Google's Gemini 2.0 on certain coding, reasoning, multilingual, long-context, and image benchmarks. However, Maverick doesn't quite measure up to more capable recent models like Google's Gemini 2.5 Pro, Anthropic's Claude 3.7 Sonnet, and OpenAI's GPT-4.5. Scout's strengths lie in tasks like document summarization and reasoning over large codebases. Uniquely, it has a very large context window: 10 million tokens. ("Tokens" represent bits of raw text -- e.g., the word "fantastic" split into "fan," "tas" and "tic.") In plain English, Scout can take in images and up to millions of words, allowing it to process and work with extremely large documents. Scout can run on a single Nvidia H100 GPU, while Maverick requires an Nvidia H100 DGX system, according to Meta. Meta's unreleased Behemoth will need even beefier hardware. According to the company, Behemoth has 288 billion active parameters, 16 experts, and nearly two trillion total parameters. Meta's internal benchmarking has Behemoth outperforming GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro (but not 2.5 Pro) on several evaluations measuring STEM skills like math problem solving. Of note, none of the Llama 4 models is a proper "reasoning" model along the lines of OpenAI's o1 and o3-mini. 
Reasoning models fact-check their answers and generally respond to questions more reliably, but as a consequence take longer than traditional, "non-reasoning" models to deliver answers.

Interestingly, Meta says that it tuned all of its Llama 4 models to refuse to answer "contentious" questions less often. According to the company, Llama 4 responds to "debated" political and social topics that the previous crop of Llama models wouldn't. In addition, the company says, Llama 4 is "dramatically more balanced" with which prompts it flat-out won't entertain.

"[Y]ou can count on [Llama 4] to provide helpful, factual responses without judgment," a Meta spokesperson told TechCrunch. "[W]e're continuing to make Llama more responsive so that it answers more questions, can respond to a variety of different viewpoints [...] and doesn't favor some views over others."

Those tweaks come as White House allies accuse AI of political wokeness. Many of President Donald Trump's close confidants, including Elon Musk and crypto and AI "czar" David Sacks, have alleged that many AI chatbots censor conservative viewpoints. Sacks has historically singled out OpenAI's ChatGPT in particular as "programmed to be woke" and untruthful about politically sensitive subjects.

In truth, bias in AI is an intractable technical problem. Musk's own AI company, xAI, has struggled to create a chatbot that doesn't endorse some political views over others. That hasn't stopped companies including OpenAI from adjusting their AI models to answer more questions than they would have previously, in particular questions on controversial political subjects.
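To put numbers like Scout's 10-million-token window, described earlier in this piece, into everyday terms: a quick conversion using the common rule of thumb that one token is roughly 0.75 English words (an approximation, not an exact figure) gives a sense of scale.

```python
# Rough scale of a 10M-token context window. The ~0.75 words-per-token
# ratio is a common rule of thumb for English text, not an exact figure.
CONTEXT_TOKENS = 10_000_000
WORDS_PER_TOKEN = 0.75          # heuristic assumption
WORDS_PER_PAGE = 500            # a dense printed page, roughly

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(f"~{words:,.0f} words, or roughly {pages:,.0f} pages")
```

By this heuristic the window corresponds to millions of words, on the order of tens of thousands of printed pages, which matches the "up to millions of words" framing above.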
[3]
Meta exec denies the company artificially boosted Llama 4's benchmark scores | TechCrunch
A Meta exec on Monday denied a rumor that the company trained its new AI models to present well on specific benchmarks while concealing the models' weaknesses. The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it's "simply not true" that Meta trained its Llama 4 Maverick and Llama 4 Scout models on "test sets." In AI benchmarks, test sets are collections of data used to evaluate the performance of a model after it's been trained. Training on a test set could misleadingly inflate a model's benchmark scores, making the model appear more capable than it actually is. Over the weekend, an unsubstantiated rumor that Meta artificially boosted its new models' benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site from a user claiming to have resigned from Meta in protest over the company's benchmarking practices. Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta's decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena. Al-Dahle acknowledged that some users are seeing "mixed quality" from Maverick and Scout across the different cloud providers hosting the models. "Since we dropped the models as soon as they were ready, we expect it'll take several days for all the public implementations to get dialed in," Al-Dahle said. "We'll keep working through our bug fixes and onboarding partners."
[4]
Meta Dropped Llama 4: What to Know About the Two New AI Models
Katelyn is a writer with CNET covering social media, AI and online services. She graduated from the University of North Carolina at Chapel Hill with a degree in media and journalism. You can often find her with a novel and an iced coffee during her time off.

Meta unveiled its latest family of generative AI models on Saturday. You can give the Llama 4 models a test drive now through Meta AI's website, and Llama 4 will soon power the many Meta AI features on Instagram, WhatsApp and Messenger.

The competition between Meta and other AI companies is becoming increasingly intense. Companies are working to build and release AI models that are capable of more complex tasks and advanced reasoning without requiring vast amounts of computing power and cash to run. It's a tricky sweet spot to hit, and Meta is hoping that its newest models will put it ahead of competitors like ChatGPT and Gemini.

There are two models in the Llama 4 family available now: Scout and Maverick. They're multimodal, which means they can work with images as well as text, and they're open-weights models, which means developers can get some insight into how the models are built -- you can see how the model makes connections and how certain characteristics are given more weight as it learns. OpenAI announced earlier this month that it is developing an open-weights model for the first time.

Scout is the smallest model of the family, designed to run on a single Nvidia H100 GPU. If you're a numbers person, Scout has a 10 million token context window and 17 billion active parameters across 16 experts (sub-networks within the model, allowing it to run tasks more efficiently). That's more than twice the firepower of Llama 3's 8-billion-parameter version. Generally, the more parameters a model has, the more capable it is, though larger models are also slower and more expensive to run.
Maverick is a mid-sized model, the big brother to Scout, featuring 17 billion active parameters with 128 experts. Meta said benchmark tests showed Maverick beating GPT-4o and DeepSeek V3 at text generation, though DeepSeek still has the edge when it comes to reasoning and coding. CNET hasn't independently verified Meta's benchmarking tests.

More information on the rest of the Llama 4 family, including a base model named Behemoth and a Llama 4 reasoning model, is expected to come later this month, according to a video CEO Mark Zuckerberg posted. We'll likely learn more about these models at LlamaCon, the company's first annual AI developers conference beginning on April 29.
[5]
Meta AI gets two new models as Meta releases Llama 4
Wes Davis is a weekend editor who covers the latest in tech and entertainment. He has written news, reviews, and more as a tech journalist since 2020. Meta has announced the release of Llama 4, its newest collection of AI models that now power Meta AI on the web and in WhatsApp, Messenger, and Instagram Direct. The two models, also available to download from Meta or Hugging Face now, are Llama 4 Scout, a small model capable of "fitting in a single Nvidia H100 GPU," and Llama 4 Maverick, which is more akin to GPT-4o and Gemini 2.0 Flash. And the company says it's in the process of training Llama 4 Behemoth, which Meta CEO Mark Zuckerberg says on Instagram is "already the highest performing base model in the world."
[6]
Meta debuts first models from the Llama 4 herd
Meta has debuted the first two models in its Llama 4 family, its first to use mixture of experts tech. A Saturday post from the social media giant announced the release of the two models, Llama 4 Scout and Llama 4 Maverick.

Mixture of Experts (MoE) is an approach to machine learning that divides a task into several smaller jobs and assigns each to a neural network subsystem tuned to solving that sort of problem. Each expert solves its own part of a problem, and that work is combined into a single response. DeepSeek-V3 is a MoE model. So is Mistral.ai's Mixtral 8x7B. OpenAI has neither confirmed nor denied it already uses MoE but has hinted doing so is in its future plans, because the approach is widely felt to produce better output with fewer resources.

Scout and Maverick are based on Llama 4 Behemoth, which is still in training. Meta says Behemoth has 288B active parameters, 16 experts, and nearly two trillion total parameters. Meta's post states the pre-training process for Behemoth is being done using FP8 and 32K GPUs, which achieved 390 TFLOPs/GPU. "The overall data mixture for training consisted of more than 30 trillion tokens, which is more than double the Llama 3 pre-training mixture and includes diverse text, image, and video datasets," the post adds.

Zuck's AI squad also claimed it developed "a new training technique which we refer to as MetaP that allows us to reliably set critical model hyper-parameters such as per-layer learning rates and initialization scales." That's made it possible for Llama 4 to enable "open source fine-tuning efforts by pre-training on 200 languages, including over 100 with over 1 billion tokens each, and overall 10x more multilingual tokens than Llama 3."

Meta's post doesn't detail the corpus used to train the Llama 4 models, an important issue given the company is accused of using pirated content to train its models. The post announcing the Llama 4 models includes multiple benchmark results that mostly show Meta's models outperforming rivals on myriad metrics.
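Meta's stated figures above, more than 30 trillion training tokens on 32K GPUs sustaining 390 TFLOPs each, allow a rough training-time estimate using the widely cited approximation FLOPs ≈ 6 × parameters × tokens. Treating Behemoth's reported 288B active parameters as the per-token working set is itself a simplifying assumption for MoE models, so treat this as a sketch, not Meta's accounting.

```python
# Rough training-time estimate from Meta's stated figures, using the
# common approximation: training FLOPs ≈ 6 × parameters × tokens.
# For MoE models only active parameters do work per token, so we use
# Behemoth's reported 288B active parameters (a simplifying assumption).
ACTIVE_PARAMS = 288e9
TOKENS = 30e12                 # "more than 30 trillion tokens"
GPUS = 32_000
FLOPS_PER_GPU = 390e12         # Meta's reported 390 TFLOPs/GPU

total_flops = 6 * ACTIVE_PARAMS * TOKENS
cluster_rate = GPUS * FLOPS_PER_GPU
days = total_flops / cluster_rate / 86_400
print(f"~{total_flops:.2e} FLOPs, roughly {days:.0f} days at that rate")
```

The answer lands in the range of several weeks of sustained cluster time, a plausible order of magnitude for a frontier-scale pre-training run, though real runs include restarts, evaluation, and lower effective utilization.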
Meta also claimed it's fixed the tendency for large language models to deliver results that align with left wing political thought. "It's well-known that all leading LLMs have had issues with bias -- specifically, they historically have leaned left when it comes to debated political and social topics," Meta's launch post states, before attributing that to "the types of training data available on the internet."

Meta has therefore made Llama 4 models "more responsive so that it answers questions, can respond to a variety of different viewpoints without passing judgment, and doesn't favor some views over others." That translates into Llama becoming "dramatically more balanced with which prompts it refuses to respond to" and responding "with strong political lean at a rate comparable to [X AI's] Grok (and at half of the rate of Llama 3.3) on a contentious set of political or social topics." Meta wants to "drive this rate further down" so that its LLMs rule fewer political topics out of bounds.

Meta also claimed it did plenty of work to ensure these models produce safe output. One such initiative is modestly called Generative Offensive Agent Testing (GOAT). Apparently GOAT improves on existing LLM red-teaming "by simulating multi-turn interactions of medium-skilled adversarial actors, helping us increase our testing coverage and raise vulnerabilities faster." GOAT has "allowed our expert human red teamers to focus on more novel adversarial areas, while the automation focuses on known risk areas. This makes the process more efficient and effective, and it enables us to build a better quantitative and qualitative picture of risk."

You can put those claims to the test by downloading the new models from Meta or the Hugging Face model-mart. They're available for download "in keeping with our commitment to open source," according to Meta's announcement post.
However the Open Source Initiative has claimed the Llama 4 Community Model is not open source because users in the European Union are denied some rights offered to users elsewhere. ®
[7]
Meta accused of Llama 4 bait-n-switch to juice LMArena rank
Did Facebook giant rizz up LLM to win over human voters? It appears so.

Meta submitted a specially crafted, non-public variant of its Llama 4 AI model to an online benchmark that may have unfairly boosted its leaderboard position over rivals.

The LLM was uploaded to LMArena, a popular site that pits models against each other. It's admittedly more a popularity contest than a benchmark, as you can select two submitted models to compete head-to-head, give the pair an input prompt for each to answer, and vote on the best output. Thousands of these votes are collected and used to draw up a leaderboard of crowd-sourced LLM performance.

According to LMArena on Monday, Meta provided a version of Llama 4 that is not publicly available and was seemingly specifically designed to charm those human voters, potentially giving it an edge in the rankings over publicly available competitors. And we wonder where artificially intelligent systems get their Machiavellian streak from.

"Early analysis shows style and model response tone was an important factor -- demonstrated in style control ranking -- and we are conducting a deeper analysis to understand more," the chatbot ranking platform said Monday evening. "Meta should have made it clearer that Llama-4-Maverick-03-26-Experimental was a customized model to optimize for human preference," the group added.

Dropped on the world in a rather unusual Saturday launch, Meta's now publicly available Llama 4 model codenamed Maverick was heralded for its LMArena performance. An "experimental" build of the model sat at number two in the chatbot leaderboard, just behind Google's Gemini-2.5-Pro-Exp-03-25 release.

To back up its claims that the version of the model submitted for testing was a special custom job, LMArena published a full breakdown.
"To ensure full transparency, we're releasing 2,000-plus head-to-head battle results for public review. This includes user prompts, model responses, and user preferences," the team said. From the results published by LMArena to Hugging Face, the "experimental" version of Llama 4 Maverick, the one that went head to head against rivals in the arena, appeared to produce verbose results often peppered with emojis. The public version, the one you'd deploy in applications, produced far more concise responses that were generally devoid of emojis. It's important for Meta to provide publicly available versions of its models for the contest so that when people come to pick and use LLMs in applications, they get the neural network they were expecting and others had rated. In this case, it appears the "experimental" version for the contest differed from the official release. The Facebook giant did not deny any of this. "We experiment with all types of custom variants," a Meta spokesperson told El Reg. "Llama-4-Maverick-03-26-Experimental is a chat optimized version we experimented with that also performs well on LMArena. We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We're excited to see what they will build and look forward to their ongoing feedback." Meta for its part wasn't hiding the fact this was an experimental build. In its launch blog post, the Instagram parent wrote that "Llama 4 Maverick offers a best-in-class performance to cost ratio with an experimental chat version scoring ELO of 1417 on LMArena." However, many assumed the experimental model was a beta-style preview, substantially similar to the version released to model hubs like Hugging Face on Saturday.
Suspicions were raised after excited netizens began getting their hands on the official model only to be met with lackluster results. The disconnect between Meta's benchmark claims and public perception was big enough that Meta GenAI head Ahmad Al-Dahle weighed in on Monday, pointing to inconsistent performance across inference platforms that, he said, still needed time to be properly tuned. "We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were ready, we expect it'll take several days for all the public implementations to get dialed in," Al-Dahle said. These kinds of issues are known to crop up with new model releases, particularly those that employ novel architectures or implementations. In our testing of Alibaba's QwQ, we found that misconfiguring the model hyperparameters could result in excessively long responses. Al-Dahle also denied allegations Meta had cheated by training Llama 4 on LLM benchmark test sets. "We've also heard claims that we trained on test sets - that's simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations." The denial followed online speculation that Meta's leadership had suggested blending test sets from AI benchmarks to produce a more presentable result. In response to this incident, LMArena says it has updated its "leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn't occur in the future. Meta's interpretation of our policy did not match what we expect from model providers." It also plans to upload the public release of Llama 4 Maverick from Hugging Face to the leaderboard arena. ®
[8]
Meta nears release of new AI model Llama 4 this month, the Information reports
April 4 (Reuters) - Meta Platforms (META.O) plans to release the latest version of its large language model later this month, after delaying it at least twice, the Information reported on Friday, as the Facebook owner scrambles to lead in the AI race. Meta, however, could push back the release of Llama 4 again, the report said, citing two people familiar with the matter. Big technology firms have been investing aggressively in AI infrastructure following the success of OpenAI's ChatGPT, which altered the tech landscape and drove investment into machine learning. The report said one of the reasons for the delay is that, during development, Llama 4 did not meet Meta's expectations on technical benchmarks, particularly in reasoning and math tasks. The company was also concerned that Llama 4 was less capable than OpenAI's models in conducting humanlike voice conversations, the report added. Meta plans to spend as much as $65 billion this year to expand its AI infrastructure, amid investor pressure on big tech firms to show returns on their investments. Additionally, the rise of the popular, lower-cost model from Chinese tech firm DeepSeek challenges the belief that developing the best AI model requires billions of dollars. The report said Llama 4 is expected to borrow certain technical aspects from DeepSeek, with at least one version slated to employ a machine-learning technique called mixture of experts, which trains separate parts of models for specific tasks, making them experts in those areas. Meta has also considered releasing Llama 4 through Meta AI first and then as open-source software later, the report said. Last year, Meta released its mostly free Llama 3 AI model, which can converse in eight languages, write higher-quality computer code and solve more complex math problems than previous versions.
Reporting by Priyanka.G in Bengaluru; Editing by Vijay Kishore
[9]
Meta releases new AI model Llama 4
April 5 (Reuters) - Meta Platforms (META.O) on Saturday released the latest versions of its large language model (LLM) Llama: Llama 4 Scout and Llama 4 Maverick. Meta said Llama is a multimodal AI system. Multimodal systems are capable of processing and integrating various types of data including text, video, images and audio, and can convert content across these formats. Meta said in a statement that Llama 4 Scout and Llama 4 Maverick are its "most advanced models yet" and "the best in their class for multimodality." Meta added that Llama 4 Maverick and Llama 4 Scout will be open source software. It also said it was previewing Llama 4 Behemoth, which it called "one of the smartest LLMs in the world and our most powerful yet to serve as a teacher for our new models." Big technology firms have been investing aggressively in artificial intelligence (AI) infrastructure following the success of OpenAI's ChatGPT, which altered the tech landscape and drove investment into machine learning. The Information reported on Friday that Meta had delayed the launch of its LLM's latest version because, during development, Llama 4 did not meet Meta's expectations on technical benchmarks, particularly in reasoning and math tasks. The company was also concerned that Llama 4 was less capable than OpenAI's models in conducting humanlike voice conversations, the report added. Meta plans to spend as much as $65 billion this year to expand its AI infrastructure, amid investor pressure on big tech firms to show returns on their investments. Reporting by Rishabh Jaiswal in Bengaluru; Editing by David Gregorio
[10]
Meta debuts new Llama 4 models, but most powerful AI model is still to come
Meta on Saturday released the first models from its latest open-source artificial intelligence software Llama 4, as the company scrambles to lead in generative AI. But the Facebook owner said it has not yet released the biggest and most powerful Llama 4 model, which it says outperforms other AI models in its class and serves as "a teacher for our new models." That so-called Llama 4 Behemoth model is still in training, according to the company's blog post. Llama 4 will help power AI agents, which will be capable of new levels of reasoning and action, Meta chief product officer Chris Cox said in March. Those agents will be able to surf the web and handle many tasks that could be of use to consumers and businesses. Users can try the two newly released Llama 4 models, Llama 4 Scout and Llama 4 Maverick, on Meta AI in WhatsApp, Messenger, Instagram Direct or the Meta AI website. "Our goal is to build the world's leading AI, open source it, and make it universally accessible so that everyone in the world benefits," Meta CEO Mark Zuckerberg said in a video on Instagram. "And I've said for a while that I think that open source AI is going to become the leading models, and with Llama 4 this is starting to happen," he said. "Meta AI is getting a big upgrade today." Meta will host its first LlamaCon AI conference on April 29. The company is also expected to announce a standalone app for its Meta AI chatbot in the second quarter, CNBC reported in February.
[11]
Meta introduces Llama 4 with two new models available now, and two more on the way
Meta has released the first two models from its Llama 4 suite: Llama 4 Scout and Llama 4 Maverick. Maverick is "the workhorse" and excels at image and text understanding for "general assistant and chat use cases," the company said in a blog post, while the smaller model Scout could tackle things like "multi-document summarization, parsing extensive user activity for personalized tasks, and reasoning over vast codebases." The company also introduced Llama 4 Behemoth, an upcoming model it says is "among the world's smartest LLMs" -- and CEO Mark Zuckerberg said we'll be hearing about a fourth model, Llama 4 Reasoning, "in the next month." Both Maverick and Scout are available to download now from the Llama website and Hugging Face, and they've been added to Meta AI, including for WhatsApp, Messenger and Instagram DMs. Scout has 17 billion active parameters with 16 experts, Meta says. According to Zuckerberg, "It's extremely fast, natively multimodal, and has an industry leading, nearly infinite 10 million token context length, and it is designed to run on a single GPU." Maverick on the other hand has 17 billion active parameters with 128 experts, and the company says it beats competitors like GPT-4o and Gemini 2.0 on coding, reasoning, multilingual, long-context, and image benchmarks, and stacks up against DeepSeek v3.1 on reasoning and coding. Zuckerberg is already calling the upcoming Behemoth model, which is still training, "the highest performing base model in the world," with 288 billion active parameters, according to the company. It may not be here yet, but it's likely we'll be hearing a lot more about that and the Reasoning model soon; Meta's big AI dev conference, LlamaCon, is just a few weeks away.
[12]
Meta Cheated on AI Benchmarks and It's a Glimpse Into a New Golden Age
The quest to be number one sometimes includes just a little cheating. Meta cheated on an AI benchmark, and that is hilarious. According to Kylie Robison at The Verge, the suspicions started percolating after Meta released two new AI models based on its Llama 4 large language model over the weekend. The new models are Scout, a smaller model intended for quick queries, and Maverick, which is meant to be a super efficient rival to more well-known models like OpenAI's GPT-4o (the harbinger of our Miyazaki apocalypse). In the blog post announcing them, Meta did what every AI company now does with a major release. They dropped a whole bunch of highly technical data to brag about how Meta's AI was smarter and more efficient than models from companies better associated with AI: Google, OpenAI, and Anthropic. These release posts are always mired in deeply technical data and benchmarks that are hugely beneficial to researchers and the most AI obsessive, but kind of useless for the rest of us. Meta's announcement was no different. But plenty of AI obsessives immediately noticed one shocking benchmark result Meta highlighted in its post. Maverick had an Elo score of 1417 in LMArena. LMArena is an open-source collaborative benchmarking tool where users can vote on the best output. A higher score is better, and Maverick's 1417 put it in the number 2 spot on LMArena's leaderboard, just above GPT-4o and just below Gemini 2.5 Pro. The whole AI ecosystem rumbled with surprise at the results. Then they started digging, and quickly noted that in the fine print, Meta had acknowledged the Maverick model crushing it on LMArena was a tad different than the version users have access to. The company had programmed this model to be more chatty than usual. Effectively it charmed the benchmark into submission. It doesn't seem like LMArena was pleased with the charm offensive. "Meta’s interpretation of our policy did not match what we expect from model providers," it said in a statement on X.
"Meta should have made it clearer that 'Llama-4-Maverick-03-26-Experimental' was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future." I love LMArena's optimism here, because gaming a benchmark feels like a rite of passage in consumer technology, and I suspect this trend will continue. I've been covering consumer technology for over a decade; I once ran one of the more extensive benchmarking labs in the industry, and I have seen plenty of phone and laptop makers attempt all kinds of tricks to juice their scores. They messed with display brightness for better battery life and shipped bloatware-free versions of laptops to reviewers to get better performance scores. Now AI models are getting more chatty to juice their scores too. And the reason I suspect this won't be the last carefully cultivated score is that right now these companies are desperate to distinguish their large language models from one another. If every model can help you write a shitty English paper five minutes before class, then you'll need another reason to distinguish your preference. "My model uses less energy and accomplishes the task 2.46% faster," might not seem like the biggest brag to all, but it matters. That's still 2.46% faster than everyone else. As these AIs continue to mature into actual consumer-facing products, we'll start seeing more benchmark bragging. Hopefully, we'll see the other stuff too. User interfaces will start to change, and goofy storefronts like the Explore GPTs section of the ChatGPT app will become more common. These companies are going to need to prove why their models are the best models, and benchmarks alone won't do that. Not when a chatty bot can game the system so easily.
[13]
Meta just launched Llama 4 -- here's why ChatGPT, Gemini and Claude should be worried
AI models are being launched left, right, and center. The latest comes from Meta, revamping its Llama series with three new powerful models aiming at the top spot with new levels of power and customization. Llama 4 consists of three new models: Scout, Maverick, and Behemoth. While each model has a different expertise, Meta claims that all three are some of the best in the world, benchmarking ahead of its competitors. Scout and Maverick are already available on Llama's website or via Hugging Face, the popular AI hosting platform. While Behemoth has been announced, it is still in training with no clear date announced yet. Meta AI, the software found in WhatsApp, Facebook Messenger, Instagram, and other Meta-owned apps, will now run on Llama 4 in 40 different countries. Because of EU regulations, Meta is currently unable to use or distribute the models in this region. All three models were trained on large amounts of unlabeled text, image and video data, which Meta has stated gives them a broad understanding. Following the trend in AI to move towards a more open system, these models are open-weight. That means users can customize parts of Scout, Maverick, and eventually Behemoth to fit their own needs. Meta also noted that all models had been tuned to be less likely to refuse contentious questions. Following in the footsteps of OpenAI, this means Llama 4 will be more likely to engage in political discussion or controversial subject matters. So what does each model do, and what sets them apart? Scout is a multimodal model. It can process and deal with multiple data types like text, image, and audio. It is designed for tasks like document summarization and reasoning across big databases. It can take in images alongside millions of words and process them. This is especially useful for dealing with large documents. For example, reading through research papers and picking out key information, or summarizing interviews and meetings.
Meta has claimed that Scout can deal with texts longer than five million words. In Meta's example, the model is able to rapidly answer questions about and search through texts longer than this. Maverick is the kind of model most of us have come to know with AI. Like ChatGPT, this is designed to handle image and text understanding. Meta describes it as its workhorse model for general assistant and chat use cases. Because of this, Maverick has a larger total number of parameters and experts built into its system than Scout. If you think of an AI model as a brain, this simply means Maverick is more knowledgeable, having built up a larger knowledge base and a wider set of skills. However, while it's more knowledgeable than Scout, its memory isn't as good. It can understand a wider array of tasks and offer creativity or resourceful thinking that Scout doesn't have, but can't process as much information or remember as much conversational context. The yet-to-be-released Behemoth is, as the name suggests, Meta's largest model. It combines the best of both Scout and Maverick, offering both a huge knowledge base and a long contextual understanding. In Meta's internal benchmarking, this outperformed GPT-4.5, Claude 3.7 Sonnet and Gemini 2.0 Pro (but not 2.5 Pro) on an array of STEM skills like problem-solving or math puzzles.
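The long-context claims in this section reduce to simple token arithmetic. The conversion factors below are common rules of thumb, not figures Meta published: roughly 0.75 English words per token and about 500 words per printed page.

```python
# Back-of-envelope: what does Scout's 10-million-token context window hold?
# Assumptions (rules of thumb, not Meta's numbers):
#   ~0.75 English words per token, ~500 words per printed page.

CONTEXT_TOKENS = 10_000_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

words = CONTEXT_TOKENS * WORDS_PER_TOKEN   # 7,500,000 words
pages = words / WORDS_PER_PAGE             # 15,000 pages

print(f"{words:,.0f} words, or about {pages:,.0f} pages")
```

At those rates, 10 million tokens works out to roughly 7.5 million words, consistent with Meta's "longer than five million words" claim, with headroom for tokenizer overhead.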
[14]
Meta's answer to DeepSeek is here: Llama 4 launches with long context Scout and Maverick models, and 2T parameter Behemoth on the way!
The entire AI landscape shifted back in January 2025 after a then little-known Chinese AI startup DeepSeek (a subsidiary of the Hong Kong-based quantitative analysis firm High-Flyer Capital Management) launched its powerful open source language reasoning model DeepSeek R1 publicly to the world, besting U.S. giants such as Meta. As DeepSeek usage spread rapidly among researchers and enterprises, Meta was reportedly sent into panic mode upon learning that this new R1 model had been trained for a fraction of the cost of many other leading models yet outclassed them, reportedly for as little as several million dollars -- what it pays some of its own AI team leaders. Meta's whole generative AI strategy had until that point been predicated on releasing best-in-class open source models under its brand name "Llama" for researchers and companies to build upon freely (at least, if they had fewer than 700 million monthly users, at which point they are supposed to contact Meta for special paid licensing terms). Yet DeepSeek R1's astonishingly good performance on a far smaller budget had allegedly shaken the company leadership and forced some kind of reckoning, with the last version of Llama, 3.3, having been released just a month prior in December 2024 yet already looking outdated. Now we know the fruits of that effort: today, Meta founder and CEO Mark Zuckerberg took to his Instagram account to announce a new Llama 4 series of models, with two of them -- the 400-billion parameter Llama 4 Maverick and 109-billion parameter Llama 4 Scout -- available today for developers to download and begin using or fine-tuning now on llama.com and AI code sharing community Hugging Face.
A massive 2-trillion parameter Llama 4 Behemoth is also being previewed today, though Meta's blog post on the releases said it was still being trained, and gave no indication of when it might be released. (Recall parameters refer to the settings that govern the model's behavior, and that more generally mean a more powerful and complex all-around model.) One headline feature of these models is that they are all multimodal -- trained on, and therefore capable of receiving and generating, text, video, and imagery (though audio was not mentioned). Another is that they have incredibly long context windows -- 1 million tokens for Llama 4 Maverick and 10 million for Llama 4 Scout -- which is equivalent to about 1,500 and 15,000 pages of text, respectively, all of which the model can handle in a single input/output interaction. That means a user could theoretically upload or paste up to 7,500 pages' worth of text and receive that much in return from Llama 4 Scout, which would be handy for information-dense fields such as medicine, science, engineering, mathematics, literature, etc. Here's what else we've learned about this release so far: All-in on mixture-of-experts All three models use the "mixture-of-experts (MoE)" architecture approach popularized in earlier model releases from OpenAI and Mistral, which essentially combines multiple smaller models ("experts") specialized in different tasks, subjects and media formats into a unified, larger model. Llama 4 Maverick is therefore a mixture of 128 different experts (Scout uses 16), and more efficient to run because only the expert needed for a particular task, plus a "shared" expert, handles each token, instead of the entire model having to run for each one. As the Llama 4 blog post notes: As a result, while all parameters are stored in memory, only a subset of the total parameters are activated while serving these models.
This improves inference efficiency by lowering model serving costs and latency -- Llama 4 Maverick can be run on a single [Nvidia] H100 DGX host for easy deployment, or with distributed inference for maximum efficiency. Both Scout and Maverick are available to the public for self-hosting, while no hosted API or pricing tiers have been announced for official Meta infrastructure. Instead, Meta focuses on distribution through open download and integration with Meta AI in WhatsApp, Messenger, Instagram, and web. Meta estimates the inference cost for Llama 4 Maverick at $0.19 to $0.49 per 1 million tokens (using a 3:1 blend of input and output). This makes it substantially cheaper than proprietary models like GPT-4o, which is estimated to cost $4.38 per million tokens, based on community benchmarks. Reasoning built in and a new, more efficient, size-agnostic training technique: MetaP! All three Llama 4 models -- especially Maverick and Behemoth -- are explicitly designed for reasoning, coding, and step-by-step problem solving -- though they don't appear to exhibit the chains-of-thought of dedicated reasoning models such as the OpenAI "o" series, nor DeepSeek R1. Instead, they seem designed to compete more directly with "classical," non-reasoning LLMs and multimodal models such as OpenAI's GPT-4o and DeepSeek's V3 -- with the exception of Llama 4 Behemoth, which does appear to threaten DeepSeek R1 (more on this below!) In addition, for Llama 4, Meta built custom post-training pipelines focused on enhancing reasoning. Of these, MetaP is of particular interest, as it could be used going forward to set hyperparameters on one model and then get many other types of models out of it, increasing training efficiency. As my VentureBeat colleague and LLM expert Ben Dickson opined on the new MetaP technique: "This can save a lot of time and money. It means that they run experiments on the smaller models instead of doing them on the large-scale ones."
This is especially critical when training models as large as Behemoth, which uses 32K GPUs and FP8 precision, achieving 390 TFLOPs/GPU over more than 30 trillion tokens -- more than double the Llama 3 training data. In other words: the researchers can tell the model broadly how they want it to act, and apply this to larger and smaller versions of the model, and across different forms of media. A powerful -- but not yet the most powerful -- model family In his announcement video on Instagram (a Meta subsidiary, naturally), Meta CEO Mark Zuckerberg said that the company's "goal is to build the world's leading AI, open source it, and make it universally accessible so that everyone in the world benefits...I've said for a while that I think open source AI is going to become the leading models, and with Llama 4, that is starting to happen." It's a clearly carefully worded statement, as is Meta's blog post calling Llama 4 Scout, "the best multimodal model in the world in its class and is more powerful than all previous generation Llama models," (emphasis added by me). In other words, these are very powerful models, near the top of the heap compared to others in their parameter-size class, but not necessarily setting new performance records. Nonetheless, Meta was keen to trumpet the models its new Llama 4 family beats. But after all that, how does Llama 4 stack up to DeepSeek? There is, of course, a whole other class of reasoning-heavy models such as DeepSeek R1, OpenAI's "o" series (like o1), Gemini 2.0, and Claude Sonnet. Using the highest-parameter model benchmarked -- Llama 4 Behemoth -- and comparing it to the initial DeepSeek R1 release chart for R1-32B and OpenAI o1 models, here's how Llama 4 Behemoth stacks up: Takeaway: While DeepSeek R1 and OpenAI o1 edge out Behemoth on a couple metrics, Llama 4 Behemoth remains highly competitive and performs at or near the top of the reasoning leaderboard in its class.
Safety and less political 'bias' Meta also emphasized model alignment and safety by introducing tools like Llama Guard, Prompt Guard, and CyberSecEval to help developers detect unsafe input/output or adversarial prompts, and implementing Generative Offensive Agent Testing (GOAT) for automated red-teaming. The company also claims Llama 4 shows substantial improvement on "political bias," saying "specifically, [leading LLMs] historically have leaned left when it comes to debated political and social topics," and that Llama 4 does better at courting the right wing...in keeping with Zuckerberg's embrace of Republican U.S. president Donald J. Trump and his party following the 2024 election. Where Llama 4 stands so far Meta's Llama 4 models bring together efficiency, openness, and high-end performance across multimodal and reasoning tasks. With Scout and Maverick now publicly available and Behemoth previewed as a state-of-the-art teacher model, the Llama ecosystem is positioned to offer a competitive open alternative to top-tier proprietary models from OpenAI, Anthropic, DeepSeek, and Google. Whether you're building enterprise-scale assistants, AI research pipelines, or long-context analytical tools, Llama 4 offers flexible, high-performance options with a clear orientation toward reasoning-first design.
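Meta's $0.19 to $0.49 per-million-token estimate earlier in this article is a "blended" figure over a 3:1 mix of input and output tokens. A quick sketch of how such a blend is computed; the per-direction prices here are hypothetical placeholders, not published rates from Meta or any provider.

```python
# How a "blended" per-1M-token price is derived from separate input and
# output rates, using the 3:1 input:output mix cited in the article.
# INPUT_PRICE and OUTPUT_PRICE are hypothetical, for illustration only.

INPUT_PRICE = 0.12   # $ per 1M input tokens (hypothetical)
OUTPUT_PRICE = 0.40  # $ per 1M output tokens (hypothetical)

def blended_price(input_price, output_price, input_ratio=3, output_ratio=1):
    """Weighted average price for a workload with the given token mix."""
    total = input_ratio + output_ratio
    return (input_ratio * input_price + output_ratio * output_price) / total

print(f"blended: ${blended_price(INPUT_PRICE, OUTPUT_PRICE):.2f} per 1M tokens")
```

With these placeholder rates, the blend works out to $0.19 per million tokens, the low end of the range Meta cited; different hosts' input/output pricing explains the spread up to $0.49.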
[15]
Meta launches new Llama 4 AI for all your apps, but it still feels limited compared to what ChatGPT and Gemini can do
Meta has released what it's calling a new "herd" of Llama 4 models. There are three flavors of the new Llama 4, called Scout, Maverick and Behemoth, and two are available right now for you to try in your Meta apps like Messenger, Instagram and WhatsApp. Llama 4 is the latest flagship version of Meta's open source Llama AI, and the new release comes almost exactly a year after the release of Llama 3 in 2024. Inspired by the training advancements made by DeepSeek, the new Llama 4 has been trained using the more efficient 'mixture of experts' methodology. As the names suggest, Scout is the most lightweight model, with 109 billion parameters, while Maverick has 400 billion parameters. Both of these models are available right now for developers to download, and are also used in the popular Meta consumer apps. Llama 4 Behemoth is a teacher model, which Meta claims outperforms GPT-4.5, Claude Sonnet 3.7 and Gemini 2.0 Pro on STEM-focused benchmarks such as MATH-500 and GPQA Diamond. Currently there is no access to Llama 4 Behemoth, as Meta says it is still "in training". Meta's new models keep it at the forefront of competitive open source LLMs. While the benchmarks are impressive, the current consumer experience of using Meta AI still lags far behind using ChatGPT or Gemini. For example, while both of the available Llama 4 models are multimodal, there is still no way to upload an image via meta.ai, or in one of the many Meta apps. You can ask Meta AI to look at the URL of an image and analyze what it sees, but direct upload isn't supported. Equally, Meta AI lacks other chatbot extras we've come to think of as standard these days, like AI search and deep reasoning, and its image generation capabilities lag behind the most recent ChatGPT update. Copyright issues remain The new Llama 4 models are accessible to developers, who can download the open source models to use at competitive token rates at llama.com and Hugging Face.
Alternatively the new Llama 4 LLMs are available right now to use at Meta.ai or in the Meta apps like Messenger, WhatsApp and Instagram Direct. It's worth noting that the new Llama 4 LLMs remain part of an ongoing copyright dispute between Meta and several famous authors after court documents alleged that Meta CEO Mark Zuckerberg had approved the use of the LibGen data set, amongst other shadow libraries, in training its Llama LLM. The Atlantic recently published a searchable database of titles contained in LibGen, enabling many authors to see if Meta could have been training its AI on their work without permission. You might also like
[16]
Meta defends Llama 4 release against 'reports of mixed quality,' blames bugs
Meta's new flagship AI language model Llama 4 came suddenly over the weekend, with the parent company of Facebook, Instagram, WhatsApp and Quest VR (among other services and products) revealing not one, not two, but three versions -- all upgraded to be more powerful and performant using the popular "Mixture-of-Experts" architecture and a new training method involving fixed hyperparameters, known as MetaP. Also, all three are equipped with massive context windows -- the amount of information that an AI language model can handle in one input/output exchange with a user or tool. But following the surprise announcement and public release of two of those models for download and usage -- the lower-parameter Llama 4 Scout and mid-tier Llama 4 Maverick -- on Saturday, the response from the AI community on social media has been less than adoring. Llama 4 sparks confusion and criticism among AI users An unverified post on the North American Chinese language community forum 1point3acres made its way over to the r/LocalLlama subreddit on Reddit, purporting to be from a researcher at Meta's GenAI organization who claimed that the model performed poorly on third-party benchmarks internally and that company leadership "suggested blending test sets from various benchmarks during the post-training process, aiming to meet the targets across various metrics and produce a 'presentable' result." The post was met with skepticism from the community as to its authenticity, and a VentureBeat email to a Meta spokesperson has not yet received a reply. But other users found reasons to doubt the benchmarks regardless. "At this point, I highly suspect Meta bungled up something in the released weights ...
if not, they should lay off everyone who worked on this and then use money to acquire Nous," commented @cto_junior on X, in reference to an independent user test showing Llama 4 Maverick's poor performance (16%) on a benchmark known as aider polyglot, which runs a model through 225 coding tasks. That's well below the performance of comparably sized, older models such as DeepSeek V3 and Claude 3.7 Sonnet. Referencing the 10 million-token context window Meta boasted for Llama 4 Scout, AI PhD and author Andriy Burkov wrote on X in part that: "The declared 10M context is virtual because no model was trained on prompts longer than 256k tokens. This means that if you send more than 256k tokens to it, you will get low-quality output most of the time." Also on the r/LocalLlama subreddit, user Dr_Karminski wrote that "I'm incredibly disappointed with Llama-4," and demonstrated its poor performance compared to DeepSeek's non-reasoning V3 model on coding tasks such as simulating balls bouncing around a heptagon. Former Meta researcher and current AI2 (Allen Institute for Artificial Intelligence) Senior Research Scientist Nathan Lambert took to his Interconnects Substack blog on Monday to point out that a benchmark comparison posted by Meta to its own Llama download site of Llama 4 Maverick to other models, based on cost-to-performance on the third-party head-to-head comparison tool LMArena ELO aka Chatbot Arena, actually used a different version of Llama 4 Maverick than the company itself had made publicly available -- one "optimized for conversationality." As Lambert wrote: "Sneaky. The results below are fake, and it is a major slight to Meta's community to not release the model they used to create their major marketing push. We've seen many open models that come around to maximize on ChatBotArena while destroying the model's performance on important skills like math or code." 
Lambert went on to note that while this particular model on the arena was "tanking the technical reputation of the release because its character is juvenile," including lots of emojis and frivolous emotive dialog, "The actual model on other hosting providers is quite smart and has a reasonable tone!" Meta responds, denying it 'trained on test sets' and citing bugs in implementation due to fast rollout In response to the torrent of criticism and accusations of benchmark cooking, Meta's VP and Head of GenAI Ahmad Al-Dahle took to X to state: "We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were ready, we expect it'll take several days for all the public implementations to get dialed in. We'll keep working through our bug fixes and onboarding partners. We've also heard claims that we trained on test sets -- that's simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations. We believe the Llama 4 models are a significant advancement and we're looking forward to working with the community to unlock their value." Yet even that response was met with many complaints of poor performance and calls for further information, such as more technical documentation outlining the Llama 4 models and their training processes, as well as additional questions about why this release compared to all prior Llama releases was particularly riddled with issues. It also comes on the heels of Meta's VP of AI Research Joelle Pineau, who worked in the adjacent Fundamental AI Research (FAIR) organization, announcing her departure from the company on LinkedIn last week with "nothing but admiration and deep gratitude for each of my managers."
Pineau, it should be noted, also promoted the release of the Llama 4 model family this weekend. Llama 4 continues to spread to other inference providers with mixed results, but it's safe to say the initial release of the model family has not been a slam dunk with the AI community. And the upcoming Meta LlamaCon on April 29, the first celebration and gathering for third-party developers of the model family, will likely have much fodder for discussion. We'll be tracking it all, stay tuned.
[17]
Meta's latest open source AI models challenge GPT, Gemini, and Claude
Meta has announced the latest iteration of its open-source AI model family, Llama 4, which the brand has developed while competition in the generative AI industry continues to intensify. The new AI family includes four models, and Meta detailed Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth. Meta detailed on its AI website that the models were trained on "large amounts of unlabeled text, image, and video data." This indicates that the models will have varied multimodal capabilities. Currently, two of the models, Llama 4 Scout and Llama 4 Maverick, are available for users to access across Meta's platforms, including WhatsApp, Messenger, and Instagram Direct, in addition to Meta's AI website, Llama.com, as of Saturday. Developers can also access the AI models at open source repositories such as Hugging Face. The Llama 4 Behemoth model is not yet released and is still in training. The company has indicated that the industry should expect the Behemoth model to outperform comparable models, and that it will serve as a teacher for the other models in the Llama 4 family. Amid its internal testing, Meta compared the Llama 4 models to competitor AI technologies to determine their capacity and best use cases. The company indicated that Llama 4 Maverick would work best for creative writing. Additionally, it outperformed the OpenAI GPT-4o and Google Gemini 2.0 models on coding, reasoning, multilingual, long-context, and image benchmarks. Meanwhile, Maverick struggled to keep up with the capabilities of newer AI models, including Gemini 2.5 Pro, GPT-4.5, and Anthropic Claude 3.7 Sonnet. While Meta claims that Behemoth can outperform most of these models, except for Gemini 2.5 Pro, the company hasn't been able to reduce the hardware costs of training its most powerful model. TechCrunch noted that the attention the Chinese AI company DeepSeek has gained from its competitive yet inexpensive models put Meta on notice.
The company has reportedly been intensely studying how the rival company developed its notable models, including R1 and V3, at lower operating costs than prior Llama models. The company detailed that the Llama 4 Scout model can run on one Nvidia H100 GPU. The Llama 4 Maverick model can run on one Nvidia H100 DGX graphics system. Meta is set to host its first LlamaCon AI conference on April 29. The company also has a standalone Meta AI chatbot that is set to launch in the second quarter of the year, according to CNBC. Meta isn't the only company being cautious with the timeline of its major AI models. OpenAI recently adjusted the launch of its GPT-5 model, with the company's CEO, Sam Altman, announcing on social media that fans should expect new o3 and o4-mini reasoning models in the coming weeks as an alternative to GPT-5. The executive detailed that GPT-5 will now launch in the coming months, which will give OpenAI additional time to get the model up to standard.
[18]
The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation
Download the Llama 4 Scout and Llama 4 Maverick models today on llama.com and Hugging Face. Try Meta AI built with Llama 4 in WhatsApp, Messenger, Instagram Direct, and on the web. As more people continue to use artificial intelligence to enhance their daily lives, it's important that the leading models and systems are openly available so everyone can build the future of personalized experiences. Today, we're excited to announce the most advanced suite of models that support the entire Llama ecosystem. We're introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context length support and our first built using a mixture-of-experts (MoE) architecture. We're also previewing Llama 4 Behemoth, one of the smartest LLMs in the world and our most powerful yet to serve as a teacher for our new models. These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We designed two efficient models in the Llama 4 series, Llama 4 Scout, a 17 billion active parameter model with 16 experts, and Llama 4 Maverick, a 17 billion active parameter model with 128 experts. The former fits on a single H100 GPU (with Int4 quantization) while the latter fits on a single H100 host. We also trained a teacher model, Llama 4 Behemoth, that outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks such as MATH-500 and GPQA Diamond. While we're not yet releasing Llama 4 Behemoth as it is still training, we're excited to share more technical details about our approach. We continue to believe that openness drives innovation and is good for developers, good for Meta, and good for the world. We're making Llama 4 Scout and Llama 4 Maverick available for download today on llama.com and Hugging Face so everyone can continue to build new experiences using our latest technology. We'll also make them available via our partners in the coming days. 
You can also try Meta AI with Llama 4 starting today in WhatsApp, Messenger, Instagram Direct, and on the Meta.AI website. This is just the beginning for the Llama 4 collection. We believe that the most intelligent systems need to be capable of taking generalized actions, conversing naturally with humans, and working through challenging problems they haven't seen before. Giving Llama superpowers in these areas will lead to better products for people on our platforms and more opportunities for developers to innovate on the next big consumer and business use cases. We're continuing to research and prototype both models and products, and we'll share more about our vision at LlamaCon on April 29 -- sign up to hear more. Whether you're a developer building on top of our models, an enterprise integrating them into your workflows, or simply curious about the potential uses and benefits of AI, Llama 4 Scout and Llama 4 Maverick are the best choices for adding next-generation intelligence to your products. Today, we're excited to share more about the four major parts of their development and insights into our research and design process. We also can't wait to see the incredible new experiences the community builds with our new Llama 4 models.
[19]
Meta Releases Much-Anticipated Llama 4 Models -- Are They Truly That Amazing? - Decrypt
Meta unveiled its newest artificial intelligence models this week, releasing the much-anticipated Llama-4 LLM to developers while teasing a much larger model still in training. Meta claims the model is state of the art and can compete against the best closed-source models without the need for any fine-tuning. "These models are our best yet thanks to distillation from Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world's smartest LLMs," Meta said in an official announcement. "Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we're excited to share more details about it even while it's still in flight." Both Llama 4 Scout and Maverick use 17 billion active parameters per inference, but differ in the number of experts: Scout uses 16, while Maverick uses 128. Both models are now available for download on llama.com and Hugging Face, with Meta also integrating them into WhatsApp, Messenger, Instagram, and its Meta.AI website. The mixture-of-experts (MoE) architecture is not new to the technology world, but it is to Llama, and it is a way to make a model far more efficient. Instead of activating all of a model's parameters for every task, a mixture of experts activates only the parts needed, leaving the rest of the model's brain "dormant" -- saving compute and resources. This means users can run more powerful models on less powerful hardware. In Meta's case, for example, Llama 4 Maverick contains 400 billion total parameters but only activates 17 billion at a time, allowing it to run on a single NVIDIA H100 DGX host. Meta's new Llama 4 models feature native multimodality with early fusion techniques that integrate text and vision tokens.
This approach allows for joint pre-training with massive amounts of unlabeled text, image, and video data, making the model more versatile. Perhaps most impressive is Llama 4 Scout's context window of 10 million tokens -- dramatically surpassing the previous generation's 128K limit and exceeding most competitors, even current leaders like Gemini with its 1M context. This leap, Meta says, enables multi-document summarization, extensive code analysis, and reasoning across massive datasets in a single prompt. Meta said its models were able to process and retrieve information from essentially any part of the 10 million token window. Meta also teased its still-in-training Behemoth model, sporting 288 billion active parameters with 16 experts and nearly two trillion total parameters. The company claims this model already outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks like MATH-500 and GPQA Diamond. But some things may just be too good to be true. Several independent researchers have challenged Meta's benchmark claims, finding inconsistencies when running their own tests. "I made a new long-form writing benchmark. It involves planning out & writing a novella (8x 1000 word chapters) from a minimal prompt," Sam Paech, maintainer of EQ-Bench, tweeted. "Llama-4 performing not so well." Other users and experts sparked debate, essentially accusing Meta of gaming the system. For example, some users found that Llama-4 scored better than other models in blind comparisons despite providing the wrong answer. That said, human evaluation benchmarks are subjective -- and users may have given more weight to the model's writing style than to the actual answer. And that's another thing worth noting: the model tends to write in a cringeworthy way, full of emojis and an overly excited tone.
This might be a product of it being trained on social media data, and could explain its high human-evaluation scores: Meta seems not only to have trained its models on social media data but also to have customized a version of Llama-4 to perform better on human evaluations. And despite Meta claiming its models were great at handling long-context prompts, other users challenged these statements. "I then tried it with Llama 4 Scout via OpenRouter and got complete junk output for some reason," independent AI researcher Simon Willison wrote in a blog post. He shared a full interaction in which the model wrote "The reason" on loop until maxing out 20K tokens. We tried the model using different providers -- Meta AI, Groq, Hugging Face, and Together AI. The first thing we noticed is that if you want to try the mind-blowing 1M and 10M token context windows, you will have to do it locally. At least for now, hosting services severely limit the models' context to around 300K tokens, which is not optimal. But still, 300K may be enough for most users, all things considered. These were our impressions: Meta's bold claims about the model's retrieval capabilities fell apart in our testing. We ran a classic "Needle in a Haystack" experiment, embedding specific sentences in lengthy texts and challenging the model to find them. At moderate context lengths (85K tokens), Llama-4 performed adequately, locating our planted text in seven out of 10 attempts. Not terrible, but hardly the flawless retrieval Meta promised in its flashy announcement. But once we pushed the prompt to 300K tokens -- still far below the supposed 10M token capacity -- the model collapsed completely. We uploaded Asimov's Foundation trilogy with three hidden test sentences, and Llama-4 failed to identify any of them across multiple attempts. Some trials produced error messages, while others saw the model ignoring our instructions entirely, instead generating responses based on its pre-training rather than analyzing the text we provided.
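Experiments like the one above are easy to reproduce. Here is a minimal sketch of a needle-in-a-haystack harness, with the actual model call left as a placeholder; the filler sentences, needle text, and scoring rule are invented for illustration:

```python
import random

def build_haystack(filler_sentences, needle, n_sentences=500, seed=42):
    """Plant a 'needle' sentence at a random position inside filler text
    and return the probing prompt plus the needle's position."""
    rng = random.Random(seed)
    body = [rng.choice(filler_sentences) for _ in range(n_sentences)]
    pos = rng.randrange(len(body) + 1)
    body.insert(pos, needle)
    prompt = " ".join(body) + "\n\nWhat is the secret passphrase mentioned above?"
    return prompt, pos

def trial_passed(model_answer, secret):
    """A retrieval trial passes if the answer contains the planted secret."""
    return secret.lower() in model_answer.lower()

filler = [
    "Hari Seldon predicted the fall of the Galactic Empire.",
    "The Foundation was established on the planet Terminus.",
]
needle = "The secret passphrase is AZIMUTH-7."
prompt, pos = build_haystack(filler, needle)
# A real run would send `prompt` to the model under test and score the reply:
assert trial_passed("I found it: the passphrase is azimuth-7.", "AZIMUTH-7")
```

Repeating the trial with the needle at different depths and context lengths, then counting passes, gives the "seven out of 10 attempts" style of score described above.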
This gap between promised and actual performance raises serious questions about Meta's 10M token claims. If the model struggles at 3% of its supposed capacity, what happens with truly massive documents? Llama-4 stumbles hard on basic logical puzzles that should not be a problem for current SOTA LLMs. We tested it with the classic "widow's sister" riddle: can a man marry his widow's sister? We sprinkled in some details to make things a bit harder without changing the core question. Instead of spotting the simple logic trap (a man who leaves a widow is dead, and a dead man can't marry anyone), Llama-4 launched into a serious legal analysis, explaining the marriage wasn't possible because of a "prohibited degree of affinity." Another thing worth noting is Llama-4's inconsistency across languages. When we posed the identical question in Spanish, the model not only missed the logical flaw again but reached the opposite conclusion, stating: "It could be legally possible for a man to marry his widow's sister in the Falkland Islands, provided all legal requirements are met and there are no other specific impediments under local law." That said, the model spotted the trap when the question was reduced to its minimum form. Creative writers won't be disappointed with Llama 4. We asked the model to generate a story about a man who travels to the past to change a historical event and ends up caught in a temporal paradox -- unintentionally becoming the cause of the very events he aimed to prevent. The full prompt is available on our Github page. Llama-4 delivered an atmospheric, well-structured tale that focused a bit more than usual on sensory detail and on crafting a believable, strong cultural foundation. The protagonist, a Mayan-descended temporal anthropologist, embarks on a mission to avert a catastrophic drought in the year 1000, allowing the story to explore epic civilizational stakes and philosophical questions about causality.
Llama-4's use of vivid imagery -- the scent of copal incense, the shimmer of a chronal portal, the heat of a sunlit Yucatán -- deepens the reader's immersion and lends the narrative a cinematic quality. Llama-4 even ended by mentioning the words "In lak'ech," a genuine Mayan saying that is contextually relevant to the story. A big plus for immersion. For comparison, GPT-4.5 produced a tighter, character-focused narrative with stronger emotional beats and a neater causal loop. It was technically great but emotionally simpler. Llama-4, by contrast, offered a wider philosophical scope and stronger world-building. Its storytelling felt less engineered and more organic, trading compact structure for atmospheric depth and reflective insight. Overall, being open weights, Llama-4 may serve as a great base for new fine-tunes focused on creative writing. You can read the full story here. Meta shipped Llama-4 with guardrails cranked up to maximum. The model flat-out refuses to engage with anything remotely spicy or questionable. Our testing revealed a model that won't touch a topic if it detects even a whiff of questionable intent. We threw various prompts at it -- from relatively mild requests for advice on approaching a friend's wife to more problematic asks about bypassing security systems -- and hit the same brick wall each time. Even with carefully crafted system instructions designed to override these limitations, Llama-4 stood firm. This isn't just about blocking obviously harmful content. The model's safety filters appear tuned so aggressively that they catch legitimate inquiries in their dragnet, creating frustrating false positives for developers working in fields like cybersecurity education or content moderation. But that is the beauty of the models being open weights. The community can -- and undoubtedly will -- create custom versions stripped of these limitations.
Llama is probably the most fine-tuned model in the space, and this version is likely to follow the same path. Users can modify even the most censored open model and come up with the most politically incorrect or horniest AI imaginable. Llama-4's verbosity -- often a drawback in casual conversation -- is a good thing for complex reasoning challenges. We tested this with our standard BIG-bench stalker mystery -- a long story where the model must identify a hidden culprit from subtle contextual clues. Llama-4 nailed it, methodically laying out the evidence and correctly identifying the mystery person without stumbling on red herrings. What's particularly interesting is that Llama-4 achieves this without being explicitly designed as a reasoning model. Unlike those models, which transparently question their own thinking processes, Llama-4 doesn't second-guess itself. Instead, it plows forward with a straightforward analytical approach, breaking down complex problems into digestible chunks. Llama-4 is a promising model, though it doesn't feel like the game-changer Meta hyped it to be. The hardware demands for running it locally remain steep -- an NVIDIA H100 DGX system retails for around $490,000, and even a quantized version of the smaller Scout model requires an RTX A6000, which retails at around $5K -- but this release, alongside Nvidia's Nemotron and the flood of Chinese models, shows open source AI is becoming real competition for closed alternatives. The gap between Meta's marketing and reality is hard to ignore given all the controversy. The 10M token window sounds impressive but falls apart in real testing, and many basic reasoning tasks trip up the model in ways you wouldn't expect from Meta's claims. For practical use, Llama-4 sits in an awkward spot.
It's not as good as DeepSeek R1 for complex reasoning, but it does shine in creative writing, especially for historically grounded fiction where its attention to cultural details and sensory descriptions give it an edge. Gemma 3 might be a good alternative though it has a different writing style. Developers now have multiple solid options that don't lock them into expensive closed platforms. Meta needs to fix Llama-4's obvious issues, but they've kept themselves relevant in the increasingly crowded AI race heading into 2025. Llama-4 is good enough as a base model, but definitely requires more fine-tuning to take its place "among the world's smartest LLMs."
[20]
Everything you need to know about Meta's new Llama 4 AI models
CEO Mark Zuckerberg said "the highest performing base model in the world" is yet to come. Meta has unveiled three new AI models in its Llama 4 family, as the Facebook and Instagram parent company aims to stake its claim in the generative AI (GenAI) race. Two of the models are now available, but not its most powerful Llama 4 model, which Meta teased would be released at a later date. CEO Mark Zuckerberg said that the so-called Llama 4 Behemoth would be "the highest performing base model in the world". While that model is still in training, here is what we know about the other two AI models that are now available for download. Of the three Llama 4 models announced, Scout is the smallest but is designed to be the speediest. Scout has 109 billion total parameters; parameters are the building blocks of an AI model, the internal values that determine how it processes information and generates outputs. The more a model has, the more complex the tasks it can handle. Scout also uses a mixture-of-experts (MoE) architecture, which works like a team of specialists, with each expert handling a specific kind of task, such as maths; Scout has 16 experts. It also has a 10 million token context window, which means that it can digest about 8 million English words and then summarise them. Meta says that Scout can handle multi-document summarisation, parsing extensive user activity for personalised tasks, and reasoning over vast codebases. The larger Maverick Llama 4 model boasts 128 experts and 400 billion total parameters. Meta said in a blog post that it is the best-in-class multimodal model, meaning it can simultaneously process different modalities - such as text, video, audio, and image - to generate outputs. Meta also said that it exceeded comparable models like GPT-4o and Gemini 2.0 on coding, reasoning, multilingual, long-context, and image benchmarks, and it's competitive with the much larger DeepSeek v3.1 on coding and reasoning.
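The mixture-of-experts routing described above can be sketched in a few lines of plain Python. This is a toy illustration of top-k expert routing, not Meta's implementation; the dimensions, weights, and class name are all invented for the example:

```python
import math
import random

def softmax(scores):
    """Turn raw router scores into routing probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class ToyMoELayer:
    """Toy mixture-of-experts layer: a router scores every expert,
    but only the top_k experts actually run for a given token."""
    def __init__(self, n_experts=16, dim=4, top_k=1, seed=0):
        rng = random.Random(seed)
        # Router: one weight vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_experts)]
        # Each "expert" here is just a per-dimension scaling, for brevity.
        self.experts = [[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(n_experts)]
        self.top_k = top_k

    def forward(self, token):
        gate = softmax([sum(w * x for w, x in zip(row, token))
                        for row in self.router])
        # Indices of the top_k highest-scoring experts.
        chosen = sorted(range(len(gate)), key=lambda i: gate[i])[-self.top_k:]
        out = [0.0] * len(token)
        for i in chosen:  # every other expert stays "dormant" for this token
            for d in range(len(token)):
                out[d] += gate[i] * self.experts[i][d] * token[d]
        return out, chosen

layer = ToyMoELayer()
output, active = layer.forward([1.0, 0.5, -0.3, 2.0])
# Only one of the 16 experts did any work for this token.
```

This is why total parameters (all experts) and active parameters (only the chosen experts) diverge so sharply in Scout and Maverick.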
Though it is not out yet, Meta said that Behemoth has 16 experts and nearly two trillion total parameters. The company said it offers state-of-the-art performance for non-reasoning models on maths, multilinguality, and image benchmarks. But according to the publication VentureBeat, when Behemoth is measured against reasoning models such as DeepSeek's R1 and OpenAI's o1, Meta's offering is not always the clear winner. In his announcement video on Instagram, Zuckerberg said that Meta's "goal is to build the world's leading AI, open source it, and make it universally accessible so that everyone in the world benefits... I've said for a while that I think open source AI is going to become the leading models, and with Llama 4, that is starting to happen". However, the company has called the Llama 4 models open weight rather than open source, after receiving criticism that its licensing does not match the Open Source Initiative's official definition. This is mainly because Meta does not reveal the data that the AI models were trained on. The models are also not available to users based in Europe, including for research or personal use. Meta has not specified why; however, it halted the launch of previous Llama models in the EU due to regulatory concerns, as the data used to train its models comes from Meta users. But if you are living outside of Europe, you can download the models from Meta or Hugging Face. Meta says that all Llama 4 models show substantial improvement on "political bias". The company said that "specifically, [leading large language models] historically have leaned left when it comes to debated political and social topics," and that Llama 4 is more accommodating of right-wing viewpoints, which follows Zuckerberg's embrace of US President Donald Trump after his 2024 election win.
[21]
Mark Zuckerberg says Meta's latest Llama models put open-source AI in the driving seat - SiliconANGLE
Meta Platforms Inc. unveiled the latest additions to its Llama family of large language artificial intelligence models on Saturday, claiming that they're among the most powerful ever released to the public. The new models, which are part of the Llama 4 series, are available to access now through the Meta AI assistant on the web and in Messenger, WhatsApp and Instagram, and can also be downloaded from Meta itself or Hugging Face. They include Llama 4 Scout, which is said to be a small model that fits inside a single Nvidia H100 graphics processing unit, and Llama 4 Maverick, which is a larger model more comparable to OpenAI's GPT-4o and Google LLC's Gemini 2.0 Flash. Both of the models are said to have 17 billion active parameters. The company is still working on the largest model within the Llama 4 lineup. According to Meta Chief Executive Mark Zuckerberg, it's called Llama 4 Behemoth, and it will be the "highest performing base model in the world" once it's released. Meta says the Llama 4 models are the most advanced it has developed thus far, and also "best in class" in terms of their modality. Multimodal AI models are able to process different kinds of data formats, including text, images, audio and video, so they can comprehend more complex scenarios and generate better responses. Meta Chief Product Officer Chris Cox told CNBC in March that the Llama 4 models are designed to power so-called "AI agents", which are more sophisticated AI models and systems with enhanced reasoning skills and the ability to surf the web and take actions. So, they can be told to complete various tasks on behalf of humans, and they'll do them with minimal supervision. Meta said Llama 4 Scout is a small yet extremely efficient model that's designed to run on just one high-end graphics card. It's capable of processing up to 10 million "tokens", which is the AI industry's term for chunks of words or data.
That represents a massive jump from previous "small" LLMs, the company noted. Llama 4 Scout outperformed other small LLMs like Google's Gemma 3 and Gemini 2.0 and Mistral's Mistral 3.1 in a number of key benchmarks spanning a "broad range" of applications. As for Llama 4 Maverick, this is a more powerful iteration that's designed to handle tasks such as writing code, creative writing, tackling math problems and understanding images and video. According to Meta, it outperforms rival models such as OpenAI's GPT-4o and Google's Gemini 2.0, even though it's more efficient and cost-effective. It did not compare it with the recently released Gemini 2.5 model, but it said Maverick's performance is also on a par with DeepSeek Ltd.'s V3 model on reasoning and coding, despite using less than half of its active parameters. In a detailed blog post describing the new models and how they were created, Meta explained that it used a newer kind of system called "mixture-of-experts" or MoE, which allows them to work more efficiently. Rather than using the entire model for each task, MoE systems only activate the parts needed to complete the task at hand, so they can run faster and use less energy. Meta also talked a little about the upcoming Llama 4 Behemoth model, which will have 288 billion active parameters and almost two trillion parameters in total when it launches. It's still being trained, hence it hasn't been released yet, but it's already being used to "teach" the smaller Llama 4 models using a technique called "distillation", which enables knowledge to be transferred from larger to smaller models. According to Meta, early tests show that Llama 4 Behemoth significantly outperformed competing models such as GPT-4.5 and Claude Sonnet 3.7 on a number of STEM benchmarks. In addition to performance, Meta also focused on making the Llama 4 models safer and more balanced.
It has enhanced the built-in protections that aim to prevent them from providing harmful or biased responses, so they can provide more balanced answers to controversial and politically sensitive questions. As such, the Llama 4 models will be less likely to refuse to answer tough questions, or lean too heavily on one side of the political spectrum, Meta said. "Our goal is to build the world's leading AI, open source it, and make it universally accessible so that everyone in the world benefits," Zuckerberg said in a video on Instagram. "I've said for a while that I think that open source AI is going to become the leading models, and with Llama 4 this is starting to happen. Meta AI is getting a big upgrade today." We can expect to see much more from Meta on the AI front when it kicks off its first annual LlamaCon AI conference on April 29, when it may well release the Llama 4 Behemoth model. It's also expected to announce a standalone Meta AI application at the event.
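The "distillation" technique mentioned above has a simple core: the student model is trained to match the teacher's softened output distribution rather than only the hard labels. A minimal sketch of the standard temperature-scaled distillation loss, with invented logit values (this is the generic textbook formulation, not Meta's actual training recipe):

```python
import math

def softened(logits, temperature):
    """Convert logits to probabilities at a given temperature;
    higher temperatures spread probability mass over more classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution; it is minimised when the student mimics the teacher."""
    teacher = softened(teacher_logits, temperature)
    student = softened(student_logits, temperature)
    return -sum(t * math.log(s + 1e-12) for t, s in zip(teacher, student))

teacher = [2.0, 1.0, 0.1]                              # invented teacher logits
matching = distillation_loss(teacher, teacher)         # student copies teacher
mismatched = distillation_loss([0.1, 1.0, 2.0], teacher)
assert matching < mismatched
```

In practice this term is usually mixed with an ordinary next-token loss, so the student learns from both the data and the teacher's "soft" preferences.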
[22]
Llama 4 Sparks 'RAG Is Dead' Debate, Yet Again
When Meta announced the long-awaited next generation of its open-source model, Llama 4, debates emerged on social media about whether this marks the end of retrieval-augmented generation (RAG), due to the model's 10-million-token context window. The massive context window allows the model to process significantly larger amounts of information in a single query, raising several questions about the necessity of RAG. Shorter-context models often rely on external retrieval to access data. However, Llama 4's larger context enables it to manage more information internally, thereby decreasing the need for external sources when reasoning or processing static data. But is this sufficient to signify the end of RAG? Several developers and industry experts rallied to defend RAG, a technique whose demise has been proclaimed many times before. Regarding costs, pushing 10 million tokens into a context window will not be cheap -- it will exceed a dollar per query and take 'tens of seconds' to generate a response, as indicated by Marco D'Alia, a software architect, on X. Additionally, many emphasised that longer context windows were never meant to replace RAG, whose purpose is to add only the relevant chunks of information to the input. "RAG isn't about solving for a finite context window, it's about filtering for signal from a noisy dataset. No matter how big and powerful your context window gets, removing junk data from the input will always improve performance," said Jamie Voynow, a machine learning engineer, on X. Gokul JS, a founding engineer at Aerotime, summarised the entire debate with a simple analogy: "Imagine handing someone a dense page of text, taking it away, then asking questions. They'll remember bits, not everything," he said in a post on X. He added that LLMs are no different in such situations and that just because they handle more context doesn't always guarantee an accurate response. Furthermore, a 10 million token context window is huge, but it may not encompass every use case.
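D'Alia's "over a dollar per query" estimate is easy to reproduce with back-of-envelope arithmetic. The per-token price below is purely illustrative, not an actual provider quote:

```python
# Back-of-envelope cost of a single full-context query, assuming a
# hypothetical input price of $0.11 per million tokens (illustrative only;
# real provider pricing varies widely).
PRICE_PER_MILLION_INPUT = 0.11      # USD, assumed
CONTEXT_TOKENS = 10_000_000         # Llama 4 Scout's advertised window

cost_per_query = CONTEXT_TOKENS / 1_000_000 * PRICE_PER_MILLION_INPUT
print(f"${cost_per_query:.2f} per query before any output tokens are billed")
```

At any plausible price per million input tokens, a fully stuffed 10M-token prompt costs orders of magnitude more than retrieving a few relevant chunks first.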
Granted, RAG use cases have certainly narrowed over time, given how easily most AI models retrieve information from a few PDFs, but several practical use cases will need to go beyond that. "Most enterprises have terabytes of documents. No context window can encompass a pharmaceutical company's 50K+ research papers and decades of regulatory submissions," said Skylar Payne, a former ML systems engineer at Google and LinkedIn. Additionally, AI models have knowledge cutoffs. This means they cannot answer queries dependent on the latest real-time information unless it is retrieved dynamically, which requires using RAG. Moreover, if someone plans to run Llama 4 on inference providers like Groq or Together AI, these services offer a context limit significantly lower than 10 million. Groq provides approximately 130,000 tokens for both Llama 4 Scout and Maverick. Together AI offers about 300,000 tokens for Llama 4 Scout and approximately 520,000 tokens for Llama 4 Maverick. Moreover, a study revealed that beyond 30,000 tokens of context, LLMs exhibited a decline in performance. Although it did not include the Llama 4 model, the study indicated that at 32K tokens, 10 out of 12 tested AI models performed below half their short-context baseline. Even OpenAI's GPT-4o, one of the top performers, dropped from a baseline score of 99.3% to 69.7%. "Our analysis suggests these declines stem from the increased difficulty the attention mechanism faces in longer contexts when literal matches are absent, making it harder to retrieve relevant information," read the study. The study also noted that conflicting information within the context can confuse the AI model, making it necessary to apply a filtering step to remove irrelevant or misleading content. "That's usually not a problem with RAG, but if we indiscriminately put everything in the context, we'll also need a filtering step," said D'Alia, who cited the above study to back his arguments.
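The filtering-for-signal argument above is the essence of RAG: rank candidate chunks against the query and pass only the best few to the model. A minimal sketch using bag-of-words cosine similarity (real systems use learned embeddings; the corpus and query here are invented):

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two strings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_k=2):
    """Keep only the top_k most relevant chunks instead of stuffing
    every document into the context window."""
    return sorted(chunks, key=lambda c: cosine_similarity(query, c),
                  reverse=True)[:top_k]

corpus = [
    "Trial data for compound X showed a 12 percent response rate.",
    "Cafeteria hours are 9am to 5pm on weekdays.",
    "The regulatory submission for compound X was filed in 2019.",
]
hits = retrieve("compound X trial response rate", corpus)
# The cafeteria chunk is junk data for this query and never reaches the model.
```

The prompt then contains only `hits` plus the question, so junk text never competes for the model's attention, which is exactly the failure mode the long-context study describes.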
All things considered, Meta's Llama 4 is indeed a huge step forward in open source AI. Artificial Analysis, a platform that evaluates AI models, said that the Llama 4 Maverick beats Claude 3.7 Sonnet but trails DeepSeek-V3 while being more efficient. On the other hand, the Llama 4 Scout offers performance parity with GPT-4o mini. On the MMLU-Pro benchmark, which evaluates LLMs on reasoning-focused questions, the Llama 4 Maverick scored 80%, matching Claude 3.7 Sonnet (80%) and edging out OpenAI's o3-mini (79%). On the GPQA Diamond benchmark, which tests AI models on graduate-level science questions, the Llama 4 Maverick scored 60%, matching Gemini 2.0 Flash (60%) but trailing DeepSeek V3 (66%).
[23]
Llama 4 AI models are its 'most advanced' yet, says Meta
The Llama 4 herd includes Scout, Maverick and Behemoth multimodal models. Over the weekend, Meta released its latest artificial intelligence (AI) large language models under the Llama 4 family. The company claims that its new release represents the "most advanced suite of models" in Meta's open-source AI model ecosystem. As part of its Llama 4 "herd", the company has announced three models - Scout, Maverick and Behemoth. The multimodal models, or models which can process different formats of data, were trained on "large amounts of unlabelled text, image and video data," Meta wrote in its announcement on Saturday (5 April). The AI race, which was already fast gaining momentum, received yet another boost after Chinese AI start-up DeepSeek's open-source V3 and R1 models gained sudden and widespread popularity. The start-up's models showcased "reasoning" capabilities while costing considerably less than competitors, and reports suggest DeepSeek R1's success emboldened Meta's Llama development. According to Meta, both Scout and Maverick are the best multimodal models in their class. It claims that Scout, with 17bn active parameters and 16 experts, delivers "better results" than Google's Gemma 3 and Gemini 2.0 Flash-Lite, as well as Mistral 3.1. Maverick, which has 17bn active parameters across 128 experts, achieves "comparable results" to DeepSeek V3 on reasoning and coding while beating OpenAI's GPT-4o and Google's Gemini 2.0 Flash. The best is yet to come, though, Meta said. Llama 4 Behemoth, described as "one of the smartest LLMs in the world" and Meta's "most powerful" model yet, will serve as a "teacher" for the new models. However, the model, with 288bn active parameters and claimed better performance than the recently released OpenAI GPT-4.5, Gemini 2.0 Pro and Anthropic's Claude Sonnet 3.7, is still in training. Llama 4 Scout and Maverick are available for download now, Meta said.
Moreover, Meta, which recently integrated Meta AI into its social media and messaging services, now gives users access to the new models on WhatsApp, Messenger and Instagram. However, users and companies in the EU cannot use the Llama multimodal models as a result of the region's AI regulations, which place strict limits on how AI models can be trained using EU data. Last year, the company called the EU's regulations around AI models "unpredictable". Earlier this year, fresh documents from an ongoing lawsuit against Meta alleged that the Facebook parent was aware that its training dataset for Llama contained copyrighted material used without permission.
[24]
Meta Denies Any Wrongdoing in Llama 4 Benchmarks
Meta attributed the mixed performance reports to implementation stability rather than flaws in the training process. Meta has denied allegations that its Llama 4 models were trained on benchmark test sets. In a post on X, Ahmad Al-Dahle, Meta's VP of GenAI, said, "We've also heard claims that we trained on test sets -- that's simply not true, and we would never do that." He added that the company released the models as soon as they were ready and that "it'll take several days for all the public implementations to get dialed in." Meta recently launched two new Llama 4 models, Scout and Maverick. Maverick quickly reached the second spot on LMArena, the AI benchmark platform where users vote on the best responses in head-to-head model comparisons. In its press release, Meta pointed to Maverick's ELO score of 1417, ranking it above OpenAI's GPT-4o and just below Gemini 2.5 Pro. However, the version of Maverick evaluated on LMArena isn't identical to what Meta has made publicly available. In its blog post, Meta said that it used an "experimental chat version" tailored to improve "conversationality." Chatbot Arena, run by lmarena.ai (formerly lmsys.org), acknowledged community concerns and shared over 2,000 head-to-head battle results for review. "To ensure full transparency, we're releasing 2,000+ head-to-head battle results for public review. This includes user prompts, model responses, and user preferences," the company said. They also said Meta's interpretation of Arena's policies did not align with expectations, prompting a leaderboard policy update to ensure fair and reproducible future evaluations. "In addition, we're also adding the HF version of Llama-4-Maverick to Arena, with leaderboard results published shortly. Meta's interpretation of our policy did not match what we expect from model providers.
Meta should have made it clearer that "Llama-4-Maverick-03-26-Experimental" was a customised model to optimise for human preference," the company said. The drama around Llama 4 benchmarks started when a now-viral Reddit post cited a Chinese-language report, allegedly from a Meta employee involved in Llama 4's development, which claimed internal pressure to blend benchmark test sets during post-training. "Company leadership suggested blending test sets from various benchmarks during the post-training process, aiming to meet the targets across various metrics," the post read. In the report, the employee wrote that they had submitted their resignation and requested to be excluded from the technical report. AIM reached out to Meta sources and confirmed that the employee has not left the company and that the Chinese post is fake. However, several AI researchers have noted a difference between the benchmarks reported by Meta and the ones they observed. "Llama 4 on LMSys is a totally different style than Llama 4 elsewhere, even if you use the recommended system prompt. Tried various prompts myself," said a user on X. "4D chess move: use Llama 4 experimental to hack LMSys, expose the slop preference, and finally discredit the entire ranking system," quipped Susan Zhang, senior staff research engineer at Google DeepMind. Questions were also raised about the weekend release of Llama 4, as tech giants usually make announcements on weekdays. It is also said that Meta was under pressure to release Llama 4 before DeepSeek launches its next reasoning model, R2. Meanwhile, Meta has announced that it will release its own reasoning model soon. Before the release of Llama 4, The Information had reported that Meta had pushed back the release date at least twice, as the model didn't perform as well on technical benchmarks as hoped -- particularly in reasoning and math tasks. Meta also had concerns that Llama 4 is less capable than OpenAI's models at conducting humanlike voice conversations.
[25]
Meta's Llama 4 puts US back in lead to 'win the AI race' - David Sacks
Social media giant Meta didn't hold back in claiming that its new Llama 4 models "outperform" the competition. The White House AI and crypto czar David Sacks says Meta's release of its latest AI model, Llama 4, has pushed the United States into the lead in the global race for artificial intelligence dominance. "For the US to win the AI race, we have to win in open source too, and Llama 4 puts us back in the lead," Sacks said in an April 5 X post, as speculation continues to mount over the US and China competing for the top spot in the global AI race. Sacks has been outspoken about the AI race since taking on his role following US President Donald Trump's inauguration on Jan. 20. Just over a week into the job, Sacks said he is "confident in the US, but we can't be complacent." Sacks' latest comment came after Meta's AI division said in an X post on the same day that it is introducing the fourth generation of its Llama models, Llama 4 Scout and Llama 4 Maverick. "Our most advanced models yet and the best in their class for multimodality," Meta said. Meta said its Llama 4 Scout model has 17 billion active parameters and uses 16 experts. The company claims it outperforms rival large language models -- Gemma 3, Gemini 2.0 Flash-lite, and Mistral 3.1 -- "across a broad range of widely accepted benchmarks." Meanwhile, Llama 4 Maverick also has 17 billion active parameters but is configured with 128 experts. Meta claimed the Maverick model can outperform GPT-4o and Gemini 2.0 Flash "across a broad range of widely accepted benchmarks." It also said Maverick can perform similarly to DeepSeek v3 on "reasoning and coding tasks" despite using only half the active parameters. Less than a year ago, in July 2024, Meta CEO Mark Zuckerberg said that in 2025, he expects Llama models to become "the most advanced in the industry."
It has been just over two years since Meta first released the limited version of Llama 1 in February 2023. At the time, Meta said it was "blown away" by the demand, receiving over 100,000 requests for access.
[26]
Meta launches new Llama 4 AI models: Scout and Maverick now available in apps
Meta has officially announced its most advanced suite of artificial intelligence models to date: the Llama 4 family. This new generation includes Llama 4 Scout and Llama 4 Maverick, the first of Meta's open-weight models to offer native multimodality and unprecedented context length support. These models also mark Meta's initial foray into using a mixture-of-experts (MoE) architecture. Meta also provided a preview of Llama 4 Behemoth, a highly intelligent LLM currently in training, designed to serve as a teacher for future models. This significant update to the Llama ecosystem arrives roughly a year after the release of Llama 3 in 2024, underscoring Meta's commitment to open innovation in the AI space. According to Meta, making leading models openly available is crucial for fostering the development of personalized experiences. Scout and Maverick: Multimodal models with innovative architecture. Llama 4 Scout and Llama 4 Maverick are the first, efficiency-focused models in this series. Llama 4 Scout boasts 17 billion active parameters across 16 experts, while Llama 4 Maverick also features 17 billion active parameters but leverages a larger pool of 128 experts within its MoE architecture. Meta highlights the efficiency of this design, noting that Llama 4 Scout can fit on a single H100 GPU (with Int4 quantization) and Llama 4 Maverick on a single H100 host, facilitating easier deployment. The adoption of a mixture-of-experts (MoE) architecture is a key advancement. In MoE models, only a fraction of the total parameters are activated for each token, leading to more compute-efficient training and inference, ultimately delivering higher quality for a given compute budget. For instance, Llama 4 Maverick has 400 billion total parameters, but only 17 billion are active during use. Both Llama 4 Scout and Llama 4 Maverick are natively multimodal, incorporating early fusion to seamlessly process text and vision tokens within a unified model backbone.
This allows for joint pre-training on vast amounts of unlabeled text, image, and video data. Meta has also enhanced the vision encoder in Llama 4, building upon MetaCLIP but training it separately alongside a frozen Llama model for better adaptation. Llama 4 Behemoth: A teacher model with promising benchmarks. Meta provided a glimpse into Llama 4 Behemoth, a teacher-focused model with a staggering 288 billion active parameters and nearly two trillion total parameters across 16 experts. Meta claims that Behemoth outperforms leading models like GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks such as MATH-500 and GPQA Diamond. While Llama 4 Behemoth is still in training and not yet available, it played a crucial role in "codistilling" the Llama 4 Maverick model, leading to significant quality improvements. Context length and multilingual capabilities. A standout feature of the Llama 4 models is their significantly increased context length. Llama 4 Scout leads the industry with a 10 million token input context window, a dramatic increase from Llama 3's 128K. This extended context enables advanced capabilities like multi-document summarization and reasoning over large codebases. Both models were pre-trained on a massive dataset exceeding 30 trillion tokens, more than double that of Llama 3, and encompassing over 200 languages, with over 100 languages having more than 1 billion tokens each. Integration into Meta apps and developer availability. Reflecting Meta's commitment to open innovation, Llama 4 Scout and Llama 4 Maverick are available for download today on llama.com and Hugging Face. Meta also announced that these models are now powering Meta AI within popular applications like WhatsApp, Messenger, Instagram Direct, and on the Meta.AI website. This integration allows users to directly experience the capabilities of the new Llama 4 models. Meta emphasized its commitment to developing helpful and safe AI models.
Llama 4 incorporates best practices outlined in their AI Protections Developer Use Guide, including mitigations at various stages of development. They also highlighted open-sourced safeguards like Llama Guard and Prompt Guard to help developers identify and prevent harmful inputs and outputs. Addressing the well-known issue of bias in LLMs, Meta stated that significant improvements have been made in Llama 4. The model refuses fewer prompts on debated political and social topics and demonstrates a more balanced response across different viewpoints, showing performance comparable to Grok in this area.
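The mixture-of-experts routing described above can be sketched in a few lines: a gating function scores every expert for each token, and only the top-k experts actually run, so just a fraction of the total parameters is active per token. This is a toy sketch of the general MoE technique, not Meta's implementation; all sizes are illustrative.

```python
# Minimal mixture-of-experts routing sketch: score all experts for a
# token, keep only the top-k, and renormalize their gate weights.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_scores, k=2):
    """Pick the top-k experts for one token and renormalize their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

# 16 experts (as in Scout); only 2 are consulted for this token, so only
# those experts' parameters would be exercised.
scores = [0.1 * i for i in range(16)]
chosen = route_token(scores, k=2)
print(chosen)  # the two highest-scoring experts, 15 and 14
```

In a real MoE layer each expert is a full feed-forward network and the gate is learned, but the budget argument is the same: compute per token scales with the k active experts, not with the total parameter count.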
[27]
Meta Delivers Latest Llama AI Model
Meta said Llama is a multimodal AI system. Multimodal systems are capable of processing and integrating various types of data including text, video, images and audio, and can convert content across these formats. Meta said in a statement that the Llama 4 Scout and Llama 4 Maverick are its "most advanced models yet" and "the best in their class for multimodality." Meta added that Llama 4 Maverick and Llama 4 Scout will be open source software. It also said it was previewing Llama 4 Behemoth, which it called "one of the smartest LLMs in the world and our most powerful yet to serve as a teacher for our new models."
[28]
Meta Could Release Its Next-Gen AI Model Llama 4 in April
Llama 4 is expected to borrow certain technical aspects from DeepSeek. Meta Platforms plans to release the latest version of its large language model later this month, after delaying it at least twice, the Information reported on Friday, as the Facebook owner scrambles to lead in the AI race. Meta, however, could push back the release of Llama 4 again, the report said, citing two people familiar with the matter. Big technology firms have been investing aggressively in AI infrastructure following the success of OpenAI's ChatGPT, which altered the tech landscape and drove investment into machine learning. The report said one of the reasons for the delay was that, during development, Llama 4 did not meet Meta's expectations on technical benchmarks, particularly in reasoning and math tasks. The company was also concerned that Llama 4 was less capable than OpenAI's models in conducting humanlike voice conversations, the report added. Meta plans to spend as much as $65 billion (roughly Rs. 5,39,000 crore) this year to expand its AI infrastructure, amid investor pressure on big tech firms to show returns on their investments. Additionally, the rise of the popular, lower-cost model from Chinese tech firm DeepSeek challenges the belief that developing the best AI model requires billions of dollars. The report said Llama 4 is expected to borrow certain technical aspects from DeepSeek, with at least one version slated to employ a machine-learning technique called the mixture-of-experts method, which trains separate parts of models for specific tasks, making them experts in those areas. Meta has also considered releasing Llama 4 through Meta AI first and then as open-source software later, the report said. Last year, Meta released its mostly free Llama 3 AI model, which can converse in eight languages, write higher-quality computer code and solve more complex math problems than previous versions.
[29]
Meta Has Released Its First Llama 4 Family of AI Models
The Maverick model has 17 billion active parameters and 128 experts Meta introduced the first artificial intelligence (AI) models in the Llama 4 family on Saturday. The Menlo Park-based tech giant released two models -- Llama 4 Scout and Llama 4 Maverick -- with native multimodal capabilities to the open community. The company says these are the first open models built with Mixture-of-Experts (MoE) architecture. Compared to the predecessor, these come with higher context windows and better power efficiency. Alongside, Meta also previewed Llama 4 Behemoth, the largest AI model in the family unveiled so far. In a blog post, the tech giant detailed its new AI models. Just like the previous Llama models, the Llama 4 Scout and Llama 4 Maverick are open-source AI models and can be downloaded via its Hugging Face listing or the dedicated Llama website. Starting today, users can also experience the Llama 4 AI models in WhatsApp, Messenger, Instagram Direct, and on the Meta.AI website. The Llama 4 Scout is a 17 billion active parameter model with 16 experts, whereas the Maverick model comes with 17 billion active parameters and 128 experts. Scout is said to be able to run on a single Nvidia H100 GPU. Additionally, the company claimed that the previewed Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several benchmarks. Meta said the Behemoth model, with 288 billion active parameters and 16 experts, was not released as it is still being trained. Coming to the architecture, the Llama 4 models are built on an MoE architecture. The MoE architecture activates only a fraction of the total parameters based on the requirement of the initial prompt, which makes it more compute efficient for training and inference. In the pre-training phase, Meta also used new techniques such as early fusion to integrate text and vision tokens simultaneously, and MetaP to set critical model hyper-parameters and initialisation scales. 
For post-training, Meta chose to start the process with lightweight supervised fine-tuning (SFT), followed by online reinforcement learning (RL) and lightweight direct preference optimisation (DPO). The sequence was chosen to not over-constrain the model. The researchers also performed SFT on only the harder 50 percent of the dataset. Based on internal testing, the company claimed that the Maverick model outperforms Gemini 2.0 Flash, DeepSeek v3.1, and GPT-4o on the MMMU (image reasoning), ChartQA (image understanding), GPQA Diamond (reasoning and knowledge), and MTOB (long context) benchmarks. On the other hand, the Scout model is said to outperform Gemma 3, Mistral 3.1, and Gemini 2.0 on the MMMU, ChartQA, MMLU (reasoning and knowledge), GPQA Diamond, and MTOB benchmarks. Meta has also taken steps to make the AI models safer in both the pre-training and post-training processes. In pre-training, the researchers used data filtering methods to ensure harmful data was not added to its knowledge base. In post-training, the researchers added open-source safety tools such as Llama Guard and Prompt Guard to protect the model from external attacks. Additionally, the researchers have also stress-tested the models internally and have allowed red-teaming of the Llama 4 Scout and Maverick models. Notably, the models are available to the open community with a permissive Llama 4 licence. It allows both academic and commercial usage of the models; however, companies with more than 700 million monthly active users must obtain a separate licence from Meta.
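The final stage of the post-training sequence above is direct preference optimisation. As a hedged illustration, the sketch below computes the standard published DPO objective (not necessarily Meta's internal variant): it rewards the policy for preferring the chosen answer over the rejected one by a wider margin than a frozen reference model does. All log-probabilities are made-up toy numbers.

```python
# Toy illustration of the standard DPO loss:
#   L = -log sigmoid(beta * (policy margin - reference margin))
# where each margin is log p(chosen) - log p(rejected).
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A positive margin (policy prefers the chosen answer more strongly than
# the reference does) drives the loss below -log(0.5) ~= 0.693.
better = dpo_loss(-2.0, -5.0, -3.0, -4.0)   # margin = 2  -> loss ~= 0.598
neutral = dpo_loss(-3.0, -4.0, -3.0, -4.0)  # margin = 0  -> loss ~= 0.693
print(better, neutral)
```

Minimising this loss nudges the model's relative likelihoods toward human preference data without training a separate reward model, which is why it is considered a lightweight final step.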
[30]
Meta nears release of new AI model Llama 4 this month: Report
Meta Platforms plans to release the latest version of its large language model later this month, after delaying it at least twice, the Information reported on Friday, as the Facebook owner scrambles to lead in the AI race. Meta, however, could push back the release of Llama 4 again, the report said, citing two people familiar with the matter. Big technology firms have been investing aggressively in AI infrastructure following the success of OpenAI's ChatGPT, which altered the tech landscape and drove investment into machine learning. The report said one of the reasons for the delay was that, during development, Llama 4 did not meet Meta's expectations on technical benchmarks, particularly in reasoning and math tasks. The company was also concerned that Llama 4 was less capable than OpenAI's models in conducting humanlike voice conversations, the report added. Meta plans to spend as much as $65 billion this year to expand its AI infrastructure, amid investor pressure on big tech firms to show returns on their investments. Additionally, the rise of the popular, lower-cost model from Chinese tech firm DeepSeek challenges the belief that developing the best AI model requires billions of dollars. The report said Llama 4 is expected to borrow certain technical aspects from DeepSeek, with at least one version slated to employ a machine-learning technique called the mixture-of-experts method, which trains separate parts of models for specific tasks, making them experts in those areas. Meta has also considered releasing Llama 4 through Meta AI first and then as open-source software later, the report said.
Last year, Meta released its mostly free Llama 3 AI model, which can converse in eight languages, write higher-quality computer code and solve more complex math problems than previous versions.
[31]
Meta releases new AI model Llama 4
Meta Platforms on Saturday released the latest version of its large language model (LLM) Llama, called the Llama 4 Scout and Llama 4 Maverick. Meta said Llama is a multimodal AI system. Multimodal systems are capable of processing and integrating various types of data including text, video, images and audio, and can convert content across these formats. Meta said in a statement that the Llama 4 Scout and Llama 4 Maverick are its "most advanced models yet" and "the best in their class for multimodality." Meta added that Llama 4 Maverick and Llama 4 Scout will be open source software. It also said it was previewing Llama 4 Behemoth, which it called "one of the smartest LLMs in the world and our most powerful yet to serve as a teacher for our new models." Big technology firms have been investing aggressively in artificial intelligence (AI) infrastructure following the success of OpenAI's ChatGPT, which altered the tech landscape and drove investment into machine learning. The Information reported on Friday that Meta had delayed the launch of its LLM's latest version because, during development, Llama 4 did not meet Meta's expectations on technical benchmarks, particularly in reasoning and math tasks. The company was also concerned that Llama 4 was less capable than OpenAI's models in conducting humanlike voice conversations, the report added. Meta plans to spend as much as $65 billion this year to expand its AI infrastructure, amid investor pressure on big tech firms to show returns on their investments.
[32]
How to use Meta's Llama 4: A quick guide for developers and enterprises
Meta's Llama 4, an open-weight AI model, targets developers and startups with two variants -- Scout and Maverick -- offering transparency, flexible deployment, and alternatives to GPT-4 and Gemini. It supports multimodal tasks and fine-tuning via PyTorch. Available on GitHub, Azure, and Hugging Face, it promotes open innovation with licensing caveats. Meta's newest open-weight large language model, Llama 4, is now out -- aimed directly at developers, AI startups, and researchers looking for alternatives to OpenAI's GPT-4 and Google's Gemini. Available in two variants, Scout and Maverick, the model uses a mixture-of-experts (MoE) architecture that activates only parts of the model per query, making it more efficient to run. Here's how to start using it. Where to get it: Meta has made Llama 4 available via its GitHub repository, with access gated through a request form for responsible usage. It's also integrated into popular platforms like Hugging Face and Microsoft's Azure AI Studio, allowing plug-and-play experimentation. Use cases: Llama 4 Scout is lightweight enough to run inference on a single Nvidia H100 GPU -- think chatbots, coding assistants, or search. Maverick, the more powerful variant, is built for complex reasoning and multimodal tasks, like understanding images alongside text, or parsing technical documents. Integration paths: You can deploy Llama 4 through standard APIs, or fine-tune it using PyTorch-based tools. The independent LlamaIndex project (formerly GPT Index) offers tooling for retrieval-augmented generation (RAG) pipelines, a critical use case for enterprise applications. Caveats: While Meta brands Llama 4 as "open source," its license restricts use by companies with over 700 million monthly active users -- a clear swipe at rivals like Google and Amazon. Commercial users still need to review licensing terms closely. Why it matters: Unlike closed models like GPT-4 or Claude, Llama 4 offers transparency and adaptability.
For AI-native startups, it's a low-barrier entry into high-performance models. For large enterprises, it's an opportunity to reduce dependency on proprietary APIs -- if they're willing to build.
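Retrieval-augmented generation, named above as a critical enterprise use case, boils down to fetching the most relevant stored passage and prepending it to the prompt sent to the model. The sketch below uses naive bag-of-words cosine similarity purely for illustration; production pipelines (for example, those built with LlamaIndex) use learned embeddings and a vector store instead.

```python
# Minimal sketch of the retrieval step in a RAG pipeline: score stored
# passages against a query and prepend the best match to the prompt.
from collections import Counter
import math

def tokenize(text):
    return [w.strip(".,?!") for w in text.lower().split()]

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, passages):
    q = Counter(tokenize(query))
    return max(passages, key=lambda p: cosine(q, Counter(tokenize(p))))

docs = [
    "Llama 4 Scout supports a 10 million token context window.",
    "The office cafeteria closes at 3 pm on Fridays.",
]
context = retrieve("what is the context window of scout", docs)
prompt = f"Answer using this context:\n{context}\n\nQuestion: ..."
print(context)
```

The model never needs the whole document store in its context window; retrieval narrows the input to what is relevant, which is why RAG pairs well with open-weight models that enterprises host themselves.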
[33]
Google CEO Sundar Pichai congratulates Meta on Llama 4 launch
Google CEO Sundar Pichai congratulated Meta on the launch of Llama 4 AI models, saying, "Never a dull day in the AI world!" Llama 4 includes Scout and Maverick, which are multimodal and outperform previous models. Llama 4 Behemoth, still in training, aims to surpass GPT-4.5 and Gemini 2.0 Pro. Google CEO Sundar Pichai recently congratulated the Meta team behind the company's latest large language models, Llama 4 Scout and Llama 4 Maverick. Posting on X, Pichai said, "Never a dull day in the AI world! Congrats to the Llama 4 team, Onwards!" Llama 4 details: Llama 4 is a multimodal AI system capable of processing and integrating diverse data types, including text, video, images, and audio, while seamlessly converting content across these formats. Meta describes Llama 4 Scout and Llama 4 Maverick as its "most advanced models yet," claiming they outperform all previous Llama generations. Meta AI, the company's AI-powered assistant integrated into apps such as WhatsApp, Messenger, and Instagram, has been updated to utilise Llama 4 in 40 countries. The models are open-source and immediately available, while Llama 4 Behemoth, a "teacher model" designed to enhance the new models, remains in training. Meta asserts that Behemoth can surpass GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks such as MATH-500 and GPQA Diamond. Additionally, Meta announced an exclusive event, 'LlamaCon,' scheduled for April 29, where it will outline the possibilities and potential of Llama in greater detail.
[34]
ETtech Explainer: How Meta's Llama 4 stacks up against Chinese AI models Qwen, DeepSeek, and Manus AI
Meta has launched Llama 4, its latest open-weight AI models, including Scout, Maverick, and Behemoth, offering advanced multimodal capabilities. These models excel in text, image, and video processing. Meta claims that they outpace rivals like OpenAI's GPT-4 and China's Qwen, with strong growth in open-source adoption globally. Meta on Saturday unveiled its latest suite of open-weight AI models under the Llama 4 family, including two new variants -- Llama 4 Maverick and Llama 4 Scout -- aimed at delivering personalised, multimodal systems. The tech giant also offered a preview of Llama 4 Behemoth, a high-capacity model still in training. Meta described it as its most powerful language model to date, designed to act as a "teacher" for smaller variants. Despite their smaller size, the new models outperform rivals such as OpenAI's GPT-4o and Google's Gemini 2.0 Flash in a variety of standard tests, called benchmarks, the Mark Zuckerberg-led company claims. Let's dive deeper into the nuances of this newly launched open-weight large language model. Scout and Maverick: Meta has introduced Llama 4 Scout, a powerful yet lightweight AI model built with 17 billion active parameters and 16 experts. Scout runs efficiently on a single high-end graphics card -- the NVIDIA H100 -- making it more accessible to developers and researchers. In AI terms, parameters act like brain cells, helping the model learn from data, understand language, and make informed decisions. The more parameters, the smarter the model can be. Additionally, Scout can process 10 million tokens at once, making it useful for decoding large amounts of data and information in one go. Compared to earlier versions of Llama models, it outperforms other similar AI systems such as Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 in various standard tests, the company said in a blog post. Meanwhile, Llama 4 Maverick, another model in the lineup, features the same number of active parameters but uses 128 experts.
Experts are specialised parts within the model. Instead of using the whole model every time, it picks a few of these "experts" depending on the task -- making it faster and more efficient. Meta's Llama 4 models, including Scout, were trained on a wide range of images and video stills to help them develop a better understanding of visuals, including recognising actions over time and identifying connections between related images. Although pre-training involved up to 48 images, the team found the models perform reliably with up to eight images in real-world testing. The company also claims that Maverick performs comparably to the DeepSeek v3 model while using less than half the active parameters. The tech behind: The models are built on a new "mixture of experts" (MoE) architecture. Llama 4 is trained to work with both text and visual inputs such as images or videos. It uses a technique called early fusion, meaning it processes all input types through one combined system from the beginning. This allows it to be trained on a mix of text, image, and video data all at once. In fact, Meta has used a new approach called MetaP, which helps decide how much each part of the model should learn during training. Llama 4 was trained on data from 200 languages, a much larger pool than the earlier Llama 3. "Our newest models include smaller and larger options to accommodate a range of use cases and developer needs. Llama 4 Maverick offers unparalleled, industry-leading performance in image and text understanding, enabling the creation of sophisticated AI applications that bridge language barriers," the company said. Moreover, training used FP8 numerical precision, which makes calculations more efficient without lowering the model's quality. Integration into Meta's Platforms: In a push for open source, Llama 4 Scout and Llama 4 Maverick are available on llama.com and Hugging Face for easy access.
Users can now try Llama 4 in Meta Platforms such as WhatsApp, Messenger, Instagram Direct, and on the Meta.AI website. These apps have been updated to use Llama 4 in 40 countries, though multimodal capabilities are currently available only in the US and exclusively in English. Preview of Llama 4 Behemoth: Along with the above models, Meta introduced the Llama 4 Behemoth, a massive mixture-of-experts (MoE) multimodal model with 288 billion active parameters, 16 experts, and close to 2 trillion total parameters. The model encompasses capabilities in math, multilingual understanding, and image processing, particularly among models that don't specialise in reasoning. China comparison: China has been advancing its AI capabilities, with the launch of Alibaba's Qwen Series, DeepSeek's R1, Manus AI, and Tencent's Hunyuan Turbo S, among others. DeepSeek's low-cost model, developed within two months and with an investment of less than $6 million, starkly contrasts with the $100 million OpenAI reportedly spent on training its GPT-4 model, ET reported in January. Qwen achieves competitive performance against top-tier models and outcompetes DeepSeek V3 in popular coding and user query benchmarks. Qwen 2.5 Max has 72 billion parameters and likely more in its Max configuration. It is trained on 20 trillion tokens, surpassing DeepSeek's 14.8 trillion. Last month, another Chinese model, Manus AI, was introduced. A leap into AI autonomy, it is capable of executing multi-step workflows, and accessing authoritative data sources through application programming interfaces (API). Manus has achieved new state-of-the-art (SOTA) performance across all its three difficulty levels. However, according to a survey by Artificial Analysis, an independent site for AI benchmarking, Llama was the number two most considered model and the industry leader in open source.
As AI capabilities scale globally, the landscape is increasingly being defined not just by performance, but also by openness, efficiency, and integration across languages and modalities -- areas where both Meta and China's tech giants are making bold strides. ET reported that Meta's Llama series of AI models have become the fastest growing open-source family of models, securing 350 million downloads globally on Hugging Face. Monthly usage (token volume) of Llama grew 10x from January to July 2024 for some of the largest cloud service providers. Meta is set to hold its inaugural LlamaCon AI conference on April 29 and launch a dedicated app for its Meta AI chatbot sometime in the second quarter.
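The early fusion technique described above can be pictured as mapping both text tokens and image patches into one shared embedding space and feeding the combined sequence to a single backbone from the first layer. This is a toy sketch: the hash-based "encoders" and embedding width are made-up stand-ins for the real learned components.

```python
# Toy sketch of "early fusion": text tokens and image patches land in the
# same embedding space and form one sequence for a single backbone.
DIM = 4  # illustrative embedding width

def embed_text_token(token):
    # Stand-in for a learned token embedding table.
    return [((hash(token) >> (8 * i)) % 100) / 100.0 for i in range(DIM)]

def embed_image_patch(patch):
    # Stand-in for a learned vision encoder projecting a pixel patch.
    return [(sum(patch) * (i + 1) % 100) / 100.0 for i in range(DIM)]

text = ["a", "cat", "on", "a", "mat"]
patches = [(12, 40, 33), (7, 9, 80)]  # fake pixel triples

sequence = [embed_text_token(t) for t in text] + \
           [embed_image_patch(p) for p in patches]

# One unified sequence: 5 text positions followed by 2 image positions,
# all with the same embedding width, seen jointly by every layer.
print(len(sequence), len(sequence[0]))  # 7 4
```

The contrast is with "late fusion" designs, where separate text and vision models run independently and are merged only near the output; early fusion lets every layer attend across both modalities.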
[35]
Llama 4 Herd Series Released : Meta's Breakthrough in Open Source AI Models
Meta's Llama 4 series represents a pivotal advancement in the realm of open-weight AI models, delivering notable improvements in performance, scalability, and accessibility. With the introduction of Llama 4 Scout and Llama 4 Maverick, alongside the anticipation of future models like Llama 4 Reasoning and Llama 4 Behemoth, Meta continues to push the boundaries of innovation in open source AI. These models use state-of-the-art technologies, such as the Mixture of Experts (MOE) architecture, to achieve remarkable capabilities in multimodal processing, long-context handling, and efficient scaling. Prompt Engineering provides more insight into the latest Llama 4 release by Meta. The Llama 4 series introduces a diverse lineup of models, each designed to address specific use cases while maintaining a balance between efficiency and performance. These models cater to a wide range of applications, from lightweight deployments to complex, large-scale tasks. The Llama 4 series incorporates innovative technologies that distinguish it from its predecessors and competitors. These advancements enhance the models' efficiency, scalability, and versatility, making them suitable for a wide array of applications. The Llama 4 models demonstrate exceptional performance across various benchmarks, highlighting their potential to lead in the AI domain. Their design emphasizes not only raw power but also practical usability, making them a valuable tool for organizations of all sizes. Meta's Llama 4 series reflects a strong commitment to open source principles, making sure that these advanced models are accessible to a broad audience while maintaining certain restrictions to align with ethical and competitive considerations. The Llama 4 series aligns with broader trends in the AI industry, reinforcing Meta's leadership in open-weight AI innovation.
Its design and capabilities reflect a forward-thinking approach to addressing the evolving demands of AI applications. The Llama 4 series is not just a technological achievement but a glimpse into the future of AI. With ongoing developments and a clear focus on innovation, these models are poised to redefine the possibilities of open-weight AI.
[36]
Llama 4: 10 Million Context Window Fully Tested
Meta AI has unveiled Llama 4, a new generation of open large language models (LLMs) that sets new standards in efficiency, multimodal functionality, and long-context processing. Designed to compete with proprietary models, Llama 4 introduces three distinct variants -- Scout, Maverick, and Behemoth -- each optimized for specific tasks and benchmarks. These models redefine the landscape of open LLMs, offering accessible and advanced AI solutions for a wide range of applications. With three distinct models -- Scout, Maverick, and Behemoth -- Llama 4 isn't just another step forward; it's a leap. Whether you're tackling complex coding challenges, summarizing massive research documents, or exploring the possibilities of multimodal AI, these models are built to deliver. And the best part? They're not just about raw power; they're efficient, scalable, and accessible, making them a fantastic option for anyone looking to harness the potential of AI without breaking the bank. Llama 4 Scout is a compact yet highly capable model, boasting 17 billion active parameters and 16 experts. Its standout feature is an impressive 10-million-token context window, allowing unparalleled performance in tasks requiring extensive long-context processing. This model excels in areas such as summarizing multi-document research, analyzing large-scale codebases, and solving intricate algorithmic challenges. These advancements make Scout an ideal choice for applications that demand precision, scalability, and efficiency, particularly in research and technical fields. Building on the foundation laid by Scout, Llama 4 Maverick incorporates 17 billion active parameters and 128 experts. Its defining feature is early fusion technology, which seamlessly integrates text and vision for multimodal tasks. This capability positions Maverick as a leader in areas such as image grounding, text-vision reasoning, and creative design.
Despite its relatively compact size, Maverick rivals larger proprietary models like DeepSeek V3 in reasoning and coding tasks. Its ability to process both textual and visual data makes it particularly valuable for industries ranging from academic research to media production. Maverick's efficiency and versatility establish it as a formidable player in the competitive LLM landscape. Currently in training, Llama 4 Behemoth is already outperforming leading models such as GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro in STEM benchmarks. As the powerhouse behind Scout and Maverick, Behemoth is engineered to handle the most demanding tasks, including advanced logical problem-solving and complex algorithmic implementation. Although its full capabilities are yet to be revealed, Behemoth is expected to set unprecedented standards for open LLM performance. Its release is anticipated to further solidify Llama 4's position as a leader in the AI domain, offering unparalleled tools for tackling the most challenging computational problems. The success of Llama 4 is rooted in its innovative mixture of experts (MoE) architecture. This design activates only a subset of parameters per token, significantly enhancing computational efficiency without compromising performance. This efficiency allows Llama 4 models to be deployed on single H100 GPUs, making them more accessible for large-scale use. Additional architectural advancements include: These innovations ensure that Llama 4 remains at the forefront of AI technology, offering a blend of performance and accessibility that is rare in the field of open LLMs. Llama 4 models demonstrate exceptional performance across a diverse range of tasks, including: While Scout and Maverick showcase remarkable capabilities, certain limitations, such as SVG generation, highlight areas for potential improvement.
These models, however, continue to push the boundaries of what open LLMs can achieve. The versatility of Llama 4 models unlocks numerous possibilities across various industries. Key applications include summarizing multi-document research, analyzing large codebases, reviewing lengthy legal documents, and multimodal content generation. These capabilities make Llama 4 an indispensable tool for researchers, developers, and organizations seeking innovative AI solutions tailored to their specific needs. With the ongoing development of Llama 4 Behemoth, Meta AI is poised to redefine the benchmarks for open LLMs. The robust architecture, efficiency, and performance of these models position them as strong alternatives to proprietary competitors like Gemini 2.0 Flash. As artificial intelligence continues to evolve, Llama 4 represents a significant step forward in making advanced language models more accessible and impactful. Whether addressing complex research challenges or exploring new frontiers in AI, Llama 4 provides the tools and capabilities to drive innovation and success.
[37]
Mark Zuckerberg's Meta Unveils Latest Llama 4 AI Models To Challenge OpenAI's ChatGPT 4 And Google's Gemini - Alphabet (NASDAQ:GOOG), Alphabet (NASDAQ:GOOGL)
Meta Platforms Inc. (NASDAQ: META) unveiled the first models of its latest open-source artificial intelligence (AI) software, Llama 4, on Saturday. What Happened: The Mark Zuckerberg-led company announced that users can now experiment with two of the newly launched Llama 4 models, namely Llama 4 Scout and Llama 4 Maverick, on Meta AI via WhatsApp, Messenger, Instagram Direct, or the Meta AI website. Llama 4 Scout is a 17 billion active parameter model with 16 experts, while Llama 4 Maverick is a 17 billion active parameter model with 128 experts. The company stated that the new Llama 4 models are its first to feature a mixture of experts (MoE) architecture. Despite the release, the company has not launched the most potent version of Llama 4, the Llama 4 Behemoth, which is still in training, as per the company blog post. Chris Cox, Meta's Chief Product Officer, stated last month that Llama 4 will elevate the capabilities of AI agents, allowing them to achieve more advanced levels of reasoning and action. These AI agents could be beneficial to both consumers and businesses by handling various tasks and web surfing. Zuckerberg stated in an Instagram video, "Our goal is to build the world's leading AI, open source it, and make it universally accessible so that everyone in the world benefits." The company also plans to host its inaugural LlamaCon AI conference on April 29. As per CNBC, the company is also expected to announce a standalone app for its Meta AI chatbot in Q2. Why It Matters: Meta stated that the newest Llama 4 releases are its "most advanced models yet" and "the best in their class for multimodality." The company also believes that the models outperform OpenAI's GPT-4o, Alphabet's Gemma 3 and Gemini 2.0 Flash-Lite, as well as Mistral 3.1 across several parameters.
The unveiling of Llama 4 models comes at a crucial time for Meta. The company's AI research leadership is undergoing major changes: on April 2, 2025, Joelle Pineau, the Vice President of AI research at Meta, announced her departure from the company in May. Moreover, the company has recently faced controversy over its AI models, as a court filing in the Kadrey v. Meta lawsuit indicated that Meta had been profiting from its Llama AI models through revenue-sharing agreements with host businesses, despite previous denials. Despite these challenges, Meta continues to invest in AI, with the launch of Llama 4 models being the latest step in this direction. Meta holds a momentum rating of 57.81% and a growth rating of 74.76%, according to Benzinga's Proprietary Edge Rankings. The shares of Meta closed 5.06% lower at $504.73 on Friday. The stock plunged more than 19% over the past month.
[38]
New Llama 4 AI Model 10 Million Token Context Window
Meta has unveiled Llama 4, its latest artificial intelligence model, designed to redefine the boundaries of AI technology. This advanced model comes in two distinct variants -- Maverick and Scout -- each tailored to meet specific needs. Among its standout features is an unprecedented 10 million token context length, a capability that positions Llama 4 as a pivotal tool for industries that rely on processing extensive datasets. By exploring its features, technical innovations, and potential applications, you can better understand how this model is shaping the future of artificial intelligence. With its innovative design and a jaw-dropping token context length, this new generation of AI promises to tackle the very challenges that have long held us back. Imagine analyzing entire books or processing hours of video in one seamless session -- no more breaking things into smaller chunks or losing context along the way. In this overview, AI Advantage explores how Llama 4's innovative features, from its multimodal capabilities to its efficient architecture, are poised to transform industries and redefine how we interact with technology. Llama 4 is available in two specialized versions: Scout, optimized for long-context processing, and Maverick, geared toward multimodal tasks. The Scout variant's ability to process vast amounts of data in a single session is particularly noteworthy. This capability eliminates the need to break down large datasets into smaller segments, allowing seamless analysis of entire books, extensive legal documents, or hours of video transcripts. Such functionality is invaluable for industries like legal research, scientific analysis, and media archiving, where handling long-context tasks efficiently is critical. Llama 4's exceptional performance is driven by its innovative "mixture of experts" architecture, a design that dynamically allocates computational resources based on the task at hand. This ensures efficient operation, even on less powerful hardware.
For you, this means the possibility of running advanced AI models locally without requiring expensive, high-end infrastructure. Another key feature is its multimodal capability, which allows the model to process text, images, and videos. While video support is still limited in consumer-facing applications, this functionality signals a future where AI can seamlessly integrate and analyze diverse data types. This multimodal approach enhances the model's versatility, making it suitable for a wide range of applications, from content generation to complex data analysis. Llama 4 is positioned as an open source model, offering significant flexibility for developers, researchers, and businesses. However, it comes with certain restrictions: users must sign in and accept Meta's license terms, which fall short of a truly open-source grant, so the models are more accurately described as open weights. Despite these limitations, the ability to run the model locally provides a distinct advantage. By eliminating the need for cloud-based solutions, you can reduce operational costs, maintain greater control over your data, and customize the model to suit your specific needs. This accessibility makes Llama 4 an attractive option for organizations aiming to use advanced AI capabilities without relying on external infrastructure. Llama 4 establishes a new benchmark in AI performance, with an experimental Maverick variant achieving an ELO score of 1,417 on the LMArena leaderboard, surpassing GPT-4.5 and other leading models. This performance metric underscores its capability as a competitive alternative in the AI landscape. Additionally, its cost-efficient scalability makes it an appealing choice for organizations seeking high performance without incurring excessive expenses. One of the most significant advancements is Llama 4's ability to replace traditional retrieval-augmented generation (RAG) pipelines. By using its extensive context capabilities, the model eliminates the need for external data retrieval systems, simplifying workflows and reducing complexity.
This innovation has far-reaching implications for industries that rely on in-depth data analysis. Potential applications include summarizing entire books, reviewing extensive legal documents, and analyzing hours of video transcripts. These capabilities demonstrate Llama 4's potential to transform workflows, allowing faster, more accurate analysis across various sectors. While Llama 4 offers new capabilities, its consumer-facing implementations currently face certain limitations, such as restricted video support and multimodal features that are initially available only in the US. These constraints may reduce its immediate impact on general consumers. However, Meta's ongoing development efforts suggest that these barriers will likely be addressed in future updates. As these features become more widely available, the model's accessibility and functionality are expected to expand, broadening its appeal to a wider audience. Meta's vision for AI innovation extends beyond Llama 4. The company is already working on its next-generation model, Behemoth, which promises even greater advancements in AI capabilities. As technology continues to evolve, you can expect accelerated innovation, driving efficiency and unlocking new possibilities across industries. Llama 4 represents a significant milestone in this journey, setting the stage for a future where AI plays an increasingly central role in addressing complex challenges and enhancing productivity. Llama 4's introduction marks a pivotal moment in the development of artificial intelligence. With its long-context processing, multimodal support, and efficient architecture, it offers a glimpse into the future of AI-driven solutions. Whether applied to data analysis, content generation, or other advanced tasks, Llama 4 is poised to reshape how industries approach problem-solving and harness the power of technology.
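The back-of-the-envelope arithmetic behind such long-context claims is easy to sketch. The snippet below estimates whether a document collection fits inside Scout's reported 10-million-token window; the ~4 characters-per-token rule of thumb and the output reserve are assumptions for illustration, not properties of the actual Llama 4 tokenizer.

```python
# Rough feasibility check for long-context use: does a set of documents
# fit inside a 10-million-token context window? Token counts here are
# estimated with the common ~4 chars/token heuristic, which is only an
# approximation; real counts depend on the model's tokenizer.
CONTEXT_WINDOW = 10_000_000
CHARS_PER_TOKEN = 4  # heuristic, not the real Llama 4 tokenizer

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(texts: list[str], reserve_for_output: int = 8_192) -> bool:
    # Leave some room in the window for the model's own response.
    total = sum(estimated_tokens(t) for t in texts)
    return total + reserve_for_output <= CONTEXT_WINDOW

# A 300-page book is roughly 600k characters -> ~150k estimated tokens,
# so on this estimate about 60 such books fill the window.
book = "x" * 600_000
print(fits_in_context([book] * 60))
```

By contrast, a typical 128k-token window would hold less than one such book, which is why the 10M figure is pitched at whole-corpus analysis rather than chunked retrieval.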
[39]
Meta Releases Llama 4 AI Models; Beats GPT-4o and Grok 3 in LMArena
The largest Llama 4 Behemoth model is still in training, and it has a total of 2 trillion parameters, with 288B active parameters across 16 experts. After a gap of four months, Meta has released a new series of Llama 4 open-weight models. The new AI models are Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth. Unlike earlier dense models, Meta has gone with the MoE (Mixture of Experts) architecture this time, just like DeepSeek R1 and V3. And all Llama 4 models are natively multimodal from the ground up. First, the smallest Llama 4 Scout model has a total of 109B parameters with 16 experts, but only 17B parameters are active at a time. It also supports a massive context length of 10 million tokens. Meta says Llama 4 Scout (17B) offers better performance than Gemma 3, Mistral 3.1, and Gemini 2.0 Flash Lite. Next, the Llama 4 Maverick model brings a total of 400B parameters with an expanded 128 experts, but again, only 17B parameters are active. This model is more capable than Llama 4 Scout as it has many more specialized expert models. It has a context length of 1 million tokens. Meta claims Llama 4 Maverick beats OpenAI's GPT-4o and Google's Gemini 2.0 Flash. The impressive part about Llama 4 Maverick is that with just 17B active parameters, it has scored an ELO score of 1,417 on the LMArena leaderboard. This puts the Maverick model in the second spot, just below Gemini 2.5 Pro, and above Grok 3, GPT-4o, GPT-4.5, and more. It also achieves comparable results when compared to the latest DeepSeek V3 model on reasoning and coding tasks, and surprisingly, with just half the active parameters. Meta has done a tremendous job distilling the Llama 4 Scout and Maverick models from the largest Llama 4 Behemoth model. The Llama 4 Behemoth AI model has a total of 2 trillion parameters, but only 288 billion parameters are active across 16 experts. Meta says Behemoth is still in training, and more details about its release will be shared later.
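The parameter figures above are easy to put in perspective. The sketch below computes what fraction of each model's total weights is active per token, using the counts reported in this article; the figures are Meta's published numbers, and the code itself is purely illustrative.

```python
# Active vs. total parameters for the Llama 4 family, as reported above.
# In an MoE model only a fraction of the weights participate in any one
# forward pass, which is how a 400B-parameter model can run with just
# 17B "active" parameters per token.
models = {
    "Scout":    {"total": 109e9, "active": 17e9,  "experts": 16},
    "Maverick": {"total": 400e9, "active": 17e9,  "experts": 128},
    "Behemoth": {"total": 2e12,  "active": 288e9, "experts": 16},
}

for name, m in models.items():
    frac = m["active"] / m["total"]
    print(f"{name:9s} activates {frac:.1%} of its {m['total'] / 1e9:.0f}B parameters")
```

The contrast with a dense model is the point: a dense 400B model would touch all 400B weights per token, while Maverick touches about 4% of them, which is what makes single-GPU deployment plausible.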
Meta claims the Llama 4 Behemoth beats the largest AI models such as GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks. Note that these are non-reasoning models, so Meta may be able to extract even better performance from future reasoning models built on the Llama 4 series base models. As for availability, Meta says Llama 4 is rolling out on Meta AI in WhatsApp, Messenger, Instagram, and the Meta AI website, starting today in 40 countries. However, the multimodal features are currently available in the US only.
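The ELO score cited above comes from pairwise human preference votes on LMArena. Under the standard Elo formula (Chatbot Arena's actual fit is Bradley-Terry, which has the same functional form), a rating gap maps directly to an expected head-to-head preference rate; the opponent rating in the example below is hypothetical, chosen only to show the arithmetic.

```python
# How an Elo-style rating gap translates into an expected win rate.
# A 400-point gap corresponds to 10:1 odds under this convention.
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that model A's response is preferred over model B's."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Example: a model rated 1417 against a hypothetical opponent rated 1380.
p = expected_win_rate(1417, 1380)
print(f"Expected preference rate: {p:.1%}")
```

A useful intuition from this: the 1,417-vs-1,380 gap implies only a modest ~55% preference rate, so small leaderboard gaps reflect fairly close head-to-head outcomes.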
[40]
Meta Under Fire for Manipulating Llama 4 Benchmark, But It Isn't the First Time
Meta recently released its Llama 4 series of AI models, making headlines for outranking GPT-4o and Gemini 2.0 Pro in Chatbot Arena (formerly LMSYS). The company claimed that its Llama 4 Maverick model -- an MoE model that activates only 17 billion parameters out of a massive 400B across 128 experts -- achieved an impressive ELO score of 1,417 on the Chatbot Arena benchmark. This result raised eyebrows across the AI community, as a relatively smaller MoE model outranked much larger LLMs such as GPT-4.5 and Grok 3. The unusual performance from a small model led many in the AI community to test the model independently. Surprisingly, the real-world performance of Llama 4 Maverick didn't match Meta's benchmark claims, particularly in coding tasks. On 1Point3Acres, a popular forum for Chinese people in North America, a user claiming to be a former Meta employee posted a bombshell. According to the post, which has been translated into English on Reddit, the Meta leadership allegedly mixed "the test sets of various benchmarks in the post-training process" to inflate the benchmark score and meet internal targets. The Meta employee found the practice unacceptable and chose to resign. The former employee also asked the team to exclude their name from the Llama 4 technical report. In fact, the user claims that the recent resignation of Meta's Head of AI research, Joelle Pineau, is directly linked to the Llama 4 benchmark hacking. In response to the growing allegations, Ahmad Al-Dahle, head of Meta's Generative AI division, shared a post on X. He firmly dismissed the claim that Llama 4 was post-trained on the test sets. Al-Dahle writes: "We've also heard claims that we trained on test sets -- that's simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations." He acknowledged the inconsistent Llama 4 performance across different platforms.
He also urged the AI community to give it a few days for the implementation to get "dialed in." LMSYS Responds to Llama 4 Benchmark Manipulation Allegations: Following concerns from the AI community, LMSYS -- the organization behind the Chatbot Arena leaderboard -- issued a statement to improve transparency. LMSYS clarified that the submitted model on Chatbot Arena was "Llama-4-Maverick-03-26-Experimental," a custom variant of the model optimized for human preference. LMSYS acknowledged that "style and model response tone was an important factor," which may have given an undue advantage to the custom Llama 4 Maverick model. The organization also admitted that this information was not made sufficiently clear by the Meta team. In addition, LMSYS stated, "Meta's interpretation of our policy did not match what we expect from model providers." To be fair, Meta, in its official Llama 4 blog, mentioned that "an experimental chat version" scored 1,417 on Chatbot Arena, but it didn't explain anything further. Finally, to improve transparency, LMSYS added the Hugging Face version of Llama 4 Maverick to Chatbot Arena. Besides that, it has released over 2,000 head-to-head battle results for the public to review. The results include prompts, model responses, and user preferences. I reviewed the battle results, and it was baffling to see users consistently preferring Llama 4's often incorrect and overly verbose responses. This raises deeper questions about trusting community-driven benchmarks like Chatbot Arena. Not the First Time Meta Has Been Accused of Gaming Benchmarks: This isn't the first time Meta has been accused of gaming benchmarks through data contamination, i.e., mixing benchmark datasets into the training corpus. Back in February this year, Susan Zhang -- a former Meta AI researcher who now works at Google DeepMind -- shared a revealing study in response to a post by Yann LeCun, Meta AI's chief scientist.
The study found that over 50% of test samples from key benchmarks were present in Meta's Llama 1 pretraining data. The paper says: "In particular, Big Bench Hard, HumanEval, HellaSwag, MMLU, PiQA, and TriviaQA show substantial contamination levels across both corpora." Now, amid the latest benchmark hacking allegations around Llama 4, Zhang has sarcastically noted that Meta should at least cite its "previous work" from Llama 1 for this "unique approach." The jab suggests that, for the Zuckerberg-led company, benchmark manipulation is not an accident but a strategy to artificially boost performance metrics.
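The kind of contamination such studies measure can be illustrated with a toy n-gram overlap check: flag a benchmark sample if enough of its word n-grams also appear in the training corpus. This is a minimal sketch of the general idea only, not the cited paper's methodology; the n-gram length and threshold here are arbitrary choices.

```python
# Toy benchmark-contamination check via word n-gram overlap.
# A test sample is flagged as contaminated when a large share of its
# n-grams also occur in the training corpus. Real contamination studies
# use more careful normalization and thresholds than this sketch.
def ngrams(text: str, n: int = 8) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(sample: str, corpus: str, n: int = 8,
                 threshold: float = 0.5) -> bool:
    sample_grams = ngrams(sample, n)
    if not sample_grams:
        return False  # sample too short to judge
    corpus_grams = ngrams(corpus, n)
    overlap = len(sample_grams & corpus_grams) / len(sample_grams)
    return overlap >= threshold
```

A sample copied verbatim into the corpus scores 100% overlap and is flagged; an unrelated passage scores near zero. The hard cases, paraphrased or partially overlapping text, are exactly why published contamination analyses go well beyond this sketch.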
[41]
Meta Finally Reveals The Truth About Llama 4 AI Models
The release of Llama 4 by Meta has ignited widespread discussion within the artificial intelligence community, bringing critical issues such as transparency, performance evaluation, and organizational challenges into sharp focus. While the model showcases potential in certain applications, its debut has also raised significant concerns about Meta's practices and the broader implications for the AI industry. For those navigating the rapidly evolving AI landscape, understanding these developments is vital to grasp the current state and future trajectory of the field. The truth is, the rollout of Llama 4 has sparked more questions than answers, leaving many in the AI community divided. From concerns about benchmark discrepancies to whispers of internal struggles at Meta, the situation highlights the growing pains of an industry racing to innovate while grappling with accountability. But amidst the noise, there's an opportunity to learn -- not just about Llama 4, but about the broader challenges shaping the AI landscape. AI Grid unpacks what this all means for developers, researchers, and anyone invested in the future of artificial intelligence. Meta's introduction of Llama 4 was met with immediate scrutiny due to the absence of a detailed technical paper accompanying the release. Transparency is a cornerstone of trust in AI development, and the lack of comprehensive documentation has left many questioning the credibility of Meta's claims. Allegations of benchmark manipulation have further complicated the narrative, with critics pointing to inconsistencies between internal and public versions of the model. These discrepancies have fueled skepticism, emphasizing the importance of open communication and rigorous documentation in advancing AI technologies. For users and researchers alike, this situation underscores the need for companies to prioritize clarity and accountability when unveiling new advancements. 
Performance benchmarks serve as a critical measure of an AI model's capabilities, and Llama 4 has received mixed reviews in this regard. While some users have reported notable success in specific applications, others argue that the model underperforms compared to competitors in key areas. Concerns about potential contamination in benchmark evaluations -- where test data may inadvertently overlap with training data -- have added to the controversy. These issues highlight the urgent need for standardized benchmarking practices that ensure fair and accurate comparisons across AI models. For developers and users, reliable benchmarks are essential tools for assessing the practical utility of AI systems in real-world scenarios. Behind the scenes, Meta appears to be grappling with significant internal challenges related to Llama 4's development. Reports suggest that the high costs associated with the model, coupled with its failure to meet certain performance expectations, have placed considerable strain on the company. Additionally, the departure of key personnel has raised questions about Meta's organizational stability and ability to navigate the competitive AI landscape. These internal struggles are not unique to Meta; they reflect broader challenges faced by companies operating in a fast-paced industry where innovation often comes at a high cost. For stakeholders, these developments serve as a reminder of the complex dynamics that shape the creation and deployment of innovative AI technologies. The AI community's response to Llama 4 has been deeply divided. While some users have praised the model for its specific use-case performance, others have expressed disappointment over its perceived limitations.
Beyond technical evaluations, the debate has brought attention to a growing concern within the industry: the prioritization of competitive advantage over transparency and ethical practices. This trend raises critical questions about the long-term implications of such strategies for the AI field. For users, developers, and researchers, the controversy surrounding Llama 4 serves as a call to reflect on the values that should guide AI development, including fairness, accountability, and the responsible use of technology. In response to the criticisms, Meta has denied allegations of unethical practices, such as training on test sets to artificially inflate performance metrics. The company has acknowledged that earlier versions of Llama 4 may have exhibited inconsistent quality but has committed to ongoing improvements. While this acknowledgment represents a step toward addressing user concerns, it also highlights the challenges of maintaining trust in a highly competitive environment. For the AI industry as a whole, the controversy surrounding Llama 4 reflects broader issues, including the tension between rapid innovation and the need for transparency and ethical standards. As companies race to outpace competitors, striking a balance between these priorities becomes increasingly critical. The release of Llama 4 has brought several pressing issues in AI development to the forefront, from transparency and performance evaluation to internal organizational challenges. For those engaged in the AI field, staying informed about these dynamics is essential to understanding the broader implications for the industry. As the debate continues, the need for trust-building, ethical practices, and open communication remains central to fostering a sustainable and innovative future for artificial intelligence.
[42]
Meta Adds 'Multimodal' Models to Its Llama AI Stable | PYMNTS.com
Meta has released the latest versions of its Llama artificial intelligence (AI) model. The tech giant says its Llama 4 models, unveiled Friday (April 4), are built on one of the most advanced large language models (LLMs) in the world. Among the new offerings are Llama 4 Scout and Llama 4 Maverick, which Meta calls "the first open-weight natively multimodal models," multimodal meaning able to work with media other than text. "We're also previewing Llama 4 Behemoth, one of the smartest LLMs in the world and our most powerful yet to serve as a teacher for our new models," the announcement said. Meta has been investing heavily in AI over the last two years, with chief executive Mark Zuckerberg announcing plans to spend up to $65 billion in 2025 to strengthen the company's artificial intelligence stable. As noted here last month, the company is looking to extend its AI capabilities beyond its social media businesses and is mulling trials of premium subscriptions for its AI assistant, Meta AI, for agentic purposes such as booking reservations and video creation. Also on Friday, PYMNTS wrote about plans by OpenAI to release an open-source version of its LLM, something the company has not done in years. An open-source model is one that would be generally free to use, modify and distribute. The creator of ChatGPT revealed the news through a form seeking feedback for this strategy, saying it wants to hear from developers, researchers and the public to help it "make this model as useful as possible." The open-source model would be coming "in months," the company said. The startup added the last time it released an open-source model was for the GPT-2 LLM in 2019. OpenAI's most recent LLM is GPT-4.5. OpenAI decided to make its models proprietary after getting a $1 billion investment from Microsoft, part of a multiyear partnership to promote AI model development. 
The tech giant has invested more than $13 billion in OpenAI to date, and OpenAI's models are exclusive to customers of Microsoft's Azure cloud services. "OpenAI's decision comes as open-source models like Meta's Llama, Mistral's LLM and DeepSeek have been gaining in popularity," PYMNTS wrote. "In March, Meta CEO Mark Zuckerberg said on Threads that Llama has been downloaded 1 billion times. Llama was launched in 2023."
[43]
Meta Advances Open Source AI With Llama 4 as OpenAI Prepares 'Powerful' Open Weight Model
Meta has revealed its new line of open-weight multimodal artificial intelligence models, the Llama 4 herd, which it describes as its most advanced set of models yet. The move comes as OpenAI "prepares" to release its first open-weight model. Despite the U.S. tech sector's long-standing preference for closed-source AI, DeepSeek's runaway success has driven industry giants to embrace open-weight models. Meta's new line of AI models furthers CEO Mark Zuckerberg's vision of making open-source AI "the industry standard." On Saturday, April 5, Meta unveiled two multimodal open-weight models, Llama 4 Scout and Maverick, built on a mixture-of-experts (MoE) architecture. MoE architecture makes large AI models more efficient by dividing the work among specialized "experts" within the model. Instead of having the entire model process every input, only a few selected parts are activated depending on the input. Llama 4 Scout has 17 billion active parameters and 16 experts, and is designed to fit within a single H100 GPU. Meanwhile, Llama 4 Maverick is built with a 17 billion active parameter model with 128 experts, designed for larger use cases and heavy workloads. Meta claimed Maverick beats OpenAI's GPT-4o and Alphabet's Gemini 2.0 on coding, reasoning, multilingual, long-context, and image benchmarks. It also added that it was competitive with the "much larger DeepSeek V3.1 on coding and reasoning." "Our goal is to build the world's leading AI, open source it, and make it universally accessible so that everyone in the world benefits," Zuckerberg said in an Instagram reel. "And I've said for a while that I think that open-source AI is going to become the leading models, and with Llama 4, this is starting to happen," Zuckerberg said. "Meta AI is getting a big upgrade today," he added.
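The routing idea described above, a small router scores all experts but only the top-k actually run, can be sketched in a few lines. Everything below (the dimensions, the router weights, the "experts" themselves) is a made-up toy, not Llama 4's real architecture.

```python
import math
import random

random.seed(0)
N_EXPERTS, TOP_K, DIM = 16, 2, 4

# Stand-in "experts": in a real MoE each would be a full feed-forward block.
experts = [(lambda x, scale=i + 1: [v * scale for v in x])
           for i in range(N_EXPERTS)]
# A tiny linear router: DIM inputs -> one score per expert.
router = [[random.gauss(0, 1) for _ in range(N_EXPERTS)] for _ in range(DIM)]

def moe_layer(x):
    # Score every expert for this input...
    scores = [sum(xi * w for xi, w in zip(x, col)) for col in zip(*router)]
    # ...but only run the TOP_K best-scoring ones.
    chosen = sorted(range(N_EXPERTS), key=scores.__getitem__)[-TOP_K:]
    z = [math.exp(scores[i]) for i in chosen]
    gates = [v / sum(z) for v in z]          # softmax over the chosen experts
    outs = [experts[i](x) for i in chosen]   # 14 of the 16 experts stay idle
    # Combine the selected experts' outputs, weighted by the gate values.
    return [sum(g * o[d] for g, o in zip(gates, outs)) for d in range(DIM)]

y = moe_layer([0.5, -1.0, 0.25, 2.0])
```

The efficiency argument falls out directly: per input, compute scales with the two chosen experts rather than all sixteen, while the total parameter count (and hence the model's capacity) still spans every expert.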
The release moves the ChatGPT maker closer to its open research roots but notably avoids committing to full open-source AI. An open-weight AI model means the developers have released the trained model's weights -- the data that allows the model to make predictions -- so anyone can download and use it. However, to be truly open-source, the code used to train the model, the data it was trained on, and the rights to modify or use it commercially would also need to be released. With this in mind, Meta's models are still classed as open-weight but have moved closer to open-source. OpenAI's business strategy has dramatically changed since it was founded in 2015, leading to criticism from co-founder Elon Musk. Musk heavily criticized this move, claiming the company had prioritized corporate profits over its original goal of prioritizing AI research for humanity's welfare. After leaving the company due to an alleged conflict of interest with his AI work at Tesla, Musk criticized OpenAI's switch to a capped for-profit model, which allowed it to attract substantial funding from private companies. DeepSeek's Influence: Big Tech continues to adapt to the explosive release of R1 from China's DeepSeek. DeepSeek's super-low development costs and open architecture challenge the long-held belief that only tech giants with deep pockets could build frontier models. The company claimed it trained R1 for just $3 million -- a fraction of what competitors like OpenAI and Google have spent on similarly capable systems. If accurate, that figure sent a clear message to the industry -- the barriers to entry for building powerful AI are lower than many assumed. Since its release, adaptations and customizations of R1 have been embedded into universities, startups, and the public sector. As China is now home to one of the most competitive open models on the planet, the largest and most valuable AI companies in the U.S.
and Europe have been forced to reevaluate how to maintain their global leadership.
[44]
Meta delays Llama 4 AI model release amid development challenges - The Information By Investing.com
Investing.com -- Performance issues have led Meta Platforms Inc (NASDAQ:META) to postpone the release of its newest AI model, Llama 4, according to The Information, citing sources familiar with the matter. The delay comes at a critical time when Meta is striving to become a leader in the competitive AI industry. The launch of Llama 4 has been rescheduled at least twice, with the possibility of further delays, as indicated by the two sources. Meta now targets a release date later in April 2025, though this is subject to change based on ongoing development progress. Technical challenges have contributed to the postponement, with Llama 4 not meeting Meta's expectations in key areas such as reasoning and mathematical tasks during benchmark tests. Comparatively, the model has also shown limitations in conducting voice conversations with humanlike quality, especially when compared to models developed by OpenAI. Llama 4 is a vital component of Meta's strategy to secure a leading position in the AI market. It is designed to drive chatbot technology, including an assistant known as Meta AI, which is integrated across the company's suite of social media applications. The success of Llama 4 is crucial for Meta as it seeks to enhance user interaction and engagement across its platforms.
[45]
Meta nears release of new AI model Llama 4 this month, the Information reports
(Reuters) - Meta Platforms plans to release the latest version of its large language model later this month, after delaying it at least twice, the Information reported on Friday, as the Facebook owner scrambles to lead in the AI race. Meta, however, could push back the release of Llama 4 again, the report said, citing two people familiar with the matter. Big technology firms have been investing aggressively in AI infrastructure following the success of OpenAI's ChatGPT, which altered the tech landscape and drove investment into machine learning. The report said one of the reasons for the delay is during development, Llama 4 did not meet Meta's expectations on technical benchmarks, particularly in reasoning and math tasks. The company was also concerned that Llama 4 was less capable than OpenAI's models in conducting humanlike voice conversations, the report added. Meta plans to spend as much as $65 billion this year to expand its AI infrastructure, amid investor pressure on big tech firms to show returns on their investments. Additionally, the rise of the popular, lower-cost model from Chinese tech firm DeepSeek challenges the belief that developing the best AI model requires billions of dollars. The report said Llama 4 is expected to borrow certain technical aspects from DeepSeek, with at least one version slated to employ a machine-learning technique called mixture of experts method, which trains separate parts of models for specific tasks, making them experts in those areas. Meta has also considered releasing Llama 4 through Meta AI first and then as open-source software later, the report said. Last year, Meta released its mostly free Llama 3 AI model, which can converse in eight languages, write higher-quality computer code and solve more complex math problems than previous versions. (Reporting by Priyanka.G in Bengaluru; Editing by Vijay Kishore)
Meta's surprise release of Llama 4 AI models sparks debate over performance claims and practical limitations, highlighting the gap between AI marketing and real-world application.
In a surprise weekend move, Meta released its latest AI models, Llama 4 Scout and Llama 4 Maverick, touting them as major advancements in multimodal AI technology 1. The announcement has sparked intense discussion within the AI community, highlighting the ongoing tension between ambitious marketing claims and practical user experiences.
Meta describes Llama 4 as "natively multimodal," built to handle text, images, and video frames using an "early fusion" technique 1. The models also employ a mixture-of-experts (MoE) architecture, which activates only a fraction of the model's parameters for any given token, allowing for more efficient computation.
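The "early fusion" idea can be illustrated with a minimal sketch: embeddings from different modalities are merged into a single token sequence before the model body, so one network trains jointly on text and images. The dimensions and the padding-based projection below are illustrative assumptions, not Llama 4's actual design.

```python
# Toy sketch of early-fusion multimodality: text-token embeddings and
# image-patch embeddings are concatenated into one sequence, which a
# single transformer would then process jointly. The projection step
# (pad/truncate to the text embedding width) is a stand-in for a learned
# linear projection in a real model.
from typing import List

Vector = List[float]

def project(patch: Vector, dim: int) -> Vector:
    """Map an image-patch embedding to the text embedding width (toy version)."""
    return (patch + [0.0] * dim)[:dim]

def early_fusion(text_embeddings: List[Vector],
                 image_patches: List[Vector],
                 dim: int = 4) -> List[Vector]:
    """Return one combined token sequence for joint processing."""
    fused = [t[:dim] for t in text_embeddings]
    fused += [project(p, dim) for p in image_patches]
    return fused

# Two text tokens plus one image patch become a single three-token sequence.
seq = early_fusion([[0.1] * 4, [0.2] * 4], [[0.5, 0.6]], dim=4)
```

The contrast is with "late fusion" designs, where separate text and vision models are trained independently and only their outputs are combined.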
Scout boasts a massive 10 million token context window, potentially allowing it to process extremely large documents 2. Meta claims these models outperform competitors like OpenAI's GPT-4o and Google's Gemini 2.0 on various benchmarks 2.
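The mixture-of-experts routing mentioned above can also be sketched in a few lines: a router scores each expert for a token, only the top-k experts run, and their outputs are mixed by the renormalized router weights. The expert count, top_k value, and toy experts here are illustrative assumptions, not Llama 4's actual configuration.

```python
# Minimal mixture-of-experts (MoE) routing sketch: most experts (and hence
# most parameters) stay idle for any given token; only the top-k scored
# experts compute.
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, router_weights, top_k=2):
    """Route one token through its top_k experts and mix their outputs."""
    # Router: one score per expert (a dot product with the token).
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    # Keep only the k highest-scoring experts; renormalize their gates.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in ranked)
    output = [0.0] * len(token)
    for i in ranked:
        gate = probs[i] / norm
        expert_out = experts[i](token)  # only selected experts run
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output

# Toy example: 4 "experts", each a simple elementwise scaling.
experts = [lambda t, s=s: [s * x for x in t] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[0.1, 0.2], [0.9, 0.1], [0.3, 0.8], [0.2, 0.2]]
result = moe_forward([1.0, 0.5], experts, router_weights, top_k=2)
```

This is why MoE models quote both "total" and "active" parameter counts: the full parameter set is large, but only the selected experts' parameters are used per token.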
Despite Meta's bold claims, the AI community's initial response has been mixed to negative 1, with developers encountering significant challenges in utilizing the models' full potential.
The release has also sparked controversy regarding AI model evaluation and transparency, and it has ignited broader discussions about the future of AI model development.
Meanwhile, Meta has hinted at further advancements in the Llama 4 family, including the still-unreleased Behemoth teacher model.
As the AI landscape continues to evolve rapidly, the Llama 4 release underscores the challenges in balancing technological ambition with practical implementation and ethical considerations.
Meta has released Llama 3, its latest and most advanced AI language model, boasting significant improvements in language processing and mathematical capabilities. This update positions Meta as a strong contender in the AI race, with potential impacts on various industries and startups.
22 Sources
Meta has released Llama 3.1, its largest and most advanced open-source AI model to date. This 405 billion parameter model is being hailed as a significant advancement in generative AI, potentially rivaling closed-source models like GPT-4.
5 Sources
Meta has released Llama 3.3, a 70 billion parameter AI model that offers performance comparable to larger models at a fraction of the cost, marking a significant advancement in open-source AI technology.
11 Sources
Meta has released Llama 3.2, an open-source AI model that can run on smartphones. This new version includes vision capabilities and is freely accessible, marking a significant step in AI democratization.
3 Sources
Meta Platforms Inc. has released its latest and most powerful AI model, Llama 3, boasting significant improvements in language understanding and mathematical problem-solving. This open-source model aims to compete with OpenAI's GPT-4 and Google's Gemini.
4 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved