



5 Sources
[1]

3 ways Meta's Llama 3.1 is an advance for Gen AI
Meta on Tuesday unveiled the latest incarnation of Llama, its family of large language models (LLMs). The company says Llama 3.1 is the first open-source "frontier model," a claim reserved generally for the biggest examples of AI code. Llama 3.1 comes in multiple sizes, and the largest, "405B," is not only noteworthy for the scale of computing it involves -- Llama 3.1 has 405 billion neural "weights," or, parameters, larger than prominent open-source models such as Nvidia's Nemotron 4, Google's Gemma 2, and Mixtral -- it is also significant for three choices that the Meta team made. Taken together, the three decisions are a tour de force of neural network engineering and are at the heart of how the company built and trained Llama 3.1 405B. They complement advances Meta showed with Llama 2 that suggested ways to slim down deep learning's total compute budget. (An "AI model" is the part of an AI program that contains numerous neural net parameters and activation functions that are the key elements for how an AI program functions.) First, Llama 3.1 405B dispenses with what's called a "mixture of experts," the approach Google uses for its newest closed-source model, Gemini 1.5, and Mistral uses for its Mixtral models. A mixture of experts creates various alternate combinations of the neural weights. Some can be switched off so that a subset of weights is used to make predictions. Meta's researchers "opted for a standard decoder-only transformer model architecture," the near-ubiquitous building block first developed in 2017 as Google's Transformer. The researchers claim this makes the model more stable during its training. Second, to improve the results of the plain-vanilla transformer-based model, Meta's researchers describe an ingenious approach to training the model in stages. 
It's well known that both the amount of training data and the amount of compute used can be balanced in an optimal way to produce better predictions. As described in the formal paper for Llama 3.1, the researchers took a look at existing "scaling laws," which tell how well a model will do at producing a correct prediction depending on the size of the model and the amount of training data. That approach doesn't really tell how good a model is at carrying out a "downstream" task, such as a standardized test of reasoning. Instead, Meta came up with its own scaling law. The company progressively increased both the amount of training data and the amount of compute, checking over multiple iterations to see how well the resulting trained model does on the downstream tasks. "We use the resulting compute-optimal models to forecast the performance of the flagship Llama 3 model on benchmark datasets," the Meta team wrote. This approach has echoes of Meta's recent research, in which the researchers train the model for the end outcome, rather than just a raw score on predicting the next word. The important part is that the iterative process of validating each successive data and compute combination is what leads to the selection of the 405 billion parameters as the sweet spot. "Based on this observation, we ultimately decided to train a flagship model with 405B parameters," the researchers wrote. The 405-billion-parameter model's final training was on 16,000 Nvidia H100 GPU chips, which are run on Meta's Grand Teton AI server. Meta used a complex system of clustering the many servers together to run batches of data, and the neural weights, in parallel. The third big innovation is executing an equally ingenious combination of steps after each round of training the model, known as "post-training." 
In post-training, a pre-trained Llama 3.1 is first subjected to human raters' expressed preferences, similar to what OpenAI and others do to shape the kinds of output the model produces. Then, Meta uses the human preferences to re-train the model in what's called "supervised fine-tuning," where the model is re-trained until it can pick out the desirable from the undesirable outputs in the human feedback. Meta then adds to the fine-tuning with a technique introduced in 2023 by Stanford University AI scholars called "direct preference optimization," or, DPO. It's a form of the "reinforcement learning from human feedback" that OpenAI made popular, but it's designed to be much more efficient. To these broad post-training approaches, the Meta researchers add a couple of twists. For one, they post-trained Llama 3.1 405B to use "tools," external programs, such as search engines, that can perform functions on the model's behalf. That involves things such as feeding the model examples of prompts that are solved by invoking API calls. By fine-tuning Llama on the examples, Meta claims the model gains much better "zero-shot" tool use, the ability to invoke a tool that has not actually been shown in its training data. To diminish the prevalence of "hallucinations," the authors cherry-pick examples of the training data and craft original question-answer pairs. They use these to further fine-tune the model in order to "encourage the model to only answer questions which it has knowledge about, and refuse answering those questions that it is unsure about." The Meta researchers characterized all of their choices as aiming for simplicity. "Throughout the development of the Llama 3 model family, we found that a strong focus on high-quality data, scale, and simplicity consistently yielded the best results," they wrote. 
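The direct preference optimization step described above can be made concrete with its per-example loss. What follows is a minimal sketch of the standard published DPO objective, not Meta's training code: the function names, the `beta` value, and the example log-probabilities are all assumptions chosen for illustration. The loss compares how much more the model being trained (the "policy") favors the human-preferred answer over the rejected one, relative to a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a full response under
    either the policy or the frozen reference model. beta is a hypothetical
    strength setting, not a value taken from the article.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z)
    return softplus(-logits)


# Illustrative numbers only: the policy has shifted toward the chosen answer.
untrained = dpo_loss(-10.0, -10.0, -10.0, -10.0)
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(f"loss before preference shift: {untrained:.4f}")
print(f"loss after preference shift:  {improved:.4f}")
```

When the policy and the reference agree, the loss sits at log 2; it falls as the policy raises the preferred answer's probability relative to the rejected one. No separate reward model is needed, which is the efficiency the article alludes to.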
"In preliminary experiments, we explored more complex model architectures and training recipes but did not find the benefits of such approaches to outweigh the additional complexity they introduce in model development." Certainly, the scale of the program is a landmark for open-source models, which typically have been far smaller than their commercial, closed-source competitors. Meta co-founder and CEO Mark Zuckerberg lauded the economics of using Llama 3.1. "Developers can run inference on Llama 3.1 405B on their own infra[structure] at roughly 50% the cost of using closed models like GPT-4o, for both user-facing and offline inference tasks," Zuckerberg wrote. Zuckerberg also broadly defended open-source AI as a natural evolution of software. It is the equivalent, he wrote, of the Unix operating system that evolved from early proprietary versions to "a more advanced, secure, and broader ecosystem" because of its open-source versions. Also: Meta inches toward open source AI with new LLaMA 3.1 As ZDNET's Steven Vaughan-Nichols writes, however, some details have been left out of Meta's code posting on Hugging Face, and its code license is more restrictive than other open-source licenses. That means that Llama 3.1 is kind-of open source, but not entirely. Although reasonable parties can disagree on how strictly they recognize the open-source nature of Llama 3.1, the fact that so much detail is offered about the training process of the model is itself a welcome trove of disclosure. This is especially true at a time when OpenAI and Google increasingly share little if any information about how they construct their closed-source models.
[2]

Meta releases its biggest 'open' AI model yet | TechCrunch
Meta's latest open-source AI model is its biggest yet. Today, Meta said it is releasing Llama 3.1 405B, a model containing 405 billion parameters. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer parameters. At 405 billion parameters, Llama 3.1 405B isn't the absolute largest open-source model out there, but it's the biggest in recent years. Trained using 16,000 Nvidia H100 GPUs, it also benefits from newer training and development techniques that Meta claims make it competitive with leading proprietary models like OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet (with a few caveats). As with Meta's previous models, Llama 3.1 405B is available to download or use on cloud platforms like AWS, Azure and Google Cloud. It's also being used on WhatsApp and Meta.ai, where it's powering a chatbot experience for U.S.-based users. Like other open- and closed-source generative AI models, Llama 3.1 405B can perform a range of different tasks, from coding and answering basic math questions to summarizing documents in eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish and Thai). It's text-only, meaning that it can't, for example, answer questions about an image, but most text-based workloads -- think analyzing files like PDFs and spreadsheets -- are within its purview. Meta wants to make it known that it is experimenting with multimodality. In a paper published today, researchers at the company write that they're actively developing Llama models that can recognize images and videos, and understand (and generate) speech. Still, these models aren't yet ready for public release. To train Llama 3.1 405B, Meta used a data set of 15 trillion tokens dating up to 2024 (tokens are parts of words that models can more easily internalize than whole words, and at a rough 0.75 words per token, 15 trillion tokens translates to a mind-boggling 11 trillion or so words). 
It's not a new training set per se, since Meta used the base set to train earlier Llama models, but the company claims it refined its curation pipelines for data and adopted "more rigorous" quality assurance and data filtering approaches in developing this model. The company also used synthetic data (data generated by other AI models) to fine-tune Llama 3.1 405B. Most major AI vendors, including OpenAI and Anthropic, are exploring applications of synthetic data to scale up their AI training, but some experts believe that synthetic data should be a last resort due to its potential to exacerbate model bias. For its part, Meta insists that it "carefully balance[d]" Llama 3.1 405B's training data, but declined to reveal exactly where the data came from (outside of webpages and public web files). Many generative AI vendors see training data as a competitive advantage and so keep it and any information pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive for companies to reveal much. In the aforementioned paper, Meta researchers wrote that compared to earlier Llama models, Llama 3.1 405B was trained on an increased mix of non-English data (to improve its performance on non-English languages), more "mathematical data" and code (to improve the model's mathematical reasoning skills), and recent web data (to bolster its knowledge of current events). Recent reporting by Reuters revealed that Meta at one point used copyrighted e-books for AI training despite its own lawyers' warnings. The company controversially trains its AI on Instagram and Facebook posts, photos and captions, and makes it difficult for users to opt out. What's more, Meta, along with OpenAI, is the subject of an ongoing lawsuit brought by authors, including comedian Sarah Silverman, over the companies' alleged unauthorized use of copyrighted data for model training. 
"The training data, in many ways, is sort of like the secret recipe and the sauce that goes into building these models," Ragavan Srinivasan, VP of AI program management at Meta, told TechCrunch in an interview. "And so from our perspective, we've invested a lot in this. And it is going to be one of these things where we will continue to refine it." Llama 3.1 405B has a larger context window than previous Llama models: 128,000 tokens, or roughly the length of a 50-page book. A model's context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). One of the advantages of models with larger contexts is that they can summarize longer text snippets and files. When powering chatbots, such models are also less likely to forget topics that were recently discussed. Two other new, smaller models Meta unveiled today, Llama 3.1 8B and Llama 3.1 70B -- updated versions of the company's Llama 3 8B and Llama 3 70B models released in April -- also have 128,000-token context windows. The previous models' contexts topped out at 8,000 tokens, which makes this upgrade fairly substantial assuming the new Llama models can effectively reason across all that context. All of the Llama 3.1 models can use third-party tools, apps and APIs to complete tasks, like rival models from Anthropic and OpenAI. Out of the box, they're trained to tap Brave Search to answer questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. In addition, Meta claims the Llama 3.1 models can use certain tools they haven't seen before -- to an extent. If benchmarks are to be believed (not that benchmarks are the end-all be-all in generative AI), Llama 3.1 405B is a very capable model indeed. That'd be a good thing, considering some of the painfully obvious limitations of previous-generation Llama models. 
Llama 3 405B performs on par with OpenAI's GPT-4, and achieves "mixed results" compared to GPT-4o and Claude 3.5 Sonnet, per human evaluators that Meta hired, the paper notes. While Llama 3 405B is better at executing code and generating plots than GPT-4o, its multilingual capabilities are overall weaker, and Llama 3 405B trails Claude 3.5 Sonnet in programming and general reasoning. And because of its size, it needs beefy hardware to run. Meta recommends at least a server node. That's perhaps why Meta's pushing its smaller new models, Llama 3.1 8B and Llama 3.1 70B, for general-purpose applications like powering chatbots and generating code. Llama 3.1 405B, the company says, is better reserved for model distillation -- the process of transferring knowledge from a large model to a smaller, more efficient model -- and generating synthetic data to train (or fine-tune) alternative models. To encourage the synthetic data use case, Meta said it has updated Llama's license to let developers use outputs from the Llama 3.1 model family to develop third-party AI generative models (whether that's a wise idea is up for debate). Importantly, the license still constrains how developers can deploy Llama models: App developers with more than 700 million monthly users must request a special license from Meta that the company will grant on its discretion. That change in licensing around outputs, which allays a major criticism of Meta's models within the AI community, is a part of the company's aggressive push for mindshare in generative AI. Alongside the Llama 3.1 family, Meta is releasing what it's calling a "reference system" and new safety tools -- several of these block prompts that might cause Llama models to behave in unpredictable or undesirable ways -- to encourage developers to use Llama in more places. 
The company is also previewing and seeking comment on the Llama Stack, a forthcoming API for tools that can be used to fine-tune Llama models, generate synthetic data with Llama, and build "agentic" applications -- apps powered by Llama that can take action on a user's behalf. "We have heard repeatedly from developers is an interest in learning how to actually deploy [Llama models] in production," Srinivasan said. "So we're trying to start giving them a bunch of different tools and options." In an open letter published this morning, Meta CEO Mark Zuckerberg lays out a vision for the future in which AI tools and models reach the hands of more developers around the world, ensuring people have access to the "benefits and opportunities" of AI. It's couched very philanthropically, but implicit in the letter is Zuckerberg's desire that these tools and models be of Meta's making. Meta's racing to catch up to companies like OpenAI and Anthropic, and it is employing a tried-and-true strategy: give tools away for free to foster an ecosystem and then slowly add products and services, some paid, on top. Spending billions of dollars on models that it can then commoditize also has the effect of driving down Meta competitors' prices and spreading the company's version of AI broadly. It also lets the company incorporate improvements from the open source community into its future models. Llama certainly has developers' attention. Meta claims Llama models have been downloaded over 300 million times, and more than 20,000 Llama-derived models have been created so far. Make no mistake, Meta's playing for keeps. It is spending millions on lobbying regulators to come around to its preferred flavor of "open" generative AI. None of the Llama 3.1 models solve the intractable problems with today's generative AI tech, like its tendency to make things up and regurgitate problematic training data. But they do advance one of Meta's key goals: Becoming synonymous with generative AI. 
There are costs to this. In the research paper, the co-authors -- echoing Zuckerberg's recent comments -- discuss energy-related reliability issues with training Meta's ever-growing generative AI models. "During training, tens of thousands of GPUs may increase or decrease power consumption at the same time, for example, due to all GPUs waiting for checkpointing or collective communications to finish, or the startup or shutdown of the entire training job," they write. "When this happens, it can result in instant fluctuations of power consumption across the data center on the order of tens of megawatts, stretching the limits of the power grid. This is an ongoing challenge for us as we scale training for future, even larger Llama models." One hopes that training those larger models won't force more utilities to keep old coal-burning power plants around.
[3]

Meta says all new Llama 3.1 405B model bests OpenAI's GPT-4
Meta today released Llama 3.1 405B, its largest and most capable large language model yet, which the social network claims can go toe-to-toe with OpenAI and Anthropic's top models. "Our experimental evaluation suggests that our flagship model is competitive with leading foundation models across a range of tasks, including GPT-4, GPT-4o, and Claude 3.5 Sonnet," Meta boasted in an announcement, describing the neural network as the "world's largest and most capable openly available foundation model." As you'd expect for an LLM, Llama 3.1 405B generates prose, chat responses, and more from input prompts. First teased alongside the launch of its smaller eight- and 70-billion parameter siblings earlier this spring, Meta's Llama 3.1 405B was trained on more than 15 trillion tokens -- think of these each as fragments of words, phrases, figures, and punctuation -- using 16,000 Nvidia H100 GPUs. In total, the Facebook giant says training the 405-billion-parameter model required the equivalent of 30.84 million GPU hours and produced the equivalent of 11,390 tons of CO2 emissions. However, Meta insists this much computing power was necessary to train the latest Llama in a meaningful amount of time, and says this is its first model trained at this scale. The Instagram titan also stuck with a standard decoder-only transformer architecture, rather than implement a more complex mixture-of-experts design, to improve stability during training. The result is a model that, at least according to Meta's benchmarks, is ahead of larger proprietary systems from OpenAI and Anthropic on a variety of benchmarks. OpenAI's GPT-4, for reference, is reportedly on the scale of 1.8 trillion parameters in size. Despite being smaller than some competing models, you'll still need a rather beefy system to get Llama trotting along. At 405 billion parameters, Meta's model would require roughly 810GB of memory to run at the full 16-bit precision it was trained at. 
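The 810GB figure is straightforward arithmetic: parameter count times bytes per parameter. The sketch below reproduces it for a few precisions. It counts the weights only; the function name is illustrative, the 4-bit row is included purely for comparison, and real deployments need additional memory for activations and the attention cache, which this estimate deliberately ignores.

```python
PARAMS = 405e9  # parameter count quoted in the article


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


for bits in (16, 8, 4):
    gb = weight_memory_gb(PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:,.0f} GB")
```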
To put that in perspective, that's more than a single Nvidia DGX H100 system (eight H100 accelerators in a box) can handle. Because of this, Meta has released an 8-bit quantized version of the model, which cuts its memory footprint roughly in half. It's not clear whether this quantization step was implemented before or after training; we've asked Meta for clarification on this. In the meantime, you can find our hands-on guide for post-training quantization here. In addition to the larger 405-billion-parameter model, Meta is also rolling out a slew of updates to its larger Llama 3 family. With the 3.1 release, all three models, including the original 8B and 70B variants, have been upgraded with support for eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai) and a substantially larger 128,000 token context window. That's up from 8,000 tokens for the original Llama 3 8B and 70B releases. You can think of an LLM's context window a bit like its short-term memory. The bigger the context window, the more information the model can hold onto at any given moment when generating responses to input prompts. 8,000 tokens may sound like a reasonable context window for something like a customer-service chatbot, but for certain tasks such as long-form summarization or coding assistance, a much larger context size is definitely beneficial. This is why Google is so keen to highlight Gemini's one million token context window. And at least according to Meta, Llama 3.1's larger context window has been achieved without compromising the quality of the models, which it claims have much stronger reasoning capabilities. Well, highly artificial reasoning; as always, there is no sentient intelligence here. You can find more information about Meta's third-generation Llama models, and the approach Meta took to training them, in our launch day coverage here. Alongside the new and updated models, Meta also outlined its vision for where Llama will go next. 
"Llama models were always intended to work as part of an overall system that can orchestrate several components, including calling external tools," the social network giant wrote. "Our vision is to go beyond the foundation models to give developers access to a broader system that gives them the flexibility to design and create custom offerings that align with their vision." As part of this, Meta has released a reference system which includes sample apps and components such as the Llama Guard 3 safety model and Prompt Guard, its prompt-injection filter. However, Meta admits that its vision is still developing and the biz is seeking feedback from industry partners, startups, and community members, to shape its AI direction. As part of this, Meta has opened a request for comment on its GitHub page for what it's calling the Llama Stack. Llama Stack will eventually form a series of standardized interfaces that define how toolchain components -- for example, fine-tuning or synthetic data generation -- or agentic applications should be built. Meta's hope is that by crowdsourcing these efforts such interfaces will become the industry standard. Meta's position on developing AI models in the open hasn't changed much. CEO Mark Zuckerberg emphasized the importance of open AI development in a letter published Tuesday that drew comparisons to the open source Linux kernel's victory over proprietary Unix operating systems. "Linux gained popularity - initially because it allowed developers to modify its code however they wanted and was more affordable, and over time because it became more advanced, more secure, and had a broader ecosystem supporting more capabilities than any closed Unix," he wrote. "I believe that AI will develop in a similar way." In line with this, Meta is also modifying Llama's license structure to allow developers to use the outputs from Llama models to improve other models. 
For example, if you wanted to use Llama 3.1 405B to generate a mountain of synthetic data to train a smaller non-Meta model, you can now do that. It's worth noting, Llama's licensing has proven to be somewhat contentious in the past. If you can't get behind Meta's license, there are several MIT and Apache 2.0 licensed models from Microsoft, Mistral, and others. In any case, the trio of Llama 3.1 models are available for download on both Hugging Face and Meta's website, and if you'd like to try them -- including 405B -- at home, check out our local LLM guide here. ®
[4]

The first GPT-4-class AI model anyone can download has arrived: Llama 405B
"Open source AI is the path forward," says Mark Zuckerberg, misusing the term. In the AI world, there's a buzz in the air about a new AI language model released Tuesday by Meta: Llama 3.1 405B. The reason? It's potentially the first time anyone can download a GPT-4-class large language model (LLM) for free and run it on their own hardware. You'll still need some beefy hardware: Meta says it can run on a "single server node," which isn't desktop PC-grade equipment. But it's a provocative shot across the bow of "closed" AI model vendors such as OpenAI and Anthropic. Further Reading "Llama 3.1 405B is the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation," says Meta. Company CEO Mark Zuckerberg calls 405B "the first frontier-level open source AI model." In the AI industry, "frontier model" is a term for an AI system designed to push the boundaries of current capabilities. In this case, Meta is positioning 405B among the likes of the industry's top AI models, such as OpenAI's GPT-4o, Claude's 3.5 Sonnet, and Google Gemini 1.5 Pro. A chart published by Meta suggests that 405B gets very close to matching the performance of GPT-4 Turbo, GPT-4o, and Claude 3.5 Sonnet in benchmarks like MMLU (undergraduate level knowledge), GSM8K (grade school math), and HumanEval (coding). But as we've noted many times since March, these benchmarks aren't necessarily scientifically sound or translate to the subjective experience of interacting with AI language models. In fact, this traditional slate of AI benchmarks is so generally useless to laypeople that even Meta's PR department now just posts a few images of charts and doesn't even try to explain them in any detail. 
We've instead found that measuring the subjective experience of using a conversational AI model (through what might be called "vibemarking") on A/B leaderboards like Chatbot Arena is a better way to judge new LLMs. In the absence of Chatbot Arena data, Meta has provided the results of its own human evaluations of 405B's outputs that seem to show Meta's new model holding its own against GPT-4 Turbo and Claude 3.5 Sonnet. Whatever the benchmarks, early word on the street (after the model leaked on 4chan yesterday) seems to match the claim that 405B is roughly equivalent to GPT-4. It took a lot of expensive computer training time to get there -- and money, of which the social media giant has plenty to burn. Meta trained the 405B model on over 15 trillion tokens of training data scraped from the web (then parsed, filtered, and annotated by Llama 2), using more than 16,000 H100 GPUs. So what's with the 405B name? In this case, "405B" means 405 billion parameters, and parameters are numerical values that store trained information in a neural network. More parameters translate to a larger neural network powering the AI model, which generally (but not always) means more capability, such as better ability to make contextual connections between concepts. But larger-parameter models have a tradeoff in needing more computing power (AKA "compute") to run. Further Reading We've been expecting the release of a 400+ billion parameter model of the Llama 3 family since Meta gave word that it was training one in April, and today's announcement isn't just about the biggest member of the Llama 3 family: There's an entire new iteration of improved Llama models with the designation "Llama 3.1." That includes upgraded versions of its smaller 8B and 70B models, which now feature multilingual support and an extended context length of 128,000 tokens (the "context length" is roughly the working memory capacity of the model, and "tokens" are chunks of data used by LLMs to process information). 
Meta says that 405B is useful for long-form text summarization, multilingual conversational agents, and coding assistants and for creating synthetic data used to train future AI language models. Notably, that last use-case -- allowing developers to use outputs from Llama models to improve other AI models -- is now officially supported by Meta's Llama 3.1 license for the first time. Abusing the term "open source" Llama 3.1 405B is an open-weights model, which means anyone can download the trained neural network files and run them or fine-tune them. That directly challenges a business model where companies like OpenAI keep the weights to themselves and instead monetize the model through subscription wrappers like ChatGPT or charge for access by the token through an API. Further Reading Fighting the "closed" AI model is a big deal to Mark Zuckerberg, who simultaneously released a 2,300-word manifesto today on why the company believes in open releases of AI models, titled, "Open Source AI Is the Path Forward." More on the terminology in a minute. But briefly, he writes about the need for customizable AI models that offer user control and encourage better data security, higher cost-efficiency, and better future-proofing, as opposed to vendor-locked solutions. All that sounds reasonable, but undermining your competitors using a model subsidized by a social media war chest is also an efficient way to play spoiler in a market where you might not always win with the most cutting-edge tech. That benefits Meta, Zuckerberg says, because he doesn't want to get locked into a system where companies like his have to pay a toll to access AI capabilities, drawing comparisons to "taxes" Apple levies on developers through its App Store. So about that "open source" term. As we first wrote in an update to our Llama 2 launch article a year ago, "open source" has a very particular meaning that has traditionally been defined by the Open Source Initiative. 
The AI industry has not yet settled on terminology for AI model releases that ship either code or weights with restrictions (such as Llama 3.1) or that ship without providing training data. We've been calling these releases "open weights" instead. Unfortunately for terminology sticklers, Zuckerberg has now baked the erroneous "open source" label into the title of his potentially historic aforementioned essay on open AI releases, so fighting for the correct term in AI may be a losing battle. Still, his usage annoys people like independent AI researcher Simon Willison, who likes Zuckerberg's essay otherwise. "I see Zuck's prominent misuse of 'open source' as a small-scale act of cultural vandalism," Willison told Ars Technica. "Open source should have an agreed meaning. Abusing the term weakens that meaning which makes the term less generally useful, because if someone says 'it's open source,' that no longer tells me anything useful. I have to then dig in and figure out what they're actually talking about." The Llama 3.1 models are available for download through Meta's own website and on Hugging Face. They both require providing contact information and agreeing to a license and an acceptable use policy, which means that Meta can technically legally pull the rug out from under your use of Llama 3.1 or its outputs at any time.
[5]

The 'Linux Moment' in AI Has Finally Arrived, Says Meta Chief Mark Zuckerberg
Developers can run inference on Llama 3.1 405B on their own infrastructure at roughly 50% of the cost of using closed models like GPT-4o. A few days ago, AIM asked a pertinent question: "When will the 'Linux moment' in AI arrive?" It looks like it has arrived sooner than expected. Yesterday, Meta released Llama 3.1, with 405 billion parameters, its largest and best model yet. The state-of-the-art model outperformed OpenAI's GPT-4o on several benchmarks, making it the best open-source model available. Llama 3.1 is competitive with leading foundation models like GPT-4, GPT-4o, and Claude 3.5 Sonnet across various benchmarks. The smaller models in the Llama 3.1 series also perform on par with both closed and open models of similar parameter sizes. Meta chief Mark Zuckerberg is confident that Llama 3.1 will have a similar impact on the AI ecosystem as Linux had on the operating system world. "Today, Linux is the industry standard foundation for both cloud computing and the operating systems that run most mobile devices - and we all benefit from superior products because of it," he said, adding, "I believe that AI will develop in a similar way. Today, several tech companies are developing leading closed models. But open source is quickly closing the gap." "I think open source AI is going to become the industry standard just like Linux did. It gives you control to customise and run your own models. You don't have to send your data to another, and it's more affordable," he added. Echoing similar sentiments, OpenAI co-founder Andrej Karpathy said, "I'd like to say that it is still very early days, that we are back in the ~1980s of computing all over again, that LLMs are a next major computing paradigm, and Meta is clearly positioning itself to be the open ecosystem leader of it." 
He elaborated on the applications of AI models, stating that people will prompt and retrieve information from the models, fine-tune them, distil them into smaller expert models for specific tasks and applications, and study, benchmark, and optimise these models. Recently, Karpathy also discussed how AI kernels could potentially replace current operating systems. Karpathy envisions an operating system where the LLM acts as the kernel, managing and coordinating system resources and user interactions, and AI agents function as applications, providing various services and functionalities. In this system, natural language serves as the primary programming and interaction interface, allowing users to communicate with the system in plain English or other languages. Zuckerberg anticipates that Meta AI will become the leading chatbot by the end of this year, overtaking ChatGPT, which currently boasts over 100 million users. Meta, however, has not yet disclosed any usage statistics for its own assistant. Meta anticipates that billions of AI agents will be developed worldwide using open-source models. "Our vision is that there should be a lot of different AIs and AI services out there, not just one singular AI, and that really informs the open-source approach," said Zuckerberg. "A lot of what we're focused on is giving every creator and every small business the ability to create AI agents for themselves, making it so that every person on our platforms can create their own AI agents to interact with," he added. Vinod Khosla has been dreaming about it too. He envisions a future in which internet access will mostly be through agents acting for consumers, carrying out tasks, and fending off marketers and bots. "Tens of billions of agents on the internet will be normal," he wrote. "Eventually, all our interactions with the digital world will be mediated by AI assistants. 
This means that AI assistants will constitute a repository of all human knowledge and culture; they will constitute a shared infrastructure like the internet is today," said Yann LeCun, Meta's chief AI scientist. He urged platforms to be open-source and said that we cannot have a small number of AI assistants controlling the entire digital diet of every citizen across the world, taking a dig at OpenAI and a few other companies without naming them. "This will be extremely dangerous for diversity of thought, for democracy, and for just about everything," he added. For the first time, Meta also updated its licence to allow developers to use the outputs from Llama models -- including 405B -- to improve other models. "We're excited about how this will enable new advancements in the field through synthetic data generation and model distillation workflows, capabilities that have never been achieved at this scale in open source," said the company. Last week, a faulty sensor configuration update by CrowdStrike caused a significant Microsoft Windows outage, impacting global transport, finance, and medical sectors. Looking ahead, if AI agents are managed by a single entity and a single cloud infrastructure, a similar failure could recur. "I think the main thing that people are going to do [with Llama 3.1 405B], especially because it's open source, is use it as a teacher to train smaller models that they use in different applications," said Zuckerberg. "If you just think about all the startups out there, or all the enterprises or even governments that are trying to do different things, they probably all need to, at some level, build custom models for what they're doing," he added. The idea was suggested earlier by Karpathy, who explained that in the future, as larger models help refine and optimise the training process, smaller models will emerge. 
"The models have to first get larger before they can get smaller because we need their (automated) help to refactor and mould the training data into ideal, synthetic formats." Last week, we saw the release of several small models that can be run locally without relying on the cloud. Small language models, or SLMs, are expected to become the future alongside generalised models like GPT-4 or Claude 3.5 Sonnet. There is no doubt that OpenAI felt pressure as it made fine-tuning available for its latest model, GPT-4o mini. "Customise GPT-4o mini for your application with fine-tuning. Available today to tier 4 and 5 users, we plan to gradually expand access to all tiers. The first 2M training tokens a day are free, through Sept 23," the company posted on X. Meanwhile, Zuckerberg has claimed that developers can run inference on Llama 3.1 405B on their own infrastructure at roughly 50% of the cost of using closed models like GPT-4o, for both user-facing and offline inference tasks. One thing is clear, OpenAI can't stay silent when its competitors are on the verge of releasing a new model. Throughout the day, the company has been emphasising safety and preparedness. "We won't release a new model if it crosses a 'medium' risk threshold until we implement sufficient safety interventions," the company said. OpenAI appears to be subtly suggesting that open-source models might present risks to society. Mira Murati, the chief technology officer of OpenAI, during a recent interview at the AI Everywhere event at Dartmouth College, said that OpenAI gives the government early access to new AI models, and they have been in favour of more regulation. "We've been advocating for more regulation on frontier models, which will have these amazing capabilities and also have a downside because of misuse. We've been very open with policymakers and working with regulators on that," she said. On the other hand, many are concerned about potential regulations stifling innovation. 
"I hope foolish regulations like California's proposed SB1047 don't stop such innovations," said DeepLearning.AI founder Andrew Ng on the Llama 3.1 release. Meanwhile, Meta has decided not to release its multimodal Llama AI model in the European Union due to regulatory concerns. The company cited the lack of clarity and predictability in the EU's regulatory framework for AI, which includes existing regulations like the General Data Protection Regulation and forthcoming ones such as the AI Act.
Meta has released Llama 3.1, its largest and most advanced open-source AI model to date. This 405 billion parameter model is being hailed as a significant advancement in generative AI, potentially rivaling closed-source models like GPT-4.

Meta, the parent company of Facebook, has made a significant leap in the world of artificial intelligence with the release of Llama 3.1, its largest and most advanced open-source AI model to date [1]. This new model, boasting an impressive 405 billion parameters, is being hailed as a potential game-changer in the field of generative AI [2].

Llama 3.1 represents a significant upgrade from its predecessor, Llama 3, whose largest version had 70 billion parameters. The new model's vast increase in size to 405 billion parameters puts it in direct competition with closed-source models like OpenAI's GPT-4 [3]. This expansion in scale is expected to enhance the model's performance across a wide range of tasks, including natural language processing, code generation, and complex problem-solving.

One of the most notable aspects of Llama 3.1 is its open-source nature. Unlike many of its competitors, Meta has made the full model weights and architecture publicly available for download and use [4]. This move aligns with Meta's commitment to democratizing AI technology and fostering innovation in the field.

The release of Llama 3.1 has generated significant buzz in the tech industry. Mark Zuckerberg, Meta's CEO, has described this moment as the "Linux moment in AI," drawing parallels to the open-source revolution in operating systems [5]. This comparison underscores the potential for Llama 3.1 to drive collaborative development and accelerate AI research globally.

While the release of Llama 3.1 is largely seen as a positive development, it also raises important questions about the ethical implications of widely accessible, powerful AI models. Concerns about potential misuse, such as the generation of deepfakes or misinformation, have been voiced by some experts in the field [1].

Early benchmarks suggest that Llama 3.1 performs comparably to, and in some cases outperforms, proprietary models like GPT-4 on various tasks [2]. This level of performance, combined with its open-source nature, positions Llama 3.1 as a potentially disruptive force in the AI landscape, challenging the dominance of closed-source models.

Summarized by Navi
[3]
[5]

Analytics India Magazine
|