Curated by THEOUTPOST
On Wed, 13 Nov, 12:02 AM UTC
5 Sources
[1]
The Gap Between Open and Closed AI Models Is Closing Faster Than Expected
Meta's Llama models are steadily closing the gap with OpenAI's GPT-4o and o1, pushing towards autonomous machine intelligence with advancements in real-time reasoning and adaptability. The line between open and closed-source models is blurring. According to a recent study published by research group Epoch AI -- 'How Far Behind Are Open Models?' -- the best open-source large language models (LLMs) have lagged behind the best closed-source LLMs by five to 22 months in benchmark performance. "Meta's Llama 3.1 405B is the most recent open model to close the gap across multiple benchmarks. The results are similar when we exclude Meta's Llama models," the report states.

Meanwhile, Meta's chief AI scientist Yann LeCun said on LinkedIn, "In the future, our entire information diet is going to be mediated by [AI] systems. They will constitute basically the repository of all human knowledge. And you cannot have this kind of dependency on a proprietary, closed system."

The AI chatbot market is highly competitive. For instance, ChatGPT, which operates using closed models, has around 350 million monthly users, while Meta's AI assistant, utilising open models, has close to 500 million monthly users.

The report presents ample evidence for comparing the capabilities of open and closed AI models over time. Epoch AI has systematically collected data on the availability of model weights and training code for hundreds of AI models released since 2018, now available in its Notable AI Models database.

Meta is pushing Llama towards autonomous machine intelligence. As Manohar Paluri, VP of AI at Meta, told AIM, future Llama versions aim to "know that they're on the right track and backtrack if needed", enhancing complex problem-solving by combining perception, reasoning, and planning. Leveraging self-supervised learning (SSL) for broad knowledge acquisition and reinforcement learning from human feedback (RLHF) for task-specific alignment, Llama also excels at synthetic data generation for underserved languages, making it well suited to multilingual applications.

Earlier this year, after Llama 3.1 was leaked, Meta officially released Llama 3.1 405B, a new frontier-level open-source AI model, alongside its 70B and 8B versions. Meta is offering developers free access to its weights and code, enabling fine-tuning, distillation, and deployment. The Llama 3.1 405B model performs on par with the best closed-source models. It supports a context length of 128K tokens and eight languages, and offers robust capabilities in code generation, complex reasoning, and tool use.

"Meta AI is on track to reach our goal of becoming the most used AI assistant in the world by the end of the year," said Meta chief Mark Zuckerberg. "Today, several tech companies are developing leading closed models. But open source is quickly closing the gap. Last year, Llama 2 was only comparable to an older generation of models behind the frontier. This year, Llama 3 is competitive with the most advanced models and leading in some areas," he added. Zuckerberg has predicted that, starting next year, future Llama models will become the most advanced in the industry.

Furthermore, the launch of Llama 3.2 also enhanced edge AI and vision tasks, offering both small and medium vision LLMs (11B and 90B) and lightweight models (1B and 3B) optimised for on-device use, with robust support for Qualcomm and MediaTek hardware.
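The two-stage recipe Paluri alludes to -- self-supervised pre-training for broad knowledge, then RLHF for task alignment -- can be illustrated with a deliberately tiny sketch. Everything below (the bigram "model", the toy reward function) is a stand-in for illustration, not Meta's actual training stack:

```python
from collections import Counter, defaultdict

# Stage 1, SSL stand-in: raw text supervises itself via next-token prediction.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a][b] += 1  # no human labels involved

def generate(word: str, n: int = 4) -> str:
    out = [word]
    for _ in range(n):
        word = bigrams[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

# Stage 2, RLHF stand-in: rank candidate outputs with a (toy) reward model
# and prefer the highest-scoring one, nudging behaviour towards preferences.
def reward(text: str) -> int:
    return text.count("cat")  # pretend human raters prefer cats

candidates = [generate(w) for w in ("the", "dog", "cat")]
print(max(candidates, key=reward))
```

The point of the split: stage 1 needs only raw text, while stage 2 needs a comparatively small amount of preference signal to steer what the model already knows.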
"Llama 3.2 models bring SOTA capabilities to developers without the need for extensive resources, enabling innovation and breakthroughs directly on edge and mobile devices," Zuckerberg added. Interestingly, Llama 3.2 claimed to beat all closed-source models on vision, including Claude 3 Haiku and GPT-4o-mini. What are the Findings? "In terms of training compute, the largest open models have lagged behind the largest closed models by about 15 months," the report has found. It further highlighted that the release of Llama 3.1 405B relative to GPT-4 is consistent with this lag, at 16 months. According to benchmarks, Llama 3.1 outperforms OpenAI's GPT-4o in categories such as general knowledge, reasoning, reading comprehension, code generation, and multilingual capabilities. "Open-source is about to be SOTA (state-of-the-art) -- even the 70B is > GPT-4o, and this is before instruct tuning, which should make it even better," wrote an X user. However, the report mentioned that closed models are outperforming not only in accuracy benchmarks but also in user preference rankings. "In leaderboards based on human preferences between models, such as LMSYS Chatbot Arena and SEAL Coding, closed models such as OpenAI's o1 and Google DeepMind's Gemini 1.5 Pro outrank open models such as Llama 3.1 405B and DeepSeek-V2.5," it added. The analysis of benchmark performance and training compute shows that on the GPQA benchmark, open models lag by about five months, shorter than the 16-25 month lag seen on MMLU. "The lag on MMLU, GSM1k and BBH is also shorter at higher levels of accuracy, which were achieved more recently. This weakly suggests that the lag of open models has shortened in the past year," the findings revealed. However, Meta aims for Llama 4 to be the "most advanced model in the industry next year", requiring nearly 10 times more compute than Llama 3. With Llama 3.1, Meta has made it clear that their focus spans the entire LLM market, regardless of size. Rumours suggest Meta has already begun training Llama 4, which is expected to be multimodal with audio features, integrated into the Meta Ray-Ban glasses. Small language models have gained popularity over the past few months, and they can greatly help with several applications that do not demand high output accuracy. With further research and development focusing on improving the performance of SLMs and optimising LLMs, will we reach a point where standard large parameter models seem redundant for most applications? Meta's quantised models, Microsoft's Phi, HuggingFace's SmolLM and OpenAI's GPT Mini indicate strong efforts to build efficient, and small-sized models. The Indian AI ecosystem was quick to turn towards SLMs as well. Recently, Infosys and Saravam AI collaborated to develop small language models for banking and IT applications. Soon, we'll certainly see a rising interest in techniques and frameworks that optimise LLMs. The study stated that in terms of training compute, the top-1 open models have scaled at a similar pace to the top-1 closed models, at 4.6x/year for the past five years. This suggests that the lag of open models will remain stable rather than shorten. Meanwhile, looking at a broader set of models -- the top-10 -- open models have scaled at 3.6x/year, slower than closed models at 5.0x/year. This suggests a growing lag for this broader set of open models. * Small teams develop open-source models less frequently * Advanced neural networks need more engineering." 
The report, however, highlights the trend in Meta's open Llama models and states that the authors "expect the lag between the best open and closed models to shorten next year". "On historical trends alone, the evidence for how the lag will change is mixed," the report added. The report attributed the prevalence of notable open models to the significant growth in the overall number of open models. The model hosting platform HuggingFace, founded in 2016, currently hosts over 1 million open models.

Additionally, CodeGPT mentioned on X, "Since launching in March of last year, @codegptAI has been downloaded over 1.4M times with users in 180+ countries. It's one of the top players in the AI for developers space and Llama models have been a big part of that."

As a matter of fact, in a 2023 earnings call, NVIDIA chief Jensen Huang noted that everyone would be able to code: "you just have to say something to the computer". Without learning how to code, even kids can do it with the help of low-code or no-code platforms. For example, many of the current Indian language models, such as Kannada Llama, MalayaLLM, and Telugu Llama, have been created by college students still in the second year of their degree courses. Without undermining their achievements, it is essential to note that the barrier to entry for training these models has become increasingly low.
[2]
LLMs Have Hit a Wall
"Scaling the right thing matters more now than ever," said former OpenAI co-founder and Safe Superintelligence (SSI) founder Ilya Sutskever in an interview with Reuters. He's reportedly working on an alternative approach to scale LLMs, and eventually build safe superintelligence. Sutskever believes that the 2010s were the age of scaling, now they're back in the age of wonder and discovery. "Some people can work really long hours and just go down the same path faster. It's not so much our style. But if you do something different, then it becomes possible for you to do something special," he said. Based on his academic and research interests, it is most likely that Sutskever is advancing AGI by scaling transformer architectures with a focus on reinforcement learning and self-supervised methods, which support models in learning from vast data with minimal human guidance and increase their adaptability to complex tasks. OpenAI, more or less, is also treading on a similar path. To tackle the scaling challenge the company plans to scale test-time compute and utilise the high-quality synthetic data generated by previous models. OpenAI reportedly uses Strawberry (o1) to generate synthetic data for GPT-5. This sets up a "recursive improvement cycle", where each GPT version (say, GPT-5 or GPT-6) will be trained on higher-quality synthetic data created by the previous model. Another former OpenAI co-founder and founder of Eureka Labs, Andrej Karpathy, also highlighted that LLMs lack thought process data, noting that current data is mostly fragmented information. He believes that enough high-quality thought process data can help in achieving AGI. "The big one, I think, is the present lack of "cognitive self-knowledge", which requires more sophisticated approaches in model post-training instead of the naive "imitate human labelers and make it big" solutions that have mostly gotten us this far," said Karpathy, while coining the term jagged intelligence. All of these developments come on the back of reports indicating that traditional scaling may be reaching its limits, with Gemini 2.0 and Anthropic's Opus 3.5 rumoured to underperform despite scaling efforts. The emphasis is shifting to quality synthetic data and scaling test-time compute. Meta's chief AI scientist, Yann LeCun, couldn't resist joining in to criticise OpenAI's new approach. "I don't want to say 'I told you so', but I told you so!" he said, adding that Meta has been working on 'the next thing' for a while now at FAIR. Meta Bets on Autonomous Machine Intelligence Earlier this year, Meta threw its hat in the ring in the pursuit of AGI by merging two major AI research efforts, FAIR and the GenAI team. Under the guidance of LeCun, the company is developing a 'world model' with reasoning capabilities akin to those of humans and animals, which LeCun dubs AMI (autonomous machine intelligence), aka 'friend' in French. Earlier this year, Meta released a new AI model called Video Joint Embedding Predictive Architecture (V-JEPA). It enhances machines' understanding of the world by analysing interactions between objects in videos. Last month, the company introduced several advanced models, including Segment Anything Model (SAM) 2.1, Meta Spirit LM, Layer Skip, SALSA, and Meta Lingua. Interestingly, Layer Skip optimises the performance of LLMs by selectively executing layers and verifying outputs. This end-to-end solution accelerates LLM generation times on new data without the need for specialised hardware or software. 
Besides this, Meta plans to launch Llama 4 early next year. Meta said that Llama leverages self-supervised learning (SSL) during training to learn broad representations of data across domains, which allows for flexibility in general knowledge. RLHF (reinforcement learning from human feedback), which currently powers GPT-4o and a majority of other models, focuses on refining behaviour for specific tasks, ensuring that the model not only understands data but also aligns with practical applications. Now, OpenAI and others seem to be walking the path of Meta's deep-learning school of thought.

Meta also recently launched the 'Self-Taught Evaluator', which can assess the performance of other models. It employs the chain-of-thought technique, breaking down complex problems into smaller, logical steps to improve accuracy in fields like science, coding, and mathematics. LeCun was right all along when he said auto-regressive LLMs are hitting a performance ceiling. "I've always said that LLMs were useful but were an off-ramp on the road towards human-level AI. I've said that reaching human-level AI will require new architectures and new paradigms," he recently clarified to Gary Marcus.

Anthropic chief Dario Amodei, in a recent interview, discussed the various approaches to scaling, including the use of synthetic data coupled with reinforcement learning. However, he expressed scepticism about this method. "We'll overcome the data limitation, or there may be other sources of data available, but we could also observe that even if there's no problem with data, as we start to scale models up, they just stop getting better," he said. He also spoke about OpenAI's o1 approach, saying, "The other direction, of course, is these reasoning models that do the chain of thought and stop to think and reflect on their own thinking." Taking a leaf out of OpenAI's book, Anthropic recently added a new prompt improver to the Anthropic Console: it takes an existing prompt, and Claude automatically refines it with prompt engineering techniques like chain-of-thought reasoning.

Amodei believes the solution to the scaling problem lies in finding a new architecture. "There have been problems in the past with, say, the numerical stability of models, where it looked like things were levelling off, but, you know, when we found the right unblocker, they didn't end up doing so," said Amodei, adding that there might be a new optimisation technique or a new technique to unblock things. "I've seen no evidence of that so far, but if things were to slow down, that could perhaps be one reason," he added.

It appears that Anthropic currently plans to scale its compute. Amodei estimates that around $1 billion per AI company will be spent on compute this year, around $10 billion in 2025, and $100 billion in 2026. The question remains when Anthropic will have its o1 moment. Amodei revealed that the company will soon release Claude 3.5 Opus and is also progressing on Claude 4.

Anthropic recently published a blog titled 'Mapping the Mind of a Large Language Model', which explains that LLMs can make analogies, recognise patterns, and even exhibit reasoning abilities, by showing how features can be activated to manipulate responses. The researchers employed a technique called 'dictionary learning', borrowed from classical machine learning, which isolates patterns of neuron activations (called features) that recur across different contexts.
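Dictionary learning has a classical form available in scikit-learn. The sketch below runs it on synthetic "activation" vectors to show the idea at toy scale; it is not Anthropic's pipeline, data, or scale:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Synthetic "neuron activations": 200 samples over 64 units, each built
# from a sparse mix of 8 latent features, mimicking recurring patterns.
latent = rng.normal(size=(8, 64))
usage = rng.random((200, 8)) * (rng.random((200, 8)) < 0.2)  # sparse mixing
activations = usage @ latent + 0.01 * rng.normal(size=(200, 64))

# Dictionary learning recovers feature directions such that each sample is
# a sparse combination of them -- the "features" that can then be activated
# or suppressed to steer responses.
dl = DictionaryLearning(n_components=8, alpha=1.0, random_state=0)
codes = dl.fit_transform(activations)

print("feature directions:", dl.components_.shape)  # (8, 64)
print("active features per sample:", float((np.abs(codes) > 1e-6).sum(1).mean()))
```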
Google DeepMind chief Demis Hassabis, in an interview earlier this year, explained that the research lab is focused on more than just scaling. "Half our efforts have to do with inventing the next architectures and the next algorithms that will be needed, knowing that larger and larger scaled models are coming down the line," he said. He further added that Google's upcoming models, including Gemini 2, will be multimodal. "As we start ingesting things like video and audiovisual data, as well as text data, the system starts correlating those things together," said Hassabis. Unlike Altman, Hassabis expects AGI to arrive within the next decade. However, a recent report indicates that this approach isn't working for Google, as Gemini, despite increased computing power and extensive training data from online text and images, didn't meet the performance gains its leaders anticipated.

Hassabis also explained that their systems will begin to understand the physics of the real world better. "One could imagine the active version of that as a very realistic simulation or game environment where you're starting to learn about what your actions do in the world and how that affects the world itself," he added. Citing the example of AlphaGo and AlphaZero, he said these use RL agents that learn by interacting with an environment: the agent makes decisions, receives feedback (usually in the form of rewards or penalties), and adjusts its actions based on that feedback.

In August, Google DeepMind published a paper titled 'Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters', which is similar to OpenAI's o1 strategy. The paper found that applying a compute-optimal scaling approach can improve test-time compute efficiency by 2-4x. It also showed that, when comparing additional test-time compute to pre-training compute in a FLOPs-matched setting, simple methods like revisions and search can significantly improve performance on certain prompts, outperforming the gains from additional pre-training.

Meanwhile, DeepMind is also betting on the neuro-symbolic approach. Its models AlphaProof and AlphaGeometry recently achieved silver-medal standard at the International Mathematical Olympiad. Many believe neuro-symbolic AI could help prevent the generative AI bubble from bursting. The path to AGI is fascinating, and scaling alone won't lead the way. From OpenAI's compute-heavy methods to Meta's human-like reasoning and DeepMind's neuro-symbolic models, each step takes us closer to a future where these models truly understand, and maybe even surpass, our intelligence.
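The "search" flavour of test-time compute cited in the DeepMind paper above can be illustrated with best-of-N sampling against a verifier. All functions below are stand-ins; this is a sketch of the general technique, not DeepMind's method:

```python
import random

def sample_answer(prompt: str) -> str:
    # Stand-in for one stochastic sample from a fixed-size model.
    return f"candidate-{random.randint(0, 99)}"

def verifier_score(prompt: str, answer: str) -> float:
    # Stand-in for a reward model or verifier rating each candidate.
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    """Spend more inference compute (n samples plus scoring) instead of
    spending more pre-training compute on a bigger model."""
    candidates = [sample_answer(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: verifier_score(prompt, a))

random.seed(0)
print(best_of_n("prove the identity", n=8))
```

Raising n buys better expected answers at the cost of more inference-time FLOPs, which is the trade-off the paper quantifies.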
[3]
Legal tech darling Robin AI raises another $25 million
Hello and welcome to Eye on AI. In this newsletter...why a legal AI startup shows there's more to the AI boom than just foundation models; Zoox starts offering robotaxi rides in San Francisco; it's worryingly easy to jailbreak LLM-powered robots; is foundation model progress topping out?

There's been a lot of talk this past week about whether the progress of general-purpose foundation models might be hitting a wall and what that means for the AI boom. (More on this in the Brain Food section below.) Some skeptics, such as Gary Marcus, are predicting a reckoning on par with the dotcom crash. I will be discussing this very topic tomorrow at Web Summit in Lisbon, moderating a center stage conversation at 4:25 pm local time on "Is the AI Bubble About to Burst?" with Moveworks CEO Bhavin Shah and the AI Now Institute's co-executive director Sarah Myers West. You can check it out on the Web Summit livestream!

My view is that even if foundation model progress is decelerating, it may not matter as much for companies implementing AI applications for specific industries as it does for companies such as OpenAI, whose $157 billion valuation seems largely predicated on achieving artificial general intelligence (AGI). Or at least it's predicated on a scenario in which OpenAI remains at the forefront of model development and has some kind of defensible moat around its business, which won't be the case if building ever-bigger LLMs doesn't confer a significant capability advantage to justify the cost. Many of these AI application companies are in the business of selling a solution to a specific industry problem, not selling one particular AI model or some vague concept like "general purpose intelligence." In many cases, these solutions do not require AGI -- or even necessarily any further leaps in AI capabilities. In some cases, just coupling together several existing models and fine-tuning them on data relevant to a particular professional task is all that's required to create a pretty good business.

A great example of this from the world of legal tech is Robin AI. The company was founded in 2019 by Richard Robinson, a former lawyer at the firm Clifford Chance, and James Clough, a former machine learning researcher at Imperial College and King's College London. Robin doesn't just sell companies a particular piece of technology. Instead, it sells legal services to large corporations -- with some of those services delivered automatically through AI software, and some delivered by human lawyers and paralegals on Robin's payroll, who are assisted by technology, including AI, that Robin has developed.

"It's a combination of doing things that the models are currently capable of, but also investing in what is just out of reach today, and then using humans in the loop to bridge the capability gap," Robinson tells me. He acknowledges that "there is a gap between what people expect the models can do and what they can actually do reliably." For instance, he says, the most advanced AI models are now excellent at summarization and pretty good at translation. But they can't yet negotiate a complex legal document reliably, nor can they draft a brief for a court case accurately. "They can do parts of the task, but nothing like the whole thing," he says. But -- and here's the crucial thing -- Robin AI has a viable business even with those gaps. And it will still have a viable business even if those gaps close only slowly, or perhaps never close at all.
That's because, while some customers do just buy the software from Robin, others outsource an entire legal task to the company -- and it is up to Robin to figure out how best to deliver that task at a given price. "We have people, but they are highly optimized with our technology and that massively reduces the cost," Robinson says, noting that the company does not engage in labor arbitrage by hiring paralegals in low-cost countries like India or the Philippines. Instead, it has lawyers and paralegals on the payroll in New York, London, and Singapore -- but they can work much faster assisted by Robin's legal copilot technology. And that tech doesn't just consist of foundation models developed by the likes of OpenAI, Anthropic, and Meta, but also a whole host of other technologies, including search algorithms and old-fashioned hard-coded rules, all chained together in a complex workflow.

In a sign of confidence in Robin's prospects, Eye on AI can report that the company has closed a "Series B Plus" round of $25 million, on top of its initial $26 million Series B fundraising announced in January. This brings the total amount Robin AI has raised to date to $61.5 million. Investors in the new funding round include the venture arm of PayPal; billionaire Michael Bloomberg's family office, which is called Willets; and the University of Cambridge -- all of which are also customers of Robin AI. The original Series B round was led by Temasek. The company did not disclose its valuation following the latest investment. It said it is currently earning $10 million in annual recurring revenue.

Robinson says the company wanted to take on further investment, even though it still has plenty of financial runway left from the initial Series B, in part to add additional features to a product called "Reports" that has proved especially popular with customers. Reports allows users to ask unlimited questions about a set of documents. It uses Anthropic's Claude model under the hood to help power its responses. Robinson says the company is hoping to add even more reasoning abilities to what Reports can do -- but using the most advanced foundation models adds to the company's costs, which is why having additional funding in the bank is helpful.

Robin AI is also in competition with a lot of deep-pocketed rivals, including Harvey AI, which is backed by OpenAI and this past summer raised a $100 million funding round at a $1.5 billion valuation. It is also competing with products from Thomson Reuters, which owns Westlaw and has acquired several legal AI startups, including Casetext, which it bought for $650 million in 2023.

In one recent case, Robin says it helped an unnamed U.S. biotech firm deal with a data breach -- reviewing 10,000 contracts, across 30 different contract types, to understand what the biotech's obligations were in terms of notifying counterparties about the breach. Using the Reports product, as well as Robin's human legal experts, Robin says the biotech was able to identify the 50 highest-priority contracts that required notification in just hours, and have an action plan for all 10,000 contracts within 72 hours. It estimated that this saved the biotech company 93% of the time and 80% of the estimated $2.6 million it would have cost to hire an outside law firm to manually review the contracts. That's value companies are deriving from AI today. And it's value that is not going away, even if GPT-5 proves not to be as big an advance on GPT-4 as GPT-4 was on GPT-3.
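Robin's blend of models, search, hard-coded rules, and humans bridging the gaps can be sketched as a simple routing pipeline. This is a schematic illustration of the pattern, not Robin AI's actual stack:

```python
import re

def rule_flags(contract: str) -> list[str]:
    """Old-fashioned hard-coded rules: deterministic regex checks."""
    flags = []
    if re.search(r"notif(y|ication)", contract, re.IGNORECASE):
        flags.append("breach-notification clause")
    if re.search(r"\b\d+\s*hours?\b", contract):
        flags.append("deadline present")
    return flags

def model_summary(contract: str) -> str:
    # Stand-in for a foundation-model call (e.g. a hosted LLM API).
    return contract[:60] + "..."

def route(contract: str) -> str:
    """Automate what rules and the model handle reliably; escalate the rest."""
    flags = rule_flags(contract)
    summary = model_summary(contract)
    if flags:  # anything flagged goes to a human lawyer in the loop
        return f"HUMAN REVIEW: {summary} ({', '.join(flags)})"
    return f"AUTO: {summary}"

print(route("Vendor shall notify Customer of any data breach within 72 hours."))
```

The business logic lives in the routing: cheap deterministic checks and model calls filter the bulk of documents, and human time is spent only where the automation is unreliable.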
With that, here's more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

Before we get to the news: If you want to learn more about what's next in AI and how your company can derive ROI from the technology, join me in San Francisco on December 9 and 10 for Fortune Brainstorm AI. We'll hear about the future of Amazon Alexa from Rohit Prasad, the company's senior vice president and head scientist, artificial general intelligence; we'll learn about the future of generative AI search at Google from Liz Reid, Google's vice president, search; and about the shape of AI to come from Christopher Young, Microsoft's executive vice president of business development, strategy, and ventures; and we'll hear from former San Francisco 49er Colin Kaepernick about his company Lumi and AI's impact on the creator economy. You can view the agenda and apply to attend here. (And remember, if you write the code KAHN20 in the "Additional comments" section of the registration page, you'll get 20% off the ticket price -- a nice reward for being a loyal "Eye on AI" reader!)

Robotaxi company Zoox launches in San Francisco. The company's autonomous taxis will initially be available only to Zoox employees and operate only in one neighborhood, SoMa, Zoox said in a blog post. Still, this marks Zoox's entry into a second market, following Las Vegas, where the company has operated autonomously on public roads since 2023 and where it has now expanded its operations to cover the Vegas Strip. Unlike some competing self-driving cars, Zoox's robotaxis lack manual controls.

Google DeepMind and sister company Isomorphic make AlphaFold 3 publicly available for research. The two Alphabet divisions said in an updated blog post they were making the model weights and code of AlphaFold 3 available for free to academic researchers on GitHub. The model can predict the structure and interactions of every type of biological molecule, including proteins, DNA, RNA, ligands and more. It could help researchers in myriad ways. But commercial use of the model by drug discovery companies is not permitted under the AlphaFold 3 license terms.

Chinese company Tencent claims title of most capable open-weight AI model. The Chinese internet giant unveiled its Hunyuan-Large model and said it beat Meta's Llama 3.1 405B model on a range of benchmark tests. As with Meta's models, Hunyuan is an "open model" but not truly an open-source one, since the model weights are made public but not the data on which the model was trained. You can read more about Hunyuan and the benchmark results in a paper Tencent published here.

It turns out that jailbreaking LLM-powered robots is just as easy as jailbreaking LLM-powered chatbots. That's perhaps not surprising, but it is disturbing. Researchers have found that large language models are relatively easy to jailbreak -- getting the AI system to jump its guardrails and provide outputs that it is not supposed to and that might be dangerous (like giving someone a recipe for building a bomb or telling someone to self-harm). But this kind of jailbreaking is even more dangerous when the LLM controls a real robot that can take actions in the world and might cause direct physical harm. Researchers at the University of Pennsylvania developed a piece of software called RoboPAIR, designed to automatically find prompts that will jailbreak an LLM-controlled robot, and tested it on three different robot systems. In each case, RoboPAIR achieved a 100% success rate in overcoming the robot's guardrails within a few days of trying.
The system even worked against Go2, a robot control system whose code is not publicly available, meaning RoboPAIR could only look at the robot's responses to prompts for clues as to how to shape an attack to beat its guardrails. You can read more about the research in a story in IEEE Spectrum here.

Art made by humanoid robot sells for $1 million at auction -- by Chris Morris
Think Donald Trump's AI policy plans are predictable? Prepare to be surprised -- by Sharon Goldman
Duolingo's new eyerolling emo chatbot Lily briefly replaces CEO on investor call to showcase its AI technology -- by Christiaan Hetzner

Dec. 8-12: Neural Information Processing Systems (NeurIPS) 2024, Vancouver, British Columbia

Are AI's scaling laws broken? Back in 2020, researchers at OpenAI posited that LLMs followed what they called scaling laws -- that taking the same basic model design but making the model larger and training it on more data would lead to an increase in performance proportional to the increase in model size and data. The OpenAI researchers called these scaling laws because they wanted to evoke laws of physics -- inexorable truths -- but they were never more than observations of what had seemed to work at the moment. And now there is growing evidence that they aren't holding any longer -- that an increase in model size and data may, after a certain point, yield diminishing returns.

OpenAI has found that its latest AI model, codenamed Orion, which was supposed to be a successor to its GPT-4 model, has, despite being larger and trained on more data, failed to beat GPT-4 on some key metrics, according to a blockbuster report from The Information that cited unnamed company employees. In particular, Orion's skill at tasks such as coding was not improved, and might have even been worse, than GPT-4o's. As a result, the publication reported, OpenAI is having to fall back on other techniques to improve Orion's performance. This may include fine-tuning the model more after its initial training, as well as merging the base Orion model with a system more similar to OpenAI's o1 "Strawberry" model, which is trained with reinforcement learning to use a search process across multiple possible response pathways to "reason" its way to a better answer.

What will this mean for the whole AI boom? It's unclear, but it certainly makes OpenAI's path to AGI -- and that of the other companies that now say they are pursuing that goal, from Google DeepMind to Meta to Amazon -- look more difficult. The good news, though, is that this setback may mean companies will look more seriously at other AI architectures and algorithms that might be much more learning-efficient -- using less data, less computing hardware, and less energy. And that should be good for the world, even if it might not be good news for OpenAI.
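The scaling "laws" discussed above are conventionally written as a power law in parameters N and data D. The sketch below uses the published Chinchilla fit (Hoffmann et al., 2022) purely for illustration; whether frontier models still follow such curves is exactly what is now in dispute:

```python
def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Chinchilla-style scaling law: L(N, D) = E + A / N**alpha + B / D**beta.
    Constants are the published Chinchilla fit, used here illustratively."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Each 10x jump in scale buys a smaller loss improvement than the last,
# which is what "diminishing returns" means in this context:
prev = None
for n in (1e9, 1e10, 1e11, 1e12):
    loss = predicted_loss(n, 20 * n)  # D = 20N, the Chinchilla-optimal ratio
    delta = "" if prev is None else f" (improvement: {prev - loss:.3f})"
    print(f"N={n:.0e}: loss={loss:.3f}{delta}")
    prev = loss
```

The Orion reports suggest the practical curve may now be bending even faster than this formula predicts, which is why attention is shifting to post-training and test-time techniques.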
[4]
Liquid foundation models promise competition for LLMs - here's how
Liquid AI, an MIT spinout, has released the first set of Liquid Foundation Models (LFMs) and a development kit that could replace the Large Language Models (LLMs) underpinning most generative AI applications today. It has also developed application-specific versions for detecting fraud, analyzing time-series data, autonomous driving, and understanding protein structures. Three model families, at 40 billion, 3 billion, and 1.3 billion parameters, target cloud, computer, and mobile & edge scenarios.

The new LFMs require less compute to train, fine-tune, and run inference in production. They can also process more data at runtime using far less memory than LLMs. They tend to hallucinate less than LLMs, and it is easier to identify and correct the root cause of the problem when they do hallucinate. Just to be clear, the Liquid team has been working on the core concepts for many years, and the new release represents a production-ready platform for a new approach to generative AI. The core team first published on its core discovery -- a new shortcut to building such models -- in 2022, and has since worked out the hiccups around training, fine-tuning, and deploying them at scale.

The new models take advantage of a fundamentally new approach to crafting neural networks that differs from the multi-layer perceptrons at the heart of most deep learning approaches today. The name 'liquid' points to a more dynamic approach to architecting neurons, informed by research into dynamic systems, signal processing, numerical linear algebra, and microscopic worm brains. This approach allowed the team to identify more efficient ways to correlate connections within or across different modalities of data, such as text, audio, video, sensor feeds, and customer interaction records. It also introduces the first practical alternative to the attention mechanism at the heart of transformers and LLMs.

At the launch event, senior executives from AMD, Capgemini, OpenAI, Shopify and others expressed confidence in the new approach. Stephen Gerard Pagliuca, co-chairman of Bain Capital and an investor in Liquid AI, said: "I think Liquid AI is a transformative technology. It is going to have the same kind of impact in AI that we saw from the internet on business and commerce and exploration and knowledge."

Pagliuca contrasted Liquid AI's approach with LLMs, which are hard to explain and prone to hallucination. He is also impressed that the new approach uses less power and is more sustainable, explainable, and flexible. He believes it will make it far easier and cheaper for enterprises to create domain-specific models: a company could spin up a new foundation model for as little as $5-10 million, versus the $500 million required today. He said: "If every company had to spend $500 million to build a proprietary model, we'd have a very slow adoption of AI."

Ralph Wittig, head of R&D at AMD, expressed concern that the current approach to scaling AI is starting to hit a roadblock regarding sustainability and energy consumption. Datacenter GPUs now require a thousand watts each, and scaling to thousands of GPUs could require gigawatts of power. He predicts growth across three waves spanning cloud, laptop and phone computing, and finally, physical AI agents running on embedded systems: "There's a trend to cloud right now but we think the client side as well as the edge embedded side are equally important."
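The "liquid" in the name traces back to liquid time-constant (LTC) networks, where each neuron's effective time constant changes with its input. A minimal numpy sketch of one LTC-style update follows, simplified after Hasani et al.'s published research; it is not Liquid AI's production architecture:

```python
import numpy as np

def ltc_step(x, u, W_in, W_rec, tau, A, dt=0.05):
    """One Euler step of a liquid time-constant cell:
        dx/dt = -(1/tau + f) * x + f * A,  f = sigmoid(W_in @ u + W_rec @ x)
    Because f depends on the input u, the effective time constant
    1 / (1/tau + f) shifts as the input changes -- the dynamics are 'liquid'."""
    f = 1.0 / (1.0 + np.exp(-(W_in @ u + W_rec @ x)))
    return x + dt * (-(1.0 / tau + f) * x + f * A)

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
tau = np.ones(n_hidden)  # base time constants
A = np.ones(n_hidden)    # per-neuron equilibrium targets

x = np.zeros(n_hidden)
for t in range(100):     # drive the cell with a slow sine input
    u = np.sin(0.1 * t) * np.ones(n_in)
    x = ltc_step(x, u, W_in, W_rec, tau, A)
print("hidden state:", np.round(x, 3))
```

Input-dependent dynamics like these are one route to the memory efficiency claims above: state is carried in a small recurrent vector rather than in an ever-growing attention cache.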
Enterprises are confident that LFMs could support new and potential use cases. Sébastien Bubeck, a member of the technical staff at OpenAI and previously VP of AI at Microsoft, contrasts the conventional wisdom around scaling models and compute with Liquid's approach of building smaller and more competent models: "By making the model smaller, you also discover new capabilities, for example, just the cost efficiency, the speed, which allows you to do all kinds of things that would not be possible with the very, very large language model. So these are two directions, scaling up and scaling down. And then there is a new direction, which I'm personally extremely excited about, and I think unlocks a completely new direction, which is to think harder. So, not only have a bigger model but have a model that tries to do more cycles before it gives you an answer... So I think this is a new technological breakthrough, this thinking harder, and I'm very optimistic that we haven't seen the end of scaling in that direction."

Bubeck is also impressed that LFMs support feedback mechanisms that can improve performance in production, which contrasts with traditional kludges for adjusting transformer-based approaches after the fact.

Keith Williams, CTO of Capgemini Engineering, is convinced that AI is transforming the way we build engineered products and the products themselves. But he sees this as an augmentation of activity rather than a magic dust you sprinkle on your problems to make them go away. There are two big concerns in the engineering world: correctness and constraints. A good guess is not sufficient if you are building an aircraft or car; engineers want to know whether it is correct or not. So engineers are looking for technologies that help solve the problem of correctness. The other concern, constraints, is two-fold. One aspect is the AI model's constraints, such as memory footprint, support for real-time performance, and size. The other relates to the process, which includes standards and regulations. This is why explainability is important from an engineering perspective.

One application Capgemini has developed with LFMs is a smart car handbook. Williams explained: "You've got an 800-page owner's manual if you buy a car these days, and no one really reads it. And you need to be able to interact with that manual effectively. And an LFM is a really nice way to do that. But of course, you know, from an automotive viewpoint, you're limited with the compute. You cannot assume that you've got connectivity. So we've managed to deploy a really lovely owner's manual type interaction on conventional hardware that you would find in today's automobile, which is a really great demonstration of a practical application of LFMs."

Capgemini has also explored how LFMs could help optimize telecom networks, which face huge challenges in reducing energy and cost in mobile networks. Here, it is using LFMs to model the impact of switching antennas off at certain times to save energy and reduce costs. Williams says they have been comparing it to other deep learning approaches, and the LFMs are more accurate in predicting packet loss for the number of expected users at any point in time.

Stuart Schreiber, CEO at Arena BioWorks, has been looking at how LFMs might support the company's mission to understand disease mechanisms and translate these insights into new medicines. Much work has been done on generative AI to predict and design static protein structures.
However, proteins move around and change in the body, which is essential for their function. He believes that LFMs will make it easier to model how proteins change as they interact with enzymes and various kinds of cells. LFMs could also help researchers understand some of the causes, and possible cures, of conditions like Alzheimer's disease and schizophrenia.

The commercial launch of a new approach to foundation models represents a sea change from the perspective that scaling LLMs alone will lead to more competent generative AI systems. It's time to think harder, not bigger. It is worth noting that it took five years between the seminal discovery of transformers and the launch of the LLMs underpinning ChatGPT; it took only two years with LFMs, thanks to innovations in AI hardware, tooling, data science, and development processes. Other approaches, such as active inference, which Jon Reed previously covered in a podcast, may not be far behind.

It is also important to consider the various limitations of LFMs -- the warning labels of what they are not good at yet. These include zero-shot code tasks, precise numerical calculations, time-sensitive information, counting the 'r's in the word "strawberry", and human preference optimization. Liquid may improve support for some of these tasks, but it will take time. LFMs also require a new programming paradigm built around operators, blocks, and backbones. These might be more efficient in the long run but will take time to appreciate and work with.

One last insight is that Liquid AI has been optimizing its models to work across hardware from NVIDIA, AMD, Qualcomm, Cerebras, and Apple. Somewhat surprisingly, Intel was not on the initial list, which speaks volumes about Intel's AI challenges and Liquid AI's priorities. Onwards - reasonable competition in the field of gen AI approaches feels like a good thing.
[5]
Anthropic Will Accelerate
"Our mission has always been to get the most frontier model capabilities in as many people's hands as quickly as possible," said Jared Kaplan, Anthropic's chief science officer. Over the last few months, Anthropic has released several updates to its 3.5 series of Claude models, introducing impactful new features like Computer Use, Claude Artifacts, Analysis tool and Visual PDF. Despite these additions, Anthropic still hasn't increased its version number. In the latest podcast episode with Lex Fridman, however, Anthropic CEO Dario Amodei revealed intriguing details about Anthropic's future to pique one's curiosity about what's coming next. Amodei revealed that their Opus model isn't going anywhere, adding that Anthropic will release the much-anticipated update and launch Claude 3.5 Opus. He further revealed that Claude 4.0 will be released as per the usual business cycle. Even on the capital front, Anthropic is set to make big moves. Reports claiming Anthropic is raising another round of funds from Amazon have surfaced. So what does Claude's future look like? While Amodei confirmed that Anthropic will continue to upgrade Claude, he did not promise that newer versions would adhere to the expected nomenclature and may not carry the '4.0' tag. "I don't want to commit to it (a naming scheme). I would expect in a normal course of business that Claude 4 would come after Claude 3. 5, but you never know in this wacky field," he said. Amodei also revealed that Anthropic's main focus is improving the current models and shifting their capabilities. He suggested that the latest Sonnet 3.5 performs better than the Opus 3 and that their lightest, new Haiku 3.5 performs better than Sonnet 3 and is on par with the Opus 3. So, Claude 4.0 can only be expected if Anthropic achieves significant performance improvements in the latest Claude Sonnet 3.5 and the upcoming Opus 3.5. "Scaling is continuing. There will definitely be more powerful models coming from us than the models that exist today. That is certain. Or if there aren't, we've deeply failed as a company," Amodei further said. Over the last few months, Anthropic has mainly focused on making Claude the best coding model. While no other model, including the latest OpenAI o1, has matched Sonnet 3.5's coding capabilities, Anthropic faces some competition from China. The latest Qwen 2.5-Coder, an open-source model from Alibaba, supposedly scores better in benchmarks than the 3.5 Sonnet in coding. Moreover, they're also offering an 'Artifacts-like' experience for free on HuggingFace. It may take some time to check if the Qwen 2.5 Coder is better than Claude, but if it is, Anthropic will have to level up. AI coding tools like Cursor, and GitHub Copilot rely heavily on the capabilities of Claude 3.5 Sonnet. A few weeks ago in GitHub Universe, Anthropic's chief science officer Jared Kaplan, underscored the importance of staying on top of their game. "Our mission has always been to get the most frontier model capabilities in as many people's hands as quickly as possible. If we can't deploy them now, they'll be obsolete in six months to a year," he said. "Models have gone from 3% in January of this year to 50% in October of this year... But I would guess that in another 10 months, we'll probably get pretty close. We'll be at least 90%," Amodei predicted. Computer Use was a wild surprise. Anthropic, one of the first companies to release a feature that lets users control their devices, said it didn't take a lot of computation and effort to build the feature. 
"We want to get up to the human level reliability of 80-90% just like anywhere else," revealed Amodei, suggesting that Anthropic has big plans with Computer Use. Considering that Microsoft Copilot Vision has joined the game, Google's plan with Jarvis, and OpenAI's plans to release their new agentic feature, the competition seems to be heating up. Anthropic has always focused on partnerships, and it looks like it will continue to do so in the future. Interestingly, Claude's revenue generated from APIs and enterprise is more than that of OpenAI. "Our view has been: let 1,000 flowers bloom," said Amodei, indicating that he wants to continue building more partnerships. This raises an intriguing question: will AI hardware innovators, such as Rabbit and Humane, soon turn towards Computer Use as the missing link to bridge gaps in their ecosystems? Moreover, Amazon will certainly look to improve its consumer hardware product game with Computer Use. A few days ago, AWS revealed its plans to enhance and scale Anthropic's models using the Trainium 2 Chip. Amodei also suggested they will continue to scale these features while keeping safety in mind. Anthropic's Responsible Scaling Policy outlines different levels of AI safety, with level 1 being the safest, and level 5 indicating the highest risks and danger. "We are working actively to prepare ASL-3 security measures as well as ASL-3 deployment measures. I'm not going to go into details, but we've made a lot of progress on both, and I think we'll be ready quite soon. I would not be surprised at all if we hit ASL-3 next year," he said. Moreover, Amodei revealed that this would not disrupt their scaling plans. For example, he said that despite the safety concerns raised by Computer Use, Anthropic will continue to amplify its capabilities. "I definitely feel that it's important to get these capabilities out there. As models get more powerful, we're going to have to grapple with how we use these capabilities safely," Amode further said. Recently, Amodei's reputation was under fire. Their partnership with the US government for defence applications raised many concerns, and Anthropic certainly has more to prove. A few months ago, Anthropic made its AI models available on the AWS marketplace and revealed its intent to provide Claude to the government so that its frontier models can be used in various sectors. "Government agencies can use Claude to provide improved citizen services, streamline document review and preparation, enhance policymaking with data-driven insights, and create realistic training scenarios," Anthropic said. In the near future, AI could assist in disaster response coordination, enhance public health initiatives, or optimise energy grids for sustainability. While Anthropic is set to fulfil the defence sector, it is quite possible that it will target more such sectors with its newer and more powerful models in the future. This also aligns with Amodei's vision for the future. In his essay titled 'Machines of Loving Grace', he outlined all the possible ways AI can improve humanity. "I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be," he wrote.
Recent developments suggest open-source AI models are rapidly catching up to closed models, while traditional scaling approaches for large language models may be reaching their limits. This shift is prompting AI companies to explore new strategies for advancing artificial intelligence.
Recent studies indicate that open-source large language models (LLMs) are rapidly catching up to their closed-source counterparts. According to research by Epoch AI, the best open-source LLMs have lagged behind closed-source models by five to 22 months in benchmark performance [1]. However, this gap appears to be narrowing, with Meta's Llama 3.1 405B model emerging as a frontrunner in closing the performance divide across multiple benchmarks [1].
Meta's chief AI scientist, Yann LeCun, emphasized the importance of open models, stating, "In the future, our entire information diet is going to be mediated by [AI] systems. They will constitute basically the repository of all human knowledge. And you cannot have this kind of dependency on a proprietary, closed system" [1].
While open models are advancing, there are indications that traditional scaling approaches for LLMs may be reaching their limits. OpenAI co-founder Ilya Sutskever suggested that "scaling the right thing matters more now than ever," hinting at the need for new approaches beyond simply increasing model size [2].
Reports suggest that recent efforts to scale models like Gemini 2.0 and Anthropic's Opus 3.5 may have underperformed despite increased scaling [2]. This has led to a shift in focus towards quality synthetic data and scaling test-time compute.
In response to these challenges, AI companies are exploring alternative strategies:
OpenAI is reportedly using its Strawberry (o1) model to generate synthetic data for GPT-5, creating a "recursive improvement cycle" [2].
Meta is developing a 'world model' with reasoning capabilities, dubbed Autonomous Machine Intelligence (AMI), under the guidance of Yann LeCun [2].
Anthropic is investigating new architectures and approaches to overcome data limitations and improve model performance [2].
A promising development in the field is the introduction of Liquid Foundation Models (LFMs) by Liquid AI, an MIT spinout. These models offer an alternative to traditional LLMs, requiring less compute to train, fine-tune, and run inference [4]. Key advantages of LFMs include:
Processing more data at runtime while using far less memory than LLMs [4].
Hallucinating less than LLMs, with easier identification and correction of root causes when they do [4].
Application-specific versions for fraud detection, time-series analysis, autonomous driving, and protein structure modelling [4].
The evolving AI landscape is already influencing various industries:
Legal Tech: Companies like Robin AI are leveraging AI to provide legal services, combining AI software with human expertise [3].
Engineering: Capgemini is exploring LFMs for applications such as smart car handbooks, focusing on correctness and constraint management in AI-assisted engineering [4].
Coding and Development: Anthropic's Claude models, particularly Claude 3.5 Sonnet, are being integrated into coding tools like Cursor and GitHub Copilot [5].
As the AI field continues to evolve, several trends are emerging: a shift from pre-training scale towards test-time compute and high-quality synthetic data, growing interest in smaller and more efficient models, and exploration of new architectures beyond the transformer.
These developments suggest a dynamic and rapidly changing AI landscape, with potential for significant advancements in both open and closed-source models in the near future.
Reference
[1] The Gap Between Open and Closed AI Models Is Closing Faster Than Expected
[2] LLMs Have Hit a Wall
[3] Legal tech darling Robin AI raises another $25 million
[4] Liquid foundation models promise competition for LLMs - here's how
[5] Anthropic Will Accelerate