3 Sources
[1]
Baidu launches new generation of Ernie AI
Company also introduced new processors to up its inference and training game. The AI marketspace is getting mighty crowded, and Chinese company Baidu is the latest player to launch its newest model into the world. At its Baidu World conference this week, it unveiled Ernie 5.0. Baidu CTO and head of AI Group Haifeng Wang said (via translated subtitles supplied by the conference) that Ernie 5.0's technical route was to "adopt a unified auto-regression architecture for native full multimodal modelling." He said that this meant that "from the beginning of training, speech had been integrated, and images had been integrated, video, audio, and other multimodal data." While its predecessor, Ernie-4.5-VL-28B-A3B-Thinking, is supplied under an Apache license and is expected to provide an alternative to the likes of OpenAI, Ernie 5.0 is proprietary, built on the company's PaddlePaddle deep learning framework.
[2]
Baidu unveils proprietary ERNIE 5 beating GPT-5 performance on charts, document understanding and more
Mere hours after OpenAI updated its flagship foundation model GPT-5 to GPT-5.1, promising reduced token usage overall and a more pleasant personality with more preset options, Chinese search giant Baidu unveiled its next-generation foundation model, ERNIE 5.0, alongside a suite of AI product upgrades and strategic international expansions. The goal: to position itself as a global contender in the increasingly competitive enterprise AI market.
Announced at the company's Baidu World 2025 event, ERNIE 5.0 is a proprietary, natively omni-modal model designed to jointly process and generate content across text, images, audio, and video. Unlike Baidu's recently released ERNIE-4.5-VL-28B-A3B-Thinking, which is open source under an enterprise-friendly and permissive Apache 2.0 license, ERNIE 5.0 is a proprietary model and is available only via Baidu's ERNIE Bot website (I needed to select it manually from the model picker dropdown) and the Qianfan cloud platform application programming interface (API) for enterprise customers.
Alongside the model launch, Baidu introduced major updates to its digital human platform, no-code tools, and general-purpose AI agents -- all targeted at expanding its AI footprint beyond China. The company also introduced ERNIE 5.0 Preview 1022, a variant optimized for text-intensive tasks, alongside the general preview model that balances across modalities. Baidu emphasized that ERNIE 5.0 represents a shift in how intelligence is deployed at scale, with CEO Robin Li stating: "When you internalize AI, it becomes a native capability and transforms intelligence from a cost into a source of productivity."
Where ERNIE 5.0 outshines GPT-5 and Gemini 2.5 Pro
ERNIE 5.0's benchmark results suggest that Baidu has achieved parity -- or near-parity -- with the top Western foundation models across a wide spectrum of tasks.
In public benchmark slides shared during the Baidu World 2025 event, ERNIE 5.0 Preview outperformed or matched OpenAI's GPT-5-High and Google's Gemini 2.5 Pro in multimodal reasoning, document understanding, and image-based QA, while also demonstrating strong language modeling and code execution abilities. The company emphasized its ability to handle joint inputs and outputs across modalities, rather than relying on post-hoc modality fusion, which it framed as a technical differentiator. On visual tasks, ERNIE 5.0 achieved leading scores on OCRBench, DocVQA, and ChartQA, three benchmarks that test document recognition, comprehension, and structured data reasoning. Baidu claims the model beat both GPT-5-High and Gemini 2.5 Pro on these document and chart-based benchmarks, areas it describes as core to enterprise applications like automated document processing and financial analysis. In image generation, ERNIE 5.0 tied or exceeded Google's Veo3 across categories including semantic alignment and image quality, according to Baidu's internal GenEval-based evaluation. Baidu claimed that the model's multimodal integration allows it to generate and interpret visual content with greater contextual awareness than models relying on modality-specific encoders. For audio and speech tasks, ERNIE 5.0 demonstrated competitive results on MM-AU and TUT2017 audio understanding benchmarks, as well as question answering from spoken language inputs. Its audio performance, while not as heavily emphasized as vision or text, suggests a broad capability footprint intended to support full-spectrum multimodal applications. In language tasks, the model showed strong results on instruction following, factual question answering, and mathematical reasoning -- core areas that define the enterprise utility of large language models. The Preview 1022 variant of ERNIE 5.0, tailored for textual performance, showed even stronger language-specific results in early developer access. 
While Baidu does not claim broad superiority in general language reasoning, its internal evaluations suggest that ERNIE 5.0 Preview 1022 closes the gap with top-tier English-language models and outperforms them in Chinese-language performance. While Baidu did not release full benchmark details or raw scores publicly, its performance positioning suggests a deliberate attempt to frame ERNIE 5.0 not as a niche multimodal system but as a flagship model competitive with the largest closed models in general-purpose reasoning. Where Baidu claims a clear lead is in structured document understanding, visual chart reasoning, and integration of multiple modalities into a single, native modeling architecture. Independent verification of these results remains pending, but the breadth of claimed capabilities positions ERNIE 5.0 as a serious alternative in the multimodal foundation model landscape.
Enterprise Pricing Strategy
ERNIE 5.0 is positioned at the premium end of Baidu's model pricing structure. The company has released specific pricing for API usage on its Qianfan platform, aligning the cost with other top-tier offerings from Chinese competitors like Alibaba. The contrast in cost between ERNIE 5.0 and earlier models such as ERNIE 4.5 Turbo underscores Baidu's strategy to differentiate between high-volume, low-cost models and high-capability models designed for complex tasks and multimodal reasoning. Compared to other U.S. alternatives, it remains mid-range in pricing:
Global Expansion: Products and Platforms
In tandem with the model release, Baidu is expanding internationally:
* GenFlow 3.0, now with 20M+ users, is the company's largest general-purpose AI agent and features enhanced memory and multimodal task handling.
* Famou, a self-evolving agent capable of dynamically solving complex problems, is now commercially available via invite.
* MeDo, the international version of Baidu's no-code builder Miaoda, is live globally via medo.dev.
* Oreate, a productivity workspace with document, slide, image, video, and podcast support, has reached over 1.2M users worldwide.
Baidu's digital human platform, already rolled out in Brazil, is also part of the global push. According to company data, 83% of livestreamers during this year's "Double 11" shopping event in China used Baidu's digital human tech, contributing to a 91% increase in GMV. Meanwhile, Baidu's autonomous ride-hailing service Apollo Go has surpassed 17 million rides, operating driverless fleets in 22 cities and claiming the title of the world's largest robotaxi network.
Open-Source Vision-Language Model Garners Industry Attention
Two days before the flagship ERNIE 5.0 event, Baidu also released an open-source multimodal model under the Apache 2.0 license: ERNIE-4.5-VL-28B-A3B-Thinking. As reported by my colleague Michael Nuñez at VentureBeat, the model activates just 3 billion parameters while maintaining a total of 28 billion, using a Mixture-of-Experts (MoE) architecture for efficient inference. Key technical innovations include:
* "Thinking with Images", which enables dynamic zoom-based visual analysis
* Support for chart interpretation, document understanding, visual grounding, and temporal awareness in video
* Runtime on a single 80GB GPU, making it accessible to mid-sized organizations
* Full compatibility with Transformers, vLLM, and Baidu's FastDeploy toolkits
This release adds pressure on closed-source competitors. With Apache 2.0 licensing, ERNIE-4.5-VL-28B-A3B-Thinking becomes a viable foundation model for commercial applications without licensing restrictions -- something few high-performing models in this class offer.
Community Feedback and Baidu's Response
Following the launch of ERNIE 5.0, developer and AI evaluator Lisan al Gaib (@scaling01) posted a mixed review on X.
While initially impressed by the model's benchmark performance, they reported a persistent issue where ERNIE 5.0 would repeatedly invoke tools -- even when explicitly instructed not to -- during SVG generation tasks. "ERNIE 5.0 benchmarks looked insane until I tested it... unfortunately it's RL braindamaged or they have a serious issue with their chat platform / system prompt," Lisan wrote.
In a matter of hours, Baidu's developer-focused support account, @ErnieforDevs, responded: "Thanks for the feedback! It's a known bug -- certain syntax can consistently trigger it. We're working on a fix. You can try rephrasing or changing the prompt to avoid it for now." The quick turnaround reflects Baidu's increasing emphasis on developer communication, especially as it courts international users through both proprietary and open-source offerings.
Outlook for Baidu and its ERNIE foundational LLM family
Baidu's ERNIE 5.0 marks a strategic escalation in the global foundation model race. With performance claims that put it on par with the most advanced systems from OpenAI and Google, and a mix of premium pricing and open-access alternatives, Baidu is signaling its ambition to become not just a domestic AI leader, but a credible global infrastructure provider. At a time when enterprise AI users are increasingly demanding multimodal performance, flexible licensing, and deployment efficiency, Baidu's two-track approach -- premium hosted APIs and open-source releases -- may broaden its appeal across both corporate and developer communities. Whether the company's performance claims hold up under third-party testing remains to be seen. But in a landscape shaped by rising costs, model complexity, and compute bottlenecks, ERNIE 5.0 and its supporting ecosystem give Baidu a competitive position in the next wave of AI deployment.
[3]
Baidu Unveils ERNIE 5.0 and A Series of AI Applications At Baidu World 2025, Ramps Up Global Push
Baidu, Inc. unveiled the natively omni-modal foundation model, ERNIE 5.0, at its annual event, Baidu World 2025. ERNIE 5.0 jointly models text, images, audio, and videos for comprehensive multimodal understanding and generation. The company also introduced a suite of AI products and services and announced plans to roll out select products to global markets. At the event, Baidu introduced upgrades for a suite of AI products, including its next-generation real-time digital human, an enhanced 2.0 version of its no-code application builder Miaoda, a revamped Baidu Search experience powered by more intelligent capabilities, and the general AI agent GenFlow 3.0. It also unveiled Famou, a self-evolving AI agent, and announced plans to roll out products such as the digital human technology, no-code application builder MeDo, and one-stop AI workspace Oreate to global markets. Robin Li, Co-founder and CEO of Baidu, highlighted the importance of internalizing AI capabilities in every facet of the modern workflow. As the latest-gen foundation model of the ERNIE series, ERNIE 5.0 is built upon natively unified omni-modal modeling technology. From the ground up, it jointly models text, images, audio, and video, achieving comprehensive multimodal understanding and generation. With fully upgraded foundational abilities, ERNIE 5.0 excels in multimodal understanding, instruction following, creative writing, factual reasoning, agentic planning, and tool use. The preview of the ERNIE 5.0 model is now available to the public via ERNIE Bot and to enterprise users via Baidu AI Cloud's MaaS platform Qianfan. Li noted that foundation models are iterating rapidly, as evidenced by continuous breakthroughs in intelligence limits, increased model "thinking-time," the native integration of multiple modalities, and the ability for self-learning and evolution.
Apollo Go Robotaxi Reaches Over 17 Million Rides Globally, The World's Largest
Baidu's autonomous ride-hailing service, Apollo Go, has completed over 17 million rides globally, making it the largest in the world. Its weekly ride count recently surpassed 250,000, all of which are fully driverless.
Digital Human Tech Debuts in Brazil
Baidu's no-code application builder Miaoda has been upgraded to version 2.0, which has already been used to generate over 400,000 applications. Miaoda's international version, MeDo, was also launched at the event. It is now available for global developers via medo.dev. The company also announced it will make its digital human technology globally available. The technology has debuted in Brazil and is exploring expansion opportunities into key markets such as the U.S. and Southeast Asia, and platforms such as Shopee and Lazada.
Chinese tech giant Baidu unveils its proprietary ERNIE 5.0 foundation model at Baidu World 2025, featuring native multimodal capabilities and claiming superior performance over Western competitors in document understanding and visual tasks. The company also announces global expansion plans for its AI products.
Chinese technology giant Baidu introduced its latest artificial intelligence model, ERNIE 5.0, at the company's annual Baidu World 2025 conference, positioning it as a direct competitor to leading Western AI models including OpenAI's GPT-5 and Google's Gemini 2.5 Pro [1][2].
The new model represents a significant departure from Baidu's previous open-source approach. While its predecessor, ERNIE-4.5-VL-28B-A3B-Thinking, was released under an Apache license, ERNIE 5.0 is proprietary and built on the company's PaddlePaddle deep learning framework [1].
Baidu CTO and head of AI Group Haifeng Wang explained that ERNIE 5.0 adopts a "unified auto-regression architecture for native full multimodal modelling," integrating speech, images, video, and audio data from the beginning of training rather than through post-processing fusion [1].
This native multimodal approach distinguishes ERNIE 5.0 from competitors that rely on modality-specific encoders. The model jointly processes and generates content across text, images, audio, and video, enabling comprehensive multimodal understanding and generation capabilities [3].
Baidu presented benchmark results suggesting ERNIE 5.0 achieves parity or superiority compared to top Western foundation models across multiple task categories. According to company data shared at the conference, ERNIE 5.0 Preview outperformed or matched OpenAI's GPT-5-High and Google's Gemini 2.5 Pro in multimodal reasoning, document understanding, and image-based question answering [2].
The model demonstrated particularly strong performance on visual tasks, achieving leading scores on OCRBench, DocVQA, and ChartQA benchmarks that test document recognition, comprehension, and structured data reasoning. Baidu claims these results position ERNIE 5.0 as superior to both GPT-5-High and Gemini 2.5 Pro in document and chart-based applications crucial for enterprise use cases [2].
Baidu introduced ERNIE 5.0 Preview 1022, a specialized variant optimized for text-intensive tasks, alongside the general preview model that balances performance across all modalities. The Preview 1022 variant showed enhanced language-specific results in early developer access, particularly excelling in Chinese-language performance [2].
The ERNIE 5.0 preview is currently available to the public through ERNIE Bot and to enterprise users via Baidu AI Cloud's MaaS platform Qianfan. The model is positioned at the premium end of Baidu's pricing structure, aligning costs with other top-tier offerings in the market [2][3].
Beyond the model launch, Baidu announced significant international expansion plans for its AI product suite. The company introduced upgrades to its digital human platform, no-code application builder Miaoda 2.0, and general AI agent GenFlow 3.0, all targeted at expanding its AI footprint beyond China [3].
Baidu's digital human technology has already debuted in Brazil, with the company exploring expansion opportunities into key markets including the United States and Southeast Asia. The international version of its no-code application builder, called MeDo, is now available globally via medo.dev [3].