2 Sources
[1]
Tencent's New Hy3 AI Model Is the Most Efficient Chinese LLM No One's Talking About - Decrypt
It's also available on Tencent Cloud's official website, under a paid plan. Hy3 packs 295 billion total parameters (a measure of a model's potential breadth of knowledge) but only 21 billion active at any given time. That's the beauty of a Mixture-of-Experts architecture: the model routes each query to a specialized subset of its "expert" sub-networks instead of running everything at once. Less compute, lower cost, roughly similar output quality. It also supports up to 256,000 tokens of context, enough to swallow a full-length novel in a single prompt.

The model was built to balance three things Tencent says it stopped sacrificing for each other: capability breadth, honest evaluation, and cost-efficiency. The previous flagship, Hy2, had over 400 billion parameters. Tencent explicitly walked that back, arguing that 295 billion is the sweet spot where reasoning fully matures but the cost of adding more parameters stops paying off. Smaller doesn't mean worse, either: well-trained models with fewer parameters quite frequently outperform bigger generalist ones.

On coding, the improvement is dramatic. SWE-bench Verified is a benchmark that tests whether a model can actually fix real bugs from GitHub repositories -- not toy problems, but production code. Hy2 scored 53.0%; Hy3 preview scores 74.4%. That's a 40% jump in one generation, landing it in range of Claude Opus 4.6 (80.8%) and above GLM-5 (77.8%) and Kimi-K2.5 (76.8%). Terminal-Bench 2.0, which measures autonomous task execution in a real command-line environment, went from 23.2% to 54.4% -- also a massive leap.

The model is also an interesting choice for people building with agents. Agents run on complex instruction sets involving memories, skills, and tool calls, and they usually miss something, which can ruin a workflow or produce poor results.
That's why agentic capabilities matter more and more to AI developers as agents become the most hyped area in the industry. It's also why the model was immediately made available on Openclaw.

Search and browsing agents -- where models must retrieve, filter, and synthesize information from the open web without human guidance -- also improved sharply. On BrowseComp, a benchmark tracking complex web research tasks, Hy3 preview reached 67.1% (up from Hy2's 28.7%). On WideSearch, it hit 70.2%, outperforming GLM-5 and Kimi-K2.5 but trailing Claude Opus 4.6's 77.2%.

In reasoning, the model topped every Chinese competitor on Tsinghua University's math PhD qualifying exam (Spring 2026), scoring 88.4 averaged over three runs (avg@3). That's a real-world exam, not a curated dataset -- the kind of evaluation Tencent says it's prioritizing to avoid benchmark gaming. The model also scored 87.8 on CHSBO 2025 (China's national high school biology olympiad), the highest among Chinese models in that category.

Hy3 preview started training in late January 2026 and launched Thursday -- under three months from cold start to open-source release, unusually fast for a frontier-class model. Tencent attributes the speed to a February infrastructure overhaul led by Yao Shunyu, its chief AI scientist, who pushed a full rebuild of the pretraining and reinforcement learning stack. It's a very different approach from what Chinese AI labs were doing a year ago, when DeepSeek's R1 shocked the industry with its cost-efficiency.

Hy3 still trails OpenAI's and Google DeepMind's flagships, but by size-to-performance ratio, Hy3 preview is hard to dismiss: the agent benchmark composite shows it in the "optimal zone" at ~295 billion parameters, ahead of DeepSeek-V3.2 (600 billion+) and matching Kimi-K2.5 (over 1 trillion parameters) at a fraction of the compute cost. Hunyuan models have already been deployed across Yuanbao, CodeBuddy, WorkBuddy, QQ, and Tencent Docs.
On CodeBuddy and WorkBuddy, first-token latency dropped 54%, end-to-end generation time fell 47%, and the model successfully ran agent workflows as long as 495 steps. Tencent Cloud is offering API access at approximately $0.18 per million input tokens and $0.59 per million output tokens, with personal Token Plan packages starting at around $4.10 per month.
[2]
Tencent uses product rollout, not just benchmarks, to define Hy3 preview
Tencent is launching Hy3 preview with the usual benchmark claims expected of a new large language model. But the more distinctive part of the rollout is where the company is putting its proof: inside products. According to Tencent's latest briefing materials, Hy3 preview has already been integrated into Yuanbao, CodeBuddy, WorkBuddy, ima, Tencent Docs and Peacekeeper Elite before its broader public rollout. That matters because the AI market is reaching a point where raw model claims increasingly look similar. In that environment, product performance may say more than another round of leaderboard results.

Hy3 preview itself is substantial. Tencent describes it as a fast-and-slow-thinking fused MoE language model with 295 billion total parameters, 21 billion activated parameters, and support for up to 256K context. The company says the model has improved inference efficiency by 40% and performs strongly across reasoning, instruction following, in-context learning, coding and agentic tasks. Tencent also cites more than 50 evaluation sets, including specialized tests such as SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch, FrontierScience-Olympiad and IMOAnswerBench.

But Tencent is not relying on benchmarks alone to make the case. In CodeBuddy and WorkBuddy, the company says Hy3 preview reduced first-token latency by 54%, cut end-to-end duration by 47%, and improved task success rates to 99.99%+. Tencent also says the model has stably supported complex agent workflows of up to 495 steps in real user environments, spanning tasks such as document handling, data analysis, knowledge retrieval and tool orchestration. Those numbers give the launch a more concrete basis than the usual abstract benchmark language. Tencent says Hy3 is also being shaped through product co-design and open-source feedback, as the company works to improve the model's performance in real-world scenarios ahead of the official Hy3 release.
Yao Shunyu, Tencent's chief AI scientist, described the preview as the first step in rebuilding the Hunyuan model line, with further gains expected from continued pre-training and reinforcement-learning work. In Yuanbao, Tencent says the model has been co-developed against product-side requirements including intent understanding, search quality, writing style, emotional intelligence and professionalism. In Peacekeeper Elite, the company says Hy3 preview has shown strong performance in AI NPC scenarios, including both persona-driven dialogue outside matches and more time-sensitive, human-like responses during matches. Together, those examples suggest Tencent is trying to define model progress through visible user-facing behavior, not just through lab-style evaluations.

Tencent is also pairing that product argument with pricing and deployment signals. Through TokenHub, Hy3 preview starts at RMB 1.2 per million input tokens and RMB 4 per million output tokens, with the company stressing lower deployment barriers for enterprise use. That helps frame the model not only as capable, but as practical to run at scale.

That may be the more important message behind the launch. Hy3 preview is not being presented as a model that exists apart from products. Tencent is using live product rollout to argue that the model already matters inside them. Tencent said users will be able to try Hy3 preview through a two-week free token offer, extending the launch's emphasis on real-world testing beyond internal products. Hy3 preview is now available, with free access for a limited two-week period, at: https://openrouter.ai/tencent/hy3-preview:free
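For readers who want to try that endpoint programmatically, here is a minimal sketch. It assumes OpenRouter's standard OpenAI-compatible chat completions API; the model ID is taken from the access URL above, while the `OPENROUTER_API_KEY` environment variable name and the `build_request`/`ask` helpers are illustrative choices, not anything from the article.

```python
import json
import os
import urllib.request

# OpenRouter exposes an OpenAI-compatible chat completions endpoint;
# the model slug below comes from the article's access URL.
API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "tencent/hy3-preview:free"

def build_request(prompt, api_key):
    """Assemble the JSON body and headers for a single-turn chat call."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return body, headers

def ask(prompt):
    """Send one prompt and return the model's text reply."""
    body, headers = build_request(prompt, os.environ["OPENROUTER_API_KEY"])
    req = urllib.request.Request(
        API_URL, data=json.dumps(body).encode(), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

During the free window the `:free` variant needs only an OpenRouter account key; swapping the slug for a paid model ID is the only change required afterward.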
Tencent unveiled Hy3 preview, a 295-billion-parameter Chinese LLM that improves inference efficiency by 40% while excelling at coding and agent workflows. Unlike typical model launches focused on benchmarks, Tencent is proving Hy3's capabilities through live integration across Yuanbao, CodeBuddy, and other products, where it reduced latency by 54% and handles complex tasks up to 495 steps.
Tencent is taking a different approach with its latest Chinese LLM release. While the Hy3 AI model arrives with the expected benchmark results, the company is staking its credibility on something more tangible: real-world product rollout [2]. Before its broader public launch, Tencent Hy3 preview has already been integrated into Yuanbao, CodeBuddy, WorkBuddy, Tencent Docs, and even the gaming title Peacekeeper Elite [2]. This strategy signals a shift in how AI labs might need to prove their models matter as benchmark scores increasingly converge across competitors.
Source: Decrypt
The Hy3 preview leverages a Mixture-of-Experts (MoE) architecture with 295 billion total parameters but only 21 billion active at any given time [1]. This design routes each query to specialized expert sub-networks rather than activating the entire model, delivering lower computational costs without sacrificing output quality [1]. Tencent explicitly scaled back from its previous flagship Hy2, which had over 400 billion parameters, arguing that 295 billion represents the optimal point where reasoning fully matures but additional parameters stop delivering returns [1]. The model also supports up to 256,000 tokens of context, enough capacity to process a full-length novel in a single prompt [1].
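The routing idea behind that parameter gap can be sketched in a few lines: a small router scores each token against every expert, and only the top-scoring few experts actually run. The sizes below are toy values for illustration, not Hy3's real configuration, and top-2 routing is an assumption about the general MoE pattern, not a claim about Hy3's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, D = 8, 2, 16                       # toy sizes, not Hy3's
router_w = rng.normal(size=(D, N_EXPERTS))           # router projection
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]  # toy expert FFNs

def moe_layer(x):
    """Route a batch of token vectors (n, D) through their top-k experts."""
    logits = x @ router_w                            # (n, N_EXPERTS) scores
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]    # best experts per token
    sel = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(sel - sel.max(-1, keepdims=True)) # softmax over selected
    gates /= gates.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for i, (ids, ws) in enumerate(zip(top, gates)):
        for e, w in zip(ids, ws):
            out[i] += w * (x[i] @ experts[e])        # only TOP_K experts execute
    return out

tokens = rng.normal(size=(4, D))
print(moe_layer(tokens).shape)   # (4, 16): same shape out, 2 of 8 experts used
```

Every token still passes through a full-sized layer output, but only 2 of the 8 expert matrices are multiplied per token, which is exactly why total and active parameter counts diverge.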
On coding benchmarks, the progress is striking. SWE-Bench Verified tests whether models can fix actual bugs from GitHub repositories rather than artificial problems [1]. Hy2 scored 53.0%, while Hy3 preview reaches 74.4% -- a 40% jump in one generation that places it near Claude Opus 4.6 at 80.8% and above GLM-5 at 77.8% [1]. Terminal-Bench 2.0, which measures autonomous task execution in real command-line environments, saw an even larger leap from 23.2% to 54.4% [1].
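The "40% jump" and the "even larger leap" are both relative improvements over Hy2's scores, which is easy to verify from the numbers reported above:

```python
# Generation-over-generation gains, computed as (new - old) / old
# from the scores cited in the article.
scores = {
    "SWE-Bench Verified": (53.0, 74.4),   # Hy2 -> Hy3 preview
    "Terminal-Bench 2.0": (23.2, 54.4),
}
for name, (old, new) in scores.items():
    gain = (new - old) / old
    print(f"{name}: {old}% -> {new}%  (+{gain:.0%})")
# SWE-Bench Verified: +40%; Terminal-Bench 2.0: +134%
```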
These gains make the model particularly compelling for agent workflows, where complex instruction sets involving memories, skills, and tool calls often break down [1]. In CodeBuddy and WorkBuddy, Tencent reports that Hy3 preview reduced first-token latency by 54%, cut end-to-end generation time by 47%, and achieved task success rates exceeding 99.99% [2]. The model has stably supported complex agent workflows spanning up to 495 steps in live user environments, handling document processing, data analysis, knowledge retrieval, and tool orchestration [2].
Tencent claims inference efficiency improvements of 40% with Hy3 preview [2]. By size-to-performance ratio, the model occupies what Tencent calls an "optimal zone" at approximately 295 billion parameters, ahead of DeepSeek-V3.2 with over 600 billion parameters and matching Kimi-K2.5 with over 1 trillion parameters at a fraction of the compute cost [1]. Search and browsing agents also saw sharp improvements, with BrowseComp scores reaching 67.1% from Hy2's 28.7%, and WideSearch hitting 70.2%, outperforming GLM-5 and Kimi-K2.5 though trailing Claude Opus 4.6's 77.2% [1].
On reasoning tasks, the model scored 88.4 on Tsinghua University's math PhD qualifying exam for Spring 2026 and 87.8 on China's national high school biology olympiad (CHSBO 2025), topping every Chinese competitor in both categories [1]. Tencent emphasizes these are real-world exams rather than curated datasets, part of its effort to avoid benchmark gaming [1].
Hy3 preview started training in late January 2026 and launched in April -- under three months from cold start to open-source release, unusually fast for a frontier-class model [1]. Tencent attributes this speed to a February infrastructure overhaul led by Yao Shunyu, its chief AI scientist, who pushed a complete rebuild of the pretraining and reinforcement learning stack [1]. Yao described the preview as the first step in rebuilding the Hunyuan model line, with further gains expected from continued pre-training and reinforcement learning work [2].
Tencent Cloud offers API access at approximately $0.18 per million input tokens and $0.59 per million output tokens, with personal Token Plan packages starting around $4.10 per month [1]. Through TokenHub, pricing starts at RMB 1.2 per million input tokens and RMB 4 per million output tokens, emphasizing lower deployment barriers for enterprise use [2]. Users can access Hy3 preview through a two-week free token offer via OpenRouter, extending the emphasis on real-world testing beyond internal products [2].
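At those per-token rates, a monthly bill is simple arithmetic. The helper below uses the USD prices quoted above; the traffic volumes in the example call are illustrative assumptions, not figures from the article.

```python
# Rough monthly cost estimate at Tencent Cloud's listed API rates:
# ~$0.18 per 1M input tokens, ~$0.59 per 1M output tokens.
IN_RATE, OUT_RATE = 0.18, 0.59   # USD per million tokens

def monthly_cost(input_tokens_m, output_tokens_m):
    """Cost in USD for a month's traffic, volumes given in millions of tokens."""
    return input_tokens_m * IN_RATE + output_tokens_m * OUT_RATE

# e.g. a hypothetical agent pipeline pushing 500M input / 100M output tokens:
print(f"${monthly_cost(500, 100):,.2f}")   # 500*0.18 + 100*0.59 = $149.00
```

Input tokens dominate most agent workloads (long contexts, short actions), which is why the much cheaper input rate matters more than the headline output price.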
For developers building with agents, this matters. As agentic capabilities become the most hyped area in AI, models that can reliably execute complex multi-step workflows without breaking down offer practical value beyond raw parameter counts [1]. Tencent is betting that visible user-facing behavior in live products will define model progress more convincingly than lab-style evaluations alone [2]. Watch whether other Chinese LLM developers follow this product-first validation approach as the AI market matures.