Curated by THEOUTPOST
On Thu, 19 Dec, 4:03 PM UTC
3 Sources
[1]
Apple Has Teamed Up With NVIDIA To Research a 'ReDrafter' Technique That Speeds Up Text Generation With Large Language Models
Generative AI features under the Apple Intelligence banner have so far steered clear of NVIDIA GPUs for cloud-based processing, with the California-based giant sticking with custom silicon in its servers, eventually to be replaced by the unreleased M4 Ultra, to speed up its Large Language Models. However, a recent blog post from the iPhone maker reveals that Apple's engineers are not shying away from partnering with NVIDIA when both companies share a common goal: faster text generation with LLMs.

The technique, Recurrent Drafter ('ReDrafter' for short), combines two approaches: beam search and tree attention. Both are designed to improve text generation performance, and after conducting its own research, Apple collaborated with NVIDIA to integrate ReDrafter into TensorRT-LLM, a framework that helps Large Language Models run faster on NVIDIA GPUs. The technology can also reduce latency while consuming less power.

"This research work demonstrated strong results, but its greater impact comes from being applied in production to accelerate LLM inference. To make this advancement production-ready for NVIDIA GPUs, we collaborated with NVIDIA to integrate ReDrafter into the NVIDIA TensorRT-LLM inference acceleration framework. Although TensorRT-LLM supports numerous open source LLMs and the Medusa speculative decoding method, ReDrafter's beam search and tree attention algorithms rely on operators that had never been used in previous applications. To enable the integration of ReDrafter, NVIDIA added new operators or exposed existing ones, which considerably improved TensorRT-LLM's capability to accommodate sophisticated models and decoding methods. ML developers using NVIDIA GPUs can now easily benefit from ReDrafter's accelerated token generation for their production LLM applications with TensorRT-LLM.
In benchmarking a tens-of-billions parameter production model on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we have seen 2.7x speed-up in generated tokens per second for greedy decoding. These benchmark results indicate this tech could significantly reduce latency users may experience, while also using fewer GPUs and consuming less power."

While this collaboration shows there is at least a sliver of a chance that Apple and NVIDIA could enter into a broader agreement, we strongly believe such a partnership will not materialize given the history shared by the two technology giants. Short-term tag teams like this one may form again in the future, but a meaningful business relationship appears to be out of the question.
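The draft-then-verify pattern underlying ReDrafter-style speculative decoding can be sketched in a few lines. The following is a minimal toy illustration, not Apple's implementation: the "draft" and "target" models are trivial stand-in functions over integer token IDs, with the draft deliberately made imperfect so the correction path fires.

```python
def draft_next(tokens, k=4):
    """A cheap draft model: guess the next k tokens.

    Imperfect on purpose: it guesses wrong whenever the last token is 4,
    standing in for a small RNN draft head that is usually, not always, right.
    """
    out = []
    last = tokens[-1]
    for _ in range(k):
        guess = 0 if last == 4 else (last + 1) % 10
        out.append(guess)
        last = guess
    return out

def target_next(tokens):
    """The expensive target model: the 'ground truth' next token."""
    return (tokens[-1] + 1) % 10

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens by drafting k tokens cheaply, then verifying them."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_tokens:
        draft = draft_next(tokens, k)
        # Accept the longest prefix of the draft that the target agrees with.
        accepted = []
        ctx = list(tokens)
        for t in draft:
            if target_next(ctx) == t:
                accepted.append(t)
                ctx.append(t)
            else:
                break
        if len(accepted) < len(draft):
            # On disagreement, take the target's own token instead,
            # so output always matches what the target alone would produce.
            accepted.append(target_next(ctx))
        tokens.extend(accepted)
    return tokens[len(prompt):][:n_tokens]
```

When the draft is right, several tokens are committed per expensive verification step; when it is wrong, exactly one correct token is still produced, which is where the tokens-per-step speedups reported in the articles come from.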
[2]
Apple Is Using Nvidia's Tools to Make Its AI Models Faster
Apple claims the process resulted in 2.7x faster token generation. Apple is partnering with Nvidia in an effort to improve the performance of artificial intelligence (AI) models. On Wednesday, the Cupertino-based tech giant announced that it has been researching inference acceleration on Nvidia's platform to see whether both the efficiency and latency of a large language model (LLM) can be improved simultaneously. The iPhone maker used a technique dubbed Recurrent Drafter (ReDrafter), published in a research paper earlier this year, in combination with the Nvidia TensorRT-LLM inference acceleration framework.

In a blog post, Apple researchers detailed the collaboration with Nvidia and the results it achieved. The company highlighted that it has been researching the problem of improving inference efficiency while maintaining latency in AI models. Inference in machine learning refers to the process of making predictions, decisions, or conclusions from a given input using a trained model; put simply, it is the processing step in which an AI model decodes a prompt and produces output from it.

Earlier this year, Apple published and open-sourced the ReDrafter technique, which brings a new approach to speculative decoding. Using a recurrent neural network (RNN) draft model, it combines beam search (a mechanism where the AI explores multiple candidate sequences in parallel) and dynamic tree attention (an attention mechanism applied to tree-structured candidate tokens). The researchers stated that it can speed up LLM token generation by up to 3.5 tokens per generation step. While combining the two processes improved performance to a certain degree, Apple highlighted that there was no significant boost to speed on its own. To solve this, the researchers integrated ReDrafter into the Nvidia TensorRT-LLM inference acceleration framework.
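Beam search, one of the two mechanisms ReDrafter combines, can be illustrated with a toy scorer. This is a minimal sketch that assumes nothing about Apple's or Nvidia's actual code; `toy_lm` is a made-up distribution chosen so that the locally tempting first token leads to a worse overall sequence, which is exactly the situation beam search handles and greedy decoding does not.

```python
import math

def beam_search(next_token_logprobs, start, steps, beam_width=2):
    """Toy beam search: keep the `beam_width` highest-scoring partial
    sequences at each step. `next_token_logprobs(seq)` returns a dict
    mapping candidate next tokens to log-probabilities."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_token_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the best few
    return beams

def toy_lm(seq):
    """Illustrative distribution: "a" is locally tempting after the start,
    but the "b" branch pays off on the following step (0.4*0.9 > 0.6*0.5)."""
    if seq[-1] == "<s>":
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if seq[-1] == "a":
        return {"x": math.log(0.5), "y": math.log(0.5)}
    return {"x": math.log(0.9), "y": math.log(0.1)}  # after "b"
```

Running `beam_search(toy_lm, "<s>", 2)` ranks the sequence `<s> b x` (probability 0.36) above `<s> a x` (0.30), whereas a greedy decoder would have committed to "a" at the first step and been stuck with the inferior branch.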
As part of the collaboration, Nvidia added new operators and exposed existing ones to improve the speculative decoding process. The post claimed that, using the Nvidia platform with ReDrafter, the researchers saw a 2.7x speed-up in generated tokens per second for greedy decoding (a decoding strategy used in sequence generation tasks). Apple highlighted that this technology can be used to reduce the latency of AI processing while also using fewer GPUs and consuming less power.
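Greedy decoding, the strategy the 2.7x benchmark was measured on, is the simplest generation strategy: at every step, commit to the single highest-scoring next token. A minimal sketch with a hypothetical scorer (`toy_scores` is invented for illustration, not a real model):

```python
def greedy_decode(next_token_scores, start, steps):
    """Greedy decoding: pick the argmax token at every step, one token
    per model call, never revisiting earlier choices."""
    seq = [start]
    for _ in range(steps):
        scores = next_token_scores(seq)          # token -> score
        seq.append(max(scores, key=scores.get))  # commit to the argmax
    return seq

def toy_scores(seq):
    """Hypothetical scorer favoring the continuation the -> cat -> sat."""
    table = {"<s>": {"the": 0.7, "a": 0.3},
             "the": {"cat": 0.6, "dog": 0.4},
             "cat": {"sat": 0.9, "ran": 0.1}}
    return table.get(seq[-1], {"<eos>": 1.0})
```

Because greedy decoding emits exactly one token per (expensive) model call, it is a natural baseline for speculative decoding, which tries to commit several tokens per call while producing the same output.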
[3]
Nvidia, Apple Team Up To Supercharge ChatGPT-Like LLMs With ReDrafter Technique
Apple Inc. and Nvidia Corporation have announced a collaboration to improve the performance of large language models.

What Happened: The partnership focuses on integrating Apple's Recurrent Drafter (ReDrafter) technique with Nvidia's TensorRT-LLM to boost text generation speeds. ReDrafter, which Apple open-sourced earlier this year, combines beam search and dynamic tree attention to improve LLM performance. The collaboration with Nvidia has led to the integration of ReDrafter into TensorRT-LLM, a tool designed to accelerate LLMs on Nvidia GPUs. The integration involved adding new operators to improve TensorRT-LLM's ability to handle complex models and decoding methods. Benchmarking results show a 2.7x increase in token generation speed for greedy decoding on Nvidia GPUs, significantly reducing latency and power consumption. Apple's machine learning researchers noted that this advancement could lower computational costs and improve user experience by reducing latency in production applications.

Why It Matters: In October, Apple reported fourth-quarter revenue of $94.9 billion, surpassing analyst expectations of $94.56 billion. Last month, Nvidia reported third-quarter revenue of $35.1 billion, a 94% increase over the prior year that exceeded the Street consensus estimate of $33.12 billion, per Benzinga Pro data. Together, the two tech giants are worth about $7 trillion, with Apple the most valuable company in the world and Nvidia ranking third.

Price Action: Apple shares fell 2.14% to $248.05 on Wednesday, and dipped further to $247.19 in after-hours trading.
Meanwhile, Nvidia shares gained 2.01% to $131.50 in after-hours trading, recovering from a 1.14% decline to $128.91 during the regular session. Disclaimer: This content was partially produced with the help of Benzinga Neuro and was reviewed and published by Benzinga editors.
Apple and NVIDIA have joined forces to integrate the ReDrafter technique into NVIDIA's TensorRT-LLM framework, significantly improving the speed and efficiency of large language models.
In a surprising collaboration, tech giants Apple and NVIDIA have partnered to improve the performance of large language models (LLMs). The focus of this partnership is the integration of Apple's Recurrent Drafter (ReDrafter) technique with NVIDIA's TensorRT-LLM framework, aiming to significantly boost text generation speeds in AI models [1][2].
ReDrafter, a technique open-sourced by Apple earlier this year, combines two approaches to enhance LLM performance: beam search, in which the model explores multiple candidate sequences in parallel, and dynamic tree attention, which processes those tree-structured candidates with an attention mechanism.
This innovative approach can speed up LLM token generation by up to 3.5 tokens per generation step [2].
To make ReDrafter production-ready for NVIDIA GPUs, the two companies collaborated to integrate it into the NVIDIA TensorRT-LLM inference acceleration framework. This integration required NVIDIA to add new operators and expose existing ones, significantly improving TensorRT-LLM's capability to accommodate sophisticated models and decoding methods [1].
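One way to picture the tree-attention side of those operators: draft candidates form a token tree, and an attention mask restricts each drafted token to its own ancestors, so every branch of the tree can be scored by the target model in a single forward pass. The following is an illustrative sketch of such a mask builder, not NVIDIA's actual operator; real kernels fuse this logic into the attention computation itself.

```python
def tree_attention_mask(parent):
    """Build a boolean attention mask for a tree of draft tokens.

    `parent[i]` is the index of node i's parent (-1 for roots that attach
    directly to the committed prefix). mask[i][j] is True iff node i may
    attend to node j, i.e. j lies on i's root-to-node path. Each token
    therefore "sees" only its own ancestors, never sibling branches,
    which is what lets all branches share one forward pass.
    """
    n = len(parent)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:          # walk up from node i to its root
            mask[i][j] = True
            j = parent[j]
    return mask
```

For a tree where node 0 has children 1 and 2, and node 3 is a child of 1 (`parent = [-1, 0, 0, 1]`), node 3's row allows attention to nodes 0, 1, and itself, but not to node 2, its uncle on the rival branch.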
The collaboration has yielded remarkable results: benchmarking a tens-of-billions-parameter production model on NVIDIA GPUs with TensorRT-LLM and ReDrafter showed a 2.7x speed-up in generated tokens per second for greedy decoding [1].
This technological advancement could have far-reaching effects on AI development and application: it can significantly reduce the latency users experience while using fewer GPUs and consuming less power.
Machine learning developers using NVIDIA GPUs can now easily benefit from ReDrafter's accelerated token generation for their production LLM applications with TensorRT-LLM [1].
While this collaboration demonstrates the potential for Apple and NVIDIA to work together, it appears to be a short-term partnership focused on specific technological advancements. Given the companies' history, a long-term business relationship seems unlikely [1][3].
Both Apple and NVIDIA are major players in the tech industry: Apple recently reported fourth-quarter revenue of $94.9 billion, while NVIDIA reported third-quarter revenue of $35.1 billion, up 94% year over year [3].
Together, these tech giants are valued at approximately $7 trillion, with Apple being the most valuable company globally and NVIDIA ranking third [3].
This collaboration between two industry leaders highlights the ongoing race to improve AI technologies and could potentially reshape the landscape of AI development and application in the near future.