Apple and NVIDIA Collaborate to Boost AI Text Generation with ReDrafter

The partnership integrates Apple's ReDrafter technique into NVIDIA's TensorRT-LLM, making large language model inference faster and more efficient.

Overview

Apple's open-source ReDrafter is a speculative decoding technique that combines beam search and dynamic tree attention to speed up text generation in large language models (LLMs).
ReDrafter has been integrated into NVIDIA's TensorRT-LLM framework, enabling faster and more efficient LLM inference on NVIDIA GPUs.
The integration required NVIDIA to add or modify operators, improving TensorRT-LLM's ability to handle complex models and decoding methods.
Benchmarks show a 2.7x increase in tokens generated per second for greedy decoding, which reduces latency, power consumption, and computational cost.
This collaboration underscores the potential for short-term partnerships between Apple and NVIDIA on AI technologies, despite their historically limited business ties.
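
To make the "beam search and dynamic tree attention" point in the overview more concrete, here is a minimal Python sketch of the general idea behind tree attention: draft candidates produced by beam search often share prefixes, so they can be merged into a tree and verified once per unique prefix rather than once per candidate. This is an illustration under stated assumptions, not Apple's or NVIDIA's implementation; the function names `build_prefix_tree` and `ancestor_mask` and the toy beams are hypothetical.

```python
def build_prefix_tree(beams):
    """Merge token sequences that share prefixes into a flattened trie.

    Returns a list of (token, parent_index) nodes; parent_index is -1 for roots.
    """
    nodes = []   # flattened trie: (token, parent index)
    index = {}   # (parent index, token) -> node index
    for beam in beams:
        parent = -1
        for token in beam:
            key = (parent, token)
            if key not in index:
                index[key] = len(nodes)
                nodes.append((token, parent))
            parent = index[key]
    return nodes


def ancestor_mask(nodes):
    """mask[i][j] is True when node j is node i itself or one of its ancestors."""
    n = len(nodes)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:
            mask[i][j] = True
            j = nodes[j][1]  # walk up to the parent
    return mask


# Hypothetical draft candidates, as a beam search might propose them.
beams = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]
nodes = build_prefix_tree(beams)
mask = ancestor_mask(nodes)
flat_tokens = sum(len(b) for b in beams)
print(f"{flat_tokens} draft tokens packed into {len(nodes)} tree nodes")
# prints "9 draft tokens packed into 6 tree nodes"
```

In this toy case the three candidates contain nine tokens in total, but only six unique tree nodes need to be scored. The mask restricts each node to attending to its own path, so the shared prefix "the" is processed once instead of three times.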