NVIDIA's TensorRT-LLM Accelerates Large Language Models While Generative AI Hits Its Jetson Platform

New library quadruples the performance of LLMs on Windows, providing the computational grunt for better contextual responses.

UPDATE (10/20/2023): NVIDIA has released its TensorRT-LLM library, which it says can dramatically improve the performance of large language models (LLMs) using the Tensor Cores on RTX graphics cards.

The library can now be downloaded from NVIDIA's GitHub repository, where the source code and examples are made available under the permissive Apache 2.0 license. The library has also been integrated into NVIDIA's NeMo framework, which offers an end-to-end platform for building, customizing, and deploying generative AI models including LLMs.
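For readers who want a feel for how the library is driven, newer releases of TensorRT-LLM ship a high-level Python "LLM" API along the lines of the sketch below. The exact import paths, the TinyLlama checkpoint name, and the parameter names are assumptions that may differ between versions, so treat this as an illustration of the workflow rather than the definitive interface.

```python
# Minimal sketch of text generation with TensorRT-LLM's high-level Python API.
# Assumes a recent tensorrt_llm release and an RTX-class GPU; the model name
# and parameters are illustrative and may need adjusting for your installation.
from tensorrt_llm import LLM, SamplingParams

# Build (or load a cached) TensorRT engine for a small Hugging Face model.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Cap the response length; other sampling knobs (temperature, top_p) also exist.
sampling = SamplingParams(max_tokens=64)

prompts = ["Explain what Tensor Cores are in one sentence."]
for output in llm.generate(prompts, sampling):
    # Each result carries the prompt and one or more generated completions.
    print(output.outputs[0].text)
```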

Original article continues below.

NVIDIA has announced the impending release of a new library that can boost the performance of large language models (LLMs) up to fourfold by processing the workload on the Tensor Cores of RTX graphics cards, and it is promising new generative artificial intelligence (AI) capabilities for its Jetson robotics platform as well.

"Generative AI is one of the most important trends in the history of personal computing, bringing advancements to gaming, creativity, video, productivity, development and more," claims NVIDIA's Jesse Clayton. "And GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations."

Clayton's claims relate to a new library for Windows dubbed TensorRT-LLM, dedicated to accelerating the performance of large language models like those behind OpenAI's ChatGPT. Using TensorRT-LLM on a system with an RTX graphics card, NVIDIA says, can quadruple performance. That acceleration can improve not only an LLM's response time but also its accuracy, Clayton says, by delivering the performance required for real-time retrieval-augmented generation (RAG), in which the LLM is tied to a vector library or database so its responses can draw on a task-specific dataset.
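The retrieval-augmented generation pattern itself is straightforward to sketch: documents are embedded into vectors, the query is embedded the same way, the nearest documents are retrieved, and the result is prepended to the prompt. The toy example below uses an in-memory index, with hypothetical embed() and generate() placeholders standing in for a real embedding model and an accelerated LLM; it illustrates the pattern, not NVIDIA's implementation.

```python
# Toy retrieval-augmented generation (RAG) loop. embed() and generate() are
# hypothetical placeholders for a real embedding model and an LLM; the vector
# "database" is just a NumPy matrix queried by cosine similarity.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: map text to a fixed-size vector (real systems use an embedding model)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    """Placeholder: call an LLM, e.g. one served through TensorRT-LLM."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

documents = [
    "TensorRT-LLM accelerates LLM inference on RTX Tensor Cores.",
    "The Jetson Generative AI Lab targets edge and robotics developers.",
    "The TAO Toolkit is used to train and optimize vision models.",
]

# Build the "vector database": one embedding per document, L2-normalized.
doc_matrix = np.stack([embed(d) for d in documents])
doc_matrix /= np.linalg.norm(doc_matrix, axis=1, keepdims=True)

def answer(question: str, top_k: int = 2) -> str:
    # Embed the query and score it against every stored document.
    q = embed(question)
    q /= np.linalg.norm(q)
    scores = doc_matrix @ q
    best = np.argsort(scores)[::-1][:top_k]
    # Prepend the retrieved, task-specific context to the prompt.
    context = "\n".join(documents[i] for i in best)
    return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

print(answer("What does TensorRT-LLM accelerate?"))
```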

For those more interested in generating graphics rather than text, NVIDIA says its RTX hardware can now accelerate the popular Stable Diffusion prompt-to-image model, offering double the performance or better, and, Clayton says, up to seven times the performance of an Apple Mac with an M2 Ultra processor when running on a GeForce RTX 4090 GPU.
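NVIDIA's optimized Stable Diffusion path is distributed separately, but a baseline GPU run is easy to reproduce with the widely used diffusers library. The snippet below is a generic example of prompt-to-image generation on an RTX card, not the TensorRT-accelerated pipeline the performance figures refer to; the checkpoint and half-precision settings are common choices rather than anything specified by NVIDIA.

```python
# Baseline Stable Diffusion prompt-to-image run on an NVIDIA GPU using the
# Hugging Face diffusers library. This is the unoptimized reference path, not
# NVIDIA's TensorRT-accelerated pipeline; the model choice is illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # commonly used checkpoint; swap as needed
    torch_dtype=torch.float16,         # half precision suits RTX Tensor Cores
)
pipe = pipe.to("cuda")

image = pipe("a photo of a robot reading a newspaper",
             num_inference_steps=30).images[0]
image.save("robot.png")
```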

The company's push into generative AI goes beyond providing acceleration, however: NVIDIA has announced the Jetson Generative AI Lab, through which it promises to provide developers with "optimized tools and tutorials" including vision language models (VLMs) and vision transformers (ViTs) for visual artificial intelligence with scene comprehension. These models can be trained and optimized in NVIDIA's TAO Toolkit before being deployed to the Jetson platform.
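The vision transformer and vision-language models NVIDIA names are distributed through its own containers and the TAO Toolkit, but the general idea of a transformer that maps camera frames to labels or descriptions can be shown with an off-the-shelf model. The checkpoint below is a generic Hugging Face ViT classifier chosen for illustration, not one of NVIDIA's Jetson-optimized models.

```python
# Generic vision transformer (ViT) inference: classify a single image.
# The checkpoint is an off-the-shelf Hugging Face model used for illustration;
# NVIDIA's Jetson tutorials ship their own optimized VLM/ViT containers.
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Any local file or URL works; this URL stands in for a camera frame.
results = classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
for result in results[:3]:
    print(f"{result['label']}: {result['score']:.3f}")
```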

"Generative AI will significantly accelerate deployments of AI at the edge with better generalization, ease of use, and higher accuracy than previously possible," says Deepu Talla, vice president of embedded and edge computing at NVIDIA, of the company's latest news. "This largest-ever software expansion of our Metropolis and Isaac frameworks on Jetson, combined with the power of transformer models and generative AI, addresses this need."

More information on the Generative AI Lab is to be announced during a webinar on November 7th; the TensorRT-LLM library will be available to download "soon" from the NVIDIA Developer site, the company has promised.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.