NVIDIA Shrinks Mistral NeMo 12B to Create the Mistral-NeMo-Minitron 8B "Small Language Model"

A four billion parameter variant, meanwhile, has been developed for on-device use in games.

NVIDIA has released a "small language model" that takes the giant Mistral NeMo 12B model it worked on alongside Mistral AI and compresses it down to the point where it'll run on a workstation with an RTX graphics card. If eight billion parameters is still too big for you, it's also got a four billion parameter version tailored for on-device use on PCs and laptops.

"We combined two different AI [Artificial Intelligence] optimization methods — pruning to shrink Mistral NeMo’s 12 billion parameters into eight billion, and distillation to improve accuracy," explains NVIDIA's vice president of applied deep learning research Bryan Catanzaro of his team's work. "By doing so, Mistral-NeMo-Minitron 8B delivers comparable accuracy to the original model at lower computational cost."

Large language models are all the rage at the moment, powering everything from interactive fiction to AI assistants on your phone — but they all share the same problem: they're large, requiring high-power servers that preclude on-device use. Small language models, by contrast, are designed to run on-device — though in the case of the still eight-billion-parameters-strong Mistral-NeMo-Minitron 8B, those devices are high-end workstations fitted with NVIDIA's RTX graphics accelerators.

Despite its smaller size, this take on the Mistral NeMo model delivers comparable accuracy, primarily by pruning the model weights known to contribute the least to overall accuracy and then retraining on a dataset considerably smaller than the original, a process that takes just one-fortieth of the compute used for NeMo's original training. The same techniques were used to shrink the model still further into the application-specific Nemotron-4 4B Instruct, designed for use on consumer PCs and laptops to deliver what NVIDIA calls "state-of-the-art digital human technology" in games.
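For readers curious what "pruning" and "distillation" look like in practice, here is a toy sketch of the two ideas. It is not NVIDIA's implementation: the function names are hypothetical, weight magnitude is assumed as a stand-in for "contributes the least to accuracy" (the article does not say how NVIDIA scores importance), and the distillation loss shown is the textbook KL divergence between a teacher model's output distribution and a student's.

```python
import math

def prune_by_magnitude(weights, keep_fraction):
    """Zero out the smallest-magnitude weights, keeping about keep_fraction of them.

    Assumption: magnitude is used here as a simple proxy for importance.
    """
    keep = max(1, round(len(weights) * keep_fraction))
    # Indices of the `keep` largest weights by absolute value.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits):
    """KL divergence from the teacher's distribution to the student's.

    It is zero when the student matches the teacher exactly and grows
    as the student's predictions drift away from the teacher's.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy "layer" of six weights, pruned to two-thirds of its size
# (mirroring the 12 billion to 8 billion ratio in spirit only).
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = prune_by_magnitude(weights, keep_fraction=2 / 3)
```

In a real pipeline the pruned (student) model would then be retrained to minimize a loss like this against the original (teacher) model's outputs, which is how distillation can recover accuracy lost to pruning.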

Mistral-NeMo-Minitron 8B is available as an NVIDIA NIM microservice or for download on Hugging Face now; "a downloadable NVIDIA NIM, which can be deployed on any GPU-accelerated system in minutes, will be available soon," the company promises. A technical report has also been published, for those looking to dig into the details.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.