No Drama Llama Installation

Thanks to recent advances, powerful LLMs can now run on our own computers. David Eastman gives us a rundown on how to make it happen.

Large language models (LLMs) have been all the rage lately, with their capabilities expanding across a number of domains, from natural language processing to creative writing and even assisting in scientific research. The biggest players in the field, like OpenAI’s ChatGPT and Google’s Gemini, have captured most of the spotlight to date. But there is a noticeable change in the air — as open source efforts continue to advance in capabilities and efficiency, they are becoming much more widely used.

This has made it possible for people to run LLMs on their own hardware. Doing so can save on subscription fees, protect one’s privacy (no data needs to be transferred to a cloud-based service), and even allow technically inclined individuals to fine-tune models for their own use cases. As recently as a year or two ago, this might have seemed virtually impossible. LLMs are notorious for the massive amount of compute resources they need to run, and many of the most powerful models still do, but a number of advancements have made it practical to run more compact models with excellent performance on increasingly modest hardware.

Starting up a Llama 2 model with Ollama (📷: D. Eastman)

A software developer named David Eastman has been on a kick of eliminating cloud services lately. For the aforementioned reasons, LLM chatbots have been one of the most challenging services to reproduce locally. But sensing the shift now under way, Eastman decided to try installing a local LLM chatbot. Lucky for us, that project resulted in a guide that can help others do the same, and quickly.

The guide focuses on Ollama, a tool that makes it simple to install and run an open source LLM locally. Doing this by hand would typically mean installing a machine learning framework and all of its dependencies, downloading the model files, and configuring everything, which can be a frustrating process, especially for someone who is not experienced with these tools. With Ollama, one need only download the tool and pick a model from a library of available options; in this case, Eastman gave Llama 2 a whirl.
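
For a sense of how little is involved, once Ollama itself is installed the whole process boils down to a couple of terminal commands (a minimal sketch, assuming the model is listed under the name llama2 in Ollama’s library):

    ollama run llama2     # downloads the model on first use, then opens a chat prompt
    ollama list           # shows which models are installed locally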

After the user issues a “run” command, the selected model is automatically downloaded and a text-based interface for interacting with the LLM is presented. Ollama also starts up a local API service, so it is easy to work with the model from custom software written in a language like Python or C++. Eastman tested this capability by writing some simple programs in C#.
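
Eastman’s C# code is not reproduced here, but as a rough sketch of what talking to that local API looks like, a few lines of Python against Ollama’s default REST endpoint (port 11434 and the /api/generate route, per Ollama’s documentation) are enough to get an answer back:

    import json
    import urllib.request

    # Send a single prompt to the locally running Llama 2 model via Ollama's REST API
    payload = json.dumps({
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": False,  # ask for one complete response rather than a token stream
    }).encode("utf-8")

    request = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read())

    print(result["response"])  # the model's answer as plain text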

Getting hungry? (📷: D. Eastman)

After asking a few basic questions of the model, like “Why is the sky blue?”, Eastman wrote some more complex prompts to see what Llama 2 was really made of. In one prompt, the model was asked to come up with some recipes based on what was available in the fridge. The response may not have been very fast, but when the results were produced, they looked pretty good. Not bad for a model running on an older pre-M1 MacBook with just 8 GB of memory!

Be sure to check out Eastman’s guide if you have an interest in running your own LLM, but do not want to devote the next few weeks of your life to understanding the associated technologies. You might also be interested in checking out this LLM-based voice assistant that runs 100% locally on a Raspberry Pi.

nickbild

