Edgen Aims to Deliver an Open Source Drop-In Replacement for OpenAI's API for Gen AI at the Edge
Aiming for compatibility with OpenAI's API, the Edgen server delivers private, local, CPU-compatible generative AI for all.
Rafael Chamusca and colleagues are trying to make it easier to get involved with generative artificial intelligence (gen AI) without needing to use commercial services — launching Edgen, a local, private server offering a drop-in replacement for the OpenAI application programming interface (API).
"We have launched an open source project made for anyone to run the best of generative AI locally on their devices," Chamusca tells us of his team's effort. "It is compatible with any operating system, is simply one download, and it greatly optimizes gen AI models for edge deployment."
The idea is simple: a number of makers are building projects around generative AI, with many using OpenAI's servers to do so. Reliance on a commercial service, though, comes at a cost: all except basic, throttled use is charged, the service could technically be removed at any time, and data used in the generation can't be kept private.
That's where Edgen comes in. "Leveraging Edgen is similar to using OpenAI's API, but with the added benefits of on-device processing," Chamusca and Francisco Melo, who co-developed the project, explain. "One of Edgen's key strengths is its versatility and ease of integration across various programming environments. Whether you're a Python aficionado, a C++ veteran, or a JavaScript enthusiast, Edgen caters to your preferred platform with minimal setup requirements."
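In practice, that drop-in compatibility means existing OpenAI client code should only need its base URL repointed at the local server. The sketch below uses the official `openai` Python client; the port, API key placeholder, and model name are assumptions made for illustration, so check the Edgen documentation for the actual values.

```python
# Minimal sketch: talk to a local Edgen server through the standard
# `openai` Python client. The endpoint and model name are assumed,
# not taken from the Edgen docs.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:33322/v1",  # assumed local Edgen endpoint
    api_key="unused-locally",              # no commercial API key needed
)

response = client.chat.completions.create(
    model="default",  # hypothetical identifier; Edgen serves a local model
    messages=[{"role": "user", "content": "Summarize what edge AI means."}],
)
print(response.choices[0].message.content)
```

Because the request and response shapes match OpenAI's, moving between the hosted service and the local server becomes a one-line configuration change rather than a rewrite.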
Edgen aims to run entirely locally, with its initial app offering — a text-based chatbot — working even if the system on which it's installed is disconnected from the internet. The server side is cross-platform and designed for easy extensibility, both in the apps that can connect to it and in the models it can serve. It's also possible, the team explains, to tie the server into locally stored databases to improve the quality of generated responses — without exposing them to a third party.
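That local-database idea is essentially retrieval-augmented generation kept on-device. Here is a hypothetical sketch of the pattern, reusing the `client` from the previous snippet; the SQLite schema and the naive keyword lookup are invented for illustration and are not Edgen's own retrieval machinery.

```python
# Hypothetical sketch: enrich a prompt with rows from a local SQLite
# database so private data never leaves the machine. The schema and
# retrieval logic are illustrative only.
import sqlite3

def answer_with_local_context(client, question: str) -> str:
    db = sqlite3.connect("notes.db")  # assumed local database of notes
    rows = db.execute(
        "SELECT body FROM notes WHERE body LIKE ? LIMIT 3",
        (f"%{question.split()[0]}%",),  # naive keyword match, for illustration
    ).fetchall()
    context = "\n\n".join(body for (body,) in rows)

    response = client.chat.completions.create(
        model="default",  # hypothetical identifier, as above
        messages=[
            {"role": "system",
             "content": f"Answer using this local context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```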
"The main obstacle to overcome when running GenAI models on-device is memory. Models like LLMs [Large Language Models] are big and require a lot of memory to run," Chamusca and Melo explains. "Consumer-grade hardware is not ready for this, but there are many model compression techniques to reduce the memory footprint of GenAI models, such as: quantization, pruning, sparsification, knowledge distillation, to name a few.
"Edgen leverages the latest techniques and runtimes to optimize the inference of GenAI models. This means that inference is fast and efficient even on low-end devices, and developers building their apps with Edgen don't need to be experts in ML optimization to get the best performance out of their models." It also means, the team say, that it can run on a CPU without needing a high-priced GPU —though GPU acceleration is on the roadmap, for those who want the extra performance.
More information on Edgen is available on the project website, while its source code is available on GitHub under the permissive Apache 2.0 license.