Hailo's Latest Accelerator, the Hailo-10H, Promises On-Device Gen AI in a Sub-5W Power Envelope
Power-sipping accelerator delivers 10 tokens per second in Llama2-7B and five-second image generation in Stable Diffusion 2.1.
Edge artificial intelligence (edge AI) specialist Hailo has announced a new accelerator family, the Hailo-10, as it celebrates raising an additional $120 million in funding, promising the ability to run generative AI (gen AI) models in a sub-five-watt power envelope.
"As gen AI on the edge becomes immersive, the focus turns to handling large LLMs [Large Language Models] in the smallest possible power envelope — essentially less than five watts," claims Hailo's chief executive officer and co-founder Orr Danon of the company's latest launch.
"We designed Hailo-10 to seamlessly integrate gen AI capabilities into users’ daily lives," Danon continues, "freeing users from cloud network constraints. This empowers them to utilize chatbots, copilots, and other emerging content generation tools with unparalleled flexibility and immediacy, enhancing productivity and enriching lives."
According to Hailo's internal testing, the Hailo-10 accelerators can deliver up to 40 tera-operations per second (TOPS) of compute performance for on-device machine learning (ML) and artificial intelligence workloads: twice the performance of Intel's rival Core Ultra neural processing unit (NPU), the company claims, at half the power. Translated to real-world performance, that means running the Llama2-7B LLM at up to 10 tokens per second or the Stable Diffusion 2.1 image generation model at five seconds per image, both while drawing under 5W of power.
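For a rough sense of what those figures imply, the claimed throughput and power ceiling can be turned into energy per token and per image. The sketch below is back-of-envelope arithmetic only: it treats "under 5W" as a 5W upper bound and assumes the peak rates are sustained at that draw, neither of which Hailo has stated; the function and constant names are illustrative.

```python
# Back-of-envelope only: inputs are Hailo's claimed figures, not measurements.
POWER_CEILING_W = 5.0      # "under 5W" claim, treated here as an upper bound
TOKENS_PER_SECOND = 10.0   # claimed peak Llama2-7B throughput
SECONDS_PER_IMAGE = 5.0    # claimed Stable Diffusion 2.1 time per image


def energy_per_token_j(power_w: float, tokens_per_s: float) -> float:
    """Joules per token = watts / (tokens per second)."""
    return power_w / tokens_per_s


def energy_per_image_j(power_w: float, seconds_per_image: float) -> float:
    """Joules per image = watts * seconds per image."""
    return power_w * seconds_per_image


if __name__ == "__main__":
    per_token = energy_per_token_j(POWER_CEILING_W, TOKENS_PER_SECOND)
    per_image = energy_per_image_j(POWER_CEILING_W, SECONDS_PER_IMAGE)
    print(f"At most about {per_token:.2f} J per token at the peak rate")
    print(f"At most about {per_image:.1f} J per image")
```

Under those assumptions the ceiling works out to roughly half a joule per generated token and 25 joules per image; real-world figures would depend on sustained throughput and actual power draw, which have not been published.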
Initial target markets for the new accelerators will be PCs and automotive infotainment systems, Hailo says, though it's not ruling out other use-cases. "Whether users employ gen AI to automate real-time translation or summarization services, generate software code, or images and videos from text prompts," Danon claims, "Hailo-10 lets them do it directly on their PCs or other edge systems, without straining the CPU or draining the battery."
The company has confirmed an initial Hailo-10 part, the Hailo-10H M.2 Generative AI Acceleration Module, with 8GB of LPDDR4 on-module memory and support for x86 or Arm AArch64 hosts running Microsoft Windows. The module is to begin sampling in the second quarter; general availability and pricing, however, have not been disclosed at the time of writing.
More information is available on the Hailo website.