EdgeCortix Unveils the 60 TOPS SAKURA-II Accelerator, Optimized for On-Device Gen AI
Next-generation accelerator delivers a claimed 60 TOPS of INT8 or 30 TFLOPS of BF16 compute in a 10W power envelope.
Edge machine learning specialist EdgeCortix has announced the release of its next-generation SAKURA-II accelerator — aiming to deliver up to 60 tera-operations per second (TOPS) of energy-efficient compute for on-device large language models (LLMs) and other generative artificial intelligence (gen AI) workloads.
"SAKURA-II's impressive 60 TOPS performance within 8W of typical power consumption, combined with its mixed-precision and built-in memory compression capabilities, positions it as a pivotal technology for the latest generative AI solutions at the edge," claims EdgeCortix founder and chief executive officer Sakyasingha Dasgupta
"Whether running traditional AI models or the latest Llama 2/3, Stable-diffusion, Whisper, or Vision-transformer models," Dasgupta continues, "SAKURA-II provides deployment flexibility at superior performance per watt and cost-efficiency. We are committed to ensuring we meet our customer’s varied needs and also to securing a technological foundation that remains robust and adaptable within the swiftly evolving AI sector."
The SAKURA-II accelerator, which the company says has been "tailored specifically for processing generative AI workloads at the edge," is capable of running multi-billion parameter AI models on-device, including Llama 2, Stable Diffusion, DETR, and ViT, with a claimed "typical" power draw of 10W. The chip includes 20MB of on-device static RAM (SRAM) and delivers its claimed 60 TOPS at INT8 precision, or 30 tera-floating point operations per second (TFLOPS) at BF16.
For those working with space-constrained devices, the SAKURA-II is being made available on an M.2 2280-footprint PCI Express module; for workstations and servers, a full-size PCI Express add-in board (AIB) variant hosts one or two SAKURA-II chips to deliver up to 120 TOPS per card. The M.2 variant is available with 8GB or 16GB of LPDDR4 memory, while the PCIe AIB is available with 16GB in single-chip or 32GB in dual-chip variants — with the latter, naturally enough, doubling the typical power draw to 20W.
The accelerator is backed by EdgeCortix's MERA software stack, which it says delivers support for a range of models including traditional convolutional neural networks (CNNs) like ResNet 50/101 and YoloX and transformer-based models including DINO, GPT-2, Open-Llama2, and Llama 3 — the latter running on-device at an eight-billion parameter size.
The SAKURA-II cards are now available to pre-order ahead of a planned release in the second half of the year, priced at $249 for the 8GB M.2 module, $299 for the 16GB M.2 module, $429 for the single-chip 16GB PCIe AIB, and $749 for the dual-chip 32GB PCIe AIB. While EdgeCortix has confirmed plans to also sell the SAKURA-II as a standalone chip for those looking to integrate it into their own device designs, it had not released pricing at the time of writing.