DeepSeek Releases an OpenAI-Comparable Large Language Model Trained at a Fraction of the Cost
DeepSeek-R1 performs on par with OpenAI's o1, yet was trained for under $10 million and is available to download free of charge.
A Chinese artificial intelligence (AI) company has shaken things up with the release, under a free-to-use license, of a large language model (LLM) it claims can go toe-to-toe with the best from companies like OpenAI and Meta, despite having been trained, it says, for a fraction of the cost: DeepSeek-R1.
"DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors," the company claims of its creation. "However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks."
Large language models, which break user-provided inputs down into tokens and then produce the most statistically likely response tokens, are enjoying their time in the limelight of late. The technology is being used to drive the "AI" features being added to seemingly every commercial app and service around, despite the models' unreliability in producing factual responses and the vast computational and environmental resources required to train and run them.
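To make that token-by-token process concrete, here is a minimal sketch in Python using Hugging Face's transformers library with the small GPT-2 model purely as a stand-in; it is illustrative only, and has nothing to do with DeepSeek's own implementation:

```python
# Minimal sketch of next-token prediction, using GPT-2 as a stand-in model;
# illustrative only, not DeepSeek's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Break the user-provided input down into tokens...
inputs = tokenizer("The capital of France is", return_tensors="pt")

# ...then ask the model for a probability distribution over the next token.
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

# Greedily take the single most statistically likely next token.
next_id = int(torch.argmax(probs))
print(repr(tokenizer.decode([next_id])), float(probs[next_id]))
```

Repeat that step, appending each chosen token to the input, and you have text generation; production systems merely do it faster and sample less greedily.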
It's on this latter front that DeepSeek says it can help: its models, of which the DeepSeek-R1 family is only the latest, are said to have been trained for a fraction of the cost of those of its rivals including Google, OpenAI, and Meta. Despite this, the resulting models benchmark competitively, even surpassing rivals in certain tests. To prove it, DeepSeek has released pre-trained models for both DeepSeek-R1-Zero and DeepSeek-R1, each with 37 billion activated parameters out of a 671 billion parameter total, along with fine-tuned distilled models, based on the Qwen and Meta Llama models, with as few as 1.5 billion parameters and suitable for on-device use on consumer-grade laptops and desktops.
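For anyone wanting to try the smallest of those distilled models locally, a sketch along the following lines should be close, assuming the 1.5 billion parameter Qwen-based variant is published on Hugging Face under an identifier like deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B; the exact repository name and loading parameters here are assumptions, not confirmed details:

```python
# Hypothetical sketch: running the smallest distilled model locally with
# Hugging Face transformers; the repository ID below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Prove that the sum of two even numbers is even."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```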
DeepSeek has claimed it can train its cutting-edge models for less than $10 million, on hardware reportedly including a collection of 50,000 NVIDIA A100 accelerators acquired by company founder Liang Wenfeng prior to an export ban and coupled with readily-available lower-performance chips: still far from pocket change, but a fraction of the billions of dollars being spent by the company's US rivals. As well as being cheaper to train and less environmentally damaging, the models are cheaper to use: DeepSeek has priced its hosted version at $0.14 to $0.55 per million input tokens and $2.19 per million output tokens, considerably less than OpenAI's equivalent models.
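At those rates, estimating a bill is simple arithmetic; the sketch below prices a hypothetical workload at both ends of the quoted input range, with the token counts invented purely for illustration:

```python
# Back-of-the-envelope API cost estimate using the per-million-token rates
# quoted above; the workload figures are hypothetical.
INPUT_RATE_LOW, INPUT_RATE_HIGH = 0.14, 0.55  # dollars per 1M input tokens
OUTPUT_RATE = 2.19                            # dollars per 1M output tokens

def cost(input_tokens: int, output_tokens: int, input_rate: float) -> float:
    """Total cost in dollars for the given token counts."""
    return (input_tokens * input_rate + output_tokens * OUTPUT_RATE) / 1_000_000

# Example: 10 million input tokens and 2 million output tokens.
for rate in (INPUT_RATE_LOW, INPUT_RATE_HIGH):
    print(f"at ${rate}/M input: ${cost(10_000_000, 2_000_000, rate):.2f}")
# Prints $5.78 at the low rate and $9.88 at the high rate.
```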
There are, however, some caveats to the company's offerings. Like its rivals, while DeepSeek claims to release the models under an open source license, it does not provide everything one would need to reproduce them from scratch: only enough to make use of a model or fine-tune it further, rather than inspect its inner workings. Concerns surround its training data, too, beyond the usual issues of the use of copyrighted materials: early adopters of the model have found it unsurprisingly toeing the Chinese Communist Party (CCP) line, refusing to respond to queries on subjects censored by the government.
Despite this, the release of DeepSeek-R1 and its related models has had a dramatic impact on the market: NVIDIA, whose stock price has been buoyed by the expectation that the AI boom will require billions of dollars to be spent on its hardware, has seen its share price plummet nearly 14 percent since markets opened; Meta's share price saw a similar drop on open but has since more than recovered.
A white paper detailing DeepSeek-R1 is available on GitHub under the permissive MIT license; the company's models are available on Hugging Face under the same MIT license, while the Qwen- and Llama-derived models are licensed under the Apache 2.0 and Meta's custom Llama 3.1 licenses respectively. DeepSeek's models are also available for use on the company's cloud platform and in its mobile apps.