DeepSeek: The New Frontier in Chinese AI Shocks Industry Giants
Exploring 2 Breakthrough Technologies and 6 Reasons Why AI Titans Are Surprised
DeepSeek, an AI company founded in 2023 by the Chinese quantitative hedge fund High-Flyer, has quickly become a focal point in the AI field despite its short history. Its latest breakthrough, the DeepSeek-V3 model, boasts an impressive 671 billion total parameters while setting a new benchmark for balancing performance and cost efficiency. But which innovations make DeepSeek truly stand out?
Brand-new experience, redefining possibilities
-DeepSeek
https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
2 Key Technologies Behind Low Costs
Within two years of its founding, DeepSeek developed a high-performance AI model at a reported training cost of only $5.57 million, in stark contrast to the estimated $63 million training cost of OpenAI's GPT-4 and far below the projected $500 million budget for GPT-5. These achievements rest on the following innovations:
Selective Activation of "Brain Cells"
DeepSeek-V3 adopts a "Mixture of Experts" (MoE) architecture. Simply put, it activates only the "brain cells" needed for each input, rather than all of them. This dramatically reduces computational resource consumption and allowed the model to be trained on just 2,048 NVIDIA H800 GPUs.
MLA (Multi-head Latent Attention): a faster attention mechanism that enables efficient inference by compressing the key-value (KV) cache.
DeepSeekMoE: a novel sparse Mixture-of-Experts architecture that makes it economical to train strong models through sparse computation (a toy sketch of this routing follows).
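To make "selective activation" concrete, here is a minimal top-k routing sketch in PyTorch. It is an illustration only, not DeepSeek's code: DeepSeek-V3's actual DeepSeekMoE adds fine-grained experts, shared experts, and an auxiliary-loss-free load-balancing scheme, and MLA is not modeled here. The class name ToyMoE and all hyperparameters are invented for the example.

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Toy top-k Mixture-of-Experts layer (illustrative, not DeepSeek's)."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        # Pick the top_k experts per token; only those experts run.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([16, 64])
```

Because each token passes through only top_k of the n_experts feed-forward blocks, per-token compute scales with top_k rather than the total expert count; that gap is where the savings behind the "brain cells" analogy comes from.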
Data Processing and Energy Efficiency Innovations
DeepSeek has developed internal tools to generate high-quality training data and employs distillation techniques to further reduce computational requirements. During training, it uses FP8, a low-precision number format that significantly reduces memory demands while improving efficiency: FP8 halves memory requirements relative to traditional FP16 without compromising computational performance.
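As a sanity check on that claim, here is a back-of-the-envelope calculation (my own illustration, not a figure from DeepSeek's report) of raw weight storage for a 671-billion-parameter model in FP16 versus FP8:

```python
# Weight storage only; real training also holds activations, gradients,
# and optimizer state, so actual totals are considerably larger.
PARAMS = 671e9  # DeepSeek-V3's total parameter count

fp16_gb = PARAMS * 2 / 1e9  # FP16: 2 bytes per parameter
fp8_gb = PARAMS * 1 / 1e9   # FP8:  1 byte per parameter

print(f"FP16 weights: ~{fp16_gb:,.0f} GB")  # ~1,342 GB
print(f"FP8 weights:  ~{fp8_gb:,.0f} GB")   # ~671 GB, i.e. half
```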
6 Reasons Why Tech Giants Are Surprised
After exploring the technology behind DeepSeek, let’s dive into why it has caused such a significant stir in the industry:
1. Low Cost and High Efficiency
DeepSeek-V3's training run took only about two months and roughly $5.57 million, a fraction of the billions spent by giants like OpenAI and Google to develop comparable models. This rapid, efficient development shows how quickly the barriers to building large language models (LLMs) are falling.
2. Competitive Performance
According to third-party benchmarks, DeepSeek's performance matches, and in certain domains surpasses, state-of-the-art models from OpenAI and Meta. This suggests that building high-performance models no longer requires massive financial investment.
3. Breaking Hardware Limitations
DeepSeek trained its models on NVIDIA H800 chips, an export-compliant variant of the H100 with reduced performance but better availability in China. This approach not only lowered hardware costs but also sidestepped the supply constraints surrounding H100 chips.
The Battlefield of Large Models
The A800, after being "trimmed," is less efficient for training large models. The A800 SXM primarily suffers from reduced data-transfer bandwidth between GPU cards, which is cut by 33%. Training a model like the 175-billion-parameter GPT-3 requires many GPUs working together; if bandwidth is insufficient, performance can drop by around 40% as GPUs sit idle waiting for data. Given the cost-effectiveness of the A800 and H800, Chinese users still lean toward the A800. The reduced training efficiency of both chips stems from the need to exchange training data between cards, so the lower transfer speed directly impacts throughput. A back-of-the-envelope sketch of this communication cost follows.
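To see why a one-third bandwidth cut matters, here is a simplified ring all-reduce estimate. This is a rough sketch under assumed numbers (FP16 gradients, the published ~600 GB/s versus ~400 GB/s NVLink figures), not a measurement; real frameworks overlap communication with computation, so the true penalty varies.

```python
# Rough estimate of per-step gradient all-reduce time for a GPT-3-scale
# model at full vs. export-limited NVLink bandwidth.
GPT3_PARAMS = 175e9   # 175B parameters
BYTES_PER_GRAD = 2    # FP16 gradients
payload = GPT3_PARAMS * BYTES_PER_GRAD

def allreduce_seconds(payload_bytes: float, bandwidth_gb_s: float) -> float:
    # Ring all-reduce moves roughly 2x the payload over the interconnect.
    return 2 * payload_bytes / (bandwidth_gb_s * 1e9)

full = allreduce_seconds(payload, 600)  # A100 NVLink: ~600 GB/s
cut = allreduce_seconds(payload, 400)   # A800 NVLink: ~400 GB/s (-33%)

print(f"full: {full:.2f} s/step, trimmed: {cut:.2f} s/step "
      f"(+{(cut / full - 1) * 100:.0f}% communication time)")
```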
In the HPC Field
In terms of double-precision computing, the A800 and A100 have the same computational power, so high-performance scientific computing is unaffected. The H800 fares much worse: its double-precision throughput has been cut to just 1 TFLOPS, rendering it nearly unusable and significantly impacting the supercomputing field.
4. Challenging Market Dominance
The emergence of DeepSeek signals that the dominance of AI leaders like OpenAI, Google, and Meta could be disrupted by new competitors. This serves as an important wake-up call for the existing industry giants.
5. A New Perspective for Investors
DeepSeek’s success has prompted investors to reconsider whether they need to continue funding costly cutting-edge model training, or if similar results can be achieved with significantly lower budgets. This could shift the flow of capital and have profound implications for the market order.
The chart shows the share of NVIDIA's revenue that comes from major tech companies' capital expenditure (CAPEX):
Microsoft accounts for a significant portion, representing one of NVIDIA's largest customers.
Meta also contributes substantially, followed by other companies.
Alphabet (Google) and Amazon have smaller, yet notable shares compared to Microsoft and Meta.
Tech giants rely heavily on NVIDIA's GPUs and related products for AI workloads, data center operations, and other advanced computing needs. The phrase "The more you buy, the more you save" suggests that these companies are leveraging bulk purchasing to optimize their costs while building out their AI and computing infrastructures.
A Unique Path for AI Development in China
The Chinese market boasts the world's largest data resources but faces challenges in hardware computational power due to factors such as technological embargoes and hardware supply shortages. This has led Chinese AI companies to place greater emphasis on efficiency optimization. DeepSeek's success exemplifies a new balance point between resource usage and performance.
Meanwhile, tech giants like Google, Microsoft, and Meta are betting on nuclear power to support their energy-intensive AI training needs. In contrast, emerging companies like DeepSeek have chosen a different path, focusing on technological innovation to minimize resource wastage and providing the industry with fresh perspectives.
The story of DeepSeek demonstrates that the future of AI competition is not just about technology itself but about achieving the best outcomes with limited resources. This approach could very well be the key to changing the rules of the game in the market.
9 of China's top AI models and startups
Baidu — Ernie 4.0
Alibaba Cloud — Qwen 2
ByteDance — Doubao
Tencent — Hunyuan
Moonshot AI — Ohai, Noisee, and Kimi
MiniMax — Talkie
Kuaishou — Kling
iFlytek — Spark V4.0
Zhipu AI — GLM-4