DeepSeek-V3.1 Arrives: With FP8 Support, Domestic GPU Efficiency Rises 3x, Accelerating Compute Self-Reliance

Abstract
On August 22, 2025, the Chinese large-model company DeepSeek officially released its latest model, DeepSeek-V3.1. The model improves substantially on multiple search evaluation benchmarks: on BrowseComp, a complex search test that requires multi-step reasoning, and on HLE, a multi-disciplinary test of expert-level problems, it leads R1-0528 by a wide margin. The V3.1 Base model was trained on an additional 840B tokens, and both the Base model and the post-trained model have been open-sourced.

Notably, DeepSeek-V3.1 uses UE8M0 FP8 scale parameter precision, a low-precision computing format designed specifically for Chinese-made GPU chips. FP8 (8-bit floating point) is a numeric format used in deep learning training and inference; compared with the traditional FP16 (16-bit) or FP32 (32-bit) formats, it significantly reduces memory usage and compute requirements. Some tests show the operating efficiency of domestic chips rising by more than 300% and expert-module utilization rising from 30% to 85%, which significantly improves the usability of these chips.
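
To make the "UE8M0" name concrete: it denotes an unsigned 8-bit value with 8 exponent bits and 0 mantissa bits, so every scale factor is a pure power of two. The sketch below follows the E8M0 convention of the OCP Microscaling (MX) specification (bias 127, code 255 reserved for NaN) and assumes the scaled elements are stored as FP8 E4M3, whose largest finite value is 448. The function names are hypothetical, and the article does not describe how DeepSeek-V3.1 itself computes or applies its scales, so treat this as an illustration of the format rather than the model's actual implementation.

```python
import math

E8M0_BIAS = 127    # exponent bias of the E8M0 scale format (OCP MX convention)
E4M3_MAX = 448.0   # largest finite magnitude in FP8 E4M3 (assumed element type)


def encode_ue8m0(scale: float) -> int:
    """Round a positive scale up to a power of two and return its 8-bit code."""
    if not math.isfinite(scale) or scale <= 0:
        raise ValueError("scale must be positive and finite")
    # frexp gives scale = m * 2**e with 0.5 <= m < 1, so ceil(log2(scale))
    # is e - 1 when scale is an exact power of two, and e otherwise.
    m, e = math.frexp(scale)
    exponent = e - 1 if m == 0.5 else e
    # Code 255 is reserved for NaN, so clamp to the range 0..254.
    return max(0, min(254, exponent + E8M0_BIAS))


def decode_ue8m0(code: int) -> float:
    """Turn an 8-bit UE8M0 code back into its power-of-two scale."""
    return 2.0 ** (code - E8M0_BIAS)


def block_scale_code(block) -> int:
    """Pick a power-of-two scale so the block's largest value fits in E4M3."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return E8M0_BIAS  # scale of 1.0 for an all-zero block
    return encode_ue8m0(amax / E4M3_MAX)


if __name__ == "__main__":
    block = [0.5, -3.0, 100.0]
    code = block_scale_code(block)
    scale = decode_ue8m0(code)
    print(code, scale, [x / scale for x in block])
```

Because the scale is a power of two, applying it only shifts the exponent of each value, which is one reason such a format can be cheap to support in hardware. Rounding the scale up rather than to nearest keeps the largest scaled value within the E4M3 range, at the cost of slightly less resolution for the rest of the block.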

DeepSeek-V3.1 was designed with the architectural characteristics of domestic chips in mind, including those from Huawei Ascend, Cambricon, and MetaX (Muxi), so that the model runs efficiently on domestic computing platforms. Its open-source release and promotion have in turn pushed domestic GPU vendors to accelerate hardware compatibility and optimization work to support FP8 computing.

Comment
Through the UE8M0 FP8 precision format and deep optimization for domestic chips, DeepSeek-V3.1 significantly improves the computing efficiency and usability of domestic GPUs, lowers deployment costs, and advances the independence of the domestic AI ecosystem. Its development has also spurred domestic GPU vendors to iterate faster and has encouraged collaboration across the entire industry chain, making it an important driver of China's AI computing autonomy.
