
DeepSeek-V3.1 Arrives: With FP8 Support, Domestic GPU Efficiency Rises 3x, Accelerating Compute Autonomy

Original
08/22 15:34 Source: 与非观察

Abstract

On August 22, 2025, the Chinese large-model company DeepSeek (深度求索) officially released its latest model, DeepSeek-V3.1. DeepSeek-V3.1 improves substantially on several search benchmarks. On BrowseComp, a complex search test that requires multi-step reasoning, and on HLE (Humanity's Last Exam), a multidisciplinary expert-level test, it leads R1-0528 by a wide margin. The V3.1 Base model was trained on an additional 840B tokens, and both the Base model and the post-trained model have been open-sourced.

Notably, DeepSeek-V3.1 uses UE8M0 FP8 Scale parameter precision, a low-precision computing format designed specifically for domestic GPU chips. FP8 (8-bit floating point) is a numeric format used in deep learning training and inference. Compared with the traditional FP16 (16-bit floating point) or FP32 (32-bit floating point), it significantly reduces memory usage and compute requirements. Some tests show that running efficiency on domestic chips improves by more than 300% and that expert-module utilization rises from 30% to 85%, markedly improving the usability of domestic chips.

DeepSeek-V3.1's design takes into account the architectural characteristics of domestic chips such as Huawei Ascend, Cambricon, and MetaX (沐曦), so that the model can run efficiently on domestic compute platforms. Its open-sourcing and promotion are pushing domestic GPU vendors to accelerate hardware compatibility and optimization in support of FP8 computation.

Comment

Through the UE8M0 FP8 precision format and deep optimization for domestic chips, DeepSeek-V3.1 significantly improves the computing efficiency and usability of domestic GPUs, lowers deployment costs, and advances the independence of the domestic AI ecosystem. Its development also spurs domestic GPU vendors to iterate faster and promotes coordination across the whole industry chain, making it an important driver of China's AI compute autonomy.
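The UE8M0 FP8 scale format described in the abstract can be illustrated with a small sketch. It assumes the E8M0 scale type as defined in the OCP Microscaling (MX) specification (8 exponent bits, no sign or mantissa bits, bias 127, code 255 reserved for NaN) and the FP8 E4M3 element format, whose largest finite value is 448. The article does not say how DeepSeek-V3.1 actually computes or stores its scales, so the function names, the round-up policy, and the block size of 128 below are all hypothetical. The sketch only chooses the per-block scale and estimates storage; it does not simulate FP8 rounding of the values themselves.

```python
import math

E8M0_BIAS = 127    # exponent bias of the E8M0 scale type (OCP MX specification)
E4M3_MAX = 448.0   # largest finite magnitude representable in FP8 E4M3


def encode_ue8m0(scale: float) -> int:
    """Round a positive scale up to a power of two and return its 8-bit exponent code."""
    if not (scale > 0.0 and math.isfinite(scale)):
        raise ValueError("scale must be a positive finite number")
    # frexp gives scale = mantissa * 2**exp with 0.5 <= mantissa < 1,
    # so ceil(log2(scale)) is exp - 1 for exact powers of two and exp otherwise.
    mantissa, exp = math.frexp(scale)
    exponent = exp - 1 if mantissa == 0.5 else exp
    # Code 255 means NaN in E8M0, so usable codes are 0..254.
    return max(0, min(254, exponent + E8M0_BIAS))


def decode_ue8m0(code: int) -> float:
    """Decode an 8-bit exponent code into the power-of-two scale it stands for."""
    return 2.0 ** (code - E8M0_BIAS)


def block_scale_code(block: list[float]) -> int:
    """Choose a power-of-two scale so every value in the block fits the E4M3 range."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return E8M0_BIAS  # scale 1.0 for an all-zero block
    return encode_ue8m0(amax / E4M3_MAX)


def storage_bytes(num_values: int, bits_per_value: int, block_size: int = 0) -> int:
    """Bytes for the values, plus one 1-byte scale per block when block_size > 0."""
    data = num_values * bits_per_value // 8
    scales = math.ceil(num_values / block_size) if block_size else 0
    return data + scales


if __name__ == "__main__":
    block = [1.0, 100.0]
    code = block_scale_code(block)
    scale = decode_ue8m0(code)
    print(code, scale, [x / scale for x in block])
    for bits, bs in ((32, 0), (16, 0), (8, 128)):  # block size 128 is illustrative
        print(bits, storage_bytes(1024, bits, bs))
```

Because the scale carries only an exponent, applying it is a pure power-of-two shift, which is the property that makes such a format cheap to support in hardware. Under the assumptions above, 1024 values take 4096 bytes in FP32, 2048 bytes in FP16, and 1032 bytes in FP8 with one scale byte per 128 values.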

Source: 与非网; Author: 王兵; Original link: https://www.eefocus.com/article/1880627.html

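Cambricon (寒武纪)

Cambricon offers a family of AI chip products and platform-level base system software that span cloud, edge, and end devices, pair hardware with software, combine training and inference, and share a unified ecosystem. Its products are widely used by server vendors and industrial companies, supplying ample compute for complex AI workloads in the internet, finance, transportation, energy, power, and manufacturing sectors and supporting AI-driven industrial upgrading. Cambricon is an AI chip and processor designer whose products cover cloud and edge computing, with an R&D investment ratio of more than 200% in 2023.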



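Senior industry analyst at 与非网 with an engineering background and 11 years of industry research experience. Specializes in analysis based on industry supply and demand, volume and pricing, and company financial fundamentals, with insight into where the electronics industry is heading. Open to discussion.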