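Zheshang Securities, Computer Industry: Huawei Computing Power Framework Report: Ascend and Kunpeng Build China's Second Pole of Computing Power (20231225, 37 pages, PDF)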

Resource description
2023 12 1 S1230523020002 S1230522060001
2 AI 2024 AI 400 8 AI 2024 AI 211.50 EFlops 98.24 EFlops 46.45% AI 30.7 307 8 AI 3.84 409.33 AI 2006 CUDA 400 4000 3000 CUDA AI IDC 2022 AI 109 AI 85% 10% 2% 1% AI G Atlas G 1 2 3 4
2024 AI 400 01 Part one 34 8 1 2015 4 Intel AMD CPU (CPU) 2022 TOP100 95% 2 2019 PC+5 AI 1 AI 2 AI 3 AI AI+10 5 3 2024 AI 400
6 AI AI H100 AI AI 2024 2024 AI 30.7 307 AI 3.84 409.33 910 FP16 320T INT8 640T
2024 EFlops % EFlops Tflops AI AI
FP16 23.00 100% 23.00 320 7.19 8 0.90
FP16 43.63 80% 34.90 320 10.91 8 1.36
FP16 10.69 60% 6.41 320 2.00 8 0.25
- FP16 79.37 25% 19.84 320 6.20 8 0.78
- FP16 49.60 20% 9.92 320 3.10 8 0.39
- INT8 10.42 80% 8.33 640 1.30 8 0.16
211.50 98.24 30.70 3.84
AI 02 Part one 78 CUDA CUDA Compute Unified Device Architecture CUDA CUDA Core FP32/FP64 Tensor Core FP16 INT8 API CUDA-X cuDNN cuML TensorRT cuDF cuGraph 13 13 CUDA CUDA 2006 CUDA 400 4000 3000 AI CSDN COMPUTEX 2023 21Tech Wikipedia khronos run.ai AMD
CUDA OpenCL ROCm NVIDIA Apple AMD (NVIDIA) NVIDIA AMD Intel GPU CPU FPGA AMD GPU C, C++, Fortran, Python, MATLAB C HIP CUDA OpenCL NVIDIA GPU CUDA OpenCL 30% CUDA-400 3,000-GPU OpenCL HPC AI AMD Infinity Hub TensorFlow 1.x PyTorch 1.8 MXNet CUDA OpenCL ROCm CUDA AI
9 AI HUAWEI Ascend AI CANN Compute Architecture for Neural Networks AI AI AI Atlas CANN AI MindSpore MindX ModelArts 910 AI Atlas900 AI Atlas800 Atlas500 AI Atlas300 AI Atlas200 Atlas AI
10 GPU CPU ALU GPU SM GPU AI MAC GPU CPU AI Intel 2023 Sapphire Rapids 60 Nvidia H100 GPU 132 SM SM 64 Core 8448 Core CPU GPU GPU IT
11 microarchitecture 2006 GPU Tesla 1-2 JPR 2023 Q2 87% JPR
Tesla (2006): 128
Fermi (2009): 40nm, 16 SM*32 CUDA Core, 512 CUDA Core
Kepler (2012): 28nm, 15 SMX*192+64 CUDA core
Maxwell (2014): 28nm, 3072 CUDA
Pascal (2016): 16nm, 3840 CUDA
Volta (2017): 12nm, 5120 CUDA 640
Turing (2018): 12nm, 4608 CUDA 576
Ampere (2020): 8nm, 6912 CUDA 432
Hopper (2022): 4nm, 18432 FP32 CUDA 576
GPU CUDA C GPU GPU L1/L2 GPU Direct GPU Fermi 3-4 GPU Kepler GPU GPU 300W
Maxwell 50% GPU AI 112 TFLOPS Pascal 3 Ray Tracing (RT Core) 400W Hopper Transformer FP16 FP8 NVIDIA AI NPU
12 AI AI NPU GPU A100 AI NPU CPU DVPP SoC 910 AI Host AI Core AI 1 16*16 16^3=4096 3D Cube FP16 2 FP32 FP16 INT32 INT8 AI Core 3 CPU AI Core AI Core/A100
13 AI IP 910 FP16 320 TFLOPS 910 CPU Core DVPP (Task Scheduler) Host CPU HCCS PCIe 4.0 RoCE v2 (Scale Out) (Scale Up) GPU A100 310 22 TOPS INT8 11 TFLOPS FP16 8W AI IDC 2022 AI 109 AI 85% 10% 2% 1% 2023 AI 50/AI 10% 199it IDC AMD IT
AI AMD MI300X L40s A100 SXM H100 SXM 310 910 DCU 370
FP64 47.9T - 9.7T 34T - 11.5T -
FP32 47.9T 183T 19.5T 67T - 24T
FP16 383T 362.05T 312T 989.5T 11T 320T - 96T
INT8 - 733T 624T 1979T 22T 640T - 256T
192GB 48GB 80GB 80GB - 32GB 24GB
5.05TB/s 864GB/s 1.99TB/s 3.35TB/s - 1TB/s 307.2GB/s
600W 350W 400W 700W 8W 310W 260-350W 150W
14 Atlas 900 AI Atlas 900 PoD Atlas 800 Atlas 800 Atlas 800 Atlas 800 Atlas 300T Atlas 300I 9000 9000 9000 9010 3000 3010 9000 3000/3010 - 47U 4U 4U 2U 2U 3/4 PCIe CPU - 32*920 4*920 2*Intel V5 Cascade Lake 2*920 1/2 Intel Xeon SP Skylake Cascade Lake 205W - AI 910 AI 64*910 8*910 8*910 8 Atlas 300I 7 Atlas 300I 910 310 HBM - 2048 GB 32 GB 1228GB/s 32GB 1228GB/s - AI 256 1024 PFLOPS FP16 14.08 20.48 PFLOPS FP16 1 EFLOPS FP16 2.24 PFLOPS FP16 1.76 PFLOPS FP16 2.24 PFLOPS FP16 1.76 PFLOPS FP16 704 TOPS INT8 616 TOPS INT8 30 AI Core 280 TFLOPS FP16 (Pro) 220 TFLOPS FP16 88 TOPS INT8 HCCS PCIe 100G RoCE - 8*100GE+4*25GE/2*100GE 8*100GE 1*OCP NIC 3.0 2*25GE 9 PCIe4.0 PCIe 10 PCIe Gen3.0 1*100GE QSFP-DD 56.5 Gb/s PCIe x16 Gen3.0 50KW 46 kW 5.6 kW 5.6 kW - 300W 67W/-
CUDA 15 CUDA NVIDIA (GPU) API GPU CUDA CUDA OpenCL OpenGL API C C++ Fortran Python MATLAB GDB Nsight Memcheck CUDA CUDA-X CSDN run.ai GPU cuBLAS cuFFT CUDA cuRAND cuSOLVER cuSPARSE cuTENSOR AmgX GPU C++ Thrust GPU CUDA GPU nvJPEG NVIDIA NVIDIA SDK NVIDIA SDK GPU NVSHMEM NCCL GPU CUDA GPU NVIDIA cuDNN NVIDIA TensorRT NVIDIA Riva NVIDIA DeepStream SDK NVIDIA DALI
Partner-built: OpenCV FFmpeg ArrayFire MAGMA IMSL Fortran Gunrock CHOLMOD Triton Ocean SDK CUVIlib
CANN CUDA 16 CANN
CUDA+cuDNN AI CANN 1 CANN DSL 70% TIK 2 CANN 7.0 AI 10 EMUI Android openEuler UOS Ubuntu Debian SUSE 14 AI CPU NPU AI CANN AI Driver Runtime Graph Engine AscendCL AI DVPP AIPP HCCL 100G RoCE AI MindSpore AI TensorFlow/PyTorch CANN CSDN IT
17 AI AI AI MindSpore-AI-MindSpore 1 1.0 AI 1.5 2.0 LLaMA Bloom GLM GPT 2.2 20+52+MindSpore 22+MindSpore 20 NPU GPGPU CPU Omdia MindSpore AI 11% MindSpore MindSpore MindSpore Omdia 199it CSDN MindSpore MindSpore PyTorch
18 AI Google-TensorFlow Meta-PyTorch TensorFlow PyTorch MindSpore Omdia TensorFlow PyTorch MindSpore PaddlePaddle MindSpore 40% 1.3 MindSpore 900 Papers with Code 2022 MindSpore AI PyTorch CANN AI Framework Adaptor TensorFlow PyTorch AI 10 18 Premier AI PyTorch PyTorch 2.1 NPU PyTorch 2.1 PyTorch BLOOM GPT-3 LLaMA
AI () () () () 1
TensorFlow 151592 88700+ 177000+ 34102
PyTorch 62559 19000+ 69400 2864 1
MindSpore 69956 645 3600 4652
PaddlePaddle 42793 5300 20700 815
2023 8 AI GitHub GitHub CSDN Omdia 199it
63.20% 36.80% 37.30% 20.10% 42.60% 61.20% 38.80% TensorFlow PyTorch TensorFlow PyTorch Other TensorFlow PyTorch
Share 2023 (%): PyTorch 34%, TensorFlow 30%, PaddlePaddle 11%, MindSpore 11%, OneFlow 3%, MXNet 2%, MegEngine 2%, Jittor 1%, Other 6%
MindStudio 340% 19 MindStudio AI AI GPU GPU PyTorch MindStudio torch GPU API torch NPU API AutoML AutoML MindSpore PyTorch NLP 20% AI IT Training OCR MindStudio Profiling 340%+ 300%+ Inference CV OCR NLP CV PyTorch GPU2Ascend YOLOV3 centernet DBNet
20 ModelArts HiAI Service MindX MindSpore TensorFlow/PyTorch/. CANN NPU MindStudio FusionDirector/Smartkits IHV ISV OEM ODM C&SI IHV-
21 CPU 920 GPU 910b GPU 310 Atlas 200 AI INT8 22T Atlas 200 AI INT8,22T Atlas 500 Atlas 500Pro INT8,352T Atlas 300I INT8,88T Atlas 800 INT8,704T Atlas 300T FP16,320T Atlas800 FP16,2.24P Atlas 900 AI IHV AI
22 wind AI G 2022 23Q3 23&31%925 226.53 3.89 23.22%-25.00%-48.39%-5%-70%66 8.03-19.37-12%-22%-100%4%1159 29.7 5.64 70.00%-5%22 9.49-5.07 74.95%-66.60%-16%309 31-42.91-140 44.97-7.1-75 93.25-7.91 100%-118 12.41-6.16 100%-238 61.33-15.94-42 6.16-3.74-23
IT AI AI G CPU+GPU 2.5P FP16 5 10 AI
24 Atlas 900 AI HCCS PCIe 4.0 100G RoCE Atlas 900 SuperCluster AI CloudEngine XH16800 800GE 2250 18000 10 36Kr
[Figure: cluster size (cards) at 2020.6, 2023.6 and 2023.12]
[Figure: convergence time (days) for industry SOTA, Atlas 900 (8K cluster) and Atlas 900 (16K cluster); data labels 1 and 0.5]
[Figure: stable training (days) for industry SOTA and Atlas 900]
Atlas 900 AI 4K 8K 16K 10 3 30 25
Atlas 900 AI AI AI E AI AI 1 L0-L3 2 Atlas HCSO (ModelArts) AI 50% 80% 70% AI 4.5 1.8 3 MindXDL web 510/40 100P 28 AI 7 AI AI AI L4 AI/L3 AI HCSO ModelArts L2 X86 AI GPU Atlas 800 Atlas 900 AI Atlas 800 Atlas 900 AI L1/L0
26
1 21 10
2 1000P 22 180
3 300P 12.7 23
4 5.9 24 4000 61
5 400P 5.47 25 300P 15
6 100P 3.9 26 2000P 50.5
7 140P 5.7 27 1000P 180
8 1250P 13 28 400P 10
9/29 18
10 450 30
11 96 31
12 140P 3.3 32 8.5
13 300P 10 33 1000P 109
14-34
15 400P 35 19
16 36 11.8
17 1000P 37 1.97
18 38
19 12000P 39 5.5P 4.3
20 100P 40 300P 10.6
02 Part one 27 28 AI BMC SSD RAID
29 SSD PC openEuler openGauss ARM
30 wind CPU CPU 1 CPU 2 CPU CISC RISC Intel X86 ARM MIPS RISC-V LoongArch CISC RISC x86 ARM MIPS Alpha 1 2 3 4 5 1 2 3 1 32 2 3 32 4 1 32 2 x86 x86 ARM MIPS Alpha Intel AMD ARM v8
31 Voice+Arm Ampere 128 Ampere Computing x86 3 4 50% 200% Arm 90% 920 CPU ARM v8.2 2.6GHz 64 8 DDR4 100G RoCE PCIe 4.0 CCIX 640Gbps SPECint Benchmark 930 25%
920 Intel AMD
Xeon 6354 EPYC 7542 7285 KH-30000 920-7260 S2500 3C5000L 1621
x86 x86 x86 x86 ARM ARM LoongArch SW_64
18 32 32 8 64 64 16 16
36 64 64
3.0GHz 2.9GHz 2.0GHz 3.0GHz 2.6GHz 2.2GHz 2.2GHz 2.0GHz
DDR4 DDR4 DDR4 DDR4 DDR4 DDR4 DDR4 DDR3
8 8 8 2 8 8 4 8
3200MHz 3200MHz 2666MHz 2666MHz 2933MHz 3200MHz 3200MHz 2133MHz
PCIe 64 128 128 16 40 17 32 16
CPU 2.0 32 2.0 2019 2020 2.0+BIOS/BMC CPU+2.0 DevKit 2700+/8600+
33 DevKit&2023 DevKit 23.0 2700+/8600+DevKit TOP 10 Hour Day DevKit ExaGear x86 10% DevKit SDK DevKit 23.0 BoostKit 90%
34 BoostKit/90% BoostKit 23.0 5 HPC BoostKit
04 Part one 35 36
1 2 3 4 4 AI 5 AI 95% 6 300 1 300 10% 2 300 10% 10% 3 300 10% 37
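The 2024 demand figures in the description above appear to follow a simple chain: demand in EFlops times a share gives captured compute, dividing by per-chip throughput (320 TFLOPS FP16 or 640 TOPS INT8 for the 910) gives a chip count, and dividing by 8 chips per server gives a server count. The Python sketch below reproduces that arithmetic. Several things here are assumptions inferred from the numbers, because the original row and column labels did not survive extraction: the meaning of each column, the reading of 30.70 and 3.84 as units of 10,000, and the halving of the INT8 row when summing FP16-equivalent totals.

```python
# Rows as they appear in the description: (precision, demand in EFlops,
# share, per-chip throughput in TFLOPS/TOPS). The segment names are lost,
# so rows are identified only by position (assumption).
ROWS = [
    ("FP16", 23.00, 1.00, 320),
    ("FP16", 43.63, 0.80, 320),
    ("FP16", 10.69, 0.60, 320),
    ("FP16", 79.37, 0.25, 320),
    ("FP16", 49.60, 0.20, 320),
    ("INT8", 10.42, 0.80, 640),
]

CHIPS_PER_SERVER = 8            # the "8" column in every row
TFLOPS_PER_EFLOPS = 1_000_000   # 1 EFlops = 10^6 TFlops


def fp16_equivalent(precision, eflops):
    """Assumption: INT8 compute counts as half when summed with FP16."""
    return eflops / 2 if precision == "INT8" else eflops


def summarize(rows):
    """Return totals for demand, captured compute, chips and servers."""
    demand_total = 0.0
    captured_total = 0.0
    chips_total = 0.0
    for precision, demand, share, per_chip in rows:
        captured = demand * share
        chips_total += captured * TFLOPS_PER_EFLOPS / per_chip
        demand_total += fp16_equivalent(precision, demand)
        captured_total += fp16_equivalent(precision, captured)
    return {
        "demand_eflops": demand_total,
        "captured_eflops": captured_total,
        "captured_share": captured_total / demand_total,
        "chips": chips_total,
        "servers": chips_total / CHIPS_PER_SERVER,
    }


if __name__ == "__main__":
    for key, value in summarize(ROWS).items():
        print(f"{key}: {value:,.4f}")
```

Under these assumptions the totals land on roughly 211.5 EFlops of demand, about 98.2 EFlops captured (about 46.45%), about 307,000 chips and about 38,400 servers, which is consistent with the 211.50, 98.24, 30.70 and 3.84 totals in the description. Small differences in the last digit come from the description rounding each row before summing.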