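ROCm Quick Start

AMD officially documents how to install ROCm on the major Linux distributions; my own practice was done on Debian.

Installation

Installing ROCm

  • Installation on Debian 12:

Installing ROCm on Debian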
sudo apt update
sudo apt install "linux-headers-$(uname -r)"
sudo apt install -y python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
wget https://repo.radeon.com/amdgpu-install/6.3.2/ubuntu/jammy/amdgpu-install_6.3.60302-1_all.deb
sudo apt install ./amdgpu-install_6.3.60302-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm
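
The usermod change above only takes effect in a new login session, so it can be worth confirming group membership before running any GPU tools. The following is a minimal sketch, assuming a POSIX-like shell; missing_groups is a hypothetical helper name, not part of ROCm:

```shell
# missing_groups "GROUPS THE USER HAS" WANTED...
# Prints the wanted groups that are absent from the first argument
# (a space-separated list such as the output of `id -nG`).
missing_groups() {
    local have=" $1 " want missing=""
    shift
    for want in "$@"; do
        case "$have" in
            *" $want "*) ;;                   # already a member
            *) missing="$missing $want" ;;    # remember what is missing
        esac
    done
    printf '%s\n' "${missing# }"
}

# Usage (prints nothing but an empty line when both groups are present):
#   missing_groups "$(id -nG)" render video
```

If the helper prints render or video, log out and back in (or reboot) and check again.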

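Note

Running apt install amdgpu-dkms rocm reports that 3 GB of packages must be downloaded and that the installation needs 36 GB of disk space.

My root filesystem is small, so I needed a large-capacity disk to hold the installation, reached through a symbolic link.

According to ROCm on Linux detailed installation overview > Post-installation instructions, the installation directory appears to be /opt/rocm, so I moved the entire /opt directory to the large-capacity disk and then created a symbolic link for /opt:

Create a symbolic link after moving /opt to the large-capacity disk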
mv /opt /huggingface.co/
ln -s /huggingface.co/opt /opt
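
Before moving /opt, it may help to confirm that the target filesystem really has the roughly 36 GB the installer asks for. The following is a minimal sketch, assuming POSIX df and awk are available; has_free_gb is a hypothetical helper, and /huggingface.co is simply the mount point used above:

```shell
# has_free_gb PATH REQUIRED_GB
# Succeeds when the filesystem holding PATH has at least REQUIRED_GB GiB free.
has_free_gb() {
    local path=$1 required_gb=$2 avail_kb
    # df -Pk prints one line per filesystem in 1024-byte blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Usage:
#   has_free_gb /huggingface.co 36 && echo "enough space for ROCm"
```

Running the check first avoids discovering a full disk halfway through a multi-gigabyte install.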

I found that NVIDIA also keeps its installation directory under /opt, so the NVIDIA and AMD installation directories can be migrated together

Installing ROCm on Ubuntu 24.04
cd /tmp/

wget https://repo.radeon.com/amdgpu-install/6.4.3/ubuntu/noble/amdgpu-install_6.4.60403-1_all.deb
sudo apt install ./amdgpu-install_6.4.60403-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
sudo apt install rocm

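Note

If the package is not in /tmp/, for example it sits in the user's own home directory, running sudo apt install xxx.deb prints the following message:

Permission message: the _apt user cannot access the file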
...
N: Download is performed unsandboxed as root as file '/home/huatai/amdgpu-install_6.4.60403-1_all.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)

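The message means "the download is performed unsandboxed as root because the file could not be accessed by user _apt". For security reasons, apt uses a dedicated unprivileged user, _apt, when it downloads and verifies packages in its "sandbox". The message typically appears when the .deb file lives in a directory that is not world-readable (for example the user's home directory or a "Downloads" directory).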
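
One way to avoid the message is to copy the package to a world-readable location before installing it. The following is a minimal sketch, assuming GNU coreutils; stage_deb_for_apt is a hypothetical helper name:

```shell
# stage_deb_for_apt FILE
# Copies FILE into a fresh world-readable directory under /tmp (or $TMPDIR)
# and prints the path of the copy.
stage_deb_for_apt() {
    local src=$1 stage_dir dest
    stage_dir=$(mktemp -d "${TMPDIR:-/tmp}/deb-stage.XXXXXX") || return 1
    chmod 755 "$stage_dir"    # mktemp -d creates the directory with mode 700
    dest="$stage_dir/$(basename "$src")"
    cp "$src" "$dest" && chmod 644 "$dest" || return 1
    printf '%s\n' "$dest"
}

# Usage:
#   sudo apt install "$(stage_deb_for_apt ~/amdgpu-install_6.4.60403-1_all.deb)"
```

Downloading straight into /tmp/, as in the commands above, achieves the same thing with less ceremony; the helper is only useful when the file has already been saved elsewhere.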

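Note

The installer reports "4,365 MB of archives to download, and 25.3 GB of additional disk space will be used", so I first had to enlarge the virtual machine's disk:

I found that when I set up Running Ubuntu under bhyve virtualization I had configured a sparse volume (60G), but the installed system had allocated only half of the disk, so I followed Extending an EXT4 filesystem on LVM in an Ubuntu VM on bhyve

AMDGPU driver installation

Note

The AMD GPU driver is the underlying driver and runtime dependency of ROCm

For the AMD Container Toolkit:

  • Install the AMDGPU driver on the host

  • Install ROCm inside the container

  • Install the AMDGPU driver:

Installing the amdgpu driver on Ubuntu 24.04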
wget https://repo.radeon.com/amdgpu-install/6.4.3/ubuntu/noble/amdgpu-install_6.4.60403-1_all.deb
sudo apt install ./amdgpu-install_6.4.60403-1_all.deb
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install amdgpu-dkms

  • After the driver is installed, the amdgpu kernel module is loaded on the host ( lsmod | grep amd ):

Check the kernel modules
huatai@adev:~$ lsmod | grep amd
amdgpu              17133568  0
amdxcp                 12288  1 amdgpu
drm_exec               12288  1 amdgpu
gpu_sched              61440  1 amdgpu
drm_buddy              20480  1 amdgpu
drm_suballoc_helper    16384  1 amdgpu
drm_ttm_helper         12288  1 amdgpu
ttm                   110592  2 amdgpu,drm_ttm_helper
drm_display_helper    237568  1 amdgpu
i2c_algo_bit           16384  1 amdgpu
video                  77824  1 amdgpu

You can check the Dynamic Kernel Module Support (DKMS) status to find out whether the driver has been installed:

Check the dkms status
dkms status

Output:

The dkms status shows the amdgpu driver module
amdgpu/6.12.12-2194681.24.04, 6.8.0-71-generic, x86_64: installed
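
For scripted checks, the same status line can be parsed to confirm that the module is built for the running kernel. The following is a minimal sketch, assuming the line format shown in the sample above; dkms_module_installed is a hypothetical helper that reads dkms status output on standard input:

```shell
# dkms_module_installed MODULE KERNEL
# Succeeds when stdin contains a line such as
#   amdgpu/6.12.12-2194681.24.04, 6.8.0-71-generic, x86_64: installed
# for the given module name and kernel release.
dkms_module_installed() {
    local module=$1 kernel=$2
    awk -F'[,:] *' -v m="$module" -v k="$kernel" '
        # $1 = module/version, $2 = kernel release, last field = state
        index($1, m "/") == 1 && $2 == k && $NF ~ /^installed/ { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   dkms status | dkms_module_installed amdgpu "$(uname -r)" && echo "amdgpu ready"
```

Matching on the kernel release matters because DKMS can report a module as installed for an older kernel while the running one has no build yet.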

  • Check that the rocminfo output contains the following information:

The rocminfo output includes the GPU information
[...]
*******
Agent 2
*******
  Name:                    gfx906
  Uuid:                    GPU-9332612173497dfc
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  [...]
[...]

You can see that my AMD Radeon Instinct MI50 is reported as gfx906
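
When only the architecture name is needed, for example in a script, it can be extracted from the rocminfo output. The following is a minimal sketch, assuming the "Name:" layout shown above; list_gfx_targets is a hypothetical helper that reads rocminfo output on standard input:

```shell
# list_gfx_targets
# Prints each distinct GPU architecture name (gfx...) found in agent
# "Name:" lines. CPU agents and ISA names such as amdgcn-amd-amdhsa--gfx906
# are ignored because their second field does not start with "gfx".
list_gfx_targets() {
    awk '$1 == "Name:" && $2 ~ /^gfx[0-9a-f]+$/ { print $2 }' | sort -u
}

# Usage:
#   rocminfo | list_gfx_targets
```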

  • Check whether the GPU is detected (I ran into a problem here)

Run clinfo to check whether the GPU is listed
clinfo

The output correctly showed Platform Name and Board name, but unfortunately it then stalled and finally reported a GPU Hang error:

Running clinfo ends with a GPU Hang error
...
  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon Graphics
...
HW Exception by GPU node-1 (Agent handle: 0x5bcb243beec0) reason :GPU Hang
Aborted (core dumped)

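This problem is investigated in the troubleshooting section below

Note

At this point you can already try Running large models with Ollama on an AMD GPU

Troubleshooting

Although ROCm and the AMDGPU driver appear to have been installed successfully, rocm-smi reports that no AMD GPU is available:

rocm-smi shows no available AMD GPU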
WARNING: No AMD GPUs specified
===================================== ROCm System Management Interface =====================================
=============================================== Concise Info ===============================================
Device  Node  IDs           Temp    Power  Partitions          SCLK  MCLK  Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,  GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
============================================================================================================
============================================================================================================
=========================================== End of ROCm SMI Log ============================================
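
  • dmesg | grep amdgpu shows an initialization failure. Judging from the full dmesg output, the problem seems to lie in atom_bios (it looks like the BIOS emulated by bhyve has a problem and cannot support amdgpu)

The system log shows that AMD GPU initialization failed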
[    3.252058] [drm] amdgpu kernel modesetting enabled.
[    3.252243] [drm] amdgpu version: 6.12.12
[    3.252406] [drm] OS DRM version: 6.8.0
[    3.253244] amdgpu: Virtual CRAT table created for CPU
[    3.253915] amdgpu: Topology: Add CPU node
[    3.257044] amdgpu 0000:00:06.0: can't derive routing for PCI INT A
[    3.257228] amdgpu 0000:00:06.0: PCI INT A: no GSI - using ISA IRQ 10
[    3.257425] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66A1 0x1002:0x0834 0x01).
[    3.257597] [drm] register mmio base: 0xC1000000
[    3.257784] [drm] register mmio size: 524288
[    3.258559] amdgpu 0000:00:06.0: amdgpu: detected ip block number 0 <soc15_common>
[    3.258754] amdgpu 0000:00:06.0: amdgpu: detected ip block number 1 <gmc_v9_0>
[    3.258909] amdgpu 0000:00:06.0: amdgpu: detected ip block number 2 <vega20_ih>
[    3.259063] amdgpu 0000:00:06.0: amdgpu: detected ip block number 3 <psp>
[    3.259210] amdgpu 0000:00:06.0: amdgpu: detected ip block number 4 <powerplay>
[    3.259354] amdgpu 0000:00:06.0: amdgpu: detected ip block number 5 <dm>
[    3.259496] amdgpu 0000:00:06.0: amdgpu: detected ip block number 6 <gfx_v9_0>
[    3.259634] amdgpu 0000:00:06.0: amdgpu: detected ip block number 7 <sdma_v4_0>
[    3.259790] amdgpu 0000:00:06.0: amdgpu: detected ip block number 8 <uvd_v7_0>
[    3.259925] amdgpu 0000:00:06.0: amdgpu: detected ip block number 9 <vce_v4_0>
[    3.260507] amdgpu 0000:00:06.0: ROM [??? 0x00000000 flags 0x20000000]: can't assign; bogus alignment
[    3.299305] amdgpu 0000:00:06.0: amdgpu: Fetched VBIOS from ROM
[    3.299903] amdgpu: ATOM BIOS: 113-D1631711-100
[    3.303174] [drm] UVD(0) is enabled in VM mode
[    3.303344] [drm] UVD(1) is enabled in VM mode
[    3.303496] [drm] UVD(0) ENC is enabled in VM mode
[    3.303646] [drm] UVD(1) ENC is enabled in VM mode
[    3.303825] [drm] VCE enabled in VM mode
[    3.303995] amdgpu 0000:00:06.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    3.304184] amdgpu 0000:00:06.0: amdgpu: MODE1 reset
[    3.304333] amdgpu 0000:00:06.0: amdgpu: GPU mode1 reset
[    3.304766] amdgpu 0000:00:06.0: amdgpu: GPU psp mode1 reset
[    3.814317] [drm] psp mode1 reset succeed
[    4.093317] [drm] GPU posting now...
[   24.094673] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
[   24.095199] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 4EC4 (len 74, WS 0, PS 8) @ 0x4EDC
[   24.095753] amdgpu 0000:00:06.0: amdgpu: gpu post error!
[   24.095905] amdgpu 0000:00:06.0: amdgpu: Fatal error during GPU init
[   24.096112] amdgpu 0000:00:06.0: amdgpu: amdgpu: finishing device.
[   24.096302] amdgpu: probe of 0000:00:06.0 failed with error -22
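
In a log this long, the failing lines are easy to miss. The following is a minimal sketch that filters kernel messages down to the amdgpu failures; amdgpu_init_errors is a hypothetical helper, and the patterns are simply taken from the log above, so they are an assumption about what counts as a failure rather than a complete list:

```shell
# amdgpu_init_errors
# Reads kernel log text on stdin and prints only amdgpu lines that look like
# initialization failures. Exits non-zero when nothing matches.
amdgpu_init_errors() {
    grep -E 'amdgpu.*(\*ERROR\*|gpu post error|Fatal error during GPU init|probe of .* failed)'
}

# Usage:
#   sudo dmesg | amdgpu_init_errors
```

An empty result does not prove the GPU is healthy; it only means none of these specific messages were logged.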

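Driver initialization appears to fail. Possible causes (from most to least likely):

I found a Reddit post, Mi50 32gb (Working config, weirdness and performance), in which ROCm and the AMDGPU driver work normally (on bare metal, not in a virtual machine)

  • This method is verified NOT to solve my problem: the PROXMOX forum thread AMD GPU firmware/bios missing? amdgpu fatal error suggests installing firmware-amd-graphics. However, this firmware is proprietary software, and running apt install firmware-amd-graphics directly on Ubuntu reports that the package does not exist (the virtual machine provided by PROXMOX may already have the software repository built in)

The thread above provides a way to download the firmware manually; it is available from Debian firmware-nonfree:

Manually install firmware-amd-graphics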
# Firmware download, extract and copy
wget http://ftp.debian.org/debian/pool/non-free-firmware/f/firmware-nonfree/firmware-amd-graphics_20250708-1_all.deb
dpkg -x firmware-amd-graphics_20250708-1_all.deb firmware-amd-graphics
cp -r firmware-amd-graphics/usr/lib/firmware/* /lib/firmware/

# Update "initramfs"
update-initramfs -k all -u

# Reboot
reboot

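Warning

I found that the firmware installed manually above had in fact already been placed under /lib/firmware/amdgpu when the amdgpu driver was installed earlier

Oddly, though, the files under /lib/firmware/amdgpu appear to be compressed files with a .zst suffix, for example yellow_carp_vcn.bin.zst, whereas the files under firmware-amd-graphics/usr/lib/firmware/amdgpu/ are uncompressed, for example yellow_carp_vcn.bin

I tried the method above; it did not solve the problem and the error remains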
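
To see which form of a given firmware file is actually present, a small check can list the uncompressed and compressed variants side by side. The following is a minimal sketch; firmware_variants is a hypothetical helper, and the .zst and .xz suffixes are the compressed forms I assume may appear (only .zst was observed above):

```shell
# firmware_variants DIR NAME
# Prints every existing variant of firmware file NAME inside DIR:
# the plain file, NAME.zst and NAME.xz.
firmware_variants() {
    local dir=$1 name=$2 f
    for f in "$dir/$name" "$dir/$name.zst" "$dir/$name.xz"; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage:
#   firmware_variants /lib/firmware/amdgpu yellow_carp_vcn.bin
```

If both a plain and a compressed copy are listed after the manual install, the directory holds duplicates of the same firmware.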

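Post-installation steps (archived)

Note

These are notes from earlier practice, and some steps may no longer be needed. For example, installing the packages now automatically adds 10-rocm-opencl.conf and 20-amdgpu.conf under /etc/ld.so.conf.d/

  • Configure the system linker so that ROCm applications can find the shared object ( .so ) files:

Configure ldconfig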
sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF
sudo ldconfig
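
  • Configure the path to the ROCm executables. There are two ways to do this (use either one):

    • update-alternatives can manage multiple versions of a program (already available on current Debian 12)

    • environment-modules is a shell tool that simplifies initialization; module files can be used to modify the session environment

Here I use update-alternatives :

List the ROCm alternatives with update-alternatives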
update-alternatives --list rocm

Output:

Output of update-alternatives for ROCm
/opt/rocm-6.3.2

If multiple ROCm versions are installed, switch between them with the following command:

Switch the ROCm version with update-alternatives
update-alternatives --config rocm

  • Verify that the kernel module driver is installed:

Run dkms to check the installed kernel module drivers
dkms status

The output shows that the host has both the AMD and the NVIDIA module drivers installed:

Running dkms shows both the AMD and NVIDIA drivers
amdgpu/6.10.5-2109964.22.04, 6.1.0-31-amd64, amd64: installed
nvidia/570.86.15, 6.1.0-30-amd64, x86_64: installed
nvidia/570.86.15, 6.1.0-31-amd64, x86_64: installed

  • Export LD_LIBRARY_PATH ; here I edited /etc/profile to add:

Set LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/rocm-6.3.2/lib
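
The export above replaces any value LD_LIBRARY_PATH already had. If other software relies on an existing value, a variant that prepends instead may be safer. The following is a minimal sketch; prepend_ld_library_path is a hypothetical helper name:

```shell
# prepend_ld_library_path DIR
# Puts DIR at the front of LD_LIBRARY_PATH unless it is already listed,
# keeping whatever entries were there before.
prepend_ld_library_path() {
    local dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;    # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Usage (for example in /etc/profile):
#   prepend_ld_library_path /opt/rocm-6.3.2/lib
```

The duplicate check keeps the variable from growing each time the profile is sourced again.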

  • Verify the ROCm installation:

Run rocminfo
rocminfo

Output:

rocminfo output
ROCk module version 6.10.5 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3100                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    396108548(0x179c2304) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    396108548(0x179c2304) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    396108548(0x179c2304) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    396108548(0x179c2304) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3100                               
  BDFID:                   0                                  
  Internal Node ID:        1                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    396352256(0x179fdb00) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    396352256(0x179fdb00) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    396352256(0x179fdb00) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    396352256(0x179fdb00) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 3                  
*******                  
  Name:                    gfx802                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD FirePro S7150                  
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26921(0x6929)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   920                                
  BDFID:                   3328                               
  Internal Node ID:        2                                  
  Compute Unit:            32                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 718                                
  SDMA engine uCode::      71                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8386560(0x7ff800) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8386560(0x7ff800) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx802          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 4                  
*******                  
  Name:                    gfx802                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD FirePro S7150                  
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    3                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26921(0x6929)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   920                                
  BDFID:                   3840                               
  Internal Node ID:        3                                  
  Compute Unit:            32                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 718                                
  SDMA engine uCode::      71                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8386560(0x7ff800) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8386560(0x7ff800) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx802          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

References