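ROCm Quick Start
AMD officially documents how to install ROCm on the major Linux distributions. My own practice was done on Debian.
Installation
ROCm installation
- Installation on Debian 12: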
sudo apt update
sudo apt install "linux-headers-$(uname -r)"
sudo apt install -y python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
wget https://repo.radeon.com/amdgpu-install/6.3.2/ubuntu/jammy/amdgpu-install_6.3.60302-1_all.deb
sudo apt install ./amdgpu-install_6.3.60302-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm
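Note
Running apt install amdgpu-dkms rocm reports that about 3 GB of packages must be downloaded and that installation needs 36 GB of disk space.
My root filesystem is small, so I needed a large-capacity disk to hold the files, with a symbolic link pointing to it.
According to ROCm on Linux detailed installation overview > Post-installation instructions, the installation directory appears to be /opt/rocm, so I moved the entire /opt directory to the large-capacity disk and then created a symbolic link at /opt:
mv /opt /huggingface.co/
ln -s /huggingface.co/opt /opt
I noticed that NVIDIA also installs under /opt, so the NVIDIA and AMD installation directories can be migrated together.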
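Before starting a download of this size, it may help to check free space first. The following is a minimal sketch, not part of the ROCm tooling: has_free_space is a hypothetical helper name, and the 36 GB threshold is simply the figure apt reported above.

```shell
#!/usr/bin/env bash
# Sketch: check whether the filesystem holding a directory has enough free space.
# has_free_space is a hypothetical helper, not part of ROCm or apt.

# Usage: has_free_space DIR REQUIRED_GB
has_free_space() {
    local dir="$1" required_gb="$2" avail_kb
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# 36 is the installation size apt reported in the note above.
if has_free_space /opt 36; then
    echo "enough space under /opt"
else
    echo "not enough space under /opt; consider moving it to a larger disk"
fi
```

If the check fails, moving /opt and linking it back, as described above, is one way to proceed.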
- On Ubuntu the installation differs between 24.04 and 22.04. I ran Ubuntu 24.04 under bhyve (BSD hypervisor), set up as described in "AMD GPU passthrough in bhyve", and installed as follows:
cd /tmp/
wget https://repo.radeon.com/amdgpu-install/6.4.3/ubuntu/noble/amdgpu-install_6.4.60403-1_all.deb
sudo apt install ./amdgpu-install_6.4.60403-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
sudo apt install rocm
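Note
If the package is not under /tmp/ (for example, it sits in the user's own home directory), running sudo apt install xxx.deb prints a permission message saying that the _apt user cannot access the file:
...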
N: Download is performed unsandboxed as root as file '/home/huatai/amdgpu-install_6.4.60403-1_all.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)
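The message means "the download is performed unsandboxed as root because user _apt cannot access the file". For security reasons, apt uses a dedicated unprivileged user, _apt, especially when it downloads and verifies packages in its sandbox. The message usually appears when the .deb file is in a directory that is not world-readable, such as the user's home directory or a Downloads directory.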
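One way to avoid the message is to stage the package in a world-readable location before installing it. The following is a minimal sketch: stage_deb is a hypothetical helper name, and the /tmp/deb-stage.XXXXXX template is an arbitrary choice.

```shell
#!/usr/bin/env bash
# stage_deb is a hypothetical helper: copy a .deb into a world-readable
# directory under /tmp so the unprivileged _apt user can read it.
# Prints the path of the staged copy.
stage_deb() {
    local src="$1" dir name
    name=$(basename "$src")
    dir=$(mktemp -d /tmp/deb-stage.XXXXXX) || return 1
    chmod 755 "$dir"              # mktemp -d creates the directory with mode 700
    cp "$src" "$dir/" || return 1
    chmod 644 "$dir/$name"        # readable by everyone, including _apt
    printf '%s\n' "$dir/$name"
}

# Usage (assumes the installer was downloaded to the home directory):
#   sudo apt install "$(stage_deb ~/amdgpu-install_6.4.60403-1_all.deb)"
```

Downloading straight into /tmp/, as in the steps above, achieves the same result with less ceremony. The helper is only useful when the file already lives somewhere private.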
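Note
The installer reports that 4,365 MB must be downloaded and that a further 25.3 GB of disk space is needed, so I first had to enlarge the virtual machine's disk:
On checking, I found that the volume I had configured in "Running Ubuntu under bhyve virtualization" is a sparse volume (60G), but the installed system had allocated only half of the disk, so I carried out "Extending an EXT4 filesystem on LVM in an Ubuntu VM on bhyve".
AMDGPU driver installation
- Install the AMDGPU driver: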
wget https://repo.radeon.com/amdgpu-install/6.4.3/ubuntu/noble/amdgpu-install_6.4.60403-1_all.deb
sudo apt install ./amdgpu-install_6.4.60403-1_all.deb
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install amdgpu-dkms
- After the driver is installed, the amdgpu kernel module can be seen loaded on the host (lsmod | grep amd):
huatai@adev:~$ lsmod | grep amd
amdgpu              17133568  0
amdxcp                 12288  1 amdgpu
drm_exec               12288  1 amdgpu
gpu_sched              61440  1 amdgpu
drm_buddy              20480  1 amdgpu
drm_suballoc_helper    16384  1 amdgpu
drm_ttm_helper         12288  1 amdgpu
ttm                   110592  2 amdgpu,drm_ttm_helper
drm_display_helper    237568  1 amdgpu
i2c_algo_bit           16384  1 amdgpu
video                  77824  1 amdgpu
The Dynamic Kernel Module Support (DKMS) status shows whether the driver installation completed:
dkms status
The output shows the amdgpu driver module:
amdgpu/6.12.12-2194681.24.04, 6.8.0-71-generic, x86_64: installed
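The same check can be scripted. The following is a minimal sketch that assumes the comma-separated dkms status format shown above (other dkms versions may print a different layout). amdgpu_dkms_installed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# amdgpu_dkms_installed is a hypothetical helper.
# Reads `dkms status` output on stdin and succeeds only if an amdgpu module
# is reported as installed for the given kernel release.
# Expected line format (taken from the output above):
#   amdgpu/<version>, <kernel release>, <arch>: installed
amdgpu_dkms_installed() {
    local kernel="$1"
    awk -F', *' -v k="$kernel" '
        $1 ~ /^amdgpu\// && $2 == k && $3 ~ /: installed/ { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   dkms status | amdgpu_dkms_installed "$(uname -r)" && echo "amdgpu built for running kernel"
```

Matching on the running kernel matters because DKMS builds a module per kernel, so a module that is installed for an older kernel does not help the one currently booted.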
- Check that the rocminfo output contains the GPU information, as follows:
[...]
*******
Agent 2
*******
  Name:                    gfx906
  Uuid:                    GPU-9332612173497dfc
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  [...]
[...]
As shown, my AMD Radeon Instinct MI50 is reported as gfx906.
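To pull just the GPU architecture names out of a long rocminfo listing, a small filter can help. The following is a minimal sketch that assumes the layout of a full rocminfo listing: agent-level fields indented by two spaces, with a "Device Type:" field following each "Name:" field. gpu_agents is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# gpu_agents is a hypothetical helper: read rocminfo output on stdin and print
# the Name field of every agent whose Device Type is GPU.
gpu_agents() {
    awk '
        /^  Name:/ { name = $2 }                        # agent-level Name (two-space indent)
        /^  Device Type:/ && $3 == "GPU" { print name } # emit the name once the type is known
    '
}

# Usage:
#   rocminfo | gpu_agents
```

The two-space anchor is deliberate: the ISA section also carries a Name field (for example amdgcn-amd-amdhsa--gfx802), but with deeper indentation, so it is not picked up.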
- Check whether the GPU is detected (I hit a problem here):
clinfo
The output showed Platform Name and Board name normally, but then unfortunately stalled and finally ended with a GPU Hang error:
...
  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon Graphics
...
HW Exception by GPU node-1 (Agent handle: 0x5bcb243beec0) reason :GPU Hang
Aborted (core dumped)
This problem is investigated in the troubleshooting section below.
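Because a hung GPU can stall the command for a while, it can be convenient to run such probes under a time limit. The following is a minimal sketch using coreutils timeout. run_bounded is a hypothetical helper name, and the 30-second limit in the usage line is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# run_bounded is a hypothetical helper: run a command under a time limit so a
# stalled GPU probe cannot block the shell indefinitely.
# Usage: run_bounded SECONDS COMMAND [ARGS...]
run_bounded() {
    local seconds="$1" rc
    shift
    timeout "$seconds" "$@"
    rc=$?
    # coreutils timeout exits with status 124 when the limit was reached
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${seconds}s: $*" >&2
    fi
    return "$rc"
}

# Usage:
#   run_bounded 30 clinfo
```

In the run recorded above, clinfo eventually aborted on its own, so the limit is only a safety net.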
Note
At this point you can try "Running large models with Ollama on an AMD GPU".
Troubleshooting
Although ROCm and the AMDGPU driver appeared to install successfully, rocm-smi reported that no AMD GPU was available:
WARNING: No AMD GPUs specified
===================================== ROCm System Management Interface =====================================
=============================================== Concise Info ===============================================
Device  Node  IDs           Temp    Power  Partitions          SCLK  MCLK  Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,  GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
============================================================================================================
============================================================================================================
=========================================== End of ROCm SMI Log ============================================
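When rocm-smi lists no devices, a quick first check is whether the device nodes that ROCm user space relies on exist at all: /dev/kfd and the render nodes under /dev/dri. The following is a minimal sketch: check_rocm_access is a hypothetical helper name, and its optional argument exists only so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# check_rocm_access is a hypothetical helper: report missing ROCm device nodes.
# The optional first argument replaces /dev (useful for dry runs).
check_rocm_access() {
    local dev_root="${1:-/dev}" status=0
    if [ ! -e "$dev_root/kfd" ]; then
        echo "missing $dev_root/kfd (compute device node)"
        status=1
    fi
    if ! ls "$dev_root"/dri/renderD* >/dev/null 2>&1; then
        echo "no render nodes under $dev_root/dri"
        status=1
    fi
    return "$status"
}

# Usage:
#   check_rocm_access && echo "device nodes present"
#   id -nG | tr ' ' '\n' | grep -E '^(render|video)$'   # groups added by usermod earlier
```

Missing nodes point at the kernel driver rather than at the ROCm user-space packages, which is consistent with the failed probe in the kernel log.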
- Checking dmesg | grep amdgpu shows that initialization failed. The full dmesg output points at atom_bios (it looks as though the BIOS emulated by bhyve has a problem and cannot support amdgpu):
[    3.252058] [drm] amdgpu kernel modesetting enabled.
[    3.252243] [drm] amdgpu version: 6.12.12
[    3.252406] [drm] OS DRM version: 6.8.0
[    3.253244] amdgpu: Virtual CRAT table created for CPU
[    3.253915] amdgpu: Topology: Add CPU node
[    3.257044] amdgpu 0000:00:06.0: can't derive routing for PCI INT A
[    3.257228] amdgpu 0000:00:06.0: PCI INT A: no GSI - using ISA IRQ 10
[    3.257425] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66A1 0x1002:0x0834 0x01).
[    3.257597] [drm] register mmio base: 0xC1000000
[    3.257784] [drm] register mmio size: 524288
[    3.258559] amdgpu 0000:00:06.0: amdgpu: detected ip block number 0 <soc15_common>
[    3.258754] amdgpu 0000:00:06.0: amdgpu: detected ip block number 1 <gmc_v9_0>
[    3.258909] amdgpu 0000:00:06.0: amdgpu: detected ip block number 2 <vega20_ih>
[    3.259063] amdgpu 0000:00:06.0: amdgpu: detected ip block number 3 <psp>
[    3.259210] amdgpu 0000:00:06.0: amdgpu: detected ip block number 4 <powerplay>
[    3.259354] amdgpu 0000:00:06.0: amdgpu: detected ip block number 5 <dm>
[    3.259496] amdgpu 0000:00:06.0: amdgpu: detected ip block number 6 <gfx_v9_0>
[    3.259634] amdgpu 0000:00:06.0: amdgpu: detected ip block number 7 <sdma_v4_0>
[    3.259790] amdgpu 0000:00:06.0: amdgpu: detected ip block number 8 <uvd_v7_0>
[    3.259925] amdgpu 0000:00:06.0: amdgpu: detected ip block number 9 <vce_v4_0>
[    3.260507] amdgpu 0000:00:06.0: ROM [??? 0x00000000 flags 0x20000000]: can't assign; bogus alignment
[    3.299305] amdgpu 0000:00:06.0: amdgpu: Fetched VBIOS from ROM
[    3.299903] amdgpu: ATOM BIOS: 113-D1631711-100
[    3.303174] [drm] UVD(0) is enabled in VM mode
[    3.303344] [drm] UVD(1) is enabled in VM mode
[    3.303496] [drm] UVD(0) ENC is enabled in VM mode
[    3.303646] [drm] UVD(1) ENC is enabled in VM mode
[    3.303825] [drm] VCE enabled in VM mode
[    3.303995] amdgpu 0000:00:06.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    3.304184] amdgpu 0000:00:06.0: amdgpu: MODE1 reset
[    3.304333] amdgpu 0000:00:06.0: amdgpu: GPU mode1 reset
[    3.304766] amdgpu 0000:00:06.0: amdgpu: GPU psp mode1 reset
[    3.814317] [drm] psp mode1 reset succeed
[    4.093317] [drm] GPU posting now...
[   24.094673] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
[   24.095199] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 4EC4 (len 74, WS 0, PS 8) @ 0x4EDC
[   24.095753] amdgpu 0000:00:06.0: amdgpu: gpu post error!
[   24.095905] amdgpu 0000:00:06.0: amdgpu: Fatal error during GPU init
[   24.096112] amdgpu 0000:00:06.0: amdgpu: amdgpu: finishing device.
[   24.096302] amdgpu: probe of 0000:00:06.0 failed with error -22
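In a long kernel log the failing lines are easy to miss. The following is a minimal sketch of a filter that keeps only the amdgpu lines that look like failures. amdgpu_errors is a hypothetical helper name, and the pattern list is drawn from the messages in the log above rather than from any complete catalogue.

```shell
#!/usr/bin/env bash
# amdgpu_errors is a hypothetical helper: read kernel log text on stdin and
# print only the amdgpu-related lines that signal a failure.
amdgpu_errors() {
    grep -i 'amdgpu' | grep -E '\*ERROR\*|Fatal error|post error|failed with error'
}

# Usage:
#   sudo dmesg | amdgpu_errors
```

Applied to the log above, this would keep the two atombios *ERROR* lines, the "gpu post error!" line, the "Fatal error during GPU init" line, and the final probe failure.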
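The driver initialization evidently fails. Possible causes, from most to least likely:
- The "AMD GPU passthrough in bhyve" setup, or the driver for this AMD Radeon Instinct MI50, has problems with virtualization support
  - A bare-metal Ubuntu installation may be needed for comparison
  - It may also be necessary to deploy an LFS (Linux From Scratch) system to compare against a Linux environment (IOMMU, "Using OVMF to pass through GPU and NVMe storage")
- The AMDGPU driver may need to be downgraded to an older version to support the older GPU
  - I saw a Reddit post, Mi50 32gb (Working config, weirdness and performance), in which ROCm and the AMDGPU driver work normally (not in a virtual machine)
- (Verified: this method did not solve my problem.) A Proxmox forum post, AMD GPU firmware/bios missing? amdgpu fatal error, suggests installing firmware-amd-graphics as the fix. However, this firmware is proprietary software, and running apt install firmware-amd-graphics directly on Ubuntu reports that the package does not exist (the virtual machine provided by Proxmox may already have the software repository built in)
The post also gives a way to download the firmware manually, from Debian firmware-nonfree: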
# Firmware download, extract and copy
wget http://ftp.debian.org/debian/pool/non-free-firmware/f/firmware-nonfree/firmware-amd-graphics_20250708-1_all.deb
dpkg -x firmware-amd-graphics_20250708-1_all.deb firmware-amd-graphics
cp -r firmware-amd-graphics/usr/lib/firmware/* /lib/firmware/
# Update "initramfs"
update-initramfs -k all -u
# Reboot
reboot
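Warning
I found that the firmware installed manually above had in fact already been placed under /lib/firmware/amdgpu when the amdgpu driver was installed earlier.
Oddly, though, the files under /lib/firmware/amdgpu appear to be compressed, with a .zst suffix (for example yellow_carp_vcn.bin.zst), whereas the files under firmware-amd-graphics/usr/lib/firmware/amdgpu/ are uncompressed (for example yellow_carp_vcn.bin).
I tried the method above. It did not solve the problem, and the error persisted.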
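To see which form a given firmware blob takes in a directory, a small check can be used. The following is a minimal sketch: firmware_form is a hypothetical helper name. Kernels built with zstd support in the firmware loader can load .zst files directly, which would explain the compressed copies, though this sketch does not inspect the kernel configuration.

```shell
#!/usr/bin/env bash
# firmware_form is a hypothetical helper: report whether a firmware blob is
# present uncompressed, zstd-compressed, both, or not at all.
# Usage: firmware_form DIR NAME
firmware_form() {
    local dir="$1" name="$2" plain=no zst=no
    if [ -f "$dir/$name" ]; then plain=yes; fi
    if [ -f "$dir/$name.zst" ]; then zst=yes; fi
    echo "$name plain=$plain zst=$zst"
}

# Usage:
#   firmware_form /lib/firmware/amdgpu yellow_carp_vcn.bin
#   firmware_form firmware-amd-graphics/usr/lib/firmware/amdgpu yellow_carp_vcn.bin
```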
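Post-installation steps (archived)
Note
These are notes from earlier practice, and some steps may no longer be needed. For example, installing the packages now automatically adds 10-rocm-opencl.conf and 20-amdgpu.conf under /etc/ld.so.conf.d/.
- Configure the system linker so that ROCm applications know where to find shared object (.so) files:
sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF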
/opt/rocm/lib
/opt/rocm/lib64
EOF
sudo ldconfig
- Configure the path to the ROCm executables. There are two methods (use either one):
  - update-alternatives manages multiple versions of a program (already available on current Debian 12)
  - environment-modules is a shell tool that simplifies shell initialization; module files modify the session environment

Here I use update-alternatives:
update-alternatives --list rocm
Output:
/opt/rocm-6.3.2
If multiple ROCm versions are installed, switch between them with:
update-alternatives --config rocm
- Verify that the kernel module drivers are installed:
dkms status
The output shows that the host has both the AMD and NVIDIA module drivers installed:
amdgpu/6.10.5-2109964.22.04, 6.1.0-31-amd64, amd64: installed
nvidia/570.86.15, 6.1.0-30-amd64, x86_64: installed
nvidia/570.86.15, 6.1.0-31-amd64, x86_64: installed
- Export LD_LIBRARY_PATH. Here I edited /etc/profile to add:
export LD_LIBRARY_PATH=/opt/rocm-6.3.2/lib
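Note that the export above replaces any value LD_LIBRARY_PATH already had. If other entries need to be preserved, one option is to prepend the ROCm directory only when it is not already present. The following is a minimal sketch: prepend_path_once is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# prepend_path_once is a hypothetical helper: print VALUE with DIR prepended,
# unless DIR is already one of its colon-separated entries.
# Usage: prepend_path_once DIR VALUE
prepend_path_once() {
    local dir="$1" value="$2"
    case ":$value:" in
        *":$dir:"*) printf '%s\n' "$value" ;;                   # already present, keep as is
        *)          printf '%s\n' "${dir}${value:+:$value}" ;;  # prepend, no stray colon
    esac
}

# Same directory as in the line above; existing entries are kept.
export LD_LIBRARY_PATH="$(prepend_path_once /opt/rocm-6.3.2/lib "${LD_LIBRARY_PATH:-}")"
```

Because the helper is idempotent, sourcing /etc/profile more than once does not keep growing the variable.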
- Verify the ROCm installation:
rocminfo
Output:
ROCk module version 6.10.5 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES
==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3100                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    396108548(0x179c2304) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    396108548(0x179c2304) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    396108548(0x179c2304) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    396108548(0x179c2304) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3100                               
  BDFID:                   0                                  
  Internal Node ID:        1                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    396352256(0x179fdb00) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    396352256(0x179fdb00) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    396352256(0x179fdb00) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    396352256(0x179fdb00) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 3                  
*******                  
  Name:                    gfx802                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD FirePro S7150                  
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26921(0x6929)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   920                                
  BDFID:                   3328                               
  Internal Node ID:        2                                  
  Compute Unit:            32                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 718                                
  SDMA engine uCode::      71                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8386560(0x7ff800) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8386560(0x7ff800) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx802          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 4                  
*******                  
  Name:                    gfx802                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD FirePro S7150                  
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    3                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26921(0x6929)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   920                                
  BDFID:                   3840                               
  Internal Node ID:        3                                  
  Compute Unit:            32                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 718                                
  SDMA engine uCode::      71                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8386560(0x7ff800) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8386560(0x7ff800) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx802          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***