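NVIDIA GPU passthrough in bhyve on FreeBSD 15
My two earlier attempts, "NVIDIA GPU passthrough in bhyve" and "NVIDIA GPU passthrough in bhyve with the INTPIN patch", were not very successful. The patch currently under development, bhyve: assign a valid INTPIN to NVIDIA GPUs, targets FreeBSD 15, and FreeBSD 15 is due for release in a little over two months (2 December 2025). I therefore decided to install FreeBSD 15 and check whether it supports my two cards, an Nvidia Tesla P4 and an Nvidia Tesla P10 compute card. If problems remain, this also makes it easier to file a bug with the community.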
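Warning
This attempt has not succeeded so far. I still need to rule out BIOS settings (I am not sure the Above 4G Decoding setting on this self-built machine is correct, and the PCIe bifurcation configuration is also in doubt) as well as possible hardware faults (the Nvidia Tesla P4 has never actually been used successfully).
I plan to switch to LFS (Linux From Scratch) and first get GPU and NVMe passthrough with OVMF working on Linux. Once that is verified, I will retry bhyve so that the other factors can be ruled out.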
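Preparation
I installed FreeBSD 15 Alpha 1, but installing Ubuntu Linux 24.04.3 under vm-bhyve crashed partway through and did not complete.
The community happened to release Alpha 2 that same day, so I followed "Updating and upgrading FreeBSD 15 alphas" to move to Alpha 2 and started over.
It later turned out that Alpha 2 reports the same error. The operating system is in fact fully installed by that point; the crash happens in the later cloud-init step and does not seem to affect subsequent VM boots, so I am ignoring it for now.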
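Configuring PCI passthru
Check the PCI devices:
Listing passthru devices with vm
vm passthru
Note
I tested the Nvidia Tesla P10 and the Nvidia Tesla P4 separately. With either card installed on its own, it is identified with BHYVE ID 1/0/0, so the steps below apply to both.
Listing passthru devices with vm: Tesla P10
DEVICE BHYVE ID READY DESCRIPTION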
...
vgapci0 1/0/0 No GP102GL [Tesla P10]
...
Listing passthru devices with vm: Tesla P4
DEVICE BHYVE ID READY DESCRIPTION
...
vgapci0 1/0/0 No GP104GL [Tesla P4]
...
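The BHYVE ID column is the bus/slot/function triple of the device. As a minimal sketch, assuming the usual `pciconf -l` selector format `name@pciDOMAIN:BUS:SLOT:FUNC:`, the same ID can be derived by hand. The function name `pci_selector_to_bhyve_id` is hypothetical and not part of vm-bhyve.

```sh
#!/bin/sh
# Convert a pciconf(8) selector such as "vgapci0@pci0:1:0:0:" into the
# bus/slot/function form ("1/0/0") used by bhyve and vm-bhyve.
# Hypothetical helper, shown for illustration only.
pci_selector_to_bhyve_id() {
    sel=${1#*@}      # drop the driver name  -> "pci0:1:0:0:"
    sel=${sel%:}     # drop a trailing colon -> "pci0:1:0:0"
    sel=${sel#pci}   # drop the "pci" prefix -> "0:1:0:0"
    sel=${sel#*:}    # drop the PCI domain   -> "1:0:0"
    echo "$sel" | tr ':' '/'
}
```

On the host, the first field of the matching `pciconf -l` line can be passed to this function to cross-check what `vm passthru` reports.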
In /boot/loader.conf, reserve the GPU that will be passed through (1/0/0) so that the host does not attach its own driver to it:
pptdevs="1/0/0"
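Before rebooting, it can help to confirm that the ID really landed in the `pptdevs` line. The sketch below assumes `pptdevs` holds a space-separated list of bus/slot/function IDs inside double quotes; the helper name `has_pptdev` is hypothetical.

```sh
#!/bin/sh
# Succeed if the given bhyve ID is listed in the pptdevs="..." line of a
# loader.conf-style file. Hypothetical helper, for illustration only.
has_pptdev() {
    conf=$1
    id=$2
    # Extract the quoted value; if the line is repeated, the last one wins.
    devs=$(sed -n 's/^pptdevs="\([^"]*\)".*/\1/p' "$conf" | tail -n 1)
    for d in $devs; do
        [ "$d" = "$id" ] && return 0
    done
    return 1
}
```

For example, `has_pptdev /boot/loader.conf 1/0/0 && echo reserved` prints `reserved` only when the entry is present.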
Reboot the system and run vm passthru again. The DEVICE column for the Tesla P10/Tesla P4 should now show ppt0:
DEVICE BHYVE ID READY DESCRIPTION
...
ppt0 1/0/0 No GP102GL [Tesla P10]
...
DEVICE BHYVE ID READY DESCRIPTION
...
ppt0 1/0/0 No GP104GL [Tesla P4]
...
Edit the xdev VM configuration /zroot/vms/xdev/xdev.conf
xdev: add the passthrough PCI device 1/0/0, i.e. the Nvidia Tesla P10 / Nvidia Tesla P4 compute card
loader="uefi"
cpu=4
memory=16G
wired_memory="yes"
network0_type="virtio-net"
network0_switch="igc0bridge"
network0_device="tap0"
disk0_name="disk0"
disk0_dev="sparse-zvol"
disk0_type="virtio-blk"
passthru0="1/0/0"
graphics="yes"
graphics_listen="0.0.0.0"
graphics_port="5900"
uuid="2a96f70d-8988-11f0-be9b-0003ee002989"
network0_mac="58:9c:fc:0c:9c:e4"
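bhyve requires guest memory to be wired (its `-S` option) when a passthrough device is configured, which is why the configuration above sets `wired_memory="yes"`. The following is a minimal sanity check for a vm-bhyve style config file. The function name `check_passthru_conf` is hypothetical, and vm-bhyve may well add the flag on its own when it sees a `passthruN` entry, so treat this as a belt-and-braces sketch.

```sh
#!/bin/sh
# Warn when a config file has passthruN= entries but does not set
# wired_memory="yes". Hypothetical helper, for illustration only.
check_passthru_conf() {
    conf=$1
    # No passthru entries at all: nothing to check.
    grep -q '^passthru[0-9]*=' "$conf" || return 0
    grep -q '^wired_memory="yes"' "$conf" && return 0
    echo "passthru is configured but wired_memory is not \"yes\"" >&2
    return 1
}
```

Usage: `check_passthru_conf /zroot/vms/xdev/xdev.conf` returns success for the configuration shown above.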
After starting the xdev VM, check inside the guest:
root@xdev:~# lspci -v -s 00:06.0
00:06.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
Flags: bus master, fast devsel, latency 0
Memory at c2000000 (32-bit, non-prefetchable) [size=16M]
Memory at 800000000 (64-bit, prefetchable) [size=256M]
Memory at 810000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Kernel modules: nvidiafb, nouveau
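In the lspci output above, the two 64-bit prefetchable BARs sit at 0x800000000 and 0x810000000, that is, above the 4 GiB boundary, while the 32-bit BAR at 0xc2000000 sits below it. The small arithmetic sketch below makes that explicit. The function names are hypothetical, and the guest address layout is chosen by bhyve and the guest firmware, so this only illustrates why 64-bit BAR placement matters and does not diagnose the host's Above 4G Decoding setting.

```sh
#!/bin/sh
# Work with BAR addresses as printed by lspci (hex, without the 0x prefix).
# Hypothetical helpers, for illustration only.

# Succeed if the address is at or above 4 GiB (0x100000000).
bar_above_4g() {
    [ $((0x$1)) -ge $((0x100000000)) ]
}

# Print the address in whole GiB, rounded down.
bar_gib() {
    echo $((0x$1 / 1024 / 1024 / 1024))
}
```

For example, `bar_gib 800000000` prints `32`, and `bar_above_4g c2000000` fails because that BAR is mapped below 4 GiB.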
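Installing the CUDA driver
Following my earlier notes "Installing NVIDIA CUDA on Ubuntu" and "Running Docker with a Tesla P4 GPU in an Ubuntu VM on bhyve", quickly install cuda-drivers:
Install the development tools the same way as in "Minimal Debian system initialization" for a headless server (mainly build-essential)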
sudo apt install build-essential cmake vim-nox python3-dev -y
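The CUDA driver needs the kernel headers and the development toolchain, because the kernel module is compiled against the running kernel.
Install linux-headers (installing cuda-drivers directly also pulls it in as a dependency):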
apt-get install linux-headers-$(uname -r)
From NVIDIA's official CUDA Toolkit repo download page, select linux => x86_64 => Ubuntu => 24.04 => deb (network)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
Install the driver package cuda-drivers:
sudo apt-get -y install cuda-drivers
Reboot the guest operating system
Nvidia Tesla P4 compute card
Before the INTPIN patch
After installing the driver, check the lspci output:
root@xdev:~# lspci -v -s 00:06.0
00:06.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
Flags: fast devsel
Memory at c2000000 (32-bit, non-prefetchable) [size=16M]
Memory at 800000000 (64-bit, prefetchable) [size=256M]
Memory at 810000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
With the driver installed (and bhyve not yet patched), dmesg shows errors:
...
[ 2.550435] nvidia: loading out-of-tree module taints kernel.
[ 2.550445] nvidia: module license 'NVIDIA' taints kernel.
[ 2.550447] Disabling lock debugging due to kernel taint
[ 2.550451] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 2.550452] nvidia: module license taints kernel.
[ 2.568334] loop0: detected capacity change from 0 to 8
[ 2.683645] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 2.683654] NVRM: Can't find an IRQ for your NVIDIA card!
[ 2.690103] NVRM: Please check your BIOS settings.
[ 2.690104] NVRM: [Plug & Play OS] should be set to NO
[ 2.690105] NVRM: [Assign IRQ to VGA] should be set to YES
[ 2.690646] nvidia: probe of 0000:00:06.0 failed with error -1
[ 2.690691] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 2.690692] NVRM: None of the NVIDIA devices were initialized.
[ 2.694327] nvidia-nvlink: Unregistered Nvlink Core, major device number 239
...
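The "Can't find an IRQ" message relates to the legacy interrupt pin that the INTPIN patch is about. On Linux, the Interrupt Pin register is the byte at offset 0x3D of the device's PCI configuration space, where 0 means no INTx pin and 1 to 4 mean INTA to INTD. The sketch below reads that byte from a sysfs-style config file. The function name `intpin_of` is hypothetical, and reading it as "0 before the patch" is my interpretation of the patch title, not something verified here.

```sh
#!/bin/sh
# Print the legacy interrupt pin (A-D, or "none") stored at offset 0x3D
# (decimal 61) of a PCI configuration space dump.
# Hypothetical helper, for illustration only.
intpin_of() {
    pin=$(od -An -tu1 -j 61 -N 1 "$1" | tr -d ' \n')
    case $pin in
        0) echo none ;;
        1) echo A ;;
        2) echo B ;;
        3) echo C ;;
        4) echo D ;;
        *) echo "unknown($pin)" ;;
    esac
}
```

Inside the guest this would be called as `intpin_of /sys/bus/pci/devices/0000:00:06.0/config`.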
After the INTPIN patch
Build and install the patched bhyve:
cd /usr/src/usr.sbin/bhyve/
make -j8
sudo make install
Restart the VM and check dmesg. The error is now Failed to allocate NvKmsKapiDevice (the IRQ error is no longer reported):
[ 2.844287] nvidia: loading out-of-tree module taints kernel.
[ 2.844296] nvidia: module license 'NVIDIA' taints kernel.
[ 2.844298] Disabling lock debugging due to kernel taint
[ 2.844301] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 2.844303] nvidia: module license taints kernel.
[ 2.977045] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 2.988219] nvidia 0000:00:06.0: can't derive routing for PCI INT A
[ 2.988223] nvidia 0000:00:06.0: PCI INT A: no GSI - using ISA IRQ 10
[ 3.031275] loop0: detected capacity change from 0 to 8
[ 3.122879] NET: Registered PF_QIPCRTR protocol family
[ 3.221719] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 580.82.07 Wed Aug 27 18:39:48 UTC 2025
[ 3.238952] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 580.82.07 Wed Aug 27 18:05:23 UTC 2025
[ 3.441376] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x31:0xffff:2767)
[ 3.441394] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 3.453812] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x31:0xffff:2767)
[ 3.453816] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 3.453861] [drm] [nvidia-drm] [GPU ID 0x00000006] Loading driver
[ 3.459751] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x31:0xffff:2767)
[ 3.459757] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 3.460136] [drm:nv_drm_dev_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000006] Failed to allocate NvKmsKapiDevice
[ 3.460369] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000006] Failed to load device
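The errors above could come from the BIOS, from the hardware, or from the vm passthru setup.
Note
I cannot pin this down. On this off-brand machine I previously configured PCI-E Bifurcation, but I can no longer find that entry in the BIOS.
I tried restoring the BIOS to factory defaults, but the GPU initialization errors after starting the VM are the same as above.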
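The two dmesg excerpts for the Tesla P4 show two distinct failure stages: the driver cannot find an IRQ at all, or it finds one but RmInitAdapter then fails. When retesting after each BIOS or patch change, a small filter can tell the two apart quickly. This is a rough sketch that matches only the message strings quoted above; the function name `classify_nvrm` and its output labels are hypothetical.

```sh
#!/bin/sh
# Read kernel log text on stdin and print which NVIDIA failure stage it shows.
# Hypothetical helper, for illustration only.
classify_nvrm() {
    log=$(cat)
    case $log in
        *"Can't find an IRQ"*)    echo irq-missing ;;
        *"RmInitAdapter failed"*) echo rm-init-failed ;;
        *"NVRM: loading NVIDIA"*) echo loaded ;;
        *)                        echo unknown ;;
    esac
}
```

Inside the guest it would be used as `dmesg | classify_nvrm`.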
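Nvidia Tesla P10 compute card
With the BIOS restored to factory defaults and Above 4G Decoding not configured, starting the xdev VM with the Nvidia Tesla P10 attached caused the host to kernel panic outright.
I rebooted into the BIOS and applied one step of the Above 4G Decoding setup (changing only a single parameter).
After that, starting the VM no longer panics the host, but the earlier problem is back: the VM with the Nvidia Tesla P10 attached does not boot properly, and VNC shows a black screen.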