.. _libvirt_network_pool_sr-iov:
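======================================================
Managing an SR-IOV Virtual Network Pool with Libvirt
======================================================

:ref:`libvirt` is an API and service for managing virtual devices and hypervisors. It can also manage VFs through a virtual network pool: instead of looking up and configuring PCI device IDs by hand as in :ref:`config_sr-iov_network`, you hand libvirt a single physical NIC (the ``PF``) and reference the pool when creating a KVM guest, and a VF is allocated automatically.

Preparation
===========

As in :ref:`config_sr-iov_network`, first make sure :ref:`iommu` is enabled in the kernel, that is, the kernel command line contains::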
intel_iommu=on iommu=pt
See :ref:`config_sr-iov_network` for how to set these parameters.
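Before going further it can help to confirm that the running kernel was actually booted with these parameters. The following is a minimal sketch; ``check_iommu_cmdline`` is a hypothetical helper name, and the check only looks for the two literal parameters shown above.

.. code-block:: shell

   # check_iommu_cmdline (hypothetical helper): succeed only if both
   # parameters appear as separate words in the given command line.
   check_iommu_cmdline() {
       case " $1 " in
           *" intel_iommu=on "*) ;;
           *) return 1 ;;
       esac
       case " $1 " in
           *" iommu=pt "*) ;;
           *) return 1 ;;
       esac
       return 0
   }

   # Check the command line of the running kernel.
   if check_iommu_cmdline "$(cat /proc/cmdline 2>/dev/null)"; then
       echo "IOMMU parameters present"
   else
       echo "IOMMU parameters missing"
   fi
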
- Enable the VFs::

    for i in {0..3}; do
        n=$((49 + i))
        # Enable 7 VFs on each of eno49 ~ eno52
        echo 7 | sudo tee /sys/class/net/eno${n}/device/sriov_numvfs
    done
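- Enable the VFs automatically at boot:
The VFs can be enabled from the command line (or from ``/etc/rc.d/rc.local`` at boot) as described in :ref:`config_sr-iov_network`. However, the standard and recommended way to configure devices at boot is :ref:`udev` (operations work is, after all, a standardized, collaborative process), so create ``/etc/udev/rules.d/igb.rules`` ::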
ACTION=="add", SUBSYSTEM=="net", ENV{ID_NET_DRIVER}=="igb", ATTR{device/sriov_numvfs}="7"
With this rule, every NIC driven by ``igb`` (here a 4-port Intel I350) gets its VFs configured at boot.
- Check the VFs::
lspci | grep -i i350
The output looks like::
04:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:10.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
04:10.1 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
04:10.2 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
04:10.3 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
04:10.4 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
04:10.5 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
04:10.6 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
04:10.7 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
04:11.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
...
- Verify device details
The physical NIC ``eno49`` has PCI address ``04:00.0``; ``virsh nodedev-list | grep 04_00_0`` shows::
pci_0000_04_00_0
This node device is the physical NIC as libvirt sees it. Inspect it with::
virsh nodedev-dumpxml pci_0000_04_00_0
The output lists the PF and all of its VFs:
.. literalinclude:: libvirt_network_pool_sr-iov/virsh_node-dumpxml_pci_0000_04_00_0.xml
:language: xml
:linenos:
:caption: virsh nodedev-dumpxml pci_0000_04_00_0: the SR-IOV PF and all of its VFs
A single VF can be inspected the same way, for example the first VF, ``pci_0000_04_10_0``::
virsh nodedev-dumpxml pci_0000_04_10_0
This prints the details of that VF:
.. literalinclude:: libvirt_network_pool_sr-iov/virsh_node-dumpxml_pci_0000_04_10_0.xml
:language: xml
:linenos:
:caption: virsh nodedev-dumpxml pci_0000_04_10_0: inspecting a single VF
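The node device names used above follow directly from the PCI addresses printed by ``lspci``: the PCI domain is prepended and ``:`` and ``.`` become ``_``. A minimal sketch of that mapping follows; ``pci_to_nodedev`` is a hypothetical helper name, and it assumes PCI domain ``0000`` when ``lspci`` prints the short form.

.. code-block:: shell

   # pci_to_nodedev (hypothetical helper): turn a PCI address as printed by
   # lspci into the node device name used by virsh nodedev-* commands.
   pci_to_nodedev() {
       local addr="$1"
       case "$addr" in
           ????:??:??.?) ;;                      # full form, e.g. 0000:04:10.0
           ??:??.?) addr="0000:$addr" ;;         # short form: assume domain 0000
           *) echo "unrecognized PCI address: $1" >&2; return 1 ;;
       esac
       # Replace ':' and '.' with '_' and add the pci_ prefix.
       echo "pci_$(printf '%s' "$addr" | tr ':.' '__')"
   }

   pci_to_nodedev 04:00.0    # PF of eno49, prints pci_0000_04_00_0
   pci_to_nodedev 04:10.0    # first VF,    prints pci_0000_04_10_0

The result can be passed straight to ``virsh nodedev-dumpxml``.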
Attaching a VF with More Complex Settings
------------------------------------------

A VLAN can be specified when attaching a VF, for example:
.. literalinclude:: libvirt_network_pool_sr-iov/eno49vf0-vlan.xml
:language: xml
:linenos:
:caption: A more complex example: a VF with a VLAN setting
Then attach it to the guest::
virsh attach-device MyGuest eno49vf0-vlan.xml --live --config
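For orientation, a hard-coded VF interface with a VLAN tag might look roughly like the sketch below. This is not the content of the included file: the file name ``eno49vf0-vlan.example.xml`` and the VLAN tag ``42`` are assumptions made for illustration, and the PCI address is simply the first VF from the ``lspci`` output.

.. code-block:: shell

   # Write an illustrative device definition; every value is an assumption.
   cat > eno49vf0-vlan.example.xml <<'EOF'
   <interface type='hostdev' managed='yes'>
     <source>
       <!-- PCI address of the VF 04:10.0 seen in the lspci output -->
       <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x0'/>
     </source>
     <!-- hypothetical VLAN tag -->
     <vlan>
       <tag id='42'/>
     </vlan>
   </interface>
   EOF

   # Quick sanity check that the VLAN element made it into the file.
   grep -c '<tag id=' eno49vf0-vlan.example.xml
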
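Creating an SR-IOV Virtual Network Pool
========================================

Hard-coding the PCI address of a VF has two drawbacks:

- The specific VF must be available whenever the guest starts, which is tedious for the administrator, who has to map each VF to each guest by hand
- If the guest is migrated to another physical host, that host must have identical hardware at the same PCI bus location; otherwise the guest configuration has to be modified before it can start

To solve both problems, create a libvirt network device pool containing all VFs of an SR-IOV device. Once a guest is configured to reference this network, a VF is allocated from the pool each time the guest starts, and when the guest stops the VF returns to the pool for use by another guest.

- Network pool definition: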
.. literalinclude:: libvirt_network_pool_sr-iov/eno49-sr-iov.xml
:language: xml
:linenos:
:caption: VF network pool for the eno49 NIC
- Load the network pool definition::
virsh net-define eno49-sr-iov.xml
- Mark the defined network for autostart::
virsh net-autostart eno49-sr-iov
- Start the ``eno49-sr-iov`` network pool::
virsh net-start eno49-sr-iov
Then check::
virsh net-list
The output shows::

    Name           State    Autostart   Persistent
    -------------------------------------------------
    default        active   yes         yes
    eno49-sr-iov   active   yes         yes
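As a rough illustration of what a pool definition of this kind contains, the sketch below writes a minimal ``hostdev`` forward network that names ``eno49`` as the PF. It is an assumption about the general shape, not a copy of the included file, and it is written to a separate file name (``eno49-sr-iov.example.xml``) so that nothing is overwritten.

.. code-block:: shell

   # Minimal pool definition: libvirt enumerates the VFs of the named PF.
   cat > eno49-sr-iov.example.xml <<'EOF'
   <network>
     <name>eno49-sr-iov</name>
     <forward mode='hostdev' managed='yes'>
       <pf dev='eno49'/>
     </forward>
   </network>
   EOF

   # Show the PF line as a quick sanity check.
   grep "<pf dev=" eno49-sr-iov.example.xml

With ``managed='yes'`` libvirt takes care of detaching a VF from the host driver when a guest starts and of returning it afterwards.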
Assigning VFs to a VM from the libvirt Network Pool
====================================================

- Create ``vm-sr-iov.xml`` :
.. literalinclude:: libvirt_network_pool_sr-iov/vm-sr-iov.xml
:language: xml
:linenos:
:caption: SR-IOV device XML for the guest
- Attach the device::
virsh attach-device z-k8s-n-1 vm-sr-iov.xml --config
Check the guest's devices::
virsh dumpxml z-k8s-n-1
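The dump shows that a network device stanza has been added to the guest. Oddly, it reports ``type='rtl8139'``, and the address also differs from that of the VF seen earlier.
- Start the guest::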
virsh start z-k8s-n-1
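A device definition that draws from the pool does not name any PCI address; it only references the network. The sketch below shows the smallest form such a file could take. It is an assumption for illustration rather than the content of the included ``vm-sr-iov.xml``, and it uses a separate file name.

.. code-block:: shell

   # Minimal interface that asks the eno49-sr-iov pool for any free VF.
   cat > vm-sr-iov.example.xml <<'EOF'
   <interface type='network'>
     <source network='eno49-sr-iov'/>
   </interface>
   EOF

   # Show which network the interface refers to.
   grep "source network=" vm-sr-iov.example.xml

Because no address is fixed in the file, the same definition can be attached several times or reused on another host that defines a network of the same name.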
vfio Permission Problem
========================

- Start the guest::
virsh start z-k8s-n-1
It fails with::
error: Failed to start domain z-k8s-n-1
error: internal error: qemu unexpectedly closed the monitor: 2021-12-18T15:13:32.733835Z qemu-system-x86_64: -device vfio-pci,host=0000:04:10.0,id=hostdev0,bus=pci.8,addr=0x1: vfio 0000:04:10.0: failed to open /dev/vfio/94: Permission denied
This shows that the ``vfio`` mapping does access ``vfio 0000:04:10.0``, that is, the VF device.
But why is permission denied? Running the command with ``sudo`` gives the same error.
``/var/log/libvirt/qemu/z-k8s-n-1.log`` records::
2021-12-18T15:36:17.735032Z qemu-system-x86_64: -device vfio-pci,host=0000:04:10.0,id=hostdev0,bus=pci.8,addr=0x1: vfio 0000:04:10.0: failed to open /dev/vfio/94: Permission denied
2021-12-18 15:36:17.858+0000: shutting down, reason=failed
`Bug 1196185 - libvirt doesn't set permissions for VFIO endpoint `_ explains::
RHEV by default sets dynamic_ownership=0, which caused the endpoint not to be accessible by qemu (and we explicitly told libvirt not to do it for us). Works with dynamic_ownership=1.
My ``/etc/libvirt/qemu.conf`` contains::
# Whether libvirt should dynamically change file ownership
# to match the configured user/group above. Defaults to 1.
# Set to 0 to disable file ownership changes.
#dynamic_ownership = 1
so the default already appears to be ``1``.
On the host, ``ls -lh /dev/vfio/*`` prints::
crw------- 1 root root 243, 0 Dec 16 09:20 /dev/vfio/39
crw------- 1 root root 243, 1 Dec 16 09:20 /dev/vfio/40
crw------- 1 root root 243, 2 Dec 16 09:20 /dev/vfio/41
crw------- 1 root root 243, 3 Dec 16 09:20 /dev/vfio/79
crw-rw-rw- 1 root root 10, 196 Dec 16 09:20 /dev/vfio/vfio
There is no ``/dev/vfio/94`` device in this listing.
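The number in ``/dev/vfio/94`` is the IOMMU group of the VF, which sysfs exposes as the ``iommu_group`` symlink of the PCI device. The group node normally exists only while a device of that group is bound to ``vfio-pci``, which may be why it is absent from the listing when the guest is not running. A minimal sketch for looking up the expected node; ``vfio_node_for`` and the ``PCI_SYSFS`` override are hypothetical names introduced here.

.. code-block:: shell

   # PCI_SYSFS can be overridden for a dry run against a copy of the tree.
   PCI_SYSFS="${PCI_SYSFS:-/sys/bus/pci/devices}"

   # vfio_node_for (hypothetical helper): print the /dev/vfio/<group> path
   # that QEMU has to open for the given full PCI address.
   vfio_node_for() {
       local link
       link="$(readlink "$PCI_SYSFS/$1/iommu_group")" || return 1
       echo "/dev/vfio/${link##*/}"
   }

   vfio_node_for 0000:04:10.0 || echo "no IOMMU group found for 0000:04:10.0"
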
- Try rebooting the host. After the reboot, run::
virsh start z-k8s-n-1
It now fails with a different error::
error: Failed to start domain z-k8s-n-1
error: internal error: Unable to configure VF 0 of PF 'eno49' because the PF is not online. Please change host network config to put the PF online.
- ``ifconfig -a | grep eno`` confirms that the PFs (``eno49`` through ``eno52``) are indeed not ``UP`` ::

    ...
    eno49: flags=4098<BROADCAST,MULTICAST>  mtu 1500
    ...
    eno49v0: flags=4098<BROADCAST,MULTICAST>  mtu 1500
    eno49v1: flags=4098<BROADCAST,MULTICAST>  mtu 1500
    ...

So how can ``eno49`` be brought up automatically without assigning an IP address? Following `Bring up but don't assign address with Netplan `_ , create ``/etc/netplan/02-eno49-config.yaml`` :
.. literalinclude:: libvirt_network_pool_sr-iov/02-eno49-config.yaml
:language: yaml
:linenos:
:caption: netplan configuration that brings eno49 up without assigning an IP address
Then run::
sudo netplan apply
``ifconfig -a | grep eno`` now shows::

    ...
    eno49: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
    eno49v0: flags=4098<BROADCAST,MULTICAST>  mtu 1500
    eno49v1: flags=4098<BROADCAST,MULTICAST>  mtu 1500
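The two ``flags`` values differ in the lowest bit, ``IFF_UP`` (``0x1``): 4098 is ``0x1002`` and 4163 is ``0x1043``. A small sketch that decodes just this bit; ``iface_is_up`` is a hypothetical helper name.

.. code-block:: shell

   # iface_is_up (hypothetical helper): test the IFF_UP bit (0x1) of an
   # interface flags value such as the one printed by ifconfig.
   iface_is_up() {
       [ $(( $1 & 1 )) -eq 1 ]
   }

   for flags in 4098 4163; do
       if iface_is_up "$flags"; then
           echo "flags=$flags: UP"
       else
           echo "flags=$flags: not UP"
       fi
   done

On a live host the same value is available, in hexadecimal, from ``/sys/class/net/eno49/flags``.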
That resolves the ``UP`` problem for ``eno49``, but ``virsh start z-k8s-n-1`` still fails::
error: Failed to start domain z-k8s-n-1
error: internal error: qemu unexpectedly closed the monitor: 2021-12-19T15:12:47.375350Z qemu-system-x86_64: -device vfio-pci,host=0000:04:10.0,id=hostdev0,bus=pci.8,addr=0x1: vfio 0000:04:10.0: failed to open /dev/vfio/96: Permission denied
I found two possible fixes:
- `Permission denied when using vfio with interface pools `_
The fix given there is to edit ``/etc/apparmor.d/abstractions/libvirt-qemu`` (on ``bionic``) or, on later releases, the local override file ``/etc/apparmor.d/local/abstractions/libvirt-qemu``, changing::
# for vfio hotplug on systems without static vfio (LP: #1775777)
/dev/vfio/vfio rw,
to::
/dev/vfio/* rw,
Since I run a recent release, I added one line to ``/etc/apparmor.d/local/abstractions/libvirt-qemu`` ::
/dev/vfio/* rw,
After that the guest starts normally.
- `failed to open /dev/vfio/13: Permission denied `_ offers another approach: add a :ref:`udev` rule::
SUBSYSTEM=="vfio", OWNER="root", GROUP="kvm"
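This is meant to make all vfio devices readable and writable by qemu. The idea should work, though I have not tried it. Given the ``crw-------`` mode shown earlier, a ``MODE="0660"`` key may also be needed for group access; this is likewise untested.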
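The AppArmor change described above can be applied so that rerunning it does not add the rule twice. The sketch below demonstrates the idea against a scratch file; ``ensure_line`` is a hypothetical helper name, and on the host the target would be ``/etc/apparmor.d/local/abstractions/libvirt-qemu``, which requires root.

.. code-block:: shell

   # ensure_line (hypothetical helper): append a line to a file only if the
   # exact line is not already present.
   ensure_line() {
       local file="$1" line="$2"
       touch "$file" || return 1
       grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
   }

   # Dry run against a scratch file: the rule ends up in the file once,
   # even though the helper is called twice.
   scratch="$(mktemp)"
   ensure_line "$scratch" "/dev/vfio/* rw,"
   ensure_line "$scratch" "/dev/vfio/* rw,"
   cat "$scratch"
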
Checking the Virtual Machine
============================

Once the guest has started, log in to ``z-k8s-n-1`` and check the NICs::
$ lspci | grep -i eth
Two Ethernet devices are present::
01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
07:01.0 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
One of them is the ``Intel I350`` VF.
- Check the interfaces::
ip addr
The output is::
2: ens1: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 52:54:00:2b:4e:d3 brd ff:ff:ff:ff:ff:ff
3: enp1s0: mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:ff:37:67 brd ff:ff:ff:ff:ff:ff
inet 192.168.6.111/24 brd 192.168.6.255 scope global enp1s0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:feff:3767/64 scope link
valid_lft forever preferred_lft forever
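According to the matching interface entry in the ``virsh dumpxml z-k8s-n-1`` output, ``ens1`` is the ``SR-IOV`` device.

Attaching Multiple ``SR-IOV`` NICs
====================================

The plan is to attach 4 ``SR-IOV`` NICs to one guest for later use by containers on the Kubernetes node, so run the attach command again on the same guest::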
virsh attach-device z-k8s-n-1 vm-sr-iov.xml --live --config
``virsh dumpxml z-k8s-n-1`` then shows that the guest does have a second SR-IOV NIC.
Inside the guest, the interfaces now look like this::
2: ens1: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 52:54:00:2b:4e:d3 brd ff:ff:ff:ff:ff:ff
3: enp1s0: mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:ff:37:67 brd ff:ff:ff:ff:ff:ff
inet 192.168.6.111/24 brd 192.168.6.255 scope global enp1s0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:feff:3767/64 scope link
valid_lft forever preferred_lft forever
4: enp8s0: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 52:54:00:47:82:9e brd ff:ff:ff:ff:ff:ff
However, attaching a third SR-IOV NIC::
virsh attach-device z-k8s-n-1 vm-sr-iov.xml --live --config
fails with::
error: Failed to attach device from vm-sr-iov.xml
error: internal error: No more available PCI slots
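Following `libvirtd: No more available PCI slots `_ , drop the ``--live`` option so that only the persistent configuration changes, then restart the guest; libvirt then adds the required pcie-root-port controllers automatically.
Following that advice, I ran the command two more times::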
virsh attach-device z-k8s-n-1 vm-sr-iov.xml --config
After confirming that the configuration of ``z-k8s-n-1`` contains 4 SR-IOV devices, restart the guest. Inside the guest, besides the one virtio-net NIC there are now 4 ``SR-IOV`` NICs::
2: enp1s0: mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:ff:37:67 brd ff:ff:ff:ff:ff:ff
inet 192.168.6.111/24 brd 192.168.6.255 scope global enp1s0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:feff:3767/64 scope link
valid_lft forever preferred_lft forever
3: ens1: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 52:54:00:2b:4e:d3 brd ff:ff:ff:ff:ff:ff
4: ens2: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 52:54:00:47:82:9e brd ff:ff:ff:ff:ff:ff
5: ens3: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 52:54:00:ed:e4:a3 brd ff:ff:ff:ff:ff:ff
6: ens4: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 52:54:00:16:55:cf brd ff:ff:ff:ff:ff:ff
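Attaching the same definition several times to the persistent configuration can be wrapped in a small loop. This is a sketch only: ``attach_vfs`` and the ``VIRSH`` override are hypothetical names, and the example call runs in dry-run mode so that it only prints the commands.

.. code-block:: shell

   # VIRSH can be set to "echo virsh" for a dry run that only prints commands.
   VIRSH="${VIRSH:-virsh}"

   # attach_vfs (hypothetical helper): attach the same pool-backed interface
   # definition COUNT times to the persistent config only (no --live).
   attach_vfs() {
       local domain="$1" xml="$2" count="$3" i=0
       while [ "$i" -lt "$count" ]; do
           $VIRSH attach-device "$domain" "$xml" --config || return 1
           i=$((i + 1))
       done
   }

   # Dry run: print the two commands used above for z-k8s-n-1.
   VIRSH="echo virsh" attach_vfs z-k8s-n-1 vm-sr-iov.xml 2

The guest has to be restarted afterwards for the new devices to appear, as described above.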
- On ``z-k8s-n-2`` I ran the same command 4 times::
virsh attach-device z-k8s-n-2 vm-sr-iov.xml --config
virsh attach-device z-k8s-n-2 vm-sr-iov.xml --config
virsh attach-device z-k8s-n-2 vm-sr-iov.xml --config
virsh attach-device z-k8s-n-2 vm-sr-iov.xml --config
But ``virsh start z-k8s-n-2`` fails::
error: Failed to start domain z-k8s-n-2
error: internal error: network 'eno49-sr-iov' requires exclusive access to interfaces, but none are available
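The cause is that the ``Intel I350`` (the ``igb`` driver) supports only 7 VFs per PF, and the PF itself cannot be passed into a guest. With 4 VFs already held by ``z-k8s-n-1``, only 3 remain, so the second guest can take at most 3 SR-IOV VFs.
Remove the fourth VF with ``virsh edit z-k8s-n-2`` and the guest starts normally.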
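Before sizing a guest it can be useful to see how many VFs of the PF are still free. The sketch below counts the ``virtfn*`` entries of the PF in sysfs and, as an assumption, treats VFs currently bound to ``vfio-pci`` as handed to a guest. ``count_vfs`` and the ``SYSFS_NET`` override are hypothetical names.

.. code-block:: shell

   # SYSFS_NET can be overridden for a dry run against a copy of the tree.
   SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

   # count_vfs (hypothetical helper): print "<total> <bound-to-vfio-pci>"
   # for the VFs of the given PF.
   count_vfs() {
       local pf="$1" total=0 busy=0 vf drv
       for vf in "$SYSFS_NET/$pf"/device/virtfn*; do
           [ -e "$vf" ] || continue
           total=$((total + 1))
           drv="$(readlink "$vf/driver")" || continue
           [ "${drv##*/}" = "vfio-pci" ] && busy=$((busy + 1))
       done
       echo "$total $busy"
   }

   set -- $(count_vfs eno49)
   echo "eno49: $1 VFs, $2 bound to vfio-pci, $(( $1 - $2 )) presumably free"
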
References
==========
- `Configure SR-IOV Network Virtual Functions in Linux KVM `_
- `Red Hat Enterprise Linux > 7 > Virtualization Deployment and Administration Guide > 16.2. PCI Device Assignment with SR-IOV Devices `_