Introduction
Virtio is an open standard that defines a protocol for communication between drivers and devices of different types (see Chapter 5, “Device Types”, of the virtio spec [1]). Originally developed as a standard for paravirtualized devices implemented by a hypervisor, it can be used to interface any compliant device (real or emulated) with a driver.
For illustrative purposes, this document will focus on the common case of a Linux kernel running in a virtual machine and using paravirtualized devices provided by the hypervisor, which exposes them as virtio devices via standard mechanisms such as PCI.
Device - Driver communication: virtqueues
Although the virtio devices are really an abstraction layer in the hypervisor, they’re exposed to the guest as if they are physical devices using a specific transport method – PCI, MMIO or CCW – that is orthogonal to the device itself. The virtio spec defines these transport methods in detail, including device discovery, capabilities and interrupt handling.
The communication between the driver in the guest OS and the device in the hypervisor is done through shared memory (that’s what makes virtio devices so efficient) using specialized data structures called virtqueues, which are actually ring buffers of buffer descriptors similar to the ones used in a network device:
struct vring_desc
Virtio ring descriptors, 16 bytes long. These can chain together via next.
Definition:
struct vring_desc {
    __virtio64 addr;
    __virtio32 len;
    __virtio16 flags;
    __virtio16 next;
};
Members
addr: buffer address (guest-physical)
len: buffer length
flags: descriptor flags
next: index of the next descriptor in the chain, if the VRING_DESC_F_NEXT flag is set. We chain unused descriptors via this, too.
All the buffers the descriptors point to are allocated by the guest and used by the host either for reading or for writing but not for both.
Refer to Chapter 2.5 (“Virtqueues”) of the virtio spec ([1]) for the reference definitions of virtqueues and “Virtqueues and virtio ring: How the data travels” blog post ([2]) for an illustrated overview of how the host device and the guest driver communicate.
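The chaining and the read-or-write rule described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct demo_desc` and `demo_walk_chain()` are hypothetical names, and plain fixed-width integers stand in for the kernel's `__virtio64`/`__virtio32`/`__virtio16` types. The two flag values are the ones given in the virtio spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptor flag values as given in the virtio spec. */
#define VRING_DESC_F_NEXT  1  /* the chain continues at desc.next */
#define VRING_DESC_F_WRITE 2  /* the device writes this buffer (otherwise it only reads it) */

/* Userspace stand-in for struct vring_desc (16 bytes, same field order). */
struct demo_desc {
    uint64_t addr;   /* guest-physical buffer address */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* VRING_DESC_F_* bits */
    uint16_t next;   /* index of the next descriptor, valid if F_NEXT is set */
};

/*
 * Hypothetical helper: walk one chain starting at `head`, summing the bytes
 * the device may read and the bytes it may write. Returns the number of
 * descriptors in the chain, or 0 if the chain is malformed (an index out of
 * range, or more links than table entries, which indicates a loop).
 */
static size_t demo_walk_chain(const struct demo_desc *table, uint16_t table_len,
                              uint16_t head, uint64_t *readable, uint64_t *writable)
{
    size_t count = 0;
    uint16_t i = head;

    assert(table != NULL && readable != NULL && writable != NULL);
    *readable = 0;
    *writable = 0;

    for (;;) {
        if (i >= table_len || count >= table_len)
            return 0;
        /* Each buffer is either device-readable or device-writable, never both. */
        if (table[i].flags & VRING_DESC_F_WRITE)
            *writable += table[i].len;
        else
            *readable += table[i].len;
        count++;
        if (!(table[i].flags & VRING_DESC_F_NEXT))
            return count;
        i = table[i].next;
    }
}
```

A typical request in this model is a chain with one or more device-readable descriptors followed by device-writable ones. The loop guard matters because the descriptor table lives in shared memory and the other side could supply a malformed chain.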
The vring_virtqueue struct models a virtqueue, including the ring buffers and management data. Embedded in this struct is the virtqueue struct, which is the data structure that’s ultimately used by virtio drivers:
struct virtqueue
a queue to register buffers for sending or receiving.
Definition:
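struct virtqueue {
    struct list_head list;
    void (*callback)(struct virtqueue *vq);
    const char *name;
    struct virtio_device *vdev;
    unsigned int index;
    unsigned int num_free;
    unsigned int num_max;
    bool reset;
    void *priv;
};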
Members
list: the chain of virtqueues for this device
callback: the function to call when buffers are consumed (can be NULL).
name: the name of this virtqueue (mainly for debugging)
vdev: the virtio device this queue was created for.
index: the zero-based ordinal number for this queue.
num_free: number of elements we expect to be able to fit.
num_max: the maximum number of elements supported by the device.
priv: a pointer for the virtqueue implementation to use.
reset: vq is in reset state or not.
Description
A note on num_free: with indirect buffers, each buffer needs one element in the queue, otherwise a buffer will need one element per sg element.
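The num_free rule can be written out as simple arithmetic. The helpers below are hypothetical and deliberately simplified: they take the use of indirect buffers as an input and do not model how the kernel decides whether to use them.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: number of queue elements one buffer consumes.
 * With indirect buffers the whole scatter-gather list costs one element;
 * otherwise it costs one element per sg element.
 */
static unsigned int demo_slots_needed(unsigned int sg_elems, bool indirect)
{
    assert(sg_elems > 0);
    return indirect ? 1 : sg_elems;
}

/* Hypothetical helper: would the buffer fit given the current num_free? */
static bool demo_fits(unsigned int num_free, unsigned int sg_elems, bool indirect)
{
    return demo_slots_needed(sg_elems, indirect) <= num_free;
}
```

For example, with num_free equal to 2, a buffer made of three sg elements fits only when indirect buffers are in use.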
The callback function pointed to by this struct is triggered when the device has consumed the buffers provided by the driver. More specifically, the trigger is an interrupt issued by the hypervisor (see vring_interrupt()). Interrupt request handlers are registered for a virtqueue during the transport-specific virtqueue setup process.
irqreturn_t vring_interrupt(int irq, void *_vq)
notify a virtqueue on an interrupt
Parameters
int irq: the IRQ number (ignored)
void *_vq: the struct virtqueue to notify
Description
Calls the callback function of _vq to process the virtqueue notification.
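The path from interrupt to driver callback can be modeled in userspace as follows. This is a sketch under stated assumptions, not the kernel implementation: every `demo_` name is hypothetical, `struct demo_vq` keeps only the fields needed here, and the model omits everything the real handler does besides invoking the callback.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the parts of struct virtqueue used here. */
struct demo_vq {
    void (*callback)(struct demo_vq *vq);  /* may be NULL, as in the real struct */
    const char *name;
    unsigned int completions;              /* hypothetical counter, for the demo only */
};

enum demo_irqreturn { DEMO_IRQ_NONE, DEMO_IRQ_HANDLED };

/*
 * Model of an interrupt handler with the same shape as vring_interrupt():
 * the IRQ number is ignored and the opaque pointer is the virtqueue that
 * was registered together with the handler.
 */
static enum demo_irqreturn demo_vring_interrupt(int irq, void *_vq)
{
    struct demo_vq *vq = _vq;

    (void)irq;
    assert(vq != NULL);
    if (vq->callback)
        vq->callback(vq);
    return DEMO_IRQ_HANDLED;
}

/* Example driver callback: a real driver would collect the used buffers here. */
static void demo_on_buffers_used(struct demo_vq *vq)
{
    vq->completions++;
}
```

Passing the virtqueue as the opaque handler argument is what lets one handler function serve every queue: the transport registers it once per virtqueue with a different `_vq` pointer.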
Device discovery and probing
In the kernel, the virtio core contains the virtio bus driver and transport-specific drivers like virtio-pci and virtio-mmio. Then there are individual virtio drivers for specific device types that are registered to the virtio bus driver.
How a virtio device is found and configured by the kernel depends on how the hypervisor defines it. Take the QEMU virtio-console device as an example: when using PCI as a transport method, the device presents itself on the PCI bus with vendor 0x1af4 (Red Hat, Inc.) and device id 0x1003 (virtio console), as defined in the spec, so the kernel detects it as it would any other PCI device.
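The identification step can be sketched as an ID-table lookup. The code below is a simplified userspace model, not the kernel's PCI matching code: all `demo_`/`DEMO_` names are hypothetical, and a 16-bit wildcard value is an assumption made for this model only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VIRTIO_PCI_VENDOR 0x1af4  /* vendor ID quoted in the text */
#define DEMO_PCI_ANY_ID        0xffff  /* wildcard used by this model only */

struct demo_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* One entry: any device ID from the virtio vendor matches. */
static const struct demo_pci_id demo_virtio_pci_ids[] = {
    { DEMO_VIRTIO_PCI_VENDOR, DEMO_PCI_ANY_ID },
};

/* Hypothetical helper: does (vendor, device) match any entry in the table? */
static bool demo_pci_match(const struct demo_pci_id *table, size_t n,
                           uint16_t vendor, uint16_t device)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        bool vendor_ok = table[i].vendor == vendor;
        bool device_ok = table[i].device == DEMO_PCI_ANY_ID ||
                         table[i].device == device;
        if (vendor_ok && device_ok)
            return true;
    }
    return false;
}
```

In this model the virtio-console device from the example (vendor 0x1af4, device 0x1003) matches, while a device with the same device ID from another vendor does not.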
During the PCI enumeration process, if a device is found to match the virtio-pci driver (according to the virtio-pci device table, any PCI device with vendor id = 0x1af4):
/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
then the virtio-pci driver is probed and, if the probing goes well, the device is registered to the virtio bus:
static int virtio_pci_probe(struct pci_dev *pci_dev,
When the device is registered to the virtio bus the kernel will look for a driver in the bus that can handle the device and call that driver’s probe method.
At this point, the virtqueues will be allocated and configured by calling the appropriate virtio_find helper function, such as virtio_find_single_vq() or virtio_find_vqs(), which will end up calling a transport-specific find_vqs method.
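The sequence above (bus match, driver probe, transport-specific find_vqs) can be modeled in a few lines of userspace C. This is only an illustration of the layering: every `demo_` name is hypothetical, the structs keep just the fields needed for the example, and the "transport" merely records how many queues were requested instead of allocating rings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_vdev;

/* Transport-specific operations (PCI, MMIO, ...), reduced to find_vqs. */
struct demo_config_ops {
    int (*find_vqs)(struct demo_vdev *vdev, unsigned int nvqs);
};

struct demo_vdev {
    uint32_t device_id;                    /* virtio device type, e.g. 3 for console */
    const struct demo_config_ops *config;  /* filled in by the transport */
    unsigned int nvqs_ready;               /* demo-only bookkeeping */
};

struct demo_driver {
    uint32_t device_id;                    /* device type this driver handles */
    int (*probe)(struct demo_vdev *vdev);
};

/* Transport-independent helper: forwards to the transport's find_vqs. */
static int demo_find_vqs(struct demo_vdev *vdev, unsigned int nvqs)
{
    assert(vdev && vdev->config && vdev->config->find_vqs);
    return vdev->config->find_vqs(vdev, nvqs);
}

/* Bus side: find a driver for the device and call its probe method. */
static int demo_register_device(struct demo_vdev *vdev,
                                const struct demo_driver *drivers, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (drivers[i].device_id == vdev->device_id)
            return drivers[i].probe(vdev);
    return -1;  /* no driver bound */
}

/* A pretend PCI transport: "sets up" the queues by recording the count. */
static int demo_pci_find_vqs(struct demo_vdev *vdev, unsigned int nvqs)
{
    vdev->nvqs_ready = nvqs;
    return 0;
}

static const struct demo_config_ops demo_pci_ops = { demo_pci_find_vqs };

/* A pretend console driver that asks for two queues in its probe. */
static int demo_console_probe(struct demo_vdev *vdev)
{
    return demo_find_vqs(vdev, 2);
}
```

The point of the indirection is that the device driver never needs to know which transport the device sits on: it calls the generic helper, and the function pointer installed by the transport does the transport-specific work.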