Virtio on Linux

Introduction

Virtio is an open standard that defines a protocol for communication between drivers and devices of different types; see Chapter 5 (“Device Types”) of the virtio spec ([1]). Originally developed as a standard for paravirtualized devices implemented by a hypervisor, it can be used to interface any compliant device (real or emulated) with a driver.

For illustrative purposes, this document will focus on the common case of a Linux kernel running in a virtual machine and using paravirtualized devices provided by the hypervisor, which exposes them as virtio devices via standard mechanisms such as PCI.

Device - Driver communication: virtqueues

Although virtio devices are really an abstraction layer in the hypervisor, they are exposed to the guest as if they were physical devices, using a specific transport method (PCI, MMIO or CCW) that is orthogonal to the device itself. The virtio spec defines these transport methods in detail, including device discovery, capabilities and interrupt handling.

The communication between the driver in the guest OS and the device in the hypervisor is done through shared memory (that’s what makes virtio devices so efficient) using specialized data structures called virtqueues, which are actually ring buffers¹ of buffer descriptors similar to the ones used in a network device:

struct vring_desc

Virtio ring descriptors, 16 bytes long. These can chain together via next.

Definition:

struct vring_desc {
	__virtio64 addr;
	__virtio32 len;
	__virtio16 flags;
	__virtio16 next;
};

Members

  • addr

    buffer address (guest-physical)

  • len

    buffer length

  • flags

    descriptor flags

  • next

    index of the next descriptor in the chain, if the VRING_DESC_F_NEXT flag is set. We chain unused descriptors via this, too.

All the buffers the descriptors point to are allocated by the guest and used by the host either for reading or for writing but not for both.
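Chaining via next and the read-or-write rule can be illustrated with a small user-space model. This is a sketch, not kernel code: struct model_vring_desc and walk_chain() are hypothetical names, the fields use native-endian fixed-width types instead of the kernel's __virtio types, and the flag values match VRING_DESC_F_NEXT (1) and VRING_DESC_F_WRITE (2) from the virtio spec. It also assumes the spec's rule that, within one chain, device-readable descriptors come before device-writable ones.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined by the virtio spec (split virtqueue). */
#define VRING_DESC_F_NEXT  1 /* chain continues via 'next' */
#define VRING_DESC_F_WRITE 2 /* device-writable; otherwise device-readable */

/* Native-endian stand-in for struct vring_desc (hypothetical). */
struct model_vring_desc {
	uint64_t addr;  /* guest-physical buffer address */
	uint32_t len;   /* buffer length in bytes */
	uint16_t flags;
	uint16_t next;  /* valid only if VRING_DESC_F_NEXT is set */
};

_Static_assert(sizeof(struct model_vring_desc) == 16, "descriptor is 16 bytes");

/*
 * Walk the chain starting at 'head' in a table of 'num' descriptors and
 * add up device-readable (out) and device-writable (in) bytes.
 * Returns 0 on success, -1 for a malformed chain: an out-of-range index,
 * a loop, or a readable descriptor placed after a writable one.
 */
static int walk_chain(const struct model_vring_desc *table, uint16_t num,
		      uint16_t head, uint64_t *out_bytes, uint64_t *in_bytes)
{
	uint16_t i = head;
	unsigned int seen = 0;
	int saw_write = 0;

	*out_bytes = 0;
	*in_bytes = 0;
	for (;;) {
		if (i >= num || ++seen > num)
			return -1; /* bad index, or more hops than descriptors: a loop */
		if (table[i].flags & VRING_DESC_F_WRITE) {
			saw_write = 1;
			*in_bytes += table[i].len;
		} else {
			if (saw_write)
				return -1; /* readable after writable is not allowed */
			*out_bytes += table[i].len;
		}
		if (!(table[i].flags & VRING_DESC_F_NEXT))
			return 0; /* end of chain */
		i = table[i].next;
	}
}
```

For example, a chain shaped like a virtio-blk read request (a 16-byte readable header, followed by a writable data buffer and a writable status byte) walks cleanly, while the same chain with the final descriptor made readable is rejected. Note that descriptors in a chain need not be adjacent in the table; only next links them.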

Refer to Chapter 2.5 (“Virtqueues”) of the virtio spec ([1]) for the reference definitions of virtqueues, and to the “Virtqueues and virtio ring: How the data travels” blog post ([2]) for an illustrated overview of how the host device and the guest driver communicate.

The vring_virtqueue struct models a virtqueue, including the ring buffers and management data. Embedded in this struct is the virtqueue struct, which is the data structure that’s ultimately used by virtio drivers:

struct virtqueue

A queue to register buffers for sending or receiving.

Definition:

struct virtqueue {
	struct list_head list;
	void (*callback)(struct virtqueue *vq);
	const char *name;
	struct virtio_device *vdev;
	unsigned int index;
	unsigned int num_free;
	unsigned int num_max;
	void *priv;
	bool reset;
};

Members

  • list

    the chain of virtqueues for this device

  • callback

    the function to call when buffers are consumed (can be NULL).

  • name

    the name of this virtqueue (mainly for debugging)

  • vdev

    the virtio device this queue was created for.

  • index

    the zero-based ordinal number for this queue.

  • num_free

    number of elements we expect to be able to fit.

  • num_max

    the maximum number of elements supported by the device.

  • priv

    a pointer for the virtqueue implementation to use.

  • reset

    whether the vq is in the reset state.

Description

A note on num_free: with indirect buffers, each buffer needs only one element in the queue; otherwise, a buffer needs one element per scatter-gather (sg) element.
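The accounting in this note reduces to a one-line rule. The following is a minimal sketch with hypothetical helper names (slots_needed(), buffer_fits()); it models only the slot arithmetic and leaves out the conditions under which the kernel actually chooses an indirect table, such as whether the VIRTIO_RING_F_INDIRECT_DESC feature was negotiated.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Ring slots consumed by one buffer made of 'num_sg' scatter-gather
 * entries: one slot if the entries are placed in an indirect table,
 * otherwise one slot per entry.
 */
static unsigned int slots_needed(unsigned int num_sg, bool indirect)
{
	return indirect ? 1 : num_sg;
}

/* Would a buffer of 'num_sg' entries fit, given 'num_free' free slots? */
static bool buffer_fits(unsigned int num_free, unsigned int num_sg, bool indirect)
{
	return num_sg > 0 && slots_needed(num_sg, indirect) <= num_free;
}
```

With two free slots, a three-element buffer does not fit as a plain chain but does fit when described through an indirect table, which is why num_free is only an expectation of how many elements will fit.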

The callback function pointed to by this struct is triggered when the device has consumed the buffers provided by the driver. More specifically, the trigger will be an interrupt issued by the hypervisor (see vring_interrupt()). Interrupt request handlers are registered for a virtqueue during the virtqueue setup process (transport-specific).

irqreturn_t vring_interrupt(int irq, void *_vq)

notify a virtqueue on an interrupt

Parameters

  • int irq

    the IRQ number (ignored)

  • void *_vq

    the struct virtqueue to notify

Description

Calls the callback function of _vq to process the virtqueue notification.
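The path from interrupt to driver callback can be sketched in user space. This follows the general shape of vring_interrupt() (ignore irq, report "not handled" when the device has returned nothing, otherwise call the callback if one is set), but all model_* names, example_callback() and the pending_used counter are hypothetical stand-ins for the real used-ring bookkeeping. The kernel function also handles further cases, such as a virtqueue marked broken.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's irqreturn_t values (hypothetical). */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_vq {
	void (*callback)(struct model_vq *vq); /* may be NULL */
	unsigned int pending_used;            /* buffers the device has consumed */
};

/*
 * Simplified handler: if the device has returned buffers, invoke the
 * driver's callback; otherwise report that the interrupt was not ours.
 */
static enum model_irqreturn model_vring_interrupt(int irq, void *_vq)
{
	struct model_vq *vq = _vq;

	(void)irq; /* ignored, like the irq argument of vring_interrupt() */
	if (vq->pending_used == 0)
		return MODEL_IRQ_NONE;
	if (vq->callback)
		vq->callback(vq);
	return MODEL_IRQ_HANDLED;
}

static unsigned int reclaimed;

/* Example driver callback: reclaim every buffer the device has used. */
static void example_callback(struct model_vq *vq)
{
	reclaimed += vq->pending_used;
	vq->pending_used = 0;
}
```

Returning the "not handled" value when nothing is pending matters for shared interrupt lines, where the kernel may ask several handlers in turn.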

Device discovery and probing

In the kernel, the virtio core contains the virtio bus driver and transport-specific drivers like virtio-pci and virtio-mmio. Then there are individual virtio drivers for specific device types that are registered to the virtio bus driver.

How a virtio device is found and configured by the kernel depends on how the hypervisor defines it. Take the QEMU virtio-console device as an example. When using PCI as a transport method, the device will present itself on the PCI bus with vendor ID 0x1af4 (Red Hat, Inc.) and device ID 0x1003 (virtio console), as defined in the spec, so the kernel will detect it as it would any other PCI device.
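The ID 0x1003 is what the spec calls a transitional device ID, usable by both legacy and modern drivers. The spec also defines modern-only IDs, computed as 0x1040 plus the virtio device ID (3 for the console, giving 0x1043). The sketch below expresses that mapping; virtio_type_from_pci() and MODEL_VIRTIO_VENDOR are hypothetical names, and it assumes, as the spec requires for transitional devices, that the PCI subsystem device ID carries the virtio device ID.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_VIRTIO_VENDOR 0x1af4 /* Red Hat, Inc. (ID donated by Qumranet) */

/*
 * Derive the virtio device type from PCI IDs:
 *   0x1000-0x103f: transitional device, type is the PCI subsystem device ID
 *   0x1040-0x107f: modern device, type is (device ID - 0x1040)
 * Returns -1 if the IDs do not describe a virtio device.
 */
static int virtio_type_from_pci(uint16_t vendor, uint16_t device,
				uint16_t subsys_device)
{
	if (vendor != MODEL_VIRTIO_VENDOR)
		return -1;
	if (device >= 0x1000 && device <= 0x103f)
		return subsys_device;
	if (device >= 0x1040 && device <= 0x107f)
		return device - 0x1040;
	return -1;
}
```

The virtio-pci device table matches every device ID from vendor 0x1af4, so a range check like this one still has to happen somewhere; in current kernels it is done in the legacy and modern probe paths.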

During the PCI enumeration process, if a device is found to match the virtio-pci driver (according to the virtio-pci device table, any PCI device with vendor ID 0x1af4):

/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
static const struct pci_device_id virtio_pci_id_table[] = {
	{ PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
	{ 0 }
};

then the virtio-pci driver is probed and, if the probing goes well, the device is registered to the virtio bus:

static int virtio_pci_probe(struct pci_dev *pci_dev,
			    const struct pci_device_id *id)
{
	...

	if (force_legacy) {
		rc = virtio_pci_legacy_probe(vp_dev);
		/* Also try modern mode if we can't map BAR0 (no IO space). */
		if (rc == -ENODEV || rc == -ENOMEM)
			rc = virtio_pci_modern_probe(vp_dev);
		if (rc)
			goto err_probe;
	} else {
		rc = virtio_pci_modern_probe(vp_dev);
		if (rc == -ENODEV)
			rc = virtio_pci_legacy_probe(vp_dev);
		if (rc)
			goto err_probe;
	}

	...

	rc = register_virtio_device(&vp_dev->vdev);

	...
}
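The transport selection in virtio_pci_probe() can be isolated into a small, testable function. In this sketch (hypothetical names throughout), the two probe routines are passed in as function pointers so the fallback order can be exercised with stubs; the error codes are the standard errno values used in the kernel excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*probe_fn)(void *dev);

/*
 * Same fallback order as virtio_pci_probe():
 *  - force_legacy: try legacy first; on -ENODEV or -ENOMEM, try modern.
 *  - otherwise:    try modern first; on -ENODEV, try legacy.
 * Returns 0 on success or the last negative error code.
 */
static int choose_transport(bool force_legacy, probe_fn legacy,
			    probe_fn modern, void *dev)
{
	int rc;

	if (force_legacy) {
		rc = legacy(dev);
		if (rc == -ENODEV || rc == -ENOMEM)
			rc = modern(dev);
	} else {
		rc = modern(dev);
		if (rc == -ENODEV)
			rc = legacy(dev);
	}
	return rc;
}

/* Stub probes that record which path ran last (hypothetical). */
enum { PATH_NONE, PATH_LEGACY, PATH_MODERN };
static int last_path = PATH_NONE;

static int legacy_ok(void *dev)    { (void)dev; last_path = PATH_LEGACY; return 0; }
static int legacy_nomem(void *dev) { (void)dev; last_path = PATH_LEGACY; return -ENOMEM; }
static int modern_ok(void *dev)    { (void)dev; last_path = PATH_MODERN; return 0; }
static int modern_nodev(void *dev) { (void)dev; last_path = PATH_MODERN; return -ENODEV; }
```

The asymmetry follows the kernel excerpt: only when force_legacy is set does -ENOMEM also trigger a retry, matching the comment about BAR0 not being mappable when there is no I/O space.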

When the device is registered to the virtio bus, the kernel looks for a driver on the bus that can handle the device and calls that driver’s probe method.
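The driver lookup on the virtio bus can be modeled as an ID-table match. This is a simplified sketch after the matching the virtio bus performs (in current kernels, virtio_dev_match()): each driver lists virtio device IDs, and a wildcard value like VIRTIO_DEV_ANY_ID may stand in for the vendor. All model_* names, the example tables and find_driver() are hypothetical; the real bus code goes through the kernel's driver core rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_VIRTIO_DEV_ANY_ID 0xffffffffu /* wildcard, like VIRTIO_DEV_ANY_ID */

struct model_virtio_id {
	uint32_t device; /* virtio device type, e.g. 3 = console */
	uint32_t vendor;
};

struct model_driver {
	const char *name;
	const struct model_virtio_id *id_table; /* terminated by device == 0 */
};

/* Does one table entry match the device's IDs? */
static int id_match(const struct model_virtio_id *dev,
		    const struct model_virtio_id *id)
{
	if (id->device != dev->device && id->device != MODEL_VIRTIO_DEV_ANY_ID)
		return 0;
	return id->vendor == MODEL_VIRTIO_DEV_ANY_ID || id->vendor == dev->vendor;
}

/* Return the first driver whose table matches the device, or NULL. */
static const struct model_driver *
find_driver(const struct model_driver *drv, size_t ndrv,
	    const struct model_virtio_id *dev)
{
	for (size_t d = 0; d < ndrv; d++)
		for (size_t i = 0; drv[d].id_table[i].device; i++)
			if (id_match(dev, &drv[d].id_table[i]))
				return &drv[d];
	return NULL;
}

static const struct model_virtio_id net_ids[] = {
	{ 1, MODEL_VIRTIO_DEV_ANY_ID }, { 0, 0 } };
static const struct model_virtio_id console_ids[] = {
	{ 3, MODEL_VIRTIO_DEV_ANY_ID }, { 0, 0 } };
static const struct model_driver drivers[] = {
	{ "virtio_net", net_ids },
	{ "virtio_console", console_ids },
};
```

A console device (type 3) is bound to the console driver, while a device type with no registered driver simply stays unbound until a matching driver is loaded.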

During the driver’s probe, the virtqueues are allocated and configured by calling the appropriate virtio_find helper function, such as virtio_find_single_vq() or virtio_find_vqs(), which ends up calling a transport-specific find_vqs method.
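The split between a generic helper and a transport-specific find_vqs method can be sketched as a function-pointer dispatch. In the kernel, find_vqs is a member of struct virtio_config_ops, but its exact parameter list has changed across kernel versions, so this sketch uses simplified hypothetical types (model_config_ops, model_queue, find_single_vq(), fake_find_vqs()) instead of the real signatures.

```c
#include <assert.h>
#include <stddef.h>

struct model_queue {
	const char *name;
	unsigned int index;
};

struct model_vdev;

/* Transport-specific hook, in the spirit of virtio_config_ops.find_vqs. */
struct model_config_ops {
	int (*find_vqs)(struct model_vdev *vdev, unsigned int nvqs,
			struct model_queue *vqs[], const char *const names[]);
};

struct model_vdev {
	const struct model_config_ops *config;
	struct model_queue pool[4]; /* storage owned by the fake transport */
};

/* Generic helper: ask whichever transport is bound for a single queue. */
static struct model_queue *find_single_vq(struct model_vdev *vdev, const char *name)
{
	struct model_queue *vq;
	const char *const names[] = { name };
	int err = vdev->config->find_vqs(vdev, 1, &vq, names);

	return err ? NULL : vq;
}

/* Fake transport: hand out queues from a fixed pool, in order. */
static int fake_find_vqs(struct model_vdev *vdev, unsigned int nvqs,
			 struct model_queue *vqs[], const char *const names[])
{
	if (nvqs > sizeof(vdev->pool) / sizeof(vdev->pool[0]))
		return -1;
	for (unsigned int i = 0; i < nvqs; i++) {
		vdev->pool[i].name = names[i];
		vdev->pool[i].index = i;
		vqs[i] = &vdev->pool[i];
	}
	return 0;
}

static const struct model_config_ops fake_ops = { fake_find_vqs };
```

Because the driver only ever calls the generic helper, the same driver code works unchanged whether the device arrived over PCI, MMIO or CCW; only the function behind config->find_vqs differs.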
