2022-10-06

Input–output memory management unit

From Wikipedia, the free encyclopedia

In computing, an input–output memory management unit (IOMMU) is a memory management unit (MMU) connecting a direct-memory-access–capable (DMA-capable) I/O bus to the main memory. Like a traditional MMU, which translates CPU-visible virtual addresses to physical addresses, the IOMMU maps device-visible virtual addresses (also called device addresses or I/O addresses in this context) to physical addresses. Some units also provide memory protection from faulty or malicious devices.

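In computing, the input–output memory management unit (IOMMU) is the memory management unit (MMU) that sits between a direct-memory-access-capable (DMA-capable) I/O bus, one whose devices can access memory directly, and main memory. Where a traditional MMU translates CPU-visible virtual addresses into physical addresses, the IOMMU maps device-visible virtual addresses (also called device addresses or I/O addresses) to physical addresses. Some units also protect memory against faulty or compromised devices.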
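The translation and protection described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a single-level table, a 4096-byte page, and hypothetical names (`ToyIOMMU`, `map_page`, `translate`, `IOMMUFault`) that do not correspond to any real hardware or driver interface.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class IOMMUFault(Exception):
    """Raised when a device touches an address it has no valid mapping for."""


class ToyIOMMU:
    def __init__(self):
        # I/O page table: device-visible page number -> (physical page number, writable)
        self.table = {}

    def map_page(self, io_page, phys_page, writable=True):
        """Install one mapping; in a real system only the OS could do this."""
        self.table[io_page] = (phys_page, writable)

    def translate(self, io_addr, write=False):
        """Translate a device-visible address, or fault if access is not allowed."""
        io_page, offset = divmod(io_addr, PAGE_SIZE)
        entry = self.table.get(io_page)
        if entry is None:
            raise IOMMUFault(f"no mapping for I/O address {io_addr:#x}")
        phys_page, writable = entry
        if write and not writable:
            raise IOMMUFault(f"write to read-only I/O address {io_addr:#x}")
        return phys_page * PAGE_SIZE + offset


iommu = ToyIOMMU()
iommu.map_page(io_page=0x10, phys_page=0x8A3F2)
phys = iommu.translate(0x10 * PAGE_SIZE + 0x24)
print(hex(phys))
```

An access to any page that was never passed to `map_page` raises `IOMMUFault`, which is the toy counterpart of the memory protection some units provide.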

An example IOMMU is the graphics address remapping table (GART) used by AGP and PCI Express graphics cards on Intel Architecture and AMD computers.

On the x86 architecture, prior to splitting the functionality of northbridge and southbridge between the CPU and Platform Controller Hub (PCH), I/O virtualization was not performed by the CPU but instead by the chipset.[1][2]

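For example, the graphics address remapping table (GART) used by AGP and PCI Express graphics cards on Intel Architecture and AMD computers is an IOMMU.

On the x86 architecture, before the northbridge and southbridge functions were divided between the CPU and the Platform Controller Hub (PCH), I/O virtualization was handled by the chipset rather than by the CPU.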

The advantages of having an IOMMU, compared to direct physical addressing of the memory (DMA), include:

Large regions of memory can be allocated without the need to be contiguous in physical memory – the IOMMU maps contiguous virtual addresses to the underlying fragmented physical addresses. Thus, the use of vectored I/O (scatter-gather lists) can sometimes be avoided.

Devices that do not support memory addresses long enough to address the entire physical memory can still address the entire memory through the IOMMU, avoiding overheads associated with copying buffers to and from the peripheral’s addressable memory space.

For example, x86 computers can address more than 4 gigabytes of memory with the Physical Address Extension (PAE) feature in an x86 processor. Still, an ordinary 32-bit PCI device simply cannot address the memory above the 4 GiB boundary, and thus it cannot directly access it. Without an IOMMU, the operating system would have to implement time-consuming bounce buffers (also known as double buffers[3]).

Memory is protected from malicious devices that are attempting DMA attacks and faulty devices that are attempting errant memory transfers because a device cannot read or write to memory that has not been explicitly allocated (mapped) for it. The memory protection is based on the fact that the OS running on the CPU (see figure) exclusively controls both the MMU and the IOMMU. The devices are physically unable to circumvent or corrupt configured memory management tables.

In virtualization, guest operating systems can use hardware that is not specifically made for virtualization. Higher-performance hardware such as graphics cards uses DMA to access memory directly; in a virtual environment all memory addresses are re-mapped by the virtual machine software, which causes DMA devices to fail. The IOMMU handles this re-mapping, allowing the native device drivers to be used in a guest operating system.

In some architectures, the IOMMU also performs hardware interrupt re-mapping, in a manner similar to standard memory address re-mapping.

Peripheral memory paging can be supported by an IOMMU. A peripheral using the PCI-SIG PCIe Address Translation Services (ATS) Page Request Interface (PRI) extension can detect and signal the need for memory manager services.

For system architectures in which port I/O is a distinct address space from the memory address space, an IOMMU is not used when the CPU communicates with devices via I/O ports. In system architectures in which port I/O and memory are mapped into a suitable address space, an IOMMU can translate port I/O accesses.
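The first two advantages above (contiguous device addresses over fragmented memory, and narrow devices reaching high memory) can be sketched together. The frame numbers, the I/O base address, and the helper names `map_contiguous` and `translate` are all made up for illustration; a real IOMMU driver allocates and programs these mappings very differently.

```python
PAGE_SIZE = 4096  # assumed page size


def map_contiguous(io_table, io_base, phys_frames):
    """Map scattered physical frames at consecutive I/O pages starting at io_base."""
    assert io_base % PAGE_SIZE == 0, "I/O base must be page aligned"
    first_page = io_base // PAGE_SIZE
    for i, frame in enumerate(phys_frames):
        io_table[first_page + i] = frame
    return io_base, len(phys_frames) * PAGE_SIZE


def translate(io_table, io_addr):
    """Look up the physical address behind a device-visible address."""
    page, offset = divmod(io_addr, PAGE_SIZE)
    return io_table[page] * PAGE_SIZE + offset


# Hypothetical frames: scattered, and two of them lie above 4 GiB
# (frame number >= 2**20 with 4096-byte pages).
frames = [0x0012A, 0x140000, 0x00007, 0x1FFFFF]
io_table = {}
base, length = map_contiguous(io_table, 0x0010_0000, frames)

# The device sees one contiguous 16 KiB region that fits in 32 bits...
device_addrs = [base + i * PAGE_SIZE for i in range(len(frames))]
# ...while the backing memory is fragmented and partly above 4 GiB.
targets = [translate(io_table, a) for a in device_addrs]
for dev, phys in zip(device_addrs, targets):
    print(f"device {dev:#010x} -> physical {phys:#011x}")
```

In this sketch the device is handed a single base address and length, so no scatter-gather list is needed, and every address it uses stays below 2**32 even though some of the data lives above 4 GiB.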

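Compared with direct physical addressing of memory (DMA), an IOMMU offers these advantages:

- Large regions of memory can be allocated without being physically contiguous: the IOMMU presents contiguous virtual addresses while the underlying physical pages may be scattered. Vectored I/O (scatter-gather lists) can therefore sometimes be avoided.
- A device whose addresses are too short to cover all of physical memory can still reach all of it through the IOMMU, which avoids the cost of copying buffers into and out of the range the peripheral can address.
  - For example, an x86 computer can address more than 4 GiB of memory using Physical Address Extension (PAE), but an ordinary 32-bit PCI device cannot address, and so cannot directly access, memory above the 4 GiB boundary. Without an IOMMU, the operating system has to fall back on time-consuming bounce buffers (also called double buffers).
- Memory is protected from malicious devices attempting DMA attacks and from faulty devices attempting errant transfers, because a device cannot read or write memory that has not been explicitly mapped for it. The protection rests on the fact that the OS running on the CPU exclusively controls both the MMU and the IOMMU; devices are physically unable to bypass or corrupt the configured memory management tables.
- In virtualization, a guest operating system can use hardware that was not designed for virtualization (that is, ordinary physical hardware). High-performance hardware such as graphics cards uses DMA to access memory directly. In a virtual environment every memory address is remapped by the virtual machine software, which would make DMA devices fail. The IOMMU performs this remapping, so native device drivers can be used in the guest operating system.
- On some architectures the IOMMU also remaps hardware interrupts, in much the same way as it remaps memory addresses.
- An IOMMU can support peripheral memory paging. A peripheral that uses the PCI-SIG PCIe Address Translation Services (ATS) Page Request Interface (PRI) extension can detect and signal the need for memory manager services.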

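On system architectures where port I/O is an address space separate from the memory address space, the IOMMU is not involved when the CPU talks to devices through I/O ports. On architectures where port I/O and memory are mapped into a suitable address space, the IOMMU can translate port I/O accesses.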
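For the 32-bit PCI case in the list above, the fallback an operating system needs without an IOMMU can be sketched as a simple decision: if any byte of a buffer lies beyond the device's addressing limit, the data has to be copied through a low "bounce" buffer. The names `needs_bounce` and `prepare_dma`, the mask constant, and the example addresses are assumptions made for this illustration, not a real kernel API.

```python
DMA_MASK_32 = (1 << 32) - 1  # highest address a 32-bit device can generate


def needs_bounce(phys_addr, length, dma_mask=DMA_MASK_32):
    """True if any byte of the buffer lies beyond what the device can address."""
    return phys_addr + length - 1 > dma_mask


def prepare_dma(phys_addr, length, bounce_addr, dma_mask=DMA_MASK_32):
    """Return (address to hand to the device, bytes the CPU must copy).

    bounce_addr is a hypothetical pre-allocated buffer in low memory.
    """
    if needs_bounce(phys_addr, length, dma_mask):
        return bounce_addr, length  # extra CPU copy: the cost of bounce buffers
    return phys_addr, 0  # device can reach the buffer directly


# A buffer at 5 GiB is out of reach for the device and must be bounced.
high = prepare_dma(0x1_4000_0000, 8192, bounce_addr=0x0200_0000)
# A buffer below 4 GiB is used in place with no copy.
low = prepare_dma(0x0800_0000, 8192, bounce_addr=0x0200_0000)
print(high, low)
```

The second element of each result is the number of bytes copied per transfer, which is the overhead an IOMMU mapping avoids.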

The disadvantages of having an IOMMU, compared to direct physical addressing of the memory, include:[4]

Some degradation of performance from translation and management overhead (e.g., page table walks).

Consumption of physical memory for the added I/O page (translation) tables. This can be mitigated if the tables can be shared with the processor.

To keep page tables small, the granularity of many IOMMUs equals the memory page size (often 4096 bytes), so each small buffer that needs protection against DMA attacks has to be page-aligned and zeroed before being made visible to the device. Because of the complexity of OS memory allocation, device drivers then need to use bounce buffers for sensitive data structures, which decreases overall performance.

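Compared with direct physical addressing of memory, an IOMMU has these disadvantages:

- Some performance is lost to translation and management overhead (for example, page table walks).
- The additional I/O page (translation) tables consume physical memory. Sharing the tables with the processor can mitigate this.
- To keep the page tables small, most IOMMUs work at the granularity of a memory page (usually 4096 bytes). Each small buffer that needs protection against DMA attacks must therefore be page-aligned and zeroed before it is exposed to the device. Because OS memory allocation is complex, device drivers end up using bounce buffers for sensitive data structures, which lowers performance because of the extra copying.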
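The granularity problem in the last item can be made concrete with a little arithmetic: mapping a small buffer exposes every page it touches. The sketch below assumes 4096-byte pages and uses hypothetical helper names and buffer addresses.

```python
PAGE_SIZE = 4096  # assumed IOMMU mapping granularity


def exposed_range(buf_addr, buf_len, page_size=PAGE_SIZE):
    """Return the [start, end) range that becomes device-visible for a buffer."""
    start = buf_addr - buf_addr % page_size
    end = ((buf_addr + buf_len + page_size - 1) // page_size) * page_size
    return start, end


def over_exposed_bytes(buf_addr, buf_len, page_size=PAGE_SIZE):
    """Bytes the device can reach that are not part of the buffer itself."""
    start, end = exposed_range(buf_addr, buf_len, page_size)
    return (end - start) - buf_len


# A 256-byte buffer that happens to straddle a page boundary.
straddling = over_exposed_bytes(0x7F3A_1F80, 256)
# A 64-byte buffer at the start of a page.
aligned = over_exposed_bytes(0x5000, 64)
print(straddling, aligned)
```

Whatever else shares those pages becomes reachable by the device, which is why sensitive data has to be moved to dedicated, zeroed, page-aligned buffers before mapping.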

When an operating system is running inside a virtual machine, including systems that use paravirtualization, such as Xen and KVM, it does not usually know the host-physical addresses of memory that it accesses. This makes providing direct access to the computer hardware difficult, because if the guest OS tried to instruct the hardware to perform a direct memory access (DMA) using guest-physical addresses, it would likely corrupt the memory, as the hardware does not know about the mapping between the guest-physical and host-physical addresses for the given virtual machine. The corruption can be avoided if the hypervisor or host OS intervenes in the I/O operation to apply the translations. However, this approach incurs a delay in the I/O operation.

An IOMMU solves this problem by re-mapping the addresses accessed by the hardware according to the same (or a compatible) translation table that is used to map guest-physical addresses to host-physical addresses.[5]

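When an operating system runs inside a virtual machine, including under paravirtualization (as with Xen and KVM), it usually does not know the host-physical addresses of the memory it accesses. That makes giving it direct access to the hardware difficult: if the guest OS instructs a device to perform direct memory access (DMA) using guest-physical addresses, the addresses will be wrong (under virtualization, the guest's memory space is not the same thing as the host's), because the hardware knows nothing about the mapping between guest-physical and host-physical addresses. Having the host OS step in and translate each I/O operation avoids the problem, but slows the I/O down.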

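An IOMMU solves this by remapping the addresses the hardware accesses, using the same translation that maps guest-physical addresses to host-physical addresses.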
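The virtualization problem and its fix can be shown with a toy model in which a device is told to write to a guest-physical address. The page numbers, the ownership table, and the function names are invented for this sketch; real hypervisors and IOMMUs use multi-level tables and per-device contexts.

```python
PAGE_SIZE = 4096  # assumed page size

# Hypothetical mapping for one virtual machine:
# guest-physical page number -> host-physical page number.
vm_a_table = {0x100: 0x9A000}

# Hypothetical record of who owns each host-physical page.
host_page_owner = {0x100: "host kernel", 0x9A000: "VM A"}


def translate(table, addr):
    """Translate a guest-physical address to a host-physical address."""
    page, offset = divmod(addr, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


def dma_write_target(gpa, iommu_table=None):
    """Return the owner of the host page that a device write to `gpa` lands in."""
    if iommu_table is None:
        hpa = gpa  # no IOMMU: the device uses the guest address as a host address
    else:
        hpa = translate(iommu_table, gpa)
    return host_page_owner[hpa // PAGE_SIZE]


gpa = 0x100 * PAGE_SIZE + 0x10
without_iommu = dma_write_target(gpa)
with_iommu = dma_write_target(gpa, iommu_table=vm_a_table)
print(without_iommu, with_iommu)
```

Without the table the write lands in memory that belongs to someone else, which is the corruption described above; with the table loaded into the IOMMU the same guest-issued address reaches the guest's own page with no software intervention per transfer.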