Discussion:
[RFC 0/3] virtio-iommu: a paravirtualized IOMMU
Jean-Philippe Brucker
2017-04-07 19:17:44 UTC
This is the initial proposal for a paravirtualized IOMMU device using the
virtio transport. It contains a description of the device, a Linux driver,
and a toy implementation in kvmtool. With this prototype, you can
translate DMA to guest memory from emulated (virtio) or passed-through
(VFIO) devices.

In its simplest form, implemented here, the device handles map/unmap
requests from the guest. Future extensions proposed in "RFC 3/3" should
allow binding page tables to devices.

A paravirtualized IOMMU has a number of advantages over full emulation.
It is portable and could be reused on different architectures. It is
easier to implement than a full emulation, with less state tracking. It
might be more efficient in some cases, with fewer context switches to
the host and the possibility of in-kernel emulation.

When designing it and writing the kvmtool device, I considered two main
scenarios, illustrated below.

Scenario 1: a hardware device passed through twice via VFIO

MEM____pIOMMU________PCI device________________________        HARDWARE
          |    (2b)                                    \
----------|-------------+-------------+----------------\---------------
          |             :     KVM     :                  \
          |             :             :                   \
     pIOMMU drv         :    ________virtio-iommu drv      \     KERNEL
          |             :   |         :        |            \
        VFIO            :   |         :      VFIO            \
          |             :   |         :        |             /
----------|-------------+---|---------+--------|------------/----------
          |                 |         :        |            /
          |  (1c)      (1b) |         :   (1a) |           / (2a)
          |                 |         :        |          /
          |                 |         :        |         /     USERSPACE
          |_virtio-iommu dev|         :     net drv_____/
                                      :
--------------------------------------+--------------------------------
                HOST                  :             GUEST

(1) a. Guest userspace is running a net driver (e.g. DPDK). It allocates a
       buffer with mmap, obtaining virtual address VA. It then sends a
       VFIO_IOMMU_MAP_DMA request to map VA to an IOVA (possibly VA=IOVA).
    b. The mapping request is relayed to the host through virtio
       (VIRTIO_IOMMU_T_MAP).
    c. The mapping request is relayed to the physical IOMMU through VFIO.

(2) a. The guest userspace driver can now instruct the device to directly
       access the buffer at IOVA.
    b. IOVA accesses from the device are translated into physical
       addresses by the IOMMU.
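Step (1a) uses the VFIO_IOMMU_MAP_DMA ioctl. As a sketch of how guest
userspace would prepare such a request (the struct below mirrors the
layout of struct vfio_iommu_type1_dma_map from <linux/vfio.h>; the
container file-descriptor setup is elided):

```c
#include <stdint.h>
#include <string.h>

/* Mirrors struct vfio_iommu_type1_dma_map from the <linux/vfio.h> UAPI */
struct vfio_iommu_type1_dma_map {
    uint32_t argsz;
    uint32_t flags;
#define VFIO_DMA_MAP_FLAG_READ  (1 << 0)
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)
    uint64_t vaddr; /* process virtual address (VA) */
    uint64_t iova;  /* I/O virtual address seen by the device */
    uint64_t size;  /* bytes, multiple of the IOMMU page size */
};

/* Fill a map request for a VA=IOVA identity mapping of the buffer */
static struct vfio_iommu_type1_dma_map
prepare_dma_map(void *va, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uintptr_t)va;
    map.iova  = (uintptr_t)va; /* VA=IOVA, as mentioned in (1a) */
    map.size  = size;
    /* The driver then issues: ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) */
    return map;
}
```

In the guest, this ioctl lands in the guest VFIO driver, which triggers
step (1b), the VIRTIO_IOMMU_T_MAP request.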

Scenario 2: a virtual net device behind a virtual IOMMU.

MEM__pIOMMU___PCI device                                       HARDWARE
       |          |
-------|----------|-----+-------------+--------------------------------
       |          |     :     KVM     :
       |          |     :             :
  pIOMMU drv      |     :             :
       \          |     :    _____________virtio-net drv         KERNEL
        \_net drv       :    |         :            /  (1a)
             |          :    |         :           /
            tap         :    |    _____________virtio-iommu drv
             |          :    |    |   :            (1b)
-------------|----------+----|----|---+--------------------------------
             |               |    |   :
             |_virtio-net____|    |   :
                        /  (2)    |   :
                       /          |   :                       USERSPACE
     virtio-iommu dev_____________|   :
                                      :
--------------------------------------+--------------------------------
                 HOST                 :            GUEST

(1) a. The guest virtio-net driver maps the virtio ring and a buffer.
    b. The mapping requests are relayed to the host through virtio.
(2) The virtio-net device now needs to access any guest memory via the
    IOMMU.

Physical and virtual IOMMUs are completely dissociated. The net driver
maps its own buffers via the DMA/IOMMU API, and buffers are copied
between virtio-net and tap.


The description itself seemed too long for a single email, so I split it
into three documents, and will attach Linux and kvmtool patches to this
email.

1. Firmware note,
2. Device operations (draft for the virtio specification),
3. Future work/possible improvements.

Just to be clear on the terms I'm using:

pIOMMU  physical IOMMU, controlling DMA accesses from physical devices
vIOMMU  virtual IOMMU (virtio-iommu), controlling DMA accesses from
        physical and virtual devices to guest memory.
GVA, GPA, HVA, HPA
        Guest/Host Virtual/Physical Address
IOVA    I/O Virtual Address, the address accessed by a device doing DMA
        through an IOMMU. In the context of a guest OS, IOVA is GVA.

Note: kvmtool is GPLv2. Linux patches are GPLv2, except for UAPI
virtio-iommu.h header, which is BSD 3-clause. For the time being, the
specification draft in RFC 2/3 is also BSD 3-clause.


This proposal may be unintentionally centered on ARM architectures at
times. Any feedback would be appreciated, especially regarding other
IOMMU architectures.

Thanks,
Jean-Philippe
Jean-Philippe Brucker
2017-04-07 19:17:45 UTC
Unlike other virtio devices, the virtio-iommu doesn't work independently,
it is linked to other virtual or assigned devices. So before jumping into
device operations, we need to define a way for the guest to discover the
virtual IOMMU and the devices it translates.

The host must describe the relation between IOMMU and devices to the guest
using either device-tree or ACPI. The virtual IOMMU identifies each
virtual device with a 32-bit ID, which we will call "Device ID" in this
document. Device IDs are not necessarily unique system-wide, but they must
not overlap within a single virtual IOMMU. Device IDs of passed-through
devices do not need to match the IDs seen by the physical IOMMU.

The virtual IOMMU uses the virtio-mmio transport exclusively, not
virtio-pci, because with PCI the IOMMU interface would itself be an
endpoint, and existing firmware interfaces don't allow describing
IOMMU<->master relations between PCI endpoints.

The following diagram describes a situation where two virtual IOMMUs
translate traffic from devices in the system. vIOMMU 1 translates two PCI
domains, in which each function has a 16-bit requester ID. In order for
the vIOMMU to differentiate guest requests targeted at devices in each
domain, their Device ID ranges cannot overlap. vIOMMU 2 translates two PCI
domains and a collection of platform devices.

         Device ID                    Requester ID
           /  0x0                0x0  \
          /    |                  |    | PCI domain 1
         /  0xffff             0xffff /
vIOMMU 1
         \ 0x10000               0x0  \
          \    |                  |    | PCI domain 2
           \ 0x1ffff           0xffff /

           /  0x0                     \
          /    |                       | platform devices
         /  0x1fff                    /
vIOMMU 2
         \  0x2000               0x0  \
          \    |                  |    | PCI domain 3
           \ 0x11fff           0xffff /
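The non-overlap constraint on Device ID ranges behind one vIOMMU amounts
to a simple interval check. As a sketch (the helper names are mine, not
part of the proposal):

```c
#include <stdbool.h>
#include <stdint.h>

/* An inclusive range of Device IDs assigned to one bus or device group */
struct devid_range {
    uint32_t first, last;
};

/* Two ranges behind the same vIOMMU must never satisfy this predicate */
static bool devid_ranges_overlap(struct devid_range a, struct devid_range b)
{
    return a.first <= b.last && b.first <= a.last;
}
```

For vIOMMU 1 above, {0x0, 0xffff} and {0x10000, 0x1ffff} pass the check,
so a guest request carrying a Device ID always designates exactly one
endpoint.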

Device-tree already offers a way to describe the topology. Here's an
example description of vIOMMU 2 with its devices:

	/* The virtual IOMMU is described with a virtio-mmio node */
	viommu2: virtio@10000 {
		compatible = "virtio,mmio";
		reg = <0x10000 0x200>;
		dma-coherent;
		interrupts = <0x0 0x5 0x1>;

		#iommu-cells = <1>;
	};

	/* Some platform device has Device ID 0x5 */
	device@20000 {
		...

		iommus = <&viommu2 0x5>;
	};

	/*
	 * PCI domain 3 is described by its host controller node, along
	 * with the complete relation to the IOMMU
	 */
	pci {
		...
		/* Linear map between RIDs and Device IDs for the whole bus */
		iommu-map = <0x0 &viommu2 0x10000 0x10000>;
	};
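The iommu-map entry follows the generic pci-iommu binding, where each
entry reads <rid-base iommu-phandle iommu-base length>: a Requester ID
inside the window translates linearly to a Device ID. As a sketch of that
translation (the struct and function names are mine, for illustration):

```c
#include <stdint.h>

/* One <rid-base &iommu iommu-base length> entry of an iommu-map property */
struct iommu_map_entry {
    uint16_t rid_base;   /* first PCI Requester ID covered */
    uint32_t iommu_base; /* Device ID corresponding to rid_base */
    uint32_t length;     /* number of IDs in the window */
};

/* Return the Device ID for rid, or -1 if the entry doesn't cover it */
static int64_t rid_to_devid(const struct iommu_map_entry *e, uint16_t rid)
{
    if (rid < e->rid_base || rid >= (uint32_t)e->rid_base + e->length)
        return -1;
    return (int64_t)e->iommu_base + (rid - e->rid_base);
}
```

With the example entry <0x0 &viommu2 0x10000 0x10000>, RID 0x0 of domain 3
maps to Device ID 0x10000 and RID 0xffff to 0x1ffff, matching the diagram.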

For more details, please refer to [DT-IOMMU].

For ACPI, we expect to add a new node type to the IO Remapping Table
specification [IORT], providing a similar mechanism for describing
translations via ACPI tables. The following is *not* a specification,
simply an example of what the node could be.

Field           | Len. | Off. | Description
----------------|------|------|----------------------------------
Type            |   1  |   0  | 5: paravirtualized IOMMU
Length          |   2  |   1  | The length of the node.
Revision        |   1  |   3  | 0
Reserved        |   4  |   4  | Must be zero.
Number of ID    |   4  |   8  |
mappings        |      |      |
Reference to    |   4  |  12  | Offset from the start of the
ID Array        |      |      | IORT node to the start of its
                |      |      | Array ID mappings.
                |      |      |
Model           |   4  |  16  | 0: virtio-iommu
Device object   |  --  |  20  | ASCII null-terminated string
name            |      |      | with the full path to the entry
                |      |      | in the namespace for this IOMMU.
Padding         |  --  |  --  | To keep 32-bit alignment and
                |      |      | leave space for future models.
                |      |      |
Array of ID     | 20xN |  --  | ID Array.
mappings        |      |      |

The OS parses the IORT table to build a map of ID relations between IOMMU
and devices. The ID Array is used to find the correspondence between IOMMU
IDs and PCI or platform devices. Later on, the virtio-iommu driver finds
the associated LNRO0005 descriptor via the "Device object name" field, and
probes the virtio device to find out more about its capabilities. Since
all properties of the IOMMU will be obtained during virtio probing, the
IORT node can stay simple.
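To make the offsets in the table concrete, the example node could be
declared as a packed structure. Again, this is an illustration of the
draft layout above, not a specification:

```c
#include <stdint.h>

/* Hypothetical IORT node for a paravirtualized IOMMU, laid out per the
 * field/offset table above. The variable-length object name (and the ID
 * array that follows it) is represented by a flexible array member. */
struct iort_pviommu_node {
    uint8_t  type;            /* offset 0: 5, paravirtualized IOMMU */
    uint16_t length;          /* offset 1: length of the node */
    uint8_t  revision;        /* offset 3: 0 */
    uint32_t reserved;        /* offset 4: must be zero */
    uint32_t nr_id_mappings;  /* offset 8: number of ID mappings */
    uint32_t id_array_offset; /* offset 12: offset to the ID array */
    uint32_t model;           /* offset 16: 0, virtio-iommu */
    char     object_name[];   /* offset 20: null-terminated namespace path */
} __attribute__((packed));
```

The packed attribute is required because the "Length" field sits at an
odd offset, so the natural alignment rules of C would otherwise insert
padding.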

[DT-IOMMU] https://www.kernel.org/doc/Documentation/devicetree/bindings/iommu/iommu.txt
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt

[IORT] IO Remapping Table, DEN0049B
http://infocenter.arm.com/help/topic/com.arm.doc.den0049b/DEN0049B_IO_Remapping_Table.pdf
Tian, Kevin
2017-04-18 09:51:23 UTC
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
Unlike other virtio devices, the virtio-iommu doesn't work independently,
it is linked to other virtual or assigned devices. So before jumping into
device operations, we need to define a way for the guest to discover the
virtual IOMMU and the devices it translates.
The host must describe the relation between IOMMU and devices to the guest
using either device-tree or ACPI. The virtual IOMMU identifies each
Do you plan to support both device tree and ACPI?
virtual device with a 32-bit ID, that we will call "Device ID" in this
document. Device IDs are not necessarily unique system-wide, but they may
not overlap within a single virtual IOMMU. Device ID of passed-through
devices do not need to match IDs seen by the physical IOMMU.
The virtual IOMMU uses virtio-mmio transport exclusively, not virtio-pci,
because with PCI the IOMMU interface would itself be an endpoint, and
existing firmware interfaces don't allow to describe IOMMU<->master
relations between PCI endpoints.
I'm not familiar with virtio-mmio mechanism. Curious how devices in
virtio-mmio are enumerated today? Could we use that mechanism to
identify vIOMMUs and then invent a purely para-virtualized method to
enumerate devices behind each vIOMMU?

Asking this is because each vendor has its own enumeration methods.
ARM has device tree and ACPI IORT. AMD has ACPI IVRS and device
tree (same format as ARM?). Intel has ACPI DMAR and sub-tables. Your
current proposal looks to follow ARM definitions, which I'm not sure
are extensible enough to cover features defined only in other vendors'
structures.

Since the purpose of this series is to go para-virtualized, why not also
para-virtualize and simplify the enumeration method? For example,
we may define a query interface through vIOMMU registers to allow the
guest to query whether a device belongs to that vIOMMU. Then we
can even remove the use of any enumeration structure completely...
Just a quick example for which I may not have thought through all the
pros and cons. :-)
The following diagram describes a situation where two virtual IOMMUs
translate traffic from devices in the system. vIOMMU 1 translates two PCI
domains, in which each function has a 16-bits requester ID. In order for
the vIOMMU to differentiate guest requests targeted at devices in each
domain, their Device ID ranges cannot overlap. vIOMMU 2 translates two PCI
domains and a collection of platform devices.
Device ID Requester ID
/ 0x0 0x0 \
/ | | PCI domain 1
/ 0xffff 0xffff /
vIOMMU 1
\ 0x10000 0x0 \
\ | | PCI domain 2
\ 0x1ffff 0xffff /
/ 0x0 \
/ | platform devices
/ 0x1fff /
vIOMMU 2
\ 0x2000 0x0 \
\ | | PCI domain 3
\ 0x11fff 0xffff /
Shouldn't the above be (0x30000, 0x3ffff) for PCI domain 3, given that Device IDs are 16-bit?

Thanks
Kevin
Jean-Philippe Brucker
2017-04-18 18:41:19 UTC
Post by Tian, Kevin
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
Unlike other virtio devices, the virtio-iommu doesn't work independently,
it is linked to other virtual or assigned devices. So before jumping into
device operations, we need to define a way for the guest to discover the
virtual IOMMU and the devices it translates.
The host must describe the relation between IOMMU and devices to the guest
using either device-tree or ACPI. The virtual IOMMU identifies each
Do you plan to support both device tree and ACPI?
Yes, with ACPI the topology would be described using IORT nodes. I didn't
include an example in my driver because DT is sufficient for a prototype
and is readily available (both in Linux and kvmtool), whereas IORT would
be quite easy to reuse in Linux, but isn't present in kvmtool at the
moment. However, both interfaces have to be supported for the virtio-iommu
to be portable.
Post by Tian, Kevin
virtual device with a 32-bit ID, that we will call "Device ID" in this
document. Device IDs are not necessarily unique system-wide, but they may
not overlap within a single virtual IOMMU. Device ID of passed-through
devices do not need to match IDs seen by the physical IOMMU.
The virtual IOMMU uses virtio-mmio transport exclusively, not virtio-pci,
because with PCI the IOMMU interface would itself be an endpoint, and
existing firmware interfaces don't allow to describe IOMMU<->master
relations between PCI endpoints.
I'm not familiar with virtio-mmio mechanism. Curious how devices in
virtio-mmio are enumerated today? Could we use that mechanism to
identify vIOMMUs and then invent a purely para-virtualized method to
enumerate devices behind each vIOMMU?
Using DT, virtio-mmio devices are described with "virtio-mmio" compatible
node, and with ACPI they use _HID LNRO0005. Since the host already
describes available devices to a guest using a firmware interface, I think
we should reuse the tools provided by that interface for describing
relations between DMA masters and IOMMU.
Post by Tian, Kevin
Asking this is because each vendor has its own enumeration methods.
ARM has device tree and ACPI IORT. AMR has ACPI IVRS and device
tree (same format as ARM?). Intel has APCI DMAR and sub-tables. Your
current proposal looks following ARM definitions which I'm not sure
extensible enough to cover features defined only in other vendors'
structures.
ACPI IORT can be extended to incorporate para-virtualized IOMMUs,
regardless of the underlying architecture. It isn't defined solely for the
ARM SMMU, but serves a more general purpose of describing a map of device
identifiers communicated from one component to another. Both DMAR and
IVRS have such a description (DRHD and IVHD respectively), but those are
designed for a specific IOMMU, whereas IORT could host other kinds.

It seems that all we really need is an interface that says "there is a
virtio-iommu at address X, here are the devices it translates and their
corresponding IDs", and both DT and ACPI IORT are able to fulfill this role.
Post by Tian, Kevin
Since the purpose of this series is to go para-virtualize, why not also
para-virtualize and simplify the enumeration method? For example,
we may define a query interface through vIOMMU registers to allow
guest query whether a device belonging to that vIOMMU. Then we
can even remove use of any enumeration structure completely...
Just a quick example which I may not think through all the pros and
cons. :-)
I don't think adding a brand new topology description mechanism is worth
the effort; we're better off reusing what already exists and is
implemented by operating systems. Adding a query interface inside the
vIOMMU may work (though it might be very painful to integrate with fwspec
in Linux), but it would be redundant since the host has to provide a
firmware description of the system anyway.
Post by Tian, Kevin
The following diagram describes a situation where two virtual IOMMUs
translate traffic from devices in the system. vIOMMU 1 translates two PCI
domains, in which each function has a 16-bits requester ID. In order for
the vIOMMU to differentiate guest requests targeted at devices in each
domain, their Device ID ranges cannot overlap. vIOMMU 2 translates two PCI
domains and a collection of platform devices.
Device ID Requester ID
/ 0x0 0x0 \
/ | | PCI domain 1
/ 0xffff 0xffff /
vIOMMU 1
\ 0x10000 0x0 \
\ | | PCI domain 2
\ 0x1ffff 0xffff /
/ 0x0 \
/ | platform devices
/ 0x1fff /
vIOMMU 2
\ 0x2000 0x0 \
\ | | PCI domain 3
\ 0x11fff 0xffff /
isn't above be (0x30000, 3ffff) for PCI domain 3 giving device ID is 16bit?
Unlike Requester IDs in PCI, there is no architected rule for IDs of
platform devices; it's an integration choice. The ID of a platform device
is used exclusively for interfacing with an IOMMU (or MSI controller), and
doesn't mean anything outside this context. Here the host allocates 13
bits to platform device IDs, which is legal.

Thanks,
Jean-Philippe
Tian, Kevin
2017-04-21 08:43:43 UTC
Sent: Wednesday, April 19, 2017 2:41 AM
Post by Tian, Kevin
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
Unlike other virtio devices, the virtio-iommu doesn't work independently,
it is linked to other virtual or assigned devices. So before jumping into
device operations, we need to define a way for the guest to discover the
virtual IOMMU and the devices it translates.
The host must describe the relation between IOMMU and devices to the guest
using either device-tree or ACPI. The virtual IOMMU identifies each
Do you plan to support both device tree and ACPI?
Yes, with ACPI the topology would be described using IORT nodes. I didn't
include an example in my driver because DT is sufficient for a prototype
and is readily available (both in Linux and kvmtool), whereas IORT would
be quite easy to reuse in Linux, but isn't present in kvmtool at the
moment. However, both interfaces have to be supported for the virtio-
iommu
to be portable.
Does 'portable' mean whether the guest enables ACPI?
Post by Tian, Kevin
virtual device with a 32-bit ID, that we will call "Device ID" in this
document. Device IDs are not necessarily unique system-wide, but they
may
Post by Tian, Kevin
not overlap within a single virtual IOMMU. Device ID of passed-through
devices do not need to match IDs seen by the physical IOMMU.
The virtual IOMMU uses virtio-mmio transport exclusively, not virtio-pci,
because with PCI the IOMMU interface would itself be an endpoint, and
existing firmware interfaces don't allow to describe IOMMU<->master
relations between PCI endpoints.
I'm not familiar with virtio-mmio mechanism. Curious how devices in
virtio-mmio are enumerated today? Could we use that mechanism to
identify vIOMMUs and then invent a purely para-virtualized method to
enumerate devices behind each vIOMMU?
Using DT, virtio-mmio devices are described with "virtio-mmio" compatible
node, and with ACPI they use _HID LNRO0005. Since the host already
describes available devices to a guest using a firmware interface, I think
we should reuse the tools provided by that interface for describing
relations between DMA masters and IOMMU.
OK, I didn't realize virtio-mmio is defined to rely on DT for enumeration.
Post by Tian, Kevin
Asking this is because each vendor has its own enumeration methods.
ARM has device tree and ACPI IORT. AMR has ACPI IVRS and device
tree (same format as ARM?). Intel has APCI DMAR and sub-tables. Your
current proposal looks following ARM definitions which I'm not sure
extensible enough to cover features defined only in other vendors'
structures.
ACPI IORT can be extended to incorporate para-virtualized IOMMUs,
regardless of the underlying architecture. It isn't defined solely for the
ARM SMMU, but serves a more general purpose of describing a map of device
identifiers communicated from one components to another. Both DMAR and
IVRS have such description (respectively DRHD and IVHD), but they are
designed for a specific IOMMU, whereas IORT could host other kinds.
I'll take a look at the IORT definition. DRHD includes more information
than just device mappings.
It seems that all we really need is an interface that says "there is a
virtio-iommu at address X, here are the devices it translates and their
corresponding IDs", and both DT and ACPI IORT are able to fulfill this role.
Post by Tian, Kevin
Since the purpose of this series is to go para-virtualize, why not also
para-virtualize and simplify the enumeration method? For example,
we may define a query interface through vIOMMU registers to allow
guest query whether a device belonging to that vIOMMU. Then we
can even remove use of any enumeration structure completely...
Just a quick example which I may not think through all the pros and
cons. :-)
I don't think adding a brand new topology description mechanism is worth
the effort, we're better off reusing what already exists and is
implemented by operating systems. Adding a query interface inside the
vIOMMU may work (though might be very painful to integrate with fwspec in
Linux), but would be redundant since the host has to provide a firmware
description of the system anyway.
Post by Tian, Kevin
The following diagram describes a situation where two virtual IOMMUs
translate traffic from devices in the system. vIOMMU 1 translates two PCI
domains, in which each function has a 16-bits requester ID. In order for
the vIOMMU to differentiate guest requests targeted at devices in each
domain, their Device ID ranges cannot overlap. vIOMMU 2 translates two
PCI
Post by Tian, Kevin
domains and a collection of platform devices.
Device ID Requester ID
/ 0x0 0x0 \
/ | | PCI domain 1
/ 0xffff 0xffff /
vIOMMU 1
\ 0x10000 0x0 \
\ | | PCI domain 2
\ 0x1ffff 0xffff /
/ 0x0 \
/ | platform devices
/ 0x1fff /
vIOMMU 2
\ 0x2000 0x0 \
\ | | PCI domain 3
\ 0x11fff 0xffff /
isn't above be (0x30000, 3ffff) for PCI domain 3 giving device ID is 16bit?
Unlike Requester IDs in PCI, there is no architected rule for IDs of
platform devices, it's an integration choice. The ID of platform device is
used exclusively for interfacing with an IOMMU (or MSI controller), it
doesn't mean anything outside this context. Here the host allocates 13
bits to platform device IDs, which is legal.
Please add such an explanation to your next version. The earlier text
mentions a "16-bit requester ID" for vIOMMU 1, which gave me the
illusion that the same 16 bits applied to vIOMMU 2 too.

Thanks
Kevin
Jean-Philippe Brucker
2017-04-24 15:05:36 UTC
Post by Tian, Kevin
Sent: Wednesday, April 19, 2017 2:41 AM
Post by Tian, Kevin
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
Unlike other virtio devices, the virtio-iommu doesn't work independently,
it is linked to other virtual or assigned devices. So before jumping into
device operations, we need to define a way for the guest to discover the
virtual IOMMU and the devices it translates.
The host must describe the relation between IOMMU and devices to the guest
using either device-tree or ACPI. The virtual IOMMU identifies each
Do you plan to support both device tree and ACPI?
Yes, with ACPI the topology would be described using IORT nodes. I didn't
include an example in my driver because DT is sufficient for a prototype
and is readily available (both in Linux and kvmtool), whereas IORT would
be quite easy to reuse in Linux, but isn't present in kvmtool at the
moment. However, both interfaces have to be supported for the virtio-
iommu
to be portable.
'portable' means whether guest enables ACPI?
Sorry, "supported" isn't the right term for what I meant. It is for the
firmware interface to accommodate devices, not the other way around, so
firmware considerations are outside the scope of the virtio-iommu
specification and virtio-iommu itself doesn't need to "support" any
interface.

For the purpose of this particular document however, both popular firmware
interfaces (ACPI and DT) must be taken into account. Those are the two
interfaces I know about; there might be others. But I figure that a VMM
implementing a virtual IOMMU is complex enough to also implement one of
these two interfaces, so talking about DT and ACPI should fit all use
cases. It also provides two examples for other firmware interfaces that
wish to describe the IOMMU topology.
Post by Tian, Kevin
Post by Tian, Kevin
virtual device with a 32-bit ID, that we will call "Device ID" in this
document. Device IDs are not necessarily unique system-wide, but they
may
Post by Tian, Kevin
not overlap within a single virtual IOMMU. Device ID of passed-through
devices do not need to match IDs seen by the physical IOMMU.
The virtual IOMMU uses virtio-mmio transport exclusively, not virtio-pci,
because with PCI the IOMMU interface would itself be an endpoint, and
existing firmware interfaces don't allow to describe IOMMU<->master
relations between PCI endpoints.
I'm not familiar with virtio-mmio mechanism. Curious how devices in
virtio-mmio are enumerated today? Could we use that mechanism to
identify vIOMMUs and then invent a purely para-virtualized method to
enumerate devices behind each vIOMMU?
Using DT, virtio-mmio devices are described with "virtio-mmio" compatible
node, and with ACPI they use _HID LNRO0005. Since the host already
describes available devices to a guest using a firmware interface, I think
we should reuse the tools provided by that interface for describing
relations between DMA masters and IOMMU.
OK, I didn't realize virtio-mmio is defined to rely on DT for enumeration.
Not necessarily DT, you can have virtio-mmio devices in the ACPI namespace
as well. QEMU has an example of LNRO0005 with ACPI.
Post by Tian, Kevin
Post by Tian, Kevin
Asking this is because each vendor has its own enumeration methods.
ARM has device tree and ACPI IORT. AMR has ACPI IVRS and device
tree (same format as ARM?). Intel has APCI DMAR and sub-tables. Your
current proposal looks following ARM definitions which I'm not sure
extensible enough to cover features defined only in other vendors'
structures.
ACPI IORT can be extended to incorporate para-virtualized IOMMUs,
regardless of the underlying architecture. It isn't defined solely for the
ARM SMMU, but serves a more general purpose of describing a map of device
identifiers communicated from one components to another. Both DMAR and
IVRS have such description (respectively DRHD and IVHD), but they are
designed for a specific IOMMU, whereas IORT could host other kinds.
I'll take a look at IORT definition. DRHD includes information more
than device mapping.
I guess that most information provided by DMAR and others is
IOMMU-specific, and the equivalent for virtio-iommu would fit in virtio
config space. But describing device mappings relative to IOMMUs is the
same problem for all systems. Doing it with a virtio-iommu probing
mechanism would require reinventing a way to identify devices every time a
host wants to add support for a new bus (RID for PCI, base address for
MMIO, others in the future), when firmware would have to provide this
information anyway for bare metal.
Post by Tian, Kevin
It seems that all we really need is an interface that says "there is a
virtio-iommu at address X, here are the devices it translates and their
corresponding IDs", and both DT and ACPI IORT are able to fulfill this role.
Post by Tian, Kevin
Since the purpose of this series is to go para-virtualize, why not also
para-virtualize and simplify the enumeration method? For example,
we may define a query interface through vIOMMU registers to allow
guest query whether a device belonging to that vIOMMU. Then we
can even remove use of any enumeration structure completely...
Just a quick example which I may not think through all the pros and
cons. :-)
I don't think adding a brand new topology description mechanism is worth
the effort, we're better off reusing what already exists and is
implemented by operating systems. Adding a query interface inside the
vIOMMU may work (though might be very painful to integrate with fwspec in
Linux), but would be redundant since the host has to provide a firmware
description of the system anyway.
Post by Tian, Kevin
The following diagram describes a situation where two virtual IOMMUs
translate traffic from devices in the system. vIOMMU 1 translates two PCI
domains, in which each function has a 16-bits requester ID. In order for
the vIOMMU to differentiate guest requests targeted at devices in each
domain, their Device ID ranges cannot overlap. vIOMMU 2 translates two
PCI
Post by Tian, Kevin
domains and a collection of platform devices.
Device ID Requester ID
/ 0x0 0x0 \
/ | | PCI domain 1
/ 0xffff 0xffff /
vIOMMU 1
\ 0x10000 0x0 \
\ | | PCI domain 2
\ 0x1ffff 0xffff /
/ 0x0 \
/ | platform devices
/ 0x1fff /
vIOMMU 2
\ 0x2000 0x0 \
\ | | PCI domain 3
\ 0x11fff 0xffff /
isn't above be (0x30000, 3ffff) for PCI domain 3 giving device ID is 16bit?
Unlike Requester IDs in PCI, there is no architected rule for IDs of
platform devices, it's an integration choice. The ID of platform device is
used exclusively for interfacing with an IOMMU (or MSI controller), it
doesn't mean anything outside this context. Here the host allocates 13
bits to platform device IDs, which is legal.
Please add such explanation to your next version. In earlier text
"16-bits request ID" is mentioned for vIOMMU1, which gave me
the illusion that same 16bit applies to vIOMMU2 too.
Sure, I will clarify this.

Thanks,
Jean-Philippe
Jean-Philippe Brucker
2017-04-07 19:17:46 UTC
After the virtio-iommu device has been probed and the driver is aware of
the devices translated by the IOMMU, it can start sending requests to the
virtio-iommu device. The operations described here are deliberately
minimal, so that vIOMMU devices can be as simple as possible to implement,
and can be extended with feature bits.

I. Overview
II. Feature bits
III. Device configuration layout
IV. Device initialization
V. Device operations
1. Attach device
2. Detach device
3. Map region
4. Unmap region


I. Overview
===========

Requests are small buffers added by the guest to the request virtqueue.
The guest can add a batch of them to the queue and send a notification
(kick) to the device to have all of them handled.

Here is an example flow:

* attach(address space, device), kick: create a new address space and
  attach a device to it
* map(address space, virt, phys, size, flags): create a mapping between a
  guest-virtual and a guest-physical address
* map, map, map, kick

* ... here the guest device can perform DMA to the freshly mapped memory

* unmap(address space, virt, size), unmap, kick
* detach(address space, device), kick

The following description attempts to use the same format as other virtio
devices. We won't go into details of the virtio transport, please refer to
[VIRTIO-v1.0] for more information.

As a quick reminder, the virtio (1.0) transport can be described with the
following flow:

       HOST             :          GUEST
                   (3)  :
     .------ [available ring] <-------.  (2)
    /                   :              \
   v               (4)  :          (1)  \
[device] <---- [descriptor table] <---- [driver]
   \                    :                ^
    \                   :               /
 (5) '-------> [used ring] ------------'
                        :   (6)
                        :

(1) The driver has a buffer with a payload to send via virtio. It writes
    the address and size of the buffer in a descriptor. It can chain N
    sub-buffers by writing N descriptors and linking them together. The
    first descriptor of the chain is referred to as the head.
(2) The driver queues the head index into the 'available' ring.
(3) The driver notifies the device. Since virtio-iommu uses MMIO,
    notification is done by writing to a doorbell address. KVM traps it
    and forwards the notification to the virtio device. The device
    dequeues the head index from the 'available' ring.
(4) The device reads all descriptors in the chain and handles the payload.
(5) The device writes the head index into the 'used' ring and sends a
    notification to the guest, by injecting an interrupt.
(6) The driver pops the head from the used ring, and optionally reads the
    buffers that were updated by the device.
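The six steps can be modelled with a toy descriptor table and rings. This
is a simulation of the flow above for illustration, not the actual virtio
ring layout (which also has flags, wrap counters, etc.):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of steps (1)-(6): a descriptor chain ends when next < 0 */
struct toy_desc { void *addr; size_t len; int next; };

static struct toy_desc desc_table[8];
static int avail_ring[8], used_ring[8];
static int avail_idx, used_idx;

/* (1)+(2): the driver writes a two-descriptor chain (request payload,
 * then a one-byte status buffer) and queues the head index */
static void driver_submit(void *req, size_t req_len, uint8_t *status)
{
    desc_table[0] = (struct toy_desc){ req, req_len, 1 };
    desc_table[1] = (struct toy_desc){ status, 1, -1 };
    avail_ring[avail_idx++] = 0; /* head index */
    /* (3) would be the doorbell write, trapped by KVM */
}

/* (4)+(5): the device dequeues the head, walks the chain, and writes the
 * status into the last (device-writable) descriptor */
static void device_process(void)
{
    int head = avail_ring[--avail_idx];
    int i = head;
    while (desc_table[i].next >= 0)      /* (4) read the chain */
        i = desc_table[i].next;
    *(uint8_t *)desc_table[i].addr = 0;  /* write status: 0 = OK */
    used_ring[used_idx++] = head;        /* (5) plus an interrupt */
    /* (6): the driver pops `head` from used_ring and reads the status */
}
```

The point of the model is the ownership hand-off: the driver only touches
the status buffer again after the head index reappears in the used ring.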


II. Feature bits
================

VIRTIO_IOMMU_F_INPUT_RANGE (0)
Available range of virtual addresses is described in input_range

VIRTIO_IOMMU_F_IOASID_BITS (1)
The number of address spaces supported is described in ioasid_bits

VIRTIO_IOMMU_F_MAP_UNMAP (2)
Map and unmap requests are available. This bit exists so that a device or
driver can implement only page-table sharing, once we introduce that
feature: a device will then be able to select only one of F_MAP_UNMAP or
F_PT_SHARING. For the moment, this bit must always be set.

VIRTIO_IOMMU_F_BYPASS (3)
When not attached to an address space, devices behind the IOMMU can
access the physical address space.

III. Device configuration layout
================================

struct virtio_iommu_config {
	u64 page_size_mask;
	struct virtio_iommu_range {
		u64 start;
		u64 end;
	} input_range;
	u8 ioasid_bits;
};

IV. Device initialization
=========================

1. page_size_mask contains the bitmask of all page sizes that can be
mapped. The least significant bit set defines the page granularity of
IOMMU mappings. Other bits in the mask are hints describing page sizes
that the IOMMU can merge into a single mapping (page blocks).

There is no lower limit for the smallest page granularity supported by
the IOMMU. It is legal for the driver to map one byte at a time if the
device advertises it.

page_size_mask must have at least one bit set.

2. If the VIRTIO_IOMMU_F_IOASID_BITS feature is negotiated, ioasid_bits
contains the number of bits supported in an I/O Address Space ID, the
identifier used in map/unmap requests. A value of 0 is valid, and means
that a single address space is supported.

If the feature is not negotiated, address space identifiers can use up
to 32 bits.

3. If the VIRTIO_IOMMU_F_INPUT_RANGE feature is negotiated, input_range
contains the virtual address range that the IOMMU is able to translate.
Any mapping request to virtual addresses outside of this range will
fail.

If the feature is not negotiated, virtual mappings span over the whole
64-bit address space (start = 0, end = 0xffffffffffffffff)

4. If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, devices behind the
IOMMU not attached to an address space are allowed to access
guest-physical addresses. Otherwise, accesses to guest-physical
addresses may fault.


V. Device operations
====================

Driver sends requests on the request virtqueue (0), notifies the device and
waits for the device to return the request with a status in the used ring.
All requests are split in two parts: one device-readable, one device-
writable. Each request must therefore be described with at least two
descriptors, as illustrated below.

31 7 0
+--------------------------------+ <------- RO descriptor
| 0 (reserved) | type |
+--------------------------------+
| |
| payload |
| | <------- WO descriptor
+--------------------------------+
| 0 (reserved) | status |
+--------------------------------+

struct virtio_iommu_req_head {
u8 type;
u8 reserved[3];
};

struct virtio_iommu_req_tail {
u8 status;
u8 reserved[3];
};

(Note on the format choice: this format forces the payload to be split in
two - one read-only buffer, one write-only. It is necessary and sufficient
for our purpose, and does not close the door to future extensions with
more complex requests, such as a WO field sandwiched between two RO ones.
With virtio 1.0 ring requirements, such a request would need to be
described by two chains of descriptors, which might be more complex to
implement efficiently, but still possible. Both devices and drivers must
assume that requests are segmented anyway.)

Type may be one of:

VIRTIO_IOMMU_T_ATTACH 1
VIRTIO_IOMMU_T_DETACH 2
VIRTIO_IOMMU_T_MAP 3
VIRTIO_IOMMU_T_UNMAP 4

A few general-purpose status codes are defined here. Driver must not
assume that a specific status will be returned for an invalid request.
Except for 0, which always means "success", these values are hints to make
troubleshooting easier.

VIRTIO_IOMMU_S_OK 0
All good! Carry on.

VIRTIO_IOMMU_S_IOERR 1
Virtio communication error

VIRTIO_IOMMU_S_UNSUPP 2
Unsupported request

VIRTIO_IOMMU_S_DEVERR 3
Internal device error

VIRTIO_IOMMU_S_INVAL 4
Invalid parameters

VIRTIO_IOMMU_S_RANGE 5
Out-of-range parameters

VIRTIO_IOMMU_S_NOENT 6
Entry not found

VIRTIO_IOMMU_S_FAULT 7
Bad address


1. Attach device
----------------

struct virtio_iommu_req_attach {
le32 address_space;
le32 device;
le32 flags/reserved;
};

Attach a device to an address space. 'address_space' is an identifier
unique to the guest. If the address space doesn't exist in the IOMMU
device, it is created. 'device' is an identifier unique to the IOMMU. The
host communicates a unique device ID to the guest during boot. The method
used to communicate this ID is outside the scope of this specification,
but the following rules must apply:

* The device ID is unique from the IOMMU point of view. Multiple devices
whose DMA transactions are not translated by the same IOMMU may have the
same device ID. Devices whose DMA transactions may be translated by the
same IOMMU must have different device IDs.

* Sometimes the host cannot completely isolate two devices from each
other. For example on a legacy PCI bus, devices can snoop DMA
transactions from their neighbours. In this case, the host must
communicate to the guest that it cannot isolate these devices from each
other. The method used to communicate this is outside the scope of this
specification. The IOMMU device must ensure that devices that cannot be
isolated by the host have the same address spaces.

Multiple devices may be added to the same address space. A device cannot
be attached to multiple address spaces (with the map/unmap interface, that
is; for SVM, see the page table and context table sharing proposal).

If the device is already attached to another address space 'old', it is
detached from the old one and attached to the new one. The device cannot
access mappings from the old address space after this request completes.

The device either returns VIRTIO_IOMMU_S_OK, or an error status. We
suggest the following error statuses, which would help debug the driver.

NOENT: device not found.
RANGE: address space is outside the range allowed by ioasid_bits.


2. Detach device
----------------

struct virtio_iommu_req_detach {
le32 device;
le32 flags/reserved;
};

Detach a device from its address space. When this request completes, the
device cannot access any mapping from that address space anymore. If the
device isn't attached to any address space, the request returns
successfully.

After all devices have been successfully detached from an address space,
its ID can be reused by the driver for another address space.

NOENT: device not found.
INVAL: device wasn't attached to any address space.


3. Map region
-------------

struct virtio_iommu_req_map {
le32 address_space;
le64 phys_addr;
le64 virt_addr;
le64 size;
le32 flags;
};

VIRTIO_IOMMU_MAP_F_READ 0x1
VIRTIO_IOMMU_MAP_F_WRITE 0x2
VIRTIO_IOMMU_MAP_F_EXEC 0x4

Map a range of virtually-contiguous addresses to a range of
physically-contiguous addresses. Size must always be a multiple of the
page granularity negotiated during initialization. Both phys_addr and
virt_addr must be aligned on the page granularity. The address space must
have been created with VIRTIO_IOMMU_T_ATTACH.

The range defined by (virt_addr, size) must be within the limits specified
by input_range. The range defined by (phys_addr, size) must be within the
guest-physical address space. This includes upper and lower limits, as
well as any carving of guest-physical addresses for use by the host (for
instance MSI doorbells). Guest physical boundaries are set by the host
using a firmware mechanism outside the scope of this specification.

(Note that this format prevents creating the identity mapping
(0x0 - 0xfff...fff) -> (0x0 - 0xfff...fff) in a single request, since it
would result in a size of zero. Hopefully allowing VIRTIO_IOMMU_F_BYPASS
eliminates the need for issuing such a request. It would also be unlikely
to conform to the physical range restrictions from the previous paragraph.)

(Another note, on flags: it is unlikely that all possible combinations of
flags will be supported by the physical IOMMU. For instance, (W & !R) or
(E & W) might be invalid. I haven't taken time to devise a clever way to
advertise supported and implicit (for instance "W implies R") flags or
combination thereof for the moment, but I could at least try to research
common models. Keeping in mind that we might soon want to add more flags,
such as privileged, device, transient, shared, etc. whatever these would
mean)

This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
negotiated.

INVAL: invalid flags
RANGE: virt_addr, phys_addr or range are not in the limits specified
during negotiation. For instance, not aligned to page granularity.
NOENT: address space not found.


4. Unmap region
---------------

struct virtio_iommu_req_unmap {
le32 address_space;
le64 virt_addr;
le64 size;
le32 reserved;
};

Unmap a range of addresses mapped with VIRTIO_IOMMU_T_MAP. The range,
defined by virt_addr and size, must exactly cover one or more contiguous
mappings created with MAP requests. All mappings covered by the range are
removed. Driver should not send a request covering unmapped areas.

We define a mapping as a virtual region created with a single MAP request.
virt_addr should exactly match the start of an existing mapping. The end
of the range, (virt_addr + size - 1), should exactly match the end of an
existing mapping. Device must reject any request that would affect only
part of a mapping. If the requested range spills outside of mapped
regions, the device's behaviour is undefined.

These rules are illustrated with the following requests (with arguments
(va, size)), assuming each example sequence starts with a blank address
space:

map(0, 10)
unmap(0, 10) -> allowed

map(0, 5)
map(5, 5)
unmap(0, 10) -> allowed

map(0, 10)
unmap(0, 5) -> forbidden

map(0, 10)
unmap(0, 15) -> undefined

map(0, 5)
map(10, 5)
unmap(0, 15) -> undefined

(Note: the semantics of unmap are chosen to be compatible with VFIO's
type1 v2 IOMMU API. This way a device serving as intermediary between
guest and VFIO doesn't have to keep an internal tree of mappings. They are
a bit tighter than VFIO's, in that they don't allow an unmap to spill outside
mapped regions. Spilling is 'undefined' at the moment, because it should
work in most cases but I don't know if it's worth the added complexity in
devices that are not simply transmitting requests to VFIO. Splitting
mappings won't ever be allowed, but see the relaxed proposal in 3/3 for
more lenient semantics)

This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
negotiated.

NOENT: address space not found.
FAULT: mapping not found.
RANGE: request would split a mapping.


[VIRTIO-v1.0] Virtual I/O Device (VIRTIO) Version 1.0. 03 December 2013.
Committee Specification Draft 01 / Public Review Draft 01.
http://docs.oasis-open.org/virtio/virtio/v1.0/csprd01/virtio-v1.0-csprd01.html
Tian, Kevin
2017-04-18 10:26:41 UTC
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
[...]
II. Feature bits
================
VIRTIO_IOMMU_F_INPUT_RANGE (0)
Available range of virtual addresses is described in input_range
Usually only the maximum supported address bits are important.
Curious do you see such situation where low end of the address
space is not usable (since you have both start/end defined later)?

[...]
1. Attach device
----------------
struct virtio_iommu_req_attach {
le32 address_space;
le32 device;
le32 flags/reserved;
};
Attach a device to an address space. 'address_space' is an identifier
unique to the guest. If the address space doesn't exist in the IOMMU
Based on your description this address space ID is per operation right?
MAP/UNMAP and page-table sharing should have different ID spaces...
device, it is created. 'device' is an identifier unique to the IOMMU. The
host communicates unique device ID to the guest during boot. The method
used to communicate this ID is outside the scope of this specification,
* The device ID is unique from the IOMMU point of view. Multiple devices
whose DMA transactions are not translated by the same IOMMU may have the
same device ID. Devices whose DMA transactions may be translated by the
same IOMMU must have different device IDs.
* Sometimes the host cannot completely isolate two devices from each
others. For example on a legacy PCI bus, devices can snoop DMA
transactions from their neighbours. In this case, the host must
communicate to the guest that it cannot isolate these devices from each
others. The method used to communicate this is outside the scope of this
specification. The IOMMU device must ensure that devices that cannot be
"IOMMU device" -> "IOMMU driver"
isolated by the host have the same address spaces.
Thanks
Kevin
Jean-Philippe Brucker
2017-04-18 18:45:54 UTC
Post by Tian, Kevin
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
[...]
II. Feature bits
================
VIRTIO_IOMMU_F_INPUT_RANGE (0)
Available range of virtual addresses is described in input_range
Usually only the maximum supported address bits are important.
Curious do you see such situation where low end of the address
space is not usable (since you have both start/end defined later)?
A start address would allow providing something resembling a GART to the
guest: an IOMMU with one address space (ioasid_bits=0) and a small IOVA
aperture. I'm not sure how useful that would be in practice.

On a related note, the virtio-iommu itself doesn't provide a
per-address-space aperture as it stands. For example, attaching a device
to an address space might restrict the available IOVA range for the whole
AS if that device cannot write to high memory (above 32-bit). If the guest
attempts to map an IOVA outside this window into the device's address
space, it should expect the MAP request to fail. And when attaching, if
the address space already has mappings outside this window, then ATTACH
should fail.

This too seems to be something that ought to be communicated by firmware,
but bits are missing (I can't find anything equivalent to DT's dma-ranges
for PCI root bridges in ACPI tables, for example). In addition VFIO
doesn't communicate any DMA mask for devices, and doesn't check them
itself. I guess that the host could find out the DMA mask of devices one
way or another, but it is tricky to enforce, so I didn't make this a hard
requirement. Although I should probably add a few words about it.
Post by Tian, Kevin
[...]
1. Attach device
----------------
struct virtio_iommu_req_attach {
le32 address_space;
le32 device;
le32 flags/reserved;
};
Attach a device to an address space. 'address_space' is an identifier
unique to the guest. If the address space doesn't exist in the IOMMU
Based on your description this address space ID is per operation right?
MAP/UNMAP and page-table sharing should have different ID spaces...
I think it's simpler if we keep a single IOASID space per virtio-iommu
device, because the maximum number of address spaces (described by
ioasid_bits) might be a restriction of the pIOMMU. For page-table sharing
you still need to define which devices will share a page directory using
ATTACH requests, though that interface is not set in stone.
Post by Tian, Kevin
device, it is created. 'device' is an identifier unique to the IOMMU. The
host communicates unique device ID to the guest during boot. The method
used to communicate this ID is outside the scope of this specification,
* The device ID is unique from the IOMMU point of view. Multiple devices
whose DMA transactions are not translated by the same IOMMU may have the
same device ID. Devices whose DMA transactions may be translated by the
same IOMMU must have different device IDs.
* Sometimes the host cannot completely isolate two devices from each
others. For example on a legacy PCI bus, devices can snoop DMA
transactions from their neighbours. In this case, the host must
communicate to the guest that it cannot isolate these devices from each
others. The method used to communicate this is outside the scope of this
specification. The IOMMU device must ensure that devices that cannot be
"IOMMU device" -> "IOMMU driver"
Indeed

Thanks!
Jean-Philippe
Post by Tian, Kevin
isolated by the host have the same address spaces.
Tian, Kevin
2017-04-21 09:02:35 UTC
Sent: Wednesday, April 19, 2017 2:46 AM
Post by Tian, Kevin
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
[...]
II. Feature bits
================
VIRTIO_IOMMU_F_INPUT_RANGE (0)
Available range of virtual addresses is described in input_range
Usually only the maximum supported address bits are important.
Curious do you see such situation where low end of the address
space is not usable (since you have both start/end defined later)?
A start address would allow to provide something resembling a GART to the
guest: an IOMMU with one address space (ioasid_bits=0) and a small IOVA
aperture. I'm not sure how useful that would be in practice.
Intel VT-d has no such limitation, which I can tell. :-)
On a related note, the virtio-iommu itself doesn't provide a
per-address-space aperture as it stands. For example, attaching a device
to an address space might restrict the available IOVA range for the whole
AS if that device cannot write to high memory (above 32-bit). If the guest
attempts to map an IOVA outside this window into the device's address
space, it should expect the MAP request to fail. And when attaching, if
the address space already has mappings outside this window, then ATTACH
should fail.
This too seems to be something that ought to be communicated by firmware,
but bits are missing (I can't find anything equivalent to DT's dma-ranges
for PCI root bridges in ACPI tables, for example). In addition VFIO
doesn't communicate any DMA mask for devices, and doesn't check them
itself. I guess that the host could find out the DMA mask of devices one
way or another, but it is tricky to enforce, so I didn't make this a hard
requirement. Although I should probably add a few words about it.
If there is no such communication on bare metal, then same for pvIOMMU.
Post by Tian, Kevin
[...]
1. Attach device
----------------
struct virtio_iommu_req_attach {
le32 address_space;
le32 device;
le32 flags/reserved;
};
Attach a device to an address space. 'address_space' is an identifier
unique to the guest. If the address space doesn't exist in the IOMMU
Based on your description this address space ID is per operation right?
MAP/UNMAP and page-table sharing should have different ID spaces...
I think it's simpler if we keep a single IOASID space per virtio-iommu
device, because the maximum number of address spaces (described by
ioasid_bits) might be a restriction of the pIOMMU. For page-table sharing
you still need to define which devices will share a page directory using
ATTACH requests, though that interface is not set in stone.
got you. yes VM is supposed to consume less IOASIDs than physically
available. It doesn’t hurt to have one IOASID space for both IOVA
map/unmap usages (one IOASID per device) and SVM usages (multiple
IOASIDs per device). The former is digested by software and the latter
will be bound to hardware.

Thanks
Kevin
Jean-Philippe Brucker
2017-04-24 15:05:47 UTC
Post by Tian, Kevin
Sent: Wednesday, April 19, 2017 2:46 AM
Post by Tian, Kevin
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
[...]
II. Feature bits
================
VIRTIO_IOMMU_F_INPUT_RANGE (0)
Available range of virtual addresses is described in input_range
Usually only the maximum supported address bits are important.
Curious do you see such situation where low end of the address
space is not usable (since you have both start/end defined later)?
A start address would allow to provide something resembling a GART to the
guest: an IOMMU with one address space (ioasid_bits=0) and a small IOVA
aperture. I'm not sure how useful that would be in practice.
Intel VT-d has no such limitation, which I can tell. :-)
On a related note, the virtio-iommu itself doesn't provide a
per-address-space aperture as it stands. For example, attaching a device
to an address space might restrict the available IOVA range for the whole
AS if that device cannot write to high memory (above 32-bit). If the guest
attempts to map an IOVA outside this window into the device's address
space, it should expect the MAP request to fail. And when attaching, if
the address space already has mappings outside this window, then ATTACH
should fail.
This too seems to be something that ought to be communicated by firmware,
but bits are missing (I can't find anything equivalent to DT's dma-ranges
for PCI root bridges in ACPI tables, for example). In addition VFIO
doesn't communicate any DMA mask for devices, and doesn't check them
itself. I guess that the host could find out the DMA mask of devices one
way or another, but it is tricky to enforce, so I didn't make this a hard
requirement. Although I should probably add a few words about it.
If there is no such communication on bare metal, then same for pvIOMMU.
Post by Tian, Kevin
[...]
1. Attach device
----------------
struct virtio_iommu_req_attach {
le32 address_space;
le32 device;
le32 flags/reserved;
};
Attach a device to an address space. 'address_space' is an identifier
unique to the guest. If the address space doesn't exist in the IOMMU
Based on your description this address space ID is per operation right?
MAP/UNMAP and page-table sharing should have different ID spaces...
I think it's simpler if we keep a single IOASID space per virtio-iommu
device, because the maximum number of address spaces (described by
ioasid_bits) might be a restriction of the pIOMMU. For page-table sharing
you still need to define which devices will share a page directory using
ATTACH requests, though that interface is not set in stone.
got you. yes VM is supposed to consume less IOASIDs than physically
available. It doesn’t hurt to have one IOASID space for both IOVA
map/unmap usages (one IOASID per device) and SVM usages (multiple
IOASIDs per device). The former is digested by software and the latter
will be bound to hardware.
Hmm, I'm using address space indexed by IOASID for "classic" IOMMU, and
then contexts indexed by PASID when talking about SVM. So in my mind an
address space can have multiple sub-address-spaces (contexts). Number of
IOASIDs is a limitation of the pIOMMU, and number of PASIDs is a
limitation of the device. Therefore attaching devices to address spaces
would update the number of available contexts in that address space. The
terminology is not ideal, and I'd be happy to change it for something more
clear.

Thanks,
Jean-Philippe
Jean-Philippe Brucker
2017-04-07 19:17:47 UTC
Here I propose a few ideas for extensions and optimizations. This is all
very exploratory, feel free to correct mistakes and suggest more things.

I. Linux host
1. vhost-iommu
2. VFIO nested translation
II. Page table sharing
1. Sharing IOMMU page tables
2. Sharing MMU page tables (SVM)
3. Fault reporting
4. Host implementation with VFIO
III. Relaxed operations
IV. Misc


I. Linux host
=============

1. vhost-iommu
--------------

An advantage of virtualizing an IOMMU using virtio is that it allows
hoisting a lot of the emulation code into the kernel using vhost, avoiding
a return to userspace for each request. The mainline kernel already
implements vhost-net, vhost-scsi and vhost-vsock, and a lot of core code
could be reused.

Introducing vhost in a simplified scenario 1 (with the guest userspace
pass-through removed, as it is irrelevant to this example) gives us the
following:

MEM____pIOMMU________PCI device____________ HARDWARE
| \
----------|-------------+-------------+-----\--------------------------
| : KVM : \
pIOMMU drv : : \ KERNEL
| : : net drv
VFIO : : /
| : : /
vhost-iommu_________________________virtio-iommu-drv
: :
--------------------------------------+-------------------------------
HOST : GUEST


With vhost introduced in scenario 2, userspace now only handles the device
initialisation part, and most runtime communication is handled in the kernel:

MEM__pIOMMU___PCI device HARDWARE
| |
-------|---------|------+-------------+-------------------------------
| | : KVM :
pIOMMU drv | : : KERNEL
\__net drv : :
| : :
tap : :
| : :
_vhost-net________________________virtio-net drv
(2) / : : / (1a)
/ : : /
vhost-iommu________________________________virtio-iommu drv
: : (1b)
------------------------+-------------+-------------------------------
HOST : GUEST

(1) a. Guest virtio driver maps ring and buffers
b. Map requests are relayed to the host the same way.
(2) To access any guest memory, vhost-net must query the IOMMU. We can
reuse the existing TLB protocol for this. TLB commands are written to
and read from the vhost-net fd.

As defined in Linux/include/uapi/linux/vhost.h, the vhost msg structure
has everything needed for map/unmap operations:

struct vhost_iotlb_msg {
__u64 iova;
__u64 size;
__u64 uaddr;
__u8 perm; /* R/W */
__u8 type;
#define VHOST_IOTLB_MISS 1
#define VHOST_IOTLB_UPDATE 2 /* MAP */
#define VHOST_IOTLB_INVALIDATE 3 /* UNMAP */
#define VHOST_IOTLB_ACCESS_FAIL 4
};

struct vhost_msg {
int type;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};

The vhost-iommu device associates a virtual device ID to a TLB fd. We
should be able to use the same commands for [vhost-net <-> virtio-iommu]
and [virtio-net <-> vhost-iommu] communication. A virtio-net device
would open a socketpair and hand one side to vhost-iommu.

If vhost_msg is ever used for a purpose other than TLB, we'll have some
trouble, as there will be multiple clients that want to read/write the
vhost fd. A multicast transport method will be needed. Until then, this
can work.

Details of operations would be:

(1) Userspace sets up vhost-iommu as with other vhost devices, by using
standard vhost ioctls. Userspace starts by describing the system topology
via ioctl:

ioctl(iommu_fd, VHOST_IOMMU_ADD_DEVICE, struct
vhost_iommu_add_device)

#define VHOST_IOMMU_DEVICE_TYPE_VFIO
#define VHOST_IOMMU_DEVICE_TYPE_TLB

struct vhost_iommu_add_device {
__u8 type;
__u32 devid;
union {
struct vhost_iommu_device_vfio {
int vfio_group_fd;
};
struct vhost_iommu_device_tlb {
int fd;
};
};
};

(2) VIRTIO_IOMMU_T_ATTACH(address space, devid)

vhost-iommu creates an address space if necessary, finds the device along
with the relevant operations. If type is VFIO, operations are done on a
container, otherwise they are done on single devices.

(3) VIRTIO_IOMMU_T_MAP(address space, virt, phys, size, flags)

Turn phys into an HVA using the vhost mem table.

- If type is TLB, either preload with VHOST_IOTLB_UPDATE or store the
mapping locally and wait for the TLB to ask for it with a
VHOST_IOTLB_MISS.
- If type is VFIO, turn it into a VFIO_IOMMU_MAP_DMA (might need to
introduce a shortcut in the external user API of VFIO).

(4) VIRTIO_IOMMU_T_UNMAP(address space, virt, size)

- If type is TLB, send a VHOST_IOTLB_INVALIDATE.
- If type is VFIO, turn it into VFIO_IOMMU_UNMAP_DMA.

(5) VIRTIO_IOMMU_T_DETACH(address space, devid)

Undo whatever was done in (2).


2. VFIO nested translation
--------------------------

For my current kvmtool implementation, I am putting each VFIO group in a
different container during initialization. We cannot detach a group from a
container at runtime without first resetting all devices in that group. So
the best way to provide dynamic address spaces right now is one container
per group. The drawback is that we need to maintain multiple sets of page
tables even if the guest wants to put all devices in the same address
space. Another disadvantage is that, when implementing bypass mode, we need
to map the whole address space at the beginning, then unmap everything on
attach. Adding nested support would be a nice way to provide dynamic
address spaces while keeping groups tied to a container at all times.

A physical IOMMU may offer nested translation. In this case, address
spaces are managed by two page directories instead of one. A guest-
virtual address is translated into a guest-physical one using what we'll
call here "stage-1" (s1) page tables, and the guest-physical address is
translated into a host-physical one using "stage-2" (s2) page tables.

s1 s2
GVA --> GPA --> HPA

There isn't a lot of support in Linux for nesting IOMMU page directories
at the moment (though SVM support is coming, see II). VFIO does have a
"nesting" IOMMU type, which doesn't mean much at the moment. The ARM SMMU
code uses this to decide whether to manage the container with s2 page
tables instead of s1, but even then we still only have a single stage and
it is assumed that IOVA=GPA.

Another model that would help with dynamically changing address spaces is
nesting VFIO containers:

Parent <---------- map/unmap
container
/ | \
/ group \
Child Child <--- map/unmap
container container
| | |
group group group

At the beginning all groups are attached to the parent container, and
there is no child container. Doing map/unmap on the parent container maps
stage-2 page tables (map GPA -> HVA and pin the page -> HPA). User should
be able to choose whether they want all devices attached to this container
to be able to access GPAs (bypass mode, as it currently is) or simply
block all DMA (in which case there is no need to pin pages here).

At some point the guest wants to create an address space and attaches
children to it. Using an ioctl (to be defined), we can derive a child
container from the parent container, and move groups from parent to child.

This returns a child fd. When the guest maps something in this new address
space, we can do a map ioctl on the child container, which maps stage-1
page tables (map GVA -> GPA).

A page table walk may access multiple levels of tables (pgd, p4d, pud,
pmd, pt). With nested translation, each access to a table during the
stage-1 walk requires a stage-2 walk. This makes a full translation costly
so it is preferable to use a single stage of translation when possible.
Folding two stages into one is simple with a single container, as shown in
the kvmtool example. The host keeps track of GPA->HVA mappings, so it can
fold the full GVA->HVA mapping before sending the VFIO request. With
nested containers however, the IOMMU driver would have to do the folding
work itself. Keeping a copy of stage-2 mapping created on the parent
container, it would fold them into the actual stage-2 page tables when
receiving a map request on the child container (note that software folding
is not possible when stage-1 pgd is managed by the guest, as described in
next section).

I don't know if nested VFIO containers are a desirable feature at all. I
find the concept cute on paper, and it would make it easier for userspace
to juggle address spaces, but it might require some invasive changes
in VFIO, and people have been able to use the current API for IOMMU
virtualization so far.


II. Page table sharing
======================

1. Sharing IOMMU page tables
----------------------------

VIRTIO_IOMMU_F_PT_SHARING

This is independent of the nested mode described in I.2, but relies on a
similar feature in the physical IOMMU: having two stages of page tables,
one for the host and one for the guest.

When this is supported, the guest can manage its own s1 page directory, to
avoid sending MAP/UNMAP requests. Feature VIRTIO_IOMMU_F_PT_SHARING allows
a driver to give a page directory pointer (pgd) to the host and send
invalidations when removing or changing a mapping. In this mode, three
requests are used: probe, attach and invalidate. An address space cannot
use the MAP/UNMAP interface and PT_SHARING at the same time.

Device and driver first need to negotiate which page table format they
will be using. This depends on the physical IOMMU, so the request contains
a negotiation part to probe the device capabilities.

(1) Driver attaches devices to address spaces as usual, but a flag
VIRTIO_IOMMU_ATTACH_F_PRIVATE (working title) tells the device not to
create page tables for use with the MAP/UNMAP API. The driver intends
to manage the address space itself.

(2) Driver sends a PROBE_TABLE request. It sets len to the size of the
pg_format array.

VIRTIO_IOMMU_T_PROBE_TABLE

struct virtio_iommu_req_probe_table {
le32 address_space;
le32 flags;
le32 len;

le32 nr_contexts;
struct {
le32 model;
u8 format[64];
} pg_format[len];
};

Introducing a probe request is more flexible than advertising those
features in virtio config, because capabilities are dynamic, and depend on
which devices are attached to an address space. Within a single address
space, devices may support different numbers of contexts (PASIDs), and
some may not support recoverable faults.

(3) The device responds with success, returning in pg_format all page
table formats implemented by the physical IOMMU. 'model' 0 is invalid, so
the driver can initialize the array to 0 and deduce from there which
entries have been filled in by the device.

Using a probe method seems preferable over trying to attach every possible
format until one sticks. For instance, with an ARM guest running on an x86
host, PROBE_TABLE would return the Intel IOMMU page table format, and the
guest could use that page table code to handle its mappings, hidden behind
the IOMMU API. This requires that the page-table code is reasonably
abstracted from the architecture, as is done with drivers/iommu/io-pgtable
(an x86 guest could use any format implemented by io-pgtable, for example.)

(4) If the driver is able to use this format, it sends the ATTACH_TABLE
request.

VIRTIO_IOMMU_T_ATTACH_TABLE

struct virtio_iommu_req_attach_table {
        le32    address_space;
        le32    flags;
        le64    table;

        le32    nr_contexts;
        /* Page-table format description */
        le32    model;
        u8      config[64];
};


'table' is a pointer to the page directory. 'nr_contexts' isn't used
here.

For both ATTACH and PROBE, 'flags' are the following (and will be
explained later):

VIRTIO_IOMMU_ATTACH_TABLE_F_INDIRECT (1 << 0)
VIRTIO_IOMMU_ATTACH_TABLE_F_NATIVE (1 << 1)
VIRTIO_IOMMU_ATTACH_TABLE_F_FAULT (1 << 2)

Now 'model' is a bit tricky. We need to specify all possible page table
formats and their parameters. I'm not well-versed in x86, s390 or other
IOMMUs, so I'll just focus on the ARM world for this example. We basically
have two page table models, with a multitude of configuration bits:

* ARM LPAE
* ARM short descriptor

We could define a high-level identifier per page-table model, such as:

#define PG_TABLE_ARM 0x1
#define PG_TABLE_X86 0x2
...

And each model would define its own structure. On ARM 'format' could be a
simple u32 defining a variant, LPAE 32/64 or short descriptor. It could
also contain additional capabilities. Then depending on the variant,
'config' would be:

struct pg_config_v7s {
        le32    tcr;
        le32    prrr;
        le32    nmrr;
        le32    asid;
};

struct pg_config_lpae {
        le64    tcr;
        le64    mair;
        le32    asid;

        /* And maybe TTB1? */
};

struct pg_config_arm {
        le32    variant;
        union   ...;
};

I am really uneasy with describing all those nasty architectural details
in the virtio-iommu specification. We certainly won't start describing the
content bit-by-bit of tcr or mair here, but just declaring these fields
might be sufficient.

(5) Once the table is attached, the driver can simply write the page
tables and expect the physical IOMMU to observe the mappings without
any additional request. When changing or removing a mapping, however,
the driver must send an invalidate request.

VIRTIO_IOMMU_T_INVALIDATE

struct virtio_iommu_req_invalidate {
        le32    address_space;
        le32    context;
        le32    flags;
        le64    virt_addr;
        le64    range_size;

        u8      opaque[64];
};

'flags' may be:

VIRTIO_IOMMU_INVALIDATE_T_VADDR: invalidate a single VA range
from 'context' (context is 0 when !F_INDIRECT).

And with context tables only (explained below):

VIRTIO_IOMMU_INVALIDATE_T_SINGLE: invalidate all mappings from
'context' (context is 0 when !F_INDIRECT). virt_addr and range_size
are ignored.

VIRTIO_IOMMU_INVALIDATE_T_TABLE: with F_INDIRECT, invalidate entries
in the table that changed. Device reads the table again, compares it
to previous values, and invalidates all mappings for contexts that
changed. context, virt_addr and range_size are ignored.

IOMMUs may offer hints and quirks in their invalidation packets. The
opaque structure in invalidate would allow transporting those. This
depends on the page table format and, as with architectural page-table
definitions, I really don't want to have those details in the spec itself.


2. Sharing MMU page tables
--------------------------

The guest can share process page-tables with the physical IOMMU. To do
that, it sends PROBE_TABLE with (F_INDIRECT | F_NATIVE | F_FAULT). The
page table format is implicit, so the pg_format array can be empty (unless
the guest wants to query some specific property, e.g. number of levels
supported by the pIOMMU?). If the host answers with success, the guest can
send its MMU page table details with ATTACH_TABLE and (F_NATIVE |
F_INDIRECT | F_FAULT) flags.

F_FAULT means that the host communicates page requests from the device to
the guest, and the guest can handle them by mapping the virtual address in
the fault to a page. It is only available with VIRTIO_IOMMU_F_EVENT_QUEUE
(see below.)

F_NATIVE means that the pIOMMU pgtable format is the same as the guest
MMU pgtable format.

F_INDIRECT means that the 'table' pointer is a context table, instead of a
page directory. Each slot in the context table points to a page directory:

64 2 1 0
table ----> +---------------------+
| pgd |0|1|<--- context 0
| --- |0|0|<--- context 1
| pgd |0|1|
| --- |0|0|
| --- |0|0|
+---------------------+
| \___Entry is valid
|______reserved

Question: do we want per-context page table format, or can it stay global
for the whole indirect table?

Having a context table makes it possible to provide multiple address spaces for a
single device. In the simplest form, without F_INDIRECT we have a single
address space per device, but some devices may implement more, for
instance devices with the PCI PASID extension.

A slot's position in the context table gives an ID, between 0 and
nr_contexts. The guest can use this ID to have the device target a
specific address space with DMA. The mechanism to do that is
device-specific. For a PCI device, the ID is a PASID, and PCI doesn't
define a specific way of using them for DMA, it's the device driver's
concern.


3. Fault reporting
------------------

VIRTIO_IOMMU_F_EVENT_QUEUE

With this feature, an event virtqueue (1) is available. For now it will
only be used for fault handling, but I'm calling it eventq so that other
asynchronous features can piggy-back on it. The device may report faults and
page requests by sending buffers via the used ring.

#define VIRTIO_IOMMU_T_FAULT 0x05

struct virtio_iommu_evt_fault {
        struct virtio_iommu_evt_head {
                u8      type;
                u8      reserved[3];
        };

        u32     address_space;
        u32     context;

        u64     vaddr;
        u32     flags;          /* Access details: R/W/X */

        /* In the reply: */
        u32     reply;          /* Fault handled, or failure */
        u64     paddr;
};

The driver must send the reply via the request queue, with the fault status
in 'reply', and the mapped page in 'paddr' on success.

Existing fault handling interfaces such as PRI have a tag (PRG) for
identifying a page request (or group thereof) when sending a reply. I
wonder if this would be useful to us, but it seems like the
(address_space, context, vaddr) tuple is sufficient to identify a page
fault, provided the device doesn't send duplicate faults. Duplicate faults
could be required if they have a side effect, for instance implementing a
poor man's doorbell. If this is desirable, we could add a fault_id field.


4. Host implementation with VFIO
--------------------------------

The VFIO interface for sharing page tables is being worked on at the
moment by Intel. Other virtual IOMMU implementations will most likely let
the guest manage full context tables (PASID tables) themselves, giving the
context table pointer to the pIOMMU via a VFIO ioctl.

For the architecture-agnostic virtio-iommu however, we shouldn't have to
implement all possible formats of context table (they are at least
different between ARM SMMU and Intel IOMMU, and will certainly be extended
in future physical IOMMU architectures.) In addition, most users might
only care about having one page directory per device, as SVM is a luxury
at the moment and few devices support it. For these reasons, we should
allow passing single page directories via VFIO, using structures very
similar to those described above, whilst reusing the VFIO channel developed
for Intel vIOMMU.

* VFIO_SVM_INFO: probe page table formats
* VFIO_SVM_BIND: set pgd and arch-specific configuration

There is an inconvenience in letting the pIOMMU driver manage the guest's
context table. During a page table walk, the pIOMMU translates the context
table pointer using the stage-2 page tables. The context table must
therefore be mapped in guest-physical space by the pIOMMU driver. One
solution is to let the pIOMMU driver reserve some GPA space upfront using
the iommu and sysfs resv API [1]. The host would then carve that region
out of the guest-physical space using a firmware mechanism (for example DT
reserved-memory node).


III. Relaxed operations
=======================

VIRTIO_IOMMU_F_RELAXED

Adding an IOMMU dramatically reduces performance of a device, because
map/unmap operations are costly and produce a lot of TLB traffic. For
significant performance improvements, the device might allow the driver to
sacrifice safety for speed. In this mode, the driver does not need to send
UNMAP requests. The semantics of MAP change and are more complex to
implement. Given a MAP([start:end] -> phys, flags) request:

(1) If [start:end] isn't mapped, request succeeds as usual.
(2) If [start:end] overlaps an existing mapping [old_start:old_end], we
unmap [max(start, old_start):min(end, old_end)] and replace it with
[start:end].
(3) If [start:end] overlaps an existing mapping that matches the new map
request exactly (same flags, same phys address), the old mapping is
kept.

This squashing could be performed by the guest. The driver can catch unmap
requests from the DMA layer, and only relay map requests for (1) and (2).
A MAP request is therefore able to split and partially override an
existing mapping, which isn't allowed in non-relaxed mode. UNMAP requests
are unnecessary, but are now allowed to split or carve holes in mappings.

In this model, a MAP request may take longer, but we may have a net gain
by removing a lot of redundant requests. Squashing series of map/unmap
performed by the guest for the same mapping improves temporal reuse of
IOVA mappings, which I can observe by simply dumping IOMMU activity of a
virtio device. It reduces the number of TLB invalidations to the strict
minimum while keeping correctness of DMA operations (provided the device
obeys its driver). There is a good read on the subject of optimistic
teardown in paper [2].

This model is completely unsafe. A stale DMA transaction might access a
page long after the device driver in the guest unmapped it and
decommissioned the page. The DMA transaction might hit a completely
different part of the system that is now reusing the page. Existing
relaxed implementations attempt to mitigate the risk by setting a timeout
on the teardown. Unmap requests from device drivers are not discarded
entirely, but buffered and sent at a later time. Paper [2] reports good
results with a 10ms delay.

We could add a way for device and driver to negotiate a vulnerability
window to mitigate the risk of DMA attacks. The driver might not accept a
window at all, since it requires more infrastructure to keep delayed
mappings. In my opinion, it should be made clear that regardless of the
duration of this window, any driver accepting F_RELAXED feature makes the
guest completely vulnerable, and the choice boils down to either isolation
or speed, not a bit of both.


IV. Misc
========

I think we have enough to go on for a while. To improve MAP throughput, I
considered adding a MAP_SG request depending on a feature bit, with
variable size:

struct virtio_iommu_req_map_sg {
        struct virtio_iommu_req_head;
        u32     address_space;
        u32     nr_elems;
        u64     virt_addr;
        u64     size;
        u64     phys_addr[nr_elems];
};

Would create the following mappings:

        virt_addr            -> phys_addr[0]
        virt_addr + size     -> phys_addr[1]
        virt_addr + 2 * size -> phys_addr[2]
        ...

This would avoid the overhead of multiple map commands. We could try to
find a more cunning format to compress virtually-contiguous mappings with
different (phys, size) pairs as well. But Linux drivers rarely prefer
map_sg() functions over regular map(), so I don't know if the whole map_sg
feature is worth the effort. All we would gain is a few bytes anyway.

My current map_sg implementation in the virtio-iommu driver adds a batch
of map requests to the queue and kicks the host once. That might be enough
of an optimization.


Another invasive optimization would be adding grouped requests. By adding
two flags in the header, L and G, we can group sequences of requests
together, and have one status at the end, either 0 if all requests in the
group succeeded, or the status of the first request that failed. This is
all in-order. Requests in a group follow each other; there is no sequence
identifier.

___ L: request is last in the group
/ _ G: request is part of a group
| /
v v
31 9 8 7 0
+--------------------------------+ <------- RO descriptor
| res0 |0|1| type |
+--------------------------------+
| payload |
+--------------------------------+
| res0 |0|1| type |
+--------------------------------+
| payload |
+--------------------------------+
| res0 |0|1| type |
+--------------------------------+
| payload |
+--------------------------------+
| res0 |1|1| type |
+--------------------------------+
| payload |
+--------------------------------+ <------- WO descriptor
| res0 | status |
+--------------------------------+

This adds some complexity on the device, since it must unroll whatever was
done by successful requests in a group as soon as one fails, and reject
all subsequent ones. A group of requests is an atomic operation. As with
map_sg, this change mostly allows saving space and virtio descriptors.


[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-kernel-iommu_groups
[2] vIOMMU: Efficient IOMMU Emulation
N. Amit, M. Ben-Yehuda, D. Tsafrir, A. Schuster
Tian, Kevin
2017-04-21 08:31:15 UTC
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
Here I propose a few ideas for extensions and optimizations. This is all
very exploratory, feel free to correct mistakes and suggest more things.
[...]
II. Page table sharing
======================
1. Sharing IOMMU page tables
----------------------------
VIRTIO_IOMMU_F_PT_SHARING
This is independent of the nested mode described in I.2, but relies on a
similar feature in the physical IOMMU: having two stages of page tables,
one for the host and one for the guest.
When this is supported, the guest can manage its own s1 page directory, to
avoid sending MAP/UNMAP requests. Feature
VIRTIO_IOMMU_F_PT_SHARING allows
a driver to give a page directory pointer (pgd) to the host and send
invalidations when removing or changing a mapping. In this mode, three
requests are used: probe, attach and invalidate. An address space cannot
be using the MAP/UNMAP interface and PT_SHARING at the same time.
Device and driver first need to negotiate which page table format they
will be using. This depends on the physical IOMMU, so the request contains
a negotiation part to probe the device capabilities.
(1) Driver attaches devices to address spaces as usual, but a flag
VIRTIO_IOMMU_ATTACH_F_PRIVATE (working title) tells the device not to
create page tables for use with the MAP/UNMAP API. The driver intends
to manage the address space itself.
(2) Driver sends a PROBE_TABLE request. It sets len > 0 with the size of
pg_format array.
VIRTIO_IOMMU_T_PROBE_TABLE
struct virtio_iommu_req_probe_table {
le32 address_space;
le32 flags;
le32 len;
le32 nr_contexts;
struct {
le32 model;
u8 format[64];
} pg_format[len];
};
Introducing a probe request is more flexible than advertising those
features in virtio config, because capabilities are dynamic, and depend on
which devices are attached to an address space. Within a single address
space, devices may support different numbers of contexts (PASIDs), and
some may not support recoverable faults.
(3) Device responds success with all page table formats implemented by the
physical IOMMU in pg_format. 'model' 0 is invalid, so driver can
initialize the array to 0 and deduce from there which entries have
been filled by the device.
Using a probe method seems preferable over trying to attach every possible
format until one sticks. For instance, with an ARM guest running on an x86
host, PROBE_TABLE would return the Intel IOMMU page table format, and the
guest could use that page table code to handle its mappings, hidden behind
the IOMMU API. This requires that the page-table code is reasonably
abstracted from the architecture, as is done with drivers/iommu/io-pgtable
(an x86 guest could use any format implement by io-pgtable for example.)
So essentially you need to modify all existing IOMMU drivers to support
page table sharing in pvIOMMU. After the abstraction is done, the core
pvIOMMU files can be kept vendor agnostic. But if we talk about the whole
pvIOMMU module, it actually includes vendor-specific logic, unlike typical
para-virtualized virtio drivers, which are completely vendor agnostic. Is
this understanding accurate?

It also means the host-side pIOMMU driver needs to propagate all
supported formats through VFIO to the Qemu vIOMMU, meaning such format
definitions need to be agreed consistently across all those components.

[...]
2. Sharing MMU page tables
--------------------------
The guest can share process page-tables with the physical IOMMU. To do
that, it sends PROBE_TABLE with (F_INDIRECT | F_NATIVE | F_FAULT). The
page table format is implicit, so the pg_format array can be empty (unless
the guest wants to query some specific property, e.g. number of levels
supported by the pIOMMU?). If the host answers with success, guest can
send its MMU page table details with ATTACH_TABLE and (F_NATIVE |
F_INDIRECT | F_FAULT) flags.
F_FAULT means that the host communicates page requests from device to the
guest, and the guest can handle them by mapping virtual address in the
fault to pages. It is only available with VIRTIO_IOMMU_F_FAULT_QUEUE (see
below.)
F_NATIVE means that the pIOMMU pgtable format is the same as guest MMU
pgtable format.
F_INDIRECT means that 'table' pointer is a context table, instead of a
64 2 1 0
table ----> +---------------------+
| pgd |0|1|<--- context 0
| --- |0|0|<--- context 1
| pgd |0|1|
| --- |0|0|
| --- |0|0|
+---------------------+
| \___Entry is valid
|______reserved
Question: do we want per-context page table format, or can it stay global
for the whole indirect table?
Are you defining this context table format in software, or following a
hardware definition? At least for VT-d there is a strict hardware-defined
structure (PASID table) which must be used here.

[...]
4. Host implementation with VFIO
--------------------------------
The VFIO interface for sharing page tables is being worked on at the
moment by Intel. Other virtual IOMMU implementation will most likely let
guest manage full context tables (PASID tables) themselves, giving the
context table pointer to the pIOMMU via a VFIO ioctl.
For the architecture-agnostic virtio-iommu however, we shouldn't have to
implement all possible formats of context table (they are at least
different between ARM SMMU and Intel IOMMU, and will certainly be extended
Since you'll ultimately require vendor-specific page table logic anyway,
why not also abstract this context table, which then wouldn't
require the host-side changes below?
in future physical IOMMU architectures.) In addition, most users might
only care about having one page directory per device, as SVM is a luxury
at the moment and few devices support it. For these reasons, we should
allow to pass single page directories via VFIO, using very similar
structures as described above, whilst reusing the VFIO channel developed
for Intel vIOMMU.
* VFIO_SVM_INFO: probe page table formats
* VFIO_SVM_BIND: set pgd and arch-specific configuration
There is an inconvenient with letting the pIOMMU driver manage the guest's
context table. During a page table walk, the pIOMMU translates the context
table pointer using the stage-2 page tables. The context table must
therefore be mapped in guest-physical space by the pIOMMU driver. One
solution is to let the pIOMMU driver reserve some GPA space upfront using
the iommu and sysfs resv API [1]. The host would then carve that region
out of the guest-physical space using a firmware mechanism (for example DT
reserved-memory node).
Can you elaborate on this flow? The pIOMMU driver doesn't directly manage
the GPA address space, so it's not reasonable for it to arbitrarily
specify a reserved range. It might make more sense for the GPA owner
(e.g. Qemu) to decide and then pass the information to the pIOMMU driver.
III. Relaxed operations
=======================
VIRTIO_IOMMU_F_RELAXED
Adding an IOMMU dramatically reduces performance of a device, because
map/unmap operations are costly and produce a lot of TLB traffic. For
significant performance improvements, device might allow the driver to
sacrifice safety for speed. In this mode, the driver does not need to send
UNMAP requests. The semantics of MAP change and are more complex to
(1) If [start:end] isn't mapped, request succeeds as usual.
(2) If [start:end] overlaps an existing mapping [old_start:old_end], we
unmap [max(start, old_start):min(end, old_end)] and replace it with
[start:end].
(3) If [start:end] overlaps an existing mapping that matches the new map
request exactly (same flags, same phys address), the old mapping is
kept.
This squashing could be performed by the guest. The driver can catch unmap
requests from the DMA layer, and only relay map requests for (1) and (2).
A MAP request is therefore able to split and partially override an
existing mapping, which isn't allowed in non-relaxed mode. UNMAP requests
are unnecessary, but are now allowed to split or carve holes in mappings.
In this model, a MAP request may take longer, but we may have a net gain
by removing a lot of redundant requests. Squashing series of map/unmap
performed by the guest for the same mapping improves temporal reuse of
IOVA mappings, which I can observe by simply dumping IOMMU activity of a
virtio device. It reduce the number of TLB invalidations to the strict
minimum while keeping correctness of DMA operations (provided the device
obeys its driver). There is a good read on the subject of optimistic
teardown in paper [2].
This model is completely unsafe. A stale DMA transaction might access a
page long after the device driver in the guest unmapped it and
decommissioned the page. The DMA transaction might hit into a completely
different part of the system that is now reusing the page. Existing
relaxed implementations attempt to mitigate the risk by setting a timeout
on the teardown. Unmap requests from device drivers are not discarded
entirely, but buffered and sent at a later time. Paper [2] reports good
results with a 10ms delay.
We could add a way for device and driver to negotiate a vulnerability
window to mitigate the risk of DMA attacks. Driver might not accept a
window at all, since it requires more infrastructure to keep delayed
mappings. In my opinion, it should be made clear that regardless of the
duration of this window, any driver accepting F_RELAXED feature makes the
guest completely vulnerable, and the choice boils down to either isolation
or speed, not a bit of both.
Even with the above optimization I'd imagine the performance drop is
still significant for kernel map/unmap usages, not to mention when such
optimization is not possible because safety is required (actually I don't
know why an IOMMU is still required if safety can be compromised. Aren't
we using the IOMMU for security purposes?). I think we'd better focus on
higher-value usages, e.g. user space DMA protection (DPDK) and
SVM, while leaving kernel protection with a lower priority (mostly for
functionality verification). Is this strategy aligned with your thought?

btw what about interrupt remapping/posting? Are they also in your
plan for pvIOMMU?

Last, thanks for the very informative write-up! Looks like a long
enabling path is required to get the pvIOMMU feature on par with a real
IOMMU. Starting with a minimal set is relatively easier. :-)

Thanks
Kevin
Jean-Philippe Brucker
2017-04-24 15:05:55 UTC
Post by Tian, Kevin
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
Here I propose a few ideas for extensions and optimizations. This is all
very exploratory, feel free to correct mistakes and suggest more things.
[...]
II. Page table sharing
======================
1. Sharing IOMMU page tables
----------------------------
VIRTIO_IOMMU_F_PT_SHARING
This is independent of the nested mode described in I.2, but relies on a
similar feature in the physical IOMMU: having two stages of page tables,
one for the host and one for the guest.
When this is supported, the guest can manage its own s1 page directory, to
avoid sending MAP/UNMAP requests. Feature
VIRTIO_IOMMU_F_PT_SHARING allows
a driver to give a page directory pointer (pgd) to the host and send
invalidations when removing or changing a mapping. In this mode, three
requests are used: probe, attach and invalidate. An address space cannot
be using the MAP/UNMAP interface and PT_SHARING at the same time.
Device and driver first need to negotiate which page table format they
will be using. This depends on the physical IOMMU, so the request contains
a negotiation part to probe the device capabilities.
(1) Driver attaches devices to address spaces as usual, but a flag
VIRTIO_IOMMU_ATTACH_F_PRIVATE (working title) tells the device not to
create page tables for use with the MAP/UNMAP API. The driver intends
to manage the address space itself.
(2) Driver sends a PROBE_TABLE request. It sets len > 0 with the size of
pg_format array.
VIRTIO_IOMMU_T_PROBE_TABLE
struct virtio_iommu_req_probe_table {
le32 address_space;
le32 flags;
le32 len;
le32 nr_contexts;
struct {
le32 model;
u8 format[64];
} pg_format[len];
};
Introducing a probe request is more flexible than advertising those
features in virtio config, because capabilities are dynamic, and depend on
which devices are attached to an address space. Within a single address
space, devices may support different numbers of contexts (PASIDs), and
some may not support recoverable faults.
(3) Device responds success with all page table formats implemented by the
physical IOMMU in pg_format. 'model' 0 is invalid, so driver can
initialize the array to 0 and deduce from there which entries have
been filled by the device.
Using a probe method seems preferable over trying to attach every possible
format until one sticks. For instance, with an ARM guest running on an x86
host, PROBE_TABLE would return the Intel IOMMU page table format, and the
guest could use that page table code to handle its mappings, hidden behind
the IOMMU API. This requires that the page-table code is reasonably
abstracted from the architecture, as is done with drivers/iommu/io-pgtable
(an x86 guest could use any format implement by io-pgtable for example.)
So essentially you need modify all existing IOMMU drivers to support page
table sharing in pvIOMMU. After abstraction is done the core pvIOMMU files
can be kept vendor agnostic. But if we talk about the whole pvIOMMU
module, it actually includes vendor specific logic thus unlike typical
para-virtualized virtio drivers being completely vendor agnostic. Is this
understanding accurate?
Yes, although kernel modules would be separate. For Linux on ARM we
already have the page-table logic abstracted in iommu/io-pgtable module,
because multiple IOMMUs share the same PT formats (SMMUv2, SMMUv3, Renesas
IPMMU, Qcom MSM, Mediatek). It offers a simple interface:

* When attaching devices to an IOMMU domain, the IOMMU driver registers
its page table format and provides invalidation callbacks.

* On iommu_map/unmap, the IOMMU driver calls into io_pgtable_ops, which
provide map, unmap and iova_to_phys functions.

* Page table operations call back into the driver via iommu_gather_ops
when they need to invalidate TLB entries.

Currently only a few flavors of ARM PT formats are implemented, but
other page table formats could be added if they fit this model.
Post by Tian, Kevin
It also means in the host-side pIOMMU driver needs to propagate all
supported formats through VFIO to Qemu vIOMMU, meaning
such format definitions need be consistently agreed across all those
components.
Yes, that's the icky part. We need to define a format that every OS and
hypervisor implementing virtio-iommu can understand (similarly to the
PASID table sharing interface that Yi L is working on for VFIO, although
that one is contained in Linux UAPI and doesn't require other OSes to know
about it).
Post by Tian, Kevin
2. Sharing MMU page tables
--------------------------
The guest can share process page-tables with the physical IOMMU. To do
that, it sends PROBE_TABLE with (F_INDIRECT | F_NATIVE | F_FAULT). The
page table format is implicit, so the pg_format array can be empty (unless
the guest wants to query some specific property, e.g. number of levels
supported by the pIOMMU?). If the host answers with success, guest can
send its MMU page table details with ATTACH_TABLE and (F_NATIVE |
F_INDIRECT | F_FAULT) flags.
F_FAULT means that the host communicates page requests from device to the
guest, and the guest can handle them by mapping virtual address in the
fault to pages. It is only available with VIRTIO_IOMMU_F_FAULT_QUEUE (see
below.)
F_NATIVE means that the pIOMMU pgtable format is the same as guest MMU
pgtable format.
F_INDIRECT means that 'table' pointer is a context table, instead of a
64 2 1 0
table ----> +---------------------+
| pgd |0|1|<--- context 0
| --- |0|0|<--- context 1
| pgd |0|1|
| --- |0|0|
| --- |0|0|
+---------------------+
| \___Entry is valid
|______reserved
Question: do we want per-context page table format, or can it stay global
for the whole indirect table?
Are you defining this context table format in software, or following
hardware definition? At least for VT-d there is a strict hardware-defined
structure (PASID table) which must be used here.
This definition is only for virtio-iommu; I didn't follow any hardware
definitions. For SMMUv3 the context tables are completely different. There
may be two levels of tables, and each context gets a 512-bit descriptor
(it has per-context page table format and other info).

To be honest I'm not sure where I was going with this indirect table. I
can't see any advantage in using an indirect table over sending a bunch of
individual ATTACH_TABLE requests, each with a pgd and a pasid. However the
indirect flag could be needed for sharing physical context tables (below).
Post by Tian, Kevin
4. Host implementation with VFIO
--------------------------------
The VFIO interface for sharing page tables is being worked on at the
moment by Intel. Other virtual IOMMU implementation will most likely let
guest manage full context tables (PASID tables) themselves, giving the
context table pointer to the pIOMMU via a VFIO ioctl.
For the architecture-agnostic virtio-iommu however, we shouldn't have to
implement all possible formats of context table (they are at least
different between ARM SMMU and Intel IOMMU, and will certainly be extended
Since anyway you'll finally require vendor specific page table logic,
why not also abstracting this context table too which then doesn't
require below host-side changes?
I keep going back and forth on that question :) Some pIOMMUs won't have
context tables, so we need an ATTACH_TABLE interface for sharing a single
pgd anyway. Now for SVM, we could either create an additional interface
for vendor-specific context tables, or send individual ATTACH_TABLE
requests.

The disadvantage of sharing context tables is that it requires more
specification work to enumerate all existing context table formats,
similarly to the work needed for defining all page table formats. As I
said earlier this work needs to be done anyway for VFIO, but this time it
would be an interface that needs to suit all OSes and hypervisors, not only
Linux. I think it's a lot more complicated to agree on that since it's not
a matter of sending Linux patches to extend the interface anymore, it is a
wider scope.

So we need to carefully consider whether this additional specification
effort is really needed. We certainly want to share page tables with the
guest to improve performance over the map/unmap interface, but I don't
see a similar performance concern on context tables. Supposedly binding a
device context to a task is a relatively rare event, much less frequent
than updating PT mappings.

In addition page table formats might be more common than context table
formats and therefore easier to abstract. With context tables you will
need one format per IOMMU variant, whereas (on ARM) multiple IOMMUs could
share the same page table format. I'm not sure whether the same argument
applies to x86 (similarity of page tables between Intel and AMD IOMMU
versus differences in PASID/GCR3 table formats).

On the other hand, the clear advantage of sharing context tables with the
guest is that we don't have to do the complicated memory reserve dance
described below.
Post by Tian, Kevin
in future physical IOMMU architectures.) In addition, most users might
only care about having one page directory per device, as SVM is a luxury
at the moment and few devices support it. For these reasons, we should
allow passing single page directories via VFIO, using very similar
structures as described above, whilst reusing the VFIO channel developed
for Intel vIOMMU.
* VFIO_SVM_INFO: probe page table formats
* VFIO_SVM_ATTACH_TABLE: set pgd and arch-specific configuration
There is an inconvenience in letting the pIOMMU driver manage the guest's
context table. During a page table walk, the pIOMMU translates the context
table pointer using the stage-2 page tables. The context table must
therefore be mapped in guest-physical space by the pIOMMU driver. One
solution is to let the pIOMMU driver reserve some GPA space upfront using
the iommu and sysfs resv API [1]. The host would then carve that region
out of the guest-physical space using a firmware mechanism (for example DT
reserved-memory node).
Can you elaborate this flow? pIOMMU driver doesn't directly manage GPA
address space thus it's not reasonable for it to randomly specify a reserved
range. It might make more sense for GPA owner (e.g. Qemu) to decide and
then pass information to pIOMMU driver.
I realized that it's actually more complicated than this, because I didn't
consider hotplugging devices into VM. If you insert new devices at
runtime, you might need more GPA space for storing their context tables,
but only if they don't attach to an existing address space (otherwise on
ARM we could reuse the existing context table).

So GPA space cannot be reserved statically, but must be reclaimed at
runtime. In addition, context tables can become quite big, and with static
reserve we'd have to reserve tonnes of GPA space upfront even if the guest
isn't planning on using context tables at all. And even without
considering SVM, some IOMMUs (namely SMMUv3) would still need a
single-entry table in GPA space for nested translation.

I don't have any pleasant solution so far. One way of doing it is to carry
memory reclaim within the ATTACH_TABLE requests:

(1) Driver sends ATTACH_TABLE(pasid, pgd)
(2) Device relays BIND(pasid, pgd) to pIOMMU via VFIO
(3) pIOMMU needs, say, 512KiB of contiguous GPA for mapping a context
table. Returns this info via VFIO.
(4) Device replies to ATTACH_TABLE with "try again" and, somewhere in the
request buffer, stores the amount of contiguous GPA that the operation
will cost.
(5) Driver re-sends the ATTACH_TABLE request, but this time with a GPA
address that the host can use.

Note that each reclaim for a table should be accompanied by an identifier
for that table, so that if a second ATTACH_TABLE request reaches the
device between (4) and (5) and requires GPA space for the same table, the
device returns the same GPA reclaim with the same identifier and the
driver won't have to allocate GPA twice.

If the pIOMMU needs N > 1 contiguous GPA chunks (for instance, two levels
of context tables) we could do N reclaims (requiring N + 1 ATTACH_TABLE
requests) or put an array in the ATTACH_TABLE request. I prefer the
former; there is little advantage to the latter.
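The retry handshake in steps (1)-(5) can be sketched as a small simulation. This is a minimal sketch, not the actual interface: the status codes, the `attach_table_req` layout and the `alloc_gpa` callback are all hypothetical names; the real request would go through virtio and VFIO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status codes and request layout for the ATTACH_TABLE
 * reclaim flow; none of these names come from an actual spec. */
enum { VIOMMU_OK = 0, VIOMMU_TRY_AGAIN = 1 };

struct attach_table_req {
    uint32_t pasid;
    uint64_t pgd;
    uint64_t gpa_hint;   /* GPA chunk offered by the driver, 0 on first try */
    uint64_t gpa_needed; /* filled in by the device on TRY_AGAIN */
    uint32_t reclaim_id; /* identifies the table needing backing memory */
};

/* Device side: pretend the pIOMMU needs 512 KiB of contiguous GPA for a
 * context table before the attach can succeed (step 3-4). */
static int device_attach_table(struct attach_table_req *req)
{
    if (req->gpa_hint == 0) {
        req->gpa_needed = 512 * 1024;
        req->reclaim_id = 1;
        return VIOMMU_TRY_AGAIN;
    }
    return VIOMMU_OK;
}

/* Driver side: retry loop (steps 1 and 5), allocating GPA space on
 * demand via a caller-provided allocator. */
static int driver_attach(uint32_t pasid, uint64_t pgd, uint64_t *gpa_out,
                         uint64_t (*alloc_gpa)(uint64_t size))
{
    struct attach_table_req req = { .pasid = pasid, .pgd = pgd };
    int ret;

    while ((ret = device_attach_table(&req)) == VIOMMU_TRY_AGAIN)
        req.gpa_hint = alloc_gpa(req.gpa_needed);

    *gpa_out = req.gpa_hint;
    return ret;
}

/* Toy allocator standing in for the guest's GPA space management. */
static uint64_t toy_alloc_gpa(uint64_t size)
{
    (void)size;
    return 0x80000000ULL;
}
```

A real driver would also cache `reclaim_id` so that concurrent attaches to the same table reuse one allocation, as noted above.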

Alternatively, this could be a job for something similar to
virtio-balloon, with contiguous chunks instead of pages. The ATTACH_TABLE
would block the primary request queue while the GPA reclaim is serviced by
the guest on an auxiliary queue (which may not be acceptable if the driver
expects MAP/UNMAP/INVALIDATE requests on the same queue to be fast).

In any case, I would greatly appreciate any proposal for a nicer
mechanism, because this feels very fragile.
Post by Tian, Kevin
III. Relaxed operations
=======================
VIRTIO_IOMMU_F_RELAXED
Adding an IOMMU dramatically reduces performance of a device, because
map/unmap operations are costly and produce a lot of TLB traffic. For
significant performance improvements, device might allow the driver to
sacrifice safety for speed. In this mode, the driver does not need to send
UNMAP requests. The semantics of MAP change and are more complex:
(1) If [start:end] isn't mapped, the request succeeds as usual.
(2) If [start:end] overlaps an existing mapping [old_start:old_end], we
unmap [max(start, old_start):min(end, old_end)] and replace it with
[start:end].
(3) If [start:end] overlaps an existing mapping that matches the new map
request exactly (same flags, same phys address), the old mapping is
kept.
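The three cases can be expressed as a small classification helper. This is only a sketch of the decision logic against a single existing mapping; the struct and enum names are made up, and a real driver would walk an interval tree rather than check one entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One existing mapping; a real implementation keeps an interval tree. */
struct mapping { uint64_t start, end, phys; uint32_t flags; };

enum map_action { MAP_NEW, MAP_REPLACE_OVERLAP, MAP_KEEP_OLD };

/* Classify a relaxed-mode MAP request against an existing mapping,
 * following cases (1)-(3) above. Ranges are inclusive. */
static enum map_action classify_map(const struct mapping *old, bool old_valid,
                                    uint64_t start, uint64_t end,
                                    uint64_t phys, uint32_t flags)
{
    if (!old_valid || end < old->start || start > old->end)
        return MAP_NEW;              /* (1): no overlap */
    if (start == old->start && end == old->end &&
        phys == old->phys && flags == old->flags)
        return MAP_KEEP_OLD;         /* (3): exact match, keep old mapping */
    return MAP_REPLACE_OVERLAP;      /* (2): unmap the overlap, remap */
}
```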
This squashing could be performed by the guest. The driver can catch unmap
requests from the DMA layer, and only relay map requests for (1) and (2).
A MAP request is therefore able to split and partially override an
existing mapping, which isn't allowed in non-relaxed mode. UNMAP requests
are unnecessary, but are now allowed to split or carve holes in mappings.
In this model, a MAP request may take longer, but we may have a net gain
by removing a lot of redundant requests. Squashing series of map/unmap
performed by the guest for the same mapping improves temporal reuse of
IOVA mappings, which I can observe by simply dumping IOMMU activity of a
virtio device. It reduces the number of TLB invalidations to the strict
minimum while keeping correctness of DMA operations (provided the device
obeys its driver). There is a good read on the subject of optimistic
teardown in paper [2].
This model is completely unsafe. A stale DMA transaction might access a
page long after the device driver in the guest unmapped it and
decommissioned the page. The DMA transaction might hit a completely
different part of the system that is now reusing the page. Existing
relaxed implementations attempt to mitigate the risk by setting a timeout
on the teardown. Unmap requests from device drivers are not discarded
entirely, but buffered and sent at a later time. Paper [2] reports good
results with a 10ms delay.
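The delayed-teardown mitigation can be sketched as a buffered unmap queue. This is an illustrative structure only (names and the fixed-size array are hypothetical); it shows the idea of holding unmaps until a deadline, not an actual driver design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buffered unmap with a teardown delay (paper [2] uses 10 ms): unmap
 * requests are queued with a deadline and only sent once it expires. */
#define TEARDOWN_DELAY_MS 10
#define MAX_PENDING 16

struct pending_unmap { uint64_t iova, size, deadline_ms; };

struct unmap_buffer {
    struct pending_unmap q[MAX_PENDING];
    size_t len;
};

static void queue_unmap(struct unmap_buffer *b, uint64_t iova, uint64_t size,
                        uint64_t now_ms)
{
    if (b->len < MAX_PENDING)
        b->q[b->len++] = (struct pending_unmap){ iova, size,
                                                 now_ms + TEARDOWN_DELAY_MS };
}

/* Send every unmap whose delay has elapsed; returns how many were sent. */
static size_t flush_expired(struct unmap_buffer *b, uint64_t now_ms,
                            void (*send)(uint64_t iova, uint64_t size))
{
    size_t i = 0, flushed = 0;

    while (i < b->len) {
        if (b->q[i].deadline_ms <= now_ms) {
            send(b->q[i].iova, b->q[i].size);
            b->q[i] = b->q[--b->len]; /* unordered removal */
            flushed++;
        } else {
            i++;
        }
    }
    return flushed;
}

/* Test helper counting sent unmaps. */
static int sent_count;
static void count_send(uint64_t iova, uint64_t size)
{
    (void)iova; (void)size;
    sent_count++;
}
```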
We could add a way for device and driver to negotiate a vulnerability
window to mitigate the risk of DMA attacks. Driver might not accept a
window at all, since it requires more infrastructure to keep delayed
mappings. In my opinion, it should be made clear that regardless of the
duration of this window, any driver accepting F_RELAXED feature makes the
guest completely vulnerable, and the choice boils down to either isolation
or speed, not a bit of both.
Even with the above optimization I'd imagine the performance drop is still
significant for kernel map/unmap usages, not to say when such
optimization is not possible if safety is required (actually I don't
know why an IOMMU is still required if safety can be compromised. Aren't
we using IOMMU for security purpose?).
I guess apart from security concerns, a significant use case would be
scatter-gather, avoiding large contiguous (and pinned down) allocations in
guests. It's quite useful when you start doing DMA over MB or GB of
memory. It also allows pass-through to guest userspace, but for that there
are other ways (UIO or vfio-noiommu).
Post by Tian, Kevin
I think we'd better focus on
higher-value usages, e.g. user space DMA protection (DPDK) and
SVM, while leaving kernel protection with a lower priority (mostly for
functionality verification). Is this strategy aligned with your thought?
btw what about interrupt remapping/posting? Are they also in your
plan for pvIOMMU?
I didn't think about this so far, because we don't have a special region
reserved for MSIs in the ARM IOMMUs; all MSI doorbells are accessed with
IOVAs and translated similarly to other regions. In addition with KVM ARM,
MSI injection bypasses the IOMMU altogether, the host doesn't actually
write the MSI. I could take a look at what other hypervisors and
architectures do.
Post by Tian, Kevin
Last, thanks for the very informative write-up! Looks like a long enabling
path is required to get the pvIOMMU feature on par with a real IOMMU. Starting
with a minimal set is relatively easier. :-)
Yes, I described possible improvements in 3/3 in order to see how they
would fit within the baseline device of 2/3. But apart from vhost
prototype, these are a long way off, and I'd like to make sure that the
base is solid before tackling the rest.

Thanks,
Jean-Philippe
Michael S. Tsirkin
2017-04-26 16:24:24 UTC
Permalink
Post by Jean-Philippe Brucker
Here I propose a few ideas for extensions and optimizations. This is all
very exploratory, feel free to correct mistakes and suggest more things.
I. Linux host
1. vhost-iommu
A qemu based implementation would be a first step.
Would allow validating the claim that it's much
simpler to support than e.g. VTD.
Post by Jean-Philippe Brucker
2. VFIO nested translation
II. Page table sharing
1. Sharing IOMMU page tables
2. Sharing MMU page tables (SVM)
3. Fault reporting
4. Host implementation with VFIO
III. Relaxed operations
IV. Misc
I. Linux host
=============
1. vhost-iommu
--------------
An advantage of virtualizing an IOMMU using virtio is that it allows
hoisting a lot of the emulation code into the kernel using vhost, avoiding
a return to userspace for each request. The mainline kernel already
implements vhost-net, vhost-scsi and vhost-vsock, and a lot of core code
could be reused.
Introducing vhost in a simplified scenario 1 (guest userspace removed):
MEM____pIOMMU________PCI device____________ HARDWARE
| \
----------|-------------+-------------+-----\--------------------------
| : KVM : \
pIOMMU drv : : \ KERNEL
| : : net drv
VFIO : : /
| : : /
vhost-iommu_________________________virtio-iommu-drv
--------------------------------------+-------------------------------
HOST : GUEST
Introducing vhost in scenario 2, userspace now only handles the device:
MEM__pIOMMU___PCI device HARDWARE
| |
-------|---------|------+-------------+-------------------------------
pIOMMU drv | : : KERNEL
_vhost-net________________________virtio-net drv
(2) / : : / (1a)
/ : : /
vhost-iommu________________________________virtio-iommu drv
: : (1b)
------------------------+-------------+-------------------------------
HOST : GUEST
(1) a. Guest virtio driver maps ring and buffers
b. Map requests are relayed to the host the same way.
(2) To access any guest memory, vhost-net must query the IOMMU. We can
reuse the existing TLB protocol for this. TLB commands are written to
and read from the vhost-net fd.
As defined in Linux/include/uapi/linux/vhost.h, the vhost msg structure is:
struct vhost_iotlb_msg {
__u64 iova;
__u64 size;
__u64 uaddr;
__u8 perm; /* R/W */
__u8 type;
#define VHOST_IOTLB_MISS
#define VHOST_IOTLB_UPDATE /* MAP */
#define VHOST_IOTLB_INVALIDATE /* UNMAP */
#define VHOST_IOTLB_ACCESS_FAIL
};
struct vhost_msg {
int type;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};
The vhost-iommu device associates a virtual device ID to a TLB fd. We
should be able to use the same commands for [vhost-net <-> virtio-iommu]
and [virtio-net <-> vhost-iommu] communication. A virtio-net device
would open a socketpair and hand one side to vhost-iommu.
If vhost_msg is ever used for another purpose than TLB, we'll have some
trouble, as there will be multiple clients that want to read/write the
vhost fd. A multicast transport method will be needed. Until then, this
can work.
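The socketpair handoff between the virtio-net device model and vhost-iommu can be demonstrated concretely. The message struct below is a cut-down, hypothetical stand-in for `vhost_iotlb_msg` (real TLB traffic goes through the vhost fd with the exact UAPI layout); the point is only that a `SOCK_SEQPACKET` pair preserves message boundaries between the two sides.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Toy stand-in for struct vhost_iotlb_msg, just to demonstrate the
 * channel: one end goes to the virtio-net device model, the other to
 * vhost-iommu. */
struct toy_iotlb_msg {
    uint64_t iova, size, uaddr;
    uint8_t perm;   /* R/W */
    uint8_t type;   /* miss/update/invalidate */
};

/* Create the TLB channel: a Unix socketpair with datagram semantics, so
 * each write() is received as one whole message. */
static int make_tlb_channel(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds);
}
```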
(1) Userspace sets up vhost-iommu as with other vhost devices, by using
standard vhost ioctls. Userspace starts by describing the system topology
ioctl(iommu_fd, VHOST_IOMMU_ADD_DEVICE, struct
vhost_iommu_add_device)
#define VHOST_IOMMU_DEVICE_TYPE_VFIO
#define VHOST_IOMMU_DEVICE_TYPE_TLB
struct vhost_iommu_add_device {
__u8 type;
__u32 devid;
union {
struct vhost_iommu_device_vfio {
int vfio_group_fd;
};
struct vhost_iommu_device_tlb {
int fd;
};
};
};
(2) VIRTIO_IOMMU_T_ATTACH(address space, devid)
vhost-iommu creates an address space if necessary, finds the device along
with the relevant operations. If type is VFIO, operations are done on a
container, otherwise they are done on single devices.
(3) VIRTIO_IOMMU_T_MAP(address space, virt, phys, size, flags)
Turn phys into an hva using the vhost mem table.
- If type is TLB, either preload with VHOST_IOTLB_UPDATE or store the
mapping locally and wait for the TLB to ask for it with a
VHOST_IOTLB_MISS.
- If type is VFIO, turn it into a VFIO_IOMMU_MAP_DMA (might need to
introduce a shortcut in the external user API of VFIO).
(4) VIRTIO_IOMMU_T_UNMAP(address space, virt, phys, size, flags)
- If type is TLB, send a VHOST_IOTLB_INVALIDATE.
- If type is VFIO, turn it into VFIO_IOMMU_UNMAP_DMA.
(5) VIRTIO_IOMMU_T_DETACH(address space, devid)
Undo whatever was done in (2).
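Steps (2)-(5) amount to a per-request dispatch with backend-specific map/unmap ops. The sketch below uses invented names (`backend_ops`, `dispatch`) to show the shape of that switch; in a real vhost-iommu the VFIO branch would issue container ioctls and the TLB branch would emit `VHOST_IOTLB_*` messages.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request dispatch in a vhost-iommu device. Backend ops
 * differ per device type: VFIO devices get container-wide operations,
 * TLB devices get vhost_iotlb messages. */
enum req { T_ATTACH, T_MAP, T_UNMAP, T_DETACH };

struct backend_ops {
    int (*map)(uint64_t virt, uint64_t hva, uint64_t size);
    int (*unmap)(uint64_t virt, uint64_t size);
};

static int dispatch(const struct backend_ops *ops, enum req type,
                    uint64_t virt, uint64_t hva, uint64_t size)
{
    switch (type) {
    case T_MAP:
        /* phys was already turned into an hva via the vhost mem table */
        return ops->map(virt, hva, size);
    case T_UNMAP:
        return ops->unmap(virt, size);
    case T_ATTACH:
    case T_DETACH:
        return 0; /* address-space bookkeeping elided */
    }
    return -1;
}

/* Test backend recording which op ran last (1 = map, 2 = unmap). */
static int last_op;
static int toy_map(uint64_t v, uint64_t h, uint64_t s)
{
    (void)v; (void)h; (void)s;
    last_op = 1;
    return 0;
}
static int toy_unmap(uint64_t v, uint64_t s)
{
    (void)v; (void)s;
    last_op = 2;
    return 0;
}
```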
2. VFIO nested translation
--------------------------
For my current kvmtool implementation, I am putting each VFIO group in a
different container during initialization. We cannot detach a group from a
container at runtime without first resetting all devices in that group. So
the best way to provide dynamic address spaces right now is one container
per group. The drawback is that we need to maintain multiple sets of page
tables even if the guest wants to put all devices in the same address
space. Another disadvantage is that, when implementing bypass mode, we need to
map the whole address space at the beginning, then unmap everything on
attach. Adding nested support would be a nice way to provide dynamic
address spaces while keeping groups tied to a container at all times.
A physical IOMMU may offer nested translation. In this case, address
spaces are managed by two page directories instead of one. A guest-
virtual address is translated into a guest-physical one using what we'll
call here "stage-1" (s1) page tables, and the guest-physical address is
translated into a host-physical one using "stage-2" (s2) page tables.
s1 s2
GVA --> GPA --> HPA
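The two-stage composition can be shown with toy single-level tables. This is purely illustrative (real walks traverse multiple levels, and `walk`/`translate` are invented names); it only demonstrates that a stage-1 miss or stage-2 miss each abort the translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID  UINT64_MAX
#define PAGE_SHIFT   12

/* Toy single-level "page table": index by page frame number. */
static uint64_t walk(const uint64_t *table, size_t len, uint64_t addr)
{
    uint64_t pfn = addr >> PAGE_SHIFT;

    if (pfn >= len || table[pfn] == TOY_INVALID)
        return TOY_INVALID;
    return (table[pfn] << PAGE_SHIFT) | (addr & ((1 << PAGE_SHIFT) - 1));
}

/* GVA -> GPA via stage-1, then GPA -> HPA via stage-2. */
static uint64_t translate(const uint64_t *s1, size_t l1,
                          const uint64_t *s2, size_t l2, uint64_t gva)
{
    uint64_t gpa = walk(s1, l1, gva);

    return gpa == TOY_INVALID ? TOY_INVALID : walk(s2, l2, gpa);
}
```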
There isn't a lot of support in Linux for nesting IOMMU page directories
at the moment (though SVM support is coming, see II). VFIO does have a
"nesting" IOMMU type, which doesn't mean much at the moment. The ARM SMMU
code uses this to decide whether to manage the container with s2 page
tables instead of s1, but even then we still only have a single stage and
it is assumed that IOVA=GPA.
Another model that would help with dynamically changing address spaces is
Parent <---------- map/unmap
container
/ | \
/ group \
Child Child <--- map/unmap
container container
| | |
group group group
At the beginning all groups are attached to the parent container, and
there is no child container. Doing map/unmap on the parent container maps
stage-2 page tables (map GPA -> HVA and pin the page -> HPA). User should
be able to choose whether they want all devices attached to this container
to be able to access GPAs (bypass mode, as it currently is) or simply
block all DMA (in which case there is no need to pin pages here).
At some point the guest wants to create an address space and attaches
children to it. Using an ioctl (to be defined), we can derive a child
container from the parent container, and move groups from parent to child.
This returns a child fd. When the guest maps something in this new address
space, we can do a map ioctl on the child container, which maps stage-1
page tables (map GVA -> GPA).
A page table walk may access multiple levels of tables (pgd, p4d, pud,
pmd, pt). With nested translation, each access to a table during the
stage-1 walk requires a stage-2 walk. This makes a full translation costly
so it is preferable to use a single stage of translation when possible.
Folding two stages into one is simple with a single container, as shown in
the kvmtool example. The host keeps track of GPA->HVA mappings, so it can
fold the full GVA->HVA mapping before sending the VFIO request. With
nested containers however, the IOMMU driver would have to do the folding
work itself. Keeping a copy of stage-2 mapping created on the parent
container, it would fold them into the actual stage-2 page tables when
receiving a map request on the child container (note that software folding
is not possible when stage-1 pgd is managed by the guest, as described in
next section).
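The software-folding step described above can be sketched as a lookup over recorded stage-2 mappings. The `s2_map` record and `fold_map` helper are hypothetical; a real IOMMU driver would keep this in a proper data structure and handle ranges that straddle entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recorded stage-2 mapping (GPA -> HVA) made on the parent container. */
struct s2_map { uint64_t gpa, hva, size; };

/* Fold a stage-1 map request's target GPA with the recorded stage-2
 * mappings, producing the HVA to hand to VFIO for the combined
 * GVA -> HVA mapping. Returns UINT64_MAX if the GPA isn't backed. */
static uint64_t fold_map(const struct s2_map *s2, size_t n, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++)
        if (gpa >= s2[i].gpa && gpa < s2[i].gpa + s2[i].size)
            return s2[i].hva + (gpa - s2[i].gpa);
    return UINT64_MAX;
}
```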
I don't know if nested VFIO containers are a desirable feature at all. I
find the concept cute on paper, and it would make it easier for userspace
to juggle with address spaces, but it might require some invasive changes
in VFIO, and people have been able to use the current API for IOMMU
virtualization so far.
II. Page table sharing
======================
1. Sharing IOMMU page tables
----------------------------
VIRTIO_IOMMU_F_PT_SHARING
This is independent of the nested mode described in I.2, but relies on a
similar feature in the physical IOMMU: having two stages of page tables,
one for the host and one for the guest.
When this is supported, the guest can manage its own s1 page directory, to
avoid sending MAP/UNMAP requests. Feature VIRTIO_IOMMU_F_PT_SHARING allows
a driver to give a page directory pointer (pgd) to the host and send
invalidations when removing or changing a mapping. In this mode, three
requests are used: probe, attach and invalidate. An address space cannot
use the MAP/UNMAP interface and PT_SHARING at the same time.
Device and driver first need to negotiate which page table format they
will be using. This depends on the physical IOMMU, so the request contains
a negotiation part to probe the device capabilities.
(1) Driver attaches devices to address spaces as usual, but a flag
VIRTIO_IOMMU_ATTACH_F_PRIVATE (working title) tells the device not to
create page tables for use with the MAP/UNMAP API. The driver intends
to manage the address space itself.
(2) Driver sends a PROBE_TABLE request. It sets len > 0 with the size of
pg_format array.
VIRTIO_IOMMU_T_PROBE_TABLE
struct virtio_iommu_req_probe_table {
le32 address_space;
le32 flags;
le32 len;
le32 nr_contexts;
struct {
le32 model;
u8 format[64];
} pg_format[len];
};
Introducing a probe request is more flexible than advertising those
features in virtio config, because capabilities are dynamic, and depend on
which devices are attached to an address space. Within a single address
space, devices may support different numbers of contexts (PASIDs), and
some may not support recoverable faults.
(3) Device responds success with all page table formats implemented by the
physical IOMMU in pg_format. 'model' 0 is invalid, so driver can
initialize the array to 0 and deduce from there which entries have
been filled by the device.
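The zero-init convention from step (3) gives the driver a simple way to count how many formats the device filled in. A minimal sketch, with a fixed-width `pg_format` mirroring the request above (the helper name is invented):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct pg_format {
    uint32_t model;     /* 0 is invalid, so 0 marks an unfilled entry */
    uint8_t  format[64];
};

/* Driver-side helper: the array was zeroed before PROBE_TABLE; entries
 * the device left at model == 0 mark the end of the valid formats. */
static size_t count_probed_formats(const struct pg_format *fmt, size_t len)
{
    size_t n = 0;

    while (n < len && fmt[n].model != 0)
        n++;
    return n;
}
```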
Using a probe method seems preferable over trying to attach every possible
format until one sticks. For instance, with an ARM guest running on an x86
host, PROBE_TABLE would return the Intel IOMMU page table format, and the
guest could use that page table code to handle its mappings, hidden behind
the IOMMU API. This requires that the page-table code is reasonably
abstracted from the architecture, as is done with drivers/iommu/io-pgtable
(an x86 guest could use any format implemented by io-pgtable, for example.)
(4) If the driver is able to use this format, it sends the ATTACH_TABLE
request.
VIRTIO_IOMMU_T_ATTACH_TABLE
struct virtio_iommu_req_attach_table {
le32 address_space;
le32 flags;
le64 table;
le32 nr_contexts;
/* Page-table format description */
le32 model;
u8 config[64]
};
'table' is a pointer to the page directory. 'nr_contexts' isn't used
here.
For both ATTACH and PROBE, 'flags' are the following (detailed below):
VIRTIO_IOMMU_ATTACH_TABLE_F_INDIRECT (1 << 0)
VIRTIO_IOMMU_ATTACH_TABLE_F_NATIVE (1 << 1)
VIRTIO_IOMMU_ATTACH_TABLE_F_FAULT (1 << 2)
Now 'model' is a bit tricky. We need to specify all possible page table
formats and their parameters. I'm not well-versed in x86, s390 or other
IOMMUs, so I'll just focus on the ARM world for this example. We basically
have two formats:
* ARM LPAE
* ARM short descriptor
#define PG_TABLE_ARM 0x1
#define PG_TABLE_X86 0x2
...
And each model would define its own structure. On ARM 'format' could be a
simple u32 defining a variant, LPAE 32/64 or short descriptor. It could
also contain additional capabilities. Then, depending on the variant, the
configuration structure would be:
struct pg_config_v7s {
le32 tcr;
le32 prrr;
le32 nmrr;
le32 asid;
};
struct pg_config_lpae {
le64 tcr;
le64 mair;
le32 asid;
/* And maybe TTB1? */
};
struct pg_config_arm {
le32 variant;
union ...;
};
I am really uneasy with describing all those nasty architectural details
in the virtio-iommu specification. We certainly won't start describing the
content bit-by-bit of tcr or mair here, but just declaring these fields
might be sufficient.
(5) Once the table is attached, the driver can simply write the page
tables and expect the physical IOMMU to observe the mappings without
any additional request. When changing or removing a mapping, however,
the driver must send an invalidate request.
VIRTIO_IOMMU_T_INVALIDATE
struct virtio_iommu_req_invalidate {
le32 address_space;
le32 context;
le32 flags;
le64 virt_addr;
le64 range_size;
u8 opaque[64];
};
VIRTIO_IOMMU_INVALIDATE_T_VADDR: invalidate a single VA range
from 'context' (context is 0 when !F_INDIRECT).
VIRTIO_IOMMU_INVALIDATE_T_SINGLE: invalidate all mappings from
'context' (context is 0 when !F_INDIRECT). virt_addr and range_size
are ignored.
VIRTIO_IOMMU_INVALIDATE_T_TABLE: with F_INDIRECT, invalidate entries
in the table that changed. The device reads the table again, compares it
to previous values, and invalidates all mappings for contexts that
changed. context, virt_addr and range_size are ignored.
IOMMUs may offer hints and quirks in their invalidation packets. The
opaque structure in invalidate would allow transporting those. This
depends on the page table format and as with architectural page-table
definitions, I really don't want to have those details in the spec itself.
2. Sharing MMU page tables
--------------------------
The guest can share process page-tables with the physical IOMMU. To do
that, it sends PROBE_TABLE with (F_INDIRECT | F_NATIVE | F_FAULT). The
page table format is implicit, so the pg_format array can be empty (unless
the guest wants to query some specific property, e.g. number of levels
supported by the pIOMMU?). If the host answers with success, guest can
send its MMU page table details with ATTACH_TABLE and (F_NATIVE |
F_INDIRECT | F_FAULT) flags.
F_FAULT means that the host communicates page requests from device to the
guest, and the guest can handle them by mapping virtual address in the
fault to pages. It is only available with VIRTIO_IOMMU_F_FAULT_QUEUE (see
below.)
F_NATIVE means that the pIOMMU pgtable format is the same as guest MMU
pgtable format.
F_INDIRECT means that the 'table' pointer is a context table, instead of a
page directory:
64 2 1 0
table ----> +---------------------+
| pgd |0|1|<--- context 0
| --- |0|0|<--- context 1
| pgd |0|1|
| --- |0|0|
| --- |0|0|
+---------------------+
| \___Entry is valid
|______reserved
Question: do we want per-context page table format, or can it stay global
for the whole indirect table?
Having a context table allows providing multiple address spaces for a
single device. In the simplest form, without F_INDIRECT we have a single
address space per device, but some devices may implement more, for
instance devices with the PCI PASID extension.
A slot's position in the context table gives an ID, between 0 and
nr_contexts. The guest can use this ID to have the device target a
specific address space with DMA. The mechanism to do that is
device-specific. For a PCI device, the ID is a PASID, and PCI doesn't
define a specific way of using them for DMA, it's the device driver's
concern.
3. Fault reporting
------------------
VIRTIO_IOMMU_F_EVENT_QUEUE
With this feature, an event virtqueue (1) is available. For now it will
only be used for fault handling, but I'm calling it eventq so that other
asynchronous features can piggy-back on it. The device may report faults and
page requests by sending buffers via the used ring.
#define VIRTIO_IOMMU_T_FAULT 0x05
struct virtio_iommu_evt_fault {
struct virtio_iommu_evt_head {
u8 type;
u8 reserved[3];
};
u32 address_space;
u32 context;
u64 vaddr;
u32 flags; /* Access details: R/W/X */
/* In the reply: */
u32 reply; /* Fault handled, or failure */
u64 paddr;
};
Driver must send the reply via the request queue, with the fault status
in 'reply', and the mapped page in 'paddr' on success.
Existing fault handling interfaces such as PRI have a tag (PRG) that
identifies a page request (or group thereof) when sending a reply. I
wonder if this would be useful to us, but it seems like the
(address_space, context, vaddr) tuple is sufficient to identify a page
fault, provided the device doesn't send duplicate faults. Duplicate faults
could be required if they have a side effect, for instance implementing a
poor man's doorbell. If this is desirable, we could add a fault_id field.
4. Host implementation with VFIO
--------------------------------
The VFIO interface for sharing page tables is being worked on at the
moment by Intel. Other virtual IOMMU implementations will most likely let
the guest manage full context tables (PASID tables) themselves, giving the
context table pointer to the pIOMMU via a VFIO ioctl.
For the architecture-agnostic virtio-iommu however, we shouldn't have to
implement all possible formats of context table (they are at least
different between ARM SMMU and Intel IOMMU, and will certainly be extended
in future physical IOMMU architectures.) In addition, most users might
only care about having one page directory per device, as SVM is a luxury
at the moment and few devices support it. For these reasons, we should
allow passing single page directories via VFIO, using very similar
structures as described above, whilst reusing the VFIO channel developed
for Intel vIOMMU.
* VFIO_SVM_INFO: probe page table formats
* VFIO_SVM_BIND: set pgd and arch-specific configuration
There is an inconvenience in letting the pIOMMU driver manage the guest's
context table. During a page table walk, the pIOMMU translates the context
table pointer using the stage-2 page tables. The context table must
therefore be mapped in guest-physical space by the pIOMMU driver. One
solution is to let the pIOMMU driver reserve some GPA space upfront using
the iommu and sysfs resv API [1]. The host would then carve that region
out of the guest-physical space using a firmware mechanism (for example DT
reserved-memory node).
III. Relaxed operations
=======================
VIRTIO_IOMMU_F_RELAXED
Adding an IOMMU dramatically reduces performance of a device, because
map/unmap operations are costly and produce a lot of TLB traffic. For
significant performance improvements, device might allow the driver to
sacrifice safety for speed. In this mode, the driver does not need to send
UNMAP requests. The semantics of MAP change and are more complex:
(1) If [start:end] isn't mapped, the request succeeds as usual.
(2) If [start:end] overlaps an existing mapping [old_start:old_end], we
unmap [max(start, old_start):min(end, old_end)] and replace it with
[start:end].
(3) If [start:end] overlaps an existing mapping that matches the new map
request exactly (same flags, same phys address), the old mapping is
kept.
This squashing could be performed by the guest. The driver can catch unmap
requests from the DMA layer, and only relay map requests for (1) and (2).
A MAP request is therefore able to split and partially override an
existing mapping, which isn't allowed in non-relaxed mode. UNMAP requests
are unnecessary, but are now allowed to split or carve holes in mappings.
In this model, a MAP request may take longer, but we may have a net gain
by removing a lot of redundant requests. Squashing series of map/unmap
performed by the guest for the same mapping improves temporal reuse of
IOVA mappings, which I can observe by simply dumping IOMMU activity of a
virtio device. It reduces the number of TLB invalidations to the strict
minimum while keeping correctness of DMA operations (provided the device
obeys its driver). There is a good read on the subject of optimistic
teardown in paper [2].
This model is completely unsafe. A stale DMA transaction might access a
page long after the device driver in the guest unmapped it and
decommissioned the page. The DMA transaction might hit a completely
different part of the system that is now reusing the page. Existing
relaxed implementations attempt to mitigate the risk by setting a timeout
on the teardown. Unmap requests from device drivers are not discarded
entirely, but buffered and sent at a later time. Paper [2] reports good
results with a 10ms delay.
We could add a way for device and driver to negotiate a vulnerability
window to mitigate the risk of DMA attacks. Driver might not accept a
window at all, since it requires more infrastructure to keep delayed
mappings. In my opinion, it should be made clear that regardless of the
duration of this window, any driver accepting F_RELAXED feature makes the
guest completely vulnerable, and the choice boils down to either isolation
or speed, not a bit of both.
IV. Misc
========
I think we have enough to go on for a while. To improve MAP throughput, I
considered adding a MAP_SG request depending on a feature bit, with
struct virtio_iommu_req_map_sg {
struct virtio_iommu_req_head;
u32 address_space;
u32 nr_elems;
u64 virt_addr;
u64 size;
u64 phys_addr[nr_elems];
};
virt_addr -> phys_addr[0]
virt_addr + size -> phys_addr[1]
virt_addr + 2 * size -> phys_addr[2]
...
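The MAP_SG layout above expands mechanically into individual (virt, phys) pairs. A minimal sketch of that expansion (the function name is invented, and a real implementation would build virtio requests rather than fill arrays):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand a MAP_SG request into individual (virt, phys) pairs: element i
 * maps [virt_addr + i * size, +size) to phys_addr[i], as laid out above. */
static size_t expand_map_sg(uint64_t virt_addr, uint64_t size,
                            const uint64_t *phys_addr, size_t nr_elems,
                            uint64_t *virt_out, uint64_t *phys_out)
{
    for (size_t i = 0; i < nr_elems; i++) {
        virt_out[i] = virt_addr + i * size;
        phys_out[i] = phys_addr[i];
    }
    return nr_elems;
}
```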
This would avoid the overhead of multiple map commands. We could try to
find a more cunning format to compress virtually-contiguous mappings with
different (phys, size) pairs as well. But Linux drivers rarely prefer
map_sg() functions over regular map(), so I don't know if the whole map_sg
feature is worth the effort. All we would gain is a few bytes anyway.
My current map_sg implementation in the virtio-iommu driver adds a batch
of map requests to the queue and kicks the host once. That might be enough
of an optimization.
Another invasive optimization would be adding grouped requests. By adding
two flags in the header, L and G, we can group sequences of requests
together, and have one status at the end, either 0 if all requests in the
group succeeded, or the status of the first request that failed. This is
all in-order. Requests in a group follow each other; there is no sequence
identifier.
___ L: request is last in the group
/ _ G: request is part of a group
| /
v v
31 9 8 7 0
+--------------------------------+ <------- RO descriptor
| res0 |0|1| type |
+--------------------------------+
| payload |
+--------------------------------+
| res0 |0|1| type |
+--------------------------------+
| payload |
+--------------------------------+
| res0 |0|1| type |
+--------------------------------+
| payload |
+--------------------------------+
| res0 |1|1| type |
+--------------------------------+
| payload |
+--------------------------------+ <------- WO descriptor
| res0 | status |
+--------------------------------+
This adds some complexity on the device, since it must unroll whatever was
done by successful requests in a group as soon as one fails, and reject
all subsequent ones. A group of requests is an atomic operation. As with
map_sg, this change mostly allows saving space and virtio descriptors.
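The atomic group semantics (unroll on first failure, reject the rest) can be sketched with apply/undo callbacks. The names are hypothetical; the point is the rollback order: every already-applied request is undone, most recent first, and the group reports the first failing status.

```c
#include <assert.h>
#include <stddef.h>

/* One grouped request, with an undo operation for rollback. */
struct greq {
    int  (*apply)(void);
    void (*undo)(void);
};

/* Process a group atomically: stop at the first failure, undo everything
 * already applied, and return the failing status (0 if all succeeded). */
static int process_group(struct greq *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret = reqs[i].apply();

        if (ret) {
            while (i--)          /* unroll successful requests */
                reqs[i].undo();
            return ret;          /* status of the first failed request */
        }
    }
    return 0;
}

/* Test helpers tracking the number of applied-but-not-undone requests. */
static int applied;
static int ok_apply(void)  { applied++; return 0; }
static int bad_apply(void) { return -5; }
static void dec_undo(void) { applied--; }
```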
[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-kernel-iommu_groups
[2] vIOMMU: Efficient IOMMU Emulation
N. Amit, M. Ben-Yehuda, D. Tsafrir, A. Schuster
Jean-Philippe Brucker
2017-04-07 19:23:14 UTC
Permalink
The virtio IOMMU is a para-virtualized device that allows sending IOMMU
requests such as map/unmap over the virtio-mmio transport. This driver
illustrates the initial proposal for virtio-iommu, which you hopefully
received with it. It handles attach, detach, map and unmap requests.

The bulk of the code creates requests and sends them through virtio.
Implementing the IOMMU API is fairly straightforward since the
virtio-iommu MAP/UNMAP interface is almost identical. I threw in a custom
map_sg() function which takes up some space, but is optional. The core
function would send a sequence of map requests, waiting for a reply
between each mapping. The custom map_sg() instead prepares a batch of
requests in the virtio ring and kicks the host once, avoiding a yield to
the host after each map.

It must be applied on top of the probe deferral work for IOMMU, currently
under discussion. That work makes it possible to dissociate early driver detection from
device probing: device-tree or ACPI is parsed early to find which devices
are translated by the IOMMU, but the IOMMU itself cannot be probed until
the core virtio module is loaded.

Enabling DEBUG makes the driver extremely verbose at the moment, but it
should be calmer in future versions.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
drivers/iommu/Kconfig | 11 +
drivers/iommu/Makefile | 1 +
drivers/iommu/virtio-iommu.c | 980 ++++++++++++++++++++++++++++++++++++++
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/virtio_ids.h | 1 +
include/uapi/linux/virtio_iommu.h | 142 ++++++
6 files changed, 1136 insertions(+)
create mode 100644 drivers/iommu/virtio-iommu.c
create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 37e204f3d9be..8cd56ee9a93a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -359,4 +359,15 @@ config MTK_IOMMU_V1

if unsure, say N here.

+config VIRTIO_IOMMU
+ tristate "Virtio IOMMU driver"
+ depends on VIRTIO_MMIO
+ select IOMMU_API
+ select INTERVAL_TREE
+ select ARM_DMA_USE_IOMMU if ARM
+ help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 195f7b997d8e..1199d8475802 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index 000000000000..1cf4f57b7817
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,980 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2017 ARM Limited
+ *
+ * Author: Jean-Philippe Brucker <jean-***@arm.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/amba/bus.h>
+#include <linux/delay.h>
+#include <linux/dma-iommu.h>
+#include <linux/freezer.h>
+#include <linux/interval_tree.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/of_iommu.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ids.h>
+#include <linux/wait.h>
+
+#include <uapi/linux/virtio_iommu.h>
+
+struct viommu_dev {
+ struct iommu_device iommu;
+ struct device *dev;
+ struct virtio_device *vdev;
+
+ struct virtqueue *vq;
+ struct list_head pending_requests;
+ /* Serialize anything touching the vq and the request list */
+ spinlock_t vq_lock;
+
+ struct list_head list;
+
+ /* Device configuration */
+ u64 pgsize_bitmap;
+ u64 aperture_start;
+ u64 aperture_end;
+};
+
+struct viommu_mapping {
+ phys_addr_t paddr;
+ struct interval_tree_node iova;
+};
+
+struct viommu_domain {
+ struct iommu_domain domain;
+ struct viommu_dev *viommu;
+ struct mutex mutex;
+ u64 id;
+
+ spinlock_t mappings_lock;
+ struct rb_root mappings;
+
+ /* Number of devices attached to this domain */
+ unsigned long attached;
+};
+
+struct viommu_endpoint {
+ struct viommu_dev *viommu;
+ struct viommu_domain *vdomain;
+};
+
+struct viommu_request {
+ struct scatterlist head;
+ struct scatterlist tail;
+
+ int written;
+ struct list_head list;
+};
+
+/* TODO: use an IDA */
+static atomic64_t viommu_domain_ids_gen;
+
+#define to_viommu_domain(domain) container_of(domain, struct viommu_domain, domain)
+
+/* Virtio transport */
+
+static int viommu_status_to_errno(u8 status)
+{
+ switch (status) {
+ case VIRTIO_IOMMU_S_OK:
+ return 0;
+ case VIRTIO_IOMMU_S_UNSUPP:
+ return -ENOSYS;
+ case VIRTIO_IOMMU_S_INVAL:
+ return -EINVAL;
+ case VIRTIO_IOMMU_S_RANGE:
+ return -ERANGE;
+ case VIRTIO_IOMMU_S_NOENT:
+ return -ENOENT;
+ case VIRTIO_IOMMU_S_FAULT:
+ return -EFAULT;
+ case VIRTIO_IOMMU_S_IOERR:
+ case VIRTIO_IOMMU_S_DEVERR:
+ default:
+ return -EIO;
+ }
+}
+
+static int viommu_get_req_size(struct virtio_iommu_req_head *req, size_t *head,
+ size_t *tail)
+{
+ size_t size;
+ union virtio_iommu_req r;
+
+ *tail = sizeof(struct virtio_iommu_req_tail);
+
+ switch (req->type) {
+ case VIRTIO_IOMMU_T_ATTACH:
+ size = sizeof(r.attach);
+ break;
+ case VIRTIO_IOMMU_T_DETACH:
+ size = sizeof(r.detach);
+ break;
+ case VIRTIO_IOMMU_T_MAP:
+ size = sizeof(r.map);
+ break;
+ case VIRTIO_IOMMU_T_UNMAP:
+ size = sizeof(r.unmap);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ *head = size - *tail;
+ return 0;
+}
+
+static int viommu_receive_resp(struct viommu_dev *viommu, int nr_expected)
+{
+
+ unsigned int len;
+ int nr_received = 0;
+ struct viommu_request *req, *pending, *next;
+
+ pending = list_first_entry_or_null(&viommu->pending_requests,
+ struct viommu_request, list);
+ if (WARN_ON(!pending))
+ return 0;
+
+ while ((req = virtqueue_get_buf(viommu->vq, &len)) != NULL) {
+ if (req != pending) {
+ dev_warn(viommu->dev, "discarding stale request\n");
+ continue;
+ }
+
+ pending->written = len;
+
+ if (++nr_received == nr_expected) {
+ list_del(&pending->list);
+ /*
+ * In an ideal world, we'd wake up the waiter for this
+ * group of requests here. But everything is painfully
+ * synchronous, so the waiter is the caller.
+ */
+ break;
+ }
+
+ next = list_next_entry(pending, list);
+ list_del(&pending->list);
+
+ if (WARN_ON(list_empty(&viommu->pending_requests)))
+ return 0;
+
+ pending = next;
+ }
+
+ return nr_received;
+}
+
+/* Must be called with vq_lock held */
+static int _viommu_send_reqs_sync(struct viommu_dev *viommu,
+ struct viommu_request *req, int nr,
+ int *nr_sent)
+{
+ int i, ret;
+ ktime_t timeout;
+ int nr_received = 0;
+ struct scatterlist *sg[2];
+ /*
+ * FIXME: as it stands, 1s timeout per request. This is a deliberate
+ * exaggeration because I have no idea how real our ktime is. Are we
+ * using a RTC? Are we aware of steal time? I don't know much about
+ * this, need to do some digging.
+ */
+ unsigned long timeout_ms = 1000;
+
+ *nr_sent = 0;
+
+ for (i = 0; i < nr; i++, req++) {
+ /*
+ * The backend will allocate one indirect descriptor for each
+ * request, which allows doubling the ring consumption, but
+ * might be slower.
+ */
+ req->written = 0;
+
+ sg[0] = &req->head;
+ sg[1] = &req->tail;
+
+ ret = virtqueue_add_sgs(viommu->vq, sg, 1, 1, req,
+ GFP_ATOMIC);
+ if (ret)
+ break;
+
+ list_add_tail(&req->list, &viommu->pending_requests);
+ }
+
+ if (i && !virtqueue_kick(viommu->vq))
+ return -EPIPE;
+
+ /*
+ * Absolutely no wiggle room here. We're not allowed to sleep as callers
+ * might be holding spinlocks, so we have to poll like savages until
+ * something appears. Hopefully the host already handled the request
+ * during the above kick and returned it to us.
+ *
+ * A nice improvement would be for the caller to tell us if we can sleep
+ * whilst mapping, but this has to go through the IOMMU/DMA API.
+ */
+ timeout = ktime_add_ms(ktime_get(), timeout_ms * i);
+ while (nr_received < i && ktime_before(ktime_get(), timeout)) {
+ nr_received += viommu_receive_resp(viommu, i - nr_received);
+ if (nr_received < i) {
+ /*
+ * FIXME: what's a good way to yield to host? A second
+ * virtqueue_kick won't have any effect since we haven't
+ * added any descriptor.
+ */
+ udelay(10);
+ }
+ }
+ dev_dbg(viommu->dev, "request took %lld us\n",
+ ktime_us_delta(ktime_get(), ktime_sub_ms(timeout, timeout_ms * i)));
+
+ if (nr_received != i)
+ ret = -ETIMEDOUT;
+
+ if (ret == -ENOSPC && nr_received)
+ /*
+ * We've freed some space since virtio told us that the ring is
+ * full, tell the caller to come back later (after releasing the
+ * lock first, to be fair to other threads)
+ */
+ ret = -EAGAIN;
+
+ *nr_sent = nr_received;
+
+ return ret;
+}
+
+/**
+ * viommu_send_reqs_sync - add a batch of requests, kick the host and wait for
+ * them to return
+ *
+ * @req: array of requests
+ * @nr: size of the array
+ * @nr_sent: contains the number of requests actually sent after this function
+ * returns
+ *
+ * Return 0 on success, or an error if we failed to send some of the requests.
+ */
+static int viommu_send_reqs_sync(struct viommu_dev *viommu,
+ struct viommu_request *req, int nr,
+ int *nr_sent)
+{
+ int ret;
+ int sent = 0;
+ unsigned long flags;
+
+ *nr_sent = 0;
+ do {
+ spin_lock_irqsave(&viommu->vq_lock, flags);
+ ret = _viommu_send_reqs_sync(viommu, req, nr, &sent);
+ spin_unlock_irqrestore(&viommu->vq_lock, flags);
+
+ *nr_sent += sent;
+ req += sent;
+ nr -= sent;
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
+/**
+ * viommu_send_req_sync - send one request and wait for reply
+ *
+ * @head_ptr: pointer to a virtio_iommu_req_* structure
+ *
+ * Returns 0 if the request was successful, or an error number otherwise. No
+ * distinction is made between transport and request errors.
+ */
+static int viommu_send_req_sync(struct viommu_dev *viommu, void *head_ptr)
+{
+ int ret;
+ int nr_sent;
+ struct viommu_request req;
+ size_t head_size, tail_size;
+ struct virtio_iommu_req_tail *tail;
+ struct virtio_iommu_req_head *head = head_ptr;
+
+ ret = viommu_get_req_size(head, &head_size, &tail_size);
+ if (ret)
+ return ret;
+
+ dev_dbg(viommu->dev, "Sending request 0x%x, %zu bytes\n", head->type,
+ head_size + tail_size);
+
+ tail = head_ptr + head_size;
+
+ sg_init_one(&req.head, head, head_size);
+ sg_init_one(&req.tail, tail, tail_size);
+
+ ret = viommu_send_reqs_sync(viommu, &req, 1, &nr_sent);
+ if (ret || !req.written || nr_sent != 1) {
+ dev_err(viommu->dev, "failed to send command\n");
+ return -EIO;
+ }
+
+ ret = viommu_status_to_errno(tail->status);
+
+ if (ret)
+ dev_dbg(viommu->dev, " completed with %d\n", ret);
+
+ return ret;
+}
+
+static int viommu_tlb_map(struct viommu_domain *vdomain, unsigned long iova,
+ phys_addr_t paddr, size_t size)
+{
+ unsigned long flags;
+ struct viommu_mapping *mapping;
+
+ mapping = kzalloc(sizeof(*mapping), GFP_ATOMIC);
+ if (!mapping)
+ return -ENOMEM;
+
+ mapping->paddr = paddr;
+ mapping->iova.start = iova;
+ mapping->iova.last = iova + size - 1;
+
+ spin_lock_irqsave(&vdomain->mappings_lock, flags);
+ interval_tree_insert(&mapping->iova, &vdomain->mappings);
+ spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
+
+ return 0;
+}
+
+static size_t viommu_tlb_unmap(struct viommu_domain *vdomain,
+ unsigned long iova, size_t size)
+{
+ size_t unmapped = 0;
+ unsigned long flags;
+ unsigned long last = iova + size - 1;
+ struct viommu_mapping *mapping = NULL;
+ struct interval_tree_node *node, *next;
+
+ spin_lock_irqsave(&vdomain->mappings_lock, flags);
+ next = interval_tree_iter_first(&vdomain->mappings, iova, last);
+ while (next) {
+ node = next;
+ mapping = container_of(node, struct viommu_mapping, iova);
+
+ next = interval_tree_iter_next(node, iova, last);
+
+ /*
+ * Note that for a partial range, this will return the full
+ * mapping so we avoid sending split requests to the device.
+ */
+ unmapped += mapping->iova.last - mapping->iova.start + 1;
+
+ interval_tree_remove(node, &vdomain->mappings);
+ kfree(mapping);
+ }
+ spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
+
+ return unmapped;
+}
+
+/* IOMMU API */
+
+static bool viommu_capable(enum iommu_cap cap)
+{
+ return false; /* :( */
+}
+
+static struct iommu_domain *viommu_domain_alloc(unsigned type)
+{
+ struct viommu_domain *vdomain;
+
+ if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+ return NULL;
+
+ vdomain = kzalloc(sizeof(struct viommu_domain), GFP_KERNEL);
+ if (!vdomain)
+ return NULL;
+
+ vdomain->id = atomic64_inc_return_relaxed(&viommu_domain_ids_gen);
+
+ mutex_init(&vdomain->mutex);
+ spin_lock_init(&vdomain->mappings_lock);
+ vdomain->mappings = RB_ROOT;
+
+ pr_debug("alloc domain of type %d -> %llu\n", type, vdomain->id);
+
+ if (type == IOMMU_DOMAIN_DMA &&
+ iommu_get_dma_cookie(&vdomain->domain)) {
+ kfree(vdomain);
+ return NULL;
+ }
+
+ return &vdomain->domain;
+}
+
+static void viommu_domain_free(struct iommu_domain *domain)
+{
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+ pr_debug("free domain %llu\n", vdomain->id);
+
+ iommu_put_dma_cookie(domain);
+
+ /* Free all remaining mappings (size 2^64) */
+ viommu_tlb_unmap(vdomain, 0, 0);
+
+ kfree(vdomain);
+}
+
+static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
+{
+ int i;
+ int ret = 0;
+ struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+ struct viommu_endpoint *vdev = fwspec->iommu_priv;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+ struct virtio_iommu_req_attach req = {
+ .head.type = VIRTIO_IOMMU_T_ATTACH,
+ .address_space = cpu_to_le32(vdomain->id),
+ };
+
+ mutex_lock(&vdomain->mutex);
+ if (!vdomain->viommu) {
+ struct viommu_dev *viommu = vdev->viommu;
+
+ vdomain->viommu = viommu;
+
+ domain->pgsize_bitmap = viommu->pgsize_bitmap;
+ domain->geometry.aperture_start = viommu->aperture_start;
+ domain->geometry.aperture_end = viommu->aperture_end;
+ domain->geometry.force_aperture = true;
+
+ } else if (vdomain->viommu != vdev->viommu) {
+ dev_err(dev, "cannot attach to foreign VIOMMU\n");
+ ret = -EXDEV;
+ }
+ mutex_unlock(&vdomain->mutex);
+
+ if (ret)
+ return ret;
+
+ /*
+ * When attaching the device to a new domain, it will be detached from
+ * the old one and, if as a result the old domain isn't attached to
+ * any device, all mappings are removed from the old domain and it is
+ * freed. (Note that we can't use get_domain_for_dev here, it returns
+ * the default domain during initial attach.)
+ *
+ * Take note of the device disappearing, so we can ignore unmap request
+ * on stale domains (that is, between this detach and the upcoming
+ * free.)
+ *
+ * vdev->vdomain is protected by group->mutex
+ */
+ if (vdev->vdomain) {
+ dev_dbg(dev, "detach from domain %llu\n", vdev->vdomain->id);
+ vdev->vdomain->attached--;
+ }
+
+ dev_dbg(dev, "attach to domain %llu\n", vdomain->id);
+
+ for (i = 0; i < fwspec->num_ids; i++) {
+ req.device = cpu_to_le32(fwspec->ids[i]);
+
+ ret = viommu_send_req_sync(vdomain->viommu, &req);
+ if (ret)
+ break;
+ }
+
+ vdomain->attached++;
+ vdev->vdomain = vdomain;
+
+ return ret;
+}
+
+static int viommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot)
+{
+ int ret;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+ struct virtio_iommu_req_map req = {
+ .head.type = VIRTIO_IOMMU_T_MAP,
+ .address_space = cpu_to_le32(vdomain->id),
+ .virt_addr = cpu_to_le64(iova),
+ .phys_addr = cpu_to_le64(paddr),
+ .size = cpu_to_le64(size),
+ };
+
+ pr_debug("map %llu 0x%lx -> 0x%llx (%zu)\n", vdomain->id, iova,
+ paddr, size);
+
+ if (!vdomain->attached)
+ return -ENODEV;
+
+ if (prot & IOMMU_READ)
+ req.flags |= cpu_to_le32(VIRTIO_IOMMU_MAP_F_READ);
+
+ if (prot & IOMMU_WRITE)
+ req.flags |= cpu_to_le32(VIRTIO_IOMMU_MAP_F_WRITE);
+
+ ret = viommu_tlb_map(vdomain, iova, paddr, size);
+ if (ret)
+ return ret;
+
+ ret = viommu_send_req_sync(vdomain->viommu, &req);
+ if (ret)
+ viommu_tlb_unmap(vdomain, iova, size);
+
+ return ret;
+}
+
+static size_t viommu_unmap(struct iommu_domain *domain, unsigned long iova,
+ size_t size)
+{
+ int ret;
+ size_t unmapped;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+ struct virtio_iommu_req_unmap req = {
+ .head.type = VIRTIO_IOMMU_T_UNMAP,
+ .address_space = cpu_to_le32(vdomain->id),
+ .virt_addr = cpu_to_le64(iova),
+ };
+
+ pr_debug("unmap %llu 0x%lx (%zu)\n", vdomain->id, iova, size);
+
+ /* Callers may unmap after detach, but device already took care of it. */
+ if (!vdomain->attached)
+ return size;
+
+ unmapped = viommu_tlb_unmap(vdomain, iova, size);
+ if (unmapped < size)
+ return 0;
+
+ req.size = cpu_to_le64(unmapped);
+
+ ret = viommu_send_req_sync(vdomain->viommu, &req);
+ if (ret)
+ return 0;
+
+ return unmapped;
+}
+
+static size_t viommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+ struct scatterlist *sg, unsigned int nents, int prot)
+{
+ int i, ret = 0;
+ int nr_sent;
+ size_t mapped;
+ size_t min_pagesz;
+ size_t total_size;
+ struct scatterlist *s;
+ unsigned int flags = 0;
+ unsigned long cur_iova;
+ unsigned long mapped_iova;
+ size_t head_size, tail_size;
+ struct viommu_request reqs[nents];
+ struct virtio_iommu_req_map map_reqs[nents];
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+ if (!vdomain->attached)
+ return 0;
+
+ pr_debug("map_sg %llu %u 0x%lx\n", vdomain->id, nents, iova);
+
+ if (prot & IOMMU_READ)
+ flags |= VIRTIO_IOMMU_MAP_F_READ;
+
+ if (prot & IOMMU_WRITE)
+ flags |= VIRTIO_IOMMU_MAP_F_WRITE;
+
+ min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+ tail_size = sizeof(struct virtio_iommu_req_tail);
+ head_size = sizeof(*map_reqs) - tail_size;
+
+ cur_iova = iova;
+
+ for_each_sg(sg, s, nents, i) {
+ size_t size = s->length;
+ phys_addr_t paddr = sg_phys(s);
+ void *tail = (void *)&map_reqs[i] + head_size;
+
+ if (!IS_ALIGNED(paddr | size, min_pagesz)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ /* TODO: merge physically-contiguous mappings if any */
+ map_reqs[i] = (struct virtio_iommu_req_map) {
+ .head.type = VIRTIO_IOMMU_T_MAP,
+ .address_space = cpu_to_le32(vdomain->id),
+ .flags = cpu_to_le32(flags),
+ .virt_addr = cpu_to_le64(cur_iova),
+ .phys_addr = cpu_to_le64(paddr),
+ .size = cpu_to_le64(size),
+ };
+
+ ret = viommu_tlb_map(vdomain, cur_iova, paddr, size);
+ if (ret)
+ break;
+
+ sg_init_one(&reqs[i].head, &map_reqs[i], head_size);
+ sg_init_one(&reqs[i].tail, tail, tail_size);
+
+ cur_iova += size;
+ }
+
+ total_size = cur_iova - iova;
+
+ if (ret) {
+ viommu_tlb_unmap(vdomain, iova, total_size);
+ return 0;
+ }
+
+ ret = viommu_send_reqs_sync(vdomain->viommu, reqs, i, &nr_sent);
+
+ if (nr_sent != nents)
+ goto err_rollback;
+
+ for (i = 0; i < nents; i++) {
+ if (!reqs[i].written || map_reqs[i].tail.status)
+ goto err_rollback;
+ }
+
+ return total_size;
+
+err_rollback:
+ /*
+ * Any request in the range might have failed. Unmap what was
+ * successful.
+ */
+ cur_iova = iova;
+ mapped_iova = iova;
+ mapped = 0;
+ for_each_sg(sg, s, nents, i) {
+ size_t size = s->length;
+
+ cur_iova += size;
+
+ if (!reqs[i].written || map_reqs[i].tail.status) {
+ if (mapped)
+ viommu_unmap(domain, mapped_iova, mapped);
+
+ mapped_iova = cur_iova;
+ mapped = 0;
+ } else {
+ mapped += size;
+ }
+ }
+
+ viommu_tlb_unmap(vdomain, iova, total_size);
+
+ return 0;
+}
+
+static phys_addr_t viommu_iova_to_phys(struct iommu_domain *domain,
+ dma_addr_t iova)
+{
+ u64 paddr = 0;
+ unsigned long flags;
+ struct viommu_mapping *mapping;
+ struct interval_tree_node *node;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+ spin_lock_irqsave(&vdomain->mappings_lock, flags);
+ node = interval_tree_iter_first(&vdomain->mappings, iova, iova);
+ if (node) {
+ mapping = container_of(node, struct viommu_mapping, iova);
+ paddr = mapping->paddr + (iova - mapping->iova.start);
+ }
+ spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
+
+ pr_debug("iova_to_phys %llu 0x%llx->0x%llx\n", vdomain->id, iova,
+ paddr);
+
+ return paddr;
+}
+
+static struct iommu_ops viommu_ops;
+static struct virtio_driver virtio_iommu_drv;
+
+static int viommu_match_node(struct device *dev, void *data)
+{
+ return dev->parent->fwnode == data;
+}
+
+static struct viommu_dev *viommu_get_by_fwnode(struct fwnode_handle *fwnode)
+{
+ struct device *dev = driver_find_device(&virtio_iommu_drv.driver, NULL,
+ fwnode, viommu_match_node);
+ put_device(dev);
+
+ return dev ? dev_to_virtio(dev)->priv : NULL;
+}
+
+static int viommu_add_device(struct device *dev)
+{
+ struct iommu_group *group;
+ struct viommu_endpoint *vdev;
+ struct viommu_dev *viommu = NULL;
+ struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+
+ if (!fwspec || fwspec->ops != &viommu_ops)
+ return -ENODEV;
+
+ viommu = viommu_get_by_fwnode(fwspec->iommu_fwnode);
+ if (!viommu)
+ return -ENODEV;
+
+ vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+ if (!vdev)
+ return -ENOMEM;
+
+ vdev->viommu = viommu;
+ fwspec->iommu_priv = vdev;
+
+ /*
+ * Last step creates a default domain and attaches to it. Everything
+ * must be ready.
+ */
+ group = iommu_group_get_for_dev(dev);
+
+ return PTR_ERR_OR_ZERO(group);
+}
+
+static void viommu_remove_device(struct device *dev)
+{
+ kfree(dev->iommu_fwspec->iommu_priv);
+}
+
+static struct iommu_group *
+viommu_device_group(struct device *dev)
+{
+ if (dev_is_pci(dev))
+ return pci_device_group(dev);
+ else
+ return generic_device_group(dev);
+}
+
+static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
+ u32 *id = args->args;
+
+ dev_dbg(dev, "of_xlate 0x%x\n", *id);
+ return iommu_fwspec_add_ids(dev, args->args, 1);
+}
+
+/*
+ * (Maybe) temporary hack for device pass-through into guest userspace. On ARM
+ * with an ITS, VFIO will look for a region in which to map the doorbell, even
+ * though the virtual doorbell is never written to by the device, and instead
+ * the host injects interrupts directly. TODO: sort this out in VFIO.
+ */
+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
+static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
+{
+ struct iommu_resv_region *region;
+ int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+ region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, prot,
+ IOMMU_RESV_MSI);
+ if (!region)
+ return;
+
+ list_add_tail(&region->list, head);
+}
+
+static void viommu_put_resv_regions(struct device *dev, struct list_head *head)
+{
+ struct iommu_resv_region *entry, *next;
+
+ list_for_each_entry_safe(entry, next, head, list)
+ kfree(entry);
+}
+
+static struct iommu_ops viommu_ops = {
+ .capable = viommu_capable,
+ .domain_alloc = viommu_domain_alloc,
+ .domain_free = viommu_domain_free,
+ .attach_dev = viommu_attach_dev,
+ .map = viommu_map,
+ .unmap = viommu_unmap,
+ .map_sg = viommu_map_sg,
+ .iova_to_phys = viommu_iova_to_phys,
+ .add_device = viommu_add_device,
+ .remove_device = viommu_remove_device,
+ .device_group = viommu_device_group,
+ .of_xlate = viommu_of_xlate,
+ .get_resv_regions = viommu_get_resv_regions,
+ .put_resv_regions = viommu_put_resv_regions,
+};
+
+static int viommu_init_vq(struct viommu_dev *viommu)
+{
+ struct virtio_device *vdev = dev_to_virtio(viommu->dev);
+ vq_callback_t *callback = NULL;
+ const char *name = "request";
+ int ret;
+
+ ret = vdev->config->find_vqs(vdev, 1, &viommu->vq, &callback,
+ &name, NULL);
+ if (ret)
+ dev_err(viommu->dev, "cannot find VQ\n");
+
+ return ret;
+}
+
+static int viommu_probe(struct virtio_device *vdev)
+{
+ struct device *parent_dev = vdev->dev.parent;
+ struct viommu_dev *viommu = NULL;
+ struct device *dev = &vdev->dev;
+ int ret;
+
+ viommu = kzalloc(sizeof(*viommu), GFP_KERNEL);
+ if (!viommu)
+ return -ENOMEM;
+
+ spin_lock_init(&viommu->vq_lock);
+ INIT_LIST_HEAD(&viommu->pending_requests);
+ viommu->dev = dev;
+ viommu->vdev = vdev;
+
+ ret = viommu_init_vq(viommu);
+ if (ret)
+ goto err_free_viommu;
+
+ virtio_cread(vdev, struct virtio_iommu_config, page_sizes,
+ &viommu->pgsize_bitmap);
+
+ viommu->aperture_end = -1UL;
+
+ virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
+ struct virtio_iommu_config, input_range.start,
+ &viommu->aperture_start);
+
+ virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
+ struct virtio_iommu_config, input_range.end,
+ &viommu->aperture_end);
+
+ if (!viommu->pgsize_bitmap) {
+ ret = -EINVAL;
+ goto err_free_viommu;
+ }
+
+ viommu_ops.pgsize_bitmap = viommu->pgsize_bitmap;
+
+ /*
+ * Not strictly necessary, virtio would enable it later. This allows us to
+ * start using the request queue early.
+ */
+ virtio_device_ready(vdev);
+
+ ret = iommu_device_sysfs_add(&viommu->iommu, dev, NULL, "%s",
+ virtio_bus_name(vdev));
+ if (ret)
+ goto err_free_viommu;
+
+ iommu_device_set_ops(&viommu->iommu, &viommu_ops);
+ iommu_device_set_fwnode(&viommu->iommu, parent_dev->fwnode);
+
+ iommu_device_register(&viommu->iommu);
+
+#ifdef CONFIG_PCI
+ if (pci_bus_type.iommu_ops != &viommu_ops) {
+ pci_request_acs();
+ ret = bus_set_iommu(&pci_bus_type, &viommu_ops);
+ if (ret)
+ goto err_unregister;
+ }
+#endif
+#ifdef CONFIG_ARM_AMBA
+ if (amba_bustype.iommu_ops != &viommu_ops) {
+ ret = bus_set_iommu(&amba_bustype, &viommu_ops);
+ if (ret)
+ goto err_unregister;
+ }
+#endif
+ if (platform_bus_type.iommu_ops != &viommu_ops) {
+ ret = bus_set_iommu(&platform_bus_type, &viommu_ops);
+ if (ret)
+ goto err_unregister;
+ }
+
+ vdev->priv = viommu;
+
+ dev_info(viommu->dev, "probe successful\n");
+
+ return 0;
+
+err_unregister:
+ iommu_device_unregister(&viommu->iommu);
+
+err_free_viommu:
+ kfree(viommu);
+
+ return ret;
+}
+
+static void viommu_remove(struct virtio_device *vdev)
+{
+ struct viommu_dev *viommu = vdev->priv;
+
+ iommu_device_unregister(&viommu->iommu);
+ kfree(viommu);
+
+ dev_info(&vdev->dev, "device removed\n");
+}
+
+static void viommu_config_changed(struct virtio_device *vdev)
+{
+ dev_warn(&vdev->dev, "config changed\n");
+}
+
+static unsigned int features[] = {
+ VIRTIO_IOMMU_F_INPUT_RANGE,
+};
+
+static struct virtio_device_id id_table[] = {
+ { VIRTIO_ID_IOMMU, VIRTIO_DEV_ANY_ID },
+ { 0 },
+};
+
+static struct virtio_driver virtio_iommu_drv = {
+ .driver.name = KBUILD_MODNAME,
+ .driver.owner = THIS_MODULE,
+ .id_table = id_table,
+ .feature_table = features,
+ .feature_table_size = ARRAY_SIZE(features),
+ .probe = viommu_probe,
+ .remove = viommu_remove,
+ .config_changed = viommu_config_changed,
+};
+
+module_virtio_driver(virtio_iommu_drv);
+
+IOMMU_OF_DECLARE(viommu, "virtio,mmio", NULL);
+
+MODULE_DESCRIPTION("virtio-iommu driver");
+MODULE_AUTHOR("Jean-Philippe Brucker <jean-***@arm.com>");
+MODULE_LICENSE("GPL v2");
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 1f25c86374ad..c0cb0f173258 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -467,6 +467,7 @@ header-y += virtio_console.h
header-y += virtio_gpu.h
header-y += virtio_ids.h
header-y += virtio_input.h
+header-y += virtio_iommu.h
header-y += virtio_mmio.h
header-y += virtio_net.h
header-y += virtio_pci.h
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 6d5c3b2d4f4d..934ed3d3cd3f 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -43,5 +43,6 @@
#define VIRTIO_ID_INPUT 18 /* virtio input */
#define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */
#define VIRTIO_ID_CRYPTO 20 /* virtio crypto */
+#define VIRTIO_ID_IOMMU 61216 /* virtio IOMMU (temporary) */

#endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
new file mode 100644
index 000000000000..ec74c9a727d4
--- /dev/null
+++ b/include/uapi/linux/virtio_iommu.h
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This header is BSD licensed so anyone can use the definitions
+ * to implement compatible drivers/servers:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of ARM Ltd. nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _UAPI_LINUX_VIRTIO_IOMMU_H
+#define _UAPI_LINUX_VIRTIO_IOMMU_H
+
+/* Feature bits */
+#define VIRTIO_IOMMU_F_INPUT_RANGE 0
+#define VIRTIO_IOMMU_F_IOASID_BITS 1
+#define VIRTIO_IOMMU_F_MAP_UNMAP 2
+#define VIRTIO_IOMMU_F_BYPASS 3
+
+__packed
+struct virtio_iommu_config {
+ /* Supported page sizes */
+ __u64 page_sizes;
+ struct virtio_iommu_range {
+ __u64 start;
+ __u64 end;
+ } input_range;
+ __u8 ioasid_bits;
+};
+
+/* Request types */
+#define VIRTIO_IOMMU_T_ATTACH 0x01
+#define VIRTIO_IOMMU_T_DETACH 0x02
+#define VIRTIO_IOMMU_T_MAP 0x03
+#define VIRTIO_IOMMU_T_UNMAP 0x04
+
+/* Status types */
+#define VIRTIO_IOMMU_S_OK 0x00
+#define VIRTIO_IOMMU_S_IOERR 0x01
+#define VIRTIO_IOMMU_S_UNSUPP 0x02
+#define VIRTIO_IOMMU_S_DEVERR 0x03
+#define VIRTIO_IOMMU_S_INVAL 0x04
+#define VIRTIO_IOMMU_S_RANGE 0x05
+#define VIRTIO_IOMMU_S_NOENT 0x06
+#define VIRTIO_IOMMU_S_FAULT 0x07
+
+__packed
+struct virtio_iommu_req_head {
+ __u8 type;
+ __u8 reserved[3];
+};
+
+__packed
+struct virtio_iommu_req_tail {
+ __u8 status;
+ __u8 reserved[3];
+};
+
+__packed
+struct virtio_iommu_req_attach {
+ struct virtio_iommu_req_head head;
+
+ __le32 address_space;
+ __le32 device;
+ __le32 reserved;
+
+ struct virtio_iommu_req_tail tail;
+};
+
+__packed
+struct virtio_iommu_req_detach {
+ struct virtio_iommu_req_head head;
+
+ __le32 device;
+ __le32 reserved;
+
+ struct virtio_iommu_req_tail tail;
+};
+
+#define VIRTIO_IOMMU_MAP_F_READ (1 << 0)
+#define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1)
+#define VIRTIO_IOMMU_MAP_F_EXEC (1 << 2)
+
+#define VIRTIO_IOMMU_MAP_F_MASK (VIRTIO_IOMMU_MAP_F_READ | \
+ VIRTIO_IOMMU_MAP_F_WRITE | \
+ VIRTIO_IOMMU_MAP_F_EXEC)
+
+__packed
+struct virtio_iommu_req_map {
+ struct virtio_iommu_req_head head;
+
+ __le32 address_space;
+ __le32 flags;
+ __le64 virt_addr;
+ __le64 phys_addr;
+ __le64 size;
+
+ struct virtio_iommu_req_tail tail;
+};
+
+__packed
+struct virtio_iommu_req_unmap {
+ struct virtio_iommu_req_head head;
+
+ __le32 address_space;
+ __le32 flags;
+ __le64 virt_addr;
+ __le64 size;
+
+ struct virtio_iommu_req_tail tail;
+};
+
+union virtio_iommu_req {
+ struct virtio_iommu_req_head head;
+
+ struct virtio_iommu_req_attach attach;
+ struct virtio_iommu_req_detach detach;
+ struct virtio_iommu_req_map map;
+ struct virtio_iommu_req_unmap unmap;
+};
+
+#endif
--
2.12.1
Bharat Bhushan
2017-06-16 08:48:00 UTC
Permalink
Hi Jean
-----Original Message-----
open.org] On Behalf Of Jean-Philippe Brucker
Sent: Saturday, April 08, 2017 12:53 AM
Subject: [virtio-dev] [RFC PATCH linux] iommu: Add virtio-iommu driver
The virtio IOMMU is a para-virtualized device that allows IOMMU requests
such as map/unmap to be sent over the virtio-mmio transport. This driver
should illustrate the initial proposal for virtio-iommu, which you hopefully
received with it. It handles attach, detach, map and unmap requests.
The bulk of the code is to create requests and send them through virtio.
Implementing the IOMMU API is fairly straightforward since the virtio-iommu
MAP/UNMAP interface is almost identical. I threw in a custom
map_sg() function which takes up some space, but is optional. The core
function would send a sequence of map requests, waiting for a reply
between each mapping. This optimization avoids yielding to the host after
each map, and instead prepares a batch of requests in the virtio ring and
kicks the host once.
It must be applied on top of the probe deferral work for IOMMU, currently
under discussion. This allows dissociating early driver detection from device
probing: device-tree or ACPI is parsed early to find which devices are
translated by the IOMMU, but the IOMMU itself cannot be probed until the
core virtio module is loaded.
Enabling DEBUG makes it extremely verbose at the moment, but it should be
calmer in next versions.
---
drivers/iommu/Kconfig | 11 +
drivers/iommu/Makefile | 1 +
drivers/iommu/virtio-iommu.c | 980
++++++++++++++++++++++++++++++++++++++
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/virtio_ids.h | 1 +
include/uapi/linux/virtio_iommu.h | 142 ++++++
6 files changed, 1136 insertions(+)
create mode 100644 drivers/iommu/virtio-iommu.c
create mode 100644 include/uapi/linux/virtio_iommu.h
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 37e204f3d9be..8cd56ee9a93a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -359,4 +359,15 @@ config MTK_IOMMU_V1
if unsure, say N here.
+config VIRTIO_IOMMU
+ tristate "Virtio IOMMU driver"
+ depends on VIRTIO_MMIO
+ select IOMMU_API
+ select INTERVAL_TREE
+ select ARM_DMA_USE_IOMMU if ARM
+ help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 195f7b997d8e..1199d8475802 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index 000000000000..1cf4f57b7817
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,980 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2017 ARM Limited
+ *
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/amba/bus.h>
+#include <linux/delay.h>
+#include <linux/dma-iommu.h>
+#include <linux/freezer.h>
+#include <linux/interval_tree.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/of_iommu.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ids.h>
+#include <linux/wait.h>
+
+#include <uapi/linux/virtio_iommu.h>
+
+struct viommu_dev {
+ struct iommu_device iommu;
+ struct device *dev;
+ struct virtio_device *vdev;
+
+ struct virtqueue *vq;
+ struct list_head pending_requests;
+ /* Serialize anything touching the vq and the request list */
+ spinlock_t vq_lock;
+
+ struct list_head list;
+
+ /* Device configuration */
+ u64 pgsize_bitmap;
+ u64 aperture_start;
+ u64 aperture_end;
+};
+
+struct viommu_mapping {
+ phys_addr_t paddr;
+ struct interval_tree_node iova;
+};
+
+struct viommu_domain {
+ struct iommu_domain domain;
+ struct viommu_dev *viommu;
+ struct mutex mutex;
+ u64 id;
+
+ spinlock_t mappings_lock;
+ struct rb_root mappings;
+
+ /* Number of devices attached to this domain */
+ unsigned long attached;
+};
+
+struct viommu_endpoint {
+ struct viommu_dev *viommu;
+ struct viommu_domain *vdomain;
+};
+
+struct viommu_request {
+ struct scatterlist head;
+ struct scatterlist tail;
+
+ int written;
+ struct list_head list;
+};
+
+/* TODO: use an IDA */
+static atomic64_t viommu_domain_ids_gen;
+
+#define to_viommu_domain(domain) container_of(domain, struct
+viommu_domain, domain)
+
+/* Virtio transport */
+
+static int viommu_status_to_errno(u8 status)
+{
+ switch (status) {
+ case VIRTIO_IOMMU_S_OK:
+ return 0;
+ case VIRTIO_IOMMU_S_UNSUPP:
+ return -ENOSYS;
+ case VIRTIO_IOMMU_S_INVAL:
+ return -EINVAL;
+ case VIRTIO_IOMMU_S_RANGE:
+ return -ERANGE;
+ case VIRTIO_IOMMU_S_NOENT:
+ return -ENOENT;
+ case VIRTIO_IOMMU_S_FAULT:
+ return -EFAULT;
+ case VIRTIO_IOMMU_S_IOERR:
+ case VIRTIO_IOMMU_S_DEVERR:
+ default:
+ return -EIO;
+ }
+}
+
+static int viommu_get_req_size(struct virtio_iommu_req_head *req, size_t *head,
+ size_t *tail)
+{
+ size_t size;
+ union virtio_iommu_req r;
+
+ *tail = sizeof(struct virtio_iommu_req_tail);
+
+ switch (req->type) {
+ case VIRTIO_IOMMU_T_ATTACH:
+ size = sizeof(r.attach);
+ break;
+ case VIRTIO_IOMMU_T_DETACH:
+ size = sizeof(r.detach);
+ break;
+ case VIRTIO_IOMMU_T_MAP:
+ size = sizeof(r.map);
+ break;
+ case VIRTIO_IOMMU_T_UNMAP:
+ size = sizeof(r.unmap);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ *head = size - *tail;
+ return 0;
+}
+
+static int viommu_receive_resp(struct viommu_dev *viommu, int nr_expected)
+{
+
+ unsigned int len;
+ int nr_received = 0;
+ struct viommu_request *req, *pending, *next;
+
+ pending = list_first_entry_or_null(&viommu->pending_requests,
+ struct viommu_request, list);
+ if (WARN_ON(!pending))
+ return 0;
+
+ while ((req = virtqueue_get_buf(viommu->vq, &len)) != NULL) {
+ if (req != pending) {
+ dev_warn(viommu->dev, "discarding stale request\n");
+ continue;
+ }
+
+ pending->written = len;
+
+ if (++nr_received == nr_expected) {
+ list_del(&pending->list);
+ /*
+ * In an ideal world, we'd wake up the waiter for this
+ * group of requests here. But everything is painfully
+ * synchronous, so waiter is the caller.
+ */
+ break;
+ }
+
+ next = list_next_entry(pending, list);
+ list_del(&pending->list);
+
+ if (WARN_ON(list_empty(&viommu->pending_requests)))
+ return 0;
+
+ pending = next;
+ }
+
+ return nr_received;
+}
+
+/* Must be called with vq_lock held */
+static int _viommu_send_reqs_sync(struct viommu_dev *viommu,
+ struct viommu_request *req, int nr,
+ int *nr_sent)
+{
+ int i, ret;
+ ktime_t timeout;
+ int nr_received = 0;
+ struct scatterlist *sg[2];
+ /*
+ * FIXME: as it stands, 1s timeout per request. This is a voluntary
+ * exaggeration because I have no idea how real our ktime is. Are we
+ * using a RTC? Are we aware of steal time? I don't know much about
+ * this, need to do some digging.
+ */
+ unsigned long timeout_ms = 1000;
+
+ *nr_sent = 0;
+
+ for (i = 0; i < nr; i++, req++) {
+ /*
+ * The backend will allocate one indirect descriptor for each
+ * request, which allows the ring to hold twice as many
+ * requests, but might be slower.
+ */
+ req->written = 0;
+
+ sg[0] = &req->head;
+ sg[1] = &req->tail;
+
+ ret = virtqueue_add_sgs(viommu->vq, sg, 1, 1, req,
+ GFP_ATOMIC);
+ if (ret)
+ break;
+
+ list_add_tail(&req->list, &viommu->pending_requests);
+ }
+
+ if (i && !virtqueue_kick(viommu->vq))
+ return -EPIPE;
+
+ /*
+ * Absolutely no wiggle room here. We're not allowed to sleep as callers
+ * might be holding spinlocks, so we have to poll like savages until
+ * something appears. Hopefully the host already handled the request
+ * during the above kick and returned it to us.
+ *
+ * A nice improvement would be for the caller to tell us if we can sleep
+ * whilst mapping, but this has to go through the IOMMU/DMA API.
+ */
+ timeout = ktime_add_ms(ktime_get(), timeout_ms * i);
+ while (nr_received < i && ktime_before(ktime_get(), timeout)) {
+ nr_received += viommu_receive_resp(viommu, i - nr_received);
+ if (nr_received < i) {
+ /*
+ * FIXME: what's a good way to yield to host? A second
+ * virtqueue_kick won't have any effect since we haven't
+ * added any descriptor.
+ */
+ udelay(10);
+ }
+ }
+ dev_dbg(viommu->dev, "request took %lld us\n",
+ ktime_us_delta(ktime_get(), ktime_sub_ms(timeout, timeout_ms * i)));
+
+ if (nr_received != i)
+ ret = -ETIMEDOUT;
+
+ if (ret == -ENOSPC && nr_received)
+ /*
+ * We've freed some space since virtio told us that the ring is
+ * full, tell the caller to come back later (after releasing the
+ * lock first, to be fair to other threads)
+ */
+ ret = -EAGAIN;
+
+ *nr_sent = nr_received;
+
+ return ret;
+}
+
+/**
+ * viommu_send_reqs_sync - add a batch of requests, kick the host and wait for
+ * them to return
+ *
+ * Return 0 on success, or an error if we failed to send some of the requests.
+ */
+static int viommu_send_reqs_sync(struct viommu_dev *viommu,
+ struct viommu_request *req, int nr,
+ int *nr_sent)
+{
+ int ret;
+ int sent = 0;
+ unsigned long flags;
+
+ *nr_sent = 0;
+ do {
+ spin_lock_irqsave(&viommu->vq_lock, flags);
+ ret = _viommu_send_reqs_sync(viommu, req, nr, &sent);
+ spin_unlock_irqrestore(&viommu->vq_lock, flags);
+
+ *nr_sent += sent;
+ req += sent;
+ nr -= sent;
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
+/**
+ * viommu_send_req_sync - send one request and wait for reply
+ *
+ *
+ * Returns 0 if the request was successful, or an error number otherwise. No
+ * distinction is done between transport and request errors.
+ */
+static int viommu_send_req_sync(struct viommu_dev *viommu, void *head_ptr)
+{
+ int ret;
+ int nr_sent;
+ struct viommu_request req;
+ size_t head_size, tail_size;
+ struct virtio_iommu_req_tail *tail;
+ struct virtio_iommu_req_head *head = head_ptr;
+
+ ret = viommu_get_req_size(head, &head_size, &tail_size);
+ if (ret)
+ return ret;
+
+ dev_dbg(viommu->dev, "Sending request 0x%x, %zu bytes\n", head->type,
+ head_size + tail_size);
+
+ tail = head_ptr + head_size;
+
+ sg_init_one(&req.head, head, head_size);
+ sg_init_one(&req.tail, tail, tail_size);
+
+ ret = viommu_send_reqs_sync(viommu, &req, 1, &nr_sent);
+ if (ret || !req.written || nr_sent != 1) {
+ dev_err(viommu->dev, "failed to send command\n");
+ return -EIO;
+ }
+
+ ret = viommu_status_to_errno(tail->status);
+
+ if (ret)
+ dev_dbg(viommu->dev, " completed with %d\n", ret);
+
+ return ret;
+}
+
+static int viommu_tlb_map(struct viommu_domain *vdomain, unsigned long iova,
+ phys_addr_t paddr, size_t size)
+{
+ unsigned long flags;
+ struct viommu_mapping *mapping;
+
+ mapping = kzalloc(sizeof(*mapping), GFP_ATOMIC);
+ if (!mapping)
+ return -ENOMEM;
+
+ mapping->paddr = paddr;
+ mapping->iova.start = iova;
+ mapping->iova.last = iova + size - 1;
+
+ spin_lock_irqsave(&vdomain->mappings_lock, flags);
+ interval_tree_insert(&mapping->iova, &vdomain->mappings);
+ spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
+
+ return 0;
+}
+
+static size_t viommu_tlb_unmap(struct viommu_domain *vdomain,
+ unsigned long iova, size_t size)
+{
+ size_t unmapped = 0;
+ unsigned long flags;
+ unsigned long last = iova + size - 1;
+ struct viommu_mapping *mapping = NULL;
+ struct interval_tree_node *node, *next;
+
+ spin_lock_irqsave(&vdomain->mappings_lock, flags);
+ next = interval_tree_iter_first(&vdomain->mappings, iova, last);
+ while (next) {
+ node = next;
+ mapping = container_of(node, struct viommu_mapping, iova);
+
+ next = interval_tree_iter_next(node, iova, last);
+
+ /*
+ * Note that for a partial range, this will return the full
+ * mapping so we avoid sending split requests to the device.
+ */
+ unmapped += mapping->iova.last - mapping->iova.start + 1;
+
+ interval_tree_remove(node, &vdomain->mappings);
+ kfree(mapping);
+ }
+ spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
+
+ return unmapped;
+}
+
+/* IOMMU API */
+
+static bool viommu_capable(enum iommu_cap cap) {
+ return false; /* :( */
+}
+
+static struct iommu_domain *viommu_domain_alloc(unsigned type) {
+ struct viommu_domain *vdomain;
+
+ if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+ return NULL;
+
+ vdomain = kzalloc(sizeof(struct viommu_domain), GFP_KERNEL);
+ if (!vdomain)
+ return NULL;
+
+ vdomain->id = atomic64_inc_return_relaxed(&viommu_domain_ids_gen);
+
+ mutex_init(&vdomain->mutex);
+ spin_lock_init(&vdomain->mappings_lock);
+ vdomain->mappings = RB_ROOT;
+
+ pr_debug("alloc domain of type %d -> %llu\n", type, vdomain->id);
+
+ if (type == IOMMU_DOMAIN_DMA &&
+ iommu_get_dma_cookie(&vdomain->domain)) {
+ kfree(vdomain);
+ return NULL;
+ }
+
+ return &vdomain->domain;
+}
+
+static void viommu_domain_free(struct iommu_domain *domain) {
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+ pr_debug("free domain %llu\n", vdomain->id);
+
+ iommu_put_dma_cookie(domain);
+
+ /* Free all remaining mappings (size 2^64) */
+ viommu_tlb_unmap(vdomain, 0, 0);
+
+ kfree(vdomain);
+}
+
+static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
+{
+ int i;
+ int ret = 0;
+ struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+ struct viommu_endpoint *vdev = fwspec->iommu_priv;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+ struct virtio_iommu_req_attach req = {
+ .head.type = VIRTIO_IOMMU_T_ATTACH,
+ .address_space = cpu_to_le32(vdomain->id),
+ };
+
+ mutex_lock(&vdomain->mutex);
+ if (!vdomain->viommu) {
+ struct viommu_dev *viommu = vdev->viommu;
+
+ vdomain->viommu = viommu;
+
+ domain->pgsize_bitmap = viommu->pgsize_bitmap;
+ domain->geometry.aperture_start = viommu->aperture_start;
+ domain->geometry.aperture_end = viommu->aperture_end;
+ domain->geometry.force_aperture = true;
+
+ } else if (vdomain->viommu != vdev->viommu) {
+ dev_err(dev, "cannot attach to foreign VIOMMU\n");
+ ret = -EXDEV;
+ }
+ mutex_unlock(&vdomain->mutex);
+
+ if (ret)
+ return ret;
+
+ /*
+ * When attaching the device to a new domain, it will be detached from
+ * the old one and, if as a result the old domain isn't attached to
+ * any device, all mappings are removed from the old domain and it is
+ * freed. (Note that we can't use get_domain_for_dev here, it returns
+ * the default domain during initial attach.)
+ *
+ * Take note of the device disappearing, so we can ignore unmap request
+ * on stale domains (that is, between this detach and the upcoming
+ * free.)
+ *
+ * vdev->vdomain is protected by group->mutex
+ */
+ if (vdev->vdomain) {
+ dev_dbg(dev, "detach from domain %llu\n", vdev->vdomain->id);
+ vdev->vdomain->attached--;
+ }
+
+ dev_dbg(dev, "attach to domain %llu\n", vdomain->id);
+
+ for (i = 0; i < fwspec->num_ids; i++) {
+ req.device = cpu_to_le32(fwspec->ids[i]);
+
+ ret = viommu_send_req_sync(vdomain->viommu, &req);
+ if (ret)
+ break;
+ }
+
+ vdomain->attached++;
+ vdev->vdomain = vdomain;
+
+ return ret;
+}
+
+static int viommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot)
+{
+ int ret;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+ struct virtio_iommu_req_map req = {
+ .head.type = VIRTIO_IOMMU_T_MAP,
+ .address_space = cpu_to_le32(vdomain->id),
+ .virt_addr = cpu_to_le64(iova),
+ .phys_addr = cpu_to_le64(paddr),
+ .size = cpu_to_le64(size),
+ };
+
+ pr_debug("map %llu 0x%lx -> 0x%llx (%zu)\n", vdomain->id, iova,
+ paddr, size);
A query: when tracing the above prints, I see the same physical address mapped to two different virtual addresses. Do you know why the kernel does this?

Thanks
-Bharat
+
+ if (!vdomain->attached)
+ return -ENODEV;
+
+ if (prot & IOMMU_READ)
+ req.flags |= cpu_to_le32(VIRTIO_IOMMU_MAP_F_READ);
+
+ if (prot & IOMMU_WRITE)
+ req.flags |= cpu_to_le32(VIRTIO_IOMMU_MAP_F_WRITE);
+
+ ret = viommu_tlb_map(vdomain, iova, paddr, size);
+ if (ret)
+ return ret;
+
+ ret = viommu_send_req_sync(vdomain->viommu, &req);
+ if (ret)
+ viommu_tlb_unmap(vdomain, iova, size);
+
+ return ret;
+}
+
+static size_t viommu_unmap(struct iommu_domain *domain, unsigned long iova,
+ size_t size)
+{
+ int ret;
+ size_t unmapped;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+ struct virtio_iommu_req_unmap req = {
+ .head.type = VIRTIO_IOMMU_T_UNMAP,
+ .address_space = cpu_to_le32(vdomain->id),
+ .virt_addr = cpu_to_le64(iova),
+ };
+
+ pr_debug("unmap %llu 0x%lx (%zu)\n", vdomain->id, iova, size);
+
+ /* Callers may unmap after detach, but device already took care of it. */
+ if (!vdomain->attached)
+ return size;
+
+ unmapped = viommu_tlb_unmap(vdomain, iova, size);
+ if (unmapped < size)
+ return 0;
+
+ req.size = cpu_to_le64(unmapped);
+
+ ret = viommu_send_req_sync(vdomain->viommu, &req);
+ if (ret)
+ return 0;
+
+ return unmapped;
+}
+
+static size_t viommu_map_sg(struct iommu_domain *domain, unsigned long iova,
+ struct scatterlist *sg, unsigned int nents, int prot)
+{
+ int i, ret;
+ int nr_sent;
+ size_t mapped;
+ size_t min_pagesz;
+ size_t total_size;
+ struct scatterlist *s;
+ unsigned int flags = 0;
+ unsigned long cur_iova;
+ unsigned long mapped_iova;
+ size_t head_size, tail_size;
+ struct viommu_request reqs[nents];
+ struct virtio_iommu_req_map map_reqs[nents];
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+ if (!vdomain->attached)
+ return 0;
+
+ pr_debug("map_sg %llu %u 0x%lx\n", vdomain->id, nents, iova);
+
+ if (prot & IOMMU_READ)
+ flags |= VIRTIO_IOMMU_MAP_F_READ;
+
+ if (prot & IOMMU_WRITE)
+ flags |= VIRTIO_IOMMU_MAP_F_WRITE;
+
+ min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+ tail_size = sizeof(struct virtio_iommu_req_tail);
+ head_size = sizeof(*map_reqs) - tail_size;
+
+ cur_iova = iova;
+
+ for_each_sg(sg, s, nents, i) {
+ size_t size = s->length;
+ phys_addr_t paddr = sg_phys(s);
+ void *tail = (void *)&map_reqs[i] + head_size;
+
+ if (!IS_ALIGNED(paddr | size, min_pagesz)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ /* TODO: merge physically-contiguous mappings if any */
+ map_reqs[i] = (struct virtio_iommu_req_map) {
+ .head.type = VIRTIO_IOMMU_T_MAP,
+ .address_space = cpu_to_le32(vdomain->id),
+ .flags = cpu_to_le32(flags),
+ .virt_addr = cpu_to_le64(cur_iova),
+ .phys_addr = cpu_to_le64(paddr),
+ .size = cpu_to_le64(size),
+ };
+
+ ret = viommu_tlb_map(vdomain, cur_iova, paddr, size);
+ if (ret)
+ break;
+
+ sg_init_one(&reqs[i].head, &map_reqs[i], head_size);
+ sg_init_one(&reqs[i].tail, tail, tail_size);
+
+ cur_iova += size;
+ }
+
+ total_size = cur_iova - iova;
+
+ if (ret) {
+ viommu_tlb_unmap(vdomain, iova, total_size);
+ return 0;
+ }
+
+ ret = viommu_send_reqs_sync(vdomain->viommu, reqs, i, &nr_sent);
+
+ if (nr_sent != nents)
+ goto err_rollback;
+
+ for (i = 0; i < nents; i++) {
+ if (!reqs[i].written || map_reqs[i].tail.status)
+ goto err_rollback;
+ }
+
+ return total_size;
+
+err_rollback:
+ /*
+ * Any request in the range might have failed. Unmap what was
+ * successful.
+ */
+ cur_iova = iova;
+ mapped_iova = iova;
+ mapped = 0;
+ for_each_sg(sg, s, nents, i) {
+ size_t size = s->length;
+
+ cur_iova += size;
+
+ if (!reqs[i].written || map_reqs[i].tail.status) {
+ if (mapped)
+ viommu_unmap(domain, mapped_iova, mapped);
+
+ mapped_iova = cur_iova;
+ mapped = 0;
+ } else {
+ mapped += size;
+ }
+ }
+
+ viommu_tlb_unmap(vdomain, iova, total_size);
+
+ return 0;
+}
+
+static phys_addr_t viommu_iova_to_phys(struct iommu_domain *domain,
+ dma_addr_t iova)
+{
+ u64 paddr = 0;
+ unsigned long flags;
+ struct viommu_mapping *mapping;
+ struct interval_tree_node *node;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+ spin_lock_irqsave(&vdomain->mappings_lock, flags);
+ node = interval_tree_iter_first(&vdomain->mappings, iova, iova);
+ if (node) {
+ mapping = container_of(node, struct viommu_mapping, iova);
+ paddr = mapping->paddr + (iova - mapping->iova.start);
+ }
+ spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
+
+ pr_debug("iova_to_phys %llu 0x%llx->0x%llx\n", vdomain->id, iova,
+ paddr);
+
+ return paddr;
+}
+
+static struct iommu_ops viommu_ops;
+static struct virtio_driver virtio_iommu_drv;
+
+static int viommu_match_node(struct device *dev, void *data) {
+ return dev->parent->fwnode == data;
+}
+
+static struct viommu_dev *viommu_get_by_fwnode(struct fwnode_handle *fwnode)
+{
+ struct device *dev = driver_find_device(&virtio_iommu_drv.driver, NULL,
+ fwnode, viommu_match_node);
+ put_device(dev);
+
+ return dev ? dev_to_virtio(dev)->priv : NULL;
+}
+
+static int viommu_add_device(struct device *dev) {
+ struct iommu_group *group;
+ struct viommu_endpoint *vdev;
+ struct viommu_dev *viommu = NULL;
+ struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+
+ if (!fwspec || fwspec->ops != &viommu_ops)
+ return -ENODEV;
+
+ viommu = viommu_get_by_fwnode(fwspec->iommu_fwnode);
+ if (!viommu)
+ return -ENODEV;
+
+ vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+ if (!vdev)
+ return -ENOMEM;
+
+ vdev->viommu = viommu;
+ fwspec->iommu_priv = vdev;
+
+ /*
+ * Last step creates a default domain and attaches to it. Everything
+ * must be ready.
+ */
+ group = iommu_group_get_for_dev(dev);
+
+ return PTR_ERR_OR_ZERO(group);
+}
+
+static void viommu_remove_device(struct device *dev) {
+ kfree(dev->iommu_fwspec->iommu_priv);
+}
+
+static struct iommu_group *
+viommu_device_group(struct device *dev) {
+ if (dev_is_pci(dev))
+ return pci_device_group(dev);
+ else
+ return generic_device_group(dev);
+}
+
+static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
+ u32 *id = args->args;
+
+ dev_dbg(dev, "of_xlate 0x%x\n", *id);
+ return iommu_fwspec_add_ids(dev, args->args, 1);
+}
+
+/*
+ * (Maybe) temporary hack for device pass-through into guest userspace. On ARM
+ * with an ITS, VFIO will look for a region where to map the doorbell, even
+ * though the virtual doorbell is never written to by the device, and instead
+ * the host injects interrupts directly. TODO: sort this out in VFIO.
+ */
+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
+static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
+{
+ struct iommu_resv_region *region;
+ int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+ region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, prot,
+ IOMMU_RESV_MSI);
+ if (!region)
+ return;
+
+ list_add_tail(&region->list, head);
+}
+
+static void viommu_put_resv_regions(struct device *dev, struct list_head *head)
+{
+ struct iommu_resv_region *entry, *next;
+
+ list_for_each_entry_safe(entry, next, head, list)
+ kfree(entry);
+}
+
+static struct iommu_ops viommu_ops = {
+ .capable = viommu_capable,
+ .domain_alloc = viommu_domain_alloc,
+ .domain_free = viommu_domain_free,
+ .attach_dev = viommu_attach_dev,
+ .map = viommu_map,
+ .unmap = viommu_unmap,
+ .map_sg = viommu_map_sg,
+ .iova_to_phys = viommu_iova_to_phys,
+ .add_device = viommu_add_device,
+ .remove_device = viommu_remove_device,
+ .device_group = viommu_device_group,
+ .of_xlate = viommu_of_xlate,
+ .get_resv_regions = viommu_get_resv_regions,
+ .put_resv_regions = viommu_put_resv_regions,
+};
+
+static int viommu_init_vq(struct viommu_dev *viommu) {
+ struct virtio_device *vdev = dev_to_virtio(viommu->dev);
+ vq_callback_t *callback = NULL;
+ const char *name = "request";
+ int ret;
+
+ ret = vdev->config->find_vqs(vdev, 1, &viommu->vq, &callback,
+ &name, NULL);
+ if (ret)
+ dev_err(viommu->dev, "cannot find VQ\n");
+
+ return ret;
+}
+
+static int viommu_probe(struct virtio_device *vdev) {
+ struct device *parent_dev = vdev->dev.parent;
+ struct viommu_dev *viommu = NULL;
+ struct device *dev = &vdev->dev;
+ int ret;
+
+ viommu = kzalloc(sizeof(*viommu), GFP_KERNEL);
+ if (!viommu)
+ return -ENOMEM;
+
+ spin_lock_init(&viommu->vq_lock);
+ INIT_LIST_HEAD(&viommu->pending_requests);
+ viommu->dev = dev;
+ viommu->vdev = vdev;
+
+ ret = viommu_init_vq(viommu);
+ if (ret)
+ goto err_free_viommu;
+
+ virtio_cread(vdev, struct virtio_iommu_config, page_sizes,
+ &viommu->pgsize_bitmap);
+
+ viommu->aperture_end = -1UL;
+
+ virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
+ struct virtio_iommu_config, input_range.start,
+ &viommu->aperture_start);
+
+ virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
+ struct virtio_iommu_config, input_range.end,
+ &viommu->aperture_end);
+
+ if (!viommu->pgsize_bitmap) {
+ ret = -EINVAL;
+ goto err_free_viommu;
+ }
+
+ viommu_ops.pgsize_bitmap = viommu->pgsize_bitmap;
+
+ /*
+ * Not strictly necessary, virtio would enable it later. This allows the
+ * request queue to be used early.
+ */
+ virtio_device_ready(vdev);
+
+ ret = iommu_device_sysfs_add(&viommu->iommu, dev, NULL, "%s",
+ virtio_bus_name(vdev));
+ if (ret)
+ goto err_free_viommu;
+
+ iommu_device_set_ops(&viommu->iommu, &viommu_ops);
+ iommu_device_set_fwnode(&viommu->iommu, parent_dev->fwnode);
+
+ iommu_device_register(&viommu->iommu);
+
+#ifdef CONFIG_PCI
+ if (pci_bus_type.iommu_ops != &viommu_ops) {
+ pci_request_acs();
+ ret = bus_set_iommu(&pci_bus_type, &viommu_ops);
+ if (ret)
+ goto err_unregister;
+ }
+#endif
+#ifdef CONFIG_ARM_AMBA
+ if (amba_bustype.iommu_ops != &viommu_ops) {
+ ret = bus_set_iommu(&amba_bustype, &viommu_ops);
+ if (ret)
+ goto err_unregister;
+ }
+#endif
+ if (platform_bus_type.iommu_ops != &viommu_ops) {
+ ret = bus_set_iommu(&platform_bus_type, &viommu_ops);
+ if (ret)
+ goto err_unregister;
+ }
+
+ vdev->priv = viommu;
+
+ dev_info(viommu->dev, "probe successful\n");
+
+ return 0;
+
+err_unregister:
+ iommu_device_unregister(&viommu->iommu);
+err_free_viommu:
+ kfree(viommu);
+
+ return ret;
+}
+
+static void viommu_remove(struct virtio_device *vdev) {
+ struct viommu_dev *viommu = vdev->priv;
+
+ iommu_device_unregister(&viommu->iommu);
+ kfree(viommu);
+
+ dev_info(&vdev->dev, "device removed\n");
+}
+
+static void viommu_config_changed(struct virtio_device *vdev) {
+ dev_warn(&vdev->dev, "config changed\n");
+}
+
+static unsigned int features[] = {
+ VIRTIO_IOMMU_F_INPUT_RANGE,
+};
+
+static struct virtio_device_id id_table[] = {
+ { VIRTIO_ID_IOMMU, VIRTIO_DEV_ANY_ID },
+ { 0 },
+};
+
+static struct virtio_driver virtio_iommu_drv = {
+ .driver.name = KBUILD_MODNAME,
+ .driver.owner = THIS_MODULE,
+ .id_table = id_table,
+ .feature_table = features,
+ .feature_table_size = ARRAY_SIZE(features),
+ .probe = viommu_probe,
+ .remove = viommu_remove,
+ .config_changed = viommu_config_changed,
+};
+
+module_virtio_driver(virtio_iommu_drv);
+
+IOMMU_OF_DECLARE(viommu, "virtio,mmio", NULL);
+
+MODULE_DESCRIPTION("virtio-iommu driver");
+MODULE_AUTHOR("Jean-Philippe Brucker");
+MODULE_LICENSE("GPL v2");
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 1f25c86374ad..c0cb0f173258 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -467,6 +467,7 @@ header-y += virtio_console.h
header-y += virtio_gpu.h
header-y += virtio_ids.h
header-y += virtio_input.h
+header-y += virtio_iommu.h
header-y += virtio_mmio.h
header-y += virtio_net.h
header-y += virtio_pci.h
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 6d5c3b2d4f4d..934ed3d3cd3f 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -43,5 +43,6 @@
#define VIRTIO_ID_INPUT 18 /* virtio input */
#define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */
#define VIRTIO_ID_CRYPTO 20 /* virtio crypto */
+#define VIRTIO_ID_IOMMU 61216 /* virtio IOMMU (temporary) */
#endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
new file mode 100644
index 000000000000..ec74c9a727d4
--- /dev/null
+++ b/include/uapi/linux/virtio_iommu.h
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This header is BSD licensed so anyone can use the definitions
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of ARM Ltd. nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _UAPI_LINUX_VIRTIO_IOMMU_H
+#define _UAPI_LINUX_VIRTIO_IOMMU_H
+
+/* Feature bits */
+#define VIRTIO_IOMMU_F_INPUT_RANGE 0
+#define VIRTIO_IOMMU_F_IOASID_BITS 1
+#define VIRTIO_IOMMU_F_MAP_UNMAP 2
+#define VIRTIO_IOMMU_F_BYPASS 3
+
+__packed
+struct virtio_iommu_config {
+ /* Supported page sizes */
+ __u64 page_sizes;
+ struct virtio_iommu_range {
+ __u64 start;
+ __u64 end;
+ } input_range;
+ __u8 ioasid_bits;
+};
+
+/* Request types */
+#define VIRTIO_IOMMU_T_ATTACH 0x01
+#define VIRTIO_IOMMU_T_DETACH 0x02
+#define VIRTIO_IOMMU_T_MAP 0x03
+#define VIRTIO_IOMMU_T_UNMAP 0x04
+
+/* Status types */
+#define VIRTIO_IOMMU_S_OK 0x00
+#define VIRTIO_IOMMU_S_IOERR 0x01
+#define VIRTIO_IOMMU_S_UNSUPP 0x02
+#define VIRTIO_IOMMU_S_DEVERR 0x03
+#define VIRTIO_IOMMU_S_INVAL 0x04
+#define VIRTIO_IOMMU_S_RANGE 0x05
+#define VIRTIO_IOMMU_S_NOENT 0x06
+#define VIRTIO_IOMMU_S_FAULT 0x07
+
+__packed
+struct virtio_iommu_req_head {
+ __u8 type;
+ __u8 reserved[3];
+};
+
+__packed
+struct virtio_iommu_req_tail {
+ __u8 status;
+ __u8 reserved[3];
+};
+
+__packed
+struct virtio_iommu_req_attach {
+ struct virtio_iommu_req_head head;
+
+ __le32 address_space;
+ __le32 device;
+ __le32 reserved;
+
+ struct virtio_iommu_req_tail tail;
+};
+
+__packed
+struct virtio_iommu_req_detach {
+ struct virtio_iommu_req_head head;
+
+ __le32 device;
+ __le32 reserved;
+
+ struct virtio_iommu_req_tail tail;
+};
+
+#define VIRTIO_IOMMU_MAP_F_READ (1 << 0)
+#define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1)
+#define VIRTIO_IOMMU_MAP_F_EXEC (1 << 2)
+
+#define VIRTIO_IOMMU_MAP_F_MASK (VIRTIO_IOMMU_MAP_F_READ | \
+ VIRTIO_IOMMU_MAP_F_WRITE | \
+ VIRTIO_IOMMU_MAP_F_EXEC)
+
+__packed
+struct virtio_iommu_req_map {
+ struct virtio_iommu_req_head head;
+
+ __le32 address_space;
+ __le32 flags;
+ __le64 virt_addr;
+ __le64 phys_addr;
+ __le64 size;
+
+ struct virtio_iommu_req_tail tail;
+};
+
+__packed
+struct virtio_iommu_req_unmap {
+ struct virtio_iommu_req_head head;
+
+ __le32 address_space;
+ __le32 flags;
+ __le64 virt_addr;
+ __le64 size;
+
+ struct virtio_iommu_req_tail tail;
+};
+
+union virtio_iommu_req {
+ struct virtio_iommu_req_head head;
+
+ struct virtio_iommu_req_attach attach;
+ struct virtio_iommu_req_detach detach;
+ struct virtio_iommu_req_map map;
+ struct virtio_iommu_req_unmap unmap;
+};
+
+#endif
--
2.12.1
---------------------------------------------------------------------
Jean-Philippe Brucker
2017-06-16 11:36:53 UTC
Permalink
Post by Bharat Bhushan
Hi Jean
Post by Jean-Philippe Brucker
+static int viommu_map(struct iommu_domain *domain, unsigned long iova,
+		      phys_addr_t paddr, size_t size, int prot)
+{
+ int ret;
+ struct viommu_domain *vdomain = to_viommu_domain(domain);
+ struct virtio_iommu_req_map req = {
+ .head.type = VIRTIO_IOMMU_T_MAP,
+ .address_space = cpu_to_le32(vdomain->id),
+ .virt_addr = cpu_to_le64(iova),
+ .phys_addr = cpu_to_le64(paddr),
+ .size = cpu_to_le64(size),
+ };
+
+ pr_debug("map %llu 0x%lx -> 0x%llx (%zu)\n", vdomain->id, iova,
+ paddr, size);
A query: when tracing the prints above, I see the same physical address mapped at two different virtual addresses. Do you know why the kernel does this?
That really depends on which driver is calling into viommu. iommu_map is
called from the DMA API, which can be used by any device driver. Within
an address space, multiple IOVAs pointing to the same PA aren't forbidden.

For example, looking at MAP requests for a virtio-net device, I get the
following trace:

ioas[1] map 0xfffffff3000 -> 0x8faa0000 (4096)
ioas[1] map 0xfffffff2000 -> 0x8faa0000 (4096)
ioas[1] map 0xfffffff1000 -> 0x8faa0000 (4096)
ioas[1] map 0xfffffff0000 -> 0x8faa0000 (4096)
ioas[1] map 0xffffffef000 -> 0x8faa0000 (4096)
ioas[1] map 0xffffffee000 -> 0x8faa0000 (4096)
ioas[1] map 0xffffffed000 -> 0x8faa0000 (4096)
ioas[1] map 0xffffffec000 -> 0x8faa0000 (4096)
ioas[1] map 0xffffffeb000 -> 0x8faa0000 (4096)
ioas[1] map 0xffffffea000 -> 0x8faa0000 (4096)
ioas[1] map 0xffffffe8000 -> 0x8faa0000 (8192)
...

During initialization, the virtio-net driver primes the rx queue with
receive buffers, which the host will then fill with network packets. It
calls virtqueue_add_inbuf_ctx to create descriptors on the rx virtqueue
for each buffer. Each buffer is 0x180 bytes here, so one 4k page can
contain around 10 of them (11, in fact, with the last one crossing a page
boundary).

I guess the call trace goes like this:
virtnet_open
try_fill_recv
add_recvbuf_mergeable
virtqueue_add_inbuf_ctx
vring_map_one_sg
dma_map_page
__iommu_dma_map

But the IOMMU cannot map fragments of pages, since the granule is 0x1000.
Therefore, when virtqueue_add_inbuf_ctx maps the buffer, __iommu_dma_map
aligns the address and size to full pages. Someone motivated could probably
optimize this by caching mapped pages and reusing IOVAs, but currently
that's how it goes.

Thanks,
Jean

Jean-Philippe Brucker
2017-04-07 19:24:40 UTC
Permalink
Implement a virtio-iommu device and translate DMA traffic from vfio and virtio
devices. Virtio needed some rework to support scatter-gather accesses to vring
and buffers at page granularity. Patch 3 implements the actual virtio-iommu
device.

Adding --viommu on the command-line now inserts a virtual IOMMU in front
of all virtio and vfio devices:

$ lkvm run -k Image --console virtio -p console=hvc0 \
--viommu --vfio 0 --vfio 4 --irqchip gicv3-its
...
[ 2.998949] virtio_iommu virtio0: probe successful
[ 3.007739] virtio_iommu virtio1: probe successful
...
[ 3.165023] iommu: Adding device 0000:00:00.0 to group 0
[ 3.536480] iommu: Adding device 10200.virtio to group 1
[ 3.553643] iommu: Adding device 10600.virtio to group 2
[ 3.570687] iommu: Adding device 10800.virtio to group 3
[ 3.627425] iommu: Adding device 10a00.virtio to group 4
[ 7.823689] iommu: Adding device 0000:00:01.0 to group 5
...

Patches 13 and 14 add debug facilities. Some statistics are gathered for each
address space and can be queried via the debug builtin:

$ lkvm debug -n guest-1210 --iommu stats
iommu 0 "viommu-vfio"
kicks 1255
requests 1256
ioas 1
maps 7
unmaps 4
resident 2101248
ioas 6
maps 623
unmaps 620
resident 16384
iommu 1 "viommu-virtio"
kicks 11426
requests 11431
ioas 2
maps 2836
unmaps 2835
resident 8192
accesses 2836
...

This is based on the VFIO patchset[1], itself based on Andre's ITS work.
The VFIO bits have only been tested on a software model and are unlikely
to work on actual hardware, but I also tested virtio on an ARM Juno.

[1] http://www.spinics.net/lists/kvm/msg147624.html

Jean-Philippe Brucker (15):
virtio: synchronize virtio-iommu headers with Linux
FDT: (re)introduce a dynamic phandle allocator
virtio: add virtio-iommu
Add a simple IOMMU
iommu: describe IOMMU topology in device-trees
irq: register MSI doorbell addresses
virtio: factor virtqueue initialization
virtio: add vIOMMU instance for virtio devices
virtio: access vring and buffers through IOMMU mappings
virtio-pci: translate MSIs with the virtual IOMMU
virtio: set VIRTIO_F_IOMMU_PLATFORM when necessary
vfio: add support for virtual IOMMU
virtio-iommu: debug via IPC
virtio-iommu: implement basic debug commands
virtio: use virtio-iommu when available

Makefile | 3 +
arm/gic.c | 4 +
arm/include/arm-common/fdt-arch.h | 2 +-
arm/pci.c | 49 ++-
builtin-debug.c | 8 +-
builtin-run.c | 2 +
fdt.c | 35 ++
include/kvm/builtin-debug.h | 6 +
include/kvm/devices.h | 4 +
include/kvm/fdt.h | 20 +
include/kvm/iommu.h | 105 +++++
include/kvm/irq.h | 3 +
include/kvm/kvm-config.h | 1 +
include/kvm/vfio.h | 2 +
include/kvm/virtio-iommu.h | 15 +
include/kvm/virtio-mmio.h | 1 +
include/kvm/virtio-pci.h | 2 +
include/kvm/virtio.h | 137 +++++-
include/linux/virtio_config.h | 74 ++++
include/linux/virtio_ids.h | 4 +
include/linux/virtio_iommu.h | 142 ++++++
iommu.c | 240 ++++++++++
irq.c | 35 ++
kvm-ipc.c | 43 +-
mips/include/kvm/fdt-arch.h | 2 +-
powerpc/include/kvm/fdt-arch.h | 2 +-
vfio.c | 281 +++++++++++-
virtio/9p.c | 7 +-
virtio/balloon.c | 7 +-
virtio/blk.c | 10 +-
virtio/console.c | 7 +-
virtio/core.c | 240 ++++++++--
virtio/iommu.c | 902 ++++++++++++++++++++++++++++++++++++++
virtio/mmio.c | 44 +-
virtio/net.c | 8 +-
virtio/pci.c | 61 ++-
virtio/rng.c | 6 +-
virtio/scsi.c | 6 +-
x86/include/kvm/fdt-arch.h | 2 +-
39 files changed, 2389 insertions(+), 133 deletions(-)
create mode 100644 fdt.c
create mode 100644 include/kvm/iommu.h
create mode 100644 include/kvm/virtio-iommu.h
create mode 100644 include/linux/virtio_config.h
create mode 100644 include/linux/virtio_iommu.h
create mode 100644 iommu.c
create mode 100644 virtio/iommu.c
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:41 UTC
Permalink
Pull virtio-iommu header (initial proposal) from Linux. Also add
virtio_config.h because it defines VIRTIO_F_IOMMU_PLATFORM, which I'm
going to need soon, and it's not provided by my toolchain.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
include/linux/virtio_config.h | 74 ++++++++++++++++++++++
include/linux/virtio_ids.h | 4 ++
include/linux/virtio_iommu.h | 142 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 220 insertions(+)
create mode 100644 include/linux/virtio_config.h
create mode 100644 include/linux/virtio_iommu.h

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
new file mode 100644
index 00000000..648b688f
--- /dev/null
+++ b/include/linux/virtio_config.h
@@ -0,0 +1,74 @@
+#ifndef _LINUX_VIRTIO_CONFIG_H
+#define _LINUX_VIRTIO_CONFIG_H
+/* This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
+ * anyone can use the definitions to implement compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE. */
+
+/* Virtio devices use a standardized configuration space to define their
+ * features and pass configuration information, but each implementation can
+ * store and access that space differently. */
+#include <linux/types.h>
+
+/* Status byte for guest to report progress, and synchronize features. */
+/* We have seen device and processed generic fields (VIRTIO_CONFIG_F_VIRTIO) */
+#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
+/* We have found a driver for the device. */
+#define VIRTIO_CONFIG_S_DRIVER 2
+/* Driver has used its parts of the config, and is happy */
+#define VIRTIO_CONFIG_S_DRIVER_OK 4
+/* Driver has finished configuring features */
+#define VIRTIO_CONFIG_S_FEATURES_OK 8
+/* Device entered invalid state, driver must reset it */
+#define VIRTIO_CONFIG_S_NEEDS_RESET 0x40
+/* We've given up on this device. */
+#define VIRTIO_CONFIG_S_FAILED 0x80
+
+/* Some virtio feature bits (currently bits 28 through 32) are reserved for the
+ * transport being used (eg. virtio_ring), the rest are per-device feature
+ * bits. */
+#define VIRTIO_TRANSPORT_F_START 28
+#define VIRTIO_TRANSPORT_F_END 34
+
+#ifndef VIRTIO_CONFIG_NO_LEGACY
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them? */
+#define VIRTIO_F_NOTIFY_ON_EMPTY 24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT 27
+#endif /* VIRTIO_CONFIG_NO_LEGACY */
+
+/* v1.0 compliant. */
+#define VIRTIO_F_VERSION_1 32
+
+/*
+ * If clear - device has the IOMMU bypass quirk feature.
+ * If set - use platform tools to detect the IOMMU.
+ *
+ * Note the reverse polarity (compared to most other features),
+ * this is for compatibility with legacy systems.
+ */
+#define VIRTIO_F_IOMMU_PLATFORM 33
+#endif /* _LINUX_VIRTIO_CONFIG_H */
diff --git a/include/linux/virtio_ids.h b/include/linux/virtio_ids.h
index 5f60aa4b..934ed3d3 100644
--- a/include/linux/virtio_ids.h
+++ b/include/linux/virtio_ids.h
@@ -39,6 +39,10 @@
#define VIRTIO_ID_9P 9 /* 9p virtio console */
#define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
#define VIRTIO_ID_CAIF 12 /* Virtio caif */
+#define VIRTIO_ID_GPU 16 /* virtio GPU */
#define VIRTIO_ID_INPUT 18 /* virtio input */
+#define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */
+#define VIRTIO_ID_CRYPTO 20 /* virtio crypto */
+#define VIRTIO_ID_IOMMU 61216 /* virtio IOMMU (temporary) */

#endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/linux/virtio_iommu.h b/include/linux/virtio_iommu.h
new file mode 100644
index 00000000..beb21d44
--- /dev/null
+++ b/include/linux/virtio_iommu.h
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This header is BSD licensed so anyone can use the definitions
+ * to implement compatible drivers/servers:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of ARM Ltd. nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _LINUX_VIRTIO_IOMMU_H
+#define _LINUX_VIRTIO_IOMMU_H
+
+/* Feature bits */
+#define VIRTIO_IOMMU_F_INPUT_RANGE 0
+#define VIRTIO_IOMMU_F_IOASID_BITS 1
+#define VIRTIO_IOMMU_F_MAP_UNMAP 2
+#define VIRTIO_IOMMU_F_BYPASS 3
+
+struct virtio_iommu_config {
+	/* Supported page sizes */
+	__u64 page_sizes;
+	struct virtio_iommu_range {
+		__u64 start;
+		__u64 end;
+	} input_range;
+	__u8 ioasid_bits;
+} __attribute__((packed));
+
+/* Request types */
+#define VIRTIO_IOMMU_T_ATTACH 0x01
+#define VIRTIO_IOMMU_T_DETACH 0x02
+#define VIRTIO_IOMMU_T_MAP 0x03
+#define VIRTIO_IOMMU_T_UNMAP 0x04
+
+/* Status types */
+#define VIRTIO_IOMMU_S_OK 0x00
+#define VIRTIO_IOMMU_S_IOERR 0x01
+#define VIRTIO_IOMMU_S_UNSUPP 0x02
+#define VIRTIO_IOMMU_S_DEVERR 0x03
+#define VIRTIO_IOMMU_S_INVAL 0x04
+#define VIRTIO_IOMMU_S_RANGE 0x05
+#define VIRTIO_IOMMU_S_NOENT 0x06
+#define VIRTIO_IOMMU_S_FAULT 0x07
+
+struct virtio_iommu_req_head {
+	__u8 type;
+	__u8 reserved[3];
+} __attribute__((packed));
+
+struct virtio_iommu_req_tail {
+	__u8 status;
+	__u8 reserved[3];
+} __attribute__((packed));
+
+struct virtio_iommu_req_attach {
+	struct virtio_iommu_req_head head;
+
+	__le32 address_space;
+	__le32 device;
+	__le32 reserved;
+
+	struct virtio_iommu_req_tail tail;
+} __attribute__((packed));
+
+struct virtio_iommu_req_detach {
+	struct virtio_iommu_req_head head;
+
+	__le32 device;
+	__le32 reserved;
+
+	struct virtio_iommu_req_tail tail;
+} __attribute__((packed));
+
+#define VIRTIO_IOMMU_MAP_F_READ (1 << 0)
+#define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1)
+#define VIRTIO_IOMMU_MAP_F_EXEC (1 << 2)
+
+#define VIRTIO_IOMMU_MAP_F_MASK (VIRTIO_IOMMU_MAP_F_READ | \
+ VIRTIO_IOMMU_MAP_F_WRITE | \
+ VIRTIO_IOMMU_MAP_F_EXEC)
+
+struct virtio_iommu_req_map {
+	struct virtio_iommu_req_head head;
+
+	__le32 address_space;
+	__le32 flags;
+	__le64 virt_addr;
+	__le64 phys_addr;
+	__le64 size;
+
+	struct virtio_iommu_req_tail tail;
+} __attribute__((packed));
+
+struct virtio_iommu_req_unmap {
+	struct virtio_iommu_req_head head;
+
+	__le32 address_space;
+	__le32 flags;
+	__le64 virt_addr;
+	__le64 size;
+
+	struct virtio_iommu_req_tail tail;
+} __attribute__((packed));
+
+union virtio_iommu_req {
+ struct virtio_iommu_req_head head;
+
+ struct virtio_iommu_req_attach attach;
+ struct virtio_iommu_req_detach detach;
+ struct virtio_iommu_req_map map;
+ struct virtio_iommu_req_unmap unmap;
+};
+
+#endif
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:42 UTC
Permalink
The phandle allocator was removed because static values were sufficient
for creating a common irqchip. Now that multiple virtual IOMMUs are added
to the device-tree, phandles need to be allocated dynamically. Add a
simple allocator that returns values above the static ones.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
Makefile | 1 +
arm/include/arm-common/fdt-arch.h | 2 +-
fdt.c | 15 +++++++++++++++
include/kvm/fdt.h | 13 +++++++++++++
mips/include/kvm/fdt-arch.h | 2 +-
powerpc/include/kvm/fdt-arch.h | 2 +-
x86/include/kvm/fdt-arch.h | 2 +-
7 files changed, 33 insertions(+), 4 deletions(-)
create mode 100644 fdt.c

diff --git a/Makefile b/Makefile
index 6d5f5d9d..3e21c597 100644
--- a/Makefile
+++ b/Makefile
@@ -303,6 +303,7 @@ ifeq (y,$(ARCH_WANT_LIBFDT))
CFLAGS_STATOPT += -DCONFIG_HAS_LIBFDT
LIBS_DYNOPT += -lfdt
LIBS_STATOPT += -lfdt
+ OBJS += fdt.o
endif
endif

diff --git a/arm/include/arm-common/fdt-arch.h b/arm/include/arm-common/fdt-arch.h
index 60c2d406..ed4ff3d4 100644
--- a/arm/include/arm-common/fdt-arch.h
+++ b/arm/include/arm-common/fdt-arch.h
@@ -1,6 +1,6 @@
#ifndef ARM__FDT_H
#define ARM__FDT_H

-enum phandles {PHANDLE_RESERVED = 0, PHANDLE_GIC, PHANDLE_MSI, PHANDLES_MAX};
+enum phandles {PHANDLE_RESERVED = 0, PHANDLE_GIC, PHANDLE_MSI, ARCH_PHANDLES_MAX};

#endif /* ARM__FDT_H */
diff --git a/fdt.c b/fdt.c
new file mode 100644
index 00000000..6db03d4e
--- /dev/null
+++ b/fdt.c
@@ -0,0 +1,15 @@
+/*
+ * Commonly used FDT functions.
+ */
+
+#include "kvm/fdt.h"
+
+static u32 next_phandle = PHANDLE_RESERVED;
+
+u32 fdt_alloc_phandle(void)
+{
+ if (next_phandle == PHANDLE_RESERVED)
+ next_phandle = ARCH_PHANDLES_MAX;
+
+ return next_phandle++;
+}
diff --git a/include/kvm/fdt.h b/include/kvm/fdt.h
index beadc7f3..503887f9 100644
--- a/include/kvm/fdt.h
+++ b/include/kvm/fdt.h
@@ -35,4 +35,17 @@ enum irq_type {
} \
} while (0)

+#ifdef CONFIG_HAS_LIBFDT
+
+u32 fdt_alloc_phandle(void);
+
+#else
+
+static inline u32 fdt_alloc_phandle(void)
+{
+ return PHANDLE_RESERVED;
+}
+
+#endif /* CONFIG_HAS_LIBFDT */
+
#endif /* KVM__FDT_H */
diff --git a/mips/include/kvm/fdt-arch.h b/mips/include/kvm/fdt-arch.h
index b0302457..3d004117 100644
--- a/mips/include/kvm/fdt-arch.h
+++ b/mips/include/kvm/fdt-arch.h
@@ -1,6 +1,6 @@
#ifndef KVM__KVM_FDT_H
#define KVM__KVM_FDT_H

-enum phandles {PHANDLE_RESERVED = 0, PHANDLES_MAX};
+enum phandles {PHANDLE_RESERVED = 0, ARCH_PHANDLES_MAX};

#endif /* KVM__KVM_FDT_H */
diff --git a/powerpc/include/kvm/fdt-arch.h b/powerpc/include/kvm/fdt-arch.h
index d48c0554..4ae4d3a0 100644
--- a/powerpc/include/kvm/fdt-arch.h
+++ b/powerpc/include/kvm/fdt-arch.h
@@ -1,6 +1,6 @@
#ifndef KVM__KVM_FDT_H
#define KVM__KVM_FDT_H

-enum phandles {PHANDLE_RESERVED = 0, PHANDLE_XICP, PHANDLES_MAX};
+enum phandles {PHANDLE_RESERVED = 0, PHANDLE_XICP, ARCH_PHANDLES_MAX};

#endif /* KVM__KVM_FDT_H */
diff --git a/x86/include/kvm/fdt-arch.h b/x86/include/kvm/fdt-arch.h
index eebd73f9..aba06ad8 100644
--- a/x86/include/kvm/fdt-arch.h
+++ b/x86/include/kvm/fdt-arch.h
@@ -1,6 +1,6 @@
#ifndef X86__FDT_ARCH_H
#define X86__FDT_ARCH_H

-enum phandles {PHANDLE_RESERVED = 0, PHANDLES_MAX};
+enum phandles {PHANDLE_RESERVED = 0, ARCH_PHANDLES_MAX};

#endif /* KVM__KVM_FDT_H */
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:43 UTC
Permalink
Implement a simple para-virtualized IOMMU for handling device address
spaces in guests.

Four operations are implemented:
* attach/detach: guest creates an address space, symbolized by a unique
identifier (IOASID), and attaches the device to it.
* map/unmap: guest creates a GVA->GPA mapping in an address space. Devices
attached to this address space can then access the GVA.

Each subsystem can register its own IOMMU by calling register/unregister.
A unique device-tree phandle is allocated for each IOMMU. The IOMMU
receives commands from the driver through the virtqueue, and has a set of
callbacks for each device, making it possible to implement different
map/unmap operations for passed-through and emulated devices. Note that a
single virtual IOMMU per guest would be enough; this multi-instance model
is only here for experimenting, and to allow different subsystems to offer
different vIOMMU features.

Add a global --viommu parameter to enable the virtual IOMMU.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
Makefile | 1 +
builtin-run.c | 2 +
include/kvm/devices.h | 4 +
include/kvm/iommu.h | 64 +++++
include/kvm/kvm-config.h | 1 +
include/kvm/virtio-iommu.h | 10 +
virtio/iommu.c | 628 +++++++++++++++++++++++++++++++++++++++++++++
virtio/mmio.c | 11 +
8 files changed, 721 insertions(+)
create mode 100644 include/kvm/iommu.h
create mode 100644 include/kvm/virtio-iommu.h
create mode 100644 virtio/iommu.c

diff --git a/Makefile b/Makefile
index 3e21c597..67953870 100644
--- a/Makefile
+++ b/Makefile
@@ -68,6 +68,7 @@ OBJS += virtio/net.o
OBJS += virtio/rng.o
OBJS += virtio/balloon.o
OBJS += virtio/pci.o
+OBJS += virtio/iommu.o
OBJS += disk/blk.o
OBJS += disk/qcow.o
OBJS += disk/raw.o
diff --git a/builtin-run.c b/builtin-run.c
index b4790ebc..7535b531 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -113,6 +113,8 @@ void kvm_run_set_wrapper_sandbox(void)
OPT_BOOLEAN('\0', "sdl", &(cfg)->sdl, "Enable SDL framebuffer"),\
OPT_BOOLEAN('\0', "rng", &(cfg)->virtio_rng, "Enable virtio" \
" Random Number Generator"), \
+ OPT_BOOLEAN('\0', "viommu", &(cfg)->viommu, \
+ "Enable virtio IOMMU"), \
OPT_CALLBACK('\0', "9p", NULL, "dir_to_share,tag_name", \
"Enable virtio 9p to share files between host and" \
" guest", virtio_9p_rootdir_parser, kvm), \
diff --git a/include/kvm/devices.h b/include/kvm/devices.h
index 405f1952..70a00c5b 100644
--- a/include/kvm/devices.h
+++ b/include/kvm/devices.h
@@ -11,11 +11,15 @@ enum device_bus_type {
DEVICE_BUS_MAX,
};

+struct iommu_ops;
+
struct device_header {
enum device_bus_type bus_type;
void *data;
int dev_num;
struct rb_node node;
+ struct iommu_ops *iommu_ops;
+ void *iommu_data;
};

int device__register(struct device_header *dev);
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
new file mode 100644
index 00000000..925e1993
--- /dev/null
+++ b/include/kvm/iommu.h
@@ -0,0 +1,64 @@
+#ifndef KVM_IOMMU_H
+#define KVM_IOMMU_H
+
+#include <stdlib.h>
+
+#include "devices.h"
+
+#define IOMMU_PROT_NONE 0x0
+#define IOMMU_PROT_READ 0x1
+#define IOMMU_PROT_WRITE 0x2
+#define IOMMU_PROT_EXEC 0x4
+
+struct iommu_ops {
+ const struct iommu_properties *(*get_properties)(struct device_header *);
+
+ void *(*alloc_address_space)(struct device_header *);
+ void (*free_address_space)(void *);
+
+ int (*attach)(void *, struct device_header *, int flags);
+ int (*detach)(void *, struct device_header *);
+ int (*map)(void *, u64 virt_addr, u64 phys_addr, u64 size, int prot);
+ int (*unmap)(void *, u64 virt_addr, u64 size, int flags);
+};
+
+struct iommu_properties {
+ const char *name;
+ u32 phandle;
+
+ size_t input_addr_size;
+ u64 pgsize_mask;
+};
+
+/*
+ * All devices presented to the system have a device ID, that allows the IOMMU
+ * to identify them. Since multiple buses can share an IOMMU, this device ID
+ * must be unique system-wide. We define it here as:
+ *
+ * (bus_type << 16) + dev_num
+ *
+ * Where dev_num is the device number on the bus as allocated by devices.c
+ *
+ * TODO: enforce this limit, by checking that the device number allocator
+ * doesn't overflow BUS_SIZE.
+ */
+
+#define BUS_SIZE 0x10000
+
+static inline long device_to_iommu_id(struct device_header *dev)
+{
+ return dev->bus_type * BUS_SIZE + dev->dev_num;
+}
+
+#define iommu_id_to_bus(device_id) ((device_id) / BUS_SIZE)
+#define iommu_id_to_devnum(device_id) ((device_id) % BUS_SIZE)
+
+static inline struct device_header *iommu_get_device(u32 device_id)
+{
+ enum device_bus_type bus = iommu_id_to_bus(device_id);
+ u32 dev_num = iommu_id_to_devnum(device_id);
+
+ return device__find_dev(bus, dev_num);
+}
+
+#endif /* KVM_IOMMU_H */
diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
index 62dc6a2f..9678065b 100644
--- a/include/kvm/kvm-config.h
+++ b/include/kvm/kvm-config.h
@@ -60,6 +60,7 @@ struct kvm_config {
bool no_dhcp;
bool ioport_debug;
bool mmio_debug;
+ bool viommu;
};

#endif
diff --git a/include/kvm/virtio-iommu.h b/include/kvm/virtio-iommu.h
new file mode 100644
index 00000000..5532c82b
--- /dev/null
+++ b/include/kvm/virtio-iommu.h
@@ -0,0 +1,10 @@
+#ifndef KVM_VIRTIO_IOMMU_H
+#define KVM_VIRTIO_IOMMU_H
+
+#include "virtio.h"
+
+const struct iommu_properties *viommu_get_properties(void *dev);
+void *viommu_register(struct kvm *kvm, struct iommu_properties *props);
+void viommu_unregister(struct kvm *kvm, void *cookie);
+
+#endif
diff --git a/virtio/iommu.c b/virtio/iommu.c
new file mode 100644
index 00000000..c72e7322
--- /dev/null
+++ b/virtio/iommu.c
@@ -0,0 +1,628 @@
+#include <errno.h>
+#include <stdbool.h>
+
+#include <linux/compiler.h>
+
+#include <linux/bitops.h>
+#include <linux/byteorder.h>
+#include <linux/err.h>
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_iommu.h>
+
+#include "kvm/guest_compat.h"
+#include "kvm/iommu.h"
+#include "kvm/threadpool.h"
+#include "kvm/virtio.h"
+#include "kvm/virtio-iommu.h"
+
+/* Maximum number of descriptors on the request virtqueue */
+#define VIOMMU_DEFAULT_QUEUE_SIZE 256
+
+struct viommu_endpoint {
+ struct device_header *dev;
+ struct viommu_ioas *ioas;
+ struct list_head list;
+};
+
+struct viommu_ioas {
+ u32 id;
+
+ struct mutex devices_mutex;
+ struct list_head devices;
+ size_t nr_devices;
+ struct rb_node node;
+
+ struct iommu_ops *ops;
+ void *priv;
+};
+
+struct viommu_dev {
+ struct virtio_device vdev;
+ struct virtio_iommu_config config;
+
+ const struct iommu_properties *properties;
+
+ struct virt_queue vq;
+ size_t queue_size;
+ struct thread_pool__job job;
+
+ struct rb_root address_spaces;
+ struct kvm *kvm;
+};
+
+static int compat_id = -1;
+
+static struct viommu_ioas *viommu_find_ioas(struct viommu_dev *viommu,
+ u32 ioasid)
+{
+ struct rb_node *node;
+ struct viommu_ioas *ioas;
+
+ node = viommu->address_spaces.rb_node;
+ while (node) {
+ ioas = container_of(node, struct viommu_ioas, node);
+ if (ioas->id > ioasid)
+ node = node->rb_left;
+ else if (ioas->id < ioasid)
+ node = node->rb_right;
+ else
+ return ioas;
+ }
+
+ return NULL;
+}
+
+static struct viommu_ioas *viommu_alloc_ioas(struct viommu_dev *viommu,
+ struct device_header *device,
+ u32 ioasid)
+{
+ struct rb_node **node, *parent = NULL;
+ struct viommu_ioas *new_ioas, *ioas;
+ struct iommu_ops *ops = device->iommu_ops;
+
+ if (!ops || !ops->get_properties || !ops->alloc_address_space ||
+ !ops->free_address_space || !ops->attach || !ops->detach ||
+ !ops->map || !ops->unmap) {
+ /* Catch programming mistakes early */
+ pr_err("Invalid IOMMU ops");
+ return NULL;
+ }
+
+ new_ioas = calloc(1, sizeof(*new_ioas));
+ if (!new_ioas)
+ return NULL;
+
+ INIT_LIST_HEAD(&new_ioas->devices);
+ mutex_init(&new_ioas->devices_mutex);
+ new_ioas->id = ioasid;
+ new_ioas->ops = ops;
+ new_ioas->priv = ops->alloc_address_space(device);
+
+ /* A NULL priv pointer is valid. */
+
+ node = &viommu->address_spaces.rb_node;
+ while (*node) {
+ ioas = container_of(*node, struct viommu_ioas, node);
+ parent = *node;
+
+ if (ioas->id > ioasid) {
+ node = &((*node)->rb_left);
+ } else if (ioas->id < ioasid) {
+ node = &((*node)->rb_right);
+ } else {
+ pr_err("IOAS exists!");
+ free(new_ioas);
+ return NULL;
+ }
+ }
+
+ rb_link_node(&new_ioas->node, parent, node);
+ rb_insert_color(&new_ioas->node, &viommu->address_spaces);
+
+ return new_ioas;
+}
+
+static void viommu_free_ioas(struct viommu_dev *viommu,
+ struct viommu_ioas *ioas)
+{
+ if (ioas->priv)
+ ioas->ops->free_address_space(ioas->priv);
+
+ rb_erase(&ioas->node, &viommu->address_spaces);
+ free(ioas);
+}
+
+static int viommu_ioas_add_device(struct viommu_ioas *ioas,
+ struct viommu_endpoint *vdev)
+{
+ mutex_lock(&ioas->devices_mutex);
+ list_add_tail(&vdev->list, &ioas->devices);
+ ioas->nr_devices++;
+ vdev->ioas = ioas;
+ mutex_unlock(&ioas->devices_mutex);
+
+ return 0;
+}
+
+static int viommu_ioas_del_device(struct viommu_ioas *ioas,
+ struct viommu_endpoint *vdev)
+{
+ mutex_lock(&ioas->devices_mutex);
+ list_del(&vdev->list);
+ ioas->nr_devices--;
+ vdev->ioas = NULL;
+ mutex_unlock(&ioas->devices_mutex);
+
+ return 0;
+}
+
+static struct viommu_endpoint *viommu_alloc_device(struct device_header *device)
+{
+	struct viommu_endpoint *vdev = calloc(1, sizeof(*vdev));
+
+	if (!vdev)
+		return NULL;
+
+	device->iommu_data = vdev;
+	vdev->dev = device;
+
+	return vdev;
+}
+
+static int viommu_detach_device(struct viommu_dev *viommu,
+ struct viommu_endpoint *vdev)
+{
+ int ret;
+ struct viommu_ioas *ioas = vdev->ioas;
+ struct device_header *device = vdev->dev;
+
+ if (!ioas)
+ return -EINVAL;
+
+ pr_debug("detaching device %#lx from IOAS %u",
+ device_to_iommu_id(device), ioas->id);
+
+ ret = device->iommu_ops->detach(ioas->priv, device);
+ if (!ret)
+ ret = viommu_ioas_del_device(ioas, vdev);
+
+ if (!ioas->nr_devices)
+ viommu_free_ioas(viommu, ioas);
+
+ return ret;
+}
+
+static int viommu_handle_attach(struct viommu_dev *viommu,
+ struct virtio_iommu_req_attach *attach)
+{
+ int ret;
+ struct viommu_ioas *ioas;
+ struct device_header *device;
+ struct viommu_endpoint *vdev;
+
+ u32 device_id = le32_to_cpu(attach->device);
+ u32 ioasid = le32_to_cpu(attach->address_space);
+
+ device = iommu_get_device(device_id);
+ if (IS_ERR_OR_NULL(device)) {
+ pr_err("could not find device %#x", device_id);
+ return -ENODEV;
+ }
+
+ pr_debug("attaching device %#x to IOAS %u", device_id, ioasid);
+
+ vdev = device->iommu_data;
+ if (!vdev) {
+ vdev = viommu_alloc_device(device);
+ if (!vdev)
+ return -ENOMEM;
+ }
+
+ ioas = viommu_find_ioas(viommu, ioasid);
+ if (!ioas) {
+ ioas = viommu_alloc_ioas(viommu, device, ioasid);
+ if (!ioas)
+ return -ENOMEM;
+ } else if (ioas->ops->map != device->iommu_ops->map ||
+ ioas->ops->unmap != device->iommu_ops->unmap) {
+ return -EINVAL;
+ }
+
+ if (vdev->ioas) {
+ ret = viommu_detach_device(viommu, vdev);
+ if (ret)
+ return ret;
+ }
+
+ ret = device->iommu_ops->attach(ioas->priv, device, 0);
+ if (!ret)
+ ret = viommu_ioas_add_device(ioas, vdev);
+
+ if (ret && ioas->nr_devices == 0)
+ viommu_free_ioas(viommu, ioas);
+
+ return ret;
+}
+
+static int viommu_handle_detach(struct viommu_dev *viommu,
+ struct virtio_iommu_req_detach *detach)
+{
+ struct device_header *device;
+ struct viommu_endpoint *vdev;
+
+ u32 device_id = le32_to_cpu(detach->device);
+
+ device = iommu_get_device(device_id);
+ if (IS_ERR_OR_NULL(device)) {
+ pr_err("could not find device %#x", device_id);
+ return -ENODEV;
+ }
+
+ vdev = device->iommu_data;
+ if (!vdev)
+ return -ENODEV;
+
+ return viommu_detach_device(viommu, vdev);
+}
+
+static int viommu_handle_map(struct viommu_dev *viommu,
+ struct virtio_iommu_req_map *map)
+{
+ int prot = 0;
+ struct viommu_ioas *ioas;
+
+ u32 ioasid = le32_to_cpu(map->address_space);
+ u64 virt_addr = le64_to_cpu(map->virt_addr);
+ u64 phys_addr = le64_to_cpu(map->phys_addr);
+ u64 size = le64_to_cpu(map->size);
+	u32 flags = le32_to_cpu(map->flags);
+
+ ioas = viommu_find_ioas(viommu, ioasid);
+ if (!ioas) {
+ pr_err("could not find address space %u", ioasid);
+ return -ESRCH;
+ }
+
+ if (flags & ~VIRTIO_IOMMU_MAP_F_MASK)
+ return -EINVAL;
+
+ if (flags & VIRTIO_IOMMU_MAP_F_READ)
+ prot |= IOMMU_PROT_READ;
+
+ if (flags & VIRTIO_IOMMU_MAP_F_WRITE)
+ prot |= IOMMU_PROT_WRITE;
+
+ if (flags & VIRTIO_IOMMU_MAP_F_EXEC)
+ prot |= IOMMU_PROT_EXEC;
+
+ pr_debug("map %#llx -> %#llx (%llu) to IOAS %u", virt_addr,
+ phys_addr, size, ioasid);
+
+ return ioas->ops->map(ioas->priv, virt_addr, phys_addr, size, prot);
+}
+
+static int viommu_handle_unmap(struct viommu_dev *viommu,
+ struct virtio_iommu_req_unmap *unmap)
+{
+ struct viommu_ioas *ioas;
+
+ u32 ioasid = le32_to_cpu(unmap->address_space);
+ u64 virt_addr = le64_to_cpu(unmap->virt_addr);
+ u64 size = le64_to_cpu(unmap->size);
+
+ ioas = viommu_find_ioas(viommu, ioasid);
+ if (!ioas) {
+ pr_err("could not find address space %u", ioasid);
+ return -ESRCH;
+ }
+
+ pr_debug("unmap %#llx (%llu) from IOAS %u", virt_addr, size,
+ ioasid);
+
+ return ioas->ops->unmap(ioas->priv, virt_addr, size, 0);
+}
+
+static size_t viommu_get_req_len(union virtio_iommu_req *req)
+{
+ switch (req->head.type) {
+ case VIRTIO_IOMMU_T_ATTACH:
+ return sizeof(req->attach);
+ case VIRTIO_IOMMU_T_DETACH:
+ return sizeof(req->detach);
+ case VIRTIO_IOMMU_T_MAP:
+ return sizeof(req->map);
+ case VIRTIO_IOMMU_T_UNMAP:
+ return sizeof(req->unmap);
+ default:
+ pr_err("unknown request type %x", req->head.type);
+ return 0;
+ }
+}
+
+static int viommu_errno_to_status(int err)
+{
+ switch (err) {
+ case 0:
+ return VIRTIO_IOMMU_S_OK;
+ case EIO:
+ return VIRTIO_IOMMU_S_IOERR;
+ case ENOSYS:
+ return VIRTIO_IOMMU_S_UNSUPP;
+ case ERANGE:
+ return VIRTIO_IOMMU_S_RANGE;
+ case EFAULT:
+ return VIRTIO_IOMMU_S_FAULT;
+ case EINVAL:
+ return VIRTIO_IOMMU_S_INVAL;
+ case ENOENT:
+ case ENODEV:
+ case ESRCH:
+ return VIRTIO_IOMMU_S_NOENT;
+ case ENOMEM:
+ case ENOSPC:
+ default:
+ return VIRTIO_IOMMU_S_DEVERR;
+ }
+}
+
+static ssize_t viommu_dispatch_commands(struct viommu_dev *viommu,
+ struct iovec *iov, int nr_in, int nr_out)
+{
+ u32 op;
+ int i, ret;
+ ssize_t written_len = 0;
+ size_t len, expected_len;
+ union virtio_iommu_req *req;
+ struct virtio_iommu_req_tail *tail;
+
+ /*
+ * Are we picking up in the middle of a request buffer? Keep a running
+ * count.
+ *
+ * Here we assume that a request is always made of two descriptors, a
+ * head and a tail. TODO: get rid of framing assumptions by keeping
+ * track of request fragments.
+ */
+ static bool is_head = true;
+ static int cur_status = 0;
+
+ for (i = 0; i < nr_in + nr_out; i++, is_head = !is_head) {
+ len = iov[i].iov_len;
+ if (is_head && len < sizeof(req->head)) {
+ pr_err("invalid command length (%zu)", len);
+ cur_status = EIO;
+ continue;
+ } else if (!is_head && len < sizeof(*tail)) {
+ pr_err("invalid tail length (%zu)", len);
+ cur_status = 0;
+ continue;
+ }
+
+ if (!is_head) {
+ int status = viommu_errno_to_status(cur_status);
+
+ tail = iov[i].iov_base;
+ tail->status = cpu_to_le32(status);
+ written_len += sizeof(tail->status);
+ cur_status = 0;
+ continue;
+ }
+
+ req = iov[i].iov_base;
+ op = req->head.type;
+ expected_len = viommu_get_req_len(req) - sizeof(*tail);
+ if (expected_len != len) {
+ pr_err("invalid command %x length (%zu != %zu)", op,
+ len, expected_len);
+ cur_status = EIO;
+ continue;
+ }
+
+ switch (op) {
+ case VIRTIO_IOMMU_T_ATTACH:
+ ret = viommu_handle_attach(viommu, &req->attach);
+ break;
+
+ case VIRTIO_IOMMU_T_DETACH:
+ ret = viommu_handle_detach(viommu, &req->detach);
+ break;
+
+ case VIRTIO_IOMMU_T_MAP:
+ ret = viommu_handle_map(viommu, &req->map);
+ break;
+
+ case VIRTIO_IOMMU_T_UNMAP:
+ ret = viommu_handle_unmap(viommu, &req->unmap);
+ break;
+
+ default:
+ pr_err("unhandled command %x", op);
+ ret = -ENOSYS;
+ }
+
+ if (ret)
+ cur_status = -ret;
+ }
+
+ return written_len;
+}
+
+static void viommu_command(struct kvm *kvm, void *dev)
+{
+ int len;
+ u16 head;
+ u16 out, in;
+
+ struct virt_queue *vq;
+ struct viommu_dev *viommu = dev;
+ struct iovec iov[VIOMMU_DEFAULT_QUEUE_SIZE];
+
+ vq = &viommu->vq;
+
+ while (virt_queue__available(vq)) {
+ head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
+
+ len = viommu_dispatch_commands(viommu, iov, in, out);
+ if (len < 0) {
+ /* Critical error, abort everything */
+ pr_err("failed to dispatch viommu command");
+ return;
+ }
+
+ virt_queue__set_used_elem(vq, head, len);
+ }
+
+ if (virtio_queue__should_signal(vq))
+ viommu->vdev.ops->signal_vq(kvm, &viommu->vdev, 0);
+}
+
+/* Virtio API */
+static u8 *viommu_get_config(struct kvm *kvm, void *dev)
+{
+ struct viommu_dev *viommu = dev;
+
+ return (u8 *)&viommu->config;
+}
+
+static u32 viommu_get_host_features(struct kvm *kvm, void *dev)
+{
+ return 1ULL << VIRTIO_RING_F_EVENT_IDX
+ | 1ULL << VIRTIO_RING_F_INDIRECT_DESC
+ | 1ULL << VIRTIO_IOMMU_F_INPUT_RANGE;
+}
+
+static void viommu_set_guest_features(struct kvm *kvm, void *dev, u32 features)
+{
+}
+
+static int viommu_init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size,
+ u32 align, u32 pfn)
+{
+ void *ptr;
+ struct virt_queue *queue;
+ struct viommu_dev *viommu = dev;
+
+ if (vq != 0)
+ return -ENODEV;
+
+ compat__remove_message(compat_id);
+
+ queue = &viommu->vq;
+ queue->pfn = pfn;
+ ptr = virtio_get_vq(kvm, queue->pfn, page_size);
+
+ vring_init(&queue->vring, viommu->queue_size, ptr, align);
+ virtio_init_device_vq(&viommu->vdev, queue);
+
+ thread_pool__init_job(&viommu->job, kvm, viommu_command, viommu);
+
+ return 0;
+}
+
+static int viommu_get_pfn_vq(struct kvm *kvm, void *dev, u32 vq)
+{
+ struct viommu_dev *viommu = dev;
+
+ return viommu->vq.pfn;
+}
+
+static int viommu_get_size_vq(struct kvm *kvm, void *dev, u32 vq)
+{
+ struct viommu_dev *viommu = dev;
+
+ return viommu->queue_size;
+}
+
+static int viommu_set_size_vq(struct kvm *kvm, void *dev, u32 vq, int size)
+{
+ struct viommu_dev *viommu = dev;
+
+ if (viommu->vq.pfn)
+ /* Already init, can't resize */
+ return viommu->queue_size;
+
+ viommu->queue_size = size;
+
+ return size;
+}
+
+static int viommu_notify_vq(struct kvm *kvm, void *dev, u32 vq)
+{
+ struct viommu_dev *viommu = dev;
+
+ thread_pool__do_job(&viommu->job);
+
+ return 0;
+}
+
+static void viommu_notify_vq_gsi(struct kvm *kvm, void *dev, u32 vq, u32 gsi)
+{
+ /* TODO: when implementing vhost */
+}
+
+static void viommu_notify_vq_eventfd(struct kvm *kvm, void *dev, u32 vq, u32 fd)
+{
+ /* TODO: when implementing vhost */
+}
+
+static struct virtio_ops iommu_dev_virtio_ops = {
+ .get_config = viommu_get_config,
+ .get_host_features = viommu_get_host_features,
+ .set_guest_features = viommu_set_guest_features,
+ .init_vq = viommu_init_vq,
+ .get_pfn_vq = viommu_get_pfn_vq,
+ .get_size_vq = viommu_get_size_vq,
+ .set_size_vq = viommu_set_size_vq,
+ .notify_vq = viommu_notify_vq,
+ .notify_vq_gsi = viommu_notify_vq_gsi,
+ .notify_vq_eventfd = viommu_notify_vq_eventfd,
+};
+
+const struct iommu_properties *viommu_get_properties(void *dev)
+{
+ struct viommu_dev *viommu = dev;
+
+ return viommu->properties;
+}
+
+void *viommu_register(struct kvm *kvm, struct iommu_properties *props)
+{
+ struct viommu_dev *viommu;
+ u64 pgsize_mask = ~(PAGE_SIZE - 1);
+
+ if (!kvm->cfg.viommu)
+ return NULL;
+
+ props->phandle = fdt_alloc_phandle();
+
+ viommu = calloc(1, sizeof(struct viommu_dev));
+ if (!viommu)
+ return NULL;
+
+ viommu->queue_size = VIOMMU_DEFAULT_QUEUE_SIZE;
+ viommu->address_spaces = (struct rb_root)RB_ROOT;
+ viommu->properties = props;
+
+ viommu->config.page_sizes = props->pgsize_mask ?: pgsize_mask;
+ viommu->config.input_range.end = props->input_addr_size % BITS_PER_LONG ?
+ (1UL << props->input_addr_size) - 1 :
+ -1UL;
+
+ if (virtio_init(kvm, viommu, &viommu->vdev, &iommu_dev_virtio_ops,
+ VIRTIO_MMIO, 0, VIRTIO_ID_IOMMU, 0)) {
+ free(viommu);
+ return NULL;
+ }
+
+ pr_info("Loaded virtual IOMMU %s", props->name);
+
+ if (compat_id == -1)
+ compat_id = virtio_compat_add_message("virtio-iommu",
+ "CONFIG_VIRTIO_IOMMU");
+
+ return viommu;
+}
+
+void viommu_unregister(struct kvm *kvm, void *viommu)
+{
+ free(viommu);
+}
diff --git a/virtio/mmio.c b/virtio/mmio.c
index f0af4bd1..b3dea51a 100644
--- a/virtio/mmio.c
+++ b/virtio/mmio.c
@@ -1,14 +1,17 @@
#include "kvm/devices.h"
#include "kvm/virtio-mmio.h"
#include "kvm/ioeventfd.h"
+#include "kvm/iommu.h"
#include "kvm/ioport.h"
#include "kvm/virtio.h"
+#include "kvm/virtio-iommu.h"
#include "kvm/kvm.h"
#include "kvm/kvm-cpu.h"
#include "kvm/irq.h"
#include "kvm/fdt.h"

#include <linux/virtio_mmio.h>
+#include <linux/virtio_ids.h>
#include <string.h>

static u32 virtio_mmio_io_space_blocks = KVM_VIRTIO_MMIO_AREA;
@@ -237,6 +240,7 @@ void generate_virtio_mmio_fdt_node(void *fdt,
u8 irq,
enum irq_type))
{
+ const struct iommu_properties *props;
char dev_name[DEVICE_NAME_MAX_LEN];
struct virtio_mmio *vmmio = container_of(dev_hdr,
struct virtio_mmio,
@@ -254,6 +258,13 @@ void generate_virtio_mmio_fdt_node(void *fdt,
_FDT(fdt_property(fdt, "reg", reg_prop, sizeof(reg_prop)));
_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
generate_irq_prop(fdt, vmmio->irq, IRQ_TYPE_EDGE_RISING);
+
+ if (vmmio->hdr.device_id == VIRTIO_ID_IOMMU) {
+ props = viommu_get_properties(vmmio->dev);
+ _FDT(fdt_property_cell(fdt, "phandle", props->phandle));
+ _FDT(fdt_property_cell(fdt, "#iommu-cells", 1));
+ }
+
_FDT(fdt_end_node(fdt));
}
#else
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:44 UTC
Permalink
Add an rb-tree-based IOMMU with support for map, unmap and access
operations. It will be used to store mappings for virtio devices and MSI
doorbells. If needed, it could also be extended with a TLB implementation.
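For readers unfamiliar with the interval-based approach, the core idea can
be sketched in a few lines of standalone C. This uses a plain linked list
instead of kvmtool's rb-tree, and the demo_* names are made up for
illustration; it only shows the map/translate contract, not the real code:

```c
#include <stdint.h>
#include <stdlib.h>

/* One IOVA range mapped to a physical base, like struct iommu_mapping */
struct demo_mapping {
	uint64_t low, high;	/* inclusive IOVA interval */
	uint64_t phys;		/* physical base of the range */
	struct demo_mapping *next;
};

/* Record a mapping of @size bytes from @iova to @phys */
static struct demo_mapping *demo_map(struct demo_mapping *head,
				     uint64_t iova, uint64_t phys,
				     uint64_t size)
{
	struct demo_mapping *m = malloc(sizeof(*m));

	if (!m)
		return head;

	m->low = iova;
	m->high = iova + size - 1;
	m->phys = phys;
	m->next = head;
	return m;
}

/* Translate @iova into a physical address; return 0 on fault */
static uint64_t demo_translate(struct demo_mapping *head, uint64_t iova)
{
	for (; head; head = head->next)
		if (iova >= head->low && iova <= head->high)
			return head->phys + (iova - head->low);
	return 0;
}
```

The rb-tree in the patch replaces the linear walk with an O(log n)
interval lookup, but the translation arithmetic is the same.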

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
Makefile | 1 +
include/kvm/iommu.h | 9 +++
iommu.c | 162 ++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 172 insertions(+)
create mode 100644 iommu.c

diff --git a/Makefile b/Makefile
index 67953870..0c369206 100644
--- a/Makefile
+++ b/Makefile
@@ -73,6 +73,7 @@ OBJS += disk/blk.o
OBJS += disk/qcow.o
OBJS += disk/raw.o
OBJS += ioeventfd.o
+OBJS += iommu.o
OBJS += net/uip/core.o
OBJS += net/uip/arp.o
OBJS += net/uip/icmp.o
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index 925e1993..4164ba20 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -61,4 +61,13 @@ static inline struct device_header *iommu_get_device(u32 device_id)
return device__find_dev(bus, dev_num);
}

+void *iommu_alloc_address_space(struct device_header *dev);
+void iommu_free_address_space(void *address_space);
+
+int iommu_map(void *address_space, u64 virt_addr, u64 phys_addr, u64 size,
+ int prot);
+int iommu_unmap(void *address_space, u64 virt_addr, u64 size, int flags);
+u64 iommu_access(void *address_space, u64 addr, size_t size, size_t *out_size,
+ int prot);
+
#endif /* KVM_IOMMU_H */
diff --git a/iommu.c b/iommu.c
new file mode 100644
index 00000000..0a662404
--- /dev/null
+++ b/iommu.c
@@ -0,0 +1,162 @@
+/*
+ * Implement basic IOMMU operations - map, unmap and translate
+ */
+#include <errno.h>
+
+#include "kvm/iommu.h"
+#include "kvm/kvm.h"
+#include "kvm/mutex.h"
+#include "kvm/rbtree-interval.h"
+
+struct iommu_mapping {
+ struct rb_int_node iova_range;
+ u64 phys;
+ int prot;
+};
+
+struct iommu_ioas {
+ struct rb_root mappings;
+ struct mutex mutex;
+};
+
+void *iommu_alloc_address_space(struct device_header *unused)
+{
+ struct iommu_ioas *ioas = calloc(1, sizeof(*ioas));
+
+ if (!ioas)
+ return NULL;
+
+ ioas->mappings = (struct rb_root)RB_ROOT;
+ mutex_init(&ioas->mutex);
+
+ return ioas;
+}
+
+void iommu_free_address_space(void *address_space)
+{
+ struct iommu_ioas *ioas = address_space;
+ struct rb_int_node *int_node;
+ struct rb_node *node, *next;
+ struct iommu_mapping *map;
+
+	/* Postorder traversal frees leaves before their parents. */
+ node = rb_first_postorder(&ioas->mappings);
+ while (node) {
+ next = rb_next_postorder(node);
+
+ int_node = rb_int(node);
+ map = container_of(int_node, struct iommu_mapping, iova_range);
+ free(map);
+
+ node = next;
+ }
+
+ free(ioas);
+}
+
+int iommu_map(void *address_space, u64 virt_addr, u64 phys_addr,
+ u64 size, int prot)
+{
+ struct iommu_ioas *ioas = address_space;
+ struct iommu_mapping *map;
+
+ if (!ioas)
+ return -ENODEV;
+
+ map = malloc(sizeof(struct iommu_mapping));
+ if (!map)
+ return -ENOMEM;
+
+ map->phys = phys_addr;
+ map->iova_range = RB_INT_INIT(virt_addr, virt_addr + size - 1);
+ map->prot = prot;
+
+ mutex_lock(&ioas->mutex);
+ rb_int_insert(&ioas->mappings, &map->iova_range);
+ mutex_unlock(&ioas->mutex);
+
+ return 0;
+}
+
+int iommu_unmap(void *address_space, u64 virt_addr, u64 size, int flags)
+{
+ int ret = 0;
+ struct rb_int_node *node;
+ struct iommu_mapping *map;
+ struct iommu_ioas *ioas = address_space;
+
+ if (!ioas)
+ return -ENODEV;
+
+ mutex_lock(&ioas->mutex);
+ node = rb_int_search_single(&ioas->mappings, virt_addr);
+ while (node && size) {
+ struct rb_node *next = rb_next(&node->node);
+ size_t node_size = node->high - node->low + 1;
+ map = container_of(node, struct iommu_mapping, iova_range);
+
+ if (node_size > size) {
+ pr_debug("cannot split mapping");
+ ret = -EINVAL;
+ break;
+ }
+
+ size -= node_size;
+ virt_addr += node_size;
+
+ rb_erase(&node->node, &ioas->mappings);
+ free(map);
+ node = next ? container_of(next, struct rb_int_node, node) : NULL;
+ }
+
+ if (size && !ret) {
+ pr_debug("mapping not found");
+ ret = -ENXIO;
+ }
+ mutex_unlock(&ioas->mutex);
+
+ return ret;
+}
+
+/*
+ * Translate a virtual address into a physical one. Perform an access of @size
+ * bytes with protection @prot. If @addr isn't mapped in @address_space, return
+ * 0. If the permissions of the mapping don't match, return 0. If the access
+ * range specified by (addr, size) spans over multiple mappings, only access the
+ * first mapping and return the accessed size in @out_size. It is up to the
+ * caller to complete the access by calling the function again on the remaining
+ * range. Subsequent accesses are not guaranteed to succeed.
+ */
+u64 iommu_access(void *address_space, u64 addr, size_t size, size_t *out_size,
+ int prot)
+{
+ struct iommu_ioas *ioas = address_space;
+ struct iommu_mapping *map;
+ struct rb_int_node *node;
+ u64 out_addr = 0;
+
+ mutex_lock(&ioas->mutex);
+ node = rb_int_search_single(&ioas->mappings, addr);
+ if (!node) {
+ pr_err("fault at IOVA %#llx %zu", addr, size);
+ errno = EFAULT;
+		goto out_unlock; /* SEGV incoming */
+ }
+
+ map = container_of(node, struct iommu_mapping, iova_range);
+ if (prot & ~map->prot) {
+ pr_err("permission fault at IOVA %#llx", addr);
+ errno = EPERM;
+ goto out_unlock;
+ }
+
+ out_addr = map->phys + (addr - node->low);
+ *out_size = min_t(size_t, node->high - addr + 1, size);
+
+ pr_debug("access %llx %zu/%zu %x -> %#llx", addr, *out_size, size,
+ prot, out_addr);
+out_unlock:
+ mutex_unlock(&ioas->mutex);
+
+ return out_addr;
+}
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:45 UTC
Permalink
Add an "iommu-map" property to the PCI host controller, describing which
iommus translate which devices. We describe individual devices in
iommu-map, not ranges. This patch is incompatible with current mainline
Linux, which requires *all* devices under a host controller to be
described by the iommu-map property when present. Unfortunately all PCI
devices in kvmtool are under the same root complex, and we have to omit
RIDs of devices that aren't behind the virtual IOMMU from iommu-map. Fixing
this requires either a simple patch in Linux or implementing multiple host
controllers in kvmtool.

Add an "iommus" property to platform devices that are behind an IOMMU.
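The iommu-map entries built in arm/pci.c encode PCI requester IDs. Since
all kvmtool devices sit on bus 0, the RID is just the devfn, so rid_base is
dev_num << 3 and a length of 1 << 3 covers the eight functions of one slot.
The arithmetic, as a standalone sketch (demo_rid is an illustrative name):

```c
#include <stdint.h>

/* PCI requester ID layout: bus[15:8] | device[7:3] | function[2:0] */
static uint32_t demo_rid(uint32_t bus, uint32_t dev, uint32_t fn)
{
	return (bus << 8) | (dev << 3) | fn;
}
```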

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
arm/pci.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
fdt.c | 20 ++++++++++++++++++++
include/kvm/fdt.h | 7 +++++++
virtio/mmio.c | 1 +
4 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/arm/pci.c b/arm/pci.c
index 557cfa98..968cbf5b 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -1,9 +1,11 @@
#include "kvm/devices.h"
#include "kvm/fdt.h"
+#include "kvm/iommu.h"
#include "kvm/kvm.h"
#include "kvm/of_pci.h"
#include "kvm/pci.h"
#include "kvm/util.h"
+#include "kvm/virtio-iommu.h"

#include "arm-common/pci.h"

@@ -24,11 +26,20 @@ struct of_interrupt_map_entry {
struct of_gic_irq gic_irq;
} __attribute__((packed));

+struct of_iommu_map_entry {
+ u32 rid_base;
+ u32 iommu_phandle;
+ u32 iommu_base;
+ u32 length;
+} __attribute__((packed));
+
void pci__generate_fdt_nodes(void *fdt)
{
struct device_header *dev_hdr;
struct of_interrupt_map_entry irq_map[OF_PCI_IRQ_MAP_MAX];
- unsigned nentries = 0;
+ struct of_iommu_map_entry *iommu_map;
+ unsigned nentries = 0, ntranslated = 0;
+ unsigned i;
/* Bus range */
u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(1), };
/* Configuration Space */
@@ -99,6 +110,9 @@ void pci__generate_fdt_nodes(void *fdt)
},
};

+ if (dev_hdr->iommu_ops)
+ ntranslated++;
+
nentries++;
dev_hdr = device__next_dev(dev_hdr);
}
@@ -121,5 +135,38 @@ void pci__generate_fdt_nodes(void *fdt)
sizeof(irq_mask)));
}

+ if (ntranslated) {
+ const struct iommu_properties *props;
+
+ iommu_map = malloc(ntranslated * sizeof(struct of_iommu_map_entry));
+ if (!iommu_map) {
+ pr_err("cannot allocate iommu_map.");
+ return;
+ }
+
+ dev_hdr = device__first_dev(DEVICE_BUS_PCI);
+ for (i = 0; i < ntranslated; dev_hdr = device__next_dev(dev_hdr)) {
+ struct of_iommu_map_entry *entry = &iommu_map[i];
+
+ if (!dev_hdr->iommu_ops)
+ continue;
+
+ props = dev_hdr->iommu_ops->get_properties(dev_hdr);
+
+ *entry = (struct of_iommu_map_entry) {
+ .rid_base = cpu_to_fdt32(dev_hdr->dev_num << 3),
+ .iommu_phandle = cpu_to_fdt32(props->phandle),
+ .iommu_base = cpu_to_fdt32(device_to_iommu_id(dev_hdr)),
+ .length = cpu_to_fdt32(1 << 3),
+ };
+
+ i++;
+ }
+
+ _FDT(fdt_property(fdt, "iommu-map", iommu_map,
+ ntranslated * sizeof(struct of_iommu_map_entry)));
+ free(iommu_map);
+ }
+
_FDT(fdt_end_node(fdt));
}
diff --git a/fdt.c b/fdt.c
index 6db03d4e..15d7bb29 100644
--- a/fdt.c
+++ b/fdt.c
@@ -2,7 +2,10 @@
* Commonly used FDT functions.
*/

+#include "kvm/devices.h"
#include "kvm/fdt.h"
+#include "kvm/iommu.h"
+#include "kvm/util.h"

static u32 next_phandle = PHANDLE_RESERVED;

@@ -13,3 +16,20 @@ u32 fdt_alloc_phandle(void)

return next_phandle++;
}
+
+void fdt_generate_iommus_prop(void *fdt, struct device_header *dev_hdr)
+{
+ const struct iommu_properties *props;
+
+ if (!dev_hdr->iommu_ops)
+ return;
+
+ props = dev_hdr->iommu_ops->get_properties(dev_hdr);
+
+ u32 iommus[] = {
+ cpu_to_fdt32(props->phandle),
+ cpu_to_fdt32(device_to_iommu_id(dev_hdr)),
+ };
+
+ _FDT(fdt_property(fdt, "iommus", iommus, sizeof(iommus)));
+}
diff --git a/include/kvm/fdt.h b/include/kvm/fdt.h
index 503887f9..c64fe8a3 100644
--- a/include/kvm/fdt.h
+++ b/include/kvm/fdt.h
@@ -37,7 +37,10 @@ enum irq_type {

#ifdef CONFIG_HAS_LIBFDT

+struct device_header;
+
u32 fdt_alloc_phandle(void);
+void fdt_generate_iommus_prop(void *fdt, struct device_header *dev);

#else

@@ -46,6 +49,10 @@ static inline u32 fdt_alloc_phandle(void)
return PHANDLE_RESERVED;
}

+static inline void fdt_generate_iommus_prop(void *fdt, struct device_header *dev)
+{
+}
+
#endif /* CONFIG_HAS_LIBFDT */

#endif /* KVM__FDT_H */
diff --git a/virtio/mmio.c b/virtio/mmio.c
index b3dea51a..16b44fbb 100644
--- a/virtio/mmio.c
+++ b/virtio/mmio.c
@@ -258,6 +258,7 @@ void generate_virtio_mmio_fdt_node(void *fdt,
_FDT(fdt_property(fdt, "reg", reg_prop, sizeof(reg_prop)));
_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
generate_irq_prop(fdt, vmmio->irq, IRQ_TYPE_EDGE_RISING);
+ fdt_generate_iommus_prop(fdt, dev_hdr);

if (vmmio->hdr.device_id == VIRTIO_ID_IOMMU) {
props = viommu_get_properties(vmmio->dev);
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:46 UTC
Permalink
For passed-through devices behind a vIOMMU, we'll need to translate writes
to MSI vectors. Let the IRQ code register MSI doorbells, and add a simple
way for other systems to check if an address is a doorbell.
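The doorbell check is a simple interval-containment test over the
registered regions. A minimal standalone version (an array instead of
kvmtool's linked list; names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* One registered doorbell region, like irq__add_msi_doorbell() records */
struct demo_doorbell {
	uint64_t start, end;	/* inclusive range */
};

/* Return true if @addr falls within any registered doorbell region */
static bool demo_is_doorbell(const struct demo_doorbell *db, int n,
			     uint64_t addr)
{
	for (int i = 0; i < n; i++)
		if (addr >= db[i].start && addr <= db[i].end)
			return true;
	return false;
}
```

With the end stored as start + size - 1, as in the patch, both boundary
addresses of a region are correctly treated as doorbell writes.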

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
arm/gic.c | 4 ++++
include/kvm/irq.h | 3 +++
irq.c | 35 +++++++++++++++++++++++++++++++++++
3 files changed, 42 insertions(+)

diff --git a/arm/gic.c b/arm/gic.c
index bf7a22a9..c708031e 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -108,6 +108,10 @@ static int gic__create_its_frame(struct kvm *kvm, u64 its_frame_addr)
};
int err;

+ err = irq__add_msi_doorbell(kvm, its_frame_addr, KVM_VGIC_V3_ITS_SIZE);
+ if (err)
+ return err;
+
err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &its_device);
if (err) {
fprintf(stderr,
diff --git a/include/kvm/irq.h b/include/kvm/irq.h
index a188a870..2a59257e 100644
--- a/include/kvm/irq.h
+++ b/include/kvm/irq.h
@@ -24,6 +24,9 @@ int irq__allocate_routing_entry(void);
int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg, u32 device_id);
void irq__update_msix_route(struct kvm *kvm, u32 gsi, struct msi_msg *msg);

+int irq__add_msi_doorbell(struct kvm *kvm, u64 addr, u64 size);
+bool irq__addr_is_msi_doorbell(struct kvm *kvm, u64 addr);
+
/*
* The function takes two eventfd arguments, trigger_fd and resample_fd. If
* resample_fd is <= 0, resampling is disabled and the IRQ is edge-triggered
diff --git a/irq.c b/irq.c
index a4ef75e4..a04f4d37 100644
--- a/irq.c
+++ b/irq.c
@@ -8,6 +8,14 @@
#include "kvm/irq.h"
#include "kvm/kvm-arch.h"

+struct kvm_msi_doorbell_region {
+ u64 start;
+ u64 end;
+ struct list_head head;
+};
+
+static LIST_HEAD(msi_doorbells);
+
static u8 next_line = KVM_IRQ_OFFSET;
static int allocated_gsis = 0;

@@ -147,6 +155,33 @@ void irq__update_msix_route(struct kvm *kvm, u32 gsi, struct msi_msg *msg)
die_perror("KVM_SET_GSI_ROUTING");
}

+int irq__add_msi_doorbell(struct kvm *kvm, u64 addr, u64 size)
+{
+ struct kvm_msi_doorbell_region *doorbell = malloc(sizeof(*doorbell));
+
+ if (!doorbell)
+ return -ENOMEM;
+
+ doorbell->start = addr;
+ doorbell->end = addr + size - 1;
+
+ list_add(&doorbell->head, &msi_doorbells);
+
+ return 0;
+}
+
+bool irq__addr_is_msi_doorbell(struct kvm *kvm, u64 addr)
+{
+ struct kvm_msi_doorbell_region *doorbell;
+
+ list_for_each_entry(doorbell, &msi_doorbells, head) {
+ if (addr >= doorbell->start && addr <= doorbell->end)
+ return true;
+ }
+
+ return false;
+}
+
int irq__common_add_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd,
int resample_fd)
{
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:47 UTC
Permalink
All virtio devices perform the same few operations when initializing
their virtqueues. Move these operations into the virtio core, since vring
initialization will become more complex when we implement a virtual IOMMU.
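The duplicated step being factored out is mostly the placement of the
legacy vring: the guest programs a page frame number, and the host derives
the ring's guest-physical address from it. A one-line sketch of that
computation (demo_vring_addr is a made-up name):

```c
#include <stdint.h>

/* Guest-physical address of a legacy vring, from its page frame number */
static uint64_t demo_vring_addr(uint32_t pfn, uint32_t page_size)
{
	return (uint64_t)pfn * page_size;
}
```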

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
include/kvm/virtio.h | 16 +++++++++-------
virtio/9p.c | 7 ++-----
virtio/balloon.c | 7 +++----
virtio/blk.c | 10 ++--------
virtio/console.c | 7 ++-----
virtio/iommu.c | 10 ++--------
virtio/net.c | 8 ++------
virtio/rng.c | 6 ++----
virtio/scsi.c | 6 ++----
9 files changed, 26 insertions(+), 51 deletions(-)

diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index 00a791ac..24c0c487 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -169,15 +169,17 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
int virtio_compat_add_message(const char *device, const char *config);
const char* virtio_trans_name(enum virtio_trans trans);

-static inline void *virtio_get_vq(struct kvm *kvm, u32 pfn, u32 page_size)
+static inline void virtio_init_device_vq(struct kvm *kvm,
+ struct virtio_device *vdev,
+ struct virt_queue *vq, size_t nr_descs,
+ u32 page_size, u32 align, u32 pfn)
{
- return guest_flat_to_host(kvm, (u64)pfn * page_size);
-}
+ void *p = guest_flat_to_host(kvm, (u64)pfn * page_size);

-static inline void virtio_init_device_vq(struct virtio_device *vdev,
- struct virt_queue *vq)
-{
- vq->endian = vdev->endian;
+ vq->endian = vdev->endian;
+ vq->pfn = pfn;
+
+ vring_init(&vq->vring, nr_descs, p, align);
}

#endif /* KVM__VIRTIO_H */
diff --git a/virtio/9p.c b/virtio/9p.c
index 69fdc4be..acd09bdd 100644
--- a/virtio/9p.c
+++ b/virtio/9p.c
@@ -1388,17 +1388,14 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
struct p9_dev *p9dev = dev;
struct p9_dev_job *job;
struct virt_queue *queue;
- void *p;

compat__remove_message(compat_id);

queue = &p9dev->vqs[vq];
- queue->pfn = pfn;
- p = virtio_get_vq(kvm, queue->pfn, page_size);
job = &p9dev->jobs[vq];

- vring_init(&queue->vring, VIRTQUEUE_NUM, p, align);
- virtio_init_device_vq(&p9dev->vdev, queue);
+ virtio_init_device_vq(kvm, &p9dev->vdev, queue, VIRTQUEUE_NUM,
+ page_size, align, pfn);

*job = (struct p9_dev_job) {
.vq = queue,
diff --git a/virtio/balloon.c b/virtio/balloon.c
index 9564aa39..9182cae6 100644
--- a/virtio/balloon.c
+++ b/virtio/balloon.c
@@ -198,16 +198,15 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
{
struct bln_dev *bdev = dev;
struct virt_queue *queue;
- void *p;

compat__remove_message(compat_id);

queue = &bdev->vqs[vq];
- queue->pfn = pfn;
- p = virtio_get_vq(kvm, queue->pfn, page_size);
+
+ virtio_init_device_vq(kvm, &bdev->vdev, queue, VIRTIO_BLN_QUEUE_SIZE,
+ page_size, align, pfn);

thread_pool__init_job(&bdev->jobs[vq], kvm, virtio_bln_do_io, queue);
- vring_init(&queue->vring, VIRTIO_BLN_QUEUE_SIZE, p, align);

return 0;
}
diff --git a/virtio/blk.c b/virtio/blk.c
index c485e4fc..8c6e59ba 100644
--- a/virtio/blk.c
+++ b/virtio/blk.c
@@ -178,17 +178,11 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
u32 pfn)
{
struct blk_dev *bdev = dev;
- struct virt_queue *queue;
- void *p;

compat__remove_message(compat_id);

- queue = &bdev->vqs[vq];
- queue->pfn = pfn;
- p = virtio_get_vq(kvm, queue->pfn, page_size);
-
- vring_init(&queue->vring, VIRTIO_BLK_QUEUE_SIZE, p, align);
- virtio_init_device_vq(&bdev->vdev, queue);
+ virtio_init_device_vq(kvm, &bdev->vdev, &bdev->vqs[vq],
+ VIRTIO_BLK_QUEUE_SIZE, page_size, align, pfn);

return 0;
}
diff --git a/virtio/console.c b/virtio/console.c
index f1c0a190..610962c4 100644
--- a/virtio/console.c
+++ b/virtio/console.c
@@ -143,18 +143,15 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
u32 pfn)
{
struct virt_queue *queue;
- void *p;

BUG_ON(vq >= VIRTIO_CONSOLE_NUM_QUEUES);

compat__remove_message(compat_id);

queue = &cdev.vqs[vq];
- queue->pfn = pfn;
- p = virtio_get_vq(kvm, queue->pfn, page_size);

- vring_init(&queue->vring, VIRTIO_CONSOLE_QUEUE_SIZE, p, align);
- virtio_init_device_vq(&cdev.vdev, queue);
+ virtio_init_device_vq(kvm, &cdev.vdev, queue, VIRTIO_CONSOLE_QUEUE_SIZE,
+ page_size, align, pfn);

if (vq == VIRTIO_CONSOLE_TX_QUEUE) {
thread_pool__init_job(&cdev.jobs[vq], kvm, virtio_console_handle_callback, queue);
diff --git a/virtio/iommu.c b/virtio/iommu.c
index c72e7322..2e5a23ee 100644
--- a/virtio/iommu.c
+++ b/virtio/iommu.c
@@ -497,8 +497,6 @@ static void viommu_set_guest_features(struct kvm *kvm, void *dev, u32 features)
static int viommu_init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size,
u32 align, u32 pfn)
{
- void *ptr;
- struct virt_queue *queue;
struct viommu_dev *viommu = dev;

if (vq != 0)
@@ -506,12 +504,8 @@ static int viommu_init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size,

compat__remove_message(compat_id);

- queue = &viommu->vq;
- queue->pfn = pfn;
- ptr = virtio_get_vq(kvm, queue->pfn, page_size);
-
- vring_init(&queue->vring, viommu->queue_size, ptr, align);
- virtio_init_device_vq(&viommu->vdev, queue);
+ virtio_init_device_vq(kvm, &viommu->vdev, &viommu->vq,
+ viommu->queue_size, page_size, align, pfn);

thread_pool__init_job(&viommu->job, kvm, viommu_command, viommu);

diff --git a/virtio/net.c b/virtio/net.c
index 529b4111..957cca09 100644
--- a/virtio/net.c
+++ b/virtio/net.c
@@ -505,17 +505,13 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
struct vhost_vring_addr addr;
struct net_dev *ndev = dev;
struct virt_queue *queue;
- void *p;
int r;

compat__remove_message(compat_id);

queue = &ndev->vqs[vq];
- queue->pfn = pfn;
- p = virtio_get_vq(kvm, queue->pfn, page_size);
-
- vring_init(&queue->vring, VIRTIO_NET_QUEUE_SIZE, p, align);
- virtio_init_device_vq(&ndev->vdev, queue);
+ virtio_init_device_vq(kvm, &ndev->vdev, queue, VIRTIO_NET_QUEUE_SIZE,
+ page_size, align, pfn);

mutex_init(&ndev->io_lock[vq]);
pthread_cond_init(&ndev->io_cond[vq], NULL);
diff --git a/virtio/rng.c b/virtio/rng.c
index 9b9e1283..5f525540 100644
--- a/virtio/rng.c
+++ b/virtio/rng.c
@@ -92,17 +92,15 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
struct rng_dev *rdev = dev;
struct virt_queue *queue;
struct rng_dev_job *job;
- void *p;

compat__remove_message(compat_id);

queue = &rdev->vqs[vq];
- queue->pfn = pfn;
- p = virtio_get_vq(kvm, queue->pfn, page_size);

job = &rdev->jobs[vq];

- vring_init(&queue->vring, VIRTIO_RNG_QUEUE_SIZE, p, align);
+ virtio_init_device_vq(kvm, &rdev->vdev, queue, VIRTIO_RNG_QUEUE_SIZE,
+ page_size, align, pfn);

*job = (struct rng_dev_job) {
.vq = queue,
diff --git a/virtio/scsi.c b/virtio/scsi.c
index a429ac85..e0fd85f6 100644
--- a/virtio/scsi.c
+++ b/virtio/scsi.c
@@ -57,16 +57,14 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
struct vhost_vring_addr addr;
struct scsi_dev *sdev = dev;
struct virt_queue *queue;
- void *p;
int r;

compat__remove_message(compat_id);

queue = &sdev->vqs[vq];
- queue->pfn = pfn;
- p = virtio_get_vq(kvm, queue->pfn, page_size);

- vring_init(&queue->vring, VIRTIO_SCSI_QUEUE_SIZE, p, align);
+ virtio_init_device_vq(kvm, &sdev->vdev, queue, VIRTIO_SCSI_QUEUE_SIZE,
+ page_size, align, pfn);

if (sdev->vhost_fd == 0)
return 0;
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:48 UTC
Permalink
Virtio devices can now opt in to using an IOMMU, by setting the use_iommu
field. None of this will work in the current state, since virtio devices
still access memory linearly. A subsequent patch implements sg accesses.
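The 44-bit input address size chosen in virtio/core.c below follows from
legacy-MMIO vring addressing: the register holds a 32-bit PFN, which with
4k pages reaches (32 + 12) = 44 bits of guest address space. A quick check
of that arithmetic (demo_max_iova is an illustrative name):

```c
#include <stdint.h>

/* Highest address reachable through a 32-bit vring PFN */
static uint64_t demo_max_iova(uint32_t max_pfn, uint32_t page_shift)
{
	return (((uint64_t)max_pfn + 1) << page_shift) - 1;
}
```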

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
include/kvm/virtio-mmio.h | 1 +
include/kvm/virtio-pci.h | 1 +
include/kvm/virtio.h | 13 ++++++++++++
virtio/core.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++
virtio/mmio.c | 27 ++++++++++++++++++++++++
virtio/pci.c | 26 ++++++++++++++++++++++++
6 files changed, 120 insertions(+)

diff --git a/include/kvm/virtio-mmio.h b/include/kvm/virtio-mmio.h
index 835f421b..c25a4fd7 100644
--- a/include/kvm/virtio-mmio.h
+++ b/include/kvm/virtio-mmio.h
@@ -44,6 +44,7 @@ struct virtio_mmio_hdr {
struct virtio_mmio {
u32 addr;
void *dev;
+ struct virtio_device *vdev;
struct kvm *kvm;
u8 irq;
struct virtio_mmio_hdr hdr;
diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h
index b70cadd8..26772f74 100644
--- a/include/kvm/virtio-pci.h
+++ b/include/kvm/virtio-pci.h
@@ -22,6 +22,7 @@ struct virtio_pci {
struct pci_device_header pci_hdr;
struct device_header dev_hdr;
void *dev;
+ struct virtio_device *vdev;
struct kvm *kvm;

u16 port_addr;
diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index 24c0c487..9f2ff237 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -9,6 +9,7 @@
#include <linux/types.h>
#include <sys/uio.h>

+#include "kvm/iommu.h"
#include "kvm/kvm.h"

#define VIRTIO_IRQ_LOW 0
@@ -137,10 +138,12 @@ enum virtio_trans {
};

struct virtio_device {
+ bool use_iommu;
bool use_vhost;
void *virtio;
struct virtio_ops *ops;
u16 endian;
+ void *iotlb;
};

struct virtio_ops {
@@ -182,4 +185,14 @@ static inline void virtio_init_device_vq(struct kvm *kvm,
vring_init(&vq->vring, nr_descs, p, align);
}

+/*
+ * These are callbacks for IOMMU operations on virtio devices. They are not
+ * operations on the virtio-iommu device. Confusing, I know.
+ */
+const struct iommu_properties *
+virtio__iommu_get_properties(struct device_header *dev);
+
+int virtio__iommu_attach(void *, struct virtio_device *vdev, int flags);
+int virtio__iommu_detach(void *, struct virtio_device *vdev);
+
#endif /* KVM__VIRTIO_H */
diff --git a/virtio/core.c b/virtio/core.c
index d6ac289d..32bd4ebc 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -6,11 +6,16 @@
#include "kvm/guest_compat.h"
#include "kvm/barrier.h"
#include "kvm/virtio.h"
+#include "kvm/virtio-iommu.h"
#include "kvm/virtio-pci.h"
#include "kvm/virtio-mmio.h"
#include "kvm/util.h"
#include "kvm/kvm.h"

+static void *iommu = NULL;
+static struct iommu_properties iommu_props = {
+ .name = "viommu-virtio",
+};

const char* virtio_trans_name(enum virtio_trans trans)
{
@@ -198,6 +203,41 @@ bool virtio_queue__should_signal(struct virt_queue *vq)
return false;
}

+const struct iommu_properties *
+virtio__iommu_get_properties(struct device_header *dev)
+{
+ return &iommu_props;
+}
+
+int virtio__iommu_attach(void *priv, struct virtio_device *vdev, int flags)
+{
+ struct virtio_tlb *iotlb = priv;
+
+ if (!iotlb)
+ return -ENOMEM;
+
+ if (vdev->iotlb) {
+ pr_err("device already attached");
+ return -EINVAL;
+ }
+
+ vdev->iotlb = iotlb;
+
+ return 0;
+}
+
+int virtio__iommu_detach(void *priv, struct virtio_device *vdev)
+{
+ if (vdev->iotlb != priv) {
+ pr_err("wrong iotlb"); /* bug */
+ return -EINVAL;
+ }
+
+ vdev->iotlb = NULL;
+
+ return 0;
+}
+
int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
struct virtio_ops *ops, enum virtio_trans trans,
int device_id, int subsys_id, int class)
@@ -233,6 +273,18 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
return -1;
};

+ if (!iommu && vdev->use_iommu) {
+ iommu_props.pgsize_mask = ~(PAGE_SIZE - 1);
+ /*
+ * With legacy MMIO, we only have 32 bits to hold the vring PFN.
+ * This limits the IOVA size to (32 + 12) = 44 bits when using
+ * 4k pages.
+ */
+ iommu_props.input_addr_size = 44;
+ iommu = viommu_register(kvm, &iommu_props);
+ }
+
+
return 0;
}

diff --git a/virtio/mmio.c b/virtio/mmio.c
index 16b44fbb..24a14a71 100644
--- a/virtio/mmio.c
+++ b/virtio/mmio.c
@@ -1,4 +1,5 @@
#include "kvm/devices.h"
+#include "kvm/virtio-iommu.h"
#include "kvm/virtio-mmio.h"
#include "kvm/ioeventfd.h"
#include "kvm/iommu.h"
@@ -286,6 +287,30 @@ void virtio_mmio_assign_irq(struct device_header *dev_hdr)
vmmio->irq = irq__alloc_line();
}

+#define mmio_dev_to_virtio(dev_hdr) \
+ container_of(dev_hdr, struct virtio_mmio, dev_hdr)->vdev
+
+static int virtio_mmio_iommu_attach(void *priv, struct device_header *dev_hdr,
+ int flags)
+{
+ return virtio__iommu_attach(priv, mmio_dev_to_virtio(dev_hdr), flags);
+}
+
+static int virtio_mmio_iommu_detach(void *priv, struct device_header *dev_hdr)
+{
+ return virtio__iommu_detach(priv, mmio_dev_to_virtio(dev_hdr));
+}
+
+static struct iommu_ops virtio_mmio_iommu_ops = {
+ .get_properties = virtio__iommu_get_properties,
+ .alloc_address_space = iommu_alloc_address_space,
+ .free_address_space = iommu_free_address_space,
+ .attach = virtio_mmio_iommu_attach,
+ .detach = virtio_mmio_iommu_detach,
+ .map = iommu_map,
+ .unmap = iommu_unmap,
+};
+
int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
int device_id, int subsys_id, int class)
{
@@ -294,6 +319,7 @@ int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
vmmio->addr = virtio_mmio_get_io_space_block(VIRTIO_MMIO_IO_SIZE);
vmmio->kvm = kvm;
vmmio->dev = dev;
+ vmmio->vdev = vdev;

kvm__register_mmio(kvm, vmmio->addr, VIRTIO_MMIO_IO_SIZE,
false, virtio_mmio_mmio_callback, vdev);
@@ -309,6 +335,7 @@ int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
vmmio->dev_hdr = (struct device_header) {
.bus_type = DEVICE_BUS_MMIO,
.data = generate_virtio_mmio_fdt_node,
+ .iommu_ops = vdev->use_iommu ? &virtio_mmio_iommu_ops : NULL,
};

device__register(&vmmio->dev_hdr);
diff --git a/virtio/pci.c b/virtio/pci.c
index b6ef389e..674d5143 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -408,6 +408,30 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
kvm__emulate_io(vcpu, port, data, direction, len, 1);
}

+#define pci_dev_to_virtio(dev_hdr) \
+ (container_of(dev_hdr, struct virtio_pci, dev_hdr)->vdev)
+
+static int virtio_pci_iommu_attach(void *priv, struct device_header *dev_hdr,
+ int flags)
+{
+ return virtio__iommu_attach(priv, pci_dev_to_virtio(dev_hdr), flags);
+}
+
+static int virtio_pci_iommu_detach(void *priv, struct device_header *dev_hdr)
+{
+ return virtio__iommu_detach(priv, pci_dev_to_virtio(dev_hdr));
+}
+
+static struct iommu_ops virtio_pci_iommu_ops = {
+ .get_properties = virtio__iommu_get_properties,
+ .alloc_address_space = iommu_alloc_address_space,
+ .free_address_space = iommu_free_address_space,
+ .attach = virtio_pci_iommu_attach,
+ .detach = virtio_pci_iommu_detach,
+ .map = iommu_map,
+ .unmap = iommu_unmap,
+};
+
int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
int device_id, int subsys_id, int class)
{
@@ -416,6 +440,7 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,

vpci->kvm = kvm;
vpci->dev = dev;
+ vpci->vdev = vdev;

r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
if (r < 0)
@@ -461,6 +486,7 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
vpci->dev_hdr = (struct device_header) {
.bus_type = DEVICE_BUS_PCI,
.data = &vpci->pci_hdr,
+ .iommu_ops = vdev->use_iommu ? &virtio_pci_iommu_ops : NULL,
};

vpci->pci_hdr.msix.cap = PCI_CAP_ID_MSIX;
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:49 UTC
Permalink
Teach the virtio core how to access scattered vring structures. When
presenting a virtual IOMMU to the guest in front of virtio devices, the
virtio ring and buffers will be scattered across discontiguous guest-
physical pages. The device has to translate all IOVAs to host-virtual
addresses and gather the pages before accessing any structure.

Buffers described by vring.desc are already returned to the device via an
iovec. We simply have to fill them at a finer granularity and hope that:

1. The driver doesn't provide too many descriptors at a time, since the
iovec is only as big as the number of descriptors, and an overflow is now
possible.

2. The device doesn't make assumptions about message framing based on
vectors (i.e. a message can now span more vectors than before). Such
assumptions are forbidden by virtio 1.0 (and by legacy with ANY_LAYOUT),
but our virtio-net, for instance, assumes that the first vector always
contains a full vnet header. In practice this is fine, but it remains
extremely fragile.

For accessing vring and indirect descriptor tables, we now allocate an
iovec describing the IOMMU mappings of the structure, and make all
accesses via this iovec.

***

A more elegant way to do it would be to create a subprocess per
address-space, and remap fragments of guest memory in a contiguous manner:

.---- virtio-blk process
/
viommu process ----+------ virtio-net process
\
'---- some other device

(0) Initially, parent forks for each emulated device. Each child reserves
a large chunk of virtual memory with mmap (base), representing the
IOVA space, but doesn't populate it.
(1) virtio-dev wants to access guest memory, for instance read the vring.
It sends a TLB miss for an IOVA to the parent via pipe or socket.
(2) Parent viommu checks its translation table, and returns an offset in
guest memory.
(3) Child does a mmap in its IOVA space, using the fd that backs guest
memory: mmap(base + iova, pgsize, SHARED|FIXED, fd, offset)

This would be really cool, but I suspect it adds a lot of complexity,
since it's not clear which devices are entirely self-contained and which
need to access parent memory. So we stay with scatter-gather accesses for
now.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
include/kvm/virtio.h | 108 +++++++++++++++++++++++++++++--
virtio/core.c | 179 ++++++++++++++++++++++++++++++++++++++++++---------
2 files changed, 252 insertions(+), 35 deletions(-)

diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index 9f2ff237..cdc960cd 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -29,12 +29,16 @@

struct virt_queue {
struct vring vring;
+ struct iovec *vring_sg;
+ size_t vring_nr_sg;
u32 pfn;
/* The last_avail_idx field is an index to ->ring of struct vring_avail.
It's where we assume the next request index is at. */
u16 last_avail_idx;
u16 last_used_signalled;
u16 endian;
+
+ struct virtio_device *vdev;
};

/*
@@ -96,26 +100,91 @@ static inline __u64 __virtio_h2g_u64(u16 endian, __u64 val)

#endif

+void *virtio_guest_access(struct kvm *kvm, struct virtio_device *vdev,
+ u64 addr, size_t size, size_t *out_size, int prot);
+int virtio_populate_sg(struct kvm *kvm, struct virtio_device *vdev, u64 addr,
+ size_t size, int prot, u16 cur_sg, u16 max_sg,
+ struct iovec iov[]);
+
+/*
+ * Access element in a virtio structure. If @iov is NULL, access is linear and
+ * @ptr represents a Host-Virtual Address (HVA).
+ *
+ * Otherwise, the structure is scattered in the guest-physical space, and is
+ * made virtually-contiguous by the virtual IOMMU. @iov describes the
+ * structure's IOVA->HVA fragments, @base is the IOVA of the structure, and @ptr
+ * an IOVA inside the structure. @max is the number of elements in @iov.
+ *
+ * HVA
+ * IOVA .----> +---+ iov[0].base
+ * @base-> +---+ ----' | |
+ * | | +---+
+ * +---+ ----. : :
+ * | | '----> +---+ iov[1].base
+ * @ptr-> | | | |
+ * +---+ | |--> out
+ * +---+
+ */
+static void *virtio_access_sg(struct iovec *iov, int max, void *base, void *ptr)
+{
+ int i;
+ size_t off = ptr - base;
+
+ if (!iov)
+ return ptr;
+
+ for (i = 0; i < max; i++) {
+ size_t sz = iov[i].iov_len;
+ if (off < sz)
+ return iov[i].iov_base + off;
+ off -= sz;
+ }
+
+ pr_err("virtio_access_sg overflow");
+ return NULL;
+}
+
+/*
+ * We only implement legacy virtio, so the vring is a single
+ * virtually-contiguous structure starting at the descriptor table.
+ * Differentiating the accesses eases a future move to virtio 1.0.
+ */
+#define vring_access_avail(vq, ptr) \
+ virtio_access_sg(vq->vring_sg, vq->vring_nr_sg, vq->vring.desc, ptr)
+#define vring_access_desc(vq, ptr) \
+ virtio_access_sg(vq->vring_sg, vq->vring_nr_sg, vq->vring.desc, ptr)
+#define vring_access_used(vq, ptr) \
+ virtio_access_sg(vq->vring_sg, vq->vring_nr_sg, vq->vring.desc, ptr)
+
static inline u16 virt_queue__pop(struct virt_queue *queue)
{
+ void *ptr;
__u16 guest_idx;

- guest_idx = queue->vring.avail->ring[queue->last_avail_idx++ % queue->vring.num];
+ ptr = &queue->vring.avail->ring[queue->last_avail_idx++ % queue->vring.num];
+ guest_idx = *(u16 *)vring_access_avail(queue, ptr);
+
return virtio_guest_to_host_u16(queue, guest_idx);
}

static inline struct vring_desc *virt_queue__get_desc(struct virt_queue *queue, u16 desc_ndx)
{
- return &queue->vring.desc[desc_ndx];
+ return vring_access_desc(queue, &queue->vring.desc[desc_ndx]);
}

static inline bool virt_queue__available(struct virt_queue *vq)
{
+ u16 *evt, *idx;
+
if (!vq->vring.avail)
return 0;

- vring_avail_event(&vq->vring) = virtio_host_to_guest_u16(vq, vq->last_avail_idx);
- return virtio_guest_to_host_u16(vq, vq->vring.avail->idx) != vq->last_avail_idx;
+ /* Disgusting casts under the hood: &(*&used[size]) */
+ evt = vring_access_used(vq, &vring_avail_event(&vq->vring));
+ idx = vring_access_avail(vq, &vq->vring.avail->idx);
+
+ *evt = virtio_host_to_guest_u16(vq, vq->last_avail_idx);
+ return virtio_guest_to_host_u16(vq, *idx) != vq->last_avail_idx;
}

void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump);
@@ -177,10 +246,39 @@ static inline void virtio_init_device_vq(struct kvm *kvm,
struct virt_queue *vq, size_t nr_descs,
u32 page_size, u32 align, u32 pfn)
{
- void *p = guest_flat_to_host(kvm, (u64)pfn * page_size);
+ void *p;

vq->endian = vdev->endian;
vq->pfn = pfn;
+ vq->vdev = vdev;
+ vq->vring_sg = NULL;
+
+ if (vdev->iotlb) {
+ u64 addr = (u64)pfn * page_size;
+ size_t size = vring_size(nr_descs, align);
+ /* Our IOMMU maps at PAGE_SIZE granularity */
+ size_t nr_sg = size / PAGE_SIZE;
+ int flags = IOMMU_PROT_READ | IOMMU_PROT_WRITE;
+
+ vq->vring_sg = calloc(nr_sg, sizeof(struct iovec));
+ if (!vq->vring_sg) {
+ pr_err("could not allocate vring_sg");
+ return; /* Explode later. */
+ }
+
+ vq->vring_nr_sg = virtio_populate_sg(kvm, vdev, addr, size,
+ flags, 0, nr_sg,
+ vq->vring_sg);
+ if (!vq->vring_nr_sg) {
+ pr_err("could not map vring");
+ free(vq->vring_sg);
+ }
+
+ /* vring is described with its IOVA */
+ p = (void *)addr;
+ } else {
+ p = guest_flat_to_host(kvm, (u64)pfn * page_size);
+ }

vring_init(&vq->vring, nr_descs, p, align);
}
diff --git a/virtio/core.c b/virtio/core.c
index 32bd4ebc..ba35e5f1 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -28,7 +28,8 @@ const char* virtio_trans_name(enum virtio_trans trans)

void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump)
{
- u16 idx = virtio_guest_to_host_u16(queue, queue->vring.used->idx);
+ u16 *ptr = vring_access_used(queue, &queue->vring.used->idx);
+ u16 idx = virtio_guest_to_host_u16(queue, *ptr);

/*
* Use wmb to assure that used elem was updated with head and len.
@@ -37,7 +38,7 @@ void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump)
*/
wmb();
idx += jump;
- queue->vring.used->idx = virtio_host_to_guest_u16(queue, idx);
+ *ptr = virtio_host_to_guest_u16(queue, idx);

/*
* Use wmb to assure used idx has been increased before we signal the guest.
@@ -52,10 +53,12 @@ virt_queue__set_used_elem_no_update(struct virt_queue *queue, u32 head,
u32 len, u16 offset)
{
struct vring_used_elem *used_elem;
- u16 idx = virtio_guest_to_host_u16(queue, queue->vring.used->idx);
+ u16 *ptr = vring_access_used(queue, &queue->vring.used->idx);
+ u16 idx = virtio_guest_to_host_u16(queue, *ptr);

- idx += offset;
- used_elem = &queue->vring.used->ring[idx % queue->vring.num];
+ idx = (idx + offset) % queue->vring.num;
+
+ used_elem = vring_access_used(queue, &queue->vring.used->ring[idx]);
used_elem->id = virtio_host_to_guest_u32(queue, head);
used_elem->len = virtio_host_to_guest_u32(queue, len);

@@ -84,16 +87,17 @@ static inline bool virt_desc__test_flag(struct virt_queue *vq,
* at the end.
*/
static unsigned next_desc(struct virt_queue *vq, struct vring_desc *desc,
- unsigned int i, unsigned int max)
+ unsigned int max)
{
unsigned int next;

/* If this descriptor says it doesn't chain, we're done. */
- if (!virt_desc__test_flag(vq, &desc[i], VRING_DESC_F_NEXT))
+ if (!virt_desc__test_flag(vq, desc, VRING_DESC_F_NEXT))
return max;

+ next = virtio_guest_to_host_u16(vq, desc->next);
/* Check they're not leading us off end of descriptors. */
- next = virtio_guest_to_host_u16(vq, desc[i].next);
+ next = min(next, max);
/* Make sure compiler knows to grab that: we don't want it changing! */
wmb();

@@ -102,32 +106,76 @@ static unsigned next_desc(struct virt_queue *vq, struct vring_desc *desc,

u16 virt_queue__get_head_iov(struct virt_queue *vq, struct iovec iov[], u16 *out, u16 *in, u16 head, struct kvm *kvm)
{
- struct vring_desc *desc;
+ struct vring_desc *desc_base, *desc;
+ bool indirect, is_write;
+ struct iovec *desc_sg;
+ size_t len, nr_sg;
+ u64 addr;
u16 idx;
u16 max;

idx = head;
*out = *in = 0;
max = vq->vring.num;
- desc = vq->vring.desc;
+ desc_base = vq->vring.desc;
+ desc_sg = vq->vring_sg;
+ nr_sg = vq->vring_nr_sg;
+
+ desc = vring_access_desc(vq, &desc_base[idx]);
+ indirect = virt_desc__test_flag(vq, desc, VRING_DESC_F_INDIRECT);
+ if (indirect) {
+ len = virtio_guest_to_host_u32(vq, desc->len);
+ max = len / sizeof(struct vring_desc);
+ addr = virtio_guest_to_host_u64(vq, desc->addr);
+ if (desc_sg) {
+ desc_sg = calloc(len / PAGE_SIZE + 1, sizeof(struct iovec));
+ if (!desc_sg)
+ return 0;
+
+ nr_sg = virtio_populate_sg(kvm, vq->vdev, addr, len,
+ IOMMU_PROT_READ, 0, max,
+ desc_sg);
+ if (!nr_sg) {
+ pr_err("failed to populate indirect table");
+ free(desc_sg);
+ return 0;
+ }
+
+ desc_base = (void *)addr;
+ } else {
+ desc_base = guest_flat_to_host(kvm, addr);
+ }

- if (virt_desc__test_flag(vq, &desc[idx], VRING_DESC_F_INDIRECT)) {
- max = virtio_guest_to_host_u32(vq, desc[idx].len) / sizeof(struct vring_desc);
- desc = guest_flat_to_host(kvm, virtio_guest_to_host_u64(vq, desc[idx].addr));
idx = 0;
}

do {
+ u16 nr_io;
+
+ desc = virtio_access_sg(desc_sg, nr_sg, desc_base, &desc_base[idx]);
+ is_write = virt_desc__test_flag(vq, desc, VRING_DESC_F_WRITE);
+
/* Grab the first descriptor, and check it's OK. */
- iov[*out + *in].iov_len = virtio_guest_to_host_u32(vq, desc[idx].len);
- iov[*out + *in].iov_base = guest_flat_to_host(kvm,
- virtio_guest_to_host_u64(vq, desc[idx].addr));
+ len = virtio_guest_to_host_u32(vq, desc->len);
+ addr = virtio_guest_to_host_u64(vq, desc->addr);
+
+ /*
+ * dodgy assumption alert: device uses vring.desc.num iovecs.
+ * True in practice, but they are not obligated to do so.
+ */
+ nr_io = virtio_populate_sg(kvm, vq->vdev, addr, len, is_write ?
+ IOMMU_PROT_WRITE : IOMMU_PROT_READ,
+ *out + *in, vq->vring.num, iov);
+
/* If this is an input descriptor, increment that count. */
- if (virt_desc__test_flag(vq, &desc[idx], VRING_DESC_F_WRITE))
- (*in)++;
+ if (is_write)
+ (*in) += nr_io;
else
- (*out)++;
- } while ((idx = next_desc(vq, desc, idx, max)) != max);
+ (*out) += nr_io;
+ } while ((idx = next_desc(vq, desc, max)) != max);
+
+ if (indirect && desc_sg)
+ free(desc_sg);

return head;
}
@@ -147,23 +195,35 @@ u16 virt_queue__get_inout_iov(struct kvm *kvm, struct virt_queue *queue,
u16 *in, u16 *out)
{
struct vring_desc *desc;
+ struct iovec *iov;
u16 head, idx;
+ bool is_write;
+ size_t len;
+ u64 addr;
+ int prot;
+ u16 *cur;

idx = head = virt_queue__pop(queue);
*out = *in = 0;
do {
- u64 addr;
desc = virt_queue__get_desc(queue, idx);
+ is_write = virt_desc__test_flag(queue, desc, VRING_DESC_F_WRITE);
+ len = virtio_guest_to_host_u32(queue, desc->len);
addr = virtio_guest_to_host_u64(queue, desc->addr);
- if (virt_desc__test_flag(queue, desc, VRING_DESC_F_WRITE)) {
- in_iov[*in].iov_base = guest_flat_to_host(kvm, addr);
- in_iov[*in].iov_len = virtio_guest_to_host_u32(queue, desc->len);
- (*in)++;
+ if (is_write) {
+ prot = IOMMU_PROT_WRITE;
+ iov = in_iov;
+ cur = in;
} else {
- out_iov[*out].iov_base = guest_flat_to_host(kvm, addr);
- out_iov[*out].iov_len = virtio_guest_to_host_u32(queue, desc->len);
- (*out)++;
+ prot = IOMMU_PROT_READ;
+ iov = out_iov;
+ cur = out;
}
+
+ /* dodgy assumption alert: device uses vring.desc.num iovecs */
+ *cur += virtio_populate_sg(kvm, queue->vdev, addr, len, prot,
+ *cur, queue->vring.num, iov);
+
if (virt_desc__test_flag(queue, desc, VRING_DESC_F_NEXT))
idx = virtio_guest_to_host_u16(queue, desc->next);
else
@@ -191,9 +251,12 @@ bool virtio_queue__should_signal(struct virt_queue *vq)
{
u16 old_idx, new_idx, event_idx;

+ u16 *new_ptr = vring_access_used(vq, &vq->vring.used->idx);
+ u16 *event_ptr = vring_access_avail(vq, &vring_used_event(&vq->vring));
+
old_idx = vq->last_used_signalled;
- new_idx = virtio_guest_to_host_u16(vq, vq->vring.used->idx);
- event_idx = virtio_guest_to_host_u16(vq, vring_used_event(&vq->vring));
+ new_idx = virtio_guest_to_host_u16(vq, *new_ptr);
+ event_idx = virtio_guest_to_host_u16(vq, *event_ptr);

if (vring_need_event(event_idx, new_idx, old_idx)) {
vq->last_used_signalled = new_idx;
@@ -238,6 +301,62 @@ int virtio__iommu_detach(void *priv, struct virtio_device *vdev)
return 0;
}

+void *virtio_guest_access(struct kvm *kvm, struct virtio_device *vdev,
+ u64 addr, size_t size, size_t *out_size, int prot)
+{
+ u64 paddr;
+
+ if (!vdev->iotlb) {
+ *out_size = size;
+ paddr = addr;
+ } else {
+ paddr = iommu_access(vdev->iotlb, addr, size, out_size, prot);
+ }
+
+ return guest_flat_to_host(kvm, paddr);
+}
+
+/*
+ * Fill @iov starting at index @cur_vec with translations of the (@addr, @size)
+ * range. If @vdev doesn't have a TLB, fill a single vector with the
+ * corresponding HVA. Otherwise, fill vectors with IOVA->GPA->HVA translations.
+ * Since the IOVA range may span multiple IOMMU mappings, multiple vectors may
+ * be needed. @nr_vec is the size of the @iov array.
+ */
+int virtio_populate_sg(struct kvm *kvm, struct virtio_device *vdev, u64 addr,
+ size_t size, int prot, u16 cur_vec, u16 nr_vec,
+ struct iovec iov[])
+{
+ void *ptr;
+ int vec = cur_vec;
+ size_t consumed = 0;
+
+ while (size > 0 && vec < nr_vec) {
+ ptr = virtio_guest_access(kvm, vdev, addr, size, &consumed,
+ prot);
+ if (!ptr)
+ break;
+
+ iov[vec].iov_len = consumed;
+ iov[vec].iov_base = ptr;
+
+ size -= consumed;
+ addr += consumed;
+ vec++;
+ }
+
+ if (cur_vec == nr_vec && size)
+ /*
+ * This is bad. Devices used to offer as many iovecs as vring
+ * descriptors, so there was no chance of filling up the array.
+ * But with the IOMMU, buffers may be fragmented and use
+ * multiple iovecs per descriptor.
+ */
+ pr_err("reached end of iovec, incomplete buffer");
+
+ return vec - cur_vec;
+}
+
int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
struct virtio_ops *ops, enum virtio_trans trans,
int device_id, int subsys_id, int class)
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:50 UTC
Permalink
When the virtio device is behind a virtual IOMMU, the doorbell address
written into the MSI-X table by the guest is an IOVA, not a physical one.
When injecting an MSI, KVM needs a physical address to recognize the
doorbell and the associated IRQ chip. Translate the address given by the
guest into a physical one, and store it in a secondary table for easy
access.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
include/kvm/iommu.h | 4 ++++
include/kvm/virtio-pci.h | 1 +
iommu.c | 23 +++++++++++++++++++++++
virtio/pci.c | 33 ++++++++++++++++++++++++---------
4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index 4164ba20..8f87ce5a 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -70,4 +70,8 @@ int iommu_unmap(void *address_space, u64 virt_addr, u64 size, int flags);
u64 iommu_access(void *address_space, u64 addr, size_t size, size_t *out_size,
int prot);

+struct msi_msg;
+
+int iommu_translate_msi(void *address_space, struct msi_msg *msi);
+
#endif /* KVM_IOMMU_H */
diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h
index 26772f74..cb5225d6 100644
--- a/include/kvm/virtio-pci.h
+++ b/include/kvm/virtio-pci.h
@@ -47,6 +47,7 @@ struct virtio_pci {
u32 msix_io_block;
u64 msix_pba;
struct msix_table msix_table[VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG];
+ struct msi_msg msix_msgs[VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG];

/* virtio queue */
u16 queue_selector;
diff --git a/iommu.c b/iommu.c
index 0a662404..c10a3f0b 100644
--- a/iommu.c
+++ b/iommu.c
@@ -5,6 +5,7 @@

#include "kvm/iommu.h"
#include "kvm/kvm.h"
+#include "kvm/msi.h"
#include "kvm/mutex.h"
#include "kvm/rbtree-interval.h"

@@ -160,3 +161,25 @@ out_unlock:

return out_addr;
}
+
+int iommu_translate_msi(void *address_space, struct msi_msg *msg)
+{
+ size_t size = 4, out_size;
+ u64 addr = ((u64)msg->address_hi << 32) | msg->address_lo;
+
+ if (!address_space)
+ return 0;
+
+ addr = iommu_access(address_space, addr, size, &out_size,
+ IOMMU_PROT_WRITE);
+
+ if (!addr || out_size != size) {
+ pr_err("could not translate MSI doorbell");
+ return -EFAULT;
+ }
+
+ msg->address_lo = addr & 0xffffffff;
+ msg->address_hi = addr >> 32;
+
+ return 0;
+}
diff --git a/virtio/pci.c b/virtio/pci.c
index 674d5143..88b1a129 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -156,6 +156,7 @@ static void update_msix_map(struct virtio_pci *vpci,
struct msix_table *msix_entry, u32 vecnum)
{
u32 gsi, i;
+ struct msi_msg *msg;

/* Find the GSI number used for that vector */
if (vecnum == vpci->config_vector) {
@@ -172,14 +173,20 @@ static void update_msix_map(struct virtio_pci *vpci,
if (gsi == 0)
return;

- msix_entry = &msix_entry[vecnum];
- irq__update_msix_route(vpci->kvm, gsi, &msix_entry->msg);
+ msg = &vpci->msix_msgs[vecnum];
+ *msg = msix_entry[vecnum].msg;
+
+ if (iommu_translate_msi(vpci->vdev->iotlb, msg))
+ return;
+
+ irq__update_msix_route(vpci->kvm, gsi, msg);
}

static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *vdev, u16 port,
void *data, int size, int offset)
{
struct virtio_pci *vpci = vdev->virtio;
+ struct msi_msg *msg;
u32 config_offset, vec;
int gsi;
int type = virtio__get_dev_specific_field(offset - 20, virtio_pci__msix_enabled(vpci),
@@ -191,8 +198,12 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
if (vec == VIRTIO_MSI_NO_VECTOR)
break;

- gsi = irq__add_msix_route(kvm,
- &vpci->msix_table[vec].msg,
+ msg = &vpci->msix_msgs[vec];
+ *msg = vpci->msix_table[vec].msg;
+ if (iommu_translate_msi(vdev->iotlb, msg))
+ break;
+
+ gsi = irq__add_msix_route(kvm, msg,
vpci->dev_hdr.dev_num << 3);
if (gsi >= 0) {
vpci->config_gsi = gsi;
@@ -210,8 +221,12 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
if (vec == VIRTIO_MSI_NO_VECTOR)
break;

- gsi = irq__add_msix_route(kvm,
- &vpci->msix_table[vec].msg,
+ msg = &vpci->msix_msgs[vec];
+ *msg = vpci->msix_table[vec].msg;
+ if (iommu_translate_msi(vdev->iotlb, msg))
+ break;
+
+ gsi = irq__add_msix_route(kvm, msg,
vpci->dev_hdr.dev_num << 3);
if (gsi < 0) {
if (gsi == -ENXIO &&
@@ -328,9 +343,9 @@ static void virtio_pci__signal_msi(struct kvm *kvm, struct virtio_pci *vpci,
{
static int needs_devid = 0;
struct kvm_msi msi = {
- .address_lo = vpci->msix_table[vec].msg.address_lo,
- .address_hi = vpci->msix_table[vec].msg.address_hi,
- .data = vpci->msix_table[vec].msg.data,
+ .address_lo = vpci->msix_msgs[vec].address_lo,
+ .address_hi = vpci->msix_msgs[vec].address_hi,
+ .data = vpci->msix_msgs[vec].data,
};

if (needs_devid == 0) {
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:51 UTC
Permalink
Pass the VIRTIO_F_IOMMU_PLATFORM feature to tell the guest when a device is
behind an IOMMU.

Other feature bits in virtio do not depend on the device type and could be
factored the same way. For instance our vring implementation always
supports indirect descriptors (VIRTIO_RING_F_INDIRECT_DESC), so we could
advertise it for all devices at once (only net, scsi and blk at the
moment). However, this might modify guest behaviour: in Linux, whenever the
driver attempts to add a chain of descriptors, it allocates an indirect
table and uses a single ring descriptor, which might slightly reduce
performance. Cowardly ignore this.

VIRTIO_RING_F_EVENT_IDX is another feature of the vring, but that one
needs the device to call virtio_queue__should_signal before signaling to
the guest. Arguably we could factor all calls to signal_vq, but let's keep
this patch simple.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
include/kvm/virtio.h | 2 ++
virtio/core.c | 6 ++++++
virtio/mmio.c | 4 +++-
virtio/pci.c | 1 +
4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index cdc960cd..97bd5bdb 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -293,4 +293,6 @@ virtio__iommu_get_properties(struct device_header *dev);
int virtio__iommu_attach(void *, struct virtio_device *vdev, int flags);
int virtio__iommu_detach(void *, struct virtio_device *vdev);

+u32 virtio_get_common_features(struct kvm *kvm, struct virtio_device *vdev);
+
#endif /* KVM__VIRTIO_H */
diff --git a/virtio/core.c b/virtio/core.c
index ba35e5f1..66e0cecb 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -1,3 +1,4 @@
+#include <linux/virtio_config.h>
#include <linux/virtio_ring.h>
#include <linux/types.h>
#include <sys/uio.h>
@@ -266,6 +267,11 @@ bool virtio_queue__should_signal(struct virt_queue *vq)
return false;
}

+u32 virtio_get_common_features(struct kvm *kvm, struct virtio_device *vdev)
+{
+ return vdev->use_iommu ? VIRTIO_F_IOMMU_PLATFORM : 0;
+}
+
const struct iommu_properties *
virtio__iommu_get_properties(struct device_header *dev)
{
diff --git a/virtio/mmio.c b/virtio/mmio.c
index 24a14a71..699d4403 100644
--- a/virtio/mmio.c
+++ b/virtio/mmio.c
@@ -127,9 +127,11 @@ static void virtio_mmio_config_in(struct kvm_cpu *vcpu,
ioport__write32(data, *(u32 *)(((void *)&vmmio->hdr) + addr));
break;
case VIRTIO_MMIO_HOST_FEATURES:
- if (vmmio->hdr.host_features_sel == 0)
+ if (vmmio->hdr.host_features_sel == 0) {
val = vdev->ops->get_host_features(vmmio->kvm,
vmmio->dev);
+ val |= virtio_get_common_features(vmmio->kvm, vdev);
+ }
ioport__write32(data, val);
break;
case VIRTIO_MMIO_QUEUE_PFN:
diff --git a/virtio/pci.c b/virtio/pci.c
index 88b1a129..c9f0e558 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -126,6 +126,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
switch (offset) {
case VIRTIO_PCI_HOST_FEATURES:
val = vdev->ops->get_host_features(kvm, vpci->dev);
+ val |= virtio_get_common_features(kvm, vdev);
ioport__write32(data, val);
break;
case VIRTIO_PCI_QUEUE_PFN:
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:52 UTC
Permalink
Currently all passed-through devices must access the same guest-physical
address space. Register an IOMMU to offer individual address spaces to
devices. The way we do this is to allocate one container per group, and
add mappings on demand.

Since the guest cannot access a device unless that device is attached to a
container, and we cannot change containers at runtime without resetting
the device, this implementation is limited. To implement bypass mode, we'd
need to map the whole guest-physical memory first, and unmap everything
when attaching to a new address space. It is also not possible for devices
to share an address space: they all have different page tables.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
include/kvm/iommu.h | 6 ++
include/kvm/vfio.h | 2 +
iommu.c | 7 +-
vfio.c | 281 ++++++++++++++++++++++++++++++++++++++++++++++++----
4 files changed, 273 insertions(+), 23 deletions(-)

diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index 8f87ce5a..45a20f3b 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -10,6 +10,12 @@
#define IOMMU_PROT_WRITE 0x2
#define IOMMU_PROT_EXEC 0x4

+/*
+ * Test if a mapping is present. If not, return an error but do not report
+ * it to stderr.
+ */
+#define IOMMU_UNMAP_SILENT 0x1
+
struct iommu_ops {
const struct iommu_properties *(*get_properties)(struct device_header *);

diff --git a/include/kvm/vfio.h b/include/kvm/vfio.h
index 71dfa8f7..84126eb9 100644
--- a/include/kvm/vfio.h
+++ b/include/kvm/vfio.h
@@ -55,6 +55,7 @@ struct vfio_device {
struct device_header dev_hdr;

int fd;
+ struct vfio_group *group;
struct vfio_device_info info;
struct vfio_irq_info irq_info;
struct vfio_region *regions;
@@ -65,6 +66,7 @@ struct vfio_device {
struct vfio_group {
unsigned long id; /* iommu_group number in sysfs */
int fd;
+ struct vfio_guest_container *container;
};

int vfio_group_parser(const struct option *opt, const char *arg, int unset);
diff --git a/iommu.c b/iommu.c
index c10a3f0b..2220e4b2 100644
--- a/iommu.c
+++ b/iommu.c
@@ -85,6 +85,7 @@ int iommu_unmap(void *address_space, u64 virt_addr, u64 size, int flags)
struct rb_int_node *node;
struct iommu_mapping *map;
struct iommu_ioas *ioas = address_space;
+ bool silent = flags & IOMMU_UNMAP_SILENT;

if (!ioas)
return -ENODEV;
@@ -97,7 +98,8 @@ int iommu_unmap(void *address_space, u64 virt_addr, u64 size, int flags)
map = container_of(node, struct iommu_mapping, iova_range);

if (node_size > size) {
- pr_debug("cannot split mapping");
+ if (!silent)
+ pr_debug("cannot split mapping");
ret = -EINVAL;
break;
}
@@ -111,7 +113,8 @@ int iommu_unmap(void *address_space, u64 virt_addr, u64 size, int flags)
}

if (size && !ret) {
- pr_debug("mapping not found");
+ if (!silent)
+ pr_debug("mapping not found");
ret = -ENXIO;
}
mutex_unlock(&ioas->mutex);
diff --git a/vfio.c b/vfio.c
index f4fd4090..406d0781 100644
--- a/vfio.c
+++ b/vfio.c
@@ -1,10 +1,13 @@
+#include "kvm/iommu.h"
#include "kvm/irq.h"
#include "kvm/kvm.h"
#include "kvm/kvm-cpu.h"
#include "kvm/pci.h"
#include "kvm/util.h"
#include "kvm/vfio.h"
+#include "kvm/virtio-iommu.h"

+#include <linux/bitops.h>
#include <linux/kvm.h>
#include <linux/pci_regs.h>

@@ -25,7 +28,16 @@ struct vfio_irq_eventfd {
int fd;
};

-static int vfio_container;
+struct vfio_guest_container {
+ struct kvm *kvm;
+ int fd;
+
+ void *msi_doorbells;
+};
+
+static void *viommu = NULL;
+
+static int vfio_host_container;

int vfio_group_parser(const struct option *opt, const char *arg, int unset)
{
@@ -43,6 +55,7 @@ int vfio_group_parser(const struct option *opt, const char *arg, int unset)

cur = strtok(buf, ",");
group->id = strtoul(cur, NULL, 0);
+ group->container = NULL;

kvm->cfg.num_vfio_groups = ++idx;
free(buf);
@@ -68,11 +81,13 @@ static void vfio_pci_msix_pba_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
static void vfio_pci_msix_table_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
u32 len, u8 is_write, void *ptr)
{
+ struct msi_msg msg;
struct kvm *kvm = vcpu->kvm;
struct vfio_pci_device *pdev = ptr;
struct vfio_pci_msix_entry *entry;
struct vfio_pci_msix_table *table = &pdev->msix_table;
struct vfio_device *device = container_of(pdev, struct vfio_device, pci);
+ struct vfio_guest_container *container = device->group->container;

u64 offset = addr - table->guest_phys_addr;

@@ -88,11 +103,16 @@ static void vfio_pci_msix_table_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,

memcpy((void *)&entry->config + field, data, len);

- if (field != PCI_MSIX_ENTRY_VECTOR_CTRL)
+ if (field != PCI_MSIX_ENTRY_VECTOR_CTRL || entry->config.ctrl & 1)
+ return;
+
+ msg = entry->config.msg;
+
+ if (container && iommu_translate_msi(container->msi_doorbells, &msg))
return;

if (entry->gsi < 0) {
- int ret = irq__add_msix_route(kvm, &entry->config.msg,
+ int ret = irq__add_msix_route(kvm, &msg,
device->dev_hdr.dev_num << 3);
if (ret < 0) {
pr_err("cannot create MSI-X route");
@@ -111,7 +131,7 @@ static void vfio_pci_msix_table_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
return;
}

- irq__update_msix_route(kvm, entry->gsi, &entry->config.msg);
+ irq__update_msix_route(kvm, entry->gsi, &msg);
}

static void vfio_pci_msi_write(struct kvm *kvm, struct vfio_device *device,
@@ -122,6 +142,7 @@ static void vfio_pci_msi_write(struct kvm *kvm, struct vfio_device *device,
struct msi_msg msi;
struct vfio_pci_msix_entry *entry;
struct vfio_pci_device *pdev = &device->pci;
+ struct vfio_guest_container *container = device->group->container;
struct msi_cap_64 *msi_cap_64 = (void *)&pdev->hdr + pdev->msi.pos;

/* Only modify routes when guest sets the enable bit */
@@ -144,6 +165,9 @@ static void vfio_pci_msi_write(struct kvm *kvm, struct vfio_device *device,
msi.data = msi_cap_32->data;
}

+ if (container && iommu_translate_msi(container->msi_doorbells, &msi))
+ return;
+
for (i = 0; i < nr_vectors; i++) {
u32 devid = device->dev_hdr.dev_num << 3;

@@ -870,6 +894,154 @@ static int vfio_configure_dev_irqs(struct kvm *kvm, struct vfio_device *device)
return ret;
}

+static struct iommu_properties vfio_viommu_props = {
+ .name = "viommu-vfio",
+
+ .input_addr_size = 64,
+};
+
+static const struct iommu_properties *
+vfio_viommu_get_properties(struct device_header *dev)
+{
+ return &vfio_viommu_props;
+}
+
+static void *vfio_viommu_alloc(struct device_header *dev_hdr)
+{
+ struct vfio_device *vdev = container_of(dev_hdr, struct vfio_device,
+ dev_hdr);
+ struct vfio_guest_container *container = vdev->group->container;
+
+ container->msi_doorbells = iommu_alloc_address_space(NULL);
+ if (!container->msi_doorbells) {
+ pr_err("Failed to create MSI address space");
+ return NULL;
+ }
+
+ return container;
+}
+
+static void vfio_viommu_free(void *priv)
+{
+ struct vfio_guest_container *container = priv;
+
+ /* Half the address space */
+ size_t size = 1UL << (BITS_PER_LONG - 1);
+ unsigned long virt_addr = 0;
+ int i;
+
+ /*
+ * Remove all mappings in two passes, since 2^64 doesn't fit in
+ * unmap.size
+ */
+ for (i = 0; i < 2; i++, virt_addr += size) {
+ struct vfio_iommu_type1_dma_unmap unmap = {
+ .argsz = sizeof(unmap),
+ .iova = virt_addr,
+ .size = size,
+ };
+
+ ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
+ }
+
+ iommu_free_address_space(container->msi_doorbells);
+ container->msi_doorbells = NULL;
+}
+
+static int vfio_viommu_attach(void *priv, struct device_header *dev_hdr, int flags)
+{
+ struct vfio_guest_container *container = priv;
+ struct vfio_device *vdev = container_of(dev_hdr, struct vfio_device,
+ dev_hdr);
+
+ if (!container)
+ return -ENODEV;
+
+ if (container->fd != vdev->group->container->fd)
+ /*
+ * TODO: We don't support multiple devices in the same address
+ * space at the moment. It should be easy to implement, just
+ * create an address space structure that holds multiple
+ * container fds and multiplex map/unmap requests.
+ */
+ return -EINVAL;
+
+ return 0;
+}
+
+static int vfio_viommu_detach(void *priv, struct device_header *dev_hdr)
+{
+ return 0;
+}
+
+static int vfio_viommu_map(void *priv, u64 virt_addr, u64 phys_addr, u64 size,
+ int prot)
+{
+ int ret;
+ struct vfio_guest_container *container = priv;
+ struct vfio_iommu_type1_dma_map map = {
+ .argsz = sizeof(map),
+ .iova = virt_addr,
+ .size = size,
+ };
+
+ map.vaddr = (u64)guest_flat_to_host(container->kvm, phys_addr);
+ if (!map.vaddr) {
+ if (irq__addr_is_msi_doorbell(container->kvm, phys_addr)) {
+ ret = iommu_map(container->msi_doorbells, virt_addr,
+ phys_addr, size, prot);
+ if (ret) {
+ pr_err("could not map MSI");
+ return ret;
+ }
+
+ /* TODO: silence guest_flat_to_host */
+ pr_info("Nevermind, all is well. Mapped MSI %llx->%llx",
+ virt_addr, phys_addr);
+ return 0;
+ } else {
+ return -ERANGE;
+ }
+ }
+
+ if (prot & IOMMU_PROT_READ)
+ map.flags |= VFIO_DMA_MAP_FLAG_READ;
+
+ if (prot & IOMMU_PROT_WRITE)
+ map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
+
+ if (prot & IOMMU_PROT_EXEC) {
+ pr_err("VFIO does not support PROT_EXEC");
+ return -ENOSYS;
+ }
+
+ return ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map);
+}
+
+static int vfio_viommu_unmap(void *priv, u64 virt_addr, u64 size, int flags)
+{
+ struct vfio_guest_container *container = priv;
+ struct vfio_iommu_type1_dma_unmap unmap = {
+ .argsz = sizeof(unmap),
+ .iova = virt_addr,
+ .size = size,
+ };
+
+ if (!iommu_unmap(container->msi_doorbells, virt_addr, size,
+ flags | IOMMU_UNMAP_SILENT))
+ return 0;
+
+ return ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
+}
+
+static struct iommu_ops vfio_iommu_ops = {
+ .get_properties = vfio_viommu_get_properties,
+ .alloc_address_space = vfio_viommu_alloc,
+ .free_address_space = vfio_viommu_free,
+ .attach = vfio_viommu_attach,
+ .detach = vfio_viommu_detach,
+ .map = vfio_viommu_map,
+ .unmap = vfio_viommu_unmap,
+};
+
static int vfio_configure_reserved_regions(struct kvm *kvm,
struct vfio_group *group)
{
@@ -912,6 +1084,8 @@ static int vfio_configure_device(struct kvm *kvm, struct vfio_group *group,
return -ENOMEM;
}

+ device->group = group;
+
device->fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, dirent->d_name);
if (device->fd < 0) {
pr_err("Failed to get FD for device %s in group %lu",
@@ -945,6 +1119,7 @@ static int vfio_configure_device(struct kvm *kvm, struct vfio_group *group,
device->dev_hdr = (struct device_header) {
.bus_type = DEVICE_BUS_PCI,
.data = &device->pci.hdr,
+ .iommu_ops = viommu ? &vfio_iommu_ops : NULL,
};

ret = device__register(&device->dev_hdr);
@@ -1009,13 +1184,13 @@ static int vfio_configure_iommu_groups(struct kvm *kvm)
/* TODO: this should be an arch callback, so arm can return HYP only if vsmmu */
static int vfio_get_iommu_type(void)
{
- if (ioctl(vfio_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_NESTING_IOMMU))
+ if (ioctl(vfio_host_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_NESTING_IOMMU))
return VFIO_TYPE1_NESTING_IOMMU;

- if (ioctl(vfio_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU))
+ if (ioctl(vfio_host_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU))
return VFIO_TYPE1v2_IOMMU;

- if (ioctl(vfio_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
+ if (ioctl(vfio_host_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
return VFIO_TYPE1_IOMMU;

return -ENODEV;
@@ -1033,7 +1208,7 @@ static int vfio_map_mem_bank(struct kvm *kvm, struct kvm_mem_bank *bank, void *d
};

/* Map the guest memory for DMA (i.e. provide isolation) */
- if (ioctl(vfio_container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
+ if (ioctl(vfio_host_container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
ret = -errno;
pr_err("Failed to map 0x%llx -> 0x%llx (%llu) for DMA",
dma_map.iova, dma_map.vaddr, dma_map.size);
@@ -1050,14 +1225,15 @@ static int vfio_unmap_mem_bank(struct kvm *kvm, struct kvm_mem_bank *bank, void
.iova = bank->guest_phys_addr,
};

- ioctl(vfio_container, VFIO_IOMMU_UNMAP_DMA, &dma_unmap);
+ ioctl(vfio_host_container, VFIO_IOMMU_UNMAP_DMA, &dma_unmap);

return 0;
}

static int vfio_group_init(struct kvm *kvm, struct vfio_group *group)
{
- int ret;
+ int ret = 0;
+ int container;
char group_node[VFIO_PATH_MAX_LEN];
struct vfio_group_status group_status = {
.argsz = sizeof(group_status),
@@ -1066,6 +1242,25 @@ static int vfio_group_init(struct kvm *kvm, struct vfio_group *group)
snprintf(group_node, VFIO_PATH_MAX_LEN, VFIO_DEV_DIR "/%lu",
group->id);

+ if (kvm->cfg.viommu) {
+ container = open(VFIO_DEV_NODE, O_RDWR);
+ if (container < 0) {
+ ret = -errno;
+ pr_err("cannot initialize private container");
+ return ret;
+ }
+
+ group->container = malloc(sizeof(struct vfio_guest_container));
+ if (!group->container) {
+ close(container);
+ return -ENOMEM;
+ }
+
+ group->container->fd = container;
+ group->container->kvm = kvm;
+ group->container->msi_doorbells = NULL;
+ } else {
+ container = vfio_host_container;
+ }
+
group->fd = open(group_node, O_RDWR);
if (group->fd == -1) {
ret = -errno;
@@ -1085,29 +1280,52 @@ static int vfio_group_init(struct kvm *kvm, struct vfio_group *group)
return -EINVAL;
}

- if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &vfio_container)) {
+ if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container)) {
ret = -errno;
pr_err("Failed to add IOMMU group %s to VFIO container",
group_node);
return ret;
}

- return 0;
+ if (container != vfio_host_container) {
+ struct vfio_iommu_type1_info info = {
+ .argsz = sizeof(info),
+ };
+
+ /* We really need v2 semantics for unmap-all */
+ ret = ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);
+ if (ret) {
+ ret = -errno;
+ pr_err("Failed to set IOMMU");
+ return ret;
+ }
+
+ ret = ioctl(container, VFIO_IOMMU_GET_INFO, &info);
+ if (ret)
+ pr_err("Failed to get IOMMU info");
+ else if (info.flags & VFIO_IOMMU_INFO_PGSIZES)
+ vfio_viommu_props.pgsize_mask = info.iova_pgsizes;
+ }
+
+ return ret;
}

-static int vfio_container_init(struct kvm *kvm)
+static int vfio_groups_init(struct kvm *kvm)
{
int api, i, ret, iommu_type;

- /* Create a container for our IOMMU groups */
- vfio_container = open(VFIO_DEV_NODE, O_RDWR);
- if (vfio_container == -1) {
+ /*
+ * Create a container for our IOMMU groups. Even when using a viommu, we
+ * still use this one for probing capabilities.
+ */
+ vfio_host_container = open(VFIO_DEV_NODE, O_RDWR);
+ if (vfio_host_container == -1) {
ret = errno;
pr_err("Failed to open %s", VFIO_DEV_NODE);
return ret;
}

- api = ioctl(vfio_container, VFIO_GET_API_VERSION);
+ api = ioctl(vfio_host_container, VFIO_GET_API_VERSION);
if (api != VFIO_API_VERSION) {
pr_err("Unknown VFIO API version %d", api);
return -ENODEV;
@@ -1119,15 +1337,20 @@ static int vfio_container_init(struct kvm *kvm)
return iommu_type;
}

- /* Sanity check our groups and add them to the container */
for (i = 0; i < kvm->cfg.num_vfio_groups; ++i) {
ret = vfio_group_init(kvm, &kvm->cfg.vfio_group[i]);
if (ret)
return ret;
}

+ if (kvm->cfg.viommu) {
+ close(vfio_host_container);
+ vfio_host_container = -1;
+ return 0;
+ }
+
/* Finalise the container */
- if (ioctl(vfio_container, VFIO_SET_IOMMU, iommu_type)) {
+ if (ioctl(vfio_host_container, VFIO_SET_IOMMU, iommu_type)) {
ret = -errno;
pr_err("Failed to set IOMMU type %d for VFIO container",
iommu_type);
@@ -1147,10 +1370,16 @@ static int vfio__init(struct kvm *kvm)
if (!kvm->cfg.num_vfio_groups)
return 0;

- ret = vfio_container_init(kvm);
+ ret = vfio_groups_init(kvm);
if (ret)
return ret;

+ if (kvm->cfg.viommu) {
+ viommu = viommu_register(kvm, &vfio_viommu_props);
+ if (!viommu)
+ pr_err("could not register viommu");
+ }
+
ret = vfio_configure_iommu_groups(kvm);
if (ret)
return ret;
@@ -1162,17 +1391,27 @@ dev_base_init(vfio__init);
static int vfio__exit(struct kvm *kvm)
{
int i, fd;
+ struct vfio_guest_container *container;

if (!kvm->cfg.num_vfio_groups)
return 0;

for (i = 0; i < kvm->cfg.num_vfio_groups; ++i) {
+ container = kvm->cfg.vfio_group[i].container;
fd = kvm->cfg.vfio_group[i].fd;
ioctl(fd, VFIO_GROUP_UNSET_CONTAINER);
close(fd);
+
+ if (container != NULL) {
+ close(container->fd);
+ free(container);
+ }
}

+ if (vfio_host_container == -1)
+ return 0;
+
kvm__for_each_mem_bank(kvm, KVM_MEM_TYPE_RAM, vfio_unmap_mem_bank, NULL);
- return close(vfio_container);
+ return close(vfio_host_container);
}
dev_base_exit(vfio__exit);
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:53 UTC
Add a new parameter to lkvm debug, '-i' or '--iommu'. Commands will be
added later. For the moment, rework the debug builtin to share dump
facilities with the '-d'/'--dump' parameter.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
builtin-debug.c | 8 +++++++-
include/kvm/builtin-debug.h | 6 ++++++
include/kvm/iommu.h | 5 +++++
include/kvm/virtio-iommu.h | 5 +++++
kvm-ipc.c | 43 ++++++++++++++++++++++++-------------------
virtio/iommu.c | 14 ++++++++++++++
6 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/builtin-debug.c b/builtin-debug.c
index 4ae51d20..e39e2d09 100644
--- a/builtin-debug.c
+++ b/builtin-debug.c
@@ -5,6 +5,7 @@
#include <kvm/parse-options.h>
#include <kvm/kvm-ipc.h>
#include <kvm/read-write.h>
+#include <kvm/virtio-iommu.h>

#include <stdio.h>
#include <string.h>
@@ -17,6 +18,7 @@ static int nmi = -1;
static bool dump;
static const char *instance_name;
static const char *sysrq;
+static const char *iommu;

static const char * const debug_usage[] = {
"lkvm debug [--all] [-n name] [-d] [-m vcpu]",
@@ -28,6 +30,7 @@ static const struct option debug_options[] = {
OPT_BOOLEAN('d', "dump", &dump, "Generate a debug dump from guest"),
OPT_INTEGER('m', "nmi", &nmi, "Generate NMI on VCPU"),
OPT_STRING('s', "sysrq", &sysrq, "sysrq", "Inject a sysrq"),
+ OPT_STRING('i', "iommu", &iommu, "params", "Debug virtual IOMMU"),
OPT_GROUP("Instance options:"),
OPT_BOOLEAN('a', "all", &all, "Debug all instances"),
OPT_STRING('n', "name", &instance_name, "name", "Instance name"),
@@ -68,11 +71,14 @@ static int do_debug(const char *name, int sock)
cmd.sysrq = sysrq[0];
}

+ if (iommu && !viommu_parse_debug_string(iommu, &cmd.iommu))
+ cmd.dbg_type |= KVM_DEBUG_CMD_TYPE_IOMMU;
+
r = kvm_ipc__send_msg(sock, KVM_IPC_DEBUG, sizeof(cmd), (u8 *)&cmd);
if (r < 0)
return r;

- if (!dump)
+ if (!(cmd.dbg_type & KVM_DEBUG_CMD_DUMP_MASK))
return 0;

do {
diff --git a/include/kvm/builtin-debug.h b/include/kvm/builtin-debug.h
index efa02684..cd2155ae 100644
--- a/include/kvm/builtin-debug.h
+++ b/include/kvm/builtin-debug.h
@@ -2,16 +2,22 @@
#define KVM__DEBUG_H

#include <kvm/util.h>
+#include <kvm/iommu.h>
#include <linux/types.h>

#define KVM_DEBUG_CMD_TYPE_DUMP (1 << 0)
#define KVM_DEBUG_CMD_TYPE_NMI (1 << 1)
#define KVM_DEBUG_CMD_TYPE_SYSRQ (1 << 2)
+#define KVM_DEBUG_CMD_TYPE_IOMMU (1 << 3)
+
+#define KVM_DEBUG_CMD_DUMP_MASK \
+ (KVM_DEBUG_CMD_TYPE_IOMMU | KVM_DEBUG_CMD_TYPE_DUMP)

struct debug_cmd_params {
u32 dbg_type;
u32 cpu;
char sysrq;
+ struct iommu_debug_params iommu;
};

int kvm_cmd_debug(int argc, const char **argv, const char *prefix);
diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index 45a20f3b..60857fa5 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -1,6 +1,7 @@
#ifndef KVM_IOMMU_H
#define KVM_IOMMU_H

+#include <stdbool.h>
#include <stdlib.h>

#include "devices.h"
@@ -10,6 +11,10 @@
#define IOMMU_PROT_WRITE 0x2
#define IOMMU_PROT_EXEC 0x4

+struct iommu_debug_params {
+ bool print_enabled;
+};
+
/*
* Test if mapping is present. If not, return an error but do not report it to
* stderr
diff --git a/include/kvm/virtio-iommu.h b/include/kvm/virtio-iommu.h
index 5532c82b..c9e36fb6 100644
--- a/include/kvm/virtio-iommu.h
+++ b/include/kvm/virtio-iommu.h
@@ -7,4 +7,9 @@ const struct iommu_properties *viommu_get_properties(void *dev);
void *viommu_register(struct kvm *kvm, struct iommu_properties *props);
void viommu_unregister(struct kvm *kvm, void *cookie);

+struct iommu_debug_params;
+
+int viommu_parse_debug_string(const char *options, struct iommu_debug_params *);
+int viommu_debug(int fd, struct iommu_debug_params *);
+
#endif
diff --git a/kvm-ipc.c b/kvm-ipc.c
index e07ad105..a8b56543 100644
--- a/kvm-ipc.c
+++ b/kvm-ipc.c
@@ -14,6 +14,7 @@
#include "kvm/strbuf.h"
#include "kvm/kvm-cpu.h"
#include "kvm/8250-serial.h"
+#include "kvm/virtio-iommu.h"

struct kvm_ipc_head {
u32 type;
@@ -424,31 +425,35 @@ static void handle_debug(struct kvm *kvm, int fd, u32 type, u32 len, u8 *msg)
pthread_kill(kvm->cpus[vcpu]->thread, SIGUSR1);
}

- if (!(dbg_type & KVM_DEBUG_CMD_TYPE_DUMP))
- return;
+ if (dbg_type & KVM_DEBUG_CMD_TYPE_IOMMU)
+ viommu_debug(fd, &params->iommu);

- for (i = 0; i < kvm->nrcpus; i++) {
- struct kvm_cpu *cpu = kvm->cpus[i];
+ if (dbg_type & KVM_DEBUG_CMD_TYPE_DUMP) {
+ for (i = 0; i < kvm->nrcpus; i++) {
+ struct kvm_cpu *cpu = kvm->cpus[i];

- if (!cpu)
- continue;
+ if (!cpu)
+ continue;

- printout_done = 0;
+ printout_done = 0;
+
+ kvm_cpu__set_debug_fd(fd);
+ pthread_kill(cpu->thread, SIGUSR1);
+ /*
+ * Wait for the vCPU to dump state before signalling
+ * the next thread. Since this is debug code it does
+ * not matter that we are burning CPU time a bit:
+ */
+ while (!printout_done)
+ sleep(0);
+ }

- kvm_cpu__set_debug_fd(fd);
- pthread_kill(cpu->thread, SIGUSR1);
- /*
- * Wait for the vCPU to dump state before signalling
- * the next thread. Since this is debug code it does
- * not matter that we are burning CPU time a bit:
- */
- while (!printout_done)
- sleep(0);
+ serial8250__inject_sysrq(kvm, 'p');
}

- close(fd);
-
- serial8250__inject_sysrq(kvm, 'p');
+ if (dbg_type & KVM_DEBUG_CMD_DUMP_MASK)
+ /* builtin-debug is reading, signal EOT */
+ close(fd);
}

int kvm_ipc__init(struct kvm *kvm)
diff --git a/virtio/iommu.c b/virtio/iommu.c
index 2e5a23ee..5973cef1 100644
--- a/virtio/iommu.c
+++ b/virtio/iommu.c
@@ -620,3 +620,17 @@ void viommu_unregister(struct kvm *kvm, void *viommu)
{
free(viommu);
}
+
+int viommu_parse_debug_string(const char *cmdline, struct iommu_debug_params *params)
+{
+ /* show instances numbers */
+ /* send command to instance */
+ /* - dump mappings */
+ /* - statistics */
+ return -ENOSYS;
+}
+
+int viommu_debug(int sock, struct iommu_debug_params *params)
+{
+ return -ENOSYS;
+}
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:54 UTC
Using debug printf with the virtual IOMMU can be extremely verbose. To
ease debugging, add a few commands that can be sent via IPC. The command
format is "cmd [iommu [address_space]]" (or "cmd:[iommu:[address_space]]")

$ lkvm debug -a -i list
iommu 0 "viommu-vfio"
ioas 1
device 0x2 # PCI bus
ioas 2
device 0x3
iommu 1 "viommu-virtio"
ioas 3
device 0x10003 # MMIO bus
ioas 4
device 0x6

$ lkvm debug -a -i stats:0 # stats for viommu-vfio
iommu 0 "viommu-virtio"
kicks 510 # virtio kicks from driver
requests 510 # requests received
ioas 3
maps 1 # number of map requests
unmaps 0 # " unmap "
resident 8192 # bytes currently mapped
accesses 1 # number of device accesses
ioas 4
maps 290
unmaps 4
resident 1335296
accesses 982

$ lkvm debug -a -i "print 1, 2" # Start debug print for
... # ioas 2 in iommu 1
...
Info: VIOMMU map 0xffffffff000 -> 0x8f4e0000 (4096) to IOAS 2
...
$ lkvm debug -a -i noprint # Stop all debug print

We don't use atomics for statistics at the moment, since there is no
concurrent write on most of them. Only 'accesses' might be incremented
concurrently, so we might get imprecise values.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
include/kvm/iommu.h | 17 +++
iommu.c | 56 +++++++++-
virtio/iommu.c | 312 ++++++++++++++++++++++++++++++++++++++++++++++++----
virtio/mmio.c | 1 +
virtio/pci.c | 1 +
5 files changed, 362 insertions(+), 25 deletions(-)

diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index 60857fa5..70a09306 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -11,7 +11,20 @@
#define IOMMU_PROT_WRITE 0x2
#define IOMMU_PROT_EXEC 0x4

+enum iommu_debug_action {
+ IOMMU_DEBUG_LIST,
+ IOMMU_DEBUG_STATS,
+ IOMMU_DEBUG_SET_PRINT,
+ IOMMU_DEBUG_DUMP,
+
+ IOMMU_DEBUG_NUM_ACTIONS,
+};
+
+#define IOMMU_DEBUG_SELECTOR_INVALID ((unsigned int)-1)
+
struct iommu_debug_params {
+ enum iommu_debug_action action;
+ unsigned int selector[2];
bool print_enabled;
};

@@ -31,6 +44,8 @@ struct iommu_ops {
int (*detach)(void *, struct device_header *);
int (*map)(void *, u64 virt_addr, u64 phys_addr, u64 size, int prot);
int (*unmap)(void *, u64 virt_addr, u64 size, int flags);
+
+ int (*debug_address_space)(void *, int fd, struct iommu_debug_params *);
};

struct iommu_properties {
@@ -74,6 +89,8 @@ static inline struct device_header *iommu_get_device(u32 device_id)

void *iommu_alloc_address_space(struct device_header *dev);
void iommu_free_address_space(void *address_space);
+int iommu_debug_address_space(void *address_space, int fd,
+ struct iommu_debug_params *params);

int iommu_map(void *address_space, u64 virt_addr, u64 phys_addr, u64 size,
int prot);
diff --git a/iommu.c b/iommu.c
index 2220e4b2..bc9fc631 100644
--- a/iommu.c
+++ b/iommu.c
@@ -9,6 +9,10 @@
#include "kvm/mutex.h"
#include "kvm/rbtree-interval.h"

+struct iommu_ioas_stats {
+ u64 accesses;
+};
+
struct iommu_mapping {
struct rb_int_node iova_range;
u64 phys;
@@ -18,8 +22,31 @@ struct iommu_mapping {
struct iommu_ioas {
struct rb_root mappings;
struct mutex mutex;
+
+ struct iommu_ioas_stats stats;
+ bool debug_enabled;
};

+static void iommu_dump(struct iommu_ioas *ioas, int fd)
+{
+ struct rb_node *node;
+ struct iommu_mapping *map;
+
+ mutex_lock(&ioas->mutex);
+
+ dprintf(fd, "START IOMMU DUMP [[[\n"); /* You did ask for it. */
+ for (node = rb_first(&ioas->mappings); node; node = rb_next(node)) {
+ struct rb_int_node *int_node = rb_int(node);
+ map = container_of(int_node, struct iommu_mapping, iova_range);
+
+ dprintf(fd, "%#llx-%#llx -> %#llx %#x\n", int_node->low,
+ int_node->high, map->phys, map->prot);
+ }
+ dprintf(fd, "]]] END IOMMU DUMP\n");
+
+ mutex_unlock(&ioas->mutex);
+}
+
void *iommu_alloc_address_space(struct device_header *unused)
{
struct iommu_ioas *ioas = calloc(1, sizeof(*ioas));
@@ -33,6 +60,27 @@ void *iommu_alloc_address_space(struct device_header *unused)
return ioas;
}

+int iommu_debug_address_space(void *address_space, int fd,
+ struct iommu_debug_params *params)
+{
+ struct iommu_ioas *ioas = address_space;
+
+ switch (params->action) {
+ case IOMMU_DEBUG_STATS:
+ dprintf(fd, " accesses %llu\n", ioas->stats.accesses);
+ break;
+ case IOMMU_DEBUG_SET_PRINT:
+ ioas->debug_enabled = params->print_enabled;
+ break;
+ case IOMMU_DEBUG_DUMP:
+ iommu_dump(ioas, fd);
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
void iommu_free_address_space(void *address_space)
{
struct iommu_ioas *ioas = address_space;
@@ -157,8 +205,12 @@ u64 iommu_access(void *address_space, u64 addr, size_t size, size_t *out_size,
out_addr = map->phys + (addr - node->low);
*out_size = min_t(size_t, node->high - addr + 1, size);

- pr_debug("access %llx %zu/%zu %x -> %#llx", addr, *out_size, size,
- prot, out_addr);
+ if (ioas->debug_enabled)
+ pr_info("access %llx %zu/%zu %s%s -> %#llx", addr, *out_size,
+ size, prot & IOMMU_PROT_READ ? "R" : "",
+ prot & IOMMU_PROT_WRITE ? "W" : "", out_addr);
+
+ ioas->stats.accesses++;
out_unlock:
mutex_unlock(&ioas->mutex);

diff --git a/virtio/iommu.c b/virtio/iommu.c
index 5973cef1..153b537a 100644
--- a/virtio/iommu.c
+++ b/virtio/iommu.c
@@ -20,6 +20,17 @@
/* Max size */
#define VIOMMU_DEFAULT_QUEUE_SIZE 256

+struct viommu_ioas_stats {
+ u64 map;
+ u64 unmap;
+ u64 resident;
+};
+
+struct viommu_stats {
+ u64 kicks;
+ u64 requests;
+};
+
struct viommu_endpoint {
struct device_header *dev;
struct viommu_ioas *ioas;
@@ -36,9 +47,14 @@ struct viommu_ioas {

struct iommu_ops *ops;
void *priv;
+
+ bool debug_enabled;
+ struct viommu_ioas_stats stats;
};

struct viommu_dev {
+ u32 id;
+
struct virtio_device vdev;
struct virtio_iommu_config config;

@@ -49,29 +65,77 @@ struct viommu_dev {
struct thread_pool__job job;

struct rb_root address_spaces;
+ struct mutex address_spaces_mutex;
struct kvm *kvm;
+
+ struct list_head list;
+
+ bool debug_enabled;
+ struct viommu_stats stats;
};

static int compat_id = -1;

+static long long viommu_ids;
+static LIST_HEAD(viommus);
+static DEFINE_MUTEX(viommus_mutex);
+
+#define ioas_debug(ioas, fmt, ...) \
+ do { \
+ if ((ioas)->debug_enabled) \
+ pr_info("ioas[%d] " fmt, (ioas)->id, ##__VA_ARGS__); \
+ } while (0)
+
static struct viommu_ioas *viommu_find_ioas(struct viommu_dev *viommu,
u32 ioasid)
{
struct rb_node *node;
- struct viommu_ioas *ioas;
+ struct viommu_ioas *ioas, *found = NULL;

+ mutex_lock(&viommu->address_spaces_mutex);
node = viommu->address_spaces.rb_node;
while (node) {
ioas = container_of(node, struct viommu_ioas, node);
- if (ioas->id > ioasid)
+ if (ioas->id > ioasid) {
node = node->rb_left;
- else if (ioas->id < ioasid)
+ } else if (ioas->id < ioasid) {
node = node->rb_right;
- else
- return ioas;
+ } else {
+ found = ioas;
+ break;
+ }
}
+ mutex_unlock(&viommu->address_spaces_mutex);

- return NULL;
+ return found;
+}
+
+static int viommu_for_each_ioas(struct viommu_dev *viommu,
+ int (*fun)(struct viommu_dev *viommu,
+ struct viommu_ioas *ioas,
+ void *data),
+ void *data)
+{
+ int ret = 0;
+ struct viommu_ioas *ioas;
+ struct rb_node *node, *next;
+
+ mutex_lock(&viommu->address_spaces_mutex);
+ node = rb_first(&viommu->address_spaces);
+ while (node) {
+ next = rb_next(node);
+ ioas = container_of(node, struct viommu_ioas, node);
+
+ ret = fun(viommu, ioas, data);
+ if (ret)
+ break;
+
+ node = next;
+ }
+
+ mutex_unlock(&viommu->address_spaces_mutex);
+
+ return ret;
}

static struct viommu_ioas *viommu_alloc_ioas(struct viommu_dev *viommu,
@@ -99,9 +163,12 @@ static struct viommu_ioas *viommu_alloc_ioas(struct viommu_dev *viommu,
new_ioas->id = ioasid;
new_ioas->ops = ops;
new_ioas->priv = ops->alloc_address_space(device);
+ new_ioas->debug_enabled = viommu->debug_enabled;

/* A NULL priv pointer is valid. */

+ mutex_lock(&viommu->address_spaces_mutex);
+
node = &viommu->address_spaces.rb_node;
while (*node) {
ioas = container_of(*node, struct viommu_ioas, node);
@@ -114,6 +181,7 @@ static struct viommu_ioas *viommu_alloc_ioas(struct viommu_dev *viommu,
} else {
pr_err("IOAS exists!");
free(new_ioas);
+ mutex_unlock(&viommu->address_spaces_mutex);
return NULL;
}
}
@@ -121,6 +189,8 @@ static struct viommu_ioas *viommu_alloc_ioas(struct viommu_dev *viommu,
rb_link_node(&new_ioas->node, parent, node);
rb_insert_color(&new_ioas->node, &viommu->address_spaces);

+ mutex_unlock(&viommu->address_spaces_mutex);
+
return new_ioas;
}

@@ -130,7 +200,9 @@ static void viommu_free_ioas(struct viommu_dev *viommu,
if (ioas->priv)
ioas->ops->free_address_space(ioas->priv);

+ mutex_lock(&viommu->address_spaces_mutex);
rb_erase(&ioas->node, &viommu->address_spaces);
+ mutex_unlock(&viommu->address_spaces_mutex);
free(ioas);
}

@@ -178,8 +250,7 @@ static int viommu_detach_device(struct viommu_dev *viommu,
if (!ioas)
return -EINVAL;

- pr_debug("detaching device %#lx from IOAS %u",
- device_to_iommu_id(device), ioas->id);
+ ioas_debug(ioas, "detaching device %#lx", device_to_iommu_id(device));

ret = device->iommu_ops->detach(ioas->priv, device);
if (!ret)
@@ -208,8 +279,6 @@ static int viommu_handle_attach(struct viommu_dev *viommu,
return -ENODEV;
}

- pr_debug("attaching device %#x to IOAS %u", device_id, ioasid);
-
vdev = device->iommu_data;
if (!vdev) {
vdev = viommu_alloc_device(device);
@@ -240,6 +309,9 @@ static int viommu_handle_attach(struct viommu_dev *viommu,
if (ret && ioas->nr_devices == 0)
viommu_free_ioas(viommu, ioas);

+ if (!ret)
+ ioas_debug(ioas, "attached device %#x", device_id);
+
return ret;
}

@@ -267,6 +339,7 @@ static int viommu_handle_detach(struct viommu_dev *viommu,
static int viommu_handle_map(struct viommu_dev *viommu,
struct virtio_iommu_req_map *map)
{
+ int ret;
int prot = 0;
struct viommu_ioas *ioas;

@@ -294,15 +367,21 @@ static int viommu_handle_map(struct viommu_dev *viommu,
if (flags & VIRTIO_IOMMU_MAP_F_EXEC)
prot |= IOMMU_PROT_EXEC;

- pr_debug("map %#llx -> %#llx (%llu) to IOAS %u", virt_addr,
- phys_addr, size, ioasid);
+ ioas_debug(ioas, "map %#llx -> %#llx (%llu)", virt_addr, phys_addr, size);
+
+ ret = ioas->ops->map(ioas->priv, virt_addr, phys_addr, size, prot);
+ if (!ret) {
+ ioas->stats.resident += size;
+ ioas->stats.map++;
+ }

- return ioas->ops->map(ioas->priv, virt_addr, phys_addr, size, prot);
+ return ret;
}

static int viommu_handle_unmap(struct viommu_dev *viommu,
struct virtio_iommu_req_unmap *unmap)
{
+ int ret;
struct viommu_ioas *ioas;

u32 ioasid = le32_to_cpu(unmap->address_space);
@@ -315,10 +394,15 @@ static int viommu_handle_unmap(struct viommu_dev *viommu,
return -ESRCH;
}

- pr_debug("unmap %#llx (%llu) from IOAS %u", virt_addr, size,
- ioasid);
+ ioas_debug(ioas, "unmap %#llx (%llu)", virt_addr, size);
+
+ ret = ioas->ops->unmap(ioas->priv, virt_addr, size, 0);
+ if (!ret) {
+ ioas->stats.resident -= size;
+ ioas->stats.unmap++;
+ }

- return ioas->ops->unmap(ioas->priv, virt_addr, size, 0);
+ return ret;
}

static size_t viommu_get_req_len(union virtio_iommu_req *req)
@@ -407,6 +491,8 @@ static ssize_t viommu_dispatch_commands(struct viommu_dev *viommu,
continue;
}

+ viommu->stats.requests++;
+
req = iov[i].iov_base;
op = req->head.type;
expected_len = viommu_get_req_len(req) - sizeof(*tail);
@@ -458,6 +544,8 @@ static void viommu_command(struct kvm *kvm, void *dev)

vq = &viommu->vq;

+ viommu->stats.kicks++;
+
while (virt_queue__available(vq)) {
head = virt_queue__get_iov(vq, iov, &out, &in, kvm);

@@ -594,6 +682,7 @@ void *viommu_register(struct kvm *kvm, struct iommu_properties *props)

viommu->queue_size = VIOMMU_DEFAULT_QUEUE_SIZE;
viommu->address_spaces = (struct rb_root)RB_ROOT;
+ viommu->address_spaces_mutex = (struct mutex)MUTEX_INITIALIZER;
viommu->properties = props;

viommu->config.page_sizes = props->pgsize_mask ?: pgsize_mask;
@@ -607,6 +696,11 @@ void *viommu_register(struct kvm *kvm, struct iommu_properties *props)
return NULL;
}

+ mutex_lock(&viommus_mutex);
+ viommu->id = viommu_ids++;
+ list_add_tail(&viommu->list, &viommus);
+ mutex_unlock(&viommus_mutex);
+
pr_info("Loaded virtual IOMMU %s", props->name);

if (compat_id == -1)
@@ -616,21 +710,193 @@ void *viommu_register(struct kvm *kvm, struct iommu_properties *props)
return viommu;
}

-void viommu_unregister(struct kvm *kvm, void *viommu)
+void viommu_unregister(struct kvm *kvm, void *dev)
{
+ struct viommu_dev *viommu = dev;
+
+ mutex_lock(&viommus_mutex);
+ list_del(&viommu->list);
+ mutex_unlock(&viommus_mutex);
+
free(viommu);
}

+const char *debug_usage =
+" list [iommu [ioas]] list iommus and address spaces\n"
+" stats [iommu [ioas]] display statistics\n"
+" dump [iommu [ioas]] dump mappings\n"
+" print [iommu [ioas]] enable debug print\n"
+" noprint [iommu [ioas]] disable debug print\n"
+;
+
int viommu_parse_debug_string(const char *cmdline, struct iommu_debug_params *params)
{
- /* show instances numbers */
- /* send command to instance */
- /* - dump mappings */
- /* - statistics */
- return -ENOSYS;
+ int pos = 0;
+ int ret = -EINVAL;
+ char *cur, *args = strdup(cmdline);
+ params->action = IOMMU_DEBUG_NUM_ACTIONS;
+
+ if (!args)
+ return -ENOMEM;
+
+ params->selector[0] = IOMMU_DEBUG_SELECTOR_INVALID;
+ params->selector[1] = IOMMU_DEBUG_SELECTOR_INVALID;
+
+ cur = strtok(args, " ,:");
+ while (cur) {
+ if (pos > 2)
+ break;
+
+ if (pos > 0) {
+ errno = 0;
+ params->selector[pos - 1] = strtoul(cur, NULL, 0);
+ if (errno) {
+ ret = -errno;
+ pr_err("Invalid number '%s'", cur);
+ break;
+ }
+ } else if (strncmp(cur, "list", 4) == 0) {
+ params->action = IOMMU_DEBUG_LIST;
+ } else if (strncmp(cur, "stats", 5) == 0) {
+ params->action = IOMMU_DEBUG_STATS;
+ } else if (strncmp(cur, "dump", 4) == 0) {
+ params->action = IOMMU_DEBUG_DUMP;
+ } else if (strncmp(cur, "print", 5) == 0) {
+ params->action = IOMMU_DEBUG_SET_PRINT;
+ params->print_enabled = true;
+ } else if (strncmp(cur, "noprint", 7) == 0) {
+ params->action = IOMMU_DEBUG_SET_PRINT;
+ params->print_enabled = false;
+ } else {
+ pr_err("Invalid command '%s'", cur);
+ break;
+ }
+
+ cur = strtok(NULL, " ,:");
+ pos++;
+ ret = 0;
+ }
+
+ if (cur && cur[0])
+ pr_err("Ignoring argument '%s'", cur);
+
+ free(args);
+
+ if (ret)
+ pr_info("Usage:\n%s", debug_usage);
+
+ return ret;
+}
+
+struct viommu_debug_context {
+ int sock;
+ struct iommu_debug_params *params;
+ bool disp;
+};
+
+static int viommu_debug_ioas(struct viommu_dev *viommu,
+ struct viommu_ioas *ioas,
+ void *data)
+{
+ int ret = 0;
+ struct viommu_endpoint *vdev;
+ struct viommu_debug_context *ctx = data;
+
+ if (ctx->disp)
+ dprintf(ctx->sock, " ioas %u\n", ioas->id);
+
+ switch (ctx->params->action) {
+ case IOMMU_DEBUG_LIST:
+ mutex_lock(&ioas->devices_mutex);
+ list_for_each_entry(vdev, &ioas->devices, list) {
+ dprintf(ctx->sock, " device 0x%lx\n",
+ device_to_iommu_id(vdev->dev));
+ }
+ mutex_unlock(&ioas->devices_mutex);
+ break;
+ case IOMMU_DEBUG_STATS:
+ dprintf(ctx->sock, " maps %llu\n",
+ ioas->stats.map);
+ dprintf(ctx->sock, " unmaps %llu\n",
+ ioas->stats.unmap);
+ dprintf(ctx->sock, " resident %llu\n",
+ ioas->stats.resident);
+ break;
+ case IOMMU_DEBUG_SET_PRINT:
+ ioas->debug_enabled = ctx->params->print_enabled;
+ break;
+ default:
+ ret = -ENOSYS;
+ }
+
+ if (ioas->ops->debug_address_space)
+ ret = ioas->ops->debug_address_space(ioas->priv, ctx->sock,
+ ctx->params);
+
+ return ret;
+}
+
+static int viommu_debug_iommu(struct viommu_dev *viommu,
+ struct viommu_debug_context *ctx)
+{
+ struct viommu_ioas *ioas;
+
+ if (ctx->disp)
+ dprintf(ctx->sock, "iommu %u \"%s\"\n", viommu->id,
+ viommu->properties->name);
+
+ if (ctx->params->selector[1] != IOMMU_DEBUG_SELECTOR_INVALID) {
+ ioas = viommu_find_ioas(viommu, ctx->params->selector[1]);
+ return ioas ? viommu_debug_ioas(viommu, ioas, ctx) : -ESRCH;
+ }
+
+ switch (ctx->params->action) {
+ case IOMMU_DEBUG_STATS:
+ dprintf(ctx->sock, " kicks %llu\n",
+ viommu->stats.kicks);
+ dprintf(ctx->sock, " requests %llu\n",
+ viommu->stats.requests);
+ break;
+ case IOMMU_DEBUG_SET_PRINT:
+ viommu->debug_enabled = ctx->params->print_enabled;
+ break;
+ default:
+ break;
+ }
+
+ return viommu_for_each_ioas(viommu, viommu_debug_ioas, ctx);
}

int viommu_debug(int sock, struct iommu_debug_params *params)
{
- return -ENOSYS;
+ int ret = -ESRCH;
+ bool match;
+ struct viommu_dev *viommu;
+ bool any = (params->selector[0] == IOMMU_DEBUG_SELECTOR_INVALID);
+
+ struct viommu_debug_context ctx = {
+ .sock = sock,
+ .params = params,
+ };
+
+ if (params->action == IOMMU_DEBUG_LIST ||
+ params->action == IOMMU_DEBUG_STATS)
+ ctx.disp = true;
+
+ mutex_lock(&viommus_mutex);
+ list_for_each_entry(viommu, &viommus, list) {
+ match = (params->selector[0] == viommu->id);
+ if (match || any) {
+ ret = viommu_debug_iommu(viommu, &ctx);
+ if (ret || match)
+ break;
+ }
+ }
+ mutex_unlock(&viommus_mutex);
+
+ if (ret)
+ dprintf(sock, "error: %s\n", strerror(-ret));
+
+ return ret;
}
diff --git a/virtio/mmio.c b/virtio/mmio.c
index 699d4403..7d39120a 100644
--- a/virtio/mmio.c
+++ b/virtio/mmio.c
@@ -307,6 +307,7 @@ static struct iommu_ops virtio_mmio_iommu_ops = {
.get_properties = virtio__iommu_get_properties,
.alloc_address_space = iommu_alloc_address_space,
.free_address_space = iommu_free_address_space,
+ .debug_address_space = iommu_debug_address_space,
.attach = virtio_mmio_iommu_attach,
.detach = virtio_mmio_iommu_detach,
.map = iommu_map,
diff --git a/virtio/pci.c b/virtio/pci.c
index c9f0e558..c5d30eb2 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -442,6 +442,7 @@ static struct iommu_ops virtio_pci_iommu_ops = {
.get_properties = virtio__iommu_get_properties,
.alloc_address_space = iommu_alloc_address_space,
.free_address_space = iommu_free_address_space,
+ .debug_address_space = iommu_debug_address_space,
.attach = virtio_pci_iommu_attach,
.detach = virtio_pci_iommu_detach,
.map = iommu_map,
--
2.12.1
Jean-Philippe Brucker
2017-04-07 19:24:55 UTC
Permalink
This is for development only. Virtual devices might blow up unexpectedly.
In general it seems to work (slowing devices down by a factor of two, of
course). virtio-scsi, virtio-rng and virtio-balloon are still untested.

Signed-off-by: Jean-Philippe Brucker <jean-***@arm.com>
---
virtio/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/virtio/core.c b/virtio/core.c
index 66e0cecb..4ca632f9 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -1,4 +1,5 @@
#include <linux/virtio_config.h>
+#include <linux/virtio_ids.h>
#include <linux/virtio_ring.h>
#include <linux/types.h>
#include <sys/uio.h>
@@ -369,6 +370,8 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
{
void *virtio;

+ vdev->use_iommu = kvm->cfg.viommu && subsys_id != VIRTIO_ID_IOMMU;
+
switch (trans) {
case VIRTIO_PCI:
virtio = calloc(sizeof(struct virtio_pci), 1);
--
2.12.1
Bharat Bhushan
2017-05-22 08:26:12 UTC
Permalink
Hi Jean,

I am trying to run and review this on my side, but I see the Linux patches are not based on the latest kernel version.
Will it be possible for you to share your Linux and kvmtool git repository reference?

Thanks
-Bharat
-----Original Message-----
From: Jean-Philippe Brucker
Sent: Saturday, April 08, 2017 12:55 AM
Subject: [RFC PATCH kvmtool 00/15] Add virtio-iommu
Implement a virtio-iommu device and translate DMA traffic from vfio and
virtio devices. Virtio needed some rework to support scatter-gather accesses
to vring and buffers at page granularity. Patch 3 implements the actual
virtio-iommu device.
Adding --viommu on the command-line now inserts a virtual IOMMU in front
$ lkvm run -k Image --console virtio -p console=hvc0 \
--viommu --vfio 0 --vfio 4 --irqchip gicv3-its
...
[ 2.998949] virtio_iommu virtio0: probe successful
[ 3.007739] virtio_iommu virtio1: probe successful
...
[ 3.165023] iommu: Adding device 0000:00:00.0 to group 0
[ 3.536480] iommu: Adding device 10200.virtio to group 1
[ 3.553643] iommu: Adding device 10600.virtio to group 2
[ 3.570687] iommu: Adding device 10800.virtio to group 3
[ 3.627425] iommu: Adding device 10a00.virtio to group 4
[ 7.823689] iommu: Adding device 0000:00:01.0 to group 5
...
Patches 13 and 14 add debug facilities. Some statistics are gathered for each
$ lkvm debug -n guest-1210 --iommu stats
iommu 0 "viommu-vfio"
kicks 1255
requests 1256
ioas 1
maps 7
unmaps 4
resident 2101248
ioas 6
maps 623
unmaps 620
resident 16384
iommu 1 "viommu-virtio"
kicks 11426
requests 11431
ioas 2
maps 2836
unmaps 2835
resident 8192
accesses 2836
...
This is based on the VFIO patchset[1], itself based on Andre's ITS work.
The VFIO bits have only been tested on a software model and are unlikely to
work on actual hardware, but I also tested virtio on an ARM Juno.
[1] http://www.spinics.net/lists/kvm/msg147624.html
virtio: synchronize virtio-iommu headers with Linux
FDT: (re)introduce a dynamic phandle allocator
virtio: add virtio-iommu
Add a simple IOMMU
iommu: describe IOMMU topology in device-trees
irq: register MSI doorbell addresses
virtio: factor virtqueue initialization
virtio: add vIOMMU instance for virtio devices
virtio: access vring and buffers through IOMMU mappings
virtio-pci: translate MSIs with the virtual IOMMU
virtio: set VIRTIO_F_IOMMU_PLATFORM when necessary
vfio: add support for virtual IOMMU
virtio-iommu: debug via IPC
virtio-iommu: implement basic debug commands
virtio: use virtio-iommu when available
Makefile | 3 +
arm/gic.c | 4 +
arm/include/arm-common/fdt-arch.h | 2 +-
arm/pci.c | 49 ++-
builtin-debug.c | 8 +-
builtin-run.c | 2 +
fdt.c | 35 ++
include/kvm/builtin-debug.h | 6 +
include/kvm/devices.h | 4 +
include/kvm/fdt.h | 20 +
include/kvm/iommu.h | 105 +++++
include/kvm/irq.h | 3 +
include/kvm/kvm-config.h | 1 +
include/kvm/vfio.h | 2 +
include/kvm/virtio-iommu.h | 15 +
include/kvm/virtio-mmio.h | 1 +
include/kvm/virtio-pci.h | 2 +
include/kvm/virtio.h | 137 +++++-
include/linux/virtio_config.h | 74 ++++
include/linux/virtio_ids.h | 4 +
include/linux/virtio_iommu.h | 142 ++++++
iommu.c | 240 ++++++++++
irq.c | 35 ++
kvm-ipc.c | 43 +-
mips/include/kvm/fdt-arch.h | 2 +-
powerpc/include/kvm/fdt-arch.h | 2 +-
vfio.c | 281 +++++++++++-
virtio/9p.c | 7 +-
virtio/balloon.c | 7 +-
virtio/blk.c | 10 +-
virtio/console.c | 7 +-
virtio/core.c | 240 ++++++++--
virtio/iommu.c | 902 ++++++++++++++++++++++++++++++++++++++++++
virtio/mmio.c | 44 +-
virtio/net.c | 8 +-
virtio/pci.c | 61 ++-
virtio/rng.c | 6 +-
virtio/scsi.c | 6 +-
x86/include/kvm/fdt-arch.h | 2 +-
39 files changed, 2389 insertions(+), 133 deletions(-)
create mode 100644 fdt.c
create mode 100644 include/kvm/iommu.h
create mode 100644 include/kvm/virtio-iommu.h
create mode 100644 include/linux/virtio_config.h
create mode 100644 include/linux/virtio_iommu.h
create mode 100644 iommu.c
create mode 100644 virtio/iommu.c
--
2.12.1
_______________________________________________
Virtualization mailing list
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Jean-Philippe Brucker
2017-05-22 14:01:45 UTC
Permalink
Hi Bharat,
Post by Bharat Bhushan
Hi Jean,
I am trying to run and review on my side but I see Linux patches are not with latest kernel version.
Will it be possible for you to share your Linux and kvmtool git repository reference?
Please find linux and kvmtool patches at the following repos:

git://linux-arm.org/kvmtool-jpb.git virtio-iommu/base
git://linux-arm.org/linux-jpb.git virtio-iommu/base

Note that these branches are unstable, subject to fixes and rebase. I'll
try to keep them in sync with upstream.

Thanks,
Jean
Michael S. Tsirkin
2017-04-07 21:19:22 UTC
Permalink
Post by Jean-Philippe Brucker
There are a number of advantages in a paravirtualized IOMMU over a full
emulation. It is portable and could be reused on different architectures.
It is easier to implement than a full emulation, with less state tracking.
It might be more efficient in some cases, with less context switches to
the host and the possibility of in-kernel emulation.
Thanks, this is very interesting. I am ready to read it all, but I really
would like you to expand some more on the motivation for this work.
Productising this would be quite a bit of work. Spending just 6 lines on
motivation seems somewhat disproportionate. In particular, do you have
any specific efficiency measurements or estimates that you can share?
--
MST
Jean-Philippe Brucker
2017-04-10 18:39:24 UTC
Permalink
Post by Michael S. Tsirkin
Post by Jean-Philippe Brucker
There are a number of advantages in a paravirtualized IOMMU over a full
emulation. It is portable and could be reused on different architectures.
It is easier to implement than a full emulation, with less state tracking.
It might be more efficient in some cases, with less context switches to
the host and the possibility of in-kernel emulation.
Thanks, this is very interesting. I am ready to read it all, but I really
would like you to expand some more on the motivation for this work.
Productising this would be quite a bit of work. Spending just 6 lines on
motivation seems somewhat disproportionate. In particular, do you have
any specific efficiency measurements or estimates that you can share?
The main motivation for this work is to bring IOMMU virtualization to the
ARM world. We don't have any at the moment, and a full ARM SMMU
virtualization solution would be counter-productive. We would have to do
it for SMMUv2, for the completely orthogonal SMMUv3, and for any future
version of the architecture. Doing so in userspace might be acceptable,
but then for performance reasons people will want in-kernel emulation of
every IOMMU variant out there, which is a maintenance and security
nightmare. A single generic vIOMMU is preferable because it reduces
maintenance cost and attack surface.

The transport code is the same as any virtio device, both for userspace
and in-kernel implementations. So instead of rewriting everything from
scratch (and the lot of bugs that go with it) for each IOMMU variation, we
reuse well-tested code for transport and write the emulation layer once
and for all.

Note that this work applies to any architecture with an IOMMU, not only
ARM and their partners'. Introducing an IOMMU specially designed for
virtualization allows us to get rid of complex state tracking inherent to
full IOMMU emulations. With a full emulation, all guest accesses to page
table and configuration structures have to be trapped and interpreted. A
Virtio interface provides well-defined semantics and doesn't need to guess
what the guest is trying to do. It transmits requests made from guest
device drivers to host IOMMU almost unaltered, removing the intermediate
layer of arch-specific configuration structures and page tables.

Using a portable standard like Virtio also allows for efficient IOMMU
virtualization when guest and host are built for different architectures
(for instance when using Qemu TCG.) In-kernel emulation would still work
with vhost-iommu, but platform-specific vIOMMUs would have to stay in
userspace.

I don't have any measurements at the moment, it is a bit early for that.
The kvmtool example was developed on a software model and is mostly here
for illustrative purposes; a Qemu implementation would be more suitable for
performance analysis. I wouldn't be able to give meaning to these numbers
anyway, since on ARM we don't have any existing solution to compare it
against. One could compare the complexity of handling guest accesses and
parsing page tables in Qemu's VT-d emulation with reading a chain of
buffers in Virtio, for a very rough estimate.

Thanks,
Jean-Philippe
Michael S. Tsirkin
2017-04-10 20:04:45 UTC
Permalink
Post by Jean-Philippe Brucker
Post by Michael S. Tsirkin
Post by Jean-Philippe Brucker
There are a number of advantages in a paravirtualized IOMMU over a full
emulation. It is portable and could be reused on different architectures.
It is easier to implement than a full emulation, with less state tracking.
It might be more efficient in some cases, with less context switches to
the host and the possibility of in-kernel emulation.
Thanks, this is very interesting. I am ready to read it all, but I really
would like you to expand some more on the motivation for this work.
Productising this would be quite a bit of work. Spending just 6 lines on
motivation seems somewhat disproportionate. In particular, do you have
any specific efficiency measurements or estimates that you can share?
The main motivation for this work is to bring IOMMU virtualization to the
ARM world. We don't have any at the moment, and a full ARM SMMU
virtualization solution would be counter-productive. We would have to do
it for SMMUv2, for the completely orthogonal SMMUv3, and for any future
version of the architecture. Doing so in userspace might be acceptable,
but then for performance reasons people will want in-kernel emulation of
every IOMMU variant out there, which is a maintenance and security
nightmare. A single generic vIOMMU is preferable because it reduces
maintenance cost and attack surface.
The transport code is the same as any virtio device, both for userspace
and in-kernel implementations. So instead of rewriting everything from
scratch (and the lot of bugs that go with it) for each IOMMU variation, we
reuse well-tested code for transport and write the emulation layer once
and for all.
Note that this work applies to any architecture with an IOMMU, not only
ARM and their partners'. Introducing an IOMMU specially designed for
virtualization allows us to get rid of complex state tracking inherent to
full IOMMU emulations. With a full emulation, all guest accesses to page
table and configuration structures have to be trapped and interpreted. A
Virtio interface provides well-defined semantics and doesn't need to guess
what the guest is trying to do. It transmits requests made from guest
device drivers to host IOMMU almost unaltered, removing the intermediate
layer of arch-specific configuration structures and page tables.
Using a portable standard like Virtio also allows for efficient IOMMU
virtualization when guest and host are built for different architectures
(for instance when using Qemu TCG.) In-kernel emulation would still work
with vhost-iommu, but platform-specific vIOMMUs would have to stay in
userspace.
I don't have any measurements at the moment, it is a bit early for that.
The kvmtool example was developed on a software model and is mostly here
for illustrative purposes; a Qemu implementation would be more suitable for
performance analysis. I wouldn't be able to give meaning to these numbers
anyway, since on ARM we don't have any existing solution to compare it
against. One could compare the complexity of handling guest accesses and
parsing page tables in Qemu's VT-d emulation with reading a chain of
buffers in Virtio, for a very rough estimate.
Thanks,
Jean-Philippe
This last suggestion sounds very reasonable.
--
MST
Alex Williamson
2017-04-10 04:19:45 UTC
Permalink
On Mon, 10 Apr 2017 08:00:45 +0530
Hi All,
We have drivers/vfio/vfio_iommu_type1.c. What is a type1 IOMMU? Is it
named w.r.t. the VFIO layer?
Is there a type2 IOMMU w.r.t. VFIO? If so, what is it?
type1 is the 1st type. It's an arbitrary name. There is no type2, yet.
Jason Wang
2017-04-12 09:06:43 UTC
Permalink
Post by Jean-Philippe Brucker
This is the initial proposal for a paravirtualized IOMMU device using
virtio transport. It contains a description of the device, a Linux driver,
and a toy implementation in kvmtool. With this prototype, you can
translate DMA to guest memory from emulated (virtio), or passed-through
(VFIO) devices.
In its simplest form, implemented here, the device handles map/unmap
requests from the guest. Future extensions proposed in "RFC 3/3" should
allow binding page tables to devices.
There are a number of advantages in a paravirtualized IOMMU over a full
emulation. It is portable and could be reused on different architectures.
It is easier to implement than a full emulation, with less state tracking.
It might be more efficient in some cases, with less context switches to
the host and the possibility of in-kernel emulation.
I like the idea. Considering the complexity of IOMMU hardware, I believe we
don't want to have, and fight bugs in, three or more different IOMMU
implementations in either userspace or the kernel.

Thanks
Post by Jean-Philippe Brucker
When designing it and writing the kvmtool device, I considered two main
scenarios, illustrated below.
Scenario 1: a hardware device passed through twice via VFIO
MEM____pIOMMU________PCI device________________________ HARDWARE
| (2b) \
----------|-------------+-------------+------------------\-------------
| : KVM : \
| : : \
pIOMMU drv : _______virtio-iommu drv \ KERNEL
| : | : | \
VFIO : | : VFIO \
| : | : | \
| : | : | /
----------|-------------+--------|----+----------|------------/--------
| | : | /
| (1c) (1b) | : (1a) | / (2a)
| | : | /
| | : | / USERSPACE
|___virtio-iommu dev___| : net drv___/
--------------------------------------+--------------------------------
HOST : GUEST
(1) a. Guest userspace is running a net driver (e.g. DPDK). It allocates a
buffer with mmap, obtaining virtual address VA. It then sends a
VFIO_IOMMU_MAP_DMA request to map VA to an IOVA (possibly VA=IOVA).
b. The mapping request is relayed to the host through virtio
(VIRTIO_IOMMU_T_MAP).
c. The mapping request is relayed to the physical IOMMU through VFIO.
(2) a. The guest userspace driver can now instruct the device to directly
access the buffer at IOVA
b. IOVA accesses from the device are translated into physical
addresses by the IOMMU.
Scenario 2: a virtual net device behind a virtual IOMMU.
MEM__pIOMMU___PCI device HARDWARE
| |
-------|---------|------+-------------+-------------------------------
\ | : _____________virtio-net drv KERNEL
\_net drv : | : / (1a)
| : | : /
tap : | ________virtio-iommu drv
| : | | : (1b)
-----------------|------+-----|---|---+-------------------------------
/ | : USERSPACE
--------------------------------------+-------------------------------
HOST : GUEST
(1) a. Guest virtio-net driver maps the virtio ring and a buffer
b. The mapping requests are relayed to the host through virtio.
(2) The virtio-net device now needs to access any guest memory via the
IOMMU.
Physical and virtual IOMMUs are completely dissociated. The net driver is
mapping its own buffers via DMA/IOMMU API, and buffers are copied between
virtio-net and tap.
The description itself seemed too long for a single email, so I split it
into three documents, and will attach Linux and kvmtool patches to this
email.
1. Firmware note,
2. device operations (draft for the virtio specification),
3. future work/possible improvements.
pIOMMU physical IOMMU, controlling DMA accesses from physical devices
vIOMMU virtual IOMMU (virtio-iommu), controlling DMA accesses from
physical and virtual devices to guest memory.
GVA, GPA, HVA, HPA
Guest/Host Virtual/Physical Address
IOVA I/O Virtual Address, the address accessed by a device doing DMA
through an IOMMU. In the context of a guest OS, IOVA is GVA.
Note: kvmtool is GPLv2. Linux patches are GPLv2, except for UAPI
virtio-iommu.h header, which is BSD 3-clause. For the time being, the
specification draft in RFC 2/3 is also BSD 3-clause.
This proposal may be unintentionally centered around ARM architectures at
times. Any feedback would be appreciated, especially regarding other IOMMU
architectures.
Thanks,
Jean-Philippe
Tian, Kevin
2017-04-13 08:16:26 UTC
Permalink
From: Jason Wang
Sent: Wednesday, April 12, 2017 5:07 PM
Post by Jean-Philippe Brucker
This is the initial proposal for a paravirtualized IOMMU device using
virtio transport. It contains a description of the device, a Linux driver,
and a toy implementation in kvmtool. With this prototype, you can
translate DMA to guest memory from emulated (virtio), or passed-through
(VFIO) devices.
In its simplest form, implemented here, the device handles map/unmap
requests from the guest. Future extensions proposed in "RFC 3/3" should
allow binding page tables to devices.
There are a number of advantages in a paravirtualized IOMMU over a full
emulation. It is portable and could be reused on different architectures.
It is easier to implement than a full emulation, with less state tracking.
It might be more efficient in some cases, with less context switches to
the host and the possibility of in-kernel emulation.
I like the idea. Considering the complexity of IOMMU hardware, I believe we
don't want to have, and fight bugs in, three or more different IOMMU
implementations in either userspace or the kernel.
Though there are definitely positive things around the pvIOMMU approach,
it also has some limitations:

- Existing IOMMU implementations have been in old distros for quite some
time, while the pvIOMMU driver will only land in future distros. Supporting
pvIOMMU only means we completely drop support for old distros in VMs;

- The situation is similar for other guest OSes, e.g. Windows. The IOMMU is
a key kernel component, and I'm not sure a pvIOMMU through virtio can be
recognized in those OSes (unlike a plain virtio device driver);

I would imagine both fully emulated IOMMUs and pvIOMMU will co-exist
for some time, for the above reasons. Someday, when pvIOMMU is mature and
widespread enough in the ecosystem (and feature-wise comparable to fully
emulated IOMMUs for all vendors), we may make that call.

Thanks,
Kevin
Jean-Philippe Brucker
2017-04-13 13:12:19 UTC
Permalink
Post by Tian, Kevin
From: Jason Wang
Sent: Wednesday, April 12, 2017 5:07 PM
Post by Jean-Philippe Brucker
This is the initial proposal for a paravirtualized IOMMU device using
virtio transport. It contains a description of the device, a Linux driver,
and a toy implementation in kvmtool. With this prototype, you can
translate DMA to guest memory from emulated (virtio), or passed-through
(VFIO) devices.
In its simplest form, implemented here, the device handles map/unmap
requests from the guest. Future extensions proposed in "RFC 3/3" should
allow binding page tables to devices.
There are a number of advantages in a paravirtualized IOMMU over a full
emulation. It is portable and could be reused on different architectures.
It is easier to implement than a full emulation, with less state tracking.
It might be more efficient in some cases, with less context switches to
the host and the possibility of in-kernel emulation.
I like the idea. Considering the complexity of IOMMU hardware, I believe we
don't want to have, and fight bugs in, three or more different IOMMU
implementations in either userspace or the kernel.
Though there are definitely positive things around pvIOMMU approach,
- Existing IOMMU implementations have been in old distros for quite some
time, while the pvIOMMU driver will only land in future distros. Supporting
pvIOMMU only means we completely drop support for old distros in VMs;
- The situation is similar for other guest OSes, e.g. Windows. The IOMMU is
a key kernel component, and I'm not sure a pvIOMMU through virtio can be
recognized in those OSes (unlike a plain virtio device driver);
I can't talk about other OSes, but on Linux virtio-iommu is implemented
the same way as other IOMMU drivers and doesn't require core modifications.
Post by Tian, Kevin
I would imagine both fully emulated IOMMUs and pvIOMMU will co-exist
for some time, for the above reasons. Someday, when pvIOMMU is mature and
widespread enough in the ecosystem (and feature-wise comparable to fully
emulated IOMMUs for all vendors), we may make that call.
Agreed. The main inconvenience of any paravirtualized device is that it
needs additional support in the guest. It is not our intention to disrupt
all the work done on IOMMU virtualization for x86 and other architectures.
Even for ARM, people might want to provide SMMU emulations to unmodified
guests, implemented in userspace. What we intend to avoid, as detailed in
my other reply, is in-kernel emulation of all possible ARM-based IOMMU
variations for Linux. So we propose a generic alternative from the start,
that others can reuse later.

Thanks,
Jean-Philippe
Tian, Kevin
2017-04-13 08:41:01 UTC
Permalink
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
This is the initial proposal for a paravirtualized IOMMU device using
virtio transport. It contains a description of the device, a Linux driver,
and a toy implementation in kvmtool. With this prototype, you can
translate DMA to guest memory from emulated (virtio), or passed-through
(VFIO) devices.
In its simplest form, implemented here, the device handles map/unmap
requests from the guest. Future extensions proposed in "RFC 3/3" should
allow binding page tables to devices.
There are a number of advantages in a paravirtualized IOMMU over a full
emulation. It is portable and could be reused on different architectures.
It is easier to implement than a full emulation, with less state tracking.
It might be more efficient in some cases, with less context switches to
the host and the possibility of in-kernel emulation.
When designing it and writing the kvmtool device, I considered two main
scenarios, illustrated below.
Scenario 1: a hardware device passed through twice via VFIO
MEM____pIOMMU________PCI device________________________
HARDWARE
| (2b) \
----------|-------------+-------------+------------------\-------------
| : KVM : \
| : : \
pIOMMU drv : _______virtio-iommu drv \ KERNEL
| : | : | \
VFIO : | : VFIO \
| : | : | \
| : | : | /
----------|-------------+--------|----+----------|------------/--------
| | : | /
| (1c) (1b) | : (1a) | / (2a)
| | : | /
| | : | / USERSPACE
|___virtio-iommu dev___| : net drv___/
--------------------------------------+--------------------------------
HOST : GUEST
Usually people draw such layers in reverse order, e.g. hw in the
bottom then kernel in the middle then user in the top. :-)
(1) a. Guest userspace is running a net driver (e.g. DPDK). It allocates a
buffer with mmap, obtaining virtual address VA. It then sends a
VFIO_IOMMU_MAP_DMA request to map VA to an IOVA (possibly VA=IOVA).
b. The mapping request is relayed to the host through virtio
(VIRTIO_IOMMU_T_MAP).
c. The mapping request is relayed to the physical IOMMU through VFIO.
(2) a. The guest userspace driver can now instruct the device to directly
access the buffer at IOVA
b. IOVA accesses from the device are translated into physical
addresses by the IOMMU.
Scenario 2: a virtual net device behind a virtual IOMMU.
MEM__pIOMMU___PCI device HARDWARE
| |
-------|---------|------+-------------+-------------------------------
\ | : _____________virtio-net drv KERNEL
\_net drv : | : / (1a)
| : | : /
tap : | ________virtio-iommu drv
| : | | : (1b)
-----------------|------+-----|---|---+-------------------------------
/ | : USERSPACE
--------------------------------------+-------------------------------
HOST : GUEST
(1) a. Guest virtio-net driver maps the virtio ring and a buffer
b. The mapping requests are relayed to the host through virtio.
(2) The virtio-net device now needs to access any guest memory via the
IOMMU.
Physical and virtual IOMMUs are completely dissociated. The net driver is
mapping its own buffers via DMA/IOMMU API, and buffers are copied between
virtio-net and tap.
The description itself seemed too long for a single email, so I split it
into three documents, and will attach Linux and kvmtool patches to this
email.
1. Firmware note,
2. device operations (draft for the virtio specification),
3. future work/possible improvements.
pIOMMU physical IOMMU, controlling DMA accesses from physical
devices
vIOMMU virtual IOMMU (virtio-iommu), controlling DMA accesses
from
physical and virtual devices to guest memory.
Maybe it's clearer to say controlling 'virtual' DMA accesses, since we're
essentially doing DMA virtualization here. Otherwise I find it a bit
confusing, since DMA accesses from a physical device should be controlled
by the pIOMMU.
GVA, GPA, HVA, HPA
Guest/Host Virtual/Physical Address
IOVA I/O Virtual Address, the address accessed by a device doing DMA
through an IOMMU. In the context of a guest OS, IOVA is GVA.
This statement is not accurate. For kernel DMA protection, it is a
per-device standalone address space (definitely nothing to do with GVA).
For user DMA protection, the user space driver decides how it wants to
construct the IOVA address space: it could be a standalone one, or reuse
GVA. In the virtualization case it is either GPA (w/o vIOMMU) or guest
IOVA (w/ vIOMMU, where the guest creates the IOVA space).

Anyway, the IOVA concept is clear; it would probably still be clear with
the example just removed. :-)
Note: kvmtool is GPLv2. Linux patches are GPLv2, except for UAPI
virtio-iommu.h header, which is BSD 3-clause. For the time being, the
specification draft in RFC 2/3 is also BSD 3-clause.
This proposal may be involuntarily centered around ARM architectures at
times. Any feedback would be appreciated, especially regarding other IOMMU
architectures.
Thanks for doing this. I will definitely look at them in detail and give feedback.

Thanks
Kevin
Jean-Philippe Brucker
2017-04-13 13:12:59 UTC
Permalink
Post by Tian, Kevin
From: Jean-Philippe Brucker
Sent: Saturday, April 8, 2017 3:18 AM
This is the initial proposal for a paravirtualized IOMMU device using
virtio transport. It contains a description of the device, a Linux driver,
and a toy implementation in kvmtool. With this prototype, you can
translate DMA to guest memory from emulated (virtio), or passed-through
(VFIO) devices.
In its simplest form, implemented here, the device handles map/unmap
requests from the guest. Future extensions proposed in "RFC 3/3" should
allow binding page tables to devices.
There are a number of advantages in a paravirtualized IOMMU over a full
emulation. It is portable and could be reused on different architectures.
It is easier to implement than a full emulation, with less state tracking.
It might be more efficient in some cases, with less context switches to
the host and the possibility of in-kernel emulation.
When designing it and writing the kvmtool device, I considered two main
scenarios, illustrated below.
Scenario 1: a hardware device passed through twice via VFIO
MEM____pIOMMU________PCI device________________________
HARDWARE
| (2b) \
----------|-------------+-------------+------------------\-------------
| : KVM : \
| : : \
pIOMMU drv : _______virtio-iommu drv \ KERNEL
| : | : | \
VFIO : | : VFIO \
| : | : | \
| : | : | /
----------|-------------+--------|----+----------|------------/--------
| | : | /
| (1c) (1b) | : (1a) | / (2a)
| | : | /
| | : | / USERSPACE
|___virtio-iommu dev___| : net drv___/
--------------------------------------+--------------------------------
HOST : GUEST
Usually people draw such layers in reverse order, e.g. hw in the
bottom then kernel in the middle then user in the top. :-)
Alright, I'll keep that in mind.
Post by Tian, Kevin
(1) a. Guest userspace is running a net driver (e.g. DPDK). It allocates a
buffer with mmap, obtaining virtual address VA. It then sends a
VFIO_IOMMU_MAP_DMA request to map VA to an IOVA (possibly VA=IOVA).
b. The maping request is relayed to the host through virtio
(VIRTIO_IOMMU_T_MAP).
c. The mapping request is relayed to the physical IOMMU through VFIO.
(2) a. The guest userspace driver can now instruct the device to directly
access the buffer at IOVA
b. IOVA accesses from the device are translated into physical
addresses by the IOMMU.
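Step (1a) is the only part of this chain visible to the guest userspace
driver. A minimal sketch of the request it builds, assuming the standard
VFIO type1 UAPI (the struct below mirrors struct vfio_iommu_type1_dma_map
from <linux/vfio.h>, redefined here only to keep the example
self-contained; real code should include that header):

```c
#include <stdint.h>
#include <assert.h>

/* Mirror of struct vfio_iommu_type1_dma_map from <linux/vfio.h>. */
#define VFIO_DMA_MAP_FLAG_READ  (1u << 0)
#define VFIO_DMA_MAP_FLAG_WRITE (1u << 1)

struct vfio_iommu_type1_dma_map {
	uint32_t argsz;
	uint32_t flags;
	uint64_t vaddr;	/* process virtual address of the mmap'd buffer */
	uint64_t iova;	/* address the device will use for DMA */
	uint64_t size;	/* length of the mapping, page aligned */
};

/* Build the map request of step (1a), choosing IOVA == VA. */
static struct vfio_iommu_type1_dma_map make_map(uint64_t va, uint64_t len)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = va,
		.iova  = va,
		.size  = len,
	};
	/* The driver would then issue
	 *   ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
	 * which the guest's virtio-iommu driver relays as
	 * VIRTIO_IOMMU_T_MAP (1b), and host VFIO programs into the
	 * physical IOMMU (1c). */
	return map;
}
```

The ioctl itself is shown only in the comment, since issuing it needs an
open VFIO container fd.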
Scenario 2: a virtual net device behind a virtual IOMMU.
MEM__pIOMMU___PCI device HARDWARE
| |
-------|---------|------+-------------+-------------------------------
\ | : _____________virtio-net drv KERNEL
\_net drv : | : / (1a)
| : | : /
tap : | ________virtio-iommu drv
| : | | : (1b)
-----------------|------+-----|---|---+-------------------------------
/ | : USERSPACE
--------------------------------------+-------------------------------
HOST : GUEST
(1) a. Guest virtio-net driver maps the virtio ring and a buffer
b. The mapping requests are relayed to the host through virtio.
(2) The virtio-net device now needs to access any guest memory via the
IOMMU.
Physical and virtual IOMMUs are completely dissociated. The net driver is
mapping its own buffers via the DMA/IOMMU API, and buffers are copied
between virtio-net and tap.
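The relay in (1b) leaves the host with a simple bookkeeping job: record
which IOVA ranges the guest has mapped, and translate the emulated
device's accesses through them. A toy host-side model of that (names and
the fixed-size table are my own, not from the spec draft; a real device
implementation would use an interval tree):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Toy model of the virtio-iommu device side for scenario 2: the host
 * records IOVA -> GPA mappings received over the virtqueue (1b) and
 * translates each access from the emulated virtio-net device (2). */
#define MAX_MAPS 16

struct mapping { uint64_t iova, gpa, size; };

static struct mapping maps[MAX_MAPS];
static size_t nmaps;

/* Handle one map request (layout simplified for illustration). */
static int viommu_map(uint64_t iova, uint64_t gpa, uint64_t size)
{
	if (nmaps == MAX_MAPS)
		return -1;
	maps[nmaps++] = (struct mapping){ iova, gpa, size };
	return 0;
}

/* Translate one device access; returns -1 on translation fault. */
static int64_t viommu_translate(uint64_t iova)
{
	for (size_t i = 0; i < nmaps; i++)
		if (iova >= maps[i].iova && iova < maps[i].iova + maps[i].size)
			return maps[i].gpa + (iova - maps[i].iova);
	return -1; /* unmapped: the access must be rejected */
}
```

An access that falls outside every recorded range faults, which is
exactly the protection the guest gains by putting virtio-net behind the
vIOMMU.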
The description itself seemed too long for a single email, so I split it
into three documents, and will attach Linux and kvmtool patches to this
email.
1. Firmware note,
2. device operations (draft for the virtio specification),
3. future work/possible improvements.
pIOMMU  physical IOMMU, controlling DMA accesses from physical
        devices
vIOMMU  virtual IOMMU (virtio-iommu), controlling DMA accesses from
        physical and virtual devices to guest memory.
maybe clearer to say controlling 'virtual' DMA accesses, since we're
essentially doing DMA virtualization here. Otherwise it reads a bit
confusingly, since DMA accesses from physical devices should be
controlled by the pIOMMU.
GVA, GPA, HVA, HPA
Guest/Host Virtual/Physical Address
IOVA I/O Virtual Address, the address accessed by a device doing DMA
through an IOMMU. In the context of a guest OS, IOVA is GVA.
This statement is not accurate. For kernel DMA protection, the IOVA
space is a standalone per-device address space (definitely nothing to do
with GVA). For user DMA protection, the user-space driver decides how it
wants to construct the IOVA address space: it could be a standalone one,
or it could reuse GVA. In the virtualization case it is either GPA (w/o
vIOMMU) or guest IOVA (w/ vIOMMU, where the guest creates the IOVA
space).
anyway the IOVA concept is clear. possibly just removing the example
would still be clear. :-)
Ok, I dropped most IOVA references from the RFC to avoid ambiguity anyway.
I'll tidy up my so-called clarifications next time :)

Thanks,
Jean-Philippe
Post by Tian, Kevin
Note: kvmtool is GPLv2. Linux patches are GPLv2, except for UAPI
virtio-iommu.h header, which is BSD 3-clause. For the time being, the
specification draft in RFC 2/3 is also BSD 3-clause.
This proposal may be involuntarily centered around ARM architectures at
times. Any feedback would be appreciated, especially regarding other IOMMU
architectures.
thanks for doing this. will definitely look at them in detail and send feedback.
Thanks
Kevin