229 lines
8.5 KiB
Diff
From 71aa0219f7c84cbf175eb2a091d48d5fd5daa40b Mon Sep 17 00:00:00 2001
From: Zhenzhong Duan <zhenzhong.duan@intel.com>
Date: Tue, 21 Nov 2023 16:44:26 +0800
Subject: [PATCH 047/101] docs/devel: Add VFIO iommufd backend documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Eric Auger <eric.auger@redhat.com>
RH-MergeRequest: 211: IOMMUFD backend backport
RH-Jira: RHEL-19302 RHEL-21057
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Acked-by: Sebastian Ott <sebott@redhat.com>
RH-Commit: [46/67] 6cf49d00e87788f894d690a985bb6798eae24505 (eauger1/centos-qemu-kvm)

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 98dad2b01931f6064c6c4b48ca3c2a1d9f542cd8)
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 MAINTAINERS                    |   1 +
 docs/devel/index-internals.rst |   1 +
 docs/devel/vfio-iommufd.rst    | 166 +++++++++++++++++++++++++++++++++
 3 files changed, 168 insertions(+)
 create mode 100644 docs/devel/vfio-iommufd.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ca70bb4e64..0ddb20a35f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2176,6 +2176,7 @@ F: backends/iommufd.c
 F: include/sysemu/iommufd.h
 F: include/qemu/chardev_open.h
 F: util/chardev_open.c
+F: docs/devel/vfio-iommufd.rst
 
 vhost
 M: Michael S. Tsirkin <mst@redhat.com>
diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index 6f81df92bc..3def4a138b 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -18,5 +18,6 @@ Details about QEMU's various subsystems including how to add features to them.
    s390-dasd-ipl
    tracing
    vfio-migration
+   vfio-iommufd
    writing-monitor-commands
    virtio-backends
diff --git a/docs/devel/vfio-iommufd.rst b/docs/devel/vfio-iommufd.rst
new file mode 100644
index 0000000000..3d1c11f175
--- /dev/null
+++ b/docs/devel/vfio-iommufd.rst
@@ -0,0 +1,166 @@
+===============================
+IOMMUFD BACKEND usage with VFIO
+===============================
+
+(The terms backend, container and BE are used interchangeably below.)
+
+With the introduction of iommufd, the Linux kernel provides a generic
+interface for user space drivers to propagate their DMA mappings to the kernel
+for assigned devices. While the legacy kernel interface is group-centric,
+the new iommufd interface is device-centric, relying on device fd and iommufd.
+
+To support both interfaces in the QEMU VFIO device, a base container abstracts
+the common part of the VFIO legacy and iommufd containers, so that the generic
+VFIO code can use either container.
+
+The base container implements generic functions such as memory_listener and
+address space management whereas the derived container implements callbacks
+specific to either legacy or iommufd. Each container has its own way to set up
+the secure context and the DMA management interface. The diagram below shows
+how both containers fit together.
+
+::
+
+                      VFIO                           AddressSpace/Memory
+      +-------+  +----------+  +-----+  +-----+
+      |  pci  |  | platform |  |  ap |  | ccw |
+      +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
+          |           |           |        |        |   AddressSpace       |
+          |           |           |        |        +------------+---------+
+      +---V-----------V-----------V--------V----+               /
+      |           VFIOAddressSpace              | <------------+
+      |                  |                      |  MemoryListener
+      |        VFIOContainerBase list           |
+      +-------+----------------------------+----+
+              |                            |
+              |                            |
+      +-------V------+            +--------V----------+
+      |   iommufd    |            |    vfio legacy    |
+      |  container   |            |     container     |
+      +-------+------+            +--------+----------+
+              |                            |
+              | /dev/iommu                 | /dev/vfio/vfio
+              | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
+  Userspace   |                            |
+  ============+============================+===========================
+  Kernel      |  device fd                 |
+              +---------------+            | group/container fd
+              | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
+              |  ATTACH_IOAS) |            | device fd
+              |               |            |
+              |       +-------V------------V-----------------+
+      iommufd |       |                 vfio                 |
+   (map/unmap |       +---------+--------------------+-------+
+   ioas_copy) |                 |                    | map/unmap
+              |                 |                    |
+      +-------V------+    +-----V------+      +------V--------+
+      | iommufd core |    |  device    |      |  vfio iommu   |
+      +--------------+    +------------+      +---------------+
+
+* Secure Context setup
+
+  - iommufd BE: uses device fd and iommufd to set up the secure context
+    (bind_iommufd, attach_ioas)
+  - vfio legacy BE: uses group fd and container fd to set up the secure context
+    (set_container, set_iommu)
+
+* Device access
+
+  - iommufd BE: device fd is opened through ``/dev/vfio/devices/vfioX``
+  - vfio legacy BE: device fd is retrieved from group fd ioctl
+
+* DMA Mapping flow
+
+  1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
+  2. VFIO populates DMA map/unmap via the container BEs
+     * iommufd BE: uses iommufd
+     * vfio legacy BE: uses container fd
+
+Example configuration
+=====================
+
+Step 1: configure the host device
+---------------------------------
+
+It is exactly the same as for a VFIO device with the legacy VFIO container.
+
+Step 2: configure QEMU
+----------------------
+
+Interactions with ``/dev/iommu`` are abstracted by a new iommufd
+object (compiled in with the ``CONFIG_IOMMUFD`` option).
+
+Any QEMU device (e.g. VFIO device) wishing to use ``/dev/iommu`` must
+be linked with an iommufd object. It gets a new optional property
+named ``iommufd`` which allows an iommufd object to be passed. Take the
+``vfio-pci`` device as an example:
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0
+    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
+
+Note that ``/dev/iommu`` and the VFIO cdev can be opened externally by a
+management layer. In that case the fds are passed to QEMU; the ``fd`` property
+accepts either a number or a string naming the fd, for example:
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0,fd=22
+    -device vfio-pci,iommufd=iommufd0,fd=23
+
+If the ``fd`` property is not passed, the fd is opened by QEMU.
+
+If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
+is not used and the user gets the behavior based on the legacy VFIO
+container:
+
+.. code-block:: bash
+
+    -device vfio-pci,host=0000:02:00.0
+
+Supported platform
+==================
+
+x86, ARM and s390x are currently supported.
+
+Caveats
+=======
+
+Dirty page sync
+---------------
+
+Dirty page sync is not yet supported with the iommufd backend, so live
+migration is disabled by default. It can be force-enabled as shown below,
+although this is inefficient.
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0
+    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on
+
+P2P DMA
+-------
+
+PCI p2p DMA is unsupported as IOMMUFD cannot yet map hardware PCI BAR regions.
+The warning below is shown for assigned PCI devices; it is not a bug.
+
+.. code-block:: none
+
+    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
+    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
+
+FD passing with mdev
+--------------------
+
+The ``vfio-pci`` device checks the ``sysfsdev`` property to decide whether the
+backend is an mdev. If FD passing is used, there is no way to know that and the
+mdev is treated like a real PCI device. The error below is reported if the user
+tries to enable RAM discarding for such an mdev.
+
+.. code-block:: none
+
+    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices
+
+``vfio-ap`` and ``vfio-ccw`` devices do not have this issue, as their backend
+devices are always mdevs and RAM discarding is force-enabled.
-- 
2.39.3
