126 lines
4.9 KiB
Diff
126 lines
4.9 KiB
Diff
|
From 2edb7c1181fb69e410ffc688986a12d36899f976 Mon Sep 17 00:00:00 2001
|
||
|
From: Miroslav Rezanina <mrezanin@redhat.com>
|
||
|
Date: Mon, 19 Aug 2019 08:54:16 +0100
|
||
|
Subject: [PATCH 4/7] spapr: Reset CAS & IRQ subsystem after devices
|
||
|
MIME-Version: 1.0
|
||
|
Content-Type: text/plain; charset=UTF-8
|
||
|
Content-Transfer-Encoding: 8bit
|
||
|
|
||
|
RH-Author: Miroslav Rezanina <mrezanin@redhat.com>
|
||
|
Message-id: <9b7c319c271fa2c8cda410e87aef985d8c180049.1566204425.git.mrezanin@redhat.com>
|
||
|
Patchwork-id: 90057
|
||
|
O-Subject: [RHEL-AV-8.1 qemu-kvm PATCH 2/5] spapr: Reset CAS & IRQ subsystem after devices
|
||
|
Bugzilla: 1733977
|
||
|
RH-Acked-by: Philippe Mathieu-Daudé <philmd@redhat.com>
|
||
|
RH-Acked-by: Yash Mankad <ymankad@redhat.com>
|
||
|
RH-Acked-by: Danilo de Paula <ddepaula@redhat.com>
|
||
|
|
||
|
From: David Gibson <david@gibson.dropbear.id.au>
|
||
|
|
||
|
Bugzilla: 1733977
|
||
|
|
||
|
This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
|
||
|
caused by the new "dual" interrupt controller model. Specifically,
|
||
|
qemu can crash when used with KVM if a 'system_reset' is requested
|
||
|
while there's active I/O in the guest.
|
||
|
|
||
|
The problem is that in spapr_machine_reset() we:
|
||
|
|
||
|
1. Reset the CAS vector state
|
||
|
spapr_ovec_cleanup(spapr->ov5_cas);
|
||
|
|
||
|
2. Reset all devices
|
||
|
qemu_devices_reset()
|
||
|
|
||
|
3. Reset the irq subsystem
|
||
|
spapr_irq_reset();
|
||
|
|
||
|
However (1) implicitly changes the interrupt delivery mode, because
|
||
|
whether we're using XICS or XIVE depends on the CAS state. We don't
|
||
|
properly initialize the new irq mode until (3) though - in particular
|
||
|
setting up the KVM devices.
|
||
|
|
||
|
During (2), we can temporarily drop the BQL allowing some irqs to be
|
||
|
delivered which will go to an irq system that's not properly set up.
|
||
|
|
||
|
Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
|
||
|
reset will put us back in XICS mode. kvm_kernel_irqchip() still
|
||
|
returns true, because XIVE was using KVM, however XICs doesn't have
|
||
|
its KVM components intialized and kernel_xics_fd == -1. When the irq
|
||
|
is delivered it goes via ics_kvm_set_irq() which assert()s that
|
||
|
kernel_xics_fd != -1.
|
||
|
|
||
|
This change addresses the problem by delaying the CAS reset until
|
||
|
after the devices reset. The device reset should quiesce all the
|
||
|
devices so we won't get irqs delivered while we mess around with the
|
||
|
IRQ. The CAS reset and irq re-initialize should also now be under the
|
||
|
same BQL critical section so nothing else should be able to interrupt
|
||
|
it either.
|
||
|
|
||
|
We also move the spapr_irq_msi_reset() used in one of the legacy irq
|
||
|
modes, since it logically makes sense at the same point as the
|
||
|
spapr_irq_reset() (it's essentially an equivalent operation for older
|
||
|
machine types). Since we don't need to switch between different
|
||
|
interrupt controllers for those old machine types it shouldn't
|
||
|
actually be broken in those cases though.
|
||
|
|
||
|
Cc: Cédric Le Goater <clg@kaod.org>
|
||
|
|
||
|
Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend"
|
||
|
Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
|
||
|
XIVE and XICS"
|
||
|
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
|
||
|
(cherry picked from commit 25c9780d38d4494f8610371d883865cf40b35dd6)
|
||
|
|
||
|
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
|
||
|
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
|
||
|
---
|
||
|
hw/ppc/spapr.c | 24 ++++++++++++------------
|
||
|
1 file changed, 12 insertions(+), 12 deletions(-)
|
||
|
|
||
|
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
|
||
|
index ab64d43..669eae1 100644
|
||
|
--- a/hw/ppc/spapr.c
|
||
|
+++ b/hw/ppc/spapr.c
|
||
|
@@ -1727,6 +1727,18 @@ static void spapr_machine_reset(MachineState *machine)
|
||
|
}
|
||
|
|
||
|
/*
|
||
|
+ * NVLink2-connected GPU RAM needs to be placed on a separate NUMA node.
|
||
|
+ * We assign a new numa ID per GPU in spapr_pci_collect_nvgpu() which is
|
||
|
+ * called from vPHB reset handler so we initialize the counter here.
|
||
|
+ * If no NUMA is configured from the QEMU side, we start from 1 as GPU RAM
|
||
|
+ * must be equally distant from any other node.
|
||
|
+ * The final value of spapr->gpu_numa_id is going to be written to
|
||
|
+ * max-associativity-domains in spapr_build_fdt().
|
||
|
+ */
|
||
|
+ spapr->gpu_numa_id = MAX(1, nb_numa_nodes);
|
||
|
+ qemu_devices_reset();
|
||
|
+
|
||
|
+ /*
|
||
|
* If this reset wasn't generated by CAS, we should reset our
|
||
|
* negotiated options and start from scratch
|
||
|
*/
|
||
|
@@ -1742,18 +1754,6 @@ static void spapr_machine_reset(MachineState *machine)
|
||
|
}
|
||
|
|
||
|
/*
|
||
|
- * NVLink2-connected GPU RAM needs to be placed on a separate NUMA node.
|
||
|
- * We assign a new numa ID per GPU in spapr_pci_collect_nvgpu() which is
|
||
|
- * called from vPHB reset handler so we initialize the counter here.
|
||
|
- * If no NUMA is configured from the QEMU side, we start from 1 as GPU RAM
|
||
|
- * must be equally distant from any other node.
|
||
|
- * The final value of spapr->gpu_numa_id is going to be written to
|
||
|
- * max-associativity-domains in spapr_build_fdt().
|
||
|
- */
|
||
|
- spapr->gpu_numa_id = MAX(1, nb_numa_nodes);
|
||
|
- qemu_devices_reset();
|
||
|
-
|
||
|
- /*
|
||
|
* This is fixing some of the default configuration of the XIVE
|
||
|
* devices. To be called after the reset of the machine devices.
|
||
|
*/
|
||
|
--
|
||
|
1.8.3.1
|
||
|
|