Import of kernel-6.12.0-211.7.3.el10_2

2026-05-27 05:36:45 +00:00 · 2026-05-27 05:36:45 +00:00 · b24cd7a995
commit b24cd7a995
parent 6cdebfc8f7
26454 changed files with 889254 additions and 413707 deletions
--- a/COPYING-6.12.0-124.56.5.el10
+++ b/COPYING-6.12.0-124.56.5.el10
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@ -547,6 +547,21 @@ Description:
 		[RO] Maximum size in bytes of a single element in a DMA
 		scatter/gather list.

+What:		/sys/block/<disk>/queue/max_write_streams
+Date:		November 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum number of write streams supported, 0 if not
+		supported. If supported, valid values are 1 through
+		max_write_streams, inclusive.
+
+What:		/sys/block/<disk>/queue/write_stream_granularity
+Date:		November 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Granularity of a write stream in bytes.  The granularity
+		of a write stream is the size that should be discarded or
+		overwritten together to avoid write amplification in the device.

 What:		/sys/block/<disk>/queue/max_segments
 Date:		March 2010
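The following is a rough illustration of how userspace might honor the two write-stream attributes. It checks a stream ID against max_write_streams and tests whether a byte range covers whole granules. The helper names are hypothetical, and the sketch assumes both values have already been read from sysfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stream IDs are valid only when the device supports streams
 * (max_write_streams > 0) and the ID is in 1..max_write_streams. */
static bool write_stream_valid(unsigned int id, unsigned int max_write_streams)
{
	return max_write_streams > 0 && id >= 1 && id <= max_write_streams;
}

/* True if [start, start + len) begins and ends on granule boundaries,
 * so that only whole write-stream granules are discarded or overwritten. */
static bool range_is_granule_aligned(uint64_t start, uint64_t len,
				     uint64_t granularity)
{
	assert(granularity > 0);
	return start % granularity == 0 && len % granularity == 0;
}

/* Round a length up to the next multiple of the granularity. */
static uint64_t round_up_to_granularity(uint64_t len, uint64_t granularity)
{
	assert(granularity > 0);
	return (len + granularity - 1) / granularity * granularity;
}
```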
--- a/Documentation/ABI/stable/sysfs-driver-qaic
+++ b/Documentation/ABI/stable/sysfs-driver-qaic
@ -0,0 +1,19 @@
+What:		/sys/bus/pci/drivers/qaic/XXXX:XX:XX.X/accel/accel<minor_nr>/dbc<N>_state
+Date:		October 2025
+KernelVersion:	6.19
+Contact:	Jeff Hugo <jeff.hugo@oss.qualcomm.com>
+Description:	Represents the current state of the DMA Bridge channel (DBC). Below are the possible
+		states:
+
+		===================	==========================================================
+		IDLE (0)		DBC is free and can be activated
+		ASSIGNED (1)		DBC is activated and a workload is running on device
+		BEFORE_SHUTDOWN (2)	Sub-system associated with this workload has crashed and
+					it will shut down soon
+		AFTER_SHUTDOWN (3)	Sub-system associated with this workload has crashed and
+					it has shut down
+		BEFORE_POWER_UP (4)	Sub-system associated with this workload is shut down and
+					it will be powered up soon
+		AFTER_POWER_UP (5)	Sub-system associated with this workload is now powered up
+		===================	==========================================================
+Users:		Any userspace applications or clients interested in the DBC state.
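Tools that track DBC state can map the documented values onto an enum. The sketch below mirrors the state table. Its identifiers are hypothetical. This entry does not say whether dbc<N>_state prints the number, the name, or both, so the helpers simply convert between the two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace mirror of the documented DBC states. */
enum dbc_state {
	DBC_IDLE = 0,
	DBC_ASSIGNED = 1,
	DBC_BEFORE_SHUTDOWN = 2,
	DBC_AFTER_SHUTDOWN = 3,
	DBC_BEFORE_POWER_UP = 4,
	DBC_AFTER_POWER_UP = 5,
};

static const char *const dbc_state_names[] = {
	[DBC_IDLE] = "IDLE",
	[DBC_ASSIGNED] = "ASSIGNED",
	[DBC_BEFORE_SHUTDOWN] = "BEFORE_SHUTDOWN",
	[DBC_AFTER_SHUTDOWN] = "AFTER_SHUTDOWN",
	[DBC_BEFORE_POWER_UP] = "BEFORE_POWER_UP",
	[DBC_AFTER_POWER_UP] = "AFTER_POWER_UP",
};

#define DBC_STATE_COUNT (sizeof(dbc_state_names) / sizeof(dbc_state_names[0]))

/* Check at compile time that the table covers every documented state. */
static_assert(DBC_STATE_COUNT == DBC_AFTER_POWER_UP + 1, "missing DBC state name");

/* Numeric state to name; NULL for values outside the documented range. */
static const char *dbc_state_name(int state)
{
	if (state < 0 || (size_t)state >= DBC_STATE_COUNT)
		return NULL;
	return dbc_state_names[state];
}

/* Name to numeric state; -1 if the name is not a documented state. */
static int dbc_state_from_name(const char *name)
{
	for (size_t i = 0; i < DBC_STATE_COUNT; i++)
		if (strcmp(name, dbc_state_names[i]) == 0)
			return (int)i;
	return -1;
}
```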
--- a/Documentation/ABI/testing/debugfs-amd-iommu
+++ b/Documentation/ABI/testing/debugfs-amd-iommu
@ -0,0 +1,131 @@
+What:		/sys/kernel/debug/iommu/amd/iommu<x>/mmio
+Date:		January 2025
+Contact:	Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
+Description:
+		This file provides read/write access for user input. Users specify the
+		MMIO register offset for iommu<x>, and the file outputs the corresponding
+		MMIO register value of iommu<x>.
+
+		Example::
+
+		  $ echo "0x18" > /sys/kernel/debug/iommu/amd/iommu00/mmio
+		  $ cat /sys/kernel/debug/iommu/amd/iommu00/mmio
+
+		Output::
+
+		  Offset:0x18 Value:0x000c22000003f48d
+
+What:		/sys/kernel/debug/iommu/amd/iommu<x>/capability
+Date:		January 2025
+Contact:	Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
+Description:
+		This file provides read/write access for user input. Users specify the
+		capability register offset for iommu<x>, and the file outputs the
+		corresponding capability register value of iommu<x>.
+
+		Example::
+
+		  $ echo "0x10" > /sys/kernel/debug/iommu/amd/iommu00/capability
+		  $ cat /sys/kernel/debug/iommu/amd/iommu00/capability
+
+		Output::
+
+		  Offset:0x10 Value:0x00203040
+
+What:		/sys/kernel/debug/iommu/amd/iommu<x>/cmdbuf
+Date:		January 2025
+Contact:	Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
+Description:
+		This file is a read-only output file containing iommu<x> command
+		buffer entries.
+
+		Example::
+
+		  $ cat /sys/kernel/debug/iommu/amd/iommu<x>/cmdbuf
+
+		Output::
+
+		  CMD Buffer Head Offset:339 Tail Offset:339
+		    0: 00835001 10000001 00003c00 00000000
+		    1: 00000000 30000005 fffff003 7fffffff
+		    2: 00835001 10000001 00003c01 00000000
+		    3: 00000000 30000005 fffff003 7fffffff
+		    4: 00835001 10000001 00003c02 00000000
+		    5: 00000000 30000005 fffff003 7fffffff
+		    6: 00835001 10000001 00003c03 00000000
+		    7: 00000000 30000005 fffff003 7fffffff
+		    8: 00835001 10000001 00003c04 00000000
+		    9: 00000000 30000005 fffff003 7fffffff
+		   10: 00835001 10000001 00003c05 00000000
+		   11: 00000000 30000005 fffff003 7fffffff
+		  [...]
+
+What:		/sys/kernel/debug/iommu/amd/devid
+Date:		January 2025
+Contact:	Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
+Description:
+		This file provides read/write access for user input. Users specify the
+		device ID, which can be used to dump IOMMU data structures such as the
+		interrupt remapping table and device table.
+
+		Examples:
+
+		1.
+		  ::
+
+		    $ echo 0000:01:00.0 > /sys/kernel/debug/iommu/amd/devid
+		    $ cat /sys/kernel/debug/iommu/amd/devid
+
+		  Output::
+
+		    0000:01:00.0
+
+		2.
+		  ::
+
+		    $ echo 01:00.0 > /sys/kernel/debug/iommu/amd/devid
+		    $ cat /sys/kernel/debug/iommu/amd/devid
+
+		  Output::
+
+		    0000:01:00.0
+
+What:		/sys/kernel/debug/iommu/amd/devtbl
+Date:		January 2025
+Contact:	Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
+Description:
+		This file is a read-only output file containing the device table entry
+		for the device ID provided in /sys/kernel/debug/iommu/amd/devid.
+
+		Example::
+
+		  $ cat /sys/kernel/debug/iommu/amd/devtbl
+
+		Output::
+
+		  DeviceId             QWORD[3]         QWORD[2]         QWORD[1]         QWORD[0] iommu
+		  0000:01:00.0 0000000000000000 20000001373b8013 0000000000000038 6000000114d7b603 iommu3
+
+What:		/sys/kernel/debug/iommu/amd/irqtbl
+Date:		January 2025
+Contact:	Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
+Description:
+		This file is a read-only output file containing valid interrupt remapping
+		table (IRT) entries for the device ID provided in /sys/kernel/debug/iommu/amd/devid.
+
+		Example::
+
+		  $ cat /sys/kernel/debug/iommu/amd/irqtbl
+
+		Output::
+
+		  DeviceId 0000:01:00.0
+		  IRT[0000] 0000000000000020 0000000000000241
+		  IRT[0001] 0000000000000020 0000000000000841
+		  IRT[0002] 0000000000000020 0000000000002041
+		  IRT[0003] 0000000000000020 0000000000008041
+		  IRT[0004] 0000000000000020 0000000000020041
+		  IRT[0005] 0000000000000020 0000000000080041
+		  IRT[0006] 0000000000000020 0000000000200041
+		  IRT[0007] 0000000000000020 0000000000800041
+		  [...]
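The mmio and capability files each answer with a single "Offset:... Value:..." line, so a tool reading them might parse that line as sketched below. The format is inferred only from the sample output shown for these files, and debugfs output is not a stable interface, so treat this parser as an assumption rather than a contract.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Parse a line such as "Offset:0x18 Value:0x000c22000003f48d" as printed
 * by the mmio and capability files. Returns 0 on success, -1 otherwise. */
static int parse_offset_value(const char *line, uint32_t *offset,
			      uint64_t *value)
{
	assert(line && offset && value);
	/* %x accepts an optional 0x prefix, as strtoul() with base 16 does. */
	if (sscanf(line, "Offset:%" SCNx32 " Value:%" SCNx64, offset, value) != 2)
		return -1;
	return 0;
}
```

A caller would first write the offset (for example "0x18") to the file, then read one line back and pass it to this function.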
--- a/Documentation/ABI/testing/debugfs-driver-qat
+++ b/Documentation/ABI/testing/debugfs-driver-qat
@ -67,7 +67,7 @@ Contact:	qat-linux@intel.com
 Description:	(RO) Read returns power management information specific to the
 		QAT device.

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/kernel/debug/qat_<device>_<BDF>/cnv_errors
 Date:		January 2024
--- a/Documentation/ABI/testing/debugfs-driver-qat_telemetry
+++ b/Documentation/ABI/testing/debugfs-driver-qat_telemetry
@ -32,7 +32,7 @@ Description:	(RW) Enables/disables the reporting of telemetry metrics.

 		  echo 0 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/kernel/debug/qat_<device>_<BDF>/telemetry/device_data
 Date:		March 2024
@ -57,6 +57,7 @@ Description:	(RO) Reports device telemetry counters.
 		gp_lat_acc_avg		average get to put latency [ns]
 		bw_in			PCIe, write bandwidth [Mbps]
 		bw_out			PCIe, read bandwidth [Mbps]
+		re_acc_avg		average ring empty time [ns]
 		at_page_req_lat_avg	Address Translator(AT), average page
 					request latency [ns]
 		at_trans_lat_avg	AT, average page translation latency [ns]
@ -67,6 +68,10 @@ Description:	(RO) Reports device telemetry counters.
 		exec_xlt<N>		execution count of Translator slice N
 		util_dcpr<N>		utilization of Decompression slice N [%]
 		exec_dcpr<N>		execution count of Decompression slice N
+		util_cnv<N>		utilization of Compression and verify slice N [%]
+		exec_cnv<N>		execution count of Compression and verify slice N
+		util_dcprz<N>		utilization of Decompression slice N [%]
+		exec_dcprz<N>		execution count of Decompression slice N
 		util_pke<N>		utilization of PKE N [%]
 		exec_pke<N>		execution count of PKE N
 		util_ucs<N>		utilization of UCS slice N [%]
@ -81,6 +86,32 @@ Description:	(RO) Reports device telemetry counters.
 		exec_cph<N>		execution count of Cipher slice N
 		util_ath<N>		utilization of Authentication slice N [%]
 		exec_ath<N>		execution count of Authentication slice N
+		cmdq_wait_cnv<N>	wait time for cmdq N to get Compression and verify
+					slice ownership
+		cmdq_exec_cnv<N>	Compression and verify slice execution time while
+					owned by cmdq N
+		cmdq_drain_cnv<N>	time taken for cmdq N to release Compression and
+					verify slice ownership
+		cmdq_wait_dcprz<N>	wait time for cmdq N to get Decompression
+					slice ownership
+		cmdq_exec_dcprz<N>	Decompression slice execution time while
+					owned by cmdq N
+		cmdq_drain_dcprz<N>	time taken for cmdq N to release Decompression
+					slice ownership
+		cmdq_wait_pke<N>	wait time for cmdq N to get PKE slice ownership
+		cmdq_exec_pke<N>	PKE slice execution time while owned by cmdq N
+		cmdq_drain_pke<N>	time taken for cmdq N to release PKE slice
+					ownership
+		cmdq_wait_ucs<N>	wait time for cmdq N to get UCS slice ownership
+		cmdq_exec_ucs<N>	UCS slice execution time while owned by cmdq N
+		cmdq_drain_ucs<N>	time taken for cmdq N to release UCS slice
+					ownership
+		cmdq_wait_ath<N>	wait time for cmdq N to get Authentication slice
+					ownership
+		cmdq_exec_ath<N>	Authentication slice execution time while owned
+					by cmdq N
+		cmdq_drain_ath<N>	time taken for cmdq N to release Authentication
+					slice ownership
 		=======================	========================================

 		The telemetry report file can be read with the following command::
@ -100,7 +131,7 @@ Description:	(RO) Reports device telemetry counters.
 		If a device lacks of a specific accelerator, the corresponding
 		attribute is not reported.

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/kernel/debug/qat_<device>_<BDF>/telemetry/rp_<A/B/C/D>_data
 Date:		March 2024
@ -225,4 +256,4 @@ Description:	(RW) Selects up to 4 Ring Pairs (RP) to monitor, one per file,
 		``rp2srv`` from sysfs.
 		See Documentation/ABI/testing/sysfs-driver-qat for details.

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.
--- a/Documentation/ABI/testing/debugfs-pcie-ptm
+++ b/Documentation/ABI/testing/debugfs-pcie-ptm
@ -0,0 +1,70 @@
+What:		/sys/kernel/debug/pcie_ptm_*/local_clock
+Date:		May 2025
+Contact:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+Description:
+		(RO) PTM local clock in nanoseconds. Applicable for both Root
+		Complex and Endpoint controllers.
+
+What:		/sys/kernel/debug/pcie_ptm_*/master_clock
+Date:		May 2025
+Contact:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+Description:
+		(RO) PTM master clock in nanoseconds. Applicable only for
+		Endpoint controllers.
+
+What:		/sys/kernel/debug/pcie_ptm_*/t1
+Date:		May 2025
+Contact:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+Description:
+		(RO) PTM T1 timestamp in nanoseconds. Applicable only for
+		Endpoint controllers.
+
+What:		/sys/kernel/debug/pcie_ptm_*/t2
+Date:		May 2025
+Contact:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+Description:
+		(RO) PTM T2 timestamp in nanoseconds. Applicable only for
+		Root Complex controllers.
+
+What:		/sys/kernel/debug/pcie_ptm_*/t3
+Date:		May 2025
+Contact:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+Description:
+		(RO) PTM T3 timestamp in nanoseconds. Applicable only for
+		Root Complex controllers.
+
+What:		/sys/kernel/debug/pcie_ptm_*/t4
+Date:		May 2025
+Contact:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+Description:
+		(RO) PTM T4 timestamp in nanoseconds. Applicable only for
+		Endpoint controllers.
+
+What:		/sys/kernel/debug/pcie_ptm_*/context_update
+Date:		May 2025
+Contact:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+Description:
+		(RW) Control the PTM context update mode. Applicable only for
+		Endpoint controllers.
+
+		The following values are supported:
+
+		* auto = PTM context update is triggered automatically every 10 ms
+
+		* manual = PTM context is updated manually. Writing 'manual' to
+			   this file triggers a PTM context update (default)
+
+What:		/sys/kernel/debug/pcie_ptm_*/context_valid
+Date:		May 2025
+Contact:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
+Description:
+		(RW) Control the PTM context validity (local clock timing).
+		Applicable only for Root Complex controllers. PTM context is
+		invalidated by hardware if the Root Complex enters low power
+		mode or changes link frequency.
+
+		The following values are supported:
+
+		* 0 = PTM context invalid (default)
+
+		* 1 = PTM context valid
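The four timestamps come from two sides of a PTM request/response exchange. The Endpoint records t1 (request sent) and t4 (response received). The Root Complex records t2 (request received) and t3 (response sent). The usual PTM link-delay calculation subtracts the responder's turnaround time from the round trip and halves the result, as sketched below. Gathering all four values means reading debugfs on both controllers, so this hypothetical helper simply takes them as arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Link propagation delay in ns: ((t4 - t1) - (t3 - t2)) / 2.
 * t1/t4 are Endpoint timestamps, t2/t3 are Root Complex timestamps. */
static int64_t ptm_link_delay_ns(uint64_t t1, uint64_t t2, uint64_t t3,
				 uint64_t t4)
{
	assert(t4 >= t1 && t3 >= t2);
	int64_t round_trip = (int64_t)(t4 - t1);
	int64_t responder_turnaround = (int64_t)(t3 - t2);

	return (round_trip - responder_turnaround) / 2;
}
```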
--- a/Documentation/ABI/testing/debugfs-vfio
+++ b/Documentation/ABI/testing/debugfs-vfio
@ -23,3 +23,9 @@ Contact:	Longfang Liu <liulongfang@huawei.com>
 Description:	Read the live migration status of the vfio device.
 		The contents of the state file reflects the migration state
 		relative to those defined in the vfio_device_mig_state enum
+
+What:		/sys/kernel/debug/vfio/<device>/migration/features
+Date:		Oct 2025
+KernelVersion:	6.18
+Contact:	Cédric Le Goater <clg@redhat.com>
+Description:	Read the migration features of the vfio device.
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@ -321,14 +321,13 @@ KernelVersion:	v6.0
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
-		translates from a host physical address range, to a device local
-		address range. Device-local address ranges are further split
-		into a 'ram' (volatile memory) range and 'pmem' (persistent
-		memory) range. The 'mode' attribute emits one of 'ram', 'pmem',
-		'mixed', or 'none'. The 'mixed' indication is for error cases
-		when a decoder straddles the volatile/persistent partition
-		boundary, and 'none' indicates the decoder is not actively
-		decoding, or no DPA allocation policy has been set.
+		translates from a host physical address range, to a device
+		local address range. Device-local address ranges are further
+		split into a 'ram' (volatile memory) range and 'pmem'
+		(persistent memory) range. The 'mode' attribute emits one of
+		'ram', 'pmem', or 'none'. 'none' indicates that the decoder is
+		not actively decoding or that no DPA allocation policy has been
+		set.

 		'mode' can be written, when the decoder is in the 'disabled'
 		state, with either 'ram' or 'pmem' to set the boundaries for the
@ -571,6 +570,18 @@ Description:
 		number to the closest CPU.


+What:		/sys/bus/cxl/devices/nvdimm-bridge0/ndbusX/nmemY/cxl/dirty_shutdown
+Date:		Feb, 2025
+KernelVersion:	v6.15
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The device dirty shutdown count, which is the number of
+		times the device may have incurred potential data loss.
+		The count is persistent across power loss and wraps back to 0
+		upon overflow. If this file is not present, the device does not
+		have the necessary support for dirty tracking.
+
+
 What:		/sys/bus/cxl/devices/regionZ/accessY/read_latency
 		/sys/bus/cxl/devices/regionZ/accessY/write_latency
 Date:		Jan, 2024
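Because the dirty_shutdown count wraps back to 0 on overflow, monitoring code that compares two readings should use modular arithmetic. The sketch below assumes a 32-bit counter, which is not stated in the entry, and assumes at most one wrap between readings. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* New dirty shutdowns between two readings of dirty_shutdown, assuming a
 * 32-bit counter that wraps to 0 and at most one wrap between readings. */
static uint32_t dirty_shutdown_delta(uint64_t prev, uint64_t curr)
{
	assert(prev <= UINT32_MAX && curr <= UINT32_MAX);
	/* Unsigned 32-bit subtraction is modulo 2^32, so a wrap is handled. */
	return (uint32_t)((uint32_t)curr - (uint32_t)prev);
}
```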
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@ -583,3 +583,32 @@ Description:
 		enclosure-specific indications "specific0" to "specific7",
 		hence the corresponding led class devices are unavailable if
 		the DSM interface is used.
+
+What:		/sys/bus/pci/devices/.../doe_features
+Date:		March 2025
+Contact:	Linux PCI developers <linux-pci@vger.kernel.org>
+Description:
+		This directory contains a list of the supported Data Object
+		Exchange (DOE) features. Each feature is represented by a file
+		named after it. The contents of each file are the raw Vendor
+		ID and data object type values.
+
+		The value comes from the device and specifies the vendor and
+		data object type supported. The lower part (right of the colon)
+		is the data object type in hex. The upper part (left of the
+		colon) is the vendor ID.
+
+		As all DOE devices must support the DOE discovery feature,
+		if DOE is supported you will see at least the doe_discovery
+		file, with these contents::
+
+		  # cat doe_features/doe_discovery
+		  0001:00
+
+		If the device supports other features you will see other
+		files as well. For example, if CMA/SPDM and secure CMA/SPDM
+		are supported, the doe_features directory will look like
+		this::
+
+		  # ls doe_features
+		  0001:01        0001:02        doe_discovery
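The "vendor:type" strings in doe_features can be split with a simple parser. This sketch assumes a 16-bit vendor ID and an 8-bit data object type, the field widths of the DOE data object header. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "VVVV:TT" (hex vendor ID, colon, hex data object type), e.g.
 * "0001:02". Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_doe_feature(const char *s, uint16_t *vendor, uint8_t *type)
{
	unsigned int v, t;

	assert(s && vendor && type);
	if (sscanf(s, "%x:%x", &v, &t) != 2 || v > 0xffff || t > 0xff)
		return -1;
	*vendor = (uint16_t)v;
	*type = (uint8_t)t;
	return 0;
}
```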
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
@ -0,0 +1,163 @@
+PCIe Device AER statistics
+--------------------------
+
+These attributes show up under all the devices that are AER capable. These
+statistical counters indicate the errors "as seen/reported by the device".
+Note that if an endpoint is causing problems, the AER counters may
+increment at its link partner (e.g. root port) instead, because the errors
+may be "seen"/reported by the link partner and not by the problematic
+endpoint itself (which may report all counters as 0 as it never saw any
+problems).
+
+What:		/sys/bus/pci/devices/<dev>/aer_dev_correctable
+Date:		July 2018
+KernelVersion:	4.19.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	List of correctable errors seen and reported by this
+		PCI device using ERR_COR. Note that because multiple errors may
+		be reported using a single ERR_COR message, TOTAL_ERR_COR at
+		the end of the file may not match the actual total of all the
+		errors in the file. Sample output::
+
+		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
+		    Receiver Error 2
+		    Bad TLP 0
+		    Bad DLLP 0
+		    RELAY_NUM Rollover 0
+		    Replay Timer Timeout 0
+		    Advisory Non-Fatal 0
+		    Corrected Internal Error 0
+		    Header Log Overflow 0
+		    TOTAL_ERR_COR 2
+
+What:		/sys/bus/pci/devices/<dev>/aer_dev_fatal
+Date:		July 2018
+KernelVersion:	4.19.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	List of uncorrectable fatal errors seen and reported by this
+		PCI device using ERR_FATAL. Note that because multiple errors may
+		be reported using a single ERR_FATAL message, TOTAL_ERR_FATAL at
+		the end of the file may not match the actual total of all the
+		errors in the file. Sample output::
+
+		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
+		    Undefined 0
+		    Data Link Protocol 0
+		    Surprise Down Error 0
+		    Poisoned TLP 0
+		    Flow Control Protocol 0
+		    Completion Timeout 0
+		    Completer Abort 0
+		    Unexpected Completion 0
+		    Receiver Overflow 0
+		    Malformed TLP 0
+		    ECRC 0
+		    Unsupported Request 0
+		    ACS Violation 0
+		    Uncorrectable Internal Error 0
+		    MC Blocked TLP 0
+		    AtomicOp Egress Blocked 0
+		    TLP Prefix Blocked Error 0
+		    TOTAL_ERR_FATAL 0
+
+What:		/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
+Date:		July 2018
+KernelVersion:	4.19.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	List of uncorrectable nonfatal errors seen and reported by this
+		PCI device using ERR_NONFATAL. Note that because multiple errors
+		may be reported using a single ERR_NONFATAL message,
+		TOTAL_ERR_NONFATAL at the end of the file may not match the
+		actual total of all the errors in the file. Sample output::
+
+		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
+		    Undefined 0
+		    Data Link Protocol 0
+		    Surprise Down Error 0
+		    Poisoned TLP 0
+		    Flow Control Protocol 0
+		    Completion Timeout 0
+		    Completer Abort 0
+		    Unexpected Completion 0
+		    Receiver Overflow 0
+		    Malformed TLP 0
+		    ECRC 0
+		    Unsupported Request 0
+		    ACS Violation 0
+		    Uncorrectable Internal Error 0
+		    MC Blocked TLP 0
+		    AtomicOp Egress Blocked 0
+		    TLP Prefix Blocked Error 0
+		    TOTAL_ERR_NONFATAL 0
+
+PCIe Rootport AER statistics
+----------------------------
+
+These attributes show up only under the rootports (or root complex event
+collectors) that are AER capable. These indicate the number of error messages as
+"reported to" the rootport. Please note that the rootports also transmit
+(internally) the ERR_* messages for errors seen by the internal rootport PCI
+device, so these counters include them and are thus cumulative of all the error
+messages on the PCI hierarchy originating at that root port.
+
+What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
+Date:		July 2018
+KernelVersion:	4.19.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of ERR_COR messages reported to the rootport.
+
+What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
+Date:		July 2018
+KernelVersion:	4.19.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of ERR_FATAL messages reported to the rootport.
+
+What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
+Date:		July 2018
+KernelVersion:	4.19.0
+Contact:	linux-pci@vger.kernel.org, rajatja@google.com
+Description:	Total number of ERR_NONFATAL messages reported to the rootport.
+
+PCIe AER ratelimits
+-------------------
+
+These attributes show up under all the devices that are AER capable.
+They represent configurable ratelimits of logs per error type.
+
+See Documentation/PCI/pcieaer-howto.rst for more info on ratelimits.
+
+What:		/sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_interval_ms
+Date:		May 2025
+KernelVersion:	6.16.0
+Contact:	linux-pci@vger.kernel.org
+Description:	Writing 0 disables AER correctable error log ratelimiting.
+		Writing a positive value sets the ratelimit interval in ms.
+		Default is DEFAULT_RATELIMIT_INTERVAL (5000 ms).
+
+What:		/sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_burst
+Date:		May 2025
+KernelVersion:	6.16.0
+Contact:	linux-pci@vger.kernel.org
+Description:	Ratelimit burst for correctable error logs. Writing a value
+		changes the number of errors (burst) allowed per interval
+		before ratelimiting. Reading gets the current ratelimit
+		burst. Default is DEFAULT_RATELIMIT_BURST (10).
+
+What:		/sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_interval_ms
+Date:		May 2025
+KernelVersion:	6.16.0
+Contact:	linux-pci@vger.kernel.org
+Description:	Writing 0 disables AER non-fatal uncorrectable error log
+		ratelimiting. Writing a positive value sets the ratelimit
+		interval in ms. Default is DEFAULT_RATELIMIT_INTERVAL
+		(5000 ms).
+
+What:		/sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_burst
+Date:		May 2025
+KernelVersion:	6.16.0
+Contact:	linux-pci@vger.kernel.org
+Description:	Ratelimit burst for non-fatal uncorrectable error logs.
+		Writing a value changes the number of errors (burst)
+		allowed per interval before ratelimiting. Reading gets the
+		current ratelimit burst. Default is DEFAULT_RATELIMIT_BURST
+		(10).
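The ratelimit attributes use interval/burst semantics. The following simplified window-based limiter makes them concrete. Within each interval, at most burst messages are allowed, and an interval of 0 disables limiting. This is a hypothetical userspace model written for illustration, not the kernel's ratelimit implementation, and details such as window boundaries may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an interval/burst log ratelimit. */
struct log_ratelimit {
	uint64_t interval_ms;	/* 0 disables ratelimiting */
	unsigned int burst;	/* messages allowed per interval */
	uint64_t begin_ms;	/* start of the current interval */
	unsigned int printed;	/* messages allowed so far in this interval */
	bool started;
};

/* Return true if a message at time now_ms (non-decreasing) may be logged. */
static bool log_ratelimit_allow(struct log_ratelimit *rl, uint64_t now_ms)
{
	assert(rl);
	if (rl->interval_ms == 0)
		return true;
	if (!rl->started || now_ms - rl->begin_ms >= rl->interval_ms) {
		/* Start a new interval and reset the burst budget. */
		rl->started = true;
		rl->begin_ms = now_ms;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	return false;
}
```

With the defaults (5000 ms, burst 10), an eleventh error inside the same 5-second window would be suppressed in this model.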
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
@ -1,119 +0,0 @@
-PCIe Device AER statistics
--------------------------
-
-These attributes show up under all the devices that are AER capable. These
-statistical counters indicate the errors "as seen/reported by the device".
-Note that this may mean that if an endpoint is causing problems, the AER
-counters may increment at its link partner (e.g. root port) because the
-errors may be "seen" / reported by the link partner and not the
-problematic endpoint itself (which may report all counters as 0 as it never
-saw any problems).
-
-What:		/sys/bus/pci/devices/<dev>/aer_dev_correctable
-Date:		July 2018
-KernelVersion:	4.19.0
-Contact:	linux-pci@vger.kernel.org, rajatja@google.com
-Description:	List of correctable errors seen and reported by this
-		PCI device using ERR_COR. Note that since multiple errors may
-		be reported using a single ERR_COR message, thus
-		TOTAL_ERR_COR at the end of the file may not match the actual
-		total of all the errors in the file. Sample output::
-
-		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
-		    Receiver Error 2
-		    Bad TLP 0
-		    Bad DLLP 0
-		    RELAY_NUM Rollover 0
-		    Replay Timer Timeout 0
-		    Advisory Non-Fatal 0
-		    Corrected Internal Error 0
-		    Header Log Overflow 0
-		    TOTAL_ERR_COR 2
-
-What:		/sys/bus/pci/devices/<dev>/aer_dev_fatal
-Date:		July 2018
-KernelVersion:	4.19.0
-Contact:	linux-pci@vger.kernel.org, rajatja@google.com
-Description:	List of uncorrectable fatal errors seen and reported by this
-		PCI device using ERR_FATAL. Note that since multiple errors may
-		be reported using a single ERR_FATAL message, thus
-		TOTAL_ERR_FATAL at the end of the file may not match the actual
-		total of all the errors in the file. Sample output::
-
-		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
-		    Undefined 0
-		    Data Link Protocol 0
-		    Surprise Down Error 0
-		    Poisoned TLP 0
-		    Flow Control Protocol 0
-		    Completion Timeout 0
-		    Completer Abort 0
-		    Unexpected Completion 0
-		    Receiver Overflow 0
-		    Malformed TLP 0
-		    ECRC 0
-		    Unsupported Request 0
-		    ACS Violation 0
-		    Uncorrectable Internal Error 0
-		    MC Blocked TLP 0
-		    AtomicOp Egress Blocked 0
-		    TLP Prefix Blocked Error 0
-		    TOTAL_ERR_FATAL 0
-
-What:		/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
-Date:		July 2018
-KernelVersion:	4.19.0
-Contact:	linux-pci@vger.kernel.org, rajatja@google.com
-Description:	List of uncorrectable nonfatal errors seen and reported by this
-		PCI device using ERR_NONFATAL. Note that since multiple errors
-		may be reported using a single ERR_FATAL message, thus
-		TOTAL_ERR_NONFATAL at the end of the file may not match the
-		actual total of all the errors in the file. Sample output::
-
-		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
-		    Undefined 0
-		    Data Link Protocol 0
-		    Surprise Down Error 0
-		    Poisoned TLP 0
-		    Flow Control Protocol 0
-		    Completion Timeout 0
-		    Completer Abort 0
-		    Unexpected Completion 0
-		    Receiver Overflow 0
-		    Malformed TLP 0
-		    ECRC 0
-		    Unsupported Request 0
-		    ACS Violation 0
-		    Uncorrectable Internal Error 0
-		    MC Blocked TLP 0
-		    AtomicOp Egress Blocked 0
-		    TLP Prefix Blocked Error 0
-		    TOTAL_ERR_NONFATAL 0
-
-PCIe Rootport AER statistics
----------------------------
-
-These attributes show up under only the rootports (or root complex event
-collectors) that are AER capable. These indicate the number of error messages as
-"reported to" the rootport. Please note that the rootports also transmit
-(internally) the ERR_* messages for errors seen by the internal rootport PCI
-device, so these counters include them and are thus cumulative of all the error
-messages on the PCI hierarchy originating at that root port.
-
-What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
-Date:		July 2018
-KernelVersion:	4.19.0
-Contact:	linux-pci@vger.kernel.org, rajatja@google.com
-Description:	Total number of ERR_COR messages reported to rootport.
-
-What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
-Date:		July 2018
-KernelVersion:	4.19.0
-Contact:	linux-pci@vger.kernel.org, rajatja@google.com
-Description:	Total number of ERR_FATAL messages reported to rootport.
-
-What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
-Date:		July 2018
-KernelVersion:	4.19.0
-Contact:	linux-pci@vger.kernel.org, rajatja@google.com
-Description:	Total number of ERR_NONFATAL messages reported to rootport.
--- a/Documentation/ABI/testing/sysfs-class-net-phydev
+++ b/Documentation/ABI/testing/sysfs-class-net-phydev
@ -26,6 +26,16 @@ Description:
 		This ID is used to match the device with the appropriate
 		driver.

+What:		/sys/class/mdio_bus/<bus>/<device>/c45_phy_ids/mmd<n>_device_id
+Date:		June 2025
+KernelVersion:	6.17
+Contact:	netdev@vger.kernel.org
+Description:
+		This attribute contains the 32-bit PHY Identifier as reported
+		by the device during bus enumeration, encoded in hexadecimal.
+		These C45 IDs are used to match the device with the appropriate
+		driver. These files are not present for C22 devices.
+
 What:		/sys/class/mdio_bus/<bus>/<device>/phy_interface
 Date:		February 2014
 KernelVersion:	3.15
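Each mmd<n>_device_id value uses the IEEE 802.3 PHY identifier layout. The low 4 bits are the revision number, the next 6 bits are the manufacturer's model number, and the remaining bits carry the OUI. The sketch below parses a hexadecimal string, assumed here to look like "0x001cc916", and extracts those two fields. The sample value and helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal PHY identifier string such as "0x001cc916". */
static int parse_phy_id(const char *s, uint32_t *id)
{
	char *end;
	unsigned long v;

	assert(s && id);
	v = strtoul(s, &end, 16);	/* accepts an optional 0x prefix */
	if (end == s || (*end && *end != '\n') || v > UINT32_MAX)
		return -1;
	*id = (uint32_t)v;
	return 0;
}

/* Bits 9:4 hold the manufacturer's model number. */
static unsigned int phy_id_model(uint32_t id)
{
	return (id >> 4) & 0x3f;
}

/* Bits 3:0 hold the revision number. */
static unsigned int phy_id_revision(uint32_t id)
{
	return id & 0xf;
}
```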
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@ -268,6 +268,60 @@ Description:	Discover CPUs in the same CPU frequency coordination domain
 		This file is only present if the acpi-cpufreq or the cppc-cpufreq
 		drivers are in use.

+What:		/sys/devices/system/cpu/cpuX/cpufreq/auto_select
+Date:		May 2025
+Contact:	linux-pm@vger.kernel.org
+Description:	Autonomous selection enable
+
+		Read/write interface to enable or disable autonomous selection.
+		Reading returns the autonomous selection status:
+
+			0: autonomous selection is disabled
+			1: autonomous selection is enabled
+
+		Write 'y', '1' or 'on' to enable autonomous selection.
+		Write 'n', '0' or 'off' to disable autonomous selection.
+
+		This file is only present if the cppc-cpufreq driver is in use.
+
+What:		/sys/devices/system/cpu/cpuX/cpufreq/auto_act_window
+Date:		May 2025
+Contact:	linux-pm@vger.kernel.org
+Description:	Autonomous activity window
+
+		This file indicates a moving utilization sensitivity window to
+		the platform's autonomous selection policy.
+
+		Read/write an integer representing the autonomous activity window
+		(in microseconds) from/to this file. The maximum value that can be
+		written is 1270000000, but the maximum significand is 127. This
+		means that if 128 is written to this file, 127 will be stored. If
+		the value is greater than 130, only the first two digits will be
+		saved as the significand.
+
+		Writing zero to this file enables the platform to determine an
+		appropriate activity window depending on the workload.
+
+		Writing to this file only has meaning when Autonomous Selection is
+		enabled.
+
+		This file is only present if the cppc-cpufreq driver is in use.
+
+What:		/sys/devices/system/cpu/cpuX/cpufreq/energy_performance_preference_val
+Date:		May 2025
+Contact:	linux-pm@vger.kernel.org
+Description:	Energy performance preference
+
+		Read/write an 8-bit integer from/to this file. This file
+		represents a range of values from 0 (performance preference) to
+		0xFF (energy efficiency preference) that influences the rate of
+		performance increase/decrease and the result of the hardware's
+		energy efficiency and performance optimization policies.
+
+		Writing to this file only has meaning when Autonomous Selection is
+		enabled.
+
+		This file is only present if the cppc-cpufreq driver is in use.
+
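The auto_act_window maximum of 1270000000 is 127 * 10^7. That is consistent with a value stored as a 7-bit significand and a 3-bit decimal exponent (significand * 10^exponent microseconds). The sketch below assumes that layout. Its rounding rule is one reading of the auto_act_window description and may not match the driver exactly: 128 and 129 clamp to 127, and larger values drop trailing digits until the significand is at most 129. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ACT_WIN_SIG_BITS	7
#define ACT_WIN_SIG_MASK	0x7fu		/* significand: bits 6:0 */
#define ACT_WIN_EXP_MASK	0x7u		/* exponent: bits 9:7 */
#define ACT_WIN_MAX_USEC	1270000000ull	/* 127 * 10^7 */

/* Convert a stored field back to microseconds: significand * 10^exponent. */
static uint64_t act_window_decode(uint16_t field)
{
	uint64_t usec = field & ACT_WIN_SIG_MASK;
	unsigned int exp = (field >> ACT_WIN_SIG_BITS) & ACT_WIN_EXP_MASK;

	while (exp--)
		usec *= 10;
	return usec;
}

/* Convert microseconds to a field; returns -1 if above the maximum. */
static int act_window_encode(uint64_t usec, uint16_t *field)
{
	unsigned int exp = 0;

	assert(field);
	if (usec > ACT_WIN_MAX_USEC)
		return -1;
	/* Drop trailing digits (truncating) until the significand is <= 129. */
	while (usec > 129) {
		usec /= 10;
		exp++;
	}
	/* 128 and 129 do not fit in 7 bits, so clamp them to 127. */
	if (usec > ACT_WIN_SIG_MASK)
		usec = ACT_WIN_SIG_MASK;
	*field = (uint16_t)((exp << ACT_WIN_SIG_BITS) | usec);
	return 0;
}
```

Under this model, writing 200 stores significand 20 with exponent 1, which reads back as 200.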

 What:		/sys/devices/system/cpu/cpu*/cache/index3/cache_disable_{0,1}
 Date:		August 2008
@ -485,6 +539,7 @@ What:		/sys/devices/system/cpu/cpuX/regs/
 		/sys/devices/system/cpu/cpuX/regs/identification/
 		/sys/devices/system/cpu/cpuX/regs/identification/midr_el1
 		/sys/devices/system/cpu/cpuX/regs/identification/revidr_el1
+		/sys/devices/system/cpu/cpuX/regs/identification/aidr_el1
 		/sys/devices/system/cpu/cpuX/regs/identification/smidr_el1
 Date:		June 2016
 Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
@ -517,6 +572,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
 		/sys/devices/system/cpu/vulnerabilities/mds
 		/sys/devices/system/cpu/vulnerabilities/meltdown
 		/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
+		/sys/devices/system/cpu/vulnerabilities/old_microcode
 		/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
 		/sys/devices/system/cpu/vulnerabilities/retbleed
 		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
@ -703,6 +759,17 @@ Description:
 		participate in load balancing. These CPUs are set by
 		boot parameter "isolcpus=".

+What:		/sys/devices/system/cpu/housekeeping
+Date:		Oct 2025
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+		(RO) The list of logical CPUs that are designated by the kernel as
+		"housekeeping" CPUs. These CPUs are responsible for handling
+		essential system-wide background tasks, including RCU callbacks,
+		delayed timer callbacks, and unbound workqueues, which minimizes
+		scheduling jitter on low-latency, isolated CPUs. These CPUs are set
+		when the boot parameter "isolcpus=nohz" or "nohz_full=" is specified.
+
 What:		/sys/devices/system/cpu/crash_hotplug
 Date:		Aug 2023
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
--- a/Documentation/ABI/testing/sysfs-driver-qat
+++ b/Documentation/ABI/testing/sysfs-driver-qat
@ -14,7 +14,7 @@ Description:	(RW) Reports the current state of the QAT device. Write to
 		It is possible to transition the device from up to down only
 		if the device is up and vice versa.

-		This attribute is only available for qat_4xxx devices.
+		This attribute is available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat/cfg_services
 Date:		June 2022
@ -23,24 +23,28 @@ Contact:	qat-linux@intel.com
 Description:	(RW) Reports the current configuration of the QAT device.
 		Write to the file to change the configured services.

-		The values are:
+		One or more services can be enabled per device.
+		Certain configurations are restricted to specific device types;
+		where applicable this is explicitly indicated, for example
+		(qat_6xxx) denotes applicability exclusively to that device series.

-		* sym;asym: the device is configured for running crypto
-		  services
-		* asym;sym: identical to sym;asym
-		* dc: the device is configured for running compression services
-		* dcc: identical to dc but enables the dc chaining feature,
-		  hash then compression. If this is not required chose dc
-		* sym: the device is configured for running symmetric crypto
-		  services
-		* asym: the device is configured for running asymmetric crypto
-		  services
-		* asym;dc: the device is configured for running asymmetric
-		  crypto services and compression services
-		* dc;asym: identical to asym;dc
-		* sym;dc: the device is configured for running symmetric crypto
-		  services and compression services
-		* dc;sym: identical to sym;dc
+		The available services include:
+
+		* sym: Configures the device for symmetric cryptographic operations.
+		* asym: Configures the device for asymmetric cryptographic operations.
+		* dc: Configures the device for compression and decompression
+		  operations.
+		* dcc: Similar to dc, but with the dc chaining feature enabled:
+		  cipher then compress (qat_6xxx), or hash then compress.
+		  If chaining is not required, choose dc.
+		* decomp: Configures the device for decompression operations (qat_6xxx).
+
+		Service combinations are permitted for all services except dcc.
+		On QAT GEN4 devices (qat_4xxx driver), a maximum of two services can
+		be combined; on QAT GEN6 devices (qat_6xxx driver), a maximum of three
+		services can be combined.
+		The order of services is not significant. For instance, sym;asym is
+		functionally equivalent to asym;sym.
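The combination rules for cfg_services can be expressed as a small validator. This is a hypothetical user-space helper, not part of the qat driver or its sysfs interface: qat_cfg_services_valid() and enum qat_gen are illustrative names, rejecting a repeated service is an assumption (the text does not say how duplicates are handled), and the driver itself remains the authority on which strings it accepts.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical device-generation selector for this sketch. */
enum qat_gen { QAT_GEN4, QAT_GEN6 };

enum { SVC_SYM, SVC_ASYM, SVC_DC, SVC_DCC, SVC_DECOMP, SVC_COUNT };

static const char *const svc_names[SVC_COUNT] = {
	[SVC_SYM] = "sym", [SVC_ASYM] = "asym", [SVC_DC] = "dc",
	[SVC_DCC] = "dcc", [SVC_DECOMP] = "decomp",
};

/* Returns true if cfg (e.g. "sym;asym", optionally newline-terminated)
 * follows the documented combination rules. */
static bool qat_cfg_services_valid(const char *cfg, enum qat_gen gen)
{
	bool seen[SVC_COUNT] = { false };
	size_t max_services = (gen == QAT_GEN6) ? 3 : 2;
	size_t count = 0;
	const char *p = cfg;

	for (;;) {
		size_t len = strcspn(p, ";\n");
		int i;

		for (i = 0; i < SVC_COUNT; i++)
			if (strlen(svc_names[i]) == len &&
			    strncmp(p, svc_names[i], len) == 0)
				break;
		if (i == SVC_COUNT || seen[i])
			return false;	/* empty, unknown or repeated service */
		if (i == SVC_DECOMP && gen != QAT_GEN6)
			return false;	/* decomp is documented for qat_6xxx only */
		seen[i] = true;
		count++;

		p += len;
		if (*p != ';')
			break;
		p++;			/* skip the ';' separator */
	}
	if (*p == '\n')			/* tolerate the newline echo appends */
		p++;
	if (*p != '\0')
		return false;
	if (seen[SVC_DCC] && count > 1)
		return false;		/* dcc cannot be combined */
	return count <= max_services;	/* order of services is irrelevant */
}
```

For example, it accepts "asym;sym" on either generation, rejects "sym;asym;dc" on GEN4 but accepts it on GEN6, and rejects "dcc;sym" on both.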

 		It is possible to set the configuration only if the device
 		is in the `down` state (see /sys/bus/pci/devices/<BDF>/qat/state)
@ -59,7 +63,7 @@ Description:	(RW) Reports the current configuration of the QAT device.
 			# cat /sys/bus/pci/devices/<BDF>/qat/cfg_services
 			dc

-		This attribute is only available for qat_4xxx devices.
+		This attribute is available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat/pm_idle_enabled
 Date:		June 2023
@ -94,7 +98,7 @@ Description:	(RW) This configuration option provides a way to force the device i
 			# cat /sys/bus/pci/devices/<BDF>/qat/pm_idle_enabled
 			0

-		This attribute is only available for qat_4xxx devices.
+		This attribute is available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat/rp2srv
 Date:		January 2024
@ -126,7 +130,7 @@ Description:
 			# cat /sys/bus/pci/devices/<BDF>/qat/rp2srv
 			sym

-		This attribute is only available for qat_4xxx devices.
+		This attribute is available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat/num_rps
 Date:		January 2024
@ -140,7 +144,7 @@ Description:
 			# cat /sys/bus/pci/devices/<BDF>/qat/num_rps
 			64

-		This attribute is only available for qat_4xxx devices.
+		This attribute is available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat/auto_reset
 Date:		May 2024
@ -160,4 +164,4 @@ Description:	(RW) Reports the current state of the autoreset feature
 		* 0/Nn/off: auto reset disabled. If the device encounters an
 		  unrecoverable error, it will not be reset.

-		This attribute is only available for qat_4xxx devices.
+		This attribute is available for qat_4xxx and qat_6xxx devices.
--- a/Documentation/ABI/testing/sysfs-driver-qat_ras
+++ b/Documentation/ABI/testing/sysfs-driver-qat_ras
@ -4,7 +4,7 @@ KernelVersion:	6.7
 Contact:	qat-linux@intel.com
 Description:	(RO) Reports the number of correctable errors detected by the device.

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_ras/errors_nonfatal
 Date:		January 2024
@ -12,7 +12,7 @@ KernelVersion:	6.7
 Contact:	qat-linux@intel.com
 Description:	(RO) Reports the number of non fatal errors detected by the device.

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_ras/errors_fatal
 Date:		January 2024
@ -20,7 +20,7 @@ KernelVersion:	6.7
 Contact:	qat-linux@intel.com
 Description:	(RO) Reports the number of fatal errors detected by the device.

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_ras/reset_error_counters
 Date:		January 2024
@ -38,4 +38,4 @@ Description:	(WO) Write to resets all error counters of a device.
 			# cat /sys/bus/pci/devices/<BDF>/qat_ras/errors_fatal
 			0

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.
--- a/Documentation/ABI/testing/sysfs-driver-qat_rl
+++ b/Documentation/ABI/testing/sysfs-driver-qat_rl
@ -31,7 +31,7 @@ Description:
 		* rm_all: Removes all the configured SLAs.
 			* Inputs: None

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_rl/rp
 Date:		January 2024
@ -68,7 +68,7 @@ Description:
 			## Write
 			# echo 0x5 > /sys/bus/pci/devices/<BDF>/qat_rl/rp

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_rl/id
 Date:		January 2024
@ -101,7 +101,7 @@ Description:
 			# cat /sys/bus/pci/devices/<BDF>/qat_rl/rp
 			0x5  ## ring pair ID 0 and ring pair ID 2

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_rl/cir
 Date:		January 2024
@ -135,7 +135,7 @@ Description:
 			# cat /sys/bus/pci/devices/<BDF>/qat_rl/cir
 			500

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_rl/pir
 Date:		January 2024
@ -169,7 +169,7 @@ Description:
 			# cat /sys/bus/pci/devices/<BDF>/qat_rl/pir
 			750

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_rl/srv
 Date:		January 2024
@ -202,7 +202,7 @@ Description:
 			# cat /sys/bus/pci/devices/<BDF>/qat_rl/srv
 			dc

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.

 What:		/sys/bus/pci/devices/<BDF>/qat_rl/cap_rem
 Date:		January 2024
@ -223,4 +223,4 @@ Description:
 			# cat /sys/bus/pci/devices/<BDF>/qat_rl/cap_rem
 			0

-		This attribute is only available for qat_4xxx devices.
+		This attribute is only available for qat_4xxx and qat_6xxx devices.
--- a/Documentation/ABI/testing/sysfs-driver-spi-intel
+++ b/Documentation/ABI/testing/sysfs-driver-spi-intel
@ -0,0 +1,20 @@
+What:		/sys/devices/.../intel_spi_protected
+Date:		Feb 2025
+KernelVersion:	6.13
+Contact:	Alexander Usyskin <alexander.usyskin@intel.com>
+Description:	This attribute allows userspace to check if the
+		Intel SPI flash controller is write protected from the host.
+
+What:		/sys/devices/.../intel_spi_locked
+Date:		Feb 2025
+KernelVersion:	6.13
+Contact:	Alexander Usyskin <alexander.usyskin@intel.com>
+Description:	This attribute allows userspace to check if the
+		Intel SPI flash controller locks supported opcodes.
+
+What:		/sys/devices/.../intel_spi_bios_locked
+Date:		Feb 2025
+KernelVersion:	6.13
+Contact:	Alexander Usyskin <alexander.usyskin@intel.com>
+Description:	This attribute allows userspace to check if the
+		Intel SPI flash controller BIOS region is locked for writes.
--- a/Documentation/ABI/testing/sysfs-driver-typec-displayport
+++ b/Documentation/ABI/testing/sysfs-driver-typec-displayport
@ -62,3 +62,13 @@ Description:
 			     by VESA DisplayPort Alt Mode on USB Type-C Standard.
 			- 0 when HPD’s logical state is low (HPD_Low) as defined by
 			     VESA DisplayPort Alt Mode on USB Type-C Standard.
+
+What:		/sys/bus/typec/devices/.../displayport/irq_hpd
+Date:		June 2025
+Contact:	RD Babiera <rdbabiera@google.com>
+Description:
+		IRQ_HPD events are sent over the USB PD protocol in Status Update and
+		Attention messages. IRQ_HPD can only be asserted when HPD is high,
+		and is asserted when an IRQ_HPD has been issued since the last Status
+		Update. This is a read-only node that returns the number of IRQ_HPD
+		events raised during the driver's lifetime.
--- a/Documentation/ABI/testing/sysfs-firmware-acpi
+++ b/Documentation/ABI/testing/sysfs-firmware-acpi
@ -108,15 +108,15 @@ Description:
 		number of a "General Purpose Events" (GPE).

 		A GPE vectors to a specified handler in AML, which
-		can do a anything the BIOS writer wants from
+		can do anything the BIOS writer wants from
 		OS context.  GPE 0x12, for example, would vector
 		to a level or edge handler called _L12 or _E12.
 		The handler may do its business and return.
-		Or the handler may send send a Notify event
+		Or the handler may send a Notify event
 		to a Linux device driver registered on an ACPI device,
 		such as a battery, or a processor.

-		To figure out where all the SCI's are coming from,
+		To figure out where all the SCIs are coming from,
 		/sys/firmware/acpi/interrupts contains a file listing
 		every possible source, and the count of how many
 		times it has triggered::
--- a/Documentation/ABI/testing/sysfs-firmware-efi
+++ b/Documentation/ABI/testing/sysfs-firmware-efi
@ -36,3 +36,10 @@ Description:	Displays the content of the Runtime Configuration Interface
 		Table version 2 on Dell EMC PowerEdge systems in binary format
 Users:		It is used by Dell EMC OpenManage Server Administrator tool to
 		populate BIOS setup page.
+
+What:		/sys/firmware/efi/ovmf_debug_log
+Date:		July 2025
+Contact:	Gerd Hoffmann <kraxel@redhat.com>, linux-efi@vger.kernel.org
+Description:	Displays the content of the OVMF debug log buffer.  The file is
+		only present if the firmware supports logging to a memory
+		buffer.
--- a/Documentation/ABI/testing/sysfs-kernel-rcu_stall_count
+++ b/Documentation/ABI/testing/sysfs-kernel-rcu_stall_count
@ -0,0 +1,6 @@
+What:		/sys/kernel/rcu_stall_count
+Date:		May 2025
+KernelVersion:	6.16
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+		Shows how many times the system has detected an RCU stall since the last boot.
--- a/Documentation/ABI/testing/sysfs-platform-dell-privacy-wmi
+++ b/Documentation/ABI/testing/sysfs-platform-dell-privacy-wmi
@ -1,4 +1,4 @@
-What:		/sys/bus/wmi/devices/6932965F-1671-4CEB-B988-D3AB0A901919/dell_privacy_supported_type
+What:		/sys/bus/wmi/devices/6932965F-1671-4CEB-B988-D3AB0A901919[-X]/dell_privacy_supported_type
 Date:		Apr 2021
 KernelVersion:	5.13
 Contact:	"<perry.yuan@dell.com>"
@ -29,12 +29,12 @@ Description:

 		For example to check which privacy devices are supported::

-		    # cat /sys/bus/wmi/drivers/dell-privacy/6932965F-1671-4CEB-B988-D3AB0A901919/dell_privacy_supported_type
+		    # cat /sys/bus/wmi/drivers/dell-privacy/6932965F-1671-4CEB-B988-D3AB0A901919*/dell_privacy_supported_type
 		    [Microphone Mute] [supported]
 		    [Camera Shutter] [supported]
 		    [ePrivacy Screen] [unsupported]

-What:		/sys/bus/wmi/devices/6932965F-1671-4CEB-B988-D3AB0A901919/dell_privacy_current_state
+What:		/sys/bus/wmi/devices/6932965F-1671-4CEB-B988-D3AB0A901919[-X]/dell_privacy_current_state
 Date:		Apr 2021
 KernelVersion:	5.13
 Contact:	"<perry.yuan@dell.com>"
@ -66,6 +66,6 @@ Description:

 		For example to check all supported current privacy device states::

-		    # cat /sys/bus/wmi/drivers/dell-privacy/6932965F-1671-4CEB-B988-D3AB0A901919/dell_privacy_current_state
+		    # cat /sys/bus/wmi/drivers/dell-privacy/6932965F-1671-4CEB-B988-D3AB0A901919*/dell_privacy_current_state
 		    [Microphone] [unmuted]
 		    [Camera Shutter] [unmuted]
--- a/Documentation/ABI/testing/sysfs-platform-intel-wmi-sbl-fw-update
+++ b/Documentation/ABI/testing/sysfs-platform-intel-wmi-sbl-fw-update
@ -1,4 +1,4 @@
-What:		/sys/bus/wmi/devices/44FADEB1-B204-40F2-8581-394BBDC1B651/firmware_update_request
+What:		/sys/bus/wmi/devices/44FADEB1-B204-40F2-8581-394BBDC1B651[-X]/firmware_update_request
 Date:		April 2020
 KernelVersion:	5.7
 Contact:	"Jithu Joseph" <jithu.joseph@intel.com>
--- a/Documentation/ABI/testing/sysfs-platform-intel-wmi-thunderbolt
+++ b/Documentation/ABI/testing/sysfs-platform-intel-wmi-thunderbolt
@ -1,4 +1,4 @@
-What:		/sys/devices/platform/<platform>/force_power
+What:		/sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341[-X]/force_power
 Date:		September 2017
 KernelVersion:	4.15
 Contact:	"Mario Limonciello" <mario.limonciello@outlook.com>
--- a/Documentation/ABI/testing/sysfs-secvar
+++ b/Documentation/ABI/testing/sysfs-secvar
@ -22,9 +22,13 @@ Description:	A string indicating which backend is in use by the firmware.
 		and is expected to be "ibm,edk2-compat-v1".

 		On pseries/PLPKS, this is generated by the kernel based on the
-		version number in the SB_VERSION variable in the keystore, and
-		has the form "ibm,plpks-sb-v<version>", or
-		"ibm,plpks-sb-unknown" if there is no SB_VERSION variable.
+		version number in the SB_VERSION variable in the keystore. The
+		version numbering in the SB_VERSION variable starts from 1. The
+		format string takes the form "ibm,plpks-sb-v<version>" in the
+		case of dynamic key management mode. If the SB_VERSION variable
+		does not exist (or there is an error while reading it), it takes
+		the form "ibm,plpks-sb-v0", indicating that the key management
+		mode is static.
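A tool that needs to branch on the PLPKS key management mode could derive it from the backend string as sketched below. The enum and function names are hypothetical; the sketch assumes only the two forms described above ("ibm,plpks-sb-v0" for static mode, "ibm,plpks-sb-v<version>" with a version of 1 or higher for dynamic mode) and treats anything else, including other backends such as "ibm,edk2-compat-v1", as not PLPKS.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of the PLPKS secvar format string. */
enum plpks_km_mode {
	PLPKS_KM_NOT_PLPKS = -1,	/* some other backend, or malformed */
	PLPKS_KM_STATIC = 0,		/* "ibm,plpks-sb-v0" */
	PLPKS_KM_DYNAMIC = 1,		/* "ibm,plpks-sb-v<version>", version >= 1 */
};

static enum plpks_km_mode plpks_mode_from_format(const char *fmt)
{
	static const char prefix[] = "ibm,plpks-sb-v";
	const char *num;
	char *end;
	unsigned long version;

	if (strncmp(fmt, prefix, sizeof(prefix) - 1) != 0)
		return PLPKS_KM_NOT_PLPKS;
	num = fmt + sizeof(prefix) - 1;
	if (*num < '0' || *num > '9')
		return PLPKS_KM_NOT_PLPKS;
	version = strtoul(num, &end, 10);
	/* Allow the trailing newline that reading a sysfs file usually yields. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return PLPKS_KM_NOT_PLPKS;
	return version == 0 ? PLPKS_KM_STATIC : PLPKS_KM_DYNAMIC;
}
```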

 What:		/sys/firmware/secvar/vars/<variable name>
 Date:		August 2019
@ -34,6 +38,13 @@ Description:	Each secure variable is represented as a directory named as
 		representation. The data and size can be determined by reading
 		their respective attribute files.

+		Only secvars relevant to the key management mode are exposed.
+		Only in the dynamic key management mode should the user have
+		access (read and write) to the secure boot secvars db, dbx,
+		grubdb, grubdbx, and sbat. These secvars are not consumed in the
+		static key management mode. PK, trustedcadb and moduledb are the
+		secvars common to both static and dynamic key management modes.
+
 What:		/sys/firmware/secvar/vars/<variable_name>/size
 Date:		August 2019
 Contact:	Nayna Jain <nayna@linux.ibm.com>
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@ -101,22 +101,6 @@ quiet_cmd_sphinx = SPHINX  $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
 		cp $(if $(patsubst /%,,$(DOCS_CSS)),$(abspath $(srctree)/$(DOCS_CSS)),$(DOCS_CSS)) $(BUILDDIR)/$3/_static/; \
 	fi

-YNL_INDEX:=$(srctree)/Documentation/networking/netlink_spec/index.rst
-YNL_RST_DIR:=$(srctree)/Documentation/networking/netlink_spec
-YNL_YAML_DIR:=$(srctree)/Documentation/netlink/specs
-YNL_TOOL:=$(srctree)/tools/net/ynl/pyynl/ynl_gen_rst.py
-
-YNL_RST_FILES_TMP := $(patsubst %.yaml,%.rst,$(wildcard $(YNL_YAML_DIR)/*.yaml))
-YNL_RST_FILES := $(patsubst $(YNL_YAML_DIR)%,$(YNL_RST_DIR)%, $(YNL_RST_FILES_TMP))
-
-$(YNL_INDEX): $(YNL_RST_FILES)
-	$(Q)$(YNL_TOOL) -o $@ -x
-
-$(YNL_RST_DIR)/%.rst: $(YNL_YAML_DIR)/%.yaml $(YNL_TOOL)
-	$(Q)$(YNL_TOOL) -i $< -o $@
-
-htmldocs texinfodocs latexdocs epubdocs xmldocs: $(YNL_INDEX)
-
 htmldocs:
 	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
@ -183,7 +167,6 @@ refcheckdocs:
 	$(Q)cd $(srctree);scripts/documentation-file-ref-check

 cleandocs:
-	$(Q)rm -f $(YNL_INDEX) $(YNL_RST_FILES)
 	$(Q)rm -rf $(BUILDDIR)
 	$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/userspace-api/media clean

--- a/Documentation/PCI/controller/index.rst
+++ b/Documentation/PCI/controller/index.rst
@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+PCI Native Host Bridge and Endpoint Drivers
+===========================================
+
+.. toctree::
+   :maxdepth: 2
+
+   rcar-pcie-firmware
--- a/Documentation/PCI/controller/rcar-pcie-firmware.rst
+++ b/Documentation/PCI/controller/rcar-pcie-firmware.rst
@ -0,0 +1,32 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================================
+Firmware of PCIe controller for Renesas R-Car V4H
+=================================================
+
+Renesas R-Car V4H (r8a779g0) has a PCIe controller, requiring a specific
+firmware download during startup.
+
+However, Renesas currently cannot distribute the firmware free of charge.
+
+The firmware file "104_PCIe_fw_addr_data_ver1.05.txt" (note that the file name
+might be different between different datasheet revisions) can be found in the
+datasheet encoded as text, and as such, the file's content must be converted
+back to binary form. This can be achieved using the following example script:
+
+.. code-block:: sh
+
+	$ awk '/^\s*0x[0-9A-Fa-f]{4}\s+0x[0-9A-Fa-f]{4}/ { print substr($2,5,2) substr($2,3,2) }' \
+		104_PCIe_fw_addr_data_ver1.05.txt | \
+			xxd -p -r > rcar_gen4_pcie.bin
+
+Once the text content has been converted into a binary firmware file, verify
+its checksum as follows:
+
+.. code-block:: sh
+
+	$ sha1sum rcar_gen4_pcie.bin
+	1d0bd4b189b4eb009f5d564b1f93a79112994945  rcar_gen4_pcie.bin
+
+The resulting binary file called "rcar_gen4_pcie.bin" should be placed in the
+"/lib/firmware" directory before the driver runs.
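If the awk/xxd pipeline is unavailable or behaves differently on a given host (not every awk implementation accepts the \s escape, for example), the same conversion can be sketched in C. This is a hypothetical helper written for illustration, assuming the text layout matched by the awk pattern (lines starting with a 16-bit hex address and a 16-bit hex data word, with the data word emitted low byte first); it has not been validated against the actual datasheet file, so the SHA-1 checksum remains the authority.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if p starts with "0x" followed by four hex digits. */
static int is_hex16(const char *p)
{
	int i;

	if (p[0] != '0' || p[1] != 'x')
		return 0;
	for (i = 2; i < 6; i++)
		if (!isxdigit((unsigned char)p[i]))
			return 0;
	return 1;
}

/*
 * Mirror of the awk filter: accept "0xAAAA<whitespace>0xDDDD..." and emit
 * the data word low byte first, as substr($2,5,2) substr($2,3,2) does.
 * Returns 1 on a match, 0 otherwise.
 */
static int fw_line_to_bytes(const char *line, unsigned char out[2])
{
	const char *p = line;
	char hex[5];
	unsigned long data;

	while (isspace((unsigned char)*p))
		p++;
	if (!is_hex16(p))		/* address field */
		return 0;
	p += 6;
	if (!isspace((unsigned char)*p))
		return 0;
	while (isspace((unsigned char)*p))
		p++;
	if (!is_hex16(p))		/* data field */
		return 0;

	memcpy(hex, p + 2, 4);
	hex[4] = '\0';
	data = strtoul(hex, NULL, 16);
	out[0] = data & 0xff;		/* low byte first */
	out[1] = (data >> 8) & 0xff;
	return 1;
}

/* Convert a whole text dump; returns the number of bytes written. */
static long fw_convert(FILE *in, FILE *out)
{
	char line[256];		/* assumes datasheet lines are shorter than this */
	unsigned char b[2];
	long n = 0;

	while (fgets(line, sizeof(line), in)) {
		if (fw_line_to_bytes(line, b) && fwrite(b, 1, 2, out) == 2)
			n += 2;
	}
	return n;
}
```

A caller would typically invoke fw_convert(stdin, stdout) from a small main() and redirect the output to rcar_gen4_pcie.bin before checking its SHA-1 sum.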
--- a/Documentation/PCI/endpoint/pci-endpoint.rst
+++ b/Documentation/PCI/endpoint/pci-endpoint.rst
@ -57,11 +57,10 @@ by the PCI controller driver.
   The PCI controller driver can then create a new EPC device by invoking
   devm_pci_epc_create()/pci_epc_create().

-* devm_pci_epc_destroy()/pci_epc_destroy()
+* pci_epc_destroy()

-   The PCI controller driver can destroy the EPC device created by either
-   devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
-   pci_epc_destroy().
+   The PCI controller driver can destroy the EPC device created by
+   pci_epc_create() using pci_epc_destroy().

 * pci_epc_linkup()

--- a/Documentation/PCI/endpoint/pci-test-howto.rst
+++ b/Documentation/PCI/endpoint/pci-test-howto.rst
@ -203,3 +203,18 @@ controllers, it is advisable to skip this testcase using this
 command::

 	# pci_endpoint_test -f pci_ep_bar -f pci_ep_basic -v memcpy -T COPY_TEST -v dma
+
+Kselftest EP Doorbell
+~~~~~~~~~~~~~~~~~~~~~
+
+If the Endpoint MSI controller is used for the doorbell use case, run the
+following command to test it::
+
+	# pci_endpoint_test -f pcie_ep_doorbell
+
+	# Starting 1 tests from 1 test cases.
+	#  RUN           pcie_ep_doorbell.DOORBELL_TEST ...
+	#            OK  pcie_ep_doorbell.DOORBELL_TEST
+	ok 1 pcie_ep_doorbell.DOORBELL_TEST
+	# PASSED: 1 / 1 tests passed.
+	# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@ -17,5 +17,6 @@ PCI Bus Subsystem
   pci-error-recovery
   pcieaer-howto
   endpoint/index
+   controller/index
   boot-interrupts
   tph
--- a/Documentation/PCI/pcieaer-howto.rst
+++ b/Documentation/PCI/pcieaer-howto.rst
@ -85,12 +85,27 @@ In the example, 'Requester ID' means the ID of the device that sent
 the error message to the Root Port. Please refer to PCIe specs for other
 fields.

+AER Ratelimits
+--------------
+
+Since error messages can be generated for each transaction, we may see
+large volumes of errors reported. To prevent spammy devices from flooding
+the console or stalling execution, messages are throttled by device and error
+type (correctable vs. non-fatal uncorrectable).  Fatal errors, including
+DPC errors, are not ratelimited.
+
+AER uses the default ratelimit of DEFAULT_RATELIMIT_BURST (10 events) over
+DEFAULT_RATELIMIT_INTERVAL (5 seconds).
+
+Ratelimits are exposed as sysfs attributes and can be configured.
+See Documentation/ABI/testing/sysfs-bus-pci-devices-aer.
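The throttling rule (DEFAULT_RATELIMIT_BURST events per DEFAULT_RATELIMIT_INTERVAL) can be modelled with a small sketch. It is a simplified illustration, not the kernel's ratelimit implementation: struct rl_state and rl_allow() are hypothetical names, time is passed in explicitly (in milliseconds here) rather than read from jiffies, and reporting of suppressed messages is left out. AER would keep one such state per device and error type.

```c
#include <stdbool.h>

/* Hypothetical, simplified burst/interval ratelimit state. */
struct rl_state {
	unsigned long interval;	/* window length, e.g. 5000 ms */
	unsigned int burst;	/* messages allowed per window, e.g. 10 */
	unsigned long begin;	/* start time of the current window */
	unsigned int printed;	/* messages allowed in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

/* Returns true if a message arriving at time @now may be logged. */
static bool rl_allow(struct rl_state *rs, unsigned long now)
{
	/* Open a new window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With interval = 5000 and burst = 10, a device reporting 100 correctable errors within the first 100 ms would have 10 of them logged and 90 suppressed; the next error at or after 5000 ms opens a new window and is logged again.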
+
 AER Statistics / Counters
 -------------------------

 When PCIe AER errors are captured, the counters / statistics are also exposed
 in the form of sysfs attributes which are documented at
-Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
+Documentation/ABI/testing/sysfs-bus-pci-devices-aer.

 Developer Guide
 ===============
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
@ -286,6 +286,39 @@ in order to detect the beginnings and ends of grace periods in a
 distributed fashion. The values flow from ``rcu_state`` to ``rcu_node``
 (down the tree from the root to the leaves) to ``rcu_data``.

+-----------------------------------------------------------------------+
+| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
+| Given that the root rcu_node structure has a gp_seq field,            |
+| why does RCU maintain a separate gp_seq in the rcu_state structure?   |
+| Why not just use the root rcu_node's gp_seq as the official record    |
+| and update it directly when starting a new grace period?              |
+-----------------------------------------------------------------------+
+| **Answer**:                                                           |
+-----------------------------------------------------------------------+
+| On single-node RCU trees (where the root node is also a leaf),        |
+| updating the root node's gp_seq immediately would create unnecessary  |
+| lock contention. Here's why:                                          |
+|                                                                       |
+| If we did rcu_seq_start() directly on the root node's gp_seq:         |
+|                                                                       |
+| 1. All CPUs would immediately see their node's gp_seq differ from     |
+|    their rdp's gp_seq in rcu_pending(); all then invoke the RCU core. |
+| 2. The RCU core calls note_gp_changes(), which takes the node lock.   |
+| 3. But rnp->qsmask isn't initialized yet (happens later in            |
+|    rcu_gp_init()).                                                    |
+| 4. So each CPU would acquire the lock, find it can't determine if it  |
+|    needs to report quiescent state (no qsmask), update rdp->gp_seq,   |
+|    and release the lock.                                              |
+| 5. Result: lots of lock acquisitions with no grace-period progress.   |
+|                                                                       |
+| By having a separate rcu_state.gp_seq, we can increment the official  |
+| grace period counter without immediately affecting what CPUs see in   |
+| their nodes. The hierarchical propagation in rcu_gp_init() then       |
+| updates the root node's gp_seq and qsmask together under the same lock|
+| acquisition, avoiding this useless contention.                        |
+-----------------------------------------------------------------------+
+
 Miscellaneous
 '''''''''''''

--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@ -1970,6 +1970,134 @@ corresponding CPU's leaf node lock is held. This avoids race conditions
 between RCU's hotplug notifier hooks, the grace period initialization
 code, and the FQS loop, all of which refer to or modify this bookkeeping.

+Note that grace period initialization (rcu_gp_init()) must carefully sequence
+CPU hotplug scanning with grace period state changes. For example, the
+following race could occur in rcu_gp_init() if rcu_seq_start() were to happen
+after the CPU hotplug scanning.
+
+.. code-block:: none
+
+   CPU0 (rcu_gp_init)                   CPU1                          CPU2
+   ---------------------                ----                          ----
+   // Hotplug scan first (WRONG ORDER)
+   rcu_for_each_leaf_node(rnp) {
+       rnp->qsmaskinit = rnp->qsmaskinitnext;
+   }
+                                        rcutree_report_cpu_starting()
+                                            rnp->qsmaskinitnext |= mask;
+                                        rcu_read_lock()
+                                        r0 = *X;
+                                                                      r1 = *X;
+                                                                      X = NULL;
+                                                                      cookie = get_state_synchronize_rcu();
+                                                                      // cookie = 8 (future GP)
+   rcu_seq_start(&rcu_state.gp_seq);
+   // gp_seq = 5
+
+   // CPU1 now invisible to this GP!
+   rcu_for_each_node_breadth_first() {
+       rnp->qsmask = rnp->qsmaskinit;
+       // CPU1 not included!
+   }
+
+   // GP completes without CPU1
+   rcu_seq_end(&rcu_state.gp_seq);
+   // gp_seq = 8
+                                                                      poll_state_synchronize_rcu(cookie);
+                                                                      // Returns true!
+                                                                      kfree(r1);
+                                        r2 = *r0; // USE-AFTER-FREE!
+
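+By incrementing gp_seq first, CPU1's RCU read-side critical section is
+guaranteed not to be missed by CPU2: either CPU1 comes online before the
+hotplug scan and the grace period waits for it, or it comes online after
+rcu_seq_start(), in which case CPU2's cookie refers to a later grace period.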
+
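The gp_seq values in the diagram can be reproduced with a user-space model of the sequence-number arithmetic. This sketch assumes the encoding used by the rcu_seq_*() helpers, with the two low-order bits holding the grace-period state; the function names below are hypothetical stand-ins, not kernel APIs. Starting from an idle gp_seq of 4, a cookie taken before rcu_seq_start() is 8, exactly the value reached when the grace period in the diagram ends; a cookie taken after rcu_seq_start() would instead be 12 and would require a later grace period.

```c
/*
 * User-space model of grace-period sequence numbers: the low two bits
 * hold the state (0 = idle), the remaining bits count grace periods.
 */
#define SEQ_CTR_SHIFT	2
#define SEQ_STATE_MASK	((1UL << SEQ_CTR_SHIFT) - 1)

/* Mark a grace period as started: state goes from 0 to 1. */
static void seq_start(unsigned long *sp)
{
	*sp += 1;
}

/* Mark it as ended: clear the state bits and advance the counter. */
static void seq_end(unsigned long *sp)
{
	*sp = (*sp | SEQ_STATE_MASK) + 1;
}

/* Cookie: the gp_seq value at which a full grace period has elapsed. */
static unsigned long seq_snap(unsigned long cur)
{
	return (cur + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* Wrap-safe "cur >= cookie" check, as used when polling a cookie. */
static int seq_done(unsigned long cur, unsigned long cookie)
{
	return (long)(cur - cookie) >= 0;
}
```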
+**Concurrent Quiescent State Reporting for Offline CPUs**
+
+RCU must ensure that CPUs going offline report quiescent states to avoid
+blocking grace periods. This requires careful synchronization to handle
+race conditions.
+
+**Race condition causing Offline CPU to hang GP**
+
+A race between CPU offlining and new GP initialization (gp_init) may occur
+because ``rcu_report_qs_rnp()`` in ``rcutree_report_cpu_dead()`` must temporarily
+release the ``rcu_node`` lock to wake the RCU grace-period kthread:
+
+.. code-block:: none
+
+   CPU1 (going offline)                 CPU0 (GP kthread)
+   --------------------                 -----------------
+   rcutree_report_cpu_dead()
+     rcu_report_qs_rnp()
+       // Must release rnp->lock to wake GP kthread
+       raw_spin_unlock_irqrestore_rcu_node()
+                                        // Wakes up and starts new GP
+                                        rcu_gp_init()
+                                          // First loop:
+                                          copies qsmaskinitnext->qsmaskinit
+                                          // CPU1 still in qsmaskinitnext!
+
+                                          // Second loop:
+                                          rnp->qsmask = rnp->qsmaskinit
+                                          mask = rnp->qsmask & ~rnp->qsmaskinitnext
+                                          // mask is 0! CPU1 still in both masks
+       // Reacquire lock (but too late)
+     rnp->qsmaskinitnext &= ~mask       // Finally clears bit
+
+Without ``ofl_lock``, the new grace period includes the offline CPU and waits
+forever for its quiescent state, causing a GP hang.
+
+**A solution with ofl_lock**
+
+The ``ofl_lock`` (offline lock) prevents ``rcu_gp_init()`` from running during
+the vulnerable window when ``rcu_report_qs_rnp()`` has released ``rnp->lock``:
+
+.. code-block:: none
+
+   CPU0 (rcu_gp_init)                   CPU1 (rcutree_report_cpu_dead)
+   ------------------                   ------------------------------
+   rcu_for_each_leaf_node(rnp) {
+       arch_spin_lock(&ofl_lock) -----> arch_spin_lock(&ofl_lock) [BLOCKED]
+
+       // Safe: CPU1 can't interfere
+       rnp->qsmaskinit = rnp->qsmaskinitnext
+
+       arch_spin_unlock(&ofl_lock) ---> // Now CPU1 can proceed
+   }                                    // But snapshot already taken
+
+**Another race causing GP hangs in rcu_gp_init(): Reporting QS for Now-offline CPUs**
+
+After the first loop takes an atomic snapshot of online CPUs, as shown above,
+the second loop in ``rcu_gp_init()`` detects CPUs that went offline between
+releasing ``ofl_lock`` and acquiring the per-node ``rnp->lock``. This detection is
+crucial because:
+
+1. The CPU might have gone offline after the snapshot but before the second loop.
+2. The offline CPU cannot report its own QS if it's already dead
+3. Without this detection, the grace period would wait forever for CPUs that
+   are now offline.
+
+The second loop performs this detection safely:
+
+.. code-block:: none
+
+   rcu_for_each_node_breadth_first(rnp) {
+       raw_spin_lock_irqsave_rcu_node(rnp, flags);
+       rnp->qsmask = rnp->qsmaskinit;  // Apply the snapshot
+
+       // Detect CPUs offline after snapshot
+       mask = rnp->qsmask & ~rnp->qsmaskinitnext;
+
+       if (mask && rcu_is_leaf_node(rnp))
+           rcu_report_qs_rnp(mask, ...)  // Report QS for offline CPUs
+   }
+
+This approach ensures atomicity: quiescent state reporting for offline CPUs
+happens either in ``rcu_gp_init()`` (second loop) or in ``rcutree_report_cpu_dead()``,
+never both and never neither. The ``rnp->lock`` held throughout the sequence
+prevents races: ``rcutree_report_cpu_dead()`` also acquires this lock when
+clearing ``qsmaskinitnext``, ensuring mutual exclusion.
+
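The mask arithmetic in the second loop can be checked in isolation. The sketch below is a single-threaded user-space model that leaves out locking and the rcu_node hierarchy; struct node_masks and its helpers are hypothetical names, not kernel definitions. With CPUs 0-2 in the snapshot and CPU1 going offline after ofl_lock is released, the computed mask is 0x2, and reporting a quiescent state for it leaves the grace period waiting only on CPUs 0 and 2.

```c
/* Hypothetical stand-in for the rcu_node mask fields (one bit per CPU). */
struct node_masks {
	unsigned long qsmaskinit;	/* snapshot taken in the first loop */
	unsigned long qsmaskinitnext;	/* CPUs currently online */
	unsigned long qsmask;		/* CPUs the new grace period waits for */
};

/*
 * Second-loop step: apply the snapshot, then return the CPUs that went
 * offline after it was taken (in qsmask but no longer in qsmaskinitnext).
 */
static unsigned long apply_snapshot(struct node_masks *rnp)
{
	rnp->qsmask = rnp->qsmaskinit;
	return rnp->qsmask & ~rnp->qsmaskinitnext;
}

/* Report quiescent states on behalf of the now-offline CPUs in @mask. */
static void report_offline_qs(struct node_masks *rnp, unsigned long mask)
{
	rnp->qsmask &= ~mask;
}
```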
 Scheduler and RCU
 ~~~~~~~~~~~~~~~~~

--- a/Documentation/RCU/listRCU.rst
+++ b/Documentation/RCU/listRCU.rst
@ -334,7 +334,7 @@ If the system-call audit module were to ever need to reject stale data, one way
 to accomplish this would be to add a ``deleted`` flag and a ``lock`` spinlock to the
 ``audit_entry`` structure, and modify audit_filter_task() as follows::

-	static enum audit_state audit_filter_task(struct task_struct *tsk)
+	static struct audit_entry *audit_filter_task(struct task_struct *tsk, char **key)
 	{
 		struct audit_entry *e;
 		enum audit_state   state;
@ -346,16 +346,18 @@ to accomplish this would be to add a ``deleted`` flag and a ``lock`` spinlock to
 				if (e->deleted) {
 					spin_unlock(&e->lock);
 					rcu_read_unlock();
-					return AUDIT_BUILD_CONTEXT;
+					return NULL;
 				}
 				rcu_read_unlock();
 				if (state == AUDIT_STATE_RECORD)
 					*key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
-				return state;
+				/* As long as e->lock is held, e is valid and
+				 * its value is not stale */
+				return e;
 			}
 		}
 		rcu_read_unlock();
-		return AUDIT_BUILD_CONTEXT;
+		return NULL;
 	}

 The ``audit_del_rule()`` function would need to set the ``deleted`` flag under the
--- a/Documentation/RCU/lockdep.rst
+++ b/Documentation/RCU/lockdep.rst
@ -106,7 +106,7 @@ or the RCU-protected data that it points to can change concurrently.
 Like rcu_dereference(), when lockdep is enabled, RCU list and hlist
 traversal primitives check for being called from within an RCU read-side
 critical section.  However, a lockdep expression can be passed to them
-as a additional optional argument.  With this lockdep expression, these
+as an additional optional argument.  With this lockdep expression, these
 traversal primitives will complain only if the lockdep expression is
 false and they are called from outside any RCU read-side critical section.

--- a/Documentation/RCU/rcubarrier.rst
+++ b/Documentation/RCU/rcubarrier.rst
@ -329,10 +329,7 @@ Answer:
 	was first added back in 2005.  This is because on_each_cpu()
 	disables preemption, which acted as an RCU read-side critical
 	section, thus preventing CPU 0's grace period from completing
-	until on_each_cpu() had dealt with all of the CPUs.  However,
-	with the advent of preemptible RCU, rcu_barrier() no longer
-	waited on nonpreemptible regions of code in preemptible kernels,
-	that being the job of the new rcu_barrier_sched() function.
+	until on_each_cpu() had dealt with all of the CPUs.

 	However, with the RCU flavor consolidation around v4.20, this
 	possibility was once again ruled out, because the consolidated
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@ -96,6 +96,13 @@ warnings:
 	the ``rcu_.*timer wakeup didn't happen for`` console-log message,
 	which will include additional debugging information.

+-	A timer issue causes time to appear to jump forward, so that RCU
+	believes that the RCU CPU stall-warning timeout has been exceeded
+	when in fact much less time has passed.  This could be due to
+	timer hardware bugs, timer driver bugs, or even corruption of
+	the "jiffies" global variable.  These sorts of timer hardware
+	and driver bugs are not uncommon when testing new hardware.
+
 -	A low-level kernel issue that either fails to invoke one of the
 	variants of rcu_eqs_enter(true), rcu_eqs_exit(true), ct_idle_enter(),
 	ct_idle_exit(), ct_irq_enter(), or ct_irq_exit() on the one
@ -112,7 +119,7 @@ warnings:
 	uncommon in large datacenter.  In one memorable case some decades
 	back, a CPU failed in a running system, becoming unresponsive,
 	but not causing an immediate crash.  This resulted in a series
-	of RCU CPU stall warnings, eventually leading the realization
+	of RCU CPU stall warnings, eventually leading to the realization
 	that the CPU had failed.

 The RCU, RCU-sched, RCU-tasks, and RCU-tasks-trace implementations have
@ -249,7 +256,7 @@ ticks this GP)" indicates that this CPU has not taken any scheduling-clock
 interrupts during the current stalled grace period.

 The "idle=" portion of the message prints the dyntick-idle state.
-The hex number before the first "/" is the low-order 12 bits of the
+The hex number before the first "/" is the low-order 16 bits of the
 dynticks counter, which will have an even-numbered value if the CPU
 is in dyntick-idle mode and an odd-numbered value otherwise.  The hex
 number between the two "/"s is the value of the nesting, which will be
--- a/Documentation/RCU/torture.rst
+++ b/Documentation/RCU/torture.rst
@ -364,7 +364,7 @@ systems must come first.
 The kvm.sh ``--dryrun scenarios`` argument is useful for working out
 how many scenarios may be run in one batch across a group of systems.

-You can also re-run a previous remote run in a manner similar to kvm.sh:
+You can also re-run a previous remote run in a manner similar to kvm.sh::

 	kvm-remote.sh "system0 system1 system2 system3 system4 system5" \
 		tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@ -15,6 +15,9 @@ to start learning about RCU:
 |	2014 Big API Table           https://lwn.net/Articles/609973/
 | 6.	The RCU API, 2019 Edition    https://lwn.net/Articles/777036/
 |	2019 Big API Table           https://lwn.net/Articles/777165/
+| 7.	The RCU API, 2024 Edition    https://lwn.net/Articles/988638/
+|       2024 Background Information  https://lwn.net/Articles/988641/
+|	2024 Big API Table           https://lwn.net/Articles/988666/

 For those preferring video:

--- a/Documentation/accel/qaic/aic080.rst
+++ b/Documentation/accel/qaic/aic080.rst
@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+===============================
+ Qualcomm Cloud AI 80 (AIC080)
+===============================
+
+Overview
+========
+
+The Qualcomm Cloud AI 80/AIC080 family of products is a derivative of AIC100.
+The number of NSPs and clock rates are reduced to fit within
+resource-constrained solutions. The PCIe Product ID is 0xa080.
+
+As a derivative product, all AIC100 documentation applies.
--- a/Documentation/accel/qaic/aic100.rst
+++ b/Documentation/accel/qaic/aic100.rst
@ -229,6 +229,8 @@ of the defined channels, and their uses.
 | _PERIODIC      |         |          | timestamps in the device side logs with|
 |                |         |          | the host time source.                  |
 +----------------+---------+----------+----------------------------------------+
+| IPCR           | 24 & 25 | AMSS     | AF_QIPCRTR clients and servers.        |
++----------------+---------+----------+----------------------------------------+

 DMA Bridge
 ==========
@ -485,8 +487,8 @@ one user crashes, the fallout of that should be limited to that workload and not
 impact other workloads. SSR accomplishes this.

 If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
-channel. This notification identifies the workload by it's assigned DBC. A
-multi-stage recovery process is then used to cleanup both sides, and get the
+channel. This notification identifies the workload by its assigned DBC. A
+multi-stage recovery process is then used to clean up both sides and get the
 DBC/NSPs into a working state.

 When SSR occurs, any state in the workload is lost. Any inputs that were in
@ -494,6 +496,27 @@ process, or queued by not yet serviced, are lost. The loaded artifacts will
 remain in on-card DDR, but the host will need to re-activate the workload if
 it desires to recover the workload.

+When SSR occurs for a specific NSP, the assigned DBC goes through the
+following state transitions in order:
+
+DBC_STATE_BEFORE_SHUTDOWN
+	Indicates that the affected NSP was found in an unrecoverable error
+	condition.
+DBC_STATE_AFTER_SHUTDOWN
+	Indicates that the NSP is under reset.
+DBC_STATE_BEFORE_POWER_UP
+	Indicates that the NSP's debug information has been gathered and is
+	ready to be collected by the host (if desired). At that stage, the NSP
+	is restarted by QSM.
+DBC_STATE_AFTER_POWER_UP
+	Indicates that the NSP has been restarted, is fully operational, and is
+	idle.
+
+SSR also has an optional crashdump collection feature. If enabled, the host can
+collect the memory dump for the crashed NSP and expose it to user space via
+the dev_coredump subsystem. The host can also decline the crashdump collection
+request from the device.
+
 Reliability, Accessibility, Serviceability (RAS)
 ================================================

--- a/Documentation/accel/qaic/index.rst
+++ b/Documentation/accel/qaic/index.rst
@ -10,4 +10,5 @@ accelerator cards.
 .. toctree::

   qaic
+   aic080
   aic100
--- a/Documentation/accel/qaic/qaic.rst
+++ b/Documentation/accel/qaic/qaic.rst
@ -36,7 +36,7 @@ polling mode and reenables the IRQ line.
 This mitigation in QAIC is very effective. The same lprnet usecase that
 generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
 IRQs over 5 minutes while keeping the host system stable, and having the same
-workload throughput performance (within run to run noise variation).
+workload throughput performance (within run-to-run noise variation).

 Single MSI Mode
 ---------------
@ -49,7 +49,7 @@ useful to be able to fall back to a single MSI when needed.
 To support this fallback, we allow the case where only one MSI is able to be
 allocated, and share that one MSI between MHI and the DBCs. The device detects
 when only one MSI has been configured and directs the interrupts for the DBCs
-to the interrupt normally used for MHI. Unfortunately this means that the
+to the interrupt normally used for MHI. Unfortunately, this means that the
 interrupt handlers for every DBC and MHI wake up for every interrupt that
 arrives; however, the DBC threaded irq handlers only are started when work to be
 done is detected (MHI will always start its threaded handler).
@ -62,9 +62,9 @@ never disabled, allowing each new entry to the FIFO to trigger a new interrupt.
 Neural Network Control (NNC) Protocol
 =====================================

-The implementation of NNC is split between the KMD (QAIC) and UMD. In general
+The implementation of NNC is split between the KMD (QAIC) and UMD. In general,
 QAIC understands how to encode/decode NNC wire protocol, and elements of the
-protocol which require kernel space knowledge to process (for example, mapping
+protocol which requires kernel space knowledge to process (for example, mapping
 host memory to device IOVAs). QAIC understands the structure of a message, and
 all of the transactions. QAIC does not understand commands (the payload of a
 passthrough transaction).
--- a/Documentation/admin-guide/blockdev/index.rst
+++ b/Documentation/admin-guide/blockdev/index.rst
@ -11,6 +11,7 @@ Block Devices
   nbd
   paride
   ramdisk
+   zoned_loop
   zram

   drbd/index
--- a/Documentation/admin-guide/blockdev/zoned_loop.rst
+++ b/Documentation/admin-guide/blockdev/zoned_loop.rst
@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Zoned Loop Block Device
+=======================
+
+.. Contents:
+
+	1) Overview
+	2) Creating a Zoned Device
+	3) Deleting a Zoned Device
+	4) Example
+
+
+1) Overview
+-----------
+
+The zoned loop block device driver (zloop) allows a user to create a zoned block
+device using one regular file per zone as backing storage. This driver does not
+directly control any hardware and uses read, write and truncate operations to
+regular files of a file system to emulate a zoned block device.
+
+Using zloop, zoned block devices with a configurable capacity, zone size and
+number of conventional zones can be created. The storage for each zone of the
+device is implemented using a regular file with a maximum size equal to the zone
+size. The size of a file backing a conventional zone is always equal to the zone
+size. The size of a file backing a sequential zone indicates the amount of data
+sequentially written to the file, that is, the size of the file directly
+indicates the position of the write pointer of the zone.
+
+When resetting a sequential zone, its backing file size is truncated to zero.
+Conversely, for a zone finish operation, the backing file is truncated to the
+zone size. With this, a zloop zoned block device can be created with a maximum
+capacity larger than the storage space available on the backing file system.
+Of course, for such a configuration, writing more data than the storage space
+available on the backing file system will result in write errors.
+
+The zoned loop block device driver implements a complete zone transition state
+machine. That is, zones can be empty, implicitly opened, explicitly opened,
+closed or full. The current implementation does not support any limits on the
+maximum number of open and active zones.
+
+No user tools are necessary to create and delete zloop devices.
+
+2) Creating a Zoned Device
+--------------------------
+
+Once the zloop module is loaded (or if zloop is compiled into the kernel), the
+character device file /dev/zloop-control can be used to add a zloop device.
+This is done by writing an "add" command directly to the /dev/zloop-control
+device::
+
+	$ modprobe zloop
+        $ ls -l /dev/zloop*
+        crw-------. 1 root root 10, 123 Jan  6 19:18 /dev/zloop-control
+
+        $ mkdir -p <base directory>/<device ID>
+        $ echo "add [options]" > /dev/zloop-control
+
+The options available for the add command can be listed by reading the
+/dev/zloop-control device::
+
+	$ cat /dev/zloop-control
+        add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io
+        remove id=%d
+
+In more detail, the options that can be used with the "add" command are as
+follows.
+
+================   ===========================================================
+id                 Device number (the X in /dev/zloopX).
+                   Default: automatically assigned.
+capacity_mb        Device total capacity in MiB. This is always rounded up to
+                   the nearest higher multiple of the zone size.
+                   Default: 16384 MiB (16 GiB).
+zone_size_mb       Device zone size in MiB. Default: 256 MiB.
+zone_capacity_mb   Device zone capacity (must always be equal to or lower than
+                   the zone size). Default: zone size.
+conv_zones         Total number of conventional zones starting from sector 0.
+                   Default: 8.
+base_dir           Path to the base directory in which to create the directory
+                   containing the zone files of the device.
+                   Default: /var/local/zloop.
+                   The device directory containing the zone files is always
+                   named with the device ID. E.g. the default zone file
+                   directory for /dev/zloop0 is /var/local/zloop/0.
+nr_queues          Number of I/O queues of the zoned block device. This value is
+                   always capped by the number of online CPUs.
+                   Default: 1
+queue_depth        Maximum I/O queue depth per I/O queue.
+                   Default: 64
+buffered_io        Use buffered I/O instead of direct I/O. Default: false.
+================   ===========================================================
+
+3) Deleting a Zoned Device
+--------------------------
+
+Deleting an unused zoned loop block device is done by issuing the "remove"
+command to /dev/zloop-control, specifying the ID of the device to remove::
+
+        $ echo "remove id=X" > /dev/zloop-control
+
+The remove command takes no options other than the device ID.
+
+A zoned device that was removed can be re-added without any change to the
+state of the device zones: the device zones are restored to their last state
+before the device was removed. Re-adding a removed zoned device must always be
+done using the same configuration as when the device was first added. If a
+zone configuration change is detected, an error will be returned and the zoned
+device will not be created.
+
+To fully delete a zoned device, after executing the remove operation, the device
+base directory containing the backing files of the device zones must be deleted.
+
+4) Example
+----------
+
+The following sequence of commands creates a 2 GiB zoned device with 64 MiB
+zones and a zone capacity of 63 MiB::
+
+        $ modprobe zloop
+        $ mkdir -p /var/local/zloop/0
+        $ echo "add capacity_mb=2048,zone_size_mb=64,zone_capacity_mb=63" > /dev/zloop-control
+
+For the device created (/dev/zloop0), the zone backing files are all created
+under the default base directory (/var/local/zloop)::
+
+        $ ls -l /var/local/zloop/0
+        total 0
+        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000000
+        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000001
+        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000002
+        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000003
+        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000004
+        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000005
+        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000006
+        -rw-------. 1 root root 67108864 Jan  6 22:23 cnv-000007
+        -rw-------. 1 root root        0 Jan  6 22:23 seq-000008
+        -rw-------. 1 root root        0 Jan  6 22:23 seq-000009
+        ...
+
+The zoned device created (/dev/zloop0) can then be used normally::
+
+        $ lsblk -z
+        NAME   ZONED        ZONE-SZ ZONE-NR ZONE-AMAX ZONE-OMAX ZONE-APP ZONE-WGRAN
+        zloop0 host-managed     64M      32         0         0       1M         4K
+        $ blkzone report /dev/zloop0
+          start: 0x000000000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+          start: 0x000020000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+          start: 0x000040000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+          start: 0x000060000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+          start: 0x000080000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+          start: 0x0000a0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+          start: 0x0000c0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+          start: 0x0000e0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+          start: 0x000100000, len 0x020000, cap 0x01f800, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
+          start: 0x000120000, len 0x020000, cap 0x01f800, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
+          ...
+
+Deleting this device is done using the command::
+
+        $ echo "remove id=0" > /dev/zloop-control
+
+The removed device can be re-added using the same "add" command as when
+the device was first created. To fully delete a zoned device, its backing files
+should also be deleted after executing the remove command::
+
+        $ rm -r /var/local/zloop/0
--- a/Documentation/admin-guide/device-mapper/dm-crypt.rst
+++ b/Documentation/admin-guide/device-mapper/dm-crypt.rst
@ -146,6 +146,11 @@ integrity:<bytes>:<type>
    integrity for the encrypted device. The additional space is then
    used for storing authentication tag (and persistent IV if needed).

+integrity_key_size:<bytes>
+    Optionally set the integrity key size if it differs from the digest size.
+    It allows the use of wrapped key algorithms where the key size is
+    independent of the cryptographic key size.
+
 sector_size:<bytes>
    Use <bytes> as the encryption unit instead of 512 bytes sectors.
    This option can be in range 512 - 4096 bytes and must be power of two.
--- a/Documentation/admin-guide/device-mapper/dm-integrity.rst
+++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst
@ -92,6 +92,11 @@ Target arguments:
 		allowed. This mode is useful for data recovery if the
 		device cannot be activated in any of the other standard
 		modes.
+	I - inline mode - in this mode, dm-integrity will store integrity
+		data directly in the underlying device sectors.
+		The underlying device must have an integrity profile that
+		allows storing user integrity data and provides enough
+		space for the selected integrity tag.

 5. the number of additional arguments

--- a/Documentation/admin-guide/device-mapper/thin-provisioning.rst
+++ b/Documentation/admin-guide/device-mapper/thin-provisioning.rst
@ -80,11 +80,11 @@ less sharing than average you'll need a larger-than-average metadata device.

 As a guide, we suggest you calculate the number of bytes to use in the
 metadata device as 48 * $data_dev_size / $data_block_size but round it up
-to 2MB if the answer is smaller.  If you're creating large numbers of
+to 2MiB if the answer is smaller.  If you're creating large numbers of
 snapshots which are recording large amounts of change, you may find you
 need to increase this.

-The largest size supported is 16GB: If the device is larger,
+The largest size supported is 16GiB: If the device is larger,
 a warning will be issued and the excess space will not be used.

 Reloading a pool table
@ -107,13 +107,13 @@ Using an existing pool device

 $data_block_size gives the smallest unit of disk space that can be
 allocated at a time expressed in units of 512-byte sectors.
-$data_block_size must be between 128 (64KB) and 2097152 (1GB) and a
-multiple of 128 (64KB).  $data_block_size cannot be changed after the
+$data_block_size must be between 128 (64KiB) and 2097152 (1GiB) and a
+multiple of 128 (64KiB).  $data_block_size cannot be changed after the
 thin-pool is created.  People primarily interested in thin provisioning
-may want to use a value such as 1024 (512KB).  People doing lots of
-snapshotting may want a smaller value such as 128 (64KB).  If you are
+may want to use a value such as 1024 (512KiB).  People doing lots of
+snapshotting may want a smaller value such as 128 (64KiB).  If you are
 not zeroing newly-allocated data, a larger $data_block_size in the
-region of 256000 (128MB) is suggested.
+region of 262144 (128MiB) is suggested.

 $low_water_mark is expressed in blocks of size $data_block_size.  If
 free space on the data device drops below this level then a dm event
@ -291,7 +291,7 @@ i) Constructor
      error_if_no_space:
 	Error IOs, instead of queueing, if no space.

-    Data block size must be between 64KB (128 sectors) and 1GB
+    Data block size must be between 64KiB (128 sectors) and 1GiB
    (2097152 sectors) inclusive.


--- a/Documentation/admin-guide/device-mapper/verity.rst
+++ b/Documentation/admin-guide/device-mapper/verity.rst
@ -87,6 +87,15 @@ panic_on_corruption
    Panic the device when a corrupted block is discovered. This option is
    not compatible with ignore_corruption and restart_on_corruption.

+restart_on_error
+    Restart the system when an I/O error is detected.
+    This option can be combined with the restart_on_corruption option.
+
+panic_on_error
+    Panic the device when an I/O error is detected. This option is
+    not compatible with the restart_on_error option but can be combined
+    with the panic_on_corruption option.
+
 ignore_zero_blocks
    Do not verify blocks that are expected to contain zeroes and always return
    zeroes instead. This may be useful if the partition contains unused blocks
@ -142,8 +151,15 @@ root_hash_sig_key_desc <key_description>
    already in the secondary trusted keyring.

 try_verify_in_tasklet
-    If verity hashes are in cache, verify data blocks in kernel tasklet instead
-    of workqueue. This option can reduce IO latency.
+    If verity hashes are in cache and the IO size does not exceed the limit,
+    verify data blocks in bottom half instead of workqueue. This option can
+    reduce IO latency. The size limits can be configured via
+    /sys/module/dm_verity/parameters/use_bh_bytes. The four parameters
+    correspond to limits for IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT,
+    IOPRIO_CLASS_BE and IOPRIO_CLASS_IDLE in turn.
+    For example::
+
+      <none>,<rt>,<be>,<idle>
+      4096,4096,4096,4096

 Theory of operation
 ===================
--- a/Documentation/admin-guide/hw-vuln/attack_vector_controls.rst
+++ b/Documentation/admin-guide/hw-vuln/attack_vector_controls.rst
@ -0,0 +1,236 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Attack Vector Controls
+======================
+
+Attack vector controls provide a simple method to configure only the mitigations
+for CPU vulnerabilities which are relevant given the intended use of a system.
+Administrators are encouraged to consider which attack vectors are relevant and
+disable all others in order to recoup system performance.
+
+When new relevant CPU vulnerabilities are found, they will be added to these
+attack vector controls so administrators will likely not need to reconfigure
+their command line parameters as mitigations will continue to be correctly
+applied based on the chosen attack vector controls.
+
+Attack Vectors
+--------------
+
+There are 5 sets of attack-vector mitigations currently supported by the kernel:
+
+#. :ref:`user_kernel`
+#. :ref:`user_user`
+#. :ref:`guest_host`
+#. :ref:`guest_guest`
+#. :ref:`smt`
+
+To control the enabled attack vectors, see :ref:`cmdline`.
+
+.. _user_kernel:
+
+User-to-Kernel
+^^^^^^^^^^^^^^
+
+The user-to-kernel attack vector involves a malicious userspace program
+attempting to leak kernel data into userspace by exploiting a CPU vulnerability.
+The kernel data involved might be limited to certain kernel memory, or include
+all memory in the system, depending on the vulnerability exploited.
+
+If no untrusted userspace applications are being run, such as with single-user
+systems, consider disabling user-to-kernel mitigations.
+
+Note that the CPU vulnerabilities mitigated by Linux have generally not been
+shown to be exploitable from browser-based sandboxes.  User-to-kernel
+mitigations are therefore mostly relevant if unknown userspace applications may
+be run by untrusted users.
+
+*user-to-kernel mitigations are enabled by default*
+
+.. _user_user:
+
+User-to-User
+^^^^^^^^^^^^
+
+The user-to-user attack vector involves a malicious userspace program attempting
+to influence the behavior of another unsuspecting userspace program in order to
+exfiltrate data.  The vulnerability of a userspace program is based on the
+program itself and the interfaces it provides.
+
+If no untrusted userspace applications are being run, consider disabling
+user-to-user mitigations.
+
+Note that because the Linux kernel contains a mapping of all physical memory,
+preventing a malicious userspace program from leaking data from another
+userspace program requires mitigating user-to-kernel attacks as well for
+complete protection.
+
+*user-to-user mitigations are enabled by default*
+
+.. _guest_host:
+
+Guest-to-Host
+^^^^^^^^^^^^^
+
+The guest-to-host attack vector involves a malicious VM attempting to leak
+hypervisor data into the VM.  The data involved may be limited, or may
+potentially include all memory in the system, depending on the vulnerability
+exploited.
+
+If no untrusted VMs are being run, consider disabling guest-to-host mitigations.
+
+*guest-to-host mitigations are enabled by default if KVM support is present*
+
+.. _guest_guest:
+
+Guest-to-Guest
+^^^^^^^^^^^^^^
+
+The guest-to-guest attack vector involves a malicious VM attempting to influence
+the behavior of another unsuspecting VM in order to exfiltrate data.  The
+vulnerability of a VM is based on the code inside the VM itself and the
+interfaces it provides.
+
+If no untrusted VMs are being run, or only a single VM is being run, consider
+disabling guest-to-guest mitigations.
+
+Similar to the user-to-user attack vector, preventing a malicious VM from
+leaking data from another VM requires mitigating guest-to-host attacks as well,
+because the Linux kernel contains a mapping of all physical memory.
+
+*guest-to-guest mitigations are enabled by default if KVM support is present*
+
+.. _smt:
+
+Cross-Thread
+^^^^^^^^^^^^
+
+The cross-thread attack vector involves a malicious userspace program or
+malicious VM either observing or attempting to influence the behavior of code
+running on the SMT sibling thread in order to exfiltrate data.
+
+Many cross-thread attacks can only be mitigated if SMT is disabled, which will
+result in reduced CPU core count and reduced performance.
+
+If cross-thread mitigations are fully enabled ('auto,nosmt'), all mitigations
+for cross-thread attacks will be enabled.  SMT may be disabled depending on
+which vulnerabilities are present in the CPU.
+
+If cross-thread mitigations are partially enabled ('auto'), mitigations for
+cross-thread attacks will be enabled but SMT will not be disabled.
+
+If cross-thread mitigations are disabled, no mitigations for cross-thread
+attacks will be enabled.
+
+Cross-thread mitigation may not be required if core-scheduling or similar
+techniques are used to prevent untrusted workloads from running on SMT siblings.
+
+*cross-thread mitigations default to partially enabled*
+
+.. _cmdline:
+
+Command Line Controls
+---------------------
+
+Attack vectors are controlled through the mitigations= command line option.  The
+value provided begins with a global option and then may optionally include one
+or more options to disable various attack vectors.
+
+Format:
+	| ``mitigations=[global]``
+	| ``mitigations=[global],[attack vectors]``
+
+Global options:
+
+============ =============================================================
+Option       Description
+============ =============================================================
+'off'        All attack vectors disabled.
+'auto'       All attack vectors enabled, partial cross-thread mitigations.
+'auto,nosmt' All attack vectors enabled, full cross-thread mitigations.
+============ =============================================================
+
+Attack vector options:
+
+================= =======================================
+Option            Description
+================= =======================================
+'no_user_kernel'  Disables user-to-kernel mitigations.
+'no_user_user'    Disables user-to-user mitigations.
+'no_guest_host'   Disables guest-to-host mitigations.
+'no_guest_guest'  Disables guest-to-guest mitigations.
+'no_cross_thread' Disables all cross-thread mitigations.
+================= =======================================
+
+Multiple attack vector options may be specified in a comma-separated list.  If
+the global option is not specified, it defaults to 'auto'.  The global option
+'off' is equivalent to disabling all attack vectors.
+
+Examples:
+	| ``mitigations=auto,no_user_kernel``
+
+	Enable all attack vectors except user-to-kernel.  Partial cross-thread
+	mitigations.
+
+	| ``mitigations=auto,nosmt,no_guest_host,no_guest_guest``
+
+	Enable all attack vectors and cross-thread mitigations except for
+	guest-to-host and guest-to-guest mitigations.
+
+	| ``mitigations=,no_cross_thread``
+
+	Enable all attack vectors but not cross-thread mitigations.
+
+Interactions with command-line options
+--------------------------------------
+
+Vulnerability-specific controls (e.g. "retbleed=off") take precedence over all
+attack vector controls.  Mitigations for individual vulnerabilities may be
+turned on or off via their command-line options regardless of the attack vector
+controls.
+
+Summary of attack-vector mitigations
+------------------------------------
+
+When a vulnerability is mitigated due to an attack-vector control, the default
+mitigation option for that particular vulnerability is used.  To use a different
+mitigation, please use the vulnerability-specific command line option.
+
+The table below summarizes which vulnerabilities are mitigated when different
+attack vectors are enabled and assuming the CPU is vulnerable.
+
+=============== ============== ============ ============= ============== ============ ========
+Vulnerability   User-to-Kernel User-to-User Guest-to-Host Guest-to-Guest Cross-Thread Notes
+=============== ============== ============ ============= ============== ============ ========
+BHI                   X                           X
+ITS                   X                           X
+GDS                   X              X            X              X            *       (Note 1)
+L1TF                  X                           X                           *       (Note 2)
+MDS                   X              X            X              X            *       (Note 2)
+MMIO                  X              X            X              X            *       (Note 2)
+Meltdown              X
+Retbleed              X                           X                           *       (Note 3)
+RFDS                  X              X            X              X
+Spectre_v1            X
+Spectre_v2            X                           X
+Spectre_v2_user                      X                           X            *       (Note 1)
+SRBDS                 X              X            X              X
+SRSO                  X              X            X              X
+SSB                                  X
+TAA                   X              X            X              X            *       (Note 2)
+TSA                   X              X            X              X
+VMSCAPE                                           X
+=============== ============== ============ ============= ============== ============ ========
+
+Notes:
+   1 --  Can be mitigated without disabling SMT.
+
+   2 --  Disables SMT if cross-thread mitigations are fully enabled and the CPU
+   is vulnerable.
+
+   3 --  Disables SMT if cross-thread mitigations are fully enabled, the CPU is
+   vulnerable, and STIBP is not supported.
+
+When an attack vector is disabled, all mitigations for the vulnerabilities
+listed in the above table are disabled, unless mitigation is required for a
+different enabled attack vector or a mitigation is explicitly selected via a
+vulnerability-specific command-line option.
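Taken together, the table and the rule above reduce to a lookup: a vulnerability stays mitigated if any of its marked attack vectors is enabled. The Python sketch below encodes the four attack-vector columns of the table. It assumes the CPU is affected, ignores the Cross-Thread column and its notes, and does not model vulnerability-specific overrides. The names ``VULN_VECTORS`` and ``mitigated`` are illustrative only.

```python
UK, UU, GH, GG = "user_kernel", "user_user", "guest_host", "guest_guest"
ALL = frozenset({UK, UU, GH, GG})

# Attack vectors marked "X" for each vulnerability in the table above.
VULN_VECTORS = {
    "BHI": {UK, GH},
    "ITS": {UK, GH},
    "GDS": ALL,
    "L1TF": {UK, GH},
    "MDS": ALL,
    "MMIO": ALL,
    "Meltdown": {UK},
    "Retbleed": {UK, GH},
    "RFDS": ALL,
    "Spectre_v1": {UK},
    "Spectre_v2": {UK, GH},
    "Spectre_v2_user": {UU, GG},
    "SRBDS": ALL,
    "SRSO": ALL,
    "SSB": {UU},
    "TAA": ALL,
    "TSA": ALL,
    "VMSCAPE": {GH},
}


def mitigated(enabled: set[str]) -> set[str]:
    """Vulnerabilities that remain mitigated for the enabled attack vectors."""
    return {vuln for vuln, vectors in VULN_VECTORS.items() if vectors & enabled}
```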
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@ -9,6 +9,7 @@ are configurable at compile, boot or run time.
 .. toctree::
   :maxdepth: 1

+   attack_vector_controls
   spectre
   l1tf
   mds
@ -23,5 +24,6 @@ are configurable at compile, boot or run time.
   gather_data_sampling
   reg-file-data-sampling
   rsb
+   old_microcode
   indirect-target-selection
   vmscape
--- a/Documentation/admin-guide/hw-vuln/old_microcode.rst
+++ b/Documentation/admin-guide/hw-vuln/old_microcode.rst
@ -0,0 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Old Microcode
+=============
+
+The kernel keeps a table of released microcode. Systems that had
+microcode older than this at boot will report "Vulnerable".  This means
+that the system was vulnerable to some known CPU issue, which could be
+a security or a functional one; the kernel does not know or care which.
+
+You should update the CPU microcode to mitigate any exposure. This is
+usually accomplished by updating the files in
+/lib/firmware/intel-ucode/ via normal distribution updates. Intel also
+distributes these files in a GitHub repository:
+
+	https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files.git
+
+Just like all the other hardware vulnerabilities, exposure is
+determined at boot. Runtime microcode updates do not change the status
+of this vulnerability.
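The boot-time comparison described above can be illustrated with a minimal Python sketch. The following parts are hypothetical placeholders, not the kernel's data:

- the table contents;
- the shape of the CPU identifier;
- the treatment of CPUs without a table entry.

The function takes the revision present at boot, because runtime updates do not change the reported status.

```python
# Hypothetical table: CPU identifier -> oldest released microcode revision.
# The key layout and the revision value are placeholders for illustration.
MIN_REVISIONS = {
    ("family-6", "model-A", "stepping-1"): 0x00000120,
}


def old_microcode_status(cpu_id, boot_revision: int) -> str:
    """Classify a CPU from the microcode revision it had at boot."""
    minimum = MIN_REVISIONS.get(cpu_id)
    if minimum is not None and boot_revision < minimum:
        return "Vulnerable"
    # Assumption: CPUs with no table entry are not reported as vulnerable.
    return "Not affected"
```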
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@ -551,6 +551,38 @@ from within add_taint() whenever the value set in this bitmask matches with the
 bit flag being set by add_taint().
 This will cause a kdump to occur at the add_taint()->panic() call.

+Write the dump file to encrypted disk volume
+============================================
+
+CONFIG_CRASH_DM_CRYPT can be enabled to support saving the dump file to an
+encrypted disk volume (only x86_64 is supported for now). User space can
+interact with /sys/kernel/config/crash_dm_crypt_keys for setup:
+
+1. Tell the first kernel which logon keys are needed to unlock the disk
+   volumes::
+
+    # Add key #1
+    mkdir /sys/kernel/config/crash_dm_crypt_keys/7d26b7b4-e342-4d2d-b660-7426b0996720
+    # Add key #1's description
+    echo cryptsetup:7d26b7b4-e342-4d2d-b660-7426b0996720 > /sys/kernel/config/crash_dm_crypt_keys/7d26b7b4-e342-4d2d-b660-7426b0996720/description
+
+    # how many keys do we have now?
+    cat /sys/kernel/config/crash_dm_crypt_keys/count
+    1
+
+    # Add key #2 in the same way
+
+    # how many keys do we have now?
+    cat /sys/kernel/config/crash_dm_crypt_keys/count
+    2
+
+    # To support CPU/memory hot-plugging, re-use keys already saved to reserved
+    # memory
+    echo true > /sys/kernel/config/crash_dm_crypt_keys/reuse
+
+2. Load the dump-capture kernel
+
+3. After the dump-capture kernel boots, restore the keys to the user keyring::
+
+    echo yes > /sys/kernel/config/crash_dm_crypt_keys/restore
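The key setup in step 1 can be scripted. The Python sketch below is a hedged example, not a supported tool:

- It takes the configfs root as a parameter, so it can be dry-run against an ordinary directory.
- It assumes, as the ``mkdir`` step implies, that each key's ``description`` attribute lives in that key's own directory.

The helper name ``add_dm_crypt_key`` is hypothetical.

```python
import os


def add_dm_crypt_key(configfs_root: str, key_id: str) -> str:
    """Create a crash_dm_crypt_keys entry and write its description.

    configfs_root is normally /sys/kernel/config; it is a parameter so the
    steps can be dry-run against an ordinary directory.
    """
    key_dir = os.path.join(configfs_root, "crash_dm_crypt_keys", key_id)
    os.mkdir(key_dir)  # equivalent of: mkdir .../crash_dm_crypt_keys/<id>
    desc_path = os.path.join(key_dir, "description")
    with open(desc_path, "w") as f:
        # equivalent of: echo cryptsetup:<id> > .../<id>/description
        f.write(f"cryptsetup:{key_id}\n")
    return desc_path
```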
+
 Contact
 =======

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@ -3501,8 +3501,16 @@

 	mga=		[HW,DRM]

-	microcode.force_minrev=	[X86]
-			Format: <bool>
+	microcode=	[X86] Control the behavior of the microcode loader.
+			Available options, comma separated:
+
+			base_rev=X: <X> has the format <u32>.
+			Set the base microcode revision of each thread when in
+			debug mode.
+
+			dis_ucode_ldr: disable the microcode loader.
+
+			force_minrev:
 			Enable or disable the microcode minimal revision
 			enforcement for the runtime microcode loader.

@ -3588,6 +3596,10 @@
 					       mmio_stale_data=full,nosmt [X86]
 					       retbleed=auto,nosmt [X86]

+			[X86] Any of the above options may additionally be
+			followed by attack-vector based controls, as documented in
+			Documentation/admin-guide/hw-vuln/attack_vector_controls.rst
+
 	mminit_loglevel=
 			[KNL,EARLY] When CONFIG_DEBUG_MEMORY_INIT is set, this
 			parameter allows control of the logging verbosity for
@ -4229,6 +4241,18 @@
 			This can be set from sysctl after boot.
 			See Documentation/admin-guide/sysctl/vm.rst for details.

+	nvme.quirks=	[NVME] A list of quirk entries to augment the built-in
+			nvme quirk list. List entries are separated by a
+			'-' character.
+			Each entry has the form VendorID:ProductID:quirk_names.
+			The IDs are 4-digit hexadecimal numbers and quirk_names
+			is a list of quirk names separated by commas. A quirk name
+			can be prefixed by '^', meaning that the specified
+			quirk must be disabled.
+
+			Example:
+			nvme.quirks=7710:2267:bogus_nid,^identify_cns-9900:7711:broken_msi
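To make the entry format concrete, the following Python sketch parses an ``nvme.quirks=`` string into a vendor ID, a product ID and per-quirk enable/disable flags. It only illustrates the syntax described above. It does not check quirk names against the driver's list, and ``parse_nvme_quirks`` is a hypothetical name.

```python
def parse_nvme_quirks(value: str) -> list[tuple[int, int, dict[str, bool]]]:
    """Parse an nvme.quirks= string.

    Returns (vendor_id, product_id, quirks) tuples, where quirks maps each
    quirk name to True (enable) or False (disable, '^' prefix).
    """
    entries = []
    for entry in value.split("-"):  # entries are separated by '-'
        vid, pid, names = entry.split(":", 2)
        if len(vid) != 4 or len(pid) != 4:
            raise ValueError(f"IDs must be 4 hex digits: {entry!r}")
        quirks = {}
        for name in names.split(","):  # quirk names are separated by ','
            if name.startswith("^"):
                quirks[name[1:]] = False  # '^' means: disable this quirk
            else:
                quirks[name] = True
        entries.append((int(vid, 16), int(pid, 16), quirks))
    return entries
```

For the example string above, it yields two entries:

- 0x7710:0x2267, with ``bogus_nid`` enabled and ``identify_cns`` disabled;
- 0x9900:0x7711, with ``broken_msi`` enabled.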
+
 	ohci1394_dma=early	[HW,EARLY] enable debugging via the ohci1394 driver.
 			See Documentation/core-api/debugging-via-ohci1394.rst for more
 			info.
@ -5262,7 +5286,8 @@
 			echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
 			or pass a boot parameter "rcutree.rcu_normal_wake_from_gp=1"

-			Default is 0.
+			Default is 1 if num_possible_cpus() <= 16, unless it
+			is explicitly disabled by passing 0 as a boot
+			parameter, and 0 otherwise.
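The resulting default can be summarized as a small function. This Python sketch makes two assumptions:

- An explicitly passed value (0 or 1) always takes precedence.
- ``None`` means the parameter was not given.

The function name is hypothetical.

```python
from typing import Optional


def rcu_normal_wake_from_gp(num_possible_cpus: int,
                            boot_param: Optional[int] = None) -> int:
    """Effective value of rcutree.rcu_normal_wake_from_gp.

    boot_param is None when the parameter was not given on the command line.
    """
    if boot_param is not None:
        return boot_param  # an explicit 0 or 1 takes precedence
    return 1 if num_possible_cpus <= 16 else 0
```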

 	rcuscale.gp_async= [KNL]
 			Measure performance of asynchronous
@ -5395,7 +5420,42 @@

 	rcutorture.gp_cond= [KNL]
 			Use conditional/asynchronous update-side
-			primitives, if available.
+			normal-grace-period primitives, if available.
+
+	rcutorture.gp_cond_exp= [KNL]
+			Use conditional/asynchronous update-side
+			expedited-grace-period primitives, if available.
+
+	rcutorture.gp_cond_full= [KNL]
+			Use conditional/asynchronous update-side
+			normal-grace-period primitives that also take
+			concurrent expedited grace periods into account,
+			if available.
+
+	rcutorture.gp_cond_exp_full= [KNL]
+			Use conditional/asynchronous update-side
+			expedited-grace-period primitives that also take
+			concurrent normal grace periods into account,
+			if available.
+
+	rcutorture.gp_cond_wi= [KNL]
+			Nominal wait interval for normal conditional
+			grace periods (specified by rcutorture's
+			gp_cond and gp_cond_full module parameters),
+			in microseconds.  The actual wait interval will
+			be randomly selected to nanosecond granularity up
+			to this wait interval.  Defaults to 16 jiffies,
+			for example, 16,000 microseconds on a system
+			with HZ=1000.
+
+	rcutorture.gp_cond_wi_exp= [KNL]
+			Nominal wait interval for expedited conditional
+			grace periods (specified by rcutorture's
+			gp_cond_exp and gp_cond_exp_full module
+			parameters), in microseconds.  The actual wait
+			interval will be randomly selected to nanosecond
+			granularity up to this wait interval.  Defaults to
+			128 microseconds.
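The defaults and randomization described for ``gp_cond_wi`` (and likewise ``gp_poll_wi``) can be sketched in Python. The example converts 16 jiffies to microseconds for a given HZ, then draws a wait at nanosecond granularity. The lower bound of zero is an assumption, and the helper names are hypothetical.

```python
import random


def default_wait_interval_us(hz: int) -> int:
    """Default nominal interval: 16 jiffies expressed in microseconds."""
    return 16 * 1_000_000 // hz


def pick_wait_ns(nominal_us: int, rng: random.Random) -> int:
    """Pick an actual wait, at nanosecond granularity, up to nominal_us."""
    return rng.randint(0, nominal_us * 1000)
```

With HZ=1000 the default is 16,000 microseconds; with HZ=250 it would be 64,000.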

 	rcutorture.gp_exp= [KNL]
 			Use expedited update-side primitives, if available.
@ -5404,6 +5464,43 @@
 			Use normal (non-expedited) asynchronous
 			update-side primitives, if available.

+	rcutorture.gp_poll= [KNL]
+			Use polled update-side normal-grace-period
+			primitives, if available.
+
+	rcutorture.gp_poll_exp= [KNL]
+			Use polled update-side expedited-grace-period
+			primitives, if available.
+
+	rcutorture.gp_poll_full= [KNL]
+			Use polled update-side normal-grace-period
+			primitives that also take concurrent expedited
+			grace periods into account, if available.
+
+	rcutorture.gp_poll_exp_full= [KNL]
+			Use polled update-side expedited-grace-period
+			primitives that also take concurrent normal
+			grace periods into account, if available.
+
+	rcutorture.gp_poll_wi= [KNL]
+			Nominal wait interval for normal polled
+			grace periods (specified by rcutorture's
+			gp_poll and gp_poll_full module parameters),
+			in microseconds.  The actual wait interval will
+			be randomly selected to nanosecond granularity up
+			to this wait interval.  Defaults to 16 jiffies,
+			for example, 16,000 microseconds on a system
+			with HZ=1000.
+
+	rcutorture.gp_poll_wi_exp= [KNL]
+			Nominal wait interval for expedited polled
+			grace periods (specified by rcutorture's
+			gp_poll_exp and gp_poll_exp_full module
+			parameters), in microseconds.  The actual wait
+			interval will be randomly selected to nanosecond
+			granularity up to this wait interval.  Defaults to
+			128 microseconds.
+
 	rcutorture.gp_sync= [KNL]
 			Use normal (non-expedited) synchronous
 			update-side primitives, if available.  If all
@ -5412,6 +5509,31 @@
 			are zero, rcutorture acts as if is interpreted
 			they are all non-zero.

+	rcutorture.gpwrap_lag= [KNL]
+			Enable grace-period wrap lag testing. Setting
+			to false prevents the gpwrap lag test from
+			running. Default is true.
+
+	rcutorture.gpwrap_lag_gps= [KNL]
+			Set the value for grace-period wrap lag during
+			active lag testing periods. This controls how many
+			grace-period differences are tolerated between
+			rdp's and rnp's gp_seq before the overflow flag
+			is set. Defaults to 8.
+
+	rcutorture.gpwrap_lag_cycle_mins= [KNL]
+			Set the total cycle duration for gpwrap lag
+			testing in minutes. This is the total time for
+			one complete cycle of active and inactive
+			testing periods. Default is 30 minutes.
+
+	rcutorture.gpwrap_lag_active_mins= [KNL]
+			Set the duration for which gpwrap lag is active
+			within each cycle, in minutes. During this time,
+			the grace-period wrap lag will be set to the
+			value specified by gpwrap_lag_gps. Default is
+			5 minutes.
+
 	rcutorture.irqreader= [KNL]
 			Run RCU readers from irq handlers, or, more
 			accurately, from a timer handler.  Not all RCU
@ -5457,10 +5579,21 @@
 			Set time (jiffies) between CPU-hotplug operations,
 			or zero to disable CPU-hotplug testing.

-	rcutorture.read_exit= [KNL]
-			Set the number of read-then-exit kthreads used
-			to test the interaction of RCU updaters and
-			task-exit processing.
+	rcutorture.preempt_duration= [KNL]
+			Set duration (in milliseconds) of preemptions
+			by a high-priority FIFO real-time task.  Set to
+			zero (the default) to disable.  The CPUs to
+			preempt are selected randomly from the set that
+			are online at a given point in time.  Races with
+			CPUs going offline are ignored, with that attempt
+			at preemption skipped.
+
+	rcutorture.preempt_interval= [KNL]
+			Set interval (in milliseconds, defaulting to one
+			second) between preemptions by a high-priority
+			FIFO real-time task.  This delay is mediated
+			by an hrtimer and is further fuzzed to avoid
+			inadvertent synchronizations.

 	rcutorture.read_exit_burst= [KNL]
 			The number of times in a given read-then-exit
@ -5471,6 +5604,14 @@
 			The delay, in seconds, between successive
 			read-then-exit testing episodes.

+	rcutorture.reader_flavor= [KNL]
+			A bit mask indicating which readers to use.
+			If more than one bit is set, the readers are
+			entered starting from the lowest-order bit and
+			exited in the opposite order.  For SRCU, the
+			0x1 bit is normal readers, 0x2 NMI-safe readers,
+			and 0x4 light-weight readers.
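The entry and exit ordering for a multi-bit mask can be illustrated with a short Python sketch. ``reader_order`` is a hypothetical helper, and the flavor names are the SRCU ones listed above.

```python
# SRCU reader flavors for each bit, per the description above.
SRCU_FLAVORS = {0x1: "normal", 0x2: "NMI-safe", 0x4: "light-weight"}


def reader_order(mask: int) -> tuple[list[int], list[int]]:
    """Return (entry_order, exit_order) of the bits set in mask."""
    bits = [1 << i for i in range(mask.bit_length()) if mask & (1 << i)]
    return bits, bits[::-1]  # enter from the low-order bit up, exit in reverse
```

For example, a mask of 0x5 enters normal and then light-weight readers, and exits them in the reverse order.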
+
 	rcutorture.shuffle_interval= [KNL]
 			Set task-shuffle interval (s).  Shuffling tasks
 			allows some CPUs to go into dyntick-idle mode
@ -5534,6 +5675,11 @@
 	rcutorture.test_boost_duration= [KNL]
 			Duration (s) of each individual boost test.

+	rcutorture.test_boost_holdoff= [KNL]
+			Holdoff time (s) from start of test to the start
+			of RCU priority-boost testing.  Defaults to zero,
+			that is, no holdoff.
+
 	rcutorture.test_boost_interval= [KNL]
 			Interval (s) between each boost test.

@ -5909,12 +6055,15 @@
 			blocked and everything unblocked.

 	rh_waived=
-			Enable waived features in RHEL.
+			Enable waived items in RHEL.

-			Waived features are disabled by default in RHEL, this parameter
-			provides support to enable such features, as needed.
+			Some specific features or security mitigations can be
+			waived (toggled on/off) on demand in RHEL.  However,
+			items should be waived judiciously, as doing so
+			generally means the system might end up being
+			considered insecure or even out-of-scope for support.

-			Format: <feat-1>,<feat-2>...<feat-n>
+			Format: <item-1>,<item-2>...<item-n>

 			Use 'rh_waived' to enable all waived features listed at
 			Documentation/admin-guide/rh-waived-features.rst
@ -5958,6 +6107,9 @@

 	rootflags=	[KNL] Set root filesystem mount option string

+	initramfs_options= [KNL]
+			Specify mount options for the initramfs mount.
+
 	rootfstype=	[KNL] Set root filesystem type

 	rootwait	[KNL] Wait (indefinitely) for root device to show up.
@ -5973,6 +6125,11 @@
 			Memory area to be used by remote processor image,
 			managed by CMA.

+	rt_group_sched=	[KNL] Enable or disable SCHED_RR/FIFO group scheduling
+			when CONFIG_RT_GROUP_SCHED=y. Defaults to
+			!CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED.
+			Format: <bool>
+
 	rw		[KNL] Mount root device read-write on boot

 	S		[KNL] Run init in single mode
--- a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
+++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
@ -315,7 +315,7 @@ To reduce its OS jitter, do at least one of the following:
 	to do.

 Name:
-  rcuop/%d and rcuos/%d
+  rcuop/%d, rcuos/%d, and rcuog/%d

 Purpose:
  Offload RCU callbacks from the corresponding CPU.
--- a/Documentation/admin-guide/namespaces/resource-control.rst
+++ b/Documentation/admin-guide/namespaces/resource-control.rst
@ -1,17 +1,17 @@
-===========================
-Namespaces research control
-===========================
+====================================
+User namespaces and resource control
+====================================

-There are a lot of kinds of objects in the kernel that don't have
-individual limits or that have limits that are ineffective when a set
-of processes is allowed to switch user ids.  With user namespaces
-enabled in a kernel for people who don't trust their users or their
-users programs to play nice this problems becomes more acute.
+The kernel contains many kinds of objects that either don't have
+individual limits or that have limits which are ineffective when
+a set of processes is allowed to switch their UID. On a system
+where the admins don't trust their users or their users' programs,
+user namespaces expose the system to potential misuse of resources.

-Therefore it is recommended that memory control groups be enabled in
-kernels that enable user namespaces, and it is further recommended
-that userspace configure memory control groups to limit how much
-memory user's they don't trust to play nice can use.
+In order to mitigate this, we recommend that admins enable memory
+control groups on any system that enables user namespaces.
+Furthermore, we recommend that admins configure the memory control
+groups to limit the maximum memory usable by any untrusted user.

 Memory control groups can be configured by installing the libcgroup
 package present on most distros editing /etc/cgrules.conf,
--- a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@ -16,8 +16,8 @@ provides the following two features:

 - one 64-bit counter for Time Based Analysis (RX/TX data throughput and
  time spent in each low-power LTSSM state) and
- one 32-bit counter for Event Counting (error and non-error events for
-  a specified lane)
+- one 32-bit counter per event for Event Counting (error and non-error
+  events for a specified lane)

 Note: There is no interrupt for counter overflow.

@ -60,7 +60,7 @@ description of available events and configuration options in sysfs, see
 The "format" directory describes format of the config fields of the
 perf_event_attr structure. The "events" directory provides configuration
 templates for all documented events.  For example,
-"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
+"rx_pcie_tlp_data_payload" is equivalent to "eventid=0x21,type=0x0".

 The "perf list" command shall list the available events from sysfs, e.g.::

@ -79,8 +79,8 @@ Example usage of counting PCIe RX TLP data payload (Units of bytes)::

 The average RX/TX bandwidth can be calculated using the following formula:

-    PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
-    PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window
+    PCIe RX Bandwidth = rx_pcie_tlp_data_payload / Measure_Time_Window
+    PCIe TX Bandwidth = tx_pcie_tlp_data_payload / Measure_Time_Window
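A worked example of the formula follows as a Python sketch. It assumes the payload counter value is in bytes, as in the example above, and the window is in seconds, so the result is in bytes per second. The function name is hypothetical.

```python
def pcie_bandwidth(payload_bytes: int, window_seconds: float) -> float:
    """Average bandwidth in bytes per second over the measurement window."""
    if window_seconds <= 0:
        raise ValueError("measurement window must be positive")
    return payload_bytes / window_seconds
```

For instance, 2,000,000,000 bytes counted over a 2-second window gives an average of 1,000,000,000 bytes per second.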

 Lane Event Usage
 -------------------------------
--- a/Documentation/admin-guide/perf/fujitsu_uncore_pmu.rst
+++ b/Documentation/admin-guide/perf/fujitsu_uncore_pmu.rst
@ -0,0 +1,115 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+================================================
+Fujitsu Uncore Performance Monitoring Unit (PMU)
+================================================
+
+This driver supports the Uncore MAC PMUs and the Uncore PCI PMUs found
+in Fujitsu chips.
+Each MAC PMU on these chips is exposed as an uncore perf PMU with device name
+mac_iod<iod>_mac<mac>_ch<ch>.
+Each PCI PMU on these chips is exposed as an uncore perf PMU with device name
+pci_iod<iod>_pci<pci>.
+
+The driver provides a description of its available events and configuration
+options in sysfs, see /sys/bus/event_source/devices/mac_iod<iod>_mac<mac>_ch<ch>/
+and /sys/bus/event_source/devices/pci_iod<iod>_pci<pci>/.
+This driver exports:
+
+- formats, used by perf user space and other tools to configure events
+- events, used by perf user space and other tools to create events
+  symbolically, e.g.::
+
+    perf stat -a -e mac_iod0_mac0_ch0/event=0x21/ ls
+    perf stat -a -e pci_iod0_pci0/event=0x24/ ls
+
+- cpumask, used by perf user space and other tools to know on which CPUs
+  to open the events
+
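When scripting ``perf`` invocations, the PMU device names and event strings follow directly from the naming scheme above. The Python helpers below are hypothetical conveniences. The valid ``iod``/``mac``/``ch``/``pci`` indices on a given system should be taken from the PMUs actually present in sysfs.

```python
def mac_pmu(iod: int, mac: int, ch: int) -> str:
    """Device name of a MAC PMU, e.g. mac_iod0_mac0_ch0."""
    return f"mac_iod{iod}_mac{mac}_ch{ch}"


def pci_pmu(iod: int, pci: int) -> str:
    """Device name of a PCI PMU, e.g. pci_iod0_pci0."""
    return f"pci_iod{iod}_pci{pci}"


def perf_event(pmu: str, event: str) -> str:
    """Build an argument for 'perf stat -e', e.g. mac_iod0_mac0_ch0/ea-mac/."""
    return f"{pmu}/{event}/"
```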
+This driver supports the following events for MAC:
+
+- cycles
+  This event counts MAC cycles at MAC frequency.
+- read-count
+  This event counts the number of read requests to MAC.
+- read-count-request
+  This event counts the number of read requests including retry to MAC.
+- read-count-return
+  This event counts the number of responses to read requests to MAC.
+- read-count-request-pftgt
+  This event counts the number of read requests including retry with PFTGT
+  flag.
+- read-count-request-normal
+  This event counts the number of read requests including retry without PFTGT
+  flag.
+- read-count-return-pftgt-hit
+  This event counts the number of responses to read requests which hit the
+  PFTGT buffer.
+- read-count-return-pftgt-miss
+  This event counts the number of responses to read requests which miss the
+  PFTGT buffer.
+- read-wait
+  This event counts outstanding read requests issued by the DDR memory
+  controller per cycle.
+- write-count
+  This event counts the number of write requests to MAC (including zero write,
+  full write, partial write, write cancel).
+- write-count-write
+  This event counts the number of full write requests to MAC (not including
+  zero write).
+- write-count-pwrite
+  This event counts the number of partial write requests to MAC.
+- memory-read-count
+  This event counts the number of read requests from MAC to memory.
+- memory-write-count
+  This event counts the number of full write requests from MAC to memory.
+- memory-pwrite-count
+  This event counts the number of partial write requests from MAC to memory.
+- ea-mac
+  This event counts energy consumption of MAC.
+- ea-memory
+  This event counts energy consumption of memory.
+- ea-memory-mac-write
+  This event counts the number of write requests from MAC to memory.
+- ea-ha
+  This event counts energy consumption of HA.
+
+  'ea' is the abbreviation for 'Energy Analyzer'.
+
+Examples for use with perf::
+
+  perf stat -e mac_iod0_mac0_ch0/ea-mac/ ls
+
+This driver also supports the following events for PCI:
+
+- pci-port0-cycles
+  This event counts PCI cycles at PCI frequency in port0.
+- pci-port0-read-count
+  This event counts read transactions for data transfer in port0.
+- pci-port0-read-count-bus
+  This event counts read transactions for bus usage in port0.
+- pci-port0-write-count
+  This event counts write transactions for data transfer in port0.
+- pci-port0-write-count-bus
+  This event counts write transactions for bus usage in port0.
+- pci-port1-cycles
+  This event counts PCI cycles at PCI frequency in port1.
+- pci-port1-read-count
+  This event counts read transactions for data transfer in port1.
+- pci-port1-read-count-bus
+  This event counts read transactions for bus usage in port1.
+- pci-port1-write-count
+  This event counts write transactions for data transfer in port1.
+- pci-port1-write-count-bus
+  This event counts write transactions for bus usage in port1.
+- ea-pci
+  This event counts energy consumption of PCI.
+
+  'ea' is the abbreviation for 'Energy Analyzer'.
+
+Examples for use with perf::
+
+  perf stat -e pci_iod0_pci0/ea-pci/ ls
+
+Given that these are uncore PMUs, the driver does not support sampling;
+therefore, "perf record" will not work. Per-task perf sessions are not supported.
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@ -26,3 +26,4 @@ Performance monitor support
   meson-ddr-pmu
   cxl
   ampere_cspmu
+   fujitsu_uncore_pmu
--- a/Documentation/admin-guide/rh-waived-features.rst
+++ b/Documentation/admin-guide/rh-waived-features.rst
@ -1,21 +0,0 @@
-.. _rh_waived_features:
-
-=======================
-Red Hat Waived Features
-=======================
-
-Red Hat waived features are features considered unmaintained, insecure, rudimentary, or
-deprecated and are shipped in RHEL only for customer convenience. These features are disabled
-by default but can be enabled on demand via the ``rh_waived`` kernel boot parameter. To allow
-a set of waived features, append ``rh_waived=<feature name>,...,<feature name>`` to the kernel
-cmdline. Appending only ``rh_waived`` (with no arguments) will enable all waived features
-listed below.
-
-The waived features listed in the next session follow the pattern below:
-
- feature name
-        feature description
-
-List of Red Hat Waived Features
-===============================
-
--- a/Documentation/admin-guide/rh-waived-items.rst
+++ b/Documentation/admin-guide/rh-waived-items.rst
@ -0,0 +1,35 @@
+.. _rh_waived_items:
+
+====================
+Red Hat Waived Items
+====================
+
+Waived Items is a mechanism offered by Red Hat which allows customers to "waive"
+and utilize features that are not enabled by default because they are considered
+unmaintained, insecure, rudimentary, or deprecated, but are shipped with the
+RHEL kernel for customers' convenience only.
+Waived Items can range from features that can be enabled on demand to specific
+security mitigations that can be disabled on demand.
+
+To explicitly "waive" any of these items, RHEL offers the ``rh_waived``
+kernel boot parameter. To waive a set of items, append
+``rh_waived=<item name>,...,<item name>`` to the kernel
+cmdline.
+Appending ``rh_waived=features`` will waive all features listed below,
+and appending ``rh_waived=cves`` will waive all security mitigations
+listed below.
+
+The waived items listed in the next section follow the pattern below:
+
+- item name
+        item description
+
+List of Red Hat Waived Items
+============================
+
+- CVE-2025-38085
+        Waiving this mitigation can help address perceived performance
+        degradation on some workloads utilizing huge pages [1], at the expense
+        of re-introducing the conditions that allow the data race leading to
+        the enumerated common vulnerability.
+        [1] https://access.redhat.com/solutions/7132440
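The selection logic described above can be sketched in Python. This models the documented behavior only; it is not the RHEL implementation. It makes three assumptions:

- Item names must match exactly (case-sensitive).
- Unknown names are ignored.
- The item table contains just the entry listed here.

```python
# Waivable items and the group keyword that selects them. Only the item
# listed above is included; a real kernel build may carry more.
WAIVABLE_ITEMS = {"CVE-2025-38085": "cves"}
GROUPS = {"features", "cves"}


def parse_rh_waived(value: str) -> set[str]:
    """Return the set of item names waived by rh_waived=<value>."""
    waived = set()
    for token in value.split(","):
        if token in GROUPS:
            # 'features' or 'cves' waives every item in that group.
            waived |= {name for name, group in WAIVABLE_ITEMS.items()
                       if group == token}
        elif token in WAIVABLE_ITEMS:
            waived.add(token)
        # Assumption: unknown names are ignored rather than rejected.
    return waived
```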
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@ -53,20 +53,25 @@ following prctl:

  prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <offset>, <length>, [selector])

-<op> is either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF, to enable and
-disable the mechanism globally for that thread.  When
-PR_SYS_DISPATCH_OFF is used, the other fields must be zero.
+<op> is one of PR_SYS_DISPATCH_EXCLUSIVE_ON, PR_SYS_DISPATCH_INCLUSIVE_ON,
+or PR_SYS_DISPATCH_OFF: the first two enable the mechanism globally for that
+thread and the last one disables it.  When PR_SYS_DISPATCH_OFF is used, the
+other fields must be zero.

-[<offset>, <offset>+<length>) delimit a memory region interval
-from which syscalls are always executed directly, regardless of the
-userspace selector.  This provides a fast path for the C library, which
-includes the most common syscall dispatchers in the native code
-applications, and also provides a way for the signal handler to return
+For PR_SYS_DISPATCH_EXCLUSIVE_ON, [<offset>, <offset>+<length>) delimits
+a memory region from which syscalls are always executed directly,
+regardless of the userspace selector.  This provides a fast path for the
+C library, which includes the most common syscall dispatchers in the native
+code applications, and also provides a way for the signal handler to return
 without triggering a nested SIGSYS on (rt\_)sigreturn.  Users of this
 interface should make sure that at least the signal trampoline code is
 included in this region. In addition, for syscalls that implement the
 trampoline code on the vDSO, that trampoline is never intercepted.

+For PR_SYS_DISPATCH_INCLUSIVE_ON, [<offset>, <offset>+<length>) delimits
+a memory region from which syscalls are dispatched based on the userspace
+selector.  Syscalls from outside of the range are always executed
+directly.
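The two modes differ only in which side of the region consults the selector. The following Python model of that decision is a sketch:

- ``selector_blocks`` stands for a selector value that requests redirection.
- It ignores the vDSO trampoline special case described above.

The function name ``intercepted`` is hypothetical.

```python
def intercepted(mode: str, ip: int, offset: int, length: int,
                selector_blocks: bool) -> bool:
    """Return True if a syscall issued at address ip is redirected."""
    in_region = offset <= ip < offset + length  # [offset, offset+length)
    if mode == "off":
        return False
    if mode == "exclusive":
        # Inside the region: always executed directly.
        return not in_region and selector_blocks
    if mode == "inclusive":
        # Outside the region: always executed directly.
        return in_region and selector_blocks
    raise ValueError(f"unknown mode: {mode!r}")
```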
+
 [selector] is a pointer to a char-sized region in the process memory
 region, that provides a quick way to enable disable syscall redirection
 thread-wide, without the need to invoke the kernel directly.  selector
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@ -177,6 +177,7 @@ core_pattern
 	%E		executable path
 	%c		maximum size of core file by resource limit RLIMIT_CORE
 	%C		CPU the task ran on
+	%F		pidfd number
 	%<OTHER>	both are dropped
 	========	==========================================

--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@ -40,8 +40,8 @@ Table : Subdirectories in /proc/sys/net
 bridge    Bridging              rose       X.25 PLP layer
 core      General parameter     tipc       TIPC
 ethernet  Ethernet protocol     unix       Unix domain sockets
- ipv4      IP version 4          x25        X.25 protocol
- ipv6      IP version 6
+ ipv4      IP version 4          vsock      VSOCK sockets
+ ipv6      IP version 6          x25        X.25 protocol
 ========= =================== = ========== ===================

 1. /proc/sys/net/core - Network core options
@ -513,3 +513,54 @@ originally may have been issued in the correct sequential order.
 If named_timeout is nonzero, failed topology updates will be placed on a defer
 queue until another event arrives that clears the error, or until the timeout
 expires. Value is in milliseconds.
+
+6. /proc/sys/net/vsock - VSOCK sockets
+--------------------------------------
+
+VSOCK sockets (AF_VSOCK) provide communication between virtual machines and
+their hosts. The behavior of VSOCK sockets in a network namespace is determined
+by the namespace's mode (``global`` or ``local``), which controls how CIDs
+(Context IDs) are allocated and how sockets interact across namespaces.
+
+ns_mode
+-------
+
+Read-only. Reports the current namespace's mode, set at namespace creation
+and immutable thereafter.
+
+Values:
+
+	- ``global`` - the namespace shares system-wide CID allocation and
+	  its sockets can reach any VM or socket in any global namespace.
+	  Sockets in this namespace cannot reach sockets in local
+	  namespaces.
+	- ``local`` - the namespace has private CID allocation and its
+	  sockets can only connect to VMs or sockets within the same
+	  namespace.
+
+The init_net mode is always ``global``.
+
+child_ns_mode
+-------------
+
+Controls what mode newly created child namespaces will inherit. At namespace
+creation, ``ns_mode`` is inherited from the parent's ``child_ns_mode``. The
+initial value matches the namespace's own ``ns_mode``.
+
+Values:
+
+	- ``global`` - child namespaces will share system-wide CID allocation
+	  and their sockets will be able to reach any VM or socket in any
+	  global namespace.
+	- ``local`` - child namespaces will have private CID allocation and
+	  their sockets will only be able to connect within their own
+	  namespace.
+
+The first write to ``child_ns_mode`` locks its value. Subsequent writes of the
+same value succeed, but writing a different value returns ``-EBUSY``.
+
+Changing ``child_ns_mode`` only affects namespaces created after the change;
+it does not modify the current namespace or any existing children.
+
+A namespace with ``ns_mode`` set to ``local`` cannot change
+``child_ns_mode`` to ``global`` (returns ``-EPERM``).
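The inheritance and locking rules above can be modelled as a small state machine. The Python class below is a sketch of the documented semantics, not kernel code. Two details are assumptions: the ``-EINVAL`` result for unknown values, and the order in which ``-EPERM`` and ``-EBUSY`` are checked.

```python
import errno


class VsockNetns:
    """Model of a network namespace's vsock ns_mode/child_ns_mode."""

    def __init__(self, parent: "VsockNetns | None" = None):
        # init_net (no parent) is always global; children inherit the
        # parent's child_ns_mode at creation time.
        self.ns_mode = "global" if parent is None else parent.child_ns_mode
        # child_ns_mode starts out equal to the namespace's own ns_mode.
        self.child_ns_mode = self.ns_mode
        self._child_locked = False

    def write_child_ns_mode(self, value: str) -> int:
        """Emulate a write to child_ns_mode; returns 0 or a negative errno."""
        if value not in ("global", "local"):
            return -errno.EINVAL
        if self.ns_mode == "local" and value == "global":
            return -errno.EPERM
        if self._child_locked and value != self.child_ns_mode:
            return -errno.EBUSY
        self.child_ns_mode = value
        self._child_locked = True  # the first successful write locks the value
        return 0
```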
--- a/Documentation/admin-guide/thunderbolt.rst
+++ b/Documentation/admin-guide/thunderbolt.rst
@ -296,6 +296,39 @@ information is missing.
 To recover from this mode, one needs to flash a valid NVM image to the
 host controller in the same way it is done in the previous chapter.

+Tunneling events
+----------------
+The driver sends ``KOBJ_CHANGE`` events to userspace when there is a
+tunneling change in the ``thunderbolt_domain``. The notification carries
+the following environment variables::
+
+  TUNNEL_EVENT=<EVENT>
+  TUNNEL_DETAILS=0:12 <-> 1:20 (USB3)
+
+Possible values for ``<EVENT>`` are:
+
+  activated
+    The tunnel was activated (created).
+
+  changed
+    There is a change in this tunnel. For example bandwidth allocation was
+    changed.
+
+  deactivated
+    The tunnel was torn down.
+
+  low bandwidth
+    The tunnel is not getting optimal bandwidth.
+
+  insufficient bandwidth
+    There is not enough bandwidth for the current tunnel requirements.
+
+``TUNNEL_DETAILS`` is only provided if the tunnel is known. For example,
+with the Firmware Connection Manager it is missing or does not provide
+full tunnel information, while with the Software Connection Manager it
+includes full tunnel details. The format currently matches what the
+driver uses when logging and may change over time.
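A userspace consumer should treat ``TUNNEL_DETAILS`` as optional and opaque. The Python sketch below extracts both variables from a list of ``KEY=VALUE`` strings, as a uevent listener might receive them. How the strings are obtained (for example, via a udev monitor) is left out, and the function name is hypothetical.

```python
def parse_tunnel_uevent(env: list[str]):
    """Return (event, details) from KEY=VALUE uevent strings, or None.

    details is None when TUNNEL_DETAILS is absent, and is otherwise kept as
    an opaque string because its format may change.
    """
    fields = {}
    for item in env:
        key, sep, value = item.partition("=")
        if sep:
            fields[key] = value
    event = fields.get("TUNNEL_EVENT")
    if event is None:
        return None  # not a tunneling event
    return event, fields.get("TUNNEL_DETAILS")
```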
+
 Networking over Thunderbolt cable
 ---------------------------------
 Thunderbolt technology allows software communication between two hosts
@ -325,12 +358,7 @@ Forcing power
 Many OEMs include a method that can be used to force the power of a
 Thunderbolt controller to an "On" state even if nothing is connected.
 If supported by your machine this will be exposed by the WMI bus with
-a sysfs attribute called "force_power".
-
-For example the intel-wmi-thunderbolt driver exposes this attribute in:
-  /sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
-
-  To force the power to on, write 1 to this attribute file.
-  To disable force power, write 0 to this attribute file.
+a sysfs attribute called "force_power"; see
+Documentation/ABI/testing/sysfs-platform-intel-wmi-thunderbolt for details.

 Note: it's currently not possible to query the force power state of a platform.
--- a/Documentation/admin-guide/xfs.rst
+++ b/Documentation/admin-guide/xfs.rst
@ -124,6 +124,14 @@ When mounting an XFS filesystem, the following options are accepted.
 	controls the size of each buffer and so is also relevant to
 	this case.

+  lifetime (default) or nolifetime
+	Enable data placement based on write lifetime hints provided
+	by the user. This turns on co-allocation of data with similar
+	lifetimes when statistically favorable, to reduce garbage
+	collection cost.
+
+	These options are only available for zoned rt file systems.
+
  logbsize=value
 	Set the size of each in-memory log buffer.  The size may be
 	specified in bytes, or in kilobytes with a "k" suffix.
@ -143,6 +151,14 @@ When mounting an XFS filesystem, the following options are accepted.
 	optional, and the log section can be separate from the data
 	section or contained within it.

+  max_open_zones=value
+	Specify the maximum number of zones to keep open for writing on a
+	zoned rt device. Many open zones aid file data separation
+	but may impact performance on HDDs.
+
+	If ``max_open_zones`` is not specified, the value is determined
+	by the capabilities and the size of the zoned rt device.
+
  noalign
 	Data allocations will not be aligned at stripe unit
 	boundaries. This is only relevant to filesystems created
@ -542,3 +558,24 @@ The interesting knobs for XFS workqueues are as follows:
  nice           Relative priority of scheduling the threads.  These are the
                 same nice levels that can be applied to userspace processes.
 ============     ===========
+
+Zoned Filesystems
+=================
+
+For zoned file systems, the following attributes are exposed in:
+
+  /sys/fs/xfs/<dev>/zoned/
+
+  max_open_zones		(Min:  1  Default:  Varies  Max:  UINTMAX)
+	This read-only attribute exposes the maximum number of open zones
+	available for data placement. The value is determined at mount time and
+	is limited by the capabilities of the backing zoned device, file system
+	size and the max_open_zones mount option.
+
+  zonegc_low_space		(Min:  0  Default:  0  Max:  100)
+	Defines the percentage of unused space that GC should keep
+	available for writing. A high value will reclaim more of the space
+	occupied by unused blocks, creating a larger buffer against write
+	bursts at the cost of increased write amplification.  Regardless
+	of this value, garbage collection will always aim to free a minimum
+	number of blocks to keep max_open_zones open for data placement purposes.
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@ -223,6 +223,47 @@ Before jumping into the kernel, the following conditions must be met:

    - SCR_EL3.HCE (bit 8) must be initialised to 0b1.

+  For systems with a GICv5 interrupt controller to be used in v5 mode:
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+      - ICH_HFGRTR_EL2.ICC_PPI_ACTIVERn_EL1 (bit 20) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PPI_PRIORITYRn_EL1 (bit 19) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PPI_PENDRn_EL1 (bit 18) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PPI_ENABLERn_EL1 (bit 17) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PPI_HMRn_EL1 (bit 16) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_IAFFIDR_EL1 (bit 7) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_ICSR_EL1 (bit 6) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PCR_EL1 (bit 5) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_HPPIR_EL1 (bit 4) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_HAPR_EL1 (bit 3) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_CR0_EL1 (bit 2) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_IDRn_EL1 (bit 1) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_APR_EL1 (bit 0) must be initialised to 0b1.
+
+      - ICH_HFGWTR_EL2.ICC_PPI_ACTIVERn_EL1 (bit 20) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_PPI_PRIORITYRn_EL1 (bit 19) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_PPI_PENDRn_EL1 (bit 18) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_PPI_ENABLERn_EL1 (bit 17) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_ICSR_EL1 (bit 6) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_PCR_EL1 (bit 5) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_CR0_EL1 (bit 2) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_APR_EL1 (bit 0) must be initialised to 0b1.
+
+      - ICH_HFGITR_EL2.GICRCDNMIA (bit 10) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICRCDIA (bit 9) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDDI (bit 8) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDEOI (bit 7) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDHM (bit 6) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDRCFG (bit 5) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDPEND (bit 4) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDAFF (bit 3) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDPRI (bit 2) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDDIS (bit 1) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDEN (bit 0) must be initialised to 0b1.
+
+  - The DT or ACPI tables must describe a GICv5 interrupt controller.
+
  For systems with a GICv3 interrupt controller to be used in v3 mode:
  - If EL3 is present:

@ -234,7 +275,7 @@ Before jumping into the kernel, the following conditions must be met:

  - If the kernel is entered at EL1:

-      - ICC.SRE_EL2.Enable (bit 3) must be initialised to 0b1
+      - ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1.
      - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.

  - The DT or ACPI tables must describe a GICv3 interrupt controller.
@ -388,6 +429,27 @@ Before jumping into the kernel, the following conditions must be met:

    - SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1.

+  For CPUs with the Branch Record Buffer Extension (FEAT_BRBE):
+
+  - If EL3 is present:
+
+    - MDCR_EL3.SBRBE (bits 33:32) must be initialised to 0b01 or 0b11.
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - BRBCR_EL2.CC (bit 3) must be initialised to 0b1.
+    - BRBCR_EL2.MPRED (bit 4) must be initialised to 0b1.
+
+    - HDFGRTR_EL2.nBRBDATA (bit 61) must be initialised to 0b1.
+    - HDFGRTR_EL2.nBRBCTL  (bit 60) must be initialised to 0b1.
+    - HDFGRTR_EL2.nBRBIDR  (bit 59) must be initialised to 0b1.
+
+    - HDFGWTR_EL2.nBRBDATA (bit 61) must be initialised to 0b1.
+    - HDFGWTR_EL2.nBRBCTL  (bit 60) must be initialised to 0b1.
+
+    - HFGITR_EL2.nBRBIALL (bit 56) must be initialised to 0b1.
+    - HFGITR_EL2.nBRBINJ  (bit 55) must be initialised to 0b1.
+
  For CPUs with the Performance Monitors Extension (FEAT_PMUv3p9):

 - If EL3 is present:
@ -404,6 +466,17 @@ Before jumping into the kernel, the following conditions must be met:
    - HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
    - HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.

+  For CPUs with SPE data source filtering (FEAT_SPE_FDS):
+
+  - If EL3 is present:
+
+    - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
+    - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
+
  For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):

  - If the kernel is entered at EL1 and EL2 is present:
--- a/Documentation/arch/arm64/cpu-feature-registers.rst
+++ b/Documentation/arch/arm64/cpu-feature-registers.rst
@ -72,14 +72,15 @@ there are some issues with their usage.
    process could be migrated to another CPU by the time it uses the
    register value, unless the CPU affinity is set. Hence, there is no
    guarantee that the value reflects the processor that it is
-    currently executing on. The REVIDR is not exposed due to this
-    constraint, as REVIDR makes sense only in conjunction with the
-    MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
-    at::
+    currently executing on. REVIDR and AIDR are not exposed due to this
+    constraint, as these registers only make sense in conjunction with
+    the MIDR. Alternatively, MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are exposed
+    via sysfs at::

 	/sys/devices/system/cpu/cpu$ID/regs/identification/
-	                                              \- midr
-	                                              \- revidr
+	                                              \- midr_el1
+	                                              \- revidr_el1
+	                                              \- aidr_el1

 3. Implementation
 --------------------
--- a/Documentation/arch/arm64/elf_hwcaps.rst
+++ b/Documentation/arch/arm64/elf_hwcaps.rst
@ -435,6 +435,16 @@ HWCAP2_SME_SF8DP4
 HWCAP2_POE
    Functionality implied by ID_AA64MMFR3_EL1.S1POE == 0b0001.

+HWCAP3_MTE_FAR
+    Functionality implied by ID_AA64PFR2_EL1.MTEFAR == 0b0001.
+
+HWCAP3_MTE_STORE_ONLY
+    Functionality implied by ID_AA64PFR2_EL1.MTESTOREONLY == 0b0001.
+
+HWCAP3_LSFE
+    Functionality implied by ID_AA64ISAR3_EL1.LSFE == 0b0001.
+
+
 4. Unused AT_HWCAP bits
 -----------------------

--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@ -200,6 +200,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-V3     | #3312417        | ARM64_ERRATUM_3194386       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-V3AE   | #3312417        | ARM64_ERRATUM_3194386       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | ARM_SMMU_MMU_500_CPRE_ERRATA|
 |                |                 | #562869,1047329 |                             |
 +----------------+-----------------+-----------------+-----------------------------+
@ -286,6 +288,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | Rockchip       | RK3588          | #3588001        | ROCKCHIP_ERRATUM_3588001    |
 +----------------+-----------------+-----------------+-----------------------------+
+| Rockchip       | RK3568          | #3568002        | ROCKCHIP_ERRATUM_3568002    |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Fujitsu        | A64FX           | E#010001        | FUJITSU_ERRATUM_010001      |
 +----------------+-----------------+-----------------+-----------------------------+
--- a/Documentation/arch/arm64/sme.rst
+++ b/Documentation/arch/arm64/sme.rst
@ -69,8 +69,8 @@ model features for SME is included in Appendix A.
  vectors from 0 to VL/8-1 stored in the same endianness invariant format as is
  used for SVE vectors.

-* On thread creation TPIDR2_EL0 is preserved unless CLONE_SETTLS is specified,
-  in which case it is set to 0.
+* On thread creation PSTATE.ZA and TPIDR2_EL0 are preserved unless CLONE_VM
+  is specified, in which case PSTATE.ZA is set to 0 and TPIDR2_EL0 is set to 0.

 2.  Vector lengths
 ------------------
@ -81,17 +81,7 @@ The ZA matrix is square with each side having as many bytes as a streaming
 mode SVE vector.


-3.  Sharing of streaming and non-streaming mode SVE state
---------------------------------------------------------
-
-It is implementation defined which if any parts of the SVE state are shared
-between streaming and non-streaming modes.  When switching between modes
-via software interfaces such as ptrace if no register content is provided as
-part of switching no state will be assumed to be shared and everything will
-be zeroed.
-
-
-4.  System call behaviour
+3.  System call behaviour
 -------------------------

 * On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
@ -112,10 +102,10 @@ be zeroed.
  exceptions for execve() described in section 6.


-5.  Signal handling
+4.  Signal handling
 -------------------

-* Signal handlers are invoked with streaming mode and ZA disabled.
+* Signal handlers are invoked with PSTATE.SM=0, PSTATE.ZA=0, and TPIDR2_EL0=0.

 * A new signal frame record TPIDR2_MAGIC is added formatted as a struct
  tpidr2_context to allow access to TPIDR2_EL0 from signal handlers.
@ -241,7 +231,7 @@ prctl(PR_SME_SET_VL, unsigned long arg)
      length, or calling PR_SME_SET_VL with the PR_SME_SET_VL_ONEXEC flag,
      does not constitute a change to the vector length for this purpose.

-    * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
+    * Changing the vector length causes PSTATE.ZA to be cleared.
      Calling PR_SME_SET_VL with vl equal to the thread's current vector
      length, or calling PR_SME_SET_VL with the PR_SME_SET_VL_ONEXEC flag,
      does not constitute a change to the vector length for this purpose.
--- a/Documentation/arch/arm64/tagged-pointers.rst
+++ b/Documentation/arch/arm64/tagged-pointers.rst
@ -60,11 +60,12 @@ that signal handlers in applications making use of tags cannot rely
 on the tag information for user virtual addresses being maintained
 in these fields unless the flag was set.

-Due to architecture limitations, bits 63:60 of the fault address
-are not preserved in response to synchronous tag check faults
-(SEGV_MTESERR) even if SA_EXPOSE_TAGBITS was set. Applications should
-treat the values of these bits as undefined in order to accommodate
-future architecture revisions which may preserve the bits.
+If FEAT_MTE_TAGGED_FAR (Armv8.9) is supported, bits 63:60 of the fault address
+are preserved in response to synchronous tag check faults (SEGV_MTESERR);
+otherwise they are not preserved, even if SA_EXPOSE_TAGBITS was set.
+Applications should interpret the values of these bits based on whether
+HWCAP3_MTE_FAR is present: if it is not, the values of these bits should be
+treated as undefined; otherwise they are valid.
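Code that receives the fault address could apply this rule as in the following minimal Python sketch. The function name is hypothetical. ``has_mte_far`` is assumed to come from testing ``HWCAP3_MTE_FAR`` in the ``AT_HWCAP3`` auxiliary vector entry, for example via ``getauxval()`` in C.

```python
def fault_addr_top_bits(fault_addr, has_mte_far):
    """Return bits 63:60 of an MTE tag check fault address, or None.

    None means the bits are undefined (HWCAP3_MTE_FAR not present) and
    must not be interpreted.
    """
    if not has_mte_far:
        return None
    return (fault_addr >> 60) & 0xF
```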

 For signals raised in response to watchpoint debug exceptions, the
 tag information will be preserved regardless of the SA_EXPOSE_TAGBITS
--- a/Documentation/arch/powerpc/htm.rst
+++ b/Documentation/arch/powerpc/htm.rst
@ -0,0 +1,104 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _htm:
+
+===================================
+HTM (Hardware Trace Macro)
+===================================
+
+Athira Rajeev, 2 Mar 2025
+
+.. contents::
+    :depth: 3
+
+
+Basic overview
+==============
+
+H_HTM is used as an interface for executing Hardware Trace Macro (HTM)
+functions, including setup, configuration, control and dumping of the HTM data.
+To use HTM, the HTM buffers must first be set up; HTM operations can then
+be controlled using the H_HTM hcall. The hcall can be invoked for any core/chip
+of the system from within a partition itself. To use this feature, a debugfs
+directory called "htmdump" is present under /sys/kernel/debug/powerpc.
+
+
+HTM debugfs example usage
+=========================
+
+.. code-block:: sh
+
+  #  ls /sys/kernel/debug/powerpc/htmdump/
+  coreindexonchip  htmcaps  htmconfigure  htmflags  htminfo  htmsetup
+  htmstart  htmstatus  htmtype  nodalchipindex  nodeindex  trace
+
+Details on each file:
+
+* nodeindex, nodalchipindex, coreindexonchip: specify which partition to configure the HTM for.
+* htmtype: specifies the type of HTM. The supported target is hardwareTarget.
+* trace: reads the HTM data.
+* htmconfigure: configures/deconfigures the HTM. Writing 1 to the file configures the trace; writing 0 deconfigures it.
+* htmstart: starts/stops the HTM. Writing 1 to the file starts tracing; writing 0 stops it.
+* htmstatus: gets the status of the HTM. This is needed to understand the HTM state after each operation.
+* htmsetup: sets the HTM buffer size. The size is given as a power of 2.
+* htminfo: provides the system processor configuration details. This is needed to understand the appropriate values for nodeindex, nodalchipindex, coreindexonchip.
+* htmcaps: provides the HTM capabilities, such as the minimum/maximum buffer size and the kinds of tracing the HTM supports.
+* htmflags: allows passing flags to the hcall. Currently this supports controlling the wrapping of the HTM buffer.
+
+To see the system processor configuration details:
+
+.. code-block:: sh
+
+  # cat /sys/kernel/debug/powerpc/htmdump/htminfo > htminfo_file
+
+The result can be interpreted using hexdump.
+
+To collect HTM traces for a partition represented by nodeindex 0,
+nodalchipindex 1 and coreindexonchip 12, first set up the HTM buffer:
+
+.. code-block:: sh
+
+  # cd /sys/kernel/debug/powerpc/htmdump/
+  # echo 2 > htmtype
+  # echo 33 > htmsetup     # sets an 8GB HTM buffer (size is a power of 2: 2^33 bytes)
+
+This requires a CEC reboot to get the HTM buffers allocated.
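Because ``htmsetup`` takes the buffer size as a power-of-2 exponent, the value to write can be derived from a byte count. The following is a minimal Python sketch. The helper name is hypothetical, and the sketch does not check the limits reported by ``htmcaps``.

```python
def htmsetup_value(buffer_bytes):
    """Return the exponent to write to htmsetup for a power-of-2 buffer size."""
    if buffer_bytes <= 0 or buffer_bytes & (buffer_bytes - 1):
        raise ValueError("HTM buffer size must be a power of two")
    return buffer_bytes.bit_length() - 1
```

For the example above, ``htmsetup_value(8 * 2**30)`` gives 33.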
+
+.. code-block:: sh
+
+  # cd /sys/kernel/debug/powerpc/htmdump/
+  # echo 2 > htmtype
+  # echo 0 > nodeindex
+  # echo 1 > nodalchipindex
+  # echo 12 > coreindexonchip
+  # echo 1 > htmflags     # to set noWrap for HTM buffers
+  # echo 1 > htmconfigure # Configure the HTM
+  # echo 1 > htmstart     # Start the HTM
+  # echo 0 > htmstart     # Stop the HTM
+  # echo 0 > htmconfigure # Deconfigure the HTM
+  # cat htmstatus         # Dump the status of HTM entries as data
+
+The commands above set the htmtype and core details, then execute the respective HTM operations.
+
+Read the HTM trace data
+========================
+
+After starting the trace collection, run the workload
+of interest. Stop the trace collection after required period
+of time, and read the trace file.
+
+.. code-block:: sh
+
+  # cat /sys/kernel/debug/powerpc/htmdump/trace > trace_file
+
+This trace file will contain the relevant instruction traces
+collected during the workload execution, and can be used as an
+input file for trace decoders to understand the data.
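The collection steps can also be scripted. The following is a hypothetical Python sketch that mirrors the shell steps above, using the node, chip and core values from the example as defaults. It makes three assumptions:

* The buffers were already allocated via ``htmsetup`` and a CEC reboot.
* ``trace`` is read after stopping the HTM but before deconfiguring it. The text above does not state this ordering explicitly.
* The ``root`` parameter is provided only so the sketch can be exercised outside debugfs.

```python
from pathlib import Path

HTMDUMP = Path("/sys/kernel/debug/powerpc/htmdump")


def collect_htm_trace(run_workload, node=0, chip=1, core=12, root=HTMDUMP):
    """Configure and start the HTM, run a workload, then stop and read the trace."""
    def write(name, value):
        # Equivalent to: echo <value> > <root>/<name>
        (root / name).write_text(f"{value}\n")

    write("htmtype", 2)
    write("nodeindex", node)
    write("nodalchipindex", chip)
    write("coreindexonchip", core)
    write("htmflags", 1)        # noWrap, as in the example above
    write("htmconfigure", 1)    # configure the HTM
    write("htmstart", 1)        # start tracing
    try:
        run_workload()
    finally:
        write("htmstart", 0)    # stop tracing even if the workload fails
    data = (root / "trace").read_bytes()
    write("htmconfigure", 0)    # assumption: deconfigure only after reading
    return data
```

For large buffers, such as the 8GB buffer in the example above, streaming ``trace`` in chunks is likely preferable to reading it into memory at once.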
+
+Benefits of using HTM debugfs interface
+=======================================
+
+It is now possible to collect traces for a particular core/chip
+from within any partition of the system and decode them. Through
+this enablement, a small partition can be dedicated to collecting the
+trace data and analyzing it to provide important information for performance
+analysis, software tuning, or hardware debug.
--- a/Documentation/arch/powerpc/index.rst
+++ b/Documentation/arch/powerpc/index.rst
@ -21,6 +21,7 @@ powerpc
    elf_hwcaps
    elfnote
    firmware-assisted-dump
+    htm
    hvcs
    imc
    isa-versions
--- a/Documentation/arch/powerpc/papr_hcalls.rst
+++ b/Documentation/arch/powerpc/papr_hcalls.rst
@ -289,6 +289,17 @@ to be issued multiple times in order to be completely serviced. The
 subsequent hcalls to the hypervisor until the hcall is completely serviced
 at which point H_SUCCESS or other error is returned by the hypervisor.

+**H_HTM**
+
+| Input: flags, target, operation (op), op-param1, op-param2, op-param3
+| Out: *dumphtmbufferdata*
+| Return Value: *H_Success,H_Busy,H_LongBusyOrder,H_Partial,H_Parameter,
+		 H_P2,H_P3,H_P4,H_P5,H_P6,H_State,H_Not_Available,H_Authority*
+
+H_HTM supports setup, configuration, control and dumping of Hardware Trace
+Macro (HTM) function and its data. HTM buffer stores tracing data for functions
+like core instruction, core LLAT and nest.
+
 References
 ==========
 .. [1] "Power Architecture Platform Reference"
--- a/Documentation/arch/s390/driver-model.rst
+++ b/Documentation/arch/s390/driver-model.rst
@ -305,24 +305,3 @@ xpram shows up under devices/system/ as 'xpram'.

 For each cpu, a directory is created under devices/system/cpu/. Each cpu has an
 attribute 'online' which can be 0 or 1.
-
-
-4. Other devices
----------------
-
-4.1 Netiucv
-----------
-
-The netiucv driver creates an attribute 'connection' under
-bus/iucv/drivers/netiucv. Piping to this attribute creates a new netiucv
-connection to the specified host.
-
-Netiucv connections show up under devices/iucv/ as "netiucv<ifnum>". The interface
-number is assigned sequentially to the connections defined via the 'connection'
-attribute.
-
-user
-    - shows the connection partner.
-
-buffer
-    - maximum buffer size. Pipe to it to change buffer size.
--- a/Documentation/arch/x86/amd-memory-encryption.rst
+++ b/Documentation/arch/x86/amd-memory-encryption.rst
@ -130,8 +130,126 @@ SNP feature support.

 More details in AMD64 APM[1] Vol 2: 15.34.10 SEV_STATUS MSR

+Reverse Map Table (RMP)
+=======================
+
+The RMP is a structure in system memory that is used to ensure a one-to-one
+mapping between system physical addresses and guest physical addresses. Each
+page of memory that is potentially assignable to guests has one entry within
+the RMP.
+
+The RMP table can be either contiguous in memory or a collection of segments
+in memory.
+
+Contiguous RMP
+--------------
+
+Support for this form of the RMP is present when support for SEV-SNP is
+present, which can be determined using the CPUID instruction::
+
+	0x8000001f[eax]:
+		Bit[4] indicates support for SEV-SNP
+
+The location of the RMP is identified to the hardware through two MSRs::
+
+        0xc0010132 (RMP_BASE):
+                System physical address of the first byte of the RMP
+
+        0xc0010133 (RMP_END):
+                System physical address of the last byte of the RMP
+
+Hardware requires that RMP_BASE and (RMP_END + 1) be 8KB aligned, but SEV
+firmware increases the alignment requirement to 1MB.
+
+The RMP consists of a 16KB region used for processor bookkeeping followed
+by the RMP entries, which are 16 bytes in size. The size of the RMP
+determines the range of physical memory that the hypervisor can assign to
+SEV-SNP guests. The RMP covers the system physical address from::
+
+        0 to ((RMP_END + 1 - RMP_BASE - 16KB) / 16B) x 4KB.
+
+The current Linux support relies on BIOS to allocate/reserve the memory for
+the RMP and to set RMP_BASE and RMP_END appropriately. Linux uses the MSR
+values to locate the RMP and determine the size of the RMP. The RMP must
+cover all of system memory in order for Linux to enable SEV-SNP.
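The coverage formula and alignment rules above can be checked with a little arithmetic. The following is a minimal Python sketch with hypothetical helper names. It computes the memory covered by a given RMP_BASE/RMP_END pair and, conversely, the smallest 1MB-aligned RMP needed to cover a given amount of memory.

```python
KB, MB = 1 << 10, 1 << 20
BOOKKEEPING = 16 * KB   # processor bookkeeping area at RMP_BASE
ENTRY_SIZE = 16         # bytes per RMP entry
PAGE_SIZE = 4 * KB      # memory covered by one RMP entry


def rmp_coverage(rmp_base, rmp_end):
    """Bytes of system physical memory, starting at 0, covered by a contiguous RMP."""
    if rmp_base % MB or (rmp_end + 1) % MB:
        raise ValueError("RMP_BASE and RMP_END + 1 must be 1MB aligned")
    entries = (rmp_end + 1 - rmp_base - BOOKKEEPING) // ENTRY_SIZE
    return entries * PAGE_SIZE


def rmp_size_for(mem_bytes):
    """Smallest 1MB-aligned RMP size (RMP_END + 1 - RMP_BASE) covering mem_bytes."""
    pages = -(-mem_bytes // PAGE_SIZE)          # round up to whole 4KB pages
    raw = BOOKKEEPING + pages * ENTRY_SIZE
    return -(-raw // MB) * MB                   # round up to the 1MB alignment
```

For example, covering 64GB needs 16KB of bookkeeping plus 16M entries of 16 bytes (256MB), which rounds up to a 257MB RMP.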
+
+Segmented RMP
+-------------
+
+Segmented RMP support is a new way of representing the layout of an RMP.
+Initial RMP support required the RMP table to be contiguous in memory.
+RMP accesses from a NUMA node on which the RMP doesn't reside
+can take longer than accesses from a NUMA node on which the RMP resides.
+Segmented RMP support allows the RMP entries to be located on the same
+node as the memory the RMP is covering, potentially reducing latency
+associated with accessing an RMP entry associated with the memory. Each
+RMP segment covers a specific range of system physical addresses.
+
+Support for this form of the RMP can be determined using the CPUID
+instruction::
+
+        0x8000001f[eax]:
+                Bit[23] indicates support for segmented RMP
+
+If supported, segmented RMP attributes can be found using the CPUID
+instruction::
+
+        0x80000025[eax]:
+                Bits[5:0]  minimum supported RMP segment size
+                Bits[11:6] maximum supported RMP segment size
+
+        0x80000025[ebx]:
+                Bits[9:0]  number of cacheable RMP segment definitions
+                Bit[10]    indicates if the number of cacheable RMP segments
+                           is a hard limit
+
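The CPUID fields above can be decoded as shown below. This Python sketch takes raw register values; obtaining them, for example with the CPUID instruction, is out of scope. The segment size fields are returned unscaled because their encoding is not described here. The function names are hypothetical.

```python
def snp_rmp_features(eax_8000001f):
    """Decode the RMP-related bits of CPUID 0x8000001f EAX."""
    return {
        "sev_snp": bool(eax_8000001f & (1 << 4)),         # Bit[4]
        "segmented_rmp": bool(eax_8000001f & (1 << 23)),  # Bit[23]
    }


def segmented_rmp_attrs(eax_80000025, ebx_80000025):
    """Decode CPUID 0x80000025 EAX/EBX into raw segmented RMP attributes."""
    return {
        "min_segment_size": eax_80000025 & 0x3F,           # Bits[5:0]
        "max_segment_size": (eax_80000025 >> 6) & 0x3F,    # Bits[11:6]
        "cacheable_segments": ebx_80000025 & 0x3FF,        # Bits[9:0]
        "cacheable_limit_is_hard": bool(ebx_80000025 & (1 << 10)),  # Bit[10]
    }
```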
+To enable a segmented RMP, a new MSR is available::
+
+        0xc0010136 (RMP_CFG):
+                Bit[0]     indicates if segmented RMP is enabled
+                Bits[13:8] contains the size of memory covered by an RMP
+                           segment (expressed as a power of 2)
+
+The RMP segment size defined in the RMP_CFG MSR applies to all segments
+of the RMP. Therefore each RMP segment covers a specific range of system
+physical addresses. For example, if the RMP_CFG MSR value is 0x2401, then
+the RMP segment coverage value is 0x24 => 36, meaning the size of memory
+covered by an RMP segment is 64GB (1 << 36). So the first RMP segment
+covers physical addresses from 0 to 0xF_FFFF_FFFF, the second RMP segment
+covers physical addresses from 0x10_0000_0000 to 0x1F_FFFF_FFFF, etc.
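The RMP_CFG example above can be reproduced with a minimal Python sketch. It uses hypothetical helper names, decodes the MSR value, and finds the segment covering a given physical address.

```python
def decode_rmp_cfg(value):
    """Return (segmented_rmp_enabled, segment_size_shift) from an RMP_CFG value."""
    enabled = bool(value & 1)        # Bit[0]
    shift = (value >> 8) & 0x3F      # Bits[13:8], segment size as a power of 2
    return enabled, shift


def segment_for(phys_addr, shift):
    """Return (index, first_addr, last_addr) of the RMP segment covering phys_addr."""
    index = phys_addr >> shift
    start = index << shift
    return index, start, start + (1 << shift) - 1
```

With the example value, ``decode_rmp_cfg(0x2401)`` returns ``(True, 36)``. ``segment_for(0x10_0000_0000, 36)`` returns ``(1, 0x10_0000_0000, 0x1F_FFFF_FFFF)``.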
+
+When a segmented RMP is enabled, RMP_BASE points to the RMP bookkeeping
+area as it does today (16KB in size). However, instead of RMP entries
+beginning immediately after the bookkeeping area, there is a 4KB RMP
+segment table (RST). Each entry in the RST is 8 bytes in size and represents
+an RMP segment::
+
+        Bits[19:0]  mapped size (in GB)
+                    The mapped size can be less than the defined segment size.
+                    A value of zero indicates that no RMP exists for the range
+                    of system physical addresses associated with this segment.
+        Bits[51:20] segment physical address
+                    This address is left-shifted by 20 bits (or just masked
+                    when read) to form the physical address of the segment
+                    (1MB alignment).
+
+The RST can hold 512 segment entries but can be limited in size to the number
+of cacheable RMP segments (CPUID 0x80000025_EBX[9:0]) if the number of cacheable
+RMP segments is a hard limit (CPUID 0x80000025_EBX[10]).
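Decoding an RST entry follows directly from the layout above. The following is a minimal Python sketch with hypothetical names. It assumes the RST starts immediately after the 16KB bookkeeping area, as described above.

```python
BOOKKEEPING = 16 * 1024                  # bookkeeping area at RMP_BASE
RST_SIZE = 4096                          # RMP segment table size in bytes
RST_ENTRY_SIZE = 8                       # bytes per RST entry
SEGMENT_PA_MASK = 0x000F_FFFF_FFF0_0000  # Bits[51:20]


def rst_entry_addr(rmp_base, index):
    """System physical address of RST entry `index`."""
    if not 0 <= index < RST_SIZE // RST_ENTRY_SIZE:
        raise IndexError("RST has at most 512 entries")
    return rmp_base + BOOKKEEPING + index * RST_ENTRY_SIZE


def decode_rst_entry(entry):
    """Return (mapped_size_gb, segment_phys_addr), or None if no RMP is mapped."""
    mapped_size_gb = entry & 0xFFFFF     # Bits[19:0]
    if mapped_size_gb == 0:
        return None                      # no RMP for this address range
    return mapped_size_gb, entry & SEGMENT_PA_MASK


def rst_capacity(num_cacheable, limit_is_hard):
    """Usable RST entries given CPUID 0x80000025 EBX[9:0] and EBX[10]."""
    max_entries = RST_SIZE // RST_ENTRY_SIZE   # 512
    return min(max_entries, num_cacheable) if limit_is_hard else max_entries
```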
+
+The current Linux support relies on BIOS to allocate/reserve the memory for
+the segmented RMP (the bookkeeping area, RST, and all segments), build the RST
+and to set RMP_BASE, RMP_END, and RMP_CFG appropriately. Linux uses the MSR
+values to locate the RMP and determine the size and location of the RMP
+segments. The RMP must cover all of system memory in order for Linux to enable
+SEV-SNP.
+
+More details in the AMD64 APM Vol 2, section "15.36.3 Reverse Map Table",
+docID: 24593.
+
 Secure VM Service Module (SVSM)
 ===============================
+
 SNP provides a feature called Virtual Machine Privilege Levels (VMPL) which
 defines four privilege levels at which guest software can run. The most
 privileged level is 0 and numerically higher numbers have lesser privileges.
--- a/Documentation/arch/x86/amd_hsmp.rst
+++ b/Documentation/arch/x86/amd_hsmp.rst
@ -4,8 +4,9 @@
 AMD HSMP interface
 ============================================

-Newer Fam19h EPYC server line of processors from AMD support system
-management functionality via HSMP (Host System Management Port).
+Newer Fam19h (model 0x00-0x1f, 0x30-0x3f, 0x90-0x9f, 0xa0-0xaf) and
+Fam1Ah (model 0x00-0x1f) EPYC server line of processors from AMD support
+system management functionality via HSMP (Host System Management Port).

 The Host System Management Port (HSMP) is an interface to provide
 OS-level software with access to system management functions via a
@ -16,14 +17,25 @@ More details on the interface can be found in chapter
 Eg: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/55898_B1_pub_0_50.zip


-HSMP interface is supported on EPYC server CPU models only.
+HSMP interface is supported on the EPYC line of server CPUs and MI300A (APU).


 HSMP device
 ============================================

-amd_hsmp driver under the drivers/platforms/x86/ creates miscdevice
-/dev/hsmp to let user space programs run hsmp mailbox commands.
+amd_hsmp driver under drivers/platform/x86/amd/hsmp/ has separate driver files
+for ACPI object based probing, platform device based probing and for the common
+code for these two drivers.
+
+Kconfig option CONFIG_AMD_HSMP_PLAT compiles plat.c and creates amd_hsmp.ko.
+Kconfig option CONFIG_AMD_HSMP_ACPI compiles acpi.c and creates hsmp_acpi.ko.
+Selecting either of these two configs automatically selects CONFIG_AMD_HSMP,
+which compiles the common code in hsmp.c into the hsmp_common.ko module.
+
+Both the ACPI and plat drivers create the miscdevice /dev/hsmp to let
+user space programs run hsmp mailbox commands.
+
+The ACPI object format supported by the driver is defined below.

 $ ls -al /dev/hsmp
 crw-r--r-- 1 root root 10, 123 Jan 21 21:41 /dev/hsmp
@ -59,6 +71,81 @@ Note: lseek() is not supported as entire metrics table is read.
 Metrics table definitions will be documented as part of Public PPR.
 The same is defined in the amd_hsmp.h header.

+2. HSMP telemetry sysfs files
+
+The following sysfs files are available at /sys/devices/platform/AMDI0097:0X/.
+
+* c0_residency_input: Percentage of cores in C0 state.
+* prochot_status: Reports 1 if the processor is at thermal threshold value,
+  0 otherwise.
+* smu_fw_version: SMU firmware version.
+* protocol_version: HSMP interface version.
+* ddr_max_bw: Theoretical maximum DDR bandwidth in GB/s.
+* ddr_utilised_bw_input: Current utilized DDR bandwidth in GB/s.
+* ddr_utilised_bw_perc_input(%): Percentage of current utilized DDR bandwidth.
+* mclk_input: Memory clock in MHz.
+* fclk_input: Fabric clock in MHz.
+* clk_fmax: Maximum frequency of socket in MHz.
+* clk_fmin: Minimum frequency of socket in MHz.
+* cclk_freq_limit_input: Core clock frequency limit per socket in MHz.
+* pwr_current_active_freq_limit: Current active frequency limit of socket
+  in MHz.
+* pwr_current_active_freq_limit_source: Source of current active frequency
+  limit.
+
+ACPI device object format
+=========================
+The ACPI object format expected by the amd_hsmp driver
+for the socket with ID00 is given below::
+
+  Device(HSMP)
+		{
+			Name(_HID, "AMDI0097")
+			Name(_UID, "ID00")
+			Name(HSE0, 0x00000001)
+			Name(RBF0, ResourceTemplate()
+			{
+				Memory32Fixed(ReadWrite, 0xxxxxxx, 0x00100000)
+			})
+			Method(_CRS, 0, NotSerialized)
+			{
+				Return(RBF0)
+			}
+			Method(_STA, 0, NotSerialized)
+			{
+				If(LEqual(HSE0, One))
+				{
+					Return(0x0F)
+				}
+				Else
+				{
+					Return(Zero)
+				}
+			}
+			Name(_DSD, Package(2)
+			{
+				Buffer(0x10)
+				{
+					0x9D, 0x61, 0x4D, 0xB7, 0x07, 0x57, 0xBD, 0x48,
+					0xA6, 0x9F, 0x4E, 0xA2, 0x87, 0x1F, 0xC2, 0xF6
+				},
+				Package(3)
+				{
+					Package(2) {"MsgIdOffset", 0x00010934},
+					Package(2) {"MsgRspOffset", 0x00010980},
+					Package(2) {"MsgArgOffset", 0x000109E0}
+				}
+			})
+		}
+
+HSMP HWMON interface
+====================
+HSMP power sensors are registered with the hwmon interface. A separate hwmon
+directory is created for each socket and the following files are generated
+within the hwmon directory:
+
+- power1_input (read only)
+- power1_cap_max (read only)
+- power1_cap (read, write)
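The following is a minimal Python sketch for reading and setting these values. It assumes the standard hwmon unit for power attributes (microwatts). It takes the socket's ``hwmonN`` directory as a parameter because that number varies between systems. The function names are hypothetical.

```python
from pathlib import Path

MICROWATTS_PER_WATT = 1_000_000  # hwmon power attributes are in microwatts


def read_socket_power(hwmon_dir):
    """Return the HSMP power readings of one socket, converted to watts."""
    def read_uw(name):
        return int((Path(hwmon_dir) / name).read_text().strip())

    return {name: read_uw(name) / MICROWATTS_PER_WATT
            for name in ("power1_input", "power1_cap", "power1_cap_max")}


def set_socket_power_cap(hwmon_dir, watts):
    """Write a new socket power cap; power1_cap is the only writable file."""
    microwatts = int(round(watts * MICROWATTS_PER_WATT))
    (Path(hwmon_dir) / "power1_cap").write_text(f"{microwatts}\n")
```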

 An example
 ==========
--- a/Documentation/arch/x86/boot.rst
+++ b/Documentation/arch/x86/boot.rst
@ -1029,16 +1029,6 @@ Offset/size:	0x000c/4
  This field contains maximal allowed type for setup_data and setup_indirect structs.


-The Image Checksum
-==================
-
-From boot protocol version 2.08 onwards the CRC-32 is calculated over
-the entire file using the characteristic polynomial 0x04C11DB7 and an
-initial remainder of 0xffffffff.  The checksum is appended to the
-file; therefore the CRC of the file up to the limit specified in the
-syssize field of the header is always 0.
-
-
 The Kernel Command Line
 =======================

--- a/Documentation/arch/x86/buslock.rst
+++ b/Documentation/arch/x86/buslock.rst
@ -26,7 +26,8 @@ Detection
 =========

 Intel processors may support either or both of the following hardware
-mechanisms to detect split locks and bus locks.
+mechanisms to detect split locks and bus locks. Some AMD processors also
+support bus lock detection.

 #AC exception for split lock detection
 --------------------------------------
--- a/Documentation/arch/x86/cpuinfo.rst
+++ b/Documentation/arch/x86/cpuinfo.rst
@ -130,14 +130,18 @@ x86_cap/bug_flags[] arrays in kernel/cpu/capflags.c. The names in the
 resulting x86_cap/bug_flags[] are used to populate /proc/cpuinfo. The naming
 of flags in the x86_cap/bug_flags[] are as follows:

-a: The name of the flag is from the string in X86_FEATURE_<name> by default.
----------------------------------------------------------------------------
-By default, the flag <name> in /proc/cpuinfo is extracted from the respective
-X86_FEATURE_<name> in cpufeatures.h. For example, the flag "avx2" is from
-X86_FEATURE_AVX2.
+a: Flags do not appear by default in /proc/cpuinfo
+--------------------------------------------------
+
+Feature flags are omitted by default from /proc/cpuinfo as it does not make
+sense for the feature to be exposed to userspace in most cases. For example,
+X86_FEATURE_ALWAYS is defined in cpufeatures.h but that flag is an internal
+kernel feature used in the alternative runtime patching functionality. So the
+flag does not appear in /proc/cpuinfo.
+
+b: Specify a flag name if absolutely needed
+-------------------------------------------

-b: The naming can be overridden.
--------------------------------
 If the comment on the line for the #define X86_FEATURE_* starts with a
 double-quote character (""), the string inside the double-quote characters
 will be the name of the flags. For example, the flag "sse4_1" comes from
@ -148,14 +152,6 @@ needed. For instance, /proc/cpuinfo is a userspace interface and must remain
 constant. If, for some reason, the naming of X86_FEATURE_<name> changes, one
 shall override the new naming with the name already used in /proc/cpuinfo.
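The naming rule can be modelled with a small C sketch. This only illustrates the rule as described here and is not how the kernel build generates its flag table; ``cpuinfo_name_from_comment()`` is a hypothetical helper that takes the text of the comment following an ``X86_FEATURE_*`` define, without the comment delimiters.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the naming rule. Returns 1 and copies the quoted
 * name into @out if the comment starts with '"', 0 if the flag is omitted
 * from /proc/cpuinfo, and -1 if the quoted name is malformed or too long.
 */
static int cpuinfo_name_from_comment(const char *comment, char *out, size_t outlen)
{
	const char *start, *end;
	size_t len;

	while (*comment == ' ' || *comment == '\t')
		comment++;
	if (*comment != '"')
		return 0;		/* no quoted name: flag is not shown */

	start = comment + 1;
	end = strchr(start, '"');
	if (!end)
		return -1;		/* unterminated name */
	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, start, len);
	out[len] = '\0';
	return 1;
}
```

For a comment such as ``"sse4_1" SSE-4.1`` the model yields ``sse4_1``; for a comment without a leading quote it yields nothing, matching the default of not exposing the flag.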

-c: The naming override can be "", which means it will not appear in /proc/cpuinfo.
----------------------------------------------------------------------------------
-The feature shall be omitted from /proc/cpuinfo if it does not make sense for
-the feature to be exposed to userspace. For example, X86_FEATURE_ALWAYS is
-defined in cpufeatures.h but that flag is an internal kernel feature used
-in the alternative runtime patching functionality. So, its name is overridden
-with "". Its flag will not appear in /proc/cpuinfo.
-
 Flags are missing when one or more of these happen
 ==================================================

--- a/Documentation/arch/x86/index.rst
+++ b/Documentation/arch/x86/index.rst
@ -32,7 +32,6 @@ x86-specific Documentation
   pti
   mds
   microcode
-   resctrl
   tsx_async_abort
   buslock
   usb-legacy-support
--- a/Documentation/arch/x86/x86_64/boot-options.rst
+++ b/Documentation/arch/x86/x86_64/boot-options.rst
@ -305,3 +305,8 @@ The available options are:

   debug
     Enable debug messages.
+
+   nosnp
+     Do not enable SEV-SNP (applies to host/hypervisor only). Setting
+     'nosnp' avoids the RMP check overhead in memory accesses when
+     users do not want to run SEV-SNP guests.
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@ -115,15 +115,15 @@ managing and controlling ublk devices with help of several control commands:

 - ``UBLK_CMD_START_DEV``

-  After the server prepares userspace resources (such as creating per-queue
-  pthread & io_uring for handling ublk IO), this command is sent to the
+  After the server prepares userspace resources (such as creating I/O handler
+  threads & io_uring for handling ublk IO), this command is sent to the
  driver for allocating & exposing ``/dev/ublkb*``. Parameters set via
  ``UBLK_CMD_SET_PARAMS`` are applied for creating the device.

 - ``UBLK_CMD_STOP_DEV``

  Halt IO on ``/dev/ublkb*`` and remove the device. When this command returns,
-  ublk server will release resources (such as destroying per-queue pthread &
+  ublk server will release resources (such as destroying I/O handler threads &
  io_uring).

 - ``UBLK_CMD_DEL_DEV``
@ -208,15 +208,15 @@ managing and controlling ublk devices with help of several control commands:
  modify how I/O is handled while the ublk server is dying/dead (this is called
  the ``nosrv`` case in the driver code).

-  With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
-  handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
+  With just ``UBLK_F_USER_RECOVERY`` set, after the ublk server exits,
+  ublk does not delete ``/dev/ublkb*`` during the whole
  recovery stage and ublk device ID is kept. It is ublk server's
  responsibility to recover the device context by its own knowledge.
  Requests which have not been issued to userspace are requeued. Requests
  which have been issued to userspace are aborted.

-  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
-  (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
+  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after the ublk server
+  exits, contrary to ``UBLK_F_USER_RECOVERY``,
  requests which have been issued to userspace are requeued and will be
  re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
  ``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
@ -241,10 +241,11 @@ can be controlled/accessed just inside this container.
 Data plane
 ----------

-ublk server needs to create per-queue IO pthread & io_uring for handling IO
-commands via io_uring passthrough. The per-queue IO pthread
-focuses on IO handling and shouldn't handle any control & management
-tasks.
+The ublk server should create dedicated threads for handling I/O. Each
+thread should have its own io_uring through which it is notified of new
+I/O, and through which it can complete I/O. These dedicated threads
+should focus on IO handling and shouldn't handle any control &
+management tasks.

 The's IO is assigned by a unique tag, which is 1:1 mapping with IO
 request of ``/dev/ublkb*``.
@ -265,6 +266,18 @@ with specified IO tag in the command data:
  destined to ``/dev/ublkb*``. This command is sent only once from the server
  IO pthread for ublk driver to setup IO forward environment.

+  Once a thread issues this command against a given (qid,tag) pair, the thread
+  registers itself as that I/O's daemon. From then on, only that I/O's daemon
+  is allowed to issue commands against the I/O. If any other thread attempts
+  to issue a command against a (qid,tag) pair for which the thread is not the
+  daemon, the command will fail. Daemons can be reset only by going through
+  recovery.
+
+  The ability for every (qid,tag) pair to have its own independent daemon task
+  is indicated by the ``UBLK_F_PER_IO_DAEMON`` feature. If this feature is not
+  supported by the driver, daemons must be per-queue instead, i.e. all I/Os
+  associated with a single qid must be handled by the same task.
+
 - ``UBLK_IO_COMMIT_AND_FETCH_REQ``

  When an IO request is destined to ``/dev/ublkb*``, the driver stores
@ -309,18 +322,112 @@ with specified IO tag in the command data:
  ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
  the server buffer (pages) read to the IO request pages.
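The per-I/O daemon rule described for ``UBLK_IO_FETCH_REQ`` can be summarised with a small, purely illustrative C model. It is not the driver's implementation: ``struct io_owner_table``, the ``NR_QUEUES``/``QUEUE_DEPTH`` sizes, the ``per_io_daemon`` switch standing in for ``UBLK_F_PER_IO_DAEMON``, integer thread IDs, and the ``-EINVAL`` error code are all assumptions made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_QUEUES   2	/* hypothetical sizes for the sketch */
#define QUEUE_DEPTH 4
#define NO_DAEMON   0	/* thread IDs are assumed to be non-zero */

struct io_owner_table {
	bool per_io_daemon;	/* models UBLK_F_PER_IO_DAEMON support */
	int io_daemon[NR_QUEUES][QUEUE_DEPTH];
	int queue_daemon[NR_QUEUES];
};

/* Owner slot for (qid, tag) under the active ownership granularity. */
static int *owner_slot(struct io_owner_table *t, int qid, int tag)
{
	return t->per_io_daemon ? &t->io_daemon[qid][tag] : &t->queue_daemon[qid];
}

/* Models UBLK_IO_FETCH_REQ: the first fetch registers the caller as daemon. */
static int model_fetch(struct io_owner_table *t, int qid, int tag, int tid)
{
	int *owner = owner_slot(t, qid, tag);

	if (*owner == NO_DAEMON) {
		*owner = tid;
		return 0;
	}
	return *owner == tid ? 0 : -EINVAL;
}

/* Models any later command (e.g. COMMIT_AND_FETCH) against (qid, tag). */
static int model_command(struct io_owner_table *t, int qid, int tag, int tid)
{
	return *owner_slot(t, qid, tag) == tid ? 0 : -EINVAL;
}
```

With per-I/O daemons, two threads can own different tags of the same queue; without the feature, the second thread touching the same queue is rejected.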

-Future development
-==================
-
 Zero copy
 ---------

-Zero copy is a generic requirement for nbd, fuse or similar drivers. A
-problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
-can't be remapped any more in kernel with existing mm interfaces. This can
-occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
-big requests (IO size >= 256 KB) may benefit a lot from zero copy.
+ublk zero copy relies on io_uring's fixed kernel buffer, which provides
+two APIs: ``io_buffer_register_bvec()`` and ``io_buffer_unregister_bvec()``.

+ublk adds the ``UBLK_IO_REGISTER_IO_BUF`` IO command, which calls
+``io_buffer_register_bvec()`` so that the ublk server can register a client
+request buffer into the io_uring buffer table; the ublk server can then
+submit io_uring IOs using the registered buffer index. The
+``UBLK_IO_UNREGISTER_IO_BUF`` IO command calls ``io_buffer_unregister_bvec()``
+to unregister the buffer. The buffer is guaranteed to be live between the
+calls to ``io_buffer_register_bvec()`` and ``io_buffer_unregister_bvec()``.
+Any io_uring operation which supports this kind of kernel buffer grabs one
+reference to the buffer until the operation is completed.
+
+A ublk server implementing zero copy or user copy has to have CAP_SYS_ADMIN
+and be trusted, because it is the ublk server's responsibility to make sure
+the IO buffer is filled with data when handling a READ command, and to
+return the correct result to the ublk driver. The result has to match the
+number of bytes actually filled into the IO buffer; otherwise, an
+uninitialized kernel IO buffer will be exposed to the client application.
+
+The ublk server needs to align the parameters of ``struct ublk_param_dma_align``
+with the backend for zero copy to work correctly.
+
+For the best IO performance, the ublk server should align the segment
+parameters of ``struct ublk_param_segment`` with the backend to avoid
+unnecessary IO splits, which usually hurt io_uring performance.
+
+Auto Buffer Registration
+------------------------
+
+The ``UBLK_F_AUTO_BUF_REG`` feature automatically handles buffer registration
+and unregistration for I/O requests, which simplifies the buffer management
+process and reduces overhead in the ublk server implementation.
+
+This is another feature flag for using zero copy, and it is compatible with
+``UBLK_F_SUPPORT_ZERO_COPY``.
+
+Feature Overview
+~~~~~~~~~~~~~~~~
+
+This feature automatically registers request buffers to the io_uring context
+before delivering I/O commands to the ublk server and unregisters them when
+completing I/O commands. This eliminates the need for manual buffer
+registration/unregistration via ``UBLK_IO_REGISTER_IO_BUF`` and
+``UBLK_IO_UNREGISTER_IO_BUF`` commands, so IO handling in the ublk server
+no longer depends on these two uring_cmd operations.
+
+IOs can't be issued concurrently to io_uring if there is any dependency
+among them. Removing the dependency on buffer registration & unregistration
+commands therefore not only simplifies the ublk server implementation, but
+also makes concurrent IO handling possible.
+
+Usage Requirements
+~~~~~~~~~~~~~~~~~~
+
+1. The ublk server must create a sparse buffer table on the same ``io_ring_ctx``
+   used for ``UBLK_IO_FETCH_REQ`` and ``UBLK_IO_COMMIT_AND_FETCH_REQ``. If
+   uring_cmd is issued on a different ``io_ring_ctx``, manual buffer
+   unregistration is required.
+
+2. Buffer registration data must be passed via uring_cmd's ``sqe->addr`` with the
+   following structure::
+
+    struct ublk_auto_buf_reg {
+        __u16 index;      /* Buffer index for registration */
+        __u8 flags;       /* Registration flags */
+        __u8 reserved0;   /* Reserved for future use */
+        __u32 reserved1;  /* Reserved for future use */
+    };
+
+   ``ublk_auto_buf_reg_to_sqe_addr()`` converts the above structure into
+   ``sqe->addr``.
+
+3. All reserved fields in ``ublk_auto_buf_reg`` must be zeroed.
+
+4. Optional flags can be passed via ``ublk_auto_buf_reg.flags``.
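A minimal sketch of packing the structure into the 64-bit ``sqe->addr`` value, assuming the fields are packed in structure order starting from the least significant bits (``index`` in bits 0-15, ``flags`` in bits 16-23, ``reserved0`` in bits 24-31, ``reserved1`` in bits 32-63). ``struct auto_buf_reg_example`` is a local mirror and both helper names are hypothetical; a real server should use ``ublk_auto_buf_reg_to_sqe_addr()`` from the uapi header.

```c
#include <stdint.h>

/* Local mirror of struct ublk_auto_buf_reg, for a self-contained example. */
struct auto_buf_reg_example {
	uint16_t index;		/* buffer index in the io_uring buffer table */
	uint8_t flags;		/* optional registration flags */
	uint8_t reserved0;	/* must be zero */
	uint32_t reserved1;	/* must be zero */
};

/* Hypothetical packer: index in bits 0-15, flags 16-23, reserved 24-63. */
static uint64_t auto_buf_reg_to_addr(const struct auto_buf_reg_example *buf)
{
	return (uint64_t)buf->index |
	       (uint64_t)buf->flags << 16 |
	       (uint64_t)buf->reserved0 << 24 |
	       (uint64_t)buf->reserved1 << 32;
}

/* Hypothetical inverse, e.g. for checking what was submitted. */
static struct auto_buf_reg_example auto_buf_reg_from_addr(uint64_t addr)
{
	struct auto_buf_reg_example buf = {
		.index = (uint16_t)(addr & 0xffff),
		.flags = (uint8_t)(addr >> 16),
		.reserved0 = (uint8_t)(addr >> 24),
		.reserved1 = (uint32_t)(addr >> 32),
	};
	return buf;
}
```

Zero-initialising the structure with a designated initializer (``{ .index = 5 }``) keeps the reserved fields at zero, as requirement 3 demands.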
+
+Fallback Behavior
+~~~~~~~~~~~~~~~~~
+
+If auto buffer registration fails:
+
+1. When ``UBLK_AUTO_BUF_REG_FALLBACK`` is enabled:
+
+   - The uring_cmd is completed
+   - ``UBLK_IO_F_NEED_REG_BUF`` is set in ``ublksrv_io_desc.op_flags``
+   - The ublk server must handle the failure manually, for example by
+     registering the buffer manually, or by using the user copy feature to
+     retrieve the data for handling the ublk IO
+
+2. If fallback is not enabled:
+
+   - The ublk I/O request fails silently
+   - The uring_cmd won't be completed
+
+Limitations
+~~~~~~~~~~~
+
+- Requires same ``io_ring_ctx`` for all operations
+- May require manual buffer management in fallback cases
+- The io_ring_ctx buffer table has a maximum size of 16K entries, which may
+  not be enough if too many ublk devices are handled by a single io_ring_ctx
+  and each one has a very large queue depth

 References
 ==========
@ -334,5 +441,3 @@ References
 .. [#userspace_readme] https://github.com/ming1/ubdsrv/blob/master/README

 .. [#stefan] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/
-
-.. [#xiaoguang] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/
--- a/Documentation/bpf/bpf_devel_QA.rst
+++ b/Documentation/bpf/bpf_devel_QA.rst
@ -382,6 +382,14 @@ In case of new BPF instructions, once the changes have been accepted
 into the Linux kernel, please implement support into LLVM's BPF back
 end. See LLVM_ section below for further information.

+Q: What is the "BPF_INTERNAL" symbol namespace for?
+---------------------------------------------------
+A: Symbols exported as BPF_INTERNAL can only be used by BPF infrastructure,
+such as preload kernel modules with light skeletons. Most symbols outside
+of BPF_INTERNAL are not expected to be used by code outside of BPF either.
+Symbols may lack the designation because they predate the namespaces,
+or due to an oversight.
+
 Stable submission
 =================

@ -603,9 +611,10 @@ Q: I have added a new BPF instruction to the kernel, how can I integrate
 it into LLVM?

 A: LLVM has a ``-mcpu`` selector for the BPF back end in order to allow
-the selection of BPF instruction set extensions. By default the
-``generic`` processor target is used, which is the base instruction set
-(v1) of BPF.
+the selection of BPF instruction set extensions. Before LLVM version 20,
+the default was the ``generic`` processor target, which is the base
+instruction set (v1) of BPF. Since LLVM 20, the default processor target
+is instruction set v3.

 LLVM has an option to select ``-mcpu=probe`` where it will probe the host
 kernel for supported BPF instruction set extensions and selects the
--- a/Documentation/bpf/bpf_iterators.rst
+++ b/Documentation/bpf/bpf_iterators.rst
@ -2,10 +2,117 @@
 BPF Iterators
 =============

+--------
+Overview
+--------

----------
-Motivation
----------
+BPF supports two separate entities collectively known as "BPF iterators": BPF
+iterator *program type* and *open-coded* BPF iterators. The former is
+a stand-alone BPF program type which, when attached and activated by user,
+will be called once for each entity (task_struct, cgroup, etc.) that is being
+iterated. The latter is a set of BPF-side APIs implementing iterator
+functionality and available across multiple BPF program types. Open-coded
+iterators provide similar functionality to BPF iterator programs, but give
+more flexibility and control to all other BPF program types. BPF iterator
+programs, on the other hand, can be used to implement anonymous or BPF
+FS-mounted special files, whose contents are generated by attached BPF iterator
+program, backed by seq_file functionality. Both are useful depending on
+specific needs.
+
+When adding a new BPF iterator program, it is expected that similar
+functionality will be added as open-coded iterator for maximum flexibility.
+It's also expected that iteration logic and code will be maximally shared and
+reused between the two iterator API surfaces.
+
+------------------------
+Open-coded BPF Iterators
+------------------------
+
+Open-coded BPF iterators are implemented as tightly-coupled trios of kfuncs
+(constructor, next element fetch, destructor) and iterator-specific type
+describing on-the-stack iterator state, which is guaranteed by the BPF
+verifier to not be tampered with outside of the corresponding
+constructor/destructor/next APIs.
+
+Each kind of open-coded BPF iterator has its own associated
+struct bpf_iter_<type>, where <type> denotes a specific type of iterator.
+bpf_iter_<type> state needs to live on BPF program stack, so make sure it's
+small enough to fit on BPF stack. For performance reasons it's best to avoid
+dynamic memory allocation for iterator state and size the state struct big
+enough to fit everything necessary. But if necessary, dynamic memory
+allocation is a way to bypass BPF stack limitations. Note that the state
+struct size is part of the iterator's user-visible API, so changing it will
+break backwards compatibility; be deliberate about designing it.
+
+All kfuncs (constructor, next, destructor) have to be named consistently as
+bpf_iter_<type>_{new,next,destroy}(), respectively. <type> represents iterator
+type, and iterator state should be represented as a matching
+`struct bpf_iter_<type>` state type. Also, all iter kfuncs should have
+a pointer to this `struct bpf_iter_<type>` as the very first argument.
+
+Additionally:
+  - Constructor, i.e., `bpf_iter_<type>_new()`, can have arbitrary extra
+    number of arguments. Return type is not enforced either.
+  - Next method, i.e., `bpf_iter_<type>_next()`, has to return a pointer
+    type and should have exactly one argument: `struct bpf_iter_<type> *`
+    (const/volatile/restrict and typedefs are ignored).
+  - Destructor, i.e., `bpf_iter_<type>_destroy()`, should return void and
+    should have exactly one argument, similar to the next method.
+  - `struct bpf_iter_<type>` size is enforced to be positive and
+    a multiple of 8 bytes (to fit stack slots correctly).
+
+Such strictness and consistency allow building generic helpers that abstract
+important, but boilerplate, details to be able to use open-coded iterators
+effectively and ergonomically (see libbpf's bpf_for_each() macro). This is
+enforced at kfunc registration point by the kernel.
+
+Constructor/next/destructor implementation contract is as follows:
+  - constructor, `bpf_iter_<type>_new()`, always initializes iterator state on
+    the stack. If any of the input arguments are invalid, constructor should
+    make sure to still initialize it such that subsequent next() calls will
+    return NULL. I.e., on error, *return error and construct empty iterator*.
+    Constructor kfunc is marked with KF_ITER_NEW flag.
+
+  - next method, `bpf_iter_<type>_next()`, accepts pointer to iterator state
+    and produces an element. Next method should always return a pointer. The
+    contract with the BPF verifier is that the next method *guarantees* that
+    it will eventually return NULL when elements are exhausted. Once NULL is
+    returned, subsequent next calls *should keep returning NULL*. Next method
+    is marked with KF_ITER_NEXT (and should also have KF_RET_NULL as
+    NULL-returning kfunc, of course).
+
+  - destructor, `bpf_iter_<type>_destroy()`, is always called once, even if
+    the constructor failed or next returned nothing. Destructor frees up any
+    resources and marks stack space used by `struct bpf_iter_<type>` as usable
+    for something else. Destructor is marked with KF_ITER_DESTROY flag.
+
+Any open-coded BPF iterator implementation has to implement at least these
+three methods. It is enforced that for any given type of iterator only
+applicable constructor/destructor/next are callable. I.e., verifier ensures
+you can't pass number iterator state into, say, cgroup iterator's next method.
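The constructor/next/destructor contract can be illustrated outside of BPF with a plain userspace C sketch of a hypothetical numbers iterator. It is not a kfunc and nothing verifies it; ``struct bpf_iter_demo`` and its functions are made up for illustration. It shows the key rules: the state size is a multiple of 8, a failing constructor still leaves an empty iterator behind, next() keeps returning NULL once exhausted, and the destructor is always called.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical iterator state: 16 bytes, a multiple of 8 like real ones. */
struct bpf_iter_demo {
	int cur;
	int end;
	int val;	/* storage for the element handed out by next() */
	int pad;
};
_Static_assert(sizeof(struct bpf_iter_demo) % 8 == 0, "state must fit stack slots");

/* Constructor: on invalid input, return an error but build an empty iterator. */
static int bpf_iter_demo_new(struct bpf_iter_demo *it, int start, int end)
{
	it->pad = 0;
	if (start > end) {
		it->cur = it->end = 0;	/* empty: next() returns NULL right away */
		return -EINVAL;
	}
	it->cur = start;
	it->end = end;
	return 0;
}

/* Next: a pointer to the element, or NULL forever once exhausted. */
static int *bpf_iter_demo_next(struct bpf_iter_demo *it)
{
	if (it->cur >= it->end)
		return NULL;
	it->val = it->cur++;
	return &it->val;
}

/* Destructor: always called once; nothing to free for this iterator. */
static void bpf_iter_demo_destroy(struct bpf_iter_demo *it)
{
	it->cur = it->end = 0;
}

/* Usage: the loop shape that helpers like bpf_for_each() abstract away. */
static int demo_sum(int start, int end)
{
	struct bpf_iter_demo it;
	int *v, sum = 0;

	bpf_iter_demo_new(&it, start, end);	/* an error still gives an empty iterator */
	while ((v = bpf_iter_demo_next(&it)))
		sum += *v;
	bpf_iter_demo_destroy(&it);
	return sum;
}
```

Because the loop ignores the constructor's return value and still calls the destructor, it stays correct for invalid input, which is exactly why the contract requires constructing an empty iterator on error.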
+
+From a 10,000-foot BPF verification point of view, next methods are the points
+of forking a verification state, which are conceptually similar to what
+the verifier does when validating conditional jumps. The verifier branches at
+the `call bpf_iter_<type>_next` instruction and simulates two outcomes: NULL
+(iteration is done) and non-NULL (new element is returned). NULL is simulated
+first and is supposed to reach exit without looping. After that, the non-NULL
+case is validated and it either reaches exit (for trivial examples with no real
+loop), or reaches another `call bpf_iter_<type>_next` instruction with a
+state equivalent to an already (partially) validated one. State equivalency at
+that point means we technically are going to be looping forever without
+"breaking out" of the established "state envelope" (i.e., subsequent
+iterations don't add any new knowledge or constraints to the verifier state,
+so running 1, 2, 10, or a million of them doesn't matter). But taking into
+account the contract stating that the iterator next method *has to* return
+NULL eventually, we can conclude that the loop body is safe and will
+eventually terminate. Given we validated logic outside of the loop (NULL
+case), and concluded that the loop body is safe (though potentially looping
+many times), the verifier can claim safety of the overall program logic.
+
+------------------------
+BPF Iterators Motivation
+------------------------

 There are a few existing ways to dump kernel data into user space. The most
 popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps
@ -86,7 +193,7 @@ following steps:
 The following are a few examples of selftest BPF iterator programs:

 * `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_
-* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_
+* `bpf_iter_task_vmas.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vmas.c>`_
 * `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_

 Let us look at ``bpf_iter_task_file.c``, which runs in kernel space:
@ -323,8 +430,8 @@ Now, in the userspace program, pass the pointer of struct to the

 ::

-  link = bpf_program__attach_iter(prog, &opts); iter_fd =
-  bpf_iter_create(bpf_link__fd(link));
+  link = bpf_program__attach_iter(prog, &opts);
+  iter_fd = bpf_iter_create(bpf_link__fd(link));

 If both *tid* and *pid* are zero, an iterator created from this struct
 ``bpf_iter_attach_opts`` will include every opened file of every task in the
--- a/Documentation/bpf/btf.rst
+++ b/Documentation/bpf/btf.rst
@ -102,7 +102,8 @@ Each type contains the following common data::
         * bits 24-28: kind (e.g. int, ptr, array...etc)
         * bits 29-30: unused
         * bit     31: kind_flag, currently used by
-         *             struct, union, fwd, enum and enum64.
+         *             struct, union, enum, fwd, enum64,
+         *             decl_tag and type_tag
         */
        __u32 info;
        /* "size" is used by INT, ENUM, STRUCT, UNION and ENUM64.
@ -478,7 +479,7 @@ No additional type data follow ``btf_type``.

 ``struct btf_type`` encoding requirement:
 * ``name_off``: offset to a non-empty string
- * ``info.kind_flag``: 0
+ * ``info.kind_flag``: 0 or 1
 * ``info.kind``: BTF_KIND_DECL_TAG
 * ``info.vlen``: 0
 * ``type``: ``struct``, ``union``, ``func``, ``var`` or ``typedef``
@ -489,7 +490,6 @@ No additional type data follow ``btf_type``.
        __u32   component_idx;
    };

-The ``name_off`` encodes btf_decl_tag attribute string.
 The ``type`` should be ``struct``, ``union``, ``func``, ``var`` or ``typedef``.
 For ``var`` or ``typedef`` type, ``btf_decl_tag.component_idx`` must be ``-1``.
 For the other three types, if the btf_decl_tag attribute is
@ -499,12 +499,21 @@ the attribute is applied to a ``struct``/``union`` member or
 a ``func`` argument, and ``btf_decl_tag.component_idx`` should be a
 valid index (starting from 0) pointing to a member or an argument.

+If ``info.kind_flag`` is 0, then this is a normal decl tag, and the
+``name_off`` encodes btf_decl_tag attribute string.
+
+If ``info.kind_flag`` is 1, then the decl tag represents an arbitrary
+__attribute__. In this case, ``name_off`` encodes a string
+representing the attribute-list of the attribute specifier. For
+example, for an ``__attribute__((aligned(4)))`` the string's contents
+are ``aligned(4)``.
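A small sketch of decoding the ``info`` word according to the bit layout documented above (vlen in bits 0-15, kind in bits 24-28, kind_flag in bit 31) and telling the two kinds of tag apart. The kernel's uapi ``btf.h`` provides equivalent ``BTF_INFO_*`` macros; the helper names and ``EX_*`` constants here are local to the example, with the kind values assumed to match ``BTF_KIND_DECL_TAG`` (17) and ``BTF_KIND_TYPE_TAG`` (18).

```c
#include <stdint.h>

#define EX_BTF_KIND_DECL_TAG 17	/* assumed value of BTF_KIND_DECL_TAG */
#define EX_BTF_KIND_TYPE_TAG 18	/* assumed value of BTF_KIND_TYPE_TAG */

static unsigned int btf_info_vlen(uint32_t info)  { return info & 0xffff; }
static unsigned int btf_info_kind(uint32_t info)  { return (info >> 24) & 0x1f; }
static unsigned int btf_info_kflag(uint32_t info) { return info >> 31; }

/* Builds an info word from its fields; vlen is 0 for tag kinds. */
static uint32_t btf_info_make(unsigned int kind, unsigned int kflag, unsigned int vlen)
{
	return (uint32_t)(kflag & 1) << 31 | (uint32_t)(kind & 0x1f) << 24 | (vlen & 0xffff);
}

/*
 * 1: the tag's name is an attribute-list such as "aligned(4)";
 * 0: the name is a plain btf_decl_tag/btf_type_tag string;
 * -1: @info does not describe a tag kind at all.
 */
static int btf_tag_is_attribute(uint32_t info)
{
	unsigned int kind = btf_info_kind(info);

	if (kind != EX_BTF_KIND_DECL_TAG && kind != EX_BTF_KIND_TYPE_TAG)
		return -1;
	return (int)btf_info_kflag(info);
}
```

For example, a decl tag with ``info.kind_flag`` set has the info word ``0x91000000``.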
+
 2.2.18 BTF_KIND_TYPE_TAG
 ~~~~~~~~~~~~~~~~~~~~~~~~

 ``struct btf_type`` encoding requirement:
 * ``name_off``: offset to a non-empty string
- * ``info.kind_flag``: 0
+ * ``info.kind_flag``: 0 or 1
 * ``info.kind``: BTF_KIND_TYPE_TAG
 * ``info.vlen``: 0
 * ``type``: the type with ``btf_type_tag`` attribute
@ -522,6 +531,14 @@ type_tag, then zero or more const/volatile/restrict/typedef
 and finally the base type. The base type is one of
 int, ptr, array, struct, union, enum, func_proto and float types.

+Similarly to decl tags, if the ``info.kind_flag`` is 0, then this is a
+normal type tag, and the ``name_off`` encodes btf_type_tag attribute
+string.
+
+If ``info.kind_flag`` is 1, then the type tag represents an arbitrary
+__attribute__, and the ``name_off`` encodes a string representing the
+attribute-list of the attribute specifier.
+
 2.2.19 BTF_KIND_ENUM64
 ~~~~~~~~~~~~~~~~~~~~~~

--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@ -160,6 +160,23 @@ Or::
                ...
        }

+2.2.6 __prog Annotation
+---------------------------
+This annotation is used to indicate that the argument needs to be fixed up to
+the bpf_prog_aux of the caller BPF program. Any value passed into this argument
+is ignored, and rewritten by the verifier.
+
+An example is given below::
+
+        __bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
+                                                 int (callback_fn)(void *map, int *key, void *value),
+                                                 unsigned int flags,
+                                                 void *aux__prog)
+        {
+                struct bpf_prog_aux *aux = aux__prog;
+                ...
+        }
+
 .. _BPF_kfunc_nodef:

 2.3 Using an existing kernel function
--- a/Documentation/bpf/map_hash.rst
+++ b/Documentation/bpf/map_hash.rst
@ -233,10 +233,16 @@ attempts in order to enforce the LRU property which have increasing impacts on
 other CPUs involved in the following operation attempts:

 - Attempt to use CPU-local state to batch operations
- Attempt to fetch free nodes from global lists
+- Attempt to fetch ``target_free`` free nodes from global lists
 - Attempt to pull any node from a global list and remove it from the hashmap
 - Attempt to pull any node from any CPU's list and remove it from the hashmap

+The number of nodes to borrow from the global list in a batch, ``target_free``,
+depends on the size of the map. A larger batch size reduces lock contention, but
+may also exhaust the global structure. The value is computed at map init to
+avoid exhaustion, by limiting the aggregate reservation by all CPUs to half the
+map size, with a minimum of one element and a maximum budget of 128 at a time.
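As a worked example, assume the budget is the map's per-CPU share of entries halved and then clamped to the range [1, 128]. This formula is an interpretation of the paragraph above, and ``lru_target_free()`` is a hypothetical helper, not kernel code.

```c
/* Hypothetical helper: per-CPU batch size for borrowing free LRU nodes. */
static unsigned int lru_target_free(unsigned int max_entries, unsigned int nr_cpus)
{
	unsigned int budget;

	if (nr_cpus == 0)
		nr_cpus = 1;
	/* All CPUs together may reserve at most half of the map. */
	budget = max_entries / nr_cpus / 2;
	if (budget < 1)
		budget = 1;	/* always borrow at least one node */
	if (budget > 128)
		budget = 128;	/* cap the per-batch budget */
	return budget;
}
```

Under this assumption, a 1000-entry map on 8 CPUs gives each CPU a batch of 62 nodes (496 in aggregate, below half the map), while a 16-entry map on 64 CPUs falls back to the minimum of 1.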
+
 This algorithm is described visually in the following diagram. See the
 description in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of
 the corresponding operations:
--- a/Documentation/bpf/map_lru_hash_update.dot
+++ b/Documentation/bpf/map_lru_hash_update.dot
@ -35,18 +35,18 @@ digraph {
  fn_bpf_lru_list_pop_free_to_local [shape=rectangle,fillcolor=2,
    label="Flush local pending,
    Rotate Global list, move
-    LOCAL_FREE_TARGET
+    target_free
    from global -> local"]
  // Also corresponds to:
  // fn__local_list_flush()
  // fn_bpf_lru_list_rotate()
  fn___bpf_lru_node_move_to_free[shape=diamond,fillcolor=2,
-    label="Able to free\nLOCAL_FREE_TARGET\nnodes?"]
+    label="Able to free\ntarget_free\nnodes?"]

  fn___bpf_lru_list_shrink_inactive [shape=rectangle,fillcolor=3,
    label="Shrink inactive list
      up to remaining
-      LOCAL_FREE_TARGET
+      target_free
      (global LRU -> local)"]
  fn___bpf_lru_list_shrink [shape=diamond,fillcolor=2,
    label="> 0 entries in\nlocal free list?"]
--- a/Documentation/bpf/standardization/instruction-set.rst
+++ b/Documentation/bpf/standardization/instruction-set.rst
@ -324,34 +324,42 @@ register.

 .. table:: Arithmetic instructions

-  =====  =====  =======  ==========================================================
+  =====  =====  =======  ===================================================================================
  name   code   offset   description
-  =====  =====  =======  ==========================================================
+  =====  =====  =======  ===================================================================================
  ADD    0x0    0        dst += src
  SUB    0x1    0        dst -= src
  MUL    0x2    0        dst \*= src
  DIV    0x3    0        dst = (src != 0) ? (dst / src) : 0
-  SDIV   0x3    1        dst = (src != 0) ? (dst s/ src) : 0
+  SDIV   0x3    1        dst = (src == 0) ? 0 : ((src == -1 && dst == LLONG_MIN) ? LLONG_MIN : (dst s/ src))
  OR     0x4    0        dst \|= src
  AND    0x5    0        dst &= src
  LSH    0x6    0        dst <<= (src & mask)
  RSH    0x7    0        dst >>= (src & mask)
  NEG    0x8    0        dst = -dst
  MOD    0x9    0        dst = (src != 0) ? (dst % src) : dst
-  SMOD   0x9    1        dst = (src != 0) ? (dst s% src) : dst
+  SMOD   0x9    1        dst = (src == 0) ? dst : ((src == -1 && dst == LLONG_MIN) ? 0 : (dst s% src))
  XOR    0xa    0        dst ^= src
  MOV    0xb    0        dst = src
  MOVSX  0xb    8/16/32  dst = (s8,s16,s32)src
  ARSH   0xc    0        :term:`sign extending<Sign Extend>` dst >>= (src & mask)
  END    0xd    0        byte swap operations (see `Byte swap instructions`_ below)
-  =====  =====  =======  ==========================================================
+  =====  =====  =======  ===================================================================================

 Underflow and overflow are allowed during arithmetic operations, meaning
 the 64-bit or 32-bit value will wrap. If BPF program execution would
 result in division by zero, the destination register is instead set to zero.
+Otherwise, for ``ALU64``, if execution would result in ``LLONG_MIN``
+divided by -1, the destination register is instead set to ``LLONG_MIN``. For
+``ALU``, if execution would result in ``INT_MIN`` divided by -1, the
+destination register is instead set to ``INT_MIN``.
+
 If execution would result in modulo by zero, for ``ALU64`` the value of
 the destination register is unchanged whereas for ``ALU`` the upper
-32 bits of the destination register are zeroed.
+32 bits of the destination register are zeroed. Otherwise, for ``ALU64``,
+if execution would result in ``LLONG_MIN`` modulo -1, the destination
+register is instead set to 0. For ``ALU``, if execution would result in
+``INT_MIN`` modulo -1, the destination register is instead set to 0.
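A C sketch of the signed division and modulo semantics described above. Plain C division is undefined both for a zero divisor and for ``LLONG_MIN / -1``, so the special cases are checked before dividing; ``INT64_MIN`` is used as the 64-bit ``LLONG_MIN``. The function names are hypothetical, and the sketch assumes ``s/`` and ``s%`` truncate toward zero as C does. Only ALU64 SDIV/SMOD and ALU SDIV are shown.

```c
#include <stdint.h>

/* ALU64 SDIV: dst = dst s/ src, with the special cases from the table. */
static int64_t bpf_sdiv64(int64_t dst, int64_t src)
{
	if (src == 0)
		return 0;
	if (src == -1 && dst == INT64_MIN)
		return INT64_MIN;	/* C would overflow here */
	return dst / src;
}

/* ALU64 SMOD: dst = dst s% src, with the special cases from the table. */
static int64_t bpf_smod64(int64_t dst, int64_t src)
{
	if (src == 0)
		return dst;
	if (src == -1)
		return 0;		/* covers INT64_MIN % -1, undefined in C */
	return dst % src;
}

/* ALU SDIV on the low 32 bits; the result is zero-extended to 64 bits. */
static uint64_t bpf_sdiv32(uint64_t dst, uint64_t src)
{
	int32_t d = (int32_t)(uint32_t)dst, s = (int32_t)(uint32_t)src;
	int32_t r;

	if (s == 0)
		r = 0;
	else if (s == -1 && d == INT32_MIN)
		r = INT32_MIN;
	else
		r = d / s;
	return (uint32_t)r;
}
```

For instance, ``bpf_sdiv64(-7, 2)`` is -3 and ``bpf_smod64(-7, 2)`` is -1, while both ``LLONG_MIN`` cases complete without trapping.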

 ``{ADD, X, ALU}``, where 'code' = ``ADD``, 'source' = ``X``, and 'class' = ``ALU``, means::

--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@ -1,25 +1,96 @@
-# -*- coding: utf-8 -*-
-#
-# The Linux Kernel documentation build configuration file, created by
-# sphinx-quickstart on Fri Feb 12 13:51:46 2016.
-#
-# This file is execfile()d with the current directory set to its
-# containing dir.
-#
-# Note that not all possible configuration values are present in this
-# autogenerated file.
-#
-# All configuration values have a default; values that are commented out
-# serve to show the default.
+# SPDX-License-Identifier: GPL-2.0-only
+# pylint: disable=C0103,C0209
+
+"""
+The Linux Kernel documentation build configuration file.
+"""

-import sys
 import os
-import sphinx
 import shutil
+import sys
+
+import sphinx
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+sys.path.insert(0, os.path.abspath("sphinx"))
+
+from load_config import loadConfig               # pylint: disable=C0413,E0401
+
+# Minimal supported version
+needs_sphinx = "3.4.3"
+
+# Get Sphinx version
+major, minor, patch = sphinx.version_info[:3]          # pylint: disable=I1101
+
+# include_patterns were added in Sphinx 5.1
+if (major < 5) or (major == 5 and minor < 1):
+    has_include_patterns = False
+else:
+    has_include_patterns = True
+    # Include patterns that don't contain directory names, in glob format
+    include_patterns = ["**.rst"]
+
+# Location of Documentation/ directory
+doctree = os.path.abspath(".")
+
+# Exclude patterns that don't contain directory names, in glob format.
+exclude_patterns = []
+
+# List of patterns that contain directory names in glob format.
+dyn_include_patterns = []
+dyn_exclude_patterns = ["output"]
+
+# Currently, only netlink/specs has a parser for yaml.
+# Prefer using include patterns if available, as it is faster
+if has_include_patterns:
+    dyn_include_patterns.append("netlink/specs/*.yaml")
+else:
+    dyn_exclude_patterns.append("netlink/*.yaml")
+    dyn_exclude_patterns.append("devicetree/bindings/**.yaml")
+    dyn_exclude_patterns.append("core-api/kho/bindings/**.yaml")
+
+# Properly handle include/exclude patterns
+# ----------------------------------------
+
+def update_patterns(app, config):
+    """
+    In Sphinx, all directories are relative to the SOURCEDIR parameter
+    passed to sphinx-build. Because of that, all patterns that contain
+    directory names need to be set dynamically, after converting them
+    to paths relative to SOURCEDIR.
+
+    As Sphinx doesn't include any files outside SOURCEDIR, relative
+    patterns that start with "../" are skipped.
+    """
+
+    # setup include_patterns dynamically
+    if has_include_patterns:
+        for p in dyn_include_patterns:
+            full = os.path.join(doctree, p)
+
+            rel_path = os.path.relpath(full, start=app.srcdir)
+            if rel_path.startswith("../"):
+                continue
+
+            config.include_patterns.append(rel_path)
+
+    # setup exclude_patterns dynamically
+    for p in dyn_exclude_patterns:
+        full = os.path.join(doctree, p)
+
+        rel_path = os.path.relpath(full, start=app.srcdir)
+        if rel_path.startswith("../"):
+            continue
+
+        config.exclude_patterns.append(rel_path)
+

 # helper
 # ------

+
 def have_command(cmd):
    """Search ``cmd`` in the ``PATH`` environment.

@ -28,105 +99,89 @@ def have_command(cmd):
    """
    return shutil.which(cmd) is not None

-# Get Sphinx version
-major, minor, patch = sphinx.version_info[:3]
-
-#
-# Warn about older versions that we don't want to support for much
-# longer.
-#
-if (major < 2) or (major == 2 and minor < 4):
-    print('WARNING: support for Sphinx < 2.4 will be removed soon.')
-
-# If extensions (or modules to document with autodoc) are in another directory,
-# add these directories to sys.path here. If the directory is relative to the
-# documentation root, use os.path.abspath to make it absolute, like shown here.
-sys.path.insert(0, os.path.abspath('sphinx'))
-from load_config import loadConfig

 # -- General configuration ------------------------------------------------

-# If your documentation needs a minimal Sphinx version, state it here.
-needs_sphinx = '2.4.4'
+# Add any Sphinx extensions in alphabetic order
+extensions = [
+    "automarkup",
+    "kernel_abi",
+    "kerneldoc",
+    "kernel_feat",
+    "kernel_include",
+    "kfigure",
+    "maintainers_include",
+    "parser_yaml",
+    "rstFlatTable",
+    "sphinx.ext.autosectionlabel",
+    "sphinx.ext.ifconfig",
+    "translations",
+]
+# Since Sphinx version 3, the C function parser is more pedantic with regard
+# to type checking. Because of that, macros used in c:function cause problems.
+# They need to be escaped via the c_id_attributes[] array.
+c_id_attributes = [
+    # GCC Compiler types not parsed by Sphinx:
+    "__restrict__",

-# Add any Sphinx extension module names here, as strings. They can be
-# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
-# ones.
-extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include',
-              'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
-              'maintainers_include', 'sphinx.ext.autosectionlabel',
-              'kernel_abi', 'kernel_feat', 'translations']
+    # include/linux/compiler_types.h:
+    "__iomem",
+    "__kernel",
+    "noinstr",
+    "notrace",
+    "__percpu",
+    "__rcu",
+    "__user",
+    "__force",
+    "__counted_by_le",
+    "__counted_by_be",

-if major >= 3:
-    if (major > 3) or (minor > 0 or patch >= 2):
-        # Sphinx c function parser is more pedantic with regards to type
-        # checking. Due to that, having macros at c:function cause problems.
-        # Those needed to be scaped by using c_id_attributes[] array
-        c_id_attributes = [
-            # GCC Compiler types not parsed by Sphinx:
-            "__restrict__",
+    # include/linux/compiler_attributes.h:
+    "__alias",
+    "__aligned",
+    "__aligned_largest",
+    "__always_inline",
+    "__assume_aligned",
+    "__cold",
+    "__attribute_const__",
+    "__copy",
+    "__pure",
+    "__designated_init",
+    "__visible",
+    "__printf",
+    "__scanf",
+    "__gnu_inline",
+    "__malloc",
+    "__mode",
+    "__no_caller_saved_registers",
+    "__noclone",
+    "__nonstring",
+    "__noreturn",
+    "__packed",
+    "__pure",
+    "__section",
+    "__always_unused",
+    "__maybe_unused",
+    "__used",
+    "__weak",
+    "noinline",
+    "__fix_address",
+    "__counted_by",

-            # include/linux/compiler_types.h:
-            "__iomem",
-            "__kernel",
-            "noinstr",
-            "notrace",
-            "__percpu",
-            "__rcu",
-            "__user",
-            "__force",
-            "__counted_by_le",
-            "__counted_by_be",
+    # include/linux/memblock.h:
+    "__init_memblock",
+    "__meminit",

-            # include/linux/compiler_attributes.h:
-            "__alias",
-            "__aligned",
-            "__aligned_largest",
-            "__always_inline",
-            "__assume_aligned",
-            "__cold",
-            "__attribute_const__",
-            "__copy",
-            "__pure",
-            "__designated_init",
-            "__visible",
-            "__printf",
-            "__scanf",
-            "__gnu_inline",
-            "__malloc",
-            "__mode",
-            "__no_caller_saved_registers",
-            "__noclone",
-            "__nonstring",
-            "__noreturn",
-            "__packed",
-            "__pure",
-            "__section",
-            "__always_unused",
-            "__maybe_unused",
-            "__used",
-            "__weak",
-            "noinline",
-            "__fix_address",
-            "__counted_by",
+    # include/linux/init.h:
+    "__init",
+    "__ref",

-            # include/linux/memblock.h:
-            "__init_memblock",
-            "__meminit",
+    # include/linux/linkage.h:
+    "asmlinkage",

-            # include/linux/init.h:
-            "__init",
-            "__ref",
-
-            # include/linux/linkage.h:
-            "asmlinkage",
-
-            # include/linux/btf.h
-            "__bpf_kfunc",
-        ]
-
-else:
-    extensions.append('cdomain')
+    # include/linux/btf.h
+    "__bpf_kfunc",
+]

 # Ensure that autosectionlabel will produce unique names
 autosectionlabel_prefix_document = True
@ -135,48 +190,45 @@ autosectionlabel_maxdepth = 2
 # Load math renderer:
 # For html builder, load imgmath only when its dependencies are met.
 # mathjax is the default math renderer since Sphinx 1.8.
-have_latex =  have_command('latex')
-have_dvipng = have_command('dvipng')
+have_latex = have_command("latex")
+have_dvipng = have_command("dvipng")
 load_imgmath = have_latex and have_dvipng

 # Respect SPHINX_IMGMATH (for html docs only)
-if 'SPHINX_IMGMATH' in os.environ:
-    env_sphinx_imgmath = os.environ['SPHINX_IMGMATH']
-    if 'yes' in env_sphinx_imgmath:
+if "SPHINX_IMGMATH" in os.environ:
+    env_sphinx_imgmath = os.environ["SPHINX_IMGMATH"]
+    if "yes" in env_sphinx_imgmath:
        load_imgmath = True
-    elif 'no' in env_sphinx_imgmath:
+    elif "no" in env_sphinx_imgmath:
        load_imgmath = False
    else:
        sys.stderr.write("Unknown env SPHINX_IMGMATH=%s ignored.\n" % env_sphinx_imgmath)

-# Always load imgmath for Sphinx <1.8 or for epub docs
-load_imgmath = (load_imgmath or (major == 1 and minor < 8)
-                or 'epub' in sys.argv)
-
 if load_imgmath:
    extensions.append("sphinx.ext.imgmath")
-    math_renderer = 'imgmath'
+    math_renderer = "imgmath"
 else:
-    math_renderer = 'mathjax'
+    math_renderer = "mathjax"

 # Add any paths that contain templates here, relative to this directory.
-templates_path = ['sphinx/templates']
+templates_path = ["sphinx/templates"]

-# The suffix(es) of source filenames.
-# You can specify multiple suffix as a list of string:
-# source_suffix = ['.rst', '.md']
-source_suffix = '.rst'
+# The suffixes of source filenames that will be automatically parsed
+source_suffix = {
+    ".rst": "restructuredtext",
+    ".yaml": "yaml",
+}

 # The encoding of source files.
-#source_encoding = 'utf-8-sig'
+# source_encoding = 'utf-8-sig'

 # The master toctree document.
-master_doc = 'index'
+master_doc = "index"

 # General information about the project.
-project = 'The Linux Kernel'
-copyright = 'The kernel development community'
-author = 'The kernel development community'
+project = "The Linux Kernel"
+copyright = "The kernel development community"         # pylint: disable=W0622
+author = "The kernel development community"

 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
@ -191,86 +243,86 @@ author = 'The kernel development community'
 try:
    makefile_version = None
    makefile_patchlevel = None
-    for line in open('../Makefile'):
-        key, val = [x.strip() for x in line.split('=', 2)]
-        if key == 'VERSION':
-            makefile_version = val
-        elif key == 'PATCHLEVEL':
-            makefile_patchlevel = val
-        if makefile_version and makefile_patchlevel:
-            break
-except:
+    with open("../Makefile", encoding="utf-8") as fp:
+        for line in fp:
+            key, val = [x.strip() for x in line.split("=", 2)]
+            if key == "VERSION":
+                makefile_version = val
+            elif key == "PATCHLEVEL":
+                makefile_patchlevel = val
+            if makefile_version and makefile_patchlevel:
+                break
+except Exception:
    pass
 finally:
    if makefile_version and makefile_patchlevel:
-        version = release = makefile_version + '.' + makefile_patchlevel
+        version = release = makefile_version + "." + makefile_patchlevel
    else:
        version = release = "unknown version"

-#
-# HACK: there seems to be no easy way for us to get at the version and
-# release information passed in from the makefile...so go pawing through the
-# command-line options and find it for ourselves.
-#
+
 def get_cline_version():
-    c_version = c_release = ''
+    """
+    HACK: There seems to be no easy way for us to get at the version and
+    release information passed in from the makefile...so go pawing through the
+    command-line options and find it for ourselves.
+    """
+
+    c_version = c_release = ""
    for arg in sys.argv:
-        if arg.startswith('version='):
+        if arg.startswith("version="):
            c_version = arg[8:]
-        elif arg.startswith('release='):
+        elif arg.startswith("release="):
            c_release = arg[8:]
    if c_version:
        if c_release:
-            return c_version + '-' + c_release
+            return c_version + "-" + c_release
        return c_version
-    return version # Whatever we came up with before
+    return version  # Whatever we came up with before
+

 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
-language = 'en'
+language = "en"

 # There are two options for replacing |today|: either, you set today to some
 # non-false value, then it is used:
-#today = ''
+# today = ''
 # Else, today_fmt is used as the format for a strftime call.
-#today_fmt = '%B %d, %Y'
-
-# List of patterns, relative to source directory, that match files and
-# directories to ignore when looking for source files.
-exclude_patterns = ['output']
+# today_fmt = '%B %d, %Y'

 # The reST default role (used for this markup: `text`) to use for all
 # documents.
-#default_role = None
+# default_role = None

 # If true, '()' will be appended to :func: etc. cross-reference text.
-#add_function_parentheses = True
+# add_function_parentheses = True

 # If true, the current module name will be prepended to all description
 # unit titles (such as .. function::).
-#add_module_names = True
+# add_module_names = True

 # If true, sectionauthor and moduleauthor directives will be shown in the
 # output. They are ignored by default.
-#show_authors = False
+# show_authors = False

 # The name of the Pygments (syntax highlighting) style to use.
-pygments_style = 'sphinx'
+pygments_style = "sphinx"

 # A list of ignored prefixes for module index sorting.
-#modindex_common_prefix = []
+# modindex_common_prefix = []

 # If true, keep warnings as "system message" paragraphs in the built documents.
-#keep_warnings = False
+# keep_warnings = False

 # If true, `todo` and `todoList` produce output, else they produce nothing.
 todo_include_todos = False

-primary_domain = 'c'
-highlight_language = 'none'
+primary_domain = "c"
+highlight_language = "none"

 # -- Options for HTML output ----------------------------------------------

@ -278,43 +330,45 @@ highlight_language = 'none'
 # a list of builtin themes.

 # Default theme
-html_theme = 'alabaster'
+html_theme = "alabaster"
 html_css_files = []

 if "DOCS_THEME" in os.environ:
    html_theme = os.environ["DOCS_THEME"]

-if html_theme == 'sphinx_rtd_theme' or html_theme == 'sphinx_rtd_dark_mode':
+if html_theme in ["sphinx_rtd_theme", "sphinx_rtd_dark_mode"]:
    # Read the Docs theme
    try:
        import sphinx_rtd_theme
+
        html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

        # Add any paths that contain custom static files (such as style sheets) here,
        # relative to this directory. They are copied after the builtin static files,
        # so a file named "default.css" will overwrite the builtin "default.css".
        html_css_files = [
-            'theme_overrides.css',
+            "theme_overrides.css",
        ]

        # Read the Docs dark mode override theme
-        if html_theme == 'sphinx_rtd_dark_mode':
+        if html_theme == "sphinx_rtd_dark_mode":
            try:
-                import sphinx_rtd_dark_mode
-                extensions.append('sphinx_rtd_dark_mode')
-            except ImportError:
-                html_theme == 'sphinx_rtd_theme'
+                import sphinx_rtd_dark_mode            # pylint: disable=W0611

-        if html_theme == 'sphinx_rtd_theme':
-                # Add color-specific RTD normal mode
-                html_css_files.append('theme_rtd_colors.css')
+                extensions.append("sphinx_rtd_dark_mode")
+            except ImportError:
+                html_theme = "sphinx_rtd_theme"
+
+        if html_theme == "sphinx_rtd_theme":
+            # Add color-specific RTD normal mode
+            html_css_files.append("theme_rtd_colors.css")

        html_theme_options = {
-            'navigation_depth': -1,
+            "navigation_depth": -1,
        }

    except ImportError:
-        html_theme = 'alabaster'
+        html_theme = "alabaster"

 if "DOCS_CSS" in os.environ:
    css = os.environ["DOCS_CSS"].split(" ")
@ -322,22 +376,14 @@ if "DOCS_CSS" in os.environ:
    for l in css:
        html_css_files.append(l)

-if major <= 1 and minor < 8:
-    html_context = {
-        'css_files': [],
-    }
-
-    for l in html_css_files:
-        html_context['css_files'].append('_static/' + l)
-
-if  html_theme == 'alabaster':
+if html_theme == "alabaster":
    html_theme_options = {
-        'description': get_cline_version(),
-        'page_width': '65em',
-        'sidebar_width': '15em',
-        'fixed_sidebar': 'true',
-        'font_size': 'inherit',
-        'font_family': 'serif',
+        "description": get_cline_version(),
+        "page_width": "65em",
+        "sidebar_width": "15em",
+        "fixed_sidebar": "true",
+        "font_size": "inherit",
+        "font_family": "serif",
    }

 sys.stderr.write("Using %s theme\n" % html_theme)
@ -345,109 +391,79 @@ sys.stderr.write("Using %s theme\n" % html_theme)
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['sphinx-static']
+html_static_path = ["sphinx-static"]

 # If true, Docutils "smart quotes" will be used to convert quotes and dashes
 # to typographically correct entities.  However, conversion of "--" to "—"
 # is not always what we want, so enable only quotes.
-smartquotes_action = 'q'
+smartquotes_action = "q"

 # Custom sidebar templates, maps document names to template names.
 # Note that the RTD theme ignores this
-html_sidebars = { '**': ['searchbox.html', 'kernel-toc.html', 'sourcelink.html']}
+html_sidebars = {"**": ["searchbox.html",
+                        "kernel-toc.html",
+                        "sourcelink.html"]}

 # about.html is available for alabaster theme. Add it at the front.
-if html_theme == 'alabaster':
-    html_sidebars['**'].insert(0, 'about.html')
+if html_theme == "alabaster":
+    html_sidebars["**"].insert(0, "about.html")

 # The name of an image file (relative to this directory) to place at the top
 # of the sidebar.
-html_logo = 'images/logo.svg'
+html_logo = "images/logo.svg"

 # Output file base name for HTML help builder.
-htmlhelp_basename = 'TheLinuxKerneldoc'
+htmlhelp_basename = "TheLinuxKerneldoc"

 # -- Options for LaTeX output ---------------------------------------------

 latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
-    'papersize': 'a4paper',
-
+    "papersize": "a4paper",
    # The font size ('10pt', '11pt' or '12pt').
-    'pointsize': '11pt',
-
+    "pointsize": "11pt",
    # Latex figure (float) alignment
-    #'figure_align': 'htbp',
-
+    # 'figure_align': 'htbp',
    # Don't mangle with UTF-8 chars
-    'inputenc': '',
-    'utf8extra': '',
-
+    "inputenc": "",
+    "utf8extra": "",
    # Set document margins
-    'sphinxsetup': '''
+    "sphinxsetup": """
        hmargin=0.5in, vmargin=1in,
        parsedliteralwraps=true,
        verbatimhintsturnover=false,
-    ''',
-
+    """,
    #
    # Some of our authors are fond of deep nesting; tell latex to
    # cope.
    #
-    'maxlistdepth': '10',
-
+    "maxlistdepth": "10",
    # For CJK One-half spacing, need to be in front of hyperref
-    'extrapackages': r'\usepackage{setspace}',
-
+    "extrapackages": r"\usepackage{setspace}",
    # Additional stuff for the LaTeX preamble.
-    'preamble': '''
+    "preamble": """
        % Use some font with UTF-8 support with XeLaTeX
        \\usepackage{fontspec}
        \\setsansfont{DejaVu Sans}
        \\setromanfont{DejaVu Serif}
        \\setmonofont{DejaVu Sans Mono}
-    ''',
+    """,
 }

-# Fix reference escape troubles with Sphinx 1.4.x
-if major == 1:
-    latex_elements['preamble']  += '\\renewcommand*{\\DUrole}[2]{ #2 }\n'
-
-
 # Load kerneldoc specific LaTeX settings
-latex_elements['preamble'] += '''
+latex_elements["preamble"] += """
        % Load kerneldoc specific LaTeX settings
-	\\input{kerneldoc-preamble.sty}
-'''
-
-# With Sphinx 1.6, it is possible to change the Bg color directly
-# by using:
-#	\definecolor{sphinxnoteBgColor}{RGB}{204,255,255}
-#	\definecolor{sphinxwarningBgColor}{RGB}{255,204,204}
-#	\definecolor{sphinxattentionBgColor}{RGB}{255,255,204}
-#	\definecolor{sphinximportantBgColor}{RGB}{192,255,204}
-#
-# However, it require to use sphinx heavy box with:
-#
-#	\renewenvironment{sphinxlightbox} {%
-#		\\begin{sphinxheavybox}
-#	}
-#		\\end{sphinxheavybox}
-#	}
-#
-# Unfortunately, the implementation is buggy: if a note is inside a
-# table, it isn't displayed well. So, for now, let's use boring
-# black and white notes.
+        \\input{kerneldoc-preamble.sty}
+"""

 # Grouping the document tree into LaTeX files. List of tuples
 # (source start file, target name, title,
 #  author, documentclass [howto, manual, or own class]).
 # Sorted in alphabetical order
-latex_documents = [
-]
+latex_documents = []

 # Add all other index files from Documentation/ subdirectories
-for fn in os.listdir('.'):
+for fn in os.listdir("."):
    doc = os.path.join(fn, "index")
    if os.path.exists(doc + ".rst"):
        has = False
@ -456,34 +472,39 @@ for fn in os.listdir('.'):
                has = True
                break
        if not has:
-            latex_documents.append((doc, fn + '.tex',
-                                    'Linux %s Documentation' % fn.capitalize(),
-                                    'The kernel development community',
-                                    'manual'))
+            latex_documents.append(
+                (
+                    doc,
+                    fn + ".tex",
+                    "Linux %s Documentation" % fn.capitalize(),
+                    "The kernel development community",
+                    "manual",
+                )
+            )

 # The name of an image file (relative to this directory) to place at the top of
 # the title page.
-#latex_logo = None
+# latex_logo = None

 # For "manual" documents, if this is true, then toplevel headings are parts,
 # not chapters.
-#latex_use_parts = False
+# latex_use_parts = False

 # If true, show page references after internal links.
-#latex_show_pagerefs = False
+# latex_show_pagerefs = False

 # If true, show URL addresses after external links.
-#latex_show_urls = False
+# latex_show_urls = False

 # Documents to append as an appendix to all manuals.
-#latex_appendices = []
+# latex_appendices = []

 # If false, no module index is generated.
-#latex_domain_indices = True
+# latex_domain_indices = True

 # Additional LaTeX stuff to be copied to build directory
 latex_additional_files = [
-    'sphinx/kerneldoc-preamble.sty',
+    "sphinx/kerneldoc-preamble.sty",
 ]


@ -492,12 +513,11 @@ latex_additional_files = [
 # One entry per manual page. List of tuples
 # (source start file, name, description, authors, manual section).
 man_pages = [
-    (master_doc, 'thelinuxkernel', 'The Linux Kernel Documentation',
-     [author], 1)
+    (master_doc, "thelinuxkernel", "The Linux Kernel Documentation", [author], 1)
 ]

 # If true, show URL addresses after external links.
-#man_show_urls = False
+# man_show_urls = False


 # -- Options for Texinfo output -------------------------------------------
@ -505,11 +525,15 @@ man_pages = [
 # Grouping the document tree into Texinfo files. List of tuples
 # (source start file, target name, title, author,
 #  dir menu entry, description, category)
-texinfo_documents = [
-    (master_doc, 'TheLinuxKernel', 'The Linux Kernel Documentation',
-     author, 'TheLinuxKernel', 'One line description of project.',
-     'Miscellaneous'),
-]
+texinfo_documents = [
+    (
+        master_doc,
+        "TheLinuxKernel",
+        "The Linux Kernel Documentation",
+        author,
+        "TheLinuxKernel",
+        "One line description of project.",
+        "Miscellaneous",
+    ),
+]

 # -- Options for Epub output ----------------------------------------------

@ -520,9 +544,9 @@ epub_publisher = author
 epub_copyright = copyright

 # A list of files that should not be packed into the epub file.
-epub_exclude_files = ['search.html']
+epub_exclude_files = ["search.html"]

-#=======
+# =======
 # rst2pdf
 #
 # Grouping the document tree into PDF files. List of tuples
@ -534,17 +558,23 @@ epub_exclude_files = ['search.html']
 # multiple PDF files here actually tries to get the cross-referencing right
 # *between* PDF files.
 pdf_documents = [
-    ('kernel-documentation', u'Kernel', u'Kernel', u'J. Random Bozo'),
+    ("kernel-documentation", "Kernel", "Kernel", "J. Random Bozo"),
 ]

 # kernel-doc extension configuration for running Sphinx directly (e.g. by Read
 # the Docs). In a normal build, these are supplied from the Makefile via command
 # line arguments.
-kerneldoc_bin = '../scripts/kernel-doc'
-kerneldoc_srctree = '..'
+kerneldoc_bin = "../scripts/kernel-doc"
+kerneldoc_srctree = ".."

 # ------------------------------------------------------------------------------
 # Since loadConfig overwrites settings from the global namespace, it has to be
 # the last statement in the conf.py file
 # ------------------------------------------------------------------------------
 loadConfig(globals())
+
+
+def setup(app):
+    """Patterns need to be updated at init time on older Sphinx versions"""
+
+    app.connect('config-inited', update_patterns)
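The path arithmetic in ``update_patterns()`` can be tried outside Sphinx. The sketch below applies the same ``os.path.relpath()`` call and ``"../"`` filter to plain strings. ``rebase_patterns()`` is a hypothetical helper. The ``/k/Documentation`` POSIX paths are made-up stand-ins for ``doctree`` and ``app.srcdir``.

```python
import os


def rebase_patterns(doctree, srcdir, patterns):
    """Rebase patterns written relative to doctree onto srcdir.

    Mirrors the loop in update_patterns(): a pattern whose relative
    path starts with "../" points outside srcdir, where Sphinx never
    reads files, so it is dropped.
    """
    rebased = []
    for p in patterns:
        full = os.path.join(doctree, p)
        rel_path = os.path.relpath(full, start=srcdir)
        if rel_path.startswith("../"):
            continue
        rebased.append(rel_path)
    return rebased


# Whole tree as SOURCEDIR: patterns come back unchanged.
print(rebase_patterns("/k/Documentation", "/k/Documentation",
                      ["netlink/specs/*.yaml", "output"]))
# -> ['netlink/specs/*.yaml', 'output']

# Only the netlink subdirectory as SOURCEDIR: "output" falls outside it.
print(rebase_patterns("/k/Documentation", "/k/Documentation/netlink",
                      ["netlink/specs/*.yaml", "output"]))
# -> ['specs/*.yaml']
```

Rebasing at ``config-inited`` time, rather than at module level, means ``app.srcdir`` is already known when the patterns are computed.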
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@ -530,6 +530,77 @@ routines, e.g.:::
 		....
 	}

+Part Ie - IOVA-based DMA mappings
+---------------------------------
+
+These APIs allow very efficient mappings when an IOMMU is in use.  They are an
+optional path that requires extra code and are only recommended for drivers
+where DMA mapping performance, or the space used to store the DMA addresses,
+matters.  All the considerations from the previous section apply here as well.
+
+::
+
+    bool dma_iova_try_alloc(struct device *dev, struct dma_iova_state *state,
+		phys_addr_t phys, size_t size);
+
+Tries to allocate IOVA space for a mapping operation.  If it returns false,
+this API can't be used for the given device and the normal streaming DMA
+mapping API should be used instead.  The ``struct dma_iova_state`` is
+allocated by the driver and must be kept around until unmap time.
+
+::
+
+    static inline bool dma_use_iova(struct dma_iova_state *state)
+
+Can be used by the driver, after a call to ``dma_iova_try_alloc()``, to check
+whether the IOVA-based API is in use.  This can be useful in the unmap path.
+
+::
+
+    int dma_iova_link(struct device *dev, struct dma_iova_state *state,
+		phys_addr_t phys, size_t offset, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
+
+Links physical ranges into the previously allocated IOVA space.  For a given
+state, the start of every range except the one passed to the first
+``dma_iova_link()`` call must be aligned to the DMA merge boundary returned by
+``dma_get_merge_boundary()``, and the size of every range except the last
+must be aligned to the DMA merge boundary as well.
+
+::
+
+    int dma_iova_sync(struct device *dev, struct dma_iova_state *state,
+		size_t offset, size_t size);
+
+Must be called to sync the IOMMU page tables for the IOVA range mapped by one
+or more calls to ``dma_iova_link()``.
+
+For drivers that use a one-shot mapping, all ranges can be unmapped and the
+IOVA freed by calling:
+
+::
+
+    void dma_iova_destroy(struct device *dev, struct dma_iova_state *state,
+		size_t mapped_len, enum dma_data_direction dir,
+		unsigned long attrs);
+
+Alternatively, drivers can dynamically manage the IOVA space by unmapping
+and mapping individual regions.  In that case,
+
+::
+
+    void dma_iova_unlink(struct device *dev, struct dma_iova_state *state,
+		size_t offset, size_t size, enum dma_data_direction dir,
+		unsigned long attrs);
+
+is used to unmap a range previously mapped, and
+
+::
+
+    void dma_iova_free(struct device *dev, struct dma_iova_state *state);
+
+is used to free the IOVA space.  All regions must have been unmapped using
+``dma_iova_unlink()`` before calling ``dma_iova_free()``.

 Part II - Non-coherent DMA allocations
 --------------------------------------
@ -745,7 +816,7 @@ example warning message may look like this::
 	[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
 	[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
 	[<ffffffff803c7ea3>] check_unmap+0x203/0x490
-	[<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
+	[<ffffffff803c8259>] debug_dma_unmap_phys+0x49/0x50
 	[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
 	[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
 	[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
@ -839,7 +910,7 @@ that a driver may be leaking mappings.
 dma-debug interface debug_dma_mapping_error() to debug drivers that fail
 to check DMA mapping errors on addresses returned by dma_map_single() and
 dma_map_page() interfaces. This interface clears a flag set by
-debug_dma_map_page() to indicate that dma_mapping_error() has been called by
+debug_dma_map_phys() to indicate that dma_mapping_error() has been called by
 the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
 this flag is still set, prints warning message that includes call trace that
 leads up to the unmap. This interface can be called from dma_mapping_error()
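The alignment rule for ``dma_iova_link()`` in the dma-api.rst hunk above can be written as a small check. This is a userspace sketch, not kernel code, and it follows the wording of that text literally. ``struct iova_link_range`` and ``iova_links_valid()`` are hypothetical. The sketch assumes the merge boundary is passed as a mask of the form ``granule - 1`` (for example ``0xfff`` for a 4 KiB granule).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one dma_iova_link() call. */
struct iova_link_range {
	uint64_t phys;	/* physical start address passed to the call */
	size_t size;	/* length in bytes */
};

/*
 * Returns true if the ranges, linked in array order into one
 * dma_iova_state, obey the documented merge-boundary rule.
 * boundary_mask is assumed to be of the form (granule - 1).
 */
static bool iova_links_valid(const struct iova_link_range *r, size_t n,
			     uint64_t boundary_mask)
{
	for (size_t i = 0; i < n; i++) {
		/* Every range except the first must start aligned. */
		if (i > 0 && (r[i].phys & boundary_mask))
			return false;
		/* Every range except the last must have an aligned size. */
		if (i + 1 < n && (r[i].size & boundary_mask))
			return false;
	}
	return true;
}
```

Only the first range may start mid-granule, and only the last may end mid-granule. This lets all the ranges pack into one contiguous IOVA region.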
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@ -130,3 +130,21 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
 subsystem that the buffer is fully accessible at the elevated privilege
 level (and ideally inaccessible or at least read-only at the
 lesser-privileged levels).
+
+DMA_ATTR_MMIO
+-------------
+
+This attribute indicates that the physical address is not normal system
+memory. It must not be used with the kmap*()/phys_to_virt()/phys_to_page()
+functions; it may not be cacheable, and access using CPU load/store
+instructions may not be allowed.
+
+Usually this will be used to describe MMIO addresses, or other non-cacheable
+register addresses. When DMA mapping this sort of address, we call the
+operation peer-to-peer, as one device is DMA'ing to another device.
+For PCI devices, the p2pdma APIs must be used to determine if
+DMA_ATTR_MMIO is appropriate.
+
+For architectures that require cache flushing for DMA coherence,
+DMA_ATTR_MMIO will not perform any cache flushing. The address
+provided must never be mapped as cacheable by the CPU.
--- a/Documentation/core-api/genericirq.rst
+++ b/Documentation/core-api/genericirq.rst
@ -410,8 +410,6 @@ which are used in the generic IRQ layer.
 .. kernel-doc:: include/linux/interrupt.h
   :internal:

-.. kernel-doc:: include/linux/irqdomain.h
-
 Public Functions Provided
 =========================

--- a/Documentation/core-api/irq/concepts.rst
+++ b/Documentation/core-api/irq/concepts.rst
@ -2,23 +2,24 @@
 What is an IRQ?
 ===============

-An IRQ is an interrupt request from a device.
-Currently they can come in over a pin, or over a packet.
-Several devices may be connected to the same pin thus
-sharing an IRQ.
+An IRQ is an interrupt request from a device. Currently, they can come
+in over a pin or over a packet. Several devices may be connected to
+the same pin, thus sharing an IRQ. This is the case on the legacy PCI
+bus, where all devices typically share 4 lanes/pins. Note that each
+device can request an interrupt on each of the lanes.

 An IRQ number is a kernel identifier used to talk about a hardware
-interrupt source.  Typically this is an index into the global irq_desc
-array, but except for what linux/interrupt.h implements the details
-are architecture specific.
+interrupt source. Typically, this is an index into the global irq_desc
+array or the sparse_irqs tree. Apart from what linux/interrupt.h
+implements, the details are architecture-specific.

 An IRQ number is an enumeration of the possible interrupt sources on a
-machine.  Typically what is enumerated is the number of input pins on
-all of the interrupt controller in the system.  In the case of ISA
-what is enumerated are the 16 input pins on the two i8259 interrupt
-controllers.
+machine. Typically, what is enumerated is the number of input pins on
+all of the interrupt controllers in the system. In the case of ISA,
+what is enumerated are the 8 input pins on each of the two i8259
+interrupt controllers.

 Architectures can assign additional meaning to the IRQ numbers, and
+are encouraged to do so where there is any manual configuration
-of the hardware involved.  The ISA IRQs are a classic example of
+are encouraged to in the case where there is any manual configuration
+of the hardware involved. The ISA IRQs are a classic example of
 assigning this kind of additional meaning.
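The ISA numbering described in concepts.rst can be made concrete with a minimal userspace sketch. It is not kernel code. It shows how the 16 legacy ISA IRQ numbers enumerate the 8 input pins on each of the two i8259 controllers. It assumes the classic PC/AT layout, with IRQs 0-7 on the primary controller and IRQs 8-15 on the secondary one. The macro and helper names are hypothetical and are not part of any kernel API.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only (not a kernel API).
 * Two i8259 controllers with 8 input pins each give 2 x 8 = 16
 * legacy ISA IRQ numbers. */
#define I8259_NR_CONTROLLERS 2
#define I8259_PINS_PER_CHIP  8
#define ISA_NR_IRQS (I8259_NR_CONTROLLERS * I8259_PINS_PER_CHIP)

/* Controller an ISA IRQ arrives on: 0 = primary, 1 = secondary
 * (assumes the PC/AT layout: IRQs 0-7 primary, 8-15 secondary). */
static int isa_irq_controller(int irq)
{
	assert(irq >= 0 && irq < ISA_NR_IRQS);
	return irq / I8259_PINS_PER_CHIP;
}

/* Input pin (0-7) on that controller the IRQ is wired to. */
static int isa_irq_pin(int irq)
{
	assert(irq >= 0 && irq < ISA_NR_IRQS);
	return irq % I8259_PINS_PER_CHIP;
}

/* Inverse mapping: IRQ number for a given controller and pin. */
static int isa_irq_from_input(int controller, int pin)
{
	assert(controller >= 0 && controller < I8259_NR_CONTROLLERS);
	assert(pin >= 0 && pin < I8259_PINS_PER_CHIP);
	return controller * I8259_PINS_PER_CHIP + pin;
}
```

For example, IRQ 14 resolves to pin 6 of the secondary controller, and isa_irq_from_input(1, 6) gives back 14.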
--- a/Documentation/core-api/irq/irq-domain.rst
+++ b/Documentation/core-api/irq/irq-domain.rst
@ -1,59 +1,77 @@
 ===============================================
-The irq_domain interrupt number mapping library
+The irq_domain Interrupt Number Mapping Library
 ===============================================

 The current design of the Linux kernel uses a single large number
-space where each separate IRQ source is assigned a different number.
-This is simple when there is only one interrupt controller, but in
-systems with multiple interrupt controllers the kernel must ensure
+space where each separate IRQ source is assigned a unique number.
+This is simple when there is only one interrupt controller. But in
+systems with multiple interrupt controllers, the kernel must ensure
 that each one gets assigned non-overlapping allocations of Linux
 IRQ numbers.

 The number of interrupt controllers registered as unique irqchips
-show a rising tendency: for example subdrivers of different kinds
+shows a rising tendency. For example, subdrivers of different kinds
 such as GPIO controllers avoid reimplementing identical callback
 mechanisms as the IRQ core system by modelling their interrupt
-handlers as irqchips, i.e. in effect cascading interrupt controllers.
+handlers as irqchips, i.e. in effect cascading interrupt controllers.

-Here the interrupt number loose all kind of correspondence to
-hardware interrupt numbers: whereas in the past, IRQ numbers could
-be chosen so they matched the hardware IRQ line into the root
-interrupt controller (i.e. the component actually fireing the
-interrupt line to the CPU) nowadays this number is just a number.
+In the past, IRQ numbers could be chosen so that they matched the
+hardware IRQ line into the root interrupt controller (i.e. the
+component actually firing the interrupt line to the CPU). Nowadays,
+this number is just a number and has lost all correspondence to
+hardware interrupt numbers.

-For this reason we need a mechanism to separate controller-local
-interrupt numbers, called hardware irq's, from Linux IRQ numbers.
+For this reason, we need a mechanism to separate controller-local
+interrupt numbers, called hardware IRQs, from Linux IRQ numbers.

 The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of
-irq numbers, but they don't provide any support for reverse mapping of
+IRQ numbers, but they don't provide any support for reverse mapping of
 the controller-local IRQ (hwirq) number into the Linux IRQ number
 space.

-The irq_domain library adds mapping between hwirq and IRQ numbers on
-top of the irq_alloc_desc*() API.  An irq_domain to manage mapping is
-preferred over interrupt controller drivers open coding their own
+The irq_domain library adds a mapping between hwirq and IRQ numbers on
+top of the irq_alloc_desc*() API. An irq_domain to manage the mapping
+is preferred over interrupt controller drivers open coding their own
 reverse mapping scheme.

-irq_domain also implements translation from an abstract irq_fwspec
-structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
-be easily extended to support other IRQ topology data sources.
+irq_domain also implements a translation from an abstract struct
+irq_fwspec to hwirq numbers (Device Tree, non-DT firmware node, ACPI
+GSI, and software node so far), and can be easily extended to support
+other IRQ topology data sources. The implementation is performed
+without any extra platform support code.

-irq_domain usage
+irq_domain Usage
 ================
+struct irq_domain could be defined as an irq domain controller. That
+is, it handles the mapping between hardware and virtual interrupt
+numbers for a given interrupt domain. The domain structure is
+generally created by the PIC code for a given PIC instance (though a
+domain can cover more than one PIC if they have a flat number model).
+It is the domain callbacks that are responsible for setting the
+irq_chip on a given irq_desc after it has been mapped.

-An interrupt controller driver creates and registers an irq_domain by
-calling one of the irq_domain_add_*() or irq_domain_create_*() functions
-(each mapping method has a different allocator function, more on that later).
-The function will return a pointer to the irq_domain on success. The caller
-must provide the allocator function with an irq_domain_ops structure.
+The host code and data structures use a fwnode_handle pointer to
+identify the domain. In some cases, and in order to preserve source
+code compatibility, this fwnode pointer is "upgraded" to a DT
+device_node. For those firmware infrastructures that do not provide a
+unique identifier for an interrupt controller, the irq_domain code
+offers a fwnode allocator.
+
+An interrupt controller driver creates and registers a struct irq_domain
+by calling one of the irq_domain_create_*() functions (each mapping
+method has a different allocator function, more on that later). The
+function will return a pointer to the struct irq_domain on success. The
+caller must provide the allocator function with a struct irq_domain_ops
+pointer.

 In most cases, the irq_domain will begin empty without any mappings
 between hwirq and IRQ numbers.  Mappings are added to the irq_domain
 by calling irq_create_mapping() which accepts the irq_domain and a
-hwirq number as arguments.  If a mapping for the hwirq doesn't already
-exist then it will allocate a new Linux irq_desc, associate it with
-the hwirq, and call the .map() callback so the driver can perform any
-required hardware setup.
+hwirq number as arguments. If a mapping for the hwirq doesn't already
+exist, irq_create_mapping() allocates a new Linux irq_desc, associates
+it with the hwirq, and calls the :c:member:`irq_domain_ops.map()`
+callback. In there, the driver can perform any required hardware
+setup.

 Once a mapping has been established, it can be retrieved or used via a
 variety of methods:
@ -63,8 +81,6 @@ variety of methods:
  mapping.
 - irq_find_mapping() returns a Linux IRQ number for a given domain and
  hwirq number, and 0 if there was no mapping
- irq_linear_revmap() is now identical to irq_find_mapping(), and is
-  deprecated
 - generic_handle_domain_irq() handles an interrupt described by a
  domain and a hwirq number

@ -77,9 +93,10 @@ be allocated.

 If the driver has the Linux IRQ number or the irq_data pointer, and
 needs to know the associated hwirq number (such as in the irq_chip
-callbacks) then it can be directly obtained from irq_data->hwirq.
+callbacks) then it can be directly obtained from
+:c:member:`irq_data.hwirq`.

-Types of irq_domain mappings
+Types of irq_domain Mappings
 ============================

 There are several mechanisms available for reverse mapping from hwirq
@ -92,7 +109,6 @@ Linear

 ::

-	irq_domain_add_linear()
 	irq_domain_create_linear()

 The linear reverse map maintains a fixed size table indexed by the
@ -105,19 +121,13 @@ map are fixed time lookup for IRQ numbers, and irq_descs are only
 allocated for in-use IRQs.  The disadvantage is that the table must be
 as large as the largest possible hwirq number.

-irq_domain_add_linear() and irq_domain_create_linear() are functionally
-equivalent, except for the first argument is different - the former
-accepts an Open Firmware specific 'struct device_node', while the latter
-accepts a more general abstraction 'struct fwnode_handle'.
-
-The majority of drivers should use the linear map.
+The majority of drivers should use the Linear map.

 Tree
 ----

 ::

-	irq_domain_add_tree()
 	irq_domain_create_tree()

 The irq_domain maintains a radix tree map from hwirq numbers to Linux
@ -129,11 +139,6 @@ since it doesn't need to allocate a table as large as the largest
 hwirq number.  The disadvantage is that hwirq to IRQ number lookup is
 dependent on how many entries are in the table.

-irq_domain_add_tree() and irq_domain_create_tree() are functionally
-equivalent, except for the first argument is different - the former
-accepts an Open Firmware specific 'struct device_node', while the latter
-accepts a more general abstraction 'struct fwnode_handle'.
-
 Very few drivers should need this mapping.

 No Map
@ -141,7 +146,7 @@ No Map

 ::

-	irq_domain_add_nomap()
+	irq_domain_create_nomap()

 The No Map mapping is to be used when the hwirq number is
 programmable in the hardware.  In this case it is best to program the
@ -159,8 +164,6 @@ Legacy

 ::

-	irq_domain_add_simple()
-	irq_domain_add_legacy()
 	irq_domain_create_simple()
 	irq_domain_create_legacy()

@ -189,13 +192,13 @@ supported.  For example, ISA controllers would use the legacy map for
 mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ
 numbers.

-Most users of legacy mappings should use irq_domain_add_simple() or
-irq_domain_create_simple() which will use a legacy domain only if an IRQ range
-is supplied by the system and will otherwise use a linear domain mapping.
-The semantics of this call are such that if an IRQ range is specified then
-descriptors will be allocated on-the-fly for it, and if no range is
-specified it will fall through to irq_domain_add_linear() or
-irq_domain_create_linear() which means *no* irq descriptors will be allocated.
+Most users of legacy mappings should use irq_domain_create_simple()
+which will use a legacy domain only if an IRQ range is supplied by the
+system and will otherwise use a linear domain mapping. The semantics of
+this call are such that if an IRQ range is specified then descriptors
+will be allocated on-the-fly for it, and if no range is specified it
+will fall through to irq_domain_create_linear() which means *no* irq
+descriptors will be allocated.

 A typical use case for simple domains is where an irqchip provider
 is supporting both dynamic and static IRQ assignments.
@ -206,13 +209,7 @@ that the driver using the simple domain call irq_create_mapping()
 before any irq_find_mapping() since the latter will actually work
 for the static IRQ assignment case.

-irq_domain_add_simple() and irq_domain_create_simple() as well as
-irq_domain_add_legacy() and irq_domain_create_legacy() are functionally
-equivalent, except for the first argument is different - the former
-accepts an Open Firmware specific 'struct device_node', while the latter
-accepts a more general abstraction 'struct fwnode_handle'.
-
-Hierarchy IRQ domain
+Hierarchy IRQ Domain
 --------------------

 On some architectures, there may be multiple interrupt controllers
@ -253,20 +250,40 @@ There are four major interfaces to use hierarchy irq_domain:
 4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
   to stop delivering the interrupt.

-Following changes are needed to support hierarchy irq_domain:
+The following is needed to support hierarchy irq_domain:

-1) a new field 'parent' is added to struct irq_domain; it's used to
+1) The :c:member:`parent` field in struct irq_domain is used to
   maintain irq_domain hierarchy information.
-2) a new field 'parent_data' is added to struct irq_data; it's used to
-   build hierarchy irq_data to match hierarchy irq_domains. The irq_data
-   is used to store irq_domain pointer and hardware irq number.
-3) new callbacks are added to struct irq_domain_ops to support hierarchy
-   irq_domain operations.
+2) The :c:member:`parent_data` field in struct irq_data is used to
+   build hierarchy irq_data to match hierarchy irq_domains. The
+   irq_data is used to store the irq_domain pointer and the hardware
+   irq number.
+3) The :c:member:`alloc()`, :c:member:`free()`, and other callbacks in
+   struct irq_domain_ops support hierarchy irq_domain operations.

-With support of hierarchy irq_domain and hierarchy irq_data ready, an
-irq_domain structure is built for each interrupt controller, and an
+With the support of hierarchy irq_domain and hierarchy irq_data ready,
+an irq_domain structure is built for each interrupt controller, and an
 irq_data structure is allocated for each irq_domain associated with an
-IRQ. Now we could go one step further to support stacked(hierarchy)
+IRQ.
+
+For an interrupt controller driver to support hierarchy irq_domain, it
+needs to:
+
+1) Implement irq_domain_ops.alloc() and irq_domain_ops.free().
+2) Optionally, implement irq_domain_ops.activate() and
+   irq_domain_ops.deactivate().
+3) Optionally, implement an irq_chip to manage the interrupt controller
+   hardware.
+4) There is no need to implement irq_domain_ops.map() and
+   irq_domain_ops.unmap(). They are unused with hierarchy irq_domain.
+
+Note that the hierarchy irq_domain is in no way x86-specific, and is
+heavily used to support other architectures, such as ARM, ARM64, etc.
+
+Stacked irq_chip
+~~~~~~~~~~~~~~~~
+
+Now, we could go one step further to support stacked (hierarchy)
 irq_chip. That is, an irq_chip is associated with each irq_data along
 the hierarchy. A child irq_chip may implement a required action by
 itself or by cooperating with its parent irq_chip.
@ -276,22 +293,28 @@ with the hardware managed by itself and may ask for services from its
 parent irq_chip when needed. So we could achieve a much cleaner
 software architecture.

-For an interrupt controller driver to support hierarchy irq_domain, it
-needs to:
-
-1) Implement irq_domain_ops.alloc and irq_domain_ops.free
-2) Optionally implement irq_domain_ops.activate and
-   irq_domain_ops.deactivate.
-3) Optionally implement an irq_chip to manage the interrupt controller
-   hardware.
-4) No need to implement irq_domain_ops.map and irq_domain_ops.unmap,
-   they are unused with hierarchy irq_domain.
-
-Hierarchy irq_domain is in no way x86 specific, and is heavily used to
-support other architectures, such as ARM, ARM64 etc.
-
 Debugging
 =========

 Most of the internals of the IRQ subsystem are exposed in debugfs by
 turning CONFIG_GENERIC_IRQ_DEBUGFS on.
+
+Structures and Public Functions Provided
+========================================
+
+This chapter contains the autogenerated documentation of the structures
+and exported kernel API functions which are used for IRQ domains.
+
+.. kernel-doc:: include/linux/irqdomain.h
+
+.. kernel-doc:: kernel/irq/irqdomain.c
+   :export:
+
+Internal Functions Provided
+===========================
+
+This chapter contains the autogenerated documentation of the internal
+functions.
+
+.. kernel-doc:: kernel/irq/irqdomain.c
+   :internal:
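The Linear mapping and the irq_create_mapping()/irq_find_mapping() semantics described in irq-domain.rst can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel implementation:

- every sketch_* name is hypothetical;
- a global counter stands in for irq_alloc_desc*();
- Linux IRQ number 0 means "no mapping", as documented for irq_find_mapping().

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only: none of these names exist in the kernel. */
typedef unsigned long hwirq_t;

struct sketch_domain;
/* Driver hook, playing the role of irq_domain_ops.map(). */
typedef int (*sketch_map_fn)(struct sketch_domain *d, unsigned int virq,
			     hwirq_t hw);

struct sketch_domain {
	unsigned int size;    /* largest possible hwirq + 1 */
	unsigned int *revmap; /* revmap[hwirq] = Linux IRQ number, 0 if none */
	sketch_map_fn map;
};

/* Stand-in for irq_alloc_desc*(): one global counter shared by all
 * domains, so Linux IRQ numbers never overlap between controllers. */
static unsigned int next_virq = 1;

static struct sketch_domain *sketch_domain_create_linear(unsigned int size,
							 sketch_map_fn map)
{
	struct sketch_domain *d;

	assert(size > 0);
	d = malloc(sizeof(*d));
	if (!d)
		return NULL;
	/* The table is sized for the largest hwirq and starts empty. */
	d->revmap = calloc(size, sizeof(*d->revmap));
	if (!d->revmap) {
		free(d);
		return NULL;
	}
	d->size = size;
	d->map = map;
	return d;
}

/* Fixed-time lookup: a single array index. Returns 0 if unmapped. */
static unsigned int sketch_find_mapping(const struct sketch_domain *d,
					hwirq_t hw)
{
	return hw < d->size ? d->revmap[hw] : 0;
}

/* Reuse an existing mapping; otherwise take a new number, let the
 * driver's map() hook do hardware setup, then record it. 0 = failure. */
static unsigned int sketch_create_mapping(struct sketch_domain *d, hwirq_t hw)
{
	unsigned int virq;

	if (hw >= d->size)
		return 0;
	virq = d->revmap[hw];
	if (virq)
		return virq;
	virq = next_virq;
	if (d->map && d->map(d, virq, hw) != 0)
		return 0; /* number not consumed on failure */
	next_virq++;
	d->revmap[hw] = virq;
	return virq;
}

static void sketch_domain_destroy(struct sketch_domain *d)
{
	free(d->revmap);
	free(d);
}
```

The sketch shows the trade-off described in the Linear section. Lookup is a single array index, but the table must be allocated up front for the largest possible hwirq. The counter is shared, so two domains that both map hwirq 5 receive different Linux IRQ numbers.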
--- a/Documentation/core-api/symbol-namespaces.rst
+++ b/Documentation/core-api/symbol-namespaces.rst
@ -28,6 +28,9 @@ kernel. As of today, modules that make use of symbols exported into namespaces,
 are required to import the namespace. Otherwise the kernel will, depending on
 its configuration, reject loading the module or warn about a missing import.

+Additionally, it is possible to put symbols into a module namespace, strictly
+limiting which modules are allowed to use these symbols.
+
 2. How to define Symbol Namespaces
 ==================================

@ -84,6 +87,22 @@ unit as preprocessor statement. The above example would then read::
 within the corresponding compilation unit before any EXPORT_SYMBOL macro is
 used.

+2.3 Using the EXPORT_SYMBOL_GPL_FOR_MODULES() macro
+===================================================
+
+Symbols exported using this macro are put into a module namespace. This
+namespace cannot be imported.
+
+The macro takes a comma-separated list of module names, allowing only those
+modules to access this symbol. Simple tail-globs are supported.
+
+For example::
+
+  EXPORT_SYMBOL_GPL_FOR_MODULES(preempt_notifier_inc, "kvm,kvm-*")
+
+will limit usage of this symbol to modules whose names match the given
+patterns.
+
 3. How to use Symbols exported in Namespaces
 ============================================

@ -155,3 +174,6 @@ in-tree modules::
 You can also run nsdeps for external module builds. A typical usage is::

 	$ make -C <path_to_kernel_src> M=$PWD nsdeps
+
+Note: it will happily generate an import statement for the module namespace,
+which will not work and causes build and runtime failures.
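The tail-glob rule for EXPORT_SYMBOL_GPL_FOR_MODULES() in section 2.3 can be illustrated with a small matcher. This is a hypothetical sketch of the documented behaviour only:

- a pattern ending in '*' matches any module name that starts with the text before the '*';
- any other pattern must match the whole name exactly.

The function name is made up. The kernel's own matching code is not shown here and may differ in details.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch, not the kernel's implementation: is @modname
 * allowed by a comma-separated pattern list such as "kvm,kvm-*"? */
static bool module_allowed(const char *patterns, const char *modname)
{
	size_t modlen;

	assert(patterns != NULL && modname != NULL);
	modlen = strlen(modname);

	while (*patterns) {
		const char *comma = strchr(patterns, ',');
		size_t len = comma ? (size_t)(comma - patterns) : strlen(patterns);
		bool glob = len > 0 && patterns[len - 1] == '*';

		if (glob)
			len--; /* only the text before '*' must match */

		/* Tail-glob: prefix match; otherwise: exact match. */
		if ((glob ? modlen >= len : modlen == len) &&
		    strncmp(patterns, modname, len) == 0)
			return true;

		if (!comma)
			break;
		patterns = comma + 1; /* move past the comma */
	}
	return false;
}
```

With the list "kvm,kvm-*" from the example, "kvm" and "kvm-intel" are accepted, while "kvmgt" is rejected. The name "kvmgt" is neither exactly "kvm" nor prefixed by "kvm-".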