Import of kernel-5.14.0-611.5.1.el9_7
@ -420,6 +420,13 @@ Description:
|
||||
write_zeroes_max_bytes is 0, write zeroes is not supported
|
||||
by the device.
|
||||
|
||||
What: /sys/block/<disk>/queue/iostats_passthrough
|
||||
Date: October 2024
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This file is used to control (on/off) the iostats
|
||||
accounting of the disk for passthrough commands.
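A minimal userspace sketch of toggling such an on/off attribute follows. It assumes the file accepts "1" and "0"; the helper name `set_onoff_attr` is hypothetical and is not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" or "0" to a sysfs-style on/off attribute.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 * Assumption: the attribute accepts "1" (on) and "0" (off).
 */
static int set_onoff_attr(const char *path, int enable)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	/* Write errors on sysfs files often surface only at close time. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass the full attribute path for a concrete disk and typically needs root privileges.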
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/zoned
|
||||
Date: September 2016
|
||||
|
||||
@ -1,7 +1,7 @@
|
||||
What: /sys/bus/mhi/devices/.../serialnumber
|
||||
Date: Sept 2020
|
||||
KernelVersion: 5.10
|
||||
Contact: Bhaumik Bhatt <bbhatt@codeaurora.org>
|
||||
Contact: mhi@lists.linux.dev
|
||||
Description: The file holds the serial number of the client device obtained
|
||||
using a BHI (Boot Host Interface) register read after at least
|
||||
one attempt to power up the device has been done. If read
|
||||
@ -12,7 +12,7 @@ Users: Any userspace application or clients interested in device info.
|
||||
What: /sys/bus/mhi/devices/.../oem_pk_hash
|
||||
Date: Sept 2020
|
||||
KernelVersion: 5.10
|
||||
Contact: Bhaumik Bhatt <bbhatt@codeaurora.org>
|
||||
Contact: mhi@lists.linux.dev
|
||||
Description: The file holds the OEM PK Hash value of the endpoint device
|
||||
obtained using a BHI (Boot Host Interface) register read after
|
||||
at least one attempt to power up the device has been done. If
|
||||
|
||||
9
Documentation/ABI/stable/sysfs-class-bluetooth
Normal file
@ -0,0 +1,9 @@
|
||||
What: /sys/class/bluetooth/hci<index>/reset
|
||||
Date: 14-Jan-2025
|
||||
KernelVersion: 6.13
|
||||
Contact: linux-bluetooth@vger.kernel.org
|
||||
Description: This write-only attribute allows users to trigger the vendor reset
|
||||
method on the Bluetooth device when arbitrary data is written.
|
||||
The reset may or may not be done through the device transport
|
||||
(e.g., UART/USB), and can also be done through an out-of-band
|
||||
approach such as GPIO.
|
||||
@ -14,9 +14,10 @@ Description:
|
||||
event to its internal Informational Event log, updates the
|
||||
Event Status register, and if configured, interrupts the host.
|
||||
It is not an error to inject poison into an address that
|
||||
already has poison present and no error is returned. The
|
||||
inject_poison attribute is only visible for devices supporting
|
||||
the capability.
|
||||
already has poison present and no error is returned. If the
|
||||
device returns 'Inject Poison Limit Reached' an -EBUSY error
|
||||
is returned to the user. The inject_poison attribute is only
|
||||
visible for devices supporting the capability.
|
||||
|
||||
|
||||
What: /sys/kernel/debug/memX/clear_poison
|
||||
|
||||
276
Documentation/ABI/testing/debugfs-intel-iommu
Normal file
@ -0,0 +1,276 @@
|
||||
What: /sys/kernel/debug/iommu/intel/iommu_regset
|
||||
Date: December 2023
|
||||
Contact: Jingqi Liu <Jingqi.liu@intel.com>
|
||||
Description:
|
||||
This file dumps all the register contents for each IOMMU device.
|
||||
|
||||
Example in Kabylake:
|
||||
|
||||
::
|
||||
|
||||
$ sudo cat /sys/kernel/debug/iommu/intel/iommu_regset
|
||||
|
||||
IOMMU: dmar0 Register Base Address: 26be37000
|
||||
|
||||
Name Offset Contents
|
||||
VER 0x00 0x0000000000000010
|
||||
GCMD 0x18 0x0000000000000000
|
||||
GSTS 0x1c 0x00000000c7000000
|
||||
FSTS 0x34 0x0000000000000000
|
||||
FECTL 0x38 0x0000000000000000
|
||||
|
||||
[...]
|
||||
|
||||
IOMMU: dmar1 Register Base Address: fed90000
|
||||
|
||||
Name Offset Contents
|
||||
VER 0x00 0x0000000000000010
|
||||
GCMD 0x18 0x0000000000000000
|
||||
GSTS 0x1c 0x00000000c7000000
|
||||
FSTS 0x34 0x0000000000000000
|
||||
FECTL 0x38 0x0000000000000000
|
||||
|
||||
[...]
|
||||
|
||||
IOMMU: dmar2 Register Base Address: fed91000
|
||||
|
||||
Name Offset Contents
|
||||
VER 0x00 0x0000000000000010
|
||||
GCMD 0x18 0x0000000000000000
|
||||
GSTS 0x1c 0x00000000c7000000
|
||||
FSTS 0x34 0x0000000000000000
|
||||
FECTL 0x38 0x0000000000000000
|
||||
|
||||
[...]
|
||||
|
||||
What: /sys/kernel/debug/iommu/intel/ir_translation_struct
|
||||
Date: December 2023
|
||||
Contact: Jingqi Liu <Jingqi.liu@intel.com>
|
||||
Description:
|
||||
This file dumps the table entries for Interrupt
|
||||
remapping and Interrupt posting.
|
||||
|
||||
Example in Kabylake:
|
||||
|
||||
::
|
||||
|
||||
$ sudo cat /sys/kernel/debug/iommu/intel/ir_translation_struct
|
||||
|
||||
Remapped Interrupt supported on IOMMU: dmar0
|
||||
IR table address:100900000
|
||||
|
||||
Entry SrcID DstID Vct IRTE_high IRTE_low
|
||||
0 00:0a.0 00000080 24 0000000000040050 000000800024000d
|
||||
1 00:0a.0 00000001 ef 0000000000040050 0000000100ef000d
|
||||
|
||||
Remapped Interrupt supported on IOMMU: dmar1
|
||||
IR table address:100300000
|
||||
Entry SrcID DstID Vct IRTE_high IRTE_low
|
||||
0 00:02.0 00000002 26 0000000000040010 000000020026000d
|
||||
|
||||
[...]
|
||||
|
||||
****
|
||||
|
||||
Posted Interrupt supported on IOMMU: dmar0
|
||||
IR table address:100900000
|
||||
Entry SrcID PDA_high PDA_low Vct IRTE_high IRTE_low
|
||||
|
||||
What: /sys/kernel/debug/iommu/intel/dmar_translation_struct
|
||||
Date: December 2023
|
||||
Contact: Jingqi Liu <Jingqi.liu@intel.com>
|
||||
Description:
|
||||
This file dumps Intel IOMMU DMA remapping tables, such
|
||||
as root table, context table, PASID directory and PASID
|
||||
table entries in debugfs. For legacy mode, it doesn't
|
||||
support PASID, and hence PASID field is defaulted to
|
||||
'-1' and other PASID related fields are invalid.
|
||||
|
||||
Example in Kabylake:
|
||||
|
||||
::
|
||||
|
||||
$ sudo cat /sys/kernel/debug/iommu/intel/dmar_translation_struct
|
||||
|
||||
IOMMU dmar1: Root Table Address: 0x103027000
|
||||
B.D.F Root_entry
|
||||
00:02.0 0x0000000000000000:0x000000010303e001
|
||||
|
||||
Context_entry
|
||||
0x0000000000000102:0x000000010303f005
|
||||
|
||||
PASID PASID_table_entry
|
||||
-1 0x0000000000000000:0x0000000000000000:0x0000000000000000
|
||||
|
||||
IOMMU dmar0: Root Table Address: 0x103028000
|
||||
B.D.F Root_entry
|
||||
00:0a.0 0x0000000000000000:0x00000001038a7001
|
||||
|
||||
Context_entry
|
||||
0x0000000000000000:0x0000000103220e7d
|
||||
|
||||
PASID PASID_table_entry
|
||||
0 0x0000000000000000:0x0000000000800002:0x00000001038a5089
|
||||
|
||||
[...]
|
||||
|
||||
What: /sys/kernel/debug/iommu/intel/invalidation_queue
|
||||
Date: December 2023
|
||||
Contact: Jingqi Liu <Jingqi.liu@intel.com>
|
||||
Description:
|
||||
This file exports invalidation queue internals of each
|
||||
IOMMU device.
|
||||
|
||||
Example in Kabylake:
|
||||
|
||||
::
|
||||
|
||||
$ sudo cat /sys/kernel/debug/iommu/intel/invalidation_queue
|
||||
|
||||
Invalidation queue on IOMMU: dmar0
|
||||
Base: 0x10022e000 Head: 20 Tail: 20
|
||||
Index qw0 qw1 qw2
|
||||
0 0000000000000014 0000000000000000 0000000000000000
|
||||
1 0000000200000025 0000000100059c04 0000000000000000
|
||||
2 0000000000000014 0000000000000000 0000000000000000
|
||||
|
||||
qw3 status
|
||||
0000000000000000 0000000000000000
|
||||
0000000000000000 0000000000000000
|
||||
0000000000000000 0000000000000000
|
||||
|
||||
[...]
|
||||
|
||||
Invalidation queue on IOMMU: dmar1
|
||||
Base: 0x10026e000 Head: 32 Tail: 32
|
||||
Index qw0 qw1 status
|
||||
0 0000000000000004 0000000000000000 0000000000000000
|
||||
1 0000000200000025 0000000100059804 0000000000000000
|
||||
2 0000000000000011 0000000000000000 0000000000000000
|
||||
|
||||
[...]
|
||||
|
||||
What: /sys/kernel/debug/iommu/intel/dmar_perf_latency
|
||||
Date: December 2023
|
||||
Contact: Jingqi Liu <Jingqi.liu@intel.com>
|
||||
Description:
|
||||
This file is used to control and show counts of
|
||||
execution time ranges for various types per DMAR.
|
||||
|
||||
Firstly, write a value to
|
||||
/sys/kernel/debug/iommu/intel/dmar_perf_latency
|
||||
to enable sampling.
|
||||
|
||||
The possible values are as follows:
|
||||
|
||||
* 0 - disable sampling all latency data
|
||||
|
||||
* 1 - enable sampling IOTLB invalidation latency data
|
||||
|
||||
* 2 - enable sampling devTLB invalidation latency data
|
||||
|
||||
* 3 - enable sampling intr entry cache invalidation latency data
|
||||
|
||||
Next, reading /sys/kernel/debug/iommu/intel/dmar_perf_latency gives
|
||||
a snapshot of sampling result of all enabled monitors.
|
||||
|
||||
Examples in Kabylake:
|
||||
|
||||
::
|
||||
|
||||
1) Disable sampling all latency data:
|
||||
|
||||
$ echo 0 | sudo tee /sys/kernel/debug/iommu/intel/dmar_perf_latency
|
||||
|
||||
2) Enable sampling IOTLB invalidation latency data
|
||||
|
||||
$ echo 1 | sudo tee /sys/kernel/debug/iommu/intel/dmar_perf_latency
|
||||
|
||||
$ sudo cat /sys/kernel/debug/iommu/intel/dmar_perf_latency
|
||||
|
||||
IOMMU: dmar0 Register Base Address: 26be37000
|
||||
<0.1us 0.1us-1us 1us-10us 10us-100us 100us-1ms
|
||||
inv_iotlb 0 0 0 0 0
|
||||
|
||||
1ms-10ms >=10ms min(us) max(us) average(us)
|
||||
inv_iotlb 0 0 0 0 0
|
||||
|
||||
[...]
|
||||
|
||||
IOMMU: dmar2 Register Base Address: fed91000
|
||||
<0.1us 0.1us-1us 1us-10us 10us-100us 100us-1ms
|
||||
inv_iotlb 0 0 18 0 0
|
||||
|
||||
1ms-10ms >=10ms min(us) max(us) average(us)
|
||||
inv_iotlb 0 0 2 2 2
|
||||
|
||||
3) Enable sampling devTLB invalidation latency data
|
||||
|
||||
$ echo 2 | sudo tee /sys/kernel/debug/iommu/intel/dmar_perf_latency
|
||||
|
||||
$ sudo cat /sys/kernel/debug/iommu/intel/dmar_perf_latency
|
||||
|
||||
IOMMU: dmar0 Register Base Address: 26be37000
|
||||
<0.1us 0.1us-1us 1us-10us 10us-100us 100us-1ms
|
||||
inv_devtlb 0 0 0 0 0
|
||||
|
||||
>=10ms min(us) max(us) average(us)
|
||||
inv_devtlb 0 0 0 0
|
||||
|
||||
[...]
|
||||
|
||||
What: /sys/kernel/debug/iommu/intel/<bdf>/domain_translation_struct
|
||||
Date: December 2023
|
||||
Contact: Jingqi Liu <Jingqi.liu@intel.com>
|
||||
Description:
|
||||
This file dumps a specified page table of Intel IOMMU
|
||||
in legacy mode or scalable mode.
|
||||
|
||||
For a device that only supports legacy mode, dump its
|
||||
page table by the debugfs file in the debugfs device
|
||||
directory. e.g.
|
||||
/sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct.
|
||||
|
||||
For a device that supports scalable mode, dump the
|
||||
page table of specified pasid by the debugfs file in
|
||||
the debugfs pasid directory. e.g.
|
||||
/sys/kernel/debug/iommu/intel/0000:00:02.0/1/domain_translation_struct.
|
||||
|
||||
Examples in Kabylake:
|
||||
|
||||
::
|
||||
|
||||
1) Dump the page table of device "0000:00:02.0" that only supports legacy mode.
|
||||
|
||||
$ sudo cat /sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct
|
||||
|
||||
Device 0000:00:02.0 @0x1017f8000
|
||||
IOVA_PFN PML5E PML4E
|
||||
0x000000008d800 | 0x0000000000000000 0x00000001017f9003
|
||||
0x000000008d801 | 0x0000000000000000 0x00000001017f9003
|
||||
0x000000008d802 | 0x0000000000000000 0x00000001017f9003
|
||||
|
||||
PDPE PDE PTE
|
||||
0x00000001017fa003 0x00000001017fb003 0x000000008d800003
|
||||
0x00000001017fa003 0x00000001017fb003 0x000000008d801003
|
||||
0x00000001017fa003 0x00000001017fb003 0x000000008d802003
|
||||
|
||||
[...]
|
||||
|
||||
2) Dump the page table of device "0000:00:0a.0" with PASID "1" that
|
||||
supports scalable mode.
|
||||
|
||||
$ sudo cat /sys/kernel/debug/iommu/intel/0000:00:0a.0/1/domain_translation_struct
|
||||
|
||||
Device 0000:00:0a.0 with pasid 1 @0x10c112000
|
||||
IOVA_PFN PML5E PML4E
|
||||
0x0000000000000 | 0x0000000000000000 0x000000010df93003
|
||||
0x0000000000001 | 0x0000000000000000 0x000000010df93003
|
||||
0x0000000000002 | 0x0000000000000000 0x000000010df93003
|
||||
|
||||
PDPE PDE PTE
|
||||
0x0000000106ae6003 0x0000000104b38003 0x0000000147c00803
|
||||
0x0000000106ae6003 0x0000000104b38003 0x0000000147c01803
|
||||
0x0000000106ae6003 0x0000000104b38003 0x0000000147c02803
|
||||
|
||||
[...]
|
||||
@ -0,0 +1,12 @@
|
||||
What: /sys/bus/platform/drivers/amd_x3d_vcache/AMDI0101:00/amd_x3d_mode
|
||||
Date: November 2024
|
||||
KernelVersion: 6.13
|
||||
Contact: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
|
||||
Description: (RW) AMD 3D V-Cache optimizer allows users to switch CPU core
|
||||
rankings dynamically.
|
||||
|
||||
This file switches between these two modes:
|
||||
- "frequency" cores within the faster CCD are prioritized before
|
||||
those in the slower CCD.
|
||||
- "cache" cores within the larger L3 CCD are prioritized before
|
||||
those in the smaller L3 CCD.
|
||||
@ -149,6 +149,19 @@ Description:
|
||||
advertise to the partner. The currently used capabilities are in
|
||||
brackets. Selection happens by writing to the file.
|
||||
|
||||
What: /sys/class/typec/<port>/usb_capability
|
||||
Date: November 2024
|
||||
Contact: Heikki Krogerus <heikki.krogerus@linux.intel.com>
|
||||
Description: Lists the supported USB Modes. The default USB mode that is used
|
||||
next time with the Enter_USB Message is in brackets. The default
|
||||
mode can be changed by writing to the file when supported by the
|
||||
driver.
|
||||
|
||||
Valid values:
|
||||
- usb2 (USB 2.0)
|
||||
- usb3 (USB 3.2)
|
||||
- usb4 (USB4)
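A userspace reader can recover the bracketed mode with a small parser. The sketch below assumes the file content looks like `usb2 [usb3] usb4` (space-separated, one bracketed entry); that exact layout and the helper name `bracketed_token` are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed token from a line such as "usb2 [usb3] usb4" into out.
 * Returns 0 on success, -1 if there is no "[...]" pair or out is too small.
 */
static int bracketed_token(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(line != NULL && out != NULL);

	open = strchr(line, '[');
	if (!open)
		return -1;
	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)	/* need room for the terminating NUL */
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```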
|
||||
|
||||
USB Type-C partner devices (eg. /sys/class/typec/port0-partner/)
|
||||
|
||||
What: /sys/class/typec/<port>-partner/accessory_mode
|
||||
@ -220,6 +233,20 @@ Description:
|
||||
directory exists, it will have an attribute file for every VDO
|
||||
in Discover Identity command result.
|
||||
|
||||
What: /sys/class/typec/<port>-partner/usb_mode
|
||||
Date: November 2024
|
||||
Contact: Heikki Krogerus <heikki.krogerus@linux.intel.com>
|
||||
Description: The USB Modes that the partner device supports. The active mode
|
||||
is displayed in brackets. The active USB mode can be changed by
|
||||
writing to this file when the port driver is able to send Data
|
||||
Reset Message to the partner. That requires USB Power Delivery
|
||||
contract between the partner and the port.
|
||||
|
||||
Valid values:
|
||||
- usb2 (USB 2.0)
|
||||
- usb3 (USB 3.2)
|
||||
- usb4 (USB4)
|
||||
|
||||
USB Type-C cable devices (eg. /sys/class/typec/port0-cable/)
|
||||
|
||||
Note: Electronically Marked Cables will have a device also for one cable plug
|
||||
|
||||
@ -533,7 +533,6 @@ What: /sys/devices/system/cpu/vulnerabilities
|
||||
/sys/devices/system/cpu/vulnerabilities/srbds
|
||||
/sys/devices/system/cpu/vulnerabilities/tsa
|
||||
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
|
||||
/sys/devices/system/cpu/vulnerabilities/vmscape
|
||||
Date: January 2018
|
||||
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
|
||||
Description: Information about CPU vulnerabilities
|
||||
|
||||
@ -15,25 +15,23 @@ Description:
|
||||
The log sequence number (LSN) of the current tail of the
|
||||
log. The LSN is exported in "cycle:basic block" format.
|
||||
|
||||
What: /sys/fs/xfs/<disk>/log/reserve_grant_head
|
||||
Date: July 2014
|
||||
KernelVersion: 3.17
|
||||
Contact: xfs@oss.sgi.com
|
||||
What: /sys/fs/xfs/<disk>/log/reserve_grant_head_bytes
|
||||
Date: June 2024
|
||||
KernelVersion: 6.11
|
||||
Contact: linux-xfs@vger.kernel.org
|
||||
Description:
|
||||
The current state of the log reserve grant head. It
|
||||
represents the total log reservation of all currently
|
||||
outstanding transactions. The grant head is exported in
|
||||
"cycle:bytes" format.
|
||||
outstanding transactions in bytes.
|
||||
Users: xfstests
|
||||
|
||||
What: /sys/fs/xfs/<disk>/log/write_grant_head
|
||||
Date: July 2014
|
||||
KernelVersion: 3.17
|
||||
Contact: xfs@oss.sgi.com
|
||||
What: /sys/fs/xfs/<disk>/log/write_grant_head_bytes
|
||||
Date: June 2024
|
||||
KernelVersion: 6.11
|
||||
Contact: linux-xfs@vger.kernel.org
|
||||
Description:
|
||||
The current state of the log write grant head. It
|
||||
represents the total log reservation of all currently
|
||||
outstanding transactions, including regrants due to
|
||||
rolling transactions. The grant head is exported in
|
||||
"cycle:bytes" format.
|
||||
rolling transactions in bytes.
|
||||
Users: xfstests
|
||||
|
||||
@ -55,6 +55,15 @@ Description:
|
||||
An attribute which indicates whether the patch supports
|
||||
atomic-replace.
|
||||
|
||||
What: /sys/kernel/livepatch/<patch>/stack_order
|
||||
Date: Jan 2025
|
||||
KernelVersion: 6.14.0
|
||||
Description:
|
||||
This attribute specifies the sequence in which live patch modules
|
||||
are applied to the system. If multiple live patches modify the same
|
||||
function, the implementation with the biggest 'stack_order' number
|
||||
is used, unless a transition is currently in progress.
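The selection rule can be modelled in a few lines of userspace C. This is only a sketch: `struct patch_model` and `active_patch` are hypothetical names, and the case of a transition in progress is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one loaded live patch. */
struct patch_model {
	const char *name;
	int stack_order;	/* value read from .../<patch>/stack_order */
	int patches_func;	/* non-zero if this patch modifies the function */
};

/*
 * Return the patch whose implementation would be used for the function:
 * the one with the biggest stack_order among those that modify it,
 * or NULL if none does. Transitions in progress are not modelled.
 */
static const struct patch_model *
active_patch(const struct patch_model *p, size_t n)
{
	const struct patch_model *best = NULL;
	size_t i;

	assert(p != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!p[i].patches_func)
			continue;
		if (!best || p[i].stack_order > best->stack_order)
			best = &p[i];
	}
	return best;
}
```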
|
||||
|
||||
What: /sys/kernel/livepatch/<patch>/<object>
|
||||
Date: Nov 2014
|
||||
KernelVersion: 3.19.0
|
||||
|
||||
@ -18,3 +18,4 @@ Linux PCI Bus Subsystem
|
||||
pcieaer-howto
|
||||
endpoint/index
|
||||
boot-interrupts
|
||||
tph
|
||||
|
||||
@ -217,8 +217,12 @@ capability structure except the PCI Express capability structure,
|
||||
that is shared between many drivers including the service drivers.
|
||||
RMW Capability accessors (pcie_capability_clear_and_set_word(),
|
||||
pcie_capability_set_word(), and pcie_capability_clear_word()) protect
|
||||
a selected set of PCI Express Capability Registers (Link Control
|
||||
Register and Root Control Register). Any change to those registers
|
||||
should be performed using RMW accessors to avoid problems due to
|
||||
concurrent updates. For the up-to-date list of protected registers,
|
||||
see pcie_capability_clear_and_set_word().
|
||||
a selected set of PCI Express Capability Registers:
|
||||
|
||||
* Link Control Register
|
||||
* Root Control Register
|
||||
* Link Control 2 Register
|
||||
|
||||
Any change to those registers should be performed using RMW accessors to
|
||||
avoid problems due to concurrent updates. For the up-to-date list of
|
||||
protected registers, see pcie_capability_clear_and_set_word().
|
||||
|
||||
132
Documentation/PCI/tph.rst
Normal file
@ -0,0 +1,132 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
|
||||
===========
|
||||
TPH Support
|
||||
===========
|
||||
|
||||
:Copyright: 2024 Advanced Micro Devices, Inc.
|
||||
:Authors: - Eric van Tassell <eric.vantassell@amd.com>
|
||||
- Wei Huang <wei.huang2@amd.com>
|
||||
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices
|
||||
to provide optimization hints for requests that target memory space.
|
||||
These hints, in a format called Steering Tags (STs), are embedded in the
|
||||
requester's TLP headers, enabling the system hardware, such as the Root
|
||||
Complex, to better manage platform resources for these requests.
|
||||
|
||||
For example, on platforms with TPH-based direct data cache injection
|
||||
support, an endpoint device can include appropriate STs in its DMA
|
||||
traffic to specify which cache the data should be written to. This allows
|
||||
the CPU core to have a higher probability of getting data from cache,
|
||||
potentially improving performance and reducing latency in data
|
||||
processing.
|
||||
|
||||
|
||||
How to Use TPH
|
||||
==============
|
||||
|
||||
TPH is presented as an optional extended capability in PCIe. The Linux
|
||||
kernel handles TPH discovery during boot, but it is up to the device
|
||||
driver to request TPH enablement if it is to be utilized. Once enabled,
|
||||
the driver uses the provided API to obtain the Steering Tag for the
|
||||
target memory and to program the ST into the device's ST table.
|
||||
|
||||
Enable TPH support in Linux
|
||||
---------------------------
|
||||
|
||||
To support TPH, the kernel must be built with the CONFIG_PCIE_TPH option
|
||||
enabled.
|
||||
|
||||
Manage TPH
|
||||
----------
|
||||
|
||||
To enable TPH for a device, use the following function::
|
||||
|
||||
int pcie_enable_tph(struct pci_dev *pdev, int mode);
|
||||
|
||||
This function enables TPH support for a device with a specific ST mode.
|
||||
Currently supported modes include:
|
||||
|
||||
* PCI_TPH_ST_NS_MODE - No ST Mode
|
||||
* PCI_TPH_ST_IV_MODE - Interrupt Vector Mode
|
||||
* PCI_TPH_ST_DS_MODE - Device Specific Mode
|
||||
|
||||
`pcie_enable_tph()` checks whether the requested mode is actually
|
||||
supported by the device before enabling. The device driver can figure out
|
||||
which TPH mode is supported and can be properly enabled based on the
|
||||
return value of `pcie_enable_tph()`.
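One way a driver might use the return value is to try modes in order of preference and keep the first that succeeds. The sketch below is a userspace model, not kernel code: the enum values, the callback type standing in for `pcie_enable_tph()`, the "0 means success" convention, and the preference order are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's PCI_TPH_ST_*_MODE constants (values assumed). */
enum st_mode { ST_NS_MODE = 0, ST_IV_MODE = 1, ST_DS_MODE = 2 };

/* Stand-in for pcie_enable_tph(): assumed to return 0 on success. */
typedef int (*enable_fn)(void *dev, int mode);

/*
 * Try the modes in order of preference and return the first one that the
 * enable callback accepts, or -1 if none can be enabled.
 */
static int enable_first_supported_mode(void *dev, enable_fn enable)
{
	static const int preferred[] = { ST_DS_MODE, ST_IV_MODE, ST_NS_MODE };
	size_t i;

	assert(enable != NULL);

	for (i = 0; i < sizeof(preferred) / sizeof(preferred[0]); i++) {
		if (enable(dev, preferred[i]) == 0)
			return preferred[i];
	}
	return -1;
}

/* Example stub: a device that only accepts Interrupt Vector mode. */
static int fake_enable_iv_only(void *dev, int mode)
{
	(void)dev;
	return mode == ST_IV_MODE ? 0 : -1;
}

/* Example stub: a device that rejects every mode. */
static int fake_enable_none(void *dev, int mode)
{
	(void)dev;
	(void)mode;
	return -1;
}
```

In a real driver the callback would be replaced by direct calls to `pcie_enable_tph()` on the driver's `struct pci_dev`.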
|
||||
|
||||
To disable TPH, use the following function::
|
||||
|
||||
void pcie_disable_tph(struct pci_dev *pdev);
|
||||
|
||||
Manage ST
|
||||
---------
|
||||
|
||||
Steering Tags are platform specific. The PCIe specification does not specify where STs
|
||||
come from. Instead, the PCI Firmware Specification defines an ACPI _DSM method
|
||||
(see the `Revised _DSM for Cache Locality TPH Features ECN
|
||||
<https://members.pcisig.com/wg/PCI-SIG/document/15470>`_) for retrieving
|
||||
STs for a target memory of various properties. This method is what is
|
||||
supported in this implementation.
|
||||
|
||||
To retrieve a Steering Tag for a target memory associated with a specific
|
||||
CPU, use the following function::
|
||||
|
||||
int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type type,
|
||||
unsigned int cpu_uid, u16 *tag);
|
||||
|
||||
The `type` argument is used to specify the memory type, either volatile
|
||||
or persistent, of the target memory. The `cpu_uid` argument specifies the
|
||||
CPU with which the memory is associated.
|
||||
|
||||
After the ST value is retrieved, the device driver can use the following
|
||||
function to write the ST into the device::
|
||||
|
||||
int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index,
|
||||
u16 tag);
|
||||
|
||||
The `index` argument is the ST table entry index the ST tag will be
|
||||
written into. `pcie_tph_set_st_entry()` will figure out the proper
|
||||
location of ST table, either in the MSI-X table or in the TPH Extended
|
||||
Capability space, and write the Steering Tag into the ST entry pointed to by
|
||||
the `index` argument.
|
||||
|
||||
It is completely up to the driver to decide how to use these TPH
|
||||
functions. For example a network device driver can use the TPH APIs above
|
||||
to update the Steering Tag when interrupt affinity of a RX/TX queue has
|
||||
been changed. Here is sample code for an IRQ affinity notifier:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
    static void irq_affinity_notified(struct irq_affinity_notify *notify,
                                      const cpumask_t *mask)
    {
         struct drv_irq *irq;
         unsigned int cpu_id;
         u16 tag;

         irq = container_of(notify, struct drv_irq, affinity_notify);
         cpumask_copy(irq->cpu_mask, mask);

         /* Pick a right CPU as the target - here is just an example */
         cpu_id = cpumask_first(irq->cpu_mask);

         if (pcie_tph_get_cpu_st(irq->pdev, TPH_MEM_TYPE_VM, cpu_id,
                                 &tag))
             return;

         if (pcie_tph_set_st_entry(irq->pdev, irq->msix_nr, tag))
             return;
    }
|
||||
|
||||
Disable TPH system-wide
|
||||
-----------------------
|
||||
|
||||
There is a kernel command line option available to control the TPH feature:
|
||||
* "notph": TPH will be disabled for all endpoint devices.
|
||||
@ -921,10 +921,10 @@ This portion of the ``rcu_data`` structure is declared as follows:
|
||||
|
||||
::
|
||||
|
||||
1 int dynticks_snap;
|
||||
1 int watching_snap;
|
||||
2 unsigned long dynticks_fqs;
|
||||
|
||||
The ``->dynticks_snap`` field is used to take a snapshot of the
|
||||
The ``->watching_snap`` field is used to take a snapshot of the
|
||||
corresponding CPU's dyntick-idle state when forcing quiescent states,
|
||||
and is therefore accessed from other CPUs. Finally, the
|
||||
``->dynticks_fqs`` field is used to count the number of times this CPU
|
||||
@ -935,8 +935,8 @@ This portion of the rcu_data structure is declared as follows:
|
||||
|
||||
::
|
||||
|
||||
1 long dynticks_nesting;
|
||||
2 long dynticks_nmi_nesting;
|
||||
1 long nesting;
|
||||
2 long nmi_nesting;
|
||||
3 atomic_t dynticks;
|
||||
4 bool rcu_need_heavy_qs;
|
||||
5 bool rcu_urgent_qs;
|
||||
@ -945,14 +945,14 @@ These fields in the rcu_data structure maintain the per-CPU dyntick-idle
|
||||
state for the corresponding CPU. The fields may be accessed only from
|
||||
the corresponding CPU (and from tracing) unless otherwise stated.
|
||||
|
||||
The ``->dynticks_nesting`` field counts the nesting depth of process
|
||||
The ``->nesting`` field counts the nesting depth of process
|
||||
execution, so that in normal circumstances this counter has value zero
|
||||
or one. NMIs, irqs, and tracers are counted by the
|
||||
``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes
|
||||
``->nmi_nesting`` field. Because NMIs cannot be masked, changes
|
||||
to this variable have to be undertaken carefully using an algorithm
|
||||
provided by Andy Lutomirski. The initial transition from idle adds one,
|
||||
and nested transitions add two, so that a nesting level of five is
|
||||
represented by a ``->dynticks_nmi_nesting`` value of nine. This counter
|
||||
represented by a ``->nmi_nesting`` value of nine. This counter
|
||||
can therefore be thought of as counting the number of reasons why this
|
||||
CPU cannot be permitted to enter dyntick-idle mode, aside from
|
||||
process-level transitions.
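The counting rule for ``->nmi_nesting`` can be checked with a small userspace model. The function names below are hypothetical and the code only mirrors the arithmetic described above (first transition adds one, each nested transition adds two); it is not the kernel's implementation.

```c
#include <assert.h>

/*
 * Counter value for a given nesting level: the first transition from
 * idle adds one and each further nested transition adds two.
 */
static long nmi_nesting_value(unsigned int level)
{
	if (level == 0)
		return 0;	/* idle: nothing nested */
	return 1 + 2L * (long)(level - 1);
}

/* Inverse mapping: recover the nesting level from a counter value. */
static unsigned int nmi_nesting_level(long value)
{
	assert(value >= 0);
	return (unsigned int)((value + 1) / 2);
}
```

With this model, a nesting level of five maps to a counter value of nine, matching the text.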
|
||||
@ -960,12 +960,12 @@ process-level transitions.
|
||||
However, it turns out that when running in non-idle kernel context, the
|
||||
Linux kernel is fully capable of entering interrupt handlers that never
|
||||
exit and perhaps also vice versa. Therefore, whenever the
|
||||
``->dynticks_nesting`` field is incremented up from zero, the
|
||||
``->dynticks_nmi_nesting`` field is set to a large positive number, and
|
||||
whenever the ``->dynticks_nesting`` field is decremented down to zero,
|
||||
the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
|
||||
``->nesting`` field is incremented up from zero, the
|
||||
``->nmi_nesting`` field is set to a large positive number, and
|
||||
whenever the ``->nesting`` field is decremented down to zero,
|
||||
the ``->nmi_nesting`` field is set to zero. Assuming that
|
||||
the number of misnested interrupts is not sufficient to overflow the
|
||||
counter, this approach corrects the ``->dynticks_nmi_nesting`` field
|
||||
counter, this approach corrects the ``->nmi_nesting`` field
|
||||
every time the corresponding CPU enters the idle loop from process
|
||||
context.
|
||||
|
||||
@ -992,8 +992,8 @@ code.
|
||||
+-----------------------------------------------------------------------+
|
||||
| **Quick Quiz**: |
|
||||
+-----------------------------------------------------------------------+
|
||||
| Why not simply combine the ``->dynticks_nesting`` and |
|
||||
| ``->dynticks_nmi_nesting`` counters into a single counter that just |
|
||||
| Why not simply combine the ``->nesting`` and |
|
||||
| ``->nmi_nesting`` counters into a single counter that just |
|
||||
| counts the number of reasons that the corresponding CPU is non-idle? |
|
||||
+-----------------------------------------------------------------------+
|
||||
| **Answer**: |
|
||||
|
||||
@ -147,11 +147,11 @@ RCU read-side critical sections preceding and following the current
|
||||
idle sojourn.
|
||||
This case is handled by calls to the strongly ordered
|
||||
``atomic_add_return()`` read-modify-write atomic operation that
|
||||
is invoked within ``rcu_dynticks_eqs_enter()`` at idle-entry
|
||||
time and within ``rcu_dynticks_eqs_exit()`` at idle-exit time.
|
||||
The grace-period kthread invokes ``rcu_dynticks_snap()`` and
|
||||
``rcu_dynticks_in_eqs_since()`` (both of which invoke
|
||||
an ``atomic_add_return()`` of zero) to detect idle CPUs.
|
||||
is invoked within ``ct_kernel_exit_state()`` at idle-entry
|
||||
time and within ``ct_kernel_enter_state()`` at idle-exit time.
|
||||
The grace-period kthread first invokes ``ct_rcu_watching_cpu_acquire()``
|
||||
(preceded by a full memory barrier) and ``rcu_watching_snap_stopped_since()``
|
||||
(both of which rely on acquire semantics) to detect idle CPUs.
|
||||
|
||||
+-----------------------------------------------------------------------+
|
||||
| **Quick Quiz**: |
|
||||
|
||||
@ -564,15 +564,6 @@
|
||||
font-size="192"
|
||||
id="text202-7-9-6"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="8335.4873"
|
||||
y="5357.1006"
|
||||
font-style="normal"
|
||||
font-weight="bold"
|
||||
font-size="192"
|
||||
id="text202-7-9-6-0"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="8768.4678"
|
||||
|
||||
|
@ -528,7 +528,7 @@
|
||||
font-style="normal"
|
||||
y="-8652.5312"
|
||||
x="2466.7822"
|
||||
xml:space="preserve">dyntick_save_progress_counter()</text>
|
||||
xml:space="preserve">rcu_watching_snap_save()</text>
|
||||
<text
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
|
||||
id="text202-7-2-7-2-0"
|
||||
@ -537,7 +537,7 @@
|
||||
font-style="normal"
|
||||
y="-8368.1475"
|
||||
x="2463.3262"
|
||||
xml:space="preserve">rcu_implicit_dynticks_qs()</text>
|
||||
xml:space="preserve">rcu_watching_snap_recheck()</text>
|
||||
</g>
|
||||
<g
|
||||
id="g4504"
|
||||
@ -607,7 +607,7 @@
|
||||
font-weight="bold"
|
||||
font-size="192"
|
||||
id="text202-7-5-3-27-6"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_enter()</text>
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_exit_state()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="3745.7725"
|
||||
@ -638,7 +638,7 @@
|
||||
font-weight="bold"
|
||||
font-size="192"
|
||||
id="text202-7-5-3-27-6-1"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_exit()</text>
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_enter_state()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="3745.7725"
|
||||
|
||||
|
@ -844,7 +844,7 @@
|
||||
font-style="normal"
|
||||
y="1547.8876"
|
||||
x="4417.6396"
|
||||
xml:space="preserve">dyntick_save_progress_counter()</text>
|
||||
xml:space="preserve">rcu_watching_snap_save()</text>
|
||||
<g
|
||||
style="fill:none;stroke-width:0.025in"
|
||||
transform="translate(6501.9719,-10685.904)"
|
||||
@ -899,7 +899,7 @@
|
||||
font-style="normal"
|
||||
y="1858.8729"
|
||||
x="4414.1836"
|
||||
xml:space="preserve">rcu_implicit_dynticks_qs()</text>
|
||||
xml:space="preserve">rcu_watching_snap_recheck()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="14659.87"
|
||||
@ -977,7 +977,7 @@
|
||||
font-weight="bold"
|
||||
font-size="192"
|
||||
id="text202-7-5-3-27-6"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_enter()</text>
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_exit_state()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="3745.7725"
|
||||
@ -1008,7 +1008,7 @@
|
||||
font-weight="bold"
|
||||
font-size="192"
|
||||
id="text202-7-5-3-27-6-1"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_exit()</text>
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_enter_state()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="3745.7725"
|
||||
|
||||
|
@ -1446,15 +1446,6 @@
|
||||
font-size="192"
|
||||
id="text202-7-9-6"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="8335.4873"
|
||||
y="5357.1006"
|
||||
font-style="normal"
|
||||
font-weight="bold"
|
||||
font-size="192"
|
||||
id="text202-7-9-6-0"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="8768.4678"
|
||||
@ -2983,7 +2974,7 @@
|
||||
font-style="normal"
|
||||
y="38114.047"
|
||||
x="-334.33856"
|
||||
xml:space="preserve">dyntick_save_progress_counter()</text>
|
||||
xml:space="preserve">rcu_watching_snap_save()</text>
|
||||
<g
|
||||
style="fill:none;stroke-width:0.025in"
|
||||
transform="translate(1749.9916,25880.249)"
|
||||
@ -3038,7 +3029,7 @@
|
||||
font-style="normal"
|
||||
y="38425.035"
|
||||
x="-337.79462"
|
||||
xml:space="preserve">rcu_implicit_dynticks_qs()</text>
|
||||
xml:space="preserve">rcu_watching_snap_recheck()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="9907.8887"
|
||||
@ -3116,7 +3107,7 @@
|
||||
font-weight="bold"
|
||||
font-size="192"
|
||||
id="text202-7-5-3-27-6"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_enter()</text>
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_exit_state()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="3745.7725"
|
||||
@ -3147,7 +3138,7 @@
|
||||
font-weight="bold"
|
||||
font-size="192"
|
||||
id="text202-7-5-3-27-6-1"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_exit()</text>
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_enter_state()</text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
x="3745.7725"
|
||||
|
||||
|
@ -516,7 +516,7 @@
|
||||
font-style="normal"
|
||||
y="-8652.5312"
|
||||
x="2466.7822"
|
||||
xml:space="preserve">dyntick_save_progress_counter()</text>
|
||||
xml:space="preserve">rcu_watching_snap_save()</text>
|
||||
<text
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
|
||||
id="text202-7-2-7-2-0"
|
||||
@ -525,7 +525,7 @@
|
||||
font-style="normal"
|
||||
y="-8368.1475"
|
||||
x="2463.3262"
|
||||
xml:space="preserve">rcu_implicit_dynticks_qs()</text>
|
||||
xml:space="preserve">rcu_watching_snap_recheck()</text>
|
||||
<text
|
||||
sodipodi:linespacing="125%"
|
||||
style="font-size:192px;font-style:normal;font-weight:bold;line-height:125%;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
|
||||
|
||||
|
@ -9,6 +9,15 @@ is that all of the required memory barriers are included for you in
|
||||
the list macros. This document describes several applications of RCU,
|
||||
with the best fits first.
|
||||
|
||||
When iterating a list while holding the rcu_read_lock(), writers may
|
||||
modify the list. The reader is guaranteed to see all of the elements
|
||||
which were added to the list before they acquired the rcu_read_lock()
|
||||
and are still on the list when they drop the rcu_read_unlock().
|
||||
Elements which are added to, or removed from the list may or may not
|
||||
be seen. If the writer calls list_replace_rcu(), the reader may see
|
||||
either the old element or the new element; they will not see both,
|
||||
nor will they see neither.
|
||||
|
||||
|
||||
Example 1: Read-mostly list: Deferred Destruction
|
||||
-------------------------------------------------
|
||||
|
||||
@ -10,7 +10,7 @@ misuses of the RCU API, most notably using one of the rcu_dereference()
|
||||
family to access an RCU-protected pointer without the proper protection.
|
||||
When such misuse is detected, a lockdep-RCU splat is emitted.
|
||||
|
||||
The usual cause of a lockdep-RCU slat is someone accessing an
|
||||
The usual cause of a lockdep-RCU splat is someone accessing an
|
||||
RCU-protected data structure without either (1) being in the right kind of
|
||||
RCU read-side critical section or (2) holding the right update-side lock.
|
||||
This problem can therefore be serious: it might result in random memory
|
||||
|
||||
@ -14,23 +14,34 @@ Using 'nulls'
|
||||
=============
|
||||
|
||||
Using special markers (called 'nulls') is a convenient way
|
||||
to solve following problem :
|
||||
to solve the following problem.
|
||||
|
||||
A typical RCU linked list managing objects which are
|
||||
allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
|
||||
use following algos :
|
||||
|
||||
1) Lookup algo
|
||||
--------------
|
||||
Without 'nulls', a typical RCU linked list managing objects which are
|
||||
allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can use the following
|
||||
algorithms. The following examples assume 'obj' is a pointer to such an
|
||||
object, which has the type below.
|
||||
|
||||
::
|
||||
|
||||
struct object {
|
||||
struct hlist_node obj_node;
|
||||
atomic_t refcnt;
|
||||
unsigned int key;
|
||||
};
|
||||
|
||||
1) Lookup algorithm
|
||||
-------------------
|
||||
|
||||
::
|
||||
|
||||
rcu_read_lock()
|
||||
begin:
|
||||
rcu_read_lock();
|
||||
obj = lockless_lookup(key);
|
||||
if (obj) {
|
||||
if (!try_get_ref(obj)) // might fail for free objects
|
||||
if (!try_get_ref(obj)) { // might fail for free objects
|
||||
rcu_read_unlock();
|
||||
goto begin;
|
||||
}
|
||||
/*
|
||||
* Because a writer could delete object, and a writer could
|
||||
* reuse these object before the RCU grace period, we
|
||||
@ -38,6 +49,7 @@ use following algos :
|
||||
*/
|
||||
if (obj->key != key) { // not the object we expected
|
||||
put_ref(obj);
|
||||
rcu_read_unlock();
|
||||
goto begin;
|
||||
}
|
||||
}
|
||||
@ -52,9 +64,9 @@ but a version with an additional memory barrier (smp_rmb())
|
||||
{
|
||||
struct hlist_node *node, *next;
|
||||
for (pos = rcu_dereference((head)->first);
|
||||
pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
|
||||
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
|
||||
pos = rcu_dereference(next))
|
||||
pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
|
||||
({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
|
||||
pos = rcu_dereference(next))
|
||||
if (obj->key == key)
|
||||
return obj;
|
||||
return NULL;
|
||||
@ -64,11 +76,11 @@ And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::
|
||||
|
||||
struct hlist_node *node;
|
||||
for (pos = rcu_dereference((head)->first);
|
||||
pos && ({ prefetch(pos->next); 1; }) &&
|
||||
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
|
||||
pos = rcu_dereference(pos->next))
|
||||
if (obj->key == key)
|
||||
return obj;
|
||||
pos && ({ prefetch(pos->next); 1; }) &&
|
||||
({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
|
||||
pos = rcu_dereference(pos->next))
|
||||
if (obj->key == key)
|
||||
return obj;
|
||||
return NULL;
|
||||
|
||||
Quoting Corey Minyard::
|
||||
@ -82,36 +94,32 @@ Quoting Corey Minyard::
|
||||
solved by pre-fetching the "next" field (with proper barriers) before
|
||||
checking the key."
|
||||
|
||||
2) Insert algo
|
||||
--------------
|
||||
2) Insertion algorithm
|
||||
----------------------
|
||||
|
||||
We need to make sure a reader cannot read the new 'obj->obj_next' value
|
||||
and previous value of 'obj->key'. Or else, an item could be deleted
|
||||
We need to make sure a reader cannot read the new 'obj->obj_node.next' value
|
||||
and previous value of 'obj->key'. Otherwise, an item could be deleted
|
||||
from a chain, and inserted into another chain. If new chain was empty
|
||||
before the move, 'next' pointer is NULL, and lockless reader can
|
||||
not detect it missed following items in original chain.
|
||||
before the move, 'next' pointer is NULL, and lockless reader can not
|
||||
detect the fact that it missed following items in original chain.
|
||||
|
||||
::
|
||||
|
||||
/*
|
||||
* Please note that new inserts are done at the head of list,
|
||||
* not in the middle or end.
|
||||
*/
|
||||
* Please note that new inserts are done at the head of list,
|
||||
* not in the middle or end.
|
||||
*/
|
||||
obj = kmem_cache_alloc(...);
|
||||
lock_chain(); // typically a spin_lock()
|
||||
obj->key = key;
|
||||
/*
|
||||
* we need to make sure obj->key is updated before obj->next
|
||||
* or obj->refcnt
|
||||
*/
|
||||
smp_wmb();
|
||||
atomic_set(&obj->refcnt, 1);
|
||||
atomic_set_release(&obj->refcnt, 1); // key before refcnt
|
||||
hlist_add_head_rcu(&obj->obj_node, list);
|
||||
unlock_chain(); // typically a spin_unlock()
|
||||
|
||||
|
||||
3) Remove algo
|
||||
--------------
|
||||
3) Removal algorithm
|
||||
--------------------
|
||||
|
||||
Nothing special here, we can use a standard RCU hlist deletion.
|
||||
But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
|
||||
very very fast (before the end of RCU grace period)
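The lookup code above relies on try_get_ref() failing for an object whose
reference count has already dropped to zero, since such an object may be
freed and reused before the grace period ends. The document does not show
try_get_ref() or put_ref(); the following is only a userspace sketch of
one way they could behave, written with C11 atomics. The struct layout,
the memory orderings and the return value of put_ref() are assumptions of
this sketch, not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the 'struct object' in the text. */
struct object {
    atomic_int refcnt;
    unsigned int key;
};

/*
 * Take a reference only if the object is still live (refcnt != 0).
 * A zero count means the object was released and may be reused, so the
 * caller has to restart its lookup.
 */
static bool try_get_ref(struct object *obj)
{
    int old = atomic_load_explicit(&obj->refcnt, memory_order_relaxed);

    while (old != 0) {
        /* On failure 'old' is reloaded with the current count. */
        if (atomic_compare_exchange_weak_explicit(&obj->refcnt, &old,
                                                  old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool put_ref(struct object *obj)
{
    int prev = atomic_fetch_sub_explicit(&obj->refcnt, 1,
                                         memory_order_release);

    assert(prev > 0); /* dropping a reference nobody holds is a bug */
    return prev == 1;
}
```

Even when try_get_ref() succeeds, the caller still has to recheck
obj->key, because the reference might have been taken on a reused object.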
|
||||
@ -132,8 +140,7 @@ very very fast (before the end of RCU grace period)
|
||||
Avoiding extra smp_rmb()
|
||||
========================
|
||||
|
||||
With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
|
||||
and extra smp_wmb() in insert function.
|
||||
With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup().
|
||||
|
||||
For example, if we choose to store the slot number as the 'nulls'
|
||||
end-of-list marker for each slot of the hash table, we can detect
|
||||
@ -142,59 +149,67 @@ to another chain) checking the final 'nulls' value if
|
||||
the lookup met the end of chain. If final 'nulls' value
|
||||
is not the slot number, then we must restart the lookup at
|
||||
the beginning. If the object was moved to the same chain,
|
||||
then the reader doesn't care : It might eventually
|
||||
then the reader doesn't care: It might occasionally
|
||||
scan the list again without harm.
|
||||
|
||||
Note that using hlist_nulls means the type of 'obj_node' field of
|
||||
'struct object' becomes 'struct hlist_nulls_node'.
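The slot-number check can be illustrated outside the kernel. The sketch
below is a single-threaded userspace model, not the hlist_nulls
implementation: the names make_nulls(), is_nulls(), nulls_value() and
walk_chain() are hypothetical, and the encoding (low bit set, slot number
in the remaining bits) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a chain node; not the kernel's hlist_nulls_node. */
struct node {
    uintptr_t next;   /* either a struct node * or an odd 'nulls' marker */
    unsigned int key;
};

/* Encode a slot number as an end-of-chain marker (low bit set). */
static uintptr_t make_nulls(unsigned long slot)
{
    return ((uintptr_t)slot << 1) | 1u;
}

static int is_nulls(uintptr_t p)
{
    return (int)(p & 1u);
}

static unsigned long nulls_value(uintptr_t p)
{
    return (unsigned long)(p >> 1);
}

/*
 * Walk a chain looking for 'key'. Returns the node if found. Otherwise
 * returns NULL and stores in *end_slot the slot number carried by the
 * marker that terminated the walk.
 */
static struct node *walk_chain(uintptr_t first, unsigned int key,
                               unsigned long *end_slot)
{
    uintptr_t p = first;

    assert(end_slot != NULL);
    while (!is_nulls(p)) {
        struct node *n = (struct node *)p;

        if (n->key == key)
            return n;
        p = n->next;
    }
    *end_slot = nulls_value(p);
    return NULL;
}
```

A reader that started in slot 3 and gets back an end_slot of 7 knows it
followed an object that was moved to another chain and must restart.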
|
||||
|
||||
1) lookup algo
|
||||
--------------
|
||||
|
||||
1) lookup algorithm
|
||||
-------------------
|
||||
|
||||
::
|
||||
|
||||
head = &table[slot];
|
||||
rcu_read_lock();
|
||||
begin:
|
||||
hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
|
||||
rcu_read_lock();
|
||||
hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
|
||||
if (obj->key == key) {
|
||||
if (!try_get_ref(obj)) // might fail for free objects
|
||||
goto begin;
|
||||
if (obj->key != key) { // not the object we expected
|
||||
put_ref(obj);
|
||||
if (!try_get_ref(obj)) { // might fail for free objects
|
||||
rcu_read_unlock();
|
||||
goto begin;
|
||||
}
|
||||
goto out;
|
||||
if (obj->key != key) { // not the object we expected
|
||||
put_ref(obj);
|
||||
rcu_read_unlock();
|
||||
goto begin;
|
||||
}
|
||||
goto out;
|
||||
}
|
||||
}
|
||||
|
||||
// If the nulls value we got at the end of this lookup is
|
||||
// not the expected one, we must restart lookup.
|
||||
// We probably met an item that was moved to another chain.
|
||||
if (get_nulls_value(node) != slot) {
|
||||
put_ref(obj);
|
||||
rcu_read_unlock();
|
||||
goto begin;
|
||||
}
|
||||
/*
|
||||
* if the nulls value we got at the end of this lookup is
|
||||
* not the expected one, we must restart lookup.
|
||||
* We probably met an item that was moved to another chain.
|
||||
*/
|
||||
if (get_nulls_value(node) != slot)
|
||||
goto begin;
|
||||
obj = NULL;
|
||||
|
||||
out:
|
||||
rcu_read_unlock();
|
||||
|
||||
2) Insert function
|
||||
------------------
|
||||
2) Insert algorithm
|
||||
-------------------
|
||||
|
||||
Same as the above one, but uses hlist_nulls_add_head_rcu() instead of
|
||||
hlist_add_head_rcu().
|
||||
|
||||
::
|
||||
|
||||
/*
|
||||
* Please note that new inserts are done at the head of list,
|
||||
* not in the middle or end.
|
||||
*/
|
||||
* Please note that new inserts are done at the head of list,
|
||||
* not in the middle or end.
|
||||
*/
|
||||
obj = kmem_cache_alloc(cachep);
|
||||
lock_chain(); // typically a spin_lock()
|
||||
obj->key = key;
|
||||
atomic_set_release(&obj->refcnt, 1); // key before refcnt
|
||||
/*
|
||||
* changes to obj->key must be visible before refcnt one
|
||||
*/
|
||||
smp_wmb();
|
||||
atomic_set(&obj->refcnt, 1);
|
||||
/*
|
||||
* insert obj in RCU way (readers might be traversing chain)
|
||||
*/
|
||||
* insert obj in RCU way (readers might be traversing chain)
|
||||
*/
|
||||
hlist_nulls_add_head_rcu(&obj->obj_node, list);
|
||||
unlock_chain(); // typically a spin_unlock()
|
||||
|
||||
@ -185,7 +185,7 @@ argument.
|
||||
Not all changes require that all scenarios be run. For example, a change
|
||||
to Tree SRCU might run only the SRCU-N and SRCU-P scenarios using the
|
||||
--configs argument to kvm.sh as follows: "--configs 'SRCU-N SRCU-P'".
|
||||
Large systems can run multiple copies of of the full set of scenarios,
|
||||
Large systems can run multiple copies of the full set of scenarios,
|
||||
for example, a system with 448 hardware threads can run five instances
|
||||
of the full set concurrently. To make this happen::
|
||||
|
||||
|
||||
@ -52,8 +52,8 @@ experiment with should focus on Section 2. People who prefer to start
|
||||
with example uses should focus on Sections 3 and 4. People who need to
|
||||
understand the RCU implementation should focus on Section 5, then dive
|
||||
into the kernel source code. People who reason best by analogy should
|
||||
focus on Section 6. Section 7 serves as an index to the docbook API
|
||||
documentation, and Section 8 is the traditional answer key.
|
||||
focus on Sections 6 and 7. Section 8 serves as an index to the docbook
|
||||
API documentation, and Section 9 is the traditional answer key.
|
||||
|
||||
So, start with the section that makes the most sense to you and your
|
||||
preferred method of learning. If you need to know everything about
|
||||
|
||||
@ -75,4 +75,4 @@ taking two different snapshots of feedback counters at time T1 and T2.
|
||||
delivered_counter_delta = fbc_t2[del] - fbc_t1[del]
|
||||
reference_counter_delta = fbc_t2[ref] - fbc_t1[ref]
|
||||
|
||||
delivered_perf = (refernce_perf x delivered_counter_delta) / reference_counter_delta
|
||||
delivered_perf = (reference_perf x delivered_counter_delta) / reference_counter_delta
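As a worked example of the formula above, here is a small C sketch. The
struct and function names are hypothetical, and it assumes the feedback
counters are plain 64-bit values and that integer division is acceptable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the two feedback counters at one instant. */
struct fb_ctrs {
    uint64_t delivered;
    uint64_t reference;
};

/*
 * delivered_perf = (reference_perf x delivered_counter_delta)
 *                  / reference_counter_delta
 * Returns 0 if the reference counter did not advance between snapshots.
 */
static uint64_t delivered_perf(uint64_t reference_perf,
                               struct fb_ctrs t1, struct fb_ctrs t2)
{
    uint64_t del_delta = t2.delivered - t1.delivered;
    uint64_t ref_delta = t2.reference - t1.reference;

    if (ref_delta == 0)
        return 0;

    /* The product must fit in 64 bits for the result to be meaningful. */
    assert(del_delta == 0 || reference_perf <= UINT64_MAX / del_delta);
    return (reference_perf * del_delta) / ref_delta;
}
```

With reference_perf = 100, snapshots (delivered, reference) of
(1000, 2000) at T1 and (4000, 4000) at T2 give deltas of 3000 and 2000,
so delivered_perf = 100 x 3000 / 2000 = 150.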
|
||||
|
||||
@ -533,10 +533,12 @@ cgroup namespace on namespace creation.
|
||||
Because the resource control interface files in a given directory
|
||||
control the distribution of the parent's resources, the delegatee
|
||||
shouldn't be allowed to write to them. For the first method, this is
|
||||
achieved by not granting access to these files. For the second, the
|
||||
kernel rejects writes to all files other than "cgroup.procs" and
|
||||
"cgroup.subtree_control" on a namespace root from inside the
|
||||
namespace.
|
||||
achieved by not granting access to these files. For the second, files
|
||||
outside the namespace should be hidden from the delegatee by the means
|
||||
of at least mount namespacing, and the kernel rejects writes to all
|
||||
files on a namespace root from inside the cgroup namespace, except for
|
||||
those files listed in "/sys/kernel/cgroup/delegate" (including
|
||||
"cgroup.procs", "cgroup.threads", "cgroup.subtree_control", etc.).
|
||||
|
||||
The end results are equivalent for both delegation types. Once
|
||||
delegated, the user can build sub-hierarchy under the directory,
|
||||
@ -1708,6 +1710,8 @@ PAGE_SIZE multiple when read back.
|
||||
|
||||
Note that this is subtly different from setting memory.swap.max to
|
||||
0, as it still allows for pages to be written to the zswap pool.
|
||||
This setting has no effect if zswap is disabled, and swapping
|
||||
is allowed unless memory.swap.max is set to 0.
|
||||
|
||||
memory.pressure
|
||||
A read-only nested-keyed file.
|
||||
|
||||
@ -270,6 +270,8 @@ configured for Unix Extensions (and the client has not disabled
|
||||
illegal Windows/NTFS/SMB characters to a remap range (this mount parameter
|
||||
is the default for SMB3). This remap (``mapposix``) range is also
|
||||
compatible with Mac (and "Services for Mac" on some older Windows).
|
||||
When POSIX Extensions for SMB 3.1.1 are negotiated, remapping is automatically
|
||||
disabled.
|
||||
|
||||
CIFS VFS Mount Options
|
||||
======================
|
||||
|
||||
@ -3,29 +3,52 @@ dm-delay
|
||||
========
|
||||
|
||||
Device-Mapper's "delay" target delays reads and/or writes
|
||||
and maps them to different devices.
|
||||
and/or flushes and optionally maps them to different devices.
|
||||
|
||||
Parameters::
|
||||
Arguments::
|
||||
|
||||
<device> <offset> <delay> [<write_device> <write_offset> <write_delay>
|
||||
[<flush_device> <flush_offset> <flush_delay>]]
|
||||
|
||||
With separate write parameters, the first set is only used for reads.
|
||||
The table line must have either 3, 6 or 9 arguments:
|
||||
|
||||
3: apply offset and delay to read, write and flush operations on device
|
||||
|
||||
6: apply offset and delay to device, also apply write_offset and write_delay
|
||||
to write and flush operations on optionally different write_device with
|
||||
optionally different sector offset
|
||||
|
||||
9: same as 6 arguments, plus define flush_offset and flush_delay explicitly
|
||||
on/with optionally different flush_device/flush_offset.
|
||||
|
||||
Offsets are specified in sectors.
|
||||
|
||||
Delays are specified in milliseconds.
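The 3/6/9 rule above can be summarised as "which <device> <offset>
<delay> triple governs each kind of operation". The following C sketch
expresses only that rule; it is not the dm-delay constructor, and the
enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three kinds of operation. */
enum delay_class { DELAY_READ, DELAY_WRITE, DELAY_FLUSH };

/*
 * Given the number of table arguments, return the index (0, 3 or 6) of
 * the first argument of the <device> <offset> <delay> triple that
 * applies to 'cls', or -1 if argc is not 3, 6 or 9.
 */
static int delay_triple_index(int argc, enum delay_class cls)
{
    int idx = 0;

    if (argc != 3 && argc != 6 && argc != 9)
        return -1;

    switch (cls) {
    case DELAY_READ:
        idx = 0;                      /* first triple always covers reads */
        break;
    case DELAY_WRITE:
        idx = (argc >= 6) ? 3 : 0;    /* write triple if present */
        break;
    case DELAY_FLUSH:
        if (argc == 9)
            idx = 6;                  /* explicit flush triple */
        else if (argc == 6)
            idx = 3;                  /* flushes follow writes */
        else
            idx = 0;
        break;
    }

    assert(idx + 3 <= argc);          /* the chosen triple must exist */
    return idx;
}
```

For a 6-argument table, delay_triple_index(6, DELAY_FLUSH) is 3, meaning
flushes use the write device, offset and delay.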
|
||||
|
||||
|
||||
Example scripts
|
||||
===============
|
||||
|
||||
::
|
||||
|
||||
#!/bin/sh
|
||||
# Create device delaying rw operation for 500ms
|
||||
echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
|
||||
#
|
||||
# Create mapped device named "delayed" delaying read, write and flush operations for 500ms.
|
||||
#
|
||||
dmsetup create delayed --table "0 `blockdev --getsz $1` delay $1 0 500"
|
||||
|
||||
::
|
||||
|
||||
#!/bin/sh
|
||||
# Create device delaying only write operation for 500ms and
|
||||
# splitting reads and writes to different devices $1 $2
|
||||
echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
|
||||
#
|
||||
# Create mapped device delaying write and flush operations for 400ms and
|
||||
# splitting reads to device $1 but writes and flushes to different device $2
|
||||
# to different offsets of 2048 and 4096 sectors respectively.
|
||||
#
|
||||
dmsetup create delayed --table "0 `blockdev --getsz $1` delay $1 2048 0 $2 4096 400"
|
||||
|
||||
::
|
||||
#!/bin/sh
|
||||
#
|
||||
# Create mapped device delaying reads for 50ms, writes for 100ms and flushes for 333ms
|
||||
# onto the same backing device at offset 0 sectors.
|
||||
#
|
||||
dmsetup create delayed --table "0 `blockdev --getsz $1` delay $1 0 50 $2 0 100 $1 0 333"
|
||||
|
||||
@ -146,6 +146,11 @@ integrity:<bytes>:<type>
|
||||
integrity for the encrypted device. The additional space is then
|
||||
used for storing authentication tag (and persistent IV if needed).
|
||||
|
||||
integrity_key_size:<bytes>
|
||||
Optionally set the integrity key size if it differs from the digest size.
|
||||
It allows the use of wrapped key algorithms where the key size is
|
||||
independent of the cryptographic key size.
|
||||
|
||||
sector_size:<bytes>
|
||||
Use <bytes> as the encryption unit instead of 512 bytes sectors.
|
||||
This option can be in range 512 - 4096 bytes and must be power of two.
|
||||
@ -160,6 +165,10 @@ iv_large_sectors
|
||||
The <iv_offset> must be multiple of <sector_size> (in 512 bytes units)
|
||||
if this flag is specified.
|
||||
|
||||
integrity_key_size:<bytes>
|
||||
Use an integrity key of <bytes> size instead of using an integrity key size
|
||||
of the digest size of the used HMAC algorithm.
|
||||
|
||||
|
||||
Module parameters::
|
||||
|
||||
|
||||
@ -22,5 +22,5 @@ are configurable at compile, boot or run time.
|
||||
srso
|
||||
gather_data_sampling
|
||||
reg-file-data-sampling
|
||||
rsb
|
||||
indirect-target-selection
|
||||
vmscape
|
||||
|
||||
268
Documentation/admin-guide/hw-vuln/rsb.rst
Normal file
@ -0,0 +1,268 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
RSB-related mitigations
|
||||
=======================
|
||||
|
||||
.. warning::
|
||||
Please keep this document up-to-date, otherwise you will be
|
||||
volunteered to update it and convert it to a very long comment in
|
||||
bugs.c!
|
||||
|
||||
Since 2018 there have been many Spectre CVEs related to the Return Stack
|
||||
Buffer (RSB) (sometimes referred to as the Return Address Stack (RAS) or
|
||||
Return Address Predictor (RAP) on AMD).
|
||||
|
||||
Information about these CVEs and how to mitigate them is scattered
|
||||
amongst a myriad of microarchitecture-specific documents.
|
||||
|
||||
This document attempts to consolidate all the relevant information in
|
||||
one place and clarify the reasoning behind the current RSB-related
|
||||
mitigations. It's meant to be as concise as possible, focused only on
|
||||
the current kernel mitigations: what are the RSB-related attack vectors
|
||||
and how are they currently being mitigated?
|
||||
|
||||
It's *not* meant to describe how the RSB mechanism operates or how the
|
||||
exploits work. More details about those can be found in the references
|
||||
below.
|
||||
|
||||
Rather, this is basically a glorified comment, but too long to actually
|
||||
be one. So when the next CVE comes along, a kernel developer can
|
||||
quickly refer to this as a refresher to see what we're actually doing
|
||||
and why.
|
||||
|
||||
At a high level, there are two classes of RSB attacks: RSB poisoning
|
||||
(Intel and AMD) and RSB underflow (Intel only). They must each be
|
||||
considered individually for each attack vector (and microarchitecture
|
||||
where applicable).
|
||||
|
||||
----
|
||||
|
||||
RSB poisoning (Intel and AMD)
|
||||
=============================
|
||||
|
||||
SpectreRSB
|
||||
~~~~~~~~~~
|
||||
|
||||
RSB poisoning is a technique used by SpectreRSB [#spectre-rsb]_ where
|
||||
an attacker poisons an RSB entry to cause a victim's return instruction
|
||||
to speculate to an attacker-controlled address. This can happen when
|
||||
there are unbalanced CALLs/RETs after a context switch or VMEXIT.
|
||||
|
||||
* All attack vectors can potentially be mitigated by flushing out any
|
||||
poisoned RSB entries using an RSB filling sequence
|
||||
[#intel-rsb-filling]_ [#amd-rsb-filling]_ when transitioning between
|
||||
untrusted and trusted domains. But this has a performance impact and
|
||||
should be avoided whenever possible.
|
||||
|
||||
.. DANGER::
|
||||
**FIXME**: Currently we're flushing 32 entries. However, some CPU
|
||||
models have more than 32 entries. The loop count needs to be
|
||||
increased for those. More detailed information is needed about RSB
|
||||
sizes.
|
||||
|
||||
* On context switch, the user->user mitigation requires ensuring the
|
||||
RSB gets filled or cleared whenever IBPB gets written [#cond-ibpb]_
|
||||
during a context switch:
|
||||
|
||||
* AMD:
|
||||
On Zen 4+, IBPB (or SBPB [#amd-sbpb]_ if used) clears the RSB.
|
||||
This is indicated by IBPB_RET in CPUID [#amd-ibpb-rsb]_.
|
||||
|
||||
On Zen < 4, the RSB filling sequence [#amd-rsb-filling]_ must
|
||||
always be done in addition to IBPB [#amd-ibpb-no-rsb]_. This is
|
||||
indicated by X86_BUG_IBPB_NO_RET.
|
||||
|
||||
* Intel:
|
||||
IBPB always clears the RSB:
|
||||
|
||||
"Software that executed before the IBPB command cannot control
|
||||
the predicted targets of indirect branches executed after the
|
||||
command on the same logical processor. The term indirect branch
|
||||
in this context includes near return instructions, so these
|
||||
predicted targets may come from the RSB." [#intel-ibpb-rsb]_
|
||||
|
||||
* On context switch, user->kernel attacks are prevented by SMEP. User
|
||||
space can only insert user space addresses into the RSB. Even
|
||||
non-canonical addresses can't be inserted due to the page gap at the
|
||||
end of the user canonical address space reserved by TASK_SIZE_MAX.
|
||||
A SMEP #PF at instruction fetch prevents the kernel from speculatively
|
||||
executing user space.
|
||||
|
||||
* AMD:
|
||||
"Finally, branches that are predicted as 'ret' instructions get
|
||||
their predicted targets from the Return Address Predictor (RAP).
|
||||
AMD recommends software use a RAP stuffing sequence (mitigation
|
||||
V2-3 in [2]) and/or Supervisor Mode Execution Protection (SMEP)
|
||||
to ensure that the addresses in the RAP are safe for
|
||||
speculation. Collectively, we refer to these mitigations as "RAP
|
||||
Protection"." [#amd-smep-rsb]_
|
||||
|
||||
* Intel:
|
||||
"On processors with enhanced IBRS, an RSB overwrite sequence may
|
||||
not suffice to prevent the predicted target of a near return
|
||||
from using an RSB entry created in a less privileged predictor
|
||||
mode. Software can prevent this by enabling SMEP (for
|
||||
transitions from user mode to supervisor mode) and by having
|
||||
IA32_SPEC_CTRL.IBRS set during VM exits." [#intel-smep-rsb]_
|
||||
|
||||
* On VMEXIT, guest->host attacks are mitigated by eIBRS (and PBRSB
|
||||
mitigation if needed):
|
||||
|
||||
* AMD:
|
||||
"When Automatic IBRS is enabled, the internal return address
|
||||
stack used for return address predictions is cleared on VMEXIT."
|
||||
[#amd-eibrs-vmexit]_
|
||||
|
||||
* Intel:
|
||||
"On processors with enhanced IBRS, an RSB overwrite sequence may
|
||||
not suffice to prevent the predicted target of a near return
|
||||
from using an RSB entry created in a less privileged predictor
|
||||
mode. Software can prevent this by enabling SMEP (for
|
||||
transitions from user mode to supervisor mode) and by having
|
||||
IA32_SPEC_CTRL.IBRS set during VM exits. Processors with
|
||||
enhanced IBRS still support the usage model where IBRS is set
|
||||
only in the OS/VMM for OSes that enable SMEP. To do this, such
|
||||
processors will ensure that guest behavior cannot control the
|
||||
RSB after a VM exit once IBRS is set, even if IBRS was not set
|
||||
at the time of the VM exit." [#intel-eibrs-vmexit]_
|
||||
|
||||
Note that some Intel CPUs are susceptible to Post-barrier Return
|
||||
Stack Buffer Predictions (PBRSB) [#intel-pbrsb]_, where the last
|
||||
CALL from the guest can be used to predict the first unbalanced RET.
|
||||
In this case the PBRSB mitigation is needed in addition to eIBRS.
|
||||
|
||||
AMD RETBleed / SRSO / Branch Type Confusion
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
On AMD, poisoned RSB entries can also be created by the AMD RETBleed
|
||||
variant [#retbleed-paper]_ [#amd-btc]_ or by Speculative Return Stack
|
||||
Overflow [#amd-srso]_ (Inception [#inception-paper]_). The kernel
|
||||
protects itself by replacing every RET in the kernel with a branch to a
|
||||
single safe RET.
|
||||
|
||||
----
|
||||
|
||||
RSB underflow (Intel only)
|
||||
==========================
|
||||
|
||||
RSB Alternate (RSBA) ("Intel Retbleed")
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Some Intel Skylake-generation CPUs are susceptible to the Intel variant
|
||||
of RETBleed [#retbleed-paper]_ (Return Stack Buffer Underflow
|
||||
[#intel-rsbu]_). If a RET is executed when the RSB buffer is empty due
|
||||
to mismatched CALLs/RETs or returning from a deep call stack, the branch
|
||||
predictor can fall back to using the Branch Target Buffer (BTB). If a
|
||||
user forces a BTB collision then the RET can speculatively branch to a
|
||||
user-controlled address.
|
||||
|
||||
* Note that RSB filling doesn't fully mitigate this issue. If there
|
||||
are enough unbalanced RETs, the RSB may still underflow and fall back
|
||||
to using a poisoned BTB entry.
|
||||
|
||||
* On context switch, user->user underflow attacks are mitigated by the
|
||||
conditional IBPB [#cond-ibpb]_ on context switch which effectively
|
||||
clears the BTB:
|
||||
|
||||
* "The indirect branch predictor barrier (IBPB) is an indirect branch
|
||||
control mechanism that establishes a barrier, preventing software
|
||||
that executed before the barrier from controlling the predicted
|
||||
targets of indirect branches executed after the barrier on the same
|
||||
logical processor." [#intel-ibpb-btb]_
|
||||
|
||||
* On context switch and VMEXIT, user->kernel and guest->host RSB
|
||||
underflows are mitigated by IBRS or eIBRS:
|
||||
|
||||
* "Enabling IBRS (including enhanced IBRS) will mitigate the "RSBU"
|
||||
attack demonstrated by the researchers. As previously documented,
|
||||
Intel recommends the use of enhanced IBRS, where supported. This
|
||||
includes any processor that enumerates RRSBA but not RRSBA_DIS_S."
|
||||
[#intel-rsbu]_
|
||||
|
||||
However, note that eIBRS and IBRS do not mitigate intra-mode attacks.
|
||||
Like RRSBA below, this is mitigated by clearing the BHB on kernel
|
||||
entry.
|
||||
|
||||
As an alternative to classic IBRS, call depth tracking (combined with
|
||||
retpolines) can be used to track kernel returns and fill the RSB when
|
||||
it gets close to being empty.
|
||||
|
||||
Restricted RSB Alternate (RRSBA)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Some newer Intel CPUs have Restricted RSB Alternate (RRSBA) behavior,
|
||||
which, similar to RSBA described above, also falls back to using the BTB
|
||||
on RSB underflow. The only difference is that the predicted targets are
|
||||
restricted to the current domain when eIBRS is enabled:
|
||||
|
||||
* "Restricted RSB Alternate (RRSBA) behavior allows alternate branch
|
||||
predictors to be used by near RET instructions when the RSB is
|
||||
empty. When eIBRS is enabled, the predicted targets of these
|
||||
alternate predictors are restricted to those belonging to the
|
||||
indirect branch predictor entries of the current prediction domain."
|
||||
[#intel-eibrs-rrsba]_
|
||||
|
||||
When a CPU with RRSBA is vulnerable to Branch History Injection
|
||||
[#bhi-paper]_ [#intel-bhi]_, an RSB underflow could be used for an
|
||||
intra-mode BTI attack. This is mitigated by clearing the BHB on
|
||||
kernel entry.
|
||||
|
||||
However if the kernel uses retpolines instead of eIBRS, it needs to
|
||||
disable RRSBA:
|
||||
|
||||
* "Where software is using retpoline as a mitigation for BHI or
|
||||
intra-mode BTI, and the processor both enumerates RRSBA and
|
||||
enumerates RRSBA_DIS controls, it should disable this behavior."
|
||||
[#intel-retpoline-rrsba]_
|
||||
|
||||
----
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [#spectre-rsb] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://arxiv.org/pdf/1807.07940.pdf>`_
|
||||
|
||||
.. [#intel-rsb-filling] "Empty RSB Mitigation on Skylake-generation" in `Retpoline: A Branch Target Injection Mitigation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/retpoline-branch-target-injection-mitigation.html#inpage-nav-5-1>`_
|
||||
|
||||
.. [#amd-rsb-filling] "Mitigation V2-3" in `Software Techniques for Managing Speculation <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf>`_
|
||||
|
||||
.. [#cond-ibpb] Whether IBPB is written depends on whether the prev and/or next task is protected from Spectre attacks. It typically requires opting in per task or system-wide. For more details see the documentation for the ``spectre_v2_user`` cmdline option in Documentation/admin-guide/kernel-parameters.txt.
|
||||
|
||||
.. [#amd-sbpb] IBPB without flushing of branch type predictions. Only exists for AMD.
|
||||
|
||||
.. [#amd-ibpb-rsb] "Function 8000_0008h -- Processor Capacity Parameters and Extended Feature Identification" in `AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf>`_. SBPB behaves the same way according to `this email <https://lore.kernel.org/5175b163a3736ca5fd01cedf406735636c99a>`_.
|
||||
|
||||
.. [#amd-ibpb-no-rsb] `Breaking the Barrier: Post-Barrier Spectre Attacks <https://comsec.ethz.ch/wp-content/files/ibpb_sp25.pdf>`_
|
||||
|
||||
.. [#intel-ibpb-rsb] "Introduction" in `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_
|
||||
|
||||
.. [#amd-smep-rsb] "Existing Mitigations" in `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_
|
||||
|
||||
.. [#intel-smep-rsb] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_
|
||||
|
||||
.. [#amd-eibrs-vmexit] "Extended Feature Enable Register (EFER)" in `AMD64 Architecture Programmer's Manual Volume 2: System Programming <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf>`_
|
||||
|
||||
.. [#intel-eibrs-vmexit] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_
|
||||
|
||||
.. [#intel-pbrsb] `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_
|
||||
|
||||
.. [#retbleed-paper] `RETBleed: Arbitrary Speculative Code Execution with Return Instructions <https://comsec.ethz.ch/wp-content/files/retbleed_sec22.pdf>`_
|
||||
|
||||
.. [#amd-btc] `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_
|
||||
|
||||
.. [#amd-srso] `Technical Update Regarding Speculative Return Stack Overflow <https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf>`_
|
||||
|
||||
.. [#inception-paper] `Inception: Exposing New Attack Surfaces with Training in Transient Execution <https://comsec.ethz.ch/wp-content/files/inception_sec23.pdf>`_
|
||||
|
||||
.. [#intel-rsbu] `Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_
|
||||
|
||||
.. [#intel-ibpb-btb] `Indirect Branch Predictor Barrier <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-predictor-barrier.html>`_
|
||||
|
||||
.. [#intel-eibrs-rrsba] "Guidance for RSBU" in `Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_
|
||||
|
||||
.. [#bhi-paper] `Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks <http://download.vusec.net/papers/bhi-spectre-bhb_sec22.pdf>`_
|
||||
|
||||
.. [#intel-bhi] `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_
|
||||
|
||||
.. [#intel-retpoline-rrsba] "Retpoline" in `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_
|
||||
@ -1,110 +0,0 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
VMSCAPE
|
||||
=======
|
||||
|
||||
VMSCAPE is a vulnerability that may allow a guest to influence the branch
|
||||
prediction in host userspace. It particularly affects hypervisors like QEMU.
|
||||
|
||||
Even if a hypervisor may not have any sensitive data like disk encryption keys,
|
||||
guest-userspace may be able to attack the guest-kernel using the hypervisor as
|
||||
a confused deputy.
|
||||
|
||||
Affected processors
|
||||
-------------------
|
||||
|
||||
The following CPU families are affected by VMSCAPE:
|
||||
|
||||
**Intel processors:**
|
||||
- Skylake generation (Parts without Enhanced-IBRS)
|
||||
- Cascade Lake generation - (Parts affected by ITS guest/host separation)
|
||||
- Alder Lake and newer (Parts affected by BHI)
|
||||
|
||||
Note that BHI-affected parts that use the BHB clearing software mitigation, e.g.
|
||||
Icelake, are not vulnerable to VMSCAPE.
|
||||
|
||||
**AMD processors:**
|
||||
- Zen series (families 0x17, 0x19, 0x1a)
|
||||
|
||||
**Hygon processors:**
|
||||
- Family 0x18
|
||||
|
||||
Mitigation
|
||||
----------
|
||||
|
||||
Conditional IBPB
|
||||
----------------
|
||||
|
||||
The kernel tracks when a CPU has run a potentially malicious guest and issues an
|
||||
IBPB before the first exit to userspace after VM-exit. If userspace did not run
|
||||
between VM-exit and the next VM-entry, no IBPB is issued.
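The tracking described above can be modelled with a small simulation. This is an illustrative Python sketch, not kernel code: the class, its methods and the ``guest_ran`` flag are hypothetical names standing in for whatever per-CPU state the kernel actually keeps.

```python
class CpuModel:
    """Toy model of one CPU under the conditional IBPB policy (hypothetical names)."""

    def __init__(self):
        self.guest_ran = False  # True if a guest ran since the last barrier
        self.ibpb_count = 0     # number of barriers this model has issued

    def vm_exit(self):
        # A potentially malicious guest has just run on this CPU.
        self.guest_ran = True

    def exit_to_userspace(self):
        # Issue the barrier only on the first return to userspace after a VM-exit.
        if self.guest_ran:
            self.ibpb_count += 1
            self.guest_ran = False


cpu = CpuModel()
cpu.vm_exit()            # exit handled entirely in the kernel
cpu.vm_exit()            # guest re-entered and exited again, userspace never ran
assert cpu.ibpb_count == 0
cpu.exit_to_userspace()  # first return to the VMM: one barrier
cpu.exit_to_userspace()  # no guest ran in between: no further barrier
assert cpu.ibpb_count == 1
```

The model only captures the ordering rule; it says nothing about where in the kernel exit path the barrier would be placed.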
|
||||
|
||||
Note that the existing userspace mitigation against Spectre-v2 is effective in
|
||||
protecting the userspace, but it is insufficient to protect the userspace VMMs
|
||||
from a malicious guest. This is because Spectre-v2 mitigations are applied at
|
||||
context switch time, while the userspace VMM can run after a VM-exit without a
|
||||
context switch.
|
||||
|
||||
Vulnerability enumeration and mitigation are not applied inside a guest. This is
|
||||
because nested hypervisors should already be deploying IBPB to isolate
|
||||
themselves from nested guests.
|
||||
|
||||
SMT considerations
|
||||
------------------
|
||||
|
||||
When Simultaneous Multi-Threading (SMT) is enabled, hypervisors can be
|
||||
vulnerable to cross-thread attacks. For complete protection against VMSCAPE
|
||||
attacks in SMT environments, STIBP should be enabled.
|
||||
|
||||
The kernel will issue a warning if SMT is enabled without adequate STIBP
|
||||
protection. The warning is not issued when:
|
||||
|
||||
- SMT is disabled
|
||||
- STIBP is enabled system-wide
|
||||
- Intel eIBRS is enabled (which implies STIBP protection)
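The three conditions above can be read as a single predicate. The function below is a hedged Python sketch of that reading only; its name and parameters are hypothetical and it does not reflect how the kernel detects each condition.

```python
def smt_warning_expected(smt_enabled: bool,
                         stibp_system_wide: bool,
                         eibrs_enabled: bool) -> bool:
    """Return True when the SMT warning described above would be expected."""
    if not smt_enabled:
        return False  # no sibling threads, no cross-thread exposure
    # With SMT on, either system-wide STIBP or eIBRS suppresses the warning.
    return not (stibp_system_wide or eibrs_enabled)
```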
|
||||
|
||||
System information and options
|
||||
------------------------------
|
||||
|
||||
The sysfs file showing VMSCAPE mitigation status is:
|
||||
|
||||
/sys/devices/system/cpu/vulnerabilities/vmscape
|
||||
|
||||
The possible values in this file are:
|
||||
|
||||
* 'Not affected':
|
||||
|
||||
The processor is not vulnerable to VMSCAPE attacks.
|
||||
|
||||
* 'Vulnerable':
|
||||
|
||||
The processor is vulnerable and no mitigation has been applied.
|
||||
|
||||
* 'Mitigation: IBPB before exit to userspace':
|
||||
|
||||
Conditional IBPB mitigation is enabled. The kernel tracks when a CPU has
|
||||
run a potentially malicious guest and issues an IBPB before the first
|
||||
exit to userspace after VM-exit.
|
||||
|
||||
* 'Mitigation: IBPB on VMEXIT':
|
||||
|
||||
IBPB is issued on every VM-exit. This occurs when other mitigations like
|
||||
RETBLEED or SRSO are already issuing IBPB on VM-exit.
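A monitoring script could classify the sysfs value along these lines. This is a minimal Python sketch that assumes the file contains exactly one of the strings listed above; the function names and the returned labels are hypothetical.

```python
from pathlib import Path

VMSCAPE_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/vmscape")


def classify_vmscape(status: str) -> str:
    """Map the sysfs string to a coarse label (labels are illustrative)."""
    status = status.strip()
    if status == "Not affected":
        return "not-affected"
    if status == "Vulnerable":
        return "vulnerable"
    if status.startswith("Mitigation: "):
        return "mitigated"
    return "unknown"


def read_vmscape_status(path: Path = VMSCAPE_SYSFS) -> str:
    """Read and classify the status; a missing file is reported as unknown."""
    try:
        return classify_vmscape(path.read_text())
    except OSError:
        return "unknown"
```

Treating an unreadable file as "unknown" rather than "not-affected" keeps the script from reporting a false negative on kernels that do not expose this file.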
|
||||
|
||||
Mitigation control on the kernel command line
|
||||
----------------------------------------------
|
||||
|
||||
The mitigation can be controlled via the ``vmscape=`` command line parameter:
|
||||
|
||||
* ``vmscape=off``:
|
||||
|
||||
Disable the VMSCAPE mitigation.
|
||||
|
||||
* ``vmscape=ibpb``:
|
||||
|
||||
Enable conditional IBPB mitigation (default when CONFIG_MITIGATION_VMSCAPE=y).
|
||||
|
||||
* ``vmscape=force``:
|
||||
|
||||
Force vulnerability detection and mitigation even on processors that are
|
||||
not known to be affected.
|
||||
@ -442,6 +442,9 @@
|
||||
arm64.nopauth [ARM64] Unconditionally disable Pointer Authentication
|
||||
support
|
||||
|
||||
arm64.nompam [ARM64] Unconditionally disable Memory Partitioning And
|
||||
Monitoring support
|
||||
|
||||
arm64.nomte [ARM64] Unconditionally disable Memory Tagging Extension
|
||||
support
|
||||
|
||||
@ -2215,6 +2218,9 @@
|
||||
per_cpu_perf_limits
|
||||
Allow per-logical-CPU P-State performance control limits using
|
||||
cpufreq sysfs interface
|
||||
no_cas
|
||||
Do not enable capacity-aware scheduling (CAS) on
|
||||
hybrid systems
|
||||
|
||||
intremap= [X86-64, Intel-IOMMU]
|
||||
on enable Interrupt Remapping (default)
|
||||
@ -2338,7 +2344,9 @@
|
||||
specified in the flag list (default: domain):
|
||||
|
||||
nohz
|
||||
Disable the tick when a single task runs.
|
||||
Disable the tick when a single task runs as well as
|
||||
disabling other kernel noises like having RCU callbacks
|
||||
offloaded. This is equivalent to the nohz_full parameter.
|
||||
|
||||
A residual 1Hz tick is offloaded to workqueues, which you
|
||||
need to affine to housekeeping through the global
|
||||
@ -3342,7 +3350,7 @@
|
||||
mem_encrypt=on: Activate SME
|
||||
mem_encrypt=off: Do not activate SME
|
||||
|
||||
Refer to Documentation/virt/kvm/amd-memory-encryption.rst
|
||||
Refer to Documentation/virt/kvm/x86/amd-memory-encryption.rst
|
||||
for details on when memory encryption can be activated.
|
||||
|
||||
mem_sleep_default= [SUSPEND] Default system suspend mode:
|
||||
@ -3427,7 +3435,6 @@
|
||||
srbds=off [X86,INTEL]
|
||||
ssbd=force-off [ARM64]
|
||||
tsx_async_abort=off [X86]
|
||||
vmscape=off [X86]
|
||||
|
||||
Exceptions:
|
||||
This does not have any effect on
|
||||
@ -4572,6 +4579,10 @@
|
||||
nomio [S390] Do not use MIO instructions.
|
||||
norid [S390] ignore the RID field and force use of
|
||||
one PCI domain per PCI function
|
||||
notph [PCIE] If the PCIE_TPH kernel config parameter
|
||||
is enabled, this kernel boot option can be used
|
||||
to disable PCIe TLP Processing Hints support
|
||||
system-wide.
|
||||
|
||||
pcie_aspm= [PCIE] Forcibly enable or ignore PCIe Active State Power
|
||||
Management.
|
||||
@ -4761,7 +4772,9 @@
|
||||
|
||||
prot_virt= [S390] enable hosting protected virtual machines
|
||||
isolated from the hypervisor (if hardware supports
|
||||
that).
|
||||
that). If enabled, the default kernel base address
|
||||
might be overridden even when Kernel Address Space
|
||||
Layout Randomization is disabled.
|
||||
Format: <bool>
|
||||
|
||||
psi= [KNL] Enable or disable pressure stall information
|
||||
@ -4889,6 +4902,10 @@
|
||||
Set maximum number of finished RCU callbacks to
|
||||
process in one batch.
|
||||
|
||||
rcutree.csd_lock_suppress_rcu_stall= [KNL]
|
||||
Do only a one-line RCU CPU stall warning when
|
||||
there is an ongoing too-long CSD-lock wait.
|
||||
|
||||
rcutree.do_rcu_barrier= [KNL]
|
||||
Request a call to rcu_barrier(). This is
|
||||
throttled so that userspace tests can safely
|
||||
@ -5112,6 +5129,15 @@
|
||||
test until boot completes in order to avoid
|
||||
interference.
|
||||
|
||||
rcuscale.kfree_by_call_rcu= [KNL]
|
||||
In kernels built with CONFIG_RCU_LAZY=y, test
|
||||
call_rcu() instead of kfree_rcu().
|
||||
|
||||
rcuscale.kfree_mult= [KNL]
|
||||
Instead of allocating an object of size kfree_obj,
|
||||
allocate one of kfree_mult * sizeof(kfree_obj).
|
||||
Defaults to 1.
|
||||
|
||||
rcuscale.kfree_rcu_test= [KNL]
|
||||
Set to measure performance of kfree_rcu() flooding.
|
||||
|
||||
@ -5157,7 +5183,7 @@
|
||||
the same as for rcuscale.nreaders.
|
||||
N, where N is the number of CPUs
|
||||
|
||||
rcuscale.perf_type= [KNL]
|
||||
rcuscale.scale_type= [KNL]
|
||||
Specify the RCU implementation to test.
|
||||
|
||||
rcuscale.shutdown= [KNL]
|
||||
@ -5311,7 +5337,13 @@
|
||||
Time to wait (s) after boot before inducing stall.
|
||||
|
||||
rcutorture.stall_cpu_irqsoff= [KNL]
|
||||
Disable interrupts while stalling if set.
|
||||
Disable interrupts while stalling if set, but only
|
||||
on the first stall in the set.
|
||||
|
||||
rcutorture.stall_cpu_repeat= [KNL]
|
||||
Number of times to repeat the stall sequence,
|
||||
so that rcutorture.stall_cpu_repeat=3 will result
|
||||
in four stall sequences.
|
||||
|
||||
rcutorture.stall_gp_kthread= [KNL]
|
||||
Duration (s) of forced sleep within RCU
|
||||
@ -5557,6 +5589,12 @@
|
||||
test until boot completes in order to avoid
|
||||
interference.
|
||||
|
||||
refscale.lookup_instances= [KNL]
|
||||
Number of data elements to use for the forms of
|
||||
SLAB_TYPESAFE_BY_RCU testing. A negative number
|
||||
is negated and multiplied by nr_cpu_ids, while
|
||||
zero specifies nr_cpu_ids.
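The sign convention for refscale.lookup_instances can be written out as a small function. This is an illustrative Python sketch of the rule stated above, not the kernel implementation; the function name is hypothetical.

```python
def effective_lookup_instances(param: int, nr_cpu_ids: int) -> int:
    """Number of data elements implied by a refscale.lookup_instances value."""
    if param < 0:
        return -param * nr_cpu_ids  # negative: negate, then scale by CPU count
    if param == 0:
        return nr_cpu_ids           # zero: one element per possible CPU
    return param                    # positive: used as given
```

For example, on a system with 8 possible CPUs a value of -2 would correspond to 16 elements under this reading.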
|
||||
|
||||
refscale.loops= [KNL]
|
||||
Set the number of loops over the synchronization
|
||||
primitive under test. Increasing this number
|
||||
@ -5993,6 +6031,13 @@
|
||||
This feature may be more efficiently disabled
|
||||
using the csdlock_debug- kernel parameter.
|
||||
|
||||
smp.panic_on_ipistall= [KNL]
|
||||
If a csd_lock_timeout extends for more than
|
||||
the specified number of milliseconds, panic the
|
||||
system. By default, let CSD-lock acquisition
|
||||
take as long as it takes. Specifying 300,000
|
||||
for this value provides a 5-minute timeout.
|
||||
|
||||
smsc-ircc2.nopnp [HW] Don't use PNP to discover SMC devices
|
||||
smsc-ircc2.ircc_cfg= [HW] Device configuration I/O port
|
||||
smsc-ircc2.ircc_sir= [HW] SIR base I/O port
|
||||
@ -6064,6 +6109,8 @@
|
||||
|
||||
Selecting 'on' will also enable the mitigation
|
||||
against user space to user space task attacks.
|
||||
Selecting specific mitigation does not force enable
|
||||
user mitigations.
|
||||
|
||||
Selecting 'off' will disable both the kernel and
|
||||
the user space protections.
|
||||
@ -7105,16 +7152,6 @@
|
||||
vmpoff= [KNL,S390] Perform z/VM CP command after power off.
|
||||
Format: <command>
|
||||
|
||||
vmscape= [X86] Controls mitigation for VMscape attacks.
|
||||
VMscape attacks can leak information from a userspace
|
||||
hypervisor to a guest via speculative side-channels.
|
||||
|
||||
off - disable the mitigation
|
||||
ibpb - use Indirect Branch Prediction Barrier
|
||||
(IBPB) mitigation (default)
|
||||
force - force vulnerability detection even on
|
||||
unaffected processors
|
||||
|
||||
vsyscall= [X86-64]
|
||||
Controls the behavior of vsyscalls (i.e. calls to
|
||||
fixed addresses of 0xffffffffff600x00 from legacy
|
||||
|
||||
@ -378,6 +378,13 @@ Note that the number of overcommit and reserve pages remain global quantities,
|
||||
as we don't know until fault time, when the faulting task's mempolicy is
|
||||
applied, from which node the huge page allocation will be attempted.
|
||||
|
||||
A hugetlb page may be migrated between the per-node hugepages pools in the following
|
||||
scenarios: memory offline, memory failure, longterm pinning, syscalls(mbind,
|
||||
migrate_pages and move_pages), alloc_contig_range() and alloc_contig_pages().
|
||||
Currently, only memory offline, memory failure and syscalls allow falling back to allocating
|
||||
a new hugetlb on a different node if the current node is unable to allocate during
|
||||
hugetlb migration, which means that these 3 cases can break the per-node hugepages pool.
|
||||
|
||||
.. _using_huge_pages:
|
||||
|
||||
Using Huge Pages
|
||||
|
||||
@ -34,7 +34,7 @@ strongly-ordered (SO) PCIE write traffic to local/remote memory. Please see
|
||||
traffic coverage.
|
||||
|
||||
The events and configuration options of this PMU device are described in sysfs,
|
||||
see /sys/bus/event_sources/devices/nvidia_scf_pmu_<socket-id>.
|
||||
see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.
|
||||
|
||||
Example usage:
|
||||
|
||||
@ -66,7 +66,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
|
||||
the PMU traffic coverage.
|
||||
|
||||
The events and configuration options of this PMU device are described in sysfs,
|
||||
see /sys/bus/event_sources/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
|
||||
see /sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
|
||||
|
||||
Example usage:
|
||||
|
||||
@ -86,6 +86,22 @@ Example usage:
|
||||
|
||||
perf stat -a -e nvidia_nvlink_c2c0_pmu_3/event=0x0/
|
||||
|
||||
The NVLink-C2C has two ports that can be connected to one GPU (occupying both
|
||||
ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
|
||||
parameter to select the port(s) to monitor. Each bit represents the port number,
|
||||
e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
|
||||
PMU will monitor both ports by default if not specified.
|
||||
|
||||
Example for port filtering:
|
||||
|
||||
* Count event id 0x0 from the GPU connected with socket 0 on port 0::
|
||||
|
||||
perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x1/
|
||||
|
||||
* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
|
||||
|
||||
perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x3/
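The "port" bitmap can be derived from a list of port numbers. The Python helpers below are a sketch with hypothetical names; only the event string layout is taken from the examples above.

```python
def port_bitmap(ports) -> int:
    """Build the "port" bitmap: bit N set selects port N (ports 0 and 1 only)."""
    mask = 0
    for port in ports:
        if port not in (0, 1):
            raise ValueError(f"NVLink-C2C has ports 0 and 1, got {port}")
        mask |= 1 << port
    return mask


def c2c_event(pmu: str, socket: int, event: int, ports) -> str:
    """Format a perf event string such as the ones shown above."""
    return f"{pmu}_{socket}/event={event:#x},port={port_bitmap(ports):#x}/"


# Both ports of the NVLink-C2C0 PMU on socket 0.
assert c2c_event("nvidia_nvlink_c2c0_pmu", 0, 0x0, [0, 1]) == \
    "nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x3/"
```

The resulting string would be passed to ``perf stat -a -e``.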
|
||||
|
||||
NVLink-C2C1 PMU
|
||||
-------------------
|
||||
|
||||
@ -96,7 +112,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
|
||||
the PMU traffic coverage.
|
||||
|
||||
The events and configuration options of this PMU device are described in sysfs,
|
||||
see /sys/bus/event_sources/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
|
||||
see /sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
|
||||
|
||||
Example usage:
|
||||
|
||||
@ -116,6 +132,22 @@ Example usage:
|
||||
|
||||
perf stat -a -e nvidia_nvlink_c2c1_pmu_3/event=0x0/
|
||||
|
||||
The NVLink-C2C has two ports that can be connected to one GPU (occupying both
|
||||
ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
|
||||
parameter to select the port(s) to monitor. Each bit represents the port number,
|
||||
e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
|
||||
PMU will monitor both ports by default if not specified.
|
||||
|
||||
Example for port filtering:
|
||||
|
||||
* Count event id 0x0 from the GPU connected with socket 0 on port 0::
|
||||
|
||||
perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x1/
|
||||
|
||||
* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
|
||||
|
||||
perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x3/
|
||||
|
||||
CNVLink PMU
|
||||
---------------
|
||||
|
||||
@ -125,13 +157,14 @@ to local memory. For PCIE traffic, this PMU captures read and relaxed ordered
|
||||
for more info about the PMU traffic coverage.
|
||||
|
||||
The events and configuration options of this PMU device are described in sysfs,
|
||||
see /sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>.
|
||||
see /sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>.
|
||||
|
||||
Each SoC socket can be connected to one or more sockets via CNVLink. The user can
|
||||
use "rem_socket" bitmap parameter to select the remote socket(s) to monitor.
|
||||
Each bit represents the socket number, e.g. "rem_socket=0xE" corresponds to
|
||||
socket 1 to 3.
|
||||
/sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
|
||||
socket 1 to 3. The PMU will monitor all remote sockets by default if not
|
||||
specified.
|
||||
/sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
|
||||
shows the valid bits that can be set in the "rem_socket" parameter.
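To check which sockets a given bitmap selects, the set bits can be listed. This is a small Python sketch; the function name is hypothetical, and the same decoding applies to any of the bitmap parameters described in this document.

```python
def selected_indices(mask: int) -> list:
    """Return the bit positions set in a bitmap parameter, lowest first."""
    if mask < 0:
        raise ValueError("bitmap must be non-negative")
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# "rem_socket=0xE" selects remote sockets 1 to 3.
assert selected_indices(0xE) == [1, 2, 3]
```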
|
||||
|
||||
The PMU can not distinguish the remote traffic initiator, therefore it does not
|
||||
@ -165,12 +198,13 @@ local/remote memory. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section
|
||||
for more info about the PMU traffic coverage.
|
||||
|
||||
The events and configuration options of this PMU device are described in sysfs,
|
||||
see /sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>.
|
||||
see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>.
|
||||
|
||||
Each SoC socket can support multiple root ports. The user can use
|
||||
"root_port" bitmap parameter to select the port(s) to monitor, i.e.
|
||||
"root_port=0xF" corresponds to root port 0 to 3.
|
||||
/sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
|
||||
"root_port=0xF" corresponds to root port 0 to 3. The PMU will monitor all root
|
||||
ports by default if not specified.
|
||||
/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
|
||||
shows the valid bits that can be set in the "root_port" parameter.
|
||||
|
||||
Example usage:
|
||||
|
||||
@ -230,8 +230,8 @@ with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond
|
||||
to the request from AMD P-States.
|
||||
|
||||
|
||||
User Space Interface in ``sysfs``
|
||||
==================================
|
||||
User Space Interface in ``sysfs`` - Per-policy control
|
||||
======================================================
|
||||
|
||||
``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
|
||||
control its functionality at the system level. They are located in the
|
||||
@ -262,6 +262,52 @@ lowest non-linear performance in `AMD CPPC Performance Capability
|
||||
<perf_cap_>`_.)
|
||||
This attribute is read-only.
|
||||
|
||||
``amd_pstate_hw_prefcore``
|
||||
|
||||
Whether the platform supports the preferred core feature and it has been
|
||||
enabled. This attribute is read-only.
|
||||
|
||||
``amd_pstate_prefcore_ranking``
|
||||
|
||||
The performance ranking of the core. This number doesn't have any unit, but
|
||||
larger numbers are preferred at the time of reading. This can change at
|
||||
runtime based on platform conditions. This attribute is read-only.
|
||||
|
||||
``energy_performance_available_preferences``
|
||||
|
||||
A list of all the supported EPP preferences that could be used for
|
||||
``energy_performance_preference`` on this system.
|
||||
These profiles represent different hints that are provided
|
||||
to the low-level firmware about the user's desired energy vs efficiency
|
||||
tradeoff. ``default`` means that the EPP value is set by platform
|
||||
firmware. This attribute is read-only.
|
||||
|
||||
``energy_performance_preference``
|
||||
|
||||
The current energy performance preference can be read from this attribute,
|
||||
and the user can change the current preference according to energy or performance needs.
|
||||
The list of supported profiles can be read from the
|
||||
||||
``energy_performance_available_preferences`` attribute. All the profiles are
|
||||
integer values between 0 and 255 when the EPP feature is enabled by platform
|
||||
firmware. If the EPP feature is disabled, the driver will ignore the written value.
|
||||
This attribute is read-write.
|
||||
|
||||
``boost``
|
||||
The `boost` sysfs attribute provides control over the CPU core
|
||||
performance boost, allowing users to manage the maximum frequency limitation
|
||||
of the CPU. This attribute can be used to enable or disable the boost feature
|
||||
on individual CPUs.
|
||||
|
||||
When the boost feature is enabled, the CPU can dynamically increase its frequency
|
||||
beyond the base frequency, providing enhanced performance for demanding workloads.
|
||||
On the other hand, disabling the boost feature restricts the CPU to operate at the
|
||||
base frequency, which may be desirable in certain scenarios to prioritize power
|
||||
efficiency or manage temperature.
|
||||
|
||||
To manipulate the `boost` attribute, users can write a value of `0` to disable the
|
||||
boost or `1` to enable it, for the respective CPU using the sysfs path
|
||||
`/sys/devices/system/cpu/cpuX/cpufreq/boost`, where `X` represents the CPU number.
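The per-CPU path can be built and used from a script as follows. This is a Python sketch with hypothetical helper names; the ``sysfs_root`` parameter exists only so the helpers can be exercised against a scratch directory, and writing the real file requires root privileges.

```python
from pathlib import Path


def boost_path(cpu: int, sysfs_root: Path = Path("/sys")) -> Path:
    """Path of the per-CPU boost attribute described above."""
    return sysfs_root / "devices/system/cpu" / f"cpu{cpu}" / "cpufreq/boost"


def set_boost(cpu: int, enabled: bool, sysfs_root: Path = Path("/sys")) -> None:
    """Write 1 to enable boost on one CPU, 0 to disable it."""
    boost_path(cpu, sysfs_root).write_text("1" if enabled else "0")


def get_boost(cpu: int, sysfs_root: Path = Path("/sys")) -> bool:
    """Return True if the attribute currently reads back as 1."""
    return boost_path(cpu, sysfs_root).read_text().strip() == "1"
```

Calling ``set_boost(3, False)`` would correspond to writing ``0`` to the boost file of CPU 3.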
|
||||
|
||||
Other performance and frequency values can be read back from
|
||||
``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.
|
||||
|
||||
@ -280,8 +326,35 @@ module which supports the new AMD P-States mechanism on most of the future AMD
|
||||
platforms. The AMD P-States mechanism is the more performant and energy-efficient
|
||||
frequency management method on AMD processors.
|
||||
|
||||
Kernel Module Options for ``amd-pstate``
|
||||
=========================================
|
||||
|
||||
``amd-pstate`` Driver Operation Modes
|
||||
======================================
|
||||
|
||||
``amd_pstate`` CPPC has 3 operation modes: autonomous (active) mode,
|
||||
non-autonomous (passive) mode and guided autonomous (guided) mode.
|
||||
Active/passive/guided mode can be chosen by different kernel parameters.
|
||||
|
||||
- In autonomous mode, platform ignores the desired performance level request
|
||||
and takes into account only the values set to the minimum, maximum and energy
|
||||
performance preference registers.
|
||||
- In non-autonomous mode, platform gets desired performance level
|
||||
from OS directly through Desired Performance Register.
|
||||
- In guided-autonomous mode, platform sets operating performance level
|
||||
autonomously according to the current workload and within the limits set by
|
||||
OS through min and max performance registers.
|
||||
|
||||
Active Mode
|
||||
------------
|
||||
|
||||
``amd_pstate=active``
|
||||
|
||||
This is the low-level firmware control mode which is implemented by ``amd_pstate_epp``
|
||||
driver with ``amd_pstate=active`` passed to the kernel in the command line.
|
||||
In this mode, the ``amd_pstate_epp`` driver provides a hint to the CPPC firmware about whether
|
||||
software wants to bias toward performance (0x0) or energy efficiency (0xff).
|
||||
The CPPC power algorithm then evaluates the runtime workload and adjusts the real-time
|
||||
core frequency according to the power supply, thermal conditions, core voltage and some other
|
||||
hardware conditions.
|
||||
|
||||
Passive Mode
|
||||
------------
|
||||
@ -297,6 +370,102 @@ to the Performance Reduction Tolerance register. Above the nominal performance l
|
||||
processor must provide at least nominal performance requested and go higher if current
|
||||
operating conditions allow.
|
||||
|
||||
Guided Mode
|
||||
-----------
|
||||
|
||||
``amd_pstate=guided``
|
||||
|
||||
If ``amd_pstate=guided`` is passed on the kernel command line, this mode
|
||||
is activated. In this mode, driver requests minimum and maximum performance
|
||||
level and the platform autonomously selects a performance level in this range
|
||||
and appropriate to the current workload.
|
||||
|
||||
``amd-pstate`` Preferred Core
|
||||
=================================
|
||||
|
||||
The core frequency is subject to process variation in semiconductors.
|
||||
Not all cores are able to reach the maximum frequency respecting the
|
||||
infrastructure limits. Consequently, AMD has redefined the concept of
|
||||
maximum frequency of a part. This means that a fraction of cores can reach
|
||||
maximum frequency. To find the best process scheduling policy for a given
|
||||
scenario, OS needs to know the core ordering informed by the platform through
|
||||
highest performance capability register of the CPPC interface.
|
||||
|
||||
``amd-pstate`` preferred core enables the scheduler to prefer scheduling on
|
||||
cores that can achieve a higher frequency with lower voltage. The preferred
|
||||
core rankings can dynamically change based on the workload, platform conditions,
|
||||
thermals and ageing.
|
||||
|
||||
The priority metric will be initialized by the ``amd-pstate`` driver. The ``amd-pstate``
|
||||
driver will also determine whether or not ``amd-pstate`` preferred core is
|
||||
supported by the platform.
|
||||
|
||||
``amd-pstate`` driver will provide an initial core ordering when the system boots.
|
||||
The platform uses the CPPC interfaces to communicate the core ranking to the
|
||||
operating system and scheduler to make sure that OS is choosing the cores
|
||||
with the highest performance first when scheduling a process. When ``amd-pstate``
|
||||
driver receives a message with the highest performance change, it will
|
||||
update the core ranking and set the cpu's priority.
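The effect of the ranking on core preference can be illustrated with a simple ordering. This Python sketch assumes only what is stated in this document, namely that larger ranking values are preferred; the function name and the sample values are hypothetical, and the real scheduler integration is more involved than a sort.

```python
def preferred_order(rankings: dict) -> list:
    """Order CPU numbers from most to least preferred.

    `rankings` maps a CPU number to its unitless ranking value; ties are
    broken by CPU number so the result is deterministic.
    """
    return sorted(rankings, key=lambda cpu: (-rankings[cpu], cpu))


# Hypothetical rankings for four CPUs.
assert preferred_order({0: 166, 1: 231, 2: 231, 3: 196}) == [1, 2, 3, 0]
```

Because the platform can change rankings at runtime, any such ordering would have to be recomputed whenever a ranking update arrives.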
|
||||
|
||||
``amd-pstate`` Preferred Core Switch
|
||||
=====================================
|
||||
Kernel Parameters
|
||||
-----------------
|
||||
|
||||
``amd-pstate`` preferred core has two states: enable and disable.
|
||||
Enable/disable states can be chosen by different kernel parameters.
|
||||
``amd-pstate`` preferred core is enabled by default.
|
||||
|
||||
``amd_prefcore=disable``
|
||||
|
||||
For systems that support ``amd-pstate`` preferred core, the core rankings will
|
||||
always be advertised by the platform. But OS can choose to ignore that via the
|
||||
kernel parameter ``amd_prefcore=disable``.
|
||||
|
||||
User Space Interface in ``sysfs`` - General
|
||||
===========================================
|
||||
|
||||
Global Attributes
|
||||
-----------------
|
||||
|
||||
``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
|
||||
control its functionality at the system level. They are located in the
|
||||
``/sys/devices/system/cpu/amd_pstate/`` directory and affect all CPUs.
|
||||
|
||||
``status``
|
||||
Operation mode of the driver: "active", "passive", "guided" or "disable".
|
||||
|
||||
"active"
|
||||
The driver is functional and in the ``active mode``
|
||||
|
||||
"passive"
|
||||
The driver is functional and in the ``passive mode``
|
||||
|
||||
"guided"
|
||||
The driver is functional and in the ``guided mode``
|
||||
|
||||
"disable"
|
||||
The driver is unregistered and not functional now.
|
||||
|
||||
This attribute can be written to in order to change the driver's
|
||||
operation mode or to unregister it. The string written to it must be
|
||||
one of the possible values of it and, if successful, writing one of
|
||||
these values to the sysfs file will cause the driver to switch over
|
||||
to the operation mode represented by that string - or to be
|
||||
unregistered in the "disable" case.
|
||||
|
||||
``prefcore``
|
||||
Preferred core state of the driver: "enabled" or "disabled".
|
||||
|
||||
"enabled"
|
||||
Enable the ``amd-pstate`` preferred core.
|
||||
|
||||
"disabled"
|
||||
Disable the ``amd-pstate`` preferred core.
|
||||
|
||||
|
||||
This attribute is read-only to check the state of preferred core set
|
||||
by the kernel parameter.
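
As a minimal sketch of how the global attributes described above could be inspected from user space, the following Python snippet reads ``status`` and ``prefcore`` from the directory named in the text. The helper name ``read_global_attrs`` is hypothetical and not part of any kernel tooling; a missing or unreadable file is reported as ``None``.

```python
from pathlib import Path

# Directory named in the text above; it only exists when amd-pstate is available.
AMD_PSTATE_DIR = Path("/sys/devices/system/cpu/amd_pstate")


def read_global_attrs(base=AMD_PSTATE_DIR):
    """Return the contents of the ``status`` and ``prefcore`` attributes.

    Hypothetical helper for illustration: a value is None when the
    attribute file is missing or cannot be read.
    """
    attrs = {}
    for name in ("status", "prefcore"):
        try:
            attrs[name] = (Path(base) / name).read_text().strip()
        except OSError:
            attrs[name] = None
    return attrs
```

Calling ``read_global_attrs()`` on a system without the driver simply returns ``None`` for both entries, so the sketch is safe to run anywhere.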
|
||||
|
||||
``cpupower`` tool support for ``amd-pstate``
|
||||
===============================================
|
||||
@ -405,37 +574,55 @@ Unit Tests for amd-pstate
|
||||
|
||||
1. Test case descriptions
|
||||
|
||||
1). Basic tests
|
||||
|
||||
Test prerequisite and basic functions for the ``amd-pstate`` driver.
|
||||
|
||||
+---------+--------------------------------+------------------------------------------------------------------------------------+
|
||||
| Index | Functions | Description |
|
||||
+=========+================================+====================================================================================+
|
||||
| 0 | amd_pstate_ut_acpi_cpc_valid || Check whether the _CPC object is present in SBIOS. |
|
||||
| 1 | amd_pstate_ut_acpi_cpc_valid || Check whether the _CPC object is present in SBIOS. |
|
||||
| | || |
|
||||
| | || The detail refer to `Processor Support <processor_support_>`_. |
|
||||
+---------+--------------------------------+------------------------------------------------------------------------------------+
|
||||
| 1 | amd_pstate_ut_check_enabled || Check whether AMD P-State is enabled. |
|
||||
| 2 | amd_pstate_ut_check_enabled || Check whether AMD P-State is enabled. |
|
||||
| | || |
|
||||
| | || AMD P-States and ACPI hardware P-States always can be supported in one processor. |
|
||||
| | | But AMD P-States has the higher priority and if it is enabled with |
|
||||
| | | :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond to the |
|
||||
| | | request from AMD P-States. |
|
||||
+---------+--------------------------------+------------------------------------------------------------------------------------+
|
||||
| 2 | amd_pstate_ut_check_perf || Check if the each performance values are reasonable. |
|
||||
| 3 | amd_pstate_ut_check_perf || Check if the each performance values are reasonable. |
|
||||
| | || highest_perf >= nominal_perf > lowest_nonlinear_perf > lowest_perf > 0. |
|
||||
+---------+--------------------------------+------------------------------------------------------------------------------------+
|
||||
| 3 | amd_pstate_ut_check_freq || Check if the each frequency values and max freq when set support boost mode |
|
||||
| 4 | amd_pstate_ut_check_freq || Check if the each frequency values and max freq when set support boost mode |
|
||||
| | | are reasonable. |
|
||||
| | || max_freq >= nominal_freq > lowest_nonlinear_freq > min_freq > 0 |
|
||||
| | || If boost is not active but supported, this maximum frequency will be larger than |
|
||||
| | | the one in ``cpuinfo``. |
|
||||
+---------+--------------------------------+------------------------------------------------------------------------------------+
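
The ordering conditions listed for ``amd_pstate_ut_check_perf`` and ``amd_pstate_ut_check_freq`` can be illustrated with a small Python sketch. This is not the kernel's test code; the function names are hypothetical and the inputs are assumed to be plain integers supplied by the caller.

```python
def perf_values_reasonable(highest_perf, nominal_perf,
                           lowest_nonlinear_perf, lowest_perf):
    """Check highest_perf >= nominal_perf > lowest_nonlinear_perf > lowest_perf > 0."""
    return highest_perf >= nominal_perf > lowest_nonlinear_perf > lowest_perf > 0


def freq_values_reasonable(max_freq, nominal_freq,
                           lowest_nonlinear_freq, min_freq):
    """Check max_freq >= nominal_freq > lowest_nonlinear_freq > min_freq > 0."""
    return max_freq >= nominal_freq > lowest_nonlinear_freq > min_freq > 0
```

Python's chained comparisons map directly onto the inequalities in the table, so each function is a one-line restatement of the documented condition.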
|
||||
|
||||
2). Tbench test
|
||||
|
||||
Test and monitor the CPU changes when running the tbench benchmark under the specified governor.
|
||||
These changes include desired performance, frequency, load, performance, energy, etc.
|
||||
The specified governor is ondemand or schedutil.
|
||||
Tbench can also be tested on the ``acpi-cpufreq`` kernel driver for comparison.
|
||||
|
||||
3). Gitsource test
|
||||
|
||||
Test and monitor the CPU changes when running the gitsource benchmark under the specified governor.
|
||||
These changes include desired performance, frequency, load, time, energy, etc.
|
||||
The specified governor is ondemand or schedutil.
|
||||
Gitsource can also be tested on the ``acpi-cpufreq`` kernel driver for comparison.
|
||||
|
||||
#. How to execute the tests
|
||||
|
||||
We use a test module in the kselftest framework to implement it.
|
||||
We create amd-pstate-ut module and tie it into kselftest.(for
|
||||
We create the ``amd-pstate-ut`` module and tie it into kselftest (for
|
||||
details refer to Linux Kernel Selftests [4]_).
|
||||
|
||||
1. Build
|
||||
1). Build
|
||||
|
||||
+ open the :c:macro:`CONFIG_X86_AMD_PSTATE` configuration option.
|
||||
+ set the :c:macro:`CONFIG_X86_AMD_PSTATE_UT` configuration option to M.
|
||||
@ -445,23 +632,159 @@ Unit Tests for amd-pstate
|
||||
$ cd linux
|
||||
$ make -C tools/testing/selftests
|
||||
|
||||
#. Installation & Steps ::
|
||||
+ make perf ::
|
||||
|
||||
$ cd tools/perf/
|
||||
$ make
|
||||
|
||||
|
||||
2). Installation & Steps ::
|
||||
|
||||
$ make -C tools/testing/selftests install INSTALL_PATH=~/kselftest
|
||||
$ cp tools/perf/perf /usr/bin/perf
|
||||
$ sudo ./kselftest/run_kselftest.sh -c amd-pstate
|
||||
TAP version 13
|
||||
1..1
|
||||
# selftests: amd-pstate: amd-pstate-ut.sh
|
||||
# amd-pstate-ut: ok
|
||||
ok 1 selftests: amd-pstate: amd-pstate-ut.sh
|
||||
|
||||
#. Results ::
|
||||
3). Specified test case ::
|
||||
|
||||
$ dmesg | grep "amd_pstate_ut" | tee log.txt
|
||||
[12977.570663] amd_pstate_ut: 1 amd_pstate_ut_acpi_cpc_valid success!
|
||||
[12977.570673] amd_pstate_ut: 2 amd_pstate_ut_check_enabled success!
|
||||
[12977.571207] amd_pstate_ut: 3 amd_pstate_ut_check_perf success!
|
||||
[12977.571212] amd_pstate_ut: 4 amd_pstate_ut_check_freq success!
|
||||
$ cd ~/kselftest/amd-pstate
|
||||
$ sudo ./run.sh -t basic
|
||||
$ sudo ./run.sh -t tbench
|
||||
$ sudo ./run.sh -t tbench -m acpi-cpufreq
|
||||
$ sudo ./run.sh -t gitsource
|
||||
$ sudo ./run.sh -t gitsource -m acpi-cpufreq
|
||||
$ ./run.sh --help
|
||||
./run.sh: illegal option -- -
|
||||
Usage: ./run.sh [OPTION...]
|
||||
[-h <help>]
|
||||
[-o <output-file-for-dump>]
|
||||
[-c <all: All testing,
|
||||
basic: Basic testing,
|
||||
tbench: Tbench testing,
|
||||
gitsource: Gitsource testing.>]
|
||||
[-t <tbench time limit>]
|
||||
[-p <tbench process number>]
|
||||
[-l <loop times for tbench>]
|
||||
[-i <amd tracer interval>]
|
||||
[-m <comparative test: acpi-cpufreq>]
|
||||
|
||||
|
||||
4). Results
|
||||
|
||||
+ basic
|
||||
|
||||
When the test finishes, you will get the following log info ::
|
||||
|
||||
$ dmesg | grep "amd_pstate_ut" | tee log.txt
|
||||
[12977.570663] amd_pstate_ut: 1 amd_pstate_ut_acpi_cpc_valid success!
|
||||
[12977.570673] amd_pstate_ut: 2 amd_pstate_ut_check_enabled success!
|
||||
[12977.571207] amd_pstate_ut: 3 amd_pstate_ut_check_perf success!
|
||||
[12977.571212] amd_pstate_ut: 4 amd_pstate_ut_check_freq success!
|
||||
|
||||
+ tbench
|
||||
|
||||
When the test finishes, you will get selftest.tbench.csv and png images.
|
||||
The selftest.tbench.csv file contains the raw data and the drop of the comparative test.
|
||||
The png images show the performance, energy and performance per watt of each test.
|
||||
Open selftest.tbench.csv :
|
||||
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ Governor | Round | Des-perf | Freq | Load | Performance | Energy | Performance Per Watt |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ Unit | | | GHz | | MB/s | J | MB/J |
|
||||
+=================================================+==============+==========+=========+==========+=============+=========+======================+
|
||||
+ amd-pstate-ondemand | 1 | | | | 2504.05 | 1563.67 | 158.5378 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-ondemand | 2 | | | | 2243.64 | 1430.32 | 155.2941 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-ondemand | 3 | | | | 2183.88 | 1401.32 | 154.2860 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-ondemand | Average | | | | 2310.52 | 1465.1 | 156.1268 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-schedutil | 1 | 165.329 | 1.62257 | 99.798 | 2136.54 | 1395.26 | 151.5971 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-schedutil | 2 | 166 | 1.49761 | 99.9993 | 2100.56 | 1380.5 | 150.6377 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-schedutil | 3 | 166 | 1.47806 | 99.9993 | 2084.12 | 1375.76 | 149.9737 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-schedutil | Average | 165.776 | 1.53275 | 99.9322 | 2107.07 | 1383.84 | 150.7399 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand | 1 | | | | 2529.9 | 1564.4 | 160.0997 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand | 2 | | | | 2249.76 | 1432.97 | 155.4297 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand | 3 | | | | 2181.46 | 1406.88 | 153.5060 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand | Average | | | | 2320.37 | 1468.08 | 156.4741 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil | 1 | | | | 2137.64 | 1385.24 | 152.7723 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil | 2 | | | | 2107.05 | 1372.23 | 152.0138 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil | 3 | | | | 2085.86 | 1365.35 | 151.2433 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil | Average | | | | 2110.18 | 1374.27 | 152.0136 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand VS acpi-cpufreq-schedutil | Comprison(%) | | | | -9.0584 | -6.3899 | -2.8506 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-ondemand VS amd-pstate-schedutil | Comprison(%) | | | | 8.8053 | -5.5463 | -3.4503 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand VS amd-pstate-ondemand | Comprison(%) | | | | -0.4245 | -0.2029 | -0.2219 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil VS amd-pstate-schedutil | Comprison(%) | | | | -0.1473 | 0.6963 | -0.8378 |
|
||||
+-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
|
||||
|
||||
+ gitsource
|
||||
|
||||
When the test finishes, you will get selftest.gitsource.csv and png images.
|
||||
The selftest.gitsource.csv file contains the raw data and the drop of the comparative test.
|
||||
The png images show the performance, energy and performance per watt of each test.
|
||||
Open selftest.gitsource.csv :
|
||||
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ Governor | Round | Des-perf | Freq | Load | Time | Energy | Performance Per Watt |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ Unit | | | GHz | | s | J | 1/J |
|
||||
+=================================================+==============+==========+==========+==========+=============+=========+======================+
|
||||
+ amd-pstate-ondemand | 1 | 50.119 | 2.10509 | 23.3076 | 475.69 | 865.78 | 0.001155027 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-ondemand | 2 | 94.8006 | 1.98771 | 56.6533 | 467.1 | 839.67 | 0.001190944 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-ondemand | 3 | 76.6091 | 2.53251 | 43.7791 | 467.69 | 855.85 | 0.001168429 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-ondemand | Average | 73.8429 | 2.20844 | 41.2467 | 470.16 | 853.767 | 0.001171279 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-schedutil | 1 | 165.919 | 1.62319 | 98.3868 | 464.17 | 866.8 | 0.001153668 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-schedutil | 2 | 165.97 | 1.31309 | 99.5712 | 480.15 | 880.4 | 0.001135847 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-schedutil | 3 | 165.973 | 1.28448 | 99.9252 | 481.79 | 867.02 | 0.001153375 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-schedutil | Average | 165.954 | 1.40692 | 99.2944 | 475.37 | 871.407 | 0.001147569 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand | 1 | | | | 2379.62 | 742.96 | 0.001345967 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand | 2 | | | | 441.74 | 817.49 | 0.001223256 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand | 3 | | | | 455.48 | 820.01 | 0.001219497 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand | Average | | | | 425.613 | 793.487 | 0.001260260 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil | 1 | | | | 459.69 | 838.54 | 0.001192548 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil | 2 | | | | 466.55 | 830.89 | 0.001203528 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil | 3 | | | | 470.38 | 837.32 | 0.001194286 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil | Average | | | | 465.54 | 835.583 | 0.001196769 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand VS acpi-cpufreq-schedutil | Comprison(%) | | | | 9.3810 | 5.3051 | -5.0379 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ amd-pstate-ondemand VS amd-pstate-schedutil | Comprison(%) | 124.7392 | -36.2934 | 140.7329 | 1.1081 | 2.0661 | -2.0242 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-ondemand VS amd-pstate-ondemand | Comprison(%) | | | | 10.4665 | 7.5968 | -7.0605 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
+ acpi-cpufreq-schedutil VS amd-pstate-schedutil | Comprison(%) | | | | 2.1115 | 4.2873 | -4.1110 |
|
||||
+-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
|
||||
|
||||
Reference
|
||||
===========
|
||||
|
||||
@ -248,6 +248,20 @@ are the following:
|
||||
If that frequency cannot be determined, this attribute should not
|
||||
be present.
|
||||
|
||||
``cpuinfo_avg_freq``
|
||||
An average frequency (in kHz) of all CPUs belonging to a given policy,
|
||||
derived from a hardware provided feedback and reported on a time frame
|
||||
spanning at most a few milliseconds.
|
||||
|
||||
This is expected to be based on the frequency the hardware actually runs
|
||||
at and, as such, might require specialised hardware support (such as AMU
|
||||
extension on ARM). If one cannot be determined, this attribute should
|
||||
not be present.
|
||||
|
||||
Note that a failed attempt to retrieve the current frequency for a given
|
||||
CPU(s) will result in an appropriate error, e.g. EAGAIN for a CPU that
|
||||
remains idle (raised on ARM).
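
A reader of this attribute therefore has to be prepared for both a missing file and a transient error. The following Python sketch shows one way to handle that; the helper name ``read_avg_freq_khz`` is hypothetical, and the choice to map both cases to ``None`` is an assumption made for brevity.

```python
import errno
from pathlib import Path


def read_avg_freq_khz(policy_dir):
    """Return ``cpuinfo_avg_freq`` in kHz for a cpufreq policy directory.

    Hypothetical helper: returns None when the attribute is not present
    or when the read fails with EAGAIN (e.g. for an idle CPU).
    """
    path = Path(policy_dir) / "cpuinfo_avg_freq"
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None  # attribute not exposed for this policy
    except OSError as err:
        if err.errno == errno.EAGAIN:
            return None  # frequency could not be determined right now
        raise
```

A caller would typically pass a policy directory such as ``/sys/devices/system/cpu/cpufreq/policy0`` and retry later when ``None`` is returned.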
|
||||
|
||||
``cpuinfo_max_freq``
|
||||
Maximum possible operating frequency the CPUs belonging to this policy
|
||||
can run at (in kHz).
|
||||
@ -289,7 +303,8 @@ are the following:
|
||||
Some architectures (e.g. ``x86``) may attempt to provide information
|
||||
more precisely reflecting the current CPU frequency through this
|
||||
attribute, but that still may not be the exact current CPU frequency as
|
||||
seen by the hardware at the moment.
|
||||
seen by the hardware at the moment. This behavior, though, is only
|
||||
available via the :c:macro:`CPUFREQ_ARCH_CUR_FREQ` option.
|
||||
|
||||
``scaling_driver``
|
||||
The scaling driver currently in use.
|
||||
|
||||
@ -269,61 +269,56 @@ Namely, when invoked to select an idle state for a CPU (i.e. an idle state that
|
||||
the CPU will ask the processor hardware to enter), it attempts to predict the
|
||||
idle duration and uses the predicted value for idle state selection.
|
||||
|
||||
It first obtains the time until the closest timer event with the assumption
|
||||
that the scheduler tick will be stopped. That time, referred to as the *sleep
|
||||
length* in what follows, is the upper bound on the time before the next CPU
|
||||
wakeup. It is used to determine the sleep length range, which in turn is needed
|
||||
to get the sleep length correction factor.
|
||||
It first uses a simple pattern recognition algorithm to obtain a preliminary
|
||||
idle duration prediction. Namely, it saves the last 8 observed idle duration
|
||||
values and, when predicting the idle duration next time, it computes the average
|
||||
and variance of them. If the variance is small (smaller than 400 square
|
||||
milliseconds) or it is small relative to the average (the average is greater
|
||||
than 6 times the standard deviation), the average is regarded as the "typical
|
||||
interval" value. Otherwise, either the longest or the shortest (depending on
|
||||
which one is farther from the average) of the saved observed idle duration
|
||||
values is discarded and the computation is repeated for the remaining ones.
|
||||
|
||||
The ``menu`` governor maintains two arrays of sleep length correction factors.
|
||||
One of them is used when tasks previously running on the given CPU are waiting
|
||||
for some I/O operations to complete and the other one is used when that is not
|
||||
the case. Each array contains several correction factor values that correspond
|
||||
to different sleep length ranges organized so that each range represented in the
|
||||
array is approximately 10 times wider than the previous one.
|
||||
Again, if the variance of them is small (in the above sense), the average is
|
||||
taken as the "typical interval" value and so on, until either the "typical
|
||||
interval" is determined or too many data points are disregarded. In the latter
|
||||
case, if the size of the set of data points still under consideration is
|
||||
sufficiently large, the next idle duration is not likely to be above the largest
|
||||
idle duration value still in that set, so that value is taken as the predicted
|
||||
next idle duration. Finally, if the set of data points still under
|
||||
consideration is too small, no prediction is made.
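
The pattern recognition step described above can be sketched in Python as follows. This is an illustration of the described procedure, not the governor's code: the function name, the ``min_points`` cut-off standing in for "too many data points are disregarded", and the unit-free thresholds are assumptions, and the fallback to the largest remaining value is deliberately left out.

```python
from math import sqrt
from statistics import fmean, pvariance


def typical_interval(samples, var_limit=400.0, ratio=6.0, min_points=6):
    """Return the "typical interval" of the last 8 samples, or None.

    Assumptions for illustration: ``min_points`` is a made-up threshold
    for giving up, and ``var_limit`` is in the squared unit of ``samples``.
    """
    points = list(samples)[-8:]
    while len(points) >= min_points:
        avg = fmean(points)
        var = pvariance(points, mu=avg)
        # Small variance, or variance small relative to the average.
        if var < var_limit or avg > ratio * sqrt(var):
            return avg
        # Drop whichever extreme is farther from the average and retry.
        lowest, highest = min(points), max(points)
        points.remove(highest if highest - avg >= avg - lowest else lowest)
    return None
```

For example, seven samples of 100 plus a single outlier of 10000 yield 100 once the outlier is discarded, while widely scattered samples yield ``None``.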
|
||||
|
||||
If the preliminary prediction of the next idle duration computed this way is
|
||||
long enough, the governor obtains the time until the closest timer event with
|
||||
the assumption that the scheduler tick will be stopped. That time, referred to
|
||||
as the *sleep length* in what follows, is the upper bound on the time before the
|
||||
next CPU wakeup. It is used to determine the sleep length range, which in turn
|
||||
is needed to get the sleep length correction factor.
|
||||
|
||||
The ``menu`` governor maintains an array containing several correction factor
|
||||
values that correspond to different sleep length ranges organized so that each
|
||||
range represented in the array is approximately 10 times wider than the previous
|
||||
one.
|
||||
|
||||
The correction factor for the given sleep length range (determined before
|
||||
selecting the idle state for the CPU) is updated after the CPU has been woken
|
||||
up and the closer the sleep length is to the observed idle duration, the closer
|
||||
to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
|
||||
The sleep length is multiplied by the correction factor for the range that it
|
||||
falls into to obtain the first approximation of the predicted idle duration.
|
||||
falls into to obtain an approximation of the predicted idle duration that is
|
||||
compared to the "typical interval" determined previously and the minimum of
|
||||
the two is taken as the final idle duration prediction.
|
||||
|
||||
Next, the governor uses a simple pattern recognition algorithm to refine its
|
||||
idle duration prediction. Namely, it saves the last 8 observed idle duration
|
||||
values and, when predicting the idle duration next time, it computes the average
|
||||
and variance of them. If the variance is small (smaller than 400 square
|
||||
milliseconds) or it is small relative to the average (the average is greater
|
||||
than 6 times the standard deviation), the average is regarded as the "typical
|
||||
interval" value. Otherwise, the longest of the saved observed idle duration
|
||||
values is discarded and the computation is repeated for the remaining ones.
|
||||
Again, if the variance of them is small (in the above sense), the average is
|
||||
taken as the "typical interval" value and so on, until either the "typical
|
||||
interval" is determined or too many data points are disregarded, in which case
|
||||
the "typical interval" is assumed to equal "infinity" (the maximum unsigned
|
||||
integer value). The "typical interval" computed this way is compared with the
|
||||
sleep length multiplied by the correction factor and the minimum of the two is
|
||||
taken as the predicted idle duration.
|
||||
|
||||
Then, the governor computes an extra latency limit to help "interactive"
|
||||
workloads. It uses the observation that if the exit latency of the selected
|
||||
idle state is comparable with the predicted idle duration, the total time spent
|
||||
in that state probably will be very short and the amount of energy to save by
|
||||
entering it will be relatively small, so likely it is better to avoid the
|
||||
overhead related to entering that state and exiting it. Thus selecting a
|
||||
shallower state is likely to be a better option then. The first approximation
|
||||
of the extra latency limit is the predicted idle duration itself which
|
||||
additionally is divided by a value depending on the number of tasks that
|
||||
previously ran on the given CPU and now they are waiting for I/O operations to
|
||||
complete. The result of that division is compared with the latency limit coming
|
||||
from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
|
||||
framework and the minimum of the two is taken as the limit for the idle states'
|
||||
exit latency.
|
||||
If the "typical interval" value is small, which means that the CPU is likely
|
||||
to be woken up soon enough, the sleep length computation is skipped as it may
|
||||
be costly and the idle duration is simply predicted to equal the "typical
|
||||
interval" value.
|
||||
|
||||
Now, the governor is ready to walk the list of idle states and choose one of
|
||||
them. For this purpose, it compares the target residency of each state with
|
||||
the predicted idle duration and the exit latency of it with the computed latency
|
||||
limit. It selects the state with the target residency closest to the predicted
|
||||
the predicted idle duration and the exit latency of it with the latency
|
||||
limit coming from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
|
||||
framework. It selects the state with the target residency closest to the predicted
|
||||
idle duration, but still below it, and exit latency that does not exceed the
|
||||
limit.
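
The selection rule in this paragraph can be sketched in Python as shown below. The ``IdleState`` type, the function name and the microsecond units are assumptions made for illustration; the sketch treats a target residency equal to the predicted duration as acceptable.

```python
from typing import NamedTuple, Optional, Sequence


class IdleState(NamedTuple):
    """Hypothetical description of one idle state (units: microseconds)."""
    name: str
    target_residency_us: int
    exit_latency_us: int


def select_state(states: Sequence[IdleState], predicted_us: int,
                 latency_limit_us: int) -> Optional[IdleState]:
    """Pick the state whose target residency is closest to, but not above,
    the predicted idle duration and whose exit latency fits the limit."""
    best = None
    for state in states:
        if state.target_residency_us > predicted_us:
            continue  # CPU would wake up before the state pays off
        if state.exit_latency_us > latency_limit_us:
            continue  # wakeup would take longer than allowed
        if best is None or state.target_residency_us > best.target_residency_us:
            best = state
    return best
```

With made-up states of (1, 1), (100, 50) and (1000, 200) microseconds, a predicted duration of 500 and a latency limit of 100 select the middle state.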
|
||||
|
||||
|
||||
@ -192,11 +192,19 @@ even if they have been enumerated (see :ref:`cpu-pm-qos` in
|
||||
Documentation/admin-guide/pm/cpuidle.rst).
|
||||
Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail.
|
||||
|
||||
The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
|
||||
if the kernel has been configured with ACPI support) can be set to make the
|
||||
driver ignore the system's ACPI tables entirely or use them for all of the
|
||||
recognized processor models, respectively (they both are unset by default and
|
||||
``use_acpi`` has no effect if ``no_acpi`` is set).
|
||||
The ``no_acpi``, ``use_acpi`` and ``no_native`` module parameters are
|
||||
recognized by ``intel_idle`` if the kernel has been configured with ACPI
|
||||
support. In the case that ACPI is not configured these flags have no impact
|
||||
on functionality.
|
||||
|
||||
``no_acpi`` - Do not use ACPI at all. Only native mode is available, no
|
||||
ACPI mode.
|
||||
|
||||
``use_acpi`` - No-op in ACPI mode, the driver will consult ACPI tables for
|
||||
C-states on/off status in native mode.
|
||||
|
||||
``no_native`` - Work only in ACPI mode, no native mode available (ignore
|
||||
all custom tables).
|
||||
|
||||
The value of the ``states_off`` module parameter (0 by default) represents a
|
||||
list of idle states to be disabled by default in the form of a bitmask.
|
||||
|
||||
@ -696,6 +696,9 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
|
||||
Use per-logical-CPU P-State limits (see `Coordination of P-state
|
||||
Limits`_ for details).
|
||||
|
||||
``no_cas``
|
||||
Do not enable capacity-aware scheduling (CAS) which is enabled by
|
||||
default on hybrid systems.
|
||||
|
||||
Diagnostics and Tuning
|
||||
======================
|
||||
|
||||
@ -1526,6 +1526,13 @@ constant ``FUTEX_TID_MASK`` (0x3fffffff).
|
||||
If a value outside of this range is written to ``threads-max`` an
|
||||
``EINVAL`` error occurs.
|
||||
|
||||
timer_migration
|
||||
===============
|
||||
|
||||
When set to a non-zero value, attempt to migrate timers away from idle cpus to
|
||||
allow them to remain in low power states longer.
|
||||
|
||||
Default is set (1).
|
||||
|
||||
traceoff_on_warning
|
||||
===================
|
||||
|
||||
@ -101,6 +101,7 @@ Bit Log Number Reason that got the kernel tainted
|
||||
16 _/X 65536 auxiliary taint, defined for and used by distros
|
||||
17 _/T 131072 kernel was built with the struct randomization plugin
|
||||
18 _/N 262144 an in-kernel test has been run
|
||||
19 _/J 524288 userspace used a mutating debug operation in fwctl
|
||||
=== === ====== ========================================================
|
||||
|
||||
Note: The character ``_`` is representing a blank in this table to make reading
|
||||
@ -182,3 +183,7 @@ More detailed explanation for tainting
|
||||
produce extremely unusual kernel structure layouts (even performance
|
||||
pathological ones), which is important to know when debugging. Set at
|
||||
build time.
|
||||
|
||||
19) ``J`` if userspace opened /dev/fwctl/* and performed a FWCTL_RPC_DEBUG_WRITE
|
||||
to use the device's debugging features. Device debugging features could
|
||||
cause the device to malfunction in undefined ways.
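
As a small illustration of how the bit numbers and letters relate to the numeric taint value, the following Python sketch decodes only the four bits shown in the table rows above (16 to 19). The function name is hypothetical, and all other taint bits are intentionally ignored.

```python
# Bit number -> letter, limited to the rows shown in the table above.
TAINT_FLAGS = {16: "X", 17: "T", 18: "N", 19: "J"}


def decode_taint(value):
    """Return the letters of the known taint bits that are set in ``value``."""
    return [letter for bit, letter in sorted(TAINT_FLAGS.items())
            if value & (1 << bit)]
```

The value to decode would normally be the integer read from ``/proc/sys/kernel/tainted``; for instance, 524288 decodes to ``J`` alone.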
|
||||
|
||||
@ -28,7 +28,7 @@ should be a userspace tool that handles all the low-level details, keeps
|
||||
a database of the authorized devices and prompts users for new connections.
|
||||
|
||||
More details about the sysfs interface for Thunderbolt devices can be
|
||||
found in ``Documentation/ABI/testing/sysfs-bus-thunderbolt``.
|
||||
found in Documentation/ABI/testing/sysfs-bus-thunderbolt.
|
||||
|
||||
Those users who just want to connect any device without any sort of
|
||||
manual work can add following line to
|
||||
|
||||
@ -8,6 +8,7 @@ s390 Architecture
|
||||
cds
|
||||
3270
|
||||
driver-model
|
||||
mm
|
||||
monreader
|
||||
qeth
|
||||
s390dbf
|
||||
|
||||
111
Documentation/arch/s390/mm.rst
Normal file
@ -0,0 +1,111 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=================
|
||||
Memory Management
|
||||
=================
|
||||
|
||||
Virtual memory layout
|
||||
=====================
|
||||
|
||||
.. note::
|
||||
|
||||
- Some aspects of the virtual memory layout setup are not
|
||||
clarified (number of page levels, alignment, DMA memory).
|
||||
|
||||
- Unused gaps in the virtual memory layout could be present
|
||||
or not, depending on how a particular system is configured.
|
||||
No page tables are created for the unused gaps.
|
||||
|
||||
- The virtual memory regions are tracked or untracked by KASAN
|
||||
instrumentation, as well as the KASAN shadow memory itself is
|
||||
created only when CONFIG_KASAN configuration option is enabled.
|
||||
|
||||
::
|
||||
|
||||
=============================================================================
|
||||
| Physical | Virtual | VM area description
|
||||
=============================================================================
|
||||
+- 0 --------------+- 0 --------------+
|
||||
| | S390_lowcore | Low-address memory
|
||||
| +- 8 KB -----------+
|
||||
| | |
|
||||
| | |
|
||||
| | ... unused gap | KASAN untracked
|
||||
| | |
|
||||
+- AMODE31_START --+- AMODE31_START --+ .amode31 rand. phys/virt start
|
||||
|.amode31 text/data|.amode31 text/data| KASAN untracked
|
||||
+- AMODE31_END ----+- AMODE31_END ----+ .amode31 rand. phys/virt end (<2GB)
|
||||
| | |
|
||||
| | |
|
||||
+- __kaslr_offset_phys | kernel rand. phys start
|
||||
| | |
|
||||
| kernel text/data | |
|
||||
| | |
|
||||
+------------------+ | kernel phys end
|
||||
| | |
|
||||
| | |
|
||||
| | |
|
||||
| | |
|
||||
+- ident_map_size -+ |
|
||||
| |
|
||||
| ... unused gap | KASAN untracked
|
||||
| |
|
||||
+- __identity_base + identity mapping start (>= 2GB)
|
||||
| |
|
||||
| identity | phys == virt - __identity_base
|
||||
| mapping | virt == phys + __identity_base
|
||||
| |
|
||||
| | KASAN tracked
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
| |
|
||||
+---- vmemmap -----+ 'struct page' array start
|
||||
| |
|
||||
| virtually mapped |
|
||||
| memory map | KASAN untracked
|
||||
| |
|
||||
+- __abs_lowcore --+
|
||||
| |
|
||||
| Absolute Lowcore | KASAN untracked
|
||||
| |
|
||||
+- __memcpy_real_area
|
||||
| |
|
||||
| Real Memory Copy| KASAN untracked
|
||||
| |
|
||||
+- VMALLOC_START --+ vmalloc area start
|
||||
| | KASAN untracked or
|
||||
| vmalloc area | KASAN shallowly populated in case
|
||||
| | CONFIG_KASAN_VMALLOC=y
|
||||
+- MODULES_VADDR --+ modules area start
|
||||
| | KASAN allocated per module or
|
||||
| modules area | KASAN shallowly populated in case
|
||||
| | CONFIG_KASAN_VMALLOC=y
|
||||
+- __kaslr_offset -+ kernel rand. virt start
|
||||
| | KASAN tracked
|
||||
| kernel text/data | phys == (kvirt - __kaslr_offset) +
|
||||
| | __kaslr_offset_phys
|
||||
+- kernel .bss end + kernel rand. virt end
|
||||
| |
|
||||
| ... unused gap | KASAN untracked
|
||||
| |
|
||||
+------------------+ UltraVisor Secure Storage limit
|
||||
| |
|
||||
| ... unused gap | KASAN untracked
|
||||
| |
|
||||
+KASAN_SHADOW_START+ KASAN shadow memory start
|
||||
| |
|
||||
| KASAN shadow | KASAN untracked
|
||||
| |
|
||||
+------------------+ ASCE limit
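The two translation rules annotated in the layout above (identity mapping and the randomized kernel image) are plain arithmetic. The following is a minimal sketch of that arithmetic only; the function names are invented for illustration, and any base values passed in are hypothetical rather than taken from a real system.

```python
def identity_virt_to_phys(virt, identity_base):
    """Identity mapping: phys == virt - __identity_base."""
    return virt - identity_base


def identity_phys_to_virt(phys, identity_base):
    """Identity mapping: virt == phys + __identity_base."""
    return phys + identity_base


def kernel_virt_to_phys(kvirt, kaslr_offset, kaslr_offset_phys):
    """Kernel image: phys == (kvirt - __kaslr_offset) + __kaslr_offset_phys."""
    return (kvirt - kaslr_offset) + kaslr_offset_phys
```

Note that the kernel image and the identity mapping use different bases, so an address inside the kernel image must not be translated with the identity-mapping rule.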
|
||||
@ -380,6 +380,36 @@ matrix device.
|
||||
control_domains:
|
||||
A read-only file for displaying the control domain numbers assigned to the
|
||||
vfio_ap mediated device.
|
||||
ap_config:
|
||||
A read/write file that, when written to, allows all three of the
|
||||
vfio_ap mediated device's ap matrix masks to be replaced in one shot.
|
||||
Three masks are given, one for adapters, one for domains, and one for
|
||||
control domains. If the given state cannot be set then no changes are
|
||||
made to the vfio-ap mediated device.
|
||||
|
||||
The format of the data written to ap_config is as follows:
|
||||
{amask},{dmask},{cmask}\n
|
||||
|
||||
\n is a newline character.
|
||||
|
||||
amask, dmask, and cmask are masks identifying which adapters, domains,
|
||||
and control domains should be assigned to the mediated device.
|
||||
|
||||
The format of a mask is as follows:
|
||||
0xNN..NN
|
||||
|
||||
Where NN..NN is 64 hexadecimal characters representing a 256-bit value.
|
||||
The leftmost (highest order) bit represents adapter/domain 0.
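As an illustration of the mask format described above, the sketch below builds a mask string from a list of adapter or domain numbers. The helper names ``ap_mask`` and ``ap_config_line`` are hypothetical and exist only for this example; the bit numbering (leftmost bit is number 0) and the 64-digit width follow the text.

```python
def ap_mask(ids):
    """Return a 0x-prefixed, 64-hex-digit mask with the given numbers set.

    The leftmost (highest-order) bit of the 256-bit value stands for
    adapter/domain 0, so number i maps to bit position 255 - i.
    """
    value = 0
    for i in ids:
        if not 0 <= i <= 255:
            raise ValueError(f"number {i} is outside 0..255")
        value |= 1 << (255 - i)
    return f"0x{value:064x}"


def ap_config_line(adapters, domains, control_domains):
    """Join the three masks as {amask},{dmask},{cmask} plus a newline."""
    masks = (ap_mask(adapters), ap_mask(domains), ap_mask(control_domains))
    return ",".join(masks) + "\n"
```

The range check here only enforces the 256-bit width of the mask; the system may impose a lower maximum adapter or domain number.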
|
||||
|
||||
For an example set of masks that represent your mdev's current
|
||||
configuration, simply cat ap_config.
|
||||
|
||||
Setting an adapter or domain number greater than the maximum allowed for
|
||||
the system will result in an error.
|
||||
|
||||
This attribute is intended to be used by automation. End users would be
|
||||
better served using the respective assign/unassign attributes for
|
||||
adapters, domains, and control domains.
|
||||
|
||||
* functions:
|
||||
|
||||
@ -550,7 +580,7 @@ These are the steps:
|
||||
following Kconfig elements selected:
|
||||
* IOMMU_SUPPORT
|
||||
* S390
|
||||
* ZCRYPT
|
||||
* AP
|
||||
* VFIO
|
||||
* KVM
|
||||
|
||||
@ -969,6 +999,36 @@ the vfio_ap mediated device to which it is assigned as long as each new APQN
|
||||
resulting from plugging it in references a queue device bound to the vfio_ap
|
||||
device driver.
|
||||
|
||||
Driver Features
|
||||
===============
|
||||
The vfio_ap driver exposes a sysfs file containing supported features.
|
||||
This exists so third party tools (like Libvirt and mdevctl) can query the
|
||||
availability of specific features.
|
||||
|
||||
The features list can be found here: /sys/bus/matrix/devices/matrix/features
|
||||
|
||||
Entries are space delimited. Each entry consists of a combination of
|
||||
alphanumeric and underscore characters.
|
||||
|
||||
Example:
|
||||
cat /sys/bus/matrix/devices/matrix/features
|
||||
guest_matrix dyn ap_config
|
||||
|
||||
The following features are advertised:
|
||||
|
||||
+--------------+---------------------------------------------------------------+
| Flag         | Description                                                   |
+==============+===============================================================+
| guest_matrix | guest_matrix attribute exists. It reports the matrix of       |
|              | adapters and domains that are or will be passed through to a  |
|              | guest when the mdev is attached to it.                        |
+--------------+---------------------------------------------------------------+
| dyn          | Indicates hot plug/unplug of AP adapters, domains and control |
|              | domains for a guest to which the mdev is attached.            |
+--------------+---------------------------------------------------------------+
| ap_config    | ap_config interface for one-shot modifications to mdev config |
+--------------+---------------------------------------------------------------+
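A tool that wants to test for a feature only needs to split the space-delimited list. The following is a minimal sketch of such a check; the helper names are made up for this example and are not part of Libvirt or mdevctl.

```python
def parse_features(text):
    """Split the space-delimited features string into a set of names."""
    return set(text.split())


def has_feature(text, name):
    """Return True if the feature name appears in the features string."""
    return name in parse_features(text)
```

In practice the string would be read from ``/sys/bus/matrix/devices/matrix/features``; matching whole entries rather than substrings avoids false positives if one feature name is a prefix of another.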
|
||||
|
||||
Limitations
|
||||
===========
|
||||
Live guest migration is not supported for guests using AP devices without
|
||||
|
||||
368
Documentation/arch/x86/amd-debugging.rst
Normal file
@ -0,0 +1,368 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Debugging AMD Zen systems
|
||||
+++++++++++++++++++++++++
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
This document describes techniques that are useful for debugging issues with
|
||||
AMD Zen systems. It is intended for use by developers and technical users
|
||||
to help identify and resolve issues.
|
||||
|
||||
S3 vs s2idle
|
||||
============
|
||||
|
||||
On AMD systems, it's not possible to simultaneously support suspend-to-RAM (S3)
|
||||
and suspend-to-idle (s2idle). To confirm which mode your system supports you
|
||||
can look at ``cat /sys/power/mem_sleep``. If it shows ``s2idle [deep]`` then
|
||||
*S3* is supported. If it shows ``[s2idle]`` then *s2idle* is
|
||||
supported.
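The check above can also be scripted. This is a small sketch, assuming only the bracket convention shown in the two example outputs (the bracketed entry is the mode in effect); the function name is hypothetical.

```python
import re


def selected_sleep_mode(mem_sleep_text):
    """Return the bracketed entry of a mem_sleep string, or None."""
    match = re.search(r"\[(\w+)\]", mem_sleep_text)
    return match.group(1) if match else None
```

With the examples from the text, ``s2idle [deep]`` yields ``deep`` (S3) and ``[s2idle]`` yields ``s2idle``.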
|
||||
|
||||
On systems that support *S3*, the firmware will be utilized to put all hardware into
|
||||
the appropriate low power state.
|
||||
|
||||
On systems that support *s2idle*, the kernel will be responsible for transitioning devices
|
||||
into the appropriate low power state. When all devices are in the appropriate low
|
||||
power state, the hardware will transition into a hardware sleep state.
|
||||
|
||||
After a suspend cycle you can tell how much time was spent in a hardware sleep
|
||||
state by looking at ``cat /sys/power/suspend_stats/last_hw_sleep``.
|
||||
|
||||
This flowchart explains how the AMD s2idle suspend flow works.
|
||||
|
||||
.. kernel-figure:: suspend.svg
|
||||
|
||||
This flowchart explains how the AMD s2idle resume flow works.
|
||||
|
||||
.. kernel-figure:: resume.svg
|
||||
|
||||
s2idle debugging tool
|
||||
=====================
|
||||
|
||||
As there are a lot of places that problems can occur, a debugging tool has been
|
||||
created at
|
||||
`amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
|
||||
that can help test for common problems and offer suggestions.
|
||||
|
||||
If you have an s2idle issue, it's best to start with this and follow instructions
|
||||
from its findings. If you continue to have an issue, raise a bug with the
|
||||
report generated from this script to
|
||||
`drm/amd gitlab <https://gitlab.freedesktop.org/drm/amd/-/issues/new?issuable_template=s2idle_BUG_TEMPLATE>`_.
|
||||
|
||||
Spurious s2idle wakeups from an IRQ
|
||||
===================================
|
||||
|
||||
Spurious wakeups will generally have an IRQ recorded in ``/sys/power/pm_wakeup_irq``.
|
||||
This can be matched to ``/proc/interrupts`` to determine what device woke the system.
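The matching step can be done by eye, but a short script helps when the interrupt table is long. The sketch below assumes the usual ``/proc/interrupts`` layout in which each row starts with the IRQ number followed by a colon; the function name is invented for this example.

```python
def find_irq_line(interrupts_text, irq):
    """Return the /proc/interrupts row for the given IRQ number, or None."""
    prefix = f"{irq}:"
    for line in interrupts_text.splitlines():
        stripped = line.strip()
        # Compare against the whole "N:" prefix so IRQ 1 does not match 11.
        if stripped.startswith(prefix):
            return stripped
    return None
```

The IRQ number would come from ``/sys/power/pm_wakeup_irq`` and the text from ``/proc/interrupts``; the last columns of the returned row name the device or driver.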
|
||||
|
||||
If this isn't enough to debug the problem, then the following sysfs files
|
||||
can be set to add more verbosity to the wakeup process: ::
|
||||
|
||||
# echo 1 | sudo tee /sys/power/pm_debug_messages
|
||||
# echo 1 | sudo tee /sys/power/pm_print_times
|
||||
|
||||
After making those changes, the kernel will display messages that can
|
||||
be traced back to kernel s2idle loop code as well as display any active
|
||||
GPIO sources while waking up.
|
||||
|
||||
If the wakeup is caused by the ACPI SCI, additional ACPI debugging may be
|
||||
needed. These commands can enable additional trace data: ::
|
||||
|
||||
# echo enable | sudo tee /sys/module/acpi/parameters/trace_state
|
||||
# echo 1 | sudo tee /sys/module/acpi/parameters/aml_debug_output
|
||||
# echo 0x0800000f | sudo tee /sys/module/acpi/parameters/debug_level
|
||||
# echo 0xffff0000 | sudo tee /sys/module/acpi/parameters/debug_layer
|
||||
|
||||
Spurious s2idle wakeups from a GPIO
|
||||
===================================
|
||||
|
||||
If a GPIO is active when waking up the system, ideally you would look at the
|
||||
schematic to determine what device it is associated with. If the schematic
|
||||
is not available, another tactic is to look at the ACPI _EVT() entry
|
||||
to determine what device is notified when that GPIO is active.
|
||||
|
||||
For a hypothetical example, say that GPIO 59 woke up the system. You can
|
||||
look at the SSDT to determine what device is notified when GPIO 59 is active.
|
||||
|
||||
First convert the GPIO number into hex. ::
|
||||
|
||||
$ python3 -c "print(hex(59))"
|
||||
0x3b
|
||||
|
||||
Next determine which ACPI table has the ``_EVT`` entry. For example: ::
|
||||
|
||||
$ sudo grep EVT /sys/firmware/acpi/tables/SSDT*
|
||||
grep: /sys/firmware/acpi/tables/SSDT27: binary file matches
|
||||
|
||||
Decode this table::
|
||||
|
||||
$ sudo cp /sys/firmware/acpi/tables/SSDT27 .
|
||||
$ sudo iasl -d SSDT27
|
||||
|
||||
Then look at the table and find the matching entry for GPIO 0x3b. ::
|
||||
|
||||
Case (0x3B)
|
||||
{
|
||||
M000 (0x393B)
|
||||
M460 (" Notify (\\_SB.PCI0.GP17.XHC1, 0x02)\n", Zero, Zero, Zero, Zero, Zero, Zero)
|
||||
Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake
|
||||
}
|
||||
|
||||
You can see in this case that the device ``\_SB.PCI0.GP17.XHC1`` is notified
|
||||
when GPIO 59 is active. It's obvious this is an XHCI controller, but to go a
|
||||
step further you can figure out which XHCI controller it is by matching it to
|
||||
ACPI::
|
||||
|
||||
$ grep "PCI0.GP17.XHC1" /sys/bus/acpi/devices/*/path
|
||||
/sys/bus/acpi/devices/device:2d/path:\_SB_.PCI0.GP17.XHC1
|
||||
/sys/bus/acpi/devices/device:2e/path:\_SB_.PCI0.GP17.XHC1.RHUB
|
||||
/sys/bus/acpi/devices/device:2f/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1
|
||||
/sys/bus/acpi/devices/device:30/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1.CAM0
|
||||
/sys/bus/acpi/devices/device:31/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1.CAM1
|
||||
/sys/bus/acpi/devices/device:32/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT2
|
||||
/sys/bus/acpi/devices/LNXPOWER:0d/path:\_SB_.PCI0.GP17.XHC1.PWRS
|
||||
|
||||
Here you can see it matches to ``device:2d``. Look at the ``physical_node``
|
||||
to determine what PCI device that actually is. ::
|
||||
|
||||
$ ls -l /sys/bus/acpi/devices/device:2d/physical_node
|
||||
lrwxrwxrwx 1 root root 0 Feb 12 13:22 /sys/bus/acpi/devices/device:2d/physical_node -> ../../../../../pci0000:00/0000:00:08.1/0000:c2:00.4
|
||||
|
||||
So there you have it: the PCI device associated with this GPIO wakeup was ``0000:c2:00.4``.
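The table lookup step in the walkthrough above (finding which device is notified for a GPIO) can be sketched as a text search over the decompiled table. This is a simplified illustration, not how any existing tool does it: the function name is hypothetical, and the pattern assumes a flat ``Case`` block without nested braces, as in the example shown.

```python
import re


def notified_devices(dsl_text, gpio):
    """Return device paths notified in the ``Case (0x..)`` block for a GPIO."""
    pattern = r"Case \(0x%02X\)\s*\{(.*?)\}" % gpio
    case = re.search(pattern, dsl_text, re.S | re.I)
    if not case:
        return []
    # Only lines that begin with Notify count; the M460 debug string that
    # merely quotes a Notify statement starts with "M460" and is skipped.
    return re.findall(r"^\s*Notify \(([^,]+),", case.group(1), re.M)
```

For the GPIO 59 example, searching the decompiled ``SSDT27`` text would return ``\_SB.PCI0.GP17.XHC1``.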
|
||||
|
||||
The ``amd_s2idle.py`` script will capture most of these artifacts for you.
|
||||
|
||||
s2idle PM debug messages
|
||||
========================
|
||||
|
||||
During the s2idle flow on AMD systems, the ACPI LPS0 driver is responsible
for checking all uPEP constraints. Failing uPEP constraints does not prevent
|
||||
s0i3 entry. This means that if some constraints are not met, it is possible
|
||||
the kernel may attempt to enter s2idle even if there are some known issues.
|
||||
|
||||
To activate PM debugging, either specify the ``pm_debug_messages`` kernel
|
||||
command-line option at boot or write to ``/sys/power/pm_debug_messages``.
|
||||
Unmet constraints will be displayed in the kernel log and can be
|
||||
viewed by logging tools that process the kernel ring buffer, like ``dmesg`` or
``journalctl``.
|
||||
|
||||
If the system freezes on entry/exit before these messages are flushed, a
|
||||
useful debugging tactic is to unbind the ``amd_pmc`` driver to prevent
|
||||
notification to the platform to start s0i3 entry. This will stop the
|
||||
system from freezing on entry or exit and let you view all the failed
|
||||
constraints. ::
|
||||
|
||||
cd /sys/bus/platform/drivers/amd_pmc
|
||||
ls | grep AMD | sudo tee unbind
|
||||
|
||||
After doing this, run the suspend cycle and look specifically for errors around: ::
|
||||
|
||||
ACPI: LPI: Constraint not met; min power state:%s current power state:%s
|
||||
|
||||
Historical examples of s2idle issues
|
||||
====================================
|
||||
|
||||
To help understand the types of issues that can occur and how to debug them,
|
||||
here are some historical examples of s2idle issues that have been resolved.
|
||||
|
||||
Core offlining
|
||||
--------------
|
||||
An end user had reported that taking a core offline would prevent the system
|
||||
from properly entering s0i3. This was debugged using internal AMD tools
|
||||
to capture and display a stream of metrics from the hardware showing what changed
|
||||
when a core was offlined. It was determined that the hardware didn't get
|
||||
notification the offline cores were in the deepest state, and so it prevented
|
||||
the CPU from going into the deepest state. The issue was traced to a missing
|
||||
command to put cores into C3 upon offline.
|
||||
|
||||
`commit d6b88ce2eb9d2 ("ACPI: processor idle: Allow playing dead in C3 state") <https://git.kernel.org/torvalds/c/d6b88ce2eb9d2>`_
|
||||
|
||||
Corruption after resume
|
||||
-----------------------
|
||||
A big problem that occurred with Rembrandt was that there was graphical
|
||||
corruption after resume. This happened because of a misalignment of PSP
|
||||
and driver responsibility. The PSP will save and restore DMCUB, but the
|
||||
driver assumed it needed to reset DMCUB on resume.
|
||||
This actually was a misalignment for earlier silicon as well, but was not
|
||||
observed.
|
||||
|
||||
`commit 79d6b9351f086 ("drm/amd/display: Don't reinitialize DMCUB on s0ix resume") <https://git.kernel.org/torvalds/c/79d6b9351f086>`_
|
||||
|
||||
Back to Back suspends fail
|
||||
--------------------------
|
||||
When using a wakeup source that triggers the IRQ to wake up, a bug in the
|
||||
pinctrl-amd driver may capture the wrong state of the IRQ and prevent the
|
||||
system going back to sleep properly.
|
||||
|
||||
`commit b8c824a869f22 ("pinctrl: amd: Don't save/restore interrupt status and wake status bits") <https://git.kernel.org/torvalds/c/b8c824a869f22>`_
|
||||
|
||||
Spurious timer based wakeup after 5 minutes
|
||||
-------------------------------------------
|
||||
The HPET was being used to program the wakeup source for the system, however
|
||||
this was causing a spurious wakeup after 5 minutes. The correct alarm to use
|
||||
was the ACPI alarm.
|
||||
|
||||
`commit 3d762e21d5637 ("rtc: cmos: Use ACPI alarm for non-Intel x86 systems too") <https://git.kernel.org/torvalds/c/3d762e21d5637>`_
|
||||
|
||||
Disk disappears after resume
|
||||
----------------------------
|
||||
After resuming from s2idle, the NVME disk would disappear. This was due to the
|
||||
BIOS not specifying the _DSD StorageD3Enable property. This caused the NVME
|
||||
driver not to put the disk into the expected state at suspend and to fail
|
||||
on resume.
|
||||
|
||||
`commit e79a10652bbd3 ("ACPI: x86: Force StorageD3Enable on more products") <https://git.kernel.org/torvalds/c/e79a10652bbd3>`_
|
||||
|
||||
Spurious IRQ1
|
||||
-------------
|
||||
A number of Renoir, Lucienne, Cezanne, & Barcelo platforms have a
|
||||
platform firmware bug where IRQ1 is triggered during s0i3 resume.
|
||||
|
||||
This was fixed in the platform firmware, but a number of systems didn't
|
||||
receive any more platform firmware updates.
|
||||
|
||||
`commit 8e60615e89321 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN") <https://git.kernel.org/torvalds/c/8e60615e89321>`_
|
||||
|
||||
Hardware timeout
|
||||
----------------
|
||||
The hardware performs many actions besides accepting the values from
|
||||
amd-pmc driver. As the communication path with the hardware is a mailbox,
|
||||
it's possible that it might not respond quickly enough.
|
||||
This issue manifested as a failure to suspend: ::
|
||||
|
||||
PM: dpm_run_callback(): acpi_subsys_suspend_noirq+0x0/0x50 returns -110
|
||||
amd_pmc AMDI0005:00: PM: failed to suspend noirq: error -110
|
||||
|
||||
The timing problem was identified by comparing the values of the idle mask.
|
||||
|
||||
`commit 3c3c8e88c8712 ("platform/x86: amd-pmc: Increase the response register timeout") <https://git.kernel.org/torvalds/c/3c3c8e88c8712>`_
|
||||
|
||||
Failed to reach hardware sleep state with panel on
|
||||
--------------------------------------------------
|
||||
On some Strix systems certain panels were observed to block the system from
|
||||
entering a hardware sleep state if the internal panel was on during the sequence.
|
||||
|
||||
Even though the panel got turned off during suspend it exposed a timing problem
|
||||
where an interrupt caused the display hardware to wake up and block low power
|
||||
state entry.
|
||||
|
||||
`commit 40b8c14936bd2 ("drm/amd/display: Disable unneeded hpd interrupts during dm_init") <https://git.kernel.org/torvalds/c/40b8c14936bd2>`_
|
||||
|
||||
Runtime power consumption issues
|
||||
================================
|
||||
|
||||
Runtime power consumption is influenced by many factors, including but not
|
||||
limited to the configuration of the PCIe Active State Power Management (ASPM),
|
||||
the display brightness, the EPP policy of the CPU, and the power management
|
||||
of the devices.
|
||||
|
||||
ASPM
|
||||
----
|
||||
For the best runtime power consumption, ASPM should be programmed as intended
|
||||
by the BIOS from the hardware vendor. To accomplish this the Linux kernel
|
||||
should be compiled with ``CONFIG_PCIEASPM_DEFAULT`` set to ``y`` and the
|
||||
sysfs file ``/sys/module/pcie_aspm/parameters/policy`` should not be modified.
|
||||
|
||||
Most notably, if L1.2 is not configured properly for any devices, the SoC
|
||||
will not be able to enter the deepest idle state.
|
||||
|
||||
EPP Policy
|
||||
----------
|
||||
The ``energy_performance_preference`` sysfs file can be used to set a bias
|
||||
of efficiency or performance for a CPU. This has a direct effect on
battery life when more heavily biased towards performance.
|
||||
|
||||
|
||||
BIOS debug messages
|
||||
===================
|
||||
|
||||
Most OEM machines don't have a serial UART for outputting kernel or BIOS
|
||||
debug messages. However BIOS debug messages are useful for understanding
|
||||
both BIOS bugs and bugs with the Linux kernel drivers that call BIOS AML.
|
||||
|
||||
As the BIOS on most OEM AMD systems is based on an AMD reference BIOS,
the infrastructure used for exporting debugging messages is often the same
as in the AMD reference BIOS.
|
||||
|
||||
Manually Parsing
|
||||
----------------
|
||||
There is generally an ACPI method ``\M460`` that different paths of the AML
|
||||
will call to emit a message to the BIOS serial log. This method takes
|
||||
7 arguments, with the first being a string and the rest being optional
|
||||
integers::
|
||||
|
||||
Method (M460, 7, Serialized)
|
||||
|
||||
Here is an example of a string that BIOS AML may call out using ``\M460``::
|
||||
|
||||
M460 (" OEM-ASL-PCIe Address (0x%X)._REG (%d %d) PCSA = %d\n", DADR, Arg0, Arg1, PCSA, Zero, Zero)
|
||||
|
||||
Normally when executed, the ``\M460`` method would populate the additional
|
||||
arguments into the string. In order to get these messages from the Linux
|
||||
kernel a hook has been added into ACPICA that can capture the *arguments*
|
||||
sent to ``\M460`` and print them to the kernel ring buffer.
|
||||
For example the following message could be emitted into kernel ring buffer::
|
||||
|
||||
extrace-0174 ex_trace_args : " OEM-ASL-PCIe Address (0x%X)._REG (%d %d) PCSA = %d\n", ec106000, 2, 1, 1, 0, 0
|
||||
|
||||
In order to get these messages, you need to compile with ``CONFIG_ACPI_DEBUG``
|
||||
and then turn on the following ACPICA tracing parameters.
|
||||
This can be done either on the kernel command line or at runtime:
|
||||
|
||||
* ``acpi.trace_method_name=\M460``
|
||||
* ``acpi.trace_state=method``
|
||||
|
||||
NOTE: These can be very noisy at bootup. If you turn these parameters on via
the kernel command line, please also consider turning up ``CONFIG_LOG_BUF_SHIFT``
|
||||
to a larger size such as 17 to avoid losing early boot messages.
|
||||
|
||||
Tool assisted Parsing
|
||||
---------------------
|
||||
As mentioned above, parsing by hand can be tedious, especially with a lot of
|
||||
messages. To help with this, a tool has been created at
|
||||
`amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
|
||||
to help parse the messages.
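To show what such parsing involves, here is a minimal sketch that fills the captured arguments into the format string of one trace line. It is not the implementation used by amd-debug-tools. It assumes that the arguments are printed as hexadecimal numbers without a prefix (as ``ec106000`` suggests) and that only ``%d``, ``%x`` and ``%X`` specifiers occur; the function name is hypothetical.

```python
import re


def render_m460(line):
    """Substitute the traced arguments into the M460 format string."""
    match = re.search(r'"(.*)"\s*,\s*(.*)$', line)
    if not match:
        raise ValueError("no quoted format string found")
    # The trace shows the newline escape as two literal characters.
    fmt = match.group(1).replace(r"\n", "")
    # Assumption: every argument is a bare hexadecimal number.
    args = [int(arg.strip(), 16) for arg in match.group(2).split(",")]
    used = len(re.findall(r"%[dXx]", fmt))
    return (fmt % tuple(args[:used])).strip()
```

Unused trailing arguments (the padding zeros) are dropped, because the format string consumes only as many values as it has specifiers.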
|
||||
|
||||
Random reboot issues
|
||||
====================
|
||||
|
||||
When a random reboot occurs, the high-level reason for the reboot is stored
|
||||
in a register that will persist onto the next boot.
|
||||
|
||||
There are 6 classes of reasons for the reboot:
|
||||
* Software induced
|
||||
* Power state transition
|
||||
* Pin induced
|
||||
* Hardware induced
|
||||
* Remote reset
|
||||
* Internal CPU event
|
||||
|
||||
.. csv-table::
   :header: "Bit", "Type", "Reason"
   :align: left

   "0", "Pin", "thermal pin BP_THERMTRIP_L was tripped"
   "1", "Pin", "power button was pressed for 4 seconds"
   "2", "Pin", "shutdown pin was tripped"
   "4", "Remote", "remote ASF power off command was received"
   "9", "Internal", "internal CPU thermal limit was tripped"
   "16", "Pin", "system reset pin BP_SYS_RST_L was tripped"
   "17", "Software", "software issued PCI reset"
   "18", "Software", "software wrote 0x4 to reset control register 0xCF9"
   "19", "Software", "software wrote 0x6 to reset control register 0xCF9"
   "20", "Software", "software wrote 0xE to reset control register 0xCF9"
   "21", "ACPI-state", "ACPI power state transition occurred"
   "22", "Pin", "keyboard reset pin KB_RST_L was tripped"
   "23", "Internal", "internal CPU shutdown event occurred"
   "24", "Hardware", "system failed to boot before failed boot timer expired"
   "25", "Hardware", "hardware watchdog timer expired"
   "26", "Remote", "remote ASF reset command was received"
   "27", "Internal", "an uncorrected error caused a data fabric sync flood event"
   "29", "Internal", "FCH and MP1 failed warm reset handshake"
   "30", "Internal", "a parity error occurred"
   "31", "Internal", "a software sync flood event occurred"
|
||||
|
||||
This information is read by the kernel at bootup and printed into
|
||||
the syslog. When a random reboot occurs this message can be helpful
|
||||
to determine the next component to debug.
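If only the raw register value is available, the bit table above can be applied by hand or with a few lines of code. The sketch below copies the bit assignments from the table; the names ``RESET_REASONS`` and ``decode_reset_reason`` are invented for this example, and how the raw value is obtained is left open.

```python
# Bit number -> reason, copied from the table above.
RESET_REASONS = {
    0: "thermal pin BP_THERMTRIP_L was tripped",
    1: "power button was pressed for 4 seconds",
    2: "shutdown pin was tripped",
    4: "remote ASF power off command was received",
    9: "internal CPU thermal limit was tripped",
    16: "system reset pin BP_SYS_RST_L was tripped",
    17: "software issued PCI reset",
    18: "software wrote 0x4 to reset control register 0xCF9",
    19: "software wrote 0x6 to reset control register 0xCF9",
    20: "software wrote 0xE to reset control register 0xCF9",
    21: "ACPI power state transition occurred",
    22: "keyboard reset pin KB_RST_L was tripped",
    23: "internal CPU shutdown event occurred",
    24: "system failed to boot before failed boot timer expired",
    25: "hardware watchdog timer expired",
    26: "remote ASF reset command was received",
    27: "an uncorrected error caused a data fabric sync flood event",
    29: "FCH and MP1 failed warm reset handshake",
    30: "a parity error occurred",
    31: "a software sync flood event occurred",
}


def decode_reset_reason(value):
    """Return the reasons for every known bit set in value, lowest bit first."""
    return [
        reason
        for bit, reason in sorted(RESET_REASONS.items())
        if value & (1 << bit)
    ]
```

Bits that are not listed in the table are ignored, so an empty result means that no documented reason bit was set.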
|
||||
@ -24,6 +24,7 @@ x86-specific Documentation
|
||||
intel-hfi
|
||||
intel-iommu
|
||||
intel_txt
|
||||
amd-debugging
|
||||
amd-memory-encryption
|
||||
amd_hsmp
|
||||
tdx
|
||||
|
||||
4
Documentation/arch/x86/resume.svg
Normal file
|
@ -75,6 +75,15 @@ arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
|
||||
are ignored. The mask is ORed with the existing value. So any feature bits
|
||||
set here cannot be enabled or disabled afterwards.
|
||||
|
||||
arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
|
||||
Unlock features. 'features' is a mask of all features to unlock. All
|
||||
bits set are processed, unset bits are ignored. Only works via ptrace.
|
||||
|
||||
arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr)
|
||||
Copy the currently enabled features to the address passed in addr. The
|
||||
features are described using the bits passed into the others in
|
||||
'features'.
|
||||
|
||||
The return values are as follows. On success, return 0. On error, errno can
|
||||
be::
|
||||
|
||||
@ -82,6 +91,7 @@ be::
|
||||
-ENOTSUPP if the feature is not supported by the hardware or
|
||||
kernel.
|
||||
-EINVAL arguments (non existing feature, etc)
|
||||
-EFAULT if could not copy information back to userspace
|
||||
|
||||
The feature's bits supported are::
|
||||
|
||||
|
||||
4
Documentation/arch/x86/suspend.svg
Normal file
|
@ -135,6 +135,10 @@ Thread-related topology information in the kernel:
|
||||
The ID of the core to which a thread belongs. It is also printed in /proc/cpuinfo
|
||||
"core_id."
|
||||
|
||||
- topology_logical_core_id();
|
||||
|
||||
The logical core ID to which a thread belongs.
|
||||
|
||||
|
||||
|
||||
System topology examples
|
||||
|
||||
@ -152,6 +152,8 @@ infrastructure:
|
||||
+------------------------------+---------+---------+
|
||||
| DIT | [51-48] | y |
|
||||
+------------------------------+---------+---------+
|
||||
| MPAM | [43-40] | n |
|
||||
+------------------------------+---------+---------+
|
||||
| SVE | [35-32] | y |
|
||||
+------------------------------+---------+---------+
|
||||
| GIC | [27-24] | n |
|
||||
|
||||
@ -55,6 +55,10 @@ stable kernels.
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| Ampere | AmpereOne | AC03_CPU_38 | AMPERE_ERRATUM_AC03_CPU_38 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| Ampere | AmpereOne AC04 | AC04_CPU_10 | AMPERE_ERRATUM_AC03_CPU_38 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| Ampere | AmpereOne AC04 | AC04_CPU_23 | AMPERE_ERRATUM_AC04_CPU_23 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A510 | #2457168 | ARM64_ERRATUM_2457168 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
@ -182,7 +186,8 @@ stable kernels.
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Neoverse-V1 | #1619801 | N/A |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | MMU-500 | #841119,826419 | N/A |
|
||||
| ARM | MMU-500 | #841119,826419 | ARM_SMMU_MMU_500_CPRE_ERRATA|
|
||||
| | | #562869,1047329 | |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | MMU-600 | #1076982,1209401| N/A |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
|
||||
@ -39,13 +39,16 @@ blkdevparts=<blkdev-def>[;<blkdev-def>]
|
||||
create a link to block device partition with the name "PARTNAME".
|
||||
User space application can access partition by partition name.
|
||||
|
||||
ro
|
||||
read-only. Flag the partition as read-only.
|
||||
|
||||
Example:
|
||||
|
||||
eMMC disk names are "mmcblk0" and "mmcblk0boot0".
|
||||
|
||||
bootargs::
|
||||
|
||||
'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot),-(kernel)'
|
||||
'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot)ro,-(kernel)'
|
||||
|
||||
dmesg::
|
||||
|
||||
|
||||
21
Documentation/bpf/fs_kfuncs.rst
Normal file
@ -0,0 +1,21 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
.. _fs_kfuncs-header-label:
|
||||
|
||||
=====================
|
||||
BPF filesystem kfuncs
|
||||
=====================
|
||||
|
||||
BPF LSM programs need to access filesystem data from LSM hooks. The following
|
||||
BPF kfuncs can be used to get these data.
|
||||
|
||||
* ``bpf_get_file_xattr()``
|
||||
|
||||
* ``bpf_get_fsverity_digest()``
|
||||
|
||||
To avoid recursions, these kfuncs follow the following rules:
|
||||
|
||||
1. These kfuncs are only permitted from BPF LSM functions.
|
||||
2. These kfuncs should not call into other LSM hooks, i.e. security_*(). For
|
||||
example, ``bpf_get_file_xattr()`` does not use ``vfs_getxattr()``, because
|
||||
the latter calls LSM hook ``security_inode_getxattr``.
|
||||
@ -21,6 +21,7 @@ that goes into great technical depth about the BPF Architecture.
|
||||
helpers
|
||||
kfuncs
|
||||
cpumasks
|
||||
fs_kfuncs
|
||||
programs
|
||||
maps
|
||||
bpf_prog_run
|
||||
|
||||
@ -348,6 +348,12 @@ latex_elements = {
|
||||
verbatimhintsturnover=false,
|
||||
''',
|
||||
|
||||
#
|
||||
# Some of our authors are fond of deep nesting; tell latex to
|
||||
# cope.
|
||||
#
|
||||
'maxlistdepth': '10',
|
||||
|
||||
# Additional stuff for the LaTeX preamble.
|
||||
'preamble': '''
|
||||
% Prevent column squeezing of tabulary.
|
||||
|
||||
@ -151,16 +151,195 @@ the more significant 4-byte word.
|
||||
We always think of our offsets as if there were no quirk, and we translate
|
||||
them afterwards, before accessing the memory region.
|
||||
|
||||
Note on buffer lengths not multiple of 4
|
||||
----------------------------------------
|
||||
|
||||
To deal with memory layout quirks where groups of 4 bytes are laid out "little
|
||||
endian" relative to each other, but "big endian" within the group itself, the
|
||||
concept of groups of 4 bytes is intrinsic to the packing API (not to be
|
||||
confused with the memory access, which is performed byte by byte, though).
|
||||
|
||||
With buffer lengths not multiple of 4, this means one group will be incomplete.
|
||||
Depending on the quirks, this may lead to discontinuities in the bit fields
|
||||
accessible through the buffer. The packing API assumes discontinuities were not
|
||||
the intention of the memory layout, so it avoids them by effectively logically
|
||||
shortening the most significant group of 4 octets to the number of octets
|
||||
actually available.
|
||||
|
||||
An example with a 31-byte buffer is given below. Physical buffer offsets are
|
||||
implicit, and increase from left to right within a group, and from top to
|
||||
bottom within a column.
|
||||
|
||||
No quirks:
|
||||
|
||||
::
|
||||
|
||||
30 29 28 | Group 7 (most significant)
|
||||
27 26 25 24 | Group 6
|
||||
23 22 21 20 | Group 5
|
||||
19 18 17 16 | Group 4
|
||||
15 14 13 12 | Group 3
|
||||
11 10 9 8 | Group 2
|
||||
7 6 5 4 | Group 1
|
||||
3 2 1 0 | Group 0 (least significant)
|
||||
|
||||
QUIRK_LSW32_IS_FIRST:
|
||||
|
||||
::
|
||||
|
||||
3 2 1 0 | Group 0 (least significant)
|
||||
7 6 5 4 | Group 1
|
||||
11 10 9 8 | Group 2
|
||||
15 14 13 12 | Group 3
|
||||
19 18 17 16 | Group 4
|
||||
23 22 21 20 | Group 5
|
||||
27 26 25 24 | Group 6
|
||||
30 29 28 | Group 7 (most significant)
|
||||
|
||||
QUIRK_LITTLE_ENDIAN:
|
||||
|
||||
::
|
||||
|
||||
28 29 30 | Group 7 (most significant)
|
||||
24 25 26 27 | Group 6
|
||||
20 21 22 23 | Group 5
|
||||
16 17 18 19 | Group 4
|
||||
12 13 14 15 | Group 3
|
||||
8 9 10 11 | Group 2
|
||||
4 5 6 7 | Group 1
|
||||
0 1 2 3 | Group 0 (least significant)
|
||||
|
||||
QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST:
|
||||
|
||||
::
|
||||
|
||||
0 1 2 3 | Group 0 (least significant)
|
||||
4 5 6 7 | Group 1
|
||||
8 9 10 11 | Group 2
|
||||
12 13 14 15 | Group 3
|
||||
16 17 18 19 | Group 4
|
||||
20 21 22 23 | Group 5
|
||||
24 25 26 27 | Group 6
|
||||
28 29 30 | Group 7 (most significant)
|
||||
|
||||
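The layout tables above can be condensed into a small offset calculation. The following is a minimal userspace sketch, not the code in the packing library: the names `sketch_byte_offset`, `SKETCH_QUIRK_LITTLE_ENDIAN` and `SKETCH_QUIRK_LSW32_IS_FIRST` are hypothetical, and the flag values are assumptions chosen only for illustration. It maps a logical byte index (0 is the least significant byte) to a physical buffer offset, shortening the most significant group when the length is not a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the quirk flags; the numeric values are assumptions. */
#define SKETCH_QUIRK_LITTLE_ENDIAN   (1u << 0)
#define SKETCH_QUIRK_LSW32_IS_FIRST  (1u << 1)

/*
 * Map a logical byte index (0 = least significant) to its physical offset
 * in a buffer of len bytes, following the layout tables above.
 */
static size_t sketch_byte_offset(size_t byte, size_t len, unsigned int quirks)
{
	assert(byte < len);

	size_t group = byte / 4;        /* which group of 4 bytes */
	size_t in_group = byte % 4;     /* position inside it, LSB first */
	size_t full_start = group * 4;  /* first logical byte of the group */
	/* The most significant group may hold fewer than 4 bytes. */
	size_t group_size = (len - full_start < 4) ? len - full_start : 4;
	size_t group_start;

	if (quirks & SKETCH_QUIRK_LSW32_IS_FIRST)
		group_start = full_start;                    /* group 0 comes first */
	else
		group_start = len - full_start - group_size; /* group 0 comes last */

	if (quirks & SKETCH_QUIRK_LITTLE_ENDIAN)
		return group_start + in_group;               /* LSB first in group */
	return group_start + (group_size - 1 - in_group);    /* MSB first in group */
}
```

With a 31 byte buffer and no quirks, logical byte 0 lands at physical offset 30; with QUIRK_LSW32_IS_FIRST alone it lands at offset 3; with both quirks set the mapping is the identity.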
Intended use
|
||||
------------
|
||||
|
||||
Drivers that opt to use this API first need to identify which of the above 3
|
||||
quirk combinations (for a total of 8) match what the hardware documentation
|
||||
describes. Then they should wrap the packing() function, creating a new
|
||||
xxx_packing() that calls it using the proper QUIRK_* one-hot bits set.
|
||||
describes.
|
||||
|
||||
There are 3 supported usage patterns, detailed below.
|
||||
|
||||
packing()
|
||||
^^^^^^^^^
|
||||
|
||||
This API function is deprecated.
|
||||
|
||||
The packing() function returns an int-encoded error code, which protects the
|
||||
programmer against incorrect API use. The errors are not expected to occur
|
||||
durring runtime, therefore it is reasonable for xxx_packing() to return void
|
||||
and simply swallow those errors. Optionally it can dump stack or print the
|
||||
error description.
|
||||
during runtime, therefore it is reasonable to wrap packing() into a custom
|
||||
function which returns void and swallows those errors. Optionally it can
|
||||
dump stack or print the error description.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void my_packing(void *buf, u64 *val, int startbit, int endbit,
|
||||
size_t len, enum packing_op op)
|
||||
{
|
||||
int err;
|
||||
|
||||
/* Adjust quirks accordingly */
|
||||
err = packing(buf, val, startbit, endbit, len, op, QUIRK_LSW32_IS_FIRST);
|
||||
if (likely(!err))
|
||||
return;
|
||||
|
||||
if (err == -EINVAL) {
|
||||
pr_err("Start bit (%d) expected to be larger than end (%d)\n",
|
||||
startbit, endbit);
|
||||
} else if (err == -ERANGE) {
|
||||
if ((startbit - endbit + 1) > 64)
|
||||
pr_err("Field %d-%d too large for 64 bits!\n",
|
||||
startbit, endbit);
|
||||
else
|
||||
pr_err("Cannot store %llx inside bits %d-%d (would truncate)\n",
|
||||
*val, startbit, endbit);
|
||||
}
|
||||
dump_stack();
|
||||
}
|
||||
|
||||
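The error cases printed by the wrapper above can also be read as preconditions on the arguments. The following is a minimal userspace sketch under that reading: `sketch_check_field` is a hypothetical helper, and the mapping of conditions to error codes is inferred from the messages in the wrapper, not taken from the library source.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if val can be stored in bits startbit..endbit, otherwise the
 * error code that the wrapper above reports for that case.
 */
static int sketch_check_field(uint64_t val, int startbit, int endbit)
{
	int width;

	if (startbit < endbit)
		return -EINVAL;         /* start bit must not be below end bit */

	width = startbit - endbit + 1;
	if (width > 64)
		return -ERANGE;         /* field is wider than 64 bits */

	/* Shifting a 64-bit value by 64 is undefined, so skip when width is 64. */
	if (width < 64 && (val >> width) != 0)
		return -ERANGE;         /* value would be truncated */

	return 0;
}
```

For example, storing 0x100 in bits 7-0 is rejected because the value needs 9 bits, while 0xff in the same field is accepted.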
pack() and unpack()
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
These are const-correct variants of packing(), and eliminate the last "enum
|
||||
packing_op op" argument.
|
||||
|
||||
Calling pack(...) is equivalent, and preferred, to calling packing(..., PACK).
|
||||
|
||||
Calling unpack(...) is equivalent, and preferred, to calling packing(..., UNPACK).
|
||||
|
||||
pack_fields() and unpack_fields()
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The library exposes optimized functions for the scenario where there are many
|
||||
fields represented in a buffer, and it encourages consumer drivers to avoid
|
||||
repetitive calls to pack() and unpack() for each field, but instead use
|
||||
pack_fields() and unpack_fields(), which reduces the code footprint.
|
||||
|
||||
These APIs use field definitions in arrays of ``struct packed_field_u8`` or
|
||||
``struct packed_field_u16``, allowing consumer drivers to minimize the size
|
||||
of these arrays according to their custom requirements.
|
||||
|
||||
The pack_fields() and unpack_fields() API functions are actually macros which
|
||||
automatically select the appropriate function at compile time, based on the
|
||||
type of the fields array passed in.
|
||||
|
||||
An additional benefit over pack() and unpack() is that sanity checks on the
|
||||
field definitions are handled at compile time with ``BUILD_BUG_ON`` rather
|
||||
than only when the offending code is executed. These functions return void and
|
||||
wrapping them to handle unexpected errors is not necessary.
|
||||
|
||||
It is recommended, but not required, that you wrap your packed buffer into a
|
||||
structured type with a fixed size. This generally makes it easier for the
|
||||
compiler to enforce that the correct size buffer is used.
|
||||
|
||||
Here is an example of how to use the fields APIs:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/* Ordering inside the unpacked structure is flexible and can be different
|
||||
* from the packed buffer. Here, it is optimized to reduce padding.
|
||||
*/
|
||||
struct data {
|
||||
u64 field3;
|
||||
u32 field4;
|
||||
u16 field1;
|
||||
u8 field2;
|
||||
};
|
||||
|
||||
#define SIZE 13
|
||||
|
||||
typedef struct __packed { u8 buf[SIZE]; } packed_buf_t;
|
||||
|
||||
static const struct packed_field_u8 fields[] = {
|
||||
PACKED_FIELD(100, 90, struct data, field1),
|
||||
PACKED_FIELD(90, 87, struct data, field2),
|
||||
PACKED_FIELD(86, 30, struct data, field3),
|
||||
PACKED_FIELD(29, 0, struct data, field4),
|
||||
};
|
||||
|
||||
void unpack_your_data(const packed_buf_t *buf, struct data *unpacked)
|
||||
{
|
||||
BUILD_BUG_ON(sizeof(*buf) != SIZE);
|
||||
|
||||
unpack_fields(buf, sizeof(*buf), unpacked, fields,
|
||||
QUIRK_LITTLE_ENDIAN);
|
||||
}
|
||||
|
||||
void pack_your_data(const struct data *unpacked, packed_buf_t *buf)
|
||||
{
|
||||
BUILD_BUG_ON(sizeof(*buf) != SIZE);
|
||||
|
||||
pack_fields(buf, sizeof(*buf), unpacked, fields,
|
||||
QUIRK_LITTLE_ENDIAN);
|
||||
}
|
||||
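The kind of sanity checking applied to a field table can be illustrated in plain C. This is a sketch under stated assumptions, not the compile-time checks the library performs: `struct sketch_field` and `sketch_check_fields` are hypothetical, the set of rules (ordered bit range, at most 64 bits wide, inside the buffer, no overlap with the previous entry) is an assumption, and the table is assumed to be listed from the highest bits to the lowest.

```c
#include <stddef.h>

/* Hypothetical field descriptor: bit positions only, no struct offsets. */
struct sketch_field {
	unsigned int startbit;  /* most significant bit of the field */
	unsigned int endbit;    /* least significant bit of the field */
};

/*
 * Return 0 if the table is well formed for a buffer of buf_len bytes,
 * or the 1-based index of the first offending entry.
 * Assumes entries are listed from the highest bits to the lowest.
 */
static size_t sketch_check_fields(const struct sketch_field *f, size_t n,
				  size_t buf_len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (f[i].startbit < f[i].endbit)
			return i + 1;   /* reversed bit range */
		if (f[i].startbit - f[i].endbit + 1 > 64)
			return i + 1;   /* wider than 64 bits */
		if (f[i].startbit >= buf_len * 8)
			return i + 1;   /* does not fit in the buffer */
		if (i > 0 && f[i].startbit >= f[i - 1].endbit)
			return i + 1;   /* overlaps the previous field */
	}
	return 0;
}
```

Under these assumed rules, two consecutive entries such as (100, 90) and (90, 87) share bit 90, so the sketch reports the second entry as overlapping.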
|
||||
@ -295,9 +295,9 @@ slot set.
|
||||
|
||||
Fourth, the io_tlb_slot array keeps track of any "padding slots" allocated to
|
||||
meet alloc_align_mask requirements described above. When
|
||||
swiotlb_tlb_map_single() allocates bounce buffer space to meet alloc_align_mask
|
||||
swiotlb_tbl_map_single() allocates bounce buffer space to meet alloc_align_mask
|
||||
requirements, it may allocate pre-padding space across zero or more slots. But
|
||||
when swiotbl_tlb_unmap_single() is called with the bounce buffer address, the
|
||||
when swiotlb_tbl_unmap_single() is called with the bounce buffer address, the
|
||||
alloc_align_mask value that governed the allocation, and therefore the
|
||||
allocation of any padding slots, is not known. The "pad_slots" field records
|
||||
the number of padding slots so that swiotlb_tbl_unmap_single() can free them.
|
||||
|
||||
@ -53,7 +53,6 @@ preemption and interrupts::
|
||||
this_cpu_add_return(pcp, val)
|
||||
this_cpu_xchg(pcp, nval)
|
||||
this_cpu_cmpxchg(pcp, oval, nval)
|
||||
this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
|
||||
this_cpu_sub(pcp, val)
|
||||
this_cpu_inc(pcp)
|
||||
this_cpu_dec(pcp)
|
||||
@ -242,7 +241,6 @@ safe::
|
||||
__this_cpu_add_return(pcp, val)
|
||||
__this_cpu_xchg(pcp, nval)
|
||||
__this_cpu_cmpxchg(pcp, oval, nval)
|
||||
__this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
|
||||
__this_cpu_sub(pcp, val)
|
||||
__this_cpu_inc(pcp)
|
||||
__this_cpu_dec(pcp)
|
||||
|
||||
@ -161,6 +161,7 @@ See the include/linux/kmemleak.h header for the functions prototype.
|
||||
- ``kmemleak_free_percpu`` - notify of a percpu memory block freeing
|
||||
- ``kmemleak_update_trace`` - update object allocation stack trace
|
||||
- ``kmemleak_not_leak`` - mark an object as not a leak
|
||||
- ``kmemleak_transient_leak`` - mark an object as a transient leak
|
||||
- ``kmemleak_ignore`` - do not scan or report an object as leak
|
||||
- ``kmemleak_scan_area`` - add scan areas inside a memory block
|
||||
- ``kmemleak_no_scan`` - do not scan a memory block
|
||||
|
||||
@ -0,0 +1,42 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: "http://devicetree.org/schemas/arm/mediatek/mediatek,mt7622-pcie-mirror.yaml#"
|
||||
$schema: "http://devicetree.org/meta-schemas/core.yaml#"
|
||||
|
||||
title: MediaTek PCIE Mirror Controller for MT7622
|
||||
|
||||
maintainers:
|
||||
- Lorenzo Bianconi <lorenzo@kernel.org>
|
||||
- Felix Fietkau <nbd@nbd.name>
|
||||
|
||||
description:
|
||||
The MediaTek PCIE mirror provides a configuration interface for the PCIE
|
||||
controller on the MT7622 SoC.
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
items:
|
||||
- enum:
|
||||
- mediatek,mt7622-pcie-mirror
|
||||
- const: syscon
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
soc {
|
||||
#address-cells = <2>;
|
||||
#size-cells = <2>;
|
||||
pcie_mirror: pcie-mirror@10000400 {
|
||||
compatible = "mediatek,mt7622-pcie-mirror", "syscon";
|
||||
reg = <0 0x10000400 0 0x10>;
|
||||
};
|
||||
};
|
||||
@ -0,0 +1,50 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: "http://devicetree.org/schemas/arm/mediatek/mediatek,mt7622-wed.yaml#"
|
||||
$schema: "http://devicetree.org/meta-schemas/core.yaml#"
|
||||
|
||||
title: MediaTek Wireless Ethernet Dispatch Controller for MT7622
|
||||
|
||||
maintainers:
|
||||
- Lorenzo Bianconi <lorenzo@kernel.org>
|
||||
- Felix Fietkau <nbd@nbd.name>
|
||||
|
||||
description:
|
||||
The mediatek wireless ethernet dispatch controller can be configured to
|
||||
intercept and handle access to the WLAN DMA queues and PCIe interrupts
|
||||
and implement hardware flow offloading from ethernet to WLAN.
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
items:
|
||||
- enum:
|
||||
- mediatek,mt7622-wed
|
||||
- const: syscon
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- interrupts
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
#include <dt-bindings/interrupt-controller/irq.h>
|
||||
soc {
|
||||
#address-cells = <2>;
|
||||
#size-cells = <2>;
|
||||
wed0: wed@1020a000 {
|
||||
compatible = "mediatek,mt7622-wed","syscon";
|
||||
reg = <0 0x1020a000 0 0x1000>;
|
||||
interrupts = <GIC_SPI 214 IRQ_TYPE_LEVEL_LOW>;
|
||||
};
|
||||
};
|
||||
@ -253,6 +253,53 @@ properties:
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
sink-wait-cap-time-ms:
|
||||
description: Represents the max time in ms that USB Type-C port (in sink
|
||||
role) should wait for the port partner (source role) to send source caps.
|
||||
SinkWaitCap timer starts when port in sink role attaches to the source.
|
||||
This timer will stop when sink receives PD source cap advertisement before
|
||||
timeout in which case it'll move to capability negotiation stage. A
|
||||
timeout leads to a hard reset message by the port.
|
||||
minimum: 310
|
||||
maximum: 620
|
||||
default: 310
|
||||
|
||||
ps-source-off-time-ms:
|
||||
description: Represents the max time in ms that a DRP in source role should
|
||||
take to turn off power after the PsSourceOff timer starts. PsSourceOff
|
||||
timer starts when a sink's PHY layer receives EOP of the GoodCRC message
|
||||
(corresponding to an Accept message sent in response to a PR_Swap or a
|
||||
FR_Swap request). This timer stops when last bit of GoodCRC EOP
|
||||
corresponding to the received PS_RDY message is transmitted by the PHY
|
||||
layer. A timeout shall lead to error recovery in the type-c port.
|
||||
minimum: 750
|
||||
maximum: 920
|
||||
default: 920
|
||||
|
||||
cc-debounce-time-ms:
|
||||
description: Represents the max time in ms that a port shall wait to
|
||||
determine if it's attached to a partner.
|
||||
minimum: 100
|
||||
maximum: 200
|
||||
default: 200
|
||||
|
||||
sink-bc12-completion-time-ms:
|
||||
description: Represents the max time in ms that a port in sink role takes
|
||||
to complete Battery Charger (BC1.2) Detection. BC1.2 detection is a
|
||||
hardware mechanism, which in some TCPC implementations, can run in
|
||||
parallel once the Type-C connection state machine reaches the "potential
|
||||
connect as sink" state. In TCPCs where this causes delays to respond to
|
||||
the incoming PD messages, sink-bc12-completion-time-ms is used to delay
|
||||
PD negotiation till BC1.2 detection completes.
|
||||
default: 0
|
||||
|
||||
pd-revision:
|
||||
description: Specifies the maximum USB PD revision and version supported by
|
||||
the connector. This property is specified in the following order;
|
||||
<revision_major, revision_minor, version_major, version_minor>.
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
maxItems: 4
|
||||
|
||||
dependencies:
|
||||
sink-vdos-v1: [ 'sink-vdos' ]
|
||||
sink-vdos: [ 'sink-vdos-v1' ]
|
||||
@ -380,7 +427,7 @@ examples:
|
||||
};
|
||||
|
||||
# USB-C connector attached to a typec port controller(ptn5110), which has
|
||||
# power delivery support and enables drp.
|
||||
# power delivery support, explicitly defines time properties and enables drp.
|
||||
- |
|
||||
#include <dt-bindings/usb/pd.h>
|
||||
typec: ptn5110 {
|
||||
@ -393,6 +440,10 @@ examples:
|
||||
sink-pdos = <PDO_FIXED(5000, 2000, PDO_FIXED_USB_COMM)
|
||||
PDO_VAR(5000, 12000, 2000)>;
|
||||
op-sink-microwatt = <10000000>;
|
||||
sink-wait-cap-time-ms = <465>;
|
||||
ps-source-off-time-ms = <835>;
|
||||
cc-debounce-time-ms = <101>;
|
||||
sink-bc12-completion-time-ms = <500>;
|
||||
};
|
||||
};
|
||||
|
||||
|
||||
@ -31,6 +31,10 @@ node must be named "audio-codec".
|
||||
Required properties for the audio-codec subnode:
|
||||
|
||||
- #sound-dai-cells = <1>;
|
||||
- interrupts : should contain jack detection interrupts, with headset
|
||||
detect interrupt matching "hs" and microphone bias 2
|
||||
detect interrupt matching "mb2" in interrupt-names.
|
||||
- interrupt-names : Contains "hs", "mb2"
|
||||
|
||||
The audio-codec provides two DAIs. The first one is connected to the
|
||||
Stereo HiFi DAC and the second one is connected to the Voice DAC.
|
||||
@ -52,6 +56,8 @@ Example:
|
||||
|
||||
audio-codec {
|
||||
#sound-dai-cells = <1>;
|
||||
interrupts-extended = <&cpcap 9 0>, <&cpcap 10 0>;
|
||||
interrupt-names = "hs", "mb2";
|
||||
|
||||
/* HiFi */
|
||||
port@0 {
|
||||
|
||||
@ -9,7 +9,10 @@ title: Bosch MCAN controller Bindings
|
||||
description: Bosch MCAN controller for CAN bus
|
||||
|
||||
maintainers:
|
||||
- Sriram Dash <sriram.dash@samsung.com>
|
||||
- Chandrasekar Ramakrishnan <rcsekar@samsung.com>
|
||||
|
||||
allOf:
|
||||
- $ref: can-controller.yaml#
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
@ -66,8 +69,8 @@ properties:
|
||||
M_CAN includes the following elements according to user manual:
|
||||
11-bit Filter 0-128 elements / 0-128 words
|
||||
29-bit Filter 0-64 elements / 0-128 words
|
||||
Rx FIFO 0 0-64 elements / 0-1152 words
|
||||
Rx FIFO 1 0-64 elements / 0-1152 words
|
||||
Rx FIFO 0 0-64 elements / 0-1152 words
|
||||
Rx FIFO 1 0-64 elements / 0-1152 words
|
||||
Rx Buffers 0-64 elements / 0-1152 words
|
||||
Tx Event FIFO 0-32 elements / 0-64 words
|
||||
Tx Buffers 0-32 elements / 0-576 words
|
||||
@ -104,23 +107,31 @@ properties:
|
||||
maximum: 32
|
||||
maxItems: 1
|
||||
|
||||
power-domains:
|
||||
description:
|
||||
Power domain provider node and an args specifier containing
|
||||
the can device id value.
|
||||
maxItems: 1
|
||||
|
||||
can-transceiver:
|
||||
$ref: can-transceiver.yaml#
|
||||
|
||||
phys:
|
||||
maxItems: 1
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- reg-names
|
||||
- interrupts
|
||||
- interrupt-names
|
||||
- clocks
|
||||
- clock-names
|
||||
- bosch,mram-cfg
|
||||
|
||||
additionalProperties: false
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
// Example with interrupts
|
||||
#include <dt-bindings/clock/imx6sx-clock.h>
|
||||
can@20e8000 {
|
||||
compatible = "bosch,m_can";
|
||||
@ -138,4 +149,21 @@ examples:
|
||||
};
|
||||
};
|
||||
|
||||
- |
|
||||
// Example with timer polling
|
||||
#include <dt-bindings/clock/imx6sx-clock.h>
|
||||
can@20e8000 {
|
||||
compatible = "bosch,m_can";
|
||||
reg = <0x020e8000 0x4000>, <0x02298000 0x4000>;
|
||||
reg-names = "m_can", "message_ram";
|
||||
clocks = <&clks IMX6SX_CLK_CANFD>,
|
||||
<&clks IMX6SX_CLK_CANFD>;
|
||||
clock-names = "hclk", "cclk";
|
||||
bosch,mram-cfg = <0x0 0 0 32 0 0 0 1>;
|
||||
|
||||
can-transceiver {
|
||||
max-bitrate = <5000000>;
|
||||
};
|
||||
};
|
||||
|
||||
...
|
||||
|
||||
@ -13,6 +13,15 @@ properties:
|
||||
$nodename:
|
||||
pattern: "^can(@.*)?$"
|
||||
|
||||
termination-gpios:
|
||||
description: GPIO pin to enable CAN bus termination.
|
||||
maxItems: 1
|
||||
|
||||
termination-ohms:
|
||||
description: The resistance value of the CAN bus termination resistor.
|
||||
minimum: 1
|
||||
maximum: 65535
|
||||
|
||||
additionalProperties: true
|
||||
|
||||
...
|
||||
|
||||
@ -5,22 +5,26 @@ $id: http://devicetree.org/schemas/net/can/microchip,mcp251xfd.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title:
|
||||
Microchip MCP2517FD and MCP2518FD stand-alone CAN controller device tree
|
||||
bindings
|
||||
Microchip MCP2517FD, MCP2518FD and MCP251863 stand-alone CAN
|
||||
controller device tree bindings
|
||||
|
||||
maintainers:
|
||||
- Marc Kleine-Budde <mkl@pengutronix.de>
|
||||
|
||||
allOf:
|
||||
- $ref: can-controller.yaml#
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
oneOf:
|
||||
- const: microchip,mcp2517fd
|
||||
description: for MCP2517FD
|
||||
- const: microchip,mcp2518fd
|
||||
description: for MCP2518FD
|
||||
- const: microchip,mcp251xfd
|
||||
description: to autodetect chip variant
|
||||
|
||||
- enum:
|
||||
- microchip,mcp2517fd
|
||||
- microchip,mcp2518fd
|
||||
- microchip,mcp251xfd
|
||||
- items:
|
||||
- enum:
|
||||
- microchip,mcp251863
|
||||
- const: microchip,mcp2518fd
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
|
||||
@ -4,7 +4,10 @@ Texas Instruments TCAN4x5x CAN Controller
|
||||
This file provides device node information for the TCAN4x5x interface.
|
||||
|
||||
Required properties:
|
||||
- compatible: "ti,tcan4x5x"
|
||||
- compatible:
|
||||
"ti,tcan4552", "ti,tcan4x5x"
|
||||
"ti,tcan4553", "ti,tcan4x5x" or
|
||||
"ti,tcan4x5x"
|
||||
- reg: 0
|
||||
- #address-cells: 1
|
||||
- #size-cells: 0
|
||||
@ -21,8 +24,12 @@ Optional properties:
|
||||
- reset-gpios: Hardwired output GPIO. If not defined then software
|
||||
reset.
|
||||
- device-state-gpios: Input GPIO that indicates if the device is in
|
||||
a sleep state or if the device is active.
|
||||
- device-wake-gpios: Wake up GPIO to wake up the TCAN device.
|
||||
a sleep state or if the device is active. Not
|
||||
available with tcan4552/4553.
|
||||
- device-wake-gpios: Wake up GPIO to wake up the TCAN device. Not
|
||||
available with tcan4552/4553.
|
||||
- wakeup-source: Leave the chip running when suspended, and configure
|
||||
the RX interrupt to wake up the device.
|
||||
|
||||
Example:
|
||||
tcan4x5x: tcan4x5x@0 {
|
||||
@ -31,10 +38,11 @@ tcan4x5x: tcan4x5x@0 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
spi-max-frequency = <10000000>;
|
||||
bosch,mram-cfg = <0x0 0 0 32 0 0 1 1>;
|
||||
bosch,mram-cfg = <0x0 0 0 16 0 0 1 1>;
|
||||
interrupt-parent = <&gpio1>;
|
||||
interrupts = <14 IRQ_TYPE_LEVEL_LOW>;
|
||||
device-state-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
|
||||
device-wake-gpios = <&gpio1 15 GPIO_ACTIVE_HIGH>;
|
||||
reset-gpios = <&gpio1 27 GPIO_ACTIVE_HIGH>;
|
||||
wakeup-source;
|
||||
};
|
||||
|
||||
@ -41,6 +41,16 @@ Required properties:
|
||||
- mediatek,pctl: phandle to the syscon node that handles the ports slew rate
|
||||
and driver current: only for MT2701 and MT7623 SoC
|
||||
|
||||
Optional properties:
|
||||
- dma-coherent: present if dma operations are coherent
|
||||
- mediatek,cci-control: phandle to the cache coherent interconnect node
|
||||
- mediatek,hifsys: phandle to the mediatek hifsys controller used to provide
|
||||
various clocks and reset to the system.
|
||||
- mediatek,wed: a list of phandles to wireless ethernet dispatch nodes for
|
||||
MT7622 SoC.
|
||||
- mediatek,pcie-mirror: phandle to the mediatek pcie-mirror controller for
|
||||
MT7622 SoC.
|
||||
|
||||
* Ethernet MAC node
|
||||
|
||||
Required properties:
|
||||
|
||||
56
Documentation/devicetree/bindings/net/rfkill-gpio.yaml
Normal file
@ -0,0 +1,56 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/rfkill-gpio.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: GPIO controlled rfkill switch
|
||||
|
||||
maintainers:
|
||||
- Johannes Berg <johannes@sipsolutions.net>
|
||||
- Philipp Zabel <p.zabel@pengutronix.de>
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
const: rfkill-gpio
|
||||
|
||||
label:
|
||||
description: rfkill switch name, defaults to node name
|
||||
|
||||
radio-type:
|
||||
description: rfkill radio type
|
||||
enum:
|
||||
- bluetooth
|
||||
- fm
|
||||
- gps
|
||||
- nfc
|
||||
- ultrawideband
|
||||
- wimax
|
||||
- wlan
|
||||
- wwan
|
||||
|
||||
shutdown-gpios:
|
||||
maxItems: 1
|
||||
|
||||
default-blocked:
|
||||
$ref: /schemas/types.yaml#/definitions/flag
|
||||
description: configure rfkill state as blocked at boot
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- radio-type
|
||||
- shutdown-gpios
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
#include <dt-bindings/gpio/gpio.h>
|
||||
|
||||
rfkill {
|
||||
compatible = "rfkill-gpio";
|
||||
label = "rfkill-pcie-wlan";
|
||||
radio-type = "wlan";
|
||||
shutdown-gpios = <&gpio2 25 GPIO_ACTIVE_HIGH>;
|
||||
default-blocked;
|
||||
};
|
||||
@ -4,7 +4,7 @@
|
||||
$id: http://devicetree.org/schemas/net/wireless/brcm,bcm4329-fmac.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Broadcom BCM4329 family fullmac wireless SDIO devices
|
||||
title: Broadcom BCM4329 family fullmac wireless SDIO/PCIE devices
|
||||
|
||||
maintainers:
|
||||
- Arend van Spriel <arend@broadcom.com>
|
||||
@ -15,6 +15,9 @@ description:
|
||||
These chips also have a Bluetooth portion described in a separate
|
||||
binding.
|
||||
|
||||
allOf:
|
||||
- $ref: ieee80211.yaml#
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
oneOf:
|
||||
@ -38,14 +41,23 @@ properties:
|
||||
- brcm,bcm4354-fmac
|
||||
- brcm,bcm4356-fmac
|
||||
- brcm,bcm4359-fmac
|
||||
- brcm,bcm4366-fmac
|
||||
- cypress,cyw4373-fmac
|
||||
- cypress,cyw43012-fmac
|
||||
- infineon,cyw43439-fmac
|
||||
- const: brcm,bcm4329-fmac
|
||||
- const: brcm,bcm4329-fmac
|
||||
- enum:
|
||||
- brcm,bcm4329-fmac
|
||||
- pci14e4,43dc # BCM4355
|
||||
- pci14e4,4464 # BCM4364
|
||||
- pci14e4,4488 # BCM4377
|
||||
- pci14e4,4425 # BCM4378
|
||||
- pci14e4,4433 # BCM4387
|
||||
- pci14e4,449d # BCM43752
|
||||
|
||||
reg:
|
||||
description: SDIO function number for the device, for most cases
|
||||
this will be 1.
|
||||
description: SDIO function number for the device (for most cases
|
||||
this will be 1) or PCI device identifier.
|
||||
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
@ -75,11 +87,54 @@ properties:
|
||||
items:
|
||||
pattern: '^[A-Z][A-Z]-[A-Z][0-9A-Z]-[0-9]+$'
|
||||
|
||||
brcm,ccode-map-trivial:
|
||||
description: |
|
||||
Use a trivial mapping of ISO3166 country codes to brcmfmac firmware
|
||||
country code and revision: cc -> { cc, 0 }. In other words, assume that
|
||||
the CLM blob firmware uses ISO3166 country codes as well, and that all
|
||||
revisions are zero. This property is mutually exclusive with
|
||||
brcm,ccode-map. If both properties are specified, then brcm,ccode-map
|
||||
takes precedence.
|
||||
type: boolean
|
||||
|
||||
brcm,cal-blob:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
description: A per-device calibration blob for the Wi-Fi radio. This
|
||||
should be filled in by the bootloader from platform configuration
|
||||
data, if necessary, and will be uploaded to the device if present.
|
||||
|
||||
brcm,board-type:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
description: Overrides the board type, which is normally the compatible of
|
||||
the root node. This can be used to decouple the overall system board or
|
||||
device name from the board type for WiFi purposes, which is used to
|
||||
construct firmware and NVRAM configuration filenames, allowing for
|
||||
multiple devices that share the same module or characteristics for the
|
||||
WiFi subsystem to share the same firmware/NVRAM files. On Apple platforms,
|
||||
this should be the Apple module-instance codename prefixed by "apple,",
|
||||
e.g. "apple,honshu".
|
||||
|
||||
apple,antenna-sku:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
description: Antenna SKU used to identify a specific antenna configuration
|
||||
on Apple platforms. This is used to build firmware filenames, to allow
|
||||
platforms with different antenna configs to have different firmware and/or
|
||||
NVRAM. This would normally be filled in by the bootloader from platform
|
||||
configuration data.
|
||||
|
||||
clocks:
|
||||
items:
|
||||
- description: External Low Power Clock input (32.768KHz)
|
||||
|
||||
clock-names:
|
||||
items:
|
||||
- const: lpo
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
|
||||
additionalProperties: false
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
|
||||
@ -93,20 +93,41 @@ properties:
|
||||
|
||||
ieee80211-freq-limit: true
|
||||
|
||||
qcom,ath10k-calibration-data:
|
||||
qcom,calibration-data:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
description:
|
||||
Calibration data + board-specific data as a byte array. The length
|
||||
can vary between hardware versions.
|
||||
|
||||
qcom,ath10k-calibration-variant:
|
||||
qcom,ath10k-calibration-data:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
deprecated: true
|
||||
description:
|
||||
Calibration data + board-specific data as a byte array. The length
|
||||
can vary between hardware versions.
|
||||
|
||||
qcom,calibration-variant:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
description:
|
||||
Unique variant identifier of the calibration data in board-2.bin
|
||||
for designs with colliding bus and device specific ids
|
||||
|
||||
qcom,ath10k-calibration-variant:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
deprecated: true
|
||||
description:
|
||||
Unique variant identifier of the calibration data in board-2.bin
|
||||
for designs with colliding bus and device specific ids
|
||||
|
||||
qcom,pre-calibration-data:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
description:
|
||||
Pre-calibration data as a byte array. The length can vary between
|
||||
hardware versions.
|
||||
|
||||
qcom,ath10k-pre-calibration-data:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
deprecated: true
|
||||
description:
|
||||
Pre-calibration data as a byte array. The length can vary between
|
||||
hardware versions.
|
||||
|
||||
@ -23,8 +23,15 @@ properties:
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
qcom,calibration-variant:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
description: |
|
||||
string to uniquely identify variant of the calibration data for designs
|
||||
with colliding bus and device ids
|
||||
|
||||
qcom,ath11k-calibration-variant:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
deprecated: true
|
||||
description: |
|
||||
string to uniquely identify variant of the calibration data for designs
|
||||
with colliding bus and device ids
|
||||
@ -50,6 +57,9 @@ properties:
|
||||
vddrfa1p7-supply:
|
||||
description: VDD_RFA_1P7 supply regulator handle
|
||||
|
||||
vddrfa1p8-supply:
|
||||
description: VDD_RFA_1P8 supply regulator handle
|
||||
|
||||
vddpcie0p9-supply:
|
||||
description: VDD_PCIE_0P9 supply regulator handle
|
||||
|
||||
@ -77,6 +87,22 @@ allOf:
|
||||
- vddrfa1p7-supply
|
||||
- vddpcie0p9-supply
|
||||
- vddpcie1p8-supply
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
const: pci17cb,1103
|
||||
then:
|
||||
required:
|
||||
- vddrfacmn-supply
|
||||
- vddaon-supply
|
||||
- vddwlcx-supply
|
||||
- vddwlmx-supply
|
||||
- vddrfa0p8-supply
|
||||
- vddrfa1p2-supply
|
||||
- vddrfa1p8-supply
|
||||
- vddpcie0p9-supply
|
||||
- vddpcie1p8-supply
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
@ -99,7 +125,17 @@ examples:
|
||||
compatible = "pci17cb,1103";
|
||||
reg = <0x10000 0x0 0x0 0x0 0x0>;
|
||||
|
||||
qcom,ath11k-calibration-variant = "LE_X13S";
|
||||
vddrfacmn-supply = <&vreg_pmu_rfa_cmn_0p8>;
|
||||
vddaon-supply = <&vreg_pmu_aon_0p8>;
|
||||
vddwlcx-supply = <&vreg_pmu_wlcx_0p8>;
|
||||
vddwlmx-supply = <&vreg_pmu_wlmx_0p8>;
|
||||
vddpcie1p8-supply = <&vreg_pmu_pcie_1p8>;
|
||||
vddpcie0p9-supply = <&vreg_pmu_pcie_0p9>;
|
||||
vddrfa0p8-supply = <&vreg_pmu_rfa_0p8>;
|
||||
vddrfa1p2-supply = <&vreg_pmu_rfa_1p2>;
|
||||
vddrfa1p8-supply = <&vreg_pmu_rfa_1p7>;
|
||||
|
||||
qcom,calibration-variant = "LE_X13S";
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
@ -42,8 +42,15 @@ properties:
|
||||
* reg
|
||||
* reg-names
|
||||
|
||||
qcom,calibration-variant:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
description:
|
||||
string to uniquely identify variant of the calibration data in the
|
||||
board-2.bin for designs with colliding bus and device specific ids
|
||||
|
||||
qcom,ath11k-calibration-variant:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
deprecated: true
|
||||
description:
|
||||
string to uniquely identify variant of the calibration data in the
|
||||
board-2.bin for designs with colliding bus and device specific ids
|
||||
|
||||
@ -0,0 +1,211 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/wireless/qcom,ath12k-wsi.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Qualcomm Technologies ath12k wireless devices (PCIe) with WSI interface
|
||||
|
||||
maintainers:
|
||||
- Jeff Johnson <jjohnson@kernel.org>
|
||||
- Kalle Valo <kvalo@kernel.org>
|
||||
|
||||
description: |
|
||||
Qualcomm Technologies IEEE 802.11be PCIe devices with WSI interface.
|
||||
|
||||
The ath12k devices (QCN9274) feature WSI support. WSI stands for
|
||||
WLAN Serial Interface. It is used for the exchange of specific
|
||||
control information across radios based on the doorbell mechanism.
|
||||
This WSI connection is essential to exchange control information
|
||||
among these devices.
|
||||
|
||||
The WSI interface includes TX and RX ports, which are used to connect
|
||||
multiple WSI-supported devices together, forming a WSI group.
|
||||
|
||||
Diagram to represent one WSI connection (one WSI group) among
|
||||
three devices.
|
||||
|
||||
+-------+ +-------+ +-------+
|
||||
| pcie1 | | pcie2 | | pcie3 |
|
||||
| | | | | |
|
||||
+----->| wsi |------->| wsi |------->| wsi |-----+
|
||||
| | grp 0 | | grp 0 | | grp 0 | |
|
||||
| +-------+ +-------+ +-------+ |
|
||||
+------------------------------------------------------+
|
||||
|
||||
Diagram to represent two WSI connections (two separate WSI groups)
|
||||
among four devices.
|
||||
|
||||
+-------+ +-------+ +-------+ +-------+
|
||||
| pcie0 | | pcie1 | | pcie2 | | pcie3 |
|
||||
| | | | | | | |
|
||||
+-->| wsi |--->| wsi |--+ +-->| wsi |--->| wsi |--+
|
||||
| | grp 0 | | grp 0 | | | | grp 1 | | grp 1 | |
|
||||
| +-------+ +-------+ | | +-------+ +-------+ |
|
||||
+---------------------------+ +---------------------------+
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- pci17cb,1109 # QCN9274
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
qcom,calibration-variant:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
description:
|
||||
String to uniquely identify variant of the calibration data for designs
|
||||
with colliding bus and device ids
|
||||
|
||||
qcom,ath12k-calibration-variant:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
deprecated: true
|
||||
description:
|
||||
String to uniquely identify variant of the calibration data for designs
|
||||
with colliding bus and device ids
|
||||
|
||||
qcom,wsi-controller:
|
||||
$ref: /schemas/types.yaml#/definitions/flag
|
||||
description:
|
||||
The WSI controller device in the WSI group is capable of
|
||||
synchronizing the Timing Synchronization Function (TSF) clock across
|
||||
all devices in the WSI group.
|
||||
|
||||
ports:
|
||||
$ref: /schemas/graph.yaml#/properties/ports
|
||||
properties:
|
||||
port@0:
|
||||
$ref: /schemas/graph.yaml#/properties/port
|
||||
description:
|
||||
This is the TX port of WSI interface. It is attached to the RX
|
||||
port of the next device in the WSI connection.
|
||||
|
||||
port@1:
|
||||
$ref: /schemas/graph.yaml#/properties/port
|
||||
description:
|
||||
This is the RX port of WSI interface. It is attached to the TX
|
||||
port of the previous device in the WSI connection.
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
pcie {
|
||||
#address-cells = <3>;
|
||||
#size-cells = <2>;
|
||||
|
||||
pcie@0 {
|
||||
device_type = "pci";
|
||||
reg = <0x0 0x0 0x0 0x0 0x0>;
|
||||
#address-cells = <3>;
|
||||
#size-cells = <2>;
|
||||
ranges;
|
||||
|
||||
wifi@0 {
|
||||
compatible = "pci17cb,1109";
|
||||
reg = <0x0 0x0 0x0 0x0 0x0>;
|
||||
|
||||
qcom,calibration-variant = "RDP433_1";
|
||||
|
||||
ports {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
||||
port@0 {
|
||||
reg = <0>;
|
||||
|
||||
wifi1_wsi_tx: endpoint {
|
||||
remote-endpoint = <&wifi2_wsi_rx>;
|
||||
};
|
||||
};
|
||||
|
||||
port@1 {
|
||||
reg = <1>;
|
||||
|
||||
wifi1_wsi_rx: endpoint {
|
||||
remote-endpoint = <&wifi3_wsi_tx>;
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
pcie@1 {
|
||||
device_type = "pci";
|
||||
reg = <0x0 0x0 0x1 0x0 0x0>;
|
||||
#address-cells = <3>;
|
||||
#size-cells = <2>;
|
||||
ranges;
|
||||
|
||||
wifi@0 {
|
||||
compatible = "pci17cb,1109";
|
||||
reg = <0x0 0x0 0x0 0x0 0x0>;
|
||||
|
||||
qcom,calibration-variant = "RDP433_2";
|
||||
qcom,wsi-controller;
|
||||
|
||||
ports {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
||||
port@0 {
|
||||
reg = <0>;
|
||||
|
||||
wifi2_wsi_tx: endpoint {
|
||||
remote-endpoint = <&wifi3_wsi_rx>;
|
||||
};
|
||||
};
|
||||
|
||||
port@1 {
|
||||
reg = <1>;
|
||||
|
||||
wifi2_wsi_rx: endpoint {
|
||||
remote-endpoint = <&wifi1_wsi_tx>;
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
pcie@2 {
|
||||
device_type = "pci";
|
||||
reg = <0x0 0x0 0x2 0x0 0x0>;
|
||||
#address-cells = <3>;
|
||||
#size-cells = <2>;
|
||||
ranges;
|
||||
|
||||
wifi@0 {
|
||||
compatible = "pci17cb,1109";
|
||||
reg = <0x0 0x0 0x0 0x0 0x0>;
|
||||
|
||||
qcom,calibration-variant = "RDP433_3";
|
||||
|
||||
ports {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
||||
port@0 {
|
||||
reg = <0>;
|
||||
|
||||
wifi3_wsi_tx: endpoint {
|
||||
remote-endpoint = <&wifi1_wsi_rx>;
|
||||
};
|
||||
};
|
||||
|
||||
port@1 {
|
||||
reg = <1>;
|
||||
|
||||
wifi3_wsi_rx: endpoint {
|
||||
remote-endpoint = <&wifi2_wsi_tx>;
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
@ -1,27 +0,0 @@
|
||||
* Altera PCIe MSI controller
|
||||
|
||||
Required properties:
|
||||
- compatible: should contain "altr,msi-1.0"
|
||||
- reg: specifies the physical base address of the controller and
|
||||
the length of the memory mapped region.
|
||||
- reg-names: must include the following entries:
|
||||
"csr": CSR registers
|
||||
"vector_slave": vectors slave port region
|
||||
- interrupts: specifies the interrupt source of the parent interrupt
|
||||
controller. The format of the interrupt specifier depends on the
|
||||
parent interrupt controller.
|
||||
- num-vectors: number of vectors, range 1 to 32.
|
||||
- msi-controller: indicates that this is MSI controller node
|
||||
|
||||
|
||||
Example
|
||||
msi0: msi@0xFF200000 {
|
||||
compatible = "altr,msi-1.0";
|
||||
reg = <0xFF200000 0x00000010
|
||||
0xFF200010 0x00000080>;
|
||||
reg-names = "csr", "vector_slave";
|
||||
interrupt-parent = <&hps_0_arm_gic_0>;
|
||||
interrupts = <0 42 4>;
|
||||
msi-controller;
|
||||
num-vectors = <32>;
|
||||
};
|
||||
@ -1,50 +0,0 @@
|
||||
* Altera PCIe controller
|
||||
|
||||
Required properties:
|
||||
- compatible : should contain "altr,pcie-root-port-1.0" or "altr,pcie-root-port-2.0"
|
||||
- reg: a list of physical base address and length for TXS and CRA.
|
||||
For "altr,pcie-root-port-2.0", additional HIP base address and length.
|
||||
- reg-names: must include the following entries:
|
||||
"Txs": TX slave port region
|
||||
"Cra": Control register access region
|
||||
"Hip": Hard IP region (if "altr,pcie-root-port-2.0")
|
||||
- interrupts: specifies the interrupt source of the parent interrupt
|
||||
controller. The format of the interrupt specifier depends
|
||||
on the parent interrupt controller.
|
||||
- device_type: must be "pci"
|
||||
- #address-cells: set to <3>
|
||||
- #size-cells: set to <2>
|
||||
- #interrupt-cells: set to <1>
|
||||
- ranges: describes the translation of addresses for root ports and
|
||||
standard PCI regions.
|
||||
- interrupt-map-mask and interrupt-map: standard PCI properties to define the
|
||||
mapping of the PCIe interface to interrupt numbers.
|
||||
|
||||
Optional properties:
|
||||
- msi-parent: Link to the hardware entity that serves as the MSI controller
|
||||
for this PCIe controller.
|
||||
- bus-range: PCI bus numbers covered
|
||||
|
||||
Example
|
||||
pcie_0: pcie@c00000000 {
|
||||
compatible = "altr,pcie-root-port-1.0";
|
||||
reg = <0xc0000000 0x20000000>,
|
||||
<0xff220000 0x00004000>;
|
||||
reg-names = "Txs", "Cra";
|
||||
interrupt-parent = <&hps_0_arm_gic_0>;
|
||||
interrupts = <0 40 4>;
|
||||
interrupt-controller;
|
||||
#interrupt-cells = <1>;
|
||||
bus-range = <0x0 0xFF>;
|
||||
device_type = "pci";
|
||||
msi-parent = <&msi_to_gic_gen_0>;
|
||||
#address-cells = <3>;
|
||||
#size-cells = <2>;
|
||||
interrupt-map-mask = <0 0 0 7>;
|
||||
interrupt-map = <0 0 0 1 &pcie_0 1>,
|
||||
<0 0 0 2 &pcie_0 2>,
|
||||
<0 0 0 3 &pcie_0 3>,
|
||||
<0 0 0 4 &pcie_0 4>;
|
||||
ranges = <0x82000000 0x00000000 0x00000000 0xc0000000 0x00000000 0x10000000
|
||||
0x82000000 0x00000000 0x10000000 0xd0000000 0x00000000 0x10000000>;
|
||||
};
|
||||
@ -0,0 +1,65 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||
# Copyright (C) 2015, 2024, Intel Corporation
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/altr,msi-controller.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Altera PCIe MSI controller
|
||||
|
||||
maintainers:
|
||||
- Matthew Gerlach <matthew.gerlach@linux.intel.com>
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- altr,msi-1.0
|
||||
|
||||
reg:
|
||||
items:
|
||||
- description: CSR registers
|
||||
- description: Vectors slave port region
|
||||
|
||||
reg-names:
|
||||
items:
|
||||
- const: csr
|
||||
- const: vector_slave
|
||||
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
|
||||
msi-controller: true
|
||||
|
||||
num-vectors:
|
||||
description: number of vectors
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
minimum: 1
|
||||
maximum: 32
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- reg-names
|
||||
- interrupts
|
||||
- msi-controller
|
||||
- num-vectors
|
||||
|
||||
allOf:
|
||||
- $ref: /schemas/interrupt-controller/msi-controller.yaml#
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
#include <dt-bindings/interrupt-controller/irq.h>
|
||||
msi@ff200000 {
|
||||
compatible = "altr,msi-1.0";
|
||||
reg = <0xff200000 0x00000010>,
|
||||
<0xff200010 0x00000080>;
|
||||
reg-names = "csr", "vector_slave";
|
||||
interrupt-parent = <&hps_0_arm_gic_0>;
|
||||
interrupts = <GIC_SPI 42 IRQ_TYPE_LEVEL_HIGH>;
|
||||
msi-controller;
|
||||
num-vectors = <32>;
|
||||
};
|
||||
114
Documentation/devicetree/bindings/pci/altr,pcie-root-port.yaml
Normal file
@ -0,0 +1,114 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||
# Copyright (C) 2015, 2019, 2024, Intel Corporation
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/altr,pcie-root-port.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Altera PCIe Root Port
|
||||
|
||||
maintainers:
|
||||
- Matthew Gerlach <matthew.gerlach@linux.intel.com>
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- altr,pcie-root-port-1.0
|
||||
- altr,pcie-root-port-2.0
|
||||
|
||||
reg:
|
||||
items:
|
||||
- description: TX slave port region
|
||||
- description: Control register access region
|
||||
- description: Hard IP region
|
||||
minItems: 2
|
||||
|
||||
reg-names:
|
||||
items:
|
||||
- const: Txs
|
||||
- const: Cra
|
||||
- const: Hip
|
||||
minItems: 2
|
||||
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
|
||||
interrupt-controller: true
|
||||
|
||||
interrupt-map-mask:
|
||||
items:
|
||||
- const: 0
|
||||
- const: 0
|
||||
- const: 0
|
||||
- const: 7
|
||||
|
||||
interrupt-map:
|
||||
maxItems: 4
|
||||
|
||||
"#interrupt-cells":
|
||||
const: 1
|
||||
|
||||
msi-parent: true
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- reg-names
|
||||
- interrupts
|
||||
- "#interrupt-cells"
|
||||
- interrupt-controller
|
||||
- interrupt-map
|
||||
- interrupt-map-mask
|
||||
|
||||
allOf:
|
||||
- $ref: /schemas/pci/pci-host-bridge.yaml#
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- altr,pcie-root-port-1.0
|
||||
then:
|
||||
properties:
|
||||
reg:
|
||||
maxItems: 2
|
||||
|
||||
reg-names:
|
||||
maxItems: 2
|
||||
|
||||
else:
|
||||
properties:
|
||||
reg:
|
||||
minItems: 3
|
||||
|
||||
reg-names:
|
||||
minItems: 3
|
||||
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
#include <dt-bindings/interrupt-controller/irq.h>
|
||||
pcie_0: pcie@c00000000 {
|
||||
compatible = "altr,pcie-root-port-1.0";
|
||||
reg = <0xc0000000 0x20000000>,
|
||||
<0xff220000 0x00004000>;
|
||||
reg-names = "Txs", "Cra";
|
||||
interrupt-parent = <&hps_0_arm_gic_0>;
|
||||
interrupts = <GIC_SPI 40 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-controller;
|
||||
#interrupt-cells = <1>;
|
||||
bus-range = <0x0 0xff>;
|
||||
device_type = "pci";
|
||||
msi-parent = <&msi_to_gic_gen_0>;
|
||||
#address-cells = <3>;
|
||||
#size-cells = <2>;
|
||||
interrupt-map-mask = <0 0 0 7>;
|
||||
interrupt-map = <0 0 0 1 &pcie_0 0 0 0 1>,
|
||||
<0 0 0 2 &pcie_0 0 0 0 2>,
|
||||
<0 0 0 3 &pcie_0 0 0 0 3>,
|
||||
<0 0 0 4 &pcie_0 0 0 0 4>;
|
||||
ranges = <0x82000000 0x00000000 0x00000000 0xc0000000 0x00000000 0x10000000>,
|
||||
<0x82000000 0x00000000 0x10000000 0xd0000000 0x00000000 0x10000000>;
|
||||
};
|
||||
@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
title: Brcmstb PCIe Host Controller Device Tree Bindings
|
||||
|
||||
maintainers:
|
||||
- Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
|
||||
- Jim Quinlan <james.quinlan@broadcom.com>
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
@ -16,9 +16,12 @@ properties:
|
||||
- brcm,bcm2711-pcie # The Raspberry Pi 4
|
||||
- brcm,bcm4908-pcie
|
||||
- brcm,bcm7211-pcie # Broadcom STB version of RPi4
|
||||
- brcm,bcm7278-pcie # Broadcom 7278 Arm
|
||||
- brcm,bcm7216-pcie # Broadcom 7216 Arm
|
||||
- brcm,bcm7278-pcie # Broadcom 7278 Arm
|
||||
- brcm,bcm7425-pcie # Broadcom 7425 MIPs
|
||||
- brcm,bcm7435-pcie # Broadcom 7435 MIPs
|
||||
- brcm,bcm7445-pcie # Broadcom 7445 Arm
|
||||
- brcm,bcm7712-pcie # Broadcom STB sibling of Rpi 5
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
@ -93,7 +96,16 @@ properties:
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
|
||||
resets:
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
|
||||
reset-names:
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- ranges
|
||||
- dma-ranges
|
||||
@ -114,8 +126,7 @@ allOf:
|
||||
then:
|
||||
properties:
|
||||
resets:
|
||||
items:
|
||||
- description: reset controller handling the PERST# signal
|
||||
maxItems: 1
|
||||
|
||||
reset-names:
|
||||
items:
|
||||
@ -132,8 +143,7 @@ allOf:
|
||||
then:
|
||||
properties:
|
||||
resets:
|
||||
items:
|
||||
- description: phandle pointing to the RESCAL reset controller
|
||||
maxItems: 1
|
||||
|
||||
reset-names:
|
||||
items:
|
||||
@ -143,6 +153,27 @@ allOf:
|
||||
- resets
|
||||
- reset-names
|
||||
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
const: brcm,bcm7712-pcie
|
||||
then:
|
||||
properties:
|
||||
resets:
|
||||
minItems: 3
|
||||
maxItems: 3
|
||||
|
||||
reset-names:
|
||||
items:
|
||||
- const: rescal
|
||||
- const: bridge
|
||||
- const: swinit
|
||||
|
||||
required:
|
||||
- resets
|
||||
- reset-names
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
|
||||
@ -65,12 +65,14 @@ allOf:
|
||||
then:
|
||||
properties:
|
||||
reg:
|
||||
minItems: 2
|
||||
maxItems: 2
|
||||
minItems: 4
|
||||
maxItems: 4
|
||||
reg-names:
|
||||
items:
|
||||
- const: dbi
|
||||
- const: addr_space
|
||||
- const: dbi2
|
||||
- const: atu
|
||||
|
||||
- if:
|
||||
properties:
|
||||
@ -129,8 +131,11 @@ examples:
|
||||
|
||||
pcie_ep: pcie-ep@33800000 {
|
||||
compatible = "fsl,imx8mp-pcie-ep";
|
||||
reg = <0x33800000 0x000400000>, <0x18000000 0x08000000>;
|
||||
reg-names = "dbi", "addr_space";
|
||||
reg = <0x33800000 0x100000>,
|
||||
<0x18000000 0x8000000>,
|
||||
<0x33900000 0x100000>,
|
||||
<0x33b00000 0x100000>;
|
||||
reg-names = "dbi", "addr_space", "dbi2", "atu";
|
||||
clocks = <&clk IMX8MP_CLK_HSIO_ROOT>,
|
||||
<&clk IMX8MP_CLK_HSIO_AXI>,
|
||||
<&clk IMX8MP_CLK_PCIE_ROOT>;
|
||||
|
||||
@ -30,6 +30,7 @@ properties:
|
||||
- fsl,imx8mm-pcie
|
||||
- fsl,imx8mp-pcie
|
||||
- fsl,imx95-pcie
|
||||
- fsl,imx8q-pcie
|
||||
|
||||
clocks:
|
||||
minItems: 3
|
||||
@ -184,6 +185,21 @@ allOf:
|
||||
- const: pcie_bus
|
||||
- const: pcie_aux
|
||||
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- fsl,imx8q-pcie
|
||||
then:
|
||||
properties:
|
||||
clocks:
|
||||
maxItems: 3
|
||||
clock-names:
|
||||
items:
|
||||
- const: dbi
|
||||
- const: mstr
|
||||
- const: slv
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
|
||||
@ -22,18 +22,20 @@ description:
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- fsl,ls1021a-pcie
|
||||
- fsl,ls2080a-pcie
|
||||
- fsl,ls2085a-pcie
|
||||
- fsl,ls2088a-pcie
|
||||
- fsl,ls1088a-pcie
|
||||
- fsl,ls1046a-pcie
|
||||
- fsl,ls1043a-pcie
|
||||
- fsl,ls1012a-pcie
|
||||
- fsl,ls1028a-pcie
|
||||
- fsl,lx2160a-pcie
|
||||
|
||||
oneOf:
|
||||
- enum:
|
||||
- fsl,ls1012a-pcie
|
||||
- fsl,ls1021a-pcie
|
||||
- fsl,ls1028a-pcie
|
||||
- fsl,ls1043a-pcie
|
||||
- fsl,ls1046a-pcie
|
||||
- fsl,ls1088a-pcie
|
||||
- fsl,ls2080a-pcie
|
||||
- fsl,ls2085a-pcie
|
||||
- fsl,ls2088a-pcie
|
||||
- items:
|
||||
- const: fsl,lx2160ar2-pcie
|
||||
- const: fsl,ls2088a-pcie
|
||||
reg:
|
||||
maxItems: 2
|
||||
|
||||
@ -43,10 +45,15 @@ properties:
|
||||
- const: config
|
||||
|
||||
fsl,pcie-scfg:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
$ref: /schemas/types.yaml#/definitions/phandle-array
|
||||
description: A phandle to the SCFG device node. The second entry is the
|
||||
physical PCIe controller index starting from '0'. This is used to get
|
||||
SCFG PEXN registers.
|
||||
items:
|
||||
items:
|
||||
- description: A phandle to the SCFG device node
|
||||
- description: PCIe controller index starting from '0'
|
||||
maxItems: 1
|
||||
|
||||
big-endian:
|
||||
$ref: /schemas/types.yaml#/definitions/flag
|
||||
@ -67,6 +74,14 @@ properties:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
|
||||
num-viewport:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
deprecated: true
|
||||
description:
|
||||
Number of outbound view ports configured in hardware. It's the same as
|
||||
the number of outbound AT windows.
|
||||
maximum: 256
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
|
||||
@ -37,7 +37,8 @@ properties:
|
||||
minItems: 3
|
||||
maxItems: 4
|
||||
|
||||
clocks: true
|
||||
clocks:
|
||||
maxItems: 5
|
||||
|
||||
clock-names:
|
||||
items:
|
||||
|
||||
@ -102,8 +102,6 @@ properties:
|
||||
As described in IEEE Std 1275-1994, but must provide at least a
|
||||
definition of non-prefetchable memory. One or both of prefetchable Memory
|
||||
and IO Space may also be provided.
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
|
||||
dma-coherent: true
|
||||
|
||||
|
||||
@ -53,6 +53,7 @@ properties:
|
||||
- mediatek,mt8195-pcie
|
||||
- const: mediatek,mt8192-pcie
|
||||
- const: mediatek,mt8192-pcie
|
||||
- const: airoha,en7581-pcie
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
@ -76,20 +77,20 @@ properties:
|
||||
|
||||
resets:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
maxItems: 3
|
||||
|
||||
reset-names:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
maxItems: 3
|
||||
items:
|
||||
enum: [ phy, mac ]
|
||||
enum: [ phy, mac, phy-lane0, phy-lane1, phy-lane2 ]
|
||||
|
||||
clocks:
|
||||
minItems: 4
|
||||
minItems: 1
|
||||
maxItems: 6
|
||||
|
||||
clock-names:
|
||||
minItems: 4
|
||||
minItems: 1
|
||||
maxItems: 6
|
||||
|
||||
assigned-clocks:
|
||||
@ -147,6 +148,9 @@ allOf:
|
||||
const: mediatek,mt8192-pcie
|
||||
then:
|
||||
properties:
|
||||
clocks:
|
||||
minItems: 6
|
||||
|
||||
clock-names:
|
||||
items:
|
||||
- const: pl_250m
|
||||
@ -155,6 +159,15 @@ allOf:
|
||||
- const: tl_32k
|
||||
- const: peri_26m
|
||||
- const: top_133m
|
||||
|
||||
resets:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
|
||||
reset-names:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
@ -164,6 +177,9 @@ allOf:
|
||||
- mediatek,mt8195-pcie
|
||||
then:
|
||||
properties:
|
||||
clocks:
|
||||
minItems: 6
|
||||
|
||||
clock-names:
|
||||
items:
|
||||
- const: pl_250m
|
||||
@ -172,6 +188,15 @@ allOf:
|
||||
- const: tl_32k
|
||||
- const: peri_26m
|
||||
- const: peri_mem
|
||||
|
||||
resets:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
|
||||
reset-names:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
@ -180,6 +205,10 @@ allOf:
|
||||
- mediatek,mt7986-pcie
|
||||
then:
|
||||
properties:
|
||||
clocks:
|
||||
minItems: 4
|
||||
maxItems: 4
|
||||
|
||||
clock-names:
|
||||
items:
|
||||
- const: pl_250m
|
||||
@ -187,6 +216,36 @@ allOf:
|
||||
- const: peri_26m
|
||||
- const: top_133m
|
||||
|
||||
resets:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
|
||||
reset-names:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
const: airoha,en7581-pcie
|
||||
then:
|
||||
properties:
|
||||
clocks:
|
||||
maxItems: 1
|
||||
|
||||
clock-names:
|
||||
items:
|
||||
- const: sys-ck
|
||||
|
||||
resets:
|
||||
minItems: 3
|
||||
|
||||
reset-names:
|
||||
items:
|
||||
- const: phy-lane0
|
||||
- const: phy-lane1
|
||||
- const: phy-lane2
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
|
||||
@ -10,7 +10,8 @@ description: |
|
||||
Common properties for PCI Endpoint Controller Nodes.
|
||||
|
||||
maintainers:
|
||||
- Kishon Vijay Abraham I <kishon@ti.com>
|
||||
- Kishon Vijay Abraham I <kishon@kernel.org>
|
||||
- Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
|
||||
|
||||
properties:
|
||||
$nodename:
|
||||
@ -41,6 +42,17 @@ properties:
|
||||
default: 1
|
||||
maximum: 16
|
||||
|
||||
linux,pci-domain:
|
||||
description:
|
||||
If present this property assigns a fixed PCI domain number to a PCI
|
||||
Endpoint Controller, otherwise an unstable (across boots) unique number
|
||||
will be assigned. It is required to either not set this property at all
|
||||
or set it for all PCI endpoint controllers in the system, otherwise
|
||||
potentially conflicting domain numbers may be assigned to endpoint
|
||||
controllers. The domain number for each endpoint controller in the system
|
||||
must be unique.
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
|
||||
required:
|
||||
- compatible
|
||||
|
||||
|
||||
@ -21,11 +21,11 @@ properties:
|
||||
|
||||
interrupts:
|
||||
minItems: 1
|
||||
maxItems: 8
|
||||
maxItems: 9
|
||||
|
||||
interrupt-names:
|
||||
minItems: 1
|
||||
maxItems: 8
|
||||
maxItems: 9
|
||||
|
||||
iommu-map:
|
||||
minItems: 1
|
||||
@ -78,6 +78,9 @@ properties:
|
||||
description: GPIO controlled connection to WAKE# signal
|
||||
maxItems: 1
|
||||
|
||||
vddpe-3v3-supply:
|
||||
description: PCIe endpoint power supply
|
||||
|
||||
required:
|
||||
- reg
|
||||
- reg-names
|
||||
|
||||
@ -280,4 +280,5 @@ examples:
|
||||
phy-names = "pciephy";
|
||||
max-link-speed = <3>;
|
||||
num-lanes = <2>;
|
||||
linux,pci-domain = <0>;
|
||||
};
|
||||
|
||||
@ -53,11 +53,19 @@ properties:
|
||||
- const: aggre1 # Aggre NoC PCIe1 AXI clock
|
||||
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
minItems: 8
|
||||
maxItems: 8
|
||||
|
||||
interrupt-names:
|
||||
items:
|
||||
- const: msi
|
||||
- const: msi0
|
||||
- const: msi1
|
||||
- const: msi2
|
||||
- const: msi3
|
||||
- const: msi4
|
||||
- const: msi5
|
||||
- const: msi6
|
||||
- const: msi7
|
||||
|
||||
resets:
|
||||
maxItems: 1
|
||||
@ -66,9 +74,6 @@ properties:
|
||||
items:
|
||||
- const: pci
|
||||
|
||||
vddpe-3v3-supply:
|
||||
description: PCIe endpoint power supply
|
||||
|
||||
allOf:
|
||||
- $ref: qcom,pcie-common.yaml#
|
||||
|
||||
@ -137,8 +142,16 @@ examples:
|
||||
|
||||
dma-coherent;
|
||||
|
||||
interrupts = <GIC_SPI 307 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "msi";
|
||||
interrupts = <GIC_SPI 307 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 308 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 309 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 312 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 313 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 314 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 374 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 375 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "msi0", "msi1", "msi2", "msi3",
|
||||
"msi4", "msi5", "msi6", "msi7";
|
||||
#interrupt-cells = <1>;
|
||||
interrupt-map-mask = <0 0 0 0x7>;
|
||||
interrupt-map = <0 0 0 1 &intc 0 0 0 434 IRQ_TYPE_LEVEL_HIGH>,
|
||||
|
||||
@ -58,9 +58,6 @@ properties:
|
||||
items:
|
||||
- const: pci
|
||||
|
||||
vddpe-3v3-supply:
|
||||
description: A phandle to the PCIe endpoint power supply
|
||||
|
||||
required:
|
||||
- interconnects
|
||||
- interconnect-names
|
||||
|
||||
@ -55,8 +55,8 @@ properties:
|
||||
- const: aggre1 # Aggre NoC PCIe1 AXI clock
|
||||
|
||||
interrupts:
|
||||
minItems: 8
|
||||
maxItems: 8
|
||||
minItems: 9
|
||||
maxItems: 9
|
||||
|
||||
interrupt-names:
|
||||
items:
|
||||
@ -68,6 +68,7 @@ properties:
|
||||
- const: msi5
|
||||
- const: msi6
|
||||
- const: msi7
|
||||
- const: global
|
||||
|
||||
operating-points-v2: true
|
||||
opp-table:
|
||||
@ -149,9 +150,10 @@ examples:
|
||||
<GIC_SPI 145 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 146 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 147 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>;
|
||||
<GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 140 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "msi0", "msi1", "msi2", "msi3",
|
||||
"msi4", "msi5", "msi6", "msi7";
|
||||
"msi4", "msi5", "msi6", "msi7", "global";
|
||||
#interrupt-cells = <1>;
|
||||
interrupt-map-mask = <0 0 0 0x7>;
|
||||
interrupt-map = <0 0 0 1 &intc 0 0 0 149 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
|
||||
|
||||
@ -91,6 +91,9 @@ properties:
|
||||
vdda_refclk-supply:
|
||||
description: A phandle to the core analog power supply for IC which generates reference clock
|
||||
|
||||
vddpe-3v3-supply:
|
||||
description: A phandle to the PCIe endpoint power supply
|
||||
|
||||
phys:
|
||||
maxItems: 1
|
||||
|
||||
|
||||
@ -19,6 +19,7 @@ properties:
|
||||
- enum:
|
||||
- renesas,r8a779f0-pcie-ep # R-Car S4-8
|
||||
- renesas,r8a779g0-pcie-ep # R-Car V4H
|
||||
- renesas,r8a779h0-pcie-ep # R-Car V4M
|
||||
- const: renesas,rcar-gen4-pcie-ep # R-Car Gen4
|
||||
|
||||
reg:
|
||||
|
||||
@ -19,6 +19,7 @@ properties:
|
||||
- enum:
|
||||
- renesas,r8a779f0-pcie # R-Car S4-8
|
||||
- renesas,r8a779g0-pcie # R-Car V4H
|
||||
- renesas,r8a779h0-pcie # R-Car V4M
|
||||
- const: renesas,rcar-gen4-pcie # R-Car Gen4
|
||||
|
||||
reg:
|
||||
|
||||
@ -42,9 +42,13 @@ properties:
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
|
||||
clocks: true
|
||||
clocks:
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
|
||||
clock-names: true
|
||||
clock-names:
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
|
||||
resets:
|
||||
maxItems: 1
|
||||
|
||||