What:		/sys/bus/platform/devices/smpro-errmon.*/error_[core|mem|pcie|other]_[ce|ue]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record printed
		in hex format according to the table below:

		+--------+---------------+-------------+------------------------------------------------------------+
		| Offset |     Field     | Size (byte) |                     Description                            |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 00     | Error Type    | 1           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 01     | Subtype       | 1           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 02     | Instance      | 2           | See :ref:`the table below <smpro-error-types>` for details |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 04     | Error Status  | 4           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 08     | Error Address | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 16     | Error Misc 0  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 24     | Error Misc 1  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 32     | Error Misc 2  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+
		| 40     | Error Misc 3  | 8           | See ARM RAS specification for details                      |
		+--------+---------------+-------------+------------------------------------------------------------+

		The table below defines the values of the error types and their subtypes,
		subcomponents, and instances:

		.. _smpro-error-types:

		+-----------------+------------+----------+----------------+----------------------------------------+
		|   Error Group   | Error Type | Subtype  | Subcomponent   |               Instance                 |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| CPM (core)      | 0          | 0        | Snoop-Logic    | CPM #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| CPM (core)      | 0          | 2        | Armv8 Core 1   | CPM #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 1        | ERR1           | MCU # \| SLOT << 11                    |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 2        | ERR2           | MCU # \| SLOT << 11                    |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 3        | ERR3           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 4        | ERR4           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 5        | ERR5           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 6        | ERR6           | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| MCU (mem)       | 1          | 7        | Link Error     | MCU #                                  |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 0        | Cross Point    | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 1        | Home Node(IO)  | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 2        | Home Node(Mem) | X \| (Y << 5) \| NS <<11 \| device<<12 |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| Mesh (other)    | 2          | 4        | CCIX Node      | X \| (Y << 5) \| NS <<11               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| 2P Link (other) | 3          | 0        | N/A            | Altra 2P Link #                        |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 2        | ERR2           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 3        | ERR3           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 4        | ERR4           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 5        | ERR5           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 6        | ERR6           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 7        | ERR7           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 8        | ERR8           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 9        | ERR9           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 10       | ERR10          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 11       | ERR11          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 12       | ERR12          | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| GIC (other)     | 5          | 13-21    | ERR13          | RC # + 1                               |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 100      | TCU            | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 0        | TBU0           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 1        | TBU1           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 2        | TBU2           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 3        | TBU3           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 4        | TBU4           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 5        | TBU5           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 6        | TBU6           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 7        | TBU7           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 8        | TBU8           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMMU (other)    | 6          | 9        | TBU9           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe AER (pcie) | 7          | 0        | Root           | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe AER (pcie) | 7          | 1        | Device         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 0        | RCA HB         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 1        | RCB HB         | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PCIe RC (pcie)  | 8          | 8        | RASDP          | RC #                                   |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| OCM (other)     | 9          | 2        | ERR2           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| SMpro (other)   | 10         | 2        | MPA_ERR        | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 0        | ERR0           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 1        | ERR1           | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+
		| PMpro (other)   | 11         | 2        | MPA_ERR        | 0                                      |
		+-----------------+------------+----------+----------------+----------------------------------------+

		Example::

		 # cat error_other_ue
		 880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000

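As an illustration of the record layout, the following minimal Python sketch splits the hex text read from an ``error_*`` file into the fields listed in the offset table. The names ``RECORD_FIELDS`` and ``split_error_record`` are hypothetical and not part of any driver or tool. The byte order of the multi-byte fields is not stated here, so the sketch keeps every field as raw bytes rather than guessing at an integer value.

```python
# Field layout copied from the offset table above: (name, offset, size in bytes).
RECORD_FIELDS = [
    ("error_type", 0, 1),
    ("subtype", 1, 1),
    ("instance", 2, 2),
    ("error_status", 4, 4),
    ("error_address", 8, 8),
    ("error_misc0", 16, 8),
    ("error_misc1", 24, 8),
    ("error_misc2", 32, 8),
    ("error_misc3", 40, 8),
]
RECORD_SIZE = 48  # bytes, i.e. 96 hex digits

# The example output shown above for error_other_ue.
SAMPLE = (
    "880807001e004010401040101500000001004010401040100c00000000000000"
    "00000000000000000000000000000000"
)


def split_error_record(hex_text):
    """Split the hex text of one error record into raw per-field bytes.

    No byte order is assumed: each value is the slice of bytes found at the
    documented offset, in the order they were printed.
    """
    raw = bytes.fromhex(hex_text.strip())
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    return {name: raw[off:off + size] for name, off, size in RECORD_FIELDS}
```

Calling ``split_error_record(SAMPLE)`` returns a dictionary with one entry per table row; ``value.hex()`` turns each entry back into printable hex for logging.
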
		The details of each sysfs entry are as follows:

		+-------------+---------------------------------------------------------+----------------------------------+
		|   Error     |                   Sysfs entry                           |   Description (when triggered)   |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/error_core_ce  | Core has CE error                |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/error_core_ue  | Core has UE error                |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/error_mem_ce   | Memory has CE error              |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/error_mem_ue   | Memory has UE error              |
		+-------------+---------------------------------------------------------+----------------------------------+
		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/error_pcie_ce  | Any PCIe controller has CE error |
		+-------------+---------------------------------------------------------+----------------------------------+
		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/error_pcie_ue  | Any PCIe controller has UE error |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/error_other_ce | Any other CE error               |
		+-------------+---------------------------------------------------------+----------------------------------+
		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/error_other_ue | Any other UE error               |
		+-------------+---------------------------------------------------------+----------------------------------+

		where:

		  - UE: Uncorrectable Error
		  - CE: Correctable Error

		For details, see section `3.3 Ampere (Vendor-Specific) Error Record Formats,
		Altra Family RAS Supplement`.


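The Instance expressions in the error type table pack several values into one 16-bit field. The Python sketch below shows one way to unpack them. The function names are hypothetical, and the bit widths are inferred only from the shift amounts in the table (for example, Y is taken to occupy the bits between shift 5 and shift 11, and NS is taken to be a single bit); they are assumptions, not values confirmed by the specification.

```python
def decode_mesh_instance(instance, has_device=False):
    """Unpack X | (Y << 5) | (NS << 11) [| (device << 12)] for Mesh errors.

    Widths are inferred from the shift amounts and are assumptions:
    X = bits 4:0, Y = bits 10:5, NS = bit 11, device = bits 15:12.
    Set has_device=True for the Home Node(Mem) subcomponent.
    """
    fields = {
        "x": instance & 0x1F,
        "y": (instance >> 5) & 0x3F,
        "ns": (instance >> 11) & 0x1,
    }
    if has_device:
        fields["device"] = (instance >> 12) & 0xF
    return fields


def decode_mcu_instance(instance):
    """Unpack MCU # | (SLOT << 11) for MCU ERR1/ERR2 errors.

    Assumes the MCU number fits in bits 10:0 and SLOT uses the bits above.
    """
    return {"mcu": instance & 0x7FF, "slot": (instance >> 11) & 0x1F}
```

For instance, a Mesh value built as ``3 | (2 << 5) | (1 << 11)`` decodes to X = 3, Y = 2, NS = 1 under these assumptions.
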
What:		/sys/bus/platform/devices/smpro-errmon.*/overflow_[core|mem|pcie|other]_[ce|ue]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Returns the overflow status for each type of reported HW error:

		  - 0      : No overflow
		  - 1      : An overflow occurred and the oldest HW errors were dropped

		The details of each sysfs entry are as follows:

		+-------------+------------------------------------------------------------+---------------------------------------+
		|   Overflow  |                   Sysfs entry                              |             Description               |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Core's CE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_core_ce  | Core CE error overflow                |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Core's UE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_core_ue  | Core UE error overflow                |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/overflow_mem_ce   | Memory CE error overflow              |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/overflow_mem_ue   | Memory UE error overflow              |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| PCIe's CE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_pcie_ce  | Any PCIe controller CE error overflow |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| PCIe's UE   | /sys/bus/platform/devices/smpro-errmon.*/overflow_pcie_ue  | Any PCIe controller UE error overflow |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Other's CE  | /sys/bus/platform/devices/smpro-errmon.*/overflow_other_ce | Any other CE error overflow           |
		+-------------+------------------------------------------------------------+---------------------------------------+
		| Other's UE  | /sys/bus/platform/devices/smpro-errmon.*/overflow_other_ue | Any other UE error overflow           |
		+-------------+------------------------------------------------------------+---------------------------------------+

		where:

		  - UE: Uncorrectable Error
		  - CE: Correctable Error

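A monitoring script might poll all eight overflow files of a device at once. The sketch below is one possible approach in Python, assuming each file contains the text ``0`` or ``1`` as described above. ``find_errmon_devices`` and ``read_overflow_flags`` are hypothetical helper names, and the ``root`` parameter exists only so the code can be exercised against a test directory.

```python
import glob
import os

CATEGORIES = ("core", "mem", "pcie", "other")
SEVERITIES = ("ce", "ue")


def find_errmon_devices(root="/sys/bus/platform/devices"):
    """Return the sorted list of smpro-errmon.* device directories under root."""
    return sorted(glob.glob(os.path.join(root, "smpro-errmon.*")))


def read_overflow_flags(device_dir):
    """Read all overflow_<category>_<severity> files of one device.

    Returns a dict such as {"core_ce": False, "mem_ce": True, ...}, where
    True means the file reported 1 (oldest HW errors were dropped).
    """
    flags = {}
    for category in CATEGORIES:
        for severity in SEVERITIES:
            name = f"{category}_{severity}"
            path = os.path.join(device_dir, f"overflow_{name}")
            with open(path) as f:
                flags[name] = f.read().strip() == "1"
    return flags
```

A caller would typically loop over ``find_errmon_devices()`` and report any key whose flag is ``True``.
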
What:		/sys/bus/platform/devices/smpro-errmon.*/[error|warn]_[smpro|pmpro]
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains the internal firmware error/warning printed in hex format.

		The details of each sysfs entry are as follows:

		+---------------+------------------------------------------------------+--------------------------+
		|   Error       |                   Sysfs entry                        |        Description       |
		+---------------+------------------------------------------------------+--------------------------+
		| SMpro error   | /sys/bus/platform/devices/smpro-errmon.*/error_smpro | System has SMpro error   |
		+---------------+------------------------------------------------------+--------------------------+
		| SMpro warning | /sys/bus/platform/devices/smpro-errmon.*/warn_smpro  | System has SMpro warning |
		+---------------+------------------------------------------------------+--------------------------+
		| PMpro error   | /sys/bus/platform/devices/smpro-errmon.*/error_pmpro | System has PMpro error   |
		+---------------+------------------------------------------------------+--------------------------+
		| PMpro warning | /sys/bus/platform/devices/smpro-errmon.*/warn_pmpro  | System has PMpro warning |
		+---------------+------------------------------------------------------+--------------------------+

		For details, see section `5.10 RAS Internal Error Register Definitions,
		Altra Family SoC BMC Interface Specification`.

What:		/sys/bus/platform/devices/smpro-errmon.*/event_[vrd_warn_fault|vrd_hot|dimm_hot|dimm_2x_refresh]
KernelVersion:	6.1 (event_[vrd_warn_fault|vrd_hot|dimm_hot]), 6.4 (event_dimm_2x_refresh)
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains detailed information about VRD/DIMM warning/hot events,
		in hex format as follows::

		    AAAA

		where:

		  - ``AAAA``: The event detail data

		The details of each sysfs entry are as follows:

		+---------------+----------------------------------------------------------------+----------------------+
		|   Event       |                        Sysfs entry                             |     Description      |
		+---------------+----------------------------------------------------------------+----------------------+
		| VRD HOT       | /sys/bus/platform/devices/smpro-errmon.*/event_vrd_hot         | VRD Hot              |
		+---------------+----------------------------------------------------------------+----------------------+
		| VR Warn/Fault | /sys/bus/platform/devices/smpro-errmon.*/event_vrd_warn_fault  | VR Warning or Fault  |
		+---------------+----------------------------------------------------------------+----------------------+
		| DIMM HOT      | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_hot        | DIMM Hot             |
		+---------------+----------------------------------------------------------------+----------------------+
		| DIMM 2X       | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_2x_refresh | DIMM 2x refresh rate |
		| REFRESH RATE  |                                                                | event in high temp   |
		+---------------+----------------------------------------------------------------+----------------------+

		For more details, see sections `5.7 GPI Status Registers` and
		`5.9 Memory Error Register Definitions, Altra Family SoC BMC Interface Specification`.

What:		/sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome
KernelVersion:	6.4
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Returns the 2-byte DIMM failure syndrome data for slots 0-15
		if the DIMM failed to initialize.

		For more details, see section `5.11 Boot Stage Register Definitions,
		Altra Family SoC BMC Interface Specification`.

What:		/sys/bus/platform/devices/smpro-misc.*/boot_progress
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RO) Contains the boot stage information in hex, in the following format::

		    AABBCCCCCCCC

		where:

		  - ``AA``      : The boot stage

		    - 00: SMpro firmware booting
		    - 01: PMpro firmware booting
		    - 02: ATF BL1 firmware booting
		    - 03: DDR initialization
		    - 04: DDR training report status
		    - 05: ATF BL2 firmware booting
		    - 06: ATF BL31 firmware booting
		    - 07: ATF BL32 firmware booting
		    - 08: UEFI firmware booting
		    - 09: OS booting

		  - ``BB``      : Boot status

		    - 00: Not started
		    - 01: Started
		    - 02: Completed without error
		    - 03: Failed

		  - ``CCCCCCCC``: Boot status information defined for each boot stage

		For details, see section `5.11 Boot Stage Register Definitions`
		and section `6. Processor Boot Progress Codes, Altra Family SoC BMC
		Interface Specification`.


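The ``AABBCCCCCCCC`` layout can be decoded with a few string slices. The following Python sketch maps the stage and status codes to the names listed above; ``parse_boot_progress`` is a hypothetical helper name. Treating ``CCCCCCCC`` as one hexadecimal number read in the order it is printed is an assumption, since its meaning depends on the boot stage and is defined in the referenced specification.

```python
# Code-to-name tables copied from the boot_progress description above.
BOOT_STAGES = {
    0x00: "SMpro firmware booting",
    0x01: "PMpro firmware booting",
    0x02: "ATF BL1 firmware booting",
    0x03: "DDR initialization",
    0x04: "DDR training report status",
    0x05: "ATF BL2 firmware booting",
    0x06: "ATF BL31 firmware booting",
    0x07: "ATF BL32 firmware booting",
    0x08: "UEFI firmware booting",
    0x09: "OS booting",
}
BOOT_STATUS = {
    0x00: "Not started",
    0x01: "Started",
    0x02: "Completed without error",
    0x03: "Failed",
}


def parse_boot_progress(text):
    """Decode the AABBCCCCCCCC string read from boot_progress."""
    text = text.strip()
    if len(text) != 12:
        raise ValueError(f"expected 12 hex digits, got {len(text)}")
    stage = int(text[0:2], 16)    # AA
    status = int(text[2:4], 16)   # BB
    info = int(text[4:12], 16)    # CCCCCCCC, stage-specific (assumed order)
    return {
        "stage": BOOT_STAGES.get(stage, f"unknown (0x{stage:02x})"),
        "status": BOOT_STATUS.get(status, f"unknown (0x{status:02x})"),
        "info": info,
    }
```

Unknown codes are reported as ``unknown (0x..)`` instead of raising, so that a newer firmware stage does not break a monitoring script.
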
What:		/sys/bus/platform/devices/smpro-misc.*/soc_power_limit
KernelVersion:	6.1
Contact:	Quan Nguyen <quan@os.amperecomputing.com>
Description:
		(RW) Contains the desired SoC power limit in watts.
		Writes to this file set the desired SoC power limit (W).
		Reads from this file return the current SoC power limit (W).
		The valid range is:

		  - Minimum: 120 W
		  - Maximum: Socket TDP power
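
A userspace tool may want to validate a requested limit against the documented range before writing it. The Python sketch below is one way to do that. The helper names are hypothetical, the socket TDP must be supplied by the caller because its value is not given here, and the sketch assumes the file accepts and returns a plain decimal integer.

```python
MIN_POWER_LIMIT_W = 120  # documented minimum, in watts


def check_power_limit(watts, socket_tdp_w):
    """Return watts if it lies within [120 W, socket TDP], else raise ValueError."""
    if not MIN_POWER_LIMIT_W <= watts <= socket_tdp_w:
        raise ValueError(
            f"{watts} W is outside {MIN_POWER_LIMIT_W}..{socket_tdp_w} W"
        )
    return watts


def set_soc_power_limit(path, watts, socket_tdp_w):
    """Validate the limit, then write it to the soc_power_limit file at path."""
    check_power_limit(watts, socket_tdp_w)
    with open(path, "w") as f:
        f.write(f"{watts}\n")  # assumed format: decimal integer


def get_soc_power_limit(path):
    """Read the current limit back as an integer number of watts."""
    with open(path) as f:
        return int(f.read().strip())
```

Here ``path`` would be the ``soc_power_limit`` file of one ``smpro-misc`` device; passing it in explicitly keeps the helpers usable on systems with more than one socket.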