1047 lines
29 KiB
Diff
From 233df51fbccaf1b66571495a8da18e8cfb5153a2 Mon Sep 17 00:00:00 2001
From: Colin Ian King <colin.i.king@gmail.com>
Date: Fri, 2 Aug 2024 09:21:14 +0100
Subject: [PATCH 23/32] common: Add missing -t from help and manual

The help and manual don't describe the -t option, so add it. Also
convert the manual from DOS line endings (carriage return + line feed)
to UNIX line endings (line feed only).

Closes: https://github.com/intel/numatop/issues/28

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
---
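A minimal sketch of how a -t run time in seconds could be parsed and validated with getopt(3) and strtol(3). This is not numatop's actual option handling. The helpers parse_run_time() and get_run_time(), the optstring, and the rules that the value must be a positive integer and that 0 means "no -t given" are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* expose getopt()/optarg/optind */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: convert the -t argument into a run time in
 * seconds. Returns 0 on success, -1 if the string is not a positive
 * decimal integer that fits in an int.
 */
static int parse_run_time(const char *arg, int *seconds)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;	/* overflow, no digits, or trailing garbage */
	if (val <= 0 || val > INT_MAX)
		return -1;	/* must be positive and fit in an int */

	*seconds = (int)val;
	return 0;
}

/*
 * Hypothetical option loop showing where -t would be handled.
 * Returns the run time in seconds, 0 if -t was not given,
 * or -1 if the -t argument is invalid.
 */
static int get_run_time(int argc, char *argv[])
{
	int c, secs = 0;

	optind = 1;	/* start scanning from argv[1] */
	while ((c = getopt(argc, argv, "s:l:f:d:t:h")) != -1) {
		switch (c) {
		case 't':
			if (parse_run_time(optarg, &secs) != 0) {
				fprintf(stderr, "Invalid run time '%s'\n", optarg);
				return -1;
			}
			break;
		default:
			/* other options are ignored in this sketch */
			break;
		}
	}
	return secs;
}
```

Because trailing characters and non-positive values are rejected up front, a typo such as `-t abc` produces an error instead of being silently read as 0, which is what atoi(3) would return.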
common/numatop.c | 4 +-
numatop.8 | 1008 +++++++++++++++++++++++-----------------------
2 files changed, 507 insertions(+), 505 deletions(-)

diff --git a/common/numatop.c b/common/numatop.c
index 2cc5aea..d66e64b 100644
--- a/common/numatop.c
+++ b/common/numatop.c
@@ -387,6 +387,6 @@ print_usage(const char *exec_name)
" normal: balance precision and overhead (default)\n"
" high : high sampling precision\n"
" (high overhead, not recommended option)\n"
- " low : low sampling precision, suitable for high"
- " load system\n");
+ " low : low sampling precision, suitable for high load system\n"
+ " -t specify run time in seconds\n");
}
diff --git a/numatop.8 b/numatop.8
index b09862e..e2ead45 100644
--- a/numatop.8
+++ b/numatop.8
@@ -1,503 +1,505 @@
-.TH NUMATOP 8 "April 3, 2013"
-.\" Please adjust this date whenever revising the manpage.
-.\"
-.\" Some roff macros, for reference:
-.\" .nh disable hyphenation
-.\" .hy enable hyphenation
-.\" .ad l left justify
-.\" .ad b justify to both left and right margins
-.\" .nf disable filling
-.\" .fi enable filling
-.\" .br insert line break
-.\" .sp <n> insert n+1 empty lines
-.\" for manpage-specific macros, see man(7)
-.SH NAME
-numatop \- a tool for memory access locality characterization and analysis.
-.SH SYNOPSIS
-.B numatop
-.RI [ -s ] " " [ -l ] " " [ -f ] " " [ -d ]
-.PP
-.B numatop
-.RI [ -h ]
-.SH DESCRIPTION
-This manual page briefly documents the
-.B numatop
-command.
-.PP
-Most modern systems use a Non-Uniform Memory Access (NUMA) design for
-multiprocessing. In NUMA systems, memory and processors are organized in such a
-way that some parts of memory are closer to a given processor, while other parts
-are farther from it. A processor can access memory that is closer to it much faster
-than the memory that is farther from it. Hence, the latency between the processors
-and different portions of the memory in a NUMA machine may be significantly different.
-
-\fBnumatop\fP is an observation tool for runtime memory locality characterization
-and analysis of processes and threads running on a NUMA system. It helps the user to
-characterize the NUMA behavior of processes and threads and to identify where the
-NUMA-related performance bottlenecks reside. The tool uses hardware performance counter
-sampling technologies and associates the performance data with Linux system runtime
-information to provide real-time analysis in production systems. The tool can be used to:
-
-\fBA)\fP Characterize the locality of all running processes and threads to identify
-those with the poorest locality in the system.
-
-\fBB)\fP Identify the "hot" memory areas, report average memory access latency, and
-provide the location where accessed memory is allocated. A "hot" memory area is where
-process/thread(s) accesses are most frequent. numatop has a metric called "ACCESS%"
-that specifies what percentage of memory accesses are attributable to each memory area.
-
-\fBNote: numatop records only the memory accesses which have latencies greater than a
-predefined threshold (128 CPU cycles).\fP
-
-\fBC)\fP Provide the call-chain(s) in the process/thread code that accesses a given hot
-memory area.
-
-\fBD)\fP Provide the call-chain(s) when the process/thread generates certain counter
-events (RMA/LMA/IR/CYCLE). The call-chain(s) helps to locate the source code that generates
-the events.
-.PP
-RMA: Remote Memory Access.
-.br
-LMA: Local Memory Access.
-.br
-IR: Instruction Retired.
-.br
-CYCLE: CPU cycles.
-.br
-
-\fBE)\fP Provide per-node statistics for memory and CPU utilization. A node is: a region
-of memory in which every byte has the same distance from each CPU.
-
-\fBF)\fP Show, using a user-friendly interface, the list of processes/threads sorted by
-some metrics (by default, sorted by CPU utilization), with the top process having the
-highest CPU utilization in the system and the bottom one having the lowest CPU utilization.
-Users can also use hotkeys to resort the output by these metrics: RMA, LMA, RMA/LMA, CPI,
-and CPU%.
-
-.br
-RMA/LMA: ratio of RMA/LMA.
-.br
-CPI: CPU cycle per instruction.
-.br
-CPU%: CPU utilization.
-.br
-
-\fBnumatop\fP is a GUI tool that periodically tracks and analyzes the NUMA activity of
-processes and threads and displays useful metrics. Users can scroll up/down by using the
-up or down key to navigate in the current window and can use several hot keys shown at the
-bottom of the window, to switch between windows or to change the running state of the tool.
-For example, hotkey 'R' refreshes the data in the current window.
-
-Below is a detailed description of the various display windows and the data items
-that they display:
-
-\fB[WIN1 - Monitoring processes and threads]:\fP
-.br
-Get the locality characterization of all processes. This is the first window upon startup,
-it's numatop's "Home" window. This window displays a list of processes. The top process has
-the highest system CPU utilization (CPU%), while the bottom process has the lowest CPU% in
-the system. Generally, the memory-intensive process is also CPU-intensive, so the processes
-shown in this window are sorted by CPU% by default. The user can press hotkeys '1', '2', '3', '4', or '5' to resort the output by "RMA", "LMA", "RMA/LMA", "CPI", or "CPU%".
-.PP
-\fB[KEY METRICS]:\fP
-.br
-RMA(K): number of Remote Memory Access (unit is 1000).
-.br
- RMA(K) = RMA / 1000;
-.br
-LMA(K): number of Local Memory Access (unit is 1000).
-.br
- LMA(K) = LMA / 1000;
-.br
-RMA/LMA: ratio of RMA/LMA.
-.br
-CPI: CPU cycles per instruction.
-.br
-CPU%: system CPU utilization (busy time across all CPUs).
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: WIN1 refresh.
-.br
-R: Refresh to show the latest data.
-.br
-I: Switch to WIN2 to show the normalized data.
-.br
-N: Switch to WIN11 to show the per-node statistics.
-.br
-1: Sort by RMA.
-.br
-2: Sort by LMA.
-.br
-3: Sort by RMA/LMA.
-.br
-4: Sort by CPI.
-.br
-5: Sort by CPU%
-.PP
-\fB[WIN2 - Monitoring processes and threads (normalized)]:\fP
-.br
-Get the normalized locality characterization of all processes.
-.PP
-\fB[KEY METRICS]:\fP
-.br
-RPI(K): RMA normalized by 1000 instructions.
-.br
- RPI(K) = RMA / (IR / 1000);
-.br
-LPI(K): LMA normalized by 1000 instructions.
-.br
- LPI(K) = LMA / (IR / 1000);
-.br
-Other metrics remain the same.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.br
-N: Switch to WIN11 to show the per-node statistics.
-.br
-1: Sort by RPI.
-.br
-2: Sort by LPI.
-.br
-3: Sort by RMA/LMA.
-.br
-4: Sort by CPI.
-.br
-5: Sort by CPU%
-.PP
-\fB[WIN3 - Monitoring the process]:\fP
-.br
-Get the locality characterization with node affinity of a specified process.
-.PP
-\fB[KEY METRICS]:\fP
-.br
-NODE: the node ID.
-.br
-CPU%: per-node CPU utilization.
-.br
-Other metrics remain the same.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.br
-N: Switch to WIN11 to show the per-node statistics.
-.br
-L: Show the latency information.
-.br
-C: Show the call-chain.
-.PP
-\fB[WIN4 - Monitoring all threads]:\fP
-.br
-Get the locality characterization of all threads in a specified process.
-.PP
-\fB[KEY METRICS]\fP:
-.br
-CPU%: per-CPU CPU utilization.
-.br
-Other metrics remain the same.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.br
-N: Switch to WIN11 to show the per-node statistics.
-.PP
-\fB[WIN5 - Monitoring the thread]:\fP
-.br
-Get the locality characterization with node affinity of a specified thread.
-.PP
-\fB[KEY METRICS]:\fP
-.br
-CPU%: per-CPU CPU utilization.
-.br
-Other metrics remain the same.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.br
-N: Switch to WIN11 to show the per-node statistics.
-.br
-L: Show the latency information.
-.br
-C: Show the call-chain.
-.PP
-\fB[WIN6 - Monitoring memory areas]:\fP
-.br
-Get the memory area use with the associated accessing latency of a
-specified process/thread.
-.PP
-\fB[KEY METRICS]:\fP
-.br
-ADDR: starting address of the memory area.
-.br
-SIZE: size of memory area (K/M/G bytes).
-.br
-ACCESS%: percentage of memory accesses are to this memory area.
-.br
-LAT(ns): the average latency (nanoseconds) of memory accesses.
-.br
-DESC: description of memory area (from /proc/<pid>/maps).
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.br
-A: Show the memory access node distribution.
-.br
-C: Show the call-chain when process/thread accesses the memory area.
-.PP
-\fB[WIN7 - Memory access node distribution overview]:\fP
-.br
-Get the percentage of memory accesses originated from the process/thread to each node.
-.PP
-\fB[KEY METRICS]:\fP
-.br
-NODE: the node ID.
-.br
-ACCESS%: percentage of memory accesses are to this node.
-.br
-LAT(ns): the average latency (nanoseconds) of memory accesses to this node.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.PP
-\fB[WIN8 - Break down the memory area into physical memory on node]:\fP
-.br
-Break down the memory area into the physical mapping on node with the
-associated accessing latency of a process/thread.
-.PP
-\fB[KEY METRICS]:\fP
-.br
-NODE: the node ID.
-.br
-Other metrics remain the same.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.PP
-\fB[WIN9 - Call-chain when process/thread generates the event ("RMA"/"LMA"/"CYCLE"/"IR")]:\fP
-.br
-Determine the call-chains to the code that generates "RMA"/"LMA"/"CYCLE"/"IR".
-.PP
-\fB[KEY METRICS]:\fP
-.br
-Call-chain list: a list of call-chains.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to the previous window.
-.br
-R: Refresh to show the latest data.
-.br
-1: Locate call-chain when process/thread generates "RMA"
-.br
-2: Locate call-chain when process/thread generates "LMA"
-.br
-3: Locate call-chain when process/thread generates "CYCLE" (CPU cycle)
-.br
-4: Locate call-chain when process/thread generates "IR" (Instruction Retired)
-.PP
-\fB[WIN10 - Call-chain when process/thread access the memory area]:\fP
-.br
-Determine the call-chains to the code that references this memory area.
-The latency must be greater than the predefined latency threshold
-(128 CPU cycles).
-.PP
-\fB[KEY METRICS]:\fP
-.br
-Call-chain list: a list of call-chains.
-.br
-Other metrics remain the same.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.PP
-\fB[WIN11 - Node Overview]:\fP
-.br
-Show the basic per-node statistics for this system
-.PP
-\fB[KEY METRICS]:\fP
-.br
-MEM.ALL: total usable RAM (physical RAM minus a few reserved bits and the kernel binary code).
-.br
-MEM.FREE: sum of LowFree + HighFree (overall stat) .
-.br
-CPU%: per-node CPU utilization.
-.br
-Other metrics remain the same.
-.PP
-\fB[WIN12 - Information of Node N]:\fP
-.br
-Show the memory use and CPU utilization for the selected node.
-.PP
-\fB[KEY METRICS]:\fP
-.br
-CPU: array of logical CPUs which belong to this node.
-.br
-CPU%: per-node CPU utilization.
-.br
-MEM active: the amount of memory that has been used more recently and is not usually reclaimed unless absolute necessary.
-.br
-MEM inactive: the amount of memory that has not been used for a while and is eligible to be swapped to disk.
-.br
-Dirty: the amount of memory waiting to be written back to the disk.
-.br
-Writeback: the amount of memory actively being written back to the disk.
-.br
-Mapped: all pages mapped into a process.
-.PP
-\fB[HOTKEY]:\fP
-.br
-Q: Quit the application.
-.br
-H: Switch to WIN1.
-.br
-B: Back to previous window.
-.br
-R: Refresh to show the latest data.
-.PP
-.SH "OPTIONS"
-The following options are supported by numatop:
-.PP
--s sampling_precision
-.br
-normal: balance precision and overhead (default)
-.br
-high: high sampling precision (high overhead)
-.br
-low: low sampling precision, suitable for high load system
-.PP
--l log_level
-.br
-Specifies the level of logging in the log file. Valid values are:
-.br
-1: unknown (reserved for future use)
-.br
-2: all
-.PP
--f log_file
-.br
-Specifies the log file where output will be written. If the log file is
-not writable, the tool will prompt "Cannot open '<file name>' for writting.".
-.PP
--d dump_file
-.br
-Specifies the dump file where the screen data will be written. Generally the dump
-file is used for automated test. If the dump file is not writable, the tool will
-prompt "Cannot open <file name> for dump writing."
-.PP
--h
-.br
-Displays the command's usage.
-.PP
-.SH EXAMPLES
-Example 1: Launch numatop with high sampling precision
-.br
-numatop -s high
-.PP
-Example 2: Write all warning messages in /tmp/numatop.log
-.br
-numatop -l 2 -o /tmp/numatop.log
-.PP
-Example 3: Dump screen data in /tmp/dump.log
-.br
-numatop -d /tmp/dump.log
-.PP
-.SH EXIT STATUS
-.br
-0: successful operation.
-.br
-Other value: an error occurred.
-.PP
-.SH USAGE
-.br
-You must have root privileges to run numatop.
-.br
-Or set -1 in /proc/sys/kernel/perf_event_paranoid
-.PP
-\fBNote\fP: The perf_event_paranoid setting has security implications and a non-root
-user probably doesn't have authority to access /proc. It is highly recommended
-that the user runs \fBnumatop\fP as root.
-.PP
-.SH VERSION
-.br
-
-\fBnumatop\fP requires a patch set to support PEBS Load Latency functionality in the
-kernel. The patch set has not been integrated in 3.8. Probably it will be integrated
-in 3.9. The following steps show how to get and apply the patch set.
-
-.PP
-1. git clone git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
-.br
-2. cd tip
-.br
-3. git checkout perf/x86
-.br
-4. build kernel as usual
-.PP
-
-\fBnumatop\fP supports the Intel Xeon processors: 5500-series, 6500/7500-series,
-5600 series, E7-x8xx-series, and E5-16xx/24xx/26xx/46xx-series.
-\fBNote\fP: CPU microcode version 0x618 or 0x70c or later is required on
-E5-16xx/24xx/26xx/46xx-series. It also supports IBM Power8, Power9, Power10 and Power11 processors.
+.TH NUMATOP 8 "August 1, 2024"
+.\" Please adjust this date whenever revising the manpage.
+.\"
+.\" Some roff macros, for reference:
+.\" .nh disable hyphenation
+.\" .hy enable hyphenation
+.\" .ad l left justify
+.\" .ad b justify to both left and right margins
+.\" .nf disable filling
+.\" .fi enable filling
+.\" .br insert line break
+.\" .sp <n> insert n+1 empty lines
+.\" for manpage-specific macros, see man(7)
+.SH NAME
+numatop \- a tool for memory access locality characterization and analysis.
+.SH SYNOPSIS
+.B numatop
+.RI [ -s ] " " [ -l ] " " [ -f ] " " [ -d ]
+.PP
+.B numatop
+.RI [ -h ]
+.SH DESCRIPTION
+This manual page briefly documents the
+.B numatop
+command.
+.PP
+Most modern systems use a Non-Uniform Memory Access (NUMA) design for
+multiprocessing. In NUMA systems, memory and processors are organized in such a
+way that some parts of memory are closer to a given processor, while other parts
+are farther from it. A processor can access memory that is closer to it much faster
+than the memory that is farther from it. Hence, the latency between the processors
+and different portions of the memory in a NUMA machine may be significantly different.
+
+\fBnumatop\fP is an observation tool for runtime memory locality characterization
+and analysis of processes and threads running on a NUMA system. It helps the user to
+characterize the NUMA behavior of processes and threads and to identify where the
+NUMA-related performance bottlenecks reside. The tool uses hardware performance counter
+sampling technologies and associates the performance data with Linux system runtime
+information to provide real-time analysis in production systems. The tool can be used to:
+
+\fBA)\fP Characterize the locality of all running processes and threads to identify
+those with the poorest locality in the system.
+
+\fBB)\fP Identify the "hot" memory areas, report average memory access latency, and
+provide the location where accessed memory is allocated. A "hot" memory area is where
+process/thread(s) accesses are most frequent. numatop has a metric called "ACCESS%"
+that specifies what percentage of memory accesses are attributable to each memory area.
+
+\fBNote: numatop records only the memory accesses which have latencies greater than a
+predefined threshold (128 CPU cycles).\fP
+
+\fBC)\fP Provide the call-chain(s) in the process/thread code that accesses a given hot
+memory area.
+
+\fBD)\fP Provide the call-chain(s) when the process/thread generates certain counter
+events (RMA/LMA/IR/CYCLE). The call-chain(s) helps to locate the source code that generates
+the events.
+.PP
+RMA: Remote Memory Access.
+.br
+LMA: Local Memory Access.
+.br
+IR: Instruction Retired.
+.br
+CYCLE: CPU cycles.
+.br
+
+\fBE)\fP Provide per-node statistics for memory and CPU utilization. A node is: a region
+of memory in which every byte has the same distance from each CPU.
+
+\fBF)\fP Show, using a user-friendly interface, the list of processes/threads sorted by
+some metrics (by default, sorted by CPU utilization), with the top process having the
+highest CPU utilization in the system and the bottom one having the lowest CPU utilization.
+Users can also use hotkeys to resort the output by these metrics: RMA, LMA, RMA/LMA, CPI,
+and CPU%.
+
+.br
+RMA/LMA: ratio of RMA/LMA.
+.br
+CPI: CPU cycle per instruction.
+.br
+CPU%: CPU utilization.
+.br
+
+\fBnumatop\fP is a GUI tool that periodically tracks and analyzes the NUMA activity of
+processes and threads and displays useful metrics. Users can scroll up/down by using the
+up or down key to navigate in the current window and can use several hot keys shown at the
+bottom of the window, to switch between windows or to change the running state of the tool.
+For example, hotkey 'R' refreshes the data in the current window.
+
+Below is a detailed description of the various display windows and the data items
+that they display:
+
+\fB[WIN1 - Monitoring processes and threads]:\fP
+.br
+Get the locality characterization of all processes. This is the first window upon startup,
+it's numatop's "Home" window. This window displays a list of processes. The top process has
+the highest system CPU utilization (CPU%), while the bottom process has the lowest CPU% in
+the system. Generally, the memory-intensive process is also CPU-intensive, so the processes
+shown in this window are sorted by CPU% by default. The user can press hotkeys '1', '2', '3', '4', or '5' to resort the output by "RMA", "LMA", "RMA/LMA", "CPI", or "CPU%".
+.PP
+\fB[KEY METRICS]:\fP
+.br
+RMA(K): number of Remote Memory Access (unit is 1000).
+.br
+ RMA(K) = RMA / 1000;
+.br
+LMA(K): number of Local Memory Access (unit is 1000).
+.br
+ LMA(K) = LMA / 1000;
+.br
+RMA/LMA: ratio of RMA/LMA.
+.br
+CPI: CPU cycles per instruction.
+.br
+CPU%: system CPU utilization (busy time across all CPUs).
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: WIN1 refresh.
+.br
+R: Refresh to show the latest data.
+.br
+I: Switch to WIN2 to show the normalized data.
+.br
+N: Switch to WIN11 to show the per-node statistics.
+.br
+1: Sort by RMA.
+.br
+2: Sort by LMA.
+.br
+3: Sort by RMA/LMA.
+.br
+4: Sort by CPI.
+.br
+5: Sort by CPU%
+.PP
+\fB[WIN2 - Monitoring processes and threads (normalized)]:\fP
+.br
+Get the normalized locality characterization of all processes.
+.PP
+\fB[KEY METRICS]:\fP
+.br
+RPI(K): RMA normalized by 1000 instructions.
+.br
+ RPI(K) = RMA / (IR / 1000);
+.br
+LPI(K): LMA normalized by 1000 instructions.
+.br
+ LPI(K) = LMA / (IR / 1000);
+.br
+Other metrics remain the same.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.br
+N: Switch to WIN11 to show the per-node statistics.
+.br
+1: Sort by RPI.
+.br
+2: Sort by LPI.
+.br
+3: Sort by RMA/LMA.
+.br
+4: Sort by CPI.
+.br
+5: Sort by CPU%
+.PP
+\fB[WIN3 - Monitoring the process]:\fP
+.br
+Get the locality characterization with node affinity of a specified process.
+.PP
+\fB[KEY METRICS]:\fP
+.br
+NODE: the node ID.
+.br
+CPU%: per-node CPU utilization.
+.br
+Other metrics remain the same.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.br
+N: Switch to WIN11 to show the per-node statistics.
+.br
+L: Show the latency information.
+.br
+C: Show the call-chain.
+.PP
+\fB[WIN4 - Monitoring all threads]:\fP
+.br
+Get the locality characterization of all threads in a specified process.
+.PP
+\fB[KEY METRICS]\fP:
+.br
+CPU%: per-CPU CPU utilization.
+.br
+Other metrics remain the same.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.br
+N: Switch to WIN11 to show the per-node statistics.
+.PP
+\fB[WIN5 - Monitoring the thread]:\fP
+.br
+Get the locality characterization with node affinity of a specified thread.
+.PP
+\fB[KEY METRICS]:\fP
+.br
+CPU%: per-CPU CPU utilization.
+.br
+Other metrics remain the same.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.br
+N: Switch to WIN11 to show the per-node statistics.
+.br
+L: Show the latency information.
+.br
+C: Show the call-chain.
+.PP
+\fB[WIN6 - Monitoring memory areas]:\fP
+.br
+Get the memory area use with the associated accessing latency of a
+specified process/thread.
+.PP
+\fB[KEY METRICS]:\fP
+.br
+ADDR: starting address of the memory area.
+.br
+SIZE: size of memory area (K/M/G bytes).
+.br
+ACCESS%: percentage of memory accesses are to this memory area.
+.br
+LAT(ns): the average latency (nanoseconds) of memory accesses.
+.br
+DESC: description of memory area (from /proc/<pid>/maps).
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.br
+A: Show the memory access node distribution.
+.br
+C: Show the call-chain when process/thread accesses the memory area.
+.PP
+\fB[WIN7 - Memory access node distribution overview]:\fP
+.br
+Get the percentage of memory accesses originated from the process/thread to each node.
+.PP
+\fB[KEY METRICS]:\fP
+.br
+NODE: the node ID.
+.br
+ACCESS%: percentage of memory accesses are to this node.
+.br
+LAT(ns): the average latency (nanoseconds) of memory accesses to this node.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.PP
+\fB[WIN8 - Break down the memory area into physical memory on node]:\fP
+.br
+Break down the memory area into the physical mapping on node with the
+associated accessing latency of a process/thread.
+.PP
+\fB[KEY METRICS]:\fP
+.br
+NODE: the node ID.
+.br
+Other metrics remain the same.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.PP
+\fB[WIN9 - Call-chain when process/thread generates the event ("RMA"/"LMA"/"CYCLE"/"IR")]:\fP
+.br
+Determine the call-chains to the code that generates "RMA"/"LMA"/"CYCLE"/"IR".
+.PP
+\fB[KEY METRICS]:\fP
+.br
+Call-chain list: a list of call-chains.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to the previous window.
+.br
+R: Refresh to show the latest data.
+.br
+1: Locate call-chain when process/thread generates "RMA"
+.br
+2: Locate call-chain when process/thread generates "LMA"
+.br
+3: Locate call-chain when process/thread generates "CYCLE" (CPU cycle)
+.br
+4: Locate call-chain when process/thread generates "IR" (Instruction Retired)
+.PP
+\fB[WIN10 - Call-chain when process/thread access the memory area]:\fP
+.br
+Determine the call-chains to the code that references this memory area.
+The latency must be greater than the predefined latency threshold
+(128 CPU cycles).
+.PP
+\fB[KEY METRICS]:\fP
+.br
+Call-chain list: a list of call-chains.
+.br
+Other metrics remain the same.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.PP
+\fB[WIN11 - Node Overview]:\fP
+.br
+Show the basic per-node statistics for this system
+.PP
+\fB[KEY METRICS]:\fP
+.br
+MEM.ALL: total usable RAM (physical RAM minus a few reserved bits and the kernel binary code).
+.br
+MEM.FREE: sum of LowFree + HighFree (overall stat) .
+.br
+CPU%: per-node CPU utilization.
+.br
+Other metrics remain the same.
+.PP
+\fB[WIN12 - Information of Node N]:\fP
+.br
+Show the memory use and CPU utilization for the selected node.
+.PP
+\fB[KEY METRICS]:\fP
+.br
+CPU: array of logical CPUs that belong to this node.
+.br
+CPU%: per-node CPU utilization.
+.br
+MEM active: the amount of memory that has been used more recently and is not usually reclaimed unless absolutely necessary.
+.br
+MEM inactive: the amount of memory that has not been used for a while and is eligible to be swapped to disk.
+.br
+Dirty: the amount of memory waiting to be written back to the disk.
+.br
+Writeback: the amount of memory actively being written back to the disk.
+.br
+Mapped: all pages mapped into a process.
+.PP
+\fB[HOTKEY]:\fP
+.br
+Q: Quit the application.
+.br
+H: Switch to WIN1.
+.br
+B: Back to previous window.
+.br
+R: Refresh to show the latest data.
+.PP
+.SH "OPTIONS"
+The following options are supported by numatop:
+.PP
+-s sampling_precision
+.br
+normal: balance precision and overhead (default)
+.br
+high: high sampling precision (high overhead)
+.br
+low: low sampling precision, suitable for high-load systems
+.PP
+-l log_level
+.br
+Specifies the level of logging in the log file. Valid values are:
+.br
+1: unknown (reserved for future use)
+.br
+2: all
+.PP
+-f log_file
+.br
+Specifies the log file where output will be written. If the log file is
+not writable, the tool will print "Cannot open '<file name>' for writting."
+.PP
+-d dump_file
+.br
+Specifies the dump file where the screen data will be written. Generally the dump
+file is used for automated testing. If the dump file is not writable, the tool will
+print "Cannot open <file name> for dump writing."
+.PP
+-h Displays the command's usage.
+.PP
+-t duration
+.br
+Specifies the run time, in seconds.
+.PP
+.SH EXAMPLES
+Example 1: Launch numatop with high sampling precision
+.br
+numatop -s high
+.PP
+Example 2: Write all warning messages to /tmp/numatop.log
+.br
+numatop -l 2 -f /tmp/numatop.log
+.PP
+Example 3: Dump screen data to /tmp/dump.log
+.br
+numatop -d /tmp/dump.log
+.PP
+.SH EXIT STATUS
+.br
+0: successful operation.
+.br
+Any other value: an error occurred.
+.PP
+.SH USAGE
+.br
+You need root privileges to run numatop.
+.br
+Alternatively, set /proc/sys/kernel/perf_event_paranoid to -1.
+.PP
+\fBNote\fP: The perf_event_paranoid setting has security implications, and a non-root
+user probably does not have permission to change it. It is highly recommended
+that the user runs \fBnumatop\fP as root.
+.PP
+.SH VERSION
+.br
+
+\fBnumatop\fP requires a patch set to support PEBS Load Latency functionality in the
+kernel. The patch set has not been integrated into Linux 3.8; it will probably be integrated
+into 3.9. The following steps show how to get and apply the patch set.
+
+.PP
+1. git clone git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
+.br
+2. cd tip
+.br
+3. git checkout perf/x86
+.br
+4. Build the kernel as usual.
+.PP
+
+\fBnumatop\fP supports the following Intel Xeon processors: 5500-series, 6500/7500-series,
+5600-series, E7-x8xx-series, and E5-16xx/24xx/26xx/46xx-series.
+\fBNote\fP: CPU microcode version 0x618 or 0x70c or later is required on
+E5-16xx/24xx/26xx/46xx-series. \fBnumatop\fP also supports IBM Power8, Power9, Power10 and Power11 processors.
--
2.41.0