perf-report(1)
==============

NAME
----
perf-report - Read perf.data (created by perf record) and display the profile

SYNOPSIS
--------
[verse]
'perf report' [-i <file> | --input=file]

DESCRIPTION
-----------
This command displays the performance counter profile information recorded
via perf record.

OPTIONS
-------
-i::
--input=::
	Input file name. (default: perf.data unless stdin is a fifo)

-v::
--verbose::
	Be more verbose. (show symbol address, etc)

-q::
--quiet::
	Do not show any warnings or messages.  (Suppress -v)

-n::
--show-nr-samples::
	Show the number of samples for each symbol.

--show-cpu-utilization::
	Show sample percentage for different cpu modes.

-T::
--threads::
	Show per-thread event counters.  The input data file should be recorded
	with the -s option.

-c::
--comms=::
	Only consider symbols in these comms. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

--pid=::
	Only show events for given process ID (comma separated list).

--tid=::
	Only show events for given thread ID (comma separated list).

-d::
--dsos=::
	Only consider symbols in these dsos. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

-S::
--symbols=::
	Only consider these symbols. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

--symbol-filter=::
	Only show symbols that match (partially) with this filter.

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-s::
--sort=::
	Sort histogram entries by given key(s) - multiple keys can be specified
	in CSV format.  The following sort keys are available:
	pid, comm, dso, symbol, parent, cpu, socket, srcline, weight,
	local_weight, cgroup_id, addr.

	Each key has the following meaning:

	- comm: command (name) of the task which can be read via /proc/<pid>/comm
	- pid: command and tid of the task
	- dso: name of library or module executed at the time of sample
	- dso_size: size of library or module executed at the time of sample
	- symbol: name of function executed at the time of sample
	- symbol_size: size of function executed at the time of sample
	- parent: name of function matched to the parent regex filter. Unmatched
	entries are displayed as "[other]".
	- cpu: cpu number the task ran on at the time of sample
	- socket: processor socket number the task ran on at the time of sample
	- srcline: filename and line number executed at the time of sample.  The
	DWARF debugging info must be provided.
	- srcfile: file name of the source file of the samples. Requires DWARF
	information.
	- weight: Event specific weight, e.g. memory latency or transaction
	abort cost. This is the global weight.
	- local_weight: Local weight version of the weight above.
	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
	- cgroup: cgroup pathname in the cgroupfs.
	- transaction: Transaction abort flags.
	- overhead: Overhead percentage of sample
	- overhead_sys: Overhead percentage of sample running in system mode
	- overhead_us: Overhead percentage of sample running in user mode
	- overhead_guest_sys: Overhead percentage of sample running in system mode
	on guest machine
	- overhead_guest_us: Overhead percentage of sample running in user mode on
	guest machine
	- sample: Number of samples
	- period: Raw event count of the sample
	- time: Separate the samples by time stamp with the resolution specified by
	--time-quantum (default 100ms). Specify with overhead and before it.
	- code_page_size: the code page size of sampled code address (ip)
	- ins_lat: Instruction latency in core cycles. This is the global instruction
	  latency
	- local_ins_lat: Local instruction latency version
	- p_stage_cyc: Number of cycles spent in a pipeline stage. Currently
	  supported only on powerpc.
	- addr: (Full) virtual address of the sampled instruction
	- retire_lat: Pipeline stall of this instruction compared to the previous
	  instruction, in cycles. Currently supported only on X86.
	- simd: Flags describing a SIMD operation. "e" for empty Arm SVE predicate.
	  "p" for partial Arm SVE predicate
	- type: Data type of sample memory access.
	- typeoff: Offset in the data type of sample memory access.
	- symoff: Offset in the symbol.
	- weight1: Average value of event specific weight (1st field of weight_struct).
	- weight2: Average value of event specific weight (2nd field of weight_struct).
	- weight3: Average value of event specific weight (3rd field of weight_struct).

	By default, comm, dso and symbol keys are used.
	(i.e. --sort comm,dso,symbol)

	If the --branch-stack option is used, the following sort keys are also
	available:

	- dso_from: name of library or module branched from
	- dso_to: name of library or module branched to
	- symbol_from: name of function branched from
	- symbol_to: name of function branched to
	- srcline_from: source file and line branched from
	- srcline_to: source file and line branched to
	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
	- in_tx: branch in TSX transaction
	- abort: TSX transaction abort.
	- cycles: Cycles in basic block

	The default sort keys are then changed to comm, dso_from, symbol_from,
	dso_to and symbol_to, see '--branch-stack'.

	When the sort key symbol is specified, columns "IPC" and "IPC Coverage"
	are enabled automatically. Column "IPC" reports the average IPC per function
	and column "IPC Coverage" reports the percentage of instructions with
	sampled IPC in this function. IPC means Instructions Per Cycle. A low
	value indicates there may be a performance bottleneck when the function is
	executed, such as a memory access bottleneck. A function with high overhead
	and low IPC is worth analyzing further to optimize its performance.

	If the --mem-mode option is used, the following sort keys are also available
	(incompatible with --branch-stack):
	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline, blocked.

	- symbol_daddr: name of data symbol being accessed at the time of sample
	- dso_daddr: name of library or module containing the data being accessed
	at the time of the sample
	- locked: whether the bus was locked at the time of the sample
	- tlb: type of tlb access for the data at the time of the sample
	- mem: type of memory access for the data at the time of the sample
	- snoop: type of snoop (if any) for the data at the time of the sample
	- dcacheline: the cacheline the data address is on at the time of the sample
	- phys_daddr: physical address of data being accessed at the time of sample
	- data_page_size: the page size of data being accessed at the time of sample
	- blocked: reason of blocked load access for the data at the time of the sample

	The default sort keys are then changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat,
	see '--mem-mode'.

	If the data file has tracepoint event(s), the following (dynamic) sort keys
	are also available:
	trace, trace_fields, [<event>.]<field>[/raw]

	- trace: pretty printed trace output in a single column
	- trace_fields: fields in tracepoints in separate columns
	- <field name>: optional event and field name for a specific field

	The last form consists of event and field names.  If the event name is
	omitted, all events are searched for a matching field name.  The matched
	field will be shown only for events that have the field.  The event name
	supports substring match so the user doesn't need to specify the full
	subsystem and event name every time.  For example, the 'sched:sched_switch'
	event can be shortened to 'switch' as long as it's not ambiguous.  An event
	can also be specified by its index (starting from 1) preceded by '%'.
	So '%1' is the first event, '%2' is the second, and so on.

	The field name can have a '/raw' suffix which disables pretty printing
	and shows the raw field value, like hex numbers.  The --raw-trace option
	has the same effect for all dynamic sort keys.

	The default sort keys are changed to 'trace' if all events in the data
	file are tracepoints.

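The event-name matching described for dynamic sort keys can be modelled in a
few lines. This is only an illustrative sketch, not perf's implementation:
the function name `resolve_event` and the behaviour on an ambiguous or
out-of-range specifier (raising an error) are assumptions.

```python
def resolve_event(spec, events):
    """Return the event that `spec` selects from `events` (in recording order)."""
    if spec.startswith("%"):
        # '%N' selects the N-th event, counting from 1.
        index = int(spec[1:])
        if not 1 <= index <= len(events):
            raise ValueError(f"no event at index {index}")
        return events[index - 1]
    # Otherwise the spec is treated as a substring of the full event name.
    matches = [name for name in events if spec in name]
    if len(matches) != 1:
        raise ValueError(f"{spec!r} matches {len(matches)} events")
    return matches[0]


events = ["sched:sched_switch", "sched:sched_wakeup"]
print(resolve_event("switch", events))  # sched:sched_switch
print(resolve_event("%2", events))      # sched:sched_wakeup
```

With these two events, 'sched' alone would be ambiguous, which is why the
sketch rejects it.
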
-F::
--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	The following fields are available:
	overhead, overhead_sys, overhead_us, overhead_children, sample, period,
	weight1, weight2, weight3, ins_lat, p_stage_cyc and retire_lat.  The
	last 3 names are aliases for the corresponding weights.  When the weight
	fields are used, they will show the average value of the weight.

	It can also contain any sort key(s).

	By default, every sort key not specified in -F will be appended
	automatically.

	If the key list starts with the prefix '+', the specified field(s) are
	appended to the default field order. For example: perf report -F +period,sample.

-p::
--parent=<regex>::
	A regex filter to identify the parent. The parent is a caller of this
	function and is searched for through the callchain, so it requires
	recorded callchain information. The pattern is in the extended regex
	format and defaults to "\^sys_|^do_page_fault", see '--sort parent'.

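As a rough model of the parent lookup described above, the sketch below
searches a callchain for the first symbol matching the default pattern and
falls back to "[other]", the label used by the parent sort key. The helper
name `parent_of` and the callee-to-caller search order are assumptions, and
Python's `re` module stands in for an extended-regex matcher.

```python
import re

# Default pattern quoted in the option description above.
DEFAULT_PARENT = re.compile(r"^sys_|^do_page_fault")


def parent_of(callchain, pattern=DEFAULT_PARENT):
    """Return the first symbol in `callchain` matching `pattern`, else "[other]"."""
    for symbol in callchain:
        if pattern.search(symbol):
            return symbol
    return "[other]"


print(parent_of(["memcpy", "vfs_read", "sys_read"]))  # sys_read
print(parent_of(["foo", "main"]))                     # [other]
```
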
-x::
--exclude-other::
	Only display entries with parent-match.

-w::
--column-widths=<width[,width...]>::
	Force each column width to the provided list, for large terminal
	readability.  0 means no limit (default behavior).

-t::
--field-separator=::
	Use a special separator character and don't pad with spaces. All
	occurrences of this separator in symbol names (and other output) are
	replaced with a '.' character, which is therefore the only invalid
	separator.

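The effect of --field-separator on one output row can be sketched as
follows. This is an illustration of the rule stated above, not perf's code;
`format_row` and the sample row are made up for the example.

```python
def format_row(fields, sep):
    """Join `fields` with `sep`, without padding.

    Any occurrence of `sep` inside a field is replaced with '.', so the
    separator only ever appears between columns.
    """
    return sep.join(str(field).replace(sep, ".") for field in fields)


row = ["12.50%", "perf", "std::map<int,int>::find"]
print(format_row(row, ","))  # 12.50%,perf,std::map<int.int>::find
```

Because embedded separators become '.', choosing '.' itself as the separator
would make columns indistinguishable from replaced characters.
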
-D::
--dump-raw-trace::
	Dump raw trace in ASCII.

--disable-order::
	Disable raw trace ordering.

-g::
--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
	Display call chains using type, min percent threshold, print limit,
	call order, sort key, optional branch and value.  Note that ordering
	is not fixed so any parameter can be given in an arbitrary order.
	The one exception is print_limit, which must be preceded by threshold.

	print_type can be either:
	- flat: single column, linear exposure of call chains.
	- graph: use a graph tree, displaying absolute overhead rates. (default)
	- fractal: like graph, but displays relative rates. Each branch of
		 the tree is considered as a new profiled object.
	- folded: call chains are displayed in a line, separated by semicolons
	- none: disable call chain display.

	threshold is a percentage value which specifies a minimum percent to be
	included in the output call graph.  Default is 0.5 (%).

	print_limit is only applied when the stdio interface is used.  It limits
	the number of call graph entries in a single hist entry.  Note that it
	needs to be given after threshold (but not necessarily consecutively).
	Default is 0 (unlimited).

	order can be either:
	- callee: callee based call graph.
	- caller: inverted caller based call graph.
	Default is 'caller' when --children is used, otherwise 'callee'.

	sort_key can be:
	- function: compare on functions (default)
	- address: compare on individual code addresses
	- srcline: compare on source filename and line number

	branch can be:
	- branch: include last branch information in callgraph when available.
	          Usually more convenient to use --branch-history for this.

	value can be:
	- percent: display overhead percent (default)
	- period: display event period
	- count: display event count

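The "any order, except print_limit after threshold" rule for --call-graph
can be illustrated with a small parser. This is a sketch under the
assumption that keywords are recognised by name and that the first number is
the threshold and the second the print limit; `parse_call_graph` is a
hypothetical helper, not perf's parser.

```python
PRINT_TYPES = {"flat", "graph", "fractal", "folded", "none"}
ORDERS = {"callee", "caller"}
SORT_KEYS = {"function", "address", "srcline"}
VALUES = {"percent", "period", "count"}


def parse_call_graph(arg):
    """Classify each comma-separated token of a --call-graph argument."""
    opts = {}
    for token in arg.split(","):
        if token in PRINT_TYPES:
            opts["print_type"] = token
        elif token in ORDERS:
            opts["order"] = token
        elif token in SORT_KEYS:
            opts["sort_key"] = token
        elif token in VALUES:
            opts["value"] = token
        elif token == "branch":
            opts["branch"] = True
        elif "threshold" not in opts:
            # The first numeric token is the threshold (a percentage).
            opts["threshold"] = float(token)
        elif "print_limit" not in opts:
            # A second numeric token, anywhere later, is the print limit.
            opts["print_limit"] = int(token)
        else:
            raise ValueError(f"unexpected token {token!r}")
    return opts


print(parse_call_graph("caller,0.5,fractal,10,period"))
```

Here the threshold and the print limit are separated by another keyword,
which matches the "not necessarily consecutively" wording above.
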
--children::
	Accumulate callchain of children to parent entry so that they can
	show up in the output.  The output will have a new "Children" column
	and will be sorted on the data.  It requires that callchains are recorded.
	See the `overhead calculation' section for more details. Enabled by
	default, disable with --no-children.

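A minimal model of the accumulation that --children performs is sketched
below. It assumes each sample carries a callchain ordered from callee to
root, credits the sample's period to the leaf as "self" overhead, and
credits it once to every function on the chain as "children" overhead. The
names and sample data are hypothetical; the overhead calculation section
remains the authoritative description.

```python
from collections import defaultdict


def accumulate(samples):
    """Return {symbol: (self %, children %)} for (callchain, period) samples."""
    total = sum(period for _, period in samples)
    self_period = defaultdict(int)
    children_period = defaultdict(int)
    for chain, period in samples:
        # The sampled function itself is the first entry of the chain.
        self_period[chain[0]] += period
        # Every distinct function on the chain accumulates the period once.
        for symbol in set(chain):
            children_period[symbol] += period
    return {
        symbol: (100.0 * self_period[symbol] / total,
                 100.0 * children_period[symbol] / total)
        for symbol in children_period
    }


# main() calls bar(), which calls foo(); 60 units sampled in foo, 40 in bar.
samples = [(["foo", "bar", "main"], 60), (["bar", "main"], 40)]
for symbol, (self_pct, children_pct) in sorted(accumulate(samples).items()):
    print(f"{symbol}: self {self_pct:.2f}% children {children_pct:.2f}%")
```

In this model main has no self overhead but a children overhead of 100%,
which is why it shows up in the output once accumulation is enabled.
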
--max-stack::
	Set the stack depth limit when parsing the callchain, anything
	beyond the specified depth will be ignored. This is a trade-off
	between information loss and faster processing especially for
	workloads that can have a very long callchain stack.
	Note that when using the --itrace option the synthesized callchain size
	will override this value if the synthesized callchain size is bigger.

	Default: 127

-G::
--inverted::
	Alias for inverted caller based call graph.

--ignore-callees=<regex>::
	Ignore callees of the function(s) matching the given regex.
	This has the effect of collecting the callers of each such
	function into one place in the call-graph tree.

--pretty=<key>::
	Pretty printing style.  key: normal, raw

--stdio:: Use the stdio interface.

--stdio-color::
	'always', 'never' or 'auto', allowing configuring color output
	via the command line, in addition to via "color.ui" .perfconfig.
	Use '--stdio-color always' to generate color even when redirecting
	to a pipe or file. Using just '--stdio-color' is equivalent to
	using 'always'.

--tui:: Use the TUI interface, which is integrated with annotate and allows
	zooming into DSOs or threads, among other features. Use of --tui
	requires a tty; if one is not present, as when piping to other
	commands, the stdio interface is used.

--gtk:: Use the GTK2 interface.

-k::
--vmlinux=<file>::
	vmlinux pathname

--ignore-vmlinux::
	Ignore vmlinux files.

--kallsyms=<file>::
	kallsyms pathname

-m::
--modules::
	Load module symbols. WARNING: This should only be used with -k and
	a LIVE kernel.

-f::
--force::
	Don't do ownership validation.

--symfs=<directory>::
	Look for files with symbols relative to this directory.

-C::
--cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
	be provided as a comma-separated list with no space: 0,1. Ranges of
	CPUs are specified with -: 0-2. Default is to report samples on all
	CPUs.

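The CPU list syntax accepted by --cpu (comma-separated numbers and
dash-separated ranges) can be expanded as in the sketch below.
`parse_cpu_list` is a hypothetical helper written for illustration and does
no validation beyond what `int()` provides.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,2-4' into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # 'first-last' is an inclusive range.
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))    # [0, 1]
print(parse_cpu_list("0-2"))    # [0, 1, 2]
print(parse_cpu_list("0,2-4"))  # [0, 2, 3, 4]
```
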
-M::
--disassembler-style=:: Set disassembler style for objdump.

--source::
	Interleave source code with assembly code. Enabled by default,
	disable with --no-source.

--asm-raw::
	Show raw instruction encoding of assembly instructions.

--show-total-period:: Show a column with the sum of periods.

-I::
--show-info::
	Display extended information about the perf.data file. This adds
	information which may be very large and thus may clutter the display.
	It currently includes: cpu and numa topology of the host system.

-b::
--branch-stack::
	Use the addresses of sampled taken branches instead of the instruction
	address to build the histograms. To generate meaningful output, the
	perf.data file must have been obtained using perf record -b or
	perf record --branch-filter xxx where xxx is a branch filter option.
	perf report is able to auto-detect whether a perf.data file contains
	branch stacks and it will automatically switch to the branch view mode,
	unless --no-branch-stack is used.

--branch-history::
	Add the addresses of sampled taken branches to the callstack.
	This allows examining the path the program took to each sample.
	The data collection must have used -b (or -j) and -g.

--addr2line=<path>::
	Path to addr2line binary.

--objdump=<path>::
	Path to objdump binary.

--prefix=PREFIX::
--prefix-strip=N::
	Remove the first N entries from source file path names in executables
	and add PREFIX. This allows displaying source code compiled on systems
	with a different file system layout.

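The path rewriting performed by --prefix and --prefix-strip can be pictured
with the sketch below. It assumes an "entry" is one leading directory
component of an absolute path; the function name and the example paths are
invented for illustration.

```python
def rewrite_source_path(path, prefix="", strip=0):
    """Drop the first `strip` path entries of `path`, then prepend `prefix`."""
    parts = [part for part in path.split("/") if part]
    kept = parts[strip:]
    return prefix.rstrip("/") + "/" + "/".join(kept)


# A binary built under /build/worker, with sources now in /home/user/project.
print(rewrite_source_path("/build/worker/src/main.c",
                          prefix="/home/user/project", strip=2))
# /home/user/project/src/main.c
```
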
--group::
	Show event group information together. It forces group output even
	if there are no groups defined in the data file.

--group-sort-idx::
	Sort the output by the event at the index n in group. If n is invalid,
	sort by the first event. It can support multiple groups with different
	numbers of events. WARNING: This should be used on grouped events.

--demangle::
	Demangle symbol names to human readable form. It's enabled by default,
	disable with --no-demangle.

--demangle-kernel::
	Demangle kernel symbol names to human readable form (for C++ kernels).

--mem-mode::
	Use the data addresses of samples in addition to instruction addresses
	to build the histograms.  To generate meaningful output, the perf.data
	file must have been obtained using perf record -d -W and using a
	special event -e cpu/mem-loads/p or -e cpu/mem-stores/p. See
	'perf mem' for simpler access.

--percent-limit::
	Do not show entries which have an overhead under that percent.
	(Default: 0).  Note that this option also sets the percent limit (threshold)
	of callchains.  However the default value of callchain threshold is
	different than the default value of hist entries.  Please see the
	--call-graph option for details.

--percentage::
	Determine how to display the overhead percentage of filtered entries.
	Filters can be applied by the --comms, --dsos and/or --symbols options and
	by Zoom operations on the TUI (thread, dso, etc).

	"relative" means it's relative to filtered entries only so that the
	sum of shown entries will always be 100%.  "absolute" means it retains
	the original value before and after the filter is applied.

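The difference between the "relative" and "absolute" modes of --percentage
comes down to which total the filtered entries are divided by. The sketch
below shows this with made-up periods; `overhead` is a hypothetical helper,
not part of perf.

```python
def overhead(entries, keep, mode="absolute"):
    """Return {name: percent} for the entries whose name is in `keep`.

    entries maps a name to its period. "absolute" divides by the total of
    all entries, "relative" by the total of the entries that pass the filter.
    """
    total = sum(entries.values())
    shown = {name: period for name, period in entries.items() if name in keep}
    base = sum(shown.values()) if mode == "relative" else total
    return {name: 100.0 * period / base for name, period in shown.items()}


entries = {"kfunc1": 50, "foo": 20, "kfunc2": 15, "bar": 10, "libcall": 5}
perf_only = {"foo", "bar"}
print(overhead(entries, perf_only, "absolute"))  # foo 20%, bar 10%
print(overhead(entries, perf_only, "relative"))  # foo ~66.67%, bar ~33.33%
```
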
--header::
	Show header information in the perf.data file.  This includes
	various information like hostname, OS and perf version, cpu/mem
	info, perf command line, event list and so on.  Currently only
	--stdio output supports this feature.

--header-only::
	Show only perf.data header (forces --stdio).

--time::
	Only analyze samples within given time window: <start>,<stop>. Times
	have the format seconds.nanoseconds. If start is not given (i.e. time
	string is ',x.y') then analysis starts at the beginning of the file. If
	stop time is not given (i.e. time string is 'x.y,') then analysis goes
	to end of file. Multiple ranges can be separated by spaces, which
	requires the argument to be quoted e.g. --time "1234.567,1234.789 1235,"

	Time percent with multiple time ranges is also supported. The time string
	is 'a%/n,b%/m,...' or 'a%-b%,c%-d%,...'.

	For example:
	Select the second 10% time slice:

	  perf report --time 10%/2

	Select from 0% to 10% time slice:

	  perf report --time 0%-10%

	Select the first and second 10% time slices:

	  perf report --time 10%/1,10%/2

	Select from 0% to 10% and 30% to 40% slices:

	  perf report --time 0%-10%,30%-40%

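To make the two percent notations concrete, the sketch below converts a
--time percent string into (start, end) pairs expressed as percentages of
the recording. It mirrors the examples above; `percent_ranges` is a
hypothetical helper and performs no error checking.

```python
def percent_range(spec):
    """Return (start %, end %) for one 'a%/n' or 'a%-b%' specification."""
    if "/" in spec:
        # 'a%/n' is the n-th slice of size a%, counting from 1.
        size, index = spec.split("/")
        size = float(size.rstrip("%"))
        index = int(index)
        return (size * (index - 1), size * index)
    # 'a%-b%' is an explicit range.
    start, end = spec.split("-")
    return (float(start.rstrip("%")), float(end.rstrip("%")))


def percent_ranges(arg):
    """Parse a comma-separated list of percent specifications."""
    return [percent_range(spec) for spec in arg.split(",")]


print(percent_ranges("10%/2"))           # [(10.0, 20.0)]
print(percent_ranges("0%-10%,30%-40%"))  # [(0.0, 10.0), (30.0, 40.0)]
```
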
--switch-on EVENT_NAME::
	Only consider events after this event is found.

	This may be useful to measure a workload only after some initialization
	phase is over, e.g. by inserting a perf probe at that point and then using
	this option with that probe.

--switch-off EVENT_NAME::
	Stop considering events after this event is found.

--show-on-off-events::
	Show the --switch-on/off events too. This currently has no effect in
	'perf report'. In the future the default may become not to show the
	switch-on/off events in --group mode and, if there is only one event
	besides the off/on ones, to go straight to the histogram browser, just
	like 'perf report' with no events explicitly specified does.

--itrace::
	Options for decoding instruction tracing data. The options are:

include::itrace.txt[]

	To disable decoding entirely, use --no-itrace.

--full-source-path::
	Show the full path for source files for srcline output.

--show-ref-call-graph::
	When multiple events are sampled, it may not be necessary to collect
	callgraphs for all of them. The sample sites are usually nearby,
	and it's enough to collect the callgraphs on a reference event.
	So the user can use the "call-graph=no" event modifier to disable
	callgraphs for the other events to reduce the overhead.
	However, perf report cannot show callgraphs for an event which
	has callgraphs disabled.
	This option extends perf report to show the reference callgraphs,
	which were collected by the reference event, in the events without
	callgraphs.

--stitch-lbr::
	Show callgraph with stitched LBRs, which may have more complete
	callgraph. The perf.data file must have been obtained using
	perf record --call-graph lbr.
	Disabled by default. In common cases with call stack overflows,
	it can recreate better call stacks than the default lbr call stack
	output. But this approach is not foolproof. There can be cases
	where it creates incorrect call stacks from incorrect matches.
	The known limitations include exception handling such as
	setjmp/longjmp, where calls and returns do not match.

--socket-filter::
	Only report the samples on the processor socket that matches this filter.

--samples=N::
	Save N individual samples for each histogram entry to show context in the
	perf report TUI browser.

--raw-trace::
	When displaying traceevent output, do not use print fmt or plugins.

-H::
--hierarchy::
	Enable hierarchical output.  In the hierarchy mode, each sort key groups
	samples based on the criteria and then subdivides them using the lower
	level sort key.

	For example:
	In normal output:

	  perf report -s dso,sym
	  # Overhead  Shared Object      Symbol
	      50.00%  [kernel.kallsyms]  [k] kfunc1
	      20.00%  perf               [.] foo
	      15.00%  [kernel.kallsyms]  [k] kfunc2
	      10.00%  perf               [.] bar
	       5.00%  libc.so            [.] libcall

	In hierarchy output:

	  perf report -s dso,sym --hierarchy
	  #   Overhead  Shared Object / Symbol
	      65.00%    [kernel.kallsyms]
	        50.00%    [k] kfunc1
	        15.00%    [k] kfunc2
	      30.00%    perf
	        20.00%    [.] foo
	        10.00%    [.] bar
	       5.00%    libc.so
	         5.00%    [.] libcall

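The regrouping that turns the flat listing into the hierarchy listing above
can be reproduced with a small aggregation. The sketch below uses the
numbers from the example; `by_dso` is a hypothetical helper that handles
only the two-level dso/sym case.

```python
from collections import defaultdict


def by_dso(rows):
    """Group (overhead, dso, symbol) rows under their dso.

    Returns {dso: (total overhead, [(overhead, symbol), ...])}, ordered by
    descending total, as in the hierarchy listing.
    """
    groups = defaultdict(list)
    for overhead, dso, symbol in rows:
        groups[dso].append((overhead, symbol))
    tree = {dso: (sum(o for o, _ in children), children)
            for dso, children in groups.items()}
    return dict(sorted(tree.items(), key=lambda item: item[1][0], reverse=True))


rows = [
    (50.0, "[kernel.kallsyms]", "kfunc1"),
    (20.0, "perf", "foo"),
    (15.0, "[kernel.kallsyms]", "kfunc2"),
    (10.0, "perf", "bar"),
    (5.0, "libc.so", "libcall"),
]
for dso, (total, children) in by_dso(rows).items():
    print(f"{total:6.2f}%  {dso}")
    for overhead, symbol in children:
        print(f"  {overhead:6.2f}%  {symbol}")
```
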
--inline::
	If a callgraph address belongs to an inlined function, the inline stack
	will be printed. Each entry is function name or file/line. Enabled by
	default, disable with --no-inline.

--mmaps::
	Show --tasks output plus mmap information in a format similar to
	/proc/<PID>/maps.

	Please note that not all mmaps are stored; options affecting which ones
	are stored include 'perf record --data', for instance.

--ns::
	Show time stamps in nanoseconds.

--stats::
	Display overall events statistics without any further processing.
	(like the one at the end of the perf report -D command)

--tasks::
	Display monitored tasks stored in perf data. Displaying pid/tid/ppid
	plus the command string aligned to distinguish parent and child tasks.

--percent-type::
	Set annotation percent type from the following choices:
	  global-period, local-period, global-hits, local-hits

	The local/global keywords set whether the percentage is computed
	in the scope of the function (local) or the whole data (global).
	The period/hits keywords set the base the percentage is computed
	on - the samples period or the number of samples (hits).

--time-quantum::
	Configure time quantum for time sort key. Default 100ms.
	Accepts s, us, ms, ns units.

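How a time quantum separates samples for the time sort key can be modelled
as rounding each timestamp down to a multiple of the quantum. The sketch
below assumes exactly that rounding and an integer value followed by one of
the listed units; both helper names are hypothetical.

```python
UNITS_NS = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def quantum_ns(text="100ms"):
    """Convert a quantum such as '100ms' to nanoseconds."""
    # Check the two-letter units first so that 'ms' is not read as 's'.
    for unit in ("ms", "us", "ns", "s"):
        if text.endswith(unit):
            return int(text[: -len(unit)]) * UNITS_NS[unit]
    raise ValueError(f"missing unit in {text!r}")


def time_bucket(timestamp_ns, quantum="100ms"):
    """Return the start of the quantum-sized bucket containing the timestamp."""
    size = quantum_ns(quantum)
    return timestamp_ns - timestamp_ns % size


print(time_bucket(1_234_567_890))        # 1200000000
print(time_bucket(1_234_567_890, "1s"))  # 1000000000
```
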
--total-cycles::
	When --total-cycles is specified, it supports sorting for all blocks by
	'Sampled Cycles%'. This is useful to concentrate on the globally hottest
	blocks. In output, there are some new columns:

	'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles
	'Sampled Cycles'  - block sampled cycles aggregation
	'Avg Cycles%'     - block average sampled cycles / sum of total block average
			    sampled cycles
	'Avg Cycles'      - block average sampled cycles
	'Branch Counter'  - block branch counter histogram (with -v showing the number)

--skip-empty::
	Do not print 0 results in the --stats output.

include::callchain-overhead-calculation.txt[]

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1],
linkperf:perf-intel-pt[1]