| .. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
 | ||
| 
 | ||
| ======================================================
 | ||
| Discovering Linux kernel subsystems used by a workload
 | ||
| ======================================================
 | ||
| 
 | ||
| :Authors: - Shuah Khan <skhan@linuxfoundation.org>
 | ||
|           - Shefali Sharma <sshefali021@gmail.com>
 | ||
| :maintained-by: Shuah Khan <skhan@linuxfoundation.org>
 | ||
| 
 | ||
| Key Points
 | ||
| ==========
 | ||
| 
 | ||
|  * Understanding system resources necessary to build and run a workload
 | ||
|    is important.
 | ||
|  * Linux tracing and strace can be used to discover the system resources
 | ||
|    in use by a workload. The completeness of the system usage information
 | ||
|    depends on the completeness of coverage of a workload.
 | ||
|  * Performance and security of the operating system can be analyzed with
 | ||
|    the help of tools such as:
 | ||
|    `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_,
 | ||
|    `stress-ng <https://www.mankier.com/1/stress-ng>`_,
 | ||
|    `paxtest <https://github.com/opntr/paxtest-freebsd>`_.
 | ||
|  * Once we discover and understand the workload needs, we can focus on them
 | ||
|    to avoid regressions and use it to evaluate safety considerations.
 | ||
| 
 | ||
| Methodology
 | ||
| ===========
 | ||
| 
 | ||
| `strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_ is a
 | ||
| diagnostic, instructional, and debugging tool and can be used to discover
 | ||
| the system resources in use by a workload. Once we discover and understand
 | ||
| the workload needs, we can focus on them to avoid regressions and use it
 | ||
| to evaluate safety considerations. We use the strace tool to trace workloads.
 | ||
| 
 | ||
| This method of tracing using strace tells us the system calls invoked by
 | ||
| the workload and doesn't include all the system calls that can be invoked
 | ||
| by it. In addition, this tracing method tells us just the code paths within
 | ||
| these system calls that are invoked. As an example, if a workload opens a
 | ||
| file and reads from it successfully, then the success path is the one that
 | ||
| is traced. Any error paths in that system call will not be traced. If a
 | ||
| test run provides full coverage of a workload, then the method outlined
 | ||
| here will trace and find all possible code paths. The completeness
 | ||
| of the system usage information depends on the completeness of coverage of a
 | ||
| workload.
 | ||
| 
 | ||
| The goal is tracing a workload on a system running a default kernel without
 | ||
| requiring custom kernel installs.
 | ||
| 
 | ||
| How do we gather fine-grained system information?
 | ||
| =================================================
 | ||
| 
 | ||
| The strace tool can be used to trace system calls made by a process and signals
 | ||
| it receives. System calls are the fundamental interface between an
 | ||
| application and the operating system kernel. They enable a program to
 | ||
| request services from the kernel. For instance, the open() system call in
 | ||
| Linux is used to provide access to a file in the file system. strace enables
 | ||
| us to track all the system calls made by an application. It lists all the
 | ||
| system calls made by a process and their resulting output.
 | ||
| 
 | ||
| You can generate profiling data combining strace and perf record tools to
 | ||
| record the events and information associated with a process. This provides
 | ||
| insight into the process. The "perf annotate" tool generates the statistics of
 | ||
| each instruction of the program. This document goes over the details of how
 | ||
| to gather fine-grained information on a workload's usage of system resources.
 | ||
| 
 | ||
| We used strace to trace the perf, stress-ng, paxtest workloads to illustrate
 | ||
| our methodology to discover resources used by a workload. This process can
 | ||
| be applied to trace other workloads.
 | ||
| 
 | ||
| Getting the system ready for tracing
 | ||
| ====================================
 | ||
| 
 | ||
| Before we can get started we will show you how to get your system ready.
 | ||
| We assume that you have a Linux distribution running on a physical system
 | ||
| or a virtual machine. Most distributions will include the strace command. Let’s
 | ||
| install other tools that aren’t usually included to build the Linux kernel.
 | ||
| Please note that the following works on Debian based distributions. You
 | ||
| might have to find equivalent packages on other Linux distributions.
 | ||
| 
 | ||
| Install the tools needed to build the Linux kernel and the tools in the kernel repository.
 | ||
| scripts/ver_linux is a good way to check if your system already has
 | ||
| the necessary tools::
 | ||
| 
 | ||
|   sudo apt-get install build-essential flex bison
 | ||
|   sudo apt install libelf-dev systemtap-sdt-dev libaudit-dev libslang2-dev libperl-dev libdw-dev
 | ||
| 
 | ||
| cscope is a good tool to browse kernel sources. Let's install it now::
 | ||
| 
 | ||
|   sudo apt-get install cscope
 | ||
| 
 | ||
| Install stress-ng and paxtest::
 | ||
| 
 | ||
|   sudo apt-get install stress-ng
 | ||
|   sudo apt-get install paxtest
 | ||
| 
 | ||
| Workload overview
 | ||
| =================
 | ||
| 
 | ||
| As mentioned earlier, we used strace to trace perf bench, stress-ng and
 | ||
| paxtest workloads to show how to analyze a workload and identify Linux
 | ||
| subsystems used by these workloads. Let's start with an overview of these
 | ||
| three workloads to get a better understanding of what they do and how to
 | ||
| use them.
 | ||
| 
 | ||
| perf bench (all) workload
 | ||
| -------------------------
 | ||
| 
 | ||
| The perf bench command contains multiple multi-threaded microkernel
 | ||
| benchmarks for executing different subsystems in the Linux kernel and
 | ||
| system calls. This allows us to easily measure the impact of changes,
 | ||
| which can help mitigate performance regressions. It also acts as a common
 | ||
| benchmarking framework, enabling developers to easily create test cases,
 | ||
| integrate transparently, and use performance-rich tooling subsystems.
 | ||
| 
 | ||
| Stress-ng netdev stressor workload
 | ||
| ----------------------------------
 | ||
| 
 | ||
| stress-ng is used for performing stress testing on the kernel. It allows
 | ||
| you to exercise various physical subsystems of the computer, as well as
 | ||
| interfaces of the OS kernel, using "stressor-s". They are available for
 | ||
| CPU, CPU cache, devices, I/O, interrupts, file system, memory, network,
 | ||
| operating system, pipelines, schedulers, and virtual machines. Please refer
 | ||
| to the `stress-ng man-page <https://www.mankier.com/1/stress-ng>`_ to
 | ||
| find the description of all the available stressor-s. The netdev stressor
 | ||
| starts a specified number (N) of workers that exercise various netdevice
 | ||
| ioctl commands across all the available network devices.
 | ||
| 
 | ||
| paxtest kiddie workload
 | ||
| -----------------------
 | ||
| 
 | ||
| paxtest is a program that tests buffer overflows in the kernel. It tests
 | ||
| kernel enforcements over memory usage. Generally, execution in some memory
 | ||
| segments makes buffer overflows possible. It runs a set of programs that
 | ||
| attempt to subvert memory usage. It is used as a regression test suite for
 | ||
| PaX, but might be useful to test other memory protection patches for the
 | ||
| kernel. We used paxtest kiddie mode which looks for simple vulnerabilities.
 | ||
| 
 | ||
| What is strace and how do we use it?
 | ||
| ====================================
 | ||
| 
 | ||
| As mentioned earlier, strace is a useful diagnostic, instructional, and
 | ||
| debugging tool that can be used to discover the system resources in use
 | ||
| by a workload. It can be used:
 | ||
| 
 | ||
|  * To see how a process interacts with the kernel.
 | ||
|  * To see why a process is failing or hanging.
 | ||
|  * For reverse engineering a process.
 | ||
|  * To find the files on which a program depends.
 | ||
|  * For analyzing the performance of an application.
 | ||
|  * For troubleshooting various problems related to the operating system.
 | ||
| 
 | ||
| In addition, strace can generate run-time statistics on times, calls, and
 | ||
| errors for each system call and report a summary when the program exits,
 | ||
| suppressing the regular output. This attempts to show system time (CPU time
 | ||
| spent running in the kernel) independent of wall clock time. We plan to use
 | ||
| these features to get information on workload system usage.
 | ||
| 
 | ||
| The strace command supports basic, verbose, and stats modes. When run in
 | ||
| verbose mode, strace gives more detailed information about the system calls
 | ||
| invoked by a process.
 | ||
| 
 | ||
| Running strace -c generates a report of the percentage of time spent in each
 | ||
| system call, the total time in seconds, the microseconds per call, the total
 | ||
| number of calls, the count of each system call that has failed with an error
 | ||
| and the type of system call made.
 | ||
| 
 | ||
|  * Usage: strace <command we want to trace>
 | ||
|  * Verbose mode usage: strace -v <command>
 | ||
|  * Gather statistics: strace -c <command>
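The summary printed by "strace -c" is a plain-text table, so it can be turned into data for later analysis. The following is a minimal sketch, not part of the original methodology: the function name ``parse_strace_summary`` and the sample numbers in ``SAMPLE`` are made up for illustration, and the parser assumes the usual column layout (% time, seconds, usecs/call, calls, errors, syscall).

```python
def parse_strace_summary(text):
    """Parse the table printed by ``strace -c`` into a list of dicts.

    Assumes the usual column order: % time, seconds, usecs/call,
    calls, errors (blank when no call failed), syscall.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            # Data rows start with two numbers; headers and rulers do not.
            float(fields[0])
            seconds = float(fields[1])
        except ValueError:
            continue
        name = fields[-1]
        if name == "total":
            continue
        # Six fields means the "errors" column is filled in.
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows.append({
            "syscall": name,
            "seconds": seconds,
            "calls": int(fields[3]),
            "errors": errors,
        })
    return rows


# Illustrative sample only; these numbers are not from a real run.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.000120          12        10           mmap
 30.00    0.000060          20         3         1 openat
 10.00    0.000020          10         2           close
------ ----------- ----------- --------- --------- ----------------
100.00    0.000200                    15         1 total
"""

if __name__ == "__main__":
    for row in parse_strace_summary(SAMPLE):
        print(row["syscall"], row["calls"], row["errors"])
```

To use it on a real run, the summary can first be written to a file with the strace ``-o`` option and then read back and passed to the function.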
 | ||
| 
 | ||
| We used the “-c” option to gather fine-grained run-time statistics for the
 | ||
| three workloads we have chosen for this analysis.
 | ||
| 
 | ||
|  * perf
 | ||
|  * stress-ng
 | ||
|  * paxtest
 | ||
| 
 | ||
| What is cscope and how do we use it?
 | ||
| ====================================
 | ||
| 
 | ||
| Now let’s look at `cscope <https://cscope.sourceforge.net/>`_, a command
 | ||
| line tool for browsing C, C++ or Java code-bases. We can use it to find
 | ||
| all the references to a symbol, global definitions, functions called by a
 | ||
| function, functions calling a function, text strings, regular expression
 | ||
| patterns, files including a file.
 | ||
| 
 | ||
| We can use cscope to find which system call belongs to which subsystem.
 | ||
| This way we can find the kernel subsystems used by a process when it is
 | ||
| executed.
 | ||
| 
 | ||
| Let’s checkout the latest Linux repository and build cscope database::
 | ||
| 
 | ||
|   git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux
 | ||
|   cd linux
 | ||
|   cscope -R -p10  # builds cscope.out database before starting browse session
 | ||
|   cscope -d -p10  # starts browse session on cscope.out database
 | ||
| 
 | ||
| Note: Run "cscope -R -p10" to build the database and "cscope -d -p10" to
 | ||
| enter into the browsing session. cscope uses the cscope.out database by default.
 | ||
| To get out of this mode press ctrl+d. The -p option is used to specify the
 | ||
| number of file path components to display. -p10 is optimal for browsing
 | ||
| kernel sources.
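As a rough, scriptable complement to an interactive cscope session, the source file that defines a system call can be located by searching for the ``SYSCALL_DEFINEn()`` macros used in the kernel sources. The sketch below is an assumption-laden illustration rather than a replacement for cscope: the helper name ``find_syscall_definitions`` is hypothetical, it only scans ``.c`` files, and it reads the top-level directory of each hit (for example fs/, mm/, kernel/, net/) as a hint about the owning subsystem.

```python
import os
import re


def find_syscall_definitions(src_root, syscall):
    """Return relative paths of .c files that define ``syscall``.

    Looks for SYSCALL_DEFINE<n>(<syscall> ...) in every .c file under
    src_root. COMPAT_SYSCALL_DEFINE<n> entries match as well, because
    the pattern is searched as a substring.
    """
    pattern = re.compile(
        r"SYSCALL_DEFINE\d\(\s*" + re.escape(syscall) + r"\b"
    )
    hits = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for fname in filenames:
            if not fname.endswith(".c"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, errors="ignore") as src:
                if pattern.search(src.read()):
                    hits.append(os.path.relpath(path, src_root))
    return sorted(hits)
```

Calling ``find_syscall_definitions("linux", "openat")`` on a checked-out tree returns the matching paths relative to the tree root. Scanning the whole tree this way is slow compared with a prebuilt cscope database, so it is best suited to a handful of lookups.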
 | ||
| 
 | ||
| What is perf and how do we use it?
 | ||
| ==================================
 | ||
| 
 | ||
| Perf is an analysis tool based on Linux 2.6+ systems, which abstracts the
 | ||
| CPU hardware difference in performance measurement in Linux, and provides
 | ||
| a simple command line interface. Perf is based on the perf_events interface
 | ||
| exported by the kernel. It is very useful for profiling the system and
 | ||
| finding performance bottlenecks in an application.
 | ||
| 
 | ||
| If you haven't already checked out the Linux mainline repository, you can do
 | ||
| so and then build kernel and perf tool::
 | ||
| 
 | ||
|   git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux
 | ||
|   cd linux
 | ||
|   make -j3 all
 | ||
|   cd tools/perf
 | ||
|   make
 | ||
| 
 | ||
| Note: The perf command can be built without building the kernel in the
 | ||
| repository and can be run on older kernels. However matching the kernel
 | ||
| and perf revisions gives more accurate information on the subsystem usage.
 | ||
| 
 | ||
| We used "perf stat" and "perf bench" options. For detailed information on
 | ||
| the perf tool, run "perf -h".
 | ||
| 
 | ||
| perf stat
 | ||
| ---------
 | ||
| The perf stat command generates a report of various hardware and software
 | ||
| events. It does so with the help of hardware counter registers found in
 | ||
| modern CPUs that keep the count of these activities. "perf stat cal" shows
 | ||
| stats for the cal command.
 | ||
| 
 | ||
| Perf bench
 | ||
| ----------
 | ||
| The perf bench command contains multiple multi-threaded microkernel
 | ||
| benchmarks for executing different subsystems in the Linux kernel and
 | ||
| system calls. This allows us to easily measure the impact of changes,
 | ||
| which can help mitigate performance regressions. It also acts as a common
 | ||
| benchmarking framework, enabling developers to easily create test cases,
 | ||
| integrate transparently, and use performance-rich tooling.
 | ||
| 
 | ||
| The "perf bench all" command runs the following benchmarks:
 | ||
| 
 | ||
|  * sched/messaging
 | ||
|  * sched/pipe
 | ||
|  * syscall/basic
 | ||
|  * mem/memcpy
 | ||
|  * mem/memset
 | ||
| 
 | ||
| What is stress-ng and how do we use it?
 | ||
| =======================================
 | ||
| 
 | ||
| As mentioned earlier, stress-ng is used for performing stress testing on
 | ||
| the kernel. It allows you to exercise various physical subsystems of the
 | ||
| computer, as well as interfaces of the OS kernel, using stressor-s. They
 | ||
| are available for CPU, CPU cache, devices, I/O, interrupts, file system,
 | ||
| memory, network, operating system, pipelines, schedulers, and virtual
 | ||
| machines.
 | ||
| 
 | ||
| The netdev stressor starts N workers that exercise various netdevice ioctl
 | ||
| commands across all the available network devices. The following ioctls are
 | ||
| exercised:
 | ||
| 
 | ||
|  * SIOCGIFCONF, SIOCGIFINDEX, SIOCGIFNAME, SIOCGIFFLAGS
 | ||
|  * SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFMETRIC, SIOCGIFMTU
 | ||
|  * SIOCGIFHWADDR, SIOCGIFMAP, SIOCGIFTXQLEN
 | ||
| 
 | ||
| The following command runs the stressor::
 | ||
| 
 | ||
|   stress-ng --netdev 1 -t 60 --metrics
 | ||
| 
 | ||
| We can use the perf record command to record the events and information
 | ||
| associated with a process. This command records the profiling data in the
 | ||
| perf.data file in the same directory.
 | ||
| 
 | ||
| Using the following commands you can record the events associated with the
 | ||
| netdev stressor, view the report generated from perf.data, and annotate it to
 | ||
| view the statistics of each instruction of the program::
 | ||
| 
 | ||
|   perf record stress-ng --netdev 1 -t 60 --metrics
 | ||
|   perf report
 | ||
|   perf annotate
 | ||
| 
 | ||
| What is paxtest and how do we use it?
 | ||
| =====================================
 | ||
| 
 | ||
| paxtest is a program that tests buffer overflows in the kernel. It tests
 | ||
| kernel enforcements over memory usage. Generally, execution in some memory
 | ||
| segments makes buffer overflows possible. It runs a set of programs that
 | ||
| attempt to subvert memory usage. It is used as a regression test suite for
 | ||
| PaX, and will be useful to test other memory protection patches for the
 | ||
| kernel.
 | ||
| 
 | ||
| paxtest provides kiddie and blackhat modes. The paxtest kiddie mode runs
 | ||
| in normal mode, whereas the blackhat mode tries to get around the protection
 | ||
| of the kernel testing for vulnerabilities. We focus on the kiddie mode here
 | ||
| and combine "paxtest kiddie" run with "perf record" to collect CPU stack
 | ||
| traces for the paxtest kiddie run to see which function is calling other
 | ||
| functions in the performance profile. Then the "dwarf" (DWARF's Call Frame
 | ||
| Information) mode can be used to unwind the stack.
 | ||
| 
 | ||
| The following commands can be used to view the resulting report in call-graph
 | ||
| format::
 | ||
| 
 | ||
|   perf record --call-graph dwarf paxtest kiddie
 | ||
|   perf report --stdio
 | ||
| 
 | ||
| Tracing workloads
 | ||
| =================
 | ||
| 
 | ||
| Now that we understand the workloads, let's start tracing them.
 | ||
| 
 | ||
| Tracing perf bench all workload
 | ||
| -------------------------------
 | ||
| 
 | ||
| Run the following command to trace perf bench all workload::
 | ||
| 
 | ||
|  strace -c perf bench all
 | ||
| 
 | ||
| **System Calls made by the workload**
 | ||
| 
 | ||
| The below table shows the system calls invoked by the workload, number of
 | ||
| times each system call is invoked, and the corresponding Linux subsystem.
 | ||
| 
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | System Call       | # calls   | Linux Subsystem | System Call (API)       |
 | ||
| +===================+===========+=================+=========================+
 | ||
| | getppid           | 10000001  | Process Mgmt.   | sys_getppid()           |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | clone             | 1077      | Process Mgmt.   | sys_clone()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | prctl             | 23        | Process Mgmt.   | sys_prctl()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | prlimit64         | 7         | Process Mgmt.   | sys_prlimit64()         |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | getpid            | 10        | Process Mgmt.   | sys_getpid()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | uname             | 3         | Process Mgmt.   | sys_uname()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | sysinfo           | 1         | Process Mgmt.   | sys_sysinfo()           |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | getuid            | 1         | Process Mgmt.   | sys_getuid()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | getgid            | 1         | Process Mgmt.   | sys_getgid()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | geteuid           | 1         | Process Mgmt.   | sys_geteuid()           |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | getegid           | 1         | Process Mgmt.   | sys_getegid()           |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | close             | 49951     | Filesystem      | sys_close()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | pipe              | 604       | Filesystem      | sys_pipe()              |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | openat            | 48560     | Filesystem      | sys_openat()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | fstat             | 8338      | Filesystem      | sys_fstat()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | stat              | 1573      | Filesystem      | sys_stat()              |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | pread64           | 9646      | Filesystem      | sys_pread64()           |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | getdents64        | 1873      | Filesystem      | sys_getdents64()        |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | access            | 3         | Filesystem      | sys_access()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | lstat             | 1880      | Filesystem      | sys_lstat()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | lseek             | 6         | Filesystem      | sys_lseek()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | ioctl             | 3         | Filesystem      | sys_ioctl()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | dup2              | 1         | Filesystem      | sys_dup2()              |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | execve            | 2         | Filesystem      | sys_execve()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | fcntl             | 8779      | Filesystem      | sys_fcntl()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | statfs            | 1         | Filesystem      | sys_statfs()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | epoll_create      | 2         | Filesystem      | sys_epoll_create()      |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | epoll_ctl         | 64        | Filesystem      | sys_epoll_ctl()         |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | newfstatat        | 8318      | Filesystem      | sys_newfstatat()        |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | eventfd2          | 192       | Filesystem      | sys_eventfd2()          |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | mmap              | 243       | Memory Mgmt.    | sys_mmap()              |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | mprotect          | 32        | Memory Mgmt.    | sys_mprotect()          |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | brk               | 21        | Memory Mgmt.    | sys_brk()               |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | munmap            | 128       | Memory Mgmt.    | sys_munmap()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | set_mempolicy     | 156       | Memory Mgmt.    | sys_set_mempolicy()     |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | set_tid_address   | 1         | Process Mgmt.   | sys_set_tid_address()   |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | set_robust_list   | 1         | Futex           | sys_set_robust_list()   |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | futex             | 341       | Futex           | sys_futex()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | sched_getaffinity | 79        | Scheduler       | sys_sched_getaffinity() |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | sched_setaffinity | 223       | Scheduler       | sys_sched_setaffinity() |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | socketpair        | 202       | Network         | sys_socketpair()        |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | rt_sigprocmask    | 21        | Signal          | sys_rt_sigprocmask()    |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | rt_sigaction      | 36        | Signal          | sys_rt_sigaction()      |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | rt_sigreturn      | 2         | Signal          | sys_rt_sigreturn()      |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | wait4             | 889       | Time            | sys_wait4()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | clock_nanosleep   | 37        | Time            | sys_clock_nanosleep()   |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | capget            | 4         | Capability      | sys_capget()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
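Once each system call is mapped to a subsystem, the call counts can be summed per subsystem to see where the workload concentrates its activity. The sketch below uses a subset of the rows from the table above. The names ``SUBSYSTEM_OF``, ``CALLS`` and ``calls_per_subsystem`` are illustrative, not part of any tool.

```python
from collections import Counter

# Subset of the syscall-to-subsystem mapping from the table above.
SUBSYSTEM_OF = {
    "mmap": "Memory Mgmt.",
    "mprotect": "Memory Mgmt.",
    "brk": "Memory Mgmt.",
    "munmap": "Memory Mgmt.",
    "set_mempolicy": "Memory Mgmt.",
    "set_robust_list": "Futex",
    "futex": "Futex",
    "sched_getaffinity": "Scheduler",
    "sched_setaffinity": "Scheduler",
    "rt_sigprocmask": "Signal",
    "rt_sigaction": "Signal",
    "rt_sigreturn": "Signal",
}

# Call counts for the same rows, copied from the table above.
CALLS = {
    "mmap": 243,
    "mprotect": 32,
    "brk": 21,
    "munmap": 128,
    "set_mempolicy": 156,
    "set_robust_list": 1,
    "futex": 341,
    "sched_getaffinity": 79,
    "sched_setaffinity": 223,
    "rt_sigprocmask": 21,
    "rt_sigaction": 36,
    "rt_sigreturn": 2,
}


def calls_per_subsystem(calls, subsystem_of):
    """Sum call counts per subsystem; unmapped syscalls go to 'Unknown'."""
    totals = Counter()
    for syscall, count in calls.items():
        totals[subsystem_of.get(syscall, "Unknown")] += count
    return totals


if __name__ == "__main__":
    for name, total in calls_per_subsystem(CALLS, SUBSYSTEM_OF).most_common():
        print(f"{name:<14} {total}")
```

Extending the two dictionaries with the remaining rows gives the full per-subsystem picture for the workload.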
 | ||
| 
 | ||
| Tracing stress-ng netdev stressor workload
 | ||
| ------------------------------------------
 | ||
| 
 | ||
| Run the following command to trace stress-ng netdev stressor workload::
 | ||
| 
 | ||
|   strace -c stress-ng --netdev 1 -t 60 --metrics
 | ||
| 
 | ||
| **System Calls made by the workload**
 | ||
| 
 | ||
| The below table shows the system calls invoked by the workload, number of
 | ||
| times each system call is invoked, and the corresponding Linux subsystem.
 | ||
| 
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | System Call       | # calls   | Linux Subsystem | System Call (API)       |
 | ||
| +===================+===========+=================+=========================+
 | ||
| | openat            | 74        | Filesystem      | sys_openat()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | close             | 75        | Filesystem      | sys_close()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | read              | 58        | Filesystem      | sys_read()              |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | fstat             | 20        | Filesystem      | sys_fstat()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | flock             | 10        | Filesystem      | sys_flock()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | write             | 7         | Filesystem      | sys_write()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | getdents64        | 8         | Filesystem      | sys_getdents64()        |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | pread64           | 8         | Filesystem      | sys_pread64()           |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | lseek             | 1         | Filesystem      | sys_lseek()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | access            | 2         | Filesystem      | sys_access()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | getcwd            | 1         | Filesystem      | sys_getcwd()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | execve            | 1         | Filesystem      | sys_execve()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | mmap              | 61        | Memory Mgmt.    | sys_mmap()              |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | munmap            | 3         | Memory Mgmt.    | sys_munmap()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | mprotect          | 20        | Memory Mgmt.    | sys_mprotect()          |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | mlock             | 2         | Memory Mgmt.    | sys_mlock()             |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | brk               | 3         | Memory Mgmt.    | sys_brk()               |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | rt_sigaction      | 21        | Signal          | sys_rt_sigaction()      |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | rt_sigprocmask    | 1         | Signal          | sys_rt_sigprocmask()    |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | sigaltstack       | 1         | Signal          | sys_sigaltstack()       |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | rt_sigreturn      | 1         | Signal          | sys_rt_sigreturn()      |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | getpid            | 8         | Process Mgmt.   | sys_getpid()            |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | prlimit64         | 5         | Process Mgmt.   | sys_prlimit64()         |
 | ||
| +-------------------+-----------+-----------------+-------------------------+
 | ||
| | arch_prctl        | 2         | Process Mgmt.   | sys_arch_prctl()        |
 | ||
+-------------------+-----------+-----------------+-------------------------+
| sysinfo           | 2         | Process Mgmt.   | sys_sysinfo()           |
+-------------------+-----------+-----------------+-------------------------+
| getuid            | 2         | Process Mgmt.   | sys_getuid()            |
+-------------------+-----------+-----------------+-------------------------+
| uname             | 1         | Process Mgmt.   | sys_uname()             |
+-------------------+-----------+-----------------+-------------------------+
| setpgid           | 1         | Process Mgmt.   | sys_setpgid()           |
+-------------------+-----------+-----------------+-------------------------+
| getrusage         | 1         | Process Mgmt.   | sys_getrusage()         |
+-------------------+-----------+-----------------+-------------------------+
| geteuid           | 1         | Process Mgmt.   | sys_geteuid()           |
+-------------------+-----------+-----------------+-------------------------+
| getppid           | 1         | Process Mgmt.   | sys_getppid()           |
+-------------------+-----------+-----------------+-------------------------+
| sendto            | 3         | Network         | sys_sendto()            |
+-------------------+-----------+-----------------+-------------------------+
| connect           | 1         | Network         | sys_connect()           |
+-------------------+-----------+-----------------+-------------------------+
| socket            | 1         | Network         | sys_socket()            |
+-------------------+-----------+-----------------+-------------------------+
| clone             | 1         | Process Mgmt.   | sys_clone()             |
+-------------------+-----------+-----------------+-------------------------+
| set_tid_address   | 1         | Process Mgmt.   | sys_set_tid_address()   |
+-------------------+-----------+-----------------+-------------------------+
| wait4             | 2         | Process Mgmt.   | sys_wait4()             |
+-------------------+-----------+-----------------+-------------------------+
| alarm             | 1         | Time            | sys_alarm()             |
+-------------------+-----------+-----------------+-------------------------+
| set_robust_list   | 1         | Futex           | sys_set_robust_list()   |
+-------------------+-----------+-----------------+-------------------------+

Tracing paxtest kiddie workload
-------------------------------

Run the following command to trace the paxtest kiddie workload::

 strace -c paxtest kiddie

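The summary printed by ``strace -c`` can be turned into per-system-call
counts with a short script. The following is a minimal sketch, not part of
the original workflow. It assumes the summary was saved to a file (for
example with strace's ``-o`` option) and that it uses the usual column
layout of ``% time``, ``seconds``, ``usecs/call``, ``calls``, ``errors``
and ``syscall``. The function name ``parse_strace_summary`` and the sample
text are hypothetical: the call counts reuse three rows from the paxtest
kiddie table, while the timing and error columns are placeholders.

```python
# Illustrative sample only: call counts reuse rows from the paxtest kiddie
# table; the timing and error columns are placeholders, not measurements.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.00    0.000100           9        11           write
 30.00    0.000060           4        14         2 openat
 20.00    0.000040          13         3           read
------ ----------- ----------- --------- --------- ----------------
100.00    0.000200           7        28         2 total
"""


def parse_strace_summary(text):
    """Return {syscall: calls} parsed from ``strace -c`` summary text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the "errors" column is filled.
        # The closing "total" row is skipped so it is not taken for a syscall.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            float(fields[0])        # "% time" column must be numeric
            calls = int(fields[3])  # "calls" column
        except ValueError:
            continue                # header or separator line
        counts[fields[-1]] = calls
    return counts


if __name__ == "__main__":
    for name, calls in sorted(parse_strace_summary(SAMPLE).items()):
        print(name, calls)
```

To use it on a real trace, read the saved summary file and pass its
contents to ``parse_strace_summary()`` in place of ``SAMPLE``.
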
**System Calls made by the workload**

The table below lists the system calls invoked by the workload, the number
of times each one is invoked, and the corresponding Linux subsystem.

+-------------------+-----------+-----------------+----------------------+
| System Call       | # calls   | Linux Subsystem | System Call (API)    |
+===================+===========+=================+======================+
| read              | 3         | Filesystem      | sys_read()           |
+-------------------+-----------+-----------------+----------------------+
| write             | 11        | Filesystem      | sys_write()          |
+-------------------+-----------+-----------------+----------------------+
| close             | 41        | Filesystem      | sys_close()          |
+-------------------+-----------+-----------------+----------------------+
| stat              | 24        | Filesystem      | sys_stat()           |
+-------------------+-----------+-----------------+----------------------+
| fstat             | 2         | Filesystem      | sys_fstat()          |
+-------------------+-----------+-----------------+----------------------+
| pread64           | 6         | Filesystem      | sys_pread64()        |
+-------------------+-----------+-----------------+----------------------+
| access            | 1         | Filesystem      | sys_access()         |
+-------------------+-----------+-----------------+----------------------+
| pipe              | 1         | Filesystem      | sys_pipe()           |
+-------------------+-----------+-----------------+----------------------+
| dup2              | 24        | Filesystem      | sys_dup2()           |
+-------------------+-----------+-----------------+----------------------+
| execve            | 1         | Filesystem      | sys_execve()         |
+-------------------+-----------+-----------------+----------------------+
| fcntl             | 26        | Filesystem      | sys_fcntl()          |
+-------------------+-----------+-----------------+----------------------+
| openat            | 14        | Filesystem      | sys_openat()         |
+-------------------+-----------+-----------------+----------------------+
| rt_sigaction      | 7         | Signal          | sys_rt_sigaction()   |
+-------------------+-----------+-----------------+----------------------+
| rt_sigreturn      | 38        | Signal          | sys_rt_sigreturn()   |
+-------------------+-----------+-----------------+----------------------+
| clone             | 38        | Process Mgmt.   | sys_clone()          |
+-------------------+-----------+-----------------+----------------------+
| wait4             | 44        | Process Mgmt.   | sys_wait4()          |
+-------------------+-----------+-----------------+----------------------+
| mmap              | 7         | Memory Mgmt.    | sys_mmap()           |
+-------------------+-----------+-----------------+----------------------+
| mprotect          | 3         | Memory Mgmt.    | sys_mprotect()       |
+-------------------+-----------+-----------------+----------------------+
| munmap            | 1         | Memory Mgmt.    | sys_munmap()         |
+-------------------+-----------+-----------------+----------------------+
| brk               | 3         | Memory Mgmt.    | sys_brk()            |
+-------------------+-----------+-----------------+----------------------+
| getpid            | 1         | Process Mgmt.   | sys_getpid()         |
+-------------------+-----------+-----------------+----------------------+
| getuid            | 1         | Process Mgmt.   | sys_getuid()         |
+-------------------+-----------+-----------------+----------------------+
| getgid            | 1         | Process Mgmt.   | sys_getgid()         |
+-------------------+-----------+-----------------+----------------------+
| geteuid           | 2         | Process Mgmt.   | sys_geteuid()        |
+-------------------+-----------+-----------------+----------------------+
| getegid           | 1         | Process Mgmt.   | sys_getegid()        |
+-------------------+-----------+-----------------+----------------------+
| getppid           | 1         | Process Mgmt.   | sys_getppid()        |
+-------------------+-----------+-----------------+----------------------+
| arch_prctl        | 2         | Process Mgmt.   | sys_arch_prctl()     |
+-------------------+-----------+-----------------+----------------------+
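
Per-system-call counts such as those in the table above can be rolled up
into a per-subsystem view, which makes it easier to see where a workload
leans on the kernel. The sketch below is illustrative only. The names
``SUBSYSTEM``, ``SAMPLE_COUNTS`` and ``calls_per_subsystem`` are
hypothetical, the mapping covers only a handful of rows copied from the
table, and any system call missing from the mapping is reported under an
assumed ``Unmapped`` label rather than guessed at.

```python
from collections import Counter

# Hypothetical, partial mapping: a few rows copied from the table above.
SUBSYSTEM = {
    "read": "Filesystem",
    "write": "Filesystem",
    "close": "Filesystem",
    "mmap": "Memory Mgmt.",
    "mprotect": "Memory Mgmt.",
    "rt_sigaction": "Signal",
    "clone": "Process Mgmt.",
    "getpid": "Process Mgmt.",
}

# Call counts for the same rows, as listed in the paxtest kiddie table.
SAMPLE_COUNTS = {
    "read": 3,
    "write": 11,
    "close": 41,
    "mmap": 7,
    "mprotect": 3,
    "rt_sigaction": 7,
    "clone": 38,
    "getpid": 1,
}


def calls_per_subsystem(counts, mapping=SUBSYSTEM):
    """Sum call counts per subsystem; unknown syscalls go to 'Unmapped'."""
    totals = Counter()
    for syscall, calls in counts.items():
        totals[mapping.get(syscall, "Unmapped")] += calls
    return dict(totals)


if __name__ == "__main__":
    for subsystem, calls in sorted(calls_per_subsystem(SAMPLE_COUNTS).items()):
        print(subsystem, calls)
```

Keeping unmapped system calls visible, instead of dropping them, shows at a
glance which entries still need to be classified.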

Conclusion
==========

This document is a guide to gathering fine-grained information about the
system resources a workload uses by tracing it with strace.

References
==========

 * `Discovery Linux Kernel Subsystems used by OpenAPS <https://elisa.tech/blog/2022/02/02/discovery-linux-kernel-subsystems-used-by-openaps>`_
 * `ELISA-White-Papers-Discovering Linux kernel subsystems used by a workload <https://github.com/elisa-tech/ELISA-White-Papers/blob/master/Processes/Discovering_Linux_kernel_subsystems_used_by_a_workload.md>`_
 * `strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_
 * `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_
 * `paxtest README <https://github.com/opntr/paxtest-freebsd/blob/hardenedbsd/0.9.14-hbsd/README>`_
 * `stress-ng <https://www.mankier.com/1/stress-ng>`_
 * `Monitoring and managing system status and performance <https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/index>`_