Overview
The 'perf' command line tool is the user interface to the new performance counter subsystem in recent Linux kernels (version > 2.6.30). This subsystem provides rich abstractions over the Performance Monitoring Unit (PMU) hardware capabilities of modern CPUs. It provides per-task, per-CPU and per-workload counters and counter groups, with sampling capabilities on top of those. It also provides abstractions for 'software events' such as minor/major page faults, task migrations, task context switches and tracepoints. The 'perf' tool can be used to optimize, validate and measure applications, workloads or the full system.
NOTE: 'perf' is at this time available only on the westmere server. Installation and deployment on other platforms/machines are still to be discussed.
perf commands
'perf' is invoked with one of several commands, each with its own options, to monitor performance-relevant information in detail. The full list of commands supported by 'perf' can be seen by issuing the following:
[root@westmere 2006-1.1]# perf help
usage: perf [--version] [--help] COMMAND [ARGS]

The most commonly used perf commands are:
  annotate        Read perf.data (created by perf record) and display annotated code
  archive         Create archive with object files with build-ids found in perf.data file
  bench           General framework for benchmark suites
  buildid-cache   Manage build-id cache.
  buildid-list    List the buildids in a perf.data file
  diff            Read two perf.data files and display the differential profile
  kmem            Tool to trace/measure kernel memory(slab) properties
  list            List all symbolic event types
  record          Run a command and record its profile into perf.data
  report          Read perf.data (created by perf record) and display the profile
  sched           Tool to trace/measure scheduler properties (latencies)
  stat            Run a command and gather performance counter statistics
  timechart       Tool to visualize total system behavior during a workload
  top             System profiling tool.
  trace           Read perf.data (created by perf record) and display trace output

See 'perf help COMMAND' for more information on a specific command.
perf list
This command prints a rather lengthy list of all supported events that can be observed by the 'perf' utility. A short snippet of the list is shown below:
[user@westmere]# perf list

List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                     [Hardware event]
  instructions                             [Hardware event]
  cache-references                         [Hardware event]
  cache-misses                             [Hardware event]
  branch-instructions OR branches          [Hardware event]
  branch-misses                            [Hardware event]
  bus-cycles                               [Hardware event]

  cpu-clock                                [Software event]
  task-clock                               [Software event]
  page-faults OR faults                    [Software event]
  minor-faults                             [Software event]
  major-faults                             [Software event]
  context-switches OR cs                   [Software event]
  cpu-migrations OR migrations             [Software event]
  alignment-faults                         [Software event]
  emulation-faults                         [Software event]

  rNNN                                     [Raw hardware event descriptor]

  L1-dcache-loads                          [Hardware cache event]
  L1-dcache-load-misses                    [Hardware cache event]
  L1-dcache-stores                         [Hardware cache event]
  ...
The 'rNNN' option can be used to prescribe the monitoring of a performance counter/event by its hardware code. This is useful because not all Performance Monitoring Unit (PMU) features are covered by the software and hardware events defined by the 'perf' developers. For example, if one wants to read out the number of SIMD 64-bit integer operations, no event for that is defined in the 'perf' command line interface. In such cases one can obtain the hexadecimal code for that particular performance event and use it with the 'rNNN' event descriptor, where 'NNN' is replaced by the particular hardware code. An example of this follows in the next section.
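To make the 'rNNN' notation concrete, the following minimal Python sketch builds the descriptor string from an event-select code and a unit mask. It assumes the usual Intel x86 layout, in which the raw code is the unit mask shifted left by 8 bits combined with the event-select byte; the function name `raw_event` is hypothetical and not part of 'perf'. The two codes used are the Intel architectural events for retired instructions (0xC0, umask 0x00) and last-level cache misses (0x2E, umask 0x41); codes for other events must be looked up for the particular CPU.

```python
def raw_event(event_select: int, umask: int) -> str:
    """Build an 'rNNN' descriptor, assuming the Intel x86 layout
    (bits 0-7: event select, bits 8-15: unit mask)."""
    if not 0 <= event_select <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event select and umask must each fit in one byte")
    code = (umask << 8) | event_select
    return f"r{code:x}"


# Intel architectural events, used here only as illustrations.
print(raw_event(0xC0, 0x00))  # instructions retired -> rc0
print(raw_event(0x2E, 0x41))  # last-level cache misses -> r412e
```

The resulting string is passed to the '-e' option in the same way as a symbolic event name.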
perf stat
This command is the most common and easiest way to gather performance statistics of an application. Once one knows from 'perf list' which performance counters can be observed, the events listed there can be used with 'perf stat'. Here is an example of how to get the number of instructions a command executed and how many clock cycles that took:
[root@westmere 2006-1.1]# perf stat -e instructions -e cycles ls /usr
bin  etc  games  include  lib  lib64  libexec  local  sbin  share  src  sue  tmp  vice

 Performance counter stats for 'ls /usr':

         992698  instructions             #      0.697 IPC
        1425153  cycles

    0.001307494  seconds time elapsed
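The IPC value in the output above is the ratio of the two counters. A short Python sketch, with the hypothetical helper name `instructions_per_cycle`, reproduces it from the numbers printed for 'ls /usr':

```python
def instructions_per_cycle(instructions: int, cycles: int) -> float:
    """Return instructions per cycle (IPC) for one measurement."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return instructions / cycles


# Counter values taken from the 'perf stat' run of 'ls /usr' above.
ipc = instructions_per_cycle(992698, 1425153)
print(f"{ipc:.3f} IPC")  # prints "0.697 IPC"
```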
Besides the '-e' option, the 'perf stat' command accepts the following additional parameters:
usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events
    -i, --inherit         child tasks inherit counters
    -p, --pid <n>         stat events on existing pid
    -a, --all-cpus        system-wide collection from all CPUs
    -c, --scale           scale/normalize counters
    -v, --verbose         be more verbose (show counter open errors, etc)
    -r, --repeat <n>      repeat command and print average + stddev (max: 100)
    -n, --null            null run - dont start any counters
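When measurements are scripted, the options above can be assembled programmatically. The following Python sketch only builds the argument list and prints it; it uses just the '-e', '-r' and '-a' options from the usage text, and the helper name `perf_stat_argv` is hypothetical.

```python
import shlex


def perf_stat_argv(command, events=(), repeat=None, all_cpus=False):
    """Assemble a 'perf stat' command line from the options listed above."""
    argv = ["perf", "stat"]
    for event in events:
        argv += ["-e", event]          # one -e per event, as in the example run
    if repeat is not None:
        if not 1 <= repeat <= 100:     # the usage text gives a maximum of 100
            raise ValueError("repeat must be between 1 and 100")
        argv += ["-r", str(repeat)]
    if all_cpus:
        argv.append("-a")
    return argv + list(command)


argv = perf_stat_argv(["ls", "/usr"], events=["instructions", "cycles"], repeat=5)
print(shlex.join(argv))
```

On a machine where 'perf' is installed, the list could be handed to `subprocess.run(argv)` to perform the measurement.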
To obtain the codes of hardware events that are not defined in the event list of 'perf list', one must download and install the libpfm4.so library from http://perfmon2.sourceforge.net/. The build directory contains an 'examples' directory, in which a tool called 'showeventinfo' is located. It gives a detailed description of all hardware events supported on the particular machine, together with their codes. The process of obtaining these codes is also described at http://www.ibm.com/developerworks/wikis/display/LinuxP/Using+perf+on+POWER7+systems. A list obtained from the westmere machine can be viewed here - [[]].
perf record
perf report
Running applications under perf surveillance
HEPSPEC