Overview

The 'perf' command line tool is the user interface to the new performance counter subsystem in recent Linux kernels (version > 2.6.30). This subsystem provides rich abstractions over the PMU hardware capabilities of modern CPUs. It provides per-task, per-CPU and per-workload counters and counter groups, with sampling capabilities on top of those. It also provides abstractions for 'software events' such as minor/major page faults, task migrations, task context switches and tracepoints. The 'perf' tool can be used to optimize, validate and measure applications, workloads or the full system.

NOTE: 'perf' is currently available only on the westmere server. Installation and deployment on other platforms/machines are still to be discussed.

perf commands

A 'perf' invocation consists of a command plus options specific to that command, which together allow fine-grained monitoring of performance-relevant information. The full list of commands supported by 'perf' can be seen by issuing the following:

[root@westmere 2006-1.1]# perf help

 usage: perf [--version] [--help] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   diff            Read two perf.data files and display the differential profile
   kmem            Tool to trace/measure kernel memory(slab) properties
   list            List all symbolic event types
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   stat            Run a command and gather performance counter statistics
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   trace           Read perf.data (created by perf record) and display trace output

 See 'perf help COMMAND' for more information on a specific command.

perf list

This command gives a rather lengthy list of all supported events that can be observed by the 'perf' utility. A short snippet of the list is shown below:

[user@westmere]# perf list

List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles                       [Hardware event]
  instructions                               [Hardware event]
  cache-references                           [Hardware event]
  cache-misses                               [Hardware event]
  branch-instructions OR branches            [Hardware event]
  branch-misses                              [Hardware event]
  bus-cycles                                 [Hardware event]

  cpu-clock                                  [Software event]
  task-clock                                 [Software event]
  page-faults OR faults                      [Software event]
  minor-faults                               [Software event]
  major-faults                               [Software event]
  context-switches OR cs                     [Software event]
  cpu-migrations OR migrations               [Software event]
  alignment-faults                           [Software event]
  emulation-faults                           [Software event]

  rNNN                                       [Raw hardware event descriptor]

  L1-dcache-loads                            [Hardware cache event]
  L1-dcache-load-misses                      [Hardware cache event]
  L1-dcache-stores                           [Hardware cache event]
  ...

The 'rNNN' descriptor can be used to select a performance counter/event by its hardware code. This is useful because not all Performance Monitoring Unit (PMU) features are covered by the software and hardware events predefined by the 'perf' developers. For example, there is no predefined event in the 'perf' command line interface for the number of SIMD 64-bit integer operations. In such cases one can obtain the hexadecimal code of that particular performance event and use it as an 'rNNN' event descriptor, where 'NNN' is replaced by the hardware code. How to obtain these codes is described in the next section.

perf stat

This command is the most common and easiest way to gather performance statistics for an application. Once 'perf list' has shown which performance counters can be observed, the events listed there can be passed to 'perf stat'. Here is an example of how to get the number of instructions a command executed and how many clock cycles that took:

[root@westmere 2006-1.1]# perf stat -e instructions -e cycles ls /usr
bin  etc  games  include  lib  lib64  libexec  local  sbin  share  src  sue  tmp  vice

 Performance counter stats for 'ls /usr':

         992698  instructions             #      0.697 IPC  
        1425153  cycles                  

    0.001307494  seconds time elapsed
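
The IPC figure that 'perf stat' prints next to the instruction count is simply the number of instructions divided by the number of cycles. As a minimal sketch, the value above can be reproduced by hand. The helper name 'ipc' is hypothetical and not part of 'perf'; the two counts are copied from the output above.

```shell
# ipc INSTRUCTIONS CYCLES
# Hypothetical helper (not part of perf): prints instructions per cycle,
# rounded to three decimals.
ipc() {
    awk -v i="$1" -v c="$2" 'BEGIN { printf "%.3f\n", i / c }'
}

# Counts taken from the 'ls /usr' run shown above.
ipc 992698 1425153    # prints 0.697
```

A value below 1 means that, on average, the command completed less than one instruction per clock cycle.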

Besides the '-e' option, 'perf stat' also accepts the following parameters:

 usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events
    -i, --inherit         child tasks inherit counters
    -p, --pid <n>         stat events on existing pid
    -a, --all-cpus        system-wide collection from all CPUs
    -c, --scale           scale/normalize counters
    -v, --verbose         be more verbose (show counter open errors, etc)
    -r, --repeat <n>      repeat command and print average + stddev (max: 100)
    -n, --null            null run - dont start any counters
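
Several of these options can be combined in one invocation. The sketch below wraps 'perf stat' so that a command is repeated with '-r' while instructions and cycles are counted. The function name 'measure' and the DRY_RUN switch are assumptions made for illustration; only the 'perf stat' options themselves come from the usage text above.

```shell
# measure REPEATS COMMAND...
# Hypothetical wrapper: runs COMMAND under 'perf stat', repeating it
# REPEATS times (-r) and counting instructions and cycles (-e).
# Set DRY_RUN=1 to print the perf command line instead of running it.
measure() {
    repeats=$1
    shift
    # Rebuild the positional parameters as the full perf command line.
    set -- perf stat -r "$repeats" -e instructions -e cycles "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the command that would be executed for five runs of 'ls /usr'.
DRY_RUN=1 measure 5 ls /usr
```

Without DRY_RUN the wrapper executes the command, and 'perf stat' then reports the average and standard deviation over the repeated runs, as described for '-r' above.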

To obtain the codes of hardware events that are not in the 'perf list' event list, one must download and install the libpfm4 library (libpfm4.so) from http://perfmon2.sourceforge.net/. The compilation folder contains an 'examples' directory with a tool called 'showeventinfo'. It gives a detailed description of all hardware events supported on the particular machine, together with their codes. The process of obtaining these codes is also described at http://www.ibm.com/developerworks/wikis/display/LinuxP/Using+perf+on+POWER7+systems. A list obtained from the westmere machine can be viewed in the attachment eventinfo_westmere.txt.
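
Once an event code and its unit mask are known, they have to be combined into a single hexadecimal number for the 'rNNN' descriptor. The sketch below assumes the usual x86 layout, in which the raw value is the unit mask shifted left by 8 bits, OR-ed with the event code. The function name 'raw_event' is hypothetical, and the codes 0x3c/0x00 are illustrative values only; the real codes for a given machine should be taken from the 'showeventinfo' output.

```shell
# raw_event EVENT_CODE UNIT_MASK
# Hypothetical helper: prints an 'rNNN' descriptor for 'perf stat -e'.
# Assumption: x86 encoding, raw value = (unit mask << 8) | event code.
raw_event() {
    event_code=$1   # event-select code, e.g. 0x3c
    unit_mask=$2    # unit mask, e.g. 0x00
    printf 'r%x\n' $(( ($unit_mask << 8) | $event_code ))
}

# Illustrative values only; look up the real codes for your CPU.
raw_event 0x3c 0x00    # prints r3c

# The descriptor can then be passed to perf, for example:
#   perf stat -e "$(raw_event 0x3c 0x00)" ls /usr
```

Other architectures encode raw events differently, so the shift-and-OR step should be checked against the documentation of the particular PMU.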

perf record

perf report

Running applications under perf surveillance

HEPSPEC

Results

perf (last edited 2011-11-17 14:58:30 by KonstantinBoyanov)