1. Overview

The Intel VTune Performance Analyzer helps identify bottlenecks and sources of suboptimal CPU resource utilization in a given application. It also provides a convenient way to create a call graph of the application and to collect numerous hardware performance events, which give both programmers and system integrators detailed insight into the actual code execution.

2. Using VTune

2.1. GUI mode (Eclipse)

To start the VTune GUI, type the following in a terminal:

# /opt/intel/vtune/bin/vtlec &

NB: DO NOT RUN SAMPLING WITH MANY (MORE THAN 50-60) EVENTS AT A TIME. YOU CAN EASILY EXCEED THE MEMORY AVAILABLE ON THE MACHINE, AND SYSTEM CRASHES ARE POSSIBLE!

2.2. Command line mode

Before using VTune from the command line or in a script, extend the PATH environment variable and check that vtl can be found:

# export PATH=$PATH:/opt/intel/vtune/bin
# which vtl
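
In a script that may be sourced more than once, it can help to append the directory only when it is missing, so that PATH does not grow on every run. The following is a minimal sketch, not part of VTune itself: the helper name add_to_path and the variable VTUNE_BIN are illustrative, and the install prefix is the one assumed throughout this page.

```shell
# Hypothetical helper: append a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present: nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

VTUNE_BIN=/opt/intel/vtune/bin              # install prefix assumed on this page
add_to_path "$VTUNE_BIN"
add_to_path "$VTUNE_BIN"                    # a second call is a no-op

# Report whether the vtl command can now be resolved.
if command -v vtl >/dev/null 2>&1; then
    echo "vtl found: $(command -v vtl)"
else
    echo "vtl not found; check that $VTUNE_BIN exists" >&2
fi
```

The check at the end plays the same role as the "which vtl" command above, but does not abort the script when VTune is not installed.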

A simple sampling activity can be created and run, and its data then displayed, as follows:

# vtl activity -d 2 -c sampling -app ls run
# vtl show -all
# vtl view a1::r1

The output of the last two commands should look something like the following:

a1__Activity1
    r1___Sat Jun 26 10:11:57 2010 - Sampling Results [127.0.0.1]
    r2_______Run 1
    r3_________CPU_CLK_UNHALTED.THREAD
    r4_________INST_RETIRED.ANY
a2__Activity2
    r1___Sat Jun 26 10:13:34 2010 - Sampling Results [127.0.0.1]
    r2_______Run 1
    r3_________CPU_CLK_UNHALTED.THREAD
    r4_________INST_RETIRED.ANY
a3__Activity3
    r1___Sat Jun 26 10:15:55 2010 - Sampling Results [127.0.0.1]
    r2_______Run 1
    r3_________CPU_CLK_UNHALTED.THREAD
    r4_________INST_RETIRED.ANY

Module                     Process                      CPU_CLK_UNHALTED.THREAD samples INST_RETIRED.ANY samples Clocks per Instructions Retired - CPI Process Path                     Process ID Original Module Path
vmlinux-2.6.18-194.3.1.el5 pid_0x0                      893                             18                       49.611                                                                 0x0        /boot/
vmlinux-2.6.18-194.3.1.el5 pid_0x349                    16                              3                        5.333                                                                  0x349      /boot/
libpthread-2.5.so          vtl.bin                      8                               0                        0.000                                 /opt/intel/vtune/shared/bin/     0x2f47     /lib/
vmlinux-2.6.18-194.3.1.el5 vtl.bin                      7                               1                        7.000                                 /opt/intel/vtune/shared/bin/     0x2f47     /boot/
vmlinux-2.6.18-194.3.1.el5 ntd                          5                               0                        0.000                                 /opt/sag/exx/v721/bin/           0x1241     /boot/
ntd                        ntd                          3                               1                        3.000                                 /opt/sag/exx/v721/bin/           0x1241     /opt/sag/exx/v721/bin/
libmutant.so               vtl.bin                      2                               1                        2.000                                 /opt/intel/vtune/shared/bin/     0x2f47     /opt/sag/exx/v721/lib/
vmlinux-2.6.18-194.3.1.el5 irqbalance                   1                               1                        1.000                                 /usr/sbin/                       0x117f     /boot/
cpufreq_ondemand           pid_0x349                    1                               0                        0.000                                                                  0x349
libc-2.5.so                vtserver.bin                 1                               0                        0.000                                 /opt/intel/vtune/rdc/shared/bin/ 0x2fc0     /lib64/
Other32                    dsm_sa_datamgr32d.5.9.1.6284 1                               0                        0.000                                 /opt/dell/srvadmin/dataeng/bin/  0x196a
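
In the sample table above, the CPI column appears to be the ratio of the two sample counts (for example 893 / 18 = 49.611), with 0.000 printed when no INST_RETIRED.ANY samples were taken. This reading assumes that both events were collected with the same sample-after value. The sketch below recomputes the column for a few rows copied from the table; the function name cpi_from_samples and the three-column input layout are illustrative and are not part of the vtl output format.

```shell
# Hypothetical helper: recompute CPI from per-module sample counts.
# Input columns: module, CPU_CLK_UNHALTED.THREAD samples, INST_RETIRED.ANY samples
cpi_from_samples() {
    LC_ALL=C awk '{
        # Mirror the table: report 0 when there are no instruction samples.
        cpi = ($3 > 0) ? $2 / $3 : 0
        printf "%-28s %8.3f\n", $1, cpi
    }'
}

# Rows copied from the table above (module name and the two sample counts).
cpi_from_samples <<'EOF'
vmlinux-2.6.18-194.3.1.el5 893 18
vmlinux-2.6.18-194.3.1.el5 16 3
libpthread-2.5.so 8 0
ntd 3 1
EOF
```

With only a handful of samples per module, as in most rows here, the ratio is statistically weak; a longer sampling duration gives more meaningful CPI values.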

3. Exporting and analyzing data

4. Results

The following sections present the results for various High Energy Physics (HEP) applications. Central to these results is the floating-point percentage of each application compared with that of the HEP-SPEC benchmark. Results close to those of HEP-SPEC support the applicability of the HEP-SPEC all_cpp benchmarks for simulating HEP-application-like workloads, and thus for providing a consistent basis for testing the computational power of resources.

4.1. HEP-SPEC 2006

RunVTune (last edited 2010-06-26 10:23:50 by KonstantinBoyanov)