#pragma section-numbers on
----
<>
----

= Overview =

The Intel VTune Performance Analyzer helps identify bottlenecks and sources of suboptimal CPU resource utilization in a given application. It also provides a convenient way to create a call graph of the application and to gather numerous hardware performance events, which give both programmers and system integrators detailed insight into the actual code execution.

= Using VTune =

== GUI mode (Eclipse) ==

To start the VTune GUI, type in a terminal:

{{{#!c
# /opt/intel/vtune/bin/vtlec &
}}}

NB: DO NOT RUN SAMPLING WITH MANY (MORE THAN 50-60) EVENTS AT A TIME. YOU CAN EASILY EXCEED THE MEMORY AVAILABLE ON THE MACHINE, AND SYSTEM CRASHES ARE POSSIBLE!

== Command line mode ==

Before using VTune from the command line or in a script, extend the $PATH environment variable:

{{{
# export PATH=$PATH:/opt/intel/vtune/bin
# which vtl
}}}

A simple sampling activity can be run, and its data then displayed, as follows:

{{{
# vtl activity -d 2 -c sampling -app ls run
# vtl show -all
# vtl view a1::r1
}}}

The output of the last two commands should look something like the following:

{{{
a1__Activity1
r1___Sat Jun 26 10:11:57 2010 - Sampling Results [127.0.0.1]
r2_______Run 1
r3_________CPU_CLK_UNHALTED.THREAD
r4_________INST_RETIRED.ANY
a2__Activity2
r1___Sat Jun 26 10:13:34 2010 - Sampling Results [127.0.0.1]
r2_______Run 1
r3_________CPU_CLK_UNHALTED.THREAD
r4_________INST_RETIRED.ANY
a3__Activity3
r1___Sat Jun 26 10:15:55 2010 - Sampling Results [127.0.0.1]
r2_______Run 1
r3_________CPU_CLK_UNHALTED.THREAD
r4_________INST_RETIRED.ANY

Module                      Process                       CPU_CLK_UNHALTED.THREAD samples  INST_RETIRED.ANY samples  Clocks per Instructions Retired - CPI  Process Path                      Process ID  Original Module Path
vmlinux-2.6.18-194.3.1.el5  pid_0x0                       893                              18                        49.611                                                                   0x0         /boot/
vmlinux-2.6.18-194.3.1.el5  pid_0x349                     16                               3                         5.333                                                                    0x349       /boot/
libpthread-2.5.so           vtl.bin                       8                                0                         0.000                                  /opt/intel/vtune/shared/bin/      0x2f47      /lib/
vmlinux-2.6.18-194.3.1.el5  vtl.bin                       7                                1                         7.000                                  /opt/intel/vtune/shared/bin/      0x2f47      /boot/
vmlinux-2.6.18-194.3.1.el5  ntd                           5                                0                         0.000                                  /opt/sag/exx/v721/bin/            0x1241      /boot/
ntd                         ntd                           3                                1                         3.000                                  /opt/sag/exx/v721/bin/            0x1241      /opt/sag/exx/v721/bin/
libmutant.so                vtl.bin                       2                                1                         2.000                                  /opt/intel/vtune/shared/bin/      0x2f47      /opt/sag/exx/v721/lib/
vmlinux-2.6.18-194.3.1.el5  irqbalance                    1                                1                         1.000                                  /usr/sbin/                        0x117f      /boot/
cpufreq_ondemand            pid_0x349                     1                                0                         0.000                                                                    0x349
libc-2.5.so                 vtserver.bin                  1                                0                         0.000                                  /opt/intel/vtune/rdc/shared/bin/  0x2fc0      /lib64/
Other32                     dsm_sa_datamgr32d.5.9.1.6284  1                                0                         0.000                                  /opt/dell/srvadmin/dataeng/bin/   0x196a
}}}

= Exporting and analyzing data =

= Results =

The following sections present the results for various High Energy Physics (HEP) applications. Central to these results is the floating-point percentage of each application compared with that of the HEP-SPEC benchmark. Results close to the HEP-SPEC result indicate that the HEP-SPEC all_cpp benchmarks are suitable for simulating HEP-application-like workloads, and thus provide a consistent basis for testing the power of computational resources.

== HEP-SPEC 2006 ==