TOC:

  1. Intro
  2. Userspace profiling: gprof, gcov
  3. Userspace profiling: Valgrind
  4. Kernel profiling: Intro
  5. Kernel profiling: ftrace
  6. Kernel profiling: perf
  7. Kernel profiling: SystemTap
  8. Various tools

gprof and gcov is a classical profilers that are still in wide use. Since the dawn of time they were used by hackers though now everybody shifts their focus to dynamic profilers like Valgrind.

gprof

gprof (GNU Profiler) — simple and easy profiler that can show how much time your program spends in routines in percents and seconds. gprof uses source code instrumentation by inserting special mcount function call to gather metrics of your program.

Building with gprof instrumentation

To gather profile you need to compile your program with -pg gcc option and then launch under gprof. For better results and statistical errors elimination it’s recommended to launch profiling subject several times.

To build with gprof instrumentation invoke gcc like this:

[root@simplex block_hasher]# gcc -pg -g -lcrypto -pthread -lrt -Wall -Wextra block_hasher.c -o block_hasher

As a result you’ll get instrumented program. To check if it’s really instrumented just grep mcount symbol.

 [root@simplex block_hasher]# nm block_hasher | grep mcount
                  U mcount@@GLIBC_2.2.5

Profiling block_hasher under gprof

As I said earlier to collect useful statistic we should run program several times and accumulate metrics. To do that I’ve written simple bash script:

#!/bin/bash

if [ $# -ne 1 ]; then
    echo "gprof.sh <number of runs>"
    exit 1
fi

for i in $(seq 1 $1); do

    # Run profiled program
    ./block_hasher -d /dev/md126 -b 1048576 -t 10 -n 1000

    # Accumulate gprof statistic
    if [ -e gmon.sum ]; then
        gprof -s block_hasher gmon.out gmon.sum
    else
        mv gmon.out gmon.sum
    fi
done

# Make final profile
gprof block_hasher gmon.sum > gmon.profile

So, each launch will create gmon.out that gprof will combine in gmon.sum. Finally, gmon.sum will be feed to gprof to get flat text profile and call graph.

Analyzing

Flat profile has info about program routines and time spent in it.

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.24      0.01     0.01                             thread_func
  0.00      0.01     0.00       50     0.00     0.00  time_diff
  0.00      0.01     0.00        5     0.00     0.00  bdev_close
  0.00      0.01     0.00        5     0.00     0.00  bdev_open

gprof metrics are clear from the name. As we can see almost all of it’s time our little program spent in thread function, BUT look at the actual seconds — only 0.01 seconds from whole program execution. It means that it’s not the thread function who is slowing down but something underlying. It’s really obvious that pread itself is unlikely to slack — the only reasonable candidate is I/O work that make thread functions sleep in waiting for I/O completion.

Call graph is really not interesting here, so I won’t show you it, sorry.

gcov

gcov (short for GNU Coverage) — tool to collect function call statistics line-by-line. Usually it’s used in pair with gprof to understand what exact line in slacking function is holds your program down.

Building with gcov instrumentation

Just as gprof you need to recompile your program with gcov flags

# gcc -fprofile-arcs -ftest-coverage -lcrypto -pthread -lrt -Wall -Wextra block_hasher.c -o block_hasher

There are 2 gcov flags: -fprofile-arcs и -ftest-coverage. First will instrument your program to profile so called arcs — pathes in program’s control flow. Second option will make gcc to collect additional notes for arcs profiling and gcov itself.

You can simply put -coverage option which implies both of -fprofile-arcs and -ftest-coverage along with linker -lgcov flag. See GCC debugging options for more info.

Profiling block_hasher under gcov

Now, after instrumenting we just launch program to get 2 files — block_hasher.gcda and block_hasher.gcno. Please, don’t look inside it — we will transform it to text profile. To do this we execute gcov passing it source code filename. It’s important to remember that you must have <filename>.gcda and <filename>.gcno files.

[root@simplex block_hasher]# gcov block_hasher.c
File 'block_hasher.c'
Lines executed:77.69% of 121
block_hasher.c:creating 'block_hasher.c.gcov'

Finally we’ll get block_hasher.c.gcov.

Analyzing

.gcov file is result of that whole gcov work. Let’s look at it. For each of your source files gcov will create annotated source codes with runtime coverage. Here is excerpt from thread_func:

   10:  159:    gap = num_threads * block_size; // Multiply here to avoid integer overflow
    -:  160:
    -:  161:    // Initialize EVP and start reading
   10:  162:    md = EVP_sha1();
   10:  163:    mdctx = EVP_MD_CTX_create();
   10:  164:    EVP_DigestInit_ex( mdctx, md, NULL );
    -:  165:
   10:  166:    get_clock( &start );
10010:  167:    for( i = 0; i < nblocks; i++)
    -:  168:    {
10000:  169:        offset = j->off + gap * i;
    -:  170:
    -:  171:        // Read at offset without changing file pointer
10000:  172:        err = pread( bdev->fd, buf, block_size, offset );
 9999:  173:        if( err == -1 )
    -:  174:        {
#####:  175:            fprintf(stderr, "T%02d Failed to read at %llu\n", j->num, (unsigned long long)offset);
#####:  176:            perror("pread");
#####:  177:            pthread_exit(NULL);
    -:  178:        }
    -:  179:
 9999:  180:        bytes += err; // On success pread returns bytes read
    -:  181:
    -:  182:        // Update digest
 9999:  183:        EVP_DigestUpdate( mdctx, buf, block_size );
    -:  184:    }
   10:  185:    get_clock( &end );
   10:  186:    sec_diff = time_diff( start, end );
    -:  187:
   10:  188:    EVP_DigestFinal_ex( mdctx, j->digest, &j->digest_len );
   10:  189:    EVP_MD_CTX_destroy(mdctx);

The left outmost column is how many times this line of code was invoked. As expected, the for loop was executed 10000 times — 10 threads each reading 1000 blocks. Nothing new.

Though gcov was not so much useful for me, I’d like to say that it has really cool feature — branch probabilities. If you’ll launch gcov with -b option

[root@simplex block_hasher]# gcov -b block_hasher.c
File 'block_hasher.c'
Lines executed:77.69% of 121
Branches executed:100.00% of 66
Taken at least once:60.61% of 66
Calls executed:51.47% of 68
block_hasher.c:creating 'block_hasher.c.gcov'

You’ll get nice branch annotation with probabilities. For example, here is time_diff function

113 function time_diff called 10 returned 100% blocks executed 100%
114        10:  106:double time_diff(struct timespec start, struct timespec end)
115         -:  107:{
116         -:  108:    struct timespec diff;
117         -:  109:    double sec;
118         -:  110:
119        10:  111:    if ( (end.tv_nsec - start.tv_nsec) < 0 )
120 branch  0 taken 60% (fallthrough)
121 branch  1 taken 40%
122         -:  112:    {
123         6:  113:        diff.tv_sec  = end.tv_sec - start.tv_sec - 1;
124         6:  114:        diff.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
125         -:  115:    }
126         -:  116:    else
127         -:  117:    {
128         4:  118:        diff.tv_sec  = end.tv_sec - start.tv_sec;
129         4:  119:        diff.tv_nsec = end.tv_nsec - start.tv_nsec;
130         -:  120:    }
131         -:  121:
132        10:  122:    sec = (double)diff.tv_nsec / 1000000000 + diff.tv_sec;
133         -:  123:
134        10:  124:    return sec;
135         -:  125:}

In 60% of if calls we’ve fallen in branch to calculate time diff with borrow, that actually means we were executing for more than 1 second.

Conclusion

gprof and gcov are really entertaining tools despite a lot of people think about them as obsolete. On one hand these utilities are simple, they implementing and automating obvious method of source code instrumentation to measure function hit count.

But on the other hand such simple metrics won’t help with problems outside of your application. Although there are ways to use it for operating system, e.g. for Linux kernel. Anyway gprof and gcov is useless in case when your application spends most of it’s time in waiting for some syscall (pread in my case).

To read