gprof and gcov

February 10, 2014

gprof and gcov are classical profilers that are still in use. Since the dawn of time, they were used by hackers to gain insight into their programs at the source code level.

In the examples below, I’ll use my block_hasher program to illustrate the usage of profilers. because it’s a small and simple utility.

gprof

gprof (GNU Profiler) – simple and easy profiler that can show how much time your program spends in routines in percents and seconds. gprof uses source code instrumentation by inserting special mcount function call to gather metrics of your program.

Building with gprof instrumentation

To gather profile you need to compile your program with -pg gcc option and then launch under gprof. For better results and statistical errors elimination, it’s recommended to launch profiling subject several times.

To build with gprof instrumentation invoke gcc like this:

$ gcc <your options> -pg -g prog.c -o prog

Here is the actual compile instructions for the block_hasher:

$ gcc -lrt -pthread -lcrypto -pg -g block_hasher.c -o block_hasher

As a result, you’ll get instrumented program. To check if it’s really instrumented just grep mcount symbol.

 $ nm block_hasher | grep mcount
     U mcount@@GLIBC_2.2.5

Profiling block_hasher under gprof

As I said earlier to collect useful statistic we should run the program several times and accumulate metrics. To do that I’ve written simple bash script:

#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "$0 <number of runs> <program with options...>"
    exit 1
fi

RUNS=$1
shift 1

COMMAND="$@"

# Profile name is a program name (first element in args)
PROFILE_NAME="$(echo "${COMMAND}" | cut -f1 -d' ')"

for i in $(seq 1 ${RUNS}); do

    # Run profiled program
    eval "${COMMAND}"

    # Accumulate gprof statistic
    if [[ -e gmon.sum ]]; then
        gprof -s ${PROFILE_NAME} gmon.out gmon.sum
    else
        mv gmon.out gmon.sum
    fi
done

# Make final profile
gprof ${PROFILE_NAME} gmon.sum > gmon.profile

So, each launch will create gmon.out that gprof will combine in gmon.sum. Finally, gmon.sum will be feed to gprof to get flat text profile and call graph.

Let’s do this for our program:

$ ./gprofiler.sh 10 ./block_hasher -d /dev/sdd -b 1048576 -t 10 -n 1000

After finish, this script will create gmon.profile - a text profile, that we can analyze.

Analyzing

The flat profile has info about program routines and time spent in it.

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.24      0.01     0.01                             thread_func
  0.00      0.01     0.00       50     0.00     0.00  time_diff
  0.00      0.01     0.00        5     0.00     0.00  bdev_close
  0.00      0.01     0.00        5     0.00     0.00  bdev_open

gprof metrics are clear from the name. As we can see almost all of it’s time our little program spent in thread function, BUT look at the actual seconds – only 0.01 seconds from whole program execution. It means that it’s not the thread function who is slowing down but something underlying. In the case of block_hasher, it’s a pread syscall that does the I/O for the block device.

The call graph is really not interesting here, so I won’t show you it, sorry.

gcov

gcov (short for GNU Coverage) – tool to collect function call statistics line-by-line. Usually it’s used in pair with gprof to understand what exact line in slacking function is holds your program down.

Building with gcov instrumentation

Just as gprof you need to recompile your program with gcov flags

# gcc -fprofile-arcs -ftest-coverage -lcrypto -pthread -lrt -Wall -Wextra block_hasher.c -o block_hasher

There are 2 gcov flags: -fprofile-arcs и -ftest-coverage. First will instrument your program to profile so called arcs – paths in program’s control flow. The second option will make gcc to collect additional notes for arcs profiling and gcov itself.

You can simply put --coverage option which implies both of -fprofile-arcs and -ftest-coverage along with linker -lgcov flag. See GCC debugging options for more info.

Profiling block_hasher under gcov

Now, after instrumenting we just launch the program to get 2 files – block_hasher.gcda and block_hasher.gcno. Please, don’t look inside it – we will transform it to text profile. To do this we execute gcov passing it source code filename. It’s important to remember that you must have <filename>.gcda and <filename>.gcno files.

$ gcov block_hasher.c
File 'block_hasher.c'
Lines executed:77.69% of 121
block_hasher.c:creating 'block_hasher.c.gcov'

Finally, we’ll get block_hasher.c.gcov.

Analyzing

.gcov file is result of that whole gcov work. Let’s look at it. For each of your source files gcov will create annotated source codes with runtime coverage. Here is excerpt from thread_func:

   10:  159:    gap = num_threads * block_size; // Multiply here to avoid integer overflow
    -:  160:
    -:  161:    // Initialize EVP and start reading
   10:  162:    md = EVP_sha1();
   10:  163:    mdctx = EVP_MD_CTX_create();
   10:  164:    EVP_DigestInit_ex( mdctx, md, NULL );
    -:  165:
   10:  166:    get_clock( &start );
10010:  167:    for( i = 0; i < nblocks; i++)
    -:  168:    {
10000:  169:        offset = j->off + gap * i;
    -:  170:
    -:  171:        // Read at offset without changing file pointer
10000:  172:        err = pread( bdev->fd, buf, block_size, offset );
 9999:  173:        if( err == -1 )
    -:  174:        {
#####:  175:            fprintf(stderr, "T%02d Failed to read at %llu\n", j->num, (unsigned long long)offset);
#####:  176:            perror("pread");
#####:  177:            pthread_exit(NULL);
    -:  178:        }
    -:  179:
 9999:  180:        bytes += err; // On success pread returns bytes read
    -:  181:
    -:  182:        // Update digest
 9999:  183:        EVP_DigestUpdate( mdctx, buf, block_size );
    -:  184:    }
   10:  185:    get_clock( &end );
   10:  186:    sec_diff = time_diff( start, end );
    -:  187:
   10:  188:    EVP_DigestFinal_ex( mdctx, j->digest, &j->digest_len );
   10:  189:    EVP_MD_CTX_destroy(mdctx);

The left outmost column is how many times this line of code was invoked. As expected, the for loop was executed 10000 times – 10 threads each reading 1000 blocks. Nothing new.

Though gcov was not so much useful for me, I’d like to say that it has really cool feature – branch probabilities. If you’ll launch gcov with -b option

[root@simplex block_hasher]# gcov -b block_hasher.c
File 'block_hasher.c'
Lines executed:77.69% of 121
Branches executed:100.00% of 66
Taken at least once:60.61% of 66
Calls executed:51.47% of 68
block_hasher.c:creating 'block_hasher.c.gcov'

You’ll get nice branch annotation with probabilities. For example, here is time_diff function

113 function time_diff called 10 returned 100% blocks executed 100%
114        10:  106:double time_diff(struct timespec start, struct timespec end)
115         -:  107:{
116         -:  108:    struct timespec diff;
117         -:  109:    double sec;
118         -:  110:
119        10:  111:    if ( (end.tv_nsec - start.tv_nsec) < 0 )
120 branch  0 taken 60% (fallthrough)
121 branch  1 taken 40%
122         -:  112:    {
123         6:  113:        diff.tv_sec  = end.tv_sec - start.tv_sec - 1;
124         6:  114:        diff.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
125         -:  115:    }
126         -:  116:    else
127         -:  117:    {
128         4:  118:        diff.tv_sec  = end.tv_sec - start.tv_sec;
129         4:  119:        diff.tv_nsec = end.tv_nsec - start.tv_nsec;
130         -:  120:    }
131         -:  121:
132        10:  122:    sec = (double)diff.tv_nsec / 1000000000 + diff.tv_sec;
133         -:  123:
134        10:  124:    return sec;
135         -:  125:}

In 60% of if calls we’ve fallen in the branch to calculate time diff with borrow, that actually means we were executing for more than 1 second.

Conclusion

gprof and gcov are really entertaining tools despite a lot of people think about them as obsolete. On the one hand, these utilities are simple, they implement and automate an obvious method of source code instrumentation to measure functions hit count.

But on the other hand, such simple metrics won’t help with problems outside of your application like kernel or library, although there are ways to use it for an operating system, e.g. for Linux kernel. Anyway, gprof and gcov are useless in the case when your application spends most of its time in waiting for some syscall (pread in my case).

To read