gdb

GDB is not easy to learn or to use; it is cryptic and often illogical. Alas, we don’t have anything better. It’s also a great tool for exploring and understanding how things work, because it gives you a direct view into program execution.

Here are some basic steps and nice tricks I’ve learned about gdb.

We’ll use GDB for exploring CPython for illustrative purposes.

Invocation

There are several ways to invoke gdb:

  1. Simple

    $ gdb ./python
    
  2. With core dump file

    $ gdb ./python python.core.latest
    
  3. Passing arguments to the program with --args

    $ gdb --args ./python -v -d -c 'import this'
    

If the program reads its input from stdin, you can redirect it as part of the run or start command like this:

$ gdb -q --args ./python -m json.tool
Reading symbols from ./python...done.
(gdb)
(gdb) run < path/to/input_file.json

Running

The run command runs your program (duh!), although I usually use start because it sets a temporary breakpoint on the main function and then invokes run - really handy.

(gdb) start
Temporary breakpoint 1 at 0x41d2f6: file ./Programs/python.c, line 20.
Starting program: /home/avd/dev/cpython/python

To stop the program, there is the kill command.

(gdb) kill
Kill the program being debugged? (y or n) y

Breakpoints

There are 2 commands to control breakpoints - break (b for short) and delete. The first one sets a breakpoint, the latter deletes it by number. To show breakpoints use info breakpoints:

(gdb) b PyEval_EvalFrameEx 
Breakpoint 1 at 0x52d4d5: file Python/ceval.c, line 661.
(gdb) b PyEval_CallFunction 
Breakpoint 2 at 0x4616aa: file Objects/call.c, line 964.
(gdb) info breakpoints 
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000000000052d4d5 in PyEval_EvalFrameEx at Python/ceval.c:661
2       breakpoint     keep y   0x00000000004616aa in PyEval_CallFunction at Objects/call.c:964
(gdb) delete 1
(gdb) info  breakpoints 
Num     Type           Disp Enb Address            What
2       breakpoint     keep y   0x00000000004616aa in PyEval_CallFunction at Objects/call.c:964

Conditional breakpoints

break ... if ..., condition <bp> ...

Backtrace

bt, frame, finish, until, advance

Examine

  • Examine at breakpoint - printing variables, printing structs, examine memory
    • set print pretty on
    • ptype var - print type of variable
    • print var - print variable
    • print *var - print variable under pointer
    • display, undisplay, disable display 1
    • info args
    • info params
    • info locals
    • commands
    • call

Dynamic printf

  • dprintf - good ol’ printf debugging without recompiling
    • dprintf <location>, "<format string>", args
    • dprintf myfunc, "input_arg is %d\n", input_arg
    • dprintf io.c:30, "buf is %p\n", buf

Watchpoints

  • watchpoints
    • watch foo - stop when foo is modified
    • watch -l foo - watch location
    • rwatch foo - stop when foo is read
    • watch foo thread 3 - stop when foo is modified by thread #3
    • watch foo if foo > 10 - stop when foo is modified and its new value is > 10

Miscellaneous

  • Miscellaneous
    • gdb vars and setting own variables with set $myvar = $2
    • shell <cmd>
    • set print pretty on
    • set history save on
    • set follow-fork-mode child, set detach-on-fork off
    • gdb dashboard - https://github.com/cyrus-and/gdb-dashboard
    • gdbinit

Debug info

It all starts with debug info - special sections in the binary file produced by the compiler and used by the debugger and other handy tools.

In GCC there is the well-known -g flag for that. Most projects with some kind of build system either build with debug info by default or have a flag for it.

In the case of CPython, you have to do the following:

$ ./configure --with-pydebug
$ make -j

--with-pydebug will add -g to the GCC invocation.

This -g option makes the compiler generate debug sections - extra sections that go into the program’s binary. These sections are usually in DWARF format. For ELF binaries they have names like .debug_*, e.g. .debug_info or .debug_loc. These debug sections are what makes the magic of debugging possible - basically, they map machine instructions back to the source code.

To find whether your program has debug symbols you can list the sections of the binary with objdump:

$ objdump -h ./python

python:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       0000001c  0000000000400238  0000000000400238  00000238  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag 00000020  0000000000400254  0000000000400254  00000254  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
...
 25 .bss          00031f70  00000000008d9e00  00000000008d9e00  002d9dfe  2**5
                  ALLOC
 26 .comment      00000058  0000000000000000  0000000000000000  002d9dfe  2**0
                  CONTENTS, READONLY
 27 .debug_aranges 000017f0  0000000000000000  0000000000000000  002d9e56  2**0
                  CONTENTS, READONLY, DEBUGGING
 28 .debug_info   00377bac  0000000000000000  0000000000000000  002db646  2**0
                  CONTENTS, READONLY, DEBUGGING
 29 .debug_abbrev 0001fcd7  0000000000000000  0000000000000000  006531f2  2**0
                  CONTENTS, READONLY, DEBUGGING
 30 .debug_line   0008b441  0000000000000000  0000000000000000  00672ec9  2**0
                  CONTENTS, READONLY, DEBUGGING
 31 .debug_str    00031f18  0000000000000000  0000000000000000  006fe30a  2**0
                  CONTENTS, READONLY, DEBUGGING
 32 .debug_loc    0034190c  0000000000000000  0000000000000000  00730222  2**0
                  CONTENTS, READONLY, DEBUGGING
 33 .debug_ranges 00062e10  0000000000000000  0000000000000000  00a71b2e  2**0
                  CONTENTS, READONLY, DEBUGGING

or readelf:

$ readelf -S ./python
There are 38 section headers, starting at offset 0xb41840:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400238  00000238
       000000000000001c  0000000000000000   A       0     0     1

...

  [26] .bss              NOBITS           00000000008d9e00  002d9dfe
       0000000000031f70  0000000000000000  WA       0     0     32
  [27] .comment          PROGBITS         0000000000000000  002d9dfe
       0000000000000058  0000000000000001  MS       0     0     1
  [28] .debug_aranges    PROGBITS         0000000000000000  002d9e56
       00000000000017f0  0000000000000000           0     0     1
  [29] .debug_info       PROGBITS         0000000000000000  002db646
       0000000000377bac  0000000000000000           0     0     1
  [30] .debug_abbrev     PROGBITS         0000000000000000  006531f2
       000000000001fcd7  0000000000000000           0     0     1
  [31] .debug_line       PROGBITS         0000000000000000  00672ec9
       000000000008b441  0000000000000000           0     0     1
  [32] .debug_str        PROGBITS         0000000000000000  006fe30a
       0000000000031f18  0000000000000001  MS       0     0     1
  [33] .debug_loc        PROGBITS         0000000000000000  00730222
       000000000034190c  0000000000000000           0     0     1
  [34] .debug_ranges     PROGBITS         0000000000000000  00a71b2e
       0000000000062e10  0000000000000000           0     0     1
  [35] .shstrtab         STRTAB           0000000000000000  00b416d5
       0000000000000165  0000000000000000           0     0     1
  [36] .symtab           SYMTAB           0000000000000000  00ad4940
       000000000003f978  0000000000000018          37   8762     8
  [37] .strtab           STRTAB           0000000000000000  00b142b8
       000000000002d41d  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

As we can see, our freshly compiled Python has .debug_* sections, hence it has debug info.
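
If you want to script this check, here is a minimal sketch that scans readelf -S style text for .debug_* section names. The function name debug_sections and the SAMPLE text (a shortened copy of the output above) are my own illustration, not part of any tool.

```python
import re

def debug_sections(readelf_output: str) -> list[str]:
    """Return the .debug_* section names found in `readelf -S` style text."""
    # A section name follows the "[Nr]" index column at the start of a line.
    pattern = re.compile(r"^\s*\[\s*\d+\]\s+(\.debug_\w+)", re.MULTILINE)
    return pattern.findall(readelf_output)

# Shortened copy of the readelf output shown above.
SAMPLE = """\
  [27] .comment          PROGBITS         0000000000000000  002d9dfe
       0000000000000058  0000000000000001  MS       0     0     1
  [28] .debug_aranges    PROGBITS         0000000000000000  002d9e56
       00000000000017f0  0000000000000000           0     0     1
  [29] .debug_info       PROGBITS         0000000000000000  002db646
       0000000000377bac  0000000000000000           0     0     1
"""

found = debug_sections(SAMPLE)
print(found)  # ['.debug_aranges', '.debug_info']
print("has debug info" if found else "no debug info")
```

In practice you would feed it the real output of readelf -S ./python instead of the sample string.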

Debug info is a collection of DIEs - Debugging Information Entries. Each DIE has a tag specifying what kind of DIE it is and attributes that describe this DIE - things like variable name and line number.

How GDB finds source code

To find the sources GDB parses the .debug_info section to find all DIEs with the tag DW_TAG_compile_unit. A DIE with this tag has 2 main attributes: DW_AT_comp_dir (the compilation directory) and DW_AT_name (the path to the source file). Combined, they give the full path to the source file for a particular compilation unit (object file).

To parse debug info you can again use objdump:

$ objdump -g ./python | vim -

and there you can see the parsed debug info:

Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x222d (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0xb6b): GNU C99 6.3.1 20161221 (Red Hat 6.3.1-1) -mtune=generic -march=x86-64 -g -Og -std=c99
    <10>   DW_AT_language    : 12   (ANSI C99)
    <11>   DW_AT_name        : (indirect string, offset: 0x10ec): ./Programs/python.c
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x7a): /home/avd/dev/cpython
    <19>   DW_AT_low_pc      : 0x41d2f6
    <21>   DW_AT_high_pc     : 0x1b3
    <29>   DW_AT_stmt_list   : 0x0

It reads like this - for the address range from DW_AT_low_pc = 0x41d2f6 to DW_AT_low_pc + DW_AT_high_pc = 0x41d2f6 + 0x1b3 = 0x41d4a9, the source file is ./Programs/python.c located in /home/avd/dev/cpython. Pretty straightforward.
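
The same arithmetic as a tiny worked example. It assumes, as the dump suggests, that DW_AT_high_pc is stored as a constant offset from DW_AT_low_pc (the DWARF 4 convention), in which case the sum is the first address past the unit's code. The helper name covers is made up for the illustration.

```python
# Attribute values copied from the objdump output above.
low_pc = 0x41D2F6
high_pc_offset = 0x1B3  # assumed to be an offset from low_pc, not an address

end = low_pc + high_pc_offset  # first address past this compile unit's code

def covers(address: int) -> bool:
    """True if `address` falls in the half-open range [low_pc, end)."""
    return low_pc <= address < end

print(hex(end))          # 0x41d4a9
print(covers(low_pc))    # True
print(covers(end))       # False
```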

So this is what happens when GDB tries to show you the source code:

  • parses .debug_info to find the DW_AT_comp_dir and DW_AT_name attributes for the current object file (range of addresses)
  • opens the file at DW_AT_comp_dir/DW_AT_name
  • shows the content of the file to you
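
The steps above can be sketched roughly like this. It is only a model of the idea, not GDB's actual code: read_source is a hypothetical helper, and the two path arguments are the attribute values from the dump above.

```python
import os

def read_source(comp_dir: str, name: str, line_count: int = 3) -> list[str]:
    """Open comp_dir/name and return its first few lines."""
    path = os.path.normpath(os.path.join(comp_dir, name))
    # open() raises FileNotFoundError when the sources are not where
    # the debug info says they are.
    with open(path, encoding="utf-8") as source:
        return [line.rstrip("\n") for _, line in zip(range(line_count), source)]

try:
    # DW_AT_comp_dir and DW_AT_name from the objdump output above.
    for line in read_source("/home/avd/dev/cpython", "./Programs/python.c"):
        print(line)
except FileNotFoundError as error:
    print(f"{error.filename}: No such file or directory")
```

On a machine without the sources at that exact path, the except branch runs, which is the same failure mode GDB reports when it cannot open the file.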

How to tell GDB where the sources are

Sometimes, when you debug a program on a host other than the build host, you see this really frustrating message:

$ gdb -q python3.7
Reading symbols from python3.7...done.
(gdb) l
6   ./Programs/python.c: No such file or directory.

To fix this problem we have to get the sources onto the target host (copy or git clone) and do one of the following:

1. Reconstruct the sources path

You can reconstruct the sources path on the target host, so GDB will find the source file where it expects. Stupid, but it works.

In my case, I can just do git clone https://github.com/python/cpython.git /home/avd/dev/cpython and check out the needed commit-ish.

2. Change GDB source path

You can point GDB to the new source path right in the debug session with the directory <dir> command:

(gdb) list
6   ./Programs/python.c: No such file or directory.
(gdb) directory /usr/src/python
Source directories searched: /usr/src/python:$cdir:$cwd
(gdb) list
6   #ifdef __FreeBSD__
7   #include <fenv.h>
8   #endif
9   
10  #ifdef MS_WINDOWS
11  int
12  wmain(int argc, wchar_t **argv)
13  {
14      return Py_Main(argc, argv);
15  }
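
The "Source directories searched" line is an ordered list, where $cdir stands for the compilation directory and $cwd for the current working directory. Here is a simplified model of how such a list turns into candidate files. The real lookup tries a few more forms of the file name, and the candidates function and the /tmp/debug working directory are assumptions made for the example.

```python
import posixpath

def candidates(entries: list[str], name: str, cdir: str, cwd: str) -> list[str]:
    """List the files a simplified source-path search would try, in order."""
    expansions = {"$cdir": cdir, "$cwd": cwd}
    result = []
    for entry in entries:
        directory = expansions.get(entry, entry)  # expand $cdir / $cwd
        result.append(posixpath.normpath(posixpath.join(directory, name)))
    return result

# After `directory /usr/src/python` the new entry sits in front of the defaults.
search = ["/usr/src/python", "$cdir", "$cwd"]
for path in candidates(search, "./Programs/python.c",
                       cdir="/home/avd/dev/cpython", cwd="/tmp/debug"):
    print(path)
# /usr/src/python/Programs/python.c
# /home/avd/dev/cpython/Programs/python.c
# /tmp/debug/Programs/python.c
```

The first candidate that exists wins, which is why putting /usr/src/python in front is enough here.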

3. Set GDB substitution rule

Sometimes adding another source path is not enough if you have a complex hierarchy. In this case you can add a substitution rule for the source path with the set substitute-path GDB command.

(gdb) list
6   ./Programs/python.c: No such file or directory.
(gdb) set substitute-path /home/avd/dev/cpython /usr/src/python
(gdb) list
6   #ifdef __FreeBSD__
7   #include <fenv.h>
8   #endif
9   
10  #ifdef MS_WINDOWS
11  int
12  wmain(int argc, wchar_t **argv)
13  {
14      return Py_Main(argc, argv);
15  }
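
As far as I understand the GDB documentation, a substitution rule only rewrites the beginning of a path, only on whole directory components, and the first matching rule wins. A minimal sketch of that behavior follows. The substitute_path function is my own model rather than GDB code, and the cpython-old path is invented to show a non-match.

```python
def substitute_path(filename: str, rules: list[tuple[str, str]]) -> str:
    """Apply the first matching (old_prefix, new_prefix) rule to `filename`."""
    for old, new in rules:
        old = old.rstrip("/")
        # Match only whole leading directory components, so a rule for
        # ".../cpython" does not touch ".../cpython-old".
        if filename == old or filename.startswith(old + "/"):
            return new.rstrip("/") + filename[len(old):]
    return filename  # no rule matched, keep the recorded path

rules = [("/home/avd/dev/cpython", "/usr/src/python")]
print(substitute_path("/home/avd/dev/cpython/Programs/python.c", rules))
# /usr/src/python/Programs/python.c
print(substitute_path("/home/avd/dev/cpython-old/Programs/python.c", rules))
# /home/avd/dev/cpython-old/Programs/python.c
```

Unlike directory, this keeps the rest of the recorded path intact, which is what makes it work for deep source trees.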

4. Move binary to sources

You can trick GDB by moving the binary to the directory with the sources and starting GDB from there.

mv python /home/user/sources/cpython

This works because GDB looks for sources in the current working directory ($cwd) as the last resort.

5. Compile with -fdebug-prefix-map

You can substitute the source path at build time with the -fdebug-prefix-map=old_path=new_path option. Here is how to do it for the CPython project:

$ make distclean    # start clean
$ ./configure CFLAGS="-fdebug-prefix-map=$(pwd)=/usr/src/python" --with-pydebug
$ make -j

And now we have a new sources dir:

$ objdump -g ./python
...
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0xb65): GNU C99 6.3.1 20161221 (Red Hat 6.3.1-1) -mtune=generic -march=x86-64 -g -Og -std=c99
    <10>   DW_AT_language    : 12       (ANSI C99)
    <11>   DW_AT_name        : (indirect string, offset: 0x10ff): ./Programs/python.c
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x558): /usr/src/python
    <19>   DW_AT_low_pc      : 0x41d336
    <21>   DW_AT_high_pc     : 0x1b3
    <29>   DW_AT_stmt_list   : 0x0
...

This is the most robust way to do it, because you can set it to something like /usr/src/<project>, install the sources there from a package and debug like a boss.

Resources