So, you have a binary that you or someone developed and, surprise, it has some bug. Or you just curious how it’s working. Great tool to help with these cases is a debugger.
It’s really seldom when you want to debug on assembly level, usually, you want to see the sources. But often times you debug the program on the host other than the build host and see this really frustrating message:
$ gdb -q python3.7
Reading symbols from python3.7...done.
(gdb) l
6 ./Programs/python.c: No such file or directory.
Ouch. Everybody was here. I’ve seen this so often while it’s so vital for sensible debugging so I think it’s very important to get into details and understand how GDB shows source code in debugging session.
It all starts with debug info - special sections in the binary file produced by the compiler and used by the debugger and other handy tools.
In GCC there is well-known -g
flag for that. Most projects with some kind of
build system either build with debug info by default or have some flag for it.
In the case of CPython, -g
is added by default but nevertheless, we’re better
off adding --with-pydebug
to enable all kinds of debug options available in
CPython:
$ ./configure --with-pydebug
$ make -j
While you’re watching the compilation log, notice the -g
option in gcc
invocations.
This -g
option will generate debug sections - binary sections to insert into
program’s binary. These sections are usually in DWARF format. For ELF binaries
these debug sections have names like .debug_*
, e.g. .debug_info
or
.debug_loc
. These debug sections are what makes the magic of debugging
possible - basically, it’s a mapping of assembly level instructions to the
source code.
To find whether your program has debug symbols you can list the sections of the
binary with objdump
:
$ objdump -h ./python
python: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 0000000000400238 0000000000400238 00000238 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 0000000000400254 0000000000400254 00000254 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
...
25 .bss 00031f70 00000000008d9e00 00000000008d9e00 002d9dfe 2**5
ALLOC
26 .comment 00000058 0000000000000000 0000000000000000 002d9dfe 2**0
CONTENTS, READONLY
27 .debug_aranges 000017f0 0000000000000000 0000000000000000 002d9e56 2**0
CONTENTS, READONLY, DEBUGGING
28 .debug_info 00377bac 0000000000000000 0000000000000000 002db646 2**0
CONTENTS, READONLY, DEBUGGING
29 .debug_abbrev 0001fcd7 0000000000000000 0000000000000000 006531f2 2**0
CONTENTS, READONLY, DEBUGGING
30 .debug_line 0008b441 0000000000000000 0000000000000000 00672ec9 2**0
CONTENTS, READONLY, DEBUGGING
31 .debug_str 00031f18 0000000000000000 0000000000000000 006fe30a 2**0
CONTENTS, READONLY, DEBUGGING
32 .debug_loc 0034190c 0000000000000000 0000000000000000 00730222 2**0
CONTENTS, READONLY, DEBUGGING
33 .debug_ranges 00062e10 0000000000000000 0000000000000000 00a71b2e 2**0
CONTENTS, READONLY, DEBUGGING
or readelf
:
$ readelf -S ./python
There are 38 section headers, starting at offset 0xb41840:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
...
[26] .bss NOBITS 00000000008d9e00 002d9dfe
0000000000031f70 0000000000000000 WA 0 0 32
[27] .comment PROGBITS 0000000000000000 002d9dfe
0000000000000058 0000000000000001 MS 0 0 1
[28] .debug_aranges PROGBITS 0000000000000000 002d9e56
00000000000017f0 0000000000000000 0 0 1
[29] .debug_info PROGBITS 0000000000000000 002db646
0000000000377bac 0000000000000000 0 0 1
[30] .debug_abbrev PROGBITS 0000000000000000 006531f2
000000000001fcd7 0000000000000000 0 0 1
[31] .debug_line PROGBITS 0000000000000000 00672ec9
000000000008b441 0000000000000000 0 0 1
[32] .debug_str PROGBITS 0000000000000000 006fe30a
0000000000031f18 0000000000000001 MS 0 0 1
[33] .debug_loc PROGBITS 0000000000000000 00730222
000000000034190c 0000000000000000 0 0 1
[34] .debug_ranges PROGBITS 0000000000000000 00a71b2e
0000000000062e10 0000000000000000 0 0 1
[35] .shstrtab STRTAB 0000000000000000 00b416d5
0000000000000165 0000000000000000 0 0 1
[36] .symtab SYMTAB 0000000000000000 00ad4940
000000000003f978 0000000000000018 37 8762 8
[37] .strtab STRTAB 0000000000000000 00b142b8
000000000002d41d 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
as we see in our fresh compiled Python - it has .debug_*
section, hence it has
debug info.
Debug info is a collection of DIEs - Debug Info Entries. Each DIE has a tag specifying what kind of DIE it is and attributes that describes this DIE - things like variable name and line number.
To find the sources GDB parses .debug_info
section to find all DIEs with tag
DW_TAG_compile_unit
. The DIE with this tag has 2 main attributes
DW_AT_comp_dir
(compilation directory) and DW_AT_name
- path to the source
file. Combined they provide the full path to the source file for the particular
compilation unit (object file).
To parse debug info you can again use objdump
:
$ objdump -g ./python | vim -
and there you can see the parsed debug info:
Contents of the .debug_info section:
Compilation Unit @ offset 0x0:
Length: 0x222d (32-bit)
Version: 4
Abbrev Offset: 0x0
Pointer Size: 8
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<c> DW_AT_producer : (indirect string, offset: 0xb6b): GNU C99 6.3.1 20161221 (Red Hat 6.3.1-1) -mtune=generic -march=x86-64 -g -Og -std=c99
<10> DW_AT_language : 12 (ANSI C99)
<11> DW_AT_name : (indirect string, offset: 0x10ec): ./Programs/python.c
<15> DW_AT_comp_dir : (indirect string, offset: 0x7a): /home/avd/dev/cpython
<19> DW_AT_low_pc : 0x41d2f6
<21> DW_AT_high_pc : 0x1b3
<29> DW_AT_stmt_list : 0x0
It reads like this - for address range from DW_AT_low_pc
= 0x41d2f6
to
DW_AT_low_pc + DW_AT_high_pc
= 0x41d2f6
+ 0x1b3
= 0x41d4a9
source code
file is the ./Programs/python.c
located in /home/avd/dev/cpython
. Pretty
straightforward.
So this is what happens when GDB tries to show you the source code:
.debug_info
to find DW_AT_comp_dir
with DW_AT_name
attributes
for the current object file (range of addresses)DW_AT_comp_dir/DW_AT_name
So to fix our problem with ./Programs/python.c: No such file or directory.
we
have to obtain our sources on the target host (copy or git clone
) and do one
of the following:
You can reconstruct the sources path on the target host, so GDB will find the source file where it expects. Stupid but it will work.
In my case, I can just do
git clone https://github.com/python/cpython.git /home/avd/dev/cpython
and checkout to the needed commit-ish.
You can direct GDB to the new source path right in the debug session with
directory <dir>
command:
(gdb) list
6 ./Programs/python.c: No such file or directory.
(gdb) directory /usr/src/python
Source directories searched: /usr/src/python:$cdir:$cwd
(gdb) list
6 #ifdef __FreeBSD__
7 #include <fenv.h>
8 #endif
9
10 #ifdef MS_WINDOWS
11 int
12 wmain(int argc, wchar_t **argv)
13 {
14 return Py_Main(argc, argv);
15 }
Sometimes adding another source path is not enough if you have complex
hierarchy. In this case you can add substitution rule for source path with set substitute-path
GDB command.
(gdb) list
6 ./Programs/python.c: No such file or directory.
(gdb) set substitute-path /home/avd/dev/cpython /usr/src/python
(gdb) list
6 #ifdef __FreeBSD__
7 #include <fenv.h>
8 #endif
9
10 #ifdef MS_WINDOWS
11 int
12 wmain(int argc, wchar_t **argv)
13 {
14 return Py_Main(argc, argv);
15 }
You can trick GDB source path by moving binary to the directory with sources.
mv python /home/user/sources/cpython
This will work because GDB will try to look for sources in the current
directory ($cwd
) as the last resort.
-fdebug-prefix-map
You can substitute the source path on the build stage with
-fdebug-prefix-map=old_path=new_path
option. Here is how to do it within
CPython project:
$ make distclean # start clean
$ ./configure CFLAGS="-fdebug-prefix-map=$(pwd)=/usr/src/python" --with-pydebug
$ make -j
And now we have new sources dir:
$ objdump -g ./python
...
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<c> DW_AT_producer : (indirect string, offset: 0xb65): GNU C99 6.3.1 20161221 (Red Hat 6.3.1-1) -mtune=generic -march=x86-64 -g -Og -std=c99
<10> DW_AT_language : 12 (ANSI C99)
<11> DW_AT_name : (indirect string, offset: 0x10ff): ./Programs/python.c
<15> DW_AT_comp_dir : (indirect string, offset: 0x558): /usr/src/python
<19> DW_AT_low_pc : 0x41d336
<21> DW_AT_high_pc : 0x1b3
<29> DW_AT_stmt_list : 0x0
...
This is the most robust way to do it because you can set it to something like
/usr/src/<project>
, install sources there from a package and debug like a boss.
GDB uses debug info stored in DWARF format to find source level info. DWARF is pretty straightforward format - basically, it’s a tree of DIEs (Debug Info Entries) that describes object files of your programs along with variables and functions.
There are multiple ways to help GDB find sources, where the easiest ones are
directory
and set substitute-path
commands, though -fdebug-prefix-map
is
really useful.
Now, when you have source level info go and explore something!