Sometimes, to understand what a program is doing, I’m tempted to add a log message at the start and end of every function, something like entering foo()
and returning from foo()
. When there are only a few functions this is easy enough to do manually, but if you want it for every function, some automation would be welcome. This could be automated using gcc’s -finstrument-functions
, but it (see docs, 1, 2, wrapper) has some awkwardness, and somehow I never got the hang of it. An alternative is to use gdb
, and here we’ll see how. This has the advantage that refining the format of the logging, or the conditions under which it is printed, does not require editing and recompiling the program every time.
The idea is this:
-
GDB allows scripting, including with Python.
-
In Python, can subclass (inherit from) gdb.Breakpoint
, to define a breakpoint with custom behaviour: override stop()
to control what happens (e.g. is printed) when the breakpoint is hit.
-
Can set breakpoints on all functions in a file, with something like rbreak filename.cc:.*
. To do it from Python, a trick can be used.
-
Can set “finish” breakpoints that run after a function exits, and with some trickery can even set breakpoints before a function exits.
With this much, a lot is possible, best illustrated with an example. Consider the following test.cc
file:
#include <iostream>
int fac(int n) {
if (n <= 1) {
return 1;
}
int prev = fac(n - 1);
int ret = n * prev;
return ret;
}
int main() {
std::cout << fac(5) << std::endl;
}
Now we can use the following script, say bp.py
:
def local_vars(frame, only_args=False):
"""Returns a list of local variables, like `gdb.execute('info locals')`.
If only_args is set, works like `gdb.execute('info args')`."""
local_vars = []
for sym in frame.block():
if only_args and not sym.is_argument: continue
value = sym.value(frame)
local_vars.append('%s %s=%s' % (sym.type.name, sym.name, value))
return ', '.join(local_vars)
def frames():
"""The stack, with all local variables."""
ret = []
f = gdb.selected_frame()
while f is not None and f.is_valid():
name = f.name()
ret.append('%s(%s)' % (name, local_vars(f, only_args=True)))
f = f.older()
ret.reverse()
return ret
def set_breakpoint_at_return():
"""Code from Ciro Santilli at https://stackoverflow.com/a/31264709
Sets a breakpoint just before return."""
frame = gdb.selected_frame()
# TODO make this work if there is no debugging information, where .block() fails.
block = frame.block()
# Find the function block in case we are in an inner block.
while block and not block.function: block = block.superblock
arch = frame.architecture()
pc = gdb.selected_frame().pc()
instructions = arch.disassemble(block.start, block.end - 1)
for instruction in instructions:
if instruction['asm'].startswith('retq '):
spec = '*%s' % instruction['addr']
# print('Setting breakpoint as', spec)
bp = MySillyBreakpoint(spec, internal=True)
bp.frame_at_creation = frame
class MyFinishBreakpoint(gdb.FinishBreakpoint):
def stop(self):
bt = frames()
print(bt, 'got back', self.return_value)
return False # Don't stop
class MySillyBreakpoint(gdb.Breakpoint):
"""Called just before function return."""
def stop(self):
if gdb.selected_frame() != self.frame_at_creation:
# This is not for me.
return False
bt = frames()
print(bt, 'returning')
self.enabled = False
return False # Don't stop
class MyBreakpoint(gdb.Breakpoint):
def stop(self):
bt = frames()
print(bt, 'called')
# Set breakpoints just before and after return.
set_breakpoint_at_return()
MyFinishBreakpoint(frame=gdb.newest_frame(), internal=True)
return False # Don't stop
gdb.execute('set python print-stack full')
MyBreakpoint('test.cc:fac')
gdb.execute('set pagination off')
gdb.execute('run')
gdb.execute('quit')
Now if the source file is compiled with -g
:
g++ -g test.cc -o test
we can run the script as follows:
gdb --eval-command 'source bp.py' --args ./test
to get the following output:
Reading symbols from ./test...done.
Breakpoint 1 at 0x91b: file test.cc, line 4.
['main()', 'fac(int n=5)'] called
['main()', 'fac(int n=5)', 'fac(int n=4)'] called
['main()', 'fac(int n=5)', 'fac(int n=4)', 'fac(int n=3)'] called
['main()', 'fac(int n=5)', 'fac(int n=4)', 'fac(int n=3)', 'fac(int n=2)'] called
['main()', 'fac(int n=5)', 'fac(int n=4)', 'fac(int n=3)', 'fac(int n=2)', 'fac(int n=1)'] called
['main()', 'fac(int n=5)', 'fac(int n=4)', 'fac(int n=3)', 'fac(int n=2)', 'fac(int n=1)'] returning
['main()', 'fac(int n=5)', 'fac(int n=4)', 'fac(int n=3)', 'fac(int n=2)'] got back 1
['main()', 'fac(int n=5)', 'fac(int n=4)', 'fac(int n=3)', 'fac(int n=2)'] returning
['main()', 'fac(int n=5)', 'fac(int n=4)', 'fac(int n=3)'] got back 2
['main()', 'fac(int n=5)', 'fac(int n=4)', 'fac(int n=3)'] returning
['main()', 'fac(int n=5)', 'fac(int n=4)'] got back 6
['main()', 'fac(int n=5)', 'fac(int n=4)'] returning
['main()', 'fac(int n=5)'] got back 24
['main()', 'fac(int n=5)'] returning
['main()'] got back 120
120
Note the trick of setting frame_at_creation
, to make sense of recursive functions. Even with it, try compiling with -O0
or -O1
or -O2
or -Og
to get surprising results.
Some further notes on scripting GDB with Python here, though I should probably merge them with this page.
A more complicated example, with a further modified bp.py
running on the TeX program, is here.