//===----------------------------------------------------------------------===//
// LLVM Debug Info Improvements
//===----------------------------------------------------------------------===//

3/1/2008 - initial revision

At the time of this writing, LLVM's DWARF debug info generation works
reasonably well at -O0, but it is completely disabled at optimization levels
-O1 and higher. This is because our debug info representation interferes with
optimizations, transparently disabling them in cases where they would not
update it correctly. This is useful for preserving correct debug info, but it
is not what people expect when they use 'llvm-gcc -O3 -g foo.c'.

This document describes a path forward that will get us to a place where
turning on debug info does not pessimize code, while still preserving the
invariant that we don't produce bogus debug info.

Before I get too far here, I want to raise an important point. My goals are:

  1. Do not disable optimization when debug info is turned on.
  2. Do not generate incorrect/bogus debug info.

Note that I am explicitly not trying to 'solve' the "debugging optimized code"
problem here. My goal is that if we emit debug info, it be correct: if we
cannot emit correct debug info, the optimization should remove it rather than
generate silently broken information.

This is a long project, and it will take quite a bit of work in all areas
before we can declare "success", but it is worthwhile, and important and
useful steps can be made without solving the whole problem.

For the sake of discussion, I'll split debug info generation into two pieces:
line number information (which I'll implicitly assume includes function
boundary info) and variable description/location information. Type
description info isn't generally interesting, because we never need to do
anything to keep it up to date.
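As a hypothetical illustration of this split, the sketch below sorts debug intrinsic calls into the two categories. It assumes the intrinsic names used in LLVM IR of this era (llvm.dbg.stoppoint, llvm.dbg.func.start, llvm.dbg.region.start, llvm.dbg.region.end and llvm.dbg.declare); the DebugInfoKind enum and the classifyDebugCall helper are invented for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of debug intrinsic calls into the two kinds of
// debug info discussed above. The intrinsic names are assumed from LLVM IR of
// this era; the enum and function are illustrative only, not LLVM API.
enum class DebugInfoKind { NotDebug, LineNumber, Variable };

DebugInfoKind classifyDebugCall(const std::string &Callee) {
  assert(!Callee.empty() && "expected a callee name");
  // Line number info, which here also covers function/region boundaries.
  if (Callee == "llvm.dbg.stoppoint" || Callee == "llvm.dbg.func.start" ||
      Callee == "llvm.dbg.region.start" || Callee == "llvm.dbg.region.end")
    return DebugInfoKind::LineNumber;
  // Variable description/location info.
  if (Callee == "llvm.dbg.declare")
    return DebugInfoKind::Variable;
  // Anything else is treated as ordinary code.
  return DebugInfoKind::NotDebug;
}
```

Type description info gets no category of its own here, since nothing needs to be done to keep it up to date.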
//===----------------------------------------------------------------------===//
// Testing
//

One of the most useful things to get started is to have some way to determine
whether codegen is being impacted by debug info. It is important to be able to
tell when this happens so that we can track down these places and fix them.

I propose that we add a -strip-debug pass that removes all debug info from the
LLVM IR. Given this, we could do:

  $ llvm-gcc -O3 -c -emit-llvm foo.c -o - | llc > good.s
  $ llvm-gcc -O3 -c -emit-llvm -g foo.c -o - | opt -strip-debug | llc > test.s
  $ diff good.s test.s

If the two .s files differ, then badness happened. This obviously only catches
badness that happens in the LLVM optimizer; if the code generator is broken,
we'll need something more sophisticated that strips debug info out of the .s
file. In any case, this is a good place to start, and it should be turned into
an llvm-test TEST/report.

Incidentally, we have to go through codegen: we can't diff .ll files after
debug info is stripped out. This is because debug info is allowed to (and
probably does) impact local names within functions, but these names are
discarded at codegen and are not important to preserve.

//===----------------------------------------------------------------------===//
// Updating Line Number Information
//

Once we have a way to verify what is happening, I propose that we aim for an
intermediate point: instead of having -O disable all debug info, we should
make it disable just variable information, but keep emitting line number info.
This would allow stepping through the program, getting stack traces, using
performance tools like Shark, etc.

When the front-end has a mode that causes it to emit line number info but not
variable info, we can go through the process above to identify passes that
change behavior when line number intrinsics are in the code. Obvious cases are
things like loop unroll and inlining: they 'measure' the size of some code to
determine whether to unroll or inline it.
This means that they should be enhanced to ignore debug intrinsics for the
sake of code size estimation.

Another example is an optimization like SimplifyCFG when it merges
if/then/else into select instructions. SimplifyCFG will have to be enhanced to
ignore debug intrinsics when doing its safety/profitability analysis, and then
it will also have to be updated to just delete the line number intrinsics when
it does the xform. This is SimplifyCFG's way of "updating" the debug info for
this example transformation.

As we progress through various optimizations, we will find cases where it is
possible to update the line info (e.g. loop unroll or inlining, which don't
have to do anything special to update line #'s) and places where it isn't. As
long as the debug intrinsics don't affect codegen, we are happy, even if the
debug intrinsics are deleted in cases where it would be possible to update
them (this becomes an optimized debugging QoI issue).

Once the optimizer has been updated this way, there will surely still be work
to do at the codegen level.

//===----------------------------------------------------------------------===//
// Updating Variable Info
//

Discussion moved to DebugInfoVariableInfo.txt
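Returning to line number information, the rules described for loop unroll/inlining and SimplifyCFG can be sketched on a toy IR. Everything here is hypothetical: Block, isDebugIntrinsic, estimateSize, speculateArm and stripDebug are invented names, instructions are modeled as plain strings, and a debug intrinsic is assumed to be any call whose name starts with "llvm.dbg.". This is not LLVM's actual API, only a model of the invariant that debug intrinsics must not change optimization decisions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy IR for illustration only (hypothetical, not LLVM's API): an instruction
// is represented by its opcode or callee name.
using Block = std::vector<std::string>;

// Assumption: debug intrinsics are calls whose names start with "llvm.dbg.".
bool isDebugIntrinsic(const std::string &Inst) {
  return Inst.rfind("llvm.dbg.", 0) == 0;
}

// Code size estimate for unroll/inline-style heuristics. Debug intrinsics are
// not counted, so compiling with -g cannot change the decision.
std::size_t estimateSize(const Block &B) {
  return static_cast<std::size_t>(
      std::count_if(B.begin(), B.end(),
                    [](const std::string &I) { return !isDebugIntrinsic(I); }));
}

// SimplifyCFG-style speculation of one arm of an if/then/else into a select.
// The profitability check ignores debug intrinsics, and the transform
// "updates" line info by simply deleting it.
bool speculateArm(const Block &Arm, std::size_t Threshold, Block &Hoisted) {
  if (estimateSize(Arm) > Threshold)
    return false;  // Not profitable: leave the code (and debug info) alone.
  Hoisted.clear();
  for (const std::string &I : Arm)
    if (!isDebugIntrinsic(I))
      Hoisted.push_back(I);  // Debug intrinsics (e.g. stoppoints) are dropped.
  Hoisted.push_back("select");  // Stands in for the select replacing the branch.
  return true;
}

// Miniature -strip-debug: remove every debug intrinsic from a block.
Block stripDebug(Block B) {
  B.erase(std::remove_if(B.begin(), B.end(), isDebugIntrinsic), B.end());
  return B;
}
```

For example, estimateSize returns 2 for both {llvm.dbg.stoppoint, add, llvm.dbg.stoppoint, mul} and {add, mul}, so a size threshold makes the same decision with or without -g, and speculateArm produces identical output for both. Stripping the first block yields the second, which is the same property the -strip-debug diff checks at the .s level.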