//===----------------------------------------------------------------------===//
//                        LLVM Debug Info Improvements
//===----------------------------------------------------------------------===//

3/1/2008 - initial revision

At the time of this writing, LLVM's DWARF debug info generation works
reasonably well at -O0, but it is completely disabled at optimization levels
-O1 and higher.  This is because our debug info representation interferes
with optimizations, transparently disabling them in cases where they would not
update it correctly.  This is useful for preserving correct debug info, but it
is not what people expect when they use 'llvm-gcc -O3 -g foo.c'.

This document describes a path forward that will get us to a place where
turning on debug info does not pessimize code, and still preserves the
invariant that we don't produce bogus debug info.

Before I get too far here, I want to raise an important point.  My goals are:

1. Do not disable optimization when debug info is turned on.
2. Do not generate incorrect/bogus debug info.

Note that I am explicitly not trying to 'solve' the "debugging optimized code"
problem here.  My goal is that any debug info we emit be correct: if an
optimization cannot keep the debug info correct, then it should remove that
info rather than silently generate broken information.

This is a long project, and will take quite a bit of work in all areas before
we can declare "success", but it is worthwhile, and we can make important and
useful steps without solving the whole problem.

For sake of discussion, I'll split debug info generation into two pieces: line
number information (which I'll implicitly assume includes function boundary
info) and variable description/location information.  Type description info
isn't generally interesting, because we never need to do anything to keep it up
to date.

//===----------------------------------------------------------------------===//
// Testing
//

One of the most useful things to get started is to have some way to determine
whether codegen is being impacted by debug info.  It is important to be able to 
tell when this happens so that we can track down these places and fix them.

I propose that we add a -strip-debug pass that removes all debug info from the
LLVM IR.  This would allow us to do:

$ llvm-gcc -O3 -c -emit-llvm foo.c -o - | llc > good.s
$ llvm-gcc -O3 -c -emit-llvm -g foo.c -o - | opt -strip-debug | llc > test.s
$ diff good.s test.s

If the two .s files differ, then badness happened.  This obviously only
catches badness that happens in the LLVM optimizer; if the code generator is
broken, we'll need something more sophisticated that strips debug info out of
the .s file.  In any case, this is a good place to start, and should be turned
into an llvm-test TEST/report.
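
As a rough illustration of what such a -strip-debug pass would do, here is a
minimal sketch over a toy instruction model.  The Inst struct and both function
names are hypothetical stand-ins, not the real LLVM classes, and the sketch
assumes that debug info is carried entirely by calls to functions whose names
start with "llvm.dbg.":

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an LLVM instruction (hypothetical, not the LLVM API).
struct Inst {
  std::string Opcode;  // e.g. "add", "call", "ret"
  std::string Callee;  // only meaningful when Opcode == "call"
};

// Assumption: a debug intrinsic is a call to a function named "llvm.dbg.*".
bool isDebugIntrinsic(const Inst &I) {
  return I.Opcode == "call" && I.Callee.compare(0, 9, "llvm.dbg.") == 0;
}

// Removes every debug intrinsic call from Block and returns how many calls
// were removed.  All other instructions keep their relative order.
std::size_t stripDebug(std::vector<Inst> &Block) {
  std::size_t Before = Block.size();
  Block.erase(std::remove_if(Block.begin(), Block.end(), isDebugIntrinsic),
              Block.end());
  return Before - Block.size();
}
```

If the optimizer really is unaffected by debug info, running something like
stripDebug after optimization should leave exactly the code that the non -g
compile produced, which is what the diff above checks.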

Incidentally, we have to go through codegen; we can't diff .ll files after
debug info is stripped out.  This is because debug info is allowed to (and
probably does) impact local names within functions, but these names are
removed at codegen and are not important to preserve.

//===----------------------------------------------------------------------===//
// Updating Line Number Information
//

Once we have a way to verify what is happening, I propose that we aim for an
intermediate point: instead of having -O disable all debug info, we should make
it disable just variable information, but keep emitting line number info.  This
would allow stepping through the program, getting stack traces, using
performance tools like shark, etc.

When the front-end has a mode that causes it to emit line number info but not
variable info, we can go through the process above to identify passes that
change behavior when line number intrinsics are in the code.  Obvious cases are
things like loop unroll and inlining: they 'measure' the size of some code to
determine whether to unroll or inline it.  This means that they should be
enhanced to ignore debug intrinsics for the sake of code size estimation.
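
A minimal sketch of such a size estimate follows.  It uses a toy instruction
model rather than the real LLVM classes; Inst, estimateCodeSize, shouldUnroll
and the threshold parameter are all hypothetical names, and the sketch assumes
debug intrinsics are calls to functions named "llvm.dbg.*":

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an LLVM instruction (hypothetical, not the LLVM API).
struct Inst {
  std::string Opcode;  // e.g. "add", "call", "br"
  std::string Callee;  // only meaningful when Opcode == "call"
};

// Assumption: a debug intrinsic is a call to a function named "llvm.dbg.*".
bool isDebugIntrinsic(const Inst &I) {
  return I.Opcode == "call" && I.Callee.compare(0, 9, "llvm.dbg.") == 0;
}

// Counts only the instructions that turn into real code.  Debug intrinsics
// are skipped, so compiling with -g cannot change the estimate.
std::size_t estimateCodeSize(const std::vector<Inst> &Body) {
  std::size_t Size = 0;
  for (const Inst &I : Body)
    if (!isDebugIntrinsic(I))
      ++Size;
  return Size;
}

// Hypothetical unrolling heuristic: unroll when the fully unrolled body
// stays within Threshold instructions.
bool shouldUnroll(const std::vector<Inst> &LoopBody, std::size_t TripCount,
                  std::size_t Threshold) {
  return estimateCodeSize(LoopBody) * TripCount <= Threshold;
}
```

With a naive count that included the debug calls, the same loop could fall on
different sides of the threshold with and without -g, which is exactly the
kind of difference the diff test is meant to catch.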

Another example is optimizations like SimplifyCFG when it merges if/then/else 
into select instructions.  SimplifyCFG will have to be enhanced to ignore debug
intrinsics when doing its safety/profitability analysis, but then it will also
have to be updated to just delete the line number intrinsics when it does the
transformation.  Deleting them is SimplifyCFG's way of "updating" the debug
info for this example transformation.
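
The two halves of that change can be sketched on a toy model as follows.  This
is not SimplifyCFG's actual code: Inst, canFoldToSelect, foldToSelect and the
MaxSpeculated limit are hypothetical, and the sketch assumes debug intrinsics
are calls to functions named "llvm.dbg.*" that a naive analysis would treat as
having side effects:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an LLVM instruction (hypothetical, not the LLVM API).
struct Inst {
  std::string Opcode;   // e.g. "add", "call", "select"
  std::string Callee;   // only meaningful when Opcode == "call"
  bool HasSideEffects;  // true if the instruction cannot be speculated
};

// Assumption: a debug intrinsic is a call to a function named "llvm.dbg.*".
bool isDebugIntrinsic(const Inst &I) {
  return I.Opcode == "call" && I.Callee.compare(0, 9, "llvm.dbg.") == 0;
}

// Safety/profitability check: every real instruction on both sides must be
// free of side effects, and there must be at most MaxSpeculated of them.
// Debug intrinsics are skipped so they cannot change the answer.
bool canFoldToSelect(const std::vector<Inst> &Then,
                     const std::vector<Inst> &Else,
                     std::size_t MaxSpeculated) {
  std::size_t Real = 0;
  const std::vector<Inst> *Sides[] = {&Then, &Else};
  for (const std::vector<Inst> *Side : Sides) {
    for (const Inst &I : *Side) {
      if (isDebugIntrinsic(I))
        continue;
      if (I.HasSideEffects)
        return false;
      ++Real;
    }
  }
  return Real <= MaxSpeculated;
}

// Performs the fold: both sides are executed unconditionally and a select
// picks the result.  Line number intrinsics are deleted rather than merged,
// since neither source line describes the combined code.
std::vector<Inst> foldToSelect(const std::vector<Inst> &Then,
                               const std::vector<Inst> &Else) {
  std::vector<Inst> Merged;
  for (const Inst &I : Then)
    if (!isDebugIntrinsic(I))
      Merged.push_back(I);
  for (const Inst &I : Else)
    if (!isDebugIntrinsic(I))
      Merged.push_back(I);
  Merged.push_back(Inst{"select", "", false});
  return Merged;
}
```

The result of foldToSelect is the same whether or not the input blocks carried
debug intrinsics, so this transformation would pass the diff test.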

As we progress through various optimizations, we will find cases where it is
possible to update (e.g. loop unroll or inlining, which don't have to do
anything special to update line #'s) and places where it isn't.  As long as the
debug intrinsics don't affect codegen, we are happy, even if the debug
intrinsics are deleted in cases where it would be possible to update them (this
becomes an optimized-debugging QoI issue).

Once we have gone through this and the optimizer is updated, there will surely
still be work to do at the codegen level.

//===----------------------------------------------------------------------===//
// Updating Variable Info
//

Discussion moved to DebugInfoVariableInfo.txt