//===----------------------------------------------------------------------===//
// Fix the PassManager to make the inliner work *correctly*
//===----------------------------------------------------------------------===//

2004-08-18

The problem: inlining is an extremely important optimization for many classes
of programming languages. The traditional way to inline is to inline very
aggressively (early in the compile process), up to a particular threshold. The
idea is that subsequent scalar and other optimizations will optimize away most
of what is left.

There are several problems with this. First, in many cases, this does not
inline ENOUGH. In particular, programs with many layers of abstraction simply
*dissolve* during optimization, as layer after layer is peeled away. Because we
inline up to a threshold, once the inline threshold has been met, nothing
further is inlined.

Second, this approach inlines TOO MUCH. Largely because of the first problem,
and due to benchmarks like stepanov, compiler authors tend to crank up the
inlining threshold far too high, or otherwise make their heuristics inline far
too much. This bloats executables and leads to excessive compile time. Because
not all code has layers of abstraction that are easily removed, inlining such
code with the same threshold used for "abstract" programs simply bloats it.

Third, even if we inline exactly the right amount (given omniscient
heuristics), we still lose. The problem is that code is duplicated (inlined)
all over the place before optimization. Because this code is duplicated, it
has to be optimized away in every place that it is inlined, increasing compile
time. It would be better if we could optimize the code BEFORE the inlining
happens, so that *optimized* code is duplicated instead of unoptimized code.

Both GCC and LLVM suffer from this problem, though GCC does to a much higher
degree. In particular, GCC does very very little optimization before inlining
(mostly tree expression folding).
LLVM does a bit more, but is limited to mem2reg and simplifycfg. GCC inlines
very aggressively (at -O3), which leads to code bloat and slow compile times.
LLVM has a more conservative inliner (which it can usually afford due to IP
optzns), which causes it to miss certain "obvious" cases.

//===----------------------------------------------------------------------===//
// The Solution
//===----------------------------------------------------------------------===//

The fix for this is actually very easy and straightforward. In particular, the
inliner (and a few other optzns) are CallGraphSCCPass's, meaning that they
operate "bottom-up" on the call graph (visiting callees before callers).
Currently, the CallGraphSCCPass class is just a "Pass", and is scheduled
exactly as such by the PassManager. Instead, the PassManager should be
enhanced to understand the semantics of CallGraphSCCPasses (just like it knows
about FunctionPass'es) and schedule function passes into them.

In particular, consider the pass sequence "-inline -mem2reg". Instead of
running the -inline pass on the whole module, then the -mem2reg pass on every
function in the module, in this configuration:

  PassManager
    Inliner
    FunctionPassBatcher
      Mem2Reg

the pass manager should interlace execution of mem2reg *with* executions of
the inliner pass, in this configuration:

  PassManager
    CallGraphSCCPassBatcher
      Inliner
      SCCFunctionPassBatcher
        Mem2Reg

Consider a program with a call graph like this:

  void A() { B(); C(); }
  void B() {}
  void C() {}

On this program, the inliner should be run on B (doing nothing, since B has no
callees), then the mem2reg pass should be run on B. Next, the inliner should
be run on C (again doing nothing), followed by mem2reg on C. Once B and C are
done, the inliner would run on A, and (finally) mem2reg would be run on A.

The strength of this approach is that it is A) simple and automatic, and
B) effective.
It is simple because we just need to introduce a new batcher class for
CallGraphSCCPass's, and effective because we are now inlining functions that
have already been optimized by all of the scalar optimizations. This means
that we should be able to set the inlining threshold lower (as there is less
chance that subsequent optimizations will decimate the remaining code), and
because we are inlining optimized code, we don't have to duplicate
optimizations. The pre-existing heuristics for inlining (which guess when
inlining will make other important optimizations more effective in the caller)
automatically become more effective and less pessimistic.

Note that there are some funny things that this implies. In particular, if you
have an empty pass manager and add a function pass to it, the function pass
will be run on all functions in the module in module order. OTOH, if you add
an SCC pass followed by a FunctionPass, the function pass will be executed in
SCC order on the functions (bottom-up on the call graph), not in module order.
Funny how we defined the semantics of FunctionPass's such that they aren't
able to notice this. :)

Enhancing the PassManager is currently more difficult than it should be, owing
largely to the fact that PassManagerT.h is a mess. It should be rewritten in
an object-oriented style, in the process allowing stuff like this (and the
loop optimizer) to be easily added.