//===----------------------------------------------------------------------===// // Indirect Goto Design // Nov 2, 2009 Written by Dan Gohman //===----------------------------------------------------------------------===// This proposal does not address non-local jumps, nor does it address uses of label addresses other than indirect gotos. * Background LLVM's current implementation of indirect goto is suboptimal. Each block that has its address taken is assigned a unique integer value. The "address" of a block is this value, and an indirect goto is represented as a switch, with every block that the indirect goto could theoretically jump to as a destination. A more efficient implementation would be to make the "address" of a block be its actual runtime address, and to make an indirect goto a single jump instruction (on most target ISAs). Indirect goto is not a commonly used language feature, but when it is used, it's often in a performance-sensitive region of code. As such, it's important that compilers use the efficient implementation. However, it's typically not as important for compilers to do many of their traditional optimizations in the vicinity of indirect gotos. * Design overview It's a goal of this design to support the efficient implementation while minimizing the impact on the rest of the optimizer. The design introduces two new constructs into the LLVM IR: block-address constants and indirect-goto instructions. Block-address constants look like this: blockaddress(@foo, %bb) This returns the address of block %bb, which is in function @foo. There's no restriction on where a block-address constant can be used. It is even valid to take the address of a block of one function from within a different function. Having the function as an operand serves several purposes: It makes the syntax unambiguous in the case that there are basic blocks with the same name in multiple functions, and it puts the block address in the use list for the function, which helps prevent the function from being deleted while there are outstanding references to its blocks, among other things. Block-address constants have type i8*, though only because that's how the C void* type is typically lowered to LLVM IR. It isn't really that important what the specific pointer type is as it's never dereferenced or used in arithmetic. If a basic block has no predecessors, it may be removed, even if there are live block-address constants in the Module. When this happens, all remaining block-address constants for that block are immediately replaced with a non-null sentry value. This means that code that uses block-address labels for purposes other than indirect gotos will remain unsupported. It would probably be feasible to introduce some new intrinsics, and possibly also a new constant kind, which can be used to replace block addresses in functions that don't contain indirect gotos, but that's not covered here. Indirect-goto instructions look like this: indirectbr i8* %dst, [ %L0, %L1, %L2 ] The first operand is the value of a block-address constant, for the block to branch to. The rest of the operands are all the blocks to which this indirectbr may branch. If the first operand is not equal to one of them, the behavior is undefined. It would make indirectbr simpler to omit the list of destination blocks, however the list has the benefit of making the indirectbr easier to handle in a generic way by the optimizer. By having all the successors explicitly listed, an indirectbr participates in the function CFG in the same way as all other terminator instructions. It is not valid to split CFG edges between the indirectbr and its successors, because doing so would cause the indirecbr to branch to a destination not listed in its successor list. Various LLVM optimizations depend on being able to split critical edges, so they will need to be aware of this case. We actually made a significant effort to find a way to allow the edges to be split in order to avoid encumbering the optimizer, however that quickly became very awkward, and would have required very inefficient codegen. The optimizer would have had to be taught how to avoid splitting these edges in order to avoid pessimizing the code, however that basically led to the same optimizer encumbrances that we were trying to avoid. Consequently, it's better to just forbid splitting of these edges altogether, avoiding the awkwardness. * Preventing duplication of indirectbr instructions. The basic plan is that functions should have at most a single indirectbr instruction starting from the front-end and lasting through optimization. In a late CodeGen phase, indirectbr instructions will be duplicated as necessary to reduce unnecessary branching. It's not invalid to have multiple indirectbr instructions, but since each indirectbr typically needs to list each address-taken block as a predecessor, this can lead to an explosion in the number of CFG edges, which can slow down optimization. To start, front-ends are strongly encouraged to factor all indirect gotos within a function into a single block, so that there is at most one indirectbr instruction per function. If the input has multiple indirect gotos, they should be lowered into branches to the same basic block containing the indirectbr instruction. Both llvm-gcc and clang will do this, for example. Besides helping out the optimizer, this helps avoid having the number of CFG edges grow quadratically in the worst case. In the optimizer, transformations which duplicate code, such as inlining, loop unrolling, loop unswitching, etc. will be expected to make use of a cost model. This is considered desirable regardless, as these passes must balance optimization opportunities with code size increases. There is a common cost model that is being refactored from the inliner's custom cost model. Currently it lives in lib/Analysis/InlineCost.cpp, though it is still evolving. Passes which have their own cost model today will eventually be converted to use this common one. To help the indirectbr situation, the cost model can easily flag indirectbr instructions as things that are very expensive to duplicate, which should effectively prevent the optimizer from performing transformations that will get it in trouble. Functions containing indirectbr instructions will not be inlined, at least for the initial implementation. There are some cases where it could be done, however there are many where it's complicated, and in general it's not expected to be worthwhile. GCC never inlines functions containing indirect gotos either, for comparison. To eliminate any extra branching, CodeGen will perform tail-duplication to copy the indirectbr instruction into blocks which would otherwise end with a branch to the indirectbr instruction.