//===----------------------------------------------------------------------===//
//                          Indirect Goto Design
//  Nov 2, 2009                                           Written by Dan Gohman
//===----------------------------------------------------------------------===//

This proposal does not address non-local jumps, nor does it address uses
of label addresses other than indirect gotos.

* Background

LLVM's current implementation of indirect goto is suboptimal. Each block
that has its address taken is assigned a unique integer value. The
"address" of a block is this value, and an indirect goto is represented
as a switch, with every block that the indirect goto could theoretically
jump to as a destination.

A more efficient implementation would be to make the "address" of a
block be its actual runtime address, and to make an indirect goto a
single jump instruction (on most target ISAs).

Indirect goto is not a commonly used language feature, but when it is
used, it's often in a performance-sensitive region of code. As such,
it's important that compilers use the efficient implementation. However,
it's typically not as important for compilers to do many of their
traditional optimizations in the vicinity of indirect gotos.

* Design overview

It's a goal of this design to support the efficient implementation while
minimizing the impact on the rest of the optimizer.

The design introduces two new constructs into the LLVM IR: block-address
constants and indirect-goto instructions.

Block-address constants look like this:

  blockaddress(@foo, %bb)

This returns the address of block %bb, which is in function @foo.

There's no restriction on where a block-address constant can be used. It
is even valid to take the address of a block of one function from within
a different function.

Having the function as an operand serves several purposes: It makes the
syntax unambiguous in the case that there are basic blocks with the
same name in multiple functions, and it puts the block address in the
use list for the function, which helps prevent the function from being
deleted while there are outstanding references to its blocks, among
other things.

Block-address constants have type i8*, though only because that's how
the C void* type is typically lowered to LLVM IR. It isn't really that
important what the specific pointer type is as it's never dereferenced
or used in arithmetic.

If a basic block has no predecessors, it may be removed, even if there
are live block-address constants in the Module. When this happens,
all remaining block-address constants for that block are immediately
replaced with a non-null sentry value.

This means that code that uses block-address labels for purposes other
than indirect gotos will remain unsupported. It would probably be
feasible to introduce some new intrinsics, and possibly also a new
constant kind, which can be used to replace block addresses in functions
that don't contain indirect gotos, but that's not covered here.

Indirect-goto instructions look like this:

  indirectbr i8* %dst, [ %L0, %L1, %L2 ]

The first operand is the value of a block-address constant, for the
block to branch to. The rest of the operands are all the blocks to which
this indirectbr may branch. If the first operand is not equal to one of
them, the behavior is undefined.

It would make indirectbr simpler to omit the list of destination blocks,
however the list has the benefit of making the indirectbr easier to
handle in a generic way by the optimizer. By having all the successors
explicitly listed, an indirectbr participates in the function CFG in the
same way as all other terminator instructions.

It is not valid to split CFG edges between the indirectbr and its
successors, because doing so would cause the indirecbr to branch to a
destination not listed in its successor list. Various LLVM optimizations
depend on being able to split critical edges, so they will need to be
aware of this case.

We actually made a significant effort to find a way to allow the edges
to be split in order to avoid encumbering the optimizer, however that
quickly became very awkward, and would have required very inefficient
codegen. The optimizer would have had to be taught how to avoid
splitting these edges in order to avoid pessimizing the code, however
that basically led to the same optimizer encumbrances that we were
trying to avoid. Consequently, it's better to just forbid splitting of
these edges altogether, avoiding the awkwardness.

* Preventing duplication of indirectbr instructions.

The basic plan is that functions should have at most a single
indirectbr instruction starting from the front-end and lasting through
optimization. In a late CodeGen phase, indirectbr instructions will
be duplicated as necessary to reduce unnecessary branching. It's not
invalid to have multiple indirectbr instructions, but since each
indirectbr typically needs to list each address-taken block as a
predecessor, this can lead to an explosion in the number of CFG edges,
which can slow down optimization.

To start, front-ends are strongly encouraged to factor all indirect
gotos within a function into a single block, so that there is at most
one indirectbr instruction per function. If the input has multiple
indirect gotos, they should be lowered into branches to the same basic
block containing the indirectbr instruction. Both llvm-gcc and clang
will do this, for example. Besides helping out the optimizer, this helps
avoid having the number of CFG edges grow quadratically in the worst
case.

In the optimizer, transformations which duplicate code, such as
inlining, loop unrolling, loop unswitching, etc. will be expected to
make use of a cost model. This is considered desirable regardless,
as these passes must balance optimization opportunities with
code size increases. There is a common cost model that is being
refactored from the inliner's custom cost model. Currently it lives in
lib/Analysis/InlineCost.cpp, though it is still evolving. Passes which
have their own cost model today will eventually be converted to use
this common one. To help the indirectbr situation, the cost model can
easily flag indirectbr instructions as things that are very expensive
to duplicate, which should effectively prevent the optimizer from
performing transformations that will get it in trouble.

Functions containing indirectbr instructions will not be inlined, at
least for the initial implementation. There are some cases where it
could be done, however there are many where it's complicated, and in
general it's not expected to be worthwhile. GCC never inlines functions
containing indirect gotos either, for comparison.

To eliminate any extra branching, CodeGen will perform tail-duplication
to copy the indirectbr instruction into blocks which would otherwise end
with a branch to the indirectbr instruction.