//===----------------------------------------------------------------------===// // Aggregates as First Class Values //===----------------------------------------------------------------------===// 4/26/2008 - initial revision LLVM has a distinction between first class values and other types of values. First class values can be returned by instructions, passed to functions, loaded, stored, PHI'd etc. Currently the first class value types are: 1. Integer 2. Floating point 3. Pointer 4. Vector 5. Opaque (which is assumed to eventually resolve to 1-4) The non-first-class types are: 5. Array 6. Structure/Packed Structure 7. Function 8. Void 9. Label In addition to first class values, there is a new wrinkle as of LLVM 2.3: calls and invokes are allowed to return structs directly to support Multiple Return Values. This is currently an odd special case just for calls, and only currently supports structs whose elements are all first class values. This proposal basically suggests that we extend this to make arbitrary structs and arrays first-class values as well. There is a separate proposal to eliminate Void, and Label may eventually also become a first-class type (to directly support taking the address of a basic block). //===----------------------------------------------------------------------===// // The idea The basic idea is simple: structs and arrays become first class values, so they can be loaded, stored, PHI'd, passed/returned by-value to/from calls etc. Because they are first-class "register" values, it is not possible to take their address. Also, I do not propose making arithmetic operators work on these types (you can't add to arrays of double for example). To support creation of these first class aggregates, a new instruction to complement getresult is added, named 'insertvalue' (I'd like to name it 'insertelement' but that is already taken for vectors, and insertelement allows a variable index unlike insertvalue). 'insertvalue' takes two Value operands: a input aggregate to 'update' and an input value to insert into it. It also takes a variadic list of integers that specify the insertion position. The result type of the instruction is always the same type as the first Value operand. For example: %X = insertvalue {i32, i8} %C, i32 %V, 0 ; yields {i32, i8} %Y = insertvalue {float, {i32, i8}} %I, {i32, i8} %X, 1 %Z = insertvalue {float, {i32, i8}} %Y, i32 42, 1, 0 Since we are generalizing getresult, I propose we rename it to 'extractvalue'. It works just like 'getresult' does today, but takes a variadic list of indices: %Q = extractvalue {float, {i32, i8}} %Z, 1, 1 ; Returns i8 The indices do not have types because they are not LLVM Values, they are simple integers like the index for getresult today is. //===----------------------------------------------------------------------===// // Code Generation Code generation for these aggregates is very simple: they are implicitly scalarized and all of the elements are turned into registers. This should happen just like we current split up vectors into smaller vectors (or scalars) when the vector type is not supported by the target. This can all be done in 'SelectionDAGISel' so the majority of the code generator won't even notice. Also, the optimizer will generally decompose these as much as possible, just to remove abstraction and from the code and expose the computation. An interesting subtlty of this is that "padding" in aggregates is not modified by stores of a struct. Also, because elements are scalarized immediately, large aggregates will be as efficient as handling their elements. This means that very large aggregates (those with dozens of elements) will not be efficient: it would probably be better to leave them in memory. This also means that you shouldn't try to emulate memcpy by doing large load and stores: it will be more efficient to use memcpy, which can do bulk transfer instead of element-by-element copies. As far as the ABI goes, I suggest that aggregates be passed just like 'byval' arguments by default. The only difference is that the value is passed and received, instead of a shadow copy of the pointee passed in. //===----------------------------------------------------------------------===// // Benefits The first benefit of this change is that the LLVM IR becomes more orthogonal: right now, calls returning aggregates is a strange special case that the compiler has to handle. If aggregates were first class values, the magic wouldn't be part of calls. This would also allow use to make 'return' more consistent: it would always take zero or one argument (and killing off void would make it always be one) instead of being variadic. The next benefit is that this makes front-end authors lives easier in a few cases. For example, some languages have first-class support for generic tuples and this means that they don't have to track the elements themselves or force the tuples into memory. One specific example of this is handling of complex numbers: Currently clang has a duplicate copy of the scalar code generator that is specialized for complex because "codegen" on an expression needs to return two Value*'s instead of one. llvm-gcc avoids this by forcing things into memory, but this is bad for compile time and performance at -O0. Another benefit is that this may potentially help solve some of the performance issues we suffer with small 'byval' arguments. 'byval' is incredibly important for the case when someone passes a large struct by value (which people do sometimes, without knowing the perf cost), but it produces poor code when the byval argument is a *small* struct. The performance hit comes from the fact that the value is forced into memory (because byval requires its address to be exposed). For small structs that are usually promoted to registers by SRoA (e.g. long double complex on some architectures) this ends up causing a lot of stack traffic that is unneeded and generally disables mem2reg and sroa on values passed to functions. It is actually interesting that byval and first-class aggregates have exactly complementary strengths: the first is great for large structs, the later is great for small ones. The final benefit is that a small collections of optimizations will improve. For example, mem2reg would suddenly subsume a big chunk of the functionality of SRoA, it would be easy to get interprocedural passes propagating information through these arguments (e.g. interprocedural constant prop of elements of the struct), the inliner doesn't need to make a memcpy for these like it does for byval, etc.