Global Register Variables
-------------------------

Global register variables are an interesting GCC extension that is widely used
for code generation of functional languages that have very specific and
frequently used data. The syntax is very simple, but llvm-gcc and clang don't
really support it correctly. The basic idea is very simple: if you declare a
global variable like this:

  register int R1 __asm__ ("esi");

you can access R1 anywhere, and you always get the esi register.

Here is a discussion of how to implement it from LLVMDev:


Chris Lattner clattner at apple.com
Subject: Pinning registers in LLVM
Mon Jun 29 23:46:47 CDT 2009

On Jun 28, 2009, at 11:00 PM, David Terei wrote:

> Hi all,
>
> I'm working on using LLVM as a back-end for an existing compiler (GHC
> Haskell compiler)

Very cool!

> and one of the problems I'm having is pinning a
> global variable to a actual machine register. I've seen mixed
> terminology for this feature/idea, so what I mean by this is that I
> want to be able to put a global variable into a specified hardware
> register.

Lets separate two things here: 1) GCC's implementation of this feature 2) the
semantic/perf effect of doing it.

For 1) GCC implements this feature (with the example code you gave) by
globally changing the allocatable register set for the backend and pinning
the value to the specified physical register. This is really easy for GCC to
do (yay, global variables for everyone, even the backend) and has the "right
effect". However, this implementation is inappropriate in LLVM: if we wanted
to take this approach, we'd have to encode the set of pinned physregs in the
top-level module structure somewhere: this is not impossible, but it is kinda
ugly.

#2 is the more interesting part of this.
Ignoring GCC's implementation of this, the semantic effect of this is that
the calling convention of the functions in the translation unit are changed
(so that the global is guaranteed to be in the specific physreg on
entrance/exit of the function) and the global is guaranteed to be in the
register in inline asms.

Interestingly (to me at least :), there is no guarantee that this value be in
the physreg at a random point in the function. There is no "defined" way to
notice this, so the compiler can cheat and reuse the register if it wants to
(e.g. spilling the temp value to the stack etc). While you could notice this
with a debugger, performance tool, etc, normal code should be fine.

> This declaration should thus reserve that machine register
> for exclusive use by this global variable. This is used in GHC since
> it defines an abstract machine as part of its execution model, with
> this abstract machine consisting of several virtual registers. Due to
> the frequency the virtual registers are accessed it is best for
> performance that they be permanently assigned to a physical machine
> register.

Right. Coming back to "why do this", you want it because it is good for
performance: these values are accessed frequently enough that going to
globals (particularly for PIC code) is too expensive.

> A very simple example C program using this feature:
>
> --------------------------
> #include <stdio.h>
>
> register int R1 __asm__ ("esi");
>
> int main(void)
> {
>     R1 = 3;
>     printf("register: %d\n", R1);
>     R1 *= 2;
>     printf("register: %d\n", R1);
>     return 0;
> }
> --------------------------
>
> llvm-gcc doesn't compile this program correctly, although according to
> the llvm-gcc release notes this extension was first supported by
> llvm-gcc in 1.9.

This program actually works for me if you build with -O, but it looks like it
is an accident that it works :). The implementation in llvm-gcc could
definitely be fixed in this case.
However, the more interesting example wouldn't work: if printf were some
other function and you read ESI in it.

If it were important to me to implement this, I'd implement this extension by
adding a new custom calling convention to the X86 backend that always passed
the first i32 value in ESI and always returned the first i32 value in ESI.
Given that, you could lower the above code to something like this pseudo
code:

{i32,i32} @main(i32 %in_esi) {
  %esi = alloca i32
  store in_esi -> esi
  store 3 -> esi
  esi1 = load esi
  {esi2, dead} = call @printf(esi1, "register: %d\n", esi1);
  store esi2 -> esi
  esi3 = load esi
  esi4 = esi3*2
  store esi4 -> esi
  esi5 = load esi
  {esi6, dead} = call @printf(esi5, "register: %d\n", esi5);
  store esi6 -> esi
  esi7 = load esi
  ret {esi7, 0}
}

Each of printf and main would be marked with the custom CC. After running
mem2reg on this, you'd get:

{i32,i32} @main(i32 %in_esi) {
  {esi2, dead} = call @printf(3, "register: %d\n", 3);
  esi4 = esi2*2
  {esi6, dead} = call @printf(esi4, "register: %d\n", esi4);
  ret {esi6, 0}
}

When lowered at codegen time, the regalloc would trivially eliminate the
copies into/out-of ESI and you'd get the code you desired.

No, I don't know of anyone planning to implement this, but it is conceptually
quite simple :)

-Chris
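

As a rough illustration of the lowering described in the email, the sketch
below models the pinned register in portable C rather than with the GCC
extension itself: every function takes the pinned value as its first parameter
and returns it alongside its ordinary result. The names pinned_ret, print_reg,
and run are hypothetical and come from neither the email nor LLVM; print_reg
stands in for printf under the custom calling convention, and run corresponds
to main after mem2reg.

```c
#include <stdio.h>

/* Hypothetical pair standing in for the {i32,i32} return type:
   the pinned value on exit, plus the function's ordinary result. */
struct pinned_ret {
    int esi;    /* value of the pinned "register" on exit */
    int result; /* ordinary return value */
};

/* Hypothetical stand-in for printf under the custom convention:
   it receives the pinned value first and hands it back unchanged. */
struct pinned_ret print_reg(int in_esi, int value)
{
    struct pinned_ret r;
    r.esi = in_esi;
    r.result = printf("register: %d\n", value);
    return r;
}

/* Counterpart of @main after mem2reg: the pinned value only flows
   through arguments and return values, never through memory. */
struct pinned_ret run(int in_esi)
{
    struct pinned_ret r;
    (void)in_esi;                 /* R1 = 3 overwrites it before any read */
    r = print_reg(3, 3);          /* {esi2, dead} = call @printf(3, ..., 3) */
    int esi4 = r.esi * 2;         /* esi4 = esi2*2 */
    r = print_reg(esi4, esi4);    /* {esi6, dead} = call @printf(esi4, ...) */
    r.result = 0;                 /* ret {esi6, 0} */
    return r;
}
```

Calling run(0) prints the register value twice and returns a pair whose esi
field carries the final pinned value and whose result field is 0. In the
email's scheme the calling convention would assign that first element to ESI;
the struct here only models the data flow.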