//===----------------------------------------------------------------------===//
// Relaxed Floating Point Support
//===----------------------------------------------------------------------===//

8/18/2004

One of the (incredibly important) things that LLVM lacks today is support
for "relaxed" floating point semantics. In particular, LLVM currently
provides 100% IEEE-compliant operators, except in the X86 code generator.
The problem with this is that it simultaneously reduces the precision of
the end result (e.g. by disabling FMA instructions, which elide the
rounding operation between the M and the A) and pessimizes the code by
disabling important optimizations.

For example, LLVM is not allowed to delete operations like "X + 0.0",
because:

1. If X is an SNAN, the operation is supposed to trap.
2. If X is -0.0, the result is 0.0, not -0.0.

In addition, strict floating point addition is commutative but not
associative, which means that many optimizations cannot be performed. For
example, many parallelizing and vectorizing xforms need to change the
order of operations, and even a simple optimization like CSE'ing the two
occurrences of "X + Y" in "X + Y + X + Y" cannot be performed, because the
expression is parsed as "((X + Y) + X) + Y".

Note also that strict support on X86 is very expensive, which is why we
don't provide it (neither does any other compiler I'm aware of, except
some extremely rare Java systems). In particular, GCC doesn't, even with
-ffloat-store (a disturbing half-way hack).

GCC provides relaxed floating point support through the flags -ffast-math,
-funsafe-math-optimizations, -fhonor-{s}nans, and some other flags that no
one really uses (-ffast-math is all most people know about). Using a
global flag like this is a horrible idea, for at least three reasons:

1. Global flags that affect optimizers are horrible and gross.
2. LLVM can link modules produced with different settings of the flag.
3. A single global flag is not adequate to aggressively compile languages
   like Fortran and Java.

The last point is worth explaining a bit more. In Java, for example, you
can have portions of the program (at scope granularity) marked as
requiring "strictfp" semantics, but by default the program uses relaxed
semantics. To correctly compile Java code with a global flag, the compiler
would have to revert to strict semantics for (at least) the entire
function that contains any strictfp code (in practice, the granularity
would probably be even coarser, due to logistical problems).

In Fortran, operations can be marked strict at the source level simply by
placing parentheses around them. For example (from my understanding), the
compiler is free to associate "X + Y + Z" as "(X + Y) + Z" or
"X + (Y + Z)", but if the parentheses are explicit in the source, the
compiler is not allowed to reassociate the expressions.

//===----------------------------------------------------------------------===//
// Proposed solution

My proposed solution to this problem is extremely simple. On a
per-instruction basis, keep track of whether or not a floating point
operation is strict. In particular, we would introduce new add_strict,
sub_strict, ... operations for strict FP support. The existing operations
would be used to represent the "relaxed" operations. This has the nice
side effect of making "add" always associative! Behavior in the face of
SNANs should be defined by the EH bit on the instructions (see the
ExceptionHandlingChanges note).
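As a concrete sketch of the proposal (purely hypothetical, of course,
since neither add_strict nor this exact syntax exists today), the same
source expression could be lowered either way:

  ;; Relaxed form: the optimizer may reassociate and may fold
  ;; X + 0.0 -> X, so the second add below could be deleted outright.
  %t1 = add float %X, %Y
  %t2 = add float %t1, 0.0

  ;; Strict form (proposed): full IEEE semantics, so the add of 0.0 must
  ;; be preserved (it matters for -0.0 inputs, and could trap on SNANs).
  %t3 = add_strict float %X, %Y
  %t4 = add_strict float %t3, 0.0

Because strictness is carried by the instruction itself, the two kinds of
operations can be freely mixed in one function, which is exactly what the
Java and Fortran cases above require.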
As we start to define LLVM intrinsics for the various libm primitive
operations (like sin, cos, sqrt, etc.), we should define them to take a
constant boolean value indicating whether they are strict or not. This
would give us functions like:

  declare F32 %llvm.sinF32(F32 %Val, I1 %isStrict)

  call F32 %llvm.sinF32(F32 1.0, I1 true)    ;; strict
  call F32 %llvm.sinF32(F32 1.0, I1 false)   ;; relaxed

The behavior that we allow for "relaxed" operations needs to be precisely
defined. In particular, it seems that add/mul should become associative,
and we should be able to perform a variety of transforms like
"X + 0.0 -> X".

In the C front-end, if -ffast-math is provided, the CFE would produce all
relaxed operators; if it is not provided, all strict operators are used.
Java/Fortran compilers would generate the appropriate mix of operations,
based on their input program. (A sketch of this mapping appears at the end
of this note.)

Note that if we were to magically make our own C/C++ compilers, we should
definitely default to the equivalent of -ffast-math. Empirically, this
produces faster and more precise code for GCC, and the Intel compiler
defaults its equivalent to on. We may also consider hacking llvmgcc to
default to it being on.

//===----------------------------------------------------------------------===//
// References

This is an interesting paper about implementing strict FP support on Intel
hardware:

http://www.shudo.net/publications/java-hpc2000/shudo-Java4HPC-strictfp.pdf
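To make the front-end mapping concrete, here is a rough sketch (using the
same hypothetical strict opcodes proposed above) of how the CFE might
lower a C expression like "X + Y * Z" under the two modes:

  ;; With -ffast-math: relaxed operators. Reassociation is allowed, and
  ;; forming an FMA from this pair would be legal.
  %m1 = mul float %Y, %Z
  %a1 = add float %X, %m1

  ;; Without -ffast-math: strict operators. Each operation must round
  ;; exactly as IEEE specifies, so no FMA formation and no reassociation.
  %m2 = mul_strict float %Y, %Z
  %a2 = add_strict float %X, %m2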