This version is faster (by about 25% on x86_64) and still works when aggressive optimisation options such as -ffast-math are used.