Synchronize code style and comments with Arm Optimized Routines, there
are no code changes in this patch. This ensures different projects using
the same code have consistent code style so bug fix patches can be applied
more easily.
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.547 ULP with inlined fma and fma contraction. It uses
a 1 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1584 bytes.
Note that the math.h header defines log2(x) to be log(x)/Ln2, this is
not changed, so the new code is only used if that macro is suppressed.
Improvements on Cortex-A72:
latency: 2.0x
thruput: 2.2x