Synchronize code style and comments with Arm Optimized Routines, there
are no code changes in this patch. This ensures different projects using
the same code have consistent code style so bug fix patches can be applied
more easily.
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction. It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.
Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x