Older GCC versions (e.g. 4.9.3) seem to define __ARM_FP even when soft-float is used.
Use the vsqrt.f64 and vsqrt.f32 instructions if available.
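
A minimal sketch of how such a guard might look, assuming GCC-style predefined macros. The names HAVE_VSQRT_F64, HAVE_VSQRT_F32, sketch_sqrt, sketch_sqrtf and soft_sqrt are hypothetical and not taken from the original code. It also assumes that GCC defines __SOFTFP__ for -mfloat-abi=soft, so that macro is checked alongside __ARM_FP rather than trusting __ARM_FP alone. The __ARM_FP bit values (0x8 for double, 0x4 for single precision) follow the ACLE convention.

```c
/* Assumption: __SOFTFP__ is defined by GCC when soft-float is selected.
   It is tested in addition to __ARM_FP because older GCC may define
   __ARM_FP even for soft-float builds. */
#if defined(__arm__) && !defined(__SOFTFP__) && defined(__ARM_FP) && (__ARM_FP & 8)
#define HAVE_VSQRT_F64 1   /* double-precision VFP available */
#else
#define HAVE_VSQRT_F64 0
#endif

#if defined(__arm__) && !defined(__SOFTFP__) && defined(__ARM_FP) && (__ARM_FP & 4)
#define HAVE_VSQRT_F32 1   /* single-precision VFP available */
#else
#define HAVE_VSQRT_F32 0
#endif

#if !HAVE_VSQRT_F64 || !HAVE_VSQRT_F32
/* Hypothetical software stand-in (Newton's method) that only keeps this
   sketch self-contained. It is not correctly rounded and is meant for
   finite inputs of moderate magnitude; a real build would call the
   generic C library routine here instead. */
static double soft_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;   /* starting guess >= sqrt(x) */
    int i;

    if (x <= 0.0)
        return 0.0;
    for (i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return r;
}
#endif

double sketch_sqrt(double x)
{
#if HAVE_VSQRT_F64
    /* %P prints the D register name; "w" is a VFP register constraint. */
    __asm__ ("vsqrt.f64 %P0, %P1" : "=w"(x) : "w"(x));
    return x;
#else
    return soft_sqrt(x);
#endif
}

float sketch_sqrtf(float x)
{
#if HAVE_VSQRT_F32
    /* "t" restricts the operand to single-precision registers s0-s31. */
    __asm__ ("vsqrt.f32 %0, %1" : "=t"(x) : "t"(x));
    return x;
#else
    return (float)soft_sqrt((double)x);
#endif
}
```

On a soft-float or non-ARM build both macros evaluate to 0 and the fallback is used, so the inline assembly is only compiled when the FPU instructions are actually usable.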