The vfma.f32|64 z, x, y instruction performs the operation z += x * y without intermediate rounding. The register used for z is both read and written by the instruction. The inline assembly must therefore use the "+" constraint modifier.