Use the native "Count Sign Bits" instruction to implement a fast ffs (find first set) function, and add __rt_ffs support for C28x.
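
For reviewers, here is a rough portable C sketch of how a find-first-set result can be derived from a count-sign-bits result. It is not the code in this change. The helper names `csb32_emulated` and `ffs_via_csb` are hypothetical, and the sketch assumes the instruction reports the number of leading bits equal to the sign bit, minus one (0..31), on a 32-bit value; check that against the C28x instruction set reference.

```c
#include <stdint.h>

/* Hypothetical software stand-in for a count-sign-bits instruction.
 * Assumed semantics: count of leading bits equal to the sign bit,
 * minus one, so the result is in the range 0..31. */
static int csb32_emulated(uint32_t x)
{
    uint32_t sign = x >> 31;   /* value of bit 31 */
    int n = 0;
    int bit;

    /* Count the bits below bit 31 that repeat the sign bit. */
    for (bit = 30; bit >= 0; bit--) {
        if (((x >> bit) & 1u) != sign)
            break;
        n++;
    }
    return n;
}

/* Hypothetical ffs built on the helper above: returns the 1-based
 * position of the lowest set bit, or 0 when value is 0. */
static int ffs_via_csb(uint32_t value)
{
    uint32_t lowest;

    if (value == 0)
        return 0;

    /* Isolate the lowest set bit (two's complement trick). */
    lowest = value & ((uint32_t)0 - value);

    /* Bit 31 alone looks negative, so the sign-bit count is 0;
     * handle it separately. */
    if (lowest & 0x80000000u)
        return 32;

    /* A lone bit k (k < 31) has 31 - k leading zeros, so the
     * count is 30 - k and the 1-based position is 31 - count. */
    return 31 - csb32_emulated(lowest);
}
```

With these assumptions, `ffs_via_csb(0x1)` yields 1, `ffs_via_csb(0x8)` yields 4, and `ffs_via_csb(0)` yields 0, which matches the usual ffs convention. On the target, the loop in `csb32_emulated` would be replaced by the single native instruction.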