pair handling to be more bulletproof, even with incomplete UTF-8 sequences. Add a check for 4-byte sequences that decode to values outside the valid Unicode range (above U+10FFFF). Add a comment clarifying the check for invalid CESU-8 sequences.
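The three validation cases described above can be illustrated with a minimal Python sketch. It shows a truncated (incomplete) sequence at the end of a buffer, a 4-byte sequence whose decoded value exceeds U+10FFFF, and a 3-byte encoding of a surrogate code point, which is the form CESU-8 uses for each half of a surrogate pair. The function name `decode_utf8_char` and its error behavior are hypothetical and are not taken from the patched code.

```python
def decode_utf8_char(data: bytes, i: int) -> tuple:
    """Hypothetical helper: decode one code point starting at data[i].

    Returns (code_point, byte_length). Raises ValueError for truncated,
    malformed, overlong, out-of-range, or surrogate (CESU-8 style) input.
    """
    b0 = data[i]
    if b0 < 0x80:
        return b0, 1
    if 0xC2 <= b0 <= 0xDF:
        length, cp, min_cp = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        length, cp, min_cp = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        length, cp, min_cp = 4, b0 & 0x07, 0x10000
    else:
        raise ValueError(f"invalid lead byte 0x{b0:02X} at {i}")

    # Incomplete sequence: the buffer ends before all bytes are present.
    if i + length > len(data):
        raise ValueError(f"truncated sequence at {i}")

    for k in range(1, length):
        b = data[i + k]
        if b & 0xC0 != 0x80:
            raise ValueError(f"bad continuation byte at {i + k}")
        cp = (cp << 6) | (b & 0x3F)

    if cp < min_cp:
        raise ValueError(f"overlong encoding at {i}")

    # A 4-byte sequence can carry up to 21 bits (0x1FFFFF), but Unicode
    # ends at U+10FFFF, so larger values must be rejected explicitly.
    if cp > 0x10FFFF:
        raise ValueError(f"code point above U+10FFFF at {i}")

    # CESU-8 encodes a supplementary character as two 3-byte sequences,
    # one per surrogate (U+D800..U+DFFF). These are not valid UTF-8.
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code point (CESU-8 style) at {i}")

    return cp, length
```

For example, `b"\xf4\x90\x80\x80"` decodes to 0x110000 and is rejected by the range check, while `b"\xed\xa0\x80"` decodes to U+D800 and is rejected as a CESU-8 style surrogate.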