- It can represent all 1,114,112 Unicode characters.
- Most C code that deals with strings on a byte-by-byte basis still works, since UTF-8 is fully compatible with 7-bit ASCII.
- Characters usually require fewer than four bytes.
- STRING SORT ORDER IS PRESERVED. IN OTHER WORDS, SORTING UTF-8 STRINGS PER-BYTE YIELDS THE SAME ORDER AS SORTING THEM PER-CHARACTER BY LOGICAL UNICODE VALUE.
- A missing or corrupt byte in transmission can only affect a single character—you can always find the start of the sequence for the next character just by scanning a couple bytes.
- There are no byte-order/endianness issues, since UTF-8 data is a byte stream.
MULTIBYTE-STRING: A string with an encoding that allows more character to be stored with more than one byte.
WIDE-CHARACTER: A string where each character is the same-size (usually 32-bit integer) and represents a unicode code-point.