Explicit Conversion
Let’s try converting a single Unicode emoji character 😀 (U+1F600 GRINNING FACE) expressed as four UTF-8 code units (0xF0, 0x9F, 0x98, 0x80) to UTF-16 (where it is the surrogate pair 0xD83D, 0xDE00).
std::vector<char16_t> out;
auto it = std::back_inserter (out);
icubaby::t8_16 t;
// Feed the four UTF-8 code units to the transcoder one at a time. Each call
// returns the updated output iterator.
for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
  it = t (cu, it);
}
// Mark the end of the input sequence.
it = t.end_cp (it);
The out vector will contain the two UTF-16 code units 0xD83D and 0xDE00.
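As a quick sanity check, this expected output can be verified directly (a minimal sketch using the standard assert macro):

#include <cassert>

assert (out.size () == 2);
assert (out[0] == char16_t{0xD83D});
assert (out[1] == char16_t{0xDE00});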
Dissecting the explicit conversion code
Define where and how the output should be written:
std::vector<char16_t> out;
auto it = std::back_inserter (out);
For the purposes of this example, we write the encoded output to a std::vector<char16_t>. Use the container of your choice!
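For instance, a std::u16string works just as well; only the first two lines of the example change (a sketch):

std::u16string out;
auto it = std::back_inserter (out);  // push_back appends each char16_t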
Create the transcoder instance:
icubaby::t8_16 t;
icubaby::t8_16 is an alias for a specialization of icubaby::transcoder which converts from UTF-8 to UTF-16.
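Spelled out, the alias names the transcoder specialization directly. A sketch of the presumed equivalent, assuming the template arguments are the UTF-8 and UTF-16 code-unit types (char8_t and char16_t in C++20):

// Presumed spelled-out equivalent of icubaby::t8_16 (assumption: the
// template parameters are the input and output code-unit types).
icubaby::transcoder<char8_t, char16_t> t;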
Pass each code unit and the output iterator to the transcoder:
for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
  it = t (cu, it);
}
For each code unit, call icubaby::transcoder::operator().
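Note that each call returns the updated output iterator, which must be passed to the next call, just as the loop above does. Since a multi-byte UTF-8 sequence can only be converted once it is complete, the transcoder may buffer code units and emit nothing for a given call. A sketch of a single, standalone call:

// Consumes one UTF-8 code unit; output (if any) is written through the
// iterator, and the updated iterator is returned.
it = t (0xF0, it);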
Tell the transcoder that we’ve reached the end of the input. This ensures that the sequence didn’t end part way through a code point:
it = t.end_cp (it);
It’s only necessary to make a single call to icubaby::transcoder::end_cp() once all of the input has been fed to the transcoder.
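Putting the pieces together, here is a self-contained version of the example (a sketch: the include path icubaby/icubaby.hpp is an assumption, not taken from this page):

#include <cassert>
#include <initializer_list>
#include <iterator>
#include <vector>

#include "icubaby/icubaby.hpp"  // assumption: the library's include path

int main () {
  std::vector<char16_t> out;
  auto it = std::back_inserter (out);
  icubaby::t8_16 t;
  // U+1F600 GRINNING FACE as four UTF-8 code units.
  for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
    it = t (cu, it);
  }
  it = t.end_cp (it);
  // Expect the UTF-16 surrogate pair for U+1F600.
  assert (out.size () == 2);
  assert (out[0] == char16_t{0xD83D});
  assert (out[1] == char16_t{0xDE00});
}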