Transcoder

Transcoder Interface

template<unicode_input FromEncoding, unicode_char_type ToEncoding>
class transcoder

A transcoder takes a sequence of either bytes or Unicode code units (UTF-8, UTF-16, or UTF-32) and converts it to UTF-8, UTF-16, or UTF-32. The output encoding may be the same as the input encoding.

Every specialization of this template (there is one for each input/output combination) supplies the same interface.

Public Types

using input_type = FromEncoding

The type of the code units consumed by this transcoder.

using output_type = ToEncoding

The type of the code units produced by this transcoder.

Public Functions

template<std::output_iterator<output_type> OutputIterator>
OutputIterator operator()(input_type code_unit, OutputIterator dest) noexcept

This member function is the heart of the transcoder. It accepts a single byte or code unit in the input encoding and, once an entire code point has been consumed, produces the equivalent code point expressed in the output encoding. Malformed input is detected and replaced with the Unicode replacement character (U+FFFD REPLACEMENT CHARACTER).

Template Parameters:

OutputIterator – An output iterator type to which values of type transcoder::output_type can be written.

Parameters:
  • code_unit – A byte or code unit in the source encoding.

  • dest – An output iterator to which the output sequence is written.

Returns:

Iterator one past the last element assigned.

template<std::output_iterator<output_type> OutputIterator>
constexpr OutputIterator end_cp(OutputIterator dest) const

Call this function once the entire input sequence has been passed to operator(). It ensures that the sequence did not end with a partial code point.

Template Parameters:

OutputIterator – An output iterator type to which values of type transcoder::output_type can be written.

Parameters:

dest – An output iterator to which the output sequence is written.

Returns:

Iterator one past the last element assigned.

template<std::output_iterator<output_type> OutputIterator>
constexpr iterator<transcoder, OutputIterator> end_cp(iterator<transcoder, OutputIterator> dest)

Call this function once the entire input sequence has been passed to operator(). It ensures that the sequence did not end with a partial code point and flushes any remaining output.

Template Parameters:

OutputIterator – An output iterator type to which values of type transcoder::output_type can be written.

Parameters:

dest – An output iterator to which the output sequence is written.

Returns:

Iterator one past the last element assigned.

constexpr bool well_formed() const noexcept

Indicates whether the input was well formed.

Returns:

True if the input was well formed and false otherwise.

constexpr bool partial() const noexcept

Indicates whether a “partial” code point has been passed to operator().

If true, one or more further code units are required to complete the code point.

Returns:

True if a partial code point has been passed to operator() and false otherwise.

Byte Transcoder Detected Encoding

The “byte transcoder” (transcoder<std::byte, X>) extends the standard transcoder interface with a member function that returns the encoding it has detected in the input stream.

enum class icubaby::encoding

An enumeration representing the encoding detected by transcoder<std::byte, X>.

Values:

enumerator unknown

No encoding has yet been determined.

enumerator utf8

The detected encoding is UTF-8.

enumerator utf16be

The detected encoding is big-endian UTF-16.

enumerator utf16le

The detected encoding is little-endian UTF-16.

enumerator utf32be

The detected encoding is big-endian UTF-32.

enumerator utf32le

The detected encoding is little-endian UTF-32.

Convenience Typedefs

using icubaby::tx_8 = transcoder<std::byte, char8>

A shorter name for the UTF-8 “byte transcoder”, which consumes bytes in an unknown input encoding and produces UTF-8.

using icubaby::tx_16 = transcoder<std::byte, char16_t>

A shorter name for the UTF-16 “byte transcoder”, which consumes bytes in an unknown input encoding and produces UTF-16.

using icubaby::tx_32 = transcoder<std::byte, char32_t>

A shorter name for the UTF-32 “byte transcoder”, which consumes bytes in an unknown input encoding and produces UTF-32.

using icubaby::t8_8 = transcoder<char8, char8>

A shorter name for the UTF-8 to UTF-8 transcoder. Assuming well-formed input, its output is identical to its input.

using icubaby::t8_16 = transcoder<char8, char16_t>

A shorter name for the UTF-8 to UTF-16 transcoder.

using icubaby::t8_32 = transcoder<char8, char32_t>

A shorter name for the UTF-8 to UTF-32 transcoder.

using icubaby::t16_8 = transcoder<char16_t, char8>

A shorter name for the UTF-16 to UTF-8 transcoder.

using icubaby::t16_16 = transcoder<char16_t, char16_t>

A shorter name for the UTF-16 to UTF-16 transcoder. Assuming well-formed input, its output is identical to its input.

using icubaby::t16_32 = transcoder<char16_t, char32_t>

A shorter name for the UTF-16 to UTF-32 transcoder.

using icubaby::t32_8 = transcoder<char32_t, char8>

A shorter name for the UTF-32 to UTF-8 transcoder.

using icubaby::t32_16 = transcoder<char32_t, char16_t>

A shorter name for the UTF-32 to UTF-16 transcoder.

using icubaby::t32_32 = transcoder<char32_t, char32_t>

A shorter name for the UTF-32 to UTF-32 transcoder. Assuming well-formed input, its output is identical to its input.