C++ 20 Range Adaptor

C++ 20 introduces the ranges library for more composable and less error-prone interaction with iterators and containers. In icubaby, we can transform a range of bytes to a specified encoding or convert a sequence of Unicode code units to a different encoding using a single range adaptor.

Range Adaptor Example

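#include <icubaby/icubaby.hpp>

#include <algorithm>  // std::ranges::copy
#include <iterator>   // std::back_inserter
#include <vector>

auto grinning_face() {
  // UTF-32 input: the single code point U+1F600.
  auto const in = std::vector{char32_t{0x1F600}};
  // View of the input transcoded from UTF-32 to UTF-16.
  auto const r = in | icubaby::views::transcode<char32_t, char16_t>;
  // Copy the UTF-16 code units into a container.
  std::vector<char16_t> out;
  std::ranges::copy(r, std::back_inserter(out));
  return out;
}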

This code converts a single Unicode code point, 😀 (U+1F600 GRINNING FACE), from UTF-32 to UTF-16, where it is encoded as two code units: 0xD83D and 0xDE00.
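
The two code units quoted here follow from the standard UTF-16 surrogate-pair arithmetic. As a rough illustration, the sketch below reproduces that calculation by hand. Note that `to_surrogate_pair` is a hypothetical helper written only for this example: it is not part of icubaby, and it assumes its argument is a valid code point in the range U+10000 to U+10FFFF.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of icubaby): splits a supplementary-plane
// code point (U+10000..U+10FFFF) into its UTF-16 surrogate pair.
// The input is assumed to be valid; no range checking is performed.
constexpr std::array<char16_t, 2> to_surrogate_pair(char32_t code_point) {
  // Subtracting 0x10000 leaves a 20-bit value.
  auto const v = static_cast<std::uint32_t>(code_point) - 0x10000U;
  // The upper 10 bits are added to 0xD800 to form the high surrogate.
  auto const high = static_cast<char16_t>(0xD800U + (v >> 10U));
  // The lower 10 bits are added to 0xDC00 to form the low surrogate.
  auto const low = static_cast<char16_t>(0xDC00U + (v & 0x3FFU));
  return {high, low};
}

// U+1F600 GRINNING FACE becomes 0xD83D 0xDE00.
constexpr auto grinning = to_surrogate_pair(char32_t{0x1F600});
static_assert(grinning[0] == 0xD83D);
static_assert(grinning[1] == 0xDE00);
```

For U+1F600 the intermediate value is 0xF600: its upper 10 bits are 0x3D and its lower 10 bits are 0x200, which gives 0xD83D and 0xDE00.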

You can experiment with a working example using Compiler Explorer.

Dissecting the Example

  1. Define the input range:

    auto const in = std::vector{char32_t{0x1F600}};
    

    The input is a container holding our input text, which consists of the single code point U+1F600 GRINNING FACE expressed in UTF-32.

  2. Create a range with a view of our container and pass it to the icubaby transcode range adaptor:

    auto const r = in | icubaby::views::transcode<char32_t, char16_t>;
    

    The first template argument for icubaby::ranges::views::transcode is the encoding of the input text (one of std::byte, icubaby::char8, char16_t, or char32_t). The second argument is the desired encoding of the output text (one of icubaby::char8, char16_t, or char32_t).

    We now have the range r containing the UTF-16 code-units that correspond to the original input text.

  3. The final step is to record the values within the range r. In C++ 20, this can be achieved with the std::ranges::copy() algorithm:

    std::vector<char16_t> out;
    std::ranges::copy(r, std::back_inserter(out));
    

    If you are using the C++ 23 ranges library, you can simplify this even further using std::ranges::to():

    auto const out = r | std::ranges::to<std::vector>();
    

Namespace icubaby::ranges Reference

namespace ranges

icubaby C++ 20 ranges support types.

Variables

template<typename FromEncoding, typename ToEncoding>
constexpr auto max_output_bytes = longest_sequence_v<ToEncoding>

The maximum number of bytes that can be produced by a single code-unit being passed to a transcoder.

template<>
constexpr auto max_output_bytes<char16_t, char32_t> = std::size_t{2}

The maximum number of bytes produced by a single code-unit being passed to a transcoder consuming UTF-16 and producing UTF-32.

template<unicode_input FromEncoding, unicode_char_type ToEncoding, std::ranges::input_range View>
class transcode_view : public std::ranges::view_interface<transcode_view<FromEncoding, ToEncoding, View>>
#include <icubaby.hpp>

A range adaptor for lazily converting between Unicode encodings.

A range adaptor that represents a view of an underlying sequence consisting of Unicode code points in the encoding given by FromEncoding and produces the equivalent code points in the encoding given by ToEncoding.

Template Parameters:
  • FromEncoding – The encoding used by the underlying sequence.

  • ToEncoding – The encoding that will be produced by this range adaptor.

  • View – The type of the underlying view.

namespace views

Functions

template<unicode_input FromEncoding, unicode_char_type ToEncoding, std::ranges::viewable_range Range>
constexpr auto operator|(Range &&range, transcode_range_adaptor<FromEncoding, ToEncoding> const &adaptor)
Template Parameters:
  • FromEncoding – The encoding used by the underlying sequence.

  • ToEncoding – The encoding that will be produced.

  • Range – The type of the range that will be consumed.

Variables

template<unicode_input FromEncoding, unicode_char_type ToEncoding>
constexpr auto transcode = views::transcode_range_adaptor<FromEncoding, ToEncoding>{}
Template Parameters:
  • FromEncoding – The encoding used by the underlying sequence.

  • ToEncoding – The encoding that will be produced.

template<unicode_input FromEncoding, unicode_char_type ToEncoding>
class transcode_range_adaptor
#include <icubaby.hpp>
Template Parameters:
  • FromEncoding – The encoding used by the underlying sequence.

  • ToEncoding – The encoding that will be produced by this adaptor.