C++ 20 Range Adaptor
C++ 20 introduces the ranges library for more composable and less error-prone interaction with iterators and containers. In icubaby, we can transform a range of bytes to a specified encoding or convert a sequence of Unicode code units to a different encoding using a single range adaptor.
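Range Adaptor Example
#include <icubaby/icubaby.hpp>

#include <algorithm>  // std::ranges::copy
#include <iterator>   // std::back_inserter
#include <vector>

auto grinning_face() {
  // Input: U+1F600 GRINNING FACE as a single UTF-32 code unit.
  auto const in = std::vector{char32_t{0x1F600}};
  // Lazily transcode UTF-32 to UTF-16.
  auto const r = in | icubaby::views::transcode<char32_t, char16_t>;
  // Copy the resulting UTF-16 code units into a vector.
  std::vector<char16_t> out;
  std::ranges::copy(r, std::back_inserter(out));
  return out;
}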
This code converts a single Unicode code point, 😀 (U+1F600 GRINNING FACE), from UTF-32 to UTF-16, in which it is encoded as two code units: 0xD83D and 0xDE00.
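The two code units follow from the UTF-16 surrogate-pair arithmetic. The sketch below reproduces that calculation by hand, independently of icubaby. The function name `to_surrogate_pair` is made up for illustration and is not part of the library, and the sketch assumes its argument lies in the range U+10000 to U+10FFFF.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not part of icubaby): splits a code point in the
// range U+10000..U+10FFFF into its UTF-16 high and low surrogates.
constexpr std::array<char16_t, 2> to_surrogate_pair(char32_t code_point) {
  // Subtracting 0x10000 leaves a 20-bit value.
  auto const offset = static_cast<std::uint32_t>(code_point) - 0x10000U;
  // The top 10 bits go into the high surrogate, the bottom 10 into the low.
  auto const high = static_cast<char16_t>(0xD800U + (offset >> 10U));
  auto const low = static_cast<char16_t>(0xDC00U + (offset & 0x3FFU));
  return {high, low};
}

// U+1F600 GRINNING FACE encodes as the pair 0xD83D, 0xDE00.
static_assert(to_surrogate_pair(char32_t{0x1F600})[0] == 0xD83D);
static_assert(to_surrogate_pair(char32_t{0x1F600})[1] == 0xDE00);
```

The range adaptor performs the equivalent work for you, one code unit at a time, as the range is iterated.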
You can experiment with a working example using Compiler Explorer.
Dissecting the Example
Define the input range:
auto const in = std::vector{char32_t{0x1F600}};
The input is a container holding our text: the single code point U+1F600 GRINNING FACE, expressed in UTF-32.
Create a range with a view of our container and pass it to the icubaby transcode range adaptor:
auto const r = in | icubaby::views::transcode<char32_t, char16_t>;
The first template argument for icubaby::ranges::views::transcode is the encoding of the input text (one of std::byte, icubaby::char8, char16_t, char32_t); the second argument is the desired encoding of the output text (icubaby::char8, char16_t, char32_t).
We now have the range r containing the UTF-16 code units that correspond to the original input text.
The final step is to record the values within the range r. In C++ 20, this can be achieved with the std::ranges::copy() algorithm:
std::vector<char16_t> out;
std::ranges::copy(r, std::back_inserter(out));
If you are using the C++ 23 ranges library, you can simplify this even further using std::ranges::to():
auto const out = r | std::ranges::to<std::vector>();
Namespace icubaby::ranges Reference
-
namespace ranges
icubaby C++ 20 ranges support types.
Variables
-
template<typename FromEncoding, typename ToEncoding>
constexpr auto max_output_bytes = longest_sequence_v<ToEncoding>
The maximum number of bytes that can be produced by a single code-unit being passed to a transcoder.
-
template<>
constexpr auto max_output_bytes<char16_t, char32_t> = std::size_t{2}
The maximum number of bytes produced by a single code-unit being passed to a transcoder consuming UTF-32.
-
template<unicode_input FromEncoding, unicode_char_type ToEncoding, std::ranges::input_range View>
class transcode_view : public std::ranges::view_interface<transcode_view<FromEncoding, ToEncoding, View>>
- #include <icubaby.hpp>
A range adaptor for lazily converting between Unicode encodings.
A range adaptor that represents a view of an underlying sequence consisting of Unicode code points in the encoding given by FromEncoding and produces the equivalent code points in the encoding given by ToEncoding.
- Template Parameters:
FromEncoding – The encoding used by the underlying sequence.
ToEncoding – The encoding that will be produced by this range adaptor.
View – The type of the underlying view.
-
namespace views
Functions
-
template<unicode_input FromEncoding, unicode_char_type ToEncoding, std::ranges::viewable_range Range>
constexpr auto operator|(Range &&range, transcode_range_adaptor<FromEncoding, ToEncoding> const &adaptor)
- Template Parameters:
FromEncoding – The encoding used by the underlying sequence.
ToEncoding – The encoding that will be produced.
Range – The type of the range that will be consumed.
Variables
-
template<unicode_input FromEncoding, unicode_char_type ToEncoding>
constexpr auto transcode = views::transcode_range_adaptor<FromEncoding, ToEncoding>{}
- Template Parameters:
FromEncoding – The encoding used by the underlying sequence.
ToEncoding – The encoding that will be produced.
-
template<unicode_input FromEncoding, unicode_char_type ToEncoding>
class transcode_range_adaptor
- #include <icubaby.hpp>
- Template Parameters:
FromEncoding – The encoding used by the underlying sequence.
ToEncoding – The encoding that will be produced by this adaptor.
-
template<unicode_input FromEncoding, unicode_char_type ToEncoding, std::ranges::viewable_range Range>
-
template<typename FromEncoding, typename ToEncoding>