Examples

View UTF-32 to UTF-16

This example shows how an array of UTF-32 encoded code points can be converted to UTF-16 using the C++ 20 ranges interface.

view_utf32_to_16.cpp

#include <algorithm>
#include <array>
#include <iostream>
#include <iterator>
#include <vector>

#include "icubaby/icubaby.hpp"

// The ICUBABY_HAVE_RANGES and ICUBABY_HAVE_CONCEPTS macros are true if the
// corresponding features are available in both the compiler and standard
// library.
#if ICUBABY_HAVE_RANGES && ICUBABY_HAVE_CONCEPTS

int main () {
  // The code points to be converted. Here just a single U+1F600 GRINNING FACE
  // emoji but obviously the array could contain many more code points.
  auto const input = std::array{char32_t{0x1F600}};

  // Take the 'input' container and send it lazily through a UTF-32 to UTF-16
  // transcoder.
  auto r = input | icubaby::views::transcode<char32_t, char16_t>;

  // A vector to contain the output UTF-16 code units.
  std::vector<char16_t> out;

  // Copy the transcoded code units from the view to the output vector.
  std::ranges::copy (r, std::back_inserter (out));
}

#else

int main () {
  std::cout << "Sorry, icubaby C++ 20 ranges aren't supported by your build.\n";
}

#endif  // ICUBABY_HAVE_RANGES && ICUBABY_HAVE_CONCEPTS
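
The example above stores its result in `out` without printing it. As a rough illustration of what a UTF-32 to UTF-16 conversion produces, the sketch below encodes a single code point by hand. It is independent of icubaby: `utf16_result` and `encode_utf16` are hypothetical names introduced here, and the function performs no validation, so it is not a substitute for the library's transcoder.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper type: one or two UTF-16 code units for a code point.
struct utf16_result {
  std::array<char16_t, 2> units{};
  std::size_t size = 0;
};

// Encode a single code point as UTF-16. Illustration only: surrogate code
// points and values beyond U+10FFFF are not rejected.
constexpr utf16_result encode_utf16 (char32_t code_point) {
  if (code_point < 0x10000) {
    // Code points in the Basic Multilingual Plane fit in one code unit.
    return {{static_cast<char16_t> (code_point), char16_t{0}}, 1};
  }
  // Supplementary code points become a high/low surrogate pair. The high
  // surrogate carries the top 10 bits of the offset, the low surrogate the
  // bottom 10 bits.
  auto const offset = code_point - 0x10000;
  return {{static_cast<char16_t> (0xD800 + (offset >> 10)),
           static_cast<char16_t> (0xDC00 + (offset & 0x3FF))},
          2};
}

// U+1F600 GRINNING FACE needs two code units.
static_assert (encode_utf16 (char32_t{0x1F600}).size == 2, "surrogate pair");
static_assert (encode_utf16 (char32_t{0x1F600}).units[0] == 0xD83D, "high");
static_assert (encode_utf16 (char32_t{0x1F600}).units[1] == 0xDE00, "low");
```

For U+1F600 this arithmetic gives the surrogate pair 0xD83D 0xDE00, so `out` in the example above would be expected to hold two code units.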

Iterator and Algorithm

The icubaby::iterator class offers a familiar output iterator for using a transcoder. Each code unit from the input encoding is written to the iterator and this in turn writes the output encoding to a second iterator. This enables use of standard algorithms such as std::copy() with the library.

iterator.cpp

#include <iostream>
#include <iterator>
#include <string_view>
#include <vector>

#include "icubaby/icubaby.hpp"

namespace {

// Dump the number of code units and code points within the supplied container.
template <typename Container>
void describe (Container const& container, std::string_view const encoding) {
  auto const code_units = container.size ();
  auto const code_points =
      icubaby::length (std::begin (container), std::end (container));

  std::cout << encoding << " is " << code_units << ' '
            << (code_units == 1U ? "code unit" : "code units") << " and "
            << code_points << ' '
            << (code_points == 1U ? "code point" : "code points") << '\n';
}

}  // end anonymous namespace

int main () {
  // The input: we start with a vector of UTF-8 code units. In this case a
  // single U+1F600 GRINNING FACE code point. Note that we use icubaby::char8
  // here for compatibility with C++ 17. If you always use C++ 20 or later, you
  // can bypass this type and simply use char8_t.
  auto const input = std::vector{
      static_cast<icubaby::char8> (0xF0), static_cast<icubaby::char8> (0x9F),
      static_cast<icubaby::char8> (0x98), static_cast<icubaby::char8> (0x80)};
  describe (input, "UTF-8");

  // A second vector which will contain the UTF-16 output.
  std::vector<char16_t> output;

  // Instantiate a transcoder which can convert from UTF-8 to UTF-16.
  icubaby::t8_16 transcoder;

  // Now an output iterator based on the t8_16 transcoder which can consume
  // UTF-8 input and convert it to UTF-16. It will then emit the results to a
  // second output iterator (`inserter`) which appends those code units to the
  // `output` vector.
  auto inserter = std::back_inserter (output);
  auto output_it = icubaby::iterator{&transcoder, inserter};

  // Loop through the input assigning each UTF-8 code unit to the `output_it`
  // output iterator created above.
  for (auto code_unit : input) {
    *(output_it++) = code_unit;
  }

  // Tell the transcoder that the input has been completely processed.
  (void)transcoder.end_cp (output_it);

  describe (output, "UTF-16");
  std::cout << "Input " << (transcoder.well_formed () ? "was" : "was not")
            << " well formed\n";
}
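
The difference between code units and code points that `describe()` reports can be sketched without the library. The functions below count code points in well-formed UTF-8 and UTF-16 by skipping the code units that cannot start a code point. The names `count_utf8_code_points` and `count_utf16_code_points` are hypothetical, the input is assumed to be well formed, and `icubaby::length` may well be implemented differently.

```cpp
#include <cstddef>
#include <vector>

// Count code points in well-formed UTF-8: every byte that is not a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
inline std::size_t count_utf8_code_points (
    std::vector<unsigned char> const& code_units) {
  std::size_t count = 0;
  for (auto const code_unit : code_units) {
    if ((code_unit & 0xC0U) != 0x80U) {
      ++count;
    }
  }
  return count;
}

// Count code points in well-formed UTF-16: every code unit that is not a
// low surrogate (0xDC00-0xDFFF) starts a new code point.
inline std::size_t count_utf16_code_points (
    std::vector<char16_t> const& code_units) {
  std::size_t count = 0;
  for (auto const code_unit : code_units) {
    if ((code_unit & 0xFC00U) != 0xDC00U) {
      ++count;
    }
  }
  return count;
}
```

Applied to the four UTF-8 bytes 0xF0 0x9F 0x98 0x80 the first function returns 1, and applied to the UTF-16 pair 0xD83D 0xDE00 the second also returns 1: the same single code point occupies four code units in one encoding and two in the other.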

Bytes to UTF-8

This code converts an array of bytes containing the string “Hello World” in big endian UTF-16 (with an initial byte order mark) first to UTF-8 and then to a range of std::uint_least8_t values. Finally, we copy these values to std::cout.

bytes_to_utf8.cpp

#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <iterator>

#include "icubaby/icubaby.hpp"

// The ICUBABY_HAVE_RANGES and ICUBABY_HAVE_CONCEPTS macros are true if the
// corresponding features are available in both the compiler and standard
// library.
#if ICUBABY_HAVE_RANGES && ICUBABY_HAVE_CONCEPTS

int main () {
  // The bytes to be converted. An array here, but this could obviously come
  // from any source such as user input, a file, or a network endpoint. Note
  // that the icubaby transcoder deals with a single byte at a time so we don't
  // need to have the entire input available at any time.
  static std::array const input{
      std::byte{0xFE}, std::byte{0xFF}, std::byte{0x00}, std::byte{'H'},
      std::byte{0x00}, std::byte{'e'},  std::byte{0x00}, std::byte{'l'},
      std::byte{0x00}, std::byte{'l'},  std::byte{0x00}, std::byte{'o'},
      std::byte{0x00}, std::byte{' '},  std::byte{0x00}, std::byte{'W'},
      std::byte{0x00}, std::byte{'o'},  std::byte{0x00}, std::byte{'r'},
      std::byte{0x00}, std::byte{'l'},  std::byte{0x00}, std::byte{'d'},
  };

  // A pipeline where the input array is converted from a series of bytes to a
  // stream of UTF-8 code units and then finally to std::uint_least8_t for
  // display to the user.
  auto range = input | icubaby::views::transcode<std::byte, char8_t> |
               std::views::transform ([] (char8_t code_unit) {
                 return static_cast<std::uint_least8_t> (code_unit);
               });

  // Copy the elements of range directly to `std::cout`.
  (void)std::ranges::copy (
      range, std::ostream_iterator<std::uint_least8_t> (std::cout));
}

#else

int main () {
  std::cout << "Sorry, icubaby C++ 20 ranges aren't supported by your build.\n";
}

#endif  // ICUBABY_HAVE_RANGES && ICUBABY_HAVE_CONCEPTS
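
The input in this example starts with the bytes 0xFE 0xFF, the byte order mark for big endian UTF-16. How icubaby chooses the input encoding is not shown here. The following is only a minimal sketch of byte order mark detection in general: the names `encoding` and `detect_bom` are hypothetical, and UTF-32 is deliberately ignored to keep the code short.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch.
enum class encoding { unknown, utf8, utf16_be, utf16_le };

// Inspect the first bytes of a buffer for a byte order mark.
// Assumption: UTF-32 is not handled, so its little endian mark
// (FF FE 00 00) would be reported as utf16_le.
inline encoding detect_bom (std::byte const* data, std::size_t size) {
  // True if the buffer has a byte at `index` and it equals `value`.
  auto const at = [data, size] (std::size_t index, unsigned value) {
    return index < size && std::to_integer<unsigned> (data[index]) == value;
  };
  if (at (0, 0xEF) && at (1, 0xBB) && at (2, 0xBF)) {
    return encoding::utf8;
  }
  if (at (0, 0xFE) && at (1, 0xFF)) {
    return encoding::utf16_be;
  }
  if (at (0, 0xFF) && at (1, 0xFE)) {
    return encoding::utf16_le;
  }
  return encoding::unknown;
}
```

Calling `detect_bom` on the first bytes of the `input` array above would return `encoding::utf16_be`, after which each following pair of bytes is read most significant byte first.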

Manual Bytes to UTF-8

This code shows how icubaby makes it straightforward to convert a byte array to a sequence of Unicode code units by passing one byte at a time to a transcoder instance. We take the bytes making up the string “Hello World” expressed in big endian UTF-16 (with a byte order mark) and convert them to UTF-8, which is collected in a vector and then written to std::cout.

manual_bytes_to_utf8.cpp

#include <array>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <vector>

#include "icubaby/icubaby.hpp"

int main () {
  // The bytes to be converted. An array here, but this could obviously come
  // from any source such as user input, a file, or a network endpoint. Note
  // that the icubaby transcoder deals with a single byte at a time so we don't
  // need to have the entire input available at any time.
  static std::array const input{
      std::byte{0xFE}, std::byte{0xFF}, std::byte{0x00}, std::byte{'H'},
      std::byte{0x00}, std::byte{'e'},  std::byte{0x00}, std::byte{'l'},
      std::byte{0x00}, std::byte{'l'},  std::byte{0x00}, std::byte{'o'},
      std::byte{0x00}, std::byte{' '},  std::byte{0x00}, std::byte{'W'},
      std::byte{0x00}, std::byte{'o'},  std::byte{0x00}, std::byte{'r'},
      std::byte{0x00}, std::byte{'l'},  std::byte{0x00}, std::byte{'d'},
      std::byte{0x00}, std::byte{'\n'}};

  // A vector to contain the UTF-8 output.
  std::vector<icubaby::char8> output;

  // An output iterator that will append each UTF-8 code unit to the `output`
  // vector.
  auto out_it = std::back_inserter (output);

  // The transcoder instance. We consume bytes (indicating that the transcoder
  // must decide on the input encoding) and emit icubaby::char8 (UTF-8).
  icubaby::transcoder<std::byte, icubaby::char8> transcode;

  // Call the transcoder for each source byte. Output goes to the `out_it`
  // output iterator.
  for (auto b : input) {
    out_it = transcode (b, out_it);
  }

  // Tell the transcoder that it should have received a complete code point.
  // This always happens at the end of the input.
  (void)transcode.end_cp (out_it);

  // Write the output to the console. This example sticks to the ASCII subset of
  // code points, so this should work on most terminals!
  for (auto c : output) {
    std::cout << static_cast<char> (c);
  }
}
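
Passing one byte at a time means that the consumer has to remember partial input between calls. The class below is a minimal sketch of that idea for big endian UTF-16 only. `be16_assembler` is a hypothetical name, and this is not a description of how icubaby is implemented: it simply holds the first byte of each pair until the second arrives.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper: assembles big endian UTF-16 code units from bytes
// supplied one at a time.
class be16_assembler {
public:
  // Consume one byte. Returns a code unit once both of its bytes have
  // arrived, and an empty optional otherwise.
  std::optional<char16_t> operator() (std::byte b) {
    auto const value = std::to_integer<unsigned> (b);
    if (!have_high_) {
      // First byte of a pair: remember it and wait for the second.
      high_ = value;
      have_high_ = true;
      return std::nullopt;
    }
    // Second byte of a pair: combine, most significant byte first.
    have_high_ = false;
    return static_cast<char16_t> ((high_ << 8U) | value);
  }

  // True if the input stopped part-way through a code unit.
  bool partial () const { return have_high_; }

private:
  unsigned high_ = 0;
  bool have_high_ = false;
};
```

Feeding the bytes 0x00 and then 0x48 yields nothing for the first call and the code unit U+0048 ('H') for the second. A check of `partial()` once the input is exhausted plays a role loosely similar to the `end_cp()` call in the example above, in that it reports input that ended mid-sequence.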