  1. Aug 20, 2024
  2. Jan 17, 2024
    • libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318] · df0a668b
      Jonathan Wakely authored
      
      This is another C++26 change, approved in Varna 2023. We require a new
      static array of data that is extracted from the IANA Character Sets
      database. A new Python script to generate a header from the IANA CSV
      file is added.
      
      The text_encoding class is basically just a pointer to an {ID,name} pair
      in the static array. The aliases view is also just the same pointer (or
      empty), and the view's iterator moves forwards and backwards in the
      array while the array elements have the same ID (or to one element
      further, for a past-the-end iterator).
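
The layout described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the libstdc++ code: `Row`, `table`, `find_encoding` and `alias_count` are hypothetical names, and the table holds a tiny hand-picked subset of the IANA data (3 and 106 are the registry's MIBenum values for US-ASCII and UTF-8).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical miniature of the generated table: rows are sorted by ID,
// so every name registered for one encoding sits in adjacent rows.
struct Row {
    int id;
    std::string_view name;
};

inline constexpr Row table[] = {
    {3, "US-ASCII"}, {3, "ANSI_X3.4-1968"}, {3, "ASCII"},
    {106, "UTF-8"}, {106, "csUTF8"},
};
inline constexpr std::size_t table_size = sizeof(table) / sizeof(table[0]);

// In this model an "encoding" is just a pointer to the first row with
// its ID, or null when the ID is not in the table.
constexpr const Row* find_encoding(int id) {
    for (const Row& r : table)
        if (r.id == id)
            return &r;
    return nullptr;
}

// The aliases of an encoding are the run of rows that share its ID.
// Walking forward until the ID changes visits each alias once.
constexpr std::size_t alias_count(const Row* first) {
    if (!first)
        return 0;
    std::size_t n = 0;
    for (const Row* p = first; p != table + table_size && p->id == first->id; ++p)
        ++n;
    return n;
}
```

In this model the same pointer serves as both the encoding's identity and the start of its alias range, which is why no separate per-encoding storage is needed.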
      
      Because those iterators refer to a global array that never goes out of
      scope, there's no reason they should ever produce undefined behaviour
      or indeterminate values.  They should either have well-defined
      behaviour, or abort. The overhead of ensuring those properties is pretty
      low, so it seems worth it.
      
      This means that an aliases_view iterator should never be able to access
      out-of-bounds. A non-value-initialized iterator always points to an
      element of the static array even when not dereferenceable (the array has
      unreachable entries at the start and end, which means that even a
      past-the-end iterator for the last encoding in the array still points to
      valid memory).  Dereferencing an iterator can always return a valid
      array element, or "" for a non-dereferenceable iterator (but doing so
      will abort when assertions are enabled).  In the language being proposed
      for C++26, dereferencing an invalid iterator erroneously returns "".
      Attempting to increment/decrement past the last/first element in the
      view is erroneously a no-op, so aborts when assertions are enabled, and
      doesn't change value otherwise.
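
The bounds-safety argument can be sketched with a hypothetical iterator over a table that has sentinel rows at both ends. All names here (`Row`, `table`, `AliasIter`) are illustrative assumptions, not the identifiers used in libstdc++, and this sketch models only the assertions-disabled behaviour (an invalid dereference yields "" and an out-of-range step is a no-op); it does not abort.

```cpp
#include <cassert>
#include <string_view>

struct Row {
    int id;
    std::string_view name;
};

// Hypothetical table with unreachable sentinel rows (id 0) at the start
// and end, so a past-the-end position always refers to valid memory.
inline constexpr Row table[] = {
    {0, ""},
    {3, "US-ASCII"}, {3, "ASCII"},
    {106, "UTF-8"}, {106, "csUTF8"},
    {0, ""},
};

class AliasIter {
    const Row* pos_;  // always points at some row of `table`
    int id_;          // ID of the encoding whose aliases are being viewed

public:
    // Precondition (assumed): `first` is the first non-sentinel row for its ID.
    constexpr explicit AliasIter(const Row* first) : pos_(first), id_(first->id) {}

    constexpr bool dereferenceable() const { return pos_->id == id_; }

    // A non-dereferenceable iterator yields "" instead of reading a
    // neighbouring encoding's name.
    constexpr std::string_view operator*() const {
        return dereferenceable() ? pos_->name : std::string_view{};
    }

    // Incrementing past the end is a no-op.
    constexpr AliasIter& operator++() {
        if (dereferenceable())
            ++pos_;
        return *this;
    }

    // Decrementing before the first alias is a no-op. The leading sentinel
    // guarantees that pos_[-1] is a valid row to inspect.
    constexpr AliasIter& operator--() {
        if (pos_[-1].id == id_)
            --pos_;
        return *this;
    }
};
```

Because every position the iterator can reach is inside the array, each operation only has to compare IDs to decide whether it is in range.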
      
      Similarly, constructing a std::text_encoding with an invalid id (one
      that doesn't have the value of an enumerator) erroneously behaves the
      same as constructing with id::unknown, or aborts with assertions
      enabled.
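
The fallback for invalid ids might look roughly like the following. This is a sketch only: `Id`, `is_enumerator` and `sanitize` are hypothetical names, the enumeration lists just four of the many values in `std::text_encoding::id`, and the abort-with-assertions path is omitted.

```cpp
#include <cassert>

// Hypothetical, heavily abbreviated stand-in for std::text_encoding::id.
enum class Id : int { other = 1, unknown = 2, ASCII = 3, UTF8 = 106 };

// True only for values that are actual enumerators of Id.
constexpr bool is_enumerator(Id i) {
    switch (i) {
    case Id::other:
    case Id::unknown:
    case Id::ASCII:
    case Id::UTF8:
        return true;
    }
    return false;
}

// Any value that is not an enumerator is treated as Id::unknown.
constexpr Id sanitize(Id i) {
    return is_enumerator(i) ? i : Id::unknown;
}
```

Mapping every out-of-range value to a single well-defined result keeps later table lookups from ever seeing an ID that has no row.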
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/113318
      	* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
      	(GLIBCXX_CHECK_TEXT_ENCODING): Define.
      	* config.h.in: Regenerate.
      	* configure: Regenerate.
      	* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
      	* include/Makefile.am: Add new headers.
      	* include/Makefile.in: Regenerate.
      	* include/bits/locale_classes.h (locale::encoding): Declare new
      	member function.
      	* include/bits/unicode.h (__charset_alias_match): New function.
      	* include/bits/text_encoding-data.h: New file.
      	* include/bits/version.def (text_encoding): Define.
      	* include/bits/version.h: Regenerate.
      	* include/std/text_encoding: New file.
      	* src/Makefile.am: Add new subdirectory.
      	* src/Makefile.in: Regenerate.
      	* src/c++26/Makefile.am: New file.
      	* src/c++26/Makefile.in: New file.
      	* src/c++26/text_encoding.cc: New file.
      	* src/experimental/Makefile.am: Include c++26 convenience
      	library.
      	* src/experimental/Makefile.in: Regenerate.
      	* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
      	printer.
      	* scripts/gen_text_encoding_data.py: New file.
      	* testsuite/22_locale/locale/encoding.cc: New test.
      	* testsuite/ext/unicode/charset_alias_match.cc: New test.
      	* testsuite/std/text_encoding/cons.cc: New test.
      	* testsuite/std/text_encoding/members.cc: New test.
      	* testsuite/std/text_encoding/requirements.cc: New test.
      
      Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
      Reviewed-by: Patrick Palka <ppalka@redhat.com>