    df0a668b
    libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
    Jonathan Wakely authored
    
    This is another C++26 change, approved in Varna 2023. We require a new
    static array of data that is extracted from the IANA Character Sets
    database. A new Python script to generate a header from the IANA CSV
    file is added.
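For a rough picture of what the generated header might hold, here is a hypothetical C++ sketch. It is not the contents of text_encoding-data.h: the names enc_entry, enc_data and find_by_id are made up for illustration, the rows are a small hand-picked subset of IANA names and MIBenum values, and the ordering (sorted by id, primary name first) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical shape of one generated entry: an IANA MIBenum and one name.
struct enc_entry {
  int id;
  const char* name;
};

// Abbreviated, illustrative subset of the IANA registry. Entries are assumed
// to be sorted by id, with each encoding's primary name first and its aliases
// after it, so all names for one id are contiguous.
constexpr enc_entry enc_data[] = {
  {3, "US-ASCII"}, {3, "iso-ir-6"}, {3, "ANSI_X3.4-1968"}, {3, "us"},
  {4, "ISO_8859-1:1987"}, {4, "ISO-8859-1"}, {4, "latin1"},
  {106, "UTF-8"}, {106, "csUTF8"},
};

// Binary search for the first (primary) entry with the given id.
// Returns nullptr if the id does not appear in the table.
inline const enc_entry* find_by_id(int id) {
  const enc_entry* first = std::begin(enc_data);
  const enc_entry* last = std::end(enc_data);
  const enc_entry* it = std::lower_bound(
      first, last, id,
      [](const enc_entry& e, int v) { return e.id < v; });
  return (it != last && it->id == id) ? it : nullptr;
}
```

With a sorted table, looking up an encoding by id is a binary search that lands on its primary name, and the rows that follow it are its aliases.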
    
    The text_encoding class is basically just a pointer to an {ID,name} pair
    in the static array. The aliases view is also just the same pointer (or
    empty), and the view's iterator moves forwards and backwards in the
    array while the array elements have the same ID (or to one element
    further, for a past-the-end iterator).
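A minimal sketch of that representation, assuming a hypothetical {id, name} table like the one above: encoding_sketch, aliases_view and enc_data are illustrative names, not libstdc++ code. To stay short, this view uses raw pointers and yields table entries rather than just names, so it has none of the bounds protection the real iterator aims for.

```cpp
#include <cassert>
#include <iterator>

// Hypothetical generated entry: an IANA MIBenum plus one registered name.
struct enc_entry {
  int id;
  const char* name;
};

// Illustrative subset; all names for one id are assumed to be contiguous,
// with the primary name first.
constexpr enc_entry enc_data[] = {
  {3, "US-ASCII"}, {3, "us"},
  {106, "UTF-8"}, {106, "csUTF8"},
};

// The whole object is a single pointer into the static table.
class encoding_sketch {
  const enc_entry* rep_ = nullptr;  // nullptr: no table entry (unknown)

public:
  encoding_sketch() = default;
  explicit encoding_sketch(const enc_entry* e) : rep_(e) {}

  int mib() const { return rep_ ? rep_->id : 2; }  // 2 is IANA "unknown"
  const char* name() const { return rep_ ? rep_->name : ""; }

  // The aliases "view" holds the same pointer (or nullptr for an empty view).
  // Its end is the first following entry with a different id, or the end
  // of the table.
  struct aliases_view {
    const enc_entry* first;

    const enc_entry* begin() const { return first; }
    const enc_entry* end() const {
      if (!first) return nullptr;
      const enc_entry* last = first;
      while (last != std::end(enc_data) && last->id == first->id) ++last;
      return last;
    }
  };

  aliases_view aliases() const { return aliases_view{rep_}; }
};
```

For example, `encoding_sketch(&enc_data[2])` reports mib() 106, and iterating its aliases() visits "UTF-8" and "csUTF8" before stopping at the end of the table.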
    
    Because those iterators refer to a global array that never goes out of
    scope, there's no reason they should ever produce undefined behaviour
    or indeterminate values.  They should either have well-defined
    behaviour, or abort. The overhead of ensuring those properties is pretty
    low, so it seems worth it.
    
    This means that an aliases_view iterator should never be able to access
    out-of-bounds. A non-value-initialized iterator always points to an
    element of the static array even when not dereferenceable (the array has
    unreachable entries at the start and end, which means that even a
    past-the-end iterator for the last encoding in the array still points to
    valid memory).  Dereferencing an iterator therefore always yields either
    the name from a valid array element, or "" for a non-dereferenceable
    iterator (but doing so will abort when assertions are enabled).  In the
    erroneous behaviour terminology being proposed for C++26, dereferencing
    an invalid iterator erroneously returns "".  Attempting to
    increment/decrement past the last/first element in the view is
    erroneously a no-op, so it aborts when assertions are enabled and leaves
    the iterator unchanged otherwise.
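The iterator rules above can be sketched as follows. This is a hypothetical illustration, not the libstdc++ implementation: enc_table, alias_iter, erroneous() and the SKETCH_ASSERTIONS macro are invented for the example (the macro stands in for building with assertions enabled), and id 0 is assumed to be free for the sentinel rows.

```cpp
#include <cassert>
#include <cstdlib>

struct enc_entry {
  int id;
  const char* name;
};

// Illustrative table with unreachable sentinel rows at both ends (id 0 is
// assumed never to be a real MIBenum), so even the past-the-end position
// for the last real encoding is a valid array element.
constexpr enc_entry enc_table[] = {
  {0, ""},                          // leading sentinel
  {3, "US-ASCII"}, {3, "us"},
  {106, "UTF-8"}, {106, "csUTF8"},
  {0, ""},                          // trailing sentinel
};

// Hypothetical stand-in for "assertions enabled": define SKETCH_ASSERTIONS
// to make erroneous operations abort instead of being harmless.
inline void erroneous() {
#ifdef SKETCH_ASSERTIONS
  std::abort();
#endif
}

// Iterator over the names of one encoding. Precondition of the constructor:
// p points at a real (non-sentinel) row or one past the run of aliases.
class alias_iter {
  const enc_entry* p_ = nullptr;  // nullptr only when value-initialized
  int id_ = 0;                    // id whose aliases are being walked

  bool dereferenceable() const { return p_ && id_ != 0 && p_->id == id_; }

public:
  alias_iter() = default;
  alias_iter(const enc_entry* p, int id) : p_(p), id_(id) {}

  const char* operator*() const {
    if (dereferenceable()) return p_->name;
    erroneous();
    return "";  // erroneous, but still a valid value
  }

  alias_iter& operator++() {
    if (dereferenceable()) ++p_;  // may land on past-the-end (next id or sentinel)
    else erroneous();             // incrementing past-the-end is a no-op
    return *this;
  }

  alias_iter& operator--() {
    // The leading sentinel (id 0) stops this from leaving the array.
    if (p_ && id_ != 0 && (p_ - 1)->id == id_) --p_;
    else erroneous();             // decrementing the first alias is a no-op
    return *this;
  }

  bool operator==(const alias_iter& o) const { return p_ == o.p_; }
  bool operator!=(const alias_iter& o) const { return p_ != o.p_; }
};
```

Because every position an iterator can reach is a real row of the array, the misuse paths only compare ids and never form a pointer outside the table.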
    
    Similarly, constructing a std::text_encoding with an invalid id (one
    that doesn't have the value of an enumerator) erroneously behaves the
    same as constructing with id::unknown, or aborts with assertions
    enabled.
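A sketch of that fallback, under stated assumptions: enc_id is a hypothetical enumeration holding a few IANA MIBenum values (1 other, 2 unknown, 3 US-ASCII, 4 ISO-8859-1, 106 UTF-8), encoding_id_sketch is an illustrative class, and erroneous() with the SKETCH_ASSERTIONS macro stands in for the library's assertion mode.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical subset of the id enumeration (values are IANA MIBenums).
enum class enc_id : int { other = 1, unknown = 2, ASCII = 3, ISOLatin1 = 4, UTF8 = 106 };

// Hypothetical switch: define SKETCH_ASSERTIONS to make erroneous use abort().
inline void erroneous() {
#ifdef SKETCH_ASSERTIONS
  std::abort();
#endif
}

// True only if v is the value of one of the enumerators above.
constexpr bool is_enumerator(int v) {
  switch (v) {
    case 1: case 2: case 3: case 4: case 106: return true;
    default: return false;
  }
}

class encoding_id_sketch {
  enc_id id_;

public:
  explicit encoding_id_sketch(enc_id i) : id_(i) {
    if (!is_enumerator(static_cast<int>(i))) {
      erroneous();            // aborts only when checks are enabled
      id_ = enc_id::unknown;  // otherwise behave as if given id::unknown
    }
  }

  enc_id mib() const { return id_; }
};
```

Here `encoding_id_sketch(static_cast<enc_id>(42)).mib()` yields enc_id::unknown when checks are disabled, while valid enumerators are kept unchanged.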
    
    libstdc++-v3/ChangeLog:
    
    	PR libstdc++/113318
    	* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
    	(GLIBCXX_CHECK_TEXT_ENCODING): Define.
    	* config.h.in: Regenerate.
    	* configure: Regenerate.
    	* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
    	* include/Makefile.am: Add new headers.
    	* include/Makefile.in: Regenerate.
    	* include/bits/locale_classes.h (locale::encoding): Declare new
    	member function.
    	* include/bits/unicode.h (__charset_alias_match): New function.
    	* include/bits/text_encoding-data.h: New file.
    	* include/bits/version.def (text_encoding): Define.
    	* include/bits/version.h: Regenerate.
    	* include/std/text_encoding: New file.
    	* src/Makefile.am: Add new subdirectory.
    	* src/Makefile.in: Regenerate.
    	* src/c++26/Makefile.am: New file.
    	* src/c++26/Makefile.in: New file.
    	* src/c++26/text_encoding.cc: New file.
    	* src/experimental/Makefile.am: Include c++26 convenience
    	library.
    	* src/experimental/Makefile.in: Regenerate.
    	* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
    	printer.
    	* scripts/gen_text_encoding_data.py: New file.
    	* testsuite/22_locale/locale/encoding.cc: New test.
    	* testsuite/ext/unicode/charset_alias_match.cc: New test.
    	* testsuite/std/text_encoding/cons.cc: New test.
    	* testsuite/std/text_encoding/members.cc: New test.
    	* testsuite/std/text_encoding/requirements.cc: New test.
    
    Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
    Reviewed-by: Patrick Palka <ppalka@redhat.com>