    libstdc++: Handle encodings in localized chrono formatting [PR109162] · 74b5101c
    Jonathan Wakely authored
    This implements the C++23 paper P2419R2 (Clarify handling of encodings
    in localized formatting of chrono types). The requirement is that when
    the literal encoding is "a Unicode encoding form" and the formatting
    locale uses a different encoding, any locale-specific strings such as
    "août" for std::chrono::August should be converted to the literal
    encoding.
    
    Using the recently-added std::locale::encoding() function we can check
    the locale's encoding and then use iconv if a conversion is needed.
    Because nl_langinfo_l and iconv_open both allocate memory, a naive
    implementation would perform multiple allocations and deallocations for
    every snippet of locale-specific text that needs to be converted to
    UTF-8. To avoid that, a new internal locale::facet is defined to store
    the text_encoding and an iconv_t descriptor, which are then cached in
    the formatting locale. This requires access to the internals of a
    std::locale object in src/c++20/format.cc, so that new file needs to be
    compiled with -fno-access-control, as well as -std=gnu++26 in order to
    use std::text_encoding.
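
The idea of reusing one iconv descriptor can be sketched as follows. This is only an illustration under stated assumptions, not the facet added by this patch: the class name `Utf8Converter` is hypothetical, and it assumes a POSIX iconv whose second parameter is `char**` (as in glibc). The descriptor is opened once in the constructor and reused for every string, so iconv_open is not called per conversion.

```cpp
#include <iconv.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: owns a single iconv descriptor converting from a
// given encoding to UTF-8, and reuses it for every call to convert().
class Utf8Converter
{
public:
  explicit Utf8Converter(const char* from_encoding)
  : cd_(iconv_open("UTF-8", from_encoding))
  {
    if (cd_ == (iconv_t)-1)
      throw std::runtime_error("iconv_open failed");
  }

  Utf8Converter(const Utf8Converter&) = delete;
  Utf8Converter& operator=(const Utf8Converter&) = delete;

  ~Utf8Converter() { iconv_close(cd_); }

  std::string convert(const std::string& in)
  {
    // Reset the descriptor to its initial state before each string.
    iconv(cd_, nullptr, nullptr, nullptr, nullptr);

    std::string out(in.size() * 4 + 4, '\0'); // initial guess, grown on E2BIG
    char* inptr = const_cast<char*>(in.data());
    std::size_t inleft = in.size();
    std::size_t written = 0;
    while (inleft > 0)
      {
        char* outptr = &out[0] + written;
        std::size_t outleft = out.size() - written;
        std::size_t r = iconv(cd_, &inptr, &inleft, &outptr, &outleft);
        written = out.size() - outleft;
        assert(written <= out.size());
        if (r == (std::size_t)-1)
          {
            if (errno == E2BIG)
              {
                out.resize(out.size() * 2); // output buffer too small
                continue;
              }
            throw std::runtime_error("iconv conversion failed");
          }
      }
    out.resize(written);
    return out;
  }

private:
  iconv_t cd_;
};
```

For example, a `Utf8Converter conv("ISO-8859-1")` can convert both "ao\xFBt" and "f\xE9vrier" with the same descriptor. The patch itself additionally caches the descriptor in the formatting locale, which this sketch does not attempt.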
    
    Because the new std::text_encoding and std::locale::encoding() symbols
    are only in the libstdc++exp.a archive, we need to include
    src/c++26/text_encoding.cc in the main library, but not export its
    symbols yet. This means they can be used by the two new functions which
    are exported from the main library.
    
    The encoding conversions are done for C++20, treating it as a DR that
    resolves LWG 3565.
    
    With this change we can increase the value of the __cpp_lib_format macro
    for C++23. The value should be 202207 for P2419R2, but we already
    implement P2510R3 (Formatting pointers) so can use the value 202304.
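
User code could test for the new macro value roughly like this. It is a sketch only, and the function names `format_macro_value` and `format_reports_202304` are hypothetical. The macro is only defined when the library provides std::format for the selected language standard.

```cpp
#include <version>

// Hypothetical helper: the value of __cpp_lib_format, or 0 if undefined.
constexpr long format_macro_value()
{
#ifdef __cpp_lib_format
  return __cpp_lib_format;
#else
  return 0;
#endif
}

// Hypothetical helper: true when the library reports at least 202304.
constexpr bool format_reports_202304()
{
  return format_macro_value() >= 202304L;
}
```

A check against 202207 instead would test for P2419R2 alone, following the values given above.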
    
    libstdc++-v3/ChangeLog:
    
    	PR libstdc++/109162
    	* acinclude.m4 (libtool_VERSION): Update to 6:34:0.
    	* config/abi/pre/gnu.ver: Disambiguate old patterns. Add new
    	GLIBCXX_3.4.34 symbol version and new exports.
    	* configure: Regenerate.
    	* include/bits/chrono_io.h (_ChronoSpec::_M_locale_specific):
    	Add new accessor functions to use a reserved bit in _Spec.
    	(__formatter_chrono::_M_parse): Use _M_locale_specific(true)
    	when chrono-specs contains locale-dependent conversion
    	specifiers.
    	(__formatter_chrono::_M_format): Open iconv descriptor if
    	conversion to UTF-8 will be needed.
    	(__formatter_chrono::_M_write): New function to write a
    	localized string with possible character conversion.
    	(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
    	(__formatter_chrono::_M_p, __formatter_chrono::_M_r)
    	(__formatter_chrono::_M_x, __formatter_chrono::_M_X)
    	(__formatter_chrono::_M_locale_fmt): Use _M_write.
    	* include/bits/version.def (format): Update value.
    	* include/bits/version.h: Regenerate.
    	* include/std/format (_GLIBCXX_P2518R3): Check feature test
    	macro instead of __cplusplus.
    	(basic_format_context): Declare __formatter_chrono as friend.
    	* src/c++20/Makefile.am: Add new file.
    	* src/c++20/Makefile.in: Regenerate.
    	* src/c++20/format.cc: New file.
    	* testsuite/std/time/format_localized.cc: New test.
    	* testsuite/util/testsuite_abi.cc: Add new symbol version.