    c++: Implement C++26 P1854R4 - Making non-encodable string literals ill-formed [PR110341] · 194825f2
    Jakub Jelinek authored
    This paper, voted in as a DR, makes some multi-character literals ill-formed.
    'abcd' stays valid, but e.g. 'á' is newly invalid with the UTF-8 execution
    charset while still valid e.g. with ISO-8859-1, because it is a single
    character which needs 2 bytes to be encoded in UTF-8.
    
    The following patch implements that by checking (only pedantically,
    especially because it is a DR), whenever we'd emit a -Wmultichar warning
    because the character constant has more than one byte in it, whether the
    number of source characters is equal to the number of bytes in the
    multichar string.  If it is, it is a normal multi-character literal and is
    diagnosed normally with -Wmultichar; otherwise at least one of the c-chars
    in the sequence was encoded as 2+ bytes and the literal gets a pedwarn.
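
    For illustration only, the comparison described above can be sketched
    outside of libcpp roughly as follows.  This is not the code in charset.cc:
    count_code_points, classify and Verdict are hypothetical names, and the
    sketch assumes a UTF-8 execution charset and a literal body that has
    already been converted and contains no escape sequences.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not libcpp's count_source_chars: counts the code
// points in a valid UTF-8 byte sequence by counting the bytes that are
// not continuation bytes (10xxxxxx).
constexpr std::size_t count_code_points(std::string_view utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

enum class Verdict { Ordinary, MultiChar, NotEncodable };

// Classifies the body of a character literal (the bytes between the quotes).
constexpr Verdict classify(std::string_view body) {
    const std::size_t bytes = body.size();
    if (bytes <= 1)
        return Verdict::Ordinary;          // one code unit, nothing to diagnose
    // As many source characters as bytes: a plain multi-character literal
    // (the -Wmultichar case).  Fewer characters than bytes: at least one
    // character needed two or more code units.
    return count_code_points(body) == bytes ? Verdict::MultiChar
                                            : Verdict::NotEncodable;
}

static_assert(classify("a") == Verdict::Ordinary);
static_assert(classify("abcd") == Verdict::MultiChar);
static_assert(classify("\xC3\xA1") == Verdict::NotEncodable);   // U+00E1
static_assert(classify("a\xC3\xA1") == Verdict::NotEncodable);  // 'a' + U+00E1
```

    Counting non-continuation bytes is only a shortcut that works for valid
    UTF-8; per the ChangeLog below, the patch itself interprets the token a
    second time as CPP_STRING32 to obtain the character count.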
    
    2023-11-14  Jakub Jelinek  <jakub@redhat.com>
    
    	PR c++/110341
    libcpp/
    	* charset.cc: Implement C++26 P1854R4 - Making non-encodable string
    	literals ill-formed.
    	(one_count_chars, convert_count_chars, count_source_chars): New
    	functions.
    	(narrow_str_to_charconst): Change last arg type from cpp_ttype to
    	const cpp_token *.  For C++ if pedantic and i > 1 in CPP_CHAR
    	interpret token also as CPP_STRING32 and if number of characters
    	in the CPP_STRING32 is larger than number of bytes in CPP_CHAR,
    	pedwarn on it.  Make the diagnostics more detailed.
    	(wide_str_to_charconst): Change last arg type from cpp_ttype to
    	const cpp_token *.  Make the diagnostics more detailed.
    	(cpp_interpret_charconst): Adjust narrow_str_to_charconst and
    	wide_str_to_charconst callers.
    gcc/testsuite/
    	* g++.dg/cpp26/literals1.C: New test.
    	* g++.dg/cpp26/literals2.C: New test.
    	* g++.dg/cpp23/wchar-multi1.C: Adjust expected diagnostic wordings.
    	* g++.dg/cpp23/wchar-multi2.C: Likewise.
    	* gcc.dg/c23-utf8char-3.c: Likewise.
    	* gcc.dg/cpp/charconst-4.c: Likewise.
    	* gcc.dg/cpp/charconst.c: Likewise.
    	* gcc.dg/cpp/if-2.c: Likewise.
    	* gcc.dg/utf16-4.c: Likewise.
    	* gcc.dg/utf32-4.c: Likewise.
    	* g++.dg/cpp1z/utf8-neg.C: Likewise.
    	* g++.dg/cpp2a/ucn2.C: Likewise.
    	* g++.dg/ext/utf16-4.C: Likewise.
    	* g++.dg/ext/utf32-4.C: Likewise.