    libcpp: Add -Winvalid-utf8 warning [PR106655] · 0b8c57ed
    Jakub Jelinek authored
    This patch introduces a new warning, -Winvalid-utf8, similar to the one
    clang now has.  It diagnoses invalid UTF-8 byte sequences not just in
    comments but also in string and character literals and outside of
    them.
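
To make "invalid UTF-8 byte sequence" concrete, here is a minimal standalone C++ sketch of a well-formedness check. It is not the libcpp code from this patch; the names `utf8_sequence_length` and `is_valid_utf8` are hypothetical. It assumes the usual Unicode rules: stray or missing continuation bytes, overlong encodings, UTF-16 surrogates and code points above U+10FFFF are all rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of libcpp.
// Returns the length (1..4) of the well-formed UTF-8 sequence starting at
// s[i], or 0 if the bytes there are not valid UTF-8.  Requires i < s.size().
inline std::size_t utf8_sequence_length(const std::string &s, std::size_t i)
{
  const std::uint8_t b0 = static_cast<std::uint8_t>(s[i]);
  if (b0 < 0x80)
    return 1;                          // plain ASCII

  std::size_t len = 0;
  std::uint32_t cp = 0;                // code point being decoded
  std::uint32_t min_cp = 0;            // smallest value allowed for this length
  if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
  else
    return 0;                          // stray continuation byte or 0xF8..0xFF

  if (s.size() - i < len)
    return 0;                          // sequence truncated by end of input
  for (std::size_t k = 1; k < len; ++k)
    {
      const std::uint8_t b = static_cast<std::uint8_t>(s[i + k]);
      if ((b & 0xC0) != 0x80)
        return 0;                      // expected a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
  if (cp < min_cp)
    return 0;                          // overlong encoding
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                          // UTF-16 surrogate
  if (cp > 0x10FFFF)
    return 0;                          // beyond the Unicode range
  return len;
}

// True if every byte of s belongs to a well-formed UTF-8 sequence.
inline bool is_valid_utf8(const std::string &s)
{
  for (std::size_t i = 0; i < s.size();)
    {
      const std::size_t len = utf8_sequence_length(s, i);
      if (len == 0)
        return false;
      i += len;
    }
  return true;
}
```

With this sketch, `is_valid_utf8("\xC3\xA9")` accepts the two-byte encoding of U+00E9, while `is_valid_utf8("\xC0\x80")` rejects an overlong encoding of U+0000.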
    
    The warning is on by default when -finput-charset=UTF-8 is given
    explicitly and C++23 compilation is requested; if -{,W}pedantic or
    -pedantic-errors is also in effect, it is actually a pedwarn.
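
The default described above can be modelled as a small decision function. This is only a sketch under stated assumptions: `Options`, `Utf8Diag` and `invalid_utf8_diag` are hypothetical names, not libcpp's `cpp_options` or the code in `c_common_post_options`. It also assumes that an explicit -Wno-invalid-utf8 always wins and that an explicit -Winvalid-utf8 outside the C++23 default case yields a plain warning.

```cpp
// Hypothetical model of the relevant option state; not libcpp's real types.
enum class Utf8Diag { Off, Warning, Pedwarn };

struct Options
{
  bool cxx23 = false;                   // C++23 compilation requested
  bool input_charset_explicit = false;  // -finput-charset= given explicitly
  bool input_charset_is_utf8 = false;   // ... and its value is UTF-8
  bool pedantic = false;                // -pedantic, -Wpedantic or -pedantic-errors
  int winvalid_utf8 = -1;               // -1 unset, 0 -Wno-invalid-utf8, 1 -Winvalid-utf8
};

// Decide how invalid UTF-8 would be diagnosed under the assumptions above.
inline Utf8Diag invalid_utf8_diag(const Options &o)
{
  if (o.winvalid_utf8 == 0)
    return Utf8Diag::Off;               // assumed: explicit opt-out always wins

  const bool default_on
    = o.cxx23 && o.input_charset_explicit && o.input_charset_is_utf8;
  if (default_on)
    return o.pedantic ? Utf8Diag::Pedwarn : Utf8Diag::Warning;

  // Outside the default-on case only an explicit -Winvalid-utf8 enables it.
  return o.winvalid_utf8 == 1 ? Utf8Diag::Warning : Utf8Diag::Off;
}
```

In this model, -std=c++23 -pedantic-errors -finput-charset=UTF-8 maps to `Utf8Diag::Pedwarn`, and the same options without any pedantic flag map to `Utf8Diag::Warning`.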
    
    It is on by default only with an explicit -finput-charset=UTF-8
    because, although sources are often UTF-8, they are sometimes in an
    ASCII-compatible single-byte encoding where non-ASCII characters
    appear only in comments.  So having the warning off by default
    otherwise is IMO desirable.  The C++23 pedantic mode for UTF-8
    source code is -std=c++23 -pedantic-errors -finput-charset=UTF-8.
    
    2022-09-01  Jakub Jelinek  <jakub@redhat.com>
    
    	PR c++/106655
    libcpp/
    	* include/cpplib.h (struct cpp_options): Implement C++23
    	P2295R6 - Support for UTF-8 as a portable source file encoding.
    	Add cpp_warn_invalid_utf8 and cpp_input_charset_explicit fields.
    	(enum cpp_warning_reason): Add CPP_W_INVALID_UTF8 enumerator.
    	* init.cc (cpp_create_reader): Initialize cpp_warn_invalid_utf8
    	and cpp_input_charset_explicit.
    	* charset.cc (_cpp_valid_utf8): Adjust function comment.
    	* lex.cc (UCS_LIMIT): Define.
    	(utf8_continuation): New const variable.
    	(utf8_signifier): Move earlier in the file.
    	(_cpp_warn_invalid_utf8, _cpp_handle_multibyte_utf8): New functions.
    	(_cpp_skip_block_comment): Handle -Winvalid-utf8 warning.
    	(skip_line_comment): Likewise.
    	(lex_raw_string, lex_string): Likewise.
    	(_cpp_lex_direct): Likewise.
    gcc/
    	* doc/invoke.texi (-Winvalid-utf8): Document it.
    gcc/c-family/
    	* c.opt (-Winvalid-utf8): New warning.
    	* c-opts.cc (c_common_handle_option) <case OPT_finput_charset_>:
    	Set cpp_opts->cpp_input_charset_explicit.
    	(c_common_post_options): If -finput-charset=UTF-8 is explicit
    	in C++23, enable -Winvalid-utf8 by default and if -pedantic
    	or -pedantic-errors, make it a pedwarn.
    gcc/testsuite/
    	* c-c++-common/cpp/Winvalid-utf8-1.c: New test.
    	* c-c++-common/cpp/Winvalid-utf8-2.c: New test.
    	* c-c++-common/cpp/Winvalid-utf8-3.c: New test.
    	* g++.dg/cpp23/Winvalid-utf8-1.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-2.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-3.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-4.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-5.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-6.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-7.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-8.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-9.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-10.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-11.C: New test.
    	* g++.dg/cpp23/Winvalid-utf8-12.C: New test.