    libcpp: Handle extended characters in user-defined literal suffix [PR103902] · 1d3e4f4e
    Lewis Hyatt authored
    The PR complains that we do not handle UTF-8 in the suffix for a user-defined
    literal, such as:
    
    bool operator ""_π (unsigned long long);
    
    In fact we don't handle any extended identifier characters there, whether
    UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
    the "" tokens is included, since then the identifier is lexed in the
    "normal" way as its own token. But when it is lexed as part of the string
    token, this is handled in lex_string() with a one-off loop that is not aware
    of extended characters.
    
    This patch fixes it by adding a new function scan_cur_identifier() that can
    be used to lex an identifier while in the middle of lexing another token.
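
    As a rough illustration of the difference described above (not the actual
    libcpp code), the sketch below contrasts an ASCII-only suffix loop with one
    that also accepts the $ sign, UTF-8 bytes, and UCNs. Every name in it is
    hypothetical. The real scan_cur_identifier() works on a cpp_buffer and
    also validates the characters and issues diagnostics, which this sketch
    does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// All names below are hypothetical; this is not the libcpp implementation.

// True for identifier characters from the basic character set.
static bool is_basic_id_char(unsigned char c) {
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

static bool is_hex_digit(unsigned char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

// ASCII-only scan, in the spirit of the old one-off loop: returns how many
// bytes at the start of `s` (the text right after the closing quote) would
// be taken as the literal suffix.
std::size_t scan_suffix_ascii_only(const std::string& s) {
    std::size_t n = 0;
    while (n < s.size() && is_basic_id_char(static_cast<unsigned char>(s[n])))
        ++n;
    return n;
}

// Extended-aware scan: additionally accepts '$', any byte >= 0x80 (part of a
// UTF-8 sequence), and UCNs written as \uXXXX or \UXXXXXXXX.
// Simplifications: no check that the UTF-8 is well formed, that the code
// point is allowed in an identifier, or that the first character is valid.
std::size_t scan_suffix_extended(const std::string& s) {
    std::size_t n = 0;
    while (n < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[n]);
        if (is_basic_id_char(c) || c == '$' || c >= 0x80) {
            ++n;
            continue;
        }
        if (c == '\\' && n + 1 < s.size() &&
            (s[n + 1] == 'u' || s[n + 1] == 'U')) {
            std::size_t digits = (s[n + 1] == 'u') ? 4 : 8;
            if (n + 2 + digits > s.size())
                break;  // truncated UCN: not part of the suffix
            bool all_hex = true;
            for (std::size_t i = 0; i < digits; ++i)
                if (!is_hex_digit(static_cast<unsigned char>(s[n + 2 + i])))
                    all_hex = false;
            if (!all_hex)
                break;
            n += 2 + digits;
            continue;
        }
        break;  // anything else ends the suffix
    }
    return n;
}

// Small usage example; "\xCF\x80" is the UTF-8 encoding of U+03C0 (pi).
void run_examples() {
    assert(scan_suffix_ascii_only("_\xCF\x80 (x)") == 1);  // stops after '_'
    assert(scan_suffix_extended("_\xCF\x80 (x)") == 3);    // '_' plus 2 bytes
    assert(scan_suffix_extended("_\\u03C0;") == 7);        // '_' plus \u03C0
}
```

    With the ASCII-only loop, the suffix of ""_π ends after the underscore,
    so the remaining bytes are left behind as a separate token. That matches
    the symptom in the PR.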
    
    BTW, the other place that has been mis-lexing identifiers is
    lex_identifier_intern(), which is used to implement #pragma push_macro
    and #pragma pop_macro. This does not support extended characters either.
    I will add that in a subsequent patch, because it can't directly reuse the
    new function, but rather needs to lex from a string instead of a cpp_buffer.
    
    With scan_cur_identifier(), we do also correctly warn about bidi and
    normalization issues in the extended identifiers comprising the suffix.
    
    libcpp/ChangeLog:
    
    	PR preprocessor/103902
    	* lex.cc (identifier_diagnostics_on_lex): New function refactoring
    	some common code.
    	(lex_identifier_intern): Use the new function.
    	(lex_identifier): Don't run identifier diagnostics here, rather let
    	the call site do it when needed.
    	(_cpp_lex_direct): Adjust the call sites of lex_identifier ()
    	accordingly.
    	(struct scan_id_result): New struct.
    	(scan_cur_identifier): New function.
    	(create_literal2): New function.
    	(lit_accum::create_literal2): New function.
    	(is_macro): Folded into new function...
    	(maybe_ignore_udl_macro_suffix): ...here.
    	(is_macro_not_literal_suffix): Folded likewise.
    	(lex_raw_string): Handle UTF-8 in UDL suffix via
    	scan_cur_identifier ().
    	(lex_string): Likewise.
    
    gcc/testsuite/ChangeLog:
    
    	PR preprocessor/103902
    	* g++.dg/cpp0x/udlit-extended-id-1.C: New test.
    	* g++.dg/cpp0x/udlit-extended-id-2.C: New test.
    	* g++.dg/cpp0x/udlit-extended-id-3.C: New test.
    	* g++.dg/cpp0x/udlit-extended-id-4.C: New test.