    libcpp: Support extended characters for #pragma {push,pop}_macro [PR109704] · 998eb2a1
    Lewis Hyatt authored
    The implementation of #pragma push_macro and #pragma pop_macro has to date
    made use of an ad-hoc function, _cpp_lex_identifier(), which lexes an
    identifier out of a string. When support was added for extended characters
    in identifiers ($, UCNs, or UTF-8), that support was added only for the
    "normal" way of lexing identifiers out of a cpp_buffer (_cpp_lex_direct) and
    not for the ad-hoc way. Consequently, extended identifiers are not usable
    with these pragmas.
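
For background, here is a minimal sketch of what the two pragmas do with a plain ASCII identifier. The macro name LIMIT and the two helper functions are made up for illustration and do not come from the patch or its tests. The example avoids extended characters on purpose, because whether those work depends on whether the compiler includes this change.

```cpp
// Illustrative only: LIMIT and both functions are hypothetical names.
#define LIMIT 10

#pragma push_macro("LIMIT")   // save the current definition of LIMIT
#undef LIMIT
#define LIMIT 99              // temporary override

// Expanded while the override is active.
int limit_while_overridden() { return LIMIT; }

#pragma pop_macro("LIMIT")    // restore the saved definition

// Expanded after the saved definition has been restored.
int limit_after_pop() { return LIMIT; }
```

With this change applied, the intent is that the same pattern also works when the name inside the string contains extended characters.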
    
    The logic for lexing identifiers has become more complicated than it was
    when _cpp_lex_identifier() was written -- it now handles things like \N{}
    escapes in C++, for instance -- and it no longer seems practical to maintain
    a redundant code path for lexing identifiers. Address the issue by changing
    the implementation of #pragma {push,pop}_macro to lex identifiers in the
    expected way, i.e. by pushing a cpp_buffer and lexing the identifier from
    there.
    
    The existing implementation has some quirks because of the ad-hoc parsing
    logic. For example:
    
     #pragma push_macro("X ")
     ...
     #pragma pop_macro("X")
    
    will not restore macro X (note the extra space in the first string). However:
    
     #pragma push_macro("X ")
     ...
     #pragma pop_macro("X ")
    
    actually does successfully restore "X". This is because the key for looking
    up the saved macro on the push stack is the original string passed, so the
    string passed to pop_macro needs to match it exactly. It is not that easy to
    reproduce this logic in the world of extended characters, given that for
    example it should be valid to pass a UCN to push_macro, and the
    corresponding UTF-8 to pop_macro. Given that this aspect of the existing
    behavior seems unintentional and has no tests (and does not match other
    implementations), I opted to make the new logic more straightforward. The
    string passed needs to lex to one token, which must be a valid identifier,
    or else no action is taken and no error is generated. Any diagnostics
    encountered during lexing (e.g., due to a UTF-8 character not permitted to
    appear in an identifier) are also suppressed.
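
The suppression described above follows a scoped-guard pattern, which the ChangeLog below introduces as class cpp_auto_suppress_diagnostics. The following is only a sketch of that general pattern under assumed names: ToyReader, ScopedSuppressDiagnostics, and the handler functions are hypothetical stand-ins, not libcpp's actual types or interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a preprocessor reader; not libcpp's cpp_reader.
struct ToyReader {
  using DiagnosticFn = void (*)(ToyReader &, const std::string &);
  DiagnosticFn on_diagnostic = nullptr;
  std::vector<std::string> reported;  // diagnostics that reached the user
};

// Normal handler: record the message as reported.
inline void record_diagnostic(ToyReader &r, const std::string &msg) {
  r.reported.push_back(msg);
}

// Suppressing handler: silently drop the message.
inline void drop_diagnostic(ToyReader &, const std::string &) {}

// Scoped guard: installs the dropping handler on construction and
// restores the previous handler on destruction.
class ScopedSuppressDiagnostics {
public:
  explicit ScopedSuppressDiagnostics(ToyReader &r)
      : reader_(r), saved_(r.on_diagnostic) {
    reader_.on_diagnostic = drop_diagnostic;
  }
  ~ScopedSuppressDiagnostics() { reader_.on_diagnostic = saved_; }
  ScopedSuppressDiagnostics(const ScopedSuppressDiagnostics &) = delete;
  ScopedSuppressDiagnostics &operator=(const ScopedSuppressDiagnostics &) = delete;

private:
  ToyReader &reader_;
  ToyReader::DiagnosticFn saved_;
};

// Send a diagnostic through whichever handler is currently installed.
inline void emit(ToyReader &r, const std::string &msg) {
  if (r.on_diagnostic)
    r.on_diagnostic(r, msg);
}

// Returns how many diagnostics were recorded as reported.
inline std::size_t demo_suppression() {
  ToyReader r;
  r.on_diagnostic = record_diagnostic;
  emit(r, "before");                    // recorded
  {
    ScopedSuppressDiagnostics guard(r);
    emit(r, "during speculative lex");  // dropped
  }                                     // previous handler restored here
  emit(r, "after");                     // recorded
  return r.reported.size();
}
```

Tying the restore step to a destructor means the previous handler comes back on every exit path from the scope, including early returns.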
    
    It could be nice (for GCC 15) to also add a warning if a pop_macro does not
    match a previous push_macro.
    
    libcpp/ChangeLog:
    
    	PR preprocessor/109704
    	* include/cpplib.h (class cpp_auto_suppress_diagnostics): New class.
    	* errors.cc
    	(cpp_auto_suppress_diagnostics::cpp_auto_suppress_diagnostics): New
    	function.
    	(cpp_auto_suppress_diagnostics::~cpp_auto_suppress_diagnostics): New
    	function.
    	* charset.cc (noop_diagnostic_cb): Remove.
    	(cpp_interpret_string_ranges): Refactor diagnostic suppression logic
    	into new class cpp_auto_suppress_diagnostics.
    	(count_source_chars): Likewise.
    	* directives.cc (cpp_pop_definition): Add cpp_hashnode argument.
    	(lex_identifier_from_string): New static helper function.
    	(push_pop_macro_common): Refactor common logic from
    	do_pragma_push_macro and do_pragma_pop_macro; use
    	lex_identifier_from_string instead of _cpp_lex_identifier.
    	(do_pragma_push_macro): Reimplement using push_pop_macro_common.
    	(do_pragma_pop_macro): Likewise.
    	* internal.h (_cpp_lex_identifier): Remove.
    	* lex.cc (lex_identifier_intern): Remove.
    	(_cpp_lex_identifier): Remove.
    
    gcc/testsuite/ChangeLog:
    
    	PR preprocessor/109704
    	* c-c++-common/cpp/pragma-push-pop-utf8.c: New test.
    	* g++.dg/pch/pushpop-2.C: New test.
    	* g++.dg/pch/pushpop-2.Hs: New test.
    	* gcc.dg/pch/pushpop-2.c: New test.
    	* gcc.dg/pch/pushpop-2.hs: New test.