Skip to content
Snippets Groups Projects
  • Jakub Jelinek's avatar
    819bc4f6
    libcpp: Fix valgrind errors on pr88974.c [PR112956] · 819bc4f6
    Jakub Jelinek authored
    On the c-c++-common/cpp/pr88974.c testcase I'm seeing
    ==600549== Conditional jump or move depends on uninitialised value(s)
    ==600549==    at 0x1DD3A05: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:3050)
    ==600549==    by 0x1DBFC7F: _cpp_parse_expr (expr.cc:1392)
    ==600549==    by 0x1DB9471: do_if(cpp_reader*) (directives.cc:2087)
    ==600549==    by 0x1DBB4D8: _cpp_handle_directive (directives.cc:572)
    ==600549==    by 0x1DCD488: _cpp_lex_token (lex.cc:3682)
    ==600549==    by 0x1DD3A97: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:2936)
    ==600549==    by 0x7F7EE4: scan_translation_unit (c-ppoutput.cc:350)
    ==600549==    by 0x7F7EE4: preprocess_file(cpp_reader*) (c-ppoutput.cc:106)
    ==600549==    by 0x7F6235: c_common_init() (c-opts.cc:1280)
    ==600549==    by 0x704C8B: lang_dependent_init (toplev.cc:1837)
    ==600549==    by 0x704C8B: do_compile (toplev.cc:2135)
    ==600549==    by 0x704C8B: toplev::main(int, char**) (toplev.cc:2306)
    ==600549==    by 0x7064BA: main (main.cc:39)
    error.  The problem is that _cpp_lex_direct can leave result->src_loc
    uninitialized in some cases and later on we use that location_t.
    
    _cpp_lex_direct essentially does:
      cppchar_t c;
    ...
      cpp_token *result = pfile->cur_token++;
    
     fresh_line:
      result->flags = 0;
    ...
      if (buffer->need_line)
        {
          if (pfile->state.in_deferred_pragma)
            {
              result->type = CPP_PRAGMA_EOL;
              ... // keeps result->src_loc uninitialized;
              return result;
            }
          if (!_cpp_get_fresh_line (pfile))
            {
              result->type = CPP_EOF;
              if (!pfile->state.in_directive && !pfile->state.parsing_args)
                {
                  result->src_loc = pfile->line_table->highest_line;
                  ...
                }
              ... // otherwise result->src_loc is sometimes uninitialized here
              return result;
            }
          ...
        }
    ...
      result->src_loc = pfile->line_table->highest_line;
    ...
      c = *buffer->cur++;
      switch (c)
        {
    ...
        case '\n':
    ...
          buffer->need_line = true;
          if (pfile->state.in_deferred_pragma)
            {
              result->type = CPP_PRAGMA_EOL;
    ...
              return result;
            }
          goto fresh_line;
    ...
        }
    ...
    So, if _cpp_lex_direct is called without buffer->need_line initially set,
    result->src_loc is always initialized (and actually hundreds of tests rely
    on that exact value it has), even when c == '\n' and we set that flag later
    on and goto fresh_line.  For CPP_PRAGMA_EOL case we have in that case
    separate handling and don't goto.
    But if _cpp_lex_direct is called with buffer->need_line initially set and
    either decide to return a CPP_PRAGMA_EOL token or if getting a new line fails
    for some reason and we return an CPP_ERROR token and we are in directive
    or parsing args state, it is kept uninitialized and can be whatever the
    allocation left it there as.
    
    The following patch attempts to keep the status quo, use value that was
    returned previously if it was initialized (i.e. we went through the
    goto fresh_line; statement in c == '\n' handling) and only initialize
    result->src_loc if it was uninitialized before.
    
    2023-12-13  Jakub Jelinek  <jakub@redhat.com>
    
    	PR preprocessor/112956
    	* lex.cc (_cpp_lex_direct): Initialize c to 0.
    	For CPP_PRAGMA_EOL tokens and if c == 0 also for CPP_EOF
    	set result->src_loc to highest locus.
    819bc4f6
    History
    libcpp: Fix valgrind errors on pr88974.c [PR112956]
    Jakub Jelinek authored
    On the c-c++-common/cpp/pr88974.c testcase I'm seeing
    ==600549== Conditional jump or move depends on uninitialised value(s)
    ==600549==    at 0x1DD3A05: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:3050)
    ==600549==    by 0x1DBFC7F: _cpp_parse_expr (expr.cc:1392)
    ==600549==    by 0x1DB9471: do_if(cpp_reader*) (directives.cc:2087)
    ==600549==    by 0x1DBB4D8: _cpp_handle_directive (directives.cc:572)
    ==600549==    by 0x1DCD488: _cpp_lex_token (lex.cc:3682)
    ==600549==    by 0x1DD3A97: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:2936)
    ==600549==    by 0x7F7EE4: scan_translation_unit (c-ppoutput.cc:350)
    ==600549==    by 0x7F7EE4: preprocess_file(cpp_reader*) (c-ppoutput.cc:106)
    ==600549==    by 0x7F6235: c_common_init() (c-opts.cc:1280)
    ==600549==    by 0x704C8B: lang_dependent_init (toplev.cc:1837)
    ==600549==    by 0x704C8B: do_compile (toplev.cc:2135)
    ==600549==    by 0x704C8B: toplev::main(int, char**) (toplev.cc:2306)
    ==600549==    by 0x7064BA: main (main.cc:39)
    error.  The problem is that _cpp_lex_direct can leave result->src_loc
    uninitialized in some cases and later on we use that location_t.
    
    _cpp_lex_direct essentially does:
      cppchar_t c;
    ...
      cpp_token *result = pfile->cur_token++;
    
     fresh_line:
      result->flags = 0;
    ...
      if (buffer->need_line)
        {
          if (pfile->state.in_deferred_pragma)
            {
              result->type = CPP_PRAGMA_EOL;
              ... // keeps result->src_loc uninitialized;
              return result;
            }
          if (!_cpp_get_fresh_line (pfile))
            {
              result->type = CPP_EOF;
              if (!pfile->state.in_directive && !pfile->state.parsing_args)
                {
                  result->src_loc = pfile->line_table->highest_line;
                  ...
                }
              ... // otherwise result->src_loc is sometimes uninitialized here
              return result;
            }
          ...
        }
    ...
      result->src_loc = pfile->line_table->highest_line;
    ...
      c = *buffer->cur++;
      switch (c)
        {
    ...
        case '\n':
    ...
          buffer->need_line = true;
          if (pfile->state.in_deferred_pragma)
            {
              result->type = CPP_PRAGMA_EOL;
    ...
              return result;
            }
          goto fresh_line;
    ...
        }
    ...
    So, if _cpp_lex_direct is called without buffer->need_line initially set,
    result->src_loc is always initialized (and actually hundreds of tests rely
    on that exact value it has), even when c == '\n' and we set that flag later
    on and goto fresh_line.  For CPP_PRAGMA_EOL case we have in that case
    separate handling and don't goto.
    But if _cpp_lex_direct is called with buffer->need_line initially set and
    either decide to return a CPP_PRAGMA_EOL token or if getting a new line fails
    for some reason and we return an CPP_ERROR token and we are in directive
    or parsing args state, it is kept uninitialized and can be whatever the
    allocation left it there as.
    
    The following patch attempts to keep the status quo, use value that was
    returned previously if it was initialized (i.e. we went through the
    goto fresh_line; statement in c == '\n' handling) and only initialize
    result->src_loc if it was uninitialized before.
    
    2023-12-13  Jakub Jelinek  <jakub@redhat.com>
    
    	PR preprocessor/112956
    	* lex.cc (_cpp_lex_direct): Initialize c to 0.
    	For CPP_PRAGMA_EOL tokens and if c == 0 also for CPP_EOF
    	set result->src_loc to highest locus.