    bd5e882c
    diagnostics: escape non-ASCII source bytes for certain diagnostics · bd5e882c
    David Malcolm authored
    
    This patch adds support to GCC's diagnostic subsystem for escaping certain
    bytes and Unicode characters when quoting source code.
    
    Specifically, this patch adds a new flag rich_location::m_escape_on_output
    which is a hint from a diagnostic that non-ASCII bytes in the pertinent
    lines of the user's source code should be escaped when printed.
    
    The patch sets this for the following diagnostics:
    - when complaining about stray bytes in the program (when these
    are non-printable)
    - when complaining about "null character(s) ignored"
    - for -Wnormalized= (and generate source ranges for such warnings)
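As a rough illustration of how such a hint can travel from a diagnostic to the code that prints the source line, here is a minimal sketch. The accessor names escape_on_output_p and set_escape_on_output and the idea of an encoding_rich_location subclass come from the ChangeLog below; the mock_ classes, their signatures, and describe_rendering are assumptions made for this example and are not GCC's actual declarations.

```cpp
#include <string>

// Stand-in for rich_location (assumption: only the escape hint is modeled).
class mock_rich_location
{
public:
  mock_rich_location () : m_escape_on_output (false) {}

  // Query and set the "escape on output" hint.
  bool escape_on_output_p () const { return m_escape_on_output; }
  void set_escape_on_output (bool flag) { m_escape_on_output = flag; }

private:
  bool m_escape_on_output;
};

// Stand-in for encoding_rich_location: a location that always asks for
// the quoted source to be escaped (assumed behavior for this sketch).
class mock_encoding_rich_location : public mock_rich_location
{
public:
  mock_encoding_rich_location () { set_escape_on_output (true); }
};

// Hypothetical printer-side check: choose a rendering based on the hint.
std::string describe_rendering (const mock_rich_location &loc)
{
  return loc.escape_on_output_p () ? "escaped" : "verbatim";
}
```

A diagnostic about encoding problems would construct the second kind of location, while every other diagnostic keeps the verbatim default.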
    
    The escaping is controlled by a new option:
      -fdiagnostics-escape-format=[unicode|bytes]
    
    For example, consider a diagnostic involving a source line containing the
    string "before" followed by the Unicode character U+03C0 ("GREEK SMALL
    LETTER PI", with UTF-8 encoding 0xCF 0x80) followed by the byte 0xBF
    (a stray UTF-8 trailing byte), followed by the string "after", where the
    diagnostic highlights the U+03C0 character.
    
    By default, this line will be printed verbatim to the user when
    reporting a diagnostic at it, as:
    
     beforeπXafter
           ^
    
    (using X for the stray byte to avoid putting invalid UTF-8 in this
    commit message)
    
    If the diagnostic sets the "escape" flag, it will be printed as:
    
     before<U+03C0><BF>after
           ^~~~~~~~
    
    with -fdiagnostics-escape-format=unicode (the default), or as:
    
      before<CF><80><BF>after
            ^~~~~~~~
    
    if the user supplies -fdiagnostics-escape-format=bytes.
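The two escape formats shown above can be approximated with the following self-contained sketch. It is not the code from this patch: escape_format, decode_utf8, and escape_line are hypothetical names, and the sketch assumes that only bytes >= 0x80 are escaped and that a byte which does not start a valid UTF-8 sequence falls back to the <XX> form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical mirror of the two values accepted by the new option.
enum class escape_format { unicode, bytes };

// Try to decode one multi-byte UTF-8 sequence starting at s[i].
// Returns its length in bytes and stores the code point in *cp,
// or returns 0 if the bytes at s[i] are not a valid sequence.
static std::size_t decode_utf8 (const std::string &s, std::size_t i,
                                std::uint32_t *cp)
{
  assert (i < s.size ());
  unsigned char b0 = s[i];
  std::size_t len;
  std::uint32_t c, min;
  if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; c = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0)      { len = 3; c = b0 & 0x0F; min = 0x800; }
  else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; c = b0 & 0x07; min = 0x10000; }
  else
    return 0;  // Stray trailing byte or invalid lead byte.
  if (i + len > s.size ())
    return 0;  // Truncated sequence.
  for (std::size_t k = 1; k < len; k++)
    {
      unsigned char b = s[i + k];
      if ((b & 0xC0) != 0x80)
        return 0;  // Not a continuation byte.
      c = (c << 6) | (b & 0x3F);
    }
  // Reject overlong forms, surrogates, and out-of-range values.
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  *cp = c;
  return len;
}

// Escape every non-ASCII byte of LINE according to FMT.
std::string escape_line (const std::string &line, escape_format fmt)
{
  std::string out;
  char buf[16];
  for (std::size_t i = 0; i < line.size (); )
    {
      unsigned char b = line[i];
      if (b < 0x80)
        {
          out += static_cast<char> (b);  // ASCII passes through unchanged.
          i++;
          continue;
        }
      std::uint32_t cp = 0;
      std::size_t len
        = (fmt == escape_format::unicode) ? decode_utf8 (line, i, &cp) : 0;
      if (len)
        {
          std::snprintf (buf, sizeof buf, "<U+%04X>",
                         static_cast<unsigned> (cp));
          i += len;
        }
      else
        {
          std::snprintf (buf, sizeof buf, "<%02X>", static_cast<unsigned> (b));
          i++;
        }
      out += buf;
    }
  return out;
}
```

Calling escape_line on the example line (written as "before\xCF\x80\xBF" "after" so that the hex escape does not absorb the following letters) reproduces the two renderings quoted above, one per format.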
    
    This only affects how the source is printed; it does not affect
    how column numbers are printed (as per -fdiagnostics-column-unit=
    and -fdiagnostics-column-origin=).
    
    gcc/c-family/ChangeLog:
    	* c-lex.c (c_lex_with_flags): When complaining about non-printable
    	CPP_OTHER tokens, set the "escape on output" flag.
    
    gcc/ChangeLog:
    	* common.opt (fdiagnostics-escape-format=): New.
    	(diagnostics_escape_format): New enum.
    	(DIAGNOSTICS_ESCAPE_FORMAT_UNICODE): New enum value.
    	(DIAGNOSTICS_ESCAPE_FORMAT_BYTES): Likewise.
    	* diagnostic-format-json.cc (json_end_diagnostic): Add
    	"escape-source" attribute.
    	* diagnostic-show-locus.c
    	(exploc_with_display_col::exploc_with_display_col): Replace
    	"tabstop" param with a cpp_char_column_policy and add an "aspect"
    	param.  Use these to compute m_display_col accordingly.
    	(struct char_display_policy): New struct.
    	(layout::m_policy): New field.
    	(layout::m_escape_on_output): New field.
    	(def_policy): New function.
    	(make_range): Update for changes to exploc_with_display_col ctor.
    	(default_print_decoded_ch): New.
    	(width_per_escaped_byte): New.
    	(escape_as_bytes_width): New.
    	(escape_as_bytes_print): New.
    	(escape_as_unicode_width): New.
    	(escape_as_unicode_print): New.
    	(make_policy): New.
    	(layout::layout): Initialize new fields.  Update m_exploc ctor
    	call for above change to ctor.
    	(layout::maybe_add_location_range): Update for changes to
    	exploc_with_display_col ctor.
    	(layout::calculate_x_offset_display): Update for change to
    	cpp_display_width.
    	(layout::print_source_line): Pass policy to
    	cpp_display_width_computation.  Capture cpp_decoded_char when
    	calling process_next_codepoint.  Move printing of source code to
    	m_policy.m_print_cb.
    	(line_label::line_label): Pass in policy rather than context.
    	(layout::print_any_labels): Update for change to line_label ctor.
    	(get_affected_range): Pass in policy rather than context, updating
    	calls to location_compute_display_column accordingly.
    	(get_printed_columns): Likewise, also for cpp_display_width.
    	(correction::correction): Pass in policy rather than tabstop.
    	(correction::compute_display_cols): Pass m_policy rather than
    	m_tabstop to cpp_display_width.
    	(correction::m_tabstop): Replace with...
    	(correction::m_policy): ...this.
    	(line_corrections::line_corrections): Pass in policy rather than
    	context.
    	(line_corrections::m_context): Replace with...
    	(line_corrections::m_policy): ...this.
    	(line_corrections::add_hint): Update to use m_policy rather than
    	m_context.
    	(line_corrections::add_hint): Likewise.
    	(layout::print_trailing_fixits): Likewise.
    	(selftest::test_display_widths): New.
    	(selftest::test_layout_x_offset_display_utf8): Update to use
    	policy rather than tabstop.
    	(selftest::test_one_liner_labels_utf8): Add test of escaping
    	source lines.
    	(selftest::test_diagnostic_show_locus_one_liner_utf8): Update to
    	use policy rather than tabstop.
    	(selftest::test_overlapped_fixit_printing): Likewise.
    	(selftest::test_overlapped_fixit_printing_utf8): Likewise.
    	(selftest::test_overlapped_fixit_printing_2): Likewise.
    	(selftest::test_tab_expansion): Likewise.
    	(selftest::test_escaping_bytes_1): New.
    	(selftest::test_escaping_bytes_2): New.
    	(selftest::diagnostic_show_locus_c_tests): Call the new tests.
    	* diagnostic.c (diagnostic_initialize): Initialize
    	context->escape_format.
    	(convert_column_unit): Update to use default character width policy.
    	(selftest::test_diagnostic_get_location_text): Likewise.
    	* diagnostic.h (enum diagnostics_escape_format): New enum.
    	(diagnostic_context::escape_format): New field.
    	* doc/invoke.texi (-fdiagnostics-escape-format=): New option.
    	(-fdiagnostics-format=): Add "escape-source" attribute to examples
    	of JSON output, and document it.
    	* input.c (location_compute_display_column): Pass in "policy"
    	rather than "tabstop", passing to
    	cpp_byte_column_to_display_column.
    	(selftest::test_cpp_utf8): Update to use cpp_char_column_policy.
    	* input.h (class cpp_char_column_policy): New forward decl.
    	(location_compute_display_column): Pass in "policy" rather than
    	"tabstop".
    	* opts.c (common_handle_option): Handle
    	OPT_fdiagnostics_escape_format_.
    	* selftest.c (temp_source_file::temp_source_file): New ctor
    	overload taking a size_t.
    	* selftest.h (temp_source_file::temp_source_file): Likewise.
    
    gcc/testsuite/ChangeLog:
    	* c-c++-common/diagnostic-format-json-1.c: Add regexp to consume
    	"escape-source" attribute.
    	* c-c++-common/diagnostic-format-json-2.c: Likewise.
    	* c-c++-common/diagnostic-format-json-3.c: Likewise.
    	* c-c++-common/diagnostic-format-json-4.c: Likewise, twice.
    	* c-c++-common/diagnostic-format-json-5.c: Likewise.
    	* gcc.dg/cpp/warn-normalized-4-bytes.c: New test.
    	* gcc.dg/cpp/warn-normalized-4-unicode.c: New test.
    	* gcc.dg/encoding-issues-bytes.c: New test.
    	* gcc.dg/encoding-issues-unicode.c: New test.
    	* gfortran.dg/diagnostic-format-json-1.F90: Add regexp to consume
    	"escape-source" attribute.
    	* gfortran.dg/diagnostic-format-json-2.F90: Likewise.
    	* gfortran.dg/diagnostic-format-json-3.F90: Likewise.
    
    libcpp/ChangeLog:
    	* charset.c (convert_escape): Use encoding_rich_location when
    	complaining about nonprintable unknown escape sequences.
    	(cpp_display_width_computation::cpp_display_width_computation):
    	Pass in policy rather than tabstop.
    	(cpp_display_width_computation::process_next_codepoint): Add "out"
    	param and populate *out if non-NULL.
    	(cpp_display_width_computation::advance_display_cols): Pass NULL
    	to process_next_codepoint.
    	(cpp_byte_column_to_display_column): Pass in policy rather than
    	tabstop.  Pass NULL to process_next_codepoint.
    	(cpp_display_column_to_byte_column): Pass in policy rather than
    	tabstop.
    	* errors.c (cpp_diagnostic_get_current_location): New function,
    	splitting out the logic from...
    	(cpp_diagnostic): ...here.
    	(cpp_warning_at): New function.
    	(cpp_pedwarning_at): New function.
    	* include/cpplib.h (cpp_warning_at): New decl for rich_location.
    	(cpp_pedwarning_at): Likewise.
    	(struct cpp_decoded_char): New.
    	(struct cpp_char_column_policy): New.
    	(cpp_display_width_computation::cpp_display_width_computation):
    	Replace "tabstop" param with "policy".
    	(cpp_display_width_computation::process_next_codepoint): Add "out"
    	param.
    	(cpp_display_width_computation::m_tabstop): Replace with...
    	(cpp_display_width_computation::m_policy): ...this.
    	(cpp_byte_column_to_display_column): Replace "tabstop" param with
    	"policy".
    	(cpp_display_width): Likewise.
    	(cpp_display_column_to_byte_column): Likewise.
    	* include/line-map.h (rich_location::escape_on_output_p): New.
    	(rich_location::set_escape_on_output): New.
    	(rich_location::m_escape_on_output): New.
    	* internal.h (cpp_diagnostic_get_current_location): New decl.
    	(class encoding_rich_location): New.
    	* lex.c (skip_whitespace): Use encoding_rich_location when
    	complaining about null characters.
    	(warn_about_normalization): Generate a source range when
    	complaining about improperly normalized tokens, rather than just a
    	point, and use encoding_rich_location so that the source code
    	is escaped on printing.
    	* line-map.c (rich_location::rich_location): Initialize
    	m_escape_on_output.
    
    Signed-off-by: David Malcolm <dmalcolm@redhat.com>