Commits · f9642ffe7814396f31203f4366f78a43a01a215c · COBOLworx / gcc-cobol

Sep 03, 2024

Explicitly document that the "counted_by" attribute is only supported in C. · f9642ffe

Qing Zhao authored 6 months ago

The "counted_by" attribute is currently only supported in C.  Mention this
explicitly in the documentation, and issue a warning with -Wattributes when
the "counted_by" attribute is seen in C++.
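
As a rough illustration of the C-only usage, here is a minimal sketch of a flexible array member annotated with counted_by. The struct `int_buf`, the `COUNTED_BY` macro and both helper functions are hypothetical; the attribute is applied only when `__has_attribute` reports it, so the sketch also compiles where it is unavailable.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use counted_by only when the compiler advertises support for it;
   otherwise the macro expands to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

/* Hypothetical type: 'count' holds the number of elements in 'data'. */
struct int_buf {
    size_t count;
    int data[] COUNTED_BY(count);
};

/* Allocate a buffer of n elements; 'count' is set before 'data' is used. */
struct int_buf *int_buf_new(size_t n)
{
    struct int_buf *b = malloc(sizeof *b + n * sizeof b->data[0]);
    if (b == NULL)
        return NULL;
    b->count = n;
    for (size_t i = 0; i < n; i++)
        b->data[i] = (int)i;
    return b;
}

/* Sum the elements, iterating only up to the recorded count. */
long int_buf_sum(const struct int_buf *b)
{
    long sum = 0;
    for (size_t i = 0; i < b->count; i++)
        sum += b->data[i];
    return sum;
}
```

A caller would do `struct int_buf *b = int_buf_new(5);`, read `int_buf_sum(b)` (10 here), then `free(b)`. The design point is that `count` is assigned before any element of `data` is touched, since the attribute ties the array bound to that field.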

gcc/c-family/ChangeLog:

	* c-attribs.cc (handle_counted_by_attribute): Ignore the attribute
	and issue a warning with -Wattributes in C++ for now.

gcc/ChangeLog:

	* doc/extend.texi: Explicitly mention that counted_by is available
	only in C for now.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/flex-array-counted-by.C: New test.
	* g++.dg/ext/flex-array-counted-by-2.C: New test.

c++: support C++11 attributes in C++98 · 3775f71c

Jason Merrill authored 6 months ago

I don't see any reason why we can't allow the [[]] attribute syntax in C++98
mode with a pedwarn just like many other C++11 features.  In fact, we
already do support it in some places in the grammar, but not in places that
check cp_nth_tokens_can_be_std_attribute_p.

Let's also follow the C front-end's lead in only warning about them when
-pedantic.

It still isn't necessary for this function to guard against Objective-C
message passing syntax; we handle that with tentative parsing in
cp_parser_statement, and we don't call this function in that context anyway.

gcc/cp/ChangeLog:

	* parser.cc (cp_nth_tokens_can_be_std_attribute_p): Don't check
	cxx_dialect.
	* error.cc (maybe_warn_cpp0x): Only complain about C++11 attributes
	if pedantic.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/gen-attrs-1.C: Also run in C++98 mode.
	* g++.dg/cpp0x/gen-attrs-11.C: Likewise.
	* g++.dg/cpp0x/gen-attrs-13.C: Likewise.
	* g++.dg/cpp0x/gen-attrs-15.C: Likewise.
	* g++.dg/cpp0x/gen-attrs-75.C: Don't expect C++98 warning after
	__extension__.

PR116080: Fix test suite checks for musttail · 1fad396d

Andi Kleen authored 7 months ago

This is a new attempt to fix PR116080. The previous try was reverted
because it just broke a bunch of tests, hiding the problem.

- musttail behaves differently than tailcall at -O0. Some of the tests
run at -O0, so add separate effective target tests for musttail.
- New effective target tests need to use unique file names
to make dejagnu caching work.
- Change the tests to use the new targets.
- Add an external_musttail test to check for the target's ability
to do tail calls between translation units. This covers some powerpc
ABIs.
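
The new effective targets check whether the compiler can honor a guaranteed tail call. A minimal sketch of such a call, assuming the GNU-style `__attribute__((musttail))` spelling on a return statement; the `MUSTTAIL` macro and `sum_to` are hypothetical, and the macro falls back to a plain return when `__has_attribute` does not report musttail.

```c
/* Use musttail only when the compiler advertises it; otherwise fall back
   to an ordinary return, which may or may not become a tail call. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical example: sum 1..n with an accumulator.  The recursive call
   is in tail position and has the same signature as the caller. */
long sum_to(long n, long acc)
{
    if (n == 0)
        return acc;
    MUSTTAIL return sum_to(n - 1, acc + n);
}
```

`sum_to(100, 0)` returns 5050. Caller and callee share one signature here; a call to a function defined in another translation unit is the case the external_musttail target probes.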

gcc/testsuite/ChangeLog:

	PR testsuite/116080
	* c-c++-common/musttail1.c: Use musttail target.
	* c-c++-common/musttail12.c: Use struct_musttail target.
	* c-c++-common/musttail2.c: Use musttail target.
	* c-c++-common/musttail3.c: Likewise.
	* c-c++-common/musttail4.c: Likewise.
	* c-c++-common/musttail7.c: Likewise.
	* c-c++-common/musttail8.c: Likewise.
	* g++.dg/musttail10.C: Likewise. Replace powerpc checks with
	external_musttail.
	* g++.dg/musttail11.C: Use musttail target.
	* g++.dg/musttail6.C: Use musttail target. Replace powerpc
	checks with external_musttail.
	* g++.dg/musttail9.C: Use musttail target.
	* lib/target-supports.exp: Add musttail, struct_musttail,
	external_musttail targets. Remove optimization for musttail.
	Use unique file names for musttail.

pretty-print: split up pretty_printer::format into subroutines · 07e74798

David Malcolm authored 6 months ago


The body of pretty_printer::format is almost 500 lines long,
mostly comprising two distinct phases.

This patch splits it up so that there are explicit subroutines
for the two different phases, reducing the scope of various
locals, and making it easier to e.g. put a breakpoint on phase 2.

No functional change intended.

gcc/ChangeLog:
	* pretty-print-markup.h (pp_markup::context::context): Drop
	params "buf" and "chunk_idx", initializing m_buf from pp.
	(pp_markup::context::m_chunk_idx): Drop field.
	* pretty-print.cc (pretty_printer::format): Convert param
	from a text_info * to a text_info &.  Split out phase 1
	and phase 2 into subroutines...
	(format_phase_1): New, from pretty_printer::format.
	(format_phase_2): Likewise.
	* pretty-print.h (pretty_printer::format): Convert param
	from a text_info * to a text_info &.
	(pp_format): Update for above change.  Assert that text_info is
	non-null.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

pretty-print: add selftest of pp_format's stack · d0891f3a

David Malcolm authored 6 months ago


gcc/ChangeLog:
	* pretty-print-format-impl.h (pp_formatted_chunks::get_prev): New
	accessor.
	* pretty-print.cc (selftest::push_pp_format): New.
	(ASSERT_TEXT_TOKEN): New macro.
	(selftest::test_pp_format_stack): New test.
	(selftest::pretty_print_cc_tests): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

pretty-print: naming cleanups · 34f01475

David Malcolm authored 6 months ago


This patch is a followup to r15-3311-ge31b6176996567 making some
cleanups to pretty-printing to reflect those changes:
- renaming "chunk_info" to "pp_formatted_chunks"
- renaming "cur_chunk_array" to "m_cur_formatted_chunks"
- rewording/clarifying comments
and taking the opportunity to add a "m_" prefix to all fields of
output_buffer.

No functional change intended.

gcc/analyzer/ChangeLog:
	* analyzer-logging.cc (logger::logger): Prefix all output_buffer
	fields with "m_".

gcc/c-family/ChangeLog:
	* c-ada-spec.cc (dump_ada_node): Prefix all output_buffer fields
	with "m_".
	* c-pretty-print.cc (pp_c_integer_constant): Likewise.
	(pp_c_integer_constant): Likewise.
	(pp_c_floating_constant): Likewise.
	(pp_c_fixed_constant): Likewise.

gcc/c/ChangeLog:
	* c-objc-common.cc (print_type): Prefix all output_buffer fields
	with "m_".

gcc/cp/ChangeLog:
	* error.cc (type_to_string): Prefix all output_buffer fields with
	"m_".
	(append_formatted_chunk): Likewise.  Rename "chunk_info" to
	"pp_formatted_chunks" and field cur_chunk_array to
	m_cur_formatted_chunks.

gcc/fortran/ChangeLog:
	* error.cc (gfc_move_error_buffer_from_to): Prefix all
	output_buffer fields with "m_".
	(gfc_diagnostics_init): Likewise.

gcc/ChangeLog:
	* diagnostic.cc (diagnostic_set_caret_max_width): Prefix all
	output_buffer fields with "m_".
	* dumpfile.cc (emit_any_pending_textual_chunks): Likewise.
	(emit_any_pending_textual_chunks): Likewise.
	* gimple-pretty-print.cc (gimple_dump_bb_buff): Likewise.
	* json.cc (value::dump): Likewise.
	* pretty-print-format-impl.h (class chunk_info): Rename to...
	(class pp_formatted_chunks): ...this.  Add friend
	class output_buffer.  Update comment near end of decl to show
	the pp_formatted_chunks instance on the chunk_obstack.
	(pp_formatted_chunks::pop_from_output_buffer): Delete decl.
	(pp_formatted_chunks::on_begin_quote): Delete decl that should
	have been removed in r15-3311-ge31b6176996567.
	(pp_formatted_chunks::on_end_quote): Likewise.
	(pp_formatted_chunks::m_prev): Update for renaming.
	* pretty-print.cc (output_buffer::output_buffer): Prefix all
	fields with "m_".  Rename "cur_chunk_array" to
	"m_cur_formatted_chunks".
	(output_buffer::~output_buffer): Prefix all fields with "m_".
	(output_buffer::push_formatted_chunks): New.
	(output_buffer::pop_formatted_chunks): New.
	(pp_write_text_to_stream): Prefix all output_buffer fields with
	"m_".
	(pp_write_text_as_dot_label_to_stream): Likewise.
	(pp_write_text_as_html_like_dot_to_stream): Likewise.
	(chunk_info::append_formatted_chunk): Rename to...
	(pp_formatted_chunks::append_formatted_chunk): ...this.
	(chunk_info::pop_from_output_buffer): Delete.
	(pretty_printer::format): Update leading comment to mention
	pushing pp_formatted_chunks, and to reflect changes in
	r15-3311-ge31b6176996567.  Prefix all output_buffer fields with
	"m_".
	(pp_output_formatted_text): Update leading comment to mention
	popping a pp_formatted_chunks, and to reflect the changes in
	r15-3311-ge31b6176996567.  Prefix all output_buffer fields with
	"m_" and rename "cur_chunk_array" to "m_cur_formatted_chunks".
	Replace call to chunk_info::pop_from_output_buffer with a call to
	output_buffer::pop_formatted_chunks.
	(pp_flush): Prefix all output_buffer fields with "m_".
	(pp_really_flush): Likewise.
	(pp_clear_output_area): Likewise.
	(pp_append_text): Likewise.
	(pretty_printer::remaining_character_count_for_line): Likewise.
	(pp_newline): Likewise.
	(pp_character): Likewise.
	(pp_markup::context::push_back_any_text): Likewise.
	* pretty-print.h (class chunk_info): Rename to...
	(class pp_formatted_chunks): ...this.
	(class output_buffer): Delete unimplemented rule-of-5 members.
	(output_buffer::push_formatted_chunks): New decl.
	(output_buffer::pop_formatted_chunks): New decl.
	(output_buffer::formatted_obstack): Rename to...
	(output_buffer::m_formatted_obstack): ...this.
	(output_buffer::chunk_obstack): Rename to...
	(output_buffer::m_chunk_obstack): ...this.
	(output_buffer::obstack): Rename to...
	(output_buffer::m_obstack): ...this.
	(output_buffer::cur_chunk_array): Rename to...
	(output_buffer::m_cur_formatted_chunks): ...this.
	(output_buffer::stream): Rename to...
	(output_buffer::m_stream): ...this.
	(output_buffer::line_length): Rename to...
	(output_buffer::m_line_length): ...this.
	(output_buffer::digit_buffer): Rename to...
	(output_buffer::m_digit_buffer): ...this.
	(output_buffer::flush_p): Rename to...
	(output_buffer::m_flush_p): ...this.
	(output_buffer_formatted_text): Prefix all output_buffer fields
	with "m_".
	(output_buffer_append_r): Likewise.
	(output_buffer_last_position_in_text): Likewise.
	(pretty_printer::set_output_stream): Likewise.
	(pp_scalar): Likewise.
	(pp_wide_int): Likewise.
	* tree-pretty-print.cc (dump_generic_node): Likewise.
	(dump_generic_node): Likewise.
	(pp_double_int): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: add fixed test [PR109095] · 5f3a6e26

Marek Polacek authored 6 months ago

Fixed by r13-6693.

	PR c++/109095

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/nontype-class66.C: New test.

Zen5 tuning part 4: update reassocation width · f0ab3de6

Jan Hubicka authored 6 months ago

Zen5 has 6 instead of 4 ALUs, and integer multiplication can now execute in
3 of them.  The FP units can do 2 additions and 2 multiplications, with
latencies of 2 and 3 respectively.  This patch updates the reassociation width
accordingly.  This has the potential to increase register pressure, but unlike
when benchmarking znver1 tuning, I did not notice this actually causing problems
on SPEC, so this patch bumps up the reassociation width to 6 for everything
except integer vectors, where there are 4 units with a typical latency of 1.
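
To picture what the reassociation width controls, consider a chain of additions. The sketch below hand-writes two shapes: a linear chain of 11 dependent additions, and a version regrouped into 6 independent partial sums, roughly the kind of tree a width of 6 permits. It is an illustration, not actual GCC output, and the function names are hypothetical. For integers the regrouping is exact (absent overflow); for floating point, GCC only reassociates when -fassociative-math (for example via -Ofast) allows it.

```c
/* Linear chain: every addition waits for the previous one (depth 11). */
long sum12_linear(const long v[12])
{
    long s = v[0];
    for (int i = 1; i < 12; i++)
        s += v[i];
    return s;
}

/* Regrouped: 6 independent additions first, then a short combining tree
   (depth 4), so up to 6 ALUs can work in parallel on the first level. */
long sum12_width6(const long v[12])
{
    long p0 = v[0] + v[6], p1 = v[1] + v[7], p2 = v[2] + v[8];
    long p3 = v[3] + v[9], p4 = v[4] + v[10], p5 = v[5] + v[11];
    return ((p0 + p1) + (p2 + p3)) + (p4 + p5);
}
```

The wider tree needs more live values at once, which is the register-pressure cost mentioned above.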

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_reassociation_width): Update for Znver5.
	* config/i386/x86-tune-costs.h (znver5_costs): Update reassociation
	widths.

Drop file that should not have been committed. · 36f63000

Jeff Law authored 6 months ago

	* J: Drop file that should not have been committed.

Zen5 tuning part 3: fix typo in previous patch · 910e1769

Jan Hubicka authored 6 months ago

gcc/ChangeLog:

	* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Fix
	typo.

libstdc++: Fix error handling in fs::hard_link_count for Windows · 71b1639c

Jonathan Wakely authored 6 months ago

The recent change to use auto_win_file_handle for
std::filesystem::hard_link_count caused a regression. The
std::error_code argument should be cleared if no error occurs, but this
no longer happens. Add a call to ec.clear() in fs::hard_link_count to
fix this.

Also change the auto_win_file_handle class to take a reference to the
std::error_code and set it if an error occurs, to slightly simplify the
control flow in the fs::equiv_files function.
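
The contract being restored is the usual one for error_code overloads: on success the error must be cleared, not left holding a stale value. A rough C analogue for POSIX systems, assuming `stat()` and `st_nlink` as the link-count source; `hard_link_count` here is a hypothetical function, not the libstdc++ implementation (which on Windows goes through auto_win_file_handle).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical analogue of fs::hard_link_count(path, ec): returns the link
   count, or UINTMAX_MAX on failure.  '*err' receives errno on failure and
   is cleared on success, like the ec.clear() call the fix adds. */
uintmax_t hard_link_count(const char *path, int *err)
{
    struct stat st;
    if (stat(path, &st) != 0) {
        *err = errno;
        return UINTMAX_MAX;
    }
    *err = 0; /* clear any stale error from a previous call */
    return (uintmax_t)st.st_nlink;
}
```

Calling it with the same `err` variable first on a missing path and then on an existing one leaves `err` at 0 after the second call, which is the behavior the regression broke.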

libstdc++-v3/ChangeLog:

	* src/c++17/fs_ops.cc (auto_win_file_handle): Add error_code&
	member and set it if CreateFileW or GetFileInformationByHandle
	fails.
	(fs::equiv_files) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Simplify
	control flow.
	(fs::hard_link_count) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Clear ec
	on success.
	* testsuite/27_io/filesystem/operations/hard_link_count.cc:
	Check error handling.

libstdc++: Specialize std::disable_sized_sentinel_for for std::move_iterator [PR116549] · 819deae0

Jonathan Wakely authored 6 months ago

LWG 3736 added a partial specialization of this variable template for
two std::move_iterator types. This is needed for the case where the
types satisfy std::sentinel_for and are subtractable, but do not model
the semantics requirements of std::sized_sentinel_for.

libstdc++-v3/ChangeLog:

	PR libstdc++/116549
	* include/bits/stl_iterator.h (disable_sized_sentinel_for):
	Define specialization for two move_iterator types, as per LWG
	3736.
	* testsuite/24_iterators/move_iterator/lwg3736.cc: New test.

Dump whether a SLP node represents load/store-lanes · ef0c4482

Richard Biener authored 6 months ago

This makes it easier to discover whether SLP load or store nodes
participate in load/store-lanes accesses.

	* tree-vect-slp.cc (vect_print_slp_tree): Annotate load
	and store-lanes nodes.

Fix missed peeling for gaps with SLP load-lanes · bd120de1

Richard Biener authored 6 months ago

The following stops using smaller vector accesses to avoid peeling for
gaps when load-lanes is used.

	* tree-vect-stmts.cc (get_group_load_store_type): Only disable
	peeling for gaps by using smaller vectors when not using
	load-lanes.

Zen5 tuning part 3: scheduler tweaks · e2125a60

Jan Hubicka authored 6 months ago

This patch adds support for the new fusion in znver5 documented in the
optimization manual:

   The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
   with certain ALU instructions. The following conditions need to be met for
   fusion to happen:
     - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
     - The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
     - The ALU instruction may source only registers or immediate data. There cannot be any memory source.
     - The ALU instruction sources either the source or dest of MOV instruction.
     - If ALU instruction has 2 reg sources, they should be different.
     - The following ALU instructions can fuse with an older qualified MOV instruction:
       ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
       (I assume OP is OR)

I also increased the issue rate from 4 to 6.  Theoretically znver5 can do more,
but with our model we can't really use it.  Increasing the issue rate to 8 leads
to an infinite loop in the scheduler.

Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier zens too).

The new fusion pattern moves quite a few instructions around in common code:
@@ -2210,13 +2210,13 @@
        .cfi_offset 3, -32
        leaq    63(%rsi), %rbx
        movq    %rbx, %rbp
+       shrq    $6, %rbp
+       salq    $3, %rbp
        subq    $16, %rsp
        .cfi_def_cfa_offset 48
        movq    %rdi, %r12
-       shrq    $6, %rbp
-       movq    %rsi, 8(%rsp)
-       salq    $3, %rbp
        movq    %rbp, %rdi
+       movq    %rsi, 8(%rsp)
        call    _Znwm
        movq    8(%rsp), %rsi
        movl    $0, 8(%r12)
@@ -2224,8 +2224,8 @@
        movq    %rax, (%r12)
        movq    %rbp, 32(%r12)
        testq   %rsi, %rsi
-       movq    %rsi, %rdx
        cmovns  %rsi, %rbx
+       movq    %rsi, %rdx
        sarq    $63, %rdx
        shrq    $58, %rdx
        sarq    $6, %rbx
which should help decoder bandwidth and perhaps also cache, though I was not
able to measure an effect above noise on SPEC.

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Update for znver5.
	(ix86_adjust_cost): Add TODO about znver5 memory latency.
	(ix86_fuse_mov_alu_p): New.
	(ix86_macro_fusion_pair_p): Use it.
	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
	(X86_TUNE_FUSE_MOV_AND_ALU): New tune.

libstdc++: Simplify std::any to fix -Wdeprecated-declarations warning · dee3c5c6

Jonathan Wakely authored 6 months ago

We don't need to use std::aligned_storage in std::any. We just need a
POD type of the right size. The void* union member already ensures the
alignment will be correct. Avoiding std::aligned_storage means we don't
need to suppress a -Wdeprecated-declarations warning.
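
The same reasoning can be checked in C: a union whose members are a `void *` and a byte array is aligned at least as strictly as `void *`, so no separate aligned-storage type is needed. A minimal C11 sketch; `small_storage`, `store_int` and `load_int` are hypothetical names, not the libstdc++ `_Storage` type.

```c
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of a small-object buffer: the void* member fixes
   the union's alignment, the byte array provides the raw storage. */
union small_storage {
    void *ptr;
    unsigned char buffer[sizeof(void *)];
};

_Static_assert(alignof(union small_storage) == alignof(void *),
               "the void* member determines the alignment");
_Static_assert(sizeof(union small_storage) == sizeof(void *),
               "the byte array adds no padding");
_Static_assert(sizeof(int) <= sizeof(void *), "an int fits in the buffer");

/* Store a small trivially copyable value in place. */
void store_int(union small_storage *s, int value)
{
    memcpy(s->buffer, &value, sizeof value);
}

/* Read it back from the same bytes. */
int load_int(const union small_storage *s)
{
    int value;
    memcpy(&value, s->buffer, sizeof value);
    return value;
}
```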

libstdc++-v3/ChangeLog:

	* include/experimental/any (experimental::any::_Storage): Use
	array of unsigned char instead of deprecated
	std::aligned_storage.
	* include/std/any (any::_Storage): Likewise.
	* testsuite/20_util/any/layout.cc: New test.

libstdc++: Add missing feature-test macro in various headers · efe6efb6

Dhruv Chawla authored 6 months ago


version.syn#2 requires various headers to define
__cpp_lib_allocator_traits_is_always_equal. Currently, only <memory>
defines this macro. Implement the fix for the other headers as well.

Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>

libstdc++-v3/ChangeLog:

	* include/std/deque: Define macro
	__glibcxx_want_allocator_traits_is_always_equal.
	* include/std/forward_list: Likewise.
	* include/std/list: Likewise.
	* include/std/map: Likewise.
	* include/std/scoped_allocator: Likewise.
	* include/std/set: Likewise.
	* include/std/string: Likewise.
	* include/std/unordered_map: Likewise.
	* include/std/unordered_set: Likewise.
	* include/std/vector: Likewise.
	* testsuite/20_util/headers/memory/version.cc: New test.
	* testsuite/20_util/scoped_allocator/version.cc: Likewise.
	* testsuite/21_strings/headers/string/version.cc: Likewise.
	* testsuite/23_containers/deque/version.cc: Likewise.
	* testsuite/23_containers/forward_list/version.cc: Likewise.
	* testsuite/23_containers/list/version.cc: Likewise.
	* testsuite/23_containers/map/version.cc: Likewise.
	* testsuite/23_containers/set/version.cc: Likewise.
	* testsuite/23_containers/unordered_map/version.cc: Likewise.
	* testsuite/23_containers/unordered_set/version.cc: Likewise.
	* testsuite/23_containers/vector/version.cc: Likewise.

Zen5 tuning part 2: disable gather and scatter · d82edbe9

Jan Hubicka authored 6 months ago

We disable gathers for zen4.  Gather seems to have improved a bit compared
to zen4, and the Zen5 optimization manual suggests "Avoid GATHER instructions when
the indices are known ahead of time. Vector loads followed by shuffles result
in a higher load bandwidth."  However, the situation seems to be more
complicated.

Gather is a 5-10% loss on the parest benchmark as well as a 30% loss on sparse
dot products in TSVC.  Curiously enough, breaking these out into a microbenchmark
reversed the situation, and it turns out that the performance depends on
how the indices are distributed: gather is a loss if the indices are sequential,
neutral if they are random, and a win for some strides (4, 8).
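
The access pattern at issue is an indexed load inside a loop, which the vectorizer may implement with gather instructions when the tuning allows it. A minimal sketch of a sparse dot product plus a helper that builds strided indices; both functions are hypothetical and only reproduce the shape of the loops discussed, not the TSVC or parest sources.

```c
#include <stddef.h>

/* Sparse dot product: a[idx[i]] is the indexed load a gather would serve. */
long sparse_dot(const int *a, const int *b, const int *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[idx[i]] * b[i];
    return sum;
}

/* Fill idx with a fixed stride: stride 1 gives sequential indices, while
   strides such as 4 or 8 are the cases reported as wins for gather. */
void make_strided(int *idx, size_t n, int stride)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = (int)i * stride;
}
```

The loop body is identical for every index distribution; only the contents of `idx` change, which is why a static heuristic has little to go on.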

This seems to be similar to earlier zens, so I think (especially for
backporting znver5 support) that it makes sense to be consistent and disable
gather unless we work out a good heuristic for when to use it.  Since we
typically do not know the indices in advance, I don't see how that can be done.

I opened PR116582 with some examples of wins and losses.

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for
	ZNVER5.
	(X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5.
	(X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5.
	(X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5.
	(X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5.
	(X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5.

ipa: Don't disable function parameter analysis for fat LTO · 2f1689ea

H.J. Lu authored 6 months ago


Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects.  Tested on x86-64: there are no differences in zstd
with "-O2 -flto=auto -g" vs "-O2 -flto=auto -g -ffat-lto-objects".

	PR ipa/116410
	* ipa-modref.cc (analyze_parms): Always analyze function parameter
	for LTO.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

[PR target/115921] Improve reassociation for rv64 · 4371f656

Jeff Law authored 6 months ago

As Jovan pointed out in pr115921, we're not reassociating expressions like this
on rv64:

(x & 0x3e) << 12

It generates something like this:

        li      a5,258048
        slli    a0,a0,12
        and     a0,a0,a5

We have a pattern that's designed to clean this up.  Essentially reassociating
the operations so that we don't need to load the constant resulting in
something like this:

        andi    a0,a0,63
        slli    a0,a0,12
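
The reassociation relies on a simple identity: for unsigned values, masking and then shifting left gives the same result as shifting and then masking with the shifted constant. The sketch below (hypothetical function names) spells out both forms. The shifted constant 63 << 12 = 258048 is what the first listing loads with `li`, while 63 itself fits the 12-bit signed immediate of `andi`, which is why the mask-first form is cheaper.

```c
#include <stdint.h>

/* Mask first, then shift: the operand order the pattern produces. */
uint64_t mask_then_shift(uint64_t x, uint64_t mask, unsigned shift)
{
    return (x & mask) << shift;
}

/* Shift first, then AND with the shifted mask, which may be too large
   for an immediate and so needs a separate constant load. */
uint64_t shift_then_mask(uint64_t x, uint64_t mask, unsigned shift)
{
    return (x << shift) & (mask << shift);
}
```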

That pattern wasn't working for certain constants due to its condition. The
condition is trying to avoid cases where this kind of reassociation would
hinder shadd generation on rv64.  That condition was just written poorly.

This patch tightens up that condition in a few ways.  First, there's no need to
worry about shadd cases if ZBA is not enabled.  Second, we can't use shadd if
the shift value isn't 1, 2 or 3.  Finally rather than open-coding one of the
tests, we can use an existing operand predicate.

The net is we'll start performing this transformation in more cases on rv64
while still avoiding reassociation if it would spoil shadd generation.

	PR target/115921
gcc/
	* config/riscv/riscv.md (reassociate bitwise ops): Tighten test for
	cases we do not want to reassociate.

gcc/testsuite/
	* gcc.target/riscv/pr115921.c: New test.

Zen5 tuning part 1: avoid FMA chains · d6360b40

Jan Hubicka authored 6 months ago

Testing matrix multiplication benchmarks shows that FMA on a critical chain
is a performance loss compared to a separate multiply and add.  While the FMA
latency of 4 is lower than that of a multiply + add (3+2), the problem is that
all values need to be ready before the computation starts.

While AVX512 code fared well with FMA on znver4, that was because of the split
registers.  Znver5 benefits from avoiding FMA at all widths.  This may be
different with the mobile version, though.

On a naive matrix multiplication benchmark, the difference is 8% with -O3
only, since with -Ofast loop interchange solves the problem differently.
It is a 30% win, for example, on S323 from TSVC:

real_t s323(struct args_t * func_args)
{

//    recurrences
//    coupled recurrence

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations/2; nl++) {
        for (int i = 1; i < LEN_1D; i++) {
            a[i] = b[i-1] + c[i] * d[i];
            b[i] = a[i] + c[i] * e[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
	znver5.
	(X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
	(X86_TUNE_AVOID_512FMA_CHAINS): Likewise.

LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535] · 2fcccf21

Tobias Burnus authored 6 months ago

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming
sufficient partitions, e.g., via -flto-partition=max), output_offload_tables
wrote the output tables once per fork.

	PR lto/116535

gcc/ChangeLog:

	* lto-cgraph.cc (output_offload_tables): Remove offload_ frees.
	* lto-streamer-out.cc (lto_output): Make call to it depend on
	lto_get_out_decl_state ()->output_offload_tables_p.
	* lto-streamer.h (struct lto_out_decl_state): Add
	output_offload_tables_p field.
	* tree-pass.h (ipa_write_optimization_summaries): Add bool argument.
	* passes.cc (ipa_write_summaries_1): Add bool
	output_offload_tables_p arg.
	(ipa_write_summaries): Update call.
	(ipa_write_optimization_summaries): Accept output_offload_tables_p.

gcc/lto/ChangeLog:

	* lto.cc (stream_out): Update call to
	ipa_write_optimization_summaries to pass true for first partition.

MAINTAINERS: Update my email address · ce5f2dc4

Szabolcs Nagy authored 6 months ago

	* MAINTAINERS: Update my email address and add myself to DCO.

tree-optimization/116575 - avoid ICE with SLP mask_load_lane · ac6cd62a

Richard Biener authored 6 months ago

The following avoids performing re-discovery with single lanes when
attempting to force the use of mask_load_lane, as rediscovery would
fail: a single lane of a mask load will appear permuted, which
isn't supported.

	PR tree-optimization/116575
	* tree-vect-slp.cc (vect_analyze_slp): Properly compute
	the mask argument for vect_load/store_lanes_supported.
	When the load is masked for now avoid rediscovery.

	* gcc.dg/vect/pr116575.c: New testcase.

i386: Fix vfpclassph non-optimized intrin · 9b312595

Haochen Jiang authored 6 months ago

The intrinsic for the non-optimized case had a typo in the mask type, which
causes the high bits of the __mmask32 to be unexpectedly zeroed.
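
To see why a wrong mask type zeroes the high bits: a 512-bit vector holds 32 FP16 lanes, so the classification mask needs 32 bits, and passing it through a narrower type truncates it. The sketch uses plain C integer types as stand-ins; the 16-bit width of the incorrect type is an assumption for illustration, not necessarily the type that was in the header.

```c
#include <stdint.h>

/* Hypothetical stand-ins: a 32-bit mask (one bit per FP16 lane of a
   512-bit vector) and a narrower type used by mistake. */
typedef uint32_t mask32;
typedef uint16_t mask16;

/* Correct: the 32-bit mask is returned unchanged. */
mask32 return_mask_ok(mask32 m)
{
    return m;
}

/* Buggy: converting through the narrower type drops bits 16..31. */
mask32 return_mask_truncated(mask32 m)
{
    mask16 narrow = (mask16)m;
    return narrow;
}
```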

The bug is not caught under -O0 by the current 1b testcase, since that
testcase is wrong.  We need to include avx512-mask-type.h after SIZE is
defined, or the mask type will always be __mmask8.  The same problem also
occurs in the AVX10.2 testcases; I will write a separate patch to fix that.

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h
	(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
	(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.

Do not assert NUM_POLY_INT_COEFFS != 1 early · 14b65af6

Richard Biener authored 6 months ago

The following moves the assert on NUM_POLY_INT_COEFFS != 1 after
INTEGER_CST processing.

	* fold-const.cc (poly_int_binop): Move assert on
	NUM_POLY_INT_COEFFS after INTEGER_CST processing.

lower-bitint: Fix up __builtin_{add,sub}_overflow{,_p} bitint lowering [PR116501] · d4d75a83

Jakub Jelinek authored 6 months ago

The following testcase is miscompiled.  The problem is in the last_ovf step.
The second operand has signed _BitInt(513) type but has the MSB clear,
so range_to_prec returns 512 for it (i.e. it fits into unsigned
_BitInt(512)).  Because of that the last step actually doesn't need to get
the most significant bit from the second operand, but the code was deciding
what to use purely from TYPE_UNSIGNED (type1) - if unsigned, use 0,
otherwise sign-extend the last processed bit, which in this case was set.
We don't want to treat the positive operand as if it was negative regardless
of the bit below that precision, and precN >= 0 indicates that the operand
is in the [0, inf) range.
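
The builtins compute the result as if in infinite precision and report whether it fits the result type, so a non-negative signed operand must be extended with zeros no matter which of its bits below the type's precision are set. A minimal sketch of that semantics at ordinary `int` width, using GCC's `__builtin_add_overflow`; `add_fits_unsigned` is a hypothetical wrapper. Here `INT_MAX` plays the role of the non-negative signed operand whose highest value bit is set; the sketch does not exercise the _BitInt lowering itself.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true when a + b, computed in infinite precision, fits in an
   unsigned int; *res always receives the wrapped result. */
bool add_fits_unsigned(int a, int b, unsigned int *res)
{
    return !__builtin_add_overflow(a, b, res);
}
```

`add_fits_unsigned(1, INT_MAX, &r)` must report no overflow with `r == 2147483648u`; treating `INT_MAX` as if it were negative would wrongly change that answer.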

2024-09-03  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/116501
	* gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow):
	In the last_ovf case, use build_zero_cst operand not just when
	TYPE_UNSIGNED (typeN), but also when precN >= 0.

	* gcc.dg/torture/bitint-73.c: New test.

ada: Add kludge for quirk of ancient 32-bit ABIs to previous change · a19cf635

Eric Botcazou authored 6 months ago

Some ancient 32-bit ABIs, most notably that of x86/Linux, misalign double
scalars in record types, so comparing DECL_ALIGN with TYPE_ALIGN directly
may give the wrong answer for them.

gcc/ada/

	* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Add kludge
	to cope with ancient 32-bit ABIs.

ada: Plug loophole exposed by previous change · 9362abf5

Eric Botcazou authored 6 months ago

The change causes more temporaries to be created at call sites for unaligned
actual parameters, thus revealing that the machinery does not properly deal
with unconstrained nominal subtypes for them.

gcc/ada/

	* gcc-interface/trans.cc (create_temporary): Deal with types whose
	size is self-referential by allocating the maximum size.

ada: Fix internal error with Atomic Volatile_Full_Access object · 0a862c5a

Eric Botcazou authored 6 months ago

The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.

But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question by the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, unfortunately without adjusting the implementation.

gcc/ada/

	* gcc-interface/trans.cc (get_atomic_access): Deal specifically with
	nodes that are both Atomic and Volatile_Full_Access in Ada 2012.

ada: Pass unaligned record components by copy in calls on all platforms · d8d19146

Eric Botcazou authored 7 months ago

This has historically been done only on platforms requiring the strict
alignment of memory references, but this can arguably be considered as
being mandated by the language on all of them.

gcc/ada/

	* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Take into
	account the alignment of the field on all platforms.

ada: Fix internal error on pragma pack with discriminated record component · 9ba7262c

Eric Botcazou authored 7 months ago

When updating the size after making a packable type in gnat_to_gnu_field,
we fail to clear it again when it is not constant.

gcc/ada/

	* gcc-interface/decl.cc (gnat_to_gnu_field): Clear again gnu_size
	after updating it if it is not constant.

ada: Simplify Note_Uplevel_Bound procedure · b3f6a790

Marc Poulhiès authored 7 months ago

The procedure Note_Uplevel_Bound was implemented as a custom expression
tree walk. This change replaces this custom tree traversal by a more
idiomatic use of Traverse_Proc.

gcc/ada/

	* exp_unst.adb (Check_Static_Type::Note_Uplevel_Bound): Refactor
	to use the generic Traverse_Proc.
	(Check_Static_Type): Adjust calls to Note_Uplevel_Bound as the
	previous second parameter was unused, so removed.

ada: Transform Length attribute references for non-Strict overflow mode. · 1ef11f4b

Steve Baird authored 7 months ago

The non-strict overflow checking code does a better job of eliminating
overflow checks if given an expression consisting only of predefined
operators (including relationals), literals, identifiers, and conditional
expressions. If it is both feasible and useful, rewrite a
Length attribute reference as such an expression. "Feasible" means
"index type is same type as attribute reference type, so we can rewrite without
using type conversions". "Useful" means "Overflow_Mode is something other than
Strict, so there is value in making overflow check elimination easier".

gcc/ada/

	* exp_attr.adb (Expand_N_Attribute_Reference): If it makes sense
	to do so, then rewrite a Length attribute reference as an
	equivalent conditional expression.

ada: Do not warn for partial access to Atomic Volatile_Full_Access objects · d7e110d8

Eric Botcazou authored 6 months ago

The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.

But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question by the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, but the implementation was not entirely adjusted.

In Ada 2012, it does not make sense to warn for the partial access to an
Atomic object if the object is also declared Volatile_Full_Access, since
the object will be accessed as a whole in this case (like in Ada 2022).

gcc/ada/

	* sem_res.adb (Is_Atomic_Ref_With_Address): Rename into...
	(Is_Atomic_Non_VFA_Ref_With_Address): ...this and adjust the
	implementation to exclude Volatile_Full_Access objects.
	(Resolve_Indexed_Component): Adjust to above renaming.
	(Resolve_Selected_Component): Likewise.

d7e110d8

ada: Reject illegal array aggregates as per AI22-0106. · e083e728

Steve Baird authored 7 months ago

Implement the new legality rules of AI22-0106 which (as discussed in the AI)
are needed to disallow constructs whose semantics would otherwise be poorly
defined.

gcc/ada/

	* sem_aggr.adb (Resolve_Array_Aggregate): Implement the two new
	legality rules of AI22-0106. Add code to avoid cascading error
	messages.

e083e728

ada: Fix Finalize_Storage_Only bug in b-i-p calls · b776b08b

Bob Duff authored 6 months ago

Do not pass null for the Collection parameter when
Finalize_Storage_Only is in effect. If the collection
is null in that case, we will blow up later when we
deallocate the object.

gcc/ada/

	* exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call):
	Remove Finalize_Storage_Only from the code that checks whether to
	pass null to the Collection parameter. Having done that, we don't
	need to check for Is_Library_Level_Entity, because
	No_Heap_Finalization requires that. And if we ever change
	No_Heap_Finalization to allow nested access types, we will still
	want to pass null. Note that the comment "Such a type lacks a
	collection." is incorrect in the case of Finalize_Storage_Only;
	such types have a collection.

b776b08b

SVE intrinsics: Fold constant operands for svmul. · 6b1cf59e

Jennifer Schmitz authored 6 months ago


This patch implements constant folding for svmul by calling
gimple_folder::fold_const_binary with tree_code MULT_EXPR.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svmul_n_* case.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
	Try constant folding.

gcc/testsuite/
	* gcc.target/aarch64/sve/const_fold_mul_1.c: New test.

6b1cf59e

SVE intrinsics: Fold constant operands for svdiv. · ee8b7231

Jennifer Schmitz authored 6 months ago


This patch implements constant folding for svdiv.
It adds a new function, aarch64_const_binop, which, unlike int_const_binop,
does not treat operations as overflowing. This function is passed as a
callback to vector_const_binop from the new gimple_folder method
fold_const_binary if the predicate is ptrue or the predication is _x.
svdiv_impl::fold calls fold_const_binary with TRUNC_DIV_EXPR as the
tree_code.
aarch64_const_binop gains a case for TRUNC_DIV_EXPR that returns 0 for
division by 0, as defined by the semantics of svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.
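The per-element semantics described above can be sketched in plain C for int32_t lanes. This is a model, not the GCC code: sve_sdiv_lane and fold_svdiv_s32 are hypothetical names, and the sketch assumes an all-active predicate (ptrue or _x). Division by 0 yields 0 as stated above. For INT32_MIN / -1 the sketch assumes the wrapped result INT32_MIN, which is what the AArch64 signed divide instructions produce; handling it explicitly also avoids undefined behaviour in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane callback modelling svdiv on int32_t elements:
   no operation is rejected as "overflowing". */
static int32_t sve_sdiv_lane(int32_t a, int32_t b)
{
    if (b == 0)
        return 0;                      /* svdiv: x / 0 == 0            */
    if (a == INT32_MIN && b == -1)
        return INT32_MIN;              /* assumed wrap, not overflow   */
    return a / b;                      /* C truncates toward zero,     */
                                       /* like TRUNC_DIV_EXPR          */
}

/* Fold an all-active division of two constant vectors element-wise. */
static void fold_svdiv_s32(const int32_t *a, const int32_t *b,
                           int32_t *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = sve_sdiv_lane(a[i], b[i]);
}
```

Because the callback never refuses to fold, every lane of a constant svdiv can be replaced by its result, which a generic folder that treats division by zero as unfoldable would not do.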

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
	Try constant folding.
	* config/aarch64/aarch64-sve-builtins.h: Declare
	gimple_folder::fold_const_binary.
	* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
	New function to fold binary SVE intrinsics without overflow.
	(gimple_folder::fold_const_binary): New helper function for
	constant folding of SVE intrinsics.

gcc/testsuite/
	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.

ee8b7231

SVE intrinsics: Refactor const_binop to allow constant folding of intrinsics. · 87217bea

Jennifer Schmitz authored 6 months ago


This patch sets the stage for constant folding of binary operations for SVE
intrinsics:
In fold-const.cc, the code for folding vector constants was moved from
const_binop to a new function vector_const_binop. This function takes a
function pointer as argument specifying how to fold the vector elements.
The intention is to call vector_const_binop from the backend with an
aarch64-specific callback function.
The code in const_binop for folding operations where the first operand is a
vector constant and the second operand is an integer constant was also moved
into vector_const_binop to allow folding of binary SVE intrinsics where
the second operand is an integer (_n).
To allow calling poly_int_binop from the backend, the latter was made public.
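The callback-based design can be illustrated with a small C sketch. It is only an analogy for vector_const_binop: the function-pointer type elem_binop_fn and the functions fold_vector_binop, mul_wrap and div_checked are hypothetical, and int32_t arrays stand in for GCC's VECTOR_CST trees. The generic routine walks the elements and lets the caller decide how each pair is folded; passing a NULL second vector with a scalar models the "vector op integer constant" (_n) case. A callback may decline, in which case the whole vector is left unfolded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element callback: store the folded value in *result and
   return true, or return false if this element cannot be folded. */
typedef bool (*elem_binop_fn)(int32_t a, int32_t b, int32_t *result);

/* Fold arg1 OP arg2 element-wise. If arg2 is NULL, arg2_scalar is used
   for every element (vector-by-scalar, like the _n intrinsics). */
static bool fold_vector_binop(elem_binop_fn fn,
                              const int32_t *arg1,
                              const int32_t *arg2, int32_t arg2_scalar,
                              int32_t *out, size_t n)
{
    assert(fn != NULL);
    assert(n == 0 || (arg1 != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        int32_t b = arg2 ? arg2[i] : arg2_scalar;
        if (!fn(arg1[i], b, &out[i]))
            return false;              /* give up on the whole vector */
    }
    return true;
}

/* Example callback: wrapping 32-bit multiplication. Converting the
   out-of-range unsigned product back to int32_t is implementation-defined
   in C; GCC wraps modulo 2^32. */
static bool mul_wrap(int32_t a, int32_t b, int32_t *result)
{
    *result = (int32_t)((uint32_t)a * (uint32_t)b);
    return true;
}

/* Example callback that refuses the cases a strict folder would not fold. */
static bool div_checked(int32_t a, int32_t b, int32_t *result)
{
    if (b == 0 || (a == INT32_MIN && b == -1))
        return false;
    *result = a / b;
    return true;
}
```

Swapping the callback changes the folding semantics without touching the element loop, which is the point of letting the backend supply its own function.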

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* fold-const.h: Declare vector_const_binop.
	* fold-const.cc (const_binop): Remove cases for vector constants.
	(vector_const_binop): New function that folds vector constants
	element-wise.
	(int_const_binop): Remove call to wide_int_binop.
	(poly_int_binop): Add call to wide_int_binop.

87217bea