Commits · 3184f6a565ed5efab39faf9eee764f393c74442d · COBOLworx / gcc-cobol

Jan 16, 2025

lm32: Args with arg.named false still get passed in regs · 3184f6a5

Keith Packard authored 2 months ago

	* config/lm32/lm32.cc (lm32_function_arg): Pass unnamed
	arguments in registers too, just like named arguments.

Fix an incorrect file header comment for the core2 scheduling model · efd00e3a
Andi Kleen authored 2 months ago
```
Committed as obvious.

gcc/ChangeLog:

	* config/i386/x86-tune-sched-core.cc: Fix incorrect comment.
```

Fix setting of call graph node AutoFDO count · e683c6b0

Eugene Rozenfeld authored 2 months ago

We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.

The count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to zero but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:

  profile_count num = node->count;
  profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
  bool scale = num.initialized_p () && !(num == den);

Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in

  if (scale)
    bb->count = bb->count.apply_scale (num, den);

This is what happened in the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests).
3d9e6767 made an improvement in
AutoFDO count propagation, which caused a mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.

The fix is to update the call graph node count once we've done count propagation.
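
To make the failure mode concrete, here is a minimal sketch in C of the scaling step described above. It is not GCC code: toy_count, toy_apply_scale and toy_fixup_bb_count are hypothetical stand-ins for profile_count and the logic in execute_fixup_cfg, reduced to plain integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for GCC's profile_count (an assumption for illustration):
   an initialized flag plus a raw execution count.  */
typedef struct {
    bool initialized;
    uint64_t value;
} toy_count;

/* Scale COUNT by NUM/DEN; leave it alone when DEN is zero.  */
static uint64_t toy_apply_scale(uint64_t count, uint64_t num, uint64_t den)
{
    return den == 0 ? count : count * num / den;
}

/* Same shape as the check quoted from execute_fixup_cfg: scale the block
   count whenever the node count is initialized and differs from the
   entry block count.  */
uint64_t toy_fixup_bb_count(uint64_t bb_count, toy_count node, toy_count entry)
{
    bool scale = node.initialized && node.value != entry.value;
    return scale ? toy_apply_scale(bb_count, node.value, entry.value) : bb_count;
}
```

With a node count of 0 and an entry block count of 500, a block count of 1000 is scaled to 0. Once the node count matches the entry block count, no scaling happens and the block count survives.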

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
	PR gcov-profile/116743
	* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
	and the entry block count.

Daily bump. · 14f337e3
GCC Administrator authored 2 months ago

Jan 15, 2025

libstdc++: Fix use of internal feature test macro in test · 79d55040

Jonathan Wakely authored 2 months ago

This test should use __cpp_lib_ios_noreplace rather than the internal
__glibcxx_ios_noreplace macro.

libstdc++-v3/ChangeLog:

	* testsuite/27_io/ios_base/types/openmode/case_label.cc: Use
	standard feature test macro not internal one.

libstdc++: Fix fancy pointer test for std::set · f079feec

Jonathan Wakely authored 2 months ago

The alloc_ptr.cc test for std::set tries to use C++17 features
unconditionally, and tries to use the C++23 range members which haven't
been implemented for std::set yet.

Some of the range checks are left in place but commented out, so they
can be added after the ranges members are implemented. Others (such as
prepend_range) are not valid for std::set at all.

Also fix uses of internal feature test macros in two other tests, which
should use the standard __cpp_lib_xxx macros.

libstdc++-v3/ChangeLog:

	* testsuite/23_containers/set/requirements/explicit_instantiation/alloc_ptr.cc:
	Guard node extraction checks with feature test macro. Remove
	calls to non-existent range members.
	* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc:
	Use standard macro not internal one.
	* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc:
	Likewise.

match: Simplify `1 >> x` into `x == 0` [PR102705] · 903ab914

Andrew Pinski authored 2 months ago


In this PR we have a missed optimization where we fail to see that
`1 >> x` and `(1 >> x) ^ 1` can't be equal. There are a few ways of
optimizing this; the easiest and simplest is to simplify `1 >> x` into
just `x == 0`, as those are equivalent (if we ignore out-of-range values for x).
We already have an optimization for `(1 >> X) !=/== 0`, so the only difference
here is we don't need the `!=/== 0` part to do the transformation.
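
The equivalence can be checked with a small C sketch. The function names are hypothetical and this is not the new testcase; it only compares the two forms for every in-range shift count of an unsigned value.

```c
#include <limits.h>
#include <stdbool.h>

/* The expression before the simplification.  */
unsigned shift_form(unsigned x) { return 1u >> x; }

/* The expression after the simplification.  */
unsigned compare_form(unsigned x) { return x == 0; }

/* Compare both forms for every shift count that is in range for unsigned.  */
bool forms_agree(void)
{
    unsigned bits = sizeof(unsigned) * CHAR_BIT;
    for (unsigned x = 0; x < bits; x++) {
        if (shift_form(x) != compare_form(x))
            return false;
        /* The PR's observation: the value and its low-bit flip never match.  */
        if (shift_form(x) == (shift_form(x) ^ 1u))
            return false;
    }
    return true;
}
```

Shift counts at or above the width of the type are undefined in C, which is why the loop stops at the bit width.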

So this removes the `(1 >> X) !=/== 0` transformation and just adds a simplified
`1 >> x` -> `x == 0` one.

Bootstrapped and tested on x86_64-linux-gnu.

	PR tree-optimization/102705

gcc/ChangeLog:

	* match.pd (`(1 >> X) != 0`): Remove pattern.
	(`1 >> x`): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr105832-2.c: Update testcase.
	* gcc.dg/tree-ssa/pr96669-1.c: Likewise.
	* gcc.dg/tree-ssa/pr102705-1.c: New test.
	* gcc.dg/tree-ssa/pr102705-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

doc: cleanup trailing whitespace · c340ff20
Sam James authored 2 months ago
```
gcc/ChangeLog:

	* doc/extend.texi: Cleanup trailing whitespace.
```

doc: trivial grammar fix · d8e52444

Sam James authored 2 months ago

We say 'a constant .. expression' elsewhere. Fix the grammar.

gcc/ChangeLog:

	* doc/extend.texi: Add 'a' for grammar fix.

libstdc++: Fix reversed args in unreachable assumption [PR109849] · 6f85a972

Jonathan Wakely authored 2 months ago

libstdc++-v3/ChangeLog:

	PR libstdc++/109849
	* include/bits/vector.tcc (vector::_M_range_insert): Fix
	reversed args in length calculation.

Fortran: reject NULL as source-expr in ALLOCATE with SOURCE= or MOLD= [PR71884] · 89230999

Harald Anlauf authored 2 months ago

	PR fortran/71884

gcc/fortran/ChangeLog:

	* resolve.cc (resolve_allocate_expr): Reject intrinsic NULL as
	source-expr.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pr71884.f90: New test.

c++: Handle RAW_DATA_CST in unify [PR118390] · 2619413a

Jakub Jelinek authored 2 months ago

This patch uses the count_ctor_elements function to fix up
unify deduction of array sizes.

2025-01-15  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118390
	* cp-tree.h (count_ctor_elements): Declare.
	* call.cc (count_ctor_elements): No longer static.
	* pt.cc (unify): Use count_ctor_elements instead of
	CONSTRUCTOR_NELTS.

	* g++.dg/cpp/embed-20.C: New test.
	* g++.dg/cpp0x/pr118390.C: New test.

AArch64: Update neoverse512tvb tuning · 4ce502f3

Wilco Dijkstra authored 2 months ago

Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the
missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc:
	* config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.

AArch64: Add FULLY_PIPELINED_FMA to tune baseline · 2713f6bb

Wilco Dijkstra authored 4 months ago

Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
already enabled for some cores, but benchmarking it shows it is faster on all
modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).

gcc:
	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE):
	Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
	* config/aarch64/tuning_models/ampere1b.h: Remove redundant
	AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
	* config/aarch64/tuning_models/neoversev2.h: Likewise.

AArch64: Deprecate -mabi=ilp32 · 625ea3c6

Wilco Dijkstra authored 2 months ago

ILP32 was originally intended to make porting to AArch64 easier.  Support was
never merged in the Linux kernel or GLIBC, so it has been unsupported for many
years.  There isn't a benefit in keeping unsupported features forever, so
deprecate it now (and it could be removed in a future release).

gcc:
	* config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
	* doc/invoke.texi: Document -mabi=ilp32 as deprecated.

gcc/testsuite:
	* gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
	* gcc.target/aarch64/pr100518.c: Likewise.
	* gcc.target/aarch64/pr113114.c: Likewise.
	* gcc.target/aarch64/pr80295.c: Likewise.
	* gcc.target/aarch64/pr94201.c: Likewise.
	* gcc.target/aarch64/pr94577.c: Likewise.
	* gcc.target/aarch64/sve/pr108603.c: Likewise.

bpf: set index entry for a VAR_DECL in CO-RE relocs · 01c37f9a

Cupertino Miranda authored 2 months ago

CO-RE accesses with non-pointer struct variables will also generate a
"0" string access within the CO-RE relocation.
The first index within the access string has a somewhat different
meaning than the remaining indexes.
For i0:i1:...:in being an access index for a "struct A a" declaration, its
semantics are represented by:
  (&a + (sizeof(struct A) * i0) + offsetof(i1:...:in))
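
As an illustration of that formula, the following C sketch computes the address for one concrete access. The layout of struct A, the nested struct inner and the function core_address are assumptions made up for the example. With this layout, the member in.y corresponds to an access string of the form i0:1:1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types, chosen only to have a nested member to point at.  */
struct inner { int x; int y; };
struct A { int head; struct inner in; };

/* Address implied by the access string i0:1:1 (member 1 of struct A,
   then member 1 of struct inner), following the formula above.  */
uintptr_t core_address(const struct A *a, size_t i0)
{
    return (uintptr_t)a + sizeof(struct A) * i0 + offsetof(struct A, in.y);
}
```

For a plain variable declaration the first index is 0, so the sizeof term vanishes and only the member offset remains.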

gcc/ChangeLog:
	* config/bpf/core-builtins.cc (compute_field_expr): Change
	VAR_DECL outcome in switch case.

gcc/testsuite/ChangeLog:
	* gcc.target/bpf/core-builtin-1.c: Correct test.
	* gcc.target/bpf/core-builtin-2.c: Correct test.
	* gcc.target/bpf/core-builtin-exprlist-1.c: Correct test.

bpf: calls do not promote attr access_index on lhs · 42786ccf

Cupertino Miranda authored 2 months ago

When traversing gimple to introduce CO-RE relocation entries to
expressions that are accesses to types attributed with
preserve_access_index, the access is likely to be split into multiple
gimple statements.
In order to keep doing the proper CO-RE conversion we need to mark
the LHS tree nodes of gimple expressions as explicit CO-RE accesses,
such that the gimple traverser will further convert the sub-expressions.

This patch makes sure that this LHS marking does not happen when the
gimple statement is a function call, in which case it is no longer
expected to keep generating CO-RE accesses for the remainder of the
expression.

gcc/ChangeLog:

	* config/bpf/core-builtins.cc
	(make_gimple_core_safe_access_index): Fix in condition.

gcc/testsuite/ChangeLog:

	* gcc.target/bpf/core-attr-calls.c: New test.

bpf: make sure CO-RE relocs are typed with struct BTF_KIND_STRUCT · d30def00

Cupertino Miranda authored 2 months ago

Based on observation within bpf-next selftests and comparison of GCC
and clang compiled code, the BPF loader expects all CO-RE relocations to
point to BTF type nodes that are neither const nor volatile.

gcc/ChangeLog:

	* btfout.cc (get_btf_kind): Remove static from function definition.
	* config/bpf/btfext-out.cc (bpf_code_reloc_add): Check if CO-RE type
	is not a const or volatile.
	* ctfc.h (btf_dtd_kind): Add prototype for function.

gcc/testsuite/ChangeLog:

	* gcc.target/bpf/core-attr-const.c: New test.

c++: Implement mangling of RAW_DATA_CST [PR118278] · 8d9d5834

Jakub Jelinek authored 2 months ago

As the following testcases show (mangle80.C only after reversion of the
temporary reversion of C++ large array speedup commit), RAW_DATA_CST can
be seen during mangling of some templates and we ICE because
the mangler doesn't handle it.

The following patch handles it and mangles it the same as a sequence of
INTEGER_CSTs that were used previously instead.
The only slight complication is that if ce->value is the last nonzero
element, we need to skip the zeros at the end of RAW_DATA_CST.

2025-01-03  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118278
	* mangle.cc (write_expression): Handle RAW_DATA_CST.

	* g++.dg/abi/mangle80.C: New test.
	* g++.dg/cpp/embed-19.C: New test.

c++: handle decltype in nested-name-spec printing [PR118139] · 1bc474f6

Marek Polacek authored 2 months ago


Compiling this test, we emit:

  error: 'static void CW<T>::operator=(int) requires requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type x) {x;}' must be a non-static member function

where the DECLTYPE_TYPE isn't printed properly.  This patch fixes that
to print:

error: 'static void CW<T>::operator=(int) requires requires(typename decltype(T())::type x) {x;}' must be a non-static member function

	PR c++/118139

gcc/cp/ChangeLog:

	* cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle
	a computed-type-specifier.

gcc/testsuite/ChangeLog:

	* g++.dg/diagnostic/decltype1.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

libstdc++: Fix comments in test that reference wrong subclause of C++11 · 9cc31b4e

Jonathan Wakely authored 3 months ago

libstdc++-v3/ChangeLog:

	* testsuite/28_regex/traits/char/transform_primary.cc: Fix
	subclause numbering in references to the standard.

middle-end: Fix incorrect type replacement in operands_equals [PR118472] · 25eb892a

Tamar Christina authored 2 months ago

In g:3c32575e I made a mistake and incorrectly
replaced the type of the arguments of an expression with the type of the
expression.  This is of course wrong.

This reverts that change and I have also double checked the other replacements
and they are fine.

gcc/ChangeLog:

	PR middle-end/118472
	* fold-const.cc (operand_compare::operand_equal_p): Fix incorrect
	replacement.

gcc/testsuite/ChangeLog:

	PR middle-end/118472
	* gcc.dg/pr118472.c: New test.

Annotate dbg_line_numbers table · bea593f1

Richard Biener authored 2 months ago

The following adds /* <num> */ to dbg_line_numbers so there's a chance
to more easily look up the ID of the match.pd line number used for
dumping when you want to debug a specific replacement.  It also cuts
the lines down to 10 entries.

  static int dbg_line_numbers[1267] = {
        /* 0 */ 161, 164, 173, 175, 178, 181, 183, 189, 197, 195,
        /* 10 */ 199, 201, 205, 923, 921, 2060, 2071, 2052, 2058, 2063,
...
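
A layout like the one above could be produced along these lines. This is a hedged sketch, not the code in genmatch.cc: emit_table is a hypothetical helper that formats into a caller-supplied buffer so the result is easy to inspect.

```c
#include <stdio.h>

/* Hypothetical emitter: writes VALS ten per line, each line starting with
   an index comment that names the position of its first entry.  Returns
   the number of characters written, or -1 if BUF is too small.  */
int emit_table(char *buf, size_t cap, const int *vals, size_t n)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w;
        if (i % 10 == 0)
            /* Start of a row: close the previous row, then the index.  */
            w = snprintf(buf + used, cap - used, "%s/* %zu */ %d",
                         i ? ",\n" : "", i, vals[i]);
        else
            w = snprintf(buf + used, cap - used, ", %d", vals[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

With the index at the start of each row, finding entry N means jumping to the row labelled with the nearest multiple of 10 at or below N and counting at most nine entries.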

	* genmatch.cc (define_dump_logs): Make reverse lookup in
	dbg_line_numbers easier by adding comments with start index
	and cutting number of elements per line to 10.

testsuite: i386: Fix expected vectorization in pr105493.c · 120a3700

Christoph Müllner authored 2 months ago


As reported in PR117079, commit ab187858 broke the test pr105493.c.
The test code contains two loops, where the first one is expected to be
vectorized.  The commit that broke that vectorization was the first of
several that enabled vectorization of both loops.
Now that GCC can vectorize the whole function, let's adjust this test
to expect vectorization of both loops by ensuring that we don't write
to the helper-array 'tmp'.

	PR target/117079

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr105493.c: Fix expected vectorization

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

OpenMP/C++: Fix 'declare variant' for struct-returning functions [PR118486] · b67a0d6a

Tobias Burnus authored 2 months ago

To find the variant declaration, a call is constructed in
omp_declare_variant_finalize_one, which gives here:
  TARGET_EXPR <D.3010, variant_fn ()>

Extracting the function declaration from it then failed and gave the bogus
  error: could not find variant declaration

Solution: Use the 2nd argument of the TARGET_EXPR and continue.

	PR c++/118486

gcc/cp/ChangeLog:

	* decl.cc (omp_declare_variant_finalize_one): When resolving
	the variant to use, handle variant calls with TARGET_EXPR.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/declare-variant-11.C: New test.

ipa: Initialize/release global obstack in process_new_functions [PR116068] · dd389c25

Jakub Jelinek authored 2 months ago

Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL);
before running a pass list and bitmap_obstack_release (NULL); after that.
process_new_functions wasn't doing that, and with the new r15-130
bitmap_alloc checking that results in an ICE.

2025-01-15  Jakub Jelinek  <jakub@redhat.com>

	PR ipa/116068
	* cgraphunit.cc (symbol_table::process_new_functions): Call
	bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL)
	around processing the functions.

	* gcc.dg/graphite/pr116068.c: New test.

c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't... · 18f6bb98

Jakub Jelinek authored 2 months ago

c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't convert to its rettype [PR118387]

Note, the PR raises another problem.
If on the same testcase the B b; line is removed, we silently synthesize
operator<=> which will crash at runtime due to returning without a return
statement.  That is because the standard says that in that case
it should return static_cast<int>(std::strong_ordering::equal);
but I can't find wording anywhere which would say that if that isn't
valid, the function is deleted.
https://eel.is/c++draft/class.compare#class.spaceship-2.2
seems to talk just about cases where there are some members and their
comparison is invalid, in which case it is deleted, but here there are
none and it follows
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
So, we synthesize with tf_none, see the static_cast is invalid, silently
don't add an error_mark_node statement, but as the function isn't deleted,
we just silently emit it.
Should the standard be amended to say that the operator should be deleted
even if it has no elements and the static cast from
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
is invalid?

On Fri, Jan 10, 2025 at 12:04:53PM -0500, Jason Merrill wrote:
> That seems pretty obviously what we want, and is what the other compilers
> implement.

This patch implements it then.

2025-01-15  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118387
	* method.cc (build_comparison_op): Set bad if
	std::strong_ordering::equal doesn't convert to rettype.

	* g++.dg/cpp2a/spaceship-err6.C: Expect another error.
	* g++.dg/cpp2a/spaceship-synth17.C: Likewise.
	* g++.dg/cpp2a/spaceship-synth-neg6.C: Likewise.
	* g++.dg/cpp2a/spaceship-synth-neg7.C: New test.

	* testsuite/25_algorithms/default_template_value.cc
	(Input::operator<=>): Use auto as return type rather than bool.

c++: Fix up maybe_init_list_as_array for RAW_DATA_CST [PR118124] · 64828272

Jakub Jelinek authored 2 months ago

The previous patch made me look around some more and I found that
maybe_init_list_as_array doesn't handle RAW_DATA_CSTs correctly either.
While the RAW_DATA_CST is properly split during finish_compound_literal,
it was using CONSTRUCTOR_NELTS as the size of the arrays, which is wrong:
a RAW_DATA_CST could stand for far more initializers.

2025-01-15  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118124
	* cp-tree.h (build_array_of_n_type): Change second argument type
	from int to unsigned HOST_WIDE_INT.
	* tree.cc (build_array_of_n_type): Likewise.
	* call.cc (count_ctor_elements): New function.
	(maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS.
	(convert_like_internal): Use length from init's type instead of
	len when handling the maybe_init_list_as_array case.

	* g++.dg/cpp0x/initlist-opt5.C: New test.

c++: Fix ICEs with large initializer lists or ones including #embed [PR118124] · f263f2d5

Jakub Jelinek authored 2 months ago

The following testcases ICE due to RAW_DATA_CST not being handled where it
should be during ck_list conversions.

The last 2 testcases started ICEing with r15-6339 committed yesterday
(speedup of large initializers), the first two already with r15-5958
(#embed optimization for C++).

For conversion to initializer_list<unsigned char> or char/signed char
we can optimize and keep RAW_DATA_CST with adjusted type if we report
narrowing errors if needed, for others this converts each element
separately.

2025-01-15  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118124
	* call.cc (convert_like_internal): Handle RAW_DATA_CST in
	ck_list handling.  Formatting fixes.

	* g++.dg/cpp/embed-15.C: New test.
	* g++.dg/cpp/embed-16.C: New test.
	* g++.dg/cpp0x/initlist-opt3.C: New test.
	* g++.dg/cpp0x/initlist-opt4.C: New test.

RISC-V: Fix code gen for reduction with length 0 [PR118182] · 40ad10f7

Kito Cheng authored 2 months ago

`.MASK_LEN_FOLD_LEFT_PLUS` (or `mask_len_fold_left_plus_m`) expects the
return value to be the start value even if the length is 0.

However, the current code gen in the RISC-V backend does not meet those
semantics; it produces a garbage value if the length is 0.
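
The expected semantics can be written down as a scalar reference model. This is a sketch for illustration only: fold_left_plus is a hypothetical name, the mask is modelled as one byte per element, and the bias operand of the internal function is left out.

```c
#include <stddef.h>

/* Scalar model of a masked, length-controlled, in-order sum: begin with
   START and add every active element below LEN, in order.  */
double fold_left_plus(double start, const double *v,
                      const unsigned char *mask, size_t len)
{
    double acc = start;
    for (size_t i = 0; i < len; i++)
        if (mask[i])
            acc += v[i];
    return acc;
}
```

When len is 0 the loop body never runs and the function returns start unchanged, which is the behaviour the vector code has to reproduce.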

Take the current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64 as an example:
        # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
        vsetvli zero,a5,e64,m1,ta,ma
        vfmv.s.f        v2,fa5     # insn 1
        vfredosum.vs    v1,v1,v2   # insn 2
        vfmv.f.s        fa5,v1     # insn 3

insn 1:
- vfmv.s.f won't do anything if VL=0, which means v2 will contain a garbage value.
insn 2:
- vfredosum.vs won't do anything if VL=0, and keeps vd unchanged even with TA.
(The V spec says: `If vl=0, no operation is performed and the destination register
 is not updated.`)
insn 3:
- vfmv.f.s will move the value from v1 even if VL=0, so this is safe.

So how do we fix that? We need two fixes:

1. insn 1: must always execute with VL=1, so that we can guarantee it
           always works as expected.
2. insn 2: Add a new pattern to force `vd` to use the same reg as `vs1` (start value) for
           all reduction patterns, so that we can guarantee vd[0] will contain the
           start value when vl=0.

For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2,
we have to add a _VL0_SAFE variant of the reductions to force `vd` to use the
same reg as `vs1` (start value).

Changes since V3:
- Rename _AV to _VL0_SAFE for readability.
- Use the non-VL0_SAFE version if VL is const or VLMAX.
- Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX.
- Two more testcases.

gcc/ChangeLog:

	PR target/118182
	* config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust
	argument for expand_reduction.
	(*widen_reduc_plus_scal_<mode>): Ditto.
	(*fold_left_widen_plus_<mode>): Ditto.
	(*mask_len_fold_left_widen_plus_<mode>): Ditto.
	(*cond_widen_reduc_plus_scal_<mode>): Ditto.
	(*cond_len_widen_reduc_plus_scal_<mode>): Ditto.
	(*cond_widen_reduc_plus_scal_<mode>): Ditto.
	* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for
	expand_reduction.
	(reduc_smax_scal_<mode>): Ditto.
	(reduc_umax_scal_<mode>): Ditto.
	(reduc_smin_scal_<mode>): Ditto.
	(reduc_umin_scal_<mode>): Ditto.
	(reduc_and_scal_<mode>): Ditto.
	(reduc_ior_scal_<mode>): Ditto.
	(reduc_xor_scal_<mode>): Ditto.
	(reduc_plus_scal_<mode>): Ditto.
	(reduc_smax_scal_<mode>): Ditto.
	(reduc_smin_scal_<mode>): Ditto.
	(reduc_fmax_scal_<mode>): Ditto.
	(reduc_fmin_scal_<mode>): Ditto.
	(fold_left_plus_<mode>): Ditto.
	(mask_len_fold_left_plus_<mode>): Ditto.
	* config/riscv/riscv-v.cc (expand_reduction): Add one more
	argument for reduction code for vl0-safe.
	* config/riscv/riscv-protos.h (expand_reduction): Ditto.
	* config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of
	reduction.
	(ANY_REDUC_VL0_SAFE): New.
	(ANY_WREDUC_VL0_SAFE): Ditto.
	(ANY_FREDUC_VL0_SAFE): Ditto.
	(ANY_FREDUC_SUM_VL0_SAFE): Ditto.
	(ANY_FWREDUC_SUM_VL0_SAFE): Ditto.
	(reduc_op): Add _VL0_SAFE variant of reduction.
	(order): Ditto.
	* config/riscv/vector.md (@pred_<reduc_op><mode>): New.

gcc/testsuite/ChangeLog:

	PR target/118182
	* gfortran.target/riscv/rvv/pr118182.f: New.
	* gcc.target/riscv/rvv/autovec/pr118182-1.c: New.
	* gcc.target/riscv/rvv/autovec/pr118182-2.c: New.

Fix SLP scalar costing with stmts also used in externals · 21edcb95

Richard Biener authored 2 months ago

When an external SLP node is permuted, the scalar stmts recorded in
the permute node do not mean the scalar computation can be removed.
We are removing those stmts from the vectorized_scalar_stmts for this
reason, but we fail to check this set when we cost scalar stmts.  Note
vectorized_scalar_stmts isn't a complete set, so also pass
scalar_stmts_in_externs and check that.

The following fixes this.

This shows in PR115777 when we avoid vectorizing the load, but
on its own doesn't help the PR yet.

	PR tree-optimization/115777
	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not
	cost a scalar stmt that needs to be preserved.

lto: Remove link() to fix build with MinGW [PR118238] · ed123311

Michal Jires authored 2 months ago

I used link() to create cheap copies of Incremental LTO cache contents
to prevent their deletion once linking is finished.
This is unnecessary, since output_files are deleted in our lto-plugin
and not in the linker itself.

Bootstrapped/regtested on x86_64-linux.
lto-wrapper now builds on MinGW again, though so far I have not set up
MinGW to be able to do a full bootstrap.
Ok for trunk?

	PR lto/118238

gcc/ChangeLog:

	* lto-wrapper.cc (run_gcc): Remove link() copying.

lto-plugin/ChangeLog:

	* lto-plugin.c (cleanup_handler):
	Keep output_files when using Incremental LTO.
	(onload): Detect Incremental LTO.

[RISC-V][PR target/118170] Add HF div/sqrt reservation · d6f1961e

Anton Blanchard authored 2 months ago


Clearly an oversight in the generic-ooo model caught by the checking code.  I
should have realized it was generic-ooo as we don't have a pipeline description
for the tenstorrent design yet, just the costing model.

The patch was extracted from the BZ which indicated Anton was the author, so I
kept that.  I'm listed as co-author just in case someone wants to complain
about the testcase in the future.  I didn't do any notable lifting here.

Thanks Peter and Anton!

	PR target/118170
gcc/
	* config/riscv/generic-ooo.md (generic_ooo_float_div_half): New
	reservation.

gcc/testsuite
	* gcc.target/riscv/pr118170.c: New test.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

[PR rtl-optimization/109592] Simplify nested shifts · cab2e123

Richard Sandiford authored 2 months ago

> The BZ in question is a failure to recognize a pair of shifts as a sign
> extension.
>
> I originally thought simplify-rtx would be the right framework to
> address this problem, but fwprop is actually better.  We can write the
> recognizer much simpler in that framework.
>
> fwprop already simplifies nested shifts/extensions to the desired RTL,
> but it's not considered profitable and we throw away the good work done
> by fwprop & simplifiers.
>
> It's hard to see a scenario where nested shifts or nested extensions
> that simplify down to a single sign/zero extension isn't a profitable
> transformation.  So when fwprop has nested shifts/extensions that
> simplifies to an extension, we consider it profitable.
>
> This allow us to simplify the testcase on rv64 with ZBB enabled from a
> pair of shifts to a single byte or half-word sign extension.

Hmm.  So just to summarise something that was discussed in the PR
comments, this is a case where combine's expand_compound_operation/
make_compound_operation wrangler hurts us, because the process isn't
idempotent, and combine produces two complex instructions:

(insn 6 3 7 2 (set (reg:DI 137 [ _3 ])
        (ashift:DI (reg:DI 139 [ x ])
            (const_int 24 [0x18]))) "foo.c":2:20 305 {ashldi3}
     (expr_list:REG_DEAD (reg:DI 139 [ x ])
        (nil)))
(insn 12 7 13 2 (set (reg/i:DI 10 a0)
        (sign_extend:DI (ashiftrt:SI (subreg:SI (reg:DI 137 [ _3 ]) 0)
                (const_int 24 [0x18])))) "foo.c":2:27 321 {ashrsi3_extend}
     (expr_list:REG_DEAD (reg:DI 137 [ _3 ])
        (nil)))

given two simple instructions:

(insn 6 3 7 2 (set (reg:SI 137 [ _3 ])
        (sign_extend:SI (subreg:QI (reg/v:DI 136 [ x ]) 0))) "foo.c":2:20 533 {*extendqisi2_bitmanip}
     (expr_list:REG_DEAD (reg/v:DI 136 [ x ])
        (nil)))
(insn 7 6 12 2 (set (reg:DI 138 [ _3 ])
        (sign_extend:DI (reg:SI 137 [ _3 ]))) "foo.c":2:20 discrim 1 133 {*extendsidi2_internal}
     (expr_list:REG_DEAD (reg:SI 137 [ _3 ])
        (nil)))

If I run with -fdisable-rtl-combine then late_combine1 already does the
expected transformation.

Although it would be nice to fix combine, that might be difficult.
If we treat combine as immutable then the options are:

(1) Teach simplify-rtx to simplify combine's output into a single sign_extend.

(2) Allow fwprop1 to get in first, before combine has a chance to mess
    things up.

The patch goes for (2).

Is that a fair summary?

Playing devil's advocate, I suppose one advantage of (1) is that it
would allow the optimisation even if the original rtl looked like
combine's output.  And fwprop1 doesn't distinguish between cases in
which the source instruction disappears from cases in which the source
instruction is kept.  Thus we could transform:

  (set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
  (set (reg:DI R3) (sign_extend:DI (reg:SI R2)))

into:

  (set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
  (set (reg:DI R3) (sign_extend:DI (reg:QI R1)))

which increases the register pressure between the two instructions
(since R2 and R1 are both now live).  In general, there could be
quite a gap between the two instructions.

On the other hand, even in that case, fwprop1 would be parallelising
the extensions.  And since we're talking about unary operations,
even two-address targets would allow R1 to be extended without
tying the source and destination.

Also, it seems relatively unlikely that expand would produce code
that looks like combine's, since the gimple optimisers should have
simplified it into conversions.

So initially I was going to agree that it's worth trying in fwprop.  But...

[ commentary on Jeff's original approach dropped. ]

So it seems like it's a bit of a mess.

If we do try to fix combine, I think something like the attached
would fit within the current scheme.  It is a pure shift-for-shift
transformation, avoiding any extensions.
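
For reference, the source-level idiom behind the PR, a pair of shifts that acts as a sign extension, can be sketched in C as follows. The function names are hypothetical. The sketch assumes GCC's implementation-defined behaviour: arithmetic right shift of negative values, and wrapping conversions to narrower signed types.

```c
#include <stdint.h>

/* Shift the low byte to the top, then shift it back down arithmetically.
   The left shift is done on an unsigned value to avoid undefined behaviour;
   the conversion back to int32_t and the right shift of a negative value
   are implementation-defined (assumed to behave as GCC documents).  */
int32_t shift_pair(int32_t x)
{
    return (int32_t)((uint32_t)x << 24) >> 24;
}

/* The single operation the shift pair is equivalent to: sign-extend the
   low byte.  */
int32_t sign_extend_byte(int32_t x)
{
    return (int8_t)x;
}
```

Both functions return the same value for every input under those assumptions, which is why a single byte sign extension is the desired output.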

Will think more about it, but wanted to get the above stream of
consciousness out before I finish for the day.



	PR rtl-optimization/109592
gcc/
	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
	Simplify nested shifts with subregs.

gcc/testsuite
	* gcc.target/riscv/pr109592.c: New test.
	* gcc.target/riscv/sign-extend-rshift.c: Adjust expected output

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

Daily bump. · 3b3b3f88
GCC Administrator authored 2 months ago

Jan 14, 2025

c++: dump-lang-raw with obj_type_ref fields · 6e0b048f

anetczuk authored 2 months ago

Raw dump of lang tree was missing information about virtual method call.
The information is provided in "tok" field of obj_type_ref.

gcc/ChangeLog:

	* tree-dump.cc (dequeue_and_dump): Handle OBJ_TYPE_REF.

gcc/testsuite/ChangeLog:

	* g++.dg/diagnostic/lang-dump-1.C: New test.

d: Merge upstream dmd, druntime d6f693b46a, phobos 336bed6d8. · c8894b68

Iain Buclaw authored 2 months ago

D front-end changes:

	- Import latest fixes from dmd v2.110.0-rc.1.

D runtime changes:

	- Import latest fixes from druntime v2.110.0-rc.1.

Phobos changes:

	- Import latest fixes from phobos v2.110.0-rc.1.

Included in the merge are fixes for the following PRs:

	PR d/118438
	PR d/118448
	PR d/118449

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd d6f693b46a.
	* d-incpath.cc (add_import_paths): Update for new front-end interface.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime d6f693b46a.
	* src/MERGE: Merge upstream phobos 336bed6d8.
	* testsuite/libphobos.init_fini/custom_gc.d: Adjust test.

[ifcombine] robustify decode_field_reference · 5006b9d8

Alexandre Oliva authored 2 months ago

Arrange for decode_field_reference to use local variables throughout,
to modify the out parms only when we're about to return non-NULL, and
to drop the unused case of NULL pand_mask, that had a latent failure
to detect signbit masking.


for  gcc/ChangeLog

	* gimple-fold.cc (decode_field_reference): Robustify to set
	out parms only when returning non-NULL.
	(fold_truth_andor_for_ifcombine): Bail if
	decode_field_reference returns NULL.  Add complementary assert
	on r_const's not being set when l_const isn't.

c++: re-enable NSDMI CONSTRUCTOR folding [PR118355] · e939005c

Marek Polacek authored 2 months ago


In c++/102990 we had a problem where massage_init_elt got {},
digest_nsdmi_init turned that {} into { .value = (int) 1.0e+0 },
and we crashed in the call to fold_non_dependent_init because
a FIX_TRUNC_EXPR/FLOAT_EXPR got into tsubst*.  So we avoided
calling fold_non_dependent_init for a CONSTRUCTOR.

But that broke the following test, where we no longer fold the
CONST_DECL in
  { .type = ZERO }
to
  { .type = 0 }
and then process_init_constructor_array does:

            if (next != error_mark_node
                && (initializer_constant_valid_p (next, TREE_TYPE (next))
                    != null_pointer_node))
              {
                /* Use VEC_INIT_EXPR for non-constant initialization of
                   trailing elements with no explicit initializers.  */
                picflags |= PICFLAG_VEC_INIT;

because { .type = ZERO } isn't initializer_constant_valid_p.  Then we
create a VEC_INIT_EXPR and say we can't convert the argument.

So we have to fold the elements of the CONSTRUCTOR.  We just can't
instantiate the elements in a template.
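
The kind of code involved might look like the sketch below. It is a hypothetical illustration, not the testcase added by this commit: the type and function names are invented, and only the shape (an NSDMI naming an enumerator, used for array elements with no explicit initializer, both outside and inside a template) follows the description above.

```cpp
#include <cassert>

enum my_enum { ZERO, ONE };

// The NSDMI names an enumerator (a CONST_DECL in GCC's terms).
struct foo
{
  my_enum type = ZERO;
};

// Every array element lacks an explicit initializer, so each one is
// built from foo's NSDMI, i.e. from { .type = ZERO }.
struct arr
{
  foo array[2] = {};
};

// The same member shape inside a class template.
template <typename T>
struct tarr
{
  foo array[2] = {};
};

static int
first_type ()
{
  arr a;
  return a.array[0].type;
}

template <typename T>
static int
last_type_in_template ()
{
  tarr<T> a;
  return a.array[1].type;
}
```

In both cases the elements should end up with `type == ZERO`; the difference described in the message is only in how the compiler gets there.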

This also fixes c++/118047.

	PR c++/118047
	PR c++/118355

gcc/cp/ChangeLog:

	* typeck2.cc (massage_init_elt): Call fold_non_dependent_init
	unless for a CONSTRUCTOR in a template.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/nsdmi-list10.C: New test.
	* g++.dg/cpp0x/nsdmi-list9.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

e939005c

OpenMP: Remove dead code from declare variant reimplementation · d27db303

Sandra Loosemore authored 2 months ago

After reimplementing late resolution of "declare variant", the
declare_variant_alt and calls_declare_variant_alt flags on struct
cgraph_node are no longer used by anything.  For the purposes of
marking functions that need late resolution, the
has_omp_variant_constructs flag has replaced
calls_declare_variant_alt.

Likewise struct omp_declare_variant_entry, struct
omp_declare_variant_base_entry, and the hash tables used to store
these structures are no longer needed, since the information needed for
late resolution is now stored in the gomp_variant_construct nodes.

In addition, some obsolete code that was temporarily ifdef'ed out
rather than deleted, in order to produce a more readable patch for the
previous installment of this series, is now removed entirely.

There are no functional changes in this patch, just removing dead code.

gcc/ChangeLog:
	* cgraph.cc (symbol_table::create_edge): Don't set
	calls_declare_variant_alt in the caller.
	* cgraph.h (struct cgraph_node): Remove declare_variant_alt
	and calls_declare_variant_alt flags.
	* cgraphclones.cc (cgraph_node::create_clone): Don't copy
	calls_declare_variant_alt bit.
	* gimplify.cc: Remove previously #ifdef-ed out code.
	* ipa-free-lang-data.cc (free_lang_data_in_decl): Adjust code
	referencing declare_variant_alt bit.
	* ipa.cc (symbol_table::remove_unreachable_nodes): Likewise.
	* lto-cgraph.cc (lto_output_node): Remove references to deleted
	bits.
	(output_refs): Adjust code referencing declare_variant_alt bit.
	(input_overwrite_node): Remove references to deleted bits.
	(input_refs): Adjust code referencing declare_variant_alt bit.
	* lto-streamer-out.cc (lto_output): Likewise.
	* lto-streamer.h (omp_lto_output_declare_variant_alt): Delete.
	(omp_lto_input_declare_variant_alt): Delete.
	* omp-expand.cc (expand_omp_target): Use has_omp_variant_constructs
	bit to trigger pass_omp_device_lower instead of
	calls_declare_variant_alt.
	* omp-general.cc (struct omp_declare_variant_entry): Delete.
	(struct omp_declare_variant_base_entry): Delete.
	(struct omp_declare_variant_hasher): Delete.
	(omp_declare_variant_hasher::hash): Delete.
	(omp_declare_variant_hasher::equal): Delete.
	(omp_declare_variants): Delete.
	(omp_declare_variant_alt_hasher): Delete.
	(omp_declare_variant_alt_hasher::hash): Delete.
	(omp_declare_variant_alt_hasher::equal): Delete.
	(omp_declare_variant_alt): Delete.
	(omp_lto_output_declare_variant_alt): Delete.
	(omp_lto_input_declare_variant_alt): Delete.
	(includes): Delete unnecessary include of gt-omp-general.h.
	* omp-offload.cc (execute_omp_device_lower): Remove references
	to deleted bit.
	(pass_omp_device_lower::gate): Likewise.
	* omp-simd-clone.cc (simd_clone_create): Likewise.
	* passes.cc (ipa_write_summaries): Likewise.
	* symtab.cc (symtab_node::get_partitioning_class): Likewise.
	* tree-inline.cc (expand_call_inline): Likewise.
	(tree_function_versioning): Likewise.

gcc/lto/ChangeLog:
	* lto-partition.cc (lto_balanced_map): Adjust code referencing
	deleted declare_variant_alt bit.

d27db303