Commits · fdd95e1cf29137a19baed25f8c817d320dfe63e3 · COBOLworx / gcc-cobol

Mar 06, 2025

lto/114501 - missed free-lang-data for CONSTRUCTOR index · fdd95e1c

Richard Biener authored 1 week ago

The following makes sure to also walk CONSTRUCTOR element indexes
which can be FIELD_DECLs, referencing otherwise unused types we
need to clean.  walk_tree only walks CONSTRUCTOR element data.
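A toy model of the gap, in C. A CONSTRUCTOR holds (index, value) element pairs, so a walker that follows only the values never reaches a FIELD_DECL that appears only as an index. The `struct node` layout and the `walk` function are hypothetical simplifications. They are not GCC's `walk_tree` or `find_decls_types_r`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified tree nodes.  */
enum kind { K_FIELD_DECL, K_CONST, K_CONSTRUCTOR };

struct node;

/* One CONSTRUCTOR element: INDEX may be a FIELD_DECL, VALUE is the data.  */
struct ctor_elt {
  struct node *index;
  struct node *value;
};

struct node {
  enum kind kind;
  int visited;
  size_t nelts;            /* number of elements (CONSTRUCTOR only) */
  struct ctor_elt *elts;   /* element array (CONSTRUCTOR only) */
};

/* Mark N and everything reachable from it.  A walker that only followed
   elts[i].value would leave FIELD_DECL indexes unvisited; this one
   visits both the index and the value of every element.  */
static void walk(struct node *n)
{
  if (n == NULL || n->visited)
    return;
  n->visited = 1;
  if (n->kind == K_CONSTRUCTOR)
    {
      assert(n->nelts == 0 || n->elts != NULL);
      for (size_t i = 0; i < n->nelts; i++)
        {
          walk(n->elts[i].index);
          walk(n->elts[i].value);
        }
    }
}
```

After `walk` runs on a constructor, the field used as the element index is marked as reachable, just like the element value.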

	PR lto/114501
	* ipa-free-lang-data.cc (find_decls_types_r): Explicitly
	handle CONSTRUCTORs as walk_tree handling of those is
	incomplete.

	* g++.dg/pr114501_0.C: New testcase.

pair-fusion: Add singleton move_range asserts [PR114492] · d6d7da92

Alex Coplan authored 1 week ago

The PR claims that pair-fusion has invalid uses of gcc_assert (such that
the pass will misbehave with --disable-checking).  As noted in the
comments, in the case of the calls to restrict_movement, the only way we
can possibly depend on the side effects is if we call it with a
non-singleton move range.  However, the intent is that we always have a
singleton move range here, and thus we do not rely on the side effects.

This patch therefore adds asserts to check for a singleton move range
before calling restrict_movement, thus clarifying the intent and
hopefully dispelling any concerns that having the calls wrapped in
asserts is problematic here.

gcc/ChangeLog:

	PR rtl-optimization/114492
	* pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Check for singleton
	move range before calling restrict_movement.
	(pair_fusion::try_promote_writeback): Likewise.

ira: Add new hooks for callee-save vs spills [PR117477] · e836d803

Richard Sandiford authored 1 week ago

Following on from the discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html

this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
replaces it with two hooks: one that controls the cost of using an
extra callee-saved register and one that controls the cost of allocating
a frame for the first spill.
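A rough C illustration of what the two hooks separate. Everything here is hypothetical: the function names, the `alloc_state` struct and the cost formulas are assumptions, not the real hooks. The sketch follows only the ChangeLog's description: the first use of a callee-saved register and the first spill are each charged once, saves and frame allocation are weighted by entry frequency, and restores and deallocation by exit frequency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function allocation state, loosely mirroring the
   allocated_callee_save_regs and allocated_memory_p variables.  */
struct alloc_state {
  unsigned long callee_saves_used;  /* bit N set: callee-saved reg N in use */
  bool frame_allocated;             /* a spill has already needed a frame */
};

/* Extra cost of allocating callee-saved register REGNO: one save on
   entry plus one restore on exit, unless it is already being saved.  */
static int callee_save_cost(const struct alloc_state *s, unsigned regno,
                            int entry_freq, int exit_freq, int move_cost)
{
  assert(regno < 8 * sizeof s->callee_saves_used);
  if (s->callee_saves_used & (1UL << regno))
    return 0;
  return entry_freq * move_cost + exit_freq * move_cost;
}

/* Extra cost of the first spill: allocate the frame on entry and
   deallocate it on exit.  Later spills reuse the same frame.  */
static int frame_allocation_cost(const struct alloc_state *s,
                                 int entry_freq, int exit_freq,
                                 int adjust_cost)
{
  if (s->frame_allocated)
    return 0;
  return entry_freq * adjust_cost + exit_freq * adjust_cost;
}
```

In this toy model, once a register has been saved or a frame has been allocated, later allocations get that resource for free. That is why the allocator needs to track what it has already committed to.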

(The patch does not attempt to address the shrink-wrapping part of
the thread above.)

On AArch64, this is enough to fix PR117477, as verified by the new tests.
The patch does not change the SPEC2017 scores significantly.  (I saw a
slight improvement in fotonik3d and roms, but I'm not convinced that
the improvements are real.)

The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
which is a scan-dump correctness test that relies on not using
caller saves.  The decision to use caller saves looks appropriate,
and saves an instruction, so I've just added -fno-caller-saves
to the test options.

The x86 parts were written by Honza.

gcc/
	PR rtl-optimization/117477
	* config/aarch64/aarch64.cc (aarch64_count_saves): New function.
	(aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
	(aarch64_frame_allocation_cost): Likewise.
	(TARGET_CALLEE_SAVE_COST): Define.
	(TARGET_FRAME_ALLOCATION_COST): Likewise.
	* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
	Replace with...
	(ix86_callee_save_cost): ...this new hook.
	(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
	(TARGET_CALLEE_SAVE_COST): Define.
	* target.h (spill_cost_type, frame_cost_type): New enums.
	* target.def (callee_save_cost, frame_allocation_cost): New hooks.
	(ira_callee_saved_register_cost_scale): Delete.
	* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
	(TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
	* doc/tm.texi: Regenerate.
	* hard-reg-set.h (hard_reg_set_popcount): New function.
	* ira-color.cc (allocated_memory_p): New variable.
	(allocated_callee_save_regs): Likewise.
	(record_allocation): New function.
	(assign_hard_reg): Use targetm.frame_allocation_cost to model
	the cost of the first spill or first caller save.  Use
	targetm.callee_save_cost to model the cost of using new callee-saved
	registers.  Apply the exit rather than entry frequency to the cost
	of restoring a register or deallocating the frame.  Update the
	new variables above.
	(improve_allocation): Use record_allocation.
	(color): Initialize allocated_callee_save_regs.
	(ira_color): Initialize allocated_memory_p.
	* targhooks.h (default_callee_save_cost): Declare.
	(default_frame_allocation_cost): Likewise.
	* targhooks.cc (default_callee_save_cost): New function.
	(default_frame_allocation_cost): Likewise.

gcc/testsuite/
	PR rtl-optimization/117477
	* gcc.target/aarch64/callee_save_1.c: New test.
	* gcc.target/aarch64/callee_save_2.c: Likewise.
	* gcc.target/aarch64/callee_save_3.c: Likewise.
	* gcc.target/aarch64/pr103350-1.c: Add -fno-caller-saves.

Co-authored-by: Jan Hubicka <hubicka@ucw.cz>

middle-end/119119 - re-gimplification of empty CTOR assignments · 3bd61c1d

Richard Biener authored 1 week ago

The following testcase runs into a re-gimplification issue during
inlining when processing

  MEM[(struct e *)this_2(D)].a = {};

where re-gimplification does not handle assignments in the same
way as the gimplifier but instead relies on rhs_predicate_for
and gimplifying the RHS standalone.  This fails to handle
special-casing of CTORs.  The is_gimple_mem_rhs_or_call predicate
already handles clobbers but not empty CTORs so we end up in
the fallback code trying to force the CTOR into a separate stmt
using a temporary - but as we have a non-copyable type here that ICEs.

The following generalizes empty CTORs in is_gimple_mem_rhs_or_call
since those need no additional re-gimplification.

	PR middle-end/119119
	* gimplify.cc (is_gimple_mem_rhs_or_call): All empty CTORs
	are OK when not a register type.

	* g++.dg/torture/pr11911.C: New testcase.

c++: Don't replace INDIRECT_REFs by a const capture proxy too eagerly [PR117504] · fdf846fd

Simon Martin authored 1 week ago

We have been miscompiling the following valid code since GCC 8, more
specifically since r8-3497-g281e6c1d8f1b4c.

=== cut here ===
struct span {
  span (const int (&__first)[1]) : _M_ptr (__first) {}
  int operator[] (long __i) { return _M_ptr[__i]; }
  const int *_M_ptr;
};
void foo () {
  constexpr int a_vec[]{1};
  auto vec{[&a_vec]() -> span { return a_vec; }()};
}
=== cut here ===

The problem is that perform_implicit_conversion_flags (via
mark_rvalue_use) replaces "a_vec" in the return statement by a
CONSTRUCTOR representing a_vec's constant value, and then takes its
address when invoking span's constructor. So we end up with an instance
that points to garbage instead of a_vec's storage.

As per Jason's suggestion, this patch simply removes the calls to
mark_*_use from perform_implicit_conversion_flags, which fixes the PR.

	PR c++/117504

gcc/cp/ChangeLog:

	* call.cc (perform_implicit_conversion_flags): Don't call
	mark_{l,r}value_use.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/constexpr-117504.C: New test.
	* g++.dg/cpp2a/constexpr-117504a.C: New test.

RISC-V: Tweak asm check for test case multiple_rgroup_zbb.c · 0aa9b079

Pan Li authored 1 week ago


Changes to the vsetvl pass since GCC 14 make the asm check fail, so
update the asm check to match the current behavior.

The following test suites pass with this patch:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_zbb.c: Tweak
	the asm check for vsetvl.

Signed-off-by: Pan Li <pan2.li@intel.com>

Improve coverage of ext-dce tests in risc-v testsuite · 316eaca1

Jeff Law authored 1 week ago

Inspired by Liao Shihua, this adjusts two tests in the RISC-V testsuite
to get more coverage.  Drop the -O1 argument and replace it with -fext-dce.
That way the test gets run across the full set of flags.  We just need to
make sure to skip -O0.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/core_list_init.c: Use -fext-dce rather than
	-O1.  Skip for -O0.
	* gcc.target/riscv/pr111384.c: Ditto.

Daily bump. · da8aaa77
GCC Administrator authored 1 week ago

Mar 05, 2025

PR modula2/118998 Rotate of a packedset causes different types to binary operator error · 1b43154b

Gaius Mulley authored 1 week ago


This patch allows a packedset to be rotated by the system module intrinsic
procedure function.  It ensures that both operands to the tree rotate are
of the same type.  In turn the result will be the same type and the
assignment into the designator (of the same set type) will succeed.
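The rotate-count handling described in the ChangeLog can be sketched in C for an 8-bit set: convert the count to an integer, and if it is negative, negate it and rotate right. This is only an illustrative model. `logical_rotate8` and its helpers are hypothetical and are not part of gm2.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit set left by N bits (taken modulo 8).  The shifts are
   done in int after promotion; the result is converted back to uint8_t
   so it has the same type as the operand.  */
static uint8_t rotl8(uint8_t v, unsigned n)
{
  n &= 7;
  return (uint8_t) ((v << n) | (v >> ((8 - n) & 7)));
}

/* Rotate right by N bits, expressed as a left rotation.  */
static uint8_t rotr8(uint8_t v, unsigned n)
{
  return rotl8(v, (8 - (n & 7)) & 7);
}

/* A positive COUNT rotates left; a negative COUNT is negated and the
   set is rotated right instead.  */
static uint8_t logical_rotate8(uint8_t v, int count)
{
  if (count < 0)
    return rotr8(v, 0u - (unsigned) count);
  return rotl8(v, (unsigned) count);
}
```

The negation uses unsigned arithmetic (`0u - (unsigned) count`), which stays well defined even for `INT_MIN`.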

gcc/m2/ChangeLog:

	PR modula2/118998
	* gm2-gcc/m2expr.cc (m2expr_BuildLRotate): Convert nBits
	to the return type.
	(m2expr_BuildRRotate): Ditto.
	(m2expr_BuildLogicalRotate): Convert op3 to an integer type.
	Replace op3 with rotateCount.
	Negate rotateCount if it is negative and call rotate right.
	* gm2-gcc/m2pp.cc (m2pp_bit_and_expr): New function.
	(m2pp_binary_function): Ditto.
	(m2pp_simple_expression): BIT_AND_EXPR new case clause.
	LROTATE_EXPR ditto.
	RROTATE_EXPR ditto.

gcc/testsuite/ChangeLog:

	PR modula2/118998
	* gm2/iso/pass/testrotate.mod: New test.
	* gm2/pim/fail/tinyconst.mod: New test.
	* gm2/sets/run/pass/simplepacked.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Regenerate fortran/lang.opt.urls · 3c02d195

Mark Wielaard authored 1 week ago

fortran added a new -Wexternal-argument-mismatch option, but the
lang.opt.urls file wasn't regenerated.

Fixes: 21ca9153 ("C prototypes for external arguments; add warning for mismatch.")

gcc/fortran/ChangeLog:

	* lang.opt.urls: Regenerated.

c++: disable -Wnonnull in unevaluated context [PR115580] · 459c8a55

Marek Polacek authored 1 week ago


This PR complains that we issue a -Wnonnull even in a decltype.
This fix also disables -Wformat and -Wrestrict in unevaluated contexts.  I think that's fine.
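The PR and its test concern C++ `decltype`. Because `check_function_arguments` lives in c-family, a C analogue is a call inside an unevaluated `sizeof` operand. The sketch below only shows the kind of code affected. The commit does not say whether C was diagnosed before the change, and `takes_ptr` is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>

/* Declared but never defined: the call below is never evaluated,
   so no definition is needed at link time.  */
__attribute__((nonnull(1))) int takes_ptr(const char *p);

/* The operand of sizeof is unevaluated, like the operand of decltype
   in C++.  With c_inhibit_evaluation_warnings set here, the early
   return in check_function_arguments should keep -Wnonnull quiet.  */
static size_t unevaluated_call_size(void)
{
  return sizeof takes_ptr(NULL);
}
```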

	PR c++/115580

gcc/c-family/ChangeLog:

	* c-common.cc (check_function_arguments): Return early if
	c_inhibit_evaluation_warnings.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/Wnonnull16.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: coroutines and return in registers [PR118874] · 7e576d5b

Jason Merrill authored 1 week ago


Because coroutines insert a call to the resumer between the initialization
of the return value and the actual return to the caller, we need to
duplicate the work of gimplify_return_expr for the !aggregate_value_p case.

	PR c++/117364
	PR c++/118874

gcc/cp/ChangeLog:

	* coroutines.cc (cp_coroutine_transform::build_ramp_function): For
	!aggregate_value_p return force return value into a local temp.

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/torture/pr118874.C: New test.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>

arm: Fix signedness of vld1q intrinsic parms [PR118942] · 4d0a333e

Hannes Braun authored 3 weeks ago


vld1q_s8_x3, vld1q_s16_x3, vld1q_s8_x4 and vld1q_s16_x4 were expecting
pointers to unsigned integers. These parameters should be pointers to
signed integers.
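Why the pointee signedness matters, in portable C. The same bytes read through an `int8_t` pointer and through a `uint8_t` pointer give different values. Passing a signed array where the prototype expects `const uint8_t *` is what `-Wpointer-sign` diagnoses. The two helpers are hypothetical stand-ins, not the real NEON intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in with the corrected, signed parameter type.  */
static int sum3_s8(const int8_t *p)
{
  return p[0] + p[1] + p[2];
}

/* The same loads with the old, unsigned parameter type.  Calling this
   with an int8_t array needs a cast (or triggers -Wpointer-sign), and
   negative elements are reinterpreted as large positive values.  */
static int sum3_u8(const uint8_t *p)
{
  return p[0] + p[1] + p[2];
}
```

For example, with `int8_t a[3] = { -1, -2, 3 }`, the signed helper sums to 0. The unsigned one reads 255 + 254 + 3.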

gcc/ChangeLog:
	PR target/118942
	* config/arm/arm_neon.h (vld1q_s8_x3): Use int8_t instead of
	uint8_t.
	(vld1q_s16_x3): Use int16_t instead of uint16_t.
	(vld1q_s8_x4): Likewise.
	(vld1q_s16_x4): Likewise.

gcc/testsuite/ChangeLog:
	PR target/118942
	* gcc.target/arm/simd/vld1q_base_xN_1.c: Add -Wpointer-sign.

Signed-off-by: Hannes Braun <hannes@hannesbraun.net>

c++: Check invalid use of constrained auto with trailing return type [PR100589] · 7439febd

Da Xie authored 2 weeks ago


Add check for constrained auto type specifier in function declaration or
function type declaration with trailing return type. Issue error if such
usage is detected.

Test file renamed, and added a new test for type declaration.

Successfully bootstrapped and regtested on x86_64-pc-linux-gnu:
Added 6 passed and 4 unsupported tests.

	PR c++/100589

gcc/cp/ChangeLog:

	* decl.cc (grokdeclarator): Issue an error for a declarator with
	constrained auto type specifier and trailing return types. Include
	function names if available.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-pr100589.C: New test.

Signed-off-by: Da Xie <xxie_xd@163.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

PR rtl-optimization/119046: aarch64: Fix PARALLEL mode for vec_perm DUP expansion · ff505948

Kyrylo Tkachov authored 1 week ago


The PARALLEL created in aarch64_evpc_dup is used to hold the lane number.
It is not appropriate for it to have a vector mode.
Other such uses use VOIDmode.
Do this here as well.
This avoids the risk of generic code treating the PARALLEL as trapping when it
has floating-point mode.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

	PR rtl-optimization/119046
	* config/aarch64/aarch64.cc (aarch64_evpc_dup): Use VOIDmode for
	PARALLEL.

PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping · db764821

Kyrylo Tkachov authored 2 weeks ago


In this testcase late-combine was failing to merge:
        dup     v31.4s, v31.s[3]
        fmla    v30.4s, v31.4s, v29.4s
into the lane-wise fmla form.
This is because late-combine checks may_trap_p under the hood on the dup insn.
This ended up returning true for the insn:
(set (reg:V4SF 152 [ _32 ])
        (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
                (parallel:V4SF [
                        (const_int 3 [0x3])]))))

Although may_trap_p correctly reasoned that vec_duplicate and vec_select of
floating-point modes can't trap, it assumed that the V4SF parallel can trap.
The correct behaviour is to recurse into the vector inside the PARALLEL and
check the sub-expressions.  This patch adjusts may_trap_p_1 to do just that.
With this check the above insn is not deemed to be trapping and is propagated
into the FMLA giving:
        fmla    vD.4s, vA.4s, vB.s[3]
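A toy C model of the change, built around the insn quoted above. The expression codes, the `fp_mode` flag and the rule "any other FP-mode operation may trap" are simplified assumptions. They are not rtlanal.cc. The point is that a container such as PARALLEL is judged by its elements, not by its own mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression codes; not GCC's rtx_code.  */
enum code { X_CONST_INT, X_REG, X_VEC_SELECT, X_VEC_DUPLICATE, X_PARALLEL,
            X_DIV };

struct expr {
  enum code code;
  bool fp_mode;                 /* node has a floating-point (vector) mode */
  size_t nops;
  const struct expr *ops[2];
};

static bool may_trap(const struct expr *x)
{
  assert(x->nops <= 2);
  switch (x->code)
    {
    case X_CONST_INT:
    case X_REG:
      return false;

    case X_VEC_SELECT:
    case X_VEC_DUPLICATE:
    case X_PARALLEL:
      /* These cannot trap by themselves; only their operands can.
         PARALLEL is just a container, so its mode is ignored and we
         recurse into its elements instead.  */
      break;

    default:
      /* Treat any other FP-mode operation as potentially trapping.  */
      if (x->fp_mode)
        return true;
      break;
    }

  for (size_t i = 0; i < x->nops; i++)
    if (may_trap(x->ops[i]))
      return true;
  return false;
}
```

In this model, the dup insn from the message (vec_duplicate of a vec_select with a V4SF parallel holding lane 3) is not trapping. An FP division still is.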

Bootstrapped and tested on aarch64-none-linux-gnu.
Apparently this also fixes a regression in
gcc.target/aarch64/vmul_element_cost.c that I observed.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/

	PR rtl-optimization/119046
	* rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.

gcc/testsuite/

	PR rtl-optimization/119046
	* gcc.target/aarch64/pr119046.c: New test.

value-range: Fix up irange::union_bitmask [PR118953] · 54da358f

Jakub Jelinek authored 1 week ago

The following testcase is miscompiled during evrp.
Before vrp, we have (from ccp):
  # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x2d
  _3 = _2 + 18446744073708503085;
...
  # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x59
  _6 = (long long unsigned int) _5;
  # RANGE [irange] int [-INF, +INF] MASK 0xffffc000 VALUE 0x34
  _7 = k_11 + -1048524;
  switch (_7) <default: <L5> [33.33%], case 8: <L7> [33.33%], case 24: <L6> [33.33%], case 32: <L6> [33.33%]>
...
  # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc07d VALUE 0x0
  # i_20 = PHI <_3(4), 0(3), _6(2)>
and evrp is now trying to figure out range for i_20 in range_of_phi.

All the ranges and MASK/VALUE pairs above are correct for the testcase:
k_11, and _2 which is based on it, are the result of multiplying by a
constant with the low 14 bits cleared and then adding some numbers to it.

There is an obvious missed optimization for which I've filed PR119039,
simplify_switch_using_ranges could see that all the labels but default
are unreachable because the controlling expression has
MASK 0xffffc000 VALUE 0x34 and none of 8, 24 and 32 satisfy that.

Anyway, during range_of_phi for i_20, we process the PHI arguments
in order.  For the _3(4) case, we figure out that it is reachable
through the case 24: case 32: labels only of the switch and that
0x34 - 0x2d is 7, so derive
[irange] long long unsigned int [17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
(the MASK/VALUE just got inherited from the _3 earlier range).
Now (not surprisingly, because those labels aren't actually reachable),
that range is inconsistent, 0x2d is 45, so there is conflict between the
values and the irange_bitmask.
value-range.{h,cc} code differentiates between actually stored
irange_bitmask, which is that MASK 0xffffffffffffc000 VALUE 0x2d, and
semantic bitmask, which is what get_bitmask returns.  That is
  // The mask inherent in the range is calculated on-demand.  For
  // example, [0,255] does not have known bits set by default.  This
  // saves us considerable time, because setting it at creation incurs
  // a large penalty for irange::set.  At the time of writing there
  // was a 5% slowdown in VRP if we kept the mask precisely up to date
  // at all times.  Instead, we default to -1 and set it when
  // explicitly requested.  However, this function will always return
  // the correct mask.
  //
  // This also means that the mask may have a finer granularity than
  // the range and thus contradict it.  Think of the mask as an
  // enhancement to the range.  For example:
  //
  // [3, 1000] MASK 0xfffffffe VALUE 0x0
  //
  // 3 is in the range endpoints, but is excluded per the known 0 bits
  // in the mask.
  //
  // See also the note in irange_bitmask::intersect.
  irange_bitmask bm
    = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
  if (!m_bitmask.unknown_p ())
    bm.intersect (m_bitmask);
Now, get_bitmask_from_range here is MASK 0x1f VALUE 0x0 and it intersects
that with that MASK 0xffffffffffffc000 VALUE 0x2d.
Which triggers the ugly special case in irange_bitmask::intersect:
  // If we have two known bits that are incompatible, the resulting
  // bit is undefined.  It is unclear whether we should set the entire
  // range to UNDEFINED, or just a subset of it.  For now, set the
  // entire bitmask to unknown (VARYING).
  if (wi::bit_and (~(m_mask | src.m_mask),
                   m_value ^ src.m_value) != 0)
    {
      unsigned prec = m_mask.get_precision ();
      m_mask = wi::minus_one (prec);
      m_value = wi::zero (prec);
    }
so the semantic bitmask is actually MASK 0xffffffffffffffff VALUE 0x0.
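The conflict can be reproduced with a simplified 64-bit model of the MASK/VALUE pair, using the numbers from this message. The struct and helper names are hypothetical. The conflict test in `bm_intersect` is copied from the quoted `irange_bitmask::intersect`. The merge of non-conflicting masks is a simplification that assumes VALUE is zero wherever MASK is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of irange_bitmask: bits set in MASK are unknown;
   bits clear in MASK are known and equal the bit in VALUE.  VALUE is
   kept zero wherever MASK is set.  */
struct bitmask {
  uint64_t mask;
  uint64_t value;
};

/* VARYING: every bit unknown.  */
static const struct bitmask bm_varying = { UINT64_MAX, 0 };

/* Combine the known bits of A and B.  If some bit is known in both
   but with different values, give up and return VARYING.  */
static struct bitmask bm_intersect(struct bitmask a, struct bitmask b)
{
  if ((~(a.mask | b.mask) & (a.value ^ b.value)) != 0)
    return bm_varying;
  struct bitmask r;
  r.mask = a.mask & b.mask;
  r.value = (a.value | b.value) & ~r.mask;
  return r;
}

/* Does X agree with every known bit of M?  */
static bool bm_member_p(struct bitmask m, uint64_t x)
{
  return (x & ~m.mask) == m.value;
}
```

Here `~(0x1f | 0xffffffffffffc000)` is `0x3fe0`, and `0x3fe0 & 0x2d` is `0x20`, so bit 5 conflicts and the result is VARYING. Of the values 0, 17, 25 and 45 that end up in the subranges, only 45 (0x2d) matches the stored bitmask.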

Next, range_of_phi attempts to union it with the 0(3) PHI argument,
and during irange::union_ first adds the [0,0] to the subranges, so
[irange] long long unsigned int [0, 0][17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
and then goes on to irange::union_bitmask which does
  if (m_bitmask == r.m_bitmask)
    return false;
  irange_bitmask bm = get_bitmask ();
  irange_bitmask save = bm;
  bm.union_ (r.get_bitmask ());
  if (save == bm)
    return false;
  m_bitmask = bm;
  if (save == get_bitmask ())
    return false;
m_bitmask MASK 0xffffffffffffc000 VALUE 0x2d isn't the same as
r.m_bitmask MASK 0x0 VALUE 0x0, so we compute the semantic bitmask
(but note, not from the original range before union, but the modified one,
dunno if that isn't a problem as well), which is still the VARYING/unknown_p
one, union_ that with MASK 0x0 VALUE 0x0 and get still
MASK 0xffffffffffffffff VALUE 0x0, so don't update anything, the semantic
bitmask didn't change, so we are fine (not!, see later).

Except then we try to union with the third PHI argument.  And, because the
edge to that comes only from case 8: label and there is a known difference
between the two, the argument is actually already from earlier replaced by
45(2) constant.  So, irange::union_ adds the [45, 45] range to the list
of subranges, but voila, 45 is 0x2d and satisfies the stored
MASK 0xffffffffffffc000 VALUE 0x2d and so the semantic bitmask changed
from MASK 0xffffffffffffffff VALUE 0x0 to MASK 0xffffffffffffc000 VALUE 0x2d
by that addition.  Eventually, we just optimize this to
[irange] long long unsigned int [45, 45] because that is the only range
which satisfies the bitmask.  And that is wrong, at runtime i_20 has
value 0.

The following patch attempts to detect this case where get_bitmask
turns some non-VARYING m_bitmask into VARYING one because of a conflict
and in that case makes sure m_bitmask is actually updated rather than
unmodified, so that later union_ doesn't cause problems.

I also wonder whether e.g. get_bitmask couldn't have special case for this
and if bm.intersect (m_bitmask); yields unknown_p from something not
originally unknown_p, perhaps chooses to just use get_bitmask_from_range
value and ignore the stored m_bitmask.  Though, dunno how union_bitmask
in that case would figure out it needs to update m_bitmask.

2025-03-05  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/118953
	* value-range.cc (irange::union_bitmask): Update m_bitmask if
	get_bitmask () is unknown_p and m_bitmask is not even when the
	semantic bitmask didn't change and returning false.

	* gcc.dg/torture/pr118953.c: New test.

middle-end/97323 - TYPE_CANONICAL vs. ARRAY_TYPE modes · 556e25f0

Richard Biener authored 1 week ago

For strict-alignment targets we can end up with BLKmode single-element
array types when the element type is unaligned.  This confuses
type checking since the canonical type would have an aligned
element type and a non-BLKmode mode.  The following simply ignores
the mode we assign to array types for this purpose, like we already
do for record and union types.

	PR middle-end/97323
	* tree.cc (gimple_canonical_types_compatible_p): Ignore
	TYPE_MODE also for ARRAY_TYPE.
	(verify_type): Likewise.

	* gcc.dg/pr97323.c: New testcase.

Fortran: Add view convert to pointer assign when only pointer/alloc attr differs [PR104684] · 705ae582

Andre Vehreschild authored 1 week ago

	PR fortran/104684

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_conv_expr_descriptor): Look at the
	lang-specific akind and do a view convert when only the akind
	attribute differs between pointer and allocatable array.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/ptr_comp_6.f08: New test.

c++: Fix checking assert upon invalid class definition [PR116740] · b3d07822

Simon Martin authored 1 week ago

A checking assert triggers upon the following invalid code since
GCC 11:

=== cut here ===
class { a (struct b;
} struct b
=== cut here ===

The problem is that during error recovery, we call
set_identifier_type_value_with_scope for B in the global namespace, and
the checking assert added via r11-7228-g8f93e1b892850b fails.

This patch relaxes that assert to not fail if we've seen a parser error
(it is a generalization of another fix done to that checking assert via
r11-7266-g24bf79f1798ad1).

	PR c++/116740

gcc/cp/ChangeLog:

	* name-lookup.cc (set_identifier_type_value_with_scope): Don't
	fail assert with ill-formed input.

gcc/testsuite/ChangeLog:

	* g++.dg/parse/crash80.C: New test.

openmp, c++: Fix up OpenMP/OpenACC handling in C++ modules [PR119102] · ddeb7054

Jakub Jelinek authored 1 week ago

module.cc apparently has support for extensions and attempts to ensure
that if a module is compiled with those extensions enabled, sources which
use the module are compiled with the same extensions.
The only extension supported is SE_OPENMP right now.
And the use of the extension is keyed on streaming out or in OMP_CLAUSE
tree.
This is undesirable for several reasons.
OMP_CLAUSE is the only tree which can appear in the IL even without
-fopenmp/-fopenmp-simd/-fopenacc (when simd ("notinbranch") or
simd ("inbranch") attributes are used), and it can appear also in all
the 3 modes mentioned above.  On the other side, with the exception of
arguments of attributes added e.g. for declare simd where no harm should
be done if -fopenmp/-fopenmp-simd isn't enabled later on, OMP_CLAUSE appears
in OMP_*_CLAUSES of OpenMP/OpenACC construct trees.  And those construct
trees often have no clauses at all, so keying the extension on OMP_CLAUSE
doesn't catch many cases that should be caught.
Furthermore, for OpenMP we have 2 modes, -fopenmp-simd which parses some
OpenMP but constructs from that mostly OMP_SIMD and a few other cases,
and -fopenmp which includes that and far more on top of that; and there is
also -fopenacc.

So, this patch stops setting/requesting the extension on OMP_CLAUSE,
introduces 3 extensions rather than one (SE_OPENMP_SIMD, SE_OPENMP and
SE_OPENACC) and keys those on OpenMP constructs from the -fopenmp-simd
subset, other OpenMP constructs and OpenACC constructs.
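A sketch of the import-side check as bit flags. The bit values and both helper functions are hypothetical, and module.cc's actual encoding may differ. Following the description above, the sketch also assumes that -fopenmp satisfies everything -fopenmp-simd does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments; module.cc's real values may differ.  */
enum streamed_extensions {
  SE_OPENMP_SIMD = 1u << 0,
  SE_OPENMP      = 1u << 1,
  SE_OPENACC     = 1u << 2
};

/* Extensions accepted by the importing compilation.  -fopenmp is
   assumed to cover the -fopenmp-simd subset as well.  */
static unsigned enabled_extensions(bool fopenmp, bool fopenmp_simd,
                                   bool fopenacc)
{
  unsigned e = 0;
  if (fopenmp)
    e |= SE_OPENMP | SE_OPENMP_SIMD;
  if (fopenmp_simd)
    e |= SE_OPENMP_SIMD;
  if (fopenacc)
    e |= SE_OPENACC;
  return e;
}

/* Extensions the module was streamed with that the importer lacks;
   each set bit corresponds to a missing option to diagnose.  */
static unsigned missing_extensions(unsigned module_exts, unsigned enabled)
{
  return module_exts & ~enabled;
}
```

For example, a module that only contains `omp simd` constructs imports cleanly with -fopenmp, while one that uses `omp parallel` still needs -fopenmp when imported with only -fopenmp-simd.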

2025-03-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/119102
gcc/cp/
	* module.cc (enum streamed_extensions): Add SE_OPENMP_SIMD
	and SE_OPENACC, change value of SE_OPENMP and SE_BITS.
	(CASE_OMP_SIMD_CODE, CASE_OMP_CODE, CASE_OACC_CODE): Define.
	(trees_out::start): Don't set SE_OPENMP extension for OMP_CLAUSE.
	Set SE_OPENMP_SIMD extension for CASE_OMP_SIMD_CODE, SE_OPENMP
	for CASE_OMP_CODE and SE_OPENACC for CASE_OACC_CODE.
	(trees_in::start): Don't fail for OMP_CLAUSE with missing
	SE_OPENMP extension.  Do fail for CASE_OMP_SIMD_CODE and missing
	SE_OPENMP_SIMD extension, or CASE_OMP_CODE and missing SE_OPENMP
	extension, or CASE_OACC_CODE and missing SE_OPENACC extension.
	(module_state::write_readme): Write all of SE_OPENMP_SIMD, SE_OPENMP
	and SE_OPENACC extensions.
	(module_state::read_config): Diagnose missing -fopenmp, -fopenmp-simd
	and/or -fopenacc depending on extensions used.
gcc/testsuite/
	* g++.dg/modules/pr119102_a.H: New test.
	* g++.dg/modules/pr119102_b.C: New test.
	* g++.dg/modules/omp-3_a.C: New test.
	* g++.dg/modules/omp-3_b.C: New test.
	* g++.dg/modules/omp-3_c.C: New test.
	* g++.dg/modules/omp-3_d.C: New test.
	* g++.dg/modules/oacc-1_a.C: New test.
	* g++.dg/modules/oacc-1_b.C: New test.
	* g++.dg/modules/oacc-1_c.C: New test.

c++: Fix a comment typo · b85b405e

Jakub Jelinek authored 1 week ago

During the 118874 coro investigation I found a typo in a comment.

Fixed thusly.

2025-03-05  Jakub Jelinek  <jakub@redhat.com>

	* typeck.cc (check_return_expr): Fix comment typo, rom -> from.

c++: Apply/diagnose attributes when instantiating ARRAY/POINTER/REFERENCE_TYPE [PR118787] · 1853b02d

Jakub Jelinek authored 1 week ago

With the following testcase we don't pedwarn on alignas applied to
dependent types or alignas with a dependent argument, which IMO is in
violation of the P2552R3 paper.

tsubst was just ignoring TYPE_ATTRIBUTES.

The following patch fixes it for the POINTER/REFERENCE_TYPE and
ARRAY_TYPE cases, but perhaps we need to do the same also for other
types (INTEGER_TYPE/REAL_TYPE and the like).  I guess I'll need to
construct more testcases.

2025-03-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118787
	* pt.cc (tsubst) <case ARRAY_TYPE>: Use return t; only if it doesn't
	have any TYPE_ATTRIBUTES.  Call apply_late_template_attributes.
	<case POINTER_TYPE, case REFERENCE_TYPE>: Likewise.  Formatting fix.

	* g++.dg/cpp0x/alignas22.C: New test.

LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084] · 4856292f

Xi Ruoyao authored 2 weeks ago

They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression.  The incorrect reorder has been observed in openh264 LTO
build.

Expand them to a (mem) expression instead of unspec to fix the issue.
Then we need to make loongarch_address_insns return 1 for
ADDRESS_REG_REG because the constraint "R" expects this behavior, or
the vldx instruction will be considered invalid by the register
allocation pass and turned into add.d + vld.  Apply the ADDRESS_REG_REG
penalty in loongarch_address_cost instead, loongarch_rtx_costs should
also call loongarch_address_cost instead of loongarch_address_insns
then.

Closes: https://github.com/cisco/openh264/issues/3857

gcc/ChangeLog:

	PR target/119084
	* config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove.
	(lasx_xvldx): Remove.
	* config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove.
	(lsx_vldx): Remove.
	* config/loongarch/simd.md (QIVEC): New define_mode_iterator.
	(<simd_isa>_<x>vldx): New define_expand.
	* config/loongarch/loongarch.cc (loongarch_address_insns_1): New
	static function with most logic factored out from ...
	(loongarch_address_insns): ... here.  Call
	loongarch_address_insns_1 with reg_reg_cost = 1.
	(loongarch_address_cost): Call loongarch_address_insns_1 with
	reg_reg_cost = la_addr_reg_reg_cost.

gcc/testsuite/ChangeLog:

	PR target/119084
	* gcc.target/loongarch/pr119084.c: New test.

Daily bump. · c49ef76d
GCC Administrator authored 1 week ago

Mar 04, 2025

c++: C++23 range-for temps and ?: [PR119073] · f2a7f845

Jason Merrill authored 1 week ago

Here gimplification got confused because extend_temps_r messed up the types
of the arms of a COND_EXPR.

	PR c++/119073

gcc/cp/ChangeLog:

	* call.cc (extend_temps_r): Preserve types of COND_EXPR arms.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/range-for39.C: New test.

libgo: bump libgo version for GCC 15 release · 8d776294
Ian Lance Taylor authored 2 weeks ago
```
For PR go/119098

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/654477
```
C prototypes for external arguments; add warning for mismatch. · 21ca9153

Thomas Koenig authored 1 week ago

The problem was that we were not handling external dummy arguments
with -fc-prototypes-external. In looking at this, I found that we
were not warning about external procedures with different argument
lists.  This can actually be legal (see the two test cases) but
creates a problem for the C prototypes: If we have something like

subroutine foo(a,n)
  external a
  if (n == 1) call a(1)
  if (n == 2) call a(2,3)
end subroutine foo

then, pre-C23, we could just have written out the prototype as

void foo_ (void (*a) (), int *n);

but this is illegal in C23. What to do?  I finally chose to warn
about the argument mismatch, with a new option. Warn only because the
code above is legal, but include in -Wall because such code seems highly
suspect.  This option is also implied in -fc-prototypes-external. I also
put a warning in the generated header file in that case, so users
have a chance to see what is going on (especially since gcc now
defaults to C23).
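For comparison, this is what a C23-clean C side has to do by hand when one dummy procedure is called with different argument lists. It is not what -fc-prototypes-external emits; per this message, that falls back to the pre-C23 empty argument list plus a warning. The sketch assumes pass-by-reference `int` arguments, and the callees `take_one` and `take_two` are hypothetical.

```c
#include <assert.h>

/* A generic function-pointer type.  In C23, "void (*a) ()" means a
   function taking no arguments, so it can no longer stand for "any
   argument list" as it did before C23.  */
typedef void (*generic_fn)(void);

/* Hand-written C counterpart of the Fortran subroutine foo(a, n).
   Fortran passes arguments by reference, hence the int pointers.
   Converting A back to its real type before the call keeps the
   call well defined.  */
void foo_(generic_fn a, int *n)
{
  int one = 1, two = 2, three = 3;
  if (*n == 1)
    ((void (*)(int *)) a)(&one);
  if (*n == 2)
    ((void (*)(int *, int *)) a)(&two, &three);
}

/* Example callees with one and two arguments.  */
static int last_result;
static void take_one(int *x) { last_result = *x; }
static void take_two(int *x, int *y) { last_result = *x + *y; }
```

Callers pass `(generic_fn) take_one` with `n == 1`, or `(generic_fn) take_two` with `n == 2`. Each procedure is therefore only ever called through its real type, which mirrors the legal Fortran usage.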

gcc/fortran/ChangeLog:

	PR fortran/119049
	PR fortran/119074
	* dump-parse-tree.cc (seen_conflict): New static variable.
	(gfc_dump_external_c_prototypes): Initialize it. If it was
	set, write out a warning that -std=c23 will not work.
	(write_proc): Move the work of actually writing out the
	formal arglist to...
	(write_formal_arglist): New function. Handle external dummy
	parameters and their argument lists. If there were mismatched
	arguments, output an empty argument list in pre-C23 style.
	* gfortran.h (struct gfc_symbol): Add ext_dummy_arglist_mismatch
	flag and formal_at.
	* invoke.texi: Document -Wexternal-argument-mismatch.
	* lang.opt: Put it in.
	* resolve.cc (resolve_function): If warning about external
	argument mismatches, build a formal from actual arglist the
	first time around, and later compare and warn.
	(resolve_call): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/119049
	PR fortran/119074
	* gfortran.dg/interface_55.f90: New test.
	* gfortran.dg/interface_56.f90: New test.

AVR: Add texi @subsubsection "AVR Optimization Options". · 9ee39fcb
Georg-Johann Lay authored 2 weeks ago
```
gcc/
	* doc/invoke.texi (AVR Optimization Options): New @subsubsection
	for pure optimization options.
```
testsuite: arm: Use effective-target for pr68674.c test · 879fd9c8

Torbjörn SVENSSON authored 4 months ago


gcc/testsuite/ChangeLog:

	* gcc.target/arm/pr68674.c: Use effective-target arm_arch_v7a
	and arm_libc_fp_abi.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

__builtin_bswapXX: improve docs · 5452b50a
Oscar Gustafsson authored 1 week ago
```
gcc/ChangeLog:

	* doc/extend.texi: Improve example for __builtin_bswap16.
```
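A minimal, separate illustration of `__builtin_bswap16`, which reverses the two bytes of a 16-bit value. The commit does not show the new extend.texi example, so this is not that example. The portable shift version is included only for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* __builtin_bswap16 reverses the byte order of a 16-bit value.  */
static uint16_t swap16(uint16_t x)
{
  return __builtin_bswap16(x);
}

/* Portable equivalent using shifts, for comparison.  */
static uint16_t swap16_portable(uint16_t x)
{
  return (uint16_t) ((x >> 8) | (x << 8));
}
```

For example, both functions map 0xaabb to 0xbbaa.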
Break false dependency chain on Zen5 · 8c4a00f9

Jan Hubicka authored 1 week ago

Some Zen5 variants have a false dependency on the tzcnt, blsi, blsr and blsmsk
instructions.  This can be tested with the following benchmark:

jh@shroud:~> cat ee.c
int
main()
{
       int a = 10;
       int b = 0;
       for (int i = 0; i < 1000000000; i++)
       {
#ifdef BREAK
               asm volatile ("xor %0, %0": "=r" (b));
#endif
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
       }
       return 0;
}
jh@shroud:~> cat bmk.sh
gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 ; time ./a.out
jh@shroud:~> sh bmk.sh tzcnt

real    0m0.886s
user    0m0.886s
sys     0m0.000s

real    0m0.886s
user    0m0.886s
sys     0m0.000s

jh@shroud:~> sh bmk.sh blsi

real    0m0.979s
user    0m0.979s
sys     0m0.000s

real    0m2.418s
user    0m2.418s
sys     0m0.000s

jh@shroud:~> sh bmk.sh blsr

real    0m0.986s
user    0m0.986s
sys     0m0.000s

real    0m2.422s
user    0m2.421s
sys     0m0.000s
jh@shroud:~> sh bmk.sh blsmsk

real    0m0.973s
user    0m0.973s
sys     0m0.000s

real    0m2.422s
user    0m2.422s
sys     0m0.000s

We already have a tunable that controls tzcnt together with lzcnt and popcnt.
Since it seems that only tzcnt is affected, I added a new tunable to control
tzcnt only.  I also added splitters for blsi/blsr/blsmsk, implemented
analogously to the existing splitter for lzcnt.
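
For reference, a portable C sketch of what the three BMI1 instructions compute (the helper names are hypothetical). Each result depends only on the source operand and never reads the old value of the destination, which is why zeroing the destination first (the xor in the benchmark above) is enough to break the false dependency.

```c
#include <stdint.h>

/* BLSI: isolate the lowest set bit (0 if x == 0).  */
static uint32_t
blsi32 (uint32_t x)
{
  return x & (0u - x);
}

/* BLSR: clear the lowest set bit (0 if x == 0).  */
static uint32_t
blsr32 (uint32_t x)
{
  return x & (x - 1u);
}

/* BLSMSK: mask of all bits up to and including the lowest set bit
   (all ones if x == 0).  */
static uint32_t
blsmsk32 (uint32_t x)
{
  return x ^ (x - 1u);
}
```

For x = 12 (binary 1100) these give 4, 8 and 7 respectively.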

The patch is neutral on SPEC.  We produce blsi and blsr in some internal loops,
but they usually have the same destination as source.  However, it is good to
break the dependency chain to avoid pathological cases and it is quite cheap
overall, so I think we want to enable this for generic.  I will send a
follow-up patch for this.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro.
	(TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro.
	* config/i386/i386.md (*bmi_blsi_<mode>): Add splitter for false
	dependency.
	(*bmi_blsi_<mode>_ccno): Add splitter for false dependency.
	(*bmi_blsi_<mode>_falsedep): New pattern.
	(*bmi_blsmsk_<mode>): Add splitter for false dependency.
	(*bmi_blsmsk_<mode>_falsedep): New pattern.
	(*bmi_blsr_<mode>): Add splitter for false dependency.
	(*bmi_blsr_<mode>_cmp): Add splitter for false dependency.
	(*bmi_blsr_<mode>_cmp_falsedep): New pattern.
	* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New tune.
	(X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/blsi.c: New test.
	* gcc.target/i386/blsmsk.c: New test.
	* gcc.target/i386/blsr.c: New test.

8c4a00f9

Fortran: Fix gimplification error on assignment to pointer [PR103391] · 04909c7e

Andre Vehreschild authored 1 week ago

	PR fortran/103391

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_trans_assignment_1): Do not use poly assign
	for pointer arrays on lhs (as it is done for allocatables
	already).

gcc/testsuite/ChangeLog:

	* gfortran.dg/assign_12.f90: New test.

04909c7e

Make ix86_macro_fusion_pair_p and ix86_fuse_mov_alu_p match current CPUs · c84be624

Jan Hubicka authored 2 weeks ago

The current implementation of the fusion predicates misses some common
fusion cases on Zen and more recent cores.  I added knobs for the
individual conditions we test.

 1) I split out the checks for fusing an ALU operation with a conditional
    branch when the ALU operation has a memory operand.  This seems to be
    supported by Zen3+ and by Tiger Lake and Cooper Lake (according to
    Agner Fog's manual).

 2) znver4 and znver5 support fusion of an ALU operation and a conditional
    branch even if the ALU operation has both memory and immediate operands.
    This seems to be relatively important, enabling 25% more fusions during
    a GCC bootstrap.

 3) No CPU supports fusion when the ALU operation contains an IP-relative
    memory reference.  I added a separate knob so we do not forget about
    this if it gets supported later.
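
The knobs above act as independent gates on the first instruction of a macro-fused pair. A minimal sketch of that decision logic, assuming hypothetical struct and field names; it is not the code of ix86_macro_fusion_pair_p, and the separate cmp/test gate is an assumption added so the model also covers the CMP and TEST rows of the measurements.

```c
#include <stdbool.h>

/* Hypothetical tuning flags mirroring the knobs described above.  */
struct fuse_tune
{
  bool cmp_and_branch;  /* cmp/test + jcc may fuse */
  bool alu_and_branch;  /* other ALU ops (add, sub, ...) + jcc may fuse */
  bool mem;             /* first insn may have a memory operand */
  bool mem_imm;         /* ... together with an immediate operand */
  bool rip_relative;    /* the memory operand may be RIP-relative */
};

/* Hypothetical description of the instruction preceding the jcc.  */
struct first_insn
{
  bool is_cmp_or_test;
  bool has_mem;
  bool has_imm;
  bool rip_relative;
};

/* Return true if the pair (first insn, jcc) is expected to fuse.  */
static bool
can_fuse_with_jcc (const struct fuse_tune *t, const struct first_insn *i)
{
  if (i->is_cmp_or_test ? !t->cmp_and_branch : !t->alu_and_branch)
    return false;
  if (i->has_mem)
    {
      if (!t->mem)
        return false;
      if (i->has_imm && !t->mem_imm)
        return false;
      if (i->rip_relative && !t->rip_relative)
        return false;
    }
  return true;
}
```

Under these assumptions, the perf counts below correspond to: zen1/2 with alu_and_branch and mem_imm clear, zen3/4 with mem_imm clear, and zen5 with everything set except rip_relative.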

The patch does not solve the limitation of sched that fused pairs must be
adjacent on input and the first operation must be single-set.  Fixing
single-set is easy (I have a separate patch for this); for non-adjacent
pairs we need bigger surgery.

To verify what the CPUs really do, I made a simple test script.

jh@ryzen3:~> cat fuse-test.c
        int b;
        const int z = 0;
        const int o = 1;
        int
main()
{
        int a = 1000000000;
        int b;
        int z = 0;
        int o = 1;
        asm volatile ("\n"
".L1234:\n"
        "nop\n"
        "subl   %3, %0\n"

        "movl %0, %1\n"
        "cmpl     %2, %1\n"
        "movl %0, %1\n"
        "test %1, %1\n"

        "nop\n"
        "jne    .L1234":"=a"(a),
        "=m"(b)
        "=r"(b)
        :
        "m"(z),
        "m"(o),
        "i"(0),
        "i"(1),
        "0"(a)
                );
}
jh@ryzen3:~> cat fuse-test.sh
EVENT=ex_ret_fused_instr
dotest()
{
gcc -O2  fuse-test.c $* -o fuse-cmp-imm-mem-nofuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-nofuse  2>&1 | grep $EVENT
gcc -O2 fuse-test.c -DFUSE $* -o fuse-cmp-imm-mem-fuse
perf stat  -e $EVENT ./fuse-cmp-imm-mem-fuse 2>&1 | grep $EVENT
}

echo ALU with immediate
dotest
echo ALU with memory
dotest -D MEM
echo ALU with IP relative memory
dotest -D MEM -D IPRELATIVE
echo CMP with immediate
dotest -D CMP
echo CMP with memory
dotest -D CMP -D MEM
echo CMP with memory and immediate
dotest -D CMP -D MEMIMM
echo CMP with IP relative memory
dotest -D CMP -D MEM -D IPRELATIVE
echo TEST
dotest -D TEST

On zen5 I get:
ALU with immediate
            20,345      ex_ret_fused_instr:u
     1,000,020,278      ex_ret_fused_instr:u
ALU with memory
            20,367      ex_ret_fused_instr:u
     1,000,020,290      ex_ret_fused_instr:u
ALU with IP relative memory
            20,395      ex_ret_fused_instr:u
            20,403      ex_ret_fused_instr:u
CMP with immediate
            20,369      ex_ret_fused_instr:u
     1,000,020,301      ex_ret_fused_instr:u
CMP with memory
            20,314      ex_ret_fused_instr:u
     1,000,020,341      ex_ret_fused_instr:u
CMP with memory and immediate
            20,372      ex_ret_fused_instr:u
     1,000,020,266      ex_ret_fused_instr:u
CMP with IP relative memory
            20,382      ex_ret_fused_instr:u
            20,369      ex_ret_fused_instr:u
TEST
            20,346      ex_ret_fused_instr:u
     1,000,020,301      ex_ret_fused_instr:u

The IP-relative memory limitation does not seem to be documented.

On zen3/4 I get:

ALU with immediate
            20,263      ex_ret_fused_instr:u
     1,000,020,051      ex_ret_fused_instr:u
ALU with memory
            20,255      ex_ret_fused_instr:u
     1,000,020,056      ex_ret_fused_instr:u
ALU with IP relative memory
            20,253      ex_ret_fused_instr:u
            20,266      ex_ret_fused_instr:u
CMP with immediate
            20,264      ex_ret_fused_instr:u
     1,000,020,052      ex_ret_fused_instr:u
CMP with memory
            20,253      ex_ret_fused_instr:u
     1,000,019,794      ex_ret_fused_instr:u
CMP with memory and immediate
            20,260      ex_ret_fused_instr:u
            20,264      ex_ret_fused_instr:u
CMP with IP relative memory
            20,258      ex_ret_fused_instr:u
            20,256      ex_ret_fused_instr:u
TEST
            20,261      ex_ret_fused_instr:u
     1,000,020,048      ex_ret_fused_instr:u

On zen1 and zen2 I get:

ALU with immediate
            21,610      ex_ret_fus_brnch_inst:u
            21,697      ex_ret_fus_brnch_inst:u
ALU with memory
            21,479      ex_ret_fus_brnch_inst:u
            21,747      ex_ret_fus_brnch_inst:u
ALU with IP relative memory
            21,623      ex_ret_fus_brnch_inst:u
            21,684      ex_ret_fus_brnch_inst:u
CMP with immediate
            21,708      ex_ret_fus_brnch_inst:u
     1,000,021,288      ex_ret_fus_brnch_inst:u
CMP with memory
            21,689      ex_ret_fus_brnch_inst:u
     1,000,004,270      ex_ret_fus_brnch_inst:u
CMP with memory and immediate
            21,604      ex_ret_fus_brnch_inst:u
            21,671      ex_ret_fus_brnch_inst:u
CMP with IP relative memory
            21,589      ex_ret_fus_brnch_inst:u
            21,602      ex_ret_fus_brnch_inst:u
TEST
            21,600      ex_ret_fus_brnch_inst:u
     1,000,021,233      ex_ret_fus_brnch_inst:u

I tested the patch on zen3 and zen5 with spec2k17 and it seems neutral;
however, the number of fusions does go up.

Bootstrapped/regtested x86_64-linux, I plan to commit it tomorrow.

Honza

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_FUSE_ALU_AND_BRANCH_MEM): New macro.
	(TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM): New macro.
	(TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New macro.
	* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Support
	non-single-set.
	(ix86_macro_fusion_pair_p): Allow ALU which only clobbers;
	be more careful about immediates; check TARGET_FUSE_ALU_AND_BRANCH_MEM,
	TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM, TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE;
	verify that we never use unsigned checks with inc/dec.
	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM_IMM): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New tune.

c84be624

c++: ICE with RANGE_EXPR and array init [PR109431] · 173cf7c9

Marek Polacek authored 2 weeks ago

We crash because we generate

  {[0 ... 1]={.low=0, .high=1}, [1]={.low=0, .high=1}}

which output_constructor_regular_field doesn't want to see.  This
happens since r9-1483: process_init_constructor_array can now create
a RANGE_EXPR.  But the bug isn't in that patch; the problem is that
build_vec_init doesn't handle RANGE_EXPRs.

build_vec_init has a FOR_EACH_CONSTRUCTOR_ELT loop which populates
const_vec.  In this case it loops over the elements of

  {[0 ... 1]={.low=0, .high=1}}

but assumes that each CONSTRUCTOR element initializes exactly one array
element.  So after the loop num_initialized_elts was 1, and then below:

              HOST_WIDE_INT last = tree_to_shwi (maxindex);
              if (num_initialized_elts <= last)
                {
                  tree field = size_int (num_initialized_elts);
                  if (num_initialized_elts != last)
                    field = build2 (RANGE_EXPR, sizetype, field,
                                    size_int (last));
                  CONSTRUCTOR_APPEND_ELT (const_vec, field, e);
                }

we added the extra initializer.

It seemed convenient to use range_expr_nelts to count the elements covered
by a RANGE_EXPR.
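
A minimal C model of the counting bug, using hypothetical names (ctor_index, count_initialized, needs_trailing_init); it is not the init.cc code. For the CONSTRUCTOR {[0 ... 1]=...} with maxindex 1, counting one element per entry leaves num_initialized_elts at 1, so the condition quoted above appends a spurious [1] initializer; counting the RANGE_EXPR as hi - lo + 1 elements gives 2 and no extra initializer.

```c
#include <stddef.h>

/* Hypothetical CONSTRUCTOR index: a single index (lo == hi) or a
   RANGE_EXPR [lo ... hi].  */
struct ctor_index { long lo, hi; };

/* Elements covered by one index, in the spirit of range_expr_nelts.  */
static long
index_nelts (struct ctor_index idx)
{
  return idx.hi - idx.lo + 1;
}

/* Count initialized elements.  With count_ranges == 0 this models the
   old behaviour of counting one element per CONSTRUCTOR entry.  */
static long
count_initialized (const struct ctor_index *elts, size_t n, int count_ranges)
{
  long num = 0;
  for (size_t i = 0; i < n; i++)
    num += count_ranges ? index_nelts (elts[i]) : 1;
  return num;
}

/* Mirrors the quoted check: a trailing initializer covering the rest
   of the array is appended only when num_initialized_elts <= last.  */
static int
needs_trailing_init (long num_initialized_elts, long last)
{
  return num_initialized_elts <= last;
}
```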

	PR c++/109431

gcc/cp/ChangeLog:

	* cp-tree.h (range_expr_nelts): Declare.
	* init.cc (build_vec_init): If the CONSTRUCTOR's index is a
	RANGE_EXPR, use range_expr_nelts to count how many elements
	were initialized.

gcc/testsuite/ChangeLog:

	* g++.dg/init/array67.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

173cf7c9

aarch64: force operand to fresh register to avoid subreg issues [PR118892] · d883f323

Tamar Christina authored 1 week ago

When the input is already a subreg and we try to make a paradoxical
subreg out of it for copysign, this can fail if it violates the subreg
relationship.

Use force_lowpart_subreg instead of lowpart_subreg so that the result is
forced into a register instead of ICEing.

gcc/ChangeLog:

	PR target/118892
	* config/aarch64/aarch64.md (copysign<GPF:mode>3): Use
	force_lowpart_subreg instead of lowpart_subreg.

gcc/testsuite/ChangeLog:

	PR target/118892
	* gcc.target/aarch64/copysign-pr118892.c: New test.

d883f323

Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976] · 78380fd7

Richard Sandiford authored 1 week ago

There was an embarrassing typo in the folding of BIT_NOT_EXPR for
POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
how that happened, but it might have been due to the way that
~x is implemented as -1 - x internally.
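
The identity ~x == -1 - x also shows what the correct fold looks like for a poly_int. A minimal sketch with a hypothetical two-coefficient type standing for c0 + c1*n, where n stands for the runtime indeterminate (for SVE, related to the vector length): applying ~ gives -1 - c0 - c1*n, i.e. ~c0 in the constant term and -c1 in the n term, whereas plain negation gives -c0 and is off by one.

```c
#include <stdint.h>

/* Hypothetical two-coefficient poly_int: value = c0 + c1 * n.  */
struct poly2 { int64_t c0, c1; };

static int64_t
poly2_eval (struct poly2 p, int64_t n)
{
  return p.c0 + p.c1 * n;
}

/* Correct BIT_NOT: ~v == -1 - v == (~c0) + (-c1) * n.  */
static struct poly2
poly2_not (struct poly2 p)
{
  struct poly2 r = { ~p.c0, -p.c1 };
  return r;
}

/* What the typo computed instead: plain negation, equal to ~v + 1.  */
static struct poly2
poly2_neg (struct poly2 p)
{
  struct poly2 r = { -p.c0, -p.c1 };
  return r;
}
```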

gcc/
	PR tree-optimization/118976
	* fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR.
	* config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function.
	(aarch64_run_selftests): Run it.

78380fd7

simplify-rtx: Fix up simplify_logical_relational_operation [PR119002] · 1ff01a88

Richard Sandiford authored 1 week ago

The following testcase is miscompiled on powerpc64le-linux starting with
r15-6777.  During combine we see:

(set (reg:SI 134)
    (ior:SI (ge:SI (reg:CCFP 128)
            (const_int 0 [0]))
        (lt:SI (reg:CCFP 128)
            (const_int 0 [0]))))

The simplify_logical_relational_operation code (in its current form)
was written with arithmetic rather than CC modes in mind.  Since CCFP
is a CC mode, it fails the HONOR_NANS check, and so the function assumes
that ge | lt => true.

If one comparison is unsigned then it should be safe to assume that
the other comparison is also unsigned, even for CC modes, since the
optimisation checks that the comparisons are between the same operands.
For the other cases, we can only safely fold comparisons of CC mode
values if the result is always-true (15) or always-false (0).
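
A small C illustration of why ge | lt is not always true once unordered (NaN) results are possible. The 4-bit encoding below, one bit each for LT, EQ, GT and UNORDERED so that 15 means always true, is a hypothetical illustration and not necessarily the encoding simplify-rtx uses.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical encoding of a comparison as the set of outcomes for
   which it is true.  15 = always true, 0 = always false.  */
enum { CMP_LT = 1, CMP_EQ = 2, CMP_GT = 4, CMP_UNORD = 8 };

enum
{
  MASK_GE = CMP_GT | CMP_EQ,  /* ge is false for unordered operands */
  MASK_LT = CMP_LT            /* lt is false for unordered operands */
};

/* IOR of two comparisons of the same operands: union of outcomes.  */
static int
ior_masks (int a, int b)
{
  return a | b;
}

/* Runtime check of the same expression on doubles.  */
static bool
ge_or_lt (double x, double y)
{
  return (x >= y) | (x < y);
}
```

Here ge | lt covers LT, EQ and GT (mask 7) but not UNORDERED, so it is only always true when NaNs can be ruled out; with x = NaN both comparisons are false.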

It turns out that the original testcase for PR117186, which ran at -O,
was relying on the old behaviour for some of the functions.  It needs
4-instruction combinations, and so -fexpensive-optimizations, to pass
in its intended form.

gcc/
	PR rtl-optimization/119002
	* simplify-rtx.cc
	(simplify_context::simplify_logical_relational_operation): Handle
	comparisons between CC values.  If there is no evidence that the
	CC values are unsigned, restrict the fold to always-true or
	always-false results.

gcc/testsuite/
	* gcc.c-torture/execute/ieee/pr119002.c: New test.
	* gcc.target/aarch64/pr117186.c: Run at -O2 rather than -O.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>

1ff01a88

testsuite: Add tests for already fixed PR [PR119071] · ccf9db9a

Jakub Jelinek authored 1 week ago

Uros' r15-7793 fixed this PR as well, I'm just committing tests
from the PR so that it can be closed.

2025-03-04  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/119071
	* gcc.dg/pr119071.c: New test.
	* gcc.c-torture/execute/pr119071.c: New test.

ccf9db9a

Fortran: Prevent ICE when getting caf-token from abstract type [PR77872] · 5bd66483

Andre Vehreschild authored 2 weeks ago

	PR fortran/77872

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_get_tree_for_caf_expr): Pick up token from
	decl when it is present there for class types.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/class_1.f90: New test.

5bd66483