Commits · 389e41f3ee011b3092a4841b0711dc8b68eccbca · COBOLworx / gcc-cobol

Jan 24, 2025

Improved Make-lang.in for charmaps- and valconv-dup. New errmsg for missing main() · 389e41f3
rdubner authored 2 months ago

389e41f3
Triple-ply copybook playpen · 7370bf9f
rdubner authored 2 months ago

7370bf9f
Merge remote-tracking branch 'gnu-gcc/master' into master+cobol · c315037a
rdubner authored 2 months ago

Tag: v0.2180

c315037a

tree-optimization/116010 - dr_may_alias regression · 02fc12b0

Richard Biener authored 2 months ago

r15-491-gc290e6a0b7a9de fixed a latent issue with dr_analyze_innermost
and dr_may_alias where not properly analyzed DRs would yield an invalid
answer.  This caused some missed optimizations in case there is not
actually any evolution in the not analyzed base part.  The following
recovers this by only handling base parts which reference SSA vars
as index in the conservative way.

The gfortran.dg/vect/vect-8.f90 testcase is difficult to deal with,
so the following merely bumps the maximum number of expected vectorized loops
for both aarch64 and x86-64.

	PR tree-optimization/116010
	* tree-data-ref.cc (contains_ssa_ref_p_1): New function.
	(contains_ssa_ref_p): Likewise.
	(dr_may_alias_p): Avoid treating unanalyzed base parts without
	SSA reference conservatively.

	* gfortran.dg/vect/vect-8.f90: Adjust.

02fc12b0

s390: Implement isfinite and isnormal optabs · b00bd292

Stefan Schulze Frielinghaus authored 2 months ago

Merge new optabs with the existing implementations for signbit and
isinf.
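
For orientation, the four classification queries covered by these expanders can be exercised from C++ through the standard <cmath> functions. The following is only a sketch of the user-visible semantics; the helper name classify is hypothetical and not part of the patch.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: summarises the queries that the merged expanders
// serve (signbit, isinf, isfinite, isnormal) for one value.
inline std::string classify(double x)
{
  std::string s = std::signbit(x) ? "neg " : "pos ";
  if (std::isinf(x))
    s += "inf";
  else if (std::isnormal(x))
    s += "normal";
  else if (std::isfinite(x))
    s += "finite-not-normal";   // zero or subnormal
  else
    s += "nan";
  return s;
}
```

Zero and subnormal values are finite but not normal, which is why isfinite and isnormal need separate handling.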

gcc/ChangeLog:

	* config/s390/s390.h (S390_TDC_POSITIVE_ZERO): Remove.
	(S390_TDC_NEGATIVE_ZERO): Remove.
	(S390_TDC_POSITIVE_NORMALIZED_BFP_NUMBER): Remove.
	(S390_TDC_NEGATIVE_NORMALIZED_BFP_NUMBER): Remove.
	(S390_TDC_POSITIVE_DENORMALIZED_BFP_NUMBER): Remove.
	(S390_TDC_NEGATIVE_DENORMALIZED_BFP_NUMBER): Remove.
	(S390_TDC_POSITIVE_INFINITY): Remove.
	(S390_TDC_NEGATIVE_INFINITY): Remove.
	(S390_TDC_POSITIVE_QUIET_NAN): Remove.
	(S390_TDC_NEGATIVE_QUIET_NAN): Remove.
	(S390_TDC_POSITIVE_SIGNALING_NAN): Remove.
	(S390_TDC_NEGATIVE_SIGNALING_NAN): Remove.
	(S390_TDC_POSITIVE_DENORMALIZED_DFP_NUMBER): Remove.
	(S390_TDC_NEGATIVE_DENORMALIZED_DFP_NUMBER): Remove.
	(S390_TDC_POSITIVE_NORMALIZED_DFP_NUMBER): Remove.
	(S390_TDC_NEGATIVE_NORMALIZED_DFP_NUMBER): Remove.
	(S390_TDC_SIGNBIT_SET): Remove.
	(S390_TDC_INFINITY): Remove.
	* config/s390/s390.md (signbit<mode>2<tf_fpr>): Merge this one
	(isinf<mode>2<tf_fpr>): and this one into
	(<TDC_CLASS:tdc_insn><mode>2<tf_fpr>): new expander.
	(isnormal<mode>2<tf_fpr>): New BFP expander.
	(isnormal<mode>2): New DFP expander.
	* config/s390/vector.md (signbittf2_vr): Merge this one
	(isinftf2_vr): and this one into
	(<tdc_insn>tf2_vr): new expander.
	(signbittf2): Merge this one
	(isinftf2): and this one into
	(<tdc_insn>tf2): new expander.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/isfinite-isinf-isnormal-signbit-1.c: New test.
	* gcc.target/s390/isfinite-isinf-isnormal-signbit-2.c: New test.
	* gcc.target/s390/isfinite-isinf-isnormal-signbit-3.c: New test.
	* gcc.target/s390/isfinite-isinf-isnormal-signbit.h: New test.

b00bd292

tree-optimization/118634 - improve cunroll dump · dc1e1b38

Richard Biener authored 2 months ago

We no longer subtract the estimated eliminated number of instructions
from the estimated size after unrolling we print - this is a bit
confusing when comparing dumps to previous releases.  The following
changes the dump from

  Estimated size after unrolling: 42

to

  Estimated size after unrolling: 42-12

for the testcase in the PR.

	PR tree-optimization/118634
	* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely):
	Dump the number of estimated eliminated insns.

dc1e1b38

Fix command flags for SVE2 faminmax · 8bdf10fc

Saurabh Jha authored 2 months ago

Earlier, we were gating SVE2 faminmax behind sve+faminmax. This was
incorrect and this patch changes it so that it is gated behind
sve2+faminmax.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md:
	(*aarch64_pred_faminmax_fused): Fix to use the correct flags.
	* config/aarch64/aarch64.h
	(TARGET_SVE_FAMINMAX): Remove.
	* config/aarch64/iterators.md: Fix iterators so that famax and
	famin use correct flags.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/faminmax_1.c: Fix test to use the
	correct flags.
	* gcc.target/aarch64/sve/faminmax_2.c: Fix test to use the
	correct flags.
	* gcc.target/aarch64/sve/faminmax_3.c: New test.

8bdf10fc

[ifcombine] check for more zero-extension cases [PR118572] · 91fa9c15

Alexandre Oliva authored 2 months ago

When comparing a signed narrow variable with a wider constant that has
the bit corresponding to the variable's sign bit set, we would check
that the constant is a sign-extension from that sign bit, and conclude
that the compare fails if it isn't.

When the signed variable is masked without getting the [lr]l_signbit
variable set, or when the sign bit itself is masked out, we know the
sign-extension bits from the extended variable are going to be zero,
so the constant will only compare equal if it is a zero- rather than
sign-extension from the narrow variable's precision, therefore, check
that it satisfies this property, and yield a false compare result
otherwise.


for  gcc/ChangeLog

	PR tree-optimization/118572
	* gimple-fold.cc (fold_truth_andor_for_ifcombine): Compare as
	unsigned the variables whose extension bits are masked out.

for  gcc/testsuite/ChangeLog

	PR tree-optimization/118572
	* gcc.dg/field-merge-24.c: New.

91fa9c15

[ifcombine] improve reverse checking and operand swapping · a56122de

Alexandre Oliva authored 2 months ago

Don't reject an ifcombine field-merging opportunity just because the
left-hand operands aren't both reversed, if the second compare needs
to be swapped for operands to match.

Also mention that reversep does NOT affect the turning of range tests
into bit tests.


for  gcc/ChangeLog

	* gimple-fold.cc (fold_truth_andor_for_ifcombine): Document
	reversep's absence of effects on range tests.  Don't reject
	reversep mismatches before trying compare swapping.

a56122de

[ifcombine] out-of-bounds bitfield refs can trap [PR118514] · 3f05d703

Alexandre Oliva authored 2 months ago

Check that BIT_FIELD_REFs of DECLs are in range before deciding they
don't trap.

Check that a replacement bitfield load is as trapping as the replaced
load.
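
The range check can be pictured with a minimal sketch. The function names and the bit-count parameters below are assumptions made for illustration; they are not the interface of bit_field_ref_in_bounds_p.

```cpp
#include <cstdint>

// Hypothetical stand-in for the in-bounds test: a reference to size_bits
// bits starting at offset_bits is in range only if it ends at or before
// the end of an object that is object_bits wide.
inline bool bit_ref_in_bounds(std::uint64_t object_bits,
                              std::uint64_t offset_bits,
                              std::uint64_t size_bits)
{
  // Written so that offset_bits + size_bits cannot overflow.
  return size_bits <= object_bits && offset_bits <= object_bits - size_bits;
}

// A reference that is not provably in bounds is treated as possibly trapping.
inline bool bit_ref_could_trap(std::uint64_t object_bits,
                               std::uint64_t offset_bits,
                               std::uint64_t size_bits)
{
  return !bit_ref_in_bounds(object_bits, offset_bits, size_bits);
}
```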


for  gcc/ChangeLog

	PR tree-optimization/118514
	* tree-eh.cc (bit_field_ref_in_bounds_p): New.
	(tree_could_trap_p) <BIT_FIELD_REF>: Call it.
	* gimple-fold.cc (make_bit_field_load): Check trapping status
	of replacement load against original load.

for  gcc/testsuite/ChangeLog

	PR tree-optimization/118514
	* gcc.dg/field-merge-23.c: New.

3f05d703

Daily bump. · 35d5c4f9
GCC Administrator authored 2 months ago

35d5c4f9

Jan 23, 2025

c++: bogus error with nested lambdas [PR117602] · 6d8a0e8b

Marek Polacek authored 4 months ago


The error here should also check that we aren't nested in another
lambda; in it, at_function_scope_p() will be false.

	PR c++/117602

gcc/cp/ChangeLog:

	* cp-tree.h (current_nonlambda_scope): Add a default argument.
	* lambda.cc (current_nonlambda_scope): New bool parameter.  Use it.
	* parser.cc (cp_parser_lambda_introducer): Use current_nonlambda_scope
	to check if the lambda is non-local.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/lambda-uneval21.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

6d8a0e8b

c++: Small make_tree_vector_from_ctor improvement · 4ce9e353

Jakub Jelinek authored 2 months ago

After committing the append_ctor_to_tree_vector patch, I've realized
that for the larger constructors make_tree_vector_from_ctor unnecessarily
wastes one GC vector; make_tree_vector () / release_tree_vector () only
caches GC vectors from 4 to 16 allocated tree elements, so in the likely
case of a rather small ctor using make_tree_vector () can be beneficial,
we can pick something from the cache and if we don't need it later,
pt.cc calls release_tree_vector on it to return it back to the cache.
But for the larger ctors, we just eat one vector from the cache, never
use it (because the vec_safe_reserve will immediately allocate a different
vector) and never return it back to the cache.

So, the following patch passes NULL for the larger vectors, which
append_ctor_to_tree_vector handles just fine now (vec_safe_reserve will
just allocate appropriately sized vector).
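
A toy model of the idea, assuming a simple free list of vectors. VectorCache, make_from_ctor and kMaxCachedElts are hypothetical names; this is not GCC's make_tree_vector implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical upper bound, mirroring the 16-element limit described above.
constexpr std::size_t kMaxCachedElts = 16;

// Toy cache of reusable vectors.
struct VectorCache {
  std::vector<std::vector<int>> free_list;

  // Hand out a cached vector if one exists, otherwise a fresh empty one.
  std::vector<int> take()
  {
    if (free_list.empty())
      return {};
    std::vector<int> v = std::move(free_list.back());
    free_list.pop_back();
    v.clear();
    return v;
  }

  void release(std::vector<int> v) { free_list.push_back(std::move(v)); }
};

// Small constructors borrow a cached vector; large ones skip the cache so
// that no cached vector is consumed only to be replaced by a bigger one.
inline std::vector<int> make_from_ctor(VectorCache& cache, std::size_t nelts)
{
  std::vector<int> v;
  if (nelts <= kMaxCachedElts)
    v = cache.take();
  v.reserve(nelts);
  for (std::size_t i = 0; i < nelts; ++i)
    v.push_back(static_cast<int>(i));
  return v;
}
```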

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

	* c-common.cc (make_tree_vector_from_ctor): Only use make_tree_vector
	for ctors with <= 16 elements.

4ce9e353

hppa: Fix typo in ADDITIONAL_REGISTER_NAMES in pa32-regs.h · ce28eb9f

John David Anglin authored 2 months ago

2025-01-23  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

	* config/pa/pa32-regs.h (ADDITIONAL_REGISTER_NAMES): Change
	register 86 name to "%fr31L".

ce28eb9f

Merge remote-tracking branch 'origin/parser' into bobdev · c1dc4e7a
rdubner authored 2 months ago

c1dc4e7a
Tweaked ROUNDING into compliance with the standard · 35e0b040
rdubner authored 2 months ago

35e0b040

vect: Avoid copying of uninitialized variable [PR118628] · 8f6dd185

Jakub Jelinek authored 2 months ago

vectorizable_{store,load} does roughly
      tree offvar;
      tree running_off;
      if (!costing_p)
        {
          ... initialize offvar ...
        }
      running_off = offvar;
      for (...)
        {
          if (costing_p)
            {
              ...
              continue;
            }
          ... use running_off ...
        }
so it unconditionally copies a variable that is sometimes uninitialized (but
then uses the copy only if it was set to something initialized).
Still, I think it is better to avoid copying around maybe-uninitialized
vars.
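
A reduced C++ analogue of the pattern after the fix, with a pointer standing in for the tree and nullptr for NULL_TREE. The function name and the integer payload are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reduction: offvar is only computed when not costing, but it
// is initialised up front so that the copy below never reads garbage.
inline int sum_or_count(const std::vector<int>& data, bool costing_p)
{
  const int* offvar = nullptr;          // plays the role of NULL_TREE
  if (!costing_p)
    offvar = data.data();               // ... initialize offvar ...
  const int* running_off = offvar;      // always copies an initialised value
  int total = 0;
  for (std::size_t i = 0; i < data.size(); ++i)
    {
      if (costing_p)
        {
          ++total;                      // costing only counts the work
          continue;
        }
      total += running_off[i];          // ... use running_off ...
    }
  return total;
}
```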

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/118628
	* tree-vect-stmts.cc (vectorizable_store, vectorizable_load):
	Initialize offvar to NULL_TREE.

8f6dd185

WIP: rounding · f81df894
rdubner authored 2 months ago

f81df894

Fortran: do not evaluate arguments of MAXVAL/MINVAL too often [PR118613] · 3cef53a4

Harald Anlauf authored 2 months ago

	PR fortran/118613

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxval): Adjust algorithm
	for inlined version of MINVAL and MAXVAL so that arguments are only
	evaluated once, and create temporaries where necessary.  Document
	change of algorithm.

gcc/testsuite/ChangeLog:

	* gfortran.dg/maxval_arg_eval_count.f90: New test.

3cef53a4

remove -v from install · db9f068f
James K. Lowden authored 2 months ago

db9f068f

AVR: PR118012 - Try to work around sick code from match.pd. · 0bb32230

Georg-Johann Lay authored 2 months ago

This patch tries to work around PR118012 which may use a
full fledged multiplication instead of a simple bit test.
This is because match.pd's

/* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
/* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */

"optimizes" code with op in { plus, ior, xor } like

  if (a & 1)
    b = b <op> c;

to something like:

  x1 = EXTRACT_BIT0 (a);
  x2 = c MULT x1;
  b = b <op> x2;

or

  x1 = EXTRACT_BIT0 (a);
  x2 = ZERO_EXTEND (x1);
  x3 = NEG x2;
  x4 = c AND x3;
  b = b <op> x4;

which is very expensive and may even result in a libgcc call for
a 32-bit multiplication on devices that don't even have MUL.
Notice that EXTRACT_BIT0 is already more expensive (slower, more
code, more register pressure) than a bit-test + branch.
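
The three forms can be written out in C++ to check that they compute the same value. This is a sketch for op = xor on 32-bit unsigned values; the function names are hypothetical.

```cpp
#include <cstdint>

// Source form: a bit test plus a conditional update.
inline std::uint32_t branch_form(std::uint32_t a, std::uint32_t b,
                                 std::uint32_t c)
{
  if (a & 1)
    b = b ^ c;
  return b;
}

// Multiplication form.
inline std::uint32_t mult_form(std::uint32_t a, std::uint32_t b,
                               std::uint32_t c)
{
  std::uint32_t x1 = a & 1;    // EXTRACT_BIT0 (a)
  std::uint32_t x2 = c * x1;   // c MULT x1
  return b ^ x2;
}

// Negate-and-mask form: the mask is all ones exactly when bit 0 of a is set,
// so it has to be applied to c for the result to match the other two forms.
inline std::uint32_t mask_form(std::uint32_t a, std::uint32_t b,
                               std::uint32_t c)
{
  std::uint32_t x1 = a & 1;    // EXTRACT_BIT0 (a), zero-extended
  std::uint32_t x3 = 0u - x1;  // NEG
  std::uint32_t x4 = c & x3;   // AND
  return b ^ x4;
}
```

All three agree for every input; they differ only in cost, which is the point of the complaint above.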

The patch:

o Adds some combiner patterns that try to map sick code back
  to a bit test + branch.

o Adjusts costs to make MULT (x AND 1) cheap, in the hope that the
  middle-end will use that alternative (which we map to sane code).

o On devices without MUL, 32-bit multiplication was performed by a
  library call, which bypasses the MULT (x AND 1) and similar patterns.
  Therefore, mulsi3 is also allowed for devices without MUL so that
  we get a MULT pattern that can be transformed.  (Though this is
  not possible on AVR_TINY since it passes arguments on the stack).

o Adds a new command line option -mpr118012, so most of the patterns
  and cost computations can be switched off as they have
  avropt_pr118012 in their insn condition.

o Adds sign-extract.0 patterns unconditionally (no avropt_pr118012).

Notice that this patch is just a work-around, it's not a fix of the
root cause, which are the patterns in match.pd that don't care about
the target and don't even care about costs.

The work-around is incomplete, and 3 of the new tests are still failing.
This is because there are situations where it does not work:

* The MULT is realized as a library call.

* The MULT is realized as an ASHIFT, and the ASHIFT again is transformed
  into something else.  For example, with -O2 -mmcu=atmega128,
  ASHIFT(3) is transformed into ASHIFT(1) + ASHIFT(2).

	PR tree-optimization/118012
	PR tree-optimization/118360
gcc/
	* config/avr/avr.opt (-mpr118012): New undocumented option.
	* config/avr/avr-protos.h (avr_out_sextr)
	(avr_emit_skip_pixop, avr_emit_skip_clear): New protos.
	* config/avr/avr.cc (avr_adjust_insn_length)
	[case ADJUST_LEN_SEXTR]: Handle case.
	(avr_rtx_costs_1) [NEG]: Costs for NEG (ZERO_EXTEND (ZERO_EXTRACT)).
	[MULT && avropt_pr118012]: Costs for MULT (x AND 1).
	(avr_out_sextr, avr_emit_skip_pixop, avr_emit_skip_clear): New
	functions.
	* config/avr/avr.md [avropt_pr118012]: Add combine patterns with
	that condition that try to work around PR118012.
	(adjust_len) <sextr>: Add insn attr value.
	(pixop): New code iterator.
	(mulsi3) [avropt_pr118012 && !AVR_TINY]: Allow these in insn condition.
gcc/testsuite/
	* gcc.target/avr/mmcu/pr118012-1.h: New file.
	* gcc.target/avr/mmcu/pr118012-1-o2-m128.c: New test.
	* gcc.target/avr/mmcu/pr118012-1-os-m128.c: New test.
	* gcc.target/avr/mmcu/pr118012-1-o2-m103.c: New test.
	* gcc.target/avr/mmcu/pr118012-1-os-m103.c: New test.
	* gcc.target/avr/mmcu/pr118012-1-o2-t40.c: New test.
	* gcc.target/avr/mmcu/pr118012-1-os-t40.c: New test.
	* gcc.target/avr/mmcu/pr118360-1.h: New file.
	* gcc.target/avr/mmcu/pr118360-1-o2-m128.c: New test.
	* gcc.target/avr/mmcu/pr118360-1-os-m128.c: New test.
	* gcc.target/avr/mmcu/pr118360-1-o2-m103.c: New test.
	* gcc.target/avr/mmcu/pr118360-1-os-m103.c: New test.
	* gcc.target/avr/mmcu/pr118360-1-o2-t40.c: New test.
	* gcc.target/avr/mmcu/pr118360-1-os-t40.c: New test.

0bb32230

Optimize vector<bool>::operator[] · 2d55c016

Jan Hubicka authored 2 months ago

The following testcase:

  bool f(const std::vector<bool>& v, std::size_t x) {
    return v[x];
  }

is compiled as:

f(std::vector<bool, std::allocator<bool> > const&, unsigned long):
        testq   %rsi, %rsi
        leaq    63(%rsi), %rax
        movq    (%rdi), %rdx
        cmovns  %rsi, %rax
        sarq    $6, %rax
        leaq    (%rdx,%rax,8), %rdx
        movq    %rsi, %rax
        sarq    $63, %rax
        shrq    $58, %rax
        addq    %rax, %rsi
        andl    $63, %esi
        subq    %rax, %rsi
        jns     .L2
        addq    $64, %rsi
        subq    $8, %rdx
.L2:
        movl    $1, %eax
        shlx    %rsi, %rax, %rax
        andq    (%rdx), %rax
        setne   %al
        ret

which is quite expensive for simple bit access in a bitmap.  The reason is that
the bit access is implemented using iterators
	return begin()[__n];
which in turn has to handle the case where __n is negative, yielding the extra
conditional in _M_incr:

    _GLIBCXX20_CONSTEXPR
    void
    _M_incr(ptrdiff_t __i)
    {
      _M_assume_normalized();
      difference_type __n = __i + _M_offset;
      _M_p += __n / int(_S_word_bit);
      __n = __n % int(_S_word_bit);
      if (__n < 0)
        {
          __n += int(_S_word_bit);
          --_M_p;
        }
      _M_offset = static_cast<unsigned int>(__n);
    }

We could use __builtin_unreachable to declare that __n is in range
0...max_size (), but I think it is better to implement it directly, since
the resulting code is shorter and much easier to optimize.
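
A minimal sketch of a direct bit lookup, assuming the bits are stored in an array of unsigned long words. word_t, kWordBits and test_bit are hypothetical names; libstdc++ uses its own _Bit_type and _S_word_bit.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical word type and width for the sketch.
using word_t = unsigned long;
constexpr std::size_t kWordBits = sizeof(word_t) * CHAR_BIT;

// Direct lookup: n is unsigned, so the word index and the bit position come
// from a plain shift and mask, with no fix-up for negative offsets.
inline bool test_bit(const word_t* words, std::size_t n)
{
  return (words[n / kWordBits] >> (n % kWordBits)) & 1u;
}
```

Because the index type is unsigned, the division and remainder by the word size reduce to a shift and a mask.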

We now produce:
.LFB1248:
        .cfi_startproc
        movq    (%rdi), %rax
        movq    %rsi, %rdx
        shrq    $6, %rdx
        andq    (%rax,%rdx,8), %rsi
        andl    $63, %esi
        setne   %al
        ret

Testcase suggests
        movq    (%rdi), %rax
        movl    %esi, %ecx
        shrq    $5, %rsi        # does still need to be 64-bit
        movl    (%rax,%rsi,4), %eax
        btl     %ecx, %eax
        setb    %al
        retq
Which is still one instruction shorter.

libstdc++-v3/ChangeLog:

	PR target/80813
	* include/bits/stl_bvector.h (vector<bool, _Alloc>::operator []): Do
	not use iterators.

gcc/testsuite/ChangeLog:

	PR target/80813
	* g++.dg/tree-ssa/bvector-3.C: New test.

2d55c016

rtl-ssa: Avoid dangling phi uses [PR118562] · 3dbcf794

Richard Sandiford authored 2 months ago

rtl-ssa uses degenerate phis to maintain an RPO list of
accesses in which every use is of the RPO-previous definition.
Thus, if it finds that a phi is always equal to a particular
value V, it sometimes needs to keep the phi and make V the
single input, rather than replace all uses of the phi with V.

The code to do that rerouted the phi's first input to the single
value V.  But as this PR shows, it failed to unlink the uses of
the other inputs.

The specific problem in the PR was that we had:

    x = PHI<x(a), V(b)>

The code replaced the first input with V and removed the second
input from the phi, but it didn't unlink the use of V associated
with that second input.

gcc/
	PR rtl-optimization/118562
	* rtl-ssa/blocks.cc (function_info::replace_phi): When converting
	to a degenerate phi, make sure to remove all uses of the previous
	inputs.

gcc/testsuite/
	PR rtl-optimization/118562
	* gcc.dg/torture/pr118562.c: New test.

3dbcf794

aarch64: Avoid redundant writes to FPMR · 1886dfb2

Richard Sandiford authored 2 months ago

GCC 15 is the first release to support FP8 intrinsics.
The underlying instructions depend on the value of a new register,
FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
register rather than a global register.  So:

- The FP8 intrinsics take a final uint64_t argument that
  specifies what value FPMR should have.

- If an FP8 operation is split across multiple functions,
  it is likely that those functions would have a similar argument.

If the object code has the structure:

    for (...)
      fp8_kernel (..., fpmr_value);

then fp8_kernel would set FPMR to fpmr_value each time it is
called, even though FPMR will already have that value for at
least the second and subsequent calls (and possibly the first).

The working assumption for the ABI has been that writes to
registers like FPMR can in general be more expensive than
reads and so it would be better to use a conditional write like:

       mrs     tmp, fpmr
       cmp     tmp, <value>
       beq     1f
       msr     fpmr, <value>
     1:

instead of writing the same value to FPMR repeatedly.
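
The effect of the conditional write can be modelled in a few lines of C++. FakeSysReg and set_if_different are hypothetical and only simulate a register whose writes are counted; nothing here touches the real FPMR.

```cpp
#include <cstdint>

// Hypothetical model of a system register whose writes are costly.
struct FakeSysReg {
  std::uint64_t value = 0;
  unsigned writes = 0;

  std::uint64_t read() const { return value; }
  void write(std::uint64_t v) { value = v; ++writes; }
};

// Read, compare, and write only on mismatch, mirroring the
// mrs / cmp / beq / msr sequence.
inline void set_if_different(FakeSysReg& reg, std::uint64_t wanted)
{
  if (reg.read() != wanted)
    reg.write(wanted);
}
```

Calling set_if_different repeatedly with the same value performs one write at most, however many times the kernel is entered.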

This patch implements that.  It also adds a tuning flag that suppresses
the behaviour, both to make testing easier and to support any future
cores that (for example) are able to rename FPMR.

Hopefully this really is the last part of the FP8 enablement.

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
	* config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
	* config/aarch64/aarch64.md: Split moves into FPMR into a test
	and branch around.
	(aarch64_write_fpmr): New pattern.

gcc/testsuite/
	* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add
	cheap_fpmr_write by default.
	* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
	* gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
	* gcc.target/aarch64/acle/fpmr-2.c: Likewise.
	* gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
	* gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
	* gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
	* gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
	* gcc.target/aarch64/acle/fpmr-6.c: New test.

1886dfb2

aarch64: Fix memory cost for FPM_REGNUM · ce6fc67d

Richard Sandiford authored 2 months ago

GCC 15 is going to be the first release to support FPMR.
While working on a follow-up patch, I noticed that for:

    (set (reg:DI R) ...)
    ...
    (set (reg:DI fpmr) (reg:DI R))

IRA would prefer to spill R to memory rather than allocate a GPR.
This is because the register move cost for GENERAL_REGS to
MOVEABLE_SYSREGS is very high:

  /* Moves to/from sysregs are expensive, and must go via GPR.  */
  if (from == MOVEABLE_SYSREGS)
    return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to);
  if (to == MOVEABLE_SYSREGS)
    return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS);

but the memory cost for MOVEABLE_SYSREGS was the same as for
GENERAL_REGS, making memory much cheaper.

Loading and storing FPMR involves a GPR temporary, so the cost should
account for moving into and out of that temporary.

This did show up indirectly in some of the existing asm tests,
where the stack frame allocated 16 bytes for callee saves (D8)
and another 16 bytes for spilling a temporary register.

It's possible that other registers need the same treatment
and it's more than probable that this code needs a rework.
None of that seems suitable for stage 4 though.

gcc/
	* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account
	for the cost of moving in and out of GENERAL_SYSREGS.

gcc/testsuite/
	* gcc.target/aarch64/acle/fpmr-5.c: New test.
	* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect
	a spill slot to be allocated.
	* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.

ce6fc67d

aarch64: Allow FPMR source values to be zero · 97beccb3

Richard Sandiford authored 2 months ago

GCC 15 is going to be the first release to support FPMR.
The alternatives for moving values into FPMR were missing
a zero alternative, meaning that moves of zero would use an
unnecessary temporary register.

gcc/
	* config/aarch64/aarch64.md (*mov<SHORT:mode>_aarch64)
	(*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR
	to be zero.

gcc/testsuite/
	* gcc.target/aarch64/acle/fp8.c: Add tests for moving zero into FPMR.

97beccb3

tree-assume: Fix UB in assume_query [PR118605] · 27a05f8d

Jakub Jelinek authored 2 months ago

The assume_query constructor does
assume_query::assume_query (function *f, bitmap p) : m_parm_list (p),
                                                     m_func (f)
where m_parm_list is bitmap &.  This is compile time UB, because
as soon as the constructor returns, m_parm_list reference is still
bound to the parameter of the constructor which is no longer in scope.

Now, one possible fix would be to change the ctor argument to be bitmap &,
but that doesn't really work because in the only user of that class
we have
      auto_bitmap decls;
...
      assume_query query (fun, decls);
and auto_bitmap just has
  operator bitmap () { return &m_bits; }
Could be perhaps const bitmap &, but why?  bitmap is a pointer:
typedef class bitmap_head *bitmap;
and the EXECUTE_IF_SET_IN_BITMAP macros don't really change that pointer,
they just inspect what is inside of that bitmap_head the pointer points
to.

So, the simplest fix, I think, is to avoid references (which also cause worse
code, as the pointer has to be dereferenced twice rather than once).
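
A reduced illustration of the two shapes. The bitmap_head below is a one-field stand-in and query_ok is a hypothetical name; only the fixed shape is compiled, the problematic one is shown in a comment.

```cpp
// Stand-in for GCC's types: bitmap is a pointer typedef.
struct bitmap_head { int bits = 0; };
typedef bitmap_head* bitmap;

// Problematic shape (comment only): the reference member is bound to the
// by-value constructor parameter, which stops existing when the ctor returns.
//
//   struct query_bad {
//     bitmap& m_parm_list;
//     query_bad(bitmap p) : m_parm_list(p) {}
//   };

// Fixed shape: keep a copy of the pointer itself.
struct query_ok {
  bitmap m_parm_list;
  explicit query_ok(bitmap p) : m_parm_list(p) {}
  int bits() const { return m_parm_list->bits; }
};
```

Copying the pointer costs nothing extra, and the object still observes later changes to the bitmap_head it points to.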

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/118605
	* tree-assume.cc (assume_query::m_parm_list): Change type
	from bitmap & to bitmap.

27a05f8d

OpenMP/PolyInt: Pass poly-int structures by address to OMP libs. · b8ac0616

Tejas Belagod authored 1 year ago

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

gcc/ChangeLog:

	* omp-low.cc (use_pointer_for_field): Use pointer if the OMP data
	structure's field type is a poly-int.

b8ac0616

testsuite: i386: Adjust gcc.target/i386/cmov12.c for Sun as syntax · 314d20bb

Rainer Orth authored 2 months ago

The new gcc.target/i386/cmov12.c test FAILs on Solaris/x86 with the
native as:

FAIL: gcc.target/i386/cmov12.c scan-assembler-times cmovg 3

This happens because as uses a different syntax for cmov:

--- cmov12.s.bu243	2025-01-21 16:55:27.038829605 +0100
+++ cmov12.s.bu24390	2025-01-21 16:55:44.565051230 +0100
@@ -41,9 +41,9 @@
 	leal	1(%rdx), %ebp
 	movl	(%r11), %esi
 	cmpl	%eax, %esi
-	cmovg	%ebp, %edx
-	cmovg	%r11, %rcx
-	cmovg	%esi, %eax
+	cmovl.g	%ebp, %edx
+	cmovq.g	%r11, %rcx
+	cmovl.g	%esi, %eax

The problem is even more prominent with the upcoming gas 2.44 which
added support for the Sun as syntax on Solaris, which gcc/configure
picks up.

This patch allows for both forms.
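
One way to accept both spellings is an optional size suffix in the pattern. The regular expression below is an assumption made for illustration; it is not necessarily the one used in the scan-assembler-times directive.

```cpp
#include <regex>
#include <string>

// Hypothetical matcher: accepts the GNU spelling "cmovg" as well as the
// Sun as spellings "cmovl.g" and "cmovq.g".
inline bool is_cmovg(const std::string& mnemonic)
{
  static const std::regex re("cmov([lq]\\.)?g");
  return std::regex_match(mnemonic, re);
}
```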

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2025-01-22  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	gcc/testsuite:
	* gcc.target/i386/cmov12.c (scan-assembler-times): Allow for
	cmovl.g etc.

314d20bb

c++: Fix build_omp_array_section for type dependent array_expr [PR118590] · b02c061b

Jakub Jelinek authored 2 months ago

As can be seen on the testcase, when array_expr is type dependent, assuming
it has non-NULL TREE_TYPE is just wrong, it can often have NULL type, and even
if not, blindly assuming it is a pointer or array type is also wrong.

So, like in many other spots in the C++ FE, for type dependent expressions
we want to create something which will survive until instantiation and can be
redone at that point.

Unfortunately, build_omp_array_section is called before we actually do any
kind of checking what array_expr really is, and on invalid code it can be e.g.
a TYPE_DECL on which type_dependent_expression_p ICEs (as can be seen on the
pr67522.C testcase).  So, I've hacked this by checking it is not TYPE_DECL,
I hope a TYPE_P can't make it through there when we just lookup an identifier.

Anyway, this patch is not enough, we can ICE e.g. on __uint128_t[0:something]
during instantiation, so I think something needs to be done for this in pt.cc
as well.

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118590
	* typeck.cc (build_omp_array_section): If array_expr is type dependent
	or a TYPE_DECL, build OMP_ARRAY_SECTION with NULL type.

	* g++.dg/goacc/pr118590.C: New test.

b02c061b

c++: Fix weird expression in test for clauses other than when/default/otherwise [PR118604] · dd14b08e

Jakub Jelinek authored 2 months ago

Some clang analyzer warned about
if (!strcmp (p, "when") == 0 && !default_p)
which really looks weird, it is better to use strcmp (p, "when") != 0
or !!strcmp (p, "when").  Furthermore, as a micro optimization, it is cheaper
to evaluate default_p than calling strcmp, so that can be put first in the &&.

The C test for the same thing wasn't that weird, but I think for consistency
it is better to use the same test rather than trying to be creative.
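
The two spellings of the condition can be compared side by side. The function names are hypothetical; the old form is parenthesized here to make the parse explicit, since the logical not applies only to the strcmp result.

```cpp
#include <cstring>

// Old shape: (!strcmp (p, "when")) == 0, which is true when p is not "when".
inline bool other_clause_old(const char* p, bool default_p)
{
  return (!std::strcmp(p, "when")) == 0 && !default_p;
}

// New shape: the cheap flag is tested first, and the comparison is direct.
inline bool other_clause_new(const char* p, bool default_p)
{
  return !default_p && std::strcmp(p, "when") != 0;
}
```

Both return the same result for every input; the new form is just easier to read and skips the strcmp call when default_p is set.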

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118604
gcc/c/
	* c-parser.cc (c_parser_omp_metadirective): Rewrite
	condition for clauses other than when, default and otherwise.
gcc/cp/
	* parser.cc (cp_parser_omp_metadirective): Test !default_p
	first and use strcmp () != 0 rather than !strcmp () == 0.

dd14b08e

builtins: Store unspecified value to *exp for inf/nan [PR114877] · d19b0682

Jakub Jelinek authored 2 months ago

The fold_builtin_frexp folding for NaN/Inf just returned the first argument
while evaluating the second argument's side effects, rather than storing
something to what the second argument points to.

The PR argues that the C standard requires the function to store something
there but what exactly is stored is unspecified, so storing nothing there
can result in UB if the value isn't initialized and is read later.

glibc and newlib store 0 there; musl apparently doesn't store anything.
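
From the caller's side, code that wants to be independent of which value a library stores can initialize the exponent itself and normalize it for non-finite inputs. The wrapper below is a sketch; FrexpResult and safe_frexp are hypothetical names.

```cpp
#include <cmath>

// Hypothetical result type for the sketch.
struct FrexpResult {
  double mantissa;
  int exponent;
};

// Initialise the exponent before the call and force it to 0 for Inf/NaN,
// so the result does not depend on what the library happens to store.
inline FrexpResult safe_frexp(double x)
{
  int e = 0;
  double m = std::frexp(x, &e);
  if (!std::isfinite(x))
    e = 0;
  return {m, e};
}
```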

The following patch stores there zero (or would you prefer storing there
some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
in assembly though) and adjusts the test so that it
doesn't rely on not storing there anything but instead checks for
-Wmaybe-uninitialized warning to find out that something has been stored
there.
Unfortunately I had to disable the NaN tests for -O0, while we can fold
__builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
__builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
(because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
it does
      arg = builtin_save_expr (arg);
      return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
nothing propagates the NAN to the comparison.
I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR:
added and recurse on the second argument, but that feels like stage1
material to me if we want to do that at all.

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/114877
	* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
	like rvc_zero, return passed in arg and set *exp = 0.

	* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
	dg-additional-options.
	(bar): New function.
	(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
	nothing has been stored to what the second argument points to, but
	instead that something has been stored there, whatever it is.
	(main): Temporarily don't enable the nan tests for -O0.

d19b0682

testsuite: Only run test if alarm is available · 57b706d1

Torbjörn SVENSSON authored 2 months ago


Most baremetal toolchains will not have an implementation for alarm and
sigaction as they are target specific.
For arm-none-eabi with newlib, function signatures are exposed, but
there is no implementation and thus the test cases cause an undefined
symbol link error.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr78185.c: Remove dg-do and replace
	with dg-require-effective-target of signal and alarm.
	* gcc.dg/pr116906-1.c: Likewise.
	* gcc.dg/pr116906-2.c: Likewise.
	* gcc.dg/vect/pr101145inf.c: Use effective-target alarm.
	* gcc.dg/vect/pr101145inf_1.c: Likewise.
	* lib/target-supports.exp (check_effective_target_alarm): New.

gcc/ChangeLog:

	* doc/sourcebuild.texi (Effective-Target Keywords): Document
	'alarm'.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

57b706d1

AVR: PR117726 - Tweak 32-bit logical shifts of 25...30 for -Oz. · f30edd17

Georg-Johann Lay authored 2 months ago

As it turns out, logical 32-bit shifts with an offset of 25..30 can
be performed in 7 instructions or less.  This beats the 7 instructions
required for the default code of a shift loop.
Plus, with zero overhead, these cases can be 3-operand.

This is only relevant for -Oz because with -Os, 3op shifts are
split with -msplit-bit-shift (which is not performed with -Oz).
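
Why such offsets are cheap can be seen from a byte-level decomposition: for a shift count of 25..30 only one byte of the input contributes to the result. The sketch below states that arithmetic in C++; it is an assumption about the general idea, not the instruction sequence the backend emits, and the function names are hypothetical.

```cpp
#include <cstdint>

// Left shift by n in 25..30: only the low byte of x survives.  Shift it by
// n - 24 within a byte and place it in the top byte.
inline std::uint32_t shl_25_30(std::uint32_t x, unsigned n)
{
  std::uint8_t low = static_cast<std::uint8_t>(x);
  std::uint8_t top = static_cast<std::uint8_t>(low << (n - 24));
  return static_cast<std::uint32_t>(top) << 24;
}

// Logical right shift by n in 25..30: only the top byte of x survives and
// ends up, shifted by n - 24, in the low byte.
inline std::uint32_t shr_25_30(std::uint32_t x, unsigned n)
{
  std::uint8_t top = static_cast<std::uint8_t>(x >> 24);
  return static_cast<std::uint32_t>(top >> (n - 24));
}
```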

	PR target/117726
gcc/
	* config/avr/avr.cc (avr_ld_regno_p): New function.
	(ashlsi3_out) [case 25,26,27,28,29,30]: Handle and tweak.
	(lshrsi3_out): Same.
	(avr_rtx_costs_1) [SImode, ASHIFT, LSHIFTRT]: Adjust costs.
	* config/avr/avr.md (ashlsi3, *ashlsi3, *ashlsi3_const):
	Add "r,r,C4L" alternative.
	(lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,r,C4R" alternative.
	* config/avr/constraints.md (C4R, C4L): New.
gcc/testsuite/
	* gcc.target/avr/torture/avr-torture.exp (AVR_TORTURE_OPTIONS):
	Turn one option variant into -Oz.

f30edd17

Fortran: Regression- fix ICE at fortran/trans-decl.c:1575 [PR96087] · b3f51ea8

Paul Thomas authored 2 months ago

2025-01-23  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
	PR fortran/96087
	* trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a
	backend decl, it is likely that it has come from a module proc
	interface. Look for the formal symbol by name in the containing
	proc and use its backend decl.
	* trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the
	same reason, match the name, rather than the symbol address to
	perform the mapping.

gcc/testsuite/
	PR fortran/96087
	* gfortran.dg/pr96087.f90: New test.

b3f51ea8

tree-optimization/118558 - fix alignment compute with VMAT_CONTIGUOUS_REVERSE · 7fffff1d

Richard Biener authored 2 months ago

There are calls to dr_misalignment left that do not correct for the
offset (which is vector type dependent) when the stride is negative.
Notably vect_known_alignment_in_bytes doesn't allow passing through
such an offset, which the following adds (computing the offset in
vect_known_alignment_in_bytes would be possible as well, but the
offset can be shared as seen).  Eventually this function could go away.

This leads to peeling for gaps not being considered, nor shortening of
the access applied, which is what fixes the testcase on x86_64.

	PR tree-optimization/118558
	* tree-vectorizer.h (vect_known_alignment_in_bytes): Pass
	through offset to dr_misalignment.
	* tree-vect-stmts.cc (get_group_load_store_type): Compute
	offset applied for negative stride and use it when querying
	alignment of accesses.
	(vectorizable_load): Likewise.

	* gcc.dg/vect/pr118558.c: New testcase.
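
To illustrate why the offset matters, here is a small arithmetic sketch.  It
assumes that a reversed vector access starts (nunits - 1) elements below the
address of the scalar reference; the helper names are hypothetical and this
is not the code in tree-vect-stmts.cc.

```cpp
#include <cstdint>

// Byte offset of the first vector lane relative to the scalar access when
// the stride is negative (assumption: the vector covers the nunits - 1
// elements below the scalar address).
constexpr std::int64_t reverse_access_offset(int nunits, int elem_size) {
  return -(static_cast<std::int64_t>(nunits) - 1) * elem_size;
}

// Misalignment of (address + offset) with respect to align, given the
// misalignment of the address itself.  The result is in [0, align).
constexpr unsigned misalignment(std::int64_t addr_misalign,
                                std::int64_t offset, unsigned align) {
  std::int64_t m = (addr_misalign + offset) % static_cast<std::int64_t>(align);
  if (m < 0)
    m += align;
  return static_cast<unsigned>(m);
}
```

With four 4-byte elements and a 16-byte target alignment, a scalar access
that is misaligned by 12 bytes yields a vector access that is aligned once
the offset of -12 is applied, whereas ignoring the offset reports a
misalignment of 12.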

7fffff1d

c++: Update mangling of lambdas in expressions · 2119c254

Nathaniel Shead authored 4 months ago

https://github.com/itanium-cxx-abi/cxx-abi/pull/85 clarifies that
mangling a lambda expression should use 'L' rather than "tl".

gcc/cp/ChangeLog:

	* mangle.cc (write_expression): Update mangling for lambdas.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/lambda-generic-mangle1.C: Update mangling.
	* g++.dg/cpp2a/lambda-generic-mangle1a.C: Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

2119c254

c++: Fix mangling of lambdas in static data member initializers [PR107741] · 685c458f

Nathaniel Shead authored 3 months ago


This fixes an issue where lambdas declared in the initializer of a
static data member within the class body do not get a mangling scope of
that variable; this results in mangled names that do not conform to the
ABI spec.

To do this, the patch splits up grokfield for this case specifically,
allowing a declaration to be built and used in start_lambda_scope before
parsing the initializer, so that record_lambda_scope works correctly.

As a drive-by, this also fixes the issue of a static member not being
visible within its own initializer.

	PR c++/107741

gcc/c-family/ChangeLog:

	* c-opts.cc (c_common_post_options): Bump ABI version.

gcc/ChangeLog:

	* common.opt: Add -fabi-version=20.
	* doc/invoke.texi: Likewise.

gcc/cp/ChangeLog:

	* cp-tree.h (start_initialized_static_member): Declare.
	(finish_initialized_static_member): Declare.
	* decl2.cc (start_initialized_static_member): New function.
	(finish_initialized_static_member): New function.
	* lambda.cc (record_lambda_scope): Support falling back to old
	ABI (maybe with warning).
	* parser.cc (cp_parser_member_declaration): Build decl early
	when parsing an initialized static data member.

gcc/testsuite/ChangeLog:

	* g++.dg/abi/macro0.C: Bump ABI version.
	* g++.dg/abi/mangle74.C: Remove XFAILs.
	* g++.dg/other/fold1.C: Restore originally raised error.
	* g++.dg/abi/lambda-ctx2-19.C: New test.
	* g++.dg/abi/lambda-ctx2-19vs20.C: New test.
	* g++.dg/abi/lambda-ctx2-20.C: New test.
	* g++.dg/abi/lambda-ctx2.h: New test.
	* g++.dg/cpp0x/static-member-init-1.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
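
A minimal sketch of the kind of declaration affected, assuming C++17 or
later.  The names Widget, answer and call_answer are hypothetical and are not
taken from the new tests, and the mangled names are deliberately not shown
since they depend on the selected -fabi-version.

```cpp
struct Widget {
  // A lambda declared in the initializer of a static data member, within
  // the class body.  Per the commit message, its closure type should be
  // mangled in the scope of the variable 'answer'.
  static constexpr auto answer = [] { return 42; };
};

// Calling through the static member uses the closure type's operator().
int call_answer() { return Widget::answer(); }
```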

685c458f

c++/modules: Fix exporting temploid friends in header units [PR118582] · 21cccfa9

Nathaniel Shead authored 2 months ago


When we started streaming the bit to handle merging of imported temploid
friends in r15-2807, I unthinkingly only streamed it in the
'!state->is_header ()' case.

This patch reworks the streaming logic to ensure that this data is
always streamed, including for unique entities (in case that ever comes
up somehow).  This does make the streaming slightly less efficient, as
functions and types will need an extra byte, but this doesn't appear to
make a huge difference to the size of the resulting module; the 'std'
module on my machine grows by 0.2% from 30671136 to 30730144 bytes.

	PR c++/118582

gcc/cp/ChangeLog:

	* module.cc (trees_out::decl_value): Always stream
	imported_temploid_friends information.
	(trees_in::decl_value): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/pr118582_a.H: New test.
	* g++.dg/modules/pr118582_b.H: New test.
	* g++.dg/modules/pr118582_c.H: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

21cccfa9

LoongArch: Fix invalid subregs in xorsign [PR118501] · 9ddf4a6c

Xi Ruoyao authored 2 months ago

The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

	PR target/118501
	* config/loongarch/loongarch.md (@xorsign<mode>3): Use
	force_lowpart_subreg.

9ddf4a6c