  1. Jan 24, 2025
    • rdubner's avatar
      Triple-ply copybook playpen · 7370bf9f
      rdubner authored
      7370bf9f
    • Richard Biener's avatar
      tree-optimization/116010 - dr_may_alias regression · 02fc12b0
      Richard Biener authored
      r15-491-gc290e6a0b7a9de fixed a latent issue with dr_analyze_innermost
      and dr_may_alias where not properly analyzed DRs would yield an invalid
      answer.  This caused some missed optimizations in case there is not
      actually any evolution in the not analyzed base part.  The following
      recovers this by only handling base parts which reference SSA vars
      as index in the conservative way.
      
      The gfortran.dg/vect/vect-8.f90 testcase is difficult to deal with,
      so the following merely bumps the maximum number of expected vectorized loops
      for both aarch64 and x86-64.
      
      	PR tree-optimization/116010
      	* tree-data-ref.cc (contains_ssa_ref_p_1): New function.
      	(contains_ssa_ref_p): Likewise.
      	(dr_may_alias_p): Avoid treating unanalyzed base parts without
      	SSA reference conservatively.
      
      	* gfortran.dg/vect/vect-8.f90: Adjust.
      02fc12b0
    • Stefan Schulze Frielinghaus's avatar
      s390: Implement isfinite and isnormal optabs · b00bd292
      Stefan Schulze Frielinghaus authored
      Merge new optabs with the existing implementations for signbit and
      isinf.
      
      gcc/ChangeLog:
      
      	* config/s390/s390.h (S390_TDC_POSITIVE_ZERO): Remove.
      	(S390_TDC_NEGATIVE_ZERO): Remove.
      	(S390_TDC_POSITIVE_NORMALIZED_BFP_NUMBER): Remove.
      	(S390_TDC_NEGATIVE_NORMALIZED_BFP_NUMBER): Remove.
      	(S390_TDC_POSITIVE_DENORMALIZED_BFP_NUMBER): Remove.
      	(S390_TDC_NEGATIVE_DENORMALIZED_BFP_NUMBER): Remove.
      	(S390_TDC_POSITIVE_INFINITY): Remove.
      	(S390_TDC_NEGATIVE_INFINITY): Remove.
      	(S390_TDC_POSITIVE_QUIET_NAN): Remove.
      	(S390_TDC_NEGATIVE_QUIET_NAN): Remove.
      	(S390_TDC_POSITIVE_SIGNALING_NAN): Remove.
      	(S390_TDC_NEGATIVE_SIGNALING_NAN): Remove.
      	(S390_TDC_POSITIVE_DENORMALIZED_DFP_NUMBER): Remove.
      	(S390_TDC_NEGATIVE_DENORMALIZED_DFP_NUMBER): Remove.
      	(S390_TDC_POSITIVE_NORMALIZED_DFP_NUMBER): Remove.
      	(S390_TDC_NEGATIVE_NORMALIZED_DFP_NUMBER): Remove.
      	(S390_TDC_SIGNBIT_SET): Remove.
      	(S390_TDC_INFINITY): Remove.
      	* config/s390/s390.md (signbit<mode>2<tf_fpr>): Merge this one
      	(isinf<mode>2<tf_fpr>): and this one into
      	(<TDC_CLASS:tdc_insn><mode>2<tf_fpr>): new expander.
      	(isnormal<mode>2<tf_fpr>): New BFP expander.
      	(isnormal<mode>2): New DFP expander.
      	* config/s390/vector.md (signbittf2_vr): Merge this one
      	(isinftf2_vr): and this one into
      	(<tdc_insn>tf2_vr): new expander.
      	(signbittf2): Merge this one
      	(isinftf2): and this one into
      	(<tdc_insn>tf2): new expander.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/s390/isfinite-isinf-isnormal-signbit-1.c: New test.
      	* gcc.target/s390/isfinite-isinf-isnormal-signbit-2.c: New test.
      	* gcc.target/s390/isfinite-isinf-isnormal-signbit-3.c: New test.
      	* gcc.target/s390/isfinite-isinf-isnormal-signbit.h: New test.
      b00bd292
    • Richard Biener's avatar
      tree-optimization/118634 - improve cunroll dump · dc1e1b38
      Richard Biener authored
      We no longer subtract the estimated eliminated number of instructions
      from the estimated size after unrolling we print - this is a bit
      confusing when comparing dumps to previous releases.  The following
      changes the dump from
      
        Estimated size after unrolling: 42
      
      to
      
        Estimated size after unrolling: 42-12
      
      for the testcase in the PR.
      
      	PR tree-optimization/118634
      	* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely):
      	Dump the number of estimated eliminated insns.
      dc1e1b38
    • Saurabh Jha's avatar
      Fix command flags for SVE2 faminmax · 8bdf10fc
      Saurabh Jha authored
      Earlier, we were gating SVE2 faminmax behind sve+faminmax. This was
      incorrect and this patch changes it so that it is gated behind
      sve2+faminmax.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-sve2.md:
      	(*aarch64_pred_faminmax_fused): Fix to use the correct flags.
      	* config/aarch64/aarch64.h
      	(TARGET_SVE_FAMINMAX): Remove.
      	* config/aarch64/iterators.md: Fix iterators so that famax and
      	famin use correct flags.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/sve/faminmax_1.c: Fix test to use the
      	correct flags.
      	* gcc.target/aarch64/sve/faminmax_2.c: Fix test to use the
      	correct flags.
      	* gcc.target/aarch64/sve/faminmax_3.c: New test.
      8bdf10fc
    • Alexandre Oliva's avatar
      [ifcombine] check for more zero-extension cases [PR118572] · 91fa9c15
      Alexandre Oliva authored
      When comparing a signed narrow variable with a wider constant that has
      the bit corresponding to the variable's sign bit set, we would check
      that the constant is a sign-extension from that sign bit, and conclude
      that the compare fails if it isn't.
      
      When the signed variable is masked without getting the [lr]l_signbit
      variable set, or when the sign bit itself is masked out, we know the
      sign-extension bits from the extended variable are going to be zero,
      so the constant will only compare equal if it is a zero- rather than
      sign-extension from the narrow variable's precision, therefore, check
      that it satisfies this property, and yield a false compare result
      otherwise.
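The masked case can be pictured with a small sketch, assuming an 8-bit signed variable widened to 32 bits. The helper names are hypothetical and this is not the code in gimple-fold.cc; it only shows why a constant must be a zero-extension, not a sign-extension, to compare equal once the extension bits are masked out.

```cpp
#include <cassert>
#include <cstdint>

// Widen a signed 8-bit value to 32 bits, then mask away the extension bits.
// After the mask, bits 8..31 are known to be zero.
inline std::uint32_t widen_and_mask(std::int8_t narrow) {
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(narrow)) & 0xFFu;
}

// True if `c` is the zero-extension of some 8-bit value.
inline bool is_zero_extension_of_8_bits(std::uint32_t c) {
    return (c & ~0xFFu) == 0;
}

// True if `c` is the sign-extension of some 8-bit value.
inline bool is_sign_extension_of_8_bits(std::uint32_t c) {
    std::uint32_t high = c & ~0xFFu;
    return (c & 0x80u) ? high == ~0xFFu : high == 0;
}
```

With the sign bit set, 0x80 is a zero-extension and can match the masked value, while 0xFFFFFF80 is a sign-extension and can never match it.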
      
      
      for  gcc/ChangeLog
      
      	PR tree-optimization/118572
      	* gimple-fold.cc (fold_truth_andor_for_ifcombine): Compare as
      	unsigned the variables whose extension bits are masked out.
      
      for  gcc/testsuite/ChangeLog
      
      	PR tree-optimization/118572
      	* gcc.dg/field-merge-24.c: New.
      91fa9c15
    • Alexandre Oliva's avatar
      [ifcombine] improve reverse checking and operand swapping · a56122de
      Alexandre Oliva authored
      Don't reject an ifcombine field-merging opportunity just because the
      left-hand operands aren't both reversed, if the second compare needs
      to be swapped for operands to match.
      
      Also mention that reversep does NOT affect the turning of range tests
      into bit tests.
      
      
      for  gcc/ChangeLog
      
      	* gimple-fold.cc (fold_truth_andor_for_ifcombine): Document
      	reversep's absence of effects on range tests.  Don't reject
      	reversep mismatches before trying compare swapping.
      a56122de
    • Alexandre Oliva's avatar
      [ifcombine] out-of-bounds bitfield refs can trap [PR118514] · 3f05d703
      Alexandre Oliva authored
      Check that BIT_FIELD_REFs of DECLs are in range before deciding they
      don't trap.
      
      Check that a replacement bitfield load is as trapping as the replaced
      load.
      
      
      for  gcc/ChangeLog
      
      	PR tree-optimization/118514
      	* tree-eh.cc (bit_field_ref_in_bounds_p): New.
      	(tree_could_trap_p) <BIT_FIELD_REF>: Call it.
      	* gimple-fold.cc (make_bit_field_load): Check trapping status
      	of replacement load against original load.
      
      for  gcc/testsuite/ChangeLog
      
      	PR tree-optimization/118514
      	* gcc.dg/field-merge-23.c: New.
      3f05d703
    • GCC Administrator's avatar
      Daily bump. · 35d5c4f9
      GCC Administrator authored
      35d5c4f9
  2. Jan 23, 2025
    • Marek Polacek's avatar
      c++: bogus error with nested lambdas [PR117602] · 6d8a0e8b
      Marek Polacek authored
      
      The error here should also check that we aren't nested in another
      lambda; in it, at_function_scope_p() will be false.
      
      	PR c++/117602
      
      gcc/cp/ChangeLog:
      
      	* cp-tree.h (current_nonlambda_scope): Add a default argument.
      	* lambda.cc (current_nonlambda_scope): New bool parameter.  Use it.
      	* parser.cc (cp_parser_lambda_introducer): Use current_nonlambda_scope
      	to check if the lambda is non-local.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp2a/lambda-uneval21.C: New test.
      
      Reviewed-by: Jason Merrill <jason@redhat.com>
      6d8a0e8b
    • Jakub Jelinek's avatar
      c++: Small make_tree_vector_from_ctor improvement · 4ce9e353
      Jakub Jelinek authored
      After committing the append_ctor_to_tree_vector patch, I've realized
      that for the larger constructors make_tree_vector_from_ctor unnecessarily
      wastes one GC vector; make_tree_vector () / release_tree_vector () only
      caches GC vectors from 4 to 16 allocated tree elements, so in the likely
      case of a rather small ctor using make_tree_vector () can be beneficial,
      we can pick something from the cache and if we don't need it later,
      pt.cc calls release_tree_vector on it to return it back to the cache.
      But for the larger ctors, we just eat one vector from the cache, never
      use it (because the vec_safe_reserve will immediately allocate a different
      vector) and never return it back to the cache.
      
      So, the following patch passes NULL for the larger vectors, which
      append_ctor_to_tree_vector handles just fine now (vec_safe_reserve will
      just allocate appropriately sized vector).
      
      2025-01-23  Jakub Jelinek  <jakub@redhat.com>
      
      	* c-common.cc (make_tree_vector_from_ctor): Only use make_tree_vector
      	for ctors with <= 16 elements.
      4ce9e353
    • John David Anglin's avatar
      hppa: Fix typo in ADDITIONAL_REGISTER_NAMES in pa32-regs.h · ce28eb9f
      John David Anglin authored
      2025-01-23  John David Anglin  <danglin@gcc.gnu.org>
      
      gcc/ChangeLog:
      
      	* config/pa/pa32-regs.h (ADDITIONAL_REGISTER_NAMES): Change
      	register 86 name to "%fr31L".
      ce28eb9f
    • rdubner's avatar
      c1dc4e7a
    • rdubner's avatar
      35e0b040
    • Jakub Jelinek's avatar
      vect: Avoid copying of uninitialized variable [PR118628] · 8f6dd185
      Jakub Jelinek authored
      vectorizable_{store,load} does roughly
            tree offvar;
            tree running_off;
            if (!costing_p)
              {
                ... initialize offvar ...
              }
            running_off = offvar;
            for (...)
              {
                if (costing_p)
                  {
                    ...
                    continue;
                  }
                ... use running_off ...
              }
      so it unconditionally copies a sometimes-uninitialized variable (but then
      uses the copy only if it was set to something initialized).
      Still, I think it is better to avoid copying around maybe uninitialized
      vars.
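A reduced sketch of the same control flow with the initialization added, assuming a hypothetical `Node` type in place of GCC's `tree` and a made-up `process` function; it is an illustration of the pattern, not the vectorizer code.

```cpp
#include <cassert>
#include <vector>

struct Node { int value; };   // hypothetical stand-in for GCC's `tree`

// Accumulates base->value + step for every step when not costing;
// returns 0 when only costing, because the loop body is skipped.
inline int process(bool costing_p, const std::vector<int>& steps, Node* base) {
    Node* offvar = nullptr;        // initialized up front, as in the fix
    if (!costing_p)
        offvar = base;             // ... initialize offvar ...
    Node* running_off = offvar;    // the copy now always reads a defined value
    int used = 0;
    for (int step : steps) {
        if (costing_p)
            continue;              // the costing path never touches running_off
        used += running_off->value + step;
    }
    return used;
}
```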
      
      2025-01-23  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/118628
      	* tree-vect-stmts.cc (vectorizable_store, vectorizable_load):
      	Initialize offvar to NULL_TREE.
      8f6dd185
    • rdubner's avatar
      WIP: rounding · f81df894
      rdubner authored
      f81df894
    • Harald Anlauf's avatar
      Fortran: do not evaluate arguments of MAXVAL/MINVAL too often [PR118613] · 3cef53a4
      Harald Anlauf authored
      	PR fortran/118613
      
      gcc/fortran/ChangeLog:
      
      	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxval): Adjust algorithm
      	for inlined version of MINVAL and MAXVAL so that arguments are only
      	evaluated once, and create temporaries where necessary.  Document
      	change of algorithm.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/maxval_arg_eval_count.f90: New test.
      3cef53a4
    • James K. Lowden's avatar
      remove -v from install · db9f068f
      James K. Lowden authored
      db9f068f
    • Georg-Johann Lay's avatar
      AVR: PR118012 - Try to work around sick code from match.pd. · 0bb32230
      Georg-Johann Lay authored
      This patch tries to work around PR118012 which may use a
      full fledged multiplication instead of a simple bit test.
      This is because match.pd's
      
      /* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
      /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
      
      "optimizes" code with op in { plus, ior, xor } like
      
        if (a & 1)
          b = b <op> c;
      
      to something like:
      
        x1 = EXTRACT_BIT0 (a);
        x2 = c MULT x1;
        b = b <op> x2;
      
      or
      
        x1 = EXTRACT_BIT0 (a);
        x2 = ZERO_EXTEND (x1);
        x3 = NEG x2;
        x4 = a AND x3;
        b = b <op> x4;
      
      which is very expensive and may even result in a libgcc call for
      a 32-bit multiplication on devices that don't even have MUL.
      Notice that EXTRACT_BIT0 is already more expensive (slower, more
      code, more register pressure) than a bit-test + branch.
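The three shapes can be compared side by side in a short sketch, assuming `op` is xor and 32-bit operands; the function names are made up for illustration. In this sketch the mask is applied to `c`, which is what keeps the three forms equivalent.

```cpp
#include <cassert>
#include <cstdint>

// Source form: a bit test plus a conditional update.
constexpr std::uint32_t with_branch(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    if (a & 1u)
        b = b ^ c;
    return b;
}

// Multiplication form: b <op> (c * bit0(a)).
constexpr std::uint32_t with_mult(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    std::uint32_t x1 = a & 1u;    // EXTRACT_BIT0
    std::uint32_t x2 = c * x1;    // MULT
    return b ^ x2;
}

// Mask form: negate the bit to get all-ones or zero, then AND.
constexpr std::uint32_t with_mask(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    std::uint32_t x1 = a & 1u;    // EXTRACT_BIT0 + ZERO_EXTEND
    std::uint32_t x3 = 0u - x1;   // NEG
    std::uint32_t x4 = c & x3;    // AND
    return b ^ x4;
}

static_assert(with_branch(3, 0xF0, 0x0F) == with_mult(3, 0xF0, 0x0F), "bit set");
static_assert(with_branch(2, 0xF0, 0x0F) == with_mask(2, 0xF0, 0x0F), "bit clear");
```

All three compute the same value; they differ only in cost, which is the point of the complaint on a target without a hardware multiplier.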
      
      The patch:
      
      o Adds some combiner patterns that try to map sick code back
        to a bit test + branch.
      
      o Adjusts costs to make MULT (x AND 1) cheap, in the hope that the
        middle-end will use that alternative (which we map to sane code).
      
      o On devices without MUL, 32-bit multiplication was performed by a
        library call, which bypasses the MULT (x AND 1) and similar patterns.
        Therefore, mulsi3 is also allowed for devices without MUL so that
        we get a MULT pattern that can be transformed.  (Though this is
        not possible on AVR_TINY since it passes arguments on the stack).
      
      o Add a new command line option -mpr118012, so most of the patterns
        and cost computations can be switched off as they have
        avropt_pr118012 in their insn condition.
      
      o Added sign-extract.0 patterns unconditionally (no avropt_pr118012).
      
      Notice that this patch is just a work-around, it's not a fix of the
      root cause, which are the patterns in match.pd that don't care about
      the target and don't even care about costs.
      
      The work-around is incomplete, and 3 of the new tests are still failing.
      This is because there are situations where it does not work:
      
      * The MULT is realized as a library call.
      
      * The MULT is realized as an ASHIFT, and the ASHIFT again is transformed
        into something else.  For example, with -O2 -mmcu=atmega128,
        ASHIFT(3) is transformed into ASHIFT(1) + ASHIFT(2).
      
      	PR tree-optimization/118012
      	PR tree-optimization/118360
      gcc/
      	* config/avr/avr.opt (-mpr118012): New undocumented option.
      	* config/avr/avr-protos.h (avr_out_sextr)
      	(avr_emit_skip_pixop, avr_emit_skip_clear): New protos.
      	* config/avr/avr.cc (avr_adjust_insn_length)
      	[case ADJUST_LEN_SEXTR]: Handle case.
      	(avr_rtx_costs_1) [NEG]: Costs for NEG (ZERO_EXTEND (ZERO_EXTRACT)).
      	[MULT && avropt_pr118012]: Costs for MULT (x AND 1).
      	(avr_out_sextr, avr_emit_skip_pixop, avr_emit_skip_clear): New
      	functions.
      	* config/avr/avr.md [avropt_pr118012]: Add combine patterns with
      	that condition that try to work around PR118012.
      	(adjust_len) <sextr>: Add insn attr value.
      	(pixop): New code iterator.
      	(mulsi3) [avropt_pr118012 && !AVR_TINY]: Allow these in insn condition.
      gcc/testsuite/
      	* gcc.target/avr/mmcu/pr118012-1.h: New file.
      	* gcc.target/avr/mmcu/pr118012-1-o2-m128.c: New test.
      	* gcc.target/avr/mmcu/pr118012-1-os-m128.c: New test.
      	* gcc.target/avr/mmcu/pr118012-1-o2-m103.c: New test.
      	* gcc.target/avr/mmcu/pr118012-1-os-m103.c: New test.
      	* gcc.target/avr/mmcu/pr118012-1-o2-t40.c: New test.
      	* gcc.target/avr/mmcu/pr118012-1-os-t40.c: New test.
      	* gcc.target/avr/mmcu/pr118360-1.h: New file.
      	* gcc.target/avr/mmcu/pr118360-1-o2-m128.c: New test.
      	* gcc.target/avr/mmcu/pr118360-1-os-m128.c: New test.
      	* gcc.target/avr/mmcu/pr118360-1-o2-m103.c: New test.
      	* gcc.target/avr/mmcu/pr118360-1-os-m103.c: New test.
      	* gcc.target/avr/mmcu/pr118360-1-o2-t40.c: New test.
      	* gcc.target/avr/mmcu/pr118360-1-os-t40.c: New test.
      0bb32230
    • Jan Hubicka's avatar
      Optimize vector<bool>::operator[] · 2d55c016
      Jan Hubicka authored
      The following testcase:
      
        bool f(const std::vector<bool>& v, std::size_t x) {
          return v[x];
        }
      
      is compiled as:
      
      f(std::vector<bool, std::allocator<bool> > const&, unsigned long):
              testq   %rsi, %rsi
              leaq    63(%rsi), %rax
              movq    (%rdi), %rdx
              cmovns  %rsi, %rax
              sarq    $6, %rax
              leaq    (%rdx,%rax,8), %rdx
              movq    %rsi, %rax
              sarq    $63, %rax
              shrq    $58, %rax
              addq    %rax, %rsi
              andl    $63, %esi
              subq    %rax, %rsi
              jns     .L2
              addq    $64, %rsi
              subq    $8, %rdx
      .L2:
              movl    $1, %eax
              shlx    %rsi, %rax, %rax
              andq    (%rdx), %rax
              setne   %al
              ret
      
      which is quite expensive for a simple bit access in a bitmap.  The reason is
      that the bit access is implemented using iterators
      	return begin()[__n];
      which in turn handles the situation where __n is negative, yielding the extra
      conditional.
      
          _GLIBCXX20_CONSTEXPR
          void
          _M_incr(ptrdiff_t __i)
          {
            _M_assume_normalized();
            difference_type __n = __i + _M_offset;
            _M_p += __n / int(_S_word_bit);
            __n = __n % int(_S_word_bit);
            if (__n < 0)
              {
                __n += int(_S_word_bit);
                --_M_p;
              }
            _M_offset = static_cast<unsigned int>(__n);
          }
      
      We could use __builtin_unreachable to declare that __n is in the range
      0...max_size (), but I think it is better to implement it directly, since
      the resulting code is shorter and much easier to optimize.
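A minimal sketch of what direct indexing looks like, assuming a bitmap packed into 64-bit words; `word_bits` and `test_bit` are hypothetical names and this is not the stl_bvector.h implementation. Because the index is unsigned, no branch for a negative offset is needed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

constexpr std::size_t word_bits = sizeof(std::uint64_t) * CHAR_BIT;

// Read bit n of a packed bitmap without going through an iterator:
// one division (a shift) selects the word, one remainder (a mask) the bit.
inline bool test_bit(const std::uint64_t* words, std::size_t n) {
    return (words[n / word_bits] >> (n % word_bits)) & 1u;
}
```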
      
      We now produce:
      .LFB1248:
              .cfi_startproc
              movq    (%rdi), %rax
              movq    %rsi, %rdx
              shrq    $6, %rdx
              andq    (%rax,%rdx,8), %rsi
              andl    $63, %esi
              setne   %al
              ret
      
      Testcase suggests
              movq    (%rdi), %rax
              movl    %esi, %ecx
              shrq    $5, %rsi        # does still need to be 64-bit
              movl    (%rax,%rsi,4), %eax
              btl     %ecx, %eax
              setb    %al
              retq
      Which is still one instruction shorter.
      
      libstdc++-v3/ChangeLog:
      
      	PR target/80813
      	* include/bits/stl_bvector.h (vector<bool, _Alloc>::operator []): Do
      	not use iterators.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/80813
      	* g++.dg/tree-ssa/bvector-3.C: New test.
      2d55c016
    • Richard Sandiford's avatar
      rtl-ssa: Avoid dangling phi uses [PR118562] · 3dbcf794
      Richard Sandiford authored
      rtl-ssa uses degenerate phis to maintain an RPO list of
      accesses in which every use is of the RPO-previous definition.
      Thus, if it finds that a phi is always equal to a particular
      value V, it sometimes needs to keep the phi and make V the
      single input, rather than replace all uses of the phi with V.
      
      The code to do that rerouted the phi's first input to the single
      value V.  But as this PR shows, it failed to unlink the uses of
      the other inputs.
      
      The specific problem in the PR was that we had:
      
          x = PHI<x(a), V(b)>
      
      The code replaced the first input with V and removed the second
      input from the phi, but it didn't unlink the use of V associated
      with that second input.
      
      gcc/
      	PR rtl-optimization/118562
      	* rtl-ssa/blocks.cc (function_info::replace_phi): When converting
      	to a degenerate phi, make sure to remove all uses of the previous
      	inputs.
      
      gcc/testsuite/
      	PR rtl-optimization/118562
      	* gcc.dg/torture/pr118562.c: New test.
      3dbcf794
    • Richard Sandiford's avatar
      aarch64: Avoid redundant writes to FPMR · 1886dfb2
      Richard Sandiford authored
      GCC 15 is the first release to support FP8 intrinsics.
      The underlying instructions depend on the value of a new register,
      FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
      register rather than a global register.  So:
      
      - The FP8 intrinsics take a final uint64_t argument that
        specifies what value FPMR should have.
      
      - If an FP8 operation is split across multiple functions,
        it is likely that those functions would have a similar argument.
      
      If the object code has the structure:
      
          for (...)
            fp8_kernel (..., fpmr_value);
      
      then fp8_kernel would set FPMR to fpmr_value each time it is
      called, even though FPMR will already have that value for at
      least the second and subsequent calls (and possibly the first).
      
      The working assumption for the ABI has been that writes to
      registers like FPMR can in general be more expensive than
      reads and so it would be better to use a conditional write like:
      
             mrs     tmp, fpmr
             cmp     tmp, <value>
             beq     1f
             msr     fpmr, <value>
           1:
      
      instead of writing the same value to FPMR repeatedly.
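The effect of the conditional write can be modelled in software, assuming a made-up `ModelRegister` that counts writes; nothing here touches the real FPMR, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of a system register that counts its writes.
struct ModelRegister {
    std::uint64_t value = 0;
    unsigned writes = 0;
    std::uint64_t read() const { return value; }
    void write(std::uint64_t v) { value = v; ++writes; }
};

// Mirrors the mrs/cmp/beq/msr sequence: write only when the value differs.
inline void set_if_different(ModelRegister& reg, std::uint64_t wanted) {
    if (reg.read() != wanted)
        reg.write(wanted);
}

// Stand-in for a loop calling a kernel that needs the same register value
// on every call; returns the total number of writes performed so far.
inline unsigned run_kernels(ModelRegister& reg, std::uint64_t fpmr_value, int calls) {
    for (int i = 0; i < calls; ++i)
        set_if_different(reg, fpmr_value);
    return reg.writes;
}
```

Ten calls with the same value cost at most one write in this model, at the price of one read and compare per call.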
      
      This patch implements that.  It also adds a tuning flag that suppresses
      the behaviour, both to make testing easier and to support any future
      cores that (for example) are able to rename FPMR.
      
      Hopefully this really is the last part of the FP8 enablement.
      
      gcc/
      	* config/aarch64/aarch64-tuning-flags.def
      	(AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
      	* config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
      	* config/aarch64/aarch64.md: Split moves into FPMR into a test
      	and branch around.
      	(aarch64_write_fpmr): New pattern.
      
      gcc/testsuite/
      	* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add
      	cheap_fpmr_write by default.
      	* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
      	* gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
      	* gcc.target/aarch64/acle/fpmr-2.c: Likewise.
      	* gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
      	* gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
      	* gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
      	* gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
      	* gcc.target/aarch64/acle/fpmr-6.c: New test.
      1886dfb2
    • Richard Sandiford's avatar
      aarch64: Fix memory cost for FPM_REGNUM · ce6fc67d
      Richard Sandiford authored
      GCC 15 is going to be the first release to support FPMR.
      While working on a follow-up patch, I noticed that for:
      
          (set (reg:DI R) ...)
          ...
          (set (reg:DI fpmr) (reg:DI R))
      
      IRA would prefer to spill R to memory rather than allocate a GPR.
      This is because the register move cost for GENERAL_REGS to
      MOVEABLE_SYSREGS is very high:
      
        /* Moves to/from sysregs are expensive, and must go via GPR.  */
        if (from == MOVEABLE_SYSREGS)
          return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to);
        if (to == MOVEABLE_SYSREGS)
          return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS);
      
      but the memory cost for MOVEABLE_SYSREGS was the same as for
      GENERAL_REGS, making memory much cheaper.
      
      Loading and storing FPMR involves a GPR temporary, so the cost should
      account for moving into and out of that temporary.
      
      This did show up indirectly in some of the existing asm tests,
      where the stack frame allocated 16 bytes for callee saves (D8)
      and another 16 bytes for spilling a temporary register.
      
      It's possible that other registers need the same treatment
      and it's more than probable that this code needs a rework.
      None of that seems suitable for stage 4 though.
      
      gcc/
      	* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account
      	for the cost of moving in and out of GENERAL_SYSREGS.
      
      gcc/testsuite/
      	* gcc.target/aarch64/acle/fpmr-5.c: New test.
      	* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect
      	a spill slot to be allocated.
      	* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
      ce6fc67d
    • Richard Sandiford's avatar
      aarch64: Allow FPMR source values to be zero · 97beccb3
      Richard Sandiford authored
      GCC 15 is going to be the first release to support FPMR.
      The alternatives for moving values into FPMR were missing
      a zero alternative, meaning that moves of zero would use an
      unnecessary temporary register.
      
      gcc/
      	* config/aarch64/aarch64.md (*mov<SHORT:mode>_aarch64)
      	(*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR
      	to be zero.
      
      gcc/testsuite/
      	* gcc.target/aarch64/acle/fp8.c: Add tests for moving zero into FPMR.
      97beccb3
    • Jakub Jelinek's avatar
      tree-assume: Fix UB in assume_query [PR118605] · 27a05f8d
      Jakub Jelinek authored
      The assume_query constructor does
      assume_query::assume_query (function *f, bitmap p) : m_parm_list (p),
                                                           m_func (f)
      where m_parm_list is bitmap &.  This is compile time UB, because
      as soon as the constructor returns, m_parm_list reference is still
      bound to the parameter of the constructor which is no longer in scope.
      
      Now, one possible fix would be change the ctor argument to be bitmap &,
      but that doesn't really work because in the only user of that class
      we have
            auto_bitmap decls;
      ...
            assume_query query (fun, decls);
      and auto_bitmap just has
        operator bitmap () { return &m_bits; }
      Could be perhaps const bitmap &, but why?  bitmap is a pointer:
      typedef class bitmap_head *bitmap;
      and the EXECUTE_IF_SET_IN_BITMAP macros don't really change that pointer,
      they just inspect what is inside of that bitmap_head the pointer points
      to.
      
      So, the simplest I think is avoid references (which cause even worse
      code as it has to be dereferenced twice rather than once).
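The shape of the bug and of the fix can be sketched with stand-in types; `bitmap_head_model`, `bitmap_model` and `query` are hypothetical names, not the tree-assume.cc declarations. The buggy variant is shown only in a comment so that the example itself stays well defined.

```cpp
#include <cassert>

struct bitmap_head_model { int bits = 0; };   // hypothetical stand-in
using bitmap_model = bitmap_head_model*;      // mirrors a pointer typedef

// Buggy shape (comment only): a reference member bound to a by-value
// constructor parameter dangles as soon as the constructor returns.
//   struct query_bad {
//       bitmap_model& m_list;
//       query_bad(bitmap_model p) : m_list(p) {}
//   };

// Fixed shape: keep a copy of the pointer instead of a reference to it.
class query {
public:
    explicit query(bitmap_model p) : m_list(p) {}
    int bits() const { return m_list->bits; }
private:
    bitmap_model m_list;   // valid for as long as the pointee is alive
};
```

Storing the pointer by value also removes one level of indirection on each access.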
      
      2025-01-23  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/118605
      	* tree-assume.cc (assume_query::m_parm_list): Change type
      	from bitmap & to bitmap.
      27a05f8d
    • Tejas Belagod's avatar
      OpenMP/PolyInt: Pass poly-int structures by address to OMP libs. · b8ac0616
      Tejas Belagod authored
      Currently poly-int type structures are passed by value to OpenMP runtime
      functions for shared clauses etc.  This patch improves on this by passing
      around poly-int structures by address to avoid copy-overhead.
      
      gcc/ChangeLog:
      
      	* omp-low.cc (use_pointer_for_field): Use pointer if the OMP data
      	structure's field type is a poly-int.
      b8ac0616
    • Rainer Orth's avatar
      testsuite: i386: Adjust gcc.target/i386/cmov12.c for Sun as syntax · 314d20bb
      Rainer Orth authored
      The new gcc.target/i386/cmov12.c test FAILs on Solaris/x86 with the
      native as:
      
      FAIL: gcc.target/i386/cmov12.c scan-assembler-times cmovg 3
      
      This happens because as uses a different syntax for cmov:
      
      --- cmov12.s.bu243	2025-01-21 16:55:27.038829605 +0100
      +++ cmov12.s.bu24390	2025-01-21 16:55:44.565051230 +0100
      @@ -41,9 +41,9 @@
       	leal	1(%rdx), %ebp
       	movl	(%r11), %esi
       	cmpl	%eax, %esi
      -	cmovg	%ebp, %edx
      -	cmovg	%r11, %rcx
      -	cmovg	%esi, %eax
      +	cmovl.g	%ebp, %edx
      +	cmovq.g	%r11, %rcx
      +	cmovl.g	%esi, %eax
      
      The problem is even more prominent with the upcoming gas 2.44 which
      added support for the Sun as syntax on Solaris, which gcc/configure
      picks up.
      
      This patch allows for both forms.
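One way to accept both spellings is a single pattern with an optional size suffix. The sketch below uses std::regex and a pattern chosen for illustration; the actual scan-assembler-times expression in the test is written in Tcl and may differ.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Count cmovg instructions in either GNU as syntax (cmovg) or
// Sun as syntax (cmovl.g, cmovq.g).  The pattern is an assumption.
inline int count_cmovg(const std::string& asm_text) {
    static const std::regex pattern(R"(cmov([lq]\.)?g\b)");
    auto first = std::sregex_iterator(asm_text.begin(), asm_text.end(), pattern);
    auto last = std::sregex_iterator();
    return static_cast<int>(std::distance(first, last));
}
```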
      
      Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
      
      2025-01-22  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
      
      	gcc/testsuite:
      	* gcc.target/i386/cmov12.c (scan-assembler-times): Allow for
      	cmovl.g etc.
      314d20bb
    • Jakub Jelinek's avatar
      c++: Fix build_omp_array_section for type dependent array_expr [PR118590] · b02c061b
      Jakub Jelinek authored
      As can be seen on the testcase, when array_expr is type dependent, assuming
      it has non-NULL TREE_TYPE is just wrong, it can often have NULL type, and even
      if not, blindly assuming it is a pointer or array type is also wrong.
      
      So, like in many other spots in the C++ FE, for type dependent expressions
      we want to create something which will survive until instantiation and can be
      redone at that point.
      
      Unfortunately, build_omp_array_section is called before we actually do any
      kind of checking what array_expr really is, and on invalid code it can be e.g.
      a TYPE_DECL on which type_dependent_expression_p ICEs (as can be seen on the
      pr67522.C testcase).  So, I've hacked this by checking it is not TYPE_DECL,
      I hope a TYPE_P can't make it through there when we just lookup an identifier.
      
      Anyway, this patch is not enough, we can ICE e.g. on __uint128_t[0:something]
      during instantiation, so I think something needs to be done for this in pt.cc
      as well.
      
      2025-01-23  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118590
      	* typeck.cc (build_omp_array_section): If array_expr is type dependent
      	or a TYPE_DECL, build OMP_ARRAY_SECTION with NULL type.
      
      	* g++.dg/goacc/pr118590.C: New test.
      b02c061b
    • Jakub Jelinek's avatar
      c++: Fix weird expression in test for clauses other than when/default/otherwise [PR118604] · dd14b08e
      Jakub Jelinek authored
      Some clang analyzer warned about
      if (!strcmp (p, "when") == 0 && !default_p)
      which really looks weird, it is better to use strcmp (p, "when") != 0
      or !!strcmp (p, "when").  Furthermore, as a micro optimization, it is cheaper
      to evaluate default_p than calling strcmp, so that can be put first in the &&.
      
      The C test for the same thing wasn't that weird, but I think for consistency
      it is better to use the same test rather than trying to be creative.
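A short sketch shows that the two spellings agree, with the old one parenthesized to make its parse explicit. The function names are hypothetical and `default_p` is passed in as a plain flag.

```cpp
#include <cassert>
#include <cstring>

// Old spelling: `!strcmp (...) == 0` parses as `(!strcmp (...)) == 0`,
// which is true exactly when the strings differ.
inline bool other_clause_old(const char* p, bool default_p) {
    return (!std::strcmp(p, "when")) == 0 && !default_p;
}

// Rewritten spelling: test the cheap flag first, then compare directly.
inline bool other_clause_new(const char* p, bool default_p) {
    return !default_p && std::strcmp(p, "when") != 0;
}
```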
      
      2025-01-23  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118604
      gcc/c/
      	* c-parser.cc (c_parser_omp_metadirective): Rewrite
      	condition for clauses other than when, default and otherwise.
      gcc/cp/
      	* parser.cc (cp_parser_omp_metadirective): Test !default_p
      	first and use strcmp () != 0 rather than !strcmp () == 0.
      dd14b08e
    • Jakub Jelinek's avatar
      builtins: Store unspecified value to *exp for inf/nan [PR114877] · d19b0682
      Jakub Jelinek authored
      The fold_builtin_frexp folding for NaN/Inf just returned the first argument
      while evaluating the second argument's side effects, rather than storing
      something to what the second argument points to.
      
      The PR argues that the C standard requires the function to store something
      there but what exactly is stored is unspecified, so storing nothing
      there can result in UB if the value isn't initialized and is read later.
      
      glibc and newlib store there 0, musl apparently doesn't store anything.
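
A minimal sketch of the store-zero convention, which glibc and newlib follow and which this patch adopts for the fold. The function `frexp_sketch` is a hypothetical stand-in written for illustration; it is neither GCC's fold_builtin_frexp nor any libc's implementation.

```c
#include <math.h>

/* Hypothetical frexp look-alike.  Zero, NaN and Inf are returned
   unchanged, and 0 is always stored through EXPO, so the caller never
   reads an uninitialized exponent.  */
double
frexp_sketch (double x, int *expo)
{
  *expo = 0;
  if (x == 0.0 || isnan (x) || isinf (x))
    return x;

  double mag = x < 0 ? -x : x;
  /* Scale the magnitude into [0.5, 1) while counting powers of two.  */
  while (mag >= 1.0)
    {
      mag /= 2.0;
      ++*expo;
    }
  while (mag < 0.5)
    {
      mag *= 2.0;
      --*expo;
    }
  return x < 0 ? -mag : mag;
}
```

With this convention, a caller that passes an uninitialized `int` and later reads it gets a defined value even for NaN or Inf inputs.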
      
      The following patch stores there zero (or would you prefer storing there
      some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
      in assembly though) and adjusts the test so that it
      doesn't rely on not storing there anything but instead checks for
      -Wmaybe-uninitialized warning to find out that something has been stored
      there.
      Unfortunately I had to disable the NaN tests for -O0: while we can fold
      __builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
      __builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
      fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
      (because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
      it does
            arg = builtin_save_expr (arg);
            return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
      and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
      nothing propagates the NAN to the comparison.
      I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR:
      added and recurse on the second argument, but that feels like stage1
      material to me if we want to do that at all.
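
The two expressions contrasted above can be written out as follows. This is only a sketch with hypothetical function names. Both functions return nonzero at run time; the difference described in the commit message is whether the compiler can fold the call at -O0, which cannot be observed from the return value.

```c
/* Per the commit message, the argument here is a plain NaN constant,
   so the call can be folded at compile time.  */
int
isnan_plain (void)
{
  return __builtin_isnan (__builtin_nan (""));
}

/* Here the argument is a comma expression with a side effect (a
   COMPOUND_EXPR), which per the commit message is not folded at -O0.  */
int
isnan_compound (int *i)
{
  return __builtin_isnan ((*i = 0, __builtin_nan ("")));
}
```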
      
      2025-01-23  Jakub Jelinek  <jakub@redhat.com>
      
      	PR middle-end/114877
      	* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
      	like rvc_zero, return passed in arg and set *exp = 0.
      
      	* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
      	dg-additional-options.
      	(bar): New function.
      	(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
      	nothing has been stored to what the second argument points to, but
      	instead that something has been stored there, whatever it is.
      	(main): Temporarily don't enable the nan tests for -O0.
      d19b0682
    • Torbjörn SVENSSON's avatar
      testsuite: Only run test if alarm is available · 57b706d1
      Torbjörn SVENSSON authored
      
      Most baremetal toolchains will not have an implementation for alarm and
      sigaction as they are target specific.
      For arm-none-eabi with newlib, function signatures are exposed, but
      there is no implementation and thus the test cases cause an undefined
      symbol link error.
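
A minimal sketch of how a test might be gated on the new keyword. The helper `arm_watchdog` is hypothetical and is not taken from any of the tests listed below; the directive comments use the usual DejaGnu form.

```c
/* { dg-require-effective-target signal } */
/* { dg-require-effective-target alarm } */

#include <unistd.h>

/* Hypothetical helper: arm (or, with 0, cancel) a watchdog so that a
   runaway test is killed by SIGALRM.  Returns the number of seconds
   that were left on any previously scheduled alarm.  */
unsigned int
arm_watchdog (unsigned int seconds)
{
  return alarm (seconds);
}
```

On a target without `alarm`, the directives are meant to mark the test unsupported instead of letting it fail at link time.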
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/pr78185.c: Remove dg-do and replace with
      	dg-require-effective-target of signal and alarm.
      	* gcc.dg/pr116906-1.c: Likewise.
      	* gcc.dg/pr116906-2.c: Likewise.
      	* gcc.dg/vect/pr101145inf.c: Use effective-target alarm.
      	* gcc.dg/vect/pr101145inf_1.c: Likewise.
      	* lib/target-supports.exp (check_effective_target_alarm): New.
      
      gcc/ChangeLog:
      
      	* doc/sourcebuild.texi (Effective-Target Keywords): Document
      	'alarm'.
      
      Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
      57b706d1
    • Georg-Johann Lay's avatar
      AVR: PR117726 - Tweak 32-bit logical shifts of 25...30 for -Oz. · f30edd17
      Georg-Johann Lay authored
      As it turns out, logical 32-bit shifts with an offset of 25..30 can
      be performed in 7 instructions or fewer.  This beats the 7 instructions
      required for the default code of a shift loop.
      Plus, with zero overhead, these cases can be 3-operand.
      
      This is only relevant for -Oz because with -Os, 3op shifts are
      split with -msplit-bit-shift (which is not performed with -Oz).
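
One way to see why such shifts can be short on an 8-bit machine: for offsets of 25..30, only a single source byte contributes to the result. The sketch below shows that arithmetic only. The function names are hypothetical, and this is not claimed to be the sequence that avr.cc emits.

```c
#include <stdint.h>

/* x << n for n in 25..30: only the low byte of X survives, shifted by
   n - 24 bits and placed in the top byte of the result.  */
uint32_t
shl32_via_byte (uint32_t x, unsigned int n)
{
  uint8_t low = (uint8_t) x;
  uint8_t top = (uint8_t) (low << (n - 24));
  return (uint32_t) top << 24;
}

/* x >> n for n in 25..30: only the top byte of X survives, shifted
   right by n - 24 bits into the low byte of the result.  */
uint32_t
shr32_via_byte (uint32_t x, unsigned int n)
{
  uint8_t top = (uint8_t) (x >> 24);
  return (uint32_t) (top >> (n - 24));
}
```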
      
      	PR target/117726
      gcc/
      	* config/avr/avr.cc (avr_ld_regno_p): New function.
      	(ashlsi3_out) [case 25,26,27,28,29,30]: Handle and tweak.
      	(lshrsi3_out): Same.
      	(avr_rtx_costs_1) [SImode, ASHIFT, LSHIFTRT]: Adjust costs.
      	* config/avr/avr.md (ashlsi3, *ashlsi3, *ashlsi3_const):
      	Add "r,r,C4L" alternative.
      	(lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,r,C4R" alternative.
      	* config/avr/constraints.md (C4R, C4L): New.
      gcc/testsuite/
      	* gcc.target/avr/torture/avr-torture.exp (AVR_TORTURE_OPTIONS):
      	Turn one option variant into -Oz.
      f30edd17
    • Paul Thomas's avatar
      Fortran: Regression- fix ICE at fortran/trans-decl.c:1575 [PR96087] · b3f51ea8
      Paul Thomas authored
      2025-01-23  Paul Thomas  <pault@gcc.gnu.org>
      
      gcc/fortran
      	PR fortran/96087
      	* trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a
      	backend decl, it is likely that it has come from a module proc
      	interface. Look for the formal symbol by name in the containing
      	proc and use its backend decl.
      	* trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the
      	same reason, match the name, rather than the symbol address to
      	perform the mapping.
      
      gcc/testsuite/
      	PR fortran/96087
      	* gfortran.dg/pr96087.f90: New test.
      b3f51ea8
    • Richard Biener's avatar
      tree-optimization/118558 - fix alignment compute with VMAT_CONTIGUOUS_REVERSE · 7fffff1d
      Richard Biener authored
      There are calls to dr_misalignment left that do not correct for the
      offset (which is vector type dependent) when the stride is negative.
      Notably vect_known_alignment_in_bytes doesn't allow passing through
      such an offset, which the following adds (computing the offset in
      vect_known_alignment_in_bytes would be possible as well, but the
      offset can be shared as seen).  Eventually this function could go away.
      
      This leads to peeling for gaps not being considered, nor shortening of the
      access being applied, which is what fixes the testcase on x86_64.
      
      	PR tree-optimization/118558
      	* tree-vectorizer.h (vect_known_alignment_in_bytes): Pass
      	through offset to dr_misalignment.
      	* tree-vect-stmts.cc (get_group_load_store_type): Compute
      	offset applied for negative stride and use it when querying
      	alignment of accesses.
      	(vectorizable_load): Likewise.
      
      	* gcc.dg/vect/pr118558.c: New testcase.
      7fffff1d
    • Nathaniel Shead's avatar
      c++: Update mangling of lambdas in expressions · 2119c254
      Nathaniel Shead authored
      https://github.com/itanium-cxx-abi/cxx-abi/pull/85 clarifies that
      mangling a lambda expression should use 'L' rather than "tl".
      
      gcc/cp/ChangeLog:
      
      	* mangle.cc (write_expression): Update mangling for lambdas.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp2a/lambda-generic-mangle1.C: Update mangling.
      	* g++.dg/cpp2a/lambda-generic-mangle1a.C: Likewise.
      
      Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
      2119c254
    • Nathaniel Shead's avatar
      c++: Fix mangling of lambdas in static data member initializers [PR107741] · 685c458f
      Nathaniel Shead authored
      
      This fixes an issue where lambdas declared in the initializer of a
      static data member within the class body do not get a mangling scope of
      that variable; this results in mangled names that do not conform to the
      ABI spec.
      
      To do this, the patch splits up grokfield for this case specifically,
      allowing a declaration to be built and used in start_lambda_scope before
      parsing the initializer, so that record_lambda_scope works correctly.
      
      As a drive-by, this also fixes the issue of a static member not being
      visible within its own initializer.
      
      	PR c++/107741
      
      gcc/c-family/ChangeLog:
      
      	* c-opts.cc (c_common_post_options): Bump ABI version.
      
      gcc/ChangeLog:
      
      	* common.opt: Add -fabi-version=20.
      	* doc/invoke.texi: Likewise.
      
      gcc/cp/ChangeLog:
      
      	* cp-tree.h (start_initialized_static_member): Declare.
      	(finish_initialized_static_member): Declare.
      	* decl2.cc (start_initialized_static_member): New function.
      	(finish_initialized_static_member): New function.
      	* lambda.cc (record_lambda_scope): Support falling back to old
      	ABI (maybe with warning).
      	* parser.cc (cp_parser_member_declaration): Build decl early
      	when parsing an initialized static data member.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/abi/macro0.C: Bump ABI version.
      	* g++.dg/abi/mangle74.C: Remove XFAILs.
      	* g++.dg/other/fold1.C: Restore originally raised error.
      	* g++.dg/abi/lambda-ctx2-19.C: New test.
      	* g++.dg/abi/lambda-ctx2-19vs20.C: New test.
      	* g++.dg/abi/lambda-ctx2-20.C: New test.
      	* g++.dg/abi/lambda-ctx2.h: New test.
      	* g++.dg/cpp0x/static-member-init-1.C: New test.
      
      Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
      685c458f
    • Nathaniel Shead's avatar
      c++/modules: Fix exporting temploid friends in header units [PR118582] · 21cccfa9
      Nathaniel Shead authored
      
      When we started streaming the bit to handle merging of imported temploid
      friends in r15-2807, I unthinkingly only streamed it in the
      '!state->is_header ()' case.
      
      This patch reworks the streaming logic to ensure that this data is
      always streamed, including for unique entities (in case that ever comes
      up somehow).  This does make the streaming slightly less efficient, as
      functions and types will need an extra byte, but this doesn't appear to
      make a huge difference to the size of the resulting module; the 'std'
      module on my machine grows by 0.2% from 30671136 to 30730144 bytes.
      
      	PR c++/118582
      
      gcc/cp/ChangeLog:
      
      	* module.cc (trees_out::decl_value): Always stream
      	imported_temploid_friends information.
      	(trees_in::decl_value): Likewise.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/modules/pr118582_a.H: New test.
      	* g++.dg/modules/pr118582_b.H: New test.
      	* g++.dg/modules/pr118582_c.H: New test.
      
      Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
      21cccfa9
    • Xi Ruoyao's avatar
      LoongArch: Fix invalid subregs in xorsign [PR118501] · 9ddf4a6c
      Xi Ruoyao authored
      The test case added in r15-7073 now triggers an ICE, indicating we need
      the same fix as AArch64.
      
      gcc/ChangeLog:
      
      	PR target/118501
      	* config/loongarch/loongarch.md (@xorsign<mode>3): Use
      	force_lowpart_subreg.
      9ddf4a6c