  1. Jan 16, 2025
    • tree-ssa-propagate: Special case lhs of musttail calls in may_propagate_copy [PR118430] · 7f5adfd3
      Jakub Jelinek authored
      This patch ensures that VRP or similar passes don't replace the uses of the lhs of
      [[gnu::musttail]] calls with a constant (e.g. if the call is known to return a
      singleton value range) or similar, to make it more likely that the call is actually
      tail callable.
      
      2025-01-16  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/118430
      	* tree-ssa-propagate.cc (may_propagate_copy): Return false if dest
      	is lhs of an [[gnu::musttail]] call.
      	(substitute_and_fold_dom_walker::before_dom_children): Formatting fix.
      
      	* c-c++-common/musttail14.c: Expect lhs on the musttail calls.
    • tailc: Virtually undo IPA-VRP return value optimization for tail calls [PR118430] · 9c4397ca
      Jakub Jelinek authored
      When we have return somefn (whatever); where somefn is normally tail
      callable and IPA-VRP determines somefn returns a singleton range, VRP
      just changes the IL to
        somefn (whatever);
        return 42;
      (or whatever the value in that range is).  The introduction of IPA-VRP
      return value tracking then effectively regresses the tail call optimization.
      This is even more important if the call is [[gnu::musttail]].
      
      So, the following patch queries IPA-VRP whether a function returns a singleton
      range and, if so and the value returned is identical to that singleton, marks
      the call as [tail call] anyway.  If expansion decides it can't use the tail
      call, the return 42; (or similar) statement is still expanded; if it decides
      it can use the tail call, that statement is ignored and we emit a
      normal tail call.
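      To make the scenario concrete, here is a hedged sketch of source code that
      exercises it: a callee that always returns 42 (so IPA-VRP could infer a
      singleton return range) reached through [[gnu::musttail]].  The MUSTTAIL
      macro, last_arg, bar and foo are hypothetical names for this sketch; the
      macro falls back to a plain return on compilers that do not report the
      attribute.

```cpp
#include <cassert>

// Portability shim (an assumption of this sketch): use the statement
// attribute only when the compiler reports support for it.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(gnu::musttail)
#    define MUSTTAIL [[gnu::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

int last_arg;  // side effect so the callee is not a pure constant function

// Hypothetical callee whose return value range is the singleton {42};
// noinline keeps it a separate function that IPA-VRP can analyze.
__attribute__((noinline)) int bar(int x) {
  last_arg = x;
  return 42;
}

// Before the tailc change, VRP could rewrite this body as
//   bar (x); return 42;
// which no longer looks like a tail call to the tailc pass.
int foo(int x) {
  MUSTTAIL return bar(x);
}
```

      Whether the call is actually emitted as a jump depends on the compiler
      version and options; the sketch only pins down the source pattern.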
      
      The reason it works is that the expand pass relies on the tailc pass to
      do its job properly.
      E.g. when we have
        <bb 2> [local count: 1073741824]:
        foo (x_2(D));
        baz (&v);
        v ={v} {CLOBBER(eos)};
        bar (x_2(D)); [tail call]
        return 1;
      when expand_gimple_basic_block handles the bar (x_2(D)); call, it uses
                if (call_stmt && gimple_call_tail_p (call_stmt))
                  {
                    bool can_fallthru;
                    new_bb = expand_gimple_tailcall (bb, call_stmt, &can_fallthru);
                    if (new_bb)
                      {
                        if (can_fallthru)
                          bb = new_bb;
                        else
                          {
                            currently_expanding_gimple_stmt = NULL;
                            return new_bb;
                          }
                      }
                  }
      Because the bar (x_2(D)); call is actually tail callable during expansion,
      expand_gimple_tailcall returns non-NULL, sets can_fallthru to false,
      and emits
      ;; bar (x_2(D)); [tail call]
      
      (insn 11 10 12 2 (set (reg:SI 5 di)
              (reg/v:SI 99 [ x ])) "pr118430.c":35:10 -1
           (nil))
      
      (call_insn/j 12 11 13 2 (set (reg:SI 0 ax)
              (call (mem:QI (symbol_ref:DI ("bar") [flags 0x3]  <function_decl 0x7fb39020bd00 bar>) [0 bar S1 A8])
                  (const_int 0 [0]))) "pr118430.c":35:10 -1
           (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x3]  <function_decl 0x7fb39020bd00 bar>)
              (expr_list:REG_EH_REGION (const_int 0 [0])
                  (nil)))
          (expr_list:SI (use (reg:SI 5 di))
              (nil)))
      
      (barrier 13 12 0)
      Because it doesn't fall through, no further statements in the same bb are
      expanded.  If the return statement were in a different basic block from
      the [tail call], it could be expanded, but because the bb with the tail
      call ends with a barrier, control doesn't fall through to it, and if
      nothing else can reach it, we'd remove the unreachable bb shortly afterwards.
      
      2025-01-16  Jakub Jelinek  <jakub@redhat.com>
      	    Andrew Pinski  <quic_apinski@quicinc.com>
      
      	PR tree-optimization/118430
      	* tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h,
      	symbol-summary.h, ipa-cp.h and ipa-prop.h.
      	(find_tail_calls): If ass_var is NULL and ret_var is not, check if
      	IPA-VRP has found a singleton return range for it.  In that case,
      	don't punt if ret_var is the only value in that range.  Adjust the
      	maybe_error_musttail message otherwise to diagnose a different value
      	being returned from the caller and callee rather than using the return
      	slot.  Formatting fixes.
      
      	* c-c++-common/musttail14.c: New test.
      	* c-c++-common/pr118430.c: New test.
    • docs: Fix up inline asm documentation · 015ec112
      Jakub Jelinek authored
      When writing the gcc-15/changes.html patch posted earlier, I wondered
      where a significant part of the Basic Asm chapter had gone.  The problem
      was that a new @node had been inserted in the middle of the Basic Asm
      @node and was not mentioned in the @menu, so neither the asm constexpr
      node nor the Remarks for the section were normally visible.
      
      The following patch moves the new node before Asm Labels, removes the
      passages describing what was never actually committed (the constant
      expression can only be a container with data/size member functions), and
      fixes up the top-level extended asm documentation (it was in the Basic Asm
      remarks, while the Extended Asm section's remarks still said it is not valid).
      
      2025-01-16  Jakub Jelinek  <jakub@redhat.com>
      
      	* doc/extend.texi (Using Assembly Language with C): Add Asm constexprs
      	to @menu.
      	(Basic Asm): Move @node asm constexprs before Asm Labels, rename to
      	Asm constexprs, change wording so that it is clearer that the constant
      	expression actually must not return a string literal, just some specific
      	container and other wording tweaks.  Only talk about top-level for basic
      	asms in this @node, move restrictions on top-level extended asms to ...
      	(Extended Asm): ... here.
    • vec.h: Properly destruct elements in auto_vec auto storage [PR118400] · 43f4d44b
      Jakub Jelinek authored
      For T with non-trivial destructors, we were destroying objects in the
      vector on release only when not using the auto storage of auto_vec.
      
      The following patch calls truncate (0) instead of clearing m_vecpfx.m_num,
      and truncate takes care of that destruction:
        unsigned l = length ();
        gcc_checking_assert (l >= size);
        if (!std::is_trivially_destructible <T>::value)
          vec_destruct (address () + size, l - size);
        m_vecpfx.m_num = size;
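      A minimal sketch of why routing release through truncate matters, using a
      hypothetical fixed-capacity small_vec with inline storage (a simplified
      model, not GCC's vec.h) and a counted element type that tallies
      destructor calls:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical fixed-capacity vector with inline ("auto") storage.
template <typename T, std::size_t N>
class small_vec {
  alignas(T) unsigned char m_storage[N * sizeof(T)];
  std::size_t m_num = 0;

  T *address() { return reinterpret_cast<T *>(m_storage); }

public:
  small_vec() = default;
  small_vec(const small_vec &) = delete;
  small_vec &operator=(const small_vec &) = delete;
  ~small_vec() { release(); }

  std::size_t length() const { return m_num; }

  void quick_push(const T &v) {
    assert(m_num < N);  // no reallocation in this model
    ::new (static_cast<void *>(address() + m_num)) T(v);
    ++m_num;
  }

  // Shrink to SIZE elements, destroying the ones that are dropped.
  void truncate(std::size_t size) {
    assert(size <= m_num);
    if (!std::is_trivially_destructible<T>::value)
      for (std::size_t i = size; i < m_num; ++i)
        address()[i].~T();
    m_num = size;
  }

  // Setting m_num = 0 directly would skip the destructors;
  // truncate (0) runs them.
  void release() { truncate(0); }
};

// Element type that counts destructor calls, for the usage example.
struct counted {
  static int dtors;
  int v;
  explicit counted(int x) : v(x) {}
  counted(const counted &o) : v(o.v) {}
  ~counted() { ++dtors; }
};
int counted::dtors = 0;
```

      After two pushes and a release, four destructors have run: two for the
      temporaries passed to quick_push and two for the stored elements.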
      
      2025-01-16  Jakub Jelinek  <jakub@redhat.com>
      
      	PR ipa/118400
      	* vec.h (vec<T, va_heap, vl_ptr>::release): Call m_vec->truncate (0)
      	instead of clearing m_vec->m_vecpfx.m_num.
    • Fix typo to avoid ICE. · 3872daa5
      liuhongt authored
      gcc/ChangeLog:
      
      	PR target/118489
      	* config/i386/sse.md (VF1_AVX512BW): Fix typo.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr118489.c: New test.
    • tree-optimization/115895 - overrun with masked loop · 1b5d2ccd
      Richard Biener authored
      The following addresses the fact that with loop masking (or regular
      masked loads) we do not implement load shortening, yet we used them to
      override the overrun case, where load shortening is needed for
      correctness.  Likewise, when we attempt to use loop masking to handle
      large trailing gaps, we cannot do so when there's this overrun case.
      
      	PR tree-optimization/115895
      	* tree-vect-stmts.cc (get_group_load_store_type): When we
      	might overrun because the group size is not a multiple of the
      	vector size we cannot use loop masking since that does not
      	implement the required load shortening.
      
      	* gcc.target/i386/vect-pr115895.c: New testcase.
    • lm32: In va_arg, skip to stack args with too few remaining reg args · cf9de710
      Keith Packard authored
      lm32 has 8 register parameter slots, so many vararg functions end up
      with several anonymous parameters passed in registers. If we run out
      of registers in the middle of a parameter, the entire parameter will
      be placed on the stack, skipping any remaining available registers.
      
      The receiving varargs function doesn't know this, and will save all of
      the possible parameter register values just below the stack parameters.
      
      When processing a va_arg call with a type size larger than a single
      register, we must check to see if it spans the boundary between
      register and stack parameters. If so, we need to skip to the stack
      parameters.
      
      This is done by making va_list a structure containing the arg pointer
      and the address of the start of the stack parameters. Boundary checks
      are inserted in va_arg calls to detect this case and the address of
      the parameter is set to the stack parameter start when the parameter
      crosses over.
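      A minimal sketch of the boundary check, modelling the frame as a byte
      array with 4-byte register slots (lm32 is a 32-bit target).  The names
      model_va_list and model_va_arg are hypothetical; this does not reproduce
      lm32_build_va_list or lm32_gimplify_va_arg_expr, only the rule they
      implement.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the scheme described above (not the GCC code):
// the saved argument registers sit directly below the stack arguments.
struct model_va_list {
  unsigned char *arg_ptr;      // next argument to fetch
  unsigned char *stack_start;  // first stack-passed argument
};

// Return the address of the next argument of SIZE bytes.  If it would
// straddle the register/stack boundary, the caller placed it entirely on
// the stack, so skip whatever register slots remain.
unsigned char *model_va_arg(model_va_list &ap, std::size_t size) {
  assert(size > 0);
  if (ap.arg_ptr < ap.stack_start && ap.arg_ptr + size > ap.stack_start)
    ap.arg_ptr = ap.stack_start;
  unsigned char *addr = ap.arg_ptr;
  ap.arg_ptr += size;
  return addr;
}
```

      With one 4-byte register slot left, an 8-byte argument is fetched from
      the start of the stack area, while an argument that ends exactly at the
      boundary is still read from the register save area.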
      
      gcc/
      	* config/lm32/lm32.cc: Add several #includes.
      	(va_list_type): New.
      	(lm32_build_va_list): New function.
      	(lm32_builtin_va_start): Likewise.
      	(lm32_sd_gimplify_va_arg_expr): Likewise.
      	(lm32_gimplify_va_arg_expr): Likewise.
    • lm32: Compute pretend_size in setup_incoming_varargs even if no_rtl · 423e9a8a
      Keith Packard authored
      gcc/
      	* config/lm32/lm32.cc (setup_incoming_varargs): Adjust the
      	conditionals so that pretend_size is always computed, even
      	if no_rtl is set.
    • lm32: Skip last named param when computing save varargs regs · 6e593fcd
      Keith Packard authored
      The cumulative args value in setup_incoming_varargs points at
      the last named parameter. We need to skip over that (if present) to
      get to the first anonymous argument as we only want to include
      those anonymous args in the saved register block.
      
      gcc/
      	* config/lm32/lm32.cc (lm32_setup_incoming_varargs): Skip last
      	named parameter when preparing to flush registers with unnamed
      	arguments to the stack.
    • lm32: Args with arg.named false still get passed in regs · 3184f6a5
      Keith Packard authored
      	* config/lm32/lm32.cc (lm32_function_arg): Pass unnamed
      	arguments in registers too, just like named arguments.
    • Fix an incorrect file header comment for the core2 scheduling model · efd00e3a
      Andi Kleen authored
      Committed as obvious.
      
      gcc/ChangeLog:
      
      	* config/i386/x86-tune-sched-core.cc: Fix incorrect comment.
    • Fix setting of call graph node AutoFDO count · e683c6b0
      Eugene Rozenfeld authored
      We are initializing both the call graph node count and
      the entry block count of the function with the head_count value
      from the profile.
      
      The count propagation algorithm may refine the entry block count,
      and we may end up with a case where the call graph node count
      is set to zero but the entry block count is non-zero.  That becomes
      a problem because we have this code in execute_fixup_cfg:
      
       profile_count num = node->count;
       profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
       bool scale = num.initialized_p () && !(num == den);
      
      Here if num is 0 but den is not 0, scale becomes true and we
      lose the counts in
      
      if (scale)
        bb->count = bb->count.apply_scale (num, den);
      
      This is what happened in the issue reported in PR116743
      (a 10% regression in MySQL HAMMERDB tests).
      Commit 3d9e6767 improved AutoFDO count propagation, which caused
      a mismatch between the call graph node count (zero) and the entry
      block count (non-zero) and the subsequent loss of counts described above.
      
      The fix is to update the call graph node count once we've done count propagation.
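      A simplified model of the scaling step shows why a stale zero node count
      wipes out the profile.  Plain 64-bit integers stand in for profile_count
      (an assumption: the real type also tracks count quality, and the
      initialized_p () check is dropped here), and fixup_cfg_counts is a
      hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of the execute_fixup_cfg scaling: every bb count is multiplied
// by num / den when the node count differs from the entry block count.
std::vector<std::int64_t> fixup_cfg_counts(std::vector<std::int64_t> bb_counts,
                                           std::int64_t node_count,   // num
                                           std::int64_t entry_count)  // den
{
  bool scale = node_count != entry_count;
  if (scale && entry_count != 0)
    for (std::int64_t &c : bb_counts)
      c = c * node_count / entry_count;  // bb->count.apply_scale (num, den)
  return bb_counts;
}
```

      With num == 0 and a non-zero den every count collapses to zero; once the
      node count is updated to match the propagated entry count, no scaling
      happens at all.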
      
      Tested on x86_64-pc-linux-gnu.
      
      gcc/ChangeLog:
      	PR gcov-profile/116743
      	* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
      	and the entry block count.
    • Daily bump. · 14f337e3
      GCC Administrator authored
  2. Jan 15, 2025
    • libstdc++: Fix use of internal feature test macro in test · 79d55040
      Jonathan Wakely authored
      This test should use __cpp_lib_ios_noreplace rather than the internal
      __glibcxx_ios_noreplace macro.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/27_io/ios_base/types/openmode/case_label.cc: Use
      	standard feature test macro not internal one.
    • libstdc++: Fix fancy pointer test for std::set · f079feec
      Jonathan Wakely authored
      The alloc_ptr.cc test for std::set tries to use C++17 features
      unconditionally, and tries to use the C++23 range members which haven't
      been implemented for std::set yet.
      
      Some of the range checks are left in place but commented out, so they
      can be added after the range members are implemented. Others (such as
      prepend_range) are not valid for std::set at all.
      
      Also fix uses of internal feature test macros in two other tests, which
      should use the standard __cpp_lib_xxx macros.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/23_containers/set/requirements/explicit_instantiation/alloc_ptr.cc:
      	Guard node extraction checks with feature test macro. Remove
      	calls to non-existent range members.
      	* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc:
      	Use standard macro not internal one.
      	* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc:
      	Likewise.
    • match: Simplify `1 >> x` into `x == 0` [PR102705] · 903ab914
      Andrew Pinski authored
      
      In this PR we have a missed optimization: we fail to see that
      `1 >> x` and `(1 >> x) ^ 1` can't be equal.  There are a few ways of
      optimizing this; the easiest and simplest is to simplify `1 >> x` into
      just `x == 0`, as those are equivalent (if we ignore out-of-range values for x).
      We already have an optimization for `(1 >> X) !=/== 0`, so the only difference
      here is that we don't need the `!=/== 0` part to do the transformation.
      
      So this removes the `(1 >> X) !=/== 0` transformation and just adds a simplified
      `1 >> x` -> `x == 0` one.
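      The equivalence can be checked exhaustively over the in-range shift
      amounts for int; shift_equals_eq0 and pr_comparison are illustrative
      names, not part of the patch.

```cpp
#include <cassert>
#include <climits>

// Number of bits in int; shifting by this much or more is undefined,
// which is the out-of-range case the transformation ignores.
constexpr int int_bits = static_cast<int>(sizeof(int) * CHAR_BIT);

// For every in-range x, 1 >> x is 1 exactly when x == 0.
bool shift_equals_eq0(int x) {
  assert(x >= 0 && x < int_bits);
  return (1 >> x) == (x == 0 ? 1 : 0);
}

// The comparison from the PR: once 1 >> x is known to be 0 or 1, the
// right operand is its logical negation, so the result is always false.
bool pr_comparison(int x) {
  assert(x >= 0 && x < int_bits);
  return (1 >> x) == ((1 >> x) ^ 1);
}
```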
      
      Bootstrapped and tested on x86_64-linux-gnu.
      
      	PR tree-optimization/102705
      
      gcc/ChangeLog:
      
      	* match.pd (`(1 >> X) != 0`): Remove pattern.
      	(`1 >> x`): New pattern.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/tree-ssa/pr105832-2.c: Update testcase.
      	* gcc.dg/tree-ssa/pr96669-1.c: Likewise.
      	* gcc.dg/tree-ssa/pr102705-1.c: New test.
      	* gcc.dg/tree-ssa/pr102705-2.c: New test.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
    • doc: cleanup trailing whitespace · c340ff20
      Sam James authored
      gcc/ChangeLog:
      
      	* doc/extend.texi: Cleanup trailing whitespace.
    • doc: trivial grammar fix · d8e52444
      Sam James authored
      We say 'a constant .. expression' elsewhere. Fix the grammar.
      
      gcc/ChangeLog:
      
      	* doc/extend.texi: Add 'a' for grammar fix.
    • libstdc++: Fix reversed args in unreachable assumption [PR109849] · 6f85a972
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/109849
      	* include/bits/vector.tcc (vector::_M_range_insert): Fix
      	reversed args in length calculation.
    • Fortran: reject NULL as source-expr in ALLOCATE with SOURCE= or MOLD= [PR71884] · 89230999
      Harald Anlauf authored
      	PR fortran/71884
      
      gcc/fortran/ChangeLog:
      
      	* resolve.cc (resolve_allocate_expr): Reject intrinsic NULL as
      	source-expr.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/pr71884.f90: New test.
    • c++: Handle RAW_DATA_CST in unify [PR118390] · 2619413a
      Jakub Jelinek authored
      This patch uses the count_ctor_elements function to fix up
      unify deduction of array sizes.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118390
      	* cp-tree.h (count_ctor_elements): Declare.
      	* call.cc (count_ctor_elements): No longer static.
      	* pt.cc (unify): Use count_ctor_elements instead of
      	CONSTRUCTOR_NELTS.
      
      	* g++.dg/cpp/embed-20.C: New test.
      	* g++.dg/cpp0x/pr118390.C: New test.
    • AArch64: Update neoverse512tvb tuning · 4ce502f3
      Wilco Dijkstra authored
      Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the
      missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
      
      gcc:
      	* config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.
    • AArch64: Add FULLY_PIPELINED_FMA to tune baseline · 2713f6bb
      Wilco Dijkstra authored
      Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
      already enabled for some cores, but benchmarking it shows it is faster on all
      modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
      
      gcc:
      	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE):
      	Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
      	* config/aarch64/tuning_models/ampere1b.h: Remove redundant
      	AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
      	* config/aarch64/tuning_models/neoversev2.h: Likewise.
    • AArch64: Deprecate -mabi=ilp32 · 625ea3c6
      Wilco Dijkstra authored
      ILP32 was originally intended to make porting to AArch64 easier.  Support was
      never merged in the Linux kernel or GLIBC, so it has been unsupported for many
      years.  There isn't a benefit in keeping unsupported features forever, so
      deprecate it now (and it could be removed in a future release).
      
      gcc:
      	* config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
      	* doc/invoke.texi: Document -mabi=ilp32 as deprecated.
      
      gcc/testsuite:
      	* gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
      	* gcc.target/aarch64/pr100518.c: Likewise.
      	* gcc.target/aarch64/pr113114.c: Likewise.
      	* gcc.target/aarch64/pr80295.c: Likewise.
      	* gcc.target/aarch64/pr94201.c: Likewise.
      	* gcc.target/aarch64/pr94577.c: Likewise.
      	* gcc.target/aarch64/sve/pr108603.c: Likewise.
    • bpf: set index entry for a VAR_DECL in CO-RE relocs · 01c37f9a
      Cupertino Miranda authored
      CO-RE accesses with non-pointer struct variables will also generate a
      "0" string access within the CO-RE relocation.
      The first index within the access string has a somewhat different
      meaning than the remaining indexes.
      For i0:i1:...:in being an access index for a "struct A a" declaration, its
      semantics are represented by:
        &a + (sizeof(struct A) * i0) + offsetof(i1:...:in)
      
      gcc/ChangeLog:
      	* config/bpf/core-builtins.cc (compute_field_expr): Change
      	VAR_DECL outcome in switch case.
      
      gcc/testsuite/ChangeLog:
      	* gcc.target/bpf/core-builtin-1.c: Correct test.
      	* gcc.target/bpf/core-builtin-2.c: Correct test.
      	* gcc.target/bpf/core-builtin-exprlist-1.c: Correct test.
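      The access-string semantics given in this commit message can be
      illustrated with ordinary C++ offsets.  The structs A and B and the
      helper core_access_offset are hypothetical; the access string 0:1:1 is
      taken to select a.b.y (field 1 of A, then field 1 of B).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration.
struct B { int x; int y; };
struct A { int pad; B b; };

// Byte offset from &a for an access string i0:i1:...:in on "struct A a":
// i0 steps over whole struct A objects, while the remaining indexes select
// fields, whose combined offset is passed in as field_offset.
std::size_t core_access_offset(std::size_t i0, std::size_t field_offset) {
  return sizeof(A) * i0 + field_offset;
}
```

      For a plain (non-pointer) variable the first index is always 0, which is
      why the relocation carries the leading "0".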
    • bpf: calls do not promote attr access_index on lhs · 42786ccf
      Cupertino Miranda authored
      When traversing gimple to introduce CO-RE relocation entries to
      expressions that are accesses to types with the preserve_access_index
      attribute, the access is likely to be split into multiple gimple statements.
      In order to keep doing the proper CO-RE conversion we need to mark
      the LHS tree nodes of gimple expressions as explicit CO-RE accesses,
      so that the gimple traverser will further convert the sub-expressions.
      
      This patch makes sure that this LHS marking does not happen when the
      gimple statement is a function call, in which case we no longer
      expect to keep generating CO-RE accesses for the rest of the
      expression.
      
      gcc/ChangeLog:
      
      	* config/bpf/core-builtins.cc
      	(make_gimple_core_safe_access_index): Fix in condition.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/bpf/core-attr-calls.c: New test.
    • bpf: make sure CO-RE relocs are typed with struct BTF_KIND_STRUCT · d30def00
      Cupertino Miranda authored
      Based on observation of the bpf-next selftests and comparison of GCC
      and clang compiled code, the BPF loader expects all CO-RE relocations to
      point to non-const and non-volatile BTF type nodes.
      
      gcc/ChangeLog:
      
      	* btfout.cc (get_btf_kind): Remove static from function definition.
      	* config/bpf/btfext-out.cc (bpf_code_reloc_add): Check if CO-RE type
      	is not a const or volatile.
      	* ctfc.h (btf_dtd_kind): Add prototype for function.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/bpf/core-attr-const.c: New test.
    • c++: Implement mangling of RAW_DATA_CST [PR118278] · 8d9d5834
      Jakub Jelinek authored
      As the following testcases show (mangle80.C only after reversion of the
      temporary reversion of C++ large array speedup commit), RAW_DATA_CST can
      be seen during mangling of some templates and we ICE because
      the mangler doesn't handle it.
      
      The following patch handles it and mangles it the same as a sequence of
      INTEGER_CSTs that were used previously instead.
      The only slight complication is that if ce->value is the last nonzero
      element, we need to skip the zeros at the end of RAW_DATA_CST.
      
      2025-01-03  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118278
      	* mangle.cc (write_expression): Handle RAW_DATA_CST.
      
      	* g++.dg/abi/mangle80.C: New test.
      	* g++.dg/cpp/embed-19.C: New test.
    • c++: handle decltype in nested-name-spec printing [PR118139] · 1bc474f6
      Marek Polacek authored
      
      Compiling this test, we emit:
      
        error: 'static void CW<T>::operator=(int) requires requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type x) {x;}' must be a non-static member function
      
      where the DECLTYPE_TYPE isn't printed properly.  This patch fixes that
      to print:
      
      error: 'static void CW<T>::operator=(int) requires requires(typename decltype(T())::type x) {x;}' must be a non-static member function
      
      	PR c++/118139
      
      gcc/cp/ChangeLog:
      
      	* cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle
      	a computed-type-specifier.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/diagnostic/decltype1.C: New test.
      
      Reviewed-by: Jason Merrill <jason@redhat.com>
    • libstdc++: Fix comments in test that reference wrong subclause of C++11 · 9cc31b4e
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* testsuite/28_regex/traits/char/transform_primary.cc: Fix
      	subclause numbering in references to the standard.
    • middle-end: Fix incorrect type replacement in operands_equals [PR118472] · 25eb892a
      Tamar Christina authored
      In g:3c32575e I made a mistake and incorrectly
      replaced the type of the arguments of an expression with the type of the
      expression.  This is of course wrong.
      
      This reverts that change and I have also double checked the other replacements
      and they are fine.
      
      gcc/ChangeLog:
      
      	PR middle-end/118472
      	* fold-const.cc (operand_compare::operand_equal_p): Fix incorrect
      	replacement.
      
      gcc/testsuite/ChangeLog:
      
      	PR middle-end/118472
      	* gcc.dg/pr118472.c: New test.
    • Annotate dbg_line_numbers table · bea593f1
      Richard Biener authored
      The following adds /* <num> */ to dbg_line_numbers so there's a chance
      to more easily look up the ID of the match.pd line number used for
      dumping when you want to debug a specific replacement.  It also cuts
      each line down to 10 entries.
      
        static int dbg_line_numbers[1267] = {
              /* 0 */ 161, 164, 173, 175, 178, 181, 183, 189, 197, 195,
              /* 10 */ 199, 201, 205, 923, 921, 2060, 2071, 2052, 2058, 2063,
      ...
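      A hypothetical re-creation of the annotated layout (not genmatch's actual
      code) makes the reverse lookup concrete: each row's comment holds the
      index of its first entry, so entry k sits in the row labelled
      10 * (k / 10), at position k % 10.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Emit ten line numbers per row, each row prefixed by a comment with the
// index of its first entry.
std::string format_dbg_line_numbers(const std::vector<int> &lines) {
  std::ostringstream out;
  out << "static int dbg_line_numbers[" << lines.size() << "] = {";
  for (std::size_t i = 0; i < lines.size(); ++i) {
    if (i % 10 == 0)
      out << "\n      /* " << i << " */ ";
    out << lines[i];
    if (i + 1 < lines.size())
      out << ((i + 1) % 10 == 0 ? "," : ", ");  // no trailing space at row end
  }
  out << "\n};\n";
  return out.str();
}
```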
      
      	* genmatch.cc (define_dump_logs): Make reverse lookup in
      	dbg_line_numbers easier by adding comments with start index
      	and cutting number of elements per line to 10.
    • testsuite: i386: Fix expected vectorization in pr105493.c · 120a3700
      Christoph Müllner authored
      
      As reported in PR117079, commit ab187858 broke the test pr105493.c.
      The test code contains two loops, where the first one is expected to be
      vectorized.  The commit that broke that vectorization was the first of
      several that enabled vectorization of both loops.
      Now that GCC can vectorize the whole function, let's adjust this test
      to expect vectorization of both loops by ensuring that we don't write
      to the helper array 'tmp'.
      
      Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
      
      	PR target/117079
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr105493.c: Fix expected vectorization
      
    • OpenMP/C++: Fix 'declare variant' for struct-returning functions [PR118486] · b67a0d6a
      Tobias Burnus authored
      To find the variant declaration, a call is constructed in
      omp_declare_variant_finalize_one, which gives here:
        TARGET_EXPR <D.3010, variant_fn ()>
      
      Extracting the function declaration from it then failed, giving the bogus
        error: could not find variant declaration
      
      Solution: Use the 2nd argument of the TARGET_EXPR and continue.
      
      	PR c++/118486
      
      gcc/cp/ChangeLog:
      
      	* decl.cc (omp_declare_variant_finalize_one): When resolving
      	the variant to use, handle variant calls with TARGET_EXPR.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/gomp/declare-variant-11.C: New test.
    • ipa: Initialize/release global obstack in process_new_functions [PR116068] · dd389c25
      Jakub Jelinek authored
      Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL);
      before running a pass list and bitmap_obstack_release (NULL); after it,
      but process_new_functions wasn't doing that, and with the new r15-130
      bitmap_alloc checking this results in an ICE.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR ipa/116068
      	* cgraphunit.cc (symbol_table::process_new_functions): Call
      	bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL)
      	around processing the functions.
      
      	* gcc.dg/graphite/pr116068.c: New test.
    • c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't convert to its rettype [PR118387] · 18f6bb98
      Jakub Jelinek authored
      Note, the PR raises another problem.
      If the B b; line is removed from the same testcase, we silently synthesize
      an operator<=> which will crash at runtime because it reaches the end of
      the function without a return statement.  That is because the standard
      says that in that case it should return
      static_cast<int>(std::strong_ordering::equal);
      but I can't find any wording which would say that if that isn't
      valid, the function is deleted.
      https://eel.is/c++draft/class.compare#class.spaceship-2.2
      seems to talk just about cases where there are some members and their
      comparison is invalid, in which case it is deleted, but here there are
      none and it follows
      https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
      So, we synthesize with tf_none, see that the static_cast is invalid,
      silently don't add the error_mark_node statement, and, as the function
      isn't deleted, we just silently emit it.
      Should the standard be amended to say that the operator should be deleted
      even if it has no elements and the static cast from
      https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
      is invalid?
      
      On Fri, Jan 10, 2025 at 12:04:53PM -0500, Jason Merrill wrote:
      > That seems pretty obviously what we want, and is what the other compilers
      > implement.
      
      This patch implements it then.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118387
      	* method.cc (build_comparison_op): Set bad if
      	std::strong_ordering::equal doesn't convert to rettype.
      
      	* g++.dg/cpp2a/spaceship-err6.C: Expect another error.
      	* g++.dg/cpp2a/spaceship-synth17.C: Likewise.
      	* g++.dg/cpp2a/spaceship-synth-neg6.C: Likewise.
      	* g++.dg/cpp2a/spaceship-synth-neg7.C: New test.
      
      	* testsuite/25_algorithms/default_template_value.cc
      	(Input::operator<=>): Use auto as return type rather than bool.
    • c++: Fix up maybe_init_list_as_array for RAW_DATA_CST [PR118124] · 64828272
      Jakub Jelinek authored
      The previous patch made me look around some more, and I found that
      maybe_init_list_as_array doesn't handle RAW_DATA_CSTs correctly either.
      While the RAW_DATA_CST is properly split during finish_compound_literal,
      the function was using CONSTRUCTOR_NELTS as the size of the arrays, which
      is wrong, since a
      RAW_DATA_CST could stand for far more initializers.
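      A simplified model of why counting CONSTRUCTOR entries is not enough: an
      entry is either a single initializer or a RAW_DATA_CST-like run standing
      for many bytes.  The ctor_entry type and count_initializers are
      hypothetical; the real fix is the count_ctor_elements function in
      call.cc, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a CONSTRUCTOR element list.
struct ctor_entry {
  bool is_raw_data;
  std::size_t raw_len;  // used only when is_raw_data is true
};

// Counting entries (as CONSTRUCTOR_NELTS does) undercounts raw-data runs;
// the array bound needs the number of initializers they stand for.
std::size_t count_initializers(const std::vector<ctor_entry> &elts) {
  std::size_t n = 0;
  for (const ctor_entry &e : elts)
    n += e.is_raw_data ? e.raw_len : 1;
  return n;
}
```

      A list with one ordinary element, a 64-byte raw run, and another
      ordinary element has 3 entries but needs an array of 66 elements.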
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118124
      	* cp-tree.h (build_array_of_n_type): Change second argument type
      	from int to unsigned HOST_WIDE_INT.
      	* tree.cc (build_array_of_n_type): Likewise.
      	* call.cc (count_ctor_elements): New function.
      	(maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS.
      	(convert_like_internal): Use length from init's type instead of
      	len when handling the maybe_init_list_as_array case.
      
      	* g++.dg/cpp0x/initlist-opt5.C: New test.
      64828272
    • c++: Fix ICEs with large initializer lists or ones including #embed [PR118124] · f263f2d5
      Jakub Jelinek authored
      The following testcases ICE due to RAW_DATA_CST not being handled where it
      should be during ck_list conversions.
      
      The last two testcases started ICEing with r15-6339, committed yesterday
      (speedup of large initializers); the first two already ICEd with r15-5958
      (#embed optimization for C++).
      
      For conversion to initializer_list<unsigned char> or char/signed char,
      we can optimize and keep the RAW_DATA_CST with an adjusted type as long
      as we report narrowing errors where needed; for other element types,
      each element is converted separately.
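The two strategies can be sketched roughly as follows. This is a hypothetical model, not the code in call.cc: it assumes the raw-data run holds byte values 0..255 (as #embed produces), so for a signed char target (and plain char where char is signed) only a narrowing check over the run is needed, while a wider target such as int gets one conversion per element. The function names are illustrative.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical model: a RAW_DATA_CST-like run of byte values 0..255.

// Byte-sized target: the run itself can be reused; we only need to find a
// value that would narrow.  Returns the index of the first byte that does
// not fit in signed char, or data.size() if every byte fits.
inline std::size_t first_narrowing_for_schar(
    const std::vector<unsigned char>& data) {
  for (std::size_t i = 0; i < data.size(); ++i)
    if (data[i] > SCHAR_MAX) return i;
  return data.size();
}

// Any other target: convert each element separately (here, to int).
inline std::vector<int> convert_each_to_int(
    const std::vector<unsigned char>& data) {
  std::vector<int> out;
  out.reserve(data.size());
  for (unsigned char b : data) out.push_back(b);
  return out;
}
```

Keeping the run intact avoids materializing one tree per byte for very large #embed payloads, which is presumably why only the byte-sized element types get the shortcut.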
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118124
      	* call.cc (convert_like_internal): Handle RAW_DATA_CST in
      	ck_list handling.  Formatting fixes.
      
      	* g++.dg/cpp/embed-15.C: New test.
      	* g++.dg/cpp/embed-16.C: New test.
      	* g++.dg/cpp0x/initlist-opt3.C: New test.
      	* g++.dg/cpp0x/initlist-opt4.C: New test.
      f263f2d5
    • RISC-V: Fix code gen for reduction with length 0 [PR118182] · 40ad10f7
      Kito Cheng authored
      `.MASK_LEN_FOLD_LEFT_PLUS` (or `mask_len_fold_left_plus_m`) expects the
      return value to be the start value even if the length is 0.
      
      However, the current code gen in the RISC-V backend does not meet that
      semantic; it produces a garbage value if the length is 0.
      
      For example, the current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64 is:
              # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
              vsetvli zero,a5,e64,m1,ta,ma
              vfmv.s.f        v2,fa5     # insn 1
              vfredosum.vs    v1,v1,v2   # insn 2
              vfmv.f.s        fa5,v1     # insn 3
      
      insn 1:
      - vfmv.s.f won't do anything if VL=0, which means v2 will contain a garbage value.
      insn 2:
      - vfredosum.vs won't do anything if VL=0, and keeps vd unchanged even with TA.
      (The V spec says: `If vl=0, no operation is performed and the destination register
       is not updated.`)
      insn 3:
      - vfmv.f.s will move the value from v1 even if VL=0, so this is safe.
      
      So how do we fix that? We need two fixes:
      
      1. insn 1: always execute it with VL=1, so that we can guarantee it
                 always works as expected.
      2. insn 2: add a new pattern that forces `vd` to use the same reg as `vs1`
                 (the start value) for all reduction patterns; then we can
                 guarantee vd[0] will contain the start value when vl=0.
      
      For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2,
      we have to add a _VL0_SAFE variant of each reduction to force `vd` to use the
      same reg as `vs1` (the start value).
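The effect of the two fixes can be checked with a toy scalar model of element 0. This is a simplified sketch, assuming only the per-instruction behavior described above; the `model_*` helpers are hypothetical and are not RVV intrinsics, and the reduction's vector operand is passed separately rather than aliased to v1 as in the real asm.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of element 0 of the registers involved (hypothetical helpers).

// vfmv.s.f: writes f into element 0 only when vl > 0.
inline void model_vfmv_s_f(double& vd0, double f, std::size_t vl) {
  if (vl > 0) vd0 = f;
}

// vfredosum.vs vd, vs2, vs1: when vl == 0 nothing happens and vd keeps its
// old value; otherwise vd[0] = vs1[0] + vs2[0] + ... + vs2[vl-1], in order.
inline void model_vfredosum(double& vd0, const double* vs2, double vs1_0,
                            std::size_t vl) {
  if (vl == 0) return;
  double acc = vs1_0;
  for (std::size_t i = 0; i < vl; ++i) acc += vs2[i];
  vd0 = acc;
}

// Old sequence: the start value is moved under the loop's VL, and vd (v1)
// is a different register from vs1 (v2).
inline double old_sequence(double start, const double* x, std::size_t vl,
                           double v1_init, double v2_init) {
  double v1 = v1_init, v2 = v2_init;  // stale register contents
  model_vfmv_s_f(v2, start, vl);      // insn 1: no-op when vl == 0
  model_vfredosum(v1, x, v2, vl);     // insn 2: no-op when vl == 0
  return v1;                          // insn 3: reads v1 regardless
}

// Fixed sequence: insn 1 runs with VL=1 and vd is tied to vs1, so element 0
// already holds the start value if the reduction does nothing.
inline double fixed_sequence(double start, const double* x, std::size_t vl,
                             double v_init) {
  double v = v_init;                  // one register used as both vs1 and vd
  model_vfmv_s_f(v, start, 1);        // insn 1: always executes
  model_vfredosum(v, x, v, vl);       // insn 2: leaves v == start if vl == 0
  return v;
}
```

With vl=0, `old_sequence` returns whatever v1 held before, while `fixed_sequence` returns the start value; for vl>0 both compute the same ordered sum. Per the V3 change notes, the real patch only needs this treatment when VL is neither a constant nor VLMAX.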
      
      Changes since V3:
      - Rename _AV to _VL0_SAFE for readability.
      - Use the non-VL0_SAFE version if VL is const or VLMAX.
      - Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX.
      - Two more testcases.
      
      gcc/ChangeLog:
      
      	PR target/118182
      	* config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust
      	argument for expand_reduction.
      	(*widen_reduc_plus_scal_<mode>): Ditto.
      	(*fold_left_widen_plus_<mode>): Ditto.
      	(*mask_len_fold_left_widen_plus_<mode>): Ditto.
      	(*cond_widen_reduc_plus_scal_<mode>): Ditto.
      	(*cond_len_widen_reduc_plus_scal_<mode>): Ditto.
      	(*cond_widen_reduc_plus_scal_<mode>): Ditto.
      	* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for
      	expand_reduction.
      	(reduc_smax_scal_<mode>): Ditto.
      	(reduc_umax_scal_<mode>): Ditto.
      	(reduc_smin_scal_<mode>): Ditto.
      	(reduc_umin_scal_<mode>): Ditto.
      	(reduc_and_scal_<mode>): Ditto.
      	(reduc_ior_scal_<mode>): Ditto.
      	(reduc_xor_scal_<mode>): Ditto.
      	(reduc_plus_scal_<mode>): Ditto.
      	(reduc_smax_scal_<mode>): Ditto.
      	(reduc_smin_scal_<mode>): Ditto.
      	(reduc_fmax_scal_<mode>): Ditto.
      	(reduc_fmin_scal_<mode>): Ditto.
      	(fold_left_plus_<mode>): Ditto.
      	(mask_len_fold_left_plus_<mode>): Ditto.
      	* config/riscv/riscv-v.cc (expand_reduction): Add one more
      	argument for reduction code for vl0-safe.
      	* config/riscv/riscv-protos.h (expand_reduction): Ditto.
      	* config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of
      	reduction.
      	(ANY_REDUC_VL0_SAFE): New.
      	(ANY_WREDUC_VL0_SAFE): Ditto.
      	(ANY_FREDUC_VL0_SAFE): Ditto.
      	(ANY_FREDUC_SUM_VL0_SAFE): Ditto.
      	(ANY_FWREDUC_SUM_VL0_SAFE): Ditto.
      	(reduc_op): Add _VL0_SAFE variant of reduction.
      	(order): Ditto.
      	* config/riscv/vector.md (@pred_<reduc_op><mode>): New.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/118182
      	* gfortran.target/riscv/rvv/pr118182.f: New.
      	* gcc.target/riscv/rvv/autovec/pr118182-1.c: New.
      	* gcc.target/riscv/rvv/autovec/pr118182-2.c: New.
      40ad10f7
    • Fix SLP scalar costing with stmts also used in externals · 21edcb95
      Richard Biener authored
      When we have the situation of an external SLP node that is
      permuted, the scalar stmts recorded in the permute node do not
      mean the scalar computation can be removed.  We are removing
      those stmts from the vectorized_scalar_stmts for this reason,
      but we fail to check this set when we cost scalar stmts.  Note that
      vectorized_scalar_stmts isn't a complete set, so also pass
      scalar_stmts_in_externs and check that.
      
      The following fixes this.
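The costing rule can be pictured with a deliberately simplified model. This sketch assumes stmts are identified by integer ids with a fixed per-stmt cost and uses a plain set in place of scalar_stmts_in_externs; `ScalarStmt` and `removable_scalar_cost` are hypothetical names, not the code in tree-vect-slp.cc. The point is only that a stmt kept alive by an external node is not a saving, so it must not be added to the scalar cost that vectorization is compared against.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical model: each scalar stmt has an id and a cost.
struct ScalarStmt {
  int id;
  int cost;
};

// Scalar cost that vectorization would actually save: stmts still needed by
// an external (permuted) SLP node stay alive, so they are skipped.
inline int removable_scalar_cost(
    const std::vector<ScalarStmt>& stmts,
    const std::unordered_set<int>& scalar_stmts_in_externs) {
  int total = 0;
  for (const ScalarStmt& s : stmts)
    if (scalar_stmts_in_externs.count(s.id) == 0)
      total += s.cost;
  return total;
}
```

Counting the preserved stmt anyway inflates the scalar side of the comparison and can make an unprofitable vectorization look profitable.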
      
      This shows in PR115777 when we avoid vectorizing the load, but
      on its own it doesn't help the PR yet.
      
      	PR tree-optimization/115777
      	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not
      	cost a scalar stmt that needs to be preserved.
      21edcb95