  1. Jan 16, 2025
    • lm32: Args with arg.named false still get passed in regs · 3184f6a5
      Keith Packard authored
      	* config/lm32/lm32.cc (lm32_function_arg): Pass unnamed
      	arguments in registers too, just like named arguments.
      3184f6a5
    • Fix an incorrect file header comment for the core2 scheduling model · efd00e3a
      Andi Kleen authored
      Committed as obvious.
      
      gcc/ChangeLog:
      
      	* config/i386/x86-tune-sched-core.cc: Fix incorrect comment.
      efd00e3a
    • Fix setting of call graph node AutoFDO count · e683c6b0
      Eugene Rozenfeld authored
      We are initializing both the call graph node count and
      the entry block count of the function with the head_count value
      from the profile.
      
      The count propagation algorithm may refine the entry block count
      and we may end up with a case where the call graph node count
      is set to zero but the entry block count is non-zero. That becomes
      a problem because we have this code in execute_fixup_cfg:
      
       profile_count num = node->count;
       profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
       bool scale = num.initialized_p () && !(num == den);
      
      Here if num is 0 but den is not 0, scale becomes true and we
      lose the counts in
      
      if (scale)
        bb->count = bb->count.apply_scale (num, den);
      
      This is what happened in the issue reported in PR116743
      (a 10% regression in MySQL HAMMERDB tests).
      3d9e6767 made an improvement in
      AutoFDO count propagation, which caused a mismatch between
      the call graph node count (zero) and the entry block count (non-zero)
      and subsequent loss of counts as described above.
      
      The fix is to update the call graph node count once we've done count propagation.
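The loss of counts can be reproduced with a toy model. The sketch below is an illustration only: `toy_count`, `toy_apply_scale` and `toy_fixup_bb_count` are hypothetical names, and a plain integer stands in for GCC's `profile_count`, so it assumes nothing about the real class beyond the `count * num / den` scaling idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for profile_count: a plain integer.  */
typedef uint64_t toy_count;

/* Toy model of count.apply_scale (num, den): count * num / den.
   Leaving the count unchanged when den is 0 is an assumption of
   this sketch, not a statement about GCC's behavior.  */
static toy_count
toy_apply_scale (toy_count count, toy_count num, toy_count den)
{
  if (den == 0)
    return count;
  return count * num / den;
}

/* Mirrors the quoted fixup step for a single basic block;
   num is assumed to be initialized.  */
toy_count
toy_fixup_bb_count (toy_count bb_count, toy_count node_count,
                    toy_count entry_count)
{
  bool scale = !(node_count == entry_count);
  if (scale)
    bb_count = toy_apply_scale (bb_count, node_count, entry_count);
  return bb_count;
}

/* A zero node count with a non-zero entry count wipes the block
   count; matching counts leave it alone.  */
void
toy_check_fixup (void)
{
  assert (toy_fixup_bb_count (500, 0, 1000) == 0);
  assert (toy_fixup_bb_count (500, 1000, 1000) == 500);
}
```

Calling `toy_check_fixup ()` exercises both cases: the mismatched pair (0 versus 1000) scales every block count to zero, which is the situation the fix avoids by keeping the two counts in sync.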
      
      Tested on x86_64-pc-linux-gnu.
      
      gcc/ChangeLog:
      	PR gcov-profile/116743
      	* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
      	and the entry block count.
      e683c6b0
    • Daily bump. · 14f337e3
      GCC Administrator authored
      14f337e3
  2. Jan 15, 2025
    • libstdc++: Fix use of internal feature test macro in test · 79d55040
      Jonathan Wakely authored
      This test should use __cpp_lib_ios_noreplace rather than the internal
      __glibcxx_ios_noreplace macro.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/27_io/ios_base/types/openmode/case_label.cc: Use
      	standard feature test macro not internal one.
      79d55040
    • libstdc++: Fix fancy pointer test for std::set · f079feec
      Jonathan Wakely authored
      The alloc_ptr.cc test for std::set tries to use C++17 features
      unconditionally, and tries to use the C++23 range members which haven't
      been implemented for std::set yet.
      
      Some of the range checks are left in place but commented out, so they
      can be added after the ranges members are implemented. Others (such as
      prepend_range) are not valid for std::set at all.
      
      Also fix uses of internal feature test macros in two other tests, which
      should use the standard __cpp_lib_xxx macros.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/23_containers/set/requirements/explicit_instantiation/alloc_ptr.cc:
      	Guard node extraction checks with feature test macro. Remove
      	calls to non-existent range members.
      	* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc:
      	Use standard macro not internal one.
      	* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc:
      	Likewise.
      f079feec
    • match: Simplify `1 >> x` into `x == 0` [PR102705] · 903ab914
      Andrew Pinski authored
      
      In this PR we have a missed optimization: we fail to notice that
      `1 >> x` and `(1 >> x) ^ 1` can't be equal. There are a few ways of
      optimizing this; the easiest and simplest is to simplify `1 >> x` into
      just `x == 0`, as those are equivalent (if we ignore out-of-range values for x).
      We already have an optimization for `(1 >> X) !=/== 0`, so the only difference
      here is that we don't need the `!=/== 0` part to do the transformation.
      
      So this removes the `(1 >> X) !=/== 0` transformation and just adds a simplified
      `1 >> x` -> `x == 0` one.
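The equivalence can be checked exhaustively for in-range shift amounts. The following is a minimal sketch, not code from the patch; `shift_form`, `compare_form` and `check_shift_forms` are names made up for this illustration.

```c
#include <assert.h>
#include <limits.h>

/* The form written in the source: only defined for 0 <= x < width of int.  */
int
shift_form (unsigned int x)
{
  return 1 >> x;
}

/* The simplified form: 1 when x is 0, otherwise 0.  */
int
compare_form (unsigned int x)
{
  return x == 0;
}

/* Compare both forms for every shift amount that is in range.  */
void
check_shift_forms (void)
{
  unsigned int width = sizeof (int) * CHAR_BIT;
  for (unsigned int x = 0; x < width; x++)
    assert (shift_form (x) == compare_form (x));
}
```

Shift amounts at or beyond the width of `int` are undefined behavior in C, which is why the loop stops short of `width`; this matches the "ignore out of range values" caveat in the commit message.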
      
      Bootstrapped and tested on x86_64-linux-gnu.
      
      	PR tree-optimization/102705
      
      gcc/ChangeLog:
      
      	* match.pd (`(1 >> X) != 0`): Remove pattern.
      	(`1 >> x`): New pattern.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/tree-ssa/pr105832-2.c: Update testcase.
      	* gcc.dg/tree-ssa/pr96669-1.c: Likewise.
      	* gcc.dg/tree-ssa/pr102705-1.c: New test.
      	* gcc.dg/tree-ssa/pr102705-2.c: New test.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
      903ab914
    • doc: cleanup trailing whitespace · c340ff20
      Sam James authored
      gcc/ChangeLog:
      
      	* doc/extend.texi: Cleanup trailing whitespace.
      c340ff20
    • doc: trivial grammar fix · d8e52444
      Sam James authored
      We say 'a constant .. expression' elsewhere. Fix the grammar.
      
      gcc/ChangeLog:
      
      	* doc/extend.texi: Add 'a' for grammar fix.
      d8e52444
    • libstdc++: Fix reversed args in unreachable assumption [PR109849] · 6f85a972
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/109849
      	* include/bits/vector.tcc (vector::_M_range_insert): Fix
      	reversed args in length calculation.
      6f85a972
    • Fortran: reject NULL as source-expr in ALLOCATE with SOURCE= or MOLD= [PR71884] · 89230999
      Harald Anlauf authored
      	PR fortran/71884
      
      gcc/fortran/ChangeLog:
      
      	* resolve.cc (resolve_allocate_expr): Reject intrinsic NULL as
      	source-expr.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/pr71884.f90: New test.
      89230999
    • c++: Handle RAW_DATA_CST in unify [PR118390] · 2619413a
      Jakub Jelinek authored
      This patch uses the count_ctor_elements function to fix up
      unify deduction of array sizes.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118390
      	* cp-tree.h (count_ctor_elements): Declare.
      	* call.cc (count_ctor_elements): No longer static.
      	* pt.cc (unify): Use count_ctor_elements instead of
      	CONSTRUCTOR_NELTS.
      
      	* g++.dg/cpp/embed-20.C: New test.
      	* g++.dg/cpp0x/pr118390.C: New test.
      2619413a
    • AArch64: Update neoverse512tvb tuning · 4ce502f3
      Wilco Dijkstra authored
      Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the
      missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
      
      gcc:
      	* config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.
      4ce502f3
    • AArch64: Add FULLY_PIPELINED_FMA to tune baseline · 2713f6bb
      Wilco Dijkstra authored
      Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
      already enabled for some cores, but benchmarking it shows it is faster on all
      modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
      
      gcc:
      	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE):
      	Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
      	* config/aarch64/tuning_models/ampere1b.h: Remove redundant
      	AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
      	* config/aarch64/tuning_models/neoversev2.h: Likewise.
      2713f6bb
    • AArch64: Deprecate -mabi=ilp32 · 625ea3c6
      Wilco Dijkstra authored
      ILP32 was originally intended to make porting to AArch64 easier.  Support was
      never merged in the Linux kernel or GLIBC, so it has been unsupported for many
      years.  There isn't a benefit in keeping unsupported features forever, so
      deprecate it now (and it could be removed in a future release).
      
      gcc:
      	* config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
      	* doc/invoke.texi: Document -mabi=ilp32 as deprecated.
      
      gcc/testsuite:
      	* gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
      	* gcc.target/aarch64/pr100518.c: Likewise.
      	* gcc.target/aarch64/pr113114.c: Likewise.
      	* gcc.target/aarch64/pr80295.c: Likewise.
      	* gcc.target/aarch64/pr94201.c: Likewise.
      	* gcc.target/aarch64/pr94577.c: Likewise.
      	* gcc.target/aarch64/sve/pr108603.c: Likewise.
      625ea3c6
    • bpf: set index entry for a VAR_DECL in CO-RE relocs · 01c37f9a
      Cupertino Miranda authored
      CO-RE accesses with non pointer struct variables will also generate a
      "0" string access within the CO-RE relocation.
      The first index within the access string has a somewhat different
      meaning than the remaining indexes.
      For i0:i1:...:in being an access index for a "struct A a" declaration, its
      semantics are represented by:
        (&a + (sizeof(struct A) * i0) + offsetof(i1:...:in))
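The address formula can be tried out on an ordinary C struct. This is a rough sketch under stated assumptions: `struct A`, `struct inner`, `core_access_addr` and `check_core_access` are hypothetical names for this illustration, and the `offsetof(i1:...:in)` term is passed in as a precomputed byte offset rather than decoded from an access string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types used only for this illustration.  */
struct inner { char tag; long value; };
struct A { int first; struct inner in; };

/* &a + sizeof (struct A) * i0 + offset selected by the remaining
   indexes (supplied here as a byte offset).  */
const char *
core_access_addr (const struct A *base, size_t i0, size_t member_offset)
{
  return (const char *) base + sizeof (struct A) * i0 + member_offset;
}

/* i0 steps over whole objects of type struct A; the remaining part
   selects the member inside the chosen object.  */
void
check_core_access (void)
{
  struct A arr[2] = { { 0 } };
  size_t off = offsetof (struct A, in) + offsetof (struct inner, value);

  assert (core_access_addr (arr, 0, off) == (const char *) &arr[0].in.value);
  assert (core_access_addr (arr, 1, off) == (const char *) &arr[1].in.value);
}
```

With `i0` equal to 0, as in the "0" string access described above, the result is simply the member's address inside the variable itself.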
      
      gcc/ChangeLog:
      	* config/bpf/core-builtins.cc (compute_field_expr): Change
      	VAR_DECL outcome in switch case.
      
      gcc/testsuite/ChangeLog:
      	* gcc.target/bpf/core-builtin-1.c: Correct test.
      	* gcc.target/bpf/core-builtin-2.c: Correct test.
      	* gcc.target/bpf/core-builtin-exprlist-1.c: Correct test.
      01c37f9a
    • bpf: calls do not promote attr access_index on lhs · 42786ccf
      Cupertino Miranda authored
      When traversing gimple to introduce CO-RE relocation entries to
      expressions that are accesses to types attributed with
      preserve_access_index, the access is likely to be split into multiple
      gimple statements.
      In order to keep doing the proper CO-RE conversion, we need to mark
      the LHS tree nodes of gimple expressions as explicit CO-RE accesses,
      such that the gimple traverser will further convert the sub-expressions.
      
      This patch makes sure that this LHS marking does not happen when the
      gimple statement is a function call, in which case it is no longer
      expected to keep generating CO-RE accesses for the remainder of the
      expression.
      
      gcc/ChangeLog:
      
      	* config/bpf/core-builtins.cc
      	(make_gimple_core_safe_access_index): Fix in condition.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/bpf/core-attr-calls.c: New test.
      42786ccf
    • bpf: make sure CO-RE relocs are typed with struct BTF_KIND_STRUCT · d30def00
      Cupertino Miranda authored
      Based on observation within bpf-next selftests and comparison of GCC-
      and clang-compiled code, the BPF loader expects all CO-RE relocations to
      point to BTF non-const and non-volatile type nodes.
      
      gcc/ChangeLog:
      
      	* btfout.cc (get_btf_kind): Remove static from function definition.
      	* config/bpf/btfext-out.cc (bpf_code_reloc_add): Check if CO-RE type
      	is not a const or volatile.
      	* ctfc.h (btf_dtd_kind): Add prototype for function.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/bpf/core-attr-const.c: New test.
      d30def00
    • c++: Implement mangling of RAW_DATA_CST [PR118278] · 8d9d5834
      Jakub Jelinek authored
      As the following testcases show (mangle80.C only after reversion of the
      temporary reversion of C++ large array speedup commit), RAW_DATA_CST can
      be seen during mangling of some templates and we ICE because
      the mangler doesn't handle it.
      
      The following patch handles it and mangles it the same as a sequence of
      INTEGER_CSTs that were used previously instead.
      The only slight complication is that if ce->value is the last nonzero
      element, we need to skip the zeros at the end of RAW_DATA_CST.
      
      2025-01-03  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118278
      	* mangle.cc (write_expression): Handle RAW_DATA_CST.
      
      	* g++.dg/abi/mangle80.C: New test.
      	* g++.dg/cpp/embed-19.C: New test.
      8d9d5834
    • c++: handle decltype in nested-name-spec printing [PR118139] · 1bc474f6
      Marek Polacek authored
      
      Compiling this test, we emit:
      
        error: 'static void CW<T>::operator=(int) requires requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type x) {x;}' must be a non-static member function
      
      where the DECLTYPE_TYPE isn't printed properly.  This patch fixes that
      to print:
      
      error: 'static void CW<T>::operator=(int) requires requires(typename decltype(T())::type x) {x;}' must be a non-static member function
      
      	PR c++/118139
      
      gcc/cp/ChangeLog:
      
      	* cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle
      	a computed-type-specifier.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/diagnostic/decltype1.C: New test.
      
      Reviewed-by: Jason Merrill <jason@redhat.com>
      1bc474f6
    • libstdc++: Fix comments in test that reference wrong subclause of C++11 · 9cc31b4e
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* testsuite/28_regex/traits/char/transform_primary.cc: Fix
      	subclause numbering in references to the standard.
      9cc31b4e
    • middle-end: Fix incorrect type replacement in operands_equals [PR118472] · 25eb892a
      Tamar Christina authored
      In g:3c32575e I made a mistake and incorrectly
      replaced the type of the arguments of an expression with the type of the
      expression.  This is of course wrong.
      
      This reverts that change and I have also double checked the other replacements
      and they are fine.
      
      gcc/ChangeLog:
      
      	PR middle-end/118472
      	* fold-const.cc (operand_compare::operand_equal_p): Fix incorrect
      	replacement.
      
      gcc/testsuite/ChangeLog:
      
      	PR middle-end/118472
      	* gcc.dg/pr118472.c: New test.
      25eb892a
    • Annotate dbg_line_numbers table · bea593f1
      Richard Biener authored
      The following adds /* <num> */ to dbg_line_numbers so there's a chance
      to more easily look up the ID of the match.pd line number used for
      dumping when you want to debug a specific replacement.  It also cuts
      the lines down to 10 entries.
      
        static int dbg_line_numbers[1267] = {
              /* 0 */ 161, 164, 173, 175, 178, 181, 183, 189, 197, 195,
              /* 10 */ 199, 201, 205, 923, 921, 2060, 2071, 2052, 2058, 2063,
      ...
      
      	* genmatch.cc (define_dump_logs): Make reverse lookup in
      	dbg_line_numbers easier by adding comments with start index
      	and cutting number of elements per line to 10.
      bea593f1
    • testsuite: i386: Fix expected vectorization in pr105493.c · 120a3700
      Christoph Müllner authored
      
      As reported in PR117079, commit ab187858 broke the test pr105493.c.
      The test code contains two loops, where the first one is expected to be
      vectorized.  The commit that broke that vectorization was the first of
      several that enabled vectorization of both loops.
      Now that GCC can vectorize the whole function, let's adjust this test
      to expect vectorization of both loops by ensuring that we don't write
      to the helper-array 'tmp'.
      
      Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
      
      	PR target/117079
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr105493.c: Fix expected vectorization
      
      Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
      120a3700
    • OpenMP/C++: Fix 'declare variant' for struct-returning functions [PR118486] · b67a0d6a
      Tobias Burnus authored
      To find the variant declaration, a call is constructed in
      omp_declare_variant_finalize_one, which gives here:
        TARGET_EXPR <D.3010, variant_fn ()>
      
      Extracting now the function declaration failed and gave the bogus
        error: could not find variant declaration
      
      Solution: Use the 2nd argument of the TARGET_EXPR and continue.
      
      	PR c++/118486
      
      gcc/cp/ChangeLog:
      
      	* decl.cc (omp_declare_variant_finalize_one): When resolving
      	the variant to use, handle variant calls with TARGET_EXPR.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/gomp/declare-variant-11.C: New test.
      b67a0d6a
    • ipa: Initialize/release global obstack in process_new_functions [PR116068] · dd389c25
      Jakub Jelinek authored
      Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL);
      before running a pass list and bitmap_obstack_release (NULL); after that,
      while process_new_functions wasn't doing that and with the new r15-130
      bitmap_alloc checking that results in ICE.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR ipa/116068
      	* cgraphunit.cc (symbol_table::process_new_functions): Call
      	bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL)
      	around processing the functions.
      
      	* gcc.dg/graphite/pr116068.c: New test.
      dd389c25
    • c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't... · 18f6bb98
      Jakub Jelinek authored
      c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't convert to its rettype [PR118387]
      
      Note, the PR raises another problem.
      If on the same testcase the B b; line is removed, we silently synthesize
      operator<=> which will crash at runtime due to returning without a return
      statement.  That is because the standard says that in that case
      it should return static_cast<int>(std::strong_ordering::equal);
      but I can't find anywhere wording which would say that if that isn't
      valid, the function is deleted.
      https://eel.is/c++draft/class.compare#class.spaceship-2.2
      seems to talk just about cases where there are some members and their
      comparison is invalid it is deleted, but here there are none and it
      follows
      https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
      So, we synthesize with tf_none, see the static_cast is invalid, don't
      add error_mark_node statement silently, but as the function isn't deleted,
      we just silently emit it.
      Should the standard be amended to say that the operator should be deleted
      even if it has no elements and the static cast from
      https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
      
      On Fri, Jan 10, 2025 at 12:04:53PM -0500, Jason Merrill wrote:
      > That seems pretty obviously what we want, and is what the other compilers
      > implement.
      
      This patch implements it then.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118387
      	* method.cc (build_comparison_op): Set bad if
      	std::strong_ordering::equal doesn't convert to rettype.
      
      	* g++.dg/cpp2a/spaceship-err6.C: Expect another error.
      	* g++.dg/cpp2a/spaceship-synth17.C: Likewise.
      	* g++.dg/cpp2a/spaceship-synth-neg6.C: Likewise.
      	* g++.dg/cpp2a/spaceship-synth-neg7.C: New test.
      
      	* testsuite/25_algorithms/default_template_value.cc
      	(Input::operator<=>): Use auto as return type rather than bool.
      18f6bb98
    • c++: Fix up maybe_init_list_as_array for RAW_DATA_CST [PR118124] · 64828272
      Jakub Jelinek authored
      The previous patch made me look around some more and I found
      maybe_init_list_as_array doesn't handle RAW_DATA_CSTs correctly either,
      while the RAW_DATA_CST is properly split during finish_compound_literal,
      it was using CONSTRUCTOR_NELTS as the size of the arrays, which is wrong,
      RAW_DATA_CST could stand for far more initializers.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118124
      	* cp-tree.h (build_array_of_n_type): Change second argument type
      	from int to unsigned HOST_WIDE_INT.
      	* tree.cc (build_array_of_n_type): Likewise.
      	* call.cc (count_ctor_elements): New function.
      	(maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS.
      	(convert_like_internal): Use length from init's type instead of
      	len when handling the maybe_init_list_as_array case.
      
      	* g++.dg/cpp0x/initlist-opt5.C: New test.
      64828272
    • c++: Fix ICEs with large initializer lists or ones including #embed [PR118124] · f263f2d5
      Jakub Jelinek authored
      The following testcases ICE due to RAW_DATA_CST not being handled where it
      should be during ck_list conversions.
      
      The last 2 testcases started ICEing with r15-6339 committed yesterday
      (speedup of large initializers), the first two already with r15-5958
      (#embed optimization for C++).
      
      For conversion to initializer_list<unsigned char> or char/signed char
      we can optimize and keep RAW_DATA_CST with adjusted type if we report
      narrowing errors if needed, for others this converts each element
      separately.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118124
      	* call.cc (convert_like_internal): Handle RAW_DATA_CST in
      	ck_list handling.  Formatting fixes.
      
      	* g++.dg/cpp/embed-15.C: New test.
      	* g++.dg/cpp/embed-16.C: New test.
      	* g++.dg/cpp0x/initlist-opt3.C: New test.
      	* g++.dg/cpp0x/initlist-opt4.C: New test.
      f263f2d5
    • RISC-V: Fix code gen for reduction with length 0 [PR118182] · 40ad10f7
      Kito Cheng authored
      `.MASK_LEN_FOLD_LEFT_PLUS` (or `mask_len_fold_left_plus_m`) expects the
      return value to be the start value even if the length is 0.
      
      However, the current code generation in the RISC-V backend does not meet
      that semantic; it produces a garbage value if the length is 0.
      
      Take the current code generation for MASK_LEN_FOLD_LEFT_PLUS with f64 as an example:
              # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
              vsetvli zero,a5,e64,m1,ta,ma
              vfmv.s.f        v2,fa5     # insn 1
              vfredosum.vs    v1,v1,v2   # insn 2
              vfmv.f.s        fa5,v1     # insn 3
      
      insn 1:
      - vfmv.s.f does nothing if VL=0, which means v2 will contain a garbage value.
      insn 2:
      - vfredosum.vs does nothing if VL=0 and keeps vd unchanged, even with TA.
      (The v-spec says: `If vl=0, no operation is performed and the destination
       register is not updated.`)
      insn 3:
      - vfmv.f.s moves the value from v1 even if VL=0, so this is safe.
      
      So how do we fix that? We need two fixes:
      
      1. insn 1: must always execute with VL=1, so that we can guarantee it
                 always works as expected.
      2. insn 2: add a new pattern to force `vd` to use the same reg as `vs1`
                 (start value) for all reduction patterns, so that we can
                 guarantee vd[0] contains the start value when vl=0.
      
      For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2,
      we have to add a _VL0_SAFE variant of each reduction to force `vd` to use the
      same reg as `vs1` (start value).
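The semantic the backend has to honor can be written as a scalar reference model. The sketch below is only an illustration: `fold_left_plus_model` and `check_fold_left_plus` are hypothetical names, and the mask operand is left out on the assumption that it is all-true, as in the `{ -1, ... }` operand shown in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference model of an in-order (fold-left) sum with an
   explicit length.  With len == 0 the loop body never runs, so the
   start value is returned unchanged.  */
double
fold_left_plus_model (double start, const double *v, size_t len)
{
  double acc = start;
  for (size_t i = 0; i < len; i++)
    acc += v[i];
  return acc;
}

/* The values are chosen so that every partial sum is exactly
   representable in binary floating point.  */
void
check_fold_left_plus (void)
{
  const double v[] = { 1.0, 2.0, 4.0 };

  assert (fold_left_plus_model (0.5, v, 3) == 7.5);
  assert (fold_left_plus_model (0.5, v, 0) == 0.5);
}
```

The second assertion is the case the patch is about: a length of 0 must yield the start value, not whatever happened to be in the destination register.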
      
      Changes since V3:
      - Rename _AV to _VL0_SAFE for readability.
      - Use non-VL0_SAFE version if VL is const or VLMAX.
      - Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX.
      - Two more testcases.
      
      gcc/ChangeLog:
      
      	PR target/118182
      	* config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust
      	argument for expand_reduction.
      	(*widen_reduc_plus_scal_<mode>): Ditto.
      	(*fold_left_widen_plus_<mode>): Ditto.
      	(*mask_len_fold_left_widen_plus_<mode>): Ditto.
      	(*cond_widen_reduc_plus_scal_<mode>): Ditto.
      	(*cond_len_widen_reduc_plus_scal_<mode>): Ditto.
      	(*cond_widen_reduc_plus_scal_<mode>): Ditto.
      	* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for
      	expand_reduction.
      	(reduc_smax_scal_<mode>): Ditto.
      	(reduc_umax_scal_<mode>): Ditto.
      	(reduc_smin_scal_<mode>): Ditto.
      	(reduc_umin_scal_<mode>): Ditto.
      	(reduc_and_scal_<mode>): Ditto.
      	(reduc_ior_scal_<mode>): Ditto.
      	(reduc_xor_scal_<mode>): Ditto.
      	(reduc_plus_scal_<mode>): Ditto.
      	(reduc_smax_scal_<mode>): Ditto.
      	(reduc_smin_scal_<mode>): Ditto.
      	(reduc_fmax_scal_<mode>): Ditto.
      	(reduc_fmin_scal_<mode>): Ditto.
      	(fold_left_plus_<mode>): Ditto.
      	(mask_len_fold_left_plus_<mode>): Ditto.
      	* config/riscv/riscv-v.cc (expand_reduction): Add one more
      	argument for reduction code for vl0-safe.
      	* config/riscv/riscv-protos.h (expand_reduction): Ditto.
      	* config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of
      	reduction.
      	(ANY_REDUC_VL0_SAFE): New.
      	(ANY_WREDUC_VL0_SAFE): Ditto.
      	(ANY_FREDUC_VL0_SAFE): Ditto.
      	(ANY_FREDUC_SUM_VL0_SAFE): Ditto.
      	(ANY_FWREDUC_SUM_VL0_SAFE): Ditto.
      	(reduc_op): Add _VL0_SAFE variant of reduction.
      	(order): Ditto.
      	* config/riscv/vector.md (@pred_<reduc_op><mode>): New.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/118182
      	* gfortran.target/riscv/rvv/pr118182.f: New.
      	* gcc.target/riscv/rvv/autovec/pr118182-1.c: New.
      	* gcc.target/riscv/rvv/autovec/pr118182-2.c: New.
      40ad10f7
    • Fix SLP scalar costing with stmts also used in externals · 21edcb95
      Richard Biener authored
      When we have the situation of an external SLP node that is
      permuted the scalar stmts recorded in the permute node do not
      mean the scalar computation can be removed.  We are removing
      those stmts from the vectorized_scalar_stmts for this reason
      but we fail to check this set when we cost scalar stmts.  Note
      vectorized_scalar_stmts isn't a complete set so also pass
      scalar_stmts_in_externs and check that.
      
      The following fixes this.
      
      This shows in PR115777 when we avoid vectorizing the load, but
      on its own doesn't help the PR yet.
      
      	PR tree-optimization/115777
      	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not
      	cost a scalar stmt that needs to be preserved.
      21edcb95
    • lto: Remove link() to fix build with MinGW [PR118238] · ed123311
      Michal Jires authored
      I used link() to create cheap copies of Incremental LTO cache contents
      to prevent their deletion once linking is finished.
      This is unnecessary, since output_files are deleted in our lto-plugin
      and not in the linker itself.
      
      Bootstrapped/regtested on x86_64-linux.
      lto-wrapper now builds on MinGW again, though so far I have not set up
      MinGW to be able to do a full bootstrap.
      Ok for trunk?
      
      	PR lto/118238
      
      gcc/ChangeLog:
      
      	* lto-wrapper.cc (run_gcc): Remove link() copying.
      
      lto-plugin/ChangeLog:
      
      	* lto-plugin.c (cleanup_handler):
      	Keep output_files when using Incremental LTO.
      	(onload): Detect Incremental LTO.
      ed123311
    • [RISC-V][PR target/118170] Add HF div/sqrt reservation · d6f1961e
      Anton Blanchard authored
      
      Clearly an oversight in the generic-ooo model caught by the checking code.  I
      should have realized it was generic-ooo as we don't have a pipeline description
      for the tenstorrent design yet, just the costing model.
      
      The patch was extracted from the BZ which indicated Anton was the author, so I
      kept that.  I'm listed as co-author just in case someone wants to complain
      about the testcase in the future.  I didn't do any notable lifting here.
      
      Thanks Peter and Anton!
      
      	PR target/118170
      gcc/
      	* config/riscv/generic-ooo.md (generic_ooo_float_div_half): New
      	reservation.
      
      gcc/testsuite
      	* gcc.target/riscv/pr118170.c: New test.
      
      Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
      d6f1961e
    • [PR rtl-optimization/109592] Simplify nested shifts · cab2e123
      Richard Sandiford authored
      > The BZ in question is a failure to recognize a pair of shifts as a sign
      > extension.
      >
      > I originally thought simplify-rtx would be the right framework to
      > address this problem, but fwprop is actually better.  We can write the
      > recognizer much simpler in that framework.
      >
      > fwprop already simplifies nested shifts/extensions to the desired RTL,
      > but it's not considered profitable and we throw away the good work done
      > by fwprop & simplifiers.
      >
      > It's hard to see a scenario where nested shifts or nested extensions
      > that simplify down to a single sign/zero extension isn't a profitable
      > transformation.  So when fwprop has nested shifts/extensions that
      > simplifies to an extension, we consider it profitable.
      >
      > This allow us to simplify the testcase on rv64 with ZBB enabled from a
      > pair of shifts to a single byte or half-word sign extension.
      
      Hmm.  So just to summarise something that was discussed in the PR
      comments, this is a case where combine's expand_compound_operation/
      make_compound_operation wrangler hurts us, because the process isn't
      idempotent, and combine produces two complex instructions:
      
      (insn 6 3 7 2 (set (reg:DI 137 [ _3 ])
              (ashift:DI (reg:DI 139 [ x ])
                  (const_int 24 [0x18]))) "foo.c":2:20 305 {ashldi3}
           (expr_list:REG_DEAD (reg:DI 139 [ x ])
              (nil)))
      (insn 12 7 13 2 (set (reg/i:DI 10 a0)
              (sign_extend:DI (ashiftrt:SI (subreg:SI (reg:DI 137 [ _3 ]) 0)
                      (const_int 24 [0x18])))) "foo.c":2:27 321 {ashrsi3_extend}
           (expr_list:REG_DEAD (reg:DI 137 [ _3 ])
              (nil)))
      
      given two simple instructions:
      
      (insn 6 3 7 2 (set (reg:SI 137 [ _3 ])
              (sign_extend:SI (subreg:QI (reg/v:DI 136 [ x ]) 0))) "foo.c":2:20 533 {*extendqisi2_bitmanip}
           (expr_list:REG_DEAD (reg/v:DI 136 [ x ])
              (nil)))
      (insn 7 6 12 2 (set (reg:DI 138 [ _3 ])
              (sign_extend:DI (reg:SI 137 [ _3 ]))) "foo.c":2:20 discrim 1 133 {*extendsidi2_internal}
           (expr_list:REG_DEAD (reg:SI 137 [ _3 ])
              (nil)))
      
      If I run with -fdisable-rtl-combine then late_combine1 already does the
      expected transformation.
      
      Although it would be nice to fix combine, that might be difficult.
      If we treat combine as immutable then the options are:
      
      (1) Teach simplify-rtx to simplify combine's output into a single sign_extend.
      
      (2) Allow fwprop1 to get in first, before combine has a chance to mess
          things up.
      
      The patch goes for (2).
      
      Is that a fair summary?
      
      Playing devil's advocate, I suppose one advantage of (1) is that it
      would allow the optimisation even if the original rtl looked like
      combine's output.  And fwprop1 doesn't distinguish between cases in
      which the source instruction disappears from cases in which the source
      instruction is kept.  Thus we could transform:
      
        (set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
        (set (reg:DI R3) (sign_extend:DI (reg:SI R2)))
      
      into:
      
        (set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
        (set (reg:DI R3) (sign_extend:DI (reg:QI R1)))
      
      which increases the register pressure between the two instructions
      (since R2 and R1 are both now live).  In general, there could be
      quite a gap between the two instructions.
      
      On the other hand, even in that case, fwprop1 would be parallelising
      the extensions.  And since we're talking about unary operations,
      even two-address targets would allow R1 to be extended without
      tying the source and destination.
      
      Also, it seems relatively unlikely that expand would produce code
      that looks like combine's, since the gimple optimisers should have
      simplified it into conversions.
      
      So initially I was going to agree that it's worth trying in fwprop.  But...
      
      [ commentary on Jeff's original approach dropped. ]
      
      So it seems like it's a bit of a mess :-(
      
      If we do try to fix combine, I think something like the attached
      would fit within the current scheme.  It is a pure shift-for-shift
      transformation, avoiding any extensions.
      
      Will think more about it, but wanted to get the above stream of
      consciousness out before I finish for the day :-)
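
      For readers less familiar with the idiom under discussion, the
      following is a minimal C++ sketch of the source-level pattern: a
      left/right shift pair that is equivalent to a single byte sign
      extension.  The function names are hypothetical and this is not the
      testcase from the PR.  It assumes a 32-bit int and GCC's documented
      behaviour for the implementation-defined conversions and for right
      shifts of negative values.

```cpp
#include <cstdint>

// Sign-extend the low byte of x with a pair of shifts.  The left shift
// is done on unsigned to avoid signed-overflow issues; the right shift
// of a negative int is arithmetic on GCC.  Assumes 32-bit int.
constexpr int sext_byte_shifts(int x) {
    return static_cast<int>(static_cast<unsigned>(x) << 24) >> 24;
}

// The same result expressed as a single narrowing conversion, which is
// the form a target with a byte sign-extension instruction can match
// directly.
constexpr int sext_byte_cast(int x) {
    return static_cast<std::int8_t>(x);
}
```

      Both functions return the same value for every int input under the
      stated assumptions, which is why collapsing the shift pair into one
      extension preserves behaviour.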
      
      
      
      	PR rtl-optimization/109592
      gcc/
      	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
      	Simplify nested shifts with subregs.
      
      gcc/testsuite
      	* gcc.target/riscv/pr109592.c: New test.
      	* gcc.target/riscv/sign-extend-rshift.c: Adjust expected output.
      
      Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
      cab2e123
    • GCC Administrator's avatar
      Daily bump. · 3b3b3f88
      GCC Administrator authored
      3b3b3f88
  3. Jan 14, 2025
    • anetczuk's avatar
      c++: dump-lang-raw with obj_type_ref fields · 6e0b048f
      anetczuk authored
      Raw dump of lang tree was missing information about virtual method call.
      The information is provided in "tok" field of obj_type_ref.
      
      gcc/ChangeLog:
      
      	* tree-dump.cc (dequeue_and_dump): Handle OBJ_TYPE_REF.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/diagnostic/lang-dump-1.C: New test.
      6e0b048f
    • Iain Buclaw's avatar
      d: Merge upstream dmd, druntime d6f693b46a, phobos 336bed6d8. · c8894b68
      Iain Buclaw authored
      D front-end changes:
      
      	- Import latest fixes from dmd v2.110.0-rc.1.
      
      D runtime changes:
      
      	- Import latest fixes from druntime v2.110.0-rc.1.
      
      Phobos changes:
      
      	- Import latest fixes from phobos v2.110.0-rc.1.
      
      Included in the merge are fixes for the following PRs:
      
      	PR d/118438
      	PR d/118448
      	PR d/118449
      
      gcc/d/ChangeLog:
      
      	* dmd/MERGE: Merge upstream dmd d6f693b46a.
      	* d-incpath.cc (add_import_paths): Update for new front-end interface.
      
      libphobos/ChangeLog:
      
      	* libdruntime/MERGE: Merge upstream druntime d6f693b46a.
      	* src/MERGE: Merge upstream phobos 336bed6d8.
      	* testsuite/libphobos.init_fini/custom_gc.d: Adjust test.
      c8894b68
    • Alexandre Oliva's avatar
      [ifcombine] robustify decode_field_reference · 5006b9d8
      Alexandre Oliva authored
      Arrange for decode_field_reference to use local variables throughout,
      to modify the out parms only when we're about to return non-NULL, and
      to drop the unused case of NULL pand_mask, that had a latent failure
      to detect signbit masking.
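
      The general pattern described above (work on locals, write the out
      parameters only on the success path) can be sketched as follows.
      This is an illustration only, not the code in gimple-fold.cc:
      FieldInfo, decode_field and decode_or are hypothetical names, and
      the "field" here is just a contiguous run of set bits in a 32-bit
      mask.  It needs C++14 or later for the constexpr loops.

```cpp
#include <cstdint>

// Hypothetical description of a bit-field: start position and width.
struct FieldInfo {
    int bitpos;
    int bitsize;
};

// Decode a contiguous run of set bits in MASK.  All intermediate state
// lives in locals; *OUT is written only when we are about to return
// true, so a failed call leaves the caller's object untouched.
constexpr bool decode_field(std::uint32_t mask, FieldInfo *out) {
    if (mask == 0)
        return false;
    int pos = 0;
    while (((mask >> pos) & 1u) == 0)
        ++pos;
    int size = 0;
    while (pos + size < 32 && ((mask >> (pos + size)) & 1u) != 0)
        ++size;
    // Any set bit above the run means the mask is not contiguous.
    if (pos + size < 32 && (mask >> (pos + size)) != 0)
        return false;
    out->bitpos = pos;
    out->bitsize = size;
    return true;
}

// Small helper for demonstration: returns FALLBACK unchanged when
// decoding fails, which is only safe because of the rule above.
constexpr FieldInfo decode_or(std::uint32_t mask, FieldInfo fallback) {
    FieldInfo result = fallback;
    decode_field(mask, &result);
    return result;
}
```

      With this shape a caller can test the return value alone and never
      observes half-updated out parameters.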
      
      
      for  gcc/ChangeLog
      
      	* gimple-fold.cc (decode_field_reference): Robustify to set
      	out parms only when returning non-NULL.
      	(fold_truth_andor_for_ifcombine): Bail if
      	decode_field_reference returns NULL.  Add complementary assert
      	on r_const's not being set when l_const isn't.
      5006b9d8
    • Marek Polacek's avatar
      c++: re-enable NSDMI CONSTRUCTOR folding [PR118355] · e939005c
      Marek Polacek authored
      
      In c++/102990 we had a problem where massage_init_elt got {},
      digest_nsdmi_init turned that {} into { .value = (int) 1.0e+0 },
      and we crashed in the call to fold_non_dependent_init because
      a FIX_TRUNC_EXPR/FLOAT_EXPR got into tsubst*.  So we avoided
      calling fold_non_dependent_init for a CONSTRUCTOR.
      
      But that broke the following test, where we no longer fold the
      CONST_DECL in
        { .type = ZERO }
      to
        { .type = 0 }
      and then process_init_constructor_array does:
      
                  if (next != error_mark_node
                      && (initializer_constant_valid_p (next, TREE_TYPE (next))
                          != null_pointer_node))
                    {
                      /* Use VEC_INIT_EXPR for non-constant initialization of
                         trailing elements with no explicit initializers.  */
                      picflags |= PICFLAG_VEC_INIT;
      
      because { .type = ZERO } isn't initializer_constant_valid_p.  Then we
      create a VEC_INIT_EXPR and say we can't convert the argument.
      
      So we have to fold the elements of the CONSTRUCTOR.  We just can't
      instantiate the elements in a template.
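
      As a rough illustration of the kind of source involved, here is a
      minimal sketch of a class whose NSDMI names an enumerator, used as
      the element type of an array with trailing elements that have no
      explicit initializers.  It is not the testcase added by this commit;
      Kind, Item and Table are made-up names, and whether this exact
      snippet triggered the regression is not something the message
      states.  It assumes C++14 or later, where a class with default
      member initializers is still an aggregate.

```cpp
// Hypothetical enum; ZERO plays the role of the CONST_DECL that has to
// be folded to 0 in the initializer.
enum Kind { ZERO, ONE };

struct Item {
    Kind type = ZERO;  // NSDMI referring to an enumerator
    int value = 0;
};

struct Table {
    // Only the first element is initialized explicitly.  The trailing
    // two elements are built from Item's NSDMIs, i.e. the compiler
    // starts from {} and fills in the default member initializers.
    Item items[3] = { { ONE, 7 } };
};
```

      For the trailing elements the front end effectively has to produce
      a constant initializer with type ZERO folded to 0.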
      
      This also fixes c++/118047.
      
      	PR c++/118047
      	PR c++/118355
      
      gcc/cp/ChangeLog:
      
      	* typeck2.cc (massage_init_elt): Call fold_non_dependent_init
      	unless for a CONSTRUCTOR in a template.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp0x/nsdmi-list10.C: New test.
      	* g++.dg/cpp0x/nsdmi-list9.C: New test.
      
      Reviewed-by: Jason Merrill <jason@redhat.com>
      e939005c
    • Sandra Loosemore's avatar
      OpenMP: Remove dead code from declare variant reimplementation · d27db303
      Sandra Loosemore authored
      After reimplementing late resolution of "declare variant", the
      declare_variant_alt and calls_declare_variant_alt flags on struct
      cgraph_node are no longer used by anything.  For the purposes of
      marking functions that need late resolution, the
      has_omp_variant_constructs flag has replaced
      calls_declare_variant_alt.
      
      Likewise struct omp_declare_variant_entry, struct
      omp_declare_variant_base_entry, and the hash tables used to store
      these structures are no longer needed, since the information needed for
      late resolution is now stored in the gomp_variant_construct nodes.
      
      In addition, some obsolete code that was temporarily ifdef'ed out
      instead of deleted in order to produce a more readable patch for the
      previous installment of this series is now removed entirely.
      
      There are no functional changes in this patch, just removing dead code.
      
      gcc/ChangeLog
      	* cgraph.cc (symbol_table::create_edge): Don't set
      	calls_declare_variant_alt in the caller.
      	* cgraph.h (struct cgraph_node): Remove declare_variant_alt
      	and calls_declare_variant_alt flags.
      	* cgraphclones.cc (cgraph_node::create_clone): Don't copy
      	calls_declare_variant_alt bit.
      	* gimplify.cc: Remove previously #ifdef-ed out code.
      	* ipa-free-lang-data.cc (free_lang_data_in_decl): Adjust code
      	referencing declare_variant_alt bit.
      	* ipa.cc (symbol_table::remove_unreachable_nodes): Likewise.
      	* lto-cgraph.cc (lto_output_node): Remove references to deleted
      	bits.
      	(output_refs): Adjust code referencing declare_variant_alt bit.
      	(input_overwrite_node): Remove references to deleted bits.
      	(input_refs): Adjust code referencing declare_variant_alt bit.
      	* lto-streamer-out.cc (lto_output): Likewise.
      	* lto-streamer.h (omp_lto_output_declare_variant_alt): Delete.
      	(omp_lto_input_declare_variant_alt): Delete.
      	* omp-expand.cc (expand_omp_target): Use has_omp_variant_constructs
      	bit to trigger pass_omp_device_lower instead of
      	calls_declare_variant_alt.
      	* omp-general.cc (struct omp_declare_variant_entry): Delete.
      	(struct omp_declare_variant_base_entry): Delete.
      	(struct omp_declare_variant_hasher): Delete.
      	(omp_declare_variant_hasher::hash): Delete.
      	(omp_declare_variant_hasher::equal): Delete.
      	(omp_declare_variants): Delete.
      	(omp_declare_variant_alt_hasher): Delete.
      	(omp_declare_variant_alt_hasher::hash): Delete.
      	(omp_declare_variant_alt_hasher::equal): Delete.
      	(omp_declare_variant_alt): Delete.
      	(omp_lto_output_declare_variant_alt): Delete.
      	(omp_lto_input_declare_variant_alt): Delete.
      	(includes): Delete unnecessary include of gt-omp-general.h.
      	* omp-offload.cc (execute_omp_device_lower): Remove references
      	to deleted bit.
      	(pass_omp_device_lower::gate): Likewise.
      	* omp-simd-clone.cc (simd_clone_create): Likewise.
      	* passes.cc (ipa_write_summaries): Likewise.
      	* symtab.cc (symtab_node::get_partitioning_class): Likewise.
      	* tree-inline.cc (expand_call_inline): Likewise.
      	(tree_function_versioning): Likewise.
      
      gcc/lto/ChangeLog
      	* lto-partition.cc (lto_balanced_map): Adjust code referencing
      	deleted declare_variant_alt bit.
      d27db303