- Jan 16, 2025
-
-
Jakub Jelinek authored
When we have return somefn (whatever); where somefn is normally tail callable and IPA-VRP determines somefn returns a singleton range, VRP just changes the IL to somefn (whatever); return 42; (or whatever the value in that range is). The introduction of IPA-VRP return value tracking then effectively regresses the tail call optimization. This is even more important if the call is [[gnu::musttail]]. So, the following patch queries IPA-VRP whether a function returns a singleton range and if so and the value returned is identical to that, marks the call as [tail call] anyway. If expansion decides it can't use the tail call, we'll still expand the return 42; or similar statement, and if it decides it can use the tail call, that part will be ignored and we'll emit a normal tail call. The reason it works is that the expand pass relies on the tailc pass to do its job properly. E.g. when we have <bb 2> [local count: 1073741824]: foo (x_2(D)); baz (&v); v ={v} {CLOBBER(eos)}; bar (x_2(D)); [tail call] return 1; when expand_gimple_basic_block handles the bar (x_2(D)); call, it uses if (call_stmt && gimple_call_tail_p (call_stmt)) { bool can_fallthru; new_bb = expand_gimple_tailcall (bb, call_stmt, &can_fallthru); if (new_bb) { if (can_fallthru) bb = new_bb; else { currently_expanding_gimple_stmt = NULL; return new_bb; } } } As it is actually tail callable during expansion of the bar (x_2(D)); call stmt, expand_gimple_tailcall returns non-NULL and sets can_fallthru to false, plus emits ;; bar (x_2(D)); [tail call] (insn 11 10 12 2 (set (reg:SI 5 di) (reg/v:SI 99 [ x ])) "pr118430.c":35:10 -1 (nil)) (call_insn/j 12 11 13 2 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:DI ("bar") [flags 0x3] <function_decl 0x7fb39020bd00 bar>) [0 bar S1 A8]) (const_int 0 [0]))) "pr118430.c":35:10 -1 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x3] <function_decl 0x7fb39020bd00 bar>) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil))) (expr_list:SI (use (reg:SI 5 di)) (nil))) (barrier 13 12 0)
Because it doesn't fallthru, no further statements in the same bb are expanded. Now, if the bb with return happened to be in some other basic block from the [tail call], it could be expanded but because the bb with tail call ends with a barrier, it doesn't fall thru there and if nothing else could reach it, we'd remove the unreachable bb RSN. 2025-01-16 Jakub Jelinek <jakub@redhat.com> Andrew Pinski <quic_apinski@quicinc.com> PR tree-optimization/118430 * tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h, symbol-summary.h, ipa-cp.h and ipa-prop.h. (find_tail_calls): If ass_var is NULL and ret_var is not, check if IPA-VRP has not found singleton return range for it. In that case, don't punt if ret_var is the only value in that range. Adjust the maybe_error_musttail message otherwise to diagnose different value being returned from the caller and callee rather than using return slot. Formatting fixes. * c-c++-common/musttail14.c: New test. * c-c++-common/pr118430.c: New test.
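The source-level pattern this commit is about can be sketched as follows. This is only an illustration under stated assumptions: the names `bar`, `caller` and `total` are hypothetical and are not taken from pr118430.c, and whether a given compiler version actually emits a tail call here depends on the target and the optimization level.

```cpp
#include <cassert>

int total = 0;  // observable side effect, so the call cannot be dropped

// Hypothetical callee: it always returns 1, so an interprocedural range
// analysis can record the singleton range [1, 1] for its return value.
[[gnu::noinline]] int bar (int x)
{
  total += x;
  return 1;
}

// Hypothetical caller, written as a tail call.  After value range
// propagation the body may conceptually become "bar (x); return 1;",
// which is the form the commit message says should still be treated
// as a tail call.
int caller (int x)
{
  return bar (x);
}
```

Calling `caller (5)` returns 1 and adds 5 to `total` either way; the transformation only affects whether the call is emitted as a jump or as a call followed by a return.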
-
Jakub Jelinek authored
When writing the gcc-15/changes.html patch posted earlier, I've been wondering where significant part of the Basic asm chapter went and the problem was the insertion of a new @node in the middle of the Basic Asm @node, plus not mentioning the new @node in the @menu. So the asm constexpr node was not normally visible and the Remarks for the section neither. The following patch moves it before Asm Labels, removes the spots where it described what hasn't been actually committed (constant expression can only be a container with data/size member functions) and fixes up the toplevel extended asm documentation (it was in the Basic Asm remarks and Extended Asm section's remark still said it is not valid). 2025-01-16 Jakub Jelinek <jakub@redhat.com> * doc/extend.texi (Using Assembly Language with C): Add Asm constexprs to @menu. (Basic Asm): Move @node asm constexprs before Asm Labels, rename to Asm constexprs, change wording so that it is clearer that the constant expression actually must not return a string literal, just some specific container and other wording tweaks. Only talk about top-level for basic asms in this @node, move restrictions on top-level extended asms to ... (Extended Asm): ... here.
-
Jakub Jelinek authored
For T with non-trivial destructors, we were destructing objects in the vector on release only when not using auto storage of auto_vec. The following patch calls truncate (0) instead of m_vecpfx.m_num clearing, and truncate takes care of that destruction: unsigned l = length (); gcc_checking_assert (l >= size); if (!std::is_trivially_destructible <T>::value) vec_destruct (address () + size, l - size); m_vecpfx.m_num = size; 2025-01-16 Jakub Jelinek <jakub@redhat.com> PR ipa/118400 * vec.h (vec<T, va_heap, vl_ptr>::release): Call m_vec->truncate (0) instead of clearing m_vec->m_vecpfx.m_num.
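The difference between clearing the element count and truncating can be sketched with a small stand-in container. This is not GCC's vec.h; `small_buf` and `tracked` are hypothetical names, and only the body of `truncate` follows the logic quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical element type that counts how many objects are alive.
struct tracked
{
  static int live;
  tracked () { ++live; }
  tracked (const tracked &) { ++live; }
  ~tracked () { --live; }
};
int tracked::live = 0;

// Hypothetical fixed-capacity buffer standing in for auto storage.
template <typename T, std::size_t N>
class small_buf
{
  alignas (T) unsigned char m_storage[N * sizeof (T)];
  std::size_t m_num = 0;

  T *address () { return reinterpret_cast<T *> (m_storage); }

public:
  void push (const T &v)
  {
    assert (m_num < N);
    new (address () + m_num++) T (v);
  }

  std::size_t length () const { return m_num; }

  // Same shape as the quoted truncate: destroy the tail, then shrink.
  void truncate (std::size_t size)
  {
    std::size_t l = length ();
    assert (l >= size);
    if (!std::is_trivially_destructible<T>::value)
      for (std::size_t i = size; i < l; ++i)
        address ()[i].~T ();
    m_num = size;
  }

  // Analogue of the old behaviour: only the count is cleared, so the
  // element destructors never run.
  void release_without_destruct () { m_num = 0; }

  // Analogue of the fixed behaviour.
  void release () { truncate (0); }

  ~small_buf () { release (); }
};
```

With `small_buf<tracked, 4>`, `release ()` brings `tracked::live` back to zero, while `release_without_destruct ()` leaves it at the number of pushed elements.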
-
liuhongt authored
gcc/ChangeLog: PR target/118489 * config/i386/sse.md (VF1_AVX512BW): Fix typo. gcc/testsuite/ChangeLog: * gcc.target/i386/pr118489.c: New test.
-
Richard Biener authored
The following addresses the fact that with loop masking (or regular mask loads) we do not implement load shortening but we override the case where we need that for correctness. Likewise when we attempt to use loop masking to handle large trailing gaps we cannot do so when there's this overrun case. PR tree-optimization/115895 * tree-vect-stmts.cc (get_group_load_store_type): When we might overrun because the group size is not a multiple of the vector size we cannot use loop masking since that does not implement the required load shortening. * gcc.target/i386/vect-pr115895.c: New testcase.
-
Keith Packard authored
lm32 has 8 register parameter slots, so many vararg functions end up with several anonymous parameters passed in registers. If we run out of registers in the middle of a parameter, the entire parameter will be placed on the stack, skipping any remaining available registers. The receiving varargs function doesn't know this, and will save all of the possible parameter register values just below the stack parameters. When processing a va_arg call with a type size larger than a single register, we must check to see if it spans the boundary between register and stack parameters. If so, we need to skip to the stack parameters. This is done by making va_list a structure containing the arg pointer and the address of the start of the stack parameters. Boundary checks are inserted in va_arg calls to detect this case and the address of the parameter is set to the stack parameter start when the parameter crosses over. gcc/ * config/lm32/lm32.cc: Add several #includes. (va_list_type): New. (lm32_build_va_list): New function. (lm32_builtin_va_start): Likewise. (lm32_sd_gimplify_va_arg_expr): Likewise. (lm32_gimplify_va_arg_expr): Likewise.
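The boundary check described above can be modelled roughly as below. This is a simplified sketch, not the gimplifier code in lm32.cc: the struct name, its field names and `next_arg` are hypothetical, and alignment handling is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a two-field va_list: the current argument
// pointer and the address where the stack-passed parameters start.
struct va_list_model
{
  char *ap;           // next argument to fetch
  char *stack_start;  // first stack-passed parameter
};

// Return the address of the next argument of the given size.
char *next_arg (va_list_model &v, std::size_t size)
{
  char *p = v.ap;
  // An argument never straddles the saved-register block and the stack
  // parameters: if it would, the caller put it entirely on the stack,
  // so skip ahead to the start of the stack parameters.
  if (p < v.stack_start && p + size > v.stack_start)
    p = v.stack_start;
  v.ap = p + size;
  return p;
}
```

For example, with 4 bytes of saved registers left, an 8-byte argument is fetched from `stack_start`, while a 4-byte argument is still fetched from the saved-register block.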
-
Keith Packard authored
gcc/ * config/lm32/lm32.cc (setup_incoming_varargs): Adjust the conditionals so that pretend_size is always computed, even if no_rtl is set.
-
Keith Packard authored
The cumulative args value in setup_incoming_varargs points at the last named parameter. We need to skip over that (if present) to get to the first anonymous argument as we only want to include those anonymous args in the saved register block. gcc/ * config/lm32/lm32.cc (lm32_setup_incoming_varargs): Skip last named parameter when preparing to flush registers with unnamed arguments to the stack.
-
Keith Packard authored
* config/lm32/lm32.cc (lm32_function_arg): Pass unnamed arguments in registers too, just like named arguments.
-
Andi Kleen authored
Committed as obvious. gcc/ChangeLog: * config/i386/x86-tune-sched-core.cc: Fix incorrect comment.
-
Eugene Rozenfeld authored
We are initializing both the call graph node count and the entry block count of the function with the head_count value from the profile. Count propagation algorithm may refine the entry block count and we may end up with a case where the call graph node count is set to zero but the entry block count is non-zero. That becomes a problem because we have this code in execute_fixup_cfg: profile_count num = node->count; profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; bool scale = num.initialized_p () && !(num == den); Here if num is 0 but den is not 0, scale becomes true and we lose the counts in if (scale) bb->count = bb->count.apply_scale (num, den); This is what happened in the issue reported in PR116743 (a 10% regression in MySQL HAMMERDB tests). 3d9e6767 made an improvement in AutoFDO count propagation, which caused a mismatch between the call graph node count (zero) and the entry block count (non-zero) and subsequent loss of counts as described above. The fix is to update the call graph node count once we've done count propagation. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: PR gcov-profile/116743 * auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count and the entry block count.
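The loss of counts can be reproduced with plain integers. The sketch below is an assumption-laden model: `fn_profile`, `fixup_counts` and `sync_node_count` are hypothetical names, GCC's real `profile_count` type carries quality information that is ignored here, and the guard against a zero denominator is added only to keep the sketch well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one function's profile data.
struct fn_profile
{
  std::uint64_t node_count;              // call graph node count
  std::vector<std::uint64_t> bb_counts;  // bb_counts[0] is the entry block
};

// Simplified analogue of the scaling step quoted from execute_fixup_cfg:
// every block count is rescaled by node_count / entry_count.
void fixup_counts (fn_profile &f)
{
  assert (!f.bb_counts.empty ());
  std::uint64_t num = f.node_count;
  std::uint64_t den = f.bb_counts[0];
  if (num == den || den == 0)
    return;
  for (std::uint64_t &c : f.bb_counts)
    c = c * num / den;
}

// Analogue of the fix: after count propagation, make the node count
// agree with the (possibly refined) entry block count.
void sync_node_count (fn_profile &f)
{
  assert (!f.bb_counts.empty ());
  f.node_count = f.bb_counts[0];
}
```

With a node count of 0 and block counts {100, 60, 40}, `fixup_counts` alone zeroes every block; calling `sync_node_count` first leaves the block counts untouched.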
-
GCC Administrator authored
-
- Jan 15, 2025
-
-
Jonathan Wakely authored
This test should use __cpp_lib_ios_noreplace rather than the internal __glibcxx_ios_noreplace macro. libstdc++-v3/ChangeLog: * testsuite/27_io/ios_base/types/openmode/case_label.cc: Use standard feature test macro not internal one.
-
Jonathan Wakely authored
The alloc_ptr.cc test for std::set tries to use C++17 features unconditionally, and tries to use the C++23 range members which haven't been implemented for std::set yet. Some of the range checks are left in place but commented out, so they can be added after the ranges members are implemented. Others (such as prepend_range) are not valid for std::set at all. Also fix uses of internal feature test macros in two other tests, which should use the standard __cpp_lib_xxx macros. libstdc++-v3/ChangeLog: * testsuite/23_containers/set/requirements/explicit_instantiation/alloc_ptr.cc: Guard node extraction checks with feature test macro. Remove calls to non-existent range members. * testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc: Use standard macro not internal one. * testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc: Likewise.
-
Andrew Pinski authored
In this PR we have a missed optimization: we fail to notice that `1 >> x` and `(1 >> x) ^ 1` can't be equal. There are a few ways of optimizing this; the easiest and simplest is to simplify `1 >> x` into just `x == 0`, as those are equivalent (if we ignore out-of-range values for x). We already have an optimization for `(1 >> X) !=/== 0`, so the only difference here is that we don't need the `!=/== 0` part to do the transformation. So this removes the `(1 >> X) !=/== 0` transformation and just adds a simplified `1 >> x` -> `x == 0` one. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/102705 gcc/ChangeLog: * match.pd (`(1 >> X) != 0`): Remove pattern. (`1 >> x`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr105832-2.c: Update testcase. * gcc.dg/tree-ssa/pr96669-1.c: Likewise. * gcc.dg/tree-ssa/pr102705-1.c: New test. * gcc.dg/tree-ssa/pr102705-2.c: New test. Signed-off-by:
Andrew Pinski <quic_apinski@quicinc.com>
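The equivalence the new pattern relies on can be checked exhaustively for in-range shift counts. The helper names below are hypothetical and the code is a source-level check, not the match.pd pattern itself.

```cpp
#include <cassert>
#include <climits>

// Original form: shift the constant 1 right by x.
int shift_form (unsigned x) { return 1 >> x; }

// Simplified form: 1 when x is zero, 0 otherwise.
int compare_form (unsigned x) { return x == 0; }

// Compare both forms for every shift count that is in range for int.
// Larger shift counts are undefined behaviour and are deliberately
// not exercised.
bool forms_agree ()
{
  const unsigned bits = sizeof (int) * CHAR_BIT;
  for (unsigned x = 0; x < bits; ++x)
    if (shift_form (x) != compare_form (x))
      return false;
  return true;
}

// The observation from the PR: 1 >> x and (1 >> x) ^ 1 never compare
// equal, because the value is either 0 or 1 and the xor flips it.
bool never_equal_to_flipped ()
{
  const unsigned bits = sizeof (int) * CHAR_BIT;
  for (unsigned x = 0; x < bits; ++x)
    if ((1 >> x) == ((1 >> x) ^ 1))
      return false;
  return true;
}
```

Both checks return true, which is why the comparison against zero in the old pattern is not needed for the simplification.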
-
Sam James authored
gcc/ChangeLog: * doc/extend.texi: Cleanup trailing whitespace.
-
Sam James authored
We say 'a constant .. expression' elsewhere. Fix the grammar. gcc/ChangeLog: * doc/extend.texi: Add 'a' for grammar fix.
-
Jonathan Wakely authored
libstdc++-v3/ChangeLog: PR libstdc++/109849 * include/bits/vector.tcc (vector::_M_range_insert): Fix reversed args in length calculation.
-
Harald Anlauf authored
PR fortran/71884 gcc/fortran/ChangeLog: * resolve.cc (resolve_allocate_expr): Reject intrinsic NULL as source-expr. gcc/testsuite/ChangeLog: * gfortran.dg/pr71884.f90: New test.
-
Jakub Jelinek authored
This patch uses the count_ctor_elements function to fix up unify deduction of array sizes. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118390 * cp-tree.h (count_ctor_elements): Declare. * call.cc (count_ctor_elements): No longer static. * pt.cc (unify): Use count_ctor_elements instead of CONSTRUCTOR_NELTS. * g++.dg/cpp/embed-20.C: New test. * g++.dg/cpp0x/pr118390.C: New test.
-
Wilco Dijkstra authored
Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW. gcc: * config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.
-
Wilco Dijkstra authored
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is already enabled for some cores, but benchmarking it shows it is faster on all modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1). gcc: * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA. * config/aarch64/tuning_models/ampere1b.h: Remove redundant AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA. * config/aarch64/tuning_models/neoversev2.h: Likewise.
-
Wilco Dijkstra authored
ILP32 was originally intended to make porting to AArch64 easier. Support was never merged in the Linux kernel or GLIBC, so it has been unsupported for many years. There isn't a benefit in keeping unsupported features forever, so deprecate it now (and it could be removed in a future release). gcc: * config/aarch64/aarch64.cc (aarch64_override_options): Add warning. * doc/invoke.texi: Document -mabi=ilp32 as deprecated. gcc/testsuite: * gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated. * gcc.target/aarch64/pr100518.c: Likewise. * gcc.target/aarch64/pr113114.c: Likewise. * gcc.target/aarch64/pr80295.c: Likewise. * gcc.target/aarch64/pr94201.c: Likewise. * gcc.target/aarch64/pr94577.c: Likewise. * gcc.target/aarch64/sve/pr108603.c: Likewise.
-
Cupertino Miranda authored
CO-RE accesses with non-pointer struct variables will also generate a "0" string access within the CO-RE relocation. The first index within the access string has a somewhat different meaning than the remaining indexes. For i0:i1:...:in being an access index for a "struct A a" declaration, its semantics are represented by: (&a + (sizeof(struct A) * i0) + offsetof(i1:...:in)) gcc/ChangeLog: * config/bpf/core-builtins.cc (compute_field_expr): Change VAR_DECL outcome in switch case. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-builtin-1.c: Correct test. * gcc.target/bpf/core-builtin-2.c: Correct test. * gcc.target/bpf/core-builtin-exprlist-1.c: Correct test.
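The address formula can be worked through on a concrete layout. The types `A` and `inner` and the helper `core_address` are hypothetical; the sketch only evaluates the formula from the commit message for an access string of the shape i0:1:1 (second member of `A`, then second member of `inner`).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types used only for this illustration.
struct inner { int p; int q; };
struct A { int x; inner in; };

// Evaluate &a + sizeof (struct A) * i0 + offsetof (i1:...:in) for the
// member path "in.q".  The first index i0 scales by the size of the
// whole struct, like an array index applied to the base address; the
// remaining indexes select members and contribute plain offsets.
const char *core_address (const A *a, std::size_t i0)
{
  assert (a != nullptr);
  const std::size_t member_offset = offsetof (A, in) + offsetof (inner, q);
  return reinterpret_cast<const char *> (a) + sizeof (A) * i0 + member_offset;
}
```

For a non-pointer variable the first index is 0, so the result is simply the address of `a.in.q`; with i0 equal to 1 the same path lands in the next `struct A` of an array.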
-
Cupertino Miranda authored
When traversing gimple to introduce CO-RE relocation entries to expressions that are accesses to types with the preserve_access_index attribute, the access is likely to be split across multiple gimple statements. In order to keep doing the proper CO-RE conversion we need to mark the LHS tree nodes of gimple expressions as explicit CO-RE accesses, such that the gimple traverser will further convert the sub-expressions. This patch makes sure that this LHS marking does not happen when the gimple statement is a function call, in which case it is no longer expected to keep generating CO-RE accesses with the remainder of the expression. gcc/ChangeLog: * config/bpf/core-builtins.cc (make_gimple_core_safe_access_index): Fix in condition. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-attr-calls.c: New test.
-
Cupertino Miranda authored
Based on observation within bpf-next selftests and comparison of GCC- and clang-compiled code, the BPF loader expects all CO-RE relocations to point to BTF non-const and non-volatile type nodes. gcc/ChangeLog: * btfout.cc (get_btf_kind): Remove static from function definition. * config/bpf/btfext-out.cc (bpf_code_reloc_add): Check if CO-RE type is not a const or volatile. * ctfc.h (btf_dtd_kind): Add prototype for function. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-attr-const.c: New test.
-
Jakub Jelinek authored
As the following testcases show (mangle80.C only after reversion of the temporary reversion of C++ large array speedup commit), RAW_DATA_CST can be seen during mangling of some templates and we ICE because the mangler doesn't handle it. The following patch handles it and mangles it the same as a sequence of INTEGER_CSTs that were used previously instead. The only slight complication is that if ce->value is the last nonzero element, we need to skip the zeros at the end of RAW_DATA_CST. 2025-01-03 Jakub Jelinek <jakub@redhat.com> PR c++/118278 * mangle.cc (write_expression): Handle RAW_DATA_CST. * g++.dg/abi/mangle80.C: New test. * g++.dg/cpp/embed-19.C: New test.
-
Marek Polacek authored
Compiling this test, we emit: error: 'static void CW<T>::operator=(int) requires requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type x) {x;}' must be a non-static member function where the DECLTYPE_TYPE isn't printed properly. This patch fixes that to print: error: 'static void CW<T>::operator=(int) requires requires(typename decltype(T())::type x) {x;}' must be a non-static member function PR c++/118139 gcc/cp/ChangeLog: * cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle a computed-type-specifier. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/decltype1.C: New test. Reviewed-by:
Jason Merrill <jason@redhat.com>
-
Jonathan Wakely authored
libstdc++-v3/ChangeLog: * testsuite/28_regex/traits/char/transform_primary.cc: Fix subclause numbering in references to the standard.
-
Tamar Christina authored
In g:3c32575e I made a mistake and incorrectly replaced the type of the arguments of an expression with the type of the expression. This is of course wrong. This reverts that change and I have also double checked the other replacements and they are fine. gcc/ChangeLog: PR middle-end/118472 * fold-const.cc (operand_compare::operand_equal_p): Fix incorrect replacement. gcc/testsuite/ChangeLog: PR middle-end/118472 * gcc.dg/pr118472.c: New test.
-
Richard Biener authored
The following adds /* <num> */ to dbg_line_numbers so there's a chance to more easily look up the ID of the match.pd line number used for dumping when you want to debug a specific replacement. It also cuts the lines down to 10 entries. static int dbg_line_numbers[1267] = { /* 0 */ 161, 164, 173, 175, 178, 181, 183, 189, 197, 195, /* 10 */ 199, 201, 205, 923, 921, 2060, 2071, 2052, 2058, 2063, ... * genmatch.cc (define_dump_logs): Make reverse lookup in dbg_line_numbers easier by adding comments with start index and cutting number of elements per line to 10.
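The layout shown above can be reproduced with a few lines of code. This is a hypothetical re-creation for illustration, not the code in genmatch.cc; `format_table` is an invented name and the exact whitespace of the real generator may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Emit an array initializer with 10 entries per row, each row prefixed
// by a comment holding the index of its first entry, so that an ID can
// be mapped back to a line number by eye.
std::string format_table (const std::vector<int> &lines)
{
  std::ostringstream os;
  os << "static int dbg_line_numbers[" << lines.size () << "] = {";
  for (std::size_t i = 0; i < lines.size (); ++i)
    {
      if (i % 10 == 0)
        os << "\n  /* " << i << " */ ";
      os << lines[i];
      if (i + 1 != lines.size ())
        os << ", ";
    }
  os << "\n};\n";
  return os.str ();
}
```

To find the entry with ID 13, for example, a reader goes to the row marked `/* 10 */` and counts three entries further.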
-
Christoph Müllner authored
As reported in PR117079, commit ab187858 broke the test pr105493.c. The test code contains two loops, where the first one is expected to be vectorized. The commit that broke that vectorization was the first of several that enabled vectorization of both loops. Now that GCC can vectorize the whole function, let's adjust this test to expect vectorization of both loops by ensuring that we don't write to the helper-array 'tmp'. Signed-off-by:
Christoph Müllner <christoph.muellner@vrull.eu> PR target/117079 gcc/testsuite/ChangeLog: * gcc.target/i386/pr105493.c: Fix expected vectorization Signed-off-by:
Christoph Müllner <christoph.muellner@vrull.eu>
-
Tobias Burnus authored
To find the variant declaration, a call is constructed in omp_declare_variant_finalize_one, which gives here: TARGET_EXPR <D.3010, variant_fn ()> Extracting now the function declaration failed and gave the bogus error: could not find variant declaration Solution: Use the 2nd argument of the TARGET_EXPR and continue. PR c++/118486 gcc/cp/ChangeLog: * decl.cc (omp_declare_variant_finalize_one): When resolving the variant to use, handle variant calls with TARGET_EXPR. gcc/testsuite/ChangeLog: * g++.dg/gomp/declare-variant-11.C: New test.
-
Jakub Jelinek authored
Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL); before running a pass list and bitmap_obstack_release (NULL); after that, while process_new_functions wasn't doing that and with the new r15-130 bitmap_alloc checking that results in ICE. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR ipa/116068 * cgraphunit.cc (symbol_table::process_new_functions): Call bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL) around processing the functions. * gcc.dg/graphite/pr116068.c: New test.
-
Jakub Jelinek authored
c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't convert to its rettype [PR118387] Note, the PR raises another problem. If on the same testcase the B b; line is removed, we silently synthetize operator<=> which will crash at runtime due to returning without a return statement. That is because the standard says that in that case it should return static_cast<int>(std::strong_ordering::equal); but I can't find anywhere wording which would say that if that isn't valid, the function is deleted. https://eel.is/c++draft/class.compare#class.spaceship-2.2 seems to talk just about cases where there are some members and their comparison is invalid it is deleted, but here there are none and it follows https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2 So, we synthetize with tf_none, see the static_cast is invalid, don't add error_mark_node statement silently, but as the function isn't deleted, we just silently emit it. Should the standard be amended to say that the operator should be deleted even if it has no elements and the static cast from https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2 On Fri, Jan 10, 2025 at 12:04:53PM -0500, Jason Merrill wrote: > That seems pretty obviously what we want, and is what the other compilers > implement. This patch implements it then. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118387 * method.cc (build_comparison_op): Set bad if std::strong_ordering::equal doesn't convert to rettype. * g++.dg/cpp2a/spaceship-err6.C: Expect another error. * g++.dg/cpp2a/spaceship-synth17.C: Likewise. * g++.dg/cpp2a/spaceship-synth-neg6.C: Likewise. * g++.dg/cpp2a/spaceship-synth-neg7.C: New test. * testsuite/25_algorithms/default_template_value.cc (Input::operator<=>): Use auto as return type rather than bool.
-
Jakub Jelinek authored
The previous patch made me look around some more and I found maybe_init_list_as_array doesn't handle RAW_DATA_CSTs correctly either, while the RAW_DATA_CST is properly split during finish_compound_literal, it was using CONSTRUCTOR_NELTS as the size of the arrays, which is wrong, RAW_DATA_CST could stand for far more initializers. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118124 * cp-tree.h (build_array_of_n_type): Change second argument type from int to unsigned HOST_WIDE_INT. * tree.cc (build_array_of_n_type): Likewise. * call.cc (count_ctor_elements): New function. (maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS. (convert_like_internal): Use length from init's type instead of len when handling the maybe_init_list_as_array case. * g++.dg/cpp0x/initlist-opt5.C: New test.
-
Jakub Jelinek authored
The following testcases ICE due to RAW_DATA_CST not being handled where it should be during ck_list conversions. The last 2 testcases started ICEing with r15-6339 committed yesterday (speedup of large initializers), the first two already with r15-5958 (#embed optimization for C++). For conversion to initializer_list<unsigned char> or char/signed char we can optimize and keep RAW_DATA_CST with adjusted type if we report narrowing errors if needed, for others this converts each element separately. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118124 * call.cc (convert_like_internal): Handle RAW_DATA_CST in ck_list handling. Formatting fixes. * g++.dg/cpp/embed-15.C: New test. * g++.dg/cpp/embed-16.C: New test. * g++.dg/cpp0x/initlist-opt3.C: New test. * g++.dg/cpp0x/initlist-opt4.C: New test.
-
Kito Cheng authored
`.MASK_LEN_FOLD_LEFT_PLUS` (or `mask_len_fold_left_plus_m`) expects the return value to be the start value even if the length is 0. However, the current code generation in the RISC-V backend does not meet that semantic: it results in a random garbage value if the length is 0. Take the current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64 as an example: # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0); vsetvli zero,a5,e64,m1,ta,ma vfmv.s.f v2,fa5 # insn 1 vfredosum.vs v1,v1,v2 # insn 2 vfmv.f.s fa5,v1 # insn 3 insn 1: - vfmv.s.f won't do anything if VL=0, which means v2 will contain a garbage value. insn 2: - vfredosum.vs won't do anything if VL=0, and keeps vd unchanged even with TA. (The V spec says: `If vl=0, no operation is performed and the destination register is not updated.`) insn 3: - vfmv.f.s will move the value from v1 even if VL=0, so this is safe. So how do we fix that? We need two fixes: 1. insn 1: it must always execute with VL=1, so that we can guarantee it always works as expected. 2. insn 2: add a new pattern to force `vd` to use the same reg as `vs1` (start value) for all reduction patterns; then we can guarantee vd[0] will contain the start value when vl=0. For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2, we have to add a _VL0_SAFE variant of the reductions to force `vd` to use the same reg as `vs1` (start value). Changes since V3: - Rename _AV to _VL0_SAFE for readability. - Use the non-VL0_SAFE version if VL is const or VLMAX. - Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX. - Two more testcases. gcc/ChangeLog: PR target/118182 * config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust argument for expand_reduction. (*widen_reduc_plus_scal_<mode>): Ditto. (*fold_left_widen_plus_<mode>): Ditto. (*mask_len_fold_left_widen_plus_<mode>): Ditto. (*cond_widen_reduc_plus_scal_<mode>): Ditto. (*cond_len_widen_reduc_plus_scal_<mode>): Ditto. (*cond_widen_reduc_plus_scal_<mode>): Ditto.
* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for expand_reduction. (reduc_smax_scal_<mode>): Ditto. (reduc_umax_scal_<mode>): Ditto. (reduc_smin_scal_<mode>): Ditto. (reduc_umin_scal_<mode>): Ditto. (reduc_and_scal_<mode>): Ditto. (reduc_ior_scal_<mode>): Ditto. (reduc_xor_scal_<mode>): Ditto. (reduc_plus_scal_<mode>): Ditto. (reduc_smax_scal_<mode>): Ditto. (reduc_smin_scal_<mode>): Ditto. (reduc_fmax_scal_<mode>): Ditto. (reduc_fmin_scal_<mode>): Ditto. (fold_left_plus_<mode>): Ditto. (mask_len_fold_left_plus_<mode>): Ditto. * config/riscv/riscv-v.cc (expand_reduction): Add one more argument for reduction code for vl0-safe. * config/riscv/riscv-protos.h (expand_reduction): Ditto. * config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of reduction. (ANY_REDUC_VL0_SAFE): New. (ANY_WREDUC_VL0_SAFE): Ditto. (ANY_FREDUC_VL0_SAFE): Ditto. (ANY_FREDUC_SUM_VL0_SAFE): Ditto. (ANY_FWREDUC_SUM_VL0_SAFE): Ditto. (reduc_op): Add _VL0_SAFE variant of reduction. (order) Ditto. * config/riscv/vector.md (@pred_<reduc_op><mode>): New. gcc/testsuite/ChangeLog: PR target/118182 * gfortran.target/riscv/rvv/pr118182.f: New. * gcc.target/riscv/rvv/autovec/pr118182-1.c: New. * gcc.target/riscv/rvv/autovec/pr118182-2.c: New.
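The semantic the backend has to honour for this commit can be written as a scalar reference model. This is a sketch under assumptions: the function name and parameter layout are hypothetical, the mask is modelled as an array of bool, and the trailing bias operand of the internal function is omitted.

```cpp
#include <cassert>
#include <cstddef>

// Scalar model of an in-order masked, length-limited sum: start with
// the start value and add the active elements one by one.
double fold_left_plus_model (double start, const double *vec,
                             const bool *mask, std::size_t len)
{
  assert (len == 0 || (vec != nullptr && mask != nullptr));
  double acc = start;
  for (std::size_t i = 0; i < len; ++i)
    if (mask[i])
      acc += vec[i];
  // With len == 0 the loop body never runs, so the result is exactly
  // the start value.  This is the case the generated code got wrong.
  return acc;
}
```

A call with `len` equal to 0 returns `start` unchanged, which is the behaviour the _VL0_SAFE patterns are meant to guarantee in the vector code.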
-
Richard Biener authored
When we have the situation of an external SLP node that is permuted, the scalar stmts recorded in the permute node do not mean the scalar computation can be removed. We are removing those stmts from the vectorized_scalar_stmts for this reason but we fail to check this set when we cost scalar stmts. Note vectorized_scalar_stmts isn't a complete set so also pass scalar_stmts_in_externs and check that. The following fixes this. This shows in PR115777 when we avoid vectorizing the load, but on its own doesn't help the PR yet. PR tree-optimization/115777 * tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not cost a scalar stmt that needs to be preserved.
-
Michal Jires authored
I used link() to create cheap copies of Incremental LTO cache contents to prevent their deletion once linking is finished. This is unnecessary, since output_files are deleted in our lto-plugin and not in the linker itself. Bootstrapped/regtested on x86_64-linux. lto-wrapper now builds on MinGW again, though so far I have not set up MinGW to be able to do a full bootstrap. Ok for trunk? PR lto/118238 gcc/ChangeLog: * lto-wrapper.cc (run_gcc): Remove link() copying. lto-plugin/ChangeLog: * lto-plugin.c (cleanup_handler): Keep output_files when using Incremental LTO. (onload): Detect Incremental LTO.
-