- Jan 16, 2025
-
-
Keith Packard authored
* config/lm32/lm32.cc (lm32_function_arg): Pass unnamed arguments in registers too, just like named arguments.
-
Andi Kleen authored
Committed as obvious. gcc/ChangeLog: * config/i386/x86-tune-sched-core.cc: Fix incorrect comment.
-
Eugene Rozenfeld authored
We are initializing both the call graph node count and the entry block count of the function with the head_count value from the profile. Count propagation algorithm may refine the entry block count and we may end up with a case where the call graph node count is set to zero but the entry block count is non-zero. That becomes a problem because we have this code in execute_fixup_cfg: profile_count num = node->count; profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; bool scale = num.initialized_p () && !(num == den); Here if num is 0 but den is not 0, scale becomes true and we lose the counts in if (scale) bb->count = bb->count.apply_scale (num, den); This is what happened in the issue reported in PR116743 (a 10% regression in MySQL HAMMERDB tests). 3d9e6767 made an improvement in AutoFDO count propagation, which caused a mismatch between the call graph node count (zero) and the entry block count (non-zero) and subsequent loss of counts as described above. The fix is to update the call graph node count once we've done count propagation. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: PR gcov-profile/116743 * auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count and the entry block count.
-
GCC Administrator authored
-
- Jan 15, 2025
-
-
Jonathan Wakely authored
This test should use __cpp_lib_ios_noreplace rather than the internal __glibcxx_ios_noreplace macro. libstdc++-v3/ChangeLog: * testsuite/27_io/ios_base/types/openmode/case_label.cc: Use standard feature test macro not internal one.
-
Jonathan Wakely authored
The alloc_ptr.cc test for std::set tries to use C++17 features unconditionally, and tries to use the C++23 range members which haven't been implemented for std::set yet. Some of the range checks are left in place but commented out, so they can be added after the ranges members are implemented. Others (such as prepend_range) are not valid for std::set at all. Also fix uses of internal feature test macros in two other tests, which should use the standard __cpp_lib_xxx macros. libstdc++-v3/ChangeLog: * testsuite/23_containers/set/requirements/explicit_instantiation/alloc_ptr.cc: Guard node extraction checks with feature test macro. Remove calls to non-existent range members. * testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc: Use standard macro not internal one. * testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc: Likewise.
-
Andrew Pinski authored
This in this PR we have missed optimization where we miss that, `1 >> x` and `(1 >> x) ^ 1` can't be equal. There are a few ways of optimizing this, the easiest and simpliest is to simplify `1 >> x` into just `x == 0` as those are equivalant (if we ignore out of range values for x). we already have an optimization for `(1 >> X) !=/== 0` so the only difference here is we don't need the `!=/== 0` part to do the transformation. So this removes the `(1 >> X) !=/== 0` transformation and just adds a simplfied `1 >> x` -> `x == 0` one. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/102705 gcc/ChangeLog: * match.pd (`(1 >> X) != 0`): Remove pattern. (`1 >> x`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr105832-2.c: Update testcase. * gcc.dg/tree-ssa/pr96669-1.c: Likewise. * gcc.dg/tree-ssa/pr102705-1.c: New test. * gcc.dg/tree-ssa/pr102705-2.c: New test. Signed-off-by:
Andrew Pinski <quic_apinski@quicinc.com>
-
Sam James authored
gcc/ChangeLog: * doc/extend.texi: Cleanup trailing whitespace.
-
Sam James authored
We say 'a constant .. expression' elsewhere. Fix the grammar. gcc/ChangeLog: * doc/extend.texi: Add 'a' for grammar fix.
-
Jonathan Wakely authored
libstdc++-v3/ChangeLog: PR libstdc++/109849 * include/bits/vector.tcc (vector::_M_range_insert): Fix reversed args in length calculation.
-
Harald Anlauf authored
PR fortran/71884 gcc/fortran/ChangeLog: * resolve.cc (resolve_allocate_expr): Reject intrinsic NULL as source-expr. gcc/testsuite/ChangeLog: * gfortran.dg/pr71884.f90: New test.
-
Jakub Jelinek authored
This patch uses the count_ctor_elements function to fix up unify deduction of array sizes. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118390 * cp-tree.h (count_ctor_elements): Declare. * call.cc (count_ctor_elements): No longer static. * pt.cc (unify): Use count_ctor_elements instead of CONSTRUCTOR_NELTS. * g++.dg/cpp/embed-20.C: New test. * g++.dg/cpp0x/pr118390.C: New test.
-
Wilco Dijkstra authored
Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW. gcc: * config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.
-
Wilco Dijkstra authored
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is already enabled for some cores, but benchmarking it shows it is faster on all modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1). gcc: * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA. * config/aarch64/tuning_models/ampere1b.h: Remove redundant AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA. * config/aarch64/tuning_models/neoversev2.h: Likewise.
-
Wilco Dijkstra authored
ILP32 was originally intended to make porting to AArch64 easier. Support was never merged in the Linux kernel or GLIBC, so it has been unsupported for many years. There isn't a benefit in keeping unsupported features forever, so deprecate it now (and it could be removed in a future release). gcc: * config/aarch64/aarch64.cc (aarch64_override_options): Add warning. * doc/invoke.texi: Document -mabi=ilp32 as deprecated. gcc/testsuite: * gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated. * gcc.target/aarch64/pr100518.c: Likewise. * gcc.target/aarch64/pr113114.c: Likewise. * gcc.target/aarch64/pr80295.c: Likewise. * gcc.target/aarch64/pr94201.c: Likewise. * gcc.target/aarch64/pr94577.c: Likewise. * gcc.target/aarch64/sve/pr108603.c: Likewise.
-
Cupertino Miranda authored
CO-RE accesses with non pointer struct variables will also generate a "0" string access within the CO-RE relocation. The first index within the access string, has sort of a different meaning then the remaining of the indexes. For i0:i1:...:in being an access index for "struct A a" declaration, its semantics are represented by: (&a + (sizeof(struct A) * i0) + offsetof(i1:...:in) gcc/ChangeLog: * config/bpf/core-builtins.cc (compute_field_expr): Change VAR_DECL outcome in switch case. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-builtin-1.c: Correct test. * gcc.target/bpf/core-builtin-2.c: Correct test. * gcc.target/bpf/core-builtin-exprlist-1.c: Correct test.
-
Cupertino Miranda authored
When traversing gimple to introduce CO-RE relocation entries to expressions that are accesses to attributed perserve_access_index types, the access is likely to be split in multiple gimple statments. In order to keep doing the proper CO-RE convertion we will need to mark the LHS tree nodes of gimple expressions as explicit CO-RE accesses, such that the gimple traverser will further convert the sub-expressions. This patch makes sure that this LHS marking will not happen in case the gimple statement is a function call, which case it is no longer expecting to keep generating CO-RE accesses with the remaining of the expression. gcc/ChangeLog: * config/bpf/core-builtins.cc (make_gimple_core_safe_access_index): Fix in condition. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-attr-calls.c: New test.
-
Cupertino Miranda authored
Based on observation within bpf-next selftests and comparisson of GCC and clang compiled code, the BPF loader expects all CO-RE relocations to point to BTF non const and non volatile type nodes. gcc/ChangeLog: * btfout.cc (get_btf_kind): Remove static from function definition. * config/bpf/btfext-out.cc (bpf_code_reloc_add): Check if CO-RE type is not a const or volatile. * ctfc.h (btf_dtd_kind): Add prototype for function. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-attr-const.c: New test.
-
Jakub Jelinek authored
As the following testcases show (mangle80.C only after reversion of the temporary reversion of C++ large array speedup commit), RAW_DATA_CST can be seen during mangling of some templates and we ICE because the mangler doesn't handle it. The following patch handles it and mangles it the same as a sequence of INTEGER_CSTs that were used previously instead. The only slight complication is that if ce->value is the last nonzero element, we need to skip the zeros at the end of RAW_DATA_CST. 2025-01-03 Jakub Jelinek <jakub@redhat.com> PR c++/118278 * mangle.cc (write_expression): Handle RAW_DATA_CST. * g++.dg/abi/mangle80.C: New test. * g++.dg/cpp/embed-19.C: New test.
-
Marek Polacek authored
Compiling this test, we emit: error: 'static void CW<T>::operator=(int) requires requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type x) {x;}' must be a non-static member function where the DECLTYPE_TYPE isn't printed properly. This patch fixes that to print: error: 'static void CW<T>::operator=(int) requires requires(typename decltype(T())::type x) {x;}' must be a non-static member function PR c++/118139 gcc/cp/ChangeLog: * cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle a computed-type-specifier. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/decltype1.C: New test. Reviewed-by:
Jason Merrill <jason@redhat.com>
-
Jonathan Wakely authored
libstdc++-v3/ChangeLog: * testsuite/28_regex/traits/char/transform_primary.cc: Fix subclause numbering in references to the standard.
-
Tamar Christina authored
In g:3c32575e I made a mistake and incorrectly replaced the type of the arguments of an expression with the type of the expression. This is of course wrong. This reverts that change and I have also double checked the other replacements and they are fine. gcc/ChangeLog: PR middle-end/118472 * fold-const.cc (operand_compare::operand_equal_p): Fix incorrect replacement. gcc/testsuite/ChangeLog: PR middle-end/118472 * gcc.dg/pr118472.c: New test.
-
Richard Biener authored
The following adds /* <num> */ to dbg_line_numbers so there's the chance to more easily lookup the ID of the match.pd line number used for dumping when you want to debug a speicific replacement. It also cuts the lines down to 10 entries. static int dbg_line_numbers[1267] = { /* 0 */ 161, 164, 173, 175, 178, 181, 183, 189, 197, 195, /* 10 */ 199, 201, 205, 923, 921, 2060, 2071, 2052, 2058, 2063, ... * genmatch.cc (define_dump_logs): Make reverse lookup in dbg_line_numbers easier by adding comments with start index and cutting number of elements per line to 10.
-
Christoph Müllner authored
As reported in PR117079, commit ab187858 broke the test pr105493.c. The test code contains two loops, where the first one is exected to be vectorized. The commit that broke that vectorization was the first of several that enabled vectorization of both loops. Now, that GCC can vectorize the whole function, let's adjust this test to expect vectorization of both loops by ensuring that we don't write to the helper-array 'tmp'. Signed-off-by:
Christoph Müllner <christoph.muellner@vrull.eu> PR target/117079 gcc/testsuite/ChangeLog: * gcc.target/i386/pr105493.c: Fix expected vectorization Signed-off-by:
Christoph Müllner <christoph.muellner@vrull.eu>
-
Tobias Burnus authored
To find the variant declaration, a call is constructed in omp_declare_variant_finalize_one, which gives here: TARGET_EXPR <D.3010, variant_fn ()> Extracting now the function declaration failed and gave the bogus error: could not find variant declaration Solution: Use the 2nd argument of the TARGET_EXPR and continue. PR c++/118486 gcc/cp/ChangeLog: * decl.cc (omp_declare_variant_finalize_one): When resolving the variant to use, handle variant calls with TARGET_EXPR. gcc/testsuite/ChangeLog: * g++.dg/gomp/declare-variant-11.C: New test.
-
Jakub Jelinek authored
Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL); before running a pass list and bitmap_obstack_release (NULL); after that, while process_new_functions wasn't doing that and with the new r15-130 bitmap_alloc checking that results in ICE. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR ipa/116068 * cgraphunit.cc (symbol_table::process_new_functions): Call bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL) around processing the functions. * gcc.dg/graphite/pr116068.c: New test.
-
Jakub Jelinek authored
c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't convert to its rettype [PR118387] Note, the PR raises another problem. If on the same testcase the B b; line is removed, we silently synthetize operator<=> which will crash at runtime due to returning without a return statement. That is because the standard says that in that case it should return static_cast<int>(std::strong_ordering::equal); but I can't find anywhere wording which would say that if that isn't valid, the function is deleted. https://eel.is/c++draft/class.compare#class.spaceship-2.2 seems to talk just about cases where there are some members and their comparison is invalid it is deleted, but here there are none and it follows https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2 So, we synthetize with tf_none, see the static_cast is invalid, don't add error_mark_node statement silently, but as the function isn't deleted, we just silently emit it. Should the standard be amended to say that the operator should be deleted even if it has no elements and the static cast from https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2 On Fri, Jan 10, 2025 at 12:04:53PM -0500, Jason Merrill wrote: > That seems pretty obviously what we want, and is what the other compilers > implement. This patch implements it then. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118387 * method.cc (build_comparison_op): Set bad if std::strong_ordering::equal doesn't convert to rettype. * g++.dg/cpp2a/spaceship-err6.C: Expect another error. * g++.dg/cpp2a/spaceship-synth17.C: Likewise. * g++.dg/cpp2a/spaceship-synth-neg6.C: Likewise. * g++.dg/cpp2a/spaceship-synth-neg7.C: New test. * testsuite/25_algorithms/default_template_value.cc (Input::operator<=>): Use auto as return type rather than bool.
-
Jakub Jelinek authored
The previous patch made me look around some more and I found maybe_init_list_as_array doesn't handle RAW_DATA_CSTs correctly either, while the RAW_DATA_CST is properly split during finish_compound_literal, it was using CONSTRUCTOR_NELTS as the size of the arrays, which is wrong, RAW_DATA_CST could stand for far more initializers. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118124 * cp-tree.h (build_array_of_n_type): Change second argument type from int to unsigned HOST_WIDE_INT. * tree.cc (build_array_of_n_type): Likewise. * call.cc (count_ctor_elements): New function. (maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS. (convert_like_internal): Use length from init's type instead of len when handling the maybe_init_list_as_array case. * g++.dg/cpp0x/initlist-opt5.C: New test.
-
Jakub Jelinek authored
The following testcases ICE due to RAW_DATA_CST not being handled where it should be during ck_list conversions. The last 2 testcases started ICEing with r15-6339 committed yesterday (speedup of large initializers), the first two already with r15-5958 (#embed optimization for C++). For conversion to initializer_list<unsigned char> or char/signed char we can optimize and keep RAW_DATA_CST with adjusted type if we report narrowing errors if needed, for others this converts each element separately. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118124 * call.cc (convert_like_internal): Handle RAW_DATA_CST in ck_list handling. Formatting fixes. * g++.dg/cpp/embed-15.C: New test. * g++.dg/cpp/embed-16.C: New test. * g++.dg/cpp0x/initlist-opt3.C: New test. * g++.dg/cpp0x/initlist-opt4.C: New test.
-
Kito Cheng authored
`.MASK_LEN_FOLD_LEFT_PLUS`(or `mask_len_fold_left_plus_m`) is expecting the return value will be the start value even if the length is 0. However current code gen in RISC-V backend is not meet that semantic, it will result a random garbage value if length is 0. Let example by current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64: # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0); vsetvli zero,a5,e64,m1,ta,ma vfmv.s.f v2,fa5 # insn 1 vfredosum.vs v1,v1,v2 # insn 2 vfmv.f.s fa5,v1 # insn 3 insn 1: - vfmv.s.f won't do anything if VL=0, which means v2 will contain garbage value. insn 2: - vfredosum.vs won't do anything if VL=0, and keep vd unchanged even TA. (v-spec say: `If vl=0, no operation is performed and the destination register is not updated.`) insn 3: - vfmv.f.s will move the value from v1 even VL=0, so this is safe. So how we fix that? we need two fix for that: 1. insn 1: need always execute with VL=1, so that we can guarantee it will always work as expect. 2. insn 2: Add new pattern to force `vd` use same reg as `vs1` (start value) for all reduction patterns, then we can guarantee vd[0] will contain the start value when vl=0 For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2, we have to add _VL0_SAFE variant reduction to force `vd` use same reg as `vs1` (start value). Change since V3: - Rename _AV to _VL0_SAFE for readability. - Use non-VL0_SAFE version if VL is const or VLMAX. - Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX. - Two more testcase. gcc/ChangeLog: PR target/118182 * config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust argument for expand_reduction. (*widen_reduc_plus_scal_<mode>): Ditto. (*fold_left_widen_plus_<mode>): Ditto. (*mask_len_fold_left_widen_plus_<mode>): Ditto. (*cond_widen_reduc_plus_scal_<mode>): Ditto. (*cond_len_widen_reduc_plus_scal_<mode>): Ditto. (*cond_widen_reduc_plus_scal_<mode>): Ditto. * config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for expand_reduction. (reduc_smax_scal_<mode>): Ditto. (reduc_umax_scal_<mode>): Ditto. (reduc_smin_scal_<mode>): Ditto. (reduc_umin_scal_<mode>): Ditto. (reduc_and_scal_<mode>): Ditto. (reduc_ior_scal_<mode>): Ditto. (reduc_xor_scal_<mode>): Ditto. (reduc_plus_scal_<mode>): Ditto. (reduc_smax_scal_<mode>): Ditto. (reduc_smin_scal_<mode>): Ditto. (reduc_fmax_scal_<mode>): Ditto. (reduc_fmin_scal_<mode>): Ditto. (fold_left_plus_<mode>): Ditto. (mask_len_fold_left_plus_<mode>): Ditto. * config/riscv/riscv-v.cc (expand_reduction): Add one more argument for reduction code for vl0-safe. * config/riscv/riscv-protos.h (expand_reduction): Ditto. * config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of reduction. (ANY_REDUC_VL0_SAFE): New. (ANY_WREDUC_VL0_SAFE): Ditto. (ANY_FREDUC_VL0_SAFE): Ditto. (ANY_FREDUC_SUM_VL0_SAFE): Ditto. (ANY_FWREDUC_SUM_VL0_SAFE): Ditto. (reduc_op): Add _VL0_SAFE variant of reduction. (order) Ditto. * config/riscv/vector.md (@pred_<reduc_op><mode>): New. gcc/testsuite/ChangeLog: PR target/118182 * gfortran.target/riscv/rvv/pr118182.f: New. * gcc.target/riscv/rvv/autovec/pr118182-1.c: New. * gcc.target/riscv/rvv/autovec/pr118182-2.c: New.
-
Richard Biener authored
When we have the situation of an external SLP node that is permuted the scalar stmts recorded in the permute node do not mean the scalar computation can be removed. We are removing those stmts from the vectorized_scalar_stmts for this reason but we fail to check this set when we cost scalar stmts. Note vectorized_scalar_stmts isn't a complete set so also pass scalar_stmts_in_externs and check that. The following fixes this. This shows in PR115777 when we avoid vectorizing the load, but on it's own doesn't help the PR yet. PR tree-optimization/115777 * tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not cost a scalar stmt that needs to be preserved.
-
Michal Jires authored
I used link() to create cheap copies of Incremental LTO cache contents to prevent their deletion once linking is finished. This is unnecessary, since output_files are deleted in our lto-plugin and not in the linker itself. Bootstrapped/regtested on x86_64-linux. lto-wrapper now again builds on MinGW. Though so far I have not setup MinGW to be able to do full bootstrap. Ok for trunk? PR lto/118238 gcc/ChangeLog: * lto-wrapper.cc (run_gcc): Remove link() copying. lto-plugin/ChangeLog: * lto-plugin.c (cleanup_handler): Keep output_files when using Incremental LTO. (onload): Detect Incremental LTO.
-
Anton Blanchard authored
Clearly an oversight in the generic-ooo model caught by the checking code. I should have realized it was generic-ooo as we don't have a pipeline description for the tenstorrent design yet, just the costing model. The patch was extracted from the BZ which indicated Anton was the author, so I kept that. I'm listed as co-author just in case someone wants to complain about the testcase in the future. I didn't do any notable lifting here. Thanks Peter and Anton! PR target/118170 gcc/ * config/riscv/generic-ooo.md (generic_ooo_float_div_half): New reservation. gcc/testsuite * gcc.target/riscv/pr118170.c: New test. Co-authored-by:
Jeff Law <jlaw@ventanamicro.com>
-
Richard Sandiford authored
> The BZ in question is a failure to recognize a pair of shifts as a sign > extension. > > I originally thought simplify-rtx would be the right framework to > address this problem, but fwprop is actually better. We can write the > recognizer much simpler in that framework. > > fwprop already simplifies nested shifts/extensions to the desired RTL, > but it's not considered profitable and we throw away the good work done > by fwprop & simplifiers. > > It's hard to see a scenario where nested shifts or nested extensions > that simplify down to a single sign/zero extension isn't a profitable > transformation. So when fwprop has nested shifts/extensions that > simplifies to an extension, we consider it profitable. > > This allow us to simplify the testcase on rv64 with ZBB enabled from a > pair of shifts to a single byte or half-word sign extension. Hmm. So just to summarise something that was discussed in the PR comments, this is a case where combine's expand_compound_operation/ make_compound_operation wrangler hurts us, because the process isn't idempotent, and combine produces two complex instructions: (insn 6 3 7 2 (set (reg:DI 137 [ _3 ]) (ashift:DI (reg:DI 139 [ x ]) (const_int 24 [0x18]))) "foo.c":2:20 305 {ashldi3} (expr_list:REG_DEAD (reg:DI 139 [ x ]) (nil))) (insn 12 7 13 2 (set (reg/i:DI 10 a0) (sign_extend:DI (ashiftrt:SI (subreg:SI (reg:DI 137 [ _3 ]) 0) (const_int 24 [0x18])))) "foo.c":2:27 321 {ashrsi3_extend} (expr_list:REG_DEAD (reg:DI 137 [ _3 ]) (nil))) given two simple instructions: (insn 6 3 7 2 (set (reg:SI 137 [ _3 ]) (sign_extend:SI (subreg:QI (reg/v:DI 136 [ x ]) 0))) "foo.c":2:20 533 {*extendqisi2_bitmanip} (expr_list:REG_DEAD (reg/v:DI 136 [ x ]) (nil))) (insn 7 6 12 2 (set (reg:DI 138 [ _3 ]) (sign_extend:DI (reg:SI 137 [ _3 ]))) "foo.c":2:20 discrim 1 133 {*extendsidi2_internal} (expr_list:REG_DEAD (reg:SI 137 [ _3 ]) (nil))) If I run with -fdisable-rtl-combine then late_combine1 already does the expected transformation. Although it would be nice to fix combine, that might be difficult. If we treat combine as immutable then the options are: (1) Teach simplify-rtx to simplify combine's output into a single sign_extend. (2) Allow fwprop1 to get in first, before combine has a chance to mess things up. The patch goes for (2). Is that a fair summary? Playing devil's advocate, I suppose one advantage of (1) is that it would allow the optimisation even if the original rtl looked like combine's output. And fwprop1 doesn't distinguish between cases in which the source instruction disappears from cases in which the source instruction is kept. Thus we could transform: (set (reg:SI R2) (sign_extend:SI (reg:QI R1))) (set (reg:DI R3) (sign_extend:DI (reg:SI R2))) into: (set (reg:SI R2) (sign_extend:SI (reg:QI R1))) (set (reg:DI R3) (sign_extend:DI (reg:QI R1))) which increases the register pressure between the two instructions (since R2 and R1 are both now live). In general, there could be quite a gap between the two instructions. On the other hand, even in that case, fwprop1 would be parallelising the extensions. And since we're talking about unary operations, even two-address targets would allow R1 to be extended without tying the source and destination. Also, it seems relatively unlikely that expand would produce code that looks like combine's, since the gimple optimisers should have simplified it into conversions. So initially I was going to agree that it's worth trying in fwprop. But... [ commentary on Jeff's original approach dropped. ] So it seems like it's a bit of a mess
If we do try to fix combine, I think something like the attached would fit within the current scheme. It is a pure shift-for-shift transformation, avoiding any extensions. Will think more about it, but wanted to get the above stream of consciousness out before I finish for the day PR rtl-optimization/109592 gcc/ * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Simplify nested shifts with subregs. gcc/testsuite * gcc.target/riscv/pr109592.c: New test. * gcc.target/riscv/sign-extend-rshift.c: Adjust expected output Co-authored-by:Jeff Law <jlaw@ventanamicro.com>
-
GCC Administrator authored
-
- Jan 14, 2025
-
-
anetczuk authored
Raw dump of lang tree was missing information about virtual method call. The information is provided in "tok" field of obj_type_ref. gcc/ChangeLog: * tree-dump.cc (dequeue_and_dump): Handle OBJ_TYPE_REF. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/lang-dump-1.C: New test.
-
Iain Buclaw authored
D front-end changes: - Import latest fixes from dmd v2.110.0-rc.1. D runtime changes: - Import latest fixes from druntime v2.110.0-rc.1. Phobos changes: - Import latest fixes from phobos v2.110.0-rc.1. Included in the merge are fixes for the following PRs: PR d/118438 PR d/118448 PR d/118449 gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd d6f693b46a. * d-incpath.cc (add_import_paths): Update for new front-end interface. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime d6f693b46a. * src/MERGE: Merge upstream phobos 336bed6d8. * testsuite/libphobos.init_fini/custom_gc.d: Adjust test.
-
Alexandre Oliva authored
Arrange for decode_field_reference to use local variables throughout, to modify the out parms only when we're about to return non-NULL, and to drop the unused case of NULL pand_mask, that had a latent failure to detect signbit masking. for gcc/ChangeLog * gimple-fold.cc (decode_field_reference): Rebustify to set out parms only when returning non-NULL. (fold_truth_andor_for_ifcombine): Bail if decode_field_reference returns NULL. Add complementary assert on r_const's not being set when l_const isn't.
-
Marek Polacek authored
In c++/102990 we had a problem where massage_init_elt got {}, digest_nsdmi_init turned that {} into { .value = (int) 1.0e+0 }, and we crashed in the call to fold_non_dependent_init because a FIX_TRUNC_EXPR/FLOAT_EXPR got into tsubst*. So we avoided calling fold_non_dependent_init for a CONSTRUCTOR. But that broke the following test, where we no longer fold the CONST_DECL in { .type = ZERO } to { .type = 0 } and then process_init_constructor_array does: if (next != error_mark_node && (initializer_constant_valid_p (next, TREE_TYPE (next)) != null_pointer_node)) { /* Use VEC_INIT_EXPR for non-constant initialization of trailing elements with no explicit initializers. */ picflags |= PICFLAG_VEC_INIT; because { .type = ZERO } isn't initializer_constant_valid_p. Then we create a VEC_INIT_EXPR and say we can't convert the argument. So we have to fold the elements of the CONSTRUCTOR. We just can't instantiate the elements in a template. This also fixes c++/118047. PR c++/118047 PR c++/118355 gcc/cp/ChangeLog: * typeck2.cc (massage_init_elt): Call fold_non_dependent_init unless for a CONSTRUCTOR in a template. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/nsdmi-list10.C: New test. * g++.dg/cpp0x/nsdmi-list9.C: New test. Reviewed-by:
Jason Merrill <jason@redhat.com>
-
Sandra Loosemore authored
After reimplementing late resolution of "declare variant", the declare_variant_alt and calls_declare_variant_alt flags on struct cgraph_node are no longer used by anything. For the purposes of marking functions that need late resolution, the has_omp_variant_constructs flag has replaced calls_declare_variant_alt. Likewise struct omp_declare_variant_entry, struct omp_declare_variant_base_entry, and the hash tables used to store these structures are no longer needed, since the information needed for late resolution is now stored in the gomp_variant_construct nodes. In addition, some obsolete code that was temporarily ifdef'ed out instead of delted in order to produce a more readable patch for the previous installment of this series is now removed entirely. There are no functional changes in this patch, just removing dead code. gcc/ChangeLog * cgraph.cc (symbol_table::create_edge): Don't set calls_declare_variant_alt in the caller. * cgraph.h (struct cgraph_node): Remove declare_variant_alt and calls_declare_variant_alt flags. * cgraphclones.cc (cgraph_node::create_clone): Don't copy calls_declare_variant_alt bit. * gimplify.cc: Remove previously #ifdef-ed out code. * ipa-free-lang-data.cc (free_lang_data_in_decl): Adjust code referencing declare_variant_alt bit. * ipa.cc (symbol_table::remove_unreachable_nodes): Likewise. * lto-cgraph.cc (lto_output_node): Remove references to deleted bits. (output_refs): Adjust code referencing declare_variant_alt bit. (input_overwrite_node): Remove references to deleted bits. (input_refs): Adjust code referencing declare_variant_alt bit. * lto-streamer-out.cc (lto_output): Likewise. * lto-streamer.h (omp_lto_output_declare_variant_alt): Delete. (omp_lto_input_declare_variant_alt): Delete. * omp-expand.cc (expand_omp_target): Use has_omp_variant_constructs bit to trigger pass_omp_device_lower instead of calls_declare_variant_alt. * omp-general.cc (struct omp_declare_variant_entry): Delete. (struct omp_declare_variant_base_entry): Delete. (struct omp_declare_variant_hasher): Delete. (omp_declare_variant_hasher::hash): Delete. (omp_declare_variant_hasher::equal): Delete. (omp_declare_variants): Delete. (omp_declare_variant_alt_hasher): Delete. (omp_declare_variant_alt_hasher::hash): Delete. (omp_declare_variant_alt_hasher::equal): Delete. (omp_declare_variant_alt): Delete. (omp_lto_output_declare_variant_alt): Delete. (omp_lto_input_declare_variant_alt): Delete. (includes): Delete unnecessary include of gt-omp-general.h. * omp-offload.cc (execute_omp_device_lower): Remove references to deleted bit. (pass_omp_device_lower::gate): Likewise. * omp-simd-clone.cc (simd_clone_create): Likewise. * passes.cc (ipa_write_summaries): Likeise. * symtab.cc (symtab_node::get_partitioning_class): Likewise. * tree-inline.cc (expand_call_inline): Likewise. (tree_function_versioning): Likewise. gcc/lto/ChangeLog * lto-partition.cc (lto_balanced_map): Adjust code referencing deleted declare_variant_alt bit.
-