- Jan 15, 2025
-
-
Jonathan Wakely authored
libstdc++-v3/ChangeLog: PR libstdc++/109849 * include/bits/vector.tcc (vector::_M_range_insert): Fix reversed args in length calculation.
-
Harald Anlauf authored
PR fortran/71884 gcc/fortran/ChangeLog: * resolve.cc (resolve_allocate_expr): Reject intrinsic NULL as source-expr. gcc/testsuite/ChangeLog: * gfortran.dg/pr71884.f90: New test.
-
Jakub Jelinek authored
This patch uses the count_ctor_elements function to fix up unify deduction of array sizes. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118390 * cp-tree.h (count_ctor_elements): Declare. * call.cc (count_ctor_elements): No longer static. * pt.cc (unify): Use count_ctor_elements instead of CONSTRUCTOR_NELTS. * g++.dg/cpp/embed-20.C: New test. * g++.dg/cpp0x/pr118390.C: New test.
-
Wilco Dijkstra authored
Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW. gcc: * config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.
-
Wilco Dijkstra authored
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is already enabled for some cores, but benchmarking it shows it is faster on all modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1). gcc: * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA. * config/aarch64/tuning_models/ampere1b.h: Remove redundant AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA. * config/aarch64/tuning_models/neoversev2.h: Likewise.
-
Wilco Dijkstra authored
ILP32 was originally intended to make porting to AArch64 easier. Support was never merged in the Linux kernel or GLIBC, so it has been unsupported for many years. There isn't a benefit in keeping unsupported features forever, so deprecate it now (and it could be removed in a future release). gcc: * config/aarch64/aarch64.cc (aarch64_override_options): Add warning. * doc/invoke.texi: Document -mabi=ilp32 as deprecated. gcc/testsuite: * gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated. * gcc.target/aarch64/pr100518.c: Likewise. * gcc.target/aarch64/pr113114.c: Likewise. * gcc.target/aarch64/pr80295.c: Likewise. * gcc.target/aarch64/pr94201.c: Likewise. * gcc.target/aarch64/pr94577.c: Likewise. * gcc.target/aarch64/sve/pr108603.c: Likewise.
-
Cupertino Miranda authored
CO-RE accesses with non-pointer struct variables will also generate a "0" string access within the CO-RE relocation. The first index within the access string has a somewhat different meaning than the remaining indexes. For i0:i1:...:in being an access index for a "struct A a" declaration, its semantics are represented by: (&a + (sizeof(struct A) * i0) + offsetof(i1:...:in)) gcc/ChangeLog: * config/bpf/core-builtins.cc (compute_field_expr): Change VAR_DECL outcome in switch case. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-builtin-1.c: Correct test. * gcc.target/bpf/core-builtin-2.c: Correct test. * gcc.target/bpf/core-builtin-exprlist-1.c: Correct test.
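The address formula in the message above can be sketched in plain C++. This is only an illustration under stated assumptions: `Inner`, `A`, `core_byte_offset` and `offset_in1_y` are hypothetical names that do not come from the GCC sources or testsuite, and the mapping of the access string "0:1:1:1" to `a.in[1].y` assumes the usual CO-RE convention of member and array indexes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types used only for this illustration.
struct Inner { int x; int y; };
struct A { char tag; Inner in[2]; long tail; };

// The first index i0 scales by the size of the whole object, while the
// remaining indexes i1:...:in contribute an ordinary field offset.
constexpr std::size_t core_byte_offset(std::size_t i0,
                                       std::size_t object_size,
                                       std::size_t field_offset) {
  return i0 * object_size + field_offset;
}

// Byte offset of a.in[1].y inside one struct A, i.e. the "1:1:1" tail of
// the assumed access string "0:1:1:1" (member 1, element 1, member 1).
inline std::size_t offset_in1_y() {
  return offsetof(A, in) + 1 * sizeof(Inner) + offsetof(Inner, y);
}
```

With `struct A a`, an i0 of 0 stays inside `a` itself, which is why a non-pointer variable still produces a leading "0" in the access string.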
-
Cupertino Miranda authored
When traversing gimple to introduce CO-RE relocation entries for expressions that access types with the preserve_access_index attribute, the access is likely to be split across multiple gimple statements. In order to keep doing the proper CO-RE conversion we need to mark the LHS tree nodes of gimple expressions as explicit CO-RE accesses, such that the gimple traverser will further convert the sub-expressions. This patch makes sure that this LHS marking does not happen when the gimple statement is a function call, in which case CO-RE accesses are no longer expected to be generated for the remainder of the expression. gcc/ChangeLog: * config/bpf/core-builtins.cc (make_gimple_core_safe_access_index): Fix in condition. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-attr-calls.c: New test.
-
Cupertino Miranda authored
Based on observation within bpf-next selftests and comparison of GCC and clang compiled code, the BPF loader expects all CO-RE relocations to point to BTF non-const and non-volatile type nodes. gcc/ChangeLog: * btfout.cc (get_btf_kind): Remove static from function definition. * config/bpf/btfext-out.cc (bpf_code_reloc_add): Check if CO-RE type is not a const or volatile. * ctfc.h (btf_dtd_kind): Add prototype for function. gcc/testsuite/ChangeLog: * gcc.target/bpf/core-attr-const.c: New test.
-
Jakub Jelinek authored
As the following testcases show (mangle80.C only after reversion of the temporary reversion of C++ large array speedup commit), RAW_DATA_CST can be seen during mangling of some templates and we ICE because the mangler doesn't handle it. The following patch handles it and mangles it the same as a sequence of INTEGER_CSTs that were used previously instead. The only slight complication is that if ce->value is the last nonzero element, we need to skip the zeros at the end of RAW_DATA_CST. 2025-01-03 Jakub Jelinek <jakub@redhat.com> PR c++/118278 * mangle.cc (write_expression): Handle RAW_DATA_CST. * g++.dg/abi/mangle80.C: New test. * g++.dg/cpp/embed-19.C: New test.
-
Marek Polacek authored
Compiling this test, we emit: error: 'static void CW<T>::operator=(int) requires requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type x) {x;}' must be a non-static member function where the DECLTYPE_TYPE isn't printed properly. This patch fixes that to print: error: 'static void CW<T>::operator=(int) requires requires(typename decltype(T())::type x) {x;}' must be a non-static member function PR c++/118139 gcc/cp/ChangeLog: * cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle a computed-type-specifier. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/decltype1.C: New test. Reviewed-by:
Jason Merrill <jason@redhat.com>
-
Jonathan Wakely authored
libstdc++-v3/ChangeLog: * testsuite/28_regex/traits/char/transform_primary.cc: Fix subclause numbering in references to the standard.
-
Tamar Christina authored
In g:3c32575e I made a mistake and incorrectly replaced the type of the arguments of an expression with the type of the expression. This is of course wrong. This reverts that change and I have also double checked the other replacements and they are fine. gcc/ChangeLog: PR middle-end/118472 * fold-const.cc (operand_compare::operand_equal_p): Fix incorrect replacement. gcc/testsuite/ChangeLog: PR middle-end/118472 * gcc.dg/pr118472.c: New test.
-
Richard Biener authored
The following adds /* <num> */ to dbg_line_numbers so it is easier to look up the ID of the match.pd line number used for dumping when you want to debug a specific replacement. It also cuts the lines down to 10 entries. static int dbg_line_numbers[1267] = { /* 0 */ 161, 164, 173, 175, 178, 181, 183, 189, 197, 195, /* 10 */ 199, 201, 205, 923, 921, 2060, 2071, 2052, 2058, 2063, ... * genmatch.cc (define_dump_logs): Make reverse lookup in dbg_line_numbers easier by adding comments with start index and cutting number of elements per line to 10.
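The output layout described above (an index comment at the start of each row, ten entries per row) can be reproduced with a small formatter. This is a sketch only, not the genmatch.cc code: `format_indexed_array` is a hypothetical helper, and the exact whitespace genmatch emits may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: emit an int array definition with ten entries per
// row, each row prefixed by a comment holding the index of its first entry.
std::string format_indexed_array(const std::string& name,
                                 const std::vector<int>& values) {
  std::string out = "static int " + name + "[" +
                    std::to_string(values.size()) + "] = {";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i % 10 == 0)
      out += "\n  /* " + std::to_string(i) + " */ ";
    out += std::to_string(values[i]);
    if (i + 1 != values.size())
      out += (i % 10 == 9) ? "," : ", ";  // no trailing space at row end
  }
  out += "\n};\n";
  return out;
}
```

With rows of ten, an ID reported in a dump can be located by finding the row whose comment is the ID rounded down to a multiple of ten and counting across.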
-
Christoph Müllner authored
As reported in PR117079, commit ab187858 broke the test pr105493.c. The test code contains two loops, where the first one is expected to be vectorized. The commit that broke that vectorization was the first of several that enabled vectorization of both loops. Now that GCC can vectorize the whole function, let's adjust this test to expect vectorization of both loops by ensuring that we don't write to the helper-array 'tmp'. Signed-off-by:
Christoph Müllner <christoph.muellner@vrull.eu> PR target/117079 gcc/testsuite/ChangeLog: * gcc.target/i386/pr105493.c: Fix expected vectorization Signed-off-by:
Christoph Müllner <christoph.muellner@vrull.eu>
-
Tobias Burnus authored
To find the variant declaration, a call is constructed in omp_declare_variant_finalize_one, which gives here: TARGET_EXPR <D.3010, variant_fn ()> Extracting now the function declaration failed and gave the bogus error: could not find variant declaration Solution: Use the 2nd argument of the TARGET_EXPR and continue. PR c++/118486 gcc/cp/ChangeLog: * decl.cc (omp_declare_variant_finalize_one): When resolving the variant to use, handle variant calls with TARGET_EXPR. gcc/testsuite/ChangeLog: * g++.dg/gomp/declare-variant-11.C: New test.
-
Jakub Jelinek authored
Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL); before running a pass list and bitmap_obstack_release (NULL); after that, while process_new_functions wasn't doing that and with the new r15-130 bitmap_alloc checking that results in ICE. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR ipa/116068 * cgraphunit.cc (symbol_table::process_new_functions): Call bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL) around processing the functions. * gcc.dg/graphite/pr116068.c: New test.
-
Jakub Jelinek authored
c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't convert to its rettype [PR118387] Note, the PR raises another problem. If on the same testcase the B b; line is removed, we silently synthesize operator<=> which will crash at runtime due to returning without a return statement. That is because the standard says that in that case it should return static_cast<int>(std::strong_ordering::equal); but I can't find anywhere wording which would say that if that isn't valid, the function is deleted. https://eel.is/c++draft/class.compare#class.spaceship-2.2 seems to talk just about cases where there are some members and their comparison is invalid it is deleted, but here there are none and it follows https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2 So, we synthesize with tf_none, see the static_cast is invalid, don't add error_mark_node statement silently, but as the function isn't deleted, we just silently emit it. Should the standard be amended to say that the operator should be deleted even if it has no elements and the static cast from https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2 On Fri, Jan 10, 2025 at 12:04:53PM -0500, Jason Merrill wrote: > That seems pretty obviously what we want, and is what the other compilers > implement. This patch implements it then. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118387 * method.cc (build_comparison_op): Set bad if std::strong_ordering::equal doesn't convert to rettype. * g++.dg/cpp2a/spaceship-err6.C: Expect another error. * g++.dg/cpp2a/spaceship-synth17.C: Likewise. * g++.dg/cpp2a/spaceship-synth-neg6.C: Likewise. * g++.dg/cpp2a/spaceship-synth-neg7.C: New test. * testsuite/25_algorithms/default_template_value.cc (Input::operator<=>): Use auto as return type rather than bool.
-
Jakub Jelinek authored
The previous patch made me look around some more and I found maybe_init_list_as_array doesn't handle RAW_DATA_CSTs correctly either, while the RAW_DATA_CST is properly split during finish_compound_literal, it was using CONSTRUCTOR_NELTS as the size of the arrays, which is wrong, RAW_DATA_CST could stand for far more initializers. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118124 * cp-tree.h (build_array_of_n_type): Change second argument type from int to unsigned HOST_WIDE_INT. * tree.cc (build_array_of_n_type): Likewise. * call.cc (count_ctor_elements): New function. (maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS. (convert_like_internal): Use length from init's type instead of len when handling the maybe_init_list_as_array case. * g++.dg/cpp0x/initlist-opt5.C: New test.
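The difference between counting constructor entries and counting the initializers they stand for can be modelled outside the compiler. The sketch below is a toy model under loud assumptions: `CtorElt`, `entry_count` and `initializer_count` are hypothetical names that only mirror the idea behind CONSTRUCTOR_NELTS and count_ctor_elements, and none of this is GCC tree code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one constructor entry: either a single
// initializer or a raw-data chunk that represents raw_length initializers.
struct CtorElt {
  bool is_raw_data;
  std::size_t raw_length;  // only meaningful when is_raw_data is true
};

// Analogue of counting entries: one per element, whatever it stands for.
std::size_t entry_count(const std::vector<CtorElt>& ctor) {
  return ctor.size();
}

// Analogue of counting initializers: a raw-data chunk contributes its
// full length, every other entry contributes one.
std::size_t initializer_count(const std::vector<CtorElt>& ctor) {
  std::size_t n = 0;
  for (const CtorElt& e : ctor)
    n += e.is_raw_data ? e.raw_length : 1;
  return n;
}
```

Sizing an array from `entry_count` would make it too small as soon as one entry is a raw-data chunk, which is the kind of mismatch the message describes.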
-
Jakub Jelinek authored
The following testcases ICE due to RAW_DATA_CST not being handled where it should be during ck_list conversions. The last 2 testcases started ICEing with r15-6339 committed yesterday (speedup of large initializers), the first two already with r15-5958 (#embed optimization for C++). For conversion to initializer_list<unsigned char> or char/signed char we can optimize and keep RAW_DATA_CST with adjusted type if we report narrowing errors if needed, for others this converts each element separately. 2025-01-15 Jakub Jelinek <jakub@redhat.com> PR c++/118124 * call.cc (convert_like_internal): Handle RAW_DATA_CST in ck_list handling. Formatting fixes. * g++.dg/cpp/embed-15.C: New test. * g++.dg/cpp/embed-16.C: New test. * g++.dg/cpp0x/initlist-opt3.C: New test. * g++.dg/cpp0x/initlist-opt4.C: New test.
-
Kito Cheng authored
`.MASK_LEN_FOLD_LEFT_PLUS` (or `mask_len_fold_left_plus_m`) expects the return value to be the start value even if the length is 0. However, the current code generation in the RISC-V backend does not meet that semantic: it results in a random garbage value if the length is 0. Take as an example the current code generation for MASK_LEN_FOLD_LEFT_PLUS with f64: # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0); vsetvli zero,a5,e64,m1,ta,ma vfmv.s.f v2,fa5 # insn 1 vfredosum.vs v1,v1,v2 # insn 2 vfmv.f.s fa5,v1 # insn 3 insn 1: - vfmv.s.f won't do anything if VL=0, which means v2 will contain a garbage value. insn 2: - vfredosum.vs won't do anything if VL=0, and keeps vd unchanged even with TA. (The v-spec says: `If vl=0, no operation is performed and the destination register is not updated.`) insn 3: - vfmv.f.s will move the value from v1 even if VL=0, so this is safe. So how do we fix that? We need two fixes: 1. insn 1: must always execute with VL=1, so that we can guarantee it will always work as expected. 2. insn 2: Add a new pattern to force `vd` to use the same reg as `vs1` (start value) for all reduction patterns, so that we can guarantee vd[0] will contain the start value when vl=0. For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2, we have to add a _VL0_SAFE variant of each reduction to force `vd` to use the same reg as `vs1` (start value). Changes since V3: - Rename _AV to _VL0_SAFE for readability. - Use non-VL0_SAFE version if VL is const or VLMAX. - Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX. - Two more testcases. gcc/ChangeLog: PR target/118182 * config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust argument for expand_reduction. (*widen_reduc_plus_scal_<mode>): Ditto. (*fold_left_widen_plus_<mode>): Ditto. (*mask_len_fold_left_widen_plus_<mode>): Ditto. (*cond_widen_reduc_plus_scal_<mode>): Ditto. (*cond_len_widen_reduc_plus_scal_<mode>): Ditto. (*cond_widen_reduc_plus_scal_<mode>): Ditto. 
* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for expand_reduction. (reduc_smax_scal_<mode>): Ditto. (reduc_umax_scal_<mode>): Ditto. (reduc_smin_scal_<mode>): Ditto. (reduc_umin_scal_<mode>): Ditto. (reduc_and_scal_<mode>): Ditto. (reduc_ior_scal_<mode>): Ditto. (reduc_xor_scal_<mode>): Ditto. (reduc_plus_scal_<mode>): Ditto. (reduc_smax_scal_<mode>): Ditto. (reduc_smin_scal_<mode>): Ditto. (reduc_fmax_scal_<mode>): Ditto. (reduc_fmin_scal_<mode>): Ditto. (fold_left_plus_<mode>): Ditto. (mask_len_fold_left_plus_<mode>): Ditto. * config/riscv/riscv-v.cc (expand_reduction): Add one more argument for reduction code for vl0-safe. * config/riscv/riscv-protos.h (expand_reduction): Ditto. * config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of reduction. (ANY_REDUC_VL0_SAFE): New. (ANY_WREDUC_VL0_SAFE): Ditto. (ANY_FREDUC_VL0_SAFE): Ditto. (ANY_FREDUC_SUM_VL0_SAFE): Ditto. (ANY_FWREDUC_SUM_VL0_SAFE): Ditto. (reduc_op): Add _VL0_SAFE variant of reduction. (order) Ditto. * config/riscv/vector.md (@pred_<reduc_op><mode>): New. gcc/testsuite/ChangeLog: PR target/118182 * gfortran.target/riscv/rvv/pr118182.f: New. * gcc.target/riscv/rvv/autovec/pr118182-1.c: New. * gcc.target/riscv/rvv/autovec/pr118182-2.c: New.
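The zero-length problem in this commit can be illustrated with a scalar toy model. It is a sketch under stated assumptions and not the RISC-V backend code: `fold_left_plus_ref` and `vfredosum_model` are hypothetical functions, masks are ignored, and the only hardware rule modelled is the quoted one that the destination is not updated when vl=0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference semantics: an in-order sum of the first len elements, which
// yields the start value unchanged when len is 0.
double fold_left_plus_ref(double start, const std::vector<double>& vec,
                          std::size_t len) {
  double acc = start;
  for (std::size_t i = 0; i < len && i < vec.size(); ++i)
    acc += vec[i];
  return acc;
}

// Toy model of `vfredosum.vs vd, vs2, vs1`: with vl == 0 nothing happens
// and the old contents of vd survive; otherwise vd[0] = vs1[0] + sum(vs2).
double vfredosum_model(double vd_old, const std::vector<double>& vs2,
                       double vs1_elem0, std::size_t vl) {
  if (vl == 0)
    return vd_old;  // destination register is not updated
  double acc = vs1_elem0;
  for (std::size_t i = 0; i < vl && i < vs2.size(); ++i)
    acc += vs2[i];
  return acc;
}
```

In this model, passing the start value as both `vd_old` and `vs1_elem0` mimics tying `vd` to `vs1`: the vl=0 case then returns the start value, whereas an unrelated `vd_old` leaks through as the result.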
-
Richard Biener authored
When we have the situation of an external SLP node that is permuted, the scalar stmts recorded in the permute node do not mean the scalar computation can be removed. We are removing those stmts from the vectorized_scalar_stmts for this reason, but we fail to check this set when we cost scalar stmts. Note vectorized_scalar_stmts isn't a complete set so also pass scalar_stmts_in_externs and check that. The following fixes this. This shows in PR115777 when we avoid vectorizing the load, but on its own doesn't help the PR yet. PR tree-optimization/115777 * tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not cost a scalar stmt that needs to be preserved.
-
Michal Jires authored
I used link() to create cheap copies of Incremental LTO cache contents to prevent their deletion once linking is finished. This is unnecessary, since output_files are deleted in our lto-plugin and not in the linker itself. Bootstrapped/regtested on x86_64-linux. lto-wrapper now builds on MinGW again, though so far I have not set up MinGW to be able to do a full bootstrap. Ok for trunk? PR lto/118238 gcc/ChangeLog: * lto-wrapper.cc (run_gcc): Remove link() copying. lto-plugin/ChangeLog: * lto-plugin.c (cleanup_handler): Keep output_files when using Incremental LTO. (onload): Detect Incremental LTO.
-
Anton Blanchard authored
Clearly an oversight in the generic-ooo model caught by the checking code. I should have realized it was generic-ooo as we don't have a pipeline description for the tenstorrent design yet, just the costing model. The patch was extracted from the BZ which indicated Anton was the author, so I kept that. I'm listed as co-author just in case someone wants to complain about the testcase in the future. I didn't do any notable lifting here. Thanks Peter and Anton! PR target/118170 gcc/ * config/riscv/generic-ooo.md (generic_ooo_float_div_half): New reservation. gcc/testsuite * gcc.target/riscv/pr118170.c: New test. Co-authored-by:
Jeff Law <jlaw@ventanamicro.com>
-
Richard Sandiford authored
> The BZ in question is a failure to recognize a pair of shifts as a sign > extension. > > I originally thought simplify-rtx would be the right framework to > address this problem, but fwprop is actually better. We can write the > recognizer much simpler in that framework. > > fwprop already simplifies nested shifts/extensions to the desired RTL, > but it's not considered profitable and we throw away the good work done > by fwprop & simplifiers. > > It's hard to see a scenario where nested shifts or nested extensions > that simplify down to a single sign/zero extension isn't a profitable > transformation. So when fwprop has nested shifts/extensions that > simplifies to an extension, we consider it profitable. > > This allow us to simplify the testcase on rv64 with ZBB enabled from a > pair of shifts to a single byte or half-word sign extension. Hmm. So just to summarise something that was discussed in the PR comments, this is a case where combine's expand_compound_operation/ make_compound_operation wrangler hurts us, because the process isn't idempotent, and combine produces two complex instructions: (insn 6 3 7 2 (set (reg:DI 137 [ _3 ]) (ashift:DI (reg:DI 139 [ x ]) (const_int 24 [0x18]))) "foo.c":2:20 305 {ashldi3} (expr_list:REG_DEAD (reg:DI 139 [ x ]) (nil))) (insn 12 7 13 2 (set (reg/i:DI 10 a0) (sign_extend:DI (ashiftrt:SI (subreg:SI (reg:DI 137 [ _3 ]) 0) (const_int 24 [0x18])))) "foo.c":2:27 321 {ashrsi3_extend} (expr_list:REG_DEAD (reg:DI 137 [ _3 ]) (nil))) given two simple instructions: (insn 6 3 7 2 (set (reg:SI 137 [ _3 ]) (sign_extend:SI (subreg:QI (reg/v:DI 136 [ x ]) 0))) "foo.c":2:20 533 {*extendqisi2_bitmanip} (expr_list:REG_DEAD (reg/v:DI 136 [ x ]) (nil))) (insn 7 6 12 2 (set (reg:DI 138 [ _3 ]) (sign_extend:DI (reg:SI 137 [ _3 ]))) "foo.c":2:20 discrim 1 133 {*extendsidi2_internal} (expr_list:REG_DEAD (reg:SI 137 [ _3 ]) (nil))) If I run with -fdisable-rtl-combine then late_combine1 already does the expected transformation. 
Although it would be nice to fix combine, that might be difficult. If we treat combine as immutable then the options are: (1) Teach simplify-rtx to simplify combine's output into a single sign_extend. (2) Allow fwprop1 to get in first, before combine has a chance to mess things up. The patch goes for (2). Is that a fair summary? Playing devil's advocate, I suppose one advantage of (1) is that it would allow the optimisation even if the original rtl looked like combine's output. And fwprop1 doesn't distinguish between cases in which the source instruction disappears from cases in which the source instruction is kept. Thus we could transform: (set (reg:SI R2) (sign_extend:SI (reg:QI R1))) (set (reg:DI R3) (sign_extend:DI (reg:SI R2))) into: (set (reg:SI R2) (sign_extend:SI (reg:QI R1))) (set (reg:DI R3) (sign_extend:DI (reg:QI R1))) which increases the register pressure between the two instructions (since R2 and R1 are both now live). In general, there could be quite a gap between the two instructions. On the other hand, even in that case, fwprop1 would be parallelising the extensions. And since we're talking about unary operations, even two-address targets would allow R1 to be extended without tying the source and destination. Also, it seems relatively unlikely that expand would produce code that looks like combine's, since the gimple optimisers should have simplified it into conversions. So initially I was going to agree that it's worth trying in fwprop. But... [ commentary on Jeff's original approach dropped. ] So it seems like it's a bit of a mess
If we do try to fix combine, I think something like the attached would fit within the current scheme. It is a pure shift-for-shift transformation, avoiding any extensions. Will think more about it, but wanted to get the above stream of consciousness out before I finish for the day PR rtl-optimization/109592 gcc/ * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Simplify nested shifts with subregs. gcc/testsuite * gcc.target/riscv/pr109592.c: New test. * gcc.target/riscv/sign-extend-rshift.c: Adjust expected output Co-authored-by:Jeff Law <jlaw@ventanamicro.com>
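The equivalence this discussion relies on, that a left shift by 24 followed by an arithmetic right shift by 24 on a 32-bit value is a sign extension of the low byte, can be checked directly. This is a minimal sketch: `shift_pair` and `sign_extend_byte` are hypothetical names, and it assumes two's-complement conversions and arithmetic right shifts of negative values (guaranteed from C++20, and what GCC does in earlier modes).

```cpp
#include <cassert>
#include <cstdint>

// The shift pair, computed on the low 32 bits of x. The left shift is done
// on an unsigned value so that no signed overflow is involved.
inline std::int64_t shift_pair(std::int64_t x) {
  std::int32_t t =
      static_cast<std::int32_t>(static_cast<std::uint32_t>(x) << 24);
  return t >> 24;  // arithmetic shift, then widened to 64 bits
}

// The single operation the shifts should simplify to: sign-extend the
// low byte of x.
inline std::int64_t sign_extend_byte(std::int64_t x) {
  return static_cast<std::int8_t>(x);
}
```

Both functions agree for every input, which is why a target with a byte sign-extension instruction can replace the shift pair with one instruction.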
-
GCC Administrator authored
-
- Jan 14, 2025
-
-
anetczuk authored
Raw dump of lang tree was missing information about virtual method call. The information is provided in "tok" field of obj_type_ref. gcc/ChangeLog: * tree-dump.cc (dequeue_and_dump): Handle OBJ_TYPE_REF. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/lang-dump-1.C: New test.
-
Iain Buclaw authored
D front-end changes: - Import latest fixes from dmd v2.110.0-rc.1. D runtime changes: - Import latest fixes from druntime v2.110.0-rc.1. Phobos changes: - Import latest fixes from phobos v2.110.0-rc.1. Included in the merge are fixes for the following PRs: PR d/118438 PR d/118448 PR d/118449 gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd d6f693b46a. * d-incpath.cc (add_import_paths): Update for new front-end interface. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime d6f693b46a. * src/MERGE: Merge upstream phobos 336bed6d8. * testsuite/libphobos.init_fini/custom_gc.d: Adjust test.
-
Alexandre Oliva authored
Arrange for decode_field_reference to use local variables throughout, to modify the out parms only when we're about to return non-NULL, and to drop the unused case of NULL pand_mask, that had a latent failure to detect signbit masking. for gcc/ChangeLog * gimple-fold.cc (decode_field_reference): Robustify to set out parms only when returning non-NULL. (fold_truth_andor_for_ifcombine): Bail if decode_field_reference returns NULL. Add complementary assert on r_const's not being set when l_const isn't.
-
Marek Polacek authored
In c++/102990 we had a problem where massage_init_elt got {}, digest_nsdmi_init turned that {} into { .value = (int) 1.0e+0 }, and we crashed in the call to fold_non_dependent_init because a FIX_TRUNC_EXPR/FLOAT_EXPR got into tsubst*. So we avoided calling fold_non_dependent_init for a CONSTRUCTOR. But that broke the following test, where we no longer fold the CONST_DECL in { .type = ZERO } to { .type = 0 } and then process_init_constructor_array does: if (next != error_mark_node && (initializer_constant_valid_p (next, TREE_TYPE (next)) != null_pointer_node)) { /* Use VEC_INIT_EXPR for non-constant initialization of trailing elements with no explicit initializers. */ picflags |= PICFLAG_VEC_INIT; because { .type = ZERO } isn't initializer_constant_valid_p. Then we create a VEC_INIT_EXPR and say we can't convert the argument. So we have to fold the elements of the CONSTRUCTOR. We just can't instantiate the elements in a template. This also fixes c++/118047. PR c++/118047 PR c++/118355 gcc/cp/ChangeLog: * typeck2.cc (massage_init_elt): Call fold_non_dependent_init unless for a CONSTRUCTOR in a template. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/nsdmi-list10.C: New test. * g++.dg/cpp0x/nsdmi-list9.C: New test. Reviewed-by:
Jason Merrill <jason@redhat.com>
-
Sandra Loosemore authored
After reimplementing late resolution of "declare variant", the declare_variant_alt and calls_declare_variant_alt flags on struct cgraph_node are no longer used by anything. For the purposes of marking functions that need late resolution, the has_omp_variant_constructs flag has replaced calls_declare_variant_alt. Likewise struct omp_declare_variant_entry, struct omp_declare_variant_base_entry, and the hash tables used to store these structures are no longer needed, since the information needed for late resolution is now stored in the gomp_variant_construct nodes. In addition, some obsolete code that was temporarily ifdef'ed out instead of deleted in order to produce a more readable patch for the previous installment of this series is now removed entirely. There are no functional changes in this patch, just removing dead code. gcc/ChangeLog * cgraph.cc (symbol_table::create_edge): Don't set calls_declare_variant_alt in the caller. * cgraph.h (struct cgraph_node): Remove declare_variant_alt and calls_declare_variant_alt flags. * cgraphclones.cc (cgraph_node::create_clone): Don't copy calls_declare_variant_alt bit. * gimplify.cc: Remove previously #ifdef-ed out code. * ipa-free-lang-data.cc (free_lang_data_in_decl): Adjust code referencing declare_variant_alt bit. * ipa.cc (symbol_table::remove_unreachable_nodes): Likewise. * lto-cgraph.cc (lto_output_node): Remove references to deleted bits. (output_refs): Adjust code referencing declare_variant_alt bit. (input_overwrite_node): Remove references to deleted bits. (input_refs): Adjust code referencing declare_variant_alt bit. * lto-streamer-out.cc (lto_output): Likewise. * lto-streamer.h (omp_lto_output_declare_variant_alt): Delete. (omp_lto_input_declare_variant_alt): Delete. * omp-expand.cc (expand_omp_target): Use has_omp_variant_constructs bit to trigger pass_omp_device_lower instead of calls_declare_variant_alt. * omp-general.cc (struct omp_declare_variant_entry): Delete. 
(struct omp_declare_variant_base_entry): Delete. (struct omp_declare_variant_hasher): Delete. (omp_declare_variant_hasher::hash): Delete. (omp_declare_variant_hasher::equal): Delete. (omp_declare_variants): Delete. (omp_declare_variant_alt_hasher): Delete. (omp_declare_variant_alt_hasher::hash): Delete. (omp_declare_variant_alt_hasher::equal): Delete. (omp_declare_variant_alt): Delete. (omp_lto_output_declare_variant_alt): Delete. (omp_lto_input_declare_variant_alt): Delete. (includes): Delete unnecessary include of gt-omp-general.h. * omp-offload.cc (execute_omp_device_lower): Remove references to deleted bit. (pass_omp_device_lower::gate): Likewise. * omp-simd-clone.cc (simd_clone_create): Likewise. * passes.cc (ipa_write_summaries): Likewise. * symtab.cc (symtab_node::get_partitioning_class): Likewise. * tree-inline.cc (expand_call_inline): Likewise. (tree_function_versioning): Likewise. gcc/lto/ChangeLog * lto-partition.cc (lto_balanced_map): Adjust code referencing deleted declare_variant_alt bit.
-
Sandra Loosemore authored
This patch reimplements the middle-end support for "declare variant" and extends the resolution mechanism to also handle metadirectives (PR112779). It also adds partial support for dynamic selectors (PR113904) and fixes a selector scoring bug reported as PR114596. I hope this rewrite also improves the engineering aspect of the code, e.g. more comments to explain what it is doing. In most cases, variant constructs can be resolved either in the front end or during gimplification; if the variant with the highest score has a static selector, then only that one is emitted. In the case where it has a dynamic selector, it is resolved into a (possibly nested) if/then/else construct, testing the run-time predicate for each selector sorted by decreasing order of score until a static selector is found. In some cases, notably a variant construct in a "declare simd" function which may or may not expand into a simd clone, it may not be possible to score or sort the variants until later in compilation (the ompdevlow pass). In this case the gimplifier emits a loop containing a switch statement with the variants in arbitrary order and uses the OMP_NEXT_VARIANT tree node as a placeholder to control which variant is tested on each iteration of the loop. It looks something like: switch_var = OMP_NEXT_VARIANT (0, state); loop_label: switch (switch_var) { case 1: if (dynamic_selector_predicate_1) { alternative_1; goto end_label; } else { switch_var = OMP_NEXT_VARIANT (1, state); goto loop_label; } case 2: ... } end_label: Note that when there are no dynamic selectors, the loop is unnecessary and only the switch is emitted. Finally, in the ompdevlow pass, the OMP_NEXT_VARIANT magic cookies are resolved and replaced with constants. When compiling with -O we can expect that the loop and switch will be discarded by subsequent optimizations and replaced with direct jumps between the cases, eventually arriving at code with similar control flow to the early-resolution cases. 
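The resolution order described above (candidates tried by decreasing score, dynamic predicates tested at run time, stopping at the first static selector) can be modelled in ordinary C++. This is a rough sketch with hypothetical names (`Variant`, `resolve_variant`, `always_true`, `always_false`). It does not reproduce the gimplifier's loop-and-switch expansion or the OMP_NEXT_VARIANT node, only the control flow they are said to end up with.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one candidate variant.
struct Variant {
  std::string name;
  int score;
  bool is_dynamic;      // true if the selector has a run-time condition
  bool (*predicate)();  // only consulted when is_dynamic is true
};

inline bool always_true() { return true; }
inline bool always_false() { return false; }

// Try candidates in decreasing score order. A dynamic candidate is chosen
// only if its predicate holds; the first static candidate always wins.
std::string resolve_variant(std::vector<Variant> candidates) {
  std::stable_sort(candidates.begin(), candidates.end(),
                   [](const Variant& a, const Variant& b) {
                     return a.score > b.score;
                   });
  for (const Variant& v : candidates) {
    if (!v.is_dynamic || v.predicate())
      return v.name;
  }
  return "";  // assumption of this sketch: empty name means nothing matched
}
```

When the highest-scoring candidate is static, the loop returns on its first iteration, which corresponds to the case where only one variant is emitted.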
This approach is somewhat simpler than the one currently used for handling declare variant in that all possible code paths are already included in the output of the gimplifier, so it is not necessary to maintain hidden references or data structures pointing to expansions of not-yet-resolved variant constructs and special logic for passing them through LTO (see PR lto/96680). A possible disadvantage of this expansion strategy is that dead code for unused variants in the switch can remain when compiling without -O. If this turns out to be a critical problem (e.g., an unused case includes calls to functions not available to the linker) perhaps some further processing could be performed by default after ompdevlow to simplify such constructs. In order to make this patch more readable for review purposes, it leaves the existing code for "declare variant" resolution (including the above-mentioned LTO hack) in place, in some cases just ifdef-ing out functions that won't compile due to changed interfaces for dependencies. The next patch in the series will delete all the now-unused code. gcc/ChangeLog PR middle-end/114596 PR middle-end/112779 PR middle-end/113904 * Makefile.in (GTFILES): Move omp-general.h earlier; required because of moving score_wide_int declaration to that file. * cgraph.h (struct cgraph_node): Add has_omp_variant_constructs flag. * cgraphclones.cc (cgraph_node::create_clone): Propagate has_omp_variant_constructs flag. * gimplify.cc (omp_resolved_variant_calls): New. (expand_late_variant_directive): New. (find_supercontext): New. (gimplify_variant_call_expr): New. (gimplify_call_expr): Adjust parameters to make fallback available. Update processing for "declare variant" substitution. (is_gimple_stmt): Add OMP_METADIRECTIVE. (omp_construct_selector_matches): Ifdef out unused function. (omp_get_construct_context): New. (gimplify_omp_dispatch): Replace call to deleted function omp_resolve_declare_variant with equivalent logic. 
(expand_omp_metadirective): New. (expand_late_variant_directive): New. (gimplify_omp_metadirective): New. (gimplify_expr): Adjust arguments to gimplify_call_expr. Add cases for OMP_METADIRECTIVE, OMP_NEXT_VARIANT, and OMP_TARGET_DEVICE_MATCHES. (gimplify_function_tree): Initialize/clean up omp_resolved_variant_calls. * gimplify.h (omp_construct_selector_matches): Delete declaration. (omp_get_construct_context): Declare. * lto-cgraph.cc (lto_output_node): Write has_omp_variant_constructs. (input_overwrite_node): Read has_omp_variant_constructs. * omp-builtins.def (BUILT_IN_OMP_GET_NUM_DEVICES): New. * omp-expand.cc (expand_omp_taskreg): Propagate has_omp_variant_constructs. (expand_omp_target): Likewise. * omp-general.cc (omp_maybe_offloaded): Add construct_context parameter; use it instead of querying gimplifier state. Add comments. (omp_context_name_list_prop): Do not test lang_GNU_Fortran in offload compiler, just use the string as-is. (expr_uses_parm_decl): New. (omp_check_context_selector): Add metadirective_p parameter. Remove sorry for target_device selector. Add additional checks specific to metadirective or declare variant. (make_omp_metadirective_variant): New. (omp_construct_traits_match): New. (omp_context_selector_matches): Temporarily ifdef out the previous code, and add a new implementation based on the old one with different parameters, some unnecessary loops removed, and code re-indented. (omp_target_device_matches_on_host): New. (resolve_omp_target_device_matches): New. (omp_construct_simd_compare): Support matching of "simdlen" and "aligned" clauses. (omp_context_selector_set_compare): Make static. Adjust call to omp_construct_simd_compare. (score_wide_int): Move declaration to omp-general.h. (omp_selector_is_dynamic): New. (omp_device_num_check): New. (omp_dynamic_cond): New. (omp_context_compute_score): Ifdef out the old version and re-implement with different parameters. (omp_complete_construct_context): New. 
(omp_resolve_late_declare_variant): Ifdef out. (omp_declare_variant_remove_hook): Likewise. (omp_resolve_declare_variant): Likewise. (sort_variant): New. (omp_get_dynamic_candidates): New. (omp_declare_variant_candidates): New. (omp_metadirective_candidates): New. (omp_early_resolve_metadirective): New. (omp_resolve_variant_construct): New. * omp-general.h (score_wide_int): Moved here from omp-general.cc. (struct omp_variant): New. (make_omp_metadirective_variant): Declare. (omp_construct_traits_to_codes): Delete declaration. (omp_check_context_selector): Adjust parameters. (omp_context_selector_matches): Likewise. (omp_context_selector_set_compare): Delete declaration. (omp_resolve_declare_variant): Likewise. (omp_declare_variant_candidates): Declare. (omp_metadirective_candidates): Declare. (omp_get_dynamic_candidates): Declare. (omp_early_resolve_metadirective): Declare. (omp_resolve_variant_construct): Declare. (omp_dynamic_cond): Declare. * omp-offload.cc (resolve_omp_variant_cookies): New. (execute_omp_device_lower): Call the above function to resolve variant directives. Remove call to omp_resolve_declare_variant. (pass_omp_device_lower::gate): Check has_omp_variant_construct bit. * omp-simd-clone.cc (simd_clone_create): Propagate has_omp_variant_constructs bit. * tree-inline.cc (expand_call_inline): Likewise. (tree_function_versioning): Likewise. gcc/c/ChangeLog PR middle-end/114596 PR middle-end/112779 PR middle-end/113904 * c-parser.cc (c_finish_omp_declare_variant): Update for changes to omp-general.h interfaces. gcc/cp/ChangeLog PR middle-end/114596 PR middle-end/112779 PR middle-end/113904 * decl.cc (omp_declare_variant_finalize_one): Update for changes to omp-general.h interfaces. * parser.cc (cp_finish_omp_declare_variant): Likewise. gcc/fortran/ChangeLog PR middle-end/114596 PR middle-end/112779 PR middle-end/113904 * trans-openmp.cc (gfc_trans_omp_declare_variant): Update for changes to omp-general.h interfaces. 
gcc/testsuite/ PR middle-end/114596 PR middle-end/112779 PR middle-end/113904 * c-c++-common/gomp/declare-variant-12.c: Adjust expected behavior per PR114596. * c-c++-common/gomp/declare-variant-13.c: Test that this is resolvable after gimplification, not just final resolution. * c-c++-common/gomp/declare-variant-14.c: Tweak testcase to ensure that -O causes dead code to be optimized away. * gfortran.dg/gomp/declare-variant-12.f90: Adjust expected behavior per PR114596. * gfortran.dg/gomp/declare-variant-13.f90: Test that this is resolvable after gimplification, not just final resolution. * gfortran.dg/gomp/declare-variant-14.f90: Tweak testcase to ensure that -O causes dead code to be optimized away. Co-Authored-By:
Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By:
Sandra Loosemore <sandra@codesourcery.com> Co-Authored-By:
Marcel Vollweiler <marcel@codesourcery.com>
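As a rough illustration of the expansion strategy described in this commit, the plain C sketch below models a variant call lowered to a switch in which every candidate body is already present and only the controlling index is resolved later. It is not GCC's actual gimple output; `variant_index`, `resolve_variant_index`, `call_variant`, and the three bodies are hypothetical names chosen for the example, and the bodies return distinguishable values only so the chosen path is visible.

```c
#include <assert.h>

/* Hypothetical candidate indices; these are not GCC identifiers. */
enum variant_index { VARIANT_BASE = 0, VARIANT_SIMD = 1, VARIANT_OFFLOAD = 2 };

/* Real variants would be semantically equivalent; these differ on purpose
   so that the selected path can be observed. */
static int body_base(int x)    { return x; }
static int body_simd(int x)    { return x + 100; }
static int body_offload(int x) { return x + 200; }

/* Stand-in for late resolution: returns the first candidate, in an assumed
   best-score-first order, whose selector matches the given context. */
static enum variant_index resolve_variant_index(int is_offload, int has_simd)
{
  if (is_offload)
    return VARIANT_OFFLOAD;
  if (has_simd)
    return VARIANT_SIMD;
  return VARIANT_BASE;
}

/* All code paths are emitted up front; only the switch operand is unknown
   until resolution. Without optimization the unused cases stay in the
   output as dead code. */
static int call_variant(enum variant_index idx, int x)
{
  assert(idx >= VARIANT_BASE && idx <= VARIANT_OFFLOAD);
  switch (idx) {
    case VARIANT_OFFLOAD: return body_offload(x);
    case VARIANT_SIMD:    return body_simd(x);
    default:              return body_base(x);
  }
}
```

Once the index is a compile-time constant, ordinary constant propagation and dead-code elimination can reduce the switch to a single call, which matches the note above about -O.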
-
Sandra Loosemore authored
This patch adds basic support for three new tree node types that will be used in subsequent patches to support OpenMP metadirectives and dynamic selectors. OMP_METADIRECTIVE is the internal representation of parsed OpenMP metadirective constructs. It's produced by the front ends and is expanded during gimplification. OMP_NEXT_VARIANT is used as a "magic cookie" for late resolution of variant constructs that cannot be fully resolved during gimplification, used to set the controlling variable of a switch statement that branches to the next alternative once the candidate list can be filtered and sorted. These nodes are expanded into constants in the ompdevlow pass. In some gimple passes, they need to be treated as constants. OMP_TARGET_DEVICE_MATCHES is a similar "magic cookie" used to resolve the target_device dynamic selector. It is wrapped in an OpenMP target construct, and can be resolved to a constant in the ompdevlow pass. gcc/ChangeLog: * doc/generic.texi (OpenMP): Document OMP_METADIRECTIVE, OMP_NEXT_VARIANT, and OMP_TARGET_DEVICE_MATCHES. * fold-const.cc (operand_compare::hash_operand): Ignore the new nodes. * gimple-expr.cc (is_gimple_val): Allow OMP_NEXT_VARIANT and OMP_TARGET_DEVICE_MATCHES. * gimple.cc (get_gimple_rhs_num_ops): OMP_NEXT_VARIANT and OMP_TARGET_DEVICE_MATCHES are both GIMPLE_SINGLE_RHS. * tree-cfg.cc (tree_node_can_be_shared): Allow sharing of OMP_NEXT_VARIANT. * tree-inline.cc (remap_gimple_op_r): Ignore subtrees of OMP_NEXT_VARIANT. * tree-pretty-print.cc (dump_generic_node): Handle OMP_METADIRECTIVE, OMP_NEXT_VARIANT, and OMP_TARGET_DEVICE_MATCHES. * tree-ssa-operands.cc (operands_scanner::get_expr_operands): Ignore operands of OMP_NEXT_VARIANT and OMP_TARGET_DEVICE_MATCHES. * tree.def (OMP_METADIRECTIVE): New. (OMP_NEXT_VARIANT): New. (OMP_TARGET_DEVICE_MATCHES): New. * tree.h (OMP_METADIRECTIVE_VARIANTS): New. (OMP_METADIRECTIVE_VARIANT_SELECTOR): New. (OMP_METADIRECTIVE_VARIANT_DIRECTIVE): New. (OMP_METADIRECTIVE_VARIANT_BODY): New. 
(OMP_NEXT_VARIANT_INDEX): New. (OMP_NEXT_VARIANT_STATE): New. (OMP_TARGET_DEVICE_MATCHES_SELECTOR): New. (OMP_TARGET_DEVICE_MATCHES_PROPERTIES): New. Co-Authored-By:
Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By:
Sandra Loosemore <sandra@codesourcery.com>
-
Alexandre Oliva authored
Add logic to check and extend constants compared with bitfields, so that fields are only compared with constants they could actually equal. This involves making sure the signedness doesn't change between loads and conversions before shifts: we'd need to carry a lot more data to deal with all the possibilities. for gcc/ChangeLog PR tree-optimization/118456 * gimple-fold.cc (decode_field_reference): Punt if shifting after changing signedness. (fold_truth_andor_for_ifcombine): Check extension bits in constants before clipping. for gcc/testsuite/ChangeLog PR tree-optimization/118456 * gcc.dg/field-merge-21.c: New. * gcc.dg/field-merge-22.c: New.
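The kind of source pattern involved can be sketched in C as follows. This is only an illustration, not one of the new testcases; `struct flags`, `both_match`, and `never_matches` are made-up names. The point is that -1 and 15 share the same low four bits, so a merged compare that clipped the constant to the field width without checking the extension bits could treat them as equal.

```c
#include <assert.h>
#include <stdbool.h>

struct flags {
  signed int   a : 4;   /* holds -8 .. 7 */
  unsigned int b : 4;   /* holds 0 .. 15 */
};

/* Two adjacent field tests: the sort of pattern a field-merging
   optimization may combine into one wider masked compare. */
static bool both_match(const struct flags *p)
{
  return p->a == -1 && p->b == 15;
}

/* 15 is outside the range of the signed 4-bit field, so this comparison
   can never be true, even when the field's stored bits are all ones. */
static bool never_matches(const struct flags *p)
{
  return p->a == 15;
}
```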
-
Robin Dapp authored
In PR118154 we emit strided stores but the first of those does not always have the proper VTYPE. That's because we erroneously delete a necessary vsetvl. In order to determine whether to elide (1) Expr[7]: VALID (insn 116, bb 17) Demand fields: demand_ratio_and_ge_sew demand_avl SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64 TAIL_POLICY=agnostic, MASK_POLICY=agnostic AVL=(reg:DI 0 zero) when e.g. (2) Expr[3]: VALID (insn 360, bb 15) Demand fields: demand_sew_lmul demand_avl SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64 TAIL_POLICY=agnostic, MASK_POLICY=agnostic AVL=(reg:DI 0 zero) VL=(reg:DI 13 a3 [345]) is already available, we use sew_ge_and_prev_sew_le_next_max_sew_and_next_ratio_valid_for_prev_sew_p. (1) requires RATIO = SEW/LMUL = 16 and an SEW >= 8. (2) has ratio = 64, though, so we cannot directly elide (1). This patch uses ratio_eq_p instead of next_ratio_valid_for_prev_sew_p. PR target/118154 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (MAX_LMUL): New define. (pre_vsetvl::earliest_fuse_vsetvl_info): Use. (pre_vsetvl::pre_global_vsetvl_info): New predicate with equal ratio. * config/riscv/riscv-vsetvl.def: Use. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr118154-1.c: New test. * gcc.target/riscv/rvv/autovec/pr118154-2.c: New test.
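The ratio arithmetic in the description above can be checked with a small C sketch. It is a simplified model, not the code in riscv-vsetvl.cc; `vl_ratio` and `same_ratio` are hypothetical helpers, and LMUL is encoded in eighths here purely so that fractional values such as mf2 stay exact.

```c
#include <assert.h>
#include <stdbool.h>

/* LMUL in eighths: mf8 = 1, mf4 = 2, mf2 = 4, m1 = 8, m2 = 16,
   m4 = 32, m8 = 64. */
static int vl_ratio(int sew, int lmul_eighths)
{
  assert(lmul_eighths > 0);
  return sew * 8 / lmul_eighths;   /* RATIO = SEW / LMUL */
}

/* Equal-ratio check between two configurations. */
static bool same_ratio(int sew_a, int lmul_a, int sew_b, int lmul_b)
{
  return vl_ratio(sew_a, lmul_a) == vl_ratio(sew_b, lmul_b);
}
```

With these helpers, (1) gives `vl_ratio(8, 4)` = 16 and (2) gives `vl_ratio(64, 8)` = 64, so an equal-ratio test rejects the elision.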
-
Robin Dapp authored
In PR118140 we simplify _ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11); to 1: Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1 when _46 == 1. This happens by removing the conditional and applying a | 1 = 1. Normally we re-introduce the conditional and its else value if needed, but that does not happen here as we're not dealing with a vector type. For correctness's sake, we must not remove the conditional even for non-vector types. This patch re-introduces a COND_EXPR in such cases. For PR118140 this results in a non-vectorized loop. PR middle-end/118140 gcc/ChangeLog: * gimple-match-exports.cc (maybe_resimplify_conditional_op): Add COND_EXPR when we simplified to a scalar gimple value but still have an else value. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr118140.c: New test. * gcc.target/riscv/rvv/autovec/pr118140.c: New test.
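A scalar model in C makes the problem concrete. The sketch assumes boolean operands, which is where a | 1 = 1 holds, and the function names are invented for the illustration; none of this is GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Scalar model of .COND_IOR (cond, a, b, else_val). */
static bool cond_ior(bool cond, bool a, bool b, bool else_val)
{
  return cond ? (a || b) : else_val;
}

/* Unsafe simplification: the condition is dropped once a | 1 folds to 1. */
static bool folded_without_cond(void)
{
  return true;
}

/* Safe simplification: the fold is kept under the condition, so the else
   value is still used when cond is false. */
static bool folded_with_cond(bool cond, bool else_val)
{
  return cond ? true : else_val;
}
```

When `cond` is false and `else_val` is false, `cond_ior` yields false while `folded_without_cond` yields true; `folded_with_cond` agrees with `cond_ior` for every input where `b` is true.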
-
Nathaniel Shead authored
The ICE in the linked PR is caused because name lookup finds duplicate copies of the deduction guides, causing a checking assert to fail. This is ultimately because we're exporting an imported guide; when name lookup processes 'dguide-5_b.H' it goes via the 'tt_entity' path and just returns the entity from 'dguide-5_a.H'. Because this doesn't ever go through 'key_mergeable' we never set 'BINDING_VECTOR_GLOBAL_DUPS_P' and so deduping is not engaged, allowing duplicate results. Currently I believe this to be a peculiarity of the ANY_REACHABLE handling for deduction guides; in no other case that I can find do we emit bindings purely to imported entities. As such, this patch fixes this problem from that end, by ensuring that we simply do not emit any imported deduction guides. This avoids the ICE because no duplicates need deduping to start with, and should otherwise have no functional change because lookup of deduction guides will look at all reachable modules (exported or not) regardless. Since we're now deliberately not emitting imported deduction guides we can use LOOK_want::NORMAL instead of LOOK_want::ANY_REACHABLE, since the extra work to find as-yet undiscovered deduction guides in transitive importers is not necessary here. PR c++/117397 gcc/cp/ChangeLog: * module.cc (depset::hash::add_deduction_guides): Don't emit imported deduction guides. (depset::hash::finalize_dependencies): Add check for any bindings referring to imported entities. gcc/testsuite/ChangeLog: * g++.dg/modules/dguide-5_a.H: New test. * g++.dg/modules/dguide-5_b.H: New test. * g++.dg/modules/dguide-5_c.H: New test. * g++.dg/modules/dguide-6.h: New test. * g++.dg/modules/dguide-6_a.C: New test. * g++.dg/modules/dguide-6_b.C: New test. * g++.dg/modules/dguide-6_c.C: New test. Signed-off-by:
Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by:
Jason Merrill <jason@redhat.com>
-
Eric Botcazou authored
...to the object file reader present in the run-time library. gcc/ada/ PR ada/118459 * libgnat/s-objrea.ads (Object_Arch): Add S390 and RISCV. * libgnat/s-objrea.adb (EM_S390): New named number. (EM_RISCV): Likewise. (ELF_Ops.Initialize): Deal with EM_S390 and EM_RISCV. (Read_Address): Deal with S390 and RISCV.
-
Richard Biener authored
When vectorizing a load we are now checking alignment before emitting a vector(1) T load instead of blindly assuming it's OK when we had a scalar T load. For reasons we're not handling alignment computation optimally here but we shouldn't ICE when we fall back to loads of T. The following ensures the IL remains correct by emitting VIEW_CONVERT from T to vector(1) T when needed. It also removes an earlier fix done in r9-382-gbb4e47476537f6 for the same issue with VMAT_ELEMENTWISE. PR tree-optimization/118405 * tree-vect-stmts.cc (vectorizable_load): When we fall back to scalar loads make sure we properly convert to vector(1) T when there was only a single vector element.
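At the source level, the relationship between T and vector(1) T can be approximated with the GNU C vector extension. The sketch below is an analogy only, under the assumption of a GCC-compatible compiler; `v1si` and `load_as_v1` are hypothetical names, and `memcpy` stands in for the VIEW_CONVERT that the vectorizer emits in the IL.

```c
#include <assert.h>
#include <string.h>

/* One-element vector of int (GNU C vector extension). */
typedef int v1si __attribute__((vector_size(sizeof(int))));

/* Perform an ordinary scalar load, then reinterpret its bits as a
   one-element vector. Only the scalar's natural alignment is assumed. */
static v1si load_as_v1(const int *p)
{
  int scalar = *p;
  v1si v;
  memcpy(&v, &scalar, sizeof v);
  return v;
}
```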
-
Anuj Mohite authored
This patch was provided by Anuj Mohite as part of the GSoC project. It was modified slightly by Jerry DeLisle for minor formatting. The patch provides front-end parsing of the LOCALITY specs in DO_CONCURRENT and adds numerous test cases. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_code_node): Updated to use c->ext.concur.forall_iterator instead of c->ext.forall_iterator. * frontend-passes.cc (index_interchange): Updated to use c->ext.concur.forall_iterator instead of c->ext.forall_iterator. (gfc_code_walker): Likewise. * gfortran.h (enum locality_type): Added new enum for locality types in DO CONCURRENT constructs. * match.cc (match_simple_forall): Updated to use new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator. (gfc_match_forall): Likewise. (gfc_match_do): Implemented support for matching DO CONCURRENT locality specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE). * parse.cc (parse_do_block): Updated to use new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator. * resolve.cc (struct check_default_none_data): Added struct check_default_none_data. (do_concur_locality_specs_f2023): New function to check compliance with F2023's C1133 constraint for DO CONCURRENT. (check_default_none_expr): New function to check DEFAULT(NONE) compliance. (resolve_locality_spec): New function to resolve locality specs. (gfc_count_forall_iterators): Updated to use code->ext.concur.forall_iterator. (gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator. * st.cc (gfc_free_statement): Updated to free locality specifications and use p->ext.concur.forall_iterator. * trans-stmt.cc (gfc_trans_forall_1): Updated to use code->ext.concur.forall_iterator. gcc/testsuite/ChangeLog: * gfortran.dg/do_concurrent_10.f90: New test. * gfortran.dg/do_concurrent_8_f2018.f90: New test. * gfortran.dg/do_concurrent_8_f2023.f90: New test. * gfortran.dg/do_concurrent_9.f90: New test. * gfortran.dg/do_concurrent_all_clauses.f90: New test. 
* gfortran.dg/do_concurrent_basic.f90: New test. * gfortran.dg/do_concurrent_constraints.f90: New test. * gfortran.dg/do_concurrent_local_init.f90: New test. * gfortran.dg/do_concurrent_locality_specs.f90: New test. * gfortran.dg/do_concurrent_multiple_reduce.f90: New test. * gfortran.dg/do_concurrent_nested.f90: New test. * gfortran.dg/do_concurrent_parser.f90: New test. * gfortran.dg/do_concurrent_reduce_max.f90: New test. * gfortran.dg/do_concurrent_reduce_sum.f90: New test. * gfortran.dg/do_concurrent_shared.f90: New test. Signed-off-by:
Anuj <anujmohite001@gmail.com>
-