- Jan 11, 2025
-
-
mengqinggang authored
Generate 0x1010 instead of 0x1010000>>12 for lu12i.w. lu32i.d and lu52i.d use the same processing. gcc/ChangeLog: * config/loongarch/lasx.md: Use new loongarch_output_move. * config/loongarch/loongarch-protos.h (loongarch_output_move): Change parameters from (rtx, rtx) to (rtx *). * config/loongarch/loongarch.cc (loongarch_output_move): Generate final immediate for lu12i.w and lu52i.d. * config/loongarch/loongarch.md: Generate final immediate for lu32i.d and lu52i.d. * config/loongarch/lsx.md: Use new loongarch_output_move. gcc/testsuite/ChangeLog: * gcc.target/loongarch/imm-load.c: Do not generate ">>".
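To make the immediates concrete, here is a minimal C++ sketch of the bit fields the three instructions load, following the LoongArch layout (lu12i.w: bits 31..12, lu32i.d: bits 51..32, lu52i.d: bits 63..52). The helper names are hypothetical and this is not the code in loongarch_output_move; the fields are returned as raw unsigned values, ignoring any signed interpretation the assembler may apply.

```cpp
#include <cstdint>

// Hypothetical helpers, not GCC code: each returns the raw field that the
// named instruction places into the destination register.

// lu12i.w loads a 20-bit immediate into bits 31..12.
constexpr std::uint32_t lu12i_w_field(std::uint64_t value) {
    return static_cast<std::uint32_t>((value >> 12) & 0xFFFFFu);
}

// lu32i.d loads a 20-bit immediate into bits 51..32.
constexpr std::uint32_t lu32i_d_field(std::uint64_t value) {
    return static_cast<std::uint32_t>((value >> 32) & 0xFFFFFu);
}

// lu52i.d loads a 12-bit immediate into bits 63..52.
constexpr std::uint32_t lu52i_d_field(std::uint64_t value) {
    return static_cast<std::uint32_t>((value >> 52) & 0xFFFu);
}
```

For the constant 0x1010000, lu12i_w_field yields 0x1010, which is the already-shifted form the commit message describes.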
-
Iain Buclaw authored
D front-end changes: - Import latest fixes from dmd v2.110.0-beta.1. D runtime changes: - Import latest fixes from druntime v2.110.0-beta.1. Phobos changes: - Import latest fixes from phobos v2.110.0-beta.1. - Added `popGrapheme' function to `std.uni'. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 2b89c2909d. * Make-lang.in (D_FRONTEND_OBJS): Rename d/basicmangle.o to d/mangle-basic.o, d/cppmangle.o to d/mangle-cpp.o, and d/dmangle.o to d/mangle-package.o. (d/mangle-%.o): New rule. * d-builtins.cc (maybe_set_builtin_1): Update for new front-end interface. * d-diagnostic.cc (verrorReport): Likewise. (verrorReportSupplemental): Likewise. * d-frontend.cc (getTypeInfoType): Likewise. * d-lang.cc (d_init_options): Likewise. (d_handle_option): Likewise. (d_post_options): Likewise. * d-target.cc (TargetC::contributesToAggregateAlignment): New. * d-tree.h (create_typeinfo): Adjust prototype. * decl.cc (layout_struct_initializer): Update for new front-end interface. * typeinfo.cc (create_typeinfo): Remove generate parameter. * types.cc (layout_aggregate_members): Update for new front-end interface. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 2b89c2909d. * src/MERGE: Merge upstream phobos bdedad3bf.
-
Andrew MacLeod authored
Query for known relations between the operands, and pass that to fold_range to help simplify MIN and MAX relations. Make it type agnostic as well. Adapt testcases from DOM to EVRP (e suffix) and test floats (f suffix). PR tree-optimization/88575 gcc/ * vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Query relation between op0 and op1 and utilize it. (simplify_using_ranges::simplify): Do not eliminate float checks. gcc/testsuite/ * gcc.dg/tree-ssa/minmax-27.c: Disable VRP. * gcc.dg/tree-ssa/minmax-27e.c: New. * gcc.dg/tree-ssa/minmax-27f.c: New. * gcc.dg/tree-ssa/minmax-28.c: Disable VRP. * gcc.dg/tree-ssa/minmax-28e.c: New. * gcc.dg/tree-ssa/minmax-28f.c: New.
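As an illustration of the kind of code this targets, here is a small C++ sketch in which the relation between the operands is known on the path that reaches the MIN or MAX. The functions are hypothetical and are not the contents of the minmax-27/28 testcases; the snippet only shows the source pattern and does not check whether a given compiler performs the fold.

```cpp
#include <algorithm>

// On the taken branch a < b is known, so std::min (a, b) can only be a.
int min_under_less_than(int a, int b) {
    if (a < b)
        return std::min(a, b);
    return b;
}

// Floating-point variant: a < b being true also rules out NaN operands,
// so std::max (a, b) can only be b on the taken branch.
float max_under_less_than(float a, float b) {
    if (a < b)
        return std::max(a, b);
    return a;
}
```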
-
GCC Administrator authored
-
- Jan 10, 2025
-
-
Iain Buclaw authored
D front-end changes: - Added pragma for ImportC to allow setting `nothrow', `@nogc' or `pure'. - Mixin templates can now use assignment syntax. D runtime changes: - Removed `ThreadBase.criticalRegionLock' from `core.thread'. - Added `expect', `[un]likely', `trap' to `core.builtins'. Phobos changes: - Import latest fixes from phobos v2.110.0-beta.1. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 4ccb01fde5. * Make-lang.in (D_FRONTEND_OBJS): Rename d/foreachvar.o to d/visitor-foreachvar.o, d/visitor.o to d/visitor-package.o, and d/statement_rewrite_walker.o to d/visitor-statement_rewrite_walker.o. (D_FRONTEND_OBJS): Rename d/{parsetime,permissive,postorder,transitive}visitor.o to d/visitor-{parsetime,permissive,postorder,transitive}.o. (D_FRONTEND_OBJS): Remove d/sapply.o. (d.tags): Add dmd/common/*.h. (d/visitor-%.o:): New rule. * d-codegen.cc (get_frameinfo): Update for new front-end interface. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 4ccb01fde5. * src/MERGE: Merge upstream phobos eab6595ad.
-
Iain Buclaw authored
D front-end changes: - It's now deprecated to declare `auto ref' parameters without putting those two keywords next to each other. - An error is now given for case fallthrough for multivalued cases. - An error is now given for constructors with field destructors with stricter attributes. - An error is now issued for `in'/`out' contracts of `nothrow' functions that may throw. - `auto ref' can now be applied to local, static, extern, and global variables. D runtime changes: - Import latest fixes from druntime v2.110.0-beta.1. Phobos changes: - Import latest fixes from phobos v2.110.0-beta.1. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 6884b433d2. * d-builtins.cc (build_frontend_type): Update for new front-end interface. (d_build_builtins_module): Likewise. (matches_builtin_type): Likewise. (covariant_with_builtin_type_p): Likewise. * d-codegen.cc (lower_struct_comparison): Likewise. (call_side_effect_free_p): Likewise. * d-compiler.cc (Compiler::paintAsType): Likewise. * d-convert.cc (convert_expr): Likewise. (convert_for_assignment): Likewise. * d-target.cc (Target::isVectorTypeSupported): Likewise. (Target::isVectorOpSupported): Likewise. (Target::isReturnOnStack): Likewise. * decl.cc (get_symbol_decl): Likewise. * expr.cc (build_return_dtor): Likewise. * imports.cc (class ImportVisitor): Likewise. * toir.cc (class IRVisitor): Likewise. * types.cc (class TypeVisitor): Likewise. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 6884b433d2. * src/MERGE: Merge upstream phobos 48d581a1f.
-
Alex Coplan authored
Currently we only cost gconds for the vector loop while we omit costing them when analyzing the scalar loop; this unfairly penalizes the vector loop in the case of loops with early exits. This (together with the previous patches) enables us to vectorize std::find with 64-bit element sizes. gcc/ChangeLog: PR tree-optimization/118211 PR tree-optimization/116126 * tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost): Don't skip over gconds.
-
Alex Coplan authored
This fixes a latent wrong code issue whereby vect_do_peeling determined the wrong condition for inserting the vector skip guard. Specifically in the case where the loop niters are unknown at compile time we used to check: !LOOP_REQUIRES_VERSIONING (loop_vinfo) but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned for aliasing, and that has nothing to do with prolog peeling. I think this condition should instead be checking specifically if we aren't versioning for alignment. As it stands, when we version for alignment, we don't peel, so the vector skip guard is indeed redundant in that case. With the testcase added (reduced from the Fortran frontend) we would version for aliasing, omit the vector skip guard, and then at runtime we would peel sufficient iterations for alignment that there wasn't a full vector iteration left when we entered the vector body, thus overflowing the output buffer. gcc/ChangeLog: PR tree-optimization/118211 PR tree-optimization/116126 * tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector condition to only omit the edge if we're versioning for alignment. gcc/testsuite/ChangeLog: PR tree-optimization/118211 PR tree-optimization/116126 * gcc.dg/vect/vect-early-break_130.c: New test.
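The failure mode described above can be modelled with a few lines of arithmetic. This is a minimal sketch under stated assumptions, not the logic in vect_do_peeling: the helper names are hypothetical, and the address and target alignment are assumed to be multiples of the element size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model, not GCC code.
// Number of scalar iterations to peel so that the next access is aligned to
// target_align bytes.
constexpr std::size_t prolog_iterations(std::uintptr_t addr, std::size_t elem_size,
                                        std::size_t target_align) {
    return ((target_align - addr % target_align) % target_align) / elem_size;
}

// The vector body may only be entered if a full vector of vf elements remains
// after the peeled iterations; otherwise the skip guard must bypass it.
constexpr bool must_skip_vector_body(std::size_t niters, std::size_t peeled,
                                     std::size_t vf) {
    return niters < peeled + vf;
}
```

With 4-byte elements at address 0x1008 and a 16-byte target alignment, two iterations are peeled. For a 5-iteration loop and a vectorization factor of 4, only three iterations remain, so entering the vector body without the guard would access one element past the end.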
-
Tamar Christina authored
The alignment peeling changes exposed a latent missing dominator update with early break vectorization, specifically when inserting the vector skip edge, since the new edge bypasses the prolog skip block and thus has the potential to subvert its dominance. This patch fixes that. gcc/ChangeLog: PR tree-optimization/118211 PR tree-optimization/116126 * tree-vect-loop-manip.cc (vect_do_peeling): Update immediate dominators of nodes that were dominated by the prolog skip block after inserting vector skip edge. Initialize prolog variable to NULL to avoid bogus -Wmaybe-uninitialized during bootstrap. gcc/testsuite/ChangeLog: PR tree-optimization/118211 PR tree-optimization/116126 * g++.dg/vect/vect-early-break_6.cc: New test. Co-Authored-By: Alex Coplan <alex.coplan@arm.com>
-
Alex Coplan authored
For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always enter the scalar epilogue, so avoid emitting a guard on entry to the epilogue. gcc/ChangeLog: PR tree-optimization/118211 PR tree-optimization/116126 * tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an epilogue guard for inverted early-exit loops.
-
Alex Coplan authored
This allows us to vectorize more loops with early exits by forcing peeling for alignment to make sure that we're guaranteed to be able to safely read an entire vector iteration without crossing a page boundary. To make this work for VLA architectures we have to allow compile-time non-constant target alignments. We also have to override the result of the target's preferred_vector_alignment hook if it isn't a power-of-two multiple of the TYPE_SIZE of the chosen vector type. gcc/ChangeLog: PR tree-optimization/118211 PR tree-optimization/116126 * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Set need_peeling_for_alignment flag on read DRs instead of failing vectorization. Punt on gathers. (dr_misalignment): Handle non-constant target alignments. (vect_compute_data_ref_alignment): If need_peeling_for_alignment flag is set on the DR, then override the target alignment chosen by the preferred_vector_alignment hook to choose a safe alignment. (vect_supportable_dr_alignment): Override support_vector_misalignment hook if need_peeling_for_alignment is set on the DR: in this case we must return dr_unaligned_unsupported in order to force peeling. * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog peeling by a compile-time non-constant amount. * tree-vectorizer.h (dr_vec_info): Add new flag need_peeling_for_alignment. gcc/testsuite/ChangeLog: PR tree-optimization/118211 PR tree-optimization/116126 * gcc.dg/tree-ssa/cunroll-13.c: Don't vectorize. * gcc.dg/tree-ssa/cunroll-14.c: Likewise. * gcc.dg/unroll-6.c: Likewise. * gcc.dg/tree-ssa/gen-vect-28.c: Likewise. * gcc.dg/vect/vect-104.c: Expect to vectorize. * gcc.dg/vect/vect-early-break_108-pr113588.c: Likewise. * gcc.dg/vect/vect-early-break_109-pr113588.c: Likewise. * gcc.dg/vect/vect-early-break_110-pr113467.c: Likewise. * gcc.dg/vect/vect-early-break_3.c: Likewise. * gcc.dg/vect/vect-early-break_65.c: Likewise. * gcc.dg/vect/vect-early-break_8.c: Likewise. * gfortran.dg/vect/vect-5.f90: Likewise. 
* gfortran.dg/vect/vect-8.f90: Likewise. * gcc.dg/vect/vect-switch-search-line-fast.c: Co-Authored-By: Tamar Christina <tamar.christina@arm.com>
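The page-boundary argument behind forced peeling can be checked with a short sketch. It assumes a page size that is a multiple of the vector size (4096-byte pages and 16-byte vectors are used here purely as example values), and the helper name is hypothetical rather than anything in the vectorizer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not GCC code: does the byte range
// [addr, addr + bytes) touch more than one page?
constexpr bool crosses_page(std::uintptr_t addr, std::size_t bytes,
                            std::size_t page_size) {
    return addr / page_size != (addr + bytes - 1) / page_size;
}
```

A 16-byte read at 0xFF8 spills into the next 4096-byte page, while the same read at the 16-byte-aligned address 0xFF0 stays within one page. That is why reading a whole aligned vector is safe even when the loop might exit early on one of its elements.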
-
Tamar Christina authored
The Parts Num field for the MIDR for Cortex-X4 is wrong. It's currently the parts number for a Cortex-A720 (which does have the right number). The correct number can be found in the Cortex-X4 Technical Reference Manual [1] on page 382 in Issue Number 5. [1] https://developer.arm.com/documentation/102484/latest/ gcc/ChangeLog: * config/aarch64/aarch64-cores.def (AARCH64_CORE): Fix cortex-x4 parts num.
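For readers unfamiliar with the register, here is a minimal sketch of how the part number is read out of a MIDR_EL1 value, using the architectural field layout (Implementer in bits 31..24, PartNum in bits 15..4). The helper names are hypothetical, and the sample value in the note below is purely illustrative, not taken from the commit or the manual.

```cpp
#include <cstdint>

// MIDR_EL1 layout: Implementer [31:24], Variant [23:20],
// Architecture [19:16], PartNum [15:4], Revision [3:0].

// Hypothetical helper: extract the 8-bit implementer code.
constexpr unsigned midr_implementer(std::uint64_t midr) {
    return static_cast<unsigned>((midr >> 24) & 0xFFu);
}

// Hypothetical helper: extract the 12-bit part number.
constexpr unsigned midr_part_num(std::uint64_t midr) {
    return static_cast<unsigned>((midr >> 4) & 0xFFFu);
}
```

For the illustrative value 0x410FD820 these return 0x41 and 0xD82 respectively. Two cores that differ only in PartNum are told apart by this field alone, which is why a wrong entry in aarch64-cores.def misidentifies the core.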
-
Iain Buclaw authored
D front-end changes: - Import dmd v2.110.0-beta.1. - `ref' can now be applied to local, static, extern, and global variables. D runtime changes: - Import druntime v2.110.0-beta.1. Phobos changes: - Import phobos v2.110.0-beta.1. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 34875cd6e1. * dmd/VERSION: Bump version to v2.110.0-beta.1. * Make-lang.in (D_FRONTEND_OBJS): Add d/deps.o, d/timetrace.o. * decl.cc (class DeclVisitor): Update for new front-end interface. * expr.cc (class ExprVisitor): Likewise. * typeinfo.cc (check_typeinfo_type): Likewise. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 34875cd6e1. * src/MERGE: Merge upstream phobos ebd24da8a.
-
Jonathan Wakely authored
This fixes warnings like the following during bootstrap:

sparc-sun-solaris2.11/libstdc++-v3/include/bits/atomic_futex.h:324:53: warning: unused parameter ‘__mo’ [-Wunused-parameter]
  324 |     _M_load_when_equal(unsigned __val, memory_order __mo)
      |                                        ~~~~~~~~~~~~~^~~~

libstdc++-v3/ChangeLog: * include/bits/atomic_futex.h (__atomic_futex_unsigned): Remove names of unused parameters in non-futex implementation.
-
Marek Polacek authored
Fixed by r15-6740. PR c++/118391 gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-uneval20.C: New test.
-
Wilco Dijkstra authored
Simplify and clean up ifunc selection logic. Since LRCPC3 does not imply LSE2, has_rcpc3() should also check LSE2 is enabled. Passes regress and bootstrap, OK for commit? libatomic: * config/linux/aarch64/host-config.h (has_lse2): Cleanup. (has_lse128): Likewise. (has_rcpc3): Add early check for LSE2.
-
Torbjörn SVENSSON authored
Since armv8-m.base uses thumb1, which does not support sibcall/tailcall, a pattern is needed that uses a PUSH/BL/POP sequence instead of a single B instruction to reuse an already existing function in the compile unit. gcc/testsuite/ChangeLog: * gcc.target/arm/cmse/cmse-15.c: Added pattern for armv8-m.base. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-
Paul-Antoine Arras authored
This is a followup to ed49709a OpenMP: C++ front-end support for dispatch + adjust_args. The call to cp_parser_omp_dispatch only belongs in cp_parser_omp_construct. In cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling cp_parser_omp_construct. gcc/cp/ChangeLog: * parser.cc (cp_parser_pragma): Replace call to cp_parser_omp_dispatch with cp_parser_omp_construct and check context. gcc/testsuite/ChangeLog: * g++.dg/gomp/dispatch-8.C: New test.
-
Jakub Jelinek authored
In the following testcase there are 2 issues: one is that B doesn't have operator<=>, and the other is that A's operator<=> has int return type, i.e. not the standard comparison category. Because of the int return type, retcat is cc_last; when we first try to synthesize it, it is therefore with tentative false and complain tf_none, we find that B doesn't have operator<=> and because retcat isn't tc_last, don't try to search for other operators in genericize_spaceship. And then mark the operator deleted.

When trying to explain the use of the deleted operator, tentative is still false, but complain is tf_error_or_warning. do_one_comp will first do:

    tree comp = build_new_op (loc, code, flags, lhs, rhs, NULL_TREE, NULL_TREE, &overload, tentative ? tf_none : complain);

and because complain isn't tf_none, it will actually diagnose the bug already, but then (tentative || complain) is true and we call genericize_spaceship, which has

    if (tag == cc_last && is_auto (type)) { ... }
    gcc_checking_assert (tag < cc_last);

and because tag is cc_last and type isn't auto, we just ICE on that assertion. The patch fixes it by returning error_mark_node from genericize_spaceship instead of failing the assertion.

Note, the PR raises another problem. If on the same testcase the B b; line is removed, we silently synthesize operator<=> which will crash at runtime due to returning without a return statement. That is because the standard says that in that case it should return static_cast<int>(std::strong_ordering::equal); but I can't find anywhere wording which would say that if that isn't valid, the function is deleted. https://eel.is/c++draft/class.compare#class.spaceship-2.2 seems to talk just about cases where there are some members and their comparison is invalid, in which case it is deleted, but here there are none and it follows https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2 So, we synthesize with tf_none, see the static_cast is invalid, don't add error_mark_node statement silently, but as the function isn't deleted, we just silently emit it. Should the standard be amended to say that the operator should be deleted even if it has no elements and the static cast from https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2 is invalid?

2025-01-10 Jakub Jelinek <jakub@redhat.com>
PR c++/118387
* method.cc (genericize_spaceship): For tag == cc_last if type is not auto just return error_mark_node instead of failing checking assertion.
* g++.dg/cpp2a/spaceship-synth17.C: New test.
-
Jason Merrill authored
We need to remember that the ::operator new is replaceable to avoid a bogus error about __builtin_operator_new finding a non-replaceable function. This affected __get_temporary_buffer in stl_tempbuf.h. gcc/cp/ChangeLog: * module.cc (trees_out::core_bools): Write replaceable_operator. (trees_in::core_bools): Read it. gcc/testsuite/ChangeLog: * g++.dg/modules/operator-2_a.C: New test. * g++.dg/modules/operator-2_b.C: New test.
-
Richard Biener authored
The following fixes memory leaks found compiling SPEC CPU 2017 with valgrind. * df-core.cc (rest_of_handle_df_finish): Release dflow for problems without free function (like LR). * gimple-crc-optimization.cc (crc_optimization::loop_may_calculate_crc): Release loop_bbs on all exits. * tree-vectorizer.h (supportable_indirect_convert_operation): Change. * tree-vect-generic.cc (expand_vector_conversion): Adjust. * tree-vect-stmts.cc (vectorizable_conversion): Use auto_vec for converts. (supportable_indirect_convert_operation): Get a reference to the output vector of converts.
-
Vladimir N. Makarov authored
My previous patch for PR118017 contains a test which fails on i686. The patch fixes this. gcc/testsuite/ChangeLog: PR target/118017 * gcc.target/i386/pr118017.c: Check target int128.
-
Christophe Lyon authored
The previous fix only worked for C, for C++ we need to add more information to the underlying type so that finish_class_member_access_expr accepts it. We use the same logic as in aarch64's register_tuple_type for AdvSIMD tuples. This patch makes gcc.target/arm/mve/intrinsics/pr118332.c pass in C++ mode. gcc/ChangeLog: PR target/118332 * config/arm/arm-mve-builtins.cc (wrap_type_in_struct): Delete. (register_type_decl): Delete. (register_builtin_tuple_types): Use lang_hooks.types.simulate_record_decl.
-
Richard Biener authored
Pushed as obvious. * gcse.cc (pass_hardreg_pre::gate): Wrap possibly unused fun argument.
-
Richard Biener authored
The following puts in a hard limit on ext-dce because it might end up requiring memory on the order of the number of basic blocks times the number of pseudo registers. The limiting follows what GCSE based passes do and thus I re-use --param max-gcse-memory here. This doesn't in any way address the implementation issues of the pass, but it reduces the memory-use when compiling the module_first_rk_step_part1.F90 TU from 521.wrf_r from 25GB to 1GB. PR rtl-optimization/117467 PR rtl-optimization/117934 * ext-dce.cc (ext_dce_execute): Do nothing if a memory allocation estimate exceeds what is allowed by --param max-gcse-memory.
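The shape of such a guard can be sketched as follows. This is a hypothetical model, not the estimate computed in ext_dce_execute: the per-entry cost, the helper names, and the assumption that the budget is expressed in kilobytes are all illustrative.

```cpp
#include <cstdint>

// Hypothetical estimate, not GCC's formula: assume the pass needs
// bytes_per_entry bytes for every (basic block, pseudo register) pair.
constexpr std::uint64_t estimated_kb(std::uint64_t n_blocks, std::uint64_t n_pseudos,
                                     std::uint64_t bytes_per_entry) {
    return n_blocks * n_pseudos * bytes_per_entry / 1024;
}

// Skip the pass entirely when the estimate exceeds the budget
// (assumed here to be given in kilobytes).
constexpr bool should_skip_pass(std::uint64_t n_blocks, std::uint64_t n_pseudos,
                                std::uint64_t bytes_per_entry,
                                std::uint64_t max_memory_kb) {
    return estimated_kb(n_blocks, n_pseudos, bytes_per_entry) > max_memory_kb;
}
```

Because the estimate is a product of two counts, it grows quadratically when blocks and pseudos scale together, which is how a single large translation unit can reach the memory use quoted above.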
-
Marek Polacek authored
Here we ICE in expand_expr_real_1:

    if (exp)
      {
        tree context = decl_function_context (exp);
        gcc_assert (SCOPE_FILE_SCOPE_P (context)
                    || context == current_function_decl

on something like this test:

    void
    f (auto... args)
    {
      [&]<size_t... i>(seq<i...>) {
        g(args...[i]...);
      }(seq<0>());
    }

because while current_function_decl is:

    f<int>(int)::<lambda(seq<i ...>)> [with long unsigned int ...i = {0}]

(correct), context is:

    f<int>(int)::<lambda(seq<i ...>)>

which is only the partial instantiation.

I think that when tsubst_pack_index gets a partial instantiation, e.g. {*args#0} as the pack, we should still tsubst it. The args#0's value-expr can be __closure->__args#0 where the closure's context is the partially instantiated operator(). So we should let retrieve_local_specialization find the right args#0.

PR c++/117937

gcc/cp/ChangeLog:
* pt.cc (tsubst_pack_index): tsubst the pack even when it's not PACK_EXPANSION_P.

gcc/testsuite/ChangeLog:
* g++.dg/cpp26/pack-indexing13.C: New test.
* g++.dg/cpp26/pack-indexing14.C: New test.
-
Stefan Schulze Frielinghaus authored
gcc/ChangeLog: * config/s390/s390-protos.h (s390_emit_compare): Add mode parameter for the resulting RTX. * config/s390/s390.cc (s390_emit_compare): Ditto. (s390_emit_compare_and_swap): Change. (s390_expand_vec_strlen): Change. (s390_expand_cs_hqi): Change. (s390_expand_split_stack_prologue): Change. * config/s390/s390.md (*add<mode>3_carry1_cc): Renamed to ... (add<mode>3_carry1_cc): this and in order to use the corresponding gen function, encode CC mode into pattern. (*sub<mode>3_borrow_cc): Renamed to ... (sub<mode>3_borrow_cc): this and in order to use the corresponding gen function, encode CC mode into pattern. (*add<mode>3_alc_carry1_cc): Renamed to ... (add<mode>3_alc_carry1_cc): this and in order to use the corresponding gen function, encode CC mode into pattern. (sub<mode>3_slb_borrow1_cc): New. (uaddc<mode>5): New. (usubc<mode>5): New. gcc/testsuite/ChangeLog: * gcc.target/s390/uaddc-1.c: New test. * gcc.target/s390/uaddc-2.c: New test. * gcc.target/s390/uaddc-3.c: New test. * gcc.target/s390/usubc-1.c: New test. * gcc.target/s390/usubc-2.c: New test. * gcc.target/s390/usubc-3.c: New test.
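The uaddc<mode>5 and usubc<mode>5 names are GCC's standard patterns for unsigned add-with-carry and subtract-with-borrow. Below is a minimal sketch of the kind of source-level carry chain such patterns exist to serve, written with the GCC/Clang builtin __builtin_add_overflow. The function names are hypothetical, this is not the content of the uaddc-*.c tests, and the snippet does not show which instructions a given compiler emits for it.

```cpp
#include <cstdint>

// Add a + b + carry_in (carry_in must be 0 or 1). Returns the low 64 bits
// and stores the carry out of the addition (0 or 1) in *carry_out.
inline std::uint64_t add_with_carry(std::uint64_t a, std::uint64_t b,
                                    std::uint64_t carry_in,
                                    std::uint64_t *carry_out) {
    std::uint64_t sum;
    const bool c1 = __builtin_add_overflow(a, b, &sum);
    const bool c2 = __builtin_add_overflow(sum, carry_in, &sum);
    *carry_out = c1 | c2;  // at most one of the two additions can overflow
    return sum;
}

// 128-bit addition built from two 64-bit limbs (index 0 is the low limb).
inline void add128(const std::uint64_t x[2], const std::uint64_t y[2],
                   std::uint64_t out[2]) {
    std::uint64_t carry = 0;
    out[0] = add_with_carry(x[0], y[0], carry, &carry);
    out[1] = add_with_carry(x[1], y[1], carry, &carry);
}
```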
-
Andrew Carlotti authored
gcc/ChangeLog: * doc/passes.texi: Document hardreg PRE pass.
-
Andrew Carlotti authored
This pass is used to optimise assignments to the FPMR register in aarch64. I chose to implement this as a middle-end pass because it mostly reuses the existing RTL PRE code within gcse.cc. Compared to RTL PRE, the key difference in this new pass is that we insert new writes directly to the destination hardreg, instead of writing to a new pseudo-register and copying the result later. This requires changes to the analysis portion of the pass, because sets cannot be moved before existing instructions that set, use or clobber the hardreg, and the value becomes unavailable after any uses or clobbers of the hardreg. Any uses of the hardreg in debug insns will be deleted. We could do better than this, but for the aarch64 fpmr I don't think we emit useful debuginfo for deleted fp8 instructions anyway (and I don't even know if it's possible to have a debug fpmr use when entering hardreg PRE). gcc/ChangeLog: * config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro. * gcse.cc (doing_hardreg_pre_p): New global variable. (do_load_motion): New boolean check. (current_hardreg_regno): New global variable. (compute_local_properties): Unset transp for hardreg clobbers. (prune_hardreg_uses): New function. (want_to_gcse_p): Use different checks for hardreg PRE. (oprs_unchanged_p): Disable load motion for hardreg PRE pass. (hash_scan_set): For hardreg PRE, skip non-hardreg sets and check for hardreg clobbers. (record_last_mem_set_info): Skip for hardreg PRE. (compute_pre_data): Prune hardreg uses from transp bitmap. (pre_expr_reaches_here_p_work): Add sentence to comment. (insert_insn_start_basic_block): New functions. (pre_edge_insert): Don't add hardreg sets to predecessor block. (pre_delete): Use hardreg for the reaching reg. (reset_hardreg_debug_uses): New function. (pre_gcse): For hardreg PRE, reset debug uses and don't insert copies. (one_pre_gcse_pass): Disable load motion for hardreg PRE. (execute_hardreg_pre): New. (class pass_hardreg_pre): New. (pass_hardreg_pre::gate): New. (make_pass_hardreg_pre): New. * passes.def (pass_hardreg_pre): New pass. * tree-pass.h (make_pass_hardreg_pre): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/fpmr-1.c: New test. * gcc.target/aarch64/acle/fpmr-2.c: New test. * gcc.target/aarch64/acle/fpmr-3.c: New test. * gcc.target/aarch64/acle/fpmr-4.c: New test.
-
Andrew Carlotti authored
This patch skips redirect_to_specific_clone for aarch64 and riscv, because the optimisation has two flaws:

1. It checks the value of the "target" attribute, even on targets that don't use this attribute for multiversioning.
2. The algorithm used is too aggressive, and will eliminate the indirection in some cases where the runtime choice of callee version can't be determined statically at compile time. A correct implementation would need to verify that:
   - if the current caller version were selected at runtime, then the chosen callee version would be eligible for selection.
   - if any higher priority callee version were selected at runtime, then a higher priority caller version would have been eligible for selection (and hence the current caller version wouldn't have been selected).

The current checks only verify a more restrictive version of the first condition, and don't check the second condition at all. Fixing the optimisation properly would require implementing target hooks to check for implications between version attributes, which is too complicated for this stage. However, I would like to see this hook implemented in the future, since it could also help deduplicate other multiversioning code. Since this behaviour has existed for x86 and powerpc for a while, I think it's best to preserve the existing behaviour on those targets, unless any maintainer for those targets disagrees.

gcc/ChangeLog:
* multiple_target.cc (redirect_to_specific_clone): Assert that "target" attribute is used for FMV before checking it. (ipa_target_clone): Skip redirect_to_specific_clone on some targets.

gcc/testsuite/ChangeLog:
* g++.target/aarch64/mv-pragma.C: New test.
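The two conditions for a correct redirect can be illustrated with a brute-force model. This is a hypothetical sketch, not GCC's data structures or the proposed target hook: versions are modelled as a required-feature bitmask plus a priority, and runtime selection is assumed to pick the eligible version with the highest priority.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: a version is eligible on a CPU when every feature bit
// it requires is present on that CPU.
struct Version {
    unsigned required_features;
    int priority;
};

// Index of the eligible version with the highest priority, or -1 if none.
inline int select_version(const std::vector<Version>& versions, unsigned cpu_features) {
    int best = -1;
    for (std::size_t i = 0; i < versions.size(); ++i) {
        const bool eligible = (versions[i].required_features & ~cpu_features) == 0;
        if (eligible && (best < 0
                         || versions[i].priority
                            > versions[static_cast<std::size_t>(best)].priority))
            best = static_cast<int>(i);
    }
    return best;
}

// Checks every CPU describable with num_feature_bits feature bits: whenever
// the given caller version is selected, the given callee version must be the
// one selected too. Only then can the indirection be removed statically.
inline bool redirect_is_safe(const std::vector<Version>& callers, int caller,
                             const std::vector<Version>& callees, int callee,
                             unsigned num_feature_bits) {
    for (unsigned cpu = 0; cpu < (1u << num_feature_bits); ++cpu) {
        if (select_version(callers, cpu) != caller)
            continue;
        if (select_version(callees, cpu) != callee)
            return false;
    }
    return true;
}
```

Take two feature bits, callers {default, needs bit 0} and callees {default, needs bit 0, needs bits 0 and 1}. Redirecting the bit-0 caller to the bit-0 callee is unsafe, because a CPU with both bits still runs that caller but selects the higher-priority callee. This is the second condition above, which the existing checks do not test.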
-
Andrew Carlotti authored
gcc/ChangeLog: * doc/invoke.texi: Add new AArch64 flags.
-
Andrew Carlotti authored
GCC does not emit tlbi instructions, so this only affects the flags passed through to the assembler. gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_7A): Add XS. * config/aarch64/aarch64-option-extensions.def (XS): New flag.
-
Andrew Carlotti authored
GCC does not currently emit the wfet or wfit instructions, so this primarily affects the flags passed through to the assembler. gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_7A): Add WFXT. * config/aarch64/aarch64-option-extensions.def (WFXT): New flag.
-
Andrew Carlotti authored
gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_4A): Add RCPC2. * config/aarch64/aarch64-option-extensions.def (RCPC2): New flag. (RCPC3): Add RCPC2 dependency. * config/aarch64/aarch64.h (TARGET_RCPC2): Use new flag. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/native_cpu_21.c: Add rcpc2 to expected feature string instead of rcpc. * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
-
Andrew Carlotti authored
GCC does not currently emit the axflag or xaflag instructions, so this primarily affects the flags passed through to the assembler. gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_5A): Add FLAGM2. * config/aarch64/aarch64-option-extensions.def (FLAGM2): New flag. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/native_cpu_21.c: Add flagm2 to expected feature string instead of flagm. * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
-
Andrew Carlotti authored
gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_5A): Add FRINTTS. * config/aarch64/aarch64-option-extensions.def (FRINTTS): New flag. * config/aarch64/aarch64.h (TARGET_FRINT): Use new flag. * config/aarch64/arm_acle.h: Use new flag for frintts intrinsics. * config/aarch64/arm_neon.h: Ditto. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/native_cpu_21.c: Add frintts to expected feature string. * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
-
Andrew Carlotti authored
gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_3A): Add JSCVT. * config/aarch64/aarch64-option-extensions.def (JSCVT): New flag. * config/aarch64/aarch64.h (TARGET_JSCVT): Use new flag. * config/aarch64/arm_acle.h: Use new flag for jscvt intrinsics. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/native_cpu_21.c: Add jscvt to expected feature string. * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
-
Andrew Carlotti authored
This includes +fcma as a dependency of +sve, and means that we can finally support fcma intrinsics on a64fx. Also add fcma to the Features list in several cpunative testcases that incorrectly included sve without fcma. gcc/ChangeLog: * config/aarch64/aarch64-arches.def (V8_3A): Add FCMA. * config/aarch64/aarch64-option-extensions.def (FCMA): New flag. (SVE): Add FCMA dependency. * config/aarch64/aarch64.h (TARGET_COMPLEX): Use new flag. * config/aarch64/arm_neon.h: Use new flag for fcma intrinsics. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/info_15: Add fcma to Features. * gcc.target/aarch64/cpunative/info_16: Ditto. * gcc.target/aarch64/cpunative/info_17: Ditto. * gcc.target/aarch64/cpunative/info_8: Ditto. * gcc.target/aarch64/cpunative/info_9: Ditto.
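The effect of a flag dependency can be sketched in a few lines. This is a hypothetical model with made-up bit assignments, not the representation used in aarch64-option-extensions.def; it covers only two dependencies named in this series (SVE requiring FCMA here, and RCPC3 requiring RCPC2 in the RCPC2 commit).

```cpp
#include <cstdint>

// Hypothetical bit assignments, chosen only for this example.
enum : std::uint32_t {
    EXT_FCMA  = 1u << 0,
    EXT_SVE   = 1u << 1,
    EXT_RCPC2 = 1u << 2,
    EXT_RCPC3 = 1u << 3
};

// Enabling a flag also enables the flags it depends on.
inline std::uint32_t enable_with_dependencies(std::uint32_t flags) {
    if (flags & EXT_SVE)
        flags |= EXT_FCMA;   // SVE depends on FCMA
    if (flags & EXT_RCPC3)
        flags |= EXT_RCPC2;  // RCPC3 depends on RCPC2
    return flags;
}
```

Under this model, requesting SVE alone yields a feature set that already contains FCMA, which matches the stated reason for adding fcma to the cpunative Features lists that include sve.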
-
Andrew Carlotti authored
gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_expand_epilogue): Use TARGET_PAUTH. * config/aarch64/aarch64.md: Update comment.
-
Jakub Jelinek authored
Seems I forgot to set_c_expr_source_range for the __builtin_stdc_rotate_* case (the other __builtin_stdc_* cases already have it), which means the locations in expr are uninitialized, sometimes causing ICEs in linemap code, at other times just valgrind errors about uninitialized var uses. 2025-01-10 Jakub Jelinek <jakub@redhat.com> PR c/118376 * c-parser.cc (c_parser_postfix_expression): Call set_c_expr_source_range before break in the __builtin_stdc_rotate_* case. * gcc.dg/pr118376.c: New test.
-