- Jul 29, 2023
-
-
Tobias Burnus authored
Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894 "OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect", namely: In that commit, the code was changed to handle shared-memory devices; however, as pointed out, omp_target_memcpy_check already sets the pointer to NULL in that case. Hence, this commit reverts to the prior version. In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry with cuMemcpy3DPeer (all currently unused) and, in three structs, fixes the reserved-member names and removes a bogus 'const'. It also changes a DLSYM to DLSYM_OPT, as not all plugins support the new functions yet. include/ChangeLog: * cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): Remove bogus 'const' from 'const void *dst' and fix reserved-member names in those structs. (cuMemcpyPeer, cuMemcpyPeerAsync): Add. libgomp/ChangeLog: * target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for GOMP_OFFLOAD_CAP_SHARED_MEM. (omp_target_memcpy_rect_copy): Likewise for lock condition. (gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for memcpy3d/memcpy2d. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
-
Jan Hubicka authored
gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vect-profile-upate-2.c: New test.
-
Jan Hubicka authored
During loop versioning, the vectorizer produces a versioned loop guarded by two conditionals of the form

  if (cond1)
    goto scalar_loop
  else
    goto next_bb
next_bb:
  if (cond2)
    goto scalar_loop
  else
    goto vector_loop

It wants the combined test to have probability prob (which is set to likely) and uses profile_probability::split to determine the probabilities of cond1 and cond2. However, splitting turns

  if (cond)
    goto lab; // ORIG probability

into

  if (cond1)
    goto lab; // FIRST = ORIG * CPROB probability
  if (cond2)
    goto lab; // SECOND probability

which is an "or" instead of an "and". As a result we get a pretty low probability of entering the vectorized loop. This patch fixes this by introducing sqrt to profile_probability (which is the correct way to split this) and also adding pow, which is needed elsewhere. During loop versioning I now produce code as if there was only one combined conditional and then update the probability of the conditional produced (containing cond1). Later the edge is split and a new conditional is added. At that time it is necessary to update the probability of the BB containing the second conditional so everything matches.

gcc/ChangeLog:

* profile-count.cc (profile_probability::sqrt): New member function. (profile_probability::pow): Likewise.
* profile-count.h: (profile_probability::sqrt): Declare. (profile_probability::pow): Likewise.
* tree-vect-loop-manip.cc (vect_loop_versioning): Fix profile update.
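The difference between the "or" split and the "and" split can be checked numerically. The following is a minimal floating-point sketch, not GCC's fixed-point `profile_probability` implementation; the function names `split_or` and `split_and` are hypothetical, and the formula used for SECOND (chosen so that the overall probability of reaching `lab` stays ORIG) is an assumption about the model rather than a quote from the source.

```python
import math


def split_or(orig, cprob):
    """Model of an 'or' split: lab is reached if cond1 OR cond2 fires.

    FIRST = ORIG * CPROB (as in the commit message); SECOND is assumed to be
    chosen so that the overall probability of reaching lab is still ORIG.
    """
    first = orig * cprob
    second = (orig - first) / (1.0 - first)
    return first, second


def split_and(prob):
    """Model of an 'and' split: both guards must pass to reach the vector
    loop, so each guard passes with probability sqrt(prob)."""
    return math.sqrt(prob)


prob = 0.9  # stand-in for "likely"

# 'or' split: 1 - (1 - first) * (1 - second) reproduces prob ...
first, second = split_or(prob, 0.5)
reach_lab = 1.0 - (1.0 - first) * (1.0 - second)

# ... but using those two values as pass probabilities of two chained guards
# gives a much lower probability of entering the vector loop.
wrong_vector_entry = first * second

# 'and' split: sqrt(prob) on each guard multiplies back to prob.
guard = split_and(prob)
right_vector_entry = guard * guard
```

With `prob = 0.9`, the chained guards built from the "or" split enter the vector loop with probability well below 0.9, while two guards of probability `sqrt(0.9)` multiply back to 0.9.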
-
GCC Administrator authored
-
- Jul 28, 2023
-
-
Andrew MacLeod authored
* gimple-range-cache.cc (ssa_cache::merge_range): New. (ssa_lazy_cache::merge_range): New. * gimple-range-cache.h (class ssa_cache): Adjust prototypes. (class ssa_lazy_cache): Ditto. * gimple-range.cc (assume_query::calculate_op): Use merge_range.
-
Andrew MacLeod authored
* tree-ssa-propagate.cc (substitute_and_fold_engine::value_on_edge): Move from value-query.cc. (substitute_and_fold_engine::value_of_stmt): Ditto. (substitute_and_fold_engine::range_of_expr): New. * tree-ssa-propagate.h (substitute_and_fold_engine): Inherit from range_query. New prototypes. * value-query.cc (value_query::value_on_edge): Relocate. (value_query::value_of_stmt): Ditto. * value-query.h (class value_query): Remove. (class range_query): Remove base class. Adjust prototypes.
-
Andrew MacLeod authored
PR tree-optimization/110205 * gimple-range-cache.h (ranger_cache::m_estimate): Delete. * range-op-mixed.h (operator_bitwise_xor::op1_op2_relation_effect): Add final override. * range-op.cc (operator_lshift): Add missing final overrides. (operator_rshift): Ditto.
-
Joseph Myers authored
* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po, ja.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po, zh_TW.po: Update.
-
Jose E. Marchesi authored
clang disables tail call optimizations in BPF targets. Do the same in GCC. gcc/ChangeLog: * config/bpf/bpf.cc (bpf_option_override): Disable tail-call optimizations in BPF target.
-
Harald Anlauf authored
gcc/fortran/ChangeLog: PR fortran/110825 * gfortran.texi: Clarify argument passing convention. * trans-expr.cc (gfc_conv_procedure_call): Do not pass the character length as hidden argument when the declared dummy argument is assumed-type. gcc/testsuite/ChangeLog: PR fortran/110825 * gfortran.dg/assumed_type_18.f90: New test.
-
Honza authored
I have noticed that for all these three cases I need the same update of the loop exit probability. While my earlier patch unified it for unrollers, this patch makes it more general and also simplifies tree-ssa-loop-split.cc. I also refactored the code, since with all the special cases for corrupted profiles it gets relatively long. I now also handle multiple loop exits in the RTL unroller. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * cfgloopmanip.cc (loop_count_in): Break out from ... (loop_exit_for_scaling): Break out from ... (update_loop_exit_probability_scale_dom_bbs): Break out from ...; add more sanity checks and debug info. (scale_loop_profile): ... here. (create_empty_loop_on_edge): Fix whitespace. * cfgloopmanip.h (update_loop_exit_probability_scale_dom_bbs): Declare. * loop-unroll.cc (unroll_loop_constant_iterations): Use update_loop_exit_probability_scale_dom_bbs. * tree-ssa-loop-manip.cc (update_exit_probability_after_unrolling): Remove. (tree_transform_and_unroll_loop): Use update_loop_exit_probability_scale_dom_bbs. * tree-ssa-loop-split.cc (split_loop): Use update_loop_exit_probability_scale_dom_bbs.
-
Patrick O'Neill authored
On rv32 targets, this patch fixes: FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test for excess errors) cc1: error: ABI requires '-march=rv32' gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/madd-split2-1.c: Add -mabi=lp64d to dg-options. Signed-off-by:
Patrick O'Neill <patrick@rivosinc.com>
-
Ng YongXiang authored
PR c++/110057 PR ipa/83054 gcc/cp/ChangeLog: * init.cc (build_vec_delete_1): Devirtualize array destruction. gcc/testsuite/ChangeLog: * g++.dg/warn/pr83054.C: Remove devirtualization warning. * g++.dg/lto/pr89335_0.C: Likewise. * g++.dg/tree-ssa/devirt-array-destructor-1.C: New test. * g++.dg/tree-ssa/devirt-array-destructor-2.C: New test. * g++.dg/warn/pr83054-2.C: New test. Signed-off-by:
Ng Yong Xiang <yongxiangng@gmail.com>
-
Jan Hubicka authored
Extend tree-ssa-loop-split to understand tests of the form if (i==0) and if (i!=0), which trigger only during the first iteration. Naturally we should also be able to handle the last iteration, or split into 3 cases if the test can indeed fire in the middle of the loop. The last iteration is a bit trickier to pattern-match, so I want to do it incrementally, but I implemented the easy case using value ranges, which handles loops with a constant number of iterations. The testcase gets a misupdated profile; I will also fix that incrementally. gcc/ChangeLog: PR middle-end/77689 * tree-ssa-loop-split.cc: Include value-query.h. (split_at_bb_p): Analyze cases where EQ/NE can be turned into LT/LE/GT/GE; return updated guard code. (split_loop): Use guard code. gcc/testsuite/ChangeLog: PR middle-end/77689 * g++.dg/tree-ssa/loop-split-1.C: New test.
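The idea of turning an EQ guard into an ordered comparison can be modelled outside the compiler. This is a small sketch under the assumption that the induction variable starts at 0 and increases by 1, so `i == 0` holds exactly when `i < 1`; the function names are hypothetical and the code only mimics the effect of the transformation, not the pass itself.

```python
def run_original(n):
    """Loop with an EQ guard that is true only in the first iteration."""
    trace = []
    for i in range(n):
        if i == 0:
            trace.append(("first", i))
        else:
            trace.append(("rest", i))
    return trace


def run_split(n):
    """Same loop after splitting: the guard i == 0 is treated as i < 1,
    which is valid because i starts at 0 and only increases."""
    trace = []
    i = 0
    # First loop: iterations where the guard holds (at most one).
    while i < n and i < 1:
        trace.append(("first", i))
        i += 1
    # Second loop: the guard is known to be false from here on.
    while i < n:
        trace.append(("rest", i))
        i += 1
    return trace
```

Both versions visit the same iterations in the same order, including the corner case `n == 0` where neither loop body runs.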
-
Roger Sayle authored
This patch is one of a series of fixes for PR rtl-optimization/110587, a compile-time regression with -O0, that attempts to address the underlying cause. As noted previously, the pathological test case pr28071.c contains a large number of useless register-to-register moves that can produce quadratic behaviour (in LRA). These moves are generated during RTL expansion in emit_group_load_1, where the middle-end attempts to simplify the source before calling extract_bit_field. This is reasonable if the source is a complex expression (from before the tree-ssa optimizers), or a SUBREG, or a hard register, but it's not particularly useful to copy a pseudo register into a new pseudo register. This patch eliminates that redundancy. The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains 777K lines, with this patch it contains 717K lines, i.e. saving about 60K lines (admittedly of debugging text output, but it makes the point). 2023-07-28 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR middle-end/28071 PR rtl-optimization/110587 * expr.cc (emit_group_load_1): Simplify logic for calling force_reg on ORIG_SRC, to avoid making a copy if the source is already in a pseudo register.
-
Jan Hubicka authored
This patch fixes the profile update in the first case of loop splitting. The pass still gives up on very basic testcases:

  __attribute__ ((noinline,noipa))
  void
  test1 (int n)
  {
    if (n <= 0 || n > 100000)
      return;
    for (int i = 0; i <= n; i++)
      {
        if (i < n)
          do_something ();
        if (a[i])
          do_something2();
      }
  }

Here I needed to add the conditional that enforces a sane value range of n. The reason is that the pass gives up on:

  !number_of_iterations_exit (loop1, exit1, &niter, false, true)

and without the conditional we get the assumption that n>=0 and not INT_MAX. I think from overflow we should derive that the INT_MAX test is not needed, and since the loop does nothing for n<0 it is also just paranoia. I am not sure how to fix this though :(. In general the pass does not really need to compute the iteration count. It only needs to know what direction the IVs go so it can detect tests that fire in the first part of the iteration space. Rich, any idea what the correct test should be?

In testcase:

  for (int i = 0; i < 200; i++)
    if (i < 150)
      do_something ();
    else
      do_something2 ();

the old code did a wrong update of the exit condition probabilities. We know that the first loop iterates 150 times and the second loop 50 times, and we get it by simply scaling the loop body by the probability of the inner test.

With the patch we now get:

  <bb 2> [count: 1000]:

  <bb 3> [count: 150000]:    <- loop 1 correctly iterates 149 times
  # i_10 = PHI <i_7(8), 0(2)>
  do_something ();
  i_7 = i_10 + 1;
  if (i_7 <= 149)
    goto <bb 8>; [99.33%]
  else
    goto <bb 17>; [0.67%]

  <bb 8> [count: 149000]:
  goto <bb 3>; [100.00%]

  <bb 16> [count: 1000]:
  # i_15 = PHI <i_18(17)>

  <bb 9> [count: 49975]:     <- loop 2 should iterate 50 times but we are slightly wrong
  # i_3 = PHI <i_15(16), i_14(13)>
  do_something2 ();
  i_14 = i_3 + 1;
  if (i_14 != 200)
    goto <bb 13>; [98.00%]
  else
    goto <bb 7>; [2.00%]

  <bb 13> [count: 48975]:
  goto <bb 9>; [100.00%]

  <bb 17> [count: 1000]:     <- this test is always true because it is reached from bb 3
  # i_18 = PHI <i_7(3)>
  if (i_18 != 200)
    goto <bb 16>; [99.95%]
  else
    goto <bb 7>; [0.05%]

  <bb 7> [count: 1000]:
  return;

The reason why we are slightly wrong is the condition in bb17, which is always true but the pass does not know it. Rich, any idea how to do that? I think connect_loops should work out the case where the loop exit condition is never satisfied at the time the split condition fails for the first time.

Before the patch we get a lot of mismatches on hmmer. The profile report claims:

dump id |static mismat|dynamic mismatch                                     |
        |in count     |in count                  |time                      |
lsplit  |     5     +5|  8151850567   +8151850567| 531506481006       +57.9%|
ldist   |     9     +4| 15345493501   +7193642934| 606848841056       +14.2%|
ifcvt   |    10     +1| 15487514871    +142021370| 689469797790       +13.6%|
vect    |    35    +25| 17558425961   +2070911090| 517375405715       -25.0%|
cunroll |    42     +7| 16898736178    -659689783| 452445796198        -4.9%|
loopdone|    33     -9|  2678017188  -14220718990| 330969127663             |
tracer  |    34     +1|  2678018710         +1522| 330613415364        +0.0%|
fre     |    33     -1|  2676980249      -1038461| 330465677073        -0.0%|
expand  |    28     -5|  2497468467    -179511782|--------------------------|

With the patch:

lsplit  |     0       |           0              | 328723360744        -2.3%|
ldist   |     0       |           0              | 396193562452       +20.6%|
ifcvt   |     1     +1|    71010686     +71010686| 478743508522       +20.8%|
vect    |    14    +13|   697518955    +626508269| 299398068323       -37.5%|
cunroll |    13     -1|   489349408    -208169547| 257777839725       -10.5%|
loopdone|    11     -2|   402558559     -86790849| 201010712702             |
tracer  |    13     +2|   402977200       +418641| 200651036623        +0.0%|
fre     |    13       |   402622146       -355054| 200344398654        -0.2%|
expand  |    11     -2|   333608636     -69013510|--------------------------|

So there are no mismatches for lsplit and ldist, and lsplit now thinks it improves speed by 2.3% rather than regressing it by 57%. The update is still not perfect since we do not work out that the second loop never iterates. Ifcvt wrecks the profile by design since it inserts conditionals with both arms 100% that will be eliminated later after vect. It is not clear to me what happens in vect though.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR middle-end/106923
* tree-ssa-loop-split.cc (connect_loops): Change probability of the test preconditioning second loop to very_likely. (fix_loop_bb_probability): Handle correctly case where one of the arms of the conditional is empty. (split_loop): Fold the test guarding first condition to see if it is constant true; Set correct entry block probabilities of the split loops; determine correct loop exit probabilities.

gcc/testsuite/ChangeLog:

PR middle-end/106293
* gcc.dg/tree-prof/loop-split-1.c: New test.
* gcc.dg/tree-prof/loop-split-2.c: New test.
* gcc.dg/tree-prof/loop-split-3.c: New test.
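The counts and percentages in the dump above follow from simple arithmetic: a loop that runs N iterations per entry evaluates its exit test N times and leaves once. The sketch below recomputes them under that assumption; `loop_profile` is a hypothetical helper written for this illustration, not a GCC function.

```python
def loop_profile(entry_count, iterations):
    """Expected profile of a loop that iterates `iterations` times each time
    it is entered: the exit test leaves the loop once per `iterations`
    evaluations."""
    exit_prob = 1.0 / iterations
    return {
        "header_count": entry_count * iterations,
        "exit_prob": exit_prob,
        "backedge_prob": 1.0 - exit_prob,
    }


# Loop 1: entered 1000 times, 150 iterations each (bb 3 in the dump).
loop1 = loop_profile(1000, 150)

# Loop 2: 50 iterations each (bb 9 in the dump). The guard in bb 17 is given
# 99.95% rather than 100%, so slightly fewer than 1000 entries are counted.
loop2_ideal = loop_profile(1000, 50)
loop2_dump = loop_profile(1000 * 0.9995, 50)
```

Here `loop1` gives a header count of 150000 with a 99.33% back edge, `loop2_ideal` gives 50000 with a 98% back edge, and `loop2_dump` gives roughly 49975, which matches the "slightly wrong" count shown for bb 9.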
-
Eric Botcazou authored
gcc/ada/ * gcc-interface/trans.cc (gnat_to_gnu): Restrict previous change to the case where the simple return statement has got no storage pool.
-
Clément Chigot authored
All functions but Interrupt_Wait in s-inmaop__posix are checking the result of their syscalls with an assert. However, any return code of sigwait different than 0 means that something went wrong for it. From sigwait man: > RETURN VALUE > On success, sigwait() returns 0. On error, it returns a > positive error number (listed in ERRORS). gcc/ada/ * libgnarl/s-inmaop__posix.adb: Add assert after sigwait in Interrupt_Wait
-
Javier Miranda authored
Add dummy build-in-place parameters when a BIP function does not require the BIP parameters but it is a dispatching operation that inherited them. gcc/ada/ * einfo-utils.adb (Underlying_Type): Protect recursion call against non-available attribute Etype. * einfo.ads (Protected_Subprogram): Fix typo in documentation. * exp_ch3.adb (BIP_Function_Call_Id): New subprogram. (Expand_N_Object_Declaration): Improve code that evaluates if the object is initialized with a BIP function call. * exp_ch6.adb (Is_True_Build_In_Place_Function_Call): New subprogram. (Add_Task_Actuals_To_Build_In_Place_Call): Add dummy actuals if the function does not require the BIP task actuals but it is a dispatching operation that inherited them. (Build_In_Place_Formal): Improve code to avoid never-ending loop if the BIP formal is not found. (Add_Dummy_Build_In_Place_Actuals): New subprogram. (Expand_Call_Helper): Add calls to Add_Dummy_Build_In_Place_Actuals. (Expand_N_Extended_Return_Statement): Adjust assertion. (Expand_Simple_Function_Return): Adjust assertion. (Make_Build_In_Place_Call_In_Allocator): No action needed if the called function inherited the BIP extra formals but it is not a true BIP function. (Make_Build_In_Place_Call_In_Assignment): Ditto. * exp_intr.adb (Expand_Dispatching_Constructor_Call): Remove code reporting unsupported case (since this patch adds support for it). * sem_ch6.adb (Analyze_Subprogram_Body_Helper): Adding assertion to ensure matching of BIP formals when setting the Protected_Formal field of a protected subprogram to reference the corresponding extra formal of the subprogram that implements it. (Might_Need_BIP_Task_Actuals): New subprogram. (Create_Extra_Formals): Improve code adding inherited extra formals.
-
Pascal Obry authored
gcc/ada/ * s-oscons-tmplt.c: Add support for SO_BINDTODEVICE constant. * libgnat/g-socket.ads (Set_Socket_Option): Handle SO_BINDTODEVICE option. (Get_Socket_Option): Handle SO_BINDTODEVICE option. * libgnat/g-socket.adb: Likewise. (Get_Socket_Option): Handle the case where IF_NAMESIZE is not defined and so equal to -1.
-
Léo Creuse authored
This change corrects the Has_Decision predicate in par_sco.adb to properly consider predicates of quantified expressions as decisions. gcc/ada/ * par_sco.adb (Has_Decision): Consider that quantified expressions contain decisions.
-
Ronan Desplanques authored
This patch only affects the single-entry implementation of protected objects. Before this patch, there was a race condition where a task that called an entry could put itself to sleep right after another task had executed the entry as a proxy and signalled the not-yet-waiting first task, which caused the first task to enter a deadlock. Note that this race condition has been identified and fixed before for the implementations of the run-time that live under hie/. This patch reworks the locking sequence so that it is closer to the one that's used in the multiple-entry implementation of protected objects. The code for the multiple-entry implementation is spread across multiple subprograms. To draw a parallel with the section this patch modifies, one can read the following subprograms: - System.Tasking.Protected_Objects.Operations.Protected_Entry_Call - System.Tasking.Entry_Calls.Wait_For_Completion - System.Tasking.Entry_Calls.Check_Pending_Actions_For_Entry_Call This patch also adds a comment that explicitly states the locking constraint that must hold in the affected section. gcc/ada/ * libgnarl/s-tposen.adb: Fix race condition. Add comment to justify the locking timing.
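The race described here is an instance of the general "lost wakeup" problem. The following sketch illustrates the usual remedy in Python rather than in the Ada run-time: the completion flag is set and tested under the same lock that protects the wait, so a signal that arrives before the caller starts waiting is not lost. The class `EntryCall` and its methods are hypothetical names for this analogy and do not correspond to the code in s-tposen.adb.

```python
import threading


class EntryCall:
    """Toy model of an entry call that may be completed by another thread."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False

    def complete(self):
        """Called by the thread that executes the entry body as a proxy."""
        with self._cond:
            self._done = True      # state change happens under the lock
            self._cond.notify()

    def wait_for_completion(self, timeout=5.0):
        """Called by the original caller. The predicate is checked under the
        lock before sleeping, so an earlier notify cannot be missed."""
        with self._cond:
            return self._cond.wait_for(lambda: self._done, timeout=timeout)


def run_once():
    call = EntryCall()
    proxy = threading.Thread(target=call.complete)
    proxy.start()
    proxy.join()  # force the "signalled before waiting" ordering
    return call.wait_for_completion()
```

Because `run_once` joins the proxy thread first, the signal always precedes the wait; the call still returns `True` immediately instead of blocking.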
-
Viljar Indus authored
gcc/ada/ * exp_util.adb (Find_Optional_Prim_Op): use "No" instead of "= Empty"
-
Piotr Trojanek authored
When skipping check on subprograms built for class-wide preconditions we must deal with the current scope not being a subprogram, e.g. it could be a declare-block. gcc/ada/ * sem_res.adb (Resolve_Actuals): Add guard for the call to Class_Preconditions_Subprogram.
-
Eric Botcazou authored
It occurs at compile time on an aggregate of a 2-dimensional packed array type whose component type is itself a packed array, because the compiler is trying to pack the intermediate aggregate and ends up rewriting a bunch of subcomponents. This optimization was originally devised for the case of a scalar component type so the change adds this restriction. gcc/ada/ * exp_aggr.adb (Is_Two_Dim_Packed_Array): Return true only if the component type of the array is scalar.
-
Piotr Trojanek authored
GNAT has a heuristic to warn about missing return statements in functions. This warning was escalated to errors when operating in GNATprove mode and SPARK_Mode was On. However, this heuristic was imprecise and caused spurious errors. Also, it was applied after the Push_Scope/End_Scope, so for functions acting as compilation units it was using the wrong SPARK_Mode. It is better to simply leave this detection to GNATprove. gcc/ada/ * sem_ch6.adb (Check_Statement_Sequence): Only warn about missing return statements and let GNATprove emit a check when needed.
-
Tom Tromey authored
This patch changes xsnamest and gen_il-gen to emit various constants as enums rather than a sequence of preprocessor defines. This enables better debugging and somewhat better type safety. gcc/ada/ * fe.h (Convention): Now inline function. * gen_il-gen.adb (Put_C_Type_And_Subtypes.Put_Enum_Lit) (Put_C_Type_And_Subtypes.Put_Kind_Subtype, Put_C_Getter): Emit enum. * snames.h-tmpl (Name_Id, Name_, Attribute_Id, Attribute_) (Convention_Id, Convention_, Pragma_Id, Pragma_): Now enum. (Get_Attribute_Id, Get_Pragma_Id): Now inline functions. * types.h (Node_Kind, Entity_Kind, Convention_Id, Name_Id): Now enum. * xsnamest.adb (Output_Header_Line, Make_Value): Emit enum.
-
Piotr Trojanek authored
Minor typo in comment. gcc/ada/ * libgnat/a-except.ads (Save_Occurrence): Fix typo.
-
Piotr Trojanek authored
It is much simpler and safer for the routine Number_Formals to accept subprogram entities that have no formals. gcc/ada/ * einfo-utils.adb (Number_Formals): Change types in body. * einfo-utils.ads (Number_Formals): Change type in spec. * einfo.ads (Number_Formals): Change type in comment. * sem_ch13.adb (Is_Property_Function): Fix style in a caller of Number_Formals that was likely to crash because of missing guards.
-
Piotr Trojanek authored
Fix crash occurring when attribute System'To_Address is used without a WITH clause for package System. gcc/ada/ * sem_warn.adb (Check_Infinite_Loop_Warning): Don't look at the type of actual parameter when it has no type at all, e.g. because the entire subprogram call is illegal.
-
xuli authored
Computation of `vsadd`, `vsaddu`, `vssub`, and `vssubu` does not need the rounding mode; therefore the intrinsics of these instructions do not have the parameter for rounding mode control. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Remove rounding mode of vsadd[u] and vssub[u]. * config/riscv/vector.md: Ditto. gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/base/bug-12.C: Adapt testcase. * g++.target/riscv/rvv/base/bug-14.C: Ditto. * g++.target/riscv/rvv/base/bug-18.C: Ditto. * g++.target/riscv/rvv/base/bug-19.C: Ditto. * g++.target/riscv/rvv/base/bug-20.C: Ditto. * g++.target/riscv/rvv/base/bug-21.C: Ditto. * g++.target/riscv/rvv/base/bug-22.C: Ditto. * g++.target/riscv/rvv/base/bug-23.C: Ditto. * g++.target/riscv/rvv/base/bug-3.C: Ditto. * g++.target/riscv/rvv/base/bug-8.C: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto. 
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto. * gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto. * gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test. * gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.
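Why saturating add and subtract need no rounding mode can be seen from a scalar model: the exact result is only clamped to the representable range, and no low-order bits are shifted out that would have to be rounded. The functions below are a hypothetical per-element sketch of that behaviour for illustration, not the intrinsics themselves.

```python
def sat_add_signed(a, b, bits=8):
    """Signed saturating add: clamp the exact sum to [-2^(bits-1), 2^(bits-1)-1]."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))


def sat_add_unsigned(a, b, bits=8):
    """Unsigned saturating add: clamp the exact sum to [0, 2^bits - 1]."""
    return min((1 << bits) - 1, a + b)


def sat_sub_signed(a, b, bits=8):
    """Signed saturating subtract: clamp the exact difference."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, a - b))


def sat_sub_unsigned(a, b, bits=8):
    """Unsigned saturating subtract: results below zero clamp to 0."""
    return max(0, a - b)
```

Every result is either the exact value or one of the range limits, so there is nothing for a rounding mode to decide.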
-
Jan Hubicka authored
While looking at the profile misupdate on hmmer I noticed that the loop splitting pass is not able to handle the loop it gives as an example of what it should apply to:

   One transformation of loops like:

   for (i = 0; i < 100; i++)
     {
       if (i < 50)
         A;
       else
         B;
     }

   into:

   for (i = 0; i < 50; i++)
     {
       A;
     }
   for (; i < 100; i++)
     {
       B;
     }

The problem is that ivcanon turns the test into i != 100 and the pass explicitly gives up on any loops ending with a != test. It needs to know the direction of the induction variable in order to derive the right conditions, but that can also be done from the step. It turns out that there are no testcases for basic loop splitting. I will add some with the profile update fix.

gcc/ChangeLog:

* tree-ssa-loop-split.cc (split_loop): Also support NE driven loops when IV test is not overflowing.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ifc-12.c: Disable loop splitting.
* gcc.target/i386/avx2-gather-6.c: Likewise.
* gcc.target/i386/avx2-vect-aggressive.c: Likewise.
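The point that the direction can be taken from the step even when the exit test is `!=` can be illustrated with a small model. This is a sketch assuming the induction variable reaches `end` exactly (no overflow or wrap-around); the function names and the trace format are hypothetical.

```python
def run_original(start, end, step, border):
    """Loop with a != exit test and an inner guard i < border."""
    trace = []
    i = start
    while i != end:
        trace.append(("A" if i < border else "B", i))
        i += step
    return trace


def run_split(start, end, step, border):
    """Split version: the sign of the step tells which arm runs first."""
    trace = []
    i = start
    if step > 0:
        # Increasing IV: i < border holds first, then stays false.
        first, second = "A", "B"
        in_first = lambda v: v < border
    else:
        # Decreasing IV: i < border is false first, then stays true.
        first, second = "B", "A"
        in_first = lambda v: v >= border
    while i != end and in_first(i):
        trace.append((first, i))
        i += step
    while i != end:
        trace.append((second, i))
        i += step
    return trace
```

For the loop in the message, `run_split(0, 100, 1, 50)` runs A for 0..49 and B for 50..99, the same as `run_original`; with `step = -1` the order of the two loops is reversed.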
-
liuhongt authored
Prevent rtl optimization of vec_duplicate + zero_extend to vpbroadcastm since there could be an extra kmov after RA. gcc/ChangeLog: PR target/110788 * config/i386/sse.md (avx512cd_maskb_vec_dup<mode>): Add UNSPEC_MASKOP. (avx512cd_maskw_vec_dup<mode>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110788.c: New test.
-
GCC Administrator authored
-
- Jul 27, 2023
-
-
David Faust authored
BPF ISA V4 introduces sign-extending move and load operations. This patch makes the BPF backend generate those instructions, when enabled and useful. A new option, -m[no-]smov gates generation of these instructions, and is enabled by default for -mcpu=v4 and above. Tests for the new instructions and documentation for the new options are included. PR target/110782 PR target/110784 gcc/ * config/bpf/bpf.opt (msmov): New option. * config/bpf/bpf.cc (bpf_option_override): Handle it here. * config/bpf/bpf.md (*extendsidi2): New. (extendhidi2): New. (extendqidi2): New. (extendsisi2): New. (extendhisi2): New. (extendqisi2): New. * doc/invoke.texi (Option Summary): Add -msmov eBPF option. (eBPF Options): Add -m[no-]smov. Document that -mcpu=v4 also enables -msmov. gcc/testsuite/ * gcc.target/bpf/sload-1.c: New test. * gcc.target/bpf/sload-pseudoc-1.c: New test. * gcc.target/bpf/smov-1.c: New test. * gcc.target/bpf/smov-pseudoc-1.c: New test.
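For readers unfamiliar with the operation, sign extension copies the top bit of a narrow value into all higher bits of the wider result, whereas zero extension fills them with zeros. The helpers below are a generic sketch of that arithmetic on Python integers; they are hypothetical names and say nothing about how the BPF instructions are encoded or selected.

```python
MASK64 = (1 << 64) - 1


def zero_extend(value, bits):
    """Keep the low `bits` bits; the upper bits become zero."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Interpret the low `bits` bits as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def as_u64(value):
    """Bit pattern of a (possibly negative) integer in a 64-bit register."""
    return value & MASK64
```

For example, the byte `0xFF` zero-extends to 255 but sign-extends to -1, whose 64-bit pattern is all ones.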
-
David Faust authored
This patch makes some minor cleanups to eBPF options documented in invoke.texi: - Delete some vestigial docs for removed -mkernel option - Add -mbswap and -msdiv to the option summary - Note the negative versions of several options - Note that -mcpu=v4 also enables -msdiv. gcc/ * doc/invoke.texi (Option Summary): Remove -mkernel eBPF option. Add -mbswap and -msdiv eBPF options. (eBPF Options): Remove -mkernel. Add -mno-{jmpext, jmp32, alu32, v3-atomics, bswap, sdiv}. Document that -mcpu=v4 also enables -msdiv.
-
David Faust authored
The pseudo-C output templates for these instructions were incorrectly using operand 1 rather than operand 2 on the RHS, which led to some very incorrect assembly generation with -masm=pseudoc. gcc/ * config/bpf/bpf.md (add<AM:mode>3): Use %w2 instead of %w1 in pseudo-C dialect output template. (sub<AM:mode>3): Likewise. gcc/testsuite/ * gcc.target/bpf/alu-2.c: New test. * gcc.target/bpf/alu-pseudoc-2.c: Likewise.
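The effect of picking the wrong operand number can be shown with a toy template expander. This is only an illustration: the `expand` function, the register names, and the exact template strings are assumptions made for the example, and the toy ignores what the `w` modifier actually does in the real output templates.

```python
import re


def expand(template, operands):
    """Replace %N or %wN in a template with operand N (modifier ignored)."""
    return re.sub(r"%w?(\d)", lambda m: operands[int(m.group(1))], template)


# For a two-address add, operand 0 (destination) and operand 1 (first source)
# are the same register; operand 2 is the value being added.
operands = ["r1", "r1", "r2"]

buggy = expand("%w0 += %w1", operands)   # uses operand 1 on the RHS
fixed = expand("%w0 += %w2", operands)   # uses operand 2 on the RHS
```

In this model the buggy template produces `r1 += r1`, which doubles the destination instead of adding the second source, while the fixed template produces `r1 += r2`.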
-
Jan Hubicka authored
gcc/ChangeLog: * tree-vect-loop.cc (optimize_mask_stores): Make store likely.
-
Jan Hubicka authored
This patch fixes the profile update after RTL unrolling, which is now done the same way as in the tree one. We still produce a (slightly) corrupted profile for multiple-exit loops, which I can try to fix incrementally. I also updated testcases to look for profile mismatches so they do not creep back in again. gcc/ChangeLog: * cfgloop.h (single_dom_exit): Declare. * cfgloopmanip.h (update_exit_probability_after_unrolling): Declare. * cfgrtl.cc (struct cfg_hooks): Fix comment. * loop-unroll.cc (unroll_loop_constant_iterations): Update exit edge. * tree-ssa-loop-ivopts.h (single_dom_exit): Do not declare it here. * tree-ssa-loop-manip.cc (update_exit_probability_after_unrolling): Break out from ... (tree_transform_and_unroll_loop): ... here; gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/peel-1.c: Test for profile mismatches. * gcc.dg/tree-prof/unroll-1.c: Test for profile mismatches. * gcc.dg/tree-ssa/peel1.c: Test for profile mismatches. * gcc.dg/unroll-1.c: Test for profile mismatches. * gcc.dg/unroll-3.c: Test for profile mismatches. * gcc.dg/unroll-4.c: Test for profile mismatches. * gcc.dg/unroll-5.c: Test for profile mismatches. * gcc.dg/unroll-6.c: Test for profile mismatches.
-
Tobias Burnus authored
The previous version failed to diagnose when the 'teams' was nested more deeply inside the target region, e.g. inside a DO or some block or structured block. PR fortran/110725 PR middle-end/71065 gcc/fortran/ChangeLog: * openmp.cc (resolve_omp_target): Minor cleanup. * parse.cc (decode_omp_directive): Find TARGET statement also higher in the stack. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/teams-6.f90: Extend.
-