- Jan 11, 2024
-
-
Jin Ma authored
Due to the premature split optimizations for XTheadFMemIdx, GPR is allocated when reload allocates registers, resulting in the following insn. (insn 66 21 64 5 (set (reg:DF 14 a4 [orig:136 <retval> ] [136]) (mem:DF (plus:SI (reg/f:SI 15 a5 [141]) (ashift:SI (reg/v:SI 10 a0 [orig:137 i ] [137]) (const_int 3 [0x3]))) [0 S8 A64])) 218 {*movdf_hardfloat_rv32} (nil)) Since we currently do not support adjustments to th_m_mir/th_m_miu, which will trigger ICE. So it is recommended to place the split optimizations after reload to ensure FPR when registers are allocated. gcc/ChangeLog: * config/riscv/thead.md: Add limits for splits. gcc/testsuite/ChangeLog: * gcc.target/riscv/xtheadfmemidx-medany.c: New test.
-
Andrew Pinski authored
The problem here is after the recent vectorizer improvements, we end up with a comparison against a vector bool 0 which then tries expand_single_bit_test which is not expecting vector comparisons at all. The IR was: vector(4) <signed-boolean:1> mask_patt_5.13; _Bool _12; mask_patt_5.13_44 = vect_perm_even_41 != { 0.0, 1.0e+0, 2.0e+0, 3.0e+0 }; _12 = mask_patt_5.13_44 == { 0, 0, 0, 0 }; and we tried to call expand_single_bit_test for the last comparison. Rejecting the vector comparison is needed. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/113322 gcc/ChangeLog: * expr.cc (do_store_flag): Don't try single bit tests with comparison on vector types. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr113322-1.c: New test. Signed-off-by:
Andrew Pinski <quic_apinski@quicinc.com>
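A hypothetical reduction (pr113322-1.c is the real test and may differ): a strided comparison whose "all lanes" reduction yields exactly this kind of vector-boolean equality:

    _Bool
    all_differ (float *x)
    {
      _Bool r = 1;
      /* Even-indexed elements compared against 0.0, 1.0, 2.0, 3.0: the
         vectorizer forms an even-element permute, a mask compare, and an
         all-of reduction over the mask.  */
      for (int i = 0; i < 8; i += 2)
        r &= (x[i] != (float) (i / 2));
      return r;
    }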
-
Andrew Pinski authored
Ranger currently does not cope with the complexity of COND_EXPR in some cases, so delaying the simplification of `1/x` for signed types helps code generation. tree-ssa/divide-8.c is a new testcase where this can help. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/113301 gcc/ChangeLog: * match.pd (`1/x`): Delay signed case until late. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/divide-8.c: New test. Signed-off-by:
Andrew Pinski <quic_apinski@quicinc.com>
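A worked check of the identity being delayed (a sketch; divide-8.c itself may test something else): for signed x, 1/x is x when x is 1 or -1 and 0 otherwise, which is exactly the kind of COND_EXPR chain ranger currently handles poorly:

    #include <assert.h>

    static int
    folded_one_div (int x)
    {
      /* One way to express the simplified form in C.  */
      return (x == 1 || x == -1) ? x : 0;
    }

    int
    main (void)
    {
      for (int x = -5; x <= 5; x++)
        if (x != 0)
          assert (1 / x == folded_one_div (x));
      return 0;
    }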
-
Jason Merrill authored
These two lines have been getting XPASS since the test was added. gcc/testsuite/ChangeLog: * g++.dg/cpp23/explicit-obj-diagnostics7.C: Remove xfail.
-
Tamar Christina authored
This removes -save-temps from the tests I've introduced to fix the LTO mismatches. gcc/testsuite/ChangeLog: PR testsuite/113319 * gcc.dg/bic-bitmask-13.c: Remove -save-temps. * gcc.dg/bic-bitmask-14.c: Likewise. * gcc.dg/bic-bitmask-15.c: Likewise. * gcc.dg/bic-bitmask-16.c: Likewise. * gcc.dg/bic-bitmask-17.c: Likewise. * gcc.dg/bic-bitmask-18.c: Likewise. * gcc.dg/bic-bitmask-19.c: Likewise. * gcc.dg/bic-bitmask-20.c: Likewise. * gcc.dg/bic-bitmask-21.c: Likewise. * gcc.dg/bic-bitmask-22.c: Likewise. * gcc.dg/bic-bitmask-7.c: Likewise. * gcc.dg/vect/vect-early-break-run_1.c: Likewise. * gcc.dg/vect/vect-early-break-run_10.c: Likewise. * gcc.dg/vect/vect-early-break-run_2.c: Likewise. * gcc.dg/vect/vect-early-break-run_3.c: Likewise. * gcc.dg/vect/vect-early-break-run_4.c: Likewise. * gcc.dg/vect/vect-early-break-run_5.c: Likewise. * gcc.dg/vect/vect-early-break-run_6.c: Likewise. * gcc.dg/vect/vect-early-break-run_7.c: Likewise. * gcc.dg/vect/vect-early-break-run_8.c: Likewise. * gcc.dg/vect/vect-early-break-run_9.c: Likewise.
-
Richard Biener authored
Vectorization of bit-precision inductions isn't implemented but we don't check this, instead we ICE during transform. PR tree-optimization/112505 * tree-vect-loop.cc (vectorizable_induction): Reject bit-precision induction. * gcc.dg/vect/pr112505.c: New testcase.
-
Richard Biener authored
The following makes sure the resulting boolean type is the same when eliding a float extension. PR tree-optimization/113126 * match.pd ((double)float CMP (double)float -> float CMP float): Make sure the boolean type is the same. * fold-const.cc (fold_binary_loc): Likewise. * gcc.dg/torture/pr113126.c: New testcase.
-
Richard Biener authored
The following avoids a mismatch between an early query for maximum number of iterations of a loop and a late one when through ranger we'd get iterations estimated. Instead make sure we compute niters before querying the iteration bound. PR tree-optimization/112636 * tree-ssa-loop-ch.cc (ch_base::copy_headers): Call estimate_numbers_of_iterations before querying get_max_loop_iterations_int. (pass_ch::execute): Initialize SCEV and loops appropriately. * gcc.dg/pr112636.c: New testcase.
-
Pan Li authored
insert_var_expansion_initialization depends on HONOR_SIGNED_ZEROS to initialize the unrolling accumulators to +0.0f instead of -0.0f when the no-signed-zeros option is given. Unfortunately, we should always keep the -0.0f here because: * -0.0f is always the correct initial value. * We need to support targets that always honor signed zeros. Thus, we need to leverage MODE_HAS_SIGNED_ZEROS when initializing instead of HONOR_SIGNED_ZEROS; then the target/backend can decide whether to honor no-signed-zeros or not. We also remove the testcase pr30957-1.c, as whether its return value is positive or negative is undefined. The below tests are passed for this patch: * The riscv regression tests. * The aarch64 regression tests. * The x86 bootstrap and regression tests. gcc/ChangeLog: * loop-unroll.cc (insert_var_expansion_initialization): Leverage MODE_HAS_SIGNED_ZEROS for expansion variable initialization. gcc/testsuite/ChangeLog: * gcc.dg/pr30957-1.c: Remove. Signed-off-by:
Pan Li <pan2.li@intel.com>
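A short demonstration of why -0.0f is the correct initial value (a sketch): -0.0 is the identity of floating-point addition under round-to-nearest, while +0.0 loses the sign when every addend is -0.0:

    #include <assert.h>
    #include <math.h>

    int
    main (void)
    {
      double a = -0.0 + -0.0;   /* -0.0: the sign is preserved */
      double b = +0.0 + -0.0;   /* +0.0: the sign is lost */
      assert (signbit (a) && !signbit (b));
      return 0;
    }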
-
Alex Coplan authored
In r14-6604-gd7ee988c491cde43d04fe25f2b3dbad9d85ded45 we changed the CFI notes attached to callee saves (in aarch64_save_callee_saves). That patch changed the ldp/stp representation to use unspecs instead of PARALLEL moves. This meant that we needed to attach CFI notes to all frame-related pair saves such that dwarf2cfi could still emit the appropriate CFI (it cannot interpret the unspecs directly). The patch also attached REG_CFA_OFFSET notes to individual saves so that the ldp/stp pass could easily preserve them when forming stps. In that change I chose to use REG_CFA_OFFSET, but as the PR shows, that choice was problematic in that REG_CFA_OFFSET requires the attached store to be expressed in terms of the current CFA register at all times. This means that even scheduling of frame-related insns can break this invariant, leading to ICEs in dwarf2cfi. The old behaviour (before that change) allowed dwarf2cfi to interpret the RTL directly for sp-relative saves. This change restores that behaviour by using REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET. REG_FRAME_RELATED_EXPR effectively just gives a different pattern for dwarf2cfi to look at instead of the main insn pattern. That allows us to attach the old-style PARALLEL move representation in a REG_FRAME_RELATED_EXPR note and means we are free to always express the save addresses in terms of the stack pointer. Since the ldp/stp fusion pass can combine frame-related stores, this patch also updates it to preserve REG_FRAME_RELATED_EXPR notes, and additionally gives it the ability to synthesize those notes when combining sp-relative saves into an stp (the latter always needs a note due to the unspec representation, the former does not). gcc/ChangeLog: PR target/113077 * config/aarch64/aarch64-ldp-fusion.cc (filter_notes): Add fr_expr param to extract REG_FRAME_RELATED_EXPR notes. (combine_reg_notes): Handle REG_FRAME_RELATED_EXPR notes, and synthesize these if needed. Update caller ... (ldp_bb_info::fuse_pair): ... here. (ldp_bb_info::try_fuse_pair): Punt if either insn has writeback and either insn is frame-related. (find_trailing_add): Punt on frame-related insns. * config/aarch64/aarch64.cc (aarch64_save_callee_saves): Use REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET. gcc/testsuite/ChangeLog: PR target/113077 * gcc.target/aarch64/pr113077.c: New test.
-
Richard Biener authored
Testcase for fixed PR. PR tree-optimization/111003 gcc/testsuite/ * gcc.dg/tree-ssa/pr111003.c: New testcase.
-
Richard Biener authored
The optimization to expand uniform boolean vectors by sign-extension works only for dense masks but it failed to check that. PR middle-end/112740 * expr.cc (store_constructor): Check the integer vector mask has a single bit per element before using sign-extension to expand an uniform vector. * gcc.dg/pr112740.c: New testcase.
-
Juzhe-Zhong authored
This patch fixes the known issues on SLP cases: ble a2,zero,.L11 addiw t1,a2,-1 li a5,15 bleu t1,a5,.L9 srliw a7,t1,4 slli a7,a7,7 lui t3,%hi(.LANCHOR0) lui a6,%hi(.LANCHOR0+128) addi t3,t3,%lo(.LANCHOR0) li a4,128 addi a6,a6,%lo(.LANCHOR0+128) add a7,a7,a0 addi a3,a1,37 mv a5,a0 vsetvli zero,a4,e8,m8,ta,ma vle8.v v24,0(t3) vle8.v v16,0(a6) .L4: li a6,128 vle8.v v0,0(a3) vrgather.vv v8,v0,v24 vadd.vv v8,v8,v16 vse8.v v8,0(a5) add a5,a5,a6 add a3,a3,a6 bne a5,a7,.L4 andi a5,t1,-16 mv t1,a5 .L3: subw a2,a2,a5 li a4,1 beq a2,a4,.L5 slli a5,a5,32 srli a5,a5,32 addiw a2,a2,-1 slli a5,a5,3 csrr a4,vlenb slli a6,a2,32 addi t3,a5,37 srli a3,a6,29 slli a4,a4,2 add t3,a1,t3 add a5,a0,a5 mv t5,a3 bgtu a3,a4,.L14 .L6: li a4,50790400 addi a4,a4,1541 li a6,67633152 addi a6,a6,513 slli a4,a4,32 add a4,a4,a6 vsetvli t4,zero,e64,m4,ta,ma vmv.v.x v16,a4 vsetvli a6,zero,e16,m8,ta,ma vid.v v8 vsetvli zero,t5,e8,m4,ta,ma vle8.v v20,0(t3) vsetvli a6,zero,e16,m8,ta,ma csrr a7,vlenb vand.vi v8,v8,-8 vsetvli zero,zero,e8,m4,ta,ma slli a4,a7,2 vrgatherei16.vv v4,v20,v8 vadd.vv v4,v4,v16 vsetvli zero,t5,e8,m4,ta,ma vse8.v v4,0(a5) bgtu a3,a4,.L15 .L7: addw t1,a2,t1 .L5: slliw a5,t1,3 add a1,a1,a5 lui a4,%hi(.LC2) add a0,a0,a5 lbu a3,37(a1) addi a5,a4,%lo(.LC2) vsetivli zero,8,e8,mf2,ta,ma vmv.v.x v1,a3 vle8.v v2,0(a5) vadd.vv v1,v1,v2 vse8.v v1,0(a0) .L11: ret .L15: sub a3,a3,a4 bleu a3,a4,.L8 mv a3,a4 .L8: li a7,50790400 csrr a4,vlenb slli a4,a4,2 addi a7,a7,1541 li t4,67633152 add t3,t3,a4 vsetvli zero,a3,e8,m4,ta,ma slli a7,a7,32 addi t4,t4,513 vle8.v v20,0(t3) add a4,a5,a4 add a7,a7,t4 vsetvli a5,zero,e64,m4,ta,ma vmv.v.x v16,a7 vsetvli a6,zero,e16,m8,ta,ma vid.v v8 vand.vi v8,v8,-8 vsetvli zero,zero,e8,m4,ta,ma vrgatherei16.vv v4,v20,v8 vadd.vv v4,v4,v16 vsetvli zero,a3,e8,m4,ta,ma vse8.v v4,0(a4) j .L7 .L14: mv t5,a4 j .L6 .L9: li a5,0 li t1,0 j .L3 The vectorization codegen is quite inefficient since we choose a VLS modes to vectorize the loop body with epilogue choosing a VLA modes. cost.c:6:21: note: ***** Choosing vector mode V128QI cost.c:6:21: note: ***** Choosing epilogue vector mode RVVM4QI As we known, in RVV side, we have VLA modes and VLS modes. VLAmodes support partial vectors wheras VLSmodes support full vectors. The goal we add VLSmodes is to improve the codegen of known NITERS or SLP codes. If NITERS is unknown, that is i < n, n is unknown. We will always have partial vectors vectorization. It can be loop body or epilogue. In this case, It's always more efficient to apply VLA partial vectorization on loop body which doesn't have epilogue. After this patch: f: ble a2,zero,.L7 li a5,1 beq a2,a5,.L5 li a6,50790400 addi a6,a6,1541 li a4,67633152 addi a4,a4,513 csrr a5,vlenb addiw a2,a2,-1 slli a6,a6,32 add a6,a6,a4 slli a5,a5,2 slli a4,a2,32 vsetvli t1,zero,e64,m4,ta,ma srli a3,a4,29 neg t4,a5 addi a7,a1,37 mv a4,a0 vmv.v.x v12,a6 vsetvli t3,zero,e16,m8,ta,ma vid.v v16 vand.vi v16,v16,-8 .L4: minu a6,a3,a5 vsetvli zero,a6,e8,m4,ta,ma vle8.v v8,0(a7) vsetvli t3,zero,e8,m4,ta,ma mv t1,a3 vrgatherei16.vv v4,v8,v16 vsetvli zero,a6,e8,m4,ta,ma vadd.vv v4,v4,v12 vse8.v v4,0(a4) add a7,a7,a5 add a4,a4,a5 add a3,a3,t4 bgtu t1,a5,.L4 .L3: slliw a2,a2,3 add a1,a1,a2 lui a5,%hi(.LC0) lbu a4,37(a1) add a0,a0,a2 addi a5,a5,%lo(.LC0) vsetivli zero,8,e8,mf2,ta,ma vmv.v.x v1,a4 vle8.v v2,0(a5) vadd.vv v1,v1,v2 vse8.v v1,0(a0) .L7: ret Tested on both RV32 and RV64 no regression. Ok for trunk ? 
gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): VLA preempt VLS on unknown NITERS loop. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/slp-1.c: Remove xfail. * gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
-
Lulu Cheng authored
LoongArch: Optimized some of the symbolic expansion instructions generated during bitwise operations. There are two mode iterators defined in the loongarch.md: (define_mode_iterator GPR [SI (DI "TARGET_64BIT")]) and (define_mode_iterator X [(SI "!TARGET_64BIT") (DI "TARGET_64BIT")]) Replace the mode in the bit arithmetic from GPR to X. Since the bitwise operation instruction does not distinguish between 64-bit, 32-bit, etc., it is necessary to perform symbolic expansion if the bitwise operation is less than 64 bits. The original definition would have generated a lot of redundant symbolic extension instructions. This problem is optimized with reference to the implementation of RISCV. Add this patch spec2017 500.perlbench performance improvement by 1.8% gcc/ChangeLog: * config/loongarch/loongarch.md (one_cmpl<mode>2): Replace GPR with X. (*nor<mode>3): Likewise. (nor<mode>3): Likewise. (*negsi2_extended): New template. (*<optab>si3_internal): Likewise. (*one_cmplsi2_internal): Likewise. (*norsi3_internal): Likewise. (*<optab>nsi_internal): Likewise. (bytepick_w_<bytepick_imm>_extend): Modify this template according to the modified bit operation to make the optimization work. gcc/testsuite/ChangeLog: * gcc.target/loongarch/sign-extend-bitwise.c: New test.
-
liuhongt authored
Similarly for A < B ? B : A to MAX_EXPR. There is code in the frontend to optimize such patterns, but it failed to handle the testcase in the PR since the pattern is only exposed at the gimple level when folding backend builtins. pr95906 can now be optimized to MAX_EXPR, as the comment in the testcase notes:

    // FIXME: this should further optimize to a MAX_EXPR
    typedef signed char v16i8 __attribute__((vector_size(16)));
    v16i8 f(v16i8 a, v16i8 b)

gcc/ChangeLog: PR target/104401 * match.pd (VEC_COND_EXPR: A < B ? A : B -> MIN_EXPR): New pattern match. gcc/testsuite/ChangeLog: * gcc.target/i386/pr104401.c: New test. * gcc.dg/tree-ssa/pr95906.c: Adjust testcase.
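A hypothetical variant of the pr95906 shape (the quoted testcase above is truncated; this sketch is illustrative, not the committed test): a compare-and-select through bitwise operations that folds to a VEC_COND_EXPR and should then become MAX_EXPR:

    typedef signed char v16i8 __attribute__((vector_size(16)));

    v16i8
    vmax (v16i8 a, v16i8 b)
    {
      v16i8 m = a > b;            /* all-ones lanes where a > b */
      return (m & a) | (~m & b);  /* lane-wise select: max (a, b) */
    }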
-
Gaius Mulley authored
This patch adds type checking for binary set operators. It also checks the IN operator and improves the := type checking. gcc/m2/ChangeLog: PR modula2/112946 * gm2-compiler/M2GenGCC.mod (IsExpressionCompatible): Import. (ExpressionTypeCompatible): Import. (CodeStatement): Remove op1, op2, op3 parameters from CodeSetOr, CodeSetAnd, CodeSetSymmetricDifference, CodeSetLogicalDifference. (checkArrayElements): Rename op1 to des and op3 to expr. Use despos and exprpos instead of CurrentQuadToken. (checkRecordTypes): Rename op1 to des and op2 to expr. Use virtpos instead of CurrentQuadToken. (checkIncorrectMeta): Ditto. (checkBecomes): Rename op1 to des and op3 to expr. Use virtpos instead of CurrentQuadToken. (NoWalkProcedure): New procedure stub. (CheckBinaryExpressionTypes): New procedure function. (CheckElementSetTypes): New procedure function. (CodeBinarySet): Re-write. (FoldBinarySet): Re-write. (CodeSetOr): Remove parameters op1, op2 and op3. (CodeSetAnd): Ditto. (CodeSetLogicalDifference): Ditto. (CodeSetSymmetricDifference): Ditto. (CodeIfIn): Call CheckBinaryExpressionTypes and CheckElementSetTypes. * gm2-compiler/M2Quads.mod (BuildRotateFunction): Correct parameters to MakeVirtualTok to reflect parameter block passed to Rotate. gcc/testsuite/ChangeLog: PR modula2/112946 * gm2/pim/fail/badbecomes.mod: New test. * gm2/pim/fail/badexpression.mod: New test. * gm2/pim/fail/badexpression2.mod: New test. * gm2/pim/fail/badifin.mod: New test. * gm2/pim/pass/goodifin.mod: New test. Signed-off-by:
Gaius Mulley <gaiusmod2@gmail.com>
-
GCC Administrator authored
-
- Jan 10, 2024
-
-
Juzhe-Zhong authored
v2 update: Robustify tests. While working on the cost model, I noticed one case where the dynamic LMUL cost model doesn't work well. Before this patch:

foo:
        lui a4,%hi(.LANCHOR0)
        li a0,1953
        li a1,63
        addi a4,a4,%lo(.LANCHOR0)
        li a3,64
        vsetvli a2,zero,e32,mf2,ta,ma
        vmv.v.x v5,a0
        vmv.v.x v4,a1
        vid.v v3
.L2:
        vsetvli a5,a3,e32,mf2,ta,ma
        vadd.vi v2,v3,1
        vadd.vv v1,v3,v5
        mv a2,a5
        vmacc.vv v1,v2,v4
        slli a1,a5,2
        vse32.v v1,0(a4)
        sub a3,a3,a5
        add a4,a4,a1
        vsetvli a5,zero,e32,mf2,ta,ma
        vmv.v.x v1,a2
        vadd.vv v3,v3,v1
        bne a3,zero,.L2
        li a0,0
        ret

This is unexpected: it uses a scalable vector with LMUL = MF2, which wastes computation resources; ideally, we should use LMUL = M8 VLS modes. The root cause is that the dynamic LMUL heuristic dominates the VLS heuristic. Adapt the cost model heuristic accordingly. After this patch:

foo:
        lui a4,%hi(.LANCHOR0)
        addi a4,a4,%lo(.LANCHOR0)
        li a3,4096
        li a5,32
        li a1,2016
        addi a2,a4,128
        addiw a3,a3,-32
        vsetvli zero,a5,e32,m8,ta,ma
        li a0,0
        vid.v v8
        vsll.vi v8,v8,6
        vadd.vx v16,v8,a1
        vadd.vx v8,v8,a3
        vse32.v v16,0(a4)
        vse32.v v8,0(a2)
        ret

Tested on both RV32 and RV64 with no regressions. Ok for trunk? gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): Minor tweak. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Fix test. * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
-
Antoni Boucher authored
gcc/ChangeLog: PR jit/111396 * ipa-fnsummary.cc (ipa_fnsummary_cc_finalize): Call ipa_free_size_summary. * ipa-icf.cc (ipa_icf_cc_finalize): New function. * ipa-profile.cc (ipa_profile_cc_finalize): New function. * ipa-prop.cc (ipa_prop_cc_finalize): New function. * ipa-prop.h (ipa_prop_cc_finalize): New function. * ipa-sra.cc (ipa_sra_cc_finalize): New function. * ipa-utils.h (ipa_profile_cc_finalize, ipa_icf_cc_finalize, ipa_sra_cc_finalize): New functions. * toplev.cc (toplev::finalize): Call ipa_icf_cc_finalize, ipa_prop_cc_finalize, ipa_profile_cc_finalize and ipa_sra_cc_finalize. Include ipa-utils.h. gcc/testsuite/ChangeLog: PR jit/111396 * jit.dg/all-non-failing-tests.h: Add note about test-ggc-bugfix. * jit.dg/test-ggc-bugfix.c: New test.
-
Jin Ma authored
The XTheadInt ISA extension provides the following instructions to accelerate interrupt processing: * th.ipush * th.ipop Ref: https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf gcc/ChangeLog: * config/riscv/riscv-protos.h (th_int_get_mask): New prototype. (th_int_get_save_adjustment): Likewise. (th_int_adjust_cfi_prologue): Likewise. * config/riscv/riscv.cc (BITSET_P): Moved away from here. (TH_INT_INTERRUPT): New macro. (riscv_expand_prologue): Add the processing of XTheadInt. (riscv_expand_epilogue): Likewise. * config/riscv/riscv.h (BITSET_P): Moved to here. * config/riscv/riscv.md: New unspec. * config/riscv/thead.cc (th_int_get_mask): New function. (th_int_get_save_adjustment): Likewise. (th_int_adjust_cfi_prologue): Likewise. * config/riscv/thead.md (th_int_push): New pattern. (th_int_pop): new pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/xtheadint-push-pop.c: New test.
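A sketch of where the new instructions apply (the interrupt attribute is standard on RISC-V; the single-instruction save/restore is the extension's stated purpose, assumed here):

    /* With XTheadInt, the prologue/epilogue of an interrupt handler can
       spill and reload the interrupt-clobbered registers with single
       th.ipush/th.ipop instructions instead of long store/load sequences.  */
    void __attribute__ ((interrupt))
    timer_isr (void)
    {
      extern volatile unsigned ticks;   /* hypothetical device state */
      ticks++;
    }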
-
Tamar Christina authored
Currently GCC does not treat IFN_COPYSIGN the same as the copysign tree expr. The latter has a libcall fallback while the IFN can only use optabs. Because of this, the change I made to optimize copysign only works if the target has implemented the optab, but it should work for those that have the libcall too. More annoyingly, if a target has vector versions of ABS and NEG but not COPYSIGN, then the change made them lose vectorization. The proper fix for this is to treat the IFN the same as the tree EXPR and to enhance expand_COPYSIGN to also support vector calls. I have such a patch for GCC 15 but it's quite big and too invasive for stage-4. As such, this is a minimal fix: just don't apply the transformation, leaving targets which don't have the optab unoptimized. The targets list for check_effective_target_ifn_copysign was obtained by grepping for copysign and looking at the optab. gcc/ChangeLog: PR tree-optimization/112468 * doc/sourcebuild.texi: Document ifn_copysign. * match.pd: Only apply transformation if target supports the IFN. gcc/testsuite/ChangeLog: PR tree-optimization/112468 * gcc.dg/fold-copysign-1.c: Modify tests based on if target supports IFN_COPYSIGN. * gcc.dg/pr55152-2.c: Likewise. * gcc.dg/tree-ssa/abs-4.c: Likewise. * gcc.dg/tree-ssa/backprop-6.c: Likewise. * gcc.dg/tree-ssa/copy-sign-2.c: Likewise. * gcc.dg/tree-ssa/mult-abs-2.c: Likewise. * lib/target-supports.exp (check_effective_target_ifn_copysign): New.
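A sketch of the transform now gated on target support (my reading of the description above; the exact match.pd pattern may differ):

    double
    f (double x)
    {
      /* -fabs (x) canonicalizes to .COPYSIGN (x, -1.0), but only when the
         target can expand IFN_COPYSIGN; otherwise it now stays as ABS + NEG
         so ABS/NEG-capable vector targets keep vectorizing it.  */
      return -__builtin_fabs (x);
    }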
-
Andrew Pinski authored
Like r14-2293-g11350734240dba and r14-2289-gb083203f053f16, reassociation can combine across a few basic blocks, one of the operands can be an uninitialized variable, and going from a conditional use to an unconditional use can cause wrong code. This uses maybe_undef_p like other passes where this can happen. Note if-to-switch uses the function (init_range_entry) provided by reassociation, so we need to call mark_ssa_maybe_undefs there; otherwise we assume almost all ssa names are uninitialized. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/112581 * gimple-if-to-switch.cc (pass_if_to_switch::execute): Call mark_ssa_maybe_undefs. * tree-ssa-reassoc.cc (can_reassociate_op_p): Uninitialized variables can not be reassociated. (init_range_entry): Check for uninitialized variables too. (init_reassoc): Call mark_ssa_maybe_undefs. gcc/testsuite/ChangeLog: PR tree-optimization/112581 * gcc.c-torture/execute/pr112581-1.c: New test. Signed-off-by:
Andrew Pinski <quic_apinski@quicinc.com>
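A hypothetical shape of the hazard (pr112581-1.c is the real, committed test): a variable read only under a condition must not become an unconditional operand after reassociation:

    int
    f (int c, int a, int b, int d)
    {
      int u;            /* deliberately uninitialized */
      int r = a + b;
      if (c)
        r = r + d + u;  /* u is live only when c is true; reassociating it
                           into the unconditional sum would be wrong code */
      return r;
    }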
-
Maciej W. Rozycki authored
Add terminating `/' character missing from one of the test harness command clauses in pr105314.c. This causes no issue with compilation owing to another comment immediately following, but would cause a:

    pr105314.c:3:1: warning: "/*" within comment [-Wcomment]

message if warnings were enabled. gcc/testsuite/ * gcc.target/riscv/pr105314.c: Fix comment termination.
-
Maciej W. Rozycki authored
Complement commit c1e8cb3d ("RISC-V: Rework branch costing model for if-conversion") and also handle extraneous sign extend operations that are sometimes produced by `noce_try_cmove_arith' instead of zero extend operations, making branch costing consistent. It is unclear what the condition is for the middle end to choose between the zero extend and sign extend operation, but the test case included uses sign extension with 64-bit targets, preventing if-conversion from triggering across all the architectural variants. There are further anomalies revealed by the test case, specifically the exceedingly high branch cost of 6 required for the `-mmovcc' variant despite that the final branchless sequence only uses 4 instructions, the missed conversion at -O1 for 32-bit targets even though code is machine word size agnostic, and the missed conversion at -Os and -Oz for 32-bit Zicond targets even though the branchless sequence would be shorter than the branched one. These will have to be handled separately. gcc/ * config/riscv/riscv.cc (riscv_noce_conversion_profitable_p): Also handle sign extension. gcc/testsuite/ * gcc.target/riscv/cset-sext-sfb.c: New test. * gcc.target/riscv/cset-sext-thead.c: New test. * gcc.target/riscv/cset-sext-ventana.c: New test. * gcc.target/riscv/cset-sext-zicond.c: New test. * gcc.target/riscv/cset-sext.c: New test.
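A hypothetical shape for the new cset-sext tests (the committed tests may differ): a conditional 32-bit result on a 64-bit target, where noce_try_cmove_arith can emit a sign_extend that branch costing must now recognize:

    int
    sel (long a, long b)
    {
      /* If-conversion turns the branch into a conditional-move sequence; on
         64-bit targets the 32-bit result may be wrapped in a sign_extend.  */
      return a < b ? 3 : 5;
    }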
-
Jakub Jelinek authored
This test was already fixed by r14-6051, aka the PR112770 fix. 2024-01-10 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/112734 * gcc.dg/bitint-64.c: New test.
-
Tamar Christina authored
The vectorizer needs to know during early break vectorization whether the edge that will be taken if the condition is true stays in or leaves the loop. This is because the code assumes that if you take the true branch you exit the loop. If you don't exit the loop it has to generate a different condition. Basically it uses this information to decide whether it's generating an "any element" or an "all elements" check. Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no issues, including with --enable-lto --with-build-config=bootstrap-O3 --enable-checking=release,yes,rtl,extra. gcc/ChangeLog: PR tree-optimization/113287 * tree-vect-stmts.cc (vectorizable_early_exit): Check the flags on edge instead of using BRANCH_EDGE to determine true edge. gcc/testsuite/ChangeLog: PR tree-optimization/113287 * gcc.dg/vect/vect-early-break_100-pr113287.c: New test. * gcc.dg/vect/vect-early-break_99-pr113287.c: New test.
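A sketch of the situation (simplified from the description above): in gimple the compare may be canonicalized so that either the true or the false edge is the one leaving the loop, so the vectorizer must check the edge flags rather than assume the true branch exits:

    int
    find (int *a, int n)
    {
      /* An early-break loop: whether the exit is the true or the false edge
         of the vectorized compare decides between an "any element" and an
         "all elements" mask test.  */
      for (int i = 0; i < n; i++)
        if (a[i] == 42)
          return i;
      return -1;
    }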
-
Richard Biener authored
When if-conversion was changed to use .COND_ADD/SUB for conditional reduction it was forgotten to update reduction path handling to canonicalize .COND_SUB to .COND_ADD for vectorizable_reduction similar to what we do for MINUS_EXPR. The following adds this and testcases exercising this at runtime and looking for the appropriate masked subtraction in the vectorized code on x86. PR tree-optimization/113078 * tree-vect-loop.cc (check_reduction_path): Canonicalize .COND_SUB to .COND_ADD. * gcc.dg/vect/vect-reduc-cond-sub.c: New testcase. * gcc.target/i386/vect-pr113078.c: Likewise.
-
Julian Brown authored
This patch adjusts diagnostic output for C++23 and above for the test case mentioned in the commit title. 2024-01-10 Julian Brown <julian@codesourcery.com> gcc/testsuite/ * g++.dg/gomp/bad-array-section-10.C: Adjust diagnostics for C++23 and up.
-
Julian Brown authored
This patch fixes several tests introduced by the commit r14-7033-g1413af02d62182 for 32-bit targets. 2024-01-10 Julian Brown <julian@codesourcery.com> gcc/testsuite/ * g++.dg/gomp/array-section-1.C: Fix scan output for 32-bit target. * g++.dg/gomp/array-section-2.C: Likewise. * g++.dg/gomp/bad-array-section-4.C: Adjust error output for 32-bit target.
-
Tamar Christina authored
When we peel at_exit we are moving the new loop to the exit of the previous loop. This means that the blocks outside the loop that the previous loop used to dominate are no longer being dominated by it. The new dominators however are hard to predict, since if the loop has multiple exits and all the exits are "early" ones then we always execute the scalar loop. In this case the scalar loop can completely dominate the new loop. If we later have skip_vector then there's an additional skip edge added that might change the dominators. The previous patch would force an update of all blocks reachable from the new exits. This one updates *only* blocks that we know the scalar exits dominated. For the examples this reduces the blocks to update from 18 to 3. gcc/ChangeLog: PR tree-optimization/113144 PR tree-optimization/113145 * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Update all BB that the original exits dominated. gcc/testsuite/ChangeLog: PR tree-optimization/113144 PR tree-optimization/113145 * gcc.dg/vect/vect-early-break_94-pr113144.c: New test.
-
Jakub Jelinek authored
2024-01-10 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113297 * gcc.dg/bitint-63.c: Fix PR number.
-
chenxiaolong authored
The purpose of this test is to check that the compiler supports vectorization using SLP and vec_{load/store/*}_lanes. However, vec_{load/store/*}_lanes are not supported on LoongArch, which has no equivalent of the "st4/ld4" instructions aarch64 uses for them. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-21.c: Add loongarch.
-
chenxiaolong authored
After the code committed in r14-6948, GCC regression testing on some architectures produced the following error:

    error executing dg-final: unknown effective target keyword `loongarch*-*-*'

gcc/testsuite/ChangeLog: * lib/target-supports.exp: Fix the "target keyword" checking errors on the LoongArch architecture.
-
Jakub Jelinek authored
As changed in other parts of the compiler, using build_nonstandard_integer_type is not appropriate for arbitrary precisions, especially if the precision comes from a BITINT_TYPE or something based on that, build_nonstandard_integer_type relies on some integral mode being supported that can support the precision. The following patch uses build_bitint_type instead for BITINT_TYPE precisions. Note, it would be good if we were able to punt on the optimization (but this code doesn't seem to be able to punt, so it needs to be done somewhere earlier) at least in cases where building it would be invalid. E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive), but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION). I've tried to replace 513 with 65532 in the testcase and it didn't ICE, so maybe it ran into some other SRA limit. 2024-01-10 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113120 * tree-sra.cc (analyze_access_subtree): For BITINT_TYPE with root->size TYPE_PRECISION don't build anything new. Otherwise, if root->type is a BITINT_TYPE, use build_bitint_type rather than build_nonstandard_integer_type. * gcc.dg/bitint-63.c: New test.
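An illustrative shape (assumed; gcc.dg/bitint-63.c is the real test): SRA building a scalar replacement for a wide _BitInt must use build_bitint_type, since no integral mode covers 513 bits:

    struct S { _BitInt(513) x; };

    _BitInt(513)
    f (struct S s)
    {
      /* SRA scalarizes t; the replacement type must come from
         build_bitint_type, not build_nonstandard_integer_type.  */
      struct S t = s;
      return t.x;
    }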
-
Juzhe-Zhong authored
This patch is inspired by LLVM patches: https://github.com/llvm/llvm-project/pull/76550 https://github.com/llvm/llvm-project/pull/77473 Use vaaddu for AVG vectorization. Before this patch:

        vsetivli zero,8,e8,mf2,ta,ma
        vle8.v v3,0(a1)
        vle8.v v2,0(a2)
        vwaddu.vv v1,v3,v2
        vsetvli zero,zero,e16,m1,ta,ma
        vadd.vi v1,v1,1
        vsetvli zero,zero,e8,mf2,ta,ma
        vnsrl.wi v1,v1,1
        vse8.v v1,0(a0)
        ret

After this patch:

        vsetivli zero,8,e8,mf2,ta,ma
        csrwi vxrm,0
        vle8.v v1,0(a1)
        vle8.v v2,0(a2)
        vaaddu.vv v1,v1,v2
        vse8.v v1,0(a0)
        ret

Note on signed averaging addition: based on the RVV spec, there is also a variant for signed averaging addition called vaadd. But AFAIU, no matter which rounding mode is used, we cannot achieve the semantics of signed averaging addition through vaadd. Thus this patch only introduces vaaddu. More details in: https://github.com/riscv/riscv-v-spec/issues/935 https://github.com/riscv/riscv-v-spec/issues/934 Tested on both RV32 and RV64 with no regressions. Ok for trunk? gcc/ChangeLog: * config/riscv/autovec.md (<u>avg<v_double_trunc>3_floor): Remove. (avg<v_double_trunc>3_floor): New pattern. (<u>avg<v_double_trunc>3_ceil): Remove. (avg<v_double_trunc>3_ceil): New pattern. (uavg<mode>3_floor): Ditto. (uavg<mode>3_ceil): Ditto. * config/riscv/riscv-protos.h (enum insn_flags): Add for average addition. (enum insn_type): Ditto. * config/riscv/riscv-v.cc: Ditto. * config/riscv/vector-iterators.md (ashiftrt): Remove. (ASHIFTRT): Ditto. * config/riscv/vector.md: Add VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/avg-1.c: Adapt test. * gcc.target/riscv/rvv/autovec/vls/avg-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/avg-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/avg-4.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/avg-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/avg-6.c: Ditto. * gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: Ditto.
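The kind of loop these patterns vectorize (a sketch; the committed avg-*.c tests are authoritative): a narrowing rounded average, which maps onto vaaddu with the fixed-point rounding mode set by csrwi vxrm,0:

    void
    avg_ceil (unsigned char *r, unsigned char *a, unsigned char *b)
    {
      for (int i = 0; i < 8; i++)
        /* The sum is computed in int, so it cannot overflow; the
           (a + b + 1) >> 1 idiom is recognized as a ceiling average.  */
        r[i] = (a[i] + b[i] + 1) >> 1;
    }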
-
Kewen Lin authored
As PR112751 shows, commit r14-5628 caused pcrel-sibcall-1.c to fail, as it enables ipa-vrp, which makes the return values of functions {x,y,xx} known and propagates them. This patch adjusts the test with noipa to make it less fragile. PR testsuite/112751 gcc/testsuite/ChangeLog: * gcc.target/powerpc/pcrel-sibcall-1.c: Replace noinline with noipa.
-
Juzhe-Zhong authored
While working on refining the cost model, I noticed this test will generate unexpected scalar xor instructions if we don't tune the cost model carefully. Add more assembler checks to avoid future regressions. Committed. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add assembler-not check.
-
GCC Administrator authored
-
- Jan 09, 2024
-
-
Jason Merrill authored
In a couple of places in the xobj patch I noticed that is_this_parameter probably wanted to change to is_object_parameter; this implements that and does the additional adjustments needed to make the accessor fixits handle xobj parms. gcc/cp/ChangeLog: * semantics.cc (is_object_parameter): New. * cp-tree.h (is_object_parameter): Declare. * call.cc (maybe_warn_class_memaccess): Use it. * search.cc (field_access_p): Use it. (class_of_object_parm): New. (field_accessor_p): Adjust for explicit object parms. gcc/testsuite/ChangeLog: * g++.dg/torture/accessor-fixits-9-xobj.C: New test.
-
waffl3x authored
This adds support for defaulted comparison operators and copy/move assignment operators, as well as allowing user-defined xobj copy/move assignment operators. It turns out defaulted comparison operators already worked though, so this just adds a test for them. Defaulted copy/move assignment operators were not so nice and required a bit of a hack. Should work fine though! The diagnostics leave something to be desired, and there are some things that could be improved with more extensive design changes. There are a few notes left indicating where I think we could make improvements. Aside from some small bugs, with this commit xobj member functions should be feature complete. PR c++/102609 gcc/cp/ChangeLog: PR c++/102609 C++23 P0847R7 (deducing this) - CWG2586. * decl.cc (copy_fn_p): Accept xobj copy assignment functions. (move_signature_fn_p): Accept xobj move assignment functions. * method.cc (do_build_copy_assign): Handle defaulted xobj member functions. (defaulted_late_check): Comment. (defaultable_fn_check): Comment. gcc/testsuite/ChangeLog: PR c++/102609 C++23 P0847R7 (deducing this) - CWG2586. * g++.dg/cpp23/explicit-obj-basic6.C: New test. * g++.dg/cpp23/explicit-obj-default1.C: New test. * g++.dg/cpp23/explicit-obj-default2.C: New test. Signed-off-by:
Waffl3x <waffl3x@protonmail.com>
-