  1. Jan 11, 2024
    • libstdc++: Fix error handling in filesystem::equivalent [PR113250] · df147e2e
      Ken Matsui authored
      
      This patch made std::filesystem::equivalent correctly throw an exception
      when either path does not exist as per [fs.op.equivalent]/4.
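
      A minimal sketch of how the error can be observed from user code, using
      the non-throwing overload.  The helper name equivalent_reports_error and
      the paths are illustrative assumptions, not part of the patch.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of libstdc++): returns true if
// fs::equivalent reported an error for the given pair of paths.
bool equivalent_reports_error(const fs::path& a, const fs::path& b)
{
  std::error_code ec;
  (void) fs::equivalent(a, b, ec);  // non-throwing overload sets ec on error
  return static_cast<bool>(ec);
}
```

      With the fix, a call such as equivalent_reports_error(".", "no-such-file")
      is expected to return true, because one of the two paths does not exist.
      The ChangeLog's switch from && to || suggests that an error was previously
      reported only when both paths were missing.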
      
      	PR libstdc++/113250
      
      libstdc++-v3/ChangeLog:
      
      	* src/c++17/fs_ops.cc (fs::equivalent): Use || instead of &&.
      	* src/filesystem/ops.cc (fs::equivalent): Likewise.
      	* testsuite/27_io/filesystem/operations/equivalent.cc: Handle
      	error codes.
      	* testsuite/experimental/filesystem/operations/equivalent.cc:
      	Likewise.
      
      Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
      Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
    • LoongArch: Implement option save/restore · ea2a9c76
      Yang Yujie authored
      LTO option streaming and target attributes both require per-function
      target configuration, which is achieved via option save/restore.
      
      We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target
      context in addition to other automatically maintained option states
      (via the "Save" option property in the .opt files).
      
      Tested on loongarch64-linux-gnu without regression.
      
      	PR target/113233
      
      gcc/ChangeLog:
      
      	* config/loongarch/genopts/loongarch.opt.in: Mark options with
      	the "Save" property.
      	* config/loongarch/loongarch.opt: Same.
      	* config/loongarch/loongarch-opts.cc: Refresh -mcmodel= state
      	according to la_target.
      	* config/loongarch/loongarch.cc: Implement TARGET_OPTION_{SAVE,
      	RESTORE} for the la_target structure; Rename option conditions
      	to have the same "la_" prefix.
      	* config/loongarch/loongarch.h: Same.
    • LOOP-UNROLL: Leverage HAS_SIGNED_ZERO for var expansion · b89ef3d4
      Pan Li authored
      
      insert_var_expansion_initialization relies on HONOR_SIGNED_ZEROS
      and initializes the expansion variables to +0.0f instead of -0.0f
      when signed zeros are not honored.  However, we should always keep
      -0.0f here because:

      * -0.0f is always the correct initial value.
      * We need to support targets that always honor signed zeros.

      Thus, we use MODE_HAS_SIGNED_ZEROS instead of HONOR_SIGNED_ZEROS
      when initializing.  The target/backend can then decide whether to
      honor no-signed-zeros.

      We also remove the testcase pr30957-1.c, since the sign of its
      return value is undefined.
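
      To illustrate why -0.0f is the safe initial value: under IEEE 754
      round-to-nearest, -0.0f is the additive identity for every input,
      while +0.0f turns a sum of negative zeros into +0.0f.  The sketch
      below uses a hypothetical helper, sum_from, and assumes compilation
      without -ffast-math or -fno-signed-zeros.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: sums the values, starting from the given initial
// accumulator value (the role played by an expansion variable).
float sum_from(float init, const std::vector<float>& values)
{
  float acc = init;
  for (float v : values)
    acc += v;
  return acc;
}
```

      Here sum_from(-0.0f, {-0.0f}) keeps the negative sign, whereas
      sum_from(0.0f, {-0.0f}) yields a positive zero.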
      
      The following tests pass with this patch:
      
      * The riscv regression tests.
      * The aarch64 regression tests.
      * The x86 bootstrap and regression tests.
      
      gcc/ChangeLog:
      
      	* loop-unroll.cc (insert_var_expansion_initialization): Leverage
      	MODE_HAS_SIGNED_ZEROS for expansion variable initialization.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/pr30957-1.c: Remove.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • aarch64: Fix dwarf2cfi ICEs due to recent CFI note changes [PR113077] · 5400778f
      Alex Coplan authored
      In r14-6604-gd7ee988c491cde43d04fe25f2b3dbad9d85ded45 we changed the CFI notes
      attached to callee saves (in aarch64_save_callee_saves).  That patch changed
      the ldp/stp representation to use unspecs instead of PARALLEL moves.  This meant
      that we needed to attach CFI notes to all frame-related pair saves such that
      dwarf2cfi could still emit the appropriate CFI (it cannot interpret the unspecs
      directly).  The patch also attached REG_CFA_OFFSET notes to individual saves so
      that the ldp/stp pass could easily preserve them when forming stps.
      
      In that change I chose to use REG_CFA_OFFSET, but as the PR shows, that
      choice was problematic in that REG_CFA_OFFSET requires the attached
      store to be expressed in terms of the current CFA register at all times.
      This means that even scheduling of frame-related insns can break this
      invariant, leading to ICEs in dwarf2cfi.
      
      The old behaviour (before that change) allowed dwarf2cfi to interpret the RTL
      directly for sp-relative saves.  This change restores that behaviour by using
      REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.  REG_FRAME_RELATED_EXPR
      effectively just gives a different pattern for dwarf2cfi to look at instead of
      the main insn pattern.  That allows us to attach the old-style PARALLEL move
      representation in a REG_FRAME_RELATED_EXPR note and means we are free to always
      express the save addresses in terms of the stack pointer.
      
      Since the ldp/stp fusion pass can combine frame-related stores, this patch also
      updates it to preserve REG_FRAME_RELATED_EXPR notes, and additionally gives it
      the ability to synthesize those notes when combining sp-relative saves into an
      stp (the latter always needs a note due to the unspec representation, the former
      does not).
      
      gcc/ChangeLog:
      
      	PR target/113077
      	* config/aarch64/aarch64-ldp-fusion.cc (filter_notes): Add
      	fr_expr param to extract REG_FRAME_RELATED_EXPR notes.
      	(combine_reg_notes): Handle REG_FRAME_RELATED_EXPR notes, and
      	synthesize these if needed.  Update caller ...
      	(ldp_bb_info::fuse_pair): ... here.
      	(ldp_bb_info::try_fuse_pair): Punt if either insn has writeback
      	and either insn is frame-related.
      	(find_trailing_add): Punt on frame-related insns.
      	* config/aarch64/aarch64.cc (aarch64_save_callee_saves): Use
      	REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/113077
      	* gcc.target/aarch64/pr113077.c: New test.
    • MIPS: Add ATTRIBUTE_UNUSED to mips_start_function_definition · b531bc36
      YunQiang Su authored
      Fix build warning:
        mips.cc: warning: unused parameter 'decl'.
      
      gcc
      	* config/mips/mips.cc (mips_start_function_definition):
      	Add ATTRIBUTE_UNUSED.
    • tree-optimization/111003 - new testcase · 96fb3908
      Richard Biener authored
      Testcase for fixed PR.
      
      	PR tree-optimization/111003
      gcc/testsuite/
      	* gcc.dg/tree-ssa/pr111003.c: New testcase.
    • middle-end/112740 - vector boolean CTOR expansion issue · e1f2d58a
      Richard Biener authored
      The optimization to expand uniform boolean vectors by sign-extension
      works only for dense masks but it failed to check that.
      
      	PR middle-end/112740
      	* expr.cc (store_constructor): Check the integer vector
      	mask has a single bit per element before using sign-extension
      	to expand an uniform vector.
      
      	* gcc.dg/pr112740.c: New testcase.
    • RISC-V: VLA preempts VLS on unknown NITERS loop · 1a51886a
      Juzhe-Zhong authored
      This patch fixes the known issues on SLP cases:
      
      	ble	a2,zero,.L11
      	addiw	t1,a2,-1
      	li	a5,15
      	bleu	t1,a5,.L9
      	srliw	a7,t1,4
      	slli	a7,a7,7
      	lui	t3,%hi(.LANCHOR0)
      	lui	a6,%hi(.LANCHOR0+128)
      	addi	t3,t3,%lo(.LANCHOR0)
      	li	a4,128
      	addi	a6,a6,%lo(.LANCHOR0+128)
      	add	a7,a7,a0
      	addi	a3,a1,37
      	mv	a5,a0
      	vsetvli	zero,a4,e8,m8,ta,ma
      	vle8.v	v24,0(t3)
      	vle8.v	v16,0(a6)
      .L4:
      	li	a6,128
      	vle8.v	v0,0(a3)
      	vrgather.vv	v8,v0,v24
      	vadd.vv	v8,v8,v16
      	vse8.v	v8,0(a5)
      	add	a5,a5,a6
      	add	a3,a3,a6
      	bne	a5,a7,.L4
      	andi	a5,t1,-16
      	mv	t1,a5
      .L3:
      	subw	a2,a2,a5
      	li	a4,1
      	beq	a2,a4,.L5
      	slli	a5,a5,32
      	srli	a5,a5,32
      	addiw	a2,a2,-1
      	slli	a5,a5,3
      	csrr	a4,vlenb
      	slli	a6,a2,32
      	addi	t3,a5,37
      	srli	a3,a6,29
      	slli	a4,a4,2
      	add	t3,a1,t3
      	add	a5,a0,a5
      	mv	t5,a3
      	bgtu	a3,a4,.L14
      .L6:
      	li	a4,50790400
      	addi	a4,a4,1541
      	li	a6,67633152
      	addi	a6,a6,513
      	slli	a4,a4,32
      	add	a4,a4,a6
      	vsetvli	t4,zero,e64,m4,ta,ma
      	vmv.v.x	v16,a4
      	vsetvli	a6,zero,e16,m8,ta,ma
      	vid.v	v8
      	vsetvli	zero,t5,e8,m4,ta,ma
      	vle8.v	v20,0(t3)
      	vsetvli	a6,zero,e16,m8,ta,ma
      	csrr	a7,vlenb
      	vand.vi	v8,v8,-8
      	vsetvli	zero,zero,e8,m4,ta,ma
      	slli	a4,a7,2
      	vrgatherei16.vv	v4,v20,v8
      	vadd.vv	v4,v4,v16
      	vsetvli	zero,t5,e8,m4,ta,ma
      	vse8.v	v4,0(a5)
      	bgtu	a3,a4,.L15
      .L7:
      	addw	t1,a2,t1
      .L5:
      	slliw	a5,t1,3
      	add	a1,a1,a5
      	lui	a4,%hi(.LC2)
      	add	a0,a0,a5
      	lbu	a3,37(a1)
      	addi	a5,a4,%lo(.LC2)
      	vsetivli	zero,8,e8,mf2,ta,ma
      	vmv.v.x	v1,a3
      	vle8.v	v2,0(a5)
      	vadd.vv	v1,v1,v2
      	vse8.v	v1,0(a0)
      .L11:
      	ret
      .L15:
      	sub	a3,a3,a4
      	bleu	a3,a4,.L8
      	mv	a3,a4
      .L8:
      	li	a7,50790400
      	csrr	a4,vlenb
      	slli	a4,a4,2
      	addi	a7,a7,1541
      	li	t4,67633152
      	add	t3,t3,a4
      	vsetvli	zero,a3,e8,m4,ta,ma
      	slli	a7,a7,32
      	addi	t4,t4,513
      	vle8.v	v20,0(t3)
      	add	a4,a5,a4
      	add	a7,a7,t4
      	vsetvli	a5,zero,e64,m4,ta,ma
      	vmv.v.x	v16,a7
      	vsetvli	a6,zero,e16,m8,ta,ma
      	vid.v	v8
      	vand.vi	v8,v8,-8
      	vsetvli	zero,zero,e8,m4,ta,ma
      	vrgatherei16.vv	v4,v20,v8
      	vadd.vv	v4,v4,v16
      	vsetvli	zero,a3,e8,m4,ta,ma
      	vse8.v	v4,0(a4)
      	j	.L7
      .L14:
      	mv	t5,a4
      	j	.L6
      .L9:
      	li	a5,0
      	li	t1,0
      	j	.L3
      
      The vectorized code is quite inefficient because we choose a VLS mode for the loop body
      and a VLA mode for the epilogue:
      
      cost.c:6:21: note:  ***** Choosing vector mode V128QI
      cost.c:6:21: note:  ***** Choosing epilogue vector mode RVVM4QI
      
      As we know, RVV has both VLA modes and VLS modes.  VLA modes support partial vectors, whereas
      VLS modes only support full vectors.  VLS modes were added to improve the codegen of known-NITERS
      loops and SLP code.

      If NITERS is unknown (that is, the loop condition is i < n with n unknown), we always need partial
      vectorization somewhere, either in the loop body or in the epilogue.  In this case, it is always
      more efficient to apply VLA partial vectorization to the loop body, which then needs no epilogue.
      
      After this patch:
      
      f:
      	ble	a2,zero,.L7
      	li	a5,1
      	beq	a2,a5,.L5
      	li	a6,50790400
      	addi	a6,a6,1541
      	li	a4,67633152
      	addi	a4,a4,513
      	csrr	a5,vlenb
      	addiw	a2,a2,-1
      	slli	a6,a6,32
      	add	a6,a6,a4
      	slli	a5,a5,2
      	slli	a4,a2,32
      	vsetvli	t1,zero,e64,m4,ta,ma
      	srli	a3,a4,29
      	neg	t4,a5
      	addi	a7,a1,37
      	mv	a4,a0
      	vmv.v.x	v12,a6
      	vsetvli	t3,zero,e16,m8,ta,ma
      	vid.v	v16
      	vand.vi	v16,v16,-8
      .L4:
      	minu	a6,a3,a5
      	vsetvli	zero,a6,e8,m4,ta,ma
      	vle8.v	v8,0(a7)
      	vsetvli	t3,zero,e8,m4,ta,ma
      	mv	t1,a3
      	vrgatherei16.vv	v4,v8,v16
      	vsetvli	zero,a6,e8,m4,ta,ma
      	vadd.vv	v4,v4,v12
      	vse8.v	v4,0(a4)
      	add	a7,a7,a5
      	add	a4,a4,a5
      	add	a3,a3,t4
      	bgtu	t1,a5,.L4
      .L3:
      	slliw	a2,a2,3
      	add	a1,a1,a2
      	lui	a5,%hi(.LC0)
      	lbu	a4,37(a1)
      	add	a0,a0,a2
      	addi	a5,a5,%lo(.LC0)
      	vsetivli	zero,8,e8,mf2,ta,ma
      	vmv.v.x	v1,a4
      	vle8.v	v2,0(a5)
      	vadd.vv	v1,v1,v2
      	vse8.v	v1,0(a0)
      .L7:
      	ret
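
      The loop at .L4 above processes min(remaining, VLMAX) elements per
      iteration ("minu" followed by "vsetvli"), so no separate epilogue is
      needed.  A scalar C++ model of that strip-mining scheme is sketched
      below; the function name and the vlmax parameter are illustrative
      assumptions, not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a length-agnostic (VLA-style) loop.  Each iteration
// handles min(remaining, vlmax) elements, so the tail is covered by the
// main loop itself.  vlmax stands in for the hardware vector length and
// must be nonzero.  Returns the number of iterations executed.
std::size_t add_one_strip_mined(std::vector<int>& data, std::size_t vlmax)
{
  std::size_t iterations = 0;
  for (std::size_t i = 0; i < data.size(); ) {
    std::size_t vl = std::min(data.size() - i, vlmax);  // active length
    for (std::size_t lane = 0; lane < vl; ++lane)
      data[i + lane] += 1;
    i += vl;
    ++iterations;
  }
  return iterations;
}
```

      With 10 elements and vlmax = 4, the loop runs three times with active
      lengths 4, 4 and 2.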
      
      Tested on both RV32 and RV64 with no regressions.  OK for trunk?
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): VLA
      	preempt VLS on unknown NITERS loop.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Remove xfail.
      	* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
      	* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
      	* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
    • libstdc++: Optimize std::is_compound compilation performance · 2105c49b
      Ken Matsui authored
      
      This patch optimizes the compilation performance of std::is_compound.
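
      The ChangeLog below relies on the fact that a type is compound exactly
      when it is not fundamental.  A minimal sketch of that relationship
      follows; my_is_compound_v is a hypothetical stand-in and not the
      libstdc++ definition.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in, not the libstdc++ definition: a type is compound
// exactly when it is not fundamental.
template<typename T>
inline constexpr bool my_is_compound_v = !std::is_fundamental_v<T>;

// The stand-in agrees with the standard trait for these sample types.
static_assert(my_is_compound_v<int*> == std::is_compound_v<int*>);
static_assert(my_is_compound_v<int> == std::is_compound_v<int>);
```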
      
      libstdc++-v3/ChangeLog:
      
      	* include/std/type_traits (is_compound): Do not use __not_.
      	(is_compound_v): Use is_fundamental_v instead.
      
      Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
    • Add -mevex512 into invoke.texi · 49a14ee4
      Haochen Jiang authored
      Hi Richard,
      
      It seems that I sent out an outdated patch.  This patch should be
      what I wanted to send.
      
      Thx,
      Haochen
      
      gcc/ChangeLog:
      
      	* doc/invoke.texi: Add -mevex512.
    • LoongArch: Optimized some of the symbolic expansion instructions generated... · b4deb244
      Lulu Cheng authored
      LoongArch: Optimized some of the sign-extension instructions generated during bitwise operations.

      There are two mode iterators defined in loongarch.md:
      	(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])
        and
      	(define_mode_iterator X [(SI "!TARGET_64BIT") (DI "TARGET_64BIT")])
      Replace the mode iterator in the bitwise patterns from GPR to X.

      Since the bitwise instructions do not distinguish between 64-bit and
      32-bit operands, sign extension is needed when a bitwise operation is
      narrower than 64 bits.
      The original definition generated many redundant sign-extension
      instructions.  This is optimized with reference to the RISC-V
      implementation.

      With this patch, SPEC2017 500.perlbench performance improves by 1.8%.
      
      gcc/ChangeLog:
      
      	* config/loongarch/loongarch.md (one_cmpl<mode>2): Replace GPR with X.
      	(*nor<mode>3): Likewise.
      	(nor<mode>3): Likewise.
      	(*negsi2_extended): New template.
      	(*<optab>si3_internal): Likewise.
      	(*one_cmplsi2_internal): Likewise.
      	(*norsi3_internal): Likewise.
      	(*<optab>nsi_internal): Likewise.
      	(bytepick_w_<bytepick_imm>_extend): Modify this template according to the
      	modified bit operation to make the optimization work.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/loongarch/sign-extend-bitwise.c: New test.
    • Optimize A < B ? A : B to MIN_EXPR. · 6686e16f
      liuhongt authored
      Similarly, optimize A < B ? B : A to MAX_EXPR.
      There is code in the front end to optimize such patterns, but it
      fails to handle the testcase in the PR, since the pattern is only
      exposed at the GIMPLE level when folding backend builtins.

      pr95906 can now be optimized to MAX_EXPR, as noted in the comment in
      the testcase:
      
      // FIXME: this should further optimize to a MAX_EXPR
       typedef signed char v16i8 __attribute__((vector_size(16)));
       v16i8 f(v16i8 a, v16i8 b)
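
      The identity behind the transformation can be checked lane by lane for
      integer elements.  The sketch below models a 16-lane signed char vector
      with std::array; the names lanes, select_min and lane_min are
      illustrative assumptions and do not correspond to GCC internals.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

using lanes = std::array<signed char, 16>;  // stand-in for v16i8

// Lane-wise a < b ? a : b, i.e. the select form.
lanes select_min(const lanes& a, const lanes& b)
{
  lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] < b[i] ? a[i] : b[i];
  return r;
}

// Lane-wise minimum, i.e. what a MIN operation computes per element.
lanes lane_min(const lanes& a, const lanes& b)
{
  lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::min(a[i], b[i]);
  return r;
}
```

      For integer lanes both functions always agree.  Floating-point lanes
      would need extra care around NaNs and signed zeros, which this sketch
      does not cover.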
      
      gcc/ChangeLog:
      
      	PR target/104401
      	* match.pd (VEC_COND_EXPR: A < B ? A : B -> MIN_EXPR): New pattern match.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr104401.c: New test.
      	* gcc.dg/tree-ssa/pr95906.c: Adjust testcase.
    • PR modula2/112946 set expression type checking · 96a9355a
      Gaius Mulley authored
      
      This patch adds type checking for binary set operators.
      It also checks the IN operator and improves the := type checking.
      
      gcc/m2/ChangeLog:
      
      	PR modula2/112946
      	* gm2-compiler/M2GenGCC.mod (IsExpressionCompatible): Import.
      	(ExpressionTypeCompatible): Import.
      	(CodeStatement): Remove op1, op2, op3 parameters from CodeSetOr,
      	CodeSetAnd, CodeSetSymmetricDifference, CodeSetLogicalDifference.
      	(checkArrayElements): Rename op1 to des and op3 to expr.
      	Use despos and exprpos instead of CurrentQuadToken.
      	(checkRecordTypes): Rename op1 to des and op2 to expr.
      	Use virtpos instead of CurrentQuadToken.
      	(checkIncorrectMeta): Ditto.
      	(checkBecomes): Rename op1 to des and op3 to expr.
      	Use virtpos instead of CurrentQuadToken.
      	(NoWalkProcedure): New procedure stub.
      	(CheckBinaryExpressionTypes): New procedure function.
      	(CheckElementSetTypes): New procedure function.
      	(CodeBinarySet): Re-write.
      	(FoldBinarySet): Re-write.
      	(CodeSetOr): Remove parameters op1, op2 and op3.
      	(CodeSetAnd): Ditto.
      	(CodeSetLogicalDifference): Ditto.
      	(CodeSetSymmetricDifference): Ditto.
      	(CodeIfIn): Call CheckBinaryExpressionTypes and
      	CheckElementSetTypes.
      	* gm2-compiler/M2Quads.mod (BuildRotateFunction): Correct
      	parameters to MakeVirtualTok to reflect parameter block
      	passed to Rotate.
      
      gcc/testsuite/ChangeLog:
      
      	PR modula2/112946
      	* gm2/pim/fail/badbecomes.mod: New test.
      	* gm2/pim/fail/badexpression.mod: New test.
      	* gm2/pim/fail/badexpression2.mod: New test.
      	* gm2/pim/fail/badifin.mod: New test.
      	* gm2/pim/pass/goodifin.mod: New test.
      
      Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
    • config: delete unused CYG_AC_PATH_LIBERTY macro · be9b6820
      Mike Frysinger authored
      Nothing uses this, so delete it to avoid confusion.
      
      config/ChangeLog:
      
      	* acinclude.m4 (CYG_AC_PATH_LIBERTY): Delete.
    • Daily bump. · 45af8962
      GCC Administrator authored
    • libstdc++: Use _GLIBCXX_USE_BUILTIN_TRAIT for _Nth_type · c84363b8
      Patrick Palka authored
      Since _Nth_type has a fallback native implementation, use
      _GLIBCXX_USE_BUILTIN_TRAIT when checking for __type_pack_element
      so that we can easily toggle which implementation to use.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/utility.h (_Nth_type): Use
      	_GLIBCXX_USE_BUILTIN_TRAIT instead of __has_builtin.
  2. Jan 10, 2024
    • RISC-V: Switch RVV cost model. · 3b8ef3f2
      Juzhe-Zhong authored
      This is a preparatory patch for the upcoming cost model tweaks.

      Since the default tune info (rocket) has no vector cost model,
      we use the generic vector cost model by default.

      The reason for switching to the generic vector cost model is that the default
      cost model generates inferior code for various benchmarks.

      For example, in PR113247 we have a performance bug that causes a drop of over 70%
      in SHA256 performance.  Currently, no matter how we adapt the cost model,
      we cannot fix the performance bug, since the default cost model is always used.

      Also, tweak the generic cost model back to match the default cost model, since we have
      some FAILs in the current tests.

      After this patch, we (Robin and I) can work on cost model tuning together to improve performance
      in various benchmarks.

      Tested on both RV32 and RV64.  OK for trunk?
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc (get_common_costs): Switch RVV cost model.
      	(get_vector_costs): Ditto.
      	(riscv_builtin_vectorization_cost): Ditto.
    • RISC-V: Minor tweak dynamic cost model · 2aa83f0a
      Juzhe-Zhong authored
      v2 update: Robustify tests.

      While working on the cost model, I noticed one case where the dynamic LMUL cost model doesn't work well.
      
      Before this patch:
      
      foo:
              lui     a4,%hi(.LANCHOR0)
              li      a0,1953
              li      a1,63
              addi    a4,a4,%lo(.LANCHOR0)
              li      a3,64
              vsetvli a2,zero,e32,mf2,ta,ma
              vmv.v.x v5,a0
              vmv.v.x v4,a1
              vid.v   v3
      .L2:
              vsetvli a5,a3,e32,mf2,ta,ma
              vadd.vi v2,v3,1
              vadd.vv v1,v3,v5
              mv      a2,a5
              vmacc.vv        v1,v2,v4
              slli    a1,a5,2
              vse32.v v1,0(a4)
              sub     a3,a3,a5
              add     a4,a4,a1
              vsetvli a5,zero,e32,mf2,ta,ma
              vmv.v.x v1,a2
              vadd.vv v3,v3,v1
              bne     a3,zero,.L2
              li      a0,0
              ret
      
      Unexpected: this uses scalable vectors with LMUL = MF2, which wastes computation resources.

      Ideally, we should use VLS modes with LMUL = M8.

      The root cause is that the dynamic LMUL heuristic dominates the VLS heuristic.
      Adapt the cost model heuristic accordingly.
      
      After this patch:
      
      foo:
      	lui	a4,%hi(.LANCHOR0)
      	addi	a4,a4,%lo(.LANCHOR0)
      	li	a3,4096
      	li	a5,32
      	li	a1,2016
      	addi	a2,a4,128
      	addiw	a3,a3,-32
      	vsetvli	zero,a5,e32,m8,ta,ma
      	li	a0,0
      	vid.v	v8
      	vsll.vi	v8,v8,6
      	vadd.vx	v16,v8,a1
      	vadd.vx	v8,v8,a3
      	vse32.v	v16,0(a4)
      	vse32.v	v8,0(a2)
      	ret
      
      Tested on both RV32 and RV64 with no regressions.

      OK for trunk?
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): Minor tweak.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Fix test.
      	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
      	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
    • libgccjit: Fix GGC segfault when using -flto · 8415bcee
      Antoni Boucher authored
      gcc/ChangeLog:
      	PR jit/111396
      	* ipa-fnsummary.cc (ipa_fnsummary_cc_finalize): Call
      	ipa_free_size_summary.
      	* ipa-icf.cc (ipa_icf_cc_finalize): New function.
      	* ipa-profile.cc (ipa_profile_cc_finalize): New function.
      	* ipa-prop.cc (ipa_prop_cc_finalize): New function.
      	* ipa-prop.h (ipa_prop_cc_finalize): New function.
      	* ipa-sra.cc (ipa_sra_cc_finalize): New function.
      	* ipa-utils.h (ipa_profile_cc_finalize, ipa_icf_cc_finalize,
      	ipa_sra_cc_finalize): New functions.
      	* toplev.cc (toplev::finalize): Call ipa_icf_cc_finalize,
      	ipa_prop_cc_finalize, ipa_profile_cc_finalize and
      	ipa_sra_cc_finalize
      	Include ipa-utils.h.
      
      gcc/testsuite/ChangeLog:
      	PR jit/111396
      	* jit.dg/all-non-failing-tests.h: Add note about test-ggc-bugfix.
      	* jit.dg/test-ggc-bugfix.c: New test.
    • RISC-V: T-HEAD: Add support for the XTheadInt ISA extension · 52e809d5
      Jin Ma authored
      The XTheadInt ISA extension provides the following instructions
      to accelerate interrupt processing:
      * th.ipush
      * th.ipop
      
      Ref:
      https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-protos.h (th_int_get_mask): New prototype.
      	(th_int_get_save_adjustment): Likewise.
      	(th_int_adjust_cfi_prologue): Likewise.
      	* config/riscv/riscv.cc (BITSET_P): Moved away from here.
      	(TH_INT_INTERRUPT): New macro.
      	(riscv_expand_prologue): Add the processing of XTheadInt.
      	(riscv_expand_epilogue): Likewise.
      	* config/riscv/riscv.h (BITSET_P): Moved to here.
      	* config/riscv/riscv.md: New unspec.
      	* config/riscv/thead.cc (th_int_get_mask): New function.
      	(th_int_get_save_adjustment): Likewise.
      	(th_int_adjust_cfi_prologue): Likewise.
      	* config/riscv/thead.md (th_int_push): New pattern.
      	(th_int_pop): new pattern.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/xtheadint-push-pop.c: New test.
    • middle-end: Don't apply copysign optimization if target does not implement optab [PR112468] · 7cbe41d3
      Tamar Christina authored
      Currently GCC does not treat IFN_COPYSIGN the same as the copysign tree expr.
      The latter has a libcall fallback and the IFN can only do optabs.
      
      Because of this, the change I made to optimize copysign only works if the
      target has implemented the optab, but it should also work for those that
      have the libcall.

      More annoyingly, if a target has vector versions of ABS and NEG but not
      COPYSIGN, then the change made them lose vectorization.
      
      The proper fix for this is to treat the IFN the same as the tree EXPR and to
      enhance expand_COPYSIGN to also support vector calls.
      
      I have such a patch for GCC 15 but it's quite big and too invasive for stage-4.
      As such this is a minimal fix, just don't apply the transformation and leave
      targets which don't have the optab unoptimized.
      
      The target list for check_effective_target_ifn_copysign was obtained by
      grepping for copysign and looking at the optab.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/112468
      	* doc/sourcebuild.texi: Document ifn_copysign.
      	* match.pd: Only apply transformation if target supports the IFN.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/112468
      	* gcc.dg/fold-copysign-1.c: Modify tests based on if target supports
      	IFN_COPYSIGN.
      	* gcc.dg/pr55152-2.c: Likewise.
      	* gcc.dg/tree-ssa/abs-4.c: Likewise.
      	* gcc.dg/tree-ssa/backprop-6.c: Likewise.
      	* gcc.dg/tree-ssa/copy-sign-2.c: Likewise.
      	* gcc.dg/tree-ssa/mult-abs-2.c: Likewise.
      	* lib/target-supports.exp (check_effective_target_ifn_copysign): New.
    • reassoc vs uninitialized variable [PR112581] · 113475d0
      Andrew Pinski authored
      
      As with r14-2293-g11350734240dba and r14-2289-gb083203f053f16,
      reassociation can combine operations across several basic blocks.  If
      one of the uses is an uninitialized variable, going from a conditional
      use to an unconditional use can cause wrong code.
      This uses maybe_undef_p, like other passes where this can happen.

      Note that if-to-switch uses a function (init_range_entry) provided
      by reassociation, so we need to call mark_ssa_maybe_undefs there;
      otherwise we would assume almost all SSA names are uninitialized.
      
      Bootstrapped and tested on x86_64-linux-gnu.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/112581
      	* gimple-if-to-switch.cc (pass_if_to_switch::execute): Call
      	mark_ssa_maybe_undefs.
      	* tree-ssa-reassoc.cc (can_reassociate_op_p): Uninitialized
      	variables can not be reassociated.
      	(init_range_entry): Check for uninitialized variables too.
      	(init_reassoc): Call mark_ssa_maybe_undefs.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/112581
      	* gcc.c-torture/execute/pr112581-1.c: New test.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
    • RISC-V/testsuite: Fix comment termination in pr105314.c · 3353e7d2
      Maciej W. Rozycki authored
      Add terminating `/' character missing from one of the test harness
      command clauses in pr105314.c.  This causes no issue with compilation
      owing to another comment immediately following, but would cause a:
      
      pr105314.c:3:1: warning: "/*" within comment [-Wcomment]
      
      message if warnings were enabled.
      
      	gcc/testsuite/
      	* gcc.target/riscv/pr105314.c: Fix comment termination.
    • RISC-V: Also handle sign extension in branch costing · 6c3365e7
      Maciej W. Rozycki authored
      Complement commit c1e8cb3d ("RISC-V: Rework branch costing model for
      if-conversion") and also handle extraneous sign extend operations that
      are sometimes produced by `noce_try_cmove_arith' instead of zero extend
      operations, making branch costing consistent.  It is unclear what the
      condition is for the middle end to choose between the zero extend and
      sign extend operation, but the test case included uses sign extension
      with 64-bit targets, preventing if-conversion from triggering across all
      the architectural variants.
      
      There are further anomalies revealed by the test case, specifically the
      exceedingly high branch cost of 6 required for the `-mmovcc' variant
      despite that the final branchless sequence only uses 4 instructions, the
      missed conversion at -O1 for 32-bit targets even though code is machine
      word size agnostic, and the missed conversion at -Os and -Oz for 32-bit
      Zicond targets even though the branchless sequence would be shorter than
      the branched one.  These will have to be handled separately.
      
      	gcc/
      	* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
      	Also handle sign extension.
      
      	gcc/testsuite/
      	* gcc.target/riscv/cset-sext-sfb.c: New test.
      	* gcc.target/riscv/cset-sext-thead.c: New test.
      	* gcc.target/riscv/cset-sext-ventana.c: New test.
      	* gcc.target/riscv/cset-sext-zicond.c: New test.
      	* gcc.target/riscv/cset-sext.c: New test.
    • testsuite: Add testcase for already fixed PR [PR112734] · ac6bcce1
      Jakub Jelinek authored
      This test was already fixed by r14-6051 aka PR112770 fix.
      
      2024-01-10  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/112734
      	* gcc.dg/bitint-64.c: New test.
    • aarch64: Make ldp/stp pass off by default · 8ed77a23
      Alex Coplan authored
      As discussed on IRC, this makes the aarch64 ldp/stp pass off by default.  This
      should stabilize the trunk and give some time to address the P1 regressions.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
      	to 0.
      	(-mlate-ldp-fusion): Likewise.
    • middle-end: correctly identify the edge taken when condition is true. [PR113287] · 91fd5c94
      Tamar Christina authored
      The vectorizer needs to know during early break vectorization whether the edge
      that will be taken if the condition is true stays or leaves the loop.
      
      This is because the code assumes that if you take the true branch you exit the
      loop.  If you don't exit the loop it has to generate a different condition.
      
      Basically, it uses this information to decide whether it is generating an
      "any element" or an "all element" check.

      Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu;
      no issues with --enable-lto --with-build-config=bootstrap-O3
      --enable-checking=release,yes,rtl,extra.
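
      One way to read the "any element" versus "all element" distinction is
      sketched below as a scalar model of a 4-lane condition mask.  The names
      mask, exits_if_true_edge_leaves and exits_if_true_edge_stays are
      hypothetical and do not appear in the vectorizer; the model only
      reflects the description above, not the actual implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using mask = std::array<bool, 4>;  // per-lane result of the loop condition

// True edge leaves the loop: exit if the condition holds in any lane.
bool exits_if_true_edge_leaves(const mask& cond)
{
  return std::any_of(cond.begin(), cond.end(), [](bool c) { return c; });
}

// True edge stays in the loop: continue only if the condition holds in
// all lanes, so exit as soon as one lane is false.
bool exits_if_true_edge_stays(const mask& cond)
{
  return !std::all_of(cond.begin(), cond.end(), [](bool c) { return c; });
}
```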
      
      gcc/ChangeLog:
      
      	PR tree-optimization/113287
      	* tree-vect-stmts.cc (vectorizable_early_exit): Check the flags on edge
      	instead of using BRANCH_EDGE to determine true edge.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/113287
      	* gcc.dg/vect/vect-early-break_100-pr113287.c: New test.
      	* gcc.dg/vect/vect-early-break_99-pr113287.c: New test.
      91fd5c94
    • Richard Biener's avatar
      tree-optimization/113078 - conditional subtraction reduction vectorization · cac9d2d2
      Richard Biener authored
      When if-conversion was changed to use .COND_ADD/SUB for conditional
      reduction it was forgotten to update reduction path handling to
      canonicalize .COND_SUB to .COND_ADD for vectorizable_reduction
      similar to what we do for MINUS_EXPR.  The following adds this
      and testcases exercising this at runtime and looking for the
      appropriate masked subtraction in the vectorized code on x86.
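
As a rough illustration of the loop shape involved (a sketch, not copied from the new testcases; the function names are hypothetical), a conditional subtraction reduction and the equivalent conditional addition of negated values look like this:

```cpp
#include <cstddef>

// Conditional subtraction reduction: the kind of loop that if-conversion
// can express with a conditional subtract on the accumulator.
long cond_sub_reduce(const int *a, const bool *take, std::size_t n, long init)
{
    long acc = init;
    for (std::size_t i = 0; i < n; ++i)
        if (take[i])
            acc -= a[i];
    return acc;
}

// The same reduction written as a conditional addition of the negated
// element, which is the form a subtraction is treated as when it is
// canonicalized to an addition.
long cond_add_reduce(const int *a, const bool *take, std::size_t n, long init)
{
    long acc = init;
    for (std::size_t i = 0; i < n; ++i)
        if (take[i])
            acc += -static_cast<long>(a[i]);
    return acc;
}
```

Both functions compute the same value for the same inputs; the accumulator is widened to `long` here only to keep the illustration free of overflow concerns.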
      
      	PR tree-optimization/113078
      	* tree-vect-loop.cc (check_reduction_path): Canonicalize
      	.COND_SUB to .COND_ADD.
      
      	* gcc.dg/vect/vect-reduc-cond-sub.c: New testcase.
      	* gcc.target/i386/vect-pr113078.c: Likewise.
      cac9d2d2
    • Tamar Christina's avatar
      c++ frontend: initialize ivdep value · f8a70fb2
      Tamar Christina authored
      If control enters the switch through any case other than the IVDEP one,
      the variable remains uninitialized.
      
      This fixes it by initializing it to false.
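
The pattern can be sketched as follows. This is a simplified stand-in, not the code in parser.cc: the enum `PragmaKind` and the function name are hypothetical.

```cpp
// Hypothetical pragma kinds; not the identifiers used in parser.cc.
enum class PragmaKind { Ivdep, Unroll, Novector };

bool pragma_requests_ivdep(PragmaKind kind)
{
    // Initialized up front so that every path through the switch
    // leaves the variable with a defined value.
    bool ivdep = false;

    switch (kind) {
    case PragmaKind::Ivdep:
        ivdep = true;
        break;
    case PragmaKind::Unroll:
    case PragmaKind::Novector:
        // Without the initializer above, ivdep would be read
        // uninitialized after leaving the switch through these cases.
        break;
    }
    return ivdep;
}
```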
      
      gcc/cp/ChangeLog:
      
      	* parser.cc (cp_parser_pragma): Initialize to false.
      f8a70fb2
    • David Malcolm's avatar
      gcc-urlifier: handle option prefixes such as '-fno-' · be2bf5dc
      David Malcolm authored
      
      Given e.g. this misspelled option (omitting the trailing 's'):
      $ LANG=C ./xgcc -B. -fno-inline-small-function
      xgcc: error: unrecognized command-line option '-fno-inline-small-function'; did you mean '-fno-inline-small-functions'?
      
      we weren't providing a documentation URL for the suggestion.
      
      The issue is the URLification code uses find_opt, which doesn't consider
      the various '-fno-' prefixes.
      
      This patch adds a way to find the pertinent prefix remapping and uses it
      when determining URLs.
      With this patch, the suggestion '-fno-inline-small-functions' now gets a
      documentation link (to that of '-finline-small-functions').
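
A minimal sketch of the idea, assuming a small prefix table: a negated option is mapped back to its positive form before the documentation lookup. The table contents, the struct and the function name here are illustrative assumptions; this is not the implementation of get_option_prefix_remapping.

```cpp
#include <string>

// Hypothetical remapping entry: an option prefix and its replacement.
struct PrefixRemap {
    const char *from;
    const char *to;
};

// Illustrative table only. "-fno-" comes from the example above;
// the "-Wno-" entry is an assumption added for the sketch.
static const PrefixRemap kRemaps[] = {
    {"-fno-", "-f"},
    {"-Wno-", "-W"},
};

// Return the option spelling whose documentation should be looked up.
std::string option_for_doc_lookup(const std::string &opt)
{
    for (const PrefixRemap &r : kRemaps) {
        const std::string from(r.from);
        if (opt.compare(0, from.size(), from) == 0)
            return std::string(r.to) + opt.substr(from.size());
    }
    return opt;  // no remapping applies
}
```

For example, `option_for_doc_lookup("-fno-inline-small-functions")` yields `"-finline-small-functions"`, the spelling under which the option is documented.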
      
      gcc/ChangeLog:
      	* gcc-urlifier.cc (gcc_urlifier::get_url_suffix_for_option):
      	Handle prefix mappings before calling find_opt.
      	(selftest::gcc_urlifier_cc_tests): Add example of urlifying a
      	"-fno-"-prefixed command-line option.
      	* opts-common.cc (get_option_prefix_remapping): New.
      	* opts.h (get_option_prefix_remapping): New decl.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
      be2bf5dc
    • David Malcolm's avatar
      pretty-print: support urlification in phase 3 · 5daf9104
      David Malcolm authored
      
      TL;DR: for the case when the user misspells a command-line option
      and we suggest one, with this patch we now provide a documentation URL
      for the suggestion.
      
      In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add
      URLs to quoted strings in diagnostics, and in r14-6920-g9e49746da303b8
      through r14-6923-g4ded42c2c5a5c9 wired this up so that any time
      we mention a command-line option in a diagnostic message in quotes,
      the user gets a URL to the HTML documentation for that option.
      
      However this only worked for quoted strings that were fully within
      a single "chunk" within the pretty-printer implementation, such as:
      
      * "%<-foption%>" (handled in phase 1)
      * "%qs", "-foption" (handled in phase 2)
      
      but not where the quoted string straddled multiple chunks, in
      particular for this important case in gcc.cc:
      
      	  error ("unrecognized command-line option %<-%s%>;"
      		 " did you mean %<-%s%>?",
      		 switches[i].part1, hint);
      
      e.g. for:
      $ LANG=C ./xgcc -B. -finling-small-functions
      xgcc: error: unrecognized command-line option '-finling-small-functions'; did you mean '-finline-small-functions'?
      
      which within pp_format becomes these chunks:
      
      * chunk 0: "unrecognized command-line option `-"
      * chunk 1: switches[i].part1  (e.g. "finling-small-functions")
      * chunk 2: "'; did you mean `-"
      * chunk 3: hint (e.g. "finline-small-functions")
      * chunk 4: "'?"
      
      where the first quoted run is in chunks 0-2 and the second in
      chunks 2-4.
      
      Hence we were not attempting to provide a URL for the two quoted runs,
      and, in particular not for the hint.
      
      This patch refactors the urlification mechanism in pretty-print.cc so
      that it checks for quoted runs that appear in phase 3 (as well as in
      phases 1 and 2, as before).  With this, the quoted text runs
      "-finling-small-functions" and "-finline-small-functions" are passed
      to the urlifier, which successfully finds a documentation URL for
      the latter.
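
The effect of scanning across chunk boundaries can be sketched as below. This is a simplified model under stated assumptions: it looks for literal opening and closing quote characters in the chunk text, whereas the real code tracks where quotes begin and end as it formats; the function name is hypothetical.

```cpp
#include <string>
#include <vector>

// Collect every quoted run, even when the opening quote, the quoted text
// and the closing quote live in different chunks.
std::vector<std::string> quoted_runs(const std::vector<std::string> &chunks)
{
    std::vector<std::string> runs;
    std::string current;
    bool in_quote = false;

    for (const std::string &chunk : chunks) {
        for (char c : chunk) {
            if (!in_quote && c == '`') {
                in_quote = true;      // opening quote: start a new run
                current.clear();
            } else if (in_quote && c == '\'') {
                in_quote = false;     // closing quote: the run is complete
                runs.push_back(current);
            } else if (in_quote) {
                current += c;         // text inside the quotes
            }
        }
        // The quote state carries over to the next chunk, which is what
        // lets a run straddle chunk boundaries.
    }
    return runs;
}
```

Fed the five chunks listed above, this yields the two runs "-finling-small-functions" and "-finline-small-functions", each of which can then be handed to a urlifier.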
      
      As before, the urlification code is only run if the URL escapes are
      enabled, and only for messages from diagnostic.cc (error, warn, inform,
      etc), not for all pretty_printer usage.
      
      gcc/ChangeLog:
      	* diagnostic.cc (diagnostic_context::report_diagnostic): Pass
      	m_urlifier to pp_output_formatted_text.
      	* pretty-print.cc: Add #define of INCLUDE_VECTOR.
      	(obstack_append_string): New overload, taking a length.
      	(urlify_quoted_string): Pass in an obstack ptr, rather than using
      	that of the pp's buffer.  Generalize to handle trailing text in
      	the buffer beyond the run of quoted text.
      	(class quoting_info): New.
      	(on_begin_quote): New.
      	(on_end_quote): New.
      	(pp_format): Refactor phase 1 and phase 2 quoting support, moving
      	it to calls to on_begin_quote and on_end_quote.
      	(struct auto_obstack): New.
      	(quoting_info::handle_phase_3): New.
      	(pp_output_formatted_text): Add urlifier param.  Use it if there
      	is deferred urlification.  Delete m_quotes.
      	(selftest::pp_printf_with_urlifier): Pass urlifier to
      	pp_output_formatted_text.
      	(selftest::test_urlification): Update results for the existing
      	case of quoted text straddling chunks; add more such test cases.
      	* pretty-print.h (class quoting_info): New forward decl.
      	(chunk_info::m_quotes): New field.
      	(pp_output_formatted_text): Add optional urlifier param.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
      5daf9104
    • David Malcolm's avatar
      pretty-print: add selftest coverage for numbered args · 7daa935c
      David Malcolm authored
      
      No functional change intended.
      
      gcc/ChangeLog:
      	* pretty-print.cc (selftest::test_pp_format): Add selftest
      	coverage for numbered args.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
      7daa935c
    • Julian Brown's avatar
      OpenMP: Fix g++.dg/gomp/bad-array-section-10.C for C++23 and up · 6a3700f9
      Julian Brown authored
      This patch adjusts diagnostic output for C++23 and above for the test
      case mentioned in the commit title.
      
      2024-01-10  Julian Brown  <julian@codesourcery.com>
      
      gcc/testsuite/
      	* g++.dg/gomp/bad-array-section-10.C: Adjust diagnostics for C++23 and
      	up.
      6a3700f9
    • Julian Brown's avatar
      OpenMP: Fix new lvalue-parsing map/to/from tests for 32-bit targets · 3c52d799
      Julian Brown authored
      This patch fixes several tests introduced by the commit
      r14-7033-g1413af02d62182 for 32-bit targets.
      
      2024-01-10  Julian Brown  <julian@codesourcery.com>
      
      gcc/testsuite/
      	* g++.dg/gomp/array-section-1.C: Fix scan output for 32-bit target.
      	* g++.dg/gomp/array-section-2.C: Likewise.
      	* g++.dg/gomp/bad-array-section-4.C: Adjust error output for 32-bit
      	target.
      3c52d799
    • Tamar Christina's avatar
      middle-end: Fix dominators updates when peeling with multiple exits [PR113144] · 9e7c77c7
      Tamar Christina authored
      When we peel at_exit we are moving the new loop at the exit of the previous
      loop.  This means that the blocks outside the loop that the previous loop used to
      dominate are no longer being dominated by it.
      
      The new dominators however are hard to predict since if the loop has multiple
      exits and all the exits are "early" ones then we always execute the scalar
      loop.  In this case the scalar loop can completely dominate the new loop.
      
      If we later have skip_vector then there's an additional skip edge added that
      might change the dominators.
      
      The previous patch would force an update of all blocks reachable from the new
      exits.  This one updates *only* blocks that we know the scalar exits dominated.
      
      For the examples this reduces the blocks to update from 18 to 3.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/113144
      	PR tree-optimization/113145
      	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
      	Update all BB that the original exits dominated.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/113144
      	PR tree-optimization/113145
      	* gcc.dg/vect/vect-early-break_94-pr113144.c: New test.
      9e7c77c7
    • Jakub Jelinek's avatar
      testsuite: Fix PR number [PR113297] · d790565a
      Jakub Jelinek authored
      2024-01-10  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/113297
      	* gcc.dg/bitint-63.c: Fix PR number.
      d790565a
    • Jakub Jelinek's avatar
      libgomp: Fix up FLOCK fallback handling [PR113192] · 2fb3ee3e
      Jakub Jelinek authored
      My earlier change broke Solaris testing, because @FLOCK@ isn't substituted
      just into libgomp/Makefile where it worked, but also the
      testsuite/libgomp-site-extra.exp file where Make variables aren't present
      and can't be substituted.
      
      The following patch instead computes the absolute srcdir path and uses it
      for FLOCK.
      
      2024-01-10  Jakub Jelinek  <jakub@redhat.com>
      
      	PR libgomp/113192
      	* configure.ac (FLOCK): Use $libgomp_abs_srcdir/testsuite/flock
      	instead of \$(abs_top_srcdir)/testsuite/flock.
      	* configure: Regenerated.
      2fb3ee3e
    • Eric Botcazou's avatar
      Fix debug info for enumeration types with reverse Scalar_Storage_Order · 5d8b60ef
      Eric Botcazou authored
      This implements the support of DW_AT_endianity for enumeration types because
      they are scalar and therefore, reverse Scalar_Storage_Order is supported for
      them, but only when the -gstrict-dwarf switch is not passed because this is
      an extension.
      
      There is an associated GDB patch to be submitted to grok the new DWARF.
      
      gcc/
      	* dwarf2out.cc (modified_type_die): Extend the support of reverse
      	storage order to enumeration types if -gstrict-dwarf is not passed.
      	(gen_enumeration_type_die): Add REVERSE parameter and generate the
      	DIE immediately after the existing one if it is true.
      	(gen_tagged_type_die): Add REVERSE parameter and pass it in the
      	call to gen_enumeration_type_die.
      	(gen_type_die_with_usage): Add REVERSE parameter and pass it in the
      	first recursive call as well as the call to gen_tagged_type_die.
      	(gen_type_die): Add REVERSE parameter and pass it in the call to
      	gen_type_die_with_usage.
      5d8b60ef
    • chenxiaolong's avatar
      LoongArch: testsuite: Add loongarch support to slp-21.c. · 898c39ca
      chenxiaolong authored
      This test checks that the compiler supports vectorization using SLP and
      vec_{load/store/*}_lanes. However, LoongArch does not support
      vec_{load/store/*}_lanes, which correspond to instructions such as "st4/ld4"
      on aarch64.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/slp-21.c: Add loongarch.
      898c39ca
    • chenxiaolong's avatar
      LoongArch: testsuite: Fixed a bug that added a target check error. · 41084f08
      chenxiaolong authored
      After the code was committed in r14-6948, GCC regression testing on some
      architectures produces the following error:
      
      "error executing dg-final: unknown effective target keyword `loongarch*-*-*'"
      
      gcc/testsuite/ChangeLog:
      
      	* lib/target-supports.exp: Removed an issue with "target keyword"
      	checking errors on LoongArch architecture.
      41084f08