- Jan 11, 2024
-
-
Jin Ma authored
Due to the premature split optimizations for XTheadFMemIdx, GPR is allocated when reload allocates registers, resulting in the following insn. (insn 66 21 64 5 (set (reg:DF 14 a4 [orig:136 <retval> ] [136]) (mem:DF (plus:SI (reg/f:SI 15 a5 [141]) (ashift:SI (reg/v:SI 10 a0 [orig:137 i ] [137]) (const_int 3 [0x3]))) [0 S8 A64])) 218 {*movdf_hardfloat_rv32} (nil)) Since we currently do not support adjustments to th_m_mir/th_m_miu, which will trigger ICE. So it is recommended to place the split optimizations after reload to ensure FPR when registers are allocated. gcc/ChangeLog: * config/riscv/thead.md: Add limits for splits. gcc/testsuite/ChangeLog: * gcc.target/riscv/xtheadfmemidx-medany.c: New test.
-
Andrew Pinski authored
The problem here is after the recent vectorizer improvements, we end up with a comparison against a vector bool 0 which then tries expand_single_bit_test which is not expecting vector comparisons at all. The IR was: vector(4) <signed-boolean:1> mask_patt_5.13; _Bool _12; mask_patt_5.13_44 = vect_perm_even_41 != { 0.0, 1.0e+0, 2.0e+0, 3.0e+0 }; _12 = mask_patt_5.13_44 == { 0, 0, 0, 0 }; and we tried to call expand_single_bit_test for the last comparison. Rejecting the vector comparison is needed. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/113322 gcc/ChangeLog: * expr.cc (do_store_flag): Don't try single bit tests with comparison on vector types. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr113322-1.c: New test. Signed-off-by:
Andrew Pinski <quic_apinski@quicinc.com>
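A hypothetical reduction (pr113322-1.c is the real test and may differ): a strided comparison whose "all lanes" reduction yields exactly this kind of vector-boolean equality:

    _Bool
    all_differ (float *x)
    {
      _Bool r = 1;
      /* Even-indexed elements compared against 0.0, 1.0, 2.0, 3.0: the
         vectorizer forms an even-element permute, a mask compare, and an
         all-of reduction over the mask.  */
      for (int i = 0; i < 8; i += 2)
        r &= (x[i] != (float) (i / 2));
      return r;
    }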
-
Andrew Pinski authored
Ranger currently does not cope with the complexity of COND_EXPR in some cases, so delaying the simplification of `1/x` for signed types helps code generation. tree-ssa/divide-8.c is a new testcase where this can help. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/113301 gcc/ChangeLog: * match.pd (`1/x`): Delay signed case until late. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/divide-8.c: New test. Signed-off-by:
Andrew Pinski <quic_apinski@quicinc.com>
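A worked check of the identity being delayed (a sketch; divide-8.c itself may test something else): for signed x, 1/x is x when x is 1 or -1 and 0 otherwise, which is exactly the kind of COND_EXPR chain ranger currently handles poorly:

    #include <assert.h>

    static int
    folded_one_div (int x)
    {
      /* One way to express the simplified form in C.  */
      return (x == 1 || x == -1) ? x : 0;
    }

    int
    main (void)
    {
      for (int x = -5; x <= 5; x++)
        if (x != 0)
          assert (1 / x == folded_one_div (x));
      return 0;
    }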
-
Jason Merrill authored
These two lines have been getting XPASS since the test was added. gcc/testsuite/ChangeLog: * g++.dg/cpp23/explicit-obj-diagnostics7.C: Remove xfail.
-
Tamar Christina authored
This removes -save-temps from the tests I've introduced to fix the LTO mismatches. gcc/testsuite/ChangeLog: PR testsuite/113319 * gcc.dg/bic-bitmask-13.c: Remove -save-temps. * gcc.dg/bic-bitmask-14.c: Likewise. * gcc.dg/bic-bitmask-15.c: Likewise. * gcc.dg/bic-bitmask-16.c: Likewise. * gcc.dg/bic-bitmask-17.c: Likewise. * gcc.dg/bic-bitmask-18.c: Likewise. * gcc.dg/bic-bitmask-19.c: Likewise. * gcc.dg/bic-bitmask-20.c: Likewise. * gcc.dg/bic-bitmask-21.c: Likewise. * gcc.dg/bic-bitmask-22.c: Likewise. * gcc.dg/bic-bitmask-7.c: Likewise. * gcc.dg/vect/vect-early-break-run_1.c: Likewise. * gcc.dg/vect/vect-early-break-run_10.c: Likewise. * gcc.dg/vect/vect-early-break-run_2.c: Likewise. * gcc.dg/vect/vect-early-break-run_3.c: Likewise. * gcc.dg/vect/vect-early-break-run_4.c: Likewise. * gcc.dg/vect/vect-early-break-run_5.c: Likewise. * gcc.dg/vect/vect-early-break-run_6.c: Likewise. * gcc.dg/vect/vect-early-break-run_7.c: Likewise. * gcc.dg/vect/vect-early-break-run_8.c: Likewise. * gcc.dg/vect/vect-early-break-run_9.c: Likewise.
-
Richard Biener authored
Vectorization of bit-precision inductions isn't implemented but we don't check this, instead we ICE during transform. PR tree-optimization/112505 * tree-vect-loop.cc (vectorizable_induction): Reject bit-precision induction. * gcc.dg/vect/pr112505.c: New testcase.
-
Richard Biener authored
The following makes sure the resulting boolean type is the same when eliding a float extension. PR tree-optimization/113126 * match.pd ((double)float CMP (double)float -> float CMP float): Make sure the boolean type is the same. * fold-const.cc (fold_binary_loc): Likewise. * gcc.dg/torture/pr113126.c: New testcase.
-
Richard Biener authored
The following avoids a mismatch between an early query for maximum number of iterations of a loop and a late one when through ranger we'd get iterations estimated. Instead make sure we compute niters before querying the iteration bound. PR tree-optimization/112636 * tree-ssa-loop-ch.cc (ch_base::copy_headers): Call estimate_numbers_of_iterations before querying get_max_loop_iterations_int. (pass_ch::execute): Initialize SCEV and loops appropriately. * gcc.dg/pr112636.c: New testcase.
-
Pan Li authored
insert_var_expansion_initialization depends on HONOR_SIGNED_ZEROS to initialize the unrolling accumulators to +0.0f instead of -0.0f when the no-signed-zeros option is given. Unfortunately, we should always keep the -0.0f here because: * -0.0f is always the correct initial value. * We need to support targets that always honor signed zeros. Thus, we need to leverage MODE_HAS_SIGNED_ZEROS when initializing instead of HONOR_SIGNED_ZEROS; then the target/backend can decide whether to honor no-signed-zeros or not. We also remove the testcase pr30957-1.c, as whether its return value is positive or negative is undefined. The below tests are passed for this patch: * The riscv regression tests. * The aarch64 regression tests. * The x86 bootstrap and regression tests. gcc/ChangeLog: * loop-unroll.cc (insert_var_expansion_initialization): Leverage MODE_HAS_SIGNED_ZEROS for expansion variable initialization. gcc/testsuite/ChangeLog: * gcc.dg/pr30957-1.c: Remove. Signed-off-by:
Pan Li <pan2.li@intel.com>
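A short demonstration of why -0.0f is the correct initial value (a sketch): -0.0 is the identity of floating-point addition under round-to-nearest, while +0.0 loses the sign when every addend is -0.0:

    #include <assert.h>
    #include <math.h>

    int
    main (void)
    {
      double a = -0.0 + -0.0;   /* -0.0: the sign is preserved */
      double b = +0.0 + -0.0;   /* +0.0: the sign is lost */
      assert (signbit (a) && !signbit (b));
      return 0;
    }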
-
Alex Coplan authored
In r14-6604-gd7ee988c491cde43d04fe25f2b3dbad9d85ded45 we changed the CFI notes attached to callee saves (in aarch64_save_callee_saves). That patch changed the ldp/stp representation to use unspecs instead of PARALLEL moves. This meant that we needed to attach CFI notes to all frame-related pair saves such that dwarf2cfi could still emit the appropriate CFI (it cannot interpret the unspecs directly). The patch also attached REG_CFA_OFFSET notes to individual saves so that the ldp/stp pass could easily preserve them when forming stps. In that change I chose to use REG_CFA_OFFSET, but as the PR shows, that choice was problematic in that REG_CFA_OFFSET requires the attached store to be expressed in terms of the current CFA register at all times. This means that even scheduling of frame-related insns can break this invariant, leading to ICEs in dwarf2cfi. The old behaviour (before that change) allowed dwarf2cfi to interpret the RTL directly for sp-relative saves. This change restores that behaviour by using REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET. REG_FRAME_RELATED_EXPR effectively just gives a different pattern for dwarf2cfi to look at instead of the main insn pattern. That allows us to attach the old-style PARALLEL move representation in a REG_FRAME_RELATED_EXPR note and means we are free to always express the save addresses in terms of the stack pointer. Since the ldp/stp fusion pass can combine frame-related stores, this patch also updates it to preserve REG_FRAME_RELATED_EXPR notes, and additionally gives it the ability to synthesize those notes when combining sp-relative saves into an stp (the latter always needs a note due to the unspec representation, the former does not). gcc/ChangeLog: PR target/113077 * config/aarch64/aarch64-ldp-fusion.cc (filter_notes): Add fr_expr param to extract REG_FRAME_RELATED_EXPR notes. (combine_reg_notes): Handle REG_FRAME_RELATED_EXPR notes, and synthesize these if needed. Update caller ... (ldp_bb_info::fuse_pair): ... here. (ldp_bb_info::try_fuse_pair): Punt if either insn has writeback and either insn is frame-related. (find_trailing_add): Punt on frame-related insns. * config/aarch64/aarch64.cc (aarch64_save_callee_saves): Use REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET. gcc/testsuite/ChangeLog: PR target/113077 * gcc.target/aarch64/pr113077.c: New test.
-
Richard Biener authored
Testcase for fixed PR. PR tree-optimization/111003 gcc/testsuite/ * gcc.dg/tree-ssa/pr111003.c: New testcase.
-
Richard Biener authored
The optimization to expand uniform boolean vectors by sign-extension works only for dense masks but it failed to check that. PR middle-end/112740 * expr.cc (store_constructor): Check the integer vector mask has a single bit per element before using sign-extension to expand an uniform vector. * gcc.dg/pr112740.c: New testcase.
-
Juzhe-Zhong authored
This patch fixes the known issues on SLP cases: ble a2,zero,.L11 addiw t1,a2,-1 li a5,15 bleu t1,a5,.L9 srliw a7,t1,4 slli a7,a7,7 lui t3,%hi(.LANCHOR0) lui a6,%hi(.LANCHOR0+128) addi t3,t3,%lo(.LANCHOR0) li a4,128 addi a6,a6,%lo(.LANCHOR0+128) add a7,a7,a0 addi a3,a1,37 mv a5,a0 vsetvli zero,a4,e8,m8,ta,ma vle8.v v24,0(t3) vle8.v v16,0(a6) .L4: li a6,128 vle8.v v0,0(a3) vrgather.vv v8,v0,v24 vadd.vv v8,v8,v16 vse8.v v8,0(a5) add a5,a5,a6 add a3,a3,a6 bne a5,a7,.L4 andi a5,t1,-16 mv t1,a5 .L3: subw a2,a2,a5 li a4,1 beq a2,a4,.L5 slli a5,a5,32 srli a5,a5,32 addiw a2,a2,-1 slli a5,a5,3 csrr a4,vlenb slli a6,a2,32 addi t3,a5,37 srli a3,a6,29 slli a4,a4,2 add t3,a1,t3 add a5,a0,a5 mv t5,a3 bgtu a3,a4,.L14 .L6: li a4,50790400 addi a4,a4,1541 li a6,67633152 addi a6,a6,513 slli a4,a4,32 add a4,a4,a6 vsetvli t4,zero,e64,m4,ta,ma vmv.v.x v16,a4 vsetvli a6,zero,e16,m8,ta,ma vid.v v8 vsetvli zero,t5,e8,m4,ta,ma vle8.v v20,0(t3) vsetvli a6,zero,e16,m8,ta,ma csrr a7,vlenb vand.vi v8,v8,-8 vsetvli zero,zero,e8,m4,ta,ma slli a4,a7,2 vrgatherei16.vv v4,v20,v8 vadd.vv v4,v4,v16 vsetvli zero,t5,e8,m4,ta,ma vse8.v v4,0(a5) bgtu a3,a4,.L15 .L7: addw t1,a2,t1 .L5: slliw a5,t1,3 add a1,a1,a5 lui a4,%hi(.LC2) add a0,a0,a5 lbu a3,37(a1) addi a5,a4,%lo(.LC2) vsetivli zero,8,e8,mf2,ta,ma vmv.v.x v1,a3 vle8.v v2,0(a5) vadd.vv v1,v1,v2 vse8.v v1,0(a0) .L11: ret .L15: sub a3,a3,a4 bleu a3,a4,.L8 mv a3,a4 .L8: li a7,50790400 csrr a4,vlenb slli a4,a4,2 addi a7,a7,1541 li t4,67633152 add t3,t3,a4 vsetvli zero,a3,e8,m4,ta,ma slli a7,a7,32 addi t4,t4,513 vle8.v v20,0(t3) add a4,a5,a4 add a7,a7,t4 vsetvli a5,zero,e64,m4,ta,ma vmv.v.x v16,a7 vsetvli a6,zero,e16,m8,ta,ma vid.v v8 vand.vi v8,v8,-8 vsetvli zero,zero,e8,m4,ta,ma vrgatherei16.vv v4,v20,v8 vadd.vv v4,v4,v16 vsetvli zero,a3,e8,m4,ta,ma vse8.v v4,0(a4) j .L7 .L14: mv t5,a4 j .L6 .L9: li a5,0 li t1,0 j .L3 The vectorization codegen is quite inefficient since we choose a VLS modes to vectorize the loop body with epilogue choosing a VLA modes. cost.c:6:21: note: ***** Choosing vector mode V128QI cost.c:6:21: note: ***** Choosing epilogue vector mode RVVM4QI As we known, in RVV side, we have VLA modes and VLS modes. VLAmodes support partial vectors wheras VLSmodes support full vectors. The goal we add VLSmodes is to improve the codegen of known NITERS or SLP codes. If NITERS is unknown, that is i < n, n is unknown. We will always have partial vectors vectorization. It can be loop body or epilogue. In this case, It's always more efficient to apply VLA partial vectorization on loop body which doesn't have epilogue. After this patch: f: ble a2,zero,.L7 li a5,1 beq a2,a5,.L5 li a6,50790400 addi a6,a6,1541 li a4,67633152 addi a4,a4,513 csrr a5,vlenb addiw a2,a2,-1 slli a6,a6,32 add a6,a6,a4 slli a5,a5,2 slli a4,a2,32 vsetvli t1,zero,e64,m4,ta,ma srli a3,a4,29 neg t4,a5 addi a7,a1,37 mv a4,a0 vmv.v.x v12,a6 vsetvli t3,zero,e16,m8,ta,ma vid.v v16 vand.vi v16,v16,-8 .L4: minu a6,a3,a5 vsetvli zero,a6,e8,m4,ta,ma vle8.v v8,0(a7) vsetvli t3,zero,e8,m4,ta,ma mv t1,a3 vrgatherei16.vv v4,v8,v16 vsetvli zero,a6,e8,m4,ta,ma vadd.vv v4,v4,v12 vse8.v v4,0(a4) add a7,a7,a5 add a4,a4,a5 add a3,a3,t4 bgtu t1,a5,.L4 .L3: slliw a2,a2,3 add a1,a1,a2 lui a5,%hi(.LC0) lbu a4,37(a1) add a0,a0,a2 addi a5,a5,%lo(.LC0) vsetivli zero,8,e8,mf2,ta,ma vmv.v.x v1,a4 vle8.v v2,0(a5) vadd.vv v1,v1,v2 vse8.v v1,0(a0) .L7: ret Tested on both RV32 and RV64 no regression. Ok for trunk ? 
gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): VLA preempt VLS on unknown NITERS loop. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/slp-1.c: Remove xfail. * gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
-
Lulu Cheng authored
LoongArch: Optimized some of the symbolic expansion instructions generated during bitwise operations. There are two mode iterators defined in the loongarch.md: (define_mode_iterator GPR [SI (DI "TARGET_64BIT")]) and (define_mode_iterator X [(SI "!TARGET_64BIT") (DI "TARGET_64BIT")]) Replace the mode in the bit arithmetic from GPR to X. Since the bitwise operation instruction does not distinguish between 64-bit, 32-bit, etc., it is necessary to perform symbolic expansion if the bitwise operation is less than 64 bits. The original definition would have generated a lot of redundant symbolic extension instructions. This problem is optimized with reference to the implementation of RISCV. Add this patch spec2017 500.perlbench performance improvement by 1.8% gcc/ChangeLog: * config/loongarch/loongarch.md (one_cmpl<mode>2): Replace GPR with X. (*nor<mode>3): Likewise. (nor<mode>3): Likewise. (*negsi2_extended): New template. (*<optab>si3_internal): Likewise. (*one_cmplsi2_internal): Likewise. (*norsi3_internal): Likewise. (*<optab>nsi_internal): Likewise. (bytepick_w_<bytepick_imm>_extend): Modify this template according to the modified bit operation to make the optimization work. gcc/testsuite/ChangeLog: * gcc.target/loongarch/sign-extend-bitwise.c: New test.
-
liuhongt authored
Similarly for A < B ? B : A to MAX_EXPR. There is code in the frontend to optimize such patterns, but it failed to handle the testcase in the PR since the pattern is only exposed at the gimple level when folding backend builtins. pr95906 can now be optimized to MAX_EXPR, as the comment in the testcase notes:

    // FIXME: this should further optimize to a MAX_EXPR
    typedef signed char v16i8 __attribute__((vector_size(16)));
    v16i8 f(v16i8 a, v16i8 b)

gcc/ChangeLog: PR target/104401 * match.pd (VEC_COND_EXPR: A < B ? A : B -> MIN_EXPR): New pattern match. gcc/testsuite/ChangeLog: * gcc.target/i386/pr104401.c: New test. * gcc.dg/tree-ssa/pr95906.c: Adjust testcase.
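A hypothetical variant of the pr95906 shape (the quoted testcase above is truncated; this sketch is illustrative, not the committed test): a compare-and-select through bitwise operations that folds to a VEC_COND_EXPR and should then become MAX_EXPR:

    typedef signed char v16i8 __attribute__((vector_size(16)));

    v16i8
    vmax (v16i8 a, v16i8 b)
    {
      v16i8 m = a > b;            /* all-ones lanes where a > b */
      return (m & a) | (~m & b);  /* lane-wise select: max (a, b) */
    }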
-
Gaius Mulley authored
This patch adds type checking for binary set operators. It also checks the IN operator and improves the := type checking. gcc/m2/ChangeLog: PR modula2/112946 * gm2-compiler/M2GenGCC.mod (IsExpressionCompatible): Import. (ExpressionTypeCompatible): Import. (CodeStatement): Remove op1, op2, op3 parameters from CodeSetOr, CodeSetAnd, CodeSetSymmetricDifference, CodeSetLogicalDifference. (checkArrayElements): Rename op1 to des and op3 to expr. Use despos and exprpos instead of CurrentQuadToken. (checkRecordTypes): Rename op1 to des and op2 to expr. Use virtpos instead of CurrentQuadToken. (checkIncorrectMeta): Ditto. (checkBecomes): Rename op1 to des and op3 to expr. Use virtpos instead of CurrentQuadToken. (NoWalkProcedure): New procedure stub. (CheckBinaryExpressionTypes): New procedure function. (CheckElementSetTypes): New procedure function. (CodeBinarySet): Re-write. (FoldBinarySet): Re-write. (CodeSetOr): Remove parameters op1, op2 and op3. (CodeSetAnd): Ditto. (CodeSetLogicalDifference): Ditto. (CodeSetSymmetricDifference): Ditto. (CodeIfIn): Call CheckBinaryExpressionTypes and CheckElementSetTypes. * gm2-compiler/M2Quads.mod (BuildRotateFunction): Correct parameters to MakeVirtualTok to reflect parameter block passed to Rotate. gcc/testsuite/ChangeLog: PR modula2/112946 * gm2/pim/fail/badbecomes.mod: New test. * gm2/pim/fail/badexpression.mod: New test. * gm2/pim/fail/badexpression2.mod: New test. * gm2/pim/fail/badifin.mod: New test. * gm2/pim/pass/goodifin.mod: New test. Signed-off-by:
Gaius Mulley <gaiusmod2@gmail.com>
-
GCC Administrator authored
-
- Jan 10, 2024
-
-
Juzhe-Zhong authored
v2 update: Robustify tests. While working on the cost model, I noticed one case where the dynamic LMUL cost model doesn't work well. Before this patch:

foo:
        lui a4,%hi(.LANCHOR0)
        li a0,1953
        li a1,63
        addi a4,a4,%lo(.LANCHOR0)
        li a3,64
        vsetvli a2,zero,e32,mf2,ta,ma
        vmv.v.x v5,a0
        vmv.v.x v4,a1
        vid.v v3
.L2:
        vsetvli a5,a3,e32,mf2,ta,ma
        vadd.vi v2,v3,1
        vadd.vv v1,v3,v5
        mv a2,a5
        vmacc.vv v1,v2,v4
        slli a1,a5,2
        vse32.v v1,0(a4)
        sub a3,a3,a5
        add a4,a4,a1
        vsetvli a5,zero,e32,mf2,ta,ma
        vmv.v.x v1,a2
        vadd.vv v3,v3,v1
        bne a3,zero,.L2
        li a0,0
        ret

This is unexpected: it uses a scalable vector with LMUL = MF2, which wastes computation resources; ideally, we should use LMUL = M8 VLS modes. The root cause is that the dynamic LMUL heuristic dominates the VLS heuristic. Adapt the cost model heuristic accordingly. After this patch:

foo:
        lui a4,%hi(.LANCHOR0)
        addi a4,a4,%lo(.LANCHOR0)
        li a3,4096
        li a5,32
        li a1,2016
        addi a2,a4,128
        addiw a3,a3,-32
        vsetvli zero,a5,e32,m8,ta,ma
        li a0,0
        vid.v v8
        vsll.vi v8,v8,6
        vadd.vx v16,v8,a1
        vadd.vx v8,v8,a3
        vse32.v v16,0(a4)
        vse32.v v8,0(a2)
        ret

Tested on both RV32 and RV64 with no regressions. Ok for trunk? gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): Minor tweak. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Fix test. * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto. * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
-
Antoni Boucher authored
gcc/ChangeLog: PR jit/111396 * ipa-fnsummary.cc (ipa_fnsummary_cc_finalize): Call ipa_free_size_summary. * ipa-icf.cc (ipa_icf_cc_finalize): New function. * ipa-profile.cc (ipa_profile_cc_finalize): New function. * ipa-prop.cc (ipa_prop_cc_finalize): New function. * ipa-prop.h (ipa_prop_cc_finalize): New function. * ipa-sra.cc (ipa_sra_cc_finalize): New function. * ipa-utils.h (ipa_profile_cc_finalize, ipa_icf_cc_finalize, ipa_sra_cc_finalize): New functions. * toplev.cc (toplev::finalize): Call ipa_icf_cc_finalize, ipa_prop_cc_finalize, ipa_profile_cc_finalize and ipa_sra_cc_finalize. Include ipa-utils.h. gcc/testsuite/ChangeLog: PR jit/111396 * jit.dg/all-non-failing-tests.h: Add note about test-ggc-bugfix. * jit.dg/test-ggc-bugfix.c: New test.
-
Jin Ma authored
The XTheadInt ISA extension provides the following instructions to accelerate interrupt processing: * th.ipush * th.ipop Ref: https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf gcc/ChangeLog: * config/riscv/riscv-protos.h (th_int_get_mask): New prototype. (th_int_get_save_adjustment): Likewise. (th_int_adjust_cfi_prologue): Likewise. * config/riscv/riscv.cc (BITSET_P): Moved away from here. (TH_INT_INTERRUPT): New macro. (riscv_expand_prologue): Add the processing of XTheadInt. (riscv_expand_epilogue): Likewise. * config/riscv/riscv.h (BITSET_P): Moved to here. * config/riscv/riscv.md: New unspec. * config/riscv/thead.cc (th_int_get_mask): New function. (th_int_get_save_adjustment): Likewise. (th_int_adjust_cfi_prologue): Likewise. * config/riscv/thead.md (th_int_push): New pattern. (th_int_pop): new pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/xtheadint-push-pop.c: New test.
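A sketch of where the new instructions apply (the interrupt attribute is standard on RISC-V; the single-instruction save/restore is the extension's stated purpose, assumed here):

    /* With XTheadInt, the prologue/epilogue of an interrupt handler can
       spill and reload the interrupt-clobbered registers with single
       th.ipush/th.ipop instructions instead of long store/load sequences.  */
    void __attribute__ ((interrupt))
    timer_isr (void)
    {
      extern volatile unsigned ticks;   /* hypothetical device state */
      ticks++;
    }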
-
Tamar Christina authored
Currently GCC does not treat IFN_COPYSIGN the same as the copysign tree expr. The latter has a libcall fallback while the IFN can only use optabs. Because of this, the change I made to optimize copysign only works if the target has implemented the optab, but it should work for those that have the libcall too. More annoyingly, if a target has vector versions of ABS and NEG but not COPYSIGN, then the change made them lose vectorization. The proper fix for this is to treat the IFN the same as the tree EXPR and to enhance expand_COPYSIGN to also support vector calls. I have such a patch for GCC 15 but it's quite big and too invasive for stage-4. As such, this is a minimal fix: just don't apply the transformation, leaving targets which don't have the optab unoptimized. The targets list for check_effective_target_ifn_copysign was obtained by grepping for copysign and looking at the optab. gcc/ChangeLog: PR tree-optimization/112468 * doc/sourcebuild.texi: Document ifn_copysign. * match.pd: Only apply transformation if target supports the IFN. gcc/testsuite/ChangeLog: PR tree-optimization/112468 * gcc.dg/fold-copysign-1.c: Modify tests based on if target supports IFN_COPYSIGN. * gcc.dg/pr55152-2.c: Likewise. * gcc.dg/tree-ssa/abs-4.c: Likewise. * gcc.dg/tree-ssa/backprop-6.c: Likewise. * gcc.dg/tree-ssa/copy-sign-2.c: Likewise. * gcc.dg/tree-ssa/mult-abs-2.c: Likewise. * lib/target-supports.exp (check_effective_target_ifn_copysign): New.
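A sketch of the transform now gated on target support (my reading of the description above; the exact match.pd pattern may differ):

    double
    f (double x)
    {
      /* -fabs (x) canonicalizes to .COPYSIGN (x, -1.0), but only when the
         target can expand IFN_COPYSIGN; otherwise it now stays as ABS + NEG
         so ABS/NEG-capable vector targets keep vectorizing it.  */
      return -__builtin_fabs (x);
    }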
-
Andrew Pinski authored
Like r14-2293-g11350734240dba and r14-2289-gb083203f053f16, reassociation can combine across a few basic blocks, one of the operands can be an uninitialized variable, and going from a conditional use to an unconditional use can cause wrong code. This uses maybe_undef_p like other passes where this can happen. Note if-to-switch uses the function (init_range_entry) provided by reassociation, so we need to call mark_ssa_maybe_undefs there; otherwise we assume almost all ssa names are uninitialized. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/112581 * gimple-if-to-switch.cc (pass_if_to_switch::execute): Call mark_ssa_maybe_undefs. * tree-ssa-reassoc.cc (can_reassociate_op_p): Uninitialized variables can not be reassociated. (init_range_entry): Check for uninitialized variables too. (init_reassoc): Call mark_ssa_maybe_undefs. gcc/testsuite/ChangeLog: PR tree-optimization/112581 * gcc.c-torture/execute/pr112581-1.c: New test. Signed-off-by:
Andrew Pinski <quic_apinski@quicinc.com>
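A hypothetical shape of the hazard (pr112581-1.c is the real, committed test): a variable read only under a condition must not become an unconditional operand after reassociation:

    int
    f (int c, int a, int b, int d)
    {
      int u;            /* deliberately uninitialized */
      int r = a + b;
      if (c)
        r = r + d + u;  /* u is live only when c is true; reassociating it
                           into the unconditional sum would be wrong code */
      return r;
    }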
-
Maciej W. Rozycki authored
Add terminating `/' character missing from one of the test harness command clauses in pr105314.c. This causes no issue with compilation owing to another comment immediately following, but would cause a:

    pr105314.c:3:1: warning: "/*" within comment [-Wcomment]

message if warnings were enabled. gcc/testsuite/ * gcc.target/riscv/pr105314.c: Fix comment termination.
-
Maciej W. Rozycki authored
Complement commit c1e8cb3d ("RISC-V: Rework branch costing model for if-conversion") and also handle extraneous sign extend operations that are sometimes produced by `noce_try_cmove_arith' instead of zero extend operations, making branch costing consistent. It is unclear what the condition is for the middle end to choose between the zero extend and sign extend operation, but the test case included uses sign extension with 64-bit targets, preventing if-conversion from triggering across all the architectural variants. There are further anomalies revealed by the test case, specifically the exceedingly high branch cost of 6 required for the `-mmovcc' variant despite that the final branchless sequence only uses 4 instructions, the missed conversion at -O1 for 32-bit targets even though code is machine word size agnostic, and the missed conversion at -Os and -Oz for 32-bit Zicond targets even though the branchless sequence would be shorter than the branched one. These will have to be handled separately. gcc/ * config/riscv/riscv.cc (riscv_noce_conversion_profitable_p): Also handle sign extension. gcc/testsuite/ * gcc.target/riscv/cset-sext-sfb.c: New test. * gcc.target/riscv/cset-sext-thead.c: New test. * gcc.target/riscv/cset-sext-ventana.c: New test. * gcc.target/riscv/cset-sext-zicond.c: New test. * gcc.target/riscv/cset-sext.c: New test.
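A hypothetical shape for the new cset-sext tests (the committed tests may differ): a conditional 32-bit result on a 64-bit target, where noce_try_cmove_arith can emit a sign_extend that branch costing must now recognize:

    int
    sel (long a, long b)
    {
      /* If-conversion turns the branch into a conditional-move sequence; on
         64-bit targets the 32-bit result may be wrapped in a sign_extend.  */
      return a < b ? 3 : 5;
    }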
-
Jakub Jelinek authored
This test was already fixed by r14-6051, aka the PR112770 fix. 2024-01-10 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/112734 * gcc.dg/bitint-64.c: New test.
-
Tamar Christina authored
The vectorizer needs to know during early break vectorization whether the edge that will be taken if the condition is true stays in or leaves the loop. This is because the code assumes that if you take the true branch you exit the loop. If you don't exit the loop it has to generate a different condition. Basically it uses this information to decide whether it's generating an "any element" or an "all elements" check. Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no issues, including with --enable-lto --with-build-config=bootstrap-O3 --enable-checking=release,yes,rtl,extra. gcc/ChangeLog: PR tree-optimization/113287 * tree-vect-stmts.cc (vectorizable_early_exit): Check the flags on edge instead of using BRANCH_EDGE to determine true edge. gcc/testsuite/ChangeLog: PR tree-optimization/113287 * gcc.dg/vect/vect-early-break_100-pr113287.c: New test. * gcc.dg/vect/vect-early-break_99-pr113287.c: New test.
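A sketch of the situation (simplified from the description above): in gimple the compare may be canonicalized so that either the true or the false edge is the one leaving the loop, so the vectorizer must check the edge flags rather than assume the true branch exits:

    int
    find (int *a, int n)
    {
      /* An early-break loop: whether the exit is the true or the false edge
         of the vectorized compare decides between an "any element" and an
         "all elements" mask test.  */
      for (int i = 0; i < n; i++)
        if (a[i] == 42)
          return i;
      return -1;
    }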
-
Richard Biener authored
When if-conversion was changed to use .COND_ADD/SUB for conditional reduction it was forgotten to update reduction path handling to canonicalize .COND_SUB to .COND_ADD for vectorizable_reduction similar to what we do for MINUS_EXPR. The following adds this and testcases exercising this at runtime and looking for the appropriate masked subtraction in the vectorized code on x86. PR tree-optimization/113078 * tree-vect-loop.cc (check_reduction_path): Canonicalize .COND_SUB to .COND_ADD. * gcc.dg/vect/vect-reduc-cond-sub.c: New testcase. * gcc.target/i386/vect-pr113078.c: Likewise.
-
Julian Brown authored
This patch adjusts diagnostic output for C++23 and above for the test case mentioned in the commit title. 2024-01-10 Julian Brown <julian@codesourcery.com> gcc/testsuite/ * g++.dg/gomp/bad-array-section-10.C: Adjust diagnostics for C++23 and up.
-
Julian Brown authored
This patch fixes several tests introduced by the commit r14-7033-g1413af02d62182 for 32-bit targets. 2024-01-10 Julian Brown <julian@codesourcery.com> gcc/testsuite/ * g++.dg/gomp/array-section-1.C: Fix scan output for 32-bit target. * g++.dg/gomp/array-section-2.C: Likewise. * g++.dg/gomp/bad-array-section-4.C: Adjust error output for 32-bit target.
-
Tamar Christina authored
When we peel at_exit we are moving the new loop to the exit of the previous loop. This means that the blocks outside the loop that the previous loop used to dominate are no longer being dominated by it. The new dominators however are hard to predict, since if the loop has multiple exits and all the exits are "early" ones then we always execute the scalar loop. In this case the scalar loop can completely dominate the new loop. If we later have skip_vector then there's an additional skip edge added that might change the dominators. The previous patch would force an update of all blocks reachable from the new exits. This one updates *only* blocks that we know the scalar exits dominated. For the examples this reduces the blocks to update from 18 to 3. gcc/ChangeLog: PR tree-optimization/113144 PR tree-optimization/113145 * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Update all BB that the original exits dominated. gcc/testsuite/ChangeLog: PR tree-optimization/113144 PR tree-optimization/113145 * gcc.dg/vect/vect-early-break_94-pr113144.c: New test.
-
Jakub Jelinek authored
2024-01-10 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113297 * gcc.dg/bitint-63.c: Fix PR number.
-
chenxiaolong authored
The purpose of this test is to check that the compiler supports vectorization using SLP and vec_{load/store/*}_lanes. However, vec_{load/store/*}_lanes are not supported on LoongArch, which has no equivalent of the "st4/ld4" instructions aarch64 uses for them. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-21.c: Add loongarch.
-
chenxiaolong authored
After the code committed in r14-6948, GCC regression testing on some architectures produced the following error:

    error executing dg-final: unknown effective target keyword `loongarch*-*-*'

gcc/testsuite/ChangeLog: * lib/target-supports.exp: Fix the "target keyword" checking errors on the LoongArch architecture.
-
Jakub Jelinek authored
As changed in other parts of the compiler, using build_nonstandard_integer_type is not appropriate for arbitrary precisions, especially if the precision comes from a BITINT_TYPE or something based on that, build_nonstandard_integer_type relies on some integral mode being supported that can support the precision. The following patch uses build_bitint_type instead for BITINT_TYPE precisions. Note, it would be good if we were able to punt on the optimization (but this code doesn't seem to be able to punt, so it needs to be done somewhere earlier) at least in cases where building it would be invalid. E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive), but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION). I've tried to replace 513 with 65532 in the testcase and it didn't ICE, so maybe it ran into some other SRA limit. 2024-01-10 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113120 * tree-sra.cc (analyze_access_subtree): For BITINT_TYPE with root->size TYPE_PRECISION don't build anything new. Otherwise, if root->type is a BITINT_TYPE, use build_bitint_type rather than build_nonstandard_integer_type. * gcc.dg/bitint-63.c: New test.
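An illustrative shape (assumed; gcc.dg/bitint-63.c is the real test): SRA building a scalar replacement for a wide _BitInt must use build_bitint_type, since no integral mode covers 513 bits:

    struct S { _BitInt(513) x; };

    _BitInt(513)
    f (struct S s)
    {
      /* SRA scalarizes t; the replacement type must come from
         build_bitint_type, not build_nonstandard_integer_type.  */
      struct S t = s;
      return t.x;
    }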
-
Juzhe-Zhong authored
This patch is inspired by LLVM patches: https://github.com/llvm/llvm-project/pull/76550 https://github.com/llvm/llvm-project/pull/77473 Use vaaddu for AVG vectorization. Before this patch:

        vsetivli zero,8,e8,mf2,ta,ma
        vle8.v v3,0(a1)
        vle8.v v2,0(a2)
        vwaddu.vv v1,v3,v2
        vsetvli zero,zero,e16,m1,ta,ma
        vadd.vi v1,v1,1
        vsetvli zero,zero,e8,mf2,ta,ma
        vnsrl.wi v1,v1,1
        vse8.v v1,0(a0)
        ret

After this patch:

        vsetivli zero,8,e8,mf2,ta,ma
        csrwi vxrm,0
        vle8.v v1,0(a1)
        vle8.v v2,0(a2)
        vaaddu.vv v1,v1,v2
        vse8.v v1,0(a0)
        ret

Note on signed averaging addition: based on the RVV spec, there is also a variant for signed averaging addition called vaadd. But AFAIU, no matter which rounding mode is used, we cannot achieve the semantics of signed averaging addition through vaadd. Thus this patch only introduces vaaddu. More details in: https://github.com/riscv/riscv-v-spec/issues/935 https://github.com/riscv/riscv-v-spec/issues/934 Tested on both RV32 and RV64 with no regressions. Ok for trunk? gcc/ChangeLog: * config/riscv/autovec.md (<u>avg<v_double_trunc>3_floor): Remove. (avg<v_double_trunc>3_floor): New pattern. (<u>avg<v_double_trunc>3_ceil): Remove. (avg<v_double_trunc>3_ceil): New pattern. (uavg<mode>3_floor): Ditto. (uavg<mode>3_ceil): Ditto. * config/riscv/riscv-protos.h (enum insn_flags): Add for average addition. (enum insn_type): Ditto. * config/riscv/riscv-v.cc: Ditto. * config/riscv/vector-iterators.md (ashiftrt): Remove. (ASHIFTRT): Ditto. * config/riscv/vector.md: Add VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/avg-1.c: Adapt test. * gcc.target/riscv/rvv/autovec/vls/avg-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/avg-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/avg-4.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/avg-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/avg-6.c: Ditto. * gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: Ditto.
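The kind of loop these patterns vectorize (a sketch; the committed avg-*.c tests are authoritative): a narrowing rounded average, which maps onto vaaddu with the fixed-point rounding mode set by csrwi vxrm,0:

    void
    avg_ceil (unsigned char *r, unsigned char *a, unsigned char *b)
    {
      for (int i = 0; i < 8; i++)
        /* The sum is computed in int, so it cannot overflow; the
           (a + b + 1) >> 1 idiom is recognized as a ceiling average.  */
        r[i] = (a[i] + b[i] + 1) >> 1;
    }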
-
Kewen Lin authored
As PR112751 shows, commit r14-5628 caused pcrel-sibcall-1.c to fail, as it enables ipa-vrp, which makes the return values of functions {x,y,xx} known and propagates them. This patch adjusts the test with noipa to make it less fragile. PR testsuite/112751 gcc/testsuite/ChangeLog: * gcc.target/powerpc/pcrel-sibcall-1.c: Replace noinline with noipa.
-
Juzhe-Zhong authored
While working on refining the cost model, I noticed this test will generate unexpected scalar xor instructions if we don't tune the cost model carefully. Add more assembler checks to avoid future regressions. Committed. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add assembler-not check.
-
GCC Administrator authored
-
- Jan 09, 2024
-
-
Jason Merrill authored
In a couple of places in the xobj patch I noticed that is_this_parameter probably wanted to change to is_object_parameter; this implements that and does the additional adjustments needed to make the accessor fixits handle xobj parms. gcc/cp/ChangeLog: * semantics.cc (is_object_parameter): New. * cp-tree.h (is_object_parameter): Declare. * call.cc (maybe_warn_class_memaccess): Use it. * search.cc (field_access_p): Use it. (class_of_object_parm): New. (field_accessor_p): Adjust for explicit object parms. gcc/testsuite/ChangeLog: * g++.dg/torture/accessor-fixits-9-xobj.C: New test.
-
waffl3x authored
This adds support for defaulted comparison operators and copy/move assignment operators, as well as allowing user-defined xobj copy/move assignment operators. It turns out defaulted comparison operators already worked though, so this just adds a test for them. Defaulted copy/move assignment operators were not so nice and required a bit of a hack. Should work fine though! The diagnostics leave something to be desired, and there are some things that could be improved with more extensive design changes. There are a few notes left indicating where I think we could make improvements. Aside from some small bugs, with this commit xobj member functions should be feature complete. PR c++/102609 gcc/cp/ChangeLog: PR c++/102609 C++23 P0847R7 (deducing this) - CWG2586. * decl.cc (copy_fn_p): Accept xobj copy assignment functions. (move_signature_fn_p): Accept xobj move assignment functions. * method.cc (do_build_copy_assign): Handle defaulted xobj member functions. (defaulted_late_check): Comment. (defaultable_fn_check): Comment. gcc/testsuite/ChangeLog: PR c++/102609 C++23 P0847R7 (deducing this) - CWG2586. * g++.dg/cpp23/explicit-obj-basic6.C: New test. * g++.dg/cpp23/explicit-obj-default1.C: New test. * g++.dg/cpp23/explicit-obj-default2.C: New test. Signed-off-by:
Waffl3x <waffl3x@protonmail.com>
-