Commits · e3b8480f94ee15e7fffc662d40270ee57c6ad82b · COBOLworx / gcc-cobol

Jul 07, 2024

doc: Remove dubious example around bug reporting · a28046e2
Gerald Pfeifer authored 8 months ago
```
gcc:
	* doc/bugreport.texi (Bug Criteria): Remove dubious example.
```

c++: Simplify uses of LAMBDA_EXPR_EXTRA_SCOPE · 24cb586c

Nathaniel Shead authored 9 months ago


I noticed there already exists a getter to get the scope of a lambda
from its type directly rather than needing to go via
CLASSTYPE_LAMBDA_EXPR, so we may as well use it.

gcc/cp/ChangeLog:

	* module.cc (trees_out::get_merge_kind): Use
	LAMBDA_TYPE_EXTRA_SCOPE instead of LAMBDA_EXPR_EXTRA_SCOPE.
	(trees_out::key_mergeable): Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

ada: Make the names of uninstalled cross-gnattools consistent across builds · d364c4ce

Maciej W. Rozycki authored 8 months ago

We suffer from an inconsistency in the names of uninstalled gnattools
executables in cross-compiler configurations.  The cause is a recipe we
have:

ada.all.cross:
	for tool in $(ADA_TOOLS) ; do \
	  if [ -f $$tool$(exeext) ] ; \
	  then \
	    $(MV) $$tool$(exeext) $$tool-cross$(exeext); \
	  fi; \
	done

the intent of which is to give the names of gnattools executables the
'-cross' suffix, consistently with the compiler drivers: 'gcc-cross',
'g++-cross', etc.

A problem with the recipe is that this 'make' target is called too early
in the build process, before gnattools have been made.  Consequently no
renames happen, and because they are conditional on the presence of the
individual executables, the recipe succeeds without doing anything.

However, if a target such as 'make pdf' that does not cause gnattools
executables to be rebuilt is requested later on, then 'ada.all.cross'
does succeed in renaming the executables already present in the build
tree.  If the 'gnat' testsuite, which expects a non-suffixed 'gnatmake'
executable, is then run, it finds only 'gnatmake-cross' in the build
tree and may either catastrophically fail or incorrectly use a
system-installed copy of 'gnatmake'.

Of course, if a target such as 'make all' that does cause gnattools
executables to be rebuilt is requested, then both suffixed and
non-suffixed uninstalled executables result.

Fix the problem by moving the renaming of gnattools to a separate 'make'
recipe, pasted into a new 'gnattools-cross-mv' target and the existing
legacy 'cross-gnattools' target.  Then invoke the new target explicitly
from the 'gnattools-cross' recipe in gnattools/.

Update the test harness accordingly, so that suffixed gnattools are used
in cross-compilation testsuite runs.

	gcc/ada/
	* gcc-interface/Make-lang.in (ada.all.cross): Move recipe to...
	(GNATTOOLS_CROSS_MV): ... this new variable.
	(cross-gnattools): Paste it here.
	(gnattools-cross-mv): New target.

	gnattools/
	* Makefile.in (gnattools-cross): Also build 'gnattools-cross-mv'
	in GCC_DIR.

	gcc/testsuite/
	* lib/gnat.exp (local_find_gnatmake, find_gnatclean): Use
	'-cross' suffix where testing a cross-compiler.

Daily bump. · e78c5d0a
GCC Administrator authored 8 months ago

Jul 06, 2024

[to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values · 273f16a1

Jeff Law authored 8 months ago

Last patch in this round of bitmanip work...  At least I think I'm going to
pause here and switch gears to other projects that need attention 🙂

This patch introduces the ability to generate bitmanip instructions for rv64
when operating on SI objects when we know something about the range of the bit
position (due to masking of the position).

Note that the (7-pos % 8) bit position form was discovered by RAU in
500.perl.  I took that and expanded it to the simple (pos & mask) form as well
as covering bset, binv and bclr.
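The kinds of source expressions described above can be sketched in C as follows. These functions are hypothetical examples in the spirit of the new bset/bclr/binv-for-simode tests, not their actual contents; the idea is that, because the bit position is masked into range, rv64 with Zbs can use bset, bclr or binv on these 32-bit values without an unnecessary sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical examples: 32-bit (SImode) single-bit operations whose
   bit position is known to be in range because it is masked first.  */
uint32_t bset_masked (uint32_t x, unsigned pos)
{
  return x | (1u << (pos & 31));    /* bset candidate */
}

uint32_t bclr_masked (uint32_t x, unsigned pos)
{
  return x & ~(1u << (pos & 31));   /* bclr candidate */
}

uint32_t binv_masked (uint32_t x, unsigned pos)
{
  return x ^ (1u << (pos & 31));    /* binv candidate */
}

/* The (7 - pos % 8) form: the bit position is always in 0..7.  */
uint32_t bset_rev_byte (uint32_t x, unsigned pos)
{
  return x | (1u << (7 - pos % 8));
}
```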

As far as the implementation is concerned...

First, this turns the recently added define_splits into define_insn_and_split
constructs.  This allows combine to "see" enough RTL to realize a sign
extension is unnecessary.  Otherwise we get undesirable sign extensions for the
new testcases.

Second, it adds new patterns for the logical operations: two patterns for
IOR/XOR and two patterns for AND.

I think a key concept to keep in mind is that once we determine a Zbs operation
is safe to perform on an SI value, we can rewrite the RTL in 64-bit form.  If we
were ever to try and use range information at expand time for this stuff (and
we probably should investigate that), that's the path I'd suggest.

This is notably cleaner than my original implementation which actually kept the
more complex RTL form through final and emitted 2/3 instructions (mask the bit
position, then the bset/bclr/binv).

Tested in my tester, but waiting for pre-commit CI to report back before taking
further action.

gcc/
	* config/riscv/bitmanip.md (bset splitters): Turn into define_insn_and_splits.
	Don't depend on combine splitting the "andn with constant" form.
	(bset, binv, bclr with masked bit position): New patterns.

gcc/testsuite
	* gcc.target/riscv/binv-for-simode-1.c: New test.
	* gcc.target/riscv/bset-for-simode-1.c: New test.
	* gcc.target/riscv/bclr-for-simode-1.c: New test.

testsuite/52641 - Fix more sloppy tests. · bb16e317

Georg-Johann Lay authored 8 months ago

	PR testsuite/52641
gcc/testsuite/
	* gcc.dg/analyzer/torture/boxed-ptr-1.c: Requires size24plus.
	* gcc.dg/analyzer/torture/pr102692.c: Use intptr_t instead of long.
	* gcc.dg/ipa/pr102714.c: Use uintptr_t instead of unsigned long.
	* gcc.dg/torture/pr115387-1.c: Same.
	* gcc.dg/torture/pr113895-1.c: Same.
	* gcc.dg/ipa/pr108007.c: Require int32plus.
	* gcc.dg/ipa/pr109318.c: Same.
	* gcc.dg/ipa/pr96040.c: Use size_t instead of unsigned long.
	* gcc.dg/torture/pr113126.c: Use vectors of same dimension.
	* gcc.dg/tree-ssa/builtin-sprintf-9.c: Requires double64.

	* gcc.dg/spellcheck-inttypes.c [avr]: Avoid include of inttypes.h.
	* gcc.dg/analyzer/torture/pr104159.c [avr]: Skip.
	* gcc.dg/torture/pr84682-2.c [avr]: Skip.
	* gcc.dg/wtr-conversion-1.c [avr]: Remove avr selector since
	long double is a 64-bit type by now.
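Several of the changes above replace long or unsigned long with intptr_t, uintptr_t or size_t. On AVR, for instance, long is 32 bits wide while pointers and size_t are 16 bits, so code that assumes long matches the pointer width is not portable there. A minimal sketch of the portable idiom (the function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round-trip a pointer through uintptr_t, which (where provided) can
   hold any object pointer, instead of assuming long is pointer-sized.  */
uintptr_t ptr_to_int (void *p)
{
  return (uintptr_t) p;
}

void *int_to_ptr (uintptr_t v)
{
  return (void *) v;
}

/* Use size_t for object sizes rather than unsigned long.  */
size_t byte_size (size_t count, size_t elt_size)
{
  return count * elt_size;
}
```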

[committed] Fix various sh define_insn_and_split predicates · cb9badea

Jeff Law authored 8 months ago

The sh4-linux-gnu port has failed to bootstrap since the introduction of late
combine due to failures to split certain insns.

This is caused by incorrect predicates in various define_insn_and_split
patterns.  Essentially the insn's predicate is something like "TARGET_SH1".
The split predicate is "&& can_create_pseudos_p ()".  So these patterns will
match post-reload, but be un-splittable.  So at assembly output time, we get
the failure as the output template is "#".

This patch fixes the most obvious & egregious cases by bringing the split
condition into the insn's predicate and leaving "&& 1" as the split condition.
That's enough to get sh4-linux-gnu bootstrapping again and I'm hoping it does
the same for sh4eb-linux-gnu.

Pushing to the trunk.

gcc/
	* config/sh/sh.md (adddi3): Only allow matching when we can
	still create new pseudos.
	(subdi3, *rotcl, *rotcr, *rotcr_neg_t, negdi2): Likewise.
	(abs<mode>2, negabs<mode>2, negdi_cond): Likewise.
	(*swapbisi2_and_shl8, *swapbhisi2, *movsi_index_disp_load): Likewise.
	(*movhi_index_disp_load, *mov<mode>index_disp_store): Likewise.
	(*mov_t_msb_neg, *negt_msb, clipu_one): Likewise.

AVR: Create more opportunities for -mfuse-add optimization. · 96559be7

Georg-Johann Lay authored 8 months ago

avr_split_tiny_move() was only run for AVR_TINY because it has no PLUS
addressing modes.  The same applies to the X register on ordinary cores, and
also to the Z register when used with [E]LPM.  For example, without this patch

long long addLL (long long *a, long long *b)
{
  return *a + *b;
}

compiles with "-mmcu=atmega128 -Os -dp" to:

    ...
    movw r26,r24     ;  80  [c=4 l=1]  *movhi/0
    movw r30,r22     ;  81  [c=4 l=1]  *movhi/0
    ld r18,X         ;  82  [c=4 l=1]  movqi_insn/3
    adiw r26,1   ;  83  [c=4 l=3]  movqi_insn/3
    ld r19,X
    sbiw r26,1
    adiw r26,2   ;  84  [c=4 l=3]  movqi_insn/3
    ld r20,X
    sbiw r26,2
    adiw r26,3   ;  85  [c=4 l=3]  movqi_insn/3
    ld r21,X
    sbiw r26,3
    adiw r26,4   ;  86  [c=4 l=3]  movqi_insn/3
    ld r22,X
    sbiw r26,4
    adiw r26,5   ;  87  [c=4 l=3]  movqi_insn/3
    ld r23,X
    sbiw r26,5
    adiw r26,6   ;  88  [c=4 l=3]  movqi_insn/3
    ld r24,X
    sbiw r26,6
    adiw r26,7   ;  89  [c=4 l=2]  movqi_insn/3
    ld r25,X
    ld r10,Z         ;  90  [c=4 l=1]  movqi_insn/3
    ...

whereas with this patch it becomes:

    ...
    movw r26,r24     ;  80  [c=4 l=1]  *movhi/0
    movw r30,r22     ;  81  [c=4 l=1]  *movhi/0
    ld r18,X+        ;  140 [c=4 l=1]  movqi_insn/3
    ld r19,X+        ;  142 [c=4 l=1]  movqi_insn/3
    ld r20,X+        ;  144 [c=4 l=1]  movqi_insn/3
    ld r21,X+        ;  146 [c=4 l=1]  movqi_insn/3
    ld r22,X+        ;  148 [c=4 l=1]  movqi_insn/3
    ld r23,X+        ;  150 [c=4 l=1]  movqi_insn/3
    ld r24,X+        ;  152 [c=4 l=1]  movqi_insn/3
    ld r25,X         ;  109 [c=4 l=1]  movqi_insn/3
    ld r10,Z         ;  111 [c=4 l=1]  movqi_insn/3
    ...

gcc/
	* config/avr/avr.md: Also split with avr_split_tiny_move()
	for non-AVR_TINY.
	* config/avr/avr.cc (avr_split_tiny_move): Don't change memory
	references with base regs that can do PLUS addressing.
	(avr_out_lpm_no_lpmx) [POST_INC]: Don't output final ADIW when the
	address register is unused after.
gcc/testsuite/
	* gcc.target/avr/torture/fuse-add.c: New test.

RISC-V: fix internal error on global variable-length array · 8bc5561c

Eric Botcazou authored 8 months ago

This is an ICE in the RISC-V back-end calling tree_to_uhwi on the DECL_SIZE
of a global variable-length array.

gcc/
	PR target/115591
	* config/riscv/riscv.cc (riscv_valid_lo_sum_p): Add missing test on
	tree_fits_uhwi_p before calling tree_to_uhwi.

gcc/testsuite/
	* gnat.dg/array41.ads, gnat.dg/array41.adb: New test.

PR target/115751: Avoid force_reg in ix86_expand_ternlog. · 9a7e3f57

Roger Sayle authored 8 months ago

This patch fixes a problem with splitting of complex AVX512 ternlog
instructions on x86_64.  A recent change allows the ternlog pattern
to have multiple mem-like operands prior to reload, by emitting any
"reloads" as necessary during split1, before register allocation.
The issue is that this code calls force_reg to place the mem-like
operand into a register, but unfortunately the vec_duplicate (broadcast)
form of operands supported by ternlog isn't considered a "general_operand",
i.e. supported by all instructions.  This mismatch triggers an ICE in
the middle-end's force_reg, even though the x86 supports loading these
vec_duplicate operands into a vector register in a single (move)
instruction.

This patch resolves this problem by replacing force_reg with calls
to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end,
knows these will be recognized by recog).
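For readers unfamiliar with ternlog: the vpternlog instructions evaluate an arbitrary three-input bitwise function selected by an 8-bit immediate. The following is a minimal scalar model of that per-bit rule, written for illustration only (it is not GCC's ix86_expand_ternlog); for each bit position the bits of A, B and C form a 3-bit index, with A as the most significant bit, and the result bit is that bit of the immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a ternlog operation on one 64-bit lane.  */
uint64_t ternlog64 (uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    {
      /* Index = (A_i << 2) | (B_i << 1) | C_i.  */
      unsigned idx = (unsigned) ((((a >> i) & 1) << 2)
                                 | (((b >> i) & 1) << 1)
                                 | ((c >> i) & 1));
      r |= (uint64_t) ((imm >> idx) & 1) << i;
    }
  return r;
}
```

With A = 0xF0F0..., B = 0xCCCC... and C = 0xAAAA..., each result byte equals the immediate itself, which is a convenient way to derive an immediate for a given boolean expression (for example a ^ b ^ c gives 0x96).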

2024-07-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/115751
	* config/i386/i386-expand.cc (ix86_expand_ternlog): Avoid use of
	force_reg to "reload" non-register operands, as these may contain
	vec_duplicate (broadcast) operands that aren't supported by
	force_reg.  Use (safer) gen_reg_rtx and emit_move instead.

Daily bump. · 92e4d73d
GCC Administrator authored 8 months ago

Jul 05, 2024

x86, Darwin: Fix bootstrap for 32b multilibs/hosts. · 807e36d7

Iain Sandoe authored 8 months ago


r15-1735-ge62ea4fb8ffcab06ddd contained changes that altered the
codegen for 32b Darwin (whether hosted on 64b or as 32b host) such
that the per function picbase load is called multiple times in some
cases.  Darwin's back end is not expecting this (and indeed some of
the handling depends on a single instance).

This fixes the issue by marking those instructions as not copyable
(as suggested by Andrew Pinski).

The change is Darwin-specific.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_cannot_copy_insn_p): New.
	(TARGET_CANNOT_COPY_INSN_P): New.

Signed-off-by: Iain Sandoe <iains@gcc.gnu.org>

Fortran: switch test to use issignaling() built-in · eec30733

Francois-Xavier Coudert authored 8 months ago

The issignaling macro may not be present in all libcs, but the built-in
is always available.
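A hedged C illustration of the difference: __builtin_issignaling is provided by the compiler (GCC 13 and later), whereas the issignaling macro comes from the C library's <math.h>. The sketch below prefers the built-in when the compiler reports it, and otherwise falls back to a bit-pattern check that assumes IEEE 754 binary64 doubles; the function name and the fallback are illustrative assumptions, not what the Fortran test does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined __has_builtin
# if __has_builtin (__builtin_issignaling)
#  define HAVE_BUILTIN_ISSIGNALING 1
# endif
#endif

/* Return nonzero if X is a signaling NaN.  */
int is_signaling_nan (double x)
{
#ifdef HAVE_BUILTIN_ISSIGNALING
  /* Provided by the compiler; no libc support needed.  */
  return __builtin_issignaling (x) != 0;
#else
  /* Fallback assuming IEEE 754 binary64: a NaN has an all-ones exponent
     and a nonzero mantissa; it is signaling when the top mantissa bit
     (the "quiet" bit) is clear.  */
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t mantissa = bits & 0xfffffffffffffULL;
  return exponent == 0x7ff && mantissa != 0
         && (mantissa & (1ULL << 51)) == 0;
#endif
}
```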

gcc/testsuite/ChangeLog:

	* gfortran.dg/ieee/signaling_2.f90: Adjust test.
	* gfortran.dg/ieee/signaling_2_c.c: Adjust test.

Arm: Fix ldrd offset range [PR115153] · 44e5ecfd

Wilco Dijkstra authored 8 months ago

The valid offset range of LDRD in arm_legitimate_index_p is incorrectly
increased to -1024..1020 if NEON is enabled, since VALID_NEON_DREG_MODE
includes DImode.  Fix this by moving the LDRD check earlier.

gcc:
	PR target/115153
	* config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before
	NEON.
	(thumb2_legitimate_index_p): Update comments.
	(output_move_neon): Use DFmode for vldr/vstr and non-checking
	adjust_address.

gcc/testsuite:
	PR target/115153
	* gcc.target/arm/pr115153.c: Add new test.
	* lib/target-supports.exp: Add arm_arch_v7ve_neon target support.

libgccjit: Allow comparing array types · 533f807e

Antoni Boucher authored 1 year ago

gcc/jit/ChangeLog:

	* jit-common.h: Add array_type class.
	* jit-recording.h (type::dyn_cast_array_type,
	memento_of_get_aligned::dyn_cast_array_type,
	array_type::dyn_cast_array_type, array_type::is_same_type_as):
	New methods.

gcc/testsuite/ChangeLog:

	* jit.dg/test-types.c: Add array type comparison to the test.

libgccjit: Add support for the type bfloat16 · 1c314247

Antoni Boucher authored 1 year ago

gcc/jit/ChangeLog:

	PR jit/112574
	* docs/topics/types.rst: Document GCC_JIT_TYPE_BFLOAT16.
	* jit-common.h: Update NUM_GCC_JIT_TYPES.
	* jit-playback.cc (get_tree_node_for_type): Support bfloat16.
	* jit-recording.cc (recording::memento_of_get_type::get_size,
	recording::memento_of_get_type::dereference,
	recording::memento_of_get_type::is_int,
	recording::memento_of_get_type::is_signed,
	recording::memento_of_get_type::is_float,
	recording::memento_of_get_type::is_bool): Support bfloat16.
	* libgccjit.h (enum gcc_jit_types): Add GCC_JIT_TYPE_BFLOAT16.

gcc/testsuite/ChangeLog:

	PR jit/112574
	* jit.dg/all-non-failing-tests.h: New test test-bfloat16.c.
	* jit.dg/test-types.c: Test GCC_JIT_TYPE_BFLOAT16.
	* jit.dg/test-bfloat16.c: New test.

RISC-V: Use tu policy for first-element vec_set [PR115725]. · acc3b703

Robin Dapp authored 8 months ago

This patch changes the tail policy for vmv.s.x from ta to tu.
By default the bug does not show up with qemu because qemu's
current vmv.s.x implementation always uses the tail-undisturbed
policy.  With a local qemu version that overwrites the tail
with ones when the tail-agnostic policy is specified, the bug
shows up.
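Why the tail policy matters for a vec_set of element 0 can be seen with a small scalar model (hypothetical, not RVV intrinsics). vmv.s.x writes element 0 and treats the remaining elements as tail elements; a vec_set must leave those intact, which only tail-undisturbed guarantees. The all-ones behaviour for tail-agnostic mirrors the modified qemu described above and is one permitted outcome, not the only one.

```c
#include <assert.h>
#include <stdint.h>

enum tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC_ONES };

/* Scalar model of vmv.s.x on an N-element register (assuming vl > 0):
   element 0 receives X; elements 1..N-1 are tail elements.  */
void vmv_s_x_model (int32_t *dest, int n, int32_t x, enum tail_policy pol)
{
  dest[0] = x;
  if (pol == TAIL_AGNOSTIC_ONES)
    for (int i = 1; i < n; i++)
      dest[i] = -1;   /* tail overwritten with all ones */
}
```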

gcc/ChangeLog:

	* config/riscv/autovec.md: Add TU policy.
	* config/riscv/riscv-protos.h (enum insn_type): Define
	SCALAR_MOVE_MERGED_OP_TU.

gcc/testsuite/ChangeLog:

	PR target/115725

	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
	test expectation.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.

AVR: target/87376 - Use nop_general_operand for DImode inputs. · 23a09352

Georg-Johann Lay authored 8 months ago

The avr-dimode.md expanders have code like emit_move_insn(acc_a, operands[1])
where acc_a is a hard register and operands[1] might be a non-generic
address-space memory reference.  Such loads may clobber hard regs since
some of them are implemented as libgcc calls /and/ 64-bit moves are
expanded as eight byte-moves, so that acc_a or acc_b might be clobbered
by such a load.

This patch simply denies non-generic address-space references by using
nop_general_operand for all avr-dimode.md input predicates.
With the patch, all memory loads that require library calls are issued
before the expander code from avr-dimode.md is run.

	PR target/87376
gcc/
	* config/avr/avr-dimode.md: Use "nop_general_operand" instead
	of "general_operand" as predicate for all input operands.

gcc/testsuite/
	* gcc.target/avr/torture/pr87376.c: New test.

AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL. · 97fcfeac

Tamar Christina authored 8 months ago

When a two reg TBL is performed with one operand being a zero vector we can
instead use a single reg TBL and map the indices for accessing the zero vector
to an out of range constant.

On AArch64, out-of-range indices into a TBL have defined semantics: the
element is set to zero.  Many uArches have a slower 2-reg TBL than 1-reg TBL.

Before this change we had:

typedef unsigned int v4si __attribute__ ((vector_size (16)));

v4si f1 (v4si a)
{
  v4si zeros = {0,0,0,0};
  return __builtin_shufflevector (a, zeros, 0, 5, 1, 6);
}

which generates:

f1:
        mov     v30.16b, v0.16b
        movi    v31.4s, 0
        adrp    x0, .LC0
        ldr     q0, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v30.16b - v31.16b}, v0.16b
        ret

.LC0:
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   20
        .byte   21
        .byte   22
        .byte   23
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   24
        .byte   25
        .byte   26
        .byte   27

and with the patch:

f1:
        adrp    x0, .LC0
        ldr     q31, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v0.16b}, v31.16b
        ret

.LC0:
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   -1

This sequence is often generated by OpenMP, and aside from the
strict performance impact of this change, it also gives better
register allocation as we no longer have the consecutive
register limitation.
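The remapping can be checked with a scalar model of TBL (a sketch, not the aarch64.cc code). Any index that selects from the all-zero second table can be replaced by an out-of-range index, here 0xFF, which appears as -1 in the listing above, because out-of-range indices already produce zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One-register TBL: indices >= 16 produce 0.  */
void tbl1 (uint8_t out[16], const uint8_t t[16], const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = idx[i] < 16 ? t[idx[i]] : 0;
}

/* Two-register TBL: T0 and T1 form a 32-byte table; indices >= 32
   produce 0.  */
void tbl2 (uint8_t out[16], const uint8_t t0[16], const uint8_t t1[16],
           const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    {
      uint8_t k = idx[i];
      out[i] = k < 16 ? t0[k] : k < 32 ? t1[k - 16] : 0;
    }
}

/* When T1 is known to be all zeros, redirect every index that reaches
   into T1 (or beyond) to an out-of-range value.  */
void remap_for_zero_t1 (uint8_t out_idx[16], const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    out_idx[i] = idx[i] < 16 ? idx[i] : 0xFF;
}
```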

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p
	and zero_op1_p.
	(aarch64_evpc_tbl): Implement register value remapping.
	(aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup
	before it's forced to a reg.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/tbl_with_zero_1.c: New test.
	* gcc.target/aarch64/tbl_with_zero_2.c: New test.

AArch64: remove aarch64_simd_vec_unpack<su>_lo_ · 6ff69810

Tamar Christina authored 8 months ago

The fix for PR18127 reworked the uxtl to zip optimization.
In doing so it undid the changes in aarch64_simd_vec_unpack<su>_lo_, and this now
no longer matches aarch64_simd_vec_unpack<su>_hi_.  It still works because the
RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps with the general
zero-extend RTL, and because that pattern is listed before the lo pattern, recog
picks it instead.

This removes aarch64_simd_vec_unpack<su>_lo_.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md
	(aarch64_simd_vec_unpack<su>_lo_<mode>): Remove.
	(vec_unpack<su>_lo_<mode>): Simplify.
	* config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
	comment.

middle-end: Add debug functions to dump dominator tree in dot format · ae07f62a

Alex Coplan authored 8 months ago

This adds debug functions to dump the dominator tree in dot format.
There are two overloads: one which takes a FILE * and another which
takes a const char *fname and wraps the first with fopen/fclose for
convenience.
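The idea can be sketched in plain C. GCC's versions operate on its internal CFG and are C++ overloads; this sketch instead assumes a hypothetical immediate-dominator array (idom[b] is the immediate dominator of block b, with -1 for the entry block) and uses two separately named functions in place of overloads.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the dominator tree as a dot digraph to F; return the number of
   edges written.  */
int dot_idom_tree (FILE *f, const int *idom, int n)
{
  int edges = 0;
  fprintf (f, "digraph dom {\n");
  for (int b = 0; b < n; b++)
    if (idom[b] >= 0)
      {
        fprintf (f, "  bb%d -> bb%d;\n", idom[b], b);
        edges++;
      }
  fprintf (f, "}\n");
  return edges;
}

/* Convenience wrapper that opens and closes FNAME around the FILE *
   version; returns -1 if the file cannot be opened.  */
int dot_idom_tree_to_file (const char *fname, const int *idom, int n)
{
  FILE *f = fopen (fname, "w");
  if (!f)
    return -1;
  int edges = dot_idom_tree (f, idom, n);
  fclose (f);
  return edges;
}
```

The output can then be rendered with Graphviz, for example `dot -Tpng dom.dot -o dom.png`.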

gcc/ChangeLog:

	* dominance.cc (dot_dominance_tree): New.

i386: Refactor ssedoublemode · 319d3956

Hu, Lin1 authored 8 months ago

The "double" in ssedoublemode should mean a double-width element type,
e.g. SI -> DI.  So we need to refactor some patterns that use
<ssedoublemode> to use <ssedoublevecmode> instead.

gcc/ChangeLog:

	* config/i386/sse.md (ssedoublemode): Remove mappings to twice
	the number of same-sized elements. Add mappings to the same
	number of double-sized elements.
	(define_split for vec_concat_minus_plus): Change mode_attr from
	ssedoublemode to ssedoublevecmode.
	(define_split for vec_concat_plus_minus): Ditto.
	(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>):
	Ditto.
	(avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
	(avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
	(avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.

MIPS: Support more cases with alien mode of SHF.DF · 320c2ed4

YunQiang Su authored 8 months ago

Currently, we support only the cases that strictly fit the instructions.
For example, for V16QImode, we only support shuffles like
(0<=N0, N1, N2, N3<=3 here)
	N0,	N1,	N2,	N3
	N0+4,	N1+4,	N2+4,	N3+4
	N0+8,	N1+8,	N2+8,	N3+8
	N0+12,	N1+12,	N2+12,	N3+12

In fact, we can support more cases by trying other SHF.DF
instructions that do not strictly fit the mode.

1) We can use SHF.H to support more cases for V16QImode:
(M0/M1/M2/M3 are 0 or 2 or 4 or 6)
	M0,	M0+1,	M1,	M1+1
	M2,	M2+1,	M3,	M3+1
	M0+8,	M0+9,	M1+8,	M1+9
	M2+8,	M2+9,	M3+8,	M3+9

2) We can use SHF.W to support some cases for V16QImode:
(M0/M1/M2/M3 are 0 or 4 or 8 or 12)
	M0,	M0+1,	M0+2,	M0+3
	M1,	M1+1,	M1+2,	M1+3
	M2,	M2+1,	M2+2,	M2+3
	M3,	M3+1,	M3+2,	M3+3

3) We can use SHF.W to support some cases for V8HImode:
(M0/M1/M2/M3 are 0 or 2 or 4 or 6)
	M0,	M0+1
	M1,	M1+1
	M2,	M2+1
	M3,	M3+1

4) We can also use SHF.W to swap the 2 parts of V2DF or V2DI.
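The cases above all rest on one observation: an SHF on wider elements is a particular byte shuffle. A small C model (our own sketch, not mips_msa_shf_i8) computes the byte permutation that an MSA SHF.df with a given 8-bit immediate performs: within each group of four elements, destination lane k takes source lane (imm >> 2k) & 3. For example, SHF.H with immediate 0xB1 yields case 1 with M0=2, M1=0, M2=6, M3=4, and SHF.W with 0x4E swaps the two 64-bit halves as in case 4.

```c
#include <assert.h>
#include <stdint.h>

/* Byte permutation of a 128-bit vector performed by SHF.df with
   immediate IMM, for ELT_BYTES = 1 (SHF.B), 2 (SHF.H) or 4 (SHF.W).  */
void shf_byte_perm (uint8_t perm[16], int elt_bytes, unsigned imm)
{
  assert (elt_bytes == 1 || elt_bytes == 2 || elt_bytes == 4);
  int nelt = 16 / elt_bytes;
  for (int e = 0; e < nelt; e++)
    {
      int group = e / 4, lane = e % 4;
      int src = group * 4 + (int) ((imm >> (2 * lane)) & 3);
      for (int b = 0; b < elt_bytes; b++)
        perm[e * elt_bytes + b] = (uint8_t) (src * elt_bytes + b);
    }
}
```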

gcc
	* config/mips/mips-protos.h: New function mips_msa_shf_i8.
	* config/mips/mips-msa.md(MSA_WHB_W): Not used anymore;
	(msa_shf_<msafmt_f>): Use mips_msa_shf_i8.
	* config/mips/mips.cc(mips_const_vector_shuffle_set_p):
	Support more cases try to use alien mode instruction;
	(mips_msa_shf_i8): New function to get the correct MSA SHF
	instruction and IMM.

Testsuite/MIPS: Fix msa.c: test7_v2f64, test7_v4f32, test43_v2i64 · 33dfd679

YunQiang Su authored 8 months ago

BNEGI.W/D are used for test7_v2f64 and test7_v4f32 now.  This is
an improvement since it saves an instruction.

ILVR.D is used for test43_v2i64 now, instead of INSVE.D.

gcc/testsuite
	* gcc.target/mips/msa.c: Fix test7_v2f64, test7_v4f32 and
	test43_v2i64.

MIPS/testsuite: Add -mfpxx to call-clobbered-1.c · e08ed5f1

YunQiang Su authored 8 months ago

The scan-assembler-times rules only fit -mfp32 and -mfpxx.
It fails if we are configured as FP64 by default, as it has
one less sdc1/ldc1 pair.

gcc/testsuite
	* gcc.target/mips/call-clobbered-1.c: Add -mfpxx.

MIPS/testsuite: Fix umips-save-restore-1.c · f1437b96

YunQiang Su authored 8 months ago

With some recent optimizations, -O1/-O2/-O3 can achieve almost the same
performance/size using stack loads/stores.  Thus lwm/swm will save/restore
fewer callee-saved registers.  In fact only $16 is saved with swm.

To make sure that this optimization is still exercised, let's add 2 more
function calls, so that lwm/swm become much more profitable.

If we add only one more, -O1 will still use stack loads/stores.

gcc/testsuite
	* gcc.target/mips/umips-save-restore-1.c: Be sure lwm/swm
	are used for more callee-saved registers with 2 additional
	function calls.

Support group size of three in SLP store permute lowering · 7eb8b657

Richard Biener authored 8 months ago

The following implements the group-size three scheme from
vect_permute_store_chain in SLP grouped store permute lowering
and extends it to power-of-two multiples of group size three.

The scheme goes from vectors A, B and C to
{ A[0], B[0], C[0], A[1], B[1], C[1], ... } by first producing
{ A[0], B[0], X, A[1], B[1], X, ... } (with X arbitrary, but chosen
to be A[n]) and then permuting in C[n] in the appropriate places.

The extension replaces vector elements with a power-of-two number of
lanes, and you get pairwise interleaving until the final three-input
permutes happen.

The last permute step could be seen as extending C to { C[0], C[0],
C[0], ... } and then performing a blend.
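The two-step scheme can be illustrated with a flat scalar model. This is a simplification for illustration, not tree-vect-slp.cc code: it ignores vector boundaries and works on whole arrays, but it shows the shape of the first two-input permute (with A[i] standing in for the filler X) followed by the blend of C.

```c
#include <assert.h>

/* Interleave A, B and C (N lanes each) into OUT (3N lanes).  */
void interleave3 (int *out, const int *a, const int *b, const int *c, int n)
{
  /* Step 1: a two-input permute of A and B; the third slot of each
     triple gets a filler, here A[i].  */
  for (int i = 0; i < n; i++)
    {
      out[3 * i] = a[i];
      out[3 * i + 1] = b[i];
      out[3 * i + 2] = a[i];
    }
  /* Step 2: blend C[i] into every third slot.  */
  for (int i = 0; i < n; i++)
    out[3 * i + 2] = c[i];
}
```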

VLA archs will want to use store-lanes here I guess, I'm not sure
if the three vector interleave operation is also available with
a register source and destination and thus available for a shuffle.

	* tree-vect-slp.cc (vect_build_slp_instance): Special case
	three input permute with the same number of lanes in store
	permute lowering.

	* gcc.dg/vect/slp-53.c: New testcase.
	* gcc.dg/vect/slp-54.c: New testcase.

Daily bump. · 304b6464
GCC Administrator authored 8 months ago

Jul 04, 2024

analyzer: convert sm_context * to sm_context & · f8c130cd

David Malcolm authored 8 months ago


These are never nullptr and never change, so use a reference rather
than a pointer.

No functional change intended.

gcc/analyzer/ChangeLog:
	* diagnostic-manager.cc
	(diagnostic_manager::add_events_for_eedge): Pass sm_ctxt by
	reference.
	* engine.cc (impl_region_model_context::on_condition): Likewise.
	(impl_region_model_context::on_bounded_ranges): Likewise.
	(impl_region_model_context::on_phi): Likewise.
	(exploded_node::on_stmt): Likewise.
	* sm-fd.cc: Update all uses of sm_context * to sm_context &.
	* sm-file.cc: Likewise.
	* sm-malloc.cc: Likewise.
	* sm-pattern-test.cc: Likewise.
	* sm-sensitive.cc: Likewise.
	* sm-signal.cc: Likewise.
	* sm-taint.cc: Likewise.
	* sm.h: Likewise.
	* varargs.cc: Likewise.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/analyzer_gil_plugin.c: Update all uses of
	sm_context * to sm_context &.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: handle <error.h> at -O0 [PR115724] · a6fdb1a2

David Malcolm authored 8 months ago


At -O0, glibc's:

__extern_always_inline void
error (int __status, int __errnum, const char *__format, ...)
{
  if (__builtin_constant_p (__status) && __status != 0)
    __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ());
  else
    __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ());
}

becomes just:

__extern_always_inline void
error (int __status, int __errnum, const char *__format, ...)
{
  if (0)
    __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ());
  else
    __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ());
}

and thus calls to "error" are calls to "__error_alias" by the
time -fanalyzer "sees" them.

Handle them with more special-casing in kf.cc.

gcc/analyzer/ChangeLog:
	PR analyzer/115724
	* kf.cc (register_known_functions): Add __error_alias and
	__error_at_line_alias.

gcc/testsuite/ChangeLog:
	PR analyzer/115724
	* c-c++-common/analyzer/error-pr115724.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[committed][RISC-V] Fix test expectations after recent late-combine changes · b611f396

Jeff Law authored 8 months ago

With the recent DCE related adjustment to late-combine the rvv/base/vcreate.c
test no longer has those undesirable vmvNr statements.

It's a bit unclear why this wasn't written as a scan-assembler-not and xfailed,
given the comment says we don't want to see vmvNr instructions.  I must have
missed that during review.

This patch adjusts the test to expect no vmvNr statements and if they're ever
re-introduced, we'll get a nice unexpected failure.

gcc/testsuite
	* gcc.target/riscv/rvv/base/vcreate.c: Update expected output.

testsuite: Update test for PR115537 to use SVE. · adcfb4fb

Tamar Christina authored 8 months ago

The PR was about SVE codegen, but the testcase accidentally used neoverse-n1
instead of neoverse-v1, which was used in the original report.

This updates the tool options.

gcc/testsuite/ChangeLog:

	PR tree-optimization/115537
	* gcc.dg/vect/pr115537.c: Update flag from neoverse-n1 to neoverse-v1.

c++ frontend: check for missing condition for novector [PR115623] · 84acbfbe

Tamar Christina authored 8 months ago

It looks like I forgot to check in the C++ frontend whether a condition exists
for the loop being adorned with novector.  This causes a segfault because cond
isn't expected to be null.

This fixes it by ignoring the pragma when there's no loop condition,
the same way we do in the C frontend.
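The problematic shape is a for loop with no controlling condition annotated with novector. A minimal example follows, written in C, which per the description above already ignores the pragma in this case; compiled as C++ before this fix, this is the kind of loop that reached the null cond. The function is hypothetical, and the pragma is only recognized by GCC 14 and later (other compilers may warn about an unknown pragma).

```c
#include <assert.h>

/* Sum elements of P until the first negative one.  */
int sum_until_negative (const int *p)
{
  int s = 0;
#pragma GCC novector
  for (int i = 0;; i++)   /* no loop condition */
    {
      if (p[i] < 0)
        break;
      s += p[i];
    }
  return s;
}
```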

gcc/cp/ChangeLog:

	PR c++/115623
	* semantics.cc (finish_for_cond): Add check for C++ cond.

gcc/testsuite/ChangeLog:

	PR c++/115623
	* g++.dg/vect/vect-novector-pragma_2.cc: New test.

arm: Use LDMIA/STMIA for thumb1 DI/DF loads/stores · 236d6fef

Siarhei Volkau authored 9 months ago


If the address register is dead after a load/store operation, it looks
beneficial to use LDMIA/STMIA instead of a pair of LDR/STR instructions,
at least if optimizing for size.

gcc/ChangeLog:

	* config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia
	when address reg rewritten by load.
	* config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
	(peephole2 to rewrite DI/DF store): New.
	* config/arm/iterators.md (DIDF): New.

gcc/testsuite:

	* gcc.target/arm/thumb1-load-store-64bit.c: Add new test.

Signed-off-by: Siarhei Volkau <lis8215@gmail.com>

Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890] · 11049cdf

Alfie Richards authored 8 months ago

This change removes code that erroneously swaps the operands in big-endian mode.
This also fixes the related test.

gcc/ChangeLog:

	PR target/114890
	* config/aarch64/aarch64-simd.md: Remove bigendian operand swap.

gcc/testsuite/ChangeLog:

	PR target/114890
	* gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.

Aarch64: Add test for non-commutative SIMD intrinsic · 14c67938

Alfie Richards authored 8 months ago

This adds a test for non-commutative SIMD NEON intrinsics.
Specifically, addp is non-commutative and has a bug in the current big-endian implementation.
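Why operand order matters for addp can be shown with a scalar model of a 4 x 32-bit pairwise add (an illustration of the semantics, not the intrinsic itself): the low half of the result comes from adjacent pairs of the first operand and the high half from the second, so swapping the operands, as the removed big-endian code effectively did, changes the result.

```c
#include <assert.h>
#include <stdint.h>

/* Pairwise add: { a0+a1, a2+a3, b0+b1, b2+b3 }.  */
void addp_model (int32_t out[4], const int32_t a[4], const int32_t b[4])
{
  out[0] = a[0] + a[1];
  out[1] = a[2] + a[3];
  out[2] = b[0] + b[1];
  out[3] = b[2] + b[3];
}
```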

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_intrinsics_asm.c: New test.

middle-end/115426 - wrong gimplification of "rm" asm output operand · a4bbdec2

Richard Biener authored 9 months ago

When the operand is gimplified to an extract of a register or a
register we have to disallow memory as we otherwise fail to
gimplify it properly.  Instead of

  __asm__("" : "=rm" __imag <r>);

we want

  __asm__("" : "=rm" D.2772);
  _1 = REALPART_EXPR <r>;
  r = COMPLEX_EXPR <_1, D.2772>;

otherwise SSA rewrite will fail and generate wrong code with 'r'
left bare in the asm output.

	PR middle-end/115426
	* gimplify.cc (gimplify_asm_expr): Handle "rm" output
	constraint gimplified to a register (operation).

	* gcc.dg/pr115426.c: New testcase.

Use __builtin_cpu_supports instead of __get_cpuid_count. · 699087a1

liuhongt authored 8 months ago

gcc/testsuite/ChangeLog:

	PR target/115748
	* gcc.target/i386/avx512-check.h: Use __builtin_cpu_supports
	instead of __get_cpuid_count.
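A minimal sketch of querying a CPU feature with the GCC/Clang x86 built-in __builtin_cpu_supports, rather than decoding CPUID leaves by hand with __get_cpuid_count. The feature string "avx512f" is just an example, and returning 0 on non-x86 hosts is an assumption of this sketch, not part of avx512-check.h.

```c
#include <assert.h>

/* Return 1 if the running CPU reports AVX-512F support, else 0.  */
int have_avx512f (void)
{
#if defined __x86_64__ || defined __i386__
  /* Only strictly required when called before constructors have run
     (for example from an ifunc resolver); harmless otherwise.  */
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx512f") ? 1 : 0;
#else
  /* Assumption for this sketch: report "no" on non-x86 hosts.  */
  return 0;
#endif
}
```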

i386: Add additional variant of bswaphisi2_lowpart peephole2. · 727f8b14

Roger Sayle authored 8 months ago

This patch adds an additional variation of the peephole2 used to convert
bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
rotw if the flags register isn't live.  The motivating example is:

void ext(int x);
void foo(int x)
{
  ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
}

where GCC with -O2 currently produces:

foo:	movl    %edi, %eax
        rolw    $8, %ax
        movl    %eax, %edi
        jmp     ext

The issue is that the original xchgb (bswaphisi2_lowpart) can only be
performed in "Q" registers that allow the %?h register to be used, so
reload generates the above two movl.  However, it's later in peephole2
where we see that CC_FLAGS can be clobbered, so we can use a rotate word,
which is more forgiving with register allocations.  With the additional
peephole2 proposed here, we now generate:

foo:	rolw    $8, %di
        jmp     ext
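The two instruction forms compute the same value: swapping the two low bytes while keeping the upper half is a 16-bit rotate by 8 of the low half. A small C check of that equivalence (using uint32_t rather than the original int to sidestep signed-shift questions; the function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* The motivating expression: swap the two low bytes, keep the rest.  */
uint32_t swap_low_bytes (uint32_t x)
{
  return (x & ~0xffffu) | ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

/* The same value as a 16-bit rotate by 8 of the low half, which is
   what rolw $8 computes on the low word of the register.  */
uint32_t swap_low_bytes_rot (uint32_t x)
{
  uint16_t lo = (uint16_t) x;
  uint16_t rot = (uint16_t) ((lo << 8) | (lo >> 8));
  return (x & ~0xffffu) | rot;
}
```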

2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.md (bswaphisi2_lowpart peephole2): New
	peephole2 variant to eliminate register shuffling.

gcc/testsuite/ChangeLog
	* gcc.target/i386/xchg-4.c: New test case.

[committed] Fix newlib build failure with rx as well as several dozen testsuite failures · 759f4abe

Jeff Law authored 8 months ago

The rx port has been failing to build newlib for a bit over a week.  I can't
remember if it was the late-combine work or the IRA costing twiddle; regardless,
the real bug is in the rx backend.

Basically dwarf2cfi is blowing up because of inconsistent state caused by the
failure to mark a stack adjustment as frame related.  This instance in the
epilogue looks like a simple goof.

With the port building again, the testsuite would run and it showed a number of
regressions, again related to CFI handling.  The common thread was a failure to
mark a copy from FP to SP in the prologue as frame related.  The change which
introduced this bug was supposed to just be changing promotions of vector types.
It's unclear if Nick included the hunk accidentally or just goof'd on the
logic.  Regardless it looks quite incorrect.

Reverting that hunk fixes the regressions *and* fixes 94 pre-existing failures.

The net is rx-elf is regression free and has moved forward in terms of its
testsuite status.

Pushing to the trunk momentarily.

gcc/

	* config/rx/rx.cc (rx_expand_prologue): Mark the copy from FP to SP
	as frame related.
	(rx_expand_epilogue): Mark the stack pointer adjustment as frame
	related.
