Commits · 319d3956b16b1270f27e9cbf749e881c4ff7dfb4 · COBOLworx / gcc-cobol

Jul 05, 2024

Hu, Lin1 authored 8 months ago

The "double" in ssedoublemode should mean a double-width element type,
like SI -> DI.  Patterns that need twice the number of same-sized
elements are refactored to use <ssedoublevecmode> instead of
<ssedoublemode>.

gcc/ChangeLog:

	* config/i386/sse.md (ssedoublemode): Remove mappings to twice
	the number of same-sized elements. Add mappings to the same
	number of double-sized elements.
	(define_split for vec_concat_minus_plus): Change mode_attr from
	ssedoublemode to ssedoublevecmode.
	(define_split for vec_concat_plus_minus): Ditto.
	(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>):
	Ditto.
	(avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
	(avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
	(avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.

319d3956

MIPS: Support more cases with alien mode of SHF.DF · 320c2ed4

YunQiang Su authored 9 months ago

Currently, we only support the cases that strictly fit the instructions.
For example, for V16QImode, we only support shuffles like
(0<=N0, N1, N2, N3<=3 here)
	N0,	N1,	N2,	N3
	N0+4,	N1+4,	N2+4,	N3+4
	N0+8,	N1+8,	N2+8,	N3+8
	N0+12,	N1+12,	N2+12,	N3+12
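
The strict-fit pattern above can be sketched as a small check in C. This is an illustration only, not the code in mips.cc: the helper name shf_b_imm8 is hypothetical, and the packing of N0..N3 into the immediate (lane k in bits [2k+1:2k]) is an assumption about the SHF.B encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the GCC implementation: check whether a
   16-element byte selector repeats one 4-element pattern N0..N3 in
   every group of four, and if so pack the pattern into an 8-bit
   immediate.  Assumes lane k uses bits [2k+1:2k] of the immediate.  */
bool
shf_b_imm8 (const uint8_t sel[16], uint8_t *imm8)
{
  assert (sel && imm8);
  uint8_t imm = 0;
  for (int i = 0; i < 16; i++)
    {
      int group = i / 4 * 4;	/* 0, 4, 8 or 12 */
      int lane = i % 4;
      int n = sel[lane];	/* N0..N3 come from the first row */
      if (n > 3 || sel[i] != group + n)
	return false;
      if (i < 4)
	imm |= (uint8_t) (n << (2 * lane));
    }
  *imm8 = imm;
  return true;
}
```

A selector that reverses each group of four bytes passes the check, while one that breaks the repetition in any group is rejected.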

In fact, we can support more cases by trying other SHF.DF
instructions that do not strictly fit the mode.

1) We can use SHF.H to support more cases for V16QImode:
(M0/M1/M2/M3 are 0 or 2 or 4 or 6)
	M0,	M0+1,	M1,	M1+1
	M2,	M2+1,	M3,	M3+1
	M0+8,	M0+9,	M1+8,	M1+9
	M2+8,	M2+9,	M3+8,	M3+9

2) We can use SHF.W to support some cases for V16QImode:
(M0/M1/M2/M3 are 0 or 4 or 8 or 12)
	M0,	M0+1,	M0+2,	M0+3
	M1,	M1+1,	M1+2,	M1+3
	M2,	M2+1,	M2+2,	M2+3
	M3,	M3+1,	M3+2,	M3+3

3) We can use SHF.W to support some cases for V8HImode:
(M0/M1/M2/M3 are 0 or 2 or 4 or 6)
	M0,	M0+1
	M1,	M1+1
	M2,	M2+1
	M3,	M3+1

4) We can also use SHF.W to swap the 2 parts of V2DF or V2DI.

gcc
	* config/mips/mips-protos.h: New function mips_msa_shf_i8.
	* config/mips/mips-msa.md (MSA_WHB_W): Not used anymore;
	(msa_shf_<msafmt_f>): Use mips_msa_shf_i8.
	* config/mips/mips.cc (mips_const_vector_shuffle_set_p):
	Support more cases by trying alien mode instructions;
	(mips_msa_shf_i8): New function to get the correct MSA SHF
	instruction and IMM.

320c2ed4

Testsuite/MIPS: Fix msa.c: test7_v2f64, test7_v4f32, test43_v2i64 · 33dfd679

YunQiang Su authored 9 months ago

BNEGI.W/D are used for test7_v2f64 and test7_v4f32 now.  This is
an improvement, since it saves an instruction.

ILVR.D is used for test43_v2i64 now, instead of INSVE.D.

gcc/testsuite
	* gcc.target/mips/msa.c: Fix test7_v2f64, test7_v4f32 and
	test43_v2i64.

33dfd679

MIPS/testsuite: Add -mfpxx to call-clobbered-1.c · e08ed5f1

YunQiang Su authored 9 months ago

The scan-assembler-times rules only fit -mfp32 and -mfpxx.
The test fails if GCC is configured for FP64 by default, as the output
then has one fewer sdc1/ldc1 pair.

gcc/testsuite
	* gcc.target/mips/call-clobbered-1.c: Add -mfpxx.

e08ed5f1

MIPS/testsuite: Fix umips-save-restore-1.c · f1437b96

YunQiang Su authored 9 months ago

With some recent optimizations, -O1/-O2/-O3 can achieve almost the same
performance/size with stack loads/stores.  Thus lwm/swm will save/restore
fewer callee-saved registers.  In fact only $16 is saved with swm.

To be sure that this optimization does exist, let's add 2 more
function calls, so that lwm/swm can be much more profitable.

If we add only one more call, -O1 will still use stack load/store.

gcc/testsuite
	* gcc.target/mips/umips-save-restore-1.c: Be sure lwm/swm
	are used for more callee-saved registers by adding
	2 more function calls.

f1437b96

Support group size of three in SLP store permute lowering · 7eb8b657

Richard Biener authored 8 months ago

The following implements the group-size three scheme from
vect_permute_store_chain in SLP grouped store permute lowering
and extends it to power-of-two multiples of group size three.

The scheme goes from vectors A, B and C to
{ A[0], B[0], C[0], A[1], B[1], C[1], ... } by first producing
{ A[0], B[0], X, A[1], B[1], X, ... } (with X arbitrary, but chosen
to be A[n]) and then permuting in C[n] in the appropriate places.

The extension works by replacing vector elements with a
power-of-two number of lanes, which gives pairwise interleaving
until the final three-input permutes happen.

The last permute step could be seen as extending C to { C[0], C[0],
C[0], ... } and then performing a blend.
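
As a rough scalar model of the two steps described above (not the vectorizer code; the function name interleave3 and the flat output array are assumptions made for illustration), the scheme can be sketched in C as:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the group-size-three interleave.
   OUT must have room for 3 * N elements.  */
void
interleave3 (const int *a, const int *b, const int *c, size_t n, int *out)
{
  assert (a && b && c && out);

  /* Step 1: two-input permute of A and B, leaving a don't-care slot X
     (filled with an element of A here) in every third position.  */
  for (size_t i = 0; i < n; i++)
    {
      out[3 * i] = a[i];
      out[3 * i + 1] = b[i];
      out[3 * i + 2] = a[i];	/* placeholder X */
    }

  /* Step 2: blend the elements of C into the placeholder slots.  */
  for (size_t i = 0; i < n; i++)
    out[3 * i + 2] = c[i];
}
```

In the real lowering each step is a vector permute rather than a loop; the model only shows which element ends up in which lane.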

VLA archs will want to use store-lanes here I guess, I'm not sure
if the three vector interleave operation is also available with
a register source and destination and thus available for a shuffle.

	* tree-vect-slp.cc (vect_build_slp_instance): Special case
	three input permute with the same number of lanes in store
	permute lowering.

	* gcc.dg/vect/slp-53.c: New testcase.
	* gcc.dg/vect/slp-54.c: New testcase.

7eb8b657

Daily bump. · 304b6464
GCC Administrator authored 8 months ago

304b6464

Jul 04, 2024

analyzer: convert sm_context * to sm_context & · f8c130cd

David Malcolm authored 8 months ago


These are never nullptr and never change, so use a reference rather
than a pointer.

No functional change intended.

gcc/analyzer/ChangeLog:
	* diagnostic-manager.cc
	(diagnostic_manager::add_events_for_eedge): Pass sm_ctxt by
	reference.
	* engine.cc (impl_region_model_context::on_condition): Likewise.
	(impl_region_model_context::on_bounded_ranges): Likewise.
	(impl_region_model_context::on_phi): Likewise.
	(exploded_node::on_stmt): Likewise.
	* sm-fd.cc: Update all uses of sm_context * to sm_context &.
	* sm-file.cc: Likewise.
	* sm-malloc.cc: Likewise.
	* sm-pattern-test.cc: Likewise.
	* sm-sensitive.cc: Likewise.
	* sm-signal.cc: Likewise.
	* sm-taint.cc: Likewise.
	* sm.h: Likewise.
	* varargs.cc: Likewise.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/analyzer_gil_plugin.c: Update all uses of
	sm_context * to sm_context &.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

f8c130cd

analyzer: handle <error.h> at -O0 [PR115724] · a6fdb1a2

David Malcolm authored 8 months ago


At -O0, glibc's:

__extern_always_inline void
error (int __status, int __errnum, const char *__format, ...)
{
  if (__builtin_constant_p (__status) && __status != 0)
    __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ());
  else
    __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ());
}

becomes just:

__extern_always_inline void
error (int __status, int __errnum, const char *__format, ...)
{
  if (0)
    __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ());
  else
    __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ());
}

and thus calls to "error" are calls to "__error_alias" by the
time -fanalyzer "sees" them.

Handle them with more special-casing in kf.cc.

gcc/analyzer/ChangeLog:
	PR analyzer/115724
	* kf.cc (register_known_functions): Add __error_alias and
	__error_at_line_alias.

gcc/testsuite/ChangeLog:
	PR analyzer/115724
	* c-c++-common/analyzer/error-pr115724.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

a6fdb1a2

[committed][RISC-V] Fix test expectations after recent late-combine changes · b611f396

Jeff Law authored 8 months ago

With the recent DCE related adjustment to late-combine the rvv/base/vcreate.c
test no longer has those undesirable vmvNr statements.

It's a bit unclear why this wasn't written as a scan-assembler-not and xfailed
given the comment says we don't want to see vmvNr instructions.  I must have
missed that during review.

This patch adjusts the test to expect no vmvNr statements and if they're ever
re-introduced, we'll get a nice unexpected failure.

gcc/testsuite
	* gcc.target/riscv/rvv/base/vcreate.c: Update expected output.

b611f396

Skip 30_threads/future/members/poll.cc on hppa*-*-linux* · 46ffda9b

John David Anglin authored 8 months ago

hppa*-*-linux* lacks high resolution timer support. Timer resolution
ranges from 1 to 10ms. As a result, a large number of iterations are
needed for the wait_for_0 and ready loops. This causes the
wait_until_sys_epoch and wait_until_steady_epoch loops to time out.
There the loop wait time is determined by the timer resolution.

2024-07-04  John David Anglin  <danglin@gcc.gnu.org>

libstdc++-v3/ChangeLog:
	PR libstdc++/98678
	* testsuite/30_threads/future/members/poll.cc: Skip on hppa*-*-linux*.

46ffda9b

testsuite: Update test for PR115537 to use SVE. · adcfb4fb

Tamar Christina authored 8 months ago

The PR was about SVE codegen, but the testcase accidentally used neoverse-n1
instead of neoverse-v1 as in the original report.

This updates the tool options.

gcc/testsuite/ChangeLog:

	PR tree-optimization/115537
	* gcc.dg/vect/pr115537.c: Update flag from neoverse-n1 to neoverse-v1.

adcfb4fb

c++ frontend: check for missing condition for novector [PR115623] · 84acbfbe

Tamar Christina authored 8 months ago

It looks like I forgot to check in the C++ frontend whether a condition exists
for the loop being adorned with novector.  This causes a segfault because cond
isn't expected to be null.

This fixes it by ignoring the pragma when there's no loop condition,
the same way we do in the C frontend.

gcc/cp/ChangeLog:

	PR c++/115623
	* semantics.cc (finish_for_cond): Add check for C++ cond.

gcc/testsuite/ChangeLog:

	PR c++/115623
	* g++.dg/vect/vect-novector-pragma_2.cc: New test.

84acbfbe

arm: Use LDMIA/STMIA for thumb1 DI/DF loads/stores · 236d6fef

Siarhei Volkau authored 9 months ago


If the address register is dead after the load/store operation, it looks
beneficial to use LDMIA/STMIA instead of a pair of LDR/STR instructions,
at least when optimizing for size.

gcc/ChangeLog:

	* config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia
	when address reg rewritten by load.
	* config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
	(peephole2 to rewrite DI/DF store): New.
	* config/arm/iterators.md (DIDF): New.

gcc/testsuite:

	* gcc.target/arm/thumb1-load-store-64bit.c: Add new test.

Signed-off-by: Siarhei Volkau <lis8215@gmail.com>

236d6fef

Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890] · 11049cdf

Alfie Richards authored 8 months ago

This change removes code that erroneously switches the operands in big-endian mode.
This also fixes the related test.

gcc/ChangeLog:

	PR target/114890
	* config/aarch64/aarch64-simd.md: Remove bigendian operand swap.

gcc/testsuite/ChangeLog:

	PR target/114890
	* gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.

11049cdf

Aarch64: Add test for non-commutative SIMD intrinsic · 14c67938

Alfie Richards authored 8 months ago

This adds a test for non-commutative SIMD NEON intrinsics.
Specifically addp is non-commutative and has a bug in the current big-endian implementation.
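
To illustrate why operand order matters, here is a minimal scalar model of a 4-lane pairwise add in C. It is a sketch, not the intrinsic itself: the name addp4 is hypothetical, and the lane layout (pairs of the first operand in the low half, pairs of the second in the high half) is stated as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a 4-lane pairwise add: the low half of
   the result comes from adjacent pairs of A, the high half from
   adjacent pairs of B.  */
void
addp4 (const int32_t a[4], const int32_t b[4], int32_t out[4])
{
  assert (a && b && out);
  out[0] = a[0] + a[1];
  out[1] = a[2] + a[3];
  out[2] = b[0] + b[1];
  out[3] = b[2] + b[3];
}
```

Swapping the two operands moves each pair sum to the other half of the result, so an erroneous operand swap changes the output.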

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_intrinsics_asm.c: New test.

14c67938

middle-end/115426 - wrong gimplification of "rm" asm output operand · a4bbdec2

Richard Biener authored 9 months ago

When the operand is gimplified to an extract of a register or a
register we have to disallow memory as we otherwise fail to
gimplify it properly.  Instead of

  __asm__("" : "=rm" __imag <r>);

we want

  __asm__("" : "=rm" D.2772);
  _1 = REALPART_EXPR <r>;
  r = COMPLEX_EXPR <_1, D.2772>;

otherwise SSA rewrite will fail and generate wrong code with 'r'
left bare in the asm output.

	PR middle-end/115426
	* gimplify.cc (gimplify_asm_expr): Handle "rm" output
	constraint gimplified to a register (operation).

	* gcc.dg/pr115426.c: New testcase.

a4bbdec2

Use __builtin_cpu_supports instead of __get_cpuid_count. · 699087a1

liuhongt authored 8 months ago

gcc/testsuite/ChangeLog:

	PR target/115748
	* gcc.target/i386/avx512-check.h: Use __builtin_cpu_supports
	instead of __get_cpuid_count.

699087a1

i386: Add additional variant of bswaphisi2_lowpart peephole2. · 727f8b14

Roger Sayle authored 8 months ago

This patch adds an additional variation of the peephole2 used to convert
bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
rotw if the flags register isn't live.  The motivating example is:

void ext(int x);
void foo(int x)
{
  ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
}

where GCC with -O2 currently produces:

foo:	movl    %edi, %eax
        rolw    $8, %ax
        movl    %eax, %edi
        jmp     ext

The issue is that the original xchgb (bswaphisi2_lowpart) can only be
performed in "Q" registers that allow the %?h register to be used, so
reload generates the above two movl.  However, it's later in peephole2
where we see that CC_FLAGS can be clobbered, so we can use a rotate word,
which is more forgiving with register allocations.  With the additional
peephole2 proposed here, we now generate:

foo:	rolw    $8, %di
        jmp     ext
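
The equivalence the peephole2 relies on can be checked in plain C. The sketch below uses unsigned types and two hypothetical helper names (swap_low_bytes, rotate_low_half); it shows that the expression from the motivating example matches an 8-bit rotate of the low 16 bits, and says nothing about which instructions a given compiler emits.

```c
#include <stdint.h>

/* The expression from the motivating example, on unsigned values:
   keep the upper 16 bits and swap the two low bytes.  */
uint32_t
swap_low_bytes (uint32_t x)
{
  return (x & ~0xffffu) | ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

/* The same result expressed as an 8-bit rotate of the low 16 bits,
   which is what a 16-bit rotate by 8 computes.  */
uint32_t
rotate_low_half (uint32_t x)
{
  uint16_t lo = (uint16_t) x;
  uint16_t rot = (uint16_t) ((lo << 8) | (lo >> 8));
  return (x & ~0xffffu) | rot;
}
```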

2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.md (bswaphisi2_lowpart peephole2): New
	peephole2 variant to eliminate register shuffling.

gcc/testsuite/ChangeLog
	* gcc.target/i386/xchg-4.c: New test case.

727f8b14

[committed] Fix newlib build failure with rx as well as several dozen testsuite failures · 759f4abe

Jeff Law authored 8 months ago

The rx port has been failing to build newlib for a bit over a week.  I can't
remember if it was the late-combine work or the IRA costing twiddle, regardless
the real bug is in the rx backend.

Basically dwarf2cfi is blowing up because of inconsistent state caused by the
failure to mark a stack adjustment as frame related.  This instance in the
epilogue looks like a simple goof.

With the port building again, the testsuite would run and it showed a number of
regressions, again related to CFI handling.  The common thread was a failure to
mark a copy from FP to SP in the prologue as frame related.  The change which
introduced this bug was supposed to just be changing promotions of vector types.
It's unclear if Nick included the hunk accidentally or just goof'd on the
logic.  Regardless it looks quite incorrect.

Reverting that hunk fixes the regressions *and* fixes 94 pre-existing failures.

The net is rx-elf is regression free and has moved forward in terms of its
testsuite status.

Pushing to the trunk momentarily.

gcc/

	* config/rx/rx.cc (rx_expand_prologue): Mark the copy from FP to SP
	as frame related.
	(rx_expand_epilogue): Mark the stack pointer adjustment as frame
	related.

759f4abe

[APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue · 8e72b1bb

Hongyu Wang authored 1 year ago

According to the APX spec, pushp/popp pairs should be matched;
otherwise the PPX hint cannot take effect, which causes performance loss.

In ix86_expand_epilogue, there are several optimizations that may
cause the epilogue to use mov to restore the regs. Check whether PPX is
applied and prevent usage of mov/leave in the epilogue. Also do not use
PPX for eh_return.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used
	flag in m.fs with TARGET_APX_PPX && !crtl->calls_eh_return.
	(ix86_emit_save_regs): Emit ppx only when
	TARGET_APX_PPX && !crtl->calls_eh_return.
	(ix86_expand_epilogue): Don't restore reg using mov when
	apx_ppx_used flag is true.
	* config/i386/i386.h (struct machine_frame_state):
	Add apx_ppx_used flag.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ppx-2.c: New test.
	* gcc.target/i386/apx-ppx-3.c: Likewise.

8e72b1bb

c++: OVERLOAD in diagnostics · baac8f71

Jason Merrill authored 8 months ago

In modules we can get an OVERLOAD around a non-function, so let's tail
recurse instead of falling through.  As a result we start printing the
template header in this testcase.

gcc/cp/ChangeLog:

	* error.cc (dump_decl) [OVERLOAD]: Recurse on single case.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/pr61945.C: Adjust diagnostic.

baac8f71

c++: CTAD and trait built-ins · 655fe94a

Jason Merrill authored 8 months ago

While poking at 101232 I noticed that we started trying to parse
__is_invocable(_Fn, _Args...) as a functional cast to a CTAD placeholder
type; we shouldn't consider CTAD for a template that shares a name (reserved
for the implementation) with a built-in trait.

gcc/cp/ChangeLog:

	* pt.cc (ctad_template_p): Return false for trait names.

655fe94a

vect: Fix ICE caused by missing check for TREE_CODE == SSA_NAME · d1eeafe4

Hu, Lin1 authored 8 months ago

Need to check if the tree's code is SSA_NAME before SSA_NAME_RANGE_INFO.

2024-07-03  Hu, Lin1 <lin1.hu@intel.com>
	    Andrew Pinski <quic_apinski@quicinc.com>

gcc/ChangeLog:

	PR tree-optimization/115753
	* tree-vect-stmts.cc (supportable_indirect_convert_operation): Add
	TREE_CODE check before SSA_NAME_RANGE_INFO.

gcc/testsuite/ChangeLog:

	PR tree-optimization/115753
	* gcc.dg/vect/pr115753-1.c: New test.
	* gcc.dg/vect/pr115753-2.c: Ditto.
	* gcc.dg/vect/pr115753-3.c: Ditto.

d1eeafe4

Daily bump. · 0720394a
GCC Administrator authored 8 months ago

0720394a

Jul 03, 2024

[committed] Fix previously latent bug in reorg affecting cris port · e5f73853

Jeff Law authored 8 months ago

The late-combine patch has triggered a previously latent bug in reorg.

Basically we have a sequence like this in the middle of reorg before we start
relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c)

> (insn 67 49 18 (sequence [
>             (jump_insn 50 49 52 (set (pc)
>                     (if_then_else (ne (reg:CC 19 ccr)
>                             (const_int 0 [0]))
>                         (label_ref:SI 30)
>                         (pc))) "j.c":10:6 discrim 1 282 {*bnecc}
>                  (expr_list:REG_DEAD (reg:CC 19 ccr)
>                     (int_list:REG_BR_PROB 7 (nil)))
>              -> 30)
>             (insn/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
>                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
>                  (nil))
>         ]) "j.c":10:6 discrim 1 -1
>      (nil))
>
> (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (note 54 18 55 NOTE_INSN_EPILOGUE_BEG)
>
> (jump_insn 55 54 56 (return) "j.c":14:1 228 {*return_expanded}
>      (nil)
>  -> return)
>
> (barrier 56 55 43)
>
> (note 43 56 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (note 65 43 30 NOTE_INSN_SWITCH_TEXT_SECTIONS)
>
> (code_label 30 65 8 5 6 (nil) [1 uses])
>
> (note 8 30 61 [bb 5] NOTE_INSN_BASIC_BLOCK)

So at a high level the things to note are that insn 50 conditionally jumps
around insn 55.  Second there's a SWITCH_TEXT_SECTIONS note between insn 50 and
the target label for insn 50 (code_label 30).

reorg sees the conditional jump around the unconditional jump/return and will
invert the jump and retarget the original jump to an appropriate location.  In
this case generating:

> (insn 67 49 18 (sequence [
>             (jump_insn 50 49 52 (set (pc)
>                     (if_then_else (eq (reg:CC 19 ccr)
>                             (const_int 0 [0]))
>                         (label_ref:SI 68)
>                         (pc))) "j.c":10:6 discrim 1 281 {*beqcc}
>                  (expr_list:REG_DEAD (reg:CC 19 ccr)
>                     (int_list:REG_BR_PROB 1073741831 (nil)))
>              -> 68)
>             (insn/s/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
>                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
>                  (nil))
>         ]) "j.c":10:6 discrim 1 -1
>      (nil))
>
> (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (note 54 18 43 NOTE_INSN_EPILOGUE_BEG)
>
> (note 43 54 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (note 65 43 8 NOTE_INSN_SWITCH_TEXT_SECTIONS)
>
> (note 8 65 61 [bb 5] NOTE_INSN_BASIC_BLOCK)
[ ... ]
Where the new target of the jump is a return statement later in the IL.

Note that we now have a SWITCH_TEXT_SECTIONS note that is not immediately
preceded by a BARRIER.  That triggers an assertion in the dwarf2 code.  Removal
of the BARRIER is inherent in this optimization.

The fix is simple, we avoid this optimization when there's a
SWITCH_TEXT_SECTIONS note between the conditional jump insn and its target.
Thankfully we already have a routine to test for this in reorg, so we just need
to call it appropriately.  The other approach would be to drop the note which I
considered and discarded.

We don't have great coverage for delay slot targets.  I've tested arc, cris,
fr30, frv, h8, iq2000, microblaze, or1k, sh3 and visium in my tester as crosses
without new regressions, fixing one regression along the way.  Bootstrap &
regression testing on sh4 and hppa will take considerably longer.

gcc/

	* reorg.cc (relax_delay_slots): Do not optimize a conditional
	jump around an unconditional jump/return in the presence of
	a text section switch.

e5f73853

Revert "Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.h" · ad2206d5
John David Anglin authored 8 months ago
```
This reverts commit 0ee3266b.
```
ad2206d5

Fortran: fix associate with assumed-length character array [PR115700] · 7b7f2034

Harald Anlauf authored 8 months ago

gcc/fortran/ChangeLog:

	PR fortran/115700
	* trans-stmt.cc (trans_associate_var): When the associate target
	is an array-valued character variable, the length is known at entry
	of the associate block.  Move setting of string length of the
	selector to the initialization part of the block.

gcc/testsuite/ChangeLog:

	PR fortran/115700
	* gfortran.dg/associate_69.f90: New test.

7b7f2034

RISC-V: Describe -march behavior for dependent extensions · 70f6bc39
Palmer Dabbelt authored 8 months ago
```
gcc/ChangeLog:

	* doc/invoke.texi: Describe -march behavior for dependent extensions on
	RISC-V.
```
70f6bc39

RISC-V: Add support for Zabha extension · 7b2b2e3d

Gianluca Guida authored 8 months ago

The Zabha extension adds support for subword Zaamo ops.

Extension: https://github.com/riscv/riscv-zabha.git
Ratification: https://jira.riscv.org/browse/RVS-1685



gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc
	(riscv_subset_list::to_string): Skip zabha when not supported by
	the assembler.
	* config.in: Regenerate.
	* config/riscv/arch-canonicalize: Make zabha imply zaamo.
	* config/riscv/iterators.md (amobh): Add iterator for amo
	byte/halfword.
	* config/riscv/riscv.opt: Add zabha.
	* config/riscv/sync.md (atomic_<atomic_optab><mode>): Add
	subword atomic op pattern.
	(zabha_atomic_fetch_<atomic_optab><mode>): Add subword
	atomic_fetch op pattern.
	(lrsc_atomic_fetch_<atomic_optab><mode>): Prefer zabha over lrsc
	for subword atomic ops.
	(zabha_atomic_exchange<mode>): Add subword atomic exchange
	pattern.
	(lrsc_atomic_exchange<mode>): Prefer zabha over lrsc for subword
	atomic exchange ops.
	* configure: Regenerate.
	* configure.ac: Add zabha assembler check.
	* doc/sourcebuild.texi: Add zabha documentation.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add zabha testsuite infra support.
	* gcc.target/riscv/amo/inline-atomics-1.c: Remove zabha to continue to
	test the lr/sc subword patterns.
	* gcc.target/riscv/amo/inline-atomics-2.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: Ditto.
	* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: Ditto.
	* gcc.target/riscv/amo/zabha-all-amo-ops-char-run.c: New test.
	* gcc.target/riscv/amo/zabha-all-amo-ops-short-run.c: New test.
	* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-char.c: New test.
	* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-short.c: New test.
	* gcc.target/riscv/amo/zabha-rvwmo-amo-add-char.c: New test.
	* gcc.target/riscv/amo/zabha-rvwmo-amo-add-short.c: New test.
	* gcc.target/riscv/amo/zabha-ztso-amo-add-char.c: New test.
	* gcc.target/riscv/amo/zabha-ztso-amo-add-short.c: New test.

Co-Authored-By: Patrick O'Neill <patrick@rivosinc.com>
Signed-Off-By: Gianluca Guida <gianluca@rivosinc.com>
Tested-by: Andrea Parri <andrea@rivosinc.com>

7b2b2e3d

[PATCH] ARC: Update gcc.target/arc/pr9001184797.c test · c41eb4c7

Luis Silva authored 8 months ago

... to comply with new standards due to stricter analysis in
the latest GCC versions.

gcc/testsuite/ChangeLog:

	* gcc.target/arc/pr9001184797.c: Fix compiler warnings.

c41eb4c7

RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763] · de9254e2

Pan Li authored 8 months ago


According to the ISA, the zvfhmin sub extension should only contain
conversion insns.  Thus the vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.

This patch fixes it by splitting the pred_broadcast define_insn
into zvfhmin and zvfh parts.  Given the example below:

void test (_Float16 *dest, _Float16 bias) {
  dest[0] = bias;
  dest[1] = bias;
}

when compiled with -march=rv64gcv_zfh_zvfhmin:

Before this patch:
test:
  vsetivli        zero,2,e16,mf4,ta,ma
  vfmv.v.f        v1,fa0 // should not leverage vfmv for zvfhmin
  vse16.v v1,0(a0)
  ret

After this patch:
test:
  addi     sp,sp,-16
  fsh      fa0,14(sp)
  addi     a5,sp,14
  vsetivli zero,2,e16,mf4,ta,ma
  vlse16.v v1,0(a5),zero
  vse16.v  v1,0(a0)
  addi     sp,sp,16
  jr       ra

	PR target/115763

gcc/ChangeLog:

	* config/riscv/vector.md (*pred_broadcast<mode>): Split into
	zvfh and zvfhmin part.
	(*pred_broadcast<mode>_zvfh): New define_insn for zvfh part.
	(*pred_broadcast<mode>_zvfhmin): Ditto but for zvfhmin.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
	* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
	* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
	* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
	* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
	* gcc.target/riscv/rvv/base/pr115763-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

de9254e2

[MAINTAINERS] Update my email address. · 56814070

Prathamesh Kulkarni authored 8 months ago


	* MAINTAINERS: Update my email address and add myself to DCO.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

56814070

Match: Allow more types truncation for .SAT_TRUNC · 44c767c0

Pan Li authored 8 months ago


The .SAT_TRUNC has input and output types, i.e. it converts from
itype to otype with sizeof (otype) < sizeof (itype).  The
previous patch only allows sizeof (otype) == sizeof (itype) / 2,
but we also have 1/4 and 1/8 truncation.

This patch supports truncation between more types whenever
sizeof (otype) < sizeof (itype).  The truncations below will be
covered.

* uint64_t => uint8_t
* uint64_t => uint16_t
* uint64_t => uint32_t
* uint32_t => uint8_t
* uint32_t => uint16_t
* uint16_t => uint8_t
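
For reference, a scalar saturating truncation for the widest case in the list (uint64_t => uint8_t) might look like the following C sketch. The function name sat_trunc_u64_to_u8 is hypothetical, and the sketch only shows the intended semantics; it makes no claim about which source spellings match.pd recognizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example: saturating truncation uint64_t -> uint8_t.
   Values above 255 clamp to 255; everything else is unchanged.  */
uint8_t
sat_trunc_u64_to_u8 (uint64_t x)
{
  bool overflow = x > (uint64_t) (uint8_t) -1;	/* x > 255 */
  /* -overflow is all-ones when the value does not fit, zero otherwise,
     so the OR either saturates or leaves the truncated value alone.  */
  return (uint8_t) ((uint8_t) x | (uint8_t) -overflow);
}
```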

The test suites below passed for this patch:
1. The rv64gcv full regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

gcc/ChangeLog:

	* match.pd: Allow truncation to any otype narrower than itype.

Signed-off-by: Pan Li <pan2.li@intel.com>

44c767c0

Vect: Support IFN SAT_TRUNC for unsigned vector int · 8d2c460e

Pan Li authored 8 months ago


This patch supports .SAT_TRUNC for unsigned vector integers.
Given the example code below:

Form 1
  #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT)                             \
  void __attribute__((noinline))                                       \
  vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
  {                                                                    \
    for (unsigned i = 0; i < limit; i++)                               \
      {                                                                \
        bool overflow = y[i] > (WT)(NT)(-1);                           \
        x[i] = ((NT)y[i]) | (NT)-overflow;                             \
      }                                                                \
  }

VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)

Before this patch:
void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, unsigned int limit)
{
  ...
  _51 = .SELECT_VL (ivtmp_49, POLY_INT_CST [2, 2]);
  ivtmp_35 = _51 * 8;
  vect__4.7_32 = .MASK_LEN_LOAD (vectp_y.5_34, 64B, { -1, ... }, _51, 0);
  mask_overflow_16.8_30 = vect__4.7_32 > { 4294967295, ... };
  vect__5.9_29 = (vector([2,2]) unsigned int) vect__4.7_32;
  vect__10.13_20 = .VCOND_MASK (mask_overflow_16.8_30, { 4294967295, ... }, vect__5.9_29);
  ivtmp_12 = _51 * 4;
  .MASK_LEN_STORE (vectp_x.14_11, 32B, { -1, ... }, _51, 0, vect__10.13_20);
  vectp_y.5_33 = vectp_y.5_34 + ivtmp_35;
  vectp_x.14_46 = vectp_x.14_11 + ivtmp_12;
  ivtmp_50 = ivtmp_49 - _51;
  if (ivtmp_50 != 0)
  ...
}

After this patch:
void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, unsigned int limit)
{
  ...
  _12 = .SELECT_VL (ivtmp_21, POLY_INT_CST [2, 2]);
  ivtmp_34 = _12 * 8;
  vect__4.7_31 = .MASK_LEN_LOAD (vectp_y.5_33, 64B, { -1, ... }, _12, 0);
  vect_patt_40.8_30 = .SAT_TRUNC (vect__4.7_31); // << .SAT_TRUNC
  ivtmp_29 = _12 * 4;
  .MASK_LEN_STORE (vectp_x.9_28, 32B, { -1, ... }, _12, 0, vect_patt_40.8_30);
  vectp_y.5_32 = vectp_y.5_33 + ivtmp_34;
  vectp_x.9_27 = vectp_x.9_28 + ivtmp_29;
  ivtmp_20 = ivtmp_21 - _12;
  if (ivtmp_20 != 0)
  ...
}

The test suites below passed for this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The rv64gcv full regression tests.

gcc/ChangeLog:

	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_trunc): Add
	new decl generated by match.
	(vect_recog_sat_trunc_pattern): Add new func impl to recog the
	.SAT_TRUNC pattern.

Signed-off-by: Pan Li <pan2.li@intel.com>

8d2c460e

Remove redundant vector permute dump · 1dc20965

Richard Biener authored 8 months ago

The following removes redundant dumping in vect permute vectorization.

	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
	redundant dump.

1dc20965

[PATCH] match.pd: Fold x/sqrt(x) to sqrt(x) · 8dc5ad3c

Jennifer Schmitz authored 8 months ago


This patch adds a pattern in match.pd folding x/sqrt(x) to sqrt(x) for -funsafe-math-optimizations. Test cases were added for double, float, and long double.
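
The algebra behind the fold, for strictly positive real x, is simply:

```latex
\[
\frac{x}{\sqrt{x}}
  = \frac{\sqrt{x}\cdot\sqrt{x}}{\sqrt{x}}
  = \sqrt{x},
  \qquad x > 0 .
\]
```

In floating point the two sides are not always identical: at x = 0 the left-hand side is 0/0 (NaN under IEEE 754) while the right-hand side is 0, and the rounded results can differ. That is presumably why the fold is limited to -funsafe-math-optimizations.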

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Ok for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/

	* match.pd: Fold x/sqrt(x) to sqrt(x).

gcc/testsuite/

	* gcc.dg/tree-ssa/sqrt_div.c: New test.

8dc5ad3c

Deduplicate explicitly-sized types · 640f0f3e

Alexandre Oliva authored 8 months ago

When make_type_from_size is called with a biased type, for an entity
that isn't explicitly biased, we may refrain from reusing the given
type because it doesn't seem to match, and then proceed to create an
exact copy of that type.

Compute earlier the biased status of the expected type, early enough
for the suitability check of the given type.  Modify for_biased
instead of biased_p, so that biased_p remains with the given type's
status for the comparison.

Avoid creating unnecessary copies of types in make_type_from_size, by
caching and reusing previously-created identical types, similarly to
the caching of packable types.

While at that, fix two vaguely related issues:

- TYPE_DEBUG_TYPE's storage is shared with other sorts of references
to types, so it shouldn't be accessed unless
TYPE_CAN_HAVE_DEBUG_TYPE_P holds.

- When we choose the narrower/packed variant of a type as the main
debug info type, we fail to output its name if we fail to follow debug
type for the TYPE_NAME decl type in modified_type_die.


for  gcc/ada/ChangeLog

	* gcc-interface/misc.cc (gnat_get_array_descr_info): Only follow
	TYPE_DEBUG_TYPE if TYPE_CAN_HAVE_DEBUG_TYPE_P.
	* gcc-interface/utils.cc (sized_type_hash): New struct.
	(sized_type_hasher): New struct.
	(sized_type_hash_table): New variable.
	(init_gnat_utils): Allocate it.
	(destroy_gnat_utils): Release it.
	(sized_type_hasher::equal): New.
	(hash_sized_type): New.
	(canonicalize_sized_type): New.
	(make_type_from_size): Use it to cache packed variants.  Fix
	type reuse by combining biased_p and for_biased earlier.  Hold
	the combination in for_biased, adjusting later uses.

for  gcc/ChangeLog

	* dwarf2out.cc (modified_type_die): Follow name's debug type.

for  gcc/testsuite/ChangeLog

	* gnat.dg/bias1.adb: Count occurrences of -7.*DW_AT_GNU_bias.

640f0f3e

[debug] Avoid dropping bits from num/den in fixed-point types · 113c4826

Alexandre Oliva authored 8 months ago

We used to use an unsigned 128-bit type to hold the numerator and
denominator used to represent the delta of a fixed-point type in debug
information, but there are cases in which that was not enough, and
more significant bits silently overflowed and got omitted from debug
information.

Introduce a mode in which UI_to_gnu selects a wide-enough unsigned
type, and use that to convert numerator and denominator.  While at
that, avoid exceeding the maximum precision for wide ints, and for
available int modes, when selecting a type to represent very wide
constants, falling back to 0/0 for unrepresentable fractions.


for  gcc/ada/ChangeLog

	* gcc-interface/cuintp.cc (UI_To_gnu): Add mode that selects a
	wide enough unsigned type.  Fail if the constant exceeds the
	representable numbers.
	* gcc-interface/decl.cc (gnat_to_gnu_entity): Use it for
	numerator and denominator of fixed-point types.  In case of
	failure, fall back to an indeterminate fraction.

113c4826

[i386] restore recompute to override opts after change [PR113719] · bf2fc0a2

Alexandre Oliva authored 9 months ago

The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on
toolchains configured to --enable-frame-pointer, because the
optimization node created within handle_optimize_attribute had
flag_omit_frame_pointer incorrectly set, whereas
default_optimization_node didn't.  With this difference,
can_inline_edge_by_limits_p flagged an optimization mismatch and we
refused to inline the function that had a redundant optimization flag
into one that didn't, which is exactly what is tested for there.

This patch restores the calls to ix86_default_align and
ix86_recompute_optlev_based_flags that used to be, and ought to be,
issued during TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE, but preserves the
intent of the original change, of having those functions called at
different spots within ix86_option_override_internal.  To that end,
the remaining bits were refactored into a separate function, that was
in turn adjusted to operate on explicitly-passed opts and opts_set,
rather than going for their global counterparts.


for  gcc/ChangeLog

	PR target/113719
	* config/i386/i386-options.cc
	(ix86_override_options_after_change_1): Add opts and opts_set
	parms, operate on them, after factoring out of...
	(ix86_override_options_after_change): ... this.  Restore calls
	of ix86_default_align and ix86_recompute_optlev_based_flags.
	(ix86_option_override_internal): Call the factored-out bits.

bf2fc0a2