Commits · a29c5852a606588175d11844db84da0881227100 · COBOLworx / gcc-cobol

Jun 06, 2024

nvptx, libgcc: Stub unwinding implementation · a29c5852

Thomas Schwinge authored 9 months ago


Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.

The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.

libgcc/ChangeLog:

	* config/nvptx/t-nvptx: Add unwind-nvptx.c.
	* config/nvptx/unwind-nvptx.c: New file.

Co-authored-by: Andrew Stubbs <ams@gcc.gnu.org>

nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' · 5bbe5350

Thomas Schwinge authored 9 months ago

This extends commit d9c90c82
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'"
for offloading.

	libgcc/
	* config/nvptx/gbl-ctors.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.

nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred' · b4e68dd9

Thomas Schwinge authored 10 months ago

For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'): the '0xffffffff'
bitmask isn't correct if not all 32 threads of a warp are active.  The same
issue/fix, I suppose but have not verified, would apply if we were to allow for
OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.

We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which evidently appears to do the right thing.  (I've tested '-muniform-simt'
code executing single-threaded.)

The change that I proposed on 2022-12-15 was to emit PTX code to calculate
'(1 << %ntid.x) - 1' as the actual bitmask to use instead of '0xffffffff'.
This works, but the PTX JIT generates SASS code to do this computation.
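
As a plain C illustration of the bitmask arithmetic mentioned above (not the
PTX that was proposed), the following sketch computes a mask with one bit per
active thread.  The helper name 'active_lane_mask' is hypothetical, and the
64-bit intermediate is an assumption made here only to keep the C well-defined
for 32 threads.

```c
#include <stdint.h>

/* Hypothetical helper: one mask bit per active thread, for 1..32 threads.  */
uint32_t
active_lane_mask (unsigned nthreads)
{
  /* A 64-bit intermediate is used because shifting a 32-bit value by 32
     is undefined behavior in C.  */
  return (uint32_t) ((UINT64_C (1) << nthreads) - 1);
}
```

With a full warp of 32 threads this yields '0xffffffff'; with a single thread
it yields '0x1'.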

In turn, this change now uses PTX 'vote.all.pred' -- which even simplifies upon
the original code a little bit, see the following exemplary SASS 'diff' before
vs. after this change:

    [...]
              /*[...]*/                   SYNC                                                        (*"BRANCH_TARGETS .L_x_332"*)        }
      .L_x_332:
    -         /*[...]*/                   VOTE.ANY R9, PT, PT ;
    +         /*[...]*/                   VOTE.ALL P1, PT ;
    -         /*[...]*/                   ISETP.NE.U32.AND P1, PT, R9, -0x1, PT ;
    -         /*[...]*/              @!P1 BRA `(.L_x_333) ;
    +         /*[...]*/               @P1 BRA `(.L_x_333) ;
              /*[...]*/                   BPT.TRAP 0x1 ;
      .L_x_333:
    -         /*[...]*/               @P1 EXIT ;
    +         /*[...]*/              @!P1 EXIT ;
    [...]

	gcc/
	* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
	non-full-warp execution, via 'vote.all.pred'.
	gcc/testsuite/
	* gcc.target/nvptx/nvptx.exp
	(check_effective_target_default_ptx_isa_version_at_least_6_0):
	New.
	* gcc.target/nvptx/uniform-simt-2.c: Adjust.
	* gcc.target/nvptx/uniform-simt-5.c: New.

Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]" · 395ac041

Thomas Schwinge authored 9 months ago

	PR target/85463
	libgfortran/
	* runtime/minimal.c [__nvptx__] (exit): Don't override.
	libgomp/
	* config/nvptx/error.c (exit): Don't override.
	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.

Vect: Support IFN SAT_SUB for unsigned vector int · 2d11de35

Pan Li authored 9 months ago


This patch adds support for .SAT_SUB on unsigned vector integers.
Given the example code below:

void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
}

Before this patch:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
  ivtmp_56 = _77 * 8;
  vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
  vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);

  mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
  _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });

  .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
  vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
  vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
  vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
  ivtmp_76 = ivtmp_75 - _77;
  ...
}

After this patch:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
  ivtmp_60 = _76 * 8;
  vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
  vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);

  vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);

  .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, vect_patt_37.11_68);
  vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
  vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
  vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
  ivtmp_75 = ivtmp_74 - _76;
  ...
}
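
For reference, the scalar expression in the example loop is a branchless
unsigned saturating subtraction.  The sketch below places it next to a
straightforward reference form; both function names are made up for
illustration and are not part of the patch.

```c
#include <stdint.h>

/* Branchless form, as in the loop body of the example: the mask is all
   ones when x >= y and zero otherwise.  */
uint64_t
sat_sub_u64_branchless (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t) (x >= y));
}

/* Reference form: clamp the result at zero instead of wrapping around.  */
uint64_t
sat_sub_u64_ref (uint64_t x, uint64_t y)
{
  return x >= y ? x - y : 0;
}
```

Both functions return the same value for every pair of inputs; for example
5 - 3 gives 2, and 3 - 5 gives 0.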

The following test suites pass with this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression tests.

gcc/ChangeLog:

	* match.pd: Add new form for vector mode recog.
	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
	new match func decl;
	(vect_recog_build_binary_gimple_call): Extract helper func to
	build gcall with given internal_fn.
	(vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.

Signed-off-by: Pan Li <pan2.li@intel.com>

lto: Remove random_seed from section name. · 346f33e2

Michal Jires authored 1 year ago

This patch removes suffixes from section names during LTO linking.

These suffixes were originally added for ld -r to work (PR lto/44992).
They were added to all LTO object files, but are only useful before WPA.
After that they waste space, and if kept random, make LTO caching impossible.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.

gcc/lto/ChangeLog:

	* lto-common.cc (lto_section_with_id): Don't load suffix during LTRANS.

lto: Skip flag OPT_fltrans_output_list_. · ca43678c

Michal Jires authored 1 year ago

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.

RISC-V: Regenerate opt urls. · 037fc4d1

Robin Dapp authored 9 months ago

I wasn't aware that I needed to regenerate the opt urls when
adding an option.  This patch does that.

gcc/ChangeLog:

	* config/riscv/riscv.opt.urls: Regenerate.

[APX CCMP] Support ccmp for float compare · 0b6cea87

Hongyu Wang authored 10 months ago

The ccmp insn itself doesn't support fp compare, but x86 has the fp comi
insn, which sets EFLAGS and can therefore provide the scc input to ccmp.
Allow scalar fp compare in ix86_gen_ccmp_first, except for
ORDERED/UNORDERED compares, which cannot be identified in ccmp.

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_gen_ccmp_first):
	Add fp compare and check the allowed fp compare type.
	(ix86_gen_ccmp_next): Adjust compare_code input to ccmp for
	fp compare.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ccmp-1.c: Add test for fp compare.
	* gcc.target/i386/apx-ccmp-2.c: Likewise.

[APX CCMP] Adjust strategy for selecting ccmp candidates · 23db8730

Hongyu Wang authored 11 months ago

For the general ccmp scenario, the tree sequence looks like

_1 = (a < b)
_2 = (c < d)
_3 = _1 & _2

The current ccmp expansion tries both compare orders for _1 and _2,
compares the costs (cost/cost2) of expanding _1 or _2 first, then
returns the sequence with the lower cost.

It is possible that one expansion succeeds and the other fails.
For example, x86 has int ccmp but not fp ccmp, so a combined fp and
int comparison must be ordered such that the fp comparison happens
first.  The costs are not meaningful for failed expansions.

Check the expand_ccmp_next results ret and ret2, and return the valid one
before comparing costs.
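
The selection rule can be sketched in C as follows.  This is only an
illustration of the ordering of the checks, not the code in ccmp.cc: the
'struct expansion' type, its fields, and 'pick_expansion' are all
hypothetical, and the tie-break when costs are equal is an arbitrary choice
made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one attempted expansion order.  */
struct expansion
{
  bool ok;        /* Did the expansion succeed?  */
  unsigned cost;  /* Only meaningful when ok is true.  */
};

/* Return the expansion to use, or NULL if both attempts failed.  */
const struct expansion *
pick_expansion (const struct expansion *first,
                const struct expansion *second)
{
  /* Validity is checked first, so a failed attempt never wins on cost.  */
  if (!first->ok)
    return second->ok ? second : NULL;
  if (!second->ok)
    return first;
  /* Both succeeded: only now are the two costs comparable.  */
  return first->cost <= second->cost ? first : second;
}
```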

gcc/ChangeLog:

	* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
	expand_ccmp_next, returns the valid one first instead of
	comparing cost.

[APX CCMP] Support APX CCMP · c989e59f

Hongyu Wang authored 1 year ago

The APX CCMP feature implements a conditional compare, which executes the
compare only when EFLAGS matches a given condition.

CCMP introduces a default flags value (dfv): when the conditional compare
does not execute, it sets the flags directly according to dfv.

The instruction looks like

ccmpeq {dfv=sf,of,cf,zf}  %rax, %r16

This instruction tests whether EFLAGS matches condition code EQ. If so,
it compares %rax and %r16 like a legacy cmp. If not, EFLAGS is updated
according to dfv, which here means SF, OF, CF and ZF are set. PF is set
according to CF in dfv, and AF is always cleared.

The dfv part can be a combination of sf,of,cf,zf, like {dfv=cf,zf}, which
sets only CF and ZF and clears the others, or {dfv=}, which clears all of them.
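
The behavior described above can be modeled with a small C sketch.  It is a
toy model written from this description only, not an emulator: 'struct flags'
and 'ccmp_eq' are hypothetical names, the operands are named 'lhs' and 'rhs'
to avoid implying an assembler operand order, and the PF/AF results of a real
compare are deliberately not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy flag set; field names follow the flag names in the text.  */
struct flags
{
  bool sf, of, cf, zf, pf, af;
};

/* Model of a conditional compare with condition code EQ.  */
struct flags
ccmp_eq (struct flags in, uint64_t lhs, uint64_t rhs, struct flags dfv)
{
  struct flags out = { false, false, false, false, false, false };

  if (in.zf)  /* Condition code EQ holds when ZF is set.  */
    {
      /* Compare like a legacy cmp: flags of lhs - rhs.
         PF and AF of the real compare are not modeled here.  */
      uint64_t r = lhs - rhs;
      out.zf = r == 0;
      out.sf = (r >> 63) != 0;
      out.cf = lhs < rhs;
      out.of = (((lhs ^ rhs) & (lhs ^ r)) >> 63) != 0;
    }
  else
    {
      /* Condition failed: take the flags from dfv.  */
      out.sf = dfv.sf;
      out.of = dfv.of;
      out.cf = dfv.cf;
      out.zf = dfv.zf;
      out.pf = dfv.cf;   /* PF is set according to CF in dfv.  */
      out.af = false;    /* AF is always cleared.  */
    }
  return out;
}
```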

To enable CCMP, we implemented the target hooks TARGET_GEN_CCMP_FIRST and
TARGET_GEN_CCMP_NEXT to reuse the current ccmp infrastructure. We also
extended the cstorem4 optab to support storing different CCmodes to fit
the current ccmp infrastructure.

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_gen_ccmp_first): New function
	that tests if the first compare can be generated.
	(ix86_gen_ccmp_next): New function to emit a single compare and ccmp
	sequence.
	* config/i386/i386-opts.h (enum apx_features): Add apx_ccmp.
	* config/i386/i386-protos.h (ix86_gen_ccmp_first): New proto
	declare.
	(ix86_gen_ccmp_next): Likewise.
	(ix86_get_flags_cc): Likewise.
	* config/i386/i386.cc (ix86_flags_cc): New enum.
	(ix86_ccmp_dfv_mapping): New string array to map conditional
	code to dfv.
	(ix86_print_operand): Handle special dfv flag for CCMP.
	(ix86_get_flags_cc): New function to return x86 CC enum.
	(TARGET_GEN_CCMP_FIRST): Define.
	(TARGET_GEN_CCMP_NEXT): Likewise.
	* config/i386/i386.h (TARGET_APX_CCMP): Define.
	* config/i386/i386.md (@ccmp<mode>): New define_insn to support
	ccmp.
	(UNSPEC_APX_DFV): New unspec for ccmp dfv.
	(ALL_CC): New mode iterator.
	(cstorecc4): Change to ...
	(cstore<mode>4) ... this, use ALL_CC to loop through all
	available CCmodes.
	* config/i386/i386.opt (apx_ccmp): Add enum value for ccmp.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ccmp-1.c: New compile test.
	* gcc.target/i386/apx-ccmp-2.c: New runtime test.

[APX] Adjust target-support check [PR 115341] · f46d54a2

Hongyu Wang authored 9 months ago

The current target apxf check does not specify the sub-features that the
assembler supports, so the check with older binutils will fail at the
assemble stage for new apx features like NF, CCMP or CFCMOV. Adjust the
assembler check to cover all apx sub-features.

gcc/testsuite/ChangeLog:

	PR target/115341
	* lib/target-supports.exp (check_effective_target_apxf):
	Check for all apx sub-features.

Allow single-lane SLP in-order reductions · 4653b682

Richard Biener authored 1 year ago

The single-lane case isn't different from non-SLP, no re-association
implied.  But the transform stage cannot handle a conditional reduction
op which isn't checked during analysis - this makes it work, exercised
with a single-lane non-reduction-chain by gcc.target/i386/pr112464.c

	* tree-vect-loop.cc (vectorizable_reduction): Allow
	single-lane SLP in-order reductions.
	(vectorize_fold_left_reduction): Handle SLP reduction with
	conditional reduction op.

Add double reduction support for SLP vectorization · 2ee41ef7

Richard Biener authored 1 year ago

The following makes double reduction vectorization work when
using (single-lane) SLP vectorization.

	* tree-vect-loop.cc (vect_analyze_scalar_cycles_1): Queue
	double reductions in LOOP_VINFO_REDUCTIONS.
	(vect_create_epilog_for_reduction): Remove asserts disabling
	SLP for double reductions.
	(vectorizable_reduction): Analyze SLP double reductions
	only once and start off the correct places.
	* tree-vect-slp.cc (vect_get_and_check_slp_defs): Allow
	vect_double_reduction_def.
	(vect_build_slp_tree_2): Fix condition for the ignored
	reduction initial values.
	* tree-vect-stmts.cc (vect_analyze_stmt): Allow
	vect_double_reduction_def.

Allow single-lane COND_REDUCTION vectorization · 202a9c8f

Richard Biener authored 1 year ago

The following enables single-lane COND_REDUCTION vectorization.

	* tree-vect-loop.cc (vect_create_epilog_for_reduction):
	Adjust for single-lane COND_REDUCTION SLP vectorization.
	(vectorizable_reduction): Likewise.
	(vect_transform_cycle_phi): Likewise.

Relax COND_EXPR reduction vectorization SLP restriction · 28edeb14

Richard Biener authored 1 year ago

Allow single-lane SLP, except for the case where we need to swap the arms.

	* tree-vect-stmts.cc (vectorizable_condition): Allow
	single-lane SLP, but not when we need to swap then and
	else clause.

libgomp: Mark Loop transformation constructs as implemented in the implementation status · 6a6bab4b

Jakub Jelinek authored 9 months ago

The implementation has been committed in r15-1037.

2024-06-06  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (OpenMP 5.1 status): Mark Loop transformation constructs
	as implemented.

MIPS: Need COSTS_N_INSNS in mips_insn_cost · edd90d6d

YunQiang Su authored 9 months ago

In mips_insn_cost, COSTS_N_INSNS is missing when we return the cost
if count * ratio > 0.

gcc/
	* config/mips/mips.cc (mips_insn_cost): Add missing COSTS_N_INSNS
	to count.
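
To show why the missing wrapper matters, here is a minimal sketch of the
scale mismatch.  COSTS_N_INSNS is reproduced as I understand GCC's rtl.h
defines it; the two functions are hypothetical shapes of the return value
before and after such a fix, not the code in mips.cc.

```c
/* As in GCC's rtl.h, to my understanding: cost of N fast instructions.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical "before": a raw instruction count, which is on a
   different scale from costs expressed via COSTS_N_INSNS.  */
int
cost_unscaled (int count, int ratio)
{
  return count * ratio;
}

/* Hypothetical "after": the same count on the common cost scale.  */
int
cost_scaled (int count, int ratio)
{
  return COSTS_N_INSNS (count * ratio);
}
```

For a count of 2 and a ratio of 3, the unscaled value is 6 while the scaled
value is 24, so the unscaled result would look four times cheaper than it
should when compared against other costs.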

Refine testcase for power10. · fcfce55c

liuhongt authored 9 months ago

For power10, there are 3 extra REG_EQUIV notes containing (fix:SI.  To avoid
the failure, check that (fix:SI comes from the pattern, not from a note.

gcc/testsuite/ChangeLog:

	PR target/115365
	* gcc.dg/pr100927.c: Don't scan fix:SI from the note.

[libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__ · 67be156f

Alexandre Oliva authored 9 months ago

A proprietary embedded operating system that uses clang as its primary
compiler ships headers that require __clang__ to be defined.  Defining
that macro causes libstdc++ to adopt workarounds that work for clang
but that break for GCC.

So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
rather than for __clang__, so that a GCC variant that adds -D__clang__
to satisfy system headers can also -D_GLIBCXX_CLANG=0 to avoid
workarounds that are not meant for GCC.
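
The convention can be sketched with plain preprocessor logic.  This is an
illustration only, assuming a config header along these lines; the macro
MYLIB_CLANG and the function 'uses_clang_workarounds' are invented names and
do not reproduce the contents of bits/c++config.

```c
/* Hypothetical library config logic modeled on the convention above.
   MYLIB_CLANG is an invented name.  */
#if defined(MYLIB_CLANG)
# if MYLIB_CLANG <= 0
#  undef MYLIB_CLANG   /* -DMYLIB_CLANG=0 opts out explicitly.  */
# endif
#elif defined(__clang__)
# define MYLIB_CLANG 1 /* Default: follow the compiler's own macro.  */
#endif

/* Library code then tests the library macro, never __clang__ itself.  */
int
uses_clang_workarounds (void)
{
#ifdef MYLIB_CLANG
  return 1;
#else
  return 0;
#endif
}
```

Built with '-D__clang__ -DMYLIB_CLANG=0', the function returns 0 even though
'__clang__' is defined, which is the situation the commit describes.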

I've left fast_float and ryu files alone, their tests for __clang__
don't seem to be harmful for GCC, they don't include bits/c++config,
and patching such third-party files would just make trouble for
updating them without visible benefit.  pstl_config.h, though also
imported, required adjustment.


for  libstdc++-v3/ChangeLog

	* include/bits/c++config (_GLIBCXX_CLANG): Define or undefine.
	* include/bits/locale_facets_nonio.tcc: Test for it.
	* include/bits/stl_bvector.h: Likewise.
	* include/c_compatibility/stdatomic.h: Likewise.
	* include/experimental/bits/simd.h: Likewise.
	* include/experimental/bits/simd_builtin.h: Likewise.
	* include/experimental/bits/simd_detail.h: Likewise.
	* include/experimental/bits/simd_x86.h: Likewise.
	* include/experimental/simd: Likewise.
	* include/std/complex: Likewise.
	* include/std/ranges: Likewise.
	* include/std/variant: Likewise.
	* include/pstl/pstl_config.h: Likewise.

Adjust rtx_cost for MEM to enable more simplification · 961dd0d6

liuhongt authored 11 months ago

For CONST_VECTOR_DUPLICATE_P in constant_pool, it is just broadcast or
variants in ix86_vector_duplicate_simode_const.
Adjust the cost to COSTS_N_INSNS (2) + speed which should be a little
bit larger than broadcast.

gcc/ChangeLog:
	PR target/114428
	* config/i386/i386.cc (ix86_rtx_costs): Adjust cost for
	CONST_VECTOR_DUPLICATE_P in constant_pool.
	* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
	Remove static.
	* config/i386/i386-protos.h (ix86_broadcast_from_constant):
	Declare.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr114428.c: New test.

Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode. · 7876cde2

liuhongt authored 11 months ago

When mask is ((1 << (prec - imm)) - 1), which clears the upper bits
of A, the expression can be simplified to LSHIFTRT.

i.e., simplify
(and:v8hi
  (ashiftrt:v8hi A 8)
  (const_vector 0xff x8))
to
(lshiftrt:v8hi A 8)
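
The identity can be checked on a single 16-bit lane in C.  This is a scalar
sketch of one element of the v8hi example, with function names chosen here
for illustration; it does not exercise the RTL simplifier itself.

```c
#include <stdint.h>

/* One 16-bit lane with imm = 8: arithmetic shift, then clear the upper
   bits with the mask (1 << (16 - 8)) - 1 = 0xff.  */
uint16_t
ashiftrt_then_mask (int16_t a)
{
  /* Right-shifting a negative value is implementation-defined in C;
     GCC performs an arithmetic shift.  */
  return (uint16_t) ((a >> 8) & 0xff);
}

/* The simplified form: a single logical shift of the same bits.  */
uint16_t
lshiftrt (int16_t a)
{
  return (uint16_t) ((uint16_t) a >> 8);
}
```

The mask removes exactly the sign-extension bits that the arithmetic shift
introduced, so both functions agree for every 16-bit input.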

gcc/ChangeLog:

	PR target/114428
	* simplify-rtx.cc
	(simplify_context::simplify_binary_operation_1):
	Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
	specific mask.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr114428-1.c: New test.

Daily bump. · 10cb3336

GCC Administrator authored 9 months ago

Jun 05, 2024

contrib: Fix spelling and capitalization in header-tools · 66fa2f10

Jonathan Wakely authored 9 months ago

contrib/header-tools/ChangeLog:

	* README: Fix spelling and capitalization typos.
	* gcc-order-headers: Fix spelling typo.

contrib: header-tools scripts updated to python3 · ac6fb0ff

Sundeep KOKKONDA authored 11 months ago


The scripts in contrib/header-tools/ are incompatible with python3.
This updates them to use python3.

contrib/header-tools/ChangeLog:

	* count-headers: Adapt to Python 3.
	* gcc-order-headers: Likewise.
	* graph-header-logs: Likewise.
	* graph-include-web: Likewise.
	* headerutils.py: Likewise.
	* included-by: Likewise.
	* reduce-headers: Likewise.
	* replace-header: Likewise.
	* show-headers: Likewise.

Signed-off-by: Sundeep KOKKONDA <sundeep.kokkonda@windriver.com>

check_GNU_style: Use raw strings. · 03e1a727

Robin Dapp authored 10 months ago

This silences some warnings when using check_GNU_style.

contrib/ChangeLog:

	* check_GNU_style_lib.py: Use raw strings for regexps.

RISC-V: Introduce -mvector-strict-align. · 68b0742a

Robin Dapp authored 9 months ago

This patch disables movmisalign by default and introduces
the -mno-vector-strict-align option to override it and re-enable
movmisalign.  For now, generic-ooo is the only uarch that supports
misaligned vector access.

The patch also adds a check_effective_target_riscv_v_misalign_ok to
the testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

Changes from v3:
 - Addressed Kito's comments.
 - Made -mscalar-strict-align a real alias.

gcc/ChangeLog:

	* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
	Move from here...
	* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
	...to here and map to riscv_vector_unaligned_access_p.
	* config/riscv/riscv.opt: Add -mvector-strict-align.
	* config/riscv/riscv.cc (struct riscv_tune_param): Add
	vector_unaligned_access.
	(riscv_override_options_internal): Set
	riscv_vector_unaligned_access_p.
	* doc/invoke.texi: Document -mvector-strict-align.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add
	check_effective_target_riscv_v_misalign_ok.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
	-mno-vector-strict-align.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
	* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.

AArch64: enable new predicate tuning for Neoverse cores. · 3eb9f6ea

Tamar Christina authored 9 months ago

This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2.
It is kept off for generic codegen.

Note that the tests add +sve even though they are in aarch64-sve.exp: if the
testsuite is run with a forced SVE-off option, e.g. -march=armv8-a+nosve, then
the intrinsics end up being disabled because the -march is preferred over the
-mcpu even though the -mcpu comes later.

This prevents the tests from failing in such runs.

gcc/ChangeLog:

	* config/aarch64/tuning_models/neoversen2.h (neoversen2_tunings): Add
	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
	* config/aarch64/tuning_models/neoversev1.h (neoversev1_tunings): Add
	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
	* config/aarch64/tuning_models/neoversev2.h (neoversev2_tunings): Add
	AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/pred_clobber_1.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_2.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_3.c: New test.
	* gcc.target/aarch64/sve/pred_clobber_4.c: New test.

AArch64: add new alternative with early clobber to patterns · 2de3bbde

Tamar Christina authored 9 months ago

This patch adds new alternatives to the patterns which are affected.  The new
alternatives with the conditional early clobbers are added before the normal
ones in order for LRA to prefer them in the event that we have enough free
registers to accommodate them.

In case register pressure is too high, the normal alternatives will be
preferred before a reload is considered, as we would rather have the tie than
a spill.

Tests are in the next patch.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>,
	*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
	@aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
	*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add
	new early clobber alternative.
	* config/aarch64/aarch64-sve2.md
	(@aarch64_pred_<sve_int_op><mode>): Likewise.

AArch64: add new tuning param and attribute for enabling conditional early clobber · 35f17c68

Tamar Christina authored 9 months ago

This adds a new tuning parameter AARCH64_EXTRA_TUNE_AVOID_PRED_RMW for AArch64 to
allow us to conditionally enable the early clobber alternatives based on the
tuning models.

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def
	(AVOID_PRED_RMW): New.
	* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
	* config/aarch64/aarch64.md (pred_clobber): New.
	(arch_enabled): Use it.

AArch64: convert several predicate patterns to new compact syntax · fd489889

Tamar Christina authored 9 months ago

This converts the single alternative patterns to the new compact syntax such
that when I add the new alternatives it's clearer what's being changed.

Note that this will spew out a bunch of warnings from geninsn as it'll warn that
@ is useless for a single alternative pattern.  These are not fatal so won't
break the build and are only temporary.

No change in functionality is expected with this patch.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z,
	*aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc,
	*aarch64_rdffr_cc): Convert to compact syntax.
	* config/aarch64/aarch64-sve2.md
	(@aarch64_pred_<sve_int_op><mode>): Likewise.

openmp: OpenMP loop transformation support · 804c0f35

Jakub Jelinek authored 9 months ago

This patch is a largely rewritten version of the
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631764.html
patch set, which I had promised to adjust the way I'd like it but didn't
get to until now.
The previous series together in diffstat was
 176 files changed, 12107 insertions(+), 298 deletions(-)
This patch is
 197 files changed, 10843 insertions(+), 212 deletions(-)
and diff between the old series and new patch is
 268 files changed, 8053 insertions(+), 9231 deletions(-)

Only the 5.1/5.2 tile/unroll constructs are supported; in various
places some preparations for the other 6.0 loop transformation
constructs (interchange/reverse/fuse) are done, but certainly
not completely and not everywhere.  The important difference is that
because tile/unroll partial map 1:1 the original loops to generated
canonical loops and add another set of generated loops without canonical
form inside of it, the tile/unroll partial constructs are terminal
for the generated loop, one can't have some loops from the tile or
unroll partial and some further loops from inside the body of that
construct.
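
For readers unfamiliar with the two supported constructs, here is a small
usage sketch in C.  It assumes a compiler with OpenMP 5.1 loop transformation
support and -fopenmp; without -fopenmp the pragmas are ignored and the loops
run as written, so the results are the same either way.  The function names
and the tile/unroll factors are arbitrary choices for illustration.

```c
#define N 32

/* Tile a two-deep rectangular loop nest into 4x8 tiles.  */
long
sum_tiled (int a[N][N])
{
  long sum = 0;
  #pragma omp tile sizes (4, 8)
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      sum += a[i][j];
  return sum;
}

/* Partially unroll a single loop by a factor of 4.  */
long
sum_unrolled (const int *v, int n)
{
  long sum = 0;
  #pragma omp unroll partial (4)
  for (int i = 0; i < n; ++i)
    sum += v[i];
  return sum;
}
```

Both transformations only restructure the iteration space; the computed sums
are identical to those of the untransformed loops.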
The GENERIC representation attempts to match what the standard specifies,
so there are separate OMP_TILE and OMP_UNROLL trees.  If for a particular
loop in a loop nest of some OpenMP loop it awaits a generated loop from a
nested loop, or if in OMP_LOOPXFORM_LOWERED OMP_TILE/UNROLL construct
a generated loop has been moved to some surrounding construct, that
particular loop is represented by all NULL_TREEs in the
OMP_FOR_{INIT,COND,INCR,ORIG_DECLS} vector.
The lowering of the loop transforming constructs is done at gimplification
time, at the start of gimplify_omp_for.
I think this way it is more maintainable over magic clauses with various
loop depths on the other looping constructs or the magic OMP_LOOP_TRANS
construct.
Though, I admit I'm still undecided how to represent the OpenMP 6.0
loop transformation case of say:
  #pragma omp for collapse (4)
  for (int i = 0; i < 32; ++i)
  #pragma omp interchange permutation (2, 1)
  #pragma omp reverse
  for (int j = 0; j < 32; ++j)
  #pragma omp reverse
  for (int k = 0; k < 32; ++k)
  for (int l = 0; l < 32; ++l)
    ;
Surely the i loop would go to first vector elements of OMP_FOR_*
of the work-sharing loop, then 2 loops are expecting generated loops
from interchange which would be inside of the body.  But the innermost
l loop isn't part of the interchange, so the question is where to
put it.  One possibility is to have it in the 4th loop of the OMP_FOR,
another possibility would be to add some artificial construct inside
of the OMP_INTERCHANGE and 2 OMP_REVERSE bodies which would contain
the inner loop(s), e.g. it could be OMP_INTERCHANGE without permutation
clause or some artificial ones or whatever.

I've recently raised various unclear things in the 5.1/5.2/TRs versions
regarding loop transformations, in particular
https://github.com/OpenMP/spec/issues/3908
https://github.com/OpenMP/spec/issues/3909
(sorry, private links unless you have OpenMP membership).  Until those
are resolved, I have a sorry on trying to mix generated loops with
non-rectangular loops (way too many questions need to be answered before
that can be done) and similarly for mixing non-perfectly nested loops
with generated loops (again, it can be implemented somehow, but is way
too unclear).  The second issue is mostly about data sharing, which is
ambiguous, the patch makes the artificial iterators of the loops effectively
private in the associated constructs (more like local), but for user
iterators doesn't do anything in particular, so for now one needs to use
explicit data sharing clauses on the non-loop transformation OpenMP looping
constructs or surrounding parallel/task/target etc.

2024-06-05  Jakub Jelinek  <jakub@redhat.com>
	    Frederik Harwath  <frederik@codesourcery.com>
	    Sandra Loosemore  <sandra@codesourcery.com>

gcc/
	* tree.def (OMP_TILE, OMP_UNROLL): New tree codes.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_PARTIAL,
	OMP_CLAUSE_FULL and OMP_CLAUSE_SIZES.
	* tree.h (OMP_LOOPXFORM_CHECK): Define.
	(OMP_LOOPXFORM_LOWERED): Define.
	(OMP_CLAUSE_PARTIAL_EXPR): Define.
	(OMP_CLAUSE_SIZES_LIST): Define.
	* tree.cc (omp_clause_num_ops, omp_clause_code_name): Add entries
	for OMP_CLAUSE_{PARTIAL,FULL,SIZES}.
	* tree-pretty-print.cc (dump_omp_clause): Handle
	OMP_CLAUSE_{PARTIAL,FULL,SIZES}.
	(dump_generic_node): Handle OMP_TILE and OMP_UNROLL.  Skip printing
	loops with NULL OMP_FOR_INIT (node) vector element.
	* gimplify.cc (is_gimple_stmt): Handle OMP_TILE and OMP_UNROLL.
	(gimplify_omp_taskloop_expr): For SAVE_EXPR use gimplify_save_expr.
	(gimplify_omp_loop_xform): New function.
	(gimplify_omp_for): Call omp_maybe_apply_loop_xforms and if that
	reshuffles what the passed pointer points to, retry or return GS_OK.
	Handle OMP_TILE and OMP_UNROLL.
	(gimplify_omp_loop): Call omp_maybe_apply_loop_xforms and if that
	reshuffles what the passed pointer points to, return GS_OK.
	(gimplify_expr): Handle OMP_TILE and OMP_UNROLL.
	* omp-general.h (omp_loop_number_of_iterations,
	omp_maybe_apply_loop_xforms): Declare.
	* omp-general.cc (omp_adjust_for_condition): For LE_EXPR and GE_EXPR
	with pointers, don't add/subtract one, but the size of what the
	pointer points to.
	(omp_loop_number_of_iterations, omp_apply_tile,
	find_nested_loop_xform, omp_maybe_apply_loop_xforms): New functions.
gcc/c-family/
	* c-common.h (c_omp_find_generated_loop): Declare.
	* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_TILE and
	OMP_UNROLL.
	* c-omp.cc (c_finish_omp_for): Handle generated loops.
	(c_omp_is_loop_iterator): Likewise.
	(c_find_nested_loop_xform_r, c_omp_find_generated_loop): New
	functions.
	(c_omp_check_loop_iv): Handle generated loops.  For now sorry
	on mixing non-rectangular loop with generated loops.
	(c_omp_check_loop_binding_exprs): For now sorry on mixing
	imperfect loops with generated loops.
	(c_omp_directives): Uncomment tile and unroll entries.
	* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE and
	PRAGMA_OMP_UNROLL, change PRAGMA_OMP__LAST_ to the latter.
	(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
	PRAGMA_OMP_CLAUSE_PARTIAL.
	* c-pragma.cc (omp_pragmas_simd): Add tile and unroll omp pragmas.
gcc/c/
	* c-parser.cc (c_parser_skip_std_attribute_spec_seq): New function.
	(check_omp_intervening_code): Reject imperfectly nested tile.
	(c_parser_compound_statement_nostart): If want_nested_loop, use
	c_parser_omp_next_tokens_can_be_canon_loop instead of just checking
	for RID_FOR keyword.
	(c_parser_omp_clause_name): Handle full and partial clause names.
	(c_parser_omp_clause_allocate): Remove spurious semicolon.
	(c_parser_omp_clause_full, c_parser_omp_clause_partial): New
	functions.
	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL and
	PRAGMA_OMP_CLAUSE_PARTIAL.
	(c_parser_omp_next_tokens_can_be_canon_loop): New function.
	(c_parser_omp_loop_nest): Parse C23 attributes.  Handle tile/unroll
	constructs.  Use c_parser_omp_next_tokens_can_be_canon_loop instead
	of just checking for RID_FOR keyword.  Only add_stmt (body) if it is
	non-NULL.
	(c_parser_omp_for_loop): Rename tiling variable to oacc_tiling.  For
	OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST.
	Use c_parser_omp_next_tokens_can_be_canon_loop instead of just
	checking for RID_FOR keyword.  Remove spurious semicolon.  Don't call
	c_omp_check_loop_binding_exprs if stmt is NULL.  Skip generated loops.
	(c_parser_omp_tile_sizes, c_parser_omp_tile): New functions.
	(OMP_UNROLL_CLAUSE_MASK): Define.
	(c_parser_omp_unroll): New function.
	(c_parser_omp_construct): Handle PRAGMA_OMP_TILE and
	PRAGMA_OMP_UNROLL.
	* c-typeck.cc (c_finish_omp_clauses): Adjust wording of some of the
	conflicting clause diagnostic messages to include word clause.
	Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES} and diagnose full vs. partial
	conflict.
gcc/cp/
	* cp-tree.h (dependent_omp_for_p): Add another tree argument.
	* parser.cc (check_omp_intervening_code): Reject imperfectly nested
	tile.
	(cp_parser_statement_seq_opt): If want_nested_loop, use
	cp_parser_next_tokens_can_be_canon_loop instead of just checking
	for RID_FOR keyword.
	(cp_parser_omp_clause_name): Handle full and partial clause names.
	(cp_parser_omp_clause_full, cp_parser_omp_clause_partial): New
	functions.
	(cp_parser_omp_all_clauses): Formatting fix.  Handle
	PRAGMA_OMP_CLAUSE_PARTIAL and PRAGMA_OMP_CLAUSE_FULL.
	(cp_parser_next_tokens_can_be_canon_loop): New function.
	(cp_parser_omp_loop_nest): Parse C++11 attributes.  Handle tile/unroll
	constructs.  Use cp_parser_next_tokens_can_be_canon_loop instead
	of just checking for RID_FOR keyword.  Only add_stmt
	cp_parser_omp_loop_nest result if it is non-NULL.
	(cp_parser_omp_for_loop): Rename tiling variable to oacc_tiling.  For
	OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST.
	Use cp_parser_next_tokens_can_be_canon_loop instead of just
	checking for RID_FOR keyword.  Remove spurious semicolon.  Don't call
	c_omp_check_loop_binding_exprs if stmt is NULL.  Skip and/or handle
	generated loops.  Remove spurious ()s around & operands.
	(cp_parser_omp_tile_sizes, cp_parser_omp_tile): New functions.
	(OMP_UNROLL_CLAUSE_MASK): Define.
	(cp_parser_omp_unroll): New function.
	(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE and
	PRAGMA_OMP_UNROLL.
	(cp_parser_pragma): Likewise.
	* semantics.cc (finish_omp_clauses): Don't call
	fold_build_cleanup_point_expr for cases which obviously won't need it,
	like checked INTEGER_CSTs.  Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}
	and diagnose full vs. partial conflict.  Adjust wording of some of the
	conflicting clause diagnostic messages to include word clause.
	(finish_omp_for): Use decl equal to global_namespace as a marker for
	generated loop.  Pass also body to dependent_omp_for_p.  Skip
	generated loops.
	(finish_omp_for_block): Skip generated loops.
	* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}.
	(tsubst_stmt): Handle OMP_TILE and OMP_UNROLL.  Handle or skip
	generated loops.
	(dependent_omp_for_p): Add body argument.  If declv vector element
	is NULL, find generated loop.
	* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_TILE and OMP_UNROLL.
	(cp_fold_r): Likewise.
	(cp_genericize_r): Likewise.  Skip generated loops.
gcc/fortran/
	* gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL,
	ST_OMP_END_UNROLL, ST_OMP_TILE and ST_OMP_END_TILE.
	(struct gfc_omp_clauses): Add sizes_list, partial, full and erroneous
	members.
	(enum gfc_exec_op): Add EXEC_OMP_UNROLL and EXEC_OMP_TILE.
	(gfc_expr_list_len): Declare.
	* match.h (gfc_match_omp_tile, gfc_match_omp_unroll): Declare.
	* openmp.cc (gfc_get_location): Declare.
	(gfc_free_omp_clauses): Free sizes_list.
	(match_oacc_expr_list): Rename to ...
	(match_omp_oacc_expr_list): ... this.  Add is_omp argument and
	change diagnostic wording if it is true.
	(enum omp_mask2): Add OMP_CLAUSE_{FULL,PARTIAL,SIZES}.
	(gfc_match_omp_clauses): Parse full, partial and sizes clauses.
	(gfc_match_oacc_wait): Use match_omp_oacc_expr_list instead of
	match_oacc_expr_list.
	(OMP_UNROLL_CLAUSES, OMP_TILE_CLAUSES): Define.
	(gfc_match_omp_tile, gfc_match_omp_unroll): New functions.
	(resolve_omp_clauses): Diagnose full vs. partial clause conflict.
	Resolve sizes clause arguments.
	(find_nested_loop_in_chain): Use switch instead of series of ifs.
	Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
	(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse to
	list length of sizes_list if present.
	(gfc_resolve_do_iterator): Return for EXEC_OMP_TILE or
	EXEC_OMP_UNROLL.
	(restructure_intervening_code): Remove spurious ()s around & operands.
	(is_outer_iteration_variable): Handle EXEC_OMP_TILE and
	EXEC_OMP_UNROLL.
	(check_nested_loop_in_chain): Likewise.
	(expr_is_invariant): Likewise.
	(resolve_omp_do): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.  Diagnose
	tile without sizes clause.  Use sizes_list length for count if
	non-NULL.  Set code->ext.omp_clauses->erroneous on loops where we've
	reported diagnostics.  Sorry for mixing non-rectangular loops with
	generated loops.
	(omp_code_to_statement): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
	(gfc_resolve_omp_directive): Likewise.
	* parse.cc (decode_omp_directive): Parse end tile, end unroll, tile
	and unroll.  Move nothing entry alphabetically.
	(case_exec_markers): Add ST_OMP_TILE and ST_OMP_UNROLL.
	(gfc_ascii_statement): Handle ST_OMP_END_TILE, ST_OMP_END_UNROLL,
	ST_OMP_TILE and ST_OMP_UNROLL.
	(parse_omp_do): Add nested argument.  Handle ST_OMP_TILE and
	ST_OMP_UNROLL.
	(parse_omp_structured_block): Adjust parse_omp_do caller.
	(parse_executable): Likewise.  Handle ST_OMP_TILE and ST_OMP_UNROLL.
	* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE and
	EXEC_OMP_UNROLL.
	(gfc_resolve_code): Likewise.
	* st.cc (gfc_free_statement): Likewise.
	* trans.cc (trans_code): Likewise.
	* trans-openmp.cc (gfc_trans_omp_clauses): Handle full, partial and
	sizes clauses.  Use tree_cons + nreverse instead of
	temporary vector and build_tree_list_vec for tile_list handling.
	(gfc_expr_list_len): New function.
	(gfc_trans_omp_do): Rename tile to oacc_tile.  Handle sizes clause.
	Don't assert code->op is EXEC_DO.  Handle EXEC_OMP_TILE and
	EXEC_OMP_UNROLL.
	(gfc_trans_omp_directive): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
	* dump-parse-tree.cc (show_omp_clauses): Dump full, partial and
	sizes clauses.
	(show_omp_node): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
	(show_code_node): Likewise.
gcc/testsuite/
	* c-c++-common/gomp/attrs-tile-1.c: New test.
	* c-c++-common/gomp/attrs-tile-2.c: New test.
	* c-c++-common/gomp/attrs-tile-3.c: New test.
	* c-c++-common/gomp/attrs-tile-4.c: New test.
	* c-c++-common/gomp/attrs-tile-5.c: New test.
	* c-c++-common/gomp/attrs-tile-6.c: New test.
	* c-c++-common/gomp/attrs-unroll-1.c: New test.
	* c-c++-common/gomp/attrs-unroll-2.c: New test.
	* c-c++-common/gomp/attrs-unroll-3.c: New test.
	* c-c++-common/gomp/attrs-unroll-inner-1.c: New test.
	* c-c++-common/gomp/attrs-unroll-inner-2.c: New test.
	* c-c++-common/gomp/attrs-unroll-inner-3.c: New test.
	* c-c++-common/gomp/attrs-unroll-inner-4.c: New test.
	* c-c++-common/gomp/attrs-unroll-inner-5.c: New test.
	* c-c++-common/gomp/imperfect-attributes.c: Adjust expected
	diagnostics.
	* c-c++-common/gomp/imperfect-loop-nest.c: New test.
	* c-c++-common/gomp/ordered-5.c: New test.
	* c-c++-common/gomp/scan-7.c: New test.
	* c-c++-common/gomp/tile-1.c: New test.
	* c-c++-common/gomp/tile-2.c: New test.
	* c-c++-common/gomp/tile-3.c: New test.
	* c-c++-common/gomp/tile-4.c: New test.
	* c-c++-common/gomp/tile-5.c: New test.
	* c-c++-common/gomp/tile-6.c: New test.
	* c-c++-common/gomp/tile-7.c: New test.
	* c-c++-common/gomp/tile-8.c: New test.
	* c-c++-common/gomp/tile-9.c: New test.
	* c-c++-common/gomp/tile-10.c: New test.
	* c-c++-common/gomp/tile-11.c: New test.
	* c-c++-common/gomp/tile-12.c: New test.
	* c-c++-common/gomp/tile-13.c: New test.
	* c-c++-common/gomp/tile-14.c: New test.
	* c-c++-common/gomp/tile-15.c: New test.
	* c-c++-common/gomp/unroll-1.c: New test.
	* c-c++-common/gomp/unroll-2.c: New test.
	* c-c++-common/gomp/unroll-3.c: New test.
	* c-c++-common/gomp/unroll-4.c: New test.
	* c-c++-common/gomp/unroll-5.c: New test.
	* c-c++-common/gomp/unroll-6.c: New test.
	* c-c++-common/gomp/unroll-7.c: New test.
	* c-c++-common/gomp/unroll-8.c: New test.
	* c-c++-common/gomp/unroll-9.c: New test.
	* c-c++-common/gomp/unroll-inner-1.c: New test.
	* c-c++-common/gomp/unroll-inner-2.c: New test.
	* c-c++-common/gomp/unroll-inner-3.c: New test.
	* c-c++-common/gomp/unroll-non-rect-1.c: New test.
	* c-c++-common/gomp/unroll-non-rect-2.c: New test.
	* c-c++-common/gomp/unroll-non-rect-3.c: New test.
	* c-c++-common/gomp/unroll-simd-1.c: New test.
	* gcc.dg/gomp/attrs-4.c: Adjust expected diagnostics.
	* gcc.dg/gomp/for-1.c: Likewise.
	* gcc.dg/gomp/for-11.c: Likewise.
	* g++.dg/gomp/attrs-4.C: Likewise.
	* g++.dg/gomp/for-1.C: Likewise.
	* g++.dg/gomp/pr94512.C: Likewise.
	* g++.dg/gomp/tile-1.C: New test.
	* g++.dg/gomp/tile-2.C: New test.
	* g++.dg/gomp/unroll-1.C: New test.
	* g++.dg/gomp/unroll-2.C: New test.
	* g++.dg/gomp/unroll-3.C: New test.
	* gfortran.dg/gomp/inner-loops-1.f90: New test.
	* gfortran.dg/gomp/inner-loops-2.f90: New test.
	* gfortran.dg/gomp/pure-1.f90: Add tests for !$omp unroll
	and !$omp tile.
	* gfortran.dg/gomp/pure-2.f90: Remove those tests from here.
	* gfortran.dg/gomp/scan-9.f90: New test.
	* gfortran.dg/gomp/tile-1.f90: New test.
	* gfortran.dg/gomp/tile-2.f90: New test.
	* gfortran.dg/gomp/tile-3.f90: New test.
	* gfortran.dg/gomp/tile-4.f90: New test.
	* gfortran.dg/gomp/tile-5.f90: New test.
	* gfortran.dg/gomp/tile-6.f90: New test.
	* gfortran.dg/gomp/tile-7.f90: New test.
	* gfortran.dg/gomp/tile-8.f90: New test.
	* gfortran.dg/gomp/tile-9.f90: New test.
	* gfortran.dg/gomp/tile-10.f90: New test.
	* gfortran.dg/gomp/tile-imperfect-nest-1.f90: New test.
	* gfortran.dg/gomp/tile-imperfect-nest-2.f90: New test.
	* gfortran.dg/gomp/tile-inner-loops-1.f90: New test.
	* gfortran.dg/gomp/tile-inner-loops-2.f90: New test.
	* gfortran.dg/gomp/tile-inner-loops-3.f90: New test.
	* gfortran.dg/gomp/tile-inner-loops-4.f90: New test.
	* gfortran.dg/gomp/tile-inner-loops-5.f90: New test.
	* gfortran.dg/gomp/tile-inner-loops-6.f90: New test.
	* gfortran.dg/gomp/tile-inner-loops-7.f90: New test.
	* gfortran.dg/gomp/tile-inner-loops-8.f90: New test.
	* gfortran.dg/gomp/tile-non-rectangular-1.f90: New test.
	* gfortran.dg/gomp/tile-non-rectangular-2.f90: New test.
	* gfortran.dg/gomp/tile-non-rectangular-3.f90: New test.
	* gfortran.dg/gomp/tile-unroll-1.f90: New test.
	* gfortran.dg/gomp/tile-unroll-2.f90: New test.
	* gfortran.dg/gomp/unroll-1.f90: New test.
	* gfortran.dg/gomp/unroll-2.f90: New test.
	* gfortran.dg/gomp/unroll-3.f90: New test.
	* gfortran.dg/gomp/unroll-4.f90: New test.
	* gfortran.dg/gomp/unroll-5.f90: New test.
	* gfortran.dg/gomp/unroll-6.f90: New test.
	* gfortran.dg/gomp/unroll-7.f90: New test.
	* gfortran.dg/gomp/unroll-8.f90: New test.
	* gfortran.dg/gomp/unroll-9.f90: New test.
	* gfortran.dg/gomp/unroll-10.f90: New test.
	* gfortran.dg/gomp/unroll-11.f90: New test.
	* gfortran.dg/gomp/unroll-12.f90: New test.
	* gfortran.dg/gomp/unroll-13.f90: New test.
	* gfortran.dg/gomp/unroll-inner-loop-1.f90: New test.
	* gfortran.dg/gomp/unroll-inner-loop-2.f90: New test.
	* gfortran.dg/gomp/unroll-no-clause-1.f90: New test.
	* gfortran.dg/gomp/unroll-non-rect-1.f90: New test.
	* gfortran.dg/gomp/unroll-non-rect-2.f90: New test.
	* gfortran.dg/gomp/unroll-simd-1.f90: New test.
	* gfortran.dg/gomp/unroll-simd-2.f90: New test.
	* gfortran.dg/gomp/unroll-simd-3.f90: New test.
	* gfortran.dg/gomp/unroll-tile-1.f90: New test.
	* gfortran.dg/gomp/unroll-tile-2.f90: New test.
	* gfortran.dg/gomp/unroll-tile-inner-1.f90: New test.
libgomp/
	* testsuite/libgomp.c-c++-common/imperfect-transform-1.c: New test.
	* testsuite/libgomp.c-c++-common/imperfect-transform-2.c: New test.
	* testsuite/libgomp.c-c++-common/matrix-1.h: New test.
	* testsuite/libgomp.c-c++-common/matrix-constant-iter.h: New test.
	* testsuite/libgomp.c-c++-common/matrix-helper.h: New test.
	* testsuite/libgomp.c-c++-common/matrix-no-directive-1.c: New test.
	* testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c:
	New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-distribute-parallel-for-1.c:
	New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-for-1.c: New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-parallel-for-1.c: New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-1.c:
	New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-simd-1.c:
	New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-target-parallel-for-1.c:
	New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-target-teams-distribute-parallel-for-1.c:
	New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-taskloop-1.c: New test.
	* testsuite/libgomp.c-c++-common/matrix-omp-teams-distribute-parallel-for-1.c:
	New test.
	* testsuite/libgomp.c-c++-common/matrix-simd-1.c: New test.
	* testsuite/libgomp.c-c++-common/matrix-transform-variants-1.h:
	New test.
	* testsuite/libgomp.c-c++-common/target-imperfect-transform-1.c:
	New test.
	* testsuite/libgomp.c-c++-common/target-imperfect-transform-2.c:
	New test.
	* testsuite/libgomp.c-c++-common/unroll-1.c: New test.
	* testsuite/libgomp.c-c++-common/unroll-non-rect-1.c: New test.
	* testsuite/libgomp.c++/matrix-no-directive-unroll-full-1.C: New test.
	* testsuite/libgomp.c++/tile-2.C: New test.
	* testsuite/libgomp.c++/tile-3.C: New test.
	* testsuite/libgomp.c++/unroll-1.C: New test.
	* testsuite/libgomp.c++/unroll-2.C: New test.
	* testsuite/libgomp.c++/unroll-full-tile.C: New test.
	* testsuite/libgomp.fortran/imperfect-transform-1.f90: New test.
	* testsuite/libgomp.fortran/imperfect-transform-2.f90: New test.
	* testsuite/libgomp.fortran/inner-1.f90: New test.
	* testsuite/libgomp.fortran/nested-fn.f90: New test.
	* testsuite/libgomp.fortran/target-imperfect-transform-1.f90: New test.
	* testsuite/libgomp.fortran/target-imperfect-transform-2.f90: New test.
	* testsuite/libgomp.fortran/tile-1.f90: New test.
	* testsuite/libgomp.fortran/tile-2.f90: New test.
	* testsuite/libgomp.fortran/tile-unroll-1.f90: New test.
	* testsuite/libgomp.fortran/tile-unroll-2.f90: New test.
	* testsuite/libgomp.fortran/tile-unroll-3.f90: New test.
	* testsuite/libgomp.fortran/tile-unroll-4.f90: New test.
	* testsuite/libgomp.fortran/unroll-1.f90: New test.
	* testsuite/libgomp.fortran/unroll-2.f90: New test.
	* testsuite/libgomp.fortran/unroll-3.f90: New test.
	* testsuite/libgomp.fortran/unroll-4.f90: New test.
	* testsuite/libgomp.fortran/unroll-5.f90: New test.
	* testsuite/libgomp.fortran/unroll-6.f90: New test.
	* testsuite/libgomp.fortran/unroll-7a.f90: New test.
	* testsuite/libgomp.fortran/unroll-7b.f90: New test.
	* testsuite/libgomp.fortran/unroll-7c.f90: New test.
	* testsuite/libgomp.fortran/unroll-7.f90: New test.
	* testsuite/libgomp.fortran/unroll-8.f90: New test.
	* testsuite/libgomp.fortran/unroll-simd-1.f90: New test.
	* testsuite/libgomp.fortran/unroll-tile-1.f90: New test.
	* testsuite/libgomp.fortran/unroll-tile-2.f90: New test.
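
As an illustration of the two constructs this ChangeLog adds parsing and
lowering for, here is a minimal C sketch.  The function names and the
array size are made up for the example and are not taken from the
testsuite.  Both directives only restructure the loops, so the results
must equal those of the plain loops; when the file is built without
OpenMP support the pragmas are simply ignored.

```c
#include <assert.h>

#define N 8

/* Sum a 2-D array; the loop nest is split into 4x4 tiles.  */
static long
sum_tiled (int a[N][N])
{
  long sum = 0;
  #pragma omp tile sizes(4, 4)
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      sum += a[i][j];
  return sum;
}

/* Sum a vector; the loop is partially unrolled by a factor of 2.  */
static long
sum_unrolled (const int *v, int n)
{
  long sum = 0;
  #pragma omp unroll partial(2)
  for (int i = 0; i < n; i++)
    sum += v[i];
  return sum;
}

/* Compare both transformed loops against plain reference loops.  */
static void
check_transforms (void)
{
  int a[N][N];
  int v[5] = { 1, 2, 3, 4, 5 };
  long ref2d = 0, ref1d = 0;

  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      {
        a[i][j] = i * N + j;
        ref2d += a[i][j];
      }
  for (int i = 0; i < 5; i++)
    ref1d += v[i];

  assert (sum_tiled (a) == ref2d);
  assert (sum_unrolled (v, 5) == ref1d);
}
```

The 'sizes' clause takes one entry per tiled loop, which is why the
parsers set collapse to the length of OMP_CLAUSE_SIZES_LIST.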

804c0f35

AArch64: Fix cpu features initialization [PR115342] · d7cbcfe7

Wilco Dijkstra authored 9 months ago

The CPU features initialization code uses CPUID registers (rather than
HWCAP).  The equality comparisons it uses are incorrect: for example FEAT_SVE
is not set if SVE2 is available.  Using HWCAPs for these is both simpler and
correct.  The initialization must also be done atomically to avoid multiple
threads causing corruption due to non-atomic RMW accesses to the global.
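
The two problems described above can be illustrated with a small,
portable sketch.  This is not the libgcc code: the feature bits, the
'sve_level' encoding (0 = none, 1 = SVE, 2 = SVE2) and all function
names are hypothetical and chosen only to show the shape of the bug
and of the fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical feature bits; not the real libgcc layout.  */
#define FEAT_SVE_BIT   (UINT64_C (1) << 0)
#define FEAT_SVE2_BIT  (UINT64_C (1) << 1)
#define FEAT_INIT_BIT  (UINT64_C (1) << 63)

/* The global feature mask, written once with a single atomic store.  */
static _Atomic uint64_t features;

/* Equality test: a level of 2 (SVE2) fails to set the SVE bit.  */
static uint64_t
detect_with_equality (unsigned sve_level)
{
  uint64_t f = 0;
  if (sve_level == 1)
    f |= FEAT_SVE_BIT;
  if (sve_level == 2)
    f |= FEAT_SVE2_BIT;
  return f;
}

/* Threshold test: a higher level implies all lower features.  */
static uint64_t
detect_with_threshold (unsigned sve_level)
{
  uint64_t f = 0;
  if (sve_level >= 1)
    f |= FEAT_SVE_BIT;
  if (sve_level >= 2)
    f |= FEAT_SVE2_BIT;
  return f;
}

/* Build the mask in a local variable and publish it in one atomic
   store, so no thread ever sees a partially updated global.  */
static uint64_t
init_features (unsigned sve_level)
{
  uint64_t f = atomic_load_explicit (&features, memory_order_relaxed);
  if (f & FEAT_INIT_BIT)
    return f;
  f = detect_with_threshold (sve_level) | FEAT_INIT_BIT;
  atomic_store_explicit (&features, f, memory_order_relaxed);
  return f;
}

/* The SVE2 case from the commit message.  */
static void
check_sve2_case (void)
{
  assert ((detect_with_equality (2) & FEAT_SVE_BIT) == 0);
  assert ((detect_with_threshold (2) & FEAT_SVE_BIT) != 0);
}
```

If two threads race in 'init_features', both compute the same value
and store it whole, so the race is harmless in this sketch.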

libgcc:
	PR target/115342
	* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
	Use HWCAP where possible.  Use atomic write for initialization.
	Fix FEAT_PREDRES comparison.
	(__init_cpu_features_resolver): Use atomic load for correct
	initialization.
	(__init_cpu_features): Likewise.

d7cbcfe7

testsuite: Improve check-function-bodies · acdc9df3

Wilco Dijkstra authored 9 months ago

Improve check-function-bodies by allowing single-character function names.

gcc/testsuite:
	* lib/scanasm.exp (configure_check-function-bodies): Allow single-char
	function names.

acdc9df3

darwin: Replace use of LONG_DOUBLE_TYPE_SIZE · 58ecd2eb

Kewen Lin authored 9 months ago

Joseph pointed out that "floating types should have their mode,
not a poorly defined precision value" in the discussion [1].
As he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type.  To prepare for that, this
patch replaces the use of LONG_DOUBLE_TYPE_SIZE in darwin
with TYPE_PRECISION of long_double_type_node.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

gcc/ChangeLog:

	* config/darwin.cc (darwin_patch_builtins): Use TYPE_PRECISION of
	long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.

58ecd2eb

fortran: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE · 37a48009

Kewen Lin authored 9 months ago

Joseph pointed out that "floating types should have their mode,
not a poorly defined precision value" in the discussion [1].
As he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type.  To prepare for that, this
patch replaces the uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
in fortran with TYPE_PRECISION of
{float,{,long_}double}_type_node.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (build_round_expr): Use TYPE_PRECISION of
	long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
	* trans-types.cc (gfc_build_real_type): Use TYPE_PRECISION of
	{float,double,long_double}_type_node to replace
	{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.

37a48009

d: Replace use of LONG_DOUBLE_TYPE_SIZE · b36461f1

Kewen Lin authored 9 months ago

Joseph pointed out that "floating types should have their mode,
not a poorly defined precision value" in the discussion [1].
As he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type.  To prepare for that, this
patch removes the only use of LONG_DOUBLE_TYPE_SIZE
in d.  Iain found that LONG_DOUBLE_TYPE_SIZE was poorly named
and had been used incorrectly, so this patch follows his advice
and uses int_size_in_bytes instead.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html



Co-authored-by: Iain Buclaw <ibuclaw@gdcproject.org>

gcc/d/ChangeLog:

	* d-target.cc (Target::_init): Use int_size_in_bytes of
	long_double_type_node to replace the expression with
	LONG_DOUBLE_TYPE_SIZE for c.long_doublesize assignment.

b36461f1

ada: Replace use of LONG_DOUBLE_TYPE_SIZE · 6fa25aa9

Kewen Lin authored 9 months ago

Joseph pointed out that "floating types should have their mode,
not a poorly defined precision value" in the discussion [1].
As he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type.  To prepare for that, this
patch replaces the use of LONG_DOUBLE_TYPE_SIZE in ada
with TYPE_PRECISION of long_double_type_node.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

gcc/ada/ChangeLog:

	* gcc-interface/decl.cc (gnat_to_gnu_entity): Use TYPE_PRECISION of
	long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.

6fa25aa9

Internal-fn: Support new IFN SAT_SUB for unsigned scalar int · abe6d393

Pan Li authored 9 months ago


This patch adds the middle-end representation for saturating
subtraction, i.e. the result of the subtraction is set to the minimum
on underflow.  It matches patterns similar to the one below.

SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example for uint8_t, we have

* SAT_SUB (255, 0)   => 255
* SAT_SUB (1, 2)     => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255)   => 0
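
The pattern and the uint8_t examples above can be checked with a short
C sketch.  The function names below are illustrative only and do not
come from the patch; the second function is the plain conditional form
that the branchless one must agree with.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form: (x - y) & (-(TYPE)(x >= y)) with TYPE = uint8_t.
   The comparison yields 0 or 1, so the mask is 0x00 or 0xff.  */
static uint8_t
sat_sub_u8 (uint8_t x, uint8_t y)
{
  uint8_t mask = (uint8_t) -(uint8_t) (x >= y);
  return (uint8_t) ((x - y) & mask);
}

/* Reference form using a conditional.  */
static uint8_t
sat_sub_u8_ref (uint8_t x, uint8_t y)
{
  return x >= y ? (uint8_t) (x - y) : 0;
}

/* The four uint8_t examples listed in the commit message.  */
static void
check_examples (void)
{
  assert (sat_sub_u8 (255, 0) == 255);
  assert (sat_sub_u8 (1, 2) == 0);
  assert (sat_sub_u8 (254, 255) == 0);
  assert (sat_sub_u8 (0, 255) == 0);
}
```

When x < y the mask is zero, so the wrapped difference is discarded
and the result is clamped to 0.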

Given the SAT_SUB below for uint64_t:

uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t)(x >= y));
}

Before this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  _Bool _1;
  long unsigned int _3;
  uint64_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = x_4(D) >= y_5(D);
  _3 = x_4(D) - y_5(D);
  _6 = _1 ? _3 : 0;
  return _6;
;;    succ:       EXIT
}

After this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
  return _6;
;;    succ:       EXIT
}

The following tests were run for this patch:
* The riscv full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
	* match.pd: Add new match for SAT_SUB.
	* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
	new decl for generated in match.pd.
	(build_saturation_binary_arith_call): Add new helper function
	to build the gimple call to binary SAT alu.
	(match_saturation_arith): Rename from.
	(match_unsigned_saturation_add): Rename to.
	(match_unsigned_saturation_sub): Add new func to match the
	unsigned sat sub.
	(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
	try when COND_EXPR.

Signed-off-by: Pan Li <pan2.li@intel.com>

abe6d393

doc: Streamline recommendation of GNU awk · 99314267

Gerald Pfeifer authored 9 months ago

GNU awk 3.1.5 was released in August 2005; no need to specify this in
the context of "recent version".

gcc:
	PR other/69374
	* doc/install.texi (Prerequisites): Drop reference to GNU awk
	version 3.1.5. Remove fluff.

99314267