Commits · fceecc511d4918e2b27a0609f8885ec8aba8723d · COBOLworx / gcc-cobol

Aug 19, 2024

aarch64: Fix ls64 intrinsic availability · fceecc51

The availability of ls64 intrinsics and data types was determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.

This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. We also get better error
messages when ls64 is not available (matching the existing error
messages for SVE intrinsics).

The data512_t type is made always available; this is consistent with the
present behaviour for Neon fp16/bf16 types.
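
To illustrate the difference between the two schemes, here is a minimal, portable C model. It is not GCC code: the FEAT_LS64 bit and all three function names are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit; this is not GCC's real flag encoding. */
#define FEAT_LS64 (UINT64_C(1) << 0)

/* Flags in effect inside one function: the global (command-line) flags,
   adjusted by whatever a target pragma or attribute enables or disables. */
uint64_t effective_flags(uint64_t global_flags,
                         uint64_t attr_enable,
                         uint64_t attr_disable)
{
    return (global_flags | attr_enable) & ~attr_disable;
}

/* Old scheme (sketch): decide once, from the global flags only,
   whether the intrinsic is registered at all. */
bool registered_at_init(uint64_t global_flags)
{
    return (global_flags & FEAT_LS64) != 0;
}

/* New scheme (sketch): the intrinsic is always registered, and each
   call is checked against the flags of the calling function. */
bool allowed_at_use(uint64_t function_flags)
{
    return (function_flags & FEAT_LS64) != 0;
}
```

In this model, a build without a global ls64 flag never registers the intrinsic under the old scheme, so a function that enables ls64 locally still cannot call it. The use-time check accepts that call, and it also rejects a call from a function that disables ls64 locally in a build where the flag is set globally.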

gcc/ChangeLog:

	PR target/112108
	* config/aarch64/aarch64-builtins.cc (handle_arm_acle_h): Remove
	feature check at initialisation.
	(aarch64_general_check_builtin_call): Check ls64 intrinsics.
	* config/aarch64/arm_acle.h (data512_t): Make always available.

gcc/testsuite/ChangeLog:

	PR target/112108
	* gcc.target/aarch64/acle/ls64_guard-1.c: New test.
	* gcc.target/aarch64/acle/ls64_guard-2.c: New test.
	* gcc.target/aarch64/acle/ls64_guard-3.c: New test.
	* gcc.target/aarch64/acle/ls64_guard-4.c: New test.


aarch64: Fix memtag intrinsic availability · 4e1b617b

Andrew Carlotti authored 1 year ago

The availability of memtag intrinsics and data types was determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.

This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. It also removes the macro
indirection from the header file - this simplifies the header, and
allows the missing extension error reporting to find the user-facing
intrinsic names.

gcc/ChangeLog:

	PR target/112108
	* config/aarch64/aarch64-builtins.cc (aarch64_init_memtag_builtins):
	Define intrinsic names directly.
	(aarch64_general_init_builtins): Move memtag initialisation...
	(handle_arm_acle_h): ...to here, and remove feature check.
	(aarch64_general_check_builtin_call): Check memtag intrinsics.
	* config/aarch64/arm_acle.h (__arm_mte_create_random_tag)
	(__arm_mte_exclude_tag, __arm_mte_ptrdiff)
	(__arm_mte_increment_tag, __arm_mte_set_tag, __arm_mte_get_tag):
	Remove.

gcc/testsuite/ChangeLog:

	PR target/112108
	* gcc.target/aarch64/acle/memtag_guard-1.c: New test.
	* gcc.target/aarch64/acle/memtag_guard-2.c: New test.
	* gcc.target/aarch64/acle/memtag_guard-3.c: New test.
	* gcc.target/aarch64/acle/memtag_guard-4.c: New test.


aarch64: Fix tme intrinsic availability · 32afbb60

Andrew Carlotti authored 1 year ago

The availability of tme intrinsics was previously gated at both
initialisation time (using global target options) and usage time
(accounting for function-specific target options).  This patch removes
the check at initialisation time, and also moves the intrinsics out of
the header file to allow for better error messages (matching the
existing error messages for SVE intrinsics).

gcc/ChangeLog:

	PR target/112108
	* config/aarch64/aarch64-builtins.cc (aarch64_init_tme_builtins):
	Define intrinsic names directly.
	(aarch64_general_init_builtins): Move tme initialisation...
	(handle_arm_acle_h): ...to here, and remove feature check.
	(aarch64_general_check_builtin_call): Check tme intrinsics.
	* config/aarch64/arm_acle.h (__tstart, __tcommit, __tcancel)
	(__ttest): Remove.
	(_TMFAILURE_*): Define unconditionally.

gcc/testsuite/ChangeLog:

	PR target/112108
	* gcc.target/aarch64/acle/tme_guard-1.c: New test.
	* gcc.target/aarch64/acle/tme_guard-2.c: New test.
	* gcc.target/aarch64/acle/tme_guard-3.c: New test.
	* gcc.target/aarch64/acle/tme_guard-4.c: New test.


aarch64: Move check_required_extensions · baf71ec5

Andrew Carlotti authored 1 year ago

Move SVE extension checking functionality to aarch64-builtins.cc, so
that it can be shared by non-SVE intrinsics.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins.cc (check_builtin_call)
	(expand_builtin): Update calls to the below.
	(report_missing_extension, report_missing_registers)
	(check_required_extensions): Move out of aarch64_sve namespace,
	rename, and move into...
	* config/aarch64/aarch64-builtins.cc (aarch64_report_missing_extension)
	(aarch64_report_missing_registers)
	(aarch64_check_required_extensions) ...here.
	* config/aarch64/aarch64-protos.h (aarch64_check_required_extensions):
	Add prototype.


aarch64: Refactor check_required_extensions · a4b39dc4

Andrew Carlotti authored 7 months ago

Replace TARGET_GENERAL_REGS_ONLY check with an explicit check that
aarch64_isa_flags enables all required extensions.  This will be more
flexible when repurposing this function for non-SVE intrinsics.
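
The explicit check can be pictured as a bitmask comparison. The following is a minimal sketch in portable C, not the code in aarch64-sve-builtins.cc: the `feature_flags` typedef, the EXT_* bits and `first_missing_extension` are all hypothetical names, and the real function reports a diagnostic rather than returning a value.

```c
#include <stdint.h>

typedef uint64_t feature_flags;

/* Hypothetical extension bits, for illustration only. */
#define EXT_A (UINT64_C(1) << 0)
#define EXT_B (UINT64_C(1) << 1)
#define EXT_C (UINT64_C(1) << 2)

/* Return 0 when every required extension is enabled; otherwise return
   the lowest-numbered required bit that is not enabled, so the caller
   can name one missing extension in its error message. */
feature_flags first_missing_extension(feature_flags enabled,
                                      feature_flags required)
{
    feature_flags missing = required & ~enabled;
    /* Isolate the lowest set bit; this yields 0 when nothing is missing. */
    return missing & (~missing + 1);
}
```

Because the check depends only on the two flag sets, the same helper can serve any intrinsic family, which is the flexibility this refactoring is after.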

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins.cc
	(check_required_registers): Remove target check and rename to...
	(report_missing_registers): ...this.
	(check_required_extensions): Refactor.


Allow coarrays in select type. [PR46371, PR56496] · 8871489c

Andre Vehreschild authored 7 months ago

Fix ICE when scalar coarrays are used in a select type. Prevent
coindexing in associate/select type/select rank selector expression.

gcc/fortran/ChangeLog:

	PR fortran/46371
	PR fortran/56496

	* expr.cc (gfc_is_coindexed): Also detect coindexing when the
	expression has been rewritten to caf_get.
	* trans-stmt.cc (trans_associate_var): Always accept a
	descriptor for coarrays.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/select_type_1.f90: New test.
	* gfortran.dg/coarray/select_type_2.f90: New test.
	* gfortran.dg/coarray/select_type_3.f90: New test.


gnat: fix lto-type-mismatch between C_Version_String and gnat_version_string [PR115917] · 9cbcf8d1

Arsen Arsenović authored 7 months ago

gcc/ada/ChangeLog:

	PR ada/115917
	* gnatvsn.ads: Add note about the duplication of this value in
	version.c.
	* version.c (VER_LEN_MAX): Define to the same value as
	Gnatvsn.Ver_Len_Max.
	(gnat_version_string): Use VER_LEN_MAX as bound.


aarch64: Reduce FP reassociation width for Neoverse V2 and set AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA · cc572242

Kyrylo Tkachov authored 7 months ago

The fp reassociation width for Neoverse V2 was set to 6 since its
introduction and I guess it was empirically tuned. But since
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA was added the tree reassociation
pass seems to be more deliberate in forming FMAs and when that flag is
used it seems to more properly evaluate the FMA vs non-FMA reassociation
widths.
According to the Neoverse V2 SWOG the core has a throughput of 4 for
most FP operations, so the value 6 is not accurate anyway.
Also, the SWOG does state that FMADD operations are pipelined and the
results can be forwarded from FP multiplies to the accumulation operands
of FMADD instructions, which seems to be what
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA expresses.

This patch sets the fp_reassoc_width field to 4 and enables
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA for -mcpu=neoverse-v2.
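
As a rough illustration of what a reassociation width of 4 means, the sketch below writes out by hand the kind of split the reassociation pass may perform on a floating-point reduction. The function names are hypothetical and this is not compiler output. Splitting the sum changes the order of floating-point additions, so the compiler only does this where reassociation is permitted, as with the -Ofast used for the benchmark figures in this message.

```c
#include <stddef.h>

/* One serial dependency chain: each add waits for the previous one. */
double sum_serial(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Four independent chains, matching a reassociation width of 4:
   up to four adds can be in flight at once. */
double sum_width4(const double *x, size_t n)
{
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    /* Leftover elements go into the first chain. */
    for (; i < n; i++)
        a0 += x[i];
    return (a0 + a1) + (a2 + a3);
}
```

A width larger than the core can sustain adds accumulators and register pressure without adding parallelism, which is consistent with the argument above that 6 was too high for a core with a throughput of 4.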

On SPEC2017 fprate I see the following changes on a Grace system:
503.bwaves_r 0.16%
507.cactuBSSN_r -0.32%
508.namd_r 3.04%
510.parest_r 0.00%
511.povray_r 0.78%
519.lbm_r 0.35%
521.wrf_r 0.69%
526.blender_r -0.53%
527.cam4_r 0.84%
538.imagick_r 0.00%
544.nab_r -0.97%
549.fotonik3d_r -0.45%
554.roms_r 0.97%
Geomean 0.35%

with -Ofast -mcpu=grace -flto.

So slight overall improvement with a meaningful improvement in
508.namd_r.

I think other tunings in aarch64 should look into
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA as well, but I'll leave the
benchmarking to someone else.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/tuning_models/neoversev2.h (fp_reassoc_width):
	Set to 4.
	(tune_flags): Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.


testsuite: Prune warning about size of enums · 6d8b9b77

Torbjörn SVENSSON authored 7 months ago

This fixes the regression reported at:
https://linaro.atlassian.net/browse/GNU-1315

gcc/testsuite/ChangeLog:

	* g++.dg/warn/pr33738-2.C: dg-prune arm linker messages about
	size of enums.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>


rtl: Enable the use of rtx values with int and mode attributes · e57d3cce

Andre Vieira authored 7 months ago

The 'code' part of a 'define_code_attr' refers to the type of the key; in other
words, it uses a code_iterator to pick the 'value' from its (key "value") pair
list.

However, rtx_alloc_for_name requires a code_attribute to be used when the
'value' needs to be a type. In other words, no other type of attributes could be
used, before this patch, to produce a rtx typed 'value'.

This patch removes that restriction and allows the backend to use any kind of
attribute as long as that attribute always produces a valid code typed 'value'.

gcc/ChangeLog:

	* read-rtl.cc (rtx_reader::rtx_alloc_for_name): Allow all attribute
	types to produce code 'values'.
	(check_code_attribute): Rename ...
	(check_attribute_codes): ... to this.  And change comments to refer to
	* doc/md.texi: Add paragraph to document that you can use int and mode
	attributes to produce codes.


testsuite: Reduce cut-&-paste in scanltranstree.exp · 71059d26

Richard Sandiford authored 7 months ago

scanltranstree.exp defines some LTO wrappers around standard
non-LTO scanners.  Four of them are cut-&-paste variants of
one another, so this patch generates them from a single template.
It also does the same for scan-ltrans-tree-dump-times, so that
other *-times scanners can be added easily in future.

The scanners seem to be lightly used.  gcc.dg/ipa/ipa-icf-38.c uses
scan-ltrans-tree-dump{,-not} and libgomp.c/declare-variant-1.c
uses scan-ltrans-tree-dump-{not,times}.  Nothing currently seems
to use scan-ltrans-tree-dump-dem*.

gcc/testsuite/
	* lib/scanltranstree.exp: Redefine the routines using two
	templates.


Fix ICE in recompute_tree_invariant_for_addr_expr, at tree.c:4535 [PR84244] · 661acde6

Andre Vehreschild authored 8 months ago

Declaring an unused function with a derived type having a pointer
component, and using that derived type as a coarray, led the compiler to
ICE because the caf_token for the pointer was not linked into the
component correctly.

	PR fortran/84244

gcc/fortran/ChangeLog:

	* trans-types.cc (gfc_get_derived_type): When a caf_sub_token is
	generated for a component, link it to the component it is
	generated for (the previous one).

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/ptr_comp_5.f08: New test.


aarch64: Implement 16-byte vector mode const0 store by TImode · 8d6c6fbc

Haochen Gui authored 7 months ago

gcc/
	* config/aarch64/aarch64-simd.md (mov<mode> for VSTRUCT_QD):
	Expand 16-byte vector mode const0 store by TImode.
