Commits · 4fb12ae93ddf6dea9a30041cecc94911d7863556 · COBOLworx / gcc-cobol

Apr 20, 2023

i386: Add AVX512BW dependency to AVX512VBMI2 · 4fb12ae9

Haochen Jiang authored 2 years ago

gcc/ChangeLog:

	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA_AVX512VBMI2_SET): Change OPTION_MASK_ISA_AVX512F_SET
	to OPTION_MASK_ISA_AVX512BW_SET.
	(OPTION_MASK_ISA_AVX512F_UNSET):
	Remove OPTION_MASK_ISA_AVX512VBMI2_UNSET.
	(OPTION_MASK_ISA_AVX512BW_UNSET):
	Add OPTION_MASK_ISA_AVX512VBMI2_UNSET.
	* config/i386/avx512vbmi2intrin.h: Do not push avx512bw.
	* config/i386/avx512vbmi2vlintrin.h: Ditto.
	* config/i386/i386-builtin.def: Remove OPTION_MASK_ISA_AVX512BW.
	* config/i386/sse.md (VI12_AVX512VLBW): Removed.
	(VI12_VI48F_AVX512VLBW): Rename to VI12_VI48F_AVX512VL.
	(compress<mode>_mask): Change iterator from VI12_AVX512VLBW to
	VI12_AVX512VL.
	(compressstore<mode>_mask): Ditto.
	(expand<mode>_mask): Ditto.
	(expand<mode>_maskz): Ditto.
	(*expand<mode>_mask): Change iterator from VI12_VI48F_AVX512VLBW to
	VI12_VI48F_AVX512VL.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-pr100267-1.c: Remove avx512f and avx512bw.
	* gcc.target/i386/avx512bw-pr100267-b-2.c: Ditto.
	* gcc.target/i386/avx512bw-pr100267-d-2.c: Ditto.
	* gcc.target/i386/avx512bw-pr100267-q-2.c: Ditto.
	* gcc.target/i386/avx512bw-pr100267-w-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcompressb-1.c: Ditto.
	* gcc.target/i386/avx512f-vpcompressb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcompressw-1.c: Ditto.
	* gcc.target/i386/avx512f-vpcompressw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpexpandb-1.c: Ditto.
	* gcc.target/i386/avx512f-vpexpandb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpexpandw-1.c: Ditto.
	* gcc.target/i386/avx512f-vpexpandw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshld-1.c: Ditto.
	* gcc.target/i386/avx512f-vpshldd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshldq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshldv-1.c: Ditto.
	* gcc.target/i386/avx512f-vpshldvd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshldvq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshldvw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshrdd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshrdq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshrdv-1.c: Ditto.
	* gcc.target/i386/avx512f-vpshrdvd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshrdvq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshrdvw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshrdw-2.c: Ditto.
	* gcc.target/i386/avx512vbmi2-vpshld-1.c: Ditto.
	* gcc.target/i386/avx512vbmi2-vpshrd-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcompressb-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpcompressb-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpcompressw-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpexpandb-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpexpandb-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpexpandw-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpexpandw-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshldd-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshldq-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshldv-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpshldvd-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshldvq-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshldvw-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshrdd-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshrdq-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshrdv-1.c: Ditto.
	* gcc.target/i386/avx512vl-vpshrdvd-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshrdvq-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshrdvw-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpshrdw-2.c: Ditto.
	* gcc.target/i386/avx512vlbw-pr100267-1.c: Ditto.
	* gcc.target/i386/avx512vlbw-pr100267-b-2.c: Ditto.
	* gcc.target/i386/avx512vlbw-pr100267-w-2.c: Ditto.

4fb12ae9

i386: Add AVX512BW dependency to AVX512BITALG · d08b0559

Haochen Jiang authored 2 years ago

Since some of the AVX512BITALG intrinsics use 32/64-bit masks,
AVX512BW should be implied.

gcc/ChangeLog:

	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA_AVX512BITALG_SET):
	Change OPTION_MASK_ISA_AVX512F_SET
	to OPTION_MASK_ISA_AVX512BW_SET.
	(OPTION_MASK_ISA_AVX512F_UNSET):
	Remove OPTION_MASK_ISA_AVX512BITALG_SET.
	(OPTION_MASK_ISA_AVX512BW_UNSET):
	Add OPTION_MASK_ISA_AVX512BITALG_SET.
	* config/i386/avx512bitalgintrin.h: Do not push avx512bw.
	* config/i386/i386-builtin.def:
	Remove redundant OPTION_MASK_ISA_AVX512BW.
	* config/i386/sse.md (VI1_AVX512VLBW): Removed.
	(avx512vl_vpshufbitqmb<mode><mask_scalar_merge_name>):
	Change the iterator from VI1_AVX512VLBW to VI1_AVX512VL.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bitalg-vpopcntb-1.c:
	Remove avx512bw.
	* gcc.target/i386/avx512bitalg-vpopcntb.c: Ditto.
	* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
	* gcc.target/i386/avx512bitalg-vpopcntw-1.c: Ditto.
	* gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
	* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
	* gcc.target/i386/avx512bitalg-vpshufbitqmb-1.c: Ditto.
	* gcc.target/i386/avx512bitalg-vpshufbitqmb.c: Ditto.
	* gcc.target/i386/avx512bitalgvl-vpopcntb-1.c: Ditto.
	* gcc.target/i386/avx512bitalgvl-vpopcntw-1.c: Ditto.
	* gcc.target/i386/avx512bitalgvl-vpshufbitqmb-1.c: Ditto.
	* gcc.target/i386/pr93696-1.c: Ditto.
	* gcc.target/i386/pr93696-2.c: Ditto.

d08b0559

i386: Use macro to wrap up share builtin exceptions in builtin isa check · 5ebdbdb9

Haochen Jiang authored 2 years ago

gcc/ChangeLog:

	* config/i386/i386-expand.cc
	(ix86_check_builtin_isa_match): Correct wrong comments.
	Add a new macro SHARE_BUILTIN and refactor the current if
	clauses to macro.

5ebdbdb9

Re-arrange sections of i386 cpuid · fd7ecd80

Mo, Zewei authored 2 years ago

gcc/ChangeLog:

	* config/i386/cpuid.h: Open a new section for Extended Features
	Leaf (%eax == 7, %ecx == 0) and Extended Features Sub-leaf (%eax == 7,
	%ecx == 1).

fd7ecd80

Optimize vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm · c2dac2e5

Hu, Lin1 authored 2 years ago

vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm are 3 clk.
We can optimize them to vblend, vmovaps when there's no cross-lane.

gcc/ChangeLog:

	* config/i386/sse.md: Modify insn vperm{i,f}
	and vshuf{i,f}.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512vl-vshuff32x4-1.c: Modify test.
	* gcc.target/i386/avx512vl-vshuff64x2-1.c: Ditto.
	* gcc.target/i386/avx512vl-vshufi32x4-1.c: Ditto.
	* gcc.target/i386/avx512vl-vshufi64x2-1.c: Ditto.
	* gcc.target/i386/opt-vperm-vshuf-1.c: New test.
	* gcc.target/i386/opt-vperm-vshuf-2.c: Ditto.
	* gcc.target/i386/opt-vperm-vshuf-3.c: Ditto.

c2dac2e5

Daily bump. · cf0d9dbc
GCC Administrator authored 1 year ago

cf0d9dbc

Apr 19, 2023

gcc: xtensa: add -m[no-]strict-align option · 675b390e

Max Filippov authored 2 years ago

gcc/
	* config/xtensa/xtensa-opts.h: New header.
	* config/xtensa/xtensa.h (STRICT_ALIGNMENT): Redefine as
	xtensa_strict_align.
	* config/xtensa/xtensa.cc (xtensa_option_override): When
	-m[no-]strict-align is not specified in the command line set
	xtensa_strict_align to 0 if the hardware supports both unaligned
	loads and stores or to 1 otherwise.
	* config/xtensa/xtensa.opt (mstrict-align): New option.
	* doc/invoke.texi (Xtensa Options): Document -m[no-]strict-align.
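
The default described for xtensa_option_override can be sketched as below. This is only an illustration under stated assumptions, not the GCC code: the function name, the -1 "not specified" convention and the two boolean parameters are hypothetical stand-ins for the command-line state and the hardware unaligned load/store capabilities.

```cpp
#include <cassert>

// Hypothetical helper, not part of GCC.
// user_choice: -1 = option not given, 0 = -mno-strict-align, 1 = -mstrict-align.
// unaligned_load_hw / unaligned_store_hw: whether the hardware handles
// unaligned loads and stores by itself.
inline int effective_strict_align(int user_choice,
                                  bool unaligned_load_hw,
                                  bool unaligned_store_hw)
{
  if (user_choice != -1)
    return user_choice;            // an explicit option always wins
  // Default: relax alignment only when both loads and stores are supported.
  return (unaligned_load_hw && unaligned_store_hw) ? 0 : 1;
}
```

With no option given, the sketch yields 0 only when both capabilities are present, which matches the rule stated in the ChangeLog entry.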

675b390e

gcc: xtensa: add data alignment properties to dynconfig · ec9b3087

Max Filippov authored 2 years ago

gcc/
	* config/xtensa/xtensa-dynconfig.cc (xtensa_get_config_v4): New
	function.

include/
	* xtensa-dynconfig.h (xtensa_config_v4): New struct.
	(XCHAL_DATA_WIDTH, XCHAL_UNALIGNED_LOAD_EXCEPTION)
	(XCHAL_UNALIGNED_STORE_EXCEPTION, XCHAL_UNALIGNED_LOAD_HW)
	(XCHAL_UNALIGNED_STORE_HW, XTENSA_CONFIG_V4_ENTRY_LIST): New
	definitions.
	(XTENSA_CONFIG_INSTANCE_LIST): Add xtensa_config_v4 instance.
	(XTENSA_CONFIG_ENTRY_LIST): Add XTENSA_CONFIG_V4_ENTRY_LIST.

ec9b3087

c++: Define built-in for std::tuple_element [PR100157] · 58b7dbf8

Patrick Palka authored 1 year ago


This adds a new built-in to replace the recursive class template
instantiations done by traits such as std::tuple_element and
std::variant_alternative.  The purpose is to select the Nth type from a
list of types, e.g. __type_pack_element<1, char, int, float> is int.
We implement it as a special kind of TRAIT_TYPE.

For a pathological example tuple_element_t<1000, tuple<2000 types...>>
the compilation time is reduced by more than 90% and the memory used by
the compiler is reduced by 97%.  In realistic examples the gains will be
much smaller, but still relevant.

Unlike the other built-in traits, __type_pack_element uses template-id
syntax instead of call syntax and is SFINAE-enabled, matching Clang's
implementation.  And like the other built-in traits, it's not mangleable
so we can't use it directly in function signatures.

N.B. Clang seems to implement __type_pack_element as a first-class
template that can e.g. be used as a template-template argument.  For
simplicity we implement it in a more ad-hoc way.
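
A minimal sketch of the idea, assuming a compiler that reports the built-in through __has_builtin: the names nth_type and nth_type_recursive are hypothetical and only loosely modelled on the _Nth_type change listed below. The wrapper uses __type_pack_element when it is available and otherwise falls back to the recursive instantiation scheme that the built-in is meant to replace.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Fallback: one class template instantiation per index step.
template<std::size_t N, typename... Ts>
struct nth_type_recursive {};

template<typename T, typename... Rest>
struct nth_type_recursive<0, T, Rest...> { using type = T; };

template<std::size_t N, typename T, typename... Rest>
struct nth_type_recursive<N, T, Rest...> : nth_type_recursive<N - 1, Rest...> {};

#if defined(__has_builtin)
# if __has_builtin(__type_pack_element)
#  define SKETCH_HAVE_TYPE_PACK_ELEMENT 1
# endif
#endif

// Hypothetical wrapper: the built-in is used inside a class template
// rather than directly in a function signature.
template<std::size_t N, typename... Ts>
struct nth_type
{
#ifdef SKETCH_HAVE_TYPE_PACK_ELEMENT
  using type = __type_pack_element<N, Ts...>;   // template-id syntax
#else
  using type = typename nth_type_recursive<N, Ts...>::type;
#endif
};

static_assert(std::is_same<nth_type<1, char, int, float>::type, int>::value,
              "element 1 of <char, int, float> is int");
```

Both branches select the same type, so the static_assert holds whether or not the built-in exists.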

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>

	PR c++/100157

gcc/cp/ChangeLog:

	* cp-trait.def (TYPE_PACK_ELEMENT): Define.
	* cp-tree.h (finish_trait_type): Add complain parameter.
	* cxx-pretty-print.cc (pp_cxx_trait): Handle
	CPTK_TYPE_PACK_ELEMENT.
	* parser.cc (cp_parser_constant_expression): Document default
	arguments.
	(cp_parser_trait): Handle CPTK_TYPE_PACK_ELEMENT.  Pass
	tf_warning_or_error to finish_trait_type.
	* pt.cc (tsubst) <case TRAIT_TYPE>: Handle non-type first
	argument.  Pass complain to finish_trait_type.
	* semantics.cc (finish_type_pack_element): Define.
	(finish_trait_type): Add complain parameter.  Handle
	CPTK_TYPE_PACK_ELEMENT.
	* tree.cc (strip_typedefs): Handle non-type first argument.
	Pass tf_warning_or_error to finish_trait_type.
	* typeck.cc (structural_comptypes) <case TRAIT_TYPE>: Use
	cp_tree_equal instead of same_type_p for the first argument.

libstdc++-v3/ChangeLog:

	* include/bits/utility.h (_Nth_type): Conditionally define in
	terms of __type_pack_element if available.
	* testsuite/20_util/tuple/element_access/get_neg.cc: Prune
	additional errors from the new built-in.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/type_pack_element1.C: New test.
	* g++.dg/ext/type_pack_element2.C: New test.
	* g++.dg/ext/type_pack_element3.C: New test.

58b7dbf8

c++: bad ggc_free in try_class_unification [PR109556] · 5e284ebb

Patrick Palka authored 1 year ago

Aside from correcting how try_class_unification copies multi-dimensional
'targs', r13-377-g3e948d645bc908 also made it ggc_free this copy as an
optimization.  But this is wrong since the call to unify within might've
captured the args in persistent memory such as the satisfaction cache
(as part of constrained auto deduction).

	PR c++/109556

gcc/cp/ChangeLog:

	* pt.cc (try_class_unification): Don't ggc_free the copy of
	'targs'.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-placeholder13.C: New test.

5e284ebb

testsuite: fix scan-tree-dump patterns [PR83904,PR100297] · 6fc8e25c

Harald Anlauf authored 1 year ago

Adjust scan-tree-dump patterns so that they do not accidentally match a
valid path.

gcc/testsuite/ChangeLog:

	PR testsuite/83904
	PR fortran/100297
	* gfortran.dg/allocatable_function_1.f90: Use "__builtin_free "
	instead of the naive "free".
	* gfortran.dg/reshape_8.f90: Extend pattern from a simple "data".

6fc8e25c

i386: Add new pattern for zero-extend cmov · 04a9209d

Andrew Pinski authored 1 year ago

After a phiopt change, I got a failure of cmov9.c.
The RTL IR has zero_extend on the outside of
the if_then_else rather than on the side. Both
ways are considered canonical as mentioned in
PR 66588.

This fixes the failure I got and also adds a testcase
which fails before even my phiopt patch but will pass
with this patch.

OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.

gcc/ChangeLog:

	* config/i386/i386.md (*movsicc_noc_zext_1): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cmov10.c: New test.
	* gcc.target/i386/cmov11.c: New test.

04a9209d

c++: fix 'unsigned __int128_t' semantics [PR108099] · ed32ec26

Jason Merrill authored 1 year ago

My earlier patch for 108099 made us accept this non-standard pattern but
messed up the semantics, so that e.g. unsigned __int128_t was not a 128-bit
type.

	PR c++/108099

gcc/cp/ChangeLog:

	* decl.cc (grokdeclarator): Keep typedef_decl for __int128_t.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/int128-8.C: New test.

ed32ec26

RISC-V: Support 128 bit vector chunk · 9fdea28d

Juzhe-Zhong authored 1 year ago

RISC-V provides different VLEN configurations through different ISA
extensions such as `zve32x`, `zve64x` and `v`:
zve32x only guarantees that the minimal VLEN is 32 bits,
zve64x guarantees that the minimal VLEN is 64 bits,
and v guarantees that the minimal VLEN is 128 bits.

Current status (without this patch):

Zve32x: The mode for one vector register is VNx1SImode; VNx1DImode
is an invalid mode
 - one vector register can hold 1 + 1x SImode where x is 0~n, so it
might hold just one SI

Zve64x: The mode for one vector register is VNx1DImode or VNx2SImode
 - one vector register can hold 1 + 1x DImode where x is 0~n, so it
might hold just one DI.
 - one vector register can hold 2 + 2x SImode where x is 0~n, so it
might hold just two SI.

However, the `v` extension guarantees that the minimal VLEN is 128 bits.

We introduce another type/mode mapping for this configuration:

v: The mode for one vector register is VNx2DImode or VNx4SImode
 - one vector register can hold 2 + 2x DImode where x is 0~n, so it
will hold at least two DI
 - one vector register can hold 4 + 4x SImode where x is 0~n, so it
will hold at least four SI

This patch models the modes more precisely for RVV, and helps some
middle-end optimizations that assume the number of elements must be a
multiple of two.
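
The "N + Nx" element counts above follow from simple arithmetic, sketched here with a hypothetical poly_count type (GCC's own representation is not shown): the base is the guaranteed minimum VLEN divided by the element width, and every extra chunk of that minimum size adds the same number again.

```cpp
#include <cassert>

// Hypothetical model of a length-agnostic element count: base + step * x,
// where x (0..n) counts the additional minimum-VLEN chunks in hardware.
struct poly_count { unsigned base; unsigned step; };

// Elements of elem_bits that fit into one register whose guaranteed
// minimum length is min_vlen_bits.
constexpr poly_count elements_per_register(unsigned min_vlen_bits,
                                           unsigned elem_bits)
{
  return poly_count{ min_vlen_bits / elem_bits, min_vlen_bits / elem_bits };
}

// Concrete element count for a given x.
constexpr unsigned count_for(poly_count c, unsigned x)
{
  return c.base + c.step * x;
}
```

With a 128-bit minimum, both the DImode count (2 + 2x) and the SImode count (4 + 4x) are even for every x, which is the property the middle-end optimizations mentioned above rely on.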

gcc/ChangeLog:

	* config/riscv/riscv-modes.def (FLOAT_MODE): Add chunk 128 support.
	(VECTOR_BOOL_MODE): Ditto.
	(ADJUST_NUNITS): Ditto.
	(ADJUST_ALIGNMENT): Ditto.
	(ADJUST_BYTESIZE): Ditto.
	(ADJUST_PRECISION): Ditto.
	(RVV_MODES): Ditto.
	(VECTOR_MODE_WITH_PREFIX): Ditto.
	* config/riscv/riscv-v.cc (ENTRY): Ditto.
	(get_vlmul): Ditto.
	(get_ratio): Ditto.
	* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TYPE): Ditto.
	* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE): Ditto.
	(vbool64_t): Ditto.
	(vbool32_t): Ditto.
	(vbool16_t): Ditto.
	(vbool8_t): Ditto.
	(vbool4_t): Ditto.
	(vbool2_t): Ditto.
	(vbool1_t): Ditto.
	(vint8mf8_t): Ditto.
	(vuint8mf8_t): Ditto.
	(vint8mf4_t): Ditto.
	(vuint8mf4_t): Ditto.
	(vint8mf2_t): Ditto.
	(vuint8mf2_t): Ditto.
	(vint8m1_t): Ditto.
	(vuint8m1_t): Ditto.
	(vint8m2_t): Ditto.
	(vuint8m2_t): Ditto.
	(vint8m4_t): Ditto.
	(vuint8m4_t): Ditto.
	(vint8m8_t): Ditto.
	(vuint8m8_t): Ditto.
	(vint16mf4_t): Ditto.
	(vuint16mf4_t): Ditto.
	(vint16mf2_t): Ditto.
	(vuint16mf2_t): Ditto.
	(vint16m1_t): Ditto.
	(vuint16m1_t): Ditto.
	(vint16m2_t): Ditto.
	(vuint16m2_t): Ditto.
	(vint16m4_t): Ditto.
	(vuint16m4_t): Ditto.
	(vint16m8_t): Ditto.
	(vuint16m8_t): Ditto.
	(vint32mf2_t): Ditto.
	(vuint32mf2_t): Ditto.
	(vint32m1_t): Ditto.
	(vuint32m1_t): Ditto.
	(vint32m2_t): Ditto.
	(vuint32m2_t): Ditto.
	(vint32m4_t): Ditto.
	(vuint32m4_t): Ditto.
	(vint32m8_t): Ditto.
	(vuint32m8_t): Ditto.
	(vint64m1_t): Ditto.
	(vuint64m1_t): Ditto.
	(vint64m2_t): Ditto.
	(vuint64m2_t): Ditto.
	(vint64m4_t): Ditto.
	(vuint64m4_t): Ditto.
	(vint64m8_t): Ditto.
	(vuint64m8_t): Ditto.
	(vfloat32mf2_t): Ditto.
	(vfloat32m1_t): Ditto.
	(vfloat32m2_t): Ditto.
	(vfloat32m4_t): Ditto.
	(vfloat32m8_t): Ditto.
	(vfloat64m1_t): Ditto.
	(vfloat64m2_t): Ditto.
	(vfloat64m4_t): Ditto.
	(vfloat64m8_t): Ditto.
	* config/riscv/riscv-vector-switch.def (ENTRY): Ditto.
	* config/riscv/riscv.cc (riscv_legitimize_poly_move): Ditto.
	(riscv_convert_vector_bits): Ditto.
	* config/riscv/riscv.md:
	* config/riscv/vector-iterators.md:
	* config/riscv/vector.md
	(@pred_indexed_<order>store<VNX32_QH:mode><VNX32_QHI:mode>): Ditto.
	(@pred_indexed_<order>store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
	(@pred_indexed_<order>store<VNX64_Q:mode><VNX64_Q:mode>): Ditto.
	(@pred_indexed_<order>store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
	(@pred_indexed_<order>store<VNX128_Q:mode><VNX128_Q:mode>): Ditto.
	(@pred_reduc_<reduc><mode><vlmul1_zve64>): Ditto.
	(@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve64>): Ditto.
	(@pred_reduc_plus<order><mode><vlmul1_zve64>): Ditto.
	(@pred_widen_reduc_plus<order><mode><vwlmul1_zve64>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pr108185-4.c: Adapt testcase.
	* gcc.target/riscv/rvv/base/spill-1.c: Ditto.
	* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
	* gcc.target/riscv/rvv/base/spill-2.c: Ditto.
	* gcc.target/riscv/rvv/base/spill-3.c: Ditto.
	* gcc.target/riscv/rvv/base/spill-5.c: Ditto.
	* gcc.target/riscv/rvv/base/spill-9.c: Ditto.

9fdea28d

RISC-V: Align IOR optimization MODE_CLASS condition to AND. · 978e8f02

Pan Li authored 1 year ago


This patch aligns the MODE_CLASS condition of the IOR simplification with
that of AND, so that mode classes other than SCALAR_INT can also perform the
optimization A | (~A) -> -1, as AND already does. For example, see the sample code below.

vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
{
  return __riscv_vmorn_mm_b32(v1, v1, vl);
}

Before this patch:
vsetvli  a5,zero,e8,mf4,ta,ma
vlm.v    v24,0(a1)
vsetvli  zero,a2,e8,mf4,ta,ma
vmorn.mm v24,v24,v24
vsetvli  a5,zero,e8,mf4,ta,ma
vsm.v    v24,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf4,ta,ma
vmset.m v24
vsetvli a5,zero,e8,mf4,ta,ma
vsm.v   v24,0(a0)
ret

Or in RTL's perspective,
from:
(ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
to:
(const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])

A similar optimization for VMANDN is already enabled. There should
be no difference except the operator when comparing VMORN and VMANDN
for this kind of optimization. The patch aligns the IOR MODE_CLASS condition
of the simplification to the AND operator.
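
The identity behind the simplification can be checked on ordinary values. In this sketch a fixed-width std::bitset stands in for a vector mask; that is an assumption made for illustration, since a real VNx2BI mask is length-agnostic, and the function names are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar integer case: A | ~A has every bit set, i.e. it is -1.
inline std::uint32_t or_not(std::uint32_t a) { return a | ~a; }

// AND counterpart that the commit message compares against: A & ~A is 0.
inline std::uint32_t and_not(std::uint32_t a) { return a & ~a; }

// Mask analogue of vmorn.mm with identical operands: every mask bit
// ends up set, which is what vmset.m produces directly.
template<std::size_t N>
std::bitset<N> mask_or_not(const std::bitset<N>& m) { return m | ~m; }
```

Because the result does not depend on the input, the load of the mask operand becomes unnecessary, which is why the "after" assembly no longer contains vlm.v.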

gcc/ChangeLog:

	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
	Align IOR (A | (~A) -> -1) optimization MODE_CLASS condition to AND.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Update check
	condition.
	* gcc.target/riscv/simplify_ior_optimization.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

978e8f02

i386: Emit compares between high registers and memory · 0df6d181

Uros Bizjak authored 1 year ago

Following code:

typedef __SIZE_TYPE__ size_t;

struct S1s
{
  char pad1;
  char val;
  short pad2;
};

extern char ts[256];

_Bool foo (struct S1s a, size_t i)
{
  return (ts[i] > a.val);
}

compiles with -O2 to:

        movl    %edi, %eax
        movsbl  %ah, %edi
        cmpb    %dil, ts(%rsi)
        setg    %al
        ret

the compare could use high register %ah instead of %dil:

        movl    %edi, %eax
        cmpb    ts(%rsi), %ah
        setl    %al
        ret

Use any_extract code iterator to handle signed and unsigned extracts
from high register and introduce peephole2 patterns to propagate
norex memory operand into the compare insn.

gcc/ChangeLog:

	PR target/78904
	PR target/78952
	* config/i386/i386.md (*cmpqi_ext<mode>_1_mem_rex64): New insn pattern.
	(*cmpqi_ext<mode>_1): Use nonimmediate_operand predicate
	for operand 0. Use any_extract code iterator.
	(*cmpqi_ext<mode>_1 peephole2): New peephole2 pattern.
	(*cmpqi_ext<mode>_2): Use any_extract code iterator.
	(*cmpqi_ext<mode>_3_mem_rex64): New insn pattern.
	(*cmpqi_ext<mode>_1): Use general_operand predicate
	for operand 1. Use any_extract code iterator.
	(*cmpqi_ext<mode>_3 peephole2): New peephole2 pattern.
	(*cmpqi_ext<mode>_4): Use any_extract code iterator.

gcc/testsuite/ChangeLog:

	PR target/78904
	PR target/78952
	* gcc.target/i386/pr78952-3.c: New test.

0df6d181

aarch64: Factorise widening add/sub high-half expanders with iterators · a30078d5

Kyrylo Tkachov authored 1 year ago

I noticed these define_expand are almost identical modulo some string substitutions.
This patch compresses them together with a couple of code iterators.
No functional change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_saddw2<mode>): Delete.
	(aarch64_uaddw2<mode>): Delete.
	(aarch64_ssubw2<mode>): Delete.
	(aarch64_usubw2<mode>): Delete.
	(aarch64_<ANY_EXTEND:su><ADDSUB:optab>w2<mode>): New define_expand.

a30078d5

Use solve_add_graph_edge in more places · 57aecdbc

Richard Biener authored 2 years ago

The following makes sure to use solve_add_graph_edge, honoring
special-cases, especially edges from escaped, in the remaining places
the solver adds edges.

	* tree-ssa-structalias.cc (do_ds_constraint): Use
	solve_add_graph_edge.

57aecdbc

Split out solve_add_graph_edge · 2cef0d09

Richard Biener authored 2 years ago

Split out a worker with all the special-casings when adding a graph
edge during solving.

	* tree-ssa-structalias.cc (solve_add_graph_edge): New function,
	split out from ...
	(do_sd_constraint): ... here.

2cef0d09

Remove odd code from gimple_can_merge_blocks_p · 1da16c11

Richard Biener authored 2 years ago

The following removes a special case to not merge a block with
only a non-local label.  Non-local labels are restricted to being
the first statement (and label) in a block, but nothing else is required:
if the last stmt of A is a non-local label then it will still be
the first statement of the combined A + B.  In particular we'd
happily merge when there's a stmt after that label.

The check originates from the tree-ssa merge.

Bootstrapped and tested on x86_64-unknown-linux-gnu with all
languages.

	* tree-cfg.cc (gimple_can_merge_blocks_p): Remove condition
	rejecting the merge when A contains only a non-local label.

1da16c11

Introduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates · 258aecd7

Uros Bizjak authored 1 year ago

These two predicates are similar to existing HARD_REGISTER_P and
HARD_REGISTER_NUM_P predicates and return 1 if the given register
corresponds to a virtual register.
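
A rough sketch of what such range predicates look like. The register-number boundaries below are made-up values and the reg struct is a hypothetical stand-in for an RTL register expression; the real definitions live in rtl.h and are not reproduced here.

```cpp
#include <cassert>

namespace sketch {

// Made-up layout for illustration: hard registers first, then a small
// block of virtual registers, then pseudos.
constexpr unsigned LAST_HARD = 75;
constexpr unsigned FIRST_VIRTUAL = 76;
constexpr unsigned LAST_VIRTUAL = 81;

// Hypothetical stand-in for a register rtx.
struct reg { unsigned regno; };

// Number-based predicates, in the style of HARD_REGISTER_NUM_P.
constexpr bool hard_register_num_p(unsigned r) { return r <= LAST_HARD; }
constexpr bool virtual_register_num_p(unsigned r)
{
  return r >= FIRST_VIRTUAL && r <= LAST_VIRTUAL;
}

// Object-based predicate, in the style of HARD_REGISTER_P.
constexpr bool virtual_register_p(const reg& x)
{
  return virtual_register_num_p(x.regno);
}

} // namespace sketch
```

Naming the range check once lets callers such as those listed in the ChangeLog replace open-coded comparisons with a single predicate.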

gcc/ChangeLog:

	* rtl.h (VIRTUAL_REGISTER_P): New predicate.
	(VIRTUAL_REGISTER_NUM_P): Ditto.
	(REGNO_PTR_FRAME_P): Use VIRTUAL_REGISTER_NUM_P predicate.
	* expr.cc (force_operand): Use VIRTUAL_REGISTER_P predicate.
	* function.cc (instantiate_decl_rtl): Ditto.
	* rtlanal.cc (rtx_addr_can_trap_p_1): Ditto.
	(nonzero_address_p): Ditto.
	(refers_to_regno_p): Use VIRTUAL_REGISTER_NUM_P predicate.

258aecd7

Fix pointer sharing in Value_Range constructor. · 4c9f8cd6
Aldy Hernandez authored 2 years ago
```
gcc/ChangeLog:

	* value-range.h (Value_Range::Value_Range): Avoid pointer sharing.
```
4c9f8cd6

Transform more gmp/mpfr uses to use RAII · 210617b5

Richard Biener authored 1 year ago

The following picks up the coccinelle generated patch from Bernhard,
leaving out the fortran frontend parts and fixing up the rest.
In particular both gmp.h and mpfr.h contain macros like
  #define mpfr_inf_p(_x)      ((_x)->_mpfr_exp == __MPFR_EXP_INF)
for which I add operator-> overloads to the auto_* classes.
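
Why the overload is needed can be shown with a mock library type, so that no gmp/mpfr installation is assumed. Everything named num_* and auto_num here is hypothetical; only the shape (an array-of-one struct type plus a field-access macro) mirrors the situation described above.

```cpp
#include <cassert>

// Mock C-style numeric type: an array of one struct, so a plain num_t
// decays to a pointer and a macro can write (_x)->exp.
struct num_struct { long exp; };
typedef num_struct num_t[1];

#define NUM_EXP_INF 0x7fffffffL
#define num_inf_p(_x) ((_x)->exp == NUM_EXP_INF)

inline void num_init(num_t x)    { x[0].exp = 0; }
inline void num_set_inf(num_t x) { x[0].exp = NUM_EXP_INF; }
inline void num_clear(num_t)     {}

// Hypothetical RAII wrapper in the spirit of the auto_* classes.
class auto_num
{
public:
  auto_num() { num_init(m_num); }
  ~auto_num() { num_clear(m_num); }
  auto_num(const auto_num&) = delete;
  auto_num& operator=(const auto_num&) = delete;

  operator num_struct*() { return m_num; }    // for function-style APIs
  num_struct* operator->() { return m_num; }  // for (_x)->field macros
private:
  num_t m_num;
};

inline bool demo_is_inf_after_set()
{
  auto_num n;
  num_set_inf(n);        // goes through the conversion operator
  return num_inf_p(n);   // macro expands to (n)->exp, needs operator->
}

inline bool demo_is_inf_fresh()
{
  auto_num n;
  return num_inf_p(n);
}
```

Without operator->, the macro expansion (n)->exp would not compile for the wrapper, because the implicit conversion to a pointer is not considered for the built-in arrow operator.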

	* system.h (auto_mpz::operator->()): New.
	* realmpfr.h (auto_mpfr::operator->()): New.
	* builtins.cc (do_mpfr_lgamma_r): Use auto_mpfr.
	* real.cc (real_from_string): Likewise.
	(dconst_e_ptr): Likewise.
	(dconst_sqrt2_ptr): Likewise.
	* tree-ssa-loop-niter.cc (refine_value_range_using_guard):
	Use auto_mpz.
	(bound_difference_of_offsetted_base): Likewise.
	(number_of_iterations_ne): Likewise.
	(number_of_iterations_lt_to_ne): Likewise.
	* ubsan.cc: Include realmpfr.h.
	(ubsan_instrument_float_cast): Use auto_mpfr.

210617b5

Revert "libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]" · fac24d43

Jonathan Wakely authored 1 year ago

This reverts commit b7c54e3f.

libstdc++-v3/ChangeLog:

	* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt:
	* config/abi/post/i486-linux-gnu/baseline_symbols.txt:
	* config/abi/post/m68k-linux-gnu/baseline_symbols.txt:
	* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
	* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt:
	* config/abi/post/s390x-linux-gnu/baseline_symbols.txt:
	* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt:
	* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt:
	* config/abi/pre/gnu.ver:
	* src/Makefile.am:
	* src/Makefile.in:
	* src/c++98/Makefile.am:
	* src/c++98/Makefile.in:
	* src/c++98/globals_io.cc (defined):
	(_GLIBCXX_IO_GLOBAL):

fac24d43

Revert "libstdc++: Fix preprocessor condition in linker script [PR108969]" · a6e4b81b
Jonathan Wakely authored 1 year ago
```
This reverts commit 6067ae45.

libstdc++-v3/ChangeLog:

	* config/abi/pre/gnu.ver:
```
a6e4b81b

Remove special-cased edges when solving copies · 6702fdcd

Richard Biener authored 2 years ago

The following makes sure to remove the copy edges we ignore or
need to special-case only once.

	* tree-ssa-structalias.cc (solve_graph): Remove self-copy
	edges, remove edges from escaped after special-casing them.

6702fdcd

Fix do_sd_constraint escape special casing · 8366e676

Richard Biener authored 2 years ago

The following fixes the escape special casing to test the proper
variable IDs.

	* tree-ssa-structalias.cc (do_sd_constraint): Fixup escape
	special casing.

8366e676

Remove senseless store in do_sd_constraint · 9d218c45

Richard Biener authored 2 years ago

	* tree-ssa-structalias.cc (do_sd_constraint): Do not write
	to the LHS varinfo solution member.

9d218c45

Avoid non-unified nodes on the topological sorting for PTA solving · 7838574b

Richard Biener authored 2 years ago

Since we do not update successor edges when merging nodes we have
to deal with this in the users.  The following avoids putting those
on the topo order vector.

	* tree-ssa-structalias.cc (topo_visit): Look at the real
	destination of edges.

7838574b

tree-optimization/44794 - avoid excessive RTL unrolling on epilogues · a243ce2a

Richard Biener authored 2 years ago

The following adjusts tree_[transform_and_]unroll_loop to set an
upper bound on the number of iterations on the epilogue loop it
creates.  For the testcase at hand which involves array prefetching
this avoids applying RTL unrolling to them when -funroll-loops is
specified.
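
The bound in question is easy to see on a hand-unrolled loop. The sketch below is not what the compiler generates; it is a hypothetical source-level picture of a main loop unrolled by 4 followed by its epilogue, with a counter showing that the epilogue never runs more than 3 times.

```cpp
#include <cassert>
#include <cstddef>

struct unroll_stats { long sum; std::size_t epilogue_iters; };

// Sum n values with the main loop unrolled by 4 and a scalar epilogue.
inline unroll_stats sum_unrolled_by_4(const int* a, std::size_t n)
{
  unroll_stats s{0, 0};
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)                 // unrolled main loop
    s.sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
  for (; i < n; ++i)                         // epilogue: at most 3 iterations
    {
      s.sum += a[i];
      ++s.epilogue_iters;
    }
  return s;
}

// Small driver over the values 1..16; n must not exceed 16.
inline unroll_stats demo_sum_first(std::size_t n)
{
  static const int data[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                               9, 10, 11, 12, 13, 14, 15, 16};
  return sum_unrolled_by_4(data, n);
}
```

Since the epilogue iterates fewer times than the unroll factor, unrolling it again brings little benefit, and recording that upper bound lets later passes see this.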

Other users of this API include predictive commoning and
unroll-and-jam.

	PR tree-optimization/44794
	* tree-ssa-loop-manip.cc (tree_transform_and_unroll_loop):
	If an epilogue loop is required set its iteration upper bound.

a243ce2a

LoongArch: Improve cpymemsi expansion [PR109465] · 6d7e0bcf

Xi Ruoyao authored 1 year ago

We'd been generating really bad block move sequences, which kernel
developers who tried __builtin_memcpy recently complained about.  To
improve it:

1. Take advantage of -mno-strict-align.  When it is set, set mode
   size to UNITS_PER_WORD regardless of the alignment.
2. Halve the mode size when (block size) % (mode size) != 0, instead of
   falling back to ld.bu/st.b at once.
3. Limit the length of block move sequence considering the number of
   instructions, not the size of block.  When -mstrict-align is set and
   the block is not aligned, the old size limit for straight-line
   implementation (64 bytes) was definitely too large (we don't have 64
   registers anyway).
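
Point 2 above can be pictured with a small planner. This is a simplified, hypothetical model rather than the backend code: it only lists the access widths that would be used for a block, starting from the widest permitted width and halving it whenever the remainder no longer fits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical planner: widths in bytes of the moves used for a block of
// `size` bytes.  `max_width` is the widest allowed access and is assumed
// to be a power of two (for example 8 when alignment permits full words).
inline std::vector<std::size_t> plan_block_move(std::size_t size,
                                                std::size_t max_width)
{
  std::vector<std::size_t> widths;
  for (std::size_t w = max_width; w >= 1 && size > 0; w /= 2)
    while (size >= w)          // use this width as long as it fits
      {
        widths.push_back(w);
        size -= w;
      }
  return widths;
}
```

For a 15-byte block with 8-byte words this plans four moves instead of one word move followed by seven byte moves, which is the improvement over falling back to byte accesses at once.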

Change since v1: add a comment about the calculation of num_reg.

gcc/ChangeLog:

	PR target/109465
	* config/loongarch/loongarch-protos.h
	(loongarch_expand_block_move): Add a parameter as alignment RTX.
	* config/loongarch/loongarch.h:
	(LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER): Remove.
	(LARCH_MAX_MOVE_BYTES_STRAIGHT): Remove.
	(LARCH_MAX_MOVE_OPS_PER_LOOP_ITER): Define.
	(LARCH_MAX_MOVE_OPS_STRAIGHT): Define.
	(MOVE_RATIO): Use LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
	LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
	* config/loongarch/loongarch.cc (loongarch_expand_block_move):
	Take the alignment from the parameter, but set it to
	UNITS_PER_WORD if !TARGET_STRICT_ALIGN.  Limit the length of
	straight-line implementation with LARCH_MAX_MOVE_OPS_STRAIGHT
	instead of LARCH_MAX_MOVE_BYTES_STRAIGHT.
	(loongarch_block_move_straight): When there are left-over bytes,
	halve the mode size instead of falling back to byte mode at once.
	(loongarch_block_move_loop): Limit the length of loop body with
	LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
	LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
	* config/loongarch/loongarch.md (cpymemsi): Pass the alignment
	to loongarch_expand_block_move.

gcc/testsuite/ChangeLog:

	PR target/109465
	* gcc.target/loongarch/pr109465-1.c: New test.
	* gcc.target/loongarch/pr109465-2.c: New test.
	* gcc.target/loongarch/pr109465-3.c: New test.

6d7e0bcf

LoongArch: Improve GAR store for va_list · 81c65014

Xi Ruoyao authored 2 years ago

LoongArch backend used to save all GARs for a function with variable
arguments.  But sometimes a function only accepts variable arguments for
a purpose like C++ function overloading.  For example, POSIX defines
open() as:

    int open(const char *path, int oflag, ...);

But only two forms are actually used:

    int open(const char *pathname, int flags);
    int open(const char *pathname, int flags, mode_t mode);

So it's obviously a waste to save all 8 GARs in open().  We can use the
cfun->va_list_gpr_size field set by the stdarg pass to only save the
GARs necessary to be saved.

If the va_list escapes (for example, in fprintf() we pass it to
vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we
don't need a special case.

With this patch, only one GAR ($a2/$r6) is saved in open().  Ideally
even this stack store should be omitted too, but doing so is not trivial
and AFAIK there are no compilers (for any target) performing the "ideal"
optimization here, see https://godbolt.org/z/n1YqWq9c9.
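
The resulting save count can be sketched as a small calculation. The constants and the function name are assumptions for illustration (8 argument registers as stated above, 8-byte words on a 64-bit target), and the value 8 for open() assumes the stdarg pass records one word for the single optional argument.

```cpp
#include <algorithm>
#include <cassert>

constexpr unsigned kUnitsPerWord = 8;  // bytes per GAR, assumed 64-bit target
constexpr unsigned kNumGars = 8;       // argument registers available

// Hypothetical helper: how many GARs need to be stored to the stack.
// named_gars: registers consumed by named parameters.
// va_list_gpr_size: bytes of register arguments that va_arg may read,
// or 255 when the va_list escapes.
inline unsigned gars_to_save(unsigned named_gars, unsigned va_list_gpr_size)
{
  unsigned unnamed = kNumGars - std::min(named_gars, kNumGars);
  return std::min(unnamed, va_list_gpr_size / kUnitsPerWord);
}
```

With two named parameters, a size of 8 gives one saved register, while the escape value 255 is clamped to the six remaining registers, so no separate case is needed for it.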

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
(GCC 14 or now)?

gcc/ChangeLog:

	* config/loongarch/loongarch.cc
	(loongarch_setup_incoming_varargs): Don't save more GARs than
	cfun->va_list_gpr_size / UNITS_PER_WORD.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/va_arg.c: New test.

81c65014

Avoid unnecessary epilogues from tree_unroll_loop · 01e79e21

Richard Biener authored 2 years ago

The following fixes the condition determining whether we need an
epilogue.

	* tree-ssa-loop-manip.cc (determine_exit_conditions): Fix
	no epilogue condition.

01e79e21

Simplify gimple_assign_load · 2c800ed8

Richard Biener authored 2 years ago

The following simplifies and outlines gimple_assign_load.  In
particular it is not necessary to get at the base of the possibly
loaded expression but just handle the case of a single handled
component wrapping a non-memory operand.

	* gimple.h (gimple_assign_load): Outline...
	* gimple.cc (gimple_assign_load): ... here.  Avoid
	get_base_address and instead just strip the outermost
	handled component, treating a remaining handled component
	as load.

2c800ed8

aarch64: Delete __builtin_aarch64_neg* builtins and their use · 9bc407c7

Kyrylo Tkachov authored 1 year ago

I don't think we need to keep the __builtin_aarch64_neg* builtins around.
They are only used once in the vnegh_f16 intrinsic in arm_fp16.h and AFAICT
it was added this way only for the sake of orthogonality in
https://gcc.gnu.org/g:d7f33f07d88984cbe769047e3d07fc21067fbba9
We already use normal "-" negation in the other vneg* intrinsics, so do so here as well.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def (neg): Delete builtins
	definition.
	* config/aarch64/arm_fp16.h (vnegh_f16): Reimplement using normal negation.

9bc407c7

tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011] · ade0a1ee

Jakub Jelinek authored 1 year ago

For __builtin_popcountll tree-vect-patterns.cc has
vect_recog_popcount_pattern, which improves the vectorized code.
Without that the vectorization is always multi-type vectorization
in the loop (at least int and long long types) where we emit two
.POPCOUNT calls with long long arguments and int return value and then
widen to long long, so effectively after vectorization do the
V?DImode -> V?DImode popcount twice, then pack the result into V?SImode
and immediately unpack.

The following patch extends that handling to __builtin_{clz,ctz,ffs}ll
builtins as well (as long as there is an optab for them; more to come
later).

x86 can do __builtin_popcountll with -mavx512vpopcntdq, __builtin_clzll
with -mavx512cd, ppc can do __builtin_popcountll and __builtin_clzll
with -mpower8-vector and __builtin_ctzll with -mpower9-vector, s390
can do __builtin_{popcount,clz,ctz}ll with -march=z13 -mzarch (i.e. VX).
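For illustration, loops of the following shape are the kind of code the
pattern is aimed at: the builtin returns int, but the value is stored
straight back into a long long element, so a single V?DImode -> V?DImode
operation per vector is enough.  This is only a sketch; the function names
are made up for this example and it is not the content of
gcc.dg/vect/pr109011-1.c.  The clz and ctz builtins are undefined for a
zero argument, so the inputs are assumed to be non-zero.

```c
#include <stddef.h>

/* Each loop reads and writes 64-bit elements.  The int result of the
   builtin is widened back to long long by the store.  */

void
count_bits (long long *restrict dst, const unsigned long long *restrict src,
	    size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}

/* Inputs must be non-zero: __builtin_clzll (0) is undefined.  */
void
leading_zeros (long long *restrict dst, const unsigned long long *restrict src,
	       size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = __builtin_clzll (src[i]);
}

/* Inputs must be non-zero: __builtin_ctzll (0) is undefined.  */
void
trailing_zeros (long long *restrict dst, const unsigned long long *restrict src,
		size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = __builtin_ctzll (src[i]);
}

/* __builtin_ffsll returns 0 for a zero argument, otherwise one plus the
   index of the least significant set bit.  */
void
first_set (long long *restrict dst, const unsigned long long *restrict src,
	   size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = __builtin_ffsll ((long long) src[i]);
}
```

Whether a given loop is actually vectorized still depends on the target
flags listed above and on the optab being available.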

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/109011
	* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ...
	(vect_recog_popcount_clz_ctz_ffs_pattern): ... this.  Handle also
	CLZ, CTZ and FFS.  Remove vargs variable, use
	gimple_build_call_internal rather than gimple_build_call_internal_vec.
	(vect_vect_recog_func_ptrs): Adjust popcount entry.

	* gcc.dg/vect/pr109011-1.c: New test.

ade0a1ee

dse: Use SUBREG_REG for copy_to_mode_reg in DSE replace_read for WORD_REGISTER_OPERATIONS targets [PR109040] · 76f44fbf

Jakub Jelinek authored 1 year ago

While we've agreed this is not the right fix for the PR109040 bug,
the patch clearly improves generated code (at least on the testcase from the
PR), so I'd like to propose it as an optimization heuristic improvement
for GCC 14.

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

	PR target/109040
	* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
	REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
	a new REG rather than the SUBREG.

76f44fbf

[aarch64] Use wzr/xzr for assigning 0 to vector element. · 2c7bf803

Prathamesh Kulkarni authored 1 year ago

gcc/ChangeLog:
	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
	New pattern.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/vec-set-zero.c: New test.

2c7bf803

aarch64: PR target/108840 Simplify register shift RTX costs and eliminate shift amount masking · 136330bf

Kyrylo Tkachov authored 1 year ago

In this PR we fail to eliminate explicit &31 operations for variable shifts such as in:
void
bar (int x[3], int y)
{
  x[0] <<= (y & 31);
  x[1] <<= (y & 31);
  x[2] <<= (y & 31);
}

This is rejected by RTX costs that end up giving too high a cost for:
(set (reg:SI 96)
    (ashift:SI (reg:SI 98)
        (subreg:QI (and:SI (reg:SI 99)
                (const_int 31 [0x1f])) 0)))

There is code to handle the AND-31 case in rtx costs, but it gets confused by the subreg.
It's easy enough to fix by looking inside the subreg when costing the expression.
While doing that I noticed that the ASHIFT case and the other shift-like cases are almost identical
and we should just merge them. This code will only be used for valid insns anyway, so the code after this
patch should do the Right Thing (TM) for all such shift cases.

With this patch there are no more "and wn, wn, 31" instructions left in the testcase.
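As a reminder of why the explicit mask is redundant on this target: the
AArch64 variable shift instructions on 32-bit registers already take the
shift amount modulo 32, so the C-level "& 31" only exists to keep the
shift well defined in the source.  A minimal sketch of the same idiom, using
unsigned operands and a made-up function name (shl_masked) that does not
appear in the testcase:

```c
#include <stdint.h>

/* Shift X left by Y modulo 32.  The mask keeps the C shift defined for
   any Y; on AArch64 a single variable shift can implement the whole
   expression once the cost model accepts the combined pattern.  */
uint32_t
shl_masked (uint32_t x, unsigned int y)
{
  return x << (y & 31);
}
```

For example, shl_masked (1, 33) shifts by 1 and shl_masked (3, 32) shifts
by 0.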

Bootstrapped and tested on aarch64-none-linux-gnu.

	PR target/108840

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_rtx_costs): Merge ASHIFT and
	ROTATE, ROTATERT, LSHIFTRT, ASHIFTRT cases.  Handle subregs in op1.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr108840.c: New test.

136330bf

rtl-optimization/109237 - quadraticness in delete_trivially_dead_insns · 675ac882

Richard Biener authored 2 years ago

The following addresses quadraticness in processing debug insns
in delete_trivially_dead_insns and insn_live_p by using TREE_VISITED
on the INSN_VAR_LOCATION_DECL to indicate a later debug bind
with the same decl and no intervening real insn or debug marker.
That gets rid of the NEXT_INSN walk in insn_live_p.  In exchange,
the first loop over the insns clears TREE_VISITED up front, and we
keep track of the decls we set the bit on, since it needs to be
cleared again when visiting a real insn or a debug marker insn.
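The idea can be sketched outside of GCC as follows.  This is a simplified
model and not the code in cse.cc: the struct, the enum, the visited[] array
standing in for TREE_VISITED, and the function name mark_dead_binds are all
made up for this example, and decls are assumed to be small integers below
MAX_DECLS.  The insns are walked backwards; a debug bind is dead if a later
bind of the same decl has been seen with no real insn or debug marker in
between.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DECLS 64

enum insn_kind { REAL_INSN, DEBUG_MARKER, DEBUG_BIND };

struct insn
{
  enum insn_kind kind;
  int decl;	/* Only meaningful for DEBUG_BIND; must be < MAX_DECLS.  */
  bool dead;	/* Set by mark_dead_binds.  */
};

/* Walk INSNS backwards and mark debug binds that are superseded by a
   later bind of the same decl.  Returns the number of dead binds.
   Each insn is visited once and each flag is cleared at most once per
   time it is set, so the walk is linear rather than quadratic.  */
size_t
mark_dead_binds (struct insn *insns, size_t n)
{
  bool visited[MAX_DECLS] = { false };	/* Stand-in for TREE_VISITED.  */
  int marked[MAX_DECLS];		/* Decls whose flag is currently set.  */
  size_t n_marked = 0;
  size_t n_dead = 0;

  for (size_t i = n; i-- > 0; )
    {
      struct insn *in = &insns[i];
      in->dead = false;

      if (in->kind != DEBUG_BIND)
	{
	  /* A real insn or debug marker: earlier binds are live again,
	     so clear every flag that was set since the last barrier.  */
	  while (n_marked > 0)
	    visited[marked[--n_marked]] = false;
	  continue;
	}

      if (visited[in->decl])
	{
	  /* A later bind of the same decl follows with no barrier.  */
	  in->dead = true;
	  n_dead++;
	}
      else
	{
	  visited[in->decl] = true;
	  marked[n_marked++] = in->decl;
	}
    }
  return n_dead;
}
```

The marked[] array plays the role of the book-keeping mentioned above: it
lets the barrier case clear only the flags that were actually set instead
of scanning all decls.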

That improves the time spent in delete_trivially_dead_insns from
10.6s to 2.2s for the testcase.

	PR rtl-optimization/109237
	* cse.cc (insn_live_p): Remove NEXT_INSN walk, instead check
	TREE_VISITED on INSN_VAR_LOCATION_DECL.
	(delete_trivially_dead_insns): Maintain TREE_VISITED on
	active debug bind INSN_VAR_LOCATION_DECL.

675ac882