Commits · 6be5d852216d36f5b0024cd581c2508c168647a6 · COBOLworx / gcc-cobol

Jun 06, 2023

aarch64: Improve representation of vpaddd intrinsics · 6be5d852

Kyrylo Tkachov authored 1 year ago

The aarch64_addpdi pattern is redundant as the reduc_plus_scal_<mode> pattern can already generate
the required form of the ADDP instruction, and is mostly folded to GIMPLE early on so can benefit from more optimisations.
Though it turns out that we were missing the folding for the unsigned variants.
This patch adds that and wires up the vpaddd_u64 and vpaddd_s64 intrinsics through the above pattern instead
so that we can remove a redundant pattern and get more optimisation earlier.
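
As a rough illustration of why the two forms are interchangeable: a pairwise add over a two-lane 64-bit vector is the same thing as a plus-reduction of that vector. The sketch below models this in portable C. The names `v2di_model`, `vpaddd_model` and `reduc_plus_model` are hypothetical stand-ins, not the arm_neon.h or builtin definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a vector of two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } v2di_model;

/* Model of the pairwise add: sum the single pair of lanes (wraps mod 2^64). */
static uint64_t vpaddd_model(v2di_model v)
{
    return v.lane[0] + v.lane[1];
}

/* Model of a plus-reduction: sum every lane of the vector. */
static uint64_t reduc_plus_model(v2di_model v)
{
    uint64_t sum = 0;
    for (int i = 0; i < 2; i++)
        sum += v.lane[i];
    return sum;
}
```

With only two lanes there is exactly one pair, so both models return the same value for every input.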

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc (aarch64_general_gimple_fold_builtin):
	Handle unsigned reduc_plus_scal_ builtins.
	* config/aarch64/aarch64-simd-builtins.def (addp): Delete DImode instances.
	* config/aarch64/aarch64-simd.md (aarch64_addpdi): Delete.
	* config/aarch64/arm_neon.h (vpaddd_s64): Reimplement with
	__builtin_aarch64_reduc_plus_scal_v2di.
	(vpaddd_u64): Reimplement with __builtin_aarch64_reduc_plus_scal_v2di_uu.

6be5d852

aarch64: Reimplement URSHR,SRSHR patterns with standard RTL codes · 93716409

Kyrylo Tkachov authored 1 year ago

Having converted the patterns for the URSRA,SRSRA instructions to standard RTL codes we can also
easily convert the non-accumulating forms URSHR,SRSHR.
This patch does that, reusing the various helpers and predicates from that patch in a straightforward way.
This allows GCC to perform the optimisations in the testcase, matching what Clang does.
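
For readers unfamiliar with the rounding shifts, here is a minimal scalar sketch of what URSHR computes on one 32-bit lane: add the rounding constant 2^(n-1) in a wider type, shift right by n, then truncate. The function name `urshr32_model` is hypothetical and the code only models the arithmetic; it is not the RTL from the patch. SRSHR is the signed analogue.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of URSHR on a 32-bit lane: unsigned rounding shift
   right by an immediate n, where 1 <= n <= 32.  */
static uint32_t urshr32_model(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 32);
    /* Widen first so that adding the rounding constant cannot overflow. */
    uint64_t wide = (uint64_t)x + (UINT64_C(1) << (n - 1));
    return (uint32_t)(wide >> n);
}
```

Expressing the operation as a widening add followed by a shift and a truncation is what lets generic simplifications see through it, where an opaque unspec would not.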

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_<sur>shr_n<mode>): Delete.
	(aarch64_<sra_op>rshr_n<mode><vczle><vczbe>_insn): New define_insn.
	(aarch64_<sra_op>rshr_n<mode>): New define_expand.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/vrshr_1.c: New test.

93716409

aarch64: Simplify SHRN, RSHRN expanders and patterns · d2cdfafd

Kyrylo Tkachov authored 1 year ago

Now that we've got the <vczle><vczbe> annotations we can get rid of explicit
!BYTES_BIG_ENDIAN and BYTES_BIG_ENDIAN patterns for the narrowing shift instructions.
This allows us to clean up the expanders as well.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Delete.
	(aarch64_shrn<mode>_insn_be): Delete.
	(*aarch64_<srn_op>shrn<mode>_vect):  Rename to...
	(*aarch64_<srn_op>shrn<mode><vczle><vczbe>): ... This.
	(aarch64_shrn<mode>): Remove reference to the above deleted patterns.
	(aarch64_rshrn<mode>_insn_le): Delete.
	(aarch64_rshrn<mode>_insn_be): Delete.
	(aarch64_rshrn<mode><vczle><vczbe>_insn): New define_insn.
	(aarch64_rshrn<mode>): Remove references to the above deleted patterns.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/pr99195_5.c: Add testing for shrn_n, rshrn_n
	intrinsics.

d2cdfafd

aarch64: Improve representation of ADDLV instructions · b327cbe8

Kyrylo Tkachov authored 1 year ago

We've received requests to optimise the attached intrinsics testcase.
We currently generate:
foo_1:
        uaddlp  v0.4s, v0.8h
        uaddlv  d31, v0.4s
        fmov    x0, d31
        ret
foo_2:
        uaddlp  v0.4s, v0.8h
        addv    s31, v0.4s
        fmov    w0, s31
        ret
foo_3:
        saddlp  v0.4s, v0.8h
        addv    s31, v0.4s
        fmov    w0, s31
        ret

The widening pair-wise addition addlp instructions can be omitted if we're just doing an ADDV afterwards.
Making this optimisation would be quite simple if we had a standard RTL PLUS vector reduction code.
As we don't, we can use UNSPEC_ADDV as a stand in.
This patch expresses the SADDLV and UADDLV instructions as an UNSPEC_ADDV over a widened input, thus removing
the need for separate UNSPEC_SADDLV and UNSPEC_UADDLV codes.
To optimise the testcases involved we add two splitters that match a vector addition where all participating elements
are taken and widened from the same vector and then fed into an UNSPEC_ADDV. In that case we can just remove the
vector PLUS and just emit the simple RTL for SADDLV/UADDLV.
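
The equivalence being exploited can be checked with a small scalar model. This is a sketch with hypothetical function names that models the unsigned 16-bit to 32-bit case from foo_2: widening pairwise addition followed by ADDV gives the same result as a single widening reduction over the original vector.

```c
#include <stdint.h>

/* Model of UADDLP v.4s, v.8h: add adjacent 16-bit lanes into 32-bit lanes. */
static void uaddlp_model(const uint16_t in[8], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint32_t)in[2 * i] + (uint32_t)in[2 * i + 1];
}

/* Model of ADDV s, v.4s: sum all four 32-bit lanes. */
static uint32_t addv_model(const uint32_t in[4])
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += in[i];
    return sum;
}

/* Model of UADDLV s, v.8h: widen each 16-bit lane and sum all eight. */
static uint32_t uaddlv_model(const uint16_t in[8])
{
    uint32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += in[i];
    return sum;
}
```

Eight 16-bit values sum to at most 8 * 65535, which fits in 32 bits, so no wrap-around can make the two computations differ.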

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch64_parallel_select_half_p):
	Define prototype.
	(aarch64_pars_overlap_p): Likewise.
	* config/aarch64/aarch64-simd.md (aarch64_<su>addlv<mode>):
	Express in terms of UNSPEC_ADDV.
	(*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): Likewise.
	(*aarch64_<su>addlv<mode>_reduction): Define.
	(*aarch64_uaddlv<mode>_reduction_2): Likewise.
	* config/aarch64/aarch64.cc	(aarch64_parallel_select_half_p): Define.
	(aarch64_pars_overlap_p): Likewise.
	* config/aarch64/iterators.md (UNSPEC_SADDLV, UNSPEC_UADDLV): Delete.
	(VQUADW): New mode attribute.
	(VWIDE2X_S): Likewise.
	(USADDLV): Delete.
	(su): Delete handling of UNSPEC_SADDLV, UNSPEC_UADDLV.
	* config/aarch64/predicates.md (vect_par_cnst_select_half): Define.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/addlv_1.c: New test.

b327cbe8

middle-end/110055 - avoid CLOBBERing static variables · 84eec291

Richard Biener authored 1 year ago

The gimplifier can elide initialized constant automatic variables
to static storage in which case TARGET_EXPR gimplification needs
to avoid emitting a CLOBBER for them since their lifetime is no
longer limited.  Failing to do so causes spurious dangling-pointer
diagnostics on the added testcase for some targets.

	PR middle-end/110055
	* gimplify.cc (gimplify_target_expr): Do not emit
	CLOBBERs for variables which have static storage duration
	after gimplifying their initializers.

	* g++.dg/warn/Wdangling-pointer-pr110055.C: New testcase.

84eec291

tree-optimization/109143 - improve PTA compile time · 21bf2b2f

Richard Biener authored 1 year ago

The following improves solution_set_expand to require one less
iteration over the bitmap and avoid changing the bitmap we iterate
over.  Plus we handle adjacent subvars in the ID space (the common case)
and use bitmap_set_range.  This cuts a bit less than 10% off the PTA
time from the testcase in the PR.

	PR tree-optimization/109143
	* tree-ssa-structalias.cc (solution_set_expand): Avoid
	one bitmap iteration and optimize bit range setting.

21bf2b2f

libiberty: writeargv: Simplify function error mode. · 4d1e4ce9

Costas Argyris authored 1 year ago

writeargv can be simplified by getting rid of the error exit mode
that was only relevant many years ago when the function used
to open the file descriptor internally.

0001-libiberty-writeargv-Simplify-function-error-mode.patch

From 1271552baee5561fa61652f4ca7673c9667e4f8f Mon Sep 17 00:00:00 2001
From: Costas Argyris <costas.argyris@gmail.com>
Date: Mon, 5 Jun 2023 15:02:06 +0100
Subject: [PATCH] libiberty: writeargv: Simplify function error mode.

The goto-based error mode was based on a previous version
of the function where it was responsible for opening the
file, so it had to close it upon any exit:

https://inbox.sourceware.org/gcc-patches/20070417200340.GM9017@sparrowhawk.codesourcery.com/

(thanks pinskia)

This is no longer the case though since now the function
takes the file descriptor as input, so the exit mode on
error can be just a simple return 1 statement.
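
A minimal sketch of the resulting shape, assuming a function that receives an already-open stream and so has nothing to clean up on failure. `write_args_sketch` is a hypothetical name and the escaping is simplified; this is not the libiberty source.

```c
#include <ctype.h>
#include <stdio.h>

/* Write each argument on its own line, backslash-escaping whitespace,
   backslashes and quotes.  Returns 0 on success and 1 on any error.
   The caller owns the stream, so every error path is a plain return.  */
static int write_args_sketch(char *const *argv, FILE *f)
{
    if (f == NULL)
        return 1;

    for (; *argv != NULL; argv++) {
        for (const char *p = *argv; *p != '\0'; p++) {
            unsigned char c = (unsigned char)*p;
            if (isspace(c) || c == '\\' || c == '\'' || c == '"')
                if (fputc('\\', f) == EOF)
                    return 1;
            if (fputc(c, f) == EOF)
                return 1;
        }
        if (fputc('\n', f) == EOF)
            return 1;
    }
    return 0;
}
```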

libiberty/
	* argv.c (writeargv): Simplify & remove gotos.

Signed-off-by: Costas Argyris <costas.argyris@gmail.com>

4d1e4ce9

bootstrap rtl-checking: Fix XVEC vs XVECEXP in postreload.cc · 9677cc74

Hans-Peter Nilsson authored 1 year ago

	PR bootstrap/110120
	* postreload.cc (reload_cse_move2add, move2add_use_add2_insn): Use
	XVECEXP, not XEXP, to access first item of a PARALLEL.

9677cc74

[RISC-V] add TC for save-restore cfi directives. · d1344c41

Fei Gao authored 1 year ago

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/save-restore-cfi.c: New test to check save-restore
	cfi directives.

d1344c41

RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API · 78058904

Pan Li authored 1 year ago


This patch supports the intrinsic API for FP16 ZVFH floating-point reductions,
i.e. SEW=16 for the instructions below:

vfredosum vfredusum
vfredmax vfredmin
vfwredosum vfwredusum

Users can then use the intrinsic APIs to perform FP16
reduction operations. Note that the test files do not cover every intrinsic
API; only some typical ones were picked because there are too many. We will
test the FP16 intrinsic APIs exhaustively soon.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-types.def
	(vfloat16mf4_t): Add vfloat16mf4_t to WF operations.
	(vfloat16mf2_t): Likewise.
	(vfloat16m1_t): Likewise.
	(vfloat16m2_t): Likewise.
	(vfloat16m4_t): Likewise.
	(vfloat16m8_t): Likewise.
	* config/riscv/vector-iterators.md: Add FP=16 to VWF, VWF_ZVE64,
	VWLMUL1, VWLMUL1_ZVE64, vwlmul1 and vwlmul1_zve64.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new test cases.

78058904

[RISC-V] correct machine mode in save-restore cfi RTL. · 17c796c7

Fei Gao authored 1 year ago

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): Use Pmode
	for cfi reg/mem machmode
	(riscv_adjust_libcall_cfi_epilogue): Use Pmode for cfi reg machmode

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/save-restore-cfi-2.c: New test to check machmode
	for cfi reg/mem.

17c796c7

RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in vector-iterators.md. · da2d75af

Li Xu authored 1 year ago

gcc/ChangeLog:

	* config/riscv/vector-iterators.md:
	Fix 'REQUIREMENT' for machine_mode 'MODE'.
	* config/riscv/vector.md (@pred_indexed_<order>store<VNX16_QHS:mode>
	<VNX16_QHSI:mode>): change VNX16_QHSI to VNX16_QHSDI.
	(@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Ditto.

da2d75af

RISC-V: Fix some typo in vector-iterators.md · 6d4b6f7b

Pan Li authored 1 year ago


This patch fixes some typos in vector-iterators.md, namely:

[-"vnx1DI")-]{+"vnx1di")+}
[-"vnx2SI")-]{+"vnx2si")+}
[-"vnx1SI")-]{+"vnx1si")+}

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/vector-iterators.md: Fix typo in mode attr.

6d4b6f7b

Daily bump. · 14da7648

GCC Administrator authored 1 year ago

14da7648

Jun 05, 2023

Remove widen_plus/minus_expr tree codes · 8ebd1d9a

Andre Vieira authored 1 year ago

This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.

2023-06-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Joel Hutton  <joel.hutton@arm.com>

gcc/ChangeLog:

	* doc/generic.texi: Remove old tree codes.
	* expr.cc (expand_expr_real_2): Remove old tree code cases.
	* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
	* optabs-tree.cc (optab_for_tree_code): Likewise.
	(supportable_half_widening_operation): Likewise.
	* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
	* tree-inline.cc (estimate_operator_cost): Likewise.
	(op_symbol_code): Likewise.
	* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
	(vect_analyze_data_ref_accesses): Likewise.
	* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
	* cfgexpand.cc (expand_debug_expr): Likewise.
	* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
	(supportable_widening_operation): Likewise.
	* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
	Likewise.
	* optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab,
	vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab,
	vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab,
	vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs.
	* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
	* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
	VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
	VEC_WIDEN_MINUS_LO_EXPR): Likewise.

8ebd1d9a

internal-fn,vect: Refactor widen_plus as internal_fn · 2f482a07

Andre Vieira authored 1 year ago

DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
respectively, except that they provide convenience wrappers
for a single vector to vector conversion, a hi/lo split or an even/odd
split.  Each definition for <NAME> requires either signed optabs
named <UOPTAB> and <SOPTAB> (for widening) or a single <OPTAB> (for
narrowing) for each of the five functions it creates.

For example, for widening addition
DEF_INTERNAL_WIDENING_OPTAB_FN creates five internal functions:
IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN and IFN_VEC_WIDEN_PLUS_ODD, each requiring two
optabs, one for signed and one for unsigned.
AArch64 implements the hi/lo split optabs:
      IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
      IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode> -> (u/s)addl

     This gives the same functionality as the previous
WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
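
To make the hi/lo split concrete, here is a scalar sketch of a signed widening add on eight 8-bit lanes, producing two results of four 16-bit lanes each. The function names are hypothetical, and "lo" is taken to mean the lower-numbered lanes, which is an assumption of this sketch (the actual lane assignment depends on endianness).

```c
#include <stdint.h>

enum { LANES = 8, HALF = LANES / 2 };

/* Model of the _LO half (saddl on AArch64): widen and add lanes 0..3. */
static void widen_plus_lo_model(const int8_t a[LANES], const int8_t b[LANES],
                                int16_t out[HALF])
{
    for (int i = 0; i < HALF; i++)
        out[i] = (int16_t)(a[i] + b[i]);
}

/* Model of the _HI half (saddl2 on AArch64): widen and add lanes 4..7. */
static void widen_plus_hi_model(const int8_t a[LANES], const int8_t b[LANES],
                                int16_t out[HALF])
{
    for (int i = 0; i < HALF; i++)
        out[i] = (int16_t)(a[HALF + i] + b[HALF + i]);
}
```

Because each sum is formed in the wider type, values such as 127 + 127 or -128 + -128 are represented exactly instead of wrapping.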

2023-06-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Joel Hutton  <joel.hutton@arm.com>
	    Tamar Christina  <tamar.christina@arm.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
	this ...
	(vec_widen_<su>add_lo_<mode>): ... to this.
	(vec_widen_<su>addl_hi_<mode>): Rename this ...
	(vec_widen_<su>add_hi_<mode>): ... to this.
	(vec_widen_<su>subl_lo_<mode>): Rename this ...
	(vec_widen_<su>sub_lo_<mode>): ... to this.
	(vec_widen_<su>subl_hi_<mode>): Rename this ...
	(vec_widen_<su>sub_hi_<mode>): ...to this.
	* doc/generic.texi: Document new IFN codes.
	* internal-fn.cc (lookup_hilo_internal_fn): Add lookup function.
	(commutative_binary_fn_p): Add widen_plus fn's.
	(widening_fn_p): New function.
	(narrowing_fn_p): New function.
	(direct_internal_fn_optab): Change visibility.
	* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
	internal_fn that expands into multiple internal_fns for widening.
	(IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
	IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
	IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI,
	IFN_VEC_WIDEN_MINUS_LO, IFN_VEC_WIDEN_MINUS_ODD,
	IFN_VEC_WIDEN_MINUS_EVEN): Define widening  plus,minus functions.
	* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
	(lookup_hilo_internal_fn): Likewise.
	(widening_fn_p): Likewise.
	(narrowing_fn_p): Likewise.
	* optabs.cc (commutative_optab_p): Add widening plus optabs.
	* optabs.def (OPTAB_D): Define widen add, sub optabs.
	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
	patterns with a hi/lo or even/odd split.
	(vect_recog_sad_pattern): Refactor to use new IFN codes.
	(vect_recog_widen_plus_pattern): Likewise.
	(vect_recog_widen_minus_pattern): Likewise.
	(vect_recog_average_pattern): Likewise.
	* tree-vect-stmts.cc (vectorizable_conversion): Add support for
	_HILO IFNs.
	(supportable_widening_operation): Likewise.
	* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-widen-add.c: Test that new
	IFN_VEC_WIDEN_PLUS is being used.
	* gcc.target/aarch64/vect-widen-sub.c: Test that new
	IFN_VEC_WIDEN_MINUS is being used.

2f482a07

vect: Refactor to allow internal_fn's · fe29963d

Andre Vieira authored 1 year ago

Refactor vect-patterns to allow patterns to be internal_fns starting
with widening_plus/minus patterns

2023-06-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Joel Hutton  <joel.hutton@arm.com>

gcc/ChangeLog:
	* tree-vect-patterns.cc: Add include for gimple-iterator.
	(vect_recog_widen_op_pattern): Refactor to use code_helper.
	(vect_gimple_build): New function.
	* tree-vect-stmts.cc (simple_integer_narrowing): Refactor to use
	code_helper.
	(vectorizable_call): Likewise.
	(vect_gen_widened_results_half): Likewise.
	(vect_create_vectorized_demotion_stmts): Likewise.
	(vect_create_vectorized_promotion_stmts): Likewise.
	(vect_create_half_widening_stmts): Likewise.
	(vectorizable_conversion): Likewise.
	(supportable_widening_operation): Likewise.
	(supportable_narrowing_operation): Likewise.
	* tree-vectorizer.h (supportable_widening_operation): Change
	prototype to use code_helper.
	(supportable_narrowing_operation): Likewise.
	(vect_gimple_build): New function prototype.
	* tree.h (code_helper::safe_as_tree_code): New function.
	(code_helper::safe_as_fn_code): New function.

fe29963d

d: Warn when declared size of a special enum does not match its intrinsic type. · 3ad9313a

Iain Buclaw authored 1 year ago

All special enums have declarations in the D runtime library, but the
compiler will recognize and treat them specially if declared in any
module.  When the underlying base type of a special enum is a different
size to its matched intrinsic, then this can cause undefined behavior at
runtime.  Detect and warn about when such a mismatch occurs.

gcc/d/ChangeLog:

	* gdc.texi (Warnings): Document -Wextra and -Wmismatched-special-enum.
	* implement-d.texi (Special Enums): Add reference to warning option
	-Wmismatched-special-enum.
	* lang.opt: Add -Wextra and -Wmismatched-special-enum.
	* types.cc (TypeVisitor::visit (TypeEnum *)): Warn when declared
	special enum size mismatches its intrinsic type.

gcc/testsuite/ChangeLog:

	* gdc.dg/Wmismatched_enum.d: New test.

3ad9313a

New wi::bitreverse function. · 108ff03b

Roger Sayle authored 1 year ago

This patch provides a wide-int implementation of bitreverse, that
implements both of Richard Sandiford's suggestions from the review at
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618215.html of an
improved API (as a stand-alone function matching the bswap refactoring),
and an implementation that works with any bit-width precision.
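
As an illustration of what bit reversal at an arbitrary precision means, here is a minimal sketch limited to values of at most 64 bits. `bitreverse_model` is a hypothetical name; the real wide-int implementation works on multi-word integers and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `precision` bits of x (1 <= precision <= 64).
   Bit i of the input becomes bit (precision - 1 - i) of the result;
   input bits at or above `precision` are ignored.  */
static uint64_t bitreverse_model(uint64_t x, unsigned precision)
{
    assert(precision >= 1 && precision <= 64);
    uint64_t result = 0;
    for (unsigned i = 0; i < precision; i++)
        if ((x >> i) & 1)
            result |= UINT64_C(1) << (precision - 1 - i);
    return result;
}
```

Note that the result depends on the precision: reversing the value 1 at precision 4 gives 8, while at precision 64 it gives the top bit.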

2023-06-05  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* wide-int.cc (wi::bitreverse_large): New function implementing
	bit reversal of an integer.
	* wide-int.h (wi::bitreverse): New (template) function prototype.
	(bitreverse_large): Prototype helper function/implementation.
	(wi::bitreverse): New template wrapper around bitreverse_large.

108ff03b

Testsuite: Fix a fail about xtheadcondmov-indirect-rv64.c · f7f12f0b

Liao Shihua authored 1 year ago

The xtheadcondmov-indirect-rv64.c test case fails, and this patch provides a fix.
Following Kito's advice, I changed the form of the expected function bodies so that
registers are matched with a pattern like *[a-x0-9].

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Generalize to be
	less sensitive to register allocation choices.
	* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Similarly.

f7f12f0b

print-rtl: Change return type of two print functions from int to void · 8e1e1fc4

Uros Bizjak authored 1 year ago

Also change one internal variable to bool.

gcc/ChangeLog:

	* rtl.h (print_rtl_single): Change return type from int to void.
	(print_rtl_single_with_indent): Ditto.
	* print-rtl.h (class rtx_writer): Ditto.  Change m_sawclose to bool.
	* print-rtl.cc (rtx_writer::rtx_writer): Update for m_sawclose change.
	(rtx_writer::print_rtx_operand_code_0): Ditto.
	(rtx_writer::print_rtx_operand_codes_E_and_V): Ditto.
	(rtx_writer::print_rtx_operand_code_i): Ditto.
	(rtx_writer::print_rtx_operand_code_u): Ditto.
	(rtx_writer::print_rtx_operand): Ditto.
	(rtx_writer::print_rtx): Ditto.
	(rtx_writer::finish_directive): Ditto.
	(print_rtl_single): Change return type from int to void
	and adjust function body accordingly.
	(rtx_writer::print_rtl_single_with_indent): Ditto.

8e1e1fc4

reginfo: Change return type of predicate functions from int to bool · d015c658

Uros Bizjak authored 1 year ago

gcc/ChangeLog:

	* rtl.h (reg_classes_intersect_p): Change return type from int to bool.
	(reg_class_subset_p): Ditto.
	* reginfo.cc (reg_classes_intersect_p): Ditto.
	(reg_class_subset_p): Ditto.

d015c658

libiberty: pex-win32.c: Fix some typos. · 7ee22dc8

Costas Argyris authored 1 year ago


libiberty/ChangeLog:

	* pex-win32.c: fix typos.

Signed-off-by: Costas Argyris <costas.argyris@gmail.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>

7ee22dc8

RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API · 71ea7a30

Pan Li authored 1 year ago


This patch supports the intrinsic API for FP16 ZVFH floating-point, i.e.
SEW=16 for the instructions below:

vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt

Users can then use the intrinsic APIs to perform FP16
operations. Note that the test files do not cover every intrinsic API;
only some typical ones were picked because there are too many. We will test
the FP16 intrinsic APIs exhaustively soon.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-types.def
	(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
	(vfloat32m1_t): Ditto.
	(vfloat32m2_t): Ditto.
	(vfloat32m4_t): Ditto.
	(vfloat32m8_t): Ditto.
	(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
	(vint16mf2_t): Ditto.
	(vint16m1_t): Ditto.
	(vint16m2_t): Ditto.
	(vint16m4_t): Ditto.
	(vint16m8_t): Ditto.
	(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
	(vuint16mf2_t): Ditto.
	(vuint16m1_t): Ditto.
	(vuint16m2_t): Ditto.
	(vuint16m4_t): Ditto.
	(vuint16m8_t): Ditto.
	(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
	(vint32m1_t): Ditto.
	(vint32m2_t): Ditto.
	(vint32m4_t): Ditto.
	(vint32m8_t): Ditto.
	(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
	(vuint32m1_t): Ditto.
	(vuint32m2_t): Ditto.
	(vuint32m4_t): Ditto.
	(vuint32m8_t): Ditto.
	* config/riscv/vector-iterators.md: Add FP=16 support for V,
	VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

71ea7a30

libiberty: On Windows, pass a >32k cmdline through a response file. · 180ebb8a

Costas Argyris authored 1 year ago


pex-win32.c (win32_spawn): If the command line for CreateProcess
exceeds the 32k Windows limit, try to store it in a temporary
response file and call CreateProcess with @file instead (PR71850).
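
The idea can be sketched in portable C as follows. Everything here is an assumption for illustration: the names `needs_response_file` and `build_response_cmdline`, the exact limit constant, and the `program @file` layout are hypothetical, and the real win32_spawn code also has to create and fill the temporary file.

```c
#include <stdio.h>
#include <string.h>

/* Assumed limit for this sketch: 32k characters including the terminator. */
enum { CMDLINE_LIMIT_SKETCH = 32768 };

/* Return 1 if the command line is too long to pass directly. */
static int needs_response_file(const char *cmdline)
{
    return strlen(cmdline) + 1 > CMDLINE_LIMIT_SKETCH;
}

/* Build the replacement command line "program @rsp_path" into out.
   Returns 0 on success, 1 if the result does not fit in out_size.  */
static int build_response_cmdline(const char *program, const char *rsp_path,
                                  char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s @%s", program, rsp_path);
    return (n < 0 || (size_t)n >= out_size) ? 1 : 0;
}
```

A caller would test `needs_response_file` first, write the original arguments to the response file, and only then spawn the child with the short command line.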

Signed-off-by: Costas Argyris <costas.argyris@gmail.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>

libiberty/ChangeLog:

	* pex-win32.c (win32_spawn): Check command line length
	and generate a response file if necessary.
	(spawn_script): Adjust parameters.
	(pex_win32_exec_child): Ditto.

Signed-off-by: Jonathan Yong <10walls@gmail.com>

180ebb8a

Fix PR 110085: `make clean` in GCC directory on sh target causes a failure · afd87299

Andrew Pinski authored 1 year ago

On the sh target, there is a MULTILIB_DIRNAMES (or is it MULTILIB_OPTIONS) entry named m2,
which conflicts with the language m2. So when you do a `make clean`, it will remove
the m2 directory and then a build will fail. Now since r0-78222-gfa9585134f6f58,
the multilib directories are no longer created in the gcc directory as libgcc
was moved to the toplevel. So we can remove the part of clean that removes those
directories.

Tested on x86_64-linux-gnu and a cross to sh-elf that `make clean` followed by
`make` works again.

OK?

gcc/ChangeLog:

	PR bootstrap/110085
	* Makefile.in (clean): Remove the removing of
	MULTILIB_DIR/MULTILIB_OPTIONS directories.

afd87299

libgcc: Use initarray section type for .init_stack · 83c3550e

Kewen Lin authored 1 year ago

One of my workmates found there is a warning like:

  libgcc/config/rs6000/morestack.S:402: Warning: ignoring
    incorrect section type for .init_array.00000

when compiling libgcc/config/rs6000/morestack.S.

Since commit r13-6545 touched that file recently and was
suspected to be responsible for this warning, I did some
investigation and found that the warning has been there for a
long time.  For section .init_stack*, it's preferred to use
section type SHT_INIT_ARRAY.  So this patch uses
"@init_array" instead of "@progbits".

Although the warning is trivial, Segher suggested that I
post this fix, in order to avoid any possible
misunderstanding/confusion about the warning.

As Alan confirmed, this doesn't require first checking
whether the existing binutils supports "@init_array" or not,
"because if you want split-stack to work, you must link
with gold, any version of binutils that has gold has an
assembler that understands @init_array". (Thanks Alan!)

libgcc/ChangeLog:

	* config/i386/morestack.S: Use @init_array rather than
	@progbits for section type of section .init_array.
	* config/rs6000/morestack.S: Likewise.
	* config/s390/morestack.S: Likewise.

83c3550e

MIPS: Add speculation_barrier support · 29b74545

YunQiang Su authored 1 year ago

speculation_barrier for MIPS needs sync+jr.hb (r2+),
so we implement __speculation_barrier in libgcc, like arm32 does.

gcc/ChangeLog:
	* config/mips/mips-protos.h (mips_emit_speculation_barrier): New
	prototype.
	* config/mips/mips.cc (speculation_barrier_libfunc): New static
	variable.
	(mips_init_libfuncs): Initialize it.
	(mips_emit_speculation_barrier): New function.
	* config/mips/mips.md (speculation_barrier): Call
	mips_emit_speculation_barrier.

libgcc/ChangeLog:
	* config/mips/lib1funcs.S: New file.
	define __speculation_barrier and include mips16.S.
	* config/mips/t-mips: define LIB1ASMSRC as mips/lib1funcs.S.
	define LIB1ASMFUNCS as _speculation_barrier.
	set version info for __speculation_barrier.
	* config/mips/libgcc-mips.ver: New file.
	* config/mips/t-mips16: don't define LIB1ASMSRC as mips16.S
	included in lib1funcs.S now.

29b74545

RISC-V: Reorganize riscv-v.cc · c7fe7ad6

Juzhe-Zhong authored 1 year ago

This patch just reorganizes the functions in preparation for the following patch.

I moved rvv_builder and the emit_* functions before the expand_const_vector
function since I will use them in expand_const_vector in the following patch.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (class rvv_builder): Reorganize functions.
	(rvv_builder::can_duplicate_repeating_sequence_p): Ditto.
	(rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto.
	(rvv_builder::get_merged_repeating_sequence): Ditto.
	(rvv_builder::get_merge_scalar_mask): Ditto.
	(emit_scalar_move_insn): Ditto.
	(emit_vlmax_integer_move_insn): Ditto.
	(emit_nonvlmax_integer_move_insn): Ditto.
	(emit_vlmax_gather_insn): Ditto.
	(emit_vlmax_masked_gather_mu_insn): Ditto.
	(get_repeating_sequence_dup_machine_mode): Ditto.

c7fe7ad6

RISC-V: Split arguments of expand_vec_perm · 2418cdfc

Juzhe-Zhong authored 1 year ago

Since the following patch will call expand_vec_perm with
split arguments, change the expand_vec_perm interface in
this patch.

gcc/ChangeLog:

	* config/riscv/autovec.md: Split arguments.
	* config/riscv/riscv-protos.h (expand_vec_perm): Ditto.
	* config/riscv/riscv-v.cc (expand_vec_perm): Ditto.

2418cdfc

Daily bump. · b4889084

GCC Administrator authored 1 year ago

b4889084

Jun 04, 2023

Improve do_store_flag for comparing single bit against that bit · 6cf856f8

Andrew Pinski authored 1 year ago

This is a case which I noticed while working on the previous patch.
Sometimes we end up with `a == CST` instead of comparing against 0.
This happens in the following code:
```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  t ^= (1<<30);
  return t != 0;
}
```

We should handle the case where the nonzero bits are the same as the
comparison operand.
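
The equivalence behind this can be checked with a small sketch. The function names are hypothetical, and the shift form is only one bit-extraction formulation that gives the same answer; it is not claimed to be the exact sequence GCC emits.

```c
#include <assert.h>
#include <stdint.h>

#define BIT30 (UINT32_C(1) << 30)

/* Mirrors f() above: flip bit 30, then test against zero.  This is the
   same as asking whether the original t differs from BIT30.
   Precondition: only bit 30 of t may be set.  */
static uint32_t flag_by_compare(uint32_t t)
{
    assert((t & ~BIT30) == 0);
    t ^= BIT30;
    return t != 0;
}

/* Same result via bit extraction: take bit 30 and invert it. */
static uint32_t flag_by_shift(uint32_t t)
{
    assert((t & ~BIT30) == 0);
    return (t >> 30) ^ 1u;
}
```

Under the precondition there are only two possible inputs, 0 and `BIT30`, and both functions return 1 and 0 for them respectively.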

Changes from v1:
* v2: Updated for the bit extraction changes.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* expr.cc (do_store_flag): Improve for single bit testing
	not against zero but against that single bit.

6cf856f8

Improve do_store_flag for single bit comparison against 0 · 908e5ab5

Andrew Pinski authored 1 year ago

While working on something else, I noticed we could improve
the code generation for the following function:
```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  return t != 0;
}
```
Right now we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1` (using bit extraction if available).
This patch extends it to handle the case where we know the nonzero bits
of t consist of just one bit.
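
A small sketch of why the shift is enough, assuming the precondition established by the `__builtin_unreachable` call, namely that only bit 30 of t can be set. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BIT30 (UINT32_C(1) << 30)

/* The comparison as written in the source.
   Precondition: only bit 30 of t may be set.  */
static uint32_t nonzero_by_compare(uint32_t t)
{
    assert((t & ~BIT30) == 0);
    return t != 0;
}

/* The cheaper form: with a single possible nonzero bit, shifting that
   bit down to position 0 already yields the 0/1 result.  */
static uint32_t nonzero_by_shift(uint32_t t)
{
    assert((t & ~BIT30) == 0);
    return t >> 30;
}
```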

Changes from v1:
* v2: Updated for the bit extraction improvements.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* expr.cc (do_store_flag): Extend the one bit checking case
	to handle the case where we don't have an and but rather still
	one bit is known to be non-zero.

908e5ab5

Convert H8 port to LRA · f66e0a94

Jeff Law authored 1 year ago

With Vlad's recent LRA fix to the elimination code, the H8 can be converted
to LRA.

This patch has two changes of note.

First, this turns Zz into a standard constraint.  This helps reloading for
the H8/SX movqi pattern.

Second, this drops the whole pattern for the SX bit memory operations.  I
can't see why those exist to begin with.  They should be handled by the
standard bit manipulation patterns.   If someone wants to try and improve SX
bit support, that'd be great and they can do so within the LRA framework :-)

Pushed to the trunk...

gcc/
	* config/h8300/constraints.md (Zz): Make this a normal
	constraint.
	* config/h8300/h8300.cc (TARGET_LRA_P): Remove.
	* config/h8300/logical.md (H8/SX bit patterns): Remove.

f66e0a94

xtensa: Optimize boolean evaluation or branching when EQ/NE to INT_MIN · 830d36b3

Takayuki 'January June' Suwa authored 1 year ago

This patch optimizes both the boolean evaluation of and the branching on
EQ/NE against INT_MIN (-2147483648), by taking advantage of the specification
that the ABS machine instruction on Xtensa returns INT_MIN iff its input is
INT_MIN, and a non-negative value otherwise.

    /* example */
    int test0(int x) {
      return (x == -2147483648);
    }
    int test1(int x) {
      return (x != -2147483648);
    }
    extern void foo(void);
    void test2(int x) {
      if(x == -2147483648)
        foo();
    }
    void test3(int x) {
      if(x != -2147483648)
        foo();
    }

    ;; before
    test0:
	movi.n	a9, -1
	slli	a9, a9, 31
	add.n	a2, a2, a9
	nsau	a2, a2
	srli	a2, a2, 5
	ret.n
    test1:
	movi.n	a9, -1
	slli	a9, a9, 31
	add.n	a9, a2, a9
	movi.n	a2, 1
	moveqz	a2, a9, a9
	ret.n
    test2:
	movi.n	a9, -1
	slli	a9, a9, 31
	bne	a2, a9, .L3
	j.l     foo, a9
    .L3:
	ret.n
    test3:
	movi.n	a9, -1
	slli	a9, a9, 31
	beq	a2, a9, .L5
	j.l	foo, a9
    .L5:
	ret.n

    ;; after
    test0:
	abs	a2, a2
	extui	a2, a2, 31, 1
	ret.n
    test1:
	abs	a2, a2
	srai	a2, a2, 31
	addi.n	a2, a2, 1
	ret.n
    test2:
	abs	a2, a2
	bbci	a2, 31, .L3
	j.l	foo, a9
    .L3:
	ret.n
    test3:
	abs	a2, a2
	bbsi	a2, 31, .L5
	j.l	foo, a9
    .L5:
	ret.n
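
The trick can be modelled in portable C as below. This is a sketch with hypothetical function names: the wrap-around absolute value is computed in unsigned arithmetic to avoid signed overflow, and bit 31 of the result is set only when the input was INT_MIN.

```c
#include <stdint.h>

/* Model of an ABS instruction that wraps: INT32_MIN maps to 0x80000000,
   every other input maps to its non-negative magnitude.  */
static uint32_t abs_wrap_model(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return x < 0 ? 0u - u : u;
}

/* x == INT32_MIN, expressed as "bit 31 of abs(x)" like test0 above. */
static int is_int_min_model(int32_t x)
{
    return (int)(abs_wrap_model(x) >> 31);
}
```

For every input other than INT_MIN the magnitude is at most 2^31 - 1, so bit 31 is clear, which is why a single bit test after ABS replaces the constant materialization and compare.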

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*btrue_INT_MIN, *eqne_INT_MIN):
	New insn_and_split patterns.

830d36b3

RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109 · a96ba6b9

Juzhe-Zhong authored 1 year ago

This patch fixes the PR110109 issue. The issue happens because of the following pattern:

(define_insn_and_split "*vlmul_extx2<mode>"
  [(set (match_operand:<VLMULX2> 0 "register_operand"  "=vr, ?&vr")
       (subreg:<VLMULX2>
         (match_operand:VLMULEXT2 1 "register_operand" " 0,   vr") 0))]
  "TARGET_VECTOR"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
  DONE;
})

This pattern generates the following code in insn-recog.cc:
static int
pattern57 (rtx x1)
{
  rtx * const operands ATTRIBUTE_UNUSED = &recog_data.operand[0];
  rtx x2;
  int res ATTRIBUTE_UNUSED;
  if (maybe_ne (SUBREG_BYTE (x1).to_constant (), 0))
    return -1;
...

PR110109 is an ICE at maybe_ne (SUBREG_BYTE (x1).to_constant (), 0): for
scalable RVV modes the subreg offset cannot be accessed as
SUBREG_BYTE (x1).to_constant ().
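
Why to_constant () is unsafe on a scalable offset can be sketched with a toy
model.  This is not GCC's poly_int; ToyPolyOffset and toy_maybe_ne are
hypothetical names, and the model only assumes that an offset has the form
coeffs[0] + coeffs[1] * N, where N is a vector-length factor unknown at
compile time.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a poly_int-style offset: coeffs[0] + coeffs[1] * N,
// where N is only known at run time.  Not the real GCC class.
struct ToyPolyOffset {
  std::int64_t coeffs[2];

  // The value is a compile-time constant only if the N term vanishes.
  bool is_constant() const { return coeffs[1] == 0; }

  // Asserts when the offset is scalable, which is the failure mode
  // described above.
  std::int64_t to_constant() const {
    assert(is_constant());
    return coeffs[0];
  }
};

// Conservative comparison that never needs to_constant(): returns true if
// the offset may differ from c for some value of N.
bool toy_maybe_ne(const ToyPolyOffset &a, std::int64_t c) {
  return a.coeffs[1] != 0 || a.coeffs[0] != c;
}
```

In this model, comparing the offset directly with toy_maybe_ne is safe for
both fixed and scalable offsets, whereas calling to_constant () first aborts
as soon as the offset has a non-zero N term.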

I created those patterns to optimize the following test:
vfloat32m2_t test_vlmul_ext_v_f32mf2_f32m2(vfloat32mf2_t op1) {
  return __riscv_vlmul_ext_v_f32mf2_f32m2(op1);
}

codegen:
test_vlmul_ext_v_f32mf2_f32m2:
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.i v2,0
        vsetvli a5,zero,e32,mf2,ta,ma
        vle32.v v2,0(a1)
        vs2r.v  v2,0(a0)
        ret

There is a redundant 'vmv.v.i' here, since GCC has no undefined value in
its IR (unlike LLVM, which has undef/poison).  For the vlmul_ext_* RVV
intrinsics, GCC therefore initializes the whole register to zero.  However,
I think this will not be a big issue once we support subreg liveness tracking.

	PR target/110109

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-bases.cc: Change expand approach.
	* config/riscv/vector.md (@vlmul_extx2<mode>): Remove it.
	(@vlmul_extx4<mode>): Ditto.
	(@vlmul_extx8<mode>): Ditto.
	(@vlmul_extx16<mode>): Ditto.
	(@vlmul_extx32<mode>): Ditto.
	(@vlmul_extx64<mode>): Ditto.
	(*vlmul_extx2<mode>): Ditto.
	(*vlmul_extx4<mode>): Ditto.
	(*vlmul_extx8<mode>): Ditto.
	(*vlmul_extx16<mode>): Ditto.
	(*vlmul_extx32<mode>): Ditto.
	(*vlmul_extx64<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pr110109-1.c: New test.
	* gcc.target/riscv/rvv/base/pr110109-2.c: New test.

a96ba6b9

RISC-V: Support RVV FP16 ZVFHMIN intrinsic API · 5c9cffa3

Pan Li authored 1 year ago


This patch supports the two intrinsic APIs of the FP16 ZVFHMIN extension,
i.e. SEW=16 for the instructions below:

vfwcvt.f.f.v
vfncvt.f.f.w

Users can then use the intrinsic APIs to convert between RVV
single-precision and half-precision floating-point vectors.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-types.def
	(vfloat32mf2_t): Add vfloat32mf2_t type to vfncvt.f.f.w operations.
	(vfloat32m1_t): Likewise.
	(vfloat32m2_t): Likewise.
	(vfloat32m4_t): Likewise.
	(vfloat32m8_t): Likewise.
	* config/riscv/riscv-vector-builtins.def: Fix typo in comments.
	* config/riscv/vector-iterators.md: Add single to half machine
	mode conversion.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: New test.

5c9cffa3

RISC-V: Move optimization patterns into autovec-opt.md · 13309771

Juzhe-Zhong authored 1 year ago

Move all optimization patterns into autovec-opt.md to make the organization
easier to maintain.

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*<optab>not<mode>): Move to autovec-opt.md.
	(*n<optab><mode>): Ditto.
	* config/riscv/autovec.md (*<optab>not<mode>): Ditto.
	(*n<optab><mode>): Ditto.
	* config/riscv/vector.md: Ditto.

13309771

PR target/110083: Fix-up REG_EQUAL notes on COMPARE in STV. · 8ab9fb6b

Roger Sayle authored 1 year ago

This patch fixes PR target/110083, an ICE-on-valid regression exposed by
my recent PTEST improvements (to address PR target/109973).  The latent
bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update
or delete REG_EQUAL notes attached to COMPARE instructions.  As a result
the operands of COMPARE would be mismatched, with the register transformed
to V1TImode, but the immediate operand left as const_wide_int, which is
valid for TImode but not V1TImode.  This remained latent when the STV
conversion converted the mode of the COMPARE to CCmode, with later passes
recognizing the REG_EQUAL note is obviously invalid as the modes didn't
match, but now that we (correctly) preserve the CCZmode on COMPARE, the
mismatched operand modes trigger a sanity checking ICE downstream.

Fixed by updating (or deleting) any REG_EQUAL notes in convert_compare.

Before:
    (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
        (const_wide_int 0x80000000000000000000000000000000))

After:
    (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
        (const_vector:V1TI [
            (const_wide_int 0x80000000000000000000000000000000)
         ]))

2023-06-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/110083
	* config/i386/i386-features.cc (scalar_chain::convert_compare):
	Update or delete REG_EQUAL notes, converting CONST_INT and
	CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.

gcc/testsuite/ChangeLog
	PR target/110083
	* gcc.target/i386/pr110083.c: New test case.

8ab9fb6b

c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720] · 2415024e

Jason Merrill authored 1 year ago

[except.handle]/7 says that when we enter std::terminate due to a throw,
that is considered an active handler.  We already implemented that properly
for the case of not finding a handler (__cxa_throw calls __cxa_begin_catch
before std::terminate) and the case of finding a callsite with no landing
pad (the personality function calls __cxa_call_terminate which calls
__cxa_begin_catch), but for the case of a throw in a try/catch in a noexcept
function, we were emitting a cleanup that calls std::terminate directly
without ever calling __cxa_begin_catch to handle the exception.
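
The affected case can be sketched as follows.  This is a minimal
illustration, not the testcase added by this patch; the names
report_active_exception, must_not_throw and run_demo are hypothetical.  The
caller deliberately has a matching catch, since without one the unwinder
finds no handler at all and the not-finding-a-handler path is taken instead.
Whether the handler observes an active exception depends on the compiler and
runtime in use, so no output is claimed here.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Terminate handler that reports whether an exception is considered
// active, then exits without running further cleanup.
[[noreturn]] void report_active_exception() {
  const bool active = static_cast<bool>(std::current_exception());
  std::printf("exception active in terminate handler: %s\n",
              active ? "yes" : "no");
  std::_Exit(active ? EXIT_SUCCESS : EXIT_FAILURE);
}

// A throw inside a try/catch in a noexcept function, with no matching
// handler in that function: the exception runs into the noexcept boundary.
void must_not_throw() noexcept {
  try {
    throw 42;
  } catch (const char *) {
    // Does not match int, so the exception tries to leave the function.
  }
}

// Installs the handler and calls the noexcept function from a context that
// does have a matching catch; control never returns normally from here.
void run_demo() {
  std::set_terminate(report_active_exception);
  try {
    must_not_throw();
  } catch (int) {
    // Should never be reached: the noexcept boundary comes first.
  }
}
```

Calling run_demo () from main ends the process through the terminate
handler; per [except.handle]/7 the handler is expected to see the thrown
exception as active.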

A straightforward way to fix this seems to be calling __cxa_call_terminate
instead.  However, that requires exporting it from libstdc++, which we have
not previously done.  Despite the name, it isn't actually part of the ABI
standard.  Nor is __cxa_call_unexpected, as far as I can tell, but that one
is also used by clang.  For this case they use __clang_call_terminate; it
seems reasonable to me for us to stick with __cxa_call_terminate.

I also change __cxa_call_terminate to take void* for simplicity in the front
end (and consistency with __cxa_call_unexpected) but that isn't necessary if
it's undesirable for some reason.

This patch does not fix the issue that representing the noexcept as a
cleanup is wrong, and confuses the handler search; since it looks like a
cleanup in the EH tables, the unwinder keeps looking until it finds the
catch in main(), which it should never have gotten to.  Without the
try/catch in main, the unwinder would reach the end of the stack and say no
handler was found.  The noexcept is a handler, and should be treated as one,
as it is when the landing pad is omitted.

The best fix for that issue seems to me to be to represent an
ERT_MUST_NOT_THROW after an ERT_TRY in an action list as though it were an
ERT_ALLOWED_EXCEPTIONS (since indeed it is an exception-specification).  The
actual code generation shouldn't need to change (apart from the change made
by this patch), only the action table entry.

	PR c++/97720

gcc/cp/ChangeLog:

	* cp-tree.h (enum cp_tree_index): Add CPTI_CALL_TERMINATE_FN.
	(call_terminate_fn): New macro.
	* cp-gimplify.cc (gimplify_must_not_throw_expr): Use it.
	* except.cc (init_exception_processing): Set it.
	(cp_protect_cleanup_actions): Return it.

gcc/ChangeLog:

	* tree-eh.cc (lower_resx): Pass the exception pointer to the
	failure_decl.
	* except.h: Tweak comment.

libstdc++-v3/ChangeLog:

	* libsupc++/eh_call.cc (__cxa_call_terminate): Take void*.
	* config/abi/pre/gnu.ver: Add it.

gcc/testsuite/ChangeLog:

	* g++.dg/eh/terminate2.C: New test.

2415024e