- Oct 09, 2024
-
Jonathan Wakely authored
Implement Peter Dimov's suggestion for resolving LWG 4118, which is to use +d.count() so that character types are promoted to an integer type before formatting them.

This didn't have unanimous consensus in the committee, as Howard Hinnant proposed that we should format the rep consistently with std::format("{}", d.count()) instead. That ends up being more complicated, because it makes std::formattable a precondition of operator<<, which was not previously the case, and it means that ios_base::fmtflags from the stream would be ignored, because std::format doesn't use them.

libstdc++-v3/ChangeLog:

    PR libstdc++/116755
    * include/bits/chrono_io.h (operator<<): Use +d.count() for duration inserter.
    (__formatter_chrono::_M_format): Likewise for %Q format.
    * testsuite/20_util/duration/io.cc: Test durations with character types as reps.
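For reference, the whole trick is ordinary integer promotion via unary plus; a minimal C sketch of the mechanism (illustrative only, not the libstdc++ change):

    #include <stdio.h>

    /* _Generic reports the static type of its argument; unary +
       applies the integer promotions, which is what +d.count()
       relies on. */
    #define TYPE_NAME(x) _Generic((x), char: "char", int: "int", default: "other")

    int main(void)
    {
        char c = 'A';
        printf("%s\n", TYPE_NAME(c));  /* prints "char" */
        printf("%s\n", TYPE_NAME(+c)); /* prints "int": +c promotes char to int */
        return 0;
    }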
-
Richard Biener authored
I've tried to sanitize DR_GROUP_NEXT_ELEMENT accesses but there are too many, so the following instead makes sure DR_GROUP_NEXT_ELEMENT is never non-NULL for !STMT_VINFO_GROUPED_ACCESS.

* tree-vect-data-refs.cc (vect_analyze_data_ref_access): When cancelling a DR group also clear DR_GROUP_NEXT_ELEMENT.
-
Richard Biener authored
When we first detect a grouped load but later dis-associate it, we only set DR_GROUP_FIRST_ELEMENT to NULL, indicating it is not a STMT_VINFO_GROUPED_ACCESS, but leave DR_GROUP_NEXT_ELEMENT set. This causes a stray DR_GROUP_NEXT_ELEMENT access in get_group_load_store_type to go wrong, indicating a load isn't single_element_p when it actually is, leading to wrong classification and an ICE.

PR tree-optimization/117041

* tree-vect-stmts.cc (get_group_load_store_type): Only check DR_GROUP_NEXT_ELEMENT for STMT_VINFO_GROUPED_ACCESS.
* gcc.dg/torture/pr117041.c: New testcase.
-
Torbjörn SVENSSON authored
Update test cases to use the -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/pr65647.c: Use effective-target arm_arch_v6m. Removed unneeded dg-skip-if.
    * gcc.target/arm/mod_2.c: Use effective-target arm_cpu_cortex_a57.
    * gcc.target/arm/mod_256.c: Likewise.
    * gcc.target/arm/vseleqdf.c: Likewise.
    * gcc.target/arm/vseleqsf.c: Likewise.
    * gcc.target/arm/vselgedf.c: Likewise.
    * gcc.target/arm/vselgesf.c: Likewise.
    * gcc.target/arm/vselgtdf.c: Likewise.
    * gcc.target/arm/vselgtsf.c: Likewise.
    * gcc.target/arm/vselledf.c: Likewise.
    * gcc.target/arm/vsellesf.c: Likewise.
    * gcc.target/arm/vselltdf.c: Likewise.
    * gcc.target/arm/vselltsf.c: Likewise.
    * gcc.target/arm/vselnedf.c: Likewise.
    * gcc.target/arm/vselnesf.c: Likewise.
    * gcc.target/arm/vselvcdf.c: Likewise.
    * gcc.target/arm/vselvcsf.c: Likewise.
    * gcc.target/arm/vselvsdf.c: Likewise.
    * gcc.target/arm/vselvssf.c: Likewise.
    * lib/target-supports.exp: Define effective-target arm_cpu_cortex_a57. Update effective-target arm_v8_1_lob_ok to use -mcpu=unset.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-
Ken Matsui authored
PR bootstrap/117039

libcpp/ChangeLog:

    * directives.cc (do_pragma_once): Use ' instead of %< and %>.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
-
René Rebe authored
This was tested by bootstrapping GCC natively on ia64-t2-linux-gnu and running the testsuite (based on 23611606):
https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html

For comparison, the same with just 23611606:
https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html

gcc/
    * config/ia64/ia64.cc: Enable LRA for ia64.
    * config/ia64/ia64.md: Likewise.
    * config/ia64/predicates.md: Likewise.

Signed-off-by: René Rebe <rene@exactcode.de>
-
René Rebe authored
The following un-deprecates ia64*-*-linux for GCC 15, since we plan to support this target for some years to come.

gcc/
    * config.gcc: Only list ia64*-*-(hpux|vms|elf) in the list of obsoleted targets.

contrib/
    * config-list.mk (LIST): No --enable-obsolete for ia64-linux.

Signed-off-by: René Rebe <rene@exactcode.de>
-
Richard Biener authored
The following massages the GIMPLE matching way of handling scan stores to work with single-lane SLP. I do not fully understand all the cases that can happen and the stmt matching at vectorizable_store time is less than ideal - but the following gets me all the testcases to pass with and without forced SLP. Long term we want to perform the matching at SLP discovery time, properly chaining the various SLP instances the current state ends up with.

PR tree-optimization/116974

* tree-vect-stmts.cc (check_scan_store): Pass in the SLP node instead of just a flag. Allow single-lane scan stores.
(vectorizable_store): Adjust.
* tree-vect-loop.cc (vect_analyze_loop_2): Empty scan_map before re-trying.
-
Richard Biener authored
The following handles SLP discovery of permuted masked loads, which was prohibited (because wrongly handled) for PR114375. In particular, with single-lane SLP at the moment all masked group loads appear permuted and we fail to use masked load lanes as well. The following addresses parts of the issues, starting with doing correct basic discovery - namely discover an unpermuted mask load followed by a permute node. In particular, groups with gaps do not support masking yet (and didn't before w/o SLP IIRC). There's still issues with how we represent masked load/store-lanes I think, but I first have to get my hands on a good testcase.

PR tree-optimization/116575
PR tree-optimization/114375

* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reject permuted mask loads without gaps but instead discover a node for the full unpermuted load and permute that with a VEC_PERM node.
* gcc.dg/vect/vect-pr114375.c: Expect vectorization now with avx2.
-
Richard Biener authored
The following adds a pattern to elide a .REDUC_IOR operation when the result is compared against zero with a cbranch. I've resorted to using can_compare_p since that's what RTL expansion eventually checks - while GIMPLE has allowed whole-vector equality compares for a long time, vector lowering won't lower unsupported ones and RTL expansion doesn't seem to try using [u]cmp<vector-mode> optabs (and neither x86 nor aarch64 implements those). There's cstore but no target implements that for vector modes either.

PR tree-optimization/117000

* match.pd (.REDUC_IOR !=/== 0): New pattern.
* gimple-match-head.cc: Include memmodel.h and optabs.h.
* generic-match-head.cc: Likewise.
* gcc.target/i386/pr117000.c: New testcase.
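As a rough illustration (my guess at the shape of code this helps, not the actual pr117000.c testcase), an any-element-nonzero loop whose vectorized OR reduction only feeds a branch against zero:

    #include <stddef.h>

    /* When the loop is vectorized, acc becomes a vector OR reduction
       (.REDUC_IOR) whose scalar result is only tested against zero by
       the branch below, so the new pattern can elide the reduction in
       favor of a whole-vector compare where the target supports it. */
    int any_nonzero (const unsigned *p, size_t n)
    {
      unsigned acc = 0;
      for (size_t i = 0; i < n; i++)
        acc |= p[i];
      if (acc != 0)   /* cbranch consuming the reduction result */
        return 1;
      return 0;
    }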
-
Richard Biener authored
The following avoids copying scalar stmts again for the re-lookup of the slot to replace the NULL guard with node.

* tree-vect-slp.cc (vect_cse_slp_nodes): Fix memory leak.
-
Jan Beulich authored
Present wording has misled people to believe the ?: operator would be evaluating all three of the involved expressions.

gcc/
* doc/extend.texi: Clarify __builtin_choose_expr() (dis)similarity to the ?: operator.
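A small C illustration of the actual (dis)similarity (my own example, not from the patch): like ?:, __builtin_choose_expr evaluates only the chosen expression, but its condition must be a constant and the result keeps the chosen operand's exact type, with no arithmetic conversions against the other operand (this built-in is C only):

    #include <stdio.h>

    int main(void)
    {
      int i = 2;
      double d = 2.5;

      /* sizeof(int): the result type is exactly that of the chosen operand. */
      printf("%zu\n", sizeof(__builtin_choose_expr(1, i, d)));
      /* sizeof(double): ?: balances both operands via the usual conversions. */
      printf("%zu\n", sizeof(1 ? i : d));
      return 0;
    }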
-
Ken Matsui authored
This patch adds a warning switch for "#pragma once in main file". The warning option name is Wpragma-once-outside-header, which is the same as Clang provides. A minimal reproducer is shown after the ChangeLog.

PR preprocessor/89808

gcc/c-family/ChangeLog:

    * c.opt (Wpragma_once_outside_header): Define new option.
    * c.opt.urls: Regenerate.

gcc/ChangeLog:

    * doc/invoke.texi (Warning Options): Document -Wno-pragma-once-outside-header.

libcpp/ChangeLog:

    * include/cpplib.h (cpp_warning_reason): Define CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
    * directives.cc (do_pragma_once): Use CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.

gcc/testsuite/ChangeLog:

    * g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
    * g++.dg/warn/Wpragma-once-outside-header.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Marek Polacek <polacek@redhat.com>
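A minimal reproducer for what the new switch controls (my own sketch, not the testsuite file):

    /* main.c -- compiled directly, so the preprocessor treats it as the
       main file, not a header.  GCC warns: "#pragma once in main file",
       now silenceable with -Wno-pragma-once-outside-header. */
    #pragma once

    int main(void) { return 0; }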
-
GCC Administrator authored
-
Artemiy Volkov authored
Whenever C1 and C2 are integer constants, X is of a wrapping type, and cmp is a relational operator, the expression X +- C1 cmp C2 can be simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF (1), we can transform the initial comparison in the following way:
    X +- C1 <= C2
    -INF <= X +- C1 <= C2         (add left hand side which holds for any X, C1)
    -INF -+ C1 <= X <= C2 -+ C1   (add -+C1 to all 3 expressions)
    -INF -+ C1 <= X <= +INF       (due to (1))
    -INF -+ C1 <= X               (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp is >= and C2 -+ C1 == -INF (1), use the following sequence of transformations:
    X +- C1 >= C2
    +INF >= X +- C1 >= C2         (add left hand side which holds for any X, C1)
    +INF -+ C1 >= X >= C2 -+ C1   (add -+C1 to all 3 expressions)
    +INF -+ C1 >= X >= -INF       (due to (1))
    +INF -+ C1 >= X               (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.

This transformation allows us to occasionally save add / sub instructions; for instance the expression

    3 + (uint32_t)f() < 2

compiles to

    cmn   w0, #4
    cset  w0, ls

instead of

    add   w0, w0, 3
    cmp   w0, 2
    cset  w0, ls

on aarch64.

Testcases that go together with this patch have been split into two separate files, one containing testcases for unsigned variables and the other for wrapping signed ones (and thus compiled with -fwrapv).

Additionally, one aarch64 test has been adjusted since the patch has caused the generated code to change from

    cmn   w0, #2
    csinc w0, w1, wzr, cc   (x < -2)

to

    cmn   w0, #3
    csinc w0, w1, wzr, cs   (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32.

gcc/ChangeLog:

    PR tree-optimization/116024
    * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/pr116024-2.c: New test.
    * gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
    * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.
-
Artemiy Volkov authored
Implement a match.pd transformation inverting the sign of X in C1 - X cmp C2, where C1 and C2 are integer constants and X is of a wrapping signed type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of the comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
    - C1 - X < C2 means that C1 - X spans the values -INF, -INF + 1, ..., C2 - 1;
    - Therefore, X is one of C1 - -INF, C1 - (-INF + 1), ..., C1 - C2 + 1;
    - Subtracting (C1 + 1), X - (C1 + 1) is one of - (-INF) - 1, - (-INF) - 2, ..., -C2;
    - Using the fact that - (-INF) - 1 is +INF, derive that X - (C1 + 1) spans the values +INF, +INF - 1, ..., -C2;
    - Thus, the original expression can be simplified to X - (C1 + 1) > -C2 - 1.

(c) Similarly, C1 - X <= C2 is equivalent to X - (C1 + 1) >= -C2 - 1.

(d) The >= and > cases are negations of (b) and (c), respectively.

(e) In all cases, the expression -C2 - 1 can be shortened to bit_not (C2).

This transformation allows us to occasionally save load-immediate / subtraction instructions, e.g. the following statement:

    10 - (int)f() >= 20;

now compiles to

    addi a0,a0,-11
    slti a0,a0,-20

instead of

    li   a5,10
    sub  a0,a5,a0
    slti t0,a0,20
    xori a0,t0,1

on 32-bit RISC-V when compiled with -fwrapv. Additional examples can be found in the newly added test file.

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32.

gcc/ChangeLog:

    PR tree-optimization/116024
    * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/pr116024-1-fwrapv.c: New test.
-
- Oct 08, 2024
-
Artemiy Volkov authored
Implement a match.pd transformation inverting the sign of X in C1 - X cmp C2, where C1 and C2 are integer constants and X is of an unsigned type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of the comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
    - C1 - X < C2 means that C1 - X spans the range 0, 1, ..., C2 - 1;
    - This means that X spans the range C1 - (C2 - 1), C1 - (C2 - 2), ..., C1;
    - Subtracting C1 - (C2 - 1), X - (C1 - (C2 - 1)) is one of 0, 1, ..., C1 - (C1 - (C2 - 1));
    - Simplifying the above, X - (C1 - C2 + 1) is one of 0, 1, ..., C2 - 1;
    - Summarizing, the expression C1 - X < C2 can be transformed into X - (C1 - C2 + 1) < C2.

(c) Similarly, if cmp is <=:
    - C1 - X <= C2 means that C1 - X is one of 0, 1, ..., C2;
    - It follows that X is one of C1 - C2, C1 - (C2 - 1), ..., C1;
    - Subtracting C1 - C2, X - (C1 - C2) has range 0, 1, ..., C2;
    - Thus, the expression C1 - X <= C2 can be transformed into X - (C1 - C2) <= C2.

(d) The >= and > cases are negations of (b) and (c), respectively.

This transformation allows us to occasionally save load-immediate / subtraction instructions, e.g. the following statement:

    300 - (unsigned int)f() < 100;

now compiles to

    addi  a0,a0,-201
    sltiu a0,a0,100

instead of

    li    a5,300
    sub   a0,a5,a0
    sltiu a0,a0,100

on 32-bit RISC-V. Additional examples can be found in the newly added test file.

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32.

gcc/ChangeLog:

    PR tree-optimization/116024
    * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/pr116024-1.c: New test.
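A quick self-contained check of case (b) (my own harness, not part of the patch), exhaustively verifying C1 - X < C2 <=> X - (C1 - C2 + 1) < C2 for C1 = 300, C2 = 100 over all 16-bit unsigned values:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
      for (uint32_t i = 0; i <= UINT16_MAX; i++)
        {
          uint16_t x = (uint16_t) i;
          /* Casts keep the arithmetic wrapping at 16 bits. */
          int lhs = (uint16_t)(300 - x) < 100;
          int rhs = (uint16_t)(x - 201) < 100;  /* 201 == C1 - C2 + 1 */
          if (lhs != rhs)
            {
              printf("mismatch at x = %u\n", x);
              return 1;
            }
        }
      puts("ok");
      return 0;
    }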
-
Artemiy Volkov authored
Implement a match.pd pattern for C1 - X cmp C2, where C1 and C2 are integer constants and X is of a UB-on-overflow type. The pattern is simplified to X rcmp C1 - C2 by moving X and C2 to the other side of the comparison (with opposite signs). If C1 - C2 happens to overflow, replace the whole expression with either a constant 0 or a constant 1 node, depending on the comparison operator and the sign of the overflow.

This transformation allows us to occasionally save load-immediate / subtraction instructions, e.g. the following statement:

    10 - (int) x <= 9;

now compiles to

    sgt  a0,a0,zero

instead of

    li   a5,10
    sub  a0,a5,a0
    slti a0,a0,10

on 32-bit RISC-V. Additional examples can be found in the newly added test file.

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32. Existing tests were adjusted where necessary.

gcc/ChangeLog:

    PR tree-optimization/116024
    * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/pr116024.c: New test.
    * gcc.dg/pr67089-6.c: Adjust.
-
Tsung Chun Lin authored
RISC-V: Enable builtin __riscv_mul with Zmmul extension.

gcc/ChangeLog:

    * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Enable builtin __riscv_mul with Zmmul extension.
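In practice this means code can feature-test the macro when compiling with a Zmmul-only -march string, not only with M; a small sketch (the fallback path is purely illustrative):

    /* With this change, building with e.g. -march=rv64i_zmmul defines
       __riscv_mul, just as -march=rv64im does. */
    long mul (long a, long b)
    {
    #ifdef __riscv_mul
      return a * b;            /* hardware multiply available */
    #else
      long r = 0;              /* illustrative shift-and-add fallback */
      while (b) {
        if (b & 1) r += a;
        a <<= 1;
        b = (long)((unsigned long) b >> 1);
      }
      return r;
    #endif
    }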
-
Tsung Chun Lin authored
Make the M extension imply Zmmul.

gcc/ChangeLog:

    * common/config/riscv/riscv-common.cc: M implies Zmmul.
-
Yangyu Chen authored
Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V ISA. As a result, certain functions cannot be inlined when specific options are used, such as __attribute__((target("arch=+v"))). This can lead to potential performance issues when building retargetable binaries for RISC-V.

To address this, I have implemented the riscv_can_inline_p function. This addition enables inlining when the callee either has no special options or when some options match, while also ensuring that the callee's ISA is a subset of the caller's. I also check some other options when always_inline is not set.

gcc/ChangeLog:

    * common/config/riscv/riscv-common.cc (cl_opt_var_ref_t): Add cl_opt_var_ref_t pointer to member of cl_target_option.
    (struct riscv_ext_flag_table_t): Add new cl_opt_var_ref_t field.
    (RISCV_EXT_FLAG_ENTRY): New macro to simplify the definition of riscv_ext_flag_table.
    (riscv_ext_is_subset): New function to check if the callee's ISA is a subset of the caller's.
    (riscv_x_target_flags_isa_mask): New function to get the mask of ISA extension in x_target_flags of gcc_options.
    * config/riscv/riscv-subset.h (riscv_ext_is_subset): Declare riscv_ext_is_subset function.
    (riscv_x_target_flags_isa_mask): Declare riscv_x_target_flags_isa_mask function.
    * config/riscv/riscv.cc (riscv_can_inline_p): New function.
    (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
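A sketch of the situation this enables (illustrative only, not from the patch): a target-attributed callee can now be inlined when its ISA requirements are covered by the caller's:

    /* Compile for RISC-V, e.g. riscv64-unknown-linux-gnu-gcc -O2. */
    static inline int __attribute__((target("arch=+v")))
    callee (int x)
    {
      return x + 1;
    }

    /* The caller's target options include V as well, so the callee's
       ISA is a subset of the caller's and riscv_can_inline_p now
       permits inlining instead of forcing an out-of-line call. */
    int __attribute__((target("arch=+v")))
    caller (int x)
    {
      return callee (x) * 2;
    }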
-
Eric Botcazou authored
gcc/testsuite/
    PR ada/116190
    * gnat.dg/aggr31.adb: New test.
-
Eric Botcazou authored
gcc/testsuite/
    PR ada/115535
    * gnat.dg/put_image1.adb: New test.
-
Eric Botcazou authored
gcc/testsuite/
    PR ada/114636
    * gnat.dg/specs/generic_inst1.ads: New test.
-
Pan Li authored
Form 1:

    #define DEF_SAT_S_TRUNC_FMT_1(WT, NT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline))                          \
    sat_s_trunc_##WT##_to_##NT##_fmt_1 (WT x)             \
    {                                                     \
      NT trunc = (NT)x;                                   \
      return (WT)NT_MIN <= x && x <= (WT)NT_MAX           \
        ? trunc                                           \
        : x < 0 ? NT_MIN : NT_MAX;                        \
    }

The below tests are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; will commit it directly if no comments in the next 48H.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_arith_data.h: Add test data for SAT_TRUNC.
    * gcc.target/riscv/sat_s_trunc-1-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Pan Li authored
This patch would like to implement the sstrunc for scalar signed integer.

Form 1:

    #define DEF_SAT_S_TRUNC_FMT_1(WT, NT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline))                          \
    sat_s_trunc_##WT##_to_##NT##_fmt_1 (WT x)             \
    {                                                     \
      NT trunc = (NT)x;                                   \
      return (WT)NT_MIN <= x && x <= (WT)NT_MAX           \
        ? trunc                                           \
        : x < 0 ? NT_MIN : NT_MAX;                        \
    }

    DEF_SAT_S_TRUNC_FMT_1(int64_t, int32_t, INT32_MIN, INT32_MAX)

Before this patch:

    10 │ sat_s_trunc_int64_t_to_int32_t_fmt_1:
    11 │     li      a5,1
    12 │     slli    a5,a5,31
    13 │     li      a4,-1
    14 │     add     a5,a0,a5
    15 │     srli    a4,a4,32
    16 │     bgtu    a5,a4,.L2
    17 │     sext.w  a0,a0
    18 │     ret
    19 │ .L2:
    20 │     srai    a5,a0,63
    21 │     li      a0,-2147483648
    22 │     xor     a0,a0,a5
    23 │     not     a0,a0
    24 │     ret

After this patch:

    10 │ sat_s_trunc_int64_t_to_int32_t_fmt_1:
    11 │     li      a5,-2147483648
    12 │     xori    a3,a5,-1
    13 │     slt     a4,a0,a3
    14 │     slt     a5,a5,a0
    15 │     and     a5,a4,a5
    16 │     srai    a4,a0,63
    17 │     xor     a4,a4,a3
    18 │     addi    a3,a5,-1
    19 │     neg     a5,a5
    20 │     and     a4,a4,a3
    21 │     and     a0,a0,a5
    22 │     or      a0,a0,a4
    23 │     sext.w  a0,a0
    24 │     ret

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

    * config/riscv/riscv-protos.h (riscv_expand_sstrunc): Add new func decl to expand SAT_TRUNC.
    * config/riscv/riscv.cc (riscv_expand_sstrunc): Add new func impl to expand SAT_TRUNC.
    * config/riscv/riscv.md (sstrunc<mode><anyi_double_truncated>2): Add new pattern for double truncation.
    (sstrunc<mode><anyi_quad_truncated>2): Ditto but for quad.
    (sstrunc<mode><anyi_oct_truncated>2): Ditto but for oct.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Pan Li authored
When trying to match the saturation related patterns on a PHI node, we may have to try each pattern for all phi nodes of a bb. Aka:

    for each PHI node in bb:
      gphi *phi = xxx;
      try_match_sat_add (, phi);
      try_match_sat_sub (, phi);
      try_match_sat_trunc (, phi);

The PHI node will be removed if one of the above 3 sat patterns is matched. There will be a problem that, for example, sat_add is matched and then the phi is removed (freed), and the next 2 sat_sub and sat_trunc will depend on the removed (freed) phi node.

This patch would like to fix this use-after-free of the released phi node, by ensuring that at most one of the patterns will be matched.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

    * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Rename to...
    (build_saturation_binary_arith_call_and_replace): ...this.
    (build_saturation_binary_arith_call_and_insert): ...this.
    (match_unsigned_saturation_add): Leverage renamed func.
    (match_unsigned_saturation_sub): Ditto.
    (match_saturation_add): Return bool on matched and leverage renamed func.
    (match_saturation_sub): Ditto.
    (match_saturation_trunc): Ditto.
    (math_opts_dom_walker::after_dom_children): Ensure at most one pattern will be matched for each phi node.

Signed-off-by: Pan Li <pan2.li@intel.com>
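A self-contained C model of the bug and the fix (my own illustration; the real code manipulates gphi nodes in tree-ssa-math-opts.cc):

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct phi { int kind; };  /* stand-in for a gphi node */

    /* Each matcher frees the node on success, like the sat matchers
       remove the PHI they replace. */
    static bool match_sat_add (struct phi *p)
    { if (p->kind == 0) { free (p); return true; } return false; }
    static bool match_sat_sub (struct phi *p)
    { if (p->kind == 1) { free (p); return true; } return false; }

    int main (void)
    {
      struct phi *p = malloc (sizeof *p);
      p->kind = 0;

      /* Buggy shape: calling every matcher unconditionally means a
         later call can touch freed memory once an earlier one matched.
         The fix short-circuits so at most one matcher runs to
         completion on each node. */
      if (match_sat_add (p) || match_sat_sub (p))
        puts ("matched; node already released");
      else
        free (p);  /* nothing matched; node still owned here */
      return 0;
    }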
-
Pan Li authored
This patch would like to support the form 1 of the scalar signed integer SAT_TRUNC. Aka below example:

Form 1:

    #define DEF_SAT_S_TRUNC_FMT_1(WT, NT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline))                          \
    sat_s_trunc_##WT##_to_##NT##_fmt_1 (WT x)             \
    {                                                     \
      NT trunc = (NT)x;                                   \
      return (WT)NT_MIN <= x && x <= (WT)NT_MAX           \
        ? trunc                                           \
        : x < 0 ? NT_MIN : NT_MAX;                        \
    }

    DEF_SAT_S_TRUNC_FMT_1(int64_t, int32_t, INT32_MIN, INT32_MAX)

Before this patch:

     4 │ __attribute__((noinline))
     5 │ int32_t sat_s_trunc_int64_t_to_int32_t_fmt_1 (int64_t x)
     6 │ {
     7 │   int32_t trunc;
     8 │   unsigned long x.0_1;
     9 │   unsigned long _2;
    10 │   int32_t _3;
    11 │   _Bool _7;
    12 │   int _8;
    13 │   int _9;
    14 │   int _10;
    15 │
    16 │   ;; basic block 2, loop depth 0
    17 │   ;;  pred:       ENTRY
    18 │   x.0_1 = (unsigned long) x_4(D);
    19 │   _2 = x.0_1 + 2147483648;
    20 │   if (_2 > 4294967295)
    21 │     goto <bb 4>; [50.00%]
    22 │   else
    23 │     goto <bb 3>; [50.00%]
    24 │   ;;  succ:       4
    25 │   ;;              3
    26 │
    27 │   ;; basic block 3, loop depth 0
    28 │   ;;  pred:       2
    29 │   trunc_5 = (int32_t) x_4(D);
    30 │   goto <bb 5>; [100.00%]
    31 │   ;;  succ:       5
    32 │
    33 │   ;; basic block 4, loop depth 0
    34 │   ;;  pred:       2
    35 │   _7 = x_4(D) < 0;
    36 │   _8 = (int) _7;
    37 │   _9 = -_8;
    38 │   _10 = _9 ^ 2147483647;
    39 │   ;;  succ:       5
    40 │
    41 │   ;; basic block 5, loop depth 0
    42 │   ;;  pred:       3
    43 │   ;;              4
    44 │   # _3 = PHI <trunc_5(3), _10(4)>
    45 │   return _3;
    46 │   ;;  succ:       EXIT
    47 │
    48 │ }

After this patch:

     4 │ __attribute__((noinline))
     5 │ int32_t sat_s_trunc_int64_t_to_int32_t_fmt_1 (int64_t x)
     6 │ {
     7 │   int32_t _3;
     8 │
     9 │   ;; basic block 2, loop depth 0
    10 │   ;;  pred:       ENTRY
    11 │   _3 = .SAT_TRUNC (x_4(D)); [tail call]
    12 │   return _3;
    13 │   ;;  succ:       EXIT
    14 │
    15 │ }

The below test suites are passed for this patch.
* The rv64gcv fully regression test, with pr116861-1.c failed.
* The x86 bootstrap test.
* The x86 fully regression test.

The failed pr116861-1.c ICE will be fixed in a subsequent patch, as it just triggers an existing bug.

gcc/ChangeLog:

    * match.pd: Add case 1 matching pattern for signed SAT_TRUNC.
    * tree-ssa-math-opts.cc (gimple_signed_integer_sat_trunc): Add new decl for signed SAT_TRUNC.
    (match_saturation_trunc): Add new func impl to try SAT_TRUNC pattern on phi node.
    (math_opts_dom_walker::after_dom_children): Add match_saturation_trunc for phi node iteration.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Jan Beulich authored
Commit a79d13a0 ("i386: Fix aes/vaes patterns [PR114576]") correctly said "..., but we need to emit {evex} prefix in the assembly if AES ISA is not enabled". Yet it did so only for the TARGET_AES insns. Going from the alternative chosen in the TARGET_VAES insns isn't quite right: if AES is (also) enabled, EVEX encoding would needlessly be forced.

gcc/
    * config/i386/sse.md (vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>, vaesenclast_<mode>): Replace which_alternative check by TARGET_AES one.
-
Soumya AR authored
Currently, we vectorize CTZ for SVE by using the following operation:

    .CTZ (X) = (PREC - 1) - .CLZ (X & -X)

Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in PR109498.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

    PR target/109498
    * config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand CTZ to RBIT + CLZ for SVE.

gcc/testsuite/ChangeLog:

    PR target/109498
    * gcc.target/aarch64/sve/ctz.c: New test.
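The identity being exploited is ctz(x) == clz(bit_reverse(x)) for nonzero x: reversing the bits turns trailing zeros into leading zeros. A quick C spot-check (my own; the portable rbit32 stands in for SVE's RBIT instruction):

    #include <stdint.h>
    #include <stdio.h>

    /* Portable 32-bit bit reversal. */
    static uint32_t rbit32 (uint32_t x)
    {
      x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
      x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
      x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
      x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
      return (x << 16) | (x >> 16);
    }

    int main (void)
    {
      /* ctz/clz are undefined for 0, so start from 1. */
      for (uint32_t x = 1; x < (1u << 20); x++)
        if (__builtin_ctz (x) != __builtin_clz (rbit32 (x)))
          {
            printf ("mismatch at %u\n", x);
            return 1;
          }
      puts ("ok");
      return 0;
    }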
-
Palmer Dabbelt authored
> We have cheap logical ops, so let's just move this back to the default
> to take advantage of the standard branch/op heuristics.
>
> gcc/ChangeLog:
>
>     PR target/116615
>     * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> ---
> There's a bunch more discussion in the bug, but it's starting to smell
> like this was just a holdover from MIPS (where maybe it also shouldn't
> be set).  I haven't tested this, but I figured I'd send the patch to get
> a little more visibility.
>
> I guess we should also kick off something like a SPEC run to make sure
> there's no regressions?

So as I noted earlier, this appears to be a nice win on the BPI. Testsuite fallout is minimal -- just the one SFB-related test tripping at -Os that was also hit by Andrew P's work.

After looking at it more closely, the SFB codegen and the codegen after Andrew's work should be equivalent, assuming two independent ops can dispatch together. The test actually generates sensible code at -Os; it's -Os in combination with -fno-ssa-phiopt that causes problems. I think the best thing to do here is just skip it at -Os. That still keeps a degree of testing for the SFB path.

Tested successfully in my tester, but will wait for the pre-commit tester to render a verdict before moving forward.

PR target/116615

gcc/
    * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.

gcc/testsuite/
    * gcc.target/riscv/cset-sext-sfb.c: Skip for -Os.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
-
Xi Ruoyao authored
An earlier version of the patch (lacking the regeneration of some files) was pushed. Fix it up now.

gcc/ChangeLog:

    * config/loongarch/loongarch.opt: Regenerate.
    * config/loongarch/loongarch.opt.urls: Regenerate.
-
Andre Vehreschild authored
The parser was greedily taking the substring ref as an array ref because an array_spec was present. Fix this by only parsing the coarray (pseudo) ref when no regular array is present.

gcc/fortran/ChangeLog:

    PR fortran/51815
    * array.cc (gfc_match_array_ref): Only parse coarray part of ref.
    * match.h (gfc_match_array_ref): Add flag.
    * primary.cc (gfc_match_varspec): Request only coarray ref parsing when no regular array is present. Report error on unexpected additional ref.

gcc/testsuite/ChangeLog:

    * gfortran.dg/pr102532.f90: Fix dg-errors: Add new error.
    * gfortran.dg/coarray/substring_1.f90: New test.
-
Pan Li authored
Form 4:

    #define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)             \
    T __attribute__((noinline))                              \
    sat_s_sub_##T##_fmt_4 (T x, T y)                         \
    {                                                        \
      T minus;                                               \
      bool overflow = __builtin_sub_overflow (x, y, &minus); \
      return !overflow ? minus : x < 0 ? MIN : MAX;          \
    }

The below tests are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; will commit it directly if no comments in the next 48H.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_sub-4-i16.c: New test.
    * gcc.target/riscv/sat_s_sub-4-i32.c: New test.
    * gcc.target/riscv/sat_s_sub-4-i64.c: New test.
    * gcc.target/riscv/sat_s_sub-4-i8.c: New test.
    * gcc.target/riscv/sat_s_sub-run-4-i16.c: New test.
    * gcc.target/riscv/sat_s_sub-run-4-i32.c: New test.
    * gcc.target/riscv/sat_s_sub-run-4-i64.c: New test.
    * gcc.target/riscv/sat_s_sub-run-4-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Pan Li authored
Form 3:

    #define DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX)             \
    T __attribute__((noinline))                              \
    sat_s_sub_##T##_fmt_3 (T x, T y)                         \
    {                                                        \
      T minus;                                               \
      bool overflow = __builtin_sub_overflow (x, y, &minus); \
      return overflow ? x < 0 ? MIN : MAX : minus;           \
    }

The below tests are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; will commit it directly if no comments in the next 48H.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_sub-3-i16.c: New test.
    * gcc.target/riscv/sat_s_sub-3-i32.c: New test.
    * gcc.target/riscv/sat_s_sub-3-i64.c: New test.
    * gcc.target/riscv/sat_s_sub-3-i8.c: New test.
    * gcc.target/riscv/sat_s_sub-run-3-i16.c: New test.
    * gcc.target/riscv/sat_s_sub-run-3-i32.c: New test.
    * gcc.target/riscv/sat_s_sub-run-3-i64.c: New test.
    * gcc.target/riscv/sat_s_sub-run-3-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Pan Li authored
This patch would like to support the form 3 and form 4 of the scalar signed integer SAT_SUB. Aka below example:

Form 3:

    #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)             \
    T __attribute__((noinline))                              \
    sat_s_add_##T##_fmt_3 (T x, T y)                         \
    {                                                        \
      T sum;                                                 \
      bool overflow = __builtin_add_overflow (x, y, &sum);   \
      return overflow ? x < 0 ? MIN : MAX : sum;             \
    }

Form 4:

    #define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)             \
    T __attribute__((noinline))                              \
    sat_s_sub_##T##_fmt_4 (T x, T y)                         \
    {                                                        \
      T minus;                                               \
      bool overflow = __builtin_sub_overflow (x, y, &minus); \
      return !overflow ? minus : x < 0 ? MIN : MAX;          \
    }

    DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX);

Before this patch:

     4 │ __attribute__((noinline))
     5 │ int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y)
     6 │ {
     7 │   signed char _1;
     8 │   signed char _2;
     9 │   int8_t _3;
    10 │   __complex__ signed char _6;
    11 │   _Bool _8;
    12 │   signed char _9;
    13 │   signed char _10;
    14 │   signed char _11;
    15 │
    16 │   ;; basic block 2, loop depth 0
    17 │   ;;  pred:       ENTRY
    18 │   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
    19 │   _2 = IMAGPART_EXPR <_6>;
    20 │   if (_2 != 0)
    21 │     goto <bb 4>; [50.00%]
    22 │   else
    23 │     goto <bb 3>; [50.00%]
    24 │   ;;  succ:       4
    25 │   ;;              3
    26 │
    27 │   ;; basic block 3, loop depth 0
    28 │   ;;  pred:       2
    29 │   _1 = REALPART_EXPR <_6>;
    30 │   goto <bb 5>; [100.00%]
    31 │   ;;  succ:       5
    32 │
    33 │   ;; basic block 4, loop depth 0
    34 │   ;;  pred:       2
    35 │   _8 = x_4(D) < 0;
    36 │   _9 = (signed char) _8;
    37 │   _10 = -_9;
    38 │   _11 = _10 ^ 127;
    39 │   ;;  succ:       5
    40 │
    41 │   ;; basic block 5, loop depth 0
    42 │   ;;  pred:       3
    43 │   ;;              4
    44 │   # _3 = PHI <_1(3), _11(4)>
    45 │   return _3;
    46 │   ;;  succ:       EXIT
    47 │
    48 │ }

After this patch:

     4 │ __attribute__((noinline))
     5 │ int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y)
     6 │ {
     7 │   int8_t _3;
     8 │
     9 │   ;; basic block 2, loop depth 0
    10 │   ;;  pred:       ENTRY
    11 │   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
    12 │   return _3;
    13 │   ;;  succ:       EXIT
    14 │
    15 │ }

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

    * match.pd: Add case 3 matching pattern for signed SAT_SUB.

Signed-off-by: Pan Li <pan2.li@intel.com>
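For a concrete feel of the semantics being recognized, here is Form 3 instantiated by hand for int8_t with a small driver (my own demo, not a testsuite file):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Saturating signed subtraction: clamp to INT8_MIN/INT8_MAX on overflow. */
    int8_t sat_s_sub_int8 (int8_t x, int8_t y)
    {
      int8_t minus;
      bool overflow = __builtin_sub_overflow (x, y, &minus);
      return overflow ? (x < 0 ? INT8_MIN : INT8_MAX) : minus;
    }

    int main (void)
    {
      printf ("%d\n", sat_s_sub_int8 (-100, 100)); /* -200 clamps to -128 */
      printf ("%d\n", sat_s_sub_int8 (100, -100)); /*  200 clamps to  127 */
      printf ("%d\n", sat_s_sub_int8 (5, 3));      /* no overflow: 2 */
      return 0;
    }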
-
Jakub Jelinek authored
On Mon, Oct 07, 2024 at 10:32:57AM +0200, Richard Biener wrote:
> > They are implementation defined, -1, 0, 1, 2 is defined by libstdc++:
> > using type = signed char;
> > enum class _Ord : type { equivalent = 0, less = -1, greater = 1 };
> > enum class _Ncmp : type { _Unordered = 2 };
> > https://eel.is/c++draft/cmp#categories.pre-1 documents them as
> > enum class ord { equal = 0, equivalent = equal, less = -1, greater = 1 }; // exposition only
> > enum class ncmp { unordered = -127 }; // exposition only
> > and now looking at it, LLVM's libc++ takes that literally and uses
> > -1, 0, 1, -127.  One can't use <=> operator without including <compare>
> > which provides the enums, so I think if all we care about is libstdc++,
> > then just hardcoding -1, 0, 1, 2 is fine, if we want to also optimize
> > libc++ when used with gcc, we could support -1, 0, 1, -127 as another
> > option.
> > Supporting arbitrary 4 values doesn't make sense, at least on x86 the
> > only reason to do the conversion to int in an optab is a good sequence
> > to turn the flag comparisons to -1, 0, 1.  So, either we do nothing
> > more than the patch, or add handle both 2 and -127 for unordered,
> > or add support for arbitrary value for the unordered case except
> > -1, 0, 1 (then -1 could mean signed int, 1 unsigned int, 0 do the jumps
> > and any other value what should be returned for unordered.

Here is an incremental patch which adds support for (almost) arbitrary unordered constant values. It changes the .SPACESHIP and spaceship<mode>4 optab conventions, so that a last argument of 0 means use branches (floating point; -1, 0, 1, 2 results consumed by the comparisons emitted by tree-ssa-math-opts.cc), -1 means signed int comparisons (-1, 0, 1 results), 1 means unsigned int comparisons (-1, 0, 1 results), and constants other than -1, 0, 1 which fit into [-128, 127] converted to the PHI type are otherwise specified as the last argument (then it is -1, 0, 1, C results).

2024-10-08  Jakub Jelinek  <jakub@redhat.com>

    PR middle-end/116896
    * tree-ssa-math-opts.cc (optimize_spaceship): Handle unordered values other than 2, but they still need to be signed char range possibly converted to the PHI type and can't be in [-1, 1] range. Use last .SPACESHIP argument of 1 for unsigned int comparisons, -1 for signed int, 0 for floating point branches and any other for floating point with that value as unordered.
    * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Use op2 rather than const2_rtx if op2 is not const0_rtx for unordered result.
    (ix86_expand_int_spaceship): Change INTVAL (op2) == 1 tests to INTVAL (op2) != -1.
    * doc/md.texi (spaceship@var{m}4): Document the above changes.
    * gcc.target/i386/pr116896.c: New test.
Eric Botcazou authored
This removes the loop trying to find a pointer mode among the integer modes, which is obsolete and does not work on platforms where pointers have unusual size like MSP430 or special semantics like Morello.

gcc/ada/ChangeLog:

    PR ada/116498
    * gcc-interface/decl.cc (validate_size): Use the size of the default pointer mode as the minimum size for access types and fat pointers.
-
Eric Botcazou authored
It is very confusing for the user because it does not make any reference to the source code but only to details of the underlying implementation.

gcc/ada/ChangeLog:

    * gcc-interface/trans.cc (Raise_Error_to_gnu) <CE_Invalid_Data>: Do not generate the range information if the value is a call to a Rep_To_Pos function.
-
Olivier Hainque authored
The initial signal handling code introduced for aarch64-android overlooked details of the tasking runtime, not in the initial testing perimeter.

Specifically, a reference to __gnat_sigtramp from __gnat_error_handler, initially introduced for the arm port, was prevented if !arm on the grounds that other ports would rely on kernel CFI. aarch64-android does provide kernel CFI and __gnat_sigtramp was not provided for this configuration. But there is a similar reference from s-intman__android, which kicks in as soon as the tasking runtime gets activated, triggering link failures.

Testing for more precise target specific parameters from Ada code is inconvenient and replicating the logic is not attractive in any case, so this change addresses the problem in the following fashion:

- Always provide a __gnat_sigtramp entry point, common to the tasking and non-tasking signal handling code for all the Android configurations,
- There (C code), from target definition macros, select a path that either routes directly to the actual signal handler or goes through the intermediate layer providing hand crafted CFI information which allows unwinding up to the interrupted code,
- Similarly to what was done for VxWorks, move the arm specific definitions to a separate header file to make the general structure of the common C code easier to grasp,
- Adjust the comments in the common sigtramp.h header to account for such an organisation possibility.

gcc/ada/ChangeLog:

    * sigtramp-armdroid.c: Refactor into ...
    * sigtramp-android.c, sigtramp-android-asm.h: New files.
    * Makefile.rtl (arm/aarch64-android section): Add sigtramp-android.o to EXTRA_LIBGNAT_OBJS unconditionally. Add sigtramp.h and sigtramp-android-asm.h to EXTRA_LIBGNAT_SRCS.
    * init.c (android section, __gnat_error_handler): Defer to __gnat_sigtramp unconditionally again.
    * sigtramp.h: Adjust comments to allow neutral signal handling relays, merely forwarding to the underlying handler without any intermediate CFI magic.
-