Commits · f531673917e4f80ad51eda0d806f0479c501a907 · COBOLworx / gcc-cobol

Sep 23, 2024

aarch64: store signing key and signing method in DWARF _Unwind_FrameState · f5316739

Matthieu Longo authored 6 months ago

This patch is only a refactoring of the existing implementation
of PAuth and return-address signing. The existing behavior is
preserved.

_Unwind_FrameState already contains several pieces of CIE and FDE
information (see the attributes below the comment "The information we
care about from the CIE/FDE" in libgcc/unwind-dw2.h).
The patch moves the information from the DWARF CIE (signing
key stored in the augmentation string) and FDE (the signing
method used) into _Unwind_FrameState, alongside the already-stored
CIE and FDE information.
Note: this information has to be saved in frame_state_reg_info
instead of _Unwind_FrameState, as it needs to be savable by
DW_CFA_remember_state and restorable by DW_CFA_restore_state, which
both rely on the attribute "prev".
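
As a rough illustration of why the saved state must live in the structure that carries the "prev" link, here is a minimal C++ sketch. The names RegInfo, SigningKey, remember_state and restore_state are hypothetical stand-ins and are not the libgcc definitions; the sketch only assumes that remember/restore copy the whole register-info record onto and off a linked stack.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; these are NOT the libgcc types.
enum class SigningKey { A, B };

struct RegInfo {
  bool ra_signed = false;          // toy "RA state"
  SigningKey key = SigningKey::A;  // toy signing key
  std::unique_ptr<RegInfo> prev;   // saved copy, playing the role of "prev"
};

// Push a copy of the current state (models DW_CFA_remember_state).
inline void remember_state(RegInfo& cur) {
  auto saved = std::make_unique<RegInfo>();
  saved->ra_signed = cur.ra_signed;
  saved->key = cur.key;
  saved->prev = std::move(cur.prev);
  cur.prev = std::move(saved);
}

// Pop the most recent copy (models DW_CFA_restore_state).
inline void restore_state(RegInfo& cur) {
  assert(cur.prev && "restore without a matching remember");
  std::unique_ptr<RegInfo> saved = std::move(cur.prev);
  cur.ra_signed = saved->ra_signed;
  cur.key = saved->key;
  cur.prev = std::move(saved->prev);
}
```

Anything stored outside such a record would not be copied by the remember/restore pair, which is the constraint the note above describes.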

This new information in _Unwind_FrameState simplifies the look-up
of the signing key when the return address is demangled. It also
allows future signing methods to be added easily.

_Unwind_FrameState is not a part of the public API of libunwind,
so the change is backward compatible.

A new architecture-specific handler MD_ARCH_EXTENSION_FRAME_INIT
allows values to be reset (if needed) in the frame state and unwind
context before changing the frame state to the caller context.

A new architecture-specific handler MD_ARCH_EXTENSION_CIE_AUG_HANDLER
isolates the architecture-specific augmentation strings in the AArch64
backend, and allows other architectures to reuse augmentation
strings that would have clashed with AArch64 DWARF extensions.

aarch64_demangle_return_addr, DW_CFA_AARCH64_negate_ra_state and
DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h
were documented to clarify where the value of the RA state register
is stored (FS and CONTEXT respectively).

libgcc/ChangeLog:

	* config/aarch64/aarch64-unwind.h
	(AARCH64_DWARF_RA_STATE_MASK): The mask for RA state register.
	(aarch64_ra_signing_method_t): The diversifiers used to sign a
	function's return address.
	(aarch64_pointer_auth_key): The key used to sign a function's
	return address.
	(aarch64_cie_signed_with_b_key): Deleted as the signing key is
	available now in _Unwind_FrameState.
	(MD_ARCH_EXTENSION_CIE_AUG_HANDLER): New CIE augmentation string
	handler for architecture extensions.
	(MD_ARCH_EXTENSION_FRAME_INIT): New architecture-extension
	initialization routine for DWARF frame state and context before
	execution of DWARF instructions.
	(aarch64_context_ra_state_get): Read RA state register from CONTEXT.
	(aarch64_ra_state_get): Read RA state register from FS.
	(aarch64_ra_state_set): Write RA state register into FS.
	(aarch64_ra_state_toggle): Toggle RA state register in FS.
	(aarch64_cie_aug_handler): Handler for AArch64 augmentation strings.
	(aarch64_arch_extension_frame_init): Initialize defaults for the
	signing key (PAUTH_KEY_A), and RA state register (RA_no_signing).
	(aarch64_demangle_return_addr): Rely on the frame registers and
	the signing_key attribute in _Unwind_FrameState.
	* unwind-dw2-execute_cfa.h:
	Use the right alias DW_CFA_AARCH64_negate_ra_state for __aarch64__
	instead of DW_CFA_GNU_window_save.
	(DW_CFA_AARCH64_negate_ra_state): Save the signing method in RA
	state register. Toggle RA state register without resetting 'how'
	to REG_UNSAVED.
	* unwind-dw2.c:
	(extract_cie_info): Save the signing key in the current
	_Unwind_FrameState while parsing the augmentation data.
	(uw_frame_state_for): Reset some attributes related to architecture
	extensions in _Unwind_FrameState.
	(uw_update_context): Move authentication code to AArch64 unwinding.
	* unwind-dw2.h (enum register_rule): Give a name to the existing
	enum for the register rules, and replace 'unsigned char' by 'enum
	register_rule' to facilitate debugging in GDB.
	(_Unwind_FrameState): Add a new architecture-extension attribute
	to store the signing key.


OpenMP: Fix omp_get_device_from_uid, minor cleanup · cdb9aa0f

Tobias Burnus authored 6 months ago

In Fortran, omp_get_device_from_uid can also accept substrings, which are
then not NUL terminated.  Fixed by introducing a fortran.c wrapper function.
Additionally, on failure the plugin functions now return NULL instead
of failing fatally, so that a fall-back UID is generated.
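
The substring problem can be sketched as follows. This is only an illustration of the wrapper idea, not the libgomp code: device_from_uid_c and device_from_uid_fortran are hypothetical names, and the single known UID "GPU-1234" is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-side lookup that requires a NUL-terminated string.
// Returns device number 0 for the made-up UID "GPU-1234", -1 otherwise.
inline int device_from_uid_c(const char* uid) {
  return std::strcmp(uid, "GPU-1234") == 0 ? 0 : -1;
}

// Hypothetical wrapper: a Fortran caller passes (pointer, length) and the
// characters may be a substring with no terminator after them.
inline int device_from_uid_fortran(const char* uid, std::size_t len) {
  std::string copy(uid, len);  // copies exactly len chars, adds the '\0'
  return device_from_uid_c(copy.c_str());
}
```

Passing the first 8 characters of a longer buffer then behaves like passing the substring itself, instead of reading past its end.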

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Strip "omp_" from
	string; move get_device_from_uid as now a '_' suffix exists.

libgomp/ChangeLog:

	* fortran.c (omp_get_device_from_uid_): New function.
	* libgomp.map (GOMP_6.0): Add it.
	* oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'.
	* omp_lib.f90.in: Make it used by removing bind(C).
	* omp_lib.h.in: Likewise.
	* target.c (omp_get_device_from_uid): Ensure the device is initialized.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment;
	return NULL in case of an error.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise.
	* testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.


arc: Remove mlra option [PR113954] · ffd861c8

Claudiu Zissulescu authored 6 months ago


The target-dependent mlra option was designed to allow quickly
switching between LRA and reload.  The reload register allocator is
scheduled for retirement; thus, remove the functionality of mlra,
keeping the option for backward compatibility.

	PR target/113954

gcc/ChangeLog:

	* config/arc/arc.cc (TARGET_LRA_P): Always return true.
	(arc_lra_p): Remove.
	* config/arc/arc.h (TARGET_LRA): Remove.
	* config/arc/arc.opt (mlra): Change it to do nothing.
	* doc/invoke.texi (mlra): Update option description.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>


c++: Don't crash when mangling member with anonymous union or template type [PR100632, PR109790] · a030fcad

Simon Martin authored 6 months ago

We currently crash upon mangling members that have an anonymous union or
a template operator type.

The problem is that before calling write_unqualified_name,
write_member_name asserts that it has a declaration whose DECL_NAME is
an identifier node that is not that of an operator. This is wrong:
 - In PR100632, it's an anonymous union declaration, hence a 0 DECL_NAME
 - In PR109790, it's a legitimate template declaration for an operator
   (this was accepted up to GCC 10)

This assert was added via r11-6301, to be sure that we do write the "on"
marker for operator members.

This patch removes that assert and instead
 - Lets members with an anonymous union type go through
 - For operators, adds the missing "on" marker for ABI versions greater
   than the highest usable with GCC 10

	PR c++/109790
	PR c++/100632

gcc/cp/ChangeLog:

	* mangle.cc (write_member_name): Handle members whose type is an
	anonymous union member. Write missing "on" marker for operators
	when ABI version is at least 16.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/decltype83.C: New test.
	* g++.dg/cpp0x/decltype83a.C: New test.
	* g++.dg/cpp1y/lambda-ice3.C: New test.
	* g++.dg/cpp1y/lambda-ice3a.C: New test.
	* g++.dg/cpp2a/nontype-class67.C: New test.


c++: Don't ICE due to artificial constructor parameters [PR116722] · d7bf5e53

Simon Martin authored 6 months ago

The following code triggers an ICE

=== cut here ===
class base {};
class derived : virtual public base {
public:
  template<typename Arg> constexpr derived(Arg) {}
};
int main() {
  derived obj(1.);
}
=== cut here ===

The problem is that cxx_bind_parameters_in_call ends up attempting to
convert a REAL_CST (the first non-artificial parameter) to INTEGER_TYPE
(the type of the __in_chrg parameter), which ICEs.

This patch changes cxx_bind_parameters_in_call to return early if it's
called with a *structor that has an __in_chrg or __vtt_parm parameter
since the expression won't be a constant expression.

Note that in the test case, the constructor is not constexpr-suitable;
however, that is OK since it's a template, according to my reading of
paragraph (3) of [dcl.constexpr].

	PR c++/116722

gcc/cp/ChangeLog:

	* constexpr.cc (cxx_bind_parameters_in_call): Leave early for
	{con,de}structors of classes with virtual bases.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/constexpr-ctor22.C: New test.


Add myself to write after approval · 346f767f
Saurabh Jha authored 6 months ago
```
ChangeLog:

	* MAINTAINERS: Add myself to write after approval.
```

tree-optimization/116810 - out-of-bound access to matches[] · 2c04f175

Richard Biener authored 6 months ago

The following makes sure to apply forced splitting of groups for
forced single-lane SLP only when the group being analyzed has more
than one lane.  This avoids an out-of-bound access to matches[].

	PR tree-optimization/116810
	* tree-vect-slp.cc (vect_build_slp_instance): Only force
	splitting for group_size > 1.


tree-optimization/116796 - virtual LC SSA broken after unrolling · e97c75d6

Richard Biener authored 6 months ago

When the unroller unloops loops, it tracks whether it changes any
nesting relationship of remaining loops, but when scanning a loop's
preheader it fails to pass down the LC-SSA-invalidated bitmap, losing
the fact that an unrolled formerly inner loop can now be placed on
an exit of its outer loop.  The following fixes that.

	PR tree-optimization/116796
	* cfgloopmanip.cc (fix_loop_placements): Get LC-SSA-invalidated
	bitmap and pass it on.
	(remove_path): Pass LC-SSA-invalidated to fix_loop_placements.


middle-end: Insert invariant instructions before the gsi [PR116812] · 09892448

Tamar Christina authored 6 months ago

The new invariant statements should be inserted before the current
statement and not after.  This goes fine 99% of the time but when the
current statement is a gcond the control flow gets corrupted.

gcc/ChangeLog:

	PR tree-optimization/116812
	* tree-vect-slp.cc (vect_slp_region): Fix insertion.

gcc/testsuite/ChangeLog:

	PR tree-optimization/116812
	* gcc.dg/vect/pr116812.c: New test.


tree-optimization/116791 - Elementwise SLP vectorization · 723f7b6d

Richard Biener authored 6 months ago

The following restricts the elementwise SLP vectorization to the
single-lane case, which is the reason I enabled it: to avoid regressions
with non-SLP.  The PR shows that multi-lane SLP loads with elementwise
accesses require work; I'll open a new bug to track this for the
future.

	PR tree-optimization/116791
	* tree-vect-stmts.cc (get_group_load_store_type): Only
	fall back to elementwise access for single-lane SLP, restore
	hard failure mode for other cases.

	* gcc.dg/vect/pr116791.c: New testcase.


gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h · dfb75079

Tobias Burnus authored 6 months ago

In commit r15-3629-g508ef585243d4674d06b0737bfe8769fc18f824f, #embed
was added and the fprintf '#include' lines that seemed no longer
required were removed, overlooking that with -mstack-size=, the
generated configure_stack_size uses 'setenv' and 'true'.

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (process_asm): (Re)add the fprintf
	lines for stdlib.h/stdbool.h inclusion if gcn_stack_size is used.


Genmatch: Fix ICE for binary phi cfg mismatching [PR116795] · 999363c5

Pan Li authored 6 months ago


This patch fixes an ICE when trying to match the binary phi for the
cfg below.  We checked that the first edge of the Phi block comes
from b0, but did not also check that the only incoming edge of b1
comes from b0.  As a result, some code was recognized as .SAT_SUB
when it is not, which finally resulted in a verify_ssa failure.

+------+
| b0:  |
| def  |       +-----+
| ...  |       | b1: |
| cond |------>| def |
+------+       | ... |
   |           +-----+
   |              |
   |              |
   v              |
+-----+           |
| b2: |           |
| Phi |<----------+
+-----+
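
The shape in the diagram can be expressed as a predicate on a toy CFG. This is a hedged sketch only: Block, has_pred and binary_phi_shape_p are hypothetical names and the representation (predecessor index lists) is an assumption made for illustration; it is not how match_cond_with_binary_phi is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: each block lists the indices of its predecessors.
struct Block { std::vector<std::size_t> preds; };

inline bool has_pred(const Block& b, std::size_t p) {
  return std::find(b.preds.begin(), b.preds.end(), p) != b.preds.end();
}

// True only for the shape in the diagram: b2 (the Phi block) is reached
// from b0 and b1, and b1 is reached from b0 alone.
inline bool binary_phi_shape_p(const std::vector<Block>& cfg,
                               std::size_t b0, std::size_t b1,
                               std::size_t b2) {
  const Block& phi = cfg[b2];
  if (phi.preds.size() != 2 || !has_pred(phi, b0) || !has_pred(phi, b1))
    return false;
  // The additional condition: b1's single incoming edge must come from b0.
  return cfg[b1].preds.size() == 1 && cfg[b1].preds[0] == b0;
}
```

A graph where b1 is entered from some other block still has a Phi with an edge from b0, but it fails the last condition and is rejected.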

The test suites below pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

	PR target/116795

gcc/ChangeLog:

	* gimple-match-head.cc (match_cond_with_binary_phi): Fix the
	incorrect cfg check as b0->b1 in above example.

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr116795-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>


gimple: Simplify gimple_seq_nondebug_singleton_p · 831137be

Andrew Pinski authored 6 months ago


The implementation of gimple_seq_nondebug_singleton_p
was convoluted in how it determined whether the sequence
was a singleton (which could contain debug statements).

This simplifies the function into two calls: one to get the start
after all of the debug statements, and one to check whether it
is at the one before the end (or there are only debug statements
afterwards).
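
The two-step idea can be sketched on a plain vector. This is an illustration under assumptions, not the GIMPLE iterator code: Stmt and nondebug_singleton_p are hypothetical names, and a std::vector stands in for a gimple_seq.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a GIMPLE statement.
struct Stmt { bool is_debug; };

// True when the sequence holds exactly one non-debug statement,
// ignoring any number of debug statements around it.
inline bool nondebug_singleton_p(const std::vector<Stmt>& seq) {
  auto is_real = [](const Stmt& s) { return !s.is_debug; };
  // Step 1: skip the leading debug statements.
  auto first = std::find_if(seq.begin(), seq.end(), is_real);
  if (first == seq.end())
    return false;  // empty, or debug statements only
  // Step 2: only debug statements may follow the one found.
  return std::none_of(first + 1, seq.end(), is_real);
}
```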

Bootstrapped and tested on x86_64-linux-gnu (including ada).

gcc/ChangeLog:

	* gimple-iterator.h (gimple_seq_nondebug_singleton_p):
	Rewrite to be simply gsi_start_nondebug/gsi_one_nondebug_before_end_p.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>


gimple: Remove custom remove_pointer · 2cd76720

Andrew Pinski authored 6 months ago


Since r11-2700-g22dc89f8073cd0, type_traits has been included via system.h so
we don't need a custom version for gimple.h.

Note a small C++14 cleanup is to use remove_pointer_t directly here instead
of remove_pointer<t>::type.
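
For reference, the two spellings name the same type; a minimal check using only the standard <type_traits> header (the aliases A and B are just illustrative names):

```cpp
#include <cassert>
#include <type_traits>

// C++11 spelling: the nested ::type member.
using A = std::remove_pointer<int*>::type;
// C++14 alias template: same type, less noise.
using B = std::remove_pointer_t<int*>;

static_assert(std::is_same<A, int>::value, "pointer is stripped");
static_assert(std::is_same<A, B>::value, "both spellings agree");
```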

bootstrapped and tested on x86_64-linux-gnu

gcc/ChangeLog:

	* gimple.h (remove_pointer): Remove.
	(GIMPLE_CHECK2): Use std::remove_pointer instead of custom one.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>


Remove commented out PHI_ARG_DEF macro definition · 0d68bfe2

Andrew Pinski authored 6 months ago


This has been commented out since r0-125500-g80560f9521f81a and a new
definition was added at the same time. Let's remove the commented
out version.

gcc/ChangeLog:

	* tree-ssa-operands.h (PHI_ARG_DEF): Remove definition.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>


Update email in MAINTAINERS file. · 52783489
Aldy Hernandez authored 6 months ago
```
ChangeLog:

	* MAINTAINERS: Update email and add myself to DCO.
```

Match: Support form 2 for vector signed integer .SAT_ADD · 4fc92480

Pan Li authored 6 months ago


This patch adds support for form 2 of the vector signed
integer .SAT_ADD, as in the example below:

Form 2:
  #define DEF_VEC_SAT_S_ADD_FMT_2(T, UT, MIN, MAX)                     \
  void __attribute__((noinline))                                       \
  vec_sat_s_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        T x = op_1[i];                                                 \
        T y = op_2[i];                                                 \
        T sum = (UT)x + (UT)y;                                         \
        if ((x ^ y) < 0 || (sum ^ x) >= 0)                             \
          out[i] = sum;                                                \
        else                                                           \
          out[i] = x < 0 ? MIN : MAX;                                  \
      }                                                                \
  }

DEF_VEC_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:
  loop_len_79 = MIN_EXPR <ivtmp.51_53, POLY_INT_CST [16, 16]>;
  _50 = &MEM <vector([16,16]) signed char> [(int8_t *)vectp_op_1.9_77];
  vect_x_18.11_80 = .MASK_LEN_LOAD (_50, 8B, { -1, ... }, loop_len_79, 0);
  _70 = vect_x_18.11_80 >> 7;
  vect_x.12_81 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_x_18.11_80);
  _26 = (void *) ivtmp.47_20;
  _27 = &MEM <vector([16,16]) signed char> [(int8_t *)_26];
  vect_y_20.15_84 = .MASK_LEN_LOAD (_27, 8B, { -1, ... }, loop_len_79, 0);
  vect__7.21_90 = vect_x_18.11_80 ^ vect_y_20.15_84;
  mask__50.23_92 = vect__7.21_90 >= { 0, ... };
  vect_y.16_85 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_y_20.15_84);
  vect__6.17_86 = vect_x.12_81 + vect_y.16_85;
  vect_sum_21.18_87 = VIEW_CONVERT_EXPR<vector([16,16]) signed char>(vect__6.17_86);
  vect__8.19_88 = vect_x_18.11_80 ^ vect_sum_21.18_87;
  mask__45.20_89 = vect__8.19_88 < { 0, ... };
  mask__44.24_93 = mask__45.20_89 & mask__50.23_92;
  _40 = .COND_XOR (mask__44.24_93, _70, { 127, ... }, vect_sum_21.18_87);
  _60 = (void *) ivtmp.49_6;
  _61 = &MEM <vector([16,16]) signed char> [(int8_t *)_60];
  .MASK_LEN_STORE (_61, 8B, { -1, ... }, loop_len_79, 0, _40);
  vectp_op_1.9_78 = vectp_op_1.9_77 + POLY_INT_CST [16, 16];
  ivtmp.47_4 = ivtmp.47_20 + POLY_INT_CST [16, 16];
  ivtmp.49_21 = ivtmp.49_6 + POLY_INT_CST [16, 16];
  ivtmp.51_98 = ivtmp.51_53;
  ivtmp.51_8 = ivtmp.51_53 + POLY_INT_CST [18446744073709551600, 18446744073709551600];

After this patch:
  _103 = .SELECT_VL (ivtmp_101, POLY_INT_CST [16, 16]);
  vect_x_18.11_90 = .MASK_LEN_LOAD (vectp_op_1.9_88, 8B, { -1, ... }, _103, 0);
  vect_y_20.14_94 = .MASK_LEN_LOAD (vectp_op_2.12_92, 8B, { -1, ... }, _103, 0);
  vect_patt_49.15_95 = .SAT_ADD (vect_x_18.11_90, vect_y_20.14_94);
  .MASK_LEN_STORE (vectp_out.16_97, 8B, { -1, ... }, _103, 0, vect_patt_49.15_95);
  vectp_op_1.9_89 = vectp_op_1.9_88 + _103;
  vectp_op_2.12_93 = vectp_op_2.12_92 + _103;
  vectp_out.16_98 = vectp_out.16_97 + _103;
  ivtmp_102 = ivtmp_101 - _103;

The test suites below pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

	* match.pd: Add the case 3 for signed .SAT_ADD matching.

Signed-off-by: Pan Li <pan2.li@intel.com>


RISC-V: Add testcases for form 2 of signed vector SAT_ADD · a1e6bb6f

Pan Li authored 6 months ago


Form 2:
  #define DEF_VEC_SAT_S_ADD_FMT_2(T, UT, MIN, MAX)                     \
  void __attribute__((noinline))                                       \
  vec_sat_s_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        T x = op_1[i];                                                 \
        T y = op_2[i];                                                 \
        T sum = (UT)x + (UT)y;                                         \
        if ((x ^ y) < 0 || (sum ^ x) >= 0)                             \
          out[i] = sum;                                                \
        else                                                           \
          out[i] = x < 0 ? MIN : MAX;                                  \
      }                                                                \
  }

DEF_VEC_SAT_S_ADD_FMT_2 (int8_t, uint8_t, INT8_MIN, INT8_MAX)
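
To make the per-element behaviour of Form 2 concrete, here is a scalar C++ sketch of the loop body for int8_t. The function name sat_s_add_i8_fmt_2 is hypothetical and is not part of the testsuite; the explicit casts spell out the conversions that the macro performs implicitly.

```cpp
#include <cassert>
#include <cstdint>

// Scalar version of one Form 2 loop iteration for T = int8_t, UT = uint8_t.
inline std::int8_t sat_s_add_i8_fmt_2(std::int8_t x, std::int8_t y) {
  // Wrapping sum via unsigned arithmetic.  The conversion back to int8_t
  // is modular (guaranteed since C++20, and what GCC did before that).
  std::int8_t sum = static_cast<std::int8_t>(static_cast<std::uint8_t>(
      static_cast<std::uint8_t>(x) + static_cast<std::uint8_t>(y)));
  // No overflow if the operands differ in sign, or if the sum keeps
  // the sign of x.
  if ((x ^ y) < 0 || (sum ^ x) >= 0)
    return sum;
  // Overflow: saturate towards the sign of x.
  return x < 0 ? INT8_MIN : INT8_MAX;
}
```

With these definitions, 100 + 100 saturates to INT8_MAX, -100 + -100 saturates to INT8_MIN, and mixed-sign operands never saturate.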

The test below passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macro.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-5.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-6.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-7.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-8.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-5.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-6.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-7.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>


testsuite/gfortran.dg/unsigned_22.f90: Add missing close with delete, PR116701 · 3f37c6f4

Hans-Peter Nilsson authored 6 months ago

Without this patch, gfortran.dg/unsigned_22.f90 fails for
non-effective-target fd_truncate targets, i.e. targets that
don't support chsize or ftruncate.  See also
libgfortran/io/unix.c:raw_truncate.  It passes on the first
run, but leaves behind a file "fort.10" which is then picked
up by subsequent runs, but since that file is to be
rewritten, the libgfortran machinery tries to truncate it,
which fails.  The file is always left behind primarily
because the test-case lacks a deleting close-statement,
apparently accidentally.

Incidentally, this "fort.10" artefact is also picked up by
gfortran.dg/write_check3.f90 causing that test to fail too,
observable as a regression for non-fd_truncate targets since
the unsigned_22.f90 introduction.  Also, when running
e.g. the whole of gfortran.dg/dg.exp, the "fort.10" is later
deleted by gfortran.dg/write_direct_eor.f90 (which
passes regardless), erasing the clue to the cause of the
write_check3 failure.  Also, running just
dg.exp=write_check3.f90 or manually repeating the commands
in gfortran.log showed no error.

N.B.: this close-statement will not help if unsigned_22 for
some reason fails, executing one of the "stop" statements,
but that's also the case for many other tests.

	PR testsuite/116701
	* gfortran.dg/unsigned_22.f90: Add missing close with delete.


Daily bump. · ca12354f
GCC Administrator authored 6 months ago


Sep 22, 2024

RISC-V: Add testcases for form 4 of signed scalar SAT_ADD · 50c9c3cb

Pan Li authored 6 months ago


Form 4:
  #define DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX)           \
  T __attribute__((noinline))                            \
  sat_s_add_##T##_fmt_4 (T x, T y)                       \
  {                                                      \
    T sum;                                               \
    bool overflow = __builtin_add_overflow (x, y, &sum); \
    return !overflow ? sum : x < 0 ? MIN : MAX;          \
  }

DEF_SAT_S_ADD_FMT_4 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
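
As a quick illustration of what the macro instantiation above computes, here is the same body written out as a plain C++ function. The name sat_s_add_i64_fmt_4 is hypothetical; __builtin_add_overflow is the GCC/Clang builtin already used by the macro.

```cpp
#include <cassert>
#include <cstdint>

// Form 4 written out for T = int64_t.
inline std::int64_t sat_s_add_i64_fmt_4(std::int64_t x, std::int64_t y) {
  std::int64_t sum;
  // Returns true when the mathematical result does not fit in int64_t.
  bool overflow = __builtin_add_overflow(x, y, &sum);
  // Two same-sign operands are needed to overflow, so the sign of x
  // selects the saturation bound.
  return !overflow ? sum : x < 0 ? INT64_MIN : INT64_MAX;
}
```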

The test below passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_arith.h: Add test helper macros.
	* gcc.target/riscv/sat_s_add-13.c: New test.
	* gcc.target/riscv/sat_s_add-14.c: New test.
	* gcc.target/riscv/sat_s_add-15.c: New test.
	* gcc.target/riscv/sat_s_add-16.c: New test.
	* gcc.target/riscv/sat_s_add-run-13.c: New test.
	* gcc.target/riscv/sat_s_add-run-14.c: New test.
	* gcc.target/riscv/sat_s_add-run-15.c: New test.
	* gcc.target/riscv/sat_s_add-run-16.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>


RISC-V: Add testcases for form 3 of signed scalar SAT_ADD · 20ec2c5d

Pan Li authored 6 months ago


This patch adds testcases for form 3 of the signed scalar SAT_ADD,
shown below:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)           \
  T __attribute__((noinline))                            \
  sat_s_add_##T##_fmt_3 (T x, T y)                       \
  {                                                      \
    T sum;                                               \
    bool overflow = __builtin_add_overflow (x, y, &sum); \
    return overflow ? x < 0 ? MIN : MAX : sum;           \
  }

DEF_SAT_S_ADD_FMT_3 (int64_t, uint64_t, INT64_MIN, INT64_MAX)

The test below passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_arith.h: Add test helper macros.
	* gcc.target/riscv/sat_s_add-10.c: New test.
	* gcc.target/riscv/sat_s_add-11.c: New test.
	* gcc.target/riscv/sat_s_add-12.c: New test.
	* gcc.target/riscv/sat_s_add-9.c: New test.
	* gcc.target/riscv/sat_s_add-run-10.c: New test.
	* gcc.target/riscv/sat_s_add-run-11.c: New test.
	* gcc.target/riscv/sat_s_add-run-12.c: New test.
	* gcc.target/riscv/sat_s_add-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>


testsuite, coroutines: Add tests for non-suspension ramp returns. · 0312b666

Iain Sandoe authored 6 months ago


Although it is most common for the ramp function to see a return when a coroutine
first suspends, there are other possibilities.  For example all the awaits could
be ready - effectively the coroutine will then run to completion and deallocation.
Another case is where the first active suspension point causes the current routine
to be cancelled and thence destroyed.

These cases are tested here.

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/torture/special-termination-00-sync-completion.C: New test.
	* g++.dg/coroutines/torture/special-termination-01-self-destruct.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>


libgcc, Darwin: From macOS 11, make that the earliest supported. · 43eab549

Iain Sandoe authored 6 months ago


For libgcc, we have (so far) supported building a DSO that supports
earlier versions of the OS than the target.  From macOS 11, there are
APIs that do not exist on earlier OS versions, so limit the libgcc
range to macOS11..current.

libgcc/ChangeLog:

	* config.host: From macOS 11, limit earliest macOS support
	to macOS 11.
	* config/t-darwin-min-11: New file.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>


libstdc++: Disable std::formatter<char8_t, C> specialization · 0f52a92a

Jonathan Wakely authored 6 months ago

I noticed that char8_t was missing from the list of types that were
prevented from using the std::formatter partial specialization for
integer types. That partial specialization was also matching
cv-qualified integer types, because std::integral<const int> is true.

This change simplifies the constraints by introducing a new variable
template which is only true for cv-unqualified integer types, with
explicit specializations to exclude the character types. This should be
slightly more efficient than the previous constraints that checked
std::integral<T> and (!__is_one_of<T, char, wchar_t, ...>). It also
avoids the need for a separate std::formatter specialization for 128-bit
integers, as they can be handled by the new variable template too.
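
The constraint described above can be approximated with a small variable template. This is a sketch under assumptions, not the libstdc++ source: the name is_formattable_integer_sketch is hypothetical, and the primary template here is built from std::is_integral_v plus a cv check rather than from the library's internal traits.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical sketch: true only for cv-unqualified integer types.
template<typename T>
inline constexpr bool is_formattable_integer_sketch =
    std::is_integral_v<T> && std::is_same_v<T, std::remove_cv_t<T>>;

// Explicit specializations exclude the character types.
template<> inline constexpr bool is_formattable_integer_sketch<char> = false;
template<> inline constexpr bool is_formattable_integer_sketch<wchar_t> = false;
#ifdef __cpp_char8_t
template<> inline constexpr bool is_formattable_integer_sketch<char8_t> = false;
#endif
template<> inline constexpr bool is_formattable_integer_sketch<char16_t> = false;
template<> inline constexpr bool is_formattable_integer_sketch<char32_t> = false;

static_assert(is_formattable_integer_sketch<int>);
static_assert(!is_formattable_integer_sketch<const int>);
static_assert(!is_formattable_integer_sketch<char>);
// The integral trait alone accepts cv-qualified types, which is the
// over-matching that the commit message describes.
static_assert(std::is_integral_v<const int>);
```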

libstdc++-v3/ChangeLog:

	* include/std/format (__format::__is_formattable_integer): New
	variable template and specializations.
	(template<integral, __char> struct formatter): Replace
	constraints on first arg with __is_formattable_integer.
	* testsuite/std/format/formatter/requirements.cc: Check that
	std::formatter specializations for char8_t and const int are
	disabled.


libstdc++: Fix condition for ranges::copy to use memmove [PR116754] · 83c6fe13

Jonathan Wakely authored 6 months ago

libstdc++-v3/ChangeLog:

	PR libstdc++/116754
	* include/bits/ranges_algobase.h (__copy_or_move): Fix order of
	arguments to __memcpyable.


libstdc++: Fix formatting of most negative chrono::duration [PR116755] · 482e651f

Jonathan Wakely authored 6 months ago

When formatting chrono::duration<signed-integer-type, P>::min() we were
causing undefined behaviour by trying to form the negative of the most
negative value. If we convert negative durations with integer rep to the
corresponding unsigned integer rep then we can safely represent all
values.
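
The underlying trick can be shown without any chrono types. The helper below is a hypothetical illustration (unsigned_magnitude is not a libstdc++ function): it computes the magnitude of a signed integer in the corresponding unsigned type, where negation wraps instead of overflowing.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Magnitude of a signed integer, returned as its unsigned counterpart.
template<typename T>
constexpr std::make_unsigned_t<T> unsigned_magnitude(T v) {
  using U = std::make_unsigned_t<T>;
  // Subtracting in U is arithmetic modulo 2^N, so even the most negative
  // value is handled; writing -v in T would overflow for that value.
  return v < 0 ? static_cast<U>(U{0} - static_cast<U>(v))
               : static_cast<U>(v);
}
```

The sign can then be printed separately and the magnitude formatted as an ordinary unsigned number.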

libstdc++-v3/ChangeLog:

	PR libstdc++/116755
	* include/bits/chrono_io.h (formatter<duration<R,P>>::format):
	Cast negative integral durations to unsigned rep.
	* testsuite/20_util/duration/io.cc: Test the most negative
	integer durations.


libstdc++: Use constexpr instead of _GLIBCXX20_CONSTEXPR in <vector> · b6463161

Jonathan Wakely authored 6 months ago

For the operator<=> overload we can use the 'constexpr' keyword
directly, because we know the language dialect is at least C++20.

libstdc++-v3/ChangeLog:

	* include/bits/stl_vector.h (operator<=>): Use constexpr
	instead of _GLIBCXX20_CONSTEXPR macro.


libstdc++: Silence -Wattributes warning in exception_ptr · 164c1b1f

Jonathan Wakely authored 6 months ago

libstdc++-v3/ChangeLog:

	* libsupc++/exception_ptr.h (__exception_ptr::_M_safe_bool_dummy):
	Remove __attribute__((const)) from function returning void.


libstdc++: Silence -Woverloaded-virtual warning in cxx11-ios_failure.cc · d842eb5e

Jonathan Wakely authored 6 months ago

libstdc++-v3/ChangeLog:

	* src/c++11/cxx11-ios_failure.cc (__iosfail_type_info): Unhide
	the three-arg overload of __do_upcast.


libstdc++: Reorder C++26 entries in version.def · d024be89

Jonathan Wakely authored 6 months ago

This puts the C++26 ftms definitions in alphabetical order.

libstdc++-v3/ChangeLog:

	* include/bits/version.def: Sort C++26 entries alphabetically.
	* include/bits/version.h: Regenerate.


libstdc++: add default template parameters to algorithms · dc47add7

Jonathan Wakely authored 6 months ago


This implements P2248R8 + P3217R0, both approved for C++26.
The changes are mostly mechanical; the struggle is to keep readability
with the pre-P2248 signatures.

* For containers, "classic STL" algorithms and their parallel versions,
  introduce a macro and amend their declarations/definitions with it.
  The macro either expands to the defaulted parameter or to nothing
  in pre-C++26 modes.

* For range algorithms, we need to reorder their template parameters.
  I've done so unconditionally, because users cannot rely on template
  parameters of algorithms (this is explicitly authorized by
  [algorithms.requirements]/15). The defaults are then hidden behind
  another macro.
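
What a defaulted value-type parameter buys the caller can be sketched with a user-written algorithm. count_equal below is a hypothetical function, not a libstdc++ declaration, and it applies the default unconditionally instead of hiding it behind a macro as the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// The value type defaults to the iterator's value_type, so a braced
// initializer (which cannot be deduced) still gives T a type.
template<typename It,
         typename T = typename std::iterator_traits<It>::value_type>
std::ptrdiff_t count_equal(It first, It last, const T& value) {
  std::ptrdiff_t n = 0;
  for (; first != last; ++first)
    if (*first == value)
      ++n;
  return n;
}
```

Without the default, a call such as count_equal(v.begin(), v.end(), {1, 2}) would fail to compile because T could not be deduced from the braced list.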

libstdc++-v3/ChangeLog:

	* include/bits/iterator_concepts.h: Add projected_value_t.
	* include/bits/algorithmfwd.h: Add the default template
	parameter to the relevant forward declarations.
	* include/pstl/glue_algorithm_defs.h: Likewise.
	* include/bits/ranges_algo.h: Add the default template
	parameter to range-based algorithms.
	* include/bits/ranges_algobase.h: Likewise.
	* include/bits/ranges_util.h: Likewise.
	* include/bits/ranges_base.h: Add helper macros.
	* include/bits/stl_iterator_base_types.h: Add helper macro.
	* include/bits/version.def: Add the new feature-testing macro.
	* include/bits/version.h: Regenerate.
	* include/std/algorithm: Pull the feature-testing macro.
	* include/std/ranges: Likewise.
	* include/std/deque: Pull the feature-testing macro, add
	the default for std::erase.
	* include/std/forward_list: Likewise.
	* include/std/list: Likewise.
	* include/std/string: Likewise.
	* include/std/vector: Likewise.
	* testsuite/23_containers/default_template_value.cc: New test.
	* testsuite/25_algorithms/default_template_value.cc: New test.

Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>


middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern · 4150bcd2

Tamar Christina authored 6 months ago

Currently the vectorizer cheats when lowering COND_EXPR during bool recog.
In the cases where the conditional is loop invariant or non-boolean it instead
converts the operation back into GENERIC and hides much of the operation from
the analysis part of the vectorizer.

i.e.

  a ? b : c

is transformed into:

  a != 0 ? b : c

However, by doing so we can't perform any optimization on the masks, as they
aren't explicit until quite late during codegen.

To fix this, this patch lowers booleans earlier and so ensures that we are
always in GIMPLE.

For when the value is a loop invariant boolean we have to generate an additional
conversion from bool to the integer mask form.

This is done by creating a loop invariant a ? -1 : 0 with the target mask
precision and then doing a normal != 0 comparison on that.

To support this, the patch also adds the ability to create, during pattern
matching, a loop invariant pattern that won't be seen by the vectorizer and will
instead be materialized inside the loop preheader in the case of loops, or in
the first BB in the region in the case of BB vectorization.
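
As a rough scalar model of the lowering described above (not the actual
GIMPLE the vectorizer emits), the loop invariant boolean can be turned into
an integer mask once, outside the loop, and each iteration then performs an
ordinary != 0 comparison on it.  The function name select_invariant and the
32-bit mask width are assumptions made for illustration; the real mask
precision depends on the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of `inv ? b[i] : c[i]` where `inv` is loop invariant.
   select_invariant is a hypothetical name, not a GCC function.  */
void
select_invariant (bool inv, const int32_t *b, const int32_t *c,
                  int32_t *out, int n)
{
  /* Done once before the loop, mirroring the preheader: bool -> mask.  */
  int32_t mask = inv ? -1 : 0;

  for (int i = 0; i < n; i++)
    /* Inside the loop only a plain integer comparison remains.  */
    out[i] = (mask != 0) ? b[i] : c[i];
}
```

Keeping the mask as an explicit integer value is what lets later analysis
see and optimize it, instead of it staying hidden inside the condition.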

gcc/ChangeLog:

	* tree-vect-patterns.cc (append_inv_pattern_def_seq): New.
	(vect_recog_bool_pattern): Lower COND_EXPRs.
	* tree-vect-slp.cc (vect_slp_region): Materialize loop invariant
	statements.
	* tree-vect-loop.cc (vect_transform_loop): Likewise.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Remove
	VECT_SCALAR_BOOLEAN_TYPE_P handling for vectype.
	* tree-vectorizer.cc (vec_info::vec_info): Initialize
	inv_pattern_def_seq.
	* tree-vectorizer.h (LOOP_VINFO_INV_PATTERN_DEF_SEQ): New.
	(class vec_info): Add inv_pattern_def_seq.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
	* gcc.dg/vect/vect-conditional_store_5.c: New test.
	* gcc.dg/vect/vect-conditional_store_6.c: New test.

aarch64: Take into account when VF is higher than known scalar iters · e84e5d03

Tamar Christina authored 6 months ago

Consider low overhead loops like:

void
foo (char *restrict a, int *restrict b, int *restrict c, int n)
{
  for (int i = 0; i < 9; i++)
    {
      int res = c[i];
      int t = b[i];
      if (a[i] != 0)
        res = t;
      c[i] = res;
    }
}

For such loops we use latency only costing since the loop bound is known and
small.

The current costing however does not consider the case where niters < VF.

So when comparing the scalar vs vector costs it doesn't keep in mind that the
scalar code can't perform VF iterations.  This makes it overestimate the cost
for the scalar loop and we incorrectly vectorize.

This patch takes the minimum of the VF and niters in such cases.
Before the patch we generate:

 note:  Original vector body cost = 46
 note:  Vector loop iterates at most 1 times
 note:  Scalar issue estimate:
 note:    load operations = 2
 note:    store operations = 1
 note:    general operations = 1
 note:    reduction latency = 0
 note:    estimated min cycles per iteration = 1.000000
 note:    estimated cycles per vector iteration (for VF 32) = 32.000000
 note:  SVE issue estimate:
 note:    load operations = 5
 note:    store operations = 4
 note:    general operations = 11
 note:    predicate operations = 12
 note:    reduction latency = 0
 note:    estimated min cycles per iteration without predication = 5.500000
 note:    estimated min cycles per iteration for predication = 12.000000
 note:    estimated min cycles per iteration = 12.000000
 note:  Low iteration count, so using pure latency costs
 note:  Cost model analysis:

vs after:

 note:  Original vector body cost = 46
 note:  Known loop bounds, capping VF to 9 for analysis
 note:  Vector loop iterates at most 1 times
 note:  Scalar issue estimate:
 note:    load operations = 2
 note:    store operations = 1
 note:    general operations = 1
 note:    reduction latency = 0
 note:    estimated min cycles per iteration = 1.000000
 note:    estimated cycles per vector iteration (for VF 9) = 9.000000
 note:  SVE issue estimate:
 note:    load operations = 5
 note:    store operations = 4
 note:    general operations = 11
 note:    predicate operations = 12
 note:    reduction latency = 0
 note:    estimated min cycles per iteration without predication = 5.500000
 note:    estimated min cycles per iteration for predication = 12.000000
 note:    estimated min cycles per iteration = 12.000000
 note:  Increasing body cost to 1472 because the scalar code could issue within the limit imposed by predicate operations
 note:  Low iteration count, so using pure latency costs
 note:  Cost model analysis:
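
The effect of the cap on the scalar estimate in the dumps above can be
sketched as follows.  This is a simplified model, not the code in
adjust_body_cost; the helper names are hypothetical.

```c
/* Hypothetical helper: the VF used for costing never exceeds the known
   number of scalar iterations.  */
unsigned
capped_vf (unsigned vf, unsigned known_niters)
{
  return known_niters < vf ? known_niters : vf;
}

/* Cycles the scalar loop needs to do the work of one vector iteration.  */
double
scalar_cycles_per_vector_iter (double cycles_per_iter, unsigned vf,
                               unsigned known_niters)
{
  return cycles_per_iter * capped_vf (vf, known_niters);
}
```

With 1 cycle per scalar iteration, VF 32 and 9 known iterations, the
estimate drops from 32 to 9 cycles, matching the "for VF 9" line in the
second dump.  That value is below the 12 cycles estimated for the
predicated SVE loop, which is consistent with the additional "Increasing
body cost" note.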

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (adjust_body_cost):
	Cap VF for low iteration loops.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/asrdiv_4.c: Update bounds.
	* gcc.target/aarch64/sve/cond_asrd_2.c: Likewise.
	* gcc.target/aarch64/sve/cond_uxt_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_uxt_7.c: Likewise.
	* gcc.target/aarch64/sve/cond_uxt_8.c: Likewise.
	* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
	* gcc.target/aarch64/sve/spill_6.c: Likewise.
	* gcc.target/aarch64/sve/sve_iters_low_1.c: New test.
	* gcc.target/aarch64/sve/sve_iters_low_2.c: New test.

Daily bump. · 67382245
GCC Administrator authored 6 months ago

Sep 21, 2024

fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608] · d6cb7794

Mikael Morin authored 6 months ago

Introduce the -finline-intrinsics flag to control from the command line
whether to generate either inline code or calls to the functions from the
library, for the MINLOC and MAXLOC intrinsics.

The flag allows inlining to be specified either independently for each intrinsic
(MINLOC or MAXLOC), or for both together.  For each intrinsic, a default
value is set if none was set.  The default value depends on the optimization
setting: inlining is avoided if not optimizing or if optimizing for size;
otherwise inlining is preferred.

There is no direct support for this behaviour provided by the .opt options
framework.  It is obtained by defining three different variants of the flag
(finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=) all using
the same underlying option variable.  Each enum value (corresponding to an
intrinsic function) uses two identical bits, and the variable is initialized
with alternated bits, so that we can tell whether the value was set or not
by checking whether the two bits have different values.
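
A minimal sketch of the two-bit encoding described above, with made-up
names and bit positions (the real enum is gfc_inlineable_intrinsics in
flag-types.h, and its values may differ):

```c
#include <stdbool.h>

/* Hypothetical encoding: each intrinsic owns a pair of adjacent bits.  */
enum inline_intrinsic
{
  INLINE_MINLOC = 0x3,  /* bits 0 and 1 */
  INLINE_MAXLOC = 0xC,  /* bits 2 and 3 */
  INLINE_ALL = INLINE_MINLOC | INLINE_MAXLOC
};

/* Initial value with alternating bits (binary 0101): within every pair
   the two bits differ, which means "not set by the user".  */
#define INLINE_UNSET_PATTERN 0x5u

/* Enabling or disabling writes both bits of a pair to the same value.  */
unsigned enable (unsigned flags, unsigned which) { return flags | which; }
unsigned disable (unsigned flags, unsigned which) { return flags & ~which; }

/* For a single intrinsic: explicitly set iff both of its bits agree.  */
bool
was_set (unsigned flags, enum inline_intrinsic which)
{
  unsigned pair = flags & (unsigned) which;
  return pair == 0 || pair == (unsigned) which;
}

/* Apply the optimization-dependent default only where nothing was set.  */
unsigned
apply_default (unsigned flags, enum inline_intrinsic which,
               bool optimize_for_speed)
{
  if (was_set (flags, which))
    return flags;
  return optimize_for_speed ? enable (flags, which) : disable (flags, which);
}
```

In this model, an option that enables or disables everything would call
enable or disable with INLINE_ALL, while a per-intrinsic form would pass a
single enum value; any pair left untouched keeps its alternating pattern
and later receives the default.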

	PR fortran/90608

gcc/ChangeLog:

	* flag-types.h (enum gfc_inlineable_intrinsics): New type.

gcc/fortran/ChangeLog:

	* invoke.texi(finline-intrinsics): Document new flag.
	* lang.opt (finline-intrinsics, finline-intrinsics=,
	fno-inline-intrinsics): New flags.
	* options.cc (gfc_post_options): If the option variable controlling
	the inlining of MAXLOC (respectively MINLOC) has not been set, set
	it or clear it depending on the optimization option variables.
	* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false
	if inlining for the intrinsic is disabled according to the option
	variable.

gcc/testsuite/ChangeLog:

	* gfortran.dg/minmaxloc_18.f90: New test.
	* gfortran.dg/minmaxloc_18a.f90: New test.
	* gfortran.dg/minmaxloc_18b.f90: New test.
	* gfortran.dg/minmaxloc_18c.f90: New test.
	* gfortran.dg/minmaxloc_18d.f90: New test.

fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608] · 3c01ddc4

Mikael Morin authored 6 months ago

Continue the second set of loops where the first one stopped in the
generated inline MINLOC/MAXLOC code in the cases where the generated code
contains two sets of loops.  This fixes a regression that was introduced
when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank
greater than 1, no DIM argument, and either non-scalar MASK or floating-
point ARRAY.

In the cases where two sets of loops are generated as inline MINLOC/MAXLOC
code, we previously generated code such as (for rank 2 ARRAY, so with two
levels of nesting):

	for (idx11 in lower1..upper1)
	  {
	    for (idx12 in lower2..upper2)
	      {
	        ...
	        if (...)
	          {
	            ...
	            goto second_loop;
	          }
	      }
	  }
	second_loop:
	for (idx21 in lower1..upper1)
	  {
	    for (idx22 in lower2..upper2)
	      {
	        ...
	      }
	  }

which means we process the first elements twice, once in the first set
of loops and once in the second one.  This change avoids this duplicate
processing by using a conditional as lower bound for the second set of
loops, generating code like:

	second_loop_entry = false;
	for (idx11 in lower1..upper1)
	  {
	    for (idx12 in lower2..upper2)
	      {
	        ...
	        if (...)
	          {
	            ...
	            second_loop_entry = true;
	            goto second_loop;
	          }
	      }
	  }
	second_loop:
	for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1)
	  {
	    for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2)
	      {
	        ...
	        second_loop_entry = false;
	      }
	  }

It was expected that the compiler optimizations would be able to remove the
state variable second_loop_entry.  This is the case if ARRAY has rank 1 (so
without loop nesting): the variable is removed and the loop bounds become
unconditional, which restores the previously generated code, fully fixing the
regression.  For larger rank, unfortunately, the state variable and
conditional loop bounds remain, but those cases were previously using
library calls, so it's not a regression.
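
The loop structure above can be modelled in C roughly as follows, for a
rank 2 array of doubles where the first set of loops skips NaNs and the
second one searches for the maximum.  This is an illustrative sketch, not
the code gfortran generates: the function name, the fixed array shape, the
0-based positions and the visit counter are all assumptions.

```c
#include <math.h>
#include <stdbool.h>

#define ROWS 3
#define COLS 4

/* Hypothetical MAXLOC model.  Stores the 0-based position of the maximum
   in pos and returns how many element visits were needed.  */
int
maxloc2d (double a[ROWS][COLS], int pos[2])
{
  int visits = 0;
  int i1, j1;
  double best = -INFINITY;
  bool second_loop_entry = false;

  pos[0] = pos[1] = 0;

  /* First set of loops: find the first element that is not a NaN.  */
  for (i1 = 0; i1 < ROWS; i1++)
    for (j1 = 0; j1 < COLS; j1++)
      {
        visits++;
        if (!isnan (a[i1][j1]))
          {
            best = a[i1][j1];
            pos[0] = i1;
            pos[1] = j1;
            second_loop_entry = true;
            goto second_loop;
          }
      }

second_loop:
  /* Second set of loops: resume where the first set stopped.  */
  for (int i2 = second_loop_entry ? i1 : 0; i2 < ROWS; i2++)
    for (int j2 = second_loop_entry ? j1 : 0; j2 < COLS; j2++)
      {
        visits++;
        if (a[i2][j2] > best)
          {
            best = a[i2][j2];
            pos[0] = i2;
            pos[1] = j2;
          }
        /* Later inner loops must start from the lower bound again.  */
        second_loop_entry = false;
      }

  return visits;
}
```

Because the predicate is cleared in the body of the second set of loops,
only the first entry of the inner loop uses the saved index; every later
row starts from its normal lower bound.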

	PR fortran/90608

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set
	of index variables.  Set them using the loop indexes before leaving
	the first set of loops.  Generate a new loop entry predicate.
	Initialize it.  Set it before leaving the first set of loops.  Clear
	it in the body of the second set of loops.  For the second set of
	loops, update each loop lower bound to use the corresponding index
	variable if the predicate variable is set.

fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608] · 7d43b4e0

Mikael Morin authored 6 months ago

Enable generation of inline MINLOC/MAXLOC code in the case where DIM
is not present, and either ARRAY is of floating point type or MASK is an
array.  Those cases are the remaining bits to fully support inlining of
non-CHARACTER MINLOC/MAXLOC without DIM.  They are treated together because
they generate similar code, the NANs for REAL types being handled a bit like
a second level of masking.  These are the cases for which we generate two
sets of loops.

This change affects the code generating the second loop, which was previously
reachable only when ARRAY has rank 1.  The single variable
initialization and update are changed to apply to multiple variables, one
per dimension.

The code generated is as follows (if ARRAY has rank 2):

	for (idx11 in lower1..upper1)
	  {
	    for (idx12 in lower2..upper2)
	      {
		...
		if (...)
		  {
		    ...
		    goto second_loop;
		  }
	      }
	  }
	second_loop:
	for (idx21 in lower1..upper1)
	  {
	    for (idx22 in lower2..upper2)
	      {
		...
	      }
	  }

This code leads to processing the first elements redundantly, both in the
first set of loops and in the second one.  The loop over idx22 could
start from idx12 the first time it is run, but as it has to start from
lower2 for the rest of the runs, this change uses the same bounds for both
sets of loops for simplicity.  In the rank 1 case, this makes the generated
code worse compared to the inline code that was generated before.  A later
change will introduce conditionals to avoid the duplicate processing and
restore the generated code in that case.

	PR fortran/90608

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize
	and update all the variables.  Put the label and goto in the
	outermost scalarizer loop.  Don't start the second loop where the
	first stopped.
	(gfc_inline_intrinsic_function_p): Also return TRUE for array MASK
	or for any REAL type.

gcc/testsuite/ChangeLog:

	* gfortran.dg/maxloc_bounds_5.f90: Additionally accept error
	messages reported by the scalarizer.
	* gfortran.dg/maxloc_bounds_6.f90: Ditto.

fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608] · 5999d558

Mikael Morin authored 6 months ago

Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY
is of integral type, DIM is not present, and MASK is present and is scalar
(only absent MASK or rank 1 ARRAY were inlined before).

Scalar masks are implemented with a wrapping condition around the code one
would generate if MASK wasn't present, so they are easy to support once
inline code without MASK is working.
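
The wrapping condition can be pictured with the following C sketch of a
rank 2 MINLOC over integers.  The function name and fixed array shape are
assumptions for illustration, and the element order and tie handling of
the real generated code are not modelled.  Positions are 1-based, with 0
meaning that no element was selected, as in Fortran.

```c
#include <stdbool.h>

#define ROWS 2
#define COLS 3

/* Hypothetical model of MINLOC (ARRAY, MASK=scalar) for a rank 2 ARRAY.  */
void
minloc_scalar_mask (int a[ROWS][COLS], bool mask, int pos[2])
{
  if (mask)
    {
      /* Same code as would be generated if MASK were absent.  */
      int best = a[0][0];
      pos[0] = 1;
      pos[1] = 1;
      for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
          if (a[i][j] < best)
            {
              best = a[i][j];
              pos[0] = i + 1;
              pos[1] = j + 1;
            }
    }
  else
    {
      /* Else branch: only initialize the position of each dimension.  */
      pos[0] = 0;
      pos[1] = 0;
    }
}
```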

	PR fortran/90608

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate
	variable initialization for each dimension in the else branch of
	the toplevel condition.
	(gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK.

gcc/testsuite/ChangeLog:

	* gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message
	reported by the scalarizer.

fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608] · dd525038

Mikael Morin authored 6 months ago

Enable generation of inline code for the MINLOC and MAXLOC intrinsics,
if the ARRAY argument is of integral type and of any rank (only the rank 1
case was previously inlined), and neither DIM nor MASK arguments are
present.

This needs a few adjustments in gfc_conv_intrinsic_minmaxloc,
mainly to replace the single variables POS and OFFSET, with collections
of variables, one variable per dimension each.

The restriction to integral ARRAY and absent MASK limits the scope of
the change to the cases where we generate single loop inline code.  The
code generation for the second loop is only accessible with ARRAY of rank
1, so it can continue using a single variable.  A later change will extend
inlining to the double loop cases.

There is some bounds checking code that was previously handled by the
library, and that needed some changes in the scalarizer to avoid regressing.
The bounds check code generation was already supported by the scalarizer,
but it was only applying to array reference sections, checking both
for array bound violation and for shape conformability between all the
involved arrays.  With this change, for MINLOC or MAXLOC, enable the
conformability check between all the scalarized arrays, and disable the
array bound violation check.

	PR fortran/90608

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC
	result upper bound using the rank of the ARRAY argument.  Adjust
	the error message for intrinsic result arrays.  Only check array
	bounds for array references.  Move bound check decision code...
	(bounds_check_needed): ... here as a new predicate.  Allow bound
	check for MINLOC/MAXLOC intrinsic results.
	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the
	result array upper bound to the rank of ARRAY.  Update the NONEMPTY
	variable to depend on the non-empty extent of every dimension.  Use
	one variable per dimension instead of a single variable for the
	position and the offset.  Update their declaration, initialization,
	and update to affect the variable of each dimension.  Use the first
	variable only in areas only accessed with rank 1 ARRAY argument.
	Set every element of the result using its corresponding variable.
	(gfc_inline_intrinsic_function_p): Return true for integral ARRAY
	and absent DIM and MASK.

gcc/testsuite/ChangeLog:

	* gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error
	message emitted by the scalarizer.
