- Oct 09, 2024
-
Jonathan Wakely authored
Implement Peter Dimov's suggestion for resolving LWG 4118, which is to use +d.count() so that character types are promoted to an integer type before formatting them.

This didn't have unanimous consensus in the committee, as Howard Hinnant proposed that we should format the rep consistently with std::format("{}", d.count()) instead. That ends up being more complicated, because it makes std::formattable a precondition of operator<<, which was not previously the case, and it means that ios_base::fmtflags from the stream would be ignored, because std::format doesn't use them.

libstdc++-v3/ChangeLog:

    PR libstdc++/116755
    * include/bits/chrono_io.h (operator<<): Use +d.count() for duration inserter.
    (__formatter_chrono::_M_format): Likewise for %Q format.
    * testsuite/20_util/duration/io.cc: Test durations with character types as reps.
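For reference, the whole trick is ordinary integer promotion via unary plus; a minimal C sketch of the mechanism (illustrative only, not the libstdc++ change):

    #include <stdio.h>

    /* _Generic reports the static type of its argument; unary +
       applies the integer promotions, which is what +d.count()
       relies on. */
    #define TYPE_NAME(x) _Generic((x), char: "char", int: "int", default: "other")

    int main(void)
    {
        char c = 'A';
        printf("%s\n", TYPE_NAME(c));  /* prints "char" */
        printf("%s\n", TYPE_NAME(+c)); /* prints "int": +c promotes char to int */
        return 0;
    }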
-
Richard Biener authored
I've tried to sanitize DR_GROUP_NEXT_ELEMENT accesses but there are too many, so the following instead makes sure DR_GROUP_NEXT_ELEMENT is never non-NULL for !STMT_VINFO_GROUPED_ACCESS.

* tree-vect-data-refs.cc (vect_analyze_data_ref_access): When cancelling a DR group also clear DR_GROUP_NEXT_ELEMENT.
-
Richard Biener authored
When we first detect a grouped load but later dis-associate it, we only set DR_GROUP_FIRST_ELEMENT to NULL, indicating it is not a STMT_VINFO_GROUPED_ACCESS, but leave DR_GROUP_NEXT_ELEMENT set. This causes a stray DR_GROUP_NEXT_ELEMENT access in get_group_load_store_type to go wrong, indicating a load isn't single_element_p when it actually is, leading to wrong classification and an ICE.

PR tree-optimization/117041

* tree-vect-stmts.cc (get_group_load_store_type): Only check DR_GROUP_NEXT_ELEMENT for STMT_VINFO_GROUPED_ACCESS.
* gcc.dg/torture/pr117041.c: New testcase.
-
Torbjörn SVENSSON authored
Update test cases to use the -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/pr65647.c: Use effective-target arm_arch_v6m. Removed unneeded dg-skip-if.
    * gcc.target/arm/mod_2.c: Use effective-target arm_cpu_cortex_a57.
    * gcc.target/arm/mod_256.c: Likewise.
    * gcc.target/arm/vseleqdf.c: Likewise.
    * gcc.target/arm/vseleqsf.c: Likewise.
    * gcc.target/arm/vselgedf.c: Likewise.
    * gcc.target/arm/vselgesf.c: Likewise.
    * gcc.target/arm/vselgtdf.c: Likewise.
    * gcc.target/arm/vselgtsf.c: Likewise.
    * gcc.target/arm/vselledf.c: Likewise.
    * gcc.target/arm/vsellesf.c: Likewise.
    * gcc.target/arm/vselltdf.c: Likewise.
    * gcc.target/arm/vselltsf.c: Likewise.
    * gcc.target/arm/vselnedf.c: Likewise.
    * gcc.target/arm/vselnesf.c: Likewise.
    * gcc.target/arm/vselvcdf.c: Likewise.
    * gcc.target/arm/vselvcsf.c: Likewise.
    * gcc.target/arm/vselvsdf.c: Likewise.
    * gcc.target/arm/vselvssf.c: Likewise.
    * lib/target-supports.exp: Define effective-target arm_cpu_cortex_a57. Update effective-target arm_v8_1_lob_ok to use -mcpu=unset.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-
Ken Matsui authored
PR bootstrap/117039

libcpp/ChangeLog:

    * directives.cc (do_pragma_once): Use ' instead of %< and %>.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
-
René Rebe authored
This was tested by bootstrapping GCC natively on ia64-t2-linux-gnu and running the testsuite (based on 23611606):
https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html

For comparison, the same with just 23611606:
https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html

gcc/
    * config/ia64/ia64.cc: Enable LRA for ia64.
    * config/ia64/ia64.md: Likewise.
    * config/ia64/predicates.md: Likewise.

Signed-off-by: René Rebe <rene@exactcode.de>
-
René Rebe authored
The following un-deprecates ia64*-*-linux for GCC 15, since we plan to support this target for some years to come.

gcc/
    * config.gcc: Only list ia64*-*-(hpux|vms|elf) in the list of obsoleted targets.

contrib/
    * config-list.mk (LIST): No --enable-obsolete for ia64-linux.

Signed-off-by: René Rebe <rene@exactcode.de>
-
Richard Biener authored
The following massages the GIMPLE matching way of handling scan stores to work with single-lane SLP. I do not fully understand all the cases that can happen and the stmt matching at vectorizable_store time is less than ideal - but the following gets me all the testcases to pass with and without forced SLP. Long term we want to perform the matching at SLP discovery time, properly chaining the various SLP instances the current state ends up with.

PR tree-optimization/116974

* tree-vect-stmts.cc (check_scan_store): Pass in the SLP node instead of just a flag. Allow single-lane scan stores.
(vectorizable_store): Adjust.
* tree-vect-loop.cc (vect_analyze_loop_2): Empty scan_map before re-trying.
-
Richard Biener authored
The following handles SLP discovery of permuted masked loads, which was prohibited (because wrongly handled) for PR114375. In particular, with single-lane SLP at the moment all masked group loads appear permuted and we fail to use masked load lanes as well. The following addresses parts of the issues, starting with doing correct basic discovery - namely discover an unpermuted mask load followed by a permute node. In particular, groups with gaps do not support masking yet (and didn't before w/o SLP IIRC). There's still issues with how we represent masked load/store-lanes I think, but I first have to get my hands on a good testcase.

PR tree-optimization/116575
PR tree-optimization/114375

* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reject permuted mask loads without gaps but instead discover a node for the full unpermuted load and permute that with a VEC_PERM node.
* gcc.dg/vect/vect-pr114375.c: Expect vectorization now with avx2.
-
Richard Biener authored
The following adds a pattern to elide a .REDUC_IOR operation when the result is compared against zero with a cbranch. I've resorted to using can_compare_p since that's what RTL expansion eventually checks - while GIMPLE has allowed whole-vector equality compares for a long time, vector lowering won't lower unsupported ones and RTL expansion doesn't seem to try using [u]cmp<vector-mode> optabs (and neither x86 nor aarch64 implements those). There's cstore but no target implements that for vector modes either.

PR tree-optimization/117000

* match.pd (.REDUC_IOR !=/== 0): New pattern.
* gimple-match-head.cc: Include memmodel.h and optabs.h.
* generic-match-head.cc: Likewise.
* gcc.target/i386/pr117000.c: New testcase.
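As a rough illustration (my guess at the shape of code this helps, not the actual pr117000.c testcase), an any-element-nonzero loop whose vectorized OR reduction only feeds a branch against zero:

    #include <stddef.h>

    /* When the loop is vectorized, acc becomes a vector OR reduction
       (.REDUC_IOR) whose scalar result is only tested against zero by
       the branch below, so the new pattern can elide the reduction in
       favor of a whole-vector compare where the target supports it. */
    int any_nonzero (const unsigned *p, size_t n)
    {
      unsigned acc = 0;
      for (size_t i = 0; i < n; i++)
        acc |= p[i];
      if (acc != 0)   /* cbranch consuming the reduction result */
        return 1;
      return 0;
    }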
-
Richard Biener authored
The following avoids copying scalar stmts again for the re-lookup of the slot to replace the NULL guard with node.

* tree-vect-slp.cc (vect_cse_slp_nodes): Fix memory leak.
-
Jan Beulich authored
Present wording has misled people to believe the ?: operator would be evaluating all three of the involved expressions.

gcc/
* doc/extend.texi: Clarify __builtin_choose_expr() (dis)similarity to the ?: operator.
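A small C illustration of the actual (dis)similarity (my own example, not from the patch): like ?:, __builtin_choose_expr evaluates only the chosen expression, but its condition must be a constant and the result keeps the chosen operand's exact type, with no arithmetic conversions against the other operand (this built-in is C only):

    #include <stdio.h>

    int main(void)
    {
      int i = 2;
      double d = 2.5;

      /* sizeof(int): the result type is exactly that of the chosen operand. */
      printf("%zu\n", sizeof(__builtin_choose_expr(1, i, d)));
      /* sizeof(double): ?: balances both operands via the usual conversions. */
      printf("%zu\n", sizeof(1 ? i : d));
      return 0;
    }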
-
Ken Matsui authored
This patch adds a warning switch for "#pragma once in main file". The warning option name is Wpragma-once-outside-header, which is the same as Clang provides. A minimal reproducer is shown after the ChangeLog.

PR preprocessor/89808

gcc/c-family/ChangeLog:

    * c.opt (Wpragma_once_outside_header): Define new option.
    * c.opt.urls: Regenerate.

gcc/ChangeLog:

    * doc/invoke.texi (Warning Options): Document -Wno-pragma-once-outside-header.

libcpp/ChangeLog:

    * include/cpplib.h (cpp_warning_reason): Define CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.
    * directives.cc (do_pragma_once): Use CPP_W_PRAGMA_ONCE_OUTSIDE_HEADER.

gcc/testsuite/ChangeLog:

    * g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
    * g++.dg/warn/Wpragma-once-outside-header.C: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Marek Polacek <polacek@redhat.com>
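A minimal reproducer for what the new switch controls (my own sketch, not the testsuite file):

    /* main.c -- compiled directly, so the preprocessor treats it as the
       main file, not a header.  GCC warns: "#pragma once in main file",
       now silenceable with -Wno-pragma-once-outside-header. */
    #pragma once

    int main(void) { return 0; }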
-
GCC Administrator authored
-
Artemiy Volkov authored
Whenever C1 and C2 are integer constants, X is of a wrapping type, and cmp is a relational operator, the expression X +- C1 cmp C2 can be simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF (1), we can transform the initial comparison in the following way:
    X +- C1 <= C2
    -INF <= X +- C1 <= C2         (add left hand side which holds for any X, C1)
    -INF -+ C1 <= X <= C2 -+ C1   (add -+C1 to all 3 expressions)
    -INF -+ C1 <= X <= +INF       (due to (1))
    -INF -+ C1 <= X               (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp is >= and C2 -+ C1 == -INF (1), use the following sequence of transformations:
    X +- C1 >= C2
    +INF >= X +- C1 >= C2         (add left hand side which holds for any X, C1)
    +INF -+ C1 >= X >= C2 -+ C1   (add -+C1 to all 3 expressions)
    +INF -+ C1 >= X >= -INF       (due to (1))
    +INF -+ C1 >= X               (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.

This transformation allows us to occasionally save add / sub instructions; for instance the expression

    3 + (uint32_t)f() < 2

compiles to

    cmn   w0, #4
    cset  w0, ls

instead of

    add   w0, w0, 3
    cmp   w0, 2
    cset  w0, ls

on aarch64.

Testcases that go together with this patch have been split into two separate files, one containing testcases for unsigned variables and the other for wrapping signed ones (and thus compiled with -fwrapv).

Additionally, one aarch64 test has been adjusted since the patch has caused the generated code to change from

    cmn   w0, #2
    csinc w0, w1, wzr, cc   (x < -2)

to

    cmn   w0, #3
    csinc w0, w1, wzr, cs   (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32.

gcc/ChangeLog:

    PR tree-optimization/116024
    * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/pr116024-2.c: New test.
    * gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
    * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.
-
Artemiy Volkov authored
Implement a match.pd transformation inverting the sign of X in C1 - X cmp C2, where C1 and C2 are integer constants and X is of a wrapping signed type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of the comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
    - C1 - X < C2 means that C1 - X spans the values -INF, -INF + 1, ..., C2 - 1;
    - Therefore, X is one of C1 - -INF, C1 - (-INF + 1), ..., C1 - C2 + 1;
    - Subtracting (C1 + 1), X - (C1 + 1) is one of - (-INF) - 1, - (-INF) - 2, ..., -C2;
    - Using the fact that - (-INF) - 1 is +INF, derive that X - (C1 + 1) spans the values +INF, +INF - 1, ..., -C2;
    - Thus, the original expression can be simplified to X - (C1 + 1) > -C2 - 1.

(c) Similarly, C1 - X <= C2 is equivalent to X - (C1 + 1) >= -C2 - 1.

(d) The >= and > cases are negations of (b) and (c), respectively.

(e) In all cases, the expression -C2 - 1 can be shortened to bit_not (C2).

This transformation allows us to occasionally save load-immediate / subtraction instructions, e.g. the following statement:

    10 - (int)f() >= 20;

now compiles to

    addi a0,a0,-11
    slti a0,a0,-20

instead of

    li   a5,10
    sub  a0,a5,a0
    slti t0,a0,20
    xori a0,t0,1

on 32-bit RISC-V when compiled with -fwrapv. Additional examples can be found in the newly added test file.

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32.

gcc/ChangeLog:

    PR tree-optimization/116024
    * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/pr116024-1-fwrapv.c: New test.
-
- Oct 08, 2024
-
Artemiy Volkov authored
Implement a match.pd transformation inverting the sign of X in C1 - X cmp C2, where C1 and C2 are integer constants and X is of an unsigned type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of the comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
    - C1 - X < C2 means that C1 - X spans the range 0, 1, ..., C2 - 1;
    - This means that X spans the range C1 - (C2 - 1), C1 - (C2 - 2), ..., C1;
    - Subtracting C1 - (C2 - 1), X - (C1 - (C2 - 1)) is one of 0, 1, ..., C1 - (C1 - (C2 - 1));
    - Simplifying the above, X - (C1 - C2 + 1) is one of 0, 1, ..., C2 - 1;
    - Summarizing, the expression C1 - X < C2 can be transformed into X - (C1 - C2 + 1) < C2.

(c) Similarly, if cmp is <=:
    - C1 - X <= C2 means that C1 - X is one of 0, 1, ..., C2;
    - It follows that X is one of C1 - C2, C1 - (C2 - 1), ..., C1;
    - Subtracting C1 - C2, X - (C1 - C2) has range 0, 1, ..., C2;
    - Thus, the expression C1 - X <= C2 can be transformed into X - (C1 - C2) <= C2.

(d) The >= and > cases are negations of (b) and (c), respectively.

This transformation allows us to occasionally save load-immediate / subtraction instructions, e.g. the following statement:

    300 - (unsigned int)f() < 100;

now compiles to

    addi  a0,a0,-201
    sltiu a0,a0,100

instead of

    li    a5,300
    sub   a0,a5,a0
    sltiu a0,a0,100

on 32-bit RISC-V. Additional examples can be found in the newly added test file.

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32.

gcc/ChangeLog:

    PR tree-optimization/116024
    * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/pr116024-1.c: New test.
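A quick self-contained check of case (b) (my own harness, not part of the patch), exhaustively verifying C1 - X < C2 <=> X - (C1 - C2 + 1) < C2 for C1 = 300, C2 = 100 over all 16-bit unsigned values:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
      for (uint32_t i = 0; i <= UINT16_MAX; i++)
        {
          uint16_t x = (uint16_t) i;
          /* Casts keep the arithmetic wrapping at 16 bits. */
          int lhs = (uint16_t)(300 - x) < 100;
          int rhs = (uint16_t)(x - 201) < 100;  /* 201 == C1 - C2 + 1 */
          if (lhs != rhs)
            {
              printf("mismatch at x = %u\n", x);
              return 1;
            }
        }
      puts("ok");
      return 0;
    }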
-
Artemiy Volkov authored
Implement a match.pd pattern for C1 - X cmp C2, where C1 and C2 are integer constants and X is of a UB-on-overflow type. The pattern is simplified to X rcmp C1 - C2 by moving X and C2 to the other side of the comparison (with opposite signs). If C1 - C2 happens to overflow, replace the whole expression with either a constant 0 or a constant 1 node, depending on the comparison operator and the sign of the overflow.

This transformation allows us to occasionally save load-immediate / subtraction instructions, e.g. the following statement:

    10 - (int) x <= 9;

now compiles to

    sgt  a0,a0,zero

instead of

    li   a5,10
    sub  a0,a5,a0
    slti a0,a0,10

on 32-bit RISC-V. Additional examples can be found in the newly added test file.

This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32. Existing tests were adjusted where necessary.

gcc/ChangeLog:

    PR tree-optimization/116024
    * match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/pr116024.c: New test.
    * gcc.dg/pr67089-6.c: Adjust.
-
Tsung Chun Lin authored
RISC-V: Enable builtin __riscv_mul with Zmmul extension.

gcc/ChangeLog:

    * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Enable builtin __riscv_mul with Zmmul extension.
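In practice this means code can feature-test the macro when compiling with a Zmmul-only -march string, not only with M; a small sketch (the fallback path is purely illustrative):

    /* With this change, building with e.g. -march=rv64i_zmmul defines
       __riscv_mul, just as -march=rv64im does. */
    long mul (long a, long b)
    {
    #ifdef __riscv_mul
      return a * b;            /* hardware multiply available */
    #else
      long r = 0;              /* illustrative shift-and-add fallback */
      while (b) {
        if (b & 1) r += a;
        a <<= 1;
        b = (long)((unsigned long) b >> 1);
      }
      return r;
    #endif
    }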
-
Tsung Chun Lin authored
Make the M extension imply Zmmul.

gcc/ChangeLog:

    * common/config/riscv/riscv-common.cc: M implies Zmmul.
-
Yangyu Chen authored
Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V ISA. As a result, certain functions cannot be inlined when specific options are used, such as __attribute__((target("arch=+v"))). This can lead to potential performance issues when building retargetable binaries for RISC-V.

To address this, I have implemented the riscv_can_inline_p function. This addition enables inlining when the callee either has no special options or when some options match, while also ensuring that the callee's ISA is a subset of the caller's. I also check some other options when always_inline is not set.

gcc/ChangeLog:

    * common/config/riscv/riscv-common.cc (cl_opt_var_ref_t): Add cl_opt_var_ref_t pointer to member of cl_target_option.
    (struct riscv_ext_flag_table_t): Add new cl_opt_var_ref_t field.
    (RISCV_EXT_FLAG_ENTRY): New macro to simplify the definition of riscv_ext_flag_table.
    (riscv_ext_is_subset): New function to check if the callee's ISA is a subset of the caller's.
    (riscv_x_target_flags_isa_mask): New function to get the mask of ISA extension in x_target_flags of gcc_options.
    * config/riscv/riscv-subset.h (riscv_ext_is_subset): Declare riscv_ext_is_subset function.
    (riscv_x_target_flags_isa_mask): Declare riscv_x_target_flags_isa_mask function.
    * config/riscv/riscv.cc (riscv_can_inline_p): New function.
    (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
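A sketch of the situation this enables (illustrative only, not from the patch): a target-attributed callee can now be inlined when its ISA requirements are covered by the caller's:

    /* Compile for RISC-V, e.g. riscv64-unknown-linux-gnu-gcc -O2. */
    static inline int __attribute__((target("arch=+v")))
    callee (int x)
    {
      return x + 1;
    }

    /* The caller's target options include V as well, so the callee's
       ISA is a subset of the caller's and riscv_can_inline_p now
       permits inlining instead of forcing an out-of-line call. */
    int __attribute__((target("arch=+v")))
    caller (int x)
    {
      return callee (x) * 2;
    }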
-
Eric Botcazou authored
gcc/testsuite/
    PR ada/116190
    * gnat.dg/aggr31.adb: New test.
-
Eric Botcazou authored
gcc/testsuite/
    PR ada/115535
    * gnat.dg/put_image1.adb: New test.
-
Eric Botcazou authored
gcc/testsuite/
    PR ada/114636
    * gnat.dg/specs/generic_inst1.ads: New test.
-
Pan Li authored
Form 1:

    #define DEF_SAT_S_TRUNC_FMT_1(WT, NT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline))                          \
    sat_s_trunc_##WT##_to_##NT##_fmt_1 (WT x)             \
    {                                                     \
      NT trunc = (NT)x;                                   \
      return (WT)NT_MIN <= x && x <= (WT)NT_MAX           \
        ? trunc                                           \
        : x < 0 ? NT_MIN : NT_MAX;                        \
    }

The below tests are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; will commit it directly if no comments in the next 48H.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_arith_data.h: Add test data for SAT_TRUNC.
    * gcc.target/riscv/sat_s_trunc-1-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-1-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-1-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Pan Li authored
This patch would like to implement the sstrunc for scalar signed integer.

Form 1:

    #define DEF_SAT_S_TRUNC_FMT_1(WT, NT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline))                          \
    sat_s_trunc_##WT##_to_##NT##_fmt_1 (WT x)             \
    {                                                     \
      NT trunc = (NT)x;                                   \
      return (WT)NT_MIN <= x && x <= (WT)NT_MAX           \
        ? trunc                                           \
        : x < 0 ? NT_MIN : NT_MAX;                        \
    }

    DEF_SAT_S_TRUNC_FMT_1(int64_t, int32_t, INT32_MIN, INT32_MAX)

Before this patch:

    10 │ sat_s_trunc_int64_t_to_int32_t_fmt_1:
    11 │     li      a5,1
    12 │     slli    a5,a5,31
    13 │     li      a4,-1
    14 │     add     a5,a0,a5
    15 │     srli    a4,a4,32
    16 │     bgtu    a5,a4,.L2
    17 │     sext.w  a0,a0
    18 │     ret
    19 │ .L2:
    20 │     srai    a5,a0,63
    21 │     li      a0,-2147483648
    22 │     xor     a0,a0,a5
    23 │     not     a0,a0
    24 │     ret

After this patch:

    10 │ sat_s_trunc_int64_t_to_int32_t_fmt_1:
    11 │     li      a5,-2147483648
    12 │     xori    a3,a5,-1
    13 │     slt     a4,a0,a3
    14 │     slt     a5,a5,a0
    15 │     and     a5,a4,a5
    16 │     srai    a4,a0,63
    17 │     xor     a4,a4,a3
    18 │     addi    a3,a5,-1
    19 │     neg     a5,a5
    20 │     and     a4,a4,a3
    21 │     and     a0,a0,a5
    22 │     or      a0,a0,a4
    23 │     sext.w  a0,a0
    24 │     ret

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

    * config/riscv/riscv-protos.h (riscv_expand_sstrunc): Add new func decl to expand SAT_TRUNC.
    * config/riscv/riscv.cc (riscv_expand_sstrunc): Add new func impl to expand SAT_TRUNC.
    * config/riscv/riscv.md (sstrunc<mode><anyi_double_truncated>2): Add new pattern for double truncation.
    (sstrunc<mode><anyi_quad_truncated>2): Ditto but for quad.
    (sstrunc<mode><anyi_oct_truncated>2): Ditto but for oct.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Pan Li authored
When trying to match the saturation related patterns on a PHI node, we may have to try each pattern for all phi nodes of a bb. Aka:

    for each PHI node in bb:
      gphi *phi = xxx;
      try_match_sat_add (, phi);
      try_match_sat_sub (, phi);
      try_match_sat_trunc (, phi);

The PHI node will be removed if one of the above 3 sat patterns is matched. There will be a problem that, for example, sat_add is matched and then the phi is removed (freed), and the next 2 sat_sub and sat_trunc will depend on the removed (freed) phi node.

This patch would like to fix this use-after-free of the released phi node, by ensuring that at most one of the patterns will be matched.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

    * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Rename to...
    (build_saturation_binary_arith_call_and_replace): ...this.
    (build_saturation_binary_arith_call_and_insert): ...this.
    (match_unsigned_saturation_add): Leverage renamed func.
    (match_unsigned_saturation_sub): Ditto.
    (match_saturation_add): Return bool on matched and leverage renamed func.
    (match_saturation_sub): Ditto.
    (match_saturation_trunc): Ditto.
    (math_opts_dom_walker::after_dom_children): Ensure at most one pattern will be matched for each phi node.

Signed-off-by: Pan Li <pan2.li@intel.com>
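A self-contained C model of the bug and the fix (my own illustration; the real code manipulates gphi nodes in tree-ssa-math-opts.cc):

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct phi { int kind; };  /* stand-in for a gphi node */

    /* Each matcher frees the node on success, like the sat matchers
       remove the PHI they replace. */
    static bool match_sat_add (struct phi *p)
    { if (p->kind == 0) { free (p); return true; } return false; }
    static bool match_sat_sub (struct phi *p)
    { if (p->kind == 1) { free (p); return true; } return false; }

    int main (void)
    {
      struct phi *p = malloc (sizeof *p);
      p->kind = 0;

      /* Buggy shape: calling every matcher unconditionally means a
         later call can touch freed memory once an earlier one matched.
         The fix short-circuits so at most one matcher runs to
         completion on each node. */
      if (match_sat_add (p) || match_sat_sub (p))
        puts ("matched; node already released");
      else
        free (p);  /* nothing matched; node still owned here */
      return 0;
    }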
-
Pan Li authored
This patch would like to support the form 1 of the scalar signed integer SAT_TRUNC. Aka below example:

Form 1:

    #define DEF_SAT_S_TRUNC_FMT_1(WT, NT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline))                          \
    sat_s_trunc_##WT##_to_##NT##_fmt_1 (WT x)             \
    {                                                     \
      NT trunc = (NT)x;                                   \
      return (WT)NT_MIN <= x && x <= (WT)NT_MAX           \
        ? trunc                                           \
        : x < 0 ? NT_MIN : NT_MAX;                        \
    }

    DEF_SAT_S_TRUNC_FMT_1(int64_t, int32_t, INT32_MIN, INT32_MAX)

Before this patch:

     4 │ __attribute__((noinline))
     5 │ int32_t sat_s_trunc_int64_t_to_int32_t_fmt_1 (int64_t x)
     6 │ {
     7 │   int32_t trunc;
     8 │   unsigned long x.0_1;
     9 │   unsigned long _2;
    10 │   int32_t _3;
    11 │   _Bool _7;
    12 │   int _8;
    13 │   int _9;
    14 │   int _10;
    15 │
    16 │   ;; basic block 2, loop depth 0
    17 │   ;;  pred:       ENTRY
    18 │   x.0_1 = (unsigned long) x_4(D);
    19 │   _2 = x.0_1 + 2147483648;
    20 │   if (_2 > 4294967295)
    21 │     goto <bb 4>; [50.00%]
    22 │   else
    23 │     goto <bb 3>; [50.00%]
    24 │   ;;  succ:       4
    25 │   ;;              3
    26 │
    27 │   ;; basic block 3, loop depth 0
    28 │   ;;  pred:       2
    29 │   trunc_5 = (int32_t) x_4(D);
    30 │   goto <bb 5>; [100.00%]
    31 │   ;;  succ:       5
    32 │
    33 │   ;; basic block 4, loop depth 0
    34 │   ;;  pred:       2
    35 │   _7 = x_4(D) < 0;
    36 │   _8 = (int) _7;
    37 │   _9 = -_8;
    38 │   _10 = _9 ^ 2147483647;
    39 │   ;;  succ:       5
    40 │
    41 │   ;; basic block 5, loop depth 0
    42 │   ;;  pred:       3
    43 │   ;;              4
    44 │   # _3 = PHI <trunc_5(3), _10(4)>
    45 │   return _3;
    46 │   ;;  succ:       EXIT
    47 │
    48 │ }

After this patch:

     4 │ __attribute__((noinline))
     5 │ int32_t sat_s_trunc_int64_t_to_int32_t_fmt_1 (int64_t x)
     6 │ {
     7 │   int32_t _3;
     8 │
     9 │   ;; basic block 2, loop depth 0
    10 │   ;;  pred:       ENTRY
    11 │   _3 = .SAT_TRUNC (x_4(D)); [tail call]
    12 │   return _3;
    13 │   ;;  succ:       EXIT
    14 │
    15 │ }

The below test suites are passed for this patch.
* The rv64gcv fully regression test, with pr116861-1.c failed.
* The x86 bootstrap test.
* The x86 fully regression test.

The failed pr116861-1.c ICE will be fixed in a subsequent patch, as it just triggers an existing bug.

gcc/ChangeLog:

    * match.pd: Add case 1 matching pattern for signed SAT_TRUNC.
    * tree-ssa-math-opts.cc (gimple_signed_integer_sat_trunc): Add new decl for signed SAT_TRUNC.
    (match_saturation_trunc): Add new func impl to try SAT_TRUNC pattern on phi node.
    (math_opts_dom_walker::after_dom_children): Add match_saturation_trunc for phi node iteration.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Jan Beulich authored
Commit a79d13a0 ("i386: Fix aes/vaes patterns [PR114576]") correctly said "..., but we need to emit {evex} prefix in the assembly if AES ISA is not enabled". Yet it did so only for the TARGET_AES insns. Going from the alternative chosen in the TARGET_VAES insns isn't quite right: if AES is (also) enabled, EVEX encoding would needlessly be forced.

gcc/
    * config/i386/sse.md (vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>, vaesenclast_<mode>): Replace which_alternative check by TARGET_AES one.
-
Soumya AR authored
Currently, we vectorize CTZ for SVE by using the following operation:

    .CTZ (X) = (PREC - 1) - .CLZ (X & -X)

Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in PR109498.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

    PR target/109498
    * config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand CTZ to RBIT + CLZ for SVE.

gcc/testsuite/ChangeLog:

    PR target/109498
    * gcc.target/aarch64/sve/ctz.c: New test.
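The identity being exploited is ctz(x) == clz(bit_reverse(x)) for nonzero x: reversing the bits turns trailing zeros into leading zeros. A quick C spot-check (my own; the portable rbit32 stands in for SVE's RBIT instruction):

    #include <stdint.h>
    #include <stdio.h>

    /* Portable 32-bit bit reversal. */
    static uint32_t rbit32 (uint32_t x)
    {
      x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
      x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
      x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
      x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
      return (x << 16) | (x >> 16);
    }

    int main (void)
    {
      /* ctz/clz are undefined for 0, so start from 1. */
      for (uint32_t x = 1; x < (1u << 20); x++)
        if (__builtin_ctz (x) != __builtin_clz (rbit32 (x)))
          {
            printf ("mismatch at %u\n", x);
            return 1;
          }
      puts ("ok");
      return 0;
    }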
-
Palmer Dabbelt authored
> We have cheap logical ops, so let's just move this back to the default
> to take advantage of the standard branch/op heuristics.
>
> gcc/ChangeLog:
>
>     PR target/116615
>     * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> ---
> There's a bunch more discussion in the bug, but it's starting to smell
> like this was just a holdover from MIPS (where maybe it also shouldn't
> be set).  I haven't tested this, but I figured I'd send the patch to get
> a little more visibility.
>
> I guess we should also kick off something like a SPEC run to make sure
> there's no regressions?

So as I noted earlier, this appears to be a nice win on the BPI. Testsuite fallout is minimal -- just the one SFB-related test tripping at -Os that was also hit by Andrew P's work.

After looking at it more closely, the SFB codegen and the codegen after Andrew's work should be equivalent, assuming two independent ops can dispatch together. The test actually generates sensible code at -Os; it's -Os in combination with -fno-ssa-phiopt that causes problems. I think the best thing to do here is just skip it at -Os. That still keeps a degree of testing for the SFB path.

Tested successfully in my tester, but will wait for the pre-commit tester to render a verdict before moving forward.

PR target/116615

gcc/
    * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.

gcc/testsuite/
    * gcc.target/riscv/cset-sext-sfb.c: Skip for -Os.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
-
Xi Ruoyao authored
An earlier version of the patch (lacking the regeneration of some files) was pushed. Fix it up now.

gcc/ChangeLog:

    * config/loongarch/loongarch.opt: Regenerate.
    * config/loongarch/loongarch.opt.urls: Regenerate.
-
Andre Vehreschild authored
The parser was greedily taking the substring ref as an array ref because an array_spec was present. Fix this by only parsing the coarray (pseudo) ref when no regular array is present.

gcc/fortran/ChangeLog:

    PR fortran/51815
    * array.cc (gfc_match_array_ref): Only parse coarray part of ref.
    * match.h (gfc_match_array_ref): Add flag.
    * primary.cc (gfc_match_varspec): Request only coarray ref parsing when no regular array is present. Report error on unexpected additional ref.

gcc/testsuite/ChangeLog:

    * gfortran.dg/pr102532.f90: Fix dg-errors: Add new error.
    * gfortran.dg/coarray/substring_1.f90: New test.
-
Pan Li authored
Form 4:

    #define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)             \
    T __attribute__((noinline))                              \
    sat_s_sub_##T##_fmt_4 (T x, T y)                         \
    {                                                        \
      T minus;                                               \
      bool overflow = __builtin_sub_overflow (x, y, &minus); \
      return !overflow ? minus : x < 0 ? MIN : MAX;          \
    }

The below tests are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; will commit it directly if no comments in the next 48H.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_sub-4-i16.c: New test.
    * gcc.target/riscv/sat_s_sub-4-i32.c: New test.
    * gcc.target/riscv/sat_s_sub-4-i64.c: New test.
    * gcc.target/riscv/sat_s_sub-4-i8.c: New test.
    * gcc.target/riscv/sat_s_sub-run-4-i16.c: New test.
    * gcc.target/riscv/sat_s_sub-run-4-i32.c: New test.
    * gcc.target/riscv/sat_s_sub-run-4-i64.c: New test.
    * gcc.target/riscv/sat_s_sub-run-4-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Pan Li authored
Form 3:

    #define DEF_SAT_S_SUB_FMT_3(T, UT, MIN, MAX)             \
    T __attribute__((noinline))                              \
    sat_s_sub_##T##_fmt_3 (T x, T y)                         \
    {                                                        \
      T minus;                                               \
      bool overflow = __builtin_sub_overflow (x, y, &minus); \
      return overflow ? x < 0 ? MIN : MAX : minus;           \
    }

The below tests are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; will commit it directly if no comments in the next 48H.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_sub-3-i16.c: New test.
    * gcc.target/riscv/sat_s_sub-3-i32.c: New test.
    * gcc.target/riscv/sat_s_sub-3-i64.c: New test.
    * gcc.target/riscv/sat_s_sub-3-i8.c: New test.
    * gcc.target/riscv/sat_s_sub-run-3-i16.c: New test.
    * gcc.target/riscv/sat_s_sub-run-3-i32.c: New test.
    * gcc.target/riscv/sat_s_sub-run-3-i64.c: New test.
    * gcc.target/riscv/sat_s_sub-run-3-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
-
Pan Li authored
This patch would like to support the form 3 and form 4 of the scalar signed integer SAT_SUB. Aka below example:

Form 3:

    #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)             \
    T __attribute__((noinline))                              \
    sat_s_add_##T##_fmt_3 (T x, T y)                         \
    {                                                        \
      T sum;                                                 \
      bool overflow = __builtin_add_overflow (x, y, &sum);   \
      return overflow ? x < 0 ? MIN : MAX : sum;             \
    }

Form 4:

    #define DEF_SAT_S_SUB_FMT_4(T, UT, MIN, MAX)             \
    T __attribute__((noinline))                              \
    sat_s_sub_##T##_fmt_4 (T x, T y)                         \
    {                                                        \
      T minus;                                               \
      bool overflow = __builtin_sub_overflow (x, y, &minus); \
      return !overflow ? minus : x < 0 ? MIN : MAX;          \
    }

    DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX);

Before this patch:

     4 │ __attribute__((noinline))
     5 │ int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y)
     6 │ {
     7 │   signed char _1;
     8 │   signed char _2;
     9 │   int8_t _3;
    10 │   __complex__ signed char _6;
    11 │   _Bool _8;
    12 │   signed char _9;
    13 │   signed char _10;
    14 │   signed char _11;
    15 │
    16 │   ;; basic block 2, loop depth 0
    17 │   ;;  pred:       ENTRY
    18 │   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
    19 │   _2 = IMAGPART_EXPR <_6>;
    20 │   if (_2 != 0)
    21 │     goto <bb 4>; [50.00%]
    22 │   else
    23 │     goto <bb 3>; [50.00%]
    24 │   ;;  succ:       4
    25 │   ;;              3
    26 │
    27 │   ;; basic block 3, loop depth 0
    28 │   ;;  pred:       2
    29 │   _1 = REALPART_EXPR <_6>;
    30 │   goto <bb 5>; [100.00%]
    31 │   ;;  succ:       5
    32 │
    33 │   ;; basic block 4, loop depth 0
    34 │   ;;  pred:       2
    35 │   _8 = x_4(D) < 0;
    36 │   _9 = (signed char) _8;
    37 │   _10 = -_9;
    38 │   _11 = _10 ^ 127;
    39 │   ;;  succ:       5
    40 │
    41 │   ;; basic block 5, loop depth 0
    42 │   ;;  pred:       3
    43 │   ;;              4
    44 │   # _3 = PHI <_1(3), _11(4)>
    45 │   return _3;
    46 │   ;;  succ:       EXIT
    47 │
    48 │ }

After this patch:

     4 │ __attribute__((noinline))
     5 │ int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y)
     6 │ {
     7 │   int8_t _3;
     8 │
     9 │   ;; basic block 2, loop depth 0
    10 │   ;;  pred:       ENTRY
    11 │   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
    12 │   return _3;
    13 │   ;;  succ:       EXIT
    14 │
    15 │ }

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

    * match.pd: Add case 3 matching pattern for signed SAT_SUB.

Signed-off-by: Pan Li <pan2.li@intel.com>
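For a concrete feel of the semantics being recognized, here is Form 3 instantiated by hand for int8_t with a small driver (my own demo, not a testsuite file):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Saturating signed subtraction: clamp to INT8_MIN/INT8_MAX on overflow. */
    int8_t sat_s_sub_int8 (int8_t x, int8_t y)
    {
      int8_t minus;
      bool overflow = __builtin_sub_overflow (x, y, &minus);
      return overflow ? (x < 0 ? INT8_MIN : INT8_MAX) : minus;
    }

    int main (void)
    {
      printf ("%d\n", sat_s_sub_int8 (-100, 100)); /* -200 clamps to -128 */
      printf ("%d\n", sat_s_sub_int8 (100, -100)); /*  200 clamps to  127 */
      printf ("%d\n", sat_s_sub_int8 (5, 3));      /* no overflow: 2 */
      return 0;
    }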
-
Jakub Jelinek authored
On Mon, Oct 07, 2024 at 10:32:57AM +0200, Richard Biener wrote:
> > They are implementation defined, -1, 0, 1, 2 is defined by libstdc++:
> > using type = signed char;
> > enum class _Ord : type { equivalent = 0, less = -1, greater = 1 };
> > enum class _Ncmp : type { _Unordered = 2 };
> > https://eel.is/c++draft/cmp#categories.pre-1 documents them as
> > enum class ord { equal = 0, equivalent = equal, less = -1, greater = 1 }; // exposition only
> > enum class ncmp { unordered = -127 }; // exposition only
> > and now looking at it, LLVM's libc++ takes that literally and uses
> > -1, 0, 1, -127.  One can't use <=> operator without including <compare>
> > which provides the enums, so I think if all we care about is libstdc++,
> > then just hardcoding -1, 0, 1, 2 is fine, if we want to also optimize
> > libc++ when used with gcc, we could support -1, 0, 1, -127 as another
> > option.
> > Supporting arbitrary 4 values doesn't make sense, at least on x86 the
> > only reason to do the conversion to int in an optab is a good sequence
> > to turn the flag comparisons to -1, 0, 1.  So, either we do nothing
> > more than the patch, or add handle both 2 and -127 for unordered,
> > or add support for arbitrary value for the unordered case except
> > -1, 0, 1 (then -1 could mean signed int, 1 unsigned int, 0 do the jumps
> > and any other value what should be returned for unordered.

Here is an incremental patch which adds support for (almost) arbitrary unordered constant values. It changes the .SPACESHIP and spaceship<mode>4 optab conventions, so that a last argument of 0 means use branches (floating point; -1, 0, 1, 2 results consumed by the comparisons emitted by tree-ssa-math-opts.cc), -1 means signed int comparisons (-1, 0, 1 results), 1 means unsigned int comparisons (-1, 0, 1 results), and constants other than -1, 0, 1 which fit into [-128, 127] converted to the PHI type are otherwise specified as the last argument (then it is -1, 0, 1, C results).

2024-10-08  Jakub Jelinek  <jakub@redhat.com>

    PR middle-end/116896
    * tree-ssa-math-opts.cc (optimize_spaceship): Handle unordered values other than 2, but they still need to be signed char range possibly converted to the PHI type and can't be in [-1, 1] range. Use last .SPACESHIP argument of 1 for unsigned int comparisons, -1 for signed int, 0 for floating point branches and any other for floating point with that value as unordered.
    * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Use op2 rather than const2_rtx if op2 is not const0_rtx for unordered result.
    (ix86_expand_int_spaceship): Change INTVAL (op2) == 1 tests to INTVAL (op2) != -1.
    * doc/md.texi (spaceship@var{m}4): Document the above changes.
    * gcc.target/i386/pr116896.c: New test.
Eric Botcazou authored
This removes the loop trying to find a pointer mode among the integer modes, which is obsolete and does not work on platforms where pointers have unusual size like MSP430 or special semantics like Morello.

gcc/ada/ChangeLog:

    PR ada/116498
    * gcc-interface/decl.cc (validate_size): Use the size of the default pointer mode as the minimum size for access types and fat pointers.
-
Eric Botcazou authored
It is very confusing for the user because it does not make any reference to the source code but only to details of the underlying implementation.

gcc/ada/ChangeLog:

    * gcc-interface/trans.cc (Raise_Error_to_gnu) <CE_Invalid_Data>: Do not generate the range information if the value is a call to a Rep_To_Pos function.
-
Olivier Hainque authored
The initial signal handling code introduced for aarch64-android overlooked details of the tasking runtime, not in the initial testing perimeter.

Specifically, a reference to __gnat_sigtramp from __gnat_error_handler, initially introduced for the arm port, was prevented if !arm on the grounds that other ports would rely on kernel CFI. aarch64-android does provide kernel CFI and __gnat_sigtramp was not provided for this configuration. But there is a similar reference from s-intman__android, which kicks in as soon as the tasking runtime gets activated, triggering link failures.

Testing for more precise target specific parameters from Ada code is inconvenient and replicating the logic is not attractive in any case, so this change addresses the problem in the following fashion:

- Always provide a __gnat_sigtramp entry point, common to the tasking and non-tasking signal handling code for all the Android configurations,
- There (C code), from target definition macros, select a path that either routes directly to the actual signal handler or goes through the intermediate layer providing hand crafted CFI information which allows unwinding up to the interrupted code,
- Similarly to what was done for VxWorks, move the arm specific definitions to a separate header file to make the general structure of the common C code easier to grasp,
- Adjust the comments in the common sigtramp.h header to account for such an organisation possibility.

gcc/ada/ChangeLog:

    * sigtramp-armdroid.c: Refactor into ...
    * sigtramp-android.c, sigtramp-android-asm.h: New files.
    * Makefile.rtl (arm/aarch64-android section): Add sigtramp-android.o to EXTRA_LIBGNAT_OBJS unconditionally. Add sigtramp.h and sigtramp-android-asm.h to EXTRA_LIBGNAT_SRCS.
    * init.c (android section, __gnat_error_handler): Defer to __gnat_sigtramp unconditionally again.
    * sigtramp.h: Adjust comments to allow neutral signal handling relays, merely forwarding to the underlying handler without any intermediate CFI magic.
-