- Nov 22, 2024
-
-
Lulu Cheng authored
TARGET_ASM_ALIGNED_{HI,SI,QI}_OP are defined repeatedly and deleted. gcc/ChangeLog: * config/loongarch/loongarch-builtins.cc (loongarch_builtin_vectorized_function): Delete. (LARCH_GET_BUILTIN): Delete. * config/loongarch/loongarch-protos.h (loongarch_builtin_vectorized_function): Delete. * config/loongarch/loongarch.cc (TARGET_ASM_ALIGNED_HI_OP): Delete. (TARGET_ASM_ALIGNED_SI_OP): Delete. (TARGET_ASM_ALIGNED_DI_OP): Delete.
-
Haochen Jiang authored
Under -fno-omit-frame-pointer, %ebp will be used, which is the Solaris/x86 default. Both check %ebp and %esp to avoid error on that. gcc/testsuite/ChangeLog: PR target/117697 * gcc.target/i386/avx10_2-vmovd-1.c: Both check %esp and %ebp. * gcc.target/i386/avx10_2-vmovw-1.c: Ditto.
-
Lulu Cheng authored
[x]vldi.{b/h/w/d} is not implemented in LoongArch. Use the macro [x]vrepli.{b/h/w/d} to replace. gcc/ChangeLog: * config/loongarch/lasx.md: Fixed. * config/loongarch/lsx.md: Fixed.
-
Xi Ruoyao authored
LoongArch: Make __builtin_lsx_vorn_v and __builtin_lasx_xvorn_v arguments and return values unsigned Align them with other vector bitwise builtins. This may break programs directly invoking __builtin_lsx_vorn_v or __builtin_lasx_xvorn_v, but doing so is not supported (as builtins are not documented, only intrinsics are documented and users should use them instead). gcc/ChangeLog: * config/loongarch/loongarch-builtins.cc (vorn_v, xvorn_v): Use unsigned vector modes. * config/loongarch/lsxintrin.h (__lsx_vorn_v): Cast arguments to v16u8. * config/loongarch/lasxintrin.h (__lasx_xvorn_v): Cast arguments to v32u8. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lsx/lsx-builtin.c (__lsx_vorn_v): Change arguments and return value to v16u8. * gcc.target/loongarch/vector/lasx/lasx-builtin.c (__lasx_xvorn_v): Change arguments and return value to v32u8.
-
GCC Administrator authored
-
- Nov 21, 2024
-
-
Jeff Law authored
As hinted out in the BZ, we were missing a left shift in the constant synthesis in the case where the upper 32 bits can be synthesized using a shNadd of the low 32 bits. This adjusts the synthesis to add the missing left shift and adjusts the cost to account for the additional instruction. Regression tested on riscv64-elf in my tester. Waiting for the pre-commit tester before moving forward. PR target/117690 gcc/ * config/riscv/riscv.cc (riscv_build_integer): Add missing left shift when using shNadd to derive upper 32 bits from lower 32 bits. gcc/testsuite * gcc.target/riscv/pr117690.c: New test. * gcc.target/riscv/synthesis-13.c: Adjust expected output.
-
Arsen Arsenović authored
While hacking on an unrelated change, I noticed that __has_include_next hasn't been documented at all. This patch adds it to the __has_include manual node. gcc/ChangeLog: * doc/cpp.texi (__has_include): Document __has_include_next also. (Conditional Syntax): Mention __has_include_next in the description for the __has_include menu entry.
-
Joseph Myers authored
Cases of void parameters, other than a parameter list of (void) (or equivalent with a typedef for void) in its entirety, have been made a constraint violation in C2Y (N3344 alternative 1 was adopted), as part of a series of changes to eliminate unnecessary undefined behavior by turning it into constraint violations, implementation-defined behavior or something else with stricter bounds on what behavior is allowed. Previously, these were implicitly undefined behavior (see DR#295), with only some cases listed in Annex J as undefined (but even those cases not having wording in the normative text to make them explicitly undefined). As discussed in bug 114816, GCC is not entirely consistent about diagnosing such usages; unnamed void parameters get errors when not the entire parameter list, while qualified and register void (the cases listed in Annex J) get errors as a single unnamed parameter, but named void parameters are accepted with a warning (in a declaration that's not a definition; it's not possible to define a function with incomplete parameter types). Following C2Y, make all these cases into errors. The errors are not conditional on the standard version, given that this was previously implicit undefined behavior. Since it wasn't possible anyway to define such functions, only declare them without defining them (or otherwise use such parameters in function type names that can't correspond to any defined function), hopefully the risks of compatibility issues are small. Bootstrapped with no regressions for x86-64-pc-linux-gnu. PR c/114816 gcc/c/ * c-decl.cc (grokparms): Do not warn for void parameter type here. (get_parm_info): Give errors for void parameters even when named. gcc/testsuite/ * gcc.dg/c2y-void-parm-1.c: New test. * gcc.dg/noncompile/920616-2.c, gcc.dg/noncompile/921116-1.c, gcc.dg/parm-incomplete-1.c: Update expected diagnostics.
-
David Malcolm authored
gcc/ChangeLog: PR bootstrap/117677 * json-parsing.cc (selftest::test_parse_number): Replace ASSERT_EQ of 'double' values with ASSERT_NEAR. Eliminate ASSERT_PRINT_EQ for such values. * selftest.h (ASSERT_NEAR): New. (ASSERT_NEAR_AT): New. Signed-off-by:
David Malcolm <dmalcolm@redhat.com>
-
David Malcolm authored
I wrote this support file to help me debug Tcl issues in the testsuite. Adding a call to: print_stack_backtrace somewhere in a .exp file (along with "load_lib print-stack.exp") leads to the interpreter printing a backtrace in a form that e.g. Emacs can consume, with filename:linenum: lines, and quoting the line of .exp source code. Fer example, adding a print_stack_backtrace to scansarif.exp in run-sarif-pytest I get this output: VVV START OF BACKTRACE VVV /home/david/coding/gcc-newgit/src/gcc/testsuite/lib/scansarif.exp:142: frame 16 in proc print_stack_backtrace 142 | print_stack_backtrace <proc>: frame 15 in proc run-sarif-pytest <eval>: frame 14 in proc dg-final-proc /usr/share/dejagnu/dg.exp:851: frame 13 in proc dg-final-proc 851 | if {[catch "dg-final-proc $prog" errmsg]} { <eval>: frame 12 in proc saved-dg-test /home/david/coding/gcc-newgit/src/gcc/testsuite/lib/gcc-dg.exp:1080: frame 11 in proc saved-dg-test 1080 | if { [ catch { eval saved-dg-test $args } errmsg ] } { /usr/share/dejagnu/dg.exp:559: frame 10 in proc dg-test 559 | dg-test $testcase $options ${default-extra-options} /home/david/coding/gcc-newgit/src/gcc/testsuite/gcc.dg/sarif-output/sarif-output.exp:28: frame 9 28 | dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" "" <eval>: frame 8 <eval>: frame 7 /usr/share/dejagnu/runtest.exp:1460: frame 6 1460 | if { [catch "uplevel #0 source $test_file_name"] == 1 } { /usr/share/dejagnu/runtest.exp:1886: frame 5 in proc dg-runtest 1886 | runtest $test_name /usr/share/dejagnu/runtest.exp:1845: frame 4 in proc dg-runtest 1845 | foreach test_name [lsort [find ${dir} *.exp]] { /usr/share/dejagnu/runtest.exp:1788: frame 3 in proc dg-runtest 1788 | foreach dir "${test_top_dirs}" { /usr/share/dejagnu/runtest.exp:1669: frame 2 in proc dg-runtest 1669 | foreach pass $multipass { /usr/share/dejagnu/runtest.exp:1619: frame 1 in proc dg-runtest 1619 | foreach current_target $target_list { ^^^ END OF BACKTRACE ^^^ and can click on the lines in Emacs's compilation buffer to take me to the relevant places. I found this made it *much* easier to debug my .exp files. That said, I'm uncomfortable with Tcl, and so (a) there may be a better way of doing this (b) I may have made mistakes gcc/testsuite/ChangeLog: * lib/print-stack.exp: New file. Signed-off-by:
David Malcolm <dmalcolm@redhat.com>
-
Christoph Müllner authored
Recently added test cases assume optimized code generation for certain vectorized code. However, this optimization might not be applied if the backends don't support the optimized permuation. The tests are confirmed to work on aarch64 and x86-64, so this patch restricts the tests accordingly. Tested on x86-64. PR117728 gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/satd-hadamard.c: Restrict to aarch64 and x86-64. * gcc.dg/tree-ssa/vector-8.c: Likewise. * gcc.dg/tree-ssa/vector-9.c: Likewise. Signed-off-by:
Christoph Müllner <christoph.muellner@vrull.eu>
-
Jason Merrill authored
We weren't writing out the definition of an inline variable, so the importer either got an undefined symbol or 0. gcc/cp/ChangeLog: * module.cc (has_definition): Also true for inline vars. gcc/testsuite/ChangeLog: * g++.dg/modules/inline-1_a.C: New test. * g++.dg/modules/inline-1_b.C: New test.
-
Jason Merrill authored
21_strings/basic_string/operations/contains/nonnull.cc was failing because the module was built with debug markers and the testcase was built not expecting debug markers, so we crashed in lower_stmt. Let's accommodate this by discarding debug marker statements we don't want. gcc/cp/ChangeLog: * module.cc (trees_in::core_vals) [STATEMENT_LIST]: Skip DEBUG_BEGIN_STMT if !MAY_HAVE_DEBUG_MARKER_STMTS.
-
Jason Merrill authored
In 20_util/function_objects/mem_fn/constexpr.cc we start to instantiate _Mem_fn_base's friend declaration of _Bind_check_arity before we've loaded the namespace-scope declaration, so lookup_imported_hidden_friend doesn't find it. But then we load the namespace-scope declaration in lookup_template_class during substitution, and so when we get around to pushing the result of substitution, they conflict. Fixed by calling lazy_load_pendings in lookup_imported_hidden_friend. gcc/cp/ChangeLog: * name-lookup.cc (lookup_imported_hidden_friend): Call lazy_load_pendings.
-
Georg-Johann Lay authored
This patch improves the 4-byte ASHIFT insns. 1) It adds a "r,r,C15" alternative for improved long << 15. 2) It adds 3-operand alternatives (depending on options) and splits them after peephole2 / before avr-fuse-move into a 3-operand byte shift and a 2-operand residual bit shift. For better control, it introduces new option -msplit-bit-shift that's activated at -O2 and higher per default. 2) is even performed with -Os, but not with -Oz. PR target/117726 gcc/ * config/avr/avr.opt (-msplit-bit-shift): Add new optimization option. * common/config/avr/avr-common.cc (avr_option_optimization_table) [OPT_LEVELS_2_PLUS]: Turn on -msplit-bit-shift. * config/avr/avr.h (machine_function.n_avr_fuse_add_executed): New bool component. * config/avr/avr.md (attr "isa") <2op, 3op>: Add new values. (attr "enabled"): Handle them. (ashlsi3, *ashlsi3, *ashlsi3_const): Add "r,r,C15" alternative. Add "r,0,C4l" and "r,r,C4l" alternatives (depending on 2op / 3op). (define_split) [avr_split_bit_shift]: Add 2 new ashift:ALL4 splitters. (define_peephole2) [ashift:ALL4]: Add (match_dup 3) so that the scratch won't overlap with the output operand of the matched insn. (*ashl<mode>3_const_split): Remove unused ashift:ALL4 splitter. * config/avr/avr-passes.cc (emit_valid_insn) (emit_valid_move_clobbercc): Move out of anonymous namespace. (make_avr_pass_fuse_add) <gate>: Don't override. <execute>: Set n_avr_fuse_add_executed according to func->machine->n_avr_fuse_add_executed. (pass_data avr_pass_data_split_after_peephole2): New object. (avr_pass_split_after_peephole2): New rtl_opt_pass. (avr_emit_shift): New static function. (avr_shift_is_3op, avr_split_shift_p, avr_split_shift) (make_avr_pass_split_after_peephole2): New functions. * config/avr/avr-passes.def (avr_pass_split_after_peephole2): Insert new pass after pass_peephole2. * config/avr/avr-protos.h (n_avr_fuse_add_executed, avr_shift_is_3op, avr_split_shift_p) (avr_split_shift, avr_optimize_size_level) (make_avr_pass_split_after_peephole2): New prototypes. * config/avr/avr.cc (n_avr_fuse_add_executed): New global variable. (avr_optimize_size_level): New function. (avr_set_current_function): Set n_avr_fuse_add_executed according to cfun->machine->n_avr_fuse_add_executed. (ashlsi3_out) [case 15]: Output optimized code for this offset. (avr_rtx_costs_1) [ASHIFT, SImode]: Adjust costs of oggsets 15, 16. * config/avr/constraints.md (C4a, C4r, C4r): New constraints. * pass_manager.h (pass_manager): Adjust comments.
-
Georg-Johann Lay authored
gcc/ * config/avr/avr-passes.cc (absint_t::dump): Fix missing newline in dump.
-
Jeff Law authored
So much like my patch from last week, this removes alternatives that create multiple instructions that we really should have never needed. In this case it fixes one of two bugs in pr116590. In particular we don't want vmvNr instructions for thead-vector. Those instructions were emitted as part of those two instruction sequences. I've tested this in my tester and assuming the pre-commit tester is happy, I'll push it to the trunk. PR target/116590 gcc * config/riscv/vector.md (pred_mul_<optab>mode_undef): Drop unnecessary alternatives. (pred_<madd_msub><mode>): Likewise. (pred_<macc_msac><mode>): Likewise. (pred_<madd_msub><mode>_scalar): Likewise. (pred_<macc_msac><mode>_scalar): Likewise. (pred_mul_neg_<optab><mode>_undef): Likewise. (pred_<nmsub_nmadd><mode>): Likewise. (pred_<nmsac_nmacc><mode>): Likewise. (pred_<nmsub_nmadd><mode>_scalar): Likewise. (pred_<nmsac_nmacc><mode>_scalar): Likewise. gcc/testsuite * gcc.target/riscv/pr116590.c: New test.
-
Pan Li authored
This patch would like to refactor the unsigned SAT_ADD pattern by: * Extract type check outside. * Extract common sub pattern. * Re-arrange the related match pattern forms together. * Remove unnecessary helper pattern matches. The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. gcc/ChangeLog: * match.pd: Refactor sorts of unsigned SAT_ADD match pattern. Signed-off-by:
Pan Li <pan2.li@intel.com> Signed-off-by:
Pan Li <pan2.li@intel.com>
-
Tamar Christina authored
With the support to SLP only we now pass the VMAT through the SLP node, however the majority of the costing calls inside vectorizable_load and vectorizable_store do no pass the SLP node along. Due to this the backend costing never sees the VMAT for these cases anymore. Additionally the helper around record_stmt_cost when both SLP and stmt_vinfo are passed would only pass the SLP node along. However the SLP node doesn't contain all the info available in the stmt_vinfo and we'd have to go through the SLP_TREE_REPRESENTATIVE anyway. As such I changed the function to just Always pass both along. Unlike the VMAT changes, I don't believe there to be a correctness issue here but would minimize the number of churn in the backend costing until vectorizer costing as a whole is revisited in GCC 16. These changes re-enable the cost model on AArch64 and also correctly find the VMATs on loads and stores fixing testcases such as sve_iters_low_2.c. gcc/ChangeLog: * tree-vect-data-refs.cc (vect_get_data_access_cost): Pass NULL for SLP node. * tree-vect-stmts.cc (record_stmt_cost): Expose. (vect_get_store_cost, vect_get_load_cost): Extend with SLP node. (vectorizable_store, vectorizable_load): Pass SLP node to all costing. * tree-vectorizer.h (record_stmt_cost): Always pass both SLP node and stmt_vinfo to costing. (vect_get_load_cost, vect_get_store_cost): Extend with SLP node.
-
Rainer Orth authored
Solaris has modified versions of ASM_DECLARE_OBJECT_NAME on both i386 and sparc. When commit ce597aed Author: Ilya Enkovich <ilya.enkovich@intel.com> Date: Thu Aug 7 08:04:55 2014 +0000 elfos.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size. was applied, those were missed. At the same time, the testcase was restricted to Linux though there's nothing Linux-specific in there, so the error remained undetected. This patch fixes the definitions to match elfos.h and enables the test on Solaris, too. Bootstrapped without regressions on i386-pc-solaris2.11 and sparc-sun-solaris2.11. 2024-11-19 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc/testsuite: PR target/102296 * gcc.target/i386/struct-size.c: Enable on *-*-solaris*. gcc: PR target/102296 * config/i386/sol2.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size. * config/sparc/sol2.h (ASM_DECLARE_OBJECT_NAME): Likewise.
-
Christoph Müllner authored
This extends forwprop by yet another VEC_PERM optimization: It attempts to blend two isomorphic vector sequences by using the redundancy in the lane utilization in these sequences. This redundancy in lane utilization comes from the way how specific scalar statements end up vectorized: two VEC_PERMs on top, binary operations on both of them, and a final VEC_PERM to create the result. Here is an example of this sequence: v_in = {e0, e1, e2, e3} v_1 = VEC_PERM <v_in, v_in, {0, 2, 0, 2}> // v_1 = {e0, e2, e0, e2} v_2 = VEC_PERM <v_in, v_in, {1, 3, 1, 3}> // v_2 = {e1, e3, e1, e3} v_x = v_1 + v_2 // v_x = {e0+e1, e2+e3, e0+e1, e2+e3} v_y = v_1 - v_2 // v_y = {e0-e1, e2-e3, e0-e1, e2-e3} v_out = VEC_PERM <v_x, v_y, {0, 1, 6, 7}> // v_out = {e0+e1, e2+e3, e0-e1, e2-e3} To remove the redundancy, lanes 2 and 3 can be freed, which allows to change the last statement into: v_out' = VEC_PERM <v_x, v_y, {0, 1, 4, 5}> // v_out' = {e0+e1, e2+e3, e0-e1, e2-e3} The cost of eliminating the redundancy in the lane utilization is that lowering the VEC PERM expression could get more expensive because of tighter packing of the lanes. Therefore this optimization is not done alone, but in only in case we identify two such sequences that can be blended. Once all candidate sequences have been identified, we try to blend them, so that we can use the freed lanes for the second sequence. On success we convert 2x (2x BINOP + 1x VEC_PERM) to 2x VEC_PERM + 2x BINOP + 2x VEC_PERM traded for 4x VEC_PERM + 2x BINOP. The implemented transformation reuses (rewrites) the statements of the first sequence and the last VEC_PERM of the second sequence. The remaining four statements of the second statment are left untouched and will be eliminated by DCE later. This targets x264_pixel_satd_8x4, which calculates the sum of absolute transformed differences (SATD) using Hadamard transformation. We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7% speedup on an AArch64 machine. Bootstrapped and reg-tested on x86-64 and AArch64 (all languages). gcc/ChangeLog: * tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data structure to store analysis results of a vec perm simplify sequence. (get_vect_selector_index_map): Helper to get an index map from the provided vector permute selector. (recognise_vec_perm_simplify_seq): Helper to recognise a vec perm simplify sequence. (narrow_vec_perm_simplify_seq): Helper to pack the lanes more tight. (can_blend_vec_perm_simplify_seqs_p): Test if two vec perm sequences can be blended. (calc_perm_vec_perm_simplify_seqs): Helper to calculate the new permutation indices. (blend_vec_perm_simplify_seqs): Helper to blend two vec perm simplify sequences. (process_vec_perm_simplify_seq_list): Helper to process a list of vec perm simplify sequences. (append_vec_perm_simplify_seq_list): Helper to add a vec perm simplify sequence to the list. (pass_forwprop::execute): Integrate new functionality. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/satd-hadamard.c: New test. * gcc.dg/tree-ssa/vector-10.c: New test. * gcc.dg/tree-ssa/vector-8.c: New test. * gcc.dg/tree-ssa/vector-9.c: New test. * gcc.target/aarch64/sve/satd-hadamard.c: New test. Signed-off-by:
Christoph Müllner <christoph.muellner@vrull.eu>
-
H.J. Lu authored
Since GCC 15 defaults to -std=gnu23, add -std=gnu17 to apx-ndd-tls-1[ab].c to avoid: gcc.target/i386/apx-ndd-tls-1a.c: In function ‘k’: gcc.target/i386/apx-ndd-tls-1a.c:29:7: error: too many arguments to function ‘l’ gcc.target/i386/apx-ndd-tls-1a.c:25:5: note: declared here * gcc.target/i386/apx-ndd-tls-1a.c: -std=gnu17. * gcc.target/i386/apx-ndd-tls-1b.c: Likewise. Signed-off-by:
H.J. Lu <hjl.tools@gmail.com>
-
Rainer Orth authored
Since the switch to a C23 default, three libgomp tests FAIL on Solaris: FAIL: libgomp.c/alloc-pinned-3.c (test for excess errors) UNRESOLVED: libgomp.c/alloc-pinned-3.c compilation failed to produce executable FAIL: libgomp.c/alloc-pinned-4.c (test for excess errors) UNRESOLVED: libgomp.c/alloc-pinned-4.c compilation failed to produce executable FAIL: libgomp.c/alloc-pinned-6.c (test for excess errors) UNRESOLVED: libgomp.c/alloc-pinned-6.c compilation failed to produce executable Excess errors: /vol/gcc/src/hg/master/local/libgomp/testsuite/libgomp.c/alloc-pinned-3.c:104:3: error: too many arguments to function 'set_pin_limit' Fixed by adding the missing size argument to the stub functions. Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11. 2024-11-20 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> libgomp: * testsuite/libgomp.c/alloc-pinned-3.c [!__linux__] (set_pin_limit): Add size arg. * testsuite/libgomp.c/alloc-pinned-4.c [!__linux__] (set_pin_limit): Likewise. * testsuite/libgomp.c/alloc-pinned-6.c [!__linux__] (set_pin_limit): Likewise.
-
Jakub Jelinek authored
DWARF changed the language code assignment to be on a web page and after DWARF 5 has been published already 27 codes have been assigned. We have some of those already in the header, but most of them were missing, including one added just yesterday (DW_LANG_C23). Note, this is really post-DWARF 5 stuff rather than DWARF 6, because DWARF 6 plans to switch from DW_AT_language to DW_AT_language_{name,version} pair where we'll say DW_LNAME_C with 202311 version instead of this. 2024-11-21 Jakub Jelinek <jakub@redhat.com> * dwarf2.h (enum dwarf_source_language): Add comment where the post DWARF 5 additions start. Refresh list from https://dwarfstd.org/languages.html.
-
Richard Biener authored
While vectorizable_store was already checking alignment requirement of the stores and fall back to elementwise accesses if not honored the vectorizable_load path wasn't doing this. After the previous change to disregard alignment checking for VMAT_STRIDED_SLP in get_group_load_store_type this now tripped on power. PR tree-optimization/117720 * tree-vect-stmts.cc (vectorizable_load): For VMAT_STRIDED_SLP verify the choosen load type is OK with regard to alignment.
-
Jakub Jelinek authored
As C23 has been published already https://www.iso.org/standard/82075.html we don't need to say that it is expected to be published etc. Furthermore, standards.texi was still documenting that -std=gnu17 is the default. 2024-11-21 Jakub Jelinek <jakub@redhat.com> gcc/ * doc/invoke.texi (-std=c23): Adjust documentation for publication of the ISO/IEC 9899:2024 standard. * doc/standards.texi: Likewise. Document -std=gnu17 and -std=gnu23 options. Mention that -std=gnu23 rather than -std=gnu17 is now the default for C. gcc/c-family/ * c.opt (std=c23, std=gnu23, std=iso9899:2024): Adjust description for publication of the ISO/IEC 9899:2024 standard.
-
Jakub Jelinek authored
The following patch optimizes spaceship followed by comparisons of the spaceship value even for floating point spaceship when NaNs can appear. operator<=> for this emits roughly signed char c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; else c = 2; and I believe the /* The optimization may be unsafe due to NaNs. */ comment just isn't true. Sure, the i == j comparison doesn't raise exceptions on qNaNs, but if one of the operands is qNaN, then i == j is false and i < j or i > j is then executed and raises exceptions even on qNaNs. And we can safely optimize say c == -1 comparison after the above into i < j, that also raises exceptions like before and handles NaNs the same way as the original. The only unsafe transormation would be c == 0 or c != 0, turning it into i == j or i != j wouldn't raise exception, so I'm not doing that optimization (but other parts of the compiler optimize the i < j comparison away anyway). Anyway, to match the HONOR_NANS case, we need to verify that the second comparison has true edge to the phi_bb (yielding there -1 or 1), it can't be the false edge because when NaNs are honored, the false edge is for both the case where the inverted comparison is true or when one of the operands is NaN. Similarly we need to ensure that the two non-equality comparisons are the opposite, while for -ffast-math we can in some cases get one comparison x >= 5.0 and the other x > 5.0 and it is fine, because NaN is UB, when NaNs are honored, they must be different to leave the unordered case with 2 value as the last one remaining. The patch also punts if HONOR_NANS and the phi has just 3 arguments instead of 4. When NaNs are honored, we also in some cases need to perform some comparison and then invert its result (so that exceptions are properly thrown and we get the correct result). 2024-11-21 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/94589 PR tree-optimization/117612 * tree-ssa-phiopt.cc (spaceship_replacement): Handle HONOR_NANS (TREE_TYPE (lhs1)) case when possible. * gcc.dg/pr94589-5.c: New test. * gcc.dg/pr94589-6.c: New test. * g++.dg/opt/pr94589-5.C: New test. * g++.dg/opt/pr94589-6.C: New test.
-
Jakub Jelinek authored
When working on the PR117612 fix, I've noticed a pasto in tree-ssa-phiopt.cc (spaceship_replacement). The code is if (absu_hwi (tree_to_shwi (arg2)) != 1) return false; if (e1->flags & EDGE_TRUE_VALUE) { if (tree_to_shwi (arg0) != 2 || absu_hwi (tree_to_shwi (arg1)) != 1 || wi::to_widest (arg1) == wi::to_widest (arg2)) return false; } else if (tree_to_shwi (arg1) != 2 || absu_hwi (tree_to_shwi (arg0)) != 1 || wi::to_widest (arg0) == wi::to_widest (arg1)) return false; where arg{0,1,2,3} are PHI args and wants to ensure that if e1 is a true edge, then arg0 is 2 and one of arg{1,2} is -1 and one is 1, otherwise arg1 is 2 and one of arg{0,2} is -1 and one is 1. But due to pasto in the latte case doesn't verify that arg0 is different from arg2, it could be both -1 or both 1 and we wouldn't punt. The wi::to_widest (arg0) == wi::to_widest (arg1) test is always false when we've made sure in the earlier conditions that arg1 is 2 and arg0 is -1 or 1, so never 2. 2024-11-21 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/94589 PR tree-optimization/117612 * tree-ssa-phiopt.cc (spaceship_replacement): Fix up a pasto in check when arg1 is 2.
-
Jakub Jelinek authored
The following patch adds u{,l,ll,imax}abs builtins, which just fold to ABSU_EXPR, similarly to how {,l,ll,imax}abs builtins fold to ABS_EXPR. 2024-11-21 Jakub Jelinek <jakub@redhat.com> PR c/117024 gcc/ * coretypes.h (enum function_class): Add function_c2y_misc enumerator. * builtin-types.def (BT_FN_UINTMAX_INTMAX, BT_FN_ULONG_LONG, BT_FN_ULONGLONG_LONGLONG): New DEF_FUNCTION_TYPE_1s. * builtins.def (DEF_C2Y_BUILTIN): Define. (BUILT_IN_UABS, BUILT_IN_UIMAXABS, BUILT_IN_ULABS, BUILT_IN_ULLABS): New builtins. * builtins.cc (fold_builtin_abs): Handle also folding of u*abs to ABSU_EXPR. (fold_builtin_1): Handle BUILT_IN_U{,L,LL,IMAX}ABS. gcc/lto/ChangeLog: * lto-lang.cc (flag_isoc2y): New variable. gcc/ada/ChangeLog: * gcc-interface/utils.cc (flag_isoc2y): New variable. gcc/testsuite/ * gcc.c-torture/execute/builtins/lib/abs.c (uintmax_t): New typedef. (uabs, ulabs, ullabs, uimaxabs): New functions. * gcc.c-torture/execute/builtins/uabs-1.c: New test. * gcc.c-torture/execute/builtins/uabs-1.x: New file. * gcc.c-torture/execute/builtins/uabs-1-lib.c: New file. * gcc.c-torture/execute/builtins/uabs-2.c: New test. * gcc.c-torture/execute/builtins/uabs-2.x: New file. * gcc.c-torture/execute/builtins/uabs-2-lib.c: New file. * gcc.c-torture/execute/builtins/uabs-3.c: New test. * gcc.c-torture/execute/builtins/uabs-3.x: New test. * gcc.c-torture/execute/builtins/uabs-3-lib.c: New test.
-
Kewen Lin authored
As the associated test case shows, signbit generated assembly is sub-optimal for _Float128 argument from memory on P8 LE. On P8 LE, p8swap pass puts an explicit AND -16 on the memory, which causes mode_dependent_address_p considers it's invalid to change its mode and combine fails to make use of the existing pattern signbit<SIGNBIT:mode>2_dm_mem. Considering it's always more efficient to make use of 8 bytes load and shift on P8 LE, this patch is to adjust the current expander and treat it specially. PR target/114567 gcc/ChangeLog: * config/rs6000/rs6000.md (expander signbit<FLOAT128:mode>2): Adjust. (*signbit<mode>2_dm_mem): Rename to ... (signbit<mode>2_dm_mem): ... this. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr114567.c: New test.
-
Kewen Lin authored
This patch is to adjust define_insn altivec_v{add,sub}uqm with standard names, as the associated test case shows, w/o this patch, it ends up with scalar {add,subf}c/{add,subf}e, the standard names help to exploit v{add,sub}uqm. gcc/ChangeLog: * config/rs6000/altivec.md (altivec_vadduqm): Rename to ... (addv1ti3): ... this. (altivec_vsubuqm): Rename to ... (subv1ti3): ... this. * config/rs6000/rs6000-builtins.def (__builtin_altivec_vadduqm): Replace bif expander altivec_vadduqm with addv1ti3. (__builtin_altivec_vsubuqm): Replace bif expander altivec_vsubuqm with subv1ti3. gcc/testsuite/ChangeLog: * gcc.target/powerpc/p8vector-int128-3.c: New test.
-
Kewen Lin authored
When making a patch to adjust VECTOR_P8_VECTOR rs6000_vector enum, I noticed that V1TImode's mode attribute in VI_unit VECTOR_UNIT_ALTIVEC_P (V1TImode) is never true, since VECTOR_UNIT_ALTIVEC_P checks if vector_unit[V1TImode] is equal to VECTOR_ALTIVEC, but vector_unit[V1TImode] can only be VECTOR_NONE or VECTOR_P8_VECTOR, there is no chance to be VECTOR_ALTIVEC: rs6000_vector_unit[V1TImode] = (TARGET_P8_VECTOR) ? VECTOR_P8_VECTOR : VECTOR_NONE; By checking all uses of VI_unit, the used mode iterator is one of VI2, VI, VP_small and VP, none of them has V1TImode, so the entry for V1TImode is useless. I guessed it was designed to have one mode attribute to cover all integer vector modes, but later we separated V1TI handlings to its own patterns (those guarded with TARGET_VADDUQM). Anyway, this patch is to remove this useless and confusing entry. gcc/ChangeLog: * config/rs6000/altivec.md (mode attr for V1TI in VI_unit): Remove.
-
Kewen Lin authored
When making patch to replace TARGET_P8_VECTOR, I noticed for *eqv<BOOL_128:mode>3_internal1 unlike the other logical operations, we only exploited the vsx version. I think it is an oversight, this patch is to consider veqv as well. gcc/ChangeLog: * config/rs6000/rs6000.md (*eqv<BOOL_128:mode>3_internal1): Generate insn veqv if TARGET_ALTIVEC and operands are altivec_register_operand.
-
Kewen Lin authored
When working to get rid of mask bit OPTION_MASK_P8_VECTOR, I noticed that the check on ISA_3_0_MASKS_IEEE is actually to check TARGET_P9_VECTOR, since we check all three mask bits together and p9 vector guarantees p8 vector and vsx should be enabled. So this patch is to adjust this first as preparatory patch for the following patch to change all uses of OPTION_MASK_P8_VECTOR and TARGET_P8_VECTOR. gcc/ChangeLog: * config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_IEEE): Remove. * config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace ISA_3_0_MASKS_IEEE check with TARGET_P9_VECTOR.
-
Kewen Lin authored
When I was making a patch to rework TARGET_P8_VECTOR, I noticed that there are some redundant checks and dead code related to TARGET_DIRECT_MOVE, so I made this patch as one separated preparatory patch, it consists of: - Check either TARGET_DIRECT_MOVE or TARGET_P8_VECTOR only according to the context, rather than checking both of them since they are actually the same (TARGET_DIRECT_MOVE is defined as TARGET_P8_VECTOR). - Simplify TARGET_VSX && TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE since direct move ensures VSX enabled. - Replace some TARGET_POWERPC64 && TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE_64BIT to simplify it. - Remove some dead code guarded with TARGET_DIRECT_MOVE but the condition never holds here. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_option_override_internal): Simplify TARGET_P8_VECTOR && TARGET_DIRECT_MOVE as TARGET_P8_VECTOR. (rs6000_output_move_128bit): Simplify TARGET_VSX && TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE. * config/rs6000/rs6000.h (TARGET_XSCVDPSPN): Simplify conditions TARGET_DIRECT_MOVE || TARGET_P8_VECTOR as TARGET_P8_VECTOR. (TARGET_XSCVSPDPN): Likewise. (TARGET_DIRECT_MOVE_128): Simplify TARGET_DIRECT_MOVE && TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT. (TARGET_VEXTRACTUB): Likewise. (TARGET_DIRECT_MOVE_64BIT): Simplify TARGET_P8_VECTOR && TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE. * config/rs6000/rs6000.md (signbit<mode>2, @signbit<mode>2_dm, *signbit<mode>2_dm_mem, floatsi<mode>2_lfiwax, floatsi<SFDF:mode>2_lfiwax_<QHI:mode>_mem_zext, floatunssi<mode>2_lfiwzx, float<QHI:mode><SFDF:mode>2, *float<QHI:mode><SFDF:mode>2_internal, floatuns<QHI:mode><SFDF:mode>2, *floatuns<QHI:mode><SFDF:mode>2_internal, p8_mtvsrd_v16qidi2, p8_mtvsrd_df, p8_xxpermdi_<mode>, reload_vsx_from_gpr<mode>, p8_mtvsrd_sf, reload_vsx_from_gprsf, p8_mfvsrd_3_<mode>, reload_gpr_from_vsx<mode>, reload_gpr_from_vsxsf, unpack<mode>_dm): Simplify TARGET_DIRECT_MOVE && TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT. (unpack<mode>_nodm): Simplify !TARGET_DIRECT_MOVE || !TARGET_POWERPC64 as !TARGET_DIRECT_MOVE_64BIT. (fix_trunc<mode>si2, fix_trunc<mode>si2_stfiwx, fix_trunc<mode>si2_internal): Simplify TARGET_P8_VECTOR && TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE. (fix_trunc<mode>si2_stfiwx, fixuns_trunc<mode>si2_stfiwx): Remove some dead code as the guard TARGET_DIRECT_MOVE there never holds. (fixuns_trunc<mode>si2_stfiwx): Change TARGET_P8_VECTOR with TARGET_DIRECT_MOVE which is a better fit. * config/rs6000/vsx.md (define_peephole2 for SFmode in GPR): Simplify TARGET_DIRECT_MOVE && TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT.
-
Torbjörn SVENSSON authored
Update test cases to use -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c. gcc/testsuite/ChangeLog: * g++.dg/opt/pr69175.C: Added option "-mcpu=unset". Signed-off-by:
Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-
Torbjörn SVENSSON authored
Update test cases to use -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c. gcc/testsuite/ChangeLog: * gcc.target/arm/cortex-m55-nodsp-flag-hard.c: Added option "-march=unset". * gcc.target/arm/cortex-m55-nodsp-flag-softfp.c: Likewise. * gcc.target/arm/cortex-m55-nodsp-nofp-flag-softfp.c: Likesie. * gcc.target/arm/cortex-m55-nofp-flag-hard.c: Likewise. * gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise. * gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: Likewise. * gcc.target/arm/cortex-m55-nomve-flag-hard.c: Likewise. * gcc.target/arm/cortex-m55-nomve-flag-softfp.c: Likewise. * gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c: Likewise. * gcc.target/arm/cortex-m55-nomve.fp-flag-softfp.c: Likewise. Signed-off-by:
Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-
Torbjörn SVENSSON authored
Update test cases to use -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c. gcc/testsuite/ChangeLog: * g++.dg/ext/pr57735.C: Use effective-target arm_cpu_xscale_arm. Signed-off-by:
Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-
Torbjörn SVENSSON authored
Update test cases to use -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c. gcc/testsuite/ChangeLog: * g++.target/arm/mve/general-c++/nomve_fp_1.c: Added option "-mcpu=unset". Signed-off-by:
Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-
Torbjörn SVENSSON authored
Update test cases to use -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c. gcc/testsuite/ChangeLog: * gcc.target/arm/vect-early-break-cbranch.c: Use effective-target arm_arch_v8a_hard. Signed-off-by:
Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-