- Mar 05, 2025
-
-
Kyrylo Tkachov authored
The PARALLEL created in aarch64_evpc_dup is used to hold the lane number. It is not appropriate for it to have a vector mode. Other such uses use VOIDmode. Do this here as well. This avoids the risk of generic code treating the PARALLEL as trapping when it has floating-point mode.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

PR rtl-optimization/119046
* config/aarch64/aarch64.cc (aarch64_evpc_dup): Use VOIDmode for PARALLEL.
-
Xi Ruoyao authored
They could be incorrectly reordered with store instructions like st.b because the RTL expression does not have a memory_operand or a (mem) expression. The incorrect reordering has been observed in an openh264 LTO build.

Expand them to a (mem) expression instead of an unspec to fix the issue. Then we need to make loongarch_address_insns return 1 for ADDRESS_REG_REG because the constraint "R" expects this behavior, or the vldx instruction will be considered invalid by the register allocation pass and turned into add.d + vld. Apply the ADDRESS_REG_REG penalty in loongarch_address_cost instead; loongarch_rtx_costs should then also call loongarch_address_cost instead of loongarch_address_insns.

Closes: https://github.com/cisco/openh264/issues/3857

gcc/ChangeLog:

PR target/119084
* config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove.
(lasx_xvldx): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove.
(lsx_vldx): Remove.
* config/loongarch/simd.md (QIVEC): New define_mode_iterator.
(<simd_isa>_<x>vldx): New define_expand.
* config/loongarch/loongarch.cc (loongarch_address_insns_1): New static function with most logic factored out from ...
(loongarch_address_insns): ... here. Call loongarch_address_insns_1 with reg_reg_cost = 1.
(loongarch_address_cost): Call loongarch_address_insns_1 with reg_reg_cost = la_addr_reg_reg_cost.

gcc/testsuite/ChangeLog:

PR target/119084
* gcc.target/loongarch/pr119084.c: New test.
-
- Mar 04, 2025
-
-
Jan Hubicka authored
Zen5 on some variants has a false dependency on the tzcnt, blsi, blsr and blsmsk instructions. This can be tested with the following benchmark:

jh@shroud:~> cat ee.c
int
main()
{
  int a = 10;
  int b = 0;
  for (int i = 0; i < 1000000000; i++)
    {
      asm volatile ("xor %0, %0": "=r" (b));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
    }
  return 0;
}
jh@shroud:~> cat bmk.sh
gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 ; time ./a.out
jh@shroud:~> sh bmk.sh tzcnt

real 0m0.886s
user 0m0.886s
sys 0m0.000s

real 0m0.886s
user 0m0.886s
sys 0m0.000s
jh@shroud:~> sh bmk.sh blsi

real 0m0.979s
user 0m0.979s
sys 0m0.000s

real 0m2.418s
user 0m2.418s
sys 0m0.000s
jh@shroud:~> sh bmk.sh blsr

real 0m0.986s
user 0m0.986s
sys 0m0.000s

real 0m2.422s
user 0m2.421s
sys 0m0.000s
jh@shroud:~> sh bmk.sh blsmsk

real 0m0.973s
user 0m0.973s
sys 0m0.000s

real 0m2.422s
user 0m2.422s
sys 0m0.000s

We already have a tunable that controls tzcnt together with lzcnt and popcnt. Since it seems that only tzcnt is affected I added a new tunable to control tzcnt only. I also added splitters for blsi/blsr/blsmsk, implemented analogously to the existing splitter for lzcnt.

The patch is neutral on SPEC. We produce blsi and blsr in some internal loops, but they usually have the same destination as source. However, it is good to break the dependency chain to avoid pathological cases and it is quite cheap overall, so I think we want to enable this for generic. I will send a follow-up patch for this.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro.
(TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro.
* config/i386/i386.md (*bmi_blsi_<mode>): Add splitter for false dependency.
(*bmi_blsi_<mode>_ccno): Add splitter for false dependency.
(*bmi_blsi_<mode>_falsedep): New pattern.
(*bmi_blsmsk_<mode>): Add splitter for false dependency.
(*bmi_blsmsk_<mode>_falsedep): New pattern.
(*bmi_blsr_<mode>): Add splitter for false dependency.
(*bmi_blsr_<mode>_cmp): Add splitter for false dependency.
(*bmi_blsr_<mode>_cmp_falsedep): New pattern.
* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New tune.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune.

gcc/testsuite/ChangeLog:

* gcc.target/i386/blsi.c: New test.
* gcc.target/i386/blsmsk.c: New test.
* gcc.target/i386/blsr.c: New test.
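For readers less familiar with the BMI instructions involved, the following is a minimal portable C sketch of what each one computes. The function names are hypothetical and the code is not taken from the patch. It only illustrates that every result is a function of the source operand alone, which is why a dependency on the previous value of the destination register is a false one.

```c
#include <assert.h>
#include <stdint.h>

/* BLSI: isolate the lowest set bit, x & -x. */
static uint32_t blsi32(uint32_t x) { return x & (0u - x); }

/* BLSR: reset (clear) the lowest set bit, x & (x - 1). */
static uint32_t blsr32(uint32_t x) { return x & (x - 1u); }

/* BLSMSK: mask up to and including the lowest set bit, x ^ (x - 1). */
static uint32_t blsmsk32(uint32_t x) { return x ^ (x - 1u); }

/* TZCNT: count trailing zero bits; yields the operand width for 0. */
static unsigned tzcnt32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    assert(n < 32);   /* a nonzero input always has a set bit below bit 32 */
    return n;
}
```

With the benchmark's input of 10 (binary 1010), these helpers give 2, 8, 3 and 1 respectively.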
-
Jan Hubicka authored
The current implementation of fusion predicates misses some common fusion cases on zen and more recent cores. I added knobs for the individual conditionals we test.

1) I split checks for fusing ALU with conditional operands when the ALU has a memory operand. This seems to be supported by zen3+ and by tigerlake and cooperlake (according to Agner Fog's manual).
2) znver4 and 5 support fusion of ALU and conditional even if the ALU has memory and immediate operands. This seems to be relatively important, enabling 25% more fusions on gcc bootstrap.
3) No CPU supports fusing when the ALU contains IP relative memory references. I added a separate knob so we do not forget about this if it gets supported later.

The patch does not solve the limitation of sched that fuse pairs must be adjacent on input and the first operation must be single-set. Fixing single-set is easy (I have a separate patch for this); for non-adjacent pairs we need bigger surgery.

To verify what the CPU really does I made a simple test script.

jh@ryzen3:~> cat fuse-test.c
int b;
const int z = 0;
const int o = 1;
int
main()
{
  int a = 1000000000;
  int b;
  int z = 0;
  int o = 1;
  asm volatile ("\n"
".L1234:\n"
"nop\n"
"subl %3, %0\n"
"movl %0, %1\n"
"cmpl %2, %1\n"
"movl %0, %1\n"
"test %1, %1\n"
"nop\n"
"jne .L1234":"=a"(a),
"=m"(b)
"=r"(b)
:
"m"(z),
"m"(o),
"i"(0),
"i"(1),
"0"(a)
);
}
jh@ryzen3:~> cat fuse-test.sh
EVENT=ex_ret_fused_instr
dotest()
{
gcc -O2 fuse-test.c $* -o fuse-cmp-imm-mem-nofuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-nofuse 2>&1 | grep $EVENT
gcc -O2 fuse-test.c -DFUSE $* -o fuse-cmp-imm-mem-fuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-fuse 2>&1 | grep $EVENT
}

echo ALU with immediate
dotest
echo ALU with memory
dotest -D MEM
echo ALU with IP relative memory
dotest -D MEM -D IPRELATIVE
echo CMP with immediate
dotest -D CMP
echo CMP with memory
dotest -D CMP -D MEM
echo CMP with memory and immediate
dotest -D CMP -D MEMIMM
echo CMP with IP relative memory
dotest -D CMP -D MEM -D IPRELATIVE
echo TEST
dotest -D TEST

On zen5 I get:
ALU with immediate
            20,345 ex_ret_fused_instr:u
     1,000,020,278 ex_ret_fused_instr:u
ALU with memory
            20,367 ex_ret_fused_instr:u
     1,000,020,290 ex_ret_fused_instr:u
ALU with IP relative memory
            20,395 ex_ret_fused_instr:u
            20,403 ex_ret_fused_instr:u
CMP with immediate
            20,369 ex_ret_fused_instr:u
     1,000,020,301 ex_ret_fused_instr:u
CMP with memory
            20,314 ex_ret_fused_instr:u
     1,000,020,341 ex_ret_fused_instr:u
CMP with memory and immediate
            20,372 ex_ret_fused_instr:u
     1,000,020,266 ex_ret_fused_instr:u
CMP with IP relative memory
            20,382 ex_ret_fused_instr:u
            20,369 ex_ret_fused_instr:u
TEST
            20,346 ex_ret_fused_instr:u
     1,000,020,301 ex_ret_fused_instr:u

IP relative memory seems to not be documented.

On zen3/4 I get:
ALU with immediate
            20,263 ex_ret_fused_instr:u
     1,000,020,051 ex_ret_fused_instr:u
ALU with memory
            20,255 ex_ret_fused_instr:u
     1,000,020,056 ex_ret_fused_instr:u
ALU with IP relative memory
            20,253 ex_ret_fused_instr:u
            20,266 ex_ret_fused_instr:u
CMP with immediate
            20,264 ex_ret_fused_instr:u
     1,000,020,052 ex_ret_fused_instr:u
CMP with memory
            20,253 ex_ret_fused_instr:u
     1,000,019,794 ex_ret_fused_instr:u
CMP with memory and immediate
            20,260 ex_ret_fused_instr:u
            20,264 ex_ret_fused_instr:u
CMP with IP relative memory
            20,258 ex_ret_fused_instr:u
            20,256 ex_ret_fused_instr:u
TEST
            20,261 ex_ret_fused_instr:u
     1,000,020,048 ex_ret_fused_instr:u

zen1 and 2 get:
ALU with immediate
            21,610 ex_ret_fus_brnch_inst:u
            21,697 ex_ret_fus_brnch_inst:u
ALU with memory
            21,479 ex_ret_fus_brnch_inst:u
            21,747 ex_ret_fus_brnch_inst:u
ALU with IP relative memory
            21,623 ex_ret_fus_brnch_inst:u
            21,684 ex_ret_fus_brnch_inst:u
CMP with immediate
            21,708 ex_ret_fus_brnch_inst:u
     1,000,021,288 ex_ret_fus_brnch_inst:u
CMP with memory
            21,689 ex_ret_fus_brnch_inst:u
     1,000,004,270 ex_ret_fus_brnch_inst:u
CMP with memory and immediate
            21,604 ex_ret_fus_brnch_inst:u
            21,671 ex_ret_fus_brnch_inst:u
CMP with IP relative memory
            21,589 ex_ret_fus_brnch_inst:u
            21,602 ex_ret_fus_brnch_inst:u
TEST
            21,600 ex_ret_fus_brnch_inst:u
     1,000,021,233 ex_ret_fus_brnch_inst:u

I tested the patch on zen3 and zen5 with spec2k17 and it seems neutral; however, the number of fusions does go up.

Bootstrapped/regtested x86_64-linux, I plan to commit it tomorrow.

Honza

gcc/ChangeLog:

* config/i386/i386.h (TARGET_FUSE_ALU_AND_BRANCH_MEM): New macro.
(TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM): New macro.
(TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New macro.
* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Support non-single-set.
(ix86_macro_fusion_pair_p): Allow ALU which only clobbers; be more careful about immediates; check TARGET_FUSE_ALU_AND_BRANCH_MEM, TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM, TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE; verify that we never use unsigned checks with inc/dec.
* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): New tune.
(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM): New tune.
(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM_IMM): New tune.
(X86_TUNE_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New tune.
-
Tamar Christina authored
When the input is already a subreg and we try to make a paradoxical subreg out of it for copysign this can fail if it violates the subreg relationship. Use force_lowpart_subreg instead of lowpart_subreg to then force the results to a register instead of ICEing. gcc/ChangeLog: PR target/118892 * config/aarch64/aarch64.md (copysign<GPF:mode>3): Use force_lowpart_subreg instead of lowpart_subreg. gcc/testsuite/ChangeLog: PR target/118892 * gcc.target/aarch64/copysign-pr118892.c: New test.
-
Richard Sandiford authored
There was an embarrassing typo in the folding of BIT_NOT_EXPR for POLY_INT_CSTs: it used - rather than ~ on the poly_int. Not sure how that happened, but it might have been due to the way that ~x is implemented as -1 - x internally. gcc/ PR tree-optimization/118976 * fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR. * config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function. (aarch64_run_selftests): Run it.
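The identity mentioned above, ~x == -1 - x, and the off-by-one that results from using - instead of ~ can be sketched in plain C. The two-coefficient struct below is a hypothetical stand-in for a POLY_INT_CST (value c0 + c1 * n for a runtime n); it is an illustration under that assumption, not GCC's poly_int code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-coefficient polynomial: value = c0 + c1 * n. */
typedef struct { int64_t c0; int64_t c1; } poly2;

/* Bitwise NOT via the identity ~x == -1 - x, applied to c0 + c1 * n:
   -1 - (c0 + c1 * n) == (-1 - c0) + (-c1) * n. */
static poly2 poly2_not(poly2 p)
{
    poly2 r = { -1 - p.c0, -p.c1 };
    return r;
}

/* The mistaken fold: arithmetic negation instead of bitwise NOT. */
static poly2 poly2_neg(poly2 p)
{
    assert(p.c0 != INT64_MIN && p.c1 != INT64_MIN);  /* avoid overflow */
    poly2 r = { -p.c0, -p.c1 };
    return r;
}

/* Evaluate the polynomial for a concrete n (small values only). */
static int64_t poly2_eval(poly2 p, int64_t n) { return p.c0 + p.c1 * n; }
```

For p = 16 + 16n and n = 3 the value is 64, so the correct fold evaluates to ~64 = -65, while the negation gives -64: the two always differ by exactly one.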
-
- Mar 03, 2025
-
-
Andrew Carlotti authored
Refactor the switcher classes into two separate classes: - sve_alignment_switcher takes the alignment switching functionality, and is used only for ABI correctness when defining sve structure types. - aarch64_target_switcher takes the rest of the functionality of aarch64_simd_switcher and sve_switcher, and gates simd/sve specific parts upon the specified feature flags. Additionally, aarch64_target_switcher now adds dependencies of the specified flags (which adds +fcma and +bf16 to some intrinsic declarations), and unsets current_target_pragma. This last change fixes an internal bug where we would sometimes add a user specified target pragma (stored in current_target_pragma) on top of an internally specified target architecture while initialising intrinsics with `#pragma GCC aarch64 "arm_*.h"`. As far as I can tell, this has no visible impact at the moment. However, the unintended target feature combinations lead to unwanted behaviour in an under-development patch. This also fixes a missing Makefile dependency, which was due to aarch64-sve-builtins.o incorrectly depending on the undefined $(REG_H). The correct $(REGS_H) dependency is added to the switcher's new source location. gcc/ChangeLog: * common/config/aarch64/aarch64-common.cc (struct aarch64_extension_info): Add field. (aarch64_get_required_features): New. * config/aarch64/aarch64-builtins.cc (aarch64_simd_switcher::aarch64_simd_switcher): Rename to... (aarch64_target_switcher::aarch64_target_switcher): ...this, and extend to handle sve, nosimd and target pragmas. (aarch64_simd_switcher::~aarch64_simd_switcher): Rename to... (aarch64_target_switcher::~aarch64_target_switcher): ...this, and extend to handle sve, nosimd and target pragmas. (handle_arm_acle_h): Use aarch64_target_switcher. (handle_arm_neon_h): Rename switcher and pass explicit flags. (aarch64_general_init_builtins): Ditto. * config/aarch64/aarch64-protos.h (class aarch64_simd_switcher): Rename to... 
(class aarch64_target_switcher): ...this, and add new members. (aarch64_get_required_features): New prototype. * config/aarch64/aarch64-sve-builtins.cc (sve_switcher::sve_switcher): Delete (sve_switcher::~sve_switcher): Delete (sve_alignment_switcher::sve_alignment_switcher): New (sve_alignment_switcher::~sve_alignment_switcher): New (register_builtin_types): Use alignment switcher (init_builtins): Rename switcher. (handle_arm_neon_sve_bridge_h): Ditto. (handle_arm_sme_h): Ditto. (handle_arm_sve_h): Ditto, and use alignment switcher. * config/aarch64/aarch64-sve-builtins.h (class sve_switcher): Delete. (class sme_switcher): Delete. (class sve_alignment_switcher): New. * config/aarch64/t-aarch64 (aarch64-builtins.o): Add $(REGS_H). (aarch64-sve-builtins.o): Remove $(REG_H).
-
Richard Earnshaw authored
The code in gcc.target/unsigned-extend-1.c really should not need any unsigned extension operations when the optimizers are used. For Arm and thumb2 that is indeed the case, but for thumb1 code it gets more complicated as there are too many instructions for combine to look at. For thumb1 we end up with two redundant zero_extend patterns which are not removed: the first after the subtract instruction and the second of the final boolean result.

We can partially fix this (for the second case above) by adding a new split pattern for LEU and GEU patterns. This works because the two instructions for the [LG]EU pattern plus the redundant extension instruction are combined into a single insn, which we can then split using the 3->2 method back into the two insns of the [LG]EU sequence.

Because we're missing the optimization for all thumb1 cases (not just those architectures with UXTB), I've adjusted the testcase to detect all the idioms that we might use for zero-extending a value, namely:

UXTB
AND ...#255 (in thumb1 this would require a register to hold 255)
LSL ... #24; LSR ... #24

but I've also marked this test as XFAIL for thumb1 because we can't yet eliminate the first of the two extend instructions.

gcc/
* config/arm/thumb1.md (split patterns for GEU and LEU): New.

gcc/testsuite:
* gcc.target/arm/unsigned-extend-1.c: Expand check for any insn suggesting a zero-extend. XFAIL for thumb1 code.
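The three zero-extension idioms listed above compute the same value, which is why the test has to look for all of them. A small C sketch of the equivalence follows; the function names are hypothetical, and each function mirrors one idiom at the source level rather than guaranteeing any particular instruction selection.

```c
#include <assert.h>
#include <stdint.h>

/* UXTB-style: truncate to 8 bits, then zero-extend. */
static uint32_t zext_uxtb(uint32_t x) { return (uint8_t) x; }

/* AND-style: mask with 255. */
static uint32_t zext_and(uint32_t x) { return x & 255u; }

/* Shift-style: LSL #24 followed by a logical LSR #24. */
static uint32_t zext_shifts(uint32_t x) { return (x << 24) >> 24; }

/* Check that all three idioms agree for a given input. */
static void check_zext_idioms(uint32_t x)
{
    assert(zext_uxtb(x) == zext_and(x));
    assert(zext_and(x) == zext_shifts(x));
}
```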
-
- Mar 02, 2025
-
-
Jeff Law authored
I'm not sure if I goof'd this or if I merely upstreamed someone else's goof. Either way the long branch code isn't working correctly.

We were using 'n' as the output modifier to negate the condition. But 'n' has a special meaning elsewhere, so when presented with a condition rather than what was expected, boom, the compiler ICE'd.

Thankfully there are only a few places where we were using %n, which I turned into %r.

The BZ entry includes a good testcase; it just takes a long time to compile as it's trying to create the out-of-range scenario. I'm not including the testcase due to how long it takes, but I did test it locally to ensure it's working properly now.

I'm sure that with a little bit of work I could create a testcase that worked before and fails with the trunk (by taking advantage of the fuzziness in length computations). So I'm going to consider this a regression.

Will push to the trunk after pre-commit testing does its thing.

PR target/118934
gcc/
* config/riscv/corev.md (cv_branch): Adjust output template.
(branch): Likewise.
* config/riscv/riscv.md (branch): Likewise.
* config/riscv/riscv.cc (riscv_asm_output_opcode): Handle 'r' rather than 'n'.
-
Jakub Jelinek authored
As can be seen in gcc/po/gcc.pot:

#: config/avr/avr.cc:2754
#, c-format
msgid "bad I/O address 0x"
msgstr ""

exgettext couldn't retrieve the whole format string in this case, because it uses a macro in the middle. output_operand_lossage is a c-format function though, so we can't use %wx to print HOST_WIDE_INT, and HOST_WIDE_INT_PRINT_HEX_PURE is %lx on some hosts, %llx on others and %I64x on yet others, so it isn't really translatable that way. As Joseph mentioned in the PR, there is no easy way around this but to go through a temporary buffer, which the following patch does.

2025-03-02 Jakub Jelinek <jakub@redhat.com>

PR translation/118991
* config/avr/avr.cc (avr_print_operand): Print ival into a temporary buffer and use %s in output_operand_lossage to make the diagnostics translatable.
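The temporary-buffer approach can be sketched outside of GCC as follows. This is a stand-alone illustration under stated assumptions: PRIx64 stands in for the host-dependent HOST_WIDE_INT_PRINT_HEX_PURE, snprintf stands in for output_operand_lossage, and the helper names and the exact "%s" message are hypothetical. The point is only that the translatable format string becomes a single literal with no macro in the middle.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format VAL as bare hex digits using the host-specific length modifier.
   This is the only place where a host-dependent macro appears. */
static const char *format_hex(char *buf, size_t size, int64_t val)
{
    int n = snprintf(buf, size, "%" PRIx64, (uint64_t) val);
    assert(n >= 0 && (size_t) n < size);   /* buffer must be large enough */
    return buf;
}

/* Stand-in for the diagnostic call: the format string is one constant
   literal, so a message extractor can see all of it. */
static int report_bad_io_address(char *out, size_t size, int64_t ival)
{
    char digits[32];
    return snprintf(out, size, "bad I/O address 0x%s",
                    format_hex(digits, sizeof digits, ival));
}
```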
-
- Mar 01, 2025
-
-
Jan Dubiec authored
[PATCH] H8/300, libgcc: PR target/114222 For HImode call internal ffs() implementation instead of an external one

When INT_TYPE_SIZE < BITS_PER_WORD gcc emits a call to an external ffs() implementation instead of a call to "__builtin_ffs()"; see function init_optabs() in <SRCROOT>/gcc/optabs-libfuncs.cc. The external ffs() (which is usually the one from newlib) in turn calls __builtin_ffs(), which causes infinite recursion and a stack overflow. This patch overrides the default gcc behaviour for H8/300H (and newer) and provides a generic ffs() implementation for HImode.

PR target/114222

gcc/ChangeLog:

* config/h8300/h8300.cc (h8300_init_libfuncs): For HImode override calls to external ffs() (from newlib) with calls to __ffshi2() from libgcc. The implementation of ffs() in newlib calls __builtin_ffs(), which causes infinite recursion and finally a stack overflow.

libgcc/ChangeLog:

* config/h8300/t-h8300: Add __ffshi2().
* config/h8300/ffshi2.c: New file.
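As an illustration of what a generic 16-bit ffs() has to do, here is a minimal sketch. It is not the contents of ffshi2.c; the name ffs16 is hypothetical. The relevant property is that it uses only shifts and tests and never calls ffs() or __builtin_ffs(), so it cannot recurse into itself.

```c
#include <assert.h>
#include <stdint.h>

/* Return the 1-based position of the least significant set bit of X,
   or 0 if X is zero.  Uses no library or builtin ffs. */
static int ffs16(uint16_t x)
{
    int pos = 1;
    if (x == 0)
        return 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        pos++;
    }
    assert(pos >= 1 && pos <= 16);
    return pos;
}
```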
-
Jan Dubiec authored
This patch fixes annoying -Wformat warnings when gcc is built on Windows/MinGW64. Instead of %ld it uses HOST_WIDE_INT_PRINT_DEC macro, just like many other targets do. PR target/109189 gcc/ChangeLog: * config/h8300/h8300.cc (h8300_print_operand): Replace %ld format strings with HOST_WIDE_INT_PRINT_DEC macro in order to silence -Wformat warnings when building on Windows/MinGW64.
-
- Feb 28, 2025
-
-
H.J. Lu authored
Move the TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P target hook from i386.h to i386.cc.

* config/i386/i386.h (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): Moved to ...
* config/i386/i386.cc (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): ... here.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
-
- Feb 27, 2025
-
-
Pan Li authored
This patch fixes a bug when expanding a const vector for the interleave case. For example, we have:

base1 = 151
step = 121

For vec_series, we generate a vector of the form v[i] = base + i * step. The vec_series then has the result below for HImode, and we can see that the result overflows into the high 8 bits of HImode.

v1.b = {151, 255, 7, 0, 119, 0, 231, 0, 87, 1, 199, 1, 55, 2, 167, 2}

whereas we expect v1.b to be:

v1.b = {151, 0, 7, 0, 119, 0, 231, 0, 87, 0, 199, 0, 55, 0, 167, 0}

After that it performs the IOR with v2 for base2 (i.e. the other series).

v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}

Unfortunately, base1 + i * step1 in HImode may overflow into the high 8 bits, and those high 8 bits then pollute v2 and result in an incorrect value in the const_vector.

This patch performs an overflow-to-smode check before the optimized interleave code generation. On overflow or VLA, it falls back to the default merge approach.

The following test suites pass with this patch:
* The rv64gcv full regression test.

PR target/118931

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Add overflow to smode check and clean up highest bits if overflow.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118931-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
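The pollution of the high byte can be reproduced with a few lines of C. This is a sketch under explicit assumptions: the helper names are hypothetical, and the inputs used in the note below (base -105, which is 151 reinterpreted as a signed 8-bit value, and step 112) are chosen because they reproduce the v1.b bytes listed in the message; they are not taken from the patch or its test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the little-endian bytes of the 16-bit series base + i * step
   into OUT, two bytes per element. */
static void series_bytes(int base, int step, size_t n, uint8_t *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint16_t v = (uint16_t) (base + (int) i * step);
        out[2 * i] = (uint8_t) (v & 0xff);   /* low byte: wanted data */
        out[2 * i + 1] = (uint8_t) (v >> 8); /* high byte: should be 0 */
    }
}

/* Return nonzero if any element does not fit in an unsigned byte, i.e.
   if the high byte of some 16-bit element would be nonzero. */
static int series_overflows_byte(int base, int step, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int v = base + (int) i * step;
        if (v < 0 || v > 255)
            return 1;
    }
    return 0;
}
```

Calling series_bytes(-105, 112, 8, out) yields the polluted byte pattern shown for v1.b, and series_overflows_byte reports the overflow, which is the condition under which a fallback to the merge approach is needed. The second series (17, 33, ..., 129) fits in a byte and is not flagged.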
-
Thomas Schwinge authored
... instead of 64 via 'gcc/defaults.h': MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (DImode) This fixes ICEs: [-FAIL: c-c++-common/pr111309-1.c -Wc++-compat (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-] [-FAIL:-]{+PASS:+} c-c++-common/pr111309-1.c -Wc++-compat (test for excess errors) [-UNRESOLVED:-]{+PASS:+} c-c++-common/pr111309-1.c -Wc++-compat [-compilation failed to produce executable-]{+execution test+} [-FAIL: c-c++-common/pr111309-1.c -std=gnu++17 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-] [-FAIL:-]{+PASS:+} c-c++-common/pr111309-1.c -std=gnu++17 (test for excess errors) [-UNRESOLVED:-]{+PASS:+} c-c++-common/pr111309-1.c -std=gnu++17 [-compilation failed to produce executable-]{+execution test+} [-FAIL: c-c++-common/pr111309-1.c -std=gnu++26 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-] [-FAIL:-]{+PASS:+} c-c++-common/pr111309-1.c -std=gnu++26 (test for excess errors) [-UNRESOLVED:-]{+PASS:+} c-c++-common/pr111309-1.c -std=gnu++26 [-compilation failed to produce executable-]{+execution test+} [-FAIL: c-c++-common/pr111309-1.c -std=gnu++98 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-] [-FAIL:-]{+PASS:+} c-c++-common/pr111309-1.c -std=gnu++98 (test for excess errors) [-UNRESOLVED:-]{+PASS:+} c-c++-common/pr111309-1.c -std=gnu++98 [-compilation failed to produce executable-]{+execution test+} [-FAIL: gcc.dg/torture/pr116480-1.c -O0 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-] [-FAIL:-]{+PASS:+} gcc.dg/torture/pr116480-1.c -O0 (test for excess errors) [-FAIL: gcc.dg/torture/pr116480-1.c -O1 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-] [-FAIL:-]{+PASS:+} gcc.dg/torture/pr116480-1.c -O1 (test for excess errors) PASS: gcc.dg/torture/pr116480-1.c -O2 (test for excess errors) PASS: gcc.dg/torture/pr116480-1.c -O3 -g (test for excess errors) PASS: gcc.dg/torture/pr116480-1.c -Os (test for 
excess errors) ..., where we ran into 'gcc_assert (icode != CODE_FOR_nothing);' in 'gcc/internal-fn.cc:expand_fn_using_insn' for '__int128' '__builtin_clzg' etc.: during RTL pass: expand [...]/c-c++-common/pr111309-1.c: In function 'clzI': [...]/c-c++-common/pr111309-1.c:69:10: internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268 0x120ec2cf internal_error(char const*, ...) [...]/gcc/diagnostic-global-context.cc:517 0x102c7c5b fancy_abort(char const*, int, char const*) [...]/gcc/diagnostic.cc:1722 0x109708eb expand_fn_using_insn [...]/gcc/internal-fn.cc:268 0x1098114f expand_internal_call(internal_fn, gcall*) [...]/gcc/internal-fn.cc:5273 0x1098114f expand_internal_call(gcall*) [...]/gcc/internal-fn.cc:5281 0x10594fc7 expand_call_stmt [...]/gcc/cfgexpand.cc:3049 [...] Likewise, as of commit e8ad697a "libstdc++: Use new type-generic built-ins in <bit> [PR118855]", the libstdc++ target library build ICEd in the same way. Additionally, this change fixes: [-FAIL:-]{+PASS:+} gcc.dg/pr105094.c (test for excess errors) ..., which was: [...]/gcc.dg/pr105094.c: In function 'foo': [...]/gcc.dg/pr105094.c:11:12: error: size of variable 's' is too large And, finally, regarding 'gcc.target/nvptx/stack_frame-1.c'. Before, in 'gcc/cfgexpand.cc': 'expand_used_vars' -> 'expand_used_vars_for_block' -> 'expand_one_var' for 'ww' -> 'gcc/function.cc:use_register_for_decl' due to 'DECL_MODE (decl) == BLKmode' did 'return false;', thus -> 'add_stack_var' (even if 'ww' wasn't then actually living on the stack). Now, 'ww' has 'TImode' and 'use_register_for_decl' does 'return true;', thus -> 'expand_one_register_var', and therefore no unused stack frame emitted. gcc/ * config/nvptx/nvptx.h (MAX_FIXED_MODE_SIZE): '#define'. gcc/testsuite/ * gcc.target/nvptx/stack_frame-1.c: Adjust.
-
Thomas Schwinge authored
With '-mfake-ptx-alloca' enabled, the user-visible behavior changes only for configurations where PTX 'alloca' is not available. Rather than a compile-time 'sorry, unimplemented: dynamic stack allocation not supported' in presence of dynamic stack allocation, compilation and assembly then succeeds. However, attempting to link in such '*.o' files then fails due to unresolved symbol '__GCC_nvptx__PTX_alloca_not_supported'. This is meant to be used in scenarios where large volumes of code are compiled, a small fraction of which runs into dynamic stack allocation, but these parts are not important for specific use cases, and we'd thus like the build to succeed, and error out just upon actual, very rare use of the offending '*.o' files. gcc/ * config/nvptx/nvptx.opt (-mfake-ptx-alloca): New. * config/nvptx/nvptx-protos.h (nvptx_output_fake_ptx_alloca): Declare. * config/nvptx/nvptx.cc (nvptx_output_fake_ptx_alloca): New. * config/nvptx/nvptx.md (define_insn "@nvptx_alloca_<mode>") [!(TARGET_PTX_7_3 && TARGET_SM52)]: Use it for '-mfake-ptx-alloca'. gcc/testsuite/ * gcc.target/nvptx/alloca-1-O0_-mfake-ptx-alloca.c: New. * gcc.target/nvptx/alloca-2-O0_-mfake-ptx-alloca.c: Likewise. * gcc.target/nvptx/alloca-4-O3_-mfake-ptx-alloca.c: Likewise. * gcc.target/nvptx/vla-1-O0_-mfake-ptx-alloca.c: Likewise. * gcc.target/nvptx/alloca-4-O3.c: 'dg-additional-options -mfake-ptx-alloca'.
-
Thomas Schwinge authored
nvptx: Delay 'sorry, unimplemented: dynamic stack allocation not supported' from expansion time to code generation This gives the back end a chance to clean out a few more unnecessary instances of dynamic stack allocation. This progresses: PASS: gcc.dg/pr78902.c (test for warnings, line 7) PASS: gcc.dg/pr78902.c (test for warnings, line 8) PASS: gcc.dg/pr78902.c (test for warnings, line 9) PASS: gcc.dg/pr78902.c (test for warnings, line 10) PASS: gcc.dg/pr78902.c (test for warnings, line 11) PASS: gcc.dg/pr78902.c (test for warnings, line 12) PASS: gcc.dg/pr78902.c (test for warnings, line 13) PASS: gcc.dg/pr78902.c strndup excessive bound at line 14 (test for warnings, line 13) [-UNSUPPORTED: gcc.dg/pr78902.c: dynamic stack allocation not supported-] {+PASS: gcc.dg/pr78902.c (test for excess errors)+} UNSUPPORTED: gcc.dg/torture/pr71901.c -O0 : dynamic stack allocation not supported [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr71901.c -O1 [-: dynamic stack allocation not supported-]{+(test for excess errors)+} UNSUPPORTED: gcc.dg/torture/pr71901.c -O2 : dynamic stack allocation not supported UNSUPPORTED: gcc.dg/torture/pr71901.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions : dynamic stack allocation not supported UNSUPPORTED: gcc.dg/torture/pr71901.c -O3 -g : dynamic stack allocation not supported [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr71901.c -Os [-: dynamic stack allocation not supported-]{+(test for excess errors)+} UNSUPPORTED: gcc.dg/torture/pr78742.c -O0 : dynamic stack allocation not supported [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr78742.c -O1 [-: dynamic stack allocation not supported-]{+(test for excess errors)+} [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr78742.c -O2 [-: dynamic stack allocation not supported-]{+(test for excess errors)+} [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr78742.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions [-: dynamic stack allocation not 
supported-]{+(test for excess errors)+} [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr78742.c -O3 -g [-: dynamic stack allocation not supported-]{+(test for excess errors)+} UNSUPPORTED: gcc.dg/torture/pr78742.c -Os : dynamic stack allocation not supported [-UNSUPPORTED:-]{+PASS:+} gfortran.dg/pr101267.f90 -O [-: dynamic stack allocation not supported-]{+(test for excess errors)+} [-UNSUPPORTED:-]{+PASS:+} gfortran.dg/pr112404.f90 -O [-: dynamic stack allocation not supported-]{+(test for excess errors)+} gcc/ * config/nvptx/nvptx.md (define_expand "allocate_stack") [!TARGET_SOFT_STACK]: Move 'sorry ("dynamic stack allocation not supported");'... (define_insn "@nvptx_alloca_<mode>"): ... here. gcc/testsuite/ * gcc.target/nvptx/alloca-1-unused-O0-sm_30.c: Adjust.
-
Haochen Jiang authored
i386: Treat Granite Rapids/Granite Rapids-D/Diamond Rapids similarly to Sapphire Rapids in x86-tune.def

Since GNR, GNR-D and DMR are all P-core based, we should treat them just like SPR for now.

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_DEST_FALSE_DEP_FOR_GLC): Add GNR, GNR-D, DMR.
(X86_TUNE_AVOID_256FMA_CHAINS): Ditto.
(X86_TUNE_AVX512_MOVE_BY_PIECES): Ditto.
(X86_TUNE_AVX512_STORE_BY_PIECES): Ditto.
-
- Feb 26, 2025
-
-
Jakub Jelinek authored
The linaro CI found my PR119002 patch broke bootstrap on arm. It seems the problem is that arm has an incorrect REVERSE_CONDITION macro definition. All other targets' REVERSE_CONDITION definitions and the default one just use the macro's arguments, while the arm.h definition uses the MODE argument but uses code instead of CODE (the first argument). This happens to work because before my patch the only use of the macro was in jump.cc with

  /* First see if machine description supplies us way to reverse the
     comparison. Give it priority over everything else to allow
     machine description to do tricks. */
  if (GET_MODE_CLASS (mode) == MODE_CC
      && REVERSIBLE_CC_MODE (mode))
    return REVERSE_CONDITION (code, mode);

but in my patch it is used with GT rather than code.

2025-02-26 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/119002
* config/arm/arm.h (REVERSE_CONDITION): Use CODE - the macro argument - in the macro rather than code.
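The class of bug described here, a macro body that names a variable from the caller's scope instead of its own parameter, can be shown in isolation. The enum, helper and macro names below are hypothetical and much simpler than the real arm.h definition; the sketch only demonstrates why the mistake stays hidden as long as every caller passes a variable that happens to be called code.

```c
#include <assert.h>

/* Hypothetical comparison codes, standing in for RTL codes such as GT. */
enum cmp_code { CMP_LT, CMP_GE, CMP_GT, CMP_LE };

/* Reverse a comparison: LT <-> GE and GT <-> LE. */
static enum cmp_code reverse_code(enum cmp_code c)
{
    switch (c) {
    case CMP_LT: return CMP_GE;
    case CMP_GE: return CMP_LT;
    case CMP_GT: return CMP_LE;
    case CMP_LE: return CMP_GT;
    }
    assert(!"unknown comparison code");
    return c;
}

/* Buggy shape: ignores the CODE parameter and captures whatever
   variable named `code` is in scope where the macro is used. */
#define REVERSE_BUGGY(CODE, MODE) ((void) (MODE), reverse_code(code))

/* Fixed shape: uses only the macro's own parameters. */
#define REVERSE_FIXED(CODE, MODE) ((void) (MODE), reverse_code(CODE))

/* A caller with a local `code` that asks for the reverse of CMP_GT. */
static enum cmp_code reverse_gt_buggy(enum cmp_code code)
{
    return REVERSE_BUGGY(CMP_GT, 0);   /* actually reverses `code` */
}

static enum cmp_code reverse_gt_fixed(enum cmp_code code)
{
    (void) code;                       /* unused on purpose */
    return REVERSE_FIXED(CMP_GT, 0);
}
```

With a local code of CMP_LT, the buggy variant returns CMP_GE (the reverse of the local variable) while the fixed one returns CMP_LE (the reverse of the argument actually passed).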
-
- Feb 25, 2025
-
-
Jakub Jelinek authored
HOST_WIDE_INT_PRINT* macros aren't supposed to be used in gcc-internal-format format strings, we have the w modifier for HOST_WIDE_INT in that case, the HOST_WIDE_INT_PRINT* macros might not work properly on some hosts (e.g. mingw32 has HOST_LONG_LONG_FORMAT "I64" and that is something pretty-print doesn't handle, while it handles "ll" for long long) and also the use of macros in the middle of format strings breaks translations (both that exgettext can't retrieve the string from there and we get #: config/pru/pru-pragma.cc:61 msgid "%<CTABLE_ENTRY%> index %" msgstr "" #: config/pru/pru-pragma.cc:64 msgid "redefinition of %<CTABLE_ENTRY %" msgstr "" in po/gcc.pot and also the macros are different on different hosts, so even if exgettext extracted say "%<CTABLE_ENTRY%> index %lld is not valid" it could be translated on some hosts but not e.g. mingw32). So, the following patch just uses %wd instead. Tested it before/after the patch on #pragma ctable_entry 12 0x48040000 #pragma ctable_entry 1024 0x48040000 #pragma ctable_entry 12 0x48040001 and the result is the same. 2025-02-25 Jakub Jelinek <jakub@redhat.com> PR translation/118991 * config/pru/pru-pragma.cc (pru_pragma_ctable_entry): Use %wd instead of %" HOST_WIDE_INT_PRINT "d to print a hwi in error.
-
Iain Buclaw authored
Adds a new i386 d_target_info_spec entry to handle requests for `__traits(getTargetInfo, "CET")', and adds the predefined target version `GNU_CET' when the option `-fcf-protection' is used. Both the TargetInfo key and the predefined version have been added to the D front-end documentation.

In the library, `GNU_CET' replaces the existing use of the user-defined version flag `CET' when building libphobos.

PR d/118654

gcc/ChangeLog:

* config/i386/i386-d.cc (ix86_d_target_versions): Predefine GNU_CET.
(ix86_d_handle_target_cf_protection): New.
(ix86_d_register_target_info): Add 'CET' TargetInfo key.

gcc/d/ChangeLog:

* implement-d.texi: Document CET version and traits key.

libphobos/ChangeLog:

* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Remove CET_DFLAGS.
* libdruntime/Makefile.am: Replace CET_DFLAGS with CET_FLAGS.
* libdruntime/Makefile.in: Regenerate.
* libdruntime/core/thread/fiber/package.d: Replace CET with GNU_CET.
* src/Makefile.am: Replace CET_DFLAGS with CET_FLAGS.
* src/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
* testsuite/testsuite_flags.in: Replace CET_DFLAGS with CET_FLAGS.

gcc/testsuite/ChangeLog:

* gdc.dg/target/i386/i386.exp: New test.
* gdc.dg/target/i386/targetinfo_CET.d: New test.
-
- Feb 24, 2025
-
-
Robin Dapp authored
When scanning for program points, i.e. vector statements, we're missing pattern statements. In PR114516 this becomes obvious as we choose LMUL=8 assuming there are only three statements but the divmod pattern adds another three. Those push us beyond four registers so we need to switch to LMUL=4. This patch adds pattern statements to the program points which helps calculate a better register pressure estimate. PR target/114516 gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (compute_estimated_lmul): Add pattern statements to program points. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/pr114516.c: New test.
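The register-pressure argument can be sketched with a deliberately simplified model. This is an assumption for illustration, not the logic in compute_estimated_lmul: it assumes 32 vector registers, that every live value occupies LMUL registers, and the hypothetical helper `pick_lmul` simply takes the largest LMUL that still fits.

```c
/* Simplified model (an assumption, not GCC's code): RVV has 32 vector
   registers and a value at LMUL=k occupies k of them, so at LMUL=8
   only four values can be live at once.  */
int pick_lmul(int live_values)
{
    for (int lmul = 8; lmul > 1; lmul /= 2)
        if (live_values * lmul <= 32)
            return lmul;
    return 1;
}
```

Under this model three live values still fit at LMUL=8, while six (three statements plus three pattern statements) need LMUL=4.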
-
Lino Hsing-Yu Peng authored
The incorrect cfi directive info breaks stack unwind in try/catch/cxa. Before patch: cm.push {ra, s0-s2}, -16 .cfi_offset 1, -12 .cfi_offset 8, -8 .cfi_offset 18, -4 After patch: cm.push {ra, s0-s2}, -16 .cfi_offset 1, -16 .cfi_offset 8, -12 .cfi_offset 9, -8 .cfi_offset 18, -4 gcc/ChangeLog: * config/riscv/riscv.cc: Set multi push regs bits. gcc/testsuite/ChangeLog: * gcc.target/riscv/zcmp_push_gpr.c: New test.
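The corrected offsets follow a regular pattern that can be reproduced with a small sketch. It assumes 4-byte save slots (RV32) and that the pushed registers occupy consecutive slots ending at the CFA, which matches the numbers in the "After patch" listing; `struct cfi_entry` and `push_cfi_offsets` are hypothetical names, not part of riscv.cc.

```c
/* One .cfi_offset directive: DWARF register number and CFA-relative
   offset in bytes.  */
struct cfi_entry { int dwarf_reg; int offset; };

/* Assign each pushed register its own slot, lowest address first.
   Returns the number of entries written.  */
int push_cfi_offsets(const int *regs, int nregs, int slot_bytes,
                     struct cfi_entry *out)
{
    for (int i = 0; i < nregs; i++) {
        out[i].dwarf_reg = regs[i];
        out[i].offset = -(nregs - i) * slot_bytes;
    }
    return nregs;
}
```

Calling it with the DWARF numbers for ra, s0, s1 and s2 (1, 8, 9, 18) gives four entries, including the one for register 9 that the "Before patch" output lacks.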
-
- Feb 22, 2025
-
-
Thomas Schwinge authored
... instead of BPF: 'error: BPF does not support dynamic stack allocation', and nvptx: 'sorry, unimplemented: target cannot support alloca'. gcc/ * config/bpf/bpf.md (define_expand "allocate_stack"): Emit 'sorry, unimplemented: dynamic stack allocation not supported'. * config/nvptx/nvptx.md (define_expand "allocate_stack") [!TARGET_SOFT_STACK && !(TARGET_PTX_7_3 && TARGET_SM52)]: Likewise. gcc/testsuite/ * gcc.target/bpf/diag-alloca-1.c: Adjust 'dg-message'. * gcc.target/bpf/diag-alloca-2.c: Likewise. * gcc.target/nvptx/alloca-1-sm_30.c: Likewise. * gcc.target/nvptx/vla-1-sm_30.c: Likewise. * lib/target-supports.exp (proc check_effective_target_alloca): Adjust comment.
-
- Feb 20, 2025
-
-
Richard Sandiford authored
While looking at PR118956, I noticed that we had some dead code left over after the removal of the vcond patterns. The can_invert_p path is no longer used. gcc/ * config/aarch64/aarch64-protos.h (aarch64_expand_sve_vec_cmp_float): Remove can_invert_p argument and change return type to void. * config/aarch64/aarch64.cc (aarch64_expand_sve_vec_cmp_float): Likewise. * config/aarch64/aarch64-sve.md (vec_cmp<mode><vpred>): Update call accordingly.
-
- Feb 19, 2025
-
-
Xi Ruoyao authored
Allowing (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding shift operation. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove. (UNSPEC_LASX_XVSRLRI): Remove. (lasx_xvsrari_<lsxfmt>): Remove. (lasx_xvsrlri_<lsxfmt>): Remove. * config/loongarch/lsx.md (UNSPEC_LSX_VSRARI): Remove. (UNSPEC_LSX_VSRLRI): Remove. (lsx_vsrari_<lsxfmt>): Remove. (lsx_vsrlri_<lsxfmt>): Remove. * config/loongarch/simd.md (simd_<optab>_imm_round_<mode>): New define_insn. (<simd_isa>_<x>v<insn>ri_<simdfmt>): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vect-shift-imm-round.c: New test.
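For readers unfamiliar with the idiom, the expression adds half of the divisor before shifting, so the result is rounded to nearest instead of truncated. A minimal scalar sketch, with a hypothetical function name and a fixed 64-bit type chosen for portability (the commit itself targets vector modes):

```c
#include <stdint.h>

/* Rounding logical right shift: adds 2^(imm-1) before shifting.
   Valid for 0 <= imm <= 63; the addition wraps if t is within
   2^(imm-1) of UINT64_MAX.  */
uint64_t round_shift_u64(uint64_t t, unsigned imm)
{
    return (t + ((uint64_t) 1 << imm >> 1)) >> imm;
}
```

For example, 5 shifted by 1 gives 3 (2.5 rounded up) where a plain shift gives 2, and with imm = 0 the added term is zero so the value is unchanged.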
-
Xi Ruoyao authored
Although this is just a special case of "a widening product whose result is used for reduction," having these standard names allows the dot product pattern to be recognized earlier, which may be beneficial to optimization. Also fix some test failures with the test cases: - gcc.dg/vect/vect-reduc-chain-2.c - gcc.dg/vect/vect-reduc-chain-3.c - gcc.dg/vect/vect-reduc-chain-dot-slp-3.c - gcc.dg/vect/vect-reduc-chain-dot-slp-4.c gcc/ChangeLog: * config/loongarch/simd.md (wvec_half): New define_mode_attr. (<su>dot_prod<wvec_half><mode>): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/loongarch/wide-mul-reduc-2.c (dg-final): Scan DOT_PROD_EXPR in optimized tree.
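The source-level shape that a dot-product pattern is meant to match looks roughly like the loop below. This is an illustrative sketch with a hypothetical function name; whether GCC actually turns it into a DOT_PROD_EXPR depends on the target, the element types and the optimization flags.

```c
#include <stdint.h>

/* Widening multiply (16-bit inputs, 32-bit product) whose result is
   accumulated into a single sum.  */
int32_t dot_i16(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t) a[i] * b[i];
    return sum;
}
```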
-
Xi Ruoyao authored
Since PR116142 has been fixed, we can now add the standard names so the compiler will generate better code if the result of a widening product is reduced. gcc/ChangeLog: * config/loongarch/simd.md (even_odd): New define_int_attr. (vec_widen_<su>mult_<even_odd>_<mode>): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/loongarch/wide-mul-reduc-1.c: New test. * gcc.target/loongarch/wide-mul-reduc-2.c: New test.
-
Xi Ruoyao authored
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates instead of hard-coded const vectors. This is not suitable for LASX where lasx_xvpick has different semantics. gcc/ChangeLog: * config/loongarch/simd.md (LVEC): New define_mode_attr. (simdfmt_as_i): Make it same as simdfmt for integer vector modes. (_f): New define_mode_attr. * config/loongarch/lsx.md (lsx_vpickev_b): Remove. (lsx_vpickev_h): Remove. (lsx_vpickev_w): Remove. (lsx_vpickev_w_f): Remove. (lsx_vpickod_b): Remove. (lsx_vpickod_h): Remove. (lsx_vpickod_w): Remove. (lsx_vpickev_w_f): Remove. (lsx_pick_evod_<mode>): New define_insn. (lsx_<x>vpick<ev_od>_<simdfmt_as_i><_f>): New define_expand.
-
Xi Ruoyao authored
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. Also reorder two operands of the outer plus in the template, so combine will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVMADDWEV): Remove. (UNSPEC_LASX_XVMADDWEV2): Remove. (UNSPEC_LASX_XVMADDWEV3): Remove. (UNSPEC_LASX_XVMADDWOD): Remove. (UNSPEC_LASX_XVMADDWOD2): Remove. (UNSPEC_LASX_XVMADDWOD3): Remove. (lasx_xvmaddwev_h_b<u>): Remove. (lasx_xvmaddwev_w_h<u>): Remove. (lasx_xvmaddwev_d_w<u>): Remove. (lasx_xvmaddwev_q_d): Remove. (lasx_xvmaddwod_h_b<u>): Remove. (lasx_xvmaddwod_w_h<u>): Remove. (lasx_xvmaddwod_d_w<u>): Remove. (lasx_xvmaddwod_q_d): Remove. (lasx_xvmaddwev_q_du): Remove. (lasx_xvmaddwod_q_du): Remove. (lasx_xvmaddwev_h_bu_b): Remove. (lasx_xvmaddwev_w_hu_h): Remove. (lasx_xvmaddwev_d_wu_w): Remove. (lasx_xvmaddwev_q_du_d): Remove. (lasx_xvmaddwod_h_bu_b): Remove. (lasx_xvmaddwod_w_hu_h): Remove. (lasx_xvmaddwod_d_wu_w): Remove. (lasx_xvmaddwod_q_du_d): Remove. * config/loongarch/lsx.md (UNSPEC_LSX_VMADDWEV): Remove. (UNSPEC_LSX_VMADDWEV2): Remove. (UNSPEC_LSX_VMADDWEV3): Remove. (UNSPEC_LSX_VMADDWOD): Remove. (UNSPEC_LSX_VMADDWOD2): Remove. (UNSPEC_LSX_VMADDWOD3): Remove. (lsx_vmaddwev_h_b<u>): Remove. (lsx_vmaddwev_w_h<u>): Remove. (lsx_vmaddwev_d_w<u>): Remove. (lsx_vmaddwev_q_d): Remove. (lsx_vmaddwod_h_b<u>): Remove. (lsx_vmaddwod_w_h<u>): Remove. (lsx_vmaddwod_d_w<u>): Remove. (lsx_vmaddwod_q_d): Remove. (lsx_vmaddwev_q_du): Remove. (lsx_vmaddwod_q_du): Remove. (lsx_vmaddwev_h_bu_b): Remove. (lsx_vmaddwev_w_hu_h): Remove. (lsx_vmaddwev_d_wu_w): Remove. (lsx_vmaddwev_q_du_d): Remove. (lsx_vmaddwod_h_bu_b): Remove. (lsx_vmaddwod_w_hu_h): Remove. (lsx_vmaddwod_d_wu_w): Remove. (lsx_vmaddwod_q_du_d): Remove. * config/loongarch/simd.md (simd_maddw_evod_<mode>_<su>): New define_insn. 
(<simd_isa>_<x>vmaddw<ev_od>_<simdfmt_w>_<simdfmt><u>): New define_expand. (simd_maddw_evod_<mode>_hetero): New define_insn. (<simd_isa>_<x>vmaddw<ev_od>_<simdfmt_w>_<simdfmt>u_<simdfmt>): New define_expand. (<simd_isa>_maddw<ev_od>_q_d<u>_punned): New define_expand. (<simd_isa>_maddw<ev_od>_q_du_d_punned): New define_expand. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vmaddwev_q_d): Define as a macro to override it with the punned expand. (CODE_FOR_lsx_vmaddwev_q_du): Likewise. (CODE_FOR_lsx_vmaddwev_q_du_d): Likewise. (CODE_FOR_lsx_vmaddwod_q_d): Likewise. (CODE_FOR_lsx_vmaddwod_q_du): Likewise. (CODE_FOR_lsx_vmaddwod_q_du_d): Likewise. (CODE_FOR_lasx_xvmaddwev_q_d): Likewise. (CODE_FOR_lasx_xvmaddwev_q_du): Likewise. (CODE_FOR_lasx_xvmaddwev_q_du_d): Likewise. (CODE_FOR_lasx_xvmaddwod_q_d): Likewise. (CODE_FOR_lasx_xvmaddwod_q_du): Likewise. (CODE_FOR_lasx_xvmaddwod_q_du_d): Likewise.
-
Xi Ruoyao authored
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove. (UNSPEC_LASX_XVHSUBW_Q_D): Remove. (UNSPEC_LASX_XVHADDW_QU_DU): Remove. (UNSPEC_LASX_XVHSUBW_QU_DU): Remove. (lasx_xvh<addsub:optab>w_h<u>_b<u>): Remove. (lasx_xvh<addsub:optab>w_w<u>_h<u>): Remove. (lasx_xvh<addsub:optab>w_d<u>_w<u>): Remove. (lasx_xvhaddw_q_d): Remove. (lasx_xvhsubw_q_d): Remove. (lasx_xvhaddw_qu_du): Remove. (lasx_xvhsubw_qu_du): Remove. (reduc_plus_scal_v4di): Call gen_lasx_haddw_q_d_punned instead of gen_lasx_xvhaddw_q_d. (reduc_plus_scal_v8si): Likewise. * config/loongarch/lsx.md (UNSPEC_LSX_VHADDW_Q_D): Remove. (UNSPEC_ASX_VHSUBW_Q_D): Remove. (UNSPEC_ASX_VHADDW_QU_DU): Remove. (UNSPEC_ASX_VHSUBW_QU_DU): Remove. (lsx_vh<addsub:optab>w_h<u>_b<u>): Remove. (lsx_vh<addsub:optab>w_w<u>_h<u>): Remove. (lsx_vh<addsub:optab>w_d<u>_w<u>): Remove. (lsx_vhaddw_q_d): Remove. (lsx_vhsubw_q_d): Remove. (lsx_vhaddw_qu_du): Remove. (lsx_vhsubw_qu_du): Remove. (reduc_plus_scal_v2di): Change the temporary register mode to V1TI, and pun the mode calling gen_vec_extractv2didi. (reduc_plus_scal_v4si): Change the temporary register mode to V1TI. * config/loongarch/simd.md (simd_h<optab>w_<mode>_<su>): New define_insn. (<simd_isa>_<x>vh<optab>w_<simdfmt_w><u>_<simdfmt><u>): New define_expand. (<simd_isa>_h<optab>w_q<u>_d<u>_punned): New define_expand. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vhaddw_q_d): Define as a macro to override with punned expand. (CODE_FOR_lsx_vhaddw_qu_du): Likewise. (CODE_FOR_lsx_vhsubw_q_d): Likewise. (CODE_FOR_lsx_vhsubw_qu_du): Likewise. (CODE_FOR_lasx_xvhaddw_q_d): Likewise. (CODE_FOR_lasx_xvhaddw_qu_du): Likewise. (CODE_FOR_lasx_xvhsubw_q_d): Likewise. (CODE_FOR_lasx_xvhsubw_qu_du): Likewise.
-
Xi Ruoyao authored
These pattern definitions are tediously long, involving 32 UNSPECs and many long hard-coded const vectors. To simplify them, we first use TImode vector operations instead of the UNSPECs, then adopt an approach from AArch64: use a special predicate to match the const vectors of odd/even indices in the define_insn's, and generate those vectors in the define_expand's. For "backward compatibility" we need to provide a "punned" version of the operations involving TImode vectors, as the intrinsics still expect DImode vectors. The stat is "201 insertions, 905 deletions." gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVADDWEV): Remove. (UNSPEC_LASX_XVADDWEV2): Remove. (UNSPEC_LASX_XVADDWEV3): Remove. (UNSPEC_LASX_XVSUBWEV): Remove. (UNSPEC_LASX_XVSUBWEV2): Remove. (UNSPEC_LASX_XVMULWEV): Remove. (UNSPEC_LASX_XVMULWEV2): Remove. (UNSPEC_LASX_XVMULWEV3): Remove. (UNSPEC_LASX_XVADDWOD): Remove. (UNSPEC_LASX_XVADDWOD2): Remove. (UNSPEC_LASX_XVADDWOD3): Remove. (UNSPEC_LASX_XVSUBWOD): Remove. (UNSPEC_LASX_XVSUBWOD2): Remove. (UNSPEC_LASX_XVMULWOD): Remove. (UNSPEC_LASX_XVMULWOD2): Remove. (UNSPEC_LASX_XVMULWOD3): Remove. (lasx_xv<addsubmul:optab>wev_h_b<u>): Remove. (lasx_xv<addsubmul:optab>wev_w_h<u>): Remove. (lasx_xv<addsubmul:optab>wev_d_w<u>): Remove. (lasx_xvaddwev_q_d): Remove. (lasx_xvsubwev_q_d): Remove. (lasx_xvmulwev_q_d): Remove. (lasx_xv<addsubmul:optab>wod_h_b<u>): Remove. (lasx_xv<addsubmul:optab>wod_w_h<u>): Remove. (lasx_xv<addsubmul:optab>wod_d_w<u>): Remove. (lasx_xvaddwod_q_d): Remove. (lasx_xvsubwod_q_d): Remove. (lasx_xvmulwod_q_d): Remove. (lasx_xvaddwev_q_du): Remove. (lasx_xvsubwev_q_du): Remove. (lasx_xvmulwev_q_du): Remove. (lasx_xvaddwod_q_du): Remove. (lasx_xvsubwod_q_du): Remove. (lasx_xvmulwod_q_du): Remove. (lasx_xv<addmul:optab>wev_h_bu_b): Remove. (lasx_xv<addmul:optab>wev_w_hu_h): Remove. (lasx_xv<addmul:optab>wev_d_wu_w): Remove. (lasx_xv<addmul:optab>wod_h_bu_b): Remove. (lasx_xv<addmul:optab>wod_w_hu_h): Remove. 
(lasx_xv<addmul:optab>wod_d_wu_w): Remove. (lasx_xvaddwev_q_du_d): Remove. (lasx_xvsubwev_q_du_d): Remove. (lasx_xvmulwev_q_du_d): Remove. (lasx_xvaddwod_q_du_d): Remove. (lasx_xvsubwod_q_du_d): Remove. * config/loongarch/lsx.md (UNSPEC_LSX_XVADDWEV): Remove. (UNSPEC_LSX_VADDWEV2): Remove. (UNSPEC_LSX_VADDWEV3): Remove. (UNSPEC_LSX_VSUBWEV): Remove. (UNSPEC_LSX_VSUBWEV2): Remove. (UNSPEC_LSX_VMULWEV): Remove. (UNSPEC_LSX_VMULWEV2): Remove. (UNSPEC_LSX_VMULWEV3): Remove. (UNSPEC_LSX_VADDWOD): Remove. (UNSPEC_LSX_VADDWOD2): Remove. (UNSPEC_LSX_VADDWOD3): Remove. (UNSPEC_LSX_VSUBWOD): Remove. (UNSPEC_LSX_VSUBWOD2): Remove. (UNSPEC_LSX_VMULWOD): Remove. (UNSPEC_LSX_VMULWOD2): Remove. (UNSPEC_LSX_VMULWOD3): Remove. (lsx_v<addsubmul:optab>wev_h_b<u>): Remove. (lsx_v<addsubmul:optab>wev_w_h<u>): Remove. (lsx_v<addsubmul:optab>wev_d_w<u>): Remove. (lsx_vaddwev_q_d): Remove. (lsx_vsubwev_q_d): Remove. (lsx_vmulwev_q_d): Remove. (lsx_v<addsubmul:optab>wod_h_b<u>): Remove. (lsx_v<addsubmul:optab>wod_w_h<u>): Remove. (lsx_v<addsubmul:optab>wod_d_w<u>): Remove. (lsx_vaddwod_q_d): Remove. (lsx_vsubwod_q_d): Remove. (lsx_vmulwod_q_d): Remove. (lsx_vaddwev_q_du): Remove. (lsx_vsubwev_q_du): Remove. (lsx_vmulwev_q_du): Remove. (lsx_vaddwod_q_du): Remove. (lsx_vsubwod_q_du): Remove. (lsx_vmulwod_q_du): Remove. (lsx_v<addmul:optab>wev_h_bu_b): Remove. (lsx_v<addmul:optab>wev_w_hu_h): Remove. (lsx_v<addmul:optab>wev_d_wu_w): Remove. (lsx_v<addmul:optab>wod_h_bu_b): Remove. (lsx_v<addmul:optab>wod_w_hu_h): Remove. (lsx_v<addmul:optab>wod_d_wu_w): Remove. (lsx_vaddwev_q_du_d): Remove. (lsx_vsubwev_q_du_d): Remove. (lsx_vmulwev_q_du_d): Remove. (lsx_vaddwod_q_du_d): Remove. (lsx_vsubwod_q_du_d): Remove. (lsx_vmulwod_q_du_d): Remove. * config/loongarch/loongarch-modes.def: Add V4TI and V1DI. * config/loongarch/loongarch-protos.h (loongarch_gen_stepped_int_parallel): New function prototype. * config/loongarch/loongarch.cc (loongarch_print_operand): Accept 'O' for printing "ev" or "od." 
(loongarch_gen_stepped_int_parallel): Implement. * config/loongarch/predicates.md (vect_par_cnst_even_or_odd_half): New define_predicate. * config/loongarch/simd.md (WVEC_HALF): New define_mode_attr. (simdfmt_w): Likewise. (zero_one): New define_int_iterator. (ev_od): New define_int_attr. (simd_<optab>w_evod_<mode:IVEC>_<su>): New define_insn. (<simd_isa>_<x>v<optab>w<ev_od>_<simdfmt_w>_<simdfmt><u>): New define_expand. (simd_<optab>w_evod_<mode>_hetero): New define_insn. (<simd_isa>_<x>v<optab>w<ev_od>_<simdfmt_w>_<simdfmt>u_<simdfmt>): New define_expand. (DIVEC): New define_mode_iterator. (<simd_isa>_<optab>w<ev_od>_q_d<u>_punned): New define_expand. (<simd_isa>_<optab>w<ev_od>_q_du_d_punned): Likewise. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vaddwev_q_d): Define as a macro to override it with the punned expand. (CODE_FOR_lsx_vaddwev_q_du): Likewise. (CODE_FOR_lsx_vsubwev_q_d): Likewise. (CODE_FOR_lsx_vsubwev_q_du): Likewise. (CODE_FOR_lsx_vmulwev_q_d): Likewise. (CODE_FOR_lsx_vmulwev_q_du): Likewise. (CODE_FOR_lsx_vaddwod_q_d): Likewise. (CODE_FOR_lsx_vaddwod_q_du): Likewise. (CODE_FOR_lsx_vsubwod_q_d): Likewise. (CODE_FOR_lsx_vsubwod_q_du): Likewise. (CODE_FOR_lsx_vmulwod_q_d): Likewise. (CODE_FOR_lsx_vmulwod_q_du): Likewise. (CODE_FOR_lsx_vaddwev_q_du_d): Likewise. (CODE_FOR_lsx_vmulwev_q_du_d): Likewise. (CODE_FOR_lsx_vaddwod_q_du_d): Likewise. (CODE_FOR_lsx_vmulwod_q_du_d): Likewise. (CODE_FOR_lasx_xvaddwev_q_d): Likewise. (CODE_FOR_lasx_xvaddwev_q_du): Likewise. (CODE_FOR_lasx_xvsubwev_q_d): Likewise. (CODE_FOR_lasx_xvsubwev_q_du): Likewise. (CODE_FOR_lasx_xvmulwev_q_d): Likewise. (CODE_FOR_lasx_xvmulwev_q_du): Likewise. (CODE_FOR_lasx_xvaddwod_q_d): Likewise. (CODE_FOR_lasx_xvaddwod_q_du): Likewise. (CODE_FOR_lasx_xvsubwod_q_d): Likewise. (CODE_FOR_lasx_xvsubwod_q_du): Likewise. (CODE_FOR_lasx_xvmulwod_q_d): Likewise. (CODE_FOR_lasx_xvmulwod_q_du): Likewise. (CODE_FOR_lasx_xvaddwev_q_du_d): Likewise. 
(CODE_FOR_lasx_xvmulwev_q_du_d): Likewise. (CODE_FOR_lasx_xvaddwod_q_du_d): Likewise. (CODE_FOR_lasx_xvmulwod_q_du_d): Likewise.
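The even/odd widening operations simplified by this commit can be described with a scalar sketch. Both functions are hypothetical illustrations: `stepped_indices` only mirrors what the name loongarch_gen_stepped_int_parallel suggests (its real signature is not shown in the message), and `addw_evod_h_b` models a signed byte-to-halfword widening add on either the even or the odd lanes.

```c
#include <stdint.h>

/* Fill idx[0..count-1] with base, base+step, base+2*step, ...
   With step 2, base 0 selects the even lanes and base 1 the odd ones.  */
void stepped_indices(int *idx, int count, int base, int step)
{
    for (int i = 0; i < count; i++)
        idx[i] = base + i * step;
}

/* Widening add: take every second 8-bit lane of a and b (even lanes
   if odd == 0, odd lanes if odd == 1), widen to 16 bits, and add.  */
void addw_evod_h_b(int16_t *dst, const int8_t *a, const int8_t *b,
                   int n_out, int odd)
{
    for (int i = 0; i < n_out; i++)
        dst[i] = (int16_t) ((int16_t) a[2 * i + odd] + (int16_t) b[2 * i + odd]);
}
```

Because the inputs are widened before the addition, a sum such as 127 + 127 is representable in the result, which is the point of the widening forms.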
-
Xi Ruoyao authored
We have some vector instructions for operations on 128-bit integer, i.e. TImode, vectors. Previously they had been modeled with unspecs, but it's more natural to just model them with TImode vector RTL expressions. In preparation, allow moving V1TImode and V2TImode vectors in LSX and LASX registers so we won't get a reload failure when we start to save TImode vectors in these registers. This implicitly depends on the vrepli optimization: without it we'd try "vrepli.q" which does not really exist and trigger an ICE. gcc/ChangeLog: * config/loongarch/lsx.md (mov<LSX:mode>): Remove. (movmisalign<LSX:mode>): Remove. (mov<LSX:mode>_lsx): Remove. * config/loongarch/lasx.md (mov<LASX:mode>): Remove. (movmisalign<LASX:mode>): Remove. (mov<LASX:mode>_lasx): Remove. * config/loongarch/loongarch-modes.def (V1TI): Add. (V2TI): Mention in the comment. * config/loongarch/loongarch.md (mode): Add V1TI and V2TI. * config/loongarch/simd.md (ALLVEC_TI): New mode iterator. (mov<ALLVEC_TI:mode>): New define_expand. (movmisalign<ALLVEC_TI:mode>): Likewise. (mov<ALLVEC_TI:mode>_simd): New define_insn_and_split.
-
Xi Ruoyao authored
For a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd} we just want vrepli.b $vr0, 0xdd but the compiler actually produces a load: la.local $r14,.LC0 vld $vr0,$r14,0 This is because we only tried vrepli.d, which wouldn't work. Try all vrepli instructions when materializing a const int vector to fix it. gcc/ChangeLog: * config/loongarch/loongarch-protos.h (loongarch_const_vector_vrepli): New function prototype. * config/loongarch/loongarch.cc (loongarch_const_vector_vrepli): Implement. (loongarch_const_insns): Call loongarch_const_vector_vrepli instead of loongarch_const_vector_same_int_p. (loongarch_split_vector_move_p): Likewise. (loongarch_output_move): Use loongarch_const_vector_vrepli to pun operand[1] into a better mode if it's a const int vector, and decide the suffix of [x]vrepli with the new mode. * config/loongarch/constraints.md (YI): Call loongarch_const_vector_vrepli instead of loongarch_const_vector_same_int_p. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vrepli.c: New test.
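The idea of trying every element width can be sketched as follows. This is not the body of loongarch_const_vector_vrepli: the function name is hypothetical, the vector is taken as 16 little-endian bytes, and the signed 10-bit immediate range (-512..511) is an assumption about the vrepli encoding.

```c
#include <stdint.h>

/* Return the narrowest element width in bytes (1, 2, 4 or 8) at which
   the 16-byte constant is a splat of one element whose sign-extended
   value fits the assumed signed 10-bit immediate; store that value in
   *imm_out.  Return 0 if no width works.  */
int vrepli_elem_bytes(const uint8_t bytes[16], int64_t *imm_out)
{
    for (int w = 1; w <= 8; w *= 2) {
        int splat = 1;
        for (int i = w; i < 16 && splat; i++)
            if (bytes[i] != bytes[i % w])
                splat = 0;
        if (!splat)
            continue;

        /* Assemble the element (little-endian) and sign-extend it.  */
        uint64_t u = 0;
        for (int i = 0; i < w; i++)
            u |= (uint64_t) bytes[i] << (8 * i);
        int64_t v;
        if (w == 8) {
            v = (int64_t) u;
        } else {
            uint64_t sign = (uint64_t) 1 << (8 * w - 1);
            v = (int64_t) ((u ^ sign) - sign);
        }
        if (v >= -512 && v <= 511) {
            *imm_out = v;
            return w;
        }
    }
    return 0;
}
```

For the all-0xdd vector from the message, the byte view yields the value -35, which fits, while the same bits viewed as 64-bit elements do not, consistent with vrepli.d alone being insufficient.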
-
Xi Ruoyao authored
Since r15-1120, multi-word shifts/rotates produce PLUS instead of IOR. This is generally a good thing (it allows using our alsl instruction or similar instructions on other architectures), but it prevents us from using bytepick. For example, if we shift a __int128 by 16 bits, the higher word can be produced via a single bytepick.d instruction with immediate 2, but we got: srli.d $r12,$r4,48 slli.d $r5,$r5,16 slli.d $r4,$r4,16 add.d $r5,$r12,$r5 jr $r1 This didn't work with GCC 14, but after r15-6490 it's supposed to work if IOR was used instead of PLUS. To fix this, add a code iterator to match IOR, XOR, and PLUS and use it instead of just IOR if we know the operands have no overlapping bits. gcc/ChangeLog: PR target/115478 * config/loongarch/loongarch.md (any_or_plus): New define_code_iterator. (bstrins_<mode>_for_ior_mask): Use any_or_plus instead of ior. (bytepick_w_<bytepick_imm>): Likewise. (bytepick_d_<bytepick_imm>): Likewise. (bytepick_d_<bytepick_imm>_rev): Likewise. gcc/testsuite/ChangeLog: PR target/115478 * gcc.target/loongarch/bytepick_shift_128.c: New test.
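The reason IOR, XOR and PLUS are interchangeable here is that the two shifted halves never have a set bit in the same position. A minimal sketch with hypothetical function names, computing the high word of a 128-bit left shift in all three ways:

```c
#include <stdint.h>

/* High 64 bits of ((hi:lo) << s) for 1 <= s <= 63.  (hi << s) has its
   low s bits clear and (lo >> (64 - s)) has only its low s bits set,
   so the operands share no bits and |, ^ and + all give the same result.  */
uint64_t shl128_hi_ior(uint64_t hi, uint64_t lo, unsigned s)
{
    return (hi << s) | (lo >> (64 - s));
}

uint64_t shl128_hi_xor(uint64_t hi, uint64_t lo, unsigned s)
{
    return (hi << s) ^ (lo >> (64 - s));
}

uint64_t shl128_hi_plus(uint64_t hi, uint64_t lo, unsigned s)
{
    return (hi << s) + (lo >> (64 - s));
}
```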
-
- Feb 18, 2025
-
-
Robin Dapp authored
In PR115703 we fuse two vsetvls: Fuse curr info since prev info compatible with it: prev_info: VALID (insn 438, bb 2) Demand fields: demand_ge_sew demand_non_zero_avl SEW=32, VLMUL=m1, RATIO=32, MAX_SEW=64 TAIL_POLICY=agnostic, MASK_POLICY=agnostic AVL=(reg:DI 0 zero) VL=(reg:DI 9 s1 [312]) curr_info: VALID (insn 92, bb 20) Demand fields: demand_ratio_and_ge_sew demand_avl SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64 TAIL_POLICY=agnostic, MASK_POLICY=agnostic AVL=(const_int 4 [0x4]) VL=(nil) prev_info after fused: VALID (insn 438, bb 2) Demand fields: demand_ratio_and_ge_sew demand_avl SEW=64, VLMUL=mf2, RATIO=64, MAX_SEW=64 TAIL_POLICY=agnostic, MASK_POLICY=agnostic AVL=(const_int 4 [0x4]) VL=(nil). The result is vsetvl zero, zero, e64, mf2, ta, ma. The previous vsetvl set vl = 4 but here we wrongly set it to vl = 2. As all the following vsetvls only ever change the ratio we never recover. The issue is quite difficult to trigger because we can often deduce the value of d at runtime. Then every check for the value of d will be optimized away. The last known bad commit is r15-3458-g5326306e7d9d36. With that commit the output is wrong but -fno-schedule-insns makes it correct. From the next commit on the issue is latent. I still added the PR's test as scan and run check even if they don't trigger right now. Not sure if the run test will ever fail but well. I verified that the patch fixes the issue when applied on top of r15-3458-g5326306e7d9d36. PR target/115703 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Use max_sew for calculating the new LMUL. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr115703-run.c: New test. * gcc.target/riscv/rvv/autovec/pr115703.c: New test.
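The arithmetic behind the wrong vl can be checked with the standard RVV relations RATIO = SEW / LMUL and VLMAX = VLEN * LMUL / SEW. The sketch below uses hypothetical helper names, represents LMUL in eighths so fractional values stay integral, and assumes VLEN = 256 only because that value reproduces the 4 and 2 quoted in the message.

```c
/* LMUL is passed in eighths: mf8 = 1, mf4 = 2, mf2 = 4, m1 = 8,
   m2 = 16, m4 = 32, m8 = 64.  */
unsigned sew_lmul_ratio(unsigned sew_bits, unsigned lmul_x8)
{
    return 8 * sew_bits / lmul_x8;
}

/* Maximum number of elements for a given VLEN, SEW and LMUL.  */
unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul_x8)
{
    return vlen_bits * lmul_x8 / (8 * sew_bits);
}
```

With these definitions, e64/m1 has ratio 64 and VLMAX 4 at VLEN = 256, whereas e64/mf2 has ratio 128 and VLMAX 2, so the fused info's combination of SEW=64, VLMUL=mf2 and RATIO=64 is not self-consistent.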
-
Soumya AR authored
generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses generic_prefetch_tune in generic_armv8_a_tunings. This patch updates the pointer to generic_armv8_a_prefetch_tune. This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. Signed-off-by:
Soumya AR <soumyaa@nvidia.com> gcc/ChangeLog: * config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch struct pointer.
-
Pan Li authored
This patch avoids an ICE when the target attribute specifies an xlen different from the one given on the command line, e.g. compiling with rv64gc while the target attribute says rv32gcv_zbb. For example: long foo (long a, long b) __attribute__((target("arch=rv32gcv_zbb"))); long foo (long a, long b) { return a + (b * 2); } When compiled with rv64gc -O3, this triggers an ICE similar to the one below during the RTL pass fwprop1: test.c: In function ‘foo’: test.c:10:1: internal compiler error: in add_use, at rtl-ssa/accesses.cc:1234 10 | } | ^ 0x44d6b9d internal_error(char const*, ...) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517 0x44a26a6 fancy_abort(char const*, int, char const*) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722 0x408fac9 rtl_ssa::function_info::add_use(rtl_ssa::use_info*) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/accesses.cc:1234 0x40a5eea rtl_ssa::function_info::create_reg_use(rtl_ssa::function_info::build_info&, rtl_ssa::insn_info*, rtl_ssa::resource_info) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/insns.cc:496 0x4456738 rtl_ssa::function_info::add_artificial_accesses(rtl_ssa::function_info::build_info&, df_ref_flags) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:900 0x4457297 rtl_ssa::function_info::start_block(rtl_ssa::function_info::build_info&, rtl_ssa::bb_info*) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1082 0x4453627 rtl_ssa::function_info::bb_walker::before_dom_children(basic_block_def*) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:118 0x3e9f3fb dom_walker::walk(basic_block_def*) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/domwalk.cc:311 0x445806f rtl_ssa::function_info::process_all_blocks() 
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1298 0x40a22d3 rtl_ssa::function_info::function_info(function*) /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/functions.cc:51 0x3ec3f80 fwprop_init /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:893 0x3ec420d fwprop /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:963 0x3ec43ad execute Considering that we are in stage 4, we just report an error for the above scenario when we detect that the cmd xlen differs from the target attribute in the TARGET_OPTION_VALID_ATTRIBUTE_P target hook implementation. PR target/118540 gcc/ChangeLog: * config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch): Report error when cmd xlen is different with target attribute. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr118540-1.c: New test. * gcc.target/riscv/rvv/base/pr118540-2.c: New test. Signed-off-by:
Pan Li <pan2.li@intel.com>
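The kind of check described in this commit can be sketched in a few lines. This is an illustration only, not the code in riscv_target_attr_parser::parse_arch: both helper names are hypothetical, and the xlen is read naively from the "rv32"/"rv64" prefix of the arch string.

```c
#include <string.h>

/* Return 32 or 64 for arch strings starting with "rv32" or "rv64",
   and 0 for anything else.  */
int arch_xlen(const char *arch)
{
    if (strncmp(arch, "rv32", 4) == 0)
        return 32;
    if (strncmp(arch, "rv64", 4) == 0)
        return 64;
    return 0;
}

/* Accept a target attribute only if its xlen matches the xlen of the
   command-line arch; a mismatch is reported as an error instead of
   being allowed to reach later passes.  */
int target_attr_xlen_ok(const char *cmdline_arch, const char *attr_arch)
{
    int cmd = arch_xlen(cmdline_arch);
    int attr = arch_xlen(attr_arch);
    return cmd != 0 && cmd == attr;
}
```

With the values from the message, a command line of rv64gc and an attribute of rv32gcv_zbb is rejected, while rv64gcv_zbb would be accepted.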
-