  1. Mar 05, 2025
    • PR rtl-optimization/119046: aarch64: Fix PARALLEL mode for vec_perm DUP expansion · ff505948
      Kyrylo Tkachov authored
      
      The PARALLEL created in aarch64_evpc_dup is used to hold the lane number.
      It is not appropriate for it to have a vector mode.
      Other such uses use VOIDmode.
      Do this here as well.
      This avoids the risk of generic code treating the PARALLEL as trapping when it
      has floating-point mode.
      
      Bootstrapped and tested on aarch64-none-linux-gnu.
      
      Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
      
      	PR rtl-optimization/119046
      	* config/aarch64/aarch64.cc (aarch64_evpc_dup): Use VOIDmode for
      	PARALLEL.
    • LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084] · 4856292f
      Xi Ruoyao authored
      They could be incorrectly reordered with store instructions like st.b
      because the RTL expression does not have a memory_operand or a (mem)
      expression.  The incorrect reorder has been observed in openh264 LTO
      build.
      
      Expand them to a (mem) expression instead of unspec to fix the issue.
      Then we need to make loongarch_address_insns return 1 for
      ADDRESS_REG_REG because the constraint "R" expects this behavior, or
      the vldx instruction will be considered invalid by the register
      allocation pass and turned into add.d + vld.  Apply the ADDRESS_REG_REG
      penalty in loongarch_address_cost instead; loongarch_rtx_costs should
      then also call loongarch_address_cost instead of
      loongarch_address_insns.
      
      Closes: https://github.com/cisco/openh264/issues/3857
      
      gcc/ChangeLog:
      
      	PR target/119084
      	* config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove.
      	(lasx_xvldx): Remove.
      	* config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove.
      	(lsx_vldx): Remove.
      	* config/loongarch/simd.md (QIVEC): New define_mode_iterator.
      	(<simd_isa>_<x>vldx): New define_expand.
      	* config/loongarch/loongarch.cc (loongarch_address_insns_1): New
      	static function with most logic factored out from ...
      	(loongarch_address_insns): ... here.  Call
      	loongarch_address_insns_1 with reg_reg_cost = 1.
      	(loongarch_address_cost): Call loongarch_address_insns_1 with
      	reg_reg_cost = la_addr_reg_reg_cost.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/119084
      	* gcc.target/loongarch/pr119084.c: New test.
  2. Mar 04, 2025
    • Break false dependency chain on Zen5 · 8c4a00f9
      Jan Hubicka authored
      Some Zen5 variants have a false dependency on the tzcnt, blsi, blsr and blsmsk
      instructions.  This can be tested with the following benchmark:
      
      jh@shroud:~> cat ee.c
      int
      main()
      {
             int a = 10;
             int b = 0;
             for (int i = 0; i < 1000000000; i++)
             {
      #ifdef BREAK
                     asm volatile ("xor %0, %0": "=r" (b));
      #endif
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
             }
             return 0;
      }
      jh@shroud:~> cat bmk.sh
      gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 ; time ./a.out
      jh@shroud:~> sh bmk.sh tzcnt
      
      real    0m0.886s
      user    0m0.886s
      sys     0m0.000s
      
      real    0m0.886s
      user    0m0.886s
      sys     0m0.000s
      
      jh@shroud:~> sh bmk.sh blsi
      
      real    0m0.979s
      user    0m0.979s
      sys     0m0.000s
      
      real    0m2.418s
      user    0m2.418s
      sys     0m0.000s
      
      jh@shroud:~> sh bmk.sh blsr
      
      real    0m0.986s
      user    0m0.986s
      sys     0m0.000s
      
      real    0m2.422s
      user    0m2.421s
      sys     0m0.000s
      jh@shroud:~> sh bmk.sh blsmsk
      
      real    0m0.973s
      user    0m0.973s
      sys     0m0.000s
      
      real    0m2.422s
      user    0m2.422s
      sys     0m0.000s
      
      We already have a tunable that controls tzcnt together with lzcnt and popcnt.
      Since it seems that only tzcnt is affected, I added a new tunable to control tzcnt
      only.  I also added splitters for blsi/blsr/blsmsk, implemented analogously to the
      existing splitter for lzcnt.
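      For reference, a minimal sketch in portable C of what the three BMI1
      instructions compute.  Each result depends only on the source operand,
      which is why any dependency on the old destination value is a false one.
      The function names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* blsi: isolate the lowest set bit of X.  */
uint32_t
blsi_ref (uint32_t x)
{
  return x & (0u - x);
}

/* blsr: clear the lowest set bit of X.  */
uint32_t
blsr_ref (uint32_t x)
{
  return x & (x - 1u);
}

/* blsmsk: mask of all bits up to and including the lowest set bit of X.  */
uint32_t
blsmsk_ref (uint32_t x)
{
  return x ^ (x - 1u);
}

/* Small self-check on one value (12 is binary 1100).  */
void
bls_self_check (void)
{
  assert (blsi_ref (12u) == 4u);
  assert (blsr_ref (12u) == 8u);
  assert (blsmsk_ref (12u) == 7u);
}
```

      Calling bls_self_check () from a test driver exercises all three helpers.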
      
      The patch is neutral on SPEC.  We produce blsi and blsr in some internal loops, but
      they usually have the same destination as source.  However, it is good to break the
      dependency chain to avoid pathological cases, and it is quite cheap overall, so I
      think we want to enable this for generic.  I will send a follow-up patch for this.
      
      Bootstrapped/regtested x86_64-linux, will commit it shortly.
      
      gcc/ChangeLog:
      
      	* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro.
      	(TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro.
      	* config/i386/i386.md (*bmi_blsi_<mode>): Add splitter for false
      	dependency.
      	(*bmi_blsi_<mode>_ccno): Add splitter for false dependency.
      	(*bmi_blsi_<mode>_falsedep): New pattern.
      	(*bmi_blsmsk_<mode>): Add splitter for false dependency.
      	(*bmi_blsmsk_<mode>_falsedep): New pattern.
      	(*bmi_blsr_<mode>): Add splitter for false dependency.
      	(*bmi_blsr_<mode>_cmp): Add splitter for false dependency
      	(*bmi_blsr_<mode>_cmp_falsedep): New pattern.
      	* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New tune.
      	(X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/blsi.c: New test.
      	* gcc.target/i386/blsmsk.c: New test.
      	* gcc.target/i386/blsr.c: New test.
    • Make ix86_macro_fusion_pair_p and ix86_fuse_mov_alu_p match current CPUs · c84be624
      Jan Hubicka authored
      The current implementation of the fusion predicates misses some common
      fusion cases on zen and more recent cores.  I added knobs for the
      individual conditionals we test.

       1) I split checks for fusing ALU with conditional operands when the ALU
       has memory operand.  This seems to be supported by zen3+ and by
       tigerlake and cooperlake (according to Agner Fog's manual)

       2) znver4 and 5 support fusion of ALU and conditional even if ALU has
          memory and immediate operands.
          This seems to be relatively important, enabling 25% more fusions on
          gcc bootstrap.

       3) no CPU supports fusing when ALU contains IP relative memory
          references.  I added a separate knob so we do not forget about this if
          this gets supported later.

      The patch does not solve the limitation of sched that fuse pairs must be
      adjacent on input and the first operation must be single-set.  Fixing
      single-set is easy (I have a separate patch for this); for non-adjacent
      pairs we need bigger surgery.

      To verify what the CPU really does, I made a simple test script.
      
      jh@ryzen3:~> cat fuse-test.c
              int b;
              const int z = 0;
              const int o = 1;
              int
      main()
      {
              int a = 1000000000;
              int b;
              int z = 0;
              int o = 1;
              asm volatile ("\n"
      ".L1234:\n"
              "nop\n"
              "subl   %3, %0\n"
      
              "movl %0, %1\n"
              "cmpl     %2, %1\n"
              "movl %0, %1\n"
              "test %1, %1\n"
      
              "nop\n"
              "jne    .L1234":"=a"(a),
              "=m"(b)
              "=r"(b)
              :
              "m"(z),
              "m"(o),
              "i"(0),
              "i"(1),
              "0"(a)
                      );
      }
      jh@ryzen3:~> cat fuse-test.sh
      EVENT=ex_ret_fused_instr
      dotest()
      {
      gcc -O2  fuse-test.c $* -o fuse-cmp-imm-mem-nofuse
      perf stat -e $EVENT ./fuse-cmp-imm-mem-nofuse  2>&1 | grep $EVENT
      gcc -O2 fuse-test.c -DFUSE $* -o fuse-cmp-imm-mem-fuse
      perf stat  -e $EVENT ./fuse-cmp-imm-mem-fuse 2>&1 | grep $EVENT
      }
      
      echo ALU with immediate
      dotest
      echo ALU with memory
      dotest -D MEM
      echo ALU with IP relative memory
      dotest -D MEM -D IPRELATIVE
      echo CMP with immediate
      dotest -D CMP
      echo CMP with memory
      dotest -D CMP -D MEM
      echo CMP with memory and immediate
      dotest -D CMP -D MEMIMM
      echo CMP with IP relative memory
      dotest -D CMP -D MEM -D IPRELATIVE
      echo TEST
      dotest -D TEST
      
      On zen5 I get:
      ALU with immediate
                  20,345      ex_ret_fused_instr:u
           1,000,020,278      ex_ret_fused_instr:u
      ALU with memory
                  20,367      ex_ret_fused_instr:u
           1,000,020,290      ex_ret_fused_instr:u
      ALU with IP relative memory
                  20,395      ex_ret_fused_instr:u
                  20,403      ex_ret_fused_instr:u
      CMP with immediate
                  20,369      ex_ret_fused_instr:u
           1,000,020,301      ex_ret_fused_instr:u
      CMP with memory
                  20,314      ex_ret_fused_instr:u
           1,000,020,341      ex_ret_fused_instr:u
      CMP with memory and immediate
                  20,372      ex_ret_fused_instr:u
           1,000,020,266      ex_ret_fused_instr:u
      CMP with IP relative memory
                  20,382      ex_ret_fused_instr:u
                  20,369      ex_ret_fused_instr:u
      TEST
                  20,346      ex_ret_fused_instr:u
           1,000,020,301      ex_ret_fused_instr:u
      
      IP relative memory seems to not be documented.
      
      On zen3/4 I get:
      
      ALU with immediate
                  20,263      ex_ret_fused_instr:u
           1,000,020,051      ex_ret_fused_instr:u
      ALU with memory
                  20,255      ex_ret_fused_instr:u
           1,000,020,056      ex_ret_fused_instr:u
      ALU with IP relative memory
                  20,253      ex_ret_fused_instr:u
                  20,266      ex_ret_fused_instr:u
      CMP with immediate
                  20,264      ex_ret_fused_instr:u
           1,000,020,052      ex_ret_fused_instr:u
      CMP with memory
                  20,253      ex_ret_fused_instr:u
           1,000,019,794      ex_ret_fused_instr:u
      CMP with memory and immediate
                  20,260      ex_ret_fused_instr:u
                  20,264      ex_ret_fused_instr:u
      CMP with IP relative memory
                  20,258      ex_ret_fused_instr:u
                  20,256      ex_ret_fused_instr:u
      TEST
                  20,261      ex_ret_fused_instr:u
           1,000,020,048      ex_ret_fused_instr:u
      
      zen1 and 2 get:
      
      ALU with immediate
                  21,610      ex_ret_fus_brnch_inst:u
                  21,697      ex_ret_fus_brnch_inst:u
      ALU with memory
                  21,479      ex_ret_fus_brnch_inst:u
                  21,747      ex_ret_fus_brnch_inst:u
      ALU with IP relative memory
                  21,623      ex_ret_fus_brnch_inst:u
                  21,684      ex_ret_fus_brnch_inst:u
      CMP with immediate
                  21,708      ex_ret_fus_brnch_inst:u
           1,000,021,288      ex_ret_fus_brnch_inst:u
      CMP with memory
                  21,689      ex_ret_fus_brnch_inst:u
           1,000,004,270      ex_ret_fus_brnch_inst:u
      CMP with memory and immediate
                  21,604      ex_ret_fus_brnch_inst:u
                  21,671      ex_ret_fus_brnch_inst:u
      CMP with IP relative memory
                  21,589      ex_ret_fus_brnch_inst:u
                  21,602      ex_ret_fus_brnch_inst:u
      TEST
                  21,600      ex_ret_fus_brnch_inst:u
           1,000,021,233      ex_ret_fus_brnch_inst:u
      
      I tested the patch on zen3 and zen5 with spec2k17 and it seems neutral; however,
      the number of fusions does go up.
      
      Bootstrapped/regtested x86_64-linux, I plan to commit it tomorrow.
      
      Honza
      
      gcc/ChangeLog:
      
      	* config/i386/i386.h (TARGET_FUSE_ALU_AND_BRANCH_MEM): New macro.
      	(TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM): New macro.
      	(TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New macro.
      	* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Support
      	non-single-set.
      	(ix86_macro_fusion_pair_p): Allow ALU which only clobbers;
      	be more careful about immediates; check TARGET_FUSE_ALU_AND_BRANCH_MEM,
      	TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM, TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE;
      	verify that we never use unsigned checks with inc/dec.
      	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): New tune.
      	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM): New tune.
      	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM_IMM): New tune.
      	(X86_TUNE_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New tune.
    • aarch64: force operand to fresh register to avoid subreg issues [PR118892] · d883f323
      Tamar Christina authored
      When the input is already a subreg and we try to make a paradoxical
      subreg out of it for copysign this can fail if it violates the subreg
      relationship.
      
      Use force_lowpart_subreg instead of lowpart_subreg to then force the
      results to a register instead of ICEing.
      
      gcc/ChangeLog:
      
      	PR target/118892
      	* config/aarch64/aarch64.md (copysign<GPF:mode>3): Use
      	force_lowpart_subreg instead of lowpart_subreg.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/118892
      	* gcc.target/aarch64/copysign-pr118892.c: New test.
    • Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976] · 78380fd7
      Richard Sandiford authored
      There was an embarrassing typo in the folding of BIT_NOT_EXPR for
      POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
      how that happened, but it might have been due to the way that
      ~x is implemented as -1 - x internally.
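
      The identity mentioned above can be checked on ordinary two's-complement
      integers.  This is a minimal sketch in plain C, not the poly_int code;
      the function names are hypothetical.

```c
#include <assert.h>

/* Bitwise NOT written as the message describes it: ~x == -1 - x.  */
int
not_via_subtraction (int x)
{
  return -1 - x;
}

/* Plain negation, which is what using - instead of ~ computes.  */
int
negate (int x)
{
  return -x;
}

/* The identity holds for every value tried, while negation is always
   off by one from the bitwise NOT: -x == ~x + 1.  */
void
check_bit_not_identity (void)
{
  for (int x = -4; x <= 4; x++)
    {
      assert (not_via_subtraction (x) == ~x);
      assert (negate (x) == ~x + 1);
    }
}
```

      So folding BIT_NOT_EXPR with - instead of ~ gives a result that is
      wrong by exactly one for plain integers.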
      
      gcc/
      	PR tree-optimization/118976
      	* fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR.
      	* config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function.
      	(aarch64_run_selftests): Run it.
  3. Mar 03, 2025
    • aarch64: Ignore target pragmas while defining intrinsics · 71355700
      Andrew Carlotti authored
      Refactor the switcher classes into two separate classes:
      
      - sve_alignment_switcher takes the alignment switching functionality,
        and is used only for ABI correctness when defining sve structure
        types.
      - aarch64_target_switcher takes the rest of the functionality of
        aarch64_simd_switcher and sve_switcher, and gates simd/sve specific
        parts upon the specified feature flags.
      
      Additionally, aarch64_target_switcher now adds dependencies of the
      specified flags (which adds +fcma and +bf16 to some intrinsic
      declarations), and unsets current_target_pragma.
      
      This last change fixes an internal bug where we would sometimes add a
      user specified target pragma (stored in current_target_pragma) on top of
      an internally specified target architecture while initialising
      intrinsics with `#pragma GCC aarch64 "arm_*.h"`.  As far as I can tell, this
      has no visible impact at the moment.  However, the unintended target
      feature combinations lead to unwanted behaviour in an under-development
      patch.
      
      This also fixes a missing Makefile dependency, which was due to
      aarch64-sve-builtins.o incorrectly depending on the undefined $(REG_H).
      The correct $(REGS_H) dependency is added to the switcher's new source
      location.
      
      gcc/ChangeLog:
      
      	* common/config/aarch64/aarch64-common.cc
      	(struct aarch64_extension_info): Add field.
      	(aarch64_get_required_features): New.
      	* config/aarch64/aarch64-builtins.cc
      	(aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
      	(aarch64_target_switcher::aarch64_target_switcher): ...this,
      	and extend to handle sve, nosimd and target pragmas.
      	(aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
      	(aarch64_target_switcher::~aarch64_target_switcher): ...this,
      	and extend to handle sve, nosimd and target pragmas.
      	(handle_arm_acle_h): Use aarch64_target_switcher.
      	(handle_arm_neon_h): Rename switcher and pass explicit flags.
      	(aarch64_general_init_builtins): Ditto.
      	* config/aarch64/aarch64-protos.h
      	(class aarch64_simd_switcher): Rename to...
      	(class aarch64_target_switcher): ...this, and add new members.
      	(aarch64_get_required_features): New prototype.
      	* config/aarch64/aarch64-sve-builtins.cc
      	(sve_switcher::sve_switcher): Delete
      	(sve_switcher::~sve_switcher): Delete
      	(sve_alignment_switcher::sve_alignment_switcher): New
      	(sve_alignment_switcher::~sve_alignment_switcher): New
      	(register_builtin_types): Use alignment switcher
      	(init_builtins): Rename switcher.
      	(handle_arm_neon_sve_bridge_h): Ditto.
      	(handle_arm_sme_h): Ditto.
      	(handle_arm_sve_h): Ditto, and use alignment switcher.
      	* config/aarch64/aarch64-sve-builtins.h
      	(class sve_switcher): Delete.
      	(class sme_switcher): Delete.
      	(class sve_alignment_switcher): New.
      	* config/aarch64/t-aarch64 (aarch64-builtins.o): Add $(REGS_H).
      	(aarch64-sve-builtins.o): Remove $(REG_H).
    • arm: remove some redundant zero_extend ops on thumb1 · 2a502f9e
      Richard Earnshaw authored
      The code in gcc.target/arm/unsigned-extend-1.c really should not need any
      unsigned extension operations when the optimizers are used.  For Arm
      and thumb2 that is indeed the case, but for thumb1 code it gets more
      complicated as there are too many instructions for combine to look at.
      For thumb1 we end up with two redundant zero_extend patterns which are
      not removed: the first after the subtract instruction and the second of
      the final boolean result.
      
      We can partially fix this (for the second case above) by adding a new
      split pattern for LEU and GEU patterns which work because the two
      instructions for the [LG]EU pattern plus the redundant extension
      instruction are combined into a single insn, which we can then split
      using the 3->2 method back into the two insns of the [LG]EU sequence.
      
      Because we're missing the optimization for all thumb1 cases (not just
      those architectures with UXTB), I've adjusted the testcase to detect all
      the idioms that we might use for zero-extending a value, namely:
      
             UXTB
             AND ...#255 (in thumb1 this would require a register to hold 255)
             LSL ... #24; LSR ... #24
      
      but I've also marked this test as XFAIL for thumb1 because we can't yet
      eliminate the first of the two extend instructions.
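
      The three idioms listed above all compute the same value.  A minimal C
      sketch of the equivalence follows; the function names are hypothetical
      and the code only models the arithmetic, not the instructions GCC emits.

```c
#include <assert.h>
#include <stdint.h>

/* UXTB: keep the low byte and zero-extend it.  */
uint32_t
zext8_uxtb (uint32_t x)
{
  return (uint8_t) x;
}

/* AND ... #255.  */
uint32_t
zext8_and (uint32_t x)
{
  return x & 255u;
}

/* LSL ... #24; LSR ... #24.  */
uint32_t
zext8_shifts (uint32_t x)
{
  return (x << 24) >> 24;
}

/* All three forms agree on a sample value.  */
void
zext8_self_check (void)
{
  uint32_t x = 0x1234abcdu;
  assert (zext8_uxtb (x) == 0xcdu);
  assert (zext8_and (x) == zext8_uxtb (x));
  assert (zext8_shifts (x) == zext8_uxtb (x));
}
```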
      
      gcc/
      	* config/arm/thumb1.md (split patterns for GEU and LEU): New.
      
      gcc/testsuite:
      	* gcc.target/arm/unsigned-extend-1.c: Expand check for any
      	insn suggesting a zero-extend.  XFAIL for thumb1 code.
  4. Mar 02, 2025
    • [RISC-V][PR target/118934] Fix ICE in RISC-V long branch support · 67e824c2
      Jeff Law authored
      I'm not sure if I goof'd this or if I merely upstreamed someone else's goof.
      Either way the long branch code isn't working correctly.
      
      We were using 'n' as the output modifier to negate the condition.  But 'n' has
      a special meaning elsewhere, so when presented with a condition rather than
      what was expected, boom, the compiler ICE'd.
      
      Thankfully there's only a few places where we were using %n which I turned into
      %r.
      
      The BZ entry includes a good testcase, it just takes a long time to compile as
      it's trying to create the out-of-range scenario.  I'm not including the
      testcase due to how long it takes, but I did test it locally to ensure it's
      working properly now.
      
      I'm sure that with a little bit of work I could create a testcase that worked
      before and fails with the trunk (by taking advantage of the fuzziness in length
      computations).  So I'm going to consider this a regression.
      
      Will push to the trunk after pre-commit testing does its thing.
      
      	PR target/118934
      gcc/
      	* config/riscv/corev.md (cv_branch): Adjust output template.
      	(branch): Likewise.
      	* config/riscv/riscv.md (branch): Likewise.
      	* config/riscv/riscv.cc (riscv_asm_output_opcode): Handle 'r' rather
      	than 'n'.
    • avr: Fix up avr_print_operand diagnostics [PR118991] · 047b7f9a
      Jakub Jelinek authored
      As can be seen in gcc/po/gcc.pot:
       #: config/avr/avr.cc:2754
       #, c-format
       msgid "bad I/O address 0x"
       msgstr ""
      
      exgettext couldn't retrieve the whole format string in this case,
      because it uses a macro in the middle.  output_operand_lossage
      is a c-format function though, so we can't use %wx to print HOST_WIDE_INT,
      and HOST_WIDE_INT_PRINT_HEX_PURE is %lx on some hosts, %llx on others
      and %I64x on others, so it isn't really translatable that way.
      
      As Joseph mentioned in the PR, there is no easy way around this
      but go through a temporary buffer, which the following patch does.
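
      The temporary-buffer approach can be sketched in standalone C as
      follows.  This is an illustration only, assuming int64_t and PRIx64 as
      stand-ins for HOST_WIDE_INT and HOST_WIDE_INT_PRINT_HEX_PURE; the
      function names and the exact message text are hypothetical and are not
      taken from avr.cc.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format VALUE as hex digits (no "0x" prefix) into BUF and return BUF.
   The host-dependent format macro is used only here.  */
const char *
format_hex_to_buffer (char *buf, size_t size, int64_t value)
{
  snprintf (buf, size, "%" PRIx64, (uint64_t) value);
  return buf;
}

/* Build the diagnostic from one fixed, translatable format string that
   takes the pre-formatted digits through %s.  */
int
format_bad_io_address (char *out, size_t size, int64_t value)
{
  char digits[17]; /* 16 hex digits plus the terminating NUL.  */
  format_hex_to_buffer (digits, sizeof digits, value);
  return snprintf (out, size, "bad I/O address 0x%s", digits);
}

/* Small self-check of the resulting text.  */
void
format_self_check (void)
{
  char out[64];
  format_bad_io_address (out, sizeof out, 0x3f);
  assert (strcmp (out, "bad I/O address 0x3f") == 0);
}
```

      The format string seen by the translation tooling is then the same
      complete literal on every host.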
      
      2025-03-02  Jakub Jelinek  <jakub@redhat.com>
      
      	PR translation/118991
      	* config/avr/avr.cc (avr_print_operand): Print ival into
      	a temporary buffer and use %s in output_operand_lossage to make
      	the diagnostics translatable.
  5. Mar 01, 2025
    • [PATCH] H8/300, libgcc: PR target/114222 For HImode call internal ffs() implementation instead of an external one · 898f22d1
      Jan Dubiec authored
      
      When INT_TYPE_SIZE < BITS_PER_WORD, gcc emits a call to an external ffs()
      implementation instead of a call to "__builtin_ffs()" – see function
      init_optabs() in <SRCROOT>/gcc/optabs-libfuncs.cc. External ffs()
      (which is usually the one from newlib) in turn calls __builtin_ffs(),
      which causes infinite recursion and stack overflow. This patch overrides
      the default gcc behaviour for H8/300H (and newer) and provides a generic
      ffs() implementation for HImode.
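
      A generic find-first-set for a 16-bit value can be written without
      calling any builtin, which is what avoids the recursion.  The sketch
      below is an illustration under that assumption, not the contents of
      the new ffshi2.c; the name ffs16_sketch is hypothetical.

```c
#include <assert.h>

/* Return one plus the index of the least significant set bit of X,
   or 0 if X is zero (the usual ffs() contract).  */
int
ffs16_sketch (unsigned short x)
{
  int pos;

  if (x == 0)
    return 0;

  /* Shift right until the lowest bit is set, counting positions.  */
  for (pos = 1; (x & 1u) == 0; pos++)
    x >>= 1;

  return pos;
}

/* Small self-check covering zero, the lowest and the highest bit.  */
void
ffs16_self_check (void)
{
  assert (ffs16_sketch (0) == 0);
  assert (ffs16_sketch (1) == 1);
  assert (ffs16_sketch (0x8000) == 16);
}
```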
      
      	PR target/114222
      gcc/ChangeLog:
      
      	* config/h8300/h8300.cc (h8300_init_libfuncs): For HImode override
      	calls to external ffs() (from newlib) with calls to __ffshi2() from
      	libgcc. The implementation of ffs() in newlib calls __builtin_ffs(),
      	which causes infinite recursion and finally a stack overflow.
      
      libgcc/ChangeLog:
      
      	* config/h8300/t-h8300: Add __ffshi2().
      	* config/h8300/ffshi2.c: New file.
    • [PATCH] H8/300: PR target/109189 Silence -Wformat warnings on Windows · 2fc17730
      Jan Dubiec authored
      This patch fixes annoying -Wformat warnings when gcc is built
      on Windows/MinGW64. Instead of %ld it uses HOST_WIDE_INT_PRINT_DEC
      macro, just like many other targets do.
      
      	PR target/109189
      gcc/ChangeLog:
      
      	* config/h8300/h8300.cc (h8300_print_operand): Replace %ld format
      	strings with HOST_WIDE_INT_PRINT_DEC macro in order to silence
      	-Wformat warnings when building on Windows/MinGW64.
  6. Feb 28, 2025
  7. Feb 27, 2025
    • RISC-V: Fix bug for expand_const_vector interleave [PR118931] · e7287cbb
      Pan Li authored
      
      This patch fixes a bug when expanding a const vector for the
      interleave case.  For example, we have:
      
      base1 = 151
      step = 121
      
      For vec_series, we generate the vector as v[i] = base + i * step.
      The vec_series then gives the result below for HImode, and we can see
      that the result overflows into the highest 8 bits of HImode.
      
      v1.b = {151, 255, 7,  0, 119,  0, 231,  0, 87,  1, 199,  1, 55,   2, 167,   2}
      
      However, we expect v1.b to be:
      
      v1.b = {151, 0, 7,  0, 119,  0, 231,  0, 87,  0, 199,  0, 55,   0, 167,   0}
      
      After that it performs the IOR with v2 for base2 (i.e. the other series).
      
      v2.b =  {0,  17, 0, 33,   0, 49,   0, 65,  0, 81,   0, 97,  0, 113,   0, 129}
      
      Unfortunately, base1 + i * step1 in HImode may overflow into the high
      8 bits, and the high 8 bits will pollute v2 and result in an incorrect
      value in the const_vector.

      This patch performs the overflow to smode check before the
      optimized interleave code generation.  On overflow, or for VLA, it falls
      back to the default merge approach.
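
      The kind of check described above can be sketched in standalone C.
      This only models the arithmetic for 8-bit elements interleaved into
      16-bit ones; the real check in expand_const_vector works on RTL
      constants and may differ.  Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if base + i * step stays within 8 bits for every i in [0, n).
   If it does not, the series would spill into the high byte of the
   16-bit element and corrupt the other series after the IOR.  */
bool
series_fits_in_8_bits (int64_t base, int64_t step, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    {
      int64_t v = base + (int64_t) i * step;
      if (v < 0 || v > UINT8_MAX)
        return false;
    }
  return true;
}

/* Interleave two 8-bit values into one 16-bit element with IOR:
   LOW occupies bits 0-7 and HIGH occupies bits 8-15.  */
uint16_t
interleave_pair (uint8_t low, uint8_t high)
{
  return (uint16_t) (low | (high << 8));
}

/* With base1 = 151 and step = 121 from the message, the second element
   is already 272, which no longer fits in 8 bits.  */
void
interleave_self_check (void)
{
  assert (!series_fits_in_8_bits (151, 121, 8));
  assert (interleave_pair (151, 17) == 0x1197);
}
```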
      
      The following test suite passes with this patch:
      * The rv64gcv full regression test.
      
      	PR target/118931
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-v.cc (expand_const_vector): Add overflow to
      	smode check and clean up highest bits if overflow.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/base/pr118931-run-1.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • nvptx: '#define MAX_FIXED_MODE_SIZE 128' · e333ad4e
      Thomas Schwinge authored
      ... instead of 64 via 'gcc/defaults.h':
      
          MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (DImode)
      
      This fixes ICEs:
      
          [-FAIL: c-c++-common/pr111309-1.c  -Wc++-compat  (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-]
          [-FAIL:-]{+PASS:+} c-c++-common/pr111309-1.c  -Wc++-compat  (test for excess errors)
          [-UNRESOLVED:-]{+PASS:+} c-c++-common/pr111309-1.c  -Wc++-compat  [-compilation failed to produce executable-]{+execution test+}
      
          [-FAIL: c-c++-common/pr111309-1.c  -std=gnu++17 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-]
          [-FAIL:-]{+PASS:+} c-c++-common/pr111309-1.c  -std=gnu++17 (test for excess errors)
          [-UNRESOLVED:-]{+PASS:+} c-c++-common/pr111309-1.c  -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
          [-FAIL: c-c++-common/pr111309-1.c  -std=gnu++26 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-]
          [-FAIL:-]{+PASS:+} c-c++-common/pr111309-1.c  -std=gnu++26 (test for excess errors)
          [-UNRESOLVED:-]{+PASS:+} c-c++-common/pr111309-1.c  -std=gnu++26 [-compilation failed to produce executable-]{+execution test+}
          [-FAIL: c-c++-common/pr111309-1.c  -std=gnu++98 (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-]
          [-FAIL:-]{+PASS:+} c-c++-common/pr111309-1.c  -std=gnu++98 (test for excess errors)
          [-UNRESOLVED:-]{+PASS:+} c-c++-common/pr111309-1.c  -std=gnu++98 [-compilation failed to produce executable-]{+execution test+}
      
          [-FAIL: gcc.dg/torture/pr116480-1.c   -O0  (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-]
          [-FAIL:-]{+PASS:+} gcc.dg/torture/pr116480-1.c   -O0  (test for excess errors)
          [-FAIL: gcc.dg/torture/pr116480-1.c   -O1  (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268)-]
          [-FAIL:-]{+PASS:+} gcc.dg/torture/pr116480-1.c   -O1  (test for excess errors)
          PASS: gcc.dg/torture/pr116480-1.c   -O2  (test for excess errors)
          PASS: gcc.dg/torture/pr116480-1.c   -O3 -g  (test for excess errors)
          PASS: gcc.dg/torture/pr116480-1.c   -Os  (test for excess errors)
      
      ..., where we ran into 'gcc_assert (icode != CODE_FOR_nothing);' in
      'gcc/internal-fn.cc:expand_fn_using_insn' for '__int128' '__builtin_clzg' etc.:
      
          during RTL pass: expand
          [...]/c-c++-common/pr111309-1.c: In function 'clzI':
          [...]/c-c++-common/pr111309-1.c:69:10: internal compiler error: in expand_fn_using_insn, at internal-fn.cc:268
          0x120ec2cf internal_error(char const*, ...)
                  [...]/gcc/diagnostic-global-context.cc:517
          0x102c7c5b fancy_abort(char const*, int, char const*)
                  [...]/gcc/diagnostic.cc:1722
          0x109708eb expand_fn_using_insn
                  [...]/gcc/internal-fn.cc:268
          0x1098114f expand_internal_call(internal_fn, gcall*)
                  [...]/gcc/internal-fn.cc:5273
          0x1098114f expand_internal_call(gcall*)
                  [...]/gcc/internal-fn.cc:5281
          0x10594fc7 expand_call_stmt
                  [...]/gcc/cfgexpand.cc:3049
          [...]
      
      Likewise, as of commit e8ad697a
      "libstdc++: Use new type-generic built-ins in <bit> [PR118855]",
      the libstdc++ target library build ICEd in the same way.
      
      Additionally, this change fixes:
      
          [-FAIL:-]{+PASS:+} gcc.dg/pr105094.c (test for excess errors)
      
      ..., which was:
      
          [...]/gcc.dg/pr105094.c: In function 'foo':
          [...]/gcc.dg/pr105094.c:11:12: error: size of variable 's' is too large
      
      And, finally, regarding 'gcc.target/nvptx/stack_frame-1.c'.  Before, in
      'gcc/cfgexpand.cc': 'expand_used_vars' -> 'expand_used_vars_for_block' ->
      'expand_one_var' for 'ww' -> 'gcc/function.cc:use_register_for_decl' due to
      'DECL_MODE (decl) == BLKmode' did 'return false;', thus -> 'add_stack_var'
      (even if 'ww' wasn't then actually living on the stack).  Now, 'ww' has
      'TImode' and 'use_register_for_decl' does 'return true;', thus ->
      'expand_one_register_var', and therefore no unused stack frame emitted.
      
      	gcc/
      	* config/nvptx/nvptx.h (MAX_FIXED_MODE_SIZE): '#define'.
      	gcc/testsuite/
      	* gcc.target/nvptx/stack_frame-1.c: Adjust.
    • nvptx: Support '-mfake-ptx-alloca' · 1146410c
      Thomas Schwinge authored
      With '-mfake-ptx-alloca' enabled, the user-visible behavior changes only
      for configurations where PTX 'alloca' is not available.  Rather than a
      compile-time 'sorry, unimplemented: dynamic stack allocation not supported'
      in presence of dynamic stack allocation, compilation and assembly then
      succeeds.  However, attempting to link in such '*.o' files then fails due
      to unresolved symbol '__GCC_nvptx__PTX_alloca_not_supported'.
      
      This is meant to be used in scenarios where large volumes of code are
      compiled, a small fraction of which runs into dynamic stack allocation, but
      these parts are not important for specific use cases, and we'd thus like the
      build to succeed, and error out just upon actual, very rare use of the
      offending '*.o' files.
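For readers unfamiliar with the term, here is a minimal sketch of code that needs dynamic stack allocation. The function `sum_of_squares` is a hypothetical illustration, not taken from the GCC testsuite: its variable-length array has a size known only at run time, which is the kind of construct the diagnostic above refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not from the GCC testsuite: 'buf' is a
   variable-length array, so its size is only known at run time and the
   compiler has to allocate it dynamically on the stack.  */
int
sum_of_squares (size_t n)
{
  assert (n > 0 && n <= 1024);  /* keep the VLA small and non-empty */
  int buf[n];
  for (size_t i = 0; i < n; i++)
    buf[i] = (int) (i * i);
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += buf[i];
  return sum;
}
```

On a host compiler this simply runs; whether a given target compiler can optimize the dynamic allocation away depends on the optimization level and is not something this sketch demonstrates.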
      
      	gcc/
      	* config/nvptx/nvptx.opt (-mfake-ptx-alloca): New.
      	* config/nvptx/nvptx-protos.h (nvptx_output_fake_ptx_alloca):
      	Declare.
      	* config/nvptx/nvptx.cc (nvptx_output_fake_ptx_alloca): New.
      	* config/nvptx/nvptx.md (define_insn "@nvptx_alloca_<mode>")
      	[!(TARGET_PTX_7_3 && TARGET_SM52)]: Use it for
      	'-mfake-ptx-alloca'.
      	gcc/testsuite/
      	* gcc.target/nvptx/alloca-1-O0_-mfake-ptx-alloca.c: New.
      	* gcc.target/nvptx/alloca-2-O0_-mfake-ptx-alloca.c: Likewise.
      	* gcc.target/nvptx/alloca-4-O3_-mfake-ptx-alloca.c: Likewise.
      	* gcc.target/nvptx/vla-1-O0_-mfake-ptx-alloca.c: Likewise.
      	* gcc.target/nvptx/alloca-4-O3.c:
      	'dg-additional-options -mfake-ptx-alloca'.
      1146410c
    • Thomas Schwinge's avatar
      nvptx: Delay 'sorry, unimplemented: dynamic stack allocation not supported'... · 22e76700
      Thomas Schwinge authored
      nvptx: Delay 'sorry, unimplemented: dynamic stack allocation not supported' from expansion time to code generation
      
      This gives the back end a chance to clean out a few more unnecessary instances
      of dynamic stack allocation.  This progresses:
      
          PASS: gcc.dg/pr78902.c  (test for warnings, line 7)
          PASS: gcc.dg/pr78902.c  (test for warnings, line 8)
          PASS: gcc.dg/pr78902.c  (test for warnings, line 9)
          PASS: gcc.dg/pr78902.c  (test for warnings, line 10)
          PASS: gcc.dg/pr78902.c  (test for warnings, line 11)
          PASS: gcc.dg/pr78902.c  (test for warnings, line 12)
          PASS: gcc.dg/pr78902.c  (test for warnings, line 13)
          PASS: gcc.dg/pr78902.c strndup excessive bound at line 14 (test for warnings, line 13)
          [-UNSUPPORTED: gcc.dg/pr78902.c: dynamic stack allocation not supported-]
          {+PASS: gcc.dg/pr78902.c (test for excess errors)+}
      
          UNSUPPORTED: gcc.dg/torture/pr71901.c   -O0 : dynamic stack allocation not supported
          [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr71901.c   -O1  [-: dynamic stack allocation not supported-]{+(test for excess errors)+}
          UNSUPPORTED: gcc.dg/torture/pr71901.c   -O2 : dynamic stack allocation not supported
          UNSUPPORTED: gcc.dg/torture/pr71901.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions : dynamic stack allocation not supported
          UNSUPPORTED: gcc.dg/torture/pr71901.c   -O3 -g : dynamic stack allocation not supported
          [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr71901.c   -Os  [-: dynamic stack allocation not supported-]{+(test for excess errors)+}
      
          UNSUPPORTED: gcc.dg/torture/pr78742.c   -O0 : dynamic stack allocation not supported
          [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr78742.c   -O1  [-: dynamic stack allocation not supported-]{+(test for excess errors)+}
          [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr78742.c   -O2  [-: dynamic stack allocation not supported-]{+(test for excess errors)+}
          [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr78742.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  [-: dynamic stack allocation not supported-]{+(test for excess errors)+}
          [-UNSUPPORTED:-]{+PASS:+} gcc.dg/torture/pr78742.c   -O3 -g  [-: dynamic stack allocation not supported-]{+(test for excess errors)+}
          UNSUPPORTED: gcc.dg/torture/pr78742.c   -Os : dynamic stack allocation not supported
      
          [-UNSUPPORTED:-]{+PASS:+} gfortran.dg/pr101267.f90   -O  [-: dynamic stack allocation not supported-]{+(test for excess errors)+}
      
          [-UNSUPPORTED:-]{+PASS:+} gfortran.dg/pr112404.f90   -O  [-: dynamic stack allocation not supported-]{+(test for excess errors)+}
      
      	gcc/
      	* config/nvptx/nvptx.md (define_expand "allocate_stack")
      	[!TARGET_SOFT_STACK]: Move
      	'sorry ("dynamic stack allocation not supported");'...
      	(define_insn "@nvptx_alloca_<mode>"): ... here.
      	gcc/testsuite/
      	* gcc.target/nvptx/alloca-1-unused-O0-sm_30.c: Adjust.
      22e76700
    • Haochen Jiang's avatar
      i386: Treat Granite Rapids/Granite Rapids-D/Diamond Rapids similar as Sapphire... · 44c4a720
      Haochen Jiang authored
      i386: Treat Granite Rapids/Granite Rapids-D/Diamond Rapids similar as Sapphire Rapids in x86-tune.def
      
      Since GNR, GNR-D, and DMR are all P-core based, we should treat them
      just like SPR for now.
      
      gcc/ChangeLog:
      
      	* config/i386/x86-tune.def
      	(X86_TUNE_DEST_FALSE_DEP_FOR_GLC): Add GNR, GNR-D, DMR.
      	(X86_TUNE_AVOID_256FMA_CHAINS): Ditto.
      	(X86_TUNE_AVX512_MOVE_BY_PIECES): Ditto.
      	(X86_TUNE_AVX512_STORE_BY_PIECES): Ditto.
      44c4a720
  8. Feb 26, 2025
    • Jakub Jelinek's avatar
      arm: Fix up REVERSE_CONDITION macro [PR119002] · 40bf0770
      Jakub Jelinek authored
      The Linaro CI found that my PR119002 patch broke bootstrap on arm.
      The problem seems to be an incorrect REVERSE_CONDITION macro
      definition.
      All other targets' REVERSE_CONDITION definitions and the default one
      just use the macro's arguments, while the arm.h definition uses the MODE
      argument but uses code instead of CODE (the first argument).
      This happened to work because, before my patch, the only use of the
      macro was in jump.cc with
        /* First see if machine description supplies us way to reverse the
           comparison.  Give it priority over everything else to allow
           machine description to do tricks.  */
        if (GET_MODE_CLASS (mode) == MODE_CC
            && REVERSIBLE_CC_MODE (mode))
          return REVERSE_CONDITION (code, mode);
      but in my patch it is used with GT rather than code.
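The failure mode can be reproduced in isolation. The sketch below is a toy model, not the arm.h definition: the enum, `reverse_model`, and both macros are hypothetical names chosen for illustration. It shows how a macro body that names `code` instead of its own parameter `CODE` silently captures a variable from the caller's scope.

```c
#include <assert.h>

enum rtx_code_model { MODEL_GT, MODEL_LE, MODEL_EQ, MODEL_NE };

/* Reverse a comparison in this toy model.  */
static enum rtx_code_model
reverse_model (enum rtx_code_model c)
{
  switch (c)
    {
    case MODEL_GT: return MODEL_LE;
    case MODEL_LE: return MODEL_GT;
    case MODEL_EQ: return MODEL_NE;
    default:
      assert (c == MODEL_NE);
      return MODEL_EQ;
    }
}

/* Buggy shape: the body ignores the CODE parameter and instead picks up
   whatever variable named 'code' happens to be in scope at the use site.  */
#define REVERSE_BUGGY(CODE, MODE) ((void) (MODE), reverse_model (code))

/* Fixed shape: the body uses only the macro's own parameters.  */
#define REVERSE_FIXED(CODE, MODE) ((void) (MODE), reverse_model (CODE))

enum rtx_code_model
caller_buggy (enum rtx_code_model code)
{
  return REVERSE_BUGGY (MODEL_GT, 0);  /* silently reverses 'code' */
}

enum rtx_code_model
caller_fixed (enum rtx_code_model code)
{
  (void) code;
  return REVERSE_FIXED (MODEL_GT, 0);  /* reverses MODEL_GT as written */
}
```

As long as every caller passes a variable that is itself named `code`, the two macros behave identically, which is why the bug stayed hidden until a call site passed a different first argument.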
      
      2025-02-26  Jakub Jelinek  <jakub@redhat.com>
      
      	PR rtl-optimization/119002
      	* config/arm/arm.h (REVERSE_CONDITION): Use CODE - the macro
      	argument - in the macro rather than code.
      40bf0770
  9. Feb 25, 2025
    • Jakub Jelinek's avatar
      pru: Fix pru_pragma_ctable_entry diagnostics [PR118991] · 0bb431d0
      Jakub Jelinek authored
      HOST_WIDE_INT_PRINT* macros aren't supposed to be used in
      gcc-internal-format format strings, we have the w modifier for HOST_WIDE_INT
      in that case, the HOST_WIDE_INT_PRINT* macros might not work properly on
      some hosts (e.g. mingw32 has HOST_LONG_LONG_FORMAT "I64" and that is
      something pretty-print doesn't handle, while it handles "ll" for long long)
      and also the use of macros in the middle of format strings breaks
      translations (both that exgettext can't retrieve the string from there
      and we get
       #: config/pru/pru-pragma.cc:61
       msgid "%<CTABLE_ENTRY%> index %"
       msgstr ""
      
       #: config/pru/pru-pragma.cc:64
       msgid "redefinition of %<CTABLE_ENTRY %"
       msgstr ""
      in po/gcc.pot and also the macros are different on different hosts,
      so even if exgettext extracted say "%<CTABLE_ENTRY%> index %lld is not valid"
      it could be translated on some hosts but not e.g. mingw32).
      
      So, the following patch just uses %wd instead.
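The translation problem comes from how adjacent string literals are formed. The sketch below uses the standard `PRId64` macro as a stand-in for `HOST_WIDE_INT_PRINT`; that substitution, the message text, and the function name `format_index_message` are assumptions made for illustration. Note that `%wd` is understood only by GCC's internal diagnostic formatter, not by standard `printf`, so it cannot be shown in a standalone program.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the problem: PRId64 expands to a host-specific string
   (for example "ld" or "lld"), so the literal below is really three
   adjacent pieces, and a message extractor scanning the source only sees
   the first piece, "index %".  */
#define MSG_SPLIT "index %" PRId64 " is not valid"

/* Format the message into BUF; returns the number of characters written.  */
int
format_index_message (char *buf, size_t size, int64_t index)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, MSG_SPLIT, index);
}
```

The program output is the same on every host, but the source-level format string is not, and that is what a translation catalog is keyed on.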
      
      Tested before and after the patch on
       #pragma ctable_entry 12 0x48040000
       #pragma ctable_entry 1024 0x48040000
       #pragma ctable_entry 12 0x48040001
      and the result is the same.
      
      2025-02-25  Jakub Jelinek  <jakub@redhat.com>
      
      	PR translation/118991
      	* config/pru/pru-pragma.cc (pru_pragma_ctable_entry): Use %wd
      	instead of %" HOST_WIDE_INT_PRINT "d to print a hwi in error.
      0bb431d0
    • Iain Buclaw's avatar
      d/i386: Add CET TargetInfo key and predefined version [PR118654] · c17044e5
      Iain Buclaw authored
      Adds a new i386 d_target_info_spec entry to handle requests for
      `__traits(getTargetInfo, "CET")', and adds the predefined target version
      `GNU_CET' when the option `-fcf-protection' is used.
      
      Both TargetInfo key and predefined version have been added to the D
      front-end documentation.
      
      In the library, `GNU_CET' replaces the existing use of the user-defined
      version flag `CET' when building libphobos.
      
      	PR d/118654
      
      gcc/ChangeLog:
      
      	* config/i386/i386-d.cc (ix86_d_target_versions): Predefine GNU_CET.
      	(ix86_d_handle_target_cf_protection): New.
      	(ix86_d_register_target_info): Add 'CET' TargetInfo key.
      
      gcc/d/ChangeLog:
      
      	* implement-d.texi: Document CET version and traits key.
      
      libphobos/ChangeLog:
      
      	* Makefile.in: Regenerate.
      	* configure: Regenerate.
      	* configure.ac: Remove CET_DFLAGS.
      	* libdruntime/Makefile.am: Replace CET_DFLAGS with CET_FLAGS.
      	* libdruntime/Makefile.in: Regenerate.
      	* libdruntime/core/thread/fiber/package.d: Replace CET with GNU_CET.
      	* src/Makefile.am: Replace CET_DFLAGS with CET_FLAGS.
      	* src/Makefile.in: Regenerate.
      	* testsuite/Makefile.in: Regenerate.
      	* testsuite/testsuite_flags.in: Replace CET_DFLAGS with CET_FLAGS.
      
      gcc/testsuite/ChangeLog:
      
      	* gdc.dg/target/i386/i386.exp: New test.
      	* gdc.dg/target/i386/targetinfo_CET.d: New test.
      c17044e5
  10. Feb 24, 2025
    • Robin Dapp's avatar
      RISC-V: Include pattern stmts for dynamic LMUL computation [PR114516]. · 6be1b9e9
      Robin Dapp authored
      When scanning for program points, i.e. vector statements, we're missing
      pattern statements.  In PR114516 this becomes obvious as we choose
      LMUL=8 assuming there are only three statements but the divmod pattern
      adds another three.  Those push us beyond four registers so we need to
      switch to LMUL=4.
      
      This patch adds pattern statements to the program points which helps
      calculate a better register pressure estimate.
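The arithmetic behind the LMUL switch can be sketched as follows. This is a deliberately simplified model, not the logic in riscv-vector-costs.cc: it assumes each counted statement contributes one simultaneously live vector value, and the function name `pick_lmul` is hypothetical. RVV has 32 vector registers, so LMUL=8 leaves four register groups and LMUL=4 leaves eight.

```c
#include <assert.h>

/* Toy model only: RVV has 32 vector registers, so a register group of
   size LMUL leaves 32 / LMUL groups.  Pick the largest LMUL whose group
   count still covers the number of simultaneously live vector values.  */
int
pick_lmul (int live_values)
{
  assert (live_values >= 0);
  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (live_values <= 32 / lmul)
      return lmul;
  return 1;
}
```

Under this model, three live values fit LMUL=8, while six (the three original statements plus three pattern statements) exceed four groups and fall back to LMUL=4.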
      
      	PR target/114516
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul):
      	Add pattern statements to program points.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/costmodel/riscv/rvv/pr114516.c: New test.
      6be1b9e9
    • Lino Hsing-Yu Peng's avatar
      RISC-V: Fix .cfi_offset directive when push/pop in zcmp · 4dcd3c77
      Lino Hsing-Yu Peng authored
      The incorrect cfi directive info breaks stack unwind in try/catch/cxa.
      
      Before patch:
        cm.push	{ra, s0-s2}, -16
        .cfi_offset 1, -12
        .cfi_offset 8, -8
        .cfi_offset 18, -4
      
      After patch:
        cm.push	{ra, s0-s2}, -16
        .cfi_offset 1, -16
        .cfi_offset 8, -12
        .cfi_offset 9, -8
        .cfi_offset 18, -4
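The offsets in the two listings follow a simple rule that can be modeled from a register bitmask. The sketch below is an illustration under stated assumptions, not the code in riscv.cc: it assumes RV32 (4-byte slots) and that the highest-numbered saved register sits just below the CFA, and `cfi_offsets_from_mask` is a hypothetical name. With bits for x1, x8, x9 and x18 set it yields the "after" offsets; with the x9 bit missing it yields the "before" offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, assuming RV32 (4-byte slots) and that the highest-numbered
   saved register sits just below the CFA.  MASK has bit N set when
   register xN is pushed; OFFSETS[N] receives the CFA-relative offset of
   xN, or 0 when xN is not in the mask.  Returns the number of saved
   registers.  */
int
cfi_offsets_from_mask (uint32_t mask, int offsets[32])
{
  assert (offsets);

  int count = 0;
  for (int regno = 0; regno < 32; regno++)
    if (mask & (UINT32_C (1) << regno))
      count++;

  int slot = 0;
  for (int regno = 0; regno < 32; regno++)
    {
      offsets[regno] = 0;
      if (mask & (UINT32_C (1) << regno))
        {
          offsets[regno] = -4 * (count - slot);
          slot++;
        }
    }
  return count;
}
```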
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc: Set multi push regs bits.
      
      gcc/testsuite/ChangeLog:
      	* gcc.target/riscv/zcmp_push_gpr.c: New test.
      4dcd3c77
  11. Feb 22, 2025
    • Thomas Schwinge's avatar
      BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic stack allocation not supported' · 2abc942f
      Thomas Schwinge authored
      ... instead of BPF: 'error: BPF does not support dynamic stack allocation', and
      nvptx: 'sorry, unimplemented: target cannot support alloca'.
      
      	gcc/
      	* config/bpf/bpf.md (define_expand "allocate_stack"): Emit
      	'sorry, unimplemented: dynamic stack allocation not supported'.
      	* config/nvptx/nvptx.md (define_expand "allocate_stack")
      	[!TARGET_SOFT_STACK && !(TARGET_PTX_7_3 && TARGET_SM52)]: Likewise.
      	gcc/testsuite/
      	* gcc.target/bpf/diag-alloca-1.c: Adjust 'dg-message'.
      	* gcc.target/bpf/diag-alloca-2.c: Likewise.
      	* gcc.target/nvptx/alloca-1-sm_30.c: Likewise.
      	* gcc.target/nvptx/vla-1-sm_30.c: Likewise.
      	* lib/target-supports.exp (proc check_effective_target_alloca):
      	Adjust comment.
      2abc942f
  12. Feb 20, 2025
  13. Feb 19, 2025
    • Xi Ruoyao's avatar
      LoongArch: Use normal RTL pattern instead of UNSPEC for {x,}vsr{a,l}ri instructions · 42738604
      Xi Ruoyao authored
      Allowing (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding
      shift operation.
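In scalar C the recognized expression looks like the sketch below. The function name `round_shift_right` is hypothetical, and a 64-bit unsigned lane is assumed; the addition may wrap for very large `t`, as it would in a fixed-width vector lane.

```c
#include <assert.h>
#include <stdint.h>

/* Shift T right by IMM bits, rounding half up: the bias (1 << IMM) >> 1
   is half of the divisor 2^IMM, and is 0 when IMM is 0.  */
uint64_t
round_shift_right (uint64_t t, unsigned imm)
{
  assert (imm < 64);
  return (t + (UINT64_C (1) << imm >> 1)) >> imm;
}
```

For example, 7 shifted by 2 gives (7 + 2) >> 2 = 2, where a plain shift would give 1.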
      
      gcc/ChangeLog:
      
      	* config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove.
      	(UNSPEC_LASX_XVSRLRI): Remove.
      	(lasx_xvsrari_<lsxfmt>): Remove.
      	(lasx_xvsrlri_<lsxfmt>): Remove.
      	* config/loongarch/lsx.md (UNSPEC_LSX_VSRARI): Remove.
      	(UNSPEC_LSX_VSRLRI): Remove.
      	(lsx_vsrari_<lsxfmt>): Remove.
      	(lsx_vsrlri_<lsxfmt>): Remove.
      	* config/loongarch/simd.md (simd_<optab>_imm_round_<mode>): New
      	define_insn.
      	(<simd_isa>_<x>v<insn>ri_<simdfmt>): New define_expand.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/loongarch/vect-shift-imm-round.c: New test.
      42738604
    • Xi Ruoyao's avatar
      LoongArch: Implement [su]dot_prod* for LSX and LASX modes · cef5f23a
      Xi Ruoyao authored
      Although this is just a special case of "a widening product whose
      result is used for reduction," having these standard names allows the
      dot product pattern to be recognized earlier, which may benefit
      optimization.  Also fix some test failures with the test cases:
      
      - gcc.dg/vect/vect-reduc-chain-2.c
      - gcc.dg/vect/vect-reduc-chain-3.c
      - gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
      - gcc.dg/vect/vect-reduc-chain-dot-slp-4.c
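The source-level shape of "a widening product whose result is used for reduction" is the familiar dot-product loop. The sketch below is a scalar illustration with a hypothetical function name and 16-bit inputs chosen for the example; it does not show what the vectorizer emits for LSX or LASX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed dot product: each 16-bit pair is widened to 32 bits, multiplied,
   and accumulated into a 32-bit sum.  */
int32_t
dot_product_s16 (const int16_t *a, const int16_t *b, size_t n)
{
  assert (a != NULL && b != NULL);
  int32_t acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];  /* widen, multiply, accumulate */
  return acc;
}
```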
      
      gcc/ChangeLog:
      
      	* config/loongarch/simd.md (wvec_half): New define_mode_attr.
      	(<su>dot_prod<wvec_half><mode>): New define_expand.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/loongarch/wide-mul-reduc-2.c (dg-final): Scan
      	DOT_PROD_EXPR in optimized tree.
      cef5f23a
    • Xi Ruoyao's avatar
      LoongArch: Implement vec_widen_mult_{even,odd}_* for LSX and LASX modes · 7c54e46b
      Xi Ruoyao authored
      Since PR116142 has been fixed, we can now add the standard names so the
      compiler will generate better code if the result of a widening
      product is reduced.
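A scalar model of the even/odd widening multiply may help here. The function below is a hypothetical illustration with 16-bit inputs, not the RTL pattern itself: it multiplies either the even-indexed or the odd-indexed lanes and produces results of twice the width. When the products are only summed afterwards, lane order does not matter, so the even and odd halves can simply be added together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply the even (ODD == 0) or odd (ODD == 1) lanes of A and B, each
   with N_IN 16-bit lanes, into N_IN / 2 widened 32-bit results in OUT.  */
void
widen_mult_even_odd (const int16_t *a, const int16_t *b, size_t n_in,
                     int odd, int32_t *out)
{
  assert (a != NULL && b != NULL && out != NULL);
  assert (odd == 0 || odd == 1);
  for (size_t i = 0; i < n_in / 2; i++)
    out[i] = (int32_t) a[2 * i + odd] * (int32_t) b[2 * i + odd];
}
```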
      
      gcc/ChangeLog:
      
      	* config/loongarch/simd.md (even_odd): New define_int_attr.
      	(vec_widen_<su>mult_<even_odd>_<mode>): New define_expand.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/loongarch/wide-mul-reduc-1.c: New test.
      	* gcc.target/loongarch/wide-mul-reduc-2.c: New test.
      7c54e46b
    • Xi Ruoyao's avatar
      LoongArch: Simplify lsx_vpick description · 7dda6715
      Xi Ruoyao authored
      Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
      special predicates instead of hard-coded const vectors.
      
      This is not suitable for LASX, where lasx_xvpick has different
      semantics.
      
      gcc/ChangeLog:
      
      	* config/loongarch/simd.md (LVEC): New define_mode_attr.
      	(simdfmt_as_i): Make it same as simdfmt for integer vector
      	modes.
      	(_f): New define_mode_attr.
      	* config/loongarch/lsx.md (lsx_vpickev_b): Remove.
      	(lsx_vpickev_h): Remove.
      	(lsx_vpickev_w): Remove.
      	(lsx_vpickev_w_f): Remove.
      	(lsx_vpickod_b): Remove.
      	(lsx_vpickod_h): Remove.
      	(lsx_vpickod_w): Remove.
      	(lsx_vpickod_w_f): Remove.
      	(lsx_pick_evod_<mode>): New define_insn.
      	(lsx_<x>vpick<ev_od>_<simdfmt_as_i><_f>): New
      	define_expand.
      7dda6715
    • Xi Ruoyao's avatar
      LoongArch: Simplify {lsx_,lasx_x}vmaddw description · f727a4c5
      Xi Ruoyao authored
      Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
      special predicates and TImode RTL instead of hard-coded const vectors
      and UNSPECs.
      
      Also reorder two operands of the outer plus in the template, so combine
      will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.
      
      gcc/ChangeLog:
      
      	* config/loongarch/lasx.md (UNSPEC_LASX_XVMADDWEV): Remove.
      	(UNSPEC_LASX_XVMADDWEV2): Remove.
      	(UNSPEC_LASX_XVMADDWEV3): Remove.
      	(UNSPEC_LASX_XVMADDWOD): Remove.
      	(UNSPEC_LASX_XVMADDWOD2): Remove.
      	(UNSPEC_LASX_XVMADDWOD3): Remove.
      	(lasx_xvmaddwev_h_b<u>): Remove.
      	(lasx_xvmaddwev_w_h<u>): Remove.
      	(lasx_xvmaddwev_d_w<u>): Remove.
      	(lasx_xvmaddwev_q_d): Remove.
      	(lasx_xvmaddwod_h_b<u>): Remove.
      	(lasx_xvmaddwod_w_h<u>): Remove.
      	(lasx_xvmaddwod_d_w<u>): Remove.
      	(lasx_xvmaddwod_q_d): Remove.
      	(lasx_xvmaddwev_q_du): Remove.
      	(lasx_xvmaddwod_q_du): Remove.
      	(lasx_xvmaddwev_h_bu_b): Remove.
      	(lasx_xvmaddwev_w_hu_h): Remove.
      	(lasx_xvmaddwev_d_wu_w): Remove.
      	(lasx_xvmaddwev_q_du_d): Remove.
      	(lasx_xvmaddwod_h_bu_b): Remove.
      	(lasx_xvmaddwod_w_hu_h): Remove.
      	(lasx_xvmaddwod_d_wu_w): Remove.
      	(lasx_xvmaddwod_q_du_d): Remove.
      	* config/loongarch/lsx.md (UNSPEC_LSX_VMADDWEV): Remove.
      	(UNSPEC_LSX_VMADDWEV2): Remove.
      	(UNSPEC_LSX_VMADDWEV3): Remove.
      	(UNSPEC_LSX_VMADDWOD): Remove.
      	(UNSPEC_LSX_VMADDWOD2): Remove.
      	(UNSPEC_LSX_VMADDWOD3): Remove.
      	(lsx_vmaddwev_h_b<u>): Remove.
      	(lsx_vmaddwev_w_h<u>): Remove.
      	(lsx_vmaddwev_d_w<u>): Remove.
      	(lsx_vmaddwev_q_d): Remove.
      	(lsx_vmaddwod_h_b<u>): Remove.
      	(lsx_vmaddwod_w_h<u>): Remove.
      	(lsx_vmaddwod_d_w<u>): Remove.
      	(lsx_vmaddwod_q_d): Remove.
      	(lsx_vmaddwev_q_du): Remove.
      	(lsx_vmaddwod_q_du): Remove.
      	(lsx_vmaddwev_h_bu_b): Remove.
      	(lsx_vmaddwev_w_hu_h): Remove.
      	(lsx_vmaddwev_d_wu_w): Remove.
      	(lsx_vmaddwev_q_du_d): Remove.
      	(lsx_vmaddwod_h_bu_b): Remove.
      	(lsx_vmaddwod_w_hu_h): Remove.
      	(lsx_vmaddwod_d_wu_w): Remove.
      	(lsx_vmaddwod_q_du_d): Remove.
      	* config/loongarch/simd.md (simd_maddw_evod_<mode>_<su>):
      	New define_insn.
      	(<simd_isa>_<x>vmaddw<ev_od>_<simdfmt_w>_<simdfmt><u>): New
      	define_expand.
      	(simd_maddw_evod_<mode>_hetero): New define_insn.
      	(<simd_isa>_<x>vmaddw<ev_od>_<simdfmt_w>_<simdfmt>u_<simdfmt>):
      	New define_expand.
      	(<simd_isa>_maddw<ev_od>_q_d<u>_punned): New define_expand.
      	(<simd_isa>_maddw<ev_od>_q_du_d_punned): New define_expand.
      	* config/loongarch/loongarch-builtins.cc
      	(CODE_FOR_lsx_vmaddwev_q_d): Define as a macro to override it
      	with the punned expand.
      	(CODE_FOR_lsx_vmaddwev_q_du): Likewise.
      	(CODE_FOR_lsx_vmaddwev_q_du_d): Likewise.
      	(CODE_FOR_lsx_vmaddwod_q_d): Likewise.
      	(CODE_FOR_lsx_vmaddwod_q_du): Likewise.
      	(CODE_FOR_lsx_vmaddwod_q_du_d): Likewise.
      	(CODE_FOR_lasx_xvmaddwev_q_d): Likewise.
      	(CODE_FOR_lasx_xvmaddwev_q_du): Likewise.
      	(CODE_FOR_lasx_xvmaddwev_q_du_d): Likewise.
      	(CODE_FOR_lasx_xvmaddwod_q_d): Likewise.
      	(CODE_FOR_lasx_xvmaddwod_q_du): Likewise.
      	(CODE_FOR_lasx_xvmaddwod_q_du_d): Likewise.
      f727a4c5
    • Xi Ruoyao's avatar
      LoongArch: Simplify {lsx_,lasx_x}vh{add,sub}w description · 2ca759fc
      Xi Ruoyao authored
      Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
      special predicates and TImode RTL instead of hard-coded const vectors
      and UNSPECs.
      
      gcc/ChangeLog:
      
      	* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
      	(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
      	(UNSPEC_LASX_XVHADDW_QU_DU): Remove.
      	(UNSPEC_LASX_XVHSUBW_QU_DU): Remove.
      	(lasx_xvh<addsub:optab>w_h<u>_b<u>): Remove.
      	(lasx_xvh<addsub:optab>w_w<u>_h<u>): Remove.
      	(lasx_xvh<addsub:optab>w_d<u>_w<u>): Remove.
      	(lasx_xvhaddw_q_d): Remove.
      	(lasx_xvhsubw_q_d): Remove.
      	(lasx_xvhaddw_qu_du): Remove.
      	(lasx_xvhsubw_qu_du): Remove.
      	(reduc_plus_scal_v4di): Call gen_lasx_haddw_q_d_punned instead
      	of gen_lasx_xvhaddw_q_d.
      	(reduc_plus_scal_v8si): Likewise.
      	* config/loongarch/lsx.md (UNSPEC_LSX_VHADDW_Q_D): Remove.
      	(UNSPEC_ASX_VHSUBW_Q_D): Remove.
      	(UNSPEC_ASX_VHADDW_QU_DU): Remove.
      	(UNSPEC_ASX_VHSUBW_QU_DU): Remove.
      	(lsx_vh<addsub:optab>w_h<u>_b<u>): Remove.
      	(lsx_vh<addsub:optab>w_w<u>_h<u>): Remove.
      	(lsx_vh<addsub:optab>w_d<u>_w<u>): Remove.
      	(lsx_vhaddw_q_d): Remove.
      	(lsx_vhsubw_q_d): Remove.
      	(lsx_vhaddw_qu_du): Remove.
      	(lsx_vhsubw_qu_du): Remove.
      	(reduc_plus_scal_v2di): Change the temporary register mode to
      	V1TI, and pun the mode calling gen_vec_extractv2didi.
      	(reduc_plus_scal_v4si): Change the temporary register mode to
      	V1TI.
      	* config/loongarch/simd.md (simd_h<optab>w_<mode>_<su>): New
      	define_insn.
      	(<simd_isa>_<x>vh<optab>w_<simdfmt_w><u>_<simdfmt><u>): New
      	define_expand.
      	(<simd_isa>_h<optab>w_q<u>_d<u>_punned): New define_expand.
      	* config/loongarch/loongarch-builtins.cc
      	(CODE_FOR_lsx_vhaddw_q_d): Define as a macro to override with
      	punned expand.
      	(CODE_FOR_lsx_vhaddw_qu_du): Likewise.
      	(CODE_FOR_lsx_vhsubw_q_d): Likewise.
      	(CODE_FOR_lsx_vhsubw_qu_du): Likewise.
      	(CODE_FOR_lasx_xvhaddw_q_d): Likewise.
      	(CODE_FOR_lasx_xvhaddw_qu_du): Likewise.
      	(CODE_FOR_lasx_xvhsubw_q_d): Likewise.
      	(CODE_FOR_lasx_xvhsubw_qu_du): Likewise.
      2ca759fc
    • Xi Ruoyao's avatar
      LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description · a36c15aa
      Xi Ruoyao authored
      These pattern definitions are tediously long, involving 32 UNSPECs and
      many hard-coded long const vectors.  To simplify them, we first use
      the TImode vector operations instead of the UNSPECs, then we adopt an
      approach from AArch64: use a special predicate to match the const
      vectors for odd/even indices in the define_insn's, and generate those
      vectors in the define_expand's.
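The generate-then-match idea can be modeled on plain integer arrays. Both functions below are hypothetical stand-ins written for illustration; they operate on C arrays rather than RTL PARALLELs and do not reproduce the signatures of loongarch_gen_stepped_int_parallel or the vect_par_cnst_even_or_odd_half predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fill IDX[0..n-1] with base, base + step, base + 2 * step, ...  */
void
gen_stepped_indices (int *idx, size_t n, int base, int step)
{
  assert (idx != NULL);
  for (size_t i = 0; i < n; i++)
    idx[i] = base + (int) i * step;
}

/* Return true if IDX selects exactly the even lanes (0, 2, 4, ...) or
   exactly the odd lanes (1, 3, 5, ...) of a vector with 2 * n lanes.  */
bool
is_even_or_odd_half (const int *idx, size_t n)
{
  assert (idx != NULL);
  if (n == 0 || (idx[0] != 0 && idx[0] != 1))
    return false;
  for (size_t i = 1; i < n; i++)
    if (idx[i] != idx[i - 1] + 2)
      return false;
  return true;
}
```

The expander side calls the generator with base 0 or 1 and step 2; the instruction side only has to check the shape of the index list, so no literal index vectors need to be spelled out per mode.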
      
      For "backward compatibility" we need to provide a "punned" version of
      the operations involving TImode vectors, as the intrinsics still expect
      DImode vectors.
      
      The stat is "201 insertions, 905 deletions."
      
      gcc/ChangeLog:
      
      	* config/loongarch/lasx.md (UNSPEC_LASX_XVADDWEV): Remove.
      	(UNSPEC_LASX_XVADDWEV2): Remove.
      	(UNSPEC_LASX_XVADDWEV3): Remove.
      	(UNSPEC_LASX_XVSUBWEV): Remove.
      	(UNSPEC_LASX_XVSUBWEV2): Remove.
      	(UNSPEC_LASX_XVMULWEV): Remove.
      	(UNSPEC_LASX_XVMULWEV2): Remove.
      	(UNSPEC_LASX_XVMULWEV3): Remove.
      	(UNSPEC_LASX_XVADDWOD): Remove.
      	(UNSPEC_LASX_XVADDWOD2): Remove.
      	(UNSPEC_LASX_XVADDWOD3): Remove.
      	(UNSPEC_LASX_XVSUBWOD): Remove.
      	(UNSPEC_LASX_XVSUBWOD2): Remove.
      	(UNSPEC_LASX_XVMULWOD): Remove.
      	(UNSPEC_LASX_XVMULWOD2): Remove.
      	(UNSPEC_LASX_XVMULWOD3): Remove.
      	(lasx_xv<addsubmul:optab>wev_h_b<u>): Remove.
      	(lasx_xv<addsubmul:optab>wev_w_h<u>): Remove.
      	(lasx_xv<addsubmul:optab>wev_d_w<u>): Remove.
      	(lasx_xvaddwev_q_d): Remove.
      	(lasx_xvsubwev_q_d): Remove.
      	(lasx_xvmulwev_q_d): Remove.
      	(lasx_xv<addsubmul:optab>wod_h_b<u>): Remove.
      	(lasx_xv<addsubmul:optab>wod_w_h<u>): Remove.
      	(lasx_xv<addsubmul:optab>wod_d_w<u>): Remove.
      	(lasx_xvaddwod_q_d): Remove.
      	(lasx_xvsubwod_q_d): Remove.
      	(lasx_xvmulwod_q_d): Remove.
      	(lasx_xvaddwev_q_du): Remove.
      	(lasx_xvsubwev_q_du): Remove.
      	(lasx_xvmulwev_q_du): Remove.
      	(lasx_xvaddwod_q_du): Remove.
      	(lasx_xvsubwod_q_du): Remove.
      	(lasx_xvmulwod_q_du): Remove.
      	(lasx_xv<addmul:optab>wev_h_bu_b): Remove.
      	(lasx_xv<addmul:optab>wev_w_hu_h): Remove.
      	(lasx_xv<addmul:optab>wev_d_wu_w): Remove.
      	(lasx_xv<addmul:optab>wod_h_bu_b): Remove.
      	(lasx_xv<addmul:optab>wod_w_hu_h): Remove.
      	(lasx_xv<addmul:optab>wod_d_wu_w): Remove.
      	(lasx_xvaddwev_q_du_d): Remove.
      	(lasx_xvsubwev_q_du_d): Remove.
      	(lasx_xvmulwev_q_du_d): Remove.
      	(lasx_xvaddwod_q_du_d): Remove.
      	(lasx_xvsubwod_q_du_d): Remove.
      	* config/loongarch/lsx.md (UNSPEC_LSX_XVADDWEV): Remove.
      	(UNSPEC_LSX_VADDWEV2): Remove.
      	(UNSPEC_LSX_VADDWEV3): Remove.
      	(UNSPEC_LSX_VSUBWEV): Remove.
      	(UNSPEC_LSX_VSUBWEV2): Remove.
      	(UNSPEC_LSX_VMULWEV): Remove.
      	(UNSPEC_LSX_VMULWEV2): Remove.
      	(UNSPEC_LSX_VMULWEV3): Remove.
      	(UNSPEC_LSX_VADDWOD): Remove.
      	(UNSPEC_LSX_VADDWOD2): Remove.
      	(UNSPEC_LSX_VADDWOD3): Remove.
      	(UNSPEC_LSX_VSUBWOD): Remove.
      	(UNSPEC_LSX_VSUBWOD2): Remove.
      	(UNSPEC_LSX_VMULWOD): Remove.
      	(UNSPEC_LSX_VMULWOD2): Remove.
      	(UNSPEC_LSX_VMULWOD3): Remove.
      	(lsx_v<addsubmul:optab>wev_h_b<u>): Remove.
      	(lsx_v<addsubmul:optab>wev_w_h<u>): Remove.
      	(lsx_v<addsubmul:optab>wev_d_w<u>): Remove.
      	(lsx_vaddwev_q_d): Remove.
      	(lsx_vsubwev_q_d): Remove.
      	(lsx_vmulwev_q_d): Remove.
      	(lsx_v<addsubmul:optab>wod_h_b<u>): Remove.
      	(lsx_v<addsubmul:optab>wod_w_h<u>): Remove.
      	(lsx_v<addsubmul:optab>wod_d_w<u>): Remove.
      	(lsx_vaddwod_q_d): Remove.
      	(lsx_vsubwod_q_d): Remove.
      	(lsx_vmulwod_q_d): Remove.
      	(lsx_vaddwev_q_du): Remove.
      	(lsx_vsubwev_q_du): Remove.
      	(lsx_vmulwev_q_du): Remove.
      	(lsx_vaddwod_q_du): Remove.
      	(lsx_vsubwod_q_du): Remove.
      	(lsx_vmulwod_q_du): Remove.
      	(lsx_v<addmul:optab>wev_h_bu_b): Remove.
      	(lsx_v<addmul:optab>wev_w_hu_h): Remove.
      	(lsx_v<addmul:optab>wev_d_wu_w): Remove.
      	(lsx_v<addmul:optab>wod_h_bu_b): Remove.
      	(lsx_v<addmul:optab>wod_w_hu_h): Remove.
      	(lsx_v<addmul:optab>wod_d_wu_w): Remove.
      	(lsx_vaddwev_q_du_d): Remove.
      	(lsx_vsubwev_q_du_d): Remove.
      	(lsx_vmulwev_q_du_d): Remove.
      	(lsx_vaddwod_q_du_d): Remove.
      	(lsx_vsubwod_q_du_d): Remove.
      	(lsx_vmulwod_q_du_d): Remove.
      	* config/loongarch/loongarch-modes.def: Add V4TI and V1DI.
      	* config/loongarch/loongarch-protos.h
      	(loongarch_gen_stepped_int_parallel): New function prototype.
      	* config/loongarch/loongarch.cc (loongarch_print_operand):
      	Accept 'O' for printing "ev" or "od."
      	(loongarch_gen_stepped_int_parallel): Implement.
      	* config/loongarch/predicates.md
      	(vect_par_cnst_even_or_odd_half): New define_predicate.
      	* config/loongarch/simd.md (WVEC_HALF): New define_mode_attr.
      	(simdfmt_w): Likewise.
      	(zero_one): New define_int_iterator.
      	(ev_od): New define_int_attr.
      	(simd_<optab>w_evod_<mode:IVEC>_<su>): New define_insn.
      	(<simd_isa>_<x>v<optab>w<ev_od>_<simdfmt_w>_<simdfmt><u>): New
      	define_expand.
      	(simd_<optab>w_evod_<mode>_hetero): New define_insn.
      	(<simd_isa>_<x>v<optab>w<ev_od>_<simdfmt_w>_<simdfmt>u_<simdfmt>):
      	New define_expand.
      	(DIVEC): New define_mode_iterator.
      	(<simd_isa>_<optab>w<ev_od>_q_d<u>_punned): New define_expand.
      	(<simd_isa>_<optab>w<ev_od>_q_du_d_punned): Likewise.
      	* config/loongarch/loongarch-builtins.cc
      	(CODE_FOR_lsx_vaddwev_q_d): Define as a macro to override it
      	with the punned expand.
      	(CODE_FOR_lsx_vaddwev_q_du): Likewise.
      	(CODE_FOR_lsx_vsubwev_q_d): Likewise.
      	(CODE_FOR_lsx_vsubwev_q_du): Likewise.
      	(CODE_FOR_lsx_vmulwev_q_d): Likewise.
      	(CODE_FOR_lsx_vmulwev_q_du): Likewise.
      	(CODE_FOR_lsx_vaddwod_q_d): Likewise.
      	(CODE_FOR_lsx_vaddwod_q_du): Likewise.
      	(CODE_FOR_lsx_vsubwod_q_d): Likewise.
      	(CODE_FOR_lsx_vsubwod_q_du): Likewise.
      	(CODE_FOR_lsx_vmulwod_q_d): Likewise.
      	(CODE_FOR_lsx_vmulwod_q_du): Likewise.
      	(CODE_FOR_lsx_vaddwev_q_du_d): Likewise.
      	(CODE_FOR_lsx_vmulwev_q_du_d): Likewise.
      	(CODE_FOR_lsx_vaddwod_q_du_d): Likewise.
      	(CODE_FOR_lsx_vmulwod_q_du_d): Likewise.
      	(CODE_FOR_lasx_xvaddwev_q_d): Likewise.
      	(CODE_FOR_lasx_xvaddwev_q_du): Likewise.
      	(CODE_FOR_lasx_xvsubwev_q_d): Likewise.
      	(CODE_FOR_lasx_xvsubwev_q_du): Likewise.
      	(CODE_FOR_lasx_xvmulwev_q_d): Likewise.
      	(CODE_FOR_lasx_xvmulwev_q_du): Likewise.
      	(CODE_FOR_lasx_xvaddwod_q_d): Likewise.
      	(CODE_FOR_lasx_xvaddwod_q_du): Likewise.
      	(CODE_FOR_lasx_xvsubwod_q_d): Likewise.
      	(CODE_FOR_lasx_xvsubwod_q_du): Likewise.
      	(CODE_FOR_lasx_xvmulwod_q_d): Likewise.
      	(CODE_FOR_lasx_xvmulwod_q_du): Likewise.
      	(CODE_FOR_lasx_xvaddwev_q_du_d): Likewise.
      	(CODE_FOR_lasx_xvmulwev_q_du_d): Likewise.
      	(CODE_FOR_lasx_xvaddwod_q_du_d): Likewise.
      	(CODE_FOR_lasx_xvmulwod_q_du_d): Likewise.
      a36c15aa
    • Xi Ruoyao's avatar
      LoongArch: Allow moving TImode vectors · ac1b0586
      Xi Ruoyao authored
      We have some vector instructions for operations on 128-bit integer, i.e.
      TImode, vectors.  Previously they had been modeled with unspecs, but
      it's more natural to just model them with TImode vector RTL expressions.
      
      In preparation, allow moving V1TImode and V2TImode vectors in LSX
      and LASX registers so we won't get a reload failure when we start to
      save TImode vectors in these registers.
      
      This implicitly depends on the vrepli optimization: without it we'd try
      "vrepli.q" which does not really exist and trigger an ICE.
      
      gcc/ChangeLog:
      
      	* config/loongarch/lsx.md (mov<LSX:mode>): Remove.
      	(movmisalign<LSX:mode>): Remove.
      	(mov<LSX:mode>_lsx): Remove.
      	* config/loongarch/lasx.md (mov<LASX:mode>): Remove.
      	(movmisalign<LASX:mode>): Remove.
      	(mov<LASX:mode>_lasx): Remove.
      	* config/loongarch/loongarch-modes.def (V1TI): Add.
      	(V2TI): Mention in the comment.
      	* config/loongarch/loongarch.md (mode): Add V1TI and V2TI.
      	* config/loongarch/simd.md (ALLVEC_TI): New mode iterator.
      	(mov<ALLVEC_TI:mode>): New define_expand.
      	(movmisalign<ALLVEC_TI:mode>): Likewise.
      	(mov<ALLVEC_TI:mode>_simd): New define_insn_and_split.
      ac1b0586
    • Xi Ruoyao's avatar
      LoongArch: Try harder using vrepli instructions to materialize const vectors · ed979454
      Xi Ruoyao authored
      For
      
        a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd}
      
      we just want
      
        vrepli.b $vr0, 0xdd
      
      but the compiler actually produces a load:
      
        la.local $r14,.LC0
        vld      $vr0,$r14,0
      
      It's because we only tried vrepli.d which wouldn't work.  Try all vrepli
      instructions for const int vector materializing to fix it.
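The search over element widths can be sketched on a raw 16-byte constant. This is an illustrative model, not loongarch_const_vector_vrepli: the function names are hypothetical, a little-endian lane layout is assumed, and the [-512, 511] immediate range and the smallest-width-first order are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian, sign-extended element of WIDTH bytes at P.  */
static int64_t
load_elem (const uint8_t *p, int width)
{
  uint64_t v = 0;
  for (int i = width - 1; i >= 0; i--)
    v = (v << 8) | p[i];
  if (width < 8)
    {
      uint64_t sign = UINT64_C (1) << (width * 8 - 1);
      v = (v ^ sign) - sign;  /* sign-extend from WIDTH bytes */
    }
  return (int64_t) v;
}

/* Return the smallest element width in bytes (1, 2, 4 or 8) at which the
   16-byte constant is one replicated value in [-512, 511], or 0 if no
   width works.  The immediate range is an assumption of this sketch.  */
int
smallest_repl_width (const uint8_t bytes[16])
{
  assert (bytes);
  for (int width = 1; width <= 8; width *= 2)
    {
      int64_t first = load_elem (bytes, width);
      int same = 1;
      for (int off = width; off < 16; off += width)
        if (load_elem (bytes + off, width) != first)
          same = 0;
      if (same && first >= -512 && first <= 511)
        return width;
    }
  return 0;
}
```

For the v4si constant above, every byte is 0xdd, so the byte-wide form already matches, whereas the same value viewed as 32-bit or 64-bit elements falls outside the assumed immediate range.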
      
      gcc/ChangeLog:
      
      	* config/loongarch/loongarch-protos.h
      	(loongarch_const_vector_vrepli): New function prototype.
      	* config/loongarch/loongarch.cc (loongarch_const_vector_vrepli):
      	Implement.
      	(loongarch_const_insns): Call loongarch_const_vector_vrepli
      	instead of loongarch_const_vector_same_int_p.
      	(loongarch_split_vector_move_p): Likewise.
      	(loongarch_output_move): Use loongarch_const_vector_vrepli to
      	pun operand[1] into a better mode if it's a const int vector,
      	and decide the suffix of [x]vrepli with the new mode.
      	* config/loongarch/constraints.md (YI): Call
      	loongarch_const_vector_vrepli instead of
      	loongarch_const_vector_same_int_p.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/loongarch/vrepli.c: New test.
      ed979454
    • Xi Ruoyao's avatar
      LoongArch: Accept ADD, IOR or XOR when combining objects with no bits in common [PR115478] · ea3ebe48
      Xi Ruoyao authored
      Since r15-1120, multi-word shifts/rotates produce PLUS instead of IOR.
      It's generally a good thing (allowing us to use our alsl instruction or
      similar instructions on other architectures), but it's preventing us
      from using bytepick.  For example, if we shift a __int128 left by 16 bits,
      the higher word can be produced via a single bytepick.d instruction with
      immediate 2, but we got:
      
      	srli.d	$r12,$r4,48
      	slli.d	$r5,$r5,16
      	slli.d	$r4,$r4,16
      	add.d	$r5,$r12,$r5
      	jr	$r1
      
      This didn't work with GCC 14, but after r15-6490 it's supposed to work
      if IOR is used instead of PLUS.
      
      To fix this, add a code iterator to match IOR, XOR, and PLUS and use it
      instead of just IOR if we know the operands have no overlapping bits.
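
      Why the three codes are interchangeable here can be checked with a small C
      sketch.  It is an illustration only (the function names are hypothetical
      and it models the high word of the 128-bit left shift with two 64-bit
      halves): when the two operands share no set bits, no carry can occur, so
      IOR, PLUS and XOR all give the same result.

```c
#include <stdint.h>

/* High 64 bits of ((hi:lo) << 16), written with each of the three
   combining operations.  (hi << 16) has its low 16 bits clear and
   (lo >> 48) has only its low 16 bits set, so the operands have no
   bits in common.  */
static uint64_t high_ior(uint64_t hi, uint64_t lo)
{
    return (hi << 16) | (lo >> 48);
}

static uint64_t high_plus(uint64_t hi, uint64_t lo)
{
    return (hi << 16) + (lo >> 48);
}

static uint64_t high_xor(uint64_t hi, uint64_t lo)
{
    return (hi << 16) ^ (lo >> 48);
}
```

      A pattern that only matches the IOR form misses the other two spellings of
      the same value, which is the gap the code iterator closes.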
      
      gcc/ChangeLog:
      
      	PR target/115478
      	* config/loongarch/loongarch.md (any_or_plus): New
      	define_code_iterator.
      	(bstrins_<mode>_for_ior_mask): Use any_or_plus instead of ior.
      	(bytepick_w_<bytepick_imm>): Likewise.
      	(bytepick_d_<bytepick_imm>): Likewise.
      	(bytepick_d_<bytepick_imm>_rev): Likewise.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/115478
      	* gcc.target/loongarch/bytepick_shift_128.c: New test.
      ea3ebe48
  14. Feb 18, 2025
    • Robin Dapp's avatar
      RISC-V: Fix ratio in vsetvl fuse rule [PR115703]. · 44d4a108
      Robin Dapp authored
      In PR115703 we fuse two vsetvls:
      
          Fuse curr info since prev info compatible with it:
            prev_info: VALID (insn 438, bb 2)
              Demand fields: demand_ge_sew demand_non_zero_avl
              SEW=32, VLMUL=m1, RATIO=32, MAX_SEW=64
              TAIL_POLICY=agnostic, MASK_POLICY=agnostic
              AVL=(reg:DI 0 zero)
              VL=(reg:DI 9 s1 [312])
            curr_info: VALID (insn 92, bb 20)
              Demand fields: demand_ratio_and_ge_sew demand_avl
              SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
              TAIL_POLICY=agnostic, MASK_POLICY=agnostic
              AVL=(const_int 4 [0x4])
              VL=(nil)
            prev_info after fused: VALID (insn 438, bb 2)
              Demand fields: demand_ratio_and_ge_sew demand_avl
              SEW=64, VLMUL=mf2, RATIO=64, MAX_SEW=64
              TAIL_POLICY=agnostic, MASK_POLICY=agnostic
              AVL=(const_int 4 [0x4])
              VL=(nil).
      
      The result is vsetvl zero, zero, e64, mf2, ta, ma.  The previous vsetvl
      set vl = 4 but here we wrongly set it to vl = 2.  As all the following
      vsetvls only ever change the ratio we never recover.
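
      The arithmetic behind the ratio can be sketched in C.  This is not the
      logic in riscv-vsetvl.cc; the function names are hypothetical, LMUL is
      modelled as a fraction, and VLEN = 128 in the usage note is an assumption
      chosen only for illustration.  It relies on the RVV definitions
      RATIO = SEW / LMUL and VLMAX = VLEN / RATIO.

```c
/* SEW/LMUL ratio, with LMUL given as lmul_num / lmul_den
   (m1 is 1/1, mf2 is 1/2, m2 is 2/1).  */
static unsigned sew_lmul_ratio(unsigned sew, unsigned lmul_num,
                               unsigned lmul_den)
{
    return sew * lmul_den / lmul_num;
}

/* VLMAX for a given VLEN in bits: VLEN * LMUL / SEW.  */
static unsigned vlmax(unsigned vlen, unsigned sew, unsigned lmul_num,
                      unsigned lmul_den)
{
    return vlen / sew_lmul_ratio(sew, lmul_num, lmul_den);
}
```

      With these definitions e32/m1 has ratio 32 and e64/m1 has ratio 64, while
      e64/mf2 has ratio 128, so the fused SEW/VLMUL pair does not have the
      RATIO=64 that the dump reports.  Under the assumed VLEN of 128 the three
      settings give VLMAX 4, 2 and 1 respectively.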
      
      The issue is quite difficult to trigger because we can often
      deduce the value of d at runtime.  Then every check for the value of
      d will be optimized away.
      
      The last known bad commit is r15-3458-g5326306e7d9d36.  With that commit
      the output is wrong but -fno-schedule-insns makes it correct.  From the
      next commit on the issue is latent.  I still added the PR's test as scan
      and run check even if they don't trigger right now.  Not sure if the
      run test will ever fail but well.  I verified that the
      patch fixes the issue when applied on top of r15-3458-g5326306e7d9d36.
      
      	PR target/115703
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vsetvl.cc: Use max_sew for calculating the
      	new LMUL.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/pr115703-run.c: New test.
      	* gcc.target/riscv/rvv/autovec/pr115703.c: New test.
      44d4a108
    • Soumya AR's avatar
      aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h · 8606ab34
      Soumya AR authored
      
      generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses
      generic_prefetch_tune in generic_armv8_a_tunings.
      
      This patch updates the pointer to generic_armv8_a_prefetch_tune.
      
      This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
      
      Signed-off-by: Soumya AR <soumyaa@nvidia.com>
      
      gcc/ChangeLog:
      
      	* config/aarch64/tuning_models/generic_armv8_a.h: Updated prefetch
      	struct pointer.
      8606ab34
    • Pan Li's avatar
      RISC-V: Fix ICE for target attributes has different xlen size · 17b95cfc
      Pan Li authored
      
      This patch avoids the ICE that occurs when the target attribute
      specifies an xlen different from the command line's, i.e. compiling
      with rv64gc while the target attribute is rv32gcv_zbb.  For example:
      
        long foo (long a, long b)
        __attribute__((target("arch=rv32gcv_zbb")));

        long foo (long a, long b)
        {
          return a + (b * 2);
        }
      
      When compiled with rv64gc -O3, it triggers an ICE similar to the one below:
      
      during RTL pass: fwprop1
      test.c: In function ‘foo’:
      test.c:10:1: internal compiler error: in add_use, at
      rtl-ssa/accesses.cc:1234
         10 | }
            | ^
      0x44d6b9d internal_error(char const*, ...)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
      0x44a26a6 fancy_abort(char const*, int, char const*)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
      0x408fac9 rtl_ssa::function_info::add_use(rtl_ssa::use_info*)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/accesses.cc:1234
      0x40a5eea
      rtl_ssa::function_info::create_reg_use(rtl_ssa::function_info::build_info&,
      rtl_ssa::insn_info*, rtl_ssa::resource_info)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/insns.cc:496
      0x4456738
      rtl_ssa::function_info::add_artificial_accesses(rtl_ssa::function_info::build_info&,
      df_ref_flags)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:900
      0x4457297
      rtl_ssa::function_info::start_block(rtl_ssa::function_info::build_info&,
      rtl_ssa::bb_info*)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1082
      0x4453627
      rtl_ssa::function_info::bb_walker::before_dom_children(basic_block_def*)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:118
      0x3e9f3fb dom_walker::walk(basic_block_def*)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/domwalk.cc:311
      0x445806f rtl_ssa::function_info::process_all_blocks()
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1298
      0x40a22d3 rtl_ssa::function_info::function_info(function*)
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/functions.cc:51
      0x3ec3f80 fwprop_init
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:893
      0x3ec420d fwprop
              /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:963
      0x3ec43ad execute
      
      Since we are in stage 4, we just report an error for the above scenario
      when the cmd xlen is detected to differ from the target attribute in the
      implementation of the target hook TARGET_OPTION_VALID_ATTRIBUTE_P.
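
      The kind of check described above can be sketched in standalone C.  This is
      not the code in riscv_target_attr_parser::parse_arch; the helper names
      arch_xlen and attr_xlen_ok are hypothetical, and the sketch assumes the
      xlen can be read from the "rv32"/"rv64" prefix of an arch string.

```c
#include <string.h>

/* Hypothetical helper: return 32 or 64 for an arch string starting
   with "rv32" or "rv64", and 0 for anything else.  */
static int arch_xlen(const char *arch)
{
    if (strncmp(arch, "rv32", 4) == 0)
        return 32;
    if (strncmp(arch, "rv64", 4) == 0)
        return 64;
    return 0;
}

/* Hypothetical helper: nonzero when the arch named in a target
   attribute has the same xlen as the command-line arch.  A caller
   would report an error, rather than continue, when this is zero.  */
static int attr_xlen_ok(const char *cmd_arch, const char *attr_arch)
{
    int cmd = arch_xlen(cmd_arch);
    int attr = arch_xlen(attr_arch);
    return cmd != 0 && cmd == attr;
}
```

      For the example above, attr_xlen_ok ("rv64gc", "rv32gcv_zbb") is zero, so
      the mismatch would be rejected up front instead of reaching later RTL
      passes.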
      
      	PR target/118540
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
      	Report error when cmd xlen differs from the target attribute.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/base/pr118540-1.c: New test.
      	* gcc.target/riscv/rvv/base/pr118540-2.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      17b95cfc