-
- Downloads
RISC-V: Support highpart register overlap for vwcvt
Since Richard supports register filters recently, we are able to support highpart register overlap for widening RVV instructions. This patch support it for vwcvt intrinsics. I leverage real application user codes for vwcvt: https://github.com/riscv/riscv-v-spec/issues/929 https://godbolt.org/z/xoeGnzd8q This is the real application codes that using LMUL = 8 with unrolling to gain optimal performance for specific libraury. You can see in the codegen, GCC has optimal codegen for such since we supported register lowpart overlap for narrowing instructions (dest EEW < source EEW). Now, we start to support highpart register overlap from this patch for widening instructions (dest EEW > source EEW). Leverage this intrinsic codes above but for vwcvt: https://godbolt.org/z/1TMPE5Wfr size_t foo (char const *buf, size_t len) { size_t sum = 0; size_t vl = __riscv_vsetvlmax_e8m8 (); size_t step = vl * 4; const char *it = buf, *end = buf + len; for (; it + step <= end;) { vint8m4_t v0 = __riscv_vle8_v_i8m4 ((void *) it, vl); it += vl; vint8m4_t v1 = __riscv_vle8_v_i8m4 ((void *) it, vl); it += vl; vint8m4_t v2 = __riscv_vle8_v_i8m4 ((void *) it, vl); it += vl; vint8m4_t v3 = __riscv_vle8_v_i8m4 ((void *) it, vl); it += vl; asm volatile("nop" ::: "memory"); vint16m8_t vw0 = __riscv_vwcvt_x_x_v_i16m8 (v0, vl); vint16m8_t vw1 = __riscv_vwcvt_x_x_v_i16m8 (v1, vl); vint16m8_t vw2 = __riscv_vwcvt_x_x_v_i16m8 (v2, vl); vint16m8_t vw3 = __riscv_vwcvt_x_x_v_i16m8 (v3, vl); asm volatile("nop" ::: "memory"); size_t sum0 = __riscv_vmv_x_s_i16m8_i16 (vw0); size_t sum1 = __riscv_vmv_x_s_i16m8_i16 (vw1); size_t sum2 = __riscv_vmv_x_s_i16m8_i16 (vw2); size_t sum3 = __riscv_vmv_x_s_i16m8_i16 (vw3); sum += sumation (sum0, sum1, sum2, sum3); } return sum; } Before this patch: ... csrr t0,vlenb ... vwcvt.x.x.v v16,v8 vwcvt.x.x.v v8,v28 vs8r.v v16,0(sp) ---> spill vwcvt.x.x.v v16,v24 vwcvt.x.x.v v24,v4 nop vsetvli zero,zero,e16,m8,ta,ma vmv.x.s a2,v16 vl8re16.v v16,0(sp) ---> reload ... csrr t0,vlenb ... You can see heavy spill && reload inside the loop body. After this patch: ... vwcvt.x.x.v v8,v12 vwcvt.x.x.v v16,v20 vwcvt.x.x.v v24,v28 vwcvt.x.x.v v0,v4 ... Optimal codegen after this patch. Tested on zvl128b no regression. I am gonna to test zve64d/zvl256b/zvl512b/zvl1024b. Ok for trunk if no regression on the testing above ? Co-authored-by:kito-cheng <kito.cheng@sifive.com> Co-authored-by:
kito-cheng <kito.cheng@gmail.com> PR target/112431 gcc/ChangeLog: * config/riscv/constraints.md (TARGET_VECTOR ? V_REGS : NO_REGS): New register filters. * config/riscv/riscv.md (no,W21,W42,W84,W41,W81,W82): Ditto. (no,yes): Ditto. * config/riscv/vector.md: Support highpart register overlap for vwcvt. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-1.c: New test. * gcc.target/riscv/rvv/base/pr112431-2.c: New test. * gcc.target/riscv/rvv/base/pr112431-3.c: New test.
Showing
- gcc/config/riscv/constraints.md 23 additions, 0 deletionsgcc/config/riscv/constraints.md
- gcc/config/riscv/riscv.md 24 additions, 0 deletionsgcc/config/riscv/riscv.md
- gcc/config/riscv/vector.md 10 additions, 9 deletionsgcc/config/riscv/vector.md
- gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-1.c 104 additions, 0 deletionsgcc/testsuite/gcc.target/riscv/rvv/base/pr112431-1.c
- gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-2.c 68 additions, 0 deletionsgcc/testsuite/gcc.target/riscv/rvv/base/pr112431-2.c
- gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-3.c 51 additions, 0 deletionsgcc/testsuite/gcc.target/riscv/rvv/base/pr112431-3.c
Loading
Please register or sign in to comment