Commit 841668aa authored 1 year ago by Pan Li

RISC-V: Refine bswap16 auto vectorization code gen


Update in v2

* Remove emit helper functions.
* Take expand_binop instead.

Original log:

This patch refines the code gen for bswap16.

When a loop invoking __builtin_bswap is vectorized, RTL expand sees a
VEC_PERM_EXPR and generates about 9 instructions in the loop, as shown
below, no matter whether it is bswap16, bswap32 or bswap64.

  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli    a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub     a4,a4,a5
7 vse16.v v1,0(a3)
8 add     a0,a0,a2
9 add     a3,a3,a2
  bne     a4,zero,.L2

But for bswap16 we can have even simpler code gen, with only
7 instructions in the loop, as shown below.

  .L5:
1 vle8.v  v2,0(a5)
2 addi    a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi    a4,a4,32
  bne     a5,a6,.L5

Unfortunately, the same approach would grow the loop to 13 and 24
instructions for bswap32 and bswap64 respectively. Thus, we refine
the code gen for bswap16 only, and leave both bswap32 and bswap64
as is.
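
For illustration only, a minimal C sketch of the equivalence the shorter
sequence relies on: a 16-bit byte swap is a right shift by 8 OR-ed with a
left shift by 8. The function names here are hypothetical and are not
taken from the patch or its tests; only __builtin_bswap16 is the GCC
builtin referred to above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar loop using the GCC builtin; a loop of this shape is the kind of
   input the log above is talking about.  */
void
bswap16_builtin (uint16_t *restrict out, const uint16_t *restrict in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = __builtin_bswap16 (in[i]);
}

/* The same operation written as shift right, shift left, or.  The cast
   drops the bits that the left shift pushes above bit 15.  */
void
bswap16_shift_or (uint16_t *restrict out, const uint16_t *restrict in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint16_t) ((in[i] >> 8) | (in[i] << 8));
}

/* Exhaustively check that both forms agree on every 16-bit value.  */
void
check_bswap16_forms_agree (void)
{
  for (uint32_t v = 0; v <= UINT16_MAX; v++)
    {
      uint16_t in = (uint16_t) v;
      uint16_t a, b;
      bswap16_builtin (&a, &in, 1);
      bswap16_shift_or (&b, &in, 1);
      assert (a == b);
    }
}
```

Two shifts and one OR per element are enough for 16-bit elements; wider
elements need more shift and mask steps per byte, which is consistent
with the larger instruction counts quoted above for bswap32 and bswap64.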

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl
	for shuffle bswap.
	(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

parent 1543f3e3

Showing with 188 additions and 2 deletions
