i386: Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr functions to implement doubleword right shifts by 1 bit. It uses a shift of the highpart that sets the carry flag, followed by a rotate-carry-right (RCR) instruction on the lowpart.

Conceptually this is similar to the recent left shift patch, but there are two complicating factors. The first is that the RCR sequence is shorter, and is a ~3x performance improvement on AMD, but my microbenchmarking shows it to be ~10% slower on Intel. Hence this patch also introduces a new X86_TUNE_USE_RCR tuning parameter. The second is that I believe this is the first time a "rotate-right-through-carry" and a right shift that sets the carry flag from the least significant bit have been modelled in GCC RTL (on a MODE_CC target). For this I've used the i386 back-end's UNSPEC_CC_NE, which seems appropriate. Finally, rcrsi2 and rcrdi2 are separate define_insns so that we can use their generator functions.

For the pair of functions:

    unsigned __int128 foo(unsigned __int128 x) { return x >> 1; }
    __int128 bar(__int128 x) { return x >> 1; }

with -O2 -march=znver4 we previously generated:

    foo:    movq    %rdi, %rax
            movq    %rsi, %rdx
            shrdq   $1, %rsi, %rax
            shrq    %rdx
            ret
    bar:    movq    %rdi, %rax
            movq    %rsi, %rdx
            shrdq   $1, %rsi, %rax
            sarq    %rdx
            ret

with this patch we now generate:

    foo:    movq    %rsi, %rdx
            movq    %rdi, %rax
            shrq    %rdx
            rcrq    %rax
            ret
    bar:    movq    %rsi, %rdx
            movq    %rdi, %rax
            sarq    %rdx
            rcrq    %rax
            ret

2023-10-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog

	* config/i386/i386-expand.cc (ix86_split_ashr): Split shifts by
	one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR
	or -Oz.
	(ix86_split_lshr): Likewise, split shifts by one bit into
	lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz.
	* config/i386/i386.h (TARGET_USE_RCR): New backend macro.
	* config/i386/i386.md (rcrsi2): New define_insn for rcrl.
	(rcrdi2): New define_insn for rcrq.
	(<anyshiftrt><mode>3_carry): New define_insn for right shifts that
	set the carry flag from the least significant bit, modelled using
	UNSPEC_CC_NE.
	* config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter
	controlling use of rcr 1 vs. shrd; rcr 1 is significantly faster on
	AMD processors.

gcc/testsuite/ChangeLog

	* gcc.target/i386/rcr-1.c: New 64-bit test case.
	* gcc.target/i386/rcr-2.c: New 32-bit test case.
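The C sketch below makes the data flow concrete by modelling what the new sequences compute. It does not attempt to reproduce GCC's RTL or the actual define_insns. A SHR or SAR by one leaves the bit shifted out of the highpart (its bit 0) in CF. An RCR by one then shifts the lowpart right while moving the old CF into its most significant bit.

The type and function names (dword64, lshr1_rcr64, ashr1_rcr64, lshr1_rcr32, ashr1_rcr32, check_samples) are hypothetical. The self-check assumes GCC's __int128 extension and GCC's behaviour for out-of-range signed conversions and for >> on negative values.

```c
/* Hypothetical software model of the "s[ha]r high; rcr low" sequence.
   None of these names exist in GCC; they only illustrate the data flow.  */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A doubleword held in two 64-bit registers (e.g. an __int128 in %rdx:%rax).  */
typedef struct { uint64_t lo, hi; } dword64;

/* Logical shift: "shrq hi" then "rcrq lo".  */
static dword64 lshr1_rcr64 (dword64 x)
{
  uint64_t cf = x.hi & 1;            /* shrq leaves the bit shifted out in CF */
  x.hi >>= 1;                        /* zero fills bit 63 */
  x.lo = (x.lo >> 1) | (cf << 63);   /* rcrq rotates CF into bit 63 */
  return x;
}

/* Arithmetic shift: "sarq hi" then "rcrq lo".  */
static dword64 ashr1_rcr64 (dword64 x)
{
  uint64_t cf = x.hi & 1;            /* sarq also sets CF from bit 0 */
  /* Shift right by one and replicate the old sign bit into bit 63.  */
  x.hi = (x.hi >> 1) | (x.hi & UINT64_C (0x8000000000000000));
  x.lo = (x.lo >> 1) | (cf << 63);
  return x;
}

/* The same idea with 32-bit halves (shrl/sarl + rcrl), as for a 64-bit
   value when compiling with -m32.  */
static uint64_t lshr1_rcr32 (uint64_t v)
{
  uint32_t lo = (uint32_t) v, hi = (uint32_t) (v >> 32);
  uint32_t cf = hi & 1;
  hi >>= 1;
  lo = (lo >> 1) | (cf << 31);
  return ((uint64_t) hi << 32) | lo;
}

static uint64_t ashr1_rcr32 (uint64_t v)
{
  uint32_t lo = (uint32_t) v, hi = (uint32_t) (v >> 32);
  uint32_t cf = hi & 1;
  hi = (hi >> 1) | (hi & UINT32_C (0x80000000));
  lo = (lo >> 1) | (cf << 31);
  return ((uint64_t) hi << 32) | lo;
}

/* Compare the model against the compiler's own doubleword shifts.
   Relies on GCC semantics: out-of-range conversions to signed types wrap,
   and >> on a negative signed value is an arithmetic shift.  */
static void check_samples (void)
{
  static const uint64_t samples[] = {
    0, 1, 2, 3,
    UINT64_C (0x7fffffffffffffff), UINT64_C (0x8000000000000000),
    UINT64_C (0xffffffffffffffff), UINT64_C (0x0123456789abcdef),
    UINT64_C (0xfedcba9876543210)
  };
  const size_t n = sizeof samples / sizeof samples[0];

  for (size_t i = 0; i < n; i++)
    {
      uint64_t v = samples[i];
      assert (lshr1_rcr32 (v) == v >> 1);
      assert (ashr1_rcr32 (v) == (uint64_t) ((int64_t) v >> 1));
#ifdef __SIZEOF_INT128__
      for (size_t j = 0; j < n; j++)
        {
          dword64 d = { samples[j], v };   /* lo = samples[j], hi = v */
          unsigned __int128 u = ((unsigned __int128) v << 64) | samples[j];
          unsigned __int128 ul = u >> 1;
          unsigned __int128 ua = (unsigned __int128) ((__int128) u >> 1);
          dword64 l = lshr1_rcr64 (d), a = ashr1_rcr64 (d);
          assert (l.lo == (uint64_t) ul && l.hi == (uint64_t) (ul >> 64));
          assert (a.lo == (uint64_t) ua && a.hi == (uint64_t) (ua >> 64));
        }
#endif
    }
}
```

Calling check_samples() from a main function exercises both the 64-bit-half and 32-bit-half variants. The carry is held in an explicit variable because C has no carry flag. In the patch, that dependency between the highpart shift and the RCR is what the new _carry patterns expose in RTL.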
- gcc/config/i386/i386-expand.cc: 32 additions, 0 deletions
- gcc/config/i386/i386.h: 1 addition, 0 deletions
- gcc/config/i386/i386.md: 53 additions, 0 deletions
- gcc/config/i386/x86-tune.def: 3 additions, 0 deletions
- gcc/testsuite/gcc.target/i386/rcr-1.c: 6 additions, 0 deletions
- gcc/testsuite/gcc.target/i386/rcr-2.c: 6 additions, 0 deletions