Skip to content
Snippets Groups Projects
Commit 34d4168e authored by Roger Sayle's avatar Roger Sayle
Browse files

i386: Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr
functions to implement doubleword right shifts by 1 bit, using a shift
of the highpart that sets the carry flag followed by a rotate-carry-right
(RCR) instruction on the lowpart.

Conceptually this is similar to the recent left shift patch, but with two
complicating factors.  The first is that although the RCR sequence is
shorter, and is a ~3x performance improvement on AMD, my microbenchmarking
shows it ~10% slower on Intel.  Hence this patch also introduces a new
X86_TUNE_USE_RCR tuning parameter.  The second is that I believe this is
the first time a "rotate-right-through-carry" and a right shift that sets
the carry flag from the least significant bit has been modelled in GCC RTL
(on a MODE_CC target).  For this I've used the i386 back-end's UNSPEC_CC_NE
which seems appropriate.  Finally rcrsi2 and rcrdi2 are separate
define_insns so that we can use their generator functions.

For the pair of functions:
unsigned __int128 foo(unsigned __int128 x) { return x >> 1; }
__int128 bar(__int128 x) { return x >> 1; }

with -O2 -march=znver4 we previously generated:

foo:	movq    %rdi, %rax
        movq    %rsi, %rdx
        shrdq   $1, %rsi, %rax
        shrq    %rdx
        ret
bar:	movq    %rdi, %rax
        movq    %rsi, %rdx
        shrdq   $1, %rsi, %rax
        sarq    %rdx
        ret

with this patch we now generate:

foo:	movq    %rsi, %rdx
        movq    %rdi, %rax
        shrq    %rdx
        rcrq    %rax
        ret
bar:	movq    %rsi, %rdx
        movq    %rdi, %rax
        sarq    %rdx
        rcrq    %rax
        ret

2023-10-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.cc (ix86_split_ashr): Split shifts by
	one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR
	or -Oz.
	(ix86_split_lshr): Likewise, split shifts by one bit into
	lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz.
	* config/i386/i386.h (TARGET_USE_RCR): New backend macro.
	* config/i386/i386.md (rcrsi2): New define_insn for rcrl.
	(rcrdi2): New define_insn for rcrq.
	(<anyshiftrt><mode>3_carry): New define_insn for right shifts that
	set the carry flag from the least significant bit, modelled using
	UNSPEC_CC_NE.
	* config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter
	controlling use of rcr 1 vs. shrd, which is significantly faster on
	AMD processors.

gcc/testsuite/ChangeLog
	* gcc.target/i386/rcr-1.c: New 64-bit test case.
	* gcc.target/i386/rcr-2.c: New 32-bit test case.
parent 85bd47bf
No related branches found
No related tags found
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment