Commit a56c1641 authored by Roger Sayle

Use PTEST to perform AND in TImode STV of (A & B) != 0 on x86_64.

This x86_64 backend patch allows TImode STV to take advantage of the
fact that the PTEST instruction performs an AND operation.  Previously
PTEST was (mostly) used for comparison against zero, by passing the same
register as both operands.  The benefits are demonstrated by the new
test case:

__int128 a,b;
int foo()
{
  return (a & b) != 0;
}

Currently with -O2 -msse4 we generate:

        movdqa  a(%rip), %xmm0
        pand    b(%rip), %xmm0
        xorl    %eax, %eax
        ptest   %xmm0, %xmm0
        setne   %al
        ret

With this patch we now generate:

        movdqa  a(%rip), %xmm0
        xorl    %eax, %eax
        ptest   b(%rip), %xmm0
        setne   %al
        ret
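
The equivalence of the two sequences can be checked with a small
portable model.  This is only a sketch: the u128_model type and the
function names are hypothetical, and only the ZF result of PTEST is
modeled (ZF is set when the AND of the two operands is all zeros).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_model;

/* Model of PTEST's ZF output: set when (dst AND src) is all zeros.
   PTEST also sets CF from (src AND NOT dst); that is not modeled. */
static bool ptest_zf(u128_model dst, u128_model src)
{
    return ((dst.lo & src.lo) | (dst.hi & src.hi)) == 0;
}

/* Old sequence: pand b, a; then ptest of the result with itself. */
static bool and_ne_zero_old(u128_model a, u128_model b)
{
    u128_model t = { a.lo & b.lo, a.hi & b.hi };
    return !ptest_zf(t, t);
}

/* New sequence: ptest b, a; the AND is done by PTEST itself. */
static bool and_ne_zero_new(u128_model a, u128_model b)
{
    return !ptest_zf(a, b);
}
```

Both functions return the same answer for every input, because
(t & t) == t, so the separate PAND is redundant.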

Technically, the magic happens using new define_insn_and_split patterns.
Using two patterns allows this transformation to be performed independently
of whether TImode STV is run before or after combine.  The one tricky
case is that immediate constant operands of the AND behave slightly
differently between TImode and V1TImode: All V1TImode immediate operands
become loads, but for TImode only values that are not hilo_operands
need to be loaded.  Hence the new *testti_doubleword accepts any
general_operand, but internally during split calls force_reg whenever
the second operand is not x86_64_hilo_general_operand.  This required
(and benefits from) some tweaks to TImode STV to support CONST_WIDE_INT in
more places, using CONST_SCALAR_INT_P instead of just CONST_INT_P.
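
As background on why CONST_INT_P alone misses some TImode constants,
the following is a rough model, assuming a 64-bit HOST_WIDE_INT: a
128-bit constant can be held in a CONST_INT only when it equals the
sign extension of its low 64 bits, and otherwise needs a
CONST_WIDE_INT.  The function name is hypothetical and this is not
GCC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, assuming a 64-bit HOST_WIDE_INT: returns true
   when the 128-bit constant (hi:lo) is the sign extension of its low
   64 bits, i.e. when a single sign-extended 64-bit value represents
   it.  Constants for which this returns false are the ones that a
   CONST_INT_P check alone would not match. */
static bool fits_const_int_model(uint64_t lo, uint64_t hi)
{
    uint64_t sign_fill = (lo >> 63) ? UINT64_MAX : 0;
    return hi == sign_fill;
}
```

Under this model, a constant such as 1 << 64 (hi = 1, lo = 0) does
not fit, which is the kind of operand the CONST_SCALAR_INT_P checks
are meant to cover.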

2022-08-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-features.cc (scalar_chain::convert_compare):
	Create new pseudos only when/if needed.  Add support for TEST,
	i.e. (COMPARE (AND x y) (const_int 0)), using UNSPEC_PTEST.
	When broadcasting V2DImode and V4SImode use new pseudo register.
	(timode_scalar_chain::convert_op): Do nothing if operand is
	already V1TImode.  Avoid generating useless SUBREG conversions,
	i.e. (SUBREG:V1TImode (REG:V1TImode) 0).  Handle CONST_WIDE_INT
	in addition to CONST_INT by using CONST_SCALAR_INT_P.
	(convertible_comparison_p): Use CONST_SCALAR_INT_P to match both
	CONST_WIDE_INT and CONST_INT.  Recognize new *testti_doubleword
	pattern as an STV candidate.
	(timode_scalar_to_vector_candidate_p): Allow CONST_SCALAR_INT_P
	operands in binary logic operations.

	* config/i386/i386.cc (ix86_rtx_costs) <case UNSPEC>: Add costs
	for UNSPEC_PTEST; a PTEST that performs an AND has the same cost
	as regular PTEST, i.e. cost->sse_op.

	* config/i386/i386.md (*testti_doubleword): New pre-reload
	define_insn_and_split that recognizes comparison of TI mode AND
	against zero.
	* config/i386/sse.md (*ptest<mode>_and): New pre-reload
	define_insn_and_split that recognizes UNSPEC_PTEST of identical
	AND operands.

gcc/testsuite/ChangeLog
	* gcc.target/i386/sse4_1-stv-8.c: New test case.
parent 6fc14f19