Commit 2f14c0db authored by Roger Sayle

PR target/113560: Enhance is_widening_mult_rhs_p.

This patch resolves PR113560, a code quality regression from GCC12
affecting x86_64, by enhancing the middle-end's tree-ssa-math-opts.cc
to recognize more instances of widening multiplications.

The widening multiplication recognition code identifies cases like:

	_1 = (unsigned __int128) x;
	__res = _1 * 100;

but in the reported test case, the original input looks like:

	_1 = (unsigned long long) x;
	_2 = (unsigned __int128) _1;
	__res = _2 * 100;

which gets optimized by constant folding during tree-ssa to:

	_2 = x & 18446744073709551615;  // x & 0xffffffffffffffff
	__res = _2 * 100;

where the BIT_AND_EXPR hides (has consumed) the extension operation.
This reveals a more general deficiency (a missed optimization
opportunity) in widening multiplication recognition, namely that
both

__int128 foo(__int128 x, __int128 y) {
  return (x & 1000) * (y & 1000);
}

and

unsigned __int128 bar(unsigned __int128 x, unsigned __int128 y) {
  return (x >> 80) * (y >> 80);
}

should be recognized as widening multiplications.  Hence, rather than
testing explicitly for BIT_AND_EXPR (as in the first version of this
patch), the more general solution is to make use of range information,
as provided by tree_non_zero_bits.
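
The idea can be sketched in plain C.  This is an illustration only, not
the code in tree-ssa-math-opts.cc: the helper names below are
hypothetical, and a 128-bit type with 64-bit halves is assumed.  Given a
mask of the bits that may be nonzero in an operand, the operand can be
treated as a half-width value when no bit in the upper half can be set
(and, for a signed multiplication, when the narrow sign bit is also
clear):

```c
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical helper, not GCC's code.  NONZERO is a mask of the bits
   that may be set in an operand.  For an unsigned multiplication the
   operand behaves like a zero-extended 64-bit value when no bit at or
   above bit 64 can be set.  */
static bool
fits_unsigned_half (u128 nonzero)
{
  return (nonzero >> 64) == 0;
}

/* For a signed multiplication bit 63 must also be clear, so that the
   value is non-negative when viewed as a signed 64-bit integer.  */
static bool
fits_signed_half (u128 nonzero)
{
  return (nonzero >> 63) == 0;
}

/* Possibly-set bits of (x & C): at most the bits set in C.  */
static u128
nonzero_after_and (u128 c)
{
  return c;
}

/* Possibly-set bits of (x >> N) for an unsigned 128-bit x,
   assuming 0 <= N < 128.  */
static u128
nonzero_after_shift (unsigned n)
{
  return ~(u128) 0 >> n;
}
```

Under these assumptions, fits_signed_half (nonzero_after_and (1000))
holds for the operands of foo, and
fits_unsigned_half (nonzero_after_shift (80)) holds for the operands of
bar, so both multiplications qualify without any special case for
BIT_AND_EXPR.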

As a demonstration of the observed improvements, function foo above
currently compiles with -O2 on x86_64 to:

foo:	movq    %rdi, %rsi
        movq    %rdx, %r8
        xorl    %edi, %edi
        xorl    %r9d, %r9d
        andl    $1000, %esi
        andl    $1000, %r8d
        movq    %rdi, %rcx
        movq    %r9, %rdx
        imulq   %rsi, %rdx
        movq    %rsi, %rax
        imulq   %r8, %rcx
        addq    %rdx, %rcx
        mulq    %r8
        addq    %rdx, %rcx
        movq    %rcx, %rdx
        ret

with this patch, GCC recognizes the widening multiplication (w*) and
instead generates:

foo:    movq    %rdi, %rsi
        movq    %rdx, %r8
        andl    $1000, %esi
        andl    $1000, %r8d
        movq    %rsi, %rax
        imulq   %r8
        ret

which is perhaps easier to understand at the tree-level where

__int128 foo (__int128 x, __int128 y)
{
  __int128 _1;
  __int128 _2;
  __int128 _5;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) & 1000;
  _2 = y_4(D) & 1000;
  _5 = _1 * _2;
  return _5;

}

gets transformed to:

__int128 foo (__int128 x, __int128 y)
{
  __int128 _1;
  __int128 _2;
  __int128 _5;
  signed long _7;
  signed long _8;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) & 1000;
  _2 = y_4(D) & 1000;
  _7 = (signed long) _1;
  _8 = (signed long) _2;
  _5 = _7 w* _8;
  return _5;
}
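
A rough source-level analogue of the transformed GIMPLE is sketched
below.  The function names foo_full and foo_widen are hypothetical and
are used only to compare the two forms.  It assumes a target where
int64_t corresponds to the "signed long" in the dump above:

```c
#include <stdint.h>

/* Full 128-bit multiplication, as written in the original foo.  */
static __int128
foo_full (__int128 x, __int128 y)
{
  return (x & 1000) * (y & 1000);
}

/* Analogue of the transformed form: each masked operand is known to fit
   in 64 bits, so it is narrowed first and the product is computed as a
   64 x 64 -> 128 bit widening multiplication.  */
static __int128
foo_widen (__int128 x, __int128 y)
{
  int64_t a = (int64_t) (x & 1000);
  int64_t b = (int64_t) (y & 1000);
  return (__int128) a * b;
}
```

Because the masked values lie between 0 and 1000, the narrowing casts
lose no information and both functions return the same result for every
input.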

2023-02-01  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR target/113560
	* tree-ssa-math-opts.cc (is_widening_mult_rhs_p): Use range
	information via tree_non_zero_bits to check if this operand
	is suitably extended for a widening (or highpart) multiplication.
	(convert_mult_to_widen): Insert explicit casts if the RHS or LHS
	isn't already of the claimed type.

gcc/testsuite/ChangeLog
	PR target/113560
	* g++.target/i386/pr113560.C: New test case.
	* gcc.target/i386/pr113560.c: Likewise.
	* gcc.dg/pr87954.c: Update test case.
parent fd4829dd