gcc/tree-vect-slp-patterns.cc · a9473f9c6f2d755d2eb79dbd30877e64b4bc6fc8 · COBOLworx / gcc-cobol

3 months ago

middle-end:For multiplication try swapping operands when matching complex multiply [PR116463] · a9473f9c

Tamar Christina authored 3 months ago

This commit fixes the failures of complex.exp=fast-math-complex-mls-*.c on the
GCC 14 branch and some of the ones on the master.

The current matching just looks for one order for multiplication and was relying
on canonicalization to always give the right order because of the TWO_OPERANDS.

However when it comes to the multiplication trying only one order is a bit
fragile as they can be flipped.

The failing tests on the branch are:

void fms180snd(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
               _Complex TYPE c[restrict N]) {
  for (int i = 0; i < N; i++)
    c[i] -= a[i] * (b[i] * I * I);
}

void fms180fst(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
               _Complex TYPE c[restrict N]) {
  for (int i = 0; i < N; i++)
    c[i] -= (a[i] * I * I) * b[i];
}

The issue is just a small difference in commutative operations.
we look for {R,R} * {R,I} but found {R,I} * {R,R}.

Since the DF analysis is cached, we should be able to swap operands and retry
for multiply cheaply.

There is a constraint being checked by vect_validate_multiplication for the data
flow of the operands feeding the multiplications.  So e.g.

between the nodes:

note:   node 0x4d1d210 (max_nunits=2, refcnt=3) vector(2) double
note:   op template: _27 = _10 * _25;
note:      stmt 0 _27 = _10 * _25;
note:      stmt 1 _29 = _11 * _25;
note:   node 0x4d1d060 (max_nunits=2, refcnt=2) vector(2) double
note:   op template: _26 = _11 * _24;
note:      stmt 0 _26 = _11 * _24;
note:      stmt 1 _28 = _10 * _24;

we require the lanes to come from the same source which
vect_validate_multiplication checks.  As such it doesn't make sense to flip them
individually because that would invalidate the earlier linear_loads_p checks
which have validated that the arguments all come from the same datarefs.

This patch thus flips the operands in unison to still maintain this invariant,
but also honor the commutative nature of multiplication.

gcc/ChangeLog:

	PR tree-optimization/116463
	* tree-vect-slp-patterns.cc (complex_mul_pattern::matches,
	complex_fms_pattern::matches): Try swapping operands on multiply.

a9473f9c

History

middle-end:For multiplication try swapping operands when matching complex multiply [PR116463]

Tamar Christina authored 3 months ago

This commit fixes the failures of complex.exp=fast-math-complex-mls-*.c on the
GCC 14 branch and some of the ones on the master.

The current matching just looks for one order for multiplication and was relying
on canonicalization to always give the right order because of the TWO_OPERANDS.

However when it comes to the multiplication trying only one order is a bit
fragile as they can be flipped.

The failing tests on the branch are:

void fms180snd(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
               _Complex TYPE c[restrict N]) {
  for (int i = 0; i < N; i++)
    c[i] -= a[i] * (b[i] * I * I);
}

void fms180fst(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
               _Complex TYPE c[restrict N]) {
  for (int i = 0; i < N; i++)
    c[i] -= (a[i] * I * I) * b[i];
}

The issue is just a small difference in commutative operations.
we look for {R,R} * {R,I} but found {R,I} * {R,R}.

Since the DF analysis is cached, we should be able to swap operands and retry
for multiply cheaply.

There is a constraint being checked by vect_validate_multiplication for the data
flow of the operands feeding the multiplications.  So e.g.

between the nodes:

note:   node 0x4d1d210 (max_nunits=2, refcnt=3) vector(2) double
note:   op template: _27 = _10 * _25;
note:      stmt 0 _27 = _10 * _25;
note:      stmt 1 _29 = _11 * _25;
note:   node 0x4d1d060 (max_nunits=2, refcnt=2) vector(2) double
note:   op template: _26 = _11 * _24;
note:      stmt 0 _26 = _11 * _24;
note:      stmt 1 _28 = _10 * _24;

we require the lanes to come from the same source which
vect_validate_multiplication checks.  As such it doesn't make sense to flip them
individually because that would invalidate the earlier linear_loads_p checks
which have validated that the arguments all come from the same datarefs.

This patch thus flips the operands in unison to still maintain this invariant,
but also honor the commutative nature of multiplication.

gcc/ChangeLog:

	PR tree-optimization/116463
	* tree-vect-slp-patterns.cc (complex_mul_pattern::matches,
	complex_fms_pattern::matches): Try swapping operands on multiply.