Commit c4733ea2 authored 1 year ago by Kyrylo Tkachov

aarch64: Cost vector comparisons more accurately

We are missing cases for combining of FACGE/FACGT instructions. In the testcase of the patch we generate:
foo:
        fabs    v3.4s, v0.4s
        fabs    v0.4s, v1.4s
        fabs    v1.4s, v2.4s
        fcmgt   v0.4s, v3.4s, v0.4s
        fcmgt   v1.4s, v3.4s, v1.4s
        b       g

This is because combine is rejecting the pattern due to costs:
Successfully matched this instruction:
(set (reg:V4SI 106)
    (neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113))
            (abs:V4SF (reg:V4SF 111)))))
rejecting combination of insns 8, 9 and 10
original costs 8 + 8 + 12 = 28
replacement costs 8 + 28 = 36

It is obviously recursing in the various arms of the RTX and such.
This patch teaches the aarch64 rtx costs routine that our vector comparisons are represented as a NEG of
compare operators, with the FACGE/FAGT operations in particular having ABS on each arm. With this patch we get
the much more reasonable dump:
original costs 8 + 8 + 8 = 24
replacement costs 8 + 8 = 16
and generate the optimal assembly:
foo:
        mov     v31.16b, v0.16b
        facgt   v0.4s, v0.4s, v1.4s
        facgt   v1.4s, v31.4s, v2.4s
        b       g

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_rtx_costs, NEG case): Add costing
	logic for vector modes.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/facg_1.c: New test.

parent 6c3b30ef

No related branches found

No related tags found

Hide whitespace changes

Inline Side-by-side

Showing with 36 additions and 0 deletions

Please register or to comment