Commit c4638cc4 authored 2 years ago by mtsamis Committed by Philipp Tomsich 1 year ago

aarch64: convert vector shift + bitwise and + multiply to vector compare


When using SWAR (SIMD in a register) techniques a comparison operation
within such a register can be made by using a combination of shifts,
bitwise and and multiplication. If code using this scheme is
vectorized then there is potential to replace all these operations
with a single vector comparison, by reinterpreting the vector types to
match the width of the SWAR register.

For example, for the test function packed_cmp_16_32, the original
generated code is:
        ldr     q0, [x0]
        add     w1, w1, 1
        ushr    v0.4s, v0.4s, 15
        and     v0.16b, v0.16b, v2.16b
        shl     v1.4s, v0.4s, 16
        sub     v0.4s, v1.4s, v0.4s
        str     q0, [x0], 16
        cmp     w2, w1
        bhi     .L20
with this pattern the above can be optimized to:
        ldr     q0, [x0]
        add     w1, w1, 1
        cmlt    v0.8h, v0.8h, #0
        str     q0, [x0], 16
        cmp     w2, w1
        bhi     .L20
The effect is similar for x86-64.

Bootstrapped and reg-tested for x86 and aarch64.

gcc/ChangeLog:

	* match.pd: simplify vector shift + bit_and + multiply.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/swar_to_vec_cmp.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>

parent 9eea27e7

No related branches found

No related tags found

Hide whitespace changes

Inline Side-by-side

Showing with 133 additions and 0 deletions

Please register or to comment