From c5752c1f01316ac26ec9cf8d171d68aea420a158 Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Tue, 18 Feb 2025 11:00:57 +0000
Subject: [PATCH] late-combine: Tighten register class check [PR108840]

gcc.target/aarch64/pr108840.c has failed since r15-268-g9dbff9c05520
(which means that I really ought to have looked at it earlier).

The test wants us to fold an SImode AND into all shifts that use it.
This is something that late-combine is supposed to do, but:

(1) the pre-RA pass chickened out because of a register pressure check

(2) the post-RA pass can't handle it, because the shift uses are in
    QImode and the sets are in SImode

Both are things that would be good to fix.  But (1) is particularly
silly.  The constraints on the AND have "rk" for the destination
(so allowing the stack pointer) and "r" for the first source.
Including the stack pointer made the destination class seem more
permissive than the source class, so the subset check failed.

The intention was instead to check whether there are any
*allocatable* registers in the destination class that aren't
present in the source.
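
As a rough illustration of the difference, here is a toy model of the
two checks.  It is not GCC code: registers are bits in a 64-bit mask,
the register numbering is made up, and treating the stack pointer as
the only fixed register is an assumption made for the example.  The
names old_check_passes and new_check_passes are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy model, not GCC code: one bit per hard register.
using reg_set = std::uint64_t;

// Hypothetical numbering: bits 0..30 are general registers and
// bit 31 is the stack pointer.
constexpr reg_set GENERAL = (reg_set{1} << 31) - 1;
constexpr reg_set STACK_POINTER = reg_set{1} << 31;

// Assumption for this sketch: only the stack pointer is fixed,
// i.e. unavailable to the register allocator.
constexpr reg_set FIXED = STACK_POINTER;

// Old behaviour: pass only if every destination register is also
// a source register (a plain subset test).
constexpr bool old_check_passes (reg_set dest, reg_set src)
{
  return (dest & ~src) == 0;
}

// New behaviour: ignore fixed registers, so that only allocatable
// registers missing from the source class count against it.
constexpr bool new_check_passes (reg_set dest, reg_set src, reg_set fixed)
{
  return (dest & ~src & ~fixed) == 0;
}

// "rk" destination versus "r" source, as in the AND pattern.
inline void demo ()
{
  constexpr reg_set rk = GENERAL | STACK_POINTER;
  constexpr reg_set r = GENERAL;
  assert (!old_check_passes (rk, r));       // stack pointer blocks it
  assert (new_check_passes (rk, r, FIXED)); // stack pointer is ignored
}
```

In this model the only register in "rk" but not in "r" is fixed, so
the new check passes where the subset test did not.  A destination
class with an extra allocatable register would still be rejected.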

That's enough for all tests but the last one.  The last one still
fails because combine merges the final shift with the move into
the hard return register, giving an arithmetic instruction with
a hard register destination.  Pre-RA late-combine currently punts
on those, again due to register pressure concerns.  That too is
something I'd like to relax, but not for GCC 15.  In the interim,
the best thing seems to be to disable combine for the test.

gcc/
	PR rtl-optimization/108840
	* late-combine.cc (late_combine::check_register_pressure):
	Take only allocatable registers into account when checking
	the permissiveness of register classes.

gcc/testsuite/
	PR rtl-optimization/108840
	* gcc.target/aarch64/pr108840.c: Run at -O2 but disable combine
	and tree vectorization.
---
 gcc/late-combine.cc                         | 10 ++++++++--
 gcc/testsuite/gcc.target/aarch64/pr108840.c |  2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/late-combine.cc b/gcc/late-combine.cc
index 1707ceebd5f4..90d7ef095832 100644
--- a/gcc/late-combine.cc
+++ b/gcc/late-combine.cc
@@ -552,8 +552,14 @@ late_combine::check_register_pressure (insn_info *insn, rtx set)
 	  // Make sure that the source operand's class is at least as
 	  // permissive as the destination operand's class.
 	  auto src_class = alternative_class (alt, i);
-	  if (!reg_class_subset_p (dest_class, src_class))
-	    return false;
+	  if (dest_class != src_class)
+	    {
+	      auto extra_dest_regs = (reg_class_contents[dest_class]
+				      & ~reg_class_contents[src_class]
+				      & ~fixed_reg_set);
+	      if (!hard_reg_set_empty_p (extra_dest_regs))
+		return false;
+	    }
 
 	  // Make sure that the source operand occupies no more hard
 	  // registers than the destination operand.  This mostly matters
diff --git a/gcc/testsuite/gcc.target/aarch64/pr108840.c b/gcc/testsuite/gcc.target/aarch64/pr108840.c
index 804c1cd91567..7e1ea6fa4fe9 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr108840.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr108840.c
@@ -1,6 +1,6 @@
 /* PR target/108840.  Check that the explicit &31 is eliminated.  */
 /* { dg-do compile } */
-/* { dg-options "-O" } */
+/* { dg-options "-O2 -fno-tree-vectorize -fdisable-rtl-combine" } */
 
 int
 foo (int x, int y)
-- 
GitLab