Skip to content
Snippets Groups Projects
Commit 886ce970 authored by Jakub Jelinek's avatar Jakub Jelinek Committed by Jakub Jelinek
Browse files

cselib: For CALL_INSNs to const/pure fns invalidate memory below sp [PR117239]

The following testcase is miscompiled on x86_64 during postreload.
After reload (with IPA-RA figuring out the calls don't modify any
registers but %rax for return value) postreload sees
(insn 14 12 15 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
                (const_int 16 [0x10])) [0  S8 A64])
        (reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":18:7 95 {*movdi_internal}
     (nil))
(call_insn/i 15 14 16 2 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("baz") [flags 0x3]  <function_decl 0x7ffb2e2bdf00 r>) [0 baz S1 A8])
            (const_int 24 [0x18]))) "pr117239.c":18:7 1476 {*call_value}
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("baz") [flags 0x3]  <function_decl 0x7ffb2e2bdf00 baz>)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (nil))
(insn 16 15 18 2 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int 24 [0x18])))
            (clobber (reg:CC 17 flags))
        ]) "pr117239.c":18:7 285 {*adddi_1}
     (expr_list:REG_ARGS_SIZE (const_int 0 [0])
        (nil)))
...
(call_insn/i 19 18 21 2 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("foo") [flags 0x3]  <function_decl 0x7ffb2e2bdb00 l>) [0 foo S1 A8])
            (const_int 0 [0]))) "pr117239.c":19:3 1476 {*call_value}
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x3]  <function_decl 0x7ffb2e2bdb00 foo>)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (nil))
(insn 21 19 26 2 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int -24 [0xffffffffffffffe8])))
            (clobber (reg:CC 17 flags))
        ]) "pr117239.c":19:3 discrim 1 285 {*adddi_1}
     (expr_list:REG_ARGS_SIZE (const_int 24 [0x18])
        (nil)))
(insn 26 21 24 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
                (const_int 16 [0x10])) [0  S8 A64])
        (reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":19:3 discrim 1 95 {*movdi_internal}
     (nil))
i.e.
        movq    %rdx, 16(%rsp)
        call    baz
        addq    $24, %rsp
...
        call    foo
        subq    $24, %rsp
        movq    %rdx, 16(%rsp)
Now, postreload uses cselib and cselib remembered that %rdx value has been
stored into 16(%rsp).  Both baz and foo are pure calls.  If they weren't,
when processing those CALL_INSNs cselib would invalidate all MEMs
      if (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
          || !(RTL_CONST_OR_PURE_CALL_P (insn)))
        cselib_invalidate_mem (callmem);
where callmem is (mem:BLK (scratch)).  But they are pure, so instead the
code just invalidates the argument slots from CALL_INSN_FUNCTION_USAGE.
The calls actually clobber more than that, even const/pure calls clobber
all memory below the stack pointer.  And that is something that hasn't been
invalidated.  In this failing testcase, the call to baz is not a big deal,
we don't have anything remembered in memory below %rsp at that call.
But then we increment %rsp by 24, so the %rsp+16 is now 8 bytes below stack
and do the call to foo.  And that call now actually, not just in theory,
clobbers the memory below the stack pointer (in particular overwrites it
with the return value).  But cselib does not invalidate.  Then %rsp
is decremented again (in preparation for another call, to bar) and cselib
is processing store of %rdx (which IPA-RA says has not been modified by
either baz or foo calls) to %rsp + 16, and it sees the memory already has
that value, so the store is useless, let's remove it.
But it is not, the call to foo has changed it, so it needs to be stored
again.

The following patch adds targetted invalidation of memory below stack
pointer (or on SPARC memory below stack pointer + 2047 when stack bias is
used, or on PA memory above stack pointer instead).
It does so only in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions,
because in other functions the stack pointer should be constant from
the end of prologue till start of epilogue and so nothing should be stored
within the function below the stack pointer.

Now, memory below stack pointer is special, except for functions using
alloca/VLAs I believe no addressable memory should be there, it should be
purely outgoing function argument area, if we take address of some automatic
variable, it should live all the time above the outgoing function argument
area.  So on top of just trying to flush memory below stack pointer
(represented by %rsp - PTRDIFF_MAX with PTRDIFF_MAX size on most arches),
the patch tries to optimize and only invalidate memory that has address
clearly derived from stack pointer (memory with other bases is not
invalidated) and if we can prove (we see same SP_DERIVED_VALUE_P bases in
both VALUEs) it is above current stack, also don't call
canon_anti_dependence which might just give up in certain cases.

I've gathered statistics from x86_64-linux and i686-linux
bootstraps/regtests.  During -m64 compilations from those, there were
3718396 + 42634 + 27761 cases of processing MEMs in cselib_invalidate_mem
(callmem[1]) calls, the first number is number of MEMs not invalidated
because of the optimization, i.e.
+             if (sp_derived_base == NULL_RTX)
+               {
+                 has_mem = true;
+                 num_mems++;
+                 p = &(*p)->next;
+                 continue;
+               }
in the patch, the second number is number of MEMs not invalidated because
canon_anti_dependence returned false and finally the last number is number
of MEMs actually invalidated (so that is what hasn't been invalidated
before).  During -m32 compilations the numbers were
1422412 + 39354 + 16509 with the same meaning.

Note, when there is no red zone, in theory even the sp = sp + incr
instruction invalidates memory below the new stack pointer, as signal
can come and overwrite the memory.  So maybe we should be invalidating
something at those instructions as well.  But in leaf functions we certainly
can have even addressable automatic vars in the red zone (which would make
it harder to distinguish), on the other side aren't normally storing
anything below the red zone, and in non-leaf it should normally be just the
outgoing arguments area.

2025-02-05  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/117239
	* cselib.cc: Include predict.h.
	(callmem): Change type from rtx to rtx[2].
	(cselib_preserve_only_values): Use callmem[0] rather than callmem.
	(cselib_invalidate_mem): Optimize and don't try to invalidate
	for the mem_rtx == callmem[1] case MEMs which clearly can't be
	below the stack pointer.
	(cselib_process_insn): Use callmem[0] rather than callmem.
	For const/pure calls also call cselib_invalidate_mem (callmem[1])
	in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions.
	(cselib_init): Initialize callmem[0] rather than callmem and also
	initialize callmem[1].

	* gcc.dg/pr117239.c: New test.
parent 5163cf2a
No related branches found
No related tags found
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment