cselib: For CALL_INSNs to const/pure fns invalidate memory below sp [PR117239]
The following testcase is miscompiled on x86_64 during postreload. After reload (with IPA-RA figuring out the calls don't modify any registers but %rax for return value) postreload sees (insn 14 12 15 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp) (const_int 16 [0x10])) [0 S8 A64]) (reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":18:7 95 {*movdi_internal} (nil)) (call_insn/i 15 14 16 2 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:DI ("baz") [flags 0x3] <function_decl 0x7ffb2e2bdf00 r>) [0 baz S1 A8]) (const_int 24 [0x18]))) "pr117239.c":18:7 1476 {*call_value} (expr_list:REG_CALL_DECL (symbol_ref:DI ("baz") [flags 0x3] <function_decl 0x7ffb2e2bdf00 baz>) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil))) (nil)) (insn 16 15 18 2 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int 24 [0x18]))) (clobber (reg:CC 17 flags)) ]) "pr117239.c":18:7 285 {*adddi_1} (expr_list:REG_ARGS_SIZE (const_int 0 [0]) (nil))) ... (call_insn/i 19 18 21 2 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:DI ("foo") [flags 0x3] <function_decl 0x7ffb2e2bdb00 l>) [0 foo S1 A8]) (const_int 0 [0]))) "pr117239.c":19:3 1476 {*call_value} (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x3] <function_decl 0x7ffb2e2bdb00 foo>) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil))) (nil)) (insn 21 19 26 2 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int -24 [0xffffffffffffffe8]))) (clobber (reg:CC 17 flags)) ]) "pr117239.c":19:3 discrim 1 285 {*adddi_1} (expr_list:REG_ARGS_SIZE (const_int 24 [0x18]) (nil))) (insn 26 21 24 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp) (const_int 16 [0x10])) [0 S8 A64]) (reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":19:3 discrim 1 95 {*movdi_internal} (nil)) i.e. movq %rdx, 16(%rsp) call baz addq $24, %rsp ... call foo subq $24, %rsp movq %rdx, 16(%rsp) Now, postreload uses cselib and cselib remembered that %rdx value has been stored into 16(%rsp). Both baz and foo are pure calls. If they weren't, when processing those CALL_INSNs cselib would invalidate all MEMs if (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn) || !(RTL_CONST_OR_PURE_CALL_P (insn))) cselib_invalidate_mem (callmem); where callmem is (mem:BLK (scratch)). But they are pure, so instead the code just invalidates the argument slots from CALL_INSN_FUNCTION_USAGE. The calls actually clobber more than that, even const/pure calls clobber all memory below the stack pointer. And that is something that hasn't been invalidated. In this failing testcase, the call to baz is not a big deal, we don't have anything remembered in memory below %rsp at that call. But then we increment %rsp by 24, so the %rsp+16 is now 8 bytes below stack and do the call to foo. And that call now actually, not just in theory, clobbers the memory below the stack pointer (in particular overwrites it with the return value). But cselib does not invalidate. Then %rsp is decremented again (in preparation for another call, to bar) and cselib is processing store of %rdx (which IPA-RA says has not been modified by either baz or foo calls) to %rsp + 16, and it sees the memory already has that value, so the store is useless, let's remove it. But it is not, the call to foo has changed it, so it needs to be stored again. The following patch adds targetted invalidation of memory below stack pointer (or on SPARC memory below stack pointer + 2047 when stack bias is used, or on PA memory above stack pointer instead). It does so only in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions, because in other functions the stack pointer should be constant from the end of prologue till start of epilogue and so nothing should be stored within the function below the stack pointer. Now, memory below stack pointer is special, except for functions using alloca/VLAs I believe no addressable memory should be there, it should be purely outgoing function argument area, if we take address of some automatic variable, it should live all the time above the outgoing function argument area. So on top of just trying to flush memory below stack pointer (represented by %rsp - PTRDIFF_MAX with PTRDIFF_MAX size on most arches), the patch tries to optimize and only invalidate memory that has address clearly derived from stack pointer (memory with other bases is not invalidated) and if we can prove (we see same SP_DERIVED_VALUE_P bases in both VALUEs) it is above current stack, also don't call canon_anti_dependence which might just give up in certain cases. I've gathered statistics from x86_64-linux and i686-linux bootstraps/regtests. During -m64 compilations from those, there were 3718396 + 42634 + 27761 cases of processing MEMs in cselib_invalidate_mem (callmem[1]) calls, the first number is number of MEMs not invalidated because of the optimization, i.e. + if (sp_derived_base == NULL_RTX) + { + has_mem = true; + num_mems++; + p = &(*p)->next; + continue; + } in the patch, the second number is number of MEMs not invalidated because canon_anti_dependence returned false and finally the last number is number of MEMs actually invalidated (so that is what hasn't been invalidated before). During -m32 compilations the numbers were 1422412 + 39354 + 16509 with the same meaning. Note, when there is no red zone, in theory even the sp = sp + incr instruction invalidates memory below the new stack pointer, as signal can come and overwrite the memory. So maybe we should be invalidating something at those instructions as well. But in leaf functions we certainly can have even addressable automatic vars in the red zone (which would make it harder to distinguish), on the other side aren't normally storing anything below the red zone, and in non-leaf it should normally be just the outgoing arguments area. 2025-02-05 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/117239 * cselib.cc: Include predict.h. (callmem): Change type from rtx to rtx[2]. (cselib_preserve_only_values): Use callmem[0] rather than callmem. (cselib_invalidate_mem): Optimize and don't try to invalidate for the mem_rtx == callmem[1] case MEMs which clearly can't be below the stack pointer. (cselib_process_insn): Use callmem[0] rather than callmem. For const/pure calls also call cselib_invalidate_mem (callmem[1]) in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions. (cselib_init): Initialize callmem[0] rather than callmem and also initialize callmem[1]. * gcc.dg/pr117239.c: New test.
Loading
Please register or sign in to comment