    c5288df7
    PR tree-optimization/98335: Improvements to DSE's compute_trims. · c5288df7
    Roger Sayle authored
    This patch is the main middle-end piece of a fix for PR tree-opt/98335,
    which is a code-quality regression affecting mainline.  The issue occurs
    in DSE's (dead store elimination's) compute_trims function that determines
    where a store to memory can be trimmed.  In the testcase given in the
    PR, this function notices that the first byte of a DImode store is dead,
    and replaces the 8-byte store at (aligned) offset zero with a 7-byte store
    at (unaligned) offset one.  Most architectures can store a power-of-two
    bytes (up to a maximum) in a single instruction, so writing 7 bytes requires
    more instructions than writing 8 bytes.  This patch follows Jakub Jelinek's
    suggestion in comment 5, that compute_trims needs improved heuristics.
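
The trade-off described above can be illustrated with a toy cost model.  This is only a sketch and not the code in tree-ssa-dse.cc: the names store_count and choose_head_trim are hypothetical, and the model ignores alignment, which the real patch takes into account (per the ChangeLog, via ao_ref_alignment).  It assumes every store writes a power-of-two number of bytes, up to max_bytes, and that max_bytes is itself a power of two.

```c
/* Hypothetical cost model, not GCC code: the number of store instructions
   needed to write LEN bytes when each store writes a power-of-two number
   of bytes, at most MAX_BYTES (assumed to be a power of two).  */
static unsigned store_count(unsigned len, unsigned max_bytes)
{
    unsigned count = len / max_bytes;   /* full-width stores */
    unsigned rest = len % max_bytes;    /* leftover bytes, fewer than max_bytes */

    /* Each set bit in REST needs one smaller power-of-two store.  */
    while (rest != 0) {
        rest &= rest - 1;               /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Hypothetical heuristic: return how many dead head bytes to trim from a
   store of SIZE bytes.  Trimming is skipped (0 is returned) when the
   shortened store would need more instructions than the original.  */
static unsigned choose_head_trim(unsigned size, unsigned dead_head,
                                 unsigned max_bytes)
{
    if (store_count(size - dead_head, max_bytes) > store_count(size, max_bytes))
        return 0;
    return dead_head;
}
```

Under this model, choose_head_trim(8, 1, 8) returns 0, because 7 bytes would need three stores (4 + 2 + 1) against one for the full 8 bytes, while choose_head_trim(8, 4, 8) returns 4, because a 4-byte store is no more expensive than the 8-byte one.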
    
    On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:
    
            movl    $0, -24(%rsp)
            movabsq $72057594037927935, %rdx
            movl    $0, -21(%rsp)
            andq    -24(%rsp), %rdx
            movq    %rdx, %rax
            salq    $8, %rax
            movb    c(%rip), %al
            ret
    
    to
    
            xorl    %eax, %eax
            movb    c(%rip), %al
            ret
    
    2022-03-11  Roger Sayle  <roger@nextmovesoftware.com>
    	    Richard Biener  <rguenther@suse.de>
    
    gcc/ChangeLog
    	PR tree-optimization/98335
    	* builtins.cc (get_object_alignment_2): Export.
    	* builtins.h (get_object_alignment_2): Likewise.
    	* tree-ssa-alias.cc (ao_ref_alignment): New.
    	* tree-ssa-alias.h (ao_ref_alignment): Declare.
    
    	* tree-ssa-dse.cc (compute_trims): Improve logic deciding whether
    	to align head/tail, writing more bytes but using fewer store insns.
    
    gcc/testsuite/ChangeLog
    	PR tree-optimization/98335
    	* g++.dg/pr98335.C: New test case.
    	* gcc.dg/pr86010.c: New test case.
    	* gcc.dg/pr86010-2.c: New test case.