PR target/103069: Relax cmpxchg loop for x86 target
From the CPU's point of view, getting a cache line for writing is more
expensive than reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap grabs the cache line exclusively and causes
excessive cache line bouncing.

The atomic_fetch_{or,xor,and,nand} builtins generate a cmpxchg loop under
-march=x86-64 like:

	movl	v(%rip), %eax
.L2:
	movl	%eax, %ecx
	movl	%eax, %edx
	orl	$1, %ecx
	lock cmpxchgl	%ecx, v(%rip)
	jne	.L2
	movl	%edx, %eax
	andl	$1, %eax
	ret

To relax the above loop, GCC should first emit a normal load and compare,
and jump back to .L2 without executing cmpxchgl if it would fail.  Before
jumping back, PAUSE should be inserted to yield the CPU to another
hyperthread and to save power, so the code looks like:

.L84:
	movl	(%rdi), %ecx
	movl	%eax, %edx
	orl	%esi, %edx
	cmpl	%eax, %ecx
	jne	.L82
	lock cmpxchgl	%edx, (%rdi)
	jne	.L84
.L82:
	rep nop
	jmp	.L84

This patch adds corresponding atomic_fetch_op expanders to insert the
load/compare and PAUSE for all the atomic logic fetch builtins.  Add the
flag -mrelax-cmpxchg-loop to control whether to generate the relaxed loop.

gcc/ChangeLog:

	PR target/103069
	* config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
	New expand function.
	* config/i386/i386-options.c (ix86_target_string):
	Add -mrelax-cmpxchg-loop flag.
	(ix86_valid_target_attribute_inner_p): Likewise.
	* config/i386/i386-protos.h (ix86_expand_atomic_fetch_op_loop):
	New expand function prototype.
	* config/i386/i386.opt: Add -mrelax-cmpxchg-loop.
	* config/i386/sync.md (atomic_fetch_<logic><mode>): New expander
	for SI,HI,QI modes.
	(atomic_<logic>_fetch<mode>): Likewise.
	(atomic_fetch_nand<mode>): Likewise.
	(atomic_nand_fetch<mode>): Likewise.
	(atomic_fetch_<logic><mode>): New expander for DI,TI modes.
	(atomic_<logic>_fetch<mode>): Likewise.
	(atomic_fetch_nand<mode>): Likewise.
	(atomic_nand_fetch<mode>): Likewise.
	* doc/invoke.texi: Document -mrelax-cmpxchg-loop.

gcc/testsuite/ChangeLog:

	PR target/103069
	* gcc.target/i386/pr103069-1.c: New test.
	* gcc.target/i386/pr103069-2.c: Ditto.
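At the C level, the difference between the plain cmpxchg loop and the relaxed loop described above can be sketched with GCC's `__atomic` builtins and the x86 `__builtin_ia32_pause` builtin. This is a minimal sketch under our own assumptions, not the code emitted by the new `atomic_fetch_<logic><mode>` expanders; `cpu_relax`, `fetch_or_cas` and `fetch_or_relaxed` are hypothetical names used only for illustration, and PAUSE is compiled in only on x86 targets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: PAUSE ("rep nop") on x86; no-op on other targets.  */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
  __builtin_ia32_pause();
#endif
}

/* Plain CAS loop, roughly the shape of the default expansion:
   every retry goes straight to the locked compare-and-swap.  */
static uint32_t fetch_or_cas(uint32_t *p, uint32_t val)
{
  uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
  /* On failure, the builtin stores the current value into 'old'.  */
  while (!__atomic_compare_exchange_n(p, &old, old | val, false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
    ;
  return old;
}

/* Relaxed loop (sketch): a plain load checks whether the CAS can succeed
   before issuing it; if not, PAUSE and retry with the freshly loaded value.  */
static uint32_t fetch_or_relaxed(uint32_t *p, uint32_t val)
{
  uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
  for (;;)
    {
      /* Read-only access: does not request the cache line exclusively.  */
      uint32_t cur = __atomic_load_n(p, __ATOMIC_RELAXED);
      if (cur != old)
        {
          /* The CAS would fail: back off, then refresh the expected value.  */
          cpu_relax();
          old = cur;
          continue;
        }
      /* Values match: attempt the locked CAS.  On failure 'old' is
         updated to the current value and the loop checks again.  */
      if (__atomic_compare_exchange_n(p, &old, old | val, false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        return old;
    }
}
```

Like `__atomic_fetch_or (&v, 1, __ATOMIC_SEQ_CST)`, `fetch_or_relaxed (&v, 1)` returns the previous value of `v`. The point of the relaxed shape is that, under contention, a waiting thread spins on a plain load, which only needs the cache line for reading, and issues `lock cmpxchg` only once the value it expects is actually in memory.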
- gcc/config/i386/i386-expand.c: 76 additions, 0 deletions
- gcc/config/i386/i386-options.c: 6 additions, 1 deletion
- gcc/config/i386/i386-protos.h: 2 additions, 0 deletions
- gcc/config/i386/i386.opt: 4 additions, 0 deletions
- gcc/config/i386/sync.md: 117 additions, 0 deletions
- gcc/doc/invoke.texi: 8 additions, 1 deletion
- gcc/testsuite/gcc.target/i386/pr103069-1.c: 35 additions, 0 deletions
- gcc/testsuite/gcc.target/i386/pr103069-2.c: 70 additions, 0 deletions