    libatomic: Enable LSE128 128-bit atomics for Armv9.4-a · 5ad64d76
    Victor Do Nascimento authored
    The Armv9.4-a architectural revision adds three new atomic operations
    associated with the LSE128 feature:
    
      * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
      value held in a pair of registers, with original data loaded into
      the same 2 registers.
      * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
      in a pair of registers, with original data loaded into the same 2
      registers.
      * SWPP - Atomic swap of one 128-bit value with 128-bit value held
      in a pair of registers.
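
    As a rough illustration of the data semantics described above, the
    following C reference model treats a 128-bit value as a register
    pair.  It is a sketch only: the type and function names are
    hypothetical, the helpers are deliberately NOT atomic, and the
    derivation of fetch-and from bit-clear is an assumption about how
    LDCLRP would typically be used rather than a quotation from the
    patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model type: a 128-bit value held as a register pair. */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Reference model only: these helpers are NOT atomic.  Each one updates
   *mem and returns the previous contents, mirroring how the instruction
   loads the original data back into the same two registers. */

/* LDCLRP model: mem = mem AND NOT regs. */
static u128_pair model_ldclrp(u128_pair *mem, u128_pair regs)
{
    assert(mem != NULL);
    u128_pair old = *mem;
    mem->lo = old.lo & ~regs.lo;
    mem->hi = old.hi & ~regs.hi;
    return old;
}

/* LDSETP model: mem = mem OR regs. */
static u128_pair model_ldsetp(u128_pair *mem, u128_pair regs)
{
    assert(mem != NULL);
    u128_pair old = *mem;
    mem->lo = old.lo | regs.lo;
    mem->hi = old.hi | regs.hi;
    return old;
}

/* SWPP model: mem = regs. */
static u128_pair model_swpp(u128_pair *mem, u128_pair regs)
{
    assert(mem != NULL);
    u128_pair old = *mem;
    *mem = regs;
    return old;
}

/* Assumed mapping: fetch-and via bit-clear, since x & v == x & ~(~v). */
static u128_pair model_fetch_and(u128_pair *mem, u128_pair v)
{
    u128_pair inverted = { ~v.lo, ~v.hi };
    return model_ldclrp(mem, inverted);
}
```

    The last helper shows why a bit-clear primitive is enough to
    provide an AND operation: inverting the operand first turns
    "AND NOT" back into a plain AND.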
    
    In keeping with the existing 128-bit atomic operations in
    `atomic_16.S', certain less-restrictive orderings are merged into
    more restrictive ones.  This minimizes the number of branches in
    the atomic functions, which both reduces the likelihood of branch
    mispredictions and, by keeping the code small, limits the need for
    extra fetch cycles.
    
    Past benchmarking has shown that acquire is typically slightly
    faster than release (5-10%), so for the most frequently used
    atomics (CAS and SWP) it makes sense to support acquire as a
    separate case alongside release.
    
    Likewise, combining acquire and release was found to incur little
    to no penalty, so there is negligible benefit in distinguishing
    between release and acquire-release.  This makes merging
    release/acq_rel/seq_cst into a single case a worthwhile design
    choice.
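
    The ordering merge described above can be sketched in C as a
    mapping from the requested memory order to the code path actually
    taken.  The enum and function names are hypothetical.  Treating
    consume as acquire, and collapsing every non-relaxed order into the
    strongest case for operations other than CAS and SWP, are
    assumptions made for illustration; the authoritative logic is the
    assembly in `atomic_16.S'.

```c
#include <stdatomic.h>

/* Hypothetical labels for the code paths that remain after merging. */
typedef enum {
    VARIANT_RELAXED,   /* no ordering */
    VARIANT_ACQUIRE,   /* acquire only */
    VARIANT_ACQ_REL    /* release, acq_rel and seq_cst merged together */
} order_variant;

/* Frequently used operations (CAS, SWP): acquire keeps its own path. */
static order_variant merge_order_exchange(memory_order mo)
{
    switch (mo) {
    case memory_order_relaxed:
        return VARIANT_RELAXED;
    case memory_order_consume:   /* assumption: consume treated as acquire */
    case memory_order_acquire:
        return VARIANT_ACQUIRE;
    default:                     /* release, acq_rel, seq_cst */
        return VARIANT_ACQ_REL;
    }
}

/* Assumed policy for the remaining operations: a single branch that
   separates relaxed from everything else. */
static order_variant merge_order_fetch_op(memory_order mo)
{
    return mo == memory_order_relaxed ? VARIANT_RELAXED : VARIANT_ACQ_REL;
}
```

    Under this model an exchange needs at most two comparisons and a
    fetch operation needs one, which is the branch saving the merge is
    meant to buy.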
    
    This patch adds the logic required to make use of these
    instructions when the architectural feature is present and a
    suitable assembler is available.
    
    In order to do this, the following changes are made:
    
      1. Add a configure-time check for LSE128 support in the
      assembler.
      2. Edit host-config.h so that when N == 16, nifunc = 2.
      3. Where available due to LSE128, implement the second ifunc, making
      use of the novel instructions.
      4. For atomic functions unable to make use of these new
      instructions, define a new alias which causes the _i1 function
      variant to point ahead to the corresponding _i2 implementation.
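
    The effect of steps 2 to 4 can be modelled in C as below.  This is
    a sketch under stated assumptions, not the contents of
    host-config.h: the types and function names are hypothetical, the
    _i1 slot is assumed to be the one selected when LSE128 is present,
    the count of 1 for other operand sizes is assumed, and the set of
    operations with a native LSE128 variant is taken from the ChangeLog
    entries for atomic_16.S.

```c
#include <stdbool.h>

/* Hypothetical summary of the CPU features relevant to 16-byte atomics. */
typedef struct { bool lse2; bool lse128; } cpu_features;

/* A sample of 16-byte operations; the list is illustrative only. */
typedef enum {
    OP_LOAD, OP_STORE, OP_EXCHANGE, OP_COMPARE_EXCHANGE, OP_FETCH_ADD,
    OP_FETCH_OR, OP_OR_FETCH, OP_FETCH_AND, OP_AND_FETCH
} op16;

/* Which implementation ends up running for a given call. */
typedef enum { RUNS_BASELINE, RUNS_LSE2, RUNS_LSE128 } impl;

/* Step 2: two ifunc alternatives when N == 16 (1 otherwise is assumed). */
static int model_nifunc(int n)
{
    return n == 16 ? 2 : 1;
}

/* Step 3: operations given a new LSE128 variant, per the ChangeLog. */
static bool has_native_lse128(op16 op)
{
    switch (op) {
    case OP_EXCHANGE:
    case OP_FETCH_OR:
    case OP_OR_FETCH:
    case OP_FETCH_AND:
    case OP_AND_FETCH:
        return true;
    default:
        return false;
    }
}

/* Step 4: when the _i1 slot is selected but the operation has no LSE128
   variant, the alias forwards it to the _i2 implementation. */
static impl resolve_16(op16 op, cpu_features f)
{
    if (f.lse128)   /* assumed: _i1 slot, checked first */
        return has_native_lse128(op) ? RUNS_LSE128 : RUNS_LSE2;
    if (f.lse2)     /* assumed: _i2 slot */
        return RUNS_LSE2;
    return RUNS_BASELINE;
}
```

    In this model a 16-byte load on an LSE128-capable CPU still
    resolves to the _i2 code, so every operation has a valid target in
    both slots without duplicating any implementation.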
    
    libatomic/ChangeLog:
    
    	* Makefile.am (AM_CPPFLAGS): Add conditional setting of
    	-DHAVE_FEAT_LSE128.
    	* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): New.
    	* config/linux/aarch64/atomic_16.S (LSE128): New macro
    	definition.
    	(libat_exchange_16): New LSE128 variant.
    	(libat_fetch_or_16): Likewise.
    	(libat_or_fetch_16): Likewise.
    	(libat_fetch_and_16): Likewise.
    	(libat_and_fetch_16): Likewise.
    	* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
    	(IFUNC_NCOND): Add operand size checking.
    	(has_lse2): Renamed from `ifunc1`.
    	(has_lse128): New.
    	(HWCAP2_LSE128): Likewise.
    	* configure.ac: Add call to
    	LIBAT_TEST_FEAT_AARCH64_LSE128.
    	* configure (ac_subst_vars): Regenerated via autoreconf.
    	* Makefile.in: Likewise.
    	* auto-config.h.in: Likewise.