  1. Nov 14, 2024
    • libgcc: Fix COPY_ARG_VAL initializer (PR 117537) · 13a966d5
      Christophe Lyon authored
      We recently forced -Werror when building libgcc for aarch64, to make
      sure we'd catch and fix the kind of problem described in the PR.
      
      In this case, when building for aarch64_be (so, big endian), gcc emits
      this warning/error:
      libgcc/config/libbid/bid_conf.h:847:25: error: missing braces around initializer [-Werror=missing-braces]
        847 |        UINT128 arg_name={ bid_##arg_name.w[1], bid_##arg_name.w[0]};
      libgcc/config/libbid/bid_conf.h:871:8: note: in expansion of macro 'COPY_ARG_VAL'
        871 |        COPY_ARG_VAL(arg_name)
      
      This patch fixes the problem by adding curly braces around the
      initializer for COPY_ARG_VAL in the big endian case.
      
      It seems that COPY_ARG_REF (just above COPY_ARG_VAL) has a similar
      issue, but DECIMAL_CALL_BY_REFERENCE seems always defined to 0, so
      COPY_ARG_REF is never used.  The patch fixes it too, though.
      
      libgcc/config/libbid/ChangeLog:
      
      	PR libgcc/117537
      	* bid_conf.h (COPY_ARG_REF): Fix initializer.
      	(COPY_ARG_VAL): Likewise.
    • cfgexpand: Skip doing conflicts if there is only 1 variable · 301dab51
      Andrew Pinski authored
      
      This is a small speed-up. If there is only one known stack variable,
      there is no reason to figure out the scope conflicts, as there are
      none. So don't go through all the live range calculations just to
      see that there are none.
      
      Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      gcc/ChangeLog:
      
	* cfgexpand.cc (add_scope_conflicts): Return right away
	if there is only one stack variable.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
    • MATCH: Simplify `a rrotate (32-b) -> a lrotate b` [PR109906] · 879c1619
      Eikansh Gupta authored
      
      The pattern `a rrotate (32-b)` should be optimized to `a lrotate b`.
      The same is also true for `a lrotate (32-b)`. It can be optimized to
      `a rrotate b`.
      
      This patch adds following patterns:
      a rrotate (32-b) -> a lrotate b
      a lrotate (32-b) -> a rrotate b
      
      Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      	PR tree-optimization/109906
      
      gcc/ChangeLog:
      
	* match.pd (a rrotate (32-b) -> a lrotate b): New pattern.
	(a lrotate (32-b) -> a rrotate b): New pattern.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/tree-ssa/pr109906.c: New test.
      
      Signed-off-by: Eikansh Gupta <quic_eikagupt@quicinc.com>
    • Do not consider overrun for VMAT_ELEMENTWISE · 6d85a0bc
      Richard Biener authored
      When we classify an SLP access as VMAT_ELEMENTWISE we still consider
      overrun - the reset of it is later overwritten.  The following fixes
      this, resolving a few RISC-V FAILs with --param vect-force-slp=1.
      
      	* tree-vect-stmts.cc (get_group_load_store_type): For
      	VMAT_ELEMENTWISE there's no overrun.
    • tree-optimization/117554 - correct single-element interleaving check · 72df175c
      Richard Biener authored
      In addition to a single DR we also require a single lane, not a splat.
      
      	PR tree-optimization/117554
      	* tree-vect-stmts.cc (get_group_load_store_type): We can
      	use gather/scatter only for a single-lane single element group
      	access.
    • tree-optimization/117559 - avoid hybrid SLP for masked load/store lanes · ba192895
      Richard Biener authored
      Hybrid analysis is confused by the mask_conversion pattern making a
      uniform mask non-uniform.  As load/store lanes only uses a single
      lane to mask all data lanes the SLP graph doesn't cover the alternate
      (redundant) mask lanes and thus their pattern defs.  The following adds
      a hack to mark them covered.
      
      Fixes gcc.target/aarch64/sve/mask_struct_store_?.c with forced SLP.
      
      	PR tree-optimization/117559
      	* tree-vect-slp.cc (vect_mark_slp_stmts): Pass in vinfo,
      	mark all mask defs of a load/store-lane .MASK_LOAD/STORE
      	as pure.
      	(vect_make_slp_decision): Adjust.
      	(vect_slp_analyze_bb_1): Likewise.
    • tree-optimization/117556 - SLP of live stmts from load-lanes · 4b4aa47e
      Richard Biener authored
      The following fixes SLP live lane generation for load-lanes which
      fails to analyze for gcc.dg/vect/vect-live-slp-3.c because the
      VLA division doesn't work out, but it would also, I think, wrongly use
      the transposed vector defs.  The following properly disables
      the actual load-lanes SLP node from live lane processing and instead
      relies on the SLP permute node representing the live lane where we
      can use extract-last to extract the last lane.  This also fixes
      the reported Ada miscompile.
      
      	PR tree-optimization/117556
      	PR tree-optimization/117553
      	* tree-vect-stmts.cc (vect_analyze_stmt): Do not analyze
      	the SLP load-lanes node for live lanes, but only the
      	permute node.
      	(vect_transform_stmt): Likewise for the transform.
      
      	* gcc.dg/vect/vect-live-slp-3.c: Expect us to SLP even for
      	VLA vectors (in single-lane mode).
    • RISC-V: Rearrange the test files for scalar SAT_ADD [NFC] · 735f5260
      Pan Li authored
      
      The test files for scalar SAT_ADD have only numbers as the suffix.
      Rename the files to -{form number}-{target-type}.  For example,
      test form 3 for uint32_t SAT_ADD will have -3-u32.c for the asm
      check and -run-3-u32.c for the run test.
      
      The below test suites are passed for this patch.
      * The rv64gcv fully regression test.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/sat_s_add-2.c: Move to...
      	* gcc.target/riscv/sat_s_add-1-i16.c: ...here.
      	* gcc.target/riscv/sat_s_add-3.c: Move to...
      	* gcc.target/riscv/sat_s_add-1-i32.c: ...here.
      	* gcc.target/riscv/sat_s_add-4.c: Move to...
      	* gcc.target/riscv/sat_s_add-1-i64.c: ...here.
      	* gcc.target/riscv/sat_s_add-1.c: Move to...
      	* gcc.target/riscv/sat_s_add-1-i8.c: ...here.
      	* gcc.target/riscv/sat_s_add-6.c: Move to...
      	* gcc.target/riscv/sat_s_add-2-i16.c: ...here.
      	* gcc.target/riscv/sat_s_add-7.c: Move to...
      	* gcc.target/riscv/sat_s_add-2-i32.c: ...here.
      	* gcc.target/riscv/sat_s_add-8.c: Move to...
      	* gcc.target/riscv/sat_s_add-2-i64.c: ...here.
      	* gcc.target/riscv/sat_s_add-5.c: Move to...
      	* gcc.target/riscv/sat_s_add-2-i8.c: ...here.
      	* gcc.target/riscv/sat_s_add-10.c: Move to...
      	* gcc.target/riscv/sat_s_add-3-i16.c: ...here.
      	* gcc.target/riscv/sat_s_add-11.c: Move to...
      	* gcc.target/riscv/sat_s_add-3-i32.c: ...here.
      	* gcc.target/riscv/sat_s_add-12.c: Move to...
      	* gcc.target/riscv/sat_s_add-3-i64.c: ...here.
      	* gcc.target/riscv/sat_s_add-9.c: Move to...
      	* gcc.target/riscv/sat_s_add-3-i8.c: ...here.
      	* gcc.target/riscv/sat_s_add-14.c: Move to...
      	* gcc.target/riscv/sat_s_add-4-i16.c: ...here.
      	* gcc.target/riscv/sat_s_add-15.c: Move to...
      	* gcc.target/riscv/sat_s_add-4-i32.c: ...here.
      	* gcc.target/riscv/sat_s_add-16.c: Move to...
      	* gcc.target/riscv/sat_s_add-4-i64.c: ...here.
      	* gcc.target/riscv/sat_s_add-13.c: Move to...
      	* gcc.target/riscv/sat_s_add-4-i8.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-2.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-1-i16.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-3.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-1-i32.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-4.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-1-i64.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-1.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-1-i8.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-6.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-2-i16.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-7.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-2-i32.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-8.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-2-i64.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-5.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-2-i8.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-10.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-3-i16.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-11.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-3-i32.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-12.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-3-i64.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-9.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-3-i8.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-14.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-4-i16.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-15.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-4-i32.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-16.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-4-i64.c: ...here.
      	* gcc.target/riscv/sat_s_add-run-13.c: Move to...
      	* gcc.target/riscv/sat_s_add-run-4-i8.c: ...here.
      	* gcc.target/riscv/sat_u_add-2.c: Move to...
      	* gcc.target/riscv/sat_u_add-1-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-3.c: Move to...
      	* gcc.target/riscv/sat_u_add-1-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-4.c: Move to...
      	* gcc.target/riscv/sat_u_add-1-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-1.c: Move to...
      	* gcc.target/riscv/sat_u_add-1-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-6.c: Move to...
      	* gcc.target/riscv/sat_u_add-2-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-7.c: Move to...
      	* gcc.target/riscv/sat_u_add-2-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-8.c: Move to...
      	* gcc.target/riscv/sat_u_add-2-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-5.c: Move to...
      	* gcc.target/riscv/sat_u_add-2-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-10.c: Move to...
      	* gcc.target/riscv/sat_u_add-3-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-11.c: Move to...
      	* gcc.target/riscv/sat_u_add-3-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-12.c: Move to...
      	* gcc.target/riscv/sat_u_add-3-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-9.c: Move to...
      	* gcc.target/riscv/sat_u_add-3-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-14.c: Move to...
      	* gcc.target/riscv/sat_u_add-4-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-15.c: Move to...
      	* gcc.target/riscv/sat_u_add-4-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-16.c: Move to...
      	* gcc.target/riscv/sat_u_add-4-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-13.c: Move to...
      	* gcc.target/riscv/sat_u_add-4-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-18.c: Move to...
      	* gcc.target/riscv/sat_u_add-5-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-19.c: Move to...
      	* gcc.target/riscv/sat_u_add-5-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-20.c: Move to...
      	* gcc.target/riscv/sat_u_add-5-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-17.c: Move to...
      	* gcc.target/riscv/sat_u_add-5-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-22.c: Move to...
      	* gcc.target/riscv/sat_u_add-6-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-23.c: Move to...
      	* gcc.target/riscv/sat_u_add-6-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-24.c: Move to...
      	* gcc.target/riscv/sat_u_add-6-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-21.c: Move to...
      	* gcc.target/riscv/sat_u_add-6-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-2.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-1-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-3.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-1-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-4.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-1-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-1.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-1-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-6.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-2-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-7.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-2-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-8.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-2-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-5.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-2-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-10.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-3-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-11.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-3-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-12.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-3-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-9.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-3-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-14.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-4-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-15.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-4-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-16.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-4-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-13.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-4-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-18.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-5-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-19.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-5-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-20.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-5-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-17.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-5-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-22.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-6-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-23.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-6-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-24.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-6-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add-run-21.c: Move to...
      	* gcc.target/riscv/sat_u_add-run-6-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-2.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-1-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-3.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-1-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-4.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-1-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-1.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-1-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-6.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-2-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-7.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-2-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-8.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-2-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-5.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-2-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-10.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-3-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-11.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-3-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-12.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-3-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-9.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-3-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-14.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-4-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-15.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-4-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-16.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-4-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-13.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-4-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-2.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-1-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-3.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-1-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-4.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-1-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-1.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-1-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-6.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-2-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-7.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-2-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-8.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-2-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-5.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-2-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-10.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-3-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-11.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-3-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-12.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-3-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-9.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-3-u8.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-14.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-4-u16.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-15.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-4-u32.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-16.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-4-u64.c: ...here.
      	* gcc.target/riscv/sat_u_add_imm-run-13.c: Move to...
      	* gcc.target/riscv/sat_u_add_imm-run-4-u8.c: ...here.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • i386: Fix cstorebf4 fp comparison operand [PR117495] · 19b24f4a
      Hongyu Wang authored
      cstorebf4 uses comparison_operator for the BFmode compare, which is
      incorrect when it uses ix86_expand_setcc directly, as that does not
      canonicalize the comparison by swapping operands to correct the
      compare code.  The original code without AVX10.2 calls
      emit_store_flag_force, which actually calls emit_store_flag_1 and
      recursively calls this expander again with swapped operands and
      compare code.  Therefore, we can avoid the redundant recursive call
      by changing comparison_operator to ix86_fp_comparison_operator and
      calling ix86_expand_setcc directly.
      
      gcc/ChangeLog:
      
      	PR target/117495
	* config/i386/i386.md (cstorebf4): Use ix86_fp_comparison_operator
	and call ix86_expand_setcc directly.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/117495
      	* gcc.target/i386/pr117495.c: New test.
  2. Nov 13, 2024
    • [PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector · 8564d094
      Jin Ma authored
      error: unrecognizable insn:
      
      (insn 35 34 36 2 (set (subreg:RVVM1SF (reg/v:RVVM1x4SF 142 [ _r ]) 0)
              (unspec:RVVM1SF [
                      (const_vector:RVVM1SF repeat [
                              (const_double:SF 0.0 [0x0.0p+0])
                          ])
                      (reg:DI 0 zero)
                      (const_int 1 [0x1])
                      (reg:SI 66 vl)
                      (reg:SI 67 vtype)
                  ] UNSPEC_TH_VWLDST)) -1
           (nil))
      during RTL pass: mode_sw
      
      	PR target/116591
      
      gcc/ChangeLog:
      
      	* config/riscv/vector.md: Add restriction to call pred_th_whole_mov.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/xtheadvector/pr116591.c: New test.
    • libstdc++: Refactor std::hash specializations · 01ba02ca
      Jonathan Wakely authored
      This attempts to simplify and clean up our std::hash code. The primary
      benefit is improved diagnostics for users when they do something wrong
      involving std::hash or unordered containers. An additional benefit is
      that for the unstable ABI (--enable-symvers=gnu-versioned-namespace) we
      can reduce the memory footprint of several std::hash specializations.
      
      In the current design, __hash_enum is a base class of the std::hash
      primary template, but the partial specialization of __hash_enum for
      non-enum types is disabled.  This means that if a user forgets to
      specialize std::hash for their class type (or forgets to use a custom
      hash function for unordered containers) they get error messages about
      std::__hash_enum not being constructible.  This is confusing when there
      is no enum type involved: why should users care about __hash_enum not
      being constructible if they're not trying to hash enums?
      
      This change makes the std::hash primary template only derive from
      __hash_enum when the template argument type is an enum. Otherwise, it
      derives directly from a new class template, __hash_not_enabled. This new
      class template defines the deleted members that cause a given std::hash
      specialization to be a disabled specialization (as per P0513R0). Now
      when users try to use a disabled specialization, they get more
      descriptive errors that mention __hash_not_enabled instead of
      __hash_enum.
      
      Additionally, adjust __hash_base to remove the deprecated result_type
      and argument_type typedefs for C++20 and later.
      
      In the current code we use a __poison_hash base class in the std::hash
      specializations for std::unique_ptr, std::optional, and std::variant.
      The primary template of __poison_hash has deleted special members, which
      is used to conditionally disable the derived std::hash specialization.
      This can also result in confusing diagnostics, because seeing "poison"
      in an enabled specialization is misleading. Only some uses of
      __poison_hash actually "poison" anything, i.e. cause a specialization to
      be disabled. In other cases it's just an empty base class that does
      nothing.
      
      This change removes __poison_hash and changes the std::hash
      specializations that were using it to conditionally derive from
      __hash_not_enabled instead. When the std::hash specialization is
      enabled, there is no more __poison_hash base class. However, to preserve
      the ABI properties of those std::hash specializations, we need to
      replace __poison_hash with some other empty base class. This is needed
      because in the current code std::hash<std::variant<int, const int>> has
      two __poison_hash<int> base classes, which must have unique addresses,
      so sizeof(std::hash<std::variant<int, const int>>) == 2. To preserve
      this unfortunate property, a new __hash_empty_base class is used as a
      base class to re-introduce duplicate base classes that increase the
      class size. For the unstable ABI we don't use __hash_empty_base so the
      std::hash<std::variant<T...>> specializations are always size 1, and
      the class hierarchy is much simpler so will compile faster.
      
      Additionally, remove the result_type and argument_type typedefs from all
      disabled specializations of std::hash for std::unique_ptr,
      std::optional, and std::variant. Those typedefs are useless for disabled
      specializations, and although the standard doesn't say they must *not*
      be present for disabled specializations, it certainly only requires them
      for enabled specializations. Finally, for C++20 the typedefs are also
      removed from enabled specializations of std::hash for std::unique_ptr,
      std::optional, and std::variant.
      
      libstdc++-v3/ChangeLog:
      
      	* doc/xml/manual/evolution.xml: Document removal of nested types
      	from std::hash specializations.
      	* doc/html/manual/api.html: Regenerate.
      	* include/bits/functional_hash.h (__hash_base): Remove
      	deprecated nested types for C++20.
      	(__hash_empty_base): Define new class template.
      	(__is_hash_enabled_for): Define new variable template.
      	(__poison_hash): Remove.
      	(__hash_not_enabled): Define new class template.
      	(__hash_enum): Remove partial specialization for non-enums.
      	(hash): Derive from __hash_not_enabled for non-enums, instead of
      	__hash_enum.
      	* include/bits/unique_ptr.h (__uniq_ptr_hash): Derive from
      	__hash_base. Conditionally derive from __hash_empty_base.
      	(__uniq_ptr_hash<>): Remove disabled specialization.
      	(hash): Do not derive from __hash_base unconditionally.
      	Conditionally derive from either __uniq_ptr_hash or
      	__hash_not_enabled.
      	* include/std/optional (__optional_hash_call_base): Remove.
      	(__optional_hash): Define new class template.
      	(hash): Conditionally derive from either __optional_hash or
      	__hash_not_enabled. Remove nested typedefs.
      	* include/std/variant (_Base_dedup): Replace __poison_hash with
      	__hash_empty_base.
      	(__variant_hash_call_base_impl): Remove.
      	(__variant_hash): Define new class template.
      	(hash): Conditionally derive from either __variant_hash or
      	__hash_not_enabled. Remove nested typedefs.
      	* testsuite/20_util/optional/hash.cc: Check whether nested types
      	are present.
      	* testsuite/20_util/variant/hash.cc: Likewise.
      	* testsuite/20_util/optional/hash_abi.cc: New test.
      	* testsuite/20_util/unique_ptr/hash/abi.cc: New test.
      	* testsuite/20_util/unique_ptr/hash/types.cc: New test.
      	* testsuite/20_util/variant/hash_abi.cc: New test.
    • libstdc++: Add _Hashtable::_M_locate(const key_type&) · 84e39b07
      Jonathan Wakely authored
      We have two overloads of _M_find_before_node but they have quite
      different performance characteristics, which isn't necessarily obvious.
      
      The original version, _M_find_before_node(bucket, key, hash_code), looks
      only in the specified bucket, doing a linear search within that bucket
      for an element that compares equal to the key. This is the typical fast
      lookup for hash containers, assuming the load factor is low so that each
      bucket isn't too large.
      
      The newer _M_find_before_node(key) was added in r12-6272-ge3ef832a9e8d6a
      and could be naively assumed to calculate the hash code and bucket for
      key and then call the efficient _M_find_before_node(bkt, key, code)
      function. But in fact it does a linear search of the entire container.
      This is potentially very slow and should only be used for a suitably
      small container, as determined by the __small_size_threshold() function.
      We don't even have a comment pointing out this O(N) performance of the
      newer overload.
      
      Additionally, the newer overload is only ever used in exactly one place,
      which would suggest it could just be removed. However there are several
      places that do the linear search of the whole container with an explicit
      loop each time.
      
      This adds a new member function, _M_locate, and uses it to replace most
      uses of _M_find_node and the loops doing linear searches. This new
      member function does both forms of lookup, the linear search for small
      sizes and the _M_find_node(bkt, key, code) lookup within a single
      bucket. The new function returns a __location_type which is a struct
      that contains a pointer to the first node matching the key (if such a
      node is present), or the hash code and bucket index for the key. The
      hash code and bucket index allow the caller to know where a new node
      with that key should be inserted, for the cases where the lookup didn't
      find a matching node.
      
      The result struct actually contains a pointer to the node *before* the
      one that was located, as that is needed for it to be useful in erase and
      extract members. There is a member function that returns the found node,
      i.e. _M_before->_M_nxt downcast to __node_ptr, which should be used in
      most cases.
      
      This new function greatly simplifies the functions that currently have
      to do two kinds of lookup and explicitly check the current size against
      the small size threshold.
      
      Additionally, now that try_emplace is defined directly in _Hashtable
      (not in _Insert_base) we can use _M_locate in there too, to speed up
      some try_emplace calls. Previously it did not do the small-size linear
      search.
      
      It would be possible to add a function to get a __location_type from an
      iterator, and then rewrite some functions like _M_erase and
      _M_extract_node to take a __location_type parameter. While that might be
      conceptually nice, it wouldn't really make the code any simpler or more
      readable than it is now. That isn't done in this change.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (__location_type): New struct.
      	(_M_locate): New member function.
      	(_M_find_before_node(const key_type&)): Remove.
      	(_M_find_node): Move variable initialization into condition.
      	(_M_find_node_tr): Likewise.
      	(operator=(initializer_list<T>), try_emplace, _M_reinsert_node)
      	(_M_merge_unique, find, erase(const key_type&)): Use _M_locate
      	for lookup.
    • libstdc++: Simplify _Hashtable merge functions · a147bfca
      Jonathan Wakely authored
      I realised that _M_merge_unique and _M_merge_multi call extract(iter)
      which then has to call _M_get_previous_node to iterate through the
      bucket to find the node before the one iter points to. Since the merge
      function is already iterating over the entire container, we had the
      previous node a moment ago. Walking the whole bucket to find it again is
      wasteful. We could just rewrite the loop in terms of node pointers
      instead of iterators, and then call _M_extract_node directly. However,
      this is only possible when the source container is the same type as the
      destination, because otherwise we can't access the source's private
      members (_M_before_begin, _M_begin, _M_extract_node etc.)
      
      Add overloads of _M_merge_unique and _M_merge_multi that work with
      source containers of the same type, to enable this optimization.
      
      For both overloads of _M_merge_unique we can also remove the conditional
      modifications to __n_elt and just consistently decrement it for every
      element processed. Use a multiplier of one or zero that dictates whether
      __n_elt is passed to _M_insert_unique_node or not. We can also remove
      the repeated calls to size() and just keep track of the size in a local
      variable.
      
      Although _M_merge_unique and _M_merge_multi should be safe for
      "self-merge", i.e. when doing c.merge(c), it's wasteful to search/insert
      every element when we don't need to do anything. Add 'this == &source'
      checks to the overloads taking an lvalue of the container's own type.
      Because those checks aren't needed for the rvalue overloads, change
      those to call the underlying _M_merge_xxx function directly instead of
      going through the lvalue overload that checks the address.
      
      I've also added more extensive tests for better coverage of the new
      overloads added in this commit.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (_M_merge_unique): Add overload for
      	merging from same type.
      	(_M_merge_unique<Compatible>): Simplify size tracking. Add
      	comment.
      	(_M_merge_multi): Add overload for merging from same type.
      	(_M_merge_multi<Compatible>): Add comment.
      	* include/bits/unordered_map.h (unordered_map::merge): Check for
      	self-merge in the lvalue overload. Call _M_merge_unique directly
      	for the rvalue overload.
      	(unordered_multimap::merge): Likewise.
      	* include/bits/unordered_set.h (unordered_set::merge): Likewise.
      	(unordered_multiset::merge): Likewise.
      	* testsuite/23_containers/unordered_map/modifiers/merge.cc:
      	Add more tests.
      	* testsuite/23_containers/unordered_multimap/modifiers/merge.cc:
      	Likewise.
      	* testsuite/23_containers/unordered_multiset/modifiers/merge.cc:
      	Likewise.
      	* testsuite/23_containers/unordered_set/modifiers/merge.cc:
      	Likewise.
      a147bfca
    • Jonathan Wakely's avatar
      libstdc++: Remove _Hashtable_base::_S_equals · 55dbf154
      Jonathan Wakely authored
      This removes the overloaded _S_equals and _S_node_equals functions,
      replacing them with 'if constexpr' in the handful of places they're
      used.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable_policy.h (_Hashtable_base::_S_equals):
      	Remove.
      	(_Hashtable_base::_S_node_equals): Remove.
      	(_Hashtable_base::_M_key_equals_tr): Fix inaccurate
      	static_assert string.
      	(_Hashtable_base::_M_equals, _Hashtable_base::_M_equals_tr): Use
      	'if constexpr' instead of _S_equals.
      	(_Hashtable_base::_M_node_equals): Use 'if constexpr' instead of
      	_S_node_equals.
      55dbf154
    • Jonathan Wakely's avatar
      libstdc++: Remove _Equality base class from _Hashtable · 247e82c7
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (_Hashtable): Remove _Equality base
      	class.
      	(_Hashtable::_M_equal): Define equality comparison here instead
      	of in _Equality::_M_equal.
      	* include/bits/hashtable_policy.h (_Equality): Remove.
      247e82c7
    • Jonathan Wakely's avatar
      libstdc++: Remove _Insert base class from _Hashtable · 0935d0d6
      Jonathan Wakely authored
      
      There's no reason to have a separate base class defining the insert
      member functions now. They can all be moved into the _Hashtable class,
      which simplifies them slightly.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (_Hashtable): Remove inheritance from
      	__detail::_Insert and move its members into _Hashtable.
      	* include/bits/hashtable_policy.h (__detail::_Insert): Remove.
      
       Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
      0935d0d6
    • Jonathan Wakely's avatar
      libstdc++: Use RAII in _Hashtable · d2970e86
      Jonathan Wakely authored
      
      Use scoped guard types to clean up if an exception is thrown. This
      allows some try-catch blocks to be removed.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (operator=(const _Hashtable&)): Use
      	RAII instead of try-catch.
      	(_M_assign(_Ht&&, _NodeGenerator&)): Likewise.
      
       Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
      d2970e86
    • Jonathan Wakely's avatar
      libstdc++: Replace _Hashtable::__fwd_value_for with cast · e717c322
      Jonathan Wakely authored
      
      We can just use a cast to the appropriate type instead of calling a
      function to do it. This gives the compiler less work to compile and
      optimize, and at -O0 avoids a function call per element.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (_Hashtable::__fwd_value_for):
      	Remove.
      	(_Hashtable::_M_assign): Use static_cast instead of
      	__fwd_value_for.
      
       Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
      e717c322
    • Jonathan Wakely's avatar
      libstdc++: Add _Hashtable::_M_assign for the common case · 37b17388
      Jonathan Wakely authored
      
      This adds a convenient _M_assign overload for the common case where the
      node generator is the _AllocNode type. Only two places need to call
      _M_assign with a _ReuseOrAllocNode node generator, so all the other
      calls to _M_assign can use the new overload instead of manually
      constructing a node generator.
      
      The _AllocNode::operator(Args&&...) function doesn't need to be a
      variadic template. It is only ever called with a single argument of type
      const value_type& or value_type&&, so could be simplified. That isn't
      done in this commit.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (_Hashtable): Remove typedefs for
      	node generators.
      	(_Hashtable::_M_assign(_Ht&&)): Add new overload.
      	(_Hashtable::operator=(initializer_list<value_type>)): Add local
      	typedef for node generator.
      	(_Hashtable::_M_assign_elements): Likewise.
      	(_Hashtable::operator=(const _Hashtable&)): Use new _M_assign
      	overload.
      	(_Hashtable(const _Hashtable&)): Likewise.
      	(_Hashtable(const _Hashtable&, const allocator_type&)):
      	Likewise.
      	(_Hashtable(_Hashtable&&, __node_alloc_type&&, false_type)):
      	Likewise.
      	* include/bits/hashtable_policy.h (_Insert): Remove typedef for
      	node generator.
      
       Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
      37b17388
    • Jonathan Wakely's avatar
      libstdc++: Refactor Hashtable erasure · 73676cfb
      Jonathan Wakely authored
      
      This reworks the internal member functions for erasure from
      unordered containers, similarly to the earlier commit doing it for
      insertion.
      
      Instead of multiple overloads of _M_erase which are selected via tag
      dispatching, the erase(const key_type&) member can use 'if constexpr' to
      choose an appropriate implementation (returning after erasing a single
      element for unique keys, or continuing to erase all equivalent elements
      for non-unique keys).
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (_Hashtable::_M_erase): Remove
      	overloads for erasing by key, moving logic to ...
      	(_Hashtable::erase): ... here.
      
       Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
      73676cfb
    • Jonathan Wakely's avatar
      libstdc++: Refactor Hashtable insertion [PR115285] · ce2cf1f0
      Jonathan Wakely authored
      This completely reworks the internal member functions for insertion into
      unordered containers. Currently we use a mixture of tag dispatching (for
      unique vs non-unique keys) and template specialization (for maps vs
      sets) to correctly implement insert and emplace members.
      
      This removes a lot of complexity and indirection by using 'if constexpr'
      to select the appropriate member function to call.
      
      Previously there were four overloads of _M_emplace, for unique keys and
      non-unique keys, and for hinted insertion and non-hinted. However two of
      those were redundant, because we always ignore the hint for unique keys
      and always use a hint for non-unique keys. Those four overloads have
      been replaced by two new non-overloaded function templates:
      _M_emplace_uniq and _M_emplace_multi. The former is for unique keys and
      doesn't take a hint, and the latter is for non-unique keys and takes a
      hint.
      
      In the body of _M_emplace_uniq there are special cases to handle
      emplacing values from which a key_type can be extracted directly. This
      means we don't need to allocate a node and construct a value_type that
      might be discarded if an equivalent key is already present. The special
      case applies when emplacing the key_type into std::unordered_set, or
      when emplacing std::pair<cv key_type, X> into std::unordered_map, or
      when emplacing two values into std::unordered_map where the first has
      type cv key_type. For the std::unordered_set case, obviously if we're
      inserting something that's already the key_type, we can look it up
      directly. For the std::unordered_map cases, we know that the inserted
      std::pair<const key_type, mapped_type> would have its first element
       initialized from the first member of a std::pair value, or from the first of
      two values, so if that is a key_type, we can look that up directly.
      
      All the _M_insert overloads used a node generator parameter, but apart
      from the one case where _M_insert_range was called from
      _Hashtable::operator=(initializer_list<value_type>), that parameter was
      always the _AllocNode type, never the _ReuseOrAllocNode type. Because
      operator=(initializer_list<value_type>) was rewritten in an earlier
      commit, all calls to _M_insert now use _AllocNode, so there's no reason
      to pass the generator as a template parameter when inserting.
      
      The multiple overloads of _Hashtable::_M_insert can all be removed now,
      because the _Insert_base::insert members now call either _M_emplace_uniq
      or _M_emplace_multi directly, only passing a hint to the latter. Which
      one to call is decided using 'if constexpr (__unique_keys::value)' so
      there is no unnecessary code instantiation, and overload resolution is
      much simpler.
      
      The partial specializations of the _Insert class template can be
      entirely removed, moving the minor differences in 'insert' member
      functions into the common _Insert_base base class. The different
      behaviour for maps and sets can be implemented using enable_if
      constraints and 'if constexpr'. With the _Insert class template no
      longer needed, the _Insert_base class template can be renamed to
      _Insert. This is a minor simplification for the complex inheritance
      hierarchy used by _Hashtable, removing one base class. It also means
      one less class template instantiation, and no need to match the right
      partial specialization of _Insert. The _Insert base class could be
      removed entirely by moving all its 'insert' members into _Hashtable,
      because without any variation in specializations of _Insert there is no
      reason to use a base class to define those members. That is left for a
      later commit.
      
      Consistently using _M_emplace_uniq or _M_emplace_multi for insertion
      means we no longer attempt to avoid constructing a value_type object to
      find its key, removing the PR libstdc++/96088 optimizations. This fixes
      the bugs caused by those optimizations, such as PR libstdc++/115285, but
      causes regressions in the expected number of allocations and temporary
      objects constructed for the PR 96088 tests.  It should be noted that the
      "regressions" in the 96088 tests put us exactly level with the number of
      allocations done by libc++ for those same tests.
      
      To mitigate this to some extent, _M_emplace_uniq detects when the
      emplace arguments already contain a key_type (either as the sole
      argument, for unordered_set, or as the first part of a pair of
      arguments, for unordered_map). In that specific case we don't need to
      allocate a node and construct a value type to check for an existing
      element with equivalent key.
      
      The remaining regressions in the number of allocations and temporaries
      should be addressed separately, with more conservative optimizations
      specific to std::string. That is not part of this commit.
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/115285
      	* include/bits/hashtable.h (_Hashtable::_M_emplace): Replace
      	with _M_emplace_uniq and _M_emplace_multi.
      	(_Hashtable::_S_forward_key, _Hashtable::_M_insert_unique)
      	(_Hashtable::_M_insert_unique_aux, _Hashtable::_M_insert):
      	Remove.
      	* include/bits/hashtable_policy.h (_ConvertToValueType):
      	Remove.
      	(_Insert_base::_M_insert_range): Remove overload for unique keys
      	and rename overload for non-unique keys to ...
      	(_Insert_base::_M_insert_range_multi): ... this.
      	(_Insert_base::insert): Call _M_emplace_uniq or _M_emplace_multi
      	instead of _M_insert.  Add insert overloads from _Insert.
      	(_Insert_base): Rename to _Insert.
       	(_Insert): Remove.
      	* testsuite/23_containers/unordered_map/96088.cc: Adjust
      	expected number of allocations.
      	* testsuite/23_containers/unordered_set/96088.cc: Likewise.
      ce2cf1f0
    • Jonathan Wakely's avatar
      libstdc++: Allow unordered_set assignment to assign to existing nodes · afc9351e
      Jonathan Wakely authored
      
      Currently the _ReuseOrAllocNode::operator(Args&&...) function always
      destroys the value stored in recycled nodes and constructs a new value.
      
      The _ReuseOrAllocNode type is only ever used for implementing
      assignment, either from another unordered container of the same type, or
      from std::initializer_list<value_type>. Consequently, the parameter pack
       Args only ever consists of a single parameter of type const value_type&
      or value_type.  We can replace the variadic parameter pack with a single
      forwarding reference parameter, and when the value_type is assignable
      from that type we can use assignment instead of destroying the existing
      value and then constructing a new one.
      
      Using assignment is typically only possible for sets, because for maps
      the value_type is std::pair<const key_type, mapped_type> and in most
      cases std::is_assignable_v<const key_type&, const key_type&> is false.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable_policy.h (_ReuseOrAllocNode::operator()):
      	Replace parameter pack with a single parameter. Assign to
      	existing value when possible.
      	* testsuite/23_containers/unordered_multiset/allocator/move_assign.cc:
      	Adjust expected count of operations.
      	* testsuite/23_containers/unordered_set/allocator/move_assign.cc:
      	Likewise.
      
       Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
      afc9351e
    • Jonathan Wakely's avatar
      libstdc++: Refactor _Hashtable::operator=(initializer_list<value_type>) · 9fcbbb3d
      Jonathan Wakely authored
      
      This replaces a call to _M_insert_range with open coding the loop. This
      will allow removing the node generator parameter from _M_insert_range in
      a later commit.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/hashtable.h (operator=(initializer_list)):
      	Refactor to not use _M_insert_range.
      
       Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
      9fcbbb3d
    • Jonathan Wakely's avatar
      libstdc++: Fix calculation of system time in performance tests · 19d0720f
      Jonathan Wakely authored
      The system_time() function used the wrong element of the splits array.
      
      Also add a comment about the units for time measurements.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/util/testsuite_performance.h (time_counter): Add
      	comment about times.
      	(time_counter::system_time): Use correct split value.
      19d0720f
    • Jonathan Wakely's avatar
      libstdc++: Write timestamp to libstdc++-performance.sum file · de10b4fc
      Jonathan Wakely authored
      The results of 'make check-performance' are appended to the .sum file,
      with no indication where one set of results ends and the next begins. We
      could just remove the file when starting a new run, but appending makes
      it a little easier to compare with previous runs, without having to copy
      and store old files.
      
      This adds a header containing a timestamp to the file when starting a
      new run.
      
      libstdc++-v3/ChangeLog:
      
      	* scripts/check_performance: Add timestamp to output file at
      	start of run.
      de10b4fc
    • Jonathan Wakely's avatar
      libstdc++: Use __is_single_threaded() in performance tests · 2b920070
      Jonathan Wakely authored
      With recent glibc releases the __gthread_active_p() function is always
      true, so we always append "-thread" onto performance benchmark names.
      
      Use the __gnu_cxx::__is_single_threaded() function instead.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/util/testsuite_performance.h: Use
      	__gnu_cxx::__is_single_threaded instead of __gthread_active_p().
      2b920070
    • Jonathan Wakely's avatar
      libstdc++: Stop using std::unary_function in perf tests · 8586e161
      Jonathan Wakely authored
      This fixes some -Wdeprecated-declarations warnings.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/performance/ext/pb_ds/hash_int_erase_mem.cc: Replace
      	std::unary_function with result_type and argument_type typedefs.
      	* testsuite/util/performance/assoc/multimap_common_type.hpp:
      	Likewise.
      8586e161
    • Jonathan Wakely's avatar
      libstdc++: Fix nodiscard warnings in perf test for memory pools · 42def7cd
      Jonathan Wakely authored
      The use of unnamed std::lock_guard temporaries was intentional here, as
      they were used like barriers (but std::barrier isn't available until
      C++20). But that gives nodiscard warnings, because unnamed temporary
      locks are usually unintentional. Use named variables in new block scopes
      instead.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/performance/20_util/memory_resource/pools.cc: Fix
      	-Wunused-value warnings about unnamed std::lock_guard objects.
      42def7cd
    • Richard Sandiford's avatar
      aarch64: Relax add_overloaded_function assert · 2d7d8179
      Richard Sandiford authored
      There are some SVE intrinsics that support one set of suffixes for
      one extension (E1, say) and another set of suffixes for another
      extension (E2, say).  It is usually the case that, mutatis mutandis,
      E2 extends E1.  Listing E1 first would then ensure that the manual
      C overload would also require E1, making it suitable for resolving
      both the E1 forms and, where appropriate, the E2 forms.
      
      However, there was one exception: the I8MM, F32MM, and F64MM extensions
      to SVE each added variants of svmmla, but there was no svmmla for SVE
      itself.  This was handled by adding an SVE entry for svmmla that only
      defined the C overload; it had no variants of its own.
      
      This situation occurs more often with upcoming patches.  Rather than
      keep adding these dummy entries, it seemed better to make the code
      automatically compute the lowest common denominator for all definitions
      that share the same C overload.
      
      gcc/
      	* config/aarch64/aarch64-protos.h
      	(aarch64_required_extensions::common_denominator): New member
      	function.
      	* config/aarch64/aarch64-sve-builtins-base.def: Remove zero-variant
      	entry for mmla.
      	* config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): Remove
      	support for it.
      	* config/aarch64/aarch64-sve-builtins.cc
      	(function_builder::add_overloaded): Relax the assert for duplicate
      	definitions and instead calculate the common denominator of all
      	requirements.
      2d7d8179
    • Filip Kastl's avatar
      i386: Add -mveclibabi=aocl [PR56504] · 99ec0eb3
      Filip Kastl authored
      
      We currently support generating vectorized math calls to the AMD core
      math library (ACML) (-mveclibabi=acml).  That library is end-of-life and
      its successor is the math library from AMD Optimizing CPU Libraries
      (AOCL).
      
      This patch adds support for AOCL (-mveclibabi=aocl).  That significantly
      broadens the range of vectorized math functions optimized for AMD CPUs
      that GCC can generate calls to.
      
      See the edit to invoke.texi for a complete list of added functions.
      Compared to the list of functions in AOCL LibM docs I left out these
      vectorized function families:
      
      - sincos and all functions working with arrays ... Because these
        functions have pointer arguments and that would require a bigger
        rework of ix86_veclibabi_aocl().  Also, I'm not sure if GCC even ever
        generates calls to these functions.
      - linearfrac ... Because these functions are specific to the AMD
        library.  There's no equivalent glibc function nor GCC internal
        function nor GCC built-in.
      - powx, sqrt, fabs ... Because GCC doesn't vectorize these functions
        into calls and uses instructions instead.
      
       I also left out amd_vrd2_expm1() (the AMD docs list the function, but
       I wasn't able to link calls to it with the current version of the
       library).
      
      gcc/ChangeLog:
      
      	PR target/56504
      	* config/i386/i386-options.cc (ix86_option_override_internal):
      	Add ix86_veclibabi_type_aocl case.
      	* config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern
      	ix86_veclibabi_aocl().
      	* config/i386/i386-opts.h (enum ix86_veclibabi): Add
      	ix86_veclibabi_type_aocl into the ix86_veclibabi enum.
      	* config/i386/i386.cc (ix86_veclibabi_aocl): New function.
      	* config/i386/i386.opt: Add the 'aocl' type.
      	* doc/invoke.texi: Document -mveclibabi=aocl.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/56504
      	* gcc.target/i386/vectorize-aocl1.c: New test.
      
       Signed-off-by: Filip Kastl <fkastl@suse.cz>
      99ec0eb3
    • John David Anglin's avatar
      hppa: Remove inner `fix:SF/DF` from fixed-point patterns · 0342d024
      John David Anglin authored
      2024-11-13  John David Anglin  <danglin@gcc.gnu.org>
      
      gcc/ChangeLog:
      
      	PR target/117525
      	* config/pa/pa.md (fix_truncsfsi2): Remove inner `fix:SF`.
      	(fix_truncdfsi2, fix_truncsfdi2, fix_truncdfdi2,
      	fixuns_truncsfsi2, fixuns_truncdfsi2, fixuns_truncsfdi2,
      	fixuns_truncdfdi2): Likewise.
      0342d024
    • David Malcolm's avatar
      diagnostics: avoid using global_dc in path-printing · 5ace2b23
      David Malcolm authored
      
      gcc/analyzer/ChangeLog:
      	* checker-path.cc (checker_path::debug): Explicitly use
      	global_dc's reference printer.
      	* diagnostic-manager.cc
      	(diagnostic_manager::prune_interproc_events): Likewise.
      	(diagnostic_manager::prune_system_headers): Likewise.
      
      gcc/ChangeLog:
      	* diagnostic-path.cc (diagnostic_event::get_desc): Add param
      	"ref_pp" and use instead of global_dc.
      	(class path_label): Likewise, adding field m_ref_pp.
      	(event_range::event_range): Add param "ref_pp" and pass to
      	m_path_label.
      	(path_summary::path_summary): Add param "ref_pp" and pass to
      	event_range ctor.
      	(diagnostic_text_output_format::print_path): Pass *pp to
      	path_summary ctor.
      	(selftest::test_empty_path): Pass *event_pp to pass_summary ctor.
      	(selftest::test_intraprocedural_path): Likewise.
      	(selftest::test_interprocedural_path_1): Likewise.
      	(selftest::test_interprocedural_path_2): Likewise.
      	(selftest::test_recursion): Likewise.
      	(selftest::test_control_flow_1): Likewise.
      	(selftest::test_control_flow_2): Likewise.
      	(selftest::test_control_flow_3): Likewise.
      	(selftest::assert_cfg_edge_path_streq): Likewise.
      	(selftest::test_control_flow_5): Likewise.
      	(selftest::test_control_flow_6): Likewise.
      	* diagnostic-path.h (diagnostic_event::get_desc): Add param
      	"ref_pp".
      	* lazy-diagnostic-path.cc (selftest::test_intraprocedural_path):
      	Pass *event_pp to get_desc.
      	* simple-diagnostic-path.cc (selftest::test_intraprocedural_path):
      	Likewise.
      
       Signed-off-by: David Malcolm <dmalcolm@redhat.com>
      5ace2b23
    • Soumya AR's avatar
      Match: Fold pow calls to ldexp when possible [PR57492] · 5a674367
      Soumya AR authored
      This patch transforms the following POW calls to equivalent LDEXP calls, as
      discussed in PR57492:
      
      powi (powof2, i) -> ldexp (1.0, i * log2 (powof2))
      
      powof2 * ldexp (x, i) -> ldexp (x, i + log2 (powof2))
      
      a * ldexp(1., i) -> ldexp (a, i)
      
      This is especially helpful for SVE architectures as LDEXP calls can be
      implemented using the FSCALE instruction, as seen in the following patch:
      https://gcc.gnu.org/g:9b2915d95d855333d4d8f66b71a75f653ee0d076
      
      
      
       SPEC2017 was run with this patch; while there are no noticeable
       improvements, there are no non-noise regressions either.
      
      The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
      
       Signed-off-by: Soumya AR <soumyaa@nvidia.com>
      
      gcc/ChangeLog:
      	PR target/57492
      	* match.pd: Added patterns to fold calls to pow to ldexp and optimize
      	specific ldexp calls.
      
      gcc/testsuite/ChangeLog:
      	PR target/57492
      	* gcc.dg/tree-ssa/ldexp.c: New test.
      	* gcc.dg/tree-ssa/pow-to-ldexp.c: New test.
      5a674367
    • Yangyu Chen's avatar
      RISC-V: Add Multi-Versioning Test Cases · f42f8dcf
      Yangyu Chen authored
      
       This patch adds test cases for the Function Multi-Versioning (FMV)
       feature for RISC-V, reusing the existing test cases from aarch64 and
       porting them to RISC-V.
      
       Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/testsuite/ChangeLog:
      
      	* g++.target/riscv/mv-symbols1.C: New test.
      	* g++.target/riscv/mv-symbols2.C: New test.
      	* g++.target/riscv/mv-symbols3.C: New test.
      	* g++.target/riscv/mv-symbols4.C: New test.
      	* g++.target/riscv/mv-symbols5.C: New test.
      	* g++.target/riscv/mvc-symbols1.C: New test.
      	* g++.target/riscv/mvc-symbols2.C: New test.
      	* g++.target/riscv/mvc-symbols3.C: New test.
      	* g++.target/riscv/mvc-symbols4.C: New test.
      f42f8dcf
    • Yangyu Chen's avatar
      RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and... · 917d03e4
      Yangyu Chen authored
      RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
      
      This patch implements the TARGET_GENERATE_VERSION_DISPATCHER_BODY and
      TARGET_GET_FUNCTION_VERSIONS_DISPATCHER for RISC-V. This is used to
      generate the dispatcher function and get the dispatcher function for
      function multiversioning.
      
       This patch copies much code from commit 0cfde688 ("[aarch64]
       Add function multiversioning support") and modifies it to fit the
       RISC-V port. A key difference is that the feature-bits data structure
       in the RISC-V C-API is an array of unsigned long long, while on
       AArch64 it is not an array, so we need to generate an array reference
       for each feature-bits element in the dispatcher function.
      
       Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc (add_condition_to_bb): New function.
      	(dispatch_function_versions): New function.
      	(get_suffixed_assembler_name): New function.
      	(make_resolver_func): New function.
      	(riscv_generate_version_dispatcher_body): New function.
      	(riscv_get_function_versions_dispatcher): New function.
      	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Implement it.
      	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Implement it.
      917d03e4
    • Yangyu Chen's avatar
      RISC-V: Implement TARGET_MANGLE_DECL_ASSEMBLER_NAME · 0c77c4b0
      Yangyu Chen authored
      
      This patch implements the TARGET_MANGLE_DECL_ASSEMBLER_NAME for RISC-V.
      This is used to add function multiversioning suffixes to the assembler
      name.
      
       Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc
      	(riscv_mangle_decl_assembler_name): New function.
      	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): Define.
      0c77c4b0
    • Yangyu Chen's avatar
      RISC-V: Implement TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS · 78753c75
      Yangyu Chen authored
      This patch implements TARGET_COMPARE_VERSION_PRIORITY and
      TARGET_OPTION_FUNCTION_VERSIONS for RISC-V.
      
      The TARGET_COMPARE_VERSION_PRIORITY is implemented to compare the
      priority of two function versions based on the rules defined in the
      RISC-V C-API Doc PR #85:
      
      https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85/files#diff-79a93ca266139524b8b642e582ac20999357542001f1f4666fbb62b6fb7a5824R721
      
      
      
       If multiple versions have equal priority, we select the function with
       the largest number of feature bits generated by
       riscv_minimal_hwprobe_feature_bits. If the number of feature bits is
       also equal, we diff the two versions and select the one with the
       least significant differing bit set, since a feature that appears
       earlier in feature_bits might be more important to performance.
      
       TARGET_OPTION_FUNCTION_VERSIONS is implemented to check whether two
       function versions are the same. This implementation reuses the code
       in TARGET_COMPARE_VERSION_PRIORITY and checks that it returns 0,
       which means equal priority.
      
       Co-Developed-by: Hank Chang <hank.chang@sifive.com>
       Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc
      	(parse_features_for_version): New function.
      	(compare_fmv_features): New function.
      	(riscv_compare_version_priority): New function.
      	(riscv_common_function_versions): New function.
      	(TARGET_COMPARE_VERSION_PRIORITY): Implement it.
      	(TARGET_OPTION_FUNCTION_VERSIONS): Implement it.
      78753c75
    • Yangyu Chen's avatar
      RISC-V: Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P · bd975bd1
      Yangyu Chen authored
      
      This patch implements the TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P for
      RISC-V. This hook is used to process attribute
      ((target_version ("..."))).
      
      As it is the first patch which introduces the target_version attribute,
      we also set TARGET_HAS_FMV_TARGET_ATTRIBUTE to 0 to use "target_version"
      for function versioning.
      
       Co-Developed-by: Hank Chang <hank.chang@sifive.com>
       Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-protos.h
      	(riscv_process_target_attr): Remove as it is not used.
      	(riscv_option_valid_version_attribute_p): Declare.
      	(riscv_process_target_version_attr): Declare.
      	* config/riscv/riscv-target-attr.cc
      	(riscv_target_attrs): Renamed from riscv_attributes.
      	(riscv_target_version_attrs): New attributes for target_version.
      	(riscv_process_one_target_attr): New arguments to select attrs.
      	(riscv_process_target_attr): Likewise.
      	(riscv_option_valid_attribute_p): Likewise.
      	(riscv_process_target_version_attr): New function.
      	(riscv_option_valid_version_attribute_p): New function.
      	* config/riscv/riscv.cc
      	(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): Implement it.
      	* config/riscv/riscv.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Define
      	it to 0 to use "target_version" for function versioning.
      bd975bd1
    • Yangyu Chen's avatar
      RISC-V: Implement riscv_minimal_hwprobe_feature_bits · 1f99a39d
      Yangyu Chen authored
      
      This patch implements the riscv_minimal_hwprobe_feature_bits feature
      for the RISC-V target. The feature bits are defined in
      libgcc/config/riscv/feature_bits.c and provide bitmasks of the ISA
      extensions defined in the RISC-V C-API. Thus, we need a function to
      generate these feature bits so that the IFUNC resolver can dispatch
      between different functions based on the hardware features.
      
      The minimal feature bits use the earliest extension that appeared in
      the Linux hwprobe interface to cover the given ISA string. This allows
      older kernels that cannot probe some implied extensions to run the FMV
      dispatcher correctly.
      
      For example, V implies Zve32x, but Zve32x has only appeared in the
      Linux kernel since v6.11. If we used the ISA string directly to
      generate the FMV dispatcher for functions with the "arch=+v"
      extension, then, because V implies Zve32x, the dispatcher would check
      whether the host supports Zve32x. On a kernel older than v6.11 that
      check fails even though Zve32x is implied by the V extension, making
      the FMV dispatcher fail to dispatch the correct function.
      
      Thus, we need to generate the minimal feature bits to cover the given
      ISA string to allow the FMV dispatcher to work correctly on older
      kernels.
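The idea can be sketched with a tiny implication filter. This is a hypothetical model; the extension indices and implication table below are made up for illustration and are unrelated to the real hwprobe encoding:

```c
#include <stdint.h>

/* Hypothetical extension indices; real hwprobe key/bit positions differ.  */
enum riscv_ext { EXT_V, EXT_ZVE32X, N_EXT };

/* ext_implies[i]: mask of extensions implied by extension I.  */
static const uint64_t ext_implies[N_EXT] = {
  [EXT_V]      = 1ULL << EXT_ZVE32X,  /* V implies Zve32x.  */
  [EXT_ZVE32X] = 0,
};

/* Reduce REQUESTED to the minimal set: drop every bit already implied
   by another requested extension, so the dispatcher only probes keys
   that older kernels are guaranteed to report.  */
static uint64_t minimal_feature_bits (uint64_t requested)
{
  uint64_t covered = 0;
  for (int i = 0; i < N_EXT; i++)
    if (requested & (1ULL << i))
      covered |= ext_implies[i];
  return requested & ~covered;
}
```

With this table, requesting {V, Zve32x} reduces to {V} alone, so a pre-v6.11 kernel that knows nothing about Zve32x can still satisfy the probe.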
      
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* common/config/riscv/riscv-common.cc
      	(RISCV_EXT_BITMASK): New macro.
      	(struct riscv_ext_bitmask_table_t): New struct.
      	(riscv_minimal_hwprobe_feature_bits): New function.
      	* common/config/riscv/riscv-ext-bitmask.def: New file.
      	* config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H): Include
      	riscv-feature-bits.h.
      	(riscv_minimal_hwprobe_feature_bits): Declare the function.
      	* config/riscv/riscv-feature-bits.h: New file.
      1f99a39d
    • Yangyu Chen's avatar
      RISC-V: Implement Priority syntax parser for Function Multi-Versioning · 6b572d4e
      Yangyu Chen authored
      This patch adds the priority syntax parser to support the Function
      Multi-Versioning (FMV) feature in RISC-V. This feature allows users to
      specify the priority of the function version in the attribute syntax.
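Per the C-API proposal, a version string may carry an explicit priority clause such as "arch=+v;priority=10". A minimal sketch of extracting that value (a hypothetical helper, not GCC's actual parser in riscv-target-attr.cc) might look like:

```c
#include <stdlib.h>
#include <string.h>

/* Extract the priority from a target_version string such as
   "arch=+v;priority=10".  Returns DEFAULT_PRIO when no priority
   clause is present.  (Illustrative helper, not GCC's parser.)  */
static int parse_priority (const char *attr, int default_prio)
{
  const char *p = strstr (attr, "priority=");
  if (!p)
    return default_prio;
  return atoi (p + strlen ("priority="));
}
```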
      
      Changes based on RISC-V C-API PR:
      https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85
      
      
      
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-target-attr.cc
      	(riscv_target_attr_parser::handle_priority): New function.
      	(riscv_target_attr_parser::update_settings): Update priority
      	attribute.
      	* config/riscv/riscv.opt: Add TargetVariable riscv_fmv_priority.
      6b572d4e