  1. Apr 20, 2023
    • i386: Add AVX512BW dependency to AVX512VBMI2 · 4fb12ae9
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* common/config/i386/i386-common.cc
      	(OPTION_MASK_ISA_AVX512VBMI2_SET): Change OPTION_MASK_ISA_AVX512F_SET
      	to OPTION_MASK_ISA_AVX512BW_SET.
      	(OPTION_MASK_ISA_AVX512F_UNSET):
      	Remove OPTION_MASK_ISA_AVX512VBMI2_UNSET.
      	(OPTION_MASK_ISA_AVX512BW_UNSET):
      	Add OPTION_MASK_ISA_AVX512VBMI2_UNSET.
      	* config/i386/avx512vbmi2intrin.h: Do not push avx512bw.
      	* config/i386/avx512vbmi2vlintrin.h: Ditto.
      	* config/i386/i386-builtin.def: Remove OPTION_MASK_ISA_AVX512BW.
      	* config/i386/sse.md (VI12_AVX512VLBW): Removed.
      	(VI12_VI48F_AVX512VLBW): Rename to VI12_VI48F_AVX512VL.
      	(compress<mode>_mask): Change iterator from VI12_AVX512VLBW to
      	VI12_AVX512VL.
      	(compressstore<mode>_mask): Ditto.
      	(expand<mode>_mask): Ditto.
      	(expand<mode>_maskz): Ditto.
      	(*expand<mode>_mask): Change iterator from VI12_VI48F_AVX512VLBW to
      	VI12_VI48F_AVX512VL.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512bw-pr100267-1.c: Remove avx512f and avx512bw.
      	* gcc.target/i386/avx512bw-pr100267-b-2.c: Ditto.
      	* gcc.target/i386/avx512bw-pr100267-d-2.c: Ditto.
      	* gcc.target/i386/avx512bw-pr100267-q-2.c: Ditto.
      	* gcc.target/i386/avx512bw-pr100267-w-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpcompressb-1.c: Ditto.
      	* gcc.target/i386/avx512f-vpcompressb-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpcompressw-1.c: Ditto.
      	* gcc.target/i386/avx512f-vpcompressw-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpexpandb-1.c: Ditto.
      	* gcc.target/i386/avx512f-vpexpandb-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpexpandw-1.c: Ditto.
      	* gcc.target/i386/avx512f-vpexpandw-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshld-1.c: Ditto.
      	* gcc.target/i386/avx512f-vpshldd-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshldq-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshldv-1.c: Ditto.
      	* gcc.target/i386/avx512f-vpshldvd-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshldvq-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshldvw-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshrdd-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshrdq-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshrdv-1.c: Ditto.
      	* gcc.target/i386/avx512f-vpshrdvd-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshrdvq-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshrdvw-2.c: Ditto.
      	* gcc.target/i386/avx512f-vpshrdw-2.c: Ditto.
      	* gcc.target/i386/avx512vbmi2-vpshld-1.c: Ditto.
      	* gcc.target/i386/avx512vbmi2-vpshrd-1.c: Ditto.
      	* gcc.target/i386/avx512vl-vpcompressb-1.c: Ditto.
      	* gcc.target/i386/avx512vl-vpcompressb-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpcompressw-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpexpandb-1.c: Ditto.
      	* gcc.target/i386/avx512vl-vpexpandb-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpexpandw-1.c: Ditto.
      	* gcc.target/i386/avx512vl-vpexpandw-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshldd-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshldq-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshldv-1.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshldvd-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshldvq-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshldvw-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshrdd-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshrdq-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshrdv-1.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshrdvd-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshrdvq-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshrdvw-2.c: Ditto.
      	* gcc.target/i386/avx512vl-vpshrdw-2.c: Ditto.
      	* gcc.target/i386/avx512vlbw-pr100267-1.c: Ditto.
      	* gcc.target/i386/avx512vlbw-pr100267-b-2.c: Ditto.
      	* gcc.target/i386/avx512vlbw-pr100267-w-2.c: Ditto.
    • i386: Add AVX512BW dependency to AVX512BITALG · d08b0559
      Haochen Jiang authored
      Since some of the AVX512BITALG intrins use 32/64 bit mask,
      AVX512BW should be implied.
      
      gcc/ChangeLog:
      
      	* common/config/i386/i386-common.cc
      	(OPTION_MASK_ISA_AVX512BITALG_SET):
      	Change OPTION_MASK_ISA_AVX512F_SET
      	to OPTION_MASK_ISA_AVX512BW_SET.
      	(OPTION_MASK_ISA_AVX512F_UNSET):
      	Remove OPTION_MASK_ISA_AVX512BITALG_UNSET.
      	(OPTION_MASK_ISA_AVX512BW_UNSET):
      	Add OPTION_MASK_ISA_AVX512BITALG_UNSET.
      	* config/i386/avx512bitalgintrin.h: Do not push avx512bw.
      	* config/i386/i386-builtin.def:
      	Remove redundant OPTION_MASK_ISA_AVX512BW.
      	* config/i386/sse.md (VI1_AVX512VLBW): Removed.
      	(avx512vl_vpshufbitqmb<mode><mask_scalar_merge_name>):
      	Change the iterator from VI1_AVX512VLBW to VI1_AVX512VL.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512bitalg-vpopcntb-1.c:
      	Remove avx512bw.
      	* gcc.target/i386/avx512bitalg-vpopcntb.c: Ditto.
      	* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
      	* gcc.target/i386/avx512bitalg-vpopcntw-1.c: Ditto.
      	* gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
      	* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
      	* gcc.target/i386/avx512bitalg-vpshufbitqmb-1.c: Ditto.
      	* gcc.target/i386/avx512bitalg-vpshufbitqmb.c: Ditto.
      	* gcc.target/i386/avx512bitalgvl-vpopcntb-1.c: Ditto.
      	* gcc.target/i386/avx512bitalgvl-vpopcntw-1.c: Ditto.
      	* gcc.target/i386/avx512bitalgvl-vpshufbitqmb-1.c: Ditto.
      	* gcc.target/i386/pr93696-1.c: Ditto.
      	* gcc.target/i386/pr93696-2.c: Ditto.
    • i386: Use macro to wrap up share builtin exceptions in builtin isa check · 5ebdbdb9
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc
      	(ix86_check_builtin_isa_match): Correct wrong comments.
      	Add a new macro SHARE_BUILTIN and refactor the current if
      	clauses to macro.
    • Re-arrange sections of i386 cpuid · fd7ecd80
      Mo, Zewei authored
      gcc/ChangeLog:
      
      	* config/i386/cpuid.h: Open a new section for Extended Features
      	Leaf (%eax == 7, %ecx == 0) and Extended Features Sub-leaf (%eax == 7,
      	%ecx == 1).
    • Optimize vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm · c2dac2e5
      Hu, Lin1 authored
      vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm take 3 clocks.
      We can optimize them to vblend or vmovaps when there is no cross-lane movement.
      
      gcc/ChangeLog:
      
      	* config/i386/sse.md: Modify insn vperm{i,f}
      	and vshuf{i,f}.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512vl-vshuff32x4-1.c: Modify test.
      	* gcc.target/i386/avx512vl-vshuff64x2-1.c: Ditto.
      	* gcc.target/i386/avx512vl-vshufi32x4-1.c: Ditto.
      	* gcc.target/i386/avx512vl-vshufi64x2-1.c: Ditto.
      	* gcc.target/i386/opt-vperm-vshuf-1.c: New test.
      	* gcc.target/i386/opt-vperm-vshuf-2.c: Ditto.
      	* gcc.target/i386/opt-vperm-vshuf-3.c: Ditto.
    • Daily bump. · cf0d9dbc
      GCC Administrator authored
  2. Apr 19, 2023
    • gcc: xtensa: add -m[no-]strict-align option · 675b390e
      Max Filippov authored
      gcc/
      	* config/xtensa/xtensa-opts.h: New header.
      	* config/xtensa/xtensa.h (STRICT_ALIGNMENT): Redefine as
      	xtensa_strict_align.
      	* config/xtensa/xtensa.cc (xtensa_option_override): When
      	-m[no-]strict-align is not specified in the command line set
      	xtensa_strict_align to 0 if the hardware supports both unaligned
      	loads and stores or to 1 otherwise.
      	* config/xtensa/xtensa.opt (mstrict-align): New option.
      	* doc/invoke.texi (Xtensa Options): Document -m[no-]strict-align.
    • gcc: xtensa: add data alignment properties to dynconfig · ec9b3087
      Max Filippov authored
      gcc/
      	* config/xtensa/xtensa-dynconfig.cc (xtensa_get_config_v4): New
      	function.
      
      include/
      	* xtensa-dynconfig.h (xtensa_config_v4): New struct.
      	(XCHAL_DATA_WIDTH, XCHAL_UNALIGNED_LOAD_EXCEPTION)
      	(XCHAL_UNALIGNED_STORE_EXCEPTION, XCHAL_UNALIGNED_LOAD_HW)
      	(XCHAL_UNALIGNED_STORE_HW, XTENSA_CONFIG_V4_ENTRY_LIST): New
      	definitions.
      	(XTENSA_CONFIG_INSTANCE_LIST): Add xtensa_config_v4 instance.
      	(XTENSA_CONFIG_ENTRY_LIST): Add XTENSA_CONFIG_V4_ENTRY_LIST.
    • c++: Define built-in for std::tuple_element [PR100157] · 58b7dbf8
      Patrick Palka authored
      
      This adds a new built-in to replace the recursive class template
      instantiations done by traits such as std::tuple_element and
      std::variant_alternative.  The purpose is to select the Nth type from a
      list of types, e.g. __type_pack_element<1, char, int, float> is int.
      We implement it as a special kind of TRAIT_TYPE.
      
      For a pathological example tuple_element_t<1000, tuple<2000 types...>>
      the compilation time is reduced by more than 90% and the memory used by
      the compiler is reduced by 97%.  In realistic examples the gains will be
      much smaller, but still relevant.
      
      Unlike the other built-in traits, __type_pack_element uses template-id
      syntax instead of call syntax and is SFINAE-enabled, matching Clang's
      implementation.  And like the other built-in traits, it's not mangleable
      so we can't use it directly in function signatures.
      
      N.B. Clang seems to implement __type_pack_element as a first-class
      template that can e.g. be used as a template-template argument.  For
      simplicity we implement it in a more ad-hoc way.
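As a rough illustration of the idea above, the sketch below selects the Nth type from a pack either through the built-in (when the compiler reports it via `__has_builtin`) or through the recursive instantiations it is meant to replace. The names `nth_type`, `nth_type_t` and `SKETCH_HAVE_TYPE_PACK_ELEMENT` are invented for this example; this is not the libstdc++ `_Nth_type` code, only a sketch in the same spirit.

```cpp
#include <cstddef>
#include <type_traits>

// Detect the built-in where the compiler offers a feature check for it.
#if defined(__has_builtin)
# if __has_builtin(__type_pack_element)
#  define SKETCH_HAVE_TYPE_PACK_ELEMENT 1
# endif
#endif

#ifdef SKETCH_HAVE_TYPE_PACK_ELEMENT
// One built-in lookup instead of a chain of instantiations.  The built-in
// is wrapped in a class template because it cannot be mangled and so
// should not appear directly in function signatures.
template<std::size_t N, typename... Ts>
struct nth_type { using type = __type_pack_element<N, Ts...>; };
#else
// Portable fallback: peel off one type per instantiation.
template<std::size_t N, typename... Ts>
struct nth_type;

template<typename T, typename... Ts>
struct nth_type<0, T, Ts...> { using type = T; };

template<std::size_t N, typename T, typename... Ts>
struct nth_type<N, T, Ts...> : nth_type<N - 1, Ts...> { };
#endif

template<std::size_t N, typename... Ts>
using nth_type_t = typename nth_type<N, Ts...>::type;

// The example from the commit message: element 1 of <char, int, float>.
static_assert(std::is_same<nth_type_t<1, char, int, float>, int>::value,
              "element 1 should be int");
```

Both branches give the same answer; the difference is only in how many template instantiations the compiler has to perform to get there.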
      
      Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
      
      	PR c++/100157
      
      gcc/cp/ChangeLog:
      
      	* cp-trait.def (TYPE_PACK_ELEMENT): Define.
      	* cp-tree.h (finish_trait_type): Add complain parameter.
      	* cxx-pretty-print.cc (pp_cxx_trait): Handle
      	CPTK_TYPE_PACK_ELEMENT.
      	* parser.cc (cp_parser_constant_expression): Document default
      	arguments.
      	(cp_parser_trait): Handle CPTK_TYPE_PACK_ELEMENT.  Pass
      	tf_warning_or_error to finish_trait_type.
      	* pt.cc (tsubst) <case TRAIT_TYPE>: Handle non-type first
      	argument.  Pass complain to finish_trait_type.
      	* semantics.cc (finish_type_pack_element): Define.
      	(finish_trait_type): Add complain parameter.  Handle
      	CPTK_TYPE_PACK_ELEMENT.
      	* tree.cc (strip_typedefs): Handle non-type first argument.
      	Pass tf_warning_or_error to finish_trait_type.
      	* typeck.cc (structural_comptypes) <case TRAIT_TYPE>: Use
      	cp_tree_equal instead of same_type_p for the first argument.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/utility.h (_Nth_type): Conditionally define in
      	terms of __type_pack_element if available.
      	* testsuite/20_util/tuple/element_access/get_neg.cc: Prune
      	additional errors from the new built-in.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/ext/type_pack_element1.C: New test.
      	* g++.dg/ext/type_pack_element2.C: New test.
      	* g++.dg/ext/type_pack_element3.C: New test.
    • c++: bad ggc_free in try_class_unification [PR109556] · 5e284ebb
      Patrick Palka authored
      Aside from correcting how try_class_unification copies multi-dimensional
      'targs', r13-377-g3e948d645bc908 also made it ggc_free this copy as an
      optimization.  But this is wrong since the call to unify within might've
      captured the args in persistent memory such as the satisfaction cache
      (as part of constrained auto deduction).
      
      	PR c++/109556
      
      gcc/cp/ChangeLog:
      
      	* pt.cc (try_class_unification): Don't ggc_free the copy of
      	'targs'.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp2a/concepts-placeholder13.C: New test.
    • testsuite: fix scan-tree-dump patterns [PR83904,PR100297] · 6fc8e25c
      Harald Anlauf authored
      Adjust scan-tree-dump patterns so that they do not accidentally match a
      valid path.
      
      gcc/testsuite/ChangeLog:
      
      	PR testsuite/83904
      	PR fortran/100297
      	* gfortran.dg/allocatable_function_1.f90: Use "__builtin_free "
      	instead of the naive "free".
      	* gfortran.dg/reshape_8.f90: Extend pattern from a simple "data".
    • i386: Add new pattern for zero-extend cmov · 04a9209d
      Andrew Pinski authored
      After a phiopt change, I got a failure of cmov9.c.
      The RTL IR has zero_extend on the outside of
      the if_then_else rather than on the side. Both
      ways are considered canonical as mentioned in
      PR 66588.
      
      This fixes the failure I got and also adds a testcase
      which fails before even my phiopt patch but will pass
      with this patch.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with
      no regressions.
      
      gcc/ChangeLog:
      
      	* config/i386/i386.md (*movsicc_noc_zext_1): New pattern.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/cmov10.c: New test.
      	* gcc.target/i386/cmov11.c: New test.
    • c++: fix 'unsigned __int128_t' semantics [PR108099] · ed32ec26
      Jason Merrill authored
      My earlier patch for 108099 made us accept this non-standard pattern but
      messed up the semantics, so that e.g. unsigned __int128_t was not a 128-bit
      type.
      
      	PR c++/108099
      
      gcc/cp/ChangeLog:
      
      	* decl.cc (grokdeclarator): Keep typedef_decl for __int128_t.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/ext/int128-8.C: New test.
    • RISC-V: Support 128 bit vector chunk · 9fdea28d
      Juzhe-Zhong authored
      RISC-V provides different VLEN configurations through different ISA
      extensions such as `zve32x`, `zve64x` and `v`:
      zve32x only guarantees that the minimal VLEN is 32 bits,
      zve64x guarantees that the minimal VLEN is 64 bits,
      and v guarantees that the minimal VLEN is 128 bits.
      
      Current status (without this patch):
      
      Zve32x: The mode for one vector register is VNx1SImode; VNx1DImode
      is an invalid mode
       - one vector register could hold 1 + 1x SImode where x is 0~n, so it
      might hold just one SI
      
      Zve64x: The mode for one vector register is VNx1DImode or VNx2SImode
       - one vector register could hold 1 + 1x DImode where x is 0~n, so it
      might hold just one DI.
       - one vector register could hold 2 + 2x SImode where x is 0~n, so it
      might hold just two SI.
      
      However, the `v` extension guarantees that the minimal VLEN is 128 bits.
      
      We introduce another type/mode mapping for this configuration:
      
      v: The mode for one vector register is VNx2DImode or VNx4SImode
       - one vector register could hold 2 + 2x DImode where x is 0~n, so it
      will hold at least two DI
       - one vector register could hold 4 + 4x SImode where x is 0~n, so it
      will hold at least four SI
      
      This patch models the modes more precisely for RVV and helps some
      middle-end optimizations that assume the number of elements must be a
      multiple of two.
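The element counts quoted above follow from simple arithmetic, which the sketch below spells out. The helper names `min_units` and `units` are made up for this illustration, and the model (a register holding `min * (1 + x)` elements) is taken directly from the "2 + 2x" wording in the description rather than from the GCC sources.

```cpp
// Minimum number of elements of width elem_bits that fit in one vector
// register when VLEN is only guaranteed to be at least min_vlen_bits.
constexpr unsigned min_units(unsigned min_vlen_bits, unsigned elem_bits)
{
  return min_vlen_bits / elem_bits;
}

// Element count when the real register is (1 + x) times the guaranteed
// minimum, i.e. the "min + min * x" form used in the description.
constexpr unsigned units(unsigned min_vlen_bits, unsigned elem_bits,
                         unsigned x)
{
  return min_units(min_vlen_bits, elem_bits) * (1 + x);
}

// zve32x: VNx1SI.  zve64x: VNx1DI / VNx2SI.  v: VNx2DI / VNx4SI.
static_assert(min_units(32, 32) == 1, "zve32x holds at least one SI");
static_assert(min_units(64, 64) == 1, "zve64x holds at least one DI");
static_assert(min_units(64, 32) == 2, "zve64x holds at least two SI");
static_assert(min_units(128, 64) == 2, "v holds at least two DI");
static_assert(min_units(128, 32) == 4, "v holds at least four SI");
```

With a 128-bit chunk every 32-bit and 64-bit element count is even for any x, which is the property the middle-end optimizations mentioned above rely on.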
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-modes.def (FLOAT_MODE): Add chunk 128 support.
      	(VECTOR_BOOL_MODE): Ditto.
      	(ADJUST_NUNITS): Ditto.
      	(ADJUST_ALIGNMENT): Ditto.
      	(ADJUST_BYTESIZE): Ditto.
      	(ADJUST_PRECISION): Ditto.
      	(RVV_MODES): Ditto.
      	(VECTOR_MODE_WITH_PREFIX): Ditto.
      	* config/riscv/riscv-v.cc (ENTRY): Ditto.
      	(get_vlmul): Ditto.
      	(get_ratio): Ditto.
      	* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TYPE): Ditto.
      	* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE): Ditto.
      	(vbool64_t): Ditto.
      	(vbool32_t): Ditto.
      	(vbool16_t): Ditto.
      	(vbool8_t): Ditto.
      	(vbool4_t): Ditto.
      	(vbool2_t): Ditto.
      	(vbool1_t): Ditto.
      	(vint8mf8_t): Ditto.
      	(vuint8mf8_t): Ditto.
      	(vint8mf4_t): Ditto.
      	(vuint8mf4_t): Ditto.
      	(vint8mf2_t): Ditto.
      	(vuint8mf2_t): Ditto.
      	(vint8m1_t): Ditto.
      	(vuint8m1_t): Ditto.
      	(vint8m2_t): Ditto.
      	(vuint8m2_t): Ditto.
      	(vint8m4_t): Ditto.
      	(vuint8m4_t): Ditto.
      	(vint8m8_t): Ditto.
      	(vuint8m8_t): Ditto.
      	(vint16mf4_t): Ditto.
      	(vuint16mf4_t): Ditto.
      	(vint16mf2_t): Ditto.
      	(vuint16mf2_t): Ditto.
      	(vint16m1_t): Ditto.
      	(vuint16m1_t): Ditto.
      	(vint16m2_t): Ditto.
      	(vuint16m2_t): Ditto.
      	(vint16m4_t): Ditto.
      	(vuint16m4_t): Ditto.
      	(vint16m8_t): Ditto.
      	(vuint16m8_t): Ditto.
      	(vint32mf2_t): Ditto.
      	(vuint32mf2_t): Ditto.
      	(vint32m1_t): Ditto.
      	(vuint32m1_t): Ditto.
      	(vint32m2_t): Ditto.
      	(vuint32m2_t): Ditto.
      	(vint32m4_t): Ditto.
      	(vuint32m4_t): Ditto.
      	(vint32m8_t): Ditto.
      	(vuint32m8_t): Ditto.
      	(vint64m1_t): Ditto.
      	(vuint64m1_t): Ditto.
      	(vint64m2_t): Ditto.
      	(vuint64m2_t): Ditto.
      	(vint64m4_t): Ditto.
      	(vuint64m4_t): Ditto.
      	(vint64m8_t): Ditto.
      	(vuint64m8_t): Ditto.
      	(vfloat32mf2_t): Ditto.
      	(vfloat32m1_t): Ditto.
      	(vfloat32m2_t): Ditto.
      	(vfloat32m4_t): Ditto.
      	(vfloat32m8_t): Ditto.
      	(vfloat64m1_t): Ditto.
      	(vfloat64m2_t): Ditto.
      	(vfloat64m4_t): Ditto.
      	(vfloat64m8_t): Ditto.
      	* config/riscv/riscv-vector-switch.def (ENTRY): Ditto.
      	* config/riscv/riscv.cc (riscv_legitimize_poly_move): Ditto.
      	(riscv_convert_vector_bits): Ditto.
      	* config/riscv/riscv.md:
      	* config/riscv/vector-iterators.md:
      	* config/riscv/vector.md
      	(@pred_indexed_<order>store<VNX32_QH:mode><VNX32_QHI:mode>): Ditto.
      	(@pred_indexed_<order>store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
      	(@pred_indexed_<order>store<VNX64_Q:mode><VNX64_Q:mode>): Ditto.
      	(@pred_indexed_<order>store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
      	(@pred_indexed_<order>store<VNX128_Q:mode><VNX128_Q:mode>): Ditto.
      	(@pred_reduc_<reduc><mode><vlmul1_zve64>): Ditto.
      	(@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve64>): Ditto.
      	(@pred_reduc_plus<order><mode><vlmul1_zve64>): Ditto.
      	(@pred_widen_reduc_plus<order><mode><vwlmul1_zve64>): Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/base/pr108185-4.c: Adapt testcase.
      	* gcc.target/riscv/rvv/base/spill-1.c: Ditto.
      	* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
      	* gcc.target/riscv/rvv/base/spill-2.c: Ditto.
      	* gcc.target/riscv/rvv/base/spill-3.c: Ditto.
      	* gcc.target/riscv/rvv/base/spill-5.c: Ditto.
      	* gcc.target/riscv/rvv/base/spill-9.c: Ditto.
    • RISC-V: Align IOR optimization MODE_CLASS condition to AND. · 978e8f02
      Pan Li authored
      
      This patch aligns the MODE_CLASS condition of IOR with that of AND, so
      that mode classes other than SCALAR_INT can also benefit from the
      optimization A | (~A) -> -1, as they already do for the AND operator.
      For example, consider the sample code below.
      
      vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
      {
        return __riscv_vmorn_mm_b32(v1, v1, vl);
      }
      
      Before this patch:
      vsetvli  a5,zero,e8,mf4,ta,ma
      vlm.v    v24,0(a1)
      vsetvli  zero,a2,e8,mf4,ta,ma
      vmorn.mm v24,v24,v24
      vsetvli  a5,zero,e8,mf4,ta,ma
      vsm.v    v24,0(a0)
      ret
      
      After this patch:
      vsetvli zero,a2,e8,mf4,ta,ma
      vmset.m v24
      vsetvli a5,zero,e8,mf4,ta,ma
      vsm.v   v24,0(a0)
      ret
      
      Or in RTL's perspective,
      from:
      (ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
      to:
      (const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])
      
      A similar optimization for VMANDN is already enabled. Apart from the
      operator, there should be no difference between VMORN and VMANDN
      for this kind of optimization. The patch aligns the IOR MODE_CLASS condition
      of the simplification with that of the AND operator.
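The two identities behind these simplifications can be checked on plain unsigned integers, as in the sketch below. It only demonstrates the algebra (A | ~A is all ones, A & ~A is zero); the function names are invented here and nothing in it reflects how simplify-rtx.cc is actually written.

```cpp
#include <cstdint>

// A | ~A: every bit is set in either A or its complement.
template<typename T>
constexpr T ior_with_complement(T a)
{
  return static_cast<T>(a | static_cast<T>(~a));
}

// A & ~A: no bit is set in both A and its complement.
template<typename T>
constexpr T and_with_complement(T a)
{
  return static_cast<T>(a & static_cast<T>(~a));
}

static_assert(ior_with_complement<std::uint8_t>(0x5a) == 0xff,
              "IOR with the complement is all ones");
static_assert(and_with_complement<std::uint8_t>(0x5a) == 0,
              "AND with the complement is zero");
```

Because the result does not depend on A, the operand never has to be loaded, which matches the missing vlm.v in the "after" assembly above.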
      
      gcc/ChangeLog:
      
      	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
      	Align IOR (A | (~A) -> -1) optimization MODE_CLASS condition to AND.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Update check
      	condition.
      	* gcc.target/riscv/simplify_ior_optimization.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • i386: Emit compares between high registers and memory · 0df6d181
      Uros Bizjak authored
      The following code:
      
      typedef __SIZE_TYPE__ size_t;
      
      struct S1s
      {
        char pad1;
        char val;
        short pad2;
      };
      
      extern char ts[256];
      
      _Bool foo (struct S1s a, size_t i)
      {
        return (ts[i] > a.val);
      }
      
      compiles with -O2 to:
      
              movl    %edi, %eax
              movsbl  %ah, %edi
              cmpb    %dil, ts(%rsi)
              setg    %al
              ret
      
      the compare could use high register %ah instead of %dil:
      
              movl    %edi, %eax
              cmpb    ts(%rsi), %ah
              setl    %al
              ret
      
      Use any_extract code iterator to handle signed and unsigned extracts
      from high register and introduce peephole2 patterns to propagate
      norex memory operand into the compare insn.
      
      gcc/ChangeLog:
      
      	PR target/78904
      	PR target/78952
      	* config/i386/i386.md (*cmpqi_ext<mode>_1_mem_rex64): New insn pattern.
      	(*cmpqi_ext<mode>_1): Use nonimmediate_operand predicate
      	for operand 0. Use any_extract code iterator.
      	(*cmpqi_ext<mode>_1 peephole2): New peephole2 pattern.
      	(*cmpqi_ext<mode>_2): Use any_extract code iterator.
      	(*cmpqi_ext<mode>_3_mem_rex64): New insn pattern.
      	(*cmpqi_ext<mode>_3): Use general_operand predicate
      	for operand 1. Use any_extract code iterator.
      	(*cmpqi_ext<mode>_3 peephole2): New peephole2 pattern.
      	(*cmpqi_ext<mode>_4): Use any_extract code iterator.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/78904
      	PR target/78952
      	* gcc.target/i386/pr78952-3.c: New test.
    • aarch64: Factorise widening add/sub high-half expanders with iterators · a30078d5
      Kyrylo Tkachov authored
      I noticed these define_expand are almost identical modulo some string substitutions.
      This patch compresses them together with a couple of code iterators.
      No functional change intended.
      Bootstrapped and tested on aarch64-none-linux-gnu.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-simd.md (aarch64_saddw2<mode>): Delete.
      	(aarch64_uaddw2<mode>): Delete.
      	(aarch64_ssubw2<mode>): Delete.
      	(aarch64_usubw2<mode>): Delete.
      	(aarch64_<ANY_EXTEND:su><ADDSUB:optab>w2<mode>): New define_expand.
    • Use solve_add_graph_edge in more places · 57aecdbc
      Richard Biener authored
      The following makes sure to use solve_add_graph_edge, honoring the
      special cases, especially edges from escaped, in the remaining places
      where the solver adds edges.
      
      	* tree-ssa-structalias.cc (do_ds_constraint): Use
      	solve_add_graph_edge.
    • Split out solve_add_graph_edge · 2cef0d09
      Richard Biener authored
      Split out a worker with all the special-casings when adding a graph
      edge during solving.
      
      	* tree-ssa-structalias.cc (solve_add_graph_edge): New function,
      	split out from ...
      	(do_sd_constraint): ... here.
    • Remove odd code from gimple_can_merge_blocks_p · 1da16c11
      Richard Biener authored
      The following removes a special case that refused to merge a block
      containing only a non-local label.  We have a restriction that a
      non-local label must be the first statement (and label) in a block,
      but nothing beyond that: if the last stmt of A is a non-local label
      then it will still be the first statement of the combined A + B.
      In particular we'd happily merge when there's a stmt after that label.
      
      The check originates from the tree-ssa merge.
      
      Bootstrapped and tested on x86_64-unknown-linux-gnu with all
      languages.
      
      	* tree-cfg.cc (gimple_can_merge_blocks_p): Remove condition
      	rejecting the merge when A contains only a non-local label.
    • Introduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates · 258aecd7
      Uros Bizjak authored
      These two predicates are similar to existing HARD_REGISTER_P and
      HARD_REGISTER_NUM_P predicates and return 1 if the given register
      corresponds to a virtual register.
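A minimal sketch of what such a pair of predicates looks like is given below, modelled as constexpr functions rather than macros. Everything numeric here is an assumption made for the example: the value 76 is arbitrary, and the layout (virtual registers forming a small contiguous range that starts right after the hard registers) is this sketch's assumption rather than something stated in the commit. The `reg` struct is a toy stand-in for an rtx.

```cpp
namespace sketch {

// Hypothetical register numbering, chosen only for illustration.
constexpr unsigned first_pseudo_register = 76;
constexpr unsigned first_virtual_register = first_pseudo_register;
constexpr unsigned last_virtual_register = first_virtual_register + 5;

// Toy stand-in for a REG rtx: it just carries a register number.
struct reg { unsigned regno; };

// Number-based predicates, in the style of HARD_REGISTER_NUM_P.
constexpr bool hard_register_num_p(unsigned regno)
{
  return regno < first_pseudo_register;
}

constexpr bool virtual_register_num_p(unsigned regno)
{
  return regno >= first_virtual_register && regno <= last_virtual_register;
}

// Register-based predicate, in the style of HARD_REGISTER_P: extract the
// number and defer to the number-based check.
constexpr bool virtual_register_p(reg r)
{
  return virtual_register_num_p(r.regno);
}

} // namespace sketch
```

The point of having named predicates is that call sites no longer need to spell out the range comparison themselves.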
      
      gcc/ChangeLog:
      
      	* rtl.h (VIRTUAL_REGISTER_P): New predicate.
      	(VIRTUAL_REGISTER_NUM_P): Ditto.
      	(REGNO_PTR_FRAME_P): Use VIRTUAL_REGISTER_NUM_P predicate.
      	* expr.cc (force_operand): Use VIRTUAL_REGISTER_P predicate.
      	* function.cc (instantiate_decl_rtl): Ditto.
      	* rtlanal.cc (rtx_addr_can_trap_p_1): Ditto.
      	(nonzero_address_p): Ditto.
      	(refers_to_regno_p): Use VIRTUAL_REGISTER_NUM_P predicate.
    • Fix pointer sharing in Value_Range constructor. · 4c9f8cd6
      Aldy Hernandez authored
      gcc/ChangeLog:
      
      	* value-range.h (Value_Range::Value_Range): Avoid pointer sharing.
    • Transform more gmp/mpfr uses to use RAII · 210617b5
      Richard Biener authored
      The following picks up the coccinelle generated patch from Bernhard,
      leaving out the fortran frontend parts and fixing up the rest.
      In particular both gmp.h and mpfr.h contain macros like
        #define mpfr_inf_p(_x)      ((_x)->_mpfr_exp == __MPFR_EXP_INF)
      for which I add operator-> overloads to the auto_* classes.
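To show why the operator-> overload matters, the sketch below uses a toy C-style number type instead of the real gmp/mpfr types, so that it builds without those libraries. `toy_struct`, `toy_t`, `toy_init`, `toy_clear`, `toy_zero_exp_p` and `auto_toy` are all hypothetical names; only the overall shape (an RAII wrapper around a one-element array handle, plus a macro that dereferences its argument with `->`) mirrors the situation described above.

```cpp
#include <type_traits>
#include <utility>

// Toy stand-in for a C library number.  As with mpz_t and mpfr_t, the
// handle type is an array of one struct, so "x->field" works on it.
struct toy_struct { int exp; bool live; };
typedef toy_struct toy_t[1];

inline void toy_init(toy_t x) { x->exp = 0; x->live = true; }
inline void toy_clear(toy_t x) { x->live = false; }

// C-style accessor macro that applies -> directly to its argument.
#define toy_zero_exp_p(_x) ((_x)->exp == 0)

// RAII wrapper: initialises on construction, clears on destruction.
class auto_toy
{
public:
  auto_toy() { toy_init(m_val); }
  ~auto_toy() { toy_clear(m_val); }
  auto_toy(const auto_toy &) = delete;
  auto_toy &operator=(const auto_toy &) = delete;

  // Lets the wrapper be passed to functions expecting a toy_t.
  operator toy_t &() { return m_val; }
  // Lets macros such as toy_zero_exp_p accept the wrapper directly.
  toy_struct *operator->() { return m_val; }

private:
  toy_t m_val;
};

// Uses the macro on the wrapper; without operator-> this would not compile.
inline bool demo()
{
  auto_toy x;
  bool was_zero = toy_zero_exp_p(x);
  x->exp = 3;
  return was_zero && !toy_zero_exp_p(x);
}

static_assert(std::is_same<decltype(std::declval<auto_toy &>().operator->()),
                           toy_struct *>::value,
              "operator-> exposes the underlying struct pointer");
```

The conversion operator alone is not enough for such macros, because `(_x)->field` never triggers an implicit conversion of a class object to a pointer.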
      
      	* system.h (auto_mpz::operator->()): New.
      	* realmpfr.h (auto_mpfr::operator->()): New.
      	* builtins.cc (do_mpfr_lgamma_r): Use auto_mpfr.
      	* real.cc (real_from_string): Likewise.
      	(dconst_e_ptr): Likewise.
      	(dconst_sqrt2_ptr): Likewise.
      	* tree-ssa-loop-niter.cc (refine_value_range_using_guard):
      	Use auto_mpz.
      	(bound_difference_of_offsetted_base): Likewise.
      	(number_of_iterations_ne): Likewise.
      	(number_of_iterations_lt_to_ne): Likewise.
      	* ubsan.cc: Include realmpfr.h.
      	(ubsan_instrument_float_cast): Use auto_mpfr.
    • Revert "libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]" · fac24d43
      Jonathan Wakely authored
      This reverts commit b7c54e3f.
      
      libstdc++-v3/ChangeLog:
      
      	* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt:
      	* config/abi/post/i486-linux-gnu/baseline_symbols.txt:
      	* config/abi/post/m68k-linux-gnu/baseline_symbols.txt:
      	* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
      	* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt:
      	* config/abi/post/s390x-linux-gnu/baseline_symbols.txt:
      	* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt:
      	* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt:
      	* config/abi/pre/gnu.ver:
      	* src/Makefile.am:
      	* src/Makefile.in:
      	* src/c++98/Makefile.am:
      	* src/c++98/Makefile.in:
      	* src/c++98/globals_io.cc (defined):
      	(_GLIBCXX_IO_GLOBAL):
    • Revert "libstdc++: Fix preprocessor condition in linker script [PR108969]" · a6e4b81b
      Jonathan Wakely authored
      This reverts commit 6067ae45.
      
      libstdc++-v3/ChangeLog:
      
      	* config/abi/pre/gnu.ver:
    • Remove special-cased edges when solving copies · 6702fdcd
      Richard Biener authored
      The following makes sure to remove the copy edges we ignore or
      need to special-case only once.
      
      	* tree-ssa-structalias.cc (solve_graph): Remove self-copy
      	edges, remove edges from escaped after special-casing them.
    • Fix do_sd_constraint escape special casing · 8366e676
      Richard Biener authored
      The following fixes the escape special casing to test the proper
      variable IDs.
      
      	* tree-ssa-structalias.cc (do_sd_constraint): Fixup escape
      	special casing.
    • Remove senseless store in do_sd_constraint · 9d218c45
      Richard Biener authored
      	* tree-ssa-structalias.cc (do_sd_constraint): Do not write
      	to the LHS varinfo solution member.
    • Avoid non-unified nodes on the topological sorting for PTA solving · 7838574b
      Richard Biener authored
      Since we do not update successor edges when merging nodes we have
      to deal with this in the users.  The following avoids putting those
      on the topo order vector.
      
      	* tree-ssa-structalias.cc (topo_visit): Look at the real
      	destination of edges.
    • tree-optimization/44794 - avoid excessive RTL unrolling on epilogues · a243ce2a
      Richard Biener authored
      The following adjusts tree_[transform_and_]unroll_loop to set an
      upper bound on the number of iterations on the epilogue loop it
      creates.  For the testcase at hand which involves array prefetching
      this avoids applying RTL unrolling to them when -funroll-loops is
      specified.
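The reasoning behind an epilogue bound can be sketched with plain arithmetic. The commit text does not say which bound is recorded, so the value `factor - 1` used below is an assumption of this sketch (it is simply the largest possible remainder); `unroll_split`, `split_iterations` and `epilogue_upper_bound` are names invented for the example.

```cpp
// How n original iterations are divided once a loop is unrolled by factor.
struct unroll_split
{
  unsigned main_iterations;      // trips through the unrolled body
  unsigned epilogue_iterations;  // leftover trips through the epilogue loop
};

constexpr unroll_split split_iterations(unsigned n, unsigned factor)
{
  return unroll_split{ n / factor, n % factor };
}

// Largest number of iterations the epilogue can ever run (assumed bound).
constexpr unsigned epilogue_upper_bound(unsigned factor)
{
  return factor - 1;
}

static_assert(split_iterations(10, 4).main_iterations == 2,
              "10 iterations unrolled by 4: two unrolled trips");
static_assert(split_iterations(10, 4).epilogue_iterations == 2,
              "10 iterations unrolled by 4: two leftover trips");
static_assert(epilogue_upper_bound(4) == 3,
              "the epilogue never runs more than factor - 1 times");
```

Once a later pass knows the epilogue runs only a handful of times, unrolling it again has little to gain.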
      
      Other users of this API include predictive commoning and
      unroll-and-jam.
      
      	PR tree-optimization/44794
      	* tree-ssa-loop-manip.cc (tree_transform_and_unroll_loop):
      	If an epilogue loop is required set its iteration upper bound.
    • LoongArch: Improve cpymemsi expansion [PR109465] · 6d7e0bcf
      Xi Ruoyao authored
      We'd been generating really bad block move sequences, which kernel
      developers who tried __builtin_memcpy recently complained about.  To
      improve this:
      
      1. Take advantage of -mno-strict-align.  When it is set, set mode
         size to UNITS_PER_WORD regardless of the alignment.
      2. Halve the mode size when (block size) % (mode size) != 0, instead of
         falling back to ld.bu/st.b at once.
      3. Limit the length of block move sequence considering the number of
         instructions, not the size of block.  When -mstrict-align is set and
         the block is not aligned, the old size limit for straight-line
         implementation (64 bytes) was definitely too large (we don't have 64
         registers anyway).
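The effect of point 2 can be illustrated by counting moves, as in the sketch below. It is a simplified model, not the code in loongarch.cc: each "move" stands for one load/store pair, `mode_size` is assumed to be a power of two, and alignment and the instruction-count limit from point 3 are ignored. The function names are made up for the example.

```cpp
// Moves needed when the access size is halved for the left-over bytes.
constexpr unsigned count_moves_halving(unsigned length, unsigned mode_size)
{
  unsigned moves = 0;
  while (length > 0)
    {
      // Shrink the access until it fits in what is left to copy.
      while (mode_size > length)
        mode_size /= 2;
      moves += length / mode_size;
      length %= mode_size;
    }
  return moves;
}

// Moves needed when left-over bytes are copied one byte at a time.
constexpr unsigned count_moves_byte_fallback(unsigned length,
                                             unsigned mode_size)
{
  return length / mode_size + length % mode_size;
}

// 15 bytes with 8-byte accesses: 8 + 4 + 2 + 1 versus 8 + 7 * 1.
static_assert(count_moves_halving(15, 8) == 4, "halving needs four moves");
static_assert(count_moves_byte_fallback(15, 8) == 8,
              "byte fallback needs eight moves");
```

When the block size is a multiple of the mode size the two strategies coincide; the gap opens only for the tail of the block.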
      
      Change since v1: add a comment about the calculation of num_reg.
      
      gcc/ChangeLog:
      
      	PR target/109465
      	* config/loongarch/loongarch-protos.h
      	(loongarch_expand_block_move): Add a parameter as alignment RTX.
      	* config/loongarch/loongarch.h:
      	(LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER): Remove.
      	(LARCH_MAX_MOVE_BYTES_STRAIGHT): Remove.
      	(LARCH_MAX_MOVE_OPS_PER_LOOP_ITER): Define.
      	(LARCH_MAX_MOVE_OPS_STRAIGHT): Define.
      	(MOVE_RATIO): Use LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
      	LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
      	* config/loongarch/loongarch.cc (loongarch_expand_block_move):
      	Take the alignment from the parameter, but set it to
      	UNITS_PER_WORD if !TARGET_STRICT_ALIGN.  Limit the length of
      	straight-line implementation with LARCH_MAX_MOVE_OPS_STRAIGHT
      	instead of LARCH_MAX_MOVE_BYTES_STRAIGHT.
      	(loongarch_block_move_straight): When there are left-over bytes,
      	halve the mode size instead of falling back to byte mode at once.
      	(loongarch_block_move_loop): Limit the length of loop body with
      	LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
      	LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
      	* config/loongarch/loongarch.md (cpymemsi): Pass the alignment
      	to loongarch_expand_block_move.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/109465
      	* gcc.target/loongarch/pr109465-1.c: New test.
      	* gcc.target/loongarch/pr109465-2.c: New test.
      	* gcc.target/loongarch/pr109465-3.c: New test.
      6d7e0bcf
    • Xi Ruoyao's avatar
      LoongArch: Improve GAR store for va_list · 81c65014
      Xi Ruoyao authored
      LoongArch backend used to save all GARs for a function with variable
      arguments.  But sometimes a function only accepts variable arguments for
      a purpose like C++ function overloading.  For example, POSIX defines
      open() as:
      
          int open(const char *path, int oflag, ...);
      
      But only two forms are actually used:
      
          int open(const char *pathname, int flags);
          int open(const char *pathname, int flags, mode_t mode);
      
      So it's obviously a waste to save all 8 GARs in open().  We can use the
      cfun->va_list_gpr_size field set by the stdarg pass to save only the
      GARs that actually need saving.
      
      If the va_list escapes (for example, in fprintf() we pass it to
      vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we
      don't need a special case.
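
      A function of the kind described above might look like the
      following sketch.  my_open and MY_O_CREAT are hypothetical names
      used only for illustration.  The function reads at most one
      variable argument and never lets the va_list escape, which is
      the situation where saving every GAR is wasted work.

```c
#include <stdarg.h>

#define MY_O_CREAT 0x40   /* hypothetical flag value, illustration only */

/* Hypothetical open()-like function: it reads at most one variable
   argument, and the va_list never leaves this function.  */
int
my_open (const char *path, int flags, ...)
{
  int mode = 0;

  (void) path;
  if (flags & MY_O_CREAT)
    {
      va_list ap;
      va_start (ap, flags);
      mode = va_arg (ap, int);
      va_end (ap);
    }
  return mode;
}
```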
      
      With this patch, only one GAR ($a2/$r6) is saved in open().  Ideally
      even this stack store should be omitted too, but doing so is not trivial
      and AFAIK there are no compilers (for any target) performing the "ideal"
      optimization here, see https://godbolt.org/z/n1YqWq9c9.
      
      Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
      (GCC 14 or now)?
      
      gcc/ChangeLog:
      
      	* config/loongarch/loongarch.cc
      	(loongarch_setup_incoming_varargs): Don't save more GARs than
      	cfun->va_list_gpr_size / UNITS_PER_WORD.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/loongarch/va_arg.c: New test.
      81c65014
    • Richard Biener's avatar
      Avoid unnecessary epilogues from tree_unroll_loop · 01e79e21
      Richard Biener authored
      The following fixes the condition determining whether we need an
      epilogue.
      
      	* tree-ssa-loop-manip.cc (determine_exit_conditions): Fix
      	no epilogue condition.
      01e79e21
    • Richard Biener's avatar
      Simplify gimple_assign_load · 2c800ed8
      Richard Biener authored
      The following simplifies and outlines gimple_assign_load.  In
      particular it is not necessary to get at the base of the possibly
      loaded expression but just handle the case of a single handled
      component wrapping a non-memory operand.
      
      	* gimple.h (gimple_assign_load): Outline...
      	* gimple.cc (gimple_assign_load): ... here.  Avoid
      	get_base_address and instead just strip the outermost
      	handled component, treating a remaining handled component
      	as load.
      2c800ed8
    • Kyrylo Tkachov's avatar
      aarch64: Delete __builtin_aarch64_neg* builtins and their use · 9bc407c7
      Kyrylo Tkachov authored
      I don't think we need to keep the __builtin_aarch64_neg* builtins around.
      They are only used once in the vnegh_f16 intrinsic in arm_fp16.h and AFAICT
      it was added this way only for the sake of orthogonality in
      https://gcc.gnu.org/g:d7f33f07d88984cbe769047e3d07fc21067fbba9
      We already use normal "-" negation in the other vneg* intrinsics, so do so here as well.
      
      Bootstrapped and tested on aarch64-none-linux-gnu.
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-simd-builtins.def (neg): Delete builtins
      	definition.
      	* config/aarch64/arm_fp16.h (vnegh_f16): Reimplement using normal negation.
      9bc407c7
    • Jakub Jelinek's avatar
      tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011] · ade0a1ee
      Jakub Jelinek authored
      For __builtin_popcountll tree-vect-patterns.cc has
      vect_recog_popcount_pattern, which improves the vectorized code.
      Without that the vectorization is always multi-type vectorization
      in the loop (at least int and long long types) where we emit two
      .POPCOUNT calls with long long arguments and int return value and then
      widen to long long, so effectively after vectorization do the
      V?DImode -> V?DImode popcount twice, then pack the result into V?SImode
      and immediately unpack.
      
      The following patch extends that handling to __builtin_{clz,ctz,ffs}ll
      builtins as well (as long as there is an optab for them; more to come
      later).
      
      x86 can do __builtin_popcountll with -mavx512vpopcntdq, __builtin_clzll
      with -mavx512cd, ppc can do __builtin_popcountll and __builtin_clzll
      with -mpower8-vector and __builtin_ctzll with -mpower9-vector, s390
      can do __builtin_{popcount,clz,ctz}ll with -march=z13 -mzarch (i.e. VX).
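
      Loops of the kind this pattern targets might look like the
      sketch below.  The function names are illustrative and are not
      taken from the new test.  Each builtin takes a 64-bit argument
      and returns int, and the result is widened back to long long on
      the store, which is the mixed-type shape described above.

```c
#include <stddef.h>

/* Illustrative loops: a long long -> int builtin whose result is
   widened back to long long.  The builtins are undefined for a zero
   argument, so callers must not pass zero elements.  */
void
clz_loop (long long *restrict out, const unsigned long long *restrict in,
          size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = __builtin_clzll (in[i]);
}

void
ctz_loop (long long *restrict out, const unsigned long long *restrict in,
          size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = __builtin_ctzll (in[i]);
}
```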
      
      2023-04-19  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/109011
      	* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ...
      	(vect_recog_popcount_clz_ctz_ffs_pattern): ... this.  Handle also
      	CLZ, CTZ and FFS.  Remove vargs variable, use
      	gimple_build_call_internal rather than gimple_build_call_internal_vec.
      	(vect_vect_recog_func_ptrs): Adjust popcount entry.
      
      	* gcc.dg/vect/pr109011-1.c: New test.
      ade0a1ee
    • Jakub Jelinek's avatar
      dse: Use SUBREG_REG for copy_to_mode_reg in DSE replace_read for WORD_REGISTER_OPERATIONS targets [PR109040] · 76f44fbf
      Jakub Jelinek authored
      
      While we've agreed this is not the right fix for the PR109040 bug,
      the patch clearly improves generated code (at least on the testcase from the
      PR), so I'd like to propose this as an optimization heuristic
      improvement for GCC 14.
      
      2023-04-19  Jakub Jelinek  <jakub@redhat.com>
      
      	PR target/109040
      	* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
      	REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
      	a new REG rather than the SUBREG.
      76f44fbf
    • Prathamesh Kulkarni's avatar
      [aarch64] Use wzr/xzr for assigning 0 to vector element. · 2c7bf803
      Prathamesh Kulkarni authored
      gcc/ChangeLog:
      	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
      	New pattern.
      
      gcc/testsuite/ChangeLog:
      	* gcc.target/aarch64/vec-set-zero.c: New test.
      2c7bf803
    • Kyrylo Tkachov's avatar
      aarch64: PR target/108840 Simplify register shift RTX costs and eliminate shift amount masking · 136330bf
      Kyrylo Tkachov authored
      In this PR we fail to eliminate explicit &31 operations for variable shifts such as in:
      void
      bar (int x[3], int y)
      {
        x[0] <<= (y & 31);
        x[1] <<= (y & 31);
        x[2] <<= (y & 31);
      }
      
      This is rejected by RTX costs that end up giving too high a cost for:
      (set (reg:SI 96)
          (ashift:SI (reg:SI 98)
              (subreg:QI (and:SI (reg:SI 99)
                      (const_int 31 [0x1f])) 0)))
      
      There is code to handle the AND-31 case in rtx costs, but it gets confused by the subreg.
      It's easy enough to fix by looking inside the subreg when costing the expression.
      While doing that I noticed that the ASHIFT case and the other shift-like cases are almost identical
      and we should just merge them. This code will only be used for valid insns anyway, so the code after this
      patch should do the Right Thing (TM) for all such shift cases.
      
      With this patch there are no more "and wn, wn, 31" instructions left in the testcase.
      
      Bootstrapped and tested on aarch64-none-linux-gnu.
      
      	PR target/108840
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64.cc (aarch64_rtx_costs): Merge ASHIFT and
      	ROTATE, ROTATERT, LSHIFTRT, ASHIFTRT cases.  Handle subregs in op1.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/pr108840.c: New test.
      136330bf
    • Richard Biener's avatar
      rtl-optimization/109237 - quadraticness in delete_trivially_dead_insns · 675ac882
      Richard Biener authored
      The following addresses quadraticness in processing debug insns
      in delete_trivially_dead_insns and insn_live_p by using TREE_VISITED
      on the INSN_VAR_LOCATION_DECL to indicate a later debug bind
      with the same decl and no intervening real insn or debug marker.
      That gets rid of the NEXT_INSN walk in insn_live_p in favor of
      first clearing TREE_VISITED in the first loop over insns and
      keeping track of the decls on which we set the bit, since we need
      to clear them when visiting a real or debug marker insn.
      
      That improves the time spent in delete_trivially_dead_insns from
      10.6s to 2.2s for the testcase.
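
      The idea can be sketched outside of GCC as follows.  This is a
      simplified model, not the code in cse.cc: the types and
      mark_dead_debug_binds are hypothetical, and the visited flag
      stands in for TREE_VISITED on INSN_VAR_LOCATION_DECL.  Walking
      the insns from last to first, a debug bind is dead if its decl
      is already flagged, and the flags are reset at every real insn
      or debug marker, so no forward walk per insn is needed.

```c
#include <stdbool.h>
#include <stddef.h>

enum insn_kind { REAL_INSN, DEBUG_MARKER, DEBUG_BIND };

struct decl { bool visited; };   /* stands in for TREE_VISITED */
struct insn { enum insn_kind kind; struct decl *decl; bool dead; };

/* Hypothetical model: walk INSNS from last to first and mark a debug
   bind dead when a later bind of the same decl follows it with no real
   insn or debug marker in between.  WORKLIST needs room for N entries.
   Returns the number of dead binds.  */
size_t
mark_dead_debug_binds (struct insn *insns, size_t n, struct decl **worklist)
{
  size_t n_marked = 0, n_dead = 0;

  for (size_t i = n; i-- > 0;)
    {
      struct insn *insn = &insns[i];
      insn->dead = false;
      if (insn->kind == DEBUG_BIND)
        {
          if (insn->decl->visited)
            {
              /* A later bind of the same decl supersedes this one.  */
              insn->dead = true;
              n_dead++;
            }
          else
            {
              insn->decl->visited = true;
              worklist[n_marked++] = insn->decl;
            }
        }
      else
        /* Real insn or marker: binds before it are observable again.  */
        while (n_marked > 0)
          worklist[--n_marked]->visited = false;
    }

  /* Leave all flags cleared for the next caller.  */
  while (n_marked > 0)
    worklist[--n_marked]->visited = false;
  return n_dead;
}
```

      Each decl is flagged at most once between two resets, so the
      total work stays linear in the number of insns.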
      
      	PR rtl-optimization/109237
      	* cse.cc (insn_live_p): Remove NEXT_INSN walk, instead check
      	TREE_VISITED on INSN_VAR_LOCATION_DECL.
      	(delete_trivially_dead_insns): Maintain TREE_VISITED on
      	active debug bind INSN_VAR_LOCATION_DECL.
      675ac882