  1. Feb 05, 2024
    • x86-64: Find a scratch register for large model profiling · 51f8ac33
      H.J. Lu authored
      Two scratch registers, %r10 and %r11, are available at function entry for
      large model profiling.  But %r10 may be used by stack realignment, in
      which case we can't use it.  Add x86_64_select_profile_regnum to find,
      at entry for large model profiling, a caller-saved register which isn't
      live or a callee-saved register which has been saved on the stack in the
      prologue, and issue a sorry diagnostic if we can't find one.
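
The selection rule described above can be sketched in plain C. This is an illustration only, not the code in x86_64_select_profile_regnum: the register numbering, the bitmask representation, the name select_scratch_reg and the preference for caller-saved registers over callee-saved ones are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register numbering, for illustration only; these are
   not GCC's hard register numbers.  */
enum { NUM_REGS = 16, NO_REG = -1 };

/* Each mask uses bit R for register R.  Return a register usable as a
   scratch register at function entry, or NO_REG if there is none.
   A caller-saved register qualifies if it is not live at entry; a
   callee-saved register qualifies if the prologue already saved it.
   Trying caller-saved registers first is an assumption of this sketch.  */
static int
select_scratch_reg (uint32_t caller_saved, uint32_t live_at_entry,
                    uint32_t saved_in_prologue)
{
  /* Only the low NUM_REGS bits are meaningful.  */
  assert ((live_at_entry >> NUM_REGS) == 0);

  for (int r = 0; r < NUM_REGS; r++)
    {
      uint32_t bit = UINT32_C (1) << r;
      if ((caller_saved & bit) && !(live_at_entry & bit))
        return r;
    }

  for (int r = 0; r < NUM_REGS; r++)
    {
      uint32_t bit = UINT32_C (1) << r;
      if (!(caller_saved & bit) && (saved_in_prologue & bit))
        return r;
    }

  /* The caller reports the failure; the commit does so with a sorry.  */
  return NO_REG;
}
```

With registers 10 and 11 as the caller-saved candidates, marking register 10 live makes the sketch pick 11; marking both live makes it fall back to a callee-saved register that the prologue has saved, if any.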
      
      gcc/
      
      	PR target/113689
      	* config/i386/i386.cc (x86_64_select_profile_regnum): New.
      	(x86_function_profiler): Call x86_64_select_profile_regnum to
      	get a scratch register for large model profiling.
      
      gcc/testsuite/
      
      	PR target/113689
      	* gcc.target/i386/pr113689-1.c: New file.
      	* gcc.target/i386/pr113689-2.c: Likewise.
      	* gcc.target/i386/pr113689-3.c: Likewise.
  2. Jan 27, 2024
    • x86: Add no_callee_saved_registers function attribute · a96549dc
      H.J. Lu authored
      When an interrupt handler is implemented by an assembly stub which does:
      
      1. Save all registers.
      2. Call a C function.
      3. Restore all registers.
      4. Return from interrupt.
      
      it is completely unnecessary to save and restore any registers in the C
      function called by the assembly stub, even if they would normally be
      callee-saved.
      
      Add no_callee_saved_registers function attribute, which is complementary
      to no_caller_saved_registers function attribute, to mark a function which
      doesn't have any callee-saved registers.  Such a function won't save and
      restore any registers.  Classify function call-saved register handling
      type with:
      
      1. Default call-saved registers.
      2. No caller-saved registers with no_caller_saved_registers attribute.
      3. No callee-saved registers with no_callee_saved_registers attribute.
      
      Disallow sibcall if callee is a no_callee_saved_registers function
      and caller isn't a no_callee_saved_registers function.  Otherwise,
      callee-saved registers won't be preserved.
      
      After a no_callee_saved_registers function is called, all registers may
      be clobbered.  If the calling function isn't a no_callee_saved_registers
      function, we need to preserve all registers which aren't used by function
      calls.
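
A minimal sketch of how the attribute might be applied to the C function called from such an assembly stub. The names c_interrupt_body, dispatch and handled_vectors are hypothetical, and the NO_CALLEE_SAVED macro is an assumption added so that the example still compiles (without the attribute) on compilers or targets that do not provide it.

```c
#include <assert.h>

/* Treat the attribute as unavailable if __has_attribute is missing.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Hypothetical convenience macro: expands to nothing where the
   attribute is not supported.  */
#if __has_attribute (no_callee_saved_registers)
# define NO_CALLEE_SAVED __attribute__ ((no_callee_saved_registers))
#else
# define NO_CALLEE_SAVED
#endif

static int handled_vectors;

/* Body of an interrupt handler.  Its caller is assumed to have saved
   every register already, so it need not save or restore any.  */
NO_CALLEE_SAVED void
c_interrupt_body (int vector)
{
  assert (vector >= 0);
  handled_vectors += vector;
}

/* An ordinary function that calls the handler body.  Because the
   callee may clobber all registers, the compiler has to preserve
   whatever this function still needs across the call.  */
int
dispatch (int vector)
{
  c_interrupt_body (vector);
  return handled_vectors;
}
```

Here dispatch stands in for the assembly stub only to keep the example in C; in the scenario the commit describes, the caller is assembly code that saves and restores all registers itself.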
      
      gcc/
      
      	PR target/103503
      	PR target/113312
      	* config/i386/i386-expand.cc (ix86_expand_call): Replace
      	no_caller_saved_registers check with call_saved_registers check.
      	Clobber all registers that are not used by the callee with
      	no_callee_saved_registers attribute.
      	* config/i386/i386-options.cc (ix86_set_func_type): Set
      	call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
      	noreturn function.  Disallow no_callee_saved_registers with
      	interrupt or no_caller_saved_registers attributes together.
      	(ix86_set_current_function): Replace no_caller_saved_registers
      	check with call_saved_registers check.
      	(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
      	(ix86_handle_call_saved_registers_attribute): This.
      	(ix86_gnu_attributes): Add
      	ix86_handle_call_saved_registers_attribute.
      	* config/i386/i386.cc (ix86_conditional_register_usage): Replace
      	no_caller_saved_registers check with call_saved_registers check.
      	(ix86_function_ok_for_sibcall): Don't allow callee with
      	no_callee_saved_registers attribute when the calling function
      	has callee-saved registers.
      	(ix86_comp_type_attributes): Also check
      	no_callee_saved_registers.
      	(ix86_epilogue_uses): Replace no_caller_saved_registers check
      	with call_saved_registers check.
      	(ix86_hard_regno_scratch_ok): Likewise.
      	(ix86_save_reg): Replace no_caller_saved_registers check with
      	call_saved_registers check.  Don't save any registers for
      	TYPE_NO_CALLEE_SAVED_REGISTERS.  Save all registers with
      	TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
      	no_callee_saved_registers attribute is called.
      	(find_drap_reg): Replace no_caller_saved_registers check with
      	call_saved_registers check.
      	* config/i386/i386.h (call_saved_registers_type): New enum.
      	(machine_function): Replace no_caller_saved_registers with
      	call_saved_registers.
      	* doc/extend.texi: Document no_callee_saved_registers attribute.
      
      gcc/testsuite/
      
      	PR target/103503
      	PR target/113312
      	* gcc.dg/torture/no-callee-saved-run-1a.c: New file.
      	* gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
      	* gcc.target/i386/no-callee-saved-1.c: Likewise.
      	* gcc.target/i386/no-callee-saved-2.c: Likewise.
      	* gcc.target/i386/no-callee-saved-3.c: Likewise.
      	* gcc.target/i386/no-callee-saved-4.c: Likewise.
      	* gcc.target/i386/no-callee-saved-5.c: Likewise.
      	* gcc.target/i386/no-callee-saved-6.c: Likewise.
      	* gcc.target/i386/no-callee-saved-7.c: Likewise.
      	* gcc.target/i386/no-callee-saved-8.c: Likewise.
      	* gcc.target/i386/no-callee-saved-9.c: Likewise.
      	* gcc.target/i386/no-callee-saved-10.c: Likewise.
      	* gcc.target/i386/no-callee-saved-11.c: Likewise.
      	* gcc.target/i386/no-callee-saved-12.c: Likewise.
      	* gcc.target/i386/no-callee-saved-13.c: Likewise.
      	* gcc.target/i386/no-callee-saved-14.c: Likewise.
      	* gcc.target/i386/no-callee-saved-15.c: Likewise.
      	* gcc.target/i386/no-callee-saved-16.c: Likewise.
      	* gcc.target/i386/no-callee-saved-17.c: Likewise.
      	* gcc.target/i386/no-callee-saved-18.c: Likewise.
  3. Jan 18, 2024
    • i386: Add -masm=intel profiling support [PR113122] · d4a2d91b
      Jakub Jelinek authored
      x86_function_profiler emits assembly directly into file and only emits
      AT&T syntax.  The following patch adjusts it to emit MASM syntax
      if -masm=intel.
      As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects.
      
      I've tested using
      for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do
      ./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1;
      ./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2;
      objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2;
      diff -up /tmp/1 /tmp/2; done
      that the emitted sequences are identical after assembly.
      
      2024-01-18  Jakub Jelinek  <jakub@redhat.com>
      
      	PR target/113122
      	* config/i386/i386.cc (x86_function_profiler): Add -masm=intel
      	support.  Add missing space after , in emitted assembly in some
      	cases.  Formatting fixes.
      
      	* gcc.target/i386/pr113122-1.c: New test.
      	* gcc.target/i386/pr113122-2.c: New test.
      	* gcc.target/i386/pr113122-3.c: New test.
      	* gcc.target/i386/pr113122-4.c: New test.
  4. Jan 05, 2024
    • asan: Align .LASANPC on function boundary · e66dc37b
      Ilya Leoshkevich authored
      GCC can emit code between the function label and the .LASANPC label,
      making the latter unaligned.  Some architectures cannot load unaligned
      labels directly and require literal pool entries, which is inefficient.
      
      Move the invocation of asan_function_start to
      ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
      emitted.  This allows setting the .LASANPC label alignment to the
      respective function alignment.
      
      Link: https://inbox.sourceware.org/gcc-patches/20240102194511.3171559-3-iii@linux.ibm.com/
      
      
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      
      gcc/ChangeLog:
      
      	* asan.cc (asan_function_start): Drop switch_to_section ().
      	(asan_emit_stack_protection): Set .LASANPC alignment.
      	* config/i386/i386.cc: Use assemble_function_label_raw ()
      	instead of ASM_OUTPUT_LABEL ().
      	* config/s390/s390.cc (s390_asm_output_function_label):
      	Likewise.
      	* defaults.h (ASM_OUTPUT_FUNCTION_LABEL): Likewise.
      	* final.cc (final_start_function_1): Drop
      	asan_function_start ().
      	* output.h (assemble_function_label_raw): New function.
      	* varasm.cc (assemble_function_label_raw): Likewise.
  5. Jan 03, 2024
  6. Dec 28, 2023
    • i386: Cleanup ix86_expand_{unary|binary}_operator issues · d74cceb6
      Uros Bizjak authored
      Move ix86_expand_unary_operator from i386.cc to i386-expand.cc, re-arrange
      prototypes and do some cosmetic changes with the usage of TARGET_APX_NDD.
      
      No functional changes.
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_unary_operator_ok): Move from here...
      	* config/i386/i386-expand.cc (ix86_unary_operator_ok): ... to here.
      	* config/i386/i386-protos.h: Re-arrange ix86_{unary|binary}_operator_ok
      	and ix86_expand_{unary|binary}_operator prototypes.
      	* config/i386/i386.md: Cosmetic changes with the usage of
      	TARGET_APX_NDD in ix86_expand_{unary|binary}_operator
      	and ix86_{unary|binary}_operator_ok function calls.
  7. Dec 20, 2023
    • i386: Allow 64 bit mask register for -mno-evex512 · d3545378
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/avx512bwintrin.h: Allow 64 bit mask intrin usage
      	for -mno-evex512.
      	* config/i386/i386-builtin.def: Remove OPTION_MASK_ISA2_EVEX512
      	for 64 bit mask builtins.
      	* config/i386/i386.cc (ix86_hard_regno_mode_ok): Allow 64 bit
      	mask register for -mno-evex512.
      	* config/i386/i386.md (SWI1248_AVX512BWDQ_64): Remove
      	TARGET_EVEX512.
      	(*zero_extendsidi2): Change isa attribute to avx512bw.
      	(kmov_isa): Ditto.
      	(*anddi_1): Ditto.
      	(*andn<mode>_1): Remove TARGET_EVEX512.
      	(*one_cmplsi2_1_zext): Change isa attribute to avx512bw.
      	(*ashl<mode>3_1): Ditto.
      	(*lshr<mode>3_1): Ditto.
      	* config/i386/sse.md (SWI1248_AVX512BWDQ): Remove TARGET_EVEX512.
      	(SWI1248_AVX512BW): Ditto.
      	(SWI1248_AVX512BWDQ2): Ditto.
      	(*knotsi_1_zext): Ditto.
      	(kunpckdi): Ditto.
      	(SWI24_MASK): Removed.
      	(vec_pack_trunc_<mode>): Change iterator from SWI24_MASK to SWI24.
      	(vec_unpacks_lo_di): Remove TARGET_EVEX512.
      	(SWI48x_MASK): Removed.
      	(vec_unpacks_hi_<mode>): Change iterator from SWI48x_MASK to SWI48x.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx10_1-6.c: Remove check for errors.
      	* gcc.target/i386/noevex512-2.c: Ditto.
  8. Dec 15, 2023
    • bitint: Introduce abi_limb_mode · a98a3932
      Jakub Jelinek authored
      Given what I saw in the aarch64/arm psABIs for BITINT_TYPE, as I said
      earlier I'm afraid we need to differentiate between the limb mode/precision
      specified in the psABIs (what is used to decide how it is actually passed,
      aligned or what size it has) vs. what limb mode/precision should be used
      during bitint lowering and in the libgcc bitint APIs.
      While in the x86_64 psABI a limb is 64-bit, which is perfect for both,
      that is a wordsize which we can perform operations natively in,
      e.g. aarch64 wants 128-bit limbs for alignment/sizing purposes, but
      on the bitint lowering side I believe it would result in terribly bad code
      and on the libgcc side wouldn't work at all (because it relies there on
      longlong.h support).
      
      So, the following patch makes it possible for aarch64 to use TImode
      as abi_limb_mode for _BitInt(129) and larger, while using DImode as
      limb_mode.
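
The difference between the two limb sizes comes down to rounding arithmetic, which the following sketch works through. The helper names limbs_for and abi_size_bytes are hypothetical, and the sketch assumes that the storage size is simply the precision rounded up to whole ABI limbs.

```c
#include <assert.h>

/* Number of LIMB_BITS-wide limbs needed to hold PRECISION value bits.  */
static unsigned
limbs_for (unsigned precision, unsigned limb_bits)
{
  assert (limb_bits > 0);
  return (precision + limb_bits - 1) / limb_bits;
}

/* Storage size in bytes when the type is sized in whole ABI limbs
   (an assumption of this sketch).  */
static unsigned
abi_size_bytes (unsigned precision, unsigned abi_limb_bits)
{
  return limbs_for (precision, abi_limb_bits) * (abi_limb_bits / 8);
}
```

Under these assumptions, a 129-bit value with a 128-bit ABI limb occupies abi_size_bytes (129, 128) = 32 bytes, while its value bits span limbs_for (129, 64) = 3 limbs of 64 bits on the lowering side. With a 64-bit limb for both purposes, as on x86_64, the same type would occupy 24 bytes.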
      
      2023-12-15  Jakub Jelinek  <jakub@redhat.com>
      
      	* target.h (struct bitint_info): Add abi_limb_mode member, adjust
      	comment.
      	* target.def (bitint_type_info): Mention abi_limb_mode instead of
      	limb_mode.
      	* varasm.cc (output_constant): Use abi_limb_mode rather than
      	limb_mode.
      	* stor-layout.cc (finish_bitfield_representative): Likewise.  Assert
      	that if precision is smaller or equal to abi_limb_mode precision or
      	if info.big_endian is different from WORDS_BIG_ENDIAN, info.limb_mode
      	must be the same as info.abi_limb_mode.
      	(layout_type): Use abi_limb_mode rather than limb_mode.
      	* gimple-fold.cc (clear_padding_bitint_needs_padding_p): Likewise.
      	(clear_padding_type): Likewise.
      	* config/i386/i386.cc (ix86_bitint_type_info): Also set
      	info->abi_limb_mode.
      	* doc/tm.texi: Regenerated.
  9. Dec 13, 2023
    • i386: Fix ICE on __builtin_ia32_pabsd128 without lhs [PR112962] · 02c30fda
      Jakub Jelinek authored
      The following patch fixes ICE on the testcase in similar way to how
      other folded builtins are handled in ix86_gimple_fold_builtin when
      they don't have a lhs; these builtins are const or pure, so normally
      DCE would remove them later, but with -O0 that isn't guaranteed to
      happen, and during expansion if they are marked TREE_SIDE_EFFECTS
      it might still be attempted to be expanded.
      This removes them right away during the folding.
      
      Initially I wanted to also change all gsi_replace last args in that function
      to true, but Andrew pointed to PR107209, so I've kept them as is.
      
      2023-12-13  Jakub Jelinek  <jakub@redhat.com>
      
      	PR target/112962
      	* config/i386/i386.cc (ix86_gimple_fold_builtin): For shifts
      	and abs without lhs replace with nop.
      
      	* gcc.target/i386/pr112962.c: New test.
  10. Dec 12, 2023
    • Don't assume it's AVX_U128_CLEAN after call_insn whose abi.mode_clobber(V4DImode) doesn't contain all SSE_REGS · fc62716f
      liuhongt authored
      
      If the function doesn't clobber any SSE registers or only clobbers the
      128-bit part, then vzeroupper isn't issued before the function exit,
      so the status after the function is not CLEAN but ANY.

      Also, for sibling_call it's safe to issue a vzeroupper.  A vzeroupper
      could also be missing, since there's no mode_exit for
      sibling_call_p.
      
      gcc/ChangeLog:
      
      	PR target/112891
      	* config/i386/i386.cc (ix86_avx_u128_mode_after): Return
      	AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to
      	align with ix86_avx_u128_mode_needed.
      	(ix86_avx_u128_mode_needed): Return AVX_U128_CLEAN for
      	sibling_call.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr112891.c: New test.
      	* gcc.target/i386/pr112891-2.c: New test.
  11. Dec 07, 2023
    • [APX NDD] Support APX NDD for neg insn · 042519b6
      Kong Lingling authored
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
      	parameter and adjust for NDD.
      	* config/i386/i386-protos.h: Add use_ndd parameter for
      	ix86_unary_operator_ok and ix86_expand_unary_operator.
      	* config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter
      	and adjust for NDD.
      	* config/i386/i386.md (neg<mode>2): Add new constraint for NDD and
      	adjust output template.
      	(*neg<mode>_1): Likewise.
      	(*neg<dwi>2_doubleword): Likewise and adopt '&' to NDD dest.
      	(*neg<mode>_2): Likewise.
      	(*neg<mode>_ccc_1): Likewise.
      	(*neg<mode>_ccc_2): Likewise.
      	(*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
      	to accept memory input for NDD alternatives.
      	(*negsi_2_zext): Likewise.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-ndd.c: Add neg test.
    • [APX NDD] Disable seg_prefixed memory usage for NDD add · d564198f
      Hongyu Wang authored
      NDD uses the EVEX prefix, so when a segment prefix is also applied, the instruction
      could exceed its 15-byte limit, especially when adding immediates.  This could happen
      when the "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will
      add the offset to the segment register, which will be encoded using a segment prefix.
      Disable the use of those *POFF constants in NDD add alternatives with a new constraint.
      
      gcc/ChangeLog:
      
      	* config/i386/constraints.md (je): New constraint.
      	* config/i386/i386-protos.h (x86_poff_operand_p): New function to
      	check any *POFF constant in operand.
      	* config/i386/i386.cc (x86_poff_operand_p): New prototype.
      	* config/i386/i386.md (*add<mode>_1): Split out je alternative for add.
  12. Dec 05, 2023
    • Allow targets to add USEs to asms · 414d795d
      Richard Sandiford authored
      Arm's SME has an array called ZA that for inline asm purposes
      is effectively a form of special-purpose memory.  It doesn't
      have an associated storage type and so can't be passed and
      returned in normal C/C++ objects.
      
      We'd therefore like "za" in a clobber list to mean that an inline
      asm can read from and write to ZA.  (Just reading or writing
      individually is unlikely to be useful, but we could add syntax
      for that too if necessary.)
      
      There is currently a TARGET_MD_ASM_ADJUST target hook that allows
      targets to add clobbers to an asm instruction.  This patch
      extends that to allow targets to add USEs as well.
      
      gcc/
      	* target.def (md_asm_adjust): Add a uses parameter.
      	* doc/tm.texi: Regenerate.
      	* cfgexpand.cc (expand_asm_loc): Update call to md_asm_adjust.
      	Handle any USEs created by the target.
      	(expand_asm_stmt): Likewise.
      	* recog.cc (asm_noperands): Handle asms with USEs.
      	(decode_asm_operands): Likewise.
      	* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add uses
      	parameter.
      	* config/arm/aarch-common.cc (arm_md_asm_adjust): Likewise.
      	* config/arm/arm.cc (thumb1_md_asm_adjust): Likewise.
      	* config/avr/avr.cc (avr_md_asm_adjust): Likewise.
      	* config/cris/cris.cc (cris_md_asm_adjust): Likewise.
      	* config/i386/i386.cc (ix86_md_asm_adjust): Likewise.
      	* config/mn10300/mn10300.cc (mn10300_md_asm_adjust): Likewise.
      	* config/nds32/nds32.cc (nds32_md_asm_adjust): Likewise.
      	* config/pdp11/pdp11.cc (pdp11_md_asm_adjust): Likewise.
      	* config/rs6000/rs6000.cc (rs6000_md_asm_adjust): Likewise.
      	* config/s390/s390.cc (s390_md_asm_adjust): Likewise.
      	* config/vax/vax.cc (vax_md_asm_adjust): Likewise.
      	* config/visium/visium.cc (visium_md_asm_adjust): Likewise.
    • Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory · b1cb2d99
      liuhongt authored
      
      For vec_construct, the components must be live at the same time if
      they're not loaded from memory; when the number of those components
      exceeds the available registers, spills happen.  Try to account for
      that with a rough estimate.
      ??? Ideally, we should have an overall estimate of register pressure
      if we know the live range of all variables.
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
      	Count sse_reg/gpr_regs for components not loaded from memory.
      	(ix86_vector_costs::ix86_vector_costs): New constructor.
      	(ix86_vector_costs::m_num_gpr_needed[3]): New private member.
      	(ix86_vector_costs::m_num_sse_needed[3]): Ditto.
      	(ix86_vector_costs::finish_cost): Estimate overall register
      	pressure cost.
      	(ix86_vector_costs::ix86_vect_estimate_reg_pressure): New
      	function.
  13. Dec 04, 2023
    • i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837] · 4586d7d0
      Jakub Jelinek authored
      The following testcase ICEs with RTL checking, because the function tests
      whether XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking if
      SET_SRC (set) is actually an UNSPEC, so any time we see any other insn
      with a PARALLEL and a SET in it which is not an UNSPEC, we ICE during RTL
      checking or, without it, access some other union member as if it were an rt_int.
      The rest is just small cleanup.
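
The underlying pattern is the usual one for tagged unions: check the tag before reading the member that belongs to it. The sketch below shows that pattern in plain C with a made-up node type; struct node, the enumerators and is_unspec_of_kind are hypothetical and are not GCC's RTL data structures or accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tagged node, standing in for an RTL expression.  */
enum node_code { NODE_SYMBOL, NODE_UNSPEC };
enum { KIND_SET_GOT = 1, KIND_OTHER = 2 };

struct node
{
  enum node_code code;
  union
  {
    const char *symbol_name;  /* valid only when code == NODE_SYMBOL */
    int unspec_kind;          /* valid only when code == NODE_UNSPEC */
  } u;
};

/* Return true if N is an unspec of the given KIND.  The code is tested
   first, so unspec_kind is never read from a node that stores a
   different union member.  */
static bool
is_unspec_of_kind (const struct node *n, int kind)
{
  assert (n != NULL);
  return n->code == NODE_UNSPEC && n->u.unspec_kind == kind;
}
```

Dropping the n->code test would reproduce the kind of bug the commit fixes: for a symbol node the comparison would read the bytes of a pointer as if they were an integer.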
      
      2023-12-04  Jakub Jelinek  <jakub@redhat.com>
      
      	PR target/112837
      	* config/i386/i386.cc (ix86_elim_entry_set_got): Before checking
      	for UNSPEC_SET_GOT check that SET_SRC is UNSPEC.  Use SET_SRC and
      	SET_DEST macros instead of XEXP, rename vec variable to set.
      
      	* gcc.dg/pr112837.c: New test.
  14. Dec 02, 2023
    • Allow target attributes in non-gnu namespaces · 7fa24687
      Richard Sandiford authored
      Currently there are four static sources of attributes:
      
      - LANG_HOOKS_ATTRIBUTE_TABLE
      - LANG_HOOKS_COMMON_ATTRIBUTE_TABLE
      - LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE
      - TARGET_ATTRIBUTE_TABLE
      
      All of the attributes in these tables go in the "gnu" namespace.
      This means that they can use the traditional GNU __attribute__((...))
      syntax and the standard [[gnu::...]] syntax.
      
      Standard attributes are registered dynamically with a null namespace.
      There are no supported attributes in other namespaces (clang, vendor
      namespaces, etc.).
      
      This patch tries to generalise things by making the namespace
      part of the attribute specification.
      
      It's usual for multiple attributes to be defined in the same namespace,
      so rather than adding the namespace to each individual definition,
      it seemed better to group attributes in the same namespace together.
      This would also allow us to reuse the same table for clang attributes
      that are written with the GNU syntax, or other similar situations
      where the attribute can be accessed via multiple "spellings".
      
      The patch therefore adds a scoped_attribute_specs that contains
      a namespace and a list of attributes in that namespace.
      
      It's still possible to have multiple scoped_attribute_specs
      for the same namespace.  E.g. it makes sense to keep the
      C++-specific, C/C++-common, and format-related attributes in
      separate tables, even though they're all GNU attributes.
      
      Current lists of attributes are terminated by a null name.
      Rather than keep that for the new structure, it seemed neater
      to use an array_slice.  This also makes the tables slightly more
      compact.
      
      In general, a target might want to support attributes in multiple
      namespaces.  Rather than have a separate hook for each possibility
      (like the three langhooks above), it seemed better to make
      TARGET_ATTRIBUTE_TABLE a table of tables.  Specifically, it's
      an array_slice of scoped_attribute_specs.
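
The layout described above can be approximated in plain C as follows. This is only a sketch of the idea: GCC's real scoped_attribute_specs and array_slice are C++ types with more fields, and the names attr_spec, scoped_specs, all_tables and find_attr are hypothetical. A pointer plus an element count stands in for array_slice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified attribute description.  */
struct attr_spec
{
  const char *name;
};

/* A namespace plus the attributes defined in it.  The explicit count
   replaces the null-name terminator of the older tables.  */
struct scoped_specs
{
  const char *ns;                 /* NULL for standard attributes */
  const struct attr_spec *attrs;
  size_t count;
};

static const struct attr_spec gnu_attrs[] = { { "noinline" }, { "aligned" } };
static const struct attr_spec std_attrs[] = { { "deprecated" } };

static const struct scoped_specs gnu_table
  = { "gnu", gnu_attrs, sizeof gnu_attrs / sizeof gnu_attrs[0] };
static const struct scoped_specs std_table
  = { NULL, std_attrs, sizeof std_attrs / sizeof std_attrs[0] };

/* A table of tables: one entry per scoped table.  */
static const struct scoped_specs *const all_tables[]
  = { &gnu_table, &std_table };

/* Two namespaces match if both are NULL or both name the same string.  */
static int
same_ns (const char *a, const char *b)
{
  if (a == NULL || b == NULL)
    return a == b;
  return strcmp (a, b) == 0;
}

/* Look NAME up in namespace NS; return NULL if it is not registered.  */
static const struct attr_spec *
find_attr (const char *ns, const char *name)
{
  assert (name != NULL);
  for (size_t t = 0; t < sizeof all_tables / sizeof all_tables[0]; t++)
    {
      const struct scoped_specs *table = all_tables[t];
      if (!same_ns (table->ns, ns))
        continue;
      for (size_t i = 0; i < table->count; i++)
        if (strcmp (table->attrs[i].name, name) == 0)
          return &table->attrs[i];
    }
  return NULL;
}
```

Because every table is checked against the requested namespace, several tables can share a namespace, which matches the commit's note that multiple scoped_attribute_specs for the same namespace remain possible.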
      
      We can do the same thing for langhooks, which allows the three hooks
      above to be merged into a single LANG_HOOKS_ATTRIBUTE_TABLE.
      It also allows the standard attributes to be registered statically
      and checked by the usual attribs.cc checks.
      
      The patch adds a TARGET_GNU_ATTRIBUTES helper for the common case
      in which a target wants a single table of gnu attributes.  It can
      only be used if the table is free of preprocessor directives.
      
      There are probably other things we need to do to make vendor namespaces
      work smoothly.  E.g. in principle it would be good to make exclusion
      sets namespace-aware.  But to some extent we have that with standard
      vs. gnu attributes too.  This patch is just supposed to be a first step.
      
      gcc/
      	* attribs.h (scoped_attribute_specs): New structure.
      	(register_scoped_attributes): Take a reference to a
      	scoped_attribute_specs instead of separate namespace and array
      	parameters.
      	* plugin.h (register_scoped_attributes): Likewise.
      	* attribs.cc (register_scoped_attributes): Likewise.
      	(attribute_tables): Change into an array of scoped_attribute_specs
      	pointers.  Reduce to 1 element for frontends and 1 element for targets.
      	(empty_attribute_table): Delete.
      	(check_attribute_tables): Update for changes to attribute_tables.
      	Use a hash_set to identify duplicates.
      	(handle_ignored_attributes_option): Update for above changes.
      	(init_attributes): Likewise.
      	(excl_pair): Delete.
      	(test_attribute_exclusions): Update for above changes.  Don't
      	enforce symmetry for standard attributes in the top-level namespace.
      	* langhooks-def.h (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
      	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Likewise.
      	(LANG_HOOKS_INITIALIZER): Update accordingly.
      	(LANG_HOOKS_ATTRIBUTE_TABLE): Define to an empty constructor.
      	* langhooks.h (lang_hooks::common_attribute_table): Delete.
      	(lang_hooks::format_attribute_table): Likewise.
      	(lang_hooks::attribute_table): Redefine to an array of
      	scoped_attribute_specs pointers.
      	* target-def.h (TARGET_GNU_ATTRIBUTES): New macro.
      	* target.def (attribute_spec): Redefine to return an array of
      	scoped_attribute_specs pointers.
      	* tree-inline.cc (function_attribute_inlinable_p): Update accordingly.
      	* doc/tm.texi: Regenerate.
      	* config/aarch64/aarch64.cc (aarch64_attribute_table): Define using
      	TARGET_GNU_ATTRIBUTES.
      	* config/alpha/alpha.cc (vms_attribute_table): Likewise.
      	* config/avr/avr.cc (avr_attribute_table): Likewise.
      	* config/bfin/bfin.cc (bfin_attribute_table): Likewise.
      	* config/bpf/bpf.cc (bpf_attribute_table): Likewise.
      	* config/csky/csky.cc (csky_attribute_table): Likewise.
      	* config/epiphany/epiphany.cc (epiphany_attribute_table): Likewise.
      	* config/gcn/gcn.cc (gcn_attribute_table): Likewise.
      	* config/h8300/h8300.cc (h8300_attribute_table): Likewise.
      	* config/loongarch/loongarch.cc (loongarch_attribute_table): Likewise.
      	* config/m32c/m32c.cc (m32c_attribute_table): Likewise.
      	* config/m32r/m32r.cc (m32r_attribute_table): Likewise.
      	* config/m68k/m68k.cc (m68k_attribute_table): Likewise.
      	* config/mcore/mcore.cc (mcore_attribute_table): Likewise.
      	* config/microblaze/microblaze.cc (microblaze_attribute_table):
      	Likewise.
      	* config/mips/mips.cc (mips_attribute_table): Likewise.
      	* config/msp430/msp430.cc (msp430_attribute_table): Likewise.
      	* config/nds32/nds32.cc (nds32_attribute_table): Likewise.
      	* config/nvptx/nvptx.cc (nvptx_attribute_table): Likewise.
      	* config/riscv/riscv.cc (riscv_attribute_table): Likewise.
      	* config/rl78/rl78.cc (rl78_attribute_table): Likewise.
      	* config/rx/rx.cc (rx_attribute_table): Likewise.
      	* config/s390/s390.cc (s390_attribute_table): Likewise.
      	* config/sh/sh.cc (sh_attribute_table): Likewise.
      	* config/sparc/sparc.cc (sparc_attribute_table): Likewise.
      	* config/stormy16/stormy16.cc (xstormy16_attribute_table): Likewise.
      	* config/v850/v850.cc (v850_attribute_table): Likewise.
      	* config/visium/visium.cc (visium_attribute_table): Likewise.
      	* config/arc/arc.cc (arc_attribute_table): Likewise.  Move further
      	down file.
      	* config/arm/arm.cc (arm_attribute_table): Update for above changes,
      	using...
      	(arm_gnu_attributes, arm_gnu_attribute_table): ...these new globals.
      	* config/i386/i386-options.h (ix86_attribute_table): Delete.
      	(ix86_gnu_attribute_table): Declare.
      	* config/i386/i386-options.cc (ix86_attribute_table): Replace with...
      	(ix86_gnu_attributes, ix86_gnu_attribute_table): ...these two globals.
      	* config/i386/i386.cc (ix86_attribute_table): Define as an array of
      	scoped_attribute_specs pointers.
      	* config/ia64/ia64.cc (ia64_attribute_table): Update for above changes,
      	using...
      	(ia64_gnu_attributes, ia64_gnu_attribute_table): ...these new globals.
      	* config/rs6000/rs6000.cc (rs6000_attribute_table): Update for above
      	changes, using...
      	(rs6000_gnu_attributes, rs6000_gnu_attribute_table): ...these new
      	globals.
      
      gcc/ada/
      	* gcc-interface/gigi.h (gnat_internal_attribute_table): Change
      	type to scoped_attribute_specs.
      	* gcc-interface/utils.cc (gnat_internal_attribute_table): Likewise,
      	using...
      	(gnat_internal_attributes): ...this as the underlying array.
      	* gcc-interface/misc.cc (gnat_attribute_table): New global.
      	(LANG_HOOKS_ATTRIBUTE_TABLE): Use it.
      
      gcc/c-family/
      	* c-common.h (c_common_attribute_table): Replace with...
      	(c_common_gnu_attribute_table): ...this.
      	(c_common_format_attribute_table): Change type to
      	scoped_attribute_specs.
      	* c-attribs.cc (c_common_attribute_table): Replace with...
      	(c_common_gnu_attributes, c_common_gnu_attribute_table): ...these
      	new globals.
      	(c_common_format_attribute_table): Change type to
      	scoped_attribute_specs, using...
      	(c_common_format_attributes): ...this as the underlying array.
      
      gcc/c/
      	* c-tree.h (std_attribute_table): Declare.
      	* c-decl.cc (std_attribute_table): Change type to
      	scoped_attribute_specs, using...
      	(std_attributes): ...this as the underlying array.
      	(c_init_decl_processing): Remove call to register_scoped_attributes.
      	* c-objc-common.h (c_objc_attribute_table): New global.
      	(LANG_HOOKS_ATTRIBUTE_TABLE): Use it.
      	(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
      	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete.
      
      gcc/cp/
      	* cp-tree.h (cxx_attribute_table): Delete.
      	(cxx_gnu_attribute_table, std_attribute_table): Declare.
      	* cp-objcp-common.h (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
      	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete.
      	(cp_objcp_attribute_table): New table.
      	(LANG_HOOKS_ATTRIBUTE_TABLE): Redefine.
      	* tree.cc (cxx_attribute_table): Replace with...
      	(cxx_gnu_attributes, cxx_gnu_attribute_table): ...these globals.
      	(std_attribute_table): Change type to scoped_attribute_specs, using...
      	(std_attributes): ...this as the underlying array.
      	(init_tree): Remove call to register_scoped_attributes.
      
      gcc/d/
      	* d-tree.h (d_langhook_attribute_table): Replace with...
      	(d_langhook_gnu_attribute_table): ...this.
      	(d_langhook_common_attribute_table): Change type to
      	scoped_attribute_specs.
      	* d-attribs.cc (d_langhook_common_attribute_table): Change type to
      	scoped_attribute_specs, using...
      	(d_langhook_common_attributes): ...this as the underlying array.
      	(d_langhook_attribute_table): Replace with...
      	(d_langhook_gnu_attributes, d_langhook_gnu_attribute_table): ...these
      	new globals.
      	(uda_attribute_p): Update accordingly, and update for new
      	targetm.attribute_table type.
      	* d-lang.cc (d_langhook_attribute_table): New global.
      	(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
      
      gcc/fortran/
      	* f95-lang.cc: Include attribs.h.
      	(gfc_attribute_table): Change to an array of scoped_attribute_specs
      	pointers, using...
      	(gfc_gnu_attributes, gfc_gnu_attribute_table): ...these new globals.
      
      gcc/jit/
      	* dummy-frontend.cc (jit_format_attribute_table): Change type to
      	scoped_attribute_specs, using...
      	(jit_format_attributes): ...this as the underlying array.
      	(jit_attribute_table): Change to an array of scoped_attribute_specs
      	pointers, using...
      	(jit_gnu_attributes, jit_gnu_attribute_table): ...these new globals
      	for the original array.  Include the format attributes.
      	(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
      	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete.
      	(LANG_HOOKS_ATTRIBUTE_TABLE): Define.
      
      gcc/lto/
      	* lto-lang.cc (lto_format_attribute_table): Change type to
      	scoped_attribute_specs, using...
      	(lto_format_attributes): ...this as the underlying array.
      	(lto_attribute_table): Change to an array of scoped_attribute_specs
      	pointers, using...
      	(lto_gnu_attributes, lto_gnu_attribute_table): ...these new globals
      	for the original array.  Include the format attributes.
      	(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
      	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete.
      	(LANG_HOOKS_ATTRIBUTE_TABLE): Define.
      7fa24687
  15. Nov 24, 2023
    • Uros Bizjak's avatar
      i386: Fix ICE with -fsplit-stack -mcmodel=large [PR112686] · 404ea4c1
      Uros Bizjak authored
      For -mcmodel=large, we have to load the function address into a register.
      
      	PR target/112686
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_expand_split_stack_prologue): Load
      	function address to a register for ix86_cmodel == CM_LARGE.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr112686.c: New test.
      404ea4c1
  16. Nov 23, 2023
    • Uros Bizjak's avatar
      i386: Fix ICE with -mforce-indirect-call and -fsplit-stack [PR89316] · 2f3f8952
      Uros Bizjak authored
      With the above two options, use a temporary register regno (as returned
      from split_stack_prologue_scratch_regno) as an indirect call scratch
      register to hold __morestack function address.  On 64-bit targets, two
      temporary registers are always available, so load the function address in
      %r11 and call __morestack_large_model with its one-argument-register value
      in %r10.  On 32-bit targets, bail out with a "sorry" if the temporary
      register can not be obtained.
      
      On 32-bit targets, also emit PIC sequence that re-uses the obtained indirect
      call scratch register before moving the function address to it.  We can
      not set up %ebx PIC register in this case, but __morestack is prepared
      for this situation and sets it up by itself.
      
      	PR target/89316
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_expand_split_stack_prologue): Obtain
      	scratch regno when flag_force_indirect_call is set.  On 64-bit
      	targets, call __morestack_large_model when flag_force_indirect_call
      	is set and on 32-bit targets with -fpic, manually expand PIC sequence
      	to call __morestack.  Move the function address to an indirect
      	call scratch register.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.target/i386/pr89316.C: New test.
      	* gcc.target/i386/pr112605-1.c: New test.
      	* gcc.target/i386/pr112605-2.c: New test.
      	* gcc.target/i386/pr112605.c: New test.
      2f3f8952
  17. Nov 21, 2023
    • Hongyu Wang's avatar
      [APX PPX] Support Intel APX PPX · 7ad308bd
      Hongyu Wang authored
      PPX stands for Push-Pop Acceleration. PUSH/PUSH2 and its corresponding POP
      can be marked with a 1-bit hint to indicate that the POP reads the
      value written by the PUSH from the stack. The processor tracks these marked
      instructions internally and fast-forwards register data between
      matching PUSH and POP instructions, without going through memory or
      through the training loop of the Fast Store Forwarding Predictor (FSFP).
      This feature can also be applied to PUSH2/POP2.
      
      For GCC, we emit an explicit 'p' (paired) suffix to indicate that the
      push/pop pair is marked with the PPX hint. To separate these from the
      original push/pop, we add an UNSPEC on top of those PUSH/POP patterns.
      
      In the first implementation we only emit them under prologue/epilogue
      when saving/restoring callee-saved registers to make sure push/pop are
      paired. So an extra flag was added to check if PPX insns can be emitted
      for those register save/restore interfaces.
      
      The PPX hint is purely a performance hint. If the 'p' suffix is not
      emitted for paired push/pop, the PPX optimization will be disabled,
      while program semantics will not be affected at all.
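
A minimal sketch of the suffix choice described above, assuming a toy helper that is not part of GCC. The name toy_stack_mnemonic and its parameters are hypothetical; only the 'p' suffix and the push/pop/push2/pop2 forms come from the commit message. It only illustrates that the hint changes the mnemonic and nothing else.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code: pick the assembler mnemonic for a
   prologue/epilogue stack operation.  IS_PUSH selects push vs. pop,
   IS_PAIR selects the two-register PUSH2/POP2 form, and PPX_P says
   whether the PPX hint (the 'p' suffix) is requested.  */
static const char *
toy_stack_mnemonic (bool is_push, bool is_pair, bool ppx_p)
{
  if (is_push)
    {
      if (is_pair)
        return ppx_p ? "push2p" : "push2";
      return ppx_p ? "pushp" : "push";
    }
  if (is_pair)
    return ppx_p ? "pop2p" : "pop2";
  return ppx_p ? "popp" : "pop";
}
```

Because the hint is only a suffix, dropping it (ppx_p false) yields the ordinary instruction, which matches the statement that program semantics do not depend on it.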
      
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.h (gen_push): Add default bool
      	parameter.
      	(gen_pop): Likewise.
      	* config/i386/i386-opts.h (enum apx_features): Add apx_ppx, add
      	it to apx_all.
      	* config/i386/i386.cc (ix86_emit_restore_reg_using_pop): Add
      	ppx_p parameter for function declaration.
      	(gen_push2): Add ppx_p parameter, emit push2p if ppx_p is true.
      	(gen_push): Likewise.
      	(ix86_emit_restore_reg_using_pop2): Likewise for pop2p.
      	(ix86_emit_save_regs): Emit pushp/push2p under TARGET_APX_PPX.
      	(ix86_emit_restore_reg_using_pop): Add ppx_p, emit popp insn
      	and adjust cfi when ppx_p is true.
      	(ix86_emit_restore_reg_using_pop2): Add ppx_p and pass it to its
      	callee.
      	(ix86_emit_restore_regs_using_pop2): Likewise.
      	(ix86_expand_epilogue): Pass TARGET_APX_PPX to
      	ix86_emit_restore_reg_using_pop.
      	* config/i386/i386.h (TARGET_APX_PPX): New.
      	* config/i386/i386.md (UNSPEC_APX_PPX): New unspec.
      	(pushp_di): New define_insn.
      	(popp_di): Likewise.
      	(push2p_di): Likewise.
      	(pop2p_di): Likewise.
      	* config/i386/i386.opt: Add apx_ppx enum.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-interrupt-1.c: Adjust option to restrict them
      	under certain subfeatures.
      	* gcc.target/i386/apx-push2pop2-1.c: Likewise.
      	* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
      	* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.
      	* gcc.target/i386/apx-ppx-1.c: New test.
      7ad308bd
  18. Nov 13, 2023
    • Uros Bizjak's avatar
      i386: Rewrite pushfl<mode>2 and popfl<mode>1 as unspecs · 10f12d32
      Uros Bizjak authored
      Flags reg is valid only with CC mode.
      
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.h (gen_pushfl): New prototype.
      	(gen_popfl): Ditto.
      	* config/i386/i386-expand.cc (ix86_expand_builtin)
      	[case IX86_BUILTIN_READ_FLAGS]: Use gen_pushfl.
      	[case IX86_BUILTIN_WRITE_FLAGS]: Use gen_popfl.
      	* config/i386/i386.cc (gen_pushfl): New function.
      	(gen_popfl): Ditto.
      	* config/i386/i386.md (unspec): Add UNSPEC_PUSHFL and UNSPEC_POPFL.
      	(@pushfl<mode>2): Rename from *pushfl<mode>2.
      	Rewrite as unspec using UNSPEC_PUSHFL.
      	(@popfl<mode>1): Rename from *popfl<mode>1.
      	Rewrite as unspec using UNSPEC_POPFL.
      10f12d32
    • Uros Bizjak's avatar
      i386: Return CCmode from ix86_cc_mode for unknown RTX code [PR112494] · c75bab72
      Uros Bizjak authored
      Combine wants to combine the following instructions into an insn that can
      perform both an (arithmetic) operation and set the condition code.  During
      the conversion a new RTX is created, and combine passes the RTX code of the
      innermost RTX expression of the CC use insn in which CC reg is used to
      SELECT_CC_MODE, to determine the new mode of the comparison:
      
      Trying 5 -> 8:
          5: r98:DI=0xd7
          8: flags:CCZ=cmp(r98:DI,0)
            REG_EQUAL cmp(0xd7,0)
      Failed to match this instruction:
      (parallel [
              (set (reg:CC 17 flags)
                  (compare:CC (const_int 215 [0xd7])
                      (const_int 0 [0])))
              (set (reg/v:DI 98 [ flags ])
                  (const_int 215 [0xd7]))
          ])
      
      where:
      
      (insn 5 2 6 2 (set (reg/v:DI 98 [ flags ])
              (const_int 215 [0xd7])) "pr112494.c":8:8 84 {*movdi_internal}
           (nil))
      
      (insn 8 7 11 2 (set (reg:CCZ 17 flags)
              (compare:CCZ (reg/v:DI 98 [ flags ])
                  (const_int 0 [0]))) "pr112494.c":11:9 8 {*cmpdi_ccno_1}
           (expr_list:REG_EQUAL (compare:CCZ (const_int 215 [0xd7])
                  (const_int 0 [0]))
              (nil)))
      
      ix86_cc_mode (AKA SELECT_CC_MODE) is not prepared to handle random RTX
      codes and triggers gcc_unreachable() when SET RTX code is passed to it.
      The patch removes gcc_unreachable() and returns CCmode for unknown
      RTX codes, so combine can try various combinations involving CC reg
      without triggering ICE.
      
      Please note that x86 MOV instructions do not set flags, so the above
      combination is not recognized as a valid x86 instruction.
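
The shape of the fix can be sketched with a toy mode selector. Everything here is a hypothetical, heavily reduced stand-in (the toy_ names, the tiny enum of codes, the simplified code-to-mode mapping); it is not the body of ix86_cc_mode. The point is only the default case: an unknown code yields the most general mode instead of aborting.

```c
/* Hypothetical stand-ins for a few RTX codes and condition-code modes.  */
enum toy_rtx_code { TOY_EQ, TOY_NE, TOY_GEU, TOY_LTU, TOY_SET };
enum toy_cc_mode { TOY_CCmode, TOY_CCZmode, TOY_CCCmode };

/* Toy analogue of a SELECT_CC_MODE implementation.  The mapping for the
   known comparison codes is illustrative only.  */
static enum toy_cc_mode
toy_select_cc_mode (enum toy_rtx_code code)
{
  switch (code)
    {
    case TOY_EQ:
    case TOY_NE:
      return TOY_CCZmode;	/* Only the zero flag is inspected.  */
    case TOY_GEU:
    case TOY_LTU:
      return TOY_CCCmode;	/* Only the carry flag is inspected.  */
    default:
      /* Before the fix this path would have been an unconditional abort.
	 Returning the most general mode lets the caller keep trying.  */
      return TOY_CCmode;
    }
}
```

With this structure, passing a non-comparison code such as TOY_SET simply selects the general mode, and any later recognition step remains free to reject the combined instruction.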
      
      	PR target/112494
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_cc_mode) [default]: Return CCmode.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr112494.c: New test.
      c75bab72
  19. Nov 11, 2023
    • Richard Sandiford's avatar
      mode-switching: Pass the set of live registers to the after hook · 93d65f39
      Richard Sandiford authored
      This patch passes the set of live hard registers to the after hook,
      like the previous one did for the needed hook.
      
      gcc/
      	* target.def (mode_switching.after): Add a regs_live parameter.
      	* doc/tm.texi: Regenerate.
      	* config/epiphany/epiphany-protos.h (epiphany_mode_after): Update
      	accordingly.
      	* config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise.
      	(epiphany_mode_after): Likewise.
      	* config/i386/i386.cc (ix86_mode_after): Likewise.
      	* config/riscv/riscv.cc (riscv_mode_after): Likewise.
      	* config/sh/sh.cc (sh_mode_after): Likewise.
      	* mode-switching.cc (optimize_mode_switching): Likewise.
      93d65f39
    • Richard Sandiford's avatar
      mode-switching: Pass set of live registers to the needed hook · 29d3e189
      Richard Sandiford authored
      The emit hook already takes the set of live hard registers as input.
      This patch passes it to the needed hook too.  SME uses this to
      optimise the mode choice based on whether state is live or dead.
      
      The main caller already had access to the required info, but the
      special handling of return values did not.
      
      gcc/
      	* target.def (mode_switching.needed): Add a regs_live parameter.
      	* doc/tm.texi: Regenerate.
      	* config/epiphany/epiphany-protos.h (epiphany_mode_needed): Update
      	accordingly.
      	* config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise.
      	* config/epiphany/mode-switch-use.cc (insert_uses): Likewise.
      	* config/i386/i386.cc (ix86_mode_needed): Likewise.
      	* config/riscv/riscv.cc (riscv_mode_needed): Likewise.
      	* config/sh/sh.cc (sh_mode_needed): Likewise.
      	* mode-switching.cc (optimize_mode_switching): Likewise.
      	(create_pre_exit): Likewise, using the DF simulate functions
      	to calculate the required information.
      29d3e189
  20. Nov 09, 2023
    • Alexandre Oliva's avatar
      i386 PIE: accept @GOTOFF in load/store multi base address · 38b396d6
      Alexandre Oliva authored
      Looking at the code generated for sse2-{load,store}-multi.c with PIE,
      I realized we could use UNSPEC_GOTOFF as a base address, and that this
      would enable the test to use the vector insns expected by the tests
      even with PIC, so I extended the base + offset logic used by the SSE2
      multi-load/store peepholes to accept reg + symbolic base + offset too,
      so that the test generated the expected insns even with PIE.
      
      
      for  gcc/ChangeLog
      
      	* config/i386/i386.cc (symbolic_base_address_p,
      	base_address_p): New, factored out from...
      	(extract_base_offset_in_addr): ... here and extended to
      	recognize REG+GOTOFF, as in gcc.target/i386/sse2-load-multi.c
      	and sse2-store-multi.c with PIE enabled by default.
      38b396d6
  21. Nov 06, 2023
    • Uros Bizjak's avatar
      i386: Use "addr" attribute to limit address regclass to non-REX regs · ecd755a9
      Uros Bizjak authored
      Use "addr" attribute with "gpr8" value to limit address register class
      to non-REX registers in instructions with high registers, where REX
      registers can not be used in the address.
      
      gcc/ChangeLog:
      
      	* config/i386/constraints.md (Bc): Remove constraint.
      	(Bn): Rewrite to use x86_extended_reg_mentioned_p predicate.
      	* config/i386/i386.cc (ix86_memory_address_reg_class):
      	Do not limit processing to TARGET_APX_EGPR.  Exit early for
      	NULL insn.  Do not check recog_data.insn before calling
      	extract_insn_cached.
      	(ix86_insn_base_reg_class): Handle ADDR_GPR8.
      	(ix86_regno_ok_for_insn_base_p): Ditto.
      	(ix86_insn_index_reg_class): Ditto.
      	* config/i386/i386.md (*cmpqi_ext<mode>_1_mem_rex64):
      	Remove insn pattern and corresponding peephole2 pattern.
      	(*cmpi_ext<mode>_1): Remove (m,Q) alternative.
      	Change (QBc,Q) alternative to (QBn,Q).  Add "addr" attribute.
      	(*cmpqi_ext<mode>_3_mem_rex64): Remove insn pattern
      	and corresponding peephole2 pattern.
      	(*cmpi_ext<mode>_3): Remove (Q,m) alternative.
      	Change (Q,QnBc) alternative to (Q,QnBn).  Add "addr" attribute.
      	(*extzvqi_mem_rex64): Remove insn pattern and
      	corresponding peephole2 pattern.
      	(*extzvqi): Remove (Q,m) alternative.  Change (Q,QnBc)
      	alternative to (Q,QnBn).  Add "addr" attribute.
      	(*insvqi_1_mem_rex64): Remove insn pattern and
      	corresponding peephole2 pattern.
      	(*insvqi_1): Remove (Q,m) alternative.  Change (Q,QnBc)
      	alternative to (Q,QnBn).  Add "addr" attribute.
      	(@insv<mode>_1): Ditto.
      	(*addqi_ext<mode>_0): Remove (m,0,Q) alternative.  Change (QBc,0,Q)
      	alternative to (QBn,0,Q).  Add "addr" attribute.
      	(*subqi_ext<mode>_0): Ditto.
      	(*andqi_ext<mode>_0): Ditto.
      	(*<any_or:code>qi_ext<mode>_0): Ditto.
      	(*addqi_ext<mode>_1): Remove (Q,0,m) alternative.  Change (Q,0,QnBc)
      	alternative to (Q,0,QnBn).  Add "addr" attribute.
      	(*andqi_ext<mode>_1): Ditto.
      	(*andqi_ext<mode>_1_cc): Ditto.
      	(*<any_or:code>qi_ext<mode>_1): Ditto.
      	(*xorqi_ext<mode>_1_cc): Ditto.
      	* config/i386/predicates.md (nonimm_x64constmem_operand):
      	Remove predicate.
      	(general_x64constmem_operand): Ditto.
      	(norex_memory_operand): Ditto.
      ecd755a9
  22. Nov 03, 2023
    • Uros Bizjak's avatar
      i386: Handle multiple address register classes · 751fc7bc
      Uros Bizjak authored
      The patch generalizes address register class handling to allow multiple
      register classes.  For APX EGPR targets, some instructions do not support
      GPR32 registers, so it is necessary to limit address register set to
      avoid them.  The same situation happens for instructions with high registers,
      where REX registers can not be used in the address, so the existing
      infrastructure can be adapted to also handle this case.
      
      The patch is mostly a mechanical rename of "gpr32" attribute to "addr" and
      introduces no functional changes, although it fixes a couple of inconsistent
      attribute values in passing.
      
      A follow-up patch will use the above infrastructure to limit address register
      class to legacy registers for instructions with high registers.
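
As a rough illustration of per-instruction address register classes, the sketch below models the three "addr" values mentioned in these commits. The enum, the function names and the 0..31 register indexing are assumptions made for the example; GCC's own hard register numbering and register classes differ, and special cases are ignored.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC's data structures.  Registers are indexed
   0..31 in encoding order: 0-7 are the legacy registers, 8-15 need a REX
   prefix, and 16-31 are the APX extended GPRs.  */
enum toy_addr_attr
{
  TOY_ADDR_GPR8,	/* "gpr8": only non-REX registers in the address.  */
  TOY_ADDR_GPR16,	/* "gpr16": no APX extended GPRs in the address.  */
  TOY_ADDR_ANY		/* "*": no restriction.  */
};

/* Number of registers usable in an address for the given attribute.  */
static int
toy_addr_reg_limit (enum toy_addr_attr attr)
{
  switch (attr)
    {
    case TOY_ADDR_GPR8:
      return 8;
    case TOY_ADDR_GPR16:
      return 16;
    default:
      return 32;
    }
}

/* Return true if register REGNO may be used as a base or index register
   of an instruction whose "addr" attribute is ATTR.  */
static bool
toy_regno_ok_for_addr_p (int regno, enum toy_addr_attr attr)
{
  return regno >= 0 && regno < toy_addr_reg_limit (attr);
}
```

A single attribute with several values replaces the earlier yes/no "gpr32" flag, which is what allows the same check to cover both the APX case and the high-register case.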
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
      	Rename to ...
      	(ix86_memory_address_reg_class): ... this.  Generalize address
      	register class handling to allow multiple address register classes.
      	Return maximal class for unrecognized instructions.  Improve comments.
      	(ix86_insn_base_reg_class): Rewrite to handle
      	multiple address register classes.
      	(ix86_regno_ok_for_insn_base_p): Ditto.
      	(ix86_insn_index_reg_class): Ditto.
      	* config/i386/i386.md: Rename "gpr32" attribute to "addr"
      	and substitute its values with "0" -> "gpr16", "1" -> "*".
      	(addr): New attribute to limit allowed address register set.
      	(gpr32): Remove.
      	* config/i386/mmx.md: Rename "gpr32" attribute to "addr"
      	and substitute its values with "0" -> "gpr16", "1" -> "*".
      	* config/i386/sse.md: Ditto.
      751fc7bc
  23. Oct 27, 2023
    • liuhongt's avatar
      Support vec_cmpmn/vcondmn for v2hf/v4hf. · 7eed861e
      liuhongt authored
      gcc/ChangeLog:
      
      	PR target/103861
      	* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
      	V2HF/V2BF/V4HF/V4BFmode.
      	* config/i386/i386.cc (ix86_get_mask_mode): Return QImode when
      	data_mode is V4HF/V2HFmode.
      	* config/i386/mmx.md (vec_cmpv4hfqi): New expander.
      	(vcond_mask_<mode>v4hi): Ditto.
      	(vcond_mask_<mode>qi): Ditto.
      	(vec_cmpv2hfqi): Ditto.
      	(vcond_mask_<mode>v2hi): Ditto.
      	(mmx_pblendvb_<mode>): Add 2 combine splitters after the
      	patterns.
      	(mmx_pblendvb_v8qi): Ditto.
      	(<code>v2hi3): Add a combine splitter after the pattern.
      	(<code><mode>3): Ditto.
      	(<code>v8qi3): Ditto.
      	(<code><mode>3): Ditto.
      	* config/i386/sse.md (vcond<mode><mode>): Merge this with ..
      	(vcond<sseintvecmodelower><mode>): .. this into ..
      	(vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): .. this,
      	and extend to V8BF/V16BF/V32BFmode.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.target/i386/part-vect-vcondhf.C: New test.
      	* gcc.target/i386/part-vect-vec_cmphf.c: New test.
      7eed861e
  24. Oct 23, 2023
    • Haochen Jiang's avatar
      i386: Prevent splitting to xmm16+ when !TARGET_AVX512VL · 1df490ed
      Haochen Jiang authored
      Currently, a split may use xmm16+/ymm16+ registers without AVX512VL,
      which leads to an ICE, as in PR111753.
      
      This patch aims to fix that.
      
      gcc/ChangeLog:
      
      	PR target/111753
      	* config/i386/i386.cc (ix86_standard_x87sse_constant_load_p):
      	Do not split to xmm16+ when !TARGET_AVX512VL.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/111753
      	* gcc.target/i386/pr111753.c: New test.
      1df490ed
  25. Oct 22, 2023
  26. Oct 16, 2023
    • Uros Bizjak's avatar
      i386: Allow -mlarge-data-threshold with -mcmodel=large · 1a64156c
      Uros Bizjak authored
      From: Fangrui Song <maskray@google.com>
      
      When using -mcmodel=medium, large data objects larger than the
      -mlarge-data-threshold threshold are placed into large data sections
      (.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
      .l* sections into separate output sections.  If small and medium code
      model object files are mixed, the .l* sections won't exert relocation
      overflow pressure on sections in object files built with -mcmodel=small.
      
      However, when using -mcmodel=large, -mlarge-data-threshold doesn't
      apply.  This means that the .rodata/.data/.bss sections may exert
      relocation overflow pressure on sections in -mcmodel=small object files.
      
      This patch allows -mcmodel=large to generate .l* sections and drops an
      unneeded documentation restriction that the value must be the same.
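
A small translation unit of the kind this option affects might look as follows. The file name, the object names and the threshold value are chosen for illustration, and the section placement in the comments is what the description above leads one to expect, not something verified here.

```c
#include <stddef.h>

/* Hypothetical example file.  One possible way to inspect the result:
     gcc -O2 -mcmodel=large -mlarge-data-threshold=1024 -S example.c
   and then look at the section directives in example.s.  */

/* Smaller than the threshold: expected to stay in a regular section.  */
char toy_small_buf[64];

/* Larger than the threshold: expected to go to a large data section
   (one of the .l* sections) once -mcmodel=large honours the option.  */
char toy_big_buf[1 << 20];

size_t
toy_small_size (void)
{
  return sizeof toy_small_buf;
}

size_t
toy_big_size (void)
{
  return sizeof toy_big_buf;
}
```

Building the same file with -mcmodel=medium and the same threshold gives a point of comparison for the placement of toy_big_buf.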
      
      Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
      ("Large data sections for the large code model")
      
      Signed-off-by: Fangrui Song <maskray@google.com>
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_can_inline_p):
      	Handle CM_LARGE and CM_LARGE_PIC.
      	(x86_elf_aligned_decl_common): Ditto.
      	(x86_output_aligned_bss): Ditto.
      	* config/i386/i386.opt: Update doc for -mlarge-data-threshold=.
      	* doc/invoke.texi: Update doc for -mlarge-data-threshold=.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/large-data.c: New test.
      1a64156c
  27. Oct 12, 2023
    • Mo, Zewei's avatar
      [APX] Support Intel APX PUSH2POP2 · 180b08f6
      Mo, Zewei authored
      
      This feature requires the stack to be aligned to 16 bytes, therefore in
      the prologue/epilogue a standalone push/pop is emitted before any
      push2/pop2 if the stack is not 16-byte aligned.
      Also, the current implementation only supports push2/pop2 in the
      function prologue/epilogue, for callee-saved registers.
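
The alignment rule above can be sketched as a small planning function. This is a toy model under stated assumptions: 8-byte register slots, a stack that is always at least 8-byte aligned, and hypothetical names (toy_push_plan, toy_plan_pushes). The real prologue code tracks considerably more state.

```c
#include <stdbool.h>

/* Hypothetical summary of how NREGS callee-saved registers are pushed.  */
struct toy_push_plan
{
  int leading_push;	/* Single push used to reach 16-byte alignment.  */
  int push2_pairs;	/* Number of push2 instructions (two regs each).  */
  int trailing_push;	/* Single push for a leftover odd register.  */
};

/* Plan the saves for NREGS registers.  STACK_ALIGNED16 says whether the
   stack pointer is 16-byte aligned before the first save.  */
static struct toy_push_plan
toy_plan_pushes (int nregs, bool stack_aligned16)
{
  struct toy_push_plan plan = { 0, 0, 0 };

  if (nregs <= 0)
    return plan;

  /* One 8-byte push moves an 8-mod-16 stack pointer to a 16-byte
     boundary, after which push2 may be used.  */
  if (!stack_aligned16)
    {
      plan.leading_push = 1;
      nregs--;
    }

  plan.push2_pairs = nregs / 2;
  plan.trailing_push = nregs % 2;
  return plan;
}
```

For five registers on a misaligned stack the plan is one push followed by two push2; on an aligned stack it is two push2 followed by one push. The epilogue would mirror the sequence with pop2/pop in reverse order.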
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (gen_push2): New function to emit push2
      	and adjust cfa offset.
      	(ix86_pro_and_epilogue_can_use_push2_pop2): New function to
      	determine whether push2/pop2 can be used.
      	(ix86_compute_frame_layout): Adjust preferred stack boundary
      	and stack alignment needed for push2/pop2.
      	(ix86_emit_save_regs): Emit push2 when available.
      	(ix86_emit_restore_reg_using_pop2): New function to emit pop2
      	and adjust cfa info.
      	(ix86_emit_restore_regs_using_pop2): New function to loop
      	through the saved regs and call above.
      	(ix86_expand_epilogue): Call ix86_emit_restore_regs_using_pop2
      	when push2pop2 available.
      	* config/i386/i386.md (push2_di): New pattern for push2.
      	(pop2_di): Likewise for pop2.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-push2pop2-1.c: New test.
      	* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
      	* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.
      
      Co-authored-by: Hu Lin1 <lin1.hu@intel.com>
      Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
      180b08f6
  28. Oct 09, 2023
    • Haochen Jiang's avatar
      Support -mevex512 for AVX512BW intrins · 8e79b1b4
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
      	Make sure there is EVEX512 enabled.
      	(ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512.
      	* config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64 bit mask
      	when !TARGET_EVEX512.
      	* config/i386/i386.md (avx512bw_512): New.
      	(SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512.
      	(*zero_extendsidi2): Change isa to avx512bw_512.
      	(kmov_isa): Ditto.
      	(*anddi_1): Ditto.
      	(*andn<mode>_1): Change isa to kmov_isa.
      	(*<code><mode>_1): Ditto.
      	(*notxor<mode>_1): Ditto.
      	(*one_cmpl<mode>2_1): Ditto.
      	(*one_cmplsi2_1_zext): Change isa to avx512bw_512.
      	(*ashl<mode>3_1): Change isa to kmov_isa.
      	(*lshr<mode>3_1): Ditto.
      	* config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512.
      	(VI1248_AVX512VLBW): Ditto.
      	(VHFBF_AVX512VL): Ditto.
      	(VI): Ditto.
      	(VIHFBF): Ditto.
      	(VI_AVX2): Ditto.
      	(VI1_AVX512): Ditto.
      	(VI12_256_512_AVX512VL): Ditto.
      	(VI2_AVX2_AVX512BW): Ditto.
      	(VI2_AVX512VNNIBW): Ditto.
      	(VI2_AVX512VL): Ditto.
      	(VI2HFBF_AVX512VL): Ditto.
      	(VI8_AVX2_AVX512BW): Ditto.
      	(VIMAX_AVX2_AVX512BW): Ditto.
      	(VIMAX_AVX512VL): Ditto.
      	(VI12_AVX2_AVX512BW): Ditto.
      	(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
      	(VI248_AVX512VL): Ditto.
      	(VI248_AVX512VLBW): Ditto.
      	(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
      	(VI248_AVX512BW): Ditto.
      	(VI248_AVX512BW_AVX512VL): Ditto.
      	(VI248_512): Ditto.
      	(VI124_256_AVX512F_AVX512BW): Ditto.
      	(VI_AVX512BW): Ditto.
      	(VIHFBF_AVX512BW): Ditto.
      	(SWI1248_AVX512BWDQ): Ditto.
      	(SWI1248_AVX512BW): Ditto.
      	(SWI1248_AVX512BWDQ2): Ditto.
      	(*knotsi_1_zext): Ditto.
      	(define_split for zero_extend + not): Ditto.
      	(kunpckdi): Ditto.
      	(REDUC_SMINMAX_MODE): Ditto.
      	(VEC_EXTRACT_MODE): Ditto.
      	(*avx512bw_permvar_truncv16siv16hi_1): Ditto.
      	(*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto.
      	(truncv32hiv32qi2): Ditto.
      	(avx512bw_<code>v32hiv32qi2): Ditto.
      	(avx512bw_<code>v32hiv32qi2_mask): Ditto.
      	(avx512bw_<code>v32hiv32qi2_mask_store): Ditto.
      	(usadv64qi): Ditto.
      	(VEC_PERM_AVX2): Ditto.
      	(AVX512ZEXTMASK): Ditto.
      	(SWI24_MASK): New.
      	(vec_pack_trunc_<mode>): Change iterator to SWI24_MASK.
      	(avx512bw_packsswb<mask_name>): Add TARGET_EVEX512.
      	(avx512bw_packssdw<mask_name>): Ditto.
      	(avx512bw_interleave_highv64qi<mask_name>): Ditto.
      	(avx512bw_interleave_lowv64qi<mask_name>): Ditto.
      	(<mask_codefor>avx512bw_pshuflwv32hi<mask_name>): Ditto.
      	(<mask_codefor>avx512bw_pshufhwv32hi<mask_name>): Ditto.
      	(vec_unpacks_lo_di): Ditto.
      	(SWI48x_MASK): New.
      	(vec_unpacks_hi_<mode>): Change iterator to SWI48x_MASK.
      	(avx512bw_umulhrswv32hi3<mask_name>): Add TARGET_EVEX512.
      	(VI1248_AVX512VL_AVX512BW): Ditto.
      	(avx512bw_<code>v32qiv32hi2<mask_name>): Ditto.
      	(*avx512bw_zero_extendv32qiv32hi2_1): Ditto.
      	(*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
      	(<insn>v32qiv32hi2): Ditto.
      	(pbroadcast_evex_isa): Change isa attribute to avx512bw_512.
      	(VPERMI2): Add TARGET_EVEX512.
      	(VPERMI2I): Ditto.
      8e79b1b4
    • Haochen Jiang's avatar
      Support -mevex512 for AVX512DQ intrins · 1b248907
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3):
      	Add TARGET_EVEX512 for 512 bit usage.
      	* config/i386/i386.cc (standard_sse_constant_opcode): Ditto.
      	* config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto.
      	(VF1_128_256VL): Ditto.
      	(VF2_AVX512VL): Ditto.
      	(VI8_256_512): Ditto.
      	(<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>):
      	Ditto.
      	(AVX512_VEC): Ditto.
      	(AVX512_VEC_2): Ditto.
      	(VI4F_BRCST32x2): Ditto.
      	(VI8F_BRCST64x2): Ditto.
      1b248907
    • Haochen Jiang's avatar
      Support -mevex512 for AVX512F intrins · c1eef66b
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/i386-builtins.cc
      	(ix86_vectorize_builtin_gather): Disable 512 bit gather
      	when !TARGET_EVEX512.
      	* config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode):
      	Add TARGET_EVEX512.
      	(ix86_expand_int_sse_cmp): Ditto.
      	(ix86_expand_vector_init_one_nonzero): Disable subroutine
      	when !TARGET_EVEX512.
      	(ix86_emit_swsqrtsf): Add TARGET_EVEX512.
      	(ix86_vectorize_vec_perm_const): Disable subroutine when
      	!TARGET_EVEX512.
      	* config/i386/i386.cc
      	(standard_sse_constant_p): Add TARGET_EVEX512.
      	(standard_sse_constant_opcode): Ditto.
      	(ix86_get_ssemov): Ditto.
      	(ix86_legitimate_constant_p): Ditto.
      	(ix86_vectorize_builtin_scatter): Disable 512 bit scatter
      	when !TARGET_EVEX512.
      	* config/i386/i386.md (avx512f_512): New.
      	(movxi): Add TARGET_EVEX512.
      	(*movxi_internal_avx512f): Ditto.
      	(*movdi_internal): Change alternative 12 to ?Yv. Adjust mode
      	for alternative 13.
      	(*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for
      	alternative 9.
      	(*movhi_internal): Change alternative 11 to *Yv.
      	(*movdf_internal): Change alternative 12 to Yv.
      	(*movsf_internal): Change alternative 5 to Yv. Adjust mode for
      	alternative 5 and 6.
      	(*mov<mode>_internal): Change alternative 4 to Yv.
      	(define_split for convert SF to DF): Add TARGET_EVEX512.
      	(extendbfsf2_1): Ditto.
      	* config/i386/predicates.md (bcst_mem_operand): Disable predicate
      	for 512 bit when !TARGET_EVEX512.
      	* config/i386/sse.md (VMOVE): Add TARGET_EVEX512.
      	(V48_AVX512VL): Ditto.
      	(V48_256_512_AVX512VL): Ditto.
      	(V48H_AVX512VL): Ditto.
      	(VI12_AVX512VL): Ditto.
      	(V): Ditto.
      	(V_512): Ditto.
      	(V_256_512): Ditto.
      	(VF): Ditto.
      	(VF1_VF2_AVX512DQ): Ditto.
      	(VFH): Ditto.
      	(VFB): Ditto.
      	(VF1): Ditto.
      	(VF1_AVX2): Ditto.
      	(VF2): Ditto.
      	(VF2H): Ditto.
      	(VF2_512_256): Ditto.
      	(VF2_512_256VL): Ditto.
      	(VF_512): Ditto.
      	(VFB_512): Ditto.
      	(VI48_AVX512VL): Ditto.
      	(VI1248_AVX512VLBW): Ditto.
      	(VF_AVX512VL): Ditto.
      	(VFH_AVX512VL): Ditto.
      	(VF1_AVX512VL): Ditto.
      	(VI): Ditto.
      	(VIHFBF): Ditto.
      	(VI_AVX2): Ditto.
      	(VI8): Ditto.
      	(VI8_AVX512VL): Ditto.
      	(VI2_AVX512F): Ditto.
      	(VI4_AVX512F): Ditto.
      	(VI4_AVX512VL): Ditto.
      	(VI48_AVX512F_AVX512VL): Ditto.
      	(VI8_AVX2_AVX512F): Ditto.
      	(VI8_AVX_AVX512F): Ditto.
      	(V8FI): Ditto.
      	(V16FI): Ditto.
      	(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
      	(VI248_AVX512VLBW): Ditto.
      	(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
      	(VI248_AVX512BW): Ditto.
      	(VI248_AVX512BW_AVX512VL): Ditto.
      	(VI48_AVX512F): Ditto.
      	(VI48_AVX_AVX512F): Ditto.
      	(VI12_AVX_AVX512F): Ditto.
      	(VI148_512): Ditto.
      	(VI124_256_AVX512F_AVX512BW): Ditto.
      	(VI48_512): Ditto.
      	(VI_AVX512BW): Ditto.
      	(VIHFBF_AVX512BW): Ditto.
      	(VI4F_256_512): Ditto.
      	(VI48F_256_512): Ditto.
      	(VI48F): Ditto.
      	(VI12_VI48F_AVX512VL): Ditto.
      	(V32_512): Ditto.
      	(AVX512MODE2P): Ditto.
      	(STORENT_MODE): Ditto.
      	(REDUC_PLUS_MODE): Ditto.
      	(REDUC_SMINMAX_MODE): Ditto.
      	(*andnot<mode>3): Change isa attribute to avx512f_512.
      	(*andnot<mode>3): Ditto.
      	(<code><mode>3): Ditto.
      	(<code>tf3): Ditto.
      	(FMAMODEM): Add TARGET_EVEX512.
      	(FMAMODE_AVX512): Ditto.
      	(VFH_SF_AVX512VL): Ditto.
      	(avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto.
      	(fix<fixunssuffix>_truncv16sfv16si2<mask_name><round_saeonly_name>):
      	Ditto.
      	(avx512f_cvtdq2pd512_2): Ditto.
      	(avx512f_cvtpd2dq512<mask_name><round_name>): Ditto.
      	(fix<fixunssuffix>_truncv8dfv8si2<mask_name><round_saeonly_name>):
      	Ditto.
      	(<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto.
      	(vec_unpacks_lo_v16sf): Ditto.
      	(vec_unpacks_hi_v16sf): Ditto.
      	(vec_unpacks_float_hi_v16si): Ditto.
      	(vec_unpacks_float_lo_v16si): Ditto.
      	(vec_unpacku_float_hi_v16si): Ditto.
      	(vec_unpacku_float_lo_v16si): Ditto.
      	(vec_pack_sfix_trunc_v8df): Ditto.
      	(avx512f_vec_pack_sfix_v8df): Ditto.
      	(<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto.
      	(<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto.
      	(<mask_codefor>avx512f_movshdup512<mask_name>): Ditto.
      	(<mask_codefor>avx512f_movsldup512<mask_name>): Ditto.
      	(AVX512_VEC): Ditto.
      	(AVX512_VEC_2): Ditto.
      	(vec_extract_lo_v64qi): Ditto.
      	(vec_extract_hi_v64qi): Ditto.
      	(VEC_EXTRACT_MODE): Ditto.
      	(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto.
      	(avx512f_movddup512<mask_name>): Ditto.
      	(avx512f_unpcklpd512<mask_name>): Ditto.
      	(*<avx512>_vternlog<mode>_all): Ditto.
      	(*<avx512>_vpternlog<mode>_1): Ditto.
      	(*<avx512>_vpternlog<mode>_2): Ditto.
      	(*<avx512>_vpternlog<mode>_3): Ditto.
      	(avx512f_shufps512_mask): Ditto.
      	(avx512f_shufps512_1<mask_name>): Ditto.
      	(avx512f_shufpd512_mask): Ditto.
      	(avx512f_shufpd512_1<mask_name>): Ditto.
      	(<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto.
      	(<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto.
      	(vec_dupv2df<mask_name>): Ditto.
      	(trunc<pmov_src_lower><mode>2): Ditto.
      	(*avx512f_<code><pmov_src_lower><mode>2): Ditto.
      	(*avx512f_vpermvar_truncv8div8si_1): Ditto.
      	(avx512f_<code><pmov_src_lower><mode>2_mask): Ditto.
      	(avx512f_<code><pmov_src_lower><mode>2_mask_store): Ditto.
      	(truncv8div8qi2): Ditto.
      	(avx512f_<code>v8div16qi2): Ditto.
      	(*avx512f_<code>v8div16qi2_store_1): Ditto.
      	(*avx512f_<code>v8div16qi2_store_2): Ditto.
      	(avx512f_<code>v8div16qi2_mask): Ditto.
      	(*avx512f_<code>v8div16qi2_mask_1): Ditto.
      	(*avx512f_<code>v8div16qi2_mask_store_1): Ditto.
      	(avx512f_<code>v8div16qi2_mask_store_2): Ditto.
      	(vec_widen_umult_even_v16si<mask_name>): Ditto.
      	(*vec_widen_umult_even_v16si<mask_name>): Ditto.
      	(vec_widen_smult_even_v16si<mask_name>): Ditto.
      	(*vec_widen_smult_even_v16si<mask_name>): Ditto.
      	(VEC_PERM_AVX2): Ditto.
      	(one_cmpl<mode>2): Ditto.
      	(<mask_codefor>one_cmpl<mode>2<mask_name>): Ditto.
      	(*one_cmpl<mode>2_pternlog_false_dep): Ditto.
      	(define_split to xor): Ditto.
      	(*andnot<mode>3): Ditto.
      	(define_split for ior): Ditto.
      	(*iornot<mode>3): Ditto.
      	(*xnor<mode>3): Ditto.
      	(*<nlogic><mode>3): Ditto.
      	(<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto.
      	(<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto.
      	(avx512f_pshufdv3_mask): Ditto.
      	(avx512f_pshufd_1<mask_name>): Ditto.
      	(*vec_extractv4ti): Ditto.
      	(VEXTRACTI128_MODE): Ditto.
      	(define_split to vec_extract): Ditto.
      	(VI1248_AVX512VL_AVX512BW): Ditto.
      	(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto.
      	(<insn>v16qiv16si2): Ditto.
      	(avx512f_<code>v16hiv16si2<mask_name>): Ditto.
      	(<insn>v16hiv16si2): Ditto.
      	(avx512f_zero_extendv16hiv16si2_1): Ditto.
      	(avx512f_<code>v8qiv8di2<mask_name>): Ditto.
      	(*avx512f_<code>v8qiv8di2<mask_name>_1): Ditto.
      	(*avx512f_<code>v8qiv8di2<mask_name>_2): Ditto.
      	(<insn>v8qiv8di2): Ditto.
      	(avx512f_<code>v8hiv8di2<mask_name>): Ditto.
      	(<insn>v8hiv8di2): Ditto.
      	(avx512f_<code>v8siv8di2<mask_name>): Ditto.
      	(*avx512f_zero_extendv8siv8di2_1): Ditto.
      	(*avx512f_zero_extendv8siv8di2_2): Ditto.
      	(<insn>v8siv8di2): Ditto.
      	(avx512f_roundps512_sfix): Ditto.
      	(vashrv8di3): Ditto.
      	(vashrv16si3): Ditto.
      	(pbroadcast_evex_isa): Change isa attribute to avx512f_512.
      	(vec_dupv4sf): Add TARGET_EVEX512.
      	(*vec_dupv4si): Ditto.
      	(*vec_dupv2di): Ditto.
      	(vec_dup<mode>): Change isa attribute to avx512f_512.
      	(VPERMI2): Add TARGET_EVEX512.
      	(VPERMI2I): Ditto.
      	(VEC_INIT_MODE): Ditto.
      	(VEC_INIT_HALF_MODE): Ditto.
      	(<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>):
      	Ditto.
      	(avx512f_vcvtps2ph512_mask_sae): Ditto.
      	(<mask_codefor>avx512f_vcvtps2ph512<mask_name><round_saeonly_name>):
      	Ditto.
      	(*avx512f_vcvtps2ph512<merge_mask_name>): Ditto.
      	(INT_BROADCAST_MODE): Ditto.
      c1eef66b
    • Haochen Jiang's avatar
      Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 · aa9bce39
      Haochen Jiang authored
      gcc/ChangeLog:
      
      	* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
      	Disable zmm broadcast for !TARGET_EVEX512.
      	* config/i386/i386-options.cc (ix86_option_override_internal):
      	Do not use PVW_512 when no-evex512.
      	(ix86_simd_clone_adjust): Add evex512 target into string.
      	* config/i386/i386.cc (type_natural_mode): Report ABI warning
      	when using zmm register w/o evex512.
      	(ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512.
      	(ix86_hard_regno_mode_ok): Ditto.
      	(ix86_set_reg_reg_cost): Ditto.
      	(ix86_rtx_costs): Ditto.
      	(ix86_vector_mode_supported_p): Ditto.
      	(ix86_preferred_simd_mode): Ditto.
      	(ix86_get_mask_mode): Ditto.
      	(ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit
      	libmvec call when !TARGET_EVEX512.
      	(ix86_simd_clone_usable): Ditto.
      	* config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment
      	when !TARGET_EVEX512.
      	(MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512.
      	(STORE_MAX_PIECES): Ditto.
      aa9bce39
  29. Oct 08, 2023
    • liuhongt's avatar
      Support signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2HF/V4HF. · b4fc1abb
      liuhongt authored
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_build_const_vector): Handle V2HF
      	and V4HFmode.
      	(ix86_build_signbit_mask): Ditto.
      	* config/i386/mmx.md (mmxintvecmode): Ditto.
      	(<code><mode>2): New define_expand.
      	(*mmx_<code><mode>): New define_insn_and_split.
      	(*mmx_nabs<mode>2): Ditto.
      	(*mmx_andnot<mode>3): New define_insn.
      	(<code><mode>3): Ditto.
      	(copysign<mode>3): New define_expand.
      	(xorsign<mode>3): Ditto.
      	(signbit<mode>2): Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/part-vect-absneghf.c: New test.
      	* gcc.target/i386/part-vect-copysignhf.c: New test.
      	* gcc.target/i386/part-vect-xorsignhf.c: New test.
      b4fc1abb
  30. Oct 07, 2023
    • Kong Lingling's avatar
      [APX EGPR] Handle legacy insns that only support GPR16 (3/5) · 1328bb72
      Kong Lingling authored
      
      Disable EGPR usage for the legacy insns below in opcode map2/3 that have
      a VEX but no EVEX counterpart.
      
      insn list:
      1. phminposuw/vphminposuw
      2. ptest/vptest
      3. roundps/vroundps, roundpd/vroundpd,
         roundss/vroundss, roundsd/vroundsd
      4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
      5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
      6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
      
      gcc/ChangeLog:
      
      	* config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
      	prototype.
      	* config/i386/i386.cc (x86_evex_reg_mentioned_p): New
      	function.
      	* config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
      	and constraint jm to all non-evex alternatives, adjust
      	alternative outputs if evex reg is mentioned.
      	* config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
      	and constraint jm/ja to all non-evex alternatives.
      	(ptesttf2): Likewise.
      	(<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
      	(sse4_1_round<ssescalarmodesuffix>): Likewise.
      	(sse4_2_pcmpestri): Likewise.
      	(sse4_2_pcmpestrm): Likewise.
      	(sse4_2_pcmpestr_cconly): Likewise.
      	(sse4_2_pcmpistr): Likewise.
      	(sse4_2_pcmpistri): Likewise.
      	(sse4_2_pcmpistrm): Likewise.
      	(sse4_2_pcmpistr_cconly): Likewise.
      	(aesimc): Likewise.
      	(aeskeygenassist): Likewise.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
      	tests.
      
      Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
      Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
      1328bb72
    • Hongyu Wang's avatar
      [APX EGPR] Handle GPR16 only vector move insns · f4988648
      Hongyu Wang authored
      
      For vector move insns like vmovdqa/vmovdqu, their EVEX counterparts
      require an explicit suffix 64/32/16/8. The use of these instructions
      is prohibited under AVX10_1 or AVX512F, so we select
      vmovaps/vmovups for vector load/store insns that contain EGPR if
      there is no AVX512VL, and keep the original move insn selection
      otherwise.
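
      The selection rule above can be sketched as follows. This is only an
      illustration: select_move_mnemonic_sketch and its three flags are
      hypothetical names, not the interface of ix86_get_ssemov, and the
      size suffixes and operand modes handled by the real code are left out.

```cpp
#include <string>

// Hypothetical sketch, not GCC's ix86_get_ssemov: pick the mnemonic for an
// integer vector load/store, following the rule in the commit message.
std::string select_move_mnemonic_sketch(bool aligned, bool uses_egpr,
                                        bool has_avx512vl) {
  // With an EGPR in the address and no AVX512VL, fall back to the
  // floating-point move forms, which need no element-size suffix.
  if (uses_egpr && !has_avx512vl)
    return aligned ? "vmovaps" : "vmovups";
  // Otherwise keep the original integer move selection.
  return aligned ? "vmovdqa" : "vmovdqu";
}
```

      Under these assumptions, an unaligned store through an EGPR without
      AVX512VL would get "vmovups", while the same store with AVX512VL
      available keeps "vmovdqu".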
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
      	adjust mnemonic for vmovdqu/vmovdqa.
      	* config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0):
      	Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
      	(avx_vec_concat<mode>): Likewise, and separate alternative 0 to
      	avx_noavx512f.
      
      Co-authored-by: Kong Lingling <lingling.kong@intel.com>
      Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
      f4988648
    • Kong Lingling's avatar
      [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint. · ccdc0f0f
      Kong Lingling authored
      
      In inline asm, we do not know if the insn can use EGPR, so disable EGPR
      usage by default by mapping the common reg/mem constraints to non-EGPR
      constraints.
      
      The full list of mappings is:
      
        "g" -> "jrjmi"
        "r" -> "jr"
        "m" -> "jm"
        "<" -> "j<"
        ">" -> "j>"
        "o" -> "jo"
        "V" -> "jV"
        "p" -> "jp"
        "Bm" -> "ja"
      
      For memory constraints, we add an option -mapx-inline-asm-use-gpr32
      to allow/disallow gpr32 usage in any memory-related constraints, as
      base_reg_class/index_reg_class cannot be aware of whether the asm insn
      supports gpr32 or not.
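
      As a rough illustration of the mapping table, the sketch below
      rewrites a constraint string one letter at a time. The function
      map_constraints_sketch is a hypothetical helper, not GCC's
      map_egpr_constraints: it assumes every character other than the
      listed ones (modifiers such as "=" or "+", digits, other letters)
      can be copied through unchanged, and it does not model other
      multi-letter constraints or the -mapx-inline-asm-use-gpr32 option.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, not GCC's map_egpr_constraints: apply the mapping
// table from the commit message to one constraint string.
std::string map_constraints_sketch(const std::string &in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char c = in[i];
    // The two-letter constraint "Bm" maps to "ja".
    if (c == 'B' && i + 1 < in.size() && in[i + 1] == 'm') {
      out += "ja";
      ++i;  // skip the 'm' that belongs to "Bm"
      continue;
    }
    switch (c) {
    case 'g':
      out += "jrjmi";  // "g" expands to register, memory and immediate
      break;
    case 'r': case 'm': case '<': case '>':
    case 'o': case 'V': case 'p':
      out += 'j';      // the remaining entries just gain a "j" prefix
      out += c;
      break;
    default:
      out += c;        // assumption: everything else passes through
      break;
    }
  }
  return out;
}
```

      With this sketch, an output constraint "=r" becomes "=jr" and a
      combined "rm" becomes "jrjm".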
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (map_egpr_constraints): New function to
      	map common constraints to EGPR prohibited constraints.
      	(ix86_md_asm_adjust): Call map_egpr_constraints.
      	* config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-inline-gpr-norex2.c: New test.
      
      Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
      Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
      ccdc0f0f
    • Kong Lingling's avatar
      [APX EGPR] Add backend hook for base_reg_class/index_reg_class. · 0793ee05
      Kong Lingling authored
      
      Add backend helper functions to verify whether an rtx_insn can adopt EGPR
      for the base/index reg of its memory operand. The verification rules are:
        1. For asm insn, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32.
        2. Disable EGPR for unrecognized insn.
        3. If which_alternative is not decided, loop through enabled alternatives
           and check their attr_gpr32. Only enable EGPR when all enabled
           alternatives have attr_gpr32 = 1.
        4. If which_alternative is decided, enable/disable EGPR by its corresponding
           attr_gpr32.
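
      The four rules can be sketched as a single predicate. Everything
      here is an assumption made for illustration: InsnInfo and
      may_use_egpr_sketch are hypothetical stand-ins, not the data
      structures or the signature of
      ix86_memory_address_use_extended_reg_class_p, and the per-alternative
      vectors simply model the enabled mask and the gpr32 attribute.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the insn properties the rules inspect.
struct InsnInfo {
  bool is_asm;                    // inline asm insn
  bool recognized;                // insn matched a pattern
  int which_alternative;          // -1 while not yet decided
  std::vector<bool> alt_enabled;  // enabled flag per alternative
  std::vector<bool> alt_gpr32;    // gpr32 attribute per alternative
};

// Hypothetical sketch of the verification rules from the commit message.
bool may_use_egpr_sketch(const InsnInfo &insn, bool inline_asm_use_gpr32) {
  if (insn.is_asm)
    return inline_asm_use_gpr32;  // rule 1: follow the option
  if (!insn.recognized)
    return false;                 // rule 2: unrecognized insn
  if (insn.which_alternative < 0) {
    // Rule 3: every enabled alternative must allow gpr32.
    for (std::size_t i = 0; i < insn.alt_enabled.size(); ++i)
      if (insn.alt_enabled[i] && !insn.alt_gpr32[i])
        return false;
    return true;
  }
  // Rule 4: use the attribute of the chosen alternative.
  return insn.alt_gpr32[static_cast<std::size_t>(insn.which_alternative)];
}
```

      Note that in this sketch a disabled alternative with gpr32 = 0 does
      not block EGPR under rule 3, since only enabled alternatives are
      checked.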
      
      gcc/ChangeLog:
      
      	* config/i386/i386-protos.h (ix86_insn_base_reg_class): New
      	prototype.
      	(ix86_regno_ok_for_insn_base_p): Likewise.
      	(ix86_insn_index_reg_class): Likewise.
      	* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
      	New helper function to scan the insn.
      	(ix86_insn_base_reg_class): New function to choose BASE_REG_CLASS.
      	(ix86_regno_ok_for_insn_base_p): Likewise for base regno.
      	(ix86_insn_index_reg_class): Likewise for INDEX_REG_CLASS.
      	* config/i386/i386.h (INSN_BASE_REG_CLASS): Define.
      	(REGNO_OK_FOR_INSN_BASE_P): Likewise.
      	(INSN_INDEX_REG_CLASS): Likewise.
      	(enum reg_class): Add INDEX_GPR16.
      	(GENERAL_GPR16_REGNO_P): Define.
      	* config/i386/i386.md (gpr32): New attribute.
      
      Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
      Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
      0793ee05