- Feb 05, 2024
-
-
H.J. Lu authored
2 scratch registers, %r10 and %r11, are available at function entry for large model profiling. But %r10 may be used by stack realignment and we can't use %r10 in this case. Add x86_64_select_profile_regnum to find a caller-saved register which isn't live or a callee-saved register which has been saved on stack in the prologue at entry for large model profiling and sorry if we can't find one. gcc/ PR target/113689 * config/i386/i386.cc (x86_64_select_profile_regnum): New. (x86_function_profiler): Call x86_64_select_profile_regnum to get a scratch register for large model profiling. gcc/testsuite/ PR target/113689 * gcc.target/i386/pr113689-1.c: New file. * gcc.target/i386/pr113689-2.c: Likewise. * gcc.target/i386/pr113689-3.c: Likewise.
-
- Jan 27, 2024
-
-
H.J. Lu authored
When an interrupt handler is implemented by an assembly stub which does: 1. Save all registers. 2. Call a C function. 3. Restore all registers. 4. Return from interrupt. it is completely unnecessary to save and restore any registers in the C function called by the assembly stub, even if they would normally be callee-saved. Add no_callee_saved_registers function attribute, which is complementary to no_caller_saved_registers function attribute, to mark a function which doesn't have any callee-saved registers. Such a function won't save and restore any registers. Classify function call-saved register handling type with: 1. Default call-saved registers. 2. No caller-saved registers with no_caller_saved_registers attribute. 3. No callee-saved registers with no_callee_saved_registers attribute. Disallow sibcall if callee is a no_callee_saved_registers function and caller isn't a no_callee_saved_registers function. Otherwise, callee-saved registers won't be preserved. After a no_callee_saved_registers function is called, all registers may be clobbered. If the calling function isn't a no_callee_saved_registers function, we need to preserve all registers which aren't used by function calls. gcc/ PR target/103503 PR target/113312 * config/i386/i386-expand.cc (ix86_expand_call): Replace no_caller_saved_registers check with call_saved_registers check. Clobber all registers that are not used by the callee with no_callee_saved_registers attribute. * config/i386/i386-options.cc (ix86_set_func_type): Set call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for noreturn function. Disallow no_callee_saved_registers with interrupt or no_caller_saved_registers attributes together. (ix86_set_current_function): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_handle_no_caller_saved_registers_attribute): Renamed to ... (ix86_handle_call_saved_registers_attribute): This. (ix86_gnu_attributes): Add ix86_handle_call_saved_registers_attribute. 
* config/i386/i386.cc (ix86_conditional_register_usage): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_function_ok_for_sibcall): Don't allow callee with no_callee_saved_registers attribute when the calling function has callee-saved registers. (ix86_comp_type_attributes): Also check no_callee_saved_registers. (ix86_epilogue_uses): Replace no_caller_saved_registers check with call_saved_registers check. (ix86_hard_regno_scratch_ok): Likewise. (ix86_save_reg): Replace no_caller_saved_registers check with call_saved_registers check. Don't save any registers for TYPE_NO_CALLEE_SAVED_REGISTERS. Save all registers with TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with no_callee_saved_registers attribute is called. (find_drap_reg): Replace no_caller_saved_registers check with call_saved_registers check. * config/i386/i386.h (call_saved_registers_type): New enum. (machine_function): Replace no_caller_saved_registers with call_saved_registers. * doc/extend.texi: Document no_callee_saved_registers attribute. gcc/testsuite/ PR target/103503 PR target/113312 * gcc.dg/torture/no-callee-saved-run-1a.c: New file. * gcc.dg/torture/no-callee-saved-run-1b.c: Likewise. * gcc.target/i386/no-callee-saved-1.c: Likewise. * gcc.target/i386/no-callee-saved-2.c: Likewise. * gcc.target/i386/no-callee-saved-3.c: Likewise. * gcc.target/i386/no-callee-saved-4.c: Likewise. * gcc.target/i386/no-callee-saved-5.c: Likewise. * gcc.target/i386/no-callee-saved-6.c: Likewise. * gcc.target/i386/no-callee-saved-7.c: Likewise. * gcc.target/i386/no-callee-saved-8.c: Likewise. * gcc.target/i386/no-callee-saved-9.c: Likewise. * gcc.target/i386/no-callee-saved-10.c: Likewise. * gcc.target/i386/no-callee-saved-11.c: Likewise. * gcc.target/i386/no-callee-saved-12.c: Likewise. * gcc.target/i386/no-callee-saved-13.c: Likewise. * gcc.target/i386/no-callee-saved-14.c: Likewise. * gcc.target/i386/no-callee-saved-15.c: Likewise. * gcc.target/i386/no-callee-saved-16.c: Likewise. 
* gcc.target/i386/no-callee-saved-17.c: Likewise. * gcc.target/i386/no-callee-saved-18.c: Likewise.
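A minimal usage sketch of the attribute described in this commit, assuming a C function that is only ever reached through an assembly stub which saves and restores every register itself. The names `NO_CALLEE_SAVED`, `timer_tick_body`, `tick_count` and `run_ticks` are hypothetical and exist only for illustration; the `__has_attribute` guard lets the same source build on compilers or targets that do not know the attribute, in which case the function is compiled with the default convention.

```c
/* Fall back to "attribute absent" on compilers without __has_attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical helper macro: expands to the attribute only where the
   compiler reports support for it.  */
#if __has_attribute(no_callee_saved_registers)
#define NO_CALLEE_SAVED __attribute__((no_callee_saved_registers))
#else
#define NO_CALLEE_SAVED
#endif

/* Hypothetical state touched by the handler body.  */
volatile unsigned long tick_count;

/* Hypothetical C body of an interrupt handler.  The assumption is that
   an assembly stub saves and restores all registers around this call,
   so the function itself has no need to preserve any of them.  */
NO_CALLEE_SAVED void timer_tick_body(void)
{
  tick_count++;
}

/* An ordinary caller.  Per the commit message, the compiler treats all
   registers as possibly clobbered after the call and preserves what the
   caller still needs.  */
unsigned long run_ticks(unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    timer_tick_body();
  return tick_count;
}
```

Calling `run_ticks(3)` on a fresh program increments the counter three times; whether the attribute is honoured or silently dropped by the guard does not change the observable result, only the register save and restore code that is generated.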
-
- Jan 18, 2024
-
-
Jakub Jelinek authored
x86_function_profiler emits assembly directly into file and only emits AT&T syntax. The following patch adjusts it to emit MASM syntax if -masm=intel. As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects. I've tested using for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do ./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1; ./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2; objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2; diff -up /tmp/1 /tmp/2; done that the emitted sequences are identical after assembly. 2024-01-18 Jakub Jelinek <jakub@redhat.com> PR target/113122 * config/i386/i386.cc (x86_function_profiler): Add -masm=intel support. Add missing space after , in emitted assembly in some cases. Formatting fixes. * gcc.target/i386/pr113122-1.c: New test. * gcc.target/i386/pr113122-2.c: New test. * gcc.target/i386/pr113122-3.c: New test. * gcc.target/i386/pr113122-4.c: New test.
-
- Jan 05, 2024
-
-
Ilya Leoshkevich authored
GCC can emit code between the function label and the .LASANPC label, making the latter unaligned. Some architectures cannot load unaligned labels directly and require literal pool entries, which is inefficient. Move the invocation of asan_function_start to ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is emitted. This allows setting the .LASANPC label alignment to the respective function alignment. Link: https://inbox.sourceware.org/gcc-patches/20240102194511.3171559-3-iii@linux.ibm.com/ Signed-off-by:
Ilya Leoshkevich <iii@linux.ibm.com> gcc/ChangeLog: * asan.cc (asan_function_start): Drop switch_to_section (). (asan_emit_stack_protection): Set .LASANPC alignment. * config/i386/i386.cc: Use assemble_function_label_raw () instead of ASM_OUTPUT_LABEL (). * config/s390/s390.cc (s390_asm_output_function_label): Likewise. * defaults.h (ASM_OUTPUT_FUNCTION_LABEL): Likewise. * final.cc (final_start_function_1): Drop asan_function_start (). * output.h (assemble_function_label_raw): New function. * varasm.cc (assemble_function_label_raw): Likewise.
-
- Jan 03, 2024
-
-
Jakub Jelinek authored
-
- Dec 28, 2023
-
-
Uros Bizjak authored
Move ix86_expand_unary_operator from i386.cc to i386-expand.cc, re-arrange prototypes and do some cosmetic changes with the usage of TARGET_APX_NDD. No functional changes. gcc/ChangeLog: * config/i386/i386.cc (ix86_unary_operator_ok): Move from here... * config/i386/i386-expand.cc (ix86_unary_operator_ok): ... to here. * config/i386/i386-protos.h: Re-arrange ix86_{unary|binary}_operator_ok and ix86_expand_{unary|binary}_operator prototypes. * config/i386/i386.md: Cosmetic changes with the usage of TARGET_APX_NDD in ix86_expand_{unary|binary}_operator and ix86_{unary|binary}_operator_ok function calls.
-
- Dec 20, 2023
-
-
Haochen Jiang authored
gcc/ChangeLog: * config/i386/avx512bwintrin.h: Allow 64 bit mask intrin usage for -mno-evex512. * config/i386/i386-builtin.def: Remove OPTION_MASK_ISA2_EVEX512 for 64 bit mask builtins. * config/i386/i386.cc (ix86_hard_regno_mode_ok): Allow 64 bit mask register for -mno-evex512. * config/i386/i386.md (SWI1248_AVX512BWDQ_64): Remove TARGET_EVEX512. (*zero_extendsidi2): Change isa attribute to avx512bw. (kmov_isa): Ditto. (*anddi_1): Ditto. (*andn<mode>_1): Remove TARGET_EVEX512. (*one_cmplsi2_1_zext): Change isa attribute to avx512bw. (*ashl<mode>3_1): Ditto. (*lshr<mode>3_1): Ditto. * config/i386/sse.md (SWI1248_AVX512BWDQ): Remove TARGET_EVEX512. (SWI1248_AVX512BW): Ditto. (SWI1248_AVX512BWDQ2): Ditto. (*knotsi_1_zext): Ditto. (kunpckdi): Ditto. (SWI24_MASK): Removed. (vec_pack_trunc_<mode>): Change iterator from SWI24_MASK to SWI24. (vec_unpacks_lo_di): Remove TARGET_EVEX512. (SWI48x_MASK): Removed. (vec_unpacks_hi_<mode>): Change iterator from SWI48x_MASK to SWI48x. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_1-6.c: Remove check for errors. * gcc.target/i386/noevex512-2.c: Ditto.
-
- Dec 15, 2023
-
-
Jakub Jelinek authored
Given what I saw in the aarch64/arm psABIs for BITINT_TYPE, as I said earlier I'm afraid we need to differentiate between the limb mode/precision specified in the psABIs (what is used to decide how it is actually passed, aligned or what size it has) vs. what limb mode/precision should be used during bitint lowering and in the libgcc bitint APIs. While in the x86_64 psABI a limb is 64-bit, which is perfect for both, that is a wordsize which we can perform operations natively in, e.g. aarch64 wants 128-bit limbs for alignment/sizing purposes, but on the bitint lowering side I believe it would result in terribly bad code and on the libgcc side wouldn't work at all (because it relies there on longlong.h support). So, the following patch makes it possible for aarch64 to use TImode as abi_limb_mode for _BitInt(129) and larger, while using DImode as limb_mode. 2023-12-15 Jakub Jelinek <jakub@redhat.com> * target.h (struct bitint_info): Add abi_limb_mode member, adjust comment. * target.def (bitint_type_info): Mention abi_limb_mode instead of limb_mode. * varasm.cc (output_constant): Use abi_limb_mode rather than limb_mode. * stor-layout.cc (finish_bitfield_representative): Likewise. Assert that if precision is smaller or equal to abi_limb_mode precision or if info.big_endian is different from WORDS_BIG_ENDIAN, info.limb_mode must be the same as info.abi_limb_mode. (layout_type): Use abi_limb_mode rather than limb_mode. * gimple-fold.cc (clear_padding_bitint_needs_padding_p): Likewise. (clear_padding_type): Likewise. * config/i386/i386.cc (ix86_bitint_type_info): Also set info->abi_limb_mode. * doc/tm.texi: Regenerated.
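The distinction between the two limb modes can be illustrated with a little arithmetic. The sketch below is a simplified model, not the psABI rules verbatim: it assumes the storage size is the precision rounded up to a whole number of ABI limbs, and that lowering walks that storage in smaller limbs. The function names are hypothetical; the limb widths in the usage note are the ones mentioned in the commit message (64-bit for x86_64, 128-bit ABI limbs with 64-bit lowering limbs for aarch64).

```c
#include <stddef.h>

/* Number of limbs of width limb_bits needed to hold prec bits
   (ceiling division).  */
size_t limbs_for(size_t prec, size_t limb_bits)
{
  return (prec + limb_bits - 1) / limb_bits;
}

/* Storage size in bytes under the modelling assumption that the type is
   sized as a whole number of ABI limbs of width abi_limb_bits.  */
size_t bitint_storage_bytes(size_t prec, size_t abi_limb_bits)
{
  return limbs_for(prec, abi_limb_bits) * (abi_limb_bits / 8);
}

/* Number of lowering limbs of width limb_bits that cover the whole
   storage, padding included.  */
size_t bitint_lowering_limbs(size_t prec, size_t abi_limb_bits,
                             size_t limb_bits)
{
  return bitint_storage_bytes(prec, abi_limb_bits) * 8 / limb_bits;
}
```

Under this model `_BitInt(129)` occupies `bitint_storage_bytes(129, 64)` = 24 bytes with 64-bit ABI limbs and `bitint_storage_bytes(129, 128)` = 32 bytes with 128-bit ABI limbs. In the second case the value bits fit in `limbs_for(129, 64)` = 3 lowering limbs, while the storage spans `bitint_lowering_limbs(129, 128, 64)` = 4 of them, which is why the two modes have to be tracked separately.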
-
- Dec 13, 2023
-
-
Jakub Jelinek authored
The following patch fixes ICE on the testcase in similar way to how other folded builtins are handled in ix86_gimple_fold_builtin when they don't have a lhs; these builtins are const or pure, so normally DCE would remove them later, but with -O0 that isn't guaranteed to happen, and during expansion if they are marked TREE_SIDE_EFFECTS it might still be attempted to be expanded. This removes them right away during the folding. Initially I wanted to also change all gsi_replace last args in that function to true, but Andrew pointed to PR107209, so I've kept them as is. 2023-12-13 Jakub Jelinek <jakub@redhat.com> PR target/112962 * config/i386/i386.cc (ix86_gimple_fold_builtin): For shifts and abs without lhs replace with nop. * gcc.target/i386/pr112962.c: New test.
-
- Dec 12, 2023
-
-
liuhongt authored
Don't assume it's AVX_U128_CLEAN after call_insn whose abi.mode_clobber(V4DImode) doesn't contain all SSE_REGS. If the function doesn't clobber any SSE registers or only clobbers the 128-bit part, then vzeroupper isn't issued before the function exit. The status is not CLEAN but ANY after the function. Also for sibling_call, it's safe to issue a vzeroupper. Also there could be a missing vzeroupper since there's no mode_exit for sibling_call_p. gcc/ChangeLog: PR target/112891 * config/i386/i386.cc (ix86_avx_u128_mode_after): Return AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to align with ix86_avx_u128_mode_needed. (ix86_avx_u128_mode_needed): Return AVX_U128_CLEAN for sibling_call. gcc/testsuite/ChangeLog: * gcc.target/i386/pr112891.c: New test. * gcc.target/i386/pr112891-2.c: New test.
-
- Dec 07, 2023
-
-
Kong Lingling authored
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd parameter and adjust for NDD. * config/i386/i386-protos.h: Add use_ndd parameter for ix86_unary_operator_ok and ix86_expand_unary_operator. * config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter and adjust for NDD. * config/i386/i386.md (neg<mode>2): Add new constraint for NDD and adjust output template. (*neg<mode>_1): Likewise. (*neg<dwi>2_doubleword): Likewise and adopt '&' to NDD dest. (*neg<mode>_2): Likewise. (*neg<mode>_ccc_1): Likewise. (*neg<mode>_ccc_2): Likewise. (*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1] to accept memory input for NDD alternatives. (*negsi_2_zext): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-ndd.c: Add neg test.
-
Hongyu Wang authored
NDD uses the EVEX prefix, so when a segment prefix is also applied, the instruction could exceed its 15-byte limit, especially when adding immediates. This could happen when the "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and adds the offset to the segment register, which will be encoded using a segment prefix. Disable those *POFF constant usages in NDD add alternatives with a new constraint. gcc/ChangeLog: * config/i386/constraints.md (je): New constraint. * config/i386/i386-protos.h (x86_poff_operand_p): New function to check any *POFF constant in operand. * config/i386/i386.cc (x86_poff_operand_p): New prototype. * config/i386/i386.md (*add<mode>_1): Split out je alternative for add.
-
- Dec 05, 2023
-
-
Richard Sandiford authored
Arm's SME has an array called ZA that for inline asm purposes is effectively a form of special-purpose memory. It doesn't have an associated storage type and so can't be passed and returned in normal C/C++ objects. We'd therefore like "za" in a clobber list to mean that an inline asm can read from and write to ZA. (Just reading or writing individually is unlikely to be useful, but we could add syntax for that too if necessary.) There is currently a TARGET_MD_ASM_ADJUST target hook that allows targets to add clobbers to an asm instruction. This patch extends that to allow targets to add USEs as well. gcc/ * target.def (md_asm_adjust): Add a uses parameter. * doc/tm.texi: Regenerate. * cfgexpand.cc (expand_asm_loc): Update call to md_asm_adjust. Handle any USEs created by the target. (expand_asm_stmt): Likewise. * recog.cc (asm_noperands): Handle asms with USEs. (decode_asm_operands): Likewise. * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add uses parameter. * config/arm/aarch-common.cc (arm_md_asm_adjust): Likewise. * config/arm/arm.cc (thumb1_md_asm_adjust): Likewise. * config/avr/avr.cc (avr_md_asm_adjust): Likewise. * config/cris/cris.cc (cris_md_asm_adjust): Likewise. * config/i386/i386.cc (ix86_md_asm_adjust): Likewise. * config/mn10300/mn10300.cc (mn10300_md_asm_adjust): Likewise. * config/nds32/nds32.cc (nds32_md_asm_adjust): Likewise. * config/pdp11/pdp11.cc (pdp11_md_asm_adjust): Likewise. * config/rs6000/rs6000.cc (rs6000_md_asm_adjust): Likewise. * config/s390/s390.cc (s390_md_asm_adjust): Likewise. * config/vax/vax.cc (vax_md_asm_adjust): Likewise. * config/visium/visium.cc (visium_md_asm_adjust): Likewise.
-
liuhongt authored
Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory. For vec_construct, the components must be live at the same time if they're not loaded from memory; when the number of those components exceeds the available registers, spills happen. Try to account for that with a rough estimation. ??? Ideally, we should have an overall estimation of register pressure if we know the live range of all variables. gcc/ChangeLog: * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Count sse_reg/gpr_regs for components not loaded from memory. (ix86_vector_costs::ix86_vector_costs): New constructor. (ix86_vector_costs::m_num_gpr_needed[3]): New private member. (ix86_vector_costs::m_num_sse_needed[3]): Ditto. (ix86_vector_costs::finish_cost): Estimate overall register pressure cost. (ix86_vector_costs::ix86_vect_estimate_reg_pressure): New function.
-
- Dec 04, 2023
-
-
Jakub Jelinek authored
The following testcase ICEs with RTL checking, because the code tests whether XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking that SET_SRC (set) is actually an UNSPEC, so any time we see any other insn with a PARALLEL and a SET in it whose source is not an UNSPEC, we ICE during RTL checking or access some other union member there as if it were an rt_int. The rest is just small cleanup. 2023-12-04 Jakub Jelinek <jakub@redhat.com> PR target/112837 * config/i386/i386.cc (ix86_elim_entry_set_got): Before checking for UNSPEC_SET_GOT check that SET_SRC is UNSPEC. Use SET_SRC and SET_DEST macros instead of XEXP, rename vec variable to set. * gcc.dg/pr112837.c: New test.
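The bug class fixed here is reading a union member before confirming which member is active. A toy model of the pattern, assuming a much simplified tagged node rather than GCC's real `rtx` representation (every `toy_` name is hypothetical and introduced only for this sketch):

```c
#include <stdbool.h>

/* Hypothetical node kinds standing in for RTX codes.  */
enum toy_code { TOY_REG, TOY_UNSPEC };

/* Hypothetical unspec number standing in for UNSPEC_SET_GOT.  */
enum { TOY_UNSPEC_SET_GOT = 7 };

struct toy_rtx {
  enum toy_code code;
  union {
    const char *reg_name;   /* valid only when code == TOY_REG */
    int unspec_kind;        /* valid only when code == TOY_UNSPEC */
  } u;
};

/* Check the tag first, and only then read the member that the tag
   makes valid.  Reading u.unspec_kind from a TOY_REG node would
   reinterpret unrelated storage.  */
bool toy_is_set_got(const struct toy_rtx *src)
{
  return src->code == TOY_UNSPEC
         && src->u.unspec_kind == TOY_UNSPEC_SET_GOT;
}
```

The short-circuit `&&` is what carries the fix: the integer member is never touched unless the code check has already succeeded.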
-
- Dec 02, 2023
-
-
Richard Sandiford authored
Currently there are four static sources of attributes: - LANG_HOOKS_ATTRIBUTE_TABLE - LANG_HOOKS_COMMON_ATTRIBUTE_TABLE - LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE - TARGET_ATTRIBUTE_TABLE All of the attributes in these tables go in the "gnu" namespace. This means that they can use the traditional GNU __attribute__((...)) syntax and the standard [[gnu::...]] syntax. Standard attributes are registered dynamically with a null namespace. There are no supported attributes in other namespaces (clang, vendor namespaces, etc.). This patch tries to generalise things by making the namespace part of the attribute specification. It's usual for multiple attributes to be defined in the same namespace, so rather than adding the namespace to each individual definition, it seemed better to group attributes in the same namespace together. This would also allow us to reuse the same table for clang attributes that are written with the GNU syntax, or other similar situations where the attribute can be accessed via multiple "spellings". The patch therefore adds a scoped_attribute_specs that contains a namespace and a list of attributes in that namespace. It's still possible to have multiple scoped_attribute_specs for the same namespace. E.g. it makes sense to keep the C++-specific, C/C++-common, and format-related attributes in separate tables, even though they're all GNU attributes. Current lists of attributes are terminated by a null name. Rather than keep that for the new structure, it seemed neater to use an array_slice. This also makes the tables slightly more compact. In general, a target might want to support attributes in multiple namespaces. Rather than have a separate hook for each possibility (like the three langhooks above), it seemed better to make TARGET_ATTRIBUTE_TABLE a table of tables. Specifically, it's an array_slice of scoped_attribute_specs. We can do the same thing for langhooks, which allows the three hooks above to be merged into a single LANG_HOOKS_ATTRIBUTE_TABLE. 
It also allows the standard attributes to be registered statically and checked by the usual attribs.cc checks. The patch adds a TARGET_GNU_ATTRIBUTES helper for the common case in which a target wants a single table of gnu attributes. It can only be used if the table is free of preprocessor directives. There are probably other things we need to do to make vendor namespaces work smoothly. E.g. in principle it would be good to make exclusion sets namespace-aware. But to some extent we have that with standard vs. gnu attributes too. This patch is just supposed to be a first step. gcc/ * attribs.h (scoped_attribute_specs): New structure. (register_scoped_attributes): Take a reference to a scoped_attribute_specs instead of separate namespace and array parameters. * plugin.h (register_scoped_attributes): Likewise. * attribs.cc (register_scoped_attributes): Likewise. (attribute_tables): Change into an array of scoped_attribute_specs pointers. Reduce to 1 element for frontends and 1 element for targets. (empty_attribute_table): Delete. (check_attribute_tables): Update for changes to attribute_tables. Use a hash_set to identify duplicates. (handle_ignored_attributes_option): Update for above changes. (init_attributes): Likewise. (excl_pair): Delete. (test_attribute_exclusions): Update for above changes. Don't enforce symmetry for standard attributes in the top-level namespace. * langhooks-def.h (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete. (LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Likewise. (LANG_HOOKS_INITIALIZER): Update accordingly. (LANG_HOOKS_ATTRIBUTE_TABLE): Define to an empty constructor. * langhooks.h (lang_hooks::common_attribute_table): Delete. (lang_hooks::format_attribute_table): Likewise. (lang_hooks::attribute_table): Redefine to an array of scoped_attribute_specs pointers. * target-def.h (TARGET_GNU_ATTRIBUTES): New macro. * target.def (attribute_spec): Redefine to return an array of scoped_attribute_specs pointers. 
* tree-inline.cc (function_attribute_inlinable_p): Update accordingly. * doc/tm.texi: Regenerate. * config/aarch64/aarch64.cc (aarch64_attribute_table): Define using TARGET_GNU_ATTRIBUTES. * config/alpha/alpha.cc (vms_attribute_table): Likewise. * config/avr/avr.cc (avr_attribute_table): Likewise. * config/bfin/bfin.cc (bfin_attribute_table): Likewise. * config/bpf/bpf.cc (bpf_attribute_table): Likewise. * config/csky/csky.cc (csky_attribute_table): Likewise. * config/epiphany/epiphany.cc (epiphany_attribute_table): Likewise. * config/gcn/gcn.cc (gcn_attribute_table): Likewise. * config/h8300/h8300.cc (h8300_attribute_table): Likewise. * config/loongarch/loongarch.cc (loongarch_attribute_table): Likewise. * config/m32c/m32c.cc (m32c_attribute_table): Likewise. * config/m32r/m32r.cc (m32r_attribute_table): Likewise. * config/m68k/m68k.cc (m68k_attribute_table): Likewise. * config/mcore/mcore.cc (mcore_attribute_table): Likewise. * config/microblaze/microblaze.cc (microblaze_attribute_table): Likewise. * config/mips/mips.cc (mips_attribute_table): Likewise. * config/msp430/msp430.cc (msp430_attribute_table): Likewise. * config/nds32/nds32.cc (nds32_attribute_table): Likewise. * config/nvptx/nvptx.cc (nvptx_attribute_table): Likewise. * config/riscv/riscv.cc (riscv_attribute_table): Likewise. * config/rl78/rl78.cc (rl78_attribute_table): Likewise. * config/rx/rx.cc (rx_attribute_table): Likewise. * config/s390/s390.cc (s390_attribute_table): Likewise. * config/sh/sh.cc (sh_attribute_table): Likewise. * config/sparc/sparc.cc (sparc_attribute_table): Likewise. * config/stormy16/stormy16.cc (xstormy16_attribute_table): Likewise. * config/v850/v850.cc (v850_attribute_table): Likewise. * config/visium/visium.cc (visium_attribute_table): Likewise. * config/arc/arc.cc (arc_attribute_table): Likewise. Move further down file. * config/arm/arm.cc (arm_attribute_table): Update for above changes, using... (arm_gnu_attributes, arm_gnu_attribute_table): ...these new globals. 
* config/i386/i386-options.h (ix86_attribute_table): Delete. (ix86_gnu_attribute_table): Declare. * config/i386/i386-options.cc (ix86_attribute_table): Replace with... (ix86_gnu_attributes, ix86_gnu_attribute_table): ...these two globals. * config/i386/i386.cc (ix86_attribute_table): Define as an array of scoped_attribute_specs pointers. * config/ia64/ia64.cc (ia64_attribute_table): Update for above changes, using... (ia64_gnu_attributes, ia64_gnu_attribute_table): ...these new globals. * config/rs6000/rs6000.cc (rs6000_attribute_table): Update for above changes, using... (rs6000_gnu_attributes, rs6000_gnu_attribute_table): ...these new globals. gcc/ada/ * gcc-interface/gigi.h (gnat_internal_attribute_table): Change type to scoped_attribute_specs. * gcc-interface/utils.cc (gnat_internal_attribute_table): Likewise, using... (gnat_internal_attributes): ...this as the underlying array. * gcc-interface/misc.cc (gnat_attribute_table): New global. (LANG_HOOKS_ATTRIBUTE_TABLE): Use it. gcc/c-family/ * c-common.h (c_common_attribute_table): Replace with... (c_common_gnu_attribute_table): ...this. (c_common_format_attribute_table): Change type to scoped_attribute_specs. * c-attribs.cc (c_common_attribute_table): Replace with... (c_common_gnu_attributes, c_common_gnu_attribute_table): ...these new globals. (c_common_format_attribute_table): Change type to scoped_attribute_specs, using... (c_common_format_attributes): ...this as the underlying array. gcc/c/ * c-tree.h (std_attribute_table): Declare. * c-decl.cc (std_attribute_table): Change type to scoped_attribute_specs, using... (std_attributes): ...this as the underlying array. (c_init_decl_processing): Remove call to register_scoped_attributes. * c-objc-common.h (c_objc_attribute_table): New global. (LANG_HOOKS_ATTRIBUTE_TABLE): Use it. (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete. (LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete. gcc/cp/ * cp-tree.h (cxx_attribute_table): Delete. 
(cxx_gnu_attribute_table, std_attribute_table): Declare. * cp-objcp-common.h (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete. (LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete. (cp_objcp_attribute_table): New table. (LANG_HOOKS_ATTRIBUTE_TABLE): Redefine. * tree.cc (cxx_attribute_table): Replace with... (cxx_gnu_attributes, cxx_gnu_attribute_table): ...these globals. (std_attribute_table): Change type to scoped_attribute_specs, using... (std_attributes): ...this as the underlying array. (init_tree): Remove call to register_scoped_attributes. gcc/d/ * d-tree.h (d_langhook_attribute_table): Replace with... (d_langhook_gnu_attribute_table): ...this. (d_langhook_common_attribute_table): Change type to scoped_attribute_specs. * d-attribs.cc (d_langhook_common_attribute_table): Change type to scoped_attribute_specs, using... (d_langhook_common_attributes): ...this as the underlying array. (d_langhook_attribute_table): Replace with... (d_langhook_gnu_attributes, d_langhook_gnu_attribute_table): ...these new globals. (uda_attribute_p): Update accordingly, and update for new targetm.attribute_table type. * d-lang.cc (d_langhook_attribute_table): New global. (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete. gcc/fortran/ * f95-lang.cc: Include attribs.h. (gfc_attribute_table): Change to an array of scoped_attribute_specs pointers, using... (gfc_gnu_attributes, gfc_gnu_attribute_table): ...these new globals. gcc/jit/ * dummy-frontend.cc (jit_format_attribute_table): Change type to scoped_attribute_specs, using... (jit_format_attributes): ...this as the underlying array. (jit_attribute_table): Change to an array of scoped_attribute_specs pointers, using... (jit_gnu_attributes, jit_gnu_attribute_table): ...these new globals for the original array. Include the format attributes. (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete. (LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete. (LANG_HOOKS_ATTRIBUTE_TABLE): Define. 
gcc/lto/ * lto-lang.cc (lto_format_attribute_table): Change type to scoped_attribute_specs, using... (lto_format_attributes): ...this as the underlying array. (lto_attribute_table): Change to an array of scoped_attribute_specs pointers, using... (lto_gnu_attributes, lto_gnu_attribute_table): ...these new globals for the original array. Include the format attributes. (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete. (LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete. (LANG_HOOKS_ATTRIBUTE_TABLE): Define.
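The "table of tables" layout described in this commit can be pictured with a small C model. This is a sketch under stated assumptions, not GCC's definitions: the real `scoped_attribute_specs` uses an `array_slice` and full `attribute_spec` entries, whereas the hypothetical `toy_` types below keep only a name, a namespace string, and an explicit element count in place of a null terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced attribute entry.  */
struct toy_attribute_spec { const char *name; };

/* Hypothetical stand-in for scoped_attribute_specs: one namespace plus
   a counted list of attributes in that namespace.  */
struct toy_scoped_attribute_specs {
  const char *ns;                          /* NULL models the standard namespace */
  const struct toy_attribute_spec *attrs;
  size_t count;                            /* explicit length, no null terminator */
};

static const struct toy_attribute_spec toy_gnu_attrs[] = {
  { "interrupt" }, { "no_callee_saved_registers" }
};
static const struct toy_attribute_spec toy_std_attrs[] = {
  { "noreturn" }, { "deprecated" }
};

static const struct toy_scoped_attribute_specs toy_gnu_table = {
  "gnu", toy_gnu_attrs, sizeof toy_gnu_attrs / sizeof toy_gnu_attrs[0]
};
static const struct toy_scoped_attribute_specs toy_std_table = {
  NULL, toy_std_attrs, sizeof toy_std_attrs / sizeof toy_std_attrs[0]
};

/* The table of tables: a list of pointers to scoped tables.  */
static const struct toy_scoped_attribute_specs *const toy_attribute_table[] = {
  &toy_gnu_table, &toy_std_table
};

/* Namespaces match if both are NULL or both name the same string.  */
static int toy_same_ns(const char *a, const char *b)
{
  if (a == NULL || b == NULL)
    return a == b;
  return strcmp(a, b) == 0;
}

/* Look an attribute up by namespace and name; NULL if not found.  */
const struct toy_attribute_spec *toy_lookup(const char *ns, const char *name)
{
  size_t ntables = sizeof toy_attribute_table / sizeof toy_attribute_table[0];
  for (size_t t = 0; t < ntables; t++) {
    const struct toy_scoped_attribute_specs *tab = toy_attribute_table[t];
    if (!toy_same_ns(tab->ns, ns))
      continue;
    for (size_t i = 0; i < tab->count; i++)
      if (strcmp(tab->attrs[i].name, name) == 0)
        return &tab->attrs[i];
  }
  return NULL;
}
```

Because the namespace lives on the table rather than on each entry, `toy_lookup("gnu", "interrupt")` succeeds while `toy_lookup(NULL, "interrupt")` and `toy_lookup("clang", "interrupt")` return `NULL`; supporting another namespace in this model means appending one more scoped table to `toy_attribute_table`.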
-
- Nov 24, 2023
-
-
Uros Bizjak authored
For -mcmodel=large, we have to load function address to a register. PR target/112686 gcc/ChangeLog: * config/i386/i386.cc (ix86_expand_split_stack_prologue): Load function address to a register for ix86_cmodel == CM_LARGE. gcc/testsuite/ChangeLog: * gcc.target/i386/pr112686.c: New test.
-
- Nov 23, 2023
-
-
Uros Bizjak authored
With the above two options, use a temporary register regno (as returned from split_stack_prologue_scratch_regno) as an indirect call scratch register to hold the __morestack function address. On 64-bit targets, two temporary registers are always available, so load the function address in %r11 and call __morestack_large_model with its one-argument-register value in %r10. On 32-bit targets, bail out with a "sorry" if the temporary register cannot be obtained. On 32-bit targets, also emit a PIC sequence that re-uses the obtained indirect call scratch register before moving the function address to it. We cannot set up the %ebx PIC register in this case, but __morestack is prepared for this situation and sets it up by itself. PR target/89316 gcc/ChangeLog: * config/i386/i386.cc (ix86_expand_split_stack_prologue): Obtain scratch regno when flag_force_indirect_call is set. On 64-bit targets, call __morestack_large_model when flag_force_indirect_call is set and on 32-bit targets with -fpic, manually expand PIC sequence to call __morestack. Move the function address to an indirect call scratch register. gcc/testsuite/ChangeLog: * g++.target/i386/pr89316.C: New test. * gcc.target/i386/pr112605-1.c: New test. * gcc.target/i386/pr112605-2.c: New test. * gcc.target/i386/pr112605.c: New test.
-
- Nov 21, 2023
-
-
Hongyu Wang authored
PPX stands for Push-Pop Acceleration. PUSH/PUSH2 and its corresponding POP can be marked with a 1-bit hint to indicate that the POP reads the value written by the PUSH from the stack. The processor tracks these marked instructions internally and fast-forwards register data between matching PUSH and POP instructions, without going through memory or through the training loop of the Fast Store Forwarding Predictor (FSFP). This feature can also be applied to PUSH2/POP2. For GCC, we emit an explicit suffix 'p' (paired) to indicate that the push/pop pair is marked with the PPX hint. To separate them from the original push/pop, we add an UNSPEC on top of those PUSH/POP patterns. In the first implementation we only emit them in the prologue/epilogue when saving/restoring callee-saved registers, to make sure push/pop are paired. So an extra flag was added to check if PPX insns can be emitted for those register save/restore interfaces. The PPX hint is purely a performance hint. If the 'p' suffix is not emitted for paired push/pop, the PPX optimization will be disabled, while program semantics will not be affected at all. gcc/ChangeLog: * config/i386/i386-expand.h (gen_push): Add default bool parameter. (gen_pop): Likewise. * config/i386/i386-opts.h (enum apx_features): Add apx_ppx, add it to apx_all. * config/i386/i386.cc (ix86_emit_restore_reg_using_pop): Add ppx_p parameter for function declaration. (gen_push2): Add ppx_p parameter, emit push2p if ppx_p is true. (gen_push): Likewise. (ix86_emit_restore_reg_using_pop2): Likewise for pop2p. (ix86_emit_save_regs): Emit pushp/push2p under TARGET_APX_PPX. (ix86_emit_restore_reg_using_pop): Add ppx_p, emit popp insn and adjust cfi when ppx_p is true. (ix86_emit_restore_reg_using_pop2): Add ppx_p and pass it to its callee. (ix86_emit_restore_regs_using_pop2): Likewise. (ix86_expand_epilogue): Pass TARGET_APX_PPX to ix86_emit_restore_reg_using_pop. * config/i386/i386.h (TARGET_APX_PPX): New. * config/i386/i386.md (UNSPEC_APX_PPX): New unspec. 
(pushp_di): New define_insn. (popp_di): Likewise. (push2p_di): Likewise. (pop2p_di): Likewise. * config/i386/i386.opt: Add apx_ppx enum. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-interrupt-1.c: Adjust option to restrict them under certain subfeatures. * gcc.target/i386/apx-push2pop2-1.c: Likewise. * gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise. * gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise. * gcc.target/i386/apx-ppx-1.c: New test.
-
- Nov 13, 2023
-
-
Uros Bizjak authored
Flags reg is valid only with CC mode. gcc/ChangeLog: * config/i386/i386-expand.h (gen_pushfl): New prototype. (gen_popfl): Ditto. * config/i386/i386-expand.cc (ix86_expand_builtin) [case IX86_BUILTIN_READ_FLAGS]: Use gen_pushfl. [case IX86_BUILTIN_WRITE_FLAGS]: Use gen_popfl. * config/i386/i386.cc (gen_pushfl): New function. (gen_popfl): Ditto. * config/i386/i386.md (unspec): Add UNSPEC_PUSHFL and UNSPEC_POPFL. (@pushfl<mode>2): Rename from *pushfl<mode>2. Rewrite as unspec using UNSPEC_PUSHFL. (@popfl<mode>1): Rename from *popfl<mode>1. Rewrite as unspec using UNSPEC_POPFL.
-
Uros Bizjak authored
Combine wants to combine following instructions into an insn that can perform both an (arithmetic) operation and set the condition code. During the conversion a new RTX is created, and combine passes the RTX code of the innermost RTX expression of the CC use insn in which CC reg is used to SELECT_CC_MODE, to determine the new mode of the comparison: Trying 5 -> 8: 5: r98:DI=0xd7 8: flags:CCZ=cmp(r98:DI,0) REG_EQUAL cmp(0xd7,0) Failed to match this instruction: (parallel [ (set (reg:CC 17 flags) (compare:CC (const_int 215 [0xd7]) (const_int 0 [0]))) (set (reg/v:DI 98 [ flags ]) (const_int 215 [0xd7])) ]) where: (insn 5 2 6 2 (set (reg/v:DI 98 [ flags ]) (const_int 215 [0xd7])) "pr112494.c":8:8 84 {*movdi_internal} (nil)) (insn 8 7 11 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/v:DI 98 [ flags ]) (const_int 0 [0]))) "pr112494.c":11:9 8 {*cmpdi_ccno_1} (expr_list:REG_EQUAL (compare:CCZ (const_int 215 [0xd7]) (const_int 0 [0])) (nil))) x86_cc_mode (AKA SELECT_CC_MODE) is not prepared to handle random RTX codes and triggers gcc_unreachable() when SET RTX code is passed to it. The patch removes gcc_unreachable() and returns CCmode for unknown RTX codes, so combine can try various combinations involving CC reg without triggering ICE. Please note that x86 MOV instructions do not set flags, so the above combination is not recognized as a valid x86 instruction. PR target/112494 gcc/ChangeLog: * config/i386/i386.cc (ix86_cc_mode) [default]: Return CCmode. gcc/testsuite/ChangeLog: * gcc.target/i386/pr112494.c: New test.
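The fallback behaviour described in this commit can be illustrated with a minimal C sketch. The enum names and the function `select_cc_mode_sketch` are hypothetical stand-ins, not GCC's actual definitions, and the real `ix86_cc_mode` distinguishes many more comparison codes and modes. The sketch only shows the shape of the fix: an unexpected code such as SET now falls through to the generic CC mode instead of aborting.

```c
/* Hypothetical, heavily simplified model of a SELECT_CC_MODE-style hook.
   The names below are illustrative and do not match GCC's real types.  */
enum rtx_code_sketch { SK_EQ, SK_NE, SK_SET, SK_OTHER };
enum cc_mode_sketch { SK_CCmode, SK_CCZmode };

static enum cc_mode_sketch
select_cc_mode_sketch (enum rtx_code_sketch code)
{
  switch (code)
    {
    case SK_EQ:
    case SK_NE:
      /* Equality tests only need the zero flag.  */
      return SK_CCZmode;
    default:
      /* In this model the old behaviour was an abort here; the patched
         behaviour is to return the generic CC mode for unknown codes.  */
      return SK_CCmode;
    }
}
```

With this shape, a caller that probes the hook with an arbitrary code gets a conservative answer and can then reject the candidate instruction through normal pattern matching.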
-
- Nov 11, 2023
-
-
Richard Sandiford authored
This patch passes the set of live hard registers to the after hook, like the previous one did for the needed hook. gcc/ * target.def (mode_switching.after): Add a regs_live parameter. * doc/tm.texi: Regenerate. * config/epiphany/epiphany-protos.h (epiphany_mode_after): Update accordingly. * config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise. (epiphany_mode_after): Likewise. * config/i386/i386.cc (ix86_mode_after): Likewise. * config/riscv/riscv.cc (riscv_mode_after): Likewise. * config/sh/sh.cc (sh_mode_after): Likewise. * mode-switching.cc (optimize_mode_switching): Likewise.
-
Richard Sandiford authored
The emit hook already takes the set of live hard registers as input. This patch passes it to the needed hook too. SME uses this to optimise the mode choice based on whether state is live or dead. The main caller already had access to the required info, but the special handling of return values did not. gcc/ * target.def (mode_switching.needed): Add a regs_live parameter. * doc/tm.texi: Regenerate. * config/epiphany/epiphany-protos.h (epiphany_mode_needed): Update accordingly. * config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise. * config/epiphany/mode-switch-use.cc (insert_uses): Likewise. * config/i386/i386.cc (ix86_mode_needed): Likewise. * config/riscv/riscv.cc (riscv_mode_needed): Likewise. * config/sh/sh.cc (sh_mode_needed): Likewise. * mode-switching.cc (optimize_mode_switching): Likewise. (create_pre_exit): Likewise, using the DF simulate functions to calculate the required information.
-
- Nov 09, 2023
-
-
Alexandre Oliva authored
Looking at the code generated for sse2-{load,store}-multi.c with PIE, I realized we could use UNSPEC_GOTOFF as a base address, and that this would enable the test to use the vector insns expected by the tests even with PIC, so I extended the base + offset logic used by the SSE2 multi-load/store peepholes to accept reg + symbolic base + offset too, so that the test generated the expected insns even with PIE. for gcc/ChangeLog * config/i386/i386.cc (symbolic_base_address_p, base_address_p): New, factored out from... (extract_base_offset_in_addr): ... here and extended to recognize REG+GOTOFF, as in gcc.target/i386/sse2-load-multi.c and sse2-store-multi.c with PIE enabled by default.
-
- Nov 06, 2023
-
-
Uros Bizjak authored
Use "addr" attribute with "gpr8" value to limit address register class to non-REX registers in instructions with high registers, where REX registers can not be used in the address. gcc/ChangeLog: * config/i386/constraints.md (Bc): Remove constraint. (Bn): Rewrite to use x86_extended_reg_mentioned_p predicate. * config/i386/i386.cc (ix86_memory_address_reg_class): Do not limit processing to TARGET_APX_EGPR. Exit early for NULL insn. Do not check recog_data.insn before calling extract_insn_cached. (ix86_insn_base_reg_class): Handle ADDR_GPR8. (ix86_regno_ok_for_insn_base_p): Ditto. (ix86_insn_index_reg_class): Ditto. * config/i386/i386.md (*cmpqi_ext<mode>_1_mem_rex64): Remove insn pattern and corresponding peephole2 pattern. (*cmpi_ext<mode>_1): Remove (m,Q) alternative. Change (QBc,Q) alternative to (QBn,Q). Add "addr" attribute. (*cmpqi_ext<mode>_3_mem_rex64): Remove insn pattern and corresponding peephole2 pattern. (*cmpi_ext<mode>_3): Remove (Q,m) alternative. Change (Q,QnBc) alternative to (Q,QnBn). Add "addr" attribute. (*extzvqi_mem_rex64): Remove insn pattern and corresponding peephole2 pattern. (*extzvqi): Remove (Q,m) alternative. Change (Q,QnBc) alternative to (Q,QnBn). Add "addr" attribute. (*insvqi_1_mem_rex64): Remove insn pattern and corresponding peephole2 pattern. (*insvqi_1): Remove (Q,m) alternative. Change (Q,QnBc) alternative to (Q,QnBn). Add "addr" attribute. (@insv<mode>_1): Ditto. (*addqi_ext<mode>_0): Remove (m,0,Q) alternative. Change (QBc,0,Q) alternative to (QBn,0,Q). Add "addr" attribute. (*subqi_ext<mode>_0): Ditto. (*andqi_ext<mode>_0): Ditto. (*<any_or:code>qi_ext<mode>_0): Ditto. (*addqi_ext<mode>_1): Remove (Q,0,m) alternative. Change (Q,0,QnBc) alternative to (Q,0,QnBn). Add "addr" attribute. (*andqi_ext<mode>_1): Ditto. (*andqi_ext<mode>_1_cc): Ditto. (*<any_or:code>qi_ext<mode>_1): Ditto. (*xorqi_ext<mode>_1_cc): Ditto. * config/i386/predicates.md (nonimm_x64constmem_operand): Remove predicate. 
(general_x64constmem_operand): Ditto. (norex_memory_operand): Ditto.
-
- Nov 03, 2023
-
-
Uros Bizjak authored
The patch generalizes address register class handling to allow multiple register classes. For APX EGPR targets, some instructions do not support GPR32 registers, so it is necessary to limit address register set to avoid them. The same situation happens for instructions with high registers, where REX registers can not be used in the address, so the existing infrastructure can be adapted to also handle this case. The patch is mostly a mechanical rename of "gpr32" attribute to "addr" and introduces no functional changes, although it fixes a couple of inconsistent attribute values in passing. A follow-up patch will use the above infrastructure to limit address register class to legacy registers for instructions with high registers. gcc/ChangeLog: * config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p): Rename to ... (ix86_memory_address_reg_class): ... this. Generalize address register class handling to allow multiple address register classes. Return maximal class for unrecognized instructions. Improve comments. (ix86_insn_base_reg_class): Rewrite to handle multiple address register classes. (ix86_regno_ok_for_insn_base_p): Ditto. (ix86_insn_index_reg_class): Ditto. * config/i386/i386.md: Rename "gpr32" attribute to "addr" and substitute its values with "0" -> "gpr16", "1" -> "*". (addr): New attribute to limit allowed address register set. (gpr32): Remove. * config/i386/mmx.md: Rename "gpr32" attribute to "addr" and substitute its values with "0" -> "gpr16", "1" -> "*". * config/i386/sse.md: Ditto.
-
- Oct 27, 2023
-
-
liuhongt authored
gcc/ChangeLog: PR target/103861 * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle V2HF/V2BF/V4HF/V4BFmode. * config/i386/i386.cc (ix86_get_mask_mode): Return QImode when data_mode is V4HF/V2HFmode. * config/i386/mmx.md (vec_cmpv4hfqi): New expander. (vcond_mask_<mode>v4hi): Ditto. (vcond_mask_<mode>qi): Ditto. (vec_cmpv2hfqi): Ditto. (vcond_mask_<mode>v2hi): Ditto. (mmx_plendvb_<mode>): Add 2 combine splitters after the patterns. (mmx_pblendvb_v8qi): Ditto. (<code>v2hi3): Add a combine splitter after the pattern. (<code><mode>3): Ditto. (<code>v8qi3): Ditto. (<code><mode>3): Ditto. * config/i386/sse.md (vcond<mode><mode>): Merge this with .. (vcond<sseintvecmodelower><mode>): .. this into .. (vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): .. this, and extend to V8BF/V16BF/V32BFmode. gcc/testsuite/ChangeLog: * g++.target/i386/part-vect-vcondhf.C: New test. * gcc.target/i386/part-vect-vec_cmphf.c: New test.
-
- Oct 23, 2023
-
-
Haochen Jiang authored
Currently, there will be a chance in split to use x/ymm16+ w/o AVX512VL, which finally leads to an ICE as pr111753 does. This patch aims to fix that. gcc/ChangeLog: PR target/111753 * config/i386/i386.cc (ix86_standard_x87sse_constant_load_p): Do not split to xmm16+ when !TARGET_AVX512VL. gcc/testsuite/ChangeLog: PR target/111753 * gcc.target/i386/pr111753.c: New test.
-
- Oct 22, 2023
-
-
Andrew Burgess authored
Enable -ftrampoline-impl=heap by default if we are on macOS 11 or later. Co-Authored-By:
Maxim Blinov <maxim.blinov@embecosm.com> Co-Authored-By:
Francois-Xavier Coudert <fxcoudert@gcc.gnu.org> Co-Authored-By:
Iain Sandoe <iain@sandoe.co.uk> gcc/ChangeLog: * config.gcc: Default to heap trampolines on macOS 11 and above. * config/i386/darwin.h: Define X86_CUSTOM_FUNCTION_TEST. * config/i386/i386.h: Define X86_CUSTOM_FUNCTION_TEST. * config/i386/i386.cc: Use X86_CUSTOM_FUNCTION_TEST.
-
- Oct 16, 2023
-
-
Uros Bizjak authored
From: Fangrui Song <maskray@google.com> When using -mcmodel=medium, large data objects larger than the -mlarge-data-threshold threshold are placed into large data sections (.lrodata, .ldata, .lbss and some variants). GNU ld and ld.lld 17 place .l* sections into separate output sections. If small and medium code model object files are mixed, the .l* sections won't exert relocation overflow pressure on sections in object files built with -mcmodel=small. However, when using -mcmodel=large, -mlarge-data-threshold doesn't apply. This means that the .rodata/.data/.bss sections may exert relocation overflow pressure on sections in -mcmodel=small object files. This patch allows -mcmodel=large to generate .l* sections and drops an unneeded documentation restriction that the value must be the same. Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU ("Large data sections for the large code model") Signed-off-by:
Fangrui Song <maskray@google.com> gcc/ChangeLog: * config/i386/i386.cc (ix86_can_inline_p): Handle CM_LARGE and CM_LARGE_PIC. (x86_elf_aligned_decl_common): Ditto. (x86_output_aligned_bss): Ditto. * config/i386/i386.opt: Update doc for -mlarge-data-threshold=. * doc/invoke.texi: Update doc for -mlarge-data-threshold=. gcc/testsuite/ChangeLog: * gcc.target/i386/large-data.c: New test.
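As a rough illustration of the section choice this commit describes, the following C sketch places an object in a large data section when the code model is medium or large and the object exceeds the threshold. All names here (`code_model_sketch`, `in_large_data_sketch`, `bss_section_sketch`) are hypothetical; the real checks in config/i386/i386.cc handle more cases than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; GCC's real logic lives in config/i386/i386.cc.  */
enum code_model_sketch { SK_SMALL, SK_MEDIUM, SK_LARGE };

/* True if an object of SIZE bytes should go into a large data section.  */
static bool
in_large_data_sketch (enum code_model_sketch model, size_t size,
                      size_t threshold)
{
  /* Per the commit message, the medium model already used the threshold
     and the patch extends the same treatment to the large model.  */
  if (model != SK_MEDIUM && model != SK_LARGE)
    return false;
  return size > threshold;
}

/* Pick the section name for zero-initialized data.  */
static const char *
bss_section_sketch (enum code_model_sketch model, size_t size,
                    size_t threshold)
{
  return in_large_data_sketch (model, size, threshold) ? ".lbss" : ".bss";
}
```

The same predicate would select `.ldata` or `.lrodata` for initialized and read-only objects; only the BSS case is shown to keep the sketch short.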
-
- Oct 12, 2023
-
-
Mo, Zewei authored
PUSH2/POP2 requires the stack to be aligned to 16 bytes, therefore in the prologue/epilogue a standalone push/pop is emitted before any push2/pop2 if the stack is not 16-byte aligned. Also, the current implementation only supports push2/pop2 in the function prologue/epilogue for callee-saved registers. gcc/ChangeLog: * config/i386/i386.cc (gen_push2): New function to emit push2 and adjust cfa offset. (ix86_pro_and_epilogue_can_use_push2_pop2): New function to determine whether push2/pop2 can be used. (ix86_compute_frame_layout): Adjust preferred stack boundary and stack alignment needed for push2/pop2. (ix86_emit_save_regs): Emit push2 when available. (ix86_emit_restore_reg_using_pop2): New function to emit pop2 and adjust cfa info. (ix86_emit_restore_regs_using_pop2): New function to loop through the saved regs and call above. (ix86_expand_epilogue): Call ix86_emit_restore_regs_using_pop2 when push2pop2 available. * config/i386/i386.md (push2_di): New pattern for push2. (pop2_di): Likewise for pop2. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-push2pop2-1.c: New test. * gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise. * gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise. Co-authored-by:
Hu Lin1 <lin1.hu@intel.com> Co-authored-by:
Hongyu Wang <hongyu.wang@intel.com>
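The alignment rule in this commit can be sketched as a small planning function in C. This is a hypothetical model, not the code in `ix86_emit_save_regs`: it assumes a misaligned stack is off by exactly one 8-byte slot, so a single leading push restores 16-byte alignment, and it assumes a leftover odd register is saved with a single trailing push.

```c
/* Hypothetical plan for saving NREGS callee-saved registers.  */
struct save_plan_sketch
{
  int leading_push;   /* single push used to reach 16-byte alignment */
  int push2_pairs;    /* number of push2 instructions, two regs each */
  int trailing_push;  /* single push for a leftover odd register */
};

static struct save_plan_sketch
plan_saves_sketch (int nregs, int aligned16)
{
  struct save_plan_sketch plan = { 0, 0, 0 };

  if (nregs <= 0)
    return plan;

  /* Assumption: one 8-byte push is enough to realign the stack.  */
  if (!aligned16)
    {
      plan.leading_push = 1;
      nregs--;
    }

  plan.push2_pairs = nregs / 2;
  plan.trailing_push = nregs % 2;
  return plan;
}
```

For example, five registers on a misaligned stack yield one leading push followed by two push2 instructions, while the same five registers on an aligned stack yield two push2 instructions and one trailing push.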
-
- Oct 09, 2023
-
-
Haochen Jiang authored
gcc/Changelog: * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate): Make sure there is EVEX512 enabled. (ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512. * config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64 bit mask when !TARGET_EVEX512. * config/i386/i386.md (avx512bw_512): New. (SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512. (*zero_extendsidi2): Change isa to avx512bw_512. (kmov_isa): Ditto. (*anddi_1): Ditto. (*andn<mode>_1): Change isa to kmov_isa. (*<code><mode>_1): Ditto. (*notxor<mode>_1): Ditto. (*one_cmpl<mode>2_1): Ditto. (*one_cmplsi2_1_zext): Change isa to avx512bw_512. (*ashl<mode>3_1): Change isa to kmov_isa. (*lshr<mode>3_1): Ditto. * config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512. (VI1248_AVX512VLBW): Ditto. (VHFBF_AVX512VL): Ditto. (VI): Ditto. (VIHFBF): Ditto. (VI_AVX2): Ditto. (VI1_AVX512): Ditto. (VI12_256_512_AVX512VL): Ditto. (VI2_AVX2_AVX512BW): Ditto. (VI2_AVX512VNNIBW): Ditto. (VI2_AVX512VL): Ditto. (VI2HFBF_AVX512VL): Ditto. (VI8_AVX2_AVX512BW): Ditto. (VIMAX_AVX2_AVX512BW): Ditto. (VIMAX_AVX512VL): Ditto. (VI12_AVX2_AVX512BW): Ditto. (VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto. (VI248_AVX512VL): Ditto. (VI248_AVX512VLBW): Ditto. (VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto. (VI248_AVX512BW): Ditto. (VI248_AVX512BW_AVX512VL): Ditto. (VI248_512): Ditto. (VI124_256_AVX512F_AVX512BW): Ditto. (VI_AVX512BW): Ditto. (VIHFBF_AVX512BW): Ditto. (SWI1248_AVX512BWDQ): Ditto. (SWI1248_AVX512BW): Ditto. (SWI1248_AVX512BWDQ2): Ditto. (*knotsi_1_zext): Ditto. (define_split for zero_extend + not): Ditto. (kunpckdi): Ditto. (REDUC_SMINMAX_MODE): Ditto. (VEC_EXTRACT_MODE): Ditto. (*avx512bw_permvar_truncv16siv16hi_1): Ditto. (*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto. (truncv32hiv32qi2): Ditto. (avx512bw_<code>v32hiv32qi2): Ditto. (avx512bw_<code>v32hiv32qi2_mask): Ditto. (avx512bw_<code>v32hiv32qi2_mask_store): Ditto. (usadv64qi): Ditto. (VEC_PERM_AVX2): Ditto. (AVX512ZEXTMASK): Ditto. (SWI24_MASK): New. 
(vec_pack_trunc_<mode>): Change iterator to SWI24_MASK. (avx512bw_packsswb<mask_name>): Add TARGET_EVEX512. (avx512bw_packssdw<mask_name>): Ditto. (avx512bw_interleave_highv64qi<mask_name>): Ditto. (avx512bw_interleave_lowv64qi<mask_name>): Ditto. (<mask_codefor>avx512bw_pshuflwv32hi<mask_name>): Ditto. (<mask_codefor>avx512bw_pshufhwv32hi<mask_name>): Ditto. (vec_unpacks_lo_di): Ditto. (SWI48x_MASK): New. (vec_unpacks_hi_<mode>): Change iterator to SWI48x_MASK. (avx512bw_umulhrswv32hi3<mask_name>): Add TARGET_EVEX512. (VI1248_AVX512VL_AVX512BW): Ditto. (avx512bw_<code>v32qiv32hi2<mask_name>): Ditto. (*avx512bw_zero_extendv32qiv32hi2_1): Ditto. (*avx512bw_zero_extendv32qiv32hi2_2): Ditto. (<insn>v32qiv32hi2): Ditto. (pbroadcast_evex_isa): Change isa attribute to avx512bw_512. (VPERMI2): Add TARGET_EVEX512. (VPERMI2I): Ditto.
-
Haochen Jiang authored
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3): Add TARGET_EVEX512 for 512 bit usage. * config/i386/i386.cc (standard_sse_constant_opcode): Ditto. * config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto. (VF1_128_256VL): Ditto. (VF2_AVX512VL): Ditto. (VI8_256_512): Ditto. (<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>): Ditto. (AVX512_VEC): Ditto. (AVX512_VEC_2): Ditto. (VI4F_BRCST32x2): Ditto. (VI8F_BRCST64x2): Ditto.
-
Haochen Jiang authored
gcc/ChangeLog: * config/i386/i386-builtins.cc (ix86_vectorize_builtin_gather): Disable 512 bit gather when !TARGET_EVEX512. * config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode): Add TARGET_EVEX512. (ix86_expand_int_sse_cmp): Ditto. (ix86_expand_vector_init_one_nonzero): Disable subroutine when !TARGET_EVEX512. (ix86_emit_swsqrtsf): Add TARGET_EVEX512. (ix86_vectorize_vec_perm_const): Disable subroutine when !TARGET_EVEX512. * config/i386/i386.cc (standard_sse_constant_p): Add TARGET_EVEX512. (standard_sse_constant_opcode): Ditto. (ix86_get_ssemov): Ditto. (ix86_legitimate_constant_p): Ditto. (ix86_vectorize_builtin_scatter): Disable 512 bit scatter when !TARGET_EVEX512. * config/i386/i386.md (avx512f_512): New. (movxi): Add TARGET_EVEX512. (*movxi_internal_avx512f): Ditto. (*movdi_internal): Change alternative 12 to ?Yv. Adjust mode for alternative 13. (*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for alternative 9. (*movhi_internal): Change alternative 11 to *Yv. (*movdf_internal): Change alternative 12 to Yv. (*movsf_internal): Change alternative 5 to Yv. Adjust mode for alternative 5 and 6. (*mov<mode>_internal): Change alternative 4 to Yv. (define_split for convert SF to DF): Add TARGET_EVEX512. (extendbfsf2_1): Ditto. * config/i386/predicates.md (bcst_mem_operand): Disable predicate for 512 bit when !TARGET_EVEX512. * config/i386/sse.md (VMOVE): Add TARGET_EVEX512. (V48_AVX512VL): Ditto. (V48_256_512_AVX512VL): Ditto. (V48H_AVX512VL): Ditto. (VI12_AVX512VL): Ditto. (V): Ditto. (V_512): Ditto. (V_256_512): Ditto. (VF): Ditto. (VF1_VF2_AVX512DQ): Ditto. (VFH): Ditto. (VFB): Ditto. (VF1): Ditto. (VF1_AVX2): Ditto. (VF2): Ditto. (VF2H): Ditto. (VF2_512_256): Ditto. (VF2_512_256VL): Ditto. (VF_512): Ditto. (VFB_512): Ditto. (VI48_AVX512VL): Ditto. (VI1248_AVX512VLBW): Ditto. (VF_AVX512VL): Ditto. (VFH_AVX512VL): Ditto. (VF1_AVX512VL): Ditto. (VI): Ditto. (VIHFBF): Ditto. (VI_AVX2): Ditto. (VI8): Ditto. (VI8_AVX512VL): Ditto. (VI2_AVX512F): Ditto. 
(VI4_AVX512F): Ditto. (VI4_AVX512VL): Ditto. (VI48_AVX512F_AVX512VL): Ditto. (VI8_AVX2_AVX512F): Ditto. (VI8_AVX_AVX512F): Ditto. (V8FI): Ditto. (V16FI): Ditto. (VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto. (VI248_AVX512VLBW): Ditto. (VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto. (VI248_AVX512BW): Ditto. (VI248_AVX512BW_AVX512VL): Ditto. (VI48_AVX512F): Ditto. (VI48_AVX_AVX512F): Ditto. (VI12_AVX_AVX512F): Ditto. (VI148_512): Ditto. (VI124_256_AVX512F_AVX512BW): Ditto. (VI48_512): Ditto. (VI_AVX512BW): Ditto. (VIHFBF_AVX512BW): Ditto. (VI4F_256_512): Ditto. (VI48F_256_512): Ditto. (VI48F): Ditto. (VI12_VI48F_AVX512VL): Ditto. (V32_512): Ditto. (AVX512MODE2P): Ditto. (STORENT_MODE): Ditto. (REDUC_PLUS_MODE): Ditto. (REDUC_SMINMAX_MODE): Ditto. (*andnot<mode>3): Change isa attribute to avx512f_512. (*andnot<mode>3): Ditto. (<code><mode>3): Ditto. (<code>tf3): Ditto. (FMAMODEM): Add TARGET_EVEX512. (FMAMODE_AVX512): Ditto. (VFH_SF_AVX512VL): Ditto. (avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto. (fix<fixunssuffix>_truncv16sfv16si2<mask_name><round_saeonly_name>): Ditto. (avx512f_cvtdq2pd512_2): Ditto. (avx512f_cvtpd2dq512<mask_name><round_name>): Ditto. (fix<fixunssuffix>_truncv8dfv8si2<mask_name><round_saeonly_name>): Ditto. (<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto. (vec_unpacks_lo_v16sf): Ditto. (vec_unpacks_hi_v16sf): Ditto. (vec_unpacks_float_hi_v16si): Ditto. (vec_unpacks_float_lo_v16si): Ditto. (vec_unpacku_float_hi_v16si): Ditto. (vec_unpacku_float_lo_v16si): Ditto. (vec_pack_sfix_trunc_v8df): Ditto. (avx512f_vec_pack_sfix_v8df): Ditto. (<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto. (<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto. (<mask_codefor>avx512f_movshdup512<mask_name>): Ditto. (<mask_codefor>avx512f_movsldup512<mask_name>): Ditto. (AVX512_VEC): Ditto. (AVX512_VEC_2): Ditto. (vec_extract_lo_v64qi): Ditto. (vec_extract_hi_v64qi): Ditto. (VEC_EXTRACT_MODE): Ditto. 
(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto. (avx512f_movddup512<mask_name>): Ditto. (avx512f_unpcklpd512<mask_name>): Ditto. (*<avx512>_vternlog<mode>_all): Ditto. (*<avx512>_vpternlog<mode>_1): Ditto. (*<avx512>_vpternlog<mode>_2): Ditto. (*<avx512>_vpternlog<mode>_3): Ditto. (avx512f_shufps512_mask): Ditto. (avx512f_shufps512_1<mask_name>): Ditto. (avx512f_shufpd512_mask): Ditto. (avx512f_shufpd512_1<mask_name>): Ditto. (<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto. (<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto. (vec_dupv2df<mask_name>): Ditto. (trunc<pmov_src_lower><mode>2): Ditto. (*avx512f_<code><pmov_src_lower><mode>2): Ditto. (*avx512f_vpermvar_truncv8div8si_1): Ditto. (avx512f_<code><pmov_src_lower><mode>2_mask): Ditto. (avx512f_<code><pmov_src_lower><mode>2_mask_store): Ditto. (truncv8div8qi2): Ditto. (avx512f_<code>v8div16qi2): Ditto. (*avx512f_<code>v8div16qi2_store_1): Ditto. (*avx512f_<code>v8div16qi2_store_2): Ditto. (avx512f_<code>v8div16qi2_mask): Ditto. (*avx512f_<code>v8div16qi2_mask_1): Ditto. (*avx512f_<code>v8div16qi2_mask_store_1): Ditto. (avx512f_<code>v8div16qi2_mask_store_2): Ditto. (vec_widen_umult_even_v16si<mask_name>): Ditto. (*vec_widen_umult_even_v16si<mask_name>): Ditto. (vec_widen_smult_even_v16si<mask_name>): Ditto. (*vec_widen_smult_even_v16si<mask_name>): Ditto. (VEC_PERM_AVX2): Ditto. (one_cmpl<mode>2): Ditto. (<mask_codefor>one_cmpl<mode>2<mask_name>): Ditto. (*one_cmpl<mode>2_pternlog_false_dep): Ditto. (define_split to xor): Ditto. (*andnot<mode>3): Ditto. (define_split for ior): Ditto. (*iornot<mode>3): Ditto. (*xnor<mode>3): Ditto. (*<nlogic><mode>3): Ditto. (<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto. (<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto. (avx512f_pshufdv3_mask): Ditto. (avx512f_pshufd_1<mask_name>): Ditto. (*vec_extractv4ti): Ditto. (VEXTRACTI128_MODE): Ditto. (define_split to vec_extract): Ditto. (VI1248_AVX512VL_AVX512BW): Ditto. 
(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto. (<insn>v16qiv16si2): Ditto. (avx512f_<code>v16hiv16si2<mask_name>): Ditto. (<insn>v16hiv16si2): Ditto. (avx512f_zero_extendv16hiv16si2_1): Ditto. (avx512f_<code>v8qiv8di2<mask_name>): Ditto. (*avx512f_<code>v8qiv8di2<mask_name>_1): Ditto. (*avx512f_<code>v8qiv8di2<mask_name>_2): Ditto. (<insn>v8qiv8di2): Ditto. (avx512f_<code>v8hiv8di2<mask_name>): Ditto. (<insn>v8hiv8di2): Ditto. (avx512f_<code>v8siv8di2<mask_name>): Ditto. (*avx512f_zero_extendv8siv8di2_1): Ditto. (*avx512f_zero_extendv8siv8di2_2): Ditto. (<insn>v8siv8di2): Ditto. (avx512f_roundps512_sfix): Ditto. (vashrv8di3): Ditto. (vashrv16si3): Ditto. (pbroadcast_evex_isa): Change isa attribute to avx512f_512. (vec_dupv4sf): Add TARGET_EVEX512. (*vec_dupv4si): Ditto. (*vec_dupv2di): Ditto. (vec_dup<mode>): Change isa attribute to avx512f_512. (VPERMI2): Add TARGET_EVEX512. (VPERMI2I): Ditto. (VEC_INIT_MODE): Ditto. (VEC_INIT_HALF_MODE): Ditto. (<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>): Ditto. (avx512f_vcvtps2ph512_mask_sae): Ditto. (<mask_codefor>avx512f_vcvtps2ph512<mask_name><round_saeonly_name>): Ditto. (*avx512f_vcvtps2ph512<merge_mask_name>): Ditto. (INT_BROADCAST_MODE): Ditto.
-
Haochen Jiang authored
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_broadcast_from_constant): Disable zmm broadcast for !TARGET_EVEX512. * config/i386/i386-options.cc (ix86_option_override_internal): Do not use PVW_512 when no-evex512. (ix86_simd_clone_adjust): Add evex512 target into string. * config/i386/i386.cc (type_natural_mode): Report ABI warning when using zmm register w/o evex512. (ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512. (ix86_hard_regno_mode_ok): Ditto. (ix86_set_reg_reg_cost): Ditto. (ix86_rtx_costs): Ditto. (ix86_vector_mode_supported_p): Ditto. (ix86_preferred_simd_mode): Ditto. (ix86_get_mask_mode): Ditto. (ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit libmvec call when !TARGET_EVEX512. (ix86_simd_clone_usable): Ditto. * config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment when !TARGET_EVEX512 (MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512. (STORE_MAX_PIECES): Ditto.
-
- Oct 08, 2023
-
-
liuhongt authored
gcc/ChangeLog: * config/i386/i386.cc (ix86_build_const_vector): Handle V2HF and V4HFmode. (ix86_build_signbit_mask): Ditto. * config/i386/mmx.md (mmxintvecmode): Ditto. (<code><mode>2): New define_expand. (*mmx_<code><mode>): New define_insn_and_split. (*mmx_nabs<mode>2): Ditto. (*mmx_andnot<mode>3): New define_insn. (<code><mode>3): Ditto. (copysign<mode>3): New define_expand. (xorsign<mode>3): Ditto. (signbit<mode>2): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-absneghf.c: New test. * gcc.target/i386/part-vect-copysignhf.c: New test. * gcc.target/i386/part-vect-xorsignhf.c: New test.
-
- Oct 07, 2023
-
-
Kong Lingling authored
Disable EGPR usage for below legacy insns in opcode map2/3 that have vex but no evex counterpart. insn list: 1. phminposuw/vphminposuw 2. ptest/vptest 3. roundps/vroundps, roundpd/vroundpd, roundss/vroundss, roundsd/vroundsd 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist gcc/ChangeLog: * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New prototype. * config/i386/i386.cc (x86_evex_reg_mentioned_p): New function. * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0 and constraint jm to all non-evex alternatives, adjust alternative outputs if evex reg is mentioned. * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0 and constraint jm/ja to all non-evex alternatives. (ptesttf2): Likewise. (<sse4_1>_round<ssemodesuffix><avxsizesuffix): Likewise. (sse4_1_round<ssescalarmodesuffix>): Likewise. (sse4_2_pcmpestri): Likewise. (sse4_2_pcmpestrm): Likewise. (sse4_2_pcmpestr_cconly): Likewise. (sse4_2_pcmpistr): Likewise. (sse4_2_pcmpistri): Likewise. (sse4_2_pcmpistrm): Likewise. (sse4_2_pcmpistr_cconly): Likewise. (aesimc): Likewise. (aeskeygenassist): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic tests. Co-authored-by:
Hongyu Wang <hongyu.wang@intel.com> Co-authored-by:
Hongtao Liu <hongtao.liu@intel.com>
-
Hongyu Wang authored
For vector move insns like vmovdqa/vmovdqu, their EVEX counterparts require an explicit suffix 64/32/16/8. The usage of these instructions is prohibited under AVX10_1 or AVX512F, so we select vmovaps/vmovups for vector load/store insns that contain EGPR if there is no AVX512VL, and keep the original move insn selection otherwise. gcc/ChangeLog: * config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa. * config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0): Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa. (avx_vec_concat<mode>): Likewise, and separate alternative 0 to avx_noavx512f. Co-authored-by:
Kong Lingling <lingling.kong@intel.com> Co-authored-by:
Hongtao Liu <hongtao.liu@intel.com>
-
Kong Lingling authored
In inline asm, we do not know if the insn can use EGPR, so disable EGPR usage by default by mapping the common reg/mem constraints to non-EGPR constraints. The full mapping is: "g" -> "jrjmi", "r" -> "jr", "m" -> "jm", "<" -> "j<", ">" -> "j>", "o" -> "jo", "V" -> "jV", "p" -> "jp", "Bm" -> "ja". For memory constraints, we add an option -mapx-inline-asm-use-gpr32 to allow/disallow gpr32 usage in any memory related constraints, as base_reg_class/index_reg_class cannot know whether the asm insn supports gpr32 or not. gcc/ChangeLog: * config/i386/i386.cc (map_egpr_constraints): New function to map common constraints to EGPR prohibited constraints. (ix86_md_asm_adjust): Call map_egpr_constraints. * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-inline-gpr-norex2.c: New test. Co-authored-by:
Hongyu Wang <hongyu.wang@intel.com> Co-authored-by:
Hongtao Liu <hongtao.liu@intel.com>
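The constraint mapping listed in this commit can be expressed as a small lookup table. The C sketch below is only an illustration: the struct, table, and `map_constraint_sketch` are hypothetical names, and it maps one already-isolated constraint token at a time, whereas the real `map_egpr_constraints` has to walk full constraint strings.

```c
#include <stddef.h>
#include <string.h>

/* One entry of the mapping from a common constraint to its
   EGPR-prohibiting counterpart, as listed in the commit message.  */
struct constraint_map_sketch
{
  const char *from;
  const char *to;
};

static const struct constraint_map_sketch egpr_map_sketch[] = {
  { "g", "jrjmi" }, { "r", "jr" }, { "m", "jm" },
  { "<", "j<" },    { ">", "j>" }, { "o", "jo" },
  { "V", "jV" },    { "p", "jp" }, { "Bm", "ja" },
};

/* Return the mapped constraint for TOKEN, or TOKEN itself when the
   table has no entry for it.  */
static const char *
map_constraint_sketch (const char *token)
{
  size_t n = sizeof egpr_map_sketch / sizeof egpr_map_sketch[0];

  for (size_t i = 0; i < n; i++)
    if (strcmp (token, egpr_map_sketch[i].from) == 0)
      return egpr_map_sketch[i].to;
  return token;
}
```

Constraints outside the table, such as an SSE register constraint, pass through unchanged in this sketch.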
-
Kong Lingling authored
Add backend helper functions to verify whether an rtx_insn can use EGPR for the base/index reg of its memory operand. The verification rules are: 1. For asm insn, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32. 2. Disable EGPR for unrecognized insn. 3. If which_alternative is not decided, loop through enabled alternatives and check their attr_gpr32. Only enable EGPR when all enabled alternatives have attr_gpr32 = 1. 4. If which_alternative is decided, enable/disable EGPR by its corresponding attr_gpr32. gcc/ChangeLog: * config/i386/i386-protos.h (ix86_insn_base_reg_class): New prototype. (ix86_regno_ok_for_insn_base_p): Likewise. (ix86_insn_index_reg_class): Likewise. * config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p): New helper function to scan the insn. (ix86_insn_base_reg_class): New function to choose BASE_REG_CLASS. (ix86_regno_ok_for_insn_base_p): Likewise for base regno. (ix86_insn_index_reg_class): Likewise for INDEX_REG_CLASS. * config/i386/i386.h (INSN_BASE_REG_CLASS): Define. (REGNO_OK_FOR_INSN_BASE_P): Likewise. (INSN_INDEX_REG_CLASS): Likewise. (enum reg_class): Add INDEX_GPR16. (GENERAL_GPR16_REGNO_P): Define. * config/i386/i386.md (gpr32): New attribute. Co-authored-by:
Hongyu Wang <hongyu.wang@intel.com> Co-authored-by:
Hongtao Liu <hongtao.liu@intel.com>
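The four verification rules in this commit translate fairly directly into a decision function. The C sketch below uses a hypothetical `insn_sketch` struct in place of GCC's `rtx_insn` and recog data, so every field and function name is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the insn data the real helper inspects.  */
struct insn_sketch
{
  bool is_asm;             /* inline asm statement */
  bool recognized;         /* insn matched a known pattern */
  int which_alternative;   /* chosen alternative, or -1 if undecided */
  int n_alternatives;
  const bool *enabled;     /* per-alternative: is it enabled? */
  const bool *attr_gpr32;  /* per-alternative: gpr32 attribute == 1? */
};

static bool
may_use_egpr_in_address_sketch (const struct insn_sketch *insn,
                                bool inline_asm_use_gpr32)
{
  /* Rule 1: asm insns follow the command-line switch.  */
  if (insn->is_asm)
    return inline_asm_use_gpr32;

  /* Rule 2: unrecognized insns never get EGPR.  */
  if (!insn->recognized)
    return false;

  /* Rule 4: a decided alternative is checked on its own.  */
  if (insn->which_alternative >= 0)
    return insn->attr_gpr32[insn->which_alternative];

  /* Rule 3: otherwise every enabled alternative must allow gpr32.  */
  for (int i = 0; i < insn->n_alternatives; i++)
    if (insn->enabled[i] && !insn->attr_gpr32[i])
      return false;
  return true;
}
```

Note that disabled alternatives are ignored in rule 3, so an alternative with gpr32 = 0 only blocks EGPR while it is still a candidate.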
-