Commits · 51f8ac3341078303e81e72d9013698a31c5ddd29 · COBOLworx / gcc-cobol

Feb 05, 2024

x86-64: Find a scratch register for large model profiling · 51f8ac33

H.J. Lu authored 1 year ago

Two scratch registers, %r10 and %r11, are available at function entry for
large model profiling.  But %r10 may be used by stack realignment and we
can't use %r10 in this case.  Add x86_64_select_profile_regnum to find,
at entry for large model profiling, either a caller-saved register which
isn't live or a callee-saved register which has already been saved on the
stack in the prologue, and report "sorry" if we can't find one.
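
The selection policy described above can be sketched roughly as follows.
This is a simplified model and not the code added to i386.cc: struct
reg_state, its fields and select_profile_reg are hypothetical names, and the
real function consults GCC's own liveness and frame information instead of
a hand-filled table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one hard register at function entry.  */
struct reg_state {
  const char *name;
  bool callee_saved;      /* preserved across calls by the ABI */
  bool live_at_entry;     /* e.g. %r10 when stack realignment uses it */
  bool saved_in_prologue; /* already stored on the stack by the prologue */
};

/* Return the first usable scratch register in REGS, or NULL if there is
   none (the case in which the commit reports "sorry").  */
const struct reg_state *
select_profile_reg (const struct reg_state *regs, size_t n)
{
  assert (regs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      const struct reg_state *r = &regs[i];
      /* A caller-saved register is usable when nothing live is in it.  */
      if (!r->callee_saved && !r->live_at_entry)
        return r;
      /* A callee-saved register is usable once the prologue has saved it.  */
      if (r->callee_saved && r->saved_in_prologue)
        return r;
    }
  return NULL;
}
```

Listing the caller-saved candidates first in the table makes the sketch
prefer them, which matches the order in which the description names them.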

gcc/

	PR target/113689
	* config/i386/i386.cc (x86_64_select_profile_regnum): New.
	(x86_function_profiler): Call x86_64_select_profile_regnum to
	get a scratch register for large model profiling.

gcc/testsuite/

	PR target/113689
	* gcc.target/i386/pr113689-1.c: New file.
	* gcc.target/i386/pr113689-2.c: Likewise.
	* gcc.target/i386/pr113689-3.c: Likewise.

Jan 27, 2024

x86: Add no_callee_saved_registers function attribute · a96549dc

H.J. Lu authored 1 year ago

When an interrupt handler is implemented by an assembly stub which does:

1. Save all registers.
2. Call a C function.
3. Restore all registers.
4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.

Add no_callee_saved_registers function attribute, which is complementary
to no_caller_saved_registers function attribute, to mark a function which
doesn't have any callee-saved registers.  Such a function won't save and
restore any registers.  Classify function call-saved register handling
type with:

1. Default call-saved registers.
2. No caller-saved registers with no_caller_saved_registers attribute.
3. No callee-saved registers with no_callee_saved_registers attribute.

Disallow sibcall if callee is a no_callee_saved_registers function
and caller isn't a no_callee_saved_registers function.  Otherwise,
callee-saved registers won't be preserved.

After a no_callee_saved_registers function is called, all registers may
be clobbered.  If the calling function isn't a no_callee_saved_registers
function, we need to preserve all registers which aren't used by function
calls.
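
A minimal usage sketch for the attribute, assuming a compiler that has this
patch (the macro falls back to nothing elsewhere so the file still builds).
The names NO_CALLEE_SAVED, irq_count and handle_irq are made up for
illustration; the assembly stub that saves and restores all registers is
not shown.

```c
#include <assert.h>

/* Use the attribute only where the compiler knows it; elsewhere the macro
   expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(no_callee_saved_registers)
#  define NO_CALLEE_SAVED __attribute__((no_callee_saved_registers))
# endif
#endif
#ifndef NO_CALLEE_SAVED
# define NO_CALLEE_SAVED
#endif

static volatile int irq_count;   /* hypothetical state touched by the handler */

/* C body of an interrupt handler.  It is meant to be entered from an
   assembly stub that has already saved every register, so the function
   itself does not need to save or restore any of them.  */
NO_CALLEE_SAVED void
handle_irq (int vector)
{
  assert (vector >= 0);
  irq_count += vector;
}
```

Calling handle_irq from an ordinary C function is also allowed by the
description above; the caller then has to treat all registers as clobbered
by the call.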

gcc/

	PR target/103503
	PR target/113312
	* config/i386/i386-expand.cc (ix86_expand_call): Replace
	no_caller_saved_registers check with call_saved_registers check.
	Clobber all registers that are not used by the callee with
	no_callee_saved_registers attribute.
	* config/i386/i386-options.cc (ix86_set_func_type): Set
	call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
	noreturn function.  Disallow no_callee_saved_registers with
	interrupt or no_caller_saved_registers attributes together.
	(ix86_set_current_function): Replace no_caller_saved_registers
	check with call_saved_registers check.
	(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
	(ix86_handle_call_saved_registers_attribute): This.
	(ix86_gnu_attributes): Add
	ix86_handle_call_saved_registers_attribute.
	* config/i386/i386.cc (ix86_conditional_register_usage): Replace
	no_caller_saved_registers check with call_saved_registers check.
	(ix86_function_ok_for_sibcall): Don't allow callee with
	no_callee_saved_registers attribute when the calling function
	has callee-saved registers.
	(ix86_comp_type_attributes): Also check
	no_callee_saved_registers.
	(ix86_epilogue_uses): Replace no_caller_saved_registers check
	with call_saved_registers check.
	(ix86_hard_regno_scratch_ok): Likewise.
	(ix86_save_reg): Replace no_caller_saved_registers check with
	call_saved_registers check.  Don't save any registers for
	TYPE_NO_CALLEE_SAVED_REGISTERS.  Save all registers with
	TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
	no_callee_saved_registers attribute is called.
	(find_drap_reg): Replace no_caller_saved_registers check with
	call_saved_registers check.
	* config/i386/i386.h (call_saved_registers_type): New enum.
	(machine_function): Replace no_caller_saved_registers with
	call_saved_registers.
	* doc/extend.texi: Document no_callee_saved_registers attribute.

gcc/testsuite/

	PR target/103503
	PR target/113312
	* gcc.dg/torture/no-callee-saved-run-1a.c: New file.
	* gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
	* gcc.target/i386/no-callee-saved-1.c: Likewise.
	* gcc.target/i386/no-callee-saved-2.c: Likewise.
	* gcc.target/i386/no-callee-saved-3.c: Likewise.
	* gcc.target/i386/no-callee-saved-4.c: Likewise.
	* gcc.target/i386/no-callee-saved-5.c: Likewise.
	* gcc.target/i386/no-callee-saved-6.c: Likewise.
	* gcc.target/i386/no-callee-saved-7.c: Likewise.
	* gcc.target/i386/no-callee-saved-8.c: Likewise.
	* gcc.target/i386/no-callee-saved-9.c: Likewise.
	* gcc.target/i386/no-callee-saved-10.c: Likewise.
	* gcc.target/i386/no-callee-saved-11.c: Likewise.
	* gcc.target/i386/no-callee-saved-12.c: Likewise.
	* gcc.target/i386/no-callee-saved-13.c: Likewise.
	* gcc.target/i386/no-callee-saved-14.c: Likewise.
	* gcc.target/i386/no-callee-saved-15.c: Likewise.
	* gcc.target/i386/no-callee-saved-16.c: Likewise.
	* gcc.target/i386/no-callee-saved-17.c: Likewise.
	* gcc.target/i386/no-callee-saved-18.c: Likewise.

Jan 18, 2024

i386: Add -masm=intel profiling support [PR113122] · d4a2d91b

Jakub Jelinek authored 1 year ago

x86_function_profiler emits assembly directly into file and only emits
AT&T syntax.  The following patch adjusts it to emit MASM syntax
if -masm=intel.
As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects.

I've tested using
for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do
./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1;
./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2;
objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2;
diff -up /tmp/1 /tmp/2; done
that the emitted sequences are identical after assembly.
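
As an illustration of why the dialect needs an explicit branch when plain
fprintf-style output is used, here is a small sketch.  It is not the text
x86_function_profiler emits; format_indirect_call is a hypothetical helper
and only shows the same instruction written in both syntaxes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: format an indirect call through register REG
   (given without a '%' prefix) in the requested dialect.  Returns what
   snprintf returns, i.e. the length of the formatted text.  */
int
format_indirect_call (char *buf, size_t size, const char *reg, bool intel)
{
  assert (buf != NULL && reg != NULL);
  if (intel)
    return snprintf (buf, size, "\tcall\t%s\n", reg);     /* call r10 */
  return snprintf (buf, size, "\tcall\t*%%%s\n", reg);    /* call *%r10 */
}
```

Both strings assemble to the same bytes, which is the property the objdump
comparison above checks for the real sequences.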

2024-01-18  Jakub Jelinek  <jakub@redhat.com>

	PR target/113122
	* config/i386/i386.cc (x86_function_profiler): Add -masm=intel
	support.  Add missing space after , in emitted assembly in some
	cases.  Formatting fixes.

	* gcc.target/i386/pr113122-1.c: New test.
	* gcc.target/i386/pr113122-2.c: New test.
	* gcc.target/i386/pr113122-3.c: New test.
	* gcc.target/i386/pr113122-4.c: New test.

Jan 05, 2024

asan: Align .LASANPC on function boundary · e66dc37b

Ilya Leoshkevich authored 1 year ago

GCC can emit code between the function label and the .LASANPC label,
making the latter unaligned.  Some architectures cannot load unaligned
labels directly and require literal pool entries, which is inefficient.

Move the invocation of asan_function_start to
ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
emitted.  This allows setting the .LASANPC label alignment to the
respective function alignment.

Link: https://inbox.sourceware.org/gcc-patches/20240102194511.3171559-3-iii@linux.ibm.com/


Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>

gcc/ChangeLog:

	* asan.cc (asan_function_start): Drop switch_to_section ().
	(asan_emit_stack_protection): Set .LASANPC alignment.
	* config/i386/i386.cc: Use assemble_function_label_raw ()
	instead of ASM_OUTPUT_LABEL ().
	* config/s390/s390.cc (s390_asm_output_function_label):
	Likewise.
	* defaults.h (ASM_OUTPUT_FUNCTION_LABEL): Likewise.
	* final.cc (final_start_function_1): Drop
	asan_function_start ().
	* output.h (assemble_function_label_raw): New function.
	* varasm.cc (assemble_function_label_raw): Likewise.

Jan 03, 2024

Update copyright years. · a945c346

Jakub Jelinek authored 1 year ago

Dec 28, 2023

i386: Cleanup ix86_expand_{unary|binary}_operator issues · d74cceb6

Uros Bizjak authored 1 year ago

Move ix86_expand_unary_operator from i386.cc to i386-expand.cc, re-arrange
prototypes and do some cosmetic changes with the usage of TARGET_APX_NDD.

No functional changes.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_unary_operator_ok): Move from here...
	* config/i386/i386-expand.cc (ix86_unary_operator_ok): ... to here.
	* config/i386/i386-protos.h: Re-arrange ix86_{unary|binary}_operator_ok
	and ix86_expand_{unary|binary}_operator prototypes.
	* config/i386/i386.md: Cosmetic changes with the usage of
	TARGET_APX_NDD in ix86_expand_{unary|binary}_operator
	and ix86_{unary|binary}_operator_ok function calls.

Dec 20, 2023

i386: Allow 64 bit mask register for -mno-evex512 · d3545378

Haochen Jiang authored 1 year ago

gcc/ChangeLog:

	* config/i386/avx512bwintrin.h: Allow 64 bit mask intrin usage
	for -mno-evex512.
	* config/i386/i386-builtin.def: Remove OPTION_MASK_ISA2_EVEX512
	for 64 bit mask builtins.
	* config/i386/i386.cc (ix86_hard_regno_mode_ok): Allow 64 bit
	mask register for -mno-evex512.
	* config/i386/i386.md (SWI1248_AVX512BWDQ_64): Remove
	TARGET_EVEX512.
	(*zero_extendsidi2): Change isa attribute to avx512bw.
	(kmov_isa): Ditto.
	(*anddi_1): Ditto.
	(*andn<mode>_1): Remove TARGET_EVEX512.
	(*one_cmplsi2_1_zext): Change isa attribute to avx512bw.
	(*ashl<mode>3_1): Ditto.
	(*lshr<mode>3_1): Ditto.
	* config/i386/sse.md (SWI1248_AVX512BWDQ): Remove TARGET_EVEX512.
	(SWI1248_AVX512BW): Ditto.
	(SWI1248_AVX512BWDQ2): Ditto.
	(*knotsi_1_zext): Ditto.
	(kunpckdi): Ditto.
	(SWI24_MASK): Removed.
	(vec_pack_trunc_<mode>): Change iterator from SWI24_MASK to SWI24.
	(vec_unpacks_lo_di): Remove TARGET_EVEX512.
	(SWI48x_MASK): Removed.
	(vec_unpacks_hi_<mode>): Change iterator from SWI48x_MASK to SWI48x.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_1-6.c: Remove check for errors.
	* gcc.target/i386/noevex512-2.c: Ditto.

Dec 15, 2023

bitint: Introduce abi_limb_mode · a98a3932

Jakub Jelinek authored 1 year ago

Given what I saw in the aarch64/arm psABIs for BITINT_TYPE, as I said
earlier I'm afraid we need to differentiate between the limb mode/precision
specified in the psABIs (what is used to decide how it is actually passed,
aligned or what size it has) vs. what limb mode/precision should be used
during bitint lowering and in the libgcc bitint APIs.
While in the x86_64 psABI a limb is 64-bit, which is perfect for both,
that is a wordsize which we can perform operations natively in,
e.g. aarch64 wants 128-bit limbs for alignment/sizing purposes, but
on the bitint lowering side I believe it would result in terribly bad code
and on the libgcc side wouldn't work at all (because it relies there on
longlong.h support).

So, the following patch makes it possible for aarch64 to use TImode
as abi_limb_mode for _BitInt(129) and larger, while using DImode as
limb_mode.
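
The difference between the two limb sizes can be seen with a little
arithmetic.  The sketch below is only an illustration under the assumption
that storage is rounded up to whole ABI limbs and that lowering walks just
enough limbs to cover the precision; the function names are hypothetical
and do not exist in GCC.

```c
#include <assert.h>

/* Round PREC bits up to whole limbs of LIMB_BITS bits each.  */
static unsigned
limbs_for (unsigned prec, unsigned limb_bits)
{
  assert (limb_bits != 0);
  return (prec + limb_bits - 1) / limb_bits;
}

/* Storage size in bytes, decided by the ABI limb (abi_limb_mode).  */
unsigned
bitint_storage_bytes (unsigned prec, unsigned abi_limb_bits)
{
  return limbs_for (prec, abi_limb_bits) * abi_limb_bits / 8;
}

/* Number of limbs needed to cover PREC bits during lowering (limb_mode).  */
unsigned
bitint_lowering_limbs (unsigned prec, unsigned limb_bits)
{
  return limbs_for (prec, limb_bits);
}
```

With a 128-bit ABI limb and a 64-bit lowering limb, _BitInt(129) takes
32 bytes of storage while its value spans three 64-bit limbs; with 64-bit
limbs for both, the storage is 24 bytes.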

2023-12-15  Jakub Jelinek  <jakub@redhat.com>

	* target.h (struct bitint_info): Add abi_limb_mode member, adjust
	comment.
	* target.def (bitint_type_info): Mention abi_limb_mode instead of
	limb_mode.
	* varasm.cc (output_constant): Use abi_limb_mode rather than
	limb_mode.
	* stor-layout.cc (finish_bitfield_representative): Likewise.  Assert
	that if precision is smaller or equal to abi_limb_mode precision or
	if info.big_endian is different from WORDS_BIG_ENDIAN, info.limb_mode
	must be the same as info.abi_limb_mode.
	(layout_type): Use abi_limb_mode rather than limb_mode.
	* gimple-fold.cc (clear_padding_bitint_needs_padding_p): Likewise.
	(clear_padding_type): Likewise.
	* config/i386/i386.cc (ix86_bitint_type_info): Also set
	info->abi_limb_mode.
	* doc/tm.texi: Regenerated.

Dec 13, 2023

i386: Fix ICE on __builtin_ia32_pabsd128 without lhs [PR112962] · 02c30fda

Jakub Jelinek authored 1 year ago

The following patch fixes ICE on the testcase in similar way to how
other folded builtins are handled in ix86_gimple_fold_builtin when
they don't have a lhs; these builtins are const or pure, so normally
DCE would remove them later, but with -O0 that isn't guaranteed to
happen, and during expansion if they are marked TREE_SIDE_EFFECTS
it might still be attempted to be expanded.
This removes them right away during the folding.

Initially I wanted to also change all gsi_replace last args in that function
to true, but Andrew pointed to PR107209, so I've kept them as is.

2023-12-13  Jakub Jelinek  <jakub@redhat.com>

	PR target/112962
	* config/i386/i386.cc (ix86_gimple_fold_builtin): For shifts
	and abs without lhs replace with nop.

	* gcc.target/i386/pr112962.c: New test.

Dec 12, 2023

Don't assume it's AVX_U128_CLEAN after call_insn whose... · fc62716f

liuhongt authored 1 year ago

Don't assume it's AVX_U128_CLEAN after call_insn whose abi.mode_clobber(V4DImode) doesn't contain all SSE_REGS.

If the function doesn't clobber any SSE registers or only clobbers the
128-bit part, then vzeroupper isn't issued before the function exit.
The status is not CLEAN but ANY after the function.

Also for sibling_call, it's safe to issue a vzeroupper. Also there
could be a missing vzeroupper since there's no mode_exit for
sibling_call_p.
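
The rule for the state after a call can be modelled with a bitmask, as a
rough sketch.  The enum values mirror the AVX_U128_* names used above, but
mode_after_call, ALL_SSE_REGS and the one-bit-per-register mask are
assumptions made for illustration; the real code queries the callee's ABI
for V4DImode clobbers.

```c
#include <assert.h>
#include <stdint.h>

enum avx_u128_state { AVX_U128_CLEAN, AVX_U128_DIRTY, AVX_U128_ANY };

/* Assumption: 16 SSE registers, one bit each.  */
#define ALL_SSE_REGS UINT32_C (0xffff)

/* UPPER_CLOBBER_MASK has a bit set for every SSE register whose full
   256-bit value the callee's ABI clobbers.  */
enum avx_u128_state
mode_after_call (uint32_t upper_clobber_mask)
{
  assert ((upper_clobber_mask & ~ALL_SSE_REGS) == 0);
  /* Only a callee that clobbers every register is known to leave the
     upper halves clean on return.  */
  if (upper_clobber_mask == ALL_SSE_REGS)
    return AVX_U128_CLEAN;
  /* Otherwise nothing can be assumed about the upper halves.  */
  return AVX_U128_ANY;
}
```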

gcc/ChangeLog:

	PR target/112891
	* config/i386/i386.cc (ix86_avx_u128_mode_after): Return
	AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to
	align with ix86_avx_u128_mode_needed.
	(ix86_avx_u128_mode_needed): Return AVX_U128_CLEAN for
	sibling_call.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr112891.c: New test.
	* gcc.target/i386/pr112891-2.c: New test.

Dec 07, 2023

[APX NDD] Support APX NDD for neg insn · 042519b6

Kong Lingling authored 1 year ago

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
	parameter and adjust for NDD.
	* config/i386/i386-protos.h: Add use_ndd parameter for
	ix86_unary_operator_ok and ix86_expand_unary_operator.
	* config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter
	and adjust for NDD.
	* config/i386/i386.md (neg<mode>2): Add new constraint for NDD and
	adjust output template.
	(*neg<mode>_1): Likewise.
	(*neg<dwi>2_doubleword): Likewise and adopt '&' to NDD dest.
	(*neg<mode>_2): Likewise.
	(*neg<mode>_ccc_1): Likewise.
	(*neg<mode>_ccc_2): Likewise.
	(*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
	to accept memory input for NDD alternatives.
	(*negsi_2_zext): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add neg test.

[APX NDD] Disable seg_prefixed memory usage for NDD add · d564198f

Hongyu Wang authored 1 year ago

NDD uses the EVEX prefix, so when a segment prefix is also applied, the instruction
could exceed its 15-byte limit, especially when adding immediates. This could happen
when the "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will
add the offset to the segment register, which will be encoded using a segment prefix.
Disable those *POFF constant usages in NDD add alternatives with a new constraint.

gcc/ChangeLog:

	* config/i386/constraints.md (je): New constraint.
	* config/i386/i386-protos.h (x86_poff_operand_p): New function to
	check any *POFF constant in operand.
	* config/i386/i386.cc (x86_poff_operand_p): New prototype.
	* config/i386/i386.md (*add<mode>_1): Split out je alternative for add.

Dec 05, 2023

Allow targets to add USEs to asms · 414d795d

Richard Sandiford authored 1 year ago

Arm's SME has an array called ZA that for inline asm purposes
is effectively a form of special-purpose memory.  It doesn't
have an associated storage type and so can't be passed and
returned in normal C/C++ objects.

We'd therefore like "za" in a clobber list to mean that an inline
asm can read from and write to ZA.  (Just reading or writing
individually is unlikely to be useful, but we could add syntax
for that too if necessary.)

There is currently a TARGET_MD_ASM_ADJUST target hook that allows
targets to add clobbers to an asm instruction.  This patch
extends that to allow targets to add USEs as well.

gcc/
	* target.def (md_asm_adjust): Add a uses parameter.
	* doc/tm.texi: Regenerate.
	* cfgexpand.cc (expand_asm_loc): Update call to md_asm_adjust.
	Handle any USEs created by the target.
	(expand_asm_stmt): Likewise.
	* recog.cc (asm_noperands): Handle asms with USEs.
	(decode_asm_operands): Likewise.
	* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add uses
	parameter.
	* config/arm/aarch-common.cc (arm_md_asm_adjust): Likewise.
	* config/arm/arm.cc (thumb1_md_asm_adjust): Likewise.
	* config/avr/avr.cc (avr_md_asm_adjust): Likewise.
	* config/cris/cris.cc (cris_md_asm_adjust): Likewise.
	* config/i386/i386.cc (ix86_md_asm_adjust): Likewise.
	* config/mn10300/mn10300.cc (mn10300_md_asm_adjust): Likewise.
	* config/nds32/nds32.cc (nds32_md_asm_adjust): Likewise.
	* config/pdp11/pdp11.cc (pdp11_md_asm_adjust): Likewise.
	* config/rs6000/rs6000.cc (rs6000_md_asm_adjust): Likewise.
	* config/s390/s390.cc (s390_md_asm_adjust): Likewise.
	* config/vax/vax.cc (vax_md_asm_adjust): Likewise.
	* config/visium/visium.cc (visium_md_asm_adjust): Likewise.

Take register pressure into account for vec_construct/scalar_to_vec when the... · b1cb2d99

liuhongt authored 1 year ago

Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

For vec_construct, the components must be live at the same time if
they're not loaded from memory; when the number of those components
exceeds the available registers, spills happen. Try to account for that
with a rough estimation.
??? Ideally, we should have an overall estimation of register pressure
if we know the live range of all variables.
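
A rough estimate of this kind could look like the sketch below.  It is an
assumption about the general shape of such a heuristic, not the code in
ix86_vect_estimate_reg_pressure; estimate_pressure_cost and its
cost_per_spill parameter are hypothetical.

```c
#include <assert.h>

/* Charge COST_PER_SPILL for every simultaneously live component that
   does not fit in the AVAILABLE registers of its class.  */
unsigned
estimate_pressure_cost (unsigned needed, unsigned available,
                        unsigned cost_per_spill)
{
  assert (cost_per_spill > 0);
  if (needed <= available)
    return 0;                       /* everything fits, no extra cost */
  return (needed - available) * cost_per_spill;
}
```

The estimate would be computed once per register class (general purpose
and SSE) and added to the vectorization cost.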

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
	Count sse_reg/gpr_regs for components not loaded from memory.
	(ix86_vector_costs:ix86_vector_costs): New constructor.
	(ix86_vector_costs::m_num_gpr_needed[3]): New private member.
	(ix86_vector_costs::m_num_sse_needed[3]): Ditto.
	(ix86_vector_costs::finish_cost): Estimate overall register
	pressure cost.
	(ix86_vector_costs::ix86_vect_estimate_reg_pressure): New
	function.

Dec 04, 2023

i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837] · 4586d7d0

Jakub Jelinek authored 1 year ago

The following testcase ICEs with RTL checking, because the code tests whether
XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking if SET_SRC (set)
is actually an UNSPEC, so any time we see any other insn with PARALLEL
and a SET in it which is not an UNSPEC we ICE during RTL checking or
access some other union member there as if it were an rt_int.
The rest is just small cleanup.
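
The bug class is the usual one for tagged unions: a member is read before
the tag is checked.  The sketch below shows the fixed order of checks on a
made-up structure; struct node, its codes and UNSPEC_SET_GOT_ID are
hypothetical stand-ins and not GCC's rtx representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum node_code { NODE_REG, NODE_UNSPEC };   /* hypothetical codes */
enum { UNSPEC_SET_GOT_ID = 7 };             /* hypothetical value */

struct node {
  enum node_code code;
  union {
    const char *reg_name;   /* valid when code == NODE_REG */
    int unspec_id;          /* valid when code == NODE_UNSPEC */
  } u;
};

/* Return true only for an UNSPEC node carrying the SET_GOT id.  */
bool
is_set_got (const struct node *src)
{
  assert (src != NULL);
  /* Check the code first; only then is u.unspec_id meaningful.  */
  return src->code == NODE_UNSPEC && src->u.unspec_id == UNSPEC_SET_GOT_ID;
}
```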

2023-12-04  Jakub Jelinek  <jakub@redhat.com>

	PR target/112837
	* config/i386/i386.cc (ix86_elim_entry_set_got): Before checking
	for UNSPEC_SET_GOT check that SET_SRC is UNSPEC.  Use SET_SRC and
	SET_DEST macros instead of XEXP, rename vec variable to set.

	* gcc.dg/pr112837.c: New test.

Dec 02, 2023

Allow target attributes in non-gnu namespaces · 7fa24687

Richard Sandiford authored 1 year ago

Currently there are four static sources of attributes:

- LANG_HOOKS_ATTRIBUTE_TABLE
- LANG_HOOKS_COMMON_ATTRIBUTE_TABLE
- LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE
- TARGET_ATTRIBUTE_TABLE

All of the attributes in these tables go in the "gnu" namespace.
This means that they can use the traditional GNU __attribute__((...))
syntax and the standard [[gnu::...]] syntax.

Standard attributes are registered dynamically with a null namespace.
There are no supported attributes in other namespaces (clang, vendor
namespaces, etc.).

This patch tries to generalise things by making the namespace
part of the attribute specification.

It's usual for multiple attributes to be defined in the same namespace,
so rather than adding the namespace to each individual definition,
it seemed better to group attributes in the same namespace together.
This would also allow us to reuse the same table for clang attributes
that are written with the GNU syntax, or other similar situations
where the attribute can be accessed via multiple "spellings".

The patch therefore adds a scoped_attribute_specs that contains
a namespace and a list of attributes in that namespace.

It's still possible to have multiple scoped_attribute_specs
for the same namespace.  E.g. it makes sense to keep the
C++-specific, C/C++-common, and format-related attributes in
separate tables, even though they're all GNU attributes.

Current lists of attributes are terminated by a null name.
Rather than keep that for the new structure, it seemed neater
to use an array_slice.  This also makes the tables slightly more
compact.

In general, a target might want to support attributes in multiple
namespaces.  Rather than have a separate hook for each possibility
(like the three langhooks above), it seemed better to make
TARGET_ATTRIBUTE_TABLE a table of tables.  Specifically, it's
an array_slice of scoped_attribute_specs.
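
The resulting shape, a table of tables keyed by namespace, can be sketched
in plain C.  This is only an illustration: the struct and function names
below are hypothetical, the attribute entry is reduced to a name, a
pointer/length pair stands in for array_slice, and the null namespace used
for standard attributes is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct attr_spec { const char *name; };   /* hypothetical, name only */

/* A namespace plus a pointer/length pair instead of a null-terminated
   list.  */
struct scoped_attr_specs {
  const char *ns;
  const struct attr_spec *attrs;
  size_t count;
};

/* Look NAME up in namespace NS across a table of tables.  Several tables
   may share a namespace, so every matching table is searched.  */
const struct attr_spec *
lookup_attr (const struct scoped_attr_specs *const *tables, size_t ntables,
             const char *ns, const char *name)
{
  assert (ns != NULL && name != NULL);
  for (size_t t = 0; t < ntables; t++)
    {
      if (strcmp (tables[t]->ns, ns) != 0)
        continue;
      for (size_t i = 0; i < tables[t]->count; i++)
        if (strcmp (tables[t]->attrs[i].name, name) == 0)
          return &tables[t]->attrs[i];
    }
  return NULL;
}
```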

We can do the same thing for langhooks, which allows the three hooks
above to be merged into a single LANG_HOOKS_ATTRIBUTE_TABLE.
It also allows the standard attributes to be registered statically
and checked by the usual attribs.cc checks.

The patch adds a TARGET_GNU_ATTRIBUTES helper for the common case
in which a target wants a single table of gnu attributes.  It can
only be used if the table is free of preprocessor directives.

There are probably other things we need to do to make vendor namespaces
work smoothly.  E.g. in principle it would be good to make exclusion
sets namespace-aware.  But to some extent we have that with standard
vs. gnu attributes too.  This patch is just supposed to be a first step.

gcc/
	* attribs.h (scoped_attribute_specs): New structure.
	(register_scoped_attributes): Take a reference to a
	scoped_attribute_specs instead of separate namespace and array
	parameters.
	* plugin.h (register_scoped_attributes): Likewise.
	* attribs.cc (register_scoped_attributes): Likewise.
	(attribute_tables): Change into an array of scoped_attribute_specs
	pointers.  Reduce to 1 element for frontends and 1 element for targets.
	(empty_attribute_table): Delete.
	(check_attribute_tables): Update for changes to attribute_tables.
	Use a hash_set to identify duplicates.
	(handle_ignored_attributes_option): Update for above changes.
	(init_attributes): Likewise.
	(excl_pair): Delete.
	(test_attribute_exclusions): Update for above changes.  Don't
	enforce symmetry for standard attributes in the top-level namespace.
	* langhooks-def.h (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Likewise.
	(LANG_HOOKS_INITIALIZER): Update accordingly.
	(LANG_HOOKS_ATTRIBUTE_TABLE): Define to an empty constructor.
	* langhooks.h (lang_hooks::common_attribute_table): Delete.
	(lang_hooks::format_attribute_table): Likewise.
	(lang_hooks::attribute_table): Redefine to an array of
	scoped_attribute_specs pointers.
	* target-def.h (TARGET_GNU_ATTRIBUTES): New macro.
	* target.def (attribute_spec): Redefine to return an array of
	scoped_attribute_specs pointers.
	* tree-inline.cc (function_attribute_inlinable_p): Update accordingly.
	* doc/tm.texi: Regenerate.
	* config/aarch64/aarch64.cc (aarch64_attribute_table): Define using
	TARGET_GNU_ATTRIBUTES.
	* config/alpha/alpha.cc (vms_attribute_table): Likewise.
	* config/avr/avr.cc (avr_attribute_table): Likewise.
	* config/bfin/bfin.cc (bfin_attribute_table): Likewise.
	* config/bpf/bpf.cc (bpf_attribute_table): Likewise.
	* config/csky/csky.cc (csky_attribute_table): Likewise.
	* config/epiphany/epiphany.cc (epiphany_attribute_table): Likewise.
	* config/gcn/gcn.cc (gcn_attribute_table): Likewise.
	* config/h8300/h8300.cc (h8300_attribute_table): Likewise.
	* config/loongarch/loongarch.cc (loongarch_attribute_table): Likewise.
	* config/m32c/m32c.cc (m32c_attribute_table): Likewise.
	* config/m32r/m32r.cc (m32r_attribute_table): Likewise.
	* config/m68k/m68k.cc (m68k_attribute_table): Likewise.
	* config/mcore/mcore.cc (mcore_attribute_table): Likewise.
	* config/microblaze/microblaze.cc (microblaze_attribute_table):
	Likewise.
	* config/mips/mips.cc (mips_attribute_table): Likewise.
	* config/msp430/msp430.cc (msp430_attribute_table): Likewise.
	* config/nds32/nds32.cc (nds32_attribute_table): Likewise.
	* config/nvptx/nvptx.cc (nvptx_attribute_table): Likewise.
	* config/riscv/riscv.cc (riscv_attribute_table): Likewise.
	* config/rl78/rl78.cc (rl78_attribute_table): Likewise.
	* config/rx/rx.cc (rx_attribute_table): Likewise.
	* config/s390/s390.cc (s390_attribute_table): Likewise.
	* config/sh/sh.cc (sh_attribute_table): Likewise.
	* config/sparc/sparc.cc (sparc_attribute_table): Likewise.
	* config/stormy16/stormy16.cc (xstormy16_attribute_table): Likewise.
	* config/v850/v850.cc (v850_attribute_table): Likewise.
	* config/visium/visium.cc (visium_attribute_table): Likewise.
	* config/arc/arc.cc (arc_attribute_table): Likewise.  Move further
	down file.
	* config/arm/arm.cc (arm_attribute_table): Update for above changes,
	using...
	(arm_gnu_attributes, arm_gnu_attribute_table): ...these new globals.
	* config/i386/i386-options.h (ix86_attribute_table): Delete.
	(ix86_gnu_attribute_table): Declare.
	* config/i386/i386-options.cc (ix86_attribute_table): Replace with...
	(ix86_gnu_attributes, ix86_gnu_attribute_table): ...these two globals.
	* config/i386/i386.cc (ix86_attribute_table): Define as an array of
	scoped_attribute_specs pointers.
	* config/ia64/ia64.cc (ia64_attribute_table): Update for above changes,
	using...
	(ia64_gnu_attributes, ia64_gnu_attribute_table): ...these new globals.
	* config/rs6000/rs6000.cc (rs6000_attribute_table): Update for above
	changes, using...
	(rs6000_gnu_attributes, rs6000_gnu_attribute_table): ...these new
	globals.

gcc/ada/
	* gcc-interface/gigi.h (gnat_internal_attribute_table): Change
	type to scoped_attribute_specs.
	* gcc-interface/utils.cc (gnat_internal_attribute_table): Likewise,
	using...
	(gnat_internal_attributes): ...this as the underlying array.
	* gcc-interface/misc.cc (gnat_attribute_table): New global.
	(LANG_HOOKS_ATTRIBUTE_TABLE): Use it.

gcc/c-family/
	* c-common.h (c_common_attribute_table): Replace with...
	(c_common_gnu_attribute_table): ...this.
	(c_common_format_attribute_table): Change type to
	scoped_attribute_specs.
	* c-attribs.cc (c_common_attribute_table): Replace with...
	(c_common_gnu_attributes, c_common_gnu_attribute_table): ...these
	new globals.
	(c_common_format_attribute_table): Change type to
	scoped_attribute_specs, using...
	(c_common_format_attributes): ...this as the underlying array.

gcc/c/
	* c-tree.h (std_attribute_table): Declare.
	* c-decl.cc (std_attribute_table): Change type to
	scoped_attribute_specs, using...
	(std_attributes): ...this as the underlying array.
	(c_init_decl_processing): Remove call to register_scoped_attributes.
	* c-objc-common.h (c_objc_attribute_table): New global.
	(LANG_HOOKS_ATTRIBUTE_TABLE): Use it.
	(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete.

gcc/cp/
	* cp-tree.h (cxx_attribute_table): Delete.
	(cxx_gnu_attribute_table, std_attribute_table): Declare.
	* cp-objcp-common.h (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete.
	(cp_objcp_attribute_table): New table.
	(LANG_HOOKS_ATTRIBUTE_TABLE): Redefine.
	* tree.cc (cxx_attribute_table): Replace with...
	(cxx_gnu_attributes, cxx_gnu_attribute_table): ...these globals.
	(std_attribute_table): Change type to scoped_attribute_specs, using...
	(std_attributes): ...this as the underlying array.
	(init_tree): Remove call to register_scoped_attributes.

gcc/d/
	* d-tree.h (d_langhook_attribute_table): Replace with...
	(d_langhook_gnu_attribute_table): ...this.
	(d_langhook_common_attribute_table): Change type to
	scoped_attribute_specs.
	* d-attribs.cc (d_langhook_common_attribute_table): Change type to
	scoped_attribute_specs, using...
	(d_langhook_common_attributes): ...this as the underlying array.
	(d_langhook_attribute_table): Replace with...
	(d_langhook_gnu_attributes, d_langhook_gnu_attribute_table): ...these
	new globals.
	(uda_attribute_p): Update accordingly, and update for new
	targetm.attribute_table type.
	* d-lang.cc (d_langhook_attribute_table): New global.
	(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.

gcc/fortran/
	* f95-lang.cc: Include attribs.h.
	(gfc_attribute_table): Change to an array of scoped_attribute_specs
	pointers, using...
	(gfc_gnu_attributes, gfc_gnu_attribute_table): ...these new globals.

gcc/jit/
	* dummy-frontend.cc (jit_format_attribute_table): Change type to
	scoped_attribute_specs, using...
	(jit_format_attributes): ...this as the underlying array.
	(jit_attribute_table): Change to an array of scoped_attribute_specs
	pointers, using...
	(jit_gnu_attributes, jit_gnu_attribute_table): ...these new globals
	for the original array.  Include the format attributes.
	(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete.
	(LANG_HOOKS_ATTRIBUTE_TABLE): Define.

gcc/lto/
	* lto-lang.cc (lto_format_attribute_table): Change type to
	scoped_attribute_specs, using...
	(lto_format_attributes): ...this as the underlying array.
	(lto_attribute_table): Change to an array of scoped_attribute_specs
	pointers, using...
	(lto_gnu_attributes, lto_gnu_attribute_table): ...these new globals
	for the original array.  Include the format attributes.
	(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
	(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Delete.
	(LANG_HOOKS_ATTRIBUTE_TABLE): Define.

Nov 24, 2023

i386: Fix ICE with -fsplit-stack -mcmodel=large [PR112686] · 404ea4c1

Uros Bizjak authored 1 year ago

For -mcmodel=large, we have to load function address to a register.

	PR target/112686

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_expand_split_stack_prologue): Load
	function address to a register for ix86_cmodel == CM_LARGE.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr112686.c: New test.

Nov 23, 2023

i386: Fix ICE with -mforce-indirect-call and -fsplit-stack [PR89316] · 2f3f8952

Uros Bizjak authored 1 year ago

With the above two options, use a temporary register regno (as returned
from split_stack_prologue_scratch_regno) as an indirect call scratch
register to hold the __morestack function address.  On 64-bit targets, two
temporary registers are always available, so load the function address in
%r11 and call __morestack_large_model with its one-argument-register value
in %r10.  On 32-bit targets, bail out with a "sorry" if the temporary
register cannot be obtained.

On 32-bit targets, also emit PIC sequence that re-uses the obtained indirect
call scratch register before moving the function address to it.  We can
not set up %ebx PIC register in this case, but __morestack is prepared
for this situation and sets it up by itself.

	PR target/89316

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_expand_split_stack_prologue): Obtain
	scratch regno when flag_force_indirect_call is set.  On 64-bit
	targets, call __morestack_large_model when  flag_force_indirect_call
	is set and on 32-bit targets with -fpic, manually expand PIC sequence
	to call __morestack.  Move the function address to an indirect
	call scratch register.

gcc/testsuite/ChangeLog:

	* g++.target/i386/pr89316.C: New test.
	* gcc.target/i386/pr112605-1.c: New test.
	* gcc.target/i386/pr112605-2.c: New test.
	* gcc.target/i386/pr112605.c: New test.

Nov 21, 2023

[APX PPX] Support Intel APX PPX · 7ad308bd

Hongyu Wang authored 1 year ago

PPX stands for Push-Pop Acceleration. PUSH/PUSH2 and its corresponding POP
can be marked with a 1-bit hint to indicate that the POP reads the
value written by the PUSH from the stack. The processor tracks these marked
instructions internally and fast-forwards register data between
matching PUSH and POP instructions, without going through memory or
through the training loop of the Fast Store Forwarding Predictor (FSFP).
This feature can also be adopted to PUSH2/POP2.

For GCC, we emit an explicit suffix 'p' (paired) to indicate that the push/pop
pair is marked with the PPX hint. To separate them from the original push/pop, we
add an UNSPEC on top of those PUSH/POP patterns.

In the first implementation we only emit them under prologue/epilogue
when saving/restoring callee-saved registers to make sure push/pop are
paired. So an extra flag was added to check if PPX insns can be emitted
for those register save/restore interfaces.

The PPX hint is purely a performance hint. If the 'p' suffix is not
emitted for paired push/pop, the PPX optimization will be disabled,
while program semantics will not be affected at all.
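
As an illustration only, the mnemonic naming described above can be sketched in C. The helper ppx_mnemonic is hypothetical and is not part of GCC; it merely shows how the '2' (pair) and 'p' (PPX hint) suffixes combine into push/pushp/push2/push2p and their pop counterparts.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not GCC code: build the mnemonic for a register
   save (is_pop == false) or restore (is_pop == true).  BUF must hold at
   least 7 bytes ("push2p" plus the terminating NUL).  */
void
ppx_mnemonic (char *buf, bool is_pop, bool two_regs, bool ppx_hint)
{
  assert (buf != NULL);
  strcpy (buf, is_pop ? "pop" : "push");
  if (two_regs)
    strcat (buf, "2");	/* PUSH2/POP2 form.  */
  if (ppx_hint)
    strcat (buf, "p");	/* Explicit 'p' suffix marks the PPX hint.  */
}
```

Dropping the 'p' suffix in this model only changes the spelling of the instruction, which mirrors the statement that the hint does not affect program semantics.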

gcc/ChangeLog:

	* config/i386/i386-expand.h (gen_push): Add default bool
	parameter.
	(gen_pop): Likewise.
	* config/i386/i386-opts.h (enum apx_features): Add apx_ppx, add
	it to apx_all.
	* config/i386/i386.cc (ix86_emit_restore_reg_using_pop): Add
	ppx_p parameter for function declaration.
	(gen_push2): Add ppx_p parameter, emit push2p if ppx_p is true.
	(gen_push): Likewise.
	(ix86_emit_restore_reg_using_pop2): Likewise for pop2p.
	(ix86_emit_save_regs): Emit pushp/push2p under TARGET_APX_PPX.
	(ix86_emit_restore_reg_using_pop): Add ppx_p, emit popp insn
	and adjust cfi when ppx_p is true.
	(ix86_emit_restore_reg_using_pop2): Add ppx_p and pass it to its
	callee.
	(ix86_emit_restore_regs_using_pop2): Likewise.
	(ix86_expand_epilogue): Pass TARGET_APX_PPX to
	ix86_emit_restore_reg_using_pop.
	* config/i386/i386.h (TARGET_APX_PPX): New.
	* config/i386/i386.md (UNSPEC_APX_PPX): New unspec.
	(pushp_di): New define_insn.
	(popp_di): Likewise.
	(push2p_di): Likewise.
	(pop2p_di): Likewise.
	* config/i386/i386.opt: Add apx_ppx enum.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-interrupt-1.c: Adjust option to restrict them
	under certain subfeatures.
	* gcc.target/i386/apx-push2pop2-1.c: Likewise.
	* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
	* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.
	* gcc.target/i386/apx-ppx-1.c: New test.

7ad308bd

Nov 13, 2023

i386: Rewrite pushfl<mode>2 and popfl<mode>1 as unspecs · 10f12d32

Uros Bizjak authored 1 year ago

Flags reg is valid only with CC mode.

gcc/ChangeLog:

	* config/i386/i386-expand.h (gen_pushfl): New prototype.
	(gen_popfl): Ditto.
	* config/i386/i386-expand.cc (ix86_expand_builtin)
	[case IX86_BUILTIN_READ_FLAGS]: Use gen_pushfl.
	[case IX86_BUILTIN_WRITE_FLAGS]: Use gen_popfl.
	* config/i386/i386.cc (gen_pushfl): New function.
	(gen_popfl): Ditto.
	* config/i386/i386.md (unspec): Add UNSPEC_PUSHFL and UNSPEC_POPFL.
	(@pushfl<mode>2): Rename from *pushfl<mode>2.
	Rewrite as unspec using UNSPEC_PUSHFL.
	(@popfl<mode>1): Rename from *popfl<mode>1.
	Rewrite as unspec using UNSPEC_POPFL.

10f12d32

i386: Return CCmode from ix86_cc_mode for unknown RTX code [PR112494] · c75bab72

Uros Bizjak authored 1 year ago

Combine wants to combine following instructions into an insn that can
perform both an (arithmetic) operation and set the condition code.  During
the conversion a new RTX is created, and combine passes the RTX code of the
innermost RTX expression of the CC use insn in which CC reg is used to
SELECT_CC_MODE, to determine the new mode of the comparison:

Trying 5 -> 8:
    5: r98:DI=0xd7
    8: flags:CCZ=cmp(r98:DI,0)
      REG_EQUAL cmp(0xd7,0)
Failed to match this instruction:
(parallel [
        (set (reg:CC 17 flags)
            (compare:CC (const_int 215 [0xd7])
                (const_int 0 [0])))
        (set (reg/v:DI 98 [ flags ])
            (const_int 215 [0xd7]))
    ])

where:

(insn 5 2 6 2 (set (reg/v:DI 98 [ flags ])
        (const_int 215 [0xd7])) "pr112494.c":8:8 84 {*movdi_internal}
     (nil))

(insn 8 7 11 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:DI 98 [ flags ])
            (const_int 0 [0]))) "pr112494.c":11:9 8 {*cmpdi_ccno_1}
     (expr_list:REG_EQUAL (compare:CCZ (const_int 215 [0xd7])
            (const_int 0 [0]))
        (nil)))

ix86_cc_mode (AKA SELECT_CC_MODE) is not prepared to handle random RTX
codes and triggers gcc_unreachable() when the SET RTX code is passed to it.
The patch removes gcc_unreachable() and returns CCmode for unknown
RTX codes, so combine can try various combinations involving CC reg
without triggering ICE.

Please note that x86 MOV instructions do not set flags, so the above
combination is not recognized as a valid x86 instruction.

	PR target/112494

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_cc_mode) [default]: Return CCmode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr112494.c: New test.

c75bab72

Nov 11, 2023

mode-switching: Pass the set of live registers to the after hook · 93d65f39

Richard Sandiford authored 1 year ago

This patch passes the set of live hard registers to the after hook,
like the previous one did for the needed hook.

gcc/
	* target.def (mode_switching.after): Add a regs_live parameter.
	* doc/tm.texi: Regenerate.
	* config/epiphany/epiphany-protos.h (epiphany_mode_after): Update
	accordingly.
	* config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise.
	(epiphany_mode_after): Likewise.
	* config/i386/i386.cc (ix86_mode_after): Likewise.
	* config/riscv/riscv.cc (riscv_mode_after): Likewise.
	* config/sh/sh.cc (sh_mode_after): Likewise.
	* mode-switching.cc (optimize_mode_switching): Likewise.

93d65f39

mode-switching: Pass set of live registers to the needed hook · 29d3e189

Richard Sandiford authored 1 year ago

The emit hook already takes the set of live hard registers as input.
This patch passes it to the needed hook too.  SME uses this to
optimise the mode choice based on whether state is live or dead.

The main caller already had access to the required info, but the
special handling of return values did not.

gcc/
	* target.def (mode_switching.needed): Add a regs_live parameter.
	* doc/tm.texi: Regenerate.
	* config/epiphany/epiphany-protos.h (epiphany_mode_needed): Update
	accordingly.
	* config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise.
	* config/epiphany/mode-switch-use.cc (insert_uses): Likewise.
	* config/i386/i386.cc (ix86_mode_needed): Likewise.
	* config/riscv/riscv.cc (riscv_mode_needed): Likewise.
	* config/sh/sh.cc (sh_mode_needed): Likewise.
	* mode-switching.cc (optimize_mode_switching): Likewise.
	(create_pre_exit): Likewise, using the DF simulate functions
	to calculate the required information.

29d3e189

Nov 09, 2023

i386 PIE: accept @GOTOFF in load/store multi base address · 38b396d6

Alexandre Oliva authored 1 year ago

Looking at the code generated for sse2-{load,store}-multi.c with PIE,
I realized we could use UNSPEC_GOTOFF as a base address, and that this
would enable the test to use the vector insns expected by the tests
even with PIC, so I extended the base + offset logic used by the SSE2
multi-load/store peepholes to accept reg + symbolic base + offset too,
so that the test generated the expected insns even with PIE.


for  gcc/ChangeLog

	* config/i386/i386.cc (symbolic_base_address_p,
	base_address_p): New, factored out from...
	(extract_base_offset_in_addr): ... here and extended to
	recognize REG+GOTOFF, as in gcc.target/i386/sse2-load-multi.c
	and sse2-store-multi.c with PIE enabled by default.

38b396d6

Nov 06, 2023

i386: Use "addr" attribute to limit address regclass to non-REX regs · ecd755a9

Uros Bizjak authored 1 year ago

Use "addr" attribute with "gpr8" value to limit address register class
to non-REX registers in instructions with high registers, where REX
registers can not be used in the address.

gcc/ChangeLog:

	* config/i386/constraints.md (Bc): Remove constraint.
	(Bn): Rewrite to use x86_extended_reg_mentioned_p predicate.
	* config/i386/i386.cc (ix86_memory_address_reg_class):
	Do not limit processing to TARGET_APX_EGPR.  Exit early for
	NULL insn.  Do not check recog_data.insn before calling
	extract_insn_cached.
	(ix86_insn_base_reg_class): Handle ADDR_GPR8.
	(ix86_regno_ok_for_insn_base_p): Ditto.
	(ix86_insn_index_reg_class): Ditto.
	* config/i386/i386.md (*cmpqi_ext<mode>_1_mem_rex64):
	Remove insn pattern and corresponding peephole2 pattern.
	(*cmpi_ext<mode>_1): Remove (m,Q) alternative.
	Change (QBc,Q) alternative to (QBn,Q).  Add "addr" attribute.
	(*cmpqi_ext<mode>_3_mem_rex64): Remove insn pattern
	and corresponding peephole2 pattern.
	(*cmpi_ext<mode>_3): Remove (Q,m) alternative.
	Change (Q,QnBc) alternative to (Q,QnBn).  Add "addr" attribute.
	(*extzvqi_mem_rex64): Remove insn pattern and
	corresponding peephole2 pattern.
	(*extzvqi): Remove (Q,m) alternative.  Change (Q,QnBc)
	alternative to (Q,QnBn).  Add "addr" attribute.
	(*insvqi_1_mem_rex64): Remove insn pattern and
	corresponding peephole2 pattern.
	(*insvqi_1): Remove (Q,m) alternative.  Change (Q,QnBc)
	alternative to (Q,QnBn).  Add "addr" attribute.
	(@insv<mode>_1): Ditto.
	(*addqi_ext<mode>_0): Remove (m,0,Q) alternative.  Change (QBc,0,Q)
	alternative to (QBn,0,Q).  Add "addr" attribute.
	(*subqi_ext<mode>_0): Ditto.
	(*andqi_ext<mode>_0): Ditto.
	(*<any_or:code>qi_ext<mode>_0): Ditto.
	(*addqi_ext<mode>_1): Remove (Q,0,m) alternative.  Change (Q,0,QnBc)
	alternative to (Q,0,QnBn).  Add "addr" attribute.
	(*andqi_ext<mode>_1): Ditto.
	(*andqi_ext<mode>_1_cc): Ditto.
	(*<any_or:code>qi_ext<mode>_1): Ditto.
	(*xorqi_ext<mode>_1_cc): Ditto.
	* config/i386/predicates.md (nonimm_x64constmem_operand):
	Remove predicate.
	(general_x64constmem_operand): Ditto.
	(norex_memory_operand): Ditto.

ecd755a9

Nov 03, 2023

i386: Handle multiple address register classes · 751fc7bc

Uros Bizjak authored 1 year ago

The patch generalizes address register class handling to allow multiple
register classes.  For APX EGPR targets, some instructions do not support
GPR32 registers, so it is necessary to limit address register set to
avoid them.  The same situation happens for instructions with high registers,
where REX registers can not be used in the address, so the existing
infrastructure can be adapted to also handle this case.

The patch is mostly a mechanical rename of "gpr32" attribute to "addr" and
introduces no functional changes, although it fixes a couple of inconsistent
attribute values in passing.

A follow-up patch will use the above infrastructure to limit address register
class to legacy registers for instructions with high registers.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
	Rename to ...
	(ix86_memory_address_reg_class): ... this.  Generalize address
	register class handling to allow multiple address register classes.
	Return maximal class for unrecognized instructions.  Improve comments.
	(ix86_insn_base_reg_class): Rewrite to handle
	multiple address register classes.
	(ix86_regno_ok_for_insn_base_p): Ditto.
	(ix86_insn_index_reg_class): Ditto.
	* config/i386/i386.md: Rename "gpr32" attribute to "addr"
	and substitute its values with "0" -> "gpr16", "1" -> "*".
	(addr): New attribute to limit allowed address register set.
	(gpr32): Remove.
	* config/i386/mmx.md: Rename "gpr32" attribute to "addr"
	and substitute its values with "0" -> "gpr16", "1" -> "*".
	* config/i386/sse.md: Ditto.

751fc7bc

Oct 27, 2023

Support vec_cmpmn/vcondmn for v2hf/v4hf. · 7eed861e

liuhongt authored 1 year ago

gcc/ChangeLog:

	PR target/103861
	* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
	V2HF/V2BF/V4HF/V4BFmode.
	* config/i386/i386.cc (ix86_get_mask_mode): Return QImode when
	data_mode is V4HF/V2HFmode.
	* config/i386/mmx.md (vec_cmpv4hfqi): New expander.
	(vcond_mask_<mode>v4hi): Ditto.
	(vcond_mask_<mode>qi): Ditto.
	(vec_cmpv2hfqi): Ditto.
	(vcond_mask_<mode>v2hi): Ditto.
	(mmx_pblendvb_<mode>): Add 2 combine splitters after the
	patterns.
	(mmx_pblendvb_v8qi): Ditto.
	(<code>v2hi3): Add a combine splitter after the pattern.
	(<code><mode>3): Ditto.
	(<code>v8qi3): Ditto.
	(<code><mode>3): Ditto.
	* config/i386/sse.md (vcond<mode><mode>): Merge this with ..
	(vcond<sseintvecmodelower><mode>): .. this into ..
	(vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): .. this,
	and extend to V8BF/V16BF/V32BFmode.

gcc/testsuite/ChangeLog:

	* g++.target/i386/part-vect-vcondhf.C: New test.
	* gcc.target/i386/part-vect-vec_cmphf.c: New test.

7eed861e

Oct 23, 2023

i386: Prevent splitting to xmm16+ when !TARGET_AVX512VL · 1df490ed

Haochen Jiang authored 1 year ago

Currently, the split may use x/ymm16+ without AVX512VL, which
eventually leads to an ICE, as in PR111753.

This patch aims to fix that.

gcc/ChangeLog:

	PR target/111753
	* config/i386/i386.cc (ix86_standard_x87sse_constant_load_p):
	Do not split to xmm16+ when !TARGET_AVX512VL.

gcc/testsuite/ChangeLog:

	PR target/111753
	* gcc.target/i386/pr111753.c: New test.

1df490ed

Oct 22, 2023

target: Support heap-based trampolines · cbf6da16

Andrew Burgess authored 1 year ago


Enable -ftrampoline-impl=heap by default if we are on macOS 11
or later.

Co-Authored-By: Maxim Blinov <maxim.blinov@embecosm.com>
Co-Authored-By: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config.gcc: Default to heap trampolines on macOS 11 and above.
	* config/i386/darwin.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.cc: Use X86_CUSTOM_FUNCTION_TEST.

cbf6da16

Oct 16, 2023

i386: Allow -mlarge-data-threshold with -mcmodel=large · 1a64156c

Uros Bizjak authored 1 year ago

From: Fangrui Song <maskray@google.com>

When using -mcmodel=medium, large data objects larger than the
-mlarge-data-threshold threshold are placed into large data sections
(.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
.l* sections into separate output sections.  If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply.  This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections and drops an
unneeded documentation restriction that the value must be the same.
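
A simplified sketch of the placement rule described above, in C. The enum and function names are hypothetical stand-ins and not GCC's own; the model only encodes "objects larger than the threshold go to .l* sections under the medium model, and now also under the large model".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the code models discussed above.  */
enum toy_code_model { TOY_CM_SMALL, TOY_CM_MEDIUM, TOY_CM_LARGE };

/* Return true if an object of SIZE bytes would be placed into a large
   data section (.lrodata, .ldata, .lbss) in this simplified model.  */
bool
goes_in_large_section (enum toy_code_model model, size_t size,
		       size_t threshold)
{
  switch (model)
    {
    case TOY_CM_SMALL:
      return false;
    case TOY_CM_MEDIUM:
    case TOY_CM_LARGE:	/* The case this patch adds.  */
      return size > threshold;
    }
  assert (!"unknown code model");
  return false;
}
```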

Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")

Signed-off-by: Fangrui Song <maskray@google.com>

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_can_inline_p):
	Handle CM_LARGE and CM_LARGE_PIC.
	(x86_elf_aligned_decl_common): Ditto.
	(x86_output_aligned_bss): Ditto.
	* config/i386/i386.opt: Update doc for -mlarge-data-threshold=.
	* doc/invoke.texi: Update doc for -mlarge-data-threshold=.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/large-data.c: New test.

1a64156c

Oct 12, 2023

[APX] Support Intel APX PUSH2POP2 · 180b08f6

Mo, Zewei authored 2 years ago


This feature requires the stack to be 16-byte aligned, so in the
prologue/epilogue a standalone push/pop is emitted before any
push2/pop2 if the stack is not 16-byte aligned.
The current implementation only supports push2/pop2 in the function
prologue/epilogue, for callee-saved registers.
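
The alignment rule above can be modelled roughly as follows. This is a sketch under simplifying assumptions, not the code in ix86_emit_save_regs: struct save_plan and plan_saves are hypothetical names, and the model only counts how many single pushes and push2 instructions a prologue would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a prologue's register saves.  */
struct save_plan
{
  int leading_push;	/* Standalone push used to reach 16-byte alignment.  */
  int push2;		/* Number of push2 instructions (two registers each).  */
  int trailing_push;	/* Standalone push for a leftover odd register.  */
};

/* Plan the saves of NREGS callee-saved registers.  SP_ALIGNED16 says
   whether the stack pointer is 16-byte aligned before the first save.  */
struct save_plan
plan_saves (int nregs, bool sp_aligned16)
{
  struct save_plan plan = { 0, 0, 0 };

  assert (nregs >= 0);
  if (nregs > 0 && !sp_aligned16)
    {
      /* One 8-byte push brings the stack to a 16-byte boundary.  */
      plan.leading_push = 1;
      nregs--;
    }
  plan.push2 = nregs / 2;
  plan.trailing_push = nregs % 2;
  return plan;
}
```

For example, five registers with a misaligned stack give one leading push followed by two push2 instructions in this model.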

gcc/ChangeLog:

	* config/i386/i386.cc (gen_push2): New function to emit push2
	and adjust cfa offset.
	(ix86_pro_and_epilogue_can_use_push2_pop2): New function to
	determine whether push2/pop2 can be used.
	(ix86_compute_frame_layout): Adjust preferred stack boundary
	and stack alignment needed for push2/pop2.
	(ix86_emit_save_regs): Emit push2 when available.
	(ix86_emit_restore_reg_using_pop2): New function to emit pop2
	and adjust cfa info.
	(ix86_emit_restore_regs_using_pop2): New function to loop
	through the saved regs and call above.
	(ix86_expand_epilogue): Call ix86_emit_restore_regs_using_pop2
	when push2pop2 available.
	* config/i386/i386.md (push2_di): New pattern for push2.
	(pop2_di): Likewise for pop2.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-push2pop2-1.c: New test.
	* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
	* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.

Co-authored-by: Hu Lin1 <lin1.hu@intel.com>
Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>

180b08f6

Oct 09, 2023

Support -mevex512 for AVX512BW intrins · 8e79b1b4

Haochen Jiang authored 1 year ago

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
	Make sure there is EVEX512 enabled.
	(ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512.
	* config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64 bit mask
	when !TARGET_EVEX512.
	* config/i386/i386.md (avx512bw_512): New.
	(SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512.
	(*zero_extendsidi2): Change isa to avx512bw_512.
	(kmov_isa): Ditto.
	(*anddi_1): Ditto.
	(*andn<mode>_1): Change isa to kmov_isa.
	(*<code><mode>_1): Ditto.
	(*notxor<mode>_1): Ditto.
	(*one_cmpl<mode>2_1): Ditto.
	(*one_cmplsi2_1_zext): Change isa to avx512bw_512.
	(*ashl<mode>3_1): Change isa to kmov_isa.
	(*lshr<mode>3_1): Ditto.
	* config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512.
	(VI1248_AVX512VLBW): Ditto.
	(VHFBF_AVX512VL): Ditto.
	(VI): Ditto.
	(VIHFBF): Ditto.
	(VI_AVX2): Ditto.
	(VI1_AVX512): Ditto.
	(VI12_256_512_AVX512VL): Ditto.
	(VI2_AVX2_AVX512BW): Ditto.
	(VI2_AVX512VNNIBW): Ditto.
	(VI2_AVX512VL): Ditto.
	(VI2HFBF_AVX512VL): Ditto.
	(VI8_AVX2_AVX512BW): Ditto.
	(VIMAX_AVX2_AVX512BW): Ditto.
	(VIMAX_AVX512VL): Ditto.
	(VI12_AVX2_AVX512BW): Ditto.
	(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
	(VI248_AVX512VL): Ditto.
	(VI248_AVX512VLBW): Ditto.
	(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
	(VI248_AVX512BW): Ditto.
	(VI248_AVX512BW_AVX512VL): Ditto.
	(VI248_512): Ditto.
	(VI124_256_AVX512F_AVX512BW): Ditto.
	(VI_AVX512BW): Ditto.
	(VIHFBF_AVX512BW): Ditto.
	(SWI1248_AVX512BWDQ): Ditto.
	(SWI1248_AVX512BW): Ditto.
	(SWI1248_AVX512BWDQ2): Ditto.
	(*knotsi_1_zext): Ditto.
	(define_split for zero_extend + not): Ditto.
	(kunpckdi): Ditto.
	(REDUC_SMINMAX_MODE): Ditto.
	(VEC_EXTRACT_MODE): Ditto.
	(*avx512bw_permvar_truncv16siv16hi_1): Ditto.
	(*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto.
	(truncv32hiv32qi2): Ditto.
	(avx512bw_<code>v32hiv32qi2): Ditto.
	(avx512bw_<code>v32hiv32qi2_mask): Ditto.
	(avx512bw_<code>v32hiv32qi2_mask_store): Ditto.
	(usadv64qi): Ditto.
	(VEC_PERM_AVX2): Ditto.
	(AVX512ZEXTMASK): Ditto.
	(SWI24_MASK): New.
	(vec_pack_trunc_<mode>): Change iterator to SWI24_MASK.
	(avx512bw_packsswb<mask_name>): Add TARGET_EVEX512.
	(avx512bw_packssdw<mask_name>): Ditto.
	(avx512bw_interleave_highv64qi<mask_name>): Ditto.
	(avx512bw_interleave_lowv64qi<mask_name>): Ditto.
	(<mask_codefor>avx512bw_pshuflwv32hi<mask_name>): Ditto.
	(<mask_codefor>avx512bw_pshufhwv32hi<mask_name>): Ditto.
	(vec_unpacks_lo_di): Ditto.
	(SWI48x_MASK): New.
	(vec_unpacks_hi_<mode>): Change iterator to SWI48x_MASK.
	(avx512bw_umulhrswv32hi3<mask_name>): Add TARGET_EVEX512.
	(VI1248_AVX512VL_AVX512BW): Ditto.
	(avx512bw_<code>v32qiv32hi2<mask_name>): Ditto.
	(*avx512bw_zero_extendv32qiv32hi2_1): Ditto.
	(*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
	(<insn>v32qiv32hi2): Ditto.
	(pbroadcast_evex_isa): Change isa attribute to avx512bw_512.
	(VPERMI2): Add TARGET_EVEX512.
	(VPERMI2I): Ditto.

8e79b1b4

Support -mevex512 for AVX512DQ intrins · 1b248907

Haochen Jiang authored 1 year ago

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3):
	Add TARGET_EVEX512 for 512 bit usage.
	* config/i386/i386.cc (standard_sse_constant_opcode): Ditto.
	* config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto.
	(VF1_128_256VL): Ditto.
	(VF2_AVX512VL): Ditto.
	(VI8_256_512): Ditto.
	(<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>):
	Ditto.
	(AVX512_VEC): Ditto.
	(AVX512_VEC_2): Ditto.
	(VI4F_BRCST32x2): Ditto.
	(VI8F_BRCST64x2): Ditto.

1b248907

Support -mevex512 for AVX512F intrins · c1eef66b

Haochen Jiang authored 1 year ago

gcc/ChangeLog:

	* config/i386/i386-builtins.cc
	(ix86_vectorize_builtin_gather): Disable 512 bit gather
	when !TARGET_EVEX512.
	* config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode):
	Add TARGET_EVEX512.
	(ix86_expand_int_sse_cmp): Ditto.
	(ix86_expand_vector_init_one_nonzero): Disable subroutine
	when !TARGET_EVEX512.
	(ix86_emit_swsqrtsf): Add TARGET_EVEX512.
	(ix86_vectorize_vec_perm_const): Disable subroutine when
	!TARGET_EVEX512.
	* config/i386/i386.cc
	(standard_sse_constant_p): Add TARGET_EVEX512.
	(standard_sse_constant_opcode): Ditto.
	(ix86_get_ssemov): Ditto.
	(ix86_legitimate_constant_p): Ditto.
	(ix86_vectorize_builtin_scatter): Disable 512 bit scatter
	when !TARGET_EVEX512.
	* config/i386/i386.md (avx512f_512): New.
	(movxi): Add TARGET_EVEX512.
	(*movxi_internal_avx512f): Ditto.
	(*movdi_internal): Change alternative 12 to ?Yv. Adjust mode
	for alternative 13.
	(*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for
	alternative 9.
	(*movhi_internal): Change alternative 11 to *Yv.
	(*movdf_internal): Change alternative 12 to Yv.
	(*movsf_internal): Change alternative 5 to Yv. Adjust mode for
	alternative 5 and 6.
	(*mov<mode>_internal): Change alternative 4 to Yv.
	(define_split for convert SF to DF): Add TARGET_EVEX512.
	(extendbfsf2_1): Ditto.
	* config/i386/predicates.md (bcst_mem_operand): Disable predicate
	for 512 bit when !TARGET_EVEX512.
	* config/i386/sse.md (VMOVE): Add TARGET_EVEX512.
	(V48_AVX512VL): Ditto.
	(V48_256_512_AVX512VL): Ditto.
	(V48H_AVX512VL): Ditto.
	(VI12_AVX512VL): Ditto.
	(V): Ditto.
	(V_512): Ditto.
	(V_256_512): Ditto.
	(VF): Ditto.
	(VF1_VF2_AVX512DQ): Ditto.
	(VFH): Ditto.
	(VFB): Ditto.
	(VF1): Ditto.
	(VF1_AVX2): Ditto.
	(VF2): Ditto.
	(VF2H): Ditto.
	(VF2_512_256): Ditto.
	(VF2_512_256VL): Ditto.
	(VF_512): Ditto.
	(VFB_512): Ditto.
	(VI48_AVX512VL): Ditto.
	(VI1248_AVX512VLBW): Ditto.
	(VF_AVX512VL): Ditto.
	(VFH_AVX512VL): Ditto.
	(VF1_AVX512VL): Ditto.
	(VI): Ditto.
	(VIHFBF): Ditto.
	(VI_AVX2): Ditto.
	(VI8): Ditto.
	(VI8_AVX512VL): Ditto.
	(VI2_AVX512F): Ditto.
	(VI4_AVX512F): Ditto.
	(VI4_AVX512VL): Ditto.
	(VI48_AVX512F_AVX512VL): Ditto.
	(VI8_AVX2_AVX512F): Ditto.
	(VI8_AVX_AVX512F): Ditto.
	(V8FI): Ditto.
	(V16FI): Ditto.
	(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
	(VI248_AVX512VLBW): Ditto.
	(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
	(VI248_AVX512BW): Ditto.
	(VI248_AVX512BW_AVX512VL): Ditto.
	(VI48_AVX512F): Ditto.
	(VI48_AVX_AVX512F): Ditto.
	(VI12_AVX_AVX512F): Ditto.
	(VI148_512): Ditto.
	(VI124_256_AVX512F_AVX512BW): Ditto.
	(VI48_512): Ditto.
	(VI_AVX512BW): Ditto.
	(VIHFBF_AVX512BW): Ditto.
	(VI4F_256_512): Ditto.
	(VI48F_256_512): Ditto.
	(VI48F): Ditto.
	(VI12_VI48F_AVX512VL): Ditto.
	(V32_512): Ditto.
	(AVX512MODE2P): Ditto.
	(STORENT_MODE): Ditto.
	(REDUC_PLUS_MODE): Ditto.
	(REDUC_SMINMAX_MODE): Ditto.
	(*andnot<mode>3): Change isa attribute to avx512f_512.
	(*andnot<mode>3): Ditto.
	(<code><mode>3): Ditto.
	(<code>tf3): Ditto.
	(FMAMODEM): Add TARGET_EVEX512.
	(FMAMODE_AVX512): Ditto.
	(VFH_SF_AVX512VL): Ditto.
	(avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto.
	(fix<fixunssuffix>_truncv16sfv16si2<mask_name><round_saeonly_name>):
	Ditto.
	(avx512f_cvtdq2pd512_2): Ditto.
	(avx512f_cvtpd2dq512<mask_name><round_name>): Ditto.
	(fix<fixunssuffix>_truncv8dfv8si2<mask_name><round_saeonly_name>):
	Ditto.
	(<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto.
	(vec_unpacks_lo_v16sf): Ditto.
	(vec_unpacks_hi_v16sf): Ditto.
	(vec_unpacks_float_hi_v16si): Ditto.
	(vec_unpacks_float_lo_v16si): Ditto.
	(vec_unpacku_float_hi_v16si): Ditto.
	(vec_unpacku_float_lo_v16si): Ditto.
	(vec_pack_sfix_trunc_v8df): Ditto.
	(avx512f_vec_pack_sfix_v8df): Ditto.
	(<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto.
	(<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto.
	(<mask_codefor>avx512f_movshdup512<mask_name>): Ditto.
	(<mask_codefor>avx512f_movsldup512<mask_name>): Ditto.
	(AVX512_VEC): Ditto.
	(AVX512_VEC_2): Ditto.
	(vec_extract_lo_v64qi): Ditto.
	(vec_extract_hi_v64qi): Ditto.
	(VEC_EXTRACT_MODE): Ditto.
	(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto.
	(avx512f_movddup512<mask_name>): Ditto.
	(avx512f_unpcklpd512<mask_name>): Ditto.
	(*<avx512>_vternlog<mode>_all): Ditto.
	(*<avx512>_vpternlog<mode>_1): Ditto.
	(*<avx512>_vpternlog<mode>_2): Ditto.
	(*<avx512>_vpternlog<mode>_3): Ditto.
	(avx512f_shufps512_mask): Ditto.
	(avx512f_shufps512_1<mask_name>): Ditto.
	(avx512f_shufpd512_mask): Ditto.
	(avx512f_shufpd512_1<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto.
	(vec_dupv2df<mask_name>): Ditto.
	(trunc<pmov_src_lower><mode>2): Ditto.
	(*avx512f_<code><pmov_src_lower><mode>2): Ditto.
	(*avx512f_vpermvar_truncv8div8si_1): Ditto.
	(avx512f_<code><pmov_src_lower><mode>2_mask): Ditto.
	(avx512f_<code><pmov_src_lower><mode>2_mask_store): Ditto.
	(truncv8div8qi2): Ditto.
	(avx512f_<code>v8div16qi2): Ditto.
	(*avx512f_<code>v8div16qi2_store_1): Ditto.
	(*avx512f_<code>v8div16qi2_store_2): Ditto.
	(avx512f_<code>v8div16qi2_mask): Ditto.
	(*avx512f_<code>v8div16qi2_mask_1): Ditto.
	(*avx512f_<code>v8div16qi2_mask_store_1): Ditto.
	(avx512f_<code>v8div16qi2_mask_store_2): Ditto.
	(vec_widen_umult_even_v16si<mask_name>): Ditto.
	(*vec_widen_umult_even_v16si<mask_name>): Ditto.
	(vec_widen_smult_even_v16si<mask_name>): Ditto.
	(*vec_widen_smult_even_v16si<mask_name>): Ditto.
	(VEC_PERM_AVX2): Ditto.
	(one_cmpl<mode>2): Ditto.
	(<mask_codefor>one_cmpl<mode>2<mask_name>): Ditto.
	(*one_cmpl<mode>2_pternlog_false_dep): Ditto.
	(define_split to xor): Ditto.
	(*andnot<mode>3): Ditto.
	(define_split for ior): Ditto.
	(*iornot<mode>3): Ditto.
	(*xnor<mode>3): Ditto.
	(*<nlogic><mode>3): Ditto.
	(<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto.
	(avx512f_pshufdv3_mask): Ditto.
	(avx512f_pshufd_1<mask_name>): Ditto.
	(*vec_extractv4ti): Ditto.
	(VEXTRACTI128_MODE): Ditto.
	(define_split to vec_extract): Ditto.
	(VI1248_AVX512VL_AVX512BW): Ditto.
	(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto.
	(<insn>v16qiv16si2): Ditto.
	(avx512f_<code>v16hiv16si2<mask_name>): Ditto.
	(<insn>v16hiv16si2): Ditto.
	(avx512f_zero_extendv16hiv16si2_1): Ditto.
	(avx512f_<code>v8qiv8di2<mask_name>): Ditto.
	(*avx512f_<code>v8qiv8di2<mask_name>_1): Ditto.
	(*avx512f_<code>v8qiv8di2<mask_name>_2): Ditto.
	(<insn>v8qiv8di2): Ditto.
	(avx512f_<code>v8hiv8di2<mask_name>): Ditto.
	(<insn>v8hiv8di2): Ditto.
	(avx512f_<code>v8siv8di2<mask_name>): Ditto.
	(*avx512f_zero_extendv8siv8di2_1): Ditto.
	(*avx512f_zero_extendv8siv8di2_2): Ditto.
	(<insn>v8siv8di2): Ditto.
	(avx512f_roundps512_sfix): Ditto.
	(vashrv8di3): Ditto.
	(vashrv16si3): Ditto.
	(pbroadcast_evex_isa): Change isa attribute to avx512f_512.
	(vec_dupv4sf): Add TARGET_EVEX512.
	(*vec_dupv4si): Ditto.
	(*vec_dupv2di): Ditto.
	(vec_dup<mode>): Change isa attribute to avx512f_512.
	(VPERMI2): Add TARGET_EVEX512.
	(VPERMI2I): Ditto.
	(VEC_INIT_MODE): Ditto.
	(VEC_INIT_HALF_MODE): Ditto.
	(<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>):
	Ditto.
	(avx512f_vcvtps2ph512_mask_sae): Ditto.
	(<mask_codefor>avx512f_vcvtps2ph512<mask_name><round_saeonly_name>):
	Ditto.
	(*avx512f_vcvtps2ph512<merge_mask_name>): Ditto.
	(INT_BROADCAST_MODE): Ditto.

c1eef66b

Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 · aa9bce39

Haochen Jiang authored 1 year ago

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
	Disable zmm broadcast for !TARGET_EVEX512.
	* config/i386/i386-options.cc (ix86_option_override_internal):
	Do not use PVW_512 when no-evex512.
	(ix86_simd_clone_adjust): Add evex512 target into string.
	* config/i386/i386.cc (type_natural_mode): Report ABI warning
	when using zmm register w/o evex512.
	(ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512.
	(ix86_hard_regno_mode_ok): Ditto.
	(ix86_set_reg_reg_cost): Ditto.
	(ix86_rtx_costs): Ditto.
	(ix86_vector_mode_supported_p): Ditto.
	(ix86_preferred_simd_mode): Ditto.
	(ix86_get_mask_mode): Ditto.
	(ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit
	libmvec call when !TARGET_EVEX512.
	(ix86_simd_clone_usable): Ditto.
	* config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment
	when !TARGET_EVEX512.
	(MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512.
	(STORE_MAX_PIECES): Ditto.

aa9bce39

Oct 08, 2023

Support signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2HF/V4HF. · b4fc1abb

liuhongt authored 1 year ago

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_build_const_vector): Handle V2HF
	and V4HFmode.
	(ix86_build_signbit_mask): Ditto.
	* config/i386/mmx.md (mmxintvecmode): Ditto.
	(<code><mode>2): New define_expand.
	(*mmx_<code><mode>): New define_insn_and_split.
	(*mmx_nabs<mode>2): Ditto.
	(*mmx_andnot<mode>3): New define_insn.
	(<code><mode>3): Ditto.
	(copysign<mode>3): New define_expand.
	(xorsign<mode>3): Ditto.
	(signbit<mode>2): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/part-vect-absneghf.c: New test.
	* gcc.target/i386/part-vect-copysignhf.c: New test.
	* gcc.target/i386/part-vect-xorsignhf.c: New test.

b4fc1abb

Oct 07, 2023

[APX EGPR] Handle legacy insns that only support GPR16 (3/5) · 1328bb72

Kong Lingling authored 2 years ago


Disable EGPR usage for the legacy insns below in opcode map2/3 that have
a vex but no evex counterpart.

insn list:
1. phminposuw/vphminposuw
2. ptest/vptest
3. roundps/vroundps, roundpd/vroundpd,
   roundss/vroundss, roundsd/vroundsd
4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist

gcc/ChangeLog:

	* config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
	prototype.
	* config/i386/i386.cc (x86_evex_reg_mentioned_p): New
	function.
	* config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
	and constraint jm to all non-evex alternatives, adjust
	alternative outputs if evex reg is mentioned.
	* config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
	and constraint jm/ja to all non-evex alternatives.
	(ptesttf2): Likewise.
	(<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
	(sse4_1_round<ssescalarmodesuffix>): Likewise.
	(sse4_2_pcmpestri): Likewise.
	(sse4_2_pcmpestrm): Likewise.
	(sse4_2_pcmpestr_cconly): Likewise.
	(sse4_2_pcmpistr): Likewise.
	(sse4_2_pcmpistri): Likewise.
	(sse4_2_pcmpistrm): Likewise.
	(sse4_2_pcmpistr_cconly): Likewise.
	(aesimc): Likewise.
	(aeskeygenassist): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
	tests.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

1328bb72

[APX EGPR] Handle GPR16 only vector move insns · f4988648

Hongyu Wang authored 1 year ago


For vector move insns like vmovdqa/vmovdqu, their evex counterparts
require an explicit suffix 64/32/16/8. The use of these instructions
is prohibited under AVX10_1 or AVX512F, so we select
vmovaps/vmovups for vector load/store insns that contain EGPR if
there is no AVX512VL, and keep the original move insn selection
otherwise.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
	adjust mnemonic for vmovdqu/vmovdqa.
	* config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0):
	Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
	(avx_vec_concat<mode>): Likewise, and separate alternative 0 to
	avx_noavx512f.

Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

f4988648

[APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint. · ccdc0f0f

Kong Lingling authored 2 years ago


In inline asm, we do not know if the insn can use EGPR, so disable EGPR
usage by default via mapping the common reg/mem constraint to non-EGPR
constraints.

The full list of mappings is:

  "g" -> "jrjmi"
  "r" -> "jr"
  "m" -> "jm"
  "<" -> "j<"
  ">" -> "j>"
  "o" -> "jo"
  "V" -> "jV"
  "p" -> "jp"
  "Bm" -> "ja"

For memory constraints, we add an option -mapx-inline-asm-use-gpr32
to allow/disallow gpr32 usage in any memory related constraints, as
base_reg_class/index_reg_class cannot know whether the asm insn
supports gpr32 or not.
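
The mapping table above can be applied to a constraint string roughly as sketched below. This is an illustration only and not the map_egpr_constraints function in i386.cc: the name map_to_non_egpr is hypothetical, and the sketch knows just the entries listed above, copying every other character (such as the "=" or "+" modifiers) unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: rewrite constraint string IN into OUT (capacity
   OUT_SIZE bytes) using the table from the commit message.  Returns 0 on
   success and -1 if OUT is too small.  */
int
map_to_non_egpr (const char *in, char *out, size_t out_size)
{
  static const struct { const char *from; const char *to; } table[] = {
    { "Bm", "ja" },	/* Two-letter entry first, so "Bm" is not split.  */
    { "g", "jrjmi" }, { "r", "jr" }, { "m", "jm" }, { "<", "j<" },
    { ">", "j>" }, { "o", "jo" }, { "V", "jV" }, { "p", "jp" },
  };
  size_t used = 0;

  assert (in != NULL && out != NULL);
  while (*in != '\0')
    {
      const char *repl = NULL;
      size_t in_len = 1, repl_len = 1;

      for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
	{
	  size_t n = strlen (table[i].from);
	  if (strncmp (in, table[i].from, n) == 0)
	    {
	      repl = table[i].to;
	      in_len = n;
	      repl_len = strlen (repl);
	      break;
	    }
	}
      if (used + repl_len + 1 > out_size)
	return -1;
      /* Copy the replacement, or the original character if unmapped.  */
      memcpy (out + used, repl != NULL ? repl : in, repl_len);
      used += repl_len;
      in += in_len;
    }
  if (used + 1 > out_size)
    return -1;
  out[used] = '\0';
  return 0;
}
```

With this sketch "=r" becomes "=jr" and "rm" becomes "jrjm". A real implementation has to parse multi-letter constraints in general, which this table-driven loop does not attempt.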

gcc/ChangeLog:

	* config/i386/i386.cc (map_egpr_constraints): New function to
	map common constraints to EGPR prohibited constraints.
	(ix86_md_asm_adjust): Call map_egpr_constraints.
	* config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-inline-gpr-norex2.c: New test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

ccdc0f0f

[APX EGPR] Add backend hook for base_reg_class/index_reg_class. · 0793ee05

Kong Lingling authored 2 years ago


Add backend helper functions to verify whether an rtx_insn can use EGPR
for the base/index reg of its memory operand. The verification rules are:
  1. For asm insn, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32.
  2. Disable EGPR for unrecognized insn.
  3. If which_alternative is not decided, loop through enabled alternatives
  and check their attr_gpr32. Only enable EGPR when all enabled
  alternatives have attr_gpr32 = 1.
  4. If which_alternative is decided, enable/disable EGPR by its corresponding
  attr_gpr32.
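
The four rules above can be sketched as a small decision function. This is a toy model and not the helper added to i386.cc: struct toy_insn and toy_address_may_use_egpr are hypothetical, with plain arrays standing in for the enabled-alternative mask and the per-alternative gpr32 attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an insn for this sketch.  */
struct toy_insn
{
  bool is_asm;			/* Inline asm insn.  */
  bool recognized;		/* Insn matched a pattern.  */
  int which_alternative;	/* -1 when not decided yet.  */
  size_t n_alternatives;
  const bool *alt_enabled;	/* Per-alternative "enabled" flag.  */
  const bool *alt_gpr32;	/* Per-alternative gpr32 attribute.  */
};

/* Return true if the memory address of INSN may use EGPR.  */
bool
toy_address_may_use_egpr (const struct toy_insn *insn,
			  bool inline_asm_use_gpr32)
{
  assert (insn != NULL);
  if (insn->is_asm)				/* Rule 1.  */
    return inline_asm_use_gpr32;
  if (!insn->recognized)			/* Rule 2.  */
    return false;
  if (insn->which_alternative < 0)		/* Rule 3.  */
    {
      for (size_t i = 0; i < insn->n_alternatives; i++)
	if (insn->alt_enabled[i] && !insn->alt_gpr32[i])
	  return false;
      return true;
    }
  assert ((size_t) insn->which_alternative < insn->n_alternatives);
  return insn->alt_gpr32[insn->which_alternative];	/* Rule 4.  */
}
```

Rule 3 is deliberately conservative in this model: a single enabled alternative without gpr32 support is enough to disable EGPR until an alternative has been chosen.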

gcc/ChangeLog:

	* config/i386/i386-protos.h (ix86_insn_base_reg_class): New
	prototype.
	(ix86_regno_ok_for_insn_base_p): Likewise.
	(ix86_insn_index_reg_class): Likewise.
	* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
	New helper function to scan the insn.
	(ix86_insn_base_reg_class): New function to choose BASE_REG_CLASS.
	(ix86_regno_ok_for_insn_base_p): Likewise for base regno.
	(ix86_insn_index_reg_class): Likewise for INDEX_REG_CLASS.
	* config/i386/i386.h (INSN_BASE_REG_CLASS): Define.
	(REGNO_OK_FOR_INSN_BASE_P): Likewise.
	(INSN_INDEX_REG_CLASS): Likewise.
	(enum reg_class): Add INDEX_GPR16.
	(GENERAL_GPR16_REGNO_P): Define.
	* config/i386/i386.md (gpr32): New attribute.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

0793ee05