Commit c1c267df authored 1 year ago by Richard Sandiford
aarch64: Add support for SME2 intrinsics

This patch adds support for the SME2 <arm_sme.h> intrinsics.  The
convention I've used is to put stuff in aarch64-sve-builtins-sme.*
if it relates to ZA, ZT0, the streaming vector length, or other
such SME state.  Things that operate purely on predicates and
vectors go in aarch64-sve-builtins-sve2.* instead.  Some of these
will later be picked up for SVE2p1.

We previously used Uph internally as a constraint for 16-bit
immediates to atomic instructions.  However, we need a user-facing
constraint for the upper predicate registers (already available as
PR_HI_REGS), and Uph makes a natural pair with the existing Upl.
The patch therefore redefines Uph for that purpose and renames the
old immediate constraint to Uih.
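
As a rough illustration of why Uph pairs naturally with Upl, the sketch
below models the split of the sixteen SVE predicate registers into a low
half (p0-p7) and a high half (p8-p15).  The helper names matches_upl and
matches_uph are hypothetical and exist only for this example; they are
not part of GCC, which expresses the same split through register classes.

```c
#include <stdbool.h>

/* Illustrative model only, not GCC code: SVE provides sixteen
   predicate registers, p0 to p15.  */
enum { NUM_PREDICATE_REGS = 16, FIRST_HIGH_PREDICATE = 8 };

/* Hypothetical helper: true if p<n> is in the low half (p0-p7),
   the set that the existing Upl constraint accepts.  */
bool matches_upl(int n)
{
  return n >= 0 && n < FIRST_HIGH_PREDICATE;
}

/* Hypothetical helper: true if p<n> is in the high half (p8-p15),
   the set that the redefined Uph constraint accepts.  */
bool matches_uph(int n)
{
  return n >= FIRST_HIGH_PREDICATE && n < NUM_PREDICATE_REGS;
}
```

Every valid predicate register number satisfies exactly one of the two
helpers, which is the sense in which the constraints form a pair.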

gcc/
	* config/aarch64/aarch64.h (TARGET_STREAMING_SME2): New macro.
	(P_ALIASES): Likewise.
	(REGISTER_NAMES): Add pn aliases of the predicate registers.
	(W8_W11_REGNUM_P): New macro.
	(W8_W11_REGS): New register class.
	(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly.
	* config/aarch64/aarch64.cc (aarch64_print_operand): Add support
	for %K, which prints a predicate as a counter.  Handle tuples of
	predicates.
	(aarch64_regno_regclass): Handle W8_W11_REGS.
	(aarch64_class_max_nregs): Likewise.
	* config/aarch64/constraints.md (Uci, Uw2, Uw4): New constraints.
	(x, y): Move further up file.
	(Uph): Redefine as the high predicate registers, renaming the old
	constraint to...
	(Uih): ...this.
	* config/aarch64/predicates.md (const_0_to_7_operand): New predicate.
	(const_0_to_4_step_4_operand, const_0_to_6_step_2_operand): Likewise.
	(const_0_to_12_step_4_operand, const_0_to_14_step_2_operand): Likewise.
	(aarch64_simd_shift_imm_qi): Use const_0_to_7_operand.
	* config/aarch64/iterators.md (VNx16SI_ONLY, VNx8SI_ONLY)
	(VNx8DI_ONLY, SVE_FULL_BHSIx2, SVE_FULL_HF, SVE_FULL_SIx2_SDIx4)
	(SVE_FULL_BHS, SVE_FULLx24, SVE_DIx24, SVE_BHSx24, SVE_Ix24)
	(SVE_Fx24, SVE_SFx24, SME_ZA_BIx24, SME_ZA_BHIx124, SME_ZA_BHIx24)
	(SME_ZA_HFx124, SME_ZA_HFx24, SME_ZA_HIx124, SME_ZA_HIx24)
	(SME_ZA_SDIx24, SME_ZA_SDFx24): New mode iterators.
	(UNSPEC_REVD, UNSPEC_CNTP_C, UNSPEC_PEXT, UNSPEC_PEXTx2): New unspecs.
	(UNSPEC_PSEL, UNSPEC_PTRUE_C, UNSPEC_SQRSHR, UNSPEC_SQRSHRN)
	(UNSPEC_SQRSHRU, UNSPEC_SQRSHRUN, UNSPEC_UQRSHR, UNSPEC_UQRSHRN)
	(UNSPEC_UZP, UNSPEC_UZPQ, UNSPEC_ZIP, UNSPEC_ZIPQ, UNSPEC_BFMLSLB)
	(UNSPEC_BFMLSLT, UNSPEC_FCVTN, UNSPEC_FDOT, UNSPEC_SQCVT): Likewise.
	(UNSPEC_SQCVTN, UNSPEC_SQCVTU, UNSPEC_SQCVTUN, UNSPEC_UQCVT): Likewise.
	(UNSPEC_SME_ADD, UNSPEC_SME_ADD_WRITE, UNSPEC_SME_BMOPA): Likewise.
	(UNSPEC_SME_BMOPS, UNSPEC_SME_FADD, UNSPEC_SME_FDOT, UNSPEC_SME_FVDOT)
	(UNSPEC_SME_FMLA, UNSPEC_SME_FMLS, UNSPEC_SME_FSUB, UNSPEC_SME_READ)
	(UNSPEC_SME_SDOT, UNSPEC_SME_SVDOT, UNSPEC_SME_SMLA, UNSPEC_SME_SMLS)
	(UNSPEC_SME_SUB, UNSPEC_SME_SUB_WRITE, UNSPEC_SME_SUDOT): Likewise.
	(UNSPEC_SME_SUVDOT, UNSPEC_SME_UDOT, UNSPEC_SME_UVDOT): Likewise.
	(UNSPEC_SME_UMLA, UNSPEC_SME_UMLS, UNSPEC_SME_USDOT): Likewise.
	(UNSPEC_SME_USVDOT, UNSPEC_SME_WRITE): Likewise.
	(Vetype, VNARROW, V2XWIDE, Ventype, V_INT_EQUIV, v_int_equiv)
	(VSINGLE, vsingle, b): Add tuple modes.
	(v2xwide, za32_offset_range, za64_offset_range, za32_long)
	(za32_last_offset, vg_modifier, z_suffix, aligned_operand)
	(aligned_fpr): New mode attributes.
	(SVE_INT_BINARY_MULTI, SVE_INT_BINARY_SINGLE)
	(SVE_FP_BINARY_MULTI): New int iterators.
	(SVE_BFLOAT_TERNARY_LONG): Add UNSPEC_BFMLSLB and UNSPEC_BFMLSLT.
	(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.
	(SVE_WHILE_ORDER, SVE2_INT_SHIFT_IMM_NARROWxN, SVE_QCVTxN)
	(SVE2_SFx24_UNARY, SVE2_x24_PERMUTE, SVE2_x24_PERMUTEQ)
	(UNSPEC_REVD_ONLY, SME2_INT_MOP, SME2_BMOP, SME_BINARY_SLICE_SDI)
	(SME_BINARY_SLICE_SDF, SME_BINARY_WRITE_SLICE_SDI, SME_INT_DOTPROD)
	(SME_INT_DOTPROD_LANE, SME_FP_DOTPROD, SME_FP_DOTPROD_LANE)
	(SME_INT_TERNARY_SLICE, SME_FP_TERNARY_SLICE, BHSD_BITS)
	(LUTI_BITS): New int iterators.
	(optab, sve_int_op): Handle the new unspecs.
	(sme_int_op, has_16bit_form): New int attributes.
	(bits_etype): Handle 64.
	* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): New unspec.
	(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
	(UNSPEC_STNT1_SVE_COUNT): Likewise.
	* config/aarch64/atomics.md (cas_short_expected_imm): Use Uih
	rather than Uph for HImode immediates.
	* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
	(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
	(@aarch64_stnt1<SVE_FULLx24:mode>): New patterns.
	(@aarch64_<sur>dot_prod_lane<vsi2qi>): Extend to...
	(@aarch64_<sur>dot_prod_lane<SVE_FULL_SDI:mode><SVE_FULL_BHI:mode>)
	(@aarch64_<sur>dot_prod_lane<VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>):
	...these new patterns.
	(SVE_WHILE_B, SVE_WHILE_B_X2, SVE_WHILE_C): New constants.  Add
	SVE_WHILE_B to existing while patterns.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve_ptrue_c<BHSD_BITS>)
	(@aarch64_sve_pext<BHSD_BITS>, @aarch64_sve_pext<BHSD_BITS>x2)
	(@aarch64_sve_psel<BHSD_BITS>, *aarch64_sve_psel<BHSD_BITS>_plus)
	(@aarch64_sve_cntp_c<BHSD_BITS>, <frint_pattern><mode>2)
	(<optab><mode>3, *<optab><mode>3, @aarch64_sve_single_<optab><mode>)
	(@aarch64_sve_<sve_int_op><mode>): New patterns.
	(@aarch64_sve_single_<sve_int_op><mode>, @aarch64_sve_<su>clamp<mode>)
	(*aarch64_sve_<su>clamp<mode>_x, @aarch64_sve_<su>clamp_single<mode>)
	(@aarch64_sve_fclamp<mode>, *aarch64_sve_fclamp<mode>_x)
	(@aarch64_sve_fclamp_single<mode>, <optab><mode><v2xwide>2)
	(@aarch64_sve_<sur>dotvnx4sivnx8hi): New patterns.
	(@aarch64_sve_<maxmin_uns_op><mode>): Likewise.
	(*aarch64_sve_<maxmin_uns_op><mode>): Likewise.
	(@aarch64_sve_single_<maxmin_uns_op><mode>): Likewise.
	(aarch64_sve_fdotvnx4sfvnx8hf): Likewise.
	(aarch64_fdot_prod_lanevnx4sfvnx8hf): Likewise.
	(@aarch64_sve_<optab><VNx16QI_ONLY:mode><VNx16SI_ONLY:mode>): Likewise.
	(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8SI_ONLY:mode>): Likewise.
	(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8DI_ONLY:mode>): Likewise.
	(truncvnx8sf<mode>2, @aarch64_sve_cvtn<mode>): Likewise.
	(<optab><v_int_equiv><mode>2, <optab><mode><v_int_equiv>2): Likewise.
	(@aarch64_sve_sel<mode>): Likewise.
	(@aarch64_sve_while<while_optab_cmp>_b<BHSD_BITS>_x2): Likewise.
	(@aarch64_sve_while<while_optab_cmp>_c<BHSD_BITS>): Likewise.
	(@aarch64_pred_<optab><mode>, @cond_<optab><mode>): Likewise.
	(@aarch64_sve_<optab><mode>): Likewise.
	* config/aarch64/aarch64-sme.md (@aarch64_sme_<optab><mode><mode>)
	(*aarch64_sme_<optab><mode><mode>_plus, @aarch64_sme_read<mode>)
	(*aarch64_sme_read<mode>_plus, @aarch64_sme_write<mode>): New patterns.
	(*aarch64_sme_write<mode>_plus, aarch64_sme_zero_zt0): Likewise.
	(@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus)
	(@aarch64_sme_single_<optab><mode>): Likewise.
	(*aarch64_sme_single_<optab><mode>_plus): Likewise.
	(@aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>)
	(*aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>_plus)
	(@aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>)
	(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>_plus)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
	(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
	(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
	(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
	(@aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>)
	(*aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>_plus)
	(@aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
	(*aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
	(@aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
	(*aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
	(@aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
	(*aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx8HI_ONLY:mode>)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx4SI_ONLY:mode>)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
	(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
	(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
	(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
	(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
	(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
	(@aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
	(*aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
	(@aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
	(*aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
	(@aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
	(*aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
	(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>)
	(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>_plus)
	(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
	(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
	(@aarch64_sme_lut<LUTI_BITS><mode>): Likewise.
	(UNSPEC_SME_LUTI): New unspec.
	* config/aarch64/aarch64-sve-builtins.def (single): New mode suffix.
	(c8, c16, c32, c64): New type suffixes.
	(vg1x2, vg1x4, vg2, vg2x1, vg2x2, vg2x4, vg4, vg4x1, vg4x2)
	(vg4x4): New group suffixes.
	* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZT0)
	(CP_WRITE_ZT0): New constants.
	(get_svbool_t): Delete.
	(function_resolver::report_mismatched_num_vectors): New member
	function.
	(function_resolver::resolve_conversion): Likewise.
	(function_resolver::infer_predicate_type): Likewise.
	(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
	(function_resolver::require_matching_predicate_type): Likewise.
	(function_resolver::require_nonscalar_type): Likewise.
	(function_resolver::finish_opt_single_resolution): Likewise.
	(function_resolver::require_derived_vector_type): Add an
	expected_num_vectors parameter.
	(function_expander::map_to_rtx_codes): Add an extra parameter
	for unconditional FP unspecs.
	(function_instance::gp_type_index): New member function.
	(function_instance::gp_type): Likewise.
	(function_instance::gp_mode): Handle multi-vector operations.
	* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_count)
	(TYPES_all_pred_count, TYPES_c, TYPES_bhs_data, TYPES_bhs_widen)
	(TYPES_hs_data, TYPES_cvt_h_s_float, TYPES_cvt_s_s, TYPES_qcvt_x2)
	(TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4)
	(TYPES_qrshru_x4, TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu)
	(TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer)
	(TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned)
	(TYPES_za_s_float, TYPES_za_s_data, TYPES_za_d_h_integer): New type
	macros.
	(groups_x2, groups_x12, groups_x4, groups_x24, groups_x124)
	(groups_vg1x2, groups_vg1x4, groups_vg1x24, groups_vg2, groups_vg4)
	(groups_vg24): New group arrays.
	(function_instance::reads_global_state_p): Handle CP_READ_ZT0.
	(function_instance::modifies_global_state_p): Handle CP_WRITE_ZT0.
	(add_shared_state_attribute): Handle zt0 state.
	(function_builder::add_overloaded_functions): Skip MODE_single
	for non-tuple groups.
	(function_resolver::report_mismatched_num_vectors): New function.
	(function_resolver::resolve_to): Add a fallback error message for
	the general two-type case.
	(function_resolver::resolve_conversion): New function.
	(function_resolver::infer_predicate_type): Likewise.
	(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
	(function_resolver::require_matching_predicate_type): Likewise.
	(function_resolver::require_matching_vector_type): Specifically
	diagnose mismatched vector counts.
	(function_resolver::require_derived_vector_type): Add an
	expected_num_vectors parameter.  Extend to handle cases where
	tuples are expected.
	(function_resolver::require_nonscalar_type): New function.
	(function_resolver::check_gp_argument): Use gp_type_index rather
	than hard-coding VECTOR_TYPE_svbool_t.
	(function_resolver::finish_opt_single_resolution): New function.
	(function_checker::require_immediate_either_or): Remove hard-coded
	constants.
	(function_expander::direct_optab_handler): New function.
	(function_expander::use_pred_x_insn): Only add a strictness flag
	if the insn has an operand for it.
	(function_expander::map_to_rtx_codes): Take an unconditional
	FP unspec as an extra parameter.  Handle tuples and MODE_single.
	(function_expander::map_to_unspecs): Handle tuples and MODE_single.
	* config/aarch64/aarch64-sve-builtins-functions.h (read_zt0)
	(write_zt0): New typedefs.
	(full_width_access::memory_vector): Use the function's
	vectors_per_tuple.
	(rtx_code_function_base): Add an optional unconditional FP unspec.
	(rtx_code_function::expand): Update accordingly.
	(rtx_code_function_rotated::expand): Likewise.
	(unspec_based_function_exact_insn::expand): Use tuple_mode instead
	of vector_mode.
	(unspec_based_uncond_function): New typedef.
	(cond_or_uncond_unspec_function): New class.
	(sme_1mode_function::expand): Handle single forms.
	(sme_2mode_function_t): Likewise, adding a template parameter for them.
	(sme_2mode_function): Update accordingly.
	(sme_2mode_lane_function): New typedef.
	(multireg_permute): New class.
	(class integer_conversion): Likewise.
	(while_comparison::expand): Handle svcount_t and svboolx2_t results.
	* config/aarch64/aarch64-sve-builtins-shapes.h
	(binary_int_opt_single_n, binary_opt_single_n, binary_single)
	(binary_za_slice_lane, binary_za_slice_int_opt_single)
	(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
	(binaryxn, clamp, compare_scalar_count, count_pred_c)
	(dot_za_slice_int_lane, dot_za_slice_lane, dot_za_slice_uint_lane)
	(extract_pred, inherent_zt, ldr_zt, read_za, read_za_slice)
	(select_pred, shift_right_imm_narrowxn, storexn, str_zt)
	(unary_convertxn, unary_za_slice, unaryxn, write_za)
	(write_za_slice): Declare.
	* config/aarch64/aarch64-sve-builtins-shapes.cc
	(za_group_is_pure_overload): New function.
	(apply_predication): Use the function's gp_type for the predicate,
	instead of hard-coding the use of svbool_t.
	(parse_element_type): Add support for "c" (svcount_t).
	(parse_type): Add support for "c0" and "c1" (conversion destination
	and source types).
	(binary_za_slice_lane_base): New class.
	(binary_za_slice_opt_single_base): Likewise.
	(load_contiguous_base::resolve): Pass the group suffix to r.resolve.
	(luti_lane_zt_base): New class.
	(binary_int_opt_single_n, binary_opt_single_n, binary_single)
	(binary_za_slice_lane, binary_za_slice_int_opt_single)
	(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
	(binaryxn, clamp): New shapes.
	(compare_scalar_def::build): Allow the return type to be a tuple.
	(compare_scalar_def::expand): Pass the group suffix to r.resolve.
	(compare_scalar_count, count_pred_c, dot_za_slice_int_lane)
	(dot_za_slice_lane, dot_za_slice_uint_lane, extract_pred, inherent_zt)
	(ldr_zt, read_za, read_za_slice, select_pred, shift_right_imm_narrowxn)
	(storexn, str_zt): New shapes.
	(ternary_qq_lane_def, ternary_qq_opt_n_def): Replace with...
	(ternary_qq_or_011_lane_def, ternary_qq_opt_n_or_011_def): ...these
	new classes.  Allow a second suffix that specifies the type of the
	second vector argument, and that is used to derive the third.
	(unary_def::build): Extend to handle tuple types.
	(unary_convert_def::build): Use the new c0 and c1 format specifiers.
	(unary_convertxn, unary_za_slice, unaryxn, write_za): New shapes.
	(write_za_slice): Likewise.
	* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand)
	(svext_bhw_impl::expand): Update call to map_to_rtx_codes.
	(svcntp_impl::expand): Handle svcount_t variants.
	(svcvt_impl::expand): Handle unpredicated conversions separately,
	dealing with tuples.
	(svdot_impl::expand): Handle 2-way dot products.
	(svdotprod_lane_impl::expand): Likewise.
	(svld1_impl::fold): Punt on tuple loads.
	(svld1_impl::expand): Handle tuple loads.
	(svldnt1_impl::expand): Likewise.
	(svpfalse_impl::fold): Punt on svcount_t forms.
	(svptrue_impl::fold): Likewise.
	(svptrue_impl::expand): Handle svcount_t forms.
	(svrint_impl): New class.
	(svsel_impl::fold): Punt on tuple forms.
	(svsel_impl::expand): Handle tuple forms.
	(svst1_impl::fold): Punt on tuple stores.
	(svst1_impl::expand): Handle tuple stores.
	(svstnt1_impl::expand): Likewise.
	(svwhilelx_impl::fold): Punt on tuple forms.
	(svdot_lane): Use UNSPEC_FDOT.
	(svmax, svmaxnm, svmin, svminnm): Add unconditional FP unspecs.
	(rinta, rinti, rintm, rintn, rintp, rintx, rintz): Use svrint_impl.
	* config/aarch64/aarch64-sve-builtins-base.def (svcreate2, svget2)
	(svset2, svundef2): Add _b variants.
	(svcvt): Use unary_convertxn.
	(svdot): Use ternary_qq_opt_n_or_011.
	(svdot_lane): Use ternary_qq_or_011_lane.
	(svmax, svmaxnm, svmin, svminnm): Use binary_opt_single_n.
	(svpfalse): Add a form that returns svcount_t results.
	(svrinta, svrintm, svrintn, svrintp): Use unaryxn.
	(svsel): Use binaryxn.
	(svst1, svstnt1): Use storexn.
	* config/aarch64/aarch64-sve-builtins-sme.h
	(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
	(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
	(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
	(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
	(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
	(svvdot_lane_za, svwrite_za, svzero_zt): Declare.
	* config/aarch64/aarch64-sve-builtins-sme.cc (load_store_za_base):
	Rename to...
	(load_store_za_zt0_base): ...this and extend to tuples.
	(load_za_base, store_za_base): Update accordingly.
	(expand_ldr_str_zt0): New function.
	(svldr_zt_impl, svluti_lane_zt_impl, svread_za_impl, svstr_zt_impl)
	(svsudot_za_impl, svwrite_za_impl, svzero_zt_impl): New classes.
	(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
	(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
	(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
	(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
	(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
	(svvdot_lane_za, svwrite_za, svzero_zt): New functions.
	* config/aarch64/aarch64-sve-builtins-sme.def: Add SME2 intrinsics.
	* config/aarch64/aarch64-sve-builtins-sve2.h
	(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
	(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
	(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
	(svzipq): Declare.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (svclamp_impl)
	(svcvtn_impl, svpext_impl, svpsel_impl): New classes.
	(svqrshl_impl::fold): Update for change to svrshl shape.
	(svrshl_impl::fold): Punt on tuple forms.
	(svsqadd_impl::expand): Update call to map_to_rtx_codes.
	(svunpk_impl): New class.
	(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
	(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
	(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
	(svzipq): New functions.
	* config/aarch64/aarch64-sve-builtins-sve2.def: Add SME2 intrinsics.
	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
	or undefine __ARM_FEATURE_SME2.
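
Since the compiler now defines or undefines __ARM_FEATURE_SME2, user
code can test the macro before relying on the new intrinsics.  A minimal
sketch follows; the function name sme2_available is hypothetical and
chosen only for this example.

```c
/* Hypothetical helper: returns 1 when the compiler predefines
   __ARM_FEATURE_SME2 for the current target options, else 0.  */
int sme2_available(void)
{
#ifdef __ARM_FEATURE_SME2
  return 1;
#else
  return 0;
#endif
}
```

The check happens at compile time, so the result reflects the options
used to build the translation unit rather than the CPU the program
later runs on.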

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Provide a way
	for test functions to share ZT0.
	(ATTR): Update accordingly.
	(TEST_LOAD_COUNT, TEST_STORE_COUNT, TEST_PN, TEST_COUNT_PN)
	(TEST_EXTRACT_PN, TEST_SELECT_P, TEST_COMPARE_S_X2, TEST_COMPARE_S_C)
	(TEST_CREATE_B, TEST_GET_B, TEST_SET_B, TEST_XN, TEST_XN_SINGLE)
	(TEST_XN_SINGLE_Z15, TEST_XN_SINGLE_AWKWARD, TEST_X2_NARROW)
	(TEST_X4_NARROW): New macros.
	* gcc.target/aarch64/sve/acle/asm/create2_1.c: Add _b tests.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Remove
	test for svmopa that becomes valid with SME2.
	* gcc.target/aarch64/sve/acle/general-c/create_1.c: Adjust for
	existence of svboolx2_t version of svcreate2.
	* gcc.target/aarch64/sve/acle/general-c/store_1.c: Adjust error
	messages to account for svcount_t predication.
	* gcc.target/aarch64/sve/acle/general-c/store_2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_qq_lane_1.c: Adjust
	error messages to account for new SME2 variants.
	* gcc.target/aarch64/sve/acle/general-c/ternary_qq_opt_n_2.c: Likewise.
parent 8d29b7ac
Showing with 4273 additions and 194 deletions