  1. Oct 13, 2023
    • Daily bump. · f9ef2e6d
      GCC Administrator authored
    • RISC-V: Support FP lceil/lceilf auto vectorization · 51f7bfaa
      Pan Li authored
      
      This patch would like to support the FP lceil/lceilf auto vectorization.
      
      * long lceil (double) for rv64
      * long lceilf (float) for rv32
      
      Due to the limitation that only data types of the same size are allowed
      in the vectorizer, the standard name lceilmn2 only acts on DF => DI for
      rv64, and SF => SI for rv32.
      
      Given we have code like:
      
      void
      test_lceil (long *out, double *in, unsigned count)
      {
        for (unsigned i = 0; i < count; i++)
          out[i] = __builtin_lceil (in[i]);
      }
      
      Before this patch:
      .L3:
        ...
        fld         fa5,0(a1)
        fcvt.l.d    a5,fa5,rup
        sd          a5,-8(a0)
        ...
        bne         a1,a4,.L3
      
      After this patch:
        frrm        a6
        ...
        fsrmi       3 // RUP
      .L3:
        ...
        vsetvli     a3,zero,e64,m1,ta,ma
        vfcvt.x.f.v v1,v1
        vsetvli     zero,a2,e64,m1,ta,ma
        vse64.v     v1,0(a0)
        ...
        bne         a2,zero,.L3
        ...
        fsrm        a6
      
      The remaining cases, like SF => DI, HF => DI, DF => SI and HF => SI,
      will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
      
      gcc/ChangeLog:
      
      	* config/riscv/autovec.md (lceil<mode><v_i_l_ll_convert>2): New
      	pattern for lceil/lceilf.
      	* config/riscv/riscv-protos.h (enum insn_type): New enum value.
      	(expand_vec_lceil): New func decl for expanding lceil.
      	* config/riscv/riscv-v.cc (expand_vec_lceil): New func impl
      	for expanding lceil.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c: New test.
      	* gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
  2. Oct 12, 2023
    • PR111778, PowerPC: Do not depend on an undefined shift · 611eef76
      Michael Meissner authored
      I was building a cross compiler to PowerPC on my x86_64 workstation with the
      latest version of GCC on October 11th.  I could not build the compiler on the
      x86_64 system as it died in building libgcc.  I looked into it, and I
      discovered the compiler was recursing until it ran out of stack space.  If I
      build a native compiler with the same sources on a PowerPC system, it builds
      fine.
      
      I traced this down to a change made around October 10th:
      
      | commit 8f1a70a4 (HEAD)
      | Author: Jiufu Guo <guojiufu@linux.ibm.com>
      | Date:   Tue Jan 10 20:52:33 2023 +0800
      |
      |   rs6000: build constant via li/lis;rldicl/rldicr
      |
      |   If a constant is possible left/right cleaned on a rotated value from
      |   a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
      |   to build the constant.
      
      The code was doing a -1 << 64, which is undefined behavior and which
      different machines handle differently.  On the x86_64 system, (-1 << 64)
      produces -1, while on a PowerPC 64-bit system, (-1 << 64) produces 0.  The
      x86_64 build then recurses until the stack runs out of space.
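      As a minimal standalone illustration (not the actual rs6000 code), the
      usual way to make such a mask computation safe is to guard the boundary
      shift counts explicitly:

      unsigned long long
      mask_low_bits (unsigned long long value, int bits)
      {
        /* value << 64 is undefined for a 64-bit type, and x86_64 and
           PowerPC happen to disagree on what it produces.  Guard the
           boundary cases instead of shifting by the full width.  */
        if (bits <= 0)
          return 0;
        if (bits >= 64)
          return value;
        return value & ((1ULL << bits) - 1);
      }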
      
      If I apply this patch, the compiler builds fine on both x86_64 as a PowerPC
      cross compiler and on a native PowerPC system.
      
      2023-10-12  Michael Meissner  <meissner@linux.ibm.com>
      
      gcc/
      
      	PR target/111778
      	* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): Protect
      	code from shifts that are undefined.
      	(can_be_built_by_li_lis_and_rldicr): Likewise.
      	(can_be_built_by_li_and_rldic): Protect code from shifts that are
      	undefined.  Also replace uses of 1ULL with HOST_WIDE_INT_1U.
    • libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory · 8bd11fa4
      Tobias Burnus authored
      In OpenMP 5.0/5.1, the semantics of OMP_TARGET_OFFLOAD=mandatory were
      insufficiently specified; 5.2 clarified this with extensions/clarifications
      (omp_initial_device, omp_invalid_device, "conforming device number").
      GCC's implementation matches OpenMP 5.2.
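      For illustration (a made-up example, not from the patch), with
      OMP_TARGET_OFFLOAD=mandatory the target region below must be offloaded;
      under the 5.2 wording, execution terminates with an error if no
      conforming device number is available, instead of silently falling back
      to the host:

      #include <omp.h>
      #include <stdio.h>

      int
      main (void)
      {
        int on_host = 1;
        /* With OMP_TARGET_OFFLOAD=mandatory this region must run on a
           device; plain host fallback is no longer a permitted outcome.  */
        #pragma omp target map(tofrom: on_host)
        on_host = omp_is_initial_device ();
        printf ("ran on %s\n", on_host ? "host" : "device");
        return 0;
      }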
      
      libgomp/ChangeLog:
      
      	* libgomp.texi (OMP_DEFAULT_DEVICE): Update spec ref; add @ref to
      	OMP_TARGET_OFFLOAD.
      	(OMP_TARGET_OFFLOAD): Update spec ref; add @ref to OMP_DEFAULT_DEVICE;
      	clarify MANDATORY behavior.
    • reg-notes.def: Fix up description of REG_NOALIAS · f150717b
      Alex Coplan authored
      The description of the REG_NOALIAS note in reg-notes.def isn't quite
      right. It describes it as being attached to call insns, but it is
      instead attached to a move insn receiving the return value from a call.
      
      This can be seen by looking at the code in calls.cc:expand_call which
      attaches the note:
      
        emit_move_insn (temp, valreg);
      
        /* The return value from a malloc-like function cannot alias
           anything else.  */
        last = get_last_insn ();
        add_reg_note (last, REG_NOALIAS, temp);
      
      gcc/ChangeLog:
      
      	* reg-notes.def (NOALIAS): Correct comment.
    • RISC-V: Make xtheadcondmov-indirect tests robust against instruction reordering · d8c3ace8
      Christoph Müllner authored
      
      Fixes: c1bc7513 ("RISC-V: const: hide mvconst splitter from IRA")
      
      A recent change broke the xtheadcondmov-indirect tests, because the order
      of emitted instructions changed.  Since the tests are too strict when
      checking for a fixed instruction order, let's change them to simply count
      instructions, like it is done for similar tests, e.g.:
      
      Reported-by: Patrick O'Neill <patrick@rivosinc.com>
      Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/xtheadcondmov-indirect.c: Make robust against
      	instruction reordering.
    • wide-int: Fix build with gcc < 12 or clang++ [PR111787] · 53a94071
      Jakub Jelinek authored
      While my wide_int patch bootstrapped/regtested fine when I used GCC 12
      as system gcc, apparently it doesn't with GCC 11 and older or clang++.
      For GCC before the PR96555 C++ DR1315 implementation, the compiler
      complains about a template argument involving template parameters;
      clang++ does the same, plus complains about a missing needs_write_val_arg
      static data member in some wi::int_traits specializations.
      
      2023-10-12  Jakub Jelinek  <jakub@redhat.com>
      
      	PR bootstrap/111787
      	* tree.h (wi::int_traits <unextended_tree>::needs_write_val_arg): New
      	static data member.
      	(int_traits <extended_tree <N>>::needs_write_val_arg): Likewise.
      	(wi::ints_for): Provide separate partial specializations for
      	generic_wide_int <extended_tree <N>> and INL_CONST_PRECISION or that
      	and CONST_PRECISION, rather than using
      	int_traits <extended_tree <N> >::precision_type as the second template
      	argument.
      	* rtl.h (wi::int_traits <rtx_mode_t>::needs_write_val_arg): New
      	static data member.
      	* double-int.h (wi::int_traits <double_int>::needs_write_val_arg):
      	Likewise.
    • RISCV: Bugfix for incorrect documentation heading nesting · e99ad401
      Mary Bennett authored
      	PR middle-end/111777
      
      gcc/ChangeLog:
      	* doc/extend.texi: Change subsubsection to subsection for
      	CORE-V built-ins.
    • AArch64: Fix Armv9-a warnings that get emitted whenever an ACLE header is used. · de593b3c
      Tamar Christina authored
      At the moment, trying to use -march=armv9-a with any ACLE header such as
      arm_neon.h results in rows and rows of warnings saying:
      
      <built-in>: warning: "__ARM_ARCH" redefined
      <built-in>: note: this is the location of the previous definition
      
      This is obviously not useful and happens because the header was defined at
      __ARM_ARCH == 8 and the command line changes it.
      
      The Arm port solves this by undefining the macro during argument
      processing, and we do the same on AArch64 for the majority of macros.
      However, we define this macro using a different helper, which requires a
      manual undef.
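      The warning can be reproduced with a one-line translation unit
      (illustrative):

      /* Compile with: gcc -march=armv9-a -c t.c on an AArch64 target.
         Before this fix, this emits the __ARM_ARCH redefinition warning.  */
      #include <arm_neon.h>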
      
      Thanks,
      Tamar
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Add undef.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/armv9_warning.c: New test.
    • wide-int: Add simple CHECKING_P stack-protector canary like checking · fb590e4e
      Jakub Jelinek authored
      This patch adds hopefully not so expensive --enable-checking=yes
      verification that the widest_int upper length bound estimates are really
      upper bounds and nothing attempts to write more elements.
      It is done only if the estimated upper length bound is smaller than
      WIDE_INT_MAX_INL_ELTS, but that should be the most common case unless
      large _BitInt is involved.
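      The scheme, as a standalone C sketch (GCC's real code lives in
      widest_int_storage::write_val/set_len and differs in detail):

      #include <assert.h>

      #define BUF_ELTS 9                     /* inline limb capacity */
      #define CANARY   0x646f6e65646f6e65LL  /* arbitrary guard value */

      /* write_val-style: the caller promises to write at most EST limbs;
         if that leaves room in the inline buffer, plant a canary right
         after the estimated end.  */
      static long long *
      write_val (long long *buf, unsigned est)
      {
        if (est < BUF_ELTS)
          buf[est] = CANARY;
        return buf;
      }

      /* set_len-style: verify nothing wrote past the estimate.  */
      static void
      set_len (long long *buf, unsigned len, unsigned est)
      {
        assert (len <= est);
        if (est < BUF_ELTS)
          assert (buf[est] == CANARY);
      }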
      
      2023-10-12  Jakub Jelinek  <jakub@redhat.com>
      
      	* wide-int.h (widest_int_storage <N>::write_val): If l is small
      	and there is space in u.val array, store a canary value at the
      	end when checking.
      	(widest_int_storage <N>::set_len): Check the canary hasn't been
      	overwritten.
    • wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989] · 0d00385e
      Jakub Jelinek authored
      As mentioned in the _BitInt support thread, _BitInt(N) is currently limited
      by the wide_int/widest_int maximum precision limitation, which is depending
      on target 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION).
      That is fairly low limit for _BitInt, especially on the targets with the 191
      bit limitation.
      
      The following patch bumps that limit to 16319 bits on all arches (which support
      _BitInt at all), which is the limit imposed by INTEGER_CST representation
      (unsigned char members holding number of HOST_WIDE_INT limbs).
      
      In order to achieve that, wide_int is changed from a trivially copyable type
      which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or
      11 limbs depending on target) limbs into a non-trivially copy constructible,
      copy assignable and destructible type which for the usual small cases (up
      to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses
      an inline array of limbs, but for larger precisions uses heap allocated
      limb array.  This makes wide_int unusable in GC structures, so for dwarf2out
      which was the only place which needed it there is a new rwide_int type
      (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
      inline and is trivially copyable (dwarf2out should never deal with large
      _BitInt constants, those should have been lowered earlier).
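      Schematically, the new storage looks like this (a C sketch of the
      scheme, not the actual class):

      #include <stdlib.h>
      #include <string.h>

      #define MAX_INL_ELTS 9   /* the former WIDE_INT_MAX_ELTS */

      struct wint
      {
        unsigned precision;
        unsigned len;
        union
        {
          long long val[MAX_INL_ELTS];  /* small precisions: inline */
          long long *valp;              /* large precisions: heap */
        } u;
      };

      /* Copying must deep-copy the heap case; this is what makes the real
         type non-trivially copyable and hence unusable in GC structures.  */
      static void
      wint_copy (struct wint *dst, const struct wint *src, unsigned elts)
      {
        *dst = *src;
        if (elts > MAX_INL_ELTS)
          {
            dst->u.valp = (long long *) malloc (elts * sizeof (long long));
            memcpy (dst->u.valp, src->u.valp, src->len * sizeof (long long));
          }
      }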
      
      Similarly, widest_int has been changed from a trivially copyable type which
      contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike
      wide_int didn't contain precision and assumed that to be
      WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy
      assignable and destructible type which has always WIDEST_INT_MAX_PRECISION
      precision (32640 bits currently, twice as much as INTEGER_CST limitation
      allows) and unlike wide_int decides depending on get_len () value whether
      it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap
      allocated one.  In wide-int.h this means we need to estimate an upper
      bound on how many limbs wide-int.cc (usually; sometimes wide-int.h) will
      need to write, heap allocate if needed based on that estimate, and upon
      set_len, which is done at the end, copy and deallocate if we guessed over
      WIDE_INT_MAX_INL_ELTS and allocated dynamically while actually needing
      less than that.  The inexact guesses are needed because the exact
      computation of the length in wide-int.cc is sometimes quite complex and
      especially canonicalize at the end can decrease it.  widest_int is,
      again because of this, not usable in GC structures, so cfgloop.h has been
      changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and
      punt if we'd have larger _BitInt based iterators; programs having more
      than 128-bit iterators will hopefully be rare, and I think it is fine to
      treat loops with more than 2^127 iterations as effectively possibly
      infinite.  omp-general.cc is changed to use fixed_wide_int_storage <1024>,
      as it better should support scores with the same precision on all arches.
      
      Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing
      wide_int/widest_int into a buffer had to be changed to use XALLOCAVEC
      for larger lengths.
      
      On x86_64, the patch in --enable-checking=yes,rtl,extra configured
      bootstrapped cc1plus enlarges the .text section by 1.01% - from
      0x25725a5 to 0x25e5555 and similarly at least when compiling insn-recog.cc
      with the usual bootstrap option slows compilation down by 1.01%,
      user 4m22.046s and 4m22.384s on vanilla trunk vs.
      4m25.947s and 4m25.581s on patched trunk.  I'm afraid some code size growth
      and compile time slowdown is unavoidable in this case, we use wide_int and
      widest_int everywhere, and while the rare cases are marked with UNLIKELY
      macros, it still means extra checks for it.
      
      The patch also regresses
      +FAIL: gm2/pim/fail/largeconst.mod,  -O
      +FAIL: gm2/pim/fail/largeconst.mod,  -O -g
      +FAIL: gm2/pim/fail/largeconst.mod,  -O3 -fomit-frame-pointer
      +FAIL: gm2/pim/fail/largeconst.mod,  -O3 -fomit-frame-pointer -finline-functions
      +FAIL: gm2/pim/fail/largeconst.mod,  -Os
      +FAIL: gm2/pim/fail/largeconst.mod,  -g
      +FAIL: gm2/pim/fail/largeconst2.mod,  -O
      +FAIL: gm2/pim/fail/largeconst2.mod,  -O -g
      +FAIL: gm2/pim/fail/largeconst2.mod,  -O3 -fomit-frame-pointer
      +FAIL: gm2/pim/fail/largeconst2.mod,  -O3 -fomit-frame-pointer -finline-functions
      +FAIL: gm2/pim/fail/largeconst2.mod,  -Os
      +FAIL: gm2/pim/fail/largeconst2.mod,  -g
      tests, which previously were rejected with
      error: constant literal ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’ exceeds internal ZTYPE range
      kind of errors, but are now accepted.  It seems the FE tries to parse
      constants into widest_int in that case and only diagnoses if widest_int
      overflows; that seems wrong.  It should at least punt if stuff doesn't
      fit into WIDE_INT_MAX_PRECISION, but perhaps far less than that; if it
      wants middle-end support for precisions above 128-bit, it had better use
      BITINT_TYPE.  Will file a PR and defer to the Modula2 maintainer.
      
      2023-10-12  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c/102989
      	* wide-int.h: Adjust file comment.
      	(WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS.
      	(WIDE_INT_MAX_INL_PRECISION): Define.
      	(WIDE_INT_MAX_ELTS): Change to 255.  Assert that WIDE_INT_MAX_INL_ELTS
      	is smaller than WIDE_INT_MAX_ELTS.
      	(RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS,
      	WIDEST_INT_MAX_PRECISION): Define.
      	(WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers
      	to pass 0 as a new argument.
      	(class widest_int_storage): Likewise.
      	(widest_int, widest2_int): Change typedefs to use widest_int_storage
      	rather than fixed_wide_int_storage.
      	(enum wi::precision_type): Add INL_CONST_PRECISION enumerator.
      	(struct binary_traits): Add partial specializations for
      	INL_CONST_PRECISION.
      	(generic_wide_int): Add needs_write_val_arg static data member.
      	(int_traits): Likewise.
      	(wide_int_storage): Replace val non-static data member with a union
      	u of it and HOST_WIDE_INT *valp.  Declare copy constructor, copy
      	assignment operator and destructor.  Add unsigned int argument to
      	write_val.
      	(wide_int_storage::wide_int_storage): Initialize precision to 0
      	in the default ctor.  Remove unnecessary {}s around STATIC_ASSERTs.
      	Assert in non-default ctor T's precision_type is not
      	INL_CONST_PRECISION and allocate u.valp for large precision.  Add
      	copy constructor.
      	(wide_int_storage::~wide_int_storage): New.
      	(wide_int_storage::operator=): Add copy assignment operator.  In
      	assignment operator remove unnecessary {}s around STATIC_ASSERTs,
      	assert ctor T's precision_type is not INL_CONST_PRECISION and
      	if precision changes, deallocate and/or allocate u.valp.
      	(wide_int_storage::get_val): Return u.valp rather than u.val for
      	large precision.
      	(wide_int_storage::write_val): Likewise.  Add an unused unsigned int
      	argument.
      	(wide_int_storage::set_len): Use write_val instead of writing val
      	directly.
      	(wide_int_storage::from, wide_int_storage::from_array): Adjust
      	write_val callers.
      	(wide_int_storage::create): Allocate u.valp for large precisions.
      	(wi::int_traits <wide_int_storage>::get_binary_precision): New.
      	(fixed_wide_int_storage::fixed_wide_int_storage): Make default
      	ctor defaulted.
      	(fixed_wide_int_storage::write_val): Add unused unsigned int argument.
      	(fixed_wide_int_storage::from, fixed_wide_int_storage::from_array):
      	Adjust write_val callers.
      	(wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New.
      	(WIDEST_INT): Define.
      	(widest_int_storage): New template class.
      	(wi::int_traits <widest_int_storage>): New.
      	(trailing_wide_int_storage::write_val): Add unused unsigned int
      	argument.
      	(wi::get_binary_precision): Use
      	wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision
      	rather than get_precision on get_binary_result.
      	(wi::copy): Adjust write_val callers.  Don't call set_len if
      	needs_write_val_arg.
      	(wi::bit_not): If result.needs_write_val_arg, call write_val
      	again with upper bound estimate of len.
      	(wi::sext, wi::zext, wi::set_bit): Likewise.
      	(wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not,
      	wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc,
      	wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc,
      	wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round,
      	wi::lshift, wi::lrshift, wi::arshift): Likewise.
      	(wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg
      	is false.
      	(gt_ggc_mx, gt_pch_nx): Remove generic template for all
      	generic_wide_int, instead add functions and templates for each
      	storage of generic_wide_int.  Make functions for
      	generic_wide_int <wide_int_storage> and templates for
      	generic_wide_int <widest_int_storage <N>> deleted.
      	(wi::mask, wi::shifted_mask): Adjust write_val calls.
      	* wide-int.cc (zeros): Decrease array size to 1.
      	(BLOCKS_NEEDED): Use CEIL.
      	(canonize): Use HOST_WIDE_INT_M1.
      	(wi::from_buffer): Pass 0 to write_val.
      	(wi::to_mpz): Use CEIL.
      	(wi::from_mpz): Likewise.  Pass 0 to write_val.  Use
      	WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS.
      	(wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of
      	MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec
      	above WIDE_INT_MAX_INL_PRECISION estimate precision from
      	lengths of operands.  Use XALLOCAVEC allocated buffers for
      	prec above WIDE_INT_MAX_INL_PRECISION.
      	(wi::divmod_internal): Likewise.
      	(wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate
      	it from xlen and skip.
      	(rshift_large_common): Remove xprecision argument, add len
      	argument with len computed in caller.  Don't return anything.
      	(wi::lrshift_large, wi::arshift_large): Compute len here
      	and pass it to rshift_large_common, for lengths above
      	WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible.
      	(assert_deceq, assert_hexeq): For lengths above
      	WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
      	(test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of
      	WIDE_INT_MAX_PRECISION.
      	* wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use
      	WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION.
      	* wide-int-print.cc (print_decs, print_decu, print_hex): For
      	lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
      	* tree.h (wi::int_traits<extended_tree <N>>): Change precision_type
      	to INL_CONST_PRECISION for N == ADDR_MAX_PRECISION.
      	(widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of
      	WIDE_INT_MAX_PRECISION.
      	(wi::ints_for): Use int_traits <extended_tree <N> >::precision_type
      	instead of hard coded CONST_PRECISION.
      	(widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of
      	WIDE_INT_MAX_PRECISION.
      	(wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather
      	than WIDE_INT_MAX_PRECISION.
      	(wi::ints_for::zero): Use
      	wi::int_traits <wi::extended_tree <N> >::precision_type instead of
      	wi::CONST_PRECISION.
      	* tree.cc (build_replicated_int_cst): Formatting fix.  Use
      	WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS.
      	* print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on
      	INTEGER_CSTs, TREE_VECs or SSA_NAMEs.
      	* double-int.h (wi::int_traits <double_int>::precision_type): Change
      	to INL_CONST_PRECISION from CONST_PRECISION.
      	* poly-int.h (struct poly_coeff_traits): Add partial specialization
      	for wi::INL_CONST_PRECISION.
      	* cfgloop.h (bound_wide_int): New typedef.
      	(struct nb_iter_bound): Change bound type from widest_int to
      	bound_wide_int.
      	(struct loop): Change nb_iterations_upper_bound,
      	nb_iterations_likely_upper_bound and nb_iterations_estimate type from
      	widest_int to bound_wide_int.
      	* cfgloop.cc (record_niter_bound): Return early if wi::min_precision
      	of i_bound is too large for bound_wide_int.  Adjustments for the
      	widest_int to bound_wide_int type change in non-static data members.
      	(get_estimated_loop_iterations, get_max_loop_iterations,
      	get_likely_max_loop_iterations): Adjustments for the widest_int to
      	bound_wide_int type change in non-static data members.
      	* tree-vect-loop.cc (vect_transform_loop): Likewise.
      	* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use
      	XALLOCAVEC allocated buffer for i_bound len above
      	WIDE_INT_MAX_INL_ELTS.
      	(record_estimate): Return early if wi::min_precision of i_bound is too
      	large for bound_wide_int.  Adjustments for the widest_int to
      	bound_wide_int type change in non-static data members.
      	(wide_int_cmp): Use bound_wide_int instead of widest_int.
      	(bound_index): Use bound_wide_int instead of widest_int.
      	(discover_iteration_bound_by_body_walk): Likewise.  Use
      	widest_int::from to convert it to widest_int when passed to
      	record_niter_bound.
      	(maybe_lower_iteration_bound): Use widest_int::from to convert it to
      	widest_int when passed to record_niter_bound.
      	(estimate_numbers_of_iteration): Don't record upper bound if
      	loop->nb_iterations has too large precision for bound_wide_int.
      	(n_of_executions_at_most): Use widest_int::from.
      	* tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for
      	the widest_int to bound_wide_int changes.
      	* match.pd (fold_sign_changed_comparison simplification): Use
      	wide_int::from on wi::to_wide instead of wi::to_widest.
      	* value-range.h (irange::maybe_resize): Avoid using memcpy on
      	non-trivially copyable elements.
      	* value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated
      	buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE.
      	* fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc):
      	Use wide_int::from on wi::to_wide instead of wi::to_widest.
      	* tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width
      	before calling wi::udiv_trunc.
      	* lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to
      	bound_wide_int type change in non-static data members.
      	* lto-streamer-in.cc (input_cfg): Likewise.
      	(lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than
      	WIDE_INT_MAX_ELTS.  For length above WIDE_INT_MAX_INL_ELTS use
      	XALLOCAVEC allocated buffer.  Formatting fix.
      	* data-streamer-in.cc (streamer_read_wide_int,
      	streamer_read_widest_int): Likewise.
      	* tree-affine.cc (aff_combination_expand): Use placement new to
      	construct name_expansion.
      	(free_name_expansion): Destruct name_expansion.
      	* gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change
      	index type from widest_int to offset_int.
      	(class incr_info_d): Change incr type from widest_int to offset_int.
      	(alloc_cand_and_find_basis, backtrace_base_for_ref,
      	restructure_reference, slsr_process_ref, create_mul_ssa_cand,
      	create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand,
      	slsr_process_add, cand_abs_increment, replace_mult_candidate,
      	replace_unconditional_candidate, incr_vec_index,
      	create_add_on_incoming_edge, create_phi_basis_1,
      	replace_conditional_candidate, record_increment,
      	record_phi_increments_1, phi_incr_cost_1, phi_incr_cost,
      	lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis,
      	nearest_common_dominator_for_cands, insert_initializers,
      	all_phi_incrs_profitable_1, replace_one_candidate,
      	replace_profitable_candidates): Use offset_int rather than widest_int
      	and wi::to_offset rather than wi::to_widest.
      	* real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than
      	2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC
      	allocated buffer.
      	* tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new
      	to construct tree_niter_desc and destruct it on failure.
      	(free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL.
      	* gengtype.cc (main): Remove widest_int handling.
      	* graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use
      	WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS.
      	* gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use
      	WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and
      	assert get_len () fits into it.
      	* value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks):
      	For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC
      	allocated buffer.
      	* gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use
      	wide_int::from on wi::to_wide instead of wi::to_widest.
      	* omp-general.cc (score_wide_int): New typedef.
      	(omp_context_compute_score): Use score_wide_int instead of widest_int
      	and adjust for those changes.
      	(struct omp_declare_variant_entry): Change score and
      	score_in_declare_simd_clone non-static data member type from widest_int
      	to score_wide_int.
      	(omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use
      	score_wide_int instead of widest_int and adjust for those changes.
      	(omp_lto_output_declare_variant_alt): Likewise.
      	(omp_lto_input_declare_variant_alt): Likewise.
      	* godump.cc (go_output_typedef): Assert get_len () is smaller than
      	WIDE_INT_MAX_INL_ELTS.
      gcc/c-family/
      	* c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead
      	of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS.
      gcc/testsuite/
      	* gcc.dg/bitint-38.c: New test.
    • LibF7: Implement atan2. · cd0185b8
      Georg-Johann Lay authored
      libgcc/config/avr/libf7/
      	* libf7.c (F7MOD_atan2_, f7_atan2): New module and function.
      	* libf7.h: Adjust comments.
      	* libf7-common.mk (CALL_PROLOGUES): Add atan2.
    • RISC-V: Support FP lround/lroundf auto vectorization · 2cc4f58a
      Pan Li authored
      
      This patch would like to support the FP lround/lroundf auto vectorization.
      
      * long lround (double) for rv64
      * long lroundf (float) for rv32
      
      Due to the limitation that only data types of the same size are allowed
      in the vectorizer, the standard name lroundmn2 only acts on DF => DI for
      rv64, and SF => SI for rv32.
      
      Given we have code like:
      
      void
      test_lround (long *out, double *in, unsigned count)
      {
        for (unsigned i = 0; i < count; i++)
          out[i] = __builtin_lround (in[i]);
      }
      
      Before this patch:
      .L3:
        ...
        fld      fa5,0(a1)
        fcvt.l.d a5,fa5,rmm
        sd       a5,-8(a0)
        ...
        bne      a1,a4,.L3
      
      After this patch:
        frrm     a6
        ...
        fsrmi    4 // RMM
      .L3:
        ...
        vsetvli     a3,zero,e64,m1,ta,ma
        vfcvt.x.f.v v1,v1
        vsetvli     zero,a2,e64,m1,ta,ma
        vse64.v     v1,0(a0)
        ...
        bne         a2,zero,.L3
        ...
        fsrm     a6
      
      The remaining cases, like SF => DI, HF => DI, DF => SI and HF => SI,
      will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
      
      gcc/ChangeLog:
      
      	* config/riscv/autovec.md (lround<mode><v_i_l_ll_convert>2): New
      	pattern for lround/lroundf.
      	* config/riscv/riscv-protos.h (enum insn_type): New enum value.
      	(expand_vec_lround): New func decl for expanding lround.
      	* config/riscv/riscv-v.cc (expand_vec_lround): New func impl
      	for expanding lround.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c: New test.
      	* gcc.target/riscv/rvv/autovec/vls/math-lround-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/vls/math-lround-1.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • dwarf2out: Stop using wide_int in GC structures · dfb40855
      Jakub Jelinek authored
      The planned wide_int/widest_int changes to support larger precisions
      make wide_int and widest_int unusable in GC structures, because they have
      non-trivial destructors (and may point to heap allocated memory).
      dwarf2out.{h,cc} is the only user of wide_int in GC structures (for
      val_wide), but actually doesn't need much: all those values are at one
      point created from a const wide_int_ref & and never changed afterwards,
      with just a couple of methods used on them.
      
      So, this patch replaces use of wide_int there with a new class, dw_wide_int,
      which contains just precision and len fields plus the limbs in a trailing
      array.  Most needed methods are implemented directly; just for the most
      complicated cases it temporarily constructs a wide_int_ref and calls its
      methods.
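      Shape-wise, the new type is the classic trailing-array idiom (an
      illustrative C analogue, not the actual declaration):

      #include <stdlib.h>

      struct dw_wide_int_sketch
      {
        unsigned int precision;
        unsigned int len;
        unsigned long long val[];  /* limbs, allocated in the same block */
      };

      static struct dw_wide_int_sketch *
      alloc_dw_wide_int_sketch (unsigned prec, unsigned len)
      {
        struct dw_wide_int_sketch *w
          = (struct dw_wide_int_sketch *)
            malloc (sizeof *w + len * sizeof (unsigned long long));
        w->precision = prec;
        w->len = len;
        return w;
      }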
      
      2023-10-12  Jakub Jelinek  <jakub@redhat.com>
      
      	* dwarf2out.h (wide_int_ptr): Remove.
      	(dw_wide_int_ptr): New typedef.
      	(struct dw_val_node): Change type of val_wide from wide_int_ptr
      	to dw_wide_int_ptr.
      	(struct dw_wide_int): New type.
      	(dw_wide_int::elt): New method.
      	(dw_wide_int::operator ==): Likewise.
      	* dwarf2out.cc (get_full_len): Change argument type to
      	const dw_wide_int & from const wide_int &.  Use CEIL.  Call
      	get_precision method instead of calling wi::get_precision.
      	(alloc_dw_wide_int): New function.
      	(add_AT_wide): Change w argument type to const wide_int_ref &
      	from const wide_int &.  Use alloc_dw_wide_int.
      	(mem_loc_descriptor, loc_descriptor): Use alloc_dw_wide_int.
      	(insert_wide_int): Change val argument type to const wide_int_ref &
      	from const wide_int &.
      	(add_const_value_attribute): Pass rtx_mode_t temporary directly to
      	add_AT_wide instead of using a temporary variable.
    • tree-optimization/111764 - wrong reduction vectorization · 05f98310
      Richard Biener authored
      The following removes a misguided attempt to allow x + x in a reduction
      path, which also allowed x * x, and that isn't valid.  x + x actually never
      arrives this way but instead is canonicalized to 2 * x.  This makes
      reduction path handling consistent with how we handle the single-stmt
      reduction case.
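      An illustrative reduction of the problematic shape (not the committed
      testcase):

      long
      f (long *a, int n)
      {
        long r = 1;
        for (int i = 0; i < n; i++)
          {
            long x = r + a[i];
            r = x * x;  /* both operands are the same value; unlike
                           x + x (== 2 * x), squaring is not a valid
                           reduction operation.  */
          }
        return r;
      }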
      
      	PR tree-optimization/111764
      	* tree-vect-loop.cc (check_reduction_path): Remove the attempt
      	to allow x + x via special-casing of assigns.
      
      	* gcc.dg/vect/pr111764.c: New testcase.
    • Support Intel USER_MSR · 5fbd91b1
      Hu, Lin1 authored
      gcc/ChangeLog:
      
      	* common/config/i386/cpuinfo.h (get_available_features):
      	Detect USER_MSR.
      	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_USER_MSR_SET): New.
      	(OPTION_MASK_ISA2_USER_MSR_UNSET): Ditto.
      	(ix86_handle_option): Handle -musermsr.
      	* common/config/i386/i386-cpuinfo.h (enum processor_features):
      	Add FEATURE_USER_MSR.
      	* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for usermsr.
      	* config.gcc: Add usermsrintrin.h
      	* config/i386/cpuid.h (bit_USER_MSR): New.
      	* config/i386/i386-builtin-types.def:
      	Add DEF_FUNCTION_TYPE (VOID, UINT64, UINT64).
      	* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
      	Add __builtin_urdmsr and __builtin_uwrmsr.
      	* config/i386/i386-builtins.h (ix86_builtins):
      	Add IX86_BUILTIN_URDMSR and IX86_BUILTIN_UWRMSR.
      	* config/i386/i386-c.cc (ix86_target_macros_internal):
      	Define __USER_MSR__.
      	* config/i386/i386-expand.cc (ix86_expand_builtin):
      	Handle new builtins.
      	* config/i386/i386-isa.def (USER_MSR): Add DEF_PTA(USER_MSR).
      	* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
      	Handle usermsr.
      	* config/i386/i386.md (urdmsr): New define_insn.
      	(uwrmsr): Ditto.
      	* config/i386/i386.opt: Add option -musermsr.
      	* config/i386/x86gprintrin.h: Include usermsrintrin.h
      	* doc/extend.texi: Document usermsr.
      	* doc/invoke.texi: Document -musermsr.
      	* doc/sourcebuild.texi: Document target usermsr.
      	* config/i386/usermsrintrin.h: New file.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
      	* gcc.target/i386/x86gprintrin-1.c: Add -musermsr for 64bit target.
      	* gcc.target/i386/x86gprintrin-2.c: Ditto.
      	* gcc.target/i386/x86gprintrin-3.c: Ditto.
      	* gcc.target/i386/x86gprintrin-4.c: Add musermsr for 64bit target.
      	* gcc.target/i386/x86gprintrin-5.c: Ditto
      	* gcc.target/i386/user_msr-1.c: New test.
      	* gcc.target/i386/user_msr-2.c: Ditto.
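      Usage sketch (assuming the _urdmsr/_uwrmsr intrinsic spellings from the
      new usermsrintrin.h; compile with -musermsr on a 64-bit target):

      #include <x86gprintrin.h>

      unsigned long long
      bump_msr (void)
      {
        /* 0x1234 is just a placeholder user-readable MSR index.  */
        unsigned long long v = _urdmsr (0x1234ULL);
        _uwrmsr (0x1234ULL, v | 1);
        return v;
      }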
    • LoongArch: Modify check_effective_target_vect_int_mod according to SX/ASX capabilities. · 39488446
      Chenghui Pan authored
      gcc/testsuite/ChangeLog:
      
      	* lib/target-supports.exp: Add LoongArch in
      	check_effective_target_vect_int_mod according to SX/ASX capabilities.
    • LoongArch: Enable vect.exp for LoongArch. [PR111424] · a2a51b69
      Chenghui Pan authored
      gcc/testsuite/ChangeLog:
      
      	PR target/111424
      	* lib/target-supports.exp: Enable vect.exp for LoongArch.
    • LoongArch: Adjust makefile dependency for loongarch headers. · 3c231836
      Yang Yujie authored
      gcc/ChangeLog:
      
      	* config.gcc: Add loongarch-driver.h to tm_files.
      	* config/loongarch/loongarch.h: Do not include loongarch-driver.h.
      	* config/loongarch/t-loongarch: Append loongarch-multilib.h to $(GTM_H)
      	instead of $(TM_H) for building generator programs.
    • Fortran: Set hidden string length for pointer components [PR67740]. · 701363d8
      Paul Thomas authored
      2023-10-11  Paul Thomas  <pault@gcc.gnu.org>
      
      gcc/fortran
      	PR fortran/67740
      	* trans-expr.cc (gfc_trans_pointer_assignment): Set the hidden
      	string length component for pointer assignment to character
      	pointer components.
      
      gcc/testsuite/
      	PR fortran/67740
      	* gfortran.dg/pr67740.f90: New test.
    • rs6000: Make 32 bit stack_protect support prefixed insn [PR111367] · 530babc2
      Kewen Lin authored
      As PR111367 shows, with prefixed insns supported, some of
      the checks consider it possible to leverage prefixed insns
      for stack protect related load/store, but since we don't
      actually change the emitted assembly for 32 bit, it can
      cause the assembler error as exposed.
      
      Mike's commit r10-4547-gce6a6c007e5a98 has already handled
      the 64 bit case (DImode); this patch treats the 32 bit case
      (SImode) by making use of the mode iterator P and the
      ptrload attribute iterator, and also fixes the constraints
      to match the emitted operand formats.
      
      	PR target/111367
      
      gcc/ChangeLog:
      
      	* config/rs6000/rs6000.md (stack_protect_setsi): Support prefixed
      	instruction emission and incorporate to stack_protect_set<mode>.
      	(stack_protect_setdi): Rename to ...
      	(stack_protect_set<mode>): ... this, adjust constraint.
      	(stack_protect_testsi): Support prefixed instruction emission and
      	incorporate to stack_protect_test<mode>.
      	(stack_protect_testdi): Rename to ...
      	(stack_protect_test<mode>): ... this, adjust constraint.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.target/powerpc/pr111367.C: New test.
    • testsuite: Avoid uninit var in pr60510.f [PR111427] · 610b845a
      Kewen Lin authored
      The uninitialized variable a in pr60510.f can cause
      some random failures as exposed in PR111427.  This
      patch initializes it accordingly.
      
      	PR testsuite/111427
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/vect/pr60510.f (test): Init variable a.
    • vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE · f1a05dc1
      Kewen Lin authored
      For VMAT_CONTIGUOUS_REVERSE, the transform code in function
      vectorizable_store generates a VEC_PERM_EXPR stmt before
      storing, but it's never considered in costing.
      
      This patch makes it consider vec_perm in costing; it also
      adjusts the order of the transform code a bit to make it
      easy to return early for costing_p.
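      For reference, a loop of this shape gets VMAT_CONTIGUOUS_REVERSE
      (illustrative):

      /* The stores walk backwards through a, so each vector is
         reversed with a VEC_PERM_EXPR before being stored.  */
      void
      f (int *restrict a, int *restrict b, int n)
      {
        for (int i = 0; i < n; i++)
          a[n - 1 - i] = b[i];
      }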
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vectorizable_store): Consider generated
      	VEC_PERM_EXPR stmt for VMAT_CONTIGUOUS_REVERSE in costing as
      	vec_perm.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c: New test.
    • vect: Get rid of vect_model_store_cost · 0bdb9bb5
      Kewen Lin authored
      This patch eventually gets rid of vect_model_store_cost;
      it adjusts the costing for the remaining memory access types
      VMAT_CONTIGUOUS{, _DOWN, _REVERSE} by moving costing close
      to the transform code.  Note that vect_model_store_cost has
      one special handling for vectorizing a store into the
      function result; since it's an extra penalty and the
      transform part doesn't have it, this patch leaves it alone.
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vect_model_store_cost): Remove.
      	(vectorizable_store): Adjust the costing for the remaining memory
      	access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE}.
    • vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE · 0a96eedb
      Kewen Lin authored
      This patch adjusts the cost handling on VMAT_CONTIGUOUS_PERMUTE
      in function vectorizable_store.  We don't call function
      vect_model_store_cost for it any more.  It's the case of
      interleaving stores, so costing skips all stmts except
      first_stmt_info and considers the whole group when costing
      first_stmt_info.  This patch shouldn't have any functional
      changes.
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
      	get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE related
      	handlings.
      	(vectorizable_store): Adjust the cost handling on
      	VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost.
    • vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES · 6a88202e
      Kewen Lin authored
      This patch adjusts the cost handling on VMAT_LOAD_STORE_LANES
      in function vectorizable_store.  We don't call function
      vect_model_store_cost for it any more.  It's the case of
      interleaving stores, so costing skips all stmts except
      first_stmt_info and considers the whole group when costing
      first_stmt_info.  This patch shouldn't have any functional
      changes.
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
      	get VMAT_LOAD_STORE_LANES.
      	(vectorizable_store): Adjust the cost handling on VMAT_LOAD_STORE_LANES
      	without calling vect_model_store_cost.  Factor out new lambda function
      	update_prologue_cost.
    • vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP · 8b151eb9
      Kewen Lin authored
      This patch adjusts the cost handling on VMAT_ELEMENTWISE
      and VMAT_STRIDED_SLP in function vectorizable_store.  We
      don't call function vect_model_store_cost for them any more.
      
      Like what we improved for PR82255 on the load side, this
      change helps us get rid of unnecessary vec_to_scalar costing
      for some cases with VMAT_STRIDED_SLP.  One typical test case,
      gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c, has been
      added.  And it helps some cases with inconsistent costing
      too.
      
      Besides, this also special-cases the interleaving stores for
      these two affected memory access types: since for interleaving
      stores the whole chain is vectorized when the last store in
      the chain is reached, the other stores in the group are
      skipped.  To keep consistent with this and follow the
      transforming handlings, like iterating the whole group, it
      only costs the first store in the group.  Ideally we would
      only cost the last one, but that's not trivial, and using the
      first one is actually equivalent.
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
      	VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their
      	related handlings.
      	(vectorizable_store): Adjust the cost handling on VMAT_ELEMENTWISE
      	and VMAT_STRIDED_SLP without calling vect_model_store_cost.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c: New test.
    • vect: Simplify costing on vectorizable_scan_store · 7184d225
      Kewen Lin authored
      This patch simplifies the costing for the case of
      vectorizable_scan_store without calling function
      vect_model_store_cost any more.
      
      I considered whether moving the costing into function
      vectorizable_scan_store is a good idea; to do that, we
      would have to pass several variables down which are only
      used for costing.  For now we just want to keep the
      costing as before and haven't tried to make it consistent
      with what the transforming does, so I think we can leave
      it for now.
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vectorizable_store): Adjust costing on
      	vectorizable_scan_store without calling vect_model_store_cost
      	any more.
    • vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER · e00820c8
      Kewen Lin authored
      This patch adjusts the cost handling on VMAT_GATHER_SCATTER
      in function vectorizable_store (all three cases), so we
      won't depend on vect_model_store_cost for its costing any
      more.  This patch shouldn't have any functional changes.
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
      	VMAT_GATHER_SCATTER any more, remove VMAT_GATHER_SCATTER related
      	handlings and the related parameter gs_info.
      	(vect_build_scatter_store_calls): Add the handlings on costing with
      	one more argument cost_vec.
      	(vectorizable_store): Adjust the cost handling on VMAT_GATHER_SCATTER
      	without calling vect_model_store_cost any more.
    • vect: Move vect_model_store_cost next to the transform in vectorizable_store · 3bf23666
      Kewen Lin authored
      This is an initial patch to move costing next to the
      transform.  It still adopts vect_model_store_cost for
      costing, but moves and duplicates it down according to
      the handlings of different vect_memory_access_types or
      some special handling needs, hopefully making the
      subsequent patches easy to review.  This patch should
      not have any functional changes.
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vectorizable_store): Move and duplicate the call
      	to vect_model_store_cost down to some different transform paths
      	according to the handlings of different vect_memory_access_types
      	or some special handling need.
    • vect: Ensure vect store is supported for some VMAT_ELEMENTWISE case · 32207b15
      Kewen Lin authored
      When making/testing patches to move costing next to the
      transform code for vectorizable_store, some ICEs got
      exposed when I further refined the costing handlings on
      VMAT_ELEMENTWISE.  The apparent cause is triggering the
      assertion in the rs6000-specific costing function
      rs6000_builtin_vectorization_cost:
      
        if (TARGET_ALTIVEC)
           /* Misaligned stores are not supported.  */
           gcc_unreachable ();
      
      I used vect_get_store_cost instead of the original way of
      record_stmt_cost with scalar_store for costing, that is, to
      use one unaligned_store instead; it matches what we use in
      transforming, which is a vector store as below:
      
        else if (group_size >= const_nunits
                 && group_size % const_nunits == 0)
          {
             nstores = 1;
             lnel = const_nunits;
             ltype = vectype;
             lvectype = vectype;
          }
      
      So IMHO costing it as a vector store is more consistent
      than as a scalar store.  With the given compilation option
      -mno-allow-movmisalign, the misaligned vector store is not
      expected to be used in the vectorizer, so why is it still
      adopted?  In the current implementation of function
      get_group_load_store_type, we always set the alignment
      support scheme to dr_unaligned_supported for
      VMAT_ELEMENTWISE.  That is true if we always adopt scalar
      stores, but as the above code shows, we could use vector
      stores for some cases, so we should use the correct
      alignment support scheme for it.
      
      This patch is to ensure the vector store is supported by
      further checking with vect_supportable_dr_alignment.  The
      ICEs got exposed with patches moving costing next to the
      transform but they haven't been landed, the test coverage
      would be there once they get landed.  The affected test
      cases are:
        - gcc.dg/vect/slp-45.c
        - gcc.dg/vect/vect-alias-check-{10,11,12}.c
      
      btw, I tried to make a correctness test case, but I
      realized that -mno-allow-movmisalign mainly gates the
      movmisalign optab and doesn't guard the actual hw
      vector memory access insns, so I failed to make one
      unless I also altered some conditions for them.
      
      gcc/ChangeLog:
      
      	* tree-vect-stmts.cc (vectorizable_store): Ensure the generated
      	vector store for some case of VMAT_ELEMENTWISE is supported.
    • x86: set spincount 1 for x86 hybrid platform · e1e127de
      Zhang, Jun authored
      By testing, we find that on hybrid platforms a spincount of 1 is better.
      
      Using '-march=native -Ofast -funroll-loops -flto', the results are as
      follows:
      
      spec2017 speed   RPL     ADL
      657.xz_s         0.00%   0.50%
      603.bwaves_s     10.90%  26.20%
      607.cactuBSSN_s  5.50%   72.50%
      619.lbm_s        2.40%   2.50%
      621.wrf_s        -7.70%  2.40%
      627.cam4_s       0.50%   0.70%
      628.pop2_s       48.20%  153.00%
      638.imagick_s    -0.10%  0.20%
      644.nab_s        2.30%   1.40%
      649.fotonik3d_s  8.00%   13.80%
      654.roms_s       1.20%   1.10%
      Geomean-int      0.00%   0.50%
      Geomean-fp       6.30%   21.10%
      Geomean-all      5.70%   19.10%
      
      omp2012          RPL     ADL
      350.md           -1.81%  -1.75%
      351.bwaves       7.72%   12.50%
      352.nab          14.63%  19.71%
      357.bt331        -0.20%  1.77%
      358.botsalgn     0.00%   0.00%
      359.botsspar     0.00%   0.65%
      360.ilbdc        0.00%   0.25%
      362.fma3d        2.66%   -0.51%
      363.swim         10.44%  0.00%
      367.imagick      0.00%   0.12%
      370.mgrid331     2.49%   25.56%
      371.applu331     1.06%   4.22%
      372.smithwa      0.74%   3.34%
      376.kdtree       10.67%  16.03%
      GEOMEAN          3.34%   5.53%
      
      include/ChangeLog:
      
      	PR target/109812
      	* spincount.h: New file.
      
      libgomp/ChangeLog:
      
      	* env.c (initialize_env): Use do_adjust_default_spincount.
      	* config/linux/x86/spincount.h: New file.
    • RISC-V: Support FP llrint auto vectorization · 6a3302a4
      Pan Li authored
      
      This patch would like to support the FP llrint auto vectorization.
      
      * long long llrint (double)
      
      This will be the CVT from DF => DI from the standard name's perspective,
      which has been covered in previous patch(es).  Thus, this patch only adds
      some test cases.
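      A testcase of the same shape as the sibling lceil/lround commits
      (illustrative):

      void
      test_llrint (long long *out, double *in, unsigned count)
      {
        for (unsigned i = 0; i < count; i++)
          out[i] = __builtin_llrint (in[i]);
      }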
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add type int64_t.
      	* gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • [APX] Support Intel APX PUSH2POP2 · 180b08f6
      Mo, Zewei authored
      
      This feature requires the stack to be aligned to 16 bytes; therefore in
      the prologue/epilogue, a standalone push/pop will be emitted before any
      push2/pop2 if the stack was not aligned to 16 bytes.
      Also, the current implementation only supports push2/pop2 usage in the
      function prologue/epilogue for callee-saved registers.
      
      gcc/ChangeLog:
      
      	* config/i386/i386.cc (gen_push2): New function to emit push2
      	and adjust cfa offset.
      	(ix86_pro_and_epilogue_can_use_push2_pop2): New function to
      	determine whether push2/pop2 can be used.
      	(ix86_compute_frame_layout): Adjust preferred stack boundary
      	and stack alignment needed for push2/pop2.
      	(ix86_emit_save_regs): Emit push2 when available.
      	(ix86_emit_restore_reg_using_pop2): New function to emit pop2
      	and adjust cfa info.
      	(ix86_emit_restore_regs_using_pop2): New function to loop
      	through the saved regs and call above.
      	(ix86_expand_epilogue): Call ix86_emit_restore_regs_using_pop2
      	when push2pop2 available.
      	* config/i386/i386.md (push2_di): New pattern for push2.
      	(pop2_di): Likewise for pop2.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/apx-push2pop2-1.c: New test.
      	* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
      	* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.
      
      Co-authored-by: Hu Lin1 <lin1.hu@intel.com>
      Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
    • RISC-V: Support FP irintf auto vectorization · d6b7fe11
      Pan Li authored
      
      This patch would like to support the FP irintf auto vectorization.
      
      * int irintf (float)
      
      Due to the limitation that only data types of the same size are allowed
      in the vectorizer, the standard name lrintmn2 only acts on SF => SI.
      
      Given we have code like:
      
      void
      test_irintf (int *out, float *in, unsigned count)
      {
        for (unsigned i = 0; i < count; i++)
          out[i] = __builtin_irintf (in[i]);
      }
      
      Before this patch:
      .L3:
        ...
        flw      fa5,0(a1)
        fcvt.w.s a5,fa5,dyn
        sw       a5,-4(a0)
        ...
        bne      a1,a4,.L3
      
      After this patch:
      .L3:
        ...
        vle32.v     v1,0(a1)
        vfcvt.x.f.v v1,v1
        vse32.v     v1,0(a0)
        ...
        bne         a2,zero,.L3
      
      The remaining cases, like DF => SI and HF => SI, will be covered by the
      hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
      
      gcc/ChangeLog:
      
      	* config/riscv/autovec.md (lrint<mode><vlconvert>2): Rename from.
      	(lrint<mode><v_i_l_ll_convert>2): Rename to.
      	* config/riscv/vector-iterators.md: Rename and remove TARGET_64BIT.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/unop/math-irint-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c: New test.
      	* gcc.target/riscv/rvv/autovec/vls/math-irint-0.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
    • Daily bump. · 6febf76c
      GCC Administrator authored
  3. Oct 11, 2023
    • RISC-V: Add TARGET_MIN_VLEN_OPTS to fix the build · 06f36c1d
      Kito Cheng authored
      gcc/ChangeLog:
      
      	* config/riscv/riscv-opts.h (TARGET_MIN_VLEN_OPTS): New.
    • RISC-V: Adjust long unconditional branch sequence · a3e50ee9
      Jeff Law authored
      Andrew and I independently noted the long unconditional branch sequence was
      using the "call" pseudo op.  Technically it works, but it's a bit odd.  This
      patch flips it to use the "jump" pseudo-op.
      
      This was tested with a hacked-up local compiler which forced all branches/jumps
      to be long jumps.  Naturally it triggered some failures for scan-asm tests but
      no execution regressions (which is mostly what I was testing for).
      
      I've updated the long branch support item in the RISE wiki to indicate that we
      eventually want a register scavenging approach with a fallback to $ra in the
      future so that we don't muck up the return address predictors.  It's not
      super-high priority and shouldn't be terrible to implement given we've got the
      $ra fallback when a suitable register can not be found.
      
      gcc/
      	* config/riscv/riscv.md (jump): Adjust sequence to use a "jump"
      	pseudo op instead of a "call" pseudo op.
    • RISC-V: Extend riscv_subset_list, preparatory for target attribute support · faae30c4
      Kito Cheng authored
      riscv_subset_list previously only accepted a full arch string, but we
      need to parse a single extension when supporting the target attribute;
      also, we may set a riscv_subset_list directly rather than re-parsing the
      ISA string again.
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-subset.h (riscv_subset_list::parse_single_std_ext):
      	New.
      	(riscv_subset_list::parse_single_multiletter_ext): Ditto.
      	(riscv_subset_list::clone): Ditto.
      	(riscv_subset_list::parse_single_ext): Ditto.
      	(riscv_subset_list::set_loc): Ditto.
      	(riscv_set_arch_by_subset_list): Ditto.
      	* common/config/riscv/riscv-common.cc
      	(riscv_subset_list::parse_single_std_ext): New.
      	(riscv_subset_list::parse_single_multiletter_ext): Ditto.
      	(riscv_subset_list::clone): Ditto.
      	(riscv_subset_list::parse_single_ext): Ditto.
      	(riscv_subset_list::set_loc): Ditto.
      	(riscv_set_arch_by_subset_list): Ditto.
    • RISC-V: Refactor riscv_option_override and riscv_convert_vector_bits. [NFC] · 9452d13b
      Kito Cheng authored
      Allow those functions to apply settings from a local gcc_options rather
      than the global options.
      
      Preparatory for target attribute support; separate this change for easier
      review since it's an NFC.
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc (riscv_convert_vector_bits): Get setting
      	from argument rather than get setting from global setting.
      	(riscv_override_options_internal): New, splited from
      	riscv_override_options, also take a gcc_options argument.
      	(riscv_option_override): Splited most part to
      	riscv_override_options_internal.
      9452d13b