  1. Dec 27, 2021
  2. Dec 26, 2021
    • H.J. Lu's avatar
      i386: Check AX input in any_mul_highpart peepholes · d8748301
      H.J. Lu authored
      When applying peephole optimization to transform
      
      	mov imm, %reg0
      	mov %reg1, %AX_REG
      	imul %reg0
      
      to
      
      	mov imm, %AX_REG
      	imul %reg1
      
      disable peephole optimization if reg1 == AX_REG.
      
      gcc/
      
      	PR target/103785
      	* config/i386/i386.md: Swap operand order in comments and check
      	AX input in any_mul_highpart peepholes.
      
      gcc/testsuite/
      
      	PR target/103785
      	* gcc.target/i386/pr103785.c: New test.
      d8748301
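As an illustration of the kind of source that leads to a high-part multiply, here is a minimal C sketch. The helper name `mul_high_u32` is hypothetical and not part of the commit. It only assumes that the upper half of a widening multiply is what a high-part multiply computes. On x86, a one-operand `mul`/`imul` takes one factor implicitly from the AX register, which is why the peephole has to check whether `reg1` is already AX.

```c
#include <stdint.h>

/* Hypothetical helper (not from the commit): return the upper 32 bits of
   the 64-bit product x * k.  With a constant k, this is the shape of code
   that loads an immediate and then performs a one-operand multiply. */
uint32_t mul_high_u32(uint32_t x, uint32_t k)
{
    return (uint32_t)(((uint64_t)x * k) >> 32);
}
```

For example, `mul_high_u32(0x80000000u, 4u)` yields 2, because the full product is 2^33.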
    • Francois-Xavier Coudert's avatar
      Fortran: speed up decimal output of integers · 9525c26b
      Francois-Xavier Coudert authored
      libgfortran/ChangeLog:
      
      	PR libfortran/98076
      	* runtime/string.c (itoa64, itoa64_pad19): New helper functions.
      	(gfc_itoa): On targets with 128-bit integers, call fast
      	64-bit functions to avoid many slow divisions.
      
      gcc/testsuite/ChangeLog:
      
      	PR libfortran/98076
      	* gfortran.dg/pr98076.f90: New test.
      9525c26b
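The idea of avoiding many slow 128-bit divisions can be sketched as follows. This is not the libgfortran code: the names `put_u64`, `put_u64_pad19` and `u128_to_dec` are hypothetical, and the sketch assumes a compiler that provides `unsigned __int128` (GCC or Clang on a 64-bit target). The value is split into chunks of 19 decimal digits, so that at most two 128-bit divisions are needed and the digit loops run on 64-bit integers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEN19 UINT64_C(10000000000000000000)   /* 10^19 still fits in 64 bits */

/* Write the digits of n backwards, ending just before p; return the new start. */
static char *put_u64(char *p, uint64_t n)
{
    do {
        *--p = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    return p;
}

/* Same, but always emit exactly 19 digits, padded with leading zeros. */
static char *put_u64_pad19(char *p, uint64_t n)
{
    for (int i = 0; i < 19; i++) {
        *--p = (char)('0' + n % 10);
        n /= 10;
    }
    return p;
}

/* Hypothetical converter: writes the decimal form of n into out (at most
   39 digits plus the terminating NUL) and returns the number of digits. */
size_t u128_to_dec(unsigned __int128 n, char out[40])
{
    char tmp[40];
    char *end = tmp + sizeof tmp - 1;
    char *p = end;
    *end = '\0';

    if (n <= UINT64_MAX) {
        p = put_u64(p, (uint64_t)n);
    } else {
        unsigned __int128 hi = n / TEN19;              /* first 128-bit division */
        p = put_u64_pad19(p, (uint64_t)(n % TEN19));   /* lowest 19 digits */
        if (hi <= UINT64_MAX) {
            p = put_u64(p, (uint64_t)hi);
        } else {
            /* hi is below 2^128 / 10^19, so one more split is enough. */
            p = put_u64_pad19(p, (uint64_t)(hi % TEN19));
            p = put_u64(p, (uint64_t)(hi / TEN19));
        }
    }

    size_t len = (size_t)(end - p);
    memcpy(out, p, len + 1);
    return len;
}
```

The chunk size of 19 digits is chosen because 10^19 is the largest power of ten that fits in an unsigned 64-bit integer.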
    • GCC Administrator's avatar
      Daily bump. · 10ae9946
      GCC Administrator authored
      10ae9946
  3. Dec 25, 2021
    • Francois-Xavier Coudert's avatar
      Fortran: simplify library code for integer-to-decimal conversion · 4ae906e4
      Francois-Xavier Coudert authored
      libgfortran/ChangeLog:
      
      	PR libfortran/81986
      	PR libfortran/99191
      
      	* libgfortran.h: Remove gfc_xtoa(), adjust gfc_itoa() and
      	GFC_ITOA_BUF_SIZE.
      	* io/write.c (write_decimal): conversion parameter is always
      	gfc_itoa(), so remove it. Protect from overflow.
      	(xtoa): Move gfc_xtoa and update its name.
      	(xtoa_big): Renamed from ztoa_big for consistency.
      	(write_z): Adjust to new function names.
      	(write_i, write_integer): Remove last arg of write_decimal.
      	* runtime/backtrace.c (error_callback): Comment on the use of
      	gfc_itoa().
      	* runtime/error.c (gfc_xtoa): Move to io/write.c.
      	* runtime/string.c (gfc_itoa): Take an unsigned argument,
      	remove the handling of negative values.
      4ae906e4
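A minimal sketch of the division of labour described above, where the digit conversion works on an unsigned magnitude and the caller handles the sign. The function name `i64_to_dec` is hypothetical and the code is not taken from libgfortran. It only shows how the magnitude of the most negative value can be computed without signed overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: format n in decimal into out (sign, up to 19 digits,
   NUL) and return the number of characters written, excluding the NUL. */
size_t i64_to_dec(int64_t n, char out[21])
{
    char tmp[21];
    char *end = tmp + sizeof tmp - 1;
    char *p = end;

    /* Negate in unsigned arithmetic: well defined even for INT64_MIN. */
    uint64_t mag = n < 0 ? UINT64_C(0) - (uint64_t)n : (uint64_t)n;

    *end = '\0';
    do {
        *--p = (char)('0' + mag % 10);   /* digits of the unsigned magnitude */
        mag /= 10;
    } while (mag != 0);
    if (n < 0)
        *--p = '-';                      /* sign is added by the caller side */

    size_t len = (size_t)(end - p);
    memcpy(out, p, len + 1);
    return len;
}
```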
    • GCC Administrator's avatar
      Daily bump. · ffb5418f
      GCC Administrator authored
      ffb5418f
  4. Dec 24, 2021
    • Uros Bizjak's avatar
      i386: Add V2SFmode DIV insn pattern [PR95046, PR103797] · 8f921393
      Uros Bizjak authored
      Use V4SFmode "DIVPS X,Y" with [y0, y1, 1.0f, 1.0f] as a divisor
      to avoid division by zero.
      
      2021-12-24  Uroš Bizjak  <ubizjak@gmail.com>
      
      gcc/ChangeLog:
      
      	PR target/95046
      	PR target/103797
      	* config/i386/mmx.md (divv2sf3): New instruction pattern.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/95046
      	PR target/103797
      	* gcc.target/i386/pr95046-1.c (test_div): Add.
      	(dg-options): Add -mno-recip.
      8f921393
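The padding trick can be sketched in portable C, with a four-element array standing in for the V4SFmode register. The function name `div_v2sf` is hypothetical, and the sketch assumes the unused upper numerator lanes are zero. The unused divisor lanes are filled with 1.0f, as the commit message describes, so the four-lane division never divides by zero in lanes the caller does not care about.

```c
/* Hypothetical model of a 2-element float division carried out as a
   4-lane division. */
void div_v2sf(const float x[2], const float y[2], float out[2])
{
    float num[4] = { x[0], x[1], 0.0f, 0.0f };   /* upper lanes assumed zero */
    float den[4] = { y[0], y[1], 1.0f, 1.0f };   /* padded with 1.0f */
    float quo[4];

    for (int i = 0; i < 4; i++)
        quo[i] = num[i] / den[i];                /* models one DIVPS */

    out[0] = quo[0];
    out[1] = quo[1];
}
```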
    • Iain Sandoe's avatar
      Darwin: Amend a comment to be more inclusive [NFC]. · 43dadcf3
      Iain Sandoe authored
      
      As per title.
      
      Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

      
      gcc/ChangeLog:
      
      	* config/darwin.c (darwin_override_options): Make a comment
      	more inclusive.
      43dadcf3
    • Iain Sandoe's avatar
      Darwin: Update rules for handling alignment of globals. · 19bf83a9
      Iain Sandoe authored
      
      The current rule was too strict and has not been required since Darwin11.
      
      This relaxes the constraint to allow up to 2^28 alignment for non-common
      entities.  Common is still restricted to a maximum alignment of 2^15.
      
      When the host is an older version of Darwin (earlier than 11), the
      existing constraint is still applied.  Note that this is a host constraint
      not a target one (so that a compilation on 10.7 targeting 10.6 is allowed
      to use a greater alignment than the tools on 10.6 support).  This matches
      the behaviour of clang.
      
      Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
      
      gcc/ChangeLog:
      
      	* config.gcc: Emit L2_MAX_OFILE_ALIGNMENT with suitable
      	values for the host.
      	* config/darwin.c (darwin_emit_common): Error for alignment
      	values > 32768.
      	* config/darwin.h (MAX_OFILE_ALIGNMENT): Rework to use the
      	configured L2_MAX_OFILE_ALIGNMENT.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/darwin-aligned-globals.c: New test.
      	* gcc.dg/darwin-comm-1.c: New test.
      	* gcc.dg/attr-aligned.c: Amend for new alignment values on
      	Darwin.
      	* gcc.target/i386/pr89261.c: Likewise.
      19bf83a9
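The relaxed rule can be written down as a small predicate. This is only a sketch of the limits quoted in the commit message (2^28 for non-common entities, 2^15 for common) and not the code in config/darwin.c. The name `alignment_allowed` is hypothetical, the alignment is taken in the same units the message uses, and the stricter limit for hosts older than Darwin 11 is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    MAX_LOG2_ALIGN_COMMON = 15,   /* common symbols: up to 2^15 */
    MAX_LOG2_ALIGN_OTHER  = 28    /* everything else: up to 2^28 */
};

/* Hypothetical predicate: is this power-of-two alignment within the limit? */
bool alignment_allowed(uint64_t align, bool is_common)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return false;             /* not a power of two */

    unsigned limit = is_common ? MAX_LOG2_ALIGN_COMMON : MAX_LOG2_ALIGN_OTHER;
    return align <= (UINT64_C(1) << limit);
}
```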
    • Iain Sandoe's avatar
      Darwin: Check for that flag-reorder-and-partition. · 8381075f
      Iain Sandoe authored
      
      We were checking whether the flag had been set by the user, but not whether
      it was set to true, which means that the check fails in its intent when
      the user passes -fno-reorder-and-partition.
      
      Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
      
      gcc/ChangeLog:
      
      	* config/darwin.c (darwin_override_options): When checking for the
      	flag-reorder-and-partition case, also check that it is set on.
      8381075f
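The difference between "set by the user" and "set to true" can be shown with a small model. The struct and function names below are hypothetical and do not correspond to GCC's option machinery. They only illustrate why the first check also fires for the negative form of the flag.

```c
#include <stdbool.h>

/* Hypothetical model of one command-line flag. */
struct flag_state {
    bool set_by_user;   /* the user mentioned the flag, in either form */
    bool value;         /* true for -ffoo, false for -fno-foo */
};

/* Old behaviour: treats -fno-foo the same as -ffoo. */
bool user_enabled_buggy(struct flag_state f)
{
    return f.set_by_user;
}

/* Fixed behaviour: the flag must be mentioned and switched on. */
bool user_enabled_fixed(struct flag_state f)
{
    return f.set_by_user && f.value;
}
```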
    • Iain Sandoe's avatar
      Darwin: Define OBJECT_FORMAT_MACHO. · 9a4a29ea
      Iain Sandoe authored
      
      There are places where we need to generate different code depending
      on the object format rather than on the arch.  We already have
      definitions for ELF, COFF, etc.; this adds one for MACHO.
      
      Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
      
      gcc/ChangeLog:
      
      	* config/darwin.h (OBJECT_FORMAT_MACHO): New.
      9a4a29ea
    • GCC Administrator's avatar
      Daily bump. · 7d01da81
      GCC Administrator authored
      7d01da81
  5. Dec 23, 2021
    • H.J. Lu's avatar
      smuldi3_highpart.c: Replace long with long long for -mx32 · 8f34344e
      H.J. Lu authored
      	* gcc.target/i386/smuldi3_highpart.c: Replace long with long long.
      8f34344e
    • Roger Sayle's avatar
      x86: PR target/103773: Fix wrong-code with -Oz from pop to memory. · ef26c151
      Roger Sayle authored
      This is a fix for PR target/103773: with -Oz, GCC shouldn't use push/pop
      on x86 to shrink writes of small integer constants to memory.
      Instead, clang uses "andl $0, mem" for writing zero and "orl $-1, mem"
      for writing -1 to memory when using -Oz.  This patch implements this
      via peephole2, where we can confirm that it's OK to clobber the flags.
      
      2021-12-23  Roger Sayle  <roger@nextmovesoftware.com>
      	    Uroš Bizjak  <ubizjak@gmail.com>
      
      gcc/ChangeLog
      	PR target/103773
      	* config/i386/i386.md (*mov<mode>_and): New define_insn for
      	writing a zero to memory using AND.
      	(*mov<mode>_or): Extend to allow memory destination and HImode.
      	(*movdi_internal): Remove -Oz push/pop optimization from here.
      	(*movsi_internal): Likewise.
      	(peephole2): Perform -Oz push/pop optimization here, only for
      	register destinations, values other than zero, and in functions
      	that don't use the red zone.
      	(peephole2): With -Oz, convert writes of 0 or -1 to memory into
      	their clobber forms, i.e. *mov<mode>_and and *mov<mode>_or resp.
      
      gcc/testsuite/ChangeLog
      	PR target/103773
      	* gcc.target/i386/pr103773-2.c: New test case.
      	* gcc.target/i386/pr103773.c: New test case.
      ef26c151
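The two replacement idioms rely on simple identities: AND with 0 always produces 0, and OR with -1 always produces -1, whatever the memory held before. A minimal C sketch follows, with hypothetical function names. Whether a compiler emits `andl`/`orl` for this source depends on the target and options. The sketch only shows that the results are the same as plain stores.

```c
#include <stdint.h>

/* Equivalent to *mem = 0: x & 0 == 0 for every x. */
void store_zero_via_and(int32_t *mem)
{
    *mem &= 0;
}

/* Equivalent to *mem = -1: x | -1 == -1 for every x. */
void store_minus_one_via_or(int32_t *mem)
{
    *mem |= -1;
}
```

Unlike a plain `mov`, these instructions modify the flags, which is why the commit applies them only where the flags may be clobbered.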
    • konglin1's avatar
      i386: Enable intrinsics that convert float and bf16 data to each other. · 61e53698
      konglin1 authored
      gcc/ChangeLog:
      
      	* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Add new intrinsic.
      	(_mm512_cvtpbh_ps): Likewise.
      	(_mm512_maskz_cvtpbh_ps): Likewise.
      	(_mm512_mask_cvtpbh_ps): Likewise.
      	* config/i386/avx512bf16vlintrin.h (_mm_cvtness_sbh): Likewise.
      	(_mm_cvtpbh_ps): Likewise.
      	(_mm256_cvtpbh_ps): Likewise.
      	(_mm_maskz_cvtpbh_ps): Likewise.
      	(_mm256_maskz_cvtpbh_ps): Likewise.
      	(_mm_mask_cvtpbh_ps): Likewise.
      	(_mm256_mask_cvtpbh_ps): Likewise.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: New test.
      	* gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c: Ditto.
      	* gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Ditto.
      	* gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c: Ditto.
      61e53698
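What a bf16-to-float conversion computes can be sketched without intrinsics. The function name `bf16_bits_to_float` is hypothetical. The sketch assumes only the bfloat16 definition (the upper 16 bits of an IEEE-754 binary32 value) and a `float` type that is binary32. It is a scalar model, not the implementation in the intrinsic headers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: widen a bfloat16 bit pattern to float by
   placing it in the upper half of a 32-bit word. */
float bf16_bits_to_float(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;   /* low mantissa bits become zero */
    float f;
    memcpy(&f, &wide, sizeof f);            /* reinterpret without aliasing issues */
    return f;
}
```

Because the conversion only appends zero bits, every bfloat16 value is exactly representable as a float.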
    • Feng Xue's avatar
      Fix typo in type verification. · 9ac0730c
      Feng Xue authored
      	PR ipa/103786
      
      gcc/ChangeLog:
      
      	* tree.c (verify_type): Fix typo.
      9ac0730c
    • liuhongt's avatar
      Combine vpcmpuw + zero_extend to vpcmpuw. · 1a7ce857
      liuhongt authored
      vcmp{ps,ph,pd} and vpcmp{,u}{b,w,d,q} implicitly clear the upper bits
      of dest.
      
      gcc/ChangeLog:
      
      	PR target/103750
      	* config/i386/sse.md
      	(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
      	New pre_reload define_insn_and_split.
      	(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
      	Ditto.
      	(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
      	Ditto.
      	(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
      	Ditto.
      	(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
      	Ditto.
      	(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
      	Ditto.
      	(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
      	Ditto.
      	(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
      	Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/avx512bw-pr103750-1.c: New test.
      	* gcc.target/i386/avx512bw-pr103750-2.c: New test.
      	* gcc.target/i386/avx512f-pr103750-1.c: New test.
      	* gcc.target/i386/avx512f-pr103750-2.c: New test.
      	* gcc.target/i386/avx512fp16-pr103750-1.c: New test.
      	* gcc.target/i386/avx512fp16-pr103750-2.c: New test.
      1a7ce857
    • GCC Administrator's avatar
      Daily bump. · 9f9bc0bf
      GCC Administrator authored
      9f9bc0bf
  6. Dec 22, 2021
    • Harald Anlauf's avatar
      Fortran: BOZ literal constants are not interoperable · ff0ad4b5
      Harald Anlauf authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/103778
      	* check.c (is_c_interoperable): A BOZ literal constant is not
      	interoperable.
      
      gcc/testsuite/ChangeLog:
      
      	PR fortran/103778
      	* gfortran.dg/illegal_boz_arg_3.f90: New test.
      ff0ad4b5
    • Harald Anlauf's avatar
      Fortran: CASE selector expressions must be scalar · 5474092c
      Harald Anlauf authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/103776
      	* match.c (match_case_selector): Reject expressions in CASE
      	selector which are not scalar.
      
      gcc/testsuite/ChangeLog:
      
      	PR fortran/103776
      	* gfortran.dg/select_10.f90: New test.
      5474092c
    • Murray Steele's avatar
      arm: Declare MVE types internally via pragma · 9c1ce17b
      Murray Steele authored
      Move the implementation of MVE ACLE types from arm_mve_types.h to
      inside GCC via a new pragma, which replaces the prior type
      definitions.  This allows for the types to be used internally for
      intrinsic function definitions.
      
      gcc/ChangeLog:
      
      	* config.gcc (arm*-*-*): Add arm-mve-builtins.o to extra_objs.
      	* config/arm/arm-c.c (arm_pragma_arm): Handle "#pragma GCC arm".
      	(arm_register_target_pragmas): Register it.
      	* config/arm/arm-protos.h: (arm_mve::arm_handle_mve_types_h): New
      	prototype.
      	* config/arm/arm_mve_types.h: Replace MVE type definitions with
      	new pragma.
      	* config/arm/t-arm: (arm-mve-builtins.o): New target rule.
      	* config/arm/arm-mve-builtins.cc: New file.
      	* config/arm/arm-mve-builtins.def: New file.
      	* config/arm/arm-mve-builtins.h: New file.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/arm/mve/mve.exp: Add new subdirectories.
      	* gcc.target/arm/mve/general-c/type_redef_1.c: New test.
      	* gcc.target/arm/mve/general/double_pragmas_1.c: New test.
      	* gcc.target/arm/mve/general/nomve_1.c: New test.
      9c1ce17b
    • Murray Steele's avatar
      arm: Move arm_simd_info array declaration into header · 8c61cefe
      Murray Steele authored
      Move the arm_simd_type and arm_type_qualifiers enums, and
      arm_simd_info struct from arm-builtins.c into arm-builtins.h header.
      
      This is a first step towards internalising the type definitions for
      MVE predicate, vector, and tuple types.  By moving arm_simd_types into
      a header, we allow future patches to use these type trees externally
      to arm-builtins.c, which is a crucial step towards developing an MVE
      intrinsics framework similar to the current SVE implementation.
      
      gcc/ChangeLog:
      
      	* config/arm/arm-builtins.c (enum arm_type_qualifiers): Move to
      	arm-builtins.h.
      	(enum arm_simd_type): Move to arm-builtins.h.
      	(struct arm_simd_type_info): Move to arm-builtins.h.
      	* config/arm/arm-builtins.h (enum arm_simd_type): Move from
      	arm-builtins.c.
      	(enum arm_type_qualifiers): Move from arm-builtins.c.
      	(struct arm_simd_type_info): Move from arm-builtins.c.
      8c61cefe
    • Francois-Xavier Coudert's avatar
      Fortran: allow __float128 on targets where long double is not REAL(KIND=10) · 22817356
      Francois-Xavier Coudert authored
      The logic for detection of REAL(KIND=16) in kinds-override.h made
      assumptions:
      
          -- if real(kind=10) exists, i.e. if HAVE_GFC_REAL_10 is defined,
             then it is necessarily the "long double" type
          -- if real(kind=16) exists, then:
             * if HAVE_GFC_REAL_10, real(kind=16) is "__float128"
             * otherwise, real(kind=16) is "long double"
      
      This may not always be true. Take the aarch64-apple-darwin port:
      it has double == long double == binary64, and __float128 == binary128.
      
      We already have more fine-grained logic in the mk-kinds-h.sh script,
      where we actually check the Fortran kind corresponding to C’s long
      double. So let's use it, and emit the GFC_REAL_16_IS_FLOAT128 /
      GFC_REAL_16_IS_LONG_DOUBLE macros there.
      
      libgfortran/ChangeLog:
      
      	* kinds-override.h: Move GFC_REAL_16_IS_* macros...
      	* mk-kinds-h.sh: ... here.
      22817356
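The assumptions listed in the commit message can be restated as preprocessor logic. This is a sketch of the described decision, not a copy of kinds-override.h. `HAVE_GFC_REAL_16` is an assumed macro name by analogy with `HAVE_GFC_REAL_10`, and the function `old_real16_c_type` is hypothetical. The configuration at the top imitates the aarch64-apple-darwin case, where REAL(KIND=16) exists but REAL(KIND=10) does not.

```c
/* Hypothetical configuration: REAL(KIND=16) is available, REAL(KIND=10)
   is not (HAVE_GFC_REAL_10 is deliberately left undefined). */
#define HAVE_GFC_REAL_16 1

/* Which C type the old logic would pick for REAL(KIND=16). */
const char *old_real16_c_type(void)
{
#if !defined(HAVE_GFC_REAL_16)
    return "none";
#elif defined(HAVE_GFC_REAL_10)
    return "__float128";     /* long double assumed to be REAL(KIND=10) */
#else
    return "long double";    /* assumed to be binary128 */
#endif
}
```

With this configuration the old logic answers "long double", which is wrong on a target where long double is binary64. That is the case the commit addresses by deciding in mk-kinds-h.sh instead.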
    • Martin Liska's avatar
      docs: docs: use ';' for function declarations. (part 3) · 63eb073e
      Martin Liska authored
      gcc/ChangeLog:
      
      	* doc/extend.texi: Unify all function declarations in examples
      	where some miss trailing ';'.
      63eb073e
    • Martin Liska's avatar
      docs: docs: use ';' for function declarations. (part 2) · 3892cfee
      Martin Liska authored
      gcc/ChangeLog:
      
      	* doc/extend.texi: Unify all function declarations in examples
      	where some miss trailing ';'.
      3892cfee
    • Martin Liska's avatar
      docs: use ';' for function declarations. · 1a6592ff
      Martin Liska authored
      gcc/ChangeLog:
      
      	* doc/extend.texi: Unify all function declarations in examples
      	where some miss trailing ';'.
      1a6592ff
    • Martin Liska's avatar
      docs: Unify instruct set name. · 3e1a06ec
      Martin Liska authored
      gcc/ChangeLog:
      
      	* doc/extend.texi: Use uppercase letters for SSEx.
      3e1a06ec
    • GCC Administrator's avatar
      Daily bump. · aa17859b
      GCC Administrator authored
      aa17859b
  7. Dec 21, 2021
    • Iain Buclaw's avatar
      config: Add check whether D compiler works (PR103528) · 7c6ae994
      Iain Buclaw authored
      As well as checking for the existence of a GDC compiler, validate
      that it has been built with libphobos; otherwise warn or fail with
      the message that GDC is required to build d.
      
      config/ChangeLog:
      
      	PR d/103528
      	* acx.m4 (ACX_PROG_GDC): Add check whether D compiler works.
      
      ChangeLog:
      
      	* configure: Regenerate.
      7c6ae994
    • Iain Buclaw's avatar
      libphobos: Add power*-*-freebsd* as supported target · 0c3fc06c
      Iain Buclaw authored
      This has been tested on powerpc64-freebsd13 and powerpc64le-freebsd13,
      and used to build dub, along with some D tools from ports.
      
      libphobos/ChangeLog:
      
      	* configure.tgt: Add power*-*-freebsd* as a supported target.
      0c3fc06c
    • Jiang Haochen's avatar
      i386: Add missing BMI intrinsic to align with clang · d2290797
      Jiang Haochen authored
      gcc/ChangeLog:
      
      	* config/i386/bmiintrin.h (_tzcnt_u16): New intrinsic.
      	(_andn_u32): Ditto.
      	(_andn_u64): Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/bmi-1.c: Add test for new intrinsic.
      	* gcc.target/i386/bmi-2.c: Ditto.
      	* gcc.target/i386/bmi-3.c: Ditto.
      d2290797
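For readers unfamiliar with these operations, here are portable reference versions of what the underlying BMI instructions compute. The `_ref` names are hypothetical and the code is not taken from bmiintrin.h. ANDN inverts its first operand before the AND, and TZCNT returns the operand width when the input is zero.

```c
#include <stdint.h>

/* Reference model of ANDN: bitwise AND of the inverted first operand
   with the second operand. */
uint32_t andn_u32_ref(uint32_t a, uint32_t b)
{
    return ~a & b;
}

/* Reference model of a 16-bit TZCNT: number of trailing zero bits,
   or 16 when no bit is set. */
unsigned tzcnt_u16_ref(uint16_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 16;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}
```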
    • Martin Liska's avatar
      config.sub: change mode to 755. · 6fad101f
      Martin Liska authored
      ChangeLog:
      
      	* config.sub: Change mode back to 755.
      6fad101f
    • Xionghu Luo's avatar
      Don't move cold code out of loop by checking bb count · 51a24e4a
      Xionghu Luo authored
      v8 changes:
      1. Use hotter_than_inner_loop instead of colder to store a hotter loop
      nearest to loop.
      2. Update the logic in fill_coldest_and_hotter_out_loop and
      get_coldest_out_loop to make common case O(1).
      3. Update function argument bb_colder_than_loop_preheader.
      4. Make cached array to vec<class *loop> for index checking.
      
      v7 changes:
      1. Refine get_coldest_out_loop to replace loop with checking
      pre-computed coldest_outermost_loop and colder_than_inner_loop.
      2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
      colder_than_inner_loop recursively without loop.
      
      v6 changes:
      1. Add function fill_coldest_out_loop to pre compute the coldest
      outermost loop for each loop.
      2. Rename find_coldest_out_loop to get_coldest_out_loop.
      3. Add testcase ssa-lim-22.c to differentiate with ssa-lim-19.c.
      
      v5 changes:
      1. Refine comments for new functions.
      2. Use basic_block instead of count in bb_colder_than_loop_preheader
      to align with function name.
      3. Refine with simpler implementation for get_coldest_out_loop and
      ref_in_loop_hot_body::operator for better understanding.
      
      v4 changes:
      1. Sort out profile_count comparison to function bb_cold_than_loop_preheader.
      2. Update ref_in_loop_hot_body::operator () to find cold_loop before compare.
      3. Split RTL invariant motion part out.
      4. Remove aux changes.
      
      v3 changes:
      1. Handle max_loop in determine_max_movement instead of outermost_invariant_loop.
      2. Remove unnecessary changes.
      3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
      4. "gsi_next (&bsi);" in move_computations_worker is kept: omitting it
      caused an infinite loop when implementing v1, because the iterator was
      never advanced.
      
      v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
      v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
      v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580211.html
      v4: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581231.html
      v5: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581961.html
      ...
      v8: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586209.html
      
      There was an earlier patch that tried to avoid moving cold blocks out of loops:
      
      https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
      
      Richard suggested to "never hoist anything from a bb with lower execution
      frequency to a bb with higher one in LIM invariantness_dom_walker
      before_dom_children".
      
      In gimple LIM analysis, add get_coldest_out_loop to move invariants to
      the expected target loop: if the profile count of the loop bb is colder
      than the target loop preheader, the invariant won't be hoisted out of
      the loop.  Likewise for store motion: if all locations of the REF in
      the loop are cold, don't do store motion on it.
      
      SPEC2017 performance evaluation shows a 1% performance improvement for
      intrate GEOMEAN and no obvious regression for others.  In particular,
      500.perlbench_r +7.52% (perf shows that function S_regtry of perlbench
      is much improved), 548.exchange2_r +1.98%, and 526.blender_r +1.00%
      on P8LE.
      
      gcc/ChangeLog:
      
      2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>
      
      	* tree-ssa-loop-im.c (bb_colder_than_loop_preheader): New
      	function.
      	(get_coldest_out_loop): New function.
      	(determine_max_movement): Use get_coldest_out_loop.
      	(move_computations_worker): Adjust and fix iteration update.
      	(class ref_in_loop_hot_body): New functor.
      	(ref_in_loop_hot_body::operator): New.
      	(can_sm_ref_p): Use for_all_locs_in_loop.
      	(fill_coldest_and_hotter_out_loop): New.
      	(tree_ssa_lim_finalize): Free coldest_outermost_loop and
      	hotter_than_inner_loop.
      	(loop_invariant_motion_in_fun): Call fill_coldest_and_hotter_out_loop.
      
      gcc/testsuite/ChangeLog:
      
      2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>
      
      	* gcc.dg/tree-ssa/recip-3.c: Adjust.
      	* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
      	* gcc.dg/tree-ssa/ssa-lim-20.c: New test.
      	* gcc.dg/tree-ssa/ssa-lim-21.c: New test.
      	* gcc.dg/tree-ssa/ssa-lim-22.c: New test.
      	* gcc.dg/tree-ssa/ssa-lim-23.c: New test.
      51a24e4a
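A hypothetical example of the situation this commit targets, written for illustration and not taken from the new tests. The product `a * b` is loop-invariant, but it is only needed on a path that `__builtin_expect` marks as unlikely. Hoisting it to the preheader would make every call pay for a computation that the cold path rarely uses. Whether GCC actually hoists or not depends on options and on the estimated or measured profile counts.

```c
/* Hypothetical example: an invariant computation inside a cold block. */
long sum_with_rare_fixup(const int *v, int n, int a, int b)
{
    long total = 0;

    for (int i = 0; i < n; i++) {
        total += v[i];
        if (__builtin_expect(v[i] < 0, 0))
            total += (long)a * b;   /* invariant, but in a cold block */
    }
    return total;
}
```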
    • Xionghu Luo's avatar
      Fix loop split incorrect count and probability · cd5ae148
      Xionghu Luo authored
      In tree-ssa-loop-split.c, split_loop and split_loop_on_cond perform two
      kinds of split.  split_loop only works for single loop and inserts an
      edge at the exit when splitting, while split_loop_on_cond is not limited
      to single loop and inserts an edge at the latch when splitting.  Both
      kinds of split should update loop counts and probabilities.  For
      split_loop, the loop split condition is moved in front of loop1 and
      loop2, whereas split_loop_on_cond moves the condition between loop1 and
      loop2.  This patch does:
       1) profile count proportion for both original loop and copied loop
      without dropping down the true branch's count;
       2) probability update in the two loops and between the two loops.
      
      Regression tests pass.
      
      Changes diff for split_loop and split_loop_on_cond cases:
      
      1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
      ...
         <bb 2> [local count: 118111600]:
         if (beg_5(D) < end_8(D))
           goto <bb 14>; [89.00%]
         else
           goto <bb 6>; [11.00%]
      
         <bb 14> [local count: 105119324]:
         if (beg2_6(D) < c_9(D))
      -    goto <bb 15>; [100.00%]
      +    goto <bb 15>; [33.00%]
         else
      -    goto <bb 16>; [100.00%]
      +    goto <bb 16>; [67.00%]
      
      -  <bb 15> [local count: 105119324]:
      +  <bb 15> [local count: 34689377]:
         _25 = beg_5(D) + 1;
         _26 = end_8(D) - beg_5(D);
         _27 = beg2_6(D) + _26;
         _28 = MIN_EXPR <c_9(D), _27>;
      
      -  <bb 3> [local count: 955630225]:
      +  <bb 3> [local count: 315357973]:
         # i_16 = PHI <i_11(8), beg_5(D)(15)>
         # j_17 = PHI <j_12(8), beg2_6(D)(15)>
         printf ("a: %d %d\n", i_16, j_17);
         i_11 = i_16 + 1;
         j_12 = j_17 + 1;
         if (j_12 < _28)
      -    goto <bb 8>; [89.00%]
      +    goto <bb 8>; [29.37%]
         else
      -    goto <bb 17>; [11.00%]
      +    goto <bb 17>; [70.63%]
      
      -  <bb 8> [local count: 850510901]:
      +  <bb 8> [local count: 280668596]:
         goto <bb 3>; [100.00%]
      
      -  <bb 16> [local count: 105119324]:
      +  <bb 16> [local count: 70429947]:
         # i_22 = PHI <beg_5(D)(14), i_29(17)>
         # j_23 = PHI <beg2_6(D)(14), j_30(17)>
      
         <bb 10> [local count: 955630225]:
         # i_2 = PHI <i_22(16), i_20(13)>
         # j_1 = PHI <j_23(16), j_21(13)>
         i_20 = i_2 + 1;
         j_21 = j_1 + 1;
         if (end_8(D) > i_20)
      -    goto <bb 13>; [89.00%]
      +    goto <bb 13>; [59.63%]
         else
      -    goto <bb 9>; [11.00%]
      +    goto <bb 9>; [40.37%]
      
      -  <bb 13> [local count: 850510901]:
      +  <bb 13> [local count: 569842305]:
         goto <bb 10>; [100.00%]
      
         <bb 17> [local count: 105119324]:
         # i_29 = PHI <i_11(3)>
         # j_30 = PHI <j_12(3)>
         if (end_8(D) > i_29)
           goto <bb 16>; [80.00%]
         else
           goto <bb 9>; [20.00%]
      
         <bb 9> [local count: 105119324]:
      
         <bb 6> [local count: 118111600]:
         return 0;
      
       }
         <bb 2> [local count: 118111600]:
      -  if (beg_5(D) < end_8(D))
      +  _1 = end_6(D) - beg_7(D);
      +  j_9 = _1 + beg2_8(D);
      +  if (end_6(D) > beg_7(D))
           goto <bb 14>; [89.00%]
         else
           goto <bb 6>; [11.00%]
      
         <bb 14> [local count: 105119324]:
      -  if (beg2_6(D) < c_9(D))
      -    goto <bb 15>; [100.00%]
      +  if (j_9 >= c_11(D))
      +    goto <bb 15>; [33.00%]
         else
      -    goto <bb 16>; [100.00%]
      +    goto <bb 16>; [67.00%]
      
      -  <bb 15> [local count: 105119324]:
      -  _25 = beg_5(D) + 1;
      -  _26 = end_8(D) - beg_5(D);
      -  _27 = beg2_6(D) + _26;
      -  _28 = MIN_EXPR <c_9(D), _27>;
      -
      -  <bb 3> [local count: 955630225]:
      -  # i_16 = PHI <i_11(8), beg_5(D)(15)>
      -  # j_17 = PHI <j_12(8), beg2_6(D)(15)>
      -  printf ("a: %d %d\n", i_16, j_17);
      -  i_11 = i_16 + 1;
      -  j_12 = j_17 + 1;
      -  if (j_12 < _28)
      -    goto <bb 8>; [89.00%]
      +  <bb 15> [local count: 34689377]:
      +  _27 = end_6(D) + -1;
      +  _28 = beg_7(D) - end_6(D);
      +  _29 = j_9 + _28;
      +  _30 = _29 + 1;
      +  _31 = MAX_EXPR <c_11(D), _30>;
      +
      +  <bb 3> [local count: 315357973]:
      +  # i_18 = PHI <i_13(8), end_6(D)(15)>
      +  # j_19 = PHI <j_14(8), j_9(15)>
      +  printf ("a: %d %d\n", i_18, j_19);
      +  i_13 = i_18 + -1;
      +  j_14 = j_19 + -1;
      +  if (j_14 >= _31)
      +    goto <bb 8>; [29.37%]
         else
      -    goto <bb 17>; [11.00%]
      +    goto <bb 17>; [70.63%]
      
      -  <bb 8> [local count: 850510901]:
      +  <bb 8> [local count: 280668596]:
         goto <bb 3>; [100.00%]
      
      -  <bb 16> [local count: 105119324]:
      -  # i_22 = PHI <beg_5(D)(14), i_29(17)>
      -  # j_23 = PHI <beg2_6(D)(14), j_30(17)>
      +  <bb 16> [local count: 70429947]:
      +  # i_24 = PHI <end_6(D)(14), i_32(17)>
      +  # j_25 = PHI <j_9(14), j_33(17)>
      
         <bb 10> [local count: 955630225]:
      -  # i_2 = PHI <i_22(16), i_20(13)>
      -  # j_1 = PHI <j_23(16), j_21(13)>
      -  i_20 = i_2 + 1;
      -  j_21 = j_1 + 1;
      -  if (end_8(D) > i_20)
      +  # i_3 = PHI <i_24(16), i_22(13)>
      +  # j_2 = PHI <j_25(16), j_23(13)>
      +  i_22 = i_3 + -1;
      +  j_23 = j_2 + -1;
      +  if (beg_7(D) < i_22)
           goto <bb 13>; [89.00%]
         else
           goto <bb 9>; [11.00%]
      
      -  <bb 13> [local count: 850510901]:
      +  <bb 13> [local count: 569842305]:
         goto <bb 10>; [100.00%]
      
         <bb 17> [local count: 105119324]:
      -  # i_29 = PHI <i_11(3)>
      -  # j_30 = PHI <j_12(3)>
      -  if (end_8(D) > i_29)
      +  # i_32 = PHI <i_13(3)>
      +  # j_33 = PHI <j_14(3)>
      +  if (beg_7(D) < i_32)
           goto <bb 16>; [80.00%]
         else
           goto <bb 9>; [20.00%]
      
         <bb 9> [local count: 105119324]:
      
         <bb 6> [local count: 118111600]:
         return 0;
      
       }
      
      2) diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
      ...
         <bb 2> [local count: 118111600]:
         if (n_7(D) > 0)
           goto <bb 4>; [89.00%]
         else
           goto <bb 3>; [11.00%]
      
         <bb 3> [local count: 118111600]:
         return;
      
         <bb 4> [local count: 105119324]:
         pretmp_3 = ga;
      
      -  <bb 5> [local count: 955630225]:
      +  <bb 5> [local count: 315357973]:
         # i_13 = PHI <i_10(20), 0(4)>
         # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
         if (prephitmp_12 != 0)
           goto <bb 6>; [33.00%]
         else
           goto <bb 7>; [67.00%]
      
         <bb 6> [local count: 315357972]:
         _2 = do_something ();
         ga = _2;
      
      -  <bb 7> [local count: 955630225]:
      +  <bb 7> [local count: 315357973]:
         # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
         i_10 = inc (i_13);
         if (n_7(D) > i_10)
           goto <bb 21>; [89.00%]
         else
           goto <bb 11>; [11.00%]
      
         <bb 11> [local count: 105119324]:
         goto <bb 3>; [100.00%]
      
      -  <bb 21> [local count: 850510901]:
      +  <bb 21> [local count: 280668596]:
         if (prephitmp_12 != 0)
      -    goto <bb 20>; [100.00%]
      +    goto <bb 20>; [33.00%]
         else
      -    goto <bb 19>; [INV]
      +    goto <bb 19>; [67.00%]
      
      -  <bb 20> [local count: 850510901]:
      +  <bb 20> [local count: 280668596]:
         goto <bb 5>; [100.00%]
      
      -  <bb 19> [count: 0]:
      +  <bb 19> [local count: 70429947]:
         # i_23 = PHI <i_10(21)>
         # prephitmp_25 = PHI <prephitmp_5(21)>
      
      -  <bb 12> [local count: 955630225]:
      +  <bb 12> [local count: 640272252]:
         # i_15 = PHI <i_23(19), i_22(16)>
         # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
         i_22 = inc (i_15);
         if (n_7(D) > i_22)
           goto <bb 16>; [89.00%]
         else
           goto <bb 11>; [11.00%]
      
      -  <bb 16> [local count: 850510901]:
      +  <bb 16> [local count: 569842305]:
         goto <bb 12>; [100.00%]
      
       }
      
      gcc/ChangeLog:
      
      2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>
      
      	* tree-ssa-loop-split.c (split_loop): Fix incorrect
      	profile_count and probability.
      	(do_split_loop_on_cond): Likewise.
      cd5ae148
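For context, the transformation whose profile data is being fixed can be imitated by hand. The functions below are hypothetical and only loosely modelled on the dumps above. The condition `j < c` changes from true to false at most once because `j` only increases, so the loop can be split into a first loop where the condition holds and a second loop where it does not. The sketch shows that both forms compute the same result. It does not model the count and probability updates themselves.

```c
/* Original form: the condition is tested on every iteration. */
void count_original(int beg, int end, int beg2, int c, int *hits, int *misses)
{
    *hits = 0;
    *misses = 0;
    for (int i = beg, j = beg2; i < end; i++, j++) {
        if (j < c)
            (*hits)++;
        else
            (*misses)++;
    }
}

/* Hand-split form: loop 1 runs while the condition is known to be true,
   loop 2 handles the remaining iterations where it is known to be false. */
void count_split(int beg, int end, int beg2, int c, int *hits, int *misses)
{
    int i = beg, j = beg2;

    *hits = 0;
    *misses = 0;
    for (; i < end && j < c; i++, j++)
        (*hits)++;
    for (; i < end; i++, j++)
        (*misses)++;
}
```

After such a split, the iterations of the original loop are divided between the two new loops, which is why their counts and branch probabilities have to be scaled rather than copied.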
    • Xionghu Luo's avatar
      Fix incorrect loop exit edge probability [PR103270] · 46bfe1b0
      Xionghu Luo authored
      r12-4526 cancelled jump thread paths that rotate loops.  This exposes an
      issue in profile-estimate: in predict_extra_loop_exits, an outer loop's
      exit edge is marked as an extra loop exit of the inner loop and given an
      incorrect prediction, so a hot inner loop eventually becomes a cold loop
      through later optimizations.  This patch adds a loop check when searching
      for extra exit edges to avoid an unexpected predict_edge from
      predict_paths_for_bb.
      
      Regression tested on P8LE.
      
      gcc/ChangeLog:
      
      2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>
      
      	PR middle-end/103270
      	* predict.c (predict_extra_loop_exits): Add loop parameter.
      	(predict_loops): Call with loop argument.
      
      gcc/testsuite/ChangeLog:
      
      2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>
      
      	PR middle-end/103270
      	* gcc.dg/pr103270.c: New test.
      46bfe1b0
    • Xionghu Luo's avatar
      rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus · 460d53f8
      Xionghu Luo authored
      These four UNSPECs can apparently be replaced with native RTL.
      
      For
      "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))":
      
      Quoting David's explanation:
      
      "The design came from the early implementation of Altivec:
      
      https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html
      
      If one later checks for saturation (reads VSCR), one needs a
      corresponding SET of the value.  It's set in an architecture-specific
      manner that isn't described to GCC, but it's set, not just clobbered
      and in an undefined state.
      
      The RTL does not describe that VSCR is set to the value 0.  The
      (const_int 0) is not the value set.  You can think of the (const_int
      0) as a dummy RTL argument to the VSCR UNSPEC.  UNSPEC requires at
      least one argument and the pattern doesn't try to express the
      argument, so it uses a dummy RTL constant.  It's part of a PARALLEL
      and the plus or minus already expresses the data dependency of the
      pattern on the input operands."
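
For readers less familiar with the RTL codes this commit switches to: `ss_plus`/`us_plus` and `ss_minus`/`us_minus` are GCC's signed and unsigned saturating addition and subtraction. The following is a minimal scalar sketch in C of what they compute for a single 8-bit element. The function names are hypothetical, the real patterns operate on vector modes, and the sketch does not model the VSCR side effect discussed above.

```c
#include <stdint.h>

/* us_plus on one 8-bit element: unsigned add clamped to [0, 255].  */
static uint8_t us_plus_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* us_minus on one 8-bit element: unsigned subtract clamped at 0.  */
static uint8_t us_minus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Clamp a wider intermediate result to the int8_t range.  */
static int8_t clamp_s8(int value)
{
    if (value > INT8_MAX)
        return INT8_MAX;
    if (value < INT8_MIN)
        return INT8_MIN;
    return (int8_t)value;
}

/* ss_plus on one 8-bit element: signed add clamped to [-128, 127].  */
static int8_t ss_plus_s8(int8_t a, int8_t b)
{
    return clamp_s8((int)a + (int)b);
}

/* ss_minus on one 8-bit element: signed subtract clamped to [-128, 127].  */
static int8_t ss_minus_s8(int8_t a, int8_t b)
{
    return clamp_s8((int)a - (int)b);
}
```

For example, `us_plus_u8(250, 10)` yields 255 rather than wrapping to 4, and `ss_minus_s8(-100, 100)` yields -128. Expressing this with native RTL codes instead of an UNSPEC lets the rest of the compiler see what the operation computes.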
      
      gcc/ChangeLog:
      
      2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>
      
      	* config/rs6000/altivec.md (altivec_vaddu<VI_char>s): Replace
      	UNSPEC_VADDU with us_plus.
      	(altivec_vadds<VI_char>s): Replace UNSPEC_VADDS with ss_plus.
      	(altivec_vsubu<VI_char>s): Replace UNSPEC_VSUBU with us_minus.
      	(altivec_vsubs<VI_char>s): Replace UNSPEC_VSUBS with ss_minus.
      	(altivec_abss_<mode>): Likewise.
      460d53f8
    • GCC Administrator's avatar
      Daily bump. · 7631a4d1
      GCC Administrator authored
      7631a4d1
  8. Dec 20, 2021
    • Joseph Myers's avatar
      Update cpplib es.po · bb42d680
      Joseph Myers authored
      	* es.po: Update.
      bb42d680
    • Uros Bizjak's avatar
      i386: Fix <sse2p4_1>_pinsr<ssemodesuffix> and its splitters [PR103772] · 72c68d7a
      Uros Bizjak authored
      The clever trick to duplicate the value of the input operand into itself
      proved not so clever after all.  The splitter should not clobber the input
      operand in any case, since the register can hold the value outside the HImode
      lowpart when accessed as subreg.  Use the standard earlyclobber approach
      instead.
      
      The testcase fails with the AVX2 ISA, but I was not able to create a
      testcase that doesn't require the -mavx512fp16 compile flag.
      
      2021-12-20  Uroš Bizjak  <ubizjak@gmail.com>
      
      gcc/ChangeLog:
      
      	PR target/103772
      	* config/i386/sse.md (<sse2p4_1>_pinsr<ssemodesuffix>): Add
      	earlyclobber to (x,x,x,i) alternative.
      	(<sse2p4_1>_pinsr<ssemodesuffix> peephole2): Remove.
      	(<sse2p4_1>_pinsr<ssemodesuffix> splitter): Use output
      	operand as a temporary register.  Split after reload_completed.
      72c68d7a
    • Patrick Palka's avatar
      c++: memfn lookup consistency in incomplete-class ctx · ab85331c
      Patrick Palka authored
      When instantiating a call to a member function of a class template, we
      repeat the member function lookup in order to obtain the corresponding
      partially instantiated functions.  Within an incomplete-class context
      however, we need to be more careful when repeating the lookup because we
      don't want to introduce later-declared member functions that weren't
      visible at template definition time.  We're currently not careful enough
      in this respect, which causes us to reject memfn1.C below.
      
      This patch fixes this issue by making tsubst_baselink filter out from
      the instantiation-time lookup those member functions that were invisible
      at template definition time.  This is really only necessary within an
      incomplete-class context, so this patch adds a heuristic flag to BASELINK
      to help us avoid needlessly performing this filtering step (which would
      be a no-op) in complete-class contexts.
      
      This is also necessary for the ahead-of-time overload set pruning
      implemented in r12-6075 to be effective for member functions within
      class templates.
      
      gcc/cp/ChangeLog:
      
      	* call.c (build_new_method_call): Set
      	BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P on the pruned baselink.
      	* cp-tree.h (BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P): Define.
      	* pt.c (filter_memfn_lookup): New subroutine of tsubst_baselink.
      	(tsubst_baselink): Use filter_memfn_lookup on the new lookup
      	result when BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P is set on the
      	old baselink.  Remove redundant BASELINK_P check.
      	* search.c (build_baselink): Set
      	BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P appropriately.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/lookup/memfn1.C: New test.
      	* g++.dg/template/non-dependent16b.C: New test.
      ab85331c