  1. May 09, 2023
    • testsuite: Add further testcase for already fixed PR [PR109778] · c2cf2dc9
      Jakub Jelinek authored
      I came up with a testcase which reproduces all the way back to r10-7469.
      It uses LTO to avoid early inlining, so that ccp handles rotates, not
      the shifts that are later turned into rotates.
      
      2023-05-09  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/109778
      	* gcc.dg/lto/pr109778_0.c: New test.
      	* gcc.dg/lto/pr109778_1.c: New file.
      c2cf2dc9
    • tree-ssa-ccp, wide-int: Fix up handling of [LR]ROTATE_EXPR in bitwise ccp [PR109778] · a8302d2a
      Jakub Jelinek authored
      The following testcase is miscompiled, because bitwise ccp2 handles
      a rotate with a signed type incorrectly.
      Seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3
      arguments, all other callers just rotate in the right precision and
      I think work correctly.  ccp works with widest_ints and so rotations
      by the excessive precision certainly don't match what it wants
      when it sees a rotate in some specific bitsize.  Still, if it is
      unsigned rotate and the widest_int is zero extended from width,
      the functions perform left shift and logical right shift on the value
      and then at the end zero extend the result of left shift and uselessly
      also the result of logical right shift and return | of that.
      On the testcase the signed char rrotate by 4 argument is
      CONSTANT -75 i.e. 0xffffffff....fffffb5 with mask 2.
      The mask is correctly rotated to 0x20, but because the 8-bit constant
      is sign extended to 192-bit one, the logical right shift by 4 doesn't
      yield expected 0xb, but gives 0xfffffffffff....ffffb, and then
      return wi::zext (left, width) | wi::zext (right, width); where left is
      0xfffffff....fb50, so we return 0xfb instead of the expected
      0x5b.
      
      The following patch fixes that by doing the zero extension in case of
      the right variable before doing wi::lrshift rather than after it.
      
      Also, wi::[lr]rotate with width < precision always zero extends
      the result.  I'm afraid it can't do better because it doesn't know
      if it is done for an unsigned or signed type, but the caller in this
      case knows that very well, so I've done the extension based on sgn
      in the caller.  E.g. 0x5b rotated right (or left) by 4 with width 8
      previously gave 0xb5, but with sgn == SIGNED in widest_int it should be
      0xffffffff....fffb5 instead.
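
      The arithmetic above can be reproduced with a small model.  This is only a
      sketch: it uses a 64-bit integer in place of the 192-bit widest_int, and
      the names zext, sext, rrotate_old and rrotate_new are illustrative
      stand-ins, not the actual wi:: functions.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low `width` bits of x (0 < width <= 64). */
uint64_t zext(uint64_t x, unsigned width)
{
    assert(width > 0 && width <= 64);
    return width == 64 ? x : x & ((UINT64_C(1) << width) - 1);
}

/* Sign-extend the low `width` bits of x (0 < width <= 64). */
uint64_t sext(uint64_t x, unsigned width)
{
    uint64_t sign = UINT64_C(1) << (width - 1);
    x = zext(x, width);
    return (x ^ sign) - sign;
}

/* Old behaviour: shift the full-precision value, then truncate both halves. */
uint64_t rrotate_old(uint64_t x, unsigned y, unsigned width)
{
    assert(y > 0 && y < width);
    uint64_t left = x << (width - y);
    uint64_t right = x >> y;   /* logical shift pulls sign-extension bits in */
    return zext(left, width) | zext(right, width);
}

/* Patched behaviour: truncate to `width` before the logical right shift. */
uint64_t rrotate_new(uint64_t x, unsigned y, unsigned width)
{
    assert(y > 0 && y < width);
    uint64_t left = x << (width - y);
    uint64_t right = zext(x, width) >> y;
    return zext(left, width) | right;
}
```

      With x set to -75 sign-extended to 64 bits, rrotate_old (x, 4, 8) gives
      0xfb and rrotate_new (x, 4, 8) gives 0x5b.  Applying sext (..., 8) to the
      rotate of 0x5b gives the sign-extended 0x...ffb5 that a signed caller
      expects.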
      
      2023-05-09  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/109778
      	* wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on
      	wi::zext (x, width) rather than x if width != precision, rather
      	than using wi::zext (right, width) after the shift.
      	* tree-ssa-ccp.cc (bit_value_binop): Call wi::ext on the results
      	of wi::lrotate or wi::rrotate.
      
      	* gcc.c-torture/execute/pr109778.c: New test.
      a8302d2a
    • genmatch: fixup get_out_file · 153eafaa
      Alexander Monakov authored
      get_out_file did not follow the coding conventions (mixing three-space
      and two-space indentation, missing linebreak before function name).
      
      Take that as an excuse to reimplement it in a more terse manner and
      rename as 'choose_output', which is hopefully more descriptive.
      
      gcc/ChangeLog:
      
      	* genmatch.cc (get_out_file): Make static and rename to ...
      	(choose_output): ... this. Reimplement. Update all uses ...
      	(decision_tree::gen): ... here and ...
      	(main): ... here.
      153eafaa
    • genmatch: clean up showUsage · 425198bb
      Alexander Monakov authored
      Display usage more consistently and get rid of camelCase.
      
      gcc/ChangeLog:
      
      	* genmatch.cc (showUsage): Reimplement as ...
      	(usage): ...this.  Adjust all uses.
      	(main): Print usage when no arguments.  Add missing 'return 1'.
      425198bb
    • genmatch: clean up emit_func · 2ed6dd97
      Alexander Monakov authored
      Eliminate boolean parameters of emit_func. The first ('open') just
      prints 'extern' to generated header, which is unnecessary. Introduce a
      separate function to use when finishing a declaration in place of the
      second ('close').
      
      Rename emit_func to 'fp_decl' (matching 'fprintf' in length) to unbreak
      indentation in several places.
      
      Reshuffle emitted line breaks in a few places to make generated
      declarations less ugly.
      
      gcc/ChangeLog:
      
      	* genmatch.cc (header_file): Make static.
      	(emit_func): Rename to...
      	(fp_decl): ... this.  Adjust all uses.
      	(fp_decl_done): New function.  Use it...
      	(decision_tree::gen): ... here and...
      	(write_predicate): ... here.
      	(main): Adjust.
      2ed6dd97
    • aarch64: Avoid hard-coding specific register allocations · af84cb1f
      Richard Sandiford authored
      Some tests hard-coded specific allocations for temporary registers,
      whereas the RA should be free to pick anything that doesn't force
      unnecessary moves or spills.
      
      gcc/testsuite/
      	* gcc.target/aarch64/asimd-mul-to-shl-sub.c: Allow any register
      	allocation for temporary results, rather than requiring specific
      	registers.
      	* gcc.target/aarch64/auto-init-padding-1.c: Likewise.
      	* gcc.target/aarch64/auto-init-padding-2.c: Likewise.
      	* gcc.target/aarch64/auto-init-padding-3.c: Likewise.
      	* gcc.target/aarch64/auto-init-padding-4.c: Likewise.
      	* gcc.target/aarch64/auto-init-padding-9.c: Likewise.
      	* gcc.target/aarch64/memset-corner-cases.c: Likewise.
      	* gcc.target/aarch64/memset-q-reg.c: Likewise.
      	* gcc.target/aarch64/simd/vaddlv_1.c: Likewise.
      	* gcc.target/aarch64/sve-neon-modes_1.c: Likewise.
      	* gcc.target/aarch64/sve-neon-modes_3.c: Likewise.
      	* gcc.target/aarch64/sve/load_scalar_offset_1.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
      	* gcc.target/aarch64/sve/pr89007-1.c: Likewise.
      	* gcc.target/aarch64/sve/pr89007-2.c: Likewise.
      	* gcc.target/aarch64/sve/store_scalar_offset_1.c: Likewise.
      	* gcc.target/aarch64/vadd_reduc-1.c: Likewise.
      	* gcc.target/aarch64/vadd_reduc-2.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Allow the temporary
      	predicate register to be any of p4-p7, rather than requiring p4
      	specifically.
      	* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
      af84cb1f
    • aarch64: Relax FP/vector register matches · 5c53d825
      Richard Sandiford authored
      There were many tests that used [0-9] to match an FP or vector register,
      but that should allow any of 0-31 instead.
      
      asm-x-constraint-1.c required s0-s7, but that's the range for "y"
      rather than "x".  "x" allows s0-s15.
      
      sve/pcs/return_9.c required z2-z7 (the initial set of available
      call-clobbered registers), but z24-z31 are OK too.
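
      The [0-9] problem described above is easy to see in isolation.  The
      sketch below uses POSIX extended regexes from C as a stand-in for the
      Tcl regexps that the testsuite directives actually use; asm_matches and
      the sample assembly strings are made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex `pattern` matches anywhere in `text`. */
int asm_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);   /* the patterns used here are known to be valid */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

      A pattern such as "fmov\ts[0-9], " matches "fmov\ts3, w0" but not
      "fmov\ts12, w0", because [0-9] consumes exactly one digit.  Writing
      [0-9]+ accepts any register number.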
      
      gcc/testsuite/
      	* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: Allow any
      	FP/vector register, not just register 0-9.
      	* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
      	* gcc.target/aarch64/ldp_stp_8.c: Likewise.
      	* gcc.target/aarch64/ldp_stp_17.c: Likewise.
      	* gcc.target/aarch64/ldp_stp_21.c: Likewise.
      	* gcc.target/aarch64/simd/vpaddd_f64.c: Likewise.
      	* gcc.target/aarch64/simd/vpaddd_s64.c: Likewise.
      	* gcc.target/aarch64/simd/vpaddd_u64.c: Likewise.
      	* gcc.target/aarch64/sve/adr_1.c: Likewise.
      	* gcc.target/aarch64/sve/adr_2.c: Likewise.
      	* gcc.target/aarch64/sve/adr_3.c: Likewise.
      	* gcc.target/aarch64/sve/adr_4.c: Likewise.
      	* gcc.target/aarch64/sve/adr_5.c: Likewise.
      	* gcc.target/aarch64/sve/extract_1.c: Likewise.
      	* gcc.target/aarch64/sve/extract_2.c: Likewise.
      	* gcc.target/aarch64/sve/extract_3.c: Likewise.
      	* gcc.target/aarch64/sve/extract_4.c: Likewise.
      	* gcc.target/aarch64/sve/slp_4.c: Likewise.
      	* gcc.target/aarch64/sve/spill_3.c: Likewise.
      	* gcc.target/aarch64/vfp-1.c: Likewise.
      	* gcc.target/aarch64/asm-x-constraint-1.c: Allow s0-s15, not just
      	s0-s7.
      	* gcc.target/aarch64/sve/pcs/return_9.c: Allow z24-z31 as well as
      	z2-z7.
      5c53d825
    • aarch64: Relax predicate register matches · 3e60e57e
      Richard Sandiford authored
      Most governing predicate operands require p0-p7, but some
      instructions also allow p8-p15.  Non-gp uses of predicates
      often also allow all of p0-p15.
      
      This patch fixes up cases where we required p0-p7 unnecessarily.
      In some cases we match the definition (typically a comparison,
      PFALSE or PTRUE), sometimes we match the use (like a logic
      instruction, MOV or SEL), and sometimes we match both.
      
      gcc/testsuite/
      	* g++.target/aarch64/sve/vcond_1.C: Allow any predicate
      	register for the temporary results, not just p0-p7.
      	* gcc.target/aarch64/sve/acle/asm/dupq_b8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dupq_b16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dupq_b32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dupq_b64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general/whilele_5.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general/whilele_6.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general/whilele_7.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general/whilele_9.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general/whilele_10.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general/whilelt_1.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general/whilelt_2.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general/whilelt_3.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
      	* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
      	* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
      	* gcc.target/aarch64/sve/vcond_2.c: Likewise.
      	* gcc.target/aarch64/sve/vcond_3.c: Likewise.
      	* gcc.target/aarch64/sve/vcond_7.c: Likewise.
      	* gcc.target/aarch64/sve/vcond_18.c: Likewise.
      	* gcc.target/aarch64/sve/vcond_19.c: Likewise.
      	* gcc.target/aarch64/sve/vcond_20.c: Likewise.
      3e60e57e
    • aarch64: Relax ordering requirements in SVE dup tests · 75bd358e
      Richard Sandiford authored
      Some of the svdup tests expand to a SEL between two constant vectors.
      This patch allows the constants to be formed in either order.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/acle/asm/dup_s16.c: When using SEL to select
      	between two constant vectors, allow the constant moves to appear in
      	either order.
      	* gcc.target/aarch64/sve/acle/asm/dup_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dup_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dup_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dup_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dup_u64.c: Likewise.
      75bd358e
    • aarch64: Allow moves after tied-register intrinsics · 4ff89f10
      Richard Sandiford authored
      Some ACLE intrinsics map to instructions that tie the output
      operand to an input operand.  If all the operands are allocated
      to different registers, and if MOVPRFX can't be used, we will need
      a move either before the instruction or after it.  Many tests only
      matched the "before" case; this patch makes them accept the "after"
      case too.
      
      gcc/testsuite/
      	* gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: Allow
      	moves to occur after the intrinsic instruction, rather than requiring
      	them to happen before.
      	* gcc.target/aarch64/advsimd-intrinsics/bfdot-1.c: Likewise.
      	* gcc.target/aarch64/advsimd-intrinsics/vdot-3-1.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/adda_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/brka_b.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/brkb_b.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/brkn_b.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/clasta_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/clasta_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/clasta_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/clastb_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/clastb_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/clastb_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/pfirst_b.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/pnext_b16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/pnext_b32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/pnext_b64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/pnext_b8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sli_s16.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sli_s32.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sli_s64.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sli_s8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sli_u16.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sli_u32.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sli_u64.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sli_u8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sri_s16.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sri_s32.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sri_s64.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sri_s8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sri_u16.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sri_u32.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sri_u64.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sri_u8.c: Likewise.
      4ff89f10
    • aarch64: Fix move-after-intrinsic function-body tests · aebd8471
      Richard Sandiford authored
      Some of the SVE ACLE asm tests tried to be agnostic about the
      instruction order, but only one of the alternatives was exercised
      in practice.  This patch fixes latent typos in the other versions.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Fix expected register
      	allocation in the case where a move occurs after the intrinsic
      	instruction.
      	* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.
      aebd8471
    • ira: Don't create copies for earlyclobbered pairs · ba72a8d8
      Richard Sandiford authored
      This patch follows on from g:9f635bd1
      ("the previous patch").  To start by quoting that:
      
      If an insn requires two operands to be tied, and the input operand dies
      in the insn, IRA acts as though there were a copy from the input to the
      output with the same execution frequency as the insn.  Allocating the
      same register to the input and the output then saves the cost of a move.
      
      If there is no such tie, but an input operand nevertheless dies
      in the insn, IRA creates a similar move, but with an eighth of the
      frequency.  This helps to ensure that chains of instructions reuse
      registers in a natural way, rather than using arbitrarily different
      registers for no reason.
      
      This heuristic seems to work well in the vast majority of cases.
      However, the problem fixed in the previous patch was that we
      could create a copy for an operand pair even if, for all relevant
      alternatives, the output and input register classes did not have
      any registers in common.  It is then impossible for the output
      operand to reuse the dying input register.
      
      This left unfixed a further case where copies don't make sense:
      there is no point trying to reuse the dying input register if,
      for all relevant alternatives, the output is earlyclobbered and
      the input doesn't match the output.  (Matched earlyclobbers are fine.)
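
      The decision described above can be modelled roughly as follows.  This
      is a simplified sketch, not the code in ira-conflicts.cc: struct
      alt_info, can_share_reg and copy_freq are hypothetical names, and each
      alternative is reduced to three booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one constraint alternative for an
   (output, input) operand pair. */
struct alt_info {
    bool classes_overlap;       /* output and input classes share a register */
    bool output_earlyclobber;   /* output is written before inputs are read */
    bool input_matches_output;  /* input is tied to the output */
};

/* A copy only makes sense if at least one alternative lets the output
   reuse the dying input register. */
bool can_share_reg(const struct alt_info *alts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!alts[i].classes_overlap)
            continue;   /* case handled by the previous patch */
        if (alts[i].output_earlyclobber && !alts[i].input_matches_output)
            continue;   /* case handled by this patch */
        return true;
    }
    return false;
}

/* Frequency of the pseudo-copy: the insn frequency for a tie, one eighth
   of it otherwise, and 0 when no copy is created. */
int copy_freq(const struct alt_info *alts, size_t n, bool tied, int insn_freq)
{
    assert(insn_freq >= 0);
    if (!can_share_reg(alts, n))
        return 0;
    return tied ? insn_freq : insn_freq / 8;
}
```

      In this model an operand pair whose only alternative has an
      unmatched earlyclobber output gets no copy at all, while a matched
      earlyclobber still counts as a usable alternative.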
      
      Handling that case fixes several existing XFAILs and helps with
      a follow-on aarch64 patch.
      
      Tested on aarch64-linux-gnu and x86_64-linux-gnu.  A SPEC2017 run
      on aarch64 showed no differences outside the noise.  Also, I tried
      compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
      per cpu directory, using the options -Os -fno-schedule-insns{,2}.
      The results below summarise the tests that showed a difference in LOC:
      
      Target               Tests   Good    Bad   Delta    Best   Worst  Median
      ======               =====   ====    ===   =====    ====   =====  ======
      amdgcn-amdhsa           14      7      7       3     -18      10      -1
      arm-linux-gnueabihf     16     15      1     -22      -4       2      -1
      csky-elf                 6      6      0     -21      -6      -2      -4
      hppa64-hp-hpux11.23      5      5      0      -7      -2      -1      -1
      ia64-linux-gnu          16     16      0     -70     -15      -1      -3
      m32r-elf                53      1     52      64      -2       8       1
      mcore-elf                2      2      0      -8      -6      -2      -6
      microblaze-elf         285    283      2    -909     -68       4      -1
      mmix                     7      7      0   -2101   -2091      -1      -1
      msp430-elf               1      1      0      -4      -4      -4      -4
      pru-elf                  8      6      2     -12      -6       2      -2
      rx-elf                  22     18      4     -40      -5       6      -2
      sparc-linux-gnu         15     14      1     -40      -8       1      -2
      sparc-wrs-vxworks       15     14      1     -40      -8       1      -2
      visium-elf               2      1      1       0      -2       2      -2
      xstormy16-elf            1      1      0      -2      -2      -2      -2
      
      with other targets showing no sensitivity to the patch.  The only
      target that seems to be negatively affected is m32r-elf; otherwise
      the patch seems like an extremely minor but still clear improvement.
      
      gcc/
      	* ira-conflicts.cc (can_use_same_reg_p): Skip over non-matching
      	earlyclobbers.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c: Remove XFAILs.
      	* gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.
      ba72a8d8
    • c++: non-template friend of template [PR106740] · 73f7109f
      Jason Merrill authored
      This was fixed by r13-1018, but the testcase seems needed.
      
      	PR c++/106740
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/template/friend78.C: New test.
      73f7109f
    • Daily bump. · 212905a4
      GCC Administrator authored
      212905a4
  2. May 08, 2023
    • [x86_64] Introduce insvti_highpart define_insn_and_split. · 1e3054d2
      Roger Sayle authored
      This is a repost/respin of a patch that was conditionally approved:
      https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html
      
      This patch adds a convenient post-reload splitter for setting/updating
      the highpart of a TImode variable, using i386's previously added
      split_double_concat infrastructure.
      
      For the new test case below:
      
      __int128 foo(__int128 x, unsigned long long y)
      {
        __int128 t = (__int128)y << 64;
        __int128 r = (x & ~0ull) | t;
        return r;
      }
      
      mainline GCC with -O2 currently generates:
      
      foo:    movq    %rdi, %rcx
              xorl    %eax, %eax
              xorl    %edi, %edi
              orq     %rcx, %rax
              orq     %rdi, %rdx
              ret
      
      with this patch, GCC instead now generates the much better:
      
      foo:    movq    %rdi, %rcx
              movq    %rcx, %rax
              ret
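
      For reference, the value computed by foo can be checked with a short
      program.  The sketch below restates the same expression on unsigned
      types to sidestep signed-shift questions; set_highpart, lo64 and hi64
      are illustrative helpers, and unsigned __int128 is a compiler extension
      assumed to be available (as it is with GCC on x86_64).

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Same expression as foo, on unsigned types: keep the low 64 bits of x
   and place y in the high 64 bits. */
u128 set_highpart(u128 x, uint64_t y)
{
    u128 t = (u128)y << 64;
    return (x & ~0ull) | t;   /* ~0ull zero-extends to a low-half mask */
}

uint64_t lo64(u128 v) { return (uint64_t)v; }
uint64_t hi64(u128 v) { return (uint64_t)(v >> 64); }

/* The result is exactly {high = y, low = low half of x}. */
void check_set_highpart(void)
{
    u128 x = ((u128)0x1111222233334444ull << 64) | 0x5555666677778888ull;
    u128 r = set_highpart(x, 0xaaaabbbbccccddddull);
    assert(lo64(r) == 0x5555666677778888ull);
    assert(hi64(r) == 0xaaaabbbbccccddddull);
}
```

      Since the low half of x and y already arrive in separate registers, the
      whole operation reduces to register moves, which is what the improved
      code sequence reflects.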
      
      It turns out that the -m32 equivalent of this testcase already
      avoids using explicit orl/xor instructions, as it gets optimized
      (in combine) by a completely different path.  Given that this idiom
      isn't seen in 32-bit code (so this pattern doesn't match with -m32),
      and also that the shorter 32-bit AND bitmask is represented as a
      CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
      is implemented for just TARGET_64BIT rather than contort a "generic"
      implementation using DWI mode iterators.
      
      2023-05-08  Roger Sayle  <roger@nextmovesoftware.com>
      	    Uros Bizjak  <ubizjak@gmail.com>
      
      gcc/ChangeLog
      	* config/i386/i386.md (any_or_plus): Move definition earlier.
      	(*insvti_highpart_1): New define_insn_and_split to overwrite
      	(insv) the highpart of a TImode register/memory.
      
      gcc/testsuite/ChangeLog
      	* gcc.target/i386/insvti_highpart-1.c: New test case.
      1e3054d2
    • Fix cfg maintenance after inlining in AutoFDO · 3d9853ee
      Eugene Rozenfeld authored
      Todo from early_inliner needs to be propagated so that
      cleanup_tree_cfg () is called if necessary.
      
      This bug was causing an assert in get_loop_body during
      ipa-sra in autoprofiledbootstrap build since loops weren't
      fixed up and one of the loops had num_nodes set to 0.
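
      The shape of the fix can be illustrated with a toy model.  Everything
      in this sketch is hypothetical (the flag values, struct func_state and
      the *_model functions); it only shows a callee's todo flags being
      returned to the caller and acted on rather than dropped.

```c
#include <assert.h>

/* Hypothetical flag values; GCC's real TODO_* flags are defined elsewhere. */
enum { TODO_NONE = 0, TODO_CLEANUP_CFG = 1 << 0 };

struct func_state {
    int needs_inlining;   /* nonzero if the inliner will change the function */
    int cfg_is_clean;     /* nonzero while loops and the CFG are consistent */
};

/* Stand-in for the inliner: edits the function and reports follow-up work. */
unsigned run_inliner(struct func_state *fn)
{
    if (!fn->needs_inlining)
        return TODO_NONE;
    fn->needs_inlining = 0;
    fn->cfg_is_clean = 0;   /* inlining left stale loop/CFG information */
    return TODO_CLEANUP_CFG;
}

/* Wrapper that forwards the inliner's todo instead of discarding it. */
unsigned early_inline_model(struct func_state *fn)
{
    return run_inliner(fn);
}

/* Caller: act on the propagated todo before later passes inspect loops. */
void auto_profile_model(struct func_state *fn)
{
    unsigned todo = early_inline_model(fn);
    if (todo & TODO_CLEANUP_CFG)
        fn->cfg_is_clean = 1;   /* stands in for cleanup_tree_cfg () */
    assert(fn->cfg_is_clean);
}
```

      If early_inline_model returned nothing, the caller would have no way to
      know that cleanup was needed, which corresponds to the stale loop
      information described above.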
      
      Tested on x86_64-pc-linux-gnu.
      
      gcc/ChangeLog:
      
      	* auto-profile.cc (auto_profile): Check todo from early_inline
      	to see if cleanup_tree_cfg needs to be called.
      	(early_inline): Return todo from early_inliner.
      3d9853ee
    • Fix pr81192.c for int16 targets · 5d85b5d6
      Andrew Pinski authored
      I had missed when converting this
      testcase to Gimple that there was a define
      for int/unsigned type specifically to get
      an INT32 type. This means when using a
      literal integer constant you need to use the
      `_Literal (type)` to form the types correctly on the
      constants.
      
      This fixes the issue and has been tested on both
      xstormy16-elf and x86_64-linux-gnu.
      
      Committed as obvious.
      
      gcc/testsuite/ChangeLog:
      
      	PR testsuite/109776
      	* gcc.dg/pr81192.c: Fix integer constants for int16 targets.
      5d85b5d6
    • RISC-V: Factor out vector manager code in vsetvli insertion pass. [NFC] · c139f5e1
      Kito Cheng authored
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info):
      	New.
      	(pass_vsetvl::get_block_info): New.
      	(pass_vsetvl::update_vector_info): New.
      	(pass_vsetvl::simple_vsetvl): Use get_vector_info.
      	(pass_vsetvl::compute_local_backward_infos): Ditto.
      	(pass_vsetvl::transfer_before): Ditto.
      	(pass_vsetvl::transfer_after): Ditto.
      	(pass_vsetvl::emit_local_forward_vsetvls): Ditto.
      	(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
      	(pass_vsetvl::cleanup_insns): Ditto.
      	(pass_vsetvl::compute_local_backward_infos): Use
      	update_vector_info.
      c139f5e1
    • RISC-V: Improve portability of testcases · dd7136cf
      Kito Cheng authored
      stdint.h requires the corresponding multilib to be installed, so use
      stdint-gcc.h instead.  Also add a riscv_vector.h wrapper to
      gcc.target/riscv/rvv/autovec/.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: Change
      	stdint.h to stdint-gcc.h.
      	* gcc.target/riscv/rvv/autovec/template-1.h: Ditto.
      	* gcc.target/riscv/rvv/autovec/riscv_vector.h: New.
      dd7136cf
    • Fix minor length computation on stormy16 · 148de3aa
      Jeff Law authored
      Today's build of xstormy16-elf failed due to a branch to an out of range
      target.  Manual inspection of the assembly code for the affected function
      (divdi3) showed that the zero-extension patterns were claiming a length
      of 2, but clearly assembled into 4 bytes.
      
      This patch adds an explicit length to the zero extension pattern and
      appears to resolve the issue in my test builds.
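
      Why an understated length leads to an out-of-range branch can be shown
      with a few numbers.  This is a sketch only: span_bytes and
      short_branch_ok are made-up helpers and the reach limit passed in is
      illustrative, not the real stormy16 branch range.

```c
#include <assert.h>
#include <stddef.h>

/* Total size in bytes of n instructions, given per-instruction lengths. */
long span_bytes(const int *lengths, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(lengths[i] > 0);
        total += lengths[i];
    }
    return total;
}

/* A short branch is usable only if the distance fits within its reach. */
int short_branch_ok(long distance, long max_reach)
{
    return distance <= max_reach;
}
```

      With claimed lengths {2, 2, 2, 2} the compiler computes a distance of 8
      bytes and, for a reach of 10, picks the short branch.  If two of those
      instructions really assemble to 4 bytes each, the true distance is 12
      and the assembler rejects the branch.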
      
      gcc/
      
      	* config/stormy16/stormy16.md (zero_extendhisi2): Fix length.
      148de3aa
    • libgomp C++ testsuite: Use 'lang_include_flags' instead of 'libstdcxx_includes' · 1b93b919
      Thomas Schwinge authored
      With nvptx offloading configured, and supported, and CUDA available:
      
          $ make check-target-libgomp RUNTESTFLAGS="--all c.exp=context-1.c c++.exp=context-1.c"
          [...]
          Running [...]/libgomp.oacc-c/c.exp ...
          PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
          PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
          PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
          PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
          UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
          Running [...]/libgomp.oacc-c++/c++.exp ...
          PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
          PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
          PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
          PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
          UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
          [...]
      
      ..., but for 'c++.exp=context-1.c' alone, we currently get all-UNSUPPORTED:
      
          $ make check-target-libgomp RUNTESTFLAGS="--all c++.exp=context-1.c"
          [...]
          Running [...]/libgomp.oacc-c++/c++.exp ...
          UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
          UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2
          UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
          [...]
      
      That is, if 'c.exp' executes first, it does successfully evaluate
      'dg-require-effective-target openacc_cublas' -- and does cache this result (so
      it isn't reevaluated for 'c++.exp').  However, for 'c++.exp' alone (that is,
      without the 'c.exp' result cached), we run into:
      
          spawn -ignore SIGHUP [xgcc] [...] -x c++ openacc_cublas2311907.c [...]
          In file included from /usr/include/cuda_fp16.h:3673,
                           from /usr/include/cublas_api.h:75,
                           from /usr/include/cublas_v2.h:65,
                           from openacc_cublas2311907.c:3:
          /usr/include/cuda_fp16.hpp:67:10: fatal error: utility: No such file or directory
      
      We're missing include paths to C++/libstdc++ build-tree headers.
      
      Fix this by using the mechanism introduced for Fortran in
      r212268 (commit f707da16) re
      "libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}".
      
      	libgomp/
      	* testsuite/libgomp.c++/c++.exp: Use 'lang_include_flags' instead
      	of 'libstdcxx_includes'.
      	* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
      1b93b919
    • Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS' · d6654a4b
      Thomas Schwinge authored
      
      Otherwise, for example for 'RUNTESTFLAGS' of '--target_board=unix\{-m64,-m32\}'
      vs. '--target_board=unix\{-m32,-m64\}', both variants exercise testing with
      always the first flag variant's 'LTO_OPTIONS'/'LTO_TORTURE_OPTIONS', which
      results in unequal test results between the two 'RUNTESTFLAGS' variants if one
      of the flag variants has 'check_linker_plugin_available' but the other doesn't.
      
      Fix-up for r180245 (commit c1a7cdbb)
      "Update testsuite to run with slim LTO".
      
      	gcc/testsuite/
      	* g++.dg/guality/guality.exp: Move 'torture-init' earlier.
      	* gcc.dg/guality/guality.exp: Likewise.
      	* gfortran.dg/guality/guality.exp: Likewise.
      	* lib/c-torture.exp (LTO_TORTURE_OPTIONS): Don't set.
      	* lib/gcc-dg.exp (LTO_TORTURE_OPTIONS): Don't set.
      	* lib/lto.exp (lto_init, lto_finish): Let each 'lto_init'
      	determine the default 'LTO_OPTIONS'.
      	* lib/torture-options.exp (torture-init, torture-finish): Let each
      	'torture-init' determine the 'LTO_TORTURE_OPTIONS'.
      d6654a4b
    • Thomas Schwinge's avatar
      libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation · 130c2f3c
      Thomas Schwinge authored
      ... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it.
      
      Follow-up to commit 131d18e9
      "libgomp/nvptx: Prepare for reverse-offload callback handling",
      and commit ea4b23d9
      "libgomp: Handle OpenMP's reverse offloads".
      
      	libgomp/
      	* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
      	'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
      	* libgomp.h (gomp_target_rev): Adjust.
      	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
      	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
      	* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
      	* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
      	(rev_off_host_to_dev_cpy): Remove.
      	(GOMP_OFFLOAD_run): Adjust.
      130c2f3c
    • Thomas Schwinge's avatar
      libgm2: Remove 'autogen.sh' · bd6dbdb1
      Thomas Schwinge authored
      ... given that plain 'autoreconf' achieves the same.
      
      	libgm2/
      	* autogen.sh: Remove.
      bd6dbdb1
    • Thomas Schwinge's avatar
      libgm2: Adjust 'autogen.sh' to 'ACLOCAL_AMFLAGS', and simplify · 8b8a4fb8
      Thomas Schwinge authored
      Specifying explicit '-I ..' before '-I ../config' is what (most) other GCC
      components do.  Specifying '-I .' is not necessary.
      
      With the order of '-I's aligned, 'autogen.sh' and plain 'autoreconf' then
      produce identical results.
      
      	libgm2/
      	* autogen.sh: For 'aclocal', 'autoreconf', remove '-I .',
      	add '-I ..'.
      	* Makefile.am (ACLOCAL_AMFLAGS): Remove '-I .'.
      	* libm2cor/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
      	* libm2iso/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
      	* libm2log/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
      	* libm2min/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
      	* libm2pim/Makefile.am (ACLOCAL_AMFLAGS): Likewise.
      	* aclocal.m4: Regenerate.
      	* Makefile.in: Likewise.
      	* libm2cor/Makefile.in: Likewise.
      	* libm2iso/Makefile.in: Likewise.
      	* libm2log/Makefile.in: Likewise.
      	* libm2min/Makefile.in: Likewise.
      	* libm2pim/Makefile.in: Likewise.
      8b8a4fb8
    • Patrick Palka's avatar
      c++: list CTAD and resolve_nondeduced_context [PR106214] · 06ef1583
      Patrick Palka authored
      This extends the PR93107 fix, which made us do resolve_nondeduced_context
      on the elements of an initializer list during auto deduction, to happen for
      CTAD as well.
      
      	PR c++/106214
      	PR c++/93107
      
      gcc/cp/ChangeLog:
      
      	* pt.cc (do_auto_deduction): Move up resolve_nondeduced_context
      	calls to happen before do_class_deduction.  Add some
      	error_mark_node tests.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp1z/class-deduction114.C: New test.
      06ef1583
    • Michael Meissner's avatar
      Bump up precision size to 16 bits. · e2b993db
      Michael Meissner authored
      The new __dmr type that is being added for a possible future PowerPC instruction
      set bumps into a structure field size issue.  The size of the __dmr type is 1024 bits.
      The precision field in tree_type_common is currently 10 bits, so if you store
      1,024 into the field, you get a 0 back.  When you get 0 in the precision field, the
      ccp pass passes this 0 to sext_hwi in hwint.h.  That function in turn generates
      a shift that is equal to the host wide int bit size, which is undefined behavior
      in C/C++ and machine dependent in practice.
      
            int shift = HOST_BITS_PER_WIDE_INT - prec;
            return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) src << shift)) >> shift;
      
      It turns out that x86_64, where I first ran my tests, leaves the original input
      unchanged by the two shifts, while the PowerPC always returns 0.  In the ccp pass, the
      original input is -1, and so it worked.  When I did the runs on the PowerPC, the
      result was 0, which ultimately led to the failure.
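      
      As a rough illustration (a standalone sketch, not GCC code; the struct and
      function names below are hypothetical), a 10-bit unsigned bit-field cannot
      hold 1024, and a shift-based sign extension has to reject or special-case
      precisions that would make the shift count reach the full word width:

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical stand-ins for a type node with a 10-bit and a 16-bit
         precision field.  These are not the real tree_type_common layout.  */
      struct narrow_type { unsigned precision : 10; };
      struct wide_type { unsigned precision : 16; };

      /* Store P in a 10-bit field and read it back.  Values are reduced
         modulo 1024, so 1024 reads back as 0.  */
      static unsigned
      store_narrow (unsigned p)
      {
        struct narrow_type t;
        t.precision = p;
        return t.precision;
      }

      /* Same with a 16-bit field, which is wide enough for 1024.  */
      static unsigned
      store_wide (unsigned p)
      {
        struct wide_type t;
        t.precision = p;
        return t.precision;
      }

      /* Sign-extend the low PREC bits of SRC.  PREC must be in 1..64; a PREC
         of 0 would need a shift by 64, which is undefined for a 64-bit type.
         Assumes arithmetic right shift of negative values, as GCC provides.  */
      static int64_t
      sext64 (int64_t src, unsigned prec)
      {
        assert (prec >= 1 && prec <= 64);
        if (prec == 64)
          return src;
        unsigned shift = 64 - prec;
        return (int64_t) ((uint64_t) src << shift) >> shift;
      }
      ```

      Under these assumptions, store_narrow (1024) reads back 0 while
      store_wide (1024) reads back 1024, and sext64 refuses a precision of 0
      instead of shifting by the full width.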
      
      2023-02-01  Richard Biener  <rguenther@suse.de>
      	    Michael Meissner  <meissner@linux.ibm.com>
      
      	PR middle-end/108623
      	* tree-core.h (tree_type_common): Bump up precision field to 16 bits.
      	Align bit fields > 1 bit to at least an 8-bit boundary.
      e2b993db
    • Bernhard Reutner-Fischer's avatar
      fortran: Fix coding style around free() · c93bde22
      Bernhard Reutner-Fischer authored
      Fix coding-style errors introduced in ca2f64d5
      
      gcc/fortran/ChangeLog:
      
      	* resolve.cc (resolve_select_type): Fix coding style.
      
      libgfortran/ChangeLog:
      
      	* caf/single.c (_gfortran_caf_register): Fix coding style.
      	* io/async.c (update_pdt, async_io): Likewise.
      	* io/format.c (free_format_data): Likewise.
      	* io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise.
      	* io/unix.c (mem_close): Likewise.
      c93bde22
    • Andrew Pinski's avatar
      PHIOPT: factor out unary operations instead of just conversions · 6d6c17e4
      Andrew Pinski authored
      After using factor_out_conditional_conversion with diamond bb,
      we should be able to use it for all normal unary gimple operations and not
      just conversions. This allows optimizing PR 59424, for example.
      This is also a start on optimizing PR 64700 and a few others.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      An example of this is:
      ```
      static inline unsigned long long g(int t)
      {
        unsigned t1 = t;
        return t1;
      }
      static int abs1(int a)
      {
        if (a < 0)
          a = -a;
        return a;
      }
      unsigned long long f(int c, int d, int e)
      {
        unsigned long long t;
        if (d > e)
          t = g(abs1(d));
        else
          t = g(abs1(e));
        return t;
      }
      ```
      
      Which should be optimized to:
        _9 = MAX_EXPR <d_5(D), e_6(D)>;
        _4 = ABS_EXPR <_9>;
        t_3 = (long long unsigned intD.16) _4;
      
      gcc/ChangeLog:
      
      	* tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ...
      	(factor_out_conditional_operation): This and add support for all unary
      	operations.
      	(pass_phiopt::execute): Update call to factor_out_conditional_conversion
      	to call factor_out_conditional_operation instead.
      
      	PR tree-optimization/109424
      	PR tree-optimization/59424
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/tree-ssa/abs-2.c: Update tree scan for
      	details change in wording.
      	* gcc.dg/tree-ssa/minmax-17.c: Likewise.
      	* gcc.dg/tree-ssa/pr103771.c: Likewise.
      	* gcc.dg/tree-ssa/minmax-18.c: New test.
      	* gcc.dg/tree-ssa/minmax-19.c: New test.
      6d6c17e4
    • Andrew Pinski's avatar
      PHIOPT: Loop over calling factor_out_conditional_conversion · 01f3e376
      Andrew Pinski authored
      After adding diamond shaped bb support to factor_out_conditional_conversion,
      we can get a case where we have two conversions that need to be factored out,
      after which another phiopt would happen.
      An example is:
      ```
      static inline unsigned long long g(int t)
      {
        unsigned t1 = t;
        return t1;
      }
      unsigned long long f(int c, int d, int e)
      {
        unsigned long long t;
        if (c > d)
          t = g(c);
        else
          t = g(d);
        return t;
      }
      ```
      In this case we should get a MAX_EXPR in phiopt1 with two casts.
      Before this patch, we would just factor out the outer cast and then
      wait till phiopt2 to factor out the inner cast.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      gcc/ChangeLog:
      
      	* tree-ssa-phiopt.cc (pass_phiopt::execute): Loop
      	over factor_out_conditional_conversion.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/tree-ssa/minmax-17.c: New test.
      01f3e376
    • Andrew Pinski's avatar
      PHIOPT: Add diamond bb form to factor_out_conditional_conversion · 69f1a8af
      Andrew Pinski authored
      The function factor_out_conditional_conversion already supports
      diamond shaped bb forms; it just needs to be called for them.

      harden-cond-comp.c needed to be changed as we would now optimize out the
      conversion, so the compare hardening no longer needs to
      split the block, which is what the test was checking. So change it such that there
      is no chance of this optimization.
      
      Also add two testcases that showed the improvement. PR 103771 is
      solved in ifconvert also for the vectorizer but now it is solved
      in a general sense.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      	PR tree-optimization/49959
      	PR tree-optimization/103771
      
      gcc/ChangeLog:
      
      	* tree-ssa-phiopt.cc (pass_phiopt::execute): Support
      	diamond shaped bb form for factor_out_conditional_conversion.
      
      gcc/testsuite/ChangeLog:
      
      	* c-c++-common/torture/harden-cond-comp.c: Change testcase
      	slightly to avoid the new phiopt optimization.
      	* gcc.dg/tree-ssa/abs-2.c: New test.
      	* gcc.dg/tree-ssa/pr103771.c: New test.
      69f1a8af
    • Juzhe-Zhong's avatar
      RISC-V: Fix ugly && incorrect codes of RVV auto-vectorization · bf839c15
      Juzhe-Zhong authored
      1. Add movmisalign pattern for TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
         targethook.  RISC-V currently claims support for this target hook, but it
         can't actually be supported without the movmisalign pattern.

      2. Remove global extern of get_mask_policy_no_pred && get_tail_policy_no_pred.
         These 2 functions come from the intrinsic builtin framework.
         We are sure we don't need them in the auto-vectorization implementation.

      3. Refine mask mode implementation.

      4. We should not have "riscv_vector_" in the riscv_vector namespace since it
         makes the code inconsistent and ugly.
      
         For example:
         Before this patch:
         static opt_machine_mode
         riscv_get_mask_mode (machine_mode mode)
         {
           machine_mode mask_mode = VOIDmode;
           if (TARGET_VECTOR && riscv_vector::riscv_vector_get_mask_mode (mode).exists (&mask_mode))
            return mask_mode;
         ..
      
         After this patch:
         riscv_get_mask_mode (machine_mode mode)
         {
           machine_mode mask_mode = VOIDmode;
           if (TARGET_VECTOR && riscv_vector::get_mask_mode (mode).exists (&mask_mode))
            return mask_mode;
         ..
      
      5. Fix failing testcase fixed-vlmax-1.c.
      
      gcc/ChangeLog:
      
      	* config/riscv/autovec.md (movmisalign<mode>): New pattern.
      	* config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): Delete.
      	(riscv_vector_get_mask_mode): Ditto.
      	(get_mask_policy_no_pred): Ditto.
      	(get_tail_policy_no_pred): Ditto.
      	(get_mask_mode): New function.
      	* config/riscv/riscv-v.cc (get_mask_policy_no_pred): Delete.
      	(get_tail_policy_no_pred): Ditto.
      	(riscv_vector_mask_mode_p): Ditto.
      	(riscv_vector_get_mask_mode): Ditto.
      	(get_mask_mode): New function.
      	* config/riscv/riscv-vector-builtins.cc (use_real_merge_p): Remove
      	global extern.
      	(get_tail_policy_for_pred): Ditto.
      	* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred): Ditto.
      	(get_mask_policy_for_pred): Ditto
      	* config/riscv/riscv.cc (riscv_get_mask_mode): Refine codes.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix typo.
      bf839c15
    • Kito Cheng's avatar
      RISC-V: Handle multi-lib path correctly for linux · 17d683d4
      Kito Cheng authored
      RISC-V Linux encodes the ABI into the path, so in theory we can only use the
      ABI to select multi-lib paths, and there is no way to use different multi-lib
      paths for `rv32i/ilp32` and `rv32ima/ilp32`; we map both to `/lib/ilp32`.

      It's hard to do that with GCC's builtin multi-lib selection mechanism: the builtin
      mechanism compares option strings and enumerates all possible reuse
      rules at build time. However, that is impossible for RISC-V; we have a huge
      number of combinations of `-march`, so implementing a customized multi-lib
      selection becomes the only solution.

      After this patch, the multi-lib configuration is only used to determine which
      ISA should be used when compiling the corresponding ABI variant.

      During the multi-lib selection stage, -mabi is the only key used to
      select the multi-lib path.
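      
      The idea of keying the path on the ABI alone can be sketched as follows.
      This is a toy model, not the code added to riscv-common.cc: the struct,
      the function names, and the "/lib/<abi>" layout (taken from the example
      above) are assumptions made for illustration only.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>

      /* Hypothetical pair of options describing one compilation.  */
      struct toy_opts
      {
        const char *march;
        const char *mabi;
      };

      /* Write the multi-lib directory for OPTS into BUF (of size LEN).
         Only the ABI is used as the key; -march is deliberately ignored.
         Returns what snprintf returns.  */
      static int
      toy_multilib_dir (const struct toy_opts *opts, char *buf, size_t len)
      {
        assert (opts != NULL && opts->mabi != NULL);
        return snprintf (buf, len, "/lib/%s", opts->mabi);
      }

      /* Return true if A and B would select the same multi-lib directory,
         that is, if they agree on the ABI regardless of -march.  */
      static bool
      toy_same_multilib (const struct toy_opts *a, const struct toy_opts *b)
      {
        return strcmp (a->mabi, b->mabi) == 0;
      }
      ```

      In this model `rv32i/ilp32` and `rv32ima/ilp32` land in the same
      directory because the architecture string never enters the lookup.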
      
      gcc/ChangeLog:
      
      	* common/config/riscv/riscv-common.cc (riscv_select_multilib_by_abi): New.
      	(riscv_select_multilib): New.
      	(riscv_compute_multilib): Extract logic to riscv_select_multilib and
      	also handle select_by_abi.
      	* config/riscv/elf.h (RISCV_USE_CUSTOMISED_MULTI_LIB): Change it
      	to select_by_abi_arch_cmodel from 1.
      	* config/riscv/linux.h (RISCV_USE_CUSTOMISED_MULTI_LIB): Define.
      	* config/riscv/riscv-opts.h (enum riscv_multilib_select_kind): New.
      17d683d4
    • Alexander Monakov's avatar
      Makefile.in: clean up match.pd-related dependencies · 31c70a7d
      Alexander Monakov authored
      Clean up confusing changes from the recent refactoring for
      parallel match.pd build.
      
      gimple-match-head.o is not built. Remove related flags adjustment.
      
      Autogenerated gimple-match-N.o files do not depend on
      gimple-match-exports.cc.
      
      {gimple,generic}-match-auto.h only depend on the prerequisites of the
      corresponding s-{gimple,generic}-match stamp file, not any .cc file.
      
      gcc/ChangeLog:
      
      	* Makefile.in: (gimple-match-head.o-warn): Remove.
      	(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
      	gimple-match-exports.cc.
      	(gimple-match-auto.h): Only depend on s-gimple-match.
      	(generic-match-auto.h): Likewise.
      31c70a7d
    • Andrew Pinski's avatar
      Move substitute_and_fold over to use simple_dce_from_worklist · 21e2ef2d
      Andrew Pinski authored
      While looking into a different issue, I noticed that it
      would take until the second forwprop pass to do some
      forward propagation, and it was because the ssa name was
      used more than once but the second statement was
      "dead" and we don't remove that until much later.

      So this uses simple_dce_from_worklist instead of manually
      removing the known unused statements.
      The propagate engine does not do a cleanupcfg afterwards either but manually
      cleans up possible EH edges, so simple_dce_from_worklist
      needs to communicate that back to the propagate engine.
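      
      The general shape of a worklist-driven dead code removal can be sketched
      on a toy IR.  This is only a model under stated assumptions, not the GCC
      implementation: the toy_stmt struct, the use-count array, and toy_dce are
      hypothetical, and EH cleanup is not modelled at all.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      #define MAX_OPS 2
      #define NO_NAME (-1)

      /* Toy statement: statement I defines name I from up to two operand
         names.  Unused operand slots hold NO_NAME.  */
      struct toy_stmt
      {
        int ops[MAX_OPS];
        bool side_effects;
        bool removed;
      };

      /* Remove definitions of names on WORKLIST that have no remaining uses
         and no side effects.  Removing a statement drops one use from each
         of its operands; an operand whose use count reaches zero is pushed
         so it is examined in turn.  USES[I] is the live use count of name I.
         WORKLIST must have room for WL_LEN + N entries.  Returns the number
         of statements removed.  */
      static int
      toy_dce (struct toy_stmt *stmts, int *uses, int n,
               int *worklist, int wl_len)
      {
        int removed = 0;
        while (wl_len > 0)
          {
            int name = worklist[--wl_len];
            assert (name >= 0 && name < n);
            struct toy_stmt *s = &stmts[name];
            if (s->removed || s->side_effects || uses[name] != 0)
              continue;
            s->removed = true;
            removed++;
            for (int i = 0; i < MAX_OPS; i++)
              {
                int op = s->ops[i];
                if (op == NO_NAME)
                  continue;
                if (--uses[op] == 0)
                  worklist[wl_len++] = op;
              }
          }
        return removed;
      }
      ```

      In this model, a name with one live use and one dead use ends up with a
      single use once the dead statement is removed, which is the situation
      that lets a later single-use propagation fire earlier.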
      
      Some testcases needed to be updated/changed because of better optimization.
      gcc.dg/pr81192.c even had to be changed to use the gimple FE so it would
      be less fragile in the future too.
      gcc.dg/tree-ssa/pr98737-1.c was failing because __atomic_fetch_ was being matched
      but in those cases, the result was not being used so both __atomic_fetch_ and
      __atomic_x_and_fetch_ are valid choices and would not make a code generation difference.
      evrp7.c, evrp8.c, vrp35.c, vrp36.c: just needed a slight change as the removal message
      is slightly different.
      kernels-alias-8.c: ccp1 is able to remove an unused load which causes ealias to have
      one less load to analyze, so update the expected scan count.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/109691
      	* tree-ssa-dce.cc (simple_dce_from_worklist): Add need_eh_cleanup
      	argument.
      	If the removed statement can throw, have need_eh_cleanup
      	include the bb of that statement.
      	* tree-ssa-dce.h (simple_dce_from_worklist): Update declaration.
      	* tree-ssa-propagate.cc (struct prop_stats_d): Remove
      	num_dce.
      	(substitute_and_fold_dom_walker::substitute_and_fold_dom_walker):
      	Initialize dceworklist instead of stmts_to_remove.
      	(substitute_and_fold_dom_walker::~substitute_and_fold_dom_walker):
      	Destroy dceworklist instead of stmts_to_remove.
      	(substitute_and_fold_dom_walker::before_dom_children):
      	Set dceworklist instead of adding to stmts_to_remove.
      	(substitute_and_fold_engine::substitute_and_fold):
      	Call simple_dce_from_worklist instead of popping
      	from the list.
      	Don't update the stat on removal statements.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/tree-ssa/evrp7.c: Update for output change.
      	* gcc.dg/tree-ssa/evrp8.c: Likewise.
      	* gcc.dg/tree-ssa/vrp35.c: Likewise.
      	* gcc.dg/tree-ssa/vrp36.c: Likewise.
      	* gcc.dg/tree-ssa/pr98737-1.c: Update scan-tree-dump-not
      	to check for assignment too instead of just a call.
      	* c-c++-common/goacc/kernels-alias-8.c: Update test
      	for removal of load.
      	* gcc.dg/pr81192.c: Rewrite testcase in gimple based test.
      21e2ef2d
    • Bernhard Reutner-Fischer's avatar
      fortran: Remove conditionals around free() · ca2f64d5
      Bernhard Reutner-Fischer authored
      gcc/fortran/ChangeLog:
      
      	* resolve.cc (resolve_select_type): Call free() unconditionally.
      
      libgfortran/ChangeLog:
      
      	* caf/single.c (_gfortran_caf_register): Call free() unconditionally.
      	* io/async.c (update_pdt, async_io): Likewise.
      	* io/format.c (free_format_data): Likewise.
      	* io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise.
      	* io/unix.c (mem_close): Likewise.
      ca2f64d5
    • Bernhard Reutner-Fischer's avatar
      Fortran: Fix mpz and mpfr memory leaks [PR fortran/68800] · 2521390d
      Bernhard Reutner-Fischer authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/68800
      	* expr.cc (find_array_section): Fix mpz memory leak.
      	* simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in
      	error paths.
      2521390d
    • Jerry DeLisle's avatar
      Fortran: Reject semicolon after namelist name. · d46b3db4
      Jerry DeLisle authored
      	PR fortran/109662
      
      libgfortran/ChangeLog:
      
      	* io/list_read.c: Add check for a semicolon after a namelist
      	name in read input. Issue a runtime error message.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/pr109662-a.f90: New test.
      d46b3db4
    • GCC Administrator's avatar
      Daily bump. · 70d03823
      GCC Administrator authored
      70d03823
  3. May 07, 2023