  1. Nov 29, 2021
    • Make etags path used by build system configurable · 909b30a1
      Eric Gallager authored
      This commit allows users to specify a path to their "etags"
      executable for use when doing "make tags".
      I based this patch off of this one from upstream automake:
      https://git.savannah.gnu.org/cgit/automake.git/commit/m4?id=d2ccbd7eb38d6a4277d6f42b994eb5a29b1edf29
      This means that I just supplied variables that the user can override
      for the tags programs, rather than having the configure scripts
      actually check for them. I handle etags and ctags separately because
      the intl subdirectory has separate targets for them. This commit
      only affects the subdirectories that use handwritten Makefiles; the
      ones that use automake will have to wait until we update the version
      of automake in use to 1.16.4 or newer before they'll be fixed.
      
      Addresses #103021
      
      gcc/ChangeLog:
      
      	PR other/103021
      	* Makefile.in: Substitute CTAGS, ETAGS, and CSCOPE
      	variables. Use ETAGS variable in TAGS target.
      	* configure: Regenerate.
      	* configure.ac: Allow CTAGS, ETAGS, and CSCOPE
      	variables to be overridden.
      
      gcc/ada/ChangeLog:
      
      	PR other/103021
      	* gcc-interface/Make-lang.in: Use ETAGS variable in
      	TAGS target.
      
      gcc/c/ChangeLog:
      
      	PR other/103021
      	* Make-lang.in: Use ETAGS variable in TAGS target.
      
      gcc/cp/ChangeLog:
      
      	PR other/103021
      	* Make-lang.in: Use ETAGS variable in TAGS target.
      
      gcc/d/ChangeLog:
      
      	PR other/103021
      	* Make-lang.in: Use ETAGS variable in TAGS target.
      
      gcc/fortran/ChangeLog:
      
      	PR other/103021
      	* Make-lang.in: Use ETAGS variable in TAGS target.
      
      gcc/go/ChangeLog:
      
      	PR other/103021
      	* Make-lang.in: Use ETAGS variable in TAGS target.
      
      gcc/objc/ChangeLog:
      
      	PR other/103021
      	* Make-lang.in: Use ETAGS variable in TAGS target.
      
      gcc/objcp/ChangeLog:
      
      	PR other/103021
      	* Make-lang.in: Use ETAGS variable in TAGS target.
      
      intl/ChangeLog:
      
      	PR other/103021
      	* Makefile.in: Use ETAGS variable in TAGS target,
      	CTAGS variable in CTAGS target, and MKID variable
      	in ID target.
      	* configure: Regenerate.
      	* configure.ac: Allow CTAGS, ETAGS, and MKID
      	variables to be overridden.
      
      libcpp/ChangeLog:
      
      	PR other/103021
      	* Makefile.in: Use ETAGS variable in TAGS target.
      	* configure: Regenerate.
      	* configure.ac: Allow ETAGS variable to be overridden.
      
      libiberty/ChangeLog:
      
      	PR other/103021
      	* Makefile.in: Use ETAGS variable in TAGS target.
      	* configure: Regenerate.
      	* configure.ac: Allow ETAGS variable to be overridden.
      909b30a1
    • rs6000: Add Power10 optimization for most _mm_movemask* · 85289ba3
      Paul A. Clarke authored
      Power10 ISA added `vextract*` instructions which are realized in the
      `vec_extractm` intrinsic.
      
      Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
      `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
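
      For reference, the value these intrinsics compute can be modelled in
      portable scalar code.  The sketch below covers `_mm_movemask_epi8`
      only; `movemask_epi8_model` is a hypothetical helper, not code from
      the rs6000 headers, and it does not use `vec_extractm`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model (hypothetical helper): bit i of the result is the most
// significant bit of byte i of the 16-byte input vector.
inline int movemask_epi8_model(const std::array<std::uint8_t, 16>& bytes) {
  int mask = 0;
  for (std::size_t i = 0; i < bytes.size(); ++i)
    mask |= ((bytes[i] >> 7) & 1) << i;
  return mask;
}
```

      A model like this can serve as an oracle when checking a vectorized
      implementation on any target.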
      
      2021-11-29  Paul A. Clarke  <pc@us.ibm.com>
      
      gcc
      	* config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
      	when _ARCH_PWR10.
      	* config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
      	(_mm_movemask_epi8): Likewise.
      85289ba3
    • Fix RTL FE issue with premature return · e2194a8b
      Richard Biener authored
      This fixes an issue discovered by -Wunreachable-code-return
      
      2021-11-29  Richard Biener  <rguenther@suse.de>
      
      	* read-rtl-function.c (function_reader::read_rtx_operand):
      	Return only after resetting m_in_call_function_usage.
      e2194a8b
    • c++: redundant explicit 'this' capture before C++20 [PR100493] · 1420ff3e
      Patrick Palka authored
      As described in detail in the PR, in C++20 implicitly capturing 'this'
      via a '=' capture default is deprecated, and in C++17 adding an explicit
      'this' capture alongside a '=' capture default is diagnosed as redundant
      (and is strictly speaking ill-formed).  This means it's impossible to
      write, in a forward-compatible way, a C++17 lambda that has a '=' capture
      default and that also captures 'this' (implicitly or explicitly):
      
        [=] { this; }      // #1 deprecated in C++20, OK in C++17
      		     // GCC issues a -Wdeprecated warning in C++20 mode
      
        [=, this] { }      // #2 ill-formed in C++17, OK in C++20
      		     // GCC issues an unconditional warning in C++17 mode
      
      This patch resolves this dilemma by downgrading the warning for #2 into
      a -pedantic one.  In passing, move it into the -Wc++20-extensions class
      of warnings and adjust its wording accordingly.
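
      Independent of this patch, code that has to compile cleanly as both
      C++17 and C++20 can avoid the capture default and name every capture
      explicitly.  A minimal sketch; `Counter` and `make_adder` are
      hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <functional>

struct Counter {
  int value = 0;

  // No '=' capture default is used, so neither the C++20 deprecation (#1)
  // nor the C++17 redundancy diagnostic (#2) applies to this lambda.
  std::function<int()> make_adder(int step) {
    return [this, step] { return value += step; };
  }
};
```

      The cost is that every captured variable has to be listed by hand,
      which is why the relaxed diagnostic for #2 is still useful.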
      
      	PR c++/100493
      
      gcc/cp/ChangeLog:
      
      	* parser.c (cp_parser_lambda_introducer): In C++17, don't
      	diagnose a redundant 'this' capture alongside a by-copy
      	capture default unless -pedantic.  Move the diagnostic into
      	-Wc++20-extensions and adjust wording accordingly.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp1z/lambda-this1.C: Adjust expected diagnostics.
      	* g++.dg/cpp1z/lambda-this8.C: New test.
      	* g++.dg/cpp2a/lambda-this3.C: Compile with -pedantic in C++17
      	to continue to diagnose redundant 'this' captures.
      1420ff3e
    • x86_64: Improved V1TImode rotations by non-constant amounts. · a5d269f0
      Roger Sayle authored
      This patch builds on the recent improvements to TImode rotations (and
      Jakub's fixes to shldq/shrdq patterns).  Now that expanding a TImode
      rotation can never fail, it is safe to allow general_operand constraints
      on the QImode shift amounts in rotlv1ti3 and rotrv1ti3 patterns.
      I've also made an additional tweak to ix86_expand_v1ti_to_ti to use
      vec_extract via V2DImode, which avoids using memory and takes advantage
      of vpextrq on recent hardware.
      
      For the following test case:
      
      typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
      uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
      
      GCC with -O2 -mavx2 would previously generate:
      
      rotr:   vmovdqa %xmm0, -24(%rsp)
              movq    -16(%rsp), %rdx
              movl    %edi, %ecx
              xorl    %esi, %esi
              movq    -24(%rsp), %rax
              shrdq   %rdx, %rax
              shrq    %cl, %rdx
              testb   $64, %dil
              cmovne  %rdx, %rax
              cmovne  %rsi, %rdx
              negl    %ecx
              xorl    %edi, %edi
              andl    $127, %ecx
              vmovq   %rax, %xmm2
              movq    -24(%rsp), %rax
              vpinsrq $1, %rdx, %xmm2, %xmm1
              movq    -16(%rsp), %rdx
              shldq   %rax, %rdx
              salq    %cl, %rax
              testb   $64, %cl
              cmovne  %rax, %rdx
              cmovne  %rdi, %rax
              vmovq   %rax, %xmm3
              vpinsrq $1, %rdx, %xmm3, %xmm0
              vpor    %xmm1, %xmm0, %xmm0
              ret
      
      With this patch, we now generate:
      
      rotr:	movl    %edi, %ecx
              vpextrq $1, %xmm0, %rax
              vmovq   %xmm0, %rdx
              shrdq   %rax, %rdx
              vmovq   %xmm0, %rsi
              shrdq   %rsi, %rax
              andl    $64, %ecx
              movq    %rdx, %rsi
              cmovne  %rax, %rsi
              cmove   %rax, %rdx
              vmovq   %rsi, %xmm0
              vpinsrq $1, %rdx, %xmm0, %xmm0
              ret
      
      2021-11-29  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Perform the
      	conversion via V2DImode using vec_extractv2didi on TARGET_SSE2.
      	* config/i386/sse.md (rotlv1ti3, rotrv1ti3): Change constraint
      	on QImode shift amounts from const_int_operand to general_operand.
      
      gcc/testsuite/ChangeLog
      	* gcc.target/i386/sse2-v1ti-rotate.c: New test case.
      a5d269f0
    • Remove unreachable gcc_unreachable () at the end of functions · a3b31fe3
      Richard Biener authored
      It seems to be a style to place gcc_unreachable () after a
      switch that handles all cases with every case returning.
      Those are unreachable (well, yes!), so they will be elided
      at CFG construction time and the middle-end will place
      another __builtin_unreachable "after" them to note the
      path doesn't lead to a return when the function is not declared
      void.
      
      So IMHO those explicit gcc_unreachable () calls serve no purpose;
      if anything, they could be replaced by a comment.  Since the switches
      cover all cases, a switch not handling a case or a case not returning
      will likely cause some diagnostic to be emitted, which is better
      than running into an ICE only at runtime.
      
      2021-11-24  Richard Biener  <rguenther@suse.de>
      
      	* tree.h (reverse_storage_order_for_component_p): Remove
      	spurious gcc_unreachable.
      	* cfganal.c (dfs_find_deadend): Likewise.
      	* fold-const-call.c (fold_const_logb): Likewise.
      	(fold_const_significand): Likewise.
      	* gimple-ssa-store-merging.c (lhs_valid_for_store_merging_p):
      	Likewise.
      
      gcc/c-family/
      	* c-format.c (check_format_string): Remove spurious
      	gcc_unreachable.
      a3b31fe3
    • Remove unreachable returns · 16507dea
      Richard Biener authored
      This removes unreachable return statements as diagnosed by
      the -Wunreachable-code patch.  Some cases are more obviously
      an improvement than others - in fact some may give you the idea
      to replace them with gcc_unreachable () instead, leading to
      cases of the 'Remove unreachable gcc_unreachable () at the end
      of functions' patch.
      
      2021-11-25  Richard Biener  <rguenther@suse.de>
      
      	* vec.c (qsort_chk): Do not return the void return value
      	from the noreturn qsort_chk_error.
      	* ccmp.c (expand_ccmp_expr_1): Remove unreachable return.
      	* df-scan.c (df_ref_equal_p): Likewise.
      	* dwarf2out.c (is_base_type): Likewise.
      	(add_const_value_attribute): Likewise.
      	* fixed-value.c (fixed_arithmetic): Likewise.
      	* gimple-fold.c (gimple_fold_builtin_fputs): Likewise.
      	* gimple-ssa-strength-reduction.c (stmt_cost): Likewise.
      	* graphite-isl-ast-to-gimple.c
      	(gcc_expression_from_isl_expr_op): Likewise.
      	(gcc_expression_from_isl_expression): Likewise.
      	* ipa-fnsummary.c (will_be_nonconstant_expr_predicate):
      	Likewise.
      	* lto-streamer-in.c (lto_input_mode_table): Likewise.
      
      gcc/c-family/
      	* c-opts.c (c_common_post_options): Remove unreachable return.
      	* c-pragma.c (handle_pragma_target): Likewise.
      	(handle_pragma_optimize): Likewise.
      
      gcc/c/
      	* c-typeck.c (c_tree_equal): Remove unreachable return.
      	* c-parser.c (get_matching_symbol): Likewise.
      
      libgomp/
      	* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable
      	return.
      16507dea
    • Optimize _Float16 usage for non AVX512FP16. · 11d0a2af
      liuhongt authored
      1. No memory is needed to move HI/HFmode between GPR and SSE registers
      under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o
      AVX512FP16.
      2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace
      ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant
      initialization could be eliminated.
      
      gcc/ChangeLog:
      
      	PR target/102811
      	* config/i386/i386.c (inline_secondary_memory_needed): HImode
      	move between GPR and SSE registers is supported under
      	TARGET_SSE2 and above.
      	* config/i386/i386.md (extendhfsf2): Optimize expander.
      	(truncsfhf2): Ditto.
      	* config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to
      	align with V8HImode.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr102811-2.c: New test.
      	* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new
      	scan-assembler-times.
      11d0a2af
    • Fix regression introduced by r12-5536. · 9519b694
      liuhongt authored
      There are several failures:
      1. Unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)";
      %vpextrw should be used in output templates.
      2. ICE in get_attr_memory for movhi_internal, since some alternatives
      are marked as TYPE_SSELOG; use TYPE_SSELOG1 instead.
      
      This patch also fixes a typo and some latent bugs related to
      moving HImode from/to SSE registers without TARGET_AVX512FP16.
      
      gcc/ChangeLog:
      
      	PR target/102811
      	PR target/103463
      	* config/i386/i386.c (ix86_secondary_reload): Without
      	TARGET_SSE4_1, General register is needed to move HImode from
      	sse register to memory.
      	* config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of
      	pextrw in output templates.
      	* config/i386/i386.md (movhi_internal): Ditto, also fix typo of
      	MEM_P (operands[1]) and adjust mode/prefix/type attribute for
      	alternatives related to sse register.
      9519b694
    • tree-optimization/103458 - avoid creating new loops in CD-DCE · 85e91ad5
      Richard Biener authored
      When creating forwarders in CD-DCE we have to avoid creating loops
      where we formerly did not consider those because of abnormal
      predecessors.  At this point simply bail out when there are any
      abnormal predecessors.
      
      2021-11-29  Richard Biener  <rguenther@suse.de>
      
      	PR tree-optimization/103458
      	* tree-ssa-dce.c (make_forwarders_with_degenerate_phis): Do not
      	create forwarders for blocks with abnormal predecessors.
      
      	* gcc.dg/torture/pr103458.c: New testcase.
      85e91ad5
    • Restore can_be_invalidated_p semantics to before refactoring · 5e5f880d
      Richard Biener authored
      This restores the semantics of can_be_invalidated_p to those of the
      original function in tree-ssa-uninit.c from which it was split out.
      The current semantics only ever look at the first predicate, which
      cannot be correct.
      
      2021-11-26  Richard Biener  <rguenther@suse.de>
      
      	* gimple-predicate-analysis.cc (can_be_invalidated_p):
      	Restore semantics to the one before the split from
      	tree-ssa-uninit.c.
      5e5f880d
    • libgcc: remove crt{begin,end}.o from powerpc-wrs-vxworks target · 3e15df63
      Rasmus Villemoes authored
      Since commit 78e49fb1 (Introduce vxworks specific crtstuff support),
      the generic crtbegin.o/crtend.o have been unnecessary to build. So
      remove them from extra_parts.
      
      This is effectively a revert of commit 9a5b8df7 (libgcc: add
      crt{begin,end} for powerpc-wrs-vxworks target).
      
      libgcc/
      	* config.host (powerpc-wrs-vxworks): Do not add crtbegin.o and
      	crtend.o to extra_parts.
      3e15df63
    • rs6000/test: Add emulated gather test case · 300dbea1
      Kewen Lin authored
      As verified, the emulated gather capability of vectorizer
      (r12-2733) can help to speed up SPEC2017 510.parest_r on
      Power8/9/10 by 5% ~ 9% with option sets Ofast unroll and
      Ofast lto.
      
      This patch is to add a test case similar to the one in i386
      to add testing coverage for 510.parest_r hotspots.
      
      btw, different from the one in i386, this uses unsigned int
      as INDEXTYPE since the unpack support for unsigned int
      (r12-3134) also matters for the hotspots vectorization.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/powerpc/vect-gather-1.c: New test.
      300dbea1
    • Fix PR 19089: Environment variable TMP may yield gcc: abort · 68332ab7
      Andrew Pinski authored
      Even though I cannot reproduce the ICE any more, this is still
      a bug. We check already to see if we can access the directory
      but never check to see if the path is actually a directory.
      
      This adds the check and now we reject the file as not usable
      as a tmp directory.
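
      The combined check can be sketched as follows on a POSIX system.
      `usable_tmp_dir` is a hypothetical stand-in, not the actual `try_dir`
      code from make-temp-file.c, and the exact access flags are an
      assumption:

```cpp
#include <cassert>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: accept `path` only if it is accessible for reading,
// writing and searching AND it really is a directory.
inline bool usable_tmp_dir(const char* path) {
  if (path == nullptr || access(path, R_OK | W_OK | X_OK) != 0)
    return false;  // missing or not accessible
  struct stat st;
  if (stat(path, &st) != 0)
    return false;  // cannot inspect the path
  return S_ISDIR(st.st_mode);  // reject regular files, devices, etc.
}
```

      With only the access check, a TMP value that names an accessible
      regular file would be accepted; the S_ISDIR test rejects it.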
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      libiberty/ChangeLog:
      
      	* make-temp-file.c (try_dir): Check to see if the dir
      	is actually a directory.
      68332ab7
    • Daily bump. · 2f0dd172
      GCC Administrator authored
      2f0dd172
  2. Nov 28, 2021
    • Fix PR 62157: distclean in libsanitizer not working · 32377c10
      Andrew Pinski authored
      What is happening is that DIST_SUBDIRS contains the conditional
      directories, which is wrong, so we need to force DIST_SUBDIRS
      to be the same as SUBDIRS, as recommended by the automake manual.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      Also now make distclean works inside libsanitizer directory.
      
      libsanitizer/ChangeLog:
      
      	PR sanitizer/62157
      	* Makefile.am: Force DIST_SUBDIRS to be SUBDIRS.
      	* Makefile.in: Regenerate.
      	* asan/Makefile.in: Likewise.
      	* hwasan/Makefile.in: Likewise.
      	* interception/Makefile.in: Likewise.
      	* libbacktrace/Makefile.in: Likewise.
      	* lsan/Makefile.in: Likewise.
      	* sanitizer_common/Makefile.in: Likewise.
      	* tsan/Makefile.in: Likewise.
      	* ubsan/Makefile.in: Likewise.
      32377c10
    • Compare guessed and feedback frequencies during profile feedback stream-in · 2899d49e
      Jan Hubicka authored
      This patch adds simple code to dump and compare frequencies of basic blocks
      read from the profile feedback and frequencies guessed statically.
      It dumps basic blocks in the order of decreasing frequencies from feedback
      along with guessed frequencies and histograms.
      
      It makes it possible to spot basic blocks in hot regions that are considered
      cold by the guessed profile or vice versa.
      
      I am trying to figure out how realistic our profile estimate is compared to
      the read one on exchange2 (looking again into PR98782).  There, IRA now places
      spills into hot regions of code, while with the older (and worse) profile it
      did not.  The catch is that the function is very large and has 9 nested loops,
      so it is hard to figure out how to improve the profile estimate and/or IRA.
      
      gcc/ChangeLog:
      
      2021-11-28  Jan Hubicka  <hubicka@ucw.cz>
      
      	* profile.c: Include sreal.h
      	(struct bb_stats): New.
      	(cmp_stats): New function.
      	(compute_branch_probabilities): Output bb stats.
      2899d49e
    • Improve -fprofile-report · d1471457
      Jan Hubicka authored
      Profile-report was never properly updated after switch to new profile
      representation.  This patch fixes the way profile mismatches are calculated:
      we used to collect separately count and freq mismatches, while now we have
      only counts & probabilities.  So we verify
       - in count: that total count of incoming edges is close to actual count of
         the BB
       - out prob: that total sum of outgoing edge probabilities is close
         to 1 (except for BB containing noreturn calls or EH).
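
      The two checks can be illustrated with a simplified model.  The types,
      field names and tolerances below are hypothetical and are not GCC's
      actual CFG structures or thresholds:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified CFG types for illustration only.
struct Edge {
  double count;        // execution count carried by the edge
  double probability;  // probability of taking the edge from its source
};

struct Block {
  double count = 0.0;
  std::vector<Edge> preds;          // incoming edges
  std::vector<Edge> succs;          // outgoing edges
  bool has_noreturn_or_eh = false;  // exempt from the out-probability check
};

// "in count": incoming edge counts should sum to roughly the block count.
inline bool in_count_consistent(const Block& bb, double rel_tol) {
  double sum = 0.0;
  for (const Edge& e : bb.preds) sum += e.count;
  return std::fabs(sum - bb.count) <= rel_tol * std::max(bb.count, 1.0);
}

// "out prob": outgoing edge probabilities should sum to roughly 1.
inline bool out_prob_consistent(const Block& bb, double abs_tol) {
  if (bb.has_noreturn_or_eh) return true;
  double sum = 0.0;
  for (const Edge& e : bb.succs) sum += e.probability;
  return std::fabs(sum - 1.0) <= abs_tol;
}
```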
      
      Moreover I added dumping of absolute data which is useful to plot them: with
      Martin Liska we plan to set up regular testing so we keep the optimizers'
      profile updates a bit under control.
      
      Finally I added both static and dynamic stats about mismatches - static one is
      simply number of inconsistencies in the cfg while dynamic is scaled by the
      profile - I think in order to keep an eye on the optimizers the first number
      is quite relevant, while when tracking why code quality regressed the second
      number matters more.
      
      2021-11-28  Jan Hubicka  <hubicka@ucw.cz>
      
      	* cfghooks.c: Include sreal.h, profile.h.
      	(profile_record_check_consistency): Fix checking of count consistency;
      	record also dynamic mismatches.
      	* cfgrtl.c (rtl_account_profile_record): Similarly.
      	* tree-cfg.c (gimple_account_profile_record): Likewise.
      	* cfghooks.h (struct profile_record): Remove num_mismatched_freq_in,
      	num_mismatched_freq_out, turn time to double, add
      	dyn_mismatched_prob_out, dyn_mismatched_count_in,
      	num_mismatched_prob_out; remove num_mismatched_count_out.
      	* passes.c (account_profile_1): New function.
      	(account_profile_in_list): New function.
      	(pass_manager::dump_profile_report): Rewrite.
      	(execute_one_ipa_transform_pass): Check profile consistency after
      	running all passes.
      	(execute_all_ipa_transforms): Remove cfun test; record all transform
      	methods.
      	(execute_one_pass): Fix collecting of profile stats.
      d1471457
    • libstdc++: Implement std::byteswap for C++23 · 7393fa8b
      Jakub Jelinek authored
      This patch attempts to implement P1272R4 (except for the std::bit_cast
      changes in there which seem quite unrelated to this and will need to be
      fixed on the compiler side).
      While at least for GCC __builtin_bswap{16,32,64,128} should work fine
      in constant expressions, I wonder about other compilers, so I'm using
      a fallback implementation for constexpr evaluation always.
      If you think that is unnecessary, I can drop the
      __cpp_if_consteval >= 202106L &&
      if !consteval
        {
      and
        }
      and reformat.
      The fallback implementation is an attempt to make it work even for integral
      types that don't have number of bytes divisible by 2 or when __CHAR_BIT__
      is e.g. 16.
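
      A shift-based fallback of this general kind can be sketched as below.
      This is not the libstdc++ implementation; `byteswap_sketch` is a
      hypothetical name, and the sketch is limited to unsigned types no
      wider than unsigned long long:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical fallback: reverse the bytes of an unsigned integer using only
// shifts and masks, so it is usable in constant expressions and does not
// assume CHAR_BIT == 8 or an even number of bytes.
template <typename T>
constexpr T byteswap_sketch(T value) {
  static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
  static_assert(sizeof(T) <= sizeof(unsigned long long), "type too wide");
  const unsigned long long byte_mask = (1ULL << CHAR_BIT) - 1;
  unsigned long long in = value;
  unsigned long long out = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    out = (out << CHAR_BIT) | (in & byte_mask);  // append lowest input byte
    in >>= CHAR_BIT;                             // move to the next byte
  }
  return static_cast<T>(out);
}

static_assert(byteswap_sketch<std::uint16_t>(0x1234u) == 0x3412u,
              "usable in constant expressions");
```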
      
      2021-11-28  Jakub Jelinek  <jakub@redhat.com>
      
      	* include/std/bit (__cpp_lib_byteswap, byteswap): Define.
      	* include/std/version (__cpp_lib_byteswap): Define.
      	* testsuite/26_numerics/bit/bit.byteswap/byteswap.cc: New test.
      	* testsuite/26_numerics/bit/bit.byteswap/version.cc: New test.
      7393fa8b
    • d: fix thinko in optimize attr parsing · 7a66c490
      Martin Liska authored
      gcc/d/ChangeLog:
      
      	* d-attribs.cc (parse_optimize_options): Fix thinko.
      7a66c490
    • Daily bump. · d62c8c74
      GCC Administrator authored
      d62c8c74
  3. Nov 27, 2021
    • Fix typo in t-dimode · 14dd0921
      John David Anglin authored
      2021-11-27  John David Anglin  <danglin@gcc.gnu.org>
      
      libgcc/ChangeLog:
      
      	* config/pa/t-dimode (lib2difuncs): Fix typo.
      14dd0921
    • jit: Change printf specifiers for size_t to %zu · 1e534084
      Petter Tomner authored
      Change four occurrences of the %ld specifier for size_t to %zu for clean 32-bit builds.
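
      The reason is that %ld expects a long, while size_t is typically a
      32-bit unsigned type on 32-bit targets, so the argument type does not
      match the format.  %zu is the standard conversion for size_t.  A small
      illustration, with `format_count` as a hypothetical helper that is not
      part of libgccjit:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a size_t portably with the 'z' length modifier.
inline std::string format_count(std::size_t n) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%zu", n);
  return std::string(buf);
}
```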
      
      Signed-off-by
      2021-11-27	Petter Tomner	<tomner@kth.se>
      
      gcc/jit/
      	* libgccjit.c: %ld -> %zu
      1e534084
    • x86: Fix up x86_{,64_}sh{l,r}d patterns [PR103431] · f7e4f57f
      Jakub Jelinek authored
      The following testcase is miscompiled because the x86_{,64_}sh{l,r}d
      patterns don't properly describe what the instructions do.  One thing
      is left out, in particular that there is initial count &= 63 for
      sh{l,r}dq and initial count &= 31 for sh{l,r}d{l,w}.  And another thing
      not described properly, in particular the behavior when count (after the
      masking) is 0.  The pattern says it is e.g.
      res = (op0 << op2) | (op1 >> (64 - op2))
      but that triggers UB on op1 >> 64.  For op2 0 we actually want
      res = (op0 << op2) | 0
      When constants are propagated to these patterns during RTL optimizations,
      both such problems trigger wrong-code issues.
      This patch represents the patterns as e.g.
      res = (op0 << (op2 & 63)) | (unsigned long long) ((uint128_t) op1 >> (64 - (op2 & 63)))
      so that the initial masking is described and op2 == 0 results in
      zero being ORed in.
      The patch introduces alternate patterns for constant op2 where
      simplify-rtx.c will fold those expressions into simple numbers,
      and define_insn_and_split pre-reload splitter for how the patterns
      looked before into the new form, so that it can pattern match during
      combine even computations that assumed the shift amount will be in
      the range of 1 .. bitsize-1.
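
      The intended semantics of the 64-bit shld case can be modelled in
      plain C++ as follows.  This is only a scalar sketch of the behavior
      described above, with `shld64_model` as a hypothetical name; it is not
      the RTL pattern from i386.md:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of 64-bit SHLD: shift `dst` left by `count` and fill the
// vacated low bits from the high bits of `src`.
inline std::uint64_t shld64_model(std::uint64_t dst, std::uint64_t src,
                                  unsigned count) {
  count &= 63;     // the count is masked first
  if (count == 0)  // nothing is shifted in; also avoids src >> 64 (UB)
    return dst;
  return (dst << count) | (src >> (64 - count));
}
```

      The explicit zero test plays the role of the widening to uint128_t in
      the pattern: both make the contribution of op1 vanish when the masked
      count is zero.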
      
      2021-11-27  Jakub Jelinek  <jakub@redhat.com>
      
      	PR middle-end/103431
      	* config/i386/i386.md (x86_64_shld, x86_shld, x86_64_shrd, x86_shrd):
      	Change insn pattern to accurately describe the instructions.
      	(*x86_64_shld_1, *x86_shld_1, *x86_64_shrd_1, *x86_shrd_1): New
      	define_insn patterns.
      	(*x86_64_shld_2, *x86_shld_2, *x86_64_shrd_2, *x86_shrd_2): New
      	define_insn_and_split patterns.
      	(*ashl<dwi>3_doubleword_mask, *ashl<dwi>3_doubleword_mask_1,
      	*<insn><dwi>3_doubleword_mask, *<insn><dwi>3_doubleword_mask_1,
      	ix86_rotl<dwi>3_doubleword, ix86_rotr<dwi>3_doubleword): Adjust
      	splitters for x86_{,64_}sh{l,r}d pattern changes.
      
      	* gcc.dg/pr103431.c: New test.
      f7e4f57f
    • bswap: Fix UB in find_bswap_or_nop_finalize [PR103435] · 567d5f3d
      Jakub Jelinek authored
      On gcc.c-torture/execute/pr103376.c in the following code we trigger UB
      in the compiler.  n->range is 8 because it is 64-bit load and rsize is 0
      because it is a bswap sequence with load and known to be 0:
        /* Find real size of result (highest non-zero byte).  */
        if (n->base_addr)
          for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
        else
          rsize = n->range;
      The shifts then shift uint64_t by 64 bits.  For this case mask is 0
      and we want both *cmpxchg and *cmpnop as 0, the operation can be done as
      both nop and bswap and callers will prefer nop.
      
      2021-11-27  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/103435
      	* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if
      	n->range - rsize == 8, just clear both *cmpnop and *cmpxchg in that
      	case.
      567d5f3d
    • [Committed] Fix new ivopts-[89].c test cases for -m32. · d9c8a023
      Roger Sayle authored
      2021-11-27  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/testsuite/ChangeLog
      	* gcc.dg/tree-ssa/ivopts-8.c: Fix new test case for -m32.
      	* gcc.dg/tree-ssa/ivopts-9.c: Likewise.
      d9c8a023
    • Daily bump. · f4ed2e3a
      GCC Administrator authored
      f4ed2e3a
    • ipa: Fix CFG fix-up in IPA-CP transform phase (PR 103441) · 9e2e4739
      Martin Jambor authored
      I forgot that IPA passes before ipa-inline must not return
      TODO_cleanup_cfg from their transformation function, because ordinary
      CFG cleanup does not remove call graph edges associated with removed
      call statements; they must use
      delete_unreachable_blocks_update_callgraph instead.  This patch fixes
      that error.
      
      gcc/ChangeLog:
      
      2021-11-26  Martin Jambor  <mjambor@suse.cz>
      
      	PR ipa/103441
      	* ipa-prop.c (ipcp_transform_function): Call
      	delete_unreachable_blocks_update_callgraph instead of returning
      	TODO_cleanup_cfg.
      9e2e4739
  4. Nov 26, 2021
    • libstdc++: Fix test that fails in C++20 mode · 52b76943
      Jonathan Wakely authored
      This test was written to verify that the LWG 3265 changes work. But
      those changes were superseded by LWG 3435, and the test is now incorrect
      according to the current draft. The assignment operator is now
      constrained to also require convertibility, which makes the test fail.
      
      Change the Iter type to be convertible from int*, but make it throw an
      exception if that conversion is used. Change the test from compile-only
      to run, so we verify that the exception isn't thrown.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/24_iterators/move_iterator/dr3265.cc: Fix test to
      	account for LWG 3435 resolution.
      52b76943
    • libstdc++: Fix trivial relocation for constexpr std::vector · 33adfd0d
      Jonathan Wakely authored
      When implementing constexpr std::vector I added a check for constant
      evaluation in vector::_S_use_relocate(), so that we would not try to relocate
      trivial objects by using memmove. But I put it in the constexpr function
      that decides whether to relocate or not, and calls to that function are
      always constant evaluated. This had the effect of disabling relocation
      entirely, even in non-constexpr vectors.
      
      This removes the check in _S_use_relocate() and modifies the actual
      relocation algorithm, __relocate_a_1, to use the non-trivial
      implementation instead of memmove when called during constant
      evaluation.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/stl_uninitialized.h (__relocate_a_1): Do not use
      	memmove during constant evaluation.
      	* include/bits/stl_vector.h (vector::_S_use_relocate()): Do not
      	check is_constant_evaluated in always-constexpr function.
      33adfd0d
    • libstdc++: Remove workaround for FE bug in std::tuple [PR96592] · 76c6be48
      Jonathan Wakely authored
      The FE bug was fixed, so we don't need this workaround now.
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/96592
      	* include/std/tuple (tuple::is_constructible): Remove.
      76c6be48
    • Fortran: improve check of arguments to the RESHAPE intrinsic · 4d540c7a
      Harald Anlauf authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/103411
      	* check.c (gfc_check_reshape): Improve check of size of source
      	array for the RESHAPE intrinsic against the given shape when pad
      	is not given, and shape is a parameter.  Try other simplifications
      	of shape.
      
      gcc/testsuite/ChangeLog:
      
      	PR fortran/103411
      	* gfortran.dg/pr68153.f90: Adjust test to improved check.
      	* gfortran.dg/reshape_7.f90: Likewise.
      	* gfortran.dg/reshape_9.f90: New test.
      4d540c7a
    • libitm: Fix bootstrap for targets without HAVE_ELF_STYLE_WEAKREF. · caa04517
      Iain Sandoe authored
      
      Recent improvements to null address warnings notice that for
      targets that do not support HAVE_ELF_STYLE_WEAKREF the dummy stub
      implementation of __cxa_get_globals() means that the address can
      never be null.
      
      Fixed by removing the test for such targets.
      
      Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
      
      libitm/ChangeLog:
      
      	* eh_cpp.cc (GTM::gtm_thread::init_cpp_exceptions): If the
      	target does not support HAVE_ELF_STYLE_WEAKREF then do not
      	try to test the __cxa_get_globals against NULL.
      caa04517
    • Siddhesh Poyarekar's avatar
      tree-object-size: Abstract object_sizes array · 4a200759
      Siddhesh Poyarekar authored
      
      Put all accesses to object_sizes behind functions so that we can add
      dynamic capability more easily.
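
      The general shape of such an abstraction can be sketched as below.
      This is an illustration only: the names, the element type, and the
      "only raise the stored value" policy of sizes_set are assumptions,
      not the code in tree-object-size.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the table; not GCC's actual data layout.  */
static unsigned long *sizes;
static size_t sizes_len;

/* Grow the table so that index IDX is valid; new slots start at 0.  */
void sizes_grow (size_t idx)
{
  if (idx < sizes_len)
    return;

  size_t new_len = idx + 1;
  unsigned long *p = realloc (sizes, new_len * sizeof *p);
  if (p == NULL)
    abort ();
  for (size_t i = sizes_len; i < new_len; i++)
    p[i] = 0;
  sizes = p;
  sizes_len = new_len;
}

/* Read the value recorded for IDX.  */
unsigned long sizes_get (size_t idx)
{
  assert (idx < sizes_len);
  return sizes[idx];
}

/* Store VAL unconditionally.  */
void sizes_set_force (size_t idx, unsigned long val)
{
  assert (idx < sizes_len);
  sizes[idx] = val;
}

/* Store VAL only if it raises the recorded value (assumed policy).
   Return nonzero if the table changed.  */
int sizes_set (size_t idx, unsigned long val)
{
  if (val <= sizes_get (idx))
    return 0;
  sizes_set_force (idx, val);
  return 1;
}

/* Free the table.  */
void sizes_release (void)
{
  free (sizes);
  sizes = NULL;
  sizes_len = 0;
}
```

      Because callers never index the array directly, the storage behind
      these functions can change later without touching the call sites.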
      
      gcc/ChangeLog:
      
      	* tree-object-size.c (object_sizes_grow, object_sizes_release,
      	object_sizes_unknown_p, object_sizes_get, object_size_set_force,
      	object_sizes_set): New functions.
      	(addr_object_size, compute_builtin_object_size,
      	expr_object_size, call_object_size, unknown_object_size,
      	merge_object_sizes, plus_stmt_object_size,
      	cond_expr_object_size, collect_object_sizes_for,
      	check_for_plus_in_loops_1, init_object_sizes,
      	fini_object_sizes): Adjust.
      
      Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
      4a200759
    • Siddhesh Poyarekar's avatar
      tree-object-size: Replace magic numbers with enums · 35c8bbe9
      Siddhesh Poyarekar authored
      
      A simple cleanup to allow inserting dynamic size code more easily.
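
      As an illustration of the kind of cleanup meant here, the sketch
      below names the bits of the second argument to
      __builtin_object_size (bit 0 selects the closest surrounding
      subobject, bit 1 selects the minimum rather than the maximum).  The
      enumerator and function names are hypothetical; the enum actually
      added by this commit is not shown in the message.

```c
#include <assert.h>

/* Hypothetical names for the object size type bits.  */
enum size_type_bits
{
  SIZE_SUBOBJECT = 1,   /* bit 0: closest surrounding subobject */
  SIZE_MINIMUM = 2,     /* bit 1: minimum instead of maximum */
  SIZE_END = 4          /* number of distinct type values (0..3) */
};

/* Value reported when nothing is known about the object: maximum
   modes answer "as large as possible", minimum modes answer 0.  */
unsigned long size_unknown (int type)
{
  assert (type >= 0 && type < SIZE_END);
  return (type & SIZE_MINIMUM) ? 0 : (unsigned long) -1;
}

/* Nonzero if TYPE asks for the closest surrounding subobject.  */
int wants_subobject (int type)
{
  return (type & SIZE_SUBOBJECT) != 0;
}
```

      Compared with writing "type & 2" or "type < 4" inline, the named
      constants say what each test is for.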
      
      gcc/ChangeLog:
      
      	* tree-object-size.c: New enum.
      	(object_sizes, computed, addr_object_size,
      	compute_builtin_object_size, expr_object_size, call_object_size,
      	merge_object_sizes, plus_stmt_object_size,
      	collect_object_sizes_for, init_object_sizes, fini_object_sizes,
      	object_sizes_execute): Replace magic numbers with enums.
      
      Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
      35c8bbe9
    • Roger Sayle's avatar
      ivopts: Improve code generated for very simple loops. · b41be002
      Roger Sayle authored
      This patch tidies up the code that GCC generates for simple loops,
      by selecting/generating a simpler loop bound expression in ivopts.
      The original motivation came from looking at the following loop (from
      gcc.target/i386/pr90178.c)
      
      int *find_ptr (int* mem, int sz, int val)
      {
        for (int i = 0; i < sz; i++)
          if (mem[i] == val)
            return &mem[i];
        return 0;
      }
      
      which GCC currently compiles to:
      
      find_ptr:
              movq    %rdi, %rax
              testl   %esi, %esi
              jle     .L4
              leal    -1(%rsi), %ecx
              leaq    4(%rdi,%rcx,4), %rcx
              jmp     .L3
      .L7:    addq    $4, %rax
              cmpq    %rcx, %rax
              je      .L4
      .L3:    cmpl    %edx, (%rax)
              jne     .L7
              ret
      .L4:    xorl    %eax, %eax
              ret
      
      Notice the relatively complex leal/leaq instructions that result
      from ivopts using the following expression for the loop bound:
      inv_expr 2:     ((unsigned long) ((unsigned int) sz_8(D) + 4294967295)
      		* 4 + (unsigned long) mem_9(D)) + 4
      
      which results from NITERS being (unsigned int) sz_8(D) + 4294967295,
      i.e. (sz - 1), and the logic in cand_value_at determining the bound
      as BASE + NITERS*STEP at the start of the final iteration and as
      BASE + NITERS*STEP + STEP at the end of the final iteration.
      
      Ideally, we'd like the middle-end optimizers to simplify
      BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially
      when NITERS already has the form BOUND-1, but with type conversions
      and possible overflow to worry about, the above "inv_expr 2" is the
      best that can be done by fold (without additional context information).
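
      The overflow concern can be made concrete with a small C model of
      the two forms.  This is a sketch only: the function names are
      hypothetical, pointers are modelled as 64-bit integers, and STEP is
      taken to be 4 as in the int loop above.  NITERS is computed in 32
      bits exactly as in the expression quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

#define STEP 4u   /* sizeof (int) for the loop in the example */

/* NITERS as quoted above: (unsigned int) sz + 4294967295, i.e. sz - 1
   computed modulo 2^32.  */
uint32_t niters (int sz)
{
  return (uint32_t) sz + UINT32_C (4294967295);
}

/* BASE + NITERS*STEP + STEP, with NITERS widened to 64 bits before
   the multiplication.  */
uint64_t bound_unfolded (uint64_t base, int sz)
{
  return base + (uint64_t) niters (sz) * STEP + STEP;
}

/* BASE + (NITERS+1)*STEP, with NITERS+1 evaluated in the narrower
   32-bit type before widening.  */
uint64_t bound_folded (uint64_t base, int sz)
{
  return base + (uint64_t) (uint32_t) (niters (sz) + 1u) * STEP;
}

/* For a loop that is known to execute (sz > 0) the two forms agree.  */
void check_agreement (uint64_t base, int sz)
{
  assert (sz > 0);
  assert (bound_unfolded (base, sz) == bound_folded (base, sz));
}
```

      For sz = 5 both functions return base + 20.  For sz = 0, NITERS
      wraps to 4294967295, so the unfolded form yields base + 2^34 while
      the folded form yields base.  Without knowing that the loop is
      guarded by sz > 0, a rewrite from one form to the other is not
      valid.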
      
      This patch improves ivopts' cand_value_at by passing it the data
      structure that explains how NITERS was derived, instead of just the
      tree expression for NITERS.  This allows us to peek under the
      surface to check that NITERS+1 doesn't overflow and, in this patch,
      to use the SSA_NAME already holding the required value.
      
      In the motivating loop above, inv_expr 2 now becomes:
      (unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D)
      
      And as a result, on x86_64 we now generate:
      
      find_ptr:
              movq    %rdi, %rax
              testl   %esi, %esi
              jle     .L4
              movslq  %esi, %rsi
              leaq    (%rdi,%rsi,4), %rcx
              jmp     .L3
      .L7:    addq    $4, %rax
              cmpq    %rcx, %rax
              je      .L4
      .L3:    cmpl    %edx, (%rax)
              jne     .L7
              ret
      .L4:    xorl    %eax, %eax
              ret
      
      This improvement required one minor tweak to GCC's testsuite for
      gcc.dg/wrapped-binop-simplify.c, where we again generate better
      code, and therefore no longer find as many optimization opportunities
      in later passes (vrp2).
      
      Previously:
      
      void v1 (unsigned long *in, unsigned long *out, unsigned int n)
      {
        int i;
        for (i = 0; i < n; i++) {
          out[i] = in[i];
        }
      }
      
      on x86_64 generated:
      v1:	testl   %edx, %edx
              je      .L1
              movl    %edx, %edx
              xorl    %eax, %eax
      .L3:	movq    (%rdi,%rax,8), %rcx
              movq    %rcx, (%rsi,%rax,8)
              addq    $1, %rax
              cmpq    %rax, %rdx
              jne     .L3
      .L1:	ret
      
      and now instead generates:
      v1:	testl   %edx, %edx
              je      .L1
              movl    %edx, %edx
              xorl    %eax, %eax
              leaq    0(,%rdx,8), %rcx
      .L3:	movq    (%rdi,%rax), %rdx
              movq    %rdx, (%rsi,%rax)
              addq    $8, %rax
              cmpq    %rax, %rcx
              jne     .L3
      .L1:	ret
      
      2021-11-26  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
      	tree_niter_desc* argument instead of just a tree for NITER.
      	If we require the iv candidate value at the end of the final
      	loop iteration, try using the original loop bound as the
      	NITER for sufficiently simple loops.
      	(may_eliminate_iv): Update (only) call to cand_value_at.
      
      gcc/testsuite/ChangeLog
      	* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
      	* gcc.dg/tree-ssa/ivopts-5.c: New test case.
      	* gcc.dg/tree-ssa/ivopts-6.c: New test case.
      	* gcc.dg/tree-ssa/ivopts-7.c: New test case.
      	* gcc.dg/tree-ssa/ivopts-8.c: New test case.
      	* gcc.dg/tree-ssa/ivopts-9.c: New test case.
      b41be002
    • Jonathan Wakely's avatar
      libstdc++: Ensure dg-add-options comes after dg-options · 665f726b
      Jonathan Wakely authored
      This is what the docs say is required.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/29_atomics/atomic_float/1.cc: Reorder directives.
      665f726b
    • Jonathan Wakely's avatar
      libstdc++: Fix dg-do directive for tests supposed to be run · 0a12bd92
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* testsuite/23_containers/unordered_map/modifiers/move_assign.cc:
      	Change dg-do compile to run.
      	* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
      	Likewise.
      0a12bd92
    • Jonathan Wakely's avatar
      libstdc++: Remove redundant xfail selectors in dg-do compile tests · 1ecc9ba5
      Jonathan Wakely authored
      An 'xfail' selector means the test is expected to fail at runtime, so is
      ignored for a compile-only test. The way to mark a compile-only test as
      failing is with dg-error (which these already do).
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/21_strings/basic_string_view/element_access/char/back_constexpr_neg.cc:
      	Remove xfail selector.
      	* testsuite/21_strings/basic_string_view/element_access/char/constexpr_neg.cc:
      	Likewise.
      	* testsuite/21_strings/basic_string_view/element_access/char/front_constexpr_neg.cc:
      	Likewise.
      	* testsuite/21_strings/basic_string_view/element_access/wchar_t/back_constexpr_neg.cc:
      	Likewise.
      	* testsuite/21_strings/basic_string_view/element_access/wchar_t/constexpr_neg.cc:
      	Likewise.
      	* testsuite/21_strings/basic_string_view/element_access/wchar_t/front_constexpr_neg.cc:
      	Likewise.
      	* testsuite/23_containers/span/101411.cc: Likewise.
      	* testsuite/25_algorithms/copy/debug/constexpr_neg.cc: Likewise.
      	* testsuite/25_algorithms/copy_backward/debug/constexpr_neg.cc:
      	Likewise.
      	* testsuite/25_algorithms/equal/constexpr_neg.cc: Likewise.
      	* testsuite/25_algorithms/equal/debug/constexpr_neg.cc: Likewise.
      	* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_neg.cc:
      	Likewise.
      	* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_pred_neg.cc:
      	Likewise.
      	* testsuite/25_algorithms/lower_bound/debug/constexpr_valid_range_neg.cc:
      	Likewise.
      	* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_neg.cc:
      	Likewise.
      	* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_pred_neg.cc:
      	Likewise.
      	* testsuite/25_algorithms/upper_bound/debug/constexpr_valid_range_neg.cc:
      	Likewise.
      1ecc9ba5
    • Martin Liska's avatar
      d: fix ASAN in option processing · f1ec39c8
      Martin Liska authored
      Fixes:
      
      ==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
      READ of size 4 at 0x00000666ca5c thread T0
          #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
          #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
          #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
          #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
          #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
          #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
          #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
          #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
          #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146
      
      for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.
      
      gcc/d/ChangeLog:
      
      	* d-attribs.cc (parse_optimize_options): Check index before
      	accessing cl_options.
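
      The pattern behind this kind of fix, testing an index against the
      table size before reading the table, can be sketched in C as
      follows.  The table, its element type, and the function name are
      hypothetical miniatures; they do not reproduce GCC's cl_options or
      the code in parse_optimize_options.

```c
#include <stddef.h>

/* Hypothetical miniature of an option table.  */
struct option_info
{
  const char *name;
  unsigned flags;
};

static const struct option_info options[] = {
  { "-O2", 1u },
  { "-funroll-loops", 2u },
};
static const size_t options_count = sizeof options / sizeof options[0];

/* Return the flags for option INDEX, or 0 when INDEX does not name a
   table entry.  The range test must come before the array read.  */
unsigned option_flags (size_t index)
{
  if (index >= options_count)
    return 0;
  return options[index].flags;
}
```

      Reading options[index].flags first and validating afterwards would
      be the kind of out-of-bounds read that AddressSanitizer reports as
      a global-buffer-overflow.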
      f1ec39c8