  1. Nov 20, 2020
    • arm: Fix up neon_vector_mem_operand [PR97528] · 410b8f6f
      Jakub Jelinek authored
      The documentation for POST_MODIFY says:
         Currently, the compiler can only handle second operands of the
         form (plus (reg) (reg)) and (plus (reg) (const_int)), where
         the first operand of the PLUS has to be the same register as
         the first operand of the *_MODIFY.
      The following testcase ICEs, because combine just attempts to simplify
      things and ends up with
      (post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1)))
      but the target predicates accept it, because they only verify
      that POST_MODIFY's second operand is PLUS and the second operand
      of the PLUS is a REG.
      
      The following patch fixes this by performing further verification that
      the POST_MODIFY is in the form it should be.
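
      The stricter check described above can be sketched on a toy expression
      model.  This is only an illustration: the Rtx struct, the Code enum and
      valid_post_modify are hypothetical names, not GCC's rtx API, and the
      sketch only covers the (plus (reg) (reg)) form.

```cpp
// Toy model of RTL expressions; this is NOT GCC's rtx API.
enum class Code { Reg, ConstInt, Plus, Mult, PostModify };

struct Rtx {
  Code code;
  int value;       // register number or constant value (unused otherwise)
  const Rtx* op0;  // first operand, or nullptr
  const Rtx* op1;  // second operand, or nullptr
};

// True if both expressions are the same register.
constexpr bool same_reg(const Rtx* a, const Rtx* b) {
  return a && b && a->code == Code::Reg && b->code == Code::Reg
         && a->value == b->value;
}

// Accept only (post_modify (reg R) (plus (reg R) (reg))).
constexpr bool valid_post_modify(const Rtx& x) {
  if (x.code != Code::PostModify || !x.op0 || !x.op1) return false;
  const Rtx* base = x.op0;
  const Rtx* sum = x.op1;
  if (base->code != Code::Reg) return false;
  if (sum->code != Code::Plus || !sum->op0 || !sum->op1) return false;
  // The first operand of the PLUS must be the register being modified.
  if (!same_reg(base, sum->op0)) return false;
  return sum->op1->code == Code::Reg;
}

// Example expressions (register numbers are arbitrary).
constexpr Rtx reg1{Code::Reg, 1, nullptr, nullptr};
constexpr Rtx reg2{Code::Reg, 2, nullptr, nullptr};
constexpr Rtx four{Code::ConstInt, 4, nullptr, nullptr};
constexpr Rtx scaled{Code::Mult, 0, &reg2, &four};
constexpr Rtx good_sum{Code::Plus, 0, &reg1, &reg2};
constexpr Rtx bad_sum{Code::Plus, 0, &scaled, &reg1};
constexpr Rtx good{Code::PostModify, 0, &reg1, &good_sum};
constexpr Rtx bad{Code::PostModify, 0, &reg1, &bad_sum};

static_assert(valid_post_modify(good), "canonical form is accepted");
static_assert(!valid_post_modify(bad), "the form produced by combine is rejected");
```

      A check that only looked at the second operand of the PLUS would accept
      the second expression, which is the situation the commit describes.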
      
      2020-11-20  Jakub Jelinek  <jakub@redhat.com>
      
      	PR target/97528
      	* config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY, require
      	first POST_MODIFY operand is a REG and is equal to the first operand
      	of PLUS.
      
      	* gcc.target/arm/pr97528.c: New test.
      410b8f6f
    • Plug loophole in string store merging · 1b3c9813
      Eric Botcazou authored
      There is a loophole in new string store merging support added recently:
      it does not check that the stores are consecutive, which is obviously
      required if you want to concatenate them...  Simple fix attached, the
      nice thing being that it can fall back to the regular processing if
      any hole is detected in the series of stores, thanks to the handling
      of STRING_CST by native_encode_expr.
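
      The consecutiveness requirement can be illustrated with a small sketch.
      The Store struct and stores_consecutive are hypothetical names for this
      example, not the data structures in gimple-ssa-store-merging.c, and the
      stores are assumed to be sorted by offset already.

```cpp
#include <cstddef>

// Hypothetical description of one store: start offset and size in bytes.
struct Store {
  unsigned long offset;
  unsigned long size;
};

// Returns true if every store begins exactly where the previous one ends,
// i.e. there is no hole (and no overlap) in the series.
constexpr bool stores_consecutive(const Store* stores, std::size_t count) {
  for (std::size_t i = 1; i < count; ++i)
    if (stores[i].offset != stores[i - 1].offset + stores[i - 1].size)
      return false;
  return true;
}

constexpr Store packed[] = {{0, 4}, {4, 2}, {6, 2}};  // bytes 0..7, no gap
constexpr Store holey[] = {{0, 4}, {5, 2}};           // byte 4 is skipped

static_assert(stores_consecutive(packed, 3), "adjacent stores can be concatenated");
static_assert(!stores_consecutive(holey, 2), "a hole forces the regular path");
```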
      
      gcc/ChangeLog:
      	* gimple-ssa-store-merging.c (struct merged_store_group): Add
      	new 'consecutive' field.
      	(merged_store_group): Set it to true.
      	(do_merge): Set it to false if the store is not consecutive and
      	set string_concatenation to false in this case.
      	(merge_into): Call do_merge on entry.
      	(merge_overlapping): Likewise.
      
      gcc/testsuite/ChangeLog:
      	* gnat.dg/opt90a.adb: New test.
      	* gnat.dg/opt90b.adb: Likewise.
      	* gnat.dg/opt90c.adb: Likewise.
      	* gnat.dg/opt90d.adb: Likewise.
      	* gnat.dg/opt90e.adb: Likewise.
      	* gnat.dg/opt90a_pkg.ads: New helper.
      	* gnat.dg/opt90b_pkg.ads: Likewise.
      	* gnat.dg/opt90c_pkg.ads: Likewise.
      	* gnat.dg/opt90d_pkg.ads: Likewise.
      	* gnat.dg/opt90e_pkg.ads: Likewise.
      1b3c9813
    • Fix comment in ipa-icf-gimple.c · cd287abe
      Jan Hubicka authored
      	* ipa-icf-gimple.c (func_checker::operand_equal_p): Fix comment.
      cd287abe
    • Fix comparison of {CLOBBER} in icf · 8e394101
      Jan Hubicka authored
      After fixing a few issues I got to a stage where 1.4M icf mismatches are due to
      comparing two gimple clobbers.  The problem is that operand_equal_p matches
      clobbers in:
      
      case CONSTRUCTOR:
       /* In GIMPLE empty constructors are allowed in initializers of
          aggregates.  */
       return !CONSTRUCTOR_NELTS (arg0) && !CONSTRUCTOR_NELTS (arg1);
      
      But this happens too late, after comparing their types (which are not very relevant
      for a memory store).
      
      In the context of ipa-icf we do not really need to match RHS of gimple clobbers:
      it is enough to know that the LHS stores can be considered equivalent.
      
      I thus added logic to hash them all the same way and compare using the
      TREE_CLOBBER_P flag.  Other options I see are extending operand_equal_p
      in fold-const to handle them more generously, or making stmt hash and compare
      skip comparing/hashing the RHS of gimple_clobber_p.
      
      	* ipa-icf-gimple.c (func_checker::hash_operand): Hash gimple clobber.
      	(func_checker::operand_equal_p): Special case gimple clobber.
      8e394101
    • i386: Optimize abs expansion [PR97873] · fdace758
      Uros Bizjak authored
      The patch introduces absM named pattern to generate optimal insn sequence
      for CMOVE_TARGET targets.  Currently, the expansion goes through neg+max
      optabs, and the following code is generated:
      
      	movl    %edi, %eax
      	negl    %eax
      	cmpl    %edi, %eax
      	cmovl   %edi, %eax
      
      This sequence is suboptimal in two ways.  a) The compare instruction is
      not needed, since NEG insn sets the sign flag based on the result.
      The CMOV can use sign flag to select between negated and original value:
      
      	movl    %edi, %eax
      	negl    %eax
      	cmovs   %edi, %eax
      
      b) On some targets, CMOV is undesirable due to its performance issues.
      In addition to TARGET_EXPAND_ABS bypass, the patch introduces STV
      conversion of abs RTX to use PABS SSE insn:
      
      	vmovd   %edi, %xmm0
      	vpabsd  %xmm0, %xmm0
      	vmovd   %xmm0, %eax
      
      The patch changes compare mode of NEG instruction to CCGOCmode,
      which is the same mode as the mode of SUB instruction. IOW, sign bit
      becomes usable.
      
      Also, the mode iterator of <maxmin:code><mode>3 pattern is changed
      to SWI48x instead of SWI248. The purpose of maxmin expander is to
      prepare max/min RTX for STV to eventually convert them to SSE PMAX/PMIN
      instructions, in order to *avoid* CMOV insns with general registers.
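
      The neg + cmovs sequence shown above can be modelled in C++ as follows.
      This is only a sketch of the selection logic: abs_via_neg is a
      hypothetical name, and INT_MIN is excluded because negating it
      overflows in C++.

```cpp
// Models `negl` followed by `cmovs`.  Precondition: x is not INT_MIN.
constexpr int abs_via_neg(int x) {
  const int negated = -x;              // negl: the result also sets the flags
  const bool sign_flag = negated < 0;  // SF is set when the negated value is negative
  return sign_flag ? x : negated;      // cmovs: take the original value if SF is set
}

static_assert(abs_via_neg(5) == 5, "positive input is kept");
static_assert(abs_via_neg(-7) == 7, "negative input is negated");
static_assert(abs_via_neg(0) == 0, "zero stays zero");
```

      No separate compare is modelled: the selection depends only on the sign
      of the negated value, which is what the sign flag provides after NEG.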
      
      2020-11-20  Uroš Bizjak  <ubizjak@gmail.com>
      
      gcc/
      	PR target/97873
      	* config/i386/i386.md (*neg<mode>2_2): Rename from
      	"*neg<mode>2_cmpz".  Use CCGOCmode instead of CCZmode.
      	(*negsi2_zext): Rename from *negsi2_cmpz_zext.
      	Use CCGOCmode instead of CCZmode.
      	(*neg<mode>_ccc_1): New insn pattern.
      	(*neg<dwi>2_doubleword): Use *neg<mode>_ccc_1.
      
      	(abs<mode>2): Add FLAGS_REG clobber.
      	Use TARGET_CMOVE insn predicate.
      	(*abs<mode>2_1): New insn_and_split pattern.
      	(*absdi2_doubleword): Ditto.
      
      	(<maxmin:code><mode>3): Use SWI48x mode iterator.
      	(*<maxmin:code><mode>3): Use SWI48 mode iterator.
      
      	* config/i386/i386-features.c
      	(general_scalar_chain::compute_convert_gain): Handle ABS code.
      	(general_scalar_chain::convert_insn): Ditto.
      	(general_scalar_to_vector_candidate_p): Ditto.
      
      gcc/testsuite/
      	PR target/97873
      	* gcc.target/i386/pr97873.c: New test.
      	* gcc.target/i386/pr97873-1.c: New test.
      fdace758
    • configury: Fix up --enable-link-serialization support · a774a6a2
      Jakub Jelinek authored
      Eric reported that the --enable-link-serialization changes seemed to
      cause the binaries to be always relinked, for example from the
      gcc/ directory of the build tree:
      make
      [relink of gnat1, brig1, cc1plus, d21, f951, go1, lto1, ...]
      make
      [relink of gnat1, brig1, cc1plus, d21, f951, go1, lto1, ...]
      Furthermore as reported in PR, it can cause problems during make install
      where make install rebuilds the binaries again.
      
      The problem is that for make .PHONY targets are just
      "rebuilt" always, so it is very much undesirable for the cc1plus$(exeext)
      etc. dependencies to include .PHONY targets, but I was using
      them - cc1plus.prev which would depend on some *.serial and
      e.g. cc1.serial depending on c and c depending on cc1$(exeext).
      
      The following patch rewrites this so that *.serial and *.prev aren't
      .PHONY targets, but instead just make variables.
      
      I was worried that the order in which the language makefile fragments are
      included (which is quite random, what order we get from the filesystem
      matching */config-lang.in) would be a problem but it seems to work fine
      - as it uses make = rather than := variables, later definitions are just
      fine for earlier uses as long as the uses aren't needed during the
      makefile parsing, but only in the dependencies of make targets and in
      their commands.
      
      2020-11-20  Jakub Jelinek  <jakub@redhat.com>
      
      	PR other/97911
      gcc/
      	* configure.ac: In SERIAL_LIST use lang words without .serial
      	suffix.  Change $lang.prev from a target to variable and instead
      	of depending on *.serial expand to the *.serial variable if
      	the word is in the SERIAL_LIST at all, otherwise to nothing.
      	* configure: Regenerated.
      gcc/c/
      	* Make-lang.in (c.serial): Change from goal to a variable.
      	(.PHONY): Drop c.serial.
      gcc/ada/
      	* gcc-interface/Make-lang.in (ada.serial): Change from goal to a
      	variable.
      	(.PHONY): Drop ada.serial and ada.prev.
      	(gnat1$(exeext)): Depend on $(ada.serial) rather than ada.serial.
      gcc/brig/
      	* Make-lang.in (brig.serial): Change from goal to a variable.
      	(.PHONY): Drop brig.serial and brig.prev.
      	(brig1$(exeext)): Depend on $(brig.serial) rather than brig.serial.
      gcc/cp/
      	* Make-lang.in (c++.serial): Change from goal to a variable.
      	(.PHONY): Drop c++.serial and c++.prev.
      	(cc1plus$(exeext)): Depend on $(c++.serial) rather than c++.serial.
      gcc/d/
      	* Make-lang.in (d.serial): Change from goal to a variable.
      	(.PHONY): Drop d.serial and d.prev.
      	(d21$(exeext)): Depend on $(d.serial) rather than d.serial.
      gcc/fortran/
      	* Make-lang.in (fortran.serial): Change from goal to a variable.
      	(.PHONY): Drop fortran.serial and fortran.prev.
      	(f951$(exeext)): Depend on $(fortran.serial) rather than
      	fortran.serial.
      gcc/go/
      	* Make-lang.in (go.serial): Change from goal to a variable.
      	(.PHONY): Drop go.serial and go.prev.
      	(go1$(exeext)): Depend on $(go.serial) rather than go.serial.
      gcc/jit/
      	* Make-lang.in (jit.serial): Change from goal to a
      	variable.
      	(.PHONY): Drop jit.serial and jit.prev.
      	($(LIBGCCJIT_FILENAME)): Depend on $(jit.serial) rather than
      	jit.serial.
      gcc/lto/
      	* Make-lang.in (lto1.serial, lto2.serial): Change from goals to
      	variables.
      	(.PHONY): Drop lto1.serial, lto2.serial, lto1.prev and lto2.prev.
      	($(LTO_EXE)): Depend on $(lto1.serial) rather than lto1.serial.
      	($(LTO_DUMP_EXE)): Depend on $(lto2.serial) rather than lto2.serial.
      gcc/objc/
      	* Make-lang.in (objc.serial): Change from goal to a variable.
      	(.PHONY): Drop objc.serial and objc.prev.
      	(cc1obj$(exeext)): Depend on $(objc.serial) rather than objc.serial.
      gcc/objcp/
      	* Make-lang.in (obj-c++.serial): Change from goal to a variable.
      	(.PHONY): Drop obj-c++.serial and obj-c++.prev.
      	(cc1objplus$(exeext)): Depend on $(obj-c++.serial) rather than
      	obj-c++.serial.
      a774a6a2
    • rs6000: Fix p8_mtvsrd_df's insn type · 02109ea2
      Kewen Lin authored
      This patch is to fix insn type of p8_mtvsrd_df from mfvsr to mtvsr,
      in order to align with the other places using mtvsrd.
      
      gcc/ChangeLog:
      
      	* config/rs6000/rs6000.md (p8_mtvsrd_df): Fix insn type.
      02109ea2
    • C: Drop qualifiers during lvalue conversion [PR97702] · 32934a4f
      Martin Uecker authored
      2020-11-20  Martin Uecker  <muecker@gwdg.de>
      
      gcc/
      	* gimplify.c (gimplify_modify_expr_rhs): Optimize
      	NOP_EXPRs that contain compound literals.
      
      gcc/c/
      	* c-typeck.c (convert_lvalue_to_rvalue): Drop qualifiers.
      
      gcc/testsuite/
      	* gcc.dg/cond-constqual-1.c: Adapt test.
      	* gcc.dg/lvalue-11.c: New test.
      	* gcc.dg/pr60195.c: Add warning.
      32934a4f
    • Daily bump. · d62586ee
      GCC Administrator authored
      d62586ee
  2. Nov 19, 2020
    • ranger: Improve a % b operand ranges [PR91029] · d3f29334
      Jakub Jelinek authored
      As mentioned in the PR, the previous PR91029 patch was testing
      op2 >= 0, which is unnecessary: even negative op2 values work the same way.
      Furthermore, from a % b > 0 we can deduce a > 0 rather than just a >= 0
      (0 % b would be 0), and this is actually valid for constants other than 0:
      a % b > 5 means a > 5 (a % b has the same sign as a, and a in [0, 5] would
      result in a % b in [0, 5]).  Also, we can deduce a range for the other
      operand: if we know
      a % b >= 20, then b must be (in absolute value for signed modulo) > 20,
      since for a % [0, 20] the result would be in [0, 19].
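
      The two deductions can be checked by brute force over a small range.
      This is a sanity-check sketch, not the range-op.cc implementation;
      mod_deductions_hold is a hypothetical helper and x is assumed to be
      non-negative.

```cpp
// Checks, for all a and all nonzero b in [-limit, limit], that
//   a % b >  x  implies  a   > x   and
//   a % b >= x  implies  |b| > x
// for a fixed x >= 0, using C++ truncating modulo.
constexpr bool mod_deductions_hold(int limit, int x) {
  for (int a = -limit; a <= limit; ++a)
    for (int b = -limit; b <= limit; ++b) {
      if (b == 0) continue;  // division by zero is undefined
      const int r = a % b;
      const int abs_b = b < 0 ? -b : b;
      if (r > x && !(a > x)) return false;
      if (r >= x && !(abs_b > x)) return false;
    }
  return true;
}

static_assert(mod_deductions_hold(40, 0), "a % b > 0 implies a > 0");
static_assert(mod_deductions_hold(40, 5), "a % b > 5 implies a > 5");
static_assert(mod_deductions_hold(40, 20), "a % b >= 20 implies |b| > 20");
```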
      
      2020-11-19  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/91029
      	* range-op.cc (operator_trunc_mod::op1_range): Don't require signed
      	types, nor require that op2 >= 0.  Implement (a % b) >= x && x > 0
      	implies a >= x and (a % b) <= x && x < 0 implies a <= x.
      	(operator_trunc_mod::op2_range): New method.
      
      	* gcc.dg/tree-ssa/pr91029-1.c: New test.
      	* gcc.dg/tree-ssa/pr91029-2.c: New test.
      d3f29334
    • Process only valid shift ranges. · d0d8b5d8
      Andrew MacLeod authored
      When shifting outside the valid range of [0, precision-1], we can
      choose to process just the valid ones since the rest is undefined.
      This allows us to produce results for x << [0,2][+INF, +INF] by discarding
      the invalid ranges and processing just [0,2].
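
      Discarding the invalid part of a shift range amounts to intersecting it
      with [0, precision-1].  The sketch below uses a hypothetical Range
      struct with a single sub-range, not the ranger's own range classes.

```cpp
// A closed integer interval [lo, hi]; `empty` marks the empty range.
struct Range {
  int lo;
  int hi;
  bool empty;
};

// Intersect a candidate shift-count range with [0, precision - 1].
constexpr Range valid_shift_range(Range r, int precision) {
  const int lo = r.lo < 0 ? 0 : r.lo;
  const int hi = r.hi > precision - 1 ? precision - 1 : r.hi;
  return (r.empty || lo > hi) ? Range{0, 0, true} : Range{lo, hi, false};
}

// For a 32-bit operand, [0, 2] survives unchanged ...
static_assert(valid_shift_range(Range{0, 2, false}, 32).hi == 2, "kept");
// ... while a sub-range holding only a huge shift count is dropped.
static_assert(valid_shift_range(Range{2147483647, 2147483647, false}, 32).empty,
              "out-of-range shift counts are discarded");
```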
      
      	gcc/
      	PR tree-optimization/93781
      	* range-op.cc (get_shift_range): Rename from
      	undefined_shift_range_check and now return valid shift ranges.
      	(operator_lshift::fold_range): Use result from get_shift_range.
      	(operator_rshift::fold_range): Ditto.
      	gcc/testsuite/
      	* gcc.dg/tree-ssa/pr93781-1.c: New.
      	* gcc.dg/tree-ssa/pr93781-2.c: New.
      	* gcc.dg/tree-ssa/pr93781-3.c: New.
      d0d8b5d8
    • c++: Template hash access · 5bba2215
      Nathan Sidwell authored
      This exposes the template specialization table, so the modules
      machinery may access it.  The hashed entity (tmpl, args & spec) is
      available, along with a hash table walker.  We also need a way of
      finding or inserting entries, along with some bookkeeping fns to deal
      with the instantiation and (partial) specialization lists.
      
      	gcc/cp/
      	* cp-tree.h (struct spec_entry): Moved from pt.c.
      	(walk_specializations, match_mergeable_specialization)
      	(get_mergeable_specialization_flags)
      	(add_mergeable_specialization): Declare.
      	* pt.c (struct spec_entry): Moved to cp-tree.h.
      	(walk_specializations, match_mergeable_specialization)
      	(get_mergeable_specialization_flags)
      	(add_mergeable_specialization): New.
      5bba2215
    • libstdc++: Avoid calling undefined __gthread_self weak symbol [PR 95989] · 08b4d325
      Jonathan Wakely authored
      Since glibc 2.27 the pthread_self symbol has been defined in libc rather
      than libpthread. Because we only call pthread_self through a weak alias
      it's possible for statically linked executables to end up without a
      definition of pthread_self. This crashes when trying to call an
      undefined weak symbol.
      
      We can use the __GLIBC_PREREQ version check to detect the version of
      glibc where pthread_self is no longer in libpthread, and call it
      directly rather than through the weak reference.
      
      It would be better to check for pthread_self in libc during configure
      instead of hardcoding the __GLIBC_PREREQ check. That would be
      complicated by the fact that prior to glibc 2.27 libc.a didn't have the
      pthread_self symbol, but libc.so.6 did.  The configure checks would need
      to try to link both statically and dynamically, and the result would
      depend on whether the static libc.a happens to be installed during
      configure (which could vary between different systems using the same
      version of glibc). Doing it properly is left for a future date, as that
      will be needed anyway after glibc moves all pthread symbols from
      libpthread to libc. When that happens we should revisit the whole
      approach of using weak symbols for pthread symbols.
      
      For the purposes of std::this_thread::get_id() we call
      pthread_self() directly when using glibc 2.27 or later. Otherwise, if
      __gthread_active_p() is true then we know the libpthread symbol is
      available so we call that. Otherwise, we are single-threaded and just
      use ((__gthread_t)1) as the thread ID.
      
      An undesirable consequence of this change is that code compiled prior to
      the change might inline the old definition of this_thread::get_id()
      which always returns (__gthread_t)1 in a program that isn't linked to
      libpthread. Code compiled after the change will use pthread_self() and
      so get a real TID. That could result in the main thread having different
      thread::id values in different translation units. This seems acceptable,
      as there are not expected to be many uses of thread::id in programs
      that aren't linked to libpthread.
      
      An earlier version of this patch also changed __gthread_self() to use
      __GLIBC_PREREQ(2, 27) and only use the weak symbol for older glibc. That
      might still make sense to do, but isn't needed by libstdc++ now.
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/95989
      	* config/os/gnu-linux/os_defines.h (_GLIBCXX_NATIVE_THREAD_ID):
      	Define new macro to get reliable thread ID.
      	* include/bits/std_thread.h: (this_thread::get_id): Use new
      	macro if it's defined.
      	* testsuite/30_threads/jthread/95989.cc: New test.
      	* testsuite/30_threads/this_thread/95989.cc: New test.
      08b4d325
    • c++: Expose constexpr hash table · bfc139e2
      Nathan Sidwell authored
      This patch exposes the constexpr hash table so that the modules
      machinery can save and load constexpr bodies.  While there I noticed
      that we could do a little constification of the hasher and comparator
      functions.  Also combine the saving machinery to a single function
      returning void -- nothing ever looked at its return value.
      
      	gcc/cp/
      	* cp-tree.h (struct constexpr_fundef): Moved from constexpr.c.
      	(maybe_save_constexpr_fundef): Declare.
      	(register_constexpr_fundef): Take constexpr_fundef object, return
      	void.
      	* decl.c (maybe_save_function_definition): Delete, functionality
      	moved to maybe_save_constexpr_fundef.
      	(emit_coro_helper, finish_function): Adjust.
      	* constexpr.c (struct constexpr_fundef): Moved to cp-tree.h.
      	(constexpr_fundef_hasher::equal): Constify.
      	(constexpr_fundef_hasher::hash): Constify.
      	(retrieve_constexpr_fundef): Make non-static.
      	(maybe_save_constexpr_fundef): Break out checking and duplication
      	from ...
      	(register_constexpr_fundef): ... here.  Just register the constexpr.
      bfc139e2
    • Fix two bugs in operand_equal_p · 0862d007
      Jan Hubicka authored
      	* fold-const.c (operand_compare::operand_equal_p): Fix thinko in
      	COMPONENT_REF handling and guard types_same_for_odr by
      	virtual_method_call_p.
      	(operand_compare::hash_operand): Likewise.
      0862d007
    • c, tree: Fix ICE from get_parm_array_spec [PR97860] · 8156cfaa
      Jakub Jelinek authored
      The C and C++ FEs handle zero sized arrays differently, C uses
      NULL TYPE_MAX_VALUE on non-NULL TYPE_DOMAIN on complete ARRAY_TYPEs
      with bitsize_zero_node TYPE_SIZE, while C++ FE likes to set
      TYPE_MAX_VALUE to the largest value (and min to the lowest).
      
      Martin has used array_type_nelts in get_parm_array_spec where the
      function on the C form of [0] arrays returns error_mark_node and the code
      crashes soon afterwards.  The following patch teaches array_type_nelts about
      this (e.g. dwarf2out already handles that as [0]).  While it will change
      what is_empty_type returns for certain types (e.g. struct S { int a[0]; };),
      as those types occupy zero bits in C, it should not make an ABI difference.
      
      So, the tree.c change makes the c-decl.c code handle the [0] arrays
      like any other constant extents, and the c-decl.c change just makes sure
      that if we'd run into error_mark_node e.g. from the VLA expressions, we
      don't crash on those.
      
      2020-11-19  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c/97860
      	* tree.c (array_type_nelts): For complete arrays with zero min
      	and NULL max and zero size return -1.
      
      	* c-decl.c (get_parm_array_spec): Bail out if nelts is
      	error_operand_p.
      
      	* gcc.dg/pr97860.c: New test.
      8156cfaa
    • c++: Fix array new with value-initialization [PR97523] · ae48b74c
      Marek Polacek authored
      Since my r11-3092 the following is rejected with -std=c++20:
      
        struct T { explicit T(); };
        void fn(int n) {
          new T[1]();
        }
      
      with "would use explicit constructor 'T::T()'".  It is because since
      that change we go into the P1009 block in build_new (array_p is false,
      but nelts is non-null and we're in C++20).  Since we only have (), we
      build a {} and continue to build_new_1, which then calls build_vec_init
      and then we error because the {} isn't CONSTRUCTOR_IS_DIRECT_INIT.
      
      For (), which is value-initializing, we want to do what we were doing
      before: pass empty init and let build_value_init take care of it.
      
      For various reasons I wanted to dig a little bit deeper into this,
      and as a result, I'm adding a test for [expr.new]/24 (and checked that
      our current behavior matches clang++).
      
      gcc/cp/ChangeLog:
      
      	PR c++/97523
      	* init.c (build_new): When value-initializing an array new,
      	leave the INIT as an empty vector.
      
      gcc/testsuite/ChangeLog:
      
      	PR c++/97523
      	* g++.dg/expr/anew5.C: New test.
      	* g++.dg/expr/anew6.C: New test.
      ae48b74c
    • c++: Fix crash with broken deduction from {} [PR97895] · 25056bdf
      Marek Polacek authored
      Unfortunately, the otherwise beautiful
      
        for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
      
      is not immune to an empty constructor, so we have to check
      CONSTRUCTOR_ELTS first.
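
      The hazard is a range-for over a dereferenced pointer that may be null.
      The sketch below reproduces the guard on a toy container; Elts, Ctor
      and sum_elements are hypothetical stand-ins and do not use GCC's vec
      or tree types.

```cpp
// Minimal iterable view over an int array.
struct Elts {
  const int* data;
  int size;
  constexpr const int* begin() const { return data; }
  constexpr const int* end() const { return data + size; }
};

// Toy "constructor": elts is null when there are no elements.
struct Ctor {
  const Elts* elts;
};

constexpr int sum_elements(const Ctor& c) {
  int total = 0;
  if (c.elts)  // check first: `*c.elts` on a null pointer is undefined
    for (int v : *c.elts)
      total += v;
  return total;
}

constexpr int values[] = {1, 2, 3};
constexpr Elts three{values, 3};
constexpr Ctor filled{&three};
constexpr Ctor empty_ctor{nullptr};

static_assert(sum_elements(filled) == 6, "elements are visited");
static_assert(sum_elements(empty_ctor) == 0, "empty constructor is skipped safely");
```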
      
      gcc/cp/ChangeLog:
      
      	PR c++/97895
      	* pt.c (do_auto_deduction): Don't crash when the constructor has
      	zero elements.
      
      gcc/testsuite/ChangeLog:
      
      	PR c++/97895
      	* g++.dg/cpp0x/auto54.C: New test.
      25056bdf
    • config: Add tests for modules-desired features · e1f07131
      Nathan Sidwell authored
      This adds configure tests for features that modules can take advantage
      of.  If they are not present, modules have reduced or fallback functionality.
      
      	gcc/
      	* configure.ac: Add tests for fstatat, sighandler_t, O_CLOEXEC,
      	unix-domain and ipv6 sockets.
      	* config.in: Rebuilt.
      	* configure: Rebuilt.
      e1f07131
    • c++: Relax new assert [PR 97905] · 255483e5
      Nathan Sidwell authored
      It turns out there are legitimate cases for the new decl to not have
      lang-specific.
      
      	PR c++/97905
      	gcc/cp/
      	* decl.c (duplicate_decls): Relax new assert.
      	gcc/testsuite/
      	* g++.dg/lookup/pr97905.C: New.
      255483e5
    • pru: Add builtins for HALT and LMBD · 5ace1776
      Dimitar Dimitrov authored
      Add builtins for HALT and LMBD, per Texas Instruments document
      SPRUHV7C.  Use the new LMBD pattern to define an expand for clz.
      
      Binutils [1] and sim [2] support for the LMBD instruction is merged now.
      
      [1] https://sourceware.org/pipermail/binutils/2020-October/113901.html
      [2] https://sourceware.org/pipermail/gdb-patches/2020-November/173141.html
      
      gcc/ChangeLog:
      
      	* config/pru/alu-zext.md: Add lmbd patterns for zero_extend
      	variants.
      	* config/pru/pru.c (enum pru_builtin): Add HALT and LMBD.
      	(pru_init_builtins): Ditto.
      	(pru_builtin_decl): Ditto.
      	(pru_expand_builtin): Ditto.
      	* config/pru/pru.h (CLZ_DEFINED_VALUE_AT_ZERO): Define PRU
      	value for CLZ with zero value parameter.
      	* config/pru/pru.md: Add halt, lmbd and clz patterns.
      	* doc/extend.texi: Document PRU builtins.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/pru/halt.c: New test.
      	* gcc.target/pru/lmbd.c: New test.
      5ace1776
    • vect: Add a “very cheap” cost model · 0b0061f4
      Richard Sandiford authored
      Currently we have three vector cost models: cheap, dynamic and
      unlimited.  -O2 -ftree-vectorize uses “cheap” by default, but that's
      still relatively aggressive about peeling and aliasing checks,
      and can lead to significant code size growth.
      
      This patch adds an even more conservative choice, which for lack of
      imagination I've called “very cheap”.  It only allows vectorisation
      if the vector code entirely replaces the scalar code.  It also
      requires one iteration of the vector loop to pay for itself,
      regardless of how often the loop iterates.  (If the vector loop
      needs multiple iterations to be beneficial then things are
      probably too close to call, and the conservative thing would
      be to stick with the scalar code.)
      
      The idea is that this should be suitable for -O2, although the patch
      doesn't change any defaults itself.
      
      I tested this by building and running a bunch of workloads for SVE,
      with three options:
      
        (1) -O2
        (2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap
        (3) -O2 -ftree-vectorize [-fvect-cost-model=cheap]
      
      All three builds used the default -msve-vector-bits=scalable and
      ran with the minimum vector length of 128 bits, which should give
      a worst-case bound for the performance impact.
      
      The workloads included a mixture of microbenchmarks and full
      applications.  Because it's quite an eclectic mix, there's not
      much point giving exact figures.  The aim was more to get a general
      impression.
      
      Code size growth with (2) was much lower than with (3).  Only a
      handful of tests increased by more than 5%, and all of them were
      microbenchmarks.
      
      In terms of performance, (2) was significantly faster than (1)
      on microbenchmarks (as expected) but also on some full apps.
      Again, performance only regressed on a handful of tests.
      
      As expected, the performance of (3) vs. (1) and (3) vs. (2) is more
      of a mixed bag.  There are several significant improvements with (3)
      over (2), but also some (smaller) regressions.  That seems to be in
      line with -O2 -ftree-vectorize being a kind of -O2.5.
      
      The patch reorders vect_cost_model so that values are in order
      of increasing aggressiveness, which makes it possible to use
      range checks.  The value 0 still represents “unlimited”,
      so “if (flag_vect_cost_model)” is still a meaningful check.
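
      The ordering idea can be sketched as follows.  The enumerator names and
      numeric values here are assumptions made for illustration, not copied
      from flag-types.h; the only properties taken from the text above are
      that the values increase with aggressiveness and that 0 means
      “unlimited”.

```cpp
// Illustrative cost-model enum: more aggressive models have larger values,
// and "unlimited" is 0 so that a plain truth test still works.
enum CostModel {
  kVeryCheap = -3,  // assumed value
  kCheap = -2,      // assumed value
  kDynamic = -1,    // assumed value
  kUnlimited = 0
};

// Mirrors an "if (flag_vect_cost_model)" style check: true for every
// model except "unlimited".
constexpr bool cost_limited(CostModel m) { return m != 0; }

// A range check covers "cheap" and anything more conservative in one test.
constexpr bool cheap_or_stricter(CostModel m) { return m <= kCheap; }

static_assert(cheap_or_stricter(kVeryCheap), "very cheap is stricter than cheap");
static_assert(cheap_or_stricter(kCheap), "cheap itself is included");
static_assert(!cheap_or_stricter(kDynamic), "dynamic is more aggressive");
static_assert(!cost_limited(kUnlimited), "0 still means unlimited");
```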
      
      gcc/
      	* doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
      	* common.opt (fvect-cost-model=): Add very-cheap as a possible option.
      	(fsimd-cost-model=): Likewise.
      	(vect_cost_model): Add very-cheap.
      	* flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
      	Put the values in order of increasing aggressiveness.
      	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
      	range checks when comparing against VECT_COST_MODEL_CHEAP.
      	(vect_prune_runtime_alias_test_list): Do not allow any alias
      	checks for the very-cheap cost model.
      	* tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
      	any peeling for the very-cheap cost model.  Also require one
      	iteration of the vector loop to pay for itself.
      
      gcc/testsuite/
      	* gcc.dg/vect/vect-cost-model-1.c: New test.
      	* gcc.dg/vect/vect-cost-model-2.c: Likewise.
      	* gcc.dg/vect/vect-cost-model-3.c: Likewise.
      	* gcc.dg/vect/vect-cost-model-4.c: Likewise.
      	* gcc.dg/vect/vect-cost-model-5.c: Likewise.
      	* gcc.dg/vect/vect-cost-model-6.c: Likewise.
      0b0061f4
    • libstdc++: Add missing header to some tests · 5e6a4315
      Jonathan Wakely authored
      These tests use std::this_thread::sleep_for without including <thread>.
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/30_threads/async/async.cc: Include <thread>.
      	* testsuite/30_threads/future/members/93456.cc: Likewise.
      5e6a4315
    • AArch64: Add cost table for Cortex-A76 · 5c5a67e6
      Wilco Dijkstra authored
      Add an initial cost table for Cortex-A76 - this is copied from
      cortexa57_extra_costs but updated based on the Optimization Guide.
      Use the new cost table on all Neoverse tunings and ensure the tunings
      are consistent for all.  As a result more compact code is generated
      with more combined shift+alu operations. Eg. -mcpu=cortex-a76 will now
      merge the shifts in:
      
      int f(int x, int y) { return (x & y << 3) * (x | y << 3); }
      
      and  w2, w0, w1, lsl 3
      orr  w0, w0, w1, lsl 3
      mul  w0, w2, w0
      ret
      
      SPEC2017 codesize improves by 0.02% and SPECINT2017 shows 0.24% gain.
      
      2020-11-18  Wilco Dijkstra  <wdijkstr@arm.com>
      
      gcc/
      	* config/aarch64/aarch64.c (neoversen1_tunings): Use new
      	cortexa76_extra_costs.
      	(neoversev1_tunings): Likewise.
      	(neoversen2_tunings): Likewise.
      	* config/arm/aarch-cost-tables.h (cortexa76_extra_costs):
      	Add new costs.
      5c5a67e6
    • AArch64: Improve inline memcpy expansion · 1d77928f
      Wilco Dijkstra authored
      Improve the inline memcpy expansion.  Use integer load/store for copies <= 24
      bytes instead of SIMD.  Set the maximum copy to expand to 256 by default,
      except that -Os or no Neon expands up to 128 bytes.  When using LDP/STP of
      Q-registers, also use Q-register accesses for the unaligned tail, saving 2
      instructions (eg. all sizes up to 48 bytes emit exactly 4 instructions).
      Cleanup code and comments.
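
      The size thresholds above can be summarised as a small decision
      function.  This is a rough sketch of the thresholds only, not the logic
      of aarch64_expand_cpymem: the Expansion enum, choose_expansion and the
      treatment of copies above the limit as a library call are assumptions
      made for illustration.

```cpp
enum class Expansion { LibraryCall, IntegerRegs, SimdRegs };

// bytes:         constant copy size
// optimize_size: true when compiling with -Os
// has_simd:      true when Neon is available
constexpr Expansion choose_expansion(unsigned bytes, bool optimize_size,
                                     bool has_simd) {
  // Inline limit: 256 bytes by default, 128 with -Os or without Neon.
  const unsigned max_inline = (optimize_size || !has_simd) ? 128 : 256;
  if (bytes > max_inline) return Expansion::LibraryCall;
  // Small copies (<= 24 bytes) use integer loads/stores instead of SIMD.
  if (bytes <= 24 || !has_simd) return Expansion::IntegerRegs;
  return Expansion::SimdRegs;
}

static_assert(choose_expansion(16, false, true) == Expansion::IntegerRegs, "small copy");
static_assert(choose_expansion(200, false, true) == Expansion::SimdRegs, "within 256");
static_assert(choose_expansion(200, true, true) == Expansion::LibraryCall, "-Os caps at 128");
```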
      
      The codesize gain vs the GCC10 expansion is 0.05% on SPECINT2017.
      
      2020-11-03  Wilco Dijkstra  <wdijkstr@arm.com>
      
      gcc/
      	* config/aarch64/aarch64.c (aarch64_expand_cpymem): Cleanup code and
      	comments, tweak expansion decisions and improve tail expansion.
      1d77928f
    • Fix PR ada/97805 · 2729378d
      Eric Botcazou authored
      We need to include limits.h (or <climits>) in adaint.c because of LLONG_MIN.
      
      gcc/ada/ChangeLog:
      	PR ada/97805
      	* adaint.c: Include climits in C++ and limits.h otherwise.
      2729378d
    • preprocessor: main file searching · 9844497a
      Nathan Sidwell authored
      This adds the capability to locate the main file on the user or system
      include paths.  That's extremely useful to users building header
      units.  Searching has to be requested (plain header-unit compilation
      will not search).  Also, to make include_next work as expected when
      building a header unit, we add a mechanism to retrofit a non-searched
      source file as one on the include path.
      
      	libcpp/
      	* include/cpplib.h (enum cpp_main_search): New.
      	(struct cpp_options): Add main_search field.
      	(cpp_main_loc): Declare.
      	(cpp_retrofit_as_include): Declare.
      	* internal.h (struct cpp_reader): Add main_loc field.
      	(_cpp_in_main_source_file): Not main if main is a header.
      	* init.c (cpp_read_main_file): Use main_search option to locate
      	main file.  Set main_loc.
      	* files.c (cpp_retrofit_as_include): New.
      9844497a
    • Jonathan Wakely's avatar
      libstdc++: Move std::thread to a new header · b204d772
      Jonathan Wakely authored
      This makes it possible to use std::thread without including the whole of
      <thread>. It also makes this_thread::get_id() and this_thread::yield()
      available even when there is no gthreads support (e.g. when GCC is built
      with --disable-threads or --enable-threads=single).
      
      In order for the std::thread::id return type of this_thread::get_id() to
      be defined, std::thread itself is defined unconditionally. However the
      constructor that creates new threads is not defined for single-threaded
      builds. The thread::join() and thread::detach() member functions are
      defined inline for single-threaded builds and just throw an exception
      (because we know the thread cannot be joinable if the constructor that
      creates joinable threads doesn't exist).
      
      The thread::hardware_concurrency() member function is also defined
      inline and returns 0 (as suggested by the standard when the value "is
      not computable or well-defined").
      
      The main benefit for most targets is that other headers such as <future>
      do not need to include the whole of <thread> just to be able to create a
      std::thread. That avoids including <stop_token> and std::jthread where
      not required. This is another partial fix for PR 92546.
      
      This also means we can use this_thread::get_id() and this_thread::yield()
      in <stop_token> instead of using the gthread functions directly. This
      removes some preprocessor conditionals, simplifying the code.
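
      The functions mentioned above can be exercised without creating a
      thread, as in the sketch below.  It includes the public <thread>
      header, since <bits/std_thread.h> is an internal header, and the
      helper names are hypothetical.

```cpp
#include <thread>

// Returns true if two calls on the same thread report the same id.
inline bool same_thread_id_twice()
{
  std::thread::id first = std::this_thread::get_id();
  std::this_thread::yield();  // only a scheduling hint; no thread is created
  std::thread::id second = std::this_thread::get_id();
  return first == second;
}

// A result of 0 means the value is not computable or well-defined.
inline unsigned reported_concurrency()
{
  return std::thread::hardware_concurrency();
}
```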
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/92546
      	* include/Makefile.am: Add new <bits/std_thread.h> header.
      	* include/Makefile.in: Regenerate.
      	* include/std/future: Include new header instead of <thread>.
      	* include/std/stop_token: Include new header instead of
      	<bits/gthr.h>.
      	(stop_token::_S_yield()): Use this_thread::yield().
      	(_Stop_state_t::_M_requester): Change type to std::thread::id.
      	(_Stop_state_t::_M_request_stop()): Use this_thread::get_id().
      	(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Likewise.
      	Use __is_single_threaded() to decide whether to synchronize.
      	* include/std/thread (thread, operator==, this_thread::get_id)
      	(this_thread::yield): Move to new header.
      	(operator<=>, operator!=, operator<, operator<=, operator>)
      	(operator>=, hash<thread::id>, operator<<): Define even when
      	gthreads not available.
      	* src/c++11/thread.cc: Include <memory>.
      	* include/bits/std_thread.h: New file.
      	(thread, operator==, this_thread::get_id, this_thread::yield):
      	Define even when gthreads not available.
      	[!_GLIBCXX_HAS_GTHREADS] (thread::join, thread::detach)
      	(thread::hardware_concurrency): Define inline.
      b204d772
    • Jonathan Wakely's avatar
      libstdc++: Fix overflow checks to use the correct "time_t" [PR 93456] · b108faa9
      Jonathan Wakely authored
      I recently added overflow checks to src/c++11/futex.cc for PR 93456, but
      then changed the type of the timespec for PR 93421. This meant the
      overflow checks were no longer using the right range, because the
      variable being written to might be smaller than time_t.
      
      This introduces a new typedef that corresponds to the tv_sec member of the
      struct being passed to the syscall, and uses that typedef in the range
      checks.
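
      The idea can be sketched as follows.  This is not the code in
      futex.cc: the struct, the typedef name and the clamping policy are
      assumptions made for illustration, with a 32-bit tv_sec standing in
      for a member that is narrower than time_t.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the struct passed to the syscall.
struct syscall_timespec_sketch
{
  std::int32_t tv_sec;
  std::int32_t tv_nsec;
};

// The range checks use the type of tv_sec itself, not time_t.
using syscall_time_sketch_t = decltype(syscall_timespec_sketch::tv_sec);

inline bool seconds_fit(std::int64_t seconds)
{
  using limits = std::numeric_limits<syscall_time_sketch_t>;
  return seconds >= limits::min() && seconds <= limits::max();
}

// Assumed policy: clamp out-of-range values instead of overflowing.
inline syscall_timespec_sketch
to_syscall_timespec(std::int64_t seconds, std::int32_t nanoseconds)
{
  assert(nanoseconds >= 0 && nanoseconds < 1000000000);
  using limits = std::numeric_limits<syscall_time_sketch_t>;
  if (seconds > limits::max())
    return { limits::max(), 999999999 };
  if (seconds < limits::min())
    return { limits::min(), 0 };
  return { static_cast<syscall_time_sketch_t>(seconds), nanoseconds };
}
```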
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/93421
      	PR libstdc++/93456
      	* src/c++11/futex.cc (syscall_time_t): New typedef for
      	the type of the syscall_timespec::tv_sec member.
      	(relative_timespec, _M_futex_wait_until)
      	(_M_futex_wait_until_steady): Use syscall_time_t in overflow
      	checks, not time_t.
      b108faa9
    • Nathan Sidwell's avatar
      preprocessor: main-file cleanup · bf425849
      Nathan Sidwell authored
      In preparing module patch 7 I realized there was a cleanup I could
      make to simplify it.  This is that cleanup.  Also, when doing the
      cleanup I noticed some macros had been turned into inline functions,
      but not renamed to the preprocessor's internal namespace
      (_cpp_$INTERNAL rather than cpp_$USER).  Thus, this renames those
      functions, deletes an internal field of the file structure, and
      determines whether we're in the main file by comparing to
      pfile->main_file, the _cpp_file of the main file.
      
      	libcpp/
      	* internal.h (cpp_in_system_header): Rename to ...
      	(_cpp_in_system_header): ... here.
      	(cpp_in_primary_file): Rename to ...
      	(_cpp_in_main_source_file): ... here.  Compare main_file equality
      	and check main_search value.
      	* lex.c (maybe_va_opt_error, _cpp_lex_direct): Adjust for rename.
      	* macro.c (_cpp_builtin_macro_text): Likewise.
      	(replace_args): Likewise.
      	* directives.c (do_include_next): Likewise.
      	(do_pragma_once, do_pragma_system_header): Likewise.
      	* files.c (struct _cpp_file): Delete main_file field.
      	(pch_open): Check pfile->main_file equality.
      	(make_cpp_file): Drop cpp_reader parm, don't set main_file.
      	(_cpp_find_file): Adjust.
      	(_cpp_stack_file): Check pfile->main_file equality.
      	(struct report_missing_guard_data): Add cpp_reader field.
      	(report_missing_guard): Check pfile->main_file equality.
      	(_cpp_report_missing_guards): Adjust.
      bf425849
    • Richard Biener's avatar
      Fix bootstrap · d84ba819
      Richard Biener authored
      This fixes a typo in the TREE_CODE compare which should
      compare against TYPE_DECL, not TYPE_NAME.
      
      2020-11-19  Richard Biener  <rguenther@suse.de>
      
      	* fold-const.c (operand_compare::hash_operand): Fix typo.
      d84ba819
    • Richard Biener's avatar
      Fix gcc.dg/pr97897.c · 717e22dc
      Richard Biener authored
      This adds dg-options "" to avoid the pedantic error on _Complex int.
      
      2020-11-19  Richard Biener  <rguenther@suse.de>
      
      	* gcc.dg/pr97897.c: Add dg-options.
      717e22dc
    • Richard Biener's avatar
      refactor reassoc's get_rank · b08e0ee3
      Richard Biener authored
      This refactors things so assigned ranks are dumped and the cache
      is consistently used also for PHIs.
      
      2020-11-19  Richard Biener  <rguenther@suse.de>
      
      	* tree-ssa-reassoc.c (get_rank): Refactor to consistently
      	use the cache and dump ranks assigned.
      b08e0ee3
    • Jan Hubicka's avatar
      Fix operand_equal_p hash and compare of OBJ_TYPE_REF · d8cf8976
      Jan Hubicka authored
      	* fold-const.c (operand_compare::operand_equal_p): Move OBJ_TYPE_REF
      	matching to correct place; drop OEP_ADDRESS_OF for TOKEN, OBJECT and
      	class.
      	(operand_compare::hash_operand): Hash ODR type for OBJ_TYPE_REF.
      d8cf8976
    • Joel Hutton's avatar
      [3/3] [AArch64][vect] vec_widen_lshift pattern · 27842e2a
      Joel Hutton authored
      Add aarch64 vec_widen_lshift_lo/hi patterns and fix bug it triggers in
      mid-end. This pattern takes one vector with N elements of size S, shifts
      each element left by the element width and stores the results as N
      elements of size 2*S (in 2 result vectors). The aarch64 backend
      implements this with the shll,shll2 instruction pair.
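
      A scalar model of the operation, assuming S is 8 bits so that each
      element is shifted left by 8 and stored in 16 bits.  The function
      name is hypothetical and the sketch returns one array rather than
      two result vectors.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Widen each 8-bit element to 16 bits and shift it left by the
// original element width (8).
template <std::size_t N>
std::array<std::uint16_t, N>
widen_lshift(const std::array<std::uint8_t, N> &in)
{
  std::array<std::uint16_t, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = static_cast<std::uint16_t>(static_cast<std::uint16_t>(in[i]) << 8);
  return out;
}
```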
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-simd.md: Add vec_widen_lshift_hi/lo<mode>
      	patterns.
      	* tree-vect-stmts.c (vectorizable_conversion): Fix for widen_lshift
      	case.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/vect-widen-lshift.c: New test.
      27842e2a
    • Joel Hutton's avatar
      [2/3] [vect] Add widening add, subtract patterns · 9fc9573f
      Joel Hutton authored
      Add widening add, subtract patterns to tree-vect-patterns. Update the
      widened code of patterns that detect PLUS_EXPR to also detect
      WIDEN_PLUS_EXPR. These patterns take 2 vectors with N elements of size
      S and perform an add/subtract on the elements, storing the results as N
      elements of size 2*S (in 2 result vectors). This is implemented in the
      aarch64 backend as addl,addl2 and subl,subl2 respectively. Add aarch64
      tests for patterns.
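
      In scalar terms, the loops these patterns are meant to recognise look
      roughly like the sketch below, assuming 8-bit signed inputs widened
      to 16 bits.  The function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Widening add: the sum is computed in the wider type, so it cannot
// overflow the 8-bit input type.
template <std::size_t N>
std::array<std::int16_t, N>
widen_add(const std::array<std::int8_t, N> &a,
          const std::array<std::int8_t, N> &b)
{
  std::array<std::int16_t, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = static_cast<std::int16_t>(a[i] + b[i]);
  return out;
}

// Widening subtract, with the same element types.
template <std::size_t N>
std::array<std::int16_t, N>
widen_sub(const std::array<std::int8_t, N> &a,
          const std::array<std::int8_t, N> &b)
{
  std::array<std::int16_t, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = static_cast<std::int16_t>(a[i] - b[i]);
  return out;
}
```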
      
      gcc/ChangeLog:
      	* doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes.
      	* doc/md.texi: Document new widening add/subtract hi/lo optabs.
      	* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
      	* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
      	* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
      	* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
      	subtracts.
      	* tree-inline.c (estimate_operator_cost): Add case for widening adds,
      	subtracts.
      	* tree-vect-generic.c (expand_vector_operations_1): Add case for
      	widening adds, subtracts.
      	* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
      	pattern.
      	(vect_recog_widen_sub_pattern): New recog pattern.
      	(vect_recog_average_pattern): Update widened add code.
      	* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
      	subtract.
      	(supportable_widening_operation): Add case for widened add, subtract.
      	* tree.def
      	(WIDEN_PLUS_EXPR): New tree code.
      	(WIDEN_MINUS_EXPR): New tree code.
      	(VEC_WIDEN_PLUS_HI_EXPR): New tree code.
      	(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
      	(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
      	(VEC_WIDEN_MINUS_LO_EXPR): New tree code.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/vect-widen-add.c: New test.
      	* gcc.target/aarch64/vect-widen-sub.c: New test.
      9fc9573f
    • Joel Hutton's avatar
      [1/3][aarch64] Add vec_widen patterns to aarch64 · ec46904e
      Joel Hutton authored
      Add widening add and subtract patterns to the aarch64
      backend. These allow taking vectors of N elements of size S
      and performing an add/subtract on the high or low half,
      widening the resulting elements and storing N/2 elements of size 2*S.
      These correspond to the addl,addl2,subl,subl2 instructions.
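
      A scalar sketch of the half-vector form, assuming 8-bit signed
      elements widened to 16 bits.  The function name is hypothetical, and
      the sketch simply treats the lower-indexed elements as the low half,
      whereas the real mapping of lo/hi to lanes depends on the target.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Add the selected half of two N-element vectors, producing N/2
// elements of twice the width.
template <std::size_t N>
std::array<std::int16_t, N / 2>
widen_add_half(const std::array<std::int8_t, N> &a,
               const std::array<std::int8_t, N> &b,
               bool high)
{
  static_assert(N % 2 == 0, "vector length must be even");
  const std::size_t offset = high ? N / 2 : 0;
  std::array<std::int16_t, N / 2> out{};
  for (std::size_t i = 0; i < N / 2; ++i)
    out[i] = static_cast<std::int16_t>(a[offset + i] + b[offset + i]);
  return out;
}
```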
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-simd.md: New patterns
      	vec_widen_saddl_lo/hi_<mode>.
      ec46904e
    • Richard Biener's avatar
      tree-optimization/97901 - ICE propagating out LC PHIs · ec383f0b
      Richard Biener authored
      We need to fold the stmt to canonicalize MEM_REFs which means
      we're back to using replace_uses_by.  Which means we need dominators
      to not require a CFG cleanup upthread.
      
      2020-11-19  Richard Biener  <rguenther@suse.de>
      
      	PR tree-optimization/97901
      	* tree-ssa-propagate.c (clean_up_loop_closed_phi): Compute
      	dominators and use replace_uses_by.
      
      	* gcc.dg/torture/pr97901.c: New testcase.
      ec383f0b
    • Eric Botcazou's avatar
      Enhance debug info for fixed-point types · 43a0debd
      Eric Botcazou authored
      The Ada language supports fixed-point types as first-class citizens so
      they need to be described as-is in the debug info.  We devised the
      langhook get_fixed_point_type_info for this purpose a few years ago,
      but it comes with a limitation for the representation of the scale
      factor that we would need to lift in order to be able to represent
      more fixed-point types.
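
      To make the notion of an arbitrary scale factor concrete, the sketch
      below models a fixed-point value as an integer mantissa multiplied by
      a ratio of two integers.  The struct and function are hypothetical
      and are not the fixed_point_type_info structure itself, whose fields
      become trees with this change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a scale factor given as numerator/denominator.
struct scale_factor_sketch
{
  std::int64_t numerator;
  std::int64_t denominator;
};

// The represented value is mantissa * numerator / denominator.
inline double fixed_to_double(std::int64_t mantissa, scale_factor_sketch scale)
{
  assert(scale.denominator != 0);
  return static_cast<double>(mantissa) * scale.numerator / scale.denominator;
}
```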
      
      gcc/ChangeLog:
      	* dwarf2out.h (struct fixed_point_type_info) <scale_factor>: Turn
      	numerator and denominator into a tree.
      	* dwarf2out.c (base_type_die): In the case of a fixed-point type
      	with arbitrary scale factor, call add_scalar_info on numerator and
      	denominator to emit the appropriate attributes.
      
      gcc/ada/ChangeLog:
      	* exp_dbug.adb (Is_Handled_Scale_Factor): Delete.
      	(Get_Encoded_Name): Do not call it.
      	* gcc-interface/decl.c (gnat_to_gnu_entity) <Fixed_Point_Type>:
      	Tidy up and always use a meaningful description for arbitrary
      	scale factors.
      	* gcc-interface/misc.c (gnat_get_fixed_point_type_info): Remove
      	obsolete block and adjust the description of the scale factor.
      43a0debd
    • Richard Biener's avatar
      tree-optimization/97897 - complex lowering on abnormal edges · 0d829095
      Richard Biener authored
      This fixes complex lowering to not put constants into abnormal
      edge PHI values by making sure abnormally used SSA names are
      VARYING in its propagation lattice.
      
      2020-11-19  Richard Biener  <rguenther@suse.de>
      
      	PR tree-optimization/97897
      	* tree-complex.c (complex_propagate::visit_stmt): Make sure
      	abnormally used SSA names are VARYING.
      	(complex_propagate::visit_phi): Likewise.
      	* tree-ssa.c (verify_phi_args): Verify PHI arguments on abnormal
      	edges are SSA names.
      
      	* gcc.dg/pr97897.c: New testcase.
      0d829095