  1. Feb 15, 2024
    • openmp, fortran: Add Fortran support for indirect clause on the declare target directive · 451bb586
      Kwok Cheung Yeung authored
      2024-02-15  Kwok Cheung Yeung  <kcyeung@baylibre.com>
      
      	gcc/fortran/
      	* dump-parse-tree.cc (show_attr): Handle omp_declare_target_indirect
      	attribute.
      	* f95-lang.cc (gfc_gnu_attributes): Add entry for 'omp declare
      	target indirect'.
      	* gfortran.h (symbol_attribute): Add omp_declare_target_indirect
      	field.
      	(struct gfc_omp_clauses): Add indirect field.
      	* openmp.cc (omp_mask2): Add OMP_CLAUSE_INDIRECT.
      	(gfc_match_omp_clauses): Match indirect clause.
      	(OMP_DECLARE_TARGET_CLAUSES): Add OMP_CLAUSE_INDIRECT.
      	(gfc_match_omp_declare_target): Check omp_device_type and apply
      	omp_declare_target_indirect attribute to symbol if indirect clause
      	active.  Show warning if there are only device_type and/or indirect
      	clauses on the directive.
      	* trans-decl.cc (add_attributes_to_decl): Add 'omp declare target
      	indirect' attribute if symbol has indirect attribute set.
      
      	gcc/testsuite/
      	* gfortran.dg/gomp/declare-target-4.f90 (f1): Update expected warning.
      	* gfortran.dg/gomp/declare-target-indirect-1.f90: New.
      	* gfortran.dg/gomp/declare-target-indirect-2.f90: New.
      
      	libgomp/
      	* testsuite/libgomp.fortran/declare-target-indirect-1.f90: New.
      	* testsuite/libgomp.fortran/declare-target-indirect-2.f90: New.
      	* testsuite/libgomp.fortran/declare-target-indirect-3.f90: New.
    • analyzer: remove offset_region size overloads [PR111266] · 617bd59c
      David Malcolm authored
      
      PR analyzer/111266 reports a missing -Wanalyzer-out-of-bounds when
      accessing relative to a concrete byte offset.
      
      Root cause is that offset_region::get_{byte,bit}_size_sval were
      attempting to compute the size that's valid to access, rather than the
      size of the access attempt.
      
      Fixed by removing these vfunc overrides from offset_region as the
      base class implementation does the right thing.
      
      gcc/analyzer/ChangeLog:
      	PR analyzer/111266
      	* region.cc (offset_region::get_byte_size_sval): Delete.
      	(offset_region::get_bit_size_sval): Delete.
      	* region.h (region::get_byte_size): Add comment clarifying that
      	this relates to the size of the access, rather than the size
      	that's valid to access.
      	(region::get_bit_size): Likewise.
      	(region::get_byte_size_sval): Likewise.
      	(region::get_bit_size_sval): Likewise.
      	(offset_region::get_byte_size_sval): Delete.
      	(offset_region::get_bit_size_sval): Delete.
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/111266
      	* c-c++-common/analyzer/out-of-bounds-pr111266.c: New test.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
    • testsuite: Require lra effective target for pr107385.c · 0d5d1c75
      Jakub Jelinek authored
      Old reload doesn't support asm goto with output operands.
      We have lra effective target (though, strangely it returns
      0 just for 2 targets out of at least 16 targets with no LRA support),
      so this patch uses it, similarly to how it is done in other asm goto
      tests with output operands.
      
      2024-02-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR middle-end/107385
      	* gcc.dg/pr107385.c: Require lra effective target.
    • aarch64: Fix undefined code in vect_ctz_1.c · cb805822
      Andrew Pinski authored
      
      The testcase gcc.target/aarch64/vect_ctz_1.c fails execution when running
      with -march=armv9-a because it calls __builtin_ctz with a value of 0.
      The testcase should not depend on the undefined behavior of __builtin_ctz, so
      this changes it to use the g form with a 2nd argument of 32. The execution part
      of the testcase now works. It still has a scan-assembler failure, which should
      be fixed separately.
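
A minimal sketch of the underlying issue, not the testcase itself: `__builtin_ctz` has an undefined result for a zero argument, so code that needs a value at zero has to supply one explicitly. The helper name `ctz_or` is hypothetical. GCC 14's type-generic `__builtin_ctzg` accepts such a fallback as an optional second argument; the sketch spells the check out by hand so that it does not depend on that builtin.

```cpp
#include <cstdint>

// Count trailing zero bits of x, returning at_zero when x == 0.
// Hypothetical helper: __builtin_ctz alone is undefined for x == 0.
inline int ctz_or(std::uint32_t x, int at_zero) {
    return x != 0 ? __builtin_ctz(x) : at_zero;
}
```

Called as `ctz_or(value, 32)`, the helper returns the fallback 32 for a zero input, which is the value the updated testcase expects at 0.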
      
      Tested on aarch64-linux-gnu.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/vect_ctz_1.c (TEST): Use g form of the builtin and pass 32
      	as the value expected at 0.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
    • testsuite: Define _POSIX_SOURCE for tests [PR113278] · 8e8c2d2b
      Torbjörn SVENSSON authored
      
      As the tests assume that fileno() is visible (only part of POSIX),
      define the guard to ensure that it's visible.  Currently, glibc appears
      to always have this defined in C++, newlib does not.
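
As a rough illustration (the tests themselves may differ in detail), the guard is a feature-test macro defined ahead of the system headers, after which `fileno()` is declared even by C libraries that hide POSIX names by default. The wrapper `descriptor_of` is a hypothetical name used only for this sketch.

```cpp
// Request POSIX declarations; place this ahead of the system headers.
#define _POSIX_SOURCE 1
#include <stdio.h>

// Hypothetical wrapper: return the file descriptor behind a stdio stream.
int descriptor_of(FILE *stream) {
    return fileno(stream);
}
```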
      
      Without this patch, failures like this can be seen:
      
      Testing analyzer/fileno-1.c,  -std=c++98
      .../fileno-1.c: In function 'int test_pass_through(FILE*)':
      .../fileno-1.c:5:10: error: 'fileno' was not declared in this scope
      FAIL: c-c++-common/analyzer/fileno-1.c  -std=c++98 (test for excess errors)
      
      Patch has been verified on Linux.
      
      gcc/testsuite/ChangeLog:
      	PR testsuite/113278
      	* c-c++-common/analyzer/fileno-1.c: Define _POSIX_SOURCE.
      	* c-c++-common/analyzer/flex-with-call-summaries.c: Same.
      	* c-c++-common/analyzer/flex-without-call-summaries.c: Same.
      
      Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
    • bpf: fix zero_extendqidi2 ldx template · f995f567
      David Faust authored
      Commit 77d0f9ec inadvertently changed
      the normal asm dialect instruction template for zero_extendqidi2 from
      ldxb to ldxh. Fix that.
      
      gcc/
      
      	* config/bpf/bpf.md (zero_extendqidi2): Correct asm template to
      	use ldxb instead of ldxh.
    • testsuite: Add testcase for already fixed PR [PR107385] · 5459a907
      Jakub Jelinek authored
      This testcase has been fixed by the PR113921 fix, but unlike the testcase
      there, this one is not target specific.
      
      2024-02-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR middle-end/107385
      	* gcc.dg/pr107385.c: New test.
    • expand: Fix handling of asm goto outputs vs. PHI argument adjustments [PR113921] · 2b4efc5d
      Jakub Jelinek authored
      The Linux kernel and the following testcase distilled from it are
      miscompiled, because tree-outof-ssa.cc (eliminate_phi) emits some
      fixups on some of the edges (but doesn't commit edge insertions).
      Later expand_asm_stmt emits further instructions on the same edge.
      Now the problem is that expand_asm_stmt uses insert_insn_on_edge
      to add its own fixups, but that function appends to the existing
      sequence on the edge if any.  And the bug triggers when the
      fixup sequence emitted by eliminate_phi uses a pseudo which the
      fixup sequence emitted by expand_asm_stmt later on sets.
      So, we end up with
        (set (reg A) (asm_operands ...))
      and on one of the edges queued sequence
        (set (reg C) (reg B)) // added by eliminate_phi
        (set (reg B) (reg A)) // added by expand_asm_stmt
      That is wrong, what we emit by expand_asm_stmt needs to be as close
      to the asm_operands as possible (they aren't known until expand_asm_stmt
      is called, the PHI fixup code assumes it is reg B which holds the right
      value) and the PHI adjustments need to be done after it.
      
      So, the following patch introduces a prepend_insn_to_edge function and
      uses it from expand_asm_stmt, so that we queue
        (set (reg B) (reg A)) // added by expand_asm_stmt
        (set (reg C) (reg B)) // added by eliminate_phi
      instead and so the value from the asm_operands output propagates correctly
      to the PHI result.
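
The ordering problem can be modelled in a few lines. This is a toy sketch, not GCC code: registers A, B and C are plain integers, the edge queue is a `std::deque`, and the names `Reg`, `Move` and `phi_result` are hypothetical. It assumes the asm output in A is 42 and that B holds a stale 0 until the asm fixup runs.

```cpp
#include <deque>

enum Reg { A, B, C };
struct Move { Reg dst, src; };  // models (set (reg dst) (reg src))

// Run the moves queued on one edge and return the final value of C.
// prepend_asm_fixup == false models appending the asm fixup;
// true models queueing it ahead of the PHI fixup.
inline int phi_result(bool prepend_asm_fixup) {
    int regs[3] = {42, 0, 0};           // A = asm output, B stale, C unset
    std::deque<Move> edge;
    edge.push_back({C, B});             // queued first, by PHI elimination
    if (prepend_asm_fixup)
        edge.push_front({B, A});        // asm fixup runs before the PHI copy
    else
        edge.push_back({B, A});         // asm fixup runs too late
    for (const Move& m : edge)
        regs[m.dst] = regs[m.src];
    return regs[C];
}
```

In this model, appending leaves C with the stale value of B, while prepending lets the asm output reach C.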
      
      2024-02-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR middle-end/113921
      	* cfgrtl.h (prepend_insn_to_edge): New declaration.
      	* cfgrtl.cc (insert_insn_on_edge): Clarify behavior in function
      	comment.
      	(prepend_insn_to_edge): New function.
      	* cfgexpand.cc (expand_asm_stmt): Use prepend_insn_to_edge instead of
      	insert_insn_on_edge.
      
      	* gcc.target/i386/pr113921.c: New test.
    • tree-optimization/111156 - properly dissolve SLP only groups · b312cf21
      Richard Biener authored
      The following fixes the failure to look at pattern
      stmts when we need to dissolve SLP only groups.
      
      	PR tree-optimization/111156
      	* tree-vect-loop.cc (vect_dissolve_slp_only_groups): Look
      	at the pattern stmt if any.
    • arm: testsuite: Missing optimization pattern for rev16 with thumb1 · 2acf478b
      Matthieu Longo authored
      This patch marks a rev16 test as XFAIL for architectures having only
      Thumb1 support.  The generated code is functionally correct, but the
      optimization is disabled when -mthumb is equivalent to Thumb1.  Fixing
      the root issue would require changes that are not suitable for GCC14
      stage 4.  More information at
      https://linaro.atlassian.net/browse/GNU-1141
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/arm/rev16_2.c: XFAIL when compiled with Thumb1.
    • AVR: target 113927 - Simple code triggers stack frame for Reduced Tiny. · 5cff288c
      Georg-Johann Lay authored
      The -mmcu=avrtiny cores have no ADIW and SBIW instructions.  This was
      implemented by clearing all regs out of regclass ADDW_REGS so that
      constraint "w" never matched.  This corrupted the subset relations of
      the register classes as they appear in enum reg_class.
      
      This patch keeps ADDW_REGS like for all other cores, i.e. it contains
      R24...R31.  Instead of tests like  test_hard_reg_class (ADDW_REGS, *)
      the code now uses  avr_adiw_reg_p (*).  And all insns with constraint "w"
      get "isa" insn attribute value of "adiw".
      
      Plus, a new built-in macro __AVR_HAVE_ADIW__ is provided, which is more
      specific than __AVR_TINY__.
      
      gcc/
      	PR target/113927
      	* config/avr/avr.h (AVR_HAVE_ADIW): New macro.
      	* config/avr/avr-protos.h (avr_adiw_reg_p): New proto.
      	* config/avr/avr.cc (avr_adiw_reg_p): New function.
      	(avr_conditional_register_usage) [AVR_TINY]: Don't clear ADDW_REGS.
      	Replace test_hard_reg_class (ADDW_REGS, ...) with calls to
      	avr_adiw_reg_p.
      	* config/avr/avr.md: Same.
      	(attr "isa") <tiny, no_tiny>: Remove.
      	<adiw, no_adiw>: Add.
      	(define_insn, define_insn_and_split): When an alternative has
      	constraint "w", then set attribute "isa" to "adiw".
      	* config/avr/avr-c.cc (avr_cpu_cpp_builtins) [AVR_HAVE_ADIW]:
      	Built-in define __AVR_HAVE_ADIW__.
      	* doc/invoke.texi (AVR Options): Document it.
    • amdgcn: Disallow unsupported permute on RDNA devices · 84da9bca
      Andrew Stubbs authored
      The RDNA architecture has limited support for permute operations.  This should
      allow use of the permutations that do work, and fall back to linear code for
      other cases.
      
      gcc/ChangeLog:
      
      	* config/gcn/gcn-valu.md
      	(vec_extract<V_MOV:mode><V_MOV_ALT:mode>): Add conditions for RDNA.
      	* config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Check permutation
      	details are supported on RDNA devices.
    • gccrs: Avoid *.bak suffixed tests - use dg-skip-if instead · f0b1cf01
      Jakub Jelinek authored
      On Fri, Feb 09, 2024 at 11:03:38AM +0100, Jakub Jelinek wrote:
      > On Wed, Feb 07, 2024 at 12:43:59PM +0100, arthur.cohen@embecosm.com wrote:
      > > This patch introduces one regression because generics are getting better
      > > understood over time. The code here used to apply generics with the same
      > > symbol from previous segments which was a bit of a hack with our limited
      > > inference variable support. The regression looks like it will be related
      > > to another issue which needs to default integer inference variables much
      > > more aggressively to default integer.
      > >
      > > Fixes #2723
      > >     * rust/compile/issue-1773.rs: Moved to...
      > >     * rust/compile/issue-1773.rs.bak: ...here.
      >
      > Please don't use such suffixes in the testsuite.
      > Either delete the testcase, or xfail it somehow until the bug is fixed.
      
      To be precise, I have scripts to look for backup files in the tree (*~,
      *.bak, *.orig, *.rej etc.) and this stands in the way several times a day.
      
      Here is a fix for that in patch form, tested on x86_64-linux with
      make check-rust RUNTESTFLAGS='compile.exp=issue-1773.rs'
      
      2024-02-15  Jakub Jelinek  <jakub@redhat.com>
      
      	* rust/compile/issue-1773.rs.bak: Rename to ...
      	* rust/compile/issue-1773.rs: ... this.  Add dg-skip-if directive.
    • doc: Add documentation of which operand matches the mode of the standard pattern name [PR113508] · 5329b941
      Andrew Pinski authored
      
      In some of the standard pattern names, it is not obvious which mode is being used in the pattern
      name. Is it operand 0, 1, or 2? Is it the wider mode or the narrower mode?
      This fixes that so there is no confusion by adding a sentence to some of them.
      
      Built the documentation to make sure that it builds.
      
      gcc/ChangeLog:
      
      	PR middle-end/113508
      	* doc/md.texi (sdot_prod@var{m}, udot_prod@var{m},
      	usdot_prod@var{m}, ssad@var{m}, usad@var{m}, widen_usum@var{m}3,
      	smulhs@var{m}3, umulhs@var{m}3, smulhrs@var{m}3, umulhrs@var{m}3):
      	Add sentence about what the mode m is.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
    • doc: Fix some standard named pattern documentation modes · 594829ba
      Andrew Pinski authored
      
      Currently these use `@var{m3}` but the 3 here is a literal 3
      and not part of the mode itself so it should not be inside
      the var. Fixed as such.
      
      Built the documentation to make sure it looks correct now.
      
      gcc/ChangeLog:
      
      	* doc/md.texi (widen_ssum, widen_usum, smulhs, umulhs,
      	smulhrs, umulhrs, sdiv_pow2): Move the 3 outside of the
      	var.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
    • Do not record dependences from debug stmts in tail merging · ab5fb0f9
      Richard Biener authored
      The following avoids recording BB dependences for debug stmt uses.
      
      	* tree-ssa-tail-merge.cc (same_succ_hash): Skip debug
      	stmts.
    • libstdc++: Remove redundant zeroing in std::bitset::operator>>= [PR113806] · bf883e64
      Jonathan Wakely authored
      The unused bits in the high word are already zero before this operation.
      Shifting the used bits to the right cannot affect the unused bits, so we
      don't need to sanitize them.
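
To see why no sanitizing is needed, consider a toy multi-word right shift. This is a sketch under assumptions, not the libstdc++ implementation: words are 64 bits, stored least significant first, and `shift_right` is a hypothetical name. A right shift only moves bits toward lower positions and fills the top with zeros, so bits above the used range that start as zero stay zero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Shift an N-word value right by `shift` bits (word 0 is least significant).
template <std::size_t N>
std::array<std::uint64_t, N> shift_right(std::array<std::uint64_t, N> w,
                                         std::size_t shift) {
    const std::size_t ws = shift / 64;  // whole words to drop
    const std::size_t bs = shift % 64;  // remaining bit offset
    for (std::size_t i = 0; i < N; ++i) {
        // Words past the end contribute zeros.
        const std::uint64_t lo = (i + ws < N) ? w[i + ws] : 0;
        const std::uint64_t hi = (i + ws + 1 < N) ? w[i + ws + 1] : 0;
        // Guard bs == 0: shifting a 64-bit value by 64 is undefined.
        w[i] = bs ? (lo >> bs) | (hi << (64 - bs)) : lo;
    }
    return w;
}
```

For a 70-bit value stored in two words, only the low 6 bits of the high word are in use; after any such shift the remaining 58 bits of that word are still zero.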
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/113806
      	* include/std/bitset (bitset::operator>>=): Remove redundant
      	call to _M_do_sanitize.
    • libstdc++: Use memset to optimize std::bitset::set() [PR113807] · e7ae13a8
      Jonathan Wakely authored
      As pointed out in the PR we already do this for reset().
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/113807
      	* include/std/bitset (bitset::set()): Use memset instead of a
      	loop over the individual words.
    • libstdc++: Use unsigned division in std::rotate [PR113811] · 4d819db7
      Jonathan Wakely authored
      Signed 64-bit division is much slower than unsigned, so cast the n and
      k values to unsigned before doing n %= k. We know this is safe because
      neither value can be negative.
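
A small sketch of the idea, using a hypothetical helper `positive_rem` rather than the actual `__rotate` code: when both operands are known to be non-negative, converting them to the corresponding unsigned type before `%` gives the same remainder while letting the compiler emit an unsigned division.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Remainder n % k computed in the unsigned domain.
// Precondition (assumed, as in the commit message): n >= 0 and k > 0.
inline std::ptrdiff_t positive_rem(std::ptrdiff_t n, std::ptrdiff_t k) {
    assert(n >= 0 && k > 0);
    typedef std::make_unsigned<std::ptrdiff_t>::type U;
    return static_cast<std::ptrdiff_t>(static_cast<U>(n) % static_cast<U>(k));
}
```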
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/113811
      	* include/bits/stl_algo.h (__rotate): Use unsigned values for
      	division.
    • libstdc++: Avoid aliasing violation in std::valarray [PR99117] · b58f0e52
      Jonathan Wakely authored
      The call to __valarray_copy constructs an _Array object to refer to
      this->_M_data but that means that accesses to this->_M_data are through
      a restrict-qualified pointer. This leads to undefined behaviour when
      copying from an _Expr object that actually aliases this->_M_data.
      
      Replace the call to __valarray_copy with a plain loop. I think this
      removes the only use of that overload of __valarray_copy, so it could
      probably be removed. I haven't done that here.
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/99117
      	* include/std/valarray (valarray::operator=(const _Expr&)):
      	Use loop to copy instead of __valarray_copy with _Array.
      	* testsuite/26_numerics/valarray/99117.cc: New test.
    • libstdc++: Update tzdata to 2024a · 4d6513f8
      Jonathan Wakely authored
      Import the new 2024a tzdata.zi file. The leapseconds file was also
      updated to have a new expiry (no new leap seconds were added).
      
      libstdc++-v3/ChangeLog:
      
      	* src/c++20/tzdata.zi: Import new file from 2024a release.
      	* src/c++20/tzdb.cc (tzdb_list::_Node::_S_read_leap_seconds):
      	Update expiry date for leap seconds list.
    • libstdc++: Use 128-bit arithmetic for std::linear_congruential_engine [PR87744] · c9ce332b
      Jonathan Wakely authored
      For 32-bit targets without __int128 we need to implement the LCG
      transition function by hand using 64-bit types.
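
For illustration only, here is one way to compute the LCG step `(a * x + c) mod m` using nothing wider than 64 bits. It uses a simple double-and-add multiplication, which is easy to check but is not claimed to be what libstdc++ does; the names `addmod`, `mulmod` and `lcg_next` are hypothetical.

```cpp
#include <cstdint>

// (a + b) mod m without overflow; requires a < m and b < m.
inline std::uint64_t addmod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    return a >= m - b ? a - (m - b) : a + b;
}

// (a * b) mod m by double-and-add, using only 64-bit arithmetic; m != 0.
inline std::uint64_t mulmod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    std::uint64_t result = 0;
    a %= m;
    while (b != 0) {
        if (b & 1)
            result = addmod(result, a, m);  // add the current multiple of a
        a = addmod(a, a, m);                // double a modulo m
        b >>= 1;
    }
    return result;
}

// One LCG transition: x -> (a * x + c) mod m, for a non-zero modulus m.
inline std::uint64_t lcg_next(std::uint64_t x, std::uint64_t a,
                              std::uint64_t c, std::uint64_t m) {
    return addmod(mulmod(a, x, m), c % m, m);
}
```

The loop runs once per bit of the multiplier, so this trades speed for simplicity; it is meant only to show that the step is computable without a 128-bit type.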
      
      We can also slightly simplify the __mod function by using if-constexpr
      unconditionally, disabling -Wc++17-extensions warnings with diagnostic
      pragmas.
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/87744
      	* include/bits/random.h [!__SIZEOF_INT128__] (_Select_uint_least_t):
      	Define specialization for 64-bit generators with
      	non-power-of-two modulus and large constants.
      	(__mod): Use if constexpr unconditionally.
      	* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
      	line number.
      	* testsuite/26_numerics/random/linear_congruential_engine/87744.cc:
      	New test.
    • testsuite: Fix guality/ipa-sra-1.c to work with return IPA-VRP · f0e2714f
      Martin Jambor authored
      The test guality/ipa-sra-1.c stopped working after
      r14-5628-g53ba8d669550d3 because the variable from which the values of
      removed parameters could be calculated is also removed with it.  Fixed
      with this patch which stops a function from returning a constant.
      
      I have also noticed that the XFAILed test passes at -O0 -O1 and -Og on
      all (three) targets I have tried, not just aarch64, so I extended the
      xfail exception accordingly.
      
      gcc/testsuite/ChangeLog:
      
      2024-02-14  Martin Jambor  <mjambor@suse.cz>
      
      	* gcc.dg/guality/ipa-sra-1.c (get_val1): Move up in the file.
      	(get_val2): Likewise.
      	(bar): Do not return a constant.  Extend xfail exception for all
      	targets.
    • Skip gnat.dg/div_zero.adb on RISC-V · 98e931de
      Andreas Schwab authored
      Like AArch64 and POWER, RISC-V does not support trap on zero divide.
      
      gcc/testsuite/
      	* gnat.dg/div_zero.adb: Skip on RISC-V.
    • lower-bitint: Ensure we don't get coalescing ICEs for (ab) SSA_NAMEs used in mul/div/mod [PR113567] · baa40971
      Jakub Jelinek authored
      The build_bitint_stmt_ssa_conflicts hook has a special case for
      multiplication, division and modulo, where to ensure there is no overlap
      between lhs and rhs1/rhs2 arrays we make the lhs conflict with the
      operands.
      On the following testcase, we have
        # a_1(ab) = PHI <a_2(D)(0), a_3(ab)(3)>
      lab:
        a_3(ab) = a_1(ab) % 3;
      before lowering and this special case causes a_3(ab) and a_1(ab) to
      conflict, but the PHI requires them not to conflict, so we ICE because we
      can't find some partitioning that will work.
      
      The following patch fixes this by special-casing such statements before
      the partitioning: it forces the inputs of a multiplication/division that
      has a large/huge _BitInt (ab) lhs into new non-(ab) SSA_NAMEs initialized
      right before the multiplication/division.  The partitioning can then
      succeed, because it is free to use a different partition for the */%
      operands.
      
      2024-02-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/113567
      	* gimple-lower-bitint.cc (gimple_lower_bitint): For large/huge
      	_BitInt multiplication, division or modulo with
      	SSA_NAME_OCCURS_IN_ABNORMAL_PHI lhs and at least one of rhs1 and rhs2
      	force the affected inputs into a new SSA_NAME.
      
      	* gcc.dg/bitint-90.c: New test.
    • [libiberty] remove TBAA violation in iterative_hash, improve code-gen · 52ac4c6b
      Richard Biener authored
      The following removes the TBAA violation present in iterative_hash.
      As we eventually LTO that, it's important to fix.  This also improves
      code generation for the >= 12 bytes loop by using | to compose the
      4 byte words: GCC 7 and up can recognize that pattern and perform a
      4 byte load, while the variant with + is not recognized (not on trunk
      either).  I think we have an enhancement bug for this somewhere.
      
      Given we reliably merge and the bogus "optimized" path might be
      only relevant for archs that cannot do misaligned loads efficiently
      I've chosen to keep a specialization for aligned accesses.
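
A sketch of the `|` composition described in this message, assuming little-endian order for the assembled word; `load_le32` is a hypothetical name and this is not the libiberty code. Reading through `unsigned char` is always permitted by the aliasing rules, so no TBAA violation arises.

```cpp
#include <cstdint>

// Assemble a 32-bit little-endian word from four bytes using shifts and |.
inline std::uint32_t load_le32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```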
      
      libiberty/
      	* hashtab.c (iterative_hash): Remove TBAA violating handling
      	of aligned little-endian case in favor of just keeping the
      	aligned case special-cased.  Use | for composing a larger word.
    • Daily bump. · 5266f930
      GCC Administrator authored
  2. Feb 14, 2024
    • Fortran: namelist-object-name renaming. · 8221201c
      Steve Kargl authored
      	PR fortran/105847
      
      gcc/fortran/ChangeLog:
      
      	* trans-io.cc (transfer_namelist_element): When building the
      	namelist object name, if the use rename attribute is set, use
      	the local name specified in the use statement.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/pr105847.f90: New test.
    • testsuite: Fix a couple of x86 issues in gcc.dg/vect testsuite · 430c772b
      Uros Bizjak authored
      A compile-time test can use -march=skylake-avx512 for all x86 targets,
      but a runtime test needs to check avx512f effective target if the
      instructions can be assembled.
      
      The runtime test also needs to check if the target machine supports
      instruction set we have been compiled for.  The testsuite uses check_vect
      infrastructure, but handling of AVX512F+ ISAs was missing there.
      
      Add detection of __AVX512F__ and __AVX512VL__, which is enough to handle
      all currently mentioned target processors in the gcc.dg/vect testsuite.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/pr113576.c (dg-additional-options):
      	Use -march=skylake-avx512 for avx512f effective target.
      	* gcc.dg/vect/pr98308.c (dg-additional-options):
      	Use -march=skylake-avx512 for all x86 targets.
      	* gcc.dg/vect/tree-vect.h (check_vect): Handle __AVX512F__
      	and __AVX512VL__.
    • x86: Support x32 and IBT in heap trampoline · 67ce5c97
      H.J. Lu authored
      Add x32 and IBT support to x86 heap trampoline implementation with a
      testcase.
      
      2024-02-13  Jakub Jelinek  <jakub@redhat.com>
      	    H.J. Lu  <hjl.tools@gmail.com>
      
      libgcc/
      
      	PR target/113855
      	* config/i386/heap-trampoline.c (trampoline_insns): Add IBT
      	support and pad to the multiple of 4 bytes.  Use movabsq
      	instead of movabs in comments.  Add -mx32 variant.
      
      gcc/testsuite/
      
      	PR target/113855
      	* gcc.dg/heap-trampoline-1.c: New test.
      	* lib/target-supports.exp (check_effective_target_heap_trampoline):
      	New.
    • i386: psrlq is not used for PERM<a,{0},1,2,3,4> [PR113871] · 2c2f57e4
      Uros Bizjak authored
      Introduce vec_shl_<mode> and vec_shr_<mode> expanders to improve
      
      	'*a = __builtin_shufflevector(*a, (vect64){0}, 1, 2, 3, 4);'
      
      and
      	'*a = __builtin_shufflevector((vect64){0}, *a, 3, 4, 5, 6);'
      
      shuffles.  The generated code improves from:
      
      	movzwl  6(%rdi), %eax
      	movzwl  4(%rdi), %edx
      	salq    $16, %rax
      	orq     %rdx, %rax
      	movzwl  2(%rdi), %edx
      	salq    $16, %rax
      	orq     %rdx, %rax
      	movq    %rax, (%rdi)
      
      to:
      	movq    (%rdi), %xmm0
      	psrlq   $16, %xmm0
      	movq    %xmm0, (%rdi)
      
      and to:
      	movq    (%rdi), %xmm0
      	psllq   $16, %xmm0
      	movq    %xmm0, (%rdi)
      
      in the second case.
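
Why a single shift suffices can be checked with a scalar model. The sketch below is not the expander: it assumes four 16-bit lanes with lane i occupying bits 16*i to 16*i+15 of the 64-bit value (the x86 layout), and the names `Lanes`, `perm_1234`, `perm_3456`, `pack` and `unpack` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::uint16_t, 4>;

// PERM<a, {0}, 1, 2, 3, 4>: elements 1..4 of the concatenation a ++ zero.
inline Lanes perm_1234(const Lanes& a) { return Lanes{{a[1], a[2], a[3], 0}}; }

// PERM<{0}, a, 3, 4, 5, 6>: elements 3..6 of the concatenation zero ++ a.
inline Lanes perm_3456(const Lanes& a) { return Lanes{{0, a[0], a[1], a[2]}}; }

// Pack lanes into one 64-bit value, lane i at bits [16*i, 16*i + 15].
inline std::uint64_t pack(const Lanes& a) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v |= static_cast<std::uint64_t>(a[i]) << (16 * i);
    return v;
}

// Split a 64-bit value back into its four 16-bit lanes.
inline Lanes unpack(std::uint64_t v) {
    Lanes a{};
    for (std::size_t i = 0; i < 4; ++i)
        a[i] = static_cast<std::uint16_t>(v >> (16 * i));
    return a;
}
```

Under this lane layout, `perm_1234(a)` equals `unpack(pack(a) >> 16)` and `perm_3456(a)` equals `unpack(pack(a) << 16)`, which corresponds to the `psrlq $16` and `psllq $16` sequences shown in the message.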
      
      The patch handles 32-bit vectors as well and improves generated code from:
      
      	movd    (%rdi), %xmm0
      	pxor    %xmm1, %xmm1
      	punpcklwd       %xmm1, %xmm0
      	pshuflw $230, %xmm0, %xmm0
      	movd    %xmm0, (%rdi)
      
      to:
      	movd    (%rdi), %xmm0
      	psrld   $16, %xmm0
      	movd    %xmm0, (%rdi)
      
      and to:
      	movd    (%rdi), %xmm0
      	pslld   $16, %xmm0
      	movd    %xmm0, (%rdi)
      
      	PR target/113871
      
      gcc/ChangeLog:
      
      	* config/i386/mmx.md (V248FI): New mode iterator.
      	(V24FI_32): Ditto.
      	(vec_shl_<V248FI:mode>): New expander.
      	(vec_shl_<V24FI_32:mode>): Ditto.
      	(vec_shr_<V248FI:mode>): Ditto.
      	(vec_shr_<V24FI_32:mode>): Ditto.
      	* config/i386/sse.md (vec_shl_<V_128:mode>): Simplify expander.
      	(vec_shr_<V248FI:mode>): Ditto.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr113871-1a.c: New test.
      	* gcc.target/i386/pr113871-1b.c: New test.
      	* gcc.target/i386/pr113871-2a.c: New test.
      	* gcc.target/i386/pr113871-2b.c: New test.
      	* gcc.target/i386/pr113871-3a.c: New test.
      	* gcc.target/i386/pr113871-3b.c: New test.
      	* gcc.target/i386/pr113871-4a.c: New test.
    • PR other/113336: Fix libatomic testsuite regressions on ARM. · ea767576
      Roger Sayle authored
      This patch is a revised version of the fix for PR other/113336.
      Bootstrapping GCC on arm-linux-gnueabihf with --with-arch=armv6 currently
      has a large number of FAILs in libatomic (regressions since last time I
      attempted this).  The failure mode is related to IFUNC handling with the
      file tas_8_2_.o containing an unresolved reference to the function
      libat_test_and_set_1_i2.
      
      The following one line change, to build tas_1_2_.o when building tas_8_2_.o,
      resolves the problem for me and restores the libatomic testsuite to 44
      expected passes and 5 unsupported tests [from 22 unexpected failures
      and 22 unresolved testcases].
      
      2024-02-14  Roger Sayle  <roger@nextmovesoftware.com>
      	    Victor Do Nascimento  <victor.donascimento@arm.com>
      
      libatomic/ChangeLog
      	PR other/113336
      	* Makefile.am: Build tas_1_2_.o on ARCH_ARM_LINUX
      	* Makefile.in: Regenerate.
    • c++: Defer emitting inline variables [PR113708] · dd9d14f7
      Nathaniel Shead authored
      
      Inline variables are vague-linkage, and may or may not need to be
      emitted in any TU that they are part of, similarly to e.g. template
      instantiations.
      
      Currently 'import_export_decl' assumes that inline variables have
      already been emitted when it comes to end-of-TU processing, and so
      crashes when importing non-trivially-initialised variables from a
      module, as they have not yet been finalised.
      
      This patch fixes this by ensuring that inline variables are always
      deferred till end-of-TU processing, unifying the behaviour for module
      and non-module code.
      
      	PR c++/113708
      
      gcc/cp/ChangeLog:
      
      	* decl.cc (make_rtl_for_nonlocal_decl): Defer inline variables.
      	* decl2.cc (import_export_decl): Support inline variables.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/debug/dwarf2/inline-var-1.C: Reference 'a' to ensure it
      	is emitted.
      	* g++.dg/debug/dwarf2/inline-var-3.C: Likewise.
      	* g++.dg/modules/init-7_a.H: New test.
      	* g++.dg/modules/init-7_b.C: New test.
      
      Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
    • aarch64/testsuite: Remove dg-excess-errors from c-c++-common/gomp/pr63328.c... · 2b5e0c11
      Andrew Pinski authored
      aarch64/testsuite: Remove dg-excess-errors from c-c++-common/gomp/pr63328.c and gcc.dg/gomp/pr87895-2.c [PR113861]
      
      These now pass after r14-6416-gf5fc001a84a7db so let's remove the dg-excess-errors from them.
      
      Committed as obvious after a test for aarch64-linux-gnu.
      
      gcc/testsuite/ChangeLog:
      
      	PR testsuite/113861
      	* c-c++-common/gomp/pr63328.c: Remove dg-excess-errors.
      	* gcc.dg/gomp/pr87895-2.c: Likewise.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
    • Fix ICE in loop splitting with -fno-guess-branch-probability · 8d51bfe0
      Jan Hubicka authored
      	PR tree-optimization/111054
      
      gcc/ChangeLog:
      
      	* tree-ssa-loop-split.cc (split_loop): Check for profile being present.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.c-torture/compile/pr111054.c: New test.
    • middle-end: inspect all exits for additional annotations for loop. · 16ae5efe
      Tamar Christina authored
      Attaching a pragma to a loop which has a complex condition often gets the pragma
      dropped. e.g.
      
      #pragma GCC novector
        while (i < N && parse_tables_n--)
      
      before lowering this is represented as:
      
       if (ANNOTATE_EXPR <i <= 305 && parse_tables_n--  != 0, no-vector>) ...
      
      But after lowering the condition is broken apart and attached to the final
      component of the expression:
      
        if (parse_tables_n.2_2 != 0) goto <D.4456>; else goto <D.4453>;
        <D.4456>:
          iftmp.1D.4452 = 1;
          goto <D.4454>;
        <D.4453>:
          iftmp.1D.4452 = 0;
        <D.4454>:
          D.4451 = .ANNOTATE (iftmp.1D.4452, 2, 0);
          if (D.4451 != 0) goto <D.4442>; else goto <D.4440>;
        <D.4440>:
      
      and it's never heard from again because during replace_loop_annotate we only
      inspect the loop header and latch for annotations.
      
      Since annotations were supposed to apply to the loop as a whole, this fixes it
      by checking the loop exit src blocks for annotations instead.
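
A minimal sketch of the kind of loop described above, for illustration only. The function name `sum_while_budget` and its parameters are hypothetical and do not come from the new test; the point is simply a `while` loop with a two-part `&&` condition carrying `#pragma GCC novector`. Compilers that do not know this pragma are expected to ignore it, possibly with a warning.

```c
#include <stddef.h>

/* Hypothetical helper: sums entries of A until either the index bound N
   or the countdown BUDGET stops the loop, so the loop has two exit
   conditions joined by &&.  */
int
sum_while_budget (const int *a, size_t n, int budget)
{
  int sum = 0;
  size_t i = 0;

  /* The annotation is meant to apply to the loop as a whole, not only
     to the last component of the && condition.  */
#pragma GCC novector
  while (i < n && budget--)
    sum += a[i++];

  return sum;
}
```

The pragma does not change the result of the loop; it only asks the compiler not to vectorize it, which is why the annotation has to survive the splitting of the condition.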
      
      gcc/ChangeLog:
      
      	* tree-cfg.cc (replace_loop_annotate): Inspect loop edges for annotations.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/vect-novect_gcond.c: New test.
      16ae5efe
    • Jerry DeLisle's avatar
      Fortran: Implement read_x for UTF-8 encoded files. · b79d3e6a
      Jerry DeLisle authored
      	PR fortran/99210
      
      libgfortran/ChangeLog:
      
      	* io/read.c (read_x): If UTF-8 encoding is enabled, use
      	read_utf8 to move one character over in the read buffer.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/pr99210.f90: New test.
      b79d3e6a
    • Jonathan Yong's avatar
      coreutils-sum-pr108666.c: fix spurious LLP64 warnings · eafbb05c
      Jonathan Yong authored
      
      Fixes the following warnings on x86_64-w64-mingw32:
      coreutils-sum-pr108666.c:17:1: warning: conflicting types for built-in function ‘memcpy’; expected ‘void *(void *, const void *, long long unsigned int)’ [-Wbuiltin-declaration-mismatch]
         17 | memcpy(void* __restrict __dest, const void* __restrict __src, size_t __n)
            | ^~~~~~
      
      coreutils-sum-pr108666.c:25:1: warning: conflicting types for built-in function ‘malloc’; expected ‘void *(long long unsigned int)’ [-Wbuiltin-declaration-mismatch]
         25 | malloc(size_t __size) __attribute__((__nothrow__, __leaf__))
            | ^~~~~~
      
      gcc/testsuite:
      
      	* c-c++-common/analyzer/coreutils-sum-pr108666.c: Use
      	__SIZE_TYPE__ instead of long unsigned int for size_t
      	definition.
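
As a rough illustration of the change (not the testcase itself), the sketch below defines a size type from `__SIZE_TYPE__`, a macro predefined by GCC-compatible compilers, instead of hard-coding `long unsigned int`. The names `my_size_t` and `copy_bytes` are made up for this example. On LLP64 targets such as x86_64-w64-mingw32, `long unsigned int` is 32 bits wide while the built-in `memcpy` expects a 64-bit size argument, which is what triggers the warnings quoted above.

```c
/* Hypothetical typedef: take the size type from the compiler rather
   than assuming it is 'long unsigned int'.  */
typedef __SIZE_TYPE__ my_size_t;

/* This declaration matches the built-in prototype on both LP64 and
   LLP64 targets, so -Wbuiltin-declaration-mismatch stays quiet.  */
void *memcpy (void *__restrict dest, const void *__restrict src,
              my_size_t n);

/* Hypothetical wrapper used only to exercise the declaration.  */
void *
copy_bytes (void *dest, const void *src, my_size_t n)
{
  return memcpy (dest, src, n);
}
```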
      
      Signed-off-by: Jonathan Yong <10walls@gmail.com>
      eafbb05c
    • Patrick Palka's avatar
      c++: synthesized_method_walk context independence [PR113908] · 9bc6b23d
      Patrick Palka authored
      
      In the second testcase below, during ahead of time checking of the
      non-dependent new-expr we synthesize B's copy ctor, which we expect to
      get defined as deleted since A's copy ctor is inaccessible.  But during
      access checking thereof, enforce_access incorrectly decides to defer it
      since we're in a template context according to current_template_parms
      (before r14-557 it checked processing_template_decl which got cleared
      from implicitly_declare_fn), which leads to the access check leaking out
      to the template context that triggered the synthesization, and B's copy
      ctor getting declared as non-deleted.
      
      This patch fixes this by using maybe_push_to_top_level to clear the
      context (including current_template_parms) before proceeding with the
      synthesization.  We could do this from implicitly_declare_fn, but it's
      better to do it more generally from synthesized_method_walk for the sake of
      its other callers.
      
      This turns out to fix PR113332 as well: there the lambda context
      triggering synthesization was causing maybe_dummy_object to misbehave,
      but now synthesization is sufficiently context-independent.
      
      	PR c++/113908
      	PR c++/113332
      
      gcc/cp/ChangeLog:
      
      	* method.cc (synthesized_method_walk): Use maybe_push_to_top_level.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp0x/lambda/lambda-nsdmi11.C: New test.
      	* g++.dg/template/non-dependent31.C: New test.
      
      Reviewed-by: Jason Merrill <jason@redhat.com>
      9bc6b23d
    • Richard Biener's avatar
      tree-optimization/113910 - huge compile time during PTA · ad7a365a
      Richard Biener authored
      For the testcase in PR113910 we spend a lot of time in PTA comparing
      bitmaps for looking up equivalence class members.  This points to
      the very weak bitmap_hash function which effectively hashes set
      and a subset of not set bits.
      
      The major problem with it is that it simply truncates the
      BITMAP_WORD sized intermediate hash to hashval_t which is
      unsigned int, effectively not hashing half of the bits.
      
      Mixing the full element hash into the hashval_t hash reduces the
      compile-time for the testcase from tens of minutes to 42 seconds and
      PTA time from 99% to 46%.
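
The truncation problem can be sketched in isolation as follows. This is not the code in bitmap.cc: `hashval`, `truncate_hash` and `fold_hash` are hypothetical names, and the XOR fold is only one simple way to let the upper half of a 64-bit intermediate value influence a 32-bit result.

```c
#include <stdint.h>

/* Stand-in for a 32-bit hash value type (an assumption made for this
   sketch).  */
typedef uint32_t hashval;

/* Plain truncation: the upper 32 bits of H never reach the result.  */
hashval
truncate_hash (uint64_t h)
{
  return (hashval) h;
}

/* Fold the upper half into the lower half before narrowing, so that
   every bit of H can affect the result.  */
hashval
fold_hash (uint64_t h)
{
  return (hashval) (h ^ (h >> 32));
}
```

With truncation, two intermediate values that differ only in their upper 32 bits always collide; with the fold they usually do not, which keeps hash-table lookups from degenerating into long chains of full bitmap comparisons.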
      
      	PR tree-optimization/113910
      	* bitmap.cc (bitmap_hash): Mix the full element "hash" to
      	the hashval_t hash.
      ad7a365a