  1. Feb 09, 2023
    • Tobias Burnus's avatar
      OpenMP/Fortran: Partially fix non-rect loop nests [PR107424] · ac294957
      Tobias Burnus authored
      This patch ensures that loop bounds depending on outer loop vars use the
      proper TREE_VEC format. It additionally gives a sorry if such an outer
      var has a non-one/non-minus-one increment as currently a count variable
      is used in this case (see PR).
      
      Finally, it avoids 'count' and just uses a local loop variable if the
      step increment is +/-1.
      
      	PR fortran/107424
      
      gcc/fortran/ChangeLog:
      
      	* trans-openmp.cc (struct dovar_init_d): Add 'sym' and
      	'non_unit_incr' members.
      	(gfc_nonrect_loop_expr): New.
      	(gfc_trans_omp_do): Call it; use normal loop bounds
      	for unit stride - and only create local loop var.
      
      libgomp/ChangeLog:
      
      	* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: New test.
      	* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: New test.
      	* testsuite/libgomp.fortran/non-rectangular-loop-2.f90: New test.
      	* testsuite/libgomp.fortran/non-rectangular-loop-3.f90: New test.
      	* testsuite/libgomp.fortran/non-rectangular-loop-4.f90: New test.
      	* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: New test.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Update dg-note.
      	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise.
      ac294957
    • Martin Liska's avatar
      docs: add caveat for __builtin_cpu_supports · 1189d1b3
      Martin Liska authored
      Document that the function does not work correctly for old
      VIA processors.
      
      	PR target/100758
      
      gcc/ChangeLog:
      
      	* doc/extend.texi: Document that the function
      	does not work correctly for old VIA processors.
      1189d1b3
    • Tobias Burnus's avatar
      OpenMP: Parse align clause in allocate directive in C/C++ · 1eb78a93
      Tobias Burnus authored
      gcc/c/ChangeLog:
      
      	* c-parser.cc (c_parser_omp_allocate): Parse align
      	clause and check for restrictions.
      
      gcc/cp/ChangeLog:
      
      	* parser.cc (cp_parser_omp_allocate): Parse align
      	clause and check for restrictions.
      
      gcc/testsuite/ChangeLog:
      
      	* c-c++-common/gomp/allocate-5.c: Extend for align clause.
      1eb78a93
    • Tobias Burnus's avatar
      Fortran/OpenMP: Fix -fopenmp-simd for 'omp assume(s)' · ae091a44
      Tobias Burnus authored
      While 'omp assume' is enabled by -fopenmp-simd, 'omp assumes' is not;
      however, due to the way parsing works in Fortran (esp. for fixed-form
      source code), 'assumes' was parsed by 'assume' which then stumbled over
      the trailing 's'.
      
      gcc/fortran/
      
      	* parse.cc (decode_omp_directive): Really ignore 'assumes' with
      	-fopenmp-simd.
      
      gcc/testsuite/
      
      	* gfortran.dg/gomp/openmp-simd-8.f90: New test.
      ae091a44
    • Andreas Schwab's avatar
      lto-wrapper: Pass through -funwind-tables and -fasynchronous-unwind-tables · 9453e3cd
      Andreas Schwab authored
      The -funwind-tables and -fasynchronous-unwind-tables options are relevant
      for the output pass, so they need to be passed through by the LTO wrapper.
      Otherwise, dwarf2out_assembly_start may output a ".cfi_sections
      .debug_frame" directive when debug info is enabled even if every
      translation unit was compiled with -funwind-tables.
      
      gcc/
      	* lto-wrapper.cc (merge_and_complain): Handle
      	-funwind-tables and -fasynchronous-unwind-tables.
      	(append_compiler_options): Likewise.
      9453e3cd
    • Jakub Jelinek's avatar
      c++: Mangle EXCESS_PRECISION_EXPR <REAL_CST> as fold_convert REAL_CST [PR108698] · b1ed0c96
      Jakub Jelinek authored
      For standard excess precision, like the C FE we parse floating
      point constants as EXCESS_PRECISION_EXPR of promoted REAL_CST
      rather than the nominal REAL_CST, and as the following testcase
      shows the constants might need mangling.
      
      The following patch mangles those as fold_convert of the REAL_CST
      to EXCESS_PRECISION_EXPR type, i.e. how they were mangled before.
      
      I'm not really sure EXCESS_PRECISION_EXPR can appear elsewhere
      in expressions that would need mangling, tried various testcases
      but haven't managed to come up with one.  If that is possible,
      we'd keep ICEing on it without/with this patch, and the big question
      is how to mangle those; they could be mangled as casts from the
      promoted type back to nominal, but then in the mangled expressions
      one could see the effects of excess precision.  Until we have
      a reproducer, that is just theoretical though.
      
      2023-02-09  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/108698
      	* mangle.cc (write_expression, write_template_arg): Handle
      	EXCESS_PRECISION_EXPR with REAL_CST operand as
      	write_template_arg_literal on fold_convert of the REAL_CST
      	to EXCESS_PRECISION_EXPR type.
      
      	* g++.dg/cpp0x/pr108698.C: New test.
      b1ed0c96
    • Richard Biener's avatar
      tree-optimization/26854 - slow bitmap operations · 4b19ff1b
      Richard Biener authored
      With the compiler.i testcase from the PR one can see bitmap_set_bit
      very high in the profile, originating from SSA update and alias
      stmt walking.  For SSA update mark_block_for_update essentially
      performs redundant bitmap_set_bits and is called via
      insert_updated_phi_nodes_for as
      
            EXECUTE_IF_SET_IN_BITMAP (pruned_idf, 0, i, bi)
      ...
                mark_block_for_update (bb);
                FOR_EACH_EDGE (e, ei, bb->preds)
                  if (e->src->index >= 0)
                    mark_block_for_update (e->src);
      
      which is quite random in the access pattern and runs into the
      O(n) case of the linked list bitmap representation.  Switching
      blocks_to_update to tree view around insert_updated_phi_nodes_for
      improves SSA update time from
      
       tree SSA incremental               :   4.26 (  3%)
      
      to
      
       tree SSA incremental               :   2.98 (  2%)
      
      Likewise the visited bitmap allocated by the alias walker benefits
      from using the tree view in case of large CFGs and we see an
      improvement from
      
       alias stmt walking                 :  10.53 (  9%)
      
      to
      
       alias stmt walking                 :   4.05 (  4%)
      
      	PR tree-optimization/26854
      	* tree-into-ssa.cc (update_ssa): Turn blocks_to_update to tree
      	view around insert_updated_phi_nodes_for.
      	* tree-ssa-alias.cc (maybe_skip_until): Allocate visited bitmap
      	in tree view.
      	(walk_aliased_vdefs_1): Likewise.
      4b19ff1b
    • GCC Administrator's avatar
      Daily bump. · f6fc79d0
      GCC Administrator authored
      f6fc79d0
  2. Feb 08, 2023
    • Joseph Myers's avatar
      c: Update checks on constexpr pointer initializers · 53678f7f
      Joseph Myers authored
      WG14 has agreed a change of the rules on constexpr pointer
      initializers, so that a (constant) null value that is not a null
      pointer constant is accepted in that context, rather than only
      accepting null pointer constants.  (In particular, this means that a
      constexpr variable of pointer type can be used to initialize another
      such variable.)  Remove the null pointer constant restriction in GCC,
      instead checking just whether the value is null.
      
      Bootstrapped with no regressions for x86_64-pc-linux-gnu.
      
      gcc/c/
      	* c-typeck.cc (check_constexpr_init): Remove argument
      	null_pointer_constant.  Only check pointer initializers for being
      	null.
      	(digest_init): Update calls to check_constexpr_init.
      
      gcc/testsuite/
      	* gcc.dg/c2x-constexpr-1.c: Test initialization of constexpr
      	pointers with null values that are not null pointer constants.
      	* gcc.dg/c2x-constexpr-3.c: Test initialization of constexpr
      	pointers with non-null values, not with null values that are not
      	null pointer constants.
      53678f7f
    • Gerald Pfeifer's avatar
      doc: Change fsf.org to www.fsf.org · 1a49390f
      Gerald Pfeifer authored
      fsf.org has been serving a 301 (permanent redirect) http response for
      a long while.
      
      gcc/ChangeLog:
      
      	* doc/include/gpl_v3.texi: Change fsf.org to www.fsf.org.
      1a49390f
    • Hans-Peter Nilsson's avatar
      testsuite: Fix asm-goto-with-outputs tests; limit to lra targets · 70888d09
      Hans-Peter Nilsson authored
      These tests spuriously lacked a "lra" limiter.  Code using
      "asm goto" with outputs gets a:
       error: the target does not support 'asm goto' with outputs in 'asm'
      compilation error when compiled for a non-LRA target.  Limit
      to LRA targets as other asm-goto-with-outputs tests.
      
      	* gcc.dg/torture/pr100398.c: Limit to lra targets.
      	* gcc.dg/pr100590.c: Ditto.
      70888d09
    • David Malcolm's avatar
      analyzer: fix overzealous state purging with on-stack structs [PR108704] · 77bb54b1
      David Malcolm authored
      
      PR analyzer/108704 reports many false positives seen from
      -Wanalyzer-use-of-uninitialized-value on qemu's softfloat.c on code like
      the following:
      
         struct st s;
         s = foo ();
         s = bar (s); // bogusly reports that s is uninitialized here
      
      where e.g. "struct st" is "floatx80" in the qemu examples.
      
      The root cause is overzealous purging of on-stack structs in the code I
      added in r12-7718-gfaacafd2306ad7, where at:
      
      	s = bar (s);
      
      state_purge_per_decl::process_point_backwards "sees" the assignment to 's'
      and stops processing, effectively treating 's' as unneeded before this
      stmt, not noticing the use of 's' in the argument.
      
      Fixed thusly.
      
      The patch greatly reduces the number of
      -Wanalyzer-use-of-uninitialized-value warnings from my integration tests:
        ImageMagick-7.1.0-57:  10 ->  6   (-4)
                    qemu-7.2: 858 -> 87 (-771)
               haproxy-2.7.1:   1 ->  0   (-1)
      All of the above that I've examined appear to be false positives.
      
      gcc/analyzer/ChangeLog:
      	PR analyzer/108704
      	* state-purge.cc (state_purge_per_decl::process_point_backwards):
      	Don't stop processing the decl if it's fully overwritten by
      	this stmt if it's also used by this stmt.
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/108704
      	* gcc.dg/analyzer/uninit-7.c: New test.
      	* gcc.dg/analyzer/uninit-pr108704.c: New test.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
      77bb54b1
    • Srinath Parvathaneni's avatar
      arm: Optimize arm-mlib.h header inclusion [pr108505]. · 2eeda82d
      Srinath Parvathaneni authored
      I have committed a fix [1] into gcc trunk for a build
      issue mentioned in pr108505 and later received a few upstream
      comments proposing a more robust fix for this issue.
      
      In this patch I'm addressing those comments and sending this
      as a followup patch.
      
      gcc/ChangeLog:
      
      2023-01-27  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>
      
      	PR target/108505
      	* config.gcc (tm_mlib_file): Define new variable.
      2eeda82d
    • Steve Kargl's avatar
      Fortran: error handling of global entity appearing in COMMON block [PR103259] · 7e9f20f5
      Steve Kargl authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/103259
      	* resolve.cc (resolve_common_vars): Avoid NULL pointer dereference
      	when a symbol's location is not set.
      
      gcc/testsuite/ChangeLog:
      
      	PR fortran/103259
      	* gfortran.dg/pr103259.f90: New test.
      7e9f20f5
    • Jakub Jelinek's avatar
      vect-patterns: Fix up vect_widened_op_tree [PR108692] · 6ad1c102
      Jakub Jelinek authored
      The following testcase is miscompiled on aarch64-linux since r11-5160.
      Given
        <bb 3> [local count: 955630225]:
        # i_22 = PHI <i_20(6), 0(5)>
        # r_23 = PHI <r_19(6), 0(5)>
      ...
        a.0_5 = (unsigned char) a_15;
        _6 = (int) a.0_5;
        b.1_7 = (unsigned char) b_17;
        _8 = (int) b.1_7;
        c_18 = _6 - _8;
        _9 = ABS_EXPR <c_18>;
        r_19 = _9 + r_23;
      ...
      where SSA_NAMEs 15/17 have signed char, 5/7 unsigned char and rest is int
      we first pattern recognize c_18 as
      patt_34 = (a.0_5) w- (b.1_7);
      which is still correct, 5/7 are unsigned char subtracted in wider type,
      but then vect_recog_sad_pattern turns it into
      SAD_EXPR <a_15, b_17, r_23>
      which is incorrect, because 15/17 are signed char and so it is
      sum of absolute signed differences rather than unsigned sum of
      absolute unsigned differences.
      The reason why this happens is that vect_recog_sad_pattern calls
      vect_widened_op_tree with MINUS_EXPR, WIDEN_MINUS_EXPR on the
      patt_34 = (a.0_5) w- (b.1_7); statement's vinfo and vect_widened_op_tree
      calls vect_look_through_possible_promotion on the operands of the
      WIDEN_MINUS_EXPR, which looks through the further casts.
      vect_look_through_possible_promotion has careful code to stop when there
      would be nested casts that need to be preserved, but the problem here
      is that the WIDEN_*_EXPR operation itself has an implicit cast on the
      operands already - in this case of WIDEN_MINUS_EXPR the unsigned char
      5/7 SSA_NAMEs are widened to unsigned short before the subtraction,
      and vect_look_through_possible_promotion obviously isn't told about that.
      
      Now, I think when we see those WIDEN_{MULT,MINUS,PLUS}_EXPR codes, we had
      to look through possible promotions already when creating those and so
      vect_look_through_possible_promotion again isn't really needed, all we need
      to do is arrange what that function will do if the operand isn't the result
      of any cast.  The other option would be to let vect_look_through_possible_promotion
      know about the implicit promotion from the WIDEN_*_EXPR, but I'm afraid
      that would be much harder.
      
      2023-02-08  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/108692
      	* tree-vect-patterns.cc (vect_widened_op_tree): If rhs_code is
      	widened_code which is different from code, don't call
      	vect_look_through_possible_promotion but instead just check op is
      	SSA_NAME with integral type for which vect_is_simple_use is true
      	and call set_op on this_unprom.
      
      	* gcc.dg/pr108692.c: New test.
      6ad1c102
    • Andrea Corallo's avatar
      aarch64: Fix return_address_sign_ab_exception.C regression · b1d26458
      Andrea Corallo authored
      Hi all,
      
      this is to fix the regression of
      g++.target/aarch64/return_address_sign_ab_exception.C that I
      introduced with d8dadbc9.
      
      'aarch_ra_sign_key' for aarch64 ended up being undefined in the opt
      file and the function attribute "branch-protection=pac-ret+leaf+b-key"
      stopped working as expected.
      
      This patch moves the definition of 'aarch_ra_sign_key' to the opt
      files for both Arm back-ends.
      
      Regards
      
        Andrea Corallo
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-protos.h (aarch_ra_sign_key): Remove
      	declaration.
      	* config/aarch64/aarch64.cc (aarch_ra_sign_key): Remove
      	definition.
      	* config/aarch64/aarch64.opt (aarch64_ra_sign_key): Rename
      	to 'aarch_ra_sign_key'.
      	* config/arm/aarch-common.cc (aarch_ra_sign_key): Remove
      	declaration.
      	* config/arm/arm-protos.h (aarch_ra_sign_key): Likewise.
      	* config/arm/arm.cc (enum aarch_key_type): Remove definition.
      	* config/arm/arm.opt: Define.
      b1d26458
    • Richard Sandiford's avatar
      testsuite: Import objc-dg-prune in execute.exp · 3d451c42
      Richard Sandiford authored
      The GCC-local definition of gcc-dg-prune removes extra error messages,
      such as one from the linker warning about executable stacks.  This is
      then used by tool-specific pruners like objc-dg-prune, defined in
      objc-dg.exp.  However, objc/execute/execute.exp didn't include
      objc-dg.exp, meaning that the linker warning could trigger a
      failure in objc/execute/nested-func-1.m.
      
      gcc/testsuite/
      	* objc/execute/execute.exp: Load objc-dg.exp.
      3d451c42
    • Richard Sandiford's avatar
      vect: Check gather/scatter offset types [PR108316] · 740a3be7
      Richard Sandiford authored
      The gather/scatter support can over-widen an offset if the target
      requires it, but this relies on using a pattern sequence to add
      the widening conversion.  That failed in the testcase because an
      earlier pattern (bool) took priority.
      
      I think we should allow patterns to be applied to other patterns,
      but that's quite an invasive change and isn't suitable for stage 4.
      This patch instead punts if the offset type doesn't match the
      expected one.
      
      If we switched to using the SLP representation for everything,
      we would probably handle both patterns by rewriting the graph,
      which should be much easier.
      
      gcc/
      	PR tree-optimization/108316
      	* tree-vect-stmts.cc (get_load_store_type): When using
      	internal functions for gather/scatter, make sure that the type
      	of the offset argument is consistent with the offset vector type.
      
      gcc/testsuite/
      	PR tree-optimization/108316
      	* gcc.dg/vect/pr108316.c: New test.
      740a3be7
    • Jakub Jelinek's avatar
      testsuite: Fix up PR108525 test [PR108525] · a58a4a57
      Jakub Jelinek authored
      Seems when committing the PR108525 fix I've missed that a test with
      the same name had been added a few hours before for PR108526.
      
      This patch separates the PR108525 test into a new file.
      
      2023-02-08  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/108525
      	* g++.dg/cpp23/static-operator-call5.C: Move PR108525 testcase
      	incorrectly applied into PR108526 testcase ...
      	* g++.dg/cpp23/static-operator-call6.C: ... here.  New test.
      a58a4a57
    • Jakub Jelinek's avatar
      tree.def: Remove outdated comment on SAD_EXPR · aa12d1b1
      Jakub Jelinek authored
      While looking at PR108692, I've noticed SAD_EXPR comment mentions that
      WIDEN_MINUS_EXPR is missing, which is not true anymore since r11-5160.
      
      The following patch just removes that part of the comment.
      
      2023-02-08  Jakub Jelinek  <jakub@redhat.com>
      
      	* tree.def (SAD_EXPR): Remove outdated comment about missing
      	WIDEN_MINUS_EXPR.
      aa12d1b1
    • GCC Administrator's avatar
      Daily bump. · 8f3b85ef
      GCC Administrator authored
      8f3b85ef
  3. Feb 07, 2023
    • Thomas Schwinge's avatar
      Fix 'libgomp.fortran/reverse-offload-6.f90' nvptx offloading compilation · 7ab75a6e
      Thomas Schwinge authored
      Fix-up for recent commit 0b1ce70a
      "libgomp: Fix reverse offload issues".
      
      	libgomp/
      	* testsuite/libgomp.fortran/reverse-offload-6.f90: Fix nvptx
      	offloading compilation.
      7ab75a6e
    • David Malcolm's avatar
      analyzer: fix -Wanalyzer-use-of-uninitialized-value false +ve on "read" [PR108661] · c300e251
      David Malcolm authored
      
      My integration testing shows many false positives from
      -Wanalyzer-use-of-uninitialized-value.
      
      One cause turns out to be that as of r13-1404-g97baacba963c06
      fd_state_machine::on_stmt recognizes calls to "read", and returns true,
      so that region_model::on_call_post doesn't call handle_unrecognized_call
      on them, and so the analyzer erroneously "thinks" that the buffer
      passed to "read" is never touched by the "read" call.
      
      This works for "fread" because sm-file.cc implements kf_fread, which
      handles calls to "fread" by clobbering the buffer pointed to.  In the
      long term we should probably be smarter about this and bifurcate the
      analysis to consider e.g. errors vs full reads vs partial reads, etc
      (which I'm tracking in PR analyzer/108689).
      
      In the meantime, this patch adds a kf_read for "read" analogous to the
      one for "fread", fixing 6 false positives seen in git-2.39.0 and
      2 in haproxy-2.7.1.
      
      gcc/analyzer/ChangeLog:
      	PR analyzer/108661
      	* sm-fd.cc (class kf_read): New.
      	(register_known_fd_functions): Register "read".
      	* sm-file.cc (class kf_fread): Update comment.
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/108661
      	* gcc.dg/analyzer/fread-pr108661.c: New test.
      	* gcc.dg/analyzer/read-pr108661.c: New test.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
      c300e251
    • Harald Anlauf's avatar
      Fortran: ASSOCIATE variables should not be TREE_STATIC [PR95107] · c36f3da5
      Harald Anlauf authored
      gcc/fortran/ChangeLog:
      
      	PR fortran/95107
      	* trans-decl.cc (gfc_finish_var_decl): With -fno-automatic, do not
      	make ASSOCIATE variables TREE_STATIC.
      
      gcc/testsuite/ChangeLog:
      
      	PR fortran/95107
      	* gfortran.dg/save_7.f90: New test.
      c36f3da5
    • Marek Polacek's avatar
      doc: Update -fchar8_t documentation · 8bc87173
      Marek Polacek authored
      Since C++20 P2513R4, char8_t Compatibility and Portability Fix it is
      no longer true that
      
        char ca[] = u8"xx";
      
      causes an error so adjust the example for -fchar8_t.
      
      gcc/ChangeLog:
      
      	* doc/invoke.texi: Update -fchar8_t documentation.
      8bc87173
    • Vladimir N. Makarov's avatar
      RA: Implement reuse of equivalent memory for caller saves optimization · f661c0bb
      Vladimir N. Makarov authored
      The test case shows an opportunity to reuse memory with a constant address
      for the caller saves optimization for a constant or pure function call.
      The patch implements the memory reuse.
      
              PR rtl-optimization/103541
      
      gcc/ChangeLog:
      
      	* ira.h (struct ira_reg_equiv_s): Add new field caller_save_p.
      	* ira.cc (validate_equiv_mem): Check memref address variance.
      	(update_equiv_regs): Define caller save equivalence for
      	valid_combine.
      	(setup_reg_equiv): Clear defined_p flag for caller save equivalence.
      	* lra-constraints.cc (lra_copy_reg_equiv): Add new arg
      	call_save_p.  Use caller save equivalence depending on the arg.
      	(split_reg): Adjust the call.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr103541.c: New.
      f661c0bb
    • Richard Biener's avatar
      tree-optimization/26854 - compile-time hog in SSA forwprop · 295adfc9
      Richard Biener authored
      The following addresses
      
       tree forward propagate             :  12.41 (  9%)
      
      seen with the compile.i testcase of this PR, which points at
      the has_use_on_stmt function, which is slow for SSA names with
      many uses.  The solution is to look at stmt operands instead of
      immediate uses to identify whether a name has a use
      on a stmt.  That improves SSA forwprop to
      
       tree forward propagate             :   1.30 (  0%)
      
      for this testcase.
      
      	PR tree-optimization/26854
      	* gimple-fold.cc (has_use_on_stmt): Look at stmt operands
      	instead of immediate uses.
      295adfc9
    • Jakub Jelinek's avatar
      ipa-split: Don't split returns_twice functions [PR106923] · 5321d532
      Jakub Jelinek authored
      As discussed in the PR, returns_twice functions are rare/special beasts
      that need special treatment in the cfg, and inside of their bodies
      we don't know which part actually works the weird returns twice way
      (either in the fork/vfork sense, or in the setjmp) and aren't updating
      ab edges to reflect that.
      
      I think the easiest is just to never split these, like we already never
      split noreturn or malloc functions.
      
      2023-02-07  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/106923
      	* ipa-split.cc (execute_split_functions): Don't split returns_twice
      	functions.
      
      	* gcc.dg/pr106923.c: New test.
      5321d532
    • Jakub Jelinek's avatar
      cgraph: Handle simd clones in cgraph_node::set_{const,pure}_flag [PR106433] · cad2412c
      Jakub Jelinek authored
      The following testcase ICEs, because we determine only in late pure const
      pass that bar is const (the content of the function loses a store to a
      global var during dse3 and read from it during cddce2) and local-pure-const2
      makes it const.  The cgraph ordering is that post IPA (in late IPA simd
      clones are created) bar is processed first, then foo as its caller, then
      foo.simdclone* and finally bar.simdclone*.  Conceptually I think that is the
      right ordering which allows for static simd clones to be removed.
      
      The reason for the ICE is that because bar was marked const, the call to
      it lost vops before vectorization, and when we in foo.simdclone* try to
      vectorize the call to bar, we replace it with bar.simdclone* which hasn't
      been marked const and so needs vops, which we don't add.
      
      Now, because the simd clones are created from the same IL, just in a loop
      with different argument/return value passing, I think generally if the base
      function is determined to be const or pure, the simd clones should be too,
      unless e.g. the vectorization causes different optimization decisions, but
      then still the global memory reads if any shouldn't affect what the function
      does and global memory stores shouldn't be reachable at runtime.
      
      So, the following patch changes set_{const,pure}_flag to mark also simd
      clones.
      
      2023-02-07  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/106433
      	* cgraph.cc (set_const_flag_1): Recurse on simd clones too.
      	(cgraph_node::set_pure_flag): Call set_pure_flag_1 on simd clones too.
      
      	* gcc.c-torture/compile/pr106433.c: New test.
      cad2412c
    • Jakub Jelinek's avatar
      testsuite: Expect -Wdeprecated warning in warn/Wstrict-aliasing-bogus-union-2.C for C++23 · 64b5ca43
      Jakub Jelinek authored
      On Mon, Feb 06, 2023 at 02:26:01PM +0000, Jonathan Wakely via Gcc-patches wrote:
      > With the recent change to deprecate std::aligned_storage and
      > std::aligned_union we need to adjust some tests that now fail with
      > -std=c++23.
      
      The g++.dg/warn/Wstrict-aliasing-bogus-union-2.C test is also affected:
      PASS: g++.dg/warn/Wstrict-aliasing-bogus-union-2.C  -std=gnu++2b  (test for bogus messages, line 12)
      FAIL: g++.dg/warn/Wstrict-aliasing-bogus-union-2.C  -std=gnu++2b (test for excess errors)
      Excess errors:
      .../gcc/testsuite/g++.dg/warn/Wstrict-aliasing-bogus-union-2.C:8:8: warning: 'template<long unsigned int _Len, long unsigned int _Align> struct std::aligned_storage' is deprecated [-
      
      The following patch adds dg-warning for it.
      
      2023-02-07  Jakub Jelinek  <jakub@redhat.com>
      
      	* g++.dg/warn/Wstrict-aliasing-bogus-union-2.C: Expect
      	-Wdeprecated warning for C++23.
      64b5ca43
    • Jan Hubicka's avatar
      Enable 512 bit vector for zen4 · a7502c4a
      Jan Hubicka authored
      While internally 512 bit registers are split into two 256 bit halves, 512 bit
      vectors reduce the number of instructions to retire and have a chance to improve
      parallelism.  There are a few tsvc benchmarks that improve significantly:
      
                 runtime
      benchmark  256bit  512bit
      s2275      48.57   20.67    -58%
      s311       32.29   16.06    -50%
      s312       32.30   16.07    -50%
      vsumr      32.30   16.07    -50%
      s314       10.77   5.42     -50%
      s313       21.52   10.85    -50%
      vdotr      43.05   21.69    -50%
      s316       10.80   5.64     -48%
      s235       61.72   33.91    -45%
      s161       15.91   9.95     -38%
      s3251      32.13   20.31    -36%
      
      And there are no benchmarks with off-noise regression.  The basic matrix
      multiplication loop improves by 32%.  It is also expected that 512 bit
      vectors are more power efficient (I can't measure that).
      
      The down side is that loops with low trip counts may get slower when the
      unvectorized prologue and epilogue is hit more often.  With SPECfp this
      problem happens with x264 (12% regression) and bwaves (6% regression)
      and this is tracked in
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
      and will need more work on vectorizer to support masked epilogues.
      
      After some additional testing it seems that using 512 bit vectors by
      default is now the better choice overall.
      
      Bootstrapped/regtested x86_64-linux. Plan to commit it tomorrow.
      
      	* config/i386/x86-tune.def (X86_TUNE_AVX256_OPTIMAL): Turn off
      	for znver4.
      a7502c4a
    • GCC Administrator's avatar
      Daily bump. · f0e73dd0
      GCC Administrator authored
      f0e73dd0
  4. Feb 06, 2023
    • Gaius Mulley's avatar
      Modula2 meets clang [PR108135] · d5f933d2
      Gaius Mulley authored
      
      Remove unused function (and build warnings).
      
      gcc/m2/ChangeLog:
      
      	* gm2-compiler/M2Search.mod (DSdbEnter): Comment out.
      	(DSdbExit): Comment out.
      
      	PR modula2/108135
      
      Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
      d5f933d2
    • Arsen Arsenović's avatar
      libstdc++: Document P1642 and extensions · 9f4baed6
      Arsen Arsenović authored
      libstdc++-v3/ChangeLog:
      
      	* doc/xml/manual/using.xml: Document newly-freestanding
      	headers and the effect of the -ffreestanding flag.
      	* doc/xml/manual/status_cxx2023.xml: Document P1642R11 as
      	completed.
      	* doc/xml/manual/configure.xml: Document that hosted installs
      	respect __STDC_HOSTED__.
      	* doc/xml/manual/test.xml: Document how to run tests in
      	freestanding mode.
      	* doc/html/*: Regenerate.
      9f4baed6
    • Gaius Mulley's avatar
      Format error in m2pp.cc (m2pp_integer_cst) [PR107234] · 17d0892d
      Gaius Mulley authored
      
      Use HOST_WIDE_INT_PRINT_UNSIGNED instead of hardcoding a
      specific format.
      
      gcc/m2/ChangeLog:
      
      	* m2pp.cc (m2pp_integer_cst): Use
      	HOST_WIDE_INT_PRINT_UNSIGNED as the format specifier.
      
      	PR modula2/107234
      	    Co-Authored by: Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
      
      Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
      17d0892d
    • Andrew Stubbs's avatar
      amdgcn: Pass -mstack-size through to runtime · 45e01229
      Andrew Stubbs authored
      But only for the offload case.
      
      gcc/ChangeLog:
      
      	* config/gcn/mkoffload.cc (gcn_stack_size): New global variable.
      	(process_asm): Create a constructor for GCN_STACK_SIZE.
      	(main): Parse the -mstack-size option.
      45e01229
    • Gaius Mulley's avatar
      Remove unused variables and procedures. · 74337475
      Gaius Mulley authored
      
      Remove unused variables and procedures (and remove build
      warning clutter).
      
      gcc/m2/ChangeLog:
      
      	* gm2-compiler/M2Preprocess.mod (BaseName): Comment out.
      	* gm2-lang.cc (opt): Remove.
      	* gm2spec.cc (add_include): Remove.
      	(full_libraries): Remove.
      	(concat_option): Remove.
      
      Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
      74337475
    • Alex Coplan's avatar
      aarch64: Fix up bfmlal lane pattern [PR104921] · 277e1f30
      Alex Coplan authored
      As the testcase shows, this pattern had an incorrect constraint leading
      to GCC's output getting rejected by the assembler.
      
      This patch fixes the constraint accordingly.
      
      The test is split into two: one that can run without bf16 support from
      the assembler and another that checks that the output actually assembles
      when such support is available.
      
      Bootstrapped/regtested on aarch64-linux-gnu.
      
      OK for GCC 13? Or better to wait for next stage 1? What about backports?
      
      Thanks,
      Alex
      
      gcc/ChangeLog:
      
      	PR target/104921
      	* config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf):
      	Use correct constraint for operand 3.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/104921
      	* gcc.target/aarch64/pr104921-1.c: New test.
      	* gcc.target/aarch64/pr104921-2.c: New test.
      	* gcc.target/aarch64/pr104921.x: Include file for new tests.
      277e1f30
    • Jonathan Wakely's avatar
      libstdc++: Fix non-reserved name for template parameter · 0afcb713
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* include/bits/ranges_algo.h (__find_last_fn): Rename T to _Tp.
      	(__find_last_if_fn): Likewise.
      0afcb713