  1. Jan 15, 2025
    • libstdc++: Fix reversed args in unreachable assumption [PR109849] · 6f85a972
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/109849
      	* include/bits/vector.tcc (vector::_M_range_insert): Fix
      	reversed args in length calculation.
      6f85a972
    • Fortran: reject NULL as source-expr in ALLOCATE with SOURCE= or MOLD= [PR71884] · 89230999
      Harald Anlauf authored
      	PR fortran/71884
      
      gcc/fortran/ChangeLog:
      
      	* resolve.cc (resolve_allocate_expr): Reject intrinsic NULL as
      	source-expr.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/pr71884.f90: New test.
      89230999
    • c++: Handle RAW_DATA_CST in unify [PR118390] · 2619413a
      Jakub Jelinek authored
      This patch uses the count_ctor_elements function to fix up
      unify deduction of array sizes.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118390
      	* cp-tree.h (count_ctor_elements): Declare.
      	* call.cc (count_ctor_elements): No longer static.
      	* pt.cc (unify): Use count_ctor_elements instead of
      	CONSTRUCTOR_NELTS.
      
      	* g++.dg/cpp/embed-20.C: New test.
      	* g++.dg/cpp0x/pr118390.C: New test.
      2619413a
    • AArch64: Update neoverse512tvb tuning · 4ce502f3
      Wilco Dijkstra authored
      Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the
      missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
      
      gcc:
      	* config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.
      4ce502f3
    • AArch64: Add FULLY_PIPELINED_FMA to tune baseline · 2713f6bb
      Wilco Dijkstra authored
      Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
      already enabled for some cores, but benchmarking it shows it is faster on all
      modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
      
      gcc:
      	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE):
      	Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
      	* config/aarch64/tuning_models/ampere1b.h: Remove redundant
      	AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
      	* config/aarch64/tuning_models/neoversev2.h: Likewise.
      2713f6bb
    • AArch64: Deprecate -mabi=ilp32 · 625ea3c6
      Wilco Dijkstra authored
      ILP32 was originally intended to make porting to AArch64 easier.  Support was
      never merged in the Linux kernel or GLIBC, so it has been unsupported for many
      years.  There isn't a benefit in keeping unsupported features forever, so
      deprecate it now (and it could be removed in a future release).
      
      gcc:
      	* config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
      	* doc/invoke.texi: Document -mabi=ilp32 as deprecated.
      
      gcc/testsuite:
      	* gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
      	* gcc.target/aarch64/pr100518.c: Likewise.
      	* gcc.target/aarch64/pr113114.c: Likewise.
      	* gcc.target/aarch64/pr80295.c: Likewise.
      	* gcc.target/aarch64/pr94201.c: Likewise.
      	* gcc.target/aarch64/pr94577.c: Likewise.
      	* gcc.target/aarch64/sve/pr108603.c: Likewise.
      625ea3c6
    • bpf: set index entry for a VAR_DECL in CO-RE relocs · 01c37f9a
      Cupertino Miranda authored
      CO-RE accesses with non-pointer struct variables will also generate a
      "0" string access within the CO-RE relocation.
      The first index within the access string has a somewhat different
      meaning than the remaining indexes.
      For i0:i1:...:in being an access index for a "struct A a" declaration, its
      semantics are represented by:
        (&a + (sizeof(struct A) * i0) + offsetof(i1:...:in))
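
      The formula above can be sketched in plain C++.  This is only an
      illustration under stated assumptions: the types `A` and `Inner` and the
      helper `core_byte_offset` are hypothetical and do not come from the
      patch.  The sketch models the access string "i0:1:1" (element i0, then
      field `in`, then field `y`), where i0 scales by the size of the whole
      struct and the remaining indexes contribute field offsets.  For a
      non-pointer variable, i0 is 0.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types used only for this illustration.
struct Inner { int x; int y; };
struct A { int head; Inner in; };

// Byte offset described by the access string "i0:1:1":
// the first index selects a whole `struct A` element, the
// remaining indexes select field `in` and then field `y`.
std::size_t core_byte_offset(std::size_t i0) {
  return sizeof(A) * i0 + offsetof(A, in) + offsetof(Inner, y);
}
```

      With `A arr[3]`, `core_byte_offset(2)` should match the distance in
      bytes from `&arr[0]` to `&arr[2].in.y`.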
      
      gcc/ChangeLog:
      	* config/bpf/core-builtins.cc (compute_field_expr): Change
      	VAR_DECL outcome in switch case.
      
      gcc/testsuite/ChangeLog:
      	* gcc.target/bpf/core-builtin-1.c: Correct test.
      	* gcc.target/bpf/core-builtin-2.c: Correct test.
      	* gcc.target/bpf/core-builtin-exprlist-1.c: Correct test.
      01c37f9a
    • bpf: calls do not promote attr access_index on lhs · 42786ccf
      Cupertino Miranda authored
      When traversing gimple to introduce CO-RE relocation entries to
      expressions that are accesses to types attributed with preserve_access_index,
      the access is likely to be split into multiple gimple statements.
      In order to keep doing the proper CO-RE conversion we need to mark
      the LHS tree nodes of gimple expressions as explicit CO-RE accesses,
      such that the gimple traverser will further convert the sub-expressions.
      
      This patch makes sure that this LHS marking does not happen when the
      gimple statement is a function call, in which case it is no longer
      expected to keep generating CO-RE accesses with the remainder of the
      expression.
      
      gcc/ChangeLog:
      
      	* config/bpf/core-builtins.cc
      	(make_gimple_core_safe_access_index): Fix in condition.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/bpf/core-attr-calls.c: New test.
      42786ccf
    • bpf: make sure CO-RE relocs are typed with struct BTF_KIND_STRUCT · d30def00
      Cupertino Miranda authored
      Based on observation within bpf-next selftests and comparison of GCC
      and clang compiled code, the BPF loader expects all CO-RE relocations to
      point to BTF non-const and non-volatile type nodes.
      
      gcc/ChangeLog:
      
      	* btfout.cc (get_btf_kind): Remove static from function definition.
      	* config/bpf/btfext-out.cc (bpf_code_reloc_add): Check if CO-RE type
      	is not a const or volatile.
      	* ctfc.h (btf_dtd_kind): Add prototype for function.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/bpf/core-attr-const.c: New test.
      d30def00
    • c++: Implement mangling of RAW_DATA_CST [PR118278] · 8d9d5834
      Jakub Jelinek authored
      As the following testcases show (mangle80.C only after reversion of the
      temporary reversion of C++ large array speedup commit), RAW_DATA_CST can
      be seen during mangling of some templates and we ICE because
      the mangler doesn't handle it.
      
      The following patch handles it and mangles it the same as a sequence of
      INTEGER_CSTs that were used previously instead.
      The only slight complication is that if ce->value is the last nonzero
      element, we need to skip the zeros at the end of RAW_DATA_CST.
      
      2025-01-03  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118278
      	* mangle.cc (write_expression): Handle RAW_DATA_CST.
      
      	* g++.dg/abi/mangle80.C: New test.
      	* g++.dg/cpp/embed-19.C: New test.
      8d9d5834
    • c++: handle decltype in nested-name-spec printing [PR118139] · 1bc474f6
      Marek Polacek authored
      
      Compiling this test, we emit:
      
        error: 'static void CW<T>::operator=(int) requires requires(typename'decltype_type' not supported by pp_cxx_unqualified_id::type x) {x;}' must be a non-static member function
      
      where the DECLTYPE_TYPE isn't printed properly.  This patch fixes that
      to print:
      
      error: 'static void CW<T>::operator=(int) requires requires(typename decltype(T())::type x) {x;}' must be a non-static member function
      
      	PR c++/118139
      
      gcc/cp/ChangeLog:
      
      	* cxx-pretty-print.cc (pp_cxx_nested_name_specifier): Handle
      	a computed-type-specifier.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/diagnostic/decltype1.C: New test.
      
      Reviewed-by: Jason Merrill <jason@redhat.com>
      1bc474f6
    • libstdc++: Fix comments in test that reference wrong subclause of C++11 · 9cc31b4e
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* testsuite/28_regex/traits/char/transform_primary.cc: Fix
      	subclause numbering in references to the standard.
      9cc31b4e
    • middle-end: Fix incorrect type replacement in operands_equals [PR118472] · 25eb892a
      Tamar Christina authored
      In g:3c32575e I made a mistake and incorrectly
      replaced the type of the arguments of an expression with the type of the
      expression.  This is of course wrong.
      
      This reverts that change and I have also double checked the other replacements
      and they are fine.
      
      gcc/ChangeLog:
      
      	PR middle-end/118472
      	* fold-const.cc (operand_compare::operand_equal_p): Fix incorrect
      	replacement.
      
      gcc/testsuite/ChangeLog:
      
      	PR middle-end/118472
      	* gcc.dg/pr118472.c: New test.
      25eb892a
    • Annotate dbg_line_numbers table · bea593f1
      Richard Biener authored
      The following adds /* <num> */ to dbg_line_numbers so there's the chance
      to more easily look up the ID of the match.pd line number used for
      dumping when you want to debug a specific replacement.  It also cuts
      the lines down to 10 entries.
      
        static int dbg_line_numbers[1267] = {
              /* 0 */ 161, 164, 173, 175, 178, 181, 183, 189, 197, 195,
              /* 10 */ 199, 201, 205, 923, 921, 2060, 2071, 2052, 2058, 2063,
      ...
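
      As a rough sketch of the layout described above (not the genmatch.cc
      code itself), the following hypothetical helper `format_indexed_table`
      emits a fixed number of entries per line and prefixes each line with a
      comment holding the index of its first entry.  The function name, the
      tab indentation and the `per_line` parameter are assumptions made for
      illustration; `per_line` must be greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Format `values` as initializer rows, `per_line` entries per row,
// each row starting with a /* start-index */ comment.
std::string format_indexed_table(const std::vector<int>& values,
                                 std::size_t per_line = 10) {
  std::string out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i % per_line == 0)
      out += "\t/* " + std::to_string(i) + " */ ";
    out += std::to_string(values[i]);
    const bool last = (i + 1 == values.size());
    const bool end_of_line = ((i + 1) % per_line == 0) || last;
    if (!last)
      out += end_of_line ? "," : ", ";
    if (end_of_line)
      out += "\n";
  }
  return out;
}
```

      Looking up entry N in such a table then only requires finding the row
      whose comment is the largest start index not exceeding N and counting
      forward within that row.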
      
      	* genmatch.cc (define_dump_logs): Make reverse lookup in
      	dbg_line_numbers easier by adding comments with start index
      	and cutting number of elements per line to 10.
      bea593f1
    • testsuite: i386: Fix expected vectorization in pr105493.c · 120a3700
      Christoph Müllner authored
      
      As reported in PR117079, commit ab187858 broke the test pr105493.c.
      The test code contains two loops, where the first one is expected to be
      vectorized.  The commit that broke that vectorization was the first of
      several that enabled vectorization of both loops.
      Now that GCC can vectorize the whole function, let's adjust this test
      to expect vectorization of both loops by ensuring that we don't write
      to the helper-array 'tmp'.
      
      Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
      
      	PR target/117079
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr105493.c: Fix expected vectorization
      
      Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
      120a3700
    • OpenMP/C++: Fix 'declare variant' for struct-returning functions [PR118486] · b67a0d6a
      Tobias Burnus authored
      To find the variant declaration, a call is constructed in
      omp_declare_variant_finalize_one, which gives here:
        TARGET_EXPR <D.3010, variant_fn ()>
      
      Extracting now the function declaration failed and gave the bogus
        error: could not find variant declaration
      
      Solution: Use the 2nd argument of the TARGET_EXPR and continue.
      
      	PR c++/118486
      
      gcc/cp/ChangeLog:
      
      	* decl.cc (omp_declare_variant_finalize_one): When resolving
      	the variant to use, handle variant calls with TARGET_EXPR.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/gomp/declare-variant-11.C: New test.
      b67a0d6a
    • ipa: Initialize/release global obstack in process_new_functions [PR116068] · dd389c25
      Jakub Jelinek authored
      Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL);
      before running a pass list and bitmap_obstack_release (NULL); after that,
      while process_new_functions wasn't doing that and with the new r15-130
      bitmap_alloc checking that results in ICE.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR ipa/116068
      	* cgraphunit.cc (symbol_table::process_new_functions): Call
      	bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL)
      	around processing the functions.
      
      	* gcc.dg/graphite/pr116068.c: New test.
      dd389c25
    • c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't... · 18f6bb98
      Jakub Jelinek authored
      c++: Delete defaulted operator <=> if std::strong_ordering::equal doesn't convert to its rettype [PR118387]
      
      Note, the PR raises another problem.
      If on the same testcase the B b; line is removed, we silently synthesize
      operator<=> which will crash at runtime due to returning without a return
      statement.  That is because the standard says that in that case
      it should return static_cast<int>(std::strong_ordering::equal);
      but I can't find anywhere wording which would say that if that isn't
      valid, the function is deleted.
      https://eel.is/c++draft/class.compare#class.spaceship-2.2
      seems to talk just about cases where there are some members and their
      comparison is invalid it is deleted, but here there are none and it
      follows
      https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
      So, we synthesize with tf_none, see the static_cast is invalid, don't
      add error_mark_node statement silently, but as the function isn't deleted,
      we just silently emit it.
      Should the standard be amended to say that the operator should be deleted
      even if it has no elements and the static cast from
      https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
      
      On Fri, Jan 10, 2025 at 12:04:53PM -0500, Jason Merrill wrote:
      > That seems pretty obviously what we want, and is what the other compilers
      > implement.
      
      This patch implements it then.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118387
      	* method.cc (build_comparison_op): Set bad if
      	std::strong_ordering::equal doesn't convert to rettype.
      
      	* g++.dg/cpp2a/spaceship-err6.C: Expect another error.
      	* g++.dg/cpp2a/spaceship-synth17.C: Likewise.
      	* g++.dg/cpp2a/spaceship-synth-neg6.C: Likewise.
      	* g++.dg/cpp2a/spaceship-synth-neg7.C: New test.
      
      	* testsuite/25_algorithms/default_template_value.cc
      	(Input::operator<=>): Use auto as return type rather than bool.
      18f6bb98
    • c++: Fix up maybe_init_list_as_array for RAW_DATA_CST [PR118124] · 64828272
      Jakub Jelinek authored
      The previous patch made me look around some more and I found
      maybe_init_list_as_array doesn't handle RAW_DATA_CSTs correctly either,
      while the RAW_DATA_CST is properly split during finish_compound_literal,
      it was using CONSTRUCTOR_NELTS as the size of the arrays, which is wrong,
      RAW_DATA_CST could stand for far more initializers.
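
      To illustrate why the number of constructor entries can undercount the
      initializers, here is a minimal model.  It is not GCC's
      count_ctor_elements: the `CtorElt` type and the function
      `count_elements_sketch` are hypothetical stand-ins, assuming only what
      the message states, namely that a single RAW_DATA_CST entry can stand
      for many initializers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of an initializer list.
struct CtorElt {
  bool is_raw_data;        // models a RAW_DATA_CST entry
  std::size_t raw_length;  // initializers it stands for (ignored otherwise)
};

// Count the initializers represented, not the number of entries.
std::size_t count_elements_sketch(const std::vector<CtorElt>& elts) {
  std::size_t n = 0;
  for (const CtorElt& e : elts)
    n += e.is_raw_data ? e.raw_length : 1;
  return n;
}
```

      A list with three entries where the middle one models 1000 bytes of raw
      data represents 1002 initializers, so sizing an array from the entry
      count alone would be far too small.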
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118124
      	* cp-tree.h (build_array_of_n_type): Change second argument type
      	from int to unsigned HOST_WIDE_INT.
      	* tree.cc (build_array_of_n_type): Likewise.
      	* call.cc (count_ctor_elements): New function.
      	(maybe_init_list_as_array): Use it instead of CONSTRUCTOR_NELTS.
      	(convert_like_internal): Use length from init's type instead of
      	len when handling the maybe_init_list_as_array case.
      
      	* g++.dg/cpp0x/initlist-opt5.C: New test.
      64828272
    • c++: Fix ICEs with large initializer lists or ones including #embed [PR118124] · f263f2d5
      Jakub Jelinek authored
      The following testcases ICE due to RAW_DATA_CST not being handled where it
      should be during ck_list conversions.
      
      The last 2 testcases started ICEing with r15-6339 committed yesterday
      (speedup of large initializers), the first two already with r15-5958
      (#embed optimization for C++).
      
      For conversion to initializer_list<unsigned char> or char/signed char
      we can optimize and keep RAW_DATA_CST with adjusted type if we report
      narrowing errors if needed, for others this converts each element
      separately.
      
      2025-01-15  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/118124
      	* call.cc (convert_like_internal): Handle RAW_DATA_CST in
      	ck_list handling.  Formatting fixes.
      
      	* g++.dg/cpp/embed-15.C: New test.
      	* g++.dg/cpp/embed-16.C: New test.
      	* g++.dg/cpp0x/initlist-opt3.C: New test.
      	* g++.dg/cpp0x/initlist-opt4.C: New test.
      f263f2d5
    • RISC-V: Fix code gen for reduction with length 0 [PR118182] · 40ad10f7
      Kito Cheng authored
      `.MASK_LEN_FOLD_LEFT_PLUS` (or `mask_len_fold_left_plus_m`) expects the
      return value to be the start value even if the length is 0.
      
      However, the current code generation in the RISC-V backend does not meet that
      semantic: it yields a garbage value if the length is 0.
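
      The expected semantic can be written down as a scalar reference model.
      This is only a sketch: the name `fold_left_plus_ref` and its signature
      are assumptions for illustration, not the internal function's actual
      interface.  The point it shows is that with a length of 0 the loop body
      never runs and the start value is returned unchanged.

```cpp
#include <cassert>
#include <cstddef>

// Scalar model of an in-order masked sum over the first `len` elements.
// With len == 0 the result is exactly `start`.
double fold_left_plus_ref(double start, const double* v,
                          const bool* mask, std::size_t len) {
  double acc = start;
  for (std::size_t i = 0; i < len; ++i)
    if (mask[i])
      acc += v[i];
  return acc;
}
```

      Any vector code sequence implementing the reduction has to agree with
      this model for every length, including zero.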
      
      Take the current code generation for MASK_LEN_FOLD_LEFT_PLUS with f64 as an example:
              # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
              vsetvli zero,a5,e64,m1,ta,ma
              vfmv.s.f        v2,fa5     # insn 1
              vfredosum.vs    v1,v1,v2   # insn 2
              vfmv.f.s        fa5,v1     # insn 3
      
      insn 1:
      - vfmv.s.f won't do anything if VL=0, which means v2 will contain a garbage value.
      insn 2:
      - vfredosum.vs won't do anything if VL=0, and keeps vd unchanged even with TA.
      (The V spec says: `If vl=0, no operation is performed and the destination register
       is not updated.`)
      insn 3:
      - vfmv.f.s will move the value from v1 even if VL=0, so this is safe.
      
      So how do we fix that? Two fixes are needed:
      
      1. insn 1: must always execute with VL=1, so that we can guarantee it will
                 always work as expected.
      2. insn 2: Add a new pattern to force `vd` to use the same reg as `vs1` (start value) for
                 all reduction patterns, so that we can guarantee vd[0] will contain the
                 start value when vl=0.
      
      For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2,
      we have to add a _VL0_SAFE variant of each reduction to force `vd` to use the
      same reg as `vs1` (start value).
      
      Changes since V3:
      - Rename _AV to _VL0_SAFE for readability.
      - Use non-VL0_SAFE version if VL is const or VLMAX.
      - Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX.
      - Two more testcases.
      
      gcc/ChangeLog:
      
      	PR target/118182
      	* config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust
      	argument for expand_reduction.
      	(*widen_reduc_plus_scal_<mode>): Ditto.
      	(*fold_left_widen_plus_<mode>): Ditto.
      	(*mask_len_fold_left_widen_plus_<mode>): Ditto.
      	(*cond_widen_reduc_plus_scal_<mode>): Ditto.
      	(*cond_len_widen_reduc_plus_scal_<mode>): Ditto.
      	(*cond_widen_reduc_plus_scal_<mode>): Ditto.
      	* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for
      	expand_reduction.
      	(reduc_smax_scal_<mode>): Ditto.
      	(reduc_umax_scal_<mode>): Ditto.
      	(reduc_smin_scal_<mode>): Ditto.
      	(reduc_umin_scal_<mode>): Ditto.
      	(reduc_and_scal_<mode>): Ditto.
      	(reduc_ior_scal_<mode>): Ditto.
      	(reduc_xor_scal_<mode>): Ditto.
      	(reduc_plus_scal_<mode>): Ditto.
      	(reduc_smax_scal_<mode>): Ditto.
      	(reduc_smin_scal_<mode>): Ditto.
      	(reduc_fmax_scal_<mode>): Ditto.
      	(reduc_fmin_scal_<mode>): Ditto.
      	(fold_left_plus_<mode>): Ditto.
      	(mask_len_fold_left_plus_<mode>): Ditto.
      	* config/riscv/riscv-v.cc (expand_reduction): Add one more
      	argument for reduction code for vl0-safe.
      	* config/riscv/riscv-protos.h (expand_reduction): Ditto.
      	* config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of
      	reduction.
      	(ANY_REDUC_VL0_SAFE): New.
      	(ANY_WREDUC_VL0_SAFE): Ditto.
      	(ANY_FREDUC_VL0_SAFE): Ditto.
      	(ANY_FREDUC_SUM_VL0_SAFE): Ditto.
      	(ANY_FWREDUC_SUM_VL0_SAFE): Ditto.
      	(reduc_op): Add _VL0_SAFE variant of reduction.
      	(order) Ditto.
      	* config/riscv/vector.md (@pred_<reduc_op><mode>): New.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/118182
      	* gfortran.target/riscv/rvv/pr118182.f: New.
      	* gcc.target/riscv/rvv/autovec/pr118182-1.c: New.
      	* gcc.target/riscv/rvv/autovec/pr118182-2.c: New.
      40ad10f7
    • Fix SLP scalar costing with stmts also used in externals · 21edcb95
      Richard Biener authored
      When we have the situation of an external SLP node that is
      permuted the scalar stmts recorded in the permute node do not
      mean the scalar computation can be removed.  We are removing
      those stmts from the vectorized_scalar_stmts for this reason
      but we fail to check this set when we cost scalar stmts.  Note
      vectorized_scalar_stmts isn't a complete set so also pass
      scalar_stmts_in_externs and check that.
      
      The following fixes this.
      
      This shows in PR115777 when we avoid vectorizing the load, but
      on its own doesn't help the PR yet.
      
      	PR tree-optimization/115777
      	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Do not
      	cost a scalar stmt that needs to be preserved.
      21edcb95
    • lto: Remove link() to fix build with MinGW [PR118238] · ed123311
      Michal Jires authored
      I used link() to create cheap copies of Incremental LTO cache contents
      to prevent their deletion once linking is finished.
      This is unnecessary, since output_files are deleted in our lto-plugin
      and not in the linker itself.
      
      Bootstrapped/regtested on x86_64-linux.
      lto-wrapper now builds on MinGW again, though so far I have not set up
      MinGW to be able to do a full bootstrap.
      Ok for trunk?
      
      	PR lto/118238
      
      gcc/ChangeLog:
      
      	* lto-wrapper.cc (run_gcc): Remove link() copying.
      
      lto-plugin/ChangeLog:
      
      	* lto-plugin.c (cleanup_handler):
      	Keep output_files when using Incremental LTO.
      	(onload): Detect Incremental LTO.
      ed123311
    • [RISC-V][PR target/118170] Add HF div/sqrt reservation · d6f1961e
      Anton Blanchard authored
      
      Clearly an oversight in the generic-ooo model caught by the checking code.  I
      should have realized it was generic-ooo as we don't have a pipeline description
      for the tenstorrent design yet, just the costing model.
      
      The patch was extracted from the BZ which indicated Anton was the author, so I
      kept that.  I'm listed as co-author just in case someone wants to complain
      about the testcase in the future.  I didn't do any notable lifting here.
      
      Thanks Peter and Anton!
      
      	PR target/118170
      gcc/
      	* config/riscv/generic-ooo.md (generic_ooo_float_div_half): New
      	reservation.
      
      gcc/testsuite
      	* gcc.target/riscv/pr118170.c: New test.
      
      Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
      d6f1961e
    • [PR rtl-optimization/109592] Simplify nested shifts · cab2e123
      Richard Sandiford authored
      > The BZ in question is a failure to recognize a pair of shifts as a sign
      > extension.
      >
      > I originally thought simplify-rtx would be the right framework to
      > address this problem, but fwprop is actually better.  We can write the
      > recognizer much simpler in that framework.
      >
      > fwprop already simplifies nested shifts/extensions to the desired RTL,
      > but it's not considered profitable and we throw away the good work done
      > by fwprop & simplifiers.
      >
      > It's hard to see a scenario where nested shifts or nested extensions
      > that simplify down to a single sign/zero extension aren't a profitable
      > transformation.  So when fwprop has nested shifts/extensions that
      > simplify to an extension, we consider it profitable.
      >
      > This allows us to simplify the testcase on rv64 with ZBB enabled from a
      > pair of shifts to a single byte or half-word sign extension.
      
      Hmm.  So just to summarise something that was discussed in the PR
      comments, this is a case where combine's expand_compound_operation/
      make_compound_operation wrangler hurts us, because the process isn't
      idempotent, and combine produces two complex instructions:
      
      (insn 6 3 7 2 (set (reg:DI 137 [ _3 ])
              (ashift:DI (reg:DI 139 [ x ])
                  (const_int 24 [0x18]))) "foo.c":2:20 305 {ashldi3}
           (expr_list:REG_DEAD (reg:DI 139 [ x ])
              (nil)))
      (insn 12 7 13 2 (set (reg/i:DI 10 a0)
              (sign_extend:DI (ashiftrt:SI (subreg:SI (reg:DI 137 [ _3 ]) 0)
                      (const_int 24 [0x18])))) "foo.c":2:27 321 {ashrsi3_extend}
           (expr_list:REG_DEAD (reg:DI 137 [ _3 ])
              (nil)))
      
      given two simple instructions:
      
      (insn 6 3 7 2 (set (reg:SI 137 [ _3 ])
              (sign_extend:SI (subreg:QI (reg/v:DI 136 [ x ]) 0))) "foo.c":2:20 533 {*extendqisi2_bitmanip}
           (expr_list:REG_DEAD (reg/v:DI 136 [ x ])
              (nil)))
      (insn 7 6 12 2 (set (reg:DI 138 [ _3 ])
              (sign_extend:DI (reg:SI 137 [ _3 ]))) "foo.c":2:20 discrim 1 133 {*extendsidi2_internal}
           (expr_list:REG_DEAD (reg:SI 137 [ _3 ])
              (nil)))
      
      If I run with -fdisable-rtl-combine then late_combine1 already does the
      expected transformation.
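
      For reference, the equivalence between the two instruction sequences
      above can be checked at the source level.  The sketch below is an
      illustration, not code from the patch; the function names are
      hypothetical.  It assumes two's-complement narrowing conversions and an
      arithmetic right shift of negative values, which C++20 guarantees and
      which GCC also provides in earlier language modes.

```cpp
#include <cassert>
#include <cstdint>

// Models combine's output: a 64-bit left shift by 24, then a 32-bit
// arithmetic right shift by 24 of the low half, sign-extended to 64 bits.
std::int64_t via_shift_pair(std::int64_t x) {
  std::uint64_t shifted = static_cast<std::uint64_t>(x) << 24;
  std::int32_t low =
      static_cast<std::int32_t>(static_cast<std::uint32_t>(shifted));
  return static_cast<std::int64_t>(low >> 24);
}

// Models the desired form: a single sign extension of the low byte.
std::int64_t via_sign_extend(std::int64_t x) {
  return static_cast<std::int64_t>(
      static_cast<std::int8_t>(static_cast<std::uint8_t>(x)));
}
```

      Both functions should agree for every input, for example 0x80 maps to
      -128 and 0x17F maps to 127 in each.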
      
      Although it would be nice to fix combine, that might be difficult.
      If we treat combine as immutable then the options are:
      
      (1) Teach simplify-rtx to simplify combine's output into a single sign_extend.
      
      (2) Allow fwprop1 to get in first, before combine has a chance to mess
          things up.
      
      The patch goes for (2).
      
      Is that a fair summary?
      
      Playing devil's advocate, I suppose one advantage of (1) is that it
      would allow the optimisation even if the original rtl looked like
      combine's output.  And fwprop1 doesn't distinguish between cases in
      which the source instruction disappears from cases in which the source
      instruction is kept.  Thus we could transform:
      
        (set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
        (set (reg:DI R3) (sign_extend:DI (reg:SI R2)))
      
      into:
      
        (set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
        (set (reg:DI R3) (sign_extend:DI (reg:QI R1)))
      
      which increases the register pressure between the two instructions
      (since R2 and R1 are both now live).  In general, there could be
      quite a gap between the two instructions.
      
      On the other hand, even in that case, fwprop1 would be parallelising
      the extensions.  And since we're talking about unary operations,
      even two-address targets would allow R1 to be extended without
      tying the source and destination.
      
      Also, it seems relatively unlikely that expand would produce code
      that looks like combine's, since the gimple optimisers should have
      simplified it into conversions.
      
      So initially I was going to agree that it's worth trying in fwprop.  But...
      
      [ commentary on Jeff's original approach dropped. ]
      
      So it seems like it's a bit of a mess :-(
      
      If we do try to fix combine, I think something like the attached
      would fit within the current scheme.  It is a pure shift-for-shift
      transformation, avoiding any extensions.
      
      Will think more about it, but wanted to get the above stream of
      consciousness out before I finish for the day :-)
      
      
      
      	PR rtl-optimization/109592
      gcc/
      	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
      	Simplify nested shifts with subregs.
      
      gcc/testsuite
      	* gcc.target/riscv/pr109592.c: New test.
      	* gcc.target/riscv/sign-extend-rshift.c: Adjust expected output
      
      Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
      cab2e123
    • Daily bump. · 3b3b3f88
      GCC Administrator authored
      3b3b3f88
  2. Jan 14, 2025
    • c++: dump-lang-raw with obj_type_ref fields · 6e0b048f
      anetczuk authored
      Raw dump of lang tree was missing information about virtual method call.
      The information is provided in "tok" field of obj_type_ref.
      
      gcc/ChangeLog:
      
      	* tree-dump.cc (dequeue_and_dump): Handle OBJ_TYPE_REF.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/diagnostic/lang-dump-1.C: New test.
      6e0b048f
    • d: Merge upstream dmd, druntime d6f693b46a, phobos 336bed6d8. · c8894b68
      Iain Buclaw authored
      D front-end changes:
      
      	- Import latest fixes from dmd v2.110.0-rc.1.
      
      D runtime changes:
      
      	- Import latest fixes from druntime v2.110.0-rc.1.
      
      Phobos changes:
      
      	- Import latest fixes from phobos v2.110.0-rc.1.
      
      Included in the merge are fixes for the following PRs:
      
      	PR d/118438
      	PR d/118448
      	PR d/118449
      
      gcc/d/ChangeLog:
      
      	* dmd/MERGE: Merge upstream dmd d6f693b46a.
      	* d-incpath.cc (add_import_paths): Update for new front-end interface.
      
      libphobos/ChangeLog:
      
      	* libdruntime/MERGE: Merge upstream druntime d6f693b46a.
      	* src/MERGE: Merge upstream phobos 336bed6d8.
      	* testsuite/libphobos.init_fini/custom_gc.d: Adjust test.
      c8894b68
    • [ifcombine] robustify decode_field_reference · 5006b9d8
      Alexandre Oliva authored
      Arrange for decode_field_reference to use local variables throughout,
      to modify the out parms only when we're about to return non-NULL, and
      to drop the unused case of NULL pand_mask, that had a latent failure
      to detect signbit masking.
      
      
      for  gcc/ChangeLog
      
      	* gimple-fold.cc (decode_field_reference): Robustify to set
      	out parms only when returning non-NULL.
      	(fold_truth_andor_for_ifcombine): Bail if
      	decode_field_reference returns NULL.  Add complementary assert
      	on r_const's not being set when l_const isn't.
      5006b9d8
    • c++: re-enable NSDMI CONSTRUCTOR folding [PR118355] · e939005c
      Marek Polacek authored
      
      In c++/102990 we had a problem where massage_init_elt got {},
      digest_nsdmi_init turned that {} into { .value = (int) 1.0e+0 },
      and we crashed in the call to fold_non_dependent_init because
      a FIX_TRUNC_EXPR/FLOAT_EXPR got into tsubst*.  So we avoided
      calling fold_non_dependent_init for a CONSTRUCTOR.
      
      But that broke the following test, where we no longer fold the
      CONST_DECL in
        { .type = ZERO }
      to
        { .type = 0 }
      and then process_init_constructor_array does:
      
                  if (next != error_mark_node
                      && (initializer_constant_valid_p (next, TREE_TYPE (next))
                          != null_pointer_node))
                    {
                      /* Use VEC_INIT_EXPR for non-constant initialization of
                         trailing elements with no explicit initializers.  */
                      picflags |= PICFLAG_VEC_INIT;
      
      because { .type = ZERO } isn't initializer_constant_valid_p.  Then we
      create a VEC_INIT_EXPR and say we can't convert the argument.
      
      So we have to fold the elements of the CONSTRUCTOR.  We just can't
      instantiate the elements in a template.
      
      This also fixes c++/118047.
      
      	PR c++/118047
      	PR c++/118355
      
      gcc/cp/ChangeLog:
      
      	* typeck2.cc (massage_init_elt): Call fold_non_dependent_init
      	unless for a CONSTRUCTOR in a template.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp0x/nsdmi-list10.C: New test.
      	* g++.dg/cpp0x/nsdmi-list9.C: New test.
      
      Reviewed-by: Jason Merrill <jason@redhat.com>
      e939005c
    • Sandra Loosemore's avatar
      OpenMP: Remove dead code from declare variant reimplementation · d27db303
      Sandra Loosemore authored
      After reimplementing late resolution of "declare variant", the
      declare_variant_alt and calls_declare_variant_alt flags on struct
      cgraph_node are no longer used by anything.  For the purposes of
      marking functions that need late resolution, the
      has_omp_variant_constructs flag has replaced
      calls_declare_variant_alt.
      
      Likewise struct omp_declare_variant_entry, struct
      omp_declare_variant_base_entry, and the hash tables used to store
      these structures are no longer needed, since the information needed for
      late resolution is now stored in the gomp_variant_construct nodes.
      
      In addition, some obsolete code that was temporarily ifdef'ed out
      instead of deleted in order to produce a more readable patch for the
      previous installment of this series is now removed entirely.
      
      There are no functional changes in this patch, just removing dead code.
      
      gcc/ChangeLog
      	* cgraph.cc (symbol_table::create_edge): Don't set
      	calls_declare_variant_alt in the caller.
      	* cgraph.h (struct cgraph_node): Remove declare_variant_alt
      	and calls_declare_variant_alt flags.
      	* cgraphclones.cc (cgraph_node::create_clone): Don't copy
      	calls_declare_variant_alt bit.
      	* gimplify.cc: Remove previously #ifdef-ed out code.
      	* ipa-free-lang-data.cc (free_lang_data_in_decl): Adjust code
      	referencing declare_variant_alt bit.
      	* ipa.cc (symbol_table::remove_unreachable_nodes): Likewise.
      	* lto-cgraph.cc (lto_output_node): Remove references to deleted
      	bits.
      	(output_refs): Adjust code referencing declare_variant_alt bit.
      	(input_overwrite_node): Remove references to deleted bits.
      	(input_refs): Adjust code referencing declare_variant_alt bit.
      	* lto-streamer-out.cc (lto_output): Likewise.
      	* lto-streamer.h (omp_lto_output_declare_variant_alt): Delete.
      	(omp_lto_input_declare_variant_alt): Delete.
      	* omp-expand.cc (expand_omp_target): Use has_omp_variant_constructs
      	bit to trigger pass_omp_device_lower instead of
      	calls_declare_variant_alt.
      	* omp-general.cc (struct omp_declare_variant_entry): Delete.
      	(struct omp_declare_variant_base_entry): Delete.
      	(struct omp_declare_variant_hasher): Delete.
      	(omp_declare_variant_hasher::hash): Delete.
      	(omp_declare_variant_hasher::equal): Delete.
      	(omp_declare_variants): Delete.
      	(omp_declare_variant_alt_hasher): Delete.
      	(omp_declare_variant_alt_hasher::hash): Delete.
      	(omp_declare_variant_alt_hasher::equal): Delete.
      	(omp_declare_variant_alt): Delete.
      	(omp_lto_output_declare_variant_alt): Delete.
      	(omp_lto_input_declare_variant_alt): Delete.
      	(includes): Delete unnecessary include of gt-omp-general.h.
      	* omp-offload.cc (execute_omp_device_lower): Remove references
      	to deleted bit.
      	(pass_omp_device_lower::gate): Likewise.
      	* omp-simd-clone.cc (simd_clone_create): Likewise.
      	* passes.cc (ipa_write_summaries): Likewise.
      	* symtab.cc (symtab_node::get_partitioning_class): Likewise.
      	* tree-inline.cc (expand_call_inline): Likewise.
      	(tree_function_versioning): Likewise.
      
      gcc/lto/ChangeLog
      	* lto-partition.cc (lto_balanced_map): Adjust code referencing
      	deleted declare_variant_alt bit.
      d27db303
    • Sandra Loosemore's avatar
      OpenMP: Re-work and extend context selector resolution · 1294b819
      Sandra Loosemore authored
      
      This patch reimplements the middle-end support for "declare variant"
      and extends the resolution mechanism to also handle metadirectives
      (PR112779).  It also adds partial support for dynamic selectors
      (PR113904) and fixes a selector scoring bug reported as PR114596.  I hope
      this rewrite also improves the engineering aspect of the code, e.g. more
      comments to explain what it is doing.
      
      In most cases, variant constructs can be resolved either in the front
      end or during gimplification; if the variant with the highest score
      has a static selector, then only that one is emitted.  In the case
      where it has a dynamic selector, it is resolved into a (possibly nested)
      if/then/else construct, testing the run-time predicate for each selector
      sorted by decreasing order of score until a static selector is found.
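
      As a user-level illustration of a dynamic selector, here is a small
      sketch of a metadirective whose condition is only known at run
      time.  The function name sum_values and the threshold of 1000 are
      made up for this example.  Built without OpenMP support the pragma
      is ignored and the loop runs serially; the result is the same
      either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: sum a vector, asking for a parallel loop only when
// the input is large.  The user condition cannot be evaluated at compile
// time, so it is a dynamic selector and both code paths must be kept.
long sum_values(const std::vector<int>& v) {
  const std::size_t n = v.size();
  long total = 0;
#pragma omp metadirective \
    when(user={condition(n > 1000)}: parallel for reduction(+:total))
  for (std::size_t i = 0; i < n; ++i)
    total += v[i];
  return total;
}
```

      Conceptually this corresponds to an if/then/else on n > 1000 with
      the parallel loop in one arm and the plain loop in the other.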
      
      In some cases, notably a variant construct in a "declare simd"
      function which may or may not expand into a simd clone, it may not be
      possible to score or sort the variants until later in compilation (the
      ompdevlow pass).  In this case the gimplifier emits a loop containing
      a switch statement with the variants in arbitrary order and uses the
      OMP_NEXT_VARIANT tree node as a placeholder to control which variant
      is tested on each iteration of the loop.  It looks something like:
      
           switch_var = OMP_NEXT_VARIANT (0, state);
           loop_label:
           switch (switch_var)
             {
             case 1:
              if (dynamic_selector_predicate_1)
                {
                  alternative_1;
                  goto end_label;
                }
              else
                {
                  switch_var = OMP_NEXT_VARIANT (1, state);
                  goto loop_label;
                }
             case 2:
               ...
             }
            end_label:
      
      Note that when there are no dynamic selectors, the loop is unnecessary
      and only the switch is emitted.
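
      The control flow above can be modelled in ordinary C++ to see how
      the placeholder drives the loop.  This is only a simulation under
      assumptions: next_variant stands in for OMP_NEXT_VARIANT, the
      vector "order" stands in for the candidate list that is sorted
      later in compilation, and case 3 is assumed to be a variant with a
      static selector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for OMP_NEXT_VARIANT (current, state): given the case that just
// failed (0 means "start"), return the next case to try according to the
// sorted candidate order, or -1 if the list is exhausted.
int next_variant(int current, const std::vector<int>& order) {
  if (order.empty())
    return -1;
  if (current == 0)
    return order.front();
  for (std::size_t i = 0; i + 1 < order.size(); ++i)
    if (order[i] == current)
      return order[i + 1];
  return -1;
}

// Loop-around-a-switch dispatch.  p1 and p2 are the run-time predicates of
// two dynamic selectors; the return value identifies the chosen alternative.
int dispatch(const std::vector<int>& order, bool p1, bool p2) {
  int switch_var = next_variant(0, order);
  for (;;) {
    switch (switch_var) {
      case 1:
        if (p1) return 10;                     // alternative_1
        switch_var = next_variant(1, order);   // try the next candidate
        break;
      case 2:
        if (p2) return 20;                     // alternative_2
        switch_var = next_variant(2, order);
        break;
      case 3:
        return 30;                             // static selector: always taken
      default:
        return -1;                             // no candidate left
    }
  }
}
```

      Once the order is known, every call to next_variant with constant
      arguments folds to a constant, which is what lets an optimizer
      replace the loop and switch with direct jumps.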
      
      Finally, in the ompdevlow pass, the OMP_NEXT_VARIANT magic cookies are
      resolved and replaced with constants.  When compiling with -O we can
      expect that the loop and switch will be discarded by subsequent
      optimizations and replaced with direct jumps between the cases,
      eventually arriving at code with similar control flow to the
      early-resolution cases.
      
      This approach is somewhat simpler than the one currently used for
      handling declare variant in that all possible code paths are already
      included in the output of the gimplifier, so it is not necessary to
      maintain hidden references or data structures pointing to expansions of
      not-yet-resolved variant constructs and special logic for passing them
      through LTO (see PR lto/96680).
      
      A possible disadvantage of this expansion strategy is that dead code
      for unused variants in the switch can remain when compiling without
      -O.  If this turns out to be a critical problem (e.g., an unused case
      includes calls to functions not available to the linker) perhaps some
      further processing could be performed by default after ompdevlow to
      simplify such constructs.
      
      In order to make this patch more readable for review purposes, it
      leaves the existing code for "declare variant" resolution (including
      the above-mentioned LTO hack) in place, in some cases just ifdef-ing
      out functions that won't compile due to changed interfaces for
      dependencies.  The next patch in the series will delete all the
      now-unused code.
      
      gcc/ChangeLog
      	PR middle-end/114596
      	PR middle-end/112779
      	PR middle-end/113904
      
      	* Makefile.in (GTFILES): Move omp-general.h earlier; required
      	because of moving score_wide_int declaration to that file.
      	* cgraph.h (struct cgraph_node): Add has_omp_variant_constructs flag.
      	* cgraphclones.cc (cgraph_node::create_clone): Propagate
      	has_omp_variant_constructs flag.
      	* gimplify.cc (omp_resolved_variant_calls): New.
      	(expand_late_variant_directive): New.
      	(find_supercontext): New.
      	(gimplify_variant_call_expr): New.
      	(gimplify_call_expr): Adjust parameters to make fallback available.
      	Update processing for "declare variant" substitution.
      	(is_gimple_stmt): Add OMP_METADIRECTIVE.
      	(omp_construct_selector_matches): Ifdef out unused function.
      	(omp_get_construct_context): New.
      	(gimplify_omp_dispatch): Replace call to deleted function
      	omp_resolve_declare_variant with equivalent logic.
      	(expand_omp_metadirective): New.
      	(expand_late_variant_directive): New.
      	(gimplify_omp_metadirective): New.
      	(gimplify_expr): Adjust arguments to gimplify_call_expr.  Add
      	cases for OMP_METADIRECTIVE, OMP_NEXT_VARIANT, and
      	OMP_TARGET_DEVICE_MATCHES.
      	(gimplify_function_tree): Initialize/clean up
      	omp_resolved_variant_calls.
      	* gimplify.h (omp_construct_selector_matches): Delete declaration.
      	(omp_get_construct_context): Declare.
      	* lto-cgraph.cc (lto_output_node): Write has_omp_variant_constructs.
      	(input_overwrite_node): Read has_omp_variant_constructs.
      	* omp-builtins.def (BUILT_IN_OMP_GET_NUM_DEVICES): New.
      	* omp-expand.cc (expand_omp_taskreg): Propagate
      	has_omp_variant_constructs.
      	(expand_omp_target): Likewise.
      	* omp-general.cc (omp_maybe_offloaded): Add construct_context
      	parameter; use it instead of querying gimplifier state.  Add
      	comments.
      	(omp_context_name_list_prop): Do not test lang_GNU_Fortran in
      	offload compiler, just use the string as-is.
      	(expr_uses_parm_decl): New.
      	(omp_check_context_selector): Add metadirective_p parameter.
      	Remove sorry for target_device selector.  Add additional checks
      	specific to metadirective or declare variant.
      	(make_omp_metadirective_variant): New.
      	(omp_construct_traits_match): New.
      	(omp_context_selector_matches): Temporarily ifdef out the previous
      	code, and add a new implementation based on the old one with
      	different parameters, some unnecessary loops removed, and code
      	re-indented.
      	(omp_target_device_matches_on_host): New.
      	(resolve_omp_target_device_matches): New.
      	(omp_construct_simd_compare): Support matching of "simdlen" and
      	"aligned" clauses.
      	(omp_context_selector_set_compare): Make static.  Adjust call to
      	omp_construct_simd_compare.
      	(score_wide_int): Move declaration to omp-general.h.
      	(omp_selector_is_dynamic): New.
      	(omp_device_num_check): New.
      	(omp_dynamic_cond): New.
      	(omp_context_compute_score): Ifdef out the old version and
      	re-implement with different parameters.
      	(omp_complete_construct_context): New.
      	(omp_resolve_late_declare_variant): Ifdef out.
      	(omp_declare_variant_remove_hook): Likewise.
      	(omp_resolve_declare_variant): Likewise.
      	(sort_variant): New.
      	(omp_get_dynamic_candidates): New.
      	(omp_declare_variant_candidates): New.
      	(omp_metadirective_candidates): New.
      	(omp_early_resolve_metadirective): New.
      	(omp_resolve_variant_construct): New.
      	* omp-general.h (score_wide_int): Moved here from omp-general.cc.
      	(struct omp_variant): New.
      	(make_omp_metadirective_variant): Declare.
      	(omp_construct_traits_to_codes): Delete declaration.
      	(omp_check_context_selector): Adjust parameters.
      	(omp_context_selector_matches): Likewise.
      	(omp_context_selector_set_compare): Delete declaration.
      	(omp_resolve_declare_variant): Likewise.
      	(omp_declare_variant_candidates): Declare.
      	(omp_metadirective_candidates): Declare.
      	(omp_get_dynamic_candidates): Declare.
      	(omp_early_resolve_metadirective): Declare.
      	(omp_resolve_variant_construct): Declare.
      	(omp_dynamic_cond): Declare.
      	* omp-offload.cc (resolve_omp_variant_cookies): New.
      	(execute_omp_device_lower): Call the above function to resolve
      	variant directives.  Remove call to omp_resolve_declare_variant.
      	(pass_omp_device_lower::gate): Check has_omp_variant_constructs bit.
      	* omp-simd-clone.cc (simd_clone_create): Propagate
      	has_omp_variant_constructs bit.
      	* tree-inline.cc (expand_call_inline): Likewise.
      	(tree_function_versioning): Likewise.
      
      gcc/c/ChangeLog
      	PR middle-end/114596
      	PR middle-end/112779
      	PR middle-end/113904
      	* c-parser.cc (c_finish_omp_declare_variant): Update for changes
      	to omp-general.h interfaces.
      
      gcc/cp/ChangeLog
      	PR middle-end/114596
      	PR middle-end/112779
      	PR middle-end/113904
      	* decl.cc (omp_declare_variant_finalize_one): Update for changes
      	to omp-general.h interfaces.
      	* parser.cc (cp_finish_omp_declare_variant): Likewise.
      
      gcc/fortran/ChangeLog
      	PR middle-end/114596
      	PR middle-end/112779
      	PR middle-end/113904
      	* trans-openmp.cc (gfc_trans_omp_declare_variant): Update for changes
      	to omp-general.h interfaces.
      
      gcc/testsuite/
      	PR middle-end/114596
      	PR middle-end/112779
      	PR middle-end/113904
      	* c-c++-common/gomp/declare-variant-12.c: Adjust expected behavior
      	per PR114596.
      	* c-c++-common/gomp/declare-variant-13.c: Test that this is resolvable
      	after gimplification, not just final resolution.
      	* c-c++-common/gomp/declare-variant-14.c: Tweak testcase to ensure
      	that -O causes dead code to be optimized away.
      	* gfortran.dg/gomp/declare-variant-12.f90: Adjust expected behavior
      	per PR114596.
      	* gfortran.dg/gomp/declare-variant-13.f90: Test that this is resolvable
      	after gimplification, not just final resolution.
      	* gfortran.dg/gomp/declare-variant-14.f90: Tweak testcase to ensure
      	that -O causes dead code to be optimized away.
      
      Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
      Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>
      Co-Authored-By: Marcel Vollweiler <marcel@codesourcery.com>
      1294b819
    • Sandra Loosemore's avatar
      OpenMP: New tree nodes for metadirective and dynamic selector support. · 210a090e
      Sandra Loosemore authored
      
      This patch adds basic support for three new tree node types that will
      be used in subsequent patches to support OpenMP metadirectives and
      dynamic selectors.
      
      OMP_METADIRECTIVE is the internal representation of parsed OpenMP
      metadirective constructs.  It's produced by the front ends and is expanded
      during gimplification.
      
      OMP_NEXT_VARIANT is used as a "magic cookie" for late resolution of
      variant constructs that cannot be fully resolved during
      gimplification, used to set the controlling variable of a switch
      statement that branches to the next alternative once the candidate
      list can be filtered and sorted.  These nodes are expanded into
      constants in the ompdevlow pass.  In some gimple passes, they need to
      be treated as constants.
      
      OMP_TARGET_DEVICE_MATCHES is a similar "magic cookie" used to resolve
      the target_device dynamic selector.  It is wrapped in an OpenMP target
      construct, and can be resolved to a constant in the ompdevlow pass.
      
      gcc/ChangeLog:
      	* doc/generic.texi (OpenMP): Document OMP_METADIRECTIVE,
      	OMP_NEXT_VARIANT, and OMP_TARGET_DEVICE_MATCHES.
      	* fold-const.cc (operand_compare::hash_operand): Ignore
      	the new nodes.
      	* gimple-expr.cc (is_gimple_val): Allow OMP_NEXT_VARIANT
      	and OMP_TARGET_DEVICE_MATCHES.
      	* gimple.cc (get_gimple_rhs_num_ops): OMP_NEXT_VARIANT and
      	OMP_TARGET_DEVICE_MATCHES are both GIMPLE_SINGLE_RHS.
      	* tree-cfg.cc (tree_node_can_be_shared): Allow sharing of
      	OMP_NEXT_VARIANT.
      	* tree-inline.cc (remap_gimple_op_r): Ignore subtrees of
      	OMP_NEXT_VARIANT.
      	* tree-pretty-print.cc (dump_generic_node): Handle OMP_METADIRECTIVE,
      	OMP_NEXT_VARIANT, and OMP_TARGET_DEVICE_MATCHES.
      	* tree-ssa-operands.cc (operands_scanner::get_expr_operands):
      	Ignore operands of OMP_NEXT_VARIANT and OMP_TARGET_DEVICE_MATCHES.
      	* tree.def (OMP_METADIRECTIVE): New.
      	(OMP_NEXT_VARIANT): New.
      	(OMP_TARGET_DEVICE_MATCHES): New.
      	* tree.h (OMP_METADIRECTIVE_VARIANTS): New.
      	(OMP_METADIRECTIVE_VARIANT_SELECTOR): New.
      	(OMP_METADIRECTIVE_VARIANT_DIRECTIVE): New.
      	(OMP_METADIRECTIVE_VARIANT_BODY): New.
      	(OMP_NEXT_VARIANT_INDEX): New.
      	(OMP_NEXT_VARIANT_STATE): New.
      	(OMP_TARGET_DEVICE_MATCHES_SELECTOR): New.
      	(OMP_TARGET_DEVICE_MATCHES_PROPERTIES): New.
      
      Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
      Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>
      210a090e
    • Alexandre Oliva's avatar
      [ifcombine] check and extend constants to compare with bitfields · 22fe3c05
      Alexandre Oliva authored
      Add logic to check and extend constants compared with bitfields, so
      that fields are only compared with constants they could actually
      equal.  This involves making sure the signedness doesn't change
      between loads and conversions before shifts: we'd need to carry a lot
      more data to deal with all the possibilities.
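
      The underlying question, whether a constant is a value the field
      could actually hold, can be sketched as a range check.  This is an
      illustration of the idea only and not the logic in gimple-fold.cc;
      the function name and its parameters are hypothetical, and widths
      are assumed to be between 1 and 62 bits.

```cpp
#include <cassert>

// Hypothetical check: can a bit-field of the given width and signedness
// ever hold the value c?  If not, an equality comparison of the field
// against c has a known result and must not be merged as if c had simply
// been truncated to the field width.
bool constant_fits_field(long long c, unsigned width, bool is_signed) {
  long long min_value;
  long long max_value;
  if (is_signed) {
    min_value = -(1LL << (width - 1));
    max_value = (1LL << (width - 1)) - 1;
  } else {
    min_value = 0;
    max_value = (1LL << width) - 1;
  }
  return c >= min_value && c <= max_value;
}
```

      For a 3-bit unsigned field, 7 fits but 8 does not; truncating 8 to
      three bits would give 0 and make the comparison match the wrong
      values.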
      
      
      for  gcc/ChangeLog
      
      	PR tree-optimization/118456
      	* gimple-fold.cc (decode_field_reference): Punt if shifting
      	after changing signedness.
      	(fold_truth_andor_for_ifcombine): Check extension bits in
      	constants before clipping.
      
      for  gcc/testsuite/ChangeLog
      
      	PR tree-optimization/118456
      	* gcc.dg/field-merge-21.c: New.
      	* gcc.dg/field-merge-22.c: New.
      22fe3c05
    • Robin Dapp's avatar
      RISC-V: Fix vsetvl compatibility predicate [PR118154]. · e5e9e50f
      Robin Dapp authored
      In PR118154 we emit strided stores but the first of those does not
      always have the proper VTYPE.  That's because we erroneously delete
      a necessary vsetvl.
      
      In order to determine whether to elide
      
      (1)
            Expr[7]: VALID (insn 116, bb 17)
              Demand fields: demand_ratio_and_ge_sew demand_avl
              SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
              TAIL_POLICY=agnostic, MASK_POLICY=agnostic
              AVL=(reg:DI 0 zero)
      
      when e.g.
      
      (2)
            Expr[3]: VALID (insn 360, bb 15)
              Demand fields: demand_sew_lmul demand_avl
              SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
              TAIL_POLICY=agnostic, MASK_POLICY=agnostic
              AVL=(reg:DI 0 zero)
              VL=(reg:DI 13 a3 [345])
      
      is already available, we use
      sew_ge_and_prev_sew_le_next_max_sew_and_next_ratio_valid_for_prev_sew_p.
      
      (1) requires RATIO = SEW/LMUL = 16 and an SEW >= 8.  (2) has ratio = 64,
      though, so we cannot directly elide (1).
      
      This patch uses ratio_eq_p instead of next_ratio_valid_for_prev_sew_p.
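
      The arithmetic for the two expressions above can be checked with a
      small sketch.  The struct VType and the helper names are made up
      for illustration and are not the data structures in
      riscv-vsetvl.cc; LMUL is represented as a fraction so that mf2 is
      1/2.

```cpp
#include <cassert>

// Hypothetical representation of the relevant vtype fields.
struct VType {
  int sew;       // element width in bits
  int lmul_num;  // LMUL numerator   (m1 -> 1/1, mf2 -> 1/2)
  int lmul_den;  // LMUL denominator
};

// RATIO = SEW / LMUL, computed with integers.
constexpr int ratio(const VType& v) {
  return v.sew * v.lmul_den / v.lmul_num;
}

// Equal-ratio test in the spirit of the predicate the patch switches to.
constexpr bool same_ratio(const VType& a, const VType& b) {
  return ratio(a) == ratio(b);
}
```

      With SEW=8, LMUL=mf2 the ratio is 16 and with SEW=64, LMUL=m1 it
      is 64, so an equal-ratio test keeps the two configurations apart.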
      
      	PR target/118154
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vsetvl.cc (MAX_LMUL): New define.
      	(pre_vsetvl::earliest_fuse_vsetvl_info): Use.
      	(pre_vsetvl::pre_global_vsetvl_info): New predicate with equal
      	ratio.
      	* config/riscv/riscv-vsetvl.def: Use.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/pr118154-1.c: New test.
      	* gcc.target/riscv/rvv/autovec/pr118154-2.c: New test.
      e5e9e50f
    • Robin Dapp's avatar
      match: Keep conditional in simplification to constant [PR118140]. · 14cb0610
      Robin Dapp authored
      In PR118140 we simplify
      
        _ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11);
      
      to 1:
      
      Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1
      
      when _46 == 1.  This happens by removing the conditional and applying
      a | 1 = 1.  Normally we re-introduce the conditional and its else value
      if needed but that does not happen here as we're not dealing with a
      vector type.  For correctness's sake, we must not remove the conditional
      even for non-vector types.
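
      A scalar model makes the problem concrete.  The sketch below
      assumes boolean operands and treats .COND_IOR (cond, a, b, else)
      as "cond ? a | b : else"; the function names are hypothetical and
      this is not the match.pd or gimple-match code.

```cpp
#include <cassert>

// Reference semantics of the conditional operation.
bool cond_ior(bool cond, bool a, bool b, bool else_value) {
  return cond ? (a || b) : else_value;
}

// Unsound simplification: the condition is dropped, so with b == true the
// whole expression folds to true even when cond is false.
bool fold_dropping_condition(bool /*cond*/, bool a, bool b,
                             bool /*else_value*/) {
  return a || b;
}

// Sound simplification: simplify the operation, then re-introduce the
// select on cond with the else value (the role a COND_EXPR plays).
bool fold_keeping_condition(bool cond, bool a, bool b, bool else_value) {
  const bool simplified = b ? true : a;  // a | true == true, a | false == a
  return cond ? simplified : else_value;
}
```

      With cond false, b true and an else value of false, the reference
      result is false while the fold that drops the condition yields
      true.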
      
      This patch re-introduces a COND_EXPR in such cases.  For PR118140 this
      results in a non-vectorized loop.
      
      	PR middle-end/118140
      
      gcc/ChangeLog:
      
      	* gimple-match-exports.cc (maybe_resimplify_conditional_op): Add
      	COND_EXPR when we simplified to a scalar gimple value but still
      	have an else value.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/pr118140.c: New test.
      	* gcc.target/riscv/rvv/autovec/pr118140.c: New test.
      14cb0610
    • Nathaniel Shead's avatar
      c++/modules: Don't emit imported deduction guides [PR117397] · 87ffd205
      Nathaniel Shead authored
      
      The ICE in the linked PR is caused because name lookup finds duplicate
      copies of the deduction guides, causing a checking assert to fail.
      
      This is ultimately because we're exporting an imported guide; when name
      lookup processes 'dguide-5_b.H' it goes via the 'tt_entity' path and
      just returns the entity from 'dguide-5_a.H'.  Because this doesn't ever
      go through 'key_mergeable' we never set 'BINDING_VECTOR_GLOBAL_DUPS_P'
      and so deduping is not engaged, allowing duplicate results.
      
      Currently I believe this to be a peculiarity of the ANY_REACHABLE
      handling for deduction guides; in no other case that I can find do we
      emit bindings purely to imported entities.  As such, this patch fixes
      this problem from that end, by ensuring that we simply do not emit any
      imported deduction guides.  This avoids the ICE because no duplicates
      need deduping to start with, and should otherwise have no functional
      change because lookup of deduction guides will look at all reachable
      modules (exported or not) regardless.
      
      Since we're now deliberately not emitting imported deduction guides we
      can use LOOK_want::NORMAL instead of LOOK_want::ANY_REACHABLE, since the
      extra work to find as-yet undiscovered deduction guides in transitive
      importers is not necessary here.
      
      	PR c++/117397
      
      gcc/cp/ChangeLog:
      
      	* module.cc (depset::hash::add_deduction_guides): Don't emit
      	imported deduction guides.
      	(depset::hash::finalize_dependencies): Add check for any
      	bindings referring to imported entities.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/modules/dguide-5_a.H: New test.
      	* g++.dg/modules/dguide-5_b.H: New test.
      	* g++.dg/modules/dguide-5_c.H: New test.
      	* g++.dg/modules/dguide-6.h: New test.
      	* g++.dg/modules/dguide-6_a.C: New test.
      	* g++.dg/modules/dguide-6_b.C: New test.
      	* g++.dg/modules/dguide-6_c.C: New test.
      
      Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
      Reviewed-by: Jason Merrill <jason@redhat.com>
      87ffd205
    • Eric Botcazou's avatar
      Ada: add missing support for the S/390 and RISC-V architectures · 744a59f3
      Eric Botcazou authored
      ...to the object file reader present in the run-time library.
      
      gcc/ada/
      	PR ada/118459
      	* libgnat/s-objrea.ads (Object_Arch): Add S390 and RISCV.
      	* libgnat/s-objrea.adb (EM_S390): New named number.
      	(EM_RISCV): Likewise.
      	(ELF_Ops.Initialize): Deal with EM_S390 and EM_RISCV.
      	(Read_Address): Deal with S390 and RISCV.
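
      For readers unfamiliar with how an object file reader tells
      architectures apart, here is a rough C++ sketch of classifying an
      ELF header by its e_machine field.  It is not a translation of
      s-objrea.adb; the enum, function names, and the 20-byte minimum
      are assumptions of this example.  The values 22 (EM_S390) and 243
      (EM_RISCV) are the machine numbers assigned in the ELF
      specification.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ObjectArch { Unknown, S390, RISCV };

constexpr std::uint16_t kEmS390 = 22;    // EM_S390
constexpr std::uint16_t kEmRiscv = 243;  // EM_RISCV

// Classify an ELF file from the start of its header.  e_machine is a
// 16-bit field at byte offset 18; its byte order is given by EI_DATA
// (byte 5): 1 = little-endian, 2 = big-endian.
ObjectArch classify_elf(const unsigned char* h, std::size_t len) {
  if (len < 20 || h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F')
    return ObjectArch::Unknown;
  const bool big_endian = (h[5] == 2);
  const std::uint16_t machine =
      big_endian ? static_cast<std::uint16_t>((h[18] << 8) | h[19])
                 : static_cast<std::uint16_t>((h[19] << 8) | h[18]);
  switch (machine) {
    case kEmS390:  return ObjectArch::S390;
    case kEmRiscv: return ObjectArch::RISCV;
    default:       return ObjectArch::Unknown;  // others omitted in this sketch
  }
}

// Test helper: build the first 20 bytes of a 64-bit ELF header.
std::array<unsigned char, 20> make_header(bool big_endian,
                                          std::uint16_t machine) {
  std::array<unsigned char, 20> h{};
  h[0] = 0x7f; h[1] = 'E'; h[2] = 'L'; h[3] = 'F';
  h[4] = 2;                   // ELFCLASS64
  h[5] = big_endian ? 2 : 1;  // EI_DATA
  const unsigned char hi = static_cast<unsigned char>(machine >> 8);
  const unsigned char lo = static_cast<unsigned char>(machine & 0xff);
  h[18] = big_endian ? hi : lo;
  h[19] = big_endian ? lo : hi;
  return h;
}
```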
      744a59f3
    • Richard Biener's avatar
      tree-optimization/118405 - ICE with vector(1) T vs T load · 31c3c1a8
      Richard Biener authored
      When vectorizing a load we are now checking alignment before emitting
      a vector(1) T load instead of blindly assuming it's OK when we had
      a scalar T load.  For reasons we're not handling alignment computation
      optimally here but we shouldn't ICE when we fall back to loads of T.
      
      The following ensures the IL remains correct by emitting VIEW_CONVERT
      from T to vector(1) T when needed.  It also removes an earlier fix
      done in r9-382-gbb4e47476537f6 for the same issue with VMAT_ELEMENTWISE.
      
      	PR tree-optimization/118405
      	* tree-vect-stmts.cc (vectorizable_load): When we fall back
      	to scalar loads make sure we properly convert to vector(1) T
      	when there was only a single vector element.
      31c3c1a8
    • Anuj Mohite's avatar
      Fortran: Add LOCALITY support for DO_CONCURRENT · 20b8500c
      Anuj Mohite authored
      
      	This patch was provided by Anuj Mohite as part of the GSoC project.
      	It was modified slightly by Jerry DeLisle for minor formatting.
      	The patch provides front-end parsing of the LOCALITY specs in
      	DO_CONCURRENT and adds numerous test cases.
      
      gcc/fortran/ChangeLog:
      
      	* dump-parse-tree.cc (show_code_node):  Updated to use
      	c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
      	* frontend-passes.cc (index_interchange): Updated to
      	use c->ext.concur.forall_iterator instead of c->ext.forall_iterator.
      	(gfc_code_walker): Likewise.
      	* gfortran.h (enum locality_type): Added new enum for locality types
      	in DO CONCURRENT constructs.
      	* match.cc (match_simple_forall): Updated to use
      	new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
      	(gfc_match_forall): Likewise.
      	(gfc_match_do):  Implemented support for matching DO CONCURRENT locality
      	specifiers (LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE), and REDUCE).
      	* parse.cc (parse_do_block): Updated to use
      	new_st.ext.concur.forall_iterator instead of new_st.ext.forall_iterator.
      	* resolve.cc (struct check_default_none_data): Added struct
      	check_default_none_data.
      	(do_concur_locality_specs_f2023): New function to check compliance
      	with F2023's C1133 constraint for DO CONCURRENT.
      	(check_default_none_expr): New function to check DEFAULT(NONE)
      	compliance.
      	(resolve_locality_spec): New function to resolve locality specs.
      	(gfc_count_forall_iterators): Updated to use
      	code->ext.concur.forall_iterator.
      	(gfc_resolve_forall): Updated to use code->ext.concur.forall_iterator.
      	* st.cc (gfc_free_statement): Updated to free locality specifications
      	and use p->ext.concur.forall_iterator.
      	* trans-stmt.cc (gfc_trans_forall_1): Updated to use
      	code->ext.concur.forall_iterator.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/do_concurrent_10.f90: New test.
      	* gfortran.dg/do_concurrent_8_f2018.f90: New test.
      	* gfortran.dg/do_concurrent_8_f2023.f90: New test.
      	* gfortran.dg/do_concurrent_9.f90: New test.
      	* gfortran.dg/do_concurrent_all_clauses.f90: New test.
      	* gfortran.dg/do_concurrent_basic.f90: New test.
      	* gfortran.dg/do_concurrent_constraints.f90: New test.
      	* gfortran.dg/do_concurrent_local_init.f90: New test.
      	* gfortran.dg/do_concurrent_locality_specs.f90: New test.
      	* gfortran.dg/do_concurrent_multiple_reduce.f90: New test.
      	* gfortran.dg/do_concurrent_nested.f90: New test.
      	* gfortran.dg/do_concurrent_parser.f90: New test.
      	* gfortran.dg/do_concurrent_reduce_max.f90: New test.
      	* gfortran.dg/do_concurrent_reduce_sum.f90: New test.
      	* gfortran.dg/do_concurrent_shared.f90: New test.
      
      Signed-off-by: Anuj <anujmohite001@gmail.com>
      20b8500c