  1. Sep 23, 2024
    • Matthieu Longo's avatar
      aarch64: store signing key and signing method in DWARF _Unwind_FrameState · f5316739
      Matthieu Longo authored
      This patch is only a refactoring of the existing implementation
      of PAuth and return-address signing. The existing behavior is
      preserved.
      
      _Unwind_FrameState already contains various pieces of CIE and FDE
      information (see the attributes below the comment "The information we care
      about from the CIE/FDE" in libgcc/unwind-dw2.h).
      The patch aims at moving the information from the DWARF CIE (signing
      key stored in the augmentation string) and FDE (the signing method
      used) into _Unwind_FrameState alongside the already-stored CIE and
      FDE information.
      Note: this information has to be saved in frame_state_reg_info
      instead of _Unwind_FrameState, as it needs to be savable by
      DW_CFA_remember_state and restorable by DW_CFA_restore_state, both
      of which rely on the attribute "prev".
      
      This new information in _Unwind_FrameState simplifies the look-up
      of the signing key when the return address is demangled. It also
      allows future signing methods to be added easily.
      
      _Unwind_FrameState is not a part of the public API of libunwind,
      so the change is backward compatible.
      
      A new architecture-specific handler MD_ARCH_EXTENSION_FRAME_INIT
      allows resetting values (if needed) in the frame state and unwind
      context before changing the frame state to the caller's context.
      
      A new architecture-specific handler MD_ARCH_EXTENSION_CIE_AUG_HANDLER
      isolates the architecture-specific augmentation strings in the AArch64
      backend, and allows other architectures to reuse augmentation
      strings that would otherwise have clashed with the AArch64 DWARF extensions.
      
      aarch64_demangle_return_addr, DW_CFA_AARCH64_negate_ra_state and
      DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h
      were documented to clarify where the value of the RA state register
      is stored (FS and CONTEXT respectively).
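
      As a hedged illustration of the refactoring (the names below are
      approximate stand-ins, not the actual libgcc declarations), the shape of
      the per-frame state and of the RA-state toggle is roughly:

        /* Sketch only: keeping the key and method next to the register rules
           means DW_CFA_remember_state/restore_state save and restore them
           through the "prev" chain for free.  */
        enum aarch64_ra_signing_method { ra_no_signing, ra_signing_sp };
        enum aarch64_pauth_key { pauth_key_a, pauth_key_b };

        struct frame_state_reg_info_sketch
        {
          /* ... per-register rules, CFA rule, ...  */
          enum aarch64_pauth_key signing_key;        /* from the CIE augmentation string */
          enum aarch64_ra_signing_method ra_signing; /* toggled by negate_ra_state */
          struct frame_state_reg_info_sketch *prev;  /* DW_CFA_remember_state chain */
        };

        static inline void
        toggle_ra_state (struct frame_state_reg_info_sketch *fs)
        {
          fs->ra_signing = (fs->ra_signing == ra_no_signing
                            ? ra_signing_sp : ra_no_signing);
        }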
      
      libgcc/ChangeLog:
      
      	* config/aarch64/aarch64-unwind.h
      	(AARCH64_DWARF_RA_STATE_MASK): The mask for RA state register.
      	(aarch64_ra_signing_method_t): The diversifiers used to sign a
      	function's return address.
      	(aarch64_pointer_auth_key): The key used to sign a function's
      	return address.
      	(aarch64_cie_signed_with_b_key): Deleted as the signing key is
      	available now in _Unwind_FrameState.
      	(MD_ARCH_EXTENSION_CIE_AUG_HANDLER): New CIE augmentation string
      	handler for architecture extensions.
      	(MD_ARCH_EXTENSION_FRAME_INIT): New architecture-extension
      	initialization routine for DWARF frame state and context before
      	execution of DWARF instructions.
      	(aarch64_context_ra_state_get): Read RA state register from CONTEXT.
      	(aarch64_ra_state_get): Read RA state register from FS.
      	(aarch64_ra_state_set): Write RA state register into FS.
      	(aarch64_ra_state_toggle): Toggle RA state register in FS.
      	(aarch64_cie_aug_handler): Handle AArch64 augmentation strings.
      	(aarch64_arch_extension_frame_init): Initialize defaults for the
      	signing key (PAUTH_KEY_A), and RA state register (RA_no_signing).
      	(aarch64_demangle_return_addr): Rely on the frame registers and
      	the signing_key attribute in _Unwind_FrameState.
      	* unwind-dw2-execute_cfa.h:
      	Use the right alias DW_CFA_AARCH64_negate_ra_state for __aarch64__
      	instead of DW_CFA_GNU_window_save.
      	(DW_CFA_AARCH64_negate_ra_state): Save the signing method in RA
      	state register. Toggle RA state register without resetting 'how'
      	to REG_UNSAVED.
      	* unwind-dw2.c:
      	(extract_cie_info): Save the signing key in the current
      	_Unwind_FrameState while parsing the augmentation data.
      	(uw_frame_state_for): Reset some attributes related to architecture
      	extensions in _Unwind_FrameState.
      	(uw_update_context): Move authentication code to AArch64 unwinding.
      	* unwind-dw2.h (enum register_rule): Give a name to the existing
      	enum for the register rules, and replace 'unsigned char' by 'enum
      	register_rule' to facilitate debugging in GDB.
      	(_Unwind_FrameState): Add a new architecture-extension attribute
      	to store the signing key.
      f5316739
    • Tobias Burnus's avatar
      OpenMP: Fix omp_get_device_from_uid, minor cleanup · cdb9aa0f
      Tobias Burnus authored
      In Fortran, omp_get_device_from_uid can also accept substrings, which are
      then not NUL terminated.  Fixed by introducing a fortran.c wrapper function.
      Additionally, in case of failure the plugin functions now return NULL instead
      of failing fatally, so that a fall-back UID is generated.
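
      A minimal sketch of the wrapper idea, with hypothetical names (the real
      function is libgomp's new omp_get_device_from_uid_, and this assumes an
      omp.h that already declares the C-level omp_get_device_from_uid): the
      Fortran CHARACTER argument arrives with an explicit length and is not
      necessarily NUL terminated, so it is copied into a terminated buffer
      before calling the C API.

        #include <stdlib.h>
        #include <string.h>
        #include <omp.h>

        /* Hypothetical helper, not the actual libgomp code.  */
        static int
        device_from_uid_fstr (const char *uid, size_t uid_len)
        {
          char *buf = (char *) malloc (uid_len + 1);
          memcpy (buf, uid, uid_len);
          buf[uid_len] = '\0';
          int dev = omp_get_device_from_uid (buf);
          free (buf);
          return dev;
        }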
      
      gcc/ChangeLog:
      
      	* omp-general.cc (omp_runtime_api_procname): Strip "omp_" from
      	string; move get_device_from_uid as now a '_' suffix exists.
      
      libgomp/ChangeLog:
      
      	* fortran.c (omp_get_device_from_uid_): New function.
      	* libgomp.map (GOMP_6.0): Add it.
      	* oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'.
      	* omp_lib.f90.in: Make it used by removing bind(C).
      	* omp_lib.h.in: Likewise.
      	* target.c (omp_get_device_from_uid): Ensure the device is initialized.
      	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment;
      	return NULL in case of an error.
      	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise.
      	* testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.
      cdb9aa0f
    • Claudiu Zissulescu's avatar
      arc: Remove mlra option [PR113954] · ffd861c8
      Claudiu Zissulescu authored
      
      The target-dependent mlra option was designed to allow quickly
      switching between LRA and reload.  The reload register allocator step is
      scheduled for retirement; thus, remove the functionality of mlra,
      keeping the option for backward compatibility.
      
      	PR target/113954
      
      gcc/ChangeLog:
      
      	* config/arc/arc.cc (TARGET_LRA_P): Always return true.
      	(arc_lra_p): Remove.
      	* config/arc/arc.h (TARGET_LRA): Remove.
      	* config/arc/arc.opt (mlra): Change it to do nothing.
      	* doc/invoke.texi (mlra): Update option description.
      
      Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
      ffd861c8
    • Simon Martin's avatar
      c++: Don't crash when mangling member with anonymous union or template type [PR100632, PR109790] · a030fcad
      Simon Martin authored
      We currently crash upon mangling members that have an anonymous union or
      a template operator type.
      
      The problem is that before calling write_unqualified_name,
      write_member_name asserts that it has a declaration whose DECL_NAME is
      an identifier node that is not that of an operator. This is wrong:
       - In PR100632, it's an anonymous union declaration, hence a 0 DECL_NAME
       - In PR109790, it's a legitimate template declaration for an operator
         (this was accepted up to GCC 10)
      
      This assert was added via r11-6301, to be sure that we do write the "on"
      marker for operator members.
      
      This patch removes that assert and instead
       - Lets members with an anonymous union type go through
       - For operators, adds the missing "on" marker for ABI versions greater
         than the highest usable with GCC 10
      
      	PR c++/109790
      	PR c++/100632
      
      gcc/cp/ChangeLog:
      
      	* mangle.cc (write_member_name): Handle members whose type is an
      	anonymous union member. Write missing "on" marker for operators
      	when ABI version is at least 16.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp0x/decltype83.C: New test.
      	* g++.dg/cpp0x/decltype83a.C: New test.
      	* g++.dg/cpp1y/lambda-ice3.C: New test.
      	* g++.dg/cpp1y/lambda-ice3a.C: New test.
      	* g++.dg/cpp2a/nontype-class67.C: New test.
      a030fcad
    • Simon Martin's avatar
      c++: Don't ICE due to artificial constructor parameters [PR116722] · d7bf5e53
      Simon Martin authored
      The following code triggers an ICE
      
      === cut here ===
      class base {};
      class derived : virtual public base {
      public:
        template<typename Arg> constexpr derived(Arg) {}
      };
      int main() {
        derived obj(1.);
      }
      === cut here ===
      
      The problem is that cxx_bind_parameters_in_call ends up attempting to
      convert a REAL_CST (the first non-artificial parameter) to INTEGER_TYPE
      (the type of the __in_chrg parameter), which ICEs.
      
      This patch changes cxx_bind_parameters_in_call to return early if it's
      called with a *structor that has an __in_chrg or __vtt_parm parameter
      since the expression won't be a constant expression.
      
      Note that in the test case, the constructor is not constexpr-suitable;
      however, that is OK since it is a template, according to my reading of
      paragraph (3) of [dcl.constexpr].
      
      	PR c++/116722
      
      gcc/cp/ChangeLog:
      
      	* constexpr.cc (cxx_bind_parameters_in_call): Leave early for
      	{con,de}structors of classes with virtual bases.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp0x/constexpr-ctor22.C: New test.
      d7bf5e53
    • Saurabh Jha's avatar
      Add myself to write after approval · 346f767f
      Saurabh Jha authored
      ChangeLog:
      
      	* MAINTAINERS: Add myself to write after approval.
      346f767f
    • Richard Biener's avatar
      tree-optimization/116810 - out-of-bound access to matches[] · 2c04f175
      Richard Biener authored
      The following makes sure to apply forced splitting of groups for
      forced single-lane SLP only when the group being analyzed has more
      than one lane.  This avoids an out-of-bound access to matches[].
      
      	PR tree-optimization/116810
      	* tree-vect-slp.cc (vect_build_slp_instance): Only force
      	splitting for group_size > 1.
      2c04f175
    • Richard Biener's avatar
      tree-optimization/116796 - virtual LC SSA broken after unrolling · e97c75d6
      Richard Biener authored
      When the unroller unloops loops it tracks whether it changes any
      nesting relationship of remaining loops, but when scanning a loop's
      preheader it fails to pass down the LC-SSA-invalidated bitmap, losing
      the fact that an unrolled formerly inner loop can now be placed on
      an exit of its outer loop.  The following fixes that.
      
      	PR tree-optimization/116796
      	* cfgloopmanip.cc (fix_loop_placements): Get LC-SSA-invalidated
      	bitmap and pass it on.
      	(remove_path): Pass LC-SSA-invalidated to fix_loop_placements.
      e97c75d6
    • Tamar Christina's avatar
      middle-end: Insert invariant instructions before the gsi [PR116812] · 09892448
      Tamar Christina authored
      The new invariant statements should be inserted before the current
      statement and not after.  This goes fine 99% of the time but when the
      current statement is a gcond the control flow gets corrupted.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/116812
      	* tree-vect-slp.cc (vect_slp_region): Fix insertion.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/116812
      	* gcc.dg/vect/pr116812.c: New test.
      09892448
    • Richard Biener's avatar
      tree-optimization/116791 - Elementwise SLP vectorization · 723f7b6d
      Richard Biener authored
      The following restricts the elementwise SLP vectorization to the
      single-lane case, which is the reason I enabled it: to avoid regressions
      with non-SLP.  The PR shows that multi-lane SLP loads with elementwise
      accesses require work; I'll open a new bug to track this for the
      future.
      
      	PR tree-optimization/116791
      	* tree-vect-stmts.cc (get_group_load_store_type): Only
      	fall back to elementwise access for single-lane SLP, restore
      	hard failure mode for other cases.
      
      	* gcc.dg/vect/pr116791.c: New testcase.
      723f7b6d
    • Tobias Burnus's avatar
      gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h · dfb75079
      Tobias Burnus authored
      In commit r15-3629-g508ef585243d4674d06b0737bfe8769fc18f824f, #embed
      was added and the no-longer-required fprintf '#include' lines were removed,
      somehow missing that with -mstack-size=, the generated configure_stack_size
      will use 'setenv' and 'true'.
      
      gcc/ChangeLog:
      
      	* config/gcn/mkoffload.cc (process_asm): (Re)add the fprintf
      	lines for stdlib.h/stdbool.h inclusion if gcn_stack_size is used.
      dfb75079
    • Pan Li's avatar
      Genmatch: Fix ICE for binary phi cfg mismatching [PR116795] · 999363c5
      Pan Li authored
      
      This patch would like to fix one ICE when trying to match the binary
      phi for the below cfg.  We checked that the first edge of the Phi block
      comes from b0, instead of checking that the only edge of b1 comes from
      b0.  Thus, some code would be recognized as .SAT_SUB when it is not,
      finally resulting in a verify_ssa failure.
      
      +------+
      | b0:  |
      | def  |       +-----+
      | ...  |       | b1: |
      | cond |------>| def |
      +------+       | ... |
         |           +-----+
         |              |
         |              |
         v              |
      +-----+           |
      | b2: |           |
      | Phi |<----------+
      +-----+
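
      A self-contained model of the corrected check (not the GCC internals):
      b1 must have exactly one predecessor and that predecessor must be b0,
      rather than only testing that the first edge into the Phi block b2
      comes from b0.

        #include <vector>

        struct block { std::vector<const block *> preds; };

        // True iff b1 is reached only through b0, mirroring the intent above.
        static bool
        b1_fed_only_from_b0 (const block &b0, const block &b1)
        {
          return b1.preds.size () == 1 && b1.preds[0] == &b0;
        }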
      
      The below test suites are passed for this patch.
      * The rv64gcv fully regression test.
      * The x86 bootstrap test.
      * The x86 fully regression test.
      
      	PR target/116795
      
      gcc/ChangeLog:
      
      	* gimple-match-head.cc (match_cond_with_binary_phi): Fix the
      	incorrect cfg check as b0->b1 in above example.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/torture/pr116795-1.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      999363c5
    • Andrew Pinski's avatar
      gimple: Simplify gimple_seq_nondebug_singleton_p · 831137be
      Andrew Pinski authored
      
      The implementation of gimple_seq_nondebug_singleton_p
      was convoluted in how it determined whether the sequence
      was a singleton (which could also contain debug statements).
      
      This simplifies the function into two calls: one to get the start
      after all of the debug statements, and then a check to see whether it
      is at the one before the end (or only debug statements come
      afterwards).
      
      Bootstrapped and tested on x86_64-linux-gnu (including ada).
      
      gcc/ChangeLog:
      
      	* gimple-iterator.h (gimple_seq_nondebug_singleton_p):
      	Rewrite to simply use gsi_start_nondebug/gsi_one_nondebug_before_end_p.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
      831137be
    • Andrew Pinski's avatar
      gimple: Remove custom remove_pointer · 2cd76720
      Andrew Pinski authored
      
      Since r11-2700-g22dc89f8073cd0, type_traits has been included via system.h so
      we don't need a custom version for gimple.h.
      
      Note a small C++14 cleanup is to use remove_pointer_t directly here instead
      of remove_pointer<T>::type.
      
      bootstrapped and tested on x86_64-linux-gnu
      
      gcc/ChangeLog:
      
      	* gimple.h (remove_pointer): Remove.
      	(GIMPLE_CHECK2): Use std::remove_pointer instead of custom one.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
      2cd76720
    • Andrew Pinski's avatar
      Remove commented out PHI_ARG_DEF macro definition · 0d68bfe2
      Andrew Pinski authored
      
      This has been commented out since r0-125500-g80560f9521f81a, when a new
      definition was added at the same time. Let's remove the commented-out
      version.
      
      gcc/ChangeLog:
      
      	* tree-ssa-operands.h (PHI_ARG_DEF): Remove definition.
      
      Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
      0d68bfe2
    • Aldy Hernandez's avatar
      Update email in MAINTAINERS file. · 52783489
      Aldy Hernandez authored
      ChangeLog:
      
      	* MAINTAINERS: Update email and add myself to DCO.
      52783489
    • Pan Li's avatar
      Match: Support form 2 for vector signed integer .SAT_ADD · 4fc92480
      Pan Li authored
      
      This patch would like to support form 2 of the vector signed
      integer .SAT_ADD.  Aka the example below:
      
      Form 2:
        #define DEF_VEC_SAT_S_ADD_FMT_2(T, UT, MIN, MAX)                     \
        void __attribute__((noinline))                                       \
        vec_sat_s_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
        {                                                                    \
          unsigned i;                                                        \
          for (i = 0; i < limit; i++)                                        \
            {                                                                \
              T x = op_1[i];                                                 \
              T y = op_2[i];                                                 \
              T sum = (UT)x + (UT)y;                                         \
              if ((x ^ y) < 0 || (sum ^ x) >= 0)                             \
                out[i] = sum;                                                \
              else                                                           \
                out[i] = x < 0 ? MIN : MAX;                                  \
            }                                                                \
        }
      
      DEF_VEC_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)
      
      Before this patch:
       104   │   loop_len_79 = MIN_EXPR <ivtmp.51_53, POLY_INT_CST [16, 16]>;
       105   │   _50 = &MEM <vector([16,16]) signed char> [(int8_t *)vectp_op_1.9_77];
       106   │   vect_x_18.11_80 = .MASK_LEN_LOAD (_50, 8B, { -1, ... }, loop_len_79, 0);
       107   │   _70 = vect_x_18.11_80 >> 7;
       108   │   vect_x.12_81 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_x_18.11_80);
       109   │   _26 = (void *) ivtmp.47_20;
       110   │   _27 = &MEM <vector([16,16]) signed char> [(int8_t *)_26];
       111   │   vect_y_20.15_84 = .MASK_LEN_LOAD (_27, 8B, { -1, ... }, loop_len_79, 0);
       112   │   vect__7.21_90 = vect_x_18.11_80 ^ vect_y_20.15_84;
       113   │   mask__50.23_92 = vect__7.21_90 >= { 0, ... };
       114   │   vect_y.16_85 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_y_20.15_84);
       115   │   vect__6.17_86 = vect_x.12_81 + vect_y.16_85;
       116   │   vect_sum_21.18_87 = VIEW_CONVERT_EXPR<vector([16,16]) signed char>(vect__6.17_86);
       117   │   vect__8.19_88 = vect_x_18.11_80 ^ vect_sum_21.18_87;
       118   │   mask__45.20_89 = vect__8.19_88 < { 0, ... };
       119   │   mask__44.24_93 = mask__45.20_89 & mask__50.23_92;
       120   │   _40 = .COND_XOR (mask__44.24_93, _70, { 127, ... }, vect_sum_21.18_87);
       121   │   _60 = (void *) ivtmp.49_6;
       122   │   _61 = &MEM <vector([16,16]) signed char> [(int8_t *)_60];
       123   │   .MASK_LEN_STORE (_61, 8B, { -1, ... }, loop_len_79, 0, _40);
       124   │   vectp_op_1.9_78 = vectp_op_1.9_77 + POLY_INT_CST [16, 16];
       125   │   ivtmp.47_4 = ivtmp.47_20 + POLY_INT_CST [16, 16];
       126   │   ivtmp.49_21 = ivtmp.49_6 + POLY_INT_CST [16, 16];
       127   │   ivtmp.51_98 = ivtmp.51_53;
       128   │   ivtmp.51_8 = ivtmp.51_53 + POLY_INT_CST [18446744073709551600, 18446744073709551600];
      
      After this patch:
        88   │   _103 = .SELECT_VL (ivtmp_101, POLY_INT_CST [16, 16]);
        89   │   vect_x_18.11_90 = .MASK_LEN_LOAD (vectp_op_1.9_88, 8B, { -1, ... }, _103, 0);
        90   │   vect_y_20.14_94 = .MASK_LEN_LOAD (vectp_op_2.12_92, 8B, { -1, ... }, _103, 0);
        91   │   vect_patt_49.15_95 = .SAT_ADD (vect_x_18.11_90, vect_y_20.14_94);
        92   │   .MASK_LEN_STORE (vectp_out.16_97, 8B, { -1, ... }, _103, 0, vect_patt_49.15_95);
        93   │   vectp_op_1.9_89 = vectp_op_1.9_88 + _103;
        94   │   vectp_op_2.12_93 = vectp_op_2.12_92 + _103;
        95   │   vectp_out.16_98 = vectp_out.16_97 + _103;
        96   │   ivtmp_102 = ivtmp_101 - _103;
      
      The below test suites are passed for this patch.
      * The rv64gcv fully regression test.
      * The x86 bootstrap test.
      * The x86 fully regression test.
      
      gcc/ChangeLog:
      
      	* match.pd: Add the case 3 for signed .SAT_ADD matching.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      4fc92480
    • Pan Li's avatar
      RISC-V: Add testcases for form 2 of signed vector SAT_ADD · a1e6bb6f
      Pan Li authored
      
      Form 2:
        #define DEF_VEC_SAT_S_ADD_FMT_2(T, UT, MIN, MAX)                     \
        void __attribute__((noinline))                                       \
        vec_sat_s_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
        {                                                                    \
          unsigned i;                                                        \
          for (i = 0; i < limit; i++)                                        \
            {                                                                \
              T x = op_1[i];                                                 \
              T y = op_2[i];                                                 \
              T sum = (UT)x + (UT)y;                                         \
              if ((x ^ y) < 0 || (sum ^ x) >= 0)                             \
                out[i] = sum;                                                \
              else                                                           \
                out[i] = x < 0 ? MIN : MAX;                                  \
            }                                                                \
        }
      
      DEF_VEC_SAT_S_ADD_FMT_2 (int8_t, uint8_t, INT8_MIN, INT8_MAX)
      
      The below test are passed for this patch.
      * The rv64gcv fully regression test.
      
      It is a test-only patch and obvious up to a point; it will be committed
      directly if there are no comments in the next 48H.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macro.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-5.c: New test.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-6.c: New test.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-7.c: New test.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-8.c: New test.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-5.c: New test.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-6.c: New test.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-7.c: New test.
      	* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-8.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      a1e6bb6f
    • Hans-Peter Nilsson's avatar
      testsuite/gfortran.dg/unsigned_22.f90: Add missing close with delete, PR116701 · 3f37c6f4
      Hans-Peter Nilsson authored
      Without this patch, gfortran.dg/unsigned_22.f90 fails for
      non-effective-target fd_truncate targets, i.e. targets that
      don't support chsize or ftruncate.  See also
      libgfortran/io/unix.c:raw_truncate.  It passes on the first
      run, but leaves behind a file "fort.10" which is then picked
      up by subsequent runs, but since that file is to be
      rewritten, the libgfortran machinery tries to truncate it,
      which fails.  The file always being left behind is
      primarily because the test-case lacks a deleting
      close-statement, apparently accidentally.
      
      Incidentally, this "fort.10" artefact is also picked up by
      gfortran.dg/write_check3.f90, causing that test to fail too,
      observable as a regression for non-fd_truncate targets since
      the unsigned_22.f90 introduction.  Also, when running
      e.g. the whole of gfortran.dg/dg.exp, the "fort.10" is later
      deleted by gfortran.dg/write_direct_eor.f90 (which
      passes regardless), erasing the clue to the cause of the
      write_check3 failure.  Also, running just
      dg.exp=write_check3.f90 or manually repeating the commands
      in gfortran.log showed no error.
      
      N.B.: this close-statement will not help if unsigned_22 for
      some reason fails, executing one of the "stop" statements,
      but that's also the case for many other tests.
      
      	PR testsuite/116701
      	* gfortran.dg/unsigned_22.f90: Add missing close with delete.
      3f37c6f4
    • GCC Administrator's avatar
      Daily bump. · ca12354f
      GCC Administrator authored
      ca12354f
  2. Sep 22, 2024
    • Pan Li's avatar
      RISC-V: Add testcases for form 4 of signed scalar SAT_ADD · 50c9c3cb
      Pan Li authored
      
      Form 4:
        #define DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX)           \
        T __attribute__((noinline))                            \
        sat_s_add_##T##_fmt_4 (T x, T y)                       \
        {                                                      \
          T sum;                                               \
          bool overflow = __builtin_add_overflow (x, y, &sum); \
          return !overflow ? sum : x < 0 ? MIN : MAX;          \
        }
      
      DEF_SAT_S_ADD_FMT_4 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
      
      The below test are passed for this patch.
      * The rv64gcv fully regression test.
      
      It is a test-only patch and obvious up to a point; it will be committed
      directly if there are no comments in the next 48H.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/sat_arith.h: Add test helper macros.
      	* gcc.target/riscv/sat_s_add-13.c: New test.
      	* gcc.target/riscv/sat_s_add-14.c: New test.
      	* gcc.target/riscv/sat_s_add-15.c: New test.
      	* gcc.target/riscv/sat_s_add-16.c: New test.
      	* gcc.target/riscv/sat_s_add-run-13.c: New test.
      	* gcc.target/riscv/sat_s_add-run-14.c: New test.
      	* gcc.target/riscv/sat_s_add-run-15.c: New test.
      	* gcc.target/riscv/sat_s_add-run-16.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      50c9c3cb
    • Pan Li's avatar
      RISC-V: Add testcases for form 3 of signed scalar SAT_ADD · 20ec2c5d
      Pan Li authored
      
      This patch would like to add testcases of the signed scalar SAT_ADD
      for form 3.  Aka:
      
      Form 3:
        #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)           \
        T __attribute__((noinline))                            \
        sat_s_add_##T##_fmt_3 (T x, T y)                       \
        {                                                      \
          T sum;                                               \
          bool overflow = __builtin_add_overflow (x, y, &sum); \
          return overflow ? x < 0 ? MIN : MAX : sum;           \
        }
      
      DEF_SAT_S_ADD_FMT_3 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
      
      The below test are passed for this patch.
      * The rv64gcv fully regression test.
      
      It is a test-only patch and obvious up to a point; it will be committed
      directly if there are no comments in the next 48H.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/sat_arith.h: Add test helper macros.
      	* gcc.target/riscv/sat_s_add-10.c: New test.
      	* gcc.target/riscv/sat_s_add-11.c: New test.
      	* gcc.target/riscv/sat_s_add-12.c: New test.
      	* gcc.target/riscv/sat_s_add-9.c: New test.
      	* gcc.target/riscv/sat_s_add-run-10.c: New test.
      	* gcc.target/riscv/sat_s_add-run-11.c: New test.
      	* gcc.target/riscv/sat_s_add-run-12.c: New test.
      	* gcc.target/riscv/sat_s_add-run-9.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      20ec2c5d
    • Iain Sandoe's avatar
      testsuite, coroutines: Add tests for non-suspension ramp returns. · 0312b666
      Iain Sandoe authored
      
      Although it is most common for the ramp function to see a return when a coroutine
      first suspends, there are other possibilities.  For example, all the awaits could
      be ready; effectively, the coroutine will then run to completion and be deallocated.
      Another case is where the first active suspension point causes the current coroutine
      to be cancelled and thence destroyed.
      
      These cases are tested here.
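
      A hedged, compilable illustration of the "all awaits ready" case (this is
      not one of the new tests; task, always_ready and demo are made-up names):
      the awaitable never suspends, so the coroutine runs to completion and is
      deallocated inside the ramp, and only then does the ramp return.

        #include <coroutine>
        #include <cstdio>

        struct task {
          struct promise_type {
            task get_return_object () { return {}; }
            std::suspend_never initial_suspend () noexcept { return {}; }
            std::suspend_never final_suspend () noexcept { return {}; }
            void return_void () {}
            void unhandled_exception () {}
          };
        };

        struct always_ready {
          bool await_ready () const noexcept { return true; }   // never suspend
          void await_suspend (std::coroutine_handle<>) const noexcept {}
          void await_resume () const noexcept {}
        };

        task demo ()
        {
          co_await always_ready{};  // ready: no suspension, no early ramp return
          std::puts ("ran to completion inside the ramp");
        }

        int main () { demo (); }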
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/coroutines/torture/special-termination-00-sync-completion.C: New test.
      	* g++.dg/coroutines/torture/special-termination-01-self-destruct.C: New test.
      
      Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
      0312b666
    • Iain Sandoe's avatar
      libgcc, Darwin: From macOS 11, make that the earliest supported. · 43eab549
      Iain Sandoe authored
      
      For libgcc, we have (so far) supported building a DSO that supports
      earlier versions of the OS than the target.  From macOS 11, there are
      APIs that do not exist on earlier OS versions, so limit the libgcc
      range to macOS11..current.
      
      libgcc/ChangeLog:
      
      	* config.host: From macOS 11, limit earliest macOS support
      	to macOS 11.
      	* config/t-darwin-min-11: New file.
      
      Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
      43eab549
    • Jonathan Wakely's avatar
      libstdc++: Disable std::formatter<char8_t, C> specialization · 0f52a92a
      Jonathan Wakely authored
      I noticed that char8_t was missing from the list of types that were
      prevented from using the std::formatter partial specialization for
      integer types. That partial specialization was also matching
      cv-qualified integer types, because std::integral<const int> is true.
      
      This change simplifies the constraints by introducing a new variable
      template which is only true for cv-unqualified integer types, with
      explicit specializations to exclude the character types. This should be
      slightly more efficient than the previous constraints that checked
      std::integral<T> and (!__is_one_of<T, char, wchar_t, ...>). It also
      avoids the need for a separate std::formatter specialization for 128-bit
      integers, as they can be handled by the new variable template too.
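
      A minimal sketch of the constraint, with hypothetical names rather than
      the libstdc++ declarations: a variable template that is true only for
      cv-unqualified integer types, with explicit specializations switching it
      off for the character types.

        #include <type_traits>

        template<typename T>
          inline constexpr bool formattable_integer
            = std::is_integral_v<T> && std::is_same_v<T, std::remove_cv_t<T>>;

        template<> inline constexpr bool formattable_integer<char>     = false;
        template<> inline constexpr bool formattable_integer<wchar_t>  = false;
        template<> inline constexpr bool formattable_integer<char8_t>  = false;
        template<> inline constexpr bool formattable_integer<char16_t> = false;
        template<> inline constexpr bool formattable_integer<char32_t> = false;

        static_assert (formattable_integer<int>);
        static_assert (!formattable_integer<const int>);  // cv-qualified: excluded
        static_assert (!formattable_integer<char8_t>);    // character type: excluded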
      
      libstdc++-v3/ChangeLog:
      
      	* include/std/format (__format::__is_formattable_integer): New
      	variable template and specializations.
      	(template<integral, __char> struct formatter): Replace
      	constraints on first arg with __is_formattable_integer.
      	* testsuite/std/format/formatter/requirements.cc: Check that
      	std::formatter specializations for char8_t and const int are
      	disabled.
      0f52a92a
    • Jonathan Wakely's avatar
      libstdc++: Fix condition for ranges::copy to use memmove [PR116754] · 83c6fe13
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/116754
      	* include/bits/ranges_algobase.h (__copy_or_move): Fix order of
      	arguments to __memcpyable.
      83c6fe13
    • Jonathan Wakely's avatar
      libstdc++: Fix formatting of most negative chrono::duration [PR116755] · 482e651f
      Jonathan Wakely authored
      When formatting chrono::duration<signed-integer-type, P>::min() we were
      causing undefined behaviour by trying to form the negative of the most
      negative value. If we convert negative durations with integer rep to the
      corresponding unsigned integer rep then we can safely represent all
      values.
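
      A hedged illustration of the idea (not the libstdc++ code): negating the
      most negative value of the signed rep overflows, but converting to the
      corresponding unsigned type first represents every magnitude exactly,
      since unsigned wrap-around is well defined.

        #include <cstdint>

        constexpr std::uint64_t
        magnitude (std::int64_t r)
        {
          const std::uint64_t u = static_cast<std::uint64_t> (r);
          return r < 0 ? std::uint64_t (0) - u : u;
        }

        static_assert (magnitude (INT64_MIN) == std::uint64_t (1) << 63);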
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/116755
      	* include/bits/chrono_io.h (formatter<duration<R,P>>::format):
      	Cast negative integral durations to unsigned rep.
      	* testsuite/20_util/duration/io.cc: Test the most negative
      	integer durations.
      482e651f
    • Jonathan Wakely's avatar
      libstdc++: Use constexpr instead of _GLIBCXX20_CONSTEXPR in <vector> · b6463161
      Jonathan Wakely authored
      For the operator<=> overload we can use the 'constexpr' keyword
      directly, because we know the language dialect is at least C++20.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/stl_vector.h (operator<=>): Use constexpr
      	instead of _GLIBCXX20_CONSTEXPR macro.
      b6463161
    • Jonathan Wakely's avatar
      libstdc++: Silence -Wattributes warning in exception_ptr · 164c1b1f
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* libsupc++/exception_ptr.h (__exception_ptr::_M_safe_bool_dummy):
      	Remove __attribute__((const)) from function returning void.
      164c1b1f
    • Jonathan Wakely's avatar
      libstdc++: Silence -Woverloaded-virtual warning in cxx11-ios_failure.cc · d842eb5e
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* src/c++11/cxx11-ios_failure.cc (__iosfail_type_info): Unhide
      	the three-arg overload of __do_upcast.
      d842eb5e
    • Jonathan Wakely's avatar
      libstdc++: Reorder C++26 entries in version.def · d024be89
      Jonathan Wakely authored
      This puts the C++26 ftms definitions in alphabetical order.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/version.def: Sort C++26 entries alphabetically.
      	* include/bits/version.h: Regenerate.
      d024be89
    • Jonathan Wakely's avatar
      libstdc++: add default template parameters to algorithms · dc47add7
      Jonathan Wakely authored
      
      This implements P2248R8 + P3217R0, both approved for C++26.
      The changes are mostly mechanical; the struggle is to keep readability
      with the pre-P2248 signatures.
      
      * For containers, "classic STL" algorithms and their parallel versions,
        introduce a macro and amend their declarations/definitions with it.
        The macro either expands to the defaulted parameter or to nothing
        in pre-C++26 modes.
      
      * For range algorithms, we need to reorder their template parameters.
        I've done so unconditionally, because users cannot rely on template
        parameters of algorithms (this is explicitly authorized by
        [algorithms.requirements]/15). The defaults are then hidden behind
        another macro.
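
      A usage illustration of what the defaults enable (assuming a library that
      implements P2248/P3217; not part of the patch itself): with the defaulted
      value-type parameter, a braced-init-list works because T defaults to the
      projected value type of the range.

        #include <algorithm>
        #include <utility>
        #include <vector>

        int main ()
        {
          std::vector<std::pair<int, int>> v{{1, 1}, {1, 2}, {2, 2}};
          auto it = std::ranges::find (v, {1, 2});  // T = std::pair<int, int> by default
          return it != v.end () ? 0 : 1;
        }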
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/iterator_concepts.h: Add projected_value_t.
      	* include/bits/algorithmfwd.h: Add the default template
      	parameter to the relevant forward declarations.
      	* include/pstl/glue_algorithm_defs.h: Likewise.
      	* include/bits/ranges_algo.h: Add the default template
      	parameter to range-based algorithms.
      	* include/bits/ranges_algobase.h: Likewise.
      	* include/bits/ranges_util.h: Likewise.
      	* include/bits/ranges_base.h: Add helper macros.
      	* include/bits/stl_iterator_base_types.h: Add helper macro.
      	* include/bits/version.def: Add the new feature-testing macro.
      	* include/bits/version.h: Regenerate.
      	* include/std/algorithm: Pull the feature-testing macro.
      	* include/std/ranges: Likewise.
      	* include/std/deque: Pull the feature-testing macro, add
      	the default for std::erase.
      	* include/std/forward_list: Likewise.
      	* include/std/list: Likewise.
      	* include/std/string: Likewise.
      	* include/std/vector: Likewise.
      	* testsuite/23_containers/default_template_value.cc: New test.
      	* testsuite/25_algorithms/default_template_value.cc: New test.
      
      Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
      Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
      dc47add7
    • Tamar Christina's avatar
      middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern · 4150bcd2
      Tamar Christina authored
      Currently the vectorizer cheats when lowering COND_EXPR during bool recog.
      In the cases where the conditional is loop invariant or non-boolean it instead
      converts the operation back into GENERIC and hides much of the operation from
      the analysis part of the vectorizer.
      
      i.e.
      
        a ? b : c
      
      is transformed into:
      
        a != 0 ? b : c
      
      however, by doing so we can't perform any optimization on the masks as they
      aren't explicit until quite late during codegen.
      
      To fix this, this patch lowers booleans earlier and so ensures that we always
      stay in GIMPLE.
      
      When the value is a loop-invariant boolean, we have to generate an additional
      conversion from bool to the integer mask form.
      
      This is done by creating a loop invariant a ? -1 : 0 with the target mask
      precision and then doing a normal != 0 comparison on that.
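
      A worked scalar stand-in for that lowering (illustration only, not the
      patch): the invariant bool is widened to the mask precision as
      (a ? -1 : 0), and the select then becomes an ordinary != 0 comparison
      that stays visible as a mask.

        int
        select_with_explicit_mask (bool a, int b, int c)
        {
          signed char mask = a ? -1 : 0;  // loop invariant; computable once in the preheader
          return mask != 0 ? b : c;
        }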
      
      To support this, the patch also adds the ability, during pattern matching,
      to create a loop-invariant pattern that won't be seen by the vectorizer and
      will instead be materialized inside the loop preheader in the case of loops,
      or in the first BB of the region in the case of BB vectorization.
      
      gcc/ChangeLog:
      
      	* tree-vect-patterns.cc (append_inv_pattern_def_seq): New.
      	(vect_recog_bool_pattern): Lower COND_EXPRs.
      	* tree-vect-slp.cc (vect_slp_region): Materialize loop invariant
      	statements.
      	* tree-vect-loop.cc (vect_transform_loop): Likewise.
      	* tree-vect-stmts.cc (vectorizable_comparison_1): Remove
      	VECT_SCALAR_BOOLEAN_TYPE_P handling for vectype.
      	* tree-vectorizer.cc (vec_info::vec_info): Initialize
      	inv_pattern_def_seq.
      	* tree-vectorizer.h (LOOP_VINFO_INV_PATTERN_DEF_SEQ): New.
      	(class vec_info): Add inv_pattern_def_seq.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
      	* gcc.dg/vect/vect-conditional_store_5.c: New test.
      	* gcc.dg/vect/vect-conditional_store_6.c: New test.
      4150bcd2
    • Tamar Christina's avatar
      aarch64: Take into account when VF is higher than known scalar iters · e84e5d03
      Tamar Christina authored
      Consider low overhead loops like:
      
      void
      foo (char *restrict a, int *restrict b, int *restrict c, int n)
      {
        for (int i = 0; i < 9; i++)
          {
            int res = c[i];
            int t = b[i];
            if (a[i] != 0)
              res = t;
            c[i] = res;
          }
      }
      
      For such loops we use latency-only costing since the loop bound is known and
      small.
      
      The current costing however does not consider the case where niters < VF.
      
      So when comparing the scalar vs vector costs it doesn't keep in mind that the
      scalar code can't perform VF iterations.  This makes it overestimate the cost
      for the scalar loop and we incorrectly vectorize.
      
      This patch takes the minimum of the VF and niters in such cases.
      Before the patch we generate:
      
       note:  Original vector body cost = 46
       note:  Vector loop iterates at most 1 times
       note:  Scalar issue estimate:
       note:    load operations = 2
       note:    store operations = 1
       note:    general operations = 1
       note:    reduction latency = 0
       note:    estimated min cycles per iteration = 1.000000
       note:    estimated cycles per vector iteration (for VF 32) = 32.000000
       note:  SVE issue estimate:
       note:    load operations = 5
       note:    store operations = 4
       note:    general operations = 11
       note:    predicate operations = 12
       note:    reduction latency = 0
       note:    estimated min cycles per iteration without predication = 5.500000
       note:    estimated min cycles per iteration for predication = 12.000000
       note:    estimated min cycles per iteration = 12.000000
       note:  Low iteration count, so using pure latency costs
       note:  Cost model analysis:
      
      vs after:
      
       note:  Original vector body cost = 46
       note:  Known loop bounds, capping VF to 9 for analysis
       note:  Vector loop iterates at most 1 times
       note:  Scalar issue estimate:
       note:    load operations = 2
       note:    store operations = 1
       note:    general operations = 1
       note:    reduction latency = 0
       note:    estimated min cycles per iteration = 1.000000
       note:    estimated cycles per vector iteration (for VF 9) = 9.000000
       note:  SVE issue estimate:
       note:    load operations = 5
       note:    store operations = 4
       note:    general operations = 11
       note:    predicate operations = 12
       note:    reduction latency = 0
       note:    estimated min cycles per iteration without predication = 5.500000
       note:    estimated min cycles per iteration for predication = 12.000000
       note:    estimated min cycles per iteration = 12.000000
       note:  Increasing body cost to 1472 because the scalar code could issue within the limit imposed by predicate operations
       note:  Low iteration count, so using pure latency costs
       note:  Cost model analysis:
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64.cc (adjust_body_cost):
      	Cap VF for low iteration loops.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/sve/asrdiv_4.c: Update bounds.
      	* gcc.target/aarch64/sve/cond_asrd_2.c: Likewise.
      	* gcc.target/aarch64/sve/cond_uxt_6.c: Likewise.
      	* gcc.target/aarch64/sve/cond_uxt_7.c: Likewise.
      	* gcc.target/aarch64/sve/cond_uxt_8.c: Likewise.
      	* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
      	* gcc.target/aarch64/sve/spill_6.c: Likewise.
      	* gcc.target/aarch64/sve/sve_iters_low_1.c: New test.
      	* gcc.target/aarch64/sve/sve_iters_low_2.c: New test.
      e84e5d03
    • GCC Administrator's avatar
      Daily bump. · 67382245
      GCC Administrator authored
      67382245
  3. Sep 21, 2024
    • Mikael Morin's avatar
      fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608] · d6cb7794
      Mikael Morin authored
      Introduce the -finline-intrinsics flag to control from the command line
      whether to generate either inline code or calls to the functions from the
      library, for the MINLOC and MAXLOC intrinsics.
      
      The flag allows specifying inlining either independently for each intrinsic
      (either MINLOC or MAXLOC), or all together.  For each intrinsic, a default
      value is set if none was set.  The default value depends on the optimization
      setting: inlining is avoided if not optimizing or if optimizing for size;
      otherwise inlining is preferred.
      
      There is no direct support for this behaviour provided by the .opt options
      framework.  It is obtained by defining three different variants of the flag
      (finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=) all using
      the same underlying option variable.  Each enum value (corresponding to an
      intrinsic function) uses two identical bits, and the variable is initialized
      with alternated bits, so that we can tell whether the value was set or not
      by checking whether the two bits have different values.
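
      A worked example of that encoding with illustrative names (not the
      gfc_inlineable_intrinsics enum itself): each intrinsic owns two adjacent
      bits, 00 meaning explicitly disabled, 11 explicitly enabled, while the
      alternated initializer 01 means "not set on the command line".

        #include <cstdio>

        constexpr unsigned MAXLOC_SHIFT = 0;
        constexpr unsigned MINLOC_SHIFT = 2;
        constexpr unsigned UNSET_DEFAULT = 0x5u;  // 01 in each 2-bit field

        static bool
        explicitly_set_p (unsigned flags, unsigned shift)
        {
          unsigned field = (flags >> shift) & 3u;
          return field == 0u || field == 3u;  // both bits equal => set by the user
        }

        int
        main ()
        {
          unsigned flags = UNSET_DEFAULT;
          std::printf ("%d\n", explicitly_set_p (flags, MAXLOC_SHIFT));   // 0: unset
          flags = (flags & ~(3u << MAXLOC_SHIFT)) | (3u << MAXLOC_SHIFT); // enable MAXLOC
          std::printf ("%d\n", explicitly_set_p (flags, MAXLOC_SHIFT));   // 1: explicitly set
        }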
      
      	PR fortran/90608
      
      gcc/ChangeLog:
      
      	* flag-types.h (enum gfc_inlineable_intrinsics): New type.
      
      gcc/fortran/ChangeLog:
      
      	* invoke.texi (finline-intrinsics): Document new flag.
      	* lang.opt (finline-intrinsics, finline-intrinsics=,
      	fno-inline-intrinsics): New flags.
      	* options.cc (gfc_post_options): If the option variable controlling
      	the inlining of MAXLOC (respectively MINLOC) has not been set, set
      	it or clear it depending on the optimization option variables.
      	* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false
      	if inlining for the intrinsic is disabled according to the option
      	variable.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/minmaxloc_18.f90: New test.
      	* gfortran.dg/minmaxloc_18a.f90: New test.
      	* gfortran.dg/minmaxloc_18b.f90: New test.
      	* gfortran.dg/minmaxloc_18c.f90: New test.
      	* gfortran.dg/minmaxloc_18d.f90: New test.
      d6cb7794
    • Mikael Morin's avatar
      fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608] · 3c01ddc4
      Mikael Morin authored
      Continue the second set of loops where the first one stopped in the
      generated inline MINLOC/MAXLOC code in the cases where the generated code
      contains two sets of loops.  This fixes a regression that was introduced
      when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank
      greater than 1, no DIM argument, and either non-scalar MASK or floating-
      point ARRAY.
      
      In the cases where two sets of loops are generated as inline MINLOC/MAXLOC
      code, we previously generated code such as (for rank 2 ARRAY, so with two
      levels of nesting):
      
      	for (idx11 in lower1..upper1)
      	  {
      	    for (idx12 in lower2..upper2)
      	      {
      	        ...
      	        if (...)
      	          {
      	            ...
      	            goto second_loop;
      	          }
      	      }
      	  }
      	second_loop:
      	for (idx21 in lower1..upper1)
      	  {
      	    for (idx22 in lower2..upper2)
      	      {
      	        ...
      	      }
      	  }
      
      which means we process the first elements twice, once in the first set
      of loops and once in the second one.  This change avoids this duplicate
      processing by using a conditional as lower bound for the second set of
      loops, generating code like:
      
      	second_loop_entry = false;
      	for (idx11 in lower1..upper1)
      	  {
      	    for (idx12 in lower2..upper2)
      	      {
      	        ...
      	        if (...)
      	          {
      	            ...
      	            second_loop_entry = true;
      	            goto second_loop;
      	          }
      	      }
      	  }
      	second_loop:
      	for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1)
      	  {
      	    for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2)
      	      {
      	        ...
      	        second_loop_entry = false;
      	      }
      	  }
      
      It was expected that the compiler optimizations would be able to remove the
      state variable second_loop_entry.  It is the case if ARRAY has rank 1 (so
      without loop nesting), the variable is removed and the loop bounds become
      unconditional, which restores previously generated code, fully fixing the
      regression.  For larger rank, unfortunately, the state variable and
      conditional loop bounds remain, but those cases were previously using
      library calls, so it's not a regression.
      
      	PR fortran/90608
      
      gcc/fortran/ChangeLog:
      
      	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set
      	of index variables.  Set them using the loop indexes before leaving
      	the first set of loops.  Generate a new loop entry predicate.
      	Initialize it.  Set it before leaving the first set of loops.  Clear
      	it in the body of the second set of loops.  For the second set of
      	loops, update each loop lower bound to use the corresponding index
      	variable if the predicate variable is set.
      3c01ddc4
    • Mikael Morin's avatar
      fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608] · 7d43b4e0
      Mikael Morin authored
      Enable generation of inline MINLOC/MAXLOC code in the case where DIM
      is not present, and either ARRAY is of floating point type or MASK is an
      array.  Those cases are the remaining bits to fully support inlining of
      non-CHARACTER MINLOC/MAXLOC without DIM.  They are treated together because
      they generate similar code, the NANs for REAL types being handled a bit like
      a second level of masking.  These are the cases for which we generate two
      sets of loops.
      
      This change affects the code generating the second loop, that was previously
      accessible only in the cases ARRAY has rank 1 only.  The single variable
      initialization and update are changed to apply to multiple variables, one
      per dimension.
      
      The code generated is as follows (if ARRAY has rank 2):
      
      	for (idx11 in lower1..upper1)
      	  {
      	    for (idx12 in lower2..upper2)
      	      {
      		...
      		if (...)
      		  {
      		    ...
      		    goto second_loop;
      		  }
      	      }
      	  }
      	second_loop:
      	for (idx21 in lower1..upper1)
      	  {
      	    for (idx22 in lower2..upper2)
      	      {
      		...
      	      }
      	  }
      
      This code leads to processing the first elements redundantly, both in the
      first set of loops and in the second one.  The loop over idx22 could
      start from idx12 the first time it is run, but as it has to start from
      lower2 for the rest of the runs, this change uses the same bounds for both
      set of loops for simplicity.  In the rank 1 case, this makes the generated
      code worse compared to the inline code that was generated before.  A later
      change will introduce conditionals to avoid the duplicate processing and
      restore the generated code in that case.
      
      	PR fortran/90608
      
      gcc/fortran/ChangeLog:
      
      	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize
      	and update all the variables.  Put the label and goto in the
      	outermost scalarizer loop.  Don't start the second loop where the
      	first stopped.
      	(gfc_inline_intrinsic_function_p): Also return TRUE for array MASK
      	or for any REAL type.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/maxloc_bounds_5.f90: Additionally accept error
      	messages reported by the scalarizer.
      	* gfortran.dg/maxloc_bounds_6.f90: Ditto.
      7d43b4e0
    • Mikael Morin's avatar
      fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608] · 5999d558
      Mikael Morin authored
      Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY
      is of integral type, DIM is not present, and MASK is present and is scalar
      (only absent MASK or rank 1 ARRAY were inlined before).
      
      Scalar masks are implemented with a wrapping condition around the code one
      would generate if MASK wasn't present, so they are easy to support once
      inline code without MASK is working.
      
      	PR fortran/90608
      
      gcc/fortran/ChangeLog:
      
      	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate
      	variable initialization for each dimension in the else branch of
      	the toplevel condition.
      	(gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message
      	reported by the scalarizer.
      5999d558
    • Mikael Morin's avatar
      fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608] · dd525038
      Mikael Morin authored
      Enable generation of inline code for the MINLOC and MAXLOC intrinsic,
      if the ARRAY argument is of integral type and of any rank (only the rank 1
      case was previously inlined), and neither DIM nor MASK arguments are
      present.
      
      This needs a few adjustments in gfc_conv_intrinsic_minmaxloc,
      mainly to replace the single variables POS and OFFSET with collections
      of variables, one variable per dimension.
      
      The restriction to integral ARRAY and absent MASK limits the scope of
      the change to the cases where we generate single loop inline code.  The
      code generation for the second loop is only accessible with ARRAY of rank
      1, so it can continue using a single variable.  A later change will extend
      inlining to the double loop cases.
      
      There is some bounds checking code that was previously handled by the
      library, and that needed some changes in the scalarizer to avoid regressing.
      The bounds check code generation was already supported by the scalarizer,
      but it was only applying to array reference sections, checking both
      for array bound violation and for shape conformability between all the
      involved arrays.  With this change, for MINLOC or MAXLOC, enable the
      conformability check between all the scalarized arrays, and disable the
      array bound violation check.
      
      	PR fortran/90608
      
      gcc/fortran/ChangeLog:
      
      	* trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC
      	result upper bound using the rank of the ARRAY argument.  Adjust
      	the error message for intrinsic result arrays.  Only check array
      	bounds for array references.  Move bound check decision code...
      	(bounds_check_needed): ... here as a new predicate.  Allow bound
      	check for MINLOC/MAXLOC intrinsic results.
      	* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the
      	result array upper bound to the rank of ARRAY.  Update the NONEMPTY
      	variable to depend on the non-empty extent of every dimension.  Use
      	one variable per dimension instead of a single variable for the
      	position and the offset.  Update their declaration, initialization,
      	and update to affect the variable of each dimension.  Use the first
      	variable only in areas only accessed with rank 1 ARRAY argument.
      	Set every element of the result using its corresponding variable.
      	(gfc_inline_intrinsic_function_p): Return true for integral ARRAY
      	and absent DIM and MASK.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error
      	message emitted by the scalarizer.
      dd525038