  1. Nov 13, 2024
    • libstdc++: Fix nodiscard warnings in perf test for memory pools · 42def7cd
      Jonathan Wakely authored
      The use of unnamed std::lock_guard temporaries was intentional here, as
      they were used like barriers (but std::barrier isn't available until
      C++20). But that gives nodiscard warnings, because unnamed temporary
      locks are usually unintentional. Use named variables in new block scopes
      instead.
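
A minimal sketch of the pattern described above, not the code from the perf test itself. The function name `wait_for_release` and the mutex `mx` are hypothetical; the point is only that a named guard inside its own block scope acquires and releases the mutex just as the unnamed temporary did.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a synchronisation point in a test.
// Blocks until whoever holds `mx` releases it, then continues.
inline int wait_for_release(std::mutex& mx)
{
    int passes = 0;
    // Before (assumed shape): std::lock_guard<std::mutex>{mx};
    // That unnamed temporary locks and unlocks at once, and now warns.
    // After: a named guard confined to a new block scope.
    {
        std::lock_guard<std::mutex> lock(mx);
        ++passes;            // the mutex is held only inside this block
    }
    // The guard's destructor has run, so the mutex is free again here.
    bool free_again = mx.try_lock();
    assert(free_again);
    mx.unlock();
    return passes;
}
```

The extra braces keep the lock's lifetime as short as the temporary's was, so the barrier-like behaviour is preserved.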
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/performance/20_util/memory_resource/pools.cc: Fix
      	-Wunused-value warnings about unnamed std::lock_guard objects.
      42def7cd
    • aarch64: Relax add_overloaded_function assert · 2d7d8179
      Richard Sandiford authored
      There are some SVE intrinsics that support one set of suffixes for
      one extension (E1, say) and another set of suffixes for another
      extension (E2, say).  It is usually the case that, mutatis mutandis,
      E2 extends E1.  Listing E1 first would then ensure that the manual
      C overload would also require E1, making it suitable for resolving
      both the E1 forms and, where appropriate, the E2 forms.
      
      However, there was one exception: the I8MM, F32MM, and F64MM extensions
      to SVE each added variants of svmmla, but there was no svmmla for SVE
      itself.  This was handled by adding an SVE entry for svmmla that only
      defined the C overload; it had no variants of its own.
      
      This situation occurs more often with upcoming patches.  Rather than
      keep adding these dummy entries, it seemed better to make the code
      automatically compute the lowest common denominator for all definitions
      that share the same C overload.
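
One way to picture the "lowest common denominator" idea is as a set intersection over feature masks. The sketch below is an assumption for illustration only: the flag names, bit positions and the function `common_requirements` are hypothetical, and the real `aarch64_required_extensions::common_denominator` may track more state than a single bitmask.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical feature flags; names and bit positions are illustrative.
constexpr std::uint64_t EXT_SVE   = std::uint64_t{1} << 0;
constexpr std::uint64_t EXT_I8MM  = std::uint64_t{1} << 1;
constexpr std::uint64_t EXT_F32MM = std::uint64_t{1} << 2;
constexpr std::uint64_t EXT_F64MM = std::uint64_t{1} << 3;

// Extensions required by every definition that shares one C overload:
// start from "all bits set" and intersect with each definition's mask.
inline std::uint64_t common_requirements(std::initializer_list<std::uint64_t> defs)
{
    assert(defs.size() > 0);
    std::uint64_t common = ~std::uint64_t{0};
    for (std::uint64_t d : defs)
        common &= d;
    return common;
}
```

With three svmmla-like definitions that each need SVE plus one of I8MM, F32MM or F64MM, the intersection is SVE alone, which is what the removed dummy entry used to state by hand.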
      
      gcc/
      	* config/aarch64/aarch64-protos.h
      	(aarch64_required_extensions::common_denominator): New member
      	function.
      	* config/aarch64/aarch64-sve-builtins-base.def: Remove zero-variant
      	entry for mmla.
      	* config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): Remove
      	support for it.
      	* config/aarch64/aarch64-sve-builtins.cc
      	(function_builder::add_overloaded): Relax the assert for duplicate
      	definitions and instead calculate the common denominator of all
      	requirements.
      2d7d8179
    • i386: Add -mveclibabi=aocl [PR56504] · 99ec0eb3
      Filip Kastl authored
      
      We currently support generating vectorized math calls to the AMD core
      math library (ACML) (-mveclibabi=acml).  That library is end-of-life and
      its successor is the math library from AMD Optimizing CPU Libraries
      (AOCL).
      
      This patch adds support for AOCL (-mveclibabi=aocl).  That significantly
      broadens the range of vectorized math functions optimized for AMD CPUs
      that GCC can generate calls to.
      
      See the edit to invoke.texi for a complete list of added functions.
      Compared to the list of functions in AOCL LibM docs I left out these
      vectorized function families:
      
      - sincos and all functions working with arrays ... Because these
        functions have pointer arguments and that would require a bigger
        rework of ix86_veclibabi_aocl().  Also, I'm not sure if GCC even ever
        generates calls to these functions.
      - linearfrac ... Because these functions are specific to the AMD
        library.  There's no equivalent glibc function nor GCC internal
        function nor GCC built-in.
      - powx, sqrt, fabs ... Because GCC doesn't vectorize these functions
        into calls and uses instructions instead.
      
      I also left out amd_vrd2_expm1() (the AMD docs list the function, but
      I wasn't able to link calls to it with the current version of the
      library).
      
      gcc/ChangeLog:
      
      	PR target/56504
      	* config/i386/i386-options.cc (ix86_option_override_internal):
      	Add ix86_veclibabi_type_aocl case.
      	* config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern
      	ix86_veclibabi_aocl().
      	* config/i386/i386-opts.h (enum ix86_veclibabi): Add
      	ix86_veclibabi_type_aocl into the ix86_veclibabi enum.
      	* config/i386/i386.cc (ix86_veclibabi_aocl): New function.
      	* config/i386/i386.opt: Add the 'aocl' type.
      	* doc/invoke.texi: Document -mveclibabi=aocl.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/56504
      	* gcc.target/i386/vectorize-aocl1.c: New test.
      
      Signed-off-by: Filip Kastl <fkastl@suse.cz>
      99ec0eb3
    • hppa: Remove inner `fix:SF/DF` from fixed-point patterns · 0342d024
      John David Anglin authored
      2024-11-13  John David Anglin  <danglin@gcc.gnu.org>
      
      gcc/ChangeLog:
      
      	PR target/117525
      	* config/pa/pa.md (fix_truncsfsi2): Remove inner `fix:SF`.
      	(fix_truncdfsi2, fix_truncsfdi2, fix_truncdfdi2,
      	fixuns_truncsfsi2, fixuns_truncdfsi2, fixuns_truncsfdi2,
      	fixuns_truncdfdi2): Likewise.
      0342d024
    • diagnostics: avoid using global_dc in path-printing · 5ace2b23
      David Malcolm authored
      
      gcc/analyzer/ChangeLog:
      	* checker-path.cc (checker_path::debug): Explicitly use
      	global_dc's reference printer.
      	* diagnostic-manager.cc
      	(diagnostic_manager::prune_interproc_events): Likewise.
      	(diagnostic_manager::prune_system_headers): Likewise.
      
      gcc/ChangeLog:
      	* diagnostic-path.cc (diagnostic_event::get_desc): Add param
      	"ref_pp" and use instead of global_dc.
      	(class path_label): Likewise, adding field m_ref_pp.
      	(event_range::event_range): Add param "ref_pp" and pass to
      	m_path_label.
      	(path_summary::path_summary): Add param "ref_pp" and pass to
      	event_range ctor.
      	(diagnostic_text_output_format::print_path): Pass *pp to
      	path_summary ctor.
      	(selftest::test_empty_path): Pass *event_pp to path_summary ctor.
      	(selftest::test_intraprocedural_path): Likewise.
      	(selftest::test_interprocedural_path_1): Likewise.
      	(selftest::test_interprocedural_path_2): Likewise.
      	(selftest::test_recursion): Likewise.
      	(selftest::test_control_flow_1): Likewise.
      	(selftest::test_control_flow_2): Likewise.
      	(selftest::test_control_flow_3): Likewise.
      	(selftest::assert_cfg_edge_path_streq): Likewise.
      	(selftest::test_control_flow_5): Likewise.
      	(selftest::test_control_flow_6): Likewise.
      	* diagnostic-path.h (diagnostic_event::get_desc): Add param
      	"ref_pp".
      	* lazy-diagnostic-path.cc (selftest::test_intraprocedural_path):
      	Pass *event_pp to get_desc.
      	* simple-diagnostic-path.cc (selftest::test_intraprocedural_path):
      	Likewise.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
      5ace2b23
    • Match: Fold pow calls to ldexp when possible [PR57492] · 5a674367
      Soumya AR authored
      This patch transforms the following POW calls to equivalent LDEXP calls, as
      discussed in PR57492:
      
      powi (powof2, i) -> ldexp (1.0, i * log2 (powof2))
      
      powof2 * ldexp (x, i) -> ldexp (x, i + log2 (powof2))
      
      a * ldexp(1., i) -> ldexp (a, i)
      
      This is especially helpful for SVE architectures as LDEXP calls can be
      implemented using the FSCALE instruction, as seen in the following patch:
      https://gcc.gnu.org/g:9b2915d95d855333d4d8f66b71a75f653ee0d076
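
The three rewrites listed above rest on simple arithmetic identities, which the sketch below checks numerically. It is an illustration, not the match.pd patterns themselves: `powi_ref` and `ldexp_identities_hold` are hypothetical helpers, the powers of two 8 and 4 are arbitrary choices, and the check assumes values far from overflow and underflow, where scaling by a power of two is exact.

```cpp
#include <cassert>
#include <cmath>

// Reference powi by repeated multiplication; exact for powers of two
// as long as the result stays within the normal double range.
inline double powi_ref(double base, int n)
{
    int m = n < 0 ? -n : n;
    double r = 1.0;
    for (int k = 0; k < m; ++k)
        r *= base;
    return n < 0 ? 1.0 / r : r;
}

// Checks the three identities for one moderate-sized x and exponent i.
inline bool ldexp_identities_hold(double x, int i)
{
    assert(i > -100 && i < 100);   // stay well away from overflow/underflow
    // powi (powof2, i) -> ldexp (1.0, i * log2 (powof2)), powof2 == 8
    bool first = powi_ref(8.0, i) == std::ldexp(1.0, i * 3);
    // powof2 * ldexp (x, i) -> ldexp (x, i + log2 (powof2)), powof2 == 4
    bool second = 4.0 * std::ldexp(x, i) == std::ldexp(x, i + 2);
    // a * ldexp (1., i) -> ldexp (a, i)
    bool third = x * std::ldexp(1.0, i) == std::ldexp(x, i);
    return first && second && third;
}
```

Exact equality is used deliberately: multiplying by a power of two only changes the exponent, so both sides should agree bit for bit in this range.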
      
      
      
      SPEC2017 was run with this patch: there are no noticeable improvements,
      but there are no non-noise regressions either.
      
      The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
      
      Signed-off-by: Soumya AR <soumyaa@nvidia.com>
      
      gcc/ChangeLog:
      	PR target/57492
      	* match.pd: Added patterns to fold calls to pow to ldexp and optimize
      	specific ldexp calls.
      
      gcc/testsuite/ChangeLog:
      	PR target/57492
      	* gcc.dg/tree-ssa/ldexp.c: New test.
      	* gcc.dg/tree-ssa/pow-to-ldexp.c: New test.
      5a674367
    • RISC-V: Add Multi-Versioning Test Cases · f42f8dcf
      Yangyu Chen authored
      
      This patch adds test cases for the Function Multi-Versioning (FMV)
      feature for RISC-V. They reuse the existing aarch64 test cases,
      ported to RISC-V.
      
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/testsuite/ChangeLog:
      
      	* g++.target/riscv/mv-symbols1.C: New test.
      	* g++.target/riscv/mv-symbols2.C: New test.
      	* g++.target/riscv/mv-symbols3.C: New test.
      	* g++.target/riscv/mv-symbols4.C: New test.
      	* g++.target/riscv/mv-symbols5.C: New test.
      	* g++.target/riscv/mvc-symbols1.C: New test.
      	* g++.target/riscv/mvc-symbols2.C: New test.
      	* g++.target/riscv/mvc-symbols3.C: New test.
      	* g++.target/riscv/mvc-symbols4.C: New test.
      f42f8dcf
    • RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and... · 917d03e4
      Yangyu Chen authored
      RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
      
      This patch implements the TARGET_GENERATE_VERSION_DISPATCHER_BODY and
      TARGET_GET_FUNCTION_VERSIONS_DISPATCHER for RISC-V. This is used to
      generate the dispatcher function and get the dispatcher function for
      function multiversioning.
      
      This patch copies much of the code from commit 0cfde688 ("[aarch64]
      Add function multiversioning support") and modifies it to fit the
      RISC-V port. A key difference is that the feature bits in the RISC-V
      C-API are stored in an array of unsigned long long, while in AArch64
      they are not an array. So we need to generate an array reference for
      each feature bits element in the dispatcher function.
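
To illustrate why an array of feature words changes the dispatcher, here is a small sketch in ordinary C++ rather than the GIMPLE that the patch generates. The types `FeatureBits` and `Candidate`, the fixed array length of 2, and the functions `satisfied` and `dispatch` are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of an array-based feature-bits structure.
struct FeatureBits {
    unsigned length;                    // number of valid array elements
    unsigned long long features[2];
};

// One candidate version: the bits it needs in each array element.
struct Candidate {
    int id;
    unsigned long long required[2];
};

// A version is usable only if every array element carries its bits,
// so the check indexes the array once per element.
inline bool satisfied(const FeatureBits& fb, const Candidate& c)
{
    for (unsigned g = 0; g < 2; ++g) {
        if (c.required[g] == 0)
            continue;                   // nothing needed from this element
        if (g >= fb.length)
            return false;               // element not provided at all
        if ((fb.features[g] & c.required[g]) != c.required[g])
            return false;
    }
    return true;
}

// Returns the id of the first usable candidate, else default_id.
// Candidates are assumed to be sorted from most to least preferred.
inline int dispatch(const FeatureBits& fb, const Candidate* cands,
                    std::size_t n, int default_id)
{
    assert(cands != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        if (satisfied(fb, cands[i]))
            return cands[i].id;
    return default_id;
}
```

With a single scalar mask the inner loop collapses to one comparison, which is the difference from AArch64 that the commit message points out.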
      
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc (add_condition_to_bb): New function.
      	(dispatch_function_versions): New function.
      	(get_suffixed_assembler_name): New function.
      	(make_resolver_func): New function.
      	(riscv_generate_version_dispatcher_body): New function.
      	(riscv_get_function_versions_dispatcher): New function.
      	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Implement it.
      	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Implement it.
      917d03e4
    • RISC-V: Implement TARGET_MANGLE_DECL_ASSEMBLER_NAME · 0c77c4b0
      Yangyu Chen authored
      
      This patch implements the TARGET_MANGLE_DECL_ASSEMBLER_NAME for RISC-V.
      This is used to add function multiversioning suffixes to the assembler
      name.
      
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc
      	(riscv_mangle_decl_assembler_name): New function.
      	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): Define.
      0c77c4b0
    • RISC-V: Implement TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS · 78753c75
      Yangyu Chen authored
      This patch implements TARGET_COMPARE_VERSION_PRIORITY and
      TARGET_OPTION_FUNCTION_VERSIONS for RISC-V.
      
      The TARGET_COMPARE_VERSION_PRIORITY is implemented to compare the
      priority of two function versions based on the rules defined in the
      RISC-V C-API Doc PR #85:
      
      https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85/files#diff-79a93ca266139524b8b642e582ac20999357542001f1f4666fbb62b6fb7a5824R721
      
      
      
      If multiple versions have equal priority, we select the function with
      the largest number of feature bits generated by
      riscv_minimal_hwprobe_feature_bits. When the number of feature bits is
      also the same, we diff the two versions and select the one with the
      least significant bit set, since a feature that appears earlier in the
      feature_bits might be more important to performance.
      
      The TARGET_OPTION_FUNCTION_VERSIONS is implemented to check whether the
      two function versions are the same. This implementation reuses the code
      in TARGET_COMPARE_VERSION_PRIORITY and checks whether it returns 0,
      which means equal priority.
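
The tie-breaking order described above can be sketched as a three-step comparison. This is an illustration under stated assumptions, not the code in riscv.cc: the `Version` struct, the single 64-bit mask (the real feature bits are an array), the function name `compare_versions`, and the convention that a larger priority number wins are all assumptions.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical description of one function version.
struct Version {
    int priority;              // assumed: larger value means preferred
    std::uint64_t features;    // assumed: one word instead of an array
};

// Returns 1 if a is preferred, -1 if b is preferred, 0 if they tie.
inline int compare_versions(const Version& a, const Version& b)
{
    // Step 1: explicit priority decides first.
    if (a.priority != b.priority)
        return a.priority > b.priority ? 1 : -1;
    // Step 2: the version with more feature bits wins.
    auto na = std::bitset<64>(a.features).count();
    auto nb = std::bitset<64>(b.features).count();
    if (na != nb)
        return na > nb ? 1 : -1;
    // Step 3: among the differing bits, whoever owns the lowest one wins.
    std::uint64_t diff = a.features ^ b.features;
    if (diff == 0)
        return 0;              // identical: treated as the same version
    std::uint64_t lowest = diff & (~diff + 1);   // isolate lowest set bit
    assert(lowest != 0);
    return (a.features & lowest) ? 1 : -1;
}
```

A result of 0 is the case the sameness check relies on: two versions that neither priority nor feature bits can tell apart.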
      
      Co-Developed-by: Hank Chang <hank.chang@sifive.com>
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc
      	(parse_features_for_version): New function.
      	(compare_fmv_features): New function.
      	(riscv_compare_version_priority): New function.
      	(riscv_common_function_versions): New function.
      	(TARGET_COMPARE_VERSION_PRIORITY): Implement it.
      	(TARGET_OPTION_FUNCTION_VERSIONS): Implement it.
      78753c75
    • RISC-V: Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P · bd975bd1
      Yangyu Chen authored
      
      This patch implements the TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P for
      RISC-V. This hook is used to process attribute
      ((target_version ("..."))).
      
      As it is the first patch which introduces the target_version attribute,
      we also set TARGET_HAS_FMV_TARGET_ATTRIBUTE to 0 to use "target_version"
      for function versioning.
      
      Co-Developed-by: Hank Chang <hank.chang@sifive.com>
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-protos.h
      	(riscv_process_target_attr): Remove as it is not used.
      	(riscv_option_valid_version_attribute_p): Declare.
      	(riscv_process_target_version_attr): Declare.
      	* config/riscv/riscv-target-attr.cc
      	(riscv_target_attrs): Renamed from riscv_attributes.
      	(riscv_target_version_attrs): New attributes for target_version.
      	(riscv_process_one_target_attr): New arguments to select attrs.
      	(riscv_process_target_attr): Likewise.
      	(riscv_option_valid_attribute_p): Likewise.
      	(riscv_process_target_version_attr): New function.
      	(riscv_option_valid_version_attribute_p): New function.
      	* config/riscv/riscv.cc
      	(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): Implement it.
      	* config/riscv/riscv.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Define
      	it to 0 to use "target_version" for function versioning.
      bd975bd1
    • RISC-V: Implement riscv_minimal_hwprobe_feature_bits · 1f99a39d
      Yangyu Chen authored
      
      This patch implements the riscv_minimal_hwprobe_feature_bits feature
      for the RISC-V target. The feature bits are defined in
      libgcc/config/riscv/feature_bits.c to provide bitmasks of the ISA
      extensions that are defined in the RISC-V C-API. Thus, we need a
      function to generate the feature bits for the IFUNC resolver to
      dispatch between different functions based on the hardware features.
      
      The minimal feature bits use the earliest extensions that appeared in
      the Linux hwprobe to cover the given ISA string. This allows older
      kernels, which cannot probe some implied extensions, to run the FMV
      dispatcher correctly.
      
      For example, V implies Zve32x, but Zve32x has only appeared in the
      Linux kernel since v6.11. If we used the ISA string directly to
      generate the FMV dispatcher for functions with the "arch=+v"
      extension, then because V implies Zve32x, the FMV dispatcher would
      check whether the Zve32x extension is supported by the host. If the
      Linux kernel is older than v6.11, the FMV dispatcher would fail to
      detect the Zve32x extension even though it is already implied by the
      V extension, and so would fail to dispatch to the correct function.
      
      Thus, we need to generate the minimal feature bits to cover the given
      ISA string to allow the FMV dispatcher to work correctly on older
      kernels.
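
The covering idea can be sketched as a greedy walk over a table ordered by when each extension became visible to hwprobe. Everything here is a hypothetical simplification: the `ExtEntry` type, the function `minimal_cover`, string names in place of bitmasks, and the assumption that each `implies` list is already transitively closed. The actual riscv_minimal_hwprobe_feature_bits may work differently.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical table row: an extension and what it implies.
// Rows are assumed to be ordered by first appearance in hwprobe.
struct ExtEntry {
    std::string name;
    std::vector<std::string> implies;
};

// Picks the earliest extensions that together cover `required`.
// An extension already implied by an earlier pick is not checked again.
inline std::vector<std::string>
minimal_cover(const std::vector<ExtEntry>& table,
              const std::set<std::string>& required)
{
    assert(!table.empty());
    std::set<std::string> covered;
    std::vector<std::string> chosen;
    for (const ExtEntry& e : table) {
        if (!required.count(e.name) || covered.count(e.name))
            continue;
        chosen.push_back(e.name);
        covered.insert(e.name);
        for (const std::string& i : e.implies)
            covered.insert(i);
    }
    return chosen;
}
```

For the example in the text, a required set of V and Zve32x yields only V, so a dispatcher built from the result never asks an older kernel about Zve32x.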
      
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* common/config/riscv/riscv-common.cc
      	(RISCV_EXT_BITMASK): New macro.
      	(struct riscv_ext_bitmask_table_t): New struct.
      	(riscv_minimal_hwprobe_feature_bits): New function.
      	* common/config/riscv/riscv-ext-bitmask.def: New file.
      	* config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H): Include
      	riscv-feature-bits.h.
      	(riscv_minimal_hwprobe_feature_bits): Declare the function.
      	* config/riscv/riscv-feature-bits.h: New file.
      1f99a39d
    • RISC-V: Implement Priority syntax parser for Function Multi-Versioning · 6b572d4e
      Yangyu Chen authored
      This patch adds the priority syntax parser to support the Function
      Multi-Versioning (FMV) feature in RISC-V. This feature allows users to
      specify the priority of the function version in the attribute syntax.
      
      Changes based on RISC-V C-API PR:
      https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85
      
      
      
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-target-attr.cc
      	(riscv_target_attr_parser::handle_priority): New function.
      	(riscv_target_attr_parser::update_settings): Update priority
      	attribute.
      	* config/riscv/riscv.opt: Add TargetVariable riscv_fmv_priority.
      6b572d4e
    • Introduce TARGET_CLONES_ATTR_SEPARATOR for RISC-V · 9bf0dbe6
      Yangyu Chen authored
      Some architectures may use ',' in the attribute string, but it is not
      used as the separator for different targets. To avoid conflict, we
      introduce a new macro TARGET_CLONES_ATTR_SEPARATOR to separate different
      clones.
      
      As an example, according to the RISC-V C-API Specification [1], RISC-V
      allows ',' in the attribute string in the "arch=" option to specify one
      or more ISA extensions in the same target function, which conflicts
      with the default separator used to separate different clones. This
      patch introduces TARGET_CLONES_ATTR_SEPARATOR for RISC-V and chooses
      '#' as the separator, since '#' is not allowed in the target_clones
      option string.
      
      [1] https://github.com/riscv-non-isa/riscv-c-api-doc/blob/c6c5d6d9cf96b342293315a5dff3d25e96ef8191/src/c-api.adoc#__attribute__targetattr-string
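
A short sketch of the conflict, using a plain string splitter rather than GCC's own parsing code. The helper `split_clones` and the sample attribute string are hypothetical and only meant to show how the choice of separator changes the result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits `attr` at every occurrence of `sep`; empty pieces are kept.
inline std::vector<std::string> split_clones(const std::string& attr, char sep)
{
    assert(sep != '\0');
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = attr.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(attr.substr(start));
            break;
        }
        parts.push_back(attr.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

Splitting the illustrative string "arch=+zba,+zbb#arch=+v#default" on '#' gives three clones, the first of which keeps both of its extensions. Splitting on ',' instead cuts the first clone in half after "arch=+zba".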
      
      
      
      Signed-off-by: Yangyu Chen <cyy@cyyself.name>
      
      gcc/ChangeLog:
      
      	* defaults.h (TARGET_CLONES_ATTR_SEPARATOR): Define new macro.
      	* multiple_target.cc (get_attr_str): Use
      	TARGET_CLONES_ATTR_SEPARATOR to separate attributes.
      	(separate_attrs): Likewise.
      	(expand_target_clones): Likewise.
      	* attribs.cc (attr_strcmp): Likewise.
      	(sorted_attr_string): Likewise.
      	* tree.cc (get_target_clone_attr_len): Likewise.
      	* config/riscv/riscv.h (TARGET_CLONES_ATTR_SEPARATOR): Define
      	TARGET_CLONES_ATTR_SEPARATOR for RISC-V.
      	* doc/tm.texi: Document TARGET_CLONES_ATTR_SEPARATOR.
      	* doc/tm.texi.in: Likewise.
      9bf0dbe6
    • Fortran: Fix failing character pointer fcn assignment [PR105054] · f530a8c6
      Paul Thomas authored
      2024-11-14  Paul Thomas  <pault@gcc.gnu.org>
      
      gcc/fortran
      	PR fortran/105054
      	* resolve.cc (get_temp_from_expr): If the pointer function has
      	a deferred character length, generate a new deferred charlen
      	for the temporary.
      
      gcc/testsuite/
      	PR fortran/105054
      	* gfortran.dg/ptr_func_assign_6.f08: New test.
      f530a8c6
    • c: add Wzero-as-null-pointer-constant [PR117059] · 236c0829
      Martin Uecker authored
      
      Add warnings for the use of zero as a null pointer constant to the C FE.
      
      	PR c/117059
      
      gcc/c-family/ChangeLog:
      	* c.opt (Wzero-as-null-pointer-constant): Enable for C and ObjC.
      
      gcc/c/ChangeLog:
      	* c-typeck.cc (parse_build_binary_op): Add warning.
      	(build_conditional_expr): Add warning.
      	(convert_for_assignment): Add warning.
      
      gcc/ChangeLog:
      	* doc/invoke.texi (Wzero-as-null-pointer-constant): Adapt
      	description.
      
      gcc/testsuite/ChangeLog:
      	* gcc.dg/Wzero-as-null-pointer-constant.c: New test.
      
      Suggested-by: Alejandro Colomar <alx@kernel.org>
      Acked-by: Alejandro Colomar <alx@kernel.org>
      Reviewed-by: Joseph Myers <josmyers@redhat.com>
      236c0829
    • c: Handle C23 floating constant {d,D}{32,64,128} suffixes like {df,dd,dl} · 856809e5
      Jakub Jelinek authored
      C23 roughly says that {d,D}{32,64,128} floating point constant suffixes
      are alternate spellings of {df,dd,dl} suffixes in annex H.
      
      So, the following patch allows that alternate spelling.
      Or is it intentional it isn't enabled and we need to do everything in
      there first before trying to define __STDC_IEC_60559_DFP__?
      Like add support for _Decimal32x and _Decimal64x types (including
      the d32x and d64x suffixes) etc.
      
      2024-11-13  Jakub Jelinek  <jakub@redhat.com>
      
      libcpp/
      	* expr.cc (interpret_float_suffix): Handle d32 and D32 suffixes
      	for C like df, d64 and D64 like dd and d128 and D128 like
      	dl.
      gcc/c-family/
      	* c-lex.cc (interpret_float): Subtract 3 or 4 from copylen
      	rather than 2 if last character of CPP_N_DFLOAT is a digit.
      gcc/testsuite/
      	* gcc.dg/dfp/c11-constants-3.c: New test.
      	* gcc.dg/dfp/c11-constants-4.c: New test.
      	* gcc.dg/dfp/c23-constants-3.c: New test.
      	* gcc.dg/dfp/c23-constants-4.c: New test.
      856809e5
    • c: Implement C2Y N3298 - Introduce complex literals [PR117029] · eb45d151
      Jakub Jelinek authored
      The following patch implements the C2Y N3298 paper Introduce complex literals
      by providing different (or no) diagnostics on imaginary constants (except
      for integer ones).
      For _DecimalN constants we don't support _Complex _DecimalN and error on any
      i/j suffixes mixed with DD/DL/DF, so nothing changed there.
      
      2024-11-13  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c/117029
      libcpp/
      	* include/cpplib.h (struct cpp_options): Add imaginary_constants
      	member.
      	* init.cc (struct lang_flags): Add imaginary_constants bitfield.
      	(lang_defaults): Add column for imaginary_constants.
      	(cpp_set_lang): Copy over imaginary_constants.
      	* expr.cc (cpp_classify_number): Diagnose CPP_N_IMAGINARY
      	non-CPP_N_FLOATING constants differently for C.
      gcc/testsuite/
      	* gcc.dg/cpp/pr7263-3.c: Adjust expected diagnostic wording.
      	* gcc.dg/c23-imaginary-constants-1.c: New test.
      	* gcc.dg/c23-imaginary-constants-2.c: New test.
      	* gcc.dg/c23-imaginary-constants-3.c: New test.
      	* gcc.dg/c23-imaginary-constants-4.c: New test.
      	* gcc.dg/c23-imaginary-constants-5.c: New test.
      	* gcc.dg/c23-imaginary-constants-6.c: New test.
      	* gcc.dg/c23-imaginary-constants-7.c: New test.
      	* gcc.dg/c23-imaginary-constants-8.c: New test.
      	* gcc.dg/c23-imaginary-constants-9.c: New test.
      	* gcc.dg/c23-imaginary-constants-10.c: New test.
      	* gcc.dg/c2y-imaginary-constants-1.c: New test.
      	* gcc.dg/c2y-imaginary-constants-2.c: New test.
      	* gcc.dg/c2y-imaginary-constants-3.c: New test.
      	* gcc.dg/c2y-imaginary-constants-4.c: New test.
      	* gcc.dg/c2y-imaginary-constants-5.c: New test.
      	* gcc.dg/c2y-imaginary-constants-6.c: New test.
      	* gcc.dg/c2y-imaginary-constants-7.c: New test.
      	* gcc.dg/c2y-imaginary-constants-8.c: New test.
      	* gcc.dg/c2y-imaginary-constants-9.c: New test.
      	* gcc.dg/c2y-imaginary-constants-10.c: New test.
      	* gcc.dg/c2y-imaginary-constants-11.c: New test.
      	* gcc.dg/c2y-imaginary-constants-12.c: New test.
      eb45d151
    • aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733] · 9b2915d9
      Soumya AR authored
      
      This patch uses the FSCALE instruction provided by SVE to implement the
      standard ldexp family of functions.
      
      Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
      following code:
      
      float
      test_ldexpf (float x, int i)
      {
      	return __builtin_ldexpf (x, i);
      }
      
      double
      test_ldexp (double x, int i)
      {
      	return __builtin_ldexp(x, i);
      }
      
      GCC Output:
      
      test_ldexpf:
      	b ldexpf
      
      test_ldexp:
      	b ldexp
      
      Since SVE has support for an FSCALE instruction, we can use this to process
      scalar floats by moving them to a vector register and performing an fscale call,
      similar to how LLVM tackles an ldexp builtin as well.
      
      New Output:
      
      test_ldexpf:
      	fmov	s31, w0
      	ptrue	p7.b, vl4
      	fscale	z0.s, p7/m, z0.s, z31.s
      	ret
      
      test_ldexp:
      	sxtw	x0, w0
      	ptrue	p7.b, vl8
      	fmov	d31, x0
      	fscale	z0.d, p7/m, z0.d, z31.d
      	ret
      
      This is a revision of an earlier patch, and now uses the extended definition of
      aarch64_ptrue_reg to generate predicate registers with the appropriate set bits.
      
      The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
      OK for mainline?
      
      Signed-off-by: Soumya AR <soumyaa@nvidia.com>
      
      gcc/ChangeLog:
      
      	PR target/111733
      	* config/aarch64/aarch64-sve.md
      	(ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
      	floating modes and expand to the existing pattern for FSCALE.
      	* config/aarch64/iterators.md:
      	(SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
      	as their scalar equivalents.
      	(VPRED): Extended the attribute to handle GPF_HF modes.
      	* internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/sve/fscale.c: New test.
      9b2915d9
    • RISC-V: Bugfix for max_sew_overlap_and_next_ratio_valid_for_prev_sew_p[pr117483] · 445d8bb6
      xuli authored
      This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117483
      
      
      
      If prev and next satisfy the following rules, we should forbid the case
      (next.get_sew() < prev.get_sew() && (!next.get_ta() || !next.get_ma()))
      in the compatible function max_sew_overlap_and_next_ratio_valid_for_prev_sew_p.
      Otherwise, the tail elements of next will be polluted.
      
      DEF_SEW_LMUL_RULE (ge_sew, ratio_and_ge_sew, ratio_and_ge_sew,
       max_sew_overlap_and_next_ratio_valid_for_prev_sew_p,
       always_false, use_max_sew_and_lmul_with_next_ratio)
      
      Passed the rv64gcv full regression test.
      
      Signed-off-by: Li Xu <xuli1@eswincomputing.com>
      
      	PR target/117483
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vsetvl.cc: Fix bug.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/pr117483.c: New test.
      445d8bb6
    • [RISC-V] Fix costing of LO_SUM expressions · eeb5c6ac
      Xianmiao Qu authored
      
      This is a rewrite of a patch originally from Xianmiao Qu.  Xianmiao
      noticed that the costs we compute for LO_SUM expressions were incorrect.
      Essentially we costed based solely on the first input to the LO_SUM.
      
      In a LO_SUM, the first input is almost always going to be a REG and thus
      isn't interesting.  The second argument is almost always going to be
      some kind of symbolic operand, which is much more interesting from a
      costing standpoint.
      
      The right way to fix this is to sum the cost of the two operands.  I've
      verified this produces the same code as Xianmiao Qu's original patch.
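
The effect of summing both operands can be shown on a toy expression tree. This is a sketch and not riscv_rtx_costs: the `Expr` and `Kind` types, the function `cost`, and the cost value 4 for a symbolic operand are made up for illustration.

```cpp
#include <cassert>

// Toy expression kinds standing in for RTL codes.
enum class Kind { Reg, Symbol, LoSum };

struct Expr {
    Kind kind;
    const Expr* op0;   // first operand, used only by LoSum
    const Expr* op1;   // second operand, used only by LoSum
};

// With sum_both == false, LoSum is costed from its first operand only
// (the old behaviour described above); with true, both operands count.
inline int cost(const Expr& e, bool sum_both)
{
    switch (e.kind) {
    case Kind::Reg:
        return 0;              // registers are free
    case Kind::Symbol:
        return 4;              // made-up cost for a symbolic operand
    case Kind::LoSum:
        assert(e.op0 && e.op1);
        return sum_both
            ? cost(*e.op0, sum_both) + cost(*e.op1, sum_both)
            : cost(*e.op0, sum_both);
    }
    return 0;
}
```

For a LO_SUM of a register and a symbol, the old rule yields 0 and the new rule yields the symbol's cost, which is what lets a later pass see a benefit in hoisting the computation.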
      
      This has been tested on rv32 and rv64 in my tester.  It missed today's
      bootstrap of riscv64 though :(  Naturally I'll wait on the pre-commit CI
      tester to render a verdict, but I don't expect any problems.
      
      --  From Xianmiao Qu's original submission --
      
      Currently, the cost of the LO_SUM expression is based on
      the cost of calculating the first subexpression. When the
      first subexpression is a register, the cost result will
      be zero. It seems a bit unreasonable for a SET expression
      to have a zero cost when its source is LO_SUM. Moreover,
      having a cost of zero for the expression will lead the
      loop invariant pass to calculate its benefits of being
      moved outside the loop as zero, thus preventing the
      out-of-loop placement of the loop invariant.
      
      As an example, consider the following test case:
         long a;
         long b[];
         long *c;
         foo () {
           for (;;)
             *c = b[a];
         }
      
      When compiling with -march=rv64gc -mabi=lp64d -Os, the following code is
      generated:
               .cfi_startproc
               lui     a5,%hi(c)
               ld      a4,%lo(c)(a5)
               lui     a2,%hi(b)
               lui     a1,%hi(a)
      .L2:
               ld      a5,%lo(a)(a1)
               addi    a3,a2,%lo(b)
               slli    a5,a5,3
               add     a5,a5,a3
               ld      a5,0(a5)
               sd      a5,0(a4)
               j       .L2
      
      After adjusting the cost of the LO_SUM expression, the instruction addi
      will be moved outside the loop:
               .cfi_startproc
               lui     a5,%hi(c)
               ld      a3,%lo(c)(a5)
               lui     a4,%hi(b)
               lui     a2,%hi(a)
               addi    a4,a4,%lo(b)
      .L2:
               ld      a5,%lo(a)(a2)
               slli    a5,a5,3
               add     a5,a5,a4
               ld      a5,0(a5)
               sd      a5,0(a3)
               j       .L2
      
      gcc/
      	* config/riscv/riscv.cc (riscv_rtx_costs): Correct costing of LO_SUM
      	expressions.
      
      Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
      eeb5c6ac
    • Jeff Law's avatar
      10d76b7f
    • i386: Zero extend 32-bit address to 64-bit with option -mx32 -maddress-mode=long. [PR 117418] · 2272cd25
      Hu, Lin1 authored
      -maddress-mode=long makes Pmode DImode, so zero-extend the 32-bit
      address to 64 bits and use a 64-bit register as the pointer to avoid
      an ICE.
      
      gcc/ChangeLog:
      
      	PR target/117418
      	* config/i386/i386-expand.cc (ix86_expand_builtin): Convert
      	pointer's mode according to Pmode.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/117418
      	* gcc.target/i386/pr117418-1.c: New test.
      2272cd25
    • Daily bump. · 9e423b5c
      GCC Administrator authored
      9e423b5c
    • Jeff Law's avatar
      de3b2772
  2. Nov 12, 2024
    • RISC-V: Fix target-attr-norelax.c testcase · 098214cf
      Yangyu Chen authored
      The target-attr-norelax.c testcase was failing due to a redundant "\t"
      check in the assembly output; the testcase also forgot to skip the
      check for LTO builds.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/target-attr-norelax.c: Fix testcase.
      098214cf
    • Pan Li's avatar
      Revert "Match: Simplify branch form 3 of unsigned SAT_ADD into branchless" · d95339c9
      Pan Li authored
      This reverts commit df4af89b.
      d95339c9
    • David Malcolm's avatar
      selftests: clear GCC_COLORS [PR117503] · 169897bb
      David Malcolm authored
      
      gcc/ChangeLog:
      	PR bootstrap/117503
      	* Makefile.in (GCC_FOR_SELFTESTS): Set GCC_COLORS=.
      
      Signed-off-by: David Malcolm <dmalcolm@redhat.com>
      169897bb
    • John David Anglin's avatar
      hppa: Fix decrement_and_branch_until_zero constraint · b59c4b18
      John David Anglin authored
      The third alternative for argument 4 needs to be an early clobber
      constraint.  Noticed while testing LRA.
      
      2024-11-12  John David Anglin  <danglin@gcc.gnu.org>
      
      gcc/ChangeLog:
      
      	* config/pa/pa.md (decrement_and_branch_until_zero): Fix
      	constraint.
      b59c4b18
    • Edwin Lu's avatar
      RISC-V: testsuite: Remove deprecated compatibility headers · 534e14ad
      Edwin Lu authored
      
      Since r15-4981-g5c34f02ba7e these tests have been failing on vector
      targets with excess errors due to the new deprecation warning message.
      Remove the <cstdalign> header.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.target/riscv/rvv/base/bug-10.C: Remove cstdalign header.
      	* g++.target/riscv/rvv/base/bug-11.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-12.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-13.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-14.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-15.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-16.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-17.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-2.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-23.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-3.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-4.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-5.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-6.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-7.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-8.C: Ditto.
      	* g++.target/riscv/rvv/base/bug-9.C: Ditto.
      
      Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
      534e14ad
    • Jan Hubicka's avatar
      Verify that empty std::vector is optimized away · 2264b687
      Jan Hubicka authored
      With __builtin_operator_new we now can optimize away unused std::vectors.
      This adds testcases mentioned in the PR.
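
As a rough illustration of the kind of code this concerns (a sketch only, not the contents of pr96945.C), consider a function that constructs vectors and never observes them. The function name `unused_vectors` is hypothetical, and whether the allocation is actually removed depends on the compiler version and optimization level.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: neither vector is ever read, so an optimizer
// that understands the allocation calls is free to drop them entirely.
int unused_vectors()
{
  std::vector<int> empty;      // default-constructed, never used
  std::vector<int> filled(4);  // allocates, but the contents are never read
  return 0;
}
```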
      
      	PR tree-optimization/96945
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/tree-ssa/pr96945.C: New test.
      2264b687
    • Andrew Carlotti's avatar
      testsuite: Adjust jump threading test expectation · f72f8c34
      Andrew Carlotti authored
      This test started failing on aarch64 after 0cfc9c95 in 2023 ("Phi
      analyzer - Initialize with range instead of a tree.").
      
      The only change visible in the pass dumps prior to thread2 is that the upper
      bounds of some ranges are reduced from +INF to 7, consistent with the
      bitmask information.  After thread2, there are changes in the control
      flow, but only affecting edges that are obviously never taken (from
      basic blocks 6 through 12).  These are cleaned up in the following pass,
      but the final codegen remains different.
      
      There isn't anything obviously wrong with the change in dump output, so
      let's just update the test expectations (as has happened previously
      here).
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/112376
      	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update expectation.
      f72f8c34
    • Wilco Dijkstra's avatar
      AArch64: Remove duplicated addr_cost tables · 95305c80
      Wilco Dijkstra authored
      Remove duplicated addr_cost tables - use generic_armv9_a_addrcost_table for
      Armv9-a cores and generic_armv8_a_addrcost_table for recent Armv8-a cores.
      No changes in generated code.
      
      gcc/ChangeLog:
      
      	* config/aarch64/tuning_models/cortexx925.h (cortexx925_addrcost_table): Remove.
      	* config/aarch64/tuning_models/neoversen1.h: Use generic_armv8_a_addrcost_table.
      	* config/aarch64/tuning_models/neoversen2.h (neoversen2_addrcost_table): Remove.
      	* config/aarch64/tuning_models/neoversen3.h (neoversen3_addrcost_table): Remove.
      	* config/aarch64/tuning_models/neoversev2.h (neoversev2_addrcost_table): Remove.
      	* config/aarch64/tuning_models/neoversev3.h (neoversev3_addrcost_table): Remove.
      	* config/aarch64/tuning_models/neoversev3ae.h (neoversev3ae_addrcost_table): Remove.
      95305c80
    • Wilco Dijkstra's avatar
      AArch64: Cleanup fusion defines · deb0e2f6
      Wilco Dijkstra authored
      Clean up the fusion defines by introducing AARCH64_FUSE_BASE as a common base
      level of fusion supported by almost all cores.  Add AARCH64_FUSE_MOVK as a
      shortcut for all MOVK fusion.  In most cases there is no change.  It enables
      AARCH64_FUSE_CMP_BRANCH for a few older cores since it has no measurable
      effect if a core doesn't support it.  Also it may have been accidentally
      left out on some cores that support all other types of branch fusion.
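
The idea of a shared baseline plus shortcut masks can be sketched as follows. This is a minimal illustration, not the contents of aarch64-fusion-pairs.def: the names, the bit values, and in particular which pairs make up the baseline are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits, for illustration only.
constexpr std::uint32_t FUSE_MOV_MOVK   = 1u << 0;
constexpr std::uint32_t FUSE_MOVK_MOVK  = 1u << 1;
constexpr std::uint32_t FUSE_CMP_BRANCH = 1u << 2;
constexpr std::uint32_t FUSE_AES_AESMC  = 1u << 3;

// Shortcut covering every MOVK-related pair.
constexpr std::uint32_t FUSE_MOVK = FUSE_MOV_MOVK | FUSE_MOVK_MOVK;

// Common baseline; which pairs belong here is an assumption of this sketch.
constexpr std::uint32_t FUSE_BASE = FUSE_CMP_BRANCH | FUSE_AES_AESMC;

// A core's tuning entry then lists only the baseline plus its extras.
constexpr std::uint32_t example_core_fusion = FUSE_BASE | FUSE_MOVK;

// True if the core enables every fusion kind in `kinds`.
constexpr bool fuses(std::uint32_t core, std::uint32_t kinds)
{
  return (core & kinds) == kinds;
}
```

With this layout, adding a pair to the baseline changes every core that uses it in one place, which is the maintenance benefit the cleanup is after.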
      
      gcc/ChangeLog:
      
      	* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_BASE): New define.
      	(AARCH64_FUSE_MOVK): Likewise.
      	* config/aarch64/tuning_models/a64fx.h: Update.
      	* config/aarch64/tuning_models/ampere1.h: Likewise.
      	* config/aarch64/tuning_models/ampere1a.h: Likewise.
      	* config/aarch64/tuning_models/ampere1b.h: Likewise.
      	* config/aarch64/tuning_models/cortexa35.h: Likewise.
      	* config/aarch64/tuning_models/cortexa53.h: Likewise.
      	* config/aarch64/tuning_models/cortexa57.h: Likewise.
      	* config/aarch64/tuning_models/cortexa72.h: Likewise.
      	* config/aarch64/tuning_models/cortexa73.h: Likewise.
      	* config/aarch64/tuning_models/cortexx925.h: Likewise.
      	* config/aarch64/tuning_models/exynosm1.h: Likewise.
      	* config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
      	* config/aarch64/tuning_models/generic.h: Likewise.
      	* config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
      	* config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
      	* config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
      	* config/aarch64/tuning_models/neoversen1.h: Likewise.
      	* config/aarch64/tuning_models/neoversen2.h: Likewise.
      	* config/aarch64/tuning_models/neoversen3.h: Likewise.
      	* config/aarch64/tuning_models/neoversev1.h: Likewise.
      	* config/aarch64/tuning_models/neoversev2.h: Likewise.
      	* config/aarch64/tuning_models/neoversev3.h: Likewise.
      	* config/aarch64/tuning_models/neoversev3ae.h: Likewise.
      	* config/aarch64/tuning_models/qdf24xx.h: Likewise.
      	* config/aarch64/tuning_models/saphira.h: Likewise.
      	* config/aarch64/tuning_models/thunderx2t99.h: Likewise.
      	* config/aarch64/tuning_models/thunderx3t110.h: Likewise.
      	* config/aarch64/tuning_models/tsv110.h: Likewise.
      deb0e2f6
    • Pan Li's avatar
      RISC-V: Fix incorrect test macro for signed scalar SAT_ADD form 2 run test · 9a64cd19
      Pan Li authored
      
      This patch fixes one incorrect test macro usage in the run test for
      form 2 of signed scalar SAT_ADD.  Form 2 should use _FMT_2 instead
      of _FMT_1.
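
For readers unfamiliar with the operation under test, here is a minimal sketch of a signed saturating add. It is not the form 1 or form 2 macro from sat_arith.h; the name `sat_add` and the exact formulation are assumptions chosen for clarity.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: add two signed integers, clamping to the type's
// minimum or maximum instead of wrapping on overflow.
template <typename T>
T sat_add(T x, T y)
{
  static_assert(std::is_signed<T>::value, "signed types only");
  using U = typename std::make_unsigned<T>::type;

  // Do the addition in unsigned arithmetic so wrap-around is well defined.
  U usum = static_cast<U>(static_cast<U>(x) + static_cast<U>(y));
  T sum = static_cast<T>(usum);

  // Overflow happened iff both operands differ in sign from the result.
  if (((x ^ sum) & (y ^ sum)) < 0)
    return x < 0 ? std::numeric_limits<T>::min()
                 : std::numeric_limits<T>::max();
  return sum;
}
```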
      
      The following test passed with this patch:
      * The full rv64gcv regression test.
      
      This is a test-only patch and fairly obvious, so I will commit it
      directly if there are no comments in the next 48 hours.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/sat_arith.h: Add test helper macro.
      	* gcc.target/riscv/sat_s_add-run-5.c: Take form 2 for run test.
      	* gcc.target/riscv/sat_s_add-run-6.c: Ditto.
      	* gcc.target/riscv/sat_s_add-run-7.c: Ditto.
      	* gcc.target/riscv/sat_s_add-run-8.c: Ditto.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      9a64cd19
    • yulong's avatar
      RISC-V: Add norelax function attribute · 4bee5252
      yulong authored
      This patch adds the norelax function attribute discussed in riscv-c-api-doc PR#94.
      URL:https://github.com/riscv-non-isa/riscv-c-api-doc/pull/94
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc (riscv_declare_function_name): Add new
      	attribute.
      4bee5252
    • Jeff Law's avatar
      [RISC-V] Drop undesirable two instruction macc alternatives · 705a2103
      Jeff Law authored
      So I was looking at sub_dct a little while ago and was surprised to see
      us emit two instructions out of a single pattern.  We generally try to
      avoid that -- it's not always possible, but as a general rule of thumb
      it should be avoided.  Specifically I saw:
      
      >         vmv1r.v v4,v2   # 138   [c=4 l=4]  *pred_mul_plusrvvm1hi_undef/5
      >         vmacc.vv        v4,v8,v1
      
      When we emit multiple instructions out of a single pattern we can't
      build a good schedule as we can't really describe the two instructions
      well and we can't split them up -- they move as an atomic unit.
      
      These cases can also raise correctness issues if the pattern doesn't
      properly account for both instructions in its length computation.
      
      Note the length, 4 bytes.  So this is both a performance and latent
      correctness issue.
      
      It appears that these alternatives are meant to deal with the case when
      we have three source inputs and a non-matching output.  The author did
      put in "?" to slightly disparage these alternatives, but a "!" would
      have been better.  The best solution is to just remove those
      alternatives and let the allocator manage the matching operand issue.
      
      That's precisely what this patch does.  For the various integer
      multiply-add/multiply-accumulate patterns we drop the alternatives which
      don't require a match between the output and one of the inputs.
      
      That fixes the correctness issue and should shave a cycle or two off our
      sub_dct code.  Essentially the move bubbles up into an empty slot and we
      can schedule around the vmacc sensibly.
      
      Interestingly enough this fixes a scan-assembler test in my tester for
      both rv32 and rv64.
      
      > Tests that now work, but didn't before (10 tests):
      >
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
      
      My BPI is already in a bootstrap test, so this patch won't hit the BPI
      for bootstrapping until Wednesday, meaning no data until Thursday.  Will
      wait for the pre-commit tester though.
      
      gcc/
      	* config/riscv/vector.md (pred_mul_plus<mode>_undef): Drop alternatives
      	where output doesn't have to match input.
      	(pred_madd<mode>, pred_macc<mode>): Likewise.
      	(pred_madd<mode>_scalar, pred_macc<mode>_scalar): Likewise.
      	(pred_madd<mode>_exended_scalar): Likewise.
      	(pred_macc<mode>_exended_scalar): Likewise.
      	(pred_minus_mul<mode>_undef): Likewise.
      	(pred_nmsub<mode>, pred_nmsac<mode>): Likewise.
      	(pred_nmsub<mode>_scalar, pred_nmsac<mode>_scalar): Likewise.
      	(pred_nmsub<mode>_exended_scalar): Likewise.
      	(pred_nmsac<mode>_exended_scalar): Likewise.
      705a2103
    • Kito Cheng's avatar
      libsanitizer: Update LOCAL_PATCHES · 0256c8b4
      Kito Cheng authored
      0256c8b4
    • Richard Biener's avatar
      tree-optimization/116973 - SLP permute lower heuristic and single-lane SLP · 0d4b254b
      Richard Biener authored
      When forcing single-lane SLP to emulate non-SLP behavior we need to
      disable heuristics designed to optimize SLP loads and instead in
      all cases resort to an interleaving scheme as requested by forcefully
      doing single-lane SLP.
      
      This fixes the remaining fallout for --param vect-force-slp=1 on x86.
      
      	PR tree-optimization/116973
      	* tree-vect-slp.cc (vect_lower_load_permutations): Add
      	force_single_lane parameter.  Disable heuristic that keeps
      	some load-permutations.
      	(vect_analyze_slp): Pass force_single_lane to
      	vect_lower_load_permutations.
      0d4b254b
    • Kito Cheng's avatar
      libsanitizer: update test · 1b35b929
      Kito Cheng authored
      gcc/testsuite/ChangeLog:
      
      	* c-c++-common/ubsan/builtin-1.c: Update test case because the
      	sanitizer has changed the error message.
      1b35b929