Commits · 0935d0d6e6c244031117f3fd7fea2bfa78295f75 · COBOLworx / gcc-cobol

Nov 13, 2024

libstdc++: Remove _Insert base class from _Hashtable · 0935d0d6

Jonathan Wakely authored 4 months ago


There's no reason to have a separate base class defining the insert
member functions now. They can all be moved into the _Hashtable class,
which simplifies them slightly.

libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (_Hashtable): Remove inheritance from
	__detail::_Insert and move its members into _Hashtable.
	* include/bits/hashtable_policy.h (__detail::_Insert): Remove.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>


libstdc++: Use RAII in _Hashtable · d2970e86

Jonathan Wakely authored 4 months ago


Use scoped guard types to clean up if an exception is thrown. This
allows some try-catch blocks to be removed.
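
A minimal sketch of the pattern, not the actual _Hashtable code: the Table and Guard types and the assign function are hypothetical names made up for illustration. The guard's destructor undoes the partial work unless release() is called, so no try-catch block is needed around the loop that might throw.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a table that owns a bucket array.
struct Table {
  int* buckets = nullptr;
  std::size_t count = 0;
};

// Scoped guard: resets the table to empty unless release() was called.
struct Guard {
  Table* table;
  explicit Guard(Table* t) : table(t) {}
  Guard(const Guard&) = delete;
  Guard& operator=(const Guard&) = delete;
  ~Guard() {
    if (table) {
      delete[] table->buckets;
      table->buckets = nullptr;
      table->count = 0;
    }
  }
  void release() { table = nullptr; }
};

// Fill an empty table with n elements; throws part-way through when the
// loop index reaches fail_at (used here only to simulate a failure).
void assign(Table& t, std::size_t n, std::size_t fail_at) {
  assert(t.buckets == nullptr);  // sketch only handles an empty table
  t.buckets = new int[n]();
  t.count = n;
  Guard guard(&t);               // cleanup is armed from here on
  for (std::size_t i = 0; i < n; ++i) {
    if (i == fail_at)
      throw std::runtime_error("copy failed");
    t.buckets[i] = static_cast<int>(i);
  }
  guard.release();               // success: keep the new state
}
```

If assign throws, the caller sees an empty table again; on success the guard is disarmed and does nothing.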

libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (operator=(const _Hashtable&)): Use
	RAII instead of try-catch.
	(_M_assign(_Ht&&, _NodeGenerator&)): Likewise.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>


libstdc++: Replace _Hashtable::__fwd_value_for with cast · e717c322

Jonathan Wakely authored 4 months ago


We can just use a cast to the appropriate type instead of calling a
function to do it. This gives the compiler less work to compile and
optimize, and at -O0 avoids a function call per element.

libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (_Hashtable::__fwd_value_for):
	Remove.
	(_Hashtable::_M_assign): Use static_cast instead of
	__fwd_value_for.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>


libstdc++: Add _Hashtable::_M_assign for the common case · 37b17388

Jonathan Wakely authored 4 months ago


This adds a convenient _M_assign overload for the common case where the
node generator is the _AllocNode type. Only two places need to call
_M_assign with a _ReuseOrAllocNode node generator, so all the other
calls to _M_assign can use the new overload instead of manually
constructing a node generator.

The _AllocNode::operator()(Args&&...) function doesn't need to be a
variadic template. It is only ever called with a single argument of type
const value_type& or value_type&&, so could be simplified. That isn't
done in this commit.

libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (_Hashtable): Remove typedefs for
	node generators.
	(_Hashtable::_M_assign(_Ht&&)): Add new overload.
	(_Hashtable::operator=(initializer_list<value_type>)): Add local
	typedef for node generator.
	(_Hashtable::_M_assign_elements): Likewise.
	(_Hashtable::operator=(const _Hashtable&)): Use new _M_assign
	overload.
	(_Hashtable(const _Hashtable&)): Likewise.
	(_Hashtable(const _Hashtable&, const allocator_type&)):
	Likewise.
	(_Hashtable(_Hashtable&&, __node_alloc_type&&, false_type)):
	Likewise.
	* include/bits/hashtable_policy.h (_Insert): Remove typedef for
	node generator.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>


libstdc++: Refactor Hashtable erasure · 73676cfb

Jonathan Wakely authored 4 months ago


This reworks the internal member functions for erasure from
unordered containers, similarly to the earlier commit doing it for
insertion.

Instead of multiple overloads of _M_erase which are selected via tag
dispatching, the erase(const key_type&) member can use 'if constexpr' to
choose an appropriate implementation (returning after erasing a single
element for unique keys, or continuing to erase all equivalent elements
for non-unique keys).
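
As a rough illustration of the technique (not the libstdc++ implementation), the hypothetical Bag class below uses a bool template parameter in place of the __unique_keys trait. A single erase member chooses between the two behaviours with 'if constexpr', so no tag-dispatched overloads are needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container: UniqueKeys plays the role of __unique_keys::value.
template <bool UniqueKeys>
class Bag {
  std::vector<int> items_;

public:
  void insert(int v) {
    if constexpr (UniqueKeys) {
      for (int x : items_)
        if (x == v)
          return;                // equivalent key already present
    }
    items_.push_back(v);
  }

  // Returns the number of erased elements.
  std::size_t erase(int key) {
    std::size_t erased = 0;
    for (auto it = items_.begin(); it != items_.end();) {
      if (*it == key) {
        it = items_.erase(it);
        ++erased;
        if constexpr (UniqueKeys)
          return erased;         // unique keys: at most one match exists
      } else {
        ++it;
      }
    }
    return erased;               // non-unique keys: all matches erased
  }

  std::size_t size() const { return items_.size(); }
};
```

Only the branch matching UniqueKeys is instantiated for a given specialization.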

libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (_Hashtable::_M_erase): Remove
	overloads for erasing by key, moving logic to ...
	(_Hashtable::erase): ... here.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>


libstdc++: Refactor Hashtable insertion [PR115285] · ce2cf1f0

Jonathan Wakely authored 4 months ago

This completely reworks the internal member functions for insertion into
unordered containers. Currently we use a mixture of tag dispatching (for
unique vs non-unique keys) and template specialization (for maps vs
sets) to correctly implement insert and emplace members.

This removes a lot of complexity and indirection by using 'if constexpr'
to select the appropriate member function to call.

Previously there were four overloads of _M_emplace, for unique keys and
non-unique keys, and for hinted insertion and non-hinted. However two of
those were redundant, because we always ignore the hint for unique keys
and always use a hint for non-unique keys. Those four overloads have
been replaced by two new non-overloaded function templates:
_M_emplace_uniq and _M_emplace_multi. The former is for unique keys and
doesn't take a hint, and the latter is for non-unique keys and takes a
hint.

In the body of _M_emplace_uniq there are special cases to handle
emplacing values from which a key_type can be extracted directly. This
means we don't need to allocate a node and construct a value_type that
might be discarded if an equivalent key is already present. The special
case applies when emplacing the key_type into std::unordered_set, or
when emplacing std::pair<cv key_type, X> into std::unordered_map, or
when emplacing two values into std::unordered_map where the first has
type cv key_type. For the std::unordered_set case, obviously if we're
inserting something that's already the key_type, we can look it up
directly. For the std::unordered_map cases, we know that the inserted
std::pair<const key_type, mapped_type> would have its first element
initialized from the first member of a std::pair value, or from the first of
two values, so if that is a key_type, we can look that up directly.

All the _M_insert overloads used a node generator parameter, but apart
from the one case where _M_insert_range was called from
_Hashtable::operator=(initializer_list<value_type>), that parameter was
always the _AllocNode type, never the _ReuseOrAllocNode type. Because
operator=(initializer_list<value_type>) was rewritten in an earlier
commit, all calls to _M_insert now use _AllocNode, so there's no reason
to pass the generator as a template parameter when inserting.

The multiple overloads of _Hashtable::_M_insert can all be removed now,
because the _Insert_base::insert members now call either _M_emplace_uniq
or _M_emplace_multi directly, only passing a hint to the latter. Which
one to call is decided using 'if constexpr (__unique_keys::value)' so
there is no unnecessary code instantiation, and overload resolution is
much simpler.

The partial specializations of the _Insert class template can be
entirely removed, moving the minor differences in 'insert' member
functions into the common _Insert_base base class. The different
behaviour for maps and sets can be implemented using enable_if
constraints and 'if constexpr'. With the _Insert class template no
longer needed, the _Insert_base class template can be renamed to
_Insert. This is a minor simplification for the complex inheritance
hierarchy used by _Hashtable, removing one base class. It also means
one less class template instantiation, and no need to match the right
partial specialization of _Insert. The _Insert base class could be
removed entirely by moving all its 'insert' members into _Hashtable,
because without any variation in specializations of _Insert there is no
reason to use a base class to define those members. That is left for a
later commit.

Consistently using _M_emplace_uniq or _M_emplace_multi for insertion
means we no longer attempt to avoid constructing a value_type object to
find its key, removing the PR libstdc++/96088 optimizations. This fixes
the bugs caused by those optimizations, such as PR libstdc++/115285, but
causes regressions in the expected number of allocations and temporary
objects constructed for the PR 96088 tests.  It should be noted that the
"regressions" in the 96088 tests put us exactly level with the number of
allocations done by libc++ for those same tests.

To mitigate this to some extent, _M_emplace_uniq detects when the
emplace arguments already contain a key_type (either as the sole
argument, for unordered_set, or as the first part of a pair of
arguments, for unordered_map). In that specific case we don't need to
allocate a node and construct a value type to check for an existing
element with equivalent key.
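
The detection step could look roughly like the sketch below. It only shows the compile-time check on the emplace arguments; the names set_key_in_args, map_key_in_args, bare_t, first_of and pair_first are hypothetical and do not appear in libstdc++.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Strip references and cv-qualifiers, so "cv key_type" arguments match.
template <class T>
using bare_t = std::remove_cv_t<std::remove_reference_t<T>>;

// First type of a non-empty pack.
template <class First, class...>
struct first_of { using type = First; };

// First member type of a std::pair (cv-stripped), or void for other types.
template <class T>
struct pair_first { using type = void; };
template <class A, class B>
struct pair_first<std::pair<A, B>> { using type = std::remove_cv_t<A>; };

// Set-like case: the sole argument already is a key_type.
template <class Key, class... Args>
constexpr bool set_key_in_args() {
  if constexpr (sizeof...(Args) == 1)
    return std::is_same_v<bare_t<typename first_of<Args...>::type>, Key>;
  else
    return false;
}

// Map-like case: a std::pair<cv key_type, X>, or two arguments where the
// first has type cv key_type.
template <class Key, class... Args>
constexpr bool map_key_in_args() {
  if constexpr (sizeof...(Args) == 1) {
    using Arg = bare_t<typename first_of<Args...>::type>;
    return std::is_same_v<typename pair_first<Arg>::type, Key>;
  } else if constexpr (sizeof...(Args) == 2) {
    return std::is_same_v<bare_t<typename first_of<Args...>::type>, Key>;
  } else {
    return false;
  }
}
```

When such a check is true, the key can be looked up before any node is allocated; otherwise a node has to be built first to obtain the key.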

The remaining regressions in the number of allocations and temporaries
should be addressed separately, with more conservative optimizations
specific to std::string. That is not part of this commit.

libstdc++-v3/ChangeLog:

	PR libstdc++/115285
	* include/bits/hashtable.h (_Hashtable::_M_emplace): Replace
	with _M_emplace_uniq and _M_emplace_multi.
	(_Hashtable::_S_forward_key, _Hashtable::_M_insert_unique)
	(_Hashtable::_M_insert_unique_aux, _Hashtable::_M_insert):
	Remove.
	* include/bits/hashtable_policy.h (_ConvertToValueType):
	Remove.
	(_Insert_base::_M_insert_range): Remove overload for unique keys
	and rename overload for non-unique keys to ...
	(_Insert_base::_M_insert_range_multi): ... this.
	(_Insert_base::insert): Call _M_emplace_uniq or _M_emplace_multi
	instead of _M_insert.  Add insert overloads from _Insert.
	(_Insert_base): Rename to _Insert.
	(_Insert): Remove.
	* testsuite/23_containers/unordered_map/96088.cc: Adjust
	expected number of allocations.
	* testsuite/23_containers/unordered_set/96088.cc: Likewise.


libstdc++: Allow unordered_set assignment to assign to existing nodes · afc9351e

Jonathan Wakely authored 4 months ago


Currently the _ReuseOrAllocNode::operator()(Args&&...) function always
destroys the value stored in recycled nodes and constructs a new value.

The _ReuseOrAllocNode type is only ever used for implementing
assignment, either from another unordered container of the same type, or
from std::initializer_list<value_type>. Consequently, the parameter pack
Args only ever consists of a single parameter of type const value_type&
or value_type.  We can replace the variadic parameter pack with a single
forwarding reference parameter, and when the value_type is assignable
from that type we can use assignment instead of destroying the existing
value and then constructing a new one.

Using assignment is typically only possible for sets, because for maps
the value_type is std::pair<const key_type, mapped_type> and in most
cases std::is_assignable_v<const key_type&, const key_type&> is false.
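
The following sketch illustrates the idea under simplifying assumptions. reuse_slot is a hypothetical helper, not the _ReuseOrAllocNode code, and it omits the cleanup needed if the constructor throws after the old value has been destroyed.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Overwrite the value in a recycled slot, assigning when possible and
// otherwise destroying the old value and constructing a new one in place.
template <class Value, class Arg>
Value* reuse_slot(Value* slot, Arg&& arg) {
  assert(slot != nullptr);
  if constexpr (std::is_assignable_v<Value&, Arg>) {
    *slot = std::forward<Arg>(arg);
    return slot;
  } else {
    slot->~Value();
    return ::new (static_cast<void*>(slot)) Value(std::forward<Arg>(arg));
  }
}

using SetValue = std::string;                        // set-like value_type
using MapValue = std::pair<const std::string, int>;  // map-like value_type

// Sets: the value is assignable, so the assignment branch is chosen.
static_assert(std::is_assignable_v<SetValue&, const SetValue&>);
// Maps: the const key blocks assignment of the whole pair.
static_assert(!std::is_assignable_v<const std::string&, const std::string&>);
static_assert(!std::is_assignable_v<MapValue&, const MapValue&>);
```

For the set-like type the existing string's storage can be reused by the assignment, which is where the saving comes from.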

libstdc++-v3/ChangeLog:

	* include/bits/hashtable_policy.h (_ReuseOrAllocNode::operator()):
	Replace parameter pack with a single parameter. Assign to
	existing value when possible.
	* testsuite/23_containers/unordered_multiset/allocator/move_assign.cc:
	Adjust expected count of operations.
	* testsuite/23_containers/unordered_set/allocator/move_assign.cc:
	Likewise.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>


libstdc++: Refactor _Hashtable::operator=(initializer_list<value_type>) · 9fcbbb3d

Jonathan Wakely authored 4 months ago


This replaces a call to _M_insert_range with open coding the loop. This
will allow removing the node generator parameter from _M_insert_range in
a later commit.

libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (operator=(initializer_list)):
	Refactor to not use _M_insert_range.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>


libstdc++: Fix calculation of system time in performance tests · 19d0720f

Jonathan Wakely authored 4 months ago

The system_time() function used the wrong element of the splits array.

Also add a comment about the units for time measurements.

libstdc++-v3/ChangeLog:

	* testsuite/util/testsuite_performance.h (time_counter): Add
	comment about times.
	(time_counter::system_time): Use correct split value.


libstdc++: Write timestamp to libstdc++-performance.sum file · de10b4fc

Jonathan Wakely authored 4 months ago

The results of 'make check-performance' are appended to the .sum file,
with no indication where one set of results ends and the next begins. We
could just remove the file when starting a new run, but appending makes
it a little easier to compare with previous runs, without having to copy
and store old files.

This adds a header containing a timestamp to the file when starting a
new run.

libstdc++-v3/ChangeLog:

	* scripts/check_performance: Add timestamp to output file at
	start of run.


libstdc++: Use __is_single_threaded() in performance tests · 2b920070

Jonathan Wakely authored 4 months ago

With recent glibc releases the __gthread_active_p() function is always
true, so we always append "-thread" onto performance benchmark names.

Use the __gnu_cxx::__is_single_threaded() function instead.

libstdc++-v3/ChangeLog:

	* testsuite/util/testsuite_performance.h: Use
	__gnu_cxx::__is_single_threaded instead of __gthread_active_p().


libstdc++: Stop using std::unary_function in perf tests · 8586e161

Jonathan Wakely authored 4 months ago

This fixes some -Wdeprecated-declarations warnings.

libstdc++-v3/ChangeLog:

	* testsuite/performance/ext/pb_ds/hash_int_erase_mem.cc: Replace
	std::unary_function with result_type and argument_type typedefs.
	* testsuite/util/performance/assoc/multimap_common_type.hpp:
	Likewise.


libstdc++: Fix nodiscard warnings in perf test for memory pools · 42def7cd

Jonathan Wakely authored 4 months ago

The use of unnamed std::lock_guard temporaries was intentional here, as
they were used like barriers (but std::barrier isn't available until
C++20). But that gives nodiscard warnings, because unnamed temporary
locks are usually unintentional. Use named variables in new block scopes
instead.
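
A sketch of the change, assuming a hypothetical pass_gate function rather than the code in pools.cc: the named lock lives in its own block scope, so the mutex is acquired and released immediately, just as with the unnamed temporary.

```cpp
#include <mutex>

// Wait until the mutex can be acquired, release it again immediately, and
// then do some work.
int pass_gate(std::mutex& gate, int& counter) {
  // Before: an unnamed temporary, which may trigger a nodiscard warning:
  //   std::lock_guard<std::mutex>{gate};
  {
    std::lock_guard<std::mutex> lock(gate);  // blocks until the gate is free
  }                                          // lock released at end of block
  return ++counter;
}
```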

libstdc++-v3/ChangeLog:

	* testsuite/performance/20_util/memory_resource/pools.cc: Fix
	-Wunused-value warnings about unnamed std::lock_guard objects.


aarch64: Relax add_overloaded_function assert · 2d7d8179

Richard Sandiford authored 4 months ago

There are some SVE intrinsics that support one set of suffixes for
one extension (E1, say) and another set of suffixes for another
extension (E2, say).  It is usually the case that, mutatis mutandis,
E2 extends E1.  Listing E1 first would then ensure that the manual
C overload would also require E1, making it suitable for resolving
both the E1 forms and, where appropriate, the E2 forms.

However, there was one exception: the I8MM, F32MM, and F64MM extensions
to SVE each added variants of svmmla, but there was no svmmla for SVE
itself.  This was handled by adding an SVE entry for svmmla that only
defined the C overload; it had no variants of its own.

This situation occurs more often with upcoming patches.  Rather than
keep adding these dummy entries, it seemed better to make the code
automatically compute the lowest common denominator for all definitions
that share the same C overload.

gcc/
	* config/aarch64/aarch64-protos.h
	(aarch64_required_extensions::common_denominator): New member
	function.
	* config/aarch64/aarch64-sve-builtins-base.def: Remove zero-variant
	entry for mmla.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): Remove
	support for it.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_builder::add_overloaded): Relax the assert for duplicate
	definitions and instead calculate the common denominator of all
	requirements.


i386: Add -mveclibabi=aocl [PR56504] · 99ec0eb3

Filip Kastl authored 4 months ago


We currently support generating vectorized math calls to the AMD core
math library (ACML) (-mveclibabi=acml).  That library is end-of-life and
its successor is the math library from AMD Optimizing CPU Libraries
(AOCL).

This patch adds support for AOCL (-mveclibabi=aocl).  That significantly
broadens the range of vectorized math functions optimized for AMD CPUs
that GCC can generate calls to.

See the edit to invoke.texi for a complete list of added functions.
Compared to the list of functions in AOCL LibM docs I left out these
vectorized function families:

- sincos and all functions working with arrays ... Because these
  functions have pointer arguments and that would require a bigger
  rework of ix86_veclibabi_aocl().  Also, I'm not sure if GCC even ever
  generates calls to these functions.
- linearfrac ... Because these functions are specific to the AMD
  library.  There's no equivalent glibc function nor GCC internal
  function nor GCC built-in.
- powx, sqrt, fabs ... Because GCC doesn't vectorize these functions
  into calls and uses instructions instead.

I also left out amd_vrd2_expm1() (the AMD docs list the function but I
wasn't able to link calls to it with the current version of the
library).

gcc/ChangeLog:

	PR target/56504
	* config/i386/i386-options.cc (ix86_option_override_internal):
	Add ix86_veclibabi_type_aocl case.
	* config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern
	ix86_veclibabi_aocl().
	* config/i386/i386-opts.h (enum ix86_veclibabi): Add
	ix86_veclibabi_type_aocl into the ix86_veclibabi enum.
	* config/i386/i386.cc (ix86_veclibabi_aocl): New function.
	* config/i386/i386.opt: Add the 'aocl' type.
	* doc/invoke.texi: Document -mveclibabi=aocl.

gcc/testsuite/ChangeLog:

	PR target/56504
	* gcc.target/i386/vectorize-aocl1.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>


hppa: Remove inner `fix:SF/DF` from fixed-point patterns · 0342d024

John David Anglin authored 4 months ago

2024-11-13  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

	PR target/117525
	* config/pa/pa.md (fix_truncsfsi2): Remove inner `fix:SF`.
	(fix_truncdfsi2, fix_truncsfdi2, fix_truncdfdi2,
	fixuns_truncsfsi2, fixuns_truncdfsi2, fixuns_truncsfdi2,
	fixuns_truncdfdi2): Likewise.


diagnostics: avoid using global_dc in path-printing · 5ace2b23

David Malcolm authored 4 months ago


gcc/analyzer/ChangeLog:
	* checker-path.cc (checker_path::debug): Explicitly use
	global_dc's reference printer.
	* diagnostic-manager.cc
	(diagnostic_manager::prune_interproc_events): Likewise.
	(diagnostic_manager::prune_system_headers): Likewise.

gcc/ChangeLog:
	* diagnostic-path.cc (diagnostic_event::get_desc): Add param
	"ref_pp" and use instead of global_dc.
	(class path_label): Likewise, adding field m_ref_pp.
	(event_range::event_range): Add param "ref_pp" and pass to
	m_path_label.
	(path_summary::path_summary): Add param "ref_pp" and pass to
	event_range ctor.
	(diagnostic_text_output_format::print_path): Pass *pp to
	path_summary ctor.
	(selftest::test_empty_path): Pass *event_pp to path_summary ctor.
	(selftest::test_intraprocedural_path): Likewise.
	(selftest::test_interprocedural_path_1): Likewise.
	(selftest::test_interprocedural_path_2): Likewise.
	(selftest::test_recursion): Likewise.
	(selftest::test_control_flow_1): Likewise.
	(selftest::test_control_flow_2): Likewise.
	(selftest::test_control_flow_3): Likewise.
	(selftest::assert_cfg_edge_path_streq): Likewise.
	(selftest::test_control_flow_5): Likewise.
	(selftest::test_control_flow_6): Likewise.
	* diagnostic-path.h (diagnostic_event::get_desc): Add param
	"ref_pp".
	* lazy-diagnostic-path.cc (selftest::test_intraprocedural_path):
	Pass *event_pp to get_desc.
	* simple-diagnostic-path.cc (selftest::test_intraprocedural_path):
	Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>


Match: Fold pow calls to ldexp when possible [PR57492] · 5a674367

Soumya AR authored 4 months ago

This patch transforms the following POW calls to equivalent LDEXP calls, as
discussed in PR57492:

powi (powof2, i) -> ldexp (1.0, i * log2 (powof2))

powof2 * ldexp (x, i) -> ldexp (x, i + log2 (powof2))

a * ldexp(1., i) -> ldexp (a, i)
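
The identities behind these folds can be checked numerically with a small sketch like the one below. powi_naive is a hypothetical stand-in for powi, and the powers of two 8 and 4 are arbitrary choices. The comparisons are exact because scaling by a power of two does not round, provided nothing overflows or becomes subnormal.

```cpp
#include <cassert>
#include <cmath>

// Multiply base by itself exp times (exp >= 0); a stand-in for powi.
double powi_naive(double base, int exp) {
  assert(exp >= 0);
  double r = 1.0;
  for (int k = 0; k < exp; ++k)
    r *= base;
  return r;
}

// Check the three rewrites for one value of x and one exponent i >= 0.
bool identities_hold(double x, int i) {
  // powi (powof2, i) -> ldexp (1.0, i * log2 (powof2)), with powof2 = 8
  bool first = powi_naive(8.0, i) == std::ldexp(1.0, i * 3);
  // powof2 * ldexp (x, i) -> ldexp (x, i + log2 (powof2)), with powof2 = 4
  bool second = 4.0 * std::ldexp(x, i) == std::ldexp(x, i + 2);
  // a * ldexp (1., i) -> ldexp (a, i)
  bool third = x * std::ldexp(1.0, i) == std::ldexp(x, i);
  return first && second && third;
}
```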

This is especially helpful for SVE architectures as LDEXP calls can be
implemented using the FSCALE instruction, as seen in the following patch:
https://gcc.gnu.org/g:9b2915d95d855333d4d8f66b71a75f653ee0d076



SPEC2017 was run with this patch. While there are no noticeable improvements,
there are no non-noise regressions either.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:
	PR target/57492
	* match.pd: Added patterns to fold calls to pow to ldexp and optimize
	specific ldexp calls.

gcc/testsuite/ChangeLog:
	PR target/57492
	* gcc.dg/tree-ssa/ldexp.c: New test.
	* gcc.dg/tree-ssa/pow-to-ldexp.c: New test.


RISC-V: Add Multi-Versioning Test Cases · f42f8dcf

Yangyu Chen authored 4 months ago


This patch adds test cases for the Function Multi-Versioning (FMV)
feature for RISC-V, reusing the existing aarch64 test cases and
porting them to RISC-V.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/testsuite/ChangeLog:

	* g++.target/riscv/mv-symbols1.C: New test.
	* g++.target/riscv/mv-symbols2.C: New test.
	* g++.target/riscv/mv-symbols3.C: New test.
	* g++.target/riscv/mv-symbols4.C: New test.
	* g++.target/riscv/mv-symbols5.C: New test.
	* g++.target/riscv/mvc-symbols1.C: New test.
	* g++.target/riscv/mvc-symbols2.C: New test.
	* g++.target/riscv/mvc-symbols3.C: New test.
	* g++.target/riscv/mvc-symbols4.C: New test.


RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and... · 917d03e4

Yangyu Chen authored 4 months ago

RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER

This patch implements the TARGET_GENERATE_VERSION_DISPATCHER_BODY and
TARGET_GET_FUNCTION_VERSIONS_DISPATCHER for RISC-V. This is used to
generate the dispatcher function and get the dispatcher function for
function multiversioning.

This patch copies much of the code from commit 0cfde688 ("[aarch64]
Add function multiversioning support") and modifies it to fit the
RISC-V port. A key difference is that the data structure for feature bits
in the RISC-V C-API is an array of unsigned long long, while in AArch64 it
is not an array. So we need to generate an array reference for each
feature bits element in the dispatcher function.
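
Conceptually, the generated dispatcher behaves like the hand-written sketch below. All names (Bits, Candidate, supported, resolve) and the two-element array size are assumptions made for illustration; the real dispatcher is built as GIMPLE by the compiler rather than written in C++.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical model: feature bits are an array of unsigned long long.
using Bits = std::array<unsigned long long, 2>;
using Impl = int (*)();

struct Candidate {
  Bits required;  // feature bits this version needs
  Impl impl;      // the function version itself
};

// A version is usable only if every array element of the hardware bits
// contains all of the bits required in that element.
bool supported(const Bits& hw, const Bits& required) {
  for (std::size_t i = 0; i < hw.size(); ++i)
    if ((hw[i] & required[i]) != required[i])
      return false;
  return true;
}

// Return the first usable candidate (candidates are assumed to be sorted
// by decreasing priority), or the default version if none is usable.
Impl resolve(const Bits& hw, const std::vector<Candidate>& by_priority,
             Impl fallback) {
  for (const Candidate& c : by_priority)
    if (supported(hw, c.required))
      return c.impl;
  return fallback;
}
```

The per-element loop in supported corresponds to the array reference generated for each feature bits element.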

Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/ChangeLog:

	* config/riscv/riscv.cc (add_condition_to_bb): New function.
	(dispatch_function_versions): New function.
	(get_suffixed_assembler_name): New function.
	(make_resolver_func): New function.
	(riscv_generate_version_dispatcher_body): New function.
	(riscv_get_function_versions_dispatcher): New function.
	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Implement it.
	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Implement it.


RISC-V: Implement TARGET_MANGLE_DECL_ASSEMBLER_NAME · 0c77c4b0

Yangyu Chen authored 4 months ago


This patch implements the TARGET_MANGLE_DECL_ASSEMBLER_NAME for RISC-V.
This is used to add function multiversioning suffixes to the assembler
name.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/ChangeLog:

	* config/riscv/riscv.cc
	(riscv_mangle_decl_assembler_name): New function.
	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): Define.


RISC-V: Implement TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS · 78753c75

Yangyu Chen authored 4 months ago

This patch implements TARGET_COMPARE_VERSION_PRIORITY and
TARGET_OPTION_FUNCTION_VERSIONS for RISC-V.

The TARGET_COMPARE_VERSION_PRIORITY is implemented to compare the
priority of two function versions based on the rules defined in the
RISC-V C-API Doc PR #85:

https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85/files#diff-79a93ca266139524b8b642e582ac20999357542001f1f4666fbb62b6fb7a5824R721



If multiple versions have equal priority, we select the function with
the largest number of feature bits generated by
riscv_minimal_hwprobe_feature_bits. When the number of feature bits is
also equal, we diff the two versions and select the one with the least
significant differing bit set, since a feature that appears earlier in
the feature_bits might be more important to performance.
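
The tie-breaking rules can be sketched as follows. The Version struct, the two-element array size, and the assumptions that a larger priority value wins and that lower array indices are compared first are all hypothetical; the sketch is not the code added to riscv.cc.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

using FeatureBits = std::array<std::uint64_t, 2>;

struct Version {
  int priority;      // assumption: a larger value means higher priority
  FeatureBits bits;  // minimal hwprobe feature bits for this version
};

// Total number of feature bits set across all array elements.
int count_bits(const FeatureBits& bits) {
  int n = 0;
  for (std::uint64_t word : bits)
    n += static_cast<int>(std::bitset<64>(word).count());
  return n;
}

// Returns >0 if a is preferred, <0 if b is preferred, 0 if equal.
int compare_versions(const Version& a, const Version& b) {
  if (a.priority != b.priority)
    return a.priority > b.priority ? 1 : -1;

  int ca = count_bits(a.bits), cb = count_bits(b.bits);
  if (ca != cb)
    return ca > cb ? 1 : -1;

  for (std::size_t w = 0; w < a.bits.size(); ++w) {
    std::uint64_t diff = a.bits[w] ^ b.bits[w];
    if (diff != 0) {
      std::uint64_t lowest = diff & (~diff + 1);  // least significant set bit
      return (a.bits[w] & lowest) ? 1 : -1;
    }
  }
  return 0;
}
```

A result of 0 means neither version is preferred over the other.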

The TARGET_OPTION_FUNCTION_VERSIONS hook is implemented to check whether
the two function versions are the same. This implementation reuses the
code in TARGET_COMPARE_VERSION_PRIORITY and checks whether it returns 0,
which means equal priority.

Co-Developed-by: Hank Chang <hank.chang@sifive.com>
Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/ChangeLog:

	* config/riscv/riscv.cc
	(parse_features_for_version): New function.
	(compare_fmv_features): New function.
	(riscv_compare_version_priority): New function.
	(riscv_common_function_versions): New function.
	(TARGET_COMPARE_VERSION_PRIORITY): Implement it.
	(TARGET_OPTION_FUNCTION_VERSIONS): Implement it.


RISC-V: Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P · bd975bd1

Yangyu Chen authored 4 months ago


This patch implements the TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P for
RISC-V. This hook is used to process attribute
((target_version ("..."))).

As it is the first patch which introduces the target_version attribute,
we also set TARGET_HAS_FMV_TARGET_ATTRIBUTE to 0 to use "target_version"
for function versioning.

Co-Developed-by: Hank Chang <hank.chang@sifive.com>
Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/ChangeLog:

	* config/riscv/riscv-protos.h
	(riscv_process_target_attr): Remove as it is not used.
	(riscv_option_valid_version_attribute_p): Declare.
	(riscv_process_target_version_attr): Declare.
	* config/riscv/riscv-target-attr.cc
	(riscv_target_attrs): Renamed from riscv_attributes.
	(riscv_target_version_attrs): New attributes for target_version.
	(riscv_process_one_target_attr): New arguments to select attrs.
	(riscv_process_target_attr): Likewise.
	(riscv_option_valid_attribute_p): Likewise.
	(riscv_process_target_version_attr): New function.
	(riscv_option_valid_version_attribute_p): New function.
	* config/riscv/riscv.cc
	(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): Implement it.
	* config/riscv/riscv.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Define
	it to 0 to use "target_version" for function versioning.


RISC-V: Implement riscv_minimal_hwprobe_feature_bits · 1f99a39d

Yangyu Chen authored 4 months ago


This patch implements the riscv_minimal_hwprobe_feature_bits feature
for the RISC-V target. The feature bits are defined in
libgcc/config/riscv/feature_bits.c to provide bitmasks of the ISA
extensions that are defined in the RISC-V C-API. Thus, we need a function
to generate the feature bits for the IFUNC resolver to dispatch between
different functions based on the hardware features.

The minimal feature bits use the earliest extension that appeared in
the Linux hwprobe to cover the given ISA string. This allows older kernels
that cannot probe some implied extensions to run the FMV dispatcher
correctly.

For example, V implies Zve32x, but Zve32x has only appeared in the Linux
kernel since v6.11. If we used the ISA string directly to generate the FMV
dispatcher for functions with the "arch=+v" extension, then because V
implies Zve32x, the FMV dispatcher would check whether the Zve32x extension
is supported by the host. If the Linux kernel is older than v6.11, the FMV
dispatcher would fail to detect the Zve32x extension even though it is
already implied by the V extension, and so would fail to dispatch to the
correct function.

Thus, we need to generate the minimal feature bits to cover the given
ISA string to allow the FMV dispatcher to work correctly on older
kernels.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc
	(RISCV_EXT_BITMASK): New macro.
	(struct riscv_ext_bitmask_table_t): New struct.
	(riscv_minimal_hwprobe_feature_bits): New function.
	* common/config/riscv/riscv-ext-bitmask.def: New file.
	* config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H): Include
	riscv-feature-bits.h.
	(riscv_minimal_hwprobe_feature_bits): Declare the function.
	* config/riscv/riscv-feature-bits.h: New file.


RISC-V: Implement Priority syntax parser for Function Multi-Versioning · 6b572d4e

Yangyu Chen authored 4 months ago

This patch adds the priority syntax parser to support the Function
Multi-Versioning (FMV) feature in RISC-V. This feature allows users to
specify the priority of the function version in the attribute syntax.

Changes based on RISC-V C-API PR:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85



Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/ChangeLog:

	* config/riscv/riscv-target-attr.cc
	(riscv_target_attr_parser::handle_priority): New function.
	(riscv_target_attr_parser::update_settings): Update priority
	attribute.
	* config/riscv/riscv.opt: Add TargetVariable riscv_fmv_priority.


Introduce TARGET_CLONES_ATTR_SEPARATOR for RISC-V · 9bf0dbe6

Yangyu Chen authored 4 months ago

Some architectures may use ',' in the attribute string, but it is not
used as the separator for different targets. To avoid conflict, we
introduce a new macro TARGET_CLONES_ATTR_SEPARATOR to separate different
clones.

As an example, according to the RISC-V C-API Specification [1], RISC-V allows
',' in the attribute string in the "arch=" option to specify one or more
ISA extensions in the same target function, which conflicts with the
default separator used to separate different clones. This patch introduces
TARGET_CLONES_ATTR_SEPARATOR for RISC-V and chooses '#' as the separator,
since '#' is not allowed in the target_clones option string.
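
To see why the separator matters, consider the sketch below. split_clones is a hypothetical helper, not the code in multiple_target.cc, and the attribute string in the usage note is a made-up example.

```cpp
#include <string>
#include <vector>

// Split an attribute string into one entry per clone, using the given
// separator character.
std::vector<std::string> split_clones(const std::string& attr, char separator) {
  std::vector<std::string> out;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type pos = attr.find(separator, start);
    if (pos == std::string::npos) {
      out.push_back(attr.substr(start));  // last (or only) entry
      break;
    }
    out.push_back(attr.substr(start, pos - start));
    start = pos + 1;
  }
  return out;
}
```

With the made-up string "arch=+zba,+zbb#arch=+v#default", splitting on '#' gives three clones and keeps "arch=+zba,+zbb" intact, while splitting on ',' would cut that entry in two.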

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/blob/c6c5d6d9cf96b342293315a5dff3d25e96ef8191/src/c-api.adoc#__attribute__targetattr-string



Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/ChangeLog:

	* defaults.h (TARGET_CLONES_ATTR_SEPARATOR): Define new macro.
	* multiple_target.cc (get_attr_str): Use
	TARGET_CLONES_ATTR_SEPARATOR to separate attributes.
	(separate_attrs): Likewise.
	(expand_target_clones): Likewise.
	* attribs.cc (attr_strcmp): Likewise.
	(sorted_attr_string): Likewise.
	* tree.cc (get_target_clone_attr_len): Likewise.
	* config/riscv/riscv.h (TARGET_CLONES_ATTR_SEPARATOR): Define
	TARGET_CLONES_ATTR_SEPARATOR for RISC-V.
	* doc/tm.texi: Document TARGET_CLONES_ATTR_SEPARATOR.
	* doc/tm.texi.in: Likewise.


Fortran: Fix failing character pointer fcn assignment [PR105054] · f530a8c6

Paul Thomas authored 4 months ago

2024-11-14  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
	PR fortran/105054
	* resolve.cc (get_temp_from_expr): If the pointer function has
	a deferred character length, generate a new deferred charlen
	for the temporary.

gcc/testsuite/
	PR fortran/105054
	* gfortran.dg/ptr_func_assign_6.f08: New test.


c: add Wzero-as-null-pointer-constant [PR117059] · 236c0829

Martin Uecker authored 4 months ago


Add warnings for the use of zero as a null pointer constant to the C FE.

	PR c/117059

gcc/c-family/ChangeLog:
	* c.opt (Wzero-as-null-pointer-constant): Enable for C and ObjC.

gcc/c/ChangeLog:
	* c-typeck.cc (parse_build_binary_op): Add warning.
	(build_conditional_expr): Add warning.
	(convert_for_assignment): Add warning.

gcc/ChangeLog:
	* doc/invoke.texi (Wzero-as-null-pointer-constant): Adapt
	description.

gcc/testsuite/ChangeLog:
	* gcc.dg/Wzero-as-null-pointer-constant.c: New test.

Suggested-by: Alejandro Colomar <alx@kernel.org>
Acked-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Joseph Myers <josmyers@redhat.com>


c: Handle C23 floating constant {d,D}{32,64,128} suffixes like {df,dd,dl} · 856809e5

Jakub Jelinek authored 4 months ago

C23 roughly says that {d,D}{32,64,128} floating point constant suffixes
are alternate spellings of {df,dd,dl} suffixes in annex H.

So, the following patch allows that alternate spelling.
Or is it intentional it isn't enabled and we need to do everything in
there first before trying to define __STDC_IEC_60559_DFP__?
Like add support for _Decimal32x and _Decimal64x types (including
the d32x and d64x suffixes) etc.

2024-11-13  Jakub Jelinek  <jakub@redhat.com>

libcpp/
	* expr.cc (interpret_float_suffix): Handle d32 and D32 suffixes
	for C like df, d64 and D64 like dd and d128 and D128 like
	dl.
gcc/c-family/
	* c-lex.cc (interpret_float): Subtract 3 or 4 from copylen
	rather than 2 if last character of CPP_N_DFLOAT is a digit.
gcc/testsuite/
	* gcc.dg/dfp/c11-constants-3.c: New test.
	* gcc.dg/dfp/c11-constants-4.c: New test.
	* gcc.dg/dfp/c23-constants-3.c: New test.
	* gcc.dg/dfp/c23-constants-4.c: New test.


c: Implement C2Y N3298 - Introduce complex literals [PR117029] · eb45d151

Jakub Jelinek authored 4 months ago

The following patch implements the C2Y N3298 paper Introduce complex literals
by providing different (or no) diagnostics on imaginary constants (except
for integer ones).
For _DecimalN constants we don't support _Complex _DecimalN and error on any
i/j suffixes mixed with DD/DL/DF, so nothing changed there.
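
For orientation, a floating imaginary constant of the kind affected looks like the sketch below. `make_z` is a hypothetical name; which diagnostic (if any) is issued for `2.0i` under a given `-std=` and `-pedantic` combination is what this patch changes and is not asserted here. Integer forms such as `2i` are handled separately, per the description above.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical example: builds 1 + 2i using an imaginary constant.  */
double _Complex make_z(void)
{
    return 1.0 + 2.0i;   /* floating imaginary constant with the i suffix */
}

void demo(void)
{
    /* The same value written with the <complex.h> macro I.  */
    assert(make_z() == 1.0 + 2.0 * I);
}
```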

2024-11-13  Jakub Jelinek  <jakub@redhat.com>

	PR c/117029
libcpp/
	* include/cpplib.h (struct cpp_options): Add imaginary_constants
	member.
	* init.cc (struct lang_flags): Add imaginary_constants bitfield.
	(lang_defaults): Add column for imaginary_constants.
	(cpp_set_lang): Copy over imaginary_constants.
	* expr.cc (cpp_classify_number): Diagnose CPP_N_IMAGINARY
	non-CPP_N_FLOATING constants differently for C.
gcc/testsuite/
	* gcc.dg/cpp/pr7263-3.c: Adjust expected diagnostic wording.
	* gcc.dg/c23-imaginary-constants-1.c: New test.
	* gcc.dg/c23-imaginary-constants-2.c: New test.
	* gcc.dg/c23-imaginary-constants-3.c: New test.
	* gcc.dg/c23-imaginary-constants-4.c: New test.
	* gcc.dg/c23-imaginary-constants-5.c: New test.
	* gcc.dg/c23-imaginary-constants-6.c: New test.
	* gcc.dg/c23-imaginary-constants-7.c: New test.
	* gcc.dg/c23-imaginary-constants-8.c: New test.
	* gcc.dg/c23-imaginary-constants-9.c: New test.
	* gcc.dg/c23-imaginary-constants-10.c: New test.
	* gcc.dg/c2y-imaginary-constants-1.c: New test.
	* gcc.dg/c2y-imaginary-constants-2.c: New test.
	* gcc.dg/c2y-imaginary-constants-3.c: New test.
	* gcc.dg/c2y-imaginary-constants-4.c: New test.
	* gcc.dg/c2y-imaginary-constants-5.c: New test.
	* gcc.dg/c2y-imaginary-constants-6.c: New test.
	* gcc.dg/c2y-imaginary-constants-7.c: New test.
	* gcc.dg/c2y-imaginary-constants-8.c: New test.
	* gcc.dg/c2y-imaginary-constants-9.c: New test.
	* gcc.dg/c2y-imaginary-constants-10.c: New test.
	* gcc.dg/c2y-imaginary-constants-11.c: New test.
	* gcc.dg/c2y-imaginary-constants-12.c: New test.

eb45d151

aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733] · 9b2915d9

Soumya AR authored 4 months ago


This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

float
test_ldexpf (float x, int i)
{
	return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
	return __builtin_ldexp(x, i);
}

GCC Output:

test_ldexpf:
	b ldexpf

test_ldexp:
	b ldexp

Since SVE has support for an FSCALE instruction, we can use this to process
scalar floats by moving them to a vector register and performing an fscale call,
similar to how LLVM tackles an ldexp builtin as well.

New Output:

test_ldexpf:
	fmov	s31, w0
	ptrue	p7.b, vl4
	fscale	z0.s, p7/m, z0.s, z31.s
	ret

test_ldexp:
	sxtw	x0, w0
	ptrue	p7.b, vl8
	fmov	d31, x0
	fscale	z0.d, p7/m, z0.d, z31.d
	ret

This is a revision of an earlier patch, and now uses the extended definition of
aarch64_ptrue_reg to generate predicate registers with the appropriate set bits.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
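
For readers unfamiliar with ldexp: it returns x multiplied by 2 raised to the power i, which is the scaling that FSCALE provides. A minimal portable sketch of that semantics follows. `scale_pow2` is a hypothetical reference helper, not part of the patch, and it is only claimed to match ldexp when every intermediate result stays in the normal range.

```c
#include <assert.h>

/* Hypothetical reference: x * 2^i by repeated exact doubling or halving.
   Exact while intermediate values remain normal (no overflow and no
   subnormal rounding); not a replacement for ldexp.  */
double scale_pow2(double x, int i)
{
    double step = (i < 0) ? 0.5 : 2.0;
    unsigned n = (i < 0) ? 0u - (unsigned)i : (unsigned)i;

    while (n--)
        x *= step;
    return x;
}

void demo(void)
{
    assert(scale_pow2(1.5, 4) == 24.0);   /* 1.5 * 2^4 */
    assert(scale_pow2(24.0, -3) == 3.0);  /* 24 * 2^-3 */
    assert(scale_pow2(0.0, 10) == 0.0);
}
```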

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

	PR target/111733
	* config/aarch64/aarch64-sve.md
	(ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
	floating modes and expand to the existing pattern for FSCALE.
	* config/aarch64/iterators.md:
	(SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
	as their scalar equivalents.
	(VPRED): Extended the attribute to handle GPF_HF modes.
	* internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/fscale.c: New test.

9b2915d9

RISC-V: Bugfix for max_sew_overlap_and_next_ratio_valid_for_prev_sew_p[pr117483] · 445d8bb6

xuli authored 4 months ago

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117483

If prev and next satisfy the following rules, we should forbid the case
(next.get_sew() < prev.get_sew() && (!next.get_ta() || !next.get_ma()))
in the compatible function max_sew_overlap_and_next_ratio_valid_for_prev_sew_p.
Otherwise, the tail elements of next will be polluted.

DEF_SEW_LMUL_RULE (ge_sew, ratio_and_ge_sew, ratio_and_ge_sew,
 max_sew_overlap_and_next_ratio_valid_for_prev_sew_p,
 always_false, use_max_sew_and_lmul_with_next_ratio)
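
The forbidden case can be restated as a small predicate. The struct and function names below are hypothetical stand-ins for the accessors quoted above (get_sew, get_ta, get_ma); this is a sketch of the condition only, not the code in riscv-vsetvl.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vsetvl state queried by the rule.  */
struct vsetvl_state {
    unsigned sew;  /* selected element width */
    bool ta;       /* tail agnostic */
    bool ma;       /* mask agnostic */
};

/* True when next uses a smaller SEW than prev and is not both
   tail agnostic and mask agnostic: the case the fix forbids.  */
bool must_reject(const struct vsetvl_state *prev,
                 const struct vsetvl_state *next)
{
    return next->sew < prev->sew && (!next->ta || !next->ma);
}

void demo(void)
{
    struct vsetvl_state prev = { 32, true, true };
    struct vsetvl_state narrow_tu = { 16, false, true };
    struct vsetvl_state narrow_agnostic = { 16, true, true };
    struct vsetvl_state same_sew_tu = { 32, false, true };

    assert(must_reject(&prev, &narrow_tu));
    assert(!must_reject(&prev, &narrow_agnostic));
    assert(!must_reject(&prev, &same_sew_tu));
}
```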

Passed the rv64gcv full regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>

	PR target/117483

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc: Fix bug.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr117483.c: New test.

445d8bb6

[RISC-V] Fix costing of LO_SUM expressions · eeb5c6ac

Xianmiao Qu authored 4 months ago


This is a rewrite of a patch originally from Xianmiao Qu.  Xianmiao
noticed that the costs we compute for LO_SUM expressions were incorrect.
Essentially we costed based solely on the first input to the LO_SUM.

In a LO_SUM, the first input is almost always going to be a REG and thus
isn't interesting.  The second argument is almost always going to be
some kind of symbolic operand, which is much more interesting from a
costing standpoint.

The right way to fix this is to sum the cost of the two operands.  I've
verified this produces the same code as Xianmiao Qu's original patch.
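
The difference between the two costing schemes can be shown with a toy model. Everything here is hypothetical: the enum, the cost values, and the function names are made up for illustration and are not the logic in riscv_rtx_costs.

```c
#include <assert.h>

/* Hypothetical operand kinds for a LO_SUM's two inputs.  */
enum operand_kind { OPERAND_REG, OPERAND_SYMBOL };

/* Assumed toy costs: a register is free, a symbolic operand is not.  */
static int operand_cost(enum operand_kind kind)
{
    return kind == OPERAND_REG ? 0 : 1;
}

/* Old behaviour as described: only the first input is costed.  */
int lo_sum_cost_first_only(enum operand_kind op0, enum operand_kind op1)
{
    (void)op1;
    return operand_cost(op0);
}

/* New behaviour as described: sum the cost of both operands.  */
int lo_sum_cost_summed(enum operand_kind op0, enum operand_kind op1)
{
    return operand_cost(op0) + operand_cost(op1);
}

void demo(void)
{
    /* The typical shape: (lo_sum (reg) (symbol)).  */
    assert(lo_sum_cost_first_only(OPERAND_REG, OPERAND_SYMBOL) == 0);
    assert(lo_sum_cost_summed(OPERAND_REG, OPERAND_SYMBOL) > 0);
}
```

With a nonzero cost, a pass that weighs the benefit of hoisting the expression no longer sees it as free.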

This has been tested on rv32 and rv64 in my tester.  It missed today's
bootstrap of riscv64 though :(  Naturally I'll wait on the pre-commit CI
tester to render a verdict, but I don't expect any problems.

--  From Xianmiao Qu's original submission --

Currently, the cost of the LO_SUM expression is based on
the cost of calculating the first subexpression. When the
first subexpression is a register, the cost result will
be zero. It seems a bit unreasonable for a SET expression
to have a zero cost when its source is LO_SUM. Moreover,
having a cost of zero for the expression will lead the
loop invariant pass to calculate its benefits of being
moved outside the loop as zero, thus preventing the
out-of-loop placement of the loop invariant.

As an example, consider the following test case:
   long a;
   long b[];
   long *c;
   foo () {
     for (;;)
       *c = b[a];
   }

When compiling with -march=rv64gc -mabi=lp64d -Os, the following code is
generated:
         .cfi_startproc
         lui     a5,%hi(c)
         ld      a4,%lo(c)(a5)
         lui     a2,%hi(b)
         lui     a1,%hi(a)
.L2:
         ld      a5,%lo(a)(a1)
         addi    a3,a2,%lo(b)
         slli    a5,a5,3
         add     a5,a5,a3
         ld      a5,0(a5)
         sd      a5,0(a4)
         j       .L2

After adjusting the cost of the LO_SUM expression, the addi instruction is
moved outside the loop:
         .cfi_startproc
         lui     a5,%hi(c)
         ld      a3,%lo(c)(a5)
         lui     a4,%hi(b)
         lui     a2,%hi(a)
         addi    a4,a4,%lo(b)
.L2:
         ld      a5,%lo(a)(a2)
         slli    a5,a5,3
         add     a5,a5,a4
         ld      a5,0(a5)
         sd      a5,0(a3)
         j       .L2

gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs): Correct costing of LO_SUM
	expressions.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

eeb5c6ac

Reapply "[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]" · 10d76b7f
Jeff Law authored 4 months ago
```
This reverts commit de3b2772.
```
10d76b7f

i386: Zero extend 32-bit address to 64-bit with option -mx32 -maddress-mode=long. [PR 117418] · 2272cd25

Hu, Lin1 authored 4 months ago

-maddress-mode=long sets Pmode to DImode, so zero-extend the 32-bit address
to 64 bits and use a 64-bit register as the pointer to avoid an ICE.

gcc/ChangeLog:

	PR target/117418
	* config/i386/i386-expand.cc (ix86_expand_builtin): Convert
	pointer's mode according to Pmode.

gcc/testsuite/ChangeLog:

	PR target/117418
	* gcc.target/i386/pr117418-1.c: New test.

2272cd25

Daily bump. · 9e423b5c
GCC Administrator authored 4 months ago

9e423b5c
Revert "[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]" · de3b2772
Jeff Law authored 4 months ago
```
This reverts commit 69bd93c1.
```
de3b2772

Nov 12, 2024

RISC-V: Fix target-attr-norelax.c testcase · 098214cf

Yangyu Chen authored 4 months ago

The target-attr-norelax.c testcase was failing because of a redundant "\t"
check in the assembly output, and because the testcase did not skip that
check for LTO builds.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/target-attr-norelax.c: Fix testcase.

098214cf

Revert "Match: Simplify branch form 3 of unsigned SAT_ADD into branchless" · d95339c9
Pan Li authored 4 months ago
```
This reverts commit df4af89b.
```
d95339c9

selftests: clear GCC_COLORS [PR117503] · 169897bb

David Malcolm authored 4 months ago


gcc/ChangeLog:
	PR bootstrap/117503
	* Makefile.in (GCC_FOR_SELFTESTS): Set GCC_COLORS=.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

169897bb