Commits · f30423ea8c2152dcee91056e75a4f3736cce6a6e · COBOLworx / gcc-cobol

Jan 11, 2025

LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d · f30423ea

mengqinggang authored 2 months ago

Generate 0x1010 instead of 0x1010000>>12 for lu12i.w. lu32i.d and lu52i.d use
the same processing.
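
As a rough illustration of the arithmetic involved, the sketch below computes the immediate field each instruction would carry for a given 64-bit constant. The helper names are hypothetical and are not taken from loongarch.cc; the bit ranges assume the usual encodings (lu12i.w: bits 31..12, lu32i.d: bits 51..32, lu52i.d: bits 63..52). How GCC prints the value (signed or unsigned, hex or decimal) is not shown in the commit message and is not modelled here.

```cpp
#include <cstdint>

// Hypothetical helpers: extract the already-shifted immediate field so the
// assembly output can show e.g. 0x1010 rather than 0x1010000>>12.

// lu12i.w carries bits 31..12 of the constant (20-bit field).
constexpr std::uint32_t lu12i_w_imm(std::uint64_t value) {
  return static_cast<std::uint32_t>((value >> 12) & 0xfffff);
}

// lu32i.d carries bits 51..32 of the constant (20-bit field).
constexpr std::uint32_t lu32i_d_imm(std::uint64_t value) {
  return static_cast<std::uint32_t>((value >> 32) & 0xfffff);
}

// lu52i.d carries bits 63..52 of the constant (12-bit field).
constexpr std::uint32_t lu52i_d_imm(std::uint64_t value) {
  return static_cast<std::uint32_t>((value >> 52) & 0xfff);
}

// The example from the commit message: 0x1010000 >> 12 is 0x1010.
static_assert(lu12i_w_imm(0x1010000) == 0x1010, "final immediate for lu12i.w");
```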

gcc/ChangeLog:

	* config/loongarch/lasx.md: Use new loongarch_output_move.
	* config/loongarch/loongarch-protos.h (loongarch_output_move):
	Change parameters from (rtx, rtx) to (rtx *).
	* config/loongarch/loongarch.cc (loongarch_output_move):
	Generate final immediate for lu12i.w and lu52i.d.
	* config/loongarch/loongarch.md:
	Generate final immediate for lu32i.d and lu52i.d.
	* config/loongarch/lsx.md: Use new loongarch_output_move.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/imm-load.c: Do not generate ">>".

f30423ea

d: Merge dmd, druntime 2b89c2909d, phobos bdedad3bf · dd3026f0

Iain Buclaw authored 2 months ago

D front-end changes:

        - Import latest fixes from dmd v2.110.0-beta.1.

D runtime changes:

        - Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

        - Import latest fixes from phobos v2.110.0-beta.1.
	- Added `popGrapheme' function to `std.uni'.

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd 2b89c2909d.
	* Make-lang.in (D_FRONTEND_OBJS): Rename d/basicmangle.o to
	d/mangle-basic.o, d/cppmangle.o to d/mangle-cpp.o, and d/dmangle.o to
	d/mangle-package.o.
	(d/mangle-%.o): New rule.
	* d-builtins.cc (maybe_set_builtin_1): Update for new front-end
	interface.
	* d-diagnostic.cc (verrorReport): Likewise.
	(verrorReportSupplemental): Likewise.
	* d-frontend.cc (getTypeInfoType): Likewise.
	* d-lang.cc (d_init_options): Likewise.
	(d_handle_option): Likewise.
	(d_post_options): Likewise.
	* d-target.cc (TargetC::contributesToAggregateAlignment): New.
	* d-tree.h (create_typeinfo): Adjust prototype.
	* decl.cc (layout_struct_initializer): Update for new front-end
	interface.
	* typeinfo.cc (create_typeinfo): Remove generate parameter.
	* types.cc (layout_aggregate_members): Update for new front-end
	interface.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime 2b89c2909d.
	* src/MERGE: Merge upstream phobos bdedad3bf.

dd3026f0

Use relations when simplifying MIN and MAX. · b0eeb540

Andrew MacLeod authored 2 months ago

Query for known relations between the operands, and pass that to
fold_range to help simplify MIN and MAX relations.
Make it type agnostic as well.

Adapt testcases from DOM to EVRP (e suffix) and test floats (f suffix).
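
For illustration only, the functions below show the kind of source pattern this is about. They are not the new testcases, and whether a given GCC version folds them in exactly this form is an assumption. On the path where a < b is known, MIN (a, b) can only be a and MAX (a, b) can only be b; for floats, a < b being true also rules out NaN operands.

```cpp
// On the guarded path the relation a < b is known, so the inner
// conditional (a MIN or MAX pattern) has only one possible result.

int min_when_less(int a, int b) {
  if (a < b)
    return a < b ? a : b;  // MIN (a, b); foldable to `a` given a < b
  return -1;
}

int max_when_less(int a, int b) {
  if (a < b)
    return a > b ? a : b;  // MAX (a, b); foldable to `b` given a < b
  return -1;
}

// Same shape with floating-point operands.
float min_when_less_f(float a, float b) {
  if (a < b)
    return a < b ? a : b;  // foldable to `a`; a < b implies no NaN here
  return -1.0f;
}
```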

	PR tree-optimization/88575
	gcc/
	* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Query
	relation between op0 and op1 and utilize it.
	(simplify_using_ranges::simplify): Do not eliminate float checks.

	gcc/testsuite/
	* gcc.dg/tree-ssa/minmax-27.c: Disable VRP.
	* gcc.dg/tree-ssa/minmax-27e.c: New.
	* gcc.dg/tree-ssa/minmax-27f.c: New.
	* gcc.dg/tree-ssa/minmax-28.c: Disable VRP.
	* gcc.dg/tree-ssa/minmax-28e.c: New.
	* gcc.dg/tree-ssa/minmax-28f.c: New.

b0eeb540

Daily bump. · 4951a90e
GCC Administrator authored 2 months ago

4951a90e

Jan 10, 2025

d: Merge dmd, druntime 4ccb01fde5, phobos eab6595ad · c82395e0

Iain Buclaw authored 2 months ago

D front-end changes:

	- Added pragma for ImportC to allow setting `nothrow', `@nogc'
	  or `pure'.
	- Mixin templates can now use assignment syntax.

D runtime changes:

	- Removed `ThreadBase.criticalRegionLock' from `core.thread'.
	- Added `expect', `[un]likely', `trap' to `core.builtins'.

Phobos changes:

	- Import latest fixes from phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd 4ccb01fde5.
	* Make-lang.in (D_FRONTEND_OBJS): Rename d/foreachvar.o to
	d/visitor-foreachvar.o, d/visitor.o to d/visitor-package.o, and
	d/statement_rewrite_walker.o to d/visitor-statement_rewrite_walker.o.
	(D_FRONTEND_OBJS): Rename
	d/{parsetime,permissive,postorder,transitive}visitor.o to
	d/visitor-{parsetime,permissive,postorder,transitive}.o.
	(D_FRONTEND_OBJS): Remove d/sapply.o.
	(d.tags): Add dmd/common/*.h.
	(d/visitor-%.o:): New rule.
	* d-codegen.cc (get_frameinfo): Update for new front-end interface.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime 4ccb01fde5.
	* src/MERGE: Merge upstream phobos eab6595ad.

c82395e0

d: Merge dmd, druntime 6884b433d2, phobos 48d581a1f · a7ae0c31

Iain Buclaw authored 2 months ago

D front-end changes:

	- It's now deprecated to declare `auto ref' parameters without
	  putting those two keywords next to each other.
        - An error is now given for case fallthrough for multivalued
	  cases.
        - An error is now given for constructors with field destructors
	  with stricter attributes.
        - An error is now issued for `in'/`out' contracts of `nothrow'
	  functions that may throw.
	- `auto ref' can now be applied to local, static, extern, and
	  global variables.

D runtime changes:

        - Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

        - Import latest fixes from phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd 6884b433d2.
	* d-builtins.cc (build_frontend_type): Update for new front-end
	interface.
	(d_build_builtins_module): Likewise.
	(matches_builtin_type): Likewise.
	(covariant_with_builtin_type_p): Likewise.
	* d-codegen.cc (lower_struct_comparison): Likewise.
	(call_side_effect_free_p): Likewise.
	* d-compiler.cc (Compiler::paintAsType): Likewise.
	* d-convert.cc (convert_expr): Likewise.
	(convert_for_assignment): Likewise.
	* d-target.cc (Target::isVectorTypeSupported): Likewise.
	(Target::isVectorOpSupported): Likewise.
	(Target::isReturnOnStack): Likewise.
	* decl.cc (get_symbol_decl): Likewise.
	* expr.cc (build_return_dtor): Likewise.
	* imports.cc (class ImportVisitor): Likewise.
	* toir.cc (class IRVisitor): Likewise.
	* types.cc (class TypeVisitor): Likewise.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime 6884b433d2.
	* src/MERGE: Merge upstream phobos 48d581a1f.

a7ae0c31

vect: Also cost gconds for scalar [PR118211] · 086031c0

Alex Coplan authored 9 months ago

Currently we only cost gconds for the vector loop while we omit costing
them when analyzing the scalar loop; this unfairly penalizes the vector
loop in the case of loops with early exits.

This (together with the previous patches) enables us to vectorize
std::find with 64-bit element sizes.
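
As a sketch of the loop shape in question (the function names are made up for this note, and nothing here claims a particular GCC version vectorizes it), a linear search over 64-bit elements has an early exit in every iteration. That exit is the gcond whose cost was previously counted only for the vector loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hand-written early-exit search over 64-bit elements.
std::size_t find_index_loop(const std::vector<std::uint64_t>& v,
                            std::uint64_t x) {
  for (std::size_t i = 0; i < v.size(); ++i)
    if (v[i] == x)
      return i;  // early exit taken as soon as a match is found
  return v.size();
}

// The same search expressed with std::find.
std::size_t find_index_std(const std::vector<std::uint64_t>& v,
                           std::uint64_t x) {
  return static_cast<std::size_t>(std::find(v.begin(), v.end(), x) - v.begin());
}
```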

gcc/ChangeLog:

	PR tree-optimization/118211
	PR tree-optimization/116126
	* tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
	Don't skip over gconds.

086031c0

vect: Ensure we add vector skip guard even when versioning for aliasing [PR118211] · f4e259b4

Alex Coplan authored 8 months ago

This fixes a latent wrong code issue whereby vect_do_peeling determined
the wrong condition for inserting the vector skip guard.  Specifically
in the case where the loop niters are unknown at compile time we used to
check:

  !LOOP_REQUIRES_VERSIONING (loop_vinfo)

but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned
for aliasing, and that has nothing to do with prolog peeling.  I think
this condition should instead be checking specifically if we aren't
versioning for alignment.

As it stands, when we version for alignment, we don't peel, so the
vector skip guard is indeed redundant in that case.

With the testcase added (reduced from the Fortran frontend) we would
version for aliasing, omit the vector skip guard, and then at runtime we
would peel sufficient iterations for alignment that there wasn't a full
vector iteration left when we entered the vector body, thus overflowing
the output buffer.
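
The failure mode can be pictured with a toy model. This is an illustration under simplifying assumptions, not the logic in vect_do_peeling, and the function name is hypothetical: once the prolog has peeled some scalar iterations for alignment, the vector body may only be entered if at least one full vector iteration (VF scalar iterations) remains.

```cpp
#include <cstdint>

// Toy model of the vector skip guard: the vector loop may be entered only
// if, after peeling prolog_iters scalar iterations, at least vf remain.
constexpr bool may_enter_vector_loop(std::uint64_t niters,
                                     std::uint64_t prolog_iters,
                                     std::uint64_t vf) {
  return prolog_iters < niters && niters - prolog_iters >= vf;
}

// 5 iterations, 3 peeled for alignment, VF of 4: only 2 remain, so
// entering the vector body would run past the end of the data.
static_assert(!may_enter_vector_loop(5, 3, 4), "guard must skip vector loop");
static_assert(may_enter_vector_loop(16, 3, 4), "enough iterations remain");
```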

gcc/ChangeLog:

	PR tree-optimization/118211
	PR tree-optimization/116126
	* tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector
	condition to only omit the edge if we're versioning for
	alignment.

gcc/testsuite/ChangeLog:

	PR tree-optimization/118211
	PR tree-optimization/116126
	* gcc.dg/vect/vect-early-break_130.c: New test.

f4e259b4

vect: Fix dominators when adding a guard to skip the vector loop [PR118211] · f1c6789a

Tamar Christina authored 8 months ago


The alignment peeling changes exposed a latent missing dominator update
with early break vectorization, specifically when inserting the vector
skip edge, since the new edge bypasses the prolog skip block and thus
has the potential to subvert its dominance.  This patch fixes that.

gcc/ChangeLog:

	PR tree-optimization/118211
	PR tree-optimization/116126
	* tree-vect-loop-manip.cc (vect_do_peeling): Update immediate
	dominators of nodes that were dominated by the prolog skip block
	after inserting vector skip edge.  Initialize prolog variable to
	NULL to avoid bogus -Wmaybe-uninitialized during bootstrap.

gcc/testsuite/ChangeLog:

	PR tree-optimization/118211
	PR tree-optimization/116126
	* g++.dg/vect/vect-early-break_6.cc: New test.

Co-Authored-By: Alex Coplan <alex.coplan@arm.com>

f1c6789a

vect: Don't guard scalar epilogue for inverted loops [PR118211] · 0a462451

Alex Coplan authored 9 months ago

For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always
enter the scalar epilogue, so avoid emitting a guard on entry to the
epilogue.

gcc/ChangeLog:

	PR tree-optimization/118211
	PR tree-optimization/116126
	* tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an
	epilogue guard for inverted early-exit loops.

0a462451

vect: Force alignment peeling to vectorize more early break loops [PR118211] · 68326d5d

Alex Coplan authored 1 year ago


This allows us to vectorize more loops with early exits by forcing
peeling for alignment to make sure that we're guaranteed to be able to
safely read an entire vector iteration without crossing a page boundary.

To make this work for VLA architectures we have to allow compile-time
non-constant target alignments.  We also have to override the result of
the target's preferred_vector_alignment hook if it isn't a power-of-two
multiple of the TYPE_SIZE of the chosen vector type.
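
To see why alignment gives the page-boundary guarantee, consider the sketch below. The helper names, the 4096-byte page size, and the choice of aligning to the vector size are all assumptions made for this note. A load of N bytes from an N-aligned address, with N a power of two no larger than the page size, stays within one page, and the number of scalar iterations to peel follows from the misalignment of the start address.

```cpp
#include <cstddef>
#include <cstdint>

// True if an access of `size` bytes starting at `addr` touches two pages.
constexpr bool crosses_page(std::uintptr_t addr, std::size_t size,
                            std::size_t page) {
  return addr / page != (addr + size - 1) / page;
}

// Scalar iterations to peel so that the access address becomes aligned to
// `align` bytes. Assumes `addr` is already a multiple of `elem` and that
// `align` is a multiple of `elem`.
constexpr std::size_t peel_iters(std::uintptr_t addr, std::size_t elem,
                                 std::size_t align) {
  return ((align - addr % align) % align) / elem;
}

// An unaligned 16-byte load 8 bytes before a page boundary crosses it;
// peeling one 8-byte element first makes the vector load page-safe.
static_assert(crosses_page(4088, 16, 4096), "unaligned load crosses the page");
static_assert(peel_iters(4088, 8, 16) == 1, "one scalar iteration to peel");
static_assert(!crosses_page(4096, 16, 4096), "aligned load stays in one page");
```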

gcc/ChangeLog:

	PR tree-optimization/118211
	PR tree-optimization/116126
	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
	Set need_peeling_for_alignment flag on read DRs instead of
	failing vectorization.  Punt on gathers.
	(dr_misalignment): Handle non-constant target alignments.
	(vect_compute_data_ref_alignment): If need_peeling_for_alignment
	flag is set on the DR, then override the target alignment chosen
	by the preferred_vector_alignment hook to choose a safe
	alignment.
	(vect_supportable_dr_alignment): Override
	support_vector_misalignment hook if need_peeling_for_alignment
	is set on the DR: in this case we must return
	dr_unaligned_unsupported in order to force peeling.
	* tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
	peeling by a compile-time non-constant amount.
	* tree-vectorizer.h (dr_vec_info): Add new flag
	need_peeling_for_alignment.

gcc/testsuite/ChangeLog:

	PR tree-optimization/118211
	PR tree-optimization/116126
	* gcc.dg/tree-ssa/cunroll-13.c: Don't vectorize.
	* gcc.dg/tree-ssa/cunroll-14.c: Likewise.
	* gcc.dg/unroll-6.c: Likewise.
	* gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
	* gcc.dg/vect/vect-104.c: Expect to vectorize.
	* gcc.dg/vect/vect-early-break_108-pr113588.c: Likewise.
	* gcc.dg/vect/vect-early-break_109-pr113588.c: Likewise.
	* gcc.dg/vect/vect-early-break_110-pr113467.c: Likewise.
	* gcc.dg/vect/vect-early-break_3.c: Likewise.
	* gcc.dg/vect/vect-early-break_65.c: Likewise.
	* gcc.dg/vect/vect-early-break_8.c: Likewise.
	* gfortran.dg/vect/vect-5.f90: Likewise.
	* gfortran.dg/vect/vect-8.f90: Likewise.
	* gcc.dg/vect/vect-switch-search-line-fast.c:

Co-Authored-By: Tamar Christina <tamar.christina@arm.com>

68326d5d

AArch64: correct Cortex-X4 MIDR · ddcfae1d

Tamar Christina authored 2 months ago

The Parts Num field for the MIDR for Cortex-X4 is wrong.  It's currently the
parts number for a Cortex-A720 (which does have the right number).

The correct number can be found in the Cortex-X4 Technical Reference Manual [1]
on page 382 in Issue Number 5.

[1] https://developer.arm.com/documentation/102484/latest/
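
For readers unfamiliar with the field, the sketch below extracts the part number from a MIDR value, assuming the architectural MIDR_EL1 layout (Implementer in bits 31..24, PartNum in bits 15..4). The helper names are hypothetical and the sample value is made up; it is not the MIDR of any real core.

```cpp
#include <cstdint>

// PartNum occupies bits 15..4 of MIDR_EL1.
constexpr std::uint32_t midr_part_num(std::uint32_t midr) {
  return (midr >> 4) & 0xfff;
}

// Implementer occupies bits 31..24 of MIDR_EL1.
constexpr std::uint32_t midr_implementer(std::uint32_t midr) {
  return (midr >> 24) & 0xff;
}

// Artificial example value, chosen only so the fields are easy to read.
static_assert(midr_part_num(0x410F1234) == 0x123, "part number field");
static_assert(midr_implementer(0x410F1234) == 0x41, "implementer field");
```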

gcc/ChangeLog:

	* config/aarch64/aarch64-cores.def (AARCH64_CORE): Fix cortex-x4 parts
	num.

ddcfae1d

d: Merge dmd, druntime 34875cd6e1, phobos ebd24da8a · 89629b27

Iain Buclaw authored 2 months ago

D front-end changes:

        - Import dmd v2.110.0-beta.1.
        - `ref' can now be applied to local, static, extern, and global
	  variables.

D runtime changes:

        - Import druntime v2.110.0-beta.1.

Phobos changes:

        - Import phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd 34875cd6e1.
	* dmd/VERSION: Bump version to v2.110.0-beta.1.
	* Make-lang.in (D_FRONTEND_OBJS): Add d/deps.o, d/timetrace.o.
	* decl.cc (class DeclVisitor): Update for new front-end interface.
	* expr.cc (class ExprVisitor): Likewise.
	* typeinfo.cc (check_typeinfo_type): Likewise.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime 34875cd6e1.
	* src/MERGE: Merge upstream phobos ebd24da8a.

89629b27

libstdc++: Fix unused parameter warnings in <bits/atomic_futex.h> · c9353e0f

Jonathan Wakely authored 2 months ago

This fixes warnings like the following during bootstrap:

sparc-sun-solaris2.11/libstdc++-v3/include/bits/atomic_futex.h:324:53: warning: unused parameter ‘__mo’ [-Wunused-parameter]
  324 |     _M_load_when_equal(unsigned __val, memory_order __mo)
      |                                        ~~~~~~~~~~~~~^~~~

libstdc++-v3/ChangeLog:

	* include/bits/atomic_futex.h (__atomic_futex_unsigned): Remove
	names of unused parameters in non-futex implementation.

c9353e0f

c++: add fixed test [PR118391] · d2017159

Marek Polacek authored 2 months ago

Fixed by r15-6740.

	PR c++/118391

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/lambda-uneval20.C: New test.

d2017159

libatomic: Cleanup AArch64 ifunc selection · 81bcf412

Wilco Dijkstra authored 2 months ago

Simplify and clean up ifunc selection logic.  Since LRCPC3 does
not imply LSE2, has_rcpc3() should also check LSE2 is enabled.
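
The dependency can be pictured as follows. This is a simplified model with invented feature bits and signatures; the real functions in host-config.h query hardware capabilities and are not reproduced here.

```cpp
// Hypothetical feature bits, for illustration only.
enum : unsigned {
  FEAT_LSE2 = 1u << 0,
  FEAT_LRCPC3 = 1u << 1,
};

constexpr bool has_lse2(unsigned features) {
  return (features & FEAT_LSE2) != 0;
}

// LRCPC3 alone is not enough: check LSE2 first, then LRCPC3.
constexpr bool has_rcpc3(unsigned features) {
  return has_lse2(features) && (features & FEAT_LRCPC3) != 0;
}

static_assert(!has_rcpc3(FEAT_LRCPC3), "LRCPC3 without LSE2 is rejected");
static_assert(has_rcpc3(FEAT_LSE2 | FEAT_LRCPC3), "both present");
```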

Passes regress and bootstrap, OK for commit?

libatomic:
	* config/linux/aarch64/host-config.h (has_lse2): Cleanup.
	(has_lse128): Likewise.
	(has_rcpc3): Add early check for LSE2.

81bcf412

testsuite: arm: Add pattern for armv8-m.base to cmse-15.c test · cfd7c54b

Torbjörn SVENSSON authored 2 months ago


Since armv8-m.base uses thumb1, which does not support sibcall/tailcall,
a pattern is needed that uses PUSH/BL/POP sequence instead of a single
B instruction to reuse an already existing function in the compile unit.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/cmse/cmse-15.c: Added pattern for armv8-m.base.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

cfd7c54b

Do not call cp_parser_omp_dispatch directly in cp_parser_pragma · b5a67989

Paul-Antoine Arras authored 2 months ago

This is a followup to
ed49709a OpenMP: C++ front-end support for dispatch + adjust_args.

The call to cp_parser_omp_dispatch only belongs in cp_parser_omp_construct. In
cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling cp_parser_omp_construct.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_pragma): Replace call to cp_parser_omp_dispatch
	with cp_parser_omp_construct and check context.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/dispatch-8.C: New test.

b5a67989

c++: Fix ICE with invalid defaulted operator <=> [PR118387] · 4c688399

Jakub Jelinek authored 2 months ago

In the following testcase there are 2 issues, one is that B doesn't
have operator<=> and the other is that A's operator<=> has int return
type, i.e. not the standard comparison category.
Because of the int return type, retcat is cc_last; when we first
try to synthetize it, it is therefore with tentative false and complain
tf_none, we find that B doesn't have operator<=> and because retcat isn't
cc_last, don't try to search for other operators in genericize_spaceship.
And then mark the operator deleted.
When trying to explain the use of the deleted operator, tentative is still
false, but complain is tf_error_or_warning.
do_one_comp will first do:
  tree comp = build_new_op (loc, code, flags, lhs, rhs,
                            NULL_TREE, NULL_TREE, &overload,
                            tentative ? tf_none : complain);
and because complain isn't tf_none, it will actually diagnose the bug
already, but then (tentative || complain) is true and we call
genericize_spaceship, which has
  if (tag == cc_last && is_auto (type))
    {
...
    }

  gcc_checking_assert (tag < cc_last);
and because tag is cc_last and type isn't auto, we just ICE on that
assertion.

The patch fixes it by returning error_mark_node from genericize_spaceship
instead of failing the assertion.

Note, the PR raises another problem.
If on the same testcase the B b; line is removed, we silently synthetize
operator<=> which will crash at runtime due to returning without a return
statement.  That is because the standard says that in that case
it should return static_cast<int>(std::strong_ordering::equal);
but I can't find anywhere wording which would say that if that isn't
valid, the function is deleted.
https://eel.is/c++draft/class.compare#class.spaceship-2.2
seems to talk just about cases where there are some members and their
comparison is invalid it is deleted, but here there are none and it
follows
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
So, we synthetize with tf_none, see the static_cast is invalid, don't
add error_mark_node statement silently, but as the function isn't deleted,
we just silently emit it.
Should the standard be amended to say that the operator should be deleted
even if it has no elements and the static cast from
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
is invalid?

2025-01-10  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118387
	* method.cc (genericize_spaceship): For tag == cc_last if
	type is not auto just return error_mark_node instead of failing
	checking assertion.

	* g++.dg/cpp2a/spaceship-synth17.C: New test.

4c688399

c++: modules and DECL_REPLACEABLE_P · e86daddb

Jason Merrill authored 4 months ago

We need to remember that the ::operator new is replaceable to avoid a bogus
error about __builtin_operator_new finding a non-replaceable function.

This affected __get_temporary_buffer in stl_tempbuf.h.

gcc/cp/ChangeLog:

	* module.cc (trees_out::core_bools): Write replaceable_operator.
	(trees_in::core_bools): Read it.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/operator-2_a.C: New test.
	* g++.dg/modules/operator-2_b.C: New test.

e86daddb

Fix some memory leaks · 9193641d

Richard Biener authored 2 months ago

The following fixes memory leaks found compiling SPEC CPU 2017 with
valgrind.

	* df-core.cc (rest_of_handle_df_finish): Release dflow for
	problems without free function (like LR).
	* gimple-crc-optimization.cc (crc_optimization::loop_may_calculate_crc):
	Release loop_bbs on all exits.
	* tree-vectorizer.h (supportable_indirect_convert_operation): Change.
	* tree-vect-generic.cc (expand_vector_conversion): Adjust.
	* tree-vect-stmts.cc (vectorizable_conversion): Use auto_vec for
	converts.
	(supportable_indirect_convert_operation): Get a reference to
	the output vector of converts.

9193641d

[PR118017][LRA]: Fix test for i686 · 94d8de53

Vladimir N. Makarov authored 2 months ago

My previous patch for PR118017 contains a test which fails on i686.  The patch fixes this.

gcc/testsuite/ChangeLog:

	PR target/118017
	* gcc.target/i386/pr118017.c: Check target int128.

94d8de53

arm: [MVE intrinsics] Fix tuples field name (PR 118332) · 288ac095

Christophe Lyon authored 2 months ago

The previous fix only worked for C; for C++ we need to add more
information to the underlying type so that
finish_class_member_access_expr accepts it.

We use the same logic as in aarch64's register_tuple_type for AdvSIMD
tuples.

This patch makes gcc.target/arm/mve/intrinsics/pr118332.c pass in C++
mode.

gcc/ChangeLog:

	PR target/118332
	* config/arm/arm-mve-builtins.cc (wrap_type_in_struct): Delete.
	(register_type_decl): Delete.
	(register_builtin_tuple_types): Use
	lang_hooks.types.simulate_record_decl.

288ac095

Fix bootstrap on !HARDREG_PRE_REGNOS targets · 55341185
Richard Biener authored 2 months ago
```
Pushed as obvious.

	* gcse.cc (pass_hardreg_pre::gate): Wrap possibly unused
	fun argument.
```
55341185

rtl-optimization/117467 - limit ext-dce memory use · 03faac50

Richard Biener authored 2 months ago

The following puts in a hard limit on ext-dce because it might end
up requiring memory on the order of the number of basic blocks
times the number of pseudo registers.  The limiting follows what
GCSE based passes do and thus I re-use --param max-gcse-memory here.

This doesn't in any way address the implementation issues of the pass,
but it reduces the memory-use when compiling the
module_first_rk_step_part1.F90 TU from 521.wrf_r from 25GB to 1GB.
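
A back-of-the-envelope version of such a check might look like the sketch below. The function name, the one-bit-per-(block, pseudo) estimate, and the kB unit of the limit are assumptions made for illustration; the actual estimate in ext_dce_execute is not shown in this commit message.

```cpp
#include <cstdint>

// Hypothetical estimate: one bit per (basic block, pseudo register) pair,
// compared against a limit given in kB. Overflow of the product is ignored
// in this sketch.
constexpr bool estimate_fits(std::uint64_t n_basic_blocks,
                             std::uint64_t n_pseudos,
                             std::uint64_t max_memory_kb) {
  return n_basic_blocks * n_pseudos / 8 <= max_memory_kb * 1024;
}

// 1000 blocks x 1000 pseudos is about 125 kB of bits: fine under 128 kB.
static_assert(estimate_fits(1000, 1000, 128), "small function fits");
// 100000 blocks x 1000000 pseudos is about 12.5 GB: the pass would bail out.
static_assert(!estimate_fits(100000, 1000000, 131072), "huge function skipped");
```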

	PR rtl-optimization/117467
	PR rtl-optimization/117934
	* ext-dce.cc (ext_dce_execute): Do nothing if a memory
	allocation estimate exceeds what is allowed by
	--param max-gcse-memory.

03faac50

c++: ICE with pack indexing and partial inst [PR117937] · d6444794

Marek Polacek authored 3 months ago

Here we ICE in expand_expr_real_1:

      if (exp)
        {
          tree context = decl_function_context (exp);
          gcc_assert (SCOPE_FILE_SCOPE_P (context)
                      || context == current_function_decl

on something like this test:

  void
  f (auto... args)
  {
    [&]<size_t... i>(seq<i...>) {
	g(args...[i]...);
    }(seq<0>());
  }

because while current_function_decl is:

  f<int>(int)::<lambda(seq<i ...>)> [with long unsigned int ...i = {0}]

(correct), context is:

  f<int>(int)::<lambda(seq<i ...>)>

which is only the partial instantiation.

I think that when tsubst_pack_index gets a partial instantiation, e.g.
{*args#0} as the pack, we should still tsubst it.  The args#0's value-expr
can be __closure->__args#0 where the closure's context is the partially
instantiated operator().  So we should let retrieve_local_specialization
find the right args#0.

	PR c++/117937

gcc/cp/ChangeLog:

	* pt.cc (tsubst_pack_index): tsubst the pack even when it's not
	PACK_EXPANSION_P.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp26/pack-indexing13.C: New test.
	* g++.dg/cpp26/pack-indexing14.C: New test.

d6444794

s390: Add expander for uaddc/usubc optabs · 8a2d5bc2

Stefan Schulze Frielinghaus authored 2 months ago

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_emit_compare): Add mode
	parameter for the resulting RTX.
	* config/s390/s390.cc (s390_emit_compare): Ditto.
	(s390_emit_compare_and_swap): Change.
	(s390_expand_vec_strlen): Change.
	(s390_expand_cs_hqi): Change.
	(s390_expand_split_stack_prologue): Change.
	* config/s390/s390.md (*add<mode>3_carry1_cc): Renamed to ...
	(add<mode>3_carry1_cc): this and in order to use the
	corresponding gen function, encode CC mode into pattern.
	(*sub<mode>3_borrow_cc): Renamed to ...
	(sub<mode>3_borrow_cc): this and in order to use the
	corresponding gen function, encode CC mode into pattern.
	(*add<mode>3_alc_carry1_cc): Renamed to ...
	(add<mode>3_alc_carry1_cc): this and in order to use the
	corresponding gen function, encode CC mode into pattern.
	(sub<mode>3_slb_borrow1_cc): New.
	(uaddc<mode>5): New.
	(usubc<mode>5): New.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/uaddc-1.c: New test.
	* gcc.target/s390/uaddc-2.c: New test.
	* gcc.target/s390/uaddc-3.c: New test.
	* gcc.target/s390/usubc-1.c: New test.
	* gcc.target/s390/usubc-2.c: New test.
	* gcc.target/s390/usubc-3.c: New test.

8a2d5bc2

docs: Document new hardreg PRE pass · 016e2f00
Andrew Carlotti authored 3 months ago
```
gcc/ChangeLog:

	* doc/passes.texi: Document hardreg PRE pass.
```
016e2f00

Add new hardreg PRE pass · e7f98d96

Andrew Carlotti authored 5 months ago

This pass is used to optimise assignments to the FPMR register in
aarch64.  I chose to implement this as a middle-end pass because it
mostly reuses the existing RTL PRE code within gcse.cc.

Compared to RTL PRE, the key difference in this new pass is that we
insert new writes directly to the destination hardreg, instead of
writing to a new pseudo-register and copying the result later.  This
requires changes to the analysis portion of the pass, because sets
cannot be moved before existing instructions that set, use or clobber
the hardreg, and the value becomes unavailable after any uses or
clobbers of the hardreg.

Any uses of the hardreg in debug insns will be deleted.  We could do
better than this, but for the aarch64 fpmr I don't think we emit useful
debuginfo for deleted fp8 instructions anyway (and I don't even know if
it's possible to have a debug fpmr use when entering hardreg PRE).

gcc/ChangeLog:

	* config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
	* gcse.cc (doing_hardreg_pre_p): New global variable.
	(do_load_motion): New boolean check.
	(current_hardreg_regno): New global variable.
	(compute_local_properties): Unset transp for hardreg clobbers.
	(prune_hardreg_uses): New function.
	(want_to_gcse_p): Use different checks for hardreg PRE.
	(oprs_unchanged_p): Disable load motion for hardreg PRE pass.
	(hash_scan_set): For hardreg PRE, skip non-hardreg sets and
	check for hardreg clobbers.
	(record_last_mem_set_info): Skip for hardreg PRE.
	(compute_pre_data): Prune hardreg uses from transp bitmap.
	(pre_expr_reaches_here_p_work): Add sentence to comment.
	(insert_insn_start_basic_block): New functions.
	(pre_edge_insert): Don't add hardreg sets to predecessor block.
	(pre_delete): Use hardreg for the reaching reg.
	(reset_hardreg_debug_uses): New function.
	(pre_gcse): For hardreg PRE, reset debug uses and don't insert
	copies.
	(one_pre_gcse_pass): Disable load motion for hardreg PRE.
	(execute_hardreg_pre): New.
	(class pass_hardreg_pre): New.
	(pass_hardreg_pre::gate): New.
	(make_pass_hardreg_pre): New.
	* passes.def (pass_hardreg_pre): New pass.
	* tree-pass.h (make_pass_hardreg_pre): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/fpmr-1.c: New test.
	* gcc.target/aarch64/acle/fpmr-2.c: New test.
	* gcc.target/aarch64/acle/fpmr-3.c: New test.
	* gcc.target/aarch64/acle/fpmr-4.c: New test.

e7f98d96

Disable a broken multiversioning optimisation · 21212f08

Andrew Carlotti authored 2 months ago

This patch skips redirect_to_specific_clone for aarch64 and riscv,
because the optimisation has two flaws:

1. It checks the value of the "target" attribute, even on targets that
don't use this attribute for multiversioning.

2. The algorithm used is too aggressive, and will eliminate the
indirection in some cases where the runtime choice of callee version
can't be determined statically at compile time.  A correct
implementation would need to verify that:
 - if the current caller version were selected at runtime, then the
   chosen callee version would be eligible for selection.
 - if any higher priority callee version were selected at runtime, then
   a higher priority caller version would have been eligible for
   selection (and hence the current caller version wouldn't have been
   selected).

The current checks only verify a more restrictive version of the first
condition, and don't check the second condition at all.
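
The two conditions can be sketched as below, under a deliberately simple model that is not what multiple_target.cc does: each version is described by a bitmask of required features and a priority, and "version A being eligible implies version B is eligible" is approximated by B's requirements being a subset of A's. All names are hypothetical, and the second check is a conservative sufficient test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Version {
  std::uint32_t required;  // feature bits the version needs at runtime
  int priority;            // higher value is preferred by the resolver
};

// Any CPU that satisfies requirement set `a` also satisfies `b`.
constexpr bool implies(std::uint32_t a, std::uint32_t b) {
  return (a & b) == b;
}

bool redirect_is_safe(const std::vector<Version>& callers, std::size_t caller,
                      const std::vector<Version>& callees, std::size_t callee) {
  assert(caller < callers.size() && callee < callees.size());
  const Version& c = callers[caller];
  const Version& v = callees[callee];

  // Condition 1: if this caller version runs, the chosen callee is eligible.
  if (!implies(c.required, v.required))
    return false;

  // Condition 2: any higher-priority callee being selected must imply that
  // some higher-priority caller would have been eligible instead.
  for (const Version& h : callees) {
    if (h.priority <= v.priority)
      continue;
    bool covered = false;
    for (const Version& c2 : callers)
      if (c2.priority > c.priority && implies(h.required, c2.required))
        covered = true;
    if (!covered)
      return false;
  }
  return true;
}
```

With a single default caller and a callee that has both a default and a higher-priority feature-specific version, the second condition fails, which matches the case where the callee choice cannot be resolved statically.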

Fixing the optimisation properly would require implementing target hooks
to check for implications between version attributes, which is too
complicated for this stage.  However, I would like to see this hook
implemented in the future, since it could also help deduplicate other
multiversioning code.

Since this behaviour has existed for x86 and powerpc for a while, I
think it's best to preserve the existing behaviour on those targets,
unless any maintainer for those targets disagrees.

gcc/ChangeLog:

	* multiple_target.cc
	(redirect_to_specific_clone): Assert that "target" attribute is
	used for FMV before checking it.
	(ipa_target_clone): Skip redirect_to_specific_clone on some
	targets.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/mv-pragma.C: New test.

21212f08

docs: Add new AArch64 flags · abbe2905
Andrew Carlotti authored 4 months ago
```
gcc/ChangeLog:

	* doc/invoke.texi: Add new AArch64 flags.
```
abbe2905

aarch64: Add new +xs flag · f06c6f8b

Andrew Carlotti authored 7 months ago

GCC does not emit tlbi instructions, so this only affects the flags
passed through to the assembler.

gcc/ChangeLog:

	* config/aarch64/aarch64-arches.def (V8_7A): Add XS.
	* config/aarch64/aarch64-option-extensions.def (XS): New flag.

f06c6f8b

aarch64: Add new +wfxt flag · 4984119b

Andrew Carlotti authored 7 months ago

GCC does not currently emit the wfet or wfit instructions, so this
primarily affects the flags passed through to the assembler.

gcc/ChangeLog:

	* config/aarch64/aarch64-arches.def (V8_7A): Add WFXT.
	* config/aarch64/aarch64-option-extensions.def (WFXT): New flag.

4984119b

aarch64: Add new +rcpc2 flag · 5747c121

Andrew Carlotti authored 7 months ago

gcc/ChangeLog:

	* config/aarch64/aarch64-arches.def (V8_4A): Add RCPC2.
	* config/aarch64/aarch64-option-extensions.def
	(RCPC2): New flag.
	(RCPC3): Add RCPC2 dependency.
	* config/aarch64/aarch64.h (TARGET_RCPC2): Use new flag.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/cpunative/native_cpu_21.c: Add rcpc2 to
	expected feature string instead of rcpc.
	* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

5747c121

aarch64: Add new +flagm2 flag · f5915726

Andrew Carlotti authored 7 months ago

GCC does not currently emit the axflag or xaflag instructions, so this
primarily affects the flags passed through to the assembler.

gcc/ChangeLog:

	* config/aarch64/aarch64-arches.def (V8_5A): Add FLAGM2.
	* config/aarch64/aarch64-option-extensions.def (FLAGM2): New flag.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/cpunative/native_cpu_21.c: Add flagm2 to
	expected feature string instead of flagm.
	* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

f5915726

aarch64: Add new +frintts flag · 32a45a21

Andrew Carlotti authored 7 months ago

gcc/ChangeLog:

	* config/aarch64/aarch64-arches.def (V8_5A): Add FRINTTS.
	* config/aarch64/aarch64-option-extensions.def (FRINTTS): New flag.
	* config/aarch64/aarch64.h (TARGET_FRINT): Use new flag.
	* config/aarch64/arm_acle.h: Use new flag for frintts intrinsics.
	* config/aarch64/arm_neon.h: Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/cpunative/native_cpu_21.c: Add frintts to
	expected feature string.
	* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

32a45a21

aarch64: Add new +jscvt flag · 2c891357

Andrew Carlotti authored 7 months ago

gcc/ChangeLog:

	* config/aarch64/aarch64-arches.def (V8_3A): Add JSCVT.
	* config/aarch64/aarch64-option-extensions.def (JSCVT): New flag.
	* config/aarch64/aarch64.h (TARGET_JSCVT): Use new flag.
	* config/aarch64/arm_acle.h: Use new flag for jscvt intrinsics.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/cpunative/native_cpu_21.c: Add jscvt to
	expected feature string.
	* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

2c891357

aarch64: Add new +fcma flag · 9bbb91e8

Andrew Carlotti authored 7 months ago

This includes +fcma as a dependency of +sve, and means that we can
finally support fcma intrinsics on a64fx.

Also add fcma to the Features list in several cpunative testcases that
incorrectly included sve without fcma.

gcc/ChangeLog:

	* config/aarch64/aarch64-arches.def (V8_3A): Add FCMA.
	* config/aarch64/aarch64-option-extensions.def (FCMA): New flag.
	(SVE): Add FCMA dependency.
	* config/aarch64/aarch64.h (TARGET_COMPLEX): Use new flag.
	* config/aarch64/arm_neon.h: Use new flag for fcma intrinsics.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/cpunative/info_15: Add fcma to Features.
	* gcc.target/aarch64/cpunative/info_16: Ditto.
	* gcc.target/aarch64/cpunative/info_17: Ditto.
	* gcc.target/aarch64/cpunative/info_8: Ditto.
	* gcc.target/aarch64/cpunative/info_9: Ditto.

9bbb91e8

aarch64: Use PAUTH instead of V8_3A in some places · 20385cb9

Andrew Carlotti authored 7 months ago

gcc/ChangeLog:

	* config/aarch64/aarch64.cc
	(aarch64_expand_epilogue): Use TARGET_PAUTH.
	* config/aarch64/aarch64.md: Update comment.

20385cb9

c: Fix up expr location for __builtin_stdc_rotate_* [PR118376] · 76b7f60f

Jakub Jelinek authored 2 months ago

Seems I forgot to set_c_expr_source_range for the __builtin_stdc_rotate_*
case (the other __builtin_stdc_* cases already have it), which means
the locations in expr are uninitialized, sometimes causing ICEs in linemap
code, at other times just valgrind errors about uninitialized var uses.

2025-01-10  Jakub Jelinek  <jakub@redhat.com>

	PR c/118376
	* c-parser.cc (c_parser_postfix_expression): Call
	set_c_expr_source_range before break in the __builtin_stdc_rotate_*
	case.

	* gcc.dg/pr118376.c: New test.

76b7f60f