Commits · ddeb70548c81f5dba91f281290584698897151d8 · COBOLworx / gcc-cobol

Mar 05, 2025

openmp, c++: Fix up OpenMP/OpenACC handling in C++ modules [PR119102] · ddeb7054

Jakub Jelinek authored 1 week ago

module.cc apparently has support for extensions and attempts to ensure
that if a module is compiled with those extensions enabled, sources which
use the module are compiled with the same extensions.
The only extension supported is SE_OPENMP right now.
And the use of the extension is keyed on streaming out or in OMP_CLAUSE
tree.
This is undesirable for several reasons.
OMP_CLAUSE is the only tree which can appear in the IL even without
-fopenmp/-fopenmp-simd/-fopenacc (when simd ("notinbranch") or
simd ("inbranch") attributes are used), and it can appear also in all
the 3 modes mentioned above.  On the other side, with the exception of
arguments of attributes added e.g. for declare simd where no harm should
be done if -fopenmp/-fopenmp-simd isn't enabled later on, OMP_CLAUSE appears
in OMP_*_CLAUSES of OpenMP/OpenACC construct trees.  And those construct
trees often have no clauses at all, so keying the extension on OMP_CLAUSE
doesn't catch many cases that should be caught.
Furthermore, for OpenMP we have 2 modes, -fopenmp-simd which parses some
OpenMP but constructs from that mostly OMP_SIMD and a few other cases,
and -fopenmp which includes that and far more on top of that; and there is
also -fopenacc.

So, this patch stops setting/requesting the extension on OMP_CLAUSE,
introduces 3 extensions rather than one (SE_OPENMP_SIMD, SE_OPENMP and
SE_OPENACC) and keys those on OpenMP constructs from the -fopenmp-simd
subset, other OpenMP constructs and OpenACC constructs.
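
The keying scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the enumerator names, their values, and the helpers `ext_for_construct`, `note_streamed` and `reader_accepts` are hypothetical stand-ins, not the definitions in gcc/cp/module.cc. The point is that the extension bit is derived from the kind of construct, so a construct with no clauses is still caught.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extension bits; the real values in module.cc are not
   taken from the patch.  */
enum streamed_ext {
  EXT_OPENMP_SIMD = 1 << 0,
  EXT_OPENMP      = 1 << 1,
  EXT_OPENACC     = 1 << 2
};

/* Hypothetical construct families, mirroring the three CASE_* groups
   named in the ChangeLog.  */
enum construct_kind { KIND_OTHER, KIND_OMP_SIMD, KIND_OMP, KIND_OACC };

/* Map a construct family to the extension it requires.  */
unsigned ext_for_construct(enum construct_kind k)
{
  switch (k) {
  case KIND_OMP_SIMD: return EXT_OPENMP_SIMD;
  case KIND_OMP:      return EXT_OPENMP;
  case KIND_OACC:     return EXT_OPENACC;
  default:            return 0;
  }
}

/* Writer side: record the extension whenever a construct is streamed,
   whether or not it carries any clauses.  */
unsigned note_streamed(unsigned used, enum construct_kind k)
{
  return used | ext_for_construct(k);
}

/* Reader side: accept only if every extension used by the module is
   enabled in the importing compilation.  */
bool reader_accepts(unsigned used, unsigned enabled)
{
  return (used & ~enabled) == 0;
}

/* Small self-check showing the intended use.  */
void check_extension_tracking(void)
{
  unsigned used = note_streamed(0u, KIND_OMP_SIMD);
  assert(reader_accepts(used, EXT_OPENMP_SIMD | EXT_OPENMP));
  assert(!reader_accepts(used, EXT_OPENACC));
}
```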

2025-03-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/119102
gcc/cp/
	* module.cc (enum streamed_extensions): Add SE_OPENMP_SIMD
	and SE_OPENACC, change value of SE_OPENMP and SE_BITS.
	(CASE_OMP_SIMD_CODE, CASE_OMP_CODE, CASE_OACC_CODE): Define.
	(trees_out::start): Don't set SE_OPENMP extension for OMP_CLAUSE.
	Set SE_OPENMP_SIMD extension for CASE_OMP_SIMD_CODE, SE_OPENMP
	for CASE_OMP_CODE and SE_OPENACC for CASE_OACC_CODE.
	(trees_in::start): Don't fail for OMP_CLAUSE with missing
	SE_OPENMP extension.  Do fail for CASE_OMP_SIMD_CODE and missing
	SE_OPENMP_SIMD extension, or CASE_OMP_CODE and missing SE_OPENMP
	extension, or CASE_OACC_CODE and missing SE_OPENACC extension.
	(module_state::write_readme): Write all of SE_OPENMP_SIMD, SE_OPENMP
	and SE_OPENACC extensions.
	(module_state::read_config): Diagnose missing -fopenmp, -fopenmp-simd
	and/or -fopenacc depending on extensions used.
gcc/testsuite/
	* g++.dg/modules/pr119102_a.H: New test.
	* g++.dg/modules/pr119102_b.C: New test.
	* g++.dg/modules/omp-3_a.C: New test.
	* g++.dg/modules/omp-3_b.C: New test.
	* g++.dg/modules/omp-3_c.C: New test.
	* g++.dg/modules/omp-3_d.C: New test.
	* g++.dg/modules/oacc-1_a.C: New test.
	* g++.dg/modules/oacc-1_b.C: New test.
	* g++.dg/modules/oacc-1_c.C: New test.

ddeb7054

c++: Fix a comment typo · b85b405e

Jakub Jelinek authored 1 week ago

During the 118874 coro investigation I found a typo in a comment.

Fixed thusly.

2025-03-05  Jakub Jelinek  <jakub@redhat.com>

	* typeck.cc (check_return_expr): Fix comment typo, rom -> from.

b85b405e

c++: Apply/diagnose attributes when instantiating ARRAY/POINTER/REFERENCE_TYPE [PR118787] · 1853b02d

Jakub Jelinek authored 1 week ago

The following testcase, IMO in violation of the P2552R3 paper, doesn't
pedwarn on alignas applying to dependent types or alignas with a dependent
argument.

tsubst was just ignoring TYPE_ATTRIBUTES.

The following patch fixes it for the POINTER/REFERENCE_TYPE and
ARRAY_TYPE cases, but perhaps we need to do the same also for other
types (INTEGER_TYPE/REAL_TYPE and the like).  I guess I'll need to
construct more testcases.

2025-03-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/118787
	* pt.cc (tsubst) <case ARRAY_TYPE>: Use return t; only if it doesn't
	have any TYPE_ATTRIBUTES.  Call apply_late_template_attributes.
	<case POINTER_TYPE, case REFERENCE_TYPE>: Likewise.  Formatting fix.

	* g++.dg/cpp0x/alignas22.C: New test.

1853b02d

LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084] · 4856292f

Xi Ruoyao authored 2 weeks ago

They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression.  The incorrect reorder has been observed in openh264 LTO
build.

Expand them to a (mem) expression instead of unspec to fix the issue.
Then we need to make loongarch_address_insns return 1 for
ADDRESS_REG_REG because the constraint "R" expects this behavior, or
the vldx instruction will be considered invalid by the register
allocation pass and turned into add.d + vld.  Apply the ADDRESS_REG_REG
penalty in loongarch_address_cost instead; loongarch_rtx_costs should
then also call loongarch_address_cost instead of loongarch_address_insns.

Closes: https://github.com/cisco/openh264/issues/3857

gcc/ChangeLog:

	PR target/119084
	* config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove.
	(lasx_xvldx): Remove.
	* config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove.
	(lsx_vldx): Remove.
	* config/loongarch/simd.md (QIVEC): New define_mode_iterator.
	(<simd_isa>_<x>vldx): New define_expand.
	* config/loongarch/loongarch.cc (loongarch_address_insns_1): New
	static function with most logic factored out from ...
	(loongarch_address_insns): ... here.  Call
	loongarch_address_insns_1 with reg_reg_cost = 1.
	(loongarch_address_cost): Call loongarch_address_insns_1 with
	reg_reg_cost = la_addr_reg_reg_cost.

gcc/testsuite/ChangeLog:

	PR target/119084
	* gcc.target/loongarch/pr119084.c: New test.

4856292f

Daily bump. · c49ef76d
GCC Administrator authored 1 week ago

c49ef76d

Mar 04, 2025

c++: C++23 range-for temps and ?: [PR119073] · f2a7f845

Jason Merrill authored 1 week ago

Here gimplification got confused because extend_temps_r messed up the types
of the arms of a COND_EXPR.

	PR c++/119073

gcc/cp/ChangeLog:

	* call.cc (extend_temps_r): Preserve types of COND_EXPR arms.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/range-for39.C: New test.

f2a7f845

libgo: bump libgo version for GCC 15 release · 8d776294
Ian Lance Taylor authored 1 week ago
```
For PR go/119098

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/654477
```
8d776294

C prototypes for external arguments; add warning for mismatch. · 21ca9153

Thomas Koenig authored 1 week ago

The problem was that we were not handling external dummy arguments
with -fc-prototypes-external. In looking at this, I found that we
were not warning about external procedures with different argument
lists.  This can actually be legal (see the two test cases) but
creates a problem for the C prototypes: If we have something like

subroutine foo(a,n)
  external a
  if (n == 1) call a(1)
  if (n == 2) call a(2,3)
end subroutine foo

then, pre-C23, we could just have written out the prototype as

void foo_ (void (*a) (), int *n);

but this is illegal in C23. What to do?  I finally chose to warn
about the argument mismatch, with a new option. Warn only because the
code above is legal, but include in -Wall because such code seems highly
suspect.  This option is also implied in -fc-prototypes-external. I also
put a warning in the generated header file in that case, so users
have a chance to see what is going on (especially since gcc now
defaults to C23).

gcc/fortran/ChangeLog:

	PR fortran/119049
	PR fortran/119074
	* dump-parse-tree.cc (seen_conflict): New static variable.
	(gfc_dump_external_c_prototypes): Initialize it. If it was
	set, write out a warning that -std=c23 will not work.
	(write_proc): Move the work of actually writing out the
	formal arglist to...
	(write_formal_arglist): New function. Handle external dummy
	parameters and their argument lists. If there were mismatched
	arguments, output an empty argument list in pre-C23 style.
	* gfortran.h (struct gfc_symbol): Add ext_dummy_arglist_mismatch
	flag and formal_at.
	* invoke.texi: Document -Wexternal-argument-mismatch.
	* lang.opt: Put it in.
	* resolve.cc (resolve_function): If warning about external
	argument mismatches, build a formal from actual arglist the
	first time around, and later compare and warn.
	(resolve_call): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/119049
	PR fortran/119074
	* gfortran.dg/interface_55.f90: New test.
	* gfortran.dg/interface_56.f90: New test.

21ca9153

AVR: Add texi @subsubsection "AVR Optimization Options". · 9ee39fcb
Georg-Johann Lay authored 1 week ago
```
gcc/
	* doc/invoke.texi (AVR Optimization Options): New @subsubsection
	for pure optimization options.
```
9ee39fcb

testsuite: arm: Use effective-target for pr68674.c test · 879fd9c8

Torbjörn SVENSSON authored 4 months ago


gcc/testsuite/ChangeLog:

	* gcc.target/arm/pr68674.c: Use effective-target arm_arch_v7a
	and arm_libc_fp_abi.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

879fd9c8

__builtin_bswapXX: improve docs · 5452b50a
Oscar Gustafsson authored 1 week ago
```
gcc/ChangeLog:

	* doc/extend.texi: Improve example for __builtin_bswap16.
```
5452b50a

Break false dependency chain on Zen5 · 8c4a00f9

Jan Hubicka authored 1 week ago

Zen5 on some variants has false dependency on tzcnt, blsi, blsr and blsmsk
instructions.  Those can be tested by the following benchmark

jh@shroud:~> cat ee.c
int
main()
{
       int a = 10;
       int b = 0;
       for (int i = 0; i < 1000000000; i++)
       {
#ifdef BREAK
               asm volatile ("xor %0, %0": "=r" (b));
#endif
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
               asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
       }
       return 0;
}
jh@shroud:~> cat bmk.sh
gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 ; time ./a.out
jh@shroud:~> sh bmk.sh tzcnt

real    0m0.886s
user    0m0.886s
sys     0m0.000s

real    0m0.886s
user    0m0.886s
sys     0m0.000s

jh@shroud:~> sh bmk.sh blsi

real    0m0.979s
user    0m0.979s
sys     0m0.000s

real    0m2.418s
user    0m2.418s
sys     0m0.000s

jh@shroud:~> sh bmk.sh blsr

real    0m0.986s
user    0m0.986s
sys     0m0.000s

real    0m2.422s
user    0m2.421s
sys     0m0.000s
jh@shroud:~> sh bmk.sh blsmsk

real    0m0.973s
user    0m0.973s
sys     0m0.000s

real    0m2.422s
user    0m2.422s
sys     0m0.000s

We already have a tunable that controls tzcnt together with lzcnt and popcnt.
Since it seems that only tzcnt is affected, I added a new tunable to control tzcnt
only.  I also added splitters for blsi/blsr/blsmsk, implemented analogously to the
existing splitter for lzcnt.
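
For readers unfamiliar with the instructions named above, here is a portable C sketch of what each one computes. The function names are hypothetical helpers chosen for illustration, not compiler intrinsics. Each result depends only on the source operand, which is why a dependency on the previous value of the destination register is a false one.

```c
#include <assert.h>
#include <stdint.h>

/* Isolate the lowest set bit: x & -x.  */
uint32_t blsi_model(uint32_t x)   { return x & (0u - x); }

/* Reset (clear) the lowest set bit: x & (x - 1).  */
uint32_t blsr_model(uint32_t x)   { return x & (x - 1u); }

/* Mask up to and including the lowest set bit: x ^ (x - 1).  */
uint32_t blsmsk_model(uint32_t x) { return x ^ (x - 1u); }

/* Count trailing zero bits; returns 32 for a zero input.  */
uint32_t tzcnt_model(uint32_t x)
{
  uint32_t n = 0;
  while (n < 32u && ((x >> n) & 1u) == 0u)
    n++;
  return n;
}

/* Worked example with the same input value the benchmark uses (a = 10).  */
void check_bmi_models(void)
{
  assert(blsi_model(10u) == 2u);    /* 0b1010 -> 0b0010 */
  assert(blsr_model(10u) == 8u);    /* 0b1010 -> 0b1000 */
  assert(blsmsk_model(10u) == 3u);  /* 0b1010 -> 0b0011 */
  assert(tzcnt_model(10u) == 1u);
}
```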

The patch is neutral on SPEC. We produce blsi and blsr in some internal loops, but
they usually have the same destination as source. However, it is good to break the
dependency chain to avoid pathological cases, and it is quite cheap overall, so I
think we want to enable this for generic.  I will send a followup patch for this.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro.
	(TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro.
	* config/i386/i386.md (*bmi_blsi_<mode>): Add splitter for false
	dependency.
	(*bmi_blsi_<mode>_ccno): Add splitter for false dependency.
	(*bmi_blsi_<mode>_falsedep): New pattern.
	(*bmi_blsmsk_<mode>): Add splitter for false dependency.
	(*bmi_blsmsk_<mode>_falsedep): New pattern.
	(*bmi_blsr_<mode>): Add splitter for false dependency.
	(*bmi_blsr_<mode>_cmp): Add splitter for false dependency.
	(*bmi_blsr_<mode>_cmp_falsedep): New pattern.
	* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New tune.
	(X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/blsi.c: New test.
	* gcc.target/i386/blsmsk.c: New test.
	* gcc.target/i386/blsr.c: New test.

8c4a00f9

Fortran: Fix gimplification error on assignment to pointer [PR103391] · 04909c7e

Andre Vehreschild authored 1 week ago

	PR fortran/103391

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_trans_assignment_1): Do not use poly assign
	for pointer arrays on lhs (as it is done for allocatables
	already).

gcc/testsuite/ChangeLog:

	* gfortran.dg/assign_12.f90: New test.

04909c7e

Make ix86_macro_fusion_pair_p and ix86_fuse_mov_alu_p match current CPUs · c84be624

Jan Hubicka authored 1 week ago

The current implementation of fusion predicates misses some common
fusion cases on zen and more recent cores.  I added knobs for the
individual conditionals we test.

 1) I split checks for fusing ALU with conditional operands when the ALU
    has a memory operand.  This seems to be supported by zen3+ and by
    tigerlake and cooperlake (according to Agner Fog's manual).

 2) znver4 and 5 support fusion of ALU and conditional even if the ALU has
    memory and immediate operands.
    This seems to be relatively important, enabling 25% more fusions on
    gcc bootstrap.

 3) No CPU supports fusing when the ALU contains IP-relative memory
    references.  I added a separate knob so we do not forget about this if
    this gets supported later.

The patch does not solve the limitation of sched that fuse pairs must be
adjacent on input and the first operation must be single-set.  Fixing
single-set is easy (I have a separate patch for this); for non-adjacent
pairs we need bigger surgery.

To verify what the CPU really does I made a simple test script.

jh@ryzen3:~> cat fuse-test.c
        int b;
        const int z = 0;
        const int o = 1;
        int
main()
{
        int a = 1000000000;
        int b;
        int z = 0;
        int o = 1;
        asm volatile ("\n"
".L1234:\n"
        "nop\n"
        "subl   %3, %0\n"

        "movl %0, %1\n"
        "cmpl     %2, %1\n"
        "movl %0, %1\n"
        "test %1, %1\n"

        "nop\n"
        "jne    .L1234":"=a"(a),
        "=m"(b)
        "=r"(b)
        :
        "m"(z),
        "m"(o),
        "i"(0),
        "i"(1),
        "0"(a)
                );
}
jh@ryzen3:~> cat fuse-test.sh
EVENT=ex_ret_fused_instr
dotest()
{
gcc -O2  fuse-test.c $* -o fuse-cmp-imm-mem-nofuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-nofuse  2>&1 | grep $EVENT
gcc -O2 fuse-test.c -DFUSE $* -o fuse-cmp-imm-mem-fuse
perf stat  -e $EVENT ./fuse-cmp-imm-mem-fuse 2>&1 | grep $EVENT
}

echo ALU with immediate
dotest
echo ALU with memory
dotest -D MEM
echo ALU with IP relative memory
dotest -D MEM -D IPRELATIVE
echo CMP with immediate
dotest -D CMP
echo CMP with memory
dotest -D CMP -D MEM
echo CMP with memory and immediate
dotest -D CMP -D MEMIMM
echo CMP with IP relative memory
dotest -D CMP -D MEM -D IPRELATIVE
echo TEST
dotest -D TEST

On zen5 I get:
ALU with immediate
            20,345      ex_ret_fused_instr:u
     1,000,020,278      ex_ret_fused_instr:u
ALU with memory
            20,367      ex_ret_fused_instr:u
     1,000,020,290      ex_ret_fused_instr:u
ALU with IP relative memory
            20,395      ex_ret_fused_instr:u
            20,403      ex_ret_fused_instr:u
CMP with immediate
            20,369      ex_ret_fused_instr:u
     1,000,020,301      ex_ret_fused_instr:u
CMP with memory
            20,314      ex_ret_fused_instr:u
     1,000,020,341      ex_ret_fused_instr:u
CMP with memory and immediate
            20,372      ex_ret_fused_instr:u
     1,000,020,266      ex_ret_fused_instr:u
CMP with IP relative memory
            20,382      ex_ret_fused_instr:u
            20,369      ex_ret_fused_instr:u
TEST
            20,346      ex_ret_fused_instr:u
     1,000,020,301      ex_ret_fused_instr:u

IP relative memory seems to not be documented.

On zen3/4 I get:

ALU with immediate
            20,263      ex_ret_fused_instr:u
     1,000,020,051      ex_ret_fused_instr:u
ALU with memory
            20,255      ex_ret_fused_instr:u
     1,000,020,056      ex_ret_fused_instr:u
ALU with IP relative memory
            20,253      ex_ret_fused_instr:u
            20,266      ex_ret_fused_instr:u
CMP with immediate
            20,264      ex_ret_fused_instr:u
     1,000,020,052      ex_ret_fused_instr:u
CMP with memory
            20,253      ex_ret_fused_instr:u
     1,000,019,794      ex_ret_fused_instr:u
CMP with memory and immediate
            20,260      ex_ret_fused_instr:u
            20,264      ex_ret_fused_instr:u
CMP with IP relative memory
            20,258      ex_ret_fused_instr:u
            20,256      ex_ret_fused_instr:u
TEST
            20,261      ex_ret_fused_instr:u
     1,000,020,048      ex_ret_fused_instr:u

zen1 and 2 get:

ALU with immediate
            21,610      ex_ret_fus_brnch_inst:u
            21,697      ex_ret_fus_brnch_inst:u
ALU with memory
            21,479      ex_ret_fus_brnch_inst:u
            21,747      ex_ret_fus_brnch_inst:u
ALU with IP relative memory
            21,623      ex_ret_fus_brnch_inst:u
            21,684      ex_ret_fus_brnch_inst:u
CMP with immediate
            21,708      ex_ret_fus_brnch_inst:u
     1,000,021,288      ex_ret_fus_brnch_inst:u
CMP with memory
            21,689      ex_ret_fus_brnch_inst:u
     1,000,004,270      ex_ret_fus_brnch_inst:u
CMP with memory and immediate
            21,604      ex_ret_fus_brnch_inst:u
            21,671      ex_ret_fus_brnch_inst:u
CMP with IP relative memory
            21,589      ex_ret_fus_brnch_inst:u
            21,602      ex_ret_fus_brnch_inst:u
TEST
            21,600      ex_ret_fus_brnch_inst:u
     1,000,021,233      ex_ret_fus_brnch_inst:u

I tested the patch on zen3 and zen5 with spec2k17 and it seems neutral; however,
the number of fusions does go up.

Bootstrapped/regtested x86_64-linux, I plan to commit it tomorrow.

Honza

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_FUSE_ALU_AND_BRANCH_MEM): New macro.
	(TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM): New macro.
	(TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New macro.
	* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Support
	non-single-set.
	(ix86_macro_fusion_pair_p): Allow ALU which only clobbers;
	be more careful about immediates; check TARGET_FUSE_ALU_AND_BRANCH_MEM,
	TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM, TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE;
	verify that we never use unsigned checks with inc/dec.
	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM_IMM): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New tune.

c84be624

c++: ICE with RANGE_EXPR and array init [PR109431] · 173cf7c9

Marek Polacek authored 2 weeks ago


We crash because we generate

  {[0 ... 1]={.low=0, .high=1}, [1]={.low=0, .high=1}}

which output_constructor_regular_field doesn't want to see.  This
happens since r9-1483: process_init_constructor_array can now create
a RANGE_EXPR.  But the bug isn't in that patch; the problem is that
build_vec_init doesn't handle RANGE_EXPRs.

build_vec_init has a FOR_EACH_CONSTRUCTOR_ELT loop which populates
const_vec.  In this case it loops over the elements of

  {[0 ... 1]={.low=0, .high=1}}

but assumes that each element initializes one element.  So after the
loop num_initialized_elts was 1, and then below:

              HOST_WIDE_INT last = tree_to_shwi (maxindex);
              if (num_initialized_elts <= last)
                {
                  tree field = size_int (num_initialized_elts);
                  if (num_initialized_elts != last)
                    field = build2 (RANGE_EXPR, sizetype, field,
                                    size_int (last));
                  CONSTRUCTOR_APPEND_ELT (const_vec, field, e);
                }

we added the extra initializer.

It seemed convenient to use range_expr_nelts like below.
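
The counting problem can be illustrated with a small stand-alone sketch. The `struct ctor_index` type and both counting helpers below are hypothetical and only model the idea; they are not GCC's CONSTRUCTOR representation or the real range_expr_nelts. For the initializer `{[0 ... 1]=...}` with maxindex 1, counting one per entry gives 1, so the `num_initialized_elts <= last` test wrongly fires; counting the range width gives 2 and it does not.

```c
#include <assert.h>

/* Hypothetical model of a constructor index: a single index has
   lo == hi, a range [lo ... hi] has lo < hi.  */
struct ctor_index { long lo; long hi; };

/* Number of array elements one entry initializes.  */
long index_nelts(struct ctor_index idx)
{
  return idx.hi - idx.lo + 1;
}

/* Old assumption: every entry initializes exactly one element.  */
long count_one_per_entry(const struct ctor_index *elts, int n)
{
  (void) elts;
  return n;
}

/* Range-aware count: sum the width of every entry.  */
long count_range_aware(const struct ctor_index *elts, int n)
{
  long total = 0;
  for (int i = 0; i < n; i++)
    total += index_nelts(elts[i]);
  return total;
}

/* The case from the commit message: one entry covering [0 ... 1].  */
void check_range_counting(void)
{
  struct ctor_index elts[] = { { 0, 1 } };
  long last = 1;  /* maxindex of a two-element array */
  assert(count_one_per_entry(elts, 1) <= last);   /* extra initializer added */
  assert(!(count_range_aware(elts, 1) <= last));  /* nothing left to add */
}
```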

	PR c++/109431

gcc/cp/ChangeLog:

	* cp-tree.h (range_expr_nelts): Declare.
	* init.cc (build_vec_init): If the CONSTRUCTOR's index is a
	RANGE_EXPR, use range_expr_nelts to count how many elements
	were initialized.

gcc/testsuite/ChangeLog:

	* g++.dg/init/array67.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

173cf7c9

aarch64: force operand to fresh register to avoid subreg issues [PR118892] · d883f323

Tamar Christina authored 1 week ago

When the input is already a subreg and we try to make a paradoxical
subreg out of it for copysign this can fail if it violates the subreg
relationship.

Use force_lowpart_subreg instead of lowpart_subreg to then force the
results to a register instead of ICEing.

gcc/ChangeLog:

	PR target/118892
	* config/aarch64/aarch64.md (copysign<GPF:mode>3): Use
	force_lowpart_subreg instead of lowpart_subreg.

gcc/testsuite/ChangeLog:

	PR target/118892
	* gcc.target/aarch64/copysign-pr118892.c: New test.

d883f323

libstdc++: Remove stray comma in testing docs · ac16d6d7

Jonathan Wakely authored 1 week ago

libstdc++-v3/ChangeLog:

	* doc/xml/manual/test.xml: Remove stray comma.
	* doc/html/manual/test.html: Regenerate.

ac16d6d7

Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976] · 78380fd7

Richard Sandiford authored 1 week ago

There was an embarrassing typo in the folding of BIT_NOT_EXPR for
POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
how that happened, but it might have been due to the way that
~x is implemented as -1 - x internally.
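
As a quick illustration of the identity mentioned above, on plain integers rather than poly_ints, the following sketch checks that `~x` equals `-1 - x` and that it differs from `-x`. The helper name is hypothetical.

```c
#include <assert.h>

/* Bitwise NOT written as the arithmetic identity -1 - x.  */
long bit_not_via_sub(long x)
{
  return -1 - x;
}

void check_bit_not_identity(void)
{
  assert(bit_not_via_sub(5) == ~5L);   /* both are -6 */
  assert(bit_not_via_sub(0) == ~0L);   /* both are -1 */
  assert(-5L != ~5L);                  /* negation is a different operation */
}
```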

gcc/
	PR tree-optimization/118976
	* fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR.
	* config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function.
	(aarch64_run_selftests): Run it.

78380fd7

simplify-rtx: Fix up simplify_logical_relational_operation [PR119002] · 1ff01a88

Richard Sandiford authored 1 week ago


The following testcase is miscompiled on powerpc64le-linux starting with
r15-6777.  During combine we see:

(set (reg:SI 134)
    (ior:SI (ge:SI (reg:CCFP 128)
            (const_int 0 [0]))
        (lt:SI (reg:CCFP 128)
            (const_int 0 [0]))))

The simplify_logical_relational_operation code (in its current form)
was written with arithmetic rather than CC modes in mind.  Since CCFP
is a CC mode, it fails the HONOR_NANS check, and so the function assumes
that ge | lt => true.
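
A small C illustration of why that fold is wrong for floating-point comparisons, assuming default IEEE semantics (no -ffast-math): when either operand is a NaN, both `>=` and `<` are false, so their OR is false. The helper name is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* The pattern from the RTL above, written on doubles.  */
int ge_or_lt(double a, double b)
{
  return (a >= b) | (a < b);
}

void check_nan_case(void)
{
  assert(ge_or_lt(1.0, 2.0) == 1);  /* ordered: lt holds */
  assert(ge_or_lt(2.0, 1.0) == 1);  /* ordered: ge holds */
  assert(ge_or_lt(NAN, 0.0) == 0);  /* unordered: neither holds */
}
```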

If one comparison is unsigned then it should be safe to assume that
the other comparison is also unsigned, even for CC modes, since the
optimisation checks that the comparisons are between the same operands.
For the other cases, we can only safely fold comparisons of CC mode
values if the result is always-true (15) or always-false (0).

It turns out that the original testcase for PR117186, which ran at -O,
was relying on the old behaviour for some of the functions.  It needs
4-instruction combinations, and so -fexpensive-optimizations, to pass
in its intended form.

gcc/
	PR rtl-optimization/119002
	* simplify-rtx.cc
	(simplify_context::simplify_logical_relational_operation): Handle
	comparisons between CC values.  If there is no evidence that the
	CC values are unsigned, restrict the fold to always-true or
	always-false results.

gcc/testsuite/
	* gcc.c-torture/execute/ieee/pr119002.c: New test.
	* gcc.target/aarch64/pr117186.c: Run at -O2 rather than -O.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>

1ff01a88

testsuite: Add tests for already fixed PR [PR119071] · ccf9db9a

Jakub Jelinek authored 1 week ago

Uros' r15-7793 fixed this PR as well; I'm just committing tests
from the PR so that it can be closed.

2025-03-04  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/119071
	* gcc.dg/pr119071.c: New test.
	* gcc.c-torture/execute/pr119071.c: New test.

ccf9db9a

Fortran: Prevent ICE when getting caf-token from abstract type [PR77872] · 5bd66483

Andre Vehreschild authored 1 week ago

	PR fortran/77872

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_get_tree_for_caf_expr): Pick up token from
	decl when it is present there for class types.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/class_1.f90: New test.

5bd66483

Fortran: Reduce code complexity [PR77872] · ef605e10

Andre Vehreschild authored 1 week ago

	PR fortran/77872

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_procedure_call): Use attr instead of
	doing type check and branching for BT_CLASS.

ef605e10

tree-optimization/119096 - bogus conditional reduction vectorization · 10e4107d

Richard Biener authored 1 week ago

When we vectorize a .COND_ADD reduction and apply the single-use-def
cycle optimization we can end up choosing the wrong else value for
subsequent .COND_ADD.  The following rectifies this.

	PR tree-optimization/119096
	* tree-vect-loop.cc (vect_transform_reduction): Use the
	correct else value for .COND_fn.

	* gcc.dg/vect/pr119096.c: New testcase.

10e4107d

RISC-V: Fix the test case bug-3.c failure · bfb9276f

Pan Li authored 1 week ago


The bug-3.c test checks for slli a[0-9]+, a[0-9]+, 33 to exercise the
big poly int handling.  But the underlying insn may change to slli 1
+ slli 32 under some optimizations.  Thus, update the asm check to a
function body check with the above slli 1 + slli 32 series.
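
The two instruction sequences compute the same value, which is why either is acceptable output. A minimal C check of that equivalence on 64-bit values, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Single shift by 33, as in "slli 33".  */
uint64_t shl_33(uint64_t x)
{
  return x << 33;
}

/* Shift by 1 and then by 32, as in "slli 1" followed by "slli 32".  */
uint64_t shl_1_then_32(uint64_t x)
{
  return (x << 1) << 32;
}

void check_shift_split(void)
{
  assert(shl_33(1u) == shl_1_then_32(1u));
  assert(shl_33(UINT64_C(0x123456789)) == shl_1_then_32(UINT64_C(0x123456789)));
}
```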

The following test suite passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/bug-3.c: Update asm check to
	function body check.

Signed-off-by: Pan Li <pan2.li@intel.com>

bfb9276f

Daily bump. · 491c0b80
GCC Administrator authored 1 week ago

491c0b80

Mar 03, 2025

Update .po files · 6fdc64ed

Joseph Myers authored 1 week ago

gcc/po/
	* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po,
	ja.po, ka.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po,
	zh_CN.po, zh_TW.po: Update.

libcpp/po/
	* be.po, ca.po, da.po, de.po, el.po, eo.po, es.po, fi.po, fr.po,
	id.po, ja.po, ka.po, nl.po, pt_BR.po, ro.po, ru.po, sr.po, sv.po,
	tr.po, uk.po, vi.po, zh_CN.po, zh_TW.po: Update.

6fdc64ed

Fortran: reject empty derived type with bind(C) attribute [PR101577] · f9f16b9f

Harald Anlauf authored 1 week ago

	PR fortran/101577

gcc/fortran/ChangeLog:

	* symbol.cc (verify_bind_c_derived_type): Generate error message
	for derived type with no components in standard conformance mode,
	indicating that this is a GNU extension.

gcc/testsuite/ChangeLog:

	* gfortran.dg/empty_derived_type.f90: Adjust dg-options.
	* gfortran.dg/empty_derived_type_2.f90: New test.

f9f16b9f

aarch64: Ignore target pragmas while defining intrinsics · 71355700

Andrew Carlotti authored 1 month ago

Refactor the switcher classes into two separate classes:

- sve_alignment_switcher takes the alignment switching functionality,
  and is used only for ABI correctness when defining sve structure
  types.
- aarch64_target_switcher takes the rest of the functionality of
  aarch64_simd_switcher and sve_switcher, and gates simd/sve specific
  parts upon the specified feature flags.

Additionally, aarch64_target_switcher now adds dependencies of the
specified flags (which adds +fcma and +bf16 to some intrinsic
declarations), and unsets current_target_pragma.

This last change fixes an internal bug where we would sometimes add a
user specified target pragma (stored in current_target_pragma) on top of
an internally specified target architecture while initialising
intrinsics with `#pragma GCC aarch64 "arm_*.h"`.  As far as I can tell, this
has no visible impact at the moment.  However, the unintended target
feature combinations lead to unwanted behaviour in an under-development
patch.

This also fixes a missing Makefile dependency, which was due to
aarch64-sve-builtins.o incorrectly depending on the undefined $(REG_H).
The correct $(REGS_H) dependency is added to the switcher's new source
location.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.cc
	(struct aarch64_extension_info): Add field.
	(aarch64_get_required_features): New.
	* config/aarch64/aarch64-builtins.cc
	(aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
	(aarch64_target_switcher::aarch64_target_switcher): ...this,
	and extend to handle sve, nosimd and target pragmas.
	(aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
	(aarch64_target_switcher::~aarch64_target_switcher): ...this,
	and extend to handle sve, nosimd and target pragmas.
	(handle_arm_acle_h): Use aarch64_target_switcher.
	(handle_arm_neon_h): Rename switcher and pass explicit flags.
	(aarch64_general_init_builtins): Ditto.
	* config/aarch64/aarch64-protos.h
	(class aarch64_simd_switcher): Rename to...
	(class aarch64_target_switcher): ...this, and add new members.
	(aarch64_get_required_features): New prototype.
	* config/aarch64/aarch64-sve-builtins.cc
	(sve_switcher::sve_switcher): Delete.
	(sve_switcher::~sve_switcher): Delete.
	(sve_alignment_switcher::sve_alignment_switcher): New.
	(sve_alignment_switcher::~sve_alignment_switcher): New.
	(register_builtin_types): Use alignment switcher.
	(init_builtins): Rename switcher.
	(handle_arm_neon_sve_bridge_h): Ditto.
	(handle_arm_sme_h): Ditto.
	(handle_arm_sve_h): Ditto, and use alignment switcher.
	* config/aarch64/aarch64-sve-builtins.h
	(class sve_switcher): Delete.
	(class sme_switcher): Delete.
	(class sve_alignment_switcher): New.
	* config/aarch64/t-aarch64 (aarch64-builtins.o): Add $(REGS_H).
	(aarch64-sve-builtins.o): Remove $(REG_H).

71355700

arm: remove some redundant zero_extend ops on thumb1 · 2a502f9e

Richard Earnshaw authored 1 week ago

The code in gcc.target/arm/unsigned-extend-1.c really should not need any
unsigned extension operations when the optimizers are used.  For Arm
and thumb2 that is indeed the case, but for thumb1 code it gets more
complicated as there are too many instructions for combine to look at.
For thumb1 we end up with two redundant zero_extend patterns which are
not removed: the first after the subtract instruction and the second of
the final boolean result.

We can partially fix this (for the second case above) by adding a new
split pattern for LEU and GEU patterns which work because the two
instructions for the [LG]EU pattern plus the redundant extension
instruction are combined into a single insn, which we can then split
using the 3->2 method back into the two insns of the [LG]EU sequence.

Because we're missing the optimization for all thumb1 cases (not just
those architectures with UXTB), I've adjusted the testcase to detect all
the idioms that we might use for zero-extending a value, namely:

       UXTB
       AND ...#255 (in thumb1 this would require a register to hold 255)
       LSL ... #24; LSR ... #24

but I've also marked this test as XFAIL for thumb1 because we can't yet
eliminate the first of the two extend instructions.
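
The three idioms listed above all zero-extend the low byte of a register. A C-level sketch of the equivalence follows; the helper names are hypothetical, and the comments give only a rough correspondence to the listed instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Narrowing conversion, roughly what UXTB does.  */
uint32_t zext_cast(uint32_t x)   { return (uint8_t) x; }

/* Mask with 255, as in AND ... #255.  */
uint32_t zext_and(uint32_t x)    { return x & 255u; }

/* Shift left then logical shift right, as in LSL #24; LSR #24.  */
uint32_t zext_shifts(uint32_t x) { return (x << 24) >> 24; }

void check_zero_extend_idioms(void)
{
  uint32_t x = 0xDEADBEEFu;
  assert(zext_cast(x) == 0xEFu);
  assert(zext_and(x) == 0xEFu);
  assert(zext_shifts(x) == 0xEFu);
}
```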

gcc/
	* config/arm/thumb1.md (split patterns for GEU and LEU): New.

gcc/testsuite:
	* gcc.target/arm/unsigned-extend-1.c: Expand check for any
	insn suggesting a zero-extend.  XFAIL for thumb1 code.

2a502f9e

Revert "combine: Reverse negative logic in ternary operator" · ebc6c54e
Uros Bizjak authored 1 week ago
```
This reverts commit f1c30c62.
```
ebc6c54e

combine: Reverse negative logic in ternary operator · f1c30c62

Uros Bizjak authored 1 week ago

Reverse negative logic in !a ? b : c to become a ? c : b.

No functional changes.
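
The rewrite is behaviour-preserving, as this small sketch checks; the function names are hypothetical and unrelated to the code in combine.cc.

```c
#include <assert.h>

/* Original form with the negated condition.  */
int pick_negated(int a, int b, int c)
{
  return !a ? b : c;
}

/* Rewritten form: condition un-negated, arms swapped.  */
int pick_reversed(int a, int b, int c)
{
  return a ? c : b;
}

void check_ternary_rewrite(void)
{
  assert(pick_negated(0, 10, 20) == pick_reversed(0, 10, 20));
  assert(pick_negated(1, 10, 20) == pick_reversed(1, 10, 20));
}
```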

gcc/ChangeLog:

	* combine.cc (distribute_notes):
	Reverse negative logic in ternary operators.

f1c30c62

combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739] · a92dc3fe

Uros Bizjak authored 1 month ago

The combine pass is trying to combine:

Trying 16, 22, 21 -> 23:
   16: r104:QI=flags:CCNO>0
   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   21: r119:QI=flags:CCNO<=0
      REG_DEAD flags:CCNO
   23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
      REG_DEAD r120:QI
      REG_DEAD r119:QI
      REG_UNUSED flags:CC

and creates the following two insn sequence:

modifying insn i2    22: r104:QI=flags:CCNO>0
      REG_DEAD flags:CC
deferring rescan insn with uid = 22.
modifying insn i3    23: r110:QI=flags:CCNO<=0
      REG_DEAD flags:CC
deferring rescan insn with uid = 23.

where the REG_DEAD note in i2 is not correct, because the flags
register is still referenced in i3.  In the try_combine() megafunction,
we have this part:

--cut here--
    /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
    if (i3notes)
      distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
    if (i2notes)
      distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
    if (i1notes)
      distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
			elim_i2, local_elim_i1, local_elim_i0);
    if (i0notes)
      distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, local_elim_i0);
    if (midnotes)
      distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
--cut here--

where the compiler distributes the REG_UNUSED note from i2:

   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
      REG_UNUSED flags:CC

via distribute_notes() using the following:

--cut here--
	  /* Otherwise, if this register is used by I3, then this register
	     now dies here, so we must put a REG_DEAD note here unless there
	     is one already.  */
	  else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3))
		   && ! (REG_P (XEXP (note, 0))
			 ? find_regno_note (i3, REG_DEAD,
					    REGNO (XEXP (note, 0)))
			 : find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
	    {
	      PUT_REG_NOTE_KIND (note, REG_DEAD);
	      place = i3;
	    }
--cut here--

The flags register is used in I3, but there is already a REG_DEAD note in
I3, so the above condition doesn't trigger and control continues to the
"else" part, where a REG_DEAD note is put on I2.  The proposed solution
corrects the above logic to trigger every time the register is referenced
in I3, avoiding the "else" part.
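
The decision described above can be modelled roughly as follows.  This is a
simplified sketch under stated assumptions, not the actual combine.cc code:
the enum, the function names and the boolean parameters are all hypothetical
stand-ins for reg_referenced_p and the REG_DEAD note lookups, and the real
"else" part has further conditions that are omitted here.

```c
/* Hypothetical model of where a REG_UNUSED note from I2 ends up.  */
enum note_place { PLACE_NONE, PLACE_I2, PLACE_I3 };

/* Old logic: the I3 branch is taken only if I3 has no REG_DEAD note
   yet, so a register used in I3 can still fall through to the I2
   branch.  */
enum note_place
place_before_fix (int used_in_i3, int i3_has_dead_note, int used_in_i2)
{
  if (used_in_i3 && !i3_has_dead_note)
    return PLACE_I3;
  else if (used_in_i2)
    return PLACE_I2;
  return PLACE_NONE;
}

/* Corrected logic: any use in I3 takes the I3 branch; the note is only
   added there if I3 does not already carry one, otherwise it is
   discarded.  */
enum note_place
place_after_fix (int used_in_i3, int i3_has_dead_note, int used_in_i2)
{
  if (used_in_i3)
    return i3_has_dead_note ? PLACE_NONE : PLACE_I3;
  else if (used_in_i2)
    return PLACE_I2;
  return PLACE_NONE;
}
```

In the situation from the dump (register used in both insns, I3 already
carrying REG_DEAD), the first function places the note on I2 while the second
discards it.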

	PR rtl-optimization/118739

gcc/ChangeLog:

	* combine.cc (distribute_notes) <case REG_UNUSED>: Correct the
	logic when the register is used by I3.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr118739.c: New test.

a92dc3fe

ipa-vr: Handle non-conversion unary ops separately from conversions (PR 118785) · d05b64bd

Martin Jambor authored 1 week ago

Since we construct arithmetic jump functions even when there is a
type conversion in between the operation encoded in the jump function
and when it is passed in a call argument, the IPA propagation phase
must also perform the operation and conversion in two steps.  IPA-VR
had actually been doing it even before for binary operations but, as
PR 118756 exposes, not in the case of unary operations.  This patch
adds the necessary step to rectify that.
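
To see why the operation and the conversion have to be applied as two
separate steps, consider this plain C analogy.  It is only a sketch: it uses
bitwise NOT on ordinary integers rather than GCC's value ranges, and the
function names are hypothetical.

```c
#include <stdint.h>

/* Step 1: do the unary operation in the 8-bit operand type.
   Step 2: convert the 8-bit result to the 32-bit parameter type.  */
uint32_t not_then_widen (uint8_t x) { return (uint32_t) (uint8_t) ~x; }

/* Folding both steps into one operation in the wider type gives a
   different answer: the upper 24 bits are now set as well.  */
uint32_t widen_then_not (uint8_t x) { return ~(uint32_t) x; }
```

For an input of 0 the first function yields 255 and the second 0xFFFFFFFF,
so a range computed in the wrong type would not describe the value actually
passed.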

Like in the scalar constant case, we depend on
expr_type_first_operand_type_p to determine the type of the result of
the arithmetic operation.  On top of this, the patch special-cases
ABSU_EXPR because it looks useful and so that the PR testcase exercises
the added code-path.  This seems most appropriate for stage 4; long
term we should probably stream the types, probably after also encoding
them with a string of expr_eval_op rather than what we have today.

A check for expr_type_first_operand_type_p was also missing in the
handling of binary ops and the intermediate value_range was
initialized with a wrong type, so I also fixed this.

gcc/ChangeLog:

2025-02-24  Martin Jambor  <mjambor@suse.cz>

	PR ipa/118785

	* ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): Handle non-conversion
	unary operations separately before doing any conversions.  Check
	expr_type_first_operand_type_p for non-unary operations too.  Fix type
	of op_res.

gcc/testsuite/ChangeLog:

2025-02-24  Martin Jambor  <mjambor@suse.cz>

	PR ipa/118785
	* g++.dg/lto/pr118785_0.C: New test.

d05b64bd

tree-optimization/119057 - bogus double reduction detection · 758de626

Richard Biener authored 1 week ago

We are detecting a cycle as double reduction where the inner loop
cycle has extra out-of-loop uses.  This clashes at least with
assumptions from the SLP discovery code which says the cycle
isn't reachable from another SLP instance.  It also was not intended
to support this case; in fact, with GCC 14 we seem to generate wrong
code here.

	PR tree-optimization/119057
	* tree-vect-loop.cc (check_reduction_path): Add argument
	specifying whether we're analyzing the inner loop of a
	double reduction.  Do not allow extra uses outside of the
	double reduction cycle in this case.
	(vect_is_simple_reduction): Adjust.

	* gcc.dg/vect/pr119057.c: New testcase.

758de626

ipa/119067 - bogus TYPE_PRECISION check on VECTOR_TYPE · f22e8916

Richard Biener authored 1 week ago

odr_types_equivalent_p can end up using TYPE_PRECISION on vector
types which is a no-go.  The following instead uses TYPE_VECTOR_SUBPARTS
for vector types so we also end up comparing the number of vector elements.

	PR ipa/119067
	* ipa-devirt.cc (odr_types_equivalent_p): Check
	TYPE_VECTOR_SUBPARTS for vectors.

	* g++.dg/lto/pr119067_0.C: New testcase.
	* g++.dg/lto/pr119067_1.C: Likewise.

f22e8916

Fortran: Fix regression on double free on elemental function [PR118747] · 43c11931

Andre Vehreschild authored 2 weeks ago

Fix a regression where adding a temporary variable inserted a copy of the
argument to the elemental function.  That copy was then later used to
free allocated memory, but the freeing was not tracked in the source
array correctly.

	PR fortran/118747

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_trans_array_ctor_element): Remove copy to
	temporary variable.
	* trans-expr.cc (gfc_conv_procedure_call): Use references to
	array members instead of copies when freeing after use.
	Formatting fix.

gcc/testsuite/ChangeLog:

	* gfortran.dg/alloc_comp_auto_array_4.f90: New test.

43c11931

Daily bump. · 0163d505
GCC Administrator authored 1 week ago

0163d505

Mar 02, 2025

[RISC-V][PR target/118934] Fix ICE in RISC-V long branch support · 67e824c2

Jeff Law authored 1 week ago

I'm not sure if I goof'd this or if I merely upstreamed someone else's goof.
Either way the long branch code isn't working correctly.

We were using 'n' as the output modifier to negate the condition.  But 'n' has
a special meaning elsewhere, so when presented with a condition rather than
what was expected, boom, the compiler ICE'd.

Thankfully there's only a few places where we were using %n which I turned into
%r.

The BZ entry includes a good testcase, it just takes a long time to compile as
it's trying to create the out-of-range scenario.  I'm not including the
testcase due to how long it takes, but I did test it locally to ensure it's
working properly now.

I'm sure that with a little bit of work I could create a testcase that worked
before and fails with the trunk (by taking advantage of the fuzziness in length
computations).  So I'm going to consider this a regression.

Will push to the trunk after pre-commit testing does its thing.

	PR target/118934
gcc/
	* config/riscv/corev.md (cv_branch): Adjust output template.
	(branch): Likewise.
	* config/riscv/riscv.md (branch): Likewise.
	* config/riscv/riscv.cc (riscv_asm_output_opcode): Handle 'r' rather
	than 'n'.

67e824c2

PR modula2/119088 ICE when for loop accesses an unknown variable as the iterator · 585aa406

Gaius Mulley authored 1 week ago


This patch fixes an ICE which occurs when a FOR statement attempts to
use an undeclared variable as its iterator.

gcc/m2/ChangeLog:

	PR modula2/119088
	* gm2-compiler/M2SymInit.mod (ConfigSymInit): Reimplement to
	defensively check for NulSym type.

gcc/testsuite/ChangeLog:

	PR modula2/119088
	* gm2/pim/fail/tinyfor4.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

585aa406

Fortran: Small fixes in intrinsic.texi. · 43a9022a

Sandra Loosemore authored 2 weeks ago

gcc/fortran/ChangeLog
	* intrinsic.texi: Fix inconsistent capitalization of argument
	names and other minor copy-editing.

43a9022a