- Mar 05, 2025
-
-
Jakub Jelinek authored
module.cc apparently has support for extensions and attempts to ensure that if a module is compiled with those extensions enabled, sources which use the module are compiled with the same extensions. The only extension supported is SE_OPENMP right now. And the use of the extension is keyed on streaming out or in an OMP_CLAUSE tree. This is undesirable for several reasons. OMP_CLAUSE is the only tree which can appear in the IL even without -fopenmp/-fopenmp-simd/-fopenacc (when simd ("notinbranch") or simd ("inbranch") attributes are used), and it can appear also in all the 3 modes mentioned above. On the other side, with the exception of arguments of attributes added e.g. for declare simd where no harm should be done if -fopenmp/-fopenmp-simd isn't enabled later on, OMP_CLAUSE appears in OMP_*_CLAUSES of OpenMP/OpenACC construct trees. And those construct trees often have no clauses at all, so keying the extension on OMP_CLAUSE doesn't catch many cases that should be caught. Furthermore, for OpenMP we have 2 modes, -fopenmp-simd which parses some OpenMP but constructs from that mostly OMP_SIMD and a few other cases, and -fopenmp which includes that and far more on top of that; and there is also -fopenacc. So, this patch stops setting/requesting the extension on OMP_CLAUSE, introduces 3 extensions rather than one (SE_OPENMP_SIMD, SE_OPENMP and SE_OPENACC) and keys those on OpenMP constructs from the -fopenmp-simd subset, other OpenMP constructs and OpenACC constructs. 2025-03-05 Jakub Jelinek <jakub@redhat.com> PR c++/119102 gcc/cp/ * module.cc (enum streamed_extensions): Add SE_OPENMP_SIMD and SE_OPENACC, change value of SE_OPENMP and SE_BITS. (CASE_OMP_SIMD_CODE, CASE_OMP_CODE, CASE_OACC_CODE): Define. (trees_out::start): Don't set SE_OPENMP extension for OMP_CLAUSE. Set SE_OPENMP_SIMD extension for CASE_OMP_SIMD_CODE, SE_OPENMP for CASE_OMP_CODE and SE_OPENACC for CASE_OACC_CODE. (trees_in::start): Don't fail for OMP_CLAUSE with missing SE_OPENMP extension.
Do fail for CASE_OMP_SIMD_CODE and missing SE_OPENMP_SIMD extension, or CASE_OMP_CODE and missing SE_OPENMP extension, or CASE_OACC_CODE and missing SE_OPENACC extension. (module_state::write_readme): Write all of SE_OPENMP_SIMD, SE_OPENMP and SE_OPENACC extensions. (module_state::read_config): Diagnose missing -fopenmp, -fopenmp-simd and/or -fopenacc depending on extensions used. gcc/testsuite/ * g++.dg/modules/pr119102_a.H: New test. * g++.dg/modules/pr119102_b.C: New test. * g++.dg/modules/omp-3_a.C: New test. * g++.dg/modules/omp-3_b.C: New test. * g++.dg/modules/omp-3_c.C: New test. * g++.dg/modules/omp-3_d.C: New test. * g++.dg/modules/oacc-1_a.C: New test. * g++.dg/modules/oacc-1_b.C: New test. * g++.dg/modules/oacc-1_c.C: New test.
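Illustration (not part of the commit): a minimal C sketch of the per-extension bookkeeping described above, assuming each extension is one bit in a mask recorded with the module. The SK_* names, their bit values and the missing_extensions helper are hypothetical stand-ins; the real enumerators and checks live in gcc/cp/module.cc and may differ in detail.

```c
#include <assert.h>

/* Hypothetical stand-ins for SE_OPENMP_SIMD, SE_OPENMP and SE_OPENACC.
   The bit values are illustrative, not the ones used by module.cc.  */
enum sketch_extension
{
  SK_OPENMP_SIMD = 1 << 0,
  SK_OPENMP = 1 << 1,
  SK_OPENACC = 1 << 2
};

/* Return the extensions a module recorded as used that the importing
   translation unit has not enabled.  */
unsigned
missing_extensions (unsigned used_by_module, unsigned enabled_by_importer)
{
  return used_by_module & ~enabled_by_importer;
}

/* Self-check: a module using OpenMP and OpenACC constructs, imported
   with only the OpenMP bit enabled, is missing the OpenACC bit.  */
void
check_missing_extensions (void)
{
  assert (missing_extensions (SK_OPENMP | SK_OPENACC, SK_OPENMP)
          == SK_OPENACC);
  assert (missing_extensions (SK_OPENMP_SIMD, SK_OPENMP_SIMD) == 0);
}
```

A non-zero result is the situation in which the importer would be told which of -fopenmp-simd, -fopenmp or -fopenacc it is missing.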
-
Jakub Jelinek authored
During the 118874 coro investigation I found a typo in a comment. Fixed thusly. 2025-03-05 Jakub Jelinek <jakub@redhat.com> * typeck.cc (check_return_expr): Fix comment typo, rom -> from.
-
Jakub Jelinek authored
The following testcase, IMO in violation of the P2552R3 paper, doesn't pedwarn on alignas applying to dependent types or alignas with dependent argument. tsubst was just ignoring TYPE_ATTRIBUTES. The following patch fixes it for the POINTER/REFERENCE_TYPE and ARRAY_TYPE cases, but perhaps we need to do the same also for other types (INTEGER_TYPE/REAL_TYPE and the like). I guess I'll need to construct more testcases. 2025-03-05 Jakub Jelinek <jakub@redhat.com> PR c++/118787 * pt.cc (tsubst) <case ARRAY_TYPE>: Use return t; only if it doesn't have any TYPE_ATTRIBUTES. Call apply_late_template_attributes. <case POINTER_TYPE, case REFERENCE_TYPE>: Likewise. Formatting fix. * g++.dg/cpp0x/alignas22.C: New test.
-
Xi Ruoyao authored
They could be incorrectly reordered with store instructions like st.b because the RTL expression does not have a memory_operand or a (mem) expression. The incorrect reordering has been observed in an openh264 LTO build. Expand them to a (mem) expression instead of unspec to fix the issue. Then we need to make loongarch_address_insns return 1 for ADDRESS_REG_REG because the constraint "R" expects this behavior, or the vldx instruction will be considered invalid by the register allocation pass and turned into add.d + vld. Apply the ADDRESS_REG_REG penalty in loongarch_address_cost instead; loongarch_rtx_costs should then also call loongarch_address_cost instead of loongarch_address_insns. Closes: https://github.com/cisco/openh264/issues/3857 gcc/ChangeLog: PR target/119084 * config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove. (lasx_xvldx): Remove. * config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove. (lsx_vldx): Remove. * config/loongarch/simd.md (QIVEC): New define_mode_iterator. (<simd_isa>_<x>vldx): New define_expand. * config/loongarch/loongarch.cc (loongarch_address_insns_1): New static function with most logic factored out from ... (loongarch_address_insns): ... here. Call loongarch_address_insns_1 with reg_reg_cost = 1. (loongarch_address_cost): Call loongarch_address_insns_1 with reg_reg_cost = la_addr_reg_reg_cost. gcc/testsuite/ChangeLog: PR target/119084 * gcc.target/loongarch/pr119084.c: New test.
-
GCC Administrator authored
-
- Mar 04, 2025
-
-
Jason Merrill authored
Here gimplification got confused because extend_temps_r messed up the types of the arms of a COND_EXPR. PR c++/119073 gcc/cp/ChangeLog: * call.cc (extend_temps_r): Preserve types of COND_EXPR arms. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/range-for39.C: New test.
-
Ian Lance Taylor authored
For PR go/119098 Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/654477
-
Thomas Koenig authored
The problem was that we were not handling external dummy arguments with -fc-prototypes-external. In looking at this, I found that we were not warning about external procedures with different argument lists. This can actually be legal (see the two test cases) but creates a problem for the C prototypes: If we have something like

  subroutine foo(a,n)
    external a
    if (n == 1) call a(1)
    if (n == 2) call a(2,3)
  end subroutine foo

then, pre-C23, we could just have written out the prototype as

  void foo_ (void (*a) (), int *n);

but this is illegal in C23. What to do? I finally chose to warn about the argument mismatch, with a new option. Warn only because the code above is legal, but include in -Wall because such code seems highly suspect. This option is also implied in -fc-prototypes-external. I also put a warning in the generated header file in that case, so users have a chance to see what is going on (especially since gcc now defaults to C23).

gcc/fortran/ChangeLog:
PR fortran/119049
PR fortran/119074
* dump-parse-tree.cc (seen_conflict): New static variable.
(gfc_dump_external_c_prototypes): Initialize it. If it was set, write out a warning that -std=c23 will not work.
(write_proc): Move the work of actually writing out the formal arglist to...
(write_formal_arglist): New function. Handle external dummy parameters and their argument lists. If there were mismatched arguments, output an empty argument list in pre-C23 style.
* gfortran.h (struct gfc_symbol): Add ext_dummy_arglist_mismatch flag and formal_at.
* invoke.texi: Document -Wexternal-argument-mismatch.
* lang.opt: Put it in.
* resolve.cc (resolve_function): If warning about external argument mismatches, build a formal from actual arglist the first time around, and later compare and warn.
(resolve_call): Likewise.

gcc/testsuite/ChangeLog:
PR fortran/119049
PR fortran/119074
* gfortran.dg/interface_55.f90: New test.
* gfortran.dg/interface_56.f90: New test.
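Illustration (not part of the commit, and not what gfortran emits): a hand-written C analogue of the foo subroutine above that stays valid in C23. It assumes a generic function pointer type that is cast back to the real signature at each call site; the names generic_fn, one_arg, two_args and total are hypothetical.

```c
#include <assert.h>

/* Generic function pointer type.  In C23 "()" means "(void)", so the
   real signature has to be restored with a cast before each call.  */
typedef void (*generic_fn) (void);

int total;

void one_arg (int *x) { total += *x; }
void two_args (int *x, int *y) { total += *x + *y; }

/* Hand-written analogue of "subroutine foo(a,n)": the dummy procedure
   is called with one argument or with two, depending on n.  */
void
foo_ (generic_fn a, int *n)
{
  int one = 1, two = 2, three = 3;
  if (*n == 1)
    ((void (*) (int *)) a) (&one);
  if (*n == 2)
    ((void (*) (int *, int *)) a) (&two, &three);
}

/* Self-check: each callee is reached through the matching signature.  */
void
check_foo (void)
{
  int n = 1;
  total = 0;
  foo_ ((generic_fn) one_arg, &n);
  assert (total == 1);
  n = 2;
  foo_ ((generic_fn) two_args, &n);
  assert (total == 6);
}
```

The casts make the mismatch explicit at each call, which is exactly the information a single unprototyped declaration used to hide.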
-
Georg-Johann Lay authored
gcc/ * doc/invoke.texi (AVR Optimization Options): New @subsubsection for pure optimization options.
-
Torbjörn SVENSSON authored
gcc/testsuite/ChangeLog: * gcc.target/arm/pr68674.c: Use effective-target arm_arch_v7a and arm_libc_fp_abi. Signed-off-by:
Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
-
Oscar Gustafsson authored
gcc/ChangeLog: * doc/extend.texi: Improve example for __builtin_bswap16.
-
Jan Hubicka authored
Zen5 on some variants has false dependency on tzcnt, blsi, blsr and blsmsk instructions. Those can be tested by the following benchmark

jh@shroud:~> cat ee.c
int
main()
{
  int a = 10;
  int b = 0;
  for (int i = 0; i < 1000000000; i++)
    {
      asm volatile ("xor %0, %0": "=r" (b));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
      asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
    }
  return 0;
}
jh@shroud:~> cat bmk.sh
gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 ; time ./a.out
jh@shroud:~> sh bmk.sh tzcnt

real    0m0.886s
user    0m0.886s
sys     0m0.000s

real    0m0.886s
user    0m0.886s
sys     0m0.000s
jh@shroud:~> sh bmk.sh blsi

real    0m0.979s
user    0m0.979s
sys     0m0.000s

real    0m2.418s
user    0m2.418s
sys     0m0.000s
jh@shroud:~> sh bmk.sh blsr

real    0m0.986s
user    0m0.986s
sys     0m0.000s

real    0m2.422s
user    0m2.421s
sys     0m0.000s
jh@shroud:~> sh bmk.sh blsmsk

real    0m0.973s
user    0m0.973s
sys     0m0.000s

real    0m2.422s
user    0m2.422s
sys     0m0.000s

We already have a tunable that controls tzcnt together with lzcnt and popcnt. Since it seems that only tzcnt is affected I added a new tunable to control tzcnt only. I also added splitters for blsi/blsr/blsmsk implemented analogously to the existing splitter for lzcnt.

The patch is neutral on SPEC. We produce blsi and blsr in some internal loops, but they usually have the same destination as source. However it is good to break the dependency chain to avoid pathological cases and it is quite cheap overall, so I think we want to enable this for generic. I will send a followup patch for this.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:
* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro.
(TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro.
* config/i386/i386.md (*bmi_blsi_<mode>): Add splitter for false dependency.
(*bmi_blsi_<mode>_ccno): Add splitter for false dependency.
(*bmi_blsi_<mode>_falsedep): New pattern.
(*bmi_blsmsk_<mode>): Add splitter for false dependency.
(*bmi_blsmsk_<mode>_falsedep): New pattern.
(*bmi_blsr_<mode>): Add splitter for false dependency.
(*bmi_blsr_<mode>_cmp): Add splitter for false dependency.
(*bmi_blsr_<mode>_cmp_falsedep): New pattern.
* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New tune.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune.

gcc/testsuite/ChangeLog:
* gcc.target/i386/blsi.c: New test.
* gcc.target/i386/blsmsk.c: New test.
* gcc.target/i386/blsr.c: New test.
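Illustration (not part of the commit): the three BMI operations named above, written as plain C on 32-bit values. Each result is a function of the source operand only, which is why a dependency on the old contents of the destination register is a false one. The function names are hypothetical; the bit identities are the documented semantics of the instructions.

```c
#include <assert.h>
#include <stdint.h>

/* blsi: isolate the lowest set bit.  */
uint32_t blsi32 (uint32_t x) { return x & (0u - x); }

/* blsr: reset (clear) the lowest set bit.  */
uint32_t blsr32 (uint32_t x) { return x & (x - 1u); }

/* blsmsk: mask of all bits up to and including the lowest set bit.  */
uint32_t blsmsk32 (uint32_t x) { return x ^ (x - 1u); }

/* Self-check on 12 (binary 1100).  */
void
check_bls (void)
{
  assert (blsi32 (12u) == 4u);
  assert (blsr32 (12u) == 8u);
  assert (blsmsk32 (12u) == 7u);
}
```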
-
Andre Vehreschild authored
PR fortran/103391 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_trans_assignment_1): Do not use poly assign for pointer arrays on lhs (as it is done for allocatables already). gcc/testsuite/ChangeLog: * gfortran.dg/assign_12.f90: New test.
-
Jan Hubicka authored
The current implementation of fusion predicates misses some common fusion cases on zen and more recent cores. I added knobs for individual conditionals we test.

1) I split checks for fusing ALU with conditional operands when the ALU has memory operand. This seems to be supported by zen3+ and by tigerlake and cooperlake (according to Agner Fog's manual)

2) znver4 and 5 support fusion of ALU and conditional even if ALU has memory and immediate operands. This seems to be relatively important enabling 25% more fusions on gcc bootstrap.

3) no CPU supports fusing when ALU contains IP relative memory references. I added separate knob so we do not forget about this if this gets supported later.

The patch does not solve the limitation of sched that fuse pairs must be adjacent on input and the first operation must be single-set. Fixing single-set is easy (I have separate patch for this), for non-adjacent pairs we need bigger surgery.

To verify what CPU really does I made a simple test script.

jh@ryzen3:~> cat fuse-test.c
int b;
const int z = 0;
const int o = 1;
int
main()
{
  int a = 1000000000;
  int b;
  int z = 0;
  int o = 1;
  asm volatile ("\n"
".L1234:\n"
"nop\n"
"subl %3, %0\n"
"movl %0, %1\n"
"cmpl %2, %1\n"
"movl %0, %1\n"
"test %1, %1\n"
"nop\n"
"jne .L1234":"=a"(a),
  "=m"(b)
  "=r"(b)
  :
  "m"(z),
  "m"(o),
  "i"(0),
  "i"(1),
  "0"(a)
  );
}
jh@ryzen3:~> cat fuse-test.sh
EVENT=ex_ret_fused_instr
dotest()
{
gcc -O2 fuse-test.c $* -o fuse-cmp-imm-mem-nofuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-nofuse 2>&1 | grep $EVENT
gcc -O2 fuse-test.c -DFUSE $* -o fuse-cmp-imm-mem-fuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-fuse 2>&1 | grep $EVENT
}

echo ALU with immediate
dotest
echo ALU with memory
dotest -D MEM
echo ALU with IP relative memory
dotest -D MEM -D IPRELATIVE
echo CMP with immediate
dotest -D CMP
echo CMP with memory
dotest -D CMP -D MEM
echo CMP with memory and immediate
dotest -D CMP -D MEMIMM
echo CMP with IP relative memory
dotest -D CMP -D MEM -D IPRELATIVE
echo TEST
dotest -D TEST

On zen5 I get:

ALU with immediate
            20,345      ex_ret_fused_instr:u
     1,000,020,278      ex_ret_fused_instr:u
ALU with memory
            20,367      ex_ret_fused_instr:u
     1,000,020,290      ex_ret_fused_instr:u
ALU with IP relative memory
            20,395      ex_ret_fused_instr:u
            20,403      ex_ret_fused_instr:u
CMP with immediate
            20,369      ex_ret_fused_instr:u
     1,000,020,301      ex_ret_fused_instr:u
CMP with memory
            20,314      ex_ret_fused_instr:u
     1,000,020,341      ex_ret_fused_instr:u
CMP with memory and immediate
            20,372      ex_ret_fused_instr:u
     1,000,020,266      ex_ret_fused_instr:u
CMP with IP relative memory
            20,382      ex_ret_fused_instr:u
            20,369      ex_ret_fused_instr:u
TEST
            20,346      ex_ret_fused_instr:u
     1,000,020,301      ex_ret_fused_instr:u

IP relative memory seems to not be documented.

On zen3/4 I get:

ALU with immediate
            20,263      ex_ret_fused_instr:u
     1,000,020,051      ex_ret_fused_instr:u
ALU with memory
            20,255      ex_ret_fused_instr:u
     1,000,020,056      ex_ret_fused_instr:u
ALU with IP relative memory
            20,253      ex_ret_fused_instr:u
            20,266      ex_ret_fused_instr:u
CMP with immediate
            20,264      ex_ret_fused_instr:u
     1,000,020,052      ex_ret_fused_instr:u
CMP with memory
            20,253      ex_ret_fused_instr:u
     1,000,019,794      ex_ret_fused_instr:u
CMP with memory and immediate
            20,260      ex_ret_fused_instr:u
            20,264      ex_ret_fused_instr:u
CMP with IP relative memory
            20,258      ex_ret_fused_instr:u
            20,256      ex_ret_fused_instr:u
TEST
            20,261      ex_ret_fused_instr:u
     1,000,020,048      ex_ret_fused_instr:u

zen1 and 2 get:

ALU with immediate
            21,610      ex_ret_fus_brnch_inst:u
            21,697      ex_ret_fus_brnch_inst:u
ALU with memory
            21,479      ex_ret_fus_brnch_inst:u
            21,747      ex_ret_fus_brnch_inst:u
ALU with IP relative memory
            21,623      ex_ret_fus_brnch_inst:u
            21,684      ex_ret_fus_brnch_inst:u
CMP with immediate
            21,708      ex_ret_fus_brnch_inst:u
     1,000,021,288      ex_ret_fus_brnch_inst:u
CMP with memory
            21,689      ex_ret_fus_brnch_inst:u
     1,000,004,270      ex_ret_fus_brnch_inst:u
CMP with memory and immediate
            21,604      ex_ret_fus_brnch_inst:u
            21,671      ex_ret_fus_brnch_inst:u
CMP with IP relative memory
            21,589      ex_ret_fus_brnch_inst:u
            21,602      ex_ret_fus_brnch_inst:u
TEST
            21,600      ex_ret_fus_brnch_inst:u
     1,000,021,233      ex_ret_fus_brnch_inst:u

I tested the patch on zen3 and zen5 and spec2k17 and it seems neutral, however the number of fusions does go up.

Bootstrapped/regtested x86_64-linux, I plan to commit it tomorrow.

Honza

gcc/ChangeLog:
* config/i386/i386.h (TARGET_FUSE_ALU_AND_BRANCH_MEM): New macro.
(TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM): New macro.
(TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New macro.
* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Support non-single-set.
(ix86_macro_fusion_pair_p): Allow ALU which only clobbers; be more careful about immediates; check TARGET_FUSE_ALU_AND_BRANCH_MEM, TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM, TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE; verify that we never use unsigned checks with inc/dec.
* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): New tune.
(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM): New tune.
(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM_IMM): New tune.
(X86_TUNE_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New tune.
-
Marek Polacek authored
We crash because we generate

  {[0 ... 1]={.low=0, .high=1}, [1]={.low=0, .high=1}}

which output_constructor_regular_field doesn't want to see. This happens since r9-1483: process_init_constructor_array can now create a RANGE_EXPR. But the bug isn't in that patch; the problem is that build_vec_init doesn't handle RANGE_EXPRs.

build_vec_init has a FOR_EACH_CONSTRUCTOR_ELT loop which populates const_vec. In this case it loops over the elements of

  {[0 ... 1]={.low=0, .high=1}}

but assumes that each element initializes one element. So after the loop num_initialized_elts was 1, and then below:

  HOST_WIDE_INT last = tree_to_shwi (maxindex);
  if (num_initialized_elts <= last)
    {
      tree field = size_int (num_initialized_elts);
      if (num_initialized_elts != last)
        field = build2 (RANGE_EXPR, sizetype, field, size_int (last));
      CONSTRUCTOR_APPEND_ELT (const_vec, field, e);
    }

we added the extra initializer. It seemed convenient to use range_expr_nelts like below.

PR c++/109431

gcc/cp/ChangeLog:
* cp-tree.h (range_expr_nelts): Declare.
* init.cc (build_vec_init): If the CONSTRUCTOR's index is a RANGE_EXPR, use range_expr_nelts to count how many elements were initialized.

gcc/testsuite/ChangeLog:
* g++.dg/init/array67.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
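Illustration (not part of the commit): a minimal C sketch of the counting problem described above, assuming a constructor index is modelled as an inclusive [lo, hi] pair where a plain index has lo == hi. The struct and function names are hypothetical and do not correspond to GCC's tree API.

```c
#include <assert.h>

/* Hypothetical stand-in for a constructor index: either a single
   index (lo == hi) or a range [lo ... hi], both bounds inclusive.  */
struct index_sketch
{
  long lo;
  long hi;
};

/* Count array elements covered by the initializers.  Counting one
   element per initializer would undercount ranges.  */
long
count_initialized (const struct index_sketch *elts, int n)
{
  long total = 0;
  for (int i = 0; i < n; i++)
    total += elts[i].hi - elts[i].lo + 1;
  return total;
}

/* Self-check: the single initializer [0 ... 1] covers two elements,
   so nothing is left over for an extra initializer.  */
void
check_count_initialized (void)
{
  struct index_sketch range[] = { { 0, 1 } };
  assert (count_initialized (range, 1) == 2);
}
```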
-
Tamar Christina authored
When the input is already a subreg and we try to make a paradoxical subreg out of it for copysign this can fail if it violates the subreg relationship. Use force_lowpart_subreg instead of lowpart_subreg to then force the results to a register instead of ICEing. gcc/ChangeLog: PR target/118892 * config/aarch64/aarch64.md (copysign<GPF:mode>3): Use force_lowpart_subreg instead of lowpart_subreg. gcc/testsuite/ChangeLog: PR target/118892 * gcc.target/aarch64/copysign-pr118892.c: New test.
-
Jonathan Wakely authored
libstdc++-v3/ChangeLog: * doc/xml/manual/test.xml: Remove stray comma. * doc/html/manual/test.html: Regenerate.
-
Richard Sandiford authored
There was an embarrassing typo in the folding of BIT_NOT_EXPR for POLY_INT_CSTs: it used - rather than ~ on the poly_int. Not sure how that happened, but it might have been due to the way that ~x is implemented as -1 - x internally. gcc/ PR tree-optimization/118976 * fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR. * config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function. (aarch64_run_selftests): Run it.
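Illustration (not part of the commit): a small C sketch of why - and ~ differ on a polynomial value c0 + c1*n. It assumes a two-coefficient model with hypothetical names (poly2, poly_neg, poly_not, poly_eval); it only mirrors the identity ~x == -1 - x mentioned above and is not GCC's poly_int implementation.

```c
#include <assert.h>

/* Hypothetical two-coefficient polynomial: value = c0 + c1 * n.  */
struct poly2
{
  long c0;
  long c1;
};

long
poly_eval (struct poly2 p, long n)
{
  return p.c0 + p.c1 * n;
}

/* Negation: -(c0 + c1*n) = -c0 + (-c1)*n.  */
struct poly2
poly_neg (struct poly2 p)
{
  struct poly2 r = { -p.c0, -p.c1 };
  return r;
}

/* Bitwise not via ~x == -1 - x: (-1 - c0) + (-c1)*n.  */
struct poly2
poly_not (struct poly2 p)
{
  struct poly2 r = { -1 - p.c0, -p.c1 };
  return r;
}

/* Self-check: poly_not agrees with ~ for every n tried, poly_neg
   is always off by one.  */
void
check_poly_not (void)
{
  struct poly2 p = { 4, 4 };
  for (long n = 0; n < 4; n++)
    {
      long v = poly_eval (p, n);
      assert (poly_eval (poly_not (p), n) == ~v);
      assert (poly_eval (poly_neg (p), n) == ~v + 1);
    }
}
```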
-
Richard Sandiford authored
The following testcase is miscompiled on powerpc64le-linux starting with r15-6777. During combine we see: (set (reg:SI 134) (ior:SI (ge:SI (reg:CCFP 128) (const_int 0 [0])) (lt:SI (reg:CCFP 128) (const_int 0 [0])))) The simplify_logical_relational_operation code (in its current form) was written with arithmetic rather than CC modes in mind. Since CCFP is a CC mode, it fails the HONOR_NANS check, and so the function assumes that ge | lt => true. If one comparison is unsigned then it should be safe to assume that the other comparison is also unsigned, even for CC modes, since the optimisation checks that the comparisons are between the same operands. For the other cases, we can only safely fold comparisons of CC mode values if the result is always-true (15) or always-false (0). It turns out that the original testcase for PR117186, which ran at -O, was relying on the old behaviour for some of the functions. It needs 4-instruction combinations, and so -fexpensive-optimizations, to pass in its intended form. gcc/ PR rtl-optimization/119002 * simplify-rtx.cc (simplify_context::simplify_logical_relational_operation): Handle comparisons between CC values. If there is no evidence that the CC values are unsigned, restrict the fold to always-true or always-false results. gcc/testsuite/ * gcc.c-torture/execute/ieee/pr119002.c: New test. * gcc.target/aarch64/pr117186.c: Run at -O2 rather than -O. Co-authored-by:
Jakub Jelinek <jakub@redhat.com>
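Illustration (not part of the commit): a source-level C analogue of the ge | lt combination discussed above, assuming IEEE floating point and no -ffast-math. It shows why folding the union of the two comparisons to "always true" is wrong when NaNs are honoured. The function name is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Union of >= and < against zero.  True for every ordered value,
   false for NaN because both comparisons are then false.  */
int
ge_or_lt (double x)
{
  return (x >= 0.0) | (x < 0.0);
}

/* Self-check covering ordered values and the unordered case.  */
void
check_ge_or_lt (void)
{
  assert (ge_or_lt (1.0) == 1);
  assert (ge_or_lt (-1.0) == 1);
  assert (ge_or_lt (NAN) == 0);
}
```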
-
Jakub Jelinek authored
Uros' r15-7793 fixed this PR as well, I'm just committing tests from the PR so that it can be closed. 2025-03-04 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/119071 * gcc.dg/pr119071.c: New test. * gcc.c-torture/execute/pr119071.c: New test.
-
Andre Vehreschild authored
PR fortran/77872 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_get_tree_for_caf_expr): Pick up token from decl when it is present there for class types. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/class_1.f90: New test.
-
Andre Vehreschild authored
PR fortran/77872 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_conv_procedure_call): Use attr instead of doing type check and branching for BT_CLASS.
-
Richard Biener authored
When we vectorize a .COND_ADD reduction and apply the single-use-def cycle optimization we can end up choosing the wrong else value for subsequent .COND_ADD. The following rectifies this. PR tree-optimization/119096 * tree-vect-loop.cc (vect_transform_reduction): Use the correct else value for .COND_fn. * gcc.dg/vect/pr119096.c: New testcase.
-
Pan Li authored
The bug-3.c test checks for slli a[0-9]+, a[0-9]+, 33 to cover the big poly int handling. But the underlying insn may change to slli 1 + slli 32 under some optimizations. Thus, update the asm check to a function body check matching the above slli 1 + slli 32 series. The following test suite passed for this patch. * The rv64gcv full regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/bug-3.c: Update asm check to function body check. Signed-off-by:
Pan Li <pan2.li@intel.com>
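Illustration (not part of the commit): a short C check that a single left shift by 33 and a shift by 1 followed by a shift by 32 give the same 64-bit result, which is why either instruction sequence is acceptable to the test. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One shift by 33, as in "slli a, a, 33".  */
uint64_t shift_once (uint64_t x) { return x << 33; }

/* Shift by 1, then by 32, as in the "slli 1 + slli 32" series.  */
uint64_t shift_twice (uint64_t x) { return (x << 1) << 32; }

/* Self-check on a few values, including one that overflows.  */
void
check_shifts (void)
{
  assert (shift_once (1u) == shift_twice (1u));
  assert (shift_once (0x12345678u) == shift_twice (0x12345678u));
  assert (shift_once (UINT64_MAX) == shift_twice (UINT64_MAX));
}
```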
-
GCC Administrator authored
-
- Mar 03, 2025
-
-
Joseph Myers authored
gcc/po/ * be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po, ja.po, ka.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po, zh_TW.po: Update. libcpp/po/ * be.po, ca.po, da.po, de.po, el.po, eo.po, es.po, fi.po, fr.po, id.po, ja.po, ka.po, nl.po, pt_BR.po, ro.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po, zh_TW.po: Update.
-
Harald Anlauf authored
PR fortran/101577 gcc/fortran/ChangeLog: * symbol.cc (verify_bind_c_derived_type): Generate error message for derived type with no components in standard conformance mode, indicating that this is a GNU extension. gcc/testsuite/ChangeLog: * gfortran.dg/empty_derived_type.f90: Adjust dg-options. * gfortran.dg/empty_derived_type_2.f90: New test.
-
Andrew Carlotti authored
Refactor the switcher classes into two separate classes: - sve_alignment_switcher takes the alignment switching functionality, and is used only for ABI correctness when defining sve structure types. - aarch64_target_switcher takes the rest of the functionality of aarch64_simd_switcher and sve_switcher, and gates simd/sve specific parts upon the specified feature flags. Additionally, aarch64_target_switcher now adds dependencies of the specified flags (which adds +fcma and +bf16 to some intrinsic declarations), and unsets current_target_pragma. This last change fixes an internal bug where we would sometimes add a user specified target pragma (stored in current_target_pragma) on top of an internally specified target architecture while initialising intrinsics with `#pragma GCC aarch64 "arm_*.h"`. As far as I can tell, this has no visible impact at the moment. However, the unintended target feature combinations lead to unwanted behaviour in an under-development patch. This also fixes a missing Makefile dependency, which was due to aarch64-sve-builtins.o incorrectly depending on the undefined $(REG_H). The correct $(REGS_H) dependency is added to the switcher's new source location. gcc/ChangeLog: * common/config/aarch64/aarch64-common.cc (struct aarch64_extension_info): Add field. (aarch64_get_required_features): New. * config/aarch64/aarch64-builtins.cc (aarch64_simd_switcher::aarch64_simd_switcher): Rename to... (aarch64_target_switcher::aarch64_target_switcher): ...this, and extend to handle sve, nosimd and target pragmas. (aarch64_simd_switcher::~aarch64_simd_switcher): Rename to... (aarch64_target_switcher::~aarch64_target_switcher): ...this, and extend to handle sve, nosimd and target pragmas. (handle_arm_acle_h): Use aarch64_target_switcher. (handle_arm_neon_h): Rename switcher and pass explicit flags. (aarch64_general_init_builtins): Ditto. * config/aarch64/aarch64-protos.h (class aarch64_simd_switcher): Rename to... 
(class aarch64_target_switcher): ...this, and add new members. (aarch64_get_required_features): New prototype. * config/aarch64/aarch64-sve-builtins.cc (sve_switcher::sve_switcher): Delete (sve_switcher::~sve_switcher): Delete (sve_alignment_switcher::sve_alignment_switcher): New (sve_alignment_switcher::~sve_alignment_switcher): New (register_builtin_types): Use alignment switcher (init_builtins): Rename switcher. (handle_arm_neon_sve_bridge_h): Ditto. (handle_arm_sme_h): Ditto. (handle_arm_sve_h): Ditto, and use alignment switcher. * config/aarch64/aarch64-sve-builtins.h (class sve_switcher): Delete. (class sme_switcher): Delete. (class sve_alignment_switcher): New. * config/aarch64/t-aarch64 (aarch64-builtins.o): Add $(REGS_H). (aarch64-sve-builtins.o): Remove $(REG_H).
-
Richard Earnshaw authored
The code in gcc.target/arm/unsigned-extend-1.c really should not need any unsigned extension operations when the optimizers are used. For Arm and thumb2 that is indeed the case, but for thumb1 code it gets more complicated as there are too many instructions for combine to look at. For thumb1 we end up with two redundant zero_extend patterns which are not removed: the first after the subtract instruction and the second of the final boolean result. We can partially fix this (for the second case above) by adding a new split pattern for LEU and GEU patterns which works because the two instructions for the [LG]EU pattern plus the redundant extension instruction are combined into a single insn, which we can then split using the 3->2 method back into the two insns of the [LG]EU sequence. Because we're missing the optimization for all thumb1 cases (not just those architectures with UXTB), I've adjusted the testcase to detect all the idioms that we might use for zero-extending a value, namely:
  UXTB
  AND ...#255 (in thumb1 this would require a register to hold 255)
  LSL ... #24; LSR ... #24
but I've also marked this test as XFAIL for thumb1 because we can't yet eliminate the first of the two extend instructions. gcc/ * config/arm/thumb1.md (split patterns for GEU and LEU): New. gcc/testsuite: * gcc.target/arm/unsigned-extend-1.c: Expand check for any insn suggesting a zero-extend. XFAIL for thumb1 code.
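Illustration (not part of the commit): the three zero-extension idioms listed above, written as C on 32-bit values so their equivalence is easy to see. The function names are hypothetical; which instruction sequence the compiler actually picks depends on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Narrowing conversion, typically a UXTB where available.  */
uint32_t zext_cast (uint32_t x) { return (uint8_t) x; }

/* Mask with 255, the AND ...#255 idiom.  */
uint32_t zext_and (uint32_t x) { return x & 255u; }

/* Shift left then logical shift right by 24, the LSL/LSR idiom.  */
uint32_t zext_shifts (uint32_t x) { return (x << 24) >> 24; }

/* Self-check: all three keep only the low byte.  */
void
check_zext (void)
{
  uint32_t x = 0x12345678u;
  assert (zext_cast (x) == 0x78u);
  assert (zext_and (x) == 0x78u);
  assert (zext_shifts (x) == 0x78u);
}
```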
-
Uros Bizjak authored
This reverts commit f1c30c62.
-
Uros Bizjak authored
Reverse negative logic in !a ? b : c to become a ? c : b. No functional changes. gcc/ChangeLog: * combine.cc (distribute_notes): Reverse negative logic in ternary operators.
-
Uros Bizjak authored
The combine pass is trying to combine:

Trying 16, 22, 21 -> 23:
   16: r104:QI=flags:CCNO>0
   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   21: r119:QI=flags:CCNO<=0
      REG_DEAD flags:CCNO
   23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
      REG_DEAD r120:QI
      REG_DEAD r119:QI
      REG_UNUSED flags:CC

and creates the following two insn sequence:

modifying insn i2    22: r104:QI=flags:CCNO>0
      REG_DEAD flags:CC
deferring rescan insn with uid = 22.
modifying insn i3    23: r110:QI=flags:CCNO<=0
      REG_DEAD flags:CC
deferring rescan insn with uid = 23.

where the REG_DEAD note in i2 is not correct, because the flags register is still referenced in i3. In try_combine() megafunction, we have this part:

--cut here--
    /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
    if (i3notes)
      distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
                        elim_i2, elim_i1, elim_i0);
    if (i2notes)
      distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
                        elim_i2, elim_i1, elim_i0);
    if (i1notes)
      distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
                        elim_i2, local_elim_i1, local_elim_i0);
    if (i0notes)
      distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
                        elim_i2, elim_i1, local_elim_i0);
    if (midnotes)
      distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
                        elim_i2, elim_i1, elim_i0);
--cut here--

where the compiler distributes REG_UNUSED note from i2:

   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
      REG_UNUSED flags:CC

via distribute_notes() using the following:

--cut here--
          /* Otherwise, if this register is used by I3, then this register
             now dies here, so we must put a REG_DEAD note here unless there
             is one already.  */
          else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3))
                   && ! (REG_P (XEXP (note, 0))
                         ? find_regno_note (i3, REG_DEAD,
                                            REGNO (XEXP (note, 0)))
                         : find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
            {
              PUT_REG_NOTE_KIND (note, REG_DEAD);
              place = i3;
            }
--cut here--

Flags register is used in I3, but there already is a REG_DEAD note in I3. The above condition doesn't trigger and continues in the "else" part where REG_DEAD note is put to I2. The proposed solution corrects the above logic to trigger every time the register is referenced in I3, avoiding the "else" part.

PR rtl-optimization/118739

gcc/ChangeLog:
* combine.cc (distribute_notes) <case REG_UNUSED>: Correct the logic when the register is used by I3.

gcc/testsuite/ChangeLog:
* gcc.target/i386/pr118739.c: New test.
-
Martin Jambor authored
Since we construct arithmetic jump functions even when there is a type conversion in between the operation encoded in the jump function and when it is passed in a call argument, the IPA propagation phase must also perform the operation and conversion in two steps. IPA-VR had actually been doing it even before for binary operations but, as PR 118756 exposes, not in the case of unary operations. This patch adds the necessary step to rectify that. Like in the scalar constant case, we depend on expr_type_first_operand_type_p to determine the type of the result of the arithmetic operation. On top of this, the patch special-cases ABSU_EXPR because it looks useful and so that the PR testcase exercises the added code-path. This seems most appropriate for stage 4; long term we should probably stream the types, probably after also encoding them with a string of expr_eval_op rather than what we have today. A check for expr_type_first_operand_type_p was also missing in the handling of binary ops and the intermediate value_range was initialized with a wrong type, so I also fixed this. gcc/ChangeLog: 2025-02-24 Martin Jambor <mjambor@suse.cz> PR ipa/118785 * ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): Handle non-conversion unary operations separately before doing any conversions. Check expr_type_first_operand_type_p for non-unary operations too. Fix type of op_res. gcc/testsuite/ChangeLog: 2025-02-24 Martin Jambor <mjambor@suse.cz> PR ipa/118785 * g++.dg/lto/pr118785_0.C: New test.
-
Richard Biener authored
We are detecting a cycle as double reduction where the inner loop cycle has extra out-of-loop uses. This clashes at least with assumptions from the SLP discovery code which says the cycle isn't reachable from another SLP instance. It also was not intended to support this case, in fact with GCC 14 we seem to generate wrong code here. PR tree-optimization/119057 * tree-vect-loop.cc (check_reduction_path): Add argument specifying whether we're analyzing the inner loop of a double reduction. Do not allow extra uses outside of the double reduction cycle in this case. (vect_is_simple_reduction): Adjust. * gcc.dg/vect/pr119057.c: New testcase.
-
Richard Biener authored
odr_types_equivalent_p can end up using TYPE_PRECISION on vector types which is a no-go. The following instead uses TYPE_VECTOR_SUBPARTS for vector types so we also end up comparing the number of vector elements. PR ipa/119067 * ipa-devirt.cc (odr_types_equivalent_p): Check TYPE_VECTOR_SUBPARTS for vectors. * g++.dg/lto/pr119067_0.C: New testcase. * g++.dg/lto/pr119067_1.C: Likewise.
-
Andre Vehreschild authored
Fix a regression where adding a temporary variable inserted a copy of the argument to the elemental function. That copy was then later used to free allocated memory, but the freeing was not tracked in the source array correctly. PR fortran/118747 gcc/fortran/ChangeLog: * trans-array.cc (gfc_trans_array_ctor_element): Remove copy to temporary variable. * trans-expr.cc (gfc_conv_procedure_call): Use references to array members instead of copies when freeing after use. Formatting fix. gcc/testsuite/ChangeLog: * gfortran.dg/alloc_comp_auto_array_4.f90: New test.
-
GCC Administrator authored
-
- Mar 02, 2025
-
-
Jeff Law authored
I'm not sure if I goof'd this or if I merely upstreamed someone else's goof. Either way the long branch code isn't working correctly. We were using 'n' as the output modifier to negate the condition. But 'n' has a special meaning elsewhere, so when presented with a condition rather than what was expected, boom, the compiler ICE'd. Thankfully there are only a few places where we were using %n which I turned into %r. The BZ entry includes a good testcase, it just takes a long time to compile as it's trying to create the out-of-range scenario. I'm not including the testcase due to how long it takes, but I did test it locally to ensure it's working properly now. I'm sure that with a little bit of work I could create a testcase that worked before and fails with the trunk (by taking advantage of the fuzziness in length computations). So I'm going to consider this a regression. Will push to the trunk after pre-commit testing does its thing. PR target/118934 gcc/ * config/riscv/corev.md (cv_branch): Adjust output template. (branch): Likewise. * config/riscv/riscv.md (branch): Likewise. * config/riscv/riscv.cc (riscv_asm_output_opcode): Handle 'r' rather than 'n'.
-
Gaius Mulley authored
This patch fixes an ICE which occurs when a FOR statement attempts to use an undeclared variable as its iterator. gcc/m2/ChangeLog: PR modula2/119088 * gm2-compiler/M2SymInit.mod (ConfigSymInit): Reimplement to defensively check for NulSym type. gcc/testsuite/ChangeLog: PR modula2/119088 * gm2/pim/fail/tinyfor4.mod: New test. Signed-off-by:
Gaius Mulley <gaiusmod2@gmail.com>
-
Sandra Loosemore authored
gcc/fortran/ChangeLog * intrinsic.texi: Fix inconsistent capitalization of argument names and other minor copy-editing.
-