  1. Mar 04, 2025
    • __builtin_bswapXX: improve docs · 5452b50a
      Oscar Gustafsson authored
      gcc/ChangeLog:
      
      	* doc/extend.texi: Improve example for __builtin_bswap16.
      5452b50a
    • Break false dependency chain on Zen5 · 8c4a00f9
      Jan Hubicka authored
      Some Zen5 variants have a false dependency on the tzcnt, blsi, blsr and
      blsmsk instructions.  This can be tested with the following benchmark:
      
      jh@shroud:~> cat ee.c
      int
      main()
      {
             int a = 10;
             int b = 0;
             for (int i = 0; i < 1000000000; i++)
             {
      #ifdef BREAK
                     asm volatile ("xor %0, %0": "=r" (b));
      #endif
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
                     asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
             }
             return 0;
      }
      jh@shroud:~> cat bmk.sh
      gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 ; time ./a.out
      jh@shroud:~> sh bmk.sh tzcnt
      
      real    0m0.886s
      user    0m0.886s
      sys     0m0.000s
      
      real    0m0.886s
      user    0m0.886s
      sys     0m0.000s
      
      jh@shroud:~> sh bmk.sh blsi
      
      real    0m0.979s
      user    0m0.979s
      sys     0m0.000s
      
      real    0m2.418s
      user    0m2.418s
      sys     0m0.000s
      
      jh@shroud:~> sh bmk.sh blsr
      
      real    0m0.986s
      user    0m0.986s
      sys     0m0.000s
      
      real    0m2.422s
      user    0m2.421s
      sys     0m0.000s
      jh@shroud:~> sh bmk.sh blsmsk
      
      real    0m0.973s
      user    0m0.973s
      sys     0m0.000s
      
      real    0m2.422s
      user    0m2.422s
      sys     0m0.000s
      
      We already have a tunable that controls tzcnt together with lzcnt and popcnt.
      Since it seems that only tzcnt is affected, I added a new tunable to control
      tzcnt only.  I also added splitters for blsi/blsr/blsmsk, implemented
      analogously to the existing splitter for lzcnt.

      The patch is neutral on SPEC.  We produce blsi and blsr in some internal loops,
      but they usually have the same destination as source.  However, it is good to
      break the dependency chain to avoid pathological cases and it is quite cheap
      overall, so I think we want to enable this for generic.  I will send a followup
      patch for this.
      
      Bootstrapped/regtested x86_64-linux, will commit it shortly.
      
      gcc/ChangeLog:
      
      	* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro.
      	(TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro.
      	* config/i386/i386.md (*bmi_blsi_<mode>): Add splitter for false
      	dependency.
      	(*bmi_blsi_<mode>_ccno): Add splitter for false dependency.
      	(*bmi_blsi_<mode>_falsedep): New pattern.
      	(*bmi_blsmsk_<mode>): Add splitter for false dependency.
      	(*bmi_blsmsk_<mode>_falsedep): New pattern.
      	(*bmi_blsr_<mode>): Add splitter for false dependency.
      	(*bmi_blsr_<mode>_cmp): Add splitter for false dependency
      	(*bmi_blsr_<mode>_cmp_falsedep): New pattern.
      	* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New tune.
      	(X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/blsi.c: New test.
      	* gcc.target/i386/blsmsk.c: New test.
      	* gcc.target/i386/blsr.c: New test.
      8c4a00f9
    • Fortran: Fix gimplification error on assignment to pointer [PR103391] · 04909c7e
      Andre Vehreschild authored
      	PR fortran/103391
      
      gcc/fortran/ChangeLog:
      
      	* trans-expr.cc (gfc_trans_assignment_1): Do not use poly assign
      	for pointer arrays on lhs (as it is done for allocatables
      	already).
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/assign_12.f90: New test.
      04909c7e
    • Make ix86_macro_fusion_pair_p and ix86_fuse_mov_alu_p match current CPUs · c84be624
      Jan Hubicka authored
      The current implementation of fusion predicates misses some common
      fusion cases on zen and more recent cores.  I added knobs for the
      individual conditionals we test.

       1) I split checks for fusing ALU with conditional operands when the ALU
          has a memory operand.  This seems to be supported by zen3+ and by
          tigerlake and cooperlake (according to Agner Fog's manual).

       2) znver4 and 5 support fusion of ALU and conditional even if the ALU has
          memory and immediate operands.
          This seems to be relatively important, enabling 25% more fusions on
          gcc bootstrap.

       3) No CPU supports fusing when the ALU contains IP relative memory
          references.  I added a separate knob so we do not forget about this if
          this gets supported later.

      The patch does not solve the limitation of sched that fuse pairs must be
      adjacent on input and the first operation must be single-set.  Fixing
      single-set is easy (I have a separate patch for this); for non-adjacent
      pairs we need bigger surgery.

      To verify what the CPU really does I made a simple test script.
      
      jh@ryzen3:~> cat fuse-test.c
              int b;
              const int z = 0;
              const int o = 1;
              int
      main()
      {
              int a = 1000000000;
              int b;
              int z = 0;
              int o = 1;
              asm volatile ("\n"
      ".L1234:\n"
              "nop\n"
              "subl   %3, %0\n"
      
              "movl %0, %1\n"
              "cmpl     %2, %1\n"
              "movl %0, %1\n"
              "test %1, %1\n"
      
              "nop\n"
              "jne    .L1234":"=a"(a),
              "=m"(b)
              "=r"(b)
              :
              "m"(z),
              "m"(o),
              "i"(0),
              "i"(1),
              "0"(a)
                      );
      }
      jh@ryzen3:~> cat fuse-test.sh
      EVENT=ex_ret_fused_instr
      dotest()
      {
      gcc -O2  fuse-test.c $* -o fuse-cmp-imm-mem-nofuse
      perf stat -e $EVENT ./fuse-cmp-imm-mem-nofuse  2>&1 | grep $EVENT
      gcc -O2 fuse-test.c -DFUSE $* -o fuse-cmp-imm-mem-fuse
      perf stat  -e $EVENT ./fuse-cmp-imm-mem-fuse 2>&1 | grep $EVENT
      }
      
      echo ALU with immediate
      dotest
      echo ALU with memory
      dotest -D MEM
      echo ALU with IP relative memory
      dotest -D MEM -D IPRELATIVE
      echo CMP with immediate
      dotest -D CMP
      echo CMP with memory
      dotest -D CMP -D MEM
      echo CMP with memory and immediate
      dotest -D CMP -D MEMIMM
      echo CMP with IP relative memory
      dotest -D CMP -D MEM -D IPRELATIVE
      echo TEST
      dotest -D TEST
      
      On zen5 I get:
      ALU with immediate
                  20,345      ex_ret_fused_instr:u
           1,000,020,278      ex_ret_fused_instr:u
      ALU with memory
                  20,367      ex_ret_fused_instr:u
           1,000,020,290      ex_ret_fused_instr:u
      ALU with IP relative memory
                  20,395      ex_ret_fused_instr:u
                  20,403      ex_ret_fused_instr:u
      CMP with immediate
                  20,369      ex_ret_fused_instr:u
           1,000,020,301      ex_ret_fused_instr:u
      CMP with memory
                  20,314      ex_ret_fused_instr:u
           1,000,020,341      ex_ret_fused_instr:u
      CMP with memory and immediate
                  20,372      ex_ret_fused_instr:u
           1,000,020,266      ex_ret_fused_instr:u
      CMP with IP relative memory
                  20,382      ex_ret_fused_instr:u
                  20,369      ex_ret_fused_instr:u
      TEST
                  20,346      ex_ret_fused_instr:u
           1,000,020,301      ex_ret_fused_instr:u
      
      IP relative memory seems to not be documented.
      
      On zen3/4 I get:
      
      ALU with immediate
                  20,263      ex_ret_fused_instr:u
           1,000,020,051      ex_ret_fused_instr:u
      ALU with memory
                  20,255      ex_ret_fused_instr:u
           1,000,020,056      ex_ret_fused_instr:u
      ALU with IP relative memory
                  20,253      ex_ret_fused_instr:u
                  20,266      ex_ret_fused_instr:u
      CMP with immediate
                  20,264      ex_ret_fused_instr:u
           1,000,020,052      ex_ret_fused_instr:u
      CMP with memory
                  20,253      ex_ret_fused_instr:u
           1,000,019,794      ex_ret_fused_instr:u
      CMP with memory and immediate
                  20,260      ex_ret_fused_instr:u
                  20,264      ex_ret_fused_instr:u
      CMP with IP relative memory
                  20,258      ex_ret_fused_instr:u
                  20,256      ex_ret_fused_instr:u
      TEST
                  20,261      ex_ret_fused_instr:u
           1,000,020,048      ex_ret_fused_instr:u
      
      zen1 and 2 get:
      
      ALU with immediate
                  21,610      ex_ret_fus_brnch_inst:u
                  21,697      ex_ret_fus_brnch_inst:u
      ALU with memory
                  21,479      ex_ret_fus_brnch_inst:u
                  21,747      ex_ret_fus_brnch_inst:u
      ALU with IP relative memory
                  21,623      ex_ret_fus_brnch_inst:u
                  21,684      ex_ret_fus_brnch_inst:u
      CMP with immediate
                  21,708      ex_ret_fus_brnch_inst:u
           1,000,021,288      ex_ret_fus_brnch_inst:u
      CMP with memory
                  21,689      ex_ret_fus_brnch_inst:u
           1,000,004,270      ex_ret_fus_brnch_inst:u
      CMP with memory and immediate
                  21,604      ex_ret_fus_brnch_inst:u
                  21,671      ex_ret_fus_brnch_inst:u
      CMP with IP relative memory
                  21,589      ex_ret_fus_brnch_inst:u
                  21,602      ex_ret_fus_brnch_inst:u
      TEST
                  21,600      ex_ret_fus_brnch_inst:u
           1,000,021,233      ex_ret_fus_brnch_inst:u
      
      I tested the patch on zen3 and zen5 with spec2k17 and it seems neutral;
      however, the number of fusions does go up.
      
      Bootstrapped/regtested x86_64-linux, I plan to commit it tomorrow.
      
      Honza
      
      gcc/ChangeLog:
      
      	* config/i386/i386.h (TARGET_FUSE_ALU_AND_BRANCH_MEM): New macro.
      	(TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM): New macro.
      	(TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New macro.
      	* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Support
      	non-single-set.
      	(ix86_macro_fusion_pair_p): Allow ALU which only clobbers;
      	be more careful about immediates; check TARGET_FUSE_ALU_AND_BRANCH_MEM,
      	TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM, TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE;
      	verify that we never use unsigned checks with inc/dec.
      	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): New tune.
      	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM): New tune.
      	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM_IMM): New tune.
      	(X86_TUNE_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New tune.
      c84be624
    • c++: ICE with RANGE_EXPR and array init [PR109431] · 173cf7c9
      Marek Polacek authored
      
      We crash because we generate
      
        {[0 ... 1]={.low=0, .high=1}, [1]={.low=0, .high=1}}
      
      which output_constructor_regular_field doesn't want to see.  This
      happens since r9-1483: process_init_constructor_array can now create
      a RANGE_EXPR.  But the bug isn't in that patch; the problem is that
      build_vec_init doesn't handle RANGE_EXPRs.
      
      build_vec_init has a FOR_EACH_CONSTRUCTOR_ELT loop which populates
      const_vec.  In this case it loops over the elements of
      
        {[0 ... 1]={.low=0, .high=1}}
      
      but assumes that each element initializes one element.  So after the
      loop num_initialized_elts was 1, and then below:
      
                    HOST_WIDE_INT last = tree_to_shwi (maxindex);
                    if (num_initialized_elts <= last)
                      {
                        tree field = size_int (num_initialized_elts);
                        if (num_initialized_elts != last)
                          field = build2 (RANGE_EXPR, sizetype, field,
                                          size_int (last));
                        CONSTRUCTOR_APPEND_ELT (const_vec, field, e);
                      }
      
      we added the extra initializer.
      
      It seemed convenient to use range_expr_nelts like below.
      
      	PR c++/109431
      
      gcc/cp/ChangeLog:
      
      	* cp-tree.h (range_expr_nelts): Declare.
      	* init.cc (build_vec_init): If the CONSTRUCTOR's index is a
      	RANGE_EXPR, use range_expr_nelts to count how many elements
      	were initialized.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/init/array67.C: New test.
      
      Reviewed-by: Jason Merrill <jason@redhat.com>
      173cf7c9
    • aarch64: force operand to fresh register to avoid subreg issues [PR118892] · d883f323
      Tamar Christina authored
      When the input is already a subreg and we try to make a paradoxical
      subreg out of it for copysign this can fail if it violates the subreg
      relationship.
      
      Use force_lowpart_subreg instead of lowpart_subreg to then force the
      results to a register instead of ICEing.
      
      gcc/ChangeLog:
      
      	PR target/118892
      	* config/aarch64/aarch64.md (copysign<GPF:mode>3): Use
      	force_lowpart_subreg instead of lowpart_subreg.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/118892
      	* gcc.target/aarch64/copysign-pr118892.c: New test.
      d883f323
    • libstdc++: Remove stray comma in testing docs · ac16d6d7
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* doc/xml/manual/test.xml: Remove stray comma.
      	* doc/html/manual/test.html: Regenerate.
      ac16d6d7
    • Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976] · 78380fd7
      Richard Sandiford authored
      There was an embarrassing typo in the folding of BIT_NOT_EXPR for
      POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
      how that happened, but it might have been due to the way that
      ~x is implemented as -1 - x internally.
      
      gcc/
      	PR tree-optimization/118976
      	* fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR.
      	* config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function.
      	(aarch64_run_selftests): Run it.
      78380fd7
    • simplify-rtx: Fix up simplify_logical_relational_operation [PR119002] · 1ff01a88
      Richard Sandiford authored
      
      The following testcase is miscompiled on powerpc64le-linux starting with
      r15-6777.  During combine we see:
      
      (set (reg:SI 134)
          (ior:SI (ge:SI (reg:CCFP 128)
                  (const_int 0 [0]))
              (lt:SI (reg:CCFP 128)
                  (const_int 0 [0]))))
      
      The simplify_logical_relational_operation code (in its current form)
      was written with arithmetic rather than CC modes in mind.  Since CCFP
      is a CC mode, it fails the HONOR_NANS check, and so the function assumes
      that ge | lt => true.
      
      If one comparison is unsigned then it should be safe to assume that
      the other comparison is also unsigned, even for CC modes, since the
      optimisation checks that the comparisons are between the same operands.
      For the other cases, we can only safely fold comparisons of CC mode
      values if the result is always-true (15) or always-false (0).
      
      It turns out that the original testcase for PR117186, which ran at -O,
      was relying on the old behaviour for some of the functions.  It needs
      4-instruction combinations, and so -fexpensive-optimizations, to pass
      in its intended form.
      
      gcc/
      	PR rtl-optimization/119002
      	* simplify-rtx.cc
      	(simplify_context::simplify_logical_relational_operation): Handle
      	comparisons between CC values.  If there is no evidence that the
      	CC values are unsigned, restrict the fold to always-true or
      	always-false results.
      
      gcc/testsuite/
      	* gcc.c-torture/execute/ieee/pr119002.c: New test.
      	* gcc.target/aarch64/pr117186.c: Run at -O2 rather than -O.
      
      Co-authored-by: Jakub Jelinek <jakub@redhat.com>
      1ff01a88
    • testsuite: Add tests for already fixed PR [PR119071] · ccf9db9a
      Jakub Jelinek authored
      Uros' r15-7793 fixed this PR as well, I'm just committing tests
      from the PR so that it can be closed.
      
      2025-03-04  Jakub Jelinek  <jakub@redhat.com>
      
      	PR rtl-optimization/119071
      	* gcc.dg/pr119071.c: New test.
      	* gcc.c-torture/execute/pr119071.c: New test.
      ccf9db9a
    • Fortran: Prevent ICE when getting caf-token from abstract type [PR77872] · 5bd66483
      Andre Vehreschild authored
      	PR fortran/77872
      
      gcc/fortran/ChangeLog:
      
      	* trans-expr.cc (gfc_get_tree_for_caf_expr): Pick up token from
      	decl when it is present there for class types.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/coarray/class_1.f90: New test.
      5bd66483
    • Fortran: Reduce code complexity [PR77872] · ef605e10
      Andre Vehreschild authored
      	PR fortran/77872
      
      gcc/fortran/ChangeLog:
      
      	* trans-expr.cc (gfc_conv_procedure_call): Use attr instead of
      	doing type check and branching for BT_CLASS.
      ef605e10
    • tree-optimization/119096 - bogus conditional reduction vectorization · 10e4107d
      Richard Biener authored
      When we vectorize a .COND_ADD reduction and apply the single-use-def
      cycle optimization we can end up choosing the wrong else value for
      subsequent .COND_ADD.  The following rectifies this.
      
      	PR tree-optimization/119096
      	* tree-vect-loop.cc (vect_transform_reduction): Use the
      	correct else value for .COND_fn.
      
      	* gcc.dg/vect/pr119096.c: New testcase.
      10e4107d
    • RISC-V: Fix the test case bug-3.c failure · bfb9276f
      Pan Li authored
      
      The bug-3.c test checks for slli a[0-9]+, a[0-9]+, 33 to exercise
      big poly int handling.  However, the underlying insn may change to
      slli 1 + slli 32 under some optimizations.  Thus, update the asm check
      to a function body check that matches the slli 1 + slli 32 sequence.
      
      The following test suite passes with this patch:
      * The rv64gcv full regression test.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/bug-3.c: Update asm check to
      	function body check.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      bfb9276f
    • Daily bump. · 491c0b80
      GCC Administrator authored
      491c0b80
  2. Mar 03, 2025
    • Update .po files · 6fdc64ed
      Joseph Myers authored
      gcc/po/
      	* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po,
      	ja.po, ka.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po,
      	zh_CN.po, zh_TW.po: Update.
      
      libcpp/po/
      	* be.po, ca.po, da.po, de.po, el.po, eo.po, es.po, fi.po, fr.po,
      	id.po, ja.po, ka.po, nl.po, pt_BR.po, ro.po, ru.po, sr.po, sv.po,
      	tr.po, uk.po, vi.po, zh_CN.po, zh_TW.po: Update.
      6fdc64ed
    • Fortran: reject empty derived type with bind(C) attribute [PR101577] · f9f16b9f
      Harald Anlauf authored
      	PR fortran/101577
      
      gcc/fortran/ChangeLog:
      
      	* symbol.cc (verify_bind_c_derived_type): Generate error message
      	for derived type with no components in standard conformance mode,
      	indicating that this is a GNU extension.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/empty_derived_type.f90: Adjust dg-options.
      	* gfortran.dg/empty_derived_type_2.f90: New test.
      f9f16b9f
    • aarch64: Ignore target pragmas while defining intrinsics · 71355700
      Andrew Carlotti authored
      Refactor the switcher classes into two separate classes:
      
      - sve_alignment_switcher takes the alignment switching functionality,
        and is used only for ABI correctness when defining sve structure
        types.
      - aarch64_target_switcher takes the rest of the functionality of
        aarch64_simd_switcher and sve_switcher, and gates simd/sve specific
        parts upon the specified feature flags.
      
      Additionally, aarch64_target_switcher now adds dependencies of the
      specified flags (which adds +fcma and +bf16 to some intrinsic
      declarations), and unsets current_target_pragma.
      
      This last change fixes an internal bug where we would sometimes add a
      user specified target pragma (stored in current_target_pragma) on top of
      an internally specified target architecture while initialising
      intrinsics with `#pragma GCC aarch64 "arm_*.h"`.  As far as I can tell, this
      has no visible impact at the moment.  However, the unintended target
      feature combinations lead to unwanted behaviour in an under-development
      patch.
      
      This also fixes a missing Makefile dependency, which was due to
      aarch64-sve-builtins.o incorrectly depending on the undefined $(REG_H).
      The correct $(REGS_H) dependency is added to the switcher's new source
      location.
      
      gcc/ChangeLog:
      
      	* common/config/aarch64/aarch64-common.cc
      	(struct aarch64_extension_info): Add field.
      	(aarch64_get_required_features): New.
      	* config/aarch64/aarch64-builtins.cc
      	(aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
      	(aarch64_target_switcher::aarch64_target_switcher): ...this,
      	and extend to handle sve, nosimd and target pragmas.
      	(aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
      	(aarch64_target_switcher::~aarch64_target_switcher): ...this,
      	and extend to handle sve, nosimd and target pragmas.
      	(handle_arm_acle_h): Use aarch64_target_switcher.
      	(handle_arm_neon_h): Rename switcher and pass explicit flags.
      	(aarch64_general_init_builtins): Ditto.
      	* config/aarch64/aarch64-protos.h
      	(class aarch64_simd_switcher): Rename to...
      	(class aarch64_target_switcher): ...this, and add new members.
      	(aarch64_get_required_features): New prototype.
      	* config/aarch64/aarch64-sve-builtins.cc
      	(sve_switcher::sve_switcher): Delete
      	(sve_switcher::~sve_switcher): Delete
      	(sve_alignment_switcher::sve_alignment_switcher): New
      	(sve_alignment_switcher::~sve_alignment_switcher): New
      	(register_builtin_types): Use alignment switcher
      	(init_builtins): Rename switcher.
      	(handle_arm_neon_sve_bridge_h): Ditto.
      	(handle_arm_sme_h): Ditto.
      	(handle_arm_sve_h): Ditto, and use alignment switcher.
      	* config/aarch64/aarch64-sve-builtins.h
      	(class sve_switcher): Delete.
      	(class sme_switcher): Delete.
      	(class sve_alignment_switcher): New.
      	* config/aarch64/t-aarch64 (aarch64-builtins.o): Add $(REGS_H).
      	(aarch64-sve-builtins.o): Remove $(REG_H).
      71355700
    • arm: remove some redundant zero_extend ops on thumb1 · 2a502f9e
      Richard Earnshaw authored
      The code in gcc.target/unsigned-extend-1.c really should not need any
      unsigned extension operations when the optimizers are used.  For Arm
      and thumb2 that is indeed the case, but for thumb1 code it gets more
      complicated as there are too many instructions for combine to look at.
      For thumb1 we end up with two redundant zero_extend patterns which are
      not removed: the first after the subtract instruction and the second of
      the final boolean result.
      
      We can partially fix this (for the second case above) by adding a new
      split pattern for LEU and GEU patterns which work because the two
      instructions for the [LG]EU pattern plus the redundant extension
      instruction are combined into a single insn, which we can then split
      using the 3->2 method back into the two insns of the [LG]EU sequence.
      
      Because we're missing the optimization for all thumb1 cases (not just
      those architectures with UXTB), I've adjusted the testcase to detect all
      the idioms that we might use for zero-extending a value, namely:
      
             UXTB
             AND ...#255 (in thumb1 this would require a register to hold 255)
             LSL ... #24; LSR ... #24
      
      but I've also marked this test as XFAIL for thumb1 because we can't yet
      eliminate the first of the two extend instructions.
      
      gcc/
      	* config/arm/thumb1.md (split patterns for GEU and LEU): New.
      
      gcc/testsuite:
      	* gcc.target/arm/unsigned-extend-1.c: Expand check for any
      	insn suggesting a zero-extend.  XFAIL for thumb1 code.
      2a502f9e
    • Revert "combine: Reverse negative logic in ternary operator" · ebc6c54e
      Uros Bizjak authored
      This reverts commit f1c30c62.
      ebc6c54e
    • combine: Reverse negative logic in ternary operator · f1c30c62
      Uros Bizjak authored
      Reverse negative logic in !a ? b : c to become a ? c : b.
      
      No functional changes.
      
      gcc/ChangeLog:
      
      	* combine.cc (distribute_notes):
      	Reverse negative logic in ternary operators.
      f1c30c62
    • combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739] · a92dc3fe
      Uros Bizjak authored
      The combine pass is trying to combine:
      
      Trying 16, 22, 21 -> 23:
         16: r104:QI=flags:CCNO>0
         22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
            REG_UNUSED flags:CC
         21: r119:QI=flags:CCNO<=0
            REG_DEAD flags:CCNO
         23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
            REG_DEAD r120:QI
            REG_DEAD r119:QI
            REG_UNUSED flags:CC
      
      and creates the following two insn sequence:
      
      modifying insn i2    22: r104:QI=flags:CCNO>0
            REG_DEAD flags:CC
      deferring rescan insn with uid = 22.
      modifying insn i3    23: r110:QI=flags:CCNO<=0
            REG_DEAD flags:CC
      deferring rescan insn with uid = 23.
      
      where the REG_DEAD note in i2 is not correct, because the flags
      register is still referenced in i3.  In try_combine() megafunction,
      we have this part:
      
      --cut here--
          /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
          if (i3notes)
            distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
      			elim_i2, elim_i1, elim_i0);
          if (i2notes)
            distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
      			elim_i2, elim_i1, elim_i0);
          if (i1notes)
            distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
      			elim_i2, local_elim_i1, local_elim_i0);
          if (i0notes)
            distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
      			elim_i2, elim_i1, local_elim_i0);
          if (midnotes)
            distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
      			elim_i2, elim_i1, elim_i0);
      --cut here--
      
      where the compiler distributes REG_UNUSED note from i2:
      
         22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
            REG_UNUSED flags:CC
      
      via distribute_notes() using the following:
      
      --cut here--
      	  /* Otherwise, if this register is used by I3, then this register
      	     now dies here, so we must put a REG_DEAD note here unless there
      	     is one already.  */
      	  else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3))
      		   && ! (REG_P (XEXP (note, 0))
      			 ? find_regno_note (i3, REG_DEAD,
      					    REGNO (XEXP (note, 0)))
      			 : find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
      	    {
      	      PUT_REG_NOTE_KIND (note, REG_DEAD);
      	      place = i3;
      	    }
      --cut here--
      
      Flags register is used in I3, but there already is a REG_DEAD note in I3.
      The above condition doesn't trigger and continues in the "else" part where
      REG_DEAD note is put to I2.  The proposed solution corrects the above
      logic to trigger every time the register is referenced in I3, avoiding the
      "else" part.
      
      	PR rtl-optimization/118739
      
      gcc/ChangeLog:
      
      	* combine.cc (distribute_notes) <case REG_UNUSED>: Correct the
      	logic when the register is used by I3.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/i386/pr118739.c: New test.
      a92dc3fe
    • ipa-vr: Handle non-conversion unary ops separately from conversions (PR 118785) · d05b64bd
      Martin Jambor authored
      Since we construct arithmetic jump functions even when there is a
      type conversion in between the operation encoded in the jump function
      and when it is passed in a call argument, the IPA propagation phase
      must also perform the operation and conversion in two steps.  IPA-VR
      had actually been doing it even before for binary operations but, as
      PR 118756 exposes, not in the case of unary operations.  This patch
      adds the necessary step to rectify that.
      
      Like in the scalar constant case, we depend on
      expr_type_first_operand_type_p to determine the type of the result of
      the arithmetic operation.  On top of this, the patch special-cases
      ABSU_EXPR because it looks useful and so that the PR testcase exercises
      the added code-path.  This seems most appropriate for stage 4; long
      term we should probably stream the types, probably after also encoding
      them with a string of expr_eval_op rather than what we have today.
      
      A check for expr_type_first_operand_type_p was also missing in the
      handling of binary ops and the intermediate value_range was
      initialized with a wrong type, so I also fixed this.
      
      gcc/ChangeLog:
      
      2025-02-24  Martin Jambor  <mjambor@suse.cz>
      
      	PR ipa/118785
      
      	* ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): Handle non-conversion
      	unary operations separately before doing any conversions.  Check
      	expr_type_first_operand_type_p for non-unary operations too.  Fix type
      	of op_res.
      
      gcc/testsuite/ChangeLog:
      
      2025-02-24  Martin Jambor  <mjambor@suse.cz>
      
      	PR ipa/118785
      	* g++.dg/lto/pr118785_0.C: New test.
      d05b64bd
    • Richard Biener's avatar
      tree-optimization/119057 - bogus double reduction detection · 758de626
      Richard Biener authored
      We are detecting a cycle as double reduction where the inner loop
      cycle has extra out-of-loop uses.  This clashes at least with
      assumptions from the SLP discovery code which says the cycle
      isn't reachable from another SLP instance.  It also was not intended
      to support this case, in fact with GCC 14 we seem to generate wrong
      code here.
      
      	PR tree-optimization/119057
      	* tree-vect-loop.cc (check_reduction_path): Add argument
      	specifying whether we're analyzing the inner loop of a
      	double reduction.  Do not allow extra uses outside of the
      	double reduction cycle in this case.
      	(vect_is_simple_reduction): Adjust.
      
      	* gcc.dg/vect/pr119057.c: New testcase.
      758de626
    • Richard Biener's avatar
      ipa/119067 - bogus TYPE_PRECISION check on VECTOR_TYPE · f22e8916
      Richard Biener authored
      odr_types_equivalent_p can end up using TYPE_PRECISION on vector
      types which is a no-go.  The following instead uses TYPE_VECTOR_SUBPARTS
      for vector types so we also end up comparing the number of vector elements.
      
      	PR ipa/119067
      	* ipa-devirt.cc (odr_types_equivalent_p): Check
      	TYPE_VECTOR_SUBPARTS for vectors.
      
      	* g++.dg/lto/pr119067_0.C: New testcase.
      	* g++.dg/lto/pr119067_1.C: Likewise.
      f22e8916
    • Andre Vehreschild's avatar
      Fortran: Fix regression on double free on elemental function [PR118747] · 43c11931
      Andre Vehreschild authored
      Fix a regression where adding a temporary variable inserted a copy of the
      argument to the elemental function.  That copy was then later used to
      free allocated memory, but the freeing was not tracked in the source
      array correctly.
      
      	PR fortran/118747
      
      gcc/fortran/ChangeLog:
      
      	* trans-array.cc (gfc_trans_array_ctor_element): Remove copy to
      	temporary variable.
      	* trans-expr.cc (gfc_conv_procedure_call): Use references to
      	array members instead of copies when freeing after use.
      	Formatting fix.
      
      gcc/testsuite/ChangeLog:
      
      	* gfortran.dg/alloc_comp_auto_array_4.f90: New test.
      43c11931
    • GCC Administrator's avatar
      Daily bump. · 0163d505
      GCC Administrator authored
      0163d505
  3. Mar 02, 2025
    • Jeff Law's avatar
      [RISC-V][PR target/118934] Fix ICE in RISC-V long branch support · 67e824c2
      Jeff Law authored
      I'm not sure if I goof'd this or if I merely upstreamed someone else's goof.
      Either way the long branch code isn't working correctly.
      
      We were using 'n' as the output modifier to negate the condition.  But 'n' has
      a special meaning elsewhere, so when presented with a condition rather than
      what was expected, boom, the compiler ICE'd.
      
      Thankfully there's only a few places where we were using %n which I turned into
      %r.
      
      The BZ entry includes a good testcase, it just takes a long time to compile as
      it's trying to create the out-of-range scenario.  I'm not including the
      testcase due to how long it takes, but I did test it locally to ensure it's
      working properly now.
      
      I'm sure that with a little bit of work I could create a testcase that worked
      before and fails with the trunk (by taking advantage of the fuzziness in length
      computations).  So I'm going to consider this a regression.
      
      Will push to the trunk after pre-commit testing does its thing.
      
      	PR target/118934
      gcc/
      	* config/riscv/corev.md (cv_branch): Adjust output template.
      	(branch): Likewise.
      	* config/riscv/riscv.md (branch): Likewise.
      	* config/riscv/riscv.cc (riscv_asm_output_opcode): Handle 'r' rather
      	than 'n'.
      67e824c2
    • Gaius Mulley's avatar
      PR modula2/119088 ICE when for loop accesses an unknown variable as the iterator · 585aa406
      Gaius Mulley authored
      
      This patch fixes an ICE which occurs when a FOR statement attempts to
      use an undeclared variable as its iterator.
      
      gcc/m2/ChangeLog:
      
      	PR modula2/119088
      	* gm2-compiler/M2SymInit.mod (ConfigSymInit): Reimplement to
      	defensively check for NulSym type.
      
      gcc/testsuite/ChangeLog:
      
      	PR modula2/119088
      	* gm2/pim/fail/tinyfor4.mod: New test.
      
      Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
      585aa406
    • Sandra Loosemore's avatar
      Fortran: Small fixes in intrinsic.texi. · 43a9022a
      Sandra Loosemore authored
      gcc/fortran/ChangeLog
      	* intrinsic.texi: Fix inconsistent capitalization of argument
      	names and other minor copy-editing.
      43a9022a
    • Sandra Loosemore's avatar
      Fortran: Move "Standard" subheading in documentation [PR47928] · 43f2bc4a
      Sandra Loosemore authored
      As noted in the issue, the version of the standard an intrinsic was
      introduced in is usually not the second-most-important thing a user
      needs to know.  This patch moves it from near the beginning of each
      section towards the end, just ahead of "See also".
      
      gcc/fortran/ChangeLog
      	PR fortran/47928
      	* intrinsic.texi: Move the "Standard" subheading farther down.
      43f2bc4a
    • Sandra Loosemore's avatar
      Fortran: Rename/move "Syntax" subheading in documentation [PR47928] · 9edd165e
      Sandra Loosemore authored
      As suggested in the issue, it makes more sense to describe the function
      call argument syntax before talking about the arguments in the description.
      
      gcc/fortran/ChangeLog
      	PR fortran/47928
      	* gfortran.texi: Move all the "Syntax" subheadings ahead of
      	"Description", and rename to "Synopsis".
      	* intrinsic.texi: Likewise.
      9edd165e
    • Sandra Loosemore's avatar
      Fortran: Whitespace cleanup in documentation [PR47928] · 1f458cfc
      Sandra Loosemore authored
      This is a preparatory patch for the main changes requested in the issue.
      
      gcc/fortran/ChangeLog
      	PR fortran/47928
      	* intrinsic.texi: Put a blank line between "@item @emph{}"
      	subheadings, but not more than one.
      1f458cfc
    • Sandra Loosemore's avatar
      Fortran: Tidy subheadings in Fortran documentation [PR47928] · d8f5e1b0
      Sandra Loosemore authored
      This is a preparatory patch for the main documentation changes requested
      in the issue.
      
      gcc/fortran/ChangeLog
      	PR fortran/47928
      	* gfortran.texi: Consistently use "@emph{Notes}:" instead of
      	other spellings.
      	* intrinsic.texi: Likewise.  Also fix an inconsistent capitalization
      	and remove a redundant "Standard" entry.
      d8f5e1b0
    • Jakub Jelinek's avatar
      avr: Fix up avr_print_operand diagnostics [PR118991] · 047b7f9a
      Jakub Jelinek authored
      As can be seen in gcc/po/gcc.pot:
       #: config/avr/avr.cc:2754
       #, c-format
       msgid "bad I/O address 0x"
       msgstr ""
      
      exgettext couldn't retrieve the whole format string in this case,
      because it uses a macro in the middle.  output_operand_lossage
      is a c-format function though, so we can't use %wx to print HOST_WIDE_INT,
      and HOST_WIDE_INT_PRINT_HEX_PURE is on some hosts %lx, on others %llx
      and on others %I64x so isn't really translatable that way.
      
      As Joseph mentioned in the PR, there is no easy way around this
      but go through a temporary buffer, which the following patch does.
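The temporary-buffer approach can be sketched in plain C as follows. This is only an illustration under assumptions: `report_lossage` and `report_bad_io_address` are hypothetical stand-ins (they are not `output_operand_lossage` or `avr_print_operand`), and `PRIx64` plays the role of the host-dependent `HOST_WIDE_INT` format macro.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a translatable diagnostic routine.  The
   format string it receives must be one fixed literal so that a message
   extractor can see the whole text.  */
static void
report_lossage (char *out, size_t out_size, const char *fmt, const char *arg)
{
  snprintf (out, out_size, fmt, arg);
}

/* Hypothetical caller: format the host-dependent integer into a
   temporary buffer first, then hand it over with a plain %s.  */
static void
report_bad_io_address (char *out, size_t out_size, int64_t ival)
{
  char buf[32];
  snprintf (buf, sizeof buf, "%" PRIx64, (uint64_t) ival);
  report_lossage (out, out_size, "bad I/O address 0x%s", buf);
}
```

The message string now contains no macro in the middle, so it stays identical on every host.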
      
      2025-03-02  Jakub Jelinek  <jakub@redhat.com>
      
      	PR translation/118991
      	* config/avr/avr.cc (avr_print_operand): Print ival into
      	a temporary buffer and use %s in output_operand_lossage to make
      	the diagnostics translatable.
      047b7f9a
    • Filip Kastl's avatar
      gimple: sccopy: Prune removed statements from SCCs [PR117919] · 5349aa2a
      Filip Kastl authored
      
      While writing the sccopy pass I didn't realize that 'replace_uses_by ()' can
      remove portions of the CFG.  This happens when replacing arguments of some
      statement results in the removal of an EH edge.  Because of this sccopy can
      then work with GIMPLE statements that aren't part of the IR anymore.  In
      PR117919 this triggered an assertion within the pass which assumes that
      statements the pass works with are reachable.
      
      This patch tells the pass to notice when a statement isn't in the IR anymore
      and remove it from its worklist.
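The pruning idea can be sketched generically in C. This is not the code in gimple-ssa-sccopy.cc; `struct stmt`, its `in_ir` flag and `prune_removed` are hypothetical names used only to show a worklist being compacted so that entries no longer in the IR are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement record; in_ir is false once the statement has
   been removed from the IR.  */
struct stmt
{
  int id;
  bool in_ir;
};

/* Compact WORKLIST in place, keeping only statements that are still in
   the IR and preserving their order.  Returns the new length.  */
static size_t
prune_removed (struct stmt **worklist, size_t len)
{
  size_t kept = 0;
  for (size_t i = 0; i < len; i++)
    if (worklist[i]->in_ir)
      worklist[kept++] = worklist[i];
  return kept;
}
```

Running such a step after each replacement keeps later processing from touching statements that are no longer reachable.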
      
      	PR tree-optimization/117919
      
      gcc/ChangeLog:
      
      	* gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Prune
      	statements that 'replace_uses_by ()' removed.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/pr117919.C: New test.
      
      Signed-off-by: Filip Kastl <fkastl@suse.cz>
      5349aa2a
    • GCC Administrator's avatar
      Daily bump. · 88e620c8
      GCC Administrator authored
      88e620c8
  4. Mar 01, 2025
    • Gerald Pfeifer's avatar
      doc: Simplify description of *-*-freebsd* · 4fee152d
      Gerald Pfeifer authored
      gcc:
      	PR target/69374
      	* doc/install.texi (Specific, *-*-freebsd*): Simplify description.
      4fee152d
    • Jakub Jelinek's avatar
      ggc: Fix up ggc_internal_cleared_alloc_no_dtor [PR117047] · ff38712b
      Jakub Jelinek authored
      Apparently I got one of the !HAVE_ATTRIBUTE_ALIAS fallbacks wrong.
      
      It compiled with a warning:
      ../../gcc/ggc-common.cc: In function 'void* ggc_internal_cleared_alloc_no_dtor(size_t, void (*)(void*), size_t, size_t)':
      ../../gcc/ggc-common.cc:154:44: warning: unused parameter 'size' [-Wunused-parameter]
        154 | ggc_internal_cleared_alloc_no_dtor (size_t size, void (*f)(void *),
            |                                     ~~~~~~~^~~~
      and obviously didn't work right (always allocated 0-sized objects).
      
      Fixed thusly.
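The shape of the mistake can be shown with a reduced C sketch. The names `cleared_alloc`, `cleared_alloc_no_dtor` and `last_request` are hypothetical; only the four-parameter signature mirrors the one quoted in the warning above, and the allocator body is a simple `calloc` stand-in rather than the garbage collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Records the size of the most recent request, for checking only.  */
static size_t last_request;

/* Hypothetical stand-in for the real allocator: returns zeroed memory
   of SIZE bytes.  F, S and N are accepted but unused in this sketch.  */
static void *
cleared_alloc (size_t size, void (*f) (void *), size_t s, size_t n)
{
  (void) f;
  (void) s;
  (void) n;
  last_request = size;
  return calloc (1, size ? size : 1);
}

/* Fallback wrapper for hosts without alias attribute support.  The
   broken version passed S as the first argument, which left SIZE
   unused and requested the wrong number of bytes.  */
static void *
cleared_alloc_no_dtor (size_t size, void (*f) (void *), size_t s, size_t n)
{
  return cleared_alloc (size, f, s, n);
}
```

With the first argument forwarded correctly, the unused-parameter warning disappears and the requested size reaches the allocator.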
      
      2025-03-01  Jakub Jelinek  <jakub@redhat.com>
      
      	PR jit/117047
      	* ggc-common.cc (ggc_internal_cleared_alloc_no_dtor): Pass size
      	rather than s as the first argument to ggc_internal_cleared_alloc.
      ff38712b
    • Harald Anlauf's avatar
      Fortran: fix front-end memleak after failure during parsing of NULLIFY · f7db0263
      Harald Anlauf authored
      gcc/fortran/ChangeLog:
      
      	* match.cc (gfc_match_nullify): Free matched expression when
      	cleaning up.
      	* primary.cc (match_variable): Initialize result to NULL.
      f7db0263