  1. Oct 25, 2023
    • OpenACC 2.7: Implement self clause for compute constructs · 3a359638
      Chung-Lin Tang authored
      This patch implements the 'self' clause for compute constructs: parallel,
      kernels, and serial. This clause conditionally uses the local device
      (the host multi-core CPU) as the executing device of the compute region.
      
      The actual implementation of the "local device" device type inside libgomp
      (presumably using pthreads) is not yet complete, so the libgomp side is
      currently implemented exactly like host-fallback mode.  As of now, the
      clause therefore behaves essentially like the 'if' clause with the
      condition inverted.
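
A minimal usage sketch of the clause described above, assuming a compiler that accepts 'self' on 'parallel'. The function name sum_with_self and its parameters are hypothetical and not taken from the new tests. Without -fopenacc the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cstddef>

// Hypothetical helper: sums n values inside an OpenACC compute region.
// When run_locally is true, the 'self' clause asks for execution on the
// local device instead of an attached accelerator.
long sum_with_self(const int *data, std::size_t n, bool run_locally)
{
  long total = 0;
#pragma acc parallel self(run_locally) copyin(data[0:n]) reduction(+:total)
  {
#pragma acc loop reduction(+:total)
    for (std::size_t i = 0; i < n; ++i)
      total += data[i];
  }
  return total;
}
```

Since libgomp currently treats the local device like host fallback, both values of run_locally are expected to give the same result; only the executing device differs.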
      
      gcc/c/ChangeLog:
      
      	* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
      	(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
      	parameter, add parsing of self clause when compute_p is true.
      	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
      	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
      	(OACC_SERIAL_CLAUSE_MASK): Likewise.
      	(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
      	set compute_p argument to true.
      	* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.
      
      gcc/cp/ChangeLog:
      
      	* parser.cc (cp_parser_oacc_compute_clause_self): New function.
      	(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
      	parameter, add parsing of self clause when compute_p is true.
      	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
      	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
      	(OACC_SERIAL_CLAUSE_MASK): Likewise.
      	(cp_parser_oacc_compute): Adjust call to cp_parser_oacc_all_clauses to
      	set compute_p argument to true.
      	* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
      	* semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
      	with OMP_CLAUSE_IF case.
      
      gcc/fortran/ChangeLog:
      
      	* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
      	* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
      	(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
      	(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
      	(OACC_KERNELS_CLAUSES): Likewise.
      	(OACC_SERIAL_CLAUSES): Likewise.
      	(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
      	* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
      	clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
      	(gfc_split_omp_clauses): Add handling of self_expr field copy.
      
      gcc/ChangeLog:
      
      	* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
      	(gimplify_adjust_omp_clauses): Likewise.
      	* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code.
      	* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
      	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
      	* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
      	case.
      	(convert_local_omp_clauses): Likewise.
      	* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
      	* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
      	(omp_clause_code_name): Likewise.
      	* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.
      
      gcc/testsuite/ChangeLog:
      
      	* c-c++-common/goacc/self-clause-1.c: New test.
      	* c-c++-common/goacc/self-clause-2.c: New test.
      	* gfortran.dg/goacc/self.f95: New test.
      
      include/ChangeLog:
      
      	* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.
      
      libgomp/ChangeLog:
      
      	* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
      	GOACC_FLAG_LOCAL_DEVICE case.
      	* testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.
    • OpenMP/Fortran: Group handling of 'if' clause without and with modifier · fa68e04e
      Thomas Schwinge authored
      The 'if' clause with modifier was introduced in
      commit b4c3a85b (Subversion r242037)
      "Partial OpenMP 4.5 fortran support", but -- in some instances -- didn't place
      it next to the existing handling of 'if' clause without modifier.  Unify that;
      no change in behavior.
      
      	gcc/fortran/
      	* dump-parse-tree.cc (show_omp_clauses): Group handling of 'if'
      	clause without and with modifier.
      	* frontend-passes.cc (gfc_code_walker): Likewise.
      	* gfortran.h (gfc_omp_clauses): Likewise.
      	* openmp.cc (gfc_free_omp_clauses): Likewise.
    • RISC-V: Change MD attribute avl_type into avl_type_idx[NFC] · 5e714992
      Juzhe-Zhong authored
      Address kito's comments of AVL propagation patch.
      
      Change avl_type into avl_type_idx.
      
      No functionality change.
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-protos.h (vlmax_avl_type_p): New function.
      	* config/riscv/riscv-v.cc (vlmax_avl_type_p): Ditto.
      	* config/riscv/riscv-vsetvl.cc (get_avl): Adapt function.
      	* config/riscv/vector.md: Change avl_type into avl_type_idx.
    • c++: error with bit-fields and scoped enums [PR111895] · 6fa7284e
      Marek Polacek authored
      Here we issue a bogus error: invalid operands of types 'unsigned char:2'
      and 'int' to binary 'operator!=' when casting a bit-field of scoped enum
      type to bool.
      
      In build_static_cast_1, perform_direct_initialization_if_possible returns
      NULL_TREE, because the invented declaration T t(e) fails, which is
      correct.  So we go down to ocp_convert, which has code to deal with this
      case:
                /* We can't implicitly convert a scoped enum to bool, so convert
                   to the underlying type first.  */
                if (SCOPED_ENUM_P (intype) && (convtype & CONV_STATIC))
                  e = build_nop (ENUM_UNDERLYING_TYPE (intype), e);
      but the SCOPED_ENUM_P is false since intype is <unnamed-unsigned:2>.
      This could be fixed by using unlowered_expr_type.  But then
      c_common_truthvalue_conversion/CASE_CONVERT has a similar problem, and
      unlowered_expr_type is a C++-only function.
      
      Rather than adding a dummy unlowered_expr_type to C, I think we should
      follow [expr.static.cast]p3: "the lvalue-to-rvalue conversion is applied
      to the bit-field and the resulting prvalue is used as the operand of the
      static_cast."  There are no prvalue bit-fields, so the l-to-r conversion
      performed in decay_conversion will get us an expression whose type is the
      enum.
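
The effect of applying the lvalue-to-rvalue conversion first can be sketched in user code, under the assumption that reading the bit-field into a local of the enum type models what decay_conversion achieves internally. Mode, Flags and mode_is_set are hypothetical names, not the contents of scoped_enum12.C.

```cpp
// Hypothetical scoped enum and a struct that stores it in a 2-bit field.
enum class Mode : unsigned char { Off = 0, On = 1, Auto = 2 };

struct Flags {
  Mode mode : 2;  // bit-field whose declared type is a scoped enum
};

bool mode_is_set(const Flags &f)
{
  // Reading the bit-field yields a prvalue of type Mode (there are no
  // prvalue bit-fields), so the cast below sees the enum type rather than
  // the lowered 2-bit unsigned type.
  Mode value = f.mode;
  return static_cast<bool>(value);
}
```

With the fix, writing static_cast<bool>(f.mode) directly is expected to behave the same way as this two-step form.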
      
      	PR c++/111895
      
      gcc/cp/ChangeLog:
      
      	* typeck.cc (build_static_cast_1): Call decay_conversion.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/cpp0x/scoped_enum12.C: New test.
    • Daily bump. · 444a485f
      GCC Administrator authored
  2. Oct 24, 2023
    • modula2: tidyup M2Dependent.mod · 5dbcc40a
      Gaius Mulley authored
      
      This patch tidies up M2Dependent.mod by introducing a new procedure
      to initialize all fields of DependencyList.
      
      gcc/m2/ChangeLog:
      
      	* gm2-libs/M2Dependent.mod (InitDependencyList): New
      	procedure.
      	(CreateModule): Call InitDependencyList to initialize
      	all fields of DependencyList.
      
      Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
    • c++: non-dep array new-expr size [PR111929] · d80a26cc
      Patrick Palka authored
      This PR is another instance of NON_DEPENDENT_EXPR having acted as an
      "analysis barrier" for middle-end routines, and now that it's gone we're
      more prone to passing weird templated trees (that have a generic tree
      code) to middle-end routines which end up ICEing on such trees.
      
      In the testcase below the non-dependent array new-expr size 'x + 42' is
      expressed as an ordinary PLUS_EXPR, but whose operands have different
      types (since templated trees encode just the syntactic form of an
      expression devoid of e.g. implicit conversions).  This type incoherency
      triggers an ICE from size_binop in build_new_1 due to a wide_int assert
      that expects the operand types to have the same precision.
      
      This patch fixes this by replacing our piecemeal folding of 'size' in
      build_new_1 with a single call to cp_fully_fold (which is a no-op in a
      template context) once 'size' is built up.
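
A sketch of the kind of code involved, assuming a 'short' operand so that the two sides of 'x + 42' have different types inside the template. This is an illustration only and not the contents of non-dependent28.C; make_buffer is a hypothetical name.

```cpp
// The new-expression does not depend on the template parameter, so the
// front end processes 'x + 42' while still inside the template, where
// the implicit conversions of 'x' have not been applied yet.
template <typename>
int *make_buffer(short x)
{
  // Value-initialized array of x + 42 ints; the caller must delete[] it.
  return new int[x + 42]();
}
```

A call such as make_buffer<void>(8) allocates 50 zero-initialized elements.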
      
      	PR c++/111929
      
      gcc/cp/ChangeLog:
      
      	* init.cc (build_new_1): Use convert, build2, build3 and
      	cp_fully_fold instead of fold_convert, size_binop and
      	fold_build3 when building up 'size'.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/template/non-dependent28.C: New test.
    • c++: cp_stabilize_reference and non-dep exprs [PR111919] · 51f164f7
      Patrick Palka authored
      After the removal of NON_DEPENDENT_EXPR, cp_stabilize_reference (which
      used to just exit early for NON_DEPENDENT_EXPR) is now more prone to
      passing a weird templated tree to middle-end routines, which for the
      testcase below leads to a crash from contains_placeholder_p.  It seems
      the best fix is to just exit early when in a template context, like we
      do in the closely related function cp_save_expr.
      
      	PR c++/111919
      
      gcc/cp/ChangeLog:
      
      	* tree.cc (cp_stabilize_reference): Do nothing when
      	processing_template_decl.
      
      gcc/testsuite/ChangeLog:
      
      	* g++.dg/template/non-dependent27.C: New test.
    • libstdc++: Include cstdarg in freestanding · c1eee808
      Paul M. Bendixen authored
      
      P1642 lists cstdarg among the headers to be provided in full.
      This commit includes it along with cstdalign and cstdbool that were
      left out when updating in an earlier commit.
      
      libstdc++-v3/ChangeLog:
      
      	* include/Makefile.am: Move cstdarg, cstdalign and cstdbool to
      	freestanding.
      	* include/Makefile.in: Regenerate.
      
      Signed-off-by: Paul M. Bendixen <paulbendixen@gmail.com>
    • modula2: gcc/m2/gm2-libs/M2Dependent.mod initialize all record fields. · 23ddfa1b
      Gaius Mulley authored
      
      Initialize all sub fields within mptr.  Valgrind detected
      uninitialized fields in M2Dependent.mod.  CreateModule must ensure all
      sub fields are initialized.
      
      gcc/m2/ChangeLog:
      
      	* gm2-libs/M2Dependent.mod (CreateModule): Initialize all
      	dependency fields for DependencyList.
      
      Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
    • recog/reload: Remove old UNARY_P operand support · 1fa7bcfd
      Richard Sandiford authored
      reload and constrain_operands had some old code to look through unary
      operators.  E.g. an operand could be (sign_extend (reg X)), and the
      constraints would match the reg rather than the sign_extend.
      
      This was previously used by the MIPS port.  But relying on it was a
      recurring source of problems, so Eric and I removed it in the MIPS
      rewrite from ~20 years back.  I don't know of any other port that used it.
      
      Also, the constraints processing in LRA and IRA do not have direct
      support for these embedded operators, so I think it was only ever a
      reload-specific feature (and probably only a global/local+reload-specific
      feature, rather than IRA+reload).
      
      Keeping the checks caused problems for special memory constraints,
      leading to:
      
      	  /* A unary operator may be accepted by the predicate, but it
      	     is irrelevant for matching constraints.  */
      	  /* For special_memory_operand, there could be a memory operand inside,
      	     and it would cause a mismatch for constraint_satisfied_p.  */
      	  if (UNARY_P (op) && op == extract_mem_from_operand (op))
      	    op = XEXP (op, 0);
      
      But inline asms are another source of problems.  Asms don't have
      predicates, and so we can't use recog to decide whether a given change
      to an asm gives a valid match.  We instead rely on constrain_operands as
      something of a recog stand-in.  For an example like:
      
          void
          foo (int *ptr)
          {
            asm volatile ("%0" :: "r" (-*ptr));
          }
      
      any attempt to propagate the negation into the asm would be allowed,
      because it's the negated register that would be checked against the
      "r" constraint.  This would later lead to:
      
          error: invalid 'asm': invalid operand
      
      The same thing happened in gcc.target/aarch64/vneg_s.c with the
      upcoming late-combine pass.
      
      Rather than add more workarounds, it seemed better just to delete
      this code.
      
      gcc/
      	* recog.cc (constrain_operands): Remove UNARY_P handling.
      	* reload.cc (find_reloads): Likewise.
    • gcc: fix typo in comment in gcov-io.h · e6fdea82
      Jose E. Marchesi authored
      gcc/ChangeLog:
      
      	* gcov-io.h: Fix record length encoding in comment.
    • i386: Fine tune STV register conversion costs for -Os. · 99a6c106
      Roger Sayle authored
      The eagle-eyed may have spotted that my recent testcases for DImode shifts
      on x86_64 included -mno-stv in the dg-options.  This is because the
      Scalar-To-Vector (STV) pass currently transforms these shifts to use
      SSE vector operations, producing larger code even with -Os.  The issue
      is that compute_convert_gain currently underestimates the size of
      instructions required for interunit moves, which is corrected with the
      patch below.
      
      For the simple test case:
      
      unsigned long long shl1(unsigned long long x) { return x << 1; }
      
      without this patch, GCC -m32 -Os -mavx2 currently generates:
      
      shl1:	push   %ebp		 // 1 byte
      	mov    %esp,%ebp	 // 2 bytes
      	vmovq  0x8(%ebp),%xmm0	 // 5 bytes
      	pop    %ebp		 // 1 byte
      	vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes
      	vmovd  %xmm0,%eax	 // 4 bytes
      	vpextrd $0x1,%xmm0,%edx  // 6 bytes
      	ret			 // 1 byte  = 24 bytes total
      
      with this patch, we now generate the shorter
      
      shl1:	push   %ebp		// 1 byte
      	mov    %esp,%ebp	// 2 bytes
      	mov    0x8(%ebp),%eax	// 3 bytes
      	mov    0xc(%ebp),%edx	// 3 bytes
      	pop    %ebp		// 1 byte
      	add    %eax,%eax	// 2 bytes
      	adc    %edx,%edx	// 2 bytes
      	ret			// 1 byte  = 15 bytes total
      
      Benchmarking using CSiBE shows that this patch saves 1361 bytes
      when compiling with -m32 -Os, and saves 172 bytes when compiling
      with -Os.
      
      2023-10-24  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* config/i386/i386-features.cc (compute_convert_gain): Provide
      	more accurate values (sizes) for inter-unit moves with -Os.
    • ARC: Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER. · 35f4e952
      Roger Sayle authored
      This patch completes the ARC back-end's transition to using pre-reload
      splitters for SImode shifts and rotates on targets without a barrel
      shifter.  The core part is that the shift_si3 define_insn is no longer
      needed, as shifts and rotates that don't require a loop are split
      before reload, and then because shift_si3_loop is the only caller
      of output_shift, both can be significantly cleaned up and simplified.
      The output_shift function (Claudiu's "the elephant in the room") is
      renamed output_shift_loop, which handles just the four instruction
      zero-overhead loop implementations.
      
      Aside from the clean-ups, the user visible changes are much improved
      implementations of SImode shifts and rotates on affected targets.
      
      For the function:
      unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }
      
      GCC with -O2 -mcpu=em would previously generate:
      
      rotr_1: lsr_s r2,r0
              bmsk_s r0,r0,0
              ror     r0,r0
              j_s.d   [blink]
              or_s    r0,r0,r2
      
      with this patch, we now generate:
      
              j_s.d   [blink]
              ror     r0,r0
      
      For the function:
      unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }
      
      GCC with -O2 -mcpu=em would previously generate:
      
      rotr_31:
              mov_s   r2,r0   ;4
              asl_s r0,r0
              add.f 0,r2,r2
              rlc r2,0
              j_s.d   [blink]
              or_s    r0,r0,r2
      
      with this patch we now generate an add.f followed by an adc:
      
      rotr_31:
              add.f   r0,r0,r0
              j_s.d   [blink]
              add.cs  r0,r0,1
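
Why the two-instruction sequence is a rotate can be sketched in portable code. This is a model of the arithmetic only, not of the ARC back end; rotl1_add_carry is a hypothetical name and the carry flag is modelled as an ordinary variable.

```cpp
#include <cstdint>

// Reference form from the example above: rotate right by 31, which is the
// same as rotate left by 1.
std::uint32_t rotr_31(std::uint32_t x) { return (x >> 31) | (x << 1); }

// Model of "add.f r0,r0,r0" followed by "add.cs r0,r0,1".
std::uint32_t rotl1_add_carry(std::uint32_t x)
{
  std::uint32_t carry = x >> 31;  // bit 31 is what add.f moves into carry
  std::uint32_t sum = x + x;      // doubling shifts left by one (mod 2^32)
  return sum + carry;             // add 1 only when the carry was set
}
```

Doubling leaves bit 0 clear, so adding the carry places the old bit 31 into bit 0 without disturbing the other bits.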
      
      Shifts by constants requiring a loop have been improved for even counts
      by performing two operations in each iteration:
      
      int shl10(int x) { return x >> 10; }
      
      Previously looked like:
      
      shl10:	mov.f lp_count, 10
              lpnz    2f
              asr r0,r0
              nop
      2:      # end single insn loop
              j_s     [blink]
      
      And now becomes:
      
      shl10:
              mov     lp_count,5
              lp      2f
              asr     r0,r0
              asr     r0,r0
      2:      # end single insn loop
              j_s     [blink]
      
      So emulating ARC's SWAP on architectures that don't have it:
      
      unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }
      
      previously required 10 instructions and ~70 cycles:
      
      rotr_16:
              mov_s   r2,r0   ;4
              mov.f lp_count, 16
              lpnz    2f
              add r0,r0,r0
              nop
      2:      # end single insn loop
              mov.f lp_count, 16
              lpnz    2f
              lsr r2,r2
              nop
      2:      # end single insn loop
              j_s.d   [blink]
              or_s    r0,r0,r2
      
      now becomes just 4 instructions and ~18 cycles:
      
      rotr_16:
              mov     lp_count,8
              lp      2f
              ror     r0,r0
              ror     r0,r0
      2:      # end single insn loop
              j_s     [blink]
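
The loop shape in the new rotr_16 listing can be modelled as follows: an even count of N is handled by N/2 iterations that each perform two single-bit rotates. This is a sketch of the idea rather than of output_shift_loop itself; ror1 and rotr_even_loop are hypothetical names.

```cpp
#include <cstdint>

// One single-bit rotate right, standing in for "ror r0,r0".
std::uint32_t ror1(std::uint32_t x) { return (x >> 1) | (x << 31); }

// Rotate right by an even count using count / 2 iterations of two rotates,
// mirroring "mov lp_count,8" with two "ror" instructions in the loop body.
std::uint32_t rotr_even_loop(std::uint32_t x, unsigned count)
{
  for (unsigned i = 0; i < count / 2; ++i) {
    x = ror1(x);
    x = ror1(x);
  }
  return x;
}

// Reference form from the example above.
std::uint32_t rotr_16(std::uint32_t x) { return (x >> 16) | (x << 16); }
```

Halving the trip count keeps the loop body free of the padding nop that the single-instruction loops needed.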
      
      2023-10-24  Roger Sayle  <roger@nextmovesoftware.com>
      	    Claudiu Zissulescu  <claziss@gmail.com>
      
      gcc/ChangeLog
      	* config/arc/arc-protos.h (output_shift): Rename to...
      	(output_shift_loop): Tweak API to take an explicit rtx_code.
      	(arc_split_ashl): Prototype new function here.
      	(arc_split_ashr): Likewise.
      	(arc_split_lshr): Likewise.
      	(arc_split_rotl): Likewise.
      	(arc_split_rotr): Likewise.
      	* config/arc/arc.cc (output_shift): Delete local prototype.  Rename.
      	(output_shift_loop): New function replacing output_shift to output
      	a zero overhead loop for SImode shifts and rotates on ARC targets
      	without barrel shifter (i.e. no hardware support for these insns).
      	(arc_split_ashl): New helper function to split *ashlsi3_nobs.
      	(arc_split_ashr): New helper function to split *ashrsi3_nobs.
      	(arc_split_lshr): New helper function to split *lshrsi3_nobs.
      	(arc_split_rotl): New helper function to split *rotlsi3_nobs.
      	(arc_split_rotr): New helper function to split *rotrsi3_nobs.
      	(arc_print_operand): Correct whitespace.
      	(arc_rtx_costs): Likewise.
      	(hwloop_optimize): Likewise.
      	* config/arc/arc.md (ANY_SHIFT_ROTATE): New define_code_iterator.
      	(define_code_attr insn): New code attribute to map to pattern name.
      	(<ANY_SHIFT_ROTATE>si3): New expander unifying previous ashlsi3,
      	ashrsi3 and lshrsi3 define_expands.  Adds rotlsi3 and rotrsi3.
      	(*<ANY_SHIFT_ROTATE>si3_nobs): New define_insn_and_split that
      	unifies the previous *ashlsi3_nobs, *ashrsi3_nobs and *lshrsi3_nobs.
      	We now call arc_split_<insn> in arc.cc to implement each split.
      	(shift_si3): Delete define_insn, all shifts/rotates are now split.
      	(shift_si3_loop): Rename to...
      	(<insn>si3_loop): define_insn to handle loop implementations of
      	SImode shifts and rotates, calling output_shift_loop for template.
      	(rotrsi3): Rename to...
      	(*rotrsi3_insn): define_insn for TARGET_BARREL_SHIFTER's ror.
      	(*rotlsi3): New define_insn_and_split to transform left rotates
      	into right rotates before reload.
      	(rotlsi3_cnt1): New define_insn_and_split to implement a left
      	rotate by one bit using an add.f followed by an adc.
      	* config/arc/predicates.md (shiftr4_operator): Delete.
    • testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c · 326a8c04
      Christophe Lyon authored
      The test was declaring 'int *carry;' and wrote to '*carry' without
      initializing 'carry' first, leading to an attempt to write at address
      zero, and a crash.
      
      Fix by declaring 'int carry;' and passing '&carry' instead of 'carry'
      as parameter.
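
The pattern behind the fix can be sketched without MVE intrinsics. The helper add_with_carry_out below is a hypothetical stand-in for an intrinsic that writes a carry through a pointer; it is not part of the test.

```cpp
#include <cstdint>

// Hypothetical stand-in for an intrinsic with a carry-out pointer argument.
std::uint32_t add_with_carry_out(std::uint32_t a, std::uint32_t b,
                                 unsigned *carry_out)
{
  std::uint32_t sum = a + b;
  *carry_out = (sum < a) ? 1u : 0u;  // unsigned wrap-around means carry
  return sum;
}

std::uint32_t fixed_caller(std::uint32_t a, std::uint32_t b,
                           unsigned &carry_seen)
{
  // The broken form declared 'unsigned *carry;' and passed 'carry', so the
  // callee wrote through a pointer that was never set up.  The fixed form
  // owns the storage and passes its address.
  unsigned carry = 0;
  std::uint32_t sum = add_with_carry_out(a, b, &carry);
  carry_seen = carry;
  return sum;
}
```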
      
      2023-09-08  Christophe Lyon  <christophe.lyon@linaro.org>
      
      	gcc/testsuite/
      	* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.
    • arc: Remove mpy_dest_reg_operand predicate · 2287fa29
      Claudiu Zissulescu authored
      
      The mpy_dest_reg_operand is just a wrapper for
      register_operand. Remove it.
      
      gcc/
      
      	* config/arc/arc.md (mulsi3_700): Update pattern.
      	(mulsi3_v2): Likewise.
      	* config/arc/predicates.md (mpy_dest_reg_operand): Remove it.
      
      Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
    • Improve factor_out_conditional_operation for conversions and constants · 0fc13e8c
      Andrew Pinski authored
      In the case of a NOP conversion (the precisions of the two types are equal),
      factoring out the conversion can be done even if int_fits_type_p returns
      false and even when the conversion is defined by a statement inside the
      conditional.  Since a NOP conversion performs no zero or sign extension,
      this is safe: the existing checks were meant to prevent an extra
      sign/zero extension from being moved away from its definition, and a
      no-op conversion is not one.
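
At the source level the transformation can be pictured as below, assuming 'unsigned' and 'int' have the same precision so that the cast is a NOP conversion. The function names are hypothetical and this is not the contents of phi-opt-39.c; the real transformation works on GIMPLE PHI nodes.

```cpp
// Before: the conversion is defined by a statement inside the conditional,
// and the other arm is a constant.
int select_before(bool cond, unsigned a)
{
  int result;
  if (cond)
    result = static_cast<int>(a);
  else
    result = 0;
  return result;
}

// After: the selection happens in the unsigned type and the single
// conversion is factored out of the conditional.
int select_after(bool cond, unsigned a)
{
  unsigned picked = cond ? a : 0u;
  return static_cast<int>(picked);
}
```

Because no bits are extended or truncated, both forms produce the same value for every input, including values of 'a' that do not fit in 'int'.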
      
      Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/104376
      	PR tree-optimization/101541
      	* tree-ssa-phiopt.cc (factor_out_conditional_operation):
      	Allow nop conversions even if it is defined by a statement
      	inside the conditional.
      
      gcc/testsuite/ChangeLog:
      
      	PR tree-optimization/101541
      	* gcc.dg/tree-ssa/phi-opt-39.c: New test.
    • match: Fix the `popcnt(a&b) + popcnt(a|b)` pattern for types [PR111913] · 452c4f32
      Andrew Pinski authored
      This pattern needs a little help on the gimple side to know what the
      type of the popcount should be.  For most builtins the return type is the
      same as the argument type, but that is not the case for popcount and some
      others.  When the pattern is used within another outer expression, genmatch
      also needs a slight hint that the return type is 'type' rather than the
      argument type.
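
The identity behind the pattern, and the type mismatch it has to cope with, can be sketched with the GCC builtin: the arguments are 'unsigned long long' while each popcount returns 'int'. The function names are hypothetical, and the sketch assumes the pattern rewrites the sum to popcount(X) + popcount(Y).

```cpp
// Left-hand side of the pattern: popcount(X & Y) + popcount(X | Y).
int popcount_and_or(unsigned long long a, unsigned long long b)
{
  return __builtin_popcountll(a & b) + __builtin_popcountll(a | b);
}

// Equivalent form: every set bit of a and of b is counted exactly once.
int popcount_each(unsigned long long a, unsigned long long b)
{
  return __builtin_popcountll(a) + __builtin_popcountll(b);
}
```

A bit set in both inputs is counted once by the AND and once by the OR, and a bit set in only one input is counted once by the OR, which is why the two sums agree.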
      
      Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      	PR tree-optimization/111913
      
      gcc/ChangeLog:
      
      	* match.pd (`popcount(X&Y) + popcount(X|Y)`): Add the resulting
      	type for popcount.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.c-torture/compile/fold-popcount-1.c: New test.
      	* gcc.dg/fold-popcount-8a.c: New test.
    • rtl-ssa: Avoid creating duplicated phis · 3e901615
      Richard Sandiford authored
      If make_uses_available was called twice for the same use,
      we could end up trying to create duplicate definitions for
      the same extended live range.
      
      gcc/
      	* rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Check
      	whether the requested phi already exists.
    • rtl-ssa: Don't insert after insns that can throw · d0eb4ace
      Richard Sandiford authored
      rtl_ssa::can_insert_after didn't handle insns that can throw.
      Fixing that avoids a regression with a later patch.
      
      gcc/
      	* rtl-ssa.h: Include cfgbuild.h.
      	* rtl-ssa/movement.h (can_insert_after): Replace is_jump with the
      	more comprehensive control_flow_insn_p.
    • rtl-ssa: Fix handling of deleted insns · c97b167e
      Richard Sandiford authored
      RTL-SSA queues up some invasive changes for later.  But sometimes
      the insns involved in those changes can be deleted by later
      optimisations, making the queued change unnecessary.  This patch
      checks for that case.
      
      gcc/
      	* rtl-ssa/changes.cc (function_info::perform_pending_updates): Check
      	whether an insn has been replaced by a note.
    • rtl-ssa: Fix null deref in first_any_insn_use · 50313dcd
      Richard Sandiford authored
      first_any_insn_use implicitly (but contrary to its documentation)
      assumed that there was at least one use.
      
      gcc/
      	* rtl-ssa/member-fns.inl (first_any_insn_use): Handle null
      	m_first_use.
    • i386: Avoid paradoxical subreg dests in vector zero_extend · 58de8e93
      Richard Sandiford authored
      For the V2HI -> V2SI zero extension in:
      
        typedef unsigned short v2hi __attribute__((vector_size(4)));
        typedef unsigned int v2si __attribute__((vector_size(8)));
        v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }
      
      ix86_expand_sse_extend would generate:
      
         (set (reg:V2HI 102)
              (const_vector:V2HI [(const_int 0 [0])
      			    (const_int 0 [0])]))
         (set (subreg:V8HI (reg:V2HI 101) 0)
              (vec_select:V8HI
                (vec_concat:V16HI (subreg:V8HI (reg/v:V2HI 99 [ x ]) 0)
                                  (subreg:V8HI (reg:V2HI 102) 0))
                (parallel [(const_int 0 [0])
                           (const_int 8 [0x8])
                           (const_int 1 [0x1])
                           (const_int 9 [0x9])
                           (const_int 2 [0x2])
                           (const_int 10 [0xa])
                           (const_int 3 [0x3])
                           (const_int 11 [0xb])])))
        (set (reg:V2SI 100)
             (subreg:V2SI (reg:V2HI 101) 0))
          (expr_list:REG_EQUAL (zero_extend:V2SI (reg/v:V2HI 99 [ x ])))
      
      But using (subreg:V8HI (reg:V2HI 101) 0) as the destination of
      the vec_select means that only the low 4 bytes of the destination
      are stored.  Only the lower half of reg 100 is well-defined.
      
      Things tend to happen to work if the register allocator ties reg 101
      to reg 100.  But it caused problems with the upcoming late-combine pass
      because we propagated the set of reg 100 into its uses.
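
The data flow of the extension can be modelled in scalar code: interleaving the 16-bit source lanes with zeros produces a result that is twice as wide as the source, so the destination has to hold all 8 bytes. This is a sketch of the idea only, not of ix86_expand_sse_extend; zero_extend_v2hi is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Model of a V2HI -> V2SI zero extension done by interleaving with zero.
std::array<std::uint32_t, 2>
zero_extend_v2hi(const std::array<std::uint16_t, 2> &x)
{
  // Interleave step: [x0, 0, x1, 0].  This is four 16-bit lanes (8 bytes),
  // twice the size of the 4-byte source.
  std::array<std::uint16_t, 4> interleaved = { x[0], 0, x[1], 0 };

  // Each low/high pair of 16-bit lanes forms one 32-bit lane.
  std::array<std::uint32_t, 2> out{};
  for (std::size_t i = 0; i < 2; ++i)
    out[i] = interleaved[2 * i]
             | (static_cast<std::uint32_t>(interleaved[2 * i + 1]) << 16);
  return out;
}
```

If only the first 4 bytes of the interleaved value were stored, the second 32-bit lane would be left undefined, which corresponds to the problem described above.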
      
      gcc/
      	* config/i386/i386-expand.cc (ix86_split_mmx_punpck): Allow the
      	destination to be wider than the sources.  Take the mode from the
      	first source.
      	(ix86_expand_sse_extend): Pass the destination directly to
      	ix86_split_mmx_punpck, rather than using a fresh register that
      	is half the size.
    • i386: Fix unprotected REGNO in aeswidekl_operation · cc477955
      Richard Sandiford authored
      I hit an ICE in aeswidekl_operation while testing the late-combine
      pass on x86.  The predicate tested REGNO without first testing REG_P.
      
      gcc/
      	* config/i386/predicates.md (aeswidekl_operation): Protect
      	REGNO check with REG_P.
    • aarch64: Define TARGET_INSN_COST · 21416caf
      Richard Sandiford authored
      This patch adds a bare-bones TARGET_INSN_COST.  See the comment
      in the patch for the rationale.
      
      Just to get a flavour for how much difference it makes, I tried
      compiling the testsuite with -Os -fno-schedule-insns{,2} and
      seeing what effect the patch had on the number of instructions.
      Very few tests changed, but all the changes were positive:
      
        Tests   Good    Bad   Delta    Best   Worst  Median
        =====   ====    ===   =====    ====   =====  ======
           19     19      0    -177     -52      -1      -4
      
      The change for -O2 was even smaller, but more mixed:
      
        Tests   Good    Bad   Delta    Best   Worst  Median
        =====   ====    ===   =====    ====   =====  ======
            6      3      3      -8      -9       6      -2
      
      There were no obvious effects on SPEC CPU2017.
      
      The patch is needed to avoid a regression with a later change.
      
      gcc/
      	* config/aarch64/aarch64.cc (aarch64_insn_cost): New function.
      	(TARGET_INSN_COST): Define.
    • aarch64: Avoid bogus atomics match · b632a516
      Richard Sandiford authored
      The non-LSE pattern aarch64_atomic_exchange<mode> comes before the
      LSE pattern aarch64_atomic_exchange<mode>_lse.  From a recog
      perspective, the only difference between the patterns is that
      the non-LSE one clobbers CC and needs a scratch.
      
      However, combine and RTL-SSA can both add clobbers to make a
      pattern match.  This means that if they try to rerecognise an
      LSE pattern, they could end up turning it into a non-LSE pattern.
      This patch adds a !TARGET_LSE test to avoid that.
      
      This is needed to avoid a regression with later patches.
      
      gcc/
      	* config/aarch64/atomics.md (aarch64_atomic_exchange<mode>): Require
      	!TARGET_LSE.
    • RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935] · b44d4ff7
      xuli authored
      Calling a vget/vset intrinsic without using its return value causes a
      crash, because in this case e.target is null.
      This patch should be backported to releases/gcc-13.
      
      	PR target/111935
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vector-builtins-bases.cc: fix bug.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/base/pr111935.c: New test.
    • libgcc: make heap-based trampolines conditional on libc presence · eaf75155
      Sergei Trofimovich authored
      To build `libc` for a target one needs to build `gcc` without `libc`
      support first. Commit r14-4823-g8abddb187b3348 "libgcc: support
      heap-based trampolines" added unconditional `libc` dependency and broke
      libc-less `gcc` builds.
      
      An example failure on `x86_64-unknown-linux-gnu`:
      
          $ mkdir -p /tmp/empty
          $ ../gcc/configure \
              --disable-multilib \
              --without-headers \
              --with-newlib \
              --enable-languages=c \
              --disable-bootstrap \
              --disable-gcov \
              --disable-threads \
              --disable-shared \
              --disable-libssp \
              --disable-libquadmath \
              --disable-libgomp \
              --disable-libatomic \
              --with-build-sysroot=/tmp/empty
          $ make
          ...
          /tmp/gb/./gcc/xgcc -B/tmp/gb/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -B/usr/local/x86_64-pc-linux-gnu/lib/ -isystem /usr/local/x86_64-pc-linux-gnu/include -isystem /usr/local/x86_64-pc-linux-gnu/sys-include --sysroot=/tmp/empty   -g -O2 -O2  -g -O2 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -I. -I. -I../.././gcc -I/home/slyfox/dev/git/gcc/libgcc -I/home/slyfox/dev/git/gcc/libgcc/. -I/home/slyfox/dev/git/gcc/libgcc/../gcc -I/home/slyfox/dev/git/gcc/libgcc/../include  -DHAVE_CC_TLS  -DUSE_TLS  -o heap-trampoline.o -MT heap-trampoline.o -MD -MP -MF heap-trampoline.dep  -c .../gcc/libgcc/config/i386/heap-trampoline.c -fvisibility=hidden -DHIDE_EXPORTS
          ../gcc/libgcc/config/i386/heap-trampoline.c:3:10: fatal error: unistd.h: No such file or directory
              3 | #include <unistd.h>
                |          ^~~~~~~~~~
          compilation terminated.
          make[2]: *** [.../gcc/libgcc/static-object.mk:17: heap-trampoline.o] Error 1
          make[2]: Leaving directory '/tmp/gb/x86_64-pc-linux-gnu/libgcc'
          make[1]: *** [Makefile:13307: all-target-libgcc] Error 2
      
      The change inhibits any heap-based trampoline code.
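
A minimal sketch of this kind of guard, assuming the build signals a missing libc through the `inhibit_libc` macro (it appears as `-Dinhibit_libc` in the failing command line above). The helper `trampoline_page_size` and the macro `HEAP_TRAMPOLINES_AVAILABLE` are hypothetical names used only for illustration; they are not taken from libgcc.

```c
/* Sketch only: compile the libc-dependent code out when no libc exists.
   trampoline_page_size and HEAP_TRAMPOLINES_AVAILABLE are hypothetical
   names, not libgcc symbols.  */
#ifndef inhibit_libc

#include <unistd.h>   /* only reachable when libc headers are installed */

#define HEAP_TRAMPOLINES_AVAILABLE 1

/* Stand-in for code that needs libc, here a page-size query.  */
long
trampoline_page_size (void)
{
  return sysconf (_SC_PAGESIZE);
}

#else

/* No libc: nothing from <unistd.h> is referenced, so the file still
   compiles in a libc-less bootstrap.  */
#define HEAP_TRAMPOLINES_AVAILABLE 0

#endif
```

With the guard in place, a first-stage compiler built with `--without-headers` never reaches the `#include <unistd.h>` line, which is the include that fails in the log above.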
      
      libgcc/
      
      	* config/aarch64/heap-trampoline.c: Disable when libc is not
      	present.
      	* config/i386/heap-trampoline.c: Ditto.
      eaf75155
    • Mark Harmstone's avatar
      Remove obsolete debugging formats from names list · 724badca
      Mark Harmstone authored
      	* opts.cc (debug_type_names): Remove stabs and xcoff.
      	(df_set_names): Adjust.
      724badca
    • Juzhe-Zhong's avatar
      RISC-V: Fix ICE of RTL CHECK on VSETVL PASS[PR111947] · 7b2984ad
      Juzhe-Zhong authored
      Fix an ICE on the demand info of the instruction vsetvli a5, 8.
      
      The AVL is const_int 8, which causes an ICE in the REGNO caller.
      
      Committed as it is an obvious fix.
      
      	PR target/111947
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_lcm_local_properties): Add REGNO check.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/vsetvl/pr111947.c: New test.
      7b2984ad
    • GCC Administrator's avatar
      Daily bump. · 9cf2e744
      GCC Administrator authored
      9cf2e744
  3. Oct 23, 2023
    • Lewis Hyatt's avatar
      libcpp: Improve the diagnostic for poisoned identifiers [PR36887] · cb05acdc
      Lewis Hyatt authored
      The PR requests an enhancement to the diagnostic issued for the use of a
      poisoned identifier. Currently, we show the location of the usage, but not
      the location which requested the poisoning, which would be helpful for the
      user if the decision to poison an identifier was made externally, such as
      in a library header.
      
      In order to output this information, we need to remember a location_t for
      each identifier that has been poisoned, and that data needs to be preserved
      as well in a PCH. One option would be to add a field to struct cpp_hashnode,
      but there is no convenient place to add it without increasing the size of
      the struct for all identifiers. Given this facility will be needed rarely,
      it seemed better to add a second hash map, which is handled PCH-wise the
      same as the current one in gcc/stringpool.cc. This hash map associates a new
      struct cpp_hashnode_extra with each identifier that needs one. Currently
      that struct only contains the new location_t, but it could be extended in
      the future if there is other ancillary data that may be convenient to put
      there for other purposes.
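
The side-table idea described above can be sketched as follows. This is an illustration under stated assumptions, not libcpp's code: the names `ident_extra`, `extra_table`, `record_poison`, and `poison_location` are hypothetical, locations are plain integers, and a small fixed-size open-addressing table stands in for the real hash map. The point it shows is that the common per-identifier node stays unchanged, while only poisoned identifiers pay for an extra record.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical extra record, kept outside the main identifier node.  */
struct ident_extra
{
  const char *name;        /* identifier spelling (not copied) */
  unsigned poisoned_loc;   /* stand-in for a location_t */
};

#define EXTRA_SLOTS 64     /* fixed size keeps the sketch short */
static struct ident_extra extra_table[EXTRA_SLOTS];

static unsigned
hash_name (const char *s)
{
  unsigned h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

/* Linear probing.  With CREATE nonzero, claim the first empty slot.
   Returns NULL if NAME is absent, or if the table is full.  */
static struct ident_extra *
extra_lookup (const char *name, int create)
{
  unsigned start = hash_name (name) % EXTRA_SLOTS;
  for (unsigned i = 0; i < EXTRA_SLOTS; i++)
    {
      struct ident_extra *e = &extra_table[(start + i) % EXTRA_SLOTS];
      if (e->name == NULL)
        {
          if (!create)
            return NULL;
          e->name = name;
          return e;
        }
      if (strcmp (e->name, name) == 0)
        return e;
    }
  return NULL;
}

/* Remember where NAME was poisoned.  */
static void
record_poison (const char *name, unsigned loc)
{
  struct ident_extra *e = extra_lookup (name, 1);
  if (e != NULL)
    e->poisoned_loc = loc;
}

/* Return the recorded location, or 0 if NAME was never poisoned.  */
static unsigned
poison_location (const char *name)
{
  struct ident_extra *e = extra_lookup (name, 0);
  return e != NULL ? e->poisoned_loc : 0;
}
```

A diagnostic routine would call `poison_location` only on the rare path where a poisoned identifier is used, so ordinary identifier lookups are unaffected.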
      
      libcpp/ChangeLog:
      
      	PR preprocessor/36887
      	* directives.cc (do_pragma_poison): Store in the extra hash map the
      	location from which an identifier has been poisoned.
      	* lex.cc (identifier_diagnostics_on_lex): When issuing a diagnostic
      	for the use of a poisoned identifier, also add a note indicating the
      	location from which it was poisoned.
      	* identifiers.cc (alloc_node): Convert to template function.
      	(_cpp_init_hashtable): Handle the new extra hash map.
      	(_cpp_destroy_hashtable): Likewise.
      	* include/cpplib.h (struct cpp_hashnode_extra): New struct.
      	(cpp_create_reader): Update prototype to...
      	* init.cc (cpp_create_reader): ...accept an argument for the extra
      	hash table and pass it to _cpp_init_hashtable.
      	* include/symtab.h (ht_lookup): New overload for convenience.
      	* internal.h (struct cpp_reader): Add EXTRA_HASH_TABLE member.
      	(_cpp_init_hashtable): Adjust prototype.
      
      gcc/c-family/ChangeLog:
      
      	PR preprocessor/36887
      	* c-opts.cc (c_common_init_options): Pass new extra hash map
      	argument to cpp_create_reader().
      
      gcc/ChangeLog:
      
      	PR preprocessor/36887
      	* toplev.h (ident_hash_extra): Declare...
      	* stringpool.cc (ident_hash_extra): ...this new global variable.
      	(init_stringpool): Handle ident_hash_extra as well as ident_hash.
      	(ggc_mark_stringpool): Likewise.
      	(ggc_purge_stringpool): Likewise.
      	(struct string_pool_data_extra): New struct.
      	(spd2): New GC root variable.
      	(gt_pch_save_stringpool): Use spd2 to handle ident_hash_extra,
      	analogous to how spd is used to handle ident_hash.
      	(gt_pch_restore_stringpool): Likewise.
      
      gcc/testsuite/ChangeLog:
      
      	PR preprocessor/36887
      	* c-c++-common/cpp/diagnostic-poison.c: New test.
      	* g++.dg/pch/pr36887.C: New test.
      	* g++.dg/pch/pr36887.Hs: New test.
      cb05acdc
    • Ian Lance Taylor's avatar
      compiler: move Selector_expression up in file · 02aa322c
      Ian Lance Taylor authored
      This is a mechanical change to move Selector_expression up in expressions.cc.
      This will make it visible to Builtin_call_expression for later work.
      This produces a very large "git diff", but "git diff --minimal" is clear.
      
      Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536642
      02aa322c
    • Ian Lance Taylor's avatar
      compiler: make xx_constant_value methods non-const · 597dba85
      Ian Lance Taylor authored
      This changes the Expression {numeric,string,boolean}_constant_value
      methods to be non-const.  This does not affect anything immediately,
      but will be useful for later CLs in this series.
      
      The only real effect is to Builtin_call_expression::do_export,
      which remains const and can no longer call numeric_constant_value.
      But it never needed to call it, as do_export runs after do_lower,
      and do_lower replaces a constant expression with the actual constant.
      
      Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536641
      597dba85
    • Ian Lance Taylor's avatar
      compiler: pass gogo to Runtime::make_call · 45a5ab05
      Ian Lance Taylor authored
      This is a boilerplate change to pass gogo to Runtime::make_call.
      It's not currently used but will be used by later CLs in this series.
      
      Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536640
      45a5ab05
    • Ian Lance Taylor's avatar
      compiler: add Expression::is_untyped method · ac50e9b7
      Ian Lance Taylor authored
      This method is not currently used by anything, but it will be used
      by later CLs in this series.
      
      Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536639
      ac50e9b7
    • Ian Lance Taylor's avatar
      syscall: add missing type conversion · 2621bd1b
      Ian Lance Taylor authored
      The gofrontend incorrectly accepted code that was missing a type conversion.
      The test case for this is bug518.go in https://go.dev/cl/536537.
      Future CLs in this series will detect the type error.
      
      Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536638
      2621bd1b
    • Robin Dapp's avatar
      vect: Allow same precision for bit-precision conversions. · 32b74c9e
      Robin Dapp authored
      In PR111794 we miss a vectorization because on riscv type precision and
      mode precision differ for mask types.  We can still vectorize when
      allowing assignments with the same precision for dest and source which
      is what this patch does.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/111794
      	* tree-vect-stmts.cc (vectorizable_assignment): Add
      	same-precision exception for dest and source.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/slp-mask-1.c: New test.
      	* gcc.target/riscv/rvv/autovec/slp-mask-run-1.c: New test.
      32b74c9e
    • Robin Dapp's avatar
      RISC-V: Add popcount fallback expander. · 82bbbb73
      Robin Dapp authored
      I didn't manage to get back to the generic vectorizer fallback for
      popcount so I figured I'd rather create a popcount fallback in the
      riscv backend.  It uses the WWG algorithm from libgcc.
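
For reference, a scalar sketch of the WWG (Wilkes-Wheeler-Gill) bit-counting scheme for a 32-bit value. The function name `popcount32` is hypothetical, and this is not the expander's code; the vector version would presumably apply the same shift, mask, add, and multiply steps element-wise.

```c
#include <stdint.h>

/* Scalar Wilkes-Wheeler-Gill population count for 32 bits.
   popcount32 is an illustrative name, not a GCC or libgcc symbol.  */
static inline uint32_t
popcount32 (uint32_t x)
{
  /* Each 2-bit field now holds the count of its two bits.  */
  x = x - ((x >> 1) & 0x55555555u);
  /* Sum adjacent 2-bit counts into 4-bit fields.  */
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  /* Sum adjacent 4-bit counts into bytes.  */
  x = (x + (x >> 4)) & 0x0f0f0f0fu;
  /* The multiply accumulates all four byte counts in the top byte.  */
  return (x * 0x01010101u) >> 24;
}
```

Every step is a plain shift, AND, add, subtract, or multiply, which is why the algorithm suits a fallback on targets without a native vector popcount instruction.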
      
      gcc/ChangeLog:
      
      	* config/riscv/autovec.md (popcount<mode>2): New expander.
      	* config/riscv/riscv-protos.h (expand_popcount): Define.
      	* config/riscv/riscv-v.cc (expand_popcount): Vectorize popcount
      	with the WWG algorithm.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: New test.
      	* gcc.target/riscv/rvv/autovec/unop/popcount.c: New test.
      82bbbb73
    • Richard Biener's avatar
      tree-optimization/111916 - SRA of BIT_FIELD_REF of constant pool entries · 458db9b6
      Richard Biener authored
      The following adjusts a leftover BIT_FIELD_REF special-casing to only
      cover the cases general code doesn't handle.
      
      	PR tree-optimization/111916
      	* tree-sra.cc (sra_modify_assign): Do not lower all
      	BIT_FIELD_REF reads that are sra_handled_bf_read_p.
      
      	* gcc.dg/torture/pr111916.c: New testcase.
      458db9b6