  1. May 01, 2023
  2. Apr 30, 2023
    • Jeff Law's avatar
      33b6b791
    • Roger Sayle's avatar
      [Committed] Update xstormy16's neghi2 pattern to not clobber the carry flag. · b159026b
      Roger Sayle authored
      When I converted xstormy's neghi2 pattern from a define_expand to a
      define_insn, I forgot that define_expand implicitly produces a
      sequence of instructions, but a define_insn is an implicit parallel,
      thereby messing up the clobber (reg:BI CARRY_REG), which can then cause
      an ICE in the auto-generated added_clobbers_hard_reg_p.  Whilst stripping
      the superfluous PARALLEL resolves this issue, an even better fix is to
      use xstormy16's INC instruction, that (like NOT) doesn't affect the carry
      flag, resulting in a neghi2 implementation that can more easily be CSE'd
      and scheduled.
      
      Many thanks (again) to Jeff Law for testing/reporting this issue.
      
      2023-04-30  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* config/stormy16/stormy16.md (neghi2): Rewrite pattern using
      	inc to avoid clobbering the carry flag.
      
      gcc/testsuite/ChangeLog
      	* gcc.target/xstormy16/neghi2.c: Update expected implementation.
      b159026b
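The NOT-then-INC formulation described above rests on the two's-complement identity -x == ~x + 1. A minimal C sketch of that identity for 16-bit values follows; the function name is hypothetical and this is not the machine description pattern itself.

```c
#include <stdint.h>

/* Hypothetical helper: negate a 16-bit value as NOT followed by INC.
   Per the commit message, neither instruction affects the carry flag
   on xstormy16, so no flag clobber is needed for this sequence. */
uint16_t neg_via_not_inc(uint16_t x)
{
    uint16_t inverted = (uint16_t)~x;   /* NOT */
    return (uint16_t)(inverted + 1u);   /* INC, wraps modulo 2^16 */
}
```

For example, neg_via_not_inc(1) yields 0xFFFF, the 16-bit representation of -1.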
    • Andrew Pinski's avatar
      Improve error message for excess elements in array initializer from {"a"} · d56af02f
      Andrew Pinski authored
      So char arrays are not the only type that can be initialized from {"a"}.
      We can have wchar_t (L"") and char16_t (u"") types too. So let's
      print out the type of the array instead of just saying char.
      
      Note in the testsuite I used regex . to match '[' and ']' as
      I could not figure out how many '\' I needed.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      gcc/c/ChangeLog:
      
      	* c-typeck.cc (process_init_element): Print out array type
      	for excessive elements.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/init-bad-1.c: Update error message.
      	* gcc.dg/init-bad-2.c: Likewise.
      	* gcc.dg/init-bad-3.c: Likewise.
      	* gcc.dg/init-excess-3.c: Likewise.
      	* gcc.dg/pr61096-1.c: Likewise.
      d56af02f
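For context, here is a small sketch of the braced string initializers the message refers to, using char and wchar_t arrays. The variable and function names are illustrative only. The commented-out declaration shows the kind of excess element that is expected to be diagnosed; the exact wording of the diagnostic is not reproduced here.

```c
#include <stddef.h>

/* A character array may be initialized from a braced string literal. */
const char narrow[] = {"a"};

/* The same form works for wide strings, which is why the diagnostic
   should name the array type rather than always saying "char". */
const wchar_t wide[] = {L"a"};

/* Expected to be rejected: an extra element after the string.
   const char bad[] = {"a", 1};  */

/* Both arrays hold one character plus the terminating null. */
size_t narrow_size(void) { return sizeof narrow; }
size_t wide_count(void)  { return sizeof wide / sizeof wide[0]; }
```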
    • Andrew Pinski's avatar
      Fix C/107926: Wrong error message when initializing char array · a6b810ae
      Andrew Pinski authored
      The problem here is that the code which handles {"a"} is supposed
      to handle the case where there is something after the string, but
      it only handles the case where there is another string, so
      we go down the other path and error out saying "excess elements
      in struct initializer" even though this was a character array.
      To fix this, we need to move the check for whether the initializer
      is a string to after the check for array and initializer.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      Thanks,
      Andrew Pinski
      
      gcc/c/ChangeLog:
      
      	PR c/107926
      	* c-typeck.cc (process_init_element): Move the check
      	for string cst until after the error message.
      
      gcc/testsuite/ChangeLog:
      
      	PR c/107926
      	* gcc.dg/init-excess-3.c: New test.
      a6b810ae
    • Andrew Pinski's avatar
      MATCH: add some of what phiopt's builtin_zero_pattern did · c53237ce
      Andrew Pinski authored
      This adds the patterns for
      POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
      For "a != 0 ? FUNC(a) : CST".
      CLRSB, CLRSBL, and CLRSBLL will be moved next.
      
      Note this is not enough to remove
      cond_removal_in_builtin_zero_pattern as we need to handle
      the case where there is an NOP_CONVERT inside the conditional
      to move out of the condition inside match_simplify_replacement.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu.
      
      gcc/ChangeLog:
      
      	* match.pd: Add patterns for "a != 0 ? FUNC(a) : CST"
      	for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
      c53237ce
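As an illustration of the "a != 0 ? FUNC(a) : CST" shape, the sketch below uses popcount, where FUNC(0) already equals the constant 0, so the guard is redundant. It assumes a compiler that provides __builtin_popcount (GCC does); the function names are hypothetical.

```c
/* Guarded form: the shape the new match.pd patterns look for. */
unsigned guarded_popcount(unsigned a)
{
    return a != 0 ? (unsigned)__builtin_popcount(a) : 0u;
}

/* Unguarded form: equivalent because popcount(0) is 0, so the
   conditional can be dropped. */
unsigned plain_popcount(unsigned a)
{
    return (unsigned)__builtin_popcount(a);
}
```

For functions such as CLZ and CTZ the result at zero may depend on the target, so the same reasoning only applies when the constant matches the value the function is defined to return for zero.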
    • Andrew Pinski's avatar
      PHIOPT: Allow moving of some builtin calls · 55b70889
      Andrew Pinski authored
      While working on moving
      cond_removal_in_builtin_zero_pattern to match, I noticed
      that functions were not allowed to move as we reject all
      non-assignments.
      This changes it to allow a few calls which are known not
      to throw/trap. Right now it is restricted to the ones
      which cond_removal_in_builtin_zero_pattern handles, but
      adding more just means adding them to the switch statement.
      
      gcc/ChangeLog:
      
      	* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
      	Allow some builtin/internal function calls which
      	are known not to trap/throw.
      	(phiopt_worker::match_simplify_replacement):
      	Use name instead of getting the lhs again.
      55b70889
    • Martin Liska's avatar
      hwasan: adjust wording in expected output in tests · 84e7d62c
      Martin Liska authored
      gcc/testsuite/ChangeLog:
      
      	* c-c++-common/hwasan/asan-pr70541.c: Adjust wording of expected
      	output.
      	* c-c++-common/hwasan/heap-overflow.c: Likewise.
      	* c-c++-common/hwasan/sanity-check-pure-c.c: Likewise.
      	* c-c++-common/hwasan/use-after-free.c: Likewise.
      84e7d62c
    • Martin Liska's avatar
      libsanitizer: link hwasan against lsan library · 54765c87
      Martin Liska authored
      Similarly to libasan.so, libhwasan.so also utilizes some
      of the symbols from lsan library.
      
      	PR sanitizer/109674
      
      libsanitizer/ChangeLog:
      
      	* hwasan/Makefile.am: Depend on liblsan.
      	* hwasan/Makefile.in: Re-generate.
      54765c87
    • Longjun Luo's avatar
      [PATCH] libcpp: suppress builtin macro redefined warnings for __LINE__ · e7ce7c49
      Longjun Luo authored
      From 0821df518b264e754d698d399f98be1a62945e32 Mon Sep 17 00:00:00 2001
      From: Longjun Luo <luolongjuna@gmail.com>
      Date: Thu, 12 Jan 2023 23:59:54 +0800
      Subject: [PATCH] libcpp: suppress builtin macro redefined warnings for
       __LINE__
      
      As implied in
      gcc.gnu.org/legacy-ml/gcc-patches/2008-09/msg00076.html,
      gcc provides -Wno-builtin-macro-redefined to suppress warning when
      redefining builtin macro. However, at that time, there was no
      scenario for __LINE__ macro.
      
      But when we try to build a live-patch, we compare sections by using
      -ffunction-sections. Some identical functions are considered changed
      because of the __LINE__ macro.
      
      At present, to detect such a change caused by the __LINE__ macro, we
      have to analyse the code and maintain a function list. For example,
      in kpatch, check this commit
      github.com/dynup/kpatch/commit/0e1b95edeafa36edb7bcf11da6d1c00f76d7e03d.
      
      So, in this scenario, when we try to compare sections, it would
      be better to support suppressing builtin macro redefined warnings for
      the __LINE__ macro.
      
      libcpp:
      	* init.cc (builtin_array): Do not always warn for a redefinition
      	of __LINE__.
      
      gcc/testsuite
      
      	* gcc.dg/builtin-redefine.c: Test for redefinition warnings
      	for __LINE__.
      	* gcc.dg/builtin-redefine-1.c: New test.
      e7ce7c49
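To see why __LINE__ makes otherwise identical functions compare as different, the sketch below uses #line directives to model the same function before and after three lines are added above it. The macro and function names are hypothetical; a real live-patch build would instead redefine __LINE__ to a constant, which is the redefinition this change allows -Wno-builtin-macro-redefined to silence.

```c
/* Hypothetical macro: expands to the line number of its call site. */
#define CALL_SITE_LINE() (__LINE__)

/* Model the function at its original position in the file. */
#line 100
int site_before_edit(void) { return CALL_SITE_LINE(); }

/* Model the same source text after three lines were inserted above it. */
#line 103
int site_after_edit(void) { return CALL_SITE_LINE(); }
```

The two bodies are textually identical, yet they embed different constants (100 and 103), so a section-by-section comparison reports the function as changed.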
    • Joakim Nohlgård's avatar
      gcc: Use ld -r when checking for HAVE_LD_RO_RW_SECTION_MIXING · 2744dbb9
      Joakim Nohlgård authored
      Fall back to ld -r if ld -shared fails during configure. The check for
      HAVE_LD_RO_RW_SECTION_MIXING can fail on targets where ld does not
      support shared objects, even though the answer to the test should be
      'read-write'. One such target is riscv64-unknown-elf. Failing this test
      results in a libgcc crtbegin.o which has a writable .eh_frame section
      leading to the default linker scripts placing the .eh_frame section in a
      writable memory segment, or a linker warning when using ld scripts that
      place .eh_frame unconditionally in ROM.
      
      gcc/ChangeLog:
      
      	* configure: Regenerate.
      	* configure.ac: Use ld -r in the check for HAVE_LD_RO_RW_SECTION_MIXING
      2744dbb9
    • Martin Liska's avatar
      libsanitizer: update LOCAL_PATCHES revision · d2ab430a
      Martin Liska authored
      libsanitizer/ChangeLog:
      
      	* LOCAL_PATCHES: Update revision.
      d2ab430a
    • Martin Liska's avatar
      libsanitizer: Apply local patches · 401f46e6
      Martin Liska authored
      401f46e6
    • Martin Liska's avatar
    • Gaius Mulley's avatar
      Remove duplicate constants created between passes · d5e2694e
      Gaius Mulley authored
      
      There is no need to re-create constant literals between passes.
      This patch creates a constant pool and reuses a constant literal
      providing it is created at the same location.  This in turn avoids
      generating duplicate overflow error messages when encountering an
      out of range constant literal.
      
      gcc/m2/ChangeLog:
      
      	* gm2-compiler/SymbolTable.mod (ConstLitPoolEntry): New
      	pointer to record.
      	(ConstLitSym): New field RangeError.
      	(ConstLitPoolTree): New SymbolTree representing name to
      	index.
      	(ConstLitArray): New dynamic array containing pointers
      	to a ConstLitPoolEntry.
      	(CreateConstLit): New procedure function.
      	(LookupConstLitPoolEntry): New procedure function.
      	(AddConstLitPoolEntry): New procedure function.
      	(MakeConstLit): Re-implemented to check the constant lit
      	pool before calling CreateConstLit.
      	* m2.flex: Add ability to decode binary constant literals.
      
      gcc/testsuite/ChangeLog:
      
      	* gm2/pim/run/pass/constlitbase.mod: New test.
      
      Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
      d5e2694e
    • GCC Administrator's avatar
      Daily bump. · 8eb1e394
      GCC Administrator authored
      8eb1e394
  3. Apr 29, 2023
    • Hans-Peter Nilsson's avatar
      reload: Handle generating reloads that also clobbers flags · 7eefdc9c
      Hans-Peter Nilsson authored
      	* reload1.cc (emit_insn_if_valid_for_reload_1): Rename from
      	emit_insn_if_valid_for_reload.
      	(emit_insn_if_valid_for_reload): Call new helper, and if a SET fails
      	to be recognized, also try emitting a parallel that clobbers
      	TARGET_FLAGS_REGNUM, as applicable.
      7eefdc9c
    • Roger Sayle's avatar
      [xstormy16] Efficient HImode rotate left by a single bit. · e2b204c3
      Roger Sayle authored
      This patch contains some minor tweaks to xstormy16's machine description,
      most significantly providing a pattern for HImode rotate left by a single
      bit that requires only two instructions.
      
      unsigned short foo(unsigned short x)
      {
        return (x << 1) | (x >> 15);
      }
      
      currently with -O2 generates:
      foo:    mov r7,r2
              shr r7,#15
              shl r2,#1
              or r2,r7
              ret
      
      with this patch, GCC now generates:
      foo:	shl r2,#1 | adc r2,#0
              ret
      
      Additionally neghi2 is converted to a define_insn (so that the RTL
      optimizers see the negation semantics), and HImode rotations by
      8-bits can now be recognized and implemented using swpb.
      
      2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* config/stormy16/stormy16.md (neghi2): Convert from a define_expand
      	to a define_insn.
      	(*rotatehi_1): New define_insn for efficient 2 insn sequence.
      	(*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.
      
      gcc/testsuite/ChangeLog
      	* gcc.target/xstormy16/neghi2.c: New test case.
      	* gcc.target/xstormy16/rotatehi-1.c: Likewise.
      e2b204c3
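The two rotation idioms mentioned above can be written in portable C as follows. This is a sketch with hypothetical function names; whether a given compiler version maps them to the two-instruction sequence or to swpb depends on the patterns described in the commit.

```c
#include <stdint.h>

/* Rotate a 16-bit value left by one bit (the *rotatehi_1 case). */
uint16_t rotl16_by_1(uint16_t x)
{
    return (uint16_t)((x << 1) | (x >> 15));
}

/* Rotate left by eight bits: this exchanges the two bytes. */
uint16_t rotl16_by_8(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Rotate right by eight bits gives the same result as rotating left. */
uint16_t rotr16_by_8(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}
```

Because a 16-bit rotate by 8 is its own inverse, both the left and right forms describe the same byte swap, which is consistent with the commit adding a pattern for each.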
    • Roger Sayle's avatar
      [xstormy16] Recognize/support swpn (swap nibbles) instruction. · 58f3cbbd
      Roger Sayle authored
      This patch adds support for xstormy16's swap nibbles instruction (swpn).
      For the test case:
      
      short foo(short x) {
        return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
      }
      
      GCC with -O2 currently generates the nine instruction sequence:
      foo:    mov r7,r2
              asr r2,#4
              and r2,#15
              mov.w r6,#-256
              and r6,r7
              or r2,r6
              shl r7,#4
              and r7,#255
              or r2,r7
              ret
      
      with this patch, we now generate:
      foo:	swpn r2
      	ret
      
      To achieve this using combine's four instruction "combinations" requires
      a little wizardry.  Firstly, define_insn_and_split are introduced to
      treat logical shifts followed by bitwise-AND as macro instructions that
      are split after reload.  This is sufficient to recognize a QImode
      nibble swap, which can be implemented by swpn followed by either a
      zero-extension or a sign-extension from QImode to HImode.  Then finally,
      in the correct context, a QImode swap-nibbles pattern can be combined to
      preserve the high-byte of a HImode word, matching the xstormy16's swpn
      semantics.  The naming of the new code iterators is taken from i386.md.
      
      2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	* config/stormy16/stormy16.md (any_lshift): New code iterator.
      	(any_or_plus): Likewise.
      	(any_rotate): Likewise.
      	(*<any_lshift>_and_internal): New define_insn_and_split to
      	recognize a logical shift followed by an AND, and split it
      	again after reload.
      	(*swpn): New define_insn matching xstormy16's swpn.
      	(*swpn_zext): New define_insn recognizing swpn followed by
      	zero_extendqihi2, i.e. with the high byte set to zero.
      	(*swpn_sext): Likewise, for swpn followed by cbw.
      	(*swpn_sext_2): Likewise, for an alternate RTL form.
      	(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
      	sequence is split in the correct place to recognize the *swpn_zext
      	followed by any_or_plus (ior, xor or plus) instruction.
      
      gcc/testsuite/ChangeLog
      	* gcc.target/xstormy16/swpn-1.c: New QImode test case.
      	* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
      	* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
      	* gcc.target/xstormy16/swpn-4.c: New HImode test case.
      58f3cbbd
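The staged recognition described above (a byte-wide nibble swap that is then recombined with the preserved high byte) can be mirrored in C. The sketch below uses hypothetical function names and unsigned types to sidestep sign-extension questions; it is an illustration of the semantics, not of the RTL patterns.

```c
#include <stdint.h>

/* Swap the two nibbles of a single byte (the QImode operation). */
uint8_t swap_nibbles8(uint8_t b)
{
    return (uint8_t)((b << 4) | (b >> 4));
}

/* 16-bit form: keep the high byte and swap the nibbles of the low
   byte, matching the swpn semantics described in the commit message. */
uint16_t swpn16(uint16_t x)
{
    uint16_t high = (uint16_t)(x & 0xff00u);
    return (uint16_t)(high | swap_nibbles8((uint8_t)(x & 0xffu)));
}

/* The expression from the commit's test case, for comparison. */
uint16_t swpn16_reference(uint16_t x)
{
    return (uint16_t)((x & 0xff00) | ((x << 4) & 0xf0) | ((x >> 4) & 0x0f));
}
```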
    • Mikael Pettersson's avatar
      add glibc-stdint.h to vax and lm32 linux target (PR target/105525) · 83c78cb0
      Mikael Pettersson authored
      PR target/105525 is a build regression for the vax and lm32 linux
      targets present in gcc-12/13/head, where the builds fail due to
      unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__,
      caused by these two targets failing to provide glibc-stdint.h.
      
      Fixed thusly, tested by building crosses, which now succeeds.
      
      Ok for trunk? (Note I don't have commit rights.)
      
      	PR target/105525
      gcc/
      	* config.gcc (vax-*-linux*): Add glibc-stdint.h.
      	(lm32-*-uclinux*): Likewise.
      83c78cb0
    • Jeff Law's avatar
      Adjust mips test for recent ifcvt costing changes · ef6c3095
      Jeff Law authored
      MIPS ports have been failing a few tests since the change to add cost
      checks in another path through the if-converter pass.
      
      As with the other ports, these look like cases where we don't do good
      costing in the MIPS port.  Someone who cares about MIPS will need to
      fix this properly.
      
      In the meantime this patch adjusts the branch cost when running the
      two affected tests and skips them at -Os.  This is enough to verify
      that if conversion can still happen if the costs are adjusted.
      
      gcc/testsuite
      	* gcc.target/mips/mips-ps-type-2.c: Adjust branch cost to
      	encourage if-conversion.  Skip for -Os.
      	* gcc.target/mips/movcc-3.c: Similarly.
      ef6c3095
    • Fei Gao's avatar
      RISC-V: decouple stack allocation for rv32e w/o save-restore · a5b2a3bf
      Fei Gao authored
      Currently in rv32e, stack allocation for GPR callee-saved registers is
      always 12 bytes w/o save-restore. Actually, for the case without save-restore,
      less stack memory can be reserved. This patch decouples stack allocation for
      rv32e w/o save-restore and makes riscv_compute_frame_info more readable.
      
      output of testcase rv32e_stack.c
      before patch:
      	addi	sp,sp,-16
      	sw	ra,12(sp)
      	call	getInt
      	sw	a0,0(sp)
      	lw	a0,0(sp)
      	call	PrintInts
      	lw	a5,0(sp)
      	mv	a0,a5
      	lw	ra,12(sp)
      	addi	sp,sp,16
      	jr	ra
      
      after patch:
      	addi	sp,sp,-8
      	sw	ra,4(sp)
      	call	getInt
      	sw	a0,0(sp)
      	lw	a0,0(sp)
      	call	PrintInts
      	lw	a5,0(sp)
      	mv	a0,a5
      	lw	ra,4(sp)
      	addi	sp,sp,8
      	jr	ra
      
      gcc/ChangeLog:
      
      	* config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function
      	for riscv_use_save_libcall.
      	(riscv_use_save_libcall): call riscv_avoid_save_libcall.
      	(riscv_compute_frame_info): restructure to decouple stack allocation
      	for rv32e w/o save-restore.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rv32e_stack.c: New test.
      a5b2a3bf
    • GCC Administrator's avatar
      Daily bump. · 50205195
      GCC Administrator authored
      50205195
  4. Apr 28, 2023
    • Hans-Peter Nilsson's avatar
      testsuite: Handle empty assembly lines in check-function-bodies · 5cf6160a
      Hans-Peter Nilsson authored
      I tried to make use of check-function-bodies for cris-elf and was a
      bit surprised to see it failing.  There's a deliberate empty line
      after the filled delay slot of the return-function which was
      mishandled.  I thought "aha" and tried to add an empty line
      (containing just a "**" prefix) to the match, but that didn't help.
      While it was added as input from the function's assembly output
      to-be-matched like any other line, it couldn't be matched: I had to
      use "...", which works but is...distracting.
      
      Some digging shows that an empty assembly line can't be deliberately
      matched because all matcher lines (lines starting with the prefix,
      the ubiquitous "**") are canonicalized by trimming leading
      whitespace (the "string trim" in check-function-bodies) and instead
      adding a leading TAB character, thus empty lines end up containing
      just a TAB.  For usability it's better to treat empty lines as fluff
      than to uglify the test-case and the code to properly match them.
      Double-checking, no test-case tries to match a line containing just
      TAB (by providing a line containing just "**\s*", i.e. zero or
      more whitespace characters).
      
      	* lib/scanasm.exp (parse_function_bodies): Set fluff to include
      	empty lines (besides optionally leading whitespace).
      5cf6160a
    • Eugene Rozenfeld's avatar
      Fix autoprofiledbootstrap build · 0c77a090
      Eugene Rozenfeld authored
      1. Fix gcov version
      2. Merge perf data collected when compiling the compiler and runtime libraries
      3. Fix documentation typo
      
      Tested on x86_64-pc-linux-gnu.
      
      ChangeLog:
      
      	* Makefile.in: Define PROFILE_MERGER
      	* Makefile.tpl: Define PROFILE_MERGER
      
      gcc/c/ChangeLog:
      
      	* Make-lang.in: Merge perf data collected when compiling cc1 and runtime libraries
      
      gcc/cp/ChangeLog:
      
      	* Make-lang.in: Merge perf data collected when compiling cc1plus and runtime libraries
      
      gcc/lto/ChangeLog:
      
      	* Make-lang.in: Merge perf data collected when compiling lto1 and runtime libraries
      
      gcc/ChangeLog:
      
      	* doc/install.texi: Fix documentation typo
      0c77a090
    • Matevos Mehrabyan's avatar
      RISC-V: Add divmod expansion support · 065be0ff
      Matevos Mehrabyan authored
      Hi all,
      If we have division and remainder calculations with the same operands:
      
        a = b / c;
        d = b % c;
      
      We can replace the calculation of remainder with multiplication +
      subtraction, using the result from the previous division:
      
        a = b / c;
        d = a * c;
        d = b - d;
      
      Which will be faster.
      Currently, it isn't done for RISC-V.
      
      I've added an expander for DIVMOD which replaces 'rem' with 'mul + sub'.
      
      Best regards,
      Matevos.
      
      gcc/ChangeLog:
      
      	* config/riscv/iterators.md (only_div, paired_mod): New iterators.
      	(u): Add div/udiv cases.
      	* config/riscv/riscv-protos.h (riscv_use_divmod_expander): Prototype.
      	* config/riscv/riscv.cc (struct riscv_tune_param): Add field for
      	divmod expansion.
      	(rocket_tune_info, sifive_7_tune_info): Initialize new field.
      	(thead_c906_tune_info): Likewise.
      	(optimize_size_tune_info): Likewise.
      	(riscv_use_divmod_expander): New function.
      	* config/riscv/riscv.md (<u>divmod<mode>4): New expander.
      
      gcc/testsuite/ChangeLog:
      	* gcc.target/riscv/divmod-1.c: New testcase.
      	* gcc.target/riscv/divmod-2.c: New testcase.
      065be0ff
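The replacement of the remainder by a multiply and a subtract can be checked directly in C. This is a minimal sketch with a hypothetical struct and function name; it assumes a nonzero divisor and excludes the INT_MIN / -1 overflow case.

```c
/* Hypothetical result type holding both quotient and remainder. */
typedef struct {
    int quot;
    int rem;
} divmod_result;

/* Compute the quotient once, then derive the remainder from it as
   b - (b / c) * c instead of issuing a second division-like operation. */
divmod_result divmod_mul_sub(int b, int c)
{
    divmod_result r;
    r.quot = b / c;
    r.rem = b - r.quot * c;
    return r;
}
```

Since C division truncates toward zero, the derived remainder has the same sign behaviour as the % operator, for example -17 and 5 give a quotient of -3 and a remainder of -2.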
    • Karen Sargsyan's avatar
      RISC-V: Added support clmul[r,h] instructions for Zbc extension. · d9df45a6
      Karen Sargsyan authored
      clmul[h] instructions were added only for the ZBKC extension.
      This patch includes them in the ZBC extension too.
      It also adds support for the 'clmulr' instruction for the ZBC extension.
      
      gcc/ChangeLog:
      
      	* config/riscv/bitmanip.md: Added clmulr instruction.
      	* config/riscv/riscv-builtins.cc (AVAIL): Add new.
      	* config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type.
      	(type): Add clmul
      	* config/riscv/riscv-cmo.def: Added built-in function for clmulr.
      	* config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md.
      	* config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in
      	functions to riscv-cmo.def.
      	* config/riscv/generic.md: Add clmul to list of instructions
      	using the generic_imul reservation.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/zbc32.c: New test.
      	* gcc.target/riscv/zbc64.c: New test.
      d9df45a6
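For readers unfamiliar with carry-less multiplication, here is a software reference sketch of the three operations for 32-bit operands. It follows my understanding of the RISC-V Zbc definitions (low half, high half, and "reversed" bits 62..31 of the product); the function names are hypothetical and this is not the compiler's implementation.

```c
#include <stdint.h>

/* Low 32 bits of the carry-less (XOR-accumulated) product. */
uint32_t clmul32(uint32_t a, uint32_t b)
{
    uint32_t x = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            x ^= a << i;
    return x;
}

/* High 32 bits of the 64-bit carry-less product. */
uint32_t clmulh32(uint32_t a, uint32_t b)
{
    uint32_t x = 0;
    for (int i = 1; i < 32; i++)
        if ((b >> i) & 1u)
            x ^= a >> (32 - i);
    return x;
}

/* Bits 62..31 of the carry-less product (the clmulr result). */
uint32_t clmulr32(uint32_t a, uint32_t b)
{
    uint32_t x = 0;
    for (int i = 0; i < 31; i++)
        if ((b >> i) & 1u)
            x ^= a >> (32 - i - 1);
    return x;
}
```

As a small worked case, clmul32(3, 3) is 3 XOR 6, which is 5, whereas ordinary multiplication would give 9.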
    • Jivan Hakobyan's avatar
      RISC-V: Eliminate redundant zero extension of minu/maxu operands · 19667413
      Jivan Hakobyan authored
      On RV64 the following code:
      
        unsigned Min(unsigned a, unsigned b) {
            return a < b ? a : b;
        }
      
      Compiles to:
        Min:
             zext.w  a1,a1
             zext.w  a0,a0
             minu    a0,a1,a0
             sext.w  a0,a0
             ret
      
      This patch removes unnecessary zero extensions of minu/maxu operands.
      
      gcc/ChangeLog:
      
      	* config/riscv/bitmanip.md: Added expanders for minu/maxu instructions
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/zbb-min-max-02.c: Updated scanning check.
      	* gcc.target/riscv/zbb-min-max-03.c: New tests.
      19667413
    • Martin Liska's avatar
      contrib: port doxygen script to Python3 · db7e7776
      Martin Liska authored
      contrib/ChangeLog:
      
      	* filter_gcc_for_doxygen: Use python3 and not python2.
      	* filter_params.py: Likewise.
      db7e7776
    • Andrew Pinski's avatar
      PHIOPT: Move two_value_replacement to match.pd · 1dd154f6
      Andrew Pinski authored
      This patch converts the two_value_replacement function
      into a match.pd pattern.
      It is a direct translation with only one minor change:
      it does not check for the {0,+-1} case, as that is handled
      earlier in match.pd, so there is no reason to do the extra
      check for it.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with
      no regressions.
      
      gcc/ChangeLog:
      
      	PR tree-optimization/100958
      	* tree-ssa-phiopt.cc (two_value_replacement): Remove.
      	(pass_phiopt::execute): Don't call two_value_replacement.
      	* match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to
      	handle what two_value_replacement did.
      1dd154f6
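As a rough illustration of the kind of rewrite the "a !=/== CST1 ? CST2 : CST3" pattern covers, consider a value known to take only two adjacent values, selecting between two adjacent constants. The conditional can then be replaced by arithmetic. The function names are hypothetical, and the sketch assumes the value range is established by the mask.

```c
/* x is known to be 0 or 1; the select picks 3 or 4. */
int select_ascending(unsigned v)
{
    unsigned x = v & 1u;
    return x == 0 ? 3 : 4;
}

/* Equivalent arithmetic form: results rise with x, so add an offset. */
int arith_ascending(unsigned v)
{
    unsigned x = v & 1u;
    return (int)x + 3;
}

/* When the results fall as x rises, subtract instead. */
int select_descending(unsigned v)
{
    unsigned x = v & 1u;
    return x == 0 ? 4 : 3;
}

int arith_descending(unsigned v)
{
    unsigned x = v & 1u;
    return 4 - (int)x;
}
```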
    • Andrew Pinski's avatar
      MATCH: Add patterns from phiopt's minmax_replacement · c43819a9
      Andrew Pinski authored
      This adds a few patterns from phiopt's minmax_replacement
      for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C> .
      It is progress to remove minmax_replacement from phiopt.
      There are still some more cases dealing with constants on the
      edges (0/INT_MAX) to handle in match.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
      
      gcc/ChangeLog:
      
      	* match.pd: Add patterns for
      	"(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>".
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly.
      	* gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert
      	as that now does the combining.
      c43819a9
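One instance of the "(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>" shape is sketched below for MIN with a less-than comparison: selecting between two minimums based on a comparison of A and B is the same as taking the minimum of all three. The helper and function names are hypothetical.

```c
/* Hypothetical helper: minimum of two ints. */
int min_int(int p, int q)
{
    return p < q ? p : q;
}

/* Conditional form, as it might appear after if-conversion. */
int select_of_mins(int a, int b, int c)
{
    return (a < b) ? min_int(a, c) : min_int(b, c);
}

/* Combined form: the comparison picks min(a, b), so the whole
   expression is min(min(a, b), c). */
int min_of_three(int a, int b, int c)
{
    return min_int(min_int(a, b), c);
}
```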
    • Andrew Pinski's avatar
      MATCH: Factor out code for min/max detection with constants · b9b30dba
      Andrew Pinski authored
      This factors out some of the code from the min/max detection
      from match.pd into a function so it can be reused in other
      places. This is mainly used to detect the conversions
      of >= to > which causes the integer values to be changed by
      one.
      
      Changes since v1:
      * factor out the checks for INTEGER_CSTs so it is more obvious.
      
      OK? Bootstrapped and tested on x86_64-linux-gnu.
      
      gcc/ChangeLog:
      
      	* match.pd: Factor out the deciding the min/max from
      	the "(cond (cmp (convert1? x) c1) (convert2? x) c2)"
      	pattern to ...
      	* fold-const.cc (minmax_from_comparison): this new function.
      	* fold-const.h (minmax_from_comparison): New prototype.
      b9b30dba
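The off-by-one adjustment mentioned above arises because a comparison such as x >= 5 may be canonicalized to x > 4, so the compared constant no longer equals the selected constant even though the expression is still a maximum. A small sketch with hypothetical function names:

```c
/* Written with >=: obviously MAX(x, 5). */
int max_written_with_ge(int x)
{
    return x >= 5 ? x : 5;
}

/* The same computation after >= 5 has become > 4: the comparison
   constant (4) differs by one from the selected constant (5), which
   min/max detection has to account for. */
int max_written_with_gt(int x)
{
    return x > 4 ? x : 5;
}
```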
    • Roger Sayle's avatar
      PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG. · 650c36ec
      Roger Sayle authored
      This patch fixes PR rtl-optimization/109476, which is a code quality
      regression affecting AVR.  The cause is that the lower-subreg pass is
      sometimes overly aggressive, lowering the LSHIFTRT below:
      
      (insn 7 4 8 2 (set (reg:HI 51)
              (lshiftrt:HI (reg/v:HI 49 [ b ])
                  (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
           (nil))
      
      into a pair of QImode SUBREG assignments:
      
      (insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
              (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
           (nil))
      (insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
              (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
           (nil))
      
      but this idiom, SETs of SUBREGs, interferes with combine's ability
      to associate/fuse instructions.  The solution, on targets that
      have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
      wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
      is to split/lower LSHIFTRT to a ZERO_EXTEND.
      
      To answer Richard's question in comment #10 of the bugzilla PR,
      the function resolve_shift_zext is called with one of four RTX
      codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
      LSHIFTRT can the setting of low_part and high_part SUBREGs be
      replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
      extension, so don't set the high_part to zero; if we're splitting
      a ZERO_EXTEND then it doesn't make sense to replace it with a
      ZERO_EXTEND, and for ASHIFT we've played games to swap the
      high_part and low_part SUBREGs, so that we assign the low_part
      to zero (for double word shifts by greater than word size bits).
      
      2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>
      
      gcc/ChangeLog
      	PR rtl-optimization/109476
      	* lower-subreg.cc: Include explow.h for force_reg.
      	(find_decomposable_shift_zext): Pass an additional SPEED_P argument.
      	If decomposing a suitable LSHIFTRT and we're not splitting
      	ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
      	instead of setting a high part SUBREG to zero, which helps combine.
      	(decompose_multiword_subregs): Update call to resolve_shift_zext.
      
      gcc/testsuite/ChangeLog
      	PR rtl-optimization/109476
      	* gcc.target/avr/mmcu/pr109476.c: New test case.
      650c36ec
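The equivalence the patch relies on can be stated in C: a logical right shift of a 16-bit value by 8 is the high byte zero-extended back to 16 bits. The sketch below contrasts the "two byte assignments" view with the zero-extension view. Function names are hypothetical, and the byte layout in the second function assumes byte 0 is the low part, as in the RTL shown above.

```c
#include <stdint.h>

/* The original operation: logical shift right by 8. */
uint16_t shift_form(uint16_t b)
{
    return (uint16_t)(b >> 8);
}

/* Lowered as two separate byte assignments into the result. */
uint16_t subreg_form(uint16_t b)
{
    uint8_t part[2];
    part[0] = (uint8_t)(b >> 8);   /* low byte of result = high byte of b */
    part[1] = 0;                   /* high byte of result = zero */
    return (uint16_t)(part[0] | (part[1] << 8));
}

/* Lowered as a single zero-extension of the high byte. */
uint16_t zero_extend_form(uint16_t b)
{
    uint8_t high = (uint8_t)(b >> 8);
    return (uint16_t)high;
}
```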
    • Roger Sayle's avatar
      Synchronize include/ctf.h with upstream binutils/libctf. · fde00589
      Roger Sayle authored
      This patch updates include/ctf.h to match the current libctf version in
      binutils' include/.  I recently attempted to build a uber tree (following
      some notes that are so old they used CVS) and noticed that binutils won't
      build with gcc's top-level include, due to CTF_F_IDXSORTED not being
      defined in ctf.h.
      
      2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>
      
      include/ChangeLog
      	* ctf.h: Import latest version from binutils/libctf.
      fde00589
    • Richard Biener's avatar
      Add emulated scatter capability to the vectorizer · 6d4b59a9
      Richard Biener authored
      This adds a scatter vectorization capability to the vectorizer
      without target support by decomposing the offset and data vectors
      and then performing scalar stores in the order of vector lanes.
      This is aimed at cases where vectorizing the rest of the loop
      offsets the cost of vectorizing the scatter.
      
      The offset load is still vectorized and costed as such, but like
      with emulated gather those will be turned back to scalar loads
      by forwprop.
      
      	* tree-vect-data-refs.cc (vect_analyze_data_refs): Always
      	consider scatters.
      	* tree-vect-stmts.cc (vect_model_store_cost): Pass in the
      	gather-scatter info and cost emulated scatters accordingly.
      	(get_load_store_type): Support emulated scatters.
      	(vectorizable_store): Likewise.  Emulate them by extracting
      	scalar offsets and data, doing scalar stores.
      
      	* gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
      	* gcc.dg/vect/vect-71.c: Likewise.
      	* gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
      	* gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
      	* gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.
      6d4b59a9
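A scalar model of the emulation described above may help: the offsets and data for a group of lanes are loaded together (standing in for vector loads), and the stores are then issued one lane at a time, in lane order. The lane count and all names below are assumptions for illustration, not what the vectorizer emits.

```c
#include <stddef.h>

enum { LANES = 4 };  /* assumed vector width for this sketch */

/* Plain scalar scatter loop: dst[idx[i]] = src[i]. */
void scatter_scalar(float *dst, const int *idx, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}

/* Emulated scatter: per group of LANES elements, load offsets and data
   as a unit, then decompose them and store lane by lane. */
void scatter_emulated(float *dst, const int *idx, const float *src, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        int offs[LANES];
        float data[LANES];
        for (int l = 0; l < LANES; l++) {   /* stand-in for vector loads */
            offs[l] = idx[i + l];
            data[l] = src[i + l];
        }
        for (int l = 0; l < LANES; l++)     /* scalar stores, lane order */
            dst[offs[l]] = data[l];
    }
    for (; i < n; i++)                      /* scalar tail */
        dst[idx[i]] = src[i];
}
```

Storing in lane order matters when two lanes target the same index: the later lane wins, exactly as in the scalar loop.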
    • Richard Biener's avatar
      Adjust costing of emulated vectorized gather/scatter · 24905a4b
      Richard Biener authored
      Emulated gather/scatter behave similarly to strided elementwise
      accesses in that they need to decompose the offset vector
      and construct or decompose the data vector, so handle them
      the same way, pessimizing the cases with many elements.
      
      For pr88531-2c.c instead of
      
      .L4:
              leaq    (%r15,%rcx), %rdx
              incl    %edi
              movl    16(%rdx), %r13d
              movl    24(%rdx), %r14d
              movl    (%rdx), %r10d
              movl    4(%rdx), %r9d
              movl    8(%rdx), %ebx
              movl    12(%rdx), %r11d
              movl    20(%rdx), %r12d
              vmovss  (%rax,%r14,4), %xmm2
              movl    28(%rdx), %edx
              vmovss  (%rax,%r13,4), %xmm1
              vmovss  (%rax,%r10,4), %xmm0
              vinsertps       $0x10, (%rax,%rdx,4), %xmm2, %xmm2
              vinsertps       $0x10, (%rax,%r12,4), %xmm1, %xmm1
              vinsertps       $0x10, (%rax,%r9,4), %xmm0, %xmm0
              vmovlhps        %xmm2, %xmm1, %xmm1
              vmovss  (%rax,%rbx,4), %xmm2
              vinsertps       $0x10, (%rax,%r11,4), %xmm2, %xmm2
              vmovlhps        %xmm2, %xmm0, %xmm0
              vinsertf128     $0x1, %xmm1, %ymm0, %ymm0
              vmulps  %ymm3, %ymm0, %ymm0
              vmovups %ymm0, (%r8,%rcx)
              addq    $32, %rcx
              cmpl    %esi, %edi
              jb      .L4
      
      we now prefer
      
      .L4:
              leaq    0(%rbp,%rdx,8), %rcx
              movl    (%rcx), %r10d
              movl    4(%rcx), %ecx
              vmovss  (%rsi,%r10,4), %xmm0
              vinsertps       $0x10, (%rsi,%rcx,4), %xmm0, %xmm0
              vmulps  %xmm1, %xmm0, %xmm0
              vmovlps %xmm0, (%rbx,%rdx,8)
              incq    %rdx
              cmpl    %edi, %edx
              jb      .L4
      
      	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
      	Tame down element extracts and scalar loads for gather/scatter
      	similar to elementwise strided accesses.
      
      	* gcc.target/i386/pr89618-2.c: New testcase.
      	* gcc.target/i386/pr88531-2b.c: Adjust.
      	* gcc.target/i386/pr88531-2c.c: Likewise.
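      
      For orientation, the two-lane loop preferred above corresponds
      roughly to the following C.  This is a sketch under assumptions:
      the loop shape dst[i] = src[idx[i]] * scale is inferred from the
      assembly rather than copied from the testcase, and all function
      and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed scalar loop shape: dst[i] = src[idx[i]] * scale.  */
void gather_mul_scalar(float *dst, const float *src, const unsigned *idx,
                       float scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]] * scale;
}

/* Emulated gather with two lanes per iteration.  */
void gather_mul_emulated2(float *dst, const float *src, const unsigned *idx,
                          float scale, size_t n)
{
    assert(dst && src && idx);
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        unsigned o0 = idx[i], o1 = idx[i + 1];  /* decompose offset vector */
        float v[2] = { src[o0], src[o1] };      /* scalar loads build the data vector */
        v[0] *= scale;                          /* stands in for the single vmulps */
        v[1] *= scale;
        dst[i] = v[0];                          /* contiguous two-element store */
        dst[i + 1] = v[1];
    }
    for (; i < n; ++i)                          /* scalar epilogue */
        dst[i] = src[idx[i]] * scale;
}
```

      Each extra lane adds an offset extract, a scalar load and an
      insert, which is the per-element overhead the adjusted costing
      accounts for.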
      24905a4b
    • Pan Li's avatar
      RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR · 8b84d879
      Pan Li authored
      
      When some RVV integer compare operators act on the same vector
      register without a mask, they can be simplified to VMCLR.
      
      This patch allows ne, lt, ltu, gt and gtu to be simplified this
      way by adding one new define_split.
      
      Given we have:
      vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
        return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
      }
      
      Before this patch:
      vsetvli  zero,a2,e8,m8,ta,ma
      vl8re8.v v24,0(a1)
      vmslt.vv v8,v24,v24
      vsetvli  a5,zero,e8,m8,ta,ma
      vsm.v    v8,0(a0)
      ret
      
      After this patch:
      vsetvli zero,a2,e8,mf8,ta,ma
      vmclr.m v24                    <- optimized to vmclr.m
      vsetvli zero,a5,e8,mf8,ta,ma
      vsm.v   v24,0(a0)
      ret
      
      As shown above, one instruction may be eliminated and fewer
      vector registers are required.
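      
      The reasoning behind the simplification can be checked with a
      small portable C model (no RVV intrinsics): for ne, lt, ltu, gt
      and gtu, comparing an element with itself is always false, so
      every mask bit is clear.  The enum cmp_op and both function names
      are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CMP_NE, CMP_LT, CMP_LTU, CMP_GT, CMP_GTU } cmp_op;

/* Result of one lane of the compare on 8-bit elements.  */
static bool cmp_lane(cmp_op op, int8_t a, int8_t b)
{
    switch (op) {
    case CMP_NE:  return a != b;
    case CMP_LT:  return a < b;
    case CMP_LTU: return (uint8_t)a < (uint8_t)b;
    case CMP_GT:  return a > b;
    case CMP_GTU: return (uint8_t)a > (uint8_t)b;
    }
    return false;
}

/* True if comparing v with itself yields an all-clear mask,
   i.e. the result a mask clear would produce.  */
bool self_compare_mask_is_clear(cmp_op op, const int8_t *v, size_t vl)
{
    assert(v != NULL || vl == 0);
    for (size_t i = 0; i < vl; ++i)
        if (cmp_lane(op, v[i], v[i]))
            return false;
    return true;
}
```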
      
      gcc/ChangeLog:
      
      	* config/riscv/vector.md: Add new define split to perform
      	the simplification.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
      
      Signed-off-by: Pan Li <pan2.li@intel.com>
      Co-authored-by: kito-cheng <kito.cheng@sifive.com>
      8b84d879
    • Jonathan Wakely's avatar
      libstdc++: Improve doxygen docs for <random> · d711f8f8
      Jonathan Wakely authored
      Add @headerfile and @since tags. Add gamma_distribution to the correct
      group (poisson distributions). Add a group for the sampling
      distributions and add the missing definitions of their probability
      functions. Add uniform_int_distribution back to the uniform
      distributions group.
      
      libstdc++-v3/ChangeLog:
      
      	* include/bits/random.h (gamma_distribution): Add to the right
      	doxygen group.
      	(discrete_distribution, piecewise_constant_distribution)
      	(piecewise_linear_distribution): Create a new doxygen group and
      	fix the incomplete doxygen comments.
      	* include/bits/uniform_int_dist.h (uniform_int_distribution):
      	Add to doxygen group.
      d711f8f8
    • Jonathan Wakely's avatar
      libstdc++: Minor fixes to doxygen comments · 30f6aace
      Jonathan Wakely authored
      libstdc++-v3/ChangeLog:
      
      	* include/bits/uses_allocator.h: Add missing @file comment.
      	* include/bits/regex.tcc: Remove stray doxygen comments.
      	* include/experimental/memory_resource: Likewise.
      	* include/std/bit: Tweak doxygen @cond comments.
      	* include/std/expected: Likewise.
      	* include/std/numbers: Likewise.
      30f6aace
    • Jonathan Wakely's avatar
      libstdc++: Strip absolute paths from files shown in Doxygen docs · 975e8e83
      Jonathan Wakely authored
      This avoids showing absolute paths from the expansion of
      @srcdir@/libsupc++/ in the doxygen File List view.
      
      libstdc++-v3/ChangeLog:
      
      	* doc/doxygen/user.cfg.in (STRIP_FROM_PATH): Remove prefixes
      	from header paths.
      975e8e83