- May 01, 2023
GCC Administrator authored
-
- Apr 30, 2023
Roger Sayle authored
When I converted xstormy16's neghi2 pattern from a define_expand to a define_insn, I forgot that a define_expand implicitly produces a sequence of instructions, whereas a define_insn is an implicit parallel, thereby messing up the clobber (reg:BI CARRY_REG), which can then cause an ICE in the auto-generated added_clobbers_hard_reg_p. Whilst stripping the superfluous PARALLEL resolves this issue, an even better fix is to use xstormy16's INC instruction, which (like NOT) doesn't affect the carry flag, resulting in a neghi2 implementation that can more easily be CSE'd and scheduled.

Many thanks (again) to Jeff Law for testing/reporting this issue.

2023-04-30  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.md (neghi2): Rewrite pattern using inc to avoid clobbering the carry flag.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/neghi2.c: Update expected implementation.
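The new neghi2 rests on one identity: two's-complement negation equals a bitwise NOT followed by an increment. The C sketch below shows that identity. It models only the 16-bit (HImode) arithmetic, not the carry flag or the actual xstormy16 instructions. `neg_via_not_inc` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* -x == ~x + 1 in 16-bit wrap-around arithmetic (HImode-sized).  */
static uint16_t neg_via_not_inc(uint16_t x)
{
    uint16_t inverted = (uint16_t)~x;      /* NOT: invert every bit */
    return (uint16_t)(inverted + 1u);      /* INC: add one, modulo 2^16 */
}
```

Because both steps leave the carry flag alone on the target, the expanded sequence carries no hidden dependency on CARRY_REG.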
-
Andrew Pinski authored
So char arrays are not the only type that can be initialized from {"a"}. We can have wchar_t (L"") and char16_t (u"") types too. So let's print out the type of the array instead of just saying char.

Note in the testsuite I used regex . to match '[' and ']' as I could not figure out how many '\' I needed.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/c/ChangeLog:
	* c-typeck.cc (process_init_element): Print out array type for excessive elements.

gcc/testsuite/ChangeLog:
	* gcc.dg/init-bad-1.c: Update error message.
	* gcc.dg/init-bad-2.c: Likewise.
	* gcc.dg/init-bad-3.c: Likewise.
	* gcc.dg/init-excess-3.c: Likewise.
	* gcc.dg/pr61096-1.c: Likewise.
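For reference, here is a small C11 sketch of the three braced string-literal initializers the message mentions. All three are valid. Per this commit, the diagnostic for excess elements after such a literal now names the real array type instead of always saying char. The erroneous excess-element case is not shown because it does not compile. `COUNT` is a hypothetical helper macro.

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t */
#include <uchar.h>    /* char16_t */

/* Each array is initialized from a braced string literal of a
   different character type.  */
static const char     narrow[] = {"a"};
static const wchar_t  wide[]   = {L"a"};
static const char16_t utf16[]  = {u"a"};

/* Number of elements, including the terminating null character.  */
#define COUNT(arr) (sizeof (arr) / sizeof (arr)[0])
```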
-
Andrew Pinski authored
The problem here is that the code which handles {"a"} is supposed to handle the case where there is something after the string, but it only handles the case where there is another string. So we go down the other path and error out saying "excess elements in struct initializer" even though this was a character array. To fix this, we need to move the check for whether the initializer is a string until after the check for an array and initializer.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/c/ChangeLog:
	PR c/107926
	* c-typeck.cc (process_init_element): Move the check for string cst until after the error message.

gcc/testsuite/ChangeLog:
	PR c/107926
	* gcc.dg/init-excess-3.c: New test.
-
Andrew Pinski authored
This adds the patterns for POPCOUNT, BSWAP, FFS, PARITY, CLZ and CTZ for "a != 0 ? FUNC(a) : CST". CLRSB, CLRSBL, and CLRSBLL will be moved next.

Note this is not enough to remove cond_removal_in_builtin_zero_pattern, as we need to handle the case where there is a NOP_CONVERT inside the conditional to move out of the condition inside match_simplify_replacement.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:
	* match.pd: Add patterns for "a != 0 ? FUNC(a) : CST" for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
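The C sketch below shows the source-level shape these patterns target, in the simplest case where CST equals FUNC(0). There the guard is redundant and the whole expression equals FUNC(a). It uses the GCC builtins for POPCOUNT, FFS, PARITY and BSWAP, each of which is well defined at zero. CLZ and CTZ are left out because `__builtin_clz`/`__builtin_ctz` are undefined for a zero argument at the C level. How match.pd handles their zero value is not shown here. The `*_guarded` names are hypothetical.

```c
#include <assert.h>

/* "a != 0 ? FUNC(a) : FUNC(0)" is just FUNC(a) for these builtins.  */
static int popcount_guarded(unsigned a)   { return a != 0 ? __builtin_popcount(a) : 0; }
static int ffs_guarded(int a)             { return a != 0 ? __builtin_ffs(a) : 0; }
static int parity_guarded(unsigned a)     { return a != 0 ? __builtin_parity(a) : 0; }
static unsigned bswap_guarded(unsigned a) { return a != 0 ? __builtin_bswap32(a) : 0; }
```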
-
Andrew Pinski authored
While working on moving cond_removal_in_builtin_zero_pattern to match, I noticed that function calls were not allowed to move, as we reject all non-assignments. This changes that to allow a few calls which are known not to throw/trap. Right now it is restricted to the ones which cond_removal_in_builtin_zero_pattern handles, but adding more is just a matter of adding them to the switch statement.

gcc/ChangeLog:
	* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): Allow some builtin/internal function calls which are known not to trap/throw.
	(phiopt_worker::match_simplify_replacement): Use name instead of getting the lhs again.
-
Martin Liska authored
gcc/testsuite/ChangeLog:
	* c-c++-common/hwasan/asan-pr70541.c: Adjust wording of expected output.
	* c-c++-common/hwasan/heap-overflow.c: Likewise.
	* c-c++-common/hwasan/sanity-check-pure-c.c: Likewise.
	* c-c++-common/hwasan/use-after-free.c: Likewise.
-
Martin Liska authored
Similarly to libasan.so, libhwasan.so also utilizes some of the symbols from the lsan library.

	PR sanitizer/109674

libsanitizer/ChangeLog:
	* hwasan/Makefile.am: Depend on liblsan.
	* hwasan/Makefile.in: Re-generate.
-
Longjun Luo authored
From 0821df518b264e754d698d399f98be1a62945e32 Mon Sep 17 00:00:00 2001
From: Longjun Luo <luolongjuna@gmail.com>
Date: Thu, 12 Jan 2023 23:59:54 +0800
Subject: [PATCH] libcpp: suppress builtin macro redefined warnings for __LINE__

As implied in gcc.gnu.org/legacy-ml/gcc-patches/2008-09/msg00076.html, gcc provides -Wno-builtin-macro-redefined to suppress the warning when redefining a builtin macro. However, at that time, there was no scenario for the __LINE__ macro.

But when we try to build a live-patch, we compare sections by using -ffunction-sections. Some identical functions are considered changed because of the __LINE__ macro.

At present, to detect such a change caused by the __LINE__ macro, we have to analyse the code and maintain a function list. For example, in kpatch, check this commit: github.com/dynup/kpatch/commit/0e1b95edeafa36edb7bcf11da6d1c00f76d7e03d.

So, in this scenario, when we try to compare sections, it would be better to support suppressing builtin macro redefined warnings for the __LINE__ macro.

libcpp:
	* init.cc (builtin_array): Do not always warn for a redefinition of __LINE__.

gcc/testsuite:
	* gcc.dg/builtin-redefine.c: Test for redefinition warnings for __LINE__.
	* gcc.dg/builtin-redefine-1.c: New test.
-
Joakim Nohlgård authored
Fall back to ld -r if ld -shared fails during configure.

The check for HAVE_LD_RO_RW_SECTION_MIXING can fail on targets where ld does not support shared objects, even though the answer to the test should be 'read-write'. One such target is riscv64-unknown-elf. Failing this test results in a libgcc crtbegin.o which has a writable .eh_frame section. This leads either to the default linker scripts placing the .eh_frame section in a writable memory segment, or to a linker warning when using ld scripts that place .eh_frame unconditionally in ROM.

gcc/ChangeLog:
	* configure: Regenerate.
	* configure.ac: Use ld -r in the check for HAVE_LD_RO_RW_SECTION_MIXING.
-
Martin Liska authored
libsanitizer/ChangeLog:
	* LOCAL_PATCHES: Update revision.
-
Martin Liska authored
-
Martin Liska authored
-
Gaius Mulley authored
There is no need to re-create constant literals between passes. This patch creates a constant pool and reuses a constant literal provided it is created at the same location. This in turn avoids generating duplicate overflow error messages when encountering an out of range constant literal.

gcc/m2/ChangeLog:
	* gm2-compiler/SymbolTable.mod (ConstLitPoolEntry): New pointer to record.
	(ConstLitSym): New field RangeError.
	(ConstLitPoolTree): New SymbolTree representing name to index.
	(ConstLitArray): New dynamic array containing pointers to a ConstLitPoolEntry.
	(CreateConstLit): New procedure function.
	(LookupConstLitPoolEntry): New procedure function.
	(AddConstLitPoolEntry): New procedure function.
	(MakeConstLit): Re-implemented to check the constant lit pool before calling CreateConstLit.
	* m2.flex: Add ability to decode binary constant literals.

gcc/testsuite/ChangeLog:
	* gm2/pim/run/pass/constlitbase.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
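The pooling idea can be sketched in C, assuming a literal is identified by its spelling plus its source position. A lookup with the same key returns the existing entry instead of creating a new one. The real implementation is Modula-2 code in SymbolTable.mod (ConstLitPoolEntry, LookupConstLitPoolEntry, AddConstLitPoolEntry). `struct const_lit`, `lookup_or_add` and the fixed-size array are hypothetical stand-ins.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pool entry: reused only when spelling AND location match.  */
struct const_lit {
    const char *spelling;
    unsigned    line, column;
    int         range_error;   /* would record that an overflow was already reported */
};

#define POOL_MAX 64
static struct const_lit pool[POOL_MAX];
static int pool_len;

/* Return the index of the pooled literal, adding it if it is new.
   Returns -1 if the pool is full.  */
static int lookup_or_add(const char *spelling, unsigned line, unsigned column)
{
    for (int i = 0; i < pool_len; i++)
        if (pool[i].line == line && pool[i].column == column
            && strcmp(pool[i].spelling, spelling) == 0)
            return i;                          /* same literal, same place: reuse */
    if (pool_len == POOL_MAX)
        return -1;
    pool[pool_len] = (struct const_lit){ spelling, line, column, 0 };
    return pool_len++;
}
```

Since a later pass gets back the same entry, any per-literal state, such as whether a range error was already reported, is shared instead of duplicated.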
-
GCC Administrator authored
- Apr 29, 2023
Hans-Peter Nilsson authored
* reload1.cc (emit_insn_if_valid_for_reload_1): Rename from emit_insn_if_valid_for_reload. (emit_insn_if_valid_for_reload): Call new helper, and if a SET fails to be recognized, also try emitting a parallel that clobbers TARGET_FLAGS_REGNUM, as applicable.
-
Roger Sayle authored
This patch contains some minor tweaks to xstormy16's machine description, most significantly providing a pattern for HImode rotate left by a single bit that requires only two instructions.

    unsigned short foo(unsigned short x)
    {
      return (x << 1) | (x >> 15);
    }

currently with -O2 generates:

    foo:    mov r7,r2
            shr r7,#15
            shl r2,#1
            or r2,r7
            ret

with this patch, GCC now generates:

    foo:    shl r2,#1 | adc r2,#0
            ret

Additionally, neghi2 is converted to a define_insn (so that the RTL optimizers see the negation semantics), and HImode rotations by 8 bits can now be recognized and implemented using swpb.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.md (neghi2): Convert from a define_expand to a define_insn.
	(*rotatehi_1): New define_insn for efficient 2 insn sequence.
	(*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/neghi2.c: New test case.
	* gcc.target/xstormy16/rotatehi-1.c: Likewise.
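A C model of why two instructions suffice. It assumes, as the "shl r2,#1 | adc r2,#0" sequence implies, that shl moves bit 15 into the carry flag and adc adds that carry back into bit 0. A rotate by 8 is a byte swap, which is what swpb provides. The function names are hypothetical, and the carry is modeled explicitly rather than taken from real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Portable rotate-left-by-one from the test case.  */
static uint16_t rotl1(uint16_t x)
{
    return (uint16_t)((x << 1) | (x >> 15));
}

/* Same result as "shl r2,#1 | adc r2,#0".  */
static uint16_t rotl1_shl_adc(uint16_t x)
{
    unsigned carry = x >> 15;               /* bit shifted out by shl */
    uint16_t shifted = (uint16_t)(x << 1);  /* shl r2,#1 */
    return (uint16_t)(shifted + 0u + carry);/* adc r2,#0 */
}

/* Rotating a 16-bit value by 8 swaps its bytes (swpb).  */
static uint16_t rotl8(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}
```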
-
Roger Sayle authored
This patch adds support for xstormy16's swap nibbles instruction (swpn). For the test case:

    short foo(short x)
    {
      return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
    }

GCC with -O2 currently generates the nine instruction sequence:

    foo:    mov r7,r2
            asr r2,#4
            and r2,#15
            mov.w r6,#-256
            and r6,r7
            or r2,r6
            shl r7,#4
            and r7,#255
            or r2,r7
            ret

with this patch, we now generate:

    foo:    swpn r2
            ret

To achieve this using combine's four instruction "combinations" requires a little wizardry. Firstly, define_insn_and_split are introduced to treat logical shifts followed by bitwise-AND as macro instructions that are split after reload. This is sufficient to recognize a QImode nibble swap, which can be implemented by swpn followed by either a zero-extension or a sign-extension from QImode to HImode. Then finally, in the correct context, a QImode swap-nibbles pattern can be combined to preserve the high-byte of a HImode word, matching the xstormy16's swpn semantics. The naming of the new code iterators is taken from i386.md.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.md (any_lshift): New code iterator.
	(any_or_plus): Likewise.
	(any_rotate): Likewise.
	(*<any_lshift>_and_internal): New define_insn_and_split to recognize a logical shift followed by an AND, and split it again after reload.
	(*swpn): New define_insn matching xstormy16's swpn.
	(*swpn_zext): New define_insn recognizing swpn followed by zero_extendqihi2, i.e. with the high byte set to zero.
	(*swpn_sext): Likewise, for swpn followed by cbw.
	(*swpn_sext_2): Likewise, for an alternate RTL form.
	(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior sequence is split in the correct place to recognize the *swpn_zext followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/swpn-1.c: New QImode test case.
	* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
	* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
	* gcc.target/xstormy16/swpn-4.c: New HImode test case.
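To see why the nine-instruction test case collapses to one swpn, compare it with a direct model of the instruction: swap the two nibbles of the low byte and keep the high byte. The sketch uses an unsigned 16-bit type instead of `short` so that the left shift is well defined for every input. The `swpn` function is a hypothetical C model, not the machine description.

```c
#include <assert.h>
#include <stdint.h>

/* The commit's test expression, on an unsigned 16-bit value.  */
static uint16_t foo(uint16_t x)
{
    return (uint16_t)((x & 0xff00) | ((x << 4) & 0xf0) | ((x >> 4) & 0x0f));
}

/* Model of swpn: exchange the nibbles of the low byte.  */
static uint16_t swpn(uint16_t x)
{
    uint16_t lo = x & 0x0f;          /* low nibble moves up */
    uint16_t hi = (x >> 4) & 0x0f;   /* high nibble moves down */
    return (uint16_t)((x & 0xff00) | (lo << 4) | hi);
}
```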
-
Mikael Pettersson authored
PR target/105525 is a build regression for the vax and lm32 linux targets present in gcc-12/13/head, where the builds fail due to unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__, caused by these two targets failing to provide glibc-stdint.h.

Fixed thusly, tested by building crosses, which now succeeds.

Ok for trunk? (Note I don't have commit rights.)

gcc/
	PR target/105525
	* config.gcc (vax-*-linux*): Add glibc-stdint.h.
	(lm32-*-uclinux*): Likewise.
-
Jeff Law authored
MIPS ports have been failing a few tests since the change to add cost checks in another path through the if-converter pass. As with the other ports, these look like cases where we don't do good costing in the MIPS port. Someone who cares about MIPS will need to fix this properly.

In the meantime, this patch adjusts the branch cost when running the two affected tests and skips them at -Os. This is enough to verify that if conversion can still happen if the costs are adjusted.

gcc/testsuite
	* gcc.target/mips/mips-ps-type-2.c: Adjust branch cost to encourage if-conversion. Skip for -Os.
	* gcc.target/mips/movcc-3.c: Similarly.
-
Fei Gao authored
Currently in rv32e, stack allocation for GPR callee-saved registers is always 12 bytes w/o save-restore. Actually, for the case without save-restore, less stack memory can be reserved. This patch decouples stack allocation for rv32e w/o save-restore and makes riscv_compute_frame_info more readable.

output of testcase rv32e_stack.c

before patch:

    addi sp,sp,-16
    sw ra,12(sp)
    call getInt
    sw a0,0(sp)
    lw a0,0(sp)
    call PrintInts
    lw a5,0(sp)
    mv a0,a5
    lw ra,12(sp)
    addi sp,sp,16
    jr ra

after patch:

    addi sp,sp,-8
    sw ra,4(sp)
    call getInt
    sw a0,0(sp)
    lw a0,0(sp)
    call PrintInts
    lw a5,0(sp)
    mv a0,a5
    lw ra,4(sp)
    addi sp,sp,8
    jr ra

gcc/ChangeLog:
	* config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function for riscv_use_save_libcall.
	(riscv_use_save_libcall): call riscv_avoid_save_libcall.
	(riscv_compute_frame_info): restructure to decouple stack allocation for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/rv32e_stack.c: New test.
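A small C sketch of the frame-size arithmetic behind the two listings, assuming the 4-byte stack alignment of the ILP32E ABI. Before the patch, a fixed 12-byte GPR save area plus the one 4-byte local stored at 0(sp) gives 16 bytes. After it, only the 4 bytes for ra plus that local are reserved, giving 8 bytes. `round_up` and `frame_size` are hypothetical helpers, not functions from riscv.cc.

```c
#include <assert.h>

/* Round n up to a multiple of align (align > 0).  */
static unsigned round_up(unsigned n, unsigned align)
{
    return (n + align - 1) / align * align;
}

/* Frame = GPR save area + locals, rounded to the assumed 4-byte alignment.  */
static unsigned frame_size(unsigned gpr_save_area, unsigned locals)
{
    return round_up(gpr_save_area + locals, 4);
}
```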
-
GCC Administrator authored
-
- Apr 28, 2023
Hans-Peter Nilsson authored
I tried to make use of check-function-bodies for cris-elf and was a bit surprised to see it failing. There's a deliberate empty line after the filled delay slot of the return-function which was mishandled. I thought "aha" and tried to add an empty line (containing just a "**" prefix) to the match, but that didn't help. While it was added as input from the function's assembly output to-be-matched like any other line, it couldn't be matched: I had to use "...", which works but is...distracting.

Some digging shows that an empty assembly line can't be deliberately matched because all matcher lines (lines starting with the prefix, the ubiquitous "**") are canonicalized by trimming leading whitespace (the "string trim" in check-function-bodies) and instead adding a leading TAB character, thus empty lines end up containing just a TAB. For usability it's better to treat empty lines as fluff than to uglify the test-case and the code to properly match them. Double-checking, no test-case tries to match a line containing just TAB (by providing a line containing just "**\s*", i.e. zero or more whitespace characters).

	* lib/scanasm.exp (parse_function_bodies): Set fluff to include empty lines (besides optionally leading whitespace).
-
Eugene Rozenfeld authored
1. Fix gcov version
2. Merge perf data collected when compiling the compiler and runtime libraries
3. Fix documentation typo

Tested on x86_64-pc-linux-gnu.

ChangeLog:
	* Makefile.in: Define PROFILE_MERGER
	* Makefile.tpl: Define PROFILE_MERGER

gcc/c/ChangeLog:
	* Make-lang.in: Merge perf data collected when compiling cc1 and runtime libraries

gcc/cp/ChangeLog:
	* Make-lang.in: Merge perf data collected when compiling cc1plus and runtime libraries

gcc/lto/ChangeLog:
	* Make-lang.in: Merge perf data collected when compiling lto1 and runtime libraries

gcc/ChangeLog:
	* doc/install.texi: Fix documentation typo
-
Matevos Mehrabyan authored
Hi all,

If we have division and remainder calculations with the same operands:

    a = b / c;
    d = b % c;

we can replace the calculation of the remainder with a multiplication + subtraction, using the result from the previous division:

    a = b / c;
    d = a * c;
    d = b - d;

which will be faster. Currently, this isn't done for RISC-V. I've added an expander for DIVMOD which replaces 'rem' with 'mul + sub'.

Best regards,
Matevos.

gcc/ChangeLog:
	* config/riscv/iterators.md (only_div, paired_mod): New iterators.
	(u): Add div/udiv cases.
	* config/riscv/riscv-protos.h (riscv_use_divmod_expander): Prototype.
	* config/riscv/riscv.cc (struct riscv_tune_param): Add field for divmod expansion.
	(rocket_tune_info, sifive_7_tune_info): Initialize new field.
	(thead_c906_tune_info): Likewise.
	(optimize_size_tune_info): Likewise.
	(riscv_use_divmod_expander): New function.
	* config/riscv/riscv.md (<u>divmod<mode>4): New expander.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/divmod-1.c: New testcase.
	* gcc.target/riscv/divmod-2.c: New testcase.
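The rewrite is valid because C's truncating division guarantees (b / c) * c + b % c == b whenever the quotient is representable. The remainder can therefore be recovered from an existing quotient with one multiply and one subtract. The C sketch below shows the arithmetic only, for c != 0 and no overflow. `divmod_via_mul_sub` is a hypothetical name, not the expander.

```c
#include <assert.h>

/* Quotient via div; remainder via mul + sub instead of a separate rem.  */
static void divmod_via_mul_sub(long b, long c, long *quot, long *rem)
{
    long q = b / c;   /* div */
    long t = q * c;   /* mul */
    *quot = q;
    *rem = b - t;     /* sub */
}
```

Negative operands behave the same way as `%`, because both follow the truncating definition.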
-
Karen Sargsyan authored
clmul[h] instructions were added only for the ZBKC extension. This patch includes them in the ZBC extension too. Additionally, it adds support for the 'clmulr' instruction in the ZBC extension.

gcc/ChangeLog:
	* config/riscv/bitmanip.md: Added clmulr instruction.
	* config/riscv/riscv-builtins.cc (AVAIL): Add new.
	* config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type.
	(type): Add clmul
	* config/riscv/riscv-cmo.def: Added built-in function for clmulr.
	* config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md.
	* config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in functions to riscv-cmo.def.
	* config/riscv/generic.md: Add clmul to list of instructions using the generic_imul reservation.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/zbc32.c: New test.
	* gcc.target/riscv/zbc64.c: New test.
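For readers unfamiliar with the three instructions, here is a software model for XLEN = 32. It forms the full carry-less product, where partial products are combined with XOR instead of addition. Following the Zbc definitions, clmul is the low half, clmulh the high half, and clmulr bits 62..31 of that product. This is a reference sketch only. The function names are hypothetical and are not the GCC built-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Full carry-less (GF(2) polynomial) product of two 32-bit values.  */
static uint64_t clmul_full(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1)
            acc ^= (uint64_t)a << i;
    return acc;
}

static uint32_t clmul32(uint32_t a, uint32_t b)  { return (uint32_t)clmul_full(a, b); }         /* low half */
static uint32_t clmulh32(uint32_t a, uint32_t b) { return (uint32_t)(clmul_full(a, b) >> 32); } /* high half */
static uint32_t clmulr32(uint32_t a, uint32_t b) { return (uint32_t)(clmul_full(a, b) >> 31); } /* bits 62..31 */
```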
-
Jivan Hakobyan authored
On RV64, the following code:

    unsigned Min(unsigned a, unsigned b)
    {
      return a < b ? a : b;
    }

compiles to:

    Min:
            zext.w a1,a1
            zext.w a0,a0
            minu a0,a1,a0
            sext.w a0,a0
            ret

This patch removes the unnecessary zero extensions of the minu/maxu operands.

gcc/ChangeLog:
	* config/riscv/bitmanip.md: Added expanders for minu/maxu instructions

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/zbb-min-max-02.c: Updated scanning check.
	* gcc.target/riscv/zbb-min-max-03.c: New tests.
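One way to see why the zext.w instructions are redundant relies on the RV64 convention that 32-bit values are held sign-extended in 64-bit registers, as the trailing sext.w reflects. Sign extension from 32 to 64 bits preserves unsigned order. So a 64-bit unsigned min of the sign-extended operands is already the sign-extended 32-bit unsigned min. The C sketch below checks that reasoning. The helper names are hypothetical, and the `(int32_t)` cast relies on GCC's modulo conversion for out-of-range values.

```c
#include <assert.h>
#include <stdint.h>

/* How RV64 holds a 32-bit value in a 64-bit register.  */
static uint64_t sext32(uint32_t v) { return (uint64_t)(int64_t)(int32_t)v; }

/* 64-bit minu on register contents.  */
static uint64_t minu64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* The 32-bit source-level function.  */
static uint32_t Min(uint32_t a, uint32_t b) { return a < b ? a : b; }
```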
-
Martin Liska authored
contrib/ChangeLog:
	* filter_gcc_for_doxygen: Use python3 and not python2.
	* filter_params.py: Likewise.
-
Andrew Pinski authored
This patch converts the two_value_replacement function into a match.pd pattern. It is a direct translation with only one minor change: it does not check for the {0,+-1} case, as that is handled earlier in match.pd, so there is no reason to do the extra check for it.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:
	PR tree-optimization/100958
	* tree-ssa-phiopt.cc (two_value_replacement): Remove.
	(pass_phiopt::execute): Don't call two_value_replacement.
	* match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to handle what two_value_replacement did.
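As an illustration of the kind of conditional this pattern targets, take a value known to be either CST1 or CST1 + 1 that selects between two adjacent constants. The selection can then be computed arithmetically. The sketch uses a `bool` (range [0, 1]) and the constants 5 and 4. It is based on the ChangeLog's description, not on the exact match.pd conditions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* x has only two possible values, 0 and 1.  */
static int select_form(bool x)     { return x != 0 ? 5 : 4; }  /* CST1 = 0, CST2 = 5, CST3 = 4 */
static int arithmetic_form(bool x) { return (int)x + 4; }      /* branch-free equivalent */
```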
-
Andrew Pinski authored
This adds a few patterns from phiopt's minmax_replacement for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>. It is progress toward removing minmax_replacement from phiopt. There are still some more cases dealing with constants on the edges (0/INT_MAX) to handle in match.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:
	* match.pd: Add patterns for "(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>".

gcc/testsuite/ChangeLog:
	* gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly.
	* gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert as that now does the combining.
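An example of the MIN case with CMP being `<`. Whichever branch is taken, the result is the minimum of C and the smaller of A and B, so the conditional equals MIN<MIN<A, B>, C>. This sketch shows only that equivalence. The exact form match.pd produces is not claimed, and the helper names are hypothetical.

```c
#include <assert.h>

static int min_i(int a, int b) { return a < b ? a : b; }

/* (A < B) ? MIN<A, C> : MIN<B, C>  */
static int cond_form(int a, int b, int c)   { return a < b ? min_i(a, c) : min_i(b, c); }

/* MIN<MIN<A, B>, C>: no data-dependent branch between the two MINs.  */
static int folded_form(int a, int b, int c) { return min_i(min_i(a, b), c); }
```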
-
Andrew Pinski authored
This factors out some of the code from the min/max detection in match.pd into a function so it can be reused in other places. It is mainly used to detect conversions of >= to >, which cause the integer constants to change by one.

Changes since v1:
* factor out the checks for INTEGER_CSTs so it is more obvious.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:
	* match.pd: Factor out deciding the min/max from the "(cond (cmp (convert1? x) c1) (convert2? x) c2)" pattern to ...
	* fold-const.cc (minmax_from_comparison): this new function.
	* fold-const.h (minmax_from_comparison): New prototype.
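A concrete instance of the ">= versus >" adjustment. For integers, `x > 5` is the same test as `x >= 6`. So `x > 5 ? x : 6` yields x whenever x >= 6 and 6 otherwise, which is MAX (x, 6), even though the comparison constant (5) and the arm constant (6) differ by one. This is an illustration of the idea, not a trace of minmax_from_comparison. The function names are hypothetical.

```c
#include <assert.h>

/* Shape matched by "(cond (cmp x c1) x c2)" with c1 = 5, c2 = 6.  */
static int select_gt5(int x) { return x > 5 ? x : 6; }

/* MAX (x, 6), the result once the comparison is read as x >= 6.  */
static int max_with_6(int x) { return x >= 6 ? x : 6; }
```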
-
Roger Sayle authored
This patch fixes PR rtl-optimization/109476, which is a code quality regression affecting AVR. The cause is that the lower-subreg pass is sometimes overly aggressive, lowering the LSHIFTRT below:

    (insn 7 4 8 2 (set (reg:HI 51)
            (lshiftrt:HI (reg/v:HI 49 [ b ])
                (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
         (nil))

into a pair of QImode SUBREG assignments:

    (insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
            (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
         (nil))
    (insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
            (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
         (nil))

but this idiom, SETs of SUBREGs, interferes with combine's ability to associate/fuse instructions. The solution, on targets that have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false), is to split/lower LSHIFTRT to a ZERO_EXTEND.

To answer Richard's question in comment #10 of the bugzilla PR, the function resolve_shift_zext is called with one of four RTX codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with LSHIFTRT can the setting of low_part and high_part SUBREGs be replaced by a ZERO_EXTEND. For ASHIFTRT, we require a sign extension, so don't set the high_part to zero; if we're splitting a ZERO_EXTEND then it doesn't make sense to replace it with a ZERO_EXTEND, and for ASHIFT we've played games to swap the high_part and low_part SUBREGs, so that we assign the low_part to zero (for double word shifts by greater than word size bits).

2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/109476
	* lower-subreg.cc: Include explow.h for force_reg.
	(find_decomposable_shift_zext): Pass an additional SPEED_P argument. If decomposing a suitable LSHIFTRT and we're not splitting ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND instead of setting a high part SUBREG to zero, which helps combine.
	(decompose_multiword_subregs): Update call to resolve_shift_zext.

gcc/testsuite/ChangeLog
	PR rtl-optimization/109476
	* gcc.target/avr/mmcu/pr109476.c: New test case.
-
Roger Sayle authored
This patch updates include/ctf.h to match the current libctf version in binutils' include/. I recently attempted to build an uber tree (following some notes that are so old they used CVS) and noticed that binutils won't build with gcc's top-level include, due to CTF_F_IDXSORTED not being defined in ctf.h.

2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>

include/ChangeLog
	* ctf.h: Import latest version from binutils/libctf.
-
Richard Biener authored
This adds a scatter vectorization capability to the vectorizer without target support, by decomposing the offset and data vectors and then performing scalar stores in the order of vector lanes. This is aimed at cases where vectorizing the rest of the loop offsets the cost of vectorizing the scatter.

The offset load is still vectorized and costed as such, but like with emulated gather those will be turned back to scalar loads by forwprop.

	* tree-vect-data-refs.cc (vect_analyze_data_refs): Always consider scatters.
	* tree-vect-stmts.cc (vect_model_store_cost): Pass in the gather-scatter info and cost emulated scatters accordingly.
	(get_load_store_type): Support emulated scatters.
	(vectorizable_store): Likewise. Emulate them by extracting scalar offsets and data, doing scalar stores.

	* gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
	* gcc.dg/vect/vect-71.c: Likewise.
	* gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
	* gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
	* gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.
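Take a loop such as `for (i = 0; i < n; i++) base[off[i]] = data[i];`. A scalar C sketch of one emulated-scatter step for it follows: take each lane's offset and data element, and store them in lane order, so that a later lane wins if two lanes hit the same offset. The real vectorizer emits GIMPLE lane extracts and stores, not a C loop. `emulated_scatter` and the fixed vectorization factor `VF` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4   /* assumed vectorization factor for the illustration */

/* One vector iteration's worth of scatter, done as scalar stores.  */
static void emulated_scatter(float *base, const int offset[VF], const float data[VF])
{
    for (size_t lane = 0; lane < VF; lane++)
        base[offset[lane]] = data[lane];   /* lane order preserved */
}
```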
-
Richard Biener authored
Emulated gather/scatter behave similarly to strided elementwise accesses in that they need to decompose the offset vector and construct or decompose the data vector, so handle them the same way, pessimizing the cases with many elements.

For pr88531-2c.c, instead of

    .L4:
            leaq    (%r15,%rcx), %rdx
            incl    %edi
            movl    16(%rdx), %r13d
            movl    24(%rdx), %r14d
            movl    (%rdx), %r10d
            movl    4(%rdx), %r9d
            movl    8(%rdx), %ebx
            movl    12(%rdx), %r11d
            movl    20(%rdx), %r12d
            vmovss  (%rax,%r14,4), %xmm2
            movl    28(%rdx), %edx
            vmovss  (%rax,%r13,4), %xmm1
            vmovss  (%rax,%r10,4), %xmm0
            vinsertps       $0x10, (%rax,%rdx,4), %xmm2, %xmm2
            vinsertps       $0x10, (%rax,%r12,4), %xmm1, %xmm1
            vinsertps       $0x10, (%rax,%r9,4), %xmm0, %xmm0
            vmovlhps        %xmm2, %xmm1, %xmm1
            vmovss  (%rax,%rbx,4), %xmm2
            vinsertps       $0x10, (%rax,%r11,4), %xmm2, %xmm2
            vmovlhps        %xmm2, %xmm0, %xmm0
            vinsertf128     $0x1, %xmm1, %ymm0, %ymm0
            vmulps  %ymm3, %ymm0, %ymm0
            vmovups %ymm0, (%r8,%rcx)
            addq    $32, %rcx
            cmpl    %esi, %edi
            jb      .L4

we now prefer

    .L4:
            leaq    0(%rbp,%rdx,8), %rcx
            movl    (%rcx), %r10d
            movl    4(%rcx), %ecx
            vmovss  (%rsi,%r10,4), %xmm0
            vinsertps       $0x10, (%rsi,%rcx,4), %xmm0, %xmm0
            vmulps  %xmm1, %xmm0, %xmm0
            vmovlps %xmm0, (%rbx,%rdx,8)
            incq    %rdx
            cmpl    %edi, %edx
            jb      .L4

	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Tame down element extracts and scalar loads for gather/scatter similar to elementwise strided accesses.

	* gcc.target/i386/pr89618-2.c: New testcase.
	* gcc.target/i386/pr88531-2b.c: Adjust.
	* gcc.target/i386/pr88531-2c.c: Likewise.
-
Pan Li authored
When some RVV integer compare operators act on the same vector registers without a mask, they can be simplified to VMCLR. This patch allows ne, lt, ltu, gt and gtu to perform this kind of simplification by adding one new define_split.

Given we have:

    vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl)
    {
      return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
    }

Before this patch:

    vsetvli  zero,a2,e8,m8,ta,ma
    vl8re8.v v24,0(a1)
    vmslt.vv v8,v24,v24
    vsetvli  a5,zero,e8,m8,ta,ma
    vsm.v    v8,0(a0)
    ret

After this patch:

    vsetvli zero,a2,e8,mf8,ta,ma
    vmclr.m v24                    <- optimized to vmclr.m
    vsetvli zero,a5,e8,mf8,ta,ma
    vsm.v   v24,0(a0)
    ret

As above, one instruction is eliminated and fewer vector registers are required.

gcc/ChangeLog:
	* config/riscv/vector.md: Add new define split to perform the simplification.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>
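The simplification is safe because ne, lt, ltu, gt and gtu are all irreflexive: comparing any lane with itself is false, so every mask bit is 0, which is exactly what vmclr.m produces. The scalar C model below shows this for two of the compares (lt and ne), with bit i of the result standing for lane i. `mask_lt` and `mask_ne` are hypothetical helpers, and only up to 64 lanes are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of vmslt.vv: bit i set when a[i] < b[i].  */
static uint64_t mask_lt(const int8_t *a, const int8_t *b, size_t vl)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < vl && i < 64; i++)
        if (a[i] < b[i])
            mask |= (uint64_t)1 << i;
    return mask;
}

/* Scalar model of vmsne.vv: bit i set when a[i] != b[i].  */
static uint64_t mask_ne(const int8_t *a, const int8_t *b, size_t vl)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < vl && i < 64; i++)
        if (a[i] != b[i])
            mask |= (uint64_t)1 << i;
    return mask;
}
```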
-
Jonathan Wakely authored
Add @headerfile and @since tags. Add gamma_distribution to the correct group (poisson distributions). Add a group for the sampling distributions and add the missing definitions of their probability functions. Add uniform_int_distribution back to the uniform distributions group.

libstdc++-v3/ChangeLog:
	* include/bits/random.h (gamma_distribution): Add to the right doxygen group.
	(discrete_distribution, piecewise_constant_distribution)
	(piecewise_linear_distribution): Create a new doxygen group and fix the incomplete doxygen comments.
	* include/bits/uniform_int_dist.h (uniform_int_distribution): Add to doxygen group.
-
Jonathan Wakely authored
libstdc++-v3/ChangeLog:
	* include/bits/uses_allocator.h: Add missing @file comment.
	* include/bits/regex.tcc: Remove stray doxygen comments.
	* include/experimental/memory_resource: Likewise.
	* include/std/bit: Tweak doxygen @cond comments.
	* include/std/expected: Likewise.
	* include/std/numbers: Likewise.
-
Jonathan Wakely authored
This avoids showing absolute paths from the expansion of @srcdir@/libsupc++/ in the doxygen File List view.

libstdc++-v3/ChangeLog:
	* doc/doxygen/user.cfg.in (STRIP_FROM_PATH): Remove prefixes from header paths.
-