- Jan 24, 2025
-
-
rdubner authored
-
rdubner authored
-
Richard Biener authored
r15-491-gc290e6a0b7a9de fixed a latent issue with dr_analyze_innermost and dr_may_alias where improperly analyzed DRs would yield an invalid answer. This caused some missed optimizations when there is not actually any evolution in the unanalyzed base part. The following recovers this by handling in the conservative way only base parts that reference SSA vars as an index.

The gfortran.dg/vect/vect-8.f90 testcase is difficult to deal with, so the following merely bumps the maximum number of expected vectorized loops for both aarch64 and x86-64.

        PR tree-optimization/116010
        * tree-data-ref.cc (contains_ssa_ref_p_1): New function.
        (contains_ssa_ref_p): Likewise.
        (dr_may_alias_p): Avoid treating unanalyzed base parts without
        SSA reference conservatively.
        * gfortran.dg/vect/vect-8.f90: Adjust.
-
Stefan Schulze Frielinghaus authored
Merge new optabs with the existing implementations for signbit and isinf.

gcc/ChangeLog:

        * config/s390/s390.h (S390_TDC_POSITIVE_ZERO): Remove.
        (S390_TDC_NEGATIVE_ZERO): Remove.
        (S390_TDC_POSITIVE_NORMALIZED_BFP_NUMBER): Remove.
        (S390_TDC_NEGATIVE_NORMALIZED_BFP_NUMBER): Remove.
        (S390_TDC_POSITIVE_DENORMALIZED_BFP_NUMBER): Remove.
        (S390_TDC_NEGATIVE_DENORMALIZED_BFP_NUMBER): Remove.
        (S390_TDC_POSITIVE_INFINITY): Remove.
        (S390_TDC_NEGATIVE_INFINITY): Remove.
        (S390_TDC_POSITIVE_QUIET_NAN): Remove.
        (S390_TDC_NEGATIVE_QUIET_NAN): Remove.
        (S390_TDC_POSITIVE_SIGNALING_NAN): Remove.
        (S390_TDC_NEGATIVE_SIGNALING_NAN): Remove.
        (S390_TDC_POSITIVE_DENORMALIZED_DFP_NUMBER): Remove.
        (S390_TDC_NEGATIVE_DENORMALIZED_DFP_NUMBER): Remove.
        (S390_TDC_POSITIVE_NORMALIZED_DFP_NUMBER): Remove.
        (S390_TDC_NEGATIVE_NORMALIZED_DFP_NUMBER): Remove.
        (S390_TDC_SIGNBIT_SET): Remove.
        (S390_TDC_INFINITY): Remove.
        * config/s390/s390.md (signbit<mode>2<tf_fpr>): Merge this one
        (isinf<mode>2<tf_fpr>): and this one into
        (<TDC_CLASS:tdc_insn><mode>2<tf_fpr>): new expander.
        (isnormal<mode>2<tf_fpr>): New BFP expander.
        (isnormal<mode>2): New DFP expander.
        * config/s390/vector.md (signbittf2_vr): Merge this one
        (isinftf2_vr): and this one into
        (<tdc_insn>tf2_vr): new expander.
        (signbittf2): Merge this one
        (isinftf2): and this one into
        (<tdc_insn>tf2): new expander.

gcc/testsuite/ChangeLog:

        * gcc.target/s390/isfinite-isinf-isnormal-signbit-1.c: New test.
        * gcc.target/s390/isfinite-isinf-isnormal-signbit-2.c: New test.
        * gcc.target/s390/isfinite-isinf-isnormal-signbit-3.c: New test.
        * gcc.target/s390/isfinite-isinf-isnormal-signbit.h: New test.
-
Richard Biener authored
We no longer subtract the estimated number of eliminated instructions from the estimated size after unrolling that we print; this is a bit confusing when comparing dumps to previous releases. The following changes the dump from

        Estimated size after unrolling: 42

to

        Estimated size after unrolling: 42-12

for the testcase in the PR.

        PR tree-optimization/118634
        * tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Dump
        the number of estimated eliminated insns.
-
Saurabh Jha authored
Earlier, we were gating SVE2 faminmax behind sve+faminmax. This was incorrect and this patch changes it so that it is gated behind sve2+faminmax.

gcc/ChangeLog:

        * config/aarch64/aarch64-sve2.md
        (*aarch64_pred_faminmax_fused): Fix to use the correct flags.
        * config/aarch64/aarch64.h (TARGET_SVE_FAMINMAX): Remove.
        * config/aarch64/iterators.md: Fix iterators so that famax and
        famin use correct flags.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/faminmax_1.c: Fix test to use the
        correct flags.
        * gcc.target/aarch64/sve/faminmax_2.c: Fix test to use the
        correct flags.
        * gcc.target/aarch64/sve/faminmax_3.c: New test.
-
Alexandre Oliva authored
When comparing a signed narrow variable with a wider constant that has the bit corresponding to the variable's sign bit set, we would check that the constant is a sign-extension from that sign bit, and conclude that the compare fails if it isn't.

When the signed variable is masked without getting the [lr]l_signbit variable set, or when the sign bit itself is masked out, we know the sign-extension bits of the extended variable are going to be zero, so the constant will only compare equal if it is a zero- rather than a sign-extension from the narrow variable's precision. Therefore, check that it satisfies this property, and yield a false compare result otherwise.

for gcc/ChangeLog

        PR tree-optimization/118572
        * gimple-fold.cc (fold_truth_andor_for_ifcombine): Compare as
        unsigned the variables whose extension bits are masked out.

for gcc/testsuite/ChangeLog

        PR tree-optimization/118572
        * gcc.dg/field-merge-24.c: New.
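A source-level analogy of the reasoning above, as a minimal sketch (the mask and constant here are made up for illustration; the pass itself works on the folded IL):

  #include <stdint.h>

  int f (int8_t v)
  {
    /* The sign bit of v is masked out, so (v & 0x70) zero-extends:
       the promoted value is always in [0, 0x70].  A sign-extended
       (negative) wider constant can therefore never compare equal.  */
    return (v & 0x70) == -16;   /* always false */
  }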
-
Alexandre Oliva authored
Don't reject an ifcombine field-merging opportunity just because the left-hand operands aren't both reversed, if the second compare needs to be swapped for operands to match.

Also mention that reversep does NOT affect the turning of range tests into bit tests.

for gcc/ChangeLog

        * gimple-fold.cc (fold_truth_andor_for_ifcombine): Document
        reversep's absence of effects on range tests.  Don't reject
        reversep mismatches before trying compare swapping.
-
Alexandre Oliva authored
Check that BIT_FIELD_REFs of DECLs are in range before deciding they don't trap. Check that a replacement bitfield load is as trapping as the replaced load.

for gcc/ChangeLog

        PR tree-optimization/118514
        * tree-eh.cc (bit_field_ref_in_bounds_p): New.
        (tree_could_trap_p) <BIT_FIELD_REF>: Call it.
        * gimple-fold.cc (make_bit_field_load): Check trapping status
        of replacement load against original load.

for gcc/testsuite/ChangeLog

        PR tree-optimization/118514
        * gcc.dg/field-merge-23.c: New.
-
GCC Administrator authored
-
- Jan 23, 2025
-
-
Marek Polacek authored
The error here should also check that we aren't nested in another lambda; in such a lambda, at_function_scope_p() will be false.

        PR c++/117602

gcc/cp/ChangeLog:

        * cp-tree.h (current_nonlambda_scope): Add a default argument.
        * lambda.cc (current_nonlambda_scope): New bool parameter.  Use it.
        * parser.cc (cp_parser_lambda_introducer): Use
        current_nonlambda_scope to check if the lambda is non-local.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp2a/lambda-uneval21.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
-
Jakub Jelinek authored
After committing the append_ctor_to_tree_vector patch, I've realized that for larger constructors make_tree_vector_from_ctor unnecessarily wastes one GC vector: make_tree_vector () / release_tree_vector () only cache GC vectors with 4 to 16 allocated tree elements. In the likely case of a rather small ctor, using make_tree_vector () can be beneficial; we can pick something from the cache, and if we don't need it later, pt.cc calls release_tree_vector on it to return it back to the cache. But for larger ctors, we just eat one vector from the cache, never use it (because vec_safe_reserve will immediately allocate a different vector) and never return it back to the cache.

So, the following patch passes NULL for the larger vectors, which append_ctor_to_tree_vector handles just fine now (vec_safe_reserve will just allocate an appropriately sized vector).

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

        * c-common.cc (make_tree_vector_from_ctor): Only use
        make_tree_vector for ctors with <= 16 elements.
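A minimal sketch of the resulting logic, using only the helpers named above (illustrative, not the exact source):

  /* Only take a vector from the GC cache when the ctor is small
     enough that vec_safe_reserve will not immediately replace it;
     otherwise pass NULL and let append_ctor_to_tree_vector allocate
     an appropriately sized vector itself.  */
  vec<tree, va_gc> *ret
    = CONSTRUCTOR_NELTS (ctor) <= 16 ? make_tree_vector () : NULL;
  return append_ctor_to_tree_vector (ret, ctor);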
-
John David Anglin authored
2025-01-23  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

        * config/pa/pa32-regs.h (ADDITIONAL_REGISTER_NAMES): Change
        register 86 name to "%fr31L".
-
rdubner authored
-
rdubner authored
-
Jakub Jelinek authored
vectorizable_{store,load} does roughly

        tree offvar;
        tree running_off;
        if (!costing_p)
          {
            ... initialize offvar ...
          }
        running_off = offvar;
        for (...)
          {
            if (costing_p)
              {
                ...
                continue;
              }
            ... use running_off ...
          }

so it unconditionally copies a sometimes-uninitialized variable (but then uses the copied variable only if it was set to something initialized). Still, I think it is better to avoid copying around maybe-uninitialized vars.

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

        PR tree-optimization/118628
        * tree-vect-stmts.cc (vectorizable_store, vectorizable_load):
        Initialize offvar to NULL_TREE.
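The fix from the ChangeLog, sketched against the pattern above (illustrative):

        tree offvar = NULL_TREE;  /* no longer maybe-uninitialized */
        tree running_off;
        ...
        running_off = offvar;     /* copies a well-defined value now */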
-
rdubner authored
-
Harald Anlauf authored
        PR fortran/118613

gcc/fortran/ChangeLog:

        * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxval): Adjust
        algorithm for inlined version of MINLOC and MAXLOC so that
        arguments are only evaluated once, and create temporaries
        where necessary.  Document change of algorithm.

gcc/testsuite/ChangeLog:

        * gfortran.dg/maxval_arg_eval_count.f90: New test.
-
James K. Lowden authored
-
Georg-Johann Lay authored
This patch tries to work around PR118012, which may use a full fledged multiplication instead of a simple bit test. This is because match.pd's

  /* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
  /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */

"optimizes" code with op in { plus, ior, xor } like

  if (a & 1)
    b = b <op> c;

to something like:

  x1 = EXTRACT_BIT0 (a);
  x2 = c MULT x1;
  b = b <op> x2;

or

  x1 = EXTRACT_BIT0 (a);
  x2 = ZERO_EXTEND (x1);
  x3 = NEG x2;
  x4 = a AND x3;
  b = b <op> x4;

which is very expensive and may even result in a libgcc call for a 32-bit multiplication on devices that don't even have MUL. Notice that EXTRACT_BIT0 is already more expensive (slower, more code, more register pressure) than a bit test + branch.

The patch:

o Adds some combiner patterns that try to map sick code back to a
  bit test + branch.

o Adjusts costs to make MULT (x AND 1) cheap, in the hope that the
  middle-end will use that alternative (which we map to sane code).

o On devices without MUL, 32-bit multiplication was performed by a
  library call, which bypasses the MULT (x AND 1) and similar
  patterns.  Therefore, mulsi3 is also allowed for devices without
  MUL so that we get a MULT pattern that can be transformed.
  (Though this is not possible on AVR_TINY since it passes arguments
  on the stack.)

o Adds a new command line option -mpr118012, so most of the patterns
  and cost computations can be switched off as they have
  avropt_pr118012 in their insn condition.

o Adds sign-extract.0 patterns unconditionally (no avropt_pr118012).

Notice that this patch is just a work-around; it's not a fix of the root cause, which are the patterns in match.pd that don't care about the target and don't even care about costs.

The work-around is incomplete, and 3 of the new tests are still failing. This is because there are situations where it does not work:

* The MULT is realized as a library call.

* The MULT is realized as an ASHIFT, and the ASHIFT again is
  transformed into something else.  For example, with -O2
  -mmcu=atmega128, ASHIFT(3) is transformed into ASHIFT(1) + ASHIFT(2).

        PR tree-optimization/118012
        PR tree-optimization/118360
gcc/
        * config/avr/avr.opt (-mpr118012): New undocumented option.
        * config/avr/avr-protos.h (avr_out_sextr)
        (avr_emit_skip_pixop, avr_emit_skip_clear): New protos.
        * config/avr/avr.cc (avr_adjust_insn_length)
        [case ADJUST_LEN_SEXTR]: Handle case.
        (avr_rtx_costs_1) [NEG]: Costs for NEG (ZERO_EXTEND (ZERO_EXTRACT)).
        [MULT && avropt_pr118012]: Costs for MULT (x AND 1).
        (avr_out_sextr, avr_emit_skip_pixop, avr_emit_skip_clear): New
        functions.
        * config/avr/avr.md [avropt_pr118012]: Add combine patterns
        with that condition that try to work around PR118012.
        (adjust_len) <sextr>: Add insn attr value.
        (pixop): New code iterator.
        (mulsi3) [avropt_pr118012 && !AVR_TINY]: Allow these in insn
        condition.
gcc/testsuite/
        * gcc.target/avr/mmcu/pr118012-1.h: New file.
        * gcc.target/avr/mmcu/pr118012-1-o2-m128.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-os-m128.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-o2-m103.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-os-m103.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-o2-t40.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-os-t40.c: New test.
        * gcc.target/avr/mmcu/pr118360-1.h: New file.
        * gcc.target/avr/mmcu/pr118360-1-o2-m128.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-os-m128.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-o2-m103.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-os-m103.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-o2-t40.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-os-t40.c: New test.
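At the source level, the quoted match.pd rules amount to the following rewrite, shown here as an illustrative example with op = ior (types simplified; unsigned long is 32 bits on AVR):

  /* Before: a cheap bit test + branch on AVR.  */
  unsigned long g (unsigned long a, unsigned long b, unsigned long c)
  {
    if (a & 1)
      b = b | c;
    return b;
  }

  /* After the transform: multiply by the extracted bit, which may
     even become a libgcc mulsi3 call on devices without MUL.  */
  unsigned long g2 (unsigned long a, unsigned long b, unsigned long c)
  {
    return b | (c * (a & 1));
  }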
-
Jan Hubicka authored
The following testcase:

  bool f(const std::vector<bool>& v, std::size_t x)
  {
    return v[x];
  }

is compiled as:

  f(std::vector<bool, std::allocator<bool> > const&, unsigned long):
          testq   %rsi, %rsi
          leaq    63(%rsi), %rax
          movq    (%rdi), %rdx
          cmovns  %rsi, %rax
          sarq    $6, %rax
          leaq    (%rdx,%rax,8), %rdx
          movq    %rsi, %rax
          sarq    $63, %rax
          shrq    $58, %rax
          addq    %rax, %rsi
          andl    $63, %esi
          subq    %rax, %rsi
          jns     .L2
          addq    $64, %rsi
          subq    $8, %rdx
  .L2:
          movl    $1, %eax
          shlx    %rsi, %rax, %rax
          andq    (%rdx), %rax
          setne   %al
          ret

which is quite expensive for a simple bit access in a bitmap. The reason is that the bit access is implemented using iterators

  return begin()[__n];

which in turn care about the situation where __n is negative, yielding the extra conditional:

  _GLIBCXX20_CONSTEXPR
  void
  _M_incr(ptrdiff_t __i)
  {
    _M_assume_normalized();
    difference_type __n = __i + _M_offset;
    _M_p += __n / int(_S_word_bit);
    __n = __n % int(_S_word_bit);
    if (__n < 0)
      {
        __n += int(_S_word_bit);
        --_M_p;
      }
    _M_offset = static_cast<unsigned int>(__n);
  }

While we can use __builtin_unreachable to declare that __n is in range 0...max_size (), I think it is better to implement the access directly, since the resulting code is shorter and much easier to optimize. We now produce:

  .LFB1248:
          .cfi_startproc
          movq    (%rdi), %rax
          movq    %rsi, %rdx
          shrq    $6, %rdx
          andq    (%rax,%rdx,8), %rsi
          andl    $63, %esi
          setne   %al
          ret

The testcase suggests

          movq    (%rdi), %rax
          movl    %esi, %ecx
          shrq    $5, %rsi        # does still need to be 64-bit
          movl    (%rax,%rsi,4), %eax
          btl     %ecx, %eax
          setb    %al
          retq

which is still one instruction shorter.

libstdc++-v3/ChangeLog:

        PR target/80813
        * include/bits/stl_bvector.h (vector<bool, _Alloc>::operator []):
        Do not use iterators.

gcc/testsuite/ChangeLog:

        PR target/80813
        * g++.dg/tree-ssa/bvector-3.C: New test.
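A minimal sketch of such a direct bit access (it assumes libstdc++'s _M_p / _S_word_bit internals; this is not the verbatim patch):

  bool
  operator[] (size_type __n) const
  {
    /* Word index and bit position are computed directly from __n;
       no iterator arithmetic, hence no negative-offset fixup.  */
    return (this->_M_impl._M_start._M_p[__n / int(_S_word_bit)]
            >> (__n % int(_S_word_bit))) & 1;
  }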
-
Richard Sandiford authored
rtl-ssa uses degenerate phis to maintain an RPO list of accesses in which every use is of the RPO-previous definition. Thus, if it finds that a phi is always equal to a particular value V, it sometimes needs to keep the phi and make V the single input, rather than replace all uses of the phi with V.

The code to do that rerouted the phi's first input to the single value V. But as this PR shows, it failed to unlink the uses of the other inputs. The specific problem in the PR was that we had:

  x = PHI<x(a), V(b)>

The code replaced the first input with V and removed the second input from the phi, but it didn't unlink the use of V associated with that second input.

gcc/
        PR rtl-optimization/118562
        * rtl-ssa/blocks.cc (function_info::replace_phi): When
        converting to a degenerate phi, make sure to remove all uses
        of the previous inputs.

gcc/testsuite/
        PR rtl-optimization/118562
        * gcc.dg/torture/pr118562.c: New test.
-
Richard Sandiford authored
GCC 15 is the first release to support FP8 intrinsics. The underlying instructions depend on the value of a new register, FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save register rather than a global register. So:

- The FP8 intrinsics take a final uint64_t argument that specifies
  what value FPMR should have.

- If an FP8 operation is split across multiple functions, it is
  likely that those functions would have a similar argument.

If the object code has the structure:

  for (...)
    fp8_kernel (..., fpmr_value);

then fp8_kernel would set FPMR to fpmr_value each time it is called, even though FPMR will already have that value for at least the second and subsequent calls (and possibly the first).

The working assumption for the ABI has been that writes to registers like FPMR can in general be more expensive than reads and so it would be better to use a conditional write like:

  mrs     tmp, fpmr
  cmp     tmp, <value>
  beq     1f
  msr     fpmr, <value>
  1:

instead of writing the same value to FPMR repeatedly.

This patch implements that. It also adds a tuning flag that suppresses the behaviour, both to make testing easier and to support any future cores that (for example) are able to rename FPMR.

Hopefully this really is the last part of the FP8 enablement.

gcc/
        * config/aarch64/aarch64-tuning-flags.def
        (AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
        * config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
        * config/aarch64/aarch64.md: Split moves into FPMR into a test
        and branch around.
        (aarch64_write_fpmr): New pattern.

gcc/testsuite/
        * g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add
        cheap_fpmr_write by default.
        * gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
        * gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
        * gcc.target/aarch64/acle/fpmr-2.c: Likewise.
        * gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
        * gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
        * gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
        * gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
        * gcc.target/aarch64/acle/fpmr-6.c: New test.
-
Richard Sandiford authored
GCC 15 is going to be the first release to support FPMR. While working on a follow-up patch, I noticed that for:

  (set (reg:DI R) ...)
  ...
  (set (reg:DI fpmr) (reg:DI R))

IRA would prefer to spill R to memory rather than allocate a GPR. This is because the register move cost for GENERAL_REGS to MOVEABLE_SYSREGS is very high:

  /* Moves to/from sysregs are expensive, and must go via GPR.  */
  if (from == MOVEABLE_SYSREGS)
    return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to);
  if (to == MOVEABLE_SYSREGS)
    return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS);

but the memory cost for MOVEABLE_SYSREGS was the same as for GENERAL_REGS, making memory much cheaper. Loading and storing FPMR involves a GPR temporary, so the cost should account for moving into and out of that temporary.

This did show up indirectly in some of the existing asm tests, where the stack frame allocated 16 bytes for callee saves (D8) and another 16 bytes for spilling a temporary register.

It's possible that other registers need the same treatment and it's more than probable that this code needs a rework. None of that seems suitable for stage 4 though.

gcc/
        * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account
        for the cost of moving in and out of GENERAL_SYSREGS.

gcc/testsuite/
        * gcc.target/aarch64/acle/fpmr-5.c: New test.
        * gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect
        a spill slot to be allocated.
        * gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
-
Richard Sandiford authored
GCC 15 is going to be the first release to support FPMR. The alternatives for moving values into FPMR were missing a zero alternative, meaning that moves of zero would use an unnecessary temporary register.

gcc/
        * config/aarch64/aarch64.md (*mov<SHORT:mode>_aarch64)
        (*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR
        to be zero.

gcc/testsuite/
        * gcc.target/aarch64/acle/fp8.c: Add tests for moving zero
        into FPMR.
-
Jakub Jelinek authored
The assume_query constructor does

  assume_query::assume_query (function *f, bitmap p)
    : m_parm_list (p), m_func (f)

where m_parm_list is bitmap &. This is compile time UB, because as soon as the constructor returns, the m_parm_list reference is still bound to the parameter of the constructor, which is no longer in scope.

Now, one possible fix would be to change the ctor argument to bitmap &, but that doesn't really work, because in the only user of that class we have

  auto_bitmap decls;
  ...
  assume_query query (fun, decls);

and auto_bitmap just has

  operator bitmap () { return &m_bits; }

It could perhaps be const bitmap &, but why? bitmap is a pointer:

  typedef class bitmap_head *bitmap;

and the EXECUTE_IF_SET_IN_BITMAP macros don't really change that point; they just inspect what is inside the bitmap_head the pointer points to.

So, I think the simplest fix is to avoid references (which cause even worse code, as they have to be dereferenced twice rather than once).

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

        PR tree-optimization/118605
        * tree-assume.cc (assume_query::m_parm_list): Change type
        from bitmap & to bitmap.
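The dangling-reference pattern in isolation (a hypothetical reduction, not the GCC source):

  struct bitmap_head;
  typedef bitmap_head *bitmap;

  struct query_like
  {
    bitmap &m_parm_list;   /* reference member */
    /* m_parm_list binds to the parameter p; once the ctor returns,
       p is out of scope and every later use of m_parm_list is
       undefined behavior.  */
    query_like (bitmap p) : m_parm_list (p) {}
  };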
-
Tejas Belagod authored
Currently poly-int type structures are passed by value to OpenMP runtime functions for shared clauses etc. This patch improves on this by passing poly-int structures by address instead, to avoid the copy overhead.

gcc/ChangeLog:

        * omp-low.cc (use_pointer_for_field): Use pointer if the OMP
        data structure's field type is a poly-int.
-
Rainer Orth authored
The new gcc.target/i386/cmov12.c test FAILs on Solaris/x86 with the native as:

  FAIL: gcc.target/i386/cmov12.c scan-assembler-times cmovg 3

This happens because as uses a different syntax for cmov:

  --- cmov12.s.bu243      2025-01-21 16:55:27.038829605 +0100
  +++ cmov12.s.bu24390    2025-01-21 16:55:44.565051230 +0100
  @@ -41,9 +41,9 @@
          leal    1(%rdx), %ebp
          movl    (%r11), %esi
          cmpl    %eax, %esi
  -       cmovg   %ebp, %edx
  -       cmovg   %r11, %rcx
  -       cmovg   %esi, %eax
  +       cmovl.g %ebp, %edx
  +       cmovq.g %r11, %rcx
  +       cmovl.g %esi, %eax

The problem is even more prominent with the upcoming gas 2.44, which added support for the Sun as syntax on Solaris, which gcc/configure picks up.

This patch allows for both forms.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2025-01-22  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
        * gcc.target/i386/cmov12.c (scan-assembler-times): Allow for
        cmovl.g etc.
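One way the scan could accept both spellings, as a sketch (the regex actually committed may differ):

  /* { dg-final { scan-assembler-times {cmov(l|q)?\.?g} 3 } } */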
-
Jakub Jelinek authored
As can be seen on the testcase, when array_expr is type dependent, assuming it has non-NULL TREE_TYPE is just wrong; it can often have NULL type, and even if not, blindly assuming it is a pointer or array type is also wrong. So, like in many other spots in the C++ FE, for type dependent expressions we want to create something which will survive until instantiation and can be redone at that point.

Unfortunately, build_omp_array_section is called before we actually do any kind of checking of what array_expr really is, and on invalid code it can be e.g. a TYPE_DECL, on which type_dependent_expression_p ICEs (as can be seen on the pr67522.C testcase). So, I've hacked this by checking that it is not a TYPE_DECL; I hope a TYPE_P can't make it through there when we just look up an identifier.

Anyway, this patch is not enough; we can ICE e.g. on __uint128_t[0:something] during instantiation, so I think something needs to be done for this in pt.cc as well.

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

        PR c++/118590
        * typeck.cc (build_omp_array_section): If array_expr is type
        dependent or a TYPE_DECL, build OMP_ARRAY_SECTION with NULL
        type.
        * g++.dg/goacc/pr118590.C: New test.
-
Jakub Jelinek authored
Some clang analyzer warned about

  if (!strcmp (p, "when") == 0 && !default_p)

which really looks weird; it is better to use strcmp (p, "when") != 0 or !!strcmp (p, "when"). Furthermore, as a micro-optimization, it is cheaper to evaluate default_p than to call strcmp, so that can be put first in the &&.

The C test for the same thing wasn't that weird, but I think for consistency it is better to use the same test rather than trying to be creative.

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

        PR c++/118604
gcc/c/
        * c-parser.cc (c_parser_omp_metadirective): Rewrite condition
        for clauses other than when, default and otherwise.
gcc/cp/
        * parser.cc (cp_parser_omp_metadirective): Test !default_p
        first and use strcmp () != 0 rather than !strcmp () == 0.
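The precedence trap spelled out (illustrative):

  /* !strcmp (p, "when") == 0  parses as  (!strcmp (p, "when")) == 0,
     i.e. "p differs from 'when'": correct here, but obscure.
     The rewritten form, with the cheap test first: */
  if (!default_p && strcmp (p, "when") != 0)
    ...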
-
Jakub Jelinek authored
The fold_builtin_frexp folding for NaN/Inf just returned the first argument with the second argument's side-effects evaluated, rather than storing something to what the second argument points to. The PR argues that the C standard requires the function to store something there, but what exactly is stored is unspecified, so not storing anything there can result in UB if the value isn't initialized and is read later. glibc and newlib store 0 there; musl apparently doesn't store anything.

The following patch stores 0 there (or would you prefer storing some other value, 42, INT_MAX, INT_MIN, etc.? zero is cheapest to form in assembly though) and adjusts the test so that it doesn't rely on nothing being stored there, but instead checks for the -Wmaybe-uninitialized warning to find out that something has been stored there.

Unfortunately I had to disable the NaN tests for -O0: while we can fold __builtin_isnan (__builtin_nan ("")) at compile time, we can't fold __builtin_isnan ((i = 0, __builtin_nan (""))) at compile time. fold_builtin_classify uses just tree_expr_nan_p, and if that isn't true (because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg), it does

  arg = builtin_save_expr (arg);
  return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);

and that isn't folded further at -O0, as we wrap it into SAVE_EXPR and nothing propagates the NAN to the comparison. I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR: added and recurse on the second argument, but that feels like stage1 material to me if we want to do that at all.

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

        PR middle-end/114877
        * builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf
        cases like rvc_zero, return passed in arg and set *exp = 0.
        * gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized
        as dg-additional-options.
        (bar): New function.
        (TESTIT_FREXP2): Rework the macro so that it doesn't test
        whether nothing has been stored to what the second argument
        points to, but instead that something has been stored there,
        whatever it is.
        (main): Temporarily don't enable the nan tests for -O0.
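The user-visible effect, as a minimal sketch (the C standard leaves the stored exponent value unspecified for NaN/Inf; the folding now stores 0):

  double test ()
  {
    int e;   /* deliberately uninitialized */
    double m = __builtin_frexp (__builtin_nan (""), &e);
    /* Before the fix, the fold returned the NaN without touching e,
       so a later read of e was UB; now the fold also sets e = 0.  */
    return m + e;
  }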
-
Torbjörn SVENSSON authored
Most baremetal toolchains will not have an implementation for alarm and sigaction as they are target specific. For arm-none-eabi with newlib, function signatures are exposed, but there is no implementation, and thus the test cases cause an undefined symbol link error.

gcc/testsuite/ChangeLog:

        * gcc.dg/pr78185.c: Remove dg-do and replace with
        dg-require-effective-target of signal and alarm.
        * gcc.dg/pr116906-1.c: Likewise.
        * gcc.dg/pr116906-2.c: Likewise.
        * gcc.dg/vect/pr101145inf.c: Use effective-target alarm.
        * gcc.dg/vect/pr101145inf_1.c: Likewise.
        * lib/target-supports.exp (check_effective_target_alarm): New.

gcc/ChangeLog:

        * doc/sourcebuild.texi (Effective-Target Keywords): Document
        'alarm'.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
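The shape of the per-test change, sketched (not the verbatim hunks):

  /* Before: */
  /* { dg-do run } */

  /* After: only run where the signal and alarm APIs exist.  */
  /* { dg-require-effective-target signal } */
  /* { dg-require-effective-target alarm } */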
-
Georg-Johann Lay authored
As it turns out, logical 32-bit shifts with an offset of 25..30 can be performed in 7 instructions or less. This beats the 7 instructions required for the default code of a shift loop. Plus, with zero overhead, these cases can be 3-operand.

This is only relevant for -Oz, because with -Os, 3op shifts are split with -msplit-bit-shift (which is not performed with -Oz).

        PR target/117726
gcc/
        * config/avr/avr.cc (avr_ld_regno_p): New function.
        (ashlsi3_out) [case 25,26,27,28,29,30]: Handle and tweak.
        (lshrsi3_out): Same.
        (avr_rtx_costs_1) [SImode, ASHIFT, LSHIFTRT]: Adjust costs.
        * config/avr/avr.md (ashlsi3, *ashlsi3, *ashlsi3_const): Add
        "r,r,C4L" alternative.
        (lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,r,C4R" alternative.
        * config/avr/constraints.md (C4R, C4L): New.
gcc/testsuite/
        * gcc.target/avr/torture/avr-torture.exp (AVR_TORTURE_OPTIONS):
        Turn one option variant into -Oz.
-
Paul Thomas authored
2025-01-23  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
        PR fortran/96087
        * trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing
        a backend decl, it is likely that it has come from a module
        proc interface.  Look for the formal symbol by name in the
        containing proc and use its backend decl.
        * trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the
        same reason, match the name, rather than the symbol address,
        to perform the mapping.

gcc/testsuite/
        PR fortran/96087
        * gfortran.dg/pr96087.f90: New test.
-
Richard Biener authored
There are calls to dr_misalignment left that do not correct for the offset (which is vector type dependent) when the stride is negative. Notably vect_known_alignment_in_bytes doesn't allow such an offset to be passed through, which the following adds (computing the offset in vect_known_alignment_in_bytes would be possible as well, but the offset can be shared, as seen). Eventually this function could go away.

This led to peeling for gaps not being considered and shortening of the access not being applied, which is what fixes the testcase on x86_64.

        PR tree-optimization/118558
        * tree-vectorizer.h (vect_known_alignment_in_bytes): Pass
        through offset to dr_misalignment.
        * tree-vect-stmts.cc (get_group_load_store_type): Compute
        offset applied for negative stride and use it when querying
        alignment of accesses.
        (vectorizable_load): Likewise.
        * gcc.dg/vect/pr118558.c: New testcase.
-
Nathaniel Shead authored
https://github.com/itanium-cxx-abi/cxx-abi/pull/85 clarifies that mangling a lambda expression should use 'L' rather than "tl".

gcc/cp/ChangeLog:

        * mangle.cc (write_expression): Update mangling for lambdas.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp2a/lambda-generic-mangle1.C: Update mangling.
        * g++.dg/cpp2a/lambda-generic-mangle1a.C: Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
-
Nathaniel Shead authored
This fixes an issue where lambdas declared in the initializer of a static data member within the class body do not get a mangling scope of that variable; this results in mangled names that do not conform to the ABI spec.

To do this, the patch splits up grokfield for this case specifically, allowing a declaration to be built and used in start_lambda_scope before parsing the initializer, so that record_lambda_scope works correctly.

As a drive-by, this also fixes the issue of a static member not being visible within its own initializer.

        PR c++/107741

gcc/c-family/ChangeLog:

        * c-opts.cc (c_common_post_options): Bump ABI version.

gcc/ChangeLog:

        * common.opt: Add -fabi-version=20.
        * doc/invoke.texi: Likewise.

gcc/cp/ChangeLog:

        * cp-tree.h (start_initialized_static_member): Declare.
        (finish_initialized_static_member): Declare.
        * decl2.cc (start_initialized_static_member): New function.
        (finish_initialized_static_member): New function.
        * lambda.cc (record_lambda_scope): Support falling back to old
        ABI (maybe with warning).
        * parser.cc (cp_parser_member_declaration): Build decl early
        when parsing an initialized static data member.

gcc/testsuite/ChangeLog:

        * g++.dg/abi/macro0.C: Bump ABI version.
        * g++.dg/abi/mangle74.C: Remove XFAILs.
        * g++.dg/other/fold1.C: Restore originally raised error.
        * g++.dg/abi/lambda-ctx2-19.C: New test.
        * g++.dg/abi/lambda-ctx2-19vs20.C: New test.
        * g++.dg/abi/lambda-ctx2-20.C: New test.
        * g++.dg/abi/lambda-ctx2.h: New test.
        * g++.dg/cpp0x/static-member-init-1.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
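The kind of declaration affected, as a minimal example (illustrative):

  struct A
  {
    /* The closure type lives in the initializer of A::x, so its
       mangling scope should be the variable A::x; that requires
       A::x's declaration to exist before the initializer is parsed,
       which is what the grokfield split arranges.  */
    static inline auto x = [] { return 42; };
  };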
-
Nathaniel Shead authored
When we started streaming the bit to handle merging of imported temploid friends in r15-2807, I unthinkingly only streamed it in the '!state->is_header ()' case. This patch reworks the streaming logic to ensure that this data is always streamed, including for unique entities (in case that ever comes up somehow).

This does make the streaming slightly less efficient, as functions and types will need an extra byte, but this doesn't appear to make a huge difference to the size of the resulting module; the 'std' module on my machine grows by 0.2% from 30671136 to 30730144 bytes.

        PR c++/118582

gcc/cp/ChangeLog:

        * module.cc (trees_out::decl_value): Always stream
        imported_temploid_friends information.
        (trees_in::decl_value): Likewise.

gcc/testsuite/ChangeLog:

        * g++.dg/modules/pr118582_a.H: New test.
        * g++.dg/modules/pr118582_b.H: New test.
        * g++.dg/modules/pr118582_c.H: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
-
Xi Ruoyao authored
The test case added in r15-7073 now triggers an ICE, indicating we need the same fix as AArch64.

gcc/ChangeLog:

        PR target/118501
        * config/loongarch/loongarch.md (@xorsign<mode>3): Use
        force_lowpart_subreg.
-