gcc.gnu.org Git - gcc.git/log

fortran: fix simple typo in libgfortran

This patch fixes a simple typo in a comment in libgfortran.
No user facing change here.

libgfortran/ChangeLog:

* io/read.c (read_f): Comment typo, explict -> explicit.

Signed-off-by: Yuao Ma <c8ef@outlook.com>

testsuite: Disable bit tests in aarch64/pr99988.c

My recent changes to bit-test switch lowering broke pr99988.c testcase.
The testcase assumes a switch will be lowered using jump tables. Make
the testcase run with -fno-bit-tests.
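
For context, a minimal sketch of what bit-test lowering means, written as
plain C rather than GCC internals.  The function names are hypothetical and
the case values are made up; this is not the code from pr99988.c.

```c
#include <assert.h>
#include <stdbool.h>

/* A switch whose case values are small and sparse.  */
bool in_set_switch (unsigned x)
{
  switch (x)
    {
    case 1:
    case 3:
    case 4:
    case 9:
      return true;
    default:
      return false;
    }
}

/* The same predicate as one range check plus one bit test:
   bit N of MASK is set exactly when N is a case value.  */
bool in_set_bittest (unsigned x)
{
  const unsigned long long mask
    = (1ULL << 1) | (1ULL << 3) | (1ULL << 4) | (1ULL << 9);
  return x < 64 && ((mask >> x) & 1) != 0;
}

/* Compare the two forms over a small range of inputs.  */
void check_bittest_equivalence (void)
{
  for (unsigned x = 0; x < 128; x++)
    assert (in_set_switch (x) == in_set_bittest (x));
}
```

A testcase that scans for a jump table would not find one if the compiler
picks the second form, which is why the option is needed.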

Pushed as obvious.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr99988.c: Add -fno-bit-tests.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

gimple: Don't assert that switch has nondefault cases during lowering [PR120080]

I have mistakenly assumed that switch lowering cannot encounter a switch
with zero clusters. This patch removes the relevant assert and instead
gives up bit-test lowering when this happens.

PR tree-optimization/120080

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Replace assert with return.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr120080.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

[V2][RISC-V] Synthesize more efficient IOR/XOR sequences

So mvconst_internal's primary benefit is in constant synthesis not impacting
the combine budget in terms of the number of instructions it is willing to
combine together at any given time.  The downside is mvconst_internal breaks
combine's toplevel costing model and as a result many other patterns have to be
implemented as define_insn_and_splits rather than the often more natural
define_splits.

This primarily impacts logical operations where we want to see the constant
operand and potentially simplify the logical with other nearby logicals or
shifts.

We can reduce our reliance on mvconst_internal and generate better code for
various cases by generating better initial code for logical operations.

So let's assume we have an inclusive-or of a register with a nontrivial
constant.  Right now we will load the nontrivial constant into a new pseudo
(using multiple instructions), then emit a two register source ior operation.

For some cases we can just generate the code we want at expansion time.
Concretely let's take this testcase:

> unsigned long foo(unsigned long src) { return src | 0x8800000000000007; }

Right now we generate this code:

>         li      a5,-15
>         slli    a5,a5,59
>         addi    a5,a5,7
>         or      a0,a0,a5

The first three instructions are synthesizing the constant.  The last
instruction performs the desired operation.  But we can do better:

>         ori     a0,a0,7
>         bseti   a0,a0,59
>         bseti   a0,a0,63

Notice how we never even bother to synthesize the constant.
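
The equivalence of the two sequences can be checked with a small C model.
This is only a sketch: bseti is modelled with a hypothetical helper, and the
function names are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of bseti: set a single bit of X.  */
static uint64_t bseti (uint64_t x, unsigned bit)
{
  assert (bit < 64);
  return x | (UINT64_C (1) << bit);
}

/* What the source asks for: OR with the full 64-bit constant.  */
uint64_t ior_with_constant (uint64_t src)
{
  return src | UINT64_C (0x8800000000000007);
}

/* The three-instruction sequence, one statement per instruction.  */
uint64_t ior_by_pieces (uint64_t src)
{
  uint64_t a0 = src | 7;   /* ori   a0,a0,7  */
  a0 = bseti (a0, 59);     /* bseti a0,a0,59 */
  a0 = bseti (a0, 63);     /* bseti a0,a0,63 */
  return a0;
}
```

Both functions return the same value for every input, since bits 0-2, 59
and 63 are exactly the set bits of the constant.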

IOR/XOR are pretty simple and this patch focuses exclusively on those. We use
[x]ori to set whatever low 11 bits we need, then bset/binv for a small number
of higher bits.  We use the cost of constant synthesis as our budget.

We also support a couple special cases.  First, we might be able to rotate the
source value such that all the bits we want to manipulate are in the low 11
bits.  So we rotate the source, manipulate the bits, then rotate things back to
where they belong.  I didn't see this trigger in spec, but I did trivially find
a testcase where it was likely faster.
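
A sketch of the rotate idea in C.  The constant below is picked purely for
illustration (it is not the testcase mentioned above), and the helper names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t rotl64 (uint64_t x, unsigned n)
{
  assert (n > 0 && n < 64);   /* keeps both shifts well defined */
  return (x << n) | (x >> (64 - n));
}

static uint64_t rotr64 (uint64_t x, unsigned n)
{
  assert (n > 0 && n < 64);
  return (x >> n) | (x << (64 - n));
}

/* The constant has bits 63, 62, 1 and 0 set.  */
uint64_t ior_direct (uint64_t src)
{
  return src | UINT64_C (0xC000000000000003);
}

uint64_t ior_via_rotate (uint64_t src)
{
  uint64_t t = rotl64 (src, 2);  /* bits 63,62,1,0 move to 1,0,3,2 */
  t |= 0xF;                      /* one small-immediate OR */
  return rotr64 (t, 2);          /* rotate everything back */
}
```

After the rotate, all four bits of interest sit in the low bits, so a single
small-immediate OR covers them.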

Second, we can have cases where we want to invert most of the bits, but a small
number are supposed to be preserved.  We can pre-flip the bits we want to
preserve with binv, then invert the whole register with not (which puts the
bits to be preserved back in their original state).
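
The pre-flip trick relies on the identity ~(x ^ P) == x ^ ~P.  A small C
model, assuming for illustration that bits 5 and 40 are the ones to be
preserved (the bit positions and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Model of binv: invert a single bit of X.  */
static uint64_t binv (uint64_t x, unsigned bit)
{
  assert (bit < 64);
  return x ^ (UINT64_C (1) << bit);
}

/* XOR with a constant that has every bit set except bits 5 and 40.  */
uint64_t xor_direct (uint64_t src)
{
  const uint64_t preserved = (UINT64_C (1) << 5) | (UINT64_C (1) << 40);
  return src ^ ~preserved;
}

/* Pre-flip the preserved bits, then invert the whole register.  */
uint64_t xor_via_binv_not (uint64_t src)
{
  uint64_t t = binv (src, 5);
  t = binv (t, 40);
  return ~t;   /* the second flip restores bits 5 and 40 */
}
```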

I suspect there are likely a few more cases that could be improved, but the
patch should stand on its own now and getting it out of the way allows us to
focus on logical AND which is far tougher, but also more important in the task
of removing mvconst_internal.

As we're not removing mvconst_internal yet, this patch is mostly a nop. I did
look at spec before/after and didn't see anything particularly interesting.  I
also temporarily removed mvconst_internal and looked at spec before/after to
hopefully ensure we weren't missing anything obvious in the XOR/IOR cases.
Obviously that latter test showed all kinds of regressions with AND.

We're still working through implementation details on the AND case and
determining what bridge patterns we're going to need to ensure we don't
regress.   But this XOR/IOR patch is in good enough shape that it can go
forward now.

Naturally this has been run through my tester (bootstrap & regression test is
in flight, but won't finish for many more hours).  Obviously I'm quite
interested in anything spit out by the pre-commit CI system.

gcc/

* config/riscv/iterators.md (OPTAB): New iterator.
* config/riscv/predicates.md (arith_or_zbs_operand): Remove.
(reg_or_const_int_operand): New predicate.
* config/riscv/riscv-protos.h (synthesize_ior_xor): Prototype.
* config/riscv/riscv.cc (synthesize_ior_xor): New function.
* config/riscv/riscv.md (ior/xor expander): Use synthesize_ior_xor.

gcc/testsuite/

* gcc.target/riscv/ior-synthesis-1.c: New test.
* gcc.target/riscv/ior-synthesis-2.c: New test.
* gcc.target/riscv/xor-synthesis-1.c: New test.
* gcc.target/riscv/xor-synthesis-2.c: New test.
* gcc.target/riscv/xor-synthesis-3.c: New test.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

i386/cygming: Decrease default preferred stack boundary for 32-bit targets

This commit decreases the default preferred stack boundary to 4 bytes.

In i386-options.cc, there's

ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY;

which sets the default incoming stack boundary to this value, if it's not
overridden by other options or attributes.

Previously, GCC preferred 16-byte alignment like other platforms, unless
`-miamcu` was specified. However, the Microsoft x86 ABI only requires the
stack be aligned to 4-byte boundaries. Callback functions from MSVC code may
break this assumption by GCC (see reference below), causing local variables
to be misaligned.

For compatibility reasons, when the attribute `force_align_arg_pointer` is
attached to a function, it continues to ensure the stack is at least aligned
to a 16-byte boundary, as the documentation seems to suggest.

After this change, `STACK_REALIGN_DEFAULT` no longer has an effect on this
target, so it is removed.

Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111107#c9
Signed-off-by: LIU Hao <lh_mouse@126.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>
gcc/ChangeLog:

PR target/111107
* config/i386/cygming.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Override
definition from i386.h.
(STACK_REALIGN_DEFAULT): Undefine, as it no longer has an effect.
* config/i386/i386.cc (ix86_update_stack_boundary): Force minimum
128-bit alignment if `force_align_arg_pointer`.

[PATCH v2] RISC-V: Use vclmul for CRC expansion if available

If the vector version of clmul (vclmul) is available and the scalar
one is not, use it for CRC expansion.
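
For readers unfamiliar with the operation, a software model of carry-less
multiplication in C.  This sketch only illustrates the arithmetic; it says
nothing about how expand_crc_using_clmul emits code, and the function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less multiply of two 32-bit values: like schoolbook
   multiplication, but partial products are combined with XOR.  */
uint64_t clmul32 (uint32_t a, uint32_t b)
{
  uint64_t result = 0;
  for (unsigned i = 0; i < 32; i++)
    if ((b >> i) & 1)
      result ^= (uint64_t) a << i;
  return result;
}

void check_clmul32 (void)
{
  assert (clmul32 (3, 3) == 5);   /* (x+1)^2 = x^2+1 over GF(2) */
  assert (clmul32 (5, 3) == 15);  /* (x^2+1)(x+1) = x^3+x^2+x+1 */
}
```

This is polynomial multiplication over GF(2), the same arithmetic CRCs are
defined in.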

gcc/
* config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): Check
TARGET_ZVBC.
* config/riscv/riscv.cc (expand_crc_using_clmul): Emit code using
vclmul if TARGET_ZVBC.

gcc/testsuite

* gcc.target/riscv/rvv/base/crc-builtin-zvbc.c: New test.

[testsuite] [ppc] pr87600, pr89313: test for __PPC__ as well

gcc.dg/pr87600.h and gcc.dg/pr89313.c test for __powerpc__ and
__POWERPC__ to choose ppc register names, but ppc-elf defines neither;
it defines __PPC__, so test for that as well.

for gcc/testsuite/ChangeLog

* gcc.dg/pr87600.h (REG1, REG2): Test for __PPC__ as well.
* gcc.dg/pr89313.c (REG): Likewise.

[testsuite] [ppc] block-cmp-8 should require powerpc64

gcc.target/powerpc/block-cmp-8.c is an execution test on ilp32.  It
tests for support for the 64-bit ISA in the compiler, but not for the
ability to execute powerpc64 instructions, so the test fails on 32-bit
hardware.  Require powerpc64 instead.

for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/block-cmp-8.c: Require powerpc64
instruction execution support.

vxworks: libstdc++: include ioLib.h for dup()

vxworks's dup function is not declared in unistd.h, but c++23/print.cc
expects to be able to call it if unistd.h is available. On vxworks,
the function is only declared in ioLib.h, so arrange to include it.

for libstdc++-v3/ChangeLog

* src/c++23/print.cc [__VXWORKS__]: Include ioLib.h.

c++: recursive instantiation diagnostic [PR120204]

Here tsubst_baselink was returning error_mark_node silently despite
tf_error; we need to actually give an error.

PR c++/120204

gcc/cp/ChangeLog:

* pt.cc (tsubst_baselink): Always error if lookup fails.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-recursion3.C: New test.

Daily bump.

c++: visibility of instantiated template friends

In 20_util/variant/visit_member.cc, instantiation of the variant friend
declaration of __get for variant<test01()::X> was being marked as internal
because that variant specialization is itself internal.  And therefore
check_module_override didn't try to merge it with the non-exported
namespace-scope declaration of __get.

But the template parms of variant are not part of the friend template's
identity, so they should not affect its visibility.  If they are substituted
into the friend declaration, we'll handle that when looking at the
declaration itself.

This change no longer seems necessary to fix the testcase, but does still
seem correct.  We definitely still get here during tsubst_friend_function.

gcc/cp/ChangeLog:

* decl2.cc (determine_visibility): Ignore args for friend templates.

c++: CWG2369 workaround and ... [PR120185]

My r16-479 adjustment to the PR99599 workaround broke on a class with a
varargs constructor.

It also occurred to me that we don't need to do non-dep conversion checking
in two phases when concepts aren't supported.

PR c++/99599
PR c++/120185

gcc/cp/ChangeLog:

* class.cc (type_has_converting_constructor): Handle null parm.
* pt.cc (fn_type_unification): Skip early non-dep checking if
no concepts.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-nondep6.C: New test.

Fix wrong optimization of complex boolean expression

The VRP2 pass turns:

  # prephitmp_3 = PHI <0(4)>
  _1 = prephitmp_3 == 0;
  _5 = stretch_14(D) ^ 1;
  _39 = _1 & _5;
  _40 = _39 | last_20(D);

into

  _5 = stretch_14(D) ^ 1;
  _42 = ~stretch_14(D);
  _39 = _42;
  _40 = last_20(D) | _39;

using the following step:

Folding statement: _1 = prephitmp_3 == 0;
Queued stmt for removal.  Folds to: 1
Folding statement: _5 = stretch_14(D) ^ 1;
Not folded
Folding statement: _39 = _1 & _5;
gimple_simplified to _42 = ~stretch_14(D);
_39 = _42 & 1;
Folded into: _39 = _42;

Folding statement: _40 = _39 | last_20(D);
Folded into: _40 = last_20(D) | _39;

but stretch_14 is an 8-bit boolean, so the two forms are not equivalent; that
is to say, dropping the "& 1" is wrong.  It's another instance of the issue:
  https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558537.html

Here it's the reverse case: the bitwise NOT (~) is treated as logical by the
machinery in range-op.cc but the bitwise AND (&) is *not* treated as logical
by that of vr-values.cc, leading to the same problematic outcome.
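
The difference is easy to reproduce in plain C with an 8-bit unsigned type
standing in for the 8-bit boolean.  The function names are hypothetical and
the sketch models only the arithmetic, not the GIMPLE folding itself.

```c
#include <assert.h>
#include <stdint.h>

/* Original expression: stretch ^ 1, with STRETCH holding 0 or 1.  */
uint8_t negate_xor (uint8_t stretch)
{
  assert (stretch <= 1);
  return stretch ^ 1;
}

/* Correct rewrite: bitwise NOT followed by the "& 1" mask.  */
uint8_t negate_not_masked (uint8_t stretch)
{
  assert (stretch <= 1);
  return (uint8_t) (~stretch & 1);
}

/* Rewrite with the mask dropped: yields 0xFF or 0xFE, never 0 or 1.  */
uint8_t negate_not_unmasked (uint8_t stretch)
{
  assert (stretch <= 1);
  return (uint8_t) ~stretch;
}
```

With a 1-bit precision type the last two forms would agree, which is why the
fix is restricted to boolean types whose precision is not 1.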

gcc/
* vr-values.cc (simplify_using_ranges::simplify) <BIT_AND_EXPR>:
Do not call simplify_bit_ops_using_ranges for boolean types whose
precision is not 1.

gcc/testsuite/
* gnat.dg/opt106.adb: New test.
* gnat.dg/opt106_pkg1.ads, gnat.dg/opt106_pkg1.adb: New helper.
* gnat.dg/opt106_pkg2.ads, gnat.dg/opt106_pkg2.adb: Likewise.

tree-optimization/114166 - vectorize to lowered form with word_mode

The following adjusts the non-PLUS/MINUS/NEGATE_EXPR vectorizations
of "word_mode" vectors to emit the form vector lowering will later use.
This allows us to move the vector lowering pass before vectorization,
specifically closing the gap between vectorization and lowering,
so we can eventually assert the vectorizer doesn't emit any code
that's not directly supported by the target.

PR tree-optimization/114166
* tree-vect-stmts.cc (vectorizable_operation): Lower also
bitwise operations on word-mode vectors.

Remove non-SLP path from vectorizable_operation

This removes the non-SLP path from vectorizable_operation and folds
away ncopies, replaces STMT_VINFO_VECTYPE with SLP_TREE_VECTYPE
and removes a big comment that has long been inaccurate in many
details. It does not get rid of the 'vec_stmt' argument
since splitting the function into analysis and transform would
require storing analysis results somewhere which should be done
separately.

* tree-vect-stmts.cc (vectorizable_operation): Remove non-SLP
path.

gimple-fold: Don't replace `{true/false} != false` with `true/false` inside GIMPLE_COND

This is like the patch where we don't want to replace `bool_name != 0`
with `bool_name`, but for INTEGER_CST instead. The only difference
is that there are a few different forms for always true/always
false; only handle it if it was in the canonical form. A few new helpers are
added for the canonical form detection.

This also replaces the previous version of the patch, which did an early
exit from fold_stmt_1 instead, so that a non-canonical form can still be
changed into the canonical one in the end.

gcc/ChangeLog:

* gimple.h (gimple_cond_true_canonical_p): New function.
(gimple_cond_false_canonical_p): New function.
* gimple-fold.cc (replace_stmt_with_simplification): Return
false if replacing the operands of GIMPLE_COND with an INTEGER_CST
and already in canonical form.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

rtl-optimization/120182 - wrong-code with RTL DSE and constant addresses

RTL DSE forms store groups from unique invariant bases but that is
confused when presented with constant addresses where it assigns
one store group per unique address.  That causes it to not consider
0x101:QI to alias 0x100:SI.  Constant accesses can really alias
to every object, in practice they appear for I/O and for access
to objects fixed via linker scripts for example.  So simply avoid
registering a store group for them.
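
The aliasing in the example comes down to a byte-range overlap.  A minimal
sketch in C (the function name is hypothetical, and this is not how dse.cc
represents accesses):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when byte ranges [a, a+a_size) and [b, b+b_size) intersect.
   Wrap-around at the top of the address space is ignored here.  */
bool accesses_overlap (uintptr_t a, unsigned a_size,
                       uintptr_t b, unsigned b_size)
{
  assert (a_size > 0 && b_size > 0);
  return a < b + b_size && b < a + a_size;
}
```

A one-byte access at 0x101 lies inside a four-byte access at 0x100, so
treating the two addresses as unrelated store groups loses the dependence.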

PR rtl-optimization/120182
* dse.cc (canon_address): Constant addresses have no
separate store group.

* gcc.dg/torture/pr120182.c: New testcase.

libgomp.{c,fortran}/interop-{hip,cuda}: Fix dg-run target selection

While the tests checked whether the CUDA/HIP runtime is available
before processing them, the execution was then done unconditionally,
leading to FAIL when the default device was the host (or the wrong
offload device).

Now the test is only executed ('run') when the default device is an
Nvidia or AMD GPU (depending on the test case, cf. the test file name).
Otherwise, only a 'link' test is done. (Except when the effective-target
check cannot find the runtime lib - then the test is skipped [as before].)

Note: The cublas/hipblas tests use variant functions and iterate over
all devices, such that the cublas or hipblas, respectively, is only
called when the active device is an AMD or Nvidia device, respectively,
while for the host and other device types the fallback is called.

libgomp/ChangeLog:

* testsuite/libgomp.c/interop-cuda-full.c: Use 'link' instead
of 'run' when the default device is "! offload_device_nvptx".
* testsuite/libgomp.c/interop-cuda-libonly.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: Likewise.
* testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: Likewise.
* testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: Likewise.
* testsuite/libgomp.c/interop-hip-amd-full.c: Use 'link' instead
of 'run' when the default device is "! offload_device_gcn".
* testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: Likewise.
* testsuite/libgomp.fortran/interop-hip-amd-full.F90: Likewise.
* testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: Likewise.

tree-optimization/119960 - failed external SLP promotion

The following addresses a too conservative sanity check of SLP nodes
we want to promote external.  The issue lies in code generation
for such external which relies on get_later_stmt to figure an
insert location.  But get_later_stmt relies on the ability to
totally order stmts, specifically implementation-wise that they
are all from the same BB, which is what is verified at the moment.

The patch changes this to require stmts to be orderable by
dominance queries.  For simplicity and seemingly enough for the
testcase in PR119960, this handles the case of two distinct BBs.

PR tree-optimization/119960
* tree-vect-slp.cc (vect_slp_can_convert_to_external):
Handle cases where defs from multiple BBs are ordered
by their dominance relation.

* gcc.dg/vect/bb-slp-pr119960-1.c: New testcase.

testsuite: g++.dg/cpp2a/constinit16.C requires tls

This test is 'dg-do compile', so require tls instead of tls_runtime.

This enables it on targets such as arm-none-eabi configured with
--enable-threads=no.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constinit16.C: Require tls.

testsuite: g++.dg/cpp2a/decomp2.C requires tls_runtime

Since this test is a 'dg-do run', it requires tls_runtime rather than
just tls.

This makes the test UNSUPPORTED on targets such as arm-none-eabi,
instead of FAIL/UNRESOLVED because __aeabi_read_tp is not provided
(e.g. when GCC is configured with --enable-threads=no).

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/decomp2.C: Require tls_runtime.

Printf properly on systems without %zu [PR120086]

Some systems don't support the %zu format modifier for size_t, such as
hppa64-hp-hpux. We don't really need the full width of size_t for
printing the number of prime paths as path counts of those sizes
would've already blown up the machine. For printing the vector size we
can use the formatting directives from hwint.h.
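
A sketch of the first approach in standalone C: narrow the count to unsigned
and print it with %u.  The function name is hypothetical, and saturating at
UINT_MAX is an assumption of this sketch, not something taken from gcov.cc.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Format COUNT without %zu by narrowing to unsigned and using %u.
   Values above UINT_MAX are clamped (an assumption of this sketch).  */
int format_count (char *buf, size_t buflen, size_t count)
{
  unsigned narrowed = count > UINT_MAX ? UINT_MAX : (unsigned) count;
  int n = snprintf (buf, buflen, "%u", narrowed);
  assert (n >= 0);
  return n;
}
```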

PR gcov-profile/120086

gcc/ChangeLog:

* gcov.cc (print_prime_path_lines): Use unsigned, format with
%u.
(print_prime_path_source): Likewise.
(output_path_coverage): Format with HOST_SIZE_T_PRINT_UNSIGNED,
use unsigned for pathno.

testsuite: Limit option '-mgeneral-regs-only' backends in pr119160.

Restrict the '-mgeneral-regs-only' option to the backends that support it.

Version log:
https://patchwork.sourceware.org/project/gcc/patch/20250508080102.1340059-1-jiawei@iscas.ac.cn/

gcc/testsuite/ChangeLog:

* gcc.dg/pr119160.c: Limit backends.

AArch64: Optimize SVE loads/stores with ptrue predicates to unpredicated instructions.

SVE loads and stores where the predicate is all-true can be optimized to
unpredicated instructions. For example,
svuint8_t foo (uint8_t *x)
{
  return svld1 (svptrue_b8 (), x);
}
was compiled to:
foo:
ptrue p3.b, all
ld1b z0.b, p3/z, [x0]
ret
but can be compiled to:
foo:
ldr z0, [x0]
ret

Late_combine2 had already been trying to do this, but was missing the
instruction:
(set (reg/i:VNx16QI 32 v0)
    (unspec:VNx16QI [
            (const_vector:VNx16BI repeat [
                    (const_int 1 [0x1])
                ])
            (mem:VNx16QI (reg/f:DI 0 x0 [orig:106 x ] [106])
      [0 MEM <svuint8_t> [(unsigned char *)x_2(D)]+0 S[16, 16] A8])
        ] UNSPEC_PRED_X))

This patch adds a new define_insn_and_split that matches the missing
instruction and splits it to an unpredicated load/store. Because LDR
offers fewer addressing modes than LD1[BHWD], the pattern is
guarded under reload_completed to only apply the transform once the
address modes have been chosen during RA.

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve.md (*aarch64_sve_ptrue<mode>_ldr_str):
Add define_insn_and_split to fold predicated SVE loads/stores with
ptrue predicates to unpredicated instructions.

gcc/testsuite/
* gcc.target/aarch64/sve/ptrue_ldr_str.c: New test.
* gcc.target/aarch64/sve/acle/general/attributes_6.c: Adjust
expected outcome.
* gcc.target/aarch64/sve/cost_model_14.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_4.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_5.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_6.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_7.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_mf8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/peel_ind_2.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_1.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_2.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_3.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_4.c: Adjust expected outcome.

check_GNU_style: Remove literal prefix

The path "b/binutils/dwarf.c" should be printed as "binutils/dwarf.c",
not "inutils/dwarf.c".
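
One plausible way such a symptom arises is stripping a set of characters
where a literal prefix was meant.  That cause is an assumption; the commit
message does not say how the script did it.  The script itself is Python;
the sketch below models the two behaviours in C with hypothetical function
names.

```c
#include <assert.h>
#include <string.h>

/* Buggy form: skips every leading character found in the set "b/",
   so it also eats the 'b' of "binutils".  */
const char *strip_leading_chars (const char *path)
{
  assert (path != NULL);
  return path + strspn (path, "b/");
}

/* Intended form: drop the literal two-character prefix "b/" once.  */
const char *strip_literal_prefix (const char *path)
{
  assert (path != NULL);
  return strncmp (path, "b/", 2) == 0 ? path + 2 : path;
}
```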

contrib/ChangeLog:

* check_GNU_style_lib.py: Remove literal prefix.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

libstdc++: Use _Padding_sink in __formatter_chrono to produce padded output.

The formatting code is extracted into the _M_format_to function, which writes
output to the specified iterator. This function is now invoked either with
__fc.out() directly (if no width is specified) or with _Padding_sink::out().

This avoids formatting to a temporary string if no padding is requested,
and minimizes allocations otherwise. For more details see the commit message of
r16-142-g01e5ef3e8b91288f5d387a27708f9f8979a50edf.

This should not increase the number of instantiations, as the implementation
only produces basic_format_context with _Sink_iter as the iterator, which is
also the _Padding_sink iterator.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
Extracted from _M_format.
(__formatter_chrono::_M_format): Use _Padding_sink and delegate
to _M_format_to.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Provide ability to query _Sink_iter if writes are discarded.

This patch provides _M_discarding functions for _Sink_iter and _Sink that
return true if any further writes to the _Sink_iter and the underlying _Sink
will be discarded, and thus can be omitted.

Currently only the _Padding_sink reports discarding mode, if the width of the
character sequence is greater than _M_maxwidth (precision), or the underlying
_Sink is discarding characters. The _M_discarding override is a separate
function from _M_ignoring, which remains annotated with
[[__gnu__::__always_inline__]].

Despite having a notion of the maximum number of characters to be written
(_M_max), _Iter_sink never discards characters, as the total number of
characters that would be written needs to be returned by format_to_n. This is
documented in-source by providing an _Iter_sink::_M_discarding override that
always returns false.

The function is currently queried only by the _Padding_sinks, which may be
stacked, for example when a range is formatted with padding specified both for
the range itself and its elements. The state of the underlying sink is checked
during construction and after each write (_M_sync_discarding).

libstdc++-v3/ChangeLog:

* include/std/format (__Sink_iter<_CharT>::_M_discarding)
(__Sink<_CharT>::_M_discarding, _Iter_sink<_CharT, _OutIter>::_M_discarding)
(_Padding_sink<_CharT, _Out>::_M_padwidth)
(_Padding_sink<_CharT, _Out>::_M_maxwidth): Remove const.
(_Padding_sink<_CharT, _Out>::_M_sync_discarding)
(_Padding_sink<_CharT, _Out>::_M_discarding): Define.
(_Padding_sink<_CharT, _Out>::_Padding_sink(_Out, size_t, size_t))
(_Padding_sink<_CharT, _Out>::_M_force_update):
(_Padding_sink<_CharT, _Out>::_M_flush): Call _M_sync_discarding.
(_Padding_sink<_CharT, _Out>::_Padding_sink(_Out, size_t)): Delegate.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

diagnostics: convert HTML output test plugin to 'experimental-html' sink [PR116792]

In r15-3752-g48261bd26df624 I added a test plugin that overrode the
regular output, instead emitting diagnostics in crude HTML form.

In r15-4760-g0b73e9382ab51c I added support for multiple kinds of
diagnostic output simultaneously, adding
-fdiagnostics-add-output=DIAGNOSTICS-OUTPUT-SPEC
-fdiagnostics-set-output=DIAGNOSTICS-OUTPUT-SPEC
for adding/changing the kind of diagnostics output, supporting
"text" and "sarif" output schemes.

This patch promotes the HTML output code from the test plugins so
that it is available from "-fdiagnostics-add-output=", using a
new "experimental-html" scheme, to allow simultaneous text, sarif
and html output, and to make it easier to experiment with.  The
patch adds Python-based testing of the emitted HTML.

The patch does not affect the generated HTML, which is still crude, and
not yet ready for end-users.  I hope to improve it in followups.

gcc/ChangeLog:
PR other/116792
* Makefile.in (OBJS-libcommon): Add diagnostic-format-html.o.
* diagnostic-format-html.cc: Move here from
testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc.
Simplify includes.  Rename "xhtml" to "html" throughout.
(write_escaped_text): Drop.
(class xhtml_stream_output_format): Drop.
(class html_file_output_format): Reimplement using
diagnostic_output_file.
(diagnostic_output_format_init_xhtml): Drop.
(diagnostic_output_format_init_xhtml_stderr): Drop.
(diagnostic_output_format_init_xhtml_file): Drop.
(diagnostic_output_format_open_html_file): New.
(make_html_sink): New.
(xhtml_format_selftests): Convert to...
(diagnostic_format_html_cc_tests): ...this.
(plugin_is_GPL_compatible): Drop.
(plugin_init): Drop.
* diagnostic-format-html.h: New file.
* doc/invoke.texi (-fdiagnostics-add-output=): Add
"experimental-html" scheme.
* opts-diagnostic.cc: Include "diagnostic-format-html.h".
(class html_scheme_handler): New.
(output_factory::output_factory): Add html_scheme_handler.
(html_scheme_handler::make_sink): New.
* selftest-run-tests.cc (selftest::run_tests): Call the new
selftests.
* selftest.h (selftest::diagnostic_format_html_cc_tests): New
decl.

gcc/testsuite/ChangeLog:
PR other/116792
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc: Move to
gcc/diagnostic-format-html.cc.
* gcc.dg/html-output/html-output.exp: New support script.
* gcc.dg/html-output/missing-semicolon.c: New test.
* gcc.dg/html-output/missing-semicolon.py: New test script.
* gcc.dg/plugin/diagnostic-test-xhtml-1.c: Deleted test.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Drop moved plugin
and its deleted test.
* lib/gcc-dg.exp (load_lib): Add load_lib of scanhtml.exp.
* lib/htmltest.py: New support script.
* lib/scanhtml.exp: New support script, based on scansarif.exp.

libatomic/ChangeLog:
PR other/116792
* testsuite/lib/libatomic.exp: Add load_lib of scanhtml.exp.

libgomp/ChangeLog:
PR other/116792
* testsuite/lib/libgomp.exp: Add load_lib of scanhtml.exp.

libitm/ChangeLog:
PR other/116792
* testsuite/lib/libitm.exp: Add load_lib of scanhtml.exp.

libphobos/ChangeLog:
PR other/116792
* testsuite/lib/libphobos-dg.exp: Add load_lib of scanhtml.exp.

libvtv/ChangeLog:
PR other/116792
* testsuite/lib/libvtv-dg.exp: Add load_lib of scanhtml.exp.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 2

Add asm dump check for vec_duplicate + vadd.vv combine case 1 to vadd.vx
when the cost of GR2VR is 2. The testcases are not that tidy according
to the result, but we will continue tuning the cost model for this.

The following test suite passed with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 1

Add asm dump check for vec_duplicate + vadd.vv combine case 1 to vadd.vx
when the cost of GR2VR is 1. The testcases are not that tidy according
to the result, but we will continue tuning the cost model for this.

The following test suite passed with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 0

Add asm dump check for vec_duplicate + vadd.vv combine case 1 to vadd.vx.
Late-combine takes action when the GR2VR cost is 0, because the vmv
and the vadd.vx then incur the same GR2VR cost.  That is:

Before:
L1:
  vmv.v.x
  vadd.vv
  J L1

After:
L1:
  vadd.vx
  J L1

The following test suite passed with this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Rename VX_BINARY test helper to VX_BINARY_CASE_0

This patch renames VX_BINARY with a CASE_0 suffix, as we have
another case of VX_BINARY test code, namely case 1:

L1:
  vmv.v.x
  vadd.vv
  J L1

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Rename VX_BINARY
to VX_BINARY_CASE_0 for underlying case 1.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c: Take the
new name for test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c: Ditto

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Separate the test running of rvv vx_vf

The default test run in rvv.exp passes -fno-vect-cost-model
for most of these options. That is not suitable for the vx_vf
tests, which depend on the cost model. Thus, run the vx_vf test
cases separately, with options that omit -fno-vect-cost-model.

The following test suite passed for this patch:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Separate test running of
rvv vx_vf.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

Fortran: parsing issue with DO CONCURRENT;ENDDO on same line [PR120179]

PR fortran/120179

gcc/fortran/ChangeLog:

* match.cc (gfc_match_do): Do not attempt to match end-of-statement
twice.

gcc/testsuite/ChangeLog:

* gfortran.dg/do_concurrent_basic.f90: Extend testcase.

c++: adjust PR99599/CWG2369 workaround

This tweak to CWG2369 has gotten more discussion lately in CWG, including in
P3606. In those discussions, it occurred to me that having the check depend
on whether a class has been instantiated yet is unstable, that it should
only check for user-defined conversions.

Also, one commenter was surprised that adding an explicitly-declared default
constructor to a class changed things, so this patch also changes the
aggregate check to more narrowly checking for one-argument constructors
other than the copy/move constructors.

As a result, this early filter resembles how LOOKUP_DEFAULTED rejects any
candidate that would need a UDC: in both cases we want to avoid considering
arbitrary UDCs. But here, rather than rejecting, we want the early filter
to let the candidate past without considering the conversion.

PR c++/99599

gcc/cp/ChangeLog:

* cp-tree.h (type_has_converting_constructor): Declare.
* class.cc (type_has_converting_constructor): New.
* pt.cc (conversion_may_instantiate_p): Don't check completeness.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-recursive-sat4.C: Adjust again.
* g++.dg/cpp2a/concepts-nondep5.C: New test.

gimple-fold: Don't replace `bool_var != 0` with `bool_var` inside GIMPLE_COND

Since match and simplify will simplify `bool_var != 0` to just `bool_var` and
this is inside a GIMPLE_COND, fold_stmt will return true but nothing has changed.
So let's just reject the replacement if we are replacing with the same simplification
inside replace_stmt_with_simplification. This can speed up things slightly because
now fold_stmt won't return true on all GIMPLE_COND with `bool_var != 0` in it.

gcc/ChangeLog:

* gimple-fold.cc (replace_stmt_with_simplification): Return false
if replacing `bool_var != 0` with `bool_var` in GIMPLE_COND.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Fix tree-ssa/pr31261.c testcase after r16-400 [PR120168]

After r16-400-g5e363ffefaceb9, on targets where char is unsigned by
default, tree-ssa/pr31261.c testcase started to fail:
FAIL: gcc.dg/tree-ssa/pr31261.c scan-tree-dump-times original "return \\\$char\\\$ -\\\$unsigned char\\\$ c & 31;" 1

This is because the casts are no longer needed as both char and
unsigned char have the same signedness.
I was deciding between adding -fsigned-char and changing the testcase
to use `signed char` explicitly. I went with an explicit
`signed char` as that would be the case normally.

PR testsuite/120168

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr31261.c: Use `signed char` instead
of plain char.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

vect: Remove non-SLP path from vectorizable_reduction

Fold slp_node to TRUE and clean-up vectorizable_reduction and related functions.
Also split up vectorizable_lc_phi and create vect_transform_lc_phi.

gcc/ChangeLog:

* tree-vect-loop.cc (get_initial_def_for_reduction): Remove.
(vect_create_epilog_for_reduction): Remove non-SLP path.
(vectorize_fold_left_reduction): Likewise.
(vectorizable_lane_reducing): Likewise.
(vectorizable_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_lc_phi): Remove non-SLP PATH and split into...
(vect_transform_lc_phi): ... this.
(update_epilogue_loop_vinfo): Update comment.
* tree-vect-stmts.cc (vect_analyze_stmt): Update call to
vectorizable_lc_phi.
(vect_transform_stmt): Update calls to vect_transform_reduction and
vect_transform_cycle_phi. Rename call from vectorizable_lc_phi to
vect_transform_lc_phi.
* tree-vectorizer.h (vect_transform_reduction): Update declaration.
(vect_transform_cycle_phi): Likewise.
(vectorizable_lc_phi): Likewise.
(vect_transform_lc_phi): New.

gensupport: validate compact constraint modifiers

For constraints there are operand modifiers and constraint qualifiers.
Operand modifiers apply to all alternatives and must appear, in
traditional syntax, before the first alternative. Constraint
qualifiers, on the other hand, must appear in each alternative to which
they apply.

There's no easy way to validate the distinction in the traditional md
format, but when using the new compact format we can enforce some
semantic checking of these characters to avoid some potentially
surprising code generation.

gcc/

* gensupport.cc (conlist::conlist): Pass a location to the constructor.
Only allow skipping of non-alpha-numeric characters when parsing a
number and only allow '=', '+' or '%'. Add some error checking when
parsing an operand number.
(parse_section_layout): Pass the location to the conlist constructor.
(parse_section): Allow an optional list of forbidden characters.
If specified, reject strings containing them.
(convert_syntax): Reject '=', '+' or '%' in an alternative.

aarch64: Fix up commutative and early-clobber markers on compact insns

For constraints there are operand modifiers and constraint qualifiers.
Operand modifiers apply to all alternatives and must appear, in
traditional syntax, before the first alternative. Constraint
qualifiers, on the other hand, must appear in each alternative to which
they apply.

There's no easy way to validate the distinction in the traditional md
format, but when using the new compact format we can enforce some
semantic checking of these characters to avoid some potentially
surprising code generation.

Fortunately, all of these errors are benign, but the two misplaced
early-clobber markers were quite suspicious at first sight - it's only
by luck that the second alternative does not need an early-clobber.

The syntax checking will be added in the following patch, but first of
all, fix up the errors in aarch64.md.

gcc/
* config/aarch64/aarch64-sve.md (@aarch64_pred_<optab><mode>): Move
commutative marker to the cons specification.
(add<mode>3): Likewise.
(@aarch64_pred_<su>abd<mode>): Likewise.
(@aarch64_pred_<optab><mode>): Likewise.
(*cond_<optab><mode>_z): Likewise.
(<optab><mode>3): Likewise.
(@aarch64_pred_<optab><mode>): Likewise.
(*aarch64_pred_abd<mode>_relaxed): Likewise.
(*aarch64_pred_abd<mode>_strict): Likewise.
(@aarch64_pred_<optab><mode>): Likewise.
(@aarch64_pred_<optab><mode>): Likewise.
(@aarch64_pred_fma<mode>): Likewise.
(@aarch64_pred_fnma<mode>): Likewise.
(@aarch64_pred_<optab><mode>): Likewise.

* config/aarch64/aarch64-sve2.md (@aarch64_sve_<su>clamp<mode>): Move
commutative marker to the cons specification.
(*aarch64_sve_<su>clamp<mode>_x): Likewise.
(@aarch64_sve_fclamp<mode>): Likewise.
(*aarch64_sve_fclamp<mode>_x): Likewise.
(*aarch64_sve2_nor<mode>): Likewise.
(*aarch64_sve2_nand<mode>): Likewise.
(*aarch64_pred_faminmax_fused): Likewise.

* config/aarch64/aarch64.md (*loadwb_pre_pair_<ldst_sz>): Move the
early-clobber marker to the relevant alternative.
(*storewb_pre_pair_<ldst_sz>): Likewise.
(*add<mode>3_aarch64): Move commutative marker to the cons
specification.
(*addsi3_aarch64_uxtw): Likewise.
(*add<mode>3_poly_1): Likewise.
(add<mode>3_compare0): Likewise.
(*addsi3_compare0_uxtw): Likewise.
(*add<mode>3nr_compare0): Likewise.
(<optab><mode>3): Likewise.
(*<optab>si3_uxtw): Likewise.
(*and<mode>3_compare0): Likewise.
(*andsi3_compare0_uxtw): Likewise.
(@aarch64_and<mode>3nr_compare0): Likewise.

tree-optimization/116352 - amend previous fix

The previous fix restricted external vector builds to defs from
the same basic-block. That turns out too restrictive so we have
to mitigate the original issue in a different way which is
restricting it to the original case where all defs are in the
same basic-block.

PR tree-optimization/116352
* tree-vect-slp.cc (vect_build_slp_tree_2): When compressing
operands from a two-operator node make sure the resulting
operation does not mix defs from different basic-blocks.

tree-optimization/120043 - bogus conditional store elimination

The following fixes conditional store elimination to properly
check for conditional stores to readonly memory which we can
obviously not store to unconditionally. The tree_could_trap_p
predicate used is only considering rvalues and the chosen
approach mimics that of loop store motion.
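
To illustrate why such a store cannot be made unconditional, here is a
minimal C sketch (the names store_if, read_only_value and the example_*
helpers are hypothetical and not taken from the testcase). The call is
valid only because the store never executes; a transformation that always
writes through the pointer would write to read-only memory.

```c
/* Stores only when FLAG is nonzero, so passing a pointer to read-only
   memory is fine as long as FLAG is zero.  */
static void store_if(int *p, int flag, int value)
{
  if (flag)
    *p = value;
}

/* An object that may be placed in read-only memory.  */
static const int read_only_value = 42;

/* The store is never executed here, so the object is never written.  */
static int example_readonly_untouched(void)
{
  store_if((int *) &read_only_value, 0, 7);
  return read_only_value;
}

/* With a writable object and a nonzero flag the store happens.  */
static int example_writable(void)
{
  int x = 1;
  store_if(&x, 1, 7);
  return x;
}
```

Rewriting store_if so that it always performs a store to *p would break
example_readonly_untouched, which is why the destination has to be checked
for being readonly.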

PR tree-optimization/120043
* tree-ssa-phiopt.cc (cond_store_replacement): Check
whether the store is to readonly memory.

* gcc.dg/torture/pr120043.c: New testcase.

fortran: Add testcases for PR120152, PR120153 and PR120158

The following patch adds testcase coverage for the 3 recently fixed
libgfortran PRs.
On trunk before those fixes I'm getting with -m32
FAIL: gfortran.dg/pr120152_1.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/pr120152_1.f90   -Os  (test for excess errors)
and with -m64
FAIL: gfortran.dg/pr120152_1.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/pr120152_1.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/pr120152_2.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/pr120152_2.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/pr120158.f90   -O0  execution test
FAIL: gfortran.dg/pr120158.f90   -O1  execution test
FAIL: gfortran.dg/pr120158.f90   -O2  execution test
FAIL: gfortran.dg/pr120158.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/pr120158.f90   -O3 -g  execution test
FAIL: gfortran.dg/pr120158.f90   -Os  execution test
On latest trunk everything PASSes.

2025-05-08  Jakub Jelinek  <jakub@redhat.com>

PR libfortran/120152
PR libfortran/120153
PR libfortran/120158
* gfortran.dg/pr120152_1.f90: New test.
* gfortran.dg/pr120152_2.f90: New test.
* gfortran.dg/pr120153.f90: New test.
* gfortran.dg/pr120158.f90: New test.

libgcobol: Heed --enable-libgcobol

If some target isn't listed as supported in configure.tgt,
--enable-libgcobol cannot override that. However, that's what should
happen just like an explicit --enable-languages=cobol forces the
frontend to be built.

This patch, shamelessly adapted from libphobos, does just that.

Tested on amd64-pc-solaris2.11, sparcv9-sun-solaris2.11, and
x86_64-pc-linux-gnu.

2025-04-08 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libgcobol:
* configure.ac: Handle --enable-libgcobol.
Let it override LIBGCOBOL_SUPPORTED.
* configure: Regenerate.

cobol: Allow for undefined NAME_MAX [PR119217]

All users of symbols.h fail to compile on Solaris:

/vol/gcc/src/hg/master/local/gcc/cobol/symbols.h: At global scope:
/vol/gcc/src/hg/master/local/gcc/cobol/symbols.h:1365:13: error: ‘NAME_MAX’ was not declared in this scope
1365 |   char name[NAME_MAX];
      |             ^~~~~~~~

NAME_MAX being undefined is allowed by POSIX.1, actually: it's listed
for <limits.h> under "Pathname Variable Values":

A definition of one of the symbolic constants in the following list
shall be omitted from the <limits.h> header on specific implementations
where the corresponding value is equal to or greater than the stated
minimum, but where the value can vary depending on the file to which it
is applied. The actual value supported for a specific pathname shall be
provided by the pathconf() function.

As a hack, this patch provides a fallback definition to allow the build
to finish.  In fact it turned out that cbl_function_t.name isn't filename
related and never set at all, so this patch serves as a mere stopgap fix
to unbreak the build until a real solution can be figured out.
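
A fallback of this kind can be sketched as follows. The value 255 and the
struct name example_function are assumptions for illustration only; the
actual value chosen in symbols.h is not stated here.

```c
#include <limits.h>
#include <stddef.h>

/* POSIX.1 allows <limits.h> to omit NAME_MAX, so supply a value only when
   the platform does not.  255 is an assumed, illustrative fallback.  */
#ifndef NAME_MAX
#define NAME_MAX 255
#endif

/* Hypothetical stand-in for a structure with a NAME_MAX-sized buffer.  */
struct example_function
{
  char name[NAME_MAX];
};

/* Size of the name buffer in bytes.  */
static size_t example_name_capacity(void)
{
  return sizeof(struct example_function);
}
```

Because the definition is guarded by #ifndef, platforms that do define
NAME_MAX keep their own value.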

Bootstrapped without regressions on amd64-pc-solaris2.11,
sparcv9-sun-solaris2.11, and x86_64-pc-linux-gnu.

2025-04-08  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/cobol:
PR cobol/119217
* symbols.h (NAME_MAX): Define fallback.

libfortran: Fix up maxval/maxloc for UNSIGNED [PR120158]

When libgfortran is compiled, there are some -Woverflow warnings like
../../../libgfortran/generated/maxloc0_4_m1.c:99:14: warning: unsigned conversion from ‘int’ to ‘GFC_UINTEGER_1’ {aka ‘unsigned char’} changes value from ‘-255’ to ‘1’ [-Woverflow]
   99 |     maxval = -GFC_UINTEGER_1_HUGE;
      |              ^
and those actually point to a bug in the maxloc*/maxval* implementation
for UNSIGNED.
The intent of
#if defined ('atype_inf`)
        result = -atype_inf;
#else
        result = atype_min;
#endif
(or similar for maxval) is to initialize the variable with
minimum value of the type, if the type has infinities, then
negative infinity, otherwise the minimum (normalized) value.
atype_min expands for signed integers to say (-GFC_INTEGER_4_HUGE-1)
or for floating point to say -GFC_REAL_8_HUGE.
For UNSIGNED it expands to e.g. -GFC_UINTEGER_4_HUGE, but that is
-0xffffffffU which is 1U, while the minimum value of the type is
0.
I haven't tried to construct testcases for that, but I believe e.g.
maxval could incorrectly return 1 on an array (or masked array)
full of 0s, or maxloc could incorrectly identify the maximum location.

The following patch makes sure atype_min expands to 0 for atype_name
GFC_UINTEGER*.
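
The wraparound can be reproduced in plain C. This is a minimal sketch that
assumes a 32-bit unsigned int; EXAMPLE_UINT4_HUGE and the helper functions
are hypothetical stand-ins, not the libgfortran macros or generated code.

```c
#include <stdint.h>

/* Stand-in for GFC_UINTEGER_4_HUGE (assumed to be the largest 32-bit
   unsigned value).  */
#define EXAMPLE_UINT4_HUGE 0xffffffffU

/* Negating the largest unsigned value wraps modulo 2^32 and yields 1.  */
static uint32_t wrong_unsigned_min(void)
{
  return -EXAMPLE_UINT4_HUGE;
}

/* The smallest value an unsigned type can hold is simply 0.  */
static uint32_t right_unsigned_min(void)
{
  return 0U;
}

/* Maximum of an all-zero array, starting from the initial value SEED.  */
static uint32_t max_of_zeros(uint32_t seed)
{
  const uint32_t zeros[4] = { 0, 0, 0, 0 };
  uint32_t result = seed;
  for (int i = 0; i < 4; i++)
    if (zeros[i] > result)
      result = zeros[i];
  return result;
}
```

Seeding the reduction with wrong_unsigned_min() reports a maximum of 1 for
an array that contains only zeros, while seeding with right_unsigned_min()
gives the expected 0.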

2025-05-07  Jakub Jelinek  <jakub@redhat.com>

PR libfortran/120158
* m4/iparm.m4 (atype_min): For atype_name starting with
GFC_UINTEGER define to 0.
* generated/maxloc0_16_m1.c: Regenerate.
* generated/maxloc0_16_m2.c: Regenerate.
* generated/maxloc0_16_m4.c: Regenerate.
* generated/maxloc0_16_m8.c: Regenerate.
* generated/maxloc0_16_m16.c: Regenerate.
* generated/maxloc0_4_m1.c: Regenerate.
* generated/maxloc0_4_m2.c: Regenerate.
* generated/maxloc0_4_m4.c: Regenerate.
* generated/maxloc0_4_m8.c: Regenerate.
* generated/maxloc0_4_m16.c: Regenerate.
* generated/maxloc0_8_m1.c: Regenerate.
* generated/maxloc0_8_m2.c: Regenerate.
* generated/maxloc0_8_m4.c: Regenerate.
* generated/maxloc0_8_m8.c: Regenerate.
* generated/maxloc0_8_m16.c: Regenerate.
* generated/maxloc1_16_m1.c: Regenerate.
* generated/maxloc1_16_m2.c: Regenerate.
* generated/maxloc1_16_m4.c: Regenerate.
* generated/maxloc1_16_m8.c: Regenerate.
* generated/maxloc1_16_m16.c: Regenerate.
* generated/maxloc1_4_m1.c: Regenerate.
* generated/maxloc1_4_m2.c: Regenerate.
* generated/maxloc1_4_m4.c: Regenerate.
* generated/maxloc1_4_m8.c: Regenerate.
* generated/maxloc1_4_m16.c: Regenerate.
* generated/maxloc1_8_m1.c: Regenerate.
* generated/maxloc1_8_m2.c: Regenerate.
* generated/maxloc1_8_m4.c: Regenerate.
* generated/maxloc1_8_m8.c: Regenerate.
* generated/maxloc1_8_m16.c: Regenerate.
* generated/maxval_m1.c: Regenerate.
* generated/maxval_m2.c: Regenerate.
* generated/maxval_m4.c: Regenerate.
* generated/maxval_m8.c: Regenerate.
* generated/maxval_m16.c: Regenerate.

cobol: Initialize regmatch_t portably [PR119217]

The dts.h initialization of regmatch_t currently breaks Solaris compilation:

In file included from /vol/gcc/src/hg/master/local/gcc/cobol/lexio.h:208,
                 from /vol/gcc/src/hg/master/local/gcc/cobol/lexio.cc:36:
/vol/gcc/src/hg/master/local/gcc/cobol/dts.h: In constructor ‘dts::csub_match::csub_match(const char*)’:
/vol/gcc/src/hg/master/local/gcc/cobol/dts.h:36:35: error: invalid conversion from ‘int’ to ‘const char*’ [-fpermissive]
   36 |       static regmatch_t empty = { -1, -1 };
      |                                   ^~
      |                                   |
      |                                   int

The problem is that Solaris regmatch_t has additional members before
rm_so and rm_eo, as is always allowed by POSIX.1:

typedef struct {
        const char      *rm_sp, *rm_ep; /* Start pointer, end pointer */
        regoff_t        rm_so, rm_eo;   /* Start offset, end offset */
        int             rm_ss, rm_es;   /* Used internally */
} regmatch_t;

so the initialization doesn't do what it's supposed to do.

Fixed by initializing the rm_so and rm_eo members explicitly.
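
A portable initialization along these lines might look like the sketch
below. It is written in C for illustration, and empty_match is a
hypothetical helper rather than the code in dts.h.

```c
#include <regex.h>
#include <string.h>

/* Return an "empty" match whose offsets are both -1.  Assigning the two
   POSIX-mandated members by name works regardless of member order or of
   any extra implementation-specific members.  */
static regmatch_t empty_match(void)
{
  regmatch_t m;
  memset(&m, 0, sizeof m);   /* zero any additional members */
  m.rm_so = -1;
  m.rm_eo = -1;
  return m;
}
```

Positional initialization such as { -1, -1 } depends on rm_so and rm_eo
being the first two members, which POSIX.1 does not guarantee.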

Bootstrapped without regressions on amd64-pc-solaris2.11,
sparcv9-sun-solaris2.11, and x86_64-pc-linux-gnu.

2025-04-08  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/cobol:
PR cobol/119217
* dts.h (csub_match): Initialize rm_so, rm_eo fields explicitly.

phiopt: Use rewrite_to_defined_overflow in move_stmt [PR116938]

As mentioned previously the rewrite in move_stmt should be
using gimple_needing_rewrite_undefined/rewrite_to_defined_unconditional
instead of just rewriting the VCE.
This moves move_stmt over to those APIs.

A few testcases needed to be updated due to ABS_EXPR rewrite that happens.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116938

gcc/ChangeLog:

* tree-ssa-phiopt.cc (move_stmt): Use rewrite_to_defined_overflow
isntead of manually doing the rewrite of the VCE.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-40.c: Update to expect ABSU_EXPR.
* gcc.dg/tree-ssa/phi-opt-41.c: Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Rewrite VCEs of integral types [PR116939]

Like the patch to phiopt (r15-4033-g1f619fe25925a5f7), this adds rewriting
of VCE to gimple_with_undefined_signed_overflow/rewrite_to_defined_overflow.
In the case of moving VCE of a bool from being conditional to unconditional,
it needs to be rewritten not to use VCE but a normal cast. pr120122-1.c is
an example of where LIM needs this rewriting. The precision of the outer type
needs to be less than the inner one.

This also renames gimple_with_undefined_signed_overflow to gimple_needing_rewrite_undefined
and rewrite_to_defined_overflow to rewrite_to_defined_unconditional as they will be doing
more than just handling signed overflow.

Changes since v1:
* v2: rename the functions.
* v3: Add check for precision to be smaller.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/120122
PR tree-optimization/116939

gcc/ChangeLog:

* gimple-fold.h (gimple_with_undefined_signed_overflow): Rename to ...
(gimple_needing_rewrite_undefined): This.
(rewrite_to_defined_overflow): Rename to ...
(rewrite_to_defined_unconditional): this.
* gimple-fold.cc (gimple_with_undefined_signed_overflow): Rename to ...
(gimple_needing_rewrite_undefined): This. Return true for VCE with integral
types of smaller precision.
(rewrite_to_defined_overflow): Rename to ...
(rewrite_to_defined_unconditional): This. Handle VCE rewriting to a cast.
* tree-if-conv.cc: s/gimple_with_undefined_signed_overflow/gimple_needing_rewrite_undefined/
s/rewrite_to_defined_overflow/rewrite_to_defined_unconditional.
* tree-scalar-evolution.cc: Likewise
* tree-ssa-ifcombine.cc: Likewise.
* tree-ssa-loop-im.cc: Likewise.
* tree-ssa-loop-split.cc: Likewise.
* tree-ssa-reassoc.cc: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr120122-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

ipa/120146 - deal with vanished varpool nodes in IPA PTA

I don't understand why they vanish when still referred to, but
let's deal with that in a conservative way.

PR ipa/120146
* tree-ssa-structalias.cc (create_variable_info_for): If
the symtab cannot tell us whether all refs to a variable
are explicit assume they are not.

* g++.dg/ipa/pr120146.C: New testcase.

cobol: Don't require GLOB_BRACE etc. [PR119217]

cdf-copy.cc doesn't compile on Solaris:

/vol/gcc/src/hg/master/local/gcc/cobol/cdf-copy.cc: In member function ‘int
copybook_elem_t::open_file(const char*, bool)’:
/vol/gcc/src/hg/master/local/gcc/cobol/cdf-copy.cc:317:34: error:
‘GLOB_BRACE’ was not declared in this scope; did you mean ‘GLOB_ERR’?
  317 |   static int flags = GLOB_MARK | GLOB_BRACE | GLOB_TILDE;
      |                                  ^~~~~~~~~~
      |                                  GLOB_ERR
/vol/gcc/src/hg/master/local/gcc/cobol/cdf-copy.cc:317:47: error:
‘GLOB_TILDE’ was not declared in this scope
  317 |   static int flags = GLOB_MARK | GLOB_BRACE | GLOB_TILDE;
      |                                               ^~~~~~~~~~

GLOB_BRACE and GLOB_TILDE are BSD extensions not in POSIX.1, thus
missing on Solaris probably due to its System V heritage.

This patch introduces fallback definitions to avoid this.
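
Such fallbacks can be sketched as below. Defining the missing flags to 0 is
an assumption made for illustration (0 is neutral in a bitwise OR), and
copybook_glob_flags is a hypothetical helper, not the code in cdf-copy.cc.

```c
#include <glob.h>

/* GLOB_BRACE and GLOB_TILDE are extensions; where the platform lacks
   them, fall back to 0 so they simply contribute nothing to the flags.  */
#ifndef GLOB_BRACE
#define GLOB_BRACE 0
#endif
#ifndef GLOB_TILDE
#define GLOB_TILDE 0
#endif

/* Flags for glob(): GLOB_MARK is POSIX and therefore always present.  */
static int copybook_glob_flags(void)
{
  return GLOB_MARK | GLOB_BRACE | GLOB_TILDE;
}
```

With a zero fallback the code compiles everywhere, at the cost of brace and
tilde expansion silently not happening on platforms without the extensions.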

Bootstrapped without regressions on amd64-pc-solaris2.11,
sparcv9-sun-solaris2.11, and x86_64-pc-linux-gnu.

2025-04-08  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/cobol:
PR cobol/119217
* cdf-copy.cc (GLOB_BRACE): Define fallback.
(GLOB_TILDE): Likewise.

tree-optimization/119589 - alignment analysis for VF > 1 and VMAT_STRIDED_SLP

The following fixes the alignment analysis done by the VMAT_STRIDED_SLP
code which for the case of VF > 1 currently relies on dataref analysis
which assumes consecutive accesses. But the code generation advances
by DR_STEP between each iteration which requires us to assess that
individual DR_STEP preserve the alignment rather than only VF * DR_STEP.
This allows us to use vector aligned accesses in some cases.

PR tree-optimization/119589
PR tree-optimization/119586
PR tree-optimization/119155
* tree-vect-stmts.cc (vectorizable_store): Verify
DR_STEP_ALIGNMENT preserves DR_TARGET_ALIGNMENT when
VF > 1 and VMAT_STRIDED_SLP. Use vector aligned accesses when
we can.
(vectorizable_load): Likewise.

tree-optimization/120143 - ICE with failed early break store move

The early break vectorization store moving was incorrectly trying
to move the pattern stmt instead of the original one which failed
to register and then confused virtual SSA form due to the update
triggered by a degenerate virtual PHI.

PR tree-optimization/120143
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
Move/update the original stmts, not the pattern stmts which
lack virtual operands and are not in the IL.

* gcc.dg/vect/vect-early-break_135-pr120143.c: New testcase.

tree-optimization/120089 - force all PHIs live for early-break vect

The following makes sure to even mark unsupported PHIs live when
doing early-break vectorization since otherwise we fail to validate
we can vectorize those and generate wrong code based on the scalar
PHIs which would only work with a vectorization factor of one.

PR tree-optimization/120089
* tree-vect-stmts.cc (vect_stmt_relevant_p): Mark all
PHIs live when not already so and doing early-break
vectorization.
(vect_mark_stmts_to_be_vectorized): Skip virtual PHIs.
* tree-vect-slp.cc (vect_analyze_slp): Robustify handling
of early-break forced IVs.

* gcc.dg/vect/vect-early-break_134-pr120089.c: New testcase.

Canonicalize vec_merge in simplify_ternary_operation

Similar to the canonicalization done in combine, we canonicalize vec_merge with
swap_commutative_operands_p in simplify_ternary_operation too.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_exact_log2_inverse): New.
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
Update pattern accordingly.
* config/aarch64/aarch64.cc (aarch64_exact_log2_inverse): New.
* simplify-rtx.cc (simplify_context::simplify_ternary_operation):
Canonicalize vec_merge.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>

Daily bump.

[RISC-V][PR target/120137][PR target/120154] Don't create out-of-range permutation constants

To make hashing sensible we canonicalize constant vectors in the hash table so
that their first entry always has the value zero.  That normalization can
result in a value that can't be represented in the element mode.

So before entering anything into the hash table we need to verify the
normalized entries will fit into the element's mode.
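
The check can be sketched in plain C, assuming that normalization subtracts
the first element from every element and that the element mode is a signed
8-bit integer. All names below are hypothetical and unrelated to the code in
riscv-vect-permconst.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if every element, after subtracting the first one, still
   fits in a signed 8-bit element.  Differences are computed in a wider
   type so the check itself cannot overflow.  */
static bool normalized_fits_int8(const int8_t *elts, int n)
{
  for (int i = 0; i < n; i++)
    {
      int diff = (int) elts[i] - (int) elts[0];
      if (diff < INT8_MIN || diff > INT8_MAX)
        return false;
    }
  return true;
}

/* Normalizes to {0, 1, 2, 3}: every entry is representable.  */
static bool example_small_spread(void)
{
  const int8_t v[] = { 5, 6, 7, 8 };
  return normalized_fits_int8(v, 4);
}

/* 100 - (-100) = 200 does not fit in a signed 8-bit element.  */
static bool example_large_spread(void)
{
  const int8_t v[] = { -100, 0, 100, 27 };
  return normalized_fits_int8(v, 4);
}
```

A vector for which the check fails would simply not be entered into the
hash table.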

This fixes both 120137 and its duplicate 120154.  This has been tested in my
tester.  I'm just waiting for the pre-commit tester to render its verdict.

PR target/120137
PR target/120154
gcc/
* config/riscv/riscv-vect-permconst.cc (process_bb): Verify each
canonicalized element fits into the vector element mode.

gcc/testsuite/

* gcc.target/riscv/pr120137.c: New test.
* gcc.target/riscv/pr120154.c: New test.

[PATCH] RISC-V: Minimal support for zama16b extension.

This patch adds minimal support for the zama16b extension[1],
enabling GCC to recognize and process it correctly at compile time.

[1] https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* config/riscv/riscv.opt: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-48.c: New test.

arm: select CCFPEmode for LTGT [PR91323]

Besides Arm, there are three other ports that define both CCFPmode and
CCFPEmode. AArch64 and Sparc return CCFPEmode for LTGT; the other,
Visium, doesn't support LTGT at all.

AArch64 was changed in r8-5286-g8332c5ee8c5f3b, and Sparc with
r10-2926-g000a5f8d23c04c.

I suspect this issue is latent on Arm because cbranch?f4 and cstore?f4
reject LTGT and UNEQ and we fall back to a generic expansion which
happens to work. Nevertheless, this patch updates the relevant bits
of the Arm port to match the specification introduced in
r10-2926-g000a5f8d23c04c.

gcc/ChangeLog:

PR target/91323
* config/arm/arm.cc (arm_select_cc_mode): Use CCFPEmode for LTGT.

arm: Only reverse FP inequalities when -ffinite-math-only [PR110796...]

On Arm we have been failing to fully implement support for IEEE NaNs
in inequality comparisons because we have allowed reversing of
inequalities in a way that allows SELECT_CC_MODE to produce different
answers.  For example, the reverse of GT is UNLE, but if we pass these
two RTL codes to SELECT_CC_MODE, the former will return CCFPEmode,
while the latter CCFPmode.
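
The value semantics behind this can be shown in plain C. This is a minimal
sketch with hypothetical helper names; it models only the comparison
results, not the exception-raising behaviour that distinguishes CCFPEmode
from CCFPmode.

```c
#include <math.h>
#include <stdbool.h>

/* GT: ordered greater-than; false if either operand is a NaN.  */
static bool cmp_gt(double a, double b)
{
  return a > b;
}

/* LE: ordered less-or-equal; also false if either operand is a NaN.  */
static bool cmp_le(double a, double b)
{
  return a <= b;
}

/* UNLE: unordered or less-or-equal; the exact logical inverse of GT.  */
static bool cmp_unle(double a, double b)
{
  return isunordered(a, b) || a <= b;
}
```

With a NaN operand both cmp_gt and cmp_le are false, so LE is not the
reverse of GT; only cmp_unle matches !cmp_gt for every input.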

It would be possible to allow fully reversible FPmodes, but to do so
would involve adding yet more RTL codes, something like NOT_GT and
NOT_UNLE, for the cases we cannot currently reverse.  NOT_GT would
then have the same condition code generation as UNLT, but the same
mode selection as GT.

In the mean time, we need to restrict REVERSIBLE_CC_MODE to
non-floating modes unless we are compiling with -ffinite-math-only.  In
that case we can continue to reverse the comparisons, but now we want
to always select CCFPmode as there's no need to consider the exception
raising cases.

PR target/110796
PR target/118446

gcc/ChangeLog:

* config/arm/arm.h (REVERSIBLE_CC_MODE): FP modes are only
reversible if flag_finite_math_only.
* config/arm/arm.cc (arm_select_cc_mode): Return CCFPmode for all
FP comparisons if flag_finite_math_only.

gcc/testsuite/ChangeLog:

* gcc.target/arm/armv8_2-fp16-arith-1.c: Adjust due to no-longer
emitting VCMPE when -ffast-math.

libfortran: Add 5 missing UNSIGNED symbols [PR120153]

While looking at PR120152, I noticed that libgfortran.so doesn't
export 5 *m16* symbols that I would have expected to be exported.
This is caused by 2 issues: one filename was forgotten when adding to
i_maxloc1_c in r15-4124 (I guess because generated/maxloc1_16_i16.c was
kept in the position after generated/maxloc1_8_m16.c and the i -> m
difference wasn't spotted), and there was a garbage prefix on the
HAVE_GFC_UINTEGER_16 macro.

The first two hunks of this patch fix that.
Though, as GCC 15.1 has been released already, we can't add these symbols
to GFORTRAN_15 symbol version as they've never been there, so the patch
adds them to a new GFORTRAN_15.2 symbol version instead.

2025-05-07 Jakub Jelinek <jakub@redhat.com>

PR libfortran/120153
* Makefile.am (i_maxloc1_c): Add generated/maxloc1_16_m16.c.
* intrinsics/random.c (arandom_m16): Use #ifdef HAVE_GFC_UINTEGER_16
guard rather than #ifdef GFC_HAVE_GFC_UINTEGER_16.
* gfortran.map (GFORTRAN_15): Remove _gfortran_arandom_m16,
_gfortran_maxloc1_16_m16, _gfortran_mmaxloc1_16_m16 and
_gfortran_smaxloc1_16_m16.
(GFORTRAN_15.2): New symbol version, add those 4 symbols to it.
* generated/maxloc1_16_m16.c: New file.
* Makefile.in: Regenerate.

libfortran: Readd 15 accidentally removed libgfortran symbols [PR120152]

The r15-4124-gc0002a675a92e76d change seems to have accidentally
dropped 5 sourcefiles from i_maxloc1_c, which resulted in dropping
15 GFORTRAN_8 symbols on x86_64 and 6 on i686.

The following patch adds them back, so that we export those symbols
again, fixing the ABI problem.

2025-05-07 Jakub Jelinek <jakub@redhat.com>

PR libfortran/120152
* Makefile.am (i_maxloc1_c): Readd generated/maxloc1_4_i8.c,
generated/maxloc1_8_i8.c, generated/maxloc1_16_i8.c,
generated/maxloc1_4_i16.c, generated/maxloc1_8_i16.c. Move
generated/maxloc1_16_i16.c entry earlier in the list.
* Makefile.in: Regenerated.

libstdc++: Add missing export for std::is_layout_compatible_v [PR120159]

libstdc++-v3/ChangeLog:

PR libstdc++/120159
* src/c++23/std.cc.in (is_layout_compatible_v): Export.

libcpp: Further fixes for incorrect line numbers in large files [PR120061]

The backport of the PR108900 fix to 14 branch broke building chromium
because static_assert (__LINE__ == expected_line_number, ""); now triggers
as the __LINE__ values are off by one.
This isn't the case on the trunk and 15 branch because we've switched
to 64-bit location_t and so one actually needs far longer header files
to trigger it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c12
contain (large) testcases in patch form which show on the 14 branch
that the first one used to fail before the PR108900 backport and now
works correctly, while the second one attempts to match the chromium
behavior and it used to pass before the PR108900 backport and now it
FAILs.
The two testcases show rare problematic cases, because
do_include_common -> parse_include -> check_eol -> check_eol_1 ->
cpp_get_token_1 -> _cpp_lex_token -> _cpp_lex_direct -> linemap_line_start
triggers there
      /* Allocate the new line_map.  However, if the current map only has a
         single line we can sometimes just increase its column_bits instead. */
      if (line_delta < 0
          || last_line != ORDINARY_MAP_STARTING_LINE_NUMBER (map)
          || SOURCE_COLUMN (map, highest) >= (1U << (column_bits - range_bits))
          || ( /* We can't reuse the map if the line offset is sufficiently
                  large to cause overflow when computing location_t values.  */
              (to_line - ORDINARY_MAP_STARTING_LINE_NUMBER (map))
              >= (((uint64_t) 1)
                  << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
          || range_bits < map->m_range_bits)
        map = linemap_check_ordinary
                (const_cast <line_map *>
                  (linemap_add (set, LC_RENAME,
                                ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
                                ORDINARY_MAP_FILE_NAME (map),
                                to_line)));
and so creates a new ordinary map on the line right after the
(problematic) #include line.
Now, in the spot that r14-11679-g8a884140c2bcb7 patched,
pfile->line_table->highest_location in all 3 tests (also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c13
) is, before the decrement, the start of the line after the #include line and so
the decrement is really desirable in that case to put highest_location
somewhere on the line where the #include actually is.
But at the same time it is also undesirable, because if we do decrement it,
then linemap_add LC_ENTER called from _cpp_do_file_change will then
  /* Generate a start_location above the current highest_location.
     If possible, make the low range bits be zero.  */
  location_t start_location = set->highest_location + 1;
  unsigned range_bits = 0;
  if (start_location < LINE_MAP_MAX_LOCATION_WITH_COLS)
    range_bits = set->default_range_bits;
  start_location += (1 << range_bits) - 1;
  start_location &=  ~((1 << range_bits) - 1);

  linemap_assert (!LINEMAPS_ORDINARY_USED (set)
                  || (start_location
                      >= MAP_START_LOCATION (LINEMAPS_LAST_ORDINARY_MAP (set))));
and we can end up with the new LC_ENTER ordinary map having the same
start_location as the preceding LC_RENAME one.
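The rounding in that fragment can be reproduced in isolation. What follows is only a sketch: next_start_location_sketch is a hypothetical name, location_t is assumed to be 32 bits wide (as on the 14 branch), and range_bits is passed in directly instead of being derived from set->default_range_bits.

```cpp
#include <cstdint>

// Sketch only: mirrors the start_location arithmetic quoted above.
// The function name is hypothetical and location_t is assumed to be 32-bit.
// range_bits must be smaller than 32.
std::uint32_t next_start_location_sketch(std::uint32_t highest_location,
                                         unsigned range_bits)
{
  std::uint32_t start_location = highest_location + 1;
  // Round up to a multiple of 2^range_bits so the low range bits are zero.
  const std::uint32_t mask = (std::uint32_t{1} << range_bits) - 1;
  start_location += mask;
  start_location &= ~mask;
  return start_location;
}
```

As an illustration (the highest_location value here is assumed, not taken from a debugger session): if highest_location has been decremented to 0x606c0401, one below an LC_RENAME map starting at 0x606c0402, then with range_bits 0 the new LC_ENTER map also starts at 0x606c0402.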
Next thing that happens is computation of included_from:
  if (reason == LC_ENTER)
    {
      if (set->depth == 0)
        map->included_from = 0;
      else
        /* The location of the end of the just-closed map.  */
        map->included_from
          = (((map[0].start_location - 1 - map[-1].start_location)
              & ~((1 << map[-1].m_column_and_range_bits) - 1))
             + map[-1].start_location);
The normal case (e.g. with the testcase included at the start of this comment) is
that map[-1] starts somewhere earlier and so map->included_from computation above
nicely computes location_t which expands to the start of the #include line.
With r14-11679 reverted, for #c11 as well as #c12
map[0].start_location == map[-1].start_location above, and so it is
((location_t) -1 & ~((1 << map[-1].m_column_and_range_bits) - 1)))
+ map[-1].start_location,
which happens to be start of the #include line.
For #c11 map[0].start_location is 0x500003a0 and map[-1] has
m_column_and_range_bits 7 and map[-2] has m_column_and_range_bits 12 and
map[0].included_from is set to 0x50000320.
For #c12 map[0].start_location is 0x606c0402 and map[-2].start_location is
0x606c0400 and m_column_and_range_bits is 0 for all 3 maps.
map[0].included_from is set to 0x606c0401.
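The two included_from values quoted above can be checked with a few lines of arithmetic. This is a sketch under the assumption that location_t is 32 bits wide on the 14 branch, so the subtraction wraps around; included_from_sketch is a hypothetical name, not a libcpp function.

```cpp
#include <cstdint>

// Sketch only: the included_from expression from linemap_add, evaluated with
// an assumed 32-bit location_t. The function name is hypothetical.
// prev_column_and_range_bits must be smaller than 32.
std::uint32_t included_from_sketch(std::uint32_t map0_start,
                                   std::uint32_t prev_start,
                                   unsigned prev_column_and_range_bits)
{
  const std::uint32_t mask
    = (std::uint32_t{1} << prev_column_and_range_bits) - 1;
  // When map0_start == prev_start the subtraction wraps to 0xffffffff.
  return ((map0_start - 1 - prev_start) & ~mask) + prev_start;
}
```

With map[0] and map[-1] both starting at 0x500003a0 and 7 column and range bits this gives 0x50000320 (#c11); with both at 0x606c0402 and 0 bits it gives 0x606c0401 (#c12).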
The last important part is again in linemap_add when doing LC_LEAVE:
      /* (MAP - 1) points to the map we are leaving. The
         map from which (MAP - 1) got included should be the map
         that comes right before MAP in the same file.  */
      from = linemap_included_from_linemap (set, map - 1);

      /* A TO_FILE of NULL is special - we use the natural values.  */
      if (to_file == NULL)
        {
          to_file = ORDINARY_MAP_FILE_NAME (from);
          to_line = SOURCE_LINE (from, from[1].start_location);
          sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
        }
Here it wants to compute the right to_line which ought to be the line after
the #include directive.
On the #c11 testcase that doesn't work correctly though, because
map[-1].included_from is 0x50000320, from[0] for that is LC_ENTER with
start_location 0x4080 and m_column_and_range_bits 12 but note that we've
earlier computed map[-1].start_location + (-1 & 0xffffff80) and so only
decreased by 7 bits, so to_line is still on the line with #include and not
after it.  In the #c12 that doesn't happen, all the ordinary maps involved
there had 0 m_column_and_range_bits and so this computes correct line.

Below is a fix for the trunk including testcases using the
location_overflow_plugin hack to simulate the bugs without needing huge
files (in the 14 case it is just 330KB and almost 10MB, but in the 15
case it would need to be far bigger).
The pre-r15-9018 trunk has
FAIL: gcc.dg/plugin/location-overflow-test-pr116047.c -fplugin=./location_overflow_plugin.so  scan-file static_assert[^\n\r]*6[^\n\r]*== 6
and current trunk
FAIL: gcc.dg/plugin/location-overflow-test-pr116047.c -fplugin=./location_overflow_plugin.so  scan-file static_assert[^\n\r]*6[^\n\r]*== 6
FAIL: gcc.dg/plugin/location-overflow-test-pr120061.c -fplugin=./location_overflow_plugin.so  scan-file static_assert[^\n\r]*5[^\n\r]*== 5
and with the patch everything PASSes.
I'll post afterwards a 14 version of the patch.

The patch reverts the r15-9018 change, because it is incorrect,
we really need to decrement it even when crossing ordinary map
boundaries, so that the location is not on the line after the #include
line but somewhere on the #include line.  It also patches two spots
in linemap_add mentioned above to make sure we get correct locations
both in the included_from location_t when doing LC_ENTER (second
line-map.cc hunk) and when doing LC_LEAVE to compute the right to_line
(first line-map.cc hunk), both in presence of an added LC_RENAME
with the same start_location as the following LC_ENTER (i.e. the
problematic cases).
The LC_ENTER hunk is mostly to ensure included_from location_t is
at the start of the #include line (column 0), without it we can
decrease included_from not enough and end up at some random column
in the middle of the line, because it is masking away
map[-1].m_column_and_range_bits bits even when in the end the resulting
included_from location_t will be found in map[-2] map with perhaps
different m_column_and_range_bits.  That alone doesn't fix the bug
though.
The more important is the LC_LEAVE hunk and the problem there is
caused by linemap_line_start not actually doing
    r = set->highest_line + (line_delta << map->m_column_and_range_bits);
when adding a new map (the LC_RENAME one because we need to switch to
different number of directly encoded ranges, or columns, etc.).
So, in the original PR108900 case that
  to_line = SOURCE_LINE (from, from[1].start_location);
doesn't do the right thing, from there is the last < 0x50000000 map
with m_column_and_range_bits 12, from[1] is the first one above it
and map[-1].included_from is the correct location of column 0 on
the #include line, but as the new LC_RENAME map has been created without
actually increasing highest_location to be on the new line (we've just
set to_line of the new LC_RENAME map to the correct line),
  to_line = SOURCE_LINE (from, from[1].start_location);
stays on the same source line.  I've tried to just replace that with
  to_line = SOURCE_LINE (from, linemap_included_from (map - 1)) + 1;
i.e. just find out the #include line from map[-1].included_from and
add 1 to it, unfortunately that breaks the
c-c++-common/cpp/line-4.c
test where we expect to stay on the same 0 line for LC_LEAVE from
<command line> and gcc.dg/cpp/trad/Wunused.c, gcc.dg/cpp/trad/builtins.c
and c-c++-common/analyzer/named-constants-via-macros-traditional.c tests
all with -traditional-cpp preprocessing where to_line is also off-by-one
from the expected one.
So, this patch instead conditionalizes it, uses the
  to_line = SOURCE_LINE (from, linemap_included_from (map - 1)) + 1;
way only if from[1] is a LC_RENAME map (rather than the usual
LC_ENTER one), that should limit it to the problematic cases of when
parse_include peeked after EOL and had to create LC_RENAME map with
the same start_location as the LC_ENTER after it.

Some further justification for the LC_ENTER hunk, using the
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682774.html testcase
(old is 14 before r14-11679, vanilla current 14 and new with the 14 patch)
I get
$ /usr/src/gcc-14/obj/gcc/cc1.old -quiet -std=c23 pr116047.c -nostdinc
In file included from pr116047-1.h:327677:21,
                 from pr116047.c:4:
pr116047-2.h:1:1: error: unknown type name ‘a’
    1 | a b c;
      | ^
pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’
    1 | a b c;
      |     ^
pr116047-1.h:327677:1: error: static assertion failed: ""
327677 | #include "pr116047-2.h"
       | ^~~~~~~~~~~~~
$ /usr/src/gcc-14/obj/gcc/cc1.vanilla -quiet -std=c23 pr116047.c -nostdinc
In file included from pr116047-1.h:327678,
                 from pr116047.c:4:
pr116047-2.h:1:1: error: unknown type name ‘a’
    1 | a b c;
      | ^
pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’
    1 | a b c;
      |     ^
$ /usr/src/gcc-14/obj/gcc/cc1.new -quiet -std=c23 pr116047.c -nostdinc
In file included from pr116047-1.h:327677,
                 from pr116047.c:4:
pr116047-2.h:1:1: error: unknown type name ‘a’
    1 | a b c;
      | ^
pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’
    1 | a b c;
      |     ^

pr116047-1.h has on lines 327677+327678:
#include "pr116047-2.h"
static_assert (__LINE__ == 327678, "");
so the static_assert failure is something that was dealt with mainly in the
LC_LEAVE hunk and files.cc reversion, but please have a look at the
In file included from lines.
14.2 emits correct line (#include "pr116047-2.h" is indeed on line
327677) but some random column in there (which is not normally printed
for smaller headers; 21 is the . before extension in the filename).
Current trunk emits incorrect line (327678 instead of 327677, clearly
it didn't decrement).
And the patched compiler emits the right line with no column, as would
be printed if I remove e.g. 300000 newlines from the file.

2025-05-07  Jakub Jelinek  <jakub@redhat.com>

PR preprocessor/108900
PR preprocessor/116047
PR preprocessor/120061
* files.cc (_cpp_stack_file): Revert 2025-03-28 change.
* line-map.cc (linemap_add): Use
SOURCE_LINE (from, linemap_included_from (map - 1)) + 1; instead of
SOURCE_LINE (from, from[1].start_location); to compute to_line
for LC_LEAVE.  For LC_ENTER included_from computation, look at
map[-2] or even lower if map[-1] has the same start_location as
map[0].

* gcc.dg/plugin/plugin.exp: Add location-overflow-test-pr116047.c
and location-overflow-test-pr120061.c.
* gcc.dg/plugin/location_overflow_plugin.cc (plugin_init): Don't error
on unknown values, instead just break.  Handle 0x4fHHHHHH arguments
differently.
* gcc.dg/plugin/location-overflow-test-pr116047.c: New test.
* gcc.dg/plugin/location-overflow-test-pr116047-1.h: New test.
* gcc.dg/plugin/location-overflow-test-pr116047-2.h: New test.
* gcc.dg/plugin/location-overflow-test-pr120061.c: New test.
* gcc.dg/plugin/location-overflow-test-pr120061-1.h: New test.
* gcc.dg/plugin/location-overflow-test-pr120061-2.h: New test.

gimple: Add gimple_with_undefined_signed_overflow and use it [PR111276]

While looking into the ifcombine, I noticed that rewrite_to_defined_overflow
was rewriting already defined code. In the previous attempt at fixing this,
the review mentioned we should not be calling rewrite_to_defined_overflow
in those cases. The places which called rewrite_to_defined_overflow didn't
always check the lhs of the assignment. This fixes the problem by
introducing a helper function which is to be used before calling
rewrite_to_defined_overflow.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111276
* gimple-fold.cc (arith_code_with_undefined_signed_overflow): Make static.
(gimple_with_undefined_signed_overflow): New function.
* gimple-fold.h (arith_code_with_undefined_signed_overflow): Remove.
(gimple_with_undefined_signed_overflow): Add declaration.
* tree-if-conv.cc (if_convertible_gimple_assign_stmt_p): Use
gimple_with_undefined_signed_overflow instead of manually
checking lhs and the code of the stmt.
(predicate_statements): Likewise.
* tree-ssa-ifcombine.cc (ifcombine_rewrite_to_defined_overflow): Likewise.
* tree-ssa-loop-im.cc (move_computations_worker): Likewise.
* tree-ssa-reassoc.cc (update_range_test): Likewise. Reformat.
* tree-scalar-evolution.cc (final_value_replacement_loop): Use
gimple_with_undefined_signed_overflow instead of
arith_code_with_undefined_signed_overflow.
* tree-ssa-loop-split.cc (split_loop): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Loop-IM: Hoist (non-expensive) stmts to the always-executed loop when running before PRE

While fixing up how rewrite_to_defined_overflow works, gcc.dg/Wrestrict-22.c started
to fail. This is because `d p+ 2` would be moved by LIM and then be rewritten not using
pointer plus. The rewriting part is correct behavior. It only recently started to be
moved out, due to r16-190-g6901d56fea2132,
which has the following comment:
```
When we run before PRE and PRE is active hoist all expressions
since PRE would do so anyway and we can preserve range info
but PRE cannot.
```
This is not true if hoisting past the always executed point; so, instead of hoisting
these statements all the way out of the max loops, take into account the always executed
loop too.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-loop-im.cc (compute_invariantness): Hoist to the always executed point
if ignoring the cost.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

i386: implement costs for float<->int conversions in ix86_vector_costs::add_stmt_cost

This patch adds pattern matching for float<->int conversions both as normal
statements and promote_demote. While updating promote_demote I noticed that
in cleanups I turned "stmt_cost =" into "int stmt_cost = " which turned
the existing FP costing into a no-op. I also added a comment on how demotes are
done when turning e.g. a 32-bit value into an 8-bit value (which is the case of
pr119919.c).

The patch disables vectorization in pr119919.c on generic tuning, but keeps
it at both zen and skylake+. The underlying problem is bad cost of open-coded
scatter which is tracked by PR119902, so I simply added -mtune=znver1 so the testcase
keeps testing vectorization.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Add FLOAT_EXPR;
FIX_TRUNC_EXPR and vec_promote_demote costs.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr119919.c: Add -mtune=znver1.

AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.

SVE loads/stores using predicates that select the bottom 8, 16, 32, 64,
or 128 bits of a register can be folded to ASIMD LDR/STR, thus avoiding the
predicate.
For example,
svuint8_t foo (uint8_t *x) {
  return svld1 (svwhilelt_b8 (0, 16), x);
}
was previously compiled to:
foo:
ptrue p3.b, vl16
ld1b z0.b, p3/z, [x0]
ret

and is now compiled to:
foo:
ldr q0, [x0]
ret

The optimization is applied during the expand pass and was implemented
by making the following changes to maskload<mode><vpred> and
maskstore<mode><vpred>:
- the existing define_insns were renamed and new define_expands for maskloads
  and maskstores were added with nonmemory_operand as predicate such that the
  SVE predicate matches both register operands and constant-vector operands.
- if the SVE predicate is a constant vector and contains a pattern as
  described above, an ASIMD load/store is emitted instead of the SVE load/store.

The patch implements the optimization for LD1 and ST1, for 8-bit, 16-bit,
32-bit, 64-bit, and 128-bit moves, for all full SVE data vector modes.
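
The predicate condition described above can be illustrated outside the compiler. This is only a sketch, not the GCC implementation: the predicate is modelled as a std::vector<bool> with one entry per element, and foldable_move_bits_sketch is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Sketch only, not GCC's code. Returns the width in bits of the fixed-size
// move that could replace a predicated load/store, or 0 if the predicate does
// not select exactly the bottom 8, 16, 32, 64, or 128 bits of the register.
unsigned foldable_move_bits_sketch(const std::vector<bool>& pred,
                                   unsigned element_bits)
{
  // Count the leading active elements.
  std::size_t active = 0;
  while (active < pred.size() && pred[active])
    ++active;
  // Every remaining element must be inactive.
  for (std::size_t i = active; i < pred.size(); ++i)
    if (pred[i])
      return 0;
  const unsigned long long bits
    = static_cast<unsigned long long>(active) * element_bits;
  switch (bits)
    {
    case 8: case 16: case 32: case 64: case 128:
      return static_cast<unsigned>(bits);
    default:
      return 0;
    }
}
```

For the svwhilelt_b8 (0, 16) example, 16 active byte elements give 128 bits, which corresponds to the ldr q0 form shown above.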

Follow-up patches for LD2/3/4 and ST2/3/4 and potentially partial SVE vector
modes are planned.

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR target/117978
* config/aarch64/aarch64-protos.h: Declare
aarch64_emit_load_store_through_mode and aarch64_sve_maskloadstore.
* config/aarch64/aarch64-sve.md
(maskload<mode><vpred>): New define_expand folding maskloads with
certain predicate patterns to ASIMD loads.
(*aarch64_maskload<mode><vpred>): Renamed from maskload<mode><vpred>.
(maskstore<mode><vpred>): New define_expand folding maskstores with
certain predicate patterns to ASIMD stores.
(*aarch64_maskstore<mode><vpred>): Renamed from maskstore<mode><vpred>.
* config/aarch64/aarch64.cc
(aarch64_emit_load_store_through_mode): New function emitting a
load/store through subregs of a given mode.
(aarch64_emit_sve_pred_move): Refactor to use
aarch64_emit_load_store_through_mode.
(aarch64_expand_maskloadstore): New function to emit ASIMD loads/stores
for maskloads/stores with SVE predicates with VL1, VL2, VL4, VL8, or
VL16 patterns.
(aarch64_partial_ptrue_length): New function returning number of leading
set bits in a predicate.

gcc/testsuite/
PR target/117978
* gcc.target/aarch64/sve/acle/general/whilelt_5.c: Adjust expected
outcome.
* gcc.target/aarch64/sve/ldst_ptrue_pat_128_to_neon.c: New test.
* gcc.target/aarch64/sve/while_7.c: Adjust expected outcome.
* gcc.target/aarch64/sve/while_9.c: Adjust expected outcome.

libgomp.fortran/map-alloc-comp-9{,-usm}.f90: Add unified_shared_memory variant

When host memory is device accessible - independent of whether mapping is done or
not (i.e. self map), the 'vtab' pointer becomes accessible, which stores the
dynamic type's type and size information.

In principle, we want to test: USM available but mapping is still done, but
as there is no simple + reliable not-crashing way to test for this, those
checks are skipped in the (pre)existing test file map-alloc-comp-9.f90.

Or rather: those are only active with self-maps, which is currently only true
for the host.

This commit adds map-alloc-comp-9-usm.f90 which runs the same test with
'omp requires unified_shared_memory'. While OpenMP permits both actual
mapping and self maps with this flag, it in theory covers the missing cases.
However, currently, GCC always uses self maps with USM. Still, having a
device-run self-maps check is better than nothing, even if it misses the
most interesting case.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/map-alloc-comp-9.f90: Process differently
when USE_USM_REQUIREMENT is set.
* testsuite/libgomp.fortran/map-alloc-comp-9-usm.f90: New test.

libstdc++: Fix module std export for std::extents

libstdc++-v3/ChangeLog:

* src/c++23/std.cc.in: Fix export for std::extents.

libstdc++: Add tests for std::extents.

A prior commit added std::extents, this commit adds the tests. The bulk
is focussed on testing the constructors. These are split into three
groups:

1. the ctor from other extents and the copy ctor,
2. the ctor from a pack of integer-like objects,
3. the ctor from shapes, i.e. span and array.

For each group check that the ctor:
* produces an object with the expected values for extent,
* is implicit if and only if required,
* is constexpr,
* doesn't change the rank of the extent.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_copy.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_ints.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_shape.cc: New test.
* testsuite/23_containers/mdspan/extents/custom_integer.cc: New test.
* testsuite/23_containers/mdspan/extents/misc.cc: New test.

Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>

libstdc++: Implement std::extents [PR107761].

This implements std::extents from <mdspan> according to N4950 and
contains partial progress towards PR107761.

If an extent changes its type, there's a precondition in the standard,
that the value is representable in the target integer type. This
precondition is not checked at runtime.

The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
For extents<T> this precondition is always violated and results in
calling __builtin_trap. For all other specializations it's checked via
__glibcxx_assert.

PR libstdc++/107761

libstdc++-v3/ChangeLog:

* include/std/mdspan (extents): New class.
* src/c++23/std.cc.in: Add 'using std::extents'.

Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>

libstdc++: Add header mdspan to the build-system.

Creates a nearly empty header mdspan and adds it to the build-system and
Doxygen config file.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in: Add <mdspan>.
* include/Makefile.am: Ditto.
* include/Makefile.in: Ditto.
* include/precompiled/stdc++.h: Ditto.
* include/std/mdspan: New file.

Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>

libstdc++: Setup internal FTM for mdspan.

Uses the FTM infrastructure to create an internal feature testing macro
for partial availability of mdspan, which is then used to hide the
contents of the header mdspan when compiling against a standard prior to
C++23.

libstdc++-v3/ChangeLog:

* include/bits/version.def: Add internal feature testing macro
__glibcxx_mdspan.
* include/bits/version.h: Regenerate.

Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>

s390: Add cstoreti4 expander

For target VXE3 just emit a 128-bit comparison followed by a conditional
load. For targets prior to VXE3, emulate the 128-bit comparison and make
use of a conditional load, too.

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_expand_cstoreti4): New
function.
* config/s390/s390.cc (s390_expand_cstoreti4): New function.
* config/s390/s390.md (CC_SUZ): New mode iterator.
(l): New mode attribute.
(cc_tolower): New mode attribute.
* config/s390/vector.md (cstoreti4): New expander.
(*vec_cmpv2di_lane0_<cc_tolower>): New insn.
(*vec_cmpti_<cc_tolower>): New insn.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/cstoreti-1.c: New test.
* gcc.target/s390/vector/cstoreti-2.c: New test.

libstdc++: Fix width computation for the chrono formatting [PR120114]

Use `__unicode::__field_width` to compute the field width of the output when writing
the formatted output for std::chrono types. This applies both to characters copied
from the format string, and to ones produced by localized formatting.

We also use _Str_sink::view() instead of get(), which avoids copying the content of
the buffer to std::string in case of small output.

PR libstdc++/120114

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_format): Use __field_width.
* testsuite/std/time/format/pr120114.cc: New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Remove use of undefined GLIBCXX_LANG_{PUSH,POP} [PR120147]

Commit r16-427-g86627faec10da5 was using the new GLIBCXX_LANG_PUSH and
GLIBCXX_LANG_POP macros from a change that I haven't pushed yet,
resulting in changes to CXXFLAGS not being restored after the
GLIBCXX_ENABLE_BACKTRACE checks.

libstdc++-v3/ChangeLog:

PR libstdc++/120147
* acinclude.m4 (GLIBCXX_ENABLE_BACKTRACE): Restore use of
AC_LANG_CPLUSPLUS.
* configure: Regenerate.

x86: Insert extra move for mode size smaller than natural size

When generating a SUBREG from V16QI to V2HF, validate_subreg fails since
V2HF is a floating point vector and its size (4 bytes) is smaller than its
natural size (word size). Insert an extra move with a QI vector SUBREG of
the same size to avoid validate_subreg failure.

gcc/

PR target/120036
* config/i386/i386-features.cc (ix86_get_vector_load_mode):
Handle 8/4/2 bytes.
(remove_redundant_vector_load): If the mode size is smaller than
its natural size, first insert an extra move with a QI vector
SUBREG of the same size to avoid validate_subreg failure.

gcc/testsuite/

PR target/120036
* g++.target/i386/pr120036.C: New test.
* gcc.target/i386/pr117839-3a.c: Likewise.
* gcc.target/i386/pr117839-3b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Fix name mismatch for fortran.

The function name in afdo_string_table is step3d_t_tile,
but DECL_ASSEMBLER_NAME (edge->callee->decl) gets
__step3d_t_mod_MOD_step3d_t_tile. It looks like the prefix is not in the
debug string table.
The patch uses
afdo_string_table->get_index_by_decl (edge->callee->decl) instead.

gcc/ChangeLog:

PR gcov-profile/118508
* auto-profile.cc
(autofdo_source_profile::get_callsite_total_count): Fix name
mismatch for fortran.

Fortran: Source allocation of pure module function rejected [PR119948]

2025-05-07 Paul Thomas <pault@gcc.gnu.org>
and Steven G. Kargl <kargl@gcc.gnu.org>

gcc/fortran
PR fortran/119948
* primary.cc (match_variable): Module procedures with sym the
same as result can be treated as variables, although marked
external.

gcc/testsuite/
PR fortran/119948
* gfortran.dg/pr119948.f90: Update to incorporate failing test,
where module procedure is the result. Test submodule cases.

[RISC-V] Avoid unnecessary andi with -1 argument

I was preparing to do some testing of Shreya's next patch on spec and stumbled
across another "andi dst,src,-1" case.  I fixed some stuff like this in the
gcc-15 cycle, but this one slipped through.

It's probably about 100M instructions on deepsjeng.  So tiny, but there's no
good reason to leave the clearly extraneous instructions in the output.

As with the other cases, it's a post-reload splitter that's not being careful
enough about the code it generates.

This has gone through my tester successfully.  Waiting on the pre-commit tester
before going forward.

gcc/
* config/riscv/riscv.md (*branch<ANYI:mode>_shiftedarith_equals_zero):
Avoid generating unnecessary andi.  Fix formatting.

gcc/testsuite
* g++.target/riscv/redundant-andi.C: New test.

Daily bump.

[PATCH] RISC-V: Minimal support for sdtrig and ssstrict extensions.

This patch adds support for the sdtrig and ssstrict extensions [1],
enabling GCC to recognize and process them correctly at compile time.

[1] https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-47.c: New test.

[PATCH] RISC-V: Recognized svadu and svade extension

This patch adds support for the svadu and svade extensions,
enabling GCC to recognize and process them correctly at compile time.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table): New
extension.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt: New mask.

* doc/invoke.texi (RISC-V Options): New extension.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-45.c: New test.
* gcc.target/riscv/arch-46.c: New test.

i386: Add costs for integer<->float conversions

Extend ix86_rtx_costs to cost FLOAT, UNSIGNED_FLOAT, FIX, and UNSIGNED_FIX.
There are many variants of integer<->float conversions and it seems
meaningful to start with the typical scalar and vector ones. On modern CPUs the
variants differ by at most 1 cycle.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_rtx_costs): Cost FLOAT, UNSIGNED_FLOAT,
FIX, UNSIGNED_FIX.
* config/i386/i386.h (struct processor_costs): Add
cvtsi2ss, cvtss2si, cvtpi2ps, cvtps2pi.
* config/i386/x86-tune-costs.h (struct processor_costs): Update tables.

Fortran: Fix ICE with use of c_associated.

PR fortran/120049

gcc/fortran/ChangeLog:

* check.cc (gfc_check_c_associated): Modify checks to avoid
ICE and allow use, intrinsic :: iso_c_binding from a separate
module file.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr120049_a.f90: New test.
* gfortran.dg/pr120049_b.f90: New test.

libstdc++: Rewrite atomic builtin checks [PR70560]

Currently the GLIBCXX_ENABLE_ATOMIC_BUILTINS macro checks for a variety
of __atomic built-ins for bool, short and int. If all those checks pass,
then it defines _GLIBCXX_ATOMIC_BUILTINS and uses the definitions from
config/cpu/generic/atomicity_builtins/atomicity.h for the non-inline
versions of __exchange_and_add and __atomic_add that get compiled into
libsupc++.

However, the config/cpu/generic/atomicity_builtins/atomicity.h
definitions only depend on __atomic_fetch_add not on
__atomic_test_and_set or __atomic_compare_exchange. And they only
operate on a variable of type _Atomic_word, which is not necessarily one
of bool, short or int (e.g. for sparcv9 _Atomic_word is 64-bit long).

This means that for a target where _Atomic_word is int but there are no
1-byte or 2-byte atomic instructions, GLIBCXX_ENABLE_ATOMIC_BUILTINS
will fail the checks for bool and short and not define the macro
_GLIBCXX_ATOMIC_BUILTINS. That means that we will use a single global
mutex for reference counting in the COW std::string and std::locale,
even though we could use __atomic_fetch_add to do it lock-free.

This commit removes most of the GLIBCXX_ENABLE_ATOMIC_BUILTINS checks,
so that it only checks __atomic_fetch_add on _Atomic_word. The macro
defined by GLIBCXX_ENABLE_ATOMIC_BUILTINS is renamed from
_GLIBCXX_ATOMIC_BUILTINS to _GLIBCXX_ATOMIC_WORD_BUILTINS to better
reflect what it really means. This will enable the inline versions of
__exchange_and_add and __atomic_add for more targets. This is not an ABI
change, because targets which didn't previously use the inline
definitions of those functions made non-inlined calls to the functions
in the library. If the definitions of those functions now start using
atomics, that doesn't change the semantics for the code calling those
functions.

On affected targets, new code compiled after this change will see the
_GLIBCXX_ATOMIC_WORD_BUILTINS macro and so will use the always-inline
versions of __exchange_and_add and __atomic_add, which use
__atomic_fetch_add directly. That is also compatible with older code
which still calls the non-inline definitions, because those non-inline
definitions now also use __atomic_fetch_add.
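
For illustration, helpers built directly on __atomic_fetch_add look roughly like the following. This is a sketch and not the contents of include/ext/atomicity.h: the _sketch names are hypothetical, atomic_word_sketch stands in for _Atomic_word and is assumed to be int, and the memory order is an assumption.

```cpp
// Sketch only: relies on the __atomic_fetch_add built-in (GCC and Clang).
// atomic_word_sketch is a stand-in for _Atomic_word, assumed to be int here.
typedef int atomic_word_sketch;

inline atomic_word_sketch
exchange_and_add_sketch(volatile atomic_word_sketch* mem, int val)
{
  // Atomically adds val and returns the value held before the addition.
  return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL);
}

inline void
atomic_add_sketch(volatile atomic_word_sketch* mem, int val)
{
  // Same operation, with the previous value discarded.
  __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL);
}
```

Because the inline and non-inline versions would both reduce to the same built-in, callers compiled before and after the change operate on the counter in a compatible way.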

The only configuration where this could be an ABI change is for a target
which previously defined _GLIBCXX_ATOMIC_BUILTINS (because all the
atomic built-ins for bool, short and int are supported), but which
defines _Atomic_word to some other type for which __atomic_fetch_add is
/not/ supported. Such a target would have called the inline functions
using __atomic_fetch_add, which would actually have depended on
libatomic (which is what the configure checks were supposed to
prevent!). After this change, that target would not define the new
macro, _GLIBCXX_ATOMIC_WORD_BUILTINS, and so would make non-inline calls
into the library where __exchange_and_add and __atomic_add would use the
global mutex. That would be an ABI break. I don't consider that a
realistic scenario, because it wouldn't have made any sense to define
_Atomic_word to a wider type than int, when doing so would have required
libatomic to make libstdc++.so work. Surely such a target would have
just used int for its _Atomic_word type.

The GLIBCXX_ENABLE_BACKTRACE macro currently uses the
glibcxx_ac_atomic_int variable defined by the checks that this commit
removes from GLIBCXX_ENABLE_ATOMIC_BUILTINS. That wasn't a good check
anyway, because libbacktrace actually depends on atomic loads+stores for
pointers as well as int, and for atomic stores for size_t. This commit
replaces the glibcxx_ac_atomic_int check with a proper test for all the
required atomic operations on all three of int, void* and size_t. This
ensures that the libbacktrace code used for std::stacktrace will either
use native atomics, or implement those loads and stores only in terms of
__sync_bool_compare_and_swap (possibly requiring that to come from
libatomic or elsewhere).

libstdc++-v3/ChangeLog:

PR libstdc++/70560
PR libstdc++/119667
* acinclude.m4 (GLIBCXX_ENABLE_ATOMIC_BUILTINS): Only check for
__atomic_fetch_add on _Atomic_word. Define new macro
_GLIBCXX_ATOMIC_WORD_BUILTINS and stop defining macro
_GLIBCXX_ATOMIC_BUILTINS.
(GLIBCXX_ENABLE_BACKTRACE): Check for __atomic_load_n and
__atomic_store_n on int, void* and size_t.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.host: Fix typo in comment.
* include/ext/atomicity.h (__exchange_and_add, __atomic_add):
Depend on _GLIBCXX_ATOMIC_WORD_BUILTINS macro instead of old
_GLIBCXX_ATOMIC_BUILTINS macro.

libstdc++: Fix <numeric> parallel algos for move-only values [PR117905]

All of reduce, transform_reduce, exclusive_scan, inclusive_scan,
transform_exclusive_scan, and transform_inclusive_scan have a
precondition that the type of init meets the Cpp17MoveConstructible
requirements. It isn't required to be copy constructible, so when
passing it to the next internal function it needs to be moved, not
copied. We also need to move when creating local variables on the stack,
and when returning as part of a pair.
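
A minimal sketch of the pattern, assuming a two-layer implementation
similar in spirit to the PSTL glue and pattern layers. MoveOnlyInt,
reduce_outer, reduce_impl and sum_with_move_only_init are hypothetical
names used only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical move-only accumulator: movable, but not copyable.
struct MoveOnlyInt
{
  explicit MoveOnlyInt(int v) : value(std::make_unique<int>(v)) {}
  MoveOnlyInt(MoveOnlyInt&&) = default;
  MoveOnlyInt& operator=(MoveOnlyInt&&) = default;
  std::unique_ptr<int> value;
};

// Inner layer: init is moved into each call of op and moved out again.
template <typename It, typename T, typename Op>
T reduce_impl(It first, It last, T init, Op op)
{
  for (; first != last; ++first)
    init = op(std::move(init), *first);
  return init; // returning a by-value parameter moves implicitly
}

// Outer layer: forwarding init without std::move would need a copy
// constructor, which MoveOnlyInt does not have.
template <typename It, typename T, typename Op>
T reduce_outer(It first, It last, T init, Op op)
{
  return reduce_impl(first, last, std::move(init), op);
}

inline int sum_with_move_only_init(const std::vector<int>& v, int start)
{
  auto add = [](MoveOnlyInt acc, int x) { *acc.value += x; return acc; };
  MoveOnlyInt r = reduce_outer(v.begin(), v.end(), MoveOnlyInt(start), add);
  return *r.value;
}
```

Removing the std::move in reduce_outer makes this fail to compile,
which is the kind of error the PR reports for the real algorithms.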

libstdc++-v3/ChangeLog:

PR libstdc++/117905
* include/pstl/glue_numeric_impl.h (reduce, transform_reduce)
(transform_reduce, inclusive_scan, transform_exclusive_scan)
(transform_inclusive_scan): Use std::move for __init parameter.
* include/pstl/numeric_impl.h (__brick_transform_reduce)
(__pattern_transform_reduce, __brick_transform_scan)
(__pattern_transform_scan): Likewise.
* include/std/numeric (inclusive_scan, transform_exclusive_scan):
Use std::move to create local copy of the first element.
* testsuite/26_numerics/pstl/numeric_ops/108236.cc: Move test
using move-only type to ...
* testsuite/26_numerics/pstl/numeric_ops/move_only.cc: New test.

libstdc++: Fix dangling pointer in fs::path::operator+=(*this) [PR120029]

When concatenating a path we reallocate the left operand's storage to
make room for the new components being added. When the two operands are
the same object, or the right operand is one of the components of the
left operand, the reallocation invalidates the pointers that refer
into the right operand's storage.

The solution in this commit is to detect these aliasing cases and just
do the concatenation in terms of the contained string, as that code
already handles the case where the string aliases the path. The standard
specifies the concatenation in terms of the native() string, so all this
change does is disable the optimized implementation of concatenation for
path objects which attempts to avoid re-parsing the path from the
concatenated string.
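
The idea can be sketched with a toy type that stores both the full
text and its parsed components. ToyPath and its members are
hypothetical and much simpler than fs::path; only the self-aliasing
case is modelled here, not the case where the operand is one of the
components.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy type for illustration only.
struct ToyPath
{
  std::string text;               // the whole string
  std::vector<std::string> parts; // text split on '/'

  explicit ToyPath(std::string s) { assign(std::move(s)); }

  void assign(std::string s)
  {
    text = std::move(s);
    parts.clear();
    std::string cur;
    for (char c : text)
      if (c == '/') { parts.push_back(cur); cur.clear(); }
      else cur += c;
    parts.push_back(cur);
  }

  ToyPath& concat(const ToyPath& rhs)
  {
    if (&rhs == this)
      {
        // Aliasing: the fast path below would read rhs.parts while
        // growing parts, which is the same vector.  Build the joined
        // string first and reparse it instead.
        std::string joined = text + rhs.text;
        assign(std::move(joined));
        return *this;
      }
    // Fast path: reuse the already-parsed components of rhs.
    text += rhs.text;
    parts.back() += rhs.parts.front();
    parts.insert(parts.end(), rhs.parts.begin() + 1, rhs.parts.end());
    return *this;
  }
};
```

Both branches produce the same components as reparsing the
concatenated string; the aliasing branch just gives up the
optimization.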

The potential loss of performance for this case isn't likely to be an
issue, because concatenating a path with itself (or one of its existing
components) probably isn't a common use case.

The Filesystem TS implementation doesn't have the optimized form of
concatenation and always does it in terms of the native string and
reparsing the whole thing, so doesn't have this bug. A test is added to
confirm that anyway (that test has some slightly different results due
to different behaviour for trailing slashes and implicit "." filenames
in the TS spec).

libstdc++-v3/ChangeLog:

PR libstdc++/120029
* src/c++17/fs_path.cc (path::operator+=(const path&)): Handle
parameters that alias the path or one of its components.
* testsuite/27_io/filesystem/path/concat/120029.cc: New test.
* testsuite/experimental/filesystem/path/concat/120029.cc: New
test.

libstdc++: Fix -Wmismatched-tags warnings for _Safe_iterator [PR120112]

This causes an ICE as shown in the PR, but it should be fixed in the
library code anyway.

libstdc++-v3/ChangeLog:

PR c++/120112
* include/bits/ptr_traits.h (_Safe_iterator_base): Use class
keyword in class-head of declaration.
* include/debug/debug.h (_Safe_iterator): Likewise.

Fix PR 119928, formal arguments used to be wrongly inferred for CLASS.

The problem was indeed that generating a formal from an actual
arglist is a bad idea when classes are involved. Fixed in the
attached patch. I think it still makes sense to remove the checks
when the other attributes are present (or PR96073 may come back
in a different guise, even if I have no test case at present).
I have also converted the test to a run-time check.

gcc/fortran/ChangeLog:

PR fortran/119928
* interface.cc (gfc_check_dummy_characteristics): Do not issue
error if one dummy symbol has been generated from an actual
argument and the other one has OPTIONAL, INTENT, ALLOCATABLE,
POINTER, TARGET, VALUE, ASYNCHRONOUS or CONTIGUOUS.
(gfc_get_formal_from_actual_arglist): Do nothing if symbol
is a class.

gcc/testsuite/ChangeLog:

PR fortran/119928
* gfortran.dg/interface_60.f90: New test.

ipa: Drop the default value of suffix parameter of create_clone (PR119852)

In PR 119852 we agreed that since the NULL-ness of the suffix
parameter should prevent creation of a record in the ipa-clones
dump (which is implemented by a previous patch), it should not default
to NULL.

gcc/ChangeLog:

2025-04-25 Martin Jambor <mjambor@suse.cz>

PR ipa/119852
* cgraph.h (cgraph_node::create_clone): Remove the default value of
argument suffix. Update function comment.
* cgraphclones.cc (cgraph_node::create_clone): Update function comment.
* ipa-inline-transform.cc (clone_inlined_nodes): Pass NULL to suffix
of create_clone explicitly.
* ipa-inline.cc (recursive_inlining): Likewise.
* lto-cgraph.cc (input_node): Likewise.

ipa: Fix create_version_clone_with_body declaration and comment

I noticed that the name of the fifth parameter of
cgraph_node::create_version_clone_with_body is different in the class
definition in cgraph.h and in the actual member function definition in
cgraphclones.cc.  The former (clone_name) is misleading and so this
patch changes it to the latter (suffix) which is also used in related
functions.

The patch also updates the function comment in both places because it
clearly became out of date.

gcc/ChangeLog:

2025-04-25  Martin Jambor  <mjambor@suse.cz>

* cgraph.h (cgraph_node::create_version_clone_with_body): Fix function
comment.  Change the name of clone_name to suffix, in line with the
function definition.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body): Fix
function comment.

ipa: Do not emit info about temporary clones to ipa-clones dump (PR119852)

As described in PR 119852, the output of -fdump-ipa-clones can contain
"(null)" as the suffix/reason for cloning when we need to create a
clone to hold the original function during recursive inlining.  Such a
clone is never output and so should not be part of the dump output
either.
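
A minimal sketch of the intended behaviour, assuming a dump routine
that receives the suffix as a possibly-null C string. The function
clone_dump_record and the record layout are made up for this
illustration and do not reproduce the real -fdump-ipa-clones format.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a clone dump routine.  Returns the text
// that would be written to the dump, or an empty string when the
// clone is a temporary one that should not be recorded.
// Note that suffix has no default argument, so every caller has to
// state explicitly whether the clone is to be dumped.
std::string clone_dump_record(const std::string& original,
                              const std::string& clone,
                              const char* suffix)
{
  if (suffix == nullptr)
    return {}; // temporary clone: emit nothing rather than "(null)"
  return "clone;" + original + ";" + clone + ";" + suffix + "\n";
}
```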

gcc/ChangeLog:

2025-04-23  Martin Jambor  <mjambor@suse.cz>

PR ipa/119852
* cgraphclones.cc (dump_callgraph_transformation): Document the
function.  Do not dump if suffix is NULL.

gcc/testsuite/ChangeLog:

2025-04-23  Martin Jambor  <mjambor@suse.cz>

PR ipa/119852
* gcc.dg/ipa/pr119852.c: New test.

Document option -fdump-ipa-clones

I have noticed that the option -fdump-ipa-clones is not documented
although there are users who depend on it. This patch adds the
missing documentation along with the description of the information it
dumps and the format it uses.

I am never quite sure which of the texinfo mark-ups is the most
appropriate in which situation, so I'll of course incorporate any
feedback on this as well as the general wording of the text.

After we settle on a version, I'd like to backport the documentation
also at least to GCC 15, 14 and 13.

Is it perhaps OK for master and the branches, or what should be
changed?

Thanks,

Martin

gcc/ChangeLog:

2025-04-23 Martin Jambor <mjambor@suse.cz>

* doc/invoke.texi (Developer Options): Document -fdump-ipa-clones.

libgcobol: Fix bootstrap for targets without program_invocation_short_name

program_invocation_short_name is not widely available, however getprogname()
appears to be a suitable replacement.

Amend the library configuration to look for both. Use program_invocation_short_name
in preference to getprogname() when it is available. If neither is found, fall
back to a constant string.
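
The selection order can be sketched as below. The macro names
HAVE_PROGRAM_INVOCATION_SHORT_NAME and HAVE_GETPROGNAME, the helper
program_name and the fallback string "unknown" are assumptions made for
this illustration; the names actually produced by configure and used in
libgcobol.cc may differ.

```cpp
#include <cassert>
#include <cerrno>  // program_invocation_short_name (glibc, _GNU_SOURCE)
#include <cstdlib> // getprogname() on BSD-like systems
#include <string>

// Hypothetical helper: pick the best available source for the
// program name at compile time.
inline const char* program_name()
{
#if defined(HAVE_PROGRAM_INVOCATION_SHORT_NAME)
  return program_invocation_short_name; // preferred when available
#elif defined(HAVE_GETPROGNAME)
  return getprogname();                 // suitable replacement
#else
  return "unknown";                     // neither interface was found
#endif
}
```

With neither macro defined the helper returns the constant string, so
the library still builds on targets that provide neither interface.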

libgcobol/ChangeLog:

* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check for program_invocation_short_name and
getprogname().
* libgcobol.cc (default_exception_handler): When the platform
has program_invocation_short_name, use it; otherwise fall
back to using getprogname() or a constant string (if neither
interface is available).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

diagnostics: use diagnostic_option_id in one more place

No functional change intended.

gcc/ChangeLog:
* selftest-diagnostic.cc (test_diagnostic_context::report): Use
diagnostic_option_id rather than plain int.
* selftest-diagnostic.h (test_diagnostic_context::report):
Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

json: implement JSON pointer; use it in sarif-replay [PR117988]

This patch extends our json class to track JSON pointers (RFC 6901),
and then uses this within sarif-replay to provide logical locations
within the JSON when reporting on issues in the SARIF.
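
For readers unfamiliar with RFC 6901, here is a minimal standalone
sketch of how a JSON pointer string is formed from a chain of reference
tokens (object member names and array indices). The functions
escape_token and make_json_pointer are hypothetical and are not the
interfaces added to json.h.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Escape one reference token as RFC 6901 requires:
// '~' becomes "~0" and '/' becomes "~1".
std::string escape_token(const std::string& token)
{
  std::string out;
  for (char c : token)
    {
      if (c == '~')
        out += "~0";
      else if (c == '/')
        out += "~1";
      else
        out += c;
    }
  return out;
}

// Join the tokens on the path from the document root to a value.
// Array indices are passed as decimal strings.  An empty token list
// yields "", which refers to the whole document.
std::string make_json_pointer(const std::vector<std::string>& tokens)
{
  std::string out;
  for (const std::string& t : tokens)
    {
      out += '/';
      out += escape_token(t);
    }
  return out;
}
```

For example, the tokens "runs", "0", "results" give the pointer
/runs/0/results, which is the kind of logical location a replayed
diagnostic can report.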

gcc/ChangeLog:
PR sarif-replay/117988
* json.cc (json::pointer::token::token): New ctors.
(json::pointer::token::~token): New.
(json::pointer::token::operator=): New.
(json::object::set): Set the value's m_pointer_token.
(json::array::append): Likewise.
* json.h (json::pointer::token): New struct.
(json::value::get_pointer_token): New accessor.
(json::value::m_pointer_token): New field.
* libsarifreplay.cc (get_logical_location_kind_for_json_kind):
New.
(make_logical_location_from_jv): New.
(sarif_replayer::report_problem): Set the logical location of the
diagnostic.

gcc/testsuite/ChangeLog:
PR sarif-replay/117988
* sarif-replay.dg/2.1.0-invalid/3.1-not-an-object.sarif: Add
expected logical location.
* sarif-replay.dg/2.1.0-invalid/3.11.11-missing-arguments-for-placeholders.sarif:
Likewise.
* sarif-replay.dg/2.1.0-invalid/3.11.11-not-enough-arguments-for-placeholders.sarif:
Likewise.
* sarif-replay.dg/2.1.0-invalid/3.11.5-unescaped-braces.sarif: Likewise.
* sarif-replay.dg/2.1.0-invalid/3.13.2-no-version.sarif: Likewise.
* sarif-replay.dg/2.1.0-invalid/3.13.2-version-not-a-string.sarif: Likewise.
* sarif-replay.dg/2.1.0-invalid/3.13.4-bad-runs.sarif: Likewise.
* sarif-replay.dg/2.1.0-invalid/3.13.4-no-runs.sarif: Likewise.
* sarif-replay.dg/2.1.0-invalid/3.13.4-non-object-in-runs.sarif: Likewise.
* sarif-replay.dg/2.1.0-invalid/3.27.10-bad-level.sarif: Likewise.
* sarif-replay.dg/2.1.0-invalid/3.33.3-index-out-of-range.sarif: Likewise.
* sarif-replay.dg/2.1.0-unhandled/3.27.10-none-level.sarif: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>