gcc.gnu.org Git - gcc.git/log

Daily bump.

Fix wrong optimization of complex boolean expression

The VRP2 pass turns:

  # prephitmp_3 = PHI <0(4)>
  _1 = prephitmp_3 == 0;
  _5 = stretch_14(D) ^ 1;
  _39 = _1 & _5;
  _40 = _39 | last_20(D);

into

  _5 = stretch_14(D) ^ 1;
  _42 = ~stretch_14(D);
  _39 = _42;
  _40 = last_20(D) | _39;

using the following step:

Folding statement: _1 = prephitmp_3 == 0;
Queued stmt for removal.  Folds to: 1
Folding statement: _5 = stretch_14(D) ^ 1;
Not folded
Folding statement: _39 = _1 & _5;
gimple_simplified to _42 = ~stretch_14(D);
_39 = _42 & 1;
Folded into: _39 = _42;

Folding statement: _40 = _39 | last_20(D);
Folded into: _40 = last_20(D) | _39;

but stretch_14 is a 8-bit boolean so the two forms are not equivalent, that
is to say dropping the "& 1" is wrong.  It's another instance of the issue:
  https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558537.html

Here it's the reverse case: the bitwise NOT (~) is treated as logical by the
machinery in range-op.cc but the bitwise AND (&) is *not* treated as logical
by that of vr-values.cc, leading to the same problematic outcome.

gcc/
* vr-values.cc (simplify_using_ranges::simplify) <BIT_AND_EXPR>:
Do not call simplify_bit_ops_using_ranges for boolean types whose
precision is not 1.

gcc/testsuite/
* gnat.dg/opt106.adb: New test.
* gnat.dg/opt106_pkg1.ads, gnat.dg/opt106_pkg1.adb: New helper.
* gnat.dg/opt106_pkg2.ads, gnat.dg/opt106_pkg2.adb: Likewise.

tree-optimization/120156 - ICE in ptr_derefs_may_alias_p

This picks the ptr_derefs_may_alias_p fix from the PR99954 fix
which said: This makes us run into a latent issue in
ptr_deref_may_alias_decl_p when the pointer is something like &MEM[0].a
in which case we fail to handle non-SSA name pointers. Add code
similar to what we have in ptr_derefs_may_alias_p.

PR tree-optimization/120156
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.

(cherry picked from commit c290e6a0b7a9de5692963affc6627a4af7dc2411)

Daily bump.

aarch64: Fix CFA offsets in non-initial stack probes [PR119610]

PR119610 is about incorrect CFI output for a stack probe when that
probe is not the initial allocation.  The main aarch64 stack probe
function, aarch64_allocate_and_probe_stack_space, implicitly assumed
that the incoming stack pointer pointed to the top of the frame,
and thus held the CFA.

aarch64_save_callee_saves and aarch64_restore_callee_saves use a
parameter called bytes_below_sp to track how far the stack pointer
is above the base of the static frame.  This patch does the same
thing for aarch64_allocate_and_probe_stack_space.

Also, I noticed that the SVE path was attaching the first CFA note
to the wrong instruction: it was attaching the note to the calculation
of the stack size, rather than to the r11<-sp copy.

gcc/
PR target/119610
* config/aarch64/aarch64.cc (aarch64_allocate_and_probe_stack_space):
Add a bytes_below_sp parameter and use it to calculate the CFA
offsets.  Attach the first SVE CFA note to the move into the
associated temporary register.
(aarch64_allocate_and_probe_stack_space): Update calls accordingly.
Start out with bytes_per_sp set to the frame size and decrement
it after each allocation.

gcc/testsuite/
PR target/119610
* g++.dg/torture/pr119610.C: New test.
* g++.target/aarch64/sve/pr119610-sve.C: Likewise.

(cherry picked from commit fa61afef18a8566d1907a5ae0e7754e1eac207d9)

rs6000: Ignore OPTION_MASK_SAVE_TOC_INDIRECT differences in inlining decisions [PR119327]

The following testcase FAILs because the always_inline function can't
be inlined.
The rs6000 backend has similarly to other targets a hook which rejects
inlining which would bring in new ISAs which aren't there in the caller.
And this hook rejects this because of OPTION_MASK_SAVE_TOC_INDIRECT
differences.
This flag is set if explicitly requested or by default depending on
whether the current function looks hot (or at least not cold):
  if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
      && flag_shrink_wrap_separate
      && optimize_function_for_speed_p (cfun))
    rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
The target nodes that are being compared here are actually the default
target node (which was created when cfun was NULL) vs. one that was
created for the always_inline function when it wasn't NULL, so one
doesn't have it, the other does.
In any case, this flag feels like a tuning decision rather than hard
ISA requirement and I see no problems why we couldn't inline
even explicit -msave-toc-indirect function into -mno-save-toc-indirect
or vice versa.
We already ignore OPTION_MASK_P{8,10}_FUSION which are also more
like tuning flags.

2025-04-22  Jakub Jelinek  <jakub@redhat.com>

PR target/119327
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore also
OPTION_MASK_SAVE_TOC_INDIRECT differences.

* g++.dg/opt/pr119327.C: New test.

(cherry picked from commit 4b62cf555b5446cb02fc471519cf1afa09e1a108)

libcpp: Further fixes for incorrect line numbers in large files [PR120061]

The backport of the PR108900 fix to 14 branch broke building chromium
because static_assert (__LINE__ == expected_line_number, ""); now triggers
as the __LINE__ values are off by one.
This isn't the case on the trunk and 15 branch because we've switched
to 64-bit location_t and so one actually needs far longer header files
to trigger it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c12
contain (large) testcases in patch form which show on the 14 branch
that the first one used to fail before the PR108900 backport and now
works correctly, while the second one attempts to match the chromium
behavior and it used to pass before the PR108900 backport and now it
FAILs.
The two testcases show rare problematic cases, because
do_include_common -> parse_include -> check_eol -> check_eol_1 ->
cpp_get_token_1 -> _cpp_lex_token -> _cpp_lex_direct -> linemap_line_start
triggers there
      /* Allocate the new line_map.  However, if the current map only has a
         single line we can sometimes just increase its column_bits instead. */
      if (line_delta < 0
          || last_line != ORDINARY_MAP_STARTING_LINE_NUMBER (map)
          || SOURCE_COLUMN (map, highest) >= (1U << (column_bits - range_bits))
          || ( /* We can't reuse the map if the line offset is sufficiently
                  large to cause overflow when computing location_t values.  */
              (to_line - ORDINARY_MAP_STARTING_LINE_NUMBER (map))
              >= (((uint64_t) 1)
                  << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
          || range_bits < map->m_range_bits)
        map = linemap_check_ordinary
                (const_cast <line_map *>
                  (linemap_add (set, LC_RENAME,
                                ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
                                ORDINARY_MAP_FILE_NAME (map),
                                to_line)));
and so creates a new ordinary map on the line right after the
(problematic) #include line.
Now, in the spot that r14-11679-g8a884140c2bcb7 patched,
pfile->line_table->highest_location in all 3 tests (also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c13
) is before the decrement the start of the line after the #include line and so
the decrement is really desirable in that case to put highest_location
[jakub@tucnak gcc-15]$ git log -1 --format=%B r15-9638-gbfcb5da69a41f7a5e41faab39b763d9d7c8bd2ea | cat
libcpp: Further fixes for incorrect line numbers in large files [PR120061]

The backport of the PR108900 fix to 14 branch broke building chromium
because static_assert (__LINE__ == expected_line_number, ""); now triggers
as the __LINE__ values are off by one.
This isn't the case on the trunk and 15 branch because we've switched
to 64-bit location_t and so one actually needs far longer header files
to trigger it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c12
contain (large) testcases in patch form which show on the 14 branch
that the first one used to fail before the PR108900 backport and now
works correctly, while the second one attempts to match the chromium
behavior and it used to pass before the PR108900 backport and now it
FAILs.
The two testcases show rare problematic cases, because
do_include_common -> parse_include -> check_eol -> check_eol_1 ->
cpp_get_token_1 -> _cpp_lex_token -> _cpp_lex_direct -> linemap_line_start
triggers there
      /* Allocate the new line_map.  However, if the current map only has a
         single line we can sometimes just increase its column_bits instead. */
      if (line_delta < 0
          || last_line != ORDINARY_MAP_STARTING_LINE_NUMBER (map)
          || SOURCE_COLUMN (map, highest) >= (1U << (column_bits - range_bits))
          || ( /* We can't reuse the map if the line offset is sufficiently
                  large to cause overflow when computing location_t values.  */
              (to_line - ORDINARY_MAP_STARTING_LINE_NUMBER (map))
              >= (((uint64_t) 1)
                  << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
          || range_bits < map->m_range_bits)
        map = linemap_check_ordinary
                (const_cast <line_map *>
                  (linemap_add (set, LC_RENAME,
                                ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
                                ORDINARY_MAP_FILE_NAME (map),
                                to_line)));
and so creates a new ordinary map on the line right after the
(problematic) #include line.
Now, in the spot that r14-11679-g8a884140c2bcb7 patched,
pfile->line_table->highest_location in all 3 tests (also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c13
) is before the decrement the start of the line after the #include line and so
the decrement is really desirable in that case to put highest_location
somewhere on the line where the #include actually is.
But at the same time it is also undesirable, because if we do decrement it,
then linemap_add LC_ENTER called from _cpp_do_file_change will then
  /* Generate a start_location above the current highest_location.
     If possible, make the low range bits be zero.  */
  location_t start_location = set->highest_location + 1;
  unsigned range_bits = 0;
  if (start_location < LINE_MAP_MAX_LOCATION_WITH_COLS)
    range_bits = set->default_range_bits;
  start_location += (1 << range_bits) - 1;
  start_location &=  ~((1 << range_bits) - 1);

  linemap_assert (!LINEMAPS_ORDINARY_USED (set)
                  || (start_location
                      >= MAP_START_LOCATION (LINEMAPS_LAST_ORDINARY_MAP (set))));
and we can end up with the new LC_ENTER ordinary map having the same
start_location as the preceding LC_RENAME one.
Next thing that happens is computation of included_from:
  if (reason == LC_ENTER)
    {
      if (set->depth == 0)
        map->included_from = 0;
      else
        /* The location of the end of the just-closed map.  */
        map->included_from
          = (((map[0].start_location - 1 - map[-1].start_location)
              & ~((1 << map[-1].m_column_and_range_bits) - 1))
             + map[-1].start_location);
The normal case (e.g. with the testcase included at the start of this comment) is
that map[-1] starts somewhere earlier and so map->included_from computation above
nicely computes location_t which expands to the start of the #include line.
With r14-11679 reverted, for #c11 as well as #c12
map[0].start_location == map[-1].start_location above, and so it is
((location_t) -1 & ~((1 << map[-1].m_column_and_range_bits) - 1)))
+ map[-1].start_location,
which happens to be start of the #include line.
For #c11 map[0].start_location is 0x500003a0 and map[-1] has
m_column_and_range_bits 7 and map[-2] has m_column_and_range_bits 12 and
map[0].included_from is set to 0x50000320.
For #c12 map[0].start_location is 0x606c0402 and map[-2].start_location is
0x606c0400 and m_column_and_range_bits is 0 for all 3 maps.
map[0].included_from is set to 0x606c0401.
The last important part is again in linemap_add when doing LC_LEAVE:
      /* (MAP - 1) points to the map we are leaving. The
         map from which (MAP - 1) got included should be the map
         that comes right before MAP in the same file.  */
      from = linemap_included_from_linemap (set, map - 1);

      /* A TO_FILE of NULL is special - we use the natural values.  */
      if (to_file == NULL)
        {
          to_file = ORDINARY_MAP_FILE_NAME (from);
          to_line = SOURCE_LINE (from, from[1].start_location);
          sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
        }
Here it wants to compute the right to_line which ought to be the line after
the #include directive.
On the #c11 testcase that doesn't work correctly though, because
map[-1].included_from is 0x50000320, from[0] for that is LC_ENTER with
start_location 0x4080 and m_column_and_range_bits 12 but note that we've
earlier computed map[-1].start_location + (-1 & 0xffffff80) and so only
decreased by 7 bits, so to_line is still on the line with #include and not
after it.  In the #c12 that doesn't happen, all the ordinary maps involved
there had 0 m_column_and_range_bits and so this computes correct line.

Below is a fix for the trunk including testcases using the
location_overflow_plugin hack to simulate the bugs without needing huge
files (in the 14 case it is just 330KB and almost 10MB, but in the 15
case it would need to be far bigger).
The pre- r15-9018 trunk has
FAIL: gcc.dg/plugin/location-overflow-test-pr116047.c -fplugin=./location_overflow_plugin.so  scan-file static_assert[^\n\r]*6[^\n\r]*== 6
and current trunk
FAIL: gcc.dg/plugin/location-overflow-test-pr116047.c -fplugin=./location_overflow_plugin.so  scan-file static_assert[^\n\r]*6[^\n\r]*== 6
FAIL: gcc.dg/plugin/location-overflow-test-pr120061.c -fplugin=./location_overflow_plugin.so  scan-file static_assert[^\n\r]*5[^\n\r]*== 5
and with the patch everything PASSes.

The patch reverts the r14-11679 change, because it is incorrect,
we really need to decrement it even when crossing ordinary map
boundaries, so that the location is not on the line after the #include
line but somewhere on the #include line.  It also patches two spots
in linemap_add mentioned above to make sure we get correct locations
both in the included_from location_t when doing LC_ENTER (second
line-map.cc hunk) and when doing LC_LEAVE to compute the right to_line
(first line-map.cc hunk), both in presence of an added LC_RENAME
with the same start_location as the following LC_ENTER (i.e. the
problematic cases).
The LC_ENTER hunk is mostly to ensure included_form location_t is
at the start of the #include line (column 0), without it we can
decrease include_from not enough and end up at some random column
in the middle of the line, because it is masking away
map[-1].m_column_and_range_bits bits even when in the end the resulting
include_from location_t will be found in map[-2] map with perhaps
different m_column_and_range_bits.  That alone doesn't fix the bug
though.
The more important is the LC_LEAVE hunk and the problem there is
caused by linemap_line_start not actually doing
    r = set->highest_line + (line_delta << map->m_column_and_range_bits);
when adding a new map (the LC_RENAME one because we need to switch to
different number of directly encoded ranges, or columns, etc.).
So, in the original PR108900 case that
  to_line = SOURCE_LINE (from, from[1].start_location);
doesn't do the right thing, from there is the last < 0x50000000 map
with m_column_and_range_bits 12, from[1] is the first one above it
and map[-1].included_from is the correct location of column 0 on
the #include line, but as the new LC_RENAME map has been created without
actually increasing highest_location to be on the new line (we've just
set to_line of the new LC_RENAME map to the correct line),
  to_line = SOURCE_LINE (from, from[1].start_location);
stays on the same source line.  I've tried to just replace that with
  to_line = SOURCE_LINE (from, linemap_included_from (map - 1)) + 1;
i.e. just find out the #include line from map[-1].included_from and
add 1 to it, unfortunately that breaks the
c-c++-common/cpp/line-4.c
test where we expect to stay on the same 0 line for LC_LEAVE from
<command line> and gcc.dg/cpp/trad/Wunused.c, gcc.dg/cpp/trad/builtins.c
and c-c++-common/analyzer/named-constants-via-macros-traditional.c tests
all with -traditional-cpp preprocessing where to_line is also off-by-one
from the expected one.
So, this patch instead conditionalizes it, uses the
  to_line = SOURCE_LINE (from, linemap_included_from (map - 1)) + 1;
way only if from[1] is a LC_RENAME map (rather than the usual
LC_ENTER one), that should limit it to the problematic cases of when
parse_include peeked after EOL and had to create LC_RENAME map with
the same start_location as the LC_ENTER after it.

Some further justification for the LC_ENTER hunk, using the
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682774.html testcase
(old is 14 before r14-11679, vanilla current 14 and new with the 14 patch)
I get
$ /usr/src/gcc-14/obj/gcc/cc1.old -quiet -std=c23 pr116047.c -nostdinc
In file included from pr116047-1.h:327677:21,
                 from pr116047.c:4:
pr116047-2.h:1:1: error: unknown type name ‘a’
    1 | a b c;
      | ^
pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’
    1 | a b c;
      |     ^
pr116047-1.h:327677:1: error: static assertion failed: ""
327677 | #include "pr116047-2.h"
       | ^~~~~~~~~~~~~
$ /usr/src/gcc-14/obj/gcc/cc1.vanilla -quiet -std=c23 pr116047.c -nostdinc
In file included from pr116047-1.h:327678,
                 from pr116047.c:4:
pr116047-2.h:1:1: error: unknown type name ‘a’
    1 | a b c;
      | ^
pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’
    1 | a b c;
      |     ^
$ /usr/src/gcc-14/obj/gcc/cc1.new -quiet -std=c23 pr116047.c -nostdinc
In file included from pr116047-1.h:327677,
                 from pr116047.c:4:
pr116047-2.h:1:1: error: unknown type name ‘a’
    1 | a b c;
      | ^
pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’
    1 | a b c;
      |     ^

pr116047-1.h has on lines 327677+327678:
#include "pr116047-2.h"
static_assert (__LINE__ == 327678, "");
so the static_assert failure is something that was dealt mainly in the
LC_LEAVE hunk and files.cc reversion, but please have a look at the
In file included from lines.
14.2 emits correct line (#include "pr116047-2.h" is indeed on line
327677) but some random column in there (which is not normally printed
for smaller headers; 21 is the . before extension in the filename).
Current trunk emits incorrect line (327678 instead of 327677, clearly
it didn't decrement).
And the patched compiler emits the right line with no column, as would
be printed if I remove e.g. 300000 newlines from the file.

2025-05-08  Jakub Jelinek  <jakub@redhat.com>

PR preprocessor/108900
PR preprocessor/116047
PR preprocessor/120061
* files.cc (_cpp_stack_file): Revert 2025-03-28 change.
* line-map.cc (linemap_add): Use
SOURCE_LINE (from, linemap_included_from (map - 1)) + 1; instead of
SOURCE_LINE (from, from[1].start_location); to compute to_line
for LC_LEAVE if from[1].reason is LC_RENAME.  For LC_ENTER
included_from computation, look at map[-2] or even lower if map[-1]
has the same start_location as map[0].

* gcc.dg/plugin/plugin.exp: Add location-overflow-test-pr116047.c
and location-overflow-test-pr120061.c.
* gcc.dg/plugin/location_overflow_plugin.c (plugin_init): Don't error
on unknown values, instead just break.
* gcc.dg/plugin/location-overflow-test-pr116047.c: New test.
* gcc.dg/plugin/location-overflow-test-pr116047-1.h: New test.
* gcc.dg/plugin/location-overflow-test-pr116047-2.h: New test.
* gcc.dg/plugin/location-overflow-test-pr120061.c: New test.
* gcc.dg/plugin/location-overflow-test-pr120061-1.h: New test.
* gcc.dg/plugin/location-overflow-test-pr120061-2.h: New test.

Daily bump.

vect: Use original LHS type for gather pattern [PR118950].

In PR118950 we do not zero masked elements in a gather load.
While recognizing a gather/scatter pattern we do not use the original
type of the LHS. This matters because the type can differ with bool
patterns (e.g. _Bool vs unsigned char) and we don't notice the need
for zeroing out the padding bytes.

This patch just uses the original LHS's type.

PR middle-end/118950

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Use
original LHS's type.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118950.c: New test.

(cherry picked from commit f3d4208e798afafcba5246334004e9646e390681)

Daily bump.

Allow IPA_CP to handle UNDEFINED as VARYING.

When applying a bitmask to reflect ranges, it is sometimes deferred and
this can result in an UNDEFINED result. IPA is not expecting this, and
add a check for it, and convert to VARYING if encountered.

PR tree-optimization/120048
gcc/
* ipa-cp.cc (ipcp_store_vr_results): Check for UNDEFINED.

gcc/testsuite/
* gcc.dg/pr120048.c: New.

c++: add fixed testcase [PR109464]

Seems to be fixed by r15-521-g6ad7ca1bb90573.

PR c++/109464

gcc/testsuite/ChangeLog:

* g++.dg/template/explicit-instantiation8.C: New test.

(cherry picked from commit 58a9f3ded1a0ccc2c8b0a42f976950041734798e)

c++: Optimize in maybe_clone_body aliases even when not at_eof [PR113208]

This patch reworks the cdtor alias optimization, such that we can create
aliases even when maybe_clone_body is called not at at_eof time, without trying
to repeat it in maybe_optimize_cdtor.

2024-05-15 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>

PR lto/113208
* cp-tree.h (maybe_optimize_cdtor): Remove.
* decl2.cc (tentative_decl_linkage): Call maybe_make_one_only
for implicit instantiations of maybe in charge ctors/dtors
declared inline.
(import_export_decl): Don't call maybe_optimize_cdtor.
(c_parse_final_cleanups): Formatting fixes.
* optimize.cc (can_alias_cdtor): Adjust condition, for
HAVE_COMDAT_GROUP && DECL_ONE_ONLY && DECL_WEAK return true even
if not DECL_INTERFACE_KNOWN.
(maybe_clone_body): Don't clear DECL_SAVED_TREE, instead set it
to void_node.
(maybe_clone_body): Remove.
* decl.cc (cxx_comdat_group): For DECL_CLONED_FUNCTION_P
functions if SUPPORTS_ONE_ONLY return DECL_COMDAT_GROUP if already
set.

* g++.dg/abi/comdat3.C: New test.
* g++.dg/abi/comdat4.C: New test.

(cherry picked from commit 6ad7ca1bb905736c0f57688e93e9e77cbc71a325)

libstdc++: Add missing feature-test macro in <memory>

Per version.syn#2, <memory> is required to define
__cpp_lib_addressof_constexpr as 201603L.

Bootstrapped and tested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
libstdc++-v3/ChangeLog:
* include/std/memory: Define __glibcxx_want_addressof_constexpr.
* testsuite/20_util/headers/memory/version.cc: Test for macro
value.

(cherry picked from commit 0e65fef8717f404cf9c85bff51bf87d534f87828)

Daily bump.

c++/coroutines: Fix awaiter var creation [PR116506]

Awaiters always need to have a coroutine state frame copy since
they persist across potential supensions. It simplifies the later
analysis considerably to assign these early which we do when
building co_await expressions.

The cleanups in r15-3146-g47dbd69b1, unfortunately elided some of
processing used to cater for cases where the var created from an
xvalue, or is a pointer/reference type.

Corrected thus.

PR c++/116506
PR c++/116880

gcc/cp/ChangeLog:

* coroutines.cc (build_co_await): Ensure that xvalues are
materialised. Handle references/pointer values in awaiter
access expressions.
(is_stable_lvalue): New.
* decl.cc (cxx_maybe_build_cleanup): Handle null arg.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116506.C: New test.
* g++.dg/coroutines/pr116880.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 4c743798b1d4530b327dad7c606c610f3811fdbf)

c++: fix conversion issues

back-port the coroutine part of this.

Some issues caught by a check from another patch:

In build_ramp_function, we're assigning REFERENCE_TYPE things, so we need to
build the assignment directly rather than rely on functions that implement
C++ semantics.

gcc/cp/ChangeLog:

* coroutines.cc (cp_coroutine_transform::build_ramp_function): Build
INIT_EXPR directly.

(cherry picked from commit dd3f3c71df66ed6fd3872ab780f5813831100d1c)

c++, coroutines:Ensure bind exprs are visited once [PR98935].

Recent changes in the C++ FE and the coroutines implementation have
exposed a latent issue in which a bind expression containing a var
that we need to capture in the coroutine state gets visited twice.
This causes an ICE (from a checking assert).  Fixed by adding a pset
to the relevant tree walk.  Exit the callback early when the tree is
not a BIND_EXPR.

PR c++/98935

gcc/cp/ChangeLog:

* coroutines.cc (register_local_var_uses): Add a pset to the
tree walk to avoid visiting the same BIND_EXPR twice.  Make
an early exit for cases that the callback does not apply.
(cp_coroutine_transform::apply_transforms): Add a pset to the
tree walk for register_local_vars.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr98935.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit bd8c7e71f516bae29a5a9e517b266141458f3977)

c++/coro: ignore cleanup_point_exprs while expanding awaits [PR116793]

If we reach a CLEANUP_POINT_EXPR while trying to walk statements, we
actually care about the statement or statement list contained within it.

Indeed, such a construction started happening with
r15-3513-g964577c31df206, after temporary promotion.  In the test case
presented in PR116793, the compiler generated:

  <<cleanup_point {
    struct _cleanup_task Aw0 [value-expr: frame_ptr->Aw0_2_3];
    int T002 [value-expr: frame_ptr->T002_2_3];

      int T002 [value-expr: frame_ptr->T002_2_3];
    <<cleanup_point <<< Unknown tree: expr_stmt
      (void) (T002 = TARGET_EXPR <D.20994, 3>) >>>>>;
      struct _cleanup_task Aw0 [value-expr: frame_ptr->Aw0_2_3];
    <<cleanup_point <<< Unknown tree: expr_stmt
      (void) (Aw0 = TARGET_EXPR <D.20995, func ((int &) &T002)>) >>>>>;
    <<cleanup_point <<< Unknown tree: expr_stmt
      (void) (D.22450 = <<< Unknown tree: co_await
        TARGET_EXPR <D.20995, func ((int &) &T002)>
        Aw0

        {_cleanup_task::await_ready (&Aw0), _cleanup_task::await_suspend<_task1::promise_type> (&Aw0, TARGET_EXPR <D.21078, _Coro_self_handle>), <<< Unknown tree: aggr_init_expr
          4
          await_resume
          D.22443
          &Aw0 >>>}
        0 >>>) >>>>>;
    <<cleanup_point <<< Unknown tree: expr_stmt
      (void) (D.20991 = (struct tuple &) &D.22450) >>>>>;
  }
  D.22467 = 1;
  int & i [value-expr: frame_ptr->i_1_2];
  <<cleanup_point <<< Unknown tree: expr_stmt
    (void) (i = std::get<0, int&> (NON_LVALUE_EXPR <D.20991>)) >>>>>;>>;

... i.e. a statement list within a cleanup point.  In such a case, we
don't actually care about the cleanup point, but we do care about the
statement inside, so, we can just walk down into the CLEANUP_POINT_EXPR.

PR c++/116793

gcc/cp/ChangeLog:

* coroutines.cc (await_statement_expander): Just process
subtrees if encountering a CLEANUP_POINT_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116793-1.C: New test.

(cherry picked from commit d888a8a8dcf391197ae82e2bbf99507effc27950)

c++: simplify handling implicit INDIRECT_REF and co_await in convert_to_void

convert_to_void has, so far, when converting a co_await expression to
void altered the await_resume expression of a co_await so that it is
also converted to void.  This meant that the type of the await_resume
expression, which is also supposed to be the type of the whole co_await
expression, was not the same as the type of the CO_AWAIT_EXPR tree.

While this has not caused problems so far, it is unexpected, I think.

Also, convert_to_void had a special case when an INDIRECT_REF wrapped a
CALL_EXPR.  In this case, we also diagnosed maybe_warn_nodiscard.  This
was a duplication of logic related to converting call expressions to
void.

Instead, we can generalize a bit, and rather discard the expression that
was implicitly dereferenced instead.

This patch changes the diagnostic of:

  void f(struct S* x) { static_cast<volatile S&>(*x); }

... from:

  warning: indirection will not access object of incomplete type
           'volatile S' in statement

... to:

  warning: implicit dereference will not access object of type
           ‘volatile S’ in statement

... but should have no impact in other cases.

gcc/cp/ChangeLog:

* coroutines.cc (co_await_get_resume_call): Return a tree
directly, rather than a tree pointer.
* cp-tree.h (co_await_get_resume_call): Adjust signature
accordingly.
* cvt.cc (convert_to_void): Do not alter CO_AWAIT_EXPRs when
discarding them.  Simplify handling implicit INDIRECT_REFs.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/nodiscard-1.C: New test.

(cherry picked from commit de03ef6337b0a368d61c74b790313b4216c7ed6e)

c++/coro: prevent ICV_STATEMENT diagnostics in temp promotion [PR116502]

If such a diagnostic is necessary, it has already been emitted,
otherwise, it is not correct and emitting it here is inactionable by the
user, and bogus.

PR c++/116502

gcc/cp/ChangeLog:

* coroutines.cc (maybe_promote_temps): Convert temporary
initializers to void without complaining.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/maybe-unused-1.C: New test.
* g++.dg/coroutines/pr116502.C: New test.

(cherry picked from commit 05e4f07cad1eacf869c10622cae2a9cdee3b6a7a)

c++, coroutines: Revise promise construction/destruction.

In examining the coroutine testcases for unexpected diagnostic output
for 'Wall', I found a 'statement has no effect' warning for the promise
construction in one case.  In particular, the case is where the users
promise type has an implicit CTOR but a user-provided DTOR. Further, the
type does not actually need constructing.

In very early versions of the coroutines code we used to check
TYPE_NEEDS_CONSTRUCTING() to determine whether to attempt to build
a constructor call for the promise.  During review, it was suggested
to use type_build_ctor_call () instead.

This latter call checks the constructors in the type (both user-defined
and implicit) and returns true, amongst other cases if any of the found
CTORs are marked as deprecated.

In a number of places (for example [class.copy.ctor] / 6) the standard
says that some version of an implicit CTOR is deprecated when the user
provides a DTOR.

Thus, for this specific arrangement of promise type, type_build_ctor_call
returns true, because of (for example) a deprecated implicit copy CTOR.

We are not going to use any of the deprecated CTORs and thus will not
see warnings from this - however, since the call returned true, we have
now determined that we should attempt to build a constructor call.

Note as above, the type does not actually require construction and thus
one might expect either a NULL_TREE or error_mark_node in response to
the build_special_member_call ().  However, in practice the function
returns the original instance object instead of a call or some error.

When we add that as a statement it triggers the 'statement has no effect'
warning.

The patch here rearranges the promise construction/destruction code to
allow for the case that a DTOR is required independently of a CTOR. In
addition, we check that the return from build_special_member_call ()
has side effects before we add it as a statement.

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Separate the
build of promise constructor and destructor.  When evaluating
the constructor, check that build_special_member_call returns
an expression with side effects before adding it.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit 7d1483921941d21d91f929ef0d59a9794b1946b4)

c++, coroutines: Fix handling of bool await_suspend() [PR115905].

As noted in the PR the action of the existing implementation was to
treat a false value from await_suspend () as equivalent to "do not
suspend". Actually it needs to be the equivalent of "resume" - and
we need to restart the dispatcher - since the await_suspend() body
could have already resumed the coroutine.
See also https://github.com/cplusplus/CWG/issues/601 (NAD) for more
discussion.

Since we need to amend the await expansion and the actor build, take
the opportunity to clean up and modernise the code there. Note that
we need to make the jump back to the dispatcher without any scope
exit cleanups (so we have to use the .CO_SUSPN IFN to do this).

PR c++/115905

gcc/cp/ChangeLog:

* coroutines.cc (struct coro_aw_data): Add a member for the
restart dispatch label.
(expand_one_await_expression): Rework to modernise and to
handle the boolean await_suspend() case.
(build_actor_fn): Rework the dispatcher and allow for a jump
back to the dispatcher.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr115905.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit 368ba7aed46d57d093c0180baae4dc0e0ba468b6)

coros: mark .CO_YIELD as LEAF [PR106973]

We rely on .CO_YIELD calls being followed by an assignment (optionally)
and then a switch/if in the same basic block. This implies that a
.CO_YIELD can never end a block. However, since a call to .CO_YIELD is
still a call, if the function containing it calls setjmp, GCC thinks
that the .CO_YIELD can introduce abnormal control flow, and generates an
edge for the call.

We know this is not the case; .CO_YIELD calls get removed quite early on
and have no effect, and result in no other calls, so .CO_YIELD can be
considered a leaf function, preventing generating an edge when calling
it.

PR c++/106973 - coroutine generator and setjmp

PR c++/106973

gcc/ChangeLog:

* internal-fn.def (CO_YIELD): Mark as ECF_LEAF.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr106973.C: New test.

(cherry picked from commit 7b7ad3f4b2455072f42e7884b93fd96ebb920bc8)

c++: don't remove labels during coro-early-expand-ifns [PR105104]

In some scenarios, it is possible for the CFG cleanup to cause one of
the labels mentioned in CO_YIELD, which coro-early-expand-ifns intends
to remove, to become part of some statement. As a result, when that
label is removed, the statement it became part of becomes invalid,
crashing the compiler.

There doesn't appear to be a reason to remove the labels (anymore, at
least), so let's not do that.

PR c++/105104

gcc/ChangeLog:

* coroutine-passes.cc (execute_early_expand_coro_ifns): Don't
remove any labels.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr105104.C: New test.

(cherry picked from commit d9c54e9a036189e8961ec17e118fccf794d7bfab)

c++, coroutines: The frame pointer is used in the helpers [PR116482].

We have a bogus warning about the coroutine state frame pointers
being apparently unused in the resume and destroy functions. Fixed
by making the parameters DECL_ARTIFICIAL.

PR c++/116482

gcc/cp/ChangeLog:

* coroutines.cc
(coro_build_actor_or_destroy_function): Make the parameter
decls DECL_ARTIFICIAL.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116482.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit 8d6d6c864442a1cc987b3e6bcb1d903ceb975e4a)

c++/coros: do not assume coros don't nest [PR113457]

In the testcase presented in the PR, during template expansion, an
tsubst of an operand causes a lambda coroutine to be processed, causing
it to get an initial suspend and final suspend.  The code for assigning
awaitable var names (get_awaitable_var) assumed that the sequence Is ->
Is -> Fs -> Fs is impossible (i.e. that one could only 'open' one
coroutine before closing it at a time), and reset the counter used for
unique numbering each time a final suspend occured.  This assumption is
false in a few cases, usually when lambdas are involved.

Instead of storing this counter in a static-storage variable, we can
store it in coroutine_info.  This struct is local to each function, so
we don't need to worry about "cross-contamination" nor resetting.

PR c++/113457

gcc/cp/ChangeLog:

* coroutines.cc (struct coroutine_info): Add integer field
awaitable_number.  This is a counter used for assigning unique
names to awaitable temporaries.
(get_awaitable_var): Use awaitable_number from coroutine_info
instead of the static int awn.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr113457-1.C: New test.
* g++.dg/coroutines/pr113457.C: New test.

(cherry picked from commit 5cca7517c5868b7b9aa13992145eb6082ac5d5b9)

c++, coroutines: Look through initial_await target exprs [PR110635].

In the case that the initial awaiter returns an object, the initial await
can be a target expression and we need to look at its initializer to cast
the await_resume() to void and to wrap in a compound expression that sets
the initial_await_resume_called flag.

PR c++/110635

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Look through
initial await target expressions to find the actual co_await_expr
that we need to update.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr110635.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit c442a9b78bdbebdbcb4a8f91bc36961eb732fbdf)

c++, coroutines: Rework handling of throwing_cleanups [PR102051].

In the fix for PR95822 (r11-7402) we set throwing_cleanup false in the top
level of the coroutine transform code. However, as the current PR shows,
that is not sufficient.

Any use of cxx_maybe_build_cleanup() can reset the flag, which causes the
check_return_expr () logic to try to add a guard variable and set it.

For the coroutine code, we need to handle the cleanups separately, since
the responsibility for them changes after the first resume point, which
we handle in the ramp exception processing.

Fix this by forcing the "throwing_cleanup" flag false right before the
processing of the return expression.

PR c++/102051

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Handle
"throwing_cleanup" here instead of ...
(cp_coroutine_transform::apply_transforms): ... here.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr102051.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit f0315f7a325ffccb446fe378fcdfccda6eead8ba)

c++, coroutines: Allow convertible get_return_on_allocation_fail [PR109682].

We have been requiring the get_return_on_allocation_fail() call to have the
same type as the ramp. This is not intended by the standard, so relax that
to allow anything convertible to the ramp return.

PR c++/109682

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Allow for cases where
get_return_on_allocation_fail has a type convertible to the ramp
return type.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr109682.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit f4915e6c4cd42e7d6f397dc36fab507cc47dad05)

c++, coroutines: Only allow void get_return_object if the ramp is void [PR100476].

Require that the value returned by get_return_object is convertible to
the ramp return. This means that the only time we allow a void
get_return_object, is when the ramp is also a void function.

We diagnose this early to allow us to exit the ramp build if the return
values are incompatible.

PR c++/100476

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Remove special
handling of void get_return_object expressions.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C:
Adjust expected diagnostic.
* g++.dg/coroutines/pr102489.C: Avoid void get_return_object.
* g++.dg/coroutines/pr103868.C: Likewise.
* g++.dg/coroutines/pr94879-folly-1.C: Likewise.
* g++.dg/coroutines/pr94883-folly-2.C: Likewise.
* g++.dg/coroutines/pr96749-2.C: Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit a0b431033c307982123abbff752045cfe7eda47f)

Daily bump.

c++, coroutines: check for members we use in handle_types [PR105475]

Currently, it is possible to ICE GCC by giving it sufficiently broken
code, where sufficiently broken means a std::coroutine_handle missing a
default on the promise_type template argument, and missing members.
As the code generator relies on lookups in the coroutine_handle never
failing (and has no way to signal that error), lets do it ahead of time,
save the result, and use that.  This saves us some lookups and allows us
to propagate an error.

PR c++/105475 - coroutines: ICE in coerce_template_parms, at cp/pt.cc:9183

This includes refactoring from 370a0dee55569, ffd521d8dcddcd4 and
00019b88e714c29c387.

gcc/cp/ChangeLog:

PR c++/105475
* coroutines.cc (struct coroutine_info): Add from_address.
Carries the from_address member we looked up earlier.
(coro_resume_identifier): Remove.  Unused.
(coro_init_identifiers): Do not initialize the above.
(struct susp_frame_data): Remove unused members, provide a CTOR.
(void_coro_handle_address): New variable.  Contains the baselink
for the std::coroutine_handle<void>::address() instance method.
(get_handle_type_address): New function.  Looks up and validates
handle_type::address in a given handle_type.
(get_handle_type_from_address): New function.  Looks up and
validates handle_type::from_address in a given handle_type.
(coro_promise_type_found_p): Remove reliance on
coroutine_handle<> defaulting the promise type to void.  Store
get_handle_type_* results where appropriate.
(struct local_vars_frame_data): Add a CTOR.
(replace_continue): Look up expression type.
(get_coroutine_from_address): New helper.  Gets the
handle_type::from_address BASELINK from a coroutine_info.
(morph_fn_to_coro): Use susp_frame_data CTOR, and make the suspend
state hash map local to the morph function. Use CTOR for
local_vars_frame_data instead of brace init.
(build_actor_fn): Use the get_coroutine_from_address helper and
void_coro_handle_address.

gcc/testsuite/ChangeLog:

PR c++/105475
* g++.dg/coroutines/pr103868.C: Add std::coroutine_handle
members we check for now.
* g++.dg/coroutines/pr105287.C: Ditto.
* g++.dg/coroutines/pr105301.C: Ditto.
* g++.dg/coroutines/pr94528.C: Ditto.
* g++.dg/coroutines/pr94879-folly-1.C: Ditto.
* g++.dg/coroutines/pr94883-folly-2.C: Ditto.
* g++.dg/coroutines/pr98118.C: Ditto.
* g++.dg/coroutines/pr105475.C: New test.
* g++.dg/coroutines/pr105475-1.C: New test.
* g++.dg/coroutines/pr105475-2.C: New test.
* g++.dg/coroutines/pr105475-3.C: New test.
* g++.dg/coroutines/pr105475-4.C: New test.
* g++.dg/coroutines/pr105475-5.C: New test.
* g++.dg/coroutines/pr105475-6.C: New test.
* g++.dg/coroutines/pr105475-broken-spec.C: New test.
* g++.dg/coroutines/pr105475-broken-spec-2.C: New test.

Co-authored-by: Arsen Arsenović <arsen@aarsen.me>
(cherry picked from commit 5b4476a165565cb20729c0a97a3f43b060595209)

c++/coroutines: only defer expanding co_{await,return,yield} if dependent [PR112341]

By doing so, we can get diagnostics in template decls when we know we
can.  For instance, in the following:

  awaitable g();
  template<typename>
  task f()
  {
    co_await g();
    co_yield 1;
    co_return "foo";
  }

... the coroutine promise type in each statement is always
std::coroutine_handle<task>::promise_type, and all of the operands are
not type-dependent, so we can always compute the resulting types (and
expected types) of these expressions and statements.

Also, when we do not know the type of the CO_AWAIT_EXPR or
CO_YIELD_EXPR, we now return NULL_TREE as the type rather than
unknown_type_node.  This is more correct, since the type is not unknown,
it just isn't determined yet.  This also means we can remove the
CO_AWAIT_EXPR and CO_YIELD_EXPR special-cases from
type_dependent_expression_p.

PR c++/112341 - error: insufficient contextual information to determine type on co_await result in function template

gcc/cp/ChangeLog:

PR c++/112341
* coroutines.cc (struct coroutine_info): Also cache the
traits type.
(ensure_coro_initialized): New function.  Makes sure we have
initialized the coroutine state successfully, or informs the
caller should it fail to do so.  Extracted from
coro_promise_type_found_p.
(coro_get_traits_class): New function.  Gets the (cached)
coroutine traits type for a given coroutine.  Extracted from
coro_promise_type_found_p and refactored to cache the result.
(coro_promise_type_found_p): Use the two functions above.
(build_template_co_await_expr): New function.  Builds a
CO_AWAIT_EXPR representing a CO_AWAIT_EXPR in a template
declaration.
(build_co_await): Use the above if processing_template_decl, and
give it a proper type.
(coro_dependent_p): New function.  Returns true iff its
argument is a type-dependent expression OR the current functions
traits class is type dependent.
(finish_co_await_expr): Defer expansion only in the case
coro_dependent_p returns true.
(finish_co_yield_expr): Ditto.
(finish_co_return_stmt): Ditto.
* pt.cc (type_dependent_expression_p): Do not treat
CO_AWAIT/CO_YIELD specially.

gcc/testsuite/ChangeLog:

PR c++/112341
* g++.dg/coroutines/pr112341-2.C: New test.
* g++.dg/coroutines/pr112341-3.C: New test.
* g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C: New
test.
* g++.dg/coroutines/pr112341.C: New test.

(cherry picked from commit 32e678b2ed752154b2f96719e33f11a7c6417f20)

c++: diagnose usage of co_await and co_yield in default args [PR115906]

This is a partial fix for PR115906. Per [expr.await] 2s3, "An
await-expression shall not appear in a default argument
([dcl.fct.default])". This patch introduces the diagnostic in that
case, and in the case of a co_yield (as co_yield is defined in terms of
co_await, so prerequisites of co_await hold).

PR c++/115906 - [coroutines] missing diagnostic and ICE when co_await used as default argument in function declaration

gcc/cp/ChangeLog:

PR c++/115906
* parser.cc (cp_parser_unary_expression): Reject await
expressions if use of local variables is currently forbidden.
(cp_parser_yield_expression): Reject yield expressions if use of
local variables is currently forbidden.

gcc/testsuite/ChangeLog:

PR c++/115906
* g++.dg/coroutines/pr115906-yield.C: New test.
* g++.dg/coroutines/pr115906.C: New test.
* g++.dg/coroutines/co-await-syntax-02-outside-fn.C: Don't rely
on default arguments.
* g++.dg/coroutines/co-yield-syntax-01-outside-fn.C: Ditto.

(cherry picked from commit 0c382da0943dc7d14455ba2ada2f620a25bd1366)

c++: fix ICE on FUNCTION_DECLs inside coroutines [PR115906]

When register_local_var_uses iterates a BIND_EXPRs BIND_EXPR_VARS, it
fails to account for the fact that FUNCTION_DECLs might be present, and
later passes it to DECL_HAS_VALUE_EXPR_P.  This leads to a tree check
failure in DECL_HAS_VALUE_EXPR_P:

  tree check: expected var_decl or parm_decl or result_decl, have
  function_decl in register_local_var_uses

We only care about PARM_DECL and VAR_DECL, so select only those.

PR c++/115906 - [coroutines] missing diagnostic and ICE when co_await used as default argument in function declaration

gcc/cp/ChangeLog:

PR c++/115906
* coroutines.cc (register_local_var_uses): Only process
PARM_DECL and VAR_DECLs.

gcc/testsuite/ChangeLog:

PR c++/115906
* g++.dg/coroutines/coro-function-decl.C: New test.

(cherry picked from commit a362c9ca4ef6585e678f899705043a9aa10dd670)

cp/coroutines: do not rewrite parameters in unevaluated contexts

It is possible to use parameters of a parent function of a lambda in
unevaluated contexts without capturing them.  By not capturing them, we
work around the usual mechanism we use to prevent rewriting captured
parameters.  Prevent this by simply skipping rewrites in unevaluated
contexts.  Those won't mind the value not being present anyway.

This prevents an ICE during parameter substitution.  In the testcase
from the PR, the rewriting machinery finds a param in the body of the
coroutine, which it did not previously encounter while processing the
coroutine declaration, and that does not have a DECL_VALUE_EXPR, and
fails.

gcc/cp/ChangeLog:

PR c++/111728
* coroutines.cc (rewrite_param_uses): Skip unevaluated
subexpressions.

gcc/testsuite/ChangeLog:

PR c++/111728
* g++.dg/coroutines/pr111728.C: New test.

(cherry picked from commit 1a37d6b732506f8c3f9e9452c9dc6a456f25397b)

c++, coroutines, contracts: Handle coroutine and void functions [PR110871,PR110872,PR115434].

The current implementation of contracts emits the checks into function
bodies in three places; for pre-conditions at the start of the body,
for asserts in-line in the function body and for post-conditions as an
addition to return statements.

In general (at least with existing "2a" contract semantics) the in-line
contract asserts behave as expected.

However, the mechanism is not applicable to:

* Handling pre conditions in coroutines since, for those, the standard
  specifies a wrapping of the original function body by functionality
  implementing initial and final suspends (along with some housekeeping
  to route exceptions).  Thus for such transformed function bodies, the
  preconditions then get actioned after the initial suspend, which does
  not behave as intended.

  * Handling post conditions in functions that do not have return
    statements (which applies to coroutines and void functions).

In the following, we identify a potentially transformed function body
(in the case of coroutines, this is usually called the "ramp()" function).

The patch here re-implements the code insertion in one of the two
following ways (code for exposition only):

  * For functions with no post-conditions we wrap the potentially
    transformed function as follows:

  {
     handle_pre_condition_checking ();
     potentially_transformed_function_body ();
  }

  This implements the intent that the preconditions are processed after
  the function parameters are initialised but before any other actions.

  * For functions with post-conditions:

  if (preconditions_exist)
    handle_pre_condition_checking ();
  try
   {
     potentially_transformed_function_body ();
   }
  finally
   {
     handle_post_condition_checking ();
   }
  else [only if the function is not marked noexcept(true) ]
   {
     ;
   }

In this, post-conditions [that might apply to the return value etc.]
are evaluated on every non-exceptional edge out of the function.

At present, the model here is that exceptions thrown by the function
propagate upwards as if there were no contracts present.  If the desired
semantic becomes that an exception is counted as equivalent to a contract
violation - then we can add a second handler in place of the empty
statement.

This patch specifically does not address changes to code-gen and constexpr
handling that are contained in P2900.

PR c++/115434
PR c++/110871
PR c++/110872

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Handle EH_ELSE_EXPR.
* contracts.cc (finish_contract_attribute): Remove excess line.
(build_contract_condition_function): Post condition handlers are
void now.
(emit_postconditions_cleanup): Remove.
(emit_postconditions): New.
(add_pre_condition_fn_call): New.
(add_post_condition_fn_call): New.
(apply_preconditions): New.
(apply_postconditions): New.
(maybe_apply_function_contracts): New.
(apply_postcondition_to_return): Remove.
* contracts.h (apply_postcondition_to_return): Remove.
(maybe_apply_function_contracts): Add.
* coroutines.cc (coro_build_actor_or_destroy_function): Do not
copy contracts to coroutine helpers.
* decl.cc (finish_function): Handle wrapping a possibly
transformed function body in contract checks.
* typeck.cc (check_return_expr): Remove handling of post
conditions on return expressions.

gcc/ChangeLog:

* gimplify.cc (struct gimplify_ctx): Add a flag to show we are
expending a handler.
(gimplify_expr): When we are expanding a handler, and the body
transforms might have re-written DECL_RESULT into a gimple var,
ensure that hander references to DECL_RESULT are also re-written
to refer to the gimple var.  When we are processing an EH_ELSE
expression, then add it if either of the cleanup slots is in
use.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr115434.C: New test.
* g++.dg/coroutines/pr110871.C: New test.
* g++.dg/coroutines/pr110872.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit d1706235ed2b274a2d1fa3c3039b5874b4ae7a0e)

configure, Darwin: Recognise new naming for Xcode ld.

The latest editions of XCode have altered the identify reported by 'ld -v'
(again). This means that GCC configure no longer detects the version.

Fixed by adding the new name to the set checked.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Recognise PROJECT:ld-mmmm.nn.aa as an identifier
for Darwin's static linker.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit 7f56a8e8ad1c33d358e9e09fcbaf263c2caba1b9)

includes, Darwin: Handle modular use for macOS SDKs [PR116827].

Recent changes to the OS SDKs have altered the way in which include guards
are used for a number of headers when C++ modules are enabled.  Instead of
placing the guards in the included header, they are being placed in the
including header.  This breaks the assumptions in the current GCC stddef.h
specifically, that the presence of __PTRDIFF_T and __SIZE_T means that the
relevant defs are already made.  However in the case of the module-enabled
C++ with these SDKs, that is no longer true.

stddef.h has a large body of special-cases already, but it seems that the
only viable solution here is to add a new one specifically for __APPLE__
and modular code.

This fixes around 280 new fails in the modules test-suite; it is needed on
all open branches that support modules.

PR target/116827

gcc/ChangeLog:

* ginclude/stddef.h: Undefine __PTRDIFF_T and __SIZE_T for module-
enabled c++ on Darwin/macOS platforms.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit 9cf6b52d04df22726d88eef113211b3cc08515de)

testsuite, gm2: Use -B option for libstdc++ where required.

We need to add testsuite  options to locate gm2 libs and libstdc++.

Usually '-L' options are added to point to the relevant directories for
the uninstalled libraries.

In cases where libraries are available as both shared and convenience some
additional checks are made.

For some targets -static-xxxx options are handled by specs substitution and
need a '-B' option rather than '-L'.  For Darwin, when embedded runpaths are
in use (the default for all versions after macOS 10.11), '-B' is also needed
to provide the runpath.

When '-B' is used, this results in a '-L' for each path that exists (so that
appending a '-L' as well is a needless duplicate).  There are also cases
where tools warn for duplicates, leading to spurious fails.

Therefore the objective of the code here is to add just one '-L' or '-B' for
each of the libraries.

Currently, we are forcing the full paths to each of the gm2 convenience libs
onto the link line and therefore the B/L logic is not needed there.  It would
need to be added if/when gm2 is tested with shared libraries

gcc/testsuite/ChangeLog:

* lib/gm2.exp: Arrange for a '-B' option to be added for the
libstdc++ paths on targets that need it.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit 6b9ceac9e4e2be304c39e6bc8744edf21faac4fb)

Darwin: Pass -macos_version_min to the linker [PR119172].

For binaries to be notarised, the SDK version must be available.
Since we do not, at present, parse this information we have been
passing "0.0" to ld64. This now results in a warning and a fail
to notarise. As a quick-fix, we can fall back to letting ld64
figure out the SDK version (which it does for -macos_version_min).

TODO: Parse the SDKSetting.plist at some point.

cherry-picked from 952e17223d3a9 and fc728cfd569e291a5

PR target/119172

gcc/ChangeLog:

* config.in: Regenerate.
* config/darwin.h (DARWIN_PLATFORM_ID): Add the option to
use -macos_version_min where available.
* configure: Regenerate.
* configure.ac: Check for ld64 support of -macos_version_min.

Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

fixincludes: adjust stdio fix for macOS 15 headers

fixincludes/ChangeLog:

* fixincl.x: Regenerate.
* inclhack.def (apple_local_stdio_fn_deprecation): Also apply to
_stdio.h.

(cherry picked from commit 1dc143181550573c9c902fb7a3b495e9b409d0b0)

libgcc, Darwin: Drop the legacy library build for macOS >= 10.12 [PR116809].

From macOSX15 SDK, the unwinder no longer exports some of the symbols used
in that library which (a) causes bootstrap fail and (b) means that the
legacy library is no longer useful.

No open branch of GCC emits references to this library - and any already
-built code that depends on the symbols would need rework anyway.

We have been asked to extend this back to the earliest OS vesion supported
by the SDK (10.12).

PR target/116809

libgcc/ChangeLog:

* config.host: Build legacy libgcc_s.1 on hosts before macOS 10.12.
* config/i386/t-darwin: Remove reference to legacy libgcc_s.1
* config/rs6000/t-darwin: Likewise.
* config/t-darwin-libgccs1: New file.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
(cherry picked from commit d9cafa0c4f0a81304d9b95a78ccc8e9003c6d7a3)

Daily bump.

Fortran: pure subroutine with pure procedure as dummy [PR106948]

PR fortran/106948

gcc/fortran/ChangeLog:

* resolve.cc (gfc_pure_function): If a function has been resolved,
but esym is not yet set, look at its attributes to see whether it
is pure or elemental.

gcc/testsuite/ChangeLog:

* gfortran.dg/pure_formal_proc_4.f90: New test.

(cherry picked from commit 4e3060ee17e6eb8bab718d320199f713533dbbd6)

c++: UNBOUND_CLASS_TEMPLATE context substitution [PR119981]

In r15-123 and r14-11434 we unconditionally set processing_template_decl
when substituting the context of an UNBOUND_CLASS_TEMPLATE, in order to
handle instantiation of the dependently scoped friend declaration

  template<int N>
  template<class T>
  friend class A<N>::B;

where the scope A<N> remains dependent after instantiation.  But this
turns out to misbehave for the UNBOUND_CLASS_TEMPLATE in the below
testcase representing

  g<[]{}>::template fn

since with the flag set substituting the args of test3 into the lambda
causes us to defer the substitution and yield a lambda that still looks
dependent, which in turn makes g<[]{}> still dependent and not suitable
for qualified name lookup.

This patch restricts setting processing_template_decl during
UNBOUND_CLASS_TEMPLATE substitution to the case where there are multiple
levels of introduced template parameters, as in the friend declaration.
(This means we need to substitute the template parameter list(s) first,
which makes sense since they lexically appear first.)

PR c++/119981
PR c++/119378

gcc/cp/ChangeLog:

* pt.cc (tsubst) <case UNBOUND_CLASS_TEMPLATE>: Substitute
into template parameter list first.  When substituting the
context, only set processing_template_decl if there's more
than one level of introduced template parameters.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ15.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 05ea8baf6ff96c77a9a2467d5c45b1ed575fca92)

testsuite: Force -mcmodel=small for gcc.target/aarch64/pr115258.c

The test implicitly assumed the default code model and so failed
for -mcmodel=tiny.

gcc/testsuite/
* gcc.target/aarch64/pr115258.c: Add -mcmodel=small.

(cherry picked from commit 3584aab37f54bcd220c7061568af777e37f4f6ed)

Fix GNAT build failure for x86/FreeBSD

gcc/ada/
PR ada/112958
* Makefile.rtl (LIBGNAT_TARGET_PAIRS) [x86 FreeBSD]: Add specific
version of s-dorepr.adb.
* libgnat/s-dorepr__freebsd.adb: New file.

AVR: target/119989 - Add missing clobbers to xload_<mode>_libgcc.

libgcc's __xload_1...4 is clobbering Z (and also R21 is some cases),
but avr.md had clobbers of respective GPRs only up to reload.
Outcome was that code reading from the same __memx address twice
could be wrong. This patch adds respective clobbers.

gcc/
* config/avr/avr.md (xload_<mode>_libgcc): Clobber R21, Z.

gcc/testsuite/
* gcc.target/avr/torture/pr119989.h: New file.
* gcc.target/avr/torture/pr119989-memx-1.c: New test.
* gcc.target/avr/torture/pr119989-memx-2.c: New test.
* gcc.target/avr/torture/pr119989-memx-3.c: New test.
* gcc.target/avr/torture/pr119989-memx-4.c: New test.

libstdc++: [_GLIBCXX_INLINE_VERSION] Fix tests failures

Adapt testsuite v3_target_compile to strip version namespace from compiler
output so that dg-error and dg-warning directives do not need to consider it.

libstdc++-v3/ChangeLog:

* testsuite/lib/libstdc++.exp (v3_target_compile): Strip version namespace
from compiler output.
* testsuite/20_util/function/cons/70692.cc: Remove now useless __8 namespace
pattern.
* testsuite/23_containers/map/48101_neg.cc: Likewise.
* testsuite/23_containers/multimap/48101_neg.cc: Likewise.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 8709d6a17830c8a9f48cb3ac6dfc6af76f2e1e81)

RISC-V: vsetvl: elide abnormal edges from LCM computations [PR119533]

vsetvl phase4 uses LCM guided info to insert VSETVL insns, including a
straggler loop for "mising vsetvls" on certain edges. Currently it
asserts on encountering EDGE_ABNORMAL.

When enabling go frontend with V enabled, libgo build hits the assert.

The solution is to prevent abnormal edges from getting into LCM at all
(my prior attempt at this just ignored them after LCM which is not
right). Existing invalid_opt_bb_p () current does this for BB predecessors
but not for successors which is what the patch adds.

Crucially, the ICE/fix also depends on avoiding vsetvl hoisting past
non-transparent blocks: That is taken care of by Robin's patch
"RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547]"
for a different yet related issue.

Reported-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
PR target/119533

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (invalid_opt_bb_p): Check for
EDGE_ABNOMAL.
(pre_vsetvl::compute_lcm_local_properties): Initialize kill
bitmap.
Debug dump skipped edge.

gcc/testsuite/ChangeLog:

* go.dg/pr119533-riscv.go: New test.
* go.dg/pr119533-riscv-2.go: New test.

(cherry picked from commit edb4867412895100b3addc525bc0dba0ea90c7f9)

RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547].

When lifting up a vsetvl into a block we currently don't consider the
block's transparency with respect to the vsetvl as in other parts of the
pass. This patch does not perform the lift when transparency is not
guaranteed.

This condition is more restrictive than necessary as we can still
perform a vsetvl lift if the conflicting register is only every used
in vsetvls and no regular insns but given how late we are in the GCC 15
cycle it seems better to defer this. Therefore
gcc.target/riscv/rvv/vsetvl/avl_single-68.c is XFAILed for now.

This issue was found in OpenCV where it manifests as a runtime error.
Zhijin Zeng debugged PR119547 and provided an initial patch.

Reported-By: 曾治金 <zhijin.zeng@spacemit.com>
PR target/119547

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info):
Do not perform lift if block is not transparent.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: xfail.
* g++.target/riscv/rvv/autovec/pr119547.C: New test.
* g++.target/riscv/rvv/autovec/pr119547-2.C: New test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-12.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: Adjust.

(cherry picked from commit 517f7e3f02b4c945d2b4bdabb490961cf986391e)

Daily bump.

Remove other processors from X86_TUNE_DEST_FALSE_DEP_FOR_GLC except GLC

Since the tune if only for GLC(sapphirerapids and alderlake-P).

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_DEST_FALSE_DEP_FOR_GLC):
Remove other processor except for GLC since this one is only
for GLC.

(cherry picked from commit 1ad6e171b126a82f38b1e8cbfd207f1d91c58a59)

libstdc++: Improve optional's <=> constraint recursion workaround [PR104606]

It turns out the reason the behavior of this testcase changed after CWG
2369 is because validity of the substituted return type is now checked
later, after constraints. So a more reliable workaround for this issue
is to add a constraint to check the validity of the return type earlier,
matching the pre-CWG 2369 semantics.

PR libstdc++/104606

libstdc++-v3/ChangeLog:

* include/std/optional (operator<=>): Revert r14-9771 change.
Add constraint checking the validity of the return type
compare_three_way_result_t before the three_way_comparable_with
constraint.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 815f1f27a1dba2f0acd1f02d0beafedadebe967c)

libstdc++: Add code comment documenting LWG 4027 change [PR118083]

PR libstdc++/118083

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h
(ranges::__access::__possibly_const_range): Mention LWG 4027.

(cherry picked from commit 640697f7c2def415db81c84010ae25be0785d867)

libstdc++: Implement LWG 4027 change to possibly-const-range [PR118083]

LWG 4027 effectively makes the const range access CPOs ranges::cfoo behave
more consistently across C++23 and C++20 (pre-P2278R4) and also more
consistently with the std::cfoo range accessors, as the below testcase
adjustments demonstrate (which mostly consist of reverting workarounds
added by r14-3771-gf12e26f3496275 and r13-7186-g0d94c6df183375).

In passing fix PR118083 which reports that the input_range constraint on
possibly-const-range is missing in our implementation. A consequence of
this is that the const range access CPOs now consistently reject a non-range
argument, and so in some our of tests we need to introduce otherwise
unused begin/end members.

PR libstdc++/118083

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h
(ranges::__access::__possibly_const_range): Adjust logic as per
LWG 4027. Add missing input_range constraint.
* testsuite/std/ranges/access/cbegin.cc (test05): Verify LWG
4027 testcases.
* testsuite/std/ranges/access/cdata.cc: Adjust, simplify and
consolidate some tests after the above.
* testsuite/std/ranges/access/cend.cc: Likewise.
* testsuite/std/ranges/access/crbegin.cc: Likewise.
* testsuite/std/ranges/access/crend.cc: Likewise.
* testsuite/std/ranges/adaptors/join.cc: Likewise.
* testsuite/std/ranges/adaptors/take_while.cc: Likewise.
* testsuite/std/ranges/adaptors/transform.cc: Likewise.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 1b9e4fe2ff5f4711406cdcf0e6e183b247d9f42b)

c++: prev declared hidden tmpl friend inst, cont [PR119807]

When remapping existing specializations of a hidden template friend from
a previous declaration to the new definition, we must remap only those
specializations that match this new definition, but currently we
remap all specializations (since they all appear in the same
DECL_TEMPLATE_INSTANTIATIONS list of the most general template).

Concretely, in the first testcase below, we form two specializations of
the friend A::f, one with arguments {{0},{bool}} and another with
arguments {{1},{bool}}.  Later when instantiating B, we need to remap
these specializations.  During the B<0> instantiation we only want to
remap the first specialization, and during the B<1> instantiation only
the second specialization, but currently we remap both specializations
twice.

tsubst_friend_function needs to determine if an existing specialization
matches the shape of the new definition, which is tricky in general,
e.g. if the outer template parameters may not match up.  Fortunately we
don't have to reinvent the wheel here since is_specialization_of_friend
seems to do exactly what we need.  We can check this unconditionally,
but I think it's only necessary when dealing with specializations formed
from a class template scope previous declaration, hence the
TMPL_ARGS_HAVE_MULTIPLE_LEVELS check.

PR c++/119807
PR c++/112288

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_function): Skip remapping an
existing specialization if it doesn't match the shape of
the new friend definition.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend86.C: New test.
* g++.dg/template/friend87.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 369461d0749790f1291f76096064d583d2547934)

Daily bump.

c++: format attribute redeclaration [PR116954]

Here when merging the two decls, remove_contract_attributes loses
ATTR_IS_DEPENDENT on the format attribute, so apply_late_template_attributes
just returns, so the attribute doesn't get propagated to the type where the
warning looks for it.

Fixed by using copy_node instead of tree_cons to preserve flags.

PR c++/116954

gcc/cp/ChangeLog:

* contracts.cc (remove_contract_attributes): Preserve flags
on the attribute list.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wformat-3.C: New test.

(cherry picked from commit b0d7d644f3c25af9bf60c948ab26aa7b09a68787)

c++: shortcut constexpr vector ctor [PR113835]

Since std::vector became usable in constant evaluation in C++20, a vector
variable with static storage duration might be manifestly
constant-evaluated, so we properly try to constant-evaluate its initializer.
But it can never succeed since the result will always refer to the result of
operator new, so trying is a waste of time. Potentially a large waste of
time for a large vector, as in the testcase in the PR.

So, let's recognize this case and skip trying constant-evaluation. I do
this only for the case of an integer argument, as that's the case that's
easy to write but slow to (fail to) evaluate.

In the test, I use dg-timeout-factor to lower the default timeout from 300
seconds to 15; on my laptop, compilation without the patch takes about 20
seconds versus about 2 with the patch.

is_std_class comes from r15-4953.

PR c++/113835

gcc/cp/ChangeLog:

* cp-tree.h (is_std_class): Declare.
* constexpr.cc (is_std_class): New function.
(is_std_allocator): Use it.
(cxx_eval_outermost_constant_expr): Bail out early
for std::vector(N).

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-vector1.C: New test.

(cherry picked from commit 764f02327f7b2dc6ac5abaf89038e51cf0ee6d13)

middle-end: fix masking for partial vectors and early break [PR119351]

The following testcase shows an incorrect masked codegen:

#define N 512
#define START 1
#define END 505

int x[N] __attribute__((aligned(32)));

int __attribute__((noipa))
foo (void)
{
  int z = 0;
  for (unsigned int i = START; i < END; ++i)
    {
      z++;
      if (x[i] > 0)
        continue;

      return z;
    }
  return -1;
}

notice how there's a continue there instead of a break.  This means we generate
a control flow where success stays within the loop iteration:

  mask_patt_9.12_46 = vect__1.11_45 > { 0, 0, 0, 0 };
  vec_mask_and_47 = mask_patt_9.12_46 & loop_mask_41;
  if (vec_mask_and_47 == { -1, -1, -1, -1 })
    goto <bb 4>; [41.48%]
  else
    goto <bb 15>; [58.52%]

However when loop_mask_41 is a partial mask this comparison can lead to an
incorrect match.  In this case the mask is:

  # loop_mask_41 = PHI <next_mask_63(6), { 0, -1, -1, -1 }(2)>

due to peeling for alignment with masking and compiling with
-msve-vector-bits=128.

At codegen time we generate:

ptrue   p15.s, vl4
ptrue   p7.b, vl1
not     p7.b, p15/z, p7.b
.L5:
ld1w    z29.s, p7/z, [x1, x0, lsl 2]
cmpgt   p7.s, p7/z, z29.s, #0
not     p7.b, p15/z, p7.b
ptest   p15, p7.b
b.none  .L2
...<early exit>...

Here the basic blocks are rotated and a not is generated.
But the generated not is unmasked (or predicated over an ALL true mask in this
case).  This has the unintended side-effect of flipping the results of the
inactive lanes (which were zero'd by the cmpgt) into -1.  Which then incorrectly
causes us to not take the branch to .L2.

This is happening because we're not comparing against the right value for the
forall case.  This patch gets rid of the forall case by rewriting the
if(all(mask)) into if (!all(mask)) which is the same as if (any(~mask)) by
negating the masks and flipping the branches.

1. For unmasked loops we simply reduce the ~mask.
2. For masked loops we reduce (~mask & loop_mask) which is the same as
   doing (mask & loop_mask) ^ loop_mask.

For the above we now generate:

.L5:
        ld1w    z28.s, p7/z, [x1, x0, lsl 2]
        cmple   p7.s, p7/z, z28.s, #0
        ptest   p15, p7.b
        b.none  .L2

This fixes gromacs with > 1 OpenMP threads and improves performance.

gcc/ChangeLog:

PR tree-optimization/119351
* tree-vect-stmts.cc (vectorizable_early_exit): Mask both operands of
the gcond for partial masking support.

gcc/testsuite/ChangeLog:

PR tree-optimization/119351
* gcc.target/aarch64/sve/pr119351.c: New test.
* gcc.target/aarch64/sve/pr119351_run.c: New test.

    (cherry picked from commit 7cf5503e0af52f5b726da4274a148590c57a458a)

aarch64: force operand to fresh register to avoid subreg issues [PR118892]

When the input is already a subreg and we try to make a paradoxical
subreg out of it for copysign this can fail if it violates the subreg
relationship.

Use force_lowpart_subreg instead of lowpart_subreg to then force the
results to a register instead of ICEing.

gcc/ChangeLog:

PR target/118892
* config/aarch64/aarch64.md (copysign<GPF:mode>3): Use
force_lowpart_subreg instead of lowpart_subreg.

gcc/testsuite/ChangeLog:

PR target/118892
* gcc.target/aarch64/copysign-pr118892.c: New test.

Daily bump.

libbacktrace: Add hpux fileline support

Fixes libstdc++ stacktrace tests.

2025-04-10 John David Anglin <danglin@gcc.gnu.org>

libbacktrace/ChangeLog:
* fileline.c (hpux_get_executable_path): New.
(fileline_initialize): Add pass to get hpux executable path.

Daily bump.

sra: Clear grp_same_access_path of acesses created by total scalarization (PR118924)

During analysis of PR 118924 it was discussed that total scalarization
invents access paths (strings of COMPONENT_REFs and possibly even
ARRAY_REFs) which did not exist in the program before which can have
unintended effects on subsequent AA queries. Although not doing that
does not mean that SRA cannot create such situations (see the bug for
more info), it has been agreed that not doing this is generally better.
This patch therfore makes SRA fall back on creating simple MEM_REFs when
accessing components of an aggregate corresponding to what a SRA
variable now represents.

gcc/ChangeLog:

2025-03-26 Martin Jambor <mjambor@suse.cz>

PR tree-optimization/118924
* tree-sra.cc (create_total_scalarization_access): Set
grp_same_access_path flag to zero.

(cherry picked from commit 40445711b8af113ef423d8bcac1a7ce1c47f62d7)

sra: Avoid creating TBAA hazards (PR118924)

The testcase in PR 118924, when compiled on Aarch64, contains an
gimple aggregate assignment statement in between different types which
are types_compatible_p but behave differently for the purposes of
alias analysis.

SRA replaces the statement with a series of scalar assignments which
however have LHSs access chains modeled on the RHS type and so do not
alias with a subsequent reads and so are DSEd.

SRA clearly gets its "same_access_path" logic subtly wrong.  One issue
is that the same_access_path_p function probably should be implemented
more along the lines of (parts of ao_compare::compare_ao_refs) instead
of internally relying on operand_equal_p.  That is however not the
problem in the PR and so I will deal with it only later.

The issue here is that even when the access path is the same, it must
not be bolted on an aggregate type that does not match.  This patch
does that, taking just one simple function from the
ao_compare::compare_ao_refs machinery and using it to detect the
situation.  The rest is just merging the information in between
accesses of the same access group.

I looked at how many times we come across such assignment during
"make stage2-bubble" of GCC (configured with only c and C++ and
without multilib and libsanitizers) and on an x86_64 there were 87924
such assignments (though now I realize not all of them had to be
aggregate), so they do happen.  The patch leads to about 5% increase
of cases where we don't use an "access path" but resort to a
MEM_REF (from 90209 to 95204).  On an Aarch64, there were 92268 such
assignments and the increase of falling back to MEM_REFs was by
4% (but from a bigger base 132983 to 107991).

gcc/ChangeLog:

2025-04-04  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/118924
* tree-ssa-alias-compare.h (types_equal_for_same_type_for_tbaa_p):
Declare.
* tree-ssa-alias.cc: Include ipa-utils.h.
(types_equal_for_same_type_for_tbaa_p): New public overloaded variant.
* tree-sra.cc: Include tree-ssa-alias-compare.h.
(create_access): Initialzie grp_same_access_path to true.
(build_accesses_from_assign): Detect tbaa hazards and clear
grp_same_access_path fields of involved accesses when they occur.
(sort_and_splice_var_accesses): Take previous values of
grp_same_access_path into account.

gcc/testsuite/ChangeLog:

2025-03-25  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/118924
* g++.dg/tree-ssa/pr118924.C: New test.

(cherry picked from commit 07d243670020b339380194f6125cde87ada56148)

s390: Accept only Pmode for registers AP/FP/RA [PR119235]

gcc/ChangeLog:

PR target/119235
* config/s390/s390.cc (s390_hard_regno_mode_ok): Accept only
Pmode for registers AP/FP/RA.

(cherry picked from commit 2b383ae2a6e5fc0530bfd8b86ad0e6b27e760bd2)

c++: templates, attributes, #pragma target [PR114772]

Since r12-5426 apply_late_template_attributes suppresses various global
state to avoid applying active pragmas to earlier declarations; we also
need to override target_option_current_node.

PR c++/114772
PR c++/101180

gcc/cp/ChangeLog:

* pt.cc (apply_late_template_attributes): Also override
target_option_current_node.

gcc/testsuite/ChangeLog:

* g++.dg/ext/pragma-target2.C: New test.

(cherry picked from commit 5fdb0145fb9499f5db9e27f775895ce4a39215e4)

libcpp: Fix incorrect line numbers in large files [PR108900]

This patch addresses an issue in the C preprocessor where incorrect
line number information is generated when processing files with a
large number of lines. The problem arises from improper handling
of location intervals in the line map, particularly when locations
exceed LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES.

By ensuring that the highest location is not decremented if it
would move to a different ordinary map, this fix resolves
the line number discrepancies observed in certain test cases.
This change improves the accuracy of line number reporting, benefiting
users relying on precise code coverage and debugging information.

Tested x86_64-linux.

libcpp/ChangeLog:

PR preprocessor/108900
* files.cc (_cpp_stack_file): Do not decrement highest_location
across distinct maps.

Signed-off-by: Jeremy Bettis <jbettis@google.com>
Signed-off-by: Yash Shinde <Yash.Shinde@windriver.com>
(cherry picked from commit d9b56c65a2697e0d7a6c0f15f1977803dc94579b)

c++: constexpr, trivial, and non-alias target [PR111075]

On Darwin and other targets with !can_alias_cdtor, we instead go to
maybe_thunk_ctor, which builds a thunk function that calls the general
constructor. And then cp_fold tries to constant-evaluate that call, and we
ICE because we don't expect to ever be asked to constant-evaluate a call to
a trivial function.

No new test because this fixes g++.dg/torture/tail-padding1.C on affected
targets.

PR c++/111075

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Allow trivial
call from a thunk.

(cherry picked from commit 628aecb050bbbc4bb0bd4468c474623e20d64e21)

Daily bump.

[14] Use --param=aarch64-autovec-preference=2 instead of =sve-only

This updates the backported test.

PR tree-optimization/119706
* g++.target/aarch64/sve/pr119706.C: Adjust --param
aarch64-autovec-preference.

tree-optimization/119778 - properly mark abnormal edge sources during inlining

When inlining a call that abnormally transfers control-flow we make
all inlined calls that can possibly transfer abnormal control-flow
do so as well. But we failed to mark the calls as altering
control-flow. This results in inconsistent behavior later and
possibly wrong-code (we'd eventually prune those edges).

PR tree-optimization/119778
* tree-inline.cc (copy_edges_for_bb): Mark calls that are
source of abnormal edges as altering control-flow.

* g++.dg/torture/pr119778.C: New testcase.

(cherry picked from commit a48f934211434cac1be951c207ee76e4b4340fac)

aarch64: Add test case.

This patch adds a test case to the testsuite for PR119706.
The bug was already fixed by
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680573.html.

OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/testsuite/
PR tree-optimization/119706
* g++.target/aarch64/sve/pr119706.C: New test.

(cherry picked from commit f6e6e6d9ba1d71fdd02a2c570d60217db6c5a31b)

middle-end/119706 - allow POLY_INT_CST as is_gimple_mem_ref_addr

We currently only INTEGER_CST, but not POLY_INT_CST, which leads
to the situation that when the POLY_INT_CST is only indrectly
present via a SSA def the IL is valid but when propagated it's not.
That's unsustainable.

PR middle-end/119706
* gimple-expr.cc (is_gimple_mem_ref_addr): Also allow
POLY_INT_CST.

(cherry picked from commit bf812c6ad83ec0b241bb3fecc7e68f883b6083df)

target/119549 - fixup handling of -mno-sse4 in target attribute

The following fixes ix86_valid_target_attribute_inner_p to properly
handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4.
I've added asserts to ix86_handle_option that RejectNegative is honored
for both.

PR target/119549
* common/config/i386/i386-common.cc (ix86_handle_option):
Assert that both OPT_msse4 and OPT_mno_sse4 are never unset.
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Process negated OPT_msse4 as OPT_mno_sse4.

* gcc.target/i386/pr119549.c: New testcase.

(cherry picked from commit ae2912748b9913fe9cd198db1b3fa704d0cbc728)

tree-optimization/119534 - reject bogus emulated vectorized gather

The following makes sure to reject the attempts to emulate a vector
gather when the discovered index vector type is a vector mask.

PR tree-optimization/119534
* tree-vect-stmts.cc (get_load_store_type): Reject
VECTOR_BOOLEAN_TYPE_P offset vector type for emulated gathers.

* gcc.dg/vect/pr119534.c: New testcase.

(cherry picked from commit d0cc14c62ad7403afcab3c2e38851d3ab179352f)

Daily bump.

d: Fix ICE: type variant differs by TYPE_MAX_VALUE with -g [PR119826]

Forward referenced enum types were never fixed up after the main
ENUMERAL_TYPE was finished. All flags set are now propagated to all
variants after its mode, size, and alignment has been calculated.

PR d/119826

gcc/d/ChangeLog:

* types.cc (TypeVisitor::visit (TypeEnum *)): Propagate flags of main
enum types to all forward-referenced variants.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/imports/pr119826b.d: New test.
* gdc.dg/debug/pr119826.d: New test.

(cherry picked from commit c5ffab99a5a962aa955310e74ca0a4be5c1acf30)

d: Fix ICE in dwarf2out_imported_module_or_decl, at dwarf2out.cc:27676 [PR119817]

The ImportVisitor method for handling the importing of overload sets was
pushing NULL_TREE to the array of import decls, which in turn got passed
to `debug_hooks->imported_module_or_decl', triggering the observed
internal compiler error.

NULL_TREE is returned from `build_import_decl' when the symbol was
ignored for being non-trivial to represent in debug, for example,
template or tuple declarations. So similarly "skip" adding the symbol
when this is the case for overload sets too.

PR d/119817

gcc/d/ChangeLog:

* imports.cc (ImportVisitor::visit (OverloadSet *)): Don't push
NULL_TREE to vector of import symbols.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/imports/m119817/a.d: New test.
* gdc.dg/debug/imports/m119817/b.d: New test.
* gdc.dg/debug/imports/m119817/package.d: New test.
* gdc.dg/debug/pr119817.d: New test.

(cherry picked from commit f5ed7d19c965de9ccb158d77e929b17459bf65b5)

d: Fix forward referenced enums missing type names in debug info [PR118309]

Calling `rest_of_type_compilation' as the D types were built meant that
debug info was being emitted before all forward references were
resolved, resulting in DW_AT_name's to be missing.

Instead, defer outputting type debug information until all modules have
been parsed and generated in `d_finish_compilation'.

PR d/118309

gcc/d/ChangeLog:

* modules.cc: Include debug.h
(d_finish_compilation): Call debug_hooks->type_decl on all TYPE_DECLs.
* types.cc: Remove toplev.h include.
(finish_aggregate_type): Don't call rest_of_type_compilation or
rest_of_decl_compilation on type.
(TypeVisitor::visit (TypeEnum *)): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/dwarf2/pr118309.d: New test.

(cherry picked from commit cee353c2653d274768a67677c8ea37fd23422b3c)

libatomic: Fix up libat_{,un}lock_n for mingw [PR119796]

Here is just a port of the previously posted patch to mingw which
clearly has the same problems.

2025-04-16 Jakub Jelinek <jakub@redhat.com>

PR libgcc/101075
PR libgcc/119796
* config/mingw/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.

(cherry picked from commit 34fe8e90007afbc87941df9b01ffcf8747c11497)

libatomic: Fix up libat_{,un}lock_n [PR119796]

As mentioned in the PR (and I think in PR101075 too), we can run into
deadlock with libat_lock_n calls with larger n.
As mentioned in PR66842, we use multiple locks (normally 64 mutexes
for each 64 byte cache line in 4KiB page) and currently can lock more
than one lock, in particular for n [0, 64] a single lock, for n [65, 128]
2 locks, for n [129, 192] 3 locks etc.
There are two problems with this:
1) we can deadlock if there is some wrap-around, because the locks are
   acquired always in the order from addr_hash (ptr) up to
   locks[NLOCKS-1].mutex and then if needed from locks[0].mutex onwards;
   so if e.g. 2 threads perform libat_lock_n with n = 2048+64, in one
   case at pointer starting at page boundary and in another case at
   page boundary + 2048 bytes, the first thread can lock the first
   32 mutexes, the second thread can lock the last 32 mutexes and
   then first thread wait for the lock 32 held by second thread and
   second thread wait for the lock 0 held by the first thread;
   fixed below by always locking the locks in order of increasing
   index, if there is a wrap-around, by locking in 2 loops, first
   locking some locks at the start of the array and second at the
   end of it
2) the number of locks seems to be determined solely depending on the
   n value, I think that is wrong, we don't know the structure alignment
   on the libatomic side, it could very well be 1 byte aligned struct,
   and so how many cachelines are actually (partly or fully) occupied
   by the atomic access depends not just on the size, but also on
   ptr % WATCH_SIZE, e.g. 2 byte structure at address page_boundary+63
   should IMHO lock 2 locks because it occupies the first and second
   cacheline

Note, before this patch it locked exactly one lock for n = 0, while
with this patch it could lock either no locks at all (if it is at cacheline
boundary) or 1 (otherwise).
Dunno of libatomic APIs can be called for zero sizes and whether
we actually care that much how many mutexes are locked in that case,
because one can't actually read/write anything into zero sized memory.
If you think it is important, I could add else if (nlocks == 0) nlocks = 1;
in both spots.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

PR libgcc/101075
PR libgcc/119796
* config/posix/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE).  If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.

(cherry picked from commit 61dfb0747afcece3b7a690807b83b366ff34f329)

bitintlower: Fix interaction of gimple_assign_copy_p stmts vs. has_single_use [PR119808]

The following testcase is miscompiled, because we emit a CLOBBER in a place
where it shouldn't be emitted.
Before lowering we have:
  b_5 = 0;
  b.0_6 = b_5;
  b.1_1 = (unsigned _BitInt(129)) b.0_6;
...
  <retval> = b_5;
The bitint coalescing assigns the same partition/underlying variable
for both b_5 and b.0_6 (possible because there is a copy assignment)
and of course a different one for b.1_1 (and other SSA_NAMEs in between).
This is -O0 so stmts aren't DCEd and aren't propagated that much etc.
It is -O0 so we also don't try to optimize and omit some names from m_names
and handle multiple stmts at once, so the expansion emits essentially
  bitint.4 = {};
  bitint.4 = bitint.4;
  bitint.2 = cast of bitint.4;
  bitint.4 = CLOBBER;
...
  <retval> = bitint.4;
and the CLOBBER is the problem because bitint.4 is still live afterwards.
We emit the clobbers to improve code generation, but do it only for
(initially) has_single_use SSA_NAMEs (remembered in m_single_use_names)
being used, if they don't have the same partition on the lhs and a few
other conditions.
The problem above is that b.0_6 which is used in the cast has_single_use
and so was in m_single_use_names bitmask and the lhs in that case is
bitint.2, so a different partition.  But there is gimple_assign_copy_p
with SSA_NAME rhs1 and the partitioning special cases those and while
b.0_6 is single use, b_5 has multiple uses.  I believe this ought to be
a problem solely in the case of such copy stmts and its special case
by the partitioning, if instead of b.0_6 = b_5; there would be
b.0_6 = b_5 + 1; or whatever other stmts that performs or may perform
changes on the value, partitioning couldn't assign the same partition
to b.0_6 and b_5 if b_5 is used later, it couldn't have two different
(or potentially different) values in the same bitint.N var.  With
copy that is possible though.

So the following patch fixes it by being more careful when we set
m_single_use_names, don't set it if it is a has_single_use SSA_NAME
but SSA_NAME_DEF_STMT of it is a copy stmt with SSA_NAME rhs1 and that
rhs1 doesn't have single use, or has_single_use but SSA_NAME_DEF_STMT of it
is a copy stmt etc.

Just to make sure it doesn't change code generation too much, I've gathered
statistics how many times
      if (m_first
          && m_single_use_names
          && m_vars[p] != m_lhs
          && m_after_stmt
          && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
        {
          tree clobber = build_clobber (TREE_TYPE (m_vars[p]),
                                        CLOBBER_STORAGE_END);
          g = gimple_build_assign (m_vars[p], clobber);
          gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
          gsi_insert_after (&gsi, g, GSI_SAME_STMT);
        }
emits a clobber on
make check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c"
and before this patch it was 41010 clobbers and after it is 40968,
so difference is 42 clobbers, 0.1% fewer.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/119808
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't set
m_single_use_names bits for SSA_NAMEs which have single use but
their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't
have a single use, or single use which is such a copy etc.

(cherry picked from commit 5a48e7732d6aa0aaf12b508fa640125e6c4d14b9)

expmed: Always use QImode for init_expmed set_zero_cost [PR119785]

This is a regression on some targets introduced I believe by r6-2055
which added mode argument to set_src_cost.

The problem here is that in the first iteration, mode is always QImode
and we get as -Os zero cost set_src_cost (const0_rtx, QImode, false).
But then we use the mode variable for iterating over int, partial int
and vector int modes, so for the second iteration we call set_src_cost
with mode which is at that time (machine_mode) (MAX_MODE_VECTOR_INT + 1).

In the x86 case that happens to be V2HFmode and we don't crash (and
compute the same 0 cost as we would for QImode).
But e.g. in the SPARC case (machine_mode) (MAX_MODE_VECTOR_INT + 1) is
MAX_MACHINE_MODE and that does all kinds of weird things especially
when doing ubsan bootstrap.

Fixed by always using QImode.

2025-04-14 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/119785
* expmed.cc (init_expmed): Always pass QImode rather than mode to
set_src_cost passed to set_zero_cost.

(cherry picked from commit f96a54350afcf7f3c90d0ecb51d7683d826acc00)

driver: On linux hosts disable ASLR during -freport-bug [PR119727]

Andi had a useful comment that even with the PR119727 workaround to
ignore differences in libbacktrace printed addresses, it is still better
to turn off ASLR when easily possible, e.g. in case some address leaks
in somewhere in the ICE message elsewhere, or to verify the ICE doesn't
depend on a particular library/binary load addresses.

The following patch adds a configure check and uses personality syscall
to turn off randomization for further -freport-bug subprocesses.

2025-04-14 Jakub Jelinek <jakub@redhat.com>

PR driver/119727
* configure.ac (HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE): New check.
* gcc.cc: Include sys/personality.h if
HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
(try_generate_repro): Call
personality (personality (0xffffffffU) | ADDR_NO_RANDOMIZE)
if HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
* config.in: Regenerate.
* configure: Regenerate.

(cherry picked from commit 5a32e85810d33dc46b1b5fe2803ee787d40709d5)

driver: Fix up -freport-bug for ASLR [PR119727]

With --enable-host-pie -freport-bug almost never prepares preprocessed
source and instead emits
The bug is not reproducible, so it is likely a hardware or OS problem.
message even for bogus which are 100% reproducible.
The way -freport-bug works is that it reruns it 3 times, capturing stdout
and stderr from each and then tries to compare the outputs in between
different runs.
The libbacktrace emitted hexadecimal addresses at the start of the lines
can differ between runs due to ASLR, either of the PIE executable, or
even if not PIE if there is some frame with e.g. libc function (say
crash in strlen/memcpy etc.).

The following patch fixes it by ignoring such differences at the start of
the lines.

2025-04-12 Jakub Jelinek <jakub@redhat.com>

PR driver/119727
* gcc.cc (files_equal_p): Rewritten using fopen/fgets/fclose instead
of open/fstat/read/close. At the start of lines, ignore lowercase
hexadecimal addresses followed by space.

(cherry picked from commit 8b2ceb421f045ee8b39d7941f39f1e9a67217583)

bitintlower: Fix up handling of SSA_NAME copies in coalescing [PR119722]

The following patch is miscompiled, because during the limited
SSA name coalescing the bitintlower pass does we incorrectly don't
register a conflict.
This is on
  <bb 4> [local count: 1073741824]:
  # b_17 = PHI <b_19(3), 8(2)>
  g.4_13 = g;
  _14 = g.4_13 >> 50;
  _15 = (unsigned int) _14;
  _21 = b_17;
  _16 = (unsigned int) _21;
  s_22 = _15 + _16;
  return s_22;
basic block where in the map->bitint bitmap we track 14, 17 and 19.
The build_bitint_stmt_ssa_conflicts "hook" has special code where
it tracks uses at the final statements of mergeable operations, so
e.g. the
  _16 = (unsigned int) _21;
statement is considered to be use of b_17 because _21 is not in
map->bitmap (or large_huge.m_names), i.e. is mergeable.
The problem is that build_ssa_conflict_graph has special code to handle
SSA_NAME copies and _21 = b_17; is gimple_assign_copy_p.  In such cases
it calls live_track_clear_var on the rhs1.  The problem is that
on the above bb, after we note in the _16 = (unsigned int) _21;
stmt we need b_17 the generic code makes us forget that because
of the copy statement, and then build_bitint_stmt_ssa_conflicts
ignores it completely (because _21 is large/huge bitint and is
not in map->bitint, so assumed to be handled by a later stmt in the
bb, for backwards walk like this before this one).
As the b_17 use is ignored, the coalescing thinks it can put
all of b_17, b_19 and _14 into the same partition, which is wrong,
while we can and should coalesce b_17 and b_19, _14 needs to be a different
temporary because b_17 is set before and used after _14 has been written.

The following patch fixes it by handling gimple_assign_copy_p in two
separate spots, move the generic coalesce handling of it after
build_ssa_conflict_graph (where build_ssa_conflict_graph handling
doesn't fall through to that, it does continue after the call) and
inside of build_ssa_conflict_graph it performs it too, but only if
the lhs is not mergeable large/huge bitint.

2025-04-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/119722
* gimple-lower-bitint.h (build_bitint_stmt_ssa_conflicts): Add
CLEAR argument.
* gimple-lower-bitint.cc (build_bitint_stmt_ssa_conflicts): Add
CLEAR argument.  Call clear on gimple_assign_copy_p rhs1 if lhs
is large/huge bitint unless lhs is not in names.
* tree-ssa-coalesce.cc (build_ssa_conflict_graph): Adjust
build_bitint_stmt_ssa_conflicts caller.  Move gimple_assign_copy_p
handling to after the build_bitint_stmt_ssa_conflicts call.

(cherry picked from commit 3f9dfb94eab1ab1bbf9a2b5e20d1f61e36516063)

bitintlower: Fix up handling of nested casts in m_upward_2limbs cases [PR119707]

The following testcase is miscompiled I believe starting with
PR112941 r14-6742.  That commit fixed the bitint-55.c testcase.
The m_first initialization for such conversion initializes 2 SSA_NAMEs,
one is PHI result on the loop (m_data[save_data_cnt]) and the other
(m_data[save_data_cnt+1]) is the argument of that PHI from the latch
edge initialized somewhere in the loop.  Both of these are used to
propagate sign extension (i.e. either 0 or all ones limb) from the
iteration with the sign bit of a narrower type to following iterations.
The bitint-55.c testcase was ICEing with invalid SSA forms as it was
using unconditionally the PHI argument SSA_NAME even in places which
weren't dominated by that.  And the code which was touched is about
handling constant idx, so if e.g. there are nested casts and the
outer one does conditional code based on index comparison with
a particular constant index.
In the following testcase there are 2 nested casts, one from signed
_BitInt(129) to unsigned _BitInt(255) and the outer from unsigned
_BitInt(255) to unsigned _BitInt(256).  The m_upward_2limbs case which
is used for handling mergeable arithmetics (like +-|&^ and casts etc.)
one loop iteration handles 2 limbs, the first half the even ones, the
second half the odd ones.
And for these 2 conversions, the special one for the inner conversion
on x86_64 is with index 2 where the sign bit of _BitInt(129) is present,
while for the outer one index 3 where we need to mask off the most
significant bit.
The r15-6742 change started using m_data[save_data_cnt] for all constant
indexes if it is still inside of the loop (and it is sign extension).
But that doesn't work correctly for the case where the inner conversion
produces the sign extension limb in the loop for an even index and
the outer conversion needs to special case the immediately next conversion,
because in that case using the PHI result will see still 0 there rather
than the updated value from the handling of previous limb.
So the following patch special cases this and uses the other SSA_NAME.

Commented IL, trying to lower
  _1 = (unsigned _BitInt(255)) y_4(D);
  _2 = (unsigned _BitInt(256)) _1;
  _3 = _2 + x_5(D);
  <retval> = _3;
we were emitting
  <bb 3> [local count: 1073741824]:
  # _8 = PHI <0(2), _9(12)>     // This is the limb index
  # _10 = PHI <0(2), _11(12)>   // Sign extension limb from inner cast (0 or ~0UL)
  # _22 = PHI <0(2), _23(12)>   // Overflow bit from addition of previous limb
  if (_8 <= 2)
    goto <bb 4>; [80.00%]
  else
    goto <bb 7>; [20.00%]

  <bb 4> [local count: 1073741824]:
  if (_8 == 2)
    goto <bb 6>; [20.00%]
  else
    goto <bb 5>; [80.00%]

  <bb 5> [local count: 1073741824]:
  _12 = VIEW_CONVERT_EXPR<unsigned long[3]>(y)[_8];     // Full limbs in y
  goto <bb 7>; [100.00%]

  <bb 6> [local count: 214748360]:
  _13 = MEM <unsigned long> [(_BitInt(129) *)&y + 16B]; // y[2] which
  _14 = (<unnamed-signed:1>) _13;                       // needs to be
  _15 = (unsigned long) _14;                            // sign extended
  _16 = (signed long) _15;                              // to full
  _17 = _16 >> 63;                                      // limb
  _18 = (unsigned long) _17;

  <bb 7> [local count: 1073741824]:
  # _19 = PHI <_12(5), _10(3), _15(6)>  // Limb to add for result of casts
  # _20 = PHI <0(5), _10(3), _18(6)>    // Sign extension limb from previous limb
  _11 = _20;                            // PHI _10 argument above
  _21 = VIEW_CONVERT_EXPR<unsigned long[4]>(x)[_8];
  _24 = .UADDC (_19, _21, _22);
  _25 = IMAGPART_EXPR <_24>;
  _26 = REALPART_EXPR <_24>;
  VIEW_CONVERT_EXPR<unsigned long[4]>(<retval>)[_8] = _26;
  _27 = _8 + 1;
  if (_27 == 3)                 // For the outer cast limb 3 is special
    goto <bb 11>; [20.00%]
  else
    goto <bb 8>; [80.00%]

  <bb 8> [local count: 1073741824]:
  if (_27 < 2)
    goto <bb 9>; [80.00%]
  else
    goto <bb 10>; [20.00%]

  <bb 9> [local count: 1073741824]:
  _28 = VIEW_CONVERT_EXPR<unsigned long[3]>(y)[_27];    // These are used in full

  <bb 10> [local count: 1073741824]:
  # _29 = PHI <_28(9), _11(8)>
  goto <bb 12>; [100.00%]

  <bb 11> [local count: 214748360]:
// And HERE is the actual bug.  Using _10 for idx 3 will mean it is always
// zero there and doesn't contain the _18 value propagated to it.
// It should be
// _30 = (<unnamed-unsigned:63>) _11;
// Now if the outer conversion had special iteration say 5, we could
// have used _10 fine here, by that time it already propagates through
// the PHI.
  _30 = (<unnamed-unsigned:63>) _10;
  _31 = (unsigned long) _30;

  <bb 12> [local count: 1073741824]:
  # _32 = PHI <_29(10), _31(11)>
  _33 = VIEW_CONVERT_EXPR<unsigned long[4]>(x)[_27];
  _34 = .UADDC (_32, _33, _25);
  _23 = IMAGPART_EXPR <_34>;
  _35 = REALPART_EXPR <_34>;
  VIEW_CONVERT_EXPR<unsigned long[4]>(<retval>)[_27] = _35;
  _9 = _8 + 2;
  if (_9 != 4)
    goto <bb 3>; [0.05%]
  else
    goto <bb 13>; [99.95%]

2025-04-11  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/119707
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Only use
m_data[save_data_cnt] instead of m_data[save_data_cnt + 1] if
idx is odd and equal to low + 1.  Remember tree_to_uhwi (idx) in
a temporary instead of calling the function multiple times.

* gcc.dg/torture/bitint-76.c: New test.

(cherry picked from commit b57d7ef4bdda8f939d804bfe40123cb9e4b447b3)

libquadmath: Fix up THREEp96 constant in expq

Here is a cherry-pick from glibc [BZ #32411] fix.

As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.

The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.

DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.

2025-04-09 Jakub Jelinek <jakub@redhat.com>

* math/expq.c (C): Fix up THREEp96 constant.

(cherry picked from commit e081ced345c45581a4891361c08e50e07720239e)