gcc.gnu.org Git - gcc.git/log

Daily bump.

aarch64: Fix CFA offsets in non-initial stack probes [PR119610]

PR119610 is about incorrect CFI output for a stack probe when that
probe is not the initial allocation.  The main aarch64 stack probe
function, aarch64_allocate_and_probe_stack_space, implicitly assumed
that the incoming stack pointer pointed to the top of the frame,
and thus held the CFA.

aarch64_save_callee_saves and aarch64_restore_callee_saves use a
parameter called bytes_below_sp to track how far the stack pointer
is above the base of the static frame.  This patch does the same
thing for aarch64_allocate_and_probe_stack_space.

Also, I noticed that the SVE path was attaching the first CFA note
to the wrong instruction: it was attaching the note to the calculation
of the stack size, rather than to the r11<-sp copy.

gcc/
PR target/119610
* config/aarch64/aarch64.cc (aarch64_allocate_and_probe_stack_space):
Add a bytes_below_sp parameter and use it to calculate the CFA
offsets.  Attach the first SVE CFA note to the move into the
associated temporary register.
(aarch64_allocate_and_probe_stack_space): Update calls accordingly.
Start out with bytes_per_sp set to the frame size and decrement
it after each allocation.

gcc/testsuite/
PR target/119610
* g++.dg/torture/pr119610.C: New test.
* g++.target/aarch64/sve/pr119610-sve.C: Likewise.

(cherry picked from commit fa61afef18a8566d1907a5ae0e7754e1eac207d9)

Daily bump.

libstdc++/112351 - deal with __gthread_once failure during locale init

The following makes the C++98 locale init path follow the way the
C++11 performs initialization. This way we deal with pthread_once
failing, falling back to non-threadsafe initialization which, given we
initialize from the library, should be serialized by the dynamic
loader already.

PR libstdc++/112351
libstdc++-v3/
* src/c++98/locale.cc (locale::facet::_S_initialize_once):
Check whether _S_c_locale is already initialized.
(locale::facet::_S_get_c_locale): Always perform non-threadsafe
init when threadsafe init failed.

(cherry picked from commit 7562f089a190953b8ef615b90b7b0520e812a930)

Daily bump.

debug/101533 - ICE with variant typedef DIE generation

There's a sanity check in gen_type_die_with_usage that trips
unnecessarily for a case where the relevant DIE has already been
generated successfully in other ways. The following keys the
existing TREE_ASM_WRITTEN check on the correct object, honoring
this and does nothing instead of ICEing for the testcase at hand.

PR debug/101533
* dwarf2out.cc (gen_type_die_with_usage): When we have
output the typedef already do nothing for a typedef variant.
Do not set TREE_ASM_WRITTEN on the type.

* g++.dg/debug/pr101533.C: New testcase.

(cherry picked from commit 99a3f013c3bb8bc022ca488b40aa18fd97b5224d)

middle-end/101478 - ICE with degenerate address during gimplification

When we gimplify &MEM[0B + 4] we are re-folding the address in case
types are not canonical which ends up with a constant address that
recompute_tree_invariant_for_addr_expr ICEs on. Properly guard
that call.

PR middle-end/101478
* gimplify.cc (gimplify_addr_expr): Check we still have an
ADDR_EXPR before calling recompute_tree_invariant_for_addr_expr.

* gcc.dg/pr101478.c: New testcase.

(cherry picked from commit 33ead6400ad59d4b38fa0527a9a7b53a28114ab7)

lto/91299 - weak definition inlined with LTO

The following fixes a thinko in the handling of interposed weak
definitions which confused the interposition check in
get_availability by setting DECL_EXTERNAL too early.

PR lto/91299
gcc/lto/
* lto-symtab.cc (lto_symtab_merge_symbols): Set DECL_EXTERNAL
only after calling get_availability.

gcc/testsuite/
* gcc.dg/lto/pr91299_0.c: New testcase.
* gcc.dg/lto/pr91299_1.c: Likewise.

(cherry picked from commit bc34db5b12e008f6ec4fdf4ebd22263c8617e5e3)

tree-optimization/87984 - hard register assignments not preserved

The following disables redundant store elimination to hard register
variables which isn't valid.

PR tree-optimization/87984
* tree-ssa-dom.cc (dom_opt_dom_walker::optimize_stmt): Do
not perform redundant store elimination to hard register
variables.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt):
Likewise.

* gcc.target/i386/pr87984.c: New testcase.

(cherry picked from commit 535115caaf97f5201fb528f67f15b4c52be5619d)

c++/79786 - bougs invocation of DATA_ABI_ALIGNMENT macro

The first argument is supposed to be a type, not a decl.

PR c++/79786
gcc/cp/
* rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation.

(cherry picked from commit 6ec19825b4e72611cdbd4749feed67b61392aa81)

middle-end/66279 - gimplification clobbers shared asm constraints

When the C++ frontend clones a CTOR we do not copy ASM_EXPR constraints
fully as walk_tree does not recurse to TREE_PURPOSE of TREE_LIST nodes.
At this point doing that seems too dangerous so the following instead
avoids gimplification of ASM_EXPRs to clobber the shared constraints
and unshares it there, like it also unshares TREE_VALUE when it
re-writes a "+" output constraint to separate "=" output and matching
input constraint.

PR middle-end/66279
* gimplify.cc (gimplify_asm_expr): Copy TREE_PURPOSE before
rewriting it for "+" processing.

* g++.dg/pr66279.C: New testcase.

(cherry picked from commit 95f5d6cc17e7d6b689674756c62b6b5e1284afd0)

tree-optimization/111125 - avoid BB vectorization in novector loops

When a loop is marked with

  #pragma GCC novector

the following makes sure to also skip BB vectorization for contained
blocks.  That avoids gcc.dg/vect/bb-slp-29.c failing on aarch64
because of extra BB vectorization therein.  I'm not specifically
dealing with sub-loops of novector loops, the desired semantics
isn't documented.

PR tree-optimization/111125
* tree-vect-slp.cc (vect_slp_function): Split at novector
loop entry, do not push blocks in novector loops.

(cherry picked from commit 43da77a4f1636280c4259402c9c2c543e6ec6c0b)

Daily bump.

Enable generation of GNU stack notes on Linux

2023-11-06 John David Anglin <danglin@gcc.gnu.org>

* config/pa/pa-linux.h (NEED_INDICATE_EXEC_STACK): Define to 1.

Daily bump.

Fix GNAT build failure for x86/FreeBSD

gcc/ada/
PR ada/112958
* Makefile.rtl (LIBGNAT_TARGET_PAIRS) [x86 FreeBSD]: Add specific
version of s-dorepr.adb.
* libgnat/s-dorepr__freebsd.adb: New file.

AVR: target/119989 - Add missing clobbers to xload_<mode>_libgcc.

libgcc's __xload_1...4 is clobbering Z (and also R21 is some cases),
but avr.md had clobbers of respective GPRs only up to reload.
Outcome was that code reading from the same __memx address twice
could be wrong.  This patch adds respective clobbers.

      Backport from 2025-04-30 r14-11703
      PR target/119989
gcc/
* config/avr/avr.md (xload_<mode>_libgcc): Clobber R21, Z.

gcc/testsuite/
* gcc.target/avr/torture/pr119989.h: New file.
* gcc.target/avr/torture/pr119989-memx-1.c: New test.
* gcc.target/avr/torture/pr119989-memx-2.c: New test.
* gcc.target/avr/torture/pr119989-memx-3.c: New test.
* gcc.target/avr/torture/pr119989-memx-4.c: New test.

(cherry picked from commit 1ca1c1fc3b58ae5e1d3db4f5a2014132fe69f82a)

Daily bump.

sra: Clear grp_same_access_path of acesses created by total scalarization (PR118924)

During analysis of PR 118924 it was discussed that total scalarization
invents access paths (strings of COMPONENT_REFs and possibly even
ARRAY_REFs) which did not exist in the program before which can have
unintended effects on subsequent AA queries. Although not doing that
does not mean that SRA cannot create such situations (see the bug for
more info), it has been agreed that not doing this is generally better.
This patch therfore makes SRA fall back on creating simple MEM_REFs when
accessing components of an aggregate corresponding to what a SRA
variable now represents.

gcc/ChangeLog:

2025-03-26 Martin Jambor <mjambor@suse.cz>

PR tree-optimization/118924
* tree-sra.cc (create_total_scalarization_access): Set
grp_same_access_path flag to zero.

(cherry picked from commit 40445711b8af113ef423d8bcac1a7ce1c47f62d7)

sra: Avoid creating TBAA hazards (PR118924)

The testcase in PR 118924, when compiled on Aarch64, contains an
gimple aggregate assignment statement in between different types which
are types_compatible_p but behave differently for the purposes of
alias analysis.

SRA replaces the statement with a series of scalar assignments which
however have LHSs access chains modeled on the RHS type and so do not
alias with a subsequent reads and so are DSEd.

SRA clearly gets its "same_access_path" logic subtly wrong.  One issue
is that the same_access_path_p function probably should be implemented
more along the lines of (parts of ao_compare::compare_ao_refs) instead
of internally relying on operand_equal_p.  That is however not the
problem in the PR and so I will deal with it only later.

The issue here is that even when the access path is the same, it must
not be bolted on an aggregate type that does not match.  This patch
does that, taking just one simple function from the
ao_compare::compare_ao_refs machinery and using it to detect the
situation.  The rest is just merging the information in between
accesses of the same access group.

I looked at how many times we come across such assignment during
"make stage2-bubble" of GCC (configured with only c and C++ and
without multilib and libsanitizers) and on an x86_64 there were 87924
such assignments (though now I realize not all of them had to be
aggregate), so they do happen.  The patch leads to about 5% increase
of cases where we don't use an "access path" but resort to a
MEM_REF (from 90209 to 95204).  On an Aarch64, there were 92268 such
assignments and the increase of falling back to MEM_REFs was by
4% (but from a bigger base 132983 to 107991).

gcc/ChangeLog:

2025-04-04  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/118924
* tree-ssa-alias-compare.h (types_equal_for_same_type_for_tbaa_p):
Declare.
* tree-ssa-alias.cc: Include ipa-utils.h.
(types_equal_for_same_type_for_tbaa_p): New public overloaded variant.
* tree-sra.cc: Include tree-ssa-alias-compare.h.
(create_access): Initialzie grp_same_access_path to true.
(build_accesses_from_assign): Detect tbaa hazards and clear
grp_same_access_path fields of involved accesses when they occur.
(sort_and_splice_var_accesses): Take previous values of
grp_same_access_path into account.

gcc/testsuite/ChangeLog:

2025-03-25  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/118924
* g++.dg/tree-ssa/pr118924.C: New test.

(cherry picked from commit 07d243670020b339380194f6125cde87ada56148)

Remove other processors from X86_TUNE_DEST_FALSE_DEP_FOR_GLC except GLC

Since the tune if only for GLC(sapphirerapids and alderlake-P).

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_DEST_FALSE_DEP_FOR_GLC):
Remove other processor except for GLC since this one is only
for GLC.

(cherry picked from commit 1ad6e171b126a82f38b1e8cbfd207f1d91c58a59)

Daily bump.

Avoid using POINTER_DIFF_EXPR for overlap checks [PR119399]

In r10-4803-g8489e1f45b50600c I'd used POINTER_DIFF_EXPR to subtract
the two pointers involved in an overlap test. I'm not sure whether
I'd specifically chosen that over MINUS_EXPR or not; if so, the only
reason I can think of is that it is probably faster on targets with
PSImode pointers. Regardless, as the PR points out, subtracting
unrelated pointers using POINTER_DIFF_EXPR is undefined behaviour.

gcc/
PR tree-optimization/119399
* tree-data-ref.cc (create_waw_or_war_checks): Use a MINUS_EXPR
on two converted pointers, rather than converting a POINTER_DIFF_EXPR
on the pointers.

gcc/testsuite/
PR tree-optimization/119399
* gcc.dg/vect/pr119399.c: New test.

(cherry picked from commit 4c8c373495d7d863dfb7102726ac3b4b41685df4)

vect: Enforce dr_with_seg_len::align precondition [PR116125]

tree-data-refs.cc uses alignment information to try to optimise
the code generated for alias checks.  The assumption for "normal"
non-grouped, full-width scalar accesses was that the access size
would be a multiple of the alignment.  As Richi notes in the PR,
this is a documented precondition of dr_with_seg_len:

  /* The minimum common alignment of DR's start address, SEG_LEN and
     ACCESS_SIZE.  */
  unsigned int align;

PR115192 was a case in which this assumption didn't hold.  The access
was part of an aligned 4-element group, but only the first 2 elements
of the group were accessed.  The alignment was therefore double the
access size.

In r15-820-ga0fe4fb1c8d78045 I'd "fixed" that by capping the
alignment in one of the output routines.  But I think that was
misconceived.  The precondition means that we should cap the
alignment at source instead.

Failure to do that caused a similar wrong code bug in this PR,
where the alignment comes from a short bitfield access rather
than from a group access.

gcc/
PR tree-optimization/116125
* tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Make
the dr_with_seg_len alignment fields describe tha access sizes as
well as the pointer alignment.
* tree-data-ref.cc (create_intersect_range_checks): Don't compensate
for invalid alignment fields here.

gcc/testsuite/
PR tree-optimization/116125
* gcc.dg/vect/pr116125.c: New test.

(cherry picked from commit e8651b80aeb86da935035e218747a6b41b611497)

s390: Accept only Pmode for registers AP/FP/RA [PR119235]

gcc/ChangeLog:

PR target/119235
* config/s390/s390.cc (s390_hard_regno_mode_ok): Accept only
Pmode for registers AP/FP/RA.

(cherry picked from commit 2b383ae2a6e5fc0530bfd8b86ad0e6b27e760bd2)

Fix a pasto in ao_compare::compare_ao_refs

When reading the function ao_compare::compare_ao_refs I came accross
what I believe to ba a copy-and-paste error which this patch fixes.

gcc/ChangeLog:

2025-03-10 Martin Jambor <mjambor@suse.cz>

* tree-ssa-alias.cc (ao_compare::compare_ao_refs): Fix a
copy-and-paste error.

(cherry picked from commit dc47161c1f32c3f27d1157ba0de9d98ea1b7fc82)

c++: templates, attributes, #pragma target [PR114772]

Since r12-5426 apply_late_template_attributes suppresses various global
state to avoid applying active pragmas to earlier declarations; we also
need to override target_option_current_node.

PR c++/114772
PR c++/101180

gcc/cp/ChangeLog:

* pt.cc (apply_late_template_attributes): Also override
target_option_current_node.

gcc/testsuite/ChangeLog:

* g++.dg/ext/pragma-target2.C: New test.

(cherry picked from commit 5fdb0145fb9499f5db9e27f775895ce4a39215e4)

c++: constexpr, trivial, and non-alias target [PR111075]

On Darwin and other targets with !can_alias_cdtor, we instead go to
maybe_thunk_ctor, which builds a thunk function that calls the general
constructor. And then cp_fold tries to constant-evaluate that call, and we
ICE because we don't expect to ever be asked to constant-evaluate a call to
a trivial function.

No new test because this fixes g++.dg/torture/tail-padding1.C on affected
targets.

PR c++/111075

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Allow trivial
call from a thunk.

(cherry picked from commit 628aecb050bbbc4bb0bd4468c474623e20d64e21)

libatomic: Fix up libat_{,un}lock_n for mingw [PR119796]

Here is just a port of the previously posted patch to mingw which
clearly has the same problems.

2025-04-16 Jakub Jelinek <jakub@redhat.com>

PR libgcc/101075
PR libgcc/119796
* config/mingw/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.

(cherry picked from commit 34fe8e90007afbc87941df9b01ffcf8747c11497)

libatomic: Fix up libat_{,un}lock_n [PR119796]

As mentioned in the PR (and I think in PR101075 too), we can run into
deadlock with libat_lock_n calls with larger n.
As mentioned in PR66842, we use multiple locks (normally 64 mutexes
for each 64 byte cache line in 4KiB page) and currently can lock more
than one lock, in particular for n [0, 64] a single lock, for n [65, 128]
2 locks, for n [129, 192] 3 locks etc.
There are two problems with this:
1) we can deadlock if there is some wrap-around, because the locks are
   acquired always in the order from addr_hash (ptr) up to
   locks[NLOCKS-1].mutex and then if needed from locks[0].mutex onwards;
   so if e.g. 2 threads perform libat_lock_n with n = 2048+64, in one
   case at pointer starting at page boundary and in another case at
   page boundary + 2048 bytes, the first thread can lock the first
   32 mutexes, the second thread can lock the last 32 mutexes and
   then first thread wait for the lock 32 held by second thread and
   second thread wait for the lock 0 held by the first thread;
   fixed below by always locking the locks in order of increasing
   index, if there is a wrap-around, by locking in 2 loops, first
   locking some locks at the start of the array and second at the
   end of it
2) the number of locks seems to be determined solely depending on the
   n value, I think that is wrong, we don't know the structure alignment
   on the libatomic side, it could very well be 1 byte aligned struct,
   and so how many cachelines are actually (partly or fully) occupied
   by the atomic access depends not just on the size, but also on
   ptr % WATCH_SIZE, e.g. 2 byte structure at address page_boundary+63
   should IMHO lock 2 locks because it occupies the first and second
   cacheline

Note, before this patch it locked exactly one lock for n = 0, while
with this patch it could lock either no locks at all (if it is at cacheline
boundary) or 1 (otherwise).
Dunno of libatomic APIs can be called for zero sizes and whether
we actually care that much how many mutexes are locked in that case,
because one can't actually read/write anything into zero sized memory.
If you think it is important, I could add else if (nlocks == 0) nlocks = 1;
in both spots.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

PR libgcc/101075
PR libgcc/119796
* config/posix/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE).  If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.

(cherry picked from commit 61dfb0747afcece3b7a690807b83b366ff34f329)

expmed: Always use QImode for init_expmed set_zero_cost [PR119785]

This is a regression on some targets introduced I believe by r6-2055
which added mode argument to set_src_cost.

The problem here is that in the first iteration, mode is always QImode
and we get as -Os zero cost set_src_cost (const0_rtx, QImode, false).
But then we use the mode variable for iterating over int, partial int
and vector int modes, so for the second iteration we call set_src_cost
with mode which is at that time (machine_mode) (MAX_MODE_VECTOR_INT + 1).

In the x86 case that happens to be V2HFmode and we don't crash (and
compute the same 0 cost as we would for QImode).
But e.g. in the SPARC case (machine_mode) (MAX_MODE_VECTOR_INT + 1) is
MAX_MACHINE_MODE and that does all kinds of weird things especially
when doing ubsan bootstrap.

Fixed by always using QImode.

2025-04-14 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/119785
* expmed.cc (init_expmed): Always pass QImode rather than mode to
set_src_cost passed to set_zero_cost.

(cherry picked from commit f96a54350afcf7f3c90d0ecb51d7683d826acc00)

driver: On linux hosts disable ASLR during -freport-bug [PR119727]

Andi had a useful comment that even with the PR119727 workaround to
ignore differences in libbacktrace printed addresses, it is still better
to turn off ASLR when easily possible, e.g. in case some address leaks
in somewhere in the ICE message elsewhere, or to verify the ICE doesn't
depend on a particular library/binary load addresses.

The following patch adds a configure check and uses personality syscall
to turn off randomization for further -freport-bug subprocesses.

2025-04-14 Jakub Jelinek <jakub@redhat.com>

PR driver/119727
* configure.ac (HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE): New check.
* gcc.cc: Include sys/personality.h if
HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
(try_generate_repro): Call
personality (personality (0xffffffffU) | ADDR_NO_RANDOMIZE)
if HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
* config.in: Regenerate.
* configure: Regenerate.

(cherry picked from commit 5a32e85810d33dc46b1b5fe2803ee787d40709d5)

driver: Fix up -freport-bug for ASLR [PR119727]

With --enable-host-pie -freport-bug almost never prepares preprocessed
source and instead emits
The bug is not reproducible, so it is likely a hardware or OS problem.
message even for bogus which are 100% reproducible.
The way -freport-bug works is that it reruns it 3 times, capturing stdout
and stderr from each and then tries to compare the outputs in between
different runs.
The libbacktrace emitted hexadecimal addresses at the start of the lines
can differ between runs due to ASLR, either of the PIE executable, or
even if not PIE if there is some frame with e.g. libc function (say
crash in strlen/memcpy etc.).

The following patch fixes it by ignoring such differences at the start of
the lines.

2025-04-12 Jakub Jelinek <jakub@redhat.com>

PR driver/119727
* gcc.cc (files_equal_p): Rewritten using fopen/fgets/fclose instead
of open/fstat/read/close. At the start of lines, ignore lowercase
hexadecimal addresses followed by space.

(cherry picked from commit 8b2ceb421f045ee8b39d7941f39f1e9a67217583)

libquadmath: Fix up THREEp96 constant in expq

Here is a cherry-pick from glibc [BZ #32411] fix.

As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.

The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.

DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.

2025-04-09 Jakub Jelinek <jakub@redhat.com>

* math/expq.c (C): Fix up THREEp96 constant.

(cherry picked from commit e081ced345c45581a4891361c08e50e07720239e)

lto: lto-opts fixes [PR119625]

I can reproduce a really weird error in our distro i686 trunk gcc
(but haven't managed to reproduce it with vanilla trunk yet).
echo 'void foo (void) {}' > a.c; gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -o a.o -c a.c; gcc -O2 -flto=auto -m32 -march=i686 -r -o a.lo a.o
lto1: fatal error: open  failed: No such file or directory
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
The error is because
cat ./a.lo.lto.o-args.0
""
a.o
My suspicion is that this "" in there is caused by weird .gnu.lto_.opts
section content during
gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -S -o a.s -c a.c
compilation (and I can reproduce that one with vanilla trunk).
The above results in
        .section        .gnu.lto_.opts,"e",@progbits
        .string "'-fno-openmp' '-fno-openacc' '-fPIC' '' '-m32' '-march=i686' '-O2' '-flto=auto' '-ffat-lto-objects'"
There are two weird things, one (IMHO the cause of the "" later on) is
the '' part, I think it comes from lto_write_options doing
append_to_collect_gcc_options (&temporary_obstack, &first_p, "");
IMHO it shouldn't call append_to_collect_gcc_options at all for that case.

The -fhardened option causes global_options.x_flag_cf_protection
to be set to CF_FULL and later on the backend option processing
sets it to CF_FULL | CF_SET (i.e. 7, a value not handled in
lto_write_options).

The following patch fixes it by not emitting anything there if
flag_cf_protection is one of the unhandled values.

Perhaps it could incrementally use
switch (global_options.x_flag_cf_protection & ~CF_SET)
instead, dunno.

And the other problem is that the -fPIC in there is really weird.
Our distro compiler or vanilla configured trunk certainly doesn't
default to -fPIC and -fhardened uses -fPIE when
-fPIC/-fpic/-fno-pie/-fno-pic is not specified, so I was expecting
-fPIE in there.
The thing is that the -fpie option causes setting of both
global_options.x_flag_pi{c,e} to 1, -fPIE both to 2:
      /* If -fPIE or -fpie is used, turn on PIC.  */
      if (opts->x_flag_pie)
        opts->x_flag_pic = opts->x_flag_pie;
      else if (opts->x_flag_pic == -1)
        opts->x_flag_pic = 0;
      if (opts->x_flag_pic && !opts->x_flag_pie)
        opts->x_flag_shlib = 1;
so checking first for flag_pic == 2 and then flag_pic == 1
and only afterwards for flag_pie means we never print
-fPIE/-fpie.

Or do you want something further (like
switch (global_options.x_flag_cf_protection & ~CF_SET)
)?

2025-04-04  Jakub Jelinek  <jakub@redhat.com>

PR lto/119625
* lto-opts.cc (lto_write_options): If neither flag_pic nor
flag_pie are set, check first for flag_pie and only later
for flag_pic rather than the other way around, use a temporary
variable.  If flag_cf_protection is not set, don't append anything
if flag_cf_protection is none of CF_{NONE,FULL,BRANCH,RETURN} and
use a temporary variable.

(cherry picked from commit d25728c98682c058bfda79333c94b0a8cf2a3f49)

c: Fix ICEs with -fsanitize=pointer-{subtract,compare} [PR119582]

The following testcase ICEs because c_fully_fold isn't performed on the
arguments of __sanitizer_ptr_{sub,cmp} builtins and so e.g.
C_MAYBE_CONST_EXPR can leak into the gimplifier where it ICEs.

2025-04-02 Jakub Jelinek <jakub@redhat.com>

PR c/119582
* c-typeck.cc (pointer_diff, build_binary_op): Call c_fully_fold on
__sanitizer_ptr_sub or __sanitizer_ptr_cmp arguments.

* gcc.dg/asan/pr119582.c: New test.

(cherry picked from commit 29bc904cb827615ed9f36bc3742ccc4ac77515ec)

combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]

The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
We have from earlier combinations
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
        (const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
     (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
        (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
        (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
            (set (reg:CCZ 17 flags)
                (compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
                    (const_int 0 [0])))
            (set (reg/v:SI 116 [ e ])
                (neg:SI (reg:SI 104 [ _7 ])))
        ]) "pr119291.c":26:13 977 {*negsi_2}
     (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
        (nil)))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
        (ne:DI (reg:CCZ 17 flags)
            (const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (nil)))
and try_combine is called on i3 25 and i2 22 (second time)
and reach the hunk being patched with simplified i3
(insn 25 24 26 4 (parallel [
            (set (pc)
                (pc))
            (set (reg/v:SI 116 [ e ])
                (const_int 0 [0]))
        ]) "pr119291.c":28:13 977 {*negsi_2}
     (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
        (nil)))
and
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
        (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
     (nil))
Now, the try_combine code there attempts to split two independent
sets in newpat by moving one of them to i2.
And among other tests it checks
!modified_between_p (SET_DEST (set1), i2, i3)
which is certainly needed, if there would be say
(set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
in between i2 and i3, we couldn't do that, as that set would overwrite
the value set by set1 we want to move to the i2 position.
But in this case pseudo 116 isn't set in between i2 and i3, but used
(and additionally there is a REG_DEAD note for it).

This is equally bad for the move, because while the i3 insn
and later will see the pseudo value that we set, the insn in between
which uses the value will see a different value from the one that
it should see.

As we don't check for that, in the end try_combine succeeds and
changes the IL to:
(insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
        (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
     (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
        (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
        (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (pc)
        (pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
     (nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
        (const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
     (nil))
(note, the i3 got turned into a nop and try_combine also modified insn 27).

The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both.

Looking at this some more today, I think we should special case
set_noop_p because that can be put into i2 (except for the JUMP_P
violations), currently both modified_between_p (pc_rtx, i2, i3)
and reg_used_between_p (pc_rtx, i2, i3) returns false.
I'll post a patch incrementally for that (but that feels like
new optimization, so probably not something that should be backported).

On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine?  Why
> does the comment talk about memory?

I was worried about making too risky changes this late in stage4
(and especially also for backports).  Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).

What we know for sure is:
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
                                  XVECEXP (newpat, 0, 0))
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
                                  XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).

Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes.  This part didn't change in my patch.

I've only changed those
-         && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+         && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.

The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3.  So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false.  Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch).  In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.

Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
                                  XVECEXP (newpat, 0, 0))
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
                                  XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.

To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).

So, I think it would be safer to add && SET_DEST (set{1,0} != pc_rtx
checks to preserve former behavior, like in the following version.

2025-04-01  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/119291
* combine.cc (try_combine): For splitting of PARALLEL with
2 independent SETs into i2 and i3 sets check reg_used_between_p
of the SET_DESTs rather than just modified_between_p.

* gcc.c-torture/execute/pr119291.c: New test.

(cherry picked from commit 19ba913517b5e2a001fa9c0f060a1ac74430c027)

Fix up some further cases of missing or extraneous spaces in diagnostics

Given the recent PR119406 I've tried to grep for concatenated string
literals without space at the end of one line and at the start of next line,
unless it was obviously intentional.
Furthermore, I've then looked through gcc.pot looking for 2 adjacent spaces
and looking back if that wasn't the case of "something "
" with spaces at both sides".

Here is the result from that.

I think just the c.opt change needs an explanation, the "" in the
description is simply eaten up somewhere during the option processing and
gcc -v --help before this patch was displaying
-Wdeprecated-literal-operator Warn about deprecated space between and suffix in a user-defined literal operator.

2025-03-22 Jakub Jelinek <jakub@redhat.com>

gcc/
* gimplify.cc (warn_switch_unreachable_and_auto_init_r): Add missing
space in the middle of diagnostics.
gcc/fortran/
* resolve.cc (resolve_procedure_expression): Remove extraneous space
from the middle of diagnostics.

(cherry picked from commit 20360e4b6b5a63bc65d1855a7ecf22eb7148a452)

builtins: Fix up strspn/strcspn folding [PR119219]

The PR119204 r15-7955 fix caused some regressions.
The problem is that the fold_builtin* APIs document that expr is
either a CALL_EXPR of the call or NULL, so using TREE_TYPE (expr)
can crash e.g. during constexpr evaluation etc.

As can be seen in the surrounding patch, for the neighbouring builtins
(both modf and strpbrk) fold_builtin_2 passes down type, which is the
result type, TREE_TYPE (TREE_TYPE (fndecl)) and those builtins use it
to build the return value, while strspn was always building size_type_node
and strcspn had this change from that to TREE_TYPE (expr).
The patch passes type to these two and uses it there as well.

The patch keeps passing expr because it is used in the
check_nul_terminated_array calls done for both strspn and strcspn,
those calls clearly can deal with NULL expr but prefer if it is non-NULL
for some warning.

2025-03-12 Jakub Jelinek <jakub@redhat.com>

PR middle-end/119204
PR middle-end/119219
* builtins.cc (fold_builtin_2): Pass type as another argument
to fold_builtin_strspn and fold_builtin_strcspn.
(fold_builtin_strspn): Add type argument, use it instead of
size_type_node.
(fold_builtin_strcspn): Add type argument, use it instead of
TREE_TYPE (expr).

(cherry picked from commit da967f0ff324053304b350fdb18384607a346ebd)

middle-end/119204 - ICE with strcspn folding

The following makes sure to convert the folded expression to the
original expression type.

PR middle-end/119204
* builtins.cc (fold_builtin_strcspn): Preserve the original
expression type.

* gcc.dg/pr119204.c: New testcase.

(cherry picked from commit 68932eeb38f66fbc0c3cf4b77ff7dde8a408f2e4)

tree: Improve skip_simple_arithmetic [PR119183]

The following testcase takes very long time to compile, because
skip_simple_arithmetic decides to first call tree_invariant_p on
the second argument (and indirectly recurse there).  I think before
canonicalization of operands for commutative binary expressions
(and for non-commutative ones always) it is pretty common that the
first operand is a constant, something which tree_invariant_p handles
immediately, so the following patch special cases that; I've added
there a tree_invariant_p call too after the checks, while it is not
really needed currently, tree_invariant_p has the same checks, I wanted
to be prepared in case tree_invariant_p changes.  But if you think
I should avoid it, I can drop it too.

This is just a partial fix, I think one can certainly construct a testcase
which will still have horrible compile time complexity (but I've tried and
haven't managed to do so), so perhaps we should just limit the recursion
depth through skip_simple_arithmetic/tree_invariant_p with some defaulted
argument.

2025-03-11  Jakub Jelinek  <jakub@redhat.com>

PR c/119183
* tree.cc (skip_simple_arithmetic): If first operand of binary
expr is TREE_CONSTANT or TREE_READONLY with no side-effects, call
tree_invariant_p on that operand first instead of on the second.

* gcc.dg/pr119183.c: New test.

(cherry picked from commit 20e5aa9cc1519f871cce25dbfdc149d9d60da779)

libgcc: Fix up unwind-dw2-btree.h [PR119151]

The following testcase shows a bug in unwind-dw2-btree.h.
In short, the header provides lock-free btree data structure (so no parent
link on nodes, both insertion and deletion are done in top-down walks
with some locking of just a few nodes at a time so that lookups can notice
concurrent modifications and retry, non-leaf (inner) nodes contain keys
which are initially the base address of the left-most leaf entry of the
following child (or all ones if there is none) minus one, insertion ensures
balancing of the tree to ensure [d/2, d] entries filled through aggressive
splitting if it sees a full tree while walking, deletion performs various
operations like merging neighbour trees, merging into parent or moving some
nodes from neighbour to the current one).
What differs from the textbook implementations is mostly that the leaf nodes
don't include just address as a key, but address range, address + size
(where we don't insert any ranges with zero size) and the lookups can be
performed for any address in the [address, address + size) range.  The keys
on inner nodes are still just address-1, so the child covers all nodes
where addr <= key unless it is covered already in children to the left.
The user (static executables or JIT) should always ensure there is no
overlap in between any of the ranges.

In the testcase a bunch of insertions are done, always followed by one
removal, followed by one insertion of a range slightly different from the
removed one.  E.g. in the first case [&code[0x50], &code[0x59]] range
is removed and then we insert [&code[0x4c], &code[0x53]] range instead.
This is valid, it doesn't overlap anything.  But the problem is that some
non-leaf (inner) one used the &code[0x4f] key (after the 11 insertions
completely correctly).  On removal, nothing adjusts the keys on the parent
nodes (it really can't in the top-down only walk, the keys could be many nodes
above it and unlike insertion, removal only knows the start address, doesn't
know the removed size and so will discover it only when reaching the leaf
node which contains it; plus even if it knew the address and size, it still
doesn't know what the second left-most leaf node will be (i.e. the one after
removal)).  And on insertion, if nodes aren't split at a level, nothing
adjusts the inner keys either.  If a range is inserted and is either fully
bellow key (keys are - 1, so having address + size - 1 being equal to key is
fine) or fully after key (i.e. address > key), it works just fine, but if
the key is in a middle of the range like in this case, &code[0x4f] is in the
middle of the [&code[0x4c], &code[0x53]] range, then insertion works fine
(we only use size on the leaf nodes), and lookup of the addresses below
the key work fine too (i.e. [&code[0x4c], &code[0x4f]] will succeed).
The problem is with lookups after the key (i.e. [&code[0x50, &code[0x53]]),
the lookup looks for them in different children of the btree and doesn't
find an entry and returns NULL.

As users need to ensure non-overlapping entries at any time, the following
patch fixes it by adjusting keys during insertion where we know not just
the address but also size; if we find during the top-down walk a key
which is in the middle of the range being inserted, we simply increase the
key to be equal to address + size - 1 of the range being inserted.
There can't be any existing leaf nodes overlapping the range in correct
programs and the btree rebalancing done on deletion ensures we don't have
any empty nodes which would also cause problems.

The patch adjusts the keys in two spots, once for the current node being
walked (the last hunk in the header, with large comment trying to explain
it) and once during inner node splitting in a parent node if we'd otherwise
try to add that key in the middle of the range being inserted into the
parent node (in that case it would be missed in the last hunk).
The testcase covers both of those spots, so succeeds with GCC 12 (which
didn't have btrees) and fails with vanilla GCC trunk and also fails if
either the
  if (fence < base + size - 1)
    fence = iter->content.children[slot].separator = base + size - 1;
or
  if (left_fence >= target && left_fence < target + size - 1)
    left_fence = target + size - 1;
hunk is removed (of course, only with the current node sizes, i.e. up to
15 children of inner nodes and up to 10 entries in leaf nodes).

2025-03-10  Jakub Jelinek  <jakub@redhat.com>
    Michael Leuchtenburg  <michael@slashhome.org>

PR libgcc/119151
* unwind-dw2-btree.h (btree_split_inner): Add size argument.  If
left_fence is in the middle of [target,target + size - 1] range,
increase it to target + size - 1.
(btree_insert): Adjust btree_split_inner caller.  If fence is smaller
than base + size - 1, increase it and separator of the slot to
base + size - 1.

* gcc.dg/pr119151.c: New test.

(cherry picked from commit 21109b37e8585a7a1b27650fcbf1749380016108)

c++: Update TYPE_FIELDS of variant types if cp_parser_late_parsing_default_args etc. modify it [PR98533]

The following testcases ICE during type verification, because TYPE_FIELDS
of e.g. S RECORD_TYPE in pr119123.C is different from TYPE_FIELDS of const S.
Various decls are added to S's TYPE_FIELDS first, then finish_struct
indirectly calls fixup_type_variants to sync the variant copies.
But later on cp_parser_class_specifier calls
cp_parser_late_parsing_default_args and that apparently adds a lambda
type (from default argument) to TYPE_FIELDS of S.
Dunno if that is right or not, assuming it is right, the following
patch fixes it by updating TYPE_FIELDS of variant types if there were
any changes in the various functions cp_parser_class_specifier defers and
calls on the outermost enclosing class.
There was quite a lot of code repetition already before, so the patch
uses a lambda to avoid the repetitions.
To my surprise, in some of the contract testcases (
g++.dg/contracts/contracts-friend1.C
g++.dg/contracts/contracts-nested-class1.C
g++.dg/contracts/contracts-nested-class2.C
g++.dg/contracts/contracts-redecl7.C
g++.dg/contracts/contracts-redecl8.C
) it is actually setting class_type and pushing TRANSLATION_UNIT_DECL
rather than some class types in some cases.

Or should the lambda pushing into the containing class be somehow avoided?

2025-03-06 Jakub Jelinek <jakub@redhat.com>

PR c++/98533
PR c++/119123
* parser.cc (cp_parser_class_specifier): Update TYPE_FIELDS of
variant types in case cp_parser_late_parsing_default_args etc. change
TYPE_FIELDS on the main variant. Add switch_to_class lambda and
use it to simplify repeated class switching code.

* g++.dg/cpp0x/pr98533.C: New test.
* g++.dg/cpp0x/pr119123.C: New test.

(cherry picked from commit 179e01085b0aed111ef1f7908c4b87c800f880e9)

c++: Fix cxx_eval_store_expression {REAL,IMAG}PART_EXPR handling [PR119045]

I've added the asserts that probe == target because {REAL,IMAG}PART_EXPR
always implies a scalar type and so applying ARRAY_REF/COMPONENT_REF
etc. on it further doesn't make sense and the later code relies on it
to be the last one in refs array. But as the following testcase shows,
we can fail those assertions in case there is a reference or pointer
to the __real__ or __imag__ part, in that case we just evaluate the
constant expression and so probe won't be the same as target.
That case doesn't push anything into the refs array though.

The following patch changes those asserts to verify that refs is still
empty, which fixes it.

2025-02-28 Jakub Jelinek <jakub@redhat.com>

PR c++/119045
* constexpr.cc (cxx_eval_store_expression) <case REALPART_EXPR>:
Assert that refs->is_empty () rather than probe == target.
(cxx_eval_store_expression) <case IMAGPART_EXPR>: Likewise.

* g++.dg/cpp1y/constexpr-complex2.C: New test.

(cherry picked from commit 7eb8ec1856f71b039d1c2235b1c941934fa28e22)

c: stddef.h C23 fixes [PR114870]

The stddef.h header for C23 defines __STDC_VERSION_STDDEF_H__ and
unreachable macros multiple times in some cases.
The header doesn't have normal multiple inclusion guard, because it supports
for glibc inclusion with __need_{size_t,wchar_t,ptrdiff_t,wint_t,NULL}.
While the definition of __STDC_VERSION_STDDEF_H__ and unreachable is done
solely in the #ifdef _STDDEF_H part, so they are defined only if stddef.h
is included without those __need_* macros defined.  But actually once
stddef.h is included without the __need_* macros, _STDDEF_H is then defined
and while further stddef.h includes without __need_* macros don't do
anything:
#if (!defined(_STDDEF_H) && !defined(_STDDEF_H_) && !defined(_ANSI_STDDEF_H) \
      && !defined(__STDDEF_H__)) \
     || defined(__need_wchar_t) || defined(__need_size_t) \
     || defined(__need_ptrdiff_t) || defined(__need_NULL) \
     || defined(__need_wint_t)
if one includes whole stddef.h first and then stddef.h with some of the
__need_* macros defined, the #ifdef _STDDEF_H part is used again.
It isn't that big deal for most cases, as it uses extra guarding macros
like:
#ifndef _GCC_MAX_ALIGN_T
#define _GCC_MAX_ALIGN_T
...
#endif
etc., but for __STDC_VERSION_STDDEF_H__/unreachable nothing like that is
used.

So, either we do what the following patch does and just don't define
__STDC_VERSION_STDDEF_H__/unreachable second time, or use #ifndef
unreachable separately for the #define unreachable() case, or use
new _GCC_STDC_VERSION_STDDEF_H macro to guard this (or two, one for
__STDC_VERSION_STDDEF_H__ and one for unreachable), or rework the initial
condition to be just
#if !defined(_STDDEF_H) && !defined(_STDDEF_H_) && !defined(_ANSI_STDDEF_H) \
     && !defined(__STDDEF_H__)
- I really don't understand why the header should do anything at all after
it has been included once without __need_* macros.  But changing how this
behaves after 35 years might be risky for various OS/libc combinations.

2025-02-26  Jakub Jelinek  <jakub@redhat.com>

PR c/114870
* ginclude/stddef.h (__STDC_VERSION_STDDEF_H__, unreachable): Don't
redefine multiple times if stddef.h is first included without __need_*
defines and later with them.  Move nullptr_t and unreachable and
__STDC_VERSION_STDDEF_H__ definitions into the same
defined (__STDC_VERSION__) && __STDC_VERSION__ > 201710L #if block.

* gcc.dg/c23-stddef-2.c: New test.

(cherry picked from commit 8d22474af76a386eed488b3c66124134f0e41363)

openmp: Mark OpenMP atomic write expression as read [PR119000]

The following testcase was emitting false positive warning that
the rhs of #pragma omp atomic write was stored but not read,
when the atomic actually does read it. The following patch
fixes that by calling default_function_array_read_conversion
on it, so that it is marked as read as well as converted from
lvalue to rvalue.
Furthermore, the code had
if (code == NOP_EXPR) ... else ... if (code == NOP_EXPR) ...
with none of ... parts changing code, so I've merged the two ifs.

2025-02-25 Jakub Jelinek <jakub@redhat.com>

PR c/119000
* c-parser.cc (c_parser_omp_atomic): For omp write call
default_function_array_read_conversion on the rhs expression.
Merge the two adjacent if (code == NOP_EXPR) blocks.

* c-c++-common/gomp/pr119000.c: New test.

(cherry picked from commit cdffc76393488a73671b70481cf8a4b7c289029d)

reassoc: Fix up optimize_range_tests_to_bit_test [PR118915]

The following testcase is miscompiled due to a bug in
optimize_range_tests_to_bit_test.  It is trying to optimize
check for a in [-34,-34] or [-26,-26] or [-6,-6] or [-4,inf] ranges.
Another reassoc optimization folds the the test for the first
two ranges into (a + 34U) & ~8U in [0U,0U] range, and extract_bit_test_mask
actually has code to virtually undo it and treat that again as test
for a being -34 or -26.  The problem is that optimize_range_tests_to_bit_test
remembers in the type variable TREE_TYPE (ranges[i].exp); from the first
range.  If extract_bit_test_mask doesn't do that virtual undoing of the
BIT_AND_EXPR handling, that is just fine, the returned exp is ranges[i].exp.
But if the first range is BIT_AND_EXPR, the type could be different, the
BIT_AND_EXPR form has the optional cast to corresponding unsigned type
in order to avoid introducing UB.  Now, type was used to fill in the
max value if ranges[j].high was missing in subsequently tested range,
and so in this particular testcase the [-4,inf] range which was
signed int and so [-4,INT_MAX] was treated as [-4,UINT_MAX] instead.
And we were subtracting values of 2 different types and trying to make
sense out of that.

The following patch fixes this by using the type of the low bound
(which is always non-NULL) for the max value of the high bound instead.

2025-02-24  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/118915
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): For
highj == NULL_TREE use TYPE_MAX_VALUE (TREE_TYPE (lowj)) rather
than TYPE_MAX_VALUE (type).

* gcc.c-torture/execute/pr118915.c: New test.

(cherry picked from commit 5806279610783805286ebcd0af3b455602a3a8f9)

i386: Fix ICE with conditional QI/HI vector maxmin [PR118776]

The following testcase ICEs starting with GCC 12 since r12-4526
although the bug has been introduced already in r12-2751.
The problem was in the addition of cond_<code><mode> define_expand
which uses nonimmediate_operand predicates for both maxmin operands
for all VI1248_AVX512VLBW modes.  It works fine with
VI48_AVX512VL modes because the <code><mode>3_mask VI48_AVX512VL
define_expand uses ix86_fixup_binary_operands_no_copy and the
*avx512f_<code><mode>3<mask_name> VI48_AVX512VL define_insn uses
% in constraint and !(MEM_P && MEM_P) check in condition (and
<code><mode>3 define_expand with VI124_256_AVX512F_AVX512BW iterator
does that too), but eventhough the 8-bit and 16-bit element maxmin
is commutative too, the <mask_codefor><code><mode>3<mask_name>
define_insn with VI12_AVX512VL iterator didn't use % in constraint
to make it commutative.  So, e.g. cond_umaxv32qi define_expand
allowed nonimmediate_operand for both umax operands, but used
gen_umaxv32qi_mask which wasn't commutative and only allowed
nonimmediate_operand for the second operand.

The following patch fixes it by keeping the <code><mode>3
VI124_256_AVX512F_AVX512BW define_expand as is (it does
ix86_fixup_binary_operands_no_copy) but extending the
<code><mode>3_mask define_expand from VI48_AVX512VL to
VI1248_AVX512VLBW which keeps the current modes with their
ISA conditions and adds the VI12_AVX512VL modes under additional
TARGET_AVX512BW condition, and turning the actual define_insn
into an * prefixed name (which it was before just for the non-masked
case) and having the same commutative operand handling as in other
define_insns.

2025-02-08  Jakub Jelinek  <jakub@redhat.com>

PR target/118776
* config/i386/sse.md (<code><mode>3_mask): Use VI1248_AVX512VLBW
iterator rather than VI48_AVX512VL.
(<mask_codefor><code><mode>3<mask_name>): Rename to ...
(*avx512bw_<code><mode>3<mask_name>): ... this.  Use
nonimmediate_operand rather than register_operand predicate and %v
rather than v constraint for operand 1 and adjust condition to reject
MEMs in both operand 1 and 2.

* gcc.target/i386/pr118776.c: New test.

(cherry picked from commit 64d8ea056a5c339700118a412dea1c44a57acf55)

c++: Don't use CLEANUP_EH_ONLY for new expression cleanup [PR118763]

The following testcase is miscompiled since r12-6325 stopped
preevaluating the initializers for new expression.
If evaluating the initializers throws, there is a correct cleanup
for that, but it is marked CLEANUP_EH_ONLY.  While in standard
C++ that is just fine, if it has statement expressions, it can
return or goto out of the expression and we should delete the
pointer in that case too.

There is already a sentry variable initialized to true and
set to false after everything is initialized and used as a guard
for the cleanup, so just removing the CLEANUP_EH_ONLY flag does
everything we need.  And in the normal case of the initializer
not using statement expressions at least with -O2 we get the same code,
while the change changes one
try { sentry = true; ... sentry = false; } catch { if (sentry) delete ...; }
into
try { sentry = true; ... sentry = false; } finally { if (sentry) delete ...; }
optimizations will see that sentry is false when reaching the finally
other than through an exception.

Though, wonder what other CLEANUP_EH_ONLY cleanups might be an issue
with statement expressions.

2025-02-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/118763
* init.cc (build_new_1): Don't set CLEANUP_EH_ONLY.

* g++.dg/asan/pr118763.C: New test.

(cherry picked from commit fcecc74cb38723457a0447924d9993b31252a8f9)

c++: Allow constexpr reads from volatile std::nullptr_t objects [PR118661]

As mentioned in the PR, https://eel.is/c++draft/conv.lval#note-1
says that even volatile reads from std::nullptr_t typed objects actually
don't read anything and https://eel.is/c++draft/expr.const#10.9
says that even those are ok in constant expressions.

So, the following patch adjusts the r9-4793 changes to have an exception
for NULLPTR_TYPE.
As [conv.lval]/3 also talks about accessing to inactive member, I've added
testcase to cover that as well.

2025-02-07 Jakub Jelinek <jakub@redhat.com>

PR c++/118661
* constexpr.cc (potential_constant_expression_1): Don't diagnose
lvalue-to-rvalue conversion of volatile lvalue if it has NULLPTR_TYPE.
* decl2.cc (decl_maybe_constant_var_p): Return true for constexpr
decls with NULLPTR_TYPE even if they are volatile.

* g++.dg/cpp0x/constexpr-volatile4.C: New test.
* g++.dg/cpp0x/constexpr-union9.C: New test.

(cherry picked from commit 6c8e6d6febaed3c167ca9534935c2cb18045528e)

icf: Compare call argument types in certain cases and asm operands [PR117432]

compare_operand uses operand_equal_p under the hood, which e.g. for
INTEGER_CSTs will just match the values rather regardless of their types.
Now, in many comparing the type is redundant, if we have
  x_2 = y_3 + 1;
we've already compared the type for the lhs and also for rhs1, there won't
be any surprises on rhs2.
As noted in the PR, there are cases where the type of the operand is the
sole place of information and we don't want to ICF merge functions if the
types differ.
One case is stdarg functions, arguments passed to ..., it is different
if we pass 1, 1L, 1LL.
Another case are the K&R unprototyped functions (sure, gone in C23).
And yet another case are inline asm operands, "r" (1) is different from "r"
(1L) from "r" (1LL).

So, the following patch determines based on lack of fntype (e.g. for
internal functions), or on !prototype_p, or on stdarg_p (in that case
using number of named arguments) which arguments need to have type checked
and does that, plus compares types on inline asm operands (maybe it would be
enough to do that just for input operands but we have just a routine to
handle both and I didn't feel we need to differentiate).

Furthermore, I've noticed fntype{1,2} isn't actually compared if it is a
direct call (gimple_call_fndecl is non-NULL).  That is wrong too, we could
have
  void (*fn) (int, long long) = (void (*) (int, long long)) foo;
  fn (1, 1LL);
in one case and
  void (*fn) (long long, int) = (void (*) (long long, int)) foo;
  fn (1LL, 1);
in another, both folded into a direct call of foo with different
gimple_call_fntype.  Sure, one of them would be UB at runtime (or both), but
what if we ICF merge it into something that into the one UB at runtime
and the program actually calls the correct one only?

2025-02-01  Jakub Jelinek  <jakub@redhat.com>

PR ipa/117432
* ipa-icf-gimple.cc (func_checker::compare_asm_inputs_outputs):
Also return_false if operands have incompatible types.
(func_checker::compare_gimple_call): Check fntype1 vs. fntype2
compatibility for all non-internal calls and assume fntype1 and
fntype2 are non-NULL for those.  For calls to non-prototyped
calls or for stdarg_p functions after the last named argument (if any)
check type compatibility of call arguments.

* gcc.c-torture/execute/pr117432.c: New test.
* gcc.target/i386/pr117432.c: New test.

(cherry picked from commit ebd111a2896816e4f5ddf5108f361b3d9d287fa0)

niter: Make build_cltz_expr more robust [PR118689]

Since my r15-7223 the niter analysis can recognize one loop during bootstrap
as being ctz like.
The patch just turned
@@ -2173,7 +2173,7 @@ PROC m2pim_NumberIO_BinToStr (CARDINAL x
   _T535_44 = &buf[i.40_2]{lb: 1 sz: 4};
   _T536_45 = x_21 & 1;
   *_T535_44 = _T536_45;
-  _T537_47 = x_21 / 2;
+  _T537_47 = x_21 >> 1;
   x_48 = _T537_47;
   # DEBUG x => x_48
   if (x_48 != 0)
which is not a big deal for the number_of_iterations_cltz optimization, it
recognizes both right shift by 1 and unsigned division by 2 (and similarly
for clz left shift by 1 or multiplication by 2).
But starting with forwprop1 that change also resulted in
@@ -1875,9 +1875,9 @@ PROC m2pim_NumberIO_BinToStr (CARDINAL x
   i.40_2 = (INTEGER) _T530_34;
   _T536_45 = x_21 & 1;
   MEM <CARDINAL[1:64]> [(CARDINAL *)&buf][i.40_2]{lb: 1 sz: 4} = _T536_45;
-  _T537_47 = x_21 / 2;
+  _T537_47 = x_21 >> 1;
   # DEBUG x => _T537_47
-  if (x_21 > 1)
+  if (_T537_47 != 0)
     goto <bb 3>; [INV]
   else
     goto <bb 8>; [INV]
and apparently it is only the latter form that number_of_iterations_cltz
pattern matches, not the former (after all, that was the exact reason
for r15-7223).
The problem is that build_cltz_expr assumes if IFN_C[LT]Z can't be used it
can use the __builtin_c[lt]z{,l,ll} builtins, and while most of the FEs do
create them, modula 2 does not.

The following patch just lets us punt if the FE doesn't build those builtins.
I've filed a PR against modula2 so that they add the builtins too.

2025-01-31  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/118689
PR modula2/115032
* tree-ssa-loop-niter.cc (build_cltz_expr): Return NULL_TREE if fn is
NULL and use_ifn is false.

(cherry picked from commit 85e1714b0606579a339c234510063e057fe662af)

d: give dependency files better filenames [PR118477]

Currently, the dependency files for root-file.o and common-file.o were
both d/.deps/file.Po, which would cause parallel builds to fail
sometimes with:

  make[3]: Leaving directory '/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/build/gcc'
  make[3]: Entering directory '/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/build/gcc'
  mv: cannot stat 'd/.deps/file.TPo': No such file or directory
  make[3]: *** [/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/gcc-14-20240511/gcc/d/Make-lang.in:421: d/root-file.o] Error 1 shuffle=131581365

Also, this means that dependencies of one of root-file or common-file
are missing when developing.  After this patch, those two files get
assigned dependency files d/.deps/root-file.Po and
d/.deps/common-file.Po respectively, so match the actual object
files in the d/ subdirectory.

There are other files with similar conflicts (mangle-package.o,
visitor-package.o for instance).

2025-01-29  Arsen Arsenović  <arsen@aarsen.me>
    Jakub Jelinek  <jakub@redhat.com>

PR d/118477
* Make-lang.in (DCOMPILE, DPOSTCOMPILE): Use $(basename $(@F))
instead of $(*F).

Co-Authored-By: Jakub Jelinek <jakub@redhat.com>
(cherry picked from commit d9ac0ad1e9a4ceec2d354ac0368da7462bea5675)

c++: Only destruct elts of array for new expression if exception is thrown during the initialization [PR117827]

The following testcase r12-6328, because the elements of the array
are destructed twice, once when the callee encounters delete[] p;
and then second time when the exception is thrown.
The array elts should be only destructed if exception is thrown from
one of the constructors during the build_vec_init emitted code in case of
new expressions, but when the new expression completes, it is IMO
responsibility of user code to delete[] it when it is no longer needed.

So, the following patch uses the cleanup_flags argument to build_vec_init
to get notified of the flags that need to be changed when the expression
is complete and build_disable_temp_cleanup to do the changes.

2025-01-25 Jakub Jelinek <jakub@redhat.com>

PR c++/117827
* init.cc (build_new_1): Pass address of a make_tree_vector ()
initialized gc tree vector to build_vec_init and append
build_disable_temp_cleanup to init_expr from it.

* g++.dg/init/array66.C: New test.

(cherry picked from commit ce268ca2a923f8f35cc9dd5a7d0468a3980f129f)

builtins: Store unspecified value to *exp for inf/nan [PR114877]

The fold_builtin_frexp folding for NaN/Inf just returned the first argument
with evaluating second arguments side-effects, rather than storing something
to what the second argument points to.

The PR argues that the C standard requires the function to store something
there but what exactly is stored is unspecified, so not storing there
anything can result in UB if the value isn't initialized and is read later.

glibc and newlib store there 0, musl apparently doesn't store anything.

The following patch stores there zero (or would you prefer storing there
some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
in assembly though) and adjusts the test so that it
doesn't rely on not storing there anything but instead checks for
-Wmaybe-uninitialized warning to find out that something has been stored
there.
Unfortunately I had to disable the NaN tests for -O0, while we can fold
__builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
__builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
(because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
it does
      arg = builtin_save_expr (arg);
      return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
nothing propagates the NAN to the comparison.
I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR:
added and recurse on the second argument, but that feels like stage1
material to me if we want to do that at all.

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/114877
* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
like rvc_zero, return passed in arg and set *exp = 0.

* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
dg-additional-options.
(bar): New function.
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
nothing has been stored to what the second argument points to, but
instead that something has been stored there, whatever it is.
(main): Temporarily don't enable the nan tests for -O0.

(cherry picked from commit d19b0682f18f9f5217aee8002e3d04f8ded04ae8)

c++: Wrap force_target_expr in get_member_function_from_ptrfunc with save_expr [PR118509]

My October PR117259 fix to get_member_function_from_ptrfunc to use a
TARGET_EXPR rather than SAVE_EXPR unfortunately caused some regressions as
well as the following testcase shows.
What happens is that
get_member_function_from_ptrfunc -> build_base_path calls save_expr,
so since the PR117259 change in mnay cases it will call save_expr on
a TARGET_EXPR.  And, for some strange reason a TARGET_EXPR is not considered
an invariant, so we get a SAVE_EXPR wrapped around the TARGET_EXPR.
That SAVE_EXPR <TARGET_EXPR <...>> gets initially added only to the second
operand of ?:, so at that point it would still work fine during expansion.
But unfortunately an expression with that subexpression is handed to the
caller also through *instance_ptrptr = instance_ptr; and gets evaluated
once again when computing the first argument to the method.
So, essentially, we end up with
(TARGET_EXPR <D.2907, ...>, (... ? ... SAVE_EXPR <TARGET_EXPR <D.2907, ...>
... : ...)) (... SAVE_EXPR <TARGET_EXPR <D.2907, ...> ..., ...);
and while D.2907 is initialized during gimplification in the code dominating
everything that uses it, the extra temporary created for the SAVE_EXPR
is initialized only conditionally (if the ?: condition is true) but then
used unconditionally, so we get
pmf-4.C: In function ‘void foo(C, B*)’:
pmf-4.C:12:11: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
   12 |   (y->*x) ();
      |   ~~~~~~~~^~
pmf-4.C:12:11: note: ‘<anonymous>’ was declared here
   12 |   (y->*x) ();
      |   ~~~~~~~~^~
diagnostic and wrong-code issue too.

As the trunk fix to just treat TARGET_EXPR as invariant seems a little bit risky
and I'd like to get it tested on the trunk for a while, for 14.2.1 this patch
instead wraps those TARGET_EXPRs into SAVE_EXPRs.  Eventually that can be reverted
and the trunk fix backported.

2025-01-21  Jakub Jelinek  <jakub@redhat.com>

PR c++/118509
* typeck.cc (get_member_function_from_ptrfunc): Wrap force_target_expr
with save_expr.

* g++.dg/expr/pmf-4.C: New test.

c++: Honor complain in cp_build_function_call_vec for check_function_arguments warnings [PR117825]

The following testcase ICEs due to re-entering diagnostics.
When diagnosing -Wformat-security warning, we try to print instantiation
context, which calls tsubst with tf_none, but that in the end calls
cp_build_function_call_vec which calls check_function_arguments which
diagnoses another warning (again -Wformat-security).

The other check_function_arguments caller, build_over_call, doesn't call
that function if !(complain & tf_warning), so I think the best fix is
to do it the same in cp_build_function_call_vec as well.

2025-01-08 Jakub Jelinek <jakub@redhat.com>

PR c++/117825
* typeck.cc (cp_build_function_call_vec): Don't call
check_function_arguments if complain doesn't have tf_warning bit set.

* g++.dg/warn/pr117825.C: New test.

(cherry picked from commit e5180fbcbcc356c71154413588288cbd30e5198d)

c++: Diagnose earlier non-static data members with cv containing class type [PR116108]

In r10-6457 aka PR92593 fix a check has been added to reject
earlier non-static data members with current_class_type in templates,
as the deduction then can result in endless recursion in reshape_init.
It fixed the
template <class T>
struct S { S s = 1; };
S t{2};
crashes, but as the following testcase shows, didn't catch when there
are cv qualifiers on the non-static data member.

Fixed by using TYPE_MAIN_VARIANT.

2024-12-17 Jakub Jelinek <jakub@redhat.com>

PR c++/116108
gcc/cp/
* decl.cc (grokdeclarator): Pass TYYPE_MAIN_VARIANT (type)
rather than type to same_type_p when checking if the non-static
data member doesn't have current class type.
gcc/testsuite/
* g++.dg/cpp1z/class-deduction117.C: New test.

(cherry picked from commit 88bfee560681d8248b89f130ada249e35ee2e344)

warn-access: Fix up matching_alloc_calls_p [PR118024]

The following testcase ICEs because of a bug in matching_alloc_calls_p.
The loop was apparently meant to be walking the two attribute chains
in lock-step, but doesn't really do that.  If the first lookup_attribute
returns non-NULL, the second one is not done, so rmats in that case can
be some random unrelated attribute rather than "malloc" attribute; the
body assumes even rmats if non-NULL is "malloc" attribute and relies
on its argument to be a "malloc" argument and if it is some other
attribute with incompatible attribute, it just crashes.

Now, fixing that in the obvious way, instead of doing
(amats = lookup_attribute ("malloc", amats))
|| (rmats = lookup_attribute ("malloc", rmats))
in the condition do
((amats = lookup_attribute ("malloc", amats)),
(rmats = lookup_attribute ("malloc", rmats)),
(amats || rmats))
fixes the testcase but regresses Wmismatched-dealloc-{2,3}.c tests.
The problem is that walking the attribute lists in a lock-step is obviously
a very bad idea, there is no requirement that the same deallocators are
present in the same order on both decls, e.g. there could be an extra malloc
attribute without argument in just one of the lists, or the order of say
free/realloc could be swapped, etc.  We don't generally document nor enforce
any particular ordering of attributes (even when for some attributes we just
handle the first one rather than all).

So, this patch instead simply splits it into two loops, the first one walks
alloc_decl attributes, the second one walks dealloc_decl attributes.
If the malloc attribute argument is a built-in, that doesn't change
anything, and otherwise we have the chance to populate the whole
common_deallocs hash_set in the first loop and then can check it in the
second one (and don't need to use more expensive add method on it, can just
check contains there).  Not to mention that it also fixes the case when
the function would incorrectly return true if there wasn't a common
deallocator between the two, but dealloc_decl had 2 malloc attributes with
the same deallocator.

2024-12-14  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/118024
* gimple-ssa-warn-access.cc (matching_alloc_calls_p): Walk malloc
attributes of alloc_decl and dealloc_decl in separate loops rather
than in lock-step.  Use common_deallocs.contains rather than
common_deallocs.add in the second loop.

(cherry picked from commit 9537ca5ad9bc23d7e9c446b4a7cbb98f63bddb6a)

cse: Fix up record_jump_equiv checks [PR117095]

The following testcase is miscompiled on s390x-linux with -O2 -march=z15.
The problem happens during cse2, which sees in an extended basic block
(jump_insn 217 78 216 10 (parallel [
            (set (pc)
                (if_then_else (ne (reg:SI 165)
                        (const_int 1 [0x1]))
                    (label_ref 216)
                    (pc)))
            (set (reg:SI 165)
                (plus:SI (reg:SI 165)
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (scratch:SI))
            (clobber (reg:CC 33 %cc))
        ]) "t.c":14:17 discrim 1 2192 {doloop_si64}
     (int_list:REG_BR_PROB 955630228 (nil))
-> 216)
...
(insn 99 98 100 12 (set (reg:SI 138)
        (const_int 1 [0x1])) "t.c":9:31 1507 {*movsi_zarch}
     (nil))
(insn 100 99 103 12 (parallel [
            (set (reg:SI 137)
                (minus:SI (reg:SI 138)
                    (subreg:SI (reg:HI 135 [ a ]) 0)))
            (clobber (reg:CC 33 %cc))
        ]) "t.c":9:31 1904 {*subsi3}
     (expr_list:REG_DEAD (reg:SI 138)
        (expr_list:REG_DEAD (reg:HI 135 [ a ])
            (expr_list:REG_UNUSED (reg:CC 33 %cc)
                (nil)))))
Note, cse2 has df_note_add_problem () before df_analyze, which add
     (expr_list:REG_UNUSED (reg:SI 165)
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
notes to the first insn (correctly so, %cc is clobbered there and pseudo
165 isn't used after the insn).
Now, cse_extended_basic_block has an extra optimization on conditional
jumps, where it records equivalence on the edge which continues in the ebb.
Here it sees (ne reg:SI 165) (const_int 1) is false on the edge and
remembers that pseudo 165 is comparison equivalent to (const_int 1),
so on insn 100 it decides to replace (reg:SI 138) with (reg:SI 165).

This optimization isn't correct here though, because the JUMP_INSN has
multiple sets.  Before r0-77890 record_jump_equiv has been called from
cse_insn guarded on n_sets == 1 && any_condjump_p (insn), so it wouldn't
be done on the above JUMP_INSN where n_sets == 2.  But since that change
it is guarded with single_set (insn) && any_condjump_p (insn) and that
is true because of the REG_UNUSED note.  Looking at that note is
inappropriate in CSE though, because the whole intent of the pass is to
extend the lifetimes of the pseudos if equivalence is found, so the fact
that there is REG_UNUSED note for (reg:SI 165) and that the reg isn't used
later doesn't imply that it won't be used after the optimization.
So, unless we manage to process the other sets on the JUMP_INSN (it wouldn't
be terribly hard in this exact case, the doloop insn decreases the register
by 1 and so we could just record equivalence to (const_int 0) instead, but
generally it might be hard), we should IMHO just punt if there are multiple
sets.

The patch below adds !multiple_sets (insn) check instead of replacing with
it the single_set (insn) check, because apparently any_condjump_p uses
pc_set which supports the case where PATTERN is a SET to PC (that is a
single_set (insn) && !multiple_sets (insn), PATTERN is a PARALLEL with a
single SET to PC (likewise) and some CLOBBERs, PARALLEL with two or more
SETs where the first one is SET to PC (that could be single_set (insn)
with REG_UNUSED notes but is not !multiple_sets (insn)) or PATTERN
is UNSPEC/UNSPEC_VOLATILE with SET inside of it.  For the last case
!multiple_sets (insn) will be true, but IMHO we shouldn't try to derive
anything from those because we haven't checked the rest of the UNSPEC*
and we don't really know what it does.

2024-12-13  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/117095
* cse.cc (cse_extended_basic_block): Don't call record_jump_equiv
if multiple_sets (insn).

* gcc.c-torture/execute/pr117095.c: New test.

(cherry picked from commit b626ebc0d7888ddae16a55ca583b56a4b8434bdf)

docs: Clarify -fsanitize=hwaddress target support [PR117960]

Since GCC 13 -fsanitize=hwaddress is not supported just on AArch64, but also
on x86_64 (but only with -mlam=u48 or -mlam=u57).

2024-12-09 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/117960
* doc/invoke.texi (fsanitize=hwaddress): Clarify on which targets
it is supported.

(cherry picked from commit 2e958291ff68d9bff1092895a14b6763de56823b)

docs: Fix up __sync_* documentation [PR117642]

The PR14311 commit which added support for __sync_* builtins documented that
there is a warning if a particular operation cannot be implemented.
But that commit nor anything later on implemented such warning, it was
always silent generation of the mentioned calls (which can in most cases
result in linker errors of course because those functions aren't implemented
anywhere, in libatomic or elsewhere in code shipped in gcc).

So, the following patch just adjust the documentation to match the
implementation.

2024-11-28 Jakub Jelinek <jakub@redhat.com>

PR target/117642
* doc/extend.texi: Remove documentation of warning for unimplemented
__sync_* operations, such warning has never been implemented.

(cherry picked from commit 0dcc09a8b5eb275ce939daad2bdfc7076ae1863c)

c: Fix sizeof error recovery [PR117745]

Compilation of the following testcase hangs forever after emitting first
error.  The problem is that in one place we just return error_mark_node
directly rather than going through c_expr_sizeof_expr or c_expr_sizeof_type.
The parsing of the expression could have called record_maybe_used_decl
though, but nothing calls pop_maybe_used which needs to be called after
parsing of every sizeof/typeof, successful or not.
At the end of the toplevel declaration we free the parser_obstack and in
another function record_maybe_used_decl is called again and due to the
missing pop_maybe_unused we end up with a cycle in the chain.

The following patch fixes it by just setting error and goto to the
    sizeof_expr:
      c_inhibit_evaluation_warnings--;
      in_sizeof--;
      mark_exp_read (expr.value);
      if (TREE_CODE (expr.value) == COMPONENT_REF
          && DECL_C_BIT_FIELD (TREE_OPERAND (expr.value, 1)))
        error_at (expr_loc, "%<sizeof%> applied to a bit-field");
      result = c_expr_sizeof_expr (expr_loc, expr);
where c_expr_sizeof_expr will do:
  struct c_expr ret;
  if (expr.value == error_mark_node)
    {
      ret.value = error_mark_node;
      ret.original_code = ERROR_MARK;
      ret.original_type = NULL;
      ret.m_decimal = 0;
      pop_maybe_used (false);
    }
...
  return ret;
which is exactly what the old code did manually except for the missing
pop_maybe_used call.  mark_exp_read does nothing on error_mark_node and
error_mark_node doesn't have COMPONENT_REF tree_code.

2024-11-27  Jakub Jelinek  <jakub@redhat.com>

PR c/117745
* c-parser.cc (c_parser_sizeof_expression): If type_name is NULL,
just expr.set_error () and goto sizeof_expr instead of doing error
recovery manually.

* gcc.dg/pr117745.c: New test.

(cherry picked from commit 958f0025f41d8bd9812e4da91a72b1ad79496e5b)

builtins: Fix up DFP ICEs on __builtin_fpclassify [PR102674]

This patch is similar to the one I've just posted, __builtin_fpclassify also
needs to print decimal float minimum differently and use real_from_string3.
Plus I've done some formatting fixes.

2024-11-26  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/102674
* builtins.cc (fold_builtin_fpclassify): Use real_from_string3 rather
than real_from_string.  Use "1E%d" format string rather than "0x1p%d"
for decimal float minimum.  Formatting fixes.

* gcc.dg/dfp/pr102674.c: New test.

(cherry picked from commit 5bb36d832c955e575bd458a02f3c6c5b28564aed)

builtins: Fix up DFP ICEs on __builtin_is{inf,finite,normal} [PR43374]

__builtin_is{inf,finite,normal} builtins ICE on _Decimal{32,64,128,64x}
operands unless those operands are constant.

The problem is that we fold the builtins to comparisons with the largest
finite number, but
a) get_max_float was only handling binary floats
b) real_from_string again assumes binary float
and so we were ICEing in the build_real called after the two calls.

This patch adds decimal handling into get_max_float (well, moves it
from c-cppbuiltin.cc which was printing those for __DEC{32,64,128}_MAX__
macros) and uses real_from_string3 (perhaps it is time to rename it
to just real_from_string now that we can use function overloading)
so that it handles both binary and decimal floats.

2024-11-26 Jakub Jelinek <jakub@redhat.com>

PR middle-end/43374
gcc/
* real.cc (get_max_float): Handle decimal float.
* builtins.cc (fold_builtin_interclass_mathfn): Use
real_from_string3 rather than real_from_string. Use
"1E%d" format string rather than "0x1p%d" for decimal
float minimum.
gcc/c-family/
* c-cppbuiltin.cc (builtin_define_decimal_float_constants): Use
get_max_float.
gcc/testsuite/
* gcc.dg/dfp/pr43374.c: New test.

(cherry picked from commit f39e6b4f5cd4e5362cf4b1004a591df2c8b00304)

phiopt: Fix a pasto in spaceship_replacement [PR117612]

When working on the PR117612 fix, I've noticed a pasto in
tree-ssa-phiopt.cc (spaceship_replacement).
The code is
      if (absu_hwi (tree_to_shwi (arg2)) != 1)
        return false;
      if (e1->flags & EDGE_TRUE_VALUE)
        {
          if (tree_to_shwi (arg0) != 2
              || absu_hwi (tree_to_shwi (arg1)) != 1
              || wi::to_widest (arg1) == wi::to_widest (arg2))
            return false;
        }
      else if (tree_to_shwi (arg1) != 2
               || absu_hwi (tree_to_shwi (arg0)) != 1
               || wi::to_widest (arg0) == wi::to_widest (arg1))
        return false;
where arg{0,1,2,3} are PHI args and wants to ensure that if e1 is a
true edge, then arg0 is 2 and one of arg{1,2} is -1 and one is 1,
otherwise arg1 is 2 and one of arg{0,2} is -1 and one is 1.
But due to pasto in the latte case doesn't verify that arg0
is different from arg2, it could be both -1 or both 1 and we wouldn't
punt.  The wi::to_widest (arg0) == wi::to_widest (arg1) test
is always false when we've made sure in the earlier conditions that
arg1 is 2 and arg0 is -1 or 1, so never 2.

2024-11-21  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/94589
PR tree-optimization/117612
* tree-ssa-phiopt.cc (spaceship_replacement): Fix up
a pasto in check when arg1 is 2.

(cherry picked from commit ca7430f145f5c7960f67ec77f585f3a2b58c7d10)

m2: Fix up dependencies some more

Every now and then my x86_64-linux bootstrap fails due to missing
dependencies somewhere in m2, usually during stage3.  I'm using
make -j32 and run 2 bootstraps concurrently (x86_64-linux and i686-linux)
on the same box.

Last night the same happened to me again,
with the first error
In file included from ./tm.h:29,
                 from ../../gcc/backend.h:28,
                 from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:27,
                 from m2/gm2-compiler-boot/M2AsmUtil.c:26:
../../gcc/config/i386/i386.h:2484:10: fatal error: insn-attr-common.h: No such file or directory
2484 | #include "insn-attr-common.h"
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[3]: *** [../../gcc/m2/Make-lang.in:1576: m2/gm2-compiler-boot/M2AsmUtil.o] Error 1
make[3]: *** Waiting for unfinished jobs....

Now, gcc/Makefile.in has a general rule:
# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)
which ensures that everything that might depend on the generated files
waits for those to be generated.
The above error clearly shows that such waiting didn't happen for
m2/gm2-compiler-boot/M2AsmUtil.o and some others.
ALL_HOST_OBJS includes $(ALL_HOST_FRONTEND_OBJS),
where the latter is
ALL_HOST_FRONTEND_OBJS = $(foreach v,$(CONFIG_LANGUAGES),$($(v)_OBJS))
m2_OBJS already includes various *.o files, for all those we wait until
the generated files are generated.  Though, seems
cc1gm2 depends on m2/stage1/cc1gm2 (which is just copied there),
and that depends on m2/gm2-compiler-boot/m2flex.o,
$(GM2_C_OBJS) and m2/gm2-gcc/rtegraph.o already included in m2_OBJS,
but also on
$(GM2_LIBS_BOOT) $(MC_LIBS)
where
MC_LIBS=m2/mc-boot-ch/Glibc.o m2/mc-boot-ch/Gmcrts.o
GM2_LIBS_BOOT     = m2/gm2-compiler-boot/gm2.a \
                    m2/gm2-libs-boot/libgm2.a \
                    $(GM2-BOOT-O)
GM2-BOOT-O isn't defined, and the 2 libraries depend on
$(BUILD-LIBS-BOOT) $(BUILD-COMPILER-BOOT)

So, the following patch adds those to m2_OBJS.

I'm not sure if something further is needed, like some objects
used to build the helper programs, mc and whatever else is needed,
I guess it depends on if they use or can use say tm.h or similar
headers which depend on the generated headers.

2024-11-09  Jakub Jelinek  <jakub@redhat.com>

gcc/m2/
* Make-lang.in (m2_OBJS): Add $(BUILD-LIBS-BOOT),
$(BUILD-COMPILER-BOOT) and $(MC_LIBS).

(cherry picked from commit 44682fbead3bba7345e0a575a8a680d10e75ae48)

c++: Fix ICE on constexpr virtual function [PR117317]

Since C++20 virtual methods can be constexpr, and if they are
constexpr evaluated, we choose tentative_decl_linkage for those
defer their output and decide at_eof again.
On the following testcases we ICE though, because if
expand_or_defer_fn_1 decides to use tentative_decl_linkage, it
returns true and the caller in that case cals emit_associated_thunks,
where use_thunk which it calls asserts DECL_INTERFACE_KNOWN on the
thunk destination, which isn't the case for tentative_decl_linkage.

The following patch fixes the ICE by not emitting the thunks
for the DECL_DEFER_OUTPUT fns just yet but waiting until at_eof
time when we return to those.
Note, the second testcase ICEs already since r0-110035 with -std=c++0x
before it gets a chance to diagnose constexpr virtual method.

2024-11-08 Jakub Jelinek <jakub@redhat.com>

PR c++/117317
* semantics.cc (emit_associated_thunks): Do nothing for
!DECL_INTERFACE_KNOWN && DECL_DEFER_OUTPUT fns.

* g++.dg/cpp2a/pr117317-1.C: New test.
* g++.dg/cpp2a/pr117317-2.C: New test.

(cherry picked from commit 5ff9e21c1ec81f8288e74679547e56051e051975)

store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439]

encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
it has to allocate a buffer and do various extra work like shifting the bits
etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
doesn't have corresponding integer mode.
The last case is explained later in the comments:
  /* The native_encode_expr machinery uses TYPE_MODE to determine how many
     bytes to write.  This means it can write more than
     ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
     write 8 bytes for a bitlen of 40).  Skip the bytes that are not within
     bitlen and zero out the bits that are not relevant as well (that may
     contain a sign bit due to sign-extension).  */
Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
so that case doesn't need those fancy games unless bitlen or bitpos
aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
possible).

The following patch makes us use the fast path even for empty_ctor_p
which occupy full bytes, we can just memset that in the provided buffer and
don't need to XALLOCAVEC another buffer.

This patch in itself fixes the testcase from the PR (which was about using
huge XALLLOCAVEC), but I want to do some other changes, to be posted in a
next patch.

2024-11-06  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/117439
* gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
integral mode.

(cherry picked from commit aab572240a0752da74029ed9f8918e0b1628e8ba)

libstdc++: Fix up 117406.cc test [PR117406]

Christophe mentioned in bugzilla that the test FAILs on aarch64,
I'm not including <climits> and use INT_MAX.
Apparently during my testing I got it because the test preinclude
-include bits/stdc++.h
and that includes <climits>, dunno why that didn't happen on aarch64.
In any case, either I can add #include <climits>, or because the
test already has #include <limits> I've changed uses of INT_MAX
with std::numeric_limits<int>::max(), that should be the same thing.
But if you prefer
#include <climits>
I can surely add that instead.

2024-11-04 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/117406
* testsuite/26_numerics/headers/cmath/117406.cc: Use
std::numeric_limits<int>::max() instead of INT_MAX.

(cherry picked from commit afcbf4dd27c147eb7d8f84e1a41c021eddec777e)

libstdc++: Fix up std::{,b}float16_t std::{ilogb,l{,l}r{ound,int}} [PR117406]

These overloads incorrectly cast the result of the float __builtin_*
to _Float or __gnu_cxx::__bfloat16_t.  For std::ilogb that changes
behavior for the INT_MAX return because that isn't representable in
either of the floating point formats, for the others it is I think
just a very inefficient hop from int/long/long long to std::{,b}float16_t
and back.  I mean for the round/rint cases, either the argument is small
and then the return value should be representable in the floating point
format too, or it is too large that the argument is already integral
and then it should just return the argument with the round trips.
Too large value is unspecified unlike ilogb.

2024-11-02  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/117406
* include/c_global/cmath (std::ilogb(_Float16), std::llrint(_Float16),
std::llround(_Float16), std::lrint(_Float16), std::lround(_Float16)):
Don't cast __builtin_* return to _Float16.
(std::ilogb(__gnu_cxx::__bfloat16_t),
std::llrint(__gnu_cxx::__bfloat16_t),
std::llround(__gnu_cxx::__bfloat16_t),
std::lrint(__gnu_cxx::__bfloat16_t),
std::lround(__gnu_cxx::__bfloat16_t)): Don't cast __builtin_* return to
__gnu_cxx::__bfloat16_t.
* testsuite/26_numerics/headers/cmath/117406.cc: New test.

(cherry picked from commit 36a9e2b22596711455e702ea5a5a3f26e145321c)

function: Call do_pending_stack_adjust in assign_parms [PR117296]

Functions called by assign_parms call emit_block_move in two places,
so on some targets can be expanded as calls and can result in pending
stack adjustment.

Now, during expansion we normally call do_pending_stack_adjust at the end
of expansion of each basic block or before emitting code that will branch
and/or has labels, and when emitting labels we assert that there are no
pending stack adjustments.

assign_parms is expanded before the first basic block and if the first
basic block starts with a label and at least one of those emit_block_move
calls resulted in the need of pending stack adjustments, we ICE when
emitting that label.

The following patch fixes that by calling do_pending_stack_adjust after
after the assign_parms potential emit_block_move calls.

2024-10-30 Jakub Jelinek <jakub@redhat.com>

PR target/117296
* function.cc (assign_parms): Call do_pending_stack_adjust.

* gcc.target/i386/pr117296.c: New test.

(cherry picked from commit fccef0c4ed0119ac53940bdb3838052339cf14a2)

libstdc++: Use if consteval rather than if (std::__is_constant_evaluated()) for {,b}float16_t nextafter [PR117321]

The nextafter_c++23.cc testcase fails to link at -O0.
The problem is that eventhough std::__is_constant_evaluated() has
always_inline attribute, that at -O0 just means that we inline the
call, but its result is still assigned to a temporary which is tested
later, nothing at -O0 propagates that false into the if and optimizes
away the if body.  And the __builtin_nextafterf16{,b} calls are meant
to be used solely for constant evaluation, the C libraries don't
define nextafterf16 these days.

As __STDCPP_FLOAT16_T__ and __STDCPP_BFLOAT16_T__ are predefined right
now only by GCC, not by clang which doesn't implement the extended floating
point types paper, and as they are predefined in C++23 and later modes only,
I think we can just use if consteval which is folded already during the FE
and the body isn't included even at -O0.  I've added a feature test for
that just in case clang implements those and implements those in some weird
way.  Note, if (__builtin_is_constant_evaluted()) would work correctly too,
that is also folded to false at gimplification time and the corresponding
if block not emitted at all.  But for -O0 it can't be wrapped into a helper
inline function.

2024-10-29  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/117321
* include/c_global/cmath (nextafter(_Float16, _Float16)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16 call.
(nextafter(__gnu_cxx::__bfloat16_t, __gnu_cxx::__bfloat16_t)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16b call.
* testsuite/26_numerics/headers/cmath/117321.cc: New test.

(cherry picked from commit 5e247ac0c28b9a2662f99c4a5420c5f7c2d0c6bd)

Assorted --disable-checking fixes [PR117249]

We have currently 3 different definitions of gcc_assert macro, one used most
of the time (unless --disable-checking) which evaluates the condition at
runtime and also checks it at runtime, then one for --disable-checking GCC 4.5+
which looks like
((void)(UNLIKELY (!(EXPR)) ? __builtin_unreachable (), 0 : 0))
and a fallback one
((void)(0 && (EXPR)))
Now, the last one actually doesn't evaluate any of the side-effects in the
argument, just quiets up unused var/parameter warnings.
I've tried to replace the middle definition with
({ [[assume (EXPR)]]; (void) 0; })
for compilers which support assume attribute and statement expressions
(surprisingly quite a few spots use gcc_assert inside of comma expressions),
but ran into PR117287, so for now such a change isn't being proposed.

The following patch attempts to move important side-effects from gcc_assert
arguments.

Bootstrapped/regtested on x86_64-linux and i686-linux with normal
--enable-checking=yes,rtl,extra, plus additionally I've attempted to do
x86_64-linux bootstrap with --disable-checking and gcc_assert changed to the
((void)(0 && (EXPR)))
version when --disable-checking.  That version ran into spurious middle-end
warnings
../../gcc/../include/libiberty.h:733:36: error: argument to ‘alloca’ is too large [-Werror=alloca-larger-than=]
../../gcc/tree-ssa-reassoc.cc:5659:20: note: in expansion of macro ‘XALLOCAVEC’
  int op_num = ops.length ();
  int op_normal_num = op_num;
  gcc_assert (op_num > 0);
  int stmt_num = op_num - 1;
  gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
where we have gcc_assert exactly to work-around middle-end warnings.
Guess I'd need to also disable -Werror for this experiment, which actually
isn't a problem with unmodified system.h, because even for
--disable-checking we use the __builtin_unreachable at least in
stage2/stage3 and so the warnings aren't emitted, and even if it used
[[assume ()]]; it would work too because in stage2/stage3 we could again
rely on assume and statement expression support.

2024-10-25  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/117249
* tree-ssa-structalias.cc (insert_vi_for_tree): Move put calls out of
gcc_assert.
* lto-cgraph.cc (lto_symtab_encoder_delete_node): Likewise.
* gimple-ssa-strength-reduction.cc (get_alternative_base,
add_cand_for_stmt): Likewise.
* tree-eh.cc (add_stmt_to_eh_lp_fn): Likewise.
* except.cc (duplicate_eh_regions_1): Likewise.
* tree-ssa-reassoc.cc (insert_operand_rank): Likewise.
* config/nvptx/nvptx.cc (nvptx_expand_call): Use == rather than = in
gcc_assert.
* opts-common.cc (jobserver_info::disconnect): Call close outside of
gcc_assert and only check result in it.
(jobserver_info::return_token): Call write outside of gcc_assert and
only check result in it.
* genautomata.cc (output_default_latencies): Move j++ side-effect
outside of gcc_assert.
* tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Use
== rather than = in gcc_assert.
* cgraph.cc (symbol_table::create_edge): Move ++edges_max_uid
side-effect outside of gcc_assert.

(cherry picked from commit e2a8772c9328960c625f5b95091d4312efa0e284)

c++: Further fix for get_member_function_from_ptrfunc [PR117259]

The following testcase shows that the previous get_member_function_from_ptrfunc
changes weren't sufficient and we still have cases where
-fsanitize=undefined with pointers to member functions can cause wrong code
being generated and related false positive warnings.

The problem is that save_expr doesn't always create SAVE_EXPR, it can skip
some invariant arithmetics and in the end it could be really large
expressions which would be evaluated several times (and what is worse, with
-fsanitize=undefined those expressions then can have SAVE_EXPRs added to
their subparts for -fsanitize=bounds or -fsanitize=null or
-fsanitize=alignment instrumentation).  Tried to just build1 a SAVE_EXPR
+ add TREE_SIDE_EFFECTS instead of save_expr, but that doesn't work either,
because cp_fold happily optimizes those SAVE_EXPRs away when it sees
SAVE_EXPR operand is tree_invariant_p.

So, the following patch instead of using save_expr or building SAVE_EXPR
manually builds a TARGET_EXPR.  Both types are pointers, so it doesn't need
to be destroyed in any way, but TARGET_EXPR is what doesn't get optimized
away immediately.

2024-10-24  Jakub Jelinek  <jakub@redhat.com>

PR c++/117259
* typeck.cc (get_member_function_from_ptrfunc): Use force_target_expr
rather than save_expr for instance_ptr and function.  Don't call it
for TREE_CONSTANT.

* g++.dg/ubsan/pr117259.C: New test.

(cherry picked from commit b25d3201b6338d9f71c64f524ca2974d9a1f38e8)

asan: Fix up build_check_stmt gsi handling [PR117209]

gsi_safe_insert_before properly updates gsi_bb in gimple_stmt_iterator
in case it splits objects, but unfortunately build_check_stmt was in
some places (but not others) using a copy of the iterator rather than
the iterator passed from callers and so didn't propagate that to callers.
I guess it didn't matter much before when it was just using
gsi_insert_before as that really didn't change the iterator.
The !before_p case is apparently dead code, nothing is calling it with
before_p=false since around 4.9.

2024-10-24 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/117209
* asan.cc (maybe_cast_to_ptrmode): Formatting fix.
(build_check_stmt): Don't copy *iter into gsi, perform all
the updates on iter directly.

* gcc.dg/asan/pr117209.c: New test.

(cherry picked from commit 885143fa77599c44bfdd4e8e6b6987b7824db6ba)

c-family: Fix up -Wsizeof-pointer-memaccess ICEs [PR117230]

In the following testcases, we ICE on all 4 function calls.
The problem is using TYPE_PRECISION on vector types (but guess it
would be similarly problematic on structures/unions/arrays).
The test only differentiates between suggestion what to do, whether
to supply explicit size because sizeof (*p) for
{,{,un}signed }char *p is not very likely what the user want, or
dereferencing the pointer, so I think limiting that suggestion
to integral types is ok.

2024-10-22 Jakub Jelinek <jakub@redhat.com>

PR c/117230
* c-warn.cc (sizeof_pointer_memaccess_warning): Only compare
TYPE_PRECISION of TREE_TYPE (type) to precision of char if
TREE_TYPE (type) is integral type.

* c-c++-common/Wsizeof-pointer-memaccess5.c: New test.

(cherry picked from commit 5fd1c0c1b6968d55e3f997d67a4c149edf20c012)

i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]

min/max patterns for intrinsics which on x86 result in the second
input operand if the two operands are both zeros or one or both of them
are a NaN shouldn't use SMIN/SMAX RTL, because that is similarly to
MIN_EXPR/MAX_EXPR undefined what will be the result in those cases.

The following patch adds an expander which uses either a new pattern with
UNSPEC_IEEE_M{AX,IN} or use the S{MIN,MAX} representation of the same.

2024-09-20 Uros Bizjak <ubizjak@gmail.com>
Jakub Jelinek <jakub@redhat.com>

PR target/116738
* config/i386/subst.md (mask_scalar_operand_arg34,
mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New
subst attributes.
* config/i386/sse.md
(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
Change from define_insn to define_expand, rename the old define_insn
to ...
(*<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
... this.
(<sse>_ieee_vm<ieee_maxmin><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
New define_insn.

* gcc.target/i386/sse-pr116738.c: New test.

(cherry picked from commit 624d3af025e820ede7ec4334f7a2d5d4731c99a9)

c++: Don't emit deprecated/unavailable attribute diagnostics when creating cdtor thunks [PR116678]

Another spot where we mark_used a function (in this case ctor or dtor)
even when it is just artificially used inside of thunks (emitted on mingw
with -Os for the testcase).

2024-09-13 Jakub Jelinek <jakub@redhat.com>

PR c++/116678
* optimize.cc: Include decl.h.
(maybe_thunk_body): Temporarily change deprecated_state to
UNAVAILABLE_DEPRECATED_SUPPRESS.

* g++.dg/warn/deprecated-20.C: New test.

(cherry picked from commit b7b67732e20217196f2a13a10fc3df4605b2b2ab)

Daily bump.

Fortran: fix issue with impure elemental subroutine and interface [PR119656]

PR fortran/119656

gcc/fortran/ChangeLog:

* interface.cc (gfc_compare_actual_formal): Fix front-end memleak
when searching for matching interfaces.
* trans-expr.cc (gfc_conv_procedure_call): If there is a formal
dummy corresponding to an absent argument, use its type, and only
fall back to inferred type otherwise.

gcc/testsuite/ChangeLog:

* gfortran.dg/optional_absent_13.f90: New test.

(cherry picked from commit 334545194d9023fb9b2f72ee0dcde8af94930f25)

Daily bump.

d: Fix ICE: type variant differs by TYPE_MAX_VALUE with -g [PR119826]

Forward referenced enum types were never fixed up after the main
ENUMERAL_TYPE was finished. All flags set are now propagated to all
variants after its mode, size, and alignment has been calculated.

PR d/119826

gcc/d/ChangeLog:

* types.cc (TypeVisitor::visit (TypeEnum *)): Propagate flags of main
enum types to all forward-referenced variants.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/imports/pr119826b.d: New test.
* gdc.dg/debug/pr119826.d: New test.

(cherry picked from commit c5ffab99a5a962aa955310e74ca0a4be5c1acf30)

d: Fix ICE in dwarf2out_imported_module_or_decl, at dwarf2out.cc:27676 [PR119817]

The ImportVisitor method for handling the importing of overload sets was
pushing NULL_TREE to the array of import decls, which in turn got passed
to `debug_hooks->imported_module_or_decl', triggering the observed
internal compiler error.

NULL_TREE is returned from `build_import_decl' when the symbol was
ignored for being non-trivial to represent in debug, for example,
template or tuple declarations. So similarly "skip" adding the symbol
when this is the case for overload sets too.

PR d/119817

gcc/d/ChangeLog:

* imports.cc (ImportVisitor::visit (OverloadSet *)): Don't push
NULL_TREE to vector of import symbols.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/imports/m119817/a.d: New test.
* gdc.dg/debug/imports/m119817/b.d: New test.
* gdc.dg/debug/imports/m119817/package.d: New test.
* gdc.dg/debug/pr119817.d: New test.

(cherry picked from commit f5ed7d19c965de9ccb158d77e929b17459bf65b5)

d: Fix forward referenced enums missing type names in debug info [PR118309]

Calling `rest_of_type_compilation' as the D types were built meant that
debug info was being emitted before all forward references were
resolved, resulting in DW_AT_name's to be missing.

Instead, defer outputting type debug information until all modules have
been parsed and generated in `d_finish_compilation'.

PR d/118309

gcc/d/ChangeLog:

* modules.cc: Include debug.h
(d_finish_compilation): Call debug_hooks->type_decl on all TYPE_DECLs.
* types.cc: Remove toplev.h include.
(finish_aggregate_type): Don't call rest_of_type_compilation or
rest_of_decl_compilation on type.
(TypeVisitor::visit (TypeEnum *)): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/dwarf2/pr118309.d: New test.

(cherry picked from commit cee353c2653d274768a67677c8ea37fd23422b3c)

Daily bump.

libstdc++: Correct preprocessing checks for floatX_t and bfloat_16 formatting

Floating points types _Float16, _Float32, _Float64, and bfloat16,
can be formatted only if std::to_chars overloads for such types
were provided. Currently this is only the case for architectures
where float and double are 32-bits and 64-bits IEEE floating points types.

Remove a potential UB, where we could produce basic_format_arg
with _M_type set to _Arg_fp32 or _Arg_fp64, that was later not
handled by `_M_visit`.

libstdc++-v3/ChangeLog:

* include/std/format (basic_format_arg::_S_to_arg_type): Normalize
_Float32 and _Float64 only to float and double respectivelly.
(basic_format_arg::_S_to_enum): Remove handling of _Float32 and _Float64.

Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked with modifications from commit 445128c12cf22081223f7385196ee3889ef4c4b2)

Daily bump.