Richard Biener [Fri, 19 Jul 2024 05:53:33 +0000 (05:53 +0000)]
Update ChangeLog and version files for release
GCC Administrator [Fri, 19 Jul 2024 00:20:37 +0000 (00:20 +0000)]
Daily bump.
Jakub Jelinek [Thu, 18 Jul 2024 07:22:10 +0000 (09:22 +0200)]
testsuite: Fix up builtin-clear-padding-3.c for -funsigned-char
As reported on gcc-regression, this test FAILs on aarch64, but my
r15-2090 change didn't change anything on the generated assembly,
just added the forgotten dg-do run directive to the test, so the
test has been failing forever, just we didn't know it.
I can actually reproduce it on x86_64 with -funsigned-char too,
s2.b.a has int type and -1 is stored to it, so we should compare
it against -1 rather than (char) -1; the latter is appropriate for
testing char fields into which we've stored -1.
2024-07-18 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/torture/builtin-clear-padding-3.c (main): Compare
s2.b.a against -1 rather than (char) -1.
(cherry picked from commit
958ee138748fae4371e453eb9b357f576abbe83e)
GCC Administrator [Thu, 18 Jul 2024 00:19:34 +0000 (00:19 +0000)]
Daily bump.
Jakub Jelinek [Wed, 17 Jul 2024 09:38:33 +0000 (11:38 +0200)]
gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]
The builtin-clear-padding-6.c testcase fails as clear_padding_type
doesn't correctly recompute the buf->size and buf->off members after
expanding clearing of an array using a runtime loop.
buf->size should be in that case the offset after which it should continue
with next members or padding before them modulo UNITS_PER_WORD and
buf->off that offset minus buf->size. That is what the code was doing,
but with off being the start of the loop cleared array, not its end.
So, the last hunk in gimple-fold.cc fixes that.
When adding the testcase, I've noticed that the
c-c++-common/torture/builtin-clear-padding-* tests, although clearly
written as runtime tests to test the builtins at runtime, didn't have
{ dg-do run } directive and were just compile tests because of that.
When adding that to the tests, builtin-clear-padding-1.c was already
failing without that clear_padding_type hunk too, but
builtin-clear-padding-5.c was still failing even after the change.
That is due to a bug in clear_padding_flush which the patch fixes as
well - when clear_padding_flush is called with full=true (that happens
at the end of the whole __builtin_clear_padding or on those array
padding clears done by a runtime loop), it wants to flush all the pending
padding clearings rather than just some. If it is at the end of the whole
object, it decreases wordsize when needed to make sure the code never writes
including RMW cycles to something outside of the object:
if ((unsigned HOST_WIDE_INT) (buf->off + i + wordsize)
> (unsigned HOST_WIDE_INT) buf->sz)
{
gcc_assert (wordsize > 1);
wordsize /= 2;
i -= wordsize;
continue;
}
but if it is full==true flush in the middle, this doesn't happen, but we
still process just the buffer bytes before the current end. If that end
is not on a wordsize boundary, e.g. on the builtin-clear-padding-5.c test
the last chunk is 2 bytes, '\0', '\xff', i is 16 and end is 18,
nonzero_last might be equal to the end - i, i.e. 2 here, but still all_ones
might be true, so in some spots we just didn't emit any clearing in that
last chunk.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115527
* gimple-fold.c (clear_padding_flush): Introduce endsize
variable and use it instead of wordsize when comparing it against
nonzero_last.
(clear_padding_type): Increment off by sz.
* c-c++-common/torture/builtin-clear-padding-1.c: Add dg-do run
directive.
* c-c++-common/torture/builtin-clear-padding-2.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-3.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-4.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-5.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-6.c: New test.
(cherry picked from commit
8b5919bae11754f4b65a17e63663d3143f9615ac)
GCC Administrator [Wed, 17 Jul 2024 00:19:22 +0000 (00:19 +0000)]
Daily bump.
Richard Biener [Mon, 15 Jul 2024 11:01:24 +0000 (13:01 +0200)]
Fixup unaligned load/store cost for znver4
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4. The following makes
the unaligned costs equal to the aligned costs.
This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there. But it makes it qualify
as a regression fix.
PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.
(cherry picked from commit
1e3aa9c9278db69d4bdb661a750a7268789188d6)
GCC Administrator [Tue, 16 Jul 2024 00:22:14 +0000 (00:22 +0000)]
Daily bump.
GCC Administrator [Mon, 15 Jul 2024 00:20:01 +0000 (00:20 +0000)]
Daily bump.
GCC Administrator [Sun, 14 Jul 2024 00:20:21 +0000 (00:20 +0000)]
Daily bump.
GCC Administrator [Sat, 13 Jul 2024 00:20:09 +0000 (00:20 +0000)]
Daily bump.
Jonathan Wakely [Fri, 29 Apr 2022 11:17:13 +0000 (12:17 +0100)]
libstdc++: Add missing exports for ppc64le --with-long-double-format=ibm [PR105417]
The --with-long-double-abi=ibm build is missing some exports that are
present in the --with-long-double-abi=ieee build. Those symbols never
should have been exported at all, but now that they have been, they
should be exported consistently by both ibm and ieee.
This simply defines them as aliases for equivalent symbols that are
already present. The abi-tag on num_get::_M_extract_int isn't really
needed, because it only uses a std::string as a local variable, not in
the return type or function parameters, so it's safe to define the
_M_extract_int[abi:cxx11] symbols as aliases for the corresponding
function without the abi-tag.
This causes some new symbols to be added to the GLIBCXX_3.4.29 version
for the ibm long double build mode, but there is no advantage to adding
them to 3.4.30 for that build. That would just create more
inconsistencies.
libstdc++-v3/ChangeLog:
PR libstdc++/105417
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
Regenerate.
* src/c++11/compatibility-ldbl-alt128.cc [_GLIBCXX_USE_DUAL_ABI]:
Define __gnu_ieee128::num_get<C>::_M_extract_int[abi:cxx11]<I>
symbols as aliases for corresponding symbols without abi-tag.
(cherry picked from commit
bb7cf39b05a216431a431499d0c36a6034f6acc4)
GCC Administrator [Fri, 12 Jul 2024 00:21:38 +0000 (00:21 +0000)]
Daily bump.
Jonathan Wakely [Wed, 27 Mar 2024 21:51:13 +0000 (21:51 +0000)]
libstdc++: Reverse arguments in constraint for std::optional's <=> [PR104606]
This is a workaround for a possible compiler bug that causes constraint
recursion in the operator<=>(const optional<T>&, const U&) overload.
libstdc++-v3/ChangeLog:
PR libstdc++/104606
* include/std/optional (operator<=>(const optional<T>&, const U&)):
Reverse order of three_way_comparable_with template arguments.
* testsuite/20_util/optional/relops/104606.cc: New test.
(cherry picked from commit
7f65d8267fbfd19cf21a3dc71d27e989e75044a3)
Andre Vieira [Thu, 11 Jul 2024 14:38:45 +0000 (15:38 +0100)]
mve: Fix vsetq_lane for 64-bit elements with lane 1 [PR 115611]
This patch fixes the backend pattern that was printing the wrong input
scalar register pair when inserting into lane 1.
Added a new test to force float-abi=hard so we can use scan-assembler to check
correct codegen.
gcc/ChangeLog:
PR target/115611
* config/arm/mve.md (mve_vec_setv2di_internal): Fix printing of input
scalar register pair when lane = 1.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/intrinsics/vsetq_lane_su64.c: New test.
(cherry picked from commit
7c11fdd2cc11a7058e9643b6abf27831970ad2c9)
GCC Administrator [Thu, 11 Jul 2024 00:19:36 +0000 (00:19 +0000)]
Daily bump.
Uros Bizjak [Wed, 10 Jul 2024 07:27:27 +0000 (09:27 +0200)]
middle-end: Fix stalled swapped condition code value [PR115836]
emit_store_flag_1 calculates scode (swapped condition code) at the
beginning of the function from the value of code variable. However,
code variable may change before scode usage site, resulting in
invalid stalled scode value.
Move calculation of scode value just before its only usage site to
avoid stalled scode value.
PR middle-end/115836
gcc/ChangeLog:
* expmed.c (emit_store_flag_1): Move calculation of
scode just before its only usage site.
(cherry picked from commit
44933fdeb338e00c972e42224b9a83d3f8f6a757)
Jonathan Wakely [Thu, 21 Mar 2024 13:25:15 +0000 (13:25 +0000)]
libstdc++: Destroy allocators in re-inserted container nodes [PR114401]
The allocator objects in container node handles were not being destroyed
after the node was re-inserted into a container. They are stored in a
union and so need to be explicitly destroyed when the node becomes
empty. The containers were zeroing the node handle's pointer, which
makes it empty, causing the handle's destructor to think there's nothing
to clean up.
Add a new member function to the node handle which destroys the
allocator and zeros the pointer. Change the containers to call that
instead of just changing the pointer manually.
We can also remove the _M_empty member of the union which is not
necessary.
libstdc++-v3/ChangeLog:
PR libstdc++/114401
* include/bits/hashtable.h (_Hashtable::_M_reinsert_node): Call
release() on node handle instead of just zeroing its pointer.
(_Hashtable::_M_reinsert_node_multi): Likewise.
(_Hashtable::_M_merge_unique): Likewise.
* include/bits/node_handle.h (_Node_handle_common::release()):
New member function.
(_Node_handle_common::_Optional_alloc::_M_empty): Remove
unnecessary union member.
(_Node_handle_common): Declare _Hashtable as a friend.
* include/bits/stl_tree.h (_Rb_tree::_M_reinsert_node_unique):
Call release() on node handle instead of just zeroing its
pointer.
(_Rb_tree::_M_reinsert_node_equal): Likewise.
(_Rb_tree::_M_reinsert_node_hint_unique): Likewise.
(_Rb_tree::_M_reinsert_node_hint_equal): Likewise.
* testsuite/23_containers/multiset/modifiers/114401.cc: New test.
* testsuite/23_containers/set/modifiers/114401.cc: New test.
* testsuite/23_containers/unordered_multiset/modifiers/114401.cc: New test.
* testsuite/23_containers/unordered_set/modifiers/114401.cc: New test.
(cherry picked from commit
c2e28df90a1640cebadef6c6c8ab5ea964071bb1)
GCC Administrator [Wed, 10 Jul 2024 00:20:31 +0000 (00:20 +0000)]
Daily bump.
Kyrylo Tkachov [Fri, 28 Jun 2024 07:52:37 +0000 (13:22 +0530)]
aarch64: PR target/115475 Implement missing __ARM_FEATURE_SVE_BF16 macro
The ACLE requires __ARM_FEATURE_SVE_BF16 to be enabled when SVE and BF16
and the associated intrinsics are available.
GCC does support the required intrinsics for TARGET_SVE_BF16 so define
this macro too.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/115475
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.
gcc/testsuite/
PR target/115475
* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
(cherry picked from commit
6492c7130d6ae9992298fc3d072e2589d1131376)
Kyrylo Tkachov [Thu, 27 Jun 2024 10:40:41 +0000 (16:10 +0530)]
aarch64: PR target/115457 Implement missing __ARM_FEATURE_BF16 macro
The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
<arm_bf16.h> header but GCC doesn't set this up.
LLVM does, so this is an inconsistency between the compilers.
This patch enables that macro for TARGET_BF16_FP.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/115457
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.
gcc/testsuite/
PR target/115457
* gcc.target/aarch64/acle/bf16_feature.c: New test.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
(cherry picked from commit
c10942134fa759843ac1ed1424b86fcb8e6368ba)
GCC Administrator [Tue, 9 Jul 2024 00:19:52 +0000 (00:19 +0000)]
Daily bump.
Andrew Pinski [Fri, 16 Feb 2024 18:55:43 +0000 (10:55 -0800)]
c++: Add testcase for this PR [PR97990]
This testcase was fixed by
r14-5934-gf26d68d5d128c8 but we should add
one to make sure it does not regress again.
Committed as obvious after a quick test on the testcase.
PR c++/97990
gcc/testsuite/ChangeLog:
* g++.dg/torture/vector-struct-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit
5f1438db419c9eb8901d1d1d7f98fb69082aec8e)
Richard Biener [Tue, 28 Nov 2023 11:36:21 +0000 (12:36 +0100)]
middle-end/112732 - stray TYPE_ALIAS_SET in type variant
The following fixes a stray TYPE_ALIAS_SET in a type variant built
by build_opaque_vector_type which is diagnosed by type checking
enabled with -flto.
PR middle-end/112732
* tree.c (build_opaque_vector_type): Reset TYPE_ALIAS_SET
of the newly built type.
(cherry picked from commit
f26d68d5d128c86faaceeb81b1e8f22254ad53df)
GCC Administrator [Mon, 8 Jul 2024 00:19:20 +0000 (00:19 +0000)]
Daily bump.
GCC Administrator [Sun, 7 Jul 2024 00:19:14 +0000 (00:19 +0000)]
Daily bump.
GCC Administrator [Sat, 6 Jul 2024 00:19:56 +0000 (00:19 +0000)]
Daily bump.
GCC Administrator [Fri, 5 Jul 2024 00:19:49 +0000 (00:19 +0000)]
Daily bump.
GCC Administrator [Thu, 4 Jul 2024 00:20:18 +0000 (00:20 +0000)]
Daily bump.
John David Anglin [Sun, 30 Jun 2024 13:48:21 +0000 (09:48 -0400)]
hppa: Fix ICE caused by mismatched predicate and constraint in xmpyu patterns
2024-06-30 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/115691
* config/pa/pa.md: Remove incorrect xmpyu patterns.
GCC Administrator [Wed, 3 Jul 2024 00:21:06 +0000 (00:21 +0000)]
Daily bump.
GCC Administrator [Tue, 2 Jul 2024 00:19:33 +0000 (00:19 +0000)]
Daily bump.
GCC Administrator [Mon, 1 Jul 2024 00:20:26 +0000 (00:20 +0000)]
Daily bump.
GCC Administrator [Sun, 30 Jun 2024 00:19:43 +0000 (00:19 +0000)]
Daily bump.
Francois-Xavier Coudert [Sat, 16 Mar 2024 08:50:00 +0000 (09:50 +0100)]
libcc1: fix <vector> include
Use INCLUDE_VECTOR before including system.h, instead of directly
including <vector>, to avoid running into poisoned identifiers.
Signed-off-by: Dimitry Andric <dimitry@andric.com>
libcc1/ChangeLog:
PR middle-end/111632
* libcc1plugin.cc: Fix include.
* libcp1plugin.cc: Fix include.
(cherry picked from commit
5213047b1d50af63dfabb5e5649821a6cb157e33)
Francois-Xavier Coudert [Thu, 7 Mar 2024 13:36:03 +0000 (14:36 +0100)]
Include safe-ctype.h after C++ standard headers, to avoid over-poisoning
When building gcc's C++ sources against recent libc++, the poisoning of
the ctype macros due to including safe-ctype.h before including C++
standard headers such as <list>, <map>, etc, causes many compilation
errors, similar to:
In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
In file included from /home/dim/src/gcc/master/gcc/system.h:233:
In file included from /usr/include/c++/v1/vector:321:
In file included from
/usr/include/c++/v1/__format/formatter_bool.h:20:
In file included from
/usr/include/c++/v1/__format/formatter_integral.h:32:
In file included from /usr/include/c++/v1/locale:202:
/usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
only applies to structs, variables, functions, and namespaces
546 | _LIBCPP_INLINE_VISIBILITY
| ^
/usr/include/c++/v1/__config:813:37: note: expanded from macro
'_LIBCPP_INLINE_VISIBILITY'
813 | # define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
| ^
/usr/include/c++/v1/__config:792:26: note: expanded from macro
'_LIBCPP_HIDE_FROM_ABI'
792 |
__attribute__((__abi_tag__(_LIBCPP_TOSTRING(
_LIBCPP_VERSIONED_IDENTIFIER))))
| ^
In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
In file included from /home/dim/src/gcc/master/gcc/system.h:233:
In file included from /usr/include/c++/v1/vector:321:
In file included from
/usr/include/c++/v1/__format/formatter_bool.h:20:
In file included from
/usr/include/c++/v1/__format/formatter_integral.h:32:
In file included from /usr/include/c++/v1/locale:202:
/usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
declaration list
547 | char_type toupper(char_type __c) const
| ^
/usr/include/c++/v1/__locale:553:48: error: too many arguments
provided to function-like macro invocation
553 | const char_type* toupper(char_type* __low, const
char_type* __high) const
| ^
/home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
macro 'toupper' defined here
146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
| ^
This is because libc++ uses different transitive includes than
libstdc++, and some of those transitive includes pull in various ctype
declarations (typically via <locale>).
There was already a special case for including <string> before
safe-ctype.h, so move the rest of the C++ standard header includes to
the same location, to fix the problem.
gcc/ChangeLog:
* system.h: Include safe-ctype.h after C++ standard headers.
Signed-off-by: Dimitry Andric <dimitry@andric.com>
(cherry picked from commit
9970b576b7e4ae337af1268395ff221348c4b34a)
Iain Sandoe [Fri, 12 Nov 2021 16:36:25 +0000 (16:36 +0000)]
Ada, Darwin : Use DSYMUTIL_FOR_TARGET in libgnat/gnarl builds.
Most of the time we get away with using the dsymutil that is
installed with the latest Xcode, however for some cross-compilation
cases that does not work.
We now have the ability to specify the correct dsymutil to use for
the toolchain (--with-dsymutil=) and we should use that specified
tool for debug link. Fixes cross-compilers from x86-64 to powerpc.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ada/ChangeLog:
* gcc-interface/Makefile.in: Use DSYMUTIL_FOR_TARGET in
libgnat/libgnarl recipies.
GCC Administrator [Sat, 29 Jun 2024 00:19:58 +0000 (00:19 +0000)]
Daily bump.
GCC Administrator [Fri, 28 Jun 2024 00:20:03 +0000 (00:20 +0000)]
Daily bump.
Martin Liska [Thu, 27 Jan 2022 13:47:23 +0000 (14:47 +0100)]
libstdc++: fix typo in acinclude.m4.
PR libstdc++/104259
libstdc++-v3/ChangeLog:
* acinclude.m4: Fix typo.
* configure: Regenerate.
(cherry picked from commit
14f339894db6ca7fe4772d5528c726694d2517c4)
Iain Sandoe [Mon, 18 Apr 2022 15:23:30 +0000 (16:23 +0100)]
c++, coroutines: Improve check for throwing final await [PR104051].
We check that the final_suspend () method returns a sane type (i.e. a class
or structure) but, unfortunately, that check has to be later than the one
for a throwing case. If the use returns some nonsensical type from the
method, we need to handle that in the checking for noexcept.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/104051
gcc/cp/ChangeLog:
* coroutines.cc (coro_diagnose_throwing_final_aw_expr): Handle
non-target expression inputs.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr104051.C: New test.
(cherry picked from commit
7b96274a340bc0e9bcaef9baff3a44ec2f12c3df)
Iain Sandoe [Sat, 2 Oct 2021 15:15:38 +0000 (16:15 +0100)]
coroutines: Fail with a sorry when presented with a VLA [PR 101765].
We do not support this yet.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/101765
gcc/cp/ChangeLog:
* coroutines.cc (register_local_var_uses): Emit a sorry if
we encounter a VLA in the coroutine local variables.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr101765.C: New test.
(cherry picked from commit
fdf0b6ce6c1cfa1c328c0c40473c71ca11fd8303)
Iain Sandoe [Sun, 3 Oct 2021 18:46:09 +0000 (19:46 +0100)]
coroutines: Pass lvalues to user-defined operator new [PR 100772].
The wording of the standard has been clarified to be explicit that
the the parameters to any user-defined operator-new in the promise
class should be lvalues.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/100772
gcc/cp/ChangeLog:
* coroutines.cc (morph_fn_to_coro): Convert function parms
from reference before constructing any operator-new args
list.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr100772-a.C: New test.
* g++.dg/coroutines/pr100772-b.C: New test.
(cherry picked from commit
921942a8a106cb53994c21162922e4934eb3a3e0)
Iain Sandoe [Sat, 2 Oct 2021 13:43:39 +0000 (14:43 +0100)]
coroutines: Await expressions are not allowed in handlers [PR 99710].
C++20 [expr.await] / 2
An await-expression shall appear only in a potentially-evaluated expression
within the compound-statement of a function-body outside of a handler.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/99710
gcc/cp/ChangeLog:
* coroutines.cc (await_statement_walker): Report an error if
an await expression is found in a handler body.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr99710.C: New test.
(cherry picked from commit
650beb110538097b9c3e8600149b333a83e7e836)
Kyrylo Tkachov [Wed, 19 Jun 2024 09:26:02 +0000 (14:56 +0530)]
Add support for -mcpu=grace
This adds support for the NVIDIA Grace CPU to aarch64.
We reuse the tuning decisions for the Neoverse V2 core, but include a
number of architecture features that are not enabled by default in
-mcpu=neoverse-v2.
This allows Grace users to more simply target the CPU with -mcpu=grace
rather than remembering what extensions to tag on top of
-mcpu=neoverse-v2.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
* config/aarch64/aarch64-cores.def (grace): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
GCC Administrator [Thu, 27 Jun 2024 00:19:47 +0000 (00:19 +0000)]
Daily bump.
GCC Administrator [Wed, 26 Jun 2024 00:18:40 +0000 (00:18 +0000)]
Daily bump.
Jonathan Wakely [Tue, 25 Jun 2024 22:20:49 +0000 (23:20 +0100)]
libstdc++: Replace reference to mainline in release branch docs
libstdc++-v3/ChangeLog:
* doc/xml/manual/status_cxx2023.xml: Change reference from
mainline GCC to the release branch.
* doc/html/manual/status.html: Regenerate.
GCC Administrator [Tue, 25 Jun 2024 00:19:12 +0000 (00:19 +0000)]
Daily bump.
Kewen Lin [Wed, 29 May 2024 02:13:40 +0000 (21:13 -0500)]
rs6000: Don't clobber return value when eh_return called [PR114846]
As the associated test case in PR114846 shows, currently
with eh_return involved some register restoring for EH
RETURN DATA in epilogue can clobber the one which holding
the return value. Referring to the existing handlings in
some other targets, this patch makes eh_return expander
call one new define_insn_and_split eh_return_internal which
directly calls rs6000_emit_epilogue with epilogue_type
EPILOGUE_TYPE_EH_RETURN instead of the previous treating
normal return with crtl->calls_eh_return specially.
PR target/114846
gcc/ChangeLog:
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): As
EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
now, adjust the relevant handlings on it.
* config/rs6000/rs6000.md (eh_return expander): Append by calling
gen_eh_return_internal and emit_barrier.
(eh_return_internal): New define_insn_and_split, call function
rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr114846.c: New test.
(cherry picked from commit
e5fc5d42d25c86ae48178db04ce64d340a834614)
GCC Administrator [Mon, 24 Jun 2024 00:19:50 +0000 (00:19 +0000)]
Daily bump.
GCC Administrator [Sun, 23 Jun 2024 00:18:01 +0000 (00:18 +0000)]
Daily bump.
GCC Administrator [Sat, 22 Jun 2024 00:19:23 +0000 (00:19 +0000)]
Daily bump.
Matthias Kretz [Fri, 21 Jun 2024 14:22:22 +0000 (16:22 +0200)]
libstdc++: Fix test on x86_64 and non-simd targets
* Running a test compiled with AVX512 instructions requires
avx512f_runtime not just avx512f.
* The 'reduce2' test violated an invariant of fixed_size_simd_mask and
thus failed on all targets without 16-Byte vector builtins enabled (in
bits/simd.h).
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/115575
* testsuite/experimental/simd/pr115454_find_last_set.cc: Require
avx512f_runtime. Don't memcpy fixed_size masks.
(cherry picked from commit
77f321435b4ac37992c2ed6737ca0caa1dd50551)
Richard Biener [Wed, 31 Jan 2024 13:40:24 +0000 (14:40 +0100)]
middle-end/110176 - wrong zext (bool) <= (int) 4294967295u folding
The following fixes a wrong pattern that didn't match the behavior
of the original fold_widened_comparison in that get_unwidened
returned a constant always in the wider type. But here we're
using (int) 4294967295u without the conversion applied. Fixed
by doing as earlier in the pattern - matching constants only
if the conversion was actually applied.
PR middle-end/110176
* match.pd (zext (bool) <= (int) 4294967295u): Make sure
to match INTEGER_CST only without outstanding conversion.
* gcc.dg/torture/pr110176.c: New testcase.
(cherry picked from commit
22dbfbe8767ff4c1d93e39f68ec7c2d5b1358beb)
Richard Biener [Mon, 21 Aug 2023 07:01:00 +0000 (09:01 +0200)]
tree-optimization/111070 - fix ICE with recent ifcombine fix
We now got test coverage for non-SSA name bits so the following amends
the SSA_NAME_OCCURS_IN_ABNORMAL_PHI checks.
PR tree-optimization/111070
* tree-ssa-ifcombine.c (ifcombine_ifandif): Check we have
an SSA name before checking SSA_NAME_OCCURS_IN_ABNORMAL_PHI.
* gcc.dg/pr111070.c: New testcase.
Richard Biener [Thu, 17 Aug 2023 11:10:14 +0000 (13:10 +0200)]
tree-optimization/111039 - abnormals and bit test merging
The following guards the bit test merging code in if-combine against
the appearance of SSA names used in abnormal PHIs.
PR tree-optimization/111039
* tree-ssa-ifcombine.c (ifcombine_ifandif): Check for
SSA_NAME_OCCURS_IN_ABNORMAL_PHI.
* gcc.dg/pr111039.c: New testcase.
Richard Biener [Mon, 21 Aug 2023 08:34:30 +0000 (10:34 +0200)]
debug/111080 - avoid outputting debug info for unused restrict qualified type
The following applies some maintainance with respect to type qualifiers
and kinds added by later DWARF standards to prune_unused_types_walk.
The particular case in the bug is not handling (thus marking required)
all restrict qualified type DIEs. I've found more DW_TAG_*_type that
are unhandled, looked up the DWARF docs and added them as well based
on common sense.
PR debug/111080
* dwarf2out.c (prune_unused_types_walk): Handle
DW_TAG_restrict_type, DW_TAG_shared_type, DW_TAG_atomic_type,
DW_TAG_immutable_type, DW_TAG_coarray_type, DW_TAG_unspecified_type
and DW_TAG_dynamic_type as to only output them when referenced.
* gcc.dg/debug/dwarf2/pr111080.c: New testcase.
Richard Biener [Fri, 20 Oct 2023 13:08:49 +0000 (15:08 +0200)]
tree-optimization/111445 - simple_iv simplification fault
The following fixes a missed check in the simple_iv attempt
to simplify (signed T)((unsigned T) base + step) where it
allows a truncating inner conversion leading to wrong code.
PR tree-optimization/111445
* tree-scalar-evolution.c (simple_iv_with_niters):
Add missing check for a sign-conversion.
* gcc.dg/torture/pr111445.c: New testcase.
(cherry picked from commit
9692309ed6b625f0fb358c0e230404b5603f69a6)
Richard Biener [Mon, 13 Nov 2023 09:20:37 +0000 (10:20 +0100)]
tree-optimization/112495 - alias versioning and address spaces
We are not correctly handling differing address spaces in dependence
analysis runtime alias check generation so refuse to do that.
PR tree-optimization/112495
* tree-data-ref.c (runtime_alias_check_p): Reject checks
between different address spaces.
* gcc.target/i386/pr112495.c: New testcase.
(cherry picked from commit
0f593c0521caab8cfac53514b1a5e7d0d0dd1932)
Richard Biener [Thu, 11 Jan 2024 13:00:33 +0000 (14:00 +0100)]
tree-optimization/112505 - bit-precision induction vectorization
Vectorization of bit-precision inductions isn't implemented but we
don't check this, instead we ICE during transform.
PR tree-optimization/112505
* tree-vect-loop.c (vectorizable_induction): Reject
bit-precision induction.
* gcc.dg/vect/pr112505.c: New testcase.
(cherry picked from commit
ec345df53556ec581590347f71c3d9ff3cdbca76)
Richard Biener [Mon, 22 Jan 2024 14:42:59 +0000 (15:42 +0100)]
debug/112718 - reset all type units with -ffat-lto-objects
When mixing -flto, -ffat-lto-objects and -fdebug-type-section we
fail to reset all type units after early output resulting in an
ICE when attempting to add then duplicate sibling attributes.
PR debug/112718
* dwarf2out.c (dwarf2out_finish): Reset all type units
for the fat part of an LTO compile.
* gcc.dg/debug/pr112718.c: New testcase.
(cherry picked from commit
7218f5050cb7163edae331f54ca163248ab48bfa)
Richard Biener [Wed, 13 Dec 2023 13:23:31 +0000 (14:23 +0100)]
tree-optimization/112793 - SLP of constant/external code-generated twice
The following makes the attempt at code-generating a constant/external
SLP node twice well-formed as that can happen when partitioning BB
vectorization attempts where we keep constants/externals unpartitioned.
PR tree-optimization/112793
* tree-vect-slp.c (vect_schedule_slp_node): Already
code-generated constant/external nodes are OK.
* g++.dg/vect/pr112793.cc: New testcase.
(cherry picked from commit
d782ec8362eadc3169286eb1e39c631effd02323)
Richard Biener [Tue, 26 Mar 2024 08:46:06 +0000 (09:46 +0100)]
tree-optimization/114027 - fix testcase
The following fixes out-of-bounds read in the testcase.
PR tree-optimization/114027
* gcc.dg/vect/pr114027.c: Fix iteration count.
(cherry picked from commit
4470611e20f3217ee81647b01fda65b6a62229aa)
Richard Biener [Thu, 22 Feb 2024 09:50:12 +0000 (10:50 +0100)]
tree-optimization/114027 - conditional reduction chain
When we classify a conditional reduction chain as CONST_COND_REDUCTION
we fail to verify all involved conditionals have the same constant.
That's a quite unlikely situation so the following simply disables
such classification when there's more than one reduction statement.
PR tree-optimization/114027
* tree-vect-loop.c (vecctorizable_reduction): Use optimized
condition reduction classification only for single-element
chains.
* gcc.dg/vect/pr114027.c: New testcase.
(cherry picked from commit
549f251f055e3a0b0084189a3012c4f15d635e75)
Richard Biener [Fri, 26 Apr 2024 13:47:13 +0000 (15:47 +0200)]
middle-end/114734 - wrong code with expand_call_mem_ref
When expand_call_mem_ref looks at the definition of the address
argument to eventually expand a &TARGET_MEM_REF argument together
with a masked load it fails to honor constraints imposed by SSA
coalescing decisions. The following fixes this.
PR middle-end/114734
* internal-fn.c (expand_call_mem_ref): Use
get_gimple_for_ssa_name to get at the def stmt of the address
argument to honor SSA coalescing constraints.
(cherry picked from commit
20ebcaf826c91ddaf2aac35417ec1e5e6d31ad50)
GCC Administrator [Fri, 21 Jun 2024 00:19:22 +0000 (00:19 +0000)]
Daily bump.
Matthias Kretz [Fri, 14 Jun 2024 13:11:25 +0000 (15:11 +0200)]
libstdc++: Fix find_last_set(simd_mask) to ignore padding bits
With the change to the AVX512 find_last_set implementation, the change
to AVX512 operator!= is unnecessary. However, the latter was not
producing optimal code and unnecessarily set the padding bits. In
theory, the compiler could determine that with the new !=
implementation, the bit operation for clearing the padding bits is a
no-op and can be elided.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/115454
* include/experimental/bits/simd_x86.h (_S_not_equal_to): Use
neq comparison instead of bitwise negation after eq.
(_S_find_last_set): Clear unused high bits before computing
bit_width.
* testsuite/experimental/simd/pr115454_find_last_set.cc: New
test.
(cherry picked from commit
4787960dcaf0de3f46464960f5246de9b3c69a06)
Jakub Jelinek [Mon, 17 Jun 2024 20:02:46 +0000 (22:02 +0200)]
diagnostics: Fix add_misspelling_candidates [PR115440]
The option_map array for most entries contains just non-NULL opt0
{ "-Wno-", NULL, "-W", false, true },
{ "-fno-", NULL, "-f", false, true },
{ "-gno-", NULL, "-g", false, true },
{ "-mno-", NULL, "-m", false, true },
{ "--debug=", NULL, "-g", false, false },
{ "--machine-", NULL, "-m", true, false },
{ "--machine-no-", NULL, "-m", false, true },
{ "--machine=", NULL, "-m", false, false },
{ "--machine=no-", NULL, "-m", false, true },
{ "--machine", "", "-m", false, false },
{ "--machine", "no-", "-m", false, true },
{ "--optimize=", NULL, "-O", false, false },
{ "--std=", NULL, "-std=", false, false },
{ "--std", "", "-std=", false, false },
{ "--warn-", NULL, "-W", true, false },
{ "--warn-no-", NULL, "-W", false, true },
{ "--", NULL, "-f", true, false },
{ "--no-", NULL, "-f", false, true }
and so add_misspelling_candidates works correctly for it, but 3 out of
these,
{ "--machine", "", "-m", false, false },
{ "--machine", "no-", "-m", false, true },
and
{ "--std", "", "-std=", false, false },
use non-NULL opt1. That says that
--machine foo
should map to
-mfoo
and
--machine no-foo
should map to
-mno-foo
and
--std c++17
should map to
-std=c++17
add_misspelling_canidates was not handling this, so it hapilly
registered say
--stdc++17
or
--machineavx512
(twice) as spelling alternatives, when those options aren't recognized.
Instead we support
--std c++17
or
--machine avx512
--machine no-avx512
The following patch fixes that. On this particular testcase, we no longer
suggest anything, even when among the suggestion is say that
--std c++17
or
-std=c++17
etc.
2024-06-17 Jakub Jelinek <jakub@redhat.com>
PR driver/115440
* opts-common.c (add_misspelling_candidates): If opt1 is non-NULL,
add a space and opt1 to the alternative suggestion text.
* g++.dg/cpp1z/pr115440.C: New test.
(cherry picked from commit
96db57948b50f45235ae4af3b46db66cae7ea859)
Jakub Jelinek [Thu, 6 Jun 2024 20:12:11 +0000 (22:12 +0200)]
c: Fix up pointer types to may_alias structures [PR114493]
The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
gcc_assert (TYPE_CANONICAL (t2) != t2
&& TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on on the struct definition may_alias attribute was used.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes) but pointers created for it
are because of the "may_alias" attribute TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL, because while that is created with !can_alias_all
argument, we later set it because of the "may_alias" attribute on the
to_type.
This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias is added.
The following patch does that in the C FE as well.
2024-06-06 Jakub Jelinek <jakub@redhat.com>
PR c/114493
* c-decl.c (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.
* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.
(cherry picked from commit
d5a3c6d43acb8b2211d9fb59d59482d74c010f01)
Jakub Jelinek [Tue, 4 Jun 2024 13:49:41 +0000 (15:49 +0200)]
fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]
The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result. That is the case of the former which is UB
on zero and has [0, prec-1] return value otherwise, and is the case of the
single argument .CLZ as well (again, UB on zero), but for two argument
.CLZ is the case only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).
The following patch does that.
2024-06-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/115337
* fold-const.c (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If fn is CFN_CLZ, use CLZ_DEFINED_VALUE_AT.
(cherry picked from commit
b82a816000791e7a286c7836b3a473ec0e2a577b)
Jakub Jelinek [Tue, 4 Jun 2024 10:28:01 +0000 (12:28 +0200)]
builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow [PR108789]
The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hope it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third arguments points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR. Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p case which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
2024-06-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108789
* builtins.c (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
* gcc.c-torture/execute/pr108789.c: New test.
(cherry picked from commit
b8e28381cb5c0cddfe5201faf799d8b27f5d7d6c)
Jakub Jelinek [Wed, 15 May 2024 16:37:17 +0000 (18:37 +0200)]
combine: Fix up simplify_compare_const [PR115092]
The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -
2147483648)
and then an optimization for when the second operand is power of 2
triggers. That optimization is fine for power of 2s which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong, those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).
The following patch just disables the optimization in that case.
2024-05-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.c (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.
* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.
(cherry picked from commit
0b93a0ae153ef70a82ff63e67926a01fdab9956b)
Jakub Jelinek [Tue, 7 May 2024 19:29:14 +0000 (21:29 +0200)]
tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]
In r9-5742 we've started allowing to inline always_inline functions into
functions which have disabled e.g. address sanitization even when the
always_inline function is implicitly from command line options sanitized.
This mostly works fine because most of the asan instrumentation is done only
late after ipa, but as the following testcase the .ASAN_MARK ifn calls
gimplifier adds can result in ICEs.
Fixed by dropping those during inlining, similarly to how we drop
.TSAN_FUNC_EXIT calls.
2024-05-07 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/114956
* tree-inline.c: Include asan.h.
(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
sanitization disabled.
* gcc.dg/asan/pr114956.c: New test.
(cherry picked from commit
d4e25cf4f7c1f51a8824cc62bbb85a81a41b829a)
Jakub Jelinek [Tue, 30 Apr 2024 09:22:32 +0000 (11:22 +0200)]
gimple-ssa-sprintf: Use [0, 1] range for %lc with (wint_t) 0 argument [PR114876]
Seems when Martin S. implemented this, he coded there strict reading
of the standard, which said that %lc with (wint_t) 0 argument is handled
as wchar_t[2] temp = { arg, 0 }; %ls with temp arg and so shouldn't print
any values. But, most of the libc implementations actually handled that
case like %c with '\0' argument, adding a single NUL character, the only
known exception is musl.
Recently, C23 changed this in response to GB-141 and POSIX in
https://austingroupbugs.net/view.php?id=1647
so that it should have the same behavior as %c with '\0'.
Because there is implementation divergence, the following patch uses
a range rather than hardcoding it to all 1s (i.e. the %c behavior),
though the likely case is still 1 (forward looking plus most of
implementations).
The res.knownrange = true; assignment removed is redundant due to
the same assignment done unconditionally before the if statement,
rest is formatting fixes.
I don't think the min >= 0 && min < 128 case is right either, I'd think
it should be min >= 0 && max < 128, otherwise it is just some possible
inputs are (maybe) ASCII and there can be others, but this code is a total
mess anyway, with the min, max, likely (somewhere in [min, max]?) and then
unlikely possibly larger than max, dunno, perhaps for at least some chars
in the ASCII range the likely case could be for the ascii case; so perhaps
just the one_2_one_ascii shouldn't set max to 1 and mayfail should be true
for max >= 128. Anyway, didn't feel I should touch that right now.
2024-04-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114876
* gimple-ssa-sprintf.c (format_character): For min == 0 && max == 0,
set max, likely and unlikely members to 1 rather than 0. Remove
useless res.knownrange = true;. Formatting fixes.
* gcc.dg/pr114876.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Adjust expected
diagnostics.
(cherry picked from commit
6c6b70f07208ca14ba783933988c04c6fc2fff42)
Jakub Jelinek [Thu, 25 Apr 2024 18:09:35 +0000 (20:09 +0200)]
openmp: Copy DECL_LANG_SPECIFIC and DECL_LANG_FLAG_? to tree-nested decl copy [PR114825]
tree-nested.cc creates in 2 spots artificial VAR_DECLs, one of them is used
both for debug info and OpenMP/OpenACC lowering purposes, the other solely for
OpenMP/OpenACC lowering purposes.
When the decls are used in OpenMP/OpenACC lowering, the OMP langhooks (mostly
Fortran, C just a little and C++ doesn't have nested functions) then inspect
the flags on the vars and based on that decide how to lower the corresponding
clauses.
Unfortunately we weren't copying DECL_LANG_SPECIFIC and DECL_LANG_FLAG_?, so
the langhooks made decisions on the default flags on those instead.
As the original decl isn't necessarily a VAR_DECL, could be e.g. PARM_DECL,
using copy_node wouldn't work properly, so this patch just copies those
flags in addition to other flags it was copying already. And I've removed
code duplication by introducing a helper function which does copying common
to both uses.
2024-04-25 Jakub Jelinek <jakub@redhat.com>
PR fortran/114825
* tree-nested.c (get_debug_decl): New function.
(get_nonlocal_debug_decl): Use it.
(get_local_debug_decl): Likewise.
* gfortran.dg/gomp/pr114825.f90: New test.
(cherry picked from commit
14d48516e588ad2b35e2007b3970bdcb1b3f145c)
Jakub Jelinek [Fri, 19 Apr 2024 06:47:53 +0000 (08:47 +0200)]
rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]
On the following testcase, combine propagates the mem/v load into mem store
with the same address and then removes it, because noop_move_p says it is a
no-op move. If it was the other way around, i.e. mem/v store and mem load,
or both would be mem/v, it would be kept.
The problem is that rtx_equal_p never checks any kind of flags on the rtxes
(and I think it would be quite dangerous to change it at this point), and
set_noop_p checks side_effects_p on just one of the operands, not both.
In the MEM <- MEM set, it only checks it on the destination, in
store to ZERO_EXTRACT only checks it on the source.
The following patch adds the missing side_effects_p checks.
2024-04-19 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114768
* rtlanal.c (set_noop_p): Don't return true for MEM <- MEM
sets if src has side-effects or for stores into ZERO_EXTRACT
if ZERO_EXTRACT operand has side-effects.
* gcc.dg/pr114768.c: New test.
(cherry picked from commit
9f295847a9c32081bdd0fe908ffba58e830a24fb)
Jakub Jelinek [Thu, 18 Apr 2024 07:45:14 +0000 (09:45 +0200)]
internal-fn: Temporarily disable flag_trapv during .{ADD,SUB,MUL}_OVERFLOW etc. expansion [PR114753]
__builtin_{add,sub,mul}_overflow{,_p} builtins are well defined
for all inputs even for -ftrapv, and the -fsanitize=signed-integer-overflow
ifns shouldn't abort in libgcc but emit the desired ubsan diagnostics
or abort depending on -fsanitize* setting regardless of -ftrapv.
The expansion of these internal functions uses expand_expr* in various
places (e.g. MULT_EXPR at least in 2 spots), so temporarily disabling
flag_trapv in all those spots would be hard.
The following patch disables it around the bodies of 3 functions
which can do the expand_expr calls.
If it was in the C++ FE, I'd use some RAII sentinel, but I don't think
we have one in the middle-end.
2024-04-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114753
* internal-fn.c (expand_mul_overflow): Save flag_trapv and
temporarily clear it for the duration of the function, then
restore previous value.
(expand_vector_ubsan_overflow): Likewise.
(expand_arith_overflow): Likewise.
* gcc.dg/pr114753.c: New test.
(cherry picked from commit
6c152c9db3b5b9d43e12846fb7a44977c0b65fc2)
Jakub Jelinek [Mon, 15 Apr 2024 08:25:22 +0000 (10:25 +0200)]
attribs: Don't crash on NULL TREE_TYPE in diag_attr_exclusions [PR114634]
The enumerator still doesn't have TREE_TYPE set but diag_attr_exclusions
assumes that all decls must have types.
I think it is better in something as unimportant as diag_attr_exclusions
to be more robust, if there is no type, it can just diagnose exclusions
on the DECL_ATTRIBUTES, like for types it only diagnoses it on
TYPE_ATTRIBUTES.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
PR c++/114634
* attribs.c (diag_attr_exclusions): Set attrs[1] to NULL_TREE for
decls with NULL TREE_TYPE.
* g++.dg/ext/attrib68.C: New test.
(cherry picked from commit
7ec54f5fdfec298812a749699874db4d6a7246bb)
Jakub Jelinek [Fri, 12 Apr 2024 18:53:10 +0000 (20:53 +0200)]
c++: Fix bogus warnings about ignored annotations [PR114691]
The middle-end warns about the ANNOTATE_EXPR added for while/for loops
if they declare a var inside of the loop condition.
This is because the assumption is that ANNOTATE_EXPR argument is used
immediately in a COND_EXPR (later GIMPLE_COND), but simplify_loop_decl_cond
wraps the ANNOTATE_EXPR inside of a TRUTH_NOT_EXPR, so it no longer
holds.
The following patch fixes that by adding the TRUTH_NOT_EXPR inside of the
ANNOTATE_EXPR argument if any.
2024-04-12 Jakub Jelinek <jakub@redhat.com>
PR c++/114691
* semantics.c (simplify_loop_decl_cond): Use cp_build_unary_op with
TRUTH_NOT_EXPR on ANNOTATE_EXPR argument (if any) rather than
ANNOTATE_EXPR itself.
* g++.dg/ext/pr114691.C: New test.
(cherry picked from commit
91146346f57cc54dfeb2669347edd0eb3d13af7f)
Jakub Jelinek [Thu, 11 Apr 2024 09:12:11 +0000 (11:12 +0200)]
asan, v3: Fix up handling of > 32 byte aligned variables with -fsanitize=address -fstack-protector* [PR110027]
On Tue, Mar 26, 2024 at 02:08:02PM +0800, liuhongt wrote:
> > > So, try to add some other variable with larger size and smaller alignment
> > > to the frame (and make sure it isn't optimized away).
> > >
> > > alignb above is the alignment of the first partition's var, if
> > > align_frame_offset really needs to depend on the var alignment, it probably
> > > should be the maximum alignment of all the vars with alignment
> > > alignb * BITS_PER_UNIT <=3D MAX_SUPPORTED_STACK_ALIGNMENT
> > >
>
> In asan_emit_stack_protection, when it allocated fake stack, it assume
> bottom of stack is also aligned to alignb. And the place violated this
> is the first var partition. which is 32 bytes offsets, it should be
> BIGGEST_ALIGNMENT / BITS_PER_UNIT.
> So I think we need to use MAX (BIGGEST_ALIGNMENT /
> BITS_PER_UNIT, ASAN_RED_ZONE_SIZE) for the first var partition.
Your first patch aligned offsets[0] to maximum of alignb and
ASAN_RED_ZONE_SIZE. But as I wrote in the reply to that mail, alignb there
is the alignment of just a single variable which is the first one to appear
in the sorted list and is placed in the highest spot in the stack frame.
That is not necessarily the largest alignment, the sorting ensures that it
is a variable with the largest size in the frame (and only if several of
them have equal size, largest alignment from the same sized ones). Your
second patch used maximum of BIGGEST_ALIGNMENT / BITS_PER_UNIT and
ASAN_RED_ZONE_SIZE. That doesn't change anything at all when using
-mno-avx512f - offsets[0] is still just 32-byte aligned in that case
relative to top of frame, just changes the -mavx512f case to be 64-byte
aligned offsets[0] (aka offsets[0] is then either 0 or -64 instead of either
0 or -32). That will not help if any variable in the frame needs 128-byte,
256-byte, 512-byte ... 4096-byte alignment. If you want to fix the bug in
the spot you've touched, you'd need to walk all the
stack_vars[stack_vars_sorted[si2]] for si2 [si + 1, n - 1] and for those
where the loop would do anything (i.e.
stack_vars[i2].representative == i2
&& TREE_CODE (decl2) == SSA_NAME
? SA.partition_to_pseudo[var_to_partition (SA.map, decl2)] == NULL_RTX
: DECL_RTL (decl2) == pc_rtx
and the pred applies (but that means also walking the earlier ones!
because with -fstack-protector* the vars can be processed in several calls) and
alignb2 * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT
and compute maximum of those alignments.
That maximum is already computed,
data->asan_alignb = MAX (data->asan_alignb, alignb);
computes that, but you get the final result only after you do all the
expand_stack_vars calls. You'd need to compute it before.
Though, that change would be still in the wrong place.
The thing is, it would be a waste of the precious stack space when it isn't
needed at all (e.g. when asan will not at compile time do the use after
return checking, or if it won't do it at runtime, or even if it will do at
runtime it will waste the space on the stack).
The following patch fixes it solely for the __asan_stack_malloc_N
allocations, doesn't enlarge unnecessarily further the actual stack frame.
Because asan is only supported on FRAME_GROWS_DOWNWARD architectures
(mips, rs6000 and xtensa are conditional FRAME_GROWS_DOWNWARD arches, which
for -fsanitize=address or -fstack-protector* use FRAME_GROWS_DOWNWARD 1,
otherwise 0, others supporting asan always just use 1), the assumption for
the dynamic stack realignment is that the top of the stack frame (aka offset
0) is aligned to alignb passed to the function (which is the maximum of alignb
of all the vars in the frame). As checked by the assertion in the patch,
offsets[0] is 0 most of the time and so that assumption is correct, the only
case when it is not 0 is if -fstack-protector* is on together with
-fsanitize=address and cfgexpand.cc (create_stack_guard) created a stack
guard. That is the only variable which is allocated in the stack frame
right away, for all others with -fsanitize=address defer_stack_allocation
(or -fstack-protector*) returns true and so they aren't allocated
immediately but handled during the frame layout phases. So, the original
frame_offset of 0 is changed because of the stack guard to
-pointer_size_in_bytes and later at the
if (data->asan_vec.is_empty ())
{
align_frame_offset (ASAN_RED_ZONE_SIZE);
prev_offset = frame_offset.to_constant ();
}
to -ASAN_RED_ZONE_SIZE. The asan_emit_stack_protection code wasn't
taking this into account though, so essentially assumed in the
__asan_stack_malloc_N allocated memory it needs to align it such that
pointer corresponding to offsets[0] is alignb aligned. But that isn't
correct if alignb > ASAN_RED_ZONE_SIZE, in that case it needs to ensure that
pointer corresponding to frame offset 0 is alignb aligned.
The following patch fixes that. Unlike the previous case where
we knew that asan_frame_size + base_align_bias falls into the same bucket
as asan_frame_size, this isn't in some cases true anymore, so the patch
recomputes which bucket to use and if going to bucket 11 (because there is
no __asan_stack_malloc_11 function in the library) disables the after return
sanitization.
2024-04-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/110027
* asan.c (asan_emit_stack_protection): Assert offsets[0] is
zero if there is no stack protect guard, otherwise
-ASAN_RED_ZONE_SIZE. If alignb > ASAN_RED_ZONE_SIZE and there is
stack pointer guard, take the ASAN_RED_ZONE_SIZE bytes allocated at
the top of the stack into account when computing base_align_bias.
Recompute use_after_return_class from asan_frame_size + base_align_bias
and set to -1 if that would overflow to 11.
* gcc.dg/asan/pr110027.c: New test.
(cherry picked from commit
467898d513e602f5b5fc4183052217d7e6d6e8ab)
Jakub Jelinek [Fri, 5 Apr 2024 12:56:14 +0000 (14:56 +0200)]
vect: Don't clear base_misaligned in update_epilogue_loop_vinfo [PR114566]
The following testcase is miscompiled, because in the vectorized
epilogue the vectorizer assumes it can use aligned loads/stores
(if the base decl gets alignment increased), but it actually doesn't
increase that.
This is because
r10-4203-g97c1460367 added the hunk following
patch removes. The explanation feels reasonable, but actually it
is not true as the testcase proves.
The thing is, we vectorize the main loop with 64-byte vectors
and the corresponding data refs have base_alignment 16 (the
a array has DECL_ALIGN 128) and offset_alignment 32. Now, because
of the offset_alignment 32 rather than 64, we need to use unaligned
loads/stores in the main loop (and ditto in the first load/store
in vectorized epilogue). But the second load/store in the vectorized
epilogue uses only 32-byte vectors and because it is a multiple
of offset_alignment, it checks if we could increase alignment of the
a VAR_DECL, the function returns true, sets base_misaligned = true
and says the access is then aligned.
But when update_epilogue_loop_vinfo clears base_misaligned with the
assumption that the var had to have the alignment increased already,
the update of DECL_ALIGN doesn't happen anymore.
Now, I'd think this base_alignment = false was needed before
r10-4030-gd2db7f7901 change was committed where it incorrectly
overwrote DECL_ALIGN even if it was already larger, rather than
just always increasing it. But with that change in, it doesn't
make sense to me anymore.
Note, the testcase is latent on the trunk, but reproduces on the 13
branch.
2024-04-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114566
* tree-vect-loop.c (update_epilogue_loop_vinfo): Don't clear
base_misaligned.
* gcc.target/i386/avx512f-pr114566.c: New test.
(cherry picked from commit
a844095e17c1a5aada1364c6f6eaade87ead463c)
Jakub Jelinek [Fri, 5 Apr 2024 07:31:28 +0000 (09:31 +0200)]
c++: Fix ICE with weird copy assignment operator [PR114572]
While ctors/dtors don't return anything (undeclared void or this pointer
on arm) and copy assignment operators normally return a reference to *this,
it isn't invalid to return uselessly some class object which might need
destructing, but the OpenMP clause handling code wasn't expecting that.
The following patch fixes that.
2024-04-05 Jakub Jelinek <jakub@redhat.com>
PR c++/114572
* cp-gimplify.c (cxx_omp_clause_apply_fn): Call build_cplus_new
on build_call_a result if it has class type.
* testsuite/libgomp.c++/pr114572.C: New test.
(cherry picked from commit
592536eb3c0a97a55b1019ff0216ef77e6ca847e)
Jakub Jelinek [Thu, 4 Apr 2024 08:47:52 +0000 (10:47 +0200)]
fold-const: Handle NON_LVALUE_EXPR in native_encode_initializer [PR114537]
The following testcase is incorrectly rejected. The problem is that
for bit-fields native_encode_initializer expects the corresponding
CONSTRUCTOR elt value must be INTEGER_CST, but that isn't the case
here, it is wrapped into NON_LVALUE_EXPR by maybe_wrap_with_location.
We could STRIP_ANY_LOCATION_WRAPPER as well, but as all we are looking for
is INTEGER_CST inside, just looking through NON_LVALUE_EXPR seems easier.
2024-04-04 Jakub Jelinek <jakub@redhat.com>
PR c++/114537
* fold-const.c (native_encode_initializer): Look through
NON_LVALUE_EXPR if val is INTEGER_CST.
* g++.dg/cpp2a/bit-cast16.C: New test.
(cherry picked from commit
1baec8deb014b8a7da58879a407a4c00cdeb5a09)
Jakub Jelinek [Wed, 3 Apr 2024 08:02:35 +0000 (10:02 +0200)]
libquadmath: Don't assume the storage for __float128 arguments is aligned [PR114533]
With the register_printf_type/register_printf_modifier/register_printf_specifier
APIs the C library is just told the size of the argument and is provided with
a callback to fetch the argument from va_list using va_arg into C library provided
memory. The C library isn't told what alignment requirement it has, but we were
using direct load of a __float128 value from that memory which assumes
__alignof (__float128) alignment.
The following patch fixes that by using memcpy instead.
I haven't been able to reproduce an actual crash, tried
#include <quadmath.h>
#include <stdlib.h>
#include <stdio.h>
int main ()
{
__float128 r;
int prec = 20;
int width = 46;
char buf[128];
r = 2.0q;
r = sqrtq (r);
int n = quadmath_snprintf (buf, sizeof buf, "%+-#*.20Qe", width, r);
if ((size_t) n < sizeof buf)
printf ("%s\n", buf);
/* Prints: +1.
41421356237309504880e+00 */
quadmath_snprintf (buf, sizeof buf, "%Qa", r);
if ((size_t) n < sizeof buf)
printf ("%s\n", buf);
/* Prints: 0x1.6a09e667f3bcc908b2fb1366ea96p+0 */
n = quadmath_snprintf (NULL, 0, "%+-#46.*Qe", prec, r);
if (n > -1)
{
char *str = malloc (n + 1);
if (str)
{
quadmath_snprintf (str, n + 1, "%+-#46.*Qe", prec, r);
printf ("%s\n", str);
/* Prints: +1.
41421356237309504880e+00 */
}
free (str);
}
printf ("%+-#*.20Qe\n", width, r);
printf ("%Qa\n", r);
printf ("%+-#46.*Qe\n", prec, r);
printf ("%d %Qe %d %Qe %d %Qe\n", 1, r, 2, r, 3, r);
return 0;
}
In any case, I think memcpy for loading from it is right.
2024-04-03 Simon Chopin <simon.chopin@canonical.com>
Jakub Jelinek <jakub@redhat.com>
PR libquadmath/114533
* printf/printf_fp.c (__quadmath_printf_fp): Use memcpy to copy
__float128 out of args.
* printf/printf_fphex.c (__quadmath_printf_fphex): Likewise.
Signed-off-by: Simon Chopin <simon.chopin@canonical.com>
(cherry picked from commit
8455d6f6cd43b7b143ab9ee19437452fceba9cc9)
Jakub Jelinek [Thu, 14 Mar 2024 16:48:30 +0000 (17:48 +0100)]
icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]
AFAIK we have no code in LTO streaming to stream out or in
SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
and let vrp1 and alias analysis after IPA recompute that. There is
just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
we saved and propagated, but that is after streaming in bodies for the
post IPA optimizations.
Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
earlier in many cases (er.g. evrp and early alias analysis but other spots
too), but IPA ICF is ignoring the ranges and points-to details when
comparing the bodies. I think ignoring that is just fine, that is
effectively what we do for LTO where we throw that information away
before the analysis, and not ignoring it could lead to fewer ICF merging
possibilities.
So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
just isn't there on SSA_NAMEs in functions into which other functions have
been ICFed, and for non-LTO throws that information away (which matches the
LTO behavior).
Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
vector (just one of the 2) on successful sem_function::equals on the
sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
leader map to SSA_NAMEs in the other function) and use that vector
to union the ranges in sem_function::merge. I can implement that for
comparison, but wanted to post this first if there is an agreement on
doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
into account. I think we can compare SSA_NAME_RANGE_INFO, but have
no idea how to try to compare points to info. And I think it will result
in less effective ICF for non-LTO vs. LTO unnecessarily.
2024-03-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113907
* ipa-icf.c (sem_item_optimizer::merge_classes): Reset
SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
functions.
* gcc.dg/pr113907-1.c: New test.
(cherry picked from commit
7580e39452b65ab5fb5a06f3f1ad7d59720269b5)
Jakub Jelinek [Thu, 14 Mar 2024 13:09:20 +0000 (14:09 +0100)]
aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]
The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap<mode> expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap<mode>_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.
So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.
2024-03-14 Jakub Jelinek <jakub@redhat.com>
PR target/114310
* config/aarch64/aarch64.c (aarch64_expand_compare_and_swap): For
TImode force newval into a register.
* gcc.dg/pr114310.c: New test.
(cherry picked from commit
9349aefa1df7ae36714b7b9f426ad46e314892d1)
Jakub Jelinek [Thu, 7 Mar 2024 09:02:49 +0000 (10:02 +0100)]
bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]
The following testcase ICEs, because fix_crossing_unconditional_branches
thinks that asm goto is an unconditional jump and removes it, replacing it
with unconditional jump to one of the labels.
This doesn't happen on x86 because the function in question isn't invoked
there at all:
/* If the architecture does not have unconditional branches that
can span all of memory, convert crossing unconditional branches
into indirect jumps. Since adding an indirect jump also adds
a new register usage, update the register usage information as
well. */
if (!HAS_LONG_UNCOND_BRANCH)
fix_crossing_unconditional_branches ();
I think for the asm goto case, for the non-fallthru edge if any we should
handle it like any other fallthru (and fix_crossing_unconditional_branches
doesn't really deal with those, it only looks at explicit branches at the
end of bbs and we are in cfglayout mode at that point) and for the labels
we just pass the labels as immediates to the assembly and it is up to the
user to figure out how to store them/branch to them or whatever they want to
do.
So, the following patch fixes this by not treating asm goto as a simple
unconditional jump.
I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
somewhere else, where outofcfglayout or whatever should actually create
those indirect jumps on the crossing edges instead of adding normal
unconditional jumps, I see e.g. in
__attribute__((cold)) int bar (char *);
__attribute__((hot)) int baz (char *);
void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; l1: baz (""); }
void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: return; l1: bar (""); goto l2; }
with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
just b .L? jumps which I believe are +-32MB, so if .text is larger than
32MB, it could fail to link, but this patch doesn't address that.
2024-03-07 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/110079
* bb-reorder.c (fix_crossing_unconditional_branches): Don't adjust
asm goto.
* gcc.dg/pr110079.c: New test.
(cherry picked from commit
b209d905f5ce1fa9d76ce634fd54245ff340960b)
Jakub Jelinek [Mon, 4 Mar 2024 09:04:19 +0000 (10:04 +0100)]
i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]
The Intel extended format has the various weird number categories,
pseudo denormals, pseudo infinities, pseudo NaNs and unnormals.
Those are not representable in the GCC real_value and so neither
GIMPLE nor RTX VIEW_CONVERT_EXPR/SUBREG folding folds those into
constants.
As can be seen on the following testcase, because it isn't folded
(since GCC 12, before that we were folding it) we can end up with
a SUBREG of a CONST_VECTOR or similar constant, which isn't valid
general_operand, so we ICE during vregs pass trying to recognize
the move instruction.
Initially I thought it is a middle-end bug, the movxf instruction
has general_operand predicate, but the middle-end certainly never
tests that predicate, seems moves are special optabs.
And looking at other mov optabs, e.g. for vector modes the i386
patterns use nonimmediate_operand predicate on the input, yet
ix86_expand_vector_move deals with CONSTANT_P and SUBREG of CONSTANT_P
arguments which if the predicate was checked couldn't ever make it through.
The following patch handles this case similarly to the
ix86_expand_vector_move's SUBREG of CONSTANT_P case, does it just for XFmode
because I believe that is the only mode that needs it from the scalar ones,
others should just be folded.
2024-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/114184
* config/i386/i386-expand.c (ix86_expand_move): If XFmode op1
is SUBREG of CONSTANT_P, force the SUBREG_REG into memory or
register.
* gcc.target/i386/pr114184.c: New test.
(cherry picked from commit
ea1c16f95b8fbaba4a7f3663ff9933ebedfb92a5)
Jakub Jelinek [Thu, 22 Feb 2024 18:32:02 +0000 (19:32 +0100)]
c: Handle scoped attributes in __has*attribute and scoped attribute parsing changes in -std=c11 etc. modes [PR114007]
We aren't able to parse __has_attribute (vendor::attr) (and __has_c_attribute
and __has_cpp_attribute) in strict C < C23 modes. While in -std=gnu* modes
or in -std=c23 there is CPP_SCOPE token, in -std=c* (except for -std=c23)
there are is just a pair of CPP_COLON tokens.
The c-lex.cc hunk adds support for that, but always returns 0 in that case
unlike the GCC 14+ version.
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR c/114007
gcc/c-family/
* c-lex.c (c_common_has_attribute): Parse 2 CPP_COLONs with
the first one with COLON_SCOPE flag the same as CPP_SCOPE but
ensure 0 is returned then.
gcc/testsuite/
* gcc.dg/c23-attr-syntax-8.c: New test.
libcpp/
* include/cpplib.h (COLON_SCOPE): Define to PURE_ZERO.
* lex.c (_cpp_lex_direct): When lexing CPP_COLON with another
colon after it, if !CPP_OPTION (pfile, scope) set COLON_SCOPE
flag on the first CPP_COLON token.
(cherry picked from commit
37127ed975e09813eaa2d1cf1062055fce45dd16)
Jakub Jelinek [Mon, 12 Feb 2024 19:45:01 +0000 (20:45 +0100)]
attribs: Don't canonicalize lookup_scoped_attribute_spec argument [PR113674]
The C and C++ FEs when parsing attributes already canonicalize them
(i.e. if they start with __ and end with __ substrings, we remove those).
lookup_attribute already verifies in gcc_assert that the first character
of name is not an underscore, and even lookup_scoped_attribute_spec doesn't
attempt to canonicalize the namespace it is passed. But for some historic
reason it was canonicalizing the name argument, which misbehaves when
an attribute starts with ____ and ends with ____.
I believe it is just wrong to try to canonicalize
lookup_scope_attribute_spec name attribute, it should have been
canonicalized already, in other spots where it is called it is already
canonicalized before.
2024-02-12 Jakub Jelinek <jakub@redhat.com>
PR c++/113674
* attribs.c (extract_attribute_substring): Remove.
(lookup_scoped_attribute_spec): Don't call it.
* c-lex.c (c_common_has_attribute): Call canonicalize_attr_name.
* c-c++-common/Wattributes-3.c: New test.
(cherry picked from commit
b42e978f29b33071addff6d7bb8bcdb11d176606)
Jakub Jelinek [Tue, 30 Jan 2024 08:58:05 +0000 (09:58 +0100)]
tree-ssa-strlen: Fix up handle_store [PR113603]
Since
r10-2101-gb631bdb3c16e85f35d3 handle_store uses
count_nonzero_bytes{,_addr} which (more recently limited to statements
with the same vuse) can walk earlier statements feeding the rhs
of the store and call get_stridx on it.
Unlike most of the other functions where get_stridx is called first on
rhs and only later on lhs, handle_store calls get_stridx on the lhs before
the count_nonzero_bytes* call and does some si->nonzero_bytes comparison
on it.
Now, strinfo structures are refcounted and it is important not to screw
it up.
What happens on the following testcase is that we call get_strinfo on the
destination idx's base (g), which returns a strinfo at that moment
with refcount of 2, one copy referenced in bb 2 final strinfos, one in bb 3
(the vector of strinfos was unshared from the dominator there because some
other strinfo was added) and finally we process a store in bb 6.
Now, count_nonzero_bytes is called and that sees &g[1] in a PHI and
calls get_stridx on it, which in turn calls get_stridx_plus_constant
because &g + 1 address doesn't have stridx yet. This creates a new
strinfo for it:
si = new_strinfo (ptr, idx, build_int_cst (size_type_node, nonzero_chars),
basesi->full_string_p);
set_strinfo (idx, si);
and the latter call, because it is the first one in bb 6 that needs it,
unshares the stridx_to_strinfo vector (so refcount of the g strinfo becomes
3).
Now, get_stridx_plus_constant needs to chain the new strinfo of &g[1] in
between the related strinfos, so after the g record. Because the strinfo
is now shared between the current bb and 2 other bbs, it needs to
unshare_strinfo it (creating a new strinfo which can be modified as a copy
of the old one, decrementing refcount of the old shared one and setting
refcount of the new one to 1):
if (strinfo *nextsi = get_strinfo (chainsi->next))
{
nextsi = unshare_strinfo (nextsi);
si->next = nextsi->idx;
nextsi->prev = idx;
}
chainsi = unshare_strinfo (chainsi);
if (chainsi->first == 0)
chainsi->first = chainsi->idx;
chainsi->next = idx;
Now, the bug is that the caller of this a couple of frames above,
handle_store, holds on a pointer to this g strinfo (but doesn't know
about the unsharing, so the pointer is to the old strinfo with refcount
of 2), and later needs to update it, so it
si = unshare_strinfo (si);
and modifies some fields in it.
This creates a new strinfo (with refcount of 1 which is stored into
the vector of the current bb) based on the old strinfo for g and
decrements refcount of the old one to 1. So, now we are in inconsistent
state, because the old strinfo for g is referenced in bb 2 and bb 3
vectors, but has just refcount of 1, and then have one strinfo (the one
created by unshare_strinfo (chainsi) in get_stridx_plus_constant) which
has refcount of 1 but isn't referenced from anywhere anymore.
Later on when we free one of the bb 2 or bb 3 vectors (forgot which)
that decrements refcount from 1 to 0 and poisons the strinfo/returns it to
the pool, but then maybe_invalidate when looking at the other bb's pointer
to it ICEs.
The following patch fixes it by calling get_strinfo again, it is guaranteed
to return non-NULL, but could be an unshared copy instead of the originally
fetched shared one.
I believe we only need to do this refetching for the case where get_strinfo
is called on the lhs before get_stridx is called on other operands, because
we should be always modifying (apart from the chaining changes) the strinfo
for the destination of the statements, not other strinfos just consumed in
there.
2024-01-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113603
* tree-ssa-strlen.c (strlen_pass::handle_store): After
count_nonzero_bytes call refetch si using get_strinfo in case it
has been unshared in the meantime.
* gcc.c-torture/compile/pr113603.c: New test.
(cherry picked from commit
d7250c1e02478586a0cd6d5cb67bf4d17249a7e7)
Jakub Jelinek [Thu, 25 Jan 2024 08:10:08 +0000 (09:10 +0100)]
docs: Fix 2 typos
When looking into PR113572, I've noticed a typo in VECTOR_CST documentation
and grep found pasto of it elsewhere.
2024-01-25 Jakub Jelinek <jakub@redhat.com>
* doc/generic.texi (VECTOR_CST): Fix typo - petterns -> patterns.
* doc/rtl.texi (CONST_VECTOR): Likewise.
(cherry picked from commit
36c1384038f3b9f01124f0fc38bb3c930b1cbe8a)
Jakub Jelinek [Thu, 18 Jan 2024 09:21:12 +0000 (10:21 +0100)]
i386: Add -masm=intel profiling support [PR113122]
x86_function_profiler emits assembly directly into file and only emits
AT&T syntax. The following patch adjusts it to emit MASM syntax
if -masm=intel.
As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects.
I've tested using
for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do
./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1;
./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2;
objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2;
diff -up /tmp/1 /tmp/2; done
that the emitted sequences are identical after assembly.
2024-01-18 Jakub Jelinek <jakub@redhat.com>
PR target/113122
* config/i386/i386.c (x86_function_profiler): Add -masm=intel
support. Add missing space after , in emitted assembly in some
cases. Formatting fixes.
* gcc.target/i386/pr113122-1.c: New test.
* gcc.target/i386/pr113122-2.c: New test.
* gcc.target/i386/pr113122-3.c: New test.
* gcc.target/i386/pr113122-4.c: New test.
(cherry picked from commit
d4a2d91b46b2cf758b249a4545e34287e90da23b)
Jakub Jelinek [Tue, 16 Jan 2024 10:49:34 +0000 (11:49 +0100)]
cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]
The following patch adds a quick workaround to bugs in VAR_DECL
partitioning.
The problem is that there is no dependency between ADDR_EXPRs of local
decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
(including ivopts integral variants thereof), which can break
add_scope_conflicts discovery of what variables are actually used
in certain region.
E.g. we can have
ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
...
uses of ivtmp.40_3
...
bitint.6 ={v} {CLOBBER(eos)};
...
ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
...
uses of ivtmp.28_43
before VN (such as dom3), which the add_scope_conflicts code identifies as 2
independent uses of bitint.6 variable (which is correct), but then VN
determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
even in the second region; at that point add_scope_conflict thinks the
bitint.6 variable is not used in that region anymore.
The following patch does a simple single def-stmt check for such ADDR_EXPRs
(rather than say trying to do a full propagation of what SSA_NAMEs can
contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs.
In addition to this patch I've used the attached one to gather statistics
on the total size of all variable partitions in a function and seems besides
the new testcases nothing is really affected compared to no patch (I've
actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so
it looks the same except that it never triggers). The comparison wasn't
perfect because I've only gathered BITS_PER_WORD, main_input_filename (did
some replacement of build directories and /tmp/ccXXXXXX names of LTO to make
it more similar between the two bootstraps/regtests), current_function_name
and the total size of all variable partitions if any, because I didn't
record e.g. the optimization options and so e.g. torture tests which iterate
over options could have different partition sizes even in one compiler when
BITS_PER_WORD, main_input_filename and current_function_name are all equal.
So had to write an awk script to check if the first triple in the second
build appeared in the first one and the quadruple in the second build
appeared in the first one too, otherwise print result and that only
triggered in the new tests.
Also, the cc1plus binary according to objdump -dr is identical between the
two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.
2024-01-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113372
PR middle-end/90348
PR middle-end/110115
PR middle-end/111422
* cfgexpand.c (add_scope_conflicts_2): New function.
(add_scope_conflicts_1): Use it.
* gcc.c-torture/execute/pr90348.c: New test.
* gcc.c-torture/execute/pr110115.c: New test.
* gcc.c-torture/execute/pr111422.c: New test.
(cherry picked from commit
1251d3957de04dc9b023a23c09400217e13deadb)
Jakub Jelinek [Wed, 10 Jan 2024 12:29:47 +0000 (13:29 +0100)]
libgomp: Fix up FLOCK fallback handling [PR113192]
My earlier change broke Solaris testing, because @FLOCK@ isn't substituted
just into libgomp/Makefile where it worked, but also the
testsuite/libgomp-site-extra.exp file where Make variables aren't present
and can't be substituted.
The following patch instead computes the absolute srcdir path and uses it
for FLOCK.
2024-01-10 Jakub Jelinek <jakub@redhat.com>
PR libgomp/113192
* configure.ac (FLOCK): Use $libgomp_abs_srcdir/testsuite/flock
instead of \$(abs_top_srcdir)/testsuite/flock.
* configure: Regenerated.
(cherry picked from commit
2fb3ee3ee82874e160309344bc3e52afeed8f26a)
Jakub Jelinek [Tue, 9 Jan 2024 14:37:04 +0000 (15:37 +0100)]
c-family: copy attribute diagnostic fixes [PR113262]
The copy attributes is allowed on decls as well as types and even has
checks whether decl (set to *node) is DECL_P or TYPE_P, but for diagnostics
unconditionally uses DECL_SOURCE_LOCATION (decl), which obviously only works
if it applies to a decl.
2024-01-09 Jakub Jelinek <jakub@redhat.com>
PR c/113262
* c-attribs.c (handle_copy_attribute): Don't use
DECL_SOURCE_LOCATION (decl) if decl is not DECL_P, use input_location
instead. Formatting fixes.
* gcc.dg/pr113262.c: New test.
(cherry picked from commit
c9fc7f398e8b330ff12ec8a29bfa058b6daf6624)
GCC Administrator [Thu, 20 Jun 2024 00:18:11 +0000 (00:18 +0000)]
Daily bump.
GCC Administrator [Wed, 19 Jun 2024 00:18:42 +0000 (00:18 +0000)]
Daily bump.
GCC Administrator [Tue, 18 Jun 2024 00:17:44 +0000 (00:17 +0000)]
Daily bump.
This page took 0.099462 seconds and 5 git commands to generate.