gcc.gnu.org Git - gcc.git/log

Work in progress for refactoring simd intrinsic

aarch64: Add support for fp8fma instructions

The AArch64 FEAT_FP8FMA extension introduces instructions for
multiply-add of vectors.

This patch introduces the following intrinsics:
1. {vmlalbq|vmlaltq}_f16_mf8_fpm.
2. {vmlalbq|vmlaltq}_lane{q}_f16_mf8_fpm.
3. {vmlallbbq|vmlallbtq|vmlalltbq|vmlallttq}_f32_mf8_fpm.
4. {vmlallbbq|vmlallbtq|vmlalltbq|vmlallttq}_lane{q}_f32_mf8_fpm.

It introduces the fp8fma flag.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc
(check_simd_lane_bounds): Add support for new unspecs.
(aarch64_expand_pragma_builtins): Add support for new unspecs.
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): New flags.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): New flags.
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_FMA_FPM): Macro to declare fma intrinsics.
(REQUIRED_EXTENSIONS): Define to declare functions behind
command line flags.
* config/aarch64/aarch64-simd.md:
(@aarch64_<fpm_uns_op><VQ_HSF:mode><VQ_HSF:mode><V16QI_ONLY:mode><V16QI_ONLY:mode>): Instruction pattern for fma intrinsics.
(@aarch64_<fpm_uns_op><VQ_HSF:mode><VQ_HSF:mode><V16QI_ONLY:mode><VB:mode><SI_ONLY:mode>): Instruction pattern for fma intrinsics with lane.
* config/aarch64/aarch64.h
(TARGET_FP8FMA): New flag for fp8fma instructions.
* config/aarch64/iterators.md: New attributes and iterators.
* doc/invoke.texi: New flag for fp8fma instructions.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/fma_fpm.c: New test.

aarch64: Add support for fp8dot2 and fp8dot4

The AArch64 FEAT_FP8DOT2 and FEAT_FP8DOT4 extensions introduce
instructions for dot product of vectors.

This patch introduces the following intrinsics:
1. vdot{q}_{fp16|fp32}_mf8_fpm.
2. vdot{q}_lane{q}_{fp16|fp32}_mf8_fpm.

It introduces two flags: fp8dot2 and fp8dot4.

We had to add space for another type in aarch64_pragma_builtins_data
struct. The macros were updated to reflect that.

We added a new aarch64_builtin_signature variant, quaternary, and added
support for it in the functions aarch64_fntype and
aarch64_expand_pragma_builtin.

We added a new namespace, function_checker, to implement range checks
for functions defined using the new pragma approach. The old intrinsic
range checks will continue to work. All the new AdvSIMD intrinsics we
define that need lane checks should be using the function in this
namespace to implement the checks.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc
(ENTRY): Change to handle extra type.
(enum class): Added new variant.
(struct aarch64_pragma_builtins_data): Add support for another
type.
(aarch64_get_number_of_args): Handle new signature.
(require_integer_constant): New function to check whether the
operand is an integer constant.
(require_immediate_range): New function to validate index
ranges.
(check_simd_lane_bounds): New function to validate index
operands.
(aarch64_general_check_builtin_call): Call
function_checker::check_simd_lane_bounds.
(aarch64_expand_pragma_builtin): Handle new signature.
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): New flags.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): New flags.
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_BINARY): Change to handle extra type.
(ENTRY_BINARY_FPM): Change to handle extra type.
(ENTRY_UNARY_FPM): Change to handle extra type.
(ENTRY_TERNARY_FPM_LANE): Macro to declare fpm ternary with
lane intrinsics.
(ENTRY_VDOT_FPM): Macro to declare vdot intrinsics.
(REQUIRED_EXTENSIONS): Define to declare functions behind
command line flags.
* config/aarch64/aarch64-simd.md:
(@aarch64_<fpm_uns_op><VHF:mode><VHF:mode><VB:mode><VB:mode>):
Instruction pattern for vdot2 intrinsics.
(@aarch64_<fpm_uns_op><VHF:mode><VHF:mode><VB:mode><VB2:mode><SI_ONLY:mode>):
Instruction pattern for vdot2 intrinsics with lane.
(@aarch64_<fpm_uns_op><VDQSF:mode><VDQSF:mode><VB:mode><VB:mode>):
Instruction pattern for vdot4 intrinsics.
(@aarch64_<fpm_uns_op><VDQSF:mode><VDQSF:mode><VB:mode><VB2:mode><SI_ONLY:mode>):
Instruction pattern for vdot4 intrinsics with lane.
* config/aarch64/aarch64.h
(TARGET_FP8DOT2): New flag for fp8dot2 instructions.
(TARGET_FP8DOT4): New flag for fp8dot4 instructions.
* config/aarch64/iterators.md: New attributes and iterators.
* doc/invoke.texi: New flag for fp8dot2 and fp8dot4
instructions.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/vdot2_fpmdot.c: New test.
* gcc.target/aarch64/simd/vdot4_fpmdot.c: New test.

aarch64: Add support for fp8 convert and scale

The AArch64 FEAT_FP8 extension introduces instructions for conversion
and scaling.

This patch introduces the following intrinsics:
1. vcvt{1|2}_{bf16|high_bf16|low_bf16}_mf8_fpm.
2. vcvt{q}_mf8_f16_fpm.
3. vcvt_{high}_mf8_f32_fpm.
4. vscale{q}_{f16|f32|f64}.

We introduced two aarch64_builtin_signatures enum variants, unary and
ternary, and added support for these variants in the functions
aarch64_fntype and aarch64_expand_pragma_builtin.

We added new simd_types for integers (s32, s32q, and s64q) and for
floating points (f8 and f8q).

Because we added support for fp8 intrinsics here, we modified the check
in acle/fp8.c that was checking that __ARM_FEATURE_FP8 macro is not
defined.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc
(ENTRY): Modified to support uses_fpmr flag.
(enum class): New variants to support new signatures.
(struct aarch64_pragma_builtins_data): Add a new boolean field,
uses_fpmr.
(aarch64_get_number_of_args): Helper function used in
aarch64_fntype and aarch64_expand_pragma_builtin.
(aarch64_fntype): Handle new signatures.
(aarch64_expand_pragma_builtin): Handle new signatures.
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): New flag for FP8.
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_BINARY): Macro to declare binary intrinsics.
(ENTRY_TERNARY): Macro to declare ternary intrinsics.
(ENTRY_UNARY): Macro to declare unary intrinsics.
(ENTRY_VHSDF): Macro to declare binary intrinsics.
(ENTRY_VHSDF_VHSDI): Macro to declare binary intrinsics.
(REQUIRED_EXTENSIONS): Define to declare functions behind
command line flags.
* config/aarch64/aarch64-simd.md
(@aarch64_<fpm_unary_bf_uns_op><V8BF_ONLY:mode><VB:mode>): Unary
pattern.
(@aarch64_<fpm_unary_hf_uns_op><V8HF_ONLY:mode><VB:mode>): Unary
pattern.
(@aarch64_lower_<fpm_unary_bf_uns_op><V8BF_ONLY:mode><V16QI_ONLY:mode>):
Unary pattern.
(@aarch64_lower_<fpm_unary_hf_uns_op><V8HF_ONLY:mode><V16QI_ONLY:mode>):
Unary pattern.
(@aarch64<fpm_uns_op><VB:mode><VCVTFPM:mode><VH_SF:mode>):
Binary pattern.
(@aarch64_<fpm_uns_op><V16QI_ONLY:mode><V8QI_ONLY:mode><V4SF_ONLY:mode><V4SF_ONLY:mode>):
Unary pattern.
(@aarch64_<fpm_uns_op><VHSDF:mode><VHSDI:mode>): Binary pattern.
* config/aarch64/iterators.md: New attributes and iterators.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/fp8.c: Remove check that fp8 feature
macro doesn't exist.
* gcc.target/aarch64/simd/scale_fpm.c: New test.
* gcc.target/aarch64/simd/vcvt_fpm.c: New test.

aarch64: Refactor infrastructure for advsimd intrinsics

This patch refactors the infrastructure for defining advsimd pragma
intrinsics, adding support for more flexible type and signature
handling in future SIMD extensions.

A new simd_type structure is introduced, which allows for consistent
mode and qualifier management across various advsimd operations.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (ENTRY): Modify to
include modes and qualifiers for simd_type structure.
(ENTRY_VHSDF): Move to aarch64-builtins.cc to decouple.
(struct simd_type): New structure for managing mode and
qualifier combinations for SIMD types.
(struct aarch64_pragma_builtins_data): Replace mode with
simd_type to support multiple argument types for intrinsics.
(aarch64_fntype): Modify to handle different shapes type.
(aarch64_expand_pragma_builtin): Modify to handle different
shapes type.

* config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY):
Move from aarch64-builtins.cc.
(ENTRY_VHSDF): Move from aarch64-builtins.cc.
(REQUIRED_EXTENSIONS): New macro.

i386: Fix cstorebf4 fp comparison operand [PR117495]

cstorebf4 uses comparison_operator for the BFmode compare, which is
incorrect when it calls ix86_expand_setcc directly, because that function
does not canonicalize the input comparison by swapping operands to
correct the compare code.
The original code without AVX10.2 calls emit_store_flag_force, which
in turn calls emit_store_flag_1 and recursively calls this expander
again with swapped operands and flag.
Therefore, we can avoid the redundant recursive call by changing
comparison_operator to ix86_fp_comparison_operator and calling
ix86_expand_setcc directly.

gcc/ChangeLog:

PR target/117495
* config/i386/i386.md (cstorebf4): Use ix86_fp_comparison_operator
and call ix86_expand_setcc directly.

gcc/testsuite/ChangeLog:

PR target/117495
* gcc.target/i386/pr117495.c: New test.

[PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector

error: unrecognizable insn:

(insn 35 34 36 2 (set (subreg:RVVM1SF (reg/v:RVVM1x4SF 142 [ _r ]) 0)
        (unspec:RVVM1SF [
                (const_vector:RVVM1SF repeat [
                        (const_double:SF 0.0 [0x0.0p+0])
                    ])
                (reg:DI 0 zero)
                (const_int 1 [0x1])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
            ] UNSPEC_TH_VWLDST)) -1
     (nil))
during RTL pass: mode_sw

PR target/116591

gcc/ChangeLog:

* config/riscv/vector.md: Add restriction to call pred_th_whole_mov.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr116591.c: New test.

libstdc++: Refactor std::hash specializations

This attempts to simplify and clean up our std::hash code. The primary
benefit is improved diagnostics for users when they do something wrong
involving std::hash or unordered containers. An additional benefit is
that for the unstable ABI (--enable-symvers=gnu-versioned-namespace) we
can reduce the memory footprint of several std::hash specializations.

In the current design, __hash_enum is a base class of the std::hash
primary template, but the partial specialization of __hash_enum for
non-enum types is disabled. This means that if a user forgets to
specialize std::hash for their class type (or forgets to use a custom
hash function for unordered containers) they get error messages about
std::__hash_enum not being constructible. This is confusing when there
is no enum type involved: why should users care about __hash_enum not
being constructible if they're not trying to hash enums?

This change makes the std::hash primary template only derive from
__hash_enum when the template argument type is an enum. Otherwise, it
derives directly from a new class template, __hash_not_enabled. This new
class template defines the deleted members that cause a given std::hash
specialization to be a disabled specialization (as per P0513R0). Now
when users try to use a disabled specialization, they get more
descriptive errors that mention __hash_not_enabled instead of
__hash_enum.
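
As a minimal sketch of what "disabled specialization" means from the user's side, the following checks use only the standard type traits. The NoHash and WithHash types are hypothetical, introduced purely for illustration; the internal names __hash_not_enabled and __hash_enum do not appear because they only show up in diagnostics.

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>

struct NoHash { int id; };     // hypothetical type with no std::hash specialization
struct WithHash { int id; };   // hypothetical type with a user-provided specialization

namespace std {
  template<>
  struct hash<WithHash> {
    size_t operator()(const WithHash& w) const noexcept
    { return hash<int>{}(w.id); }
  };
}

// A disabled specialization cannot be default constructed, copied, or assigned.
static_assert(!std::is_default_constructible_v<std::hash<NoHash>>);
static_assert(!std::is_copy_constructible_v<std::hash<NoHash>>);
static_assert(!std::is_copy_assignable_v<std::hash<NoHash>>);

// Enabled specializations behave like ordinary function objects.
static_assert(std::is_default_constructible_v<std::hash<WithHash>>);
static_assert(std::is_default_constructible_v<std::hash<int>>);
```

Trying to use std::hash<NoHash> in an unordered container is what triggers the diagnostics discussed above.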

Additionally, adjust __hash_base to remove the deprecated result_type
and argument_type typedefs for C++20 and later.

In the current code we use a __poison_hash base class in the std::hash
specializations for std::unique_ptr, std::optional, and std::variant.
The primary template of __poison_hash has deleted special members, which
is used to conditionally disable the derived std::hash specialization.
This can also result in confusing diagnostics, because seeing "poison"
in an enabled specialization is misleading. Only some uses of
__poison_hash actually "poison" anything, i.e. cause a specialization to
be disabled. In other cases it's just an empty base class that does
nothing.

This change removes __poison_hash and changes the std::hash
specializations that were using it to conditionally derive from
__hash_not_enabled instead. When the std::hash specialization is
enabled, there is no more __poison_hash base class. However, to preserve
the ABI properties of those std::hash specializations, we need to
replace __poison_hash with some other empty base class. This is needed
because in the current code std::hash<std::variant<int, const int>> has
two __poison_hash<int> base classes, which must have unique addresses,
so sizeof(std::hash<std::variant<int, const int>>) == 2. To preserve
this unfortunate property, a new __hash_empty_base class is used as a
base class to re-introduce duplicate base classes that increase the
class size. For the unstable ABI we don't use __hash_empty_base so the
std::hash<std::variant<T...>> specializations are always size 1, and
the class hierarchy is much simpler so will compile faster.
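
The size effect of duplicate empty bases can be reproduced without any library internals. In this sketch the names Empty, Layer, Single, and Duplicate are hypothetical stand-ins; the real hierarchy behind std::hash<std::variant<...>> is more involved.

```cpp
#include <cstddef>

struct Empty {};                            // stand-in for an empty base class

// Distinct wrapper types that each inherit the same empty base.
template<int N> struct Layer : Empty {};

struct Single : Layer<0> {};                // contains one Empty subobject
struct Duplicate : Layer<0>, Layer<1> {};   // contains two Empty subobjects

// Two subobjects of the same type must have distinct addresses, so the
// empty base optimization cannot place both Empty bases at offset zero.
static_assert(sizeof(Duplicate) >= 2);
static_assert(sizeof(Single) >= 1);
```

On common ABIs sizeof(Single) is 1 and sizeof(Duplicate) is 2, which mirrors the size of 2 quoted above for the variant case.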

Additionally, remove the result_type and argument_type typedefs from all
disabled specializations of std::hash for std::unique_ptr,
std::optional, and std::variant. Those typedefs are useless for disabled
specializations, and although the standard doesn't say they must *not*
be present for disabled specializations, it certainly only requires them
for enabled specializations. Finally, for C++20 the typedefs are also
removed from enabled specializations of std::hash for std::unique_ptr,
std::optional, and std::variant.

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document removal of nested types
from std::hash specializations.
* doc/html/manual/api.html: Regenerate.
* include/bits/functional_hash.h (__hash_base): Remove
deprecated nested types for C++20.
(__hash_empty_base): Define new class template.
(__is_hash_enabled_for): Define new variable template.
(__poison_hash): Remove.
(__hash_not_enabled): Define new class template.
(__hash_enum): Remove partial specialization for non-enums.
(hash): Derive from __hash_not_enabled for non-enums, instead of
__hash_enum.
* include/bits/unique_ptr.h (__uniq_ptr_hash): Derive from
__hash_base. Conditionally derive from __hash_empty_base.
(__uniq_ptr_hash<>): Remove disabled specialization.
(hash): Do not derive from __hash_base unconditionally.
Conditionally derive from either __uniq_ptr_hash or
__hash_not_enabled.
* include/std/optional (__optional_hash_call_base): Remove.
(__optional_hash): Define new class template.
(hash): Conditionally derive from either __optional_hash or
__hash_not_enabled. Remove nested typedefs.
* include/std/variant (_Base_dedup): Replace __poison_hash with
__hash_empty_base.
(__variant_hash_call_base_impl): Remove.
(__variant_hash): Define new class template.
(hash): Conditionally derive from either __variant_hash or
__hash_not_enabled. Remove nested typedefs.
* testsuite/20_util/optional/hash.cc: Check whether nested types
are present.
* testsuite/20_util/variant/hash.cc: Likewise.
* testsuite/20_util/optional/hash_abi.cc: New test.
* testsuite/20_util/unique_ptr/hash/abi.cc: New test.
* testsuite/20_util/unique_ptr/hash/types.cc: New test.
* testsuite/20_util/variant/hash_abi.cc: New test.

libstdc++: Add _Hashtable::_M_locate(const key_type&)

We have two overloads of _M_find_before_node but they have quite
different performance characteristics, which isn't necessarily obvious.

The original version, _M_find_before_node(bucket, key, hash_code), looks
only in the specified bucket, doing a linear search within that bucket
for an element that compares equal to the key. This is the typical fast
lookup for hash containers, assuming the load factor is low so that each
bucket isn't too large.

The newer _M_find_before_node(key) was added in r12-6272-ge3ef832a9e8d6a
and could be naively assumed to calculate the hash code and bucket for
key and then call the efficient _M_find_before_node(bkt, key, code)
function. But in fact it does a linear search of the entire container.
This is potentially very slow and should only be used for a suitably
small container, as determined by the __small_size_threshold() function.
We don't even have a comment pointing out this O(N) performance of the
newer overload.

Additionally, the newer overload is only ever used in exactly one place,
which would suggest it could just be removed. However there are several
places that do the linear search of the whole container with an explicit
loop each time.

This adds a new member function, _M_locate, and uses it to replace most
uses of _M_find_node and the loops doing linear searches. This new
member function does both forms of lookup, the linear search for small
sizes and the _M_find_node(bkt, key, code) lookup within a single
bucket. The new function returns a __location_type which is a struct
that contains a pointer to the first node matching the key (if such a
node is present), or the hash code and bucket index for the key. The
hash code and bucket index allow the caller to know where a new node
with that key should be inserted, for the cases where the lookup didn't
find a matching node.

The result struct actually contains a pointer to the node *before* the
one that was located, as that is needed for it to be useful in erase and
extract members. There is a member function that returns the found node,
i.e. _M_before->_M_nxt downcast to __node_ptr, which should be used in
most cases.
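
The lookup shape described above can be sketched with a toy container. Everything here is hypothetical (ToyIntSet, Location, the threshold of 4, and the per-bucket std::forward_list layout); it is not the _Hashtable implementation, which keeps all nodes in one list. It only illustrates a single locate function that does either a whole-container scan or a single-bucket lookup and returns the position before the match together with the hash code and bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <vector>

// Hypothetical toy container; names and layout are illustrative only.
class ToyIntSet {
public:
  using Bucket = std::forward_list<int>;

  struct Location {
    Bucket::iterator before;  // position before the match; valid only if found
    bool found = false;
    std::size_t code = 0;     // hash code; computed only when needed
    std::size_t bucket = 0;   // bucket where the key is, or would be inserted
  };

  explicit ToyIntSet(std::size_t nbuckets = 8) : buckets_(nbuckets) {}

  Location locate(int key) {
    Location loc;
    const bool small = size_ <= small_size_threshold;
    if (small) {
      // Small container: scan every element without hashing the key.
      for (std::size_t b = 0; b < buckets_.size(); ++b)
        if (search_bucket(b, key, loc))
          return loc;
    }
    // Not found by scanning, or too large to scan: compute code and bucket.
    loc.code = std::hash<int>{}(key);
    loc.bucket = loc.code % buckets_.size();
    if (!small)
      search_bucket(loc.bucket, key, loc);
    return loc;
  }

  bool insert(int key) {
    Location loc = locate(key);
    if (loc.found)
      return false;
    buckets_[loc.bucket].push_front(key);  // reuse the bucket from the lookup
    ++size_;
    return true;
  }

  bool erase(int key) {
    Location loc = locate(key);
    if (!loc.found)
      return false;
    buckets_[loc.bucket].erase_after(loc.before);  // needs the previous position
    --size_;
    return true;
  }

  bool contains(int key) { return locate(key).found; }
  std::size_t size() const { return size_; }

private:
  static constexpr std::size_t small_size_threshold = 4;

  bool search_bucket(std::size_t b, int key, Location& loc) {
    auto prev = buckets_[b].before_begin();
    for (auto it = buckets_[b].begin(); it != buckets_[b].end(); ++prev, ++it)
      if (*it == key) {
        loc.before = prev;
        loc.found = true;
        loc.bucket = b;
        return true;
      }
    return false;
  }

  std::vector<Bucket> buckets_;
  std::size_t size_ = 0;
};

inline void toy_set_example() {
  ToyIntSet s;
  for (int i = 0; i < 10; ++i)
    assert(s.insert(i));
  assert(!s.insert(3));            // duplicate is rejected
  assert(s.erase(7) && !s.contains(7));
  assert(s.size() == 9);
}
```

Both insert and erase go through the same locate call, which is the simplification the commit message describes for the real member functions.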

This new function greatly simplifies the functions that currently have
to do two kinds of lookup and explicitly check the current size against
the small size threshold.

Additionally, now that try_emplace is defined directly in _Hashtable
(not in _Insert_base) we can use _M_locate in there too, to speed up
some try_emplace calls. Previously it did not do the small-size linear
search.

It would be possible to add a function to get a __location_type from an
iterator, and then rewrite some functions like _M_erase and
_M_extract_node to take a __location_type parameter. While that might be
conceptually nice, it wouldn't really make the code any simpler or more
readable than it is now. That isn't done in this change.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (__location_type): New struct.
(_M_locate): New member function.
(_M_find_before_node(const key_type&)): Remove.
(_M_find_node): Move variable initialization into condition.
(_M_find_node_tr): Likewise.
(operator=(initializer_list<T>), try_emplace, _M_reinsert_node)
(_M_merge_unique, find, erase(const key_type&)): Use _M_locate
for lookup.

libstdc++: Simplify _Hashtable merge functions

I realised that _M_merge_unique and _M_merge_multi call extract(iter)
which then has to call _M_get_previous_node to iterate through the
bucket to find the node before the one iter points to. Since the merge
function is already iterating over the entire container, we had the
previous node a moment ago. Walking the whole bucket to find it again is
wasteful. We could just rewrite the loop in terms of node pointers
instead of iterators, and then call _M_extract_node directly. However,
this is only possible when the source container is the same type as the
destination, because otherwise we can't access the source's private
members (_M_before_begin, _M_begin, _M_extract_node etc.)

Add overloads of _M_merge_unique and _M_merge_multi that work with
source containers of the same type, to enable this optimization.

For both overloads of _M_merge_unique we can also remove the conditional
modifications to __n_elt and just consistently decrement it for every
element processed. Use a multiplier of one or zero that dictates whether
__n_elt is passed to _M_insert_unique_node or not. We can also remove
the repeated calls to size() and just keep track of the size in a local
variable.

Although _M_merge_unique and _M_merge_multi should be safe for
"self-merge", i.e. when doing c.merge(c), it's wasteful to search/insert
every element when we don't need to do anything. Add 'this == &source'
checks to the overloads taking an lvalue of the container's own type.
Because those checks aren't needed for the rvalue overloads, change
those to call the underlying _M_merge_xxx function directly instead of
going through the lvalue overload that checks the address.
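
From the caller's side the self-merge check looks roughly like the following. The merge_into helper is hypothetical and only mimics the address comparison; in the commit the check lives inside the containers' own merge members.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical free function that mirrors the 'this == &source' check.
template<typename Set>
void merge_into(Set& dst, Set& src) {
  if (&dst == &src)   // self-merge: nothing to transfer
    return;
  dst.merge(src);     // C++17 member; moves nodes whose keys dst lacks
}

inline void merge_example() {
  std::unordered_set<int> a{1, 2, 3}, b{3, 4};
  merge_into(a, b);
  assert(a.size() == 4);                     // 4 was transferred
  assert(b.size() == 1 && b.count(3) == 1);  // 3 stays, a already had it
  merge_into(a, a);                          // early exit, a is unchanged
  assert(a.size() == 4);
}
```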

I've also added more extensive tests for better coverage of the new
overloads added in this commit.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_M_merge_unique): Add overload for
merging from same type.
(_M_merge_unique<Compatible>): Simplify size tracking. Add
comment.
(_M_merge_multi): Add overload for merging from same type.
(_M_merge_multi<Compatible>): Add comment.
* include/bits/unordered_map.h (unordered_map::merge): Check for
self-merge in the lvalue overload. Call _M_merge_unique directly
for the rvalue overload.
(unordered_multimap::merge): Likewise.
* include/bits/unordered_set.h (unordered_set::merge): Likewise.
(unordered_multiset::merge): Likewise.
* testsuite/23_containers/unordered_map/modifiers/merge.cc:
Add more tests.
* testsuite/23_containers/unordered_multimap/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_set/modifiers/merge.cc:
Likewise.

libstdc++: Remove _Hashtable_base::_S_equals

This removes the overloaded _S_equals and _S_node_equals functions,
replacing them with 'if constexpr' in the handful of places they're
used.

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Hashtable_base::_S_equals):
Remove.
(_Hashtable_base::_S_node_equals): Remove.
(_Hashtable_base::_M_key_equals_tr): Fix inaccurate
static_assert string.
(_Hashtable_base::_M_equals, _Hashtable_base::_M_equals_tr): Use
'if constexpr' instead of _S_equals.
(_Hashtable_base::_M_node_equals): Use 'if constexpr' instead of
_S_node_equals.

libstdc++: Remove _Equality base class from _Hashtable

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Remove _Equality base
class.
(_Hashtable::_M_equal): Define equality comparison here instead
of in _Equality::_M_equal.
* include/bits/hashtable_policy.h (_Equality): Remove.

libstdc++: Remove _Insert base class from _Hashtable

There's no reason to have a separate base class defining the insert
member functions now. They can all be moved into the _Hashtable class,
which simplifies them slightly.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Remove inheritance from
__detail::_Insert and move its members into _Hashtable.
* include/bits/hashtable_policy.h (__detail::_Insert): Remove.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>

libstdc++: Use RAII in _Hashtable

Use scoped guard types to clean up if an exception is thrown. This
allows some try-catch blocks to be removed.
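
A scoped guard of this kind can be sketched as follows. ClearGuard and assign_all are hypothetical names, and std::vector stands in for the hashtable's storage; the point is only that the destructor performs the cleanup that a catch block would otherwise do.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical guard: clears the target unless release() was called.
struct ClearGuard {
  std::vector<int>* target;
  ~ClearGuard() { if (target) target->clear(); }
  void release() { target = nullptr; }
};

// Copies src into dst, throwing if the (illustrative) poison value is seen.
inline void assign_all(std::vector<int>& dst, const std::vector<int>& src,
                       int poison) {
  dst.clear();
  ClearGuard guard{&dst};   // cleanup runs automatically on any exception
  for (int v : src) {
    if (v == poison)
      throw std::runtime_error("copy failed");
    dst.push_back(v);
  }
  guard.release();          // success: keep the copied elements
}

inline void guard_example() {
  std::vector<int> dst;
  assign_all(dst, {1, 2, 3}, 0);
  assert(dst.size() == 3);
  bool threw = false;
  try {
    assign_all(dst, {1, 0, 3}, 0);
  } catch (const std::runtime_error&) {
    threw = true;
  }
  assert(threw && dst.empty());  // the guard cleared the partial copy
}
```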

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (operator=(const _Hashtable&)): Use
RAII instead of try-catch.
(_M_assign(_Ht&&, _NodeGenerator&)): Likewise.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>

libstdc++: Replace _Hashtable::__fwd_value_for with cast

We can just use a cast to the appropriate type instead of calling a
function to do it. This gives the compiler less work to compile and
optimize, and at -O0 avoids a function call per element.
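
The idea of forwarding each element with a cast, rather than through a helper function, might look like this outside the library. The transfer_elements function and its type aliases are hypothetical; it assumes the source is either a non-const rvalue or any lvalue.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Copies elements from an lvalue source and moves them from an rvalue source.
template<typename Src>
auto transfer_elements(Src&& src) {
  using container = std::remove_cv_t<std::remove_reference_t<Src>>;
  using value_type = typename container::value_type;
  // The cast target plays the role a forwarding helper would otherwise play.
  using fwd_type = std::conditional_t<std::is_lvalue_reference_v<Src>,
                                      const value_type&, value_type&&>;
  std::vector<value_type> out;
  for (auto& elem : src)
    out.push_back(static_cast<fwd_type>(elem));
  return out;
}

inline void transfer_example() {
  std::vector<std::string> names{"alpha", "beta"};
  auto copied = transfer_elements(names);       // lvalue: elements are copied
  assert(copied.size() == 2 && names[0] == "alpha");

  std::vector<std::unique_ptr<int>> ptrs;
  ptrs.push_back(std::make_unique<int>(5));
  auto moved = transfer_elements(std::move(ptrs));  // rvalue: elements are moved
  assert(*moved[0] == 5 && ptrs[0] == nullptr);
}
```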

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::__fwd_value_for):
Remove.
(_Hashtable::_M_assign): Use static_cast instead of
__fwd_value_for.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>

libstdc++: Add _Hashtable::_M_assign for the common case

This adds a convenient _M_assign overload for the common case where the
node generator is the _AllocNode type. Only two places need to call
_M_assign with a _ReuseOrAllocNode node generator, so all the other
calls to _M_assign can use the new overload instead of manually
constructing a node generator.

The _AllocNode::operator(Args&&...) function doesn't need to be a
variadic template. It is only ever called with a single argument of type
const value_type& or value_type&&, so could be simplified. That isn't
done in this commit.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Remove typedefs for
node generators.
(_Hashtable::_M_assign(_Ht&&)): Add new overload.
(_Hashtable::operator=(initializer_list<value_type>)): Add local
typedef for node generator.
(_Hashtable::_M_assign_elements): Likewise.
(_Hashtable::operator=(const _Hashtable&)): Use new _M_assign
overload.
(_Hashtable(const _Hashtable&)): Likewise.
(_Hashtable(const _Hashtable&, const allocator_type&)):
Likewise.
(_Hashtable(_Hashtable&&, __node_alloc_type&&, false_type)):
Likewise.
* include/bits/hashtable_policy.h (_Insert): Remove typedef for
node generator.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>

libstdc++: Refactor Hashtable erasure

This reworks the internal member functions for erasure from
unordered containers, similarly to the earlier commit doing it for
insertion.

Instead of multiple overloads of _M_erase which are selected via tag
dispatching, the erase(const key_type&) member can use 'if constexpr' to
choose an appropriate implementation (returning after erasing a single
element for unique keys, or continuing to erase all equivalent elements
for non-unique keys).
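
A small sketch of choosing the erase strategy with 'if constexpr' instead of tag-dispatched overloads. The erase_key function is hypothetical and works on a std::list rather than on hashtable nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// UniqueKeys selects the strategy at compile time; the branch that is not
// taken is discarded instead of living in a separate overload.
template<bool UniqueKeys>
std::size_t erase_key(std::list<int>& elems, int key) {
  std::size_t erased = 0;
  for (auto it = elems.begin(); it != elems.end(); ) {
    if (*it != key) {
      ++it;
      continue;
    }
    it = elems.erase(it);
    ++erased;
    if constexpr (UniqueKeys)
      return erased;  // unique keys: at most one match, stop here
  }
  return erased;      // non-unique keys: all equivalent elements erased
}

inline void erase_key_example() {
  std::list<int> multi{1, 2, 2, 3};
  assert(erase_key<false>(multi, 2) == 2 && multi.size() == 2);
  std::list<int> uniq{1, 2, 3};
  assert(erase_key<true>(uniq, 2) == 1 && uniq.size() == 2);
}
```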

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::_M_erase): Remove
overloads for erasing by key, moving logic to ...
(_Hashtable::erase): ... here.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>

libstdc++: Refactor Hashtable insertion [PR115285]

This completely reworks the internal member functions for insertion into
unordered containers. Currently we use a mixture of tag dispatching (for
unique vs non-unique keys) and template specialization (for maps vs
sets) to correctly implement insert and emplace members.

This removes a lot of complexity and indirection by using 'if constexpr'
to select the appropriate member function to call.

Previously there were four overloads of _M_emplace, for unique keys and
non-unique keys, and for hinted insertion and non-hinted. However two of
those were redundant, because we always ignore the hint for unique keys
and always use a hint for non-unique keys. Those four overloads have
been replaced by two new non-overloaded function templates:
_M_emplace_uniq and _M_emplace_multi. The former is for unique keys and
doesn't take a hint, and the latter is for non-unique keys and takes a
hint.

In the body of _M_emplace_uniq there are special cases to handle
emplacing values from which a key_type can be extracted directly. This
means we don't need to allocate a node and construct a value_type that
might be discarded if an equivalent key is already present. The special
case applies when emplacing the key_type into std::unordered_set, or
when emplacing std::pair<cv key_type, X> into std::unordered_map, or
when emplacing two values into std::unordered_map where the first has
type cv key_type. For the std::unordered_set case, obviously if we're
inserting something that's already the key_type, we can look it up
directly. For the std::unordered_map cases, we know that the inserted
std::pair<const key_type, mapped_type> would have its first element
initialized from first member of a std::pair value, or from the first of
two values, so if that is a key_type, we can look that up directly.
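
The three special cases can be expressed as compile-time checks on the emplace argument types. The traits below (bare_t, set_key_in_args, is_pair_with_key, map_key_in_args) are hypothetical and are not the helpers used inside _M_emplace_uniq; they only restate the conditions listed above.

```cpp
#include <string>
#include <type_traits>
#include <utility>

template<typename T>
using bare_t = std::remove_cv_t<std::remove_reference_t<T>>;

// Set case: a single argument that already is the key type.
template<typename Key, typename... Args>
struct set_key_in_args : std::false_type {};
template<typename Key, typename Arg>
struct set_key_in_args<Key, Arg> : std::is_same<bare_t<Arg>, Key> {};

// Helper: is T a std::pair whose first type is (cv) Key?
template<typename Key, typename T>
struct is_pair_with_key : std::false_type {};
template<typename Key, typename K, typename V>
struct is_pair_with_key<Key, std::pair<K, V>>
  : std::is_same<std::remove_cv_t<K>, Key> {};

// Map cases: one pair argument, or two arguments whose first is the key.
template<typename Key, typename... Args>
struct map_key_in_args : std::false_type {};
template<typename Key, typename Arg>
struct map_key_in_args<Key, Arg> : is_pair_with_key<Key, bare_t<Arg>> {};
template<typename Key, typename First, typename Second>
struct map_key_in_args<Key, First, Second>
  : std::is_same<bare_t<First>, Key> {};

static_assert(set_key_in_args<std::string, const std::string&>::value);
static_assert(!set_key_in_args<std::string, const char*>::value);
static_assert(map_key_in_args<std::string,
                              std::pair<const std::string, int>&&>::value);
static_assert(map_key_in_args<std::string, std::string&, int>::value);
static_assert(!map_key_in_args<std::string, const char (&)[4], int>::value);
```

When such a check is false, for example when emplacing a string literal, a value_type has to be constructed before the key can be looked up.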

All the _M_insert overloads used a node generator parameter, but apart
from the one case where _M_insert_range was called from
_Hashtable::operator=(initializer_list<value_type>), that parameter was
always the _AllocNode type, never the _ReuseOrAllocNode type. Because
operator=(initializer_list<value_type>) was rewritten in an earlier
commit, all calls to _M_insert now use _AllocNode, so there's no reason
to pass the generator as a template parameter when inserting.

The multiple overloads of _Hashtable::_M_insert can all be removed now,
because the _Insert_base::insert members now call either _M_emplace_uniq
or _M_emplace_multi directly, only passing a hint to the latter. Which
one to call is decided using 'if constexpr (__unique_keys::value)' so
there is no unnecessary code instantiation, and overload resolution is
much simpler.

The partial specializations of the _Insert class template can be
entirely removed, moving the minor differences in 'insert' member
functions into the common _Insert_base base class. The different
behaviour for maps and sets can be implemented using enable_if
constraints and 'if constexpr'. With the _Insert class template no
longer needed, the _Insert_base class template can be renamed to
_Insert. This is a minor simplification for the complex inheritance
hierarchy used by _Hashtable, removing one base class. It also means
one less class template instantiation, and no need to match the right
partial specialization of _Insert. The _Insert base class could be
removed entirely by moving all its 'insert' members into _Hashtable,
because without any variation in specializations of _Insert there is no
reason to use a base class to define those members. That is left for a
later commit.

Consistently using _M_emplace_uniq or _M_emplace_multi for insertion
means we no longer attempt to avoid constructing a value_type object to
find its key, removing the PR libstdc++/96088 optimizations. This fixes
the bugs caused by those optimizations, such as PR libstdc++/115285, but
causes regressions in the expected number of allocations and temporary
objects constructed for the PR 96088 tests. It should be noted that the
"regressions" in the 96088 tests put us exactly level with the number of
allocations done by libc++ for those same tests.

To mitigate this to some extent, _M_emplace_uniq detects when the
emplace arguments already contain a key_type (either as the sole
argument, for unordered_set, or as the first part of a pair of
arguments, for unordered_map). In that specific case we don't need to
allocate a node and construct a value type to check for an existing
element with equivalent key.

The remaining regressions in the number of allocations and temporaries
should be addressed separately, with more conservative optimizations
specific to std::string. That is not part of this commit.

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_M_emplace): Replace
with _M_emplace_uniq and _M_emplace_multi.
(_Hashtable::_S_forward_key, _Hashtable::_M_insert_unique)
(_Hashtable::_M_insert_unique_aux, _Hashtable::_M_insert):
Remove.
* include/bits/hashtable_policy.h (_ConvertToValueType):
Remove.
(_Insert_base::_M_insert_range): Remove overload for unique keys
and rename overload for non-unique keys to ...
(_Insert_base::_M_insert_range_multi): ... this.
(_Insert_base::insert): Call _M_emplace_uniq or _M_emplace_multi
instead of _M_insert. Add insert overloads from _Insert.
(_Insert_base): Rename to _Insert.
(_Insert): Remove.
* testsuite/23_containers/unordered_map/96088.cc: Adjust
expected number of allocations.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.

libstdc++: Allow unordered_set assignment to assign to existing nodes

Currently the _ReuseOrAllocNode::operator(Args&&...) function always
destroys the value stored in recycled nodes and constructs a new value.

The _ReuseOrAllocNode type is only ever used for implementing
assignment, either from another unordered container of the same type, or
from std::initializer_list<value_type>. Consequently, the parameter pack
Args only ever consists of a single parameter of type const value_type&
or value_type. We can replace the variadic parameter pack with a single
forwarding reference parameter, and when the value_type is assignable
from that type we can use assignment instead of destroying the existing
value and then constructing a new one.

Using assignment is typically only possible for sets, because for maps
the value_type is std::pair<const key_type, mapped_type> and in most
cases std::is_assignable_v<const key_type&, const key_type&> is false.
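
A sketch of the assign-or-reconstruct choice, under stated assumptions: reuse_slot and NoAssign are hypothetical, the slot is an ordinary object rather than storage inside a recycled node, and exception safety of the reconstruct path is ignored.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Returns true if the existing value was assigned to, false if it was
// destroyed and a new value constructed in its place.
template<typename T, typename Arg>
bool reuse_slot(T& slot, Arg&& arg) {
  if constexpr (std::is_assignable_v<T&, Arg>) {
    slot = std::forward<Arg>(arg);
    return true;
  } else {
    std::destroy_at(&slot);
    ::new (static_cast<void*>(&slot)) T(std::forward<Arg>(arg));
    return false;
  }
}

// Hypothetical type that can be copy constructed but not assigned.
struct NoAssign {
  int v;
  explicit NoAssign(int x) : v(x) {}
  NoAssign(const NoAssign&) = default;
  NoAssign& operator=(const NoAssign&) = delete;
};

// A const key cannot be assigned to, which is why maps rarely qualify.
static_assert(!std::is_assignable_v<const int&, const int&>);

inline void reuse_slot_example() {
  std::string s = "old";
  assert(reuse_slot(s, std::string("new")) && s == "new");  // assigned
  NoAssign n{1};
  assert(!reuse_slot(n, NoAssign{2}) && n.v == 2);          // reconstructed
}
```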

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_ReuseOrAllocNode::operator()):
Replace parameter pack with a single parameter. Assign to
existing value when possible.
* testsuite/23_containers/unordered_multiset/allocator/move_assign.cc:
Adjust expected count of operations.
* testsuite/23_containers/unordered_set/allocator/move_assign.cc:
Likewise.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>

libstdc++: Refactor _Hashtable::operator=(initializer_list<value_type>)

This replaces a call to _M_insert_range with open coding the loop. This
will allow removing the node generator parameter from _M_insert_range in
a later commit.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (operator=(initializer_list)):
Refactor to not use _M_insert_range.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>

libstdc++: Fix calculation of system time in performance tests

The system_time() function used the wrong element of the splits array.

Also add a comment about the units for time measurements.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_performance.h (time_counter): Add
comment about times.
(time_counter::system_time): Use correct split value.

libstdc++: Write timestamp to libstdc++-performance.sum file

The results of 'make check-performance' are appended to the .sum file,
with no indication where one set of results ends and the next begins. We
could just remove the file when starting a new run, but appending makes
it a little easier to compare with previous runs, without having to copy
and store old files.

This adds a header containing a timestamp to the file when starting a
new run.

libstdc++-v3/ChangeLog:

* scripts/check_performance: Add timestamp to output file at
start of run.

libstdc++: Use __is_single_threaded() in performance tests

With recent glibc releases the __gthread_active_p() function is always
true, so we always append "-thread" onto performance benchmark names.

Use the __gnu_cxx::__is_single_threaded() function instead.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_performance.h: Use
__gnu_cxx::__is_single_threaded instead of __gthread_active_p().

libstdc++: Stop using std::unary_function in perf tests

This fixes some -Wdeprecated-declarations warnings.

libstdc++-v3/ChangeLog:

* testsuite/performance/ext/pb_ds/hash_int_erase_mem.cc: Replace
std::unary_function with result_type and argument_type typedefs.
* testsuite/util/performance/assoc/multimap_common_type.hpp:
Likewise.
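
As an illustration of the replacement, a functor that used to derive from
std::unary_function can declare the two nested types itself. This is a
sketch only; int_hash is a hypothetical functor, not one of the types in
the tests listed above.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical hash functor.  Previously this would have inherited
// result_type and argument_type from
// std::unary_function<int, std::size_t>, which is deprecated.
struct int_hash
{
  typedef std::size_t result_type;
  typedef int         argument_type;

  result_type
  operator()(argument_type i) const
  { return static_cast<result_type>(i); }
};

// The nested typedefs are still available to code that relies on them.
static_assert(std::is_same<int_hash::result_type, std::size_t>::value,
              "result_type is provided directly");
static_assert(std::is_same<int_hash::argument_type, int>::value,
              "argument_type is provided directly");
```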

libstdc++: Fix nodiscard warnings in perf test for memory pools

The use of unnamed std::lock_guard temporaries was intentional here, as
they were used like barriers (but std::barrier isn't available until
C++20). But that gives nodiscard warnings, because unnamed temporary
locks are usually unintentional. Use named variables in new block scopes
instead.
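
A minimal sketch of the pattern, assuming a plain std::mutex;
wait_until_unlocked is a hypothetical helper name and is not taken from
pools.cc.

```cpp
#include <mutex>

// Block until `m` can be acquired, then release it again at once.
// The unnamed form `std::lock_guard<std::mutex>(m);` has the same
// effect but can trigger a nodiscard warning; a named variable in
// its own block scope does not.
inline void
wait_until_unlocked(std::mutex& m)
{
  {
    std::lock_guard<std::mutex> lock(m);
  } // `lock` is destroyed here, releasing the mutex.
}
```

The extra braces keep the lock's lifetime as short as the temporary's was,
so the barrier-like behaviour is unchanged.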

libstdc++-v3/ChangeLog:

* testsuite/performance/20_util/memory_resource/pools.cc: Fix
-Wunused-value warnings about unnamed std::lock_guard objects.

aarch64: Relax add_overloaded_function assert

There are some SVE intrinsics that support one set of suffixes for
one extension (E1, say) and another set of suffixes for another
extension (E2, say).  It is usually the case that, mutatis mutandis,
E2 extends E1.  Listing E1 first would then ensure that the manual
C overload would also require E1, making it suitable for resolving
both the E1 forms and, where appropriate, the E2 forms.

However, there was one exception: the I8MM, F32MM, and F64MM extensions
to SVE each added variants of svmmla, but there was no svmmla for SVE
itself.  This was handled by adding an SVE entry for svmmla that only
defined the C overload; it had no variants of its own.

This situation occurs more often with upcoming patches.  Rather than
keep adding these dummy entries, it seemed better to make the code
automatically compute the lowest common denominator for all definitions
that share the same C overload.
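
One way to picture a "lowest common denominator" is as the intersection of
the requirement sets. The sketch below assumes that requirements can be
modelled as a simple bitmask; the EXT_* constants and the free function are
hypothetical and much simpler than aarch64_required_extensions itself.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical extension bits, for illustration only.
using ext_mask = std::uint32_t;
constexpr ext_mask EXT_SVE   = 1u << 0;
constexpr ext_mask EXT_I8MM  = 1u << 1;
constexpr ext_mask EXT_F32MM = 1u << 2;
constexpr ext_mask EXT_F64MM = 1u << 3;

// Keep only the requirements shared by every definition of an
// overload.  An empty list yields all bits set.
inline ext_mask
common_denominator(std::initializer_list<ext_mask> requirements)
{
  ext_mask common = ~ext_mask{0};
  for (ext_mask r : requirements)
    common &= r;
  return common;
}
```

Under this model the three svmmla variants, which require SVE plus I8MM,
F32MM, and F64MM respectively, share only the SVE requirement.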

gcc/
* config/aarch64/aarch64-protos.h
(aarch64_required_extensions::common_denominator): New member
function.
* config/aarch64/aarch64-sve-builtins-base.def: Remove zero-variant
entry for mmla.
* config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): Remove
support for it.
* config/aarch64/aarch64-sve-builtins.cc
(function_builder::add_overloaded): Relax the assert for duplicate
definitions and instead calculate the common denominator of all
requirements.

i386: Add -mveclibabi=aocl [PR56504]

We currently support generating vectorized math calls to the AMD core
math library (ACML) (-mveclibabi=acml).  That library is end-of-life and
its successor is the math library from AMD Optimizing CPU Libraries
(AOCL).

This patch adds support for AOCL (-mveclibabi=aocl).  That significantly
broadens the range of vectorized math functions optimized for AMD CPUs
that GCC can generate calls to.

See the edit to invoke.texi for a complete list of added functions.
Compared to the list of functions in AOCL LibM docs I left out these
vectorized function families:

- sincos and all functions working with arrays ... Because these
  functions have pointer arguments and that would require a bigger
  rework of ix86_veclibabi_aocl().  Also, I'm not sure if GCC even ever
  generates calls to these functions.
- linearfrac ... Because these functions are specific to the AMD
  library.  There's no equivalent glibc function nor GCC internal
  function nor GCC built-in.
- powx, sqrt, fabs ... Because GCC doesn't vectorize these functions
  into calls and uses instructions instead.

I also left amd_vrd2_expm1() (the AMD docs list the function but I
wasn't able to link calls to it with the current version of the
library).

gcc/ChangeLog:

PR target/56504
* config/i386/i386-options.cc (ix86_option_override_internal):
Add ix86_veclibabi_type_aocl case.
* config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern
ix86_veclibabi_aocl().
* config/i386/i386-opts.h (enum ix86_veclibabi): Add
ix86_veclibabi_type_aocl into the ix86_veclibabi enum.
* config/i386/i386.cc (ix86_veclibabi_aocl): New function.
* config/i386/i386.opt: Add the 'aocl' type.
* doc/invoke.texi: Document -mveclibabi=aocl.

gcc/testsuite/ChangeLog:

PR target/56504
* gcc.target/i386/vectorize-aocl1.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

hppa: Remove inner `fix:SF/DF` from fixed-point patterns

2024-11-13 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/117525
* config/pa/pa.md (fix_truncsfsi2): Remove inner `fix:SF`.
(fix_truncdfsi2, fix_truncsfdi2, fix_truncdfdi2,
fixuns_truncsfsi2, fixuns_truncdfsi2, fixuns_truncsfdi2,
fixuns_truncdfdi2): Likewise.

diagnostics: avoid using global_dc in path-printing

gcc/analyzer/ChangeLog:
* checker-path.cc (checker_path::debug): Explicitly use
global_dc's reference printer.
* diagnostic-manager.cc
(diagnostic_manager::prune_interproc_events): Likewise.
(diagnostic_manager::prune_system_headers): Likewise.

gcc/ChangeLog:
* diagnostic-path.cc (diagnostic_event::get_desc): Add param
"ref_pp" and use instead of global_dc.
(class path_label): Likewise, adding field m_ref_pp.
(event_range::event_range): Add param "ref_pp" and pass to
m_path_label.
(path_summary::path_summary): Add param "ref_pp" and pass to
event_range ctor.
(diagnostic_text_output_format::print_path): Pass *pp to
path_summary ctor.
(selftest::test_empty_path): Pass *event_pp to path_summary ctor.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.
* diagnostic-path.h (diagnostic_event::get_desc): Add param
"ref_pp".
* lazy-diagnostic-path.cc (selftest::test_intraprocedural_path):
Pass *event_pp to get_desc.
* simple-diagnostic-path.cc (selftest::test_intraprocedural_path):
Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Match: Fold pow calls to ldexp when possible [PR57492]

This patch transforms the following POW calls to equivalent LDEXP calls, as
discussed in PR57492:

powi (powof2, i) -> ldexp (1.0, i * log2 (powof2))

powof2 * ldexp (x, i) -> ldexp (x, i + log2 (powof2))

a * ldexp(1., i) -> ldexp (a, i)
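
The identities can be checked numerically with the standard library. This
is a sketch, not the match.pd patterns: the helper names are hypothetical,
powof2 is passed as its exponent k (powof2 = 2^k), and inputs are assumed
to stay clear of overflow and underflow.

```cpp
#include <cmath>

// powi (2^k, i)  ->  ldexp (1.0, i * k)
inline double
powi_via_ldexp(int k, int i)
{ return std::ldexp(1.0, i * k); }

// 2^k * ldexp (x, i)  ->  ldexp (x, i + k)
inline double
scale_via_ldexp(int k, double x, int i)
{ return std::ldexp(x, i + k); }

// a * ldexp (1.0, i)  ->  ldexp (a, i)
inline double
mul_via_ldexp(double a, int i)
{ return std::ldexp(a, i); }
```

Each helper computes the right-hand side of one rule; comparing it with the
left-hand side written out with pow and multiplication should give equal
results for in-range inputs.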

This is especially helpful for SVE architectures as LDEXP calls can be
implemented using the FSCALE instruction, as seen in the following patch:
https://gcc.gnu.org/g:9b2915d95d855333d4d8f66b71a75f653ee0d076

SPEC2017 was run with this patch; while there are no noticeable improvements,
there are no non-noise regressions either.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
PR target/57492
* match.pd: Added patterns to fold calls to pow to ldexp and optimize
specific ldexp calls.

gcc/testsuite/ChangeLog:
PR target/57492
* gcc.dg/tree-ssa/ldexp.c: New test.
* gcc.dg/tree-ssa/pow-to-ldexp.c: New test.

RISC-V: Add Multi-Versioning Test Cases

This patch adds test cases for the Function Multi-Versioning (FMV)
feature for RISC-V. They reuse the existing aarch64 test cases, ported
to RISC-V.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/testsuite/ChangeLog:

* g++.target/riscv/mv-symbols1.C: New test.
* g++.target/riscv/mv-symbols2.C: New test.
* g++.target/riscv/mv-symbols3.C: New test.
* g++.target/riscv/mv-symbols4.C: New test.
* g++.target/riscv/mv-symbols5.C: New test.
* g++.target/riscv/mvc-symbols1.C: New test.
* g++.target/riscv/mvc-symbols2.C: New test.
* g++.target/riscv/mvc-symbols3.C: New test.
* g++.target/riscv/mvc-symbols4.C: New test.

RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER

This patch implements the TARGET_GENERATE_VERSION_DISPATCHER_BODY and
TARGET_GET_FUNCTION_VERSIONS_DISPATCHER for RISC-V. This is used to
generate the dispatcher function and get the dispatcher function for
function multiversioning.

This patch copies much of the code from commit 0cfde688e213 ("[aarch64]
Add function multiversioning support") and modifies it to fit the
RISC-V port. A key difference is that the feature bits data structure in
the RISC-V C-API is an array of unsigned long long, while in AArch64 it
is not an array. So we need to generate an array reference for each
feature bits element in the dispatcher function.
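
The per-element check can be sketched in ordinary C++. This is only an
illustration of indexing an array of feature words; feature_bits, kGroups,
and has_all_features are hypothetical names, and the real dispatcher is
generated as GIMPLE rather than written like this.

```cpp
// Hypothetical model of an array-based feature bits structure.
constexpr unsigned kGroups = 2;

struct feature_bits
{
  unsigned long long features[kGroups];
};

// A version is usable only if every required bit is present in the
// corresponding element of the host's feature array.
inline bool
has_all_features(const feature_bits& host, const feature_bits& required)
{
  for (unsigned i = 0; i < kGroups; ++i)
    if ((host.features[i] & required.features[i]) != required.features[i])
      return false;
  return true;
}
```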

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/riscv/riscv.cc (add_condition_to_bb): New function.
(dispatch_function_versions): New function.
(get_suffixed_assembler_name): New function.
(make_resolver_func): New function.
(riscv_generate_version_dispatcher_body): New function.
(riscv_get_function_versions_dispatcher): New function.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Implement it.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Implement it.

RISC-V: Implement TARGET_MANGLE_DECL_ASSEMBLER_NAME

This patch implements the TARGET_MANGLE_DECL_ASSEMBLER_NAME for RISC-V.
This is used to add function multiversioning suffixes to the assembler
name.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_mangle_decl_assembler_name): New function.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): Define.

RISC-V: Implement TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS

This patch implements TARGET_COMPARE_VERSION_PRIORITY and
TARGET_OPTION_FUNCTION_VERSIONS for RISC-V.

The TARGET_COMPARE_VERSION_PRIORITY is implemented to compare the
priority of two function versions based on the rules defined in the
RISC-V C-API Doc PR #85:

https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85/files#diff-79a93ca266139524b8b642e582ac20999357542001f1f4666fbb62b6fb7a5824R721

If multiple versions have equal priority, we select the function with
the largest number of feature bits generated by
riscv_minimal_hwprobe_feature_bits. If the number of feature bits is
also the same, we diff the two versions and select the one with the
least significant differing bit set, since a feature that appears
earlier in the feature_bits might be more important to performance.
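
The tie-breaking order can be sketched as a three-step comparison. This is
an illustration under simplifying assumptions, not the code in riscv.cc:
fmv_version is a hypothetical type, the feature bits are reduced to a
single 64-bit word, and a larger priority value is assumed to win.

```cpp
#include <bitset>

// Hypothetical summary of one function version.
struct fmv_version
{
  int priority;
  unsigned long long features;
};

// Returns > 0 if `a` is preferred, < 0 if `b` is, 0 if they tie.
inline int
compare_versions(const fmv_version& a, const fmv_version& b)
{
  // 1. Explicit priority.
  if (a.priority != b.priority)
    return a.priority > b.priority ? 1 : -1;

  // 2. Number of feature bits.
  const auto na = std::bitset<64>(a.features).count();
  const auto nb = std::bitset<64>(b.features).count();
  if (na != nb)
    return na > nb ? 1 : -1;

  // 3. Owner of the least significant differing bit.
  const unsigned long long diff = a.features ^ b.features;
  if (diff == 0)
    return 0;
  const unsigned long long lowest = diff & (~diff + 1);
  return (a.features & lowest) ? 1 : -1;
}
```

A result of 0 corresponds to the "same version" case used for
TARGET_OPTION_FUNCTION_VERSIONS.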

The TARGET_OPTION_FUNCTION_VERSIONS is implemented to check whether the
two function versions are the same. This implementation reuses the code
in TARGET_COMPARE_VERSION_PRIORITY and checks whether it returns 0, which
means equal priority.

Co-Developed-by: Hank Chang <hank.chang@sifive.com>
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/riscv/riscv.cc
(parse_features_for_version): New function.
(compare_fmv_features): New function.
(riscv_compare_version_priority): New function.
(riscv_common_function_versions): New function.
(TARGET_COMPARE_VERSION_PRIORITY): Implement it.
(TARGET_OPTION_FUNCTION_VERSIONS): Implement it.

RISC-V: Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P

This patch implements the TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P for
RISC-V. This hook is used to process attribute
((target_version ("..."))).

As it is the first patch which introduces the target_version attribute,
we also set TARGET_HAS_FMV_TARGET_ATTRIBUTE to 0 to use "target_version"
for function versioning.

Co-Developed-by: Hank Chang <hank.chang@sifive.com>
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/riscv/riscv-protos.h
(riscv_process_target_attr): Remove as it is not used.
(riscv_option_valid_version_attribute_p): Declare.
(riscv_process_target_version_attr): Declare.
* config/riscv/riscv-target-attr.cc
(riscv_target_attrs): Renamed from riscv_attributes.
(riscv_target_version_attrs): New attributes for target_version.
(riscv_process_one_target_attr): New arguments to select attrs.
(riscv_process_target_attr): Likewise.
(riscv_option_valid_attribute_p): Likewise.
(riscv_process_target_version_attr): New function.
(riscv_option_valid_version_attribute_p): New function.
* config/riscv/riscv.cc
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): Implement it.
* config/riscv/riscv.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Define
it to 0 to use "target_version" for function versioning.

RISC-V: Implement riscv_minimal_hwprobe_feature_bits

This patch implements the riscv_minimal_hwprobe_feature_bits function
for the RISC-V target. The feature bits are defined in
libgcc/config/riscv/feature_bits.c to provide bitmasks of the ISA
extensions defined in the RISC-V C-API. Thus, we need a function to
generate the feature bits so that the IFUNC resolver can dispatch
between different functions based on the hardware features.

Minimal feature bits means using the earliest extension that appeared in
Linux hwprobe to cover the given ISA string. This allows older kernels,
which cannot probe some implied extensions, to run the FMV dispatcher
correctly.

For example, V implies Zve32x, but Zve32x has only been reported by the
Linux kernel since v6.11. If we used the ISA string directly to generate
the FMV dispatcher for functions with the "arch=+v" extension, then
because V implies Zve32x the FMV dispatcher would check whether the
Zve32x extension is supported by the host. On a Linux kernel older than
v6.11, the FMV dispatcher would fail to detect the Zve32x extension even
though it is already implied by the V extension, and would thus fail to
dispatch to the correct function.

Thus, we need to generate the minimal feature bits to cover the given
ISA string to allow the FMV dispatcher to work correctly on older
kernels.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(RISCV_EXT_BITMASK): New macro.
(struct riscv_ext_bitmask_table_t): New struct.
(riscv_minimal_hwprobe_feature_bits): New function.
* common/config/riscv/riscv-ext-bitmask.def: New file.
* config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H): Include
riscv-feature-bits.h.
(riscv_minimal_hwprobe_feature_bits): Declare the function.
* config/riscv/riscv-feature-bits.h: New file.

RISC-V: Implement Priority syntax parser for Function Multi-Versioning

This patch adds the priority syntax parser to support the Function
Multi-Versioning (FMV) feature in RISC-V. This feature allows users to
specify the priority of the function version in the attribute syntax.

Changes based on RISC-V C-API PR:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc
(riscv_target_attr_parser::handle_priority): New function.
(riscv_target_attr_parser::update_settings): Update priority
attribute.
* config/riscv/riscv.opt: Add TargetVariable riscv_fmv_priority.

Introduce TARGET_CLONES_ATTR_SEPARATOR for RISC-V

Some architectures may use ',' in the attribute string, but it is not
used as the separator for different targets. To avoid conflict, we
introduce a new macro TARGET_CLONES_ATTR_SEPARATOR to separate different
clones.

As an example, according to the RISC-V C-API Specification [1], RISC-V allows
',' in the attribute string in the "arch=" option to specify one or more
ISA extensions in the same target function, which conflicts with the
default separator used to separate different clones. This patch introduces
TARGET_CLONES_ATTR_SEPARATOR for RISC-V and chooses '#' as the separator,
since '#' is not allowed in the target_clones option string.
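
The conflict is easy to see with a plain string split. The sketch below is
illustrative only: split_clones is a hypothetical helper rather than the
code in multiple_target.cc, and the attribute strings are made-up examples
of a joined clone list.

```cpp
#include <string>
#include <vector>

// Split a joined clone list on the given separator character.
inline std::vector<std::string>
split_clones(const std::string& attrs, char sep)
{
  std::vector<std::string> out;
  std::string::size_type start = 0;
  for (;;)
    {
      const std::string::size_type pos = attrs.find(sep, start);
      if (pos == std::string::npos)
        {
          out.push_back(attrs.substr(start));
          break;
        }
      out.push_back(attrs.substr(start, pos - start));
      start = pos + 1;
    }
  return out;
}
```

Splitting "arch=+zba,+zbb,default" on ',' yields three pieces and breaks the
"arch=" option apart, whereas splitting "arch=+zba,+zbb#default" on '#'
yields the two intended clones.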

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/blob/c6c5d6d9cf96b342293315a5dff3d25e96ef8191/src/c-api.adoc#__attribute__targetattr-string

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* defaults.h (TARGET_CLONES_ATTR_SEPARATOR): Define new macro.
* multiple_target.cc (get_attr_str): Use
TARGET_CLONES_ATTR_SEPARATOR to separate attributes.
(separate_attrs): Likewise.
(expand_target_clones): Likewise.
* attribs.cc (attr_strcmp): Likewise.
(sorted_attr_string): Likewise.
* tree.cc (get_target_clone_attr_len): Likewise.
* config/riscv/riscv.h (TARGET_CLONES_ATTR_SEPARATOR): Define
TARGET_CLONES_ATTR_SEPARATOR for RISC-V.
* doc/tm.texi: Document TARGET_CLONES_ATTR_SEPARATOR.
* doc/tm.texi.in: Likewise.

Fortran: Fix failing character pointer fcn assignment [PR105054]

2024-11-14 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/105054
* resolve.cc (get_temp_from_expr): If the pointer function has
a deferred character length, generate a new deferred charlen
for the temporary.

gcc/testsuite/
PR fortran/105054
* gfortran.dg/ptr_func_assign_6.f08: New test.

c: add Wzero-as-null-pointer-constant [PR117059]

Add warnings for the use of zero as a null pointer constant to the C FE.

PR c/117059

gcc/c-family/ChangeLog:
* c.opt (Wzero-as-null-pointer-constant): Enable for C and ObjC.

gcc/c/ChangeLog:
* c-typeck.cc (parse_build_binary_op): Add warning.
(build_conditional_expr): Add warning.
(convert_for_assignment): Add warning.

gcc/ChangeLog:
* doc/invoke.texi (Wzero-as-null-pointer-constant): Adapt
description.

gcc/testsuite/ChangeLog:
* gcc.dg/Wzero-as-null-pointer-constant.c: New test.

Suggested-by: Alejandro Colomar <alx@kernel.org>
Acked-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Joseph Myers <josmyers@redhat.com>

c: Handle C23 floating constant {d,D}{32,64,128} suffixes like {df,dd,dl}

C23 roughly says that {d,D}{32,64,128} floating point constant suffixes
are alternate spellings of {df,dd,dl} suffixes in annex H.

So, the following patch allows that alternate spelling.
Or is it intentional it isn't enabled and we need to do everything in
there first before trying to define __STDC_IEC_60559_DFP__?
Like add support for _Decimal32x and _Decimal64x types (including
the d32x and d64x suffixes) etc.

2024-11-13 Jakub Jelinek <jakub@redhat.com>

libcpp/
* expr.cc (interpret_float_suffix): Handle d32 and D32 suffixes
for C like df, d64 and D64 like dd and d128 and D128 like
dl.
gcc/c-family/
* c-lex.cc (interpret_float): Subtract 3 or 4 from copylen
rather than 2 if last character of CPP_N_DFLOAT is a digit.
gcc/testsuite/
* gcc.dg/dfp/c11-constants-3.c: New test.
* gcc.dg/dfp/c11-constants-4.c: New test.
* gcc.dg/dfp/c23-constants-3.c: New test.
* gcc.dg/dfp/c23-constants-4.c: New test.

c: Implement C2Y N3298 - Introduce complex literals [PR117029]

The following patch implements the C2Y N3298 paper Introduce complex literals
by providing different (or no) diagnostics on imaginary constants (except
for integer ones).
For _DecimalN constants we don't support _Complex _DecimalN and error on any
i/j suffixes mixed with DD/DL/DF, so nothing changed there.

2024-11-13 Jakub Jelinek <jakub@redhat.com>

PR c/117029
libcpp/
* include/cpplib.h (struct cpp_options): Add imaginary_constants
member.
* init.cc (struct lang_flags): Add imaginary_constants bitfield.
(lang_defaults): Add column for imaginary_constants.
(cpp_set_lang): Copy over imaginary_constants.
* expr.cc (cpp_classify_number): Diagnose CPP_N_IMAGINARY
non-CPP_N_FLOATING constants differently for C.
gcc/testsuite/
* gcc.dg/cpp/pr7263-3.c: Adjust expected diagnostic wording.
* gcc.dg/c23-imaginary-constants-1.c: New test.
* gcc.dg/c23-imaginary-constants-2.c: New test.
* gcc.dg/c23-imaginary-constants-3.c: New test.
* gcc.dg/c23-imaginary-constants-4.c: New test.
* gcc.dg/c23-imaginary-constants-5.c: New test.
* gcc.dg/c23-imaginary-constants-6.c: New test.
* gcc.dg/c23-imaginary-constants-7.c: New test.
* gcc.dg/c23-imaginary-constants-8.c: New test.
* gcc.dg/c23-imaginary-constants-9.c: New test.
* gcc.dg/c23-imaginary-constants-10.c: New test.
* gcc.dg/c2y-imaginary-constants-1.c: New test.
* gcc.dg/c2y-imaginary-constants-2.c: New test.
* gcc.dg/c2y-imaginary-constants-3.c: New test.
* gcc.dg/c2y-imaginary-constants-4.c: New test.
* gcc.dg/c2y-imaginary-constants-5.c: New test.
* gcc.dg/c2y-imaginary-constants-6.c: New test.
* gcc.dg/c2y-imaginary-constants-7.c: New test.
* gcc.dg/c2y-imaginary-constants-8.c: New test.
* gcc.dg/c2y-imaginary-constants-9.c: New test.
* gcc.dg/c2y-imaginary-constants-10.c: New test.
* gcc.dg/c2y-imaginary-constants-11.c: New test.
* gcc.dg/c2y-imaginary-constants-12.c: New test.

aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]

This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

GCC Output:

test_ldexpf:
        b       ldexpf

test_ldexp:
        b       ldexp

Since SVE has support for an FSCALE instruction, we can use this to process
scalar floats by moving them to a vector register and performing an fscale call,
similar to how LLVM tackles an ldexp builtin as well.

New Output:

test_ldexpf:
        fmov    s31, w0
        ptrue   p7.b, vl4
        fscale  z0.s, p7/m, z0.s, z31.s
        ret

test_ldexp:
        sxtw    x0, w0
        ptrue   p7.b, vl8
        fmov    d31, x0
        fscale  z0.d, p7/m, z0.d, z31.d
        ret

This is a revision of an earlier patch, and now uses the extended definition of
aarch64_ptrue_reg to generate predicate registers with the appropriate set bits.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:

PR target/111733
* config/aarch64/aarch64-sve.md
(ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
floating modes and expand to the existing pattern for FSCALE.
* config/aarch64/iterators.md:
(SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
as their scalar equivalents.
(VPRED): Extended the attribute to handle GPF_HF modes.
* internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/fscale.c: New test.

RISC-V: Bugfix for max_sew_overlap_and_next_ratio_valid_for_prev_sew_p[pr117483]

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117483

If prev and next satisfy the following rules, we should forbid the case
(next.get_sew() < prev.get_sew() && (!next.get_ta() || !next.get_ma()))
in the compatible function max_sew_overlap_and_next_ratio_valid_for_prev_sew_p.
Otherwise, the tail elements of next will be polluted.

DEF_SEW_LMUL_RULE (ge_sew, ratio_and_ge_sew, ratio_and_ge_sew,
max_sew_overlap_and_next_ratio_valid_for_prev_sew_p,
always_false, use_max_sew_and_lmul_with_next_ratio)

Passed the rv64gcv full regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
PR target/117483

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117483.c: New test.

[RISC-V] Fix costing of LO_SUM expressions

This is a rewrite of a patch originally from Xianmiao Qu.  Xianmiao
noticed that the costs we compute for LO_SUM expressions were incorrect.
Essentially we costed based solely on the first input to the LO_SUM.

In a LO_SUM, the first input is almost always going to be a REG and thus
isn't interesting.  The second argument is almost always going to be
some kind of symbolic operand, which is much more interesting from a
costing standpoint.

The right way to fix this is to sum the cost of the two operands.  I've
verified this produces the same code as Xianmiao Qu's original patch.
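
A toy model makes the difference between the two costing schemes concrete.
This is a sketch only: operand_kind, cost_of, and the cost values are
hypothetical and unrelated to the real units used by riscv_rtx_costs.

```cpp
// Hypothetical operand classes for the two inputs of a LO_SUM.
enum class operand_kind { reg, symbol };

// Assumed costs: a register is free, a symbolic operand costs 1.
constexpr int
cost_of(operand_kind k)
{ return k == operand_kind::reg ? 0 : 1; }

// Old behaviour: only the first operand is considered.
constexpr int
lo_sum_cost_first_only(operand_kind op0, operand_kind)
{ return cost_of(op0); }

// New behaviour: the costs of both operands are summed.
constexpr int
lo_sum_cost_summed(operand_kind op0, operand_kind op1)
{ return cost_of(op0) + cost_of(op1); }
```

With a register as the first operand and a symbol as the second, the first
scheme reports zero while the summed scheme reports a non-zero cost.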

This has been tested on rv32 and rv64 in my tester.  It missed today's
bootstrap of riscv64 though :(  Naturally I'll wait on the pre-commit CI
tester to render a verdict, but I don't expect any problems.

--  From Xianmiao Qu's original submission --

Currently, the cost of the LO_SUM expression is based on
the cost of calculating the first subexpression. When the
first subexpression is a register, the cost result will
be zero. It seems a bit unreasonable for a SET expression
to have a zero cost when its source is LO_SUM. Moreover,
having a cost of zero for the expression will lead the
loop invariant pass to calculate its benefits of being
moved outside the loop as zero, thus preventing the
out-of-loop placement of the loop invariant.

As an example, consider the following test case:
   long a;
   long b[];
   long *c;
   foo () {
     for (;;)
       *c = b[a];
   }

When compiling with -march=rv64gc -mabi=lp64d -Os, the following code is
generated:
         .cfi_startproc
         lui     a5,%hi(c)
         ld      a4,%lo(c)(a5)
         lui     a2,%hi(b)
         lui     a1,%hi(a)
.L2:
         ld      a5,%lo(a)(a1)
         addi    a3,a2,%lo(b)
         slli    a5,a5,3
         add     a5,a5,a3
         ld      a5,0(a5)
         sd      a5,0(a4)
         j       .L2

After adjusting the cost of the LO_SUM expression, the instruction addi will be
moved outside the loop:
         .cfi_startproc
         lui     a5,%hi(c)
         ld      a3,%lo(c)(a5)
         lui     a4,%hi(b)
         lui     a2,%hi(a)
         addi    a4,a4,%lo(b)
.L2:
         ld      a5,%lo(a)(a2)
         slli    a5,a5,3
         add     a5,a5,a4
         ld      a5,0(a5)
         sd      a5,0(a3)
         j       .L2

gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Correct costing of LO_SUM
expressions.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

Reapply "[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]"

This reverts commit de3b277247ce98d189f121155b75f490725a42f6.

i386: Zero extend 32-bit address to 64-bit with option -mx32 -maddress-mode=long. [PR 117418]

-maddress-mode=long sets Pmode to DImode, so zero-extend the 32-bit address
to 64 bits and use a 64-bit register as the pointer to avoid an ICE.
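
In source-level terms, zero extension widens an address without touching
the upper half. The helpers below are hypothetical and only contrast zero
extension with sign extension; they are not the expander code.

```cpp
#include <cstdint>

// Zero extension: the upper 32 bits of the result are always 0.
inline std::uint64_t
zero_extend_address(std::uint32_t addr32)
{ return static_cast<std::uint64_t>(addr32); }

// Sign extension, for contrast: an address with bit 31 set would
// gain 32 leading one bits.
inline std::uint64_t
sign_extend_address(std::uint32_t addr32)
{
  return static_cast<std::uint64_t>(
    static_cast<std::int64_t>(static_cast<std::int32_t>(addr32)));
}
```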

gcc/ChangeLog:

PR target/117418
* config/i386/i386-expand.cc (ix86_expand_builtin): Convert
pointer's mode according to Pmode.

gcc/testsuite/ChangeLog:

PR target/117418
* gcc.target/i386/pr117418-1.c: New test.

Daily bump.

Revert "[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]"

This reverts commit 69bd93c167fefbdff0cb88614275358b7a2b2941.

RISC-V: Fix target-attr-norelax.c testcase

The target-attr-norelax.c testcase was failing because of a redundant "\t"
check in the assembly output, and because the testcase did not skip the
check for LTO builds.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/target-attr-norelax.c: Fix testcase.

Revert "Match: Simplify branch form 3 of unsigned SAT_ADD into branchless"

This reverts commit df4af89bc3eabbeaccb16539aa1082cb9863e187.

selftests: clear GCC_COLORS [PR117503]

gcc/ChangeLog:
PR bootstrap/117503
* Makefile.in (GCC_FOR_SELFTESTS): Set GCC_COLORS=.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

hppa: Fix decrement_and_branch_until_zero constraint

The third alternative for argument 4 needs to be an early clobber
constraint. Noticed testing LRA.

2024-11-12 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.md (decrement_and_branch_until_zero): Fix
constraint.

RISC-V: testsuite: Remove deprecated compatibility headers

Since r15-4981-g5c34f02ba7e these tests have been failing on vector
targets with excess errors due to the new deprecation warning message.
Remove the <cstdalign> header.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/bug-10.C: Remove cstdalign header.
* g++.target/riscv/rvv/base/bug-11.C: Ditto.
* g++.target/riscv/rvv/base/bug-12.C: Ditto.
* g++.target/riscv/rvv/base/bug-13.C: Ditto.
* g++.target/riscv/rvv/base/bug-14.C: Ditto.
* g++.target/riscv/rvv/base/bug-15.C: Ditto.
* g++.target/riscv/rvv/base/bug-16.C: Ditto.
* g++.target/riscv/rvv/base/bug-17.C: Ditto.
* g++.target/riscv/rvv/base/bug-2.C: Ditto.
* g++.target/riscv/rvv/base/bug-23.C: Ditto.
* g++.target/riscv/rvv/base/bug-3.C: Ditto.
* g++.target/riscv/rvv/base/bug-4.C: Ditto.
* g++.target/riscv/rvv/base/bug-5.C: Ditto.
* g++.target/riscv/rvv/base/bug-6.C: Ditto.
* g++.target/riscv/rvv/base/bug-7.C: Ditto.
* g++.target/riscv/rvv/base/bug-8.C: Ditto.
* g++.target/riscv/rvv/base/bug-9.C: Ditto.

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

Verify that empty std::vector is optimized away

With __builtin_operator_new we now can optimize away unused std::vectors.
This adds testcases mentioned in the PR.

PR tree-optimization/96945

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr96945.C: New test.

testsuite: Adjust jump threading test expectation

This test started failing on aarch64 after 0cfc9c95 in 2023 ("Phi
analyzer - Initialize with range instead of a tree.").

The only change visible in the pass dumps prior to thread2 is the upper
bounds of some ranges are reduced from +INF to 7, consistent with the
bitmask information. After thread2, there are changes in the control
flow, but only affecting edges that are obviously never taken (from
basic blocks 6 through 12). These are cleaned up in the following pass,
but the final codegen remains different.

There isn't anything obviously wrong with the change in dump output, so
let's just update the test expectations (as has happened previously
here).

gcc/testsuite/ChangeLog:

PR tree-optimization/112376
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update expectation.

AArch64: Remove duplicated addr_cost tables

Remove duplicated addr_cost tables - use generic_armv9_a_addrcost_table for
Armv9-a cores and generic_armv8_a_addrcost_table for recent Armv8-a cores.
No changes in generated code.

gcc/ChangeLog:

* config/aarch64/tuning_models/cortexx925.h (cortexx925_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversen1.h: Use generic_armv8_a_addrcost_table.
* config/aarch64/tuning_models/neoversen2.h (neoversen2_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversen3.h (neoversen3_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversev2.h (neoversev2_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversev3.h (neoversev3_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversev3ae.h (neoversev3ae_addrcost_table): Remove.

AArch64: Cleanup fusion defines

Cleanup the fusion defines by introducing AARCH64_FUSE_BASE as a common base
level of fusion supported by almost all cores.  Add AARCH64_FUSE_MOVK as a
shortcut for all MOVK fusion.  In most cases there is no change.  It enables
AARCH64_FUSE_CMP_BRANCH for a few older cores since it has no measurable
effect if a core doesn't support it.  Also it may have been accidentally
left out on some cores that support all other types of branch fusion.

gcc/ChangeLog:

* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_BASE): New define.
(AARCH64_FUSE_MOVK): Likewise.
* config/aarch64/tuning_models/a64fx.h: Update.
* config/aarch64/tuning_models/ampere1.h: Likewise.
* config/aarch64/tuning_models/ampere1a.h: Likewise.
* config/aarch64/tuning_models/ampere1b.h: Likewise.
* config/aarch64/tuning_models/cortexa35.h: Likewise.
* config/aarch64/tuning_models/cortexa53.h: Likewise.
* config/aarch64/tuning_models/cortexa57.h: Likewise.
* config/aarch64/tuning_models/cortexa72.h: Likewise.
* config/aarch64/tuning_models/cortexa73.h: Likewise.
* config/aarch64/tuning_models/cortexx925.h: Likewise.
* config/aarch64/tuning_models/exynosm1.h: Likewise.
* config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
* config/aarch64/tuning_models/generic.h: Likewise.
* config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
* config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
* config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
* config/aarch64/tuning_models/neoversen1.h: Likewise.
* config/aarch64/tuning_models/neoversen2.h: Likewise.
* config/aarch64/tuning_models/neoversen3.h: Likewise.
* config/aarch64/tuning_models/neoversev1.h: Likewise.
* config/aarch64/tuning_models/neoversev2.h: Likewise.
* config/aarch64/tuning_models/neoversev3.h: Likewise.
* config/aarch64/tuning_models/neoversev3ae.h: Likewise.
* config/aarch64/tuning_models/qdf24xx.h: Likewise.
* config/aarch64/tuning_models/saphira.h: Likewise.
* config/aarch64/tuning_models/thunderx2t99.h: Likewise.
* config/aarch64/tuning_models/thunderx3t110.h: Likewise.
* config/aarch64/tuning_models/tsv110.h: Likewise.

RISC-V: Fix incorrect test macro for signed scalar SAT_ADD form 2 run test

This patch would like to fix one incorrect test macro usage for
form 2 of signed scalar SAT_ADD run test. It should leverage the
_FMT_2 instead of _FMT_1 for form 2.
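
To show why the suffix matters, here is a small sketch of two macro-generated forms of a signed saturating add and their matching run macros. The DEMO_* macros and both function bodies are simplified stand-ins written for this example; they are not the definitions in sat_arith.h. A run test for form 2 that expanded the _FMT_1 runner would still pass, but it would exercise the wrong function.

```c
#include <assert.h>
#include <stdint.h>

/* Form 1 (stand-in): widen, add, then clamp to [MIN, MAX].  */
#define DEMO_SAT_S_ADD_FMT_1(T, WT, MIN, MAX)           \
T demo_sat_s_add_##T##_fmt_1 (T x, T y)                 \
{                                                       \
  WT sum = (WT) x + (WT) y;                             \
  return sum > MAX ? MAX : sum < MIN ? MIN : (T) sum;   \
}

/* Form 2 (stand-in): detect overflow before adding.  */
#define DEMO_SAT_S_ADD_FMT_2(T, WT, MIN, MAX)           \
T demo_sat_s_add_##T##_fmt_2 (T x, T y)                 \
{                                                       \
  if (y > 0 && x > MAX - y)                             \
    return MAX;                                         \
  if (y < 0 && x < MIN - y)                             \
    return MIN;                                         \
  return (T) (x + y);                                   \
}

/* Each runner must name the form it is meant to test.  */
#define RUN_DEMO_SAT_S_ADD_FMT_1(T, x, y) demo_sat_s_add_##T##_fmt_1 (x, y)
#define RUN_DEMO_SAT_S_ADD_FMT_2(T, x, y) demo_sat_s_add_##T##_fmt_2 (x, y)

DEMO_SAT_S_ADD_FMT_1 (int8_t, int16_t, INT8_MIN, INT8_MAX)
DEMO_SAT_S_ADD_FMT_2 (int8_t, int16_t, INT8_MIN, INT8_MAX)

/* Run test for form 2: uses the _FMT_2 runner, not the _FMT_1 one.  */
void
demo_run_form_2_checks (void)
{
  assert (RUN_DEMO_SAT_S_ADD_FMT_2 (int8_t, 100, 100) == INT8_MAX);
  assert (RUN_DEMO_SAT_S_ADD_FMT_2 (int8_t, -100, -100) == INT8_MIN);
  assert (RUN_DEMO_SAT_S_ADD_FMT_2 (int8_t, 5, -7) == -2);
}
```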

The tests below pass with this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_s_add-run-5.c: Take form 2 for run test.
* gcc.target/riscv/sat_s_add-run-6.c: Ditto.
* gcc.target/riscv/sat_s_add-run-7.c: Ditto.
* gcc.target/riscv/sat_s_add-run-8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add norelax function attribute

This patch adds the norelax function attribute that was discussed in riscv-c-api-doc PR#94.
URL:https://github.com/riscv-non-isa/riscv-c-api-doc/pull/94

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_declare_function_name): Add new
attribute.

[RISC-V] Drop undesirable two instruction macc alternatives

So I was looking at sub_dct a little while ago and was surprised to see
us emit two instructions out of a single pattern.  We generally try to
avoid that -- it's not always possible, but as a general rule of thumb
it should be avoided.  Specifically I saw:

>         vmv1r.v v4,v2   # 138   [c=4 l=4]  *pred_mul_plusrvvm1hi_undef/5
>         vmacc.vv        v4,v8,v1

When we emit multiple instructions out of a single pattern we can't
build a good schedule as we can't really describe the two instructions
well and we can't split them up -- they move as an atomic unit.

These cases can also raise correctness issues if the pattern doesn't
properly account for both instructions in its length computation.

Note the length, 4 bytes.  So this is both a performance and latent
correctness issue.

It appears that these alternatives are meant to deal with the case when
we have three source inputs and a non-matching output.  The author did
put in "?" to slightly disparage these alternatives, but a "!" would
have been better.  The best solution is to just remove those
alternatives and let the allocator manage the matching operand issue.

That's precisely what this patch does.  For the various integer
multiply-add/multiply-accumulate patterns we drop the alternatives which
don't require a match between the output and one of the inputs.

That fixes the correctness issue and should shave a cycle or two off our
sub_dct code.  Essentially the move bubbles up into an empty slot and we
can schedule around the vmacc sensibly.

Interestingly enough this fixes a scan-assembler test in my tester for
both rv32 and rv64.

> Tests that now work, but didn't before (10 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8
> unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8

My BPI is already in a bootstrap test, so this patch won't hit the BPI
for bootstrapping until Wednesday, meaning no data until Thursday.  Will
wait for the pre-commit tester though.

gcc/
* config/riscv/vector.md (pred_mul_plus<mode>_undef): Drop alternatives
where output doesn't have to match input.
(pred_madd<mode>, pred_macc<mode>): Likewise.
(pred_madd<mode>_scalar, pred_macc<mode>_scalar): Likewise.
(pred_madd<mode>_extended_scalar): Likewise.
(pred_macc<mode>_extended_scalar): Likewise.
(pred_minus_mul<mode>_undef): Likewise.
(pred_nmsub<mode>, pred_nmsac<mode>): Likewise.
(pred_nmsub<mode>_scalar, pred_nmsac<mode>_scalar): Likewise.
(pred_nmsub<mode>_extended_scalar): Likewise.
(pred_nmsac<mode>_extended_scalar): Likewise.

libsanitizer: Update LOCAL_PATCHES

tree-optimization/116973 - SLP permute lower heuristic and single-lane SLP

When forcing single-lane SLP to emulate non-SLP behavior we need to
disable heuristics designed to optimize SLP loads and instead in
all cases resort to an interleaving scheme as requested by forcefully
doing single-lane SLP.

This fixes the remaining fallout for --param vect-force-slp=1 on x86.

PR tree-optimization/116973
* tree-vect-slp.cc (vect_lower_load_permutations): Add
force_single_lane parameter. Disable heuristic that keeps
some load-permutations.
(vect_analyze_slp): Pass force_single_lane to
vect_lower_load_permutations.

libsanitizer: update test

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/builtin-1.c: Update test case because the
sanitizer has changed the error message.

libsanitizer: Improve FrameIsInternal

`FrameIsInternal` is a function that improves report quality by filtering out
internal functions from the sanitizer, allowing it to point to a more precise
root cause. However, the current checks are mostly specific to compiler-rt,
so we are adding a few more rules to enhance the filtering for libsanitizer as
well.

libsanitizer: Apply local patches

This patch just reapplies local patches (will be noted in LOCAL_PATCHES).

libsanitizer: merge from upstream (61a6439f35b6de28)

[committed] Fix minor c6x backend bug exposed by CRC patches

This is a minor bug in the c6x port I saw when testing Mariam's CRC work.
Specifically some of the CRC tests were failing with a segfault testing if an
operand was an "a_register" from within the dest_regfile attribute.  We were
extracting what we thought should have been a register operand then looking at
the REGNO.  The underlying data was totally bogus, hence the fault in the
accessor macros.

The core issue is we were trying to extract operands from a nop insn which has
no operands.  As far as I can tell "unknown" is a reasonable answer for the
dest_regfile attribute on a nop insn, so this patch adds an explicit setting of
dest_regfile rather than letting the default processing kick in.

I'm applying the attached patch to the trunk.

There's still a backend bug affecting ~15 CRC tests.  Essentially the assembler
complains about a label (related to debugging info) not at the start of an
execution packet.  I'm not chasing this down.

gcc/

* config/c6x/c6x.md (nop, nop_count): Add explicit
"dest_regfile" attribute setting.

ada: Typo fix in comment

gcc/ada/ChangeLog:

* gcc-interface/Makefile.in: Remove extra 'with'.

ada: Compile time crash on limited object in extended return

This patch fixes an error in the compiler whereby using an extended return on
an object of limited tagged type which extends a tagged protected type may lead
to a compile-time crash.

gcc/ada/ChangeLog:

* exp_ch3.adb (Build_Assignment): Add condition to fetch corresponding
record types for concurrent tagged types.

ada: Fix spurious error on iterated component association with large index type

This is only for the Ada 2022 form of the iterated component association.

gcc/ada/ChangeLog:

* exp_aggr.adb (Two_Pass_Aggregate_Expansion): Use a type sized
from the index type to compute the length. Simplify and remove
useless calls to New_Copy_Tree for this computation.

ada: Include design documentation within runtime sources

The existing design documentation, required when generating the Software
Architecture Design Specification and Software Component Design
Specification documents for the light and light-tasking runtimes, has been
included directly within runtime sources.

gcc/ada/ChangeLog:

* libgnarl/a-dynpri.ads: Add design annotations.
* libgnarl/a-reatim.ads: Likewise.
* libgnarl/a-synbar.ads: Likewise.
* libgnarl/a-taside.ads: Likewise.
* libgnarl/s-tarest.ads: Likewise.
* libgnarl/s-tasinf.ads: Likewise.
* libgnarl/s-taspri__posix.ads: Likewise.
* libgnarl/s-tpobmu.ads: Likewise.
* libgnat/a-assert.ads: Likewise.
* libgnat/a-comlin.ads: Likewise.
* libgnat/a-nbnbig.ads: Likewise.
* libgnat/a-nubinu.ads: Likewise.
* libgnat/a-numeri.ads: Likewise.
* libgnat/a-unccon.ads: Likewise.
* libgnat/a-uncdea.ads: Likewise.
* libgnat/ada.ads: Likewise.
* libgnat/g-debuti.ads: Likewise.
* libgnat/g-sestin.ads: Likewise.
* libgnat/g-souinf.ads: Likewise.
* libgnat/gnat.ads: Likewise.
* libgnat/i-cexten.ads: Likewise.
* libgnat/i-cexten__128.ads: Likewise.
* libgnat/i-cstrin.adb: Likewise.
* libgnat/i-cstrin.ads: Likewise.
* libgnat/interfac__2020.ads: Likewise.
* libgnat/machcode.ads: Likewise.
* libgnat/s-addope.ads: Likewise.
* libgnat/s-aridou.ads: Likewise.
* libgnat/s-arit32.ads: Likewise.
* libgnat/s-arit64.ads: Likewise.
* libgnat/s-assert.ads: Likewise.
* libgnat/s-atacco.ads: Likewise.
* libgnat/s-atocou.ads: Likewise.
* libgnat/s-atocou__builtin.adb: Likewise.
* libgnat/s-atopri.ads: Likewise.
* libgnat/s-bitops.ads: Likewise.
* libgnat/s-boarop.ads: Likewise.
* libgnat/s-bytswa.ads: Likewise.
* libgnat/s-carsi8.ads: Likewise.
* libgnat/s-carun8.ads: Likewise.
* libgnat/s-casi16.ads: Likewise.
* libgnat/s-casi32.ads: Likewise.
* libgnat/s-casi64.ads: Likewise.
* libgnat/s-caun16.ads: Likewise.
* libgnat/s-caun32.ads: Likewise.
* libgnat/s-caun64.ads: Likewise.
* libgnat/s-exnint.ads: Likewise.
* libgnat/s-exnllf.ads: Likewise.
* libgnat/s-exnlli.ads: Likewise.
* libgnat/s-expint.ads: Likewise.
* libgnat/s-explli.ads: Likewise.
* libgnat/s-expllu.ads: Likewise.
* libgnat/s-expmod.ads: Likewise.
* libgnat/s-exponn.ads: Likewise.
* libgnat/s-expont.ads: Likewise.
* libgnat/s-exponu.ads: Likewise.
* libgnat/s-expuns.ads: Likewise.
* libgnat/s-fatflt.ads: Likewise.
* libgnat/s-fatgen.ads: Likewise.
* libgnat/s-fatlfl.ads: Likewise.
* libgnat/s-fatllf.ads: Likewise.
* libgnat/s-flocon.ads: Likewise.
* libgnat/s-geveop.ads: Likewise.
* libgnat/s-imageb.ads: Likewise.
* libgnat/s-imaged.ads: Likewise.
* libgnat/s-imagef.ads: Likewise.
* libgnat/s-imagei.ads: Likewise.
* libgnat/s-imagen.ads: Likewise.
* libgnat/s-imageu.ads: Likewise.
* libgnat/s-imagew.ads: Likewise.
* libgnat/s-imde128.ads: Likewise.
* libgnat/s-imde32.ads: Likewise.
* libgnat/s-imde64.ads: Likewise.
* libgnat/s-imen16.ads: Likewise.
* libgnat/s-imen32.ads: Likewise.
* libgnat/s-imenu8.ads: Likewise.
* libgnat/s-imfi32.ads: Likewise.
* libgnat/s-imfi64.ads: Likewise.
* libgnat/s-imgbiu.ads: Likewise.
* libgnat/s-imgboo.ads: Likewise.
* libgnat/s-imgcha.ads: Likewise.
* libgnat/s-imgint.ads: Likewise.
* libgnat/s-imgllb.ads: Likewise.
* libgnat/s-imglli.ads: Likewise.
* libgnat/s-imgllu.ads: Likewise.
* libgnat/s-imgllw.ads: Likewise.
* libgnat/s-imgrea.ads: Likewise.
* libgnat/s-imguns.ads: Likewise.
* libgnat/s-imguti.ads: Likewise.
* libgnat/s-imgwiu.ads: Likewise.
* libgnat/s-maccod.ads: Likewise.
* libgnat/s-multip.ads: Likewise.
* libgnat/s-pack03.ads: Likewise.
* libgnat/s-pack05.ads: Likewise.
* libgnat/s-pack06.ads: Likewise.
* libgnat/s-pack07.ads: Likewise.
* libgnat/s-pack09.ads: Likewise.
* libgnat/s-pack10.ads: Likewise.
* libgnat/s-pack100.ads: Likewise.
* libgnat/s-pack101.ads: Likewise.
* libgnat/s-pack102.ads: Likewise.
* libgnat/s-pack103.ads: Likewise.
* libgnat/s-pack104.ads: Likewise.
* libgnat/s-pack105.ads: Likewise.
* libgnat/s-pack106.ads: Likewise.
* libgnat/s-pack107.ads: Likewise.
* libgnat/s-pack108.ads: Likewise.
* libgnat/s-pack109.ads: Likewise.
* libgnat/s-pack11.ads: Likewise.
* libgnat/s-pack110.ads: Likewise.
* libgnat/s-pack111.ads: Likewise.
* libgnat/s-pack112.ads: Likewise.
* libgnat/s-pack113.ads: Likewise.
* libgnat/s-pack114.ads: Likewise.
* libgnat/s-pack115.ads: Likewise.
* libgnat/s-pack116.ads: Likewise.
* libgnat/s-pack117.ads: Likewise.
* libgnat/s-pack118.ads: Likewise.
* libgnat/s-pack119.ads: Likewise.
* libgnat/s-pack12.ads: Likewise.
* libgnat/s-pack120.ads: Likewise.
* libgnat/s-pack121.ads: Likewise.
* libgnat/s-pack122.ads: Likewise.
* libgnat/s-pack123.ads: Likewise.
* libgnat/s-pack124.ads: Likewise.
* libgnat/s-pack125.ads: Likewise.
* libgnat/s-pack126.ads: Likewise.
* libgnat/s-pack127.ads: Likewise.
* libgnat/s-pack13.ads: Likewise.
* libgnat/s-pack14.ads: Likewise.
* libgnat/s-pack15.ads: Likewise.
* libgnat/s-pack17.ads: Likewise.
* libgnat/s-pack18.ads: Likewise.
* libgnat/s-pack19.ads: Likewise.
* libgnat/s-pack20.ads: Likewise.
* libgnat/s-pack21.ads: Likewise.
* libgnat/s-pack22.ads: Likewise.
* libgnat/s-pack23.ads: Likewise.
* libgnat/s-pack24.ads: Likewise.
* libgnat/s-pack25.ads: Likewise.
* libgnat/s-pack26.ads: Likewise.
* libgnat/s-pack27.ads: Likewise.
* libgnat/s-pack28.ads: Likewise.
* libgnat/s-pack29.ads: Likewise.
* libgnat/s-pack30.ads: Likewise.
* libgnat/s-pack31.ads: Likewise.
* libgnat/s-pack33.ads: Likewise.
* libgnat/s-pack34.ads: Likewise.
* libgnat/s-pack35.ads: Likewise.
* libgnat/s-pack36.ads: Likewise.
* libgnat/s-pack37.ads: Likewise.
* libgnat/s-pack38.ads: Likewise.
* libgnat/s-pack39.ads: Likewise.
* libgnat/s-pack40.ads: Likewise.
* libgnat/s-pack41.ads: Likewise.
* libgnat/s-pack42.ads: Likewise.
* libgnat/s-pack43.ads: Likewise.
* libgnat/s-pack44.ads: Likewise.
* libgnat/s-pack45.ads: Likewise.
* libgnat/s-pack46.ads: Likewise.
* libgnat/s-pack47.ads: Likewise.
* libgnat/s-pack48.ads: Likewise.
* libgnat/s-pack49.ads: Likewise.
* libgnat/s-pack50.ads: Likewise.
* libgnat/s-pack51.ads: Likewise.
* libgnat/s-pack52.ads: Likewise.
* libgnat/s-pack53.ads: Likewise.
* libgnat/s-pack54.ads: Likewise.
* libgnat/s-pack55.ads: Likewise.
* libgnat/s-pack56.ads: Likewise.
* libgnat/s-pack57.ads: Likewise.
* libgnat/s-pack58.ads: Likewise.
* libgnat/s-pack59.ads: Likewise.
* libgnat/s-pack60.ads: Likewise.
* libgnat/s-pack61.ads: Likewise.
* libgnat/s-pack62.ads: Likewise.
* libgnat/s-pack63.ads: Likewise.
* libgnat/s-pack65.ads: Likewise.
* libgnat/s-pack66.ads: Likewise.
* libgnat/s-pack67.ads: Likewise.
* libgnat/s-pack68.ads: Likewise.
* libgnat/s-pack69.ads: Likewise.
* libgnat/s-pack70.ads: Likewise.
* libgnat/s-pack71.ads: Likewise.
* libgnat/s-pack72.ads: Likewise.
* libgnat/s-pack73.ads: Likewise.
* libgnat/s-pack74.ads: Likewise.
* libgnat/s-pack75.ads: Likewise.
* libgnat/s-pack76.ads: Likewise.
* libgnat/s-pack77.ads: Likewise.
* libgnat/s-pack78.ads: Likewise.
* libgnat/s-pack79.ads: Likewise.
* libgnat/s-pack80.ads: Likewise.
* libgnat/s-pack81.ads: Likewise.
* libgnat/s-pack82.ads: Likewise.
* libgnat/s-pack83.ads: Likewise.
* libgnat/s-pack84.ads: Likewise.
* libgnat/s-pack85.ads: Likewise.
* libgnat/s-pack86.ads: Likewise.
* libgnat/s-pack87.ads: Likewise.
* libgnat/s-pack88.ads: Likewise.
* libgnat/s-pack89.ads: Likewise.
* libgnat/s-pack90.ads: Likewise.
* libgnat/s-pack91.ads: Likewise.
* libgnat/s-pack92.ads: Likewise.
* libgnat/s-pack93.ads: Likewise.
* libgnat/s-pack94.ads: Likewise.
* libgnat/s-pack95.ads: Likewise.
* libgnat/s-pack96.ads: Likewise.
* libgnat/s-pack97.ads: Likewise.
* libgnat/s-pack98.ads: Likewise.
* libgnat/s-pack99.ads: Likewise.
* libgnat/s-parame.ads: Likewise.
* libgnat/s-rident.ads: Likewise.
* libgnat/s-spark.ads: Likewise.
* libgnat/s-spcuop.ads: Likewise.
* libgnat/s-stoele.ads: Likewise.
* libgnat/s-traent.ads: Likewise.
* libgnat/s-unstyp.ads: Likewise.
* libgnat/s-vaispe.ads: Likewise.
* libgnat/s-valspe.ads: Likewise.
* libgnat/s-vauspe.ads: Likewise.
* libgnat/s-veboop.ads: Likewise.
* libgnat/s-vector.ads: Likewise.
* libgnat/s-vs_int.ads: Likewise.
* libgnat/s-vs_lli.ads: Likewise.
* libgnat/s-vs_llu.ads: Likewise.
* libgnat/s-vs_uns.ads: Likewise.
* libgnat/s-vsllli.ads: Likewise.
* libgnat/text_io.ads: Likewise.
* libgnat/unchconv.ads: Likewise.
* libgnat/unchdeal.ads: Likewise.
* s-pack.ads.tmpl: Likewise.

ada: Make sure not to access past the end of bit-packed arrays

The code generated for the routines of the run-time library that implement
support for bit-packed arrays with non-power-of-2 component sizes turns out
to be problematic for the Address Sanitizer and the CHERI architecture, as
it may access past the end of bit-packed arrays in specific cases.
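
The hazard can be sketched in C under simplifying assumptions: little-endian bit order, elements grouped eight at a time into clusters of as many bytes as the element has bits, and hypothetical helper names. This is not the Ada implementation in s-pack.adb.tmpl; it only shows why dereferencing a whole cluster can touch bytes past the end of a short array, while reading the prefix that ends at the accessed element cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of a cluster that must be addressable to reach element K (0..7)
   when each element is BITS wide: the prefix ending at K's last bit.
   A full cluster (K = 7) always needs BITS bytes.  */
size_t
cluster_prefix_bytes (unsigned bits, unsigned k)
{
  assert (bits >= 1 && bits <= 63 && k < 8);
  return ((size_t) (k + 1) * bits + 7) / 8;
}

/* Read element INDEX from a little-endian bit-packed buffer, touching
   only the bytes that actually hold its bits.  */
uint64_t
packed_get (const uint8_t *buf, unsigned bits, size_t index)
{
  assert (bits >= 1 && bits <= 56);
  size_t first_bit = index * bits;
  size_t first_byte = first_bit / 8;
  size_t last_byte = (first_bit + bits - 1) / 8;
  uint64_t acc = 0;

  /* Gather the spanned bytes, most significant first.  */
  for (size_t b = last_byte + 1; b-- > first_byte; )
    acc = (acc << 8) | buf[b];

  return (acc >> (first_bit % 8)) & ((UINT64_C (1) << bits) - 1);
}
```

For example, an array of five 3-bit elements occupies two bytes, yet a full cluster of 3-bit elements spans three bytes, so a whole-cluster access to reach the fifth element would read one byte too many.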

No functional changes.

gcc/ada/ChangeLog:

* s-pack.adb.tmpl: Add '7' suffix to all existing constructs and
add variants with suffixes ranging from '0' to '6'.
(Get_@@): Dereference the address as a record object whose accessed
component is always the last.
(GetU_@@): Likewise.
(Set_@@): Likewise.
(SetU_@@): Likewise.
* libgnat/s-pack03.adb: Regenerate.
* libgnat/s-pack05.adb: Likewise.
* libgnat/s-pack06.adb: Likewise.
* libgnat/s-pack07.adb: Likewise.
* libgnat/s-pack09.adb: Likewise.
* libgnat/s-pack10.adb: Likewise.
* libgnat/s-pack100.adb: Likewise.
* libgnat/s-pack101.adb: Likewise.
* libgnat/s-pack102.adb: Likewise.
* libgnat/s-pack103.adb: Likewise.
* libgnat/s-pack104.adb: Likewise.
* libgnat/s-pack105.adb: Likewise.
* libgnat/s-pack106.adb: Likewise.
* libgnat/s-pack107.adb: Likewise.
* libgnat/s-pack108.adb: Likewise.
* libgnat/s-pack109.adb: Likewise.
* libgnat/s-pack11.adb: Likewise.
* libgnat/s-pack110.adb: Likewise.
* libgnat/s-pack111.adb: Likewise.
* libgnat/s-pack112.adb: Likewise.
* libgnat/s-pack113.adb: Likewise.
* libgnat/s-pack114.adb: Likewise.
* libgnat/s-pack115.adb: Likewise.
* libgnat/s-pack116.adb: Likewise.
* libgnat/s-pack117.adb: Likewise.
* libgnat/s-pack118.adb: Likewise.
* libgnat/s-pack119.adb: Likewise.
* libgnat/s-pack12.adb: Likewise.
* libgnat/s-pack120.adb: Likewise.
* libgnat/s-pack121.adb: Likewise.
* libgnat/s-pack122.adb: Likewise.
* libgnat/s-pack123.adb: Likewise.
* libgnat/s-pack124.adb: Likewise.
* libgnat/s-pack125.adb: Likewise.
* libgnat/s-pack126.adb: Likewise.
* libgnat/s-pack127.adb: Likewise.
* libgnat/s-pack13.adb: Likewise.
* libgnat/s-pack14.adb: Likewise.
* libgnat/s-pack15.adb: Likewise.
* libgnat/s-pack17.adb: Likewise.
* libgnat/s-pack18.adb: Likewise.
* libgnat/s-pack19.adb: Likewise.
* libgnat/s-pack20.adb: Likewise.
* libgnat/s-pack21.adb: Likewise.
* libgnat/s-pack22.adb: Likewise.
* libgnat/s-pack23.adb: Likewise.
* libgnat/s-pack24.adb: Likewise.
* libgnat/s-pack25.adb: Likewise.
* libgnat/s-pack26.adb: Likewise.
* libgnat/s-pack27.adb: Likewise.
* libgnat/s-pack28.adb: Likewise.
* libgnat/s-pack29.adb: Likewise.
* libgnat/s-pack30.adb: Likewise.
* libgnat/s-pack31.adb: Likewise.
* libgnat/s-pack33.adb: Likewise.
* libgnat/s-pack34.adb: Likewise.
* libgnat/s-pack35.adb: Likewise.
* libgnat/s-pack36.adb: Likewise.
* libgnat/s-pack37.adb: Likewise.
* libgnat/s-pack38.adb: Likewise.
* libgnat/s-pack39.adb: Likewise.
* libgnat/s-pack40.adb: Likewise.
* libgnat/s-pack41.adb: Likewise.
* libgnat/s-pack42.adb: Likewise.
* libgnat/s-pack43.adb: Likewise.
* libgnat/s-pack44.adb: Likewise.
* libgnat/s-pack45.adb: Likewise.
* libgnat/s-pack46.adb: Likewise.
* libgnat/s-pack47.adb: Likewise.
* libgnat/s-pack48.adb: Likewise.
* libgnat/s-pack49.adb: Likewise.
* libgnat/s-pack50.adb: Likewise.
* libgnat/s-pack51.adb: Likewise.
* libgnat/s-pack52.adb: Likewise.
* libgnat/s-pack53.adb: Likewise.
* libgnat/s-pack54.adb: Likewise.
* libgnat/s-pack55.adb: Likewise.
* libgnat/s-pack56.adb: Likewise.
* libgnat/s-pack57.adb: Likewise.
* libgnat/s-pack58.adb: Likewise.
* libgnat/s-pack59.adb: Likewise.
* libgnat/s-pack60.adb: Likewise.
* libgnat/s-pack61.adb: Likewise.
* libgnat/s-pack62.adb: Likewise.
* libgnat/s-pack63.adb: Likewise.
* libgnat/s-pack65.adb: Likewise.
* libgnat/s-pack66.adb: Likewise.
* libgnat/s-pack67.adb: Likewise.
* libgnat/s-pack68.adb: Likewise.
* libgnat/s-pack69.adb: Likewise.
* libgnat/s-pack70.adb: Likewise.
* libgnat/s-pack71.adb: Likewise.
* libgnat/s-pack72.adb: Likewise.
* libgnat/s-pack73.adb: Likewise.
* libgnat/s-pack74.adb: Likewise.
* libgnat/s-pack75.adb: Likewise.
* libgnat/s-pack76.adb: Likewise.
* libgnat/s-pack77.adb: Likewise.
* libgnat/s-pack78.adb: Likewise.
* libgnat/s-pack79.adb: Likewise.
* libgnat/s-pack80.adb: Likewise.
* libgnat/s-pack81.adb: Likewise.
* libgnat/s-pack82.adb: Likewise.
* libgnat/s-pack83.adb: Likewise.
* libgnat/s-pack84.adb: Likewise.
* libgnat/s-pack85.adb: Likewise.
* libgnat/s-pack86.adb: Likewise.
* libgnat/s-pack87.adb: Likewise.
* libgnat/s-pack88.adb: Likewise.
* libgnat/s-pack89.adb: Likewise.
* libgnat/s-pack90.adb: Likewise.
* libgnat/s-pack91.adb: Likewise.
* libgnat/s-pack92.adb: Likewise.
* libgnat/s-pack93.adb: Likewise.
* libgnat/s-pack94.adb: Likewise.
* libgnat/s-pack95.adb: Likewise.
* libgnat/s-pack96.adb: Likewise.
* libgnat/s-pack97.adb: Likewise.
* libgnat/s-pack98.adb: Likewise.
* libgnat/s-pack99.adb: Likewise.

ada: Fix assertion failure on null aggregate in generic with pragma Ada_2022

This happens when the unit is instantiated in a non-Ada 2022 unit.

gcc/ada/ChangeLog:

PR ada/114127
* sem_aggr.adb (Is_Null_Aggregate): Replace test on Ada_Version
with test on Nkind.

ada: Get rid of N_Unchecked_Expression node

This node is used in a single place in the front-end: it wraps the newly
built N_Indexed_Component nodes on the left-hand side of assignments
generated to elaborate array aggregates, and its effect is to disable
range checks for the expressions of these nodes.

Most of the code in the front-end does not expect to encounter it at all,
which leads to weird effects when this actually happens after changes are
made to the processing of array aggregates.

This change replaces the node by the Kill_Range_Check flag already present
on N_Unchecked_Type_Conversion, but with a slightly adjusted semantics.

gcc/ada/ChangeLog:

* exp_aggr.adb (Build_Array_Aggr_Code.Gen_Assign): Do not call
Checks_Off on the newly built N_Indexed_Component node but instead
set Kill_Range_Check on it.
* exp_ch4.ads (Expand_N_Unchecked_Expression): Delete.
* exp_ch4.adb (Expand_N_Indexed_Component): Remove handling of
N_Unchecked_Expression.
(Expand_N_Unchecked_Expression): Delete.
(Expand_N_Unchecked_Type_Conversion): Propagate the Assignment_OK
flag and rewrite the node manually.
* exp_util.adb (Insert_Actions): Remove handling of
N_Unchecked_Expression.
(Side_Effect_Free): Likewise.
* expander.adb (Expand): Likewise.
* gen_il-gen-gen_nodes.adb (N_Indexed_Component): Add flag
Kill_Range_Check for the purpose of semantics.
(N_Unchecked_Expression): Delete.
* gen_il-internals.ads (Type_Frequency): Remove entry for
N_Unchecked_Expression.
* gen_il-types.ads (Opt_Type_Enum): Remove N_Unchecked_Expression.
* pprint.adb (Expression_Image): Remove handling of
N_Unchecked_Expression.
* sem.adb (Analyze): Likewise.
* sem_ch4.ads (Analyze_Unchecked_Expression): Delete.
* sem_ch4.adb (Analyze_Unchecked_Expression): Likewise.
* sem_res.adb (Resolve_Unchecked_Expression): Likewise.
(Resolve): Remove handling of N_Unchecked_Expression.
(Resolve_Indexed_Component): Do not call Apply_Scalar_Range_Check
on the expressions if Kill_Range_Check is set on the node.
* sem_util.adb (Is_Non_Preelaborable_Construct): Remove handling of
N_Unchecked_Expression.
* sinfo.ads (Kill_Range_Check): Document it for N_Indexed_Component.
(Unchecked Expression): Delete specification.
* sprint.adb (Sprint_Node_Actual): Remove handling of
N_Unchecked_Expression.
* tbuild.ads (Checks_Off): Delete.
* tbuild.adb (Checks_Off): Likewise.

ada: Fix internal error on invalid prefix with assertions enabled

This happens for example with:

  package Q3 is
     type Types is (One, Two);
  end Q3;

  with Q3;

  package P3 is
     Kind : Q3.Types := Q3.Types.One;
  end P3;

and prevents the error from being given.

gcc/ada/ChangeLog:

PR ada/112979
* sem_ch8.adb (Find_Selected_Component): Try to recognize the
object operation notation only if the selector is a subprogram.

ada: Fix assertion failure on illegal use of aspect Type_Invariant

The illegal use is on a type derived from a formal private type, e.g.:

  generic
     type T is private;
  package G is
     type D is new T with Type_Invariant => True;
  end G;

gcc/ada/ChangeLog:

PR ada/113037
* sem_prag.adb (Analyze_Pragma) <Pragma_Invariant>: Reject types
that are derived from formal private types.

ada: Fix unexpected Program_Error raised in the parser on mismatched []

This happens for example with:

A : constant array (Natural range <>) of String := [ "xor" [;

The problem is that the left bracket token is incorrectly classified as
a name extension, but there is no handler in the Scan_Name_Extension_OK
part of P_Name in Par.Ch4.

gcc/ada/ChangeLog:

PR ada/112821
* scans.ads (Token_Type): Remove Tok_Left_Bracket from Namext.

ada: Fix internal error on instantiation of package with a nested ghost package

The instantiation triggers an internal error in Gigi because of a dangling
ghost entity created by the finalization machinery.

gcc/ada/ChangeLog:

PR ada/114300
* exp_ch7.adb (Attach_Object_To_Master_Node): Propagate the
Is_Ignored_Ghost_Entity flag from the finalization procedure.
(Build_Finalizer.Process_Declarations): Move up the test on
Is_Ignored_Ghost_Entity.
* exp_util.adb (Requires_Cleanup_Actions): Likewise.

ada: Fix premature finalization of anonymous access result from library function

In GNAT's implementation, the finalization of controlled objects created
through anonymous access types occurs when the enclosing library unit goes
out of scope if this is safe, and never occurs otherwise.

The case of a function that is a library unit with an anonymous access
result type falls in the second category for the anonymous access result
type itself and, therefore, finalization cannot take place for it.

gcc/ada/ChangeLog:

PR ada/55725
* exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call): Be
prepared for no collection if the access type is anonymous.
* exp_ch7.adb (Build_Anonymous_Collection): Return early for the
anonymous access result type of a library function.

ada: Accept SPARK.Big_Integers.Big_Integer where Big_Integer is accepted

For certification of the light SPARK runtime libraries, we now accept
expressions of type SPARK.Big_Integers.Big_Integer in subprogram and
loop variants.

gcc/ada/ChangeLog:

* exp_util.adb (Make_Variant_Comparison): Accept new types in
expansion.
* rtsfind.adb (Get_Unit_Name): Support SPARK.Big_Integers.
* rtsfind.ads (RTU_Id, RE_Id, RE_Unit_Table): Support new type
and its enclosing unit.
* sem_prag.adb (Analyze_Pragma): Support new type in pragma
Loop_Variant.
(Analyze_Subprogram_Variant_In_Decl_Part): Support new type in
aspect Subprogram_Variant.

ada: Make Interrupt and Attach Handlers Obsolescent in VxWorks

In order to trigger an obsolescent feature warning in
VxWorks if either the pragma or aspect of Interrupt_Handler
or Attach_Handler is used, the spec of the Register_Interrupt_Handler
method needs to be marked as obsolescent in a VxWorks-specific
version of the file.

gcc/ada/ChangeLog:

* libgnarl/s-interr__vxworks.ads (new): A VxWorks-specific
version of the file where Register_Interrupt_Handler is marked
with the Obsolescent pragma.
* libgnarl/s-interr__vxworks.adb: Remove pragma Obsolescent
that had no effect.
* Makefile.rtl: Add entries for using the
libgnarl/s-interr__vxworks.ads file.

ada: Fix bogus error for delta aggregate as expression function

The compiler correctly accepts the other forms of aggregates.

gcc/ada/ChangeLog:

PR ada/113868
* par-ch6.adb (P_Subprogram) <Scan_Body_Or_Expression_Function>:
Add delta aggregate alongside the other forms of aggregates.

ada: Remove couple of irregular calls to Resolve_Aggr_Expr

The function is supposed to be passed an expression, but it is passed the
enclosing N_Component_Association node in a couple of cases, only to give
an error that can as well be given in the caller, at the cost of bypasses
to disable most of its processing.

gcc/ada/ChangeLog:

* sem_aggr.adb (Resolve_Array_Aggregate): In the case of an others
choice with a box, do not call Resolve_Aggr_Expr and give the error
for a multidimensional array directly.
(Resolve_Aggr_Expr): Remove bypasses for above case.

ada: Allow file mapping for System's spec

Before this patch, it was never allowed to use pragma Source_File_Name
for the spec of System, allegedly because Targparm.Get_Target_Parameters
is called before configuration pragmas are processed. Using a mapping
file was allowed but did not work correctly.

This patch makes the loading of mapping files happen before the call to
Get_Target_Parameters, so a mapping file can set the file name of System.
Also, pragma Source_File_Name is allowed if it confirms a mapping that
was previously given in a mapping file, to accommodate GPRbuild that
uses both pragmas and mapping files.

gcc/ada/ChangeLog:

* frontend.adb (Frontend): Move call to Fmap.Initialize ...
* gnat1drv.adb (Gnat1drv): ... here. Look up Fmap when loading System.
* par-prag.adb (Prag): Allow pragma Source_File_Name for System when
it confirms an existing mapping.

ada: Fix markup typos

gcc/ada/ChangeLog:

* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix minor
markup errors.
* doc/gnat_ugn/gnat_utility_programs.rst: Likewise.
* gnat_ugn.texi: Regenerate.

ada: Remove use of overlays in implementation of System.Pack_N units

The implementation uses an overlay between an address and an access value,
which is convoluted. This changes it to use a direct conversion instead.

No functional changes (and no changes to generated code at -O2).

gcc/ada/ChangeLog:

* s-pack.adb.tmpl: Add "with System.Address_To_Access_Conversions".
(Cluster_Ref): Delete.
(AAC): New instance of System.Address_To_Access_Conversions.
(Rev_Cluster_Ref): Delete.
(Rev_AAC): New instance of System.Address_To_Access_Conversions.
(ClusterU_Ref): Delete.
(AACU): New instance of System.Address_To_Access_Conversions.
(Rev_ClusterU_Ref): Delete.
(Rev_AACU): New instance of System.Address_To_Access_Conversions.
(Get_@@): Use a direct address-to-access conversion.
(GetU_@@): Likewise.
(Set_@@): Likewise.
(SetU_@@): Likewise.
* libgnat/s-pack03.adb: Regenerate.
* libgnat/s-pack05.adb: Likewise.
* libgnat/s-pack06.adb: Likewise.
* libgnat/s-pack07.adb: Likewise.
* libgnat/s-pack09.adb: Likewise.
* libgnat/s-pack10.adb: Likewise.
* libgnat/s-pack100.adb: Likewise.
* libgnat/s-pack101.adb: Likewise.
* libgnat/s-pack102.adb: Likewise.
* libgnat/s-pack103.adb: Likewise.
* libgnat/s-pack104.adb: Likewise.
* libgnat/s-pack105.adb: Likewise.
* libgnat/s-pack106.adb: Likewise.
* libgnat/s-pack107.adb: Likewise.
* libgnat/s-pack108.adb: Likewise.
* libgnat/s-pack109.adb: Likewise.
* libgnat/s-pack11.adb: Likewise.
* libgnat/s-pack110.adb: Likewise.
* libgnat/s-pack111.adb: Likewise.
* libgnat/s-pack112.adb: Likewise.
* libgnat/s-pack113.adb: Likewise.
* libgnat/s-pack114.adb: Likewise.
* libgnat/s-pack115.adb: Likewise.
* libgnat/s-pack116.adb: Likewise.
* libgnat/s-pack117.adb: Likewise.
* libgnat/s-pack118.adb: Likewise.
* libgnat/s-pack119.adb: Likewise.
* libgnat/s-pack12.adb: Likewise.
* libgnat/s-pack120.adb: Likewise.
* libgnat/s-pack121.adb: Likewise.
* libgnat/s-pack122.adb: Likewise.
* libgnat/s-pack123.adb: Likewise.
* libgnat/s-pack124.adb: Likewise.
* libgnat/s-pack125.adb: Likewise.
* libgnat/s-pack126.adb: Likewise.
* libgnat/s-pack127.adb: Likewise.
* libgnat/s-pack13.adb: Likewise.
* libgnat/s-pack14.adb: Likewise.
* libgnat/s-pack15.adb: Likewise.
* libgnat/s-pack17.adb: Likewise.
* libgnat/s-pack18.adb: Likewise.
* libgnat/s-pack19.adb: Likewise.
* libgnat/s-pack20.adb: Likewise.
* libgnat/s-pack21.adb: Likewise.
* libgnat/s-pack22.adb: Likewise.
* libgnat/s-pack23.adb: Likewise.
* libgnat/s-pack24.adb: Likewise.
* libgnat/s-pack25.adb: Likewise.
* libgnat/s-pack26.adb: Likewise.
* libgnat/s-pack27.adb: Likewise.
* libgnat/s-pack28.adb: Likewise.
* libgnat/s-pack29.adb: Likewise.
* libgnat/s-pack30.adb: Likewise.
* libgnat/s-pack31.adb: Likewise.
* libgnat/s-pack33.adb: Likewise.
* libgnat/s-pack34.adb: Likewise.
* libgnat/s-pack35.adb: Likewise.
* libgnat/s-pack36.adb: Likewise.
* libgnat/s-pack37.adb: Likewise.
* libgnat/s-pack38.adb: Likewise.
* libgnat/s-pack39.adb: Likewise.
* libgnat/s-pack40.adb: Likewise.
* libgnat/s-pack41.adb: Likewise.
* libgnat/s-pack42.adb: Likewise.
* libgnat/s-pack43.adb: Likewise.
* libgnat/s-pack44.adb: Likewise.
* libgnat/s-pack45.adb: Likewise.
* libgnat/s-pack46.adb: Likewise.
* libgnat/s-pack47.adb: Likewise.
* libgnat/s-pack48.adb: Likewise.
* libgnat/s-pack49.adb: Likewise.
* libgnat/s-pack50.adb: Likewise.
* libgnat/s-pack51.adb: Likewise.
* libgnat/s-pack52.adb: Likewise.
* libgnat/s-pack53.adb: Likewise.
* libgnat/s-pack54.adb: Likewise.
* libgnat/s-pack55.adb: Likewise.
* libgnat/s-pack56.adb: Likewise.
* libgnat/s-pack57.adb: Likewise.
* libgnat/s-pack58.adb: Likewise.
* libgnat/s-pack59.adb: Likewise.
* libgnat/s-pack60.adb: Likewise.
* libgnat/s-pack61.adb: Likewise.
* libgnat/s-pack62.adb: Likewise.
* libgnat/s-pack63.adb: Likewise.
* libgnat/s-pack65.adb: Likewise.
* libgnat/s-pack66.adb: Likewise.
* libgnat/s-pack67.adb: Likewise.
* libgnat/s-pack68.adb: Likewise.
* libgnat/s-pack69.adb: Likewise.
* libgnat/s-pack70.adb: Likewise.
* libgnat/s-pack71.adb: Likewise.
* libgnat/s-pack72.adb: Likewise.
* libgnat/s-pack73.adb: Likewise.
* libgnat/s-pack74.adb: Likewise.
* libgnat/s-pack75.adb: Likewise.
* libgnat/s-pack76.adb: Likewise.
* libgnat/s-pack77.adb: Likewise.
* libgnat/s-pack78.adb: Likewise.
* libgnat/s-pack79.adb: Likewise.
* libgnat/s-pack80.adb: Likewise.
* libgnat/s-pack81.adb: Likewise.
* libgnat/s-pack82.adb: Likewise.
* libgnat/s-pack83.adb: Likewise.
* libgnat/s-pack84.adb: Likewise.
* libgnat/s-pack85.adb: Likewise.
* libgnat/s-pack86.adb: Likewise.
* libgnat/s-pack87.adb: Likewise.
* libgnat/s-pack88.adb: Likewise.
* libgnat/s-pack89.adb: Likewise.
* libgnat/s-pack90.adb: Likewise.
* libgnat/s-pack91.adb: Likewise.
* libgnat/s-pack92.adb: Likewise.
* libgnat/s-pack93.adb: Likewise.
* libgnat/s-pack94.adb: Likewise.
* libgnat/s-pack95.adb: Likewise.
* libgnat/s-pack96.adb: Likewise.
* libgnat/s-pack97.adb: Likewise.
* libgnat/s-pack98.adb: Likewise.
* libgnat/s-pack99.adb: Likewise.

ada: Remove obsolete ??? comment about Assignment_OK flag

The flagged use apparently disappeared long ago.

gcc/ada/ChangeLog:

* sinfo.ads (Assignment_OK): Remove obsolete ??? comment.

ada: Get rid of Kill_Range_Checks flag on entities

This flag is set in a single context, namely semantic analysis of record
type definitions, to avoid generating spurious range checks from it, and
a large testing campaign showed that, in practice, it makes a difference
in a single case, namely an access-to-constrained-array component with a
default expression, for example:

  type Acc_String is access all String (1 .. 100);

  type Rec (D : Positive) is record
    A : Acc_String := new String (1 .. D);
  end record;

Now there is another mechanism implemented in Process_Range_Expr_In_Decl to
avoid generating spurious range checks, which does not work in this specific
case but can be made to work with a small tweak to Denotes_Discriminant.

gcc/ada/ChangeLog:

* checks.adb (Range_Checks_Suppressed): Remove test on the
Kill_Range_Checks flag.
* einfo.ads (Kill_Range_Checks): Delete.
* gen_il-fields.ads (Opt_Field_Enum): Remove Kill_Range_Checks.
* gen_il-gen-gen_entities.adb (Entity_Kind): Likewise.
* sem_ch3.adb (Record_Type_Declaration): Do not set the
Kill_Range_Checks flag.
* sem_util.adb (Denotes_Discriminant): In a default expression,
also return True for a discriminal.

ada: Improve message for misused implicitly-defined preprocessor symbol.

If the -u option is specified, then otherwise-undefined preprocessor
symbols are implicitly defined to be False. If such an implicitly-defined
symbol is then incorrectly used in a context that requires an integer value,
the resulting error message should not incorrectly state that the symbol is
undefined.

gcc/ada/ChangeLog:

* prep.adb (Expression): Improve error message text when an
implicitly-defined Boolean-valued symbol is used in a context that
requires an integer value.

ada: Flatten Is_Build_In_Place_Aggregate_Return predicate

The predicate is passed an aggregate node and goes up its parent chain,
but that's unnecessary because Convert_To_Assignments has already done
so in the case of a record aggregate and Expand_Array_Aggregate does not
fully support intermediate conditional expressions yet.

gcc/ada/ChangeLog:

* exp_aggr.adb (Is_Build_In_Place_Aggregate_Return): Directly test
the node and remove dead code for extended return statements.

ada: Set correct minimum stack size for aarch64-linux

The minimum stack size given by PTHREAD_STACK_MIN on AArch64 Linux
is 131072 bytes. Add a separate version for this target to reflect
that value. Previously the x86-64 value of 16384 bytes was used.

gcc/ada/ChangeLog:

* Makefile.rtl: Use libgnat/s-parame__aarch64-linux.adb for
s-parame.adb on aarch64-linux.
* libgnat/s-parame__aarch64-linux.adb: Add file.

ada: Detect sharing of external file in inconsistent read-write modes

When opening files with "shared=yes", as described in GNAT RM 11.10,
Sharing Files, we now prevent sharing a single file in inconsistent
read-write modes.
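
The check can be pictured with the following C sketch, which is only an
analogy and not the System.File_IO code: a small table of open files
refuses a shared open when the same name is already open in a different
mode. The table, the open_shared function, the file_mode enumeration and
the return codes are all assumptions made for this illustration.

```c
#include <string.h>

enum file_mode { MODE_READ, MODE_WRITE };

#define MAX_OPEN 8

/* One slot per open file.  NAME is borrowed from the caller and must
   outlive the entry (hypothetical layout, for illustration only).  */
struct open_entry {
    const char *name;
    enum file_mode mode;
    int in_use;
};

static struct open_entry open_table[MAX_OPEN];

/* Register a shared open of NAME in MODE.
   Returns the slot index (>= 0) on success, -1 if NAME is already open
   in a different mode, or -2 if the table is full.  */
int open_shared(const char *name, enum file_mode mode)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_OPEN; i++) {
        if (!open_table[i].in_use) {
            if (free_slot < 0)
                free_slot = i;
            continue;
        }
        /* Same external file, different mode: inconsistent sharing.  */
        if (strcmp(open_table[i].name, name) == 0
            && open_table[i].mode != mode)
            return -1;
    }

    if (free_slot < 0)
        return -2;

    open_table[free_slot] = (struct open_entry){ name, mode, 1 };
    return free_slot;
}
```

In this sketch, two shared opens of the same file for reading both
succeed, while a later shared open of that file for writing is refused.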

gcc/ada/ChangeLog:

* doc/gnat_rm/the_implementation_of_standard_i_o.rst
(Shared Files): Add trailing period.
* libgnat/s-ficobl.ads (AFCB): Reflect new behavior in comment.
* libgnat/s-fileio.adb (Open): Detect inconsistent sharing,
just like we do in System.File_IO.Reset.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: Spurious error on abstract primitive with access formals

This patch fixes an issue in the compiler whereby using anonymous access
types as abstract overridden subprogram formals for a derived abstract type
may lead to compile-time errors.

gcc/ada/ChangeLog:

* accessibility.adb (Type_Access_Level): Add handling for
subprogram aliases.

ada: Missing runtime tag check on mutably tagged objects

This patch fixes an issue in the compiler whereby assigning to a non-existent
mutably tagged object component failed to result in the expected run-time
exception.

gcc/ada/ChangeLog:

* exp_ch4.adb (Expand_N_Type_Conversion): Add special runtime check
generation for mutably tagged objects.

ada: GNAT Calendar Support for 64-bit Unix Time

The Epochalypse of 2038 will require the use of 64-bit time_t and
tv_sec (aka time in seconds from the Unix Epoch). The subprograms
in Ada calendar are self-contained but nevertheless will malfunction
if a 64-bit integer type and calculations aren't used. Add 64-bit
versions and mark the old ones with pragma Obsolescent.
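
The limit in question can be shown with a short C sketch; it is not the
Ada.Calendar code, and the names fits_in_32_bit_time, timespec64_sketch
and split_nanoseconds are hypothetical. A signed 32-bit second count ends
at 2147483647 (2038-01-19 03:14:07 UTC), so anything later needs 64-bit
storage and arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a second count since the Unix Epoch fits in a signed
   32-bit field.  */
bool fits_in_32_bit_time(int64_t seconds)
{
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/* Hypothetical 64-bit counterpart of a timespec-like pair.  */
struct timespec64_sketch {
    int64_t tv_sec;   /* whole seconds since the Unix Epoch */
    int64_t tv_nsec;  /* nanoseconds, always in 0 .. 999999999 */
};

/* Split a nanosecond count since the Epoch into seconds and
   nanoseconds using 64-bit arithmetic throughout.  */
struct timespec64_sketch split_nanoseconds(int64_t total_ns)
{
    struct timespec64_sketch ts;

    ts.tv_sec = total_ns / 1000000000LL;
    ts.tv_nsec = total_ns % 1000000000LL;

    /* C division truncates toward zero; normalize times before the
       Epoch so that tv_nsec stays non-negative.  */
    if (ts.tv_nsec < 0) {
        ts.tv_sec -= 1;
        ts.tv_nsec += 1000000000LL;
    }
    return ts;
}
```

Here fits_in_32_bit_time (2147483647) holds, while one second later it no
longer does; split_nanoseconds still returns the correct second count for
such a time because every intermediate value is 64 bits wide.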

gcc/ada/ChangeLog:

* libgnat/a-calcon.adb (To_Ada_Time) (To_Duration)
(To_Struct_Timespec) (To_Unix_Time): Mark as obsolescent.
(To_Ada_Time_64) (To_Duration_64) (To_Struct_Timespec_64)
(To_Unix_Time_64): New.
* libgnat/a-calcon.ads (To_Ada_Time) (To_Duration)
(To_Struct_Timespec) (To_Unix_Time): Mark as obsolescent.
(To_Ada_Time_64) (To_Duration_64) (To_Struct_Timespec_64)
(To_Unix_Time_64): New.
* libgnat/a-calend.adb (To_Ada_Time) (To_Duration)
(To_Struct_Timespec) (To_Unix_Time): Mark as obsolescent.
(To_Ada_Time_64) (To_Duration_64) (To_Struct_Timespec_64)
(To_Unix_Time_64): New.
* libgnat/a-calend.ads (To_Ada_Time) (To_Duration)
(To_Struct_Timespec) (To_Unix_Time): Mark as obsolescent.
(To_Ada_Time_64) (To_Duration_64) (To_Struct_Timespec_64)
(To_Unix_Time_64): New.

ada: Fix internal error on nested iterated component associations

The problem is that Insert_Actions gets confused as to where it should
insert actions coming from within an N_Iterated_Component_Association,
because some actions may be generated during semantic analysis and some
others during expansion.

Instead of another ad-hoc fix, this change extends the processing done
for N_Component_Association, that is to say waiting for the Loop_Actions
field to be set during expansion before inserting actions in there.

This in turn requires semantic analysis to stop generating actions for
N_Iterated_Component_Association nodes.  The current processing is a
little unstable:
  - for container aggregates, Resolve_Iterated_Association preanalyzes
    a copy of the expression,
  - for delta aggregates, Resolve_Delta_Array_Aggregate fully analyzes
    a copy of the expression,
  - for array aggregates, Resolve_Aggr_Expr entirely skips the analysis.

The change implements a preanalysis of a copy of the expression using
Copy_Separate_Tree, which should be sufficient since the expression is
supposed to be unanalyzed at this point, recursively in the context of
N_Iterated_Component_Association nodes.

gcc/ada/ChangeLog:

PR ada/117018
* exp_aggr.adb (Build_Array_Aggr_Code): Do not expect the
Loop_Actions field to be already present on association nodes.
* exp_util.adb (Insert_Actions): For association nodes, insert
into the Loop_Actions field only if it is already present.
* sem_aggr.adb (Resolve_Array_Aggregate): Add Iterated parameter.
(Resolve_Aggregate): Adjust calls to Resolve_Array_Aggregate.
(Resolve_Aggr_Expr): Add Iterated_Elmt defaulted parameter and
a default for Single_Elmt.  Adjust call to Resolve_Array_Aggregate.
Preanalyze a copy of the expression in an iteration context.
(Resolve_Iterated_Component_Association): Pass Iterated_Elmt as
True to Resolve_Aggr_Expr and remove processing of Loop_Actions.
Do not check incorrect use of dynamically tagged expression in
an iteration context.
(Resolve_Iterated_Association): Use Copy_Separate_Tree instead of
New_Copy_Tree and set the Parent field of the result.
(Resolve_Delta_Array_Aggregate): Likewise.  Only preanalyze the
copy instead of analyzing it.

ada: Add documentation about GNAT LLVM to GNAT User's Guide

Also be consistent on spelling of "back end".

gcc/ada/ChangeLog:

* doc/gnat_ugn/about_this_guide.rst: Add information about GNAT LLVM.
Be consistent about spelling of "back end".
* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Likewise.
* doc/gnat_ugn/gnat_and_program_execution.rst: Be consistent about
spelling of "back end".
* doc/gnat_ugn/the_gnat_compilation_model.rst: Likewise.
* gnat_ugn.texi: Regenerate.

ada: Fix compilation failure due to style warning

gcc/ada/ChangeLog:

* mdll.adb (Build_Dynamic_Library): Fix indentation.

ada: Rework GNATdll shared library relocation support.

The code has been simplified to use a single way to create a DLL.
The relocation support is based on whether the base address for the
DLL is passed to the final linker step or not.

gcc/ada/ChangeLog:

* mdll.adb: Use the same procedure to create relocatable or
non-relocatable DLL. The only difference is whether the base address is
passed to the final linker. If no base address is given the DLL is
relocatable.