Eventually we will want to be able to make use of SIMD operations for `f16` and `f128`, now that we have primitives to represent them. Possibilities that I know of:
- AArch64 NEON supports `float16x{4,8}`: https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&f:@navigationhierarchiesreturnbasetype=[float]&f:@navigationhierarchieselementbitsize=[16]&q=
- Arm SVE supports `float16x{1,2}`: https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiesreturnbasetype=[float]&f:@navigationhierarchieselementbitsize=[16]&f:@navigationhierarchiessimdisa=[sve2,sve]&q=
- RISC-V apparently has both f16 and f128: https://five-embeddev.com/riscv-user-isa-manual/riscv-user-2.2/v.html
- NVIDIA PTX has f16 SIMD
  - Implementation: NVPTX: Add f16 SIMD intrinsics stdarch#1626
  - Submodule: Update stdarch submodule #128866
  - Tracking issue: Tracking Issue for NVPTX arch intrinsics #111199
- x86 with +avx512fp16
  - Implementation: Implement AVX512_FP16 stdarch#1605
  - Submodule: Update the stdarch submodule #128466
  - Tracking issue: Tracking Issue for AVX512_FP16 intrinsics #127213
- Portable SIMD should eventually be able to support these operations
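For context, the operation all of these ISAs expose is a lanewise op over packed half-precision lanes. Below is a minimal scalar model of a 4-lane f16 add on stable Rust, carrying values as raw `u16` bit patterns (the `f16` primitive and the intrinsics above are nightly-only); the helper names are made up for this sketch, and the f32-to-f16 conversion is simplified (truncating, normal values only, no NaN/subnormal/rounding handling):

```rust
/// Decode an IEEE 754 binary16 bit pattern into an f32.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = if bits >> 15 == 0 { 1.0f32 } else { -1.0 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = (bits & 0x3ff) as f32;
    match exp {
        0 => sign * 2f32.powi(-14) * (frac / 1024.0), // zero and subnormals
        31 if frac == 0.0 => sign * f32::INFINITY,
        31 => f32::NAN,
        _ => sign * 2f32.powi(exp - 15) * (1.0 + frac / 1024.0),
    }
}

/// Encode a normal-range f32 as binary16 bits (truncating the mantissa,
/// so it is exact only for values already representable in f16).
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32 - 127 + 15; // rebias 8-bit exp to 5-bit
    debug_assert!((1..31).contains(&exp), "only normal f16 values are handled");
    let frac = ((bits >> 13) & 0x3ff) as u16; // keep top 10 mantissa bits
    sign | ((exp as u16) << 10) | frac
}

/// Lanewise add of two packed f16x4 vectors, one lane at a time —
/// what e.g. a NEON float16x4 add performs in a single instruction.
fn f16x4_add(a: [u16; 4], b: [u16; 4]) -> [u16; 4] {
    let mut out = [0u16; 4];
    for i in 0..4 {
        out[i] = f32_to_f16_bits(f16_bits_to_f32(a[i]) + f16_bits_to_f32(b[i]));
    }
    out
}

fn main() {
    // [1.0, 2.0, -0.5, 1.5] + [0.5, 0.25, 0.75, 2.25] = [1.5, 2.25, 0.25, 3.75]
    let a = [0x3C00, 0x4000, 0xB800, 0x3E00];
    let b = [0x3800, 0x3400, 0x3A00, 0x4080];
    assert_eq!(f16x4_add(a, b), [0x3E00, 0x4080, 0x3400, 0x4380]);
}
```

The hardware intrinsics replace the per-lane loop with one vector instruction; the widening-to-f32 trick is also roughly what a software fallback has to do on targets without native f16 arithmetic.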
There is probably some work/research overlap with adding assembly support (#125398).
Tracking issue: #116909