Misoptimization: EarlyCSEPass replaces powi.f16 with float result #98665

Closed
@tgross35

It looks like EarlyCSEPass is transforming the following:

  %_6 = alloca [48 x i8], align 8
  %_3 = alloca [2 x i8], align 2
  %0 = call half @llvm.powi.f16.i32(half 0xH3C00, i32 1) ; 0xH3C00 = 1.0f16
  store half %0, ptr %_3, align 2
  %1 = load half, ptr %_3, align 2
  %_4 = fcmp oeq half %1, 0xH3C00
  br i1 %_4, label %bb1, label %bb2

Into this:

  %_6 = alloca [48 x i8], align 8
  %_3 = alloca [2 x i8], align 2
  store float 1.000000e+00, ptr %_3, align 2
  %0 = load half, ptr %_3, align 2
  %_4 = fcmp oeq half %0, 0xH3C00
  br i1 %_4, label %bb1, label %bb2

And later InstCombine folds further into:

  %_6 = alloca [48 x i8], align 8
  %_3 = alloca [2 x i8], align 2
  store float 1.000000e+00, ptr %_3, align 2
  br i1 false, label %bb1, label %bb2

EarlyCSE appears to be performing an incorrect transformation: the result of powi.f16(1.0, 1) should be half 1.0 (0x3c00), but the call is replaced with float 1.0 (0x3f800000). Besides producing the wrong value, the 4-byte float store into the 2-byte alloca %_3 is an out-of-bounds write.
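As a rough illustration of why the comparison then folds to false, the sketch below reproduces the byte-level effect in stable Rust, without the f16 feature. It assumes a little-endian target (as on aarch64 in its usual configuration) and that the later half load simply reads the low two bytes of the stored float. The helper name `low_two_bytes_after_f32_store` is hypothetical and is not part of LLVM or the reproduction.

```rust
/// IEEE 754 binary16 bit pattern for 1.0 (written `0xH3C00` in LLVM IR).
const HALF_ONE_BITS: u16 = 0x3C00;

/// Hypothetical helper: the bits a 2-byte load would observe after a 4-byte
/// store of `value` to the same address, assuming little-endian byte order.
fn low_two_bytes_after_f32_store(value: f32) -> u16 {
    let bytes = value.to_bits().to_le_bytes();
    u16::from_le_bytes([bytes[0], bytes[1]])
}

fn main() {
    // float 1.0 has the bit pattern quoted in the report.
    assert_eq!(1.0f32.to_bits(), 0x3F80_0000);

    // The low half of that pattern is all zeros, which is +0.0 as a binary16
    // value, so it can never compare equal to half 1.0.
    let reloaded = low_two_bytes_after_f32_store(1.0);
    println!("{:#06x}", reloaded); // prints 0x0000
    assert_ne!(reloaded, HALF_ONE_BITS);
}
```

Under these assumptions the reloaded half is 0.0 rather than 1.0, which is consistent with InstCombine folding the branch condition to false.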

This comes from the following Rust code, whose assertion fails only when optimizations are enabled:

#![feature(f16)]
#![allow(unused)]

#[inline(never)]
pub fn check_pow(a: f16) {
    assert_eq!(1.0f16.powi(1), 1.0);
}

pub fn main() {
    check_pow(1.0);
    println!("finished");
}

Link to compiler explorer: https://rust.godbolt.org/z/zsbzzxGvj

I'm not sure how to reduce this to an llc example since the passes appear to be different. I have been testing on aarch64 since x86 has other f16 ABI bugs, but I don't think this is limited to aarch64.
