"Register pressure is even worse because that is only a problem because of the ISA, not the microarchitecture."

I'm not so sure. How many cycles would you expect this code to take?

  mov dword [rsi], eax
  add dword [rsi], 5
  mov ebx, dword [rsi]

According to Agner Fog, these three instructions have a latency of 15 cycles on an AMD Zen 1 processor. On Zen 2, the latency is 2 cycles. This is because the CPU gained the ability to assign a register to `dword [rsi]`, overcoming the limit of 16 architectural registers.
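If anyone wants to poke at this themselves, here is a rough C sketch of the same store / add-to-memory / load chain. The names `chain_step` and `run_chain` are made up for illustration, and `volatile` only keeps the compiler from folding the memory accesses away; it doesn't guarantee the compiler emits exactly the three instructions above, so check the disassembly before timing anything.

```c
#include <stdint.h>

/* Hypothetical helper: one round of the store / add / load chain.
   volatile forces real memory accesses, but the exact instruction
   selection is still up to the compiler. */
uint32_t chain_step(volatile uint32_t *slot, uint32_t value)
{
    *slot = value;   /* roughly: mov dword [rsi], eax */
    *slot += 5;      /* roughly: add dword [rsi], 5   */
    return *slot;    /* roughly: mov ebx, dword [rsi] */
}

/* Feed each result into the next store, so all rounds form one
   serial dependency chain through the same memory location. */
uint32_t run_chain(uint32_t start, unsigned rounds)
{
    volatile uint32_t slot = 0;
    uint32_t v = start;
    for (unsigned i = 0; i < rounds; i++)
        v = chain_step(&slot, v);
    return v;
}
```

Timing `run_chain` with a large `rounds` and dividing by the round count should give a per-round latency that you can compare between a Zen 1 and a Zen 2 machine.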

This optimization has its problems, obviously. Pointer aliasing will cause the CPU to make the wrong assumption at times, leading to a situation not entirely unlike a branch mispredict.
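To make the aliasing case concrete, here is a small C sketch (the function name `aliased_step` is hypothetical). The code is identical in both cases, but the result depends on whether the two pointers hold the same address. My assumption is that this is the kind of case where a guess based on the pointer register rather than the actual address can go wrong.

```c
#include <stdint.h>

/* Hypothetical example: store through p, add through q, load through p.
   Whether the load sees the add depends entirely on whether p and q
   hold the same address, which is not visible from the code itself. */
uint32_t aliased_step(volatile uint32_t *p, volatile uint32_t *q,
                      uint32_t value)
{
    *p = value;   /* store through the first pointer           */
    *q += 5;      /* read-modify-write through the second one  */
    return *p;    /* value + 5 if p == q, otherwise just value */
}
```

Calling it with the same pointer twice returns `value + 5`. Calling it with two distinct locations returns `value` and bumps the other location by 5.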

There are constraints imposed by the micro-architecture for this feature. For you and me, a big one is that it only works with general-purpose registers. But is there a reason it couldn't or shouldn't be done for vectors? It seems like a micro-arch issue to me. Perhaps in a few years, or in many, we'll have a CPU that can do this optimization for vectors.




I learn something new every day! Thanks for mentioning this. For other readers: Agner Fog documents this in section 22.18, "Mirroring memory operands".

I knew that similar optimizations exist, namely store-to-load forwarding, but I didn't know that AMD had experimented with mapping in-flight writes straight into the register file. Sounds like they've abandoned this approach, though: Zen 3 doesn't feature it, supposedly because it's expensive to implement. So for all intents and purposes, it doesn't exist anymore, and it probably won't be brought back in the same fashion.

I do still think this is something better solved by ISA changes. Doing this at the uarch level will be either flaky or more costly. It is absolutely possible, but only with tradeoffs that may not be acceptable. APX doubles the number of GPRs and improves orthogonality, so there's at least work in that direction at the ISA level, and I think that's what we're realistically going to use soon.



