Peter Cordes
  • Member for 15 years, 5 months
  • Last seen this week
102 votes
3 answers
16k views

Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?

76 votes
1 answer
19k views

What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?

58 votes
0 answers
2k views

Per-element atomicity of vector load/store and gather/scatter?

53 votes
2 answers
5k views

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent

34 votes
1 answer
5k views

What are the best instruction sequences to generate vector constants on the fly?

23 votes
2 answers
4k views

Does modern PC video hardware support VGA text mode in HW, or does the BIOS emulate it (with System Management Mode)?

19 votes
3 answers
10k views

How to convert a binary integer number to a hex string?

18 votes
2 answers
5k views

In GNU C inline asm, what are the size-override modifiers for xmm/ymm/zmm for a single operand?

17 votes
2 answers
506 views

Does undefined behaviour retroactively mean that earlier visible side-effects aren't guaranteed?

15 votes
1 answer
3k views

Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted?

15 votes
2 answers
3k views

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? Or not using that insn at all

12 votes
2 answers
2k views

Are there any modern CPUs where a cached byte store is actually slower than a word store?

12 votes
1 answer
2k views

Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?

11 votes
1 answer
2k views

Symbol name conflicts with new register names in new NASM versions?

11 votes
1 answer
2k views

How to merge a scalar into a vector without the compiler wasting an instruction zeroing upper elements? Design limitation in Intel's intrinsics?

11 votes
1 answer
2k views

x86-32 / x86-64 polyglot machine-code fragment that detects 64bit mode at run-time?

10 votes
1 answer
878 views

Does Skylake need vzeroupper for turbo clocks to recover after a 512-bit instruction that only reads a ZMM register, writing a k mask?

10 votes
2 answers
728 views

Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?

9 votes
1 answer
2k views

Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?

8 votes
2 answers
2k views

How does MIPS I handle branching on the previous ALU instruction without stalling?

8 votes
1 answer
990 views

How to force NASM to encode [1 + rax*2] as disp32 + index*2 instead of disp8 + base + index?

8 votes
1 answer
1k views

Can PTEST be used to test if two registers are both zero or some other condition?

7 votes
1 answer
806 views

Do FP and integer division compete for the same throughput resources on x86 CPUs?

7 votes
1 answer
2k views

How to write an absolute target for a near direct relative call/jmp in MASM

7 votes
1 answer
2k views

GNU C native vectors: how to broadcast a scalar, like x86's _mm_set1_epi16

2 votes
1 answer
439 views

Does a function with instructions before the entry-point label cause problems for anything (linking)?

2 votes
1 answer
650 views

Write x86 asm functions portably (win/linux/osx), without a build-depend on yasm/nasm?

2 votes
2 answers
2k views

PHP form and DB query input validation