I spent 10 years straight doing C++ and assembly optimization. My work is still fun these days, but that was probably the most enjoyable work of my career in terms of the actual day-to-day coding.
I feel the same way about code cleanup in general, but it’s really hard to justify putting much time into that when running your own company solo.
The routines were individually benchmarked with some custom tools (run each one repeatedly and use statistical analysis to converge on an estimate), and always compared against a plain C reference implementation.
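The custom tools themselves aren’t described, so here is only a minimal sketch of the general idea, assuming a simple stopping rule: time a routine in batches, keep a running mean and variance (Welford’s method), and stop once the 95% confidence interval of the mean is within a target fraction of the mean. The names `measure`, `Estimate`, `sum_ref` and `sum_opt`, the 1.96 normal-approximation factor, and all default parameters are hypothetical choices for illustration, not the original tooling.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Converged timing estimate (hypothetical structure).
struct Estimate {
    double mean_ns;      // mean time per call, in nanoseconds
    double rel_ci;       // 95% CI half-width divided by the mean
    std::size_t samples; // number of timed batches taken
};

// Time `fn` in batches of `batch` calls until the 95% CI of the mean is
// below `target_rel` of the mean, or `max_samples` batches have run.
// Assumes per-batch times are roughly normal (hence the 1.96 factor).
// min_samples must be at least 2 so the sample variance is defined.
template <typename Fn>
Estimate measure(Fn&& fn, std::size_t batch = 1000, double target_rel = 0.01,
                 std::size_t min_samples = 10, std::size_t max_samples = 10000) {
    using clock = std::chrono::steady_clock;
    double mean = 0.0, m2 = 0.0; // Welford running mean / sum of squares
    double rel = std::numeric_limits<double>::infinity();
    std::size_t n = 0;
    while (n < max_samples) {
        auto t0 = clock::now();
        for (std::size_t i = 0; i < batch; ++i) fn();
        auto t1 = clock::now();
        double x = std::chrono::duration<double, std::nano>(t1 - t0).count() / batch;
        ++n;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        if (n >= min_samples && mean > 0.0) {
            double sem = std::sqrt(m2 / (n - 1) / n); // standard error of the mean
            rel = 1.96 * sem / mean;
            if (rel < target_rel) break; // converged
        }
    }
    return {mean, rel, n};
}

// Plain C-style reference implementation.
std::uint64_t sum_ref(const std::uint32_t* p, std::size_t n) {
    std::uint64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Stand-in "optimized" version: four independent accumulators.
std::uint64_t sum_opt(const std::uint32_t* p, std::size_t n) {
    std::uint64_t a = 0, b = 0, c = 0, d = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += p[i]; b += p[i + 1]; c += p[i + 2]; d += p[i + 3];
    }
    for (; i < n; ++i) a += p[i]; // leftover elements
    return a + b + c + d;
}

// Volatile sink so the compiler must perform every store.
volatile std::uint64_t sink;
```

In use, you would first check that `sum_opt` returns the same result as `sum_ref`, then call `measure([&] { sink = sum_ref(p, n); })` and the same for `sum_opt`, and compare `mean_ns`. The `rel_ci` field tells you whether the difference is larger than the measurement noise.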
Then there was a system for benchmarking the software as a whole on a wide variety of architectures, including NUMA systems, with lots of plots and statistics.
Usually you’d eventually end up at a point where the improvements are below the noise floor, or they help on some systems and cause regressions on others. The rule was usually “no regressions.”
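A “no regressions” rule can be made mechanical by comparing each system’s change against that system’s own noise floor. The sketch below is one hypothetical way to encode it: the `SystemResult` fields, the `judge` function, and the choice to treat a change within the noise as neither gain nor loss are all assumptions, not the rule actually used.

```cpp
#include <string>
#include <vector>

// One benchmark result per machine configuration (hypothetical structure).
struct SystemResult {
    std::string system;  // machine / architecture label
    double baseline_ns;  // time before the change
    double candidate_ns; // time after the change
    double noise_rel;    // relative noise floor on this system, e.g. 0.02 = 2%
};

enum class Verdict { Accept, Reject, NoMeasurableGain };

// Reject if any system gets slower by more than its noise floor.
// Accept only if at least one system gets faster by more than its noise floor.
Verdict judge(const std::vector<SystemResult>& results) {
    bool any_gain = false;
    for (const auto& r : results) {
        // Positive change means the candidate is slower.
        double change = (r.candidate_ns - r.baseline_ns) / r.baseline_ns;
        if (change > r.noise_rel) return Verdict::Reject;
        if (-change > r.noise_rel) any_gain = true;
    }
    return any_gain ? Verdict::Accept : Verdict::NoMeasurableGain;
}
```

The `NoMeasurableGain` case covers the “below the noise floor” situation: nothing got worse, but nothing got measurably better either.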
VTune for multithreading optimization, plus a custom fiber-based, lock-free system for efficient scheduling.
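The scheduler design isn’t described, so this is not it. As a hypothetical illustration of the kind of lock-free building block such a system can use, here is a bounded single-producer/single-consumer ring buffer, for example for handing tasks from one thread to one worker. The class name `SpscQueue` and its interface are assumptions; a real work-stealing scheduler would need multi-consumer structures as well.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <vector>

// Bounded single-producer/single-consumer queue (C++17).
// Exactly one thread may call push() and exactly one other thread pop().
template <typename T>
class SpscQueue {
public:
    // One slot is left empty to distinguish "full" from "empty".
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) {}

    // Producer side: returns false if the queue is full.
    bool push(const T& v) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        std::size_t next = (h + 1) % buf_.size();
        // Acquire pairs with the consumer's release: the slot is no longer being read.
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buf_[h] = v;
        head_.store(next, std::memory_order_release); // publish the written slot
        return true;
    }

    // Consumer side: returns std::nullopt if the queue is empty.
    std::optional<T> pop() {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release: the slot's contents are visible.
        if (t == head_.load(std::memory_order_acquire)) return std::nullopt;
        T v = buf_[t];
        tail_.store((t + 1) % buf_.size(), std::memory_order_release); // free the slot
        return v;
    }

private:
    std::vector<T> buf_;
    // Separate cache lines so producer and consumer don't false-share.
    alignas(64) std::atomic<std::size_t> head_{0}; // written only by the producer
    alignas(64) std::atomic<std::size_t> tail_{0}; // written only by the consumer
};
```

Because each index has a single writer, no compare-and-swap is needed; the acquire/release pairs are enough to order the buffer accesses.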