> it won't show up in profiles except for ones that capture stacks
I don't think I've ever used a profiler that couldn't report you were in F() here. One that only captures your innermost functions really doesn't seem that useful, for exactly the reasons you give.
The default usage of perf does this. There are also a few profilers I know of that will show the functions taking the most time.
IMO, those are (generally) nowhere near as useful as a flame/icicle graph.
Not saying they are never useful; sometimes people do really dumb things in one function. However, the actual performance bottleneck often lives at least a few levels up the stack.
Which is why the defaults for perf always drive me crazy. You want to see the entire call tree with the cumulative and exclusive time spent in all the functions.
I’m honestly curious why the defaults are the way they are. I have basically never found them to be what I want. Surely the perf people aren’t doing something completely different than I am?
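To make the cumulative vs. exclusive distinction concrete, here is a minimal sketch in Python. The sample stacks and the function names (`main`, `F`, `G`, `memcpy`, `strlen`) are made up for illustration, and this is not how perf itself aggregates samples; it only shows why a flat, leaf-only view can hide a caller like F() entirely.

```python
from collections import Counter

# Hypothetical stack samples, outermost frame first (invented for illustration).
samples = [
    ("main", "F", "memcpy"),
    ("main", "F", "memcpy"),
    ("main", "F", "strlen"),
    ("main", "G", "memcpy"),
]

def exclusive_counts(samples):
    """Count each sample against its innermost frame only (flat profile)."""
    return Counter(stack[-1] for stack in samples)

def cumulative_counts(samples):
    """Count each sample against every function on its stack, once per sample."""
    counts = Counter()
    for stack in samples:
        counts.update(set(stack))  # set() avoids double counting recursive frames
    return counts

exclusive = exclusive_counts(samples)
cumulative = cumulative_counts(samples)

for name in ("main", "F", "G", "memcpy", "strlen"):
    print(f"{name:8} cumulative={cumulative[name]} exclusive={exclusive[name]}")
```

In the flat view F never appears, because it is never the innermost frame, even though it is on the stack for three of the four samples. In perf terms this roughly corresponds to the "Self" and "Children" columns that `perf report` shows when call chains were recorded.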
I almost never find graph usage useful, TBH (and flamegraphs are worse than useless). And perf's support for stack traces is always wonky _somehow_, so it's not easy to find good defaults for the cases where I need them (I tend to switch between fp, lbr and dwarf depending on a whole lot of factors).
I think I've only been able to get good call stacks when I build everything myself with the right compilation options. This is a big contrast with what I remember from working with similar tools in Microsoft environments (MS Profiler or VTune).