Yes. Some algorithms have a non-obvious structure whose sole purpose is to turn a single logical stream of dependent instructions into multiple streams that are independent of one another (each still dependent internally) and can run concurrently through the same pipeline. The cost of doing this, of course, is that it typically increases register pressure, since each stream needs its own live registers.
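As a rough illustration (a minimal sketch, not taken from any particular library), consider summing an array. The function names `sum_single` and `sum_multi` and the choice of four accumulators are assumptions made for this example. The best number of streams depends on the latency and throughput of the add instruction on the target CPU.

```c
#include <stddef.h>

/* One accumulator: each add must wait for the previous add to finish,
   so the loop forms a single dependency chain. */
double sum_single(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators (hypothetical choice): four dependency chains that
   do not depend on each other, so their adds can overlap in the pipeline.
   The price is four live accumulator registers instead of one. */
double sum_multi(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: consume four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Combine the partial sums, then handle the leftover 0..3 elements. */
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

Because floating-point addition is not associative, `sum_multi` can give a slightly different result from `sum_single` on general inputs. That is also why compilers usually will not make this transformation on floating-point code unless told they may reorder the arithmetic (for example with `-ffast-math` in GCC or Clang), and why it tends to be written by hand.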