1.5x faster ff_vorbis_floor1_render_list and 5% faster Vorbis decoding on Core2;
1.3x and 3% respectively on G4.
However, I think only part of this speedup is due to my optimizations per se;
some of it is that I got a better roll on the GCC random code generator.
Trivial reorderings of this function have a disproportionate effect on speed.
Originally committed as revision 19726 to svn://svn.ffmpeg.org/ffmpeg/trunk