Do MC and IDCT in coding (hilbert) order
This increases the slice size to 64 pixels, due to having to decode an
entire chroma superblock row per slice.
This can be up to 6% slower depending on clip and CPU, but is necessary
for future optimizations that gain significantly more than was lost.
Originally committed as revision 22189 to svn://svn.ffmpeg.org/ffmpeg/trunk