VC1: inline vc1_put_block() in vc1_decode_i_blocks().

This allows several loops to be combined into a single one, which can
eventually be merged into the IDCT itself. It also lets us remove
vc1_put_block() and makes decoding with CODEC_FLAG_GRAY faster.
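
As a rough illustration only (this is not the actual FFmpeg code, and
every name below is hypothetical), the idea of a single per-block loop
that adjusts and stores coefficients in one pass, and skips the two
chroma blocks in gray-only mode, might look like this sketch. The
doubling step stands in for whatever per-coefficient adjustment the
decoder applies and is chosen purely for the example.

```c
#include <stdint.h>

#define NUM_BLOCKS 6   /* assumed layout: blocks 0-3 luma, 4-5 chroma */
#define BLOCK_SIZE 64  /* 8x8 coefficients per block */

/* Hypothetical helper: clamp an int to the 0..255 pixel range. */
static uint8_t clamp_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical fused loop: instead of one pass that adjusts all
 * coefficients and a second pass (a separate "put" function) that
 * clamps and stores them, each block is adjusted and stored in a
 * single pass. When gray_only is set, the chroma blocks are skipped
 * and their output is left untouched.
 */
static void put_blocks_inline(int16_t blocks[NUM_BLOCKS][BLOCK_SIZE],
                              uint8_t out[NUM_BLOCKS][BLOCK_SIZE],
                              int gray_only)
{
    for (int k = 0; k < NUM_BLOCKS; k++) {
        if (k > 3 && gray_only)
            continue; /* no work at all for chroma in gray mode */
        for (int j = 0; j < BLOCK_SIZE; j++)
            out[k][j] = clamp_u8(blocks[k][j] * 2);
    }
}
```

Because the store happens inside the per-block loop, the gray-mode
check can skip chroma work entirely rather than only skipping the
final store.
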
(cherry picked from commit bbfd2e7ab4e2ae0b934657fe51afdbbbaead52b7)