ffmpeg / libavcodec / x86 / h264_idct.asm @ b9c7f66e

# Date Author Comment
19fb234e 01/14/2011 09:34 PM Jason Garrett-Glaser

H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed....

02b424d9 09/26/2010 09:15 AM Reimar Döffinger

Add d suffix to movd target register to make it work with nasm.

Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk

ae112918 09/24/2010 02:07 PM Ronald S. Bultje

Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk

4bca6774 09/24/2010 02:05 PM Ronald S. Bultje

Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the
code directly also and remove loop setup. 20% faster in function, 0.8% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk

1d16a1cf 09/14/2010 01:36 PM Ronald S. Bultje

Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping....