| Branch: | Revision:

ffmpeg / libavcodec / x86 / vp8dsp-init.c @ b9c7f66e

History | View | Annotate | Download (19 KB)

# Date Author Comment
c6c98d08 09/08/2010 03:07 PM Stefano Sabatini

Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://

7160bb71 09/04/2010 09:59 AM Stefano Sabatini

Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://

c0ec9918 08/24/2010 05:47 PM Måns Rullgård

Remove global mm_flags variable

Originally committed as revision 24909 to svn://

827d43bb 08/02/2010 08:18 PM Jason Garrett-Glaser

VP8: move zeroing of luma DC block into the WHT
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.

Originally committed as revision 24668 to svn://

6341838f 07/31/2010 11:13 PM Ronald S. Bultje

Use word-writing instead of dword-writing (with two cached but otherwise
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte...

3ae079a3 07/23/2010 06:02 AM Jason Garrett-Glaser

VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://

8a467b2d 07/23/2010 02:58 AM Jason Garrett-Glaser

VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?...

c25c7767 07/23/2010 12:07 AM Jason Garrett-Glaser

VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.

Originally committed as revision 24448 to svn://

dc5eec80 07/22/2010 07:59 PM Ronald S. Bultje

Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on
CPUs supporting it.

Originally committed as revision 24437 to svn://

003243c3 07/22/2010 01:35 AM Ronald S. Bultje

Fix and enable horizontal >=SSE2 mbedge loopfilter.

Originally committed as revision 24409 to svn://

7dd224a4 07/21/2010 10:11 PM Jason Garrett-Glaser

Various VP8 x86 deblocking speedups
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.

Originally committed as revision 24403 to svn://

b8b231b5 07/21/2010 08:51 PM Jason Garrett-Glaser

Make mmx VP8 WHT faster
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.

Originally committed as revision 24397 to svn://

e9e456d8 07/20/2010 10:58 PM Ronald S. Bultje

VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
and chroma (width=8).

Originally committed as revision 24378 to svn://

268821e7 07/20/2010 10:04 PM Ronald S. Bultje

Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.

Originally committed as revision 24377 to svn://

c60ed66d 07/19/2010 11:57 PM Ronald S. Bultje

Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's
wrong with it tomorrow or so, then re-submit.

Originally committed as revision 24341 to svn://

6526976f 07/19/2010 10:38 PM Ronald S. Bultje

Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than
regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag,
FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that
have been checked specifically on such CPUs and are actually faster than...

1878f685 07/19/2010 09:53 PM Ronald S. Bultje

Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.

Originally committed as revision 24339 to svn://

3facfc99 07/19/2010 09:18 PM Ronald S. Bultje

Change function prototypes for width=8 inner and mbedge loopfilter functions
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register....

a711eb48 07/15/2010 11:02 PM Ronald S. Bultje

VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.

Originally committed as revision 24250 to svn://

f2a30bd8 07/03/2010 07:26 PM Ronald S. Bultje

Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).

Originally committed as revision 24029 to svn://

b06855f1 07/03/2010 12:48 AM Jason Garrett-Glaser

SSSE3 versions of vp8 width4 bilinear MC functions

Originally committed as revision 24013 to svn://

dcc602d8 07/02/2010 05:27 AM Jason Garrett-Glaser

SSSE3 versions of width4 VP8 6-tap MC functions
Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a
non-bitexactness bug.

Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>.

Originally committed as revision 23965 to svn://

8434fc26 07/01/2010 10:09 PM Jason Garrett-Glaser

Fix 100L in vp8dsp asm init

Originally committed as revision 23946 to svn://

2dd2f716 06/29/2010 02:43 PM Ronald S. Bultje

MMX idct_add for VP8.

Originally committed as revision 23886 to svn://

004cda8e 06/29/2010 01:41 AM Jason Garrett-Glaser

Add mmxext version of VP8 DC Hadamard transform

Originally committed as revision 23878 to svn://

50f70541 06/28/2010 09:12 PM Baptiste Coudurier

Change MMXEXT to MMX2, MMXEXT is deprecated

Originally committed as revision 23865 to svn://

0fecad09 06/28/2010 07:14 PM Jason Garrett-Glaser

Add x86 asm functions for VP8 put_pixels

Originally committed as revision 23858 to svn://

a173aa89 06/28/2010 06:56 PM Jason Garrett-Glaser

Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC

Originally committed as revision 23857 to svn://

30bdefd1 06/27/2010 02:52 AM David Conrad

Fix build without yasm

Originally committed as revision 23816 to svn://

0178d14f 06/27/2010 02:01 AM Jason Garrett-Glaser

First shot at VP8 optimizations:
- MMXEXT, SSE2 and SSSE3 MC functions
- MMX and SSE4 IDCT dc_add functions

Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself.

Originally committed as revision 23815 to svn://