
ffmpeg / libavcodec / x86 @ 0cc8a5d0

# Date Author Comment
0cc8a5d0 09/29/2010 02:03 PM Ronald S. Bultje

Remove mv_mask variable. Replace the related pand -1/0 instructions by either
a pxor, or remove the instruction altogether. Altogether, this saves 1
instruction.

Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk

c0673f2c 09/29/2010 02:02 PM Ronald S. Bultje

Remove d_idx as a variable, and instead load it as a constant in the asm.
This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.

Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk

2c3135f6 09/29/2010 01:35 PM Ronald S. Bultje

Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid
of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.

Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk

4b81511c 09/29/2010 01:34 PM Ronald S. Bultje

Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows
inlining various constants within the loop code. 20 cycles faster on
cathedral sample.

Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk

02b424d9 09/26/2010 09:15 AM Reimar Döffinger

Add d suffix to movd target register to make it work with nasm.

Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk

dc77e985 09/26/2010 09:08 AM Reimar Döffinger

Split and then simplify address generation macro.
Allows nasm to work for this code.

Originally committed as revision 25205 to svn://svn.ffmpeg.org/ffmpeg/trunk

7e117771 09/24/2010 03:31 PM Ronald S. Bultje

Remove unused variable.

Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk

ae112918 09/24/2010 02:07 PM Ronald S. Bultje

Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk

4bca6774 09/24/2010 02:05 PM Ronald S. Bultje

Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the
code directly also and remove loop setup. 20% faster in function, 0.8% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
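
The idea behind the two unrolling commits above (r25171 and r25172) can be sketched in plain C. This is only an illustration, not the committed SSE2 code: demo_scan[] is a made-up table standing in for scan8[], and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offset table standing in for scan8[]; the values are
 * invented for illustration and are not FFmpeg's. */
static const uint8_t demo_scan[4] = { 12, 13, 20, 21 };

/* Looped form: one table load and one loop-counter update per block. */
static int count_nonzero_loop(const uint8_t *nnz)
{
    int n = 0;
    for (size_t i = 0; i < 4; i++)
        n += nnz[demo_scan[i]] != 0;
    return n;
}

/* Unrolled form: the table entries become constant offsets, so there is
 * no loop setup and no table load left in the function. */
static int count_nonzero_unrolled(const uint8_t *nnz)
{
    return (nnz[12] != 0) + (nnz[13] != 0) +
           (nnz[20] != 0) + (nnz[21] != 0);
}
```

Both functions return the same result for any input of at least 22 bytes; the unrolled one trades code size for the removed loop overhead.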

c0bc8b9a 09/21/2010 05:57 PM Måns Rullgård

x86: disable SSE functions using stack when stack is not aligned

This fixes crashes with ICC 10.1.

Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk

f41237c9 09/18/2010 08:44 PM Måns Rullgård

x86: remove hack disabling sse2 h264 loop filter with 32-bit icc

Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk

ada65af9 09/17/2010 12:24 PM Ronald S. Bultje

Don't access upper 32 bits of a 32-bit int on 64-bit systems.

Originally committed as revision 25140 to svn://svn.ffmpeg.org/ffmpeg/trunk

6c3d0218 09/17/2010 03:01 AM Ronald S. Bultje

Properly add HAVE_YASM around yasmified symbols. Should fix compile error
on configurations using --disable-yasm.

Originally committed as revision 25138 to svn://svn.ffmpeg.org/ffmpeg/trunk

e2e34104 09/17/2010 01:56 AM Ronald S. Bultje

Move hadamard_diff{,16}_{mmx,mmx2,sse2,ssse3}() from inline asm to yasm,
which will hopefully solve the Win64/FATE failures caused by these functions.

Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk

d0acc2d2 09/17/2010 01:44 AM Ronald S. Bultje

Move sse16_sse2() from inline asm to yasm. It is one of the functions causing
Win64/FATE issues.

Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk

1d16a1cf 09/14/2010 01:36 PM Ronald S. Bultje

Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping....

8acb554a 09/10/2010 02:25 AM Jason Garrett-Glaser

LGPL SSE2 H.264 iDCT
This leaves no more GPL-only H.264 decoding asm code.

Approved by Loren.

Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk

c6c98d08 09/08/2010 03:07 PM Stefano Sabatini

Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk

b1c32fb5 09/05/2010 10:10 AM Reimar Döffinger

Use "d" suffix for general-purpose registers used with movd.
This increases compatibility with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.

Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk

7160bb71 09/04/2010 09:59 AM Stefano Sabatini

Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk

2c166c3a 09/03/2010 04:52 PM Ronald S. Bultje

Port latest x264 deblock asm (before they moved to using NV12 as internal
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk

a10a9f5c 09/01/2010 11:19 PM Eli Friedman

Fix typo in r25019.

Patch by Eli Friedman <eli.friedman at gmail dot com>.

Originally committed as revision 25022 to svn://svn.ffmpeg.org/ffmpeg/trunk

615da9b1 09/01/2010 09:10 PM Ronald S. Bultje

Unscrew breakage after my last commit because of symbol prefixes.

Originally committed as revision 25020 to svn://svn.ffmpeg.org/ffmpeg/trunk

a33a2562 09/01/2010 08:56 PM Ronald S. Bultje

Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC....

14bc1f24 09/01/2010 08:48 PM Ronald S. Bultje

Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c,
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk

5929b3a6 08/31/2010 12:32 PM Ronald S. Bultje

Fix vertical align.

Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk

79ce0f00 08/30/2010 08:30 PM Ronald S. Bultje

Fix compilation failure if yasm is disabled (missing vp3 symbols).

Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk

de1c253b 08/30/2010 04:34 PM Ronald S. Bultje

Split intra prediction initialization (i.e. assigning of function pointers)
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in
dsputil_mmx.c).

Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk

d0eb5a11 08/30/2010 04:31 PM Ronald S. Bultje

Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
fate failures on Win64.

Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk

e9f5f020 08/30/2010 04:25 PM Ronald S. Bultje

Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6
issues on Win64.

Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk

7e7c4b60 08/30/2010 04:22 PM Ronald S. Bultje

Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx()
functions.

Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk

19d929f9 08/28/2010 09:03 PM Loren Merritt

cosmetics in imdct_sse

Originally committed as revision 24958 to svn://svn.ffmpeg.org/ffmpeg/trunk

4eca52ed 08/26/2010 02:33 PM Ronald S. Bultje

Fix typos when converting inline asm to yasm, fixes MMX-only fate-ea-vp61.

Originally committed as revision 24948 to svn://svn.ffmpeg.org/ffmpeg/trunk

6697bc33 08/25/2010 08:36 PM Ronald S. Bultje

Revert r24931, it broke Win32 and some BSD compiles (yay fate).

Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk

72f64240 08/25/2010 07:57 PM Ronald S. Bultje

Mark xmm6 and xmm7 as clobbered in ff_vp3_idct_sse2(), which is contributing
to the VP6 fate failures on Win64.

Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk

69dad87c 08/25/2010 03:41 PM Måns Rullgård

VP6: fix vp6_filter_diag4_mmx/sse on 64-bit

The stride can be negative and must be sign extended before being
used in pointer arithmetic.

Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk
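
A minimal C sketch of why the sign extension in r24926 matters, assuming a 32-bit int stride and 64-bit pointers. The function names are hypothetical. The zero-extended variant mimics asm that uses the full 64-bit register without widening the value first.

```c
#include <stddef.h>
#include <stdint.h>

/* Wrong on 64-bit: the 32-bit stride is treated as unsigned, which is the
 * effect of using the 64-bit register without sign extension. */
static uint64_t offset_zero_extended(int32_t stride)
{
    return (uint64_t)(uint32_t)stride;
}

/* Right: widen to a signed 64-bit value first (movsxd in x86-64 asm). */
static int64_t offset_sign_extended(int32_t stride)
{
    return (int64_t)stride;
}

/* Step one row using a correctly widened stride; works for negative
 * strides as long as the result stays inside the buffer. */
static const uint8_t *next_row(const uint8_t *p, int stride)
{
    return p + (ptrdiff_t)stride;
}
```

With a stride of -16, the zero-extended offset is 4294967280 instead of -16, so the pointer would land about 4 GiB past the intended row.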

89fa3504 08/25/2010 01:44 PM Ronald S. Bultje

Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should
help in fixing the Win64 fate failures.

Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk

3a088514 08/25/2010 01:42 PM Ronald S. Bultje

Move vp6_filter_diag4() from DSPContext to VP56DSPContext.

Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk

c0ec9918 08/24/2010 05:47 PM Måns Rullgård

Remove global mm_flags variable

Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk

3611c45a 08/24/2010 04:52 PM Ronald S. Bultje

Mark xmm registers as clobbered in simple loopfilter. Should fix the last
two VP8-related fate failures on Win64.

Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk

cb4f1246 08/23/2010 03:51 PM Alex Converse

imdct/x86: Use "s->mdct_size" instead of "1 << s->mdct_bits".

It generates smaller cleaner code.

Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk
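
The substitution in r24887 can be pictured as follows. Only the field names mdct_size and mdct_bits come from the commit message; the struct DemoMDCT and the helper functions are hypothetical, and the sketch assumes the size is cached once at init time.

```c
/* Hypothetical context struct; only the two field names are taken from
 * the commit message. */
typedef struct DemoMDCT {
    int mdct_bits;  /* log2 of the transform size */
    int mdct_size;  /* cached value of 1 << mdct_bits */
} DemoMDCT;

/* Compute the size once, when the context is set up. */
static void demo_mdct_init(DemoMDCT *s, int nbits)
{
    s->mdct_bits = nbits;
    s->mdct_size = 1 << nbits;
}

/* Before: a load followed by a variable shift at every use. */
static int size_from_bits(const DemoMDCT *s)
{
    return 1 << s->mdct_bits;
}

/* After: a single load at every use. */
static int size_cached(const DemoMDCT *s)
{
    return s->mdct_size;
}
```

Both accessors return the same value; the cached form simply avoids recomputing the shift wherever the size is needed.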

684d608b 08/23/2010 02:41 AM Ronald S. Bultje

Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures).

Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk

78b5c97d 08/22/2010 02:39 PM Alex Converse

Convert ff_imdct_half_sse() to yasm.

This is to avoid split asm sections that attempt to preserve some
registers between sections.

Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk

05c04cdf 08/12/2010 01:11 AM Jason Garrett-Glaser

VP5/6/8: ~7% faster arithmetic decoding
Grab from the bitstream in 16-bit chunks instead of 8-bit chunks.
TODO: grab in 32-bit chunks on 64-bit systems.

Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
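
The refill granularity described in r24783 can be illustrated with a plain bit reservoir. This is a simplified sketch, not the VP5/6/8 range decoder: DemoReader and its functions are hypothetical, the input is assumed to be big-endian with an even length, and reads are limited to 1..16 bits.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct DemoReader {
    const uint8_t *buf, *end;
    uint32_t cache;  /* unread bits, left-aligned at bit 31 */
    int bits;        /* number of valid bits in cache */
} DemoReader;

static void demo_init(DemoReader *r, const uint8_t *buf, size_t size)
{
    r->buf = buf;
    r->end = buf + size;
    r->cache = 0;
    r->bits = 0;
}

/* 8-bit refill: one load and one bookkeeping step per byte. */
static void refill8(DemoReader *r)
{
    while (r->bits <= 24 && r->buf < r->end) {
        r->cache |= (uint32_t)*r->buf++ << (24 - r->bits);
        r->bits += 8;
    }
}

/* 16-bit refill: half as many refill steps for the same data.
 * Assumes an even number of input bytes (no tail handling). */
static void refill16(DemoReader *r)
{
    while (r->bits <= 16 && r->end - r->buf >= 2) {
        uint32_t v = (uint32_t)r->buf[0] << 8 | r->buf[1];
        r->cache |= v << (16 - r->bits);
        r->bits += 16;
        r->buf += 2;
    }
}

/* Read n bits (1..16); 'wide' selects the 16-bit refill path. */
static unsigned get_bits(DemoReader *r, int n, int wide)
{
    if (r->bits < n) {
        if (wide)
            refill16(r);
        else
            refill8(r);
    }
    unsigned v = r->cache >> (32 - n);
    r->cache <<= n;
    r->bits -= n;
    return v;
}
```

Both refill paths produce the same bit sequence; the 16-bit path only changes how often the reservoir has to be topped up, which is where the commit reports its gain.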

4a384de5 08/07/2010 11:10 PM Jason Garrett-Glaser

Split h264dsp and h264pred in configure.
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself....

98fe09df 08/05/2010 12:49 AM Jason Garrett-Glaser

Add file missing in r24702

Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk

c12d6955 08/05/2010 12:13 AM Eli Friedman

H.264: SSE2/SSSE3 weighted prediction asm
Patch by Eli Friedman <eli.friedman at gmail dot com>

Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk

f079a64a 08/03/2010 08:59 PM Måns Rullgård

Move cavs dsp functions to their own struct

Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk

8b9b5e08 08/03/2010 11:21 AM Jason Garrett-Glaser

VP5/6/8: add one inline missed in r24677

Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk

827d43bb 08/02/2010 08:18 PM Jason Garrett-Glaser

VP8: move zeroing of luma DC block into the WHT
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.

Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk

6341838f 07/31/2010 11:13 PM Ronald S. Bultje

Use word-writing instead of dword-writing (with two cached but otherwise
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte...

fa738b3a 07/31/2010 04:20 PM Vitor Sessak

Remove x86/mmx.h. It is not used anymore and has been deprecated for years.

Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk

de4bc44a 07/31/2010 02:50 PM Vitor Sessak

Convert deinterlacing MMX code to YASM

Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk

740dfe70 07/29/2010 10:45 PM Vitor Sessak

Fix compilation in x86_64. I broke it with r24580.

Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk

2c3dda68 07/29/2010 10:19 PM Vitor Sessak

Translate libmpeg2 MMX IDCT to plain asm

Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk

ab4d0318 07/26/2010 09:18 PM Ronald S. Bultje

Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster.

Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk

e25dee60 07/26/2010 07:34 PM Jason Garrett-Glaser

VP8: Much faster SSE2 MC
5-10% faster or more on Phenom, Athlon 64, and some others.
Helps some on pre-SSSE3 Intel chips as well, but not as much.

Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk

48adb7e7 07/26/2010 02:07 PM Ronald S. Bultje

Enable no-loop memory/register saving for ssse3/sse4 also.

Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk

2a180c69 07/26/2010 02:00 PM Ronald S. Bultje

Save a register (or regsize of stackspace for x86-32) for the no-loop
mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.

Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk

bcd4aa64 07/26/2010 01:56 PM Ronald S. Bultje

Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this
construct was always enabled, even for <ssse3 versions).

Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk

2208053b 07/26/2010 01:50 PM Ronald S. Bultje

Split pextrw macro-spaghetti into several opt-specific macros, this will make
future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used......

6de5b7c6 07/25/2010 02:42 AM Ronald S. Bultje

Fix obvious bug in assignment. Somehow, the test vectors don't test this...

Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk

e3f7bf77 07/24/2010 07:33 PM Ronald S. Bultje

Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath....

3611e7a3 07/23/2010 09:46 PM Eli Friedman

Inline asm for VP56 arith coder

This is a lot more reliable to get cmov rather than trying to trick gcc into
generating it, useful since it's 2% faster overall.

Patch by Eli Friedman <eli.friedman at gmail>

Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk

3ae079a3 07/23/2010 06:02 AM Jason Garrett-Glaser

VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk

51c91564 07/23/2010 03:02 AM Jason Garrett-Glaser

VP8 asm: cosmetics (spacing)

Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk

8a467b2d 07/23/2010 02:58 AM Jason Garrett-Glaser

VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?...

c25c7767 07/23/2010 12:07 AM Jason Garrett-Glaser

VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.

Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk

dc5eec80 07/22/2010 07:59 PM Ronald S. Bultje

Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on
CPUs supporting it.

Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk

003243c3 07/22/2010 01:35 AM Ronald S. Bultje

Fix and enable horizontal >=SSE2 mbedge loopfilter.

Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk

c7b1d976 07/22/2010 12:39 AM Loren Merritt

relicense h264 deblock sse2 to lgpl

Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk

532e7697 07/21/2010 10:45 PM Loren Merritt

sync yasm macros from x264

Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk

8731dbd8 07/21/2010 10:41 PM Jason Garrett-Glaser

Eliminate one instruction in VP8 dc_add_sse4

Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk

7dd224a4 07/21/2010 10:11 PM Jason Garrett-Glaser

Various VP8 x86 deblocking speedups
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.

Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk

b8b231b5 07/21/2010 08:51 PM Jason Garrett-Glaser

Make mmx VP8 WHT faster
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.

Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk

af521abc 07/21/2010 10:02 AM David Conrad

Add header declarations for mmx/sse constants missing them

Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk

c7eec581 07/21/2010 10:02 AM David Conrad

Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c

Should fix compilation with icc and should help prevent any future duplicates

Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk

e9e456d8 07/20/2010 10:58 PM Ronald S. Bultje

VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
and chroma (width=8).

Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk

268821e7 07/20/2010 10:04 PM Ronald S. Bultje

Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.

Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk

c60ed66d 07/19/2010 11:57 PM Ronald S. Bultje

Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's
wrong with it tomorrow or so, then re-submit.

Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk

6526976f 07/19/2010 10:38 PM Ronald S. Bultje

Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than
regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag,
FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that
have been checked specifically on such CPUs and are actually faster than...

1878f685 07/19/2010 09:53 PM Ronald S. Bultje

Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.

Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk

fb9bdf04 07/19/2010 09:45 PM Ronald S. Bultje

Be more efficient with registers or stack memory. Saves 8/16 bytes stack
for x86-32, or 2 MM registers on x86-64.

Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk

3facfc99 07/19/2010 09:18 PM Ronald S. Bultje

Change function prototypes for width=8 inner and mbedge loopfilter functions
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register....

1ee076b1 07/18/2010 08:06 PM Loren Merritt

more credits to D. J. Bernstein for fft

Originally committed as revision 24308 to svn://svn.ffmpeg.org/ffmpeg/trunk

819b2dd2 07/16/2010 09:35 PM Ronald S. Bultje

Attempt to fix x86-64 testsuite on fate.

Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk

6f323f12 07/16/2010 07:54 PM Ronald S. Bultje

Remove duplicate define.

Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk

889b2c26 07/16/2010 07:54 PM Ronald S. Bultje

Revert 24270, it contained some stuff that shouldn't have been in there.

Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk

2356a783 07/16/2010 07:42 PM Ronald S. Bultje

Remove duplicate define.

Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk

ede1b966 07/16/2010 07:38 PM Ronald S. Bultje

Give x86 r%d registers names, this will simplify implementation of the chroma
inner loopfilter, and it also allows us to save one register on x86-64/sse2.

Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk

526e831a 07/16/2010 06:29 PM Ronald S. Bultje

Change return statement, the REP_RET is a mistake since the else case (x86-64,
sse2) doesn't actually loop, so REP_RET isn't necessary.

Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk

a711eb48 07/15/2010 11:02 PM Ronald S. Bultje

VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.

Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk

faa26db2 07/11/2010 10:53 PM David Conrad

MMX/SSE VC1 loop filter

Originally committed as revision 24208 to svn://svn.ffmpeg.org/ffmpeg/trunk

7af8fbd3 07/11/2010 10:52 PM David Conrad

Make ff_pw_4 128 bits

Originally committed as revision 24207 to svn://svn.ffmpeg.org/ffmpeg/trunk

881fd7a6 07/06/2010 05:48 PM Vitor Sessak

Move SSE optimized 32-point DCT to its own file. Should fix breakage with YASM
disabled.

Originally committed as revision 24078 to svn://svn.ffmpeg.org/ffmpeg/trunk

4dcc4f8e 07/06/2010 04:58 PM Vitor Sessak

SSE optimized 32-point DCT

Originally committed as revision 24077 to svn://svn.ffmpeg.org/ffmpeg/trunk

f2a30bd8 07/03/2010 07:26 PM Ronald S. Bultje

Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).

Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk

b06855f1 07/03/2010 12:48 AM Jason Garrett-Glaser

SSSE3 versions of vp8 width4 bilinear MC functions

Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk

dcc602d8 07/02/2010 05:27 AM Jason Garrett-Glaser

SSSE3 versions of width4 VP8 6-tap MC functions
Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a
non-bitexactness bug.

Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>.

Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk

8434fc26 07/01/2010 10:09 PM Jason Garrett-Glaser

Fix 100L in vp8dsp asm init

Originally committed as revision 23946 to svn://svn.ffmpeg.org/ffmpeg/trunk