
ffmpeg / libavcodec / x86 / dsputil_mmx.c @ fbb6b49d


# Date Author Comment
c73d99e6 02/02/2011 02:44 AM Justin Ruggles

Separate format conversion DSP functions from DSPContext.

This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.

Signed-off-by: Mans Rullgard <>

81f2a3f4 02/01/2011 01:55 AM Ronald S. Bultje

Implement a SIMD version of emulated_edge_mc() for x86.

From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.

d19b744a 01/31/2011 08:30 PM Justin Ruggles

cosmetics: indentation

Signed-off-by: Mans Rullgard <>

80ba1ddb 01/31/2011 08:28 PM Justin Ruggles

Remove unneeded add bias from 3 functions.


Signed-off-by: Mans Rullgard <>

6eabb0d3 01/22/2011 05:53 PM Justin Ruggles

Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.

Signed-off-by: Mans Rullgard <>
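
As a rough illustration of the interface change described in this entry, the scalar sketch below shows the three-operand form, where the destination is separate from both inputs. The function name and the exact parameter list are assumptions made for the example; they are not copied from DSPContext.

```c
#include <assert.h>

/* Hypothetical name: element-wise product in the three-operand form
 * dst = src0 * src1 (the older form computed dst = dst * src). */
void vector_fmul_sketch(float *dst, const float *src0,
                        const float *src1, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}
```

In this scalar sketch the older in-place behaviour is still available by passing the same buffer as dst and src0. Whether a SIMD implementation permits that aliasing is not stated in the entry.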

ef4a6514 01/18/2011 08:48 PM Mans Rullgard

Replace ASMALIGN with .p2align

This macro has unconditionally used .p2align for a long time and
serves no useful purpose.

ac3c9d01 01/18/2011 08:48 PM Mans Rullgard

x86: remove VLA in ac3_downmix_sse

ec3233a8 01/14/2011 11:26 PM Ronald S. Bultje

Fix ff_pw_3 alignment.

Originally committed as revision 26344 to svn://

19fb234e 01/14/2011 09:34 PM Jason Garrett-Glaser

H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed....

8d147f1f 12/24/2010 05:23 PM Ronald S. Bultje

For rounding in chroma MC SSSE3, use 16-byte pw_3/4 instead of reading 8 bytes
and then using movlhps to dup it into the higher half of the register.

Originally committed as revision 26086 to svn://

90f1f3bf 12/06/2010 12:14 AM Baptiste Coudurier

In yadif filter, declare asm constants directly to avoid dependency on libavcodec

Originally committed as revision 25895 to svn://

9e95999e 12/04/2010 01:06 PM Baptiste Coudurier

10l, add ff_pw_1 to dsputil_mmx for yadif sse2

Originally committed as revision 25881 to svn://

80e33d24 11/01/2010 07:35 PM İsmail Dönmez

dsputil: Use explicit movzbl instead of movzx

This fixes compilation with the latest clang trunk version.

Patch by İsmail Dönmez, ismail at namtrac dot org

Originally committed as revision 25628 to svn://

153ca56b 10/31/2010 06:14 PM Ramiro Polla

xmm_clobbers: list xmm registers first in clobber list

suncc does not like the leading commas inside the macro, but it has no problem
with trailing commas.

Originally committed as revision 25615 to svn://

5d543a3d 10/31/2010 01:57 PM Ramiro Polla

dsputil_mmx: add xmm registers to clobber list

Originally committed as revision 25611 to svn://

559738ef 10/31/2010 01:13 PM Ramiro Polla

dsputil_mmx: prefer xmm registers below xmm6 when they are available

Originally committed as revision 25606 to svn://

dd68d4db 10/05/2010 10:06 PM Ronald S. Bultje

MMX, MMX2, SSE2 and SSSE3 optimizations for pred16x16/8x8_plane H264 intra
prediction (plus some with different rounding for svq3/rv40). Speedup (for
SSSE3) about ~6-fold, 3.6% faster overall with cathedral sample.

Originally committed as revision 25361 to svn://

329d689f 09/29/2010 03:34 PM Eli Friedman

Use sse2 variant of put_pixels16() for no_rnd also. Provides a minor speed
increase to e.g. vc1, snow and mpeg decoding.

Patch by Eli Friedman <eli dot friedman gmail com>.

Originally committed as revision 25259 to svn://

c6c98d08 09/08/2010 03:07 PM Stefano Sabatini

Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://

7160bb71 09/04/2010 09:59 AM Stefano Sabatini

Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://

2c166c3a 09/03/2010 04:52 PM Ronald S. Bultje

Port latest x264 deblock asm (before they moved to using NV12 as internal
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://

14bc1f24 09/01/2010 08:48 PM Ronald S. Bultje

Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c,
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://

79ce0f00 08/30/2010 08:30 PM Ronald S. Bultje

Fix compilation failure if yasm is disabled (missing vp3 symbols).

Originally committed as revision 24992 to svn://

d0eb5a11 08/30/2010 04:31 PM Ronald S. Bultje

Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
fate failures on Win64.

Originally committed as revision 24989 to svn://

e9f5f020 08/30/2010 04:25 PM Ronald S. Bultje

Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6
issues on Win64.

Originally committed as revision 24988 to svn://

7e7c4b60 08/30/2010 04:22 PM Ronald S. Bultje

Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx()

Originally committed as revision 24987 to svn://

3a088514 08/25/2010 01:42 PM Ronald S. Bultje

Move vp6_filter_diag4() from DSPContext to VP56DSPContext.

Originally committed as revision 24921 to svn://

c0ec9918 08/24/2010 05:47 PM Måns Rullgård

Remove global mm_flags variable

Originally committed as revision 24909 to svn://

c12d6955 08/05/2010 12:13 AM Eli Friedman

H.264: SSE2/SSSE3 weighted prediction asm
Patch by Eli Friedman <eli.friedman at gmail dot com>

Originally committed as revision 24702 to svn://

f079a64a 08/03/2010 08:59 PM Måns Rullgård

Move cavs dsp functions to their own struct

Originally committed as revision 24685 to svn://

c7b1d976 07/22/2010 12:39 AM Loren Merritt

relicense h264 deblock sse2 to lgpl

Originally committed as revision 24408 to svn://

c7eec581 07/21/2010 10:02 AM David Conrad

Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c

Should fix compilation with icc and should help prevent any future duplicates

Originally committed as revision 24380 to svn://

e9e456d8 07/20/2010 10:58 PM Ronald S. Bultje

VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
and chroma (width=8).

Originally committed as revision 24378 to svn://

a711eb48 07/15/2010 11:02 PM Ronald S. Bultje

VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.

Originally committed as revision 24250 to svn://

7af8fbd3 07/11/2010 10:52 PM David Conrad

Make ff_pw_4 128 bits

Originally committed as revision 24207 to svn://

f2a30bd8 07/03/2010 07:26 PM Ronald S. Bultje

Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).

Originally committed as revision 24029 to svn://

b3858964 06/27/2010 03:11 PM Eli Friedman

Add const to some pointer parameters.

Patch by Eli Friedman, eli D friedman A gmail

Originally committed as revision 23826 to svn://

4af8cdfc 06/25/2010 06:25 PM Jason Garrett-Glaser

16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264

Originally committed as revision 23783 to svn://

413abbe1 06/04/2010 04:46 AM David Conrad

Add bitexact versions of put_no_rnd_pixels8 _x2 and _y2 for vp3/theora

Originally committed as revision 23463 to svn://

eb6a6cd7 04/17/2010 02:04 AM David Conrad

vp3: DC-only IDCT

2-4% faster overall decode

Originally committed as revision 22896 to svn://

4693b031 03/16/2010 01:17 AM Måns Rullgård

Move H264 dsputil functions into their own struct

This moves the H264-specific functions from DSPContext to the new
H264DSPContext. The code is made conditional on CONFIG_H264DSP
which is set by the codecs requiring it.

The qpel and chroma MC functions are not moved as these are used by...

05aec7bb 03/14/2010 05:50 PM Måns Rullgård

Separate DWT from snow and dsputil

This moves the DWT functions from snow.c and dsputil.c to a file of
their own. A new struct, DWTContext, holds the function pointers
previously part of DSPContext.

Originally committed as revision 22522 to svn://

f49747e9 03/06/2010 10:37 PM Måns Rullgård

x86: move function prototypes to header files

Originally committed as revision 22266 to svn://

84dc2d8a 03/06/2010 02:24 PM Måns Rullgård

Remove DECLARE_ALIGNED_{8,16} macros

These macros are redundant. All uses are replaced with the generic
DECLARE_ALIGNED macro instead.

Originally committed as revision 22233 to svn://

19530266 02/10/2010 02:02 AM David Conrad

Enable SSE2 (put|avg)_pixels_16_sse2

SVQ1 chroma has been special-cased aligned to 16 bytes since at least r15466.
Other architectures also assume 16-byte alignment here but set STRIDE_ALIGN
to 16.

Originally committed as revision 21736 to svn://

3deb5384 01/22/2010 11:07 PM Alex Converse

Implement an sse version of scalarproduct_float().

Originally committed as revision 21386 to svn://

c6727809 01/22/2010 03:25 AM Måns Rullgård

Move array specifiers outside DECLARE_ALIGNED() invocations

Originally committed as revision 21377 to svn://

5716aec3 01/04/2010 09:19 AM Gwenole Beauchesne

Fix XvMC. XvMCCreateBlocks() may not allocate 16-byte aligned blocks,
so we can't use SSE-optimized routines.

Originally committed as revision 21011 to svn://

4052cbf1 12/30/2009 11:33 AM Diego Biurrun

Get rid of pointless CONFIG_ANY_H263 preprocessor definition.

Originally committed as revision 20975 to svn://

91e644ff 12/05/2009 05:51 PM Loren Merritt

r20739 broke compilation on systems without yasm

Originally committed as revision 20742 to svn://

b1159ad9 12/05/2009 03:09 PM Loren Merritt

refactor and optimize scalarproduct
29-105% faster apply_filter, 6-90% faster ape decoding on core2
(Any x86 other than core2 probably gets much less, since this is mostly due to ssse3 cachesplit avoidance and I haven't written the full gamut of other cachesplit modes.)...

b10fa1bb 12/03/2009 06:53 PM Loren Merritt

port ape dsp functions from sse2 to mmx
now requires yasm

Originally committed as revision 20722 to svn://

e17ccf60 10/18/2009 08:47 PM Loren Merritt

huffyuv: add some const qualifiers

Originally committed as revision 20290 to svn://

2f77923d 10/18/2009 08:10 PM Loren Merritt

simd add_hfyu_left_prediction
2.2x faster than C on conroe, 3.6x on penryn.
4-6% faster huffyuv decoding if using left or plane mode and yuv

Originally committed as revision 20287 to svn://

35de5d24 09/27/2009 04:52 PM Måns Rullgård

cosmetics: fix indentation after previous commit

Originally committed as revision 20062 to svn://

952e8721 09/27/2009 04:51 PM Måns Rullgård

Drop unused args from vector_fmul_add_add, simplify code, and rename

The src3 and step arguments to vector_fmul_add_add() are always zero
and one, respectively. This removes these arguments from the function,
simplifies the code accordingly, and renames the function to better...
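
The simplification described in this entry can be sketched in scalar C as follows. Both function names and the older parameter list are assumptions for illustration only. The entry states only that src3 was always zero and step was always one, so the first function is a guess at the general shape, not the original code.

```c
#include <assert.h>

/* Assumed general form: a scalar bias (src3) and an output stride (step). */
void fmul_add_add_general(float *dst, const float *src0, const float *src1,
                          const float *src2, int src3, int len, int step)
{
    assert(len >= 0 && step >= 1);
    for (int i = 0; i < len; i++)
        dst[i * step] = src0[i] * src1[i] + src2[i] + src3;
}

/* Simplified form with src3 == 0 and step == 1 folded in. */
void fmul_add_simplified(float *dst, const float *src0, const float *src1,
                         const float *src2, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i] + src2[i];
}
```

With src3 fixed at zero and step fixed at one, the two functions write the same values, which is why the extra arguments could be dropped.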

9263a05a 08/27/2009 03:52 PM Vitor Sessak

Mark "i" parameter of vector_clipf_sse() as early-clobber

Originally committed as revision 19731 to svn://

50e23ae9 08/27/2009 03:38 PM Vitor Sessak

Mark parameter src of vector_clipf() as const

Originally committed as revision 19729 to svn://

0a68cd87 08/27/2009 02:49 PM Vitor Sessak

SSE optimized vector_clipf(). 10% faster TwinVQ decoding.

Originally committed as revision 19728 to svn://

9be6f0d2 07/29/2009 09:54 AM Diego Biurrun

Do not check for both CONFIG_VC1_DECODER and CONFIG_WMV3_DECODER,
the former depends upon the latter.

Originally committed as revision 19533 to svn://

99e5a9d1 07/22/2009 10:27 PM Diego Biurrun

Do not redundantly check for both CONFIG_THEORA_DECODER and CONFIG_VP3_DECODER.
The Theora decoder depends on the VP3 decoder.

Originally committed as revision 19492 to svn://

36904c4c 07/17/2009 09:07 AM Carl Eugen Hoyos

Icc 11.1 still does not align the stack pointer, disable some x264 functions.

Originally committed as revision 19454 to svn://

73b02e24 06/16/2009 05:33 PM Jason Garrett-Glaser

SSE version of clear_blocks

Originally committed as revision 19206 to svn://

c21c835b 04/15/2009 07:10 PM David Conrad

avg_ pixel functions need to use (dst+pix+1)>>1 to average with existing
pixels, not (dst+pix)>>1.
This makes the mmx functions bitexact with the C functions.

Originally committed as revision 18527 to svn://
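
The rounding difference named in this entry can be shown with two small scalar helpers. The helper names are hypothetical and the code is only a sketch of the arithmetic, not the MMX implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Average with rounding up, as the entry says avg_ functions must use. */
uint8_t avg_round_up(uint8_t dst, uint8_t pix)
{
    return (uint8_t)((dst + pix + 1) >> 1);
}

/* Truncating average, the form the entry says is not bitexact. */
uint8_t avg_truncate(uint8_t dst, uint8_t pix)
{
    return (uint8_t)((dst + pix) >> 1);
}
```

The two helpers differ only when dst + pix is odd. For example, averaging 1 and 2 gives 2 with rounding and 1 with truncation, a one-level difference that is enough to break bitexact comparison against the C functions.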

9bf0fdf3 04/15/2009 02:25 AM David Conrad

VC1: extend MMX qpel MC to include MMX2 avg qpel

Originally committed as revision 18519 to svn://

8013da73 04/14/2009 11:56 PM David Conrad

VC1: add and use avg_no_rnd chroma MC functions

Originally committed as revision 18518 to svn://

c374691b 04/14/2009 11:55 PM David Conrad

Rename put_no_rnd_h264_chroma* to reflect its usage in VC1 only

Originally committed as revision 18517 to svn://

6b434361 04/04/2009 01:20 PM Stefano Sabatini

Rename FF_MM_MMXEXT to FF_MM_MMX2, for both clarity and consistency
with libswscale.

Originally committed as revision 18330 to svn://

0be9e73e 04/03/2009 02:03 PM Reimar Döffinger

Mark line_skip3 asm argument as output-only instead of using av_uninit.

Originally committed as revision 18327 to svn://

d7460a9c 04/03/2009 02:02 PM Reimar Döffinger

Mark put_signed_pixels_clamped_mmx output operands as early-clobber because
they are. Hopefully fixes some FATE errors, too.

Originally committed as revision 18326 to svn://

531a3d27 04/03/2009 02:01 PM Reimar Döffinger

Use DECLARE_ASM_CONST for non-global ff_vector128 constant used via MANGLE

Originally committed as revision 18325 to svn://

3dd65312 04/02/2009 09:02 PM Alex Converse

Rewrite put_signed_pixels_clamped_mmx() to eliminate mmx.h from dsputil_mmx.c.

Originally committed as revision 18319 to svn://

ecb24904 02/13/2009 12:02 AM Zuxy Meng

add SSE2 version of vp6_filter_diag
original patch by Zuxy Meng zuxy.meng at gmail dot com

Originally committed as revision 17195 to svn://

6af3c226 02/12/2009 11:52 PM Sebastien Lucas

add MMX version of vp6_filter_diag
original patch by Sebastien Lucas sebastien.lucas at gmail dot com

Originally committed as revision 17194 to svn://

5110b25e 02/12/2009 11:48 PM Aurelien Jacobs

convert ff_pw_64 into an xmm_reg for future use in vp6 sse code

Originally committed as revision 17192 to svn://

d3a4b4e0 02/11/2009 11:16 AM Diego Biurrun

Add check whether the compiler/assembler supports 10 or more operands.
thanks to Loren for some help with the asm statements

Originally committed as revision 17151 to svn://

3daa434a 02/08/2009 05:45 PM Loren Merritt

overall ffvhuff decoding speedup: 28% on core2, 25% on k8.

Originally committed as revision 17059 to svn://

137ae327 01/26/2009 03:40 AM David Conrad

Workaround for gcc 3.4 to align sh properly

Originally committed as revision 16797 to svn://

406792e7 01/19/2009 03:46 PM Diego Biurrun

cosmetics: Remove pointless period after copyright statement non-sentences.

Originally committed as revision 16684 to svn://

49fb20cb 01/14/2009 05:19 PM Aurelien Jacobs

replace all occurrences of ENABLE_ by the corresponding CONFIG_, HAVE_ or ARCH_
and remove all ENABLE_ definitions.

Originally committed as revision 16600 to svn://

b250f9c6 01/13/2009 11:44 PM Aurelien Jacobs

Change semantics of CONFIG_*, HAVE_* and ARCH_*.
They are now always defined to either 0 or 1.

Originally committed as revision 16590 to svn://

c47d146b 01/05/2009 01:57 PM Diego Biurrun

Add missing 'void' keyword to parameterless function declarations.

Originally committed as revision 16436 to svn://

21ff7689 01/04/2009 01:36 AM Mathieu Velten

Use H264 MMX chroma functions to accelerate RV40 decoding.

Patch by Mathieu Velten (matmaul A gmail)

Originally committed as revision 16419 to svn://

37fed100 01/03/2009 12:46 AM Jason Garrett-Glaser

Add x264 SSE2 iDCT functions to H.264 decoder.

Originally committed as revision 16409 to svn://

2c67c659 12/28/2008 07:40 PM Carl Eugen Hoyos

Fix h264 decoding on SSE2 cores with icc compilation.

Originally committed as revision 16373 to svn://

c1fc7036 12/26/2008 12:19 AM Jason Garrett-Glaser

Fix compilation without optimization under 64-bit with x264 deblock asm enabled.

Originally committed as revision 16313 to svn://

a6493a8f 12/22/2008 09:12 AM Diego Biurrun

Rename libavcodec/i386/ --> libavcodec/x86/.
It contains optimizations that are not specific to i386 and
libavutil uses this naming scheme already.

Originally committed as revision 16270 to svn://