ffmpeg / libavcodec / x86 / h264dsp_mmx.c @ 888fa31e

# Date Author Comment
5705b020 05/11/2011 06:09 PM Jason Garrett-Glaser

10-bit H.264 x86 chroma v loopfilter asm

Also delete some unused deblock asm macros.

9f3d6ca4 05/11/2011 03:02 AM Jason Garrett-Glaser

Port x86 10-bit H.264 deblock asm from x264

8ad77b65 05/11/2011 03:01 AM Jason Garrett-Glaser

Update x86 H.264 deblock asm

Includes AVX versions from x264.

86b29553 05/10/2011 12:39 PM Ronald S. Bultje

h264dsp_mmx: place bracket outside #if/#endif block.

Should fix compile on systems missing yasm/nasm.

19a0729b 05/10/2011 11:24 AM Oskar Arvidsson

Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.

This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_

] (i.e. the old...

2912e87a 03/19/2011 01:33 PM Mans Rullgard

Replace FFmpeg with Libav in licence headers

Signed-off-by: Mans Rullgard <>

19fb234e 01/14/2011 09:34 PM Jason Garrett-Glaser

H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed....

a52ffc3f 09/29/2010 05:42 PM Ronald S. Bultje

Move static inline function to a macro, so that constant propagation in
inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE
breakage after r25254.

Originally committed as revision 25262 to svn://

cd17285e 09/29/2010 02:04 PM Ronald S. Bultje

Merge b_idx and edge variables, and optimize the ASM to directly load variables
from memory locations/offsets depending on b_idx plus constants, rather than
having gcc do this. This saves several lea calls and together saves about
10 cycles in h264_loop_filter_strength_mmx2()....

0cc8a5d0 09/29/2010 02:03 PM Ronald S. Bultje

Remove mv_mask variable. Replace the related pand -1/0 instructions by either
a pxor, or remove the instruction altogether. Altogether, this saves 1

Originally committed as revision 25255 to svn://

c0673f2c 09/29/2010 02:02 PM Ronald S. Bultje

Remove d_idx as a variable, and instead load it as a constant in the asm.
This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.

Originally committed as revision 25254 to svn://

2c3135f6 09/29/2010 01:35 PM Ronald S. Bultje

Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid
of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.

Originally committed as revision 25253 to svn://

4b81511c 09/29/2010 01:34 PM Ronald S. Bultje

Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows
inlining various constants within the loop code. 20 cycles faster on
cathedral sample.

Originally committed as revision 25252 to svn://

7e117771 09/24/2010 03:31 PM Ronald S. Bultje

Remove unused variable.

Originally committed as revision 25173 to svn://

c0bc8b9a 09/21/2010 05:57 PM Måns Rullgård

x86: disable SSE functions using stack when stack is not aligned

This fixes crashes with ICC 10.1.

Originally committed as revision 25153 to svn://

f41237c9 09/18/2010 08:44 PM Måns Rullgård

x86: remove hack disabling sse2 h264 loop filter with 32-bit icc

Originally committed as revision 25146 to svn://

1d16a1cf 09/14/2010 01:36 PM Ronald S. Bultje

Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping....

8acb554a 09/10/2010 02:25 AM Jason Garrett-Glaser

This leaves no more GPL-only H.264 decoding asm code.

Approved by Loren.

Originally committed as revision 25092 to svn://

c6c98d08 09/08/2010 03:07 PM Stefano Sabatini

Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://

7160bb71 09/04/2010 09:59 AM Stefano Sabatini

Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://

2c166c3a 09/03/2010 04:52 PM Ronald S. Bultje

Port latest x264 deblock asm (before they moved to using NV12 as internal
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://

a33a2562 09/01/2010 08:56 PM Ronald S. Bultje

Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC....

14bc1f24 09/01/2010 08:48 PM Ronald S. Bultje

Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c,
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://

de1c253b 08/30/2010 04:34 PM Ronald S. Bultje

Split intra prediction initialization (i.e. assigning of function pointers)
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in

Originally committed as revision 24990 to svn://

d0eb5a11 08/30/2010 04:31 PM Ronald S. Bultje

Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
fate failures on Win64.

Originally committed as revision 24989 to svn://

7e7c4b60 08/30/2010 04:22 PM Ronald S. Bultje

Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx()

Originally committed as revision 24987 to svn://

c0ec9918 08/24/2010 05:47 PM Måns Rullgård

Remove global mm_flags variable

Originally committed as revision 24909 to svn://

4a384de5 08/07/2010 11:10 PM Jason Garrett-Glaser

Split h264dsp and h264pred in configure.
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself....

c12d6955 08/05/2010 12:13 AM Eli Friedman

H.264: SSE2/SSSE3 weighted prediction asm
Patch by Eli Friedman <eli.friedman at gmail dot com>

Originally committed as revision 24702 to svn://

17dc7c7a 07/01/2010 10:29 AM Jason Garrett-Glaser

Fix h264/vp8 intra pred on Athlon XP
Whose idea was it to have a CPU that didn't SIGILL on an invalid instruction?

Originally committed as revision 23927 to svn://

29e71937 06/29/2010 12:28 PM Jason Garrett-Glaser

Add missing mm_support call to ff_h264_pred_init_x86.
I'm not sure if this is supposed to be here, but it can't hurt.

Originally committed as revision 23885 to svn://

bc14f04b 06/29/2010 12:23 AM Jason Garrett-Glaser

MMXEXT version of vp8 4x4 vertical pred

Originally committed as revision 23876 to svn://

fb9927ad 06/28/2010 11:53 PM Jason Garrett-Glaser

Add mmx/mmxext/ssse3 4x4 TM intra pred functions for vp8

Originally committed as revision 23875 to svn://

270a85d2 06/28/2010 11:35 PM Jason Garrett-Glaser

Fix some intra pred MMX functions that used MMXEXT instructions
Also add predict_4x4_dc MMXEXT function for vp8/h264.

Originally committed as revision 23873 to svn://

50f70541 06/28/2010 09:12 PM Baptiste Coudurier

Change MMXEXT to MMX2, MMXEXT is deprecated

Originally committed as revision 23865 to svn://

1f65b67c 06/28/2010 10:02 AM Måns Rullgård

Fix x86 build with h264dsp disabled

Originally committed as revision 23844 to svn://

96da2a69 06/25/2010 06:34 PM Carl Eugen Hoyos

Cosmetics: Fix indentation.

Originally committed as revision 23785 to svn://

4af8cdfc 06/25/2010 06:25 PM Jason Garrett-Glaser

16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264

Originally committed as revision 23783 to svn://

1c71b5c8 05/10/2010 09:16 PM Reimar Döffinger

Replace more "m" constraints with MANGLE to fix compilation issues
with x86_32 gcc 4.4.4 and -fPIC.

Originally committed as revision 23082 to svn://

27eecec3 04/01/2010 04:52 PM Reimar Döffinger

Convert two "m" constraints to MANGLE to fix compilation with some compilers.

Originally committed as revision 22760 to svn://

84dc2d8a 03/06/2010 02:24 PM Måns Rullgård

Remove DECLARE_ALIGNED_{8,16} macros

These macros are redundant. All uses are replaced with the generic
DECLARE_ALIGNED macro instead.

Originally committed as revision 22233 to svn://

900479bb 01/26/2010 05:17 PM Loren Merritt

optimize h264_loop_filter_strength_mmx2
244->160 cycles on core2

Originally committed as revision 21462 to svn://

c6727809 01/22/2010 03:25 AM Måns Rullgård

Move array specifiers outside DECLARE_ALIGNED() invocations

Originally committed as revision 21377 to svn://

1f630b97 01/21/2010 09:46 AM David Conrad

Use two separate memory arguments since 8+() is invalid gas syntax

Originally committed as revision 21360 to svn://

b4c2ada5 01/20/2010 07:23 PM Michael Niedermayer

Attempt to fix asm compilation failure.
Only tested on gcc 4 & x86_64.

Originally committed as revision 21355 to svn://

c4f2b6dc 01/20/2010 12:34 AM David Conrad

Use constant offsets for memory operands since gcc is unable to
This fixes gcc failing to fit 6 memory locations into 7 registers on x86-32

Originally committed as revision 21337 to svn://

9ac4548f 01/19/2010 04:40 PM Michael Niedermayer

Fix h264_loop_filter_strength_mmx2() so it works with b frames.

Originally committed as revision 21327 to svn://

ebddd2e2 01/19/2010 02:28 PM Michael Niedermayer

Remove -2 -> -1 remapping, it's not needed anymore as we must remap all
references per LUT anyway.

Originally committed as revision 21323 to svn://

74a841af 06/04/2009 11:25 PM Ramiro Polla

Replace more uses of __attribute__((aligned)) by DECLARE_ALIGNED.

Originally committed as revision 19089 to svn://

2b9969a9 05/30/2009 10:19 PM Alexander Strange

H264: Fix out of bounds reads in SSSE3 MC

Reading above src[-2] isn't safe, so move loads and palignr ahead
3 pixels to load starting at the first pixel actually used.

Fixes issue941.

Originally committed as revision 18999 to svn://

8013da73 04/14/2009 11:56 PM David Conrad

VC1: add and use avg_no_rnd chroma MC functions

Originally committed as revision 18518 to svn://

c374691b 04/14/2009 11:55 PM David Conrad

Rename put_no_rnd_h264_chroma* to reflect its usage in VC1 only

Originally committed as revision 18517 to svn://

353f87b8 02/08/2009 06:35 AM Baptiste Coudurier

fix typo in h264dsp_mmx (no effect currently as the function is not used), approved by Dark Shikari on IRC

Originally committed as revision 17046 to svn://

b250f9c6 01/13/2009 11:44 PM Aurelien Jacobs

Change semantic of CONFIG_*, HAVE_* and ARCH_*.
They are now always defined to either 0 or 1.

Originally committed as revision 16590 to svn://

21ff7689 01/04/2009 01:36 AM Mathieu Velten

Use H264 MMX chroma functions to accelerate RV40 decoding.

Patch by Mathieu Velten (matmaul A gmail)

Originally committed as revision 16419 to svn://

37fed100 01/03/2009 12:46 AM Jason Garrett-Glaser

Add x264 SSE2 iDCT functions to H.264 decoder.

Originally committed as revision 16409 to svn://

a6493a8f 12/22/2008 09:12 AM Diego Biurrun

Rename libavcodec/i386/ --> libavcodec/x86/.
It contains optimizations that are not specific to i386 and
libavutil uses this naming scheme already.

Originally committed as revision 16270 to svn://