ffmpeg / libavcodec / i386 @ 40d0e665

# Date Author Comment
40d0e665 05/08/2008 09:11 PM Ramiro Polla

Do not misuse long as the size of a register in x86.
typedef x86_reg as the appropriate size and use it instead.

Originally committed as revision 13081 to svn://svn.ffmpeg.org/ffmpeg/trunk
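
A minimal sketch of the idea behind this change, assuming the typedef simply selects a fixed-width integer per target. The preprocessor checks, the non-x86 fallback, and the helper `row_offset` are illustrative assumptions, not the committed code. One plausible motivation is that `long` is 32 bits on 64-bit Windows even though general-purpose registers there are 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the typedef described in the commit message:
 * an integer exactly as wide as a general-purpose register. */
#if defined(__x86_64__) || defined(_M_X64)
typedef int64_t x86_reg;
#elif defined(__i386__) || defined(_M_IX86)
typedef int32_t x86_reg;
#else
typedef intptr_t x86_reg; /* fallback so the sketch compiles on non-x86 hosts */
#endif

/* Hypothetical helper: compute a byte offset in register width, so the
 * value can be handed to inline asm as an "r" operand without relying
 * on the size of long. */
static x86_reg row_offset(int stride, int index)
{
    return (x86_reg)stride * index;
}
```

With a type like this, a negative stride is sign-extended to the full register width before it is used in address arithmetic.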

57105ddd 04/26/2008 04:02 PM Diego Biurrun

Rename i386/cputest.c --> i386/cpuid.c.

Originally committed as revision 13002 to svn://svn.ffmpeg.org/ffmpeg/trunk

c88c253d 04/17/2008 09:57 PM Diego Biurrun

cosmetics: __asm__ __volatile__ --> asm volatile

Originally committed as revision 12885 to svn://svn.ffmpeg.org/ffmpeg/trunk

80465c7e 04/16/2008 08:51 PM Diego Biurrun

cosmetics: Fix nonstandard indentation.

Originally committed as revision 12863 to svn://svn.ffmpeg.org/ffmpeg/trunk

591d87ba 04/16/2008 08:43 PM Jeff Downs

Cosmetics:
Break long lines.
Correct spelling in comment (duplicatin -> duplicating)

Originally committed as revision 12862 to svn://svn.ffmpeg.org/ffmpeg/trunk

52cb7981 04/16/2008 04:40 AM Jeff Downs

Redo r12838, this time using svn copy to create h264_i386.h from cabac.h.

Move decode_significance_x86() and decode_significance_8x8_x86() to
i386-specific file from cabac.h.
New file is h264-oriented and only included from h264.c
Resolves compilation when configured with --disable-optimizations due to...

3aa9ede4 04/16/2008 04:26 AM Jeff Downs

Revert 12838 to redo it the right way (use svn copy to create new
file based on old).

Originally committed as revision 12845 to svn://svn.ffmpeg.org/ffmpeg/trunk

f73a6393 04/16/2008 01:36 AM Alexander Strange

Add a new xvid-style IDCT using SSE2.

Originally committed as revision 12843 to svn://svn.ffmpeg.org/ffmpeg/trunk

e6cfd8ff 04/15/2008 01:51 PM Jeff Downs

Move decode_significance_x86() and decode_significance_8x8_x86() to
i386-specific file from cabac.h.
New file is h264-oriented and only included from h264.c
Resolves compilation when configured with --disable-optimizations due to
decode_significance_8x8_x86 using last_coeff_flag_offset_8x8, which is...

3fbe7118 04/14/2008 08:54 PM Luca Barbato

Eliminate movdqu in vp3dsp_sse2, patch from Alexander Strange astrangeAtithinkswDoTcom

Originally committed as revision 12824 to svn://svn.ffmpeg.org/ffmpeg/trunk

54a0b6e5 04/12/2008 04:54 PM Alexander Strange

Add a header file to declare Xvid IDCT functions.
patch by Alexander Strange, astrange ithinksw com

Originally committed as revision 12794 to svn://svn.ffmpeg.org/ffmpeg/trunk

96275520 04/08/2008 11:49 PM Loren Merritt

Fix H.264 interframe decoding when compiling with icc. Patch by Loren
Merritt:

"It seems that icc copies the constants from their global var onto the
stack, at which point they're not aligned, hence the crash.
[This change] really shouldn't mean anything different, but maybe it'll...

ce53144b 04/01/2008 04:51 AM Loren Merritt

h264 chroma mc ssse3
width8: 180->92, width4: 78->63 cycles (core2)

Originally committed as revision 12661 to svn://svn.ffmpeg.org/ffmpeg/trunk

04932b0d 03/22/2008 04:46 PM Diego Biurrun

cosmetics: typo fixes

Originally committed as revision 12554 to svn://svn.ffmpeg.org/ffmpeg/trunk

9e8e6d31 03/21/2008 12:36 PM Zuxy Meng

Add missed call to ff_cavsdsp_init_3dnow() in dsputil_init_mmx()

Originally committed as revision 12540 to svn://svn.ffmpeg.org/ffmpeg/trunk

943032b1 03/20/2008 02:24 PM Michael Niedermayer

Hardcode register to prevent apparent miscompilation.
Fixes regression tests with gcc 2.95.

Originally committed as revision 12512 to svn://svn.ffmpeg.org/ffmpeg/trunk

dea00a46 03/20/2008 02:09 PM Michael Niedermayer

remove unused temp

Originally committed as revision 12511 to svn://svn.ffmpeg.org/ffmpeg/trunk

b55aa9a9 03/17/2008 11:08 PM Måns Rullgård

get register names from x86_cpu.h

Originally committed as revision 12482 to svn://svn.ffmpeg.org/ffmpeg/trunk

5a6a9e78 03/04/2008 12:07 AM Aurelien Jacobs

move draw_edges() into dsputil

Originally committed as revision 12309 to svn://svn.ffmpeg.org/ffmpeg/trunk

97d1d009 02/25/2008 11:14 PM Aurelien Jacobs

split encoding part of dsputil_mmx into its own file

Originally committed as revision 12223 to svn://svn.ffmpeg.org/ffmpeg/trunk

f2217d6f 02/24/2008 02:47 PM Reimar Döffinger

__asm __volatile -> asm volatile part 2

Originally committed as revision 12189 to svn://svn.ffmpeg.org/ffmpeg/trunk

78d3d94f 02/24/2008 02:46 PM Reimar Döffinger

__asm __volatile -> asm volatile, improves code consistency and works
(as far as that is possible) with the Sun C compiler.

Originally committed as revision 12188 to svn://svn.ffmpeg.org/ffmpeg/trunk

4a9ca0a2 02/21/2008 07:10 AM Loren Merritt

simd and unroll png_filter_row
cycles per 1000 pixels on core2:
left: 9211->5170
top: 9283->2138
avg: 12215->7611
paeth: 64024->17360
overall rgb png decoding speed: +45%
overall greyscale png decoding speed: +6%

Originally committed as revision 12164 to svn://svn.ffmpeg.org/ffmpeg/trunk
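
For readers unfamiliar with the "paeth" row in the timings above, here is a scalar sketch of the Paeth predictor as the PNG specification defines it, plus a loop that undoes the filter for one row. It is not the SIMD code from this commit; the function names and the `bpp` (bytes per pixel) parameter are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Paeth predictor: choose whichever of left (a), above (b) or
 * upper-left (c) is closest to the estimate a + b - c.
 * Ties are broken in the order a, b, c. */
static uint8_t paeth_predict(uint8_t a, uint8_t b, uint8_t c)
{
    int p  = a + b - c;
    int pa = abs(p - a);
    int pb = abs(p - b);
    int pc = abs(p - c);

    if (pa <= pb && pa <= pc)
        return a;
    if (pb <= pc)
        return b;
    return c;
}

/* Undo the Paeth filter for one row of `size` bytes.
 * `up` is the already reconstructed previous row; pixels to the left
 * of the row start are treated as zero. */
static void unfilter_paeth_row(uint8_t *dst, const uint8_t *src,
                               const uint8_t *up, int size, int bpp)
{
    for (int i = 0; i < size; i++) {
        uint8_t a = i >= bpp ? dst[i - bpp] : 0;
        uint8_t c = i >= bpp ? up[i - bpp]  : 0;
        dst[i] = (uint8_t)(src[i] + paeth_predict(a, up[i], c));
    }
}
```

Each output byte depends on the byte `bpp` positions to its left. That dependency may be part of why this filter is the slowest case in the timings above.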

1435e4cc 02/21/2008 12:06 AM Michael Niedermayer

Disabling all SSE* code for old gcc to avoid alignment issues.

Originally committed as revision 12163 to svn://svn.ffmpeg.org/ffmpeg/trunk

754bf3d8 02/19/2008 09:55 PM Reimar Döffinger

Fix warnings:
i386/vp3dsp_sse2.c:805: warning: cast discards qualifiers from pointer target type
i386/vp3dsp_sse2.c:806: warning: cast discards qualifiers from pointer target type

Originally committed as revision 12150 to svn://svn.ffmpeg.org/ffmpeg/trunk

5edac5dc 02/13/2008 01:18 AM Diego Biurrun

cosmetics: Replace // by /* */ comments.
sync with upstream libmpeg2 0.4.1

Originally committed as revision 11915 to svn://svn.ffmpeg.org/ffmpeg/trunk

ec199cc9 02/10/2008 01:45 AM Loren Merritt

asm argument that might be in memory needs a size

Originally committed as revision 11890 to svn://svn.ffmpeg.org/ffmpeg/trunk

2c70770e 02/09/2008 05:29 AM Loren Merritt

use fewer registers in apply_welch_window_sse2

Originally committed as revision 11882 to svn://svn.ffmpeg.org/ffmpeg/trunk

1d67b037 02/06/2008 12:32 PM Loren Merritt

sse2 h264 motion compensation. not new code, just separate out the cases that didn't need ssse3.

Originally committed as revision 11877 to svn://svn.ffmpeg.org/ffmpeg/trunk

20d565be 02/06/2008 04:44 AM Loren Merritt

put loop counter in a register if possible. makes some of the qpel functions 3% faster.

Originally committed as revision 11876 to svn://svn.ffmpeg.org/ffmpeg/trunk

7080ec29 02/06/2008 04:14 AM Loren Merritt

fix aliasing warnings. simpler too.

Originally committed as revision 11875 to svn://svn.ffmpeg.org/ffmpeg/trunk

a2b7bc8e 02/06/2008 03:51 AM Loren Merritt

constant was excessively aligned

Originally committed as revision 11874 to svn://svn.ffmpeg.org/ffmpeg/trunk

ddf96970 02/05/2008 11:22 AM Loren Merritt

ssse3 h264 motion compensation.
25% faster than mmx on core2, 35% if you discount fullpel, 4% overall decoding.

Originally committed as revision 11871 to svn://svn.ffmpeg.org/ffmpeg/trunk

b64dfbb8 02/05/2008 03:58 AM Loren Merritt

add qpel rounder once during hv rather than twice during hv and whatever it's averaged with

Originally committed as revision 11870 to svn://svn.ffmpeg.org/ffmpeg/trunk

fa9b873e 02/05/2008 01:16 AM Loren Merritt

clean up an ugliness introduced in r11826. this syntax will require fewer changes when adding future sse2 code.

Originally committed as revision 11868 to svn://svn.ffmpeg.org/ffmpeg/trunk

9a7871f7 02/04/2008 08:03 PM Michael Niedermayer

Deprecate old and inefficient per instruction asm().

Originally committed as revision 11865 to svn://svn.ffmpeg.org/ffmpeg/trunk

b2f77586 02/04/2008 04:20 PM Loren Merritt

reduce code duplication

Originally committed as revision 11863 to svn://svn.ffmpeg.org/ffmpeg/trunk

b313e815 02/03/2008 05:04 PM Loren Merritt

avg_pixels4_mmx2

Originally committed as revision 11829 to svn://svn.ffmpeg.org/ffmpeg/trunk

6c01d006 02/03/2008 04:19 PM Loren Merritt

use mmx2/3dnow avg functions in avg_qpel*_mc00

Originally committed as revision 11828 to svn://svn.ffmpeg.org/ffmpeg/trunk

ed5d7a53 02/03/2008 07:05 AM Loren Merritt

ff_h264_idct8_add_sse2.
compared to mmx, 217->126 cycles on core2, 262->220 on k8.

Originally committed as revision 11826 to svn://svn.ffmpeg.org/ffmpeg/trunk

51f0ac65 02/03/2008 03:21 AM Loren Merritt

remove some movq in ff_h264_idct8_add_mmx. 225->217 cycles on core2.

Originally committed as revision 11825 to svn://svn.ffmpeg.org/ffmpeg/trunk

066e0cc5 01/30/2008 11:54 PM Baptiste Coudurier

add parenthesis, fix warning: i386/dsputil_mmx.c:2618: warning: suggest parentheses around arithmetic in operand of |

Originally committed as revision 11673 to svn://svn.ffmpeg.org/ffmpeg/trunk
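
The warning quoted above exists because `+` binds tighter than `|` in C, so `flags | base + off` means `flags | (base + off)`. The sketch below uses made-up function and parameter names to show that the two possible groupings can give different results, which is why writing the parentheses out helps.

```c
/* What the compiler does with `flags | base + off`:
 * the addition is evaluated first. */
static unsigned add_then_or(unsigned flags, unsigned base, unsigned off)
{
    return flags | (base + off);
}

/* The other grouping a reader might have assumed. */
static unsigned or_then_add(unsigned flags, unsigned base, unsigned off)
{
    return (flags | base) + off;
}
```

For example, with all three arguments equal to 1 the first form yields 3 and the second yields 2.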

afa47789 01/30/2008 11:52 PM Baptiste Coudurier

fix prototypes, remove warning: i386/dsputil_mmx.c:3594: warning: assignment from incompatible pointer type

Originally committed as revision 11672 to svn://svn.ffmpeg.org/ffmpeg/trunk

766324fc 01/27/2008 08:50 PM Reimar Döffinger

Add and use DECLARE_ASM_CONST for constants used in assembler code.
Should make it easier to work around compilation problems with e.g. ICC.

Originally committed as revision 11641 to svn://svn.ffmpeg.org/ffmpeg/trunk

038f0f9b 01/27/2008 08:45 PM Reimar Döffinger

Use DECLARE_ALIGNED in yet another place

Originally committed as revision 11640 to svn://svn.ffmpeg.org/ffmpeg/trunk

6a1a2fa0 01/27/2008 07:59 PM Reimar Döffinger

Use DECLARE_ALIGNED and remove unneeded attribute_used

Originally committed as revision 11639 to svn://svn.ffmpeg.org/ffmpeg/trunk

27215c6b 01/27/2008 02:46 PM Reimar Döffinger

Use DECLARE_ALIGNED

Originally committed as revision 11630 to svn://svn.ffmpeg.org/ffmpeg/trunk

426d18b8 01/16/2008 09:21 PM Diego Biurrun

Rename illegal identifiers, _ followed by capital is reserved for the system.

Originally committed as revision 11541 to svn://svn.ffmpeg.org/ffmpeg/trunk

28748a91 01/11/2008 08:29 AM Christophe Gisquet

Factorize some duplicated code from CAVS and H.264 into a common file.
patch by Christophe Gisquet, christophe.gisquet free fr

Originally committed as revision 11504 to svn://svn.ffmpeg.org/ffmpeg/trunk

ae904fd0 01/02/2008 07:24 PM Christophe Gisquet

Fix issue #301:
summary of changes:
- Use MANGLE when loading some constants into MMX registers.
- Convert those constants to non-static and thus add ff_ prefix.
- Remove last parameter of MSPEL_FILTER13_CORE (was constant).
- Use of "+r" instead of stricter but unnecessary "+g"....

9fa35729 12/21/2007 11:11 PM Christophe Gisquet

add MMX version for put_no_rnd_h264_chroma_mc8_c, used in VC-1 decoding.
patch by Christophe GISQUET christophe P gisquet A free P fr
original thread:
date: Nov 25, 2007 12:35 AM
subject: Re: [FFmpeg-devel] MMX version for put_no_rnd_h264_chroma_mc8_c

Originally committed as revision 11298 to svn://svn.ffmpeg.org/ffmpeg/trunk

9fbd14ac 12/21/2007 12:38 PM Diego Biurrun

Fix typo in macro name: WARPER8_16_SQ --> WRAPPER8_16_SQ.

Originally committed as revision 11296 to svn://svn.ffmpeg.org/ffmpeg/trunk

407c50a0 12/16/2007 10:20 PM Aurelien Jacobs

move FLAC mmx dsp to its own file

Originally committed as revision 11244 to svn://svn.ffmpeg.org/ffmpeg/trunk

15c57ced 12/15/2007 11:08 PM Reimar Döffinger

Add 'l' suffix where it is necessary because type can not always be
inferred from arguments. Fixes compilation with Intel compiler

Originally committed as revision 11227 to svn://svn.ffmpeg.org/ffmpeg/trunk

1b77e877 12/12/2007 10:45 PM Aurelien Jacobs

add required include to make this file self-contained

Originally committed as revision 11211 to svn://svn.ffmpeg.org/ffmpeg/trunk

571bf37f 12/11/2007 06:47 PM Diego Biurrun

typo/clarification

Originally committed as revision 11201 to svn://svn.ffmpeg.org/ffmpeg/trunk

56cc85a0 12/02/2007 03:43 PM Diego Biurrun

Misc spelling fixes, prefer American over British English.

Originally committed as revision 11126 to svn://svn.ffmpeg.org/ffmpeg/trunk

52b541ad 12/01/2007 10:21 PM Vitor Sessak

spelling

Originally committed as revision 11122 to svn://svn.ffmpeg.org/ffmpeg/trunk

bb6cc730 11/27/2007 10:57 PM Aurelien Jacobs

remove some unused ff_p* vars from dsputil

Originally committed as revision 11106 to svn://svn.ffmpeg.org/ffmpeg/trunk

dbb5fdbd 11/27/2007 10:56 PM Aurelien Jacobs

remove useless #ifdef around extern declaration

Originally committed as revision 11105 to svn://svn.ffmpeg.org/ffmpeg/trunk

7c35b551 11/27/2007 10:54 PM Aurelien Jacobs

cosmetics: indentation

Originally committed as revision 11104 to svn://svn.ffmpeg.org/ffmpeg/trunk

51ac8822 11/27/2007 10:54 PM Aurelien Jacobs

convert some #ifdef CONFIG_ to if(ENABLE_

Originally committed as revision 11103 to svn://svn.ffmpeg.org/ffmpeg/trunk
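
A small sketch of the difference between the two styles named in this commit message. It assumes `CONFIG_*` symbols are defined only when a feature is on, while `ENABLE_*` symbols are always defined as 0 or 1. The feature name and functions are hypothetical.

```c
/* Hypothetical feature switch: always defined, here as "disabled".
 * CONFIG_FANCY_CODEC is deliberately left undefined. */
#define ENABLE_FANCY_CODEC 0

static int fancy_init(void)
{
    return 42;
}

/* Preprocessor style: when the feature is off, the call is removed
 * before the compiler sees it, so errors inside it go unnoticed. */
static int init_ifdef_style(void)
{
    int r = 0;
#ifdef CONFIG_FANCY_CODEC
    r = fancy_init();
#endif
    return r;
}

/* Constant-condition style: the call is always parsed and type-checked;
 * an optimizing compiler can drop the dead branch. */
static int init_if_style(void)
{
    int r = 0;
    if (ENABLE_FANCY_CODEC)
        r = fancy_init();
    return r;
}
```

Both functions return 0 here, but only the second keeps the disabled call visible to the compiler, so it cannot silently stop compiling when interfaces change.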

5b67ce2a 11/27/2007 10:42 PM Aurelien Jacobs

build vc1dsp_mmx.c in its own compilation unit

Originally committed as revision 11102 to svn://svn.ffmpeg.org/ffmpeg/trunk

43de5065 11/27/2007 10:36 PM Aurelien Jacobs

use ff_ prefix for extern vars

Originally committed as revision 11101 to svn://svn.ffmpeg.org/ffmpeg/trunk

182f56cb 11/27/2007 10:23 PM Aurelien Jacobs

make ff_p* vars extern so that they can be used in various *_mmx.c files

Originally committed as revision 11100 to svn://svn.ffmpeg.org/ffmpeg/trunk

ac40ce42 11/25/2007 09:43 AM Christophe Gisquet

Typo fix. Previous version had some picture error building up until next keyframe.
Now MMX version decodes 1:1 what the C version does
patch by Christophe GISQUET christophe P gisquet A free P fr

Originally committed as revision 11090 to svn://svn.ffmpeg.org/ffmpeg/trunk

d3a9c44e 11/24/2007 02:34 PM Christophe Gisquet

Strip debug stuff from vc1dsp_mmx.c, patch by Christophe GISQUET christophe P gisquet A free P fr
Original thread:
date: Nov 24, 2007 3:09 PM
subject: [FFmpeg-devel] [PATCH] Strip debug stuff from vc1dsp_mmx.c

Originally committed as revision 11088 to svn://svn.ffmpeg.org/ffmpeg/trunk

82821c91 11/21/2007 10:41 PM Christophe Gisquet

add VC-1 MMX DSP functions, under MIT license.
patch by Christophe GISQUET christophe P gisquet A free P fr
original thread:
date: Jul 7, 2007 12:52 PM
subject: [FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Originally committed as revision 11074 to svn://svn.ffmpeg.org/ffmpeg/trunk

02d36191 11/12/2007 02:04 AM Michael Niedermayer

trying to work around gcc 2.95 bug which causes random failures

Originally committed as revision 11003 to svn://svn.ffmpeg.org/ffmpeg/trunk

deb43f0b 10/17/2007 10:29 PM Diego Biurrun

Explain why there are no multiple inclusion guards in these header files.

Originally committed as revision 10771 to svn://svn.ffmpeg.org/ffmpeg/trunk

ab54bff2 10/17/2007 11:19 AM Aurelien Jacobs

Remove wrong multiple inclusion guards.
Those files are really meant to be included several times.

Originally committed as revision 10766 to svn://svn.ffmpeg.org/ffmpeg/trunk

5b21bdab 10/17/2007 09:37 AM Diego Biurrun

Add FFMPEG_ prefix to all multiple inclusion guards.

Originally committed as revision 10765 to svn://svn.ffmpeg.org/ffmpeg/trunk

31b2c144 10/17/2007 09:31 AM Diego Biurrun

Add missing multiple inclusion guards.

Originally committed as revision 10763 to svn://svn.ffmpeg.org/ffmpeg/trunk

bdb27356 10/11/2007 10:18 PM Shane

Fix intended order of operations for 4 assert() checks.
Patch by Shane, gnome42 T gmail O com

Originally committed as revision 10711 to svn://svn.ffmpeg.org/ffmpeg/trunk

6810b93a 09/29/2007 10:31 PM Loren Merritt

sse2 version of compute_autocorr().
4x faster than c (somehow, even though doubles only allow 2x simd).
overall flac encoding: 15-50% faster on core2, 4-11% on k8, 3-13% on p4.

Originally committed as revision 10621 to svn://svn.ffmpeg.org/ffmpeg/trunk

eafa1c90 08/30/2007 11:41 AM Reimar Döffinger

Replace complicated and currently broken manual alignment code by
DECLARE_ALIGNED_16. Fixes crash in ff_snow_horizontal_compose97i_sse2

Originally committed as revision 10261 to svn://svn.ffmpeg.org/ffmpeg/trunk
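
To illustrate the two approaches this commit message contrasts, the sketch below compares rounding a pointer up by hand with a declaration-level alignment request. The macro body is an assumed GCC/Clang-style definition and may differ from the project's real one; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed definition for illustration (GCC/Clang attribute syntax). */
#define DECLARE_ALIGNED_16(t, v) t v __attribute__((aligned(16)))

static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Manual alignment: over-allocate by 15 bytes and round the address up.
 * Easy to get wrong, for example by forgetting the padding. */
static int manual_buffer_is_aligned(void)
{
    uint8_t raw[64 + 15];
    uint8_t *buf = (uint8_t *)(((uintptr_t)raw + 15) & ~(uintptr_t)15);
    return is_aligned_16(buf);
}

/* Declaration-level alignment: the compiler places the buffer itself. */
static int declared_buffer_is_aligned(void)
{
    DECLARE_ALIGNED_16(uint8_t, buf[64]);
    return is_aligned_16(buf);
}
```

Aligned SSE2 loads and stores fault on addresses that are not multiples of 16, so a mistake in the manual variant would show up as a crash.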

267b9479 08/27/2007 10:39 AM Michael Niedermayer

typo

Originally committed as revision 10250 to svn://svn.ffmpeg.org/ffmpeg/trunk

7bcc1d5b 08/26/2007 04:10 PM Ramiro Polla

CONFIG_7REGS has been renamed to HAVE_7REGS

Originally committed as revision 10237 to svn://svn.ffmpeg.org/ffmpeg/trunk

90e9e94d 08/26/2007 12:34 PM Michael Niedermayer

workaround gcc bug, untested as my gcc is not complaining

Originally committed as revision 10236 to svn://svn.ffmpeg.org/ffmpeg/trunk

cefa5999 08/26/2007 11:16 AM Michael Niedermayer

optimize the first vertical lifting step, this also prevents another
overflow, the last known possible overflow

Originally committed as revision 10234 to svn://svn.ffmpeg.org/ffmpeg/trunk

c9076276 08/26/2007 08:31 AM Michael Niedermayer

optimize 1st horizontal lifting step

Originally committed as revision 10231 to svn://svn.ffmpeg.org/ffmpeg/trunk

1104bf2b 08/26/2007 08:03 AM Michael Niedermayer

typo

Originally committed as revision 10230 to svn://svn.ffmpeg.org/ffmpeg/trunk

8b502929 08/26/2007 06:51 AM Michael Niedermayer

get rid of totally senseless "m" + read in register; we have enough
registers to keep everything in registers

Originally committed as revision 10229 to svn://svn.ffmpeg.org/ffmpeg/trunk

bc1e78d8 08/26/2007 02:02 AM Michael Niedermayer

simplify senselessly complex addressing

Originally committed as revision 10228 to svn://svn.ffmpeg.org/ffmpeg/trunk

25bb359f 08/26/2007 01:20 AM Michael Niedermayer

cosmetics
remove brain amputated mmx wrappers around sse2 macros
fix name of ..._sub macro to match ..._add naming

Originally committed as revision 10227 to svn://svn.ffmpeg.org/ffmpeg/trunk

62975029 08/26/2007 01:11 AM Michael Niedermayer

avoid overflow in the 3rd lifting step, this now needs mmx2 at minimum
(patch for plain mmx support is welcome ...)

Originally committed as revision 10226 to svn://svn.ffmpeg.org/ffmpeg/trunk

b696a4c9 08/25/2007 07:04 PM Michael Niedermayer

avoid an overflow in the 1st horizontal lifting step

Originally committed as revision 10225 to svn://svn.ffmpeg.org/ffmpeg/trunk

9caa1ccc 08/25/2007 04:28 PM Michael Niedermayer

prevent one overflow in the first vertical lifting step

Originally committed as revision 10224 to svn://svn.ffmpeg.org/ffmpeg/trunk

3e0f7126 08/25/2007 03:20 PM Michael Niedermayer

update mmx code to latest snow changes
note, the code likely can overflow and thus needs some more changes
sse2 updated too but disabled as it is untested

Originally committed as revision 10223 to svn://svn.ffmpeg.org/ffmpeg/trunk

d593e329 08/25/2007 03:00 AM Michael Niedermayer

use 16bit IDWT (a SIMD implementation of it should be >2x faster than with
the old 32bit code)
disable mmx/sse2 optimizations as they need a rewrite now

Originally committed as revision 10218 to svn://svn.ffmpeg.org/ffmpeg/trunk

ce611a27 08/21/2007 04:29 PM Michael Niedermayer

Change rounding of the horizontal DWT to match the vertical one.
This allows some simplifications and optimizations and should
not have any effect on quality.

Originally committed as revision 10172 to svn://svn.ffmpeg.org/ffmpeg/trunk

30cd3e66 08/21/2007 12:05 AM Michael Niedermayer

remove code which became unused by the previous changes

Originally committed as revision 10166 to svn://svn.ffmpeg.org/ffmpeg/trunk

72dee89b 08/21/2007 12:03 AM Michael Niedermayer

Simplify and optimize the 4th vertical lifting step of the SSE2 code (untested)
This also reduces the needed headroom in that step by 1 bit

Originally committed as revision 10165 to svn://svn.ffmpeg.org/ffmpeg/trunk

d0dae46a 08/21/2007 12:02 AM Michael Niedermayer

Simplify and optimize the 4th vertical lifting step of the MMX code
This also reduces the needed headroom in that step by 1 bit

Originally committed as revision 10164 to svn://svn.ffmpeg.org/ffmpeg/trunk

1ffbbef2 08/20/2007 11:59 PM Michael Niedermayer

Simplify and speedup code, reduce needed headroom by 2 bits in the 3rd
vertical lifting step of the SSE2 code (untested)

Originally committed as revision 10163 to svn://svn.ffmpeg.org/ffmpeg/trunk

4bf17904 08/20/2007 11:54 PM Michael Niedermayer

simplify, speedup and reduce needed headroom by 2 bits in the 3rd
vertical lifting step

Originally committed as revision 10162 to svn://svn.ffmpeg.org/ffmpeg/trunk

dd30437b 08/20/2007 11:11 PM Michael Niedermayer

replace <<1 by add for SSE2 (untested)

Originally committed as revision 10161 to svn://svn.ffmpeg.org/ffmpeg/trunk

7e665a39 08/20/2007 11:09 PM Michael Niedermayer

replace <<1 by add

Originally committed as revision 10160 to svn://svn.ffmpeg.org/ffmpeg/trunk
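
The substitution named in these two commits relies on `x << 1` and `x + x` producing the same bits in a fixed-width lane. The scalar sketch below checks that for every 16-bit value. That the SIMD code swaps a packed shift for a packed add is an assumption here, and the function names are hypothetical.

```c
#include <stdint.h>

/* Doubling by shift, truncated to one 16-bit lane. */
static uint16_t double_by_shift(uint16_t x)
{
    return (uint16_t)(x << 1);
}

/* Doubling by addition, truncated to one 16-bit lane. */
static uint16_t double_by_add(uint16_t x)
{
    return (uint16_t)(x + x);
}

/* Exhaustively compare both forms over all 16-bit inputs. */
static int doubling_forms_agree(void)
{
    for (uint32_t x = 0; x <= 0xFFFF; x++) {
        if (double_by_shift((uint16_t)x) != double_by_add((uint16_t)x))
            return 0;
    }
    return 1;
}
```

The two forms agree even when the result wraps around, so the replacement does not change the output.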

eee649d3 08/20/2007 11:02 PM Michael Niedermayer

slightly change horizontal lift3 so it needs 1 bit less headroom

Originally committed as revision 10159 to svn://svn.ffmpeg.org/ffmpeg/trunk

be3b22f9 08/20/2007 10:41 PM Michael Niedermayer

remove idiotic double subtraction from the sse2 code (untested, no sse2 here)

Originally committed as revision 10158 to svn://svn.ffmpeg.org/ffmpeg/trunk