
ffmpeg / libavcodec / vp8.c @ b0d58795


# Date Author Comment
b0d58795 08/03/2010 11:21 PM Jason Garrett-Glaser

VP8: slightly faster DCT coefficient probability update

Originally committed as revision 24687 to svn://

476be414 08/03/2010 11:34 AM Jason Garrett-Glaser

VP8: make another RAC call branchy
1-2 clocks faster.

Originally committed as revision 24683 to svn://

0908f1b9 08/03/2010 11:10 AM Jason Garrett-Glaser

VP8: unroll partition type decoding tree
~34% faster partition type decoding.

Originally committed as revision 24681 to svn://

c5dec7f1 08/03/2010 10:37 AM Jason Garrett-Glaser

VP8: unroll splitmv decoding tree
Much faster splitmv mode decoding.

Originally committed as revision 24680 to svn://

23117d69 08/03/2010 10:24 AM Jason Garrett-Glaser

VP8: unroll MB mode decoding tree
~50% faster MB mode decoding, plus eliminate a costly switch.

Originally committed as revision 24679 to svn://

370b622a 08/02/2010 10:48 PM Jason Garrett-Glaser

VP8: eliminate a dereference in coefficient decoding

Originally committed as revision 24671 to svn://

f311208c 08/02/2010 08:57 PM Jason Garrett-Glaser

VP8: much faster DC transform handling
A lot of the time the DC block is empty: don't do the WHT in this case.
A lot of the rest of the time, there's only one coefficient: make a special
DC-only transform for that case.
When the block is empty, don't incorrectly mark luma DCT blocks as having DC...
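The three cases described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the decoder's actual code: the function names, the `nnz` parameter (taken here to mean the index of the last nonzero coefficient plus one), and the choice to zero the output for an empty block are all hypothetical. The full transform follows the reference form of the inverse Walsh-Hadamard transform in the VP8 format specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Full inverse WHT of the 4x4 luma DC block (reference form).
 * Relies on arithmetic right shift for negative values, as the
 * reference code does. */
static void wht_full(const int16_t in[16], int16_t out[16])
{
    int tmp[16];
    for (int i = 0; i < 4; i++) {            /* vertical pass */
        int a = in[i]     + in[12 + i];
        int b = in[4 + i] + in[8 + i];
        int c = in[4 + i] - in[8 + i];
        int d = in[i]     - in[12 + i];
        tmp[i]      = a + b;
        tmp[4 + i]  = c + d;
        tmp[8 + i]  = a - b;
        tmp[12 + i] = d - c;
    }
    for (int i = 0; i < 4; i++) {            /* horizontal pass */
        const int *r = tmp + 4 * i;
        int a = r[0] + r[3];
        int b = r[1] + r[2];
        int c = r[1] - r[2];
        int d = r[0] - r[3];
        out[4 * i + 0] = (int16_t)((a + b + 3) >> 3);
        out[4 * i + 1] = (int16_t)((c + d + 3) >> 3);
        out[4 * i + 2] = (int16_t)((a - b + 3) >> 3);
        out[4 * i + 3] = (int16_t)((d - c + 3) >> 3);
    }
}

/* DC-only shortcut: with a single coefficient at position 0, every
 * output of the full transform equals (dc + 3) >> 3. */
static void wht_dc_only(const int16_t in[16], int16_t out[16])
{
    int16_t val = (int16_t)((in[0] + 3) >> 3);
    for (int i = 0; i < 16; i++)
        out[i] = val;
}

/* Hypothetical dispatcher; nnz = index of last nonzero coefficient + 1. */
static void luma_dc_transform(const int16_t dc[16], int nnz, int16_t out[16])
{
    assert(nnz >= 0 && nnz <= 16);
    if (nnz == 0)
        memset(out, 0, 16 * sizeof(*out));   /* empty block: no WHT at all */
    else if (nnz == 1)
        wht_dc_only(dc, out);                /* single coefficient */
    else
        wht_full(dc, out);                   /* general case */
}
```

The shortcut is safe because a block holding only a DC value passes through both butterfly stages unchanged, so the full transform would produce the same constant in all 16 positions.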

827d43bb 08/02/2010 08:18 PM Jason Garrett-Glaser

VP8: move zeroing of luma DC block into the WHT
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.

Originally committed as revision 24668 to svn://

d2840fa4 08/02/2010 09:44 AM Pascal Massimino

only store intra prediction modes on the boundary for keyframes, not as a plane.
inter-frame behaviour unchanged.

Originally committed as revision 24664 to svn://

10bf2eeb 08/02/2010 05:20 AM Jason Garrett-Glaser

VP8: simplify token_prob handling
~1.5% faster decode_block_coeffs

Originally committed as revision 24659 to svn://

c22b4468 08/01/2010 11:20 PM Pascal Massimino

prevent access to vp8_coeff_band16

Originally committed as revision 24656 to svn://

a8ab0ccc 07/27/2010 11:09 PM Pascal Massimino

b0rk3d FATE + black helicopters hissing -> rolling back to r24556 and sleeping

Originally committed as revision 24559 to svn://

62d1f786 07/27/2010 10:23 PM Pascal Massimino

perform the clipping on luma_dc_qmul1 and chroma_qmul0 earlier

Originally committed as revision 24558 to svn://

e7e81959 07/27/2010 10:21 PM Pascal Massimino

save some copies by moving some fields out of proba2

Originally committed as revision 24557 to svn://

fca05ea8 07/26/2010 07:10 AM Jason Garrett-Glaser

VP8: add missing free
Fixes a tiny memory leak.

Originally committed as revision 24504 to svn://

28e241de 07/25/2010 02:49 PM Carl Eugen Hoyos

Fix r24445: Instead of needlessly initialising a variable, silence the warning.

Originally committed as revision 24498 to svn://

ca18a478 07/23/2010 09:46 PM David Conrad

VP8: Inline traversing vp8_small_mvtree

Much faster read_mv_component, slightly faster overall

Originally committed as revision 24470 to svn://

7697cdcf 07/23/2010 09:46 PM David Conrad

VP8: Use vp56_rac_get_prob_branchy when the bit is only used by an if()

Originally committed as revision 24469 to svn://

fe1b5d97 07/23/2010 09:46 PM David Conrad

Decode DCT tokens by branching to a different code path for each branch
on the huffman tree, instead of traversing the tree in a while loop.

Based on the similar optimization in libvpx's detokenize.c

10% faster at normal bitrates, and 30% faster for high-bitrate intra-only...
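The idea of replacing a tree walk with one code path per branch can be illustrated with a small sketch. Everything here is an assumption made for the example: the five-leaf tree, the leaf names, and the toy `BitSrc` reader, which hands out pre-decoded bits and ignores the per-node probabilities a real arithmetic decoder would use. It is not the decoder's token tree.

```c
#include <assert.h>
#include <stdint.h>

enum { M_DC, M_V, M_H, M_TM, M_B };      /* hypothetical leaf values */

/* Toy bit source standing in for the arithmetic decoder. */
typedef struct {
    const uint8_t *bits;
    int pos, len;
} BitSrc;

static int get_bit(BitSrc *s)
{
    assert(s->pos < s->len);
    return s->bits[s->pos++];
}

/* Tree as an index array: positive entries point to the next node
 * pair, non-positive entries are negated leaf values. */
static const int8_t mode_tree[8] = {
    -M_DC,  2,
     4,     6,
    -M_V,  -M_H,
    -M_TM, -M_B,
};

/* Generic form: walk the tree in a loop, one table lookup per bit. */
static int read_tree_loop(BitSrc *s)
{
    int i = 0;
    while ((i = mode_tree[i + get_bit(s)]) > 0)
        ;
    return -i;
}

/* Unrolled form: each branch of the tree becomes its own code path,
 * so there is no table lookup and no loop-carried index. */
static int read_tree_unrolled(BitSrc *s)
{
    if (!get_bit(s))
        return M_DC;
    if (!get_bit(s))
        return get_bit(s) ? M_H : M_V;
    return get_bit(s) ? M_B : M_TM;
}
```

Both functions consume the same bits and return the same leaf; the unrolled version trades code size for the removal of the data-dependent table index.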

13a1304b 07/23/2010 09:42 PM Jason Garrett-Glaser

Add myself to VP8 copyright and maintainers.
Also add Ronald to maintainers.

Originally committed as revision 24464 to svn://

414ac27d 07/23/2010 09:36 PM Jason Garrett-Glaser

VP8: always_inline some things to force gcc to do the right thing
Mostly seems to help in the MC code, which gets a hundred cycles faster.

Originally committed as revision 24463 to svn://

06d50ca8 07/23/2010 09:17 PM Jason Garrett-Glaser

VP8: use AV_RL24 instead of defining a new RL24.

Originally committed as revision 24462 to svn://

9fddd14a 07/23/2010 07:06 PM Jason Garrett-Glaser

VP8: Slightly faster MV selection
Don't clamp best mv unless it's actually used.

Originally committed as revision 24461 to svn://

14767f35 07/23/2010 10:42 AM Jason Garrett-Glaser

VP8: use AV_ZERO32 instead of AV_WN32A where relevant

Originally committed as revision 24460 to svn://

09959ec4 07/23/2010 10:34 AM Jason Garrett-Glaser

VP8: eliminate redundant code in r24458

Originally committed as revision 24459 to svn://

a71abb71 07/23/2010 10:24 AM Jason Garrett-Glaser

VP8: shave a few clocks off check_intra_pred_mode

Originally committed as revision 24458 to svn://

0087aa47 07/23/2010 06:41 AM Jason Garrett-Glaser

VP8: fix broken sign bias code in MV pred
Apparently the official conformance test vectors don't test this feature,
even though libvpx uses it.

Originally committed as revision 24456 to svn://

3ae079a3 07/23/2010 06:02 AM Jason Garrett-Glaser

VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://

3df56f41 07/23/2010 03:44 AM Jason Garrett-Glaser

VP8: Clean up some variable shadowing.

Originally committed as revision 24454 to svn://

8a467b2d 07/23/2010 02:58 AM Jason Garrett-Glaser

VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?...

ef38842f 07/23/2010 01:59 AM Jason Garrett-Glaser

VP8: smarter prefetching
Don't prefetch reference frames that were used less than 1/32 of the time so
far in the frame.
This helps speed up to ~2% on videos that, in many frames, make near-zero
(but not entirely zero) use of golden and/or alt-refs.
This is a very common property of videos encoded by libvpx....
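A usage-threshold check of the kind described here might look like the sketch below. The struct, the function names, and the exact comparison are assumptions for illustration only; in particular, treating a never-used reference as "do not prefetch" and counting intra macroblocks toward the total are choices made for this example, not details taken from the commit.

```c
#include <assert.h>

/* Hypothetical reference identifiers; REF_NONE marks an intra macroblock. */
enum { REF_NONE = -1, REF_LAST, REF_GOLDEN, REF_ALTREF, REF_COUNT };

typedef struct {
    unsigned used[REF_COUNT];   /* macroblocks so far that used each ref */
    unsigned mbs_done;          /* macroblocks decoded so far in this frame */
} RefStats;

/* Record one decoded macroblock and the reference it used, if any. */
static void note_mb(RefStats *st, int ref)
{
    assert(ref >= REF_NONE && ref < REF_COUNT);
    if (ref != REF_NONE)
        st->used[ref]++;
    st->mbs_done++;
}

/* Prefetch a reference only if it has been used at all and its share
 * of the macroblocks decoded so far is at least 1/32. */
static int should_prefetch(const RefStats *st, int ref)
{
    assert(ref >= 0 && ref < REF_COUNT);
    return st->used[ref] != 0 && st->used[ref] * 32u >= st->mbs_done;
}
```

The multiplication keeps the comparison in integers; a shift of the running macroblock count would be an equally plausible way to express the same threshold.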

c25c7767 07/23/2010 12:07 AM Jason Garrett-Glaser

VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.

Originally committed as revision 24448 to svn://

b74f70d6 07/23/2010 12:05 AM Jason Garrett-Glaser

VP8: avoid a memset for non-i4x4 blocks with no coefficients

Originally committed as revision 24447 to svn://

145d3186 07/22/2010 11:11 PM Jason Garrett-Glaser

Get rid of more unnecessary dereferences in VP8 deblocking

Originally committed as revision 24446 to svn://

86721533 07/22/2010 11:04 PM Jason Garrett-Glaser

Shut up an uninitialized variable GCC warning in VP8.

Originally committed as revision 24445 to svn://

c4211046 07/22/2010 11:03 PM Jason Garrett-Glaser

Smarter VP8 prefetching
Prefetch all refs (including altref), but only if they've been used so far this frame.
~2.5% faster overall.

TODO: Do something even smarter, like using how often each ref has been used
so far, so that a couple blocks of a rarely-used ref don't force us to prefetch...

8cfae560 07/22/2010 10:15 PM Jason Garrett-Glaser

Fix stupid bug in VP8 prefetching code

Originally committed as revision 24443 to svn://

2a38c2e9 07/22/2010 10:08 PM Jason Garrett-Glaser

Eliminate a LUT in escape decoding in VP8 decode_block_coeffs

Originally committed as revision 24441 to svn://

d292c345 07/22/2010 09:05 PM Jason Garrett-Glaser

Eliminate some repeated dereferences in VP8 inter_predict

Originally committed as revision 24438 to svn://

b946111f 07/22/2010 12:15 PM Jason Garrett-Glaser

Eliminate a pointless memset for intra blocks in P-frames in VP8

Originally committed as revision 24429 to svn://

b9a7186b 07/22/2010 11:55 AM Jason Garrett-Glaser

VP8: Don't store segment in macroblock struct anymore.
Not necessary with the previous patch.

Originally committed as revision 24427 to svn://

c55e0d34 07/22/2010 11:45 AM Jason Garrett-Glaser

Convert VP8 macroblock structures to a ring buffer.
Uses a slightly nonintuitive ring buffer size of (width+height*2) to simplify
addressing logic.
Also split out the segmentation map to a separate structure, necessary to
implement the ring buffer.

Originally committed as revision 24426 to svn://

968570d6 07/22/2010 07:24 AM Jason Garrett-Glaser

Calculate deblock strength per-MB instead of per-row
Gives better cache locality, since the VP8Macroblock structs are still in cache.
Inspired by the way x264 does it.

Originally committed as revision 24417 to svn://

d1c58fce 07/22/2010 07:04 AM Jason Garrett-Glaser

Avoid tracking i4x4 modes in P-frames in VP8
As in the previous commit, they aren't used for context selection, so it saves
memory this way.

Originally committed as revision 24416 to svn://

158e062c 07/22/2010 06:39 AM Jason Garrett-Glaser

Avoid useless fill_rectangle in P-frames in VP8
In VP8, i4x4 only uses contexts based on neighbors in I-frames.

Originally committed as revision 24415 to svn://

7bf254c4 07/22/2010 06:29 AM Jason Garrett-Glaser

Optimize partition mv decoding in VP8

Originally committed as revision 24414 to svn://

c0498b30 07/22/2010 05:49 AM Jason Garrett-Glaser

Take shortcuts for mv0 case in VP8 MC
Avoid edge emulation -- it isn't needed if there isn't any subpel.

Originally committed as revision 24413 to svn://

702e8d33 07/22/2010 04:26 AM Jason Garrett-Glaser

Much faster VP8 mv and mode prediction

Originally committed as revision 24412 to svn://

d864dee8 07/22/2010 03:09 AM Jason Garrett-Glaser

Add prefetching to VP8 decoder
~5% faster overall, probably depends on CPU and resolution.

Originally committed as revision 24410 to svn://

096971e8 07/20/2010 05:54 PM Måns Rullgård

vp8: indent

Originally committed as revision 24368 to svn://

070ce7ef 07/20/2010 05:54 PM Måns Rullgård

vp8: add do { } while(0) around XCHG macro to avoid confusing if/else

This is the correct solution to the warning "fixed" in the previous commit.

Originally committed as revision 24367 to svn://
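The general idiom behind this fix can be shown with a small sketch. The macro and helper below are hypothetical and written only to illustrate the problem; they are not the XCHG macro from vp8.c.

```c
#include <assert.h>

static void swap_int(int *a, int *b)
{
    assert(a && b);
    int t = *a;
    *a = *b;
    *b = t;
}

/* Problematic form (shown only in this comment):
 *
 *     #define SWAP_IF_BAD(c, a, b) if (c) swap_int(&(a), &(b))
 *
 *     if (outer)
 *         SWAP_IF_BAD(c, x, y);
 *     else
 *         fallback();
 *
 * After expansion the 'else' pairs with the 'if' hidden inside the
 * macro, not with 'if (outer)', which is what the compiler's
 * "ambiguous else" warning is about. */

/* Wrapped form: do { } while (0) makes the expansion a single
 * statement that ends at the caller's semicolon, so a following
 * 'else' can only pair with the caller's own 'if'. */
#define SWAP_IF(c, a, b) do { if (c) swap_int(&(a), &(b)); } while (0)

/* Returns 1 if the else branch ran, 0 otherwise. */
static int demo(int outer, int c, int *x, int *y)
{
    int took_else = 0;
    if (outer)
        SWAP_IF(c, *x, *y);
    else
        took_else = 1;
    return took_else;
}
```

Adding braces at each call site also silences the warning, but it leaves the macro unsafe for the next caller; wrapping the macro body fixes it once.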

153da88d 07/20/2010 05:45 PM Diego Biurrun

Add some braces to silence the warning:
libavcodec/vp8.c:892: warning: suggest explicit braces to avoid ambiguous `else'

Originally committed as revision 24366 to svn://

3facfc99 07/19/2010 09:18 PM Ronald S. Bultje

Change function prototypes for width=8 inner and mbedge loopfilter functions
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register....

9ac831c2 07/16/2010 07:20 AM David Conrad

vp8: Save mb border needed for intra prediction so that loop filter can run
immediately after a mb row is decoded

Originally committed as revision 24252 to svn://

b6c420ce 07/16/2010 07:20 AM David Conrad

vp8: Check for malloc failure

Originally committed as revision 24251 to svn://

e394953e 07/08/2010 03:01 PM Ronald S. Bultje

Add missing doxy for function arguments.

Originally committed as revision 24110 to svn://

5245c04d 07/02/2010 09:04 PM David Conrad

VP8: Move calculation of outer filter limit out of dsp functions for normal
filter to match the simple loop filter

Originally committed as revision 24010 to svn://

3fa76268 07/02/2010 11:44 AM Diego Biurrun

Avoid square brackets in Doxygen comments; Doxygen chokes on them.

Originally committed as revision 23979 to svn://

7ed06b2b 06/28/2010 04:04 PM Ronald S. Bultje

Simplify MV parsing, removes laying out 2 or 4 (16x8/8x8/8x16) MVs over all
16 subblocks (since we no longer need that), which should also lead to a
minor speedup.

Originally committed as revision 23854 to svn://

7c4dcf81 06/28/2010 01:50 PM Ronald S. Bultje

Optimize split MC, so we don't always do 4x4 blocks of 4x4pixels each, but
we apply them as 16x8/8x16/8x8 subblocks where possible. Since this allows
us to use width=8/16 instead of width=4 MC functions, we can now take more
advantage of SSE2/SSSE3 optimizations, leading to a total speedup for splitMV...

0ef1dbed 06/27/2010 01:46 AM David Conrad

VP8 bilinear filter

Originally committed as revision 23813 to svn://

92a54426 06/27/2010 12:37 AM Måns Rullgård

vp8: warn and request sample if upscaling specified in header

Originally committed as revision 23809 to svn://

d6f8476b 06/25/2010 06:14 PM Jason Garrett-Glaser

Make VP8 DSP functions take two strides
This isn't useful for the C functions, but will allow re-using H and V functions
for HV functions without adding separate H and V wrappers.

Originally committed as revision 23782 to svn://

03ac56e7 06/25/2010 04:23 AM Jason Garrett-Glaser

fix typo in vp8 decoder error message

Originally committed as revision 23765 to svn://

8f910a56 06/23/2010 09:45 PM Stefan Gehrer

avoid conditional and division in chroma MV calculation

Originally committed as revision 23745 to svn://

3b636f21 06/22/2010 07:24 PM David Conrad

Native VP8 decoder.

Patch by David Conrad <lessen42 gmail com> and myself.

Originally committed as revision 23719 to svn://