ffmpeg / libavcodec / h264_cabac.c @ f14bdd8e

# Date Author Comment
f14bdd8e 01/15/2011 05:52 PM Jason Garrett-Glaser

H.264: Partially inline CABAC residual decoding
Improves CABAC performance by about 1.2%.

Trick originates from x264 and has also been used in ffvp8. It's useful because
coded block flags are usually zero, so it helps to have the early termination
inlined into the main function....
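A minimal sketch of the partial-inlining idea described in this entry, not the code from the commit. The names DemoCtx, decode_residual() and decode_residual_slow() are hypothetical, and the coded block flag is read from a plain array here, whereas a real decoder would decode it from the bitstream. Only the cheap early-exit test sits in the inline wrapper; the large decoding body stays out of line.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for decoder state: one coded block flag per block,
 * plus a counter so the number of slow-path calls can be observed. */
typedef struct {
    const uint8_t *coded_block_flag; /* 1 if the block has coefficients */
    int slow_path_calls;
} DemoCtx;

/* Slow path: kept out of line so its size is not duplicated at every
 * call site. The body is a placeholder for real coefficient decoding. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
static int decode_residual_slow(DemoCtx *c, int16_t *block, int n)
{
    assert(n >= 0);
    c->slow_path_calls++;
    block[0] = (int16_t)(n + 1);
    return 1;
}

/* Fast path: only the early termination is inlined into the caller.
 * When the flag is zero (the common case), no function call happens. */
static inline int decode_residual(DemoCtx *c, int16_t *block, int n)
{
    if (!c->coded_block_flag[n])
        return 0;
    return decode_residual_slow(c, block, n);
}
```

With mostly-zero flags, most calls to decode_residual() return from the inlined test and never pay the call overhead of the out-of-line function.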

2a1f431d 01/15/2011 01:10 AM Jason Garrett-Glaser

H.264/SVQ3: make chroma DC work the same way as luma DC
No speed improvement, but necessary for some future stuff.
Also opens up the possibility of asm chroma dc idct/dequant.

Originally committed as revision 26349 to svn://

5657d140 01/14/2011 09:36 PM Jason Garrett-Glaser

H.264: switch to x264-style tracking of luma/chroma DC NNZ
Useful so that we don't have to run the hierarchical DC iDCT if there aren't
any coefficients. Opens up some future opportunities for optimization as well.

Originally committed as revision 26337 to svn://

19fb234e 01/14/2011 09:34 PM Jason Garrett-Glaser

H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed....

ba87f080 04/20/2010 02:45 PM Diego Biurrun

Remove explicit filename from Doxygen @file commands.

Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.

Originally committed as revision 22921 to svn://

32e543f8 03/30/2010 03:50 PM Benoit Fouet

Replace @returns by @return.

Originally committed as revision 22729 to svn://

767738f7 03/26/2010 05:04 AM Alexander Strange

h264: Use + instead of | in some places

6 fewer instructions on x86-64/gcc 4.2.

Originally committed as revision 22692 to svn://
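The entry does not say which expressions were changed, so the following is only an illustration of why such a swap is safe. When two operands share no set bits, a + b and a | b give the same value, and a compiler may be able to fold the addition into address arithmetic. The helper names and the base/index layout below are hypothetical.

```c
#include <assert.h>

/* Hypothetical example: combine a base that is a multiple of 4 with a
 * 2-bit index. The low two bits of base are clear, so the operands
 * have no bits in common. */
static inline unsigned combine_or(unsigned base, unsigned idx)
{
    return base | idx;
}

static inline unsigned combine_add(unsigned base, unsigned idx)
{
    return base + idx;
}

/* Check every base/index pair in a small range: with disjoint bits
 * the two forms must agree. */
static void check_add_matches_or(void)
{
    for (unsigned base = 0; base < 1024; base += 4)
        for (unsigned idx = 0; idx < 4; idx++)
            assert(combine_or(base, idx) == combine_add(base, idx));
}
```

The equivalence only holds while the bits stay disjoint: combine_add(6, 3) is 9, while combine_or(6, 3) is 7.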

601ca8c5 03/26/2010 03:31 AM Alexander Strange

h264: Remove unused function argument

Originally committed as revision 22690 to svn://

f7ba470d 03/26/2010 03:29 AM Alexander Strange

h264: Simplify decode_cabac_residual() specialization

Gives more consistent inlining with some compilers (such as llvm).

Originally committed as revision 22689 to svn://

8897b247 02/28/2010 11:54 PM Michael Niedermayer

Remove some unneeded fill_rectangle() for 16x16 blocks.

Originally committed as revision 22124 to svn://

821fe7f3 02/26/2010 10:45 PM Zhou Zongyi

Optimize (amvd>2)+(amvd>32), about 1 CPU cycle faster.
patch by Zhou Zongyi @ zhouzy () os punkt pku dot edu speck cn

Originally committed as revision 22084 to svn://

b5bd0700 02/24/2010 08:43 PM Michael Niedermayer

Change mvd_cache & mvd_table to 8 bit; this is overall a bit faster
for high-resolution videos.
About 20 cycles faster per MB for cathedral.

Originally committed as revision 22038 to svn://

81b5e4ee 02/24/2010 06:50 PM Michael Niedermayer

Calculate mvd without abs().
Same speed (ask gcc why, I don't know).

Originally committed as revision 22035 to svn://

855a1ba5 02/24/2010 06:16 PM Michael Niedermayer

Switch back to (amvd>2)+(amvd>32); it's 5 CPU cycles faster now.

Originally committed as revision 22032 to svn://

01b35be1 02/24/2010 06:06 PM Michael Niedermayer

Factorize common code from the top of decode_cabac_mb_mvd()
10-15 cpu cycles faster.

Originally committed as revision 22029 to svn://

6d0155c7 02/24/2010 04:16 PM Michael Niedermayer

Replace mvd>2 + mvd>32 by MIN*17>>9, 2)
Same speed as far as I can measure, but it might have fewer branches on some
Idea from x264 / Jason

Originally committed as revision 22027 to svn://
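The MIN expression in this entry is incomplete as shown, so the sketch below does not claim to reproduce it. It compares the comparison form (amvd>2)+(amvd>32) with one multiply-and-shift form of the same shape; the constants 28, 17 and 9 are an assumption chosen so that the two forms agree, and all function names are hypothetical.

```c
#include <assert.h>

/* Comparison form quoted in the log: yields 0, 1 or 2. */
static inline int mvd_ctx_compare(int amvd)
{
    return (amvd > 2) + (amvd > 32);
}

/* Assumed multiply/shift form. The constants are illustrative:
 * (amvd + 28) * 17 first reaches 512 at amvd = 3 and 1024 at amvd = 33,
 * so the shifted value steps at the same thresholds. */
static inline int mvd_ctx_mulshift(int amvd)
{
    int v = ((amvd + 28) * 17) >> 9;
    return v < 2 ? v : 2; /* MIN(v, 2) */
}

/* Check that both forms agree for every amvd in [0, max_amvd]. */
static void check_mvd_ctx_forms_agree(int max_amvd)
{
    for (int amvd = 0; amvd <= max_amvd; amvd++)
        assert(mvd_ctx_compare(amvd) == mvd_ctx_mulshift(amvd));
}
```

The second form replaces two comparisons with an add, a multiply, a shift and a clamp; whether that is faster depends on the compiler and CPU, which matches the "same speed" observation in the entry.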

90332deb 02/24/2010 01:12 PM Michael Niedermayer

Replace ad-hoc fill rectangle by fill_rectangle().

Originally committed as revision 22025 to svn://

f4ce8531 02/19/2010 03:10 AM Michael Niedermayer

Get rid of an if(); 1 CPU cycle faster.

Originally committed as revision 21889 to svn://

e69bfde6 02/19/2010 02:37 AM Michael Niedermayer

Get rid of a local variable, 10 cpu cycles faster.

Originally committed as revision 21888 to svn://

a305449d 02/18/2010 11:37 PM Michael Niedermayer

Move abs() from decode_cabac_mb_mvd() to the code that writes mvd_cache.
4-8 cycles faster

Originally committed as revision 21887 to svn://

90a5849e 02/18/2010 12:13 PM Michael Niedermayer

Speedup decode_cabac_field_decoding_flag() by 9 cpu cycles.

Originally committed as revision 21875 to svn://

69cc3183 02/17/2010 02:14 AM Michael Niedermayer

Move check for and call of predict_field_decoding_flag() from the mb code to
the row code. This function would only be needed on a MB basis for MBAFF+FMO

Originally committed as revision 21860 to svn://

59f733d1 02/16/2010 11:43 PM Michael Niedermayer

2x faster ff_h264_init_cabac_states(), 4k cpu cycles less.
Sadly this is just per slice so the speedup with normal files should be negligible.

Originally committed as revision 21859 to svn://

37a9719a 02/16/2010 02:51 AM Michael Niedermayer

2 cpu cycles faster context calculation for decode_cabac_intra_mb_type()

Originally committed as revision 21845 to svn://

5806e8cd 02/16/2010 12:09 AM Michael Niedermayer

Drop a few redundant slice_num checks.

Originally committed as revision 21844 to svn://

05307427 02/15/2010 11:04 PM Michael Niedermayer

Drop compute_mb_neighbors() and move fill_decode_neighbors() up to take its
Should be faster as this is a strict code removal.

Originally committed as revision 21843 to svn://

c1bb66ac 02/15/2010 10:07 PM Michael Niedermayer

Split setting neighboring MBs from fill_decode_caches()
no speed change.

Originally committed as revision 21842 to svn://

cf55f59d 02/15/2010 07:22 PM Michael Niedermayer

Simplify decode_cabac_mb_intra4x4_pred_mode().
same speed

Originally committed as revision 21839 to svn://

f4060611 02/15/2010 07:20 PM Michael Niedermayer

Merge decode_cabac_mb_type_b() into calling code.
This avoids a conditional branch and is about 3 CPU cycles faster.

Originally committed as revision 21838 to svn://

64dd1b0a 02/15/2010 01:04 AM Michael Niedermayer

Merge the single line function decode_cabac_mb_transform_size()
into the calling code.
8 cpu cycles faster

Originally committed as revision 21828 to svn://

8b38d107 02/14/2010 11:10 PM Michael Niedermayer


Originally committed as revision 21827 to svn://

f4b8b825 02/14/2010 11:06 PM Michael Niedermayer

Merge decode_cabac_mb_dqp() with surrounding code.
~20 cpu cycles faster

Originally committed as revision 21826 to svn://

a59b9ee3 02/14/2010 04:51 PM Michael Niedermayer

Set sub_mb_type in direct_cache instead of just the direct flag.
Simpler, cleaner and faster.

Originally committed as revision 21822 to svn://

2dc380ca 02/14/2010 02:41 PM Michael Niedermayer

Store sub_mb_type in direct_cache/direct_table.
This is equal complexity but could be more useful.

Originally committed as revision 21821 to svn://

3d2c3ef4 02/14/2010 02:08 AM Michael Niedermayer

Remove slice_table checks from decode_cabac_mb_cbp_luma() and set left/top_cbp so
these checks aren't needed.

Originally committed as revision 21819 to svn://

27739206 01/25/2010 02:44 AM Michael Niedermayer

Optimize decode_cabac_field_decoding_flag().
~4 cpu cycles faster

Originally committed as revision 21447 to svn://

c6727809 01/22/2010 03:25 AM Måns Rullgård

Move array specifiers outside DECLARE_ALIGNED() invocations

Originally committed as revision 21377 to svn://

7231ccf4 01/18/2010 11:55 PM Michael Niedermayer

Cosmetic, get rid of &x0

Originally committed as revision 21309 to svn://

f432b43b 01/17/2010 09:43 PM Michael Niedermayer

Split fill_caches() between filter and decoder.

Originally committed as revision 21271 to svn://

c988f975 01/17/2010 08:35 PM Michael Niedermayer

Rearchitecturing the stitched-up goose, part 1
Run loop filter per row instead of per MB; this should also make it
much easier to switch to per-frame filtering, and to do so in a
separate thread in the future if some volunteer wants to try.
Overall decoding speedup of 1.7% (single thread on pentium dual / cathedral sample)...

ddd60f28 01/16/2010 05:41 AM Michael Niedermayer

Replace cabac checks in inline functions from h264.h with constants.
No benchmark because it's just replacing variables with literal constants
(so no risk of slowdown outside gcc silliness) and I need sleep.

Originally committed as revision 21237 to svn://

cc51b282 01/13/2010 02:35 AM Michael Niedermayer

Split cabac decoding code out of h264.c.
not slower according to benchmarks.

Originally committed as revision 21181 to svn://