ffmpeg / libavcodec / h264_loopfilter.c @ 84dc2d8a

# Date Author Comment
84dc2d8a 03/06/2010 02:24 PM Måns Rullgård

Remove DECLARE_ALIGNED_{8,16} macros

These macros are redundant. All uses are replaced with the generic
DECLARE_ALIGNED macro instead.

Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk

19769ece 02/18/2010 04:24 PM Måns Rullgård

H264: use alias-safe macros

This eliminates all aliasing violation warnings in h264 code.
No measurable speed difference with gcc-4.4.3 on i7.

Originally committed as revision 21881 to svn://svn.ffmpeg.org/ffmpeg/trunk

40d11227 02/17/2010 08:36 PM Måns Rullgård

Use LOCAL_ALIGNED macro for local arrays

Originally committed as revision 21866 to svn://svn.ffmpeg.org/ffmpeg/trunk

78998bf2 02/13/2010 09:09 PM Alexander Strange

h264: Remove unused variables.

Originally committed as revision 21815 to svn://svn.ffmpeg.org/ffmpeg/trunk

9873ae0d 02/07/2010 02:00 AM Michael Niedermayer

Fix CAVLC+8x8DCT+MBAFF loopfiltering.
Fixes issue1250

Originally committed as revision 21665 to svn://svn.ffmpeg.org/ffmpeg/trunk

37b2b0d6 01/31/2010 02:05 AM Michael Niedermayer

Get rid of a check in one direction that can't be true in that part
of the code.
No measurable speed change.

Originally committed as revision 21566 to svn://svn.ffmpeg.org/ffmpeg/trunk

26468148 01/30/2010 08:07 PM Michael Niedermayer

Split first reference list comparison from mv comparison.
about 0.5% faster MBAFF loop filtering

Originally committed as revision 21552 to svn://svn.ffmpeg.org/ffmpeg/trunk

4e992796 01/30/2010 02:33 PM Michael Niedermayer

Replace h->left_type0 by the local variable for it we have.
No measurable speed effect.

Originally committed as revision 21541 to svn://svn.ffmpeg.org/ffmpeg/trunk

012dbcce 01/30/2010 02:10 PM Michael Niedermayer

slightly faster bit trickery.

Originally committed as revision 21540 to svn://svn.ffmpeg.org/ffmpeg/trunk

77821e11 01/30/2010 01:40 PM Michael Niedermayer

Replace ?: by branchless code.
about 0.5% faster loop filtering

Originally committed as revision 21539 to svn://svn.ffmpeg.org/ffmpeg/trunk
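
As a rough illustration of what replacing `?:` with branchless code can look like, here is a minimal sketch in C. It is not the code from this commit; the function names `select_branchy` and `select_branchless` are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Branchy form: the compiler may turn this into a conditional jump. */
static uint32_t select_branchy(int cond, uint32_t a, uint32_t b)
{
    return cond ? a : b;
}

/* Branchless form: derive an all-ones or all-zeros mask from cond,
 * then use it to pick a or b with pure bit operations. */
static uint32_t select_branchless(int cond, uint32_t a, uint32_t b)
{
    uint32_t mask = -(uint32_t)(cond != 0); /* 0xFFFFFFFF if cond, else 0 */
    return b ^ ((a ^ b) & mask);
}
```

Whether the branchless form is actually faster depends on the compiler and CPU, so it should be benchmarked, as the commit message did.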

34032e26 01/28/2010 07:44 PM Michael Niedermayer

factorize first filter call out, this makes the code somewhat
smaller without any speed loss.

Originally committed as revision 21514 to svn://svn.ffmpeg.org/ffmpeg/trunk

592e03a8 01/28/2010 11:37 AM Michael Niedermayer

Change wrapper functions to always inline, they are faster that way now.
1% faster MBAFF decoding overall, maybe ~0.1% faster for the cathedral sample.

Originally committed as revision 21507 to svn://svn.ffmpeg.org/ffmpeg/trunk

5364db28 01/28/2010 11:18 AM Michael Niedermayer

indent

Originally committed as revision 21506 to svn://svn.ffmpeg.org/ffmpeg/trunk

2cf0d46d 01/28/2010 11:12 AM Michael Niedermayer

Restructure check_mv()
~20 cpu cycles faster loopfilter

Originally committed as revision 21505 to svn://svn.ffmpeg.org/ffmpeg/trunk

fabd704b 01/28/2010 10:38 AM Michael Niedermayer

Restructure if() in check_mv()
quite a bit faster

Originally committed as revision 21504 to svn://svn.ffmpeg.org/ffmpeg/trunk

ca7c784f 01/28/2010 10:34 AM Michael Niedermayer

Unroll loops in check_mv()
~6% faster (slow path) loopfilter (should be ~2% overall)

Originally committed as revision 21503 to svn://svn.ffmpeg.org/ffmpeg/trunk

e814817b 01/28/2010 10:10 AM Michael Niedermayer

Factor mv/ref compare code out.
This is a hair slower (0.15% maybe) but I really don't want to have the
identical code duplicated 3 times because gcc adds odd threaded jumps with
register reshuffling and register save/restore.

Originally committed as revision 21502 to svn://svn.ffmpeg.org/ffmpeg/trunk

3b849245 01/28/2010 02:41 AM Michael Niedermayer

Simplify first edge filter condition.

Originally committed as revision 21497 to svn://svn.ffmpeg.org/ffmpeg/trunk

b6302d0c 01/28/2010 02:20 AM Michael Niedermayer

Cosmetics, mostly indentation, 2 or so new FIXME comments that I was too lazy
to split out

Originally committed as revision 21496 to svn://svn.ffmpeg.org/ffmpeg/trunk

0a32508d 01/28/2010 02:15 AM Michael Niedermayer

Make the fast loop filter path work with unavailable left MBs.
This prevents the issue with having to switch between slow and
fast code paths in each row.
0.5% faster loopfilter for cathedral

Originally committed as revision 21495 to svn://svn.ffmpeg.org/ffmpeg/trunk

b3047673 01/28/2010 01:31 AM Michael Niedermayer

get rid of the start variable.
a few cycles faster

Originally committed as revision 21494 to svn://svn.ffmpeg.org/ffmpeg/trunk

980bcc55 01/28/2010 01:24 AM Michael Niedermayer

Unroll main loop so the edge==0 case is separate.
This allows many things to be simplified away.
h264 decoder is overall 1% faster with a mbaff sample and
0.1% slower with the cathedral sample, probably because the slow loop
filter code must be loaded into the code cache for each first MB of each...

8670f84c 01/27/2010 01:18 PM Michael Niedermayer

Update comment.

Originally committed as revision 21479 to svn://svn.ffmpeg.org/ffmpeg/trunk

e470ef76 01/27/2010 11:14 AM Michael Niedermayer

Use table to speed up access to non_zero_count in MBAFF with differing interlacing.
~4 cpu cycles speedup

Originally committed as revision 21474 to svn://svn.ffmpeg.org/ffmpeg/trunk

16e5e39a 01/26/2010 10:59 PM Michael Niedermayer

Optimize loop filtering of the left edge in MBAFF.
60 cpu cycles speedup

Originally committed as revision 21467 to svn://svn.ffmpeg.org/ffmpeg/trunk

6548c939 01/26/2010 03:34 PM Michael Niedermayer

remove unneeded check

Originally committed as revision 21460 to svn://svn.ffmpeg.org/ffmpeg/trunk

18ea2f93 01/26/2010 02:57 PM Michael Niedermayer

Use left_mb_xy from fill_caches instead of recalculating it.

Originally committed as revision 21459 to svn://svn.ffmpeg.org/ffmpeg/trunk

d5c30c86 01/26/2010 01:39 PM Michael Niedermayer

Simplify loop filter a little by using top/left_type.

Originally committed as revision 21457 to svn://svn.ffmpeg.org/ffmpeg/trunk

50eb40a7 01/24/2010 01:20 PM Michael Niedermayer

Remove all uses of slice_type* from the loop filter, also remove its
initialization before the loop filter.

Originally committed as revision 21416 to svn://svn.ffmpeg.org/ffmpeg/trunk

0c32e19d 01/23/2010 06:05 PM Michael Niedermayer

Move +52 from the loop filter to the alpha/beta offsets in the context.
This should fix a segfault, also it might be faster on systems where the
+52 wasn't free.

Originally committed as revision 21406 to svn://svn.ffmpeg.org/ffmpeg/trunk

1cc2d211 01/23/2010 03:28 PM Michael Niedermayer

Set edges based on cbp and mv partitioning, not just skipped MBs.
This is faster for videos that have lots of MBs that fall in this category.

Originally committed as revision 21400 to svn://svn.ffmpeg.org/ffmpeg/trunk

6b3661b2 01/23/2010 02:50 PM Michael Niedermayer

Optimize filter_mb_mbaff_edge*()

Originally committed as revision 21397 to svn://svn.ffmpeg.org/ffmpeg/trunk

933bea77 01/23/2010 01:54 PM Michael Niedermayer

Optimize 8x8dct check used to skip some borders in the loop filter.
4 cpu cycles faster.

Originally committed as revision 21396 to svn://svn.ffmpeg.org/ffmpeg/trunk

c6727809 01/22/2010 03:25 AM Måns Rullgård

Move array specifiers outside DECLARE_ALIGNED() invocations

Originally committed as revision 21377 to svn://svn.ffmpeg.org/ffmpeg/trunk

258b60c2 01/22/2010 01:59 AM Michael Niedermayer

Gcc idiocy fixes related to filter_mb_edge*.
Change order of operands as gcc uses a hardcoded register per operand it seems
even for static functions
thus reducing unneeded moves (now functions try to pass the same argument in
the same spot).
Change signed int to unsigned int for array indexes as signed requires signed...

31f6e3c1 01/21/2010 04:50 PM Michael Niedermayer

Make calculation of mask_edge free of branches, faster of course but probably
little effect overall as this is not that often executed.

Originally committed as revision 21366 to svn://svn.ffmpeg.org/ffmpeg/trunk

bec358d6 01/20/2010 03:28 AM Alexander Strange

H.264: Declare bS with DECLARE_ALIGNED_8 for uint64_t casts.

Originally committed as revision 21345 to svn://svn.ffmpeg.org/ffmpeg/trunk

97775235 01/20/2010 03:00 AM Michael Niedermayer

Simplify/Optimize another of the mbaff loop filter cases.
It's faster but too rarely used to make a difference.

Originally committed as revision 21344 to svn://svn.ffmpeg.org/ffmpeg/trunk

085d9d98 01/20/2010 01:49 AM Michael Niedermayer

Only calculate the second chroma qp if it differs from the first in the main
loop filter. (a little faster for the common case where they are equal)

Originally committed as revision 21342 to svn://svn.ffmpeg.org/ffmpeg/trunk

948180e7 01/20/2010 01:38 AM Michael Niedermayer

Set bS with 64bits at a time.

Originally committed as revision 21341 to svn://svn.ffmpeg.org/ffmpeg/trunk
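
To illustrate the idea of setting boundary-strength values 64 bits at a time, here is a minimal sketch in C. It assumes a group of four 16-bit bS entries, which is an assumption made for this example and not taken from the commit itself; the helper names `set_bs_scalar` and `set_bs_64` are hypothetical. The store goes through `memcpy` so it does not depend on alignment or violate aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Reference version: four separate 16-bit stores. */
static void set_bs_scalar(int16_t bS[4], int16_t v)
{
    for (int i = 0; i < 4; i++)
        bS[i] = v;
}

/* 64-bit version: replicate v into all four 16-bit lanes of one word
 * and write it with a single 8-byte copy. Because every lane holds
 * the same value, the result does not depend on byte order. */
static void set_bs_64(int16_t bS[4], int16_t v)
{
    uint64_t lane = (uint16_t)v;
    uint64_t word = lane * 0x0001000100010001ULL;
    memcpy(bS, &word, sizeof(word));
}
```

Compilers typically lower the fixed-size `memcpy` to a single 64-bit store, but that is worth confirming in the generated assembly.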

87df989e 01/20/2010 01:15 AM Michael Niedermayer

Merge multiple IS_* macro uses where possible.

Originally committed as revision 21340 to svn://svn.ffmpeg.org/ffmpeg/trunk

55c54371 01/20/2010 12:44 AM Michael Niedermayer

Simplify and optimize intra code in h264_loopfilter.c

Originally committed as revision 21339 to svn://svn.ffmpeg.org/ffmpeg/trunk

9528ce7b 01/20/2010 12:17 AM Michael Niedermayer

Slightly simplify initialization of int start.
No real speed change.

Originally committed as revision 21336 to svn://svn.ffmpeg.org/ffmpeg/trunk

655a1d57 01/19/2010 04:43 PM Michael Niedermayer

Reenable ff_h264_filter_mb_fast() for all slices it supported before.

Originally committed as revision 21328 to svn://svn.ffmpeg.org/ffmpeg/trunk

2b3649f6 01/18/2010 11:41 PM Michael Niedermayer

Fix compilation with -O0.

Originally committed as revision 21308 to svn://svn.ffmpeg.org/ffmpeg/trunk

bffe82f5 01/18/2010 09:22 PM Michael Niedermayer

Rather call filter_mb_mbaff_edge*v() more often than do extra calculations
in the innermost loop. ~150 cpu cycles faster

Originally committed as revision 21299 to svn://svn.ffmpeg.org/ffmpeg/trunk

0fe674cb 01/18/2010 08:13 PM Michael Niedermayer

Use h->slice_num where possible.

Originally committed as revision 21292 to svn://svn.ffmpeg.org/ffmpeg/trunk

bce6a1e7 01/18/2010 07:45 PM Michael Niedermayer

Enable filter_mb_fast for CAVLC P slices.

Originally committed as revision 21291 to svn://svn.ffmpeg.org/ffmpeg/trunk

42ebca85 01/18/2010 04:29 PM Michael Niedermayer

PAFF CABAC P slices seem to work as well, so enable them for ff_h264_filter_mb_fast() too.

Originally committed as revision 21289 to svn://svn.ffmpeg.org/ffmpeg/trunk

a8f49215 01/18/2010 04:16 PM Michael Niedermayer

Reenable filter_mb_fast for I slices and progressive CABAC P slices.

Originally committed as revision 21288 to svn://svn.ffmpeg.org/ffmpeg/trunk

b6ef858e 01/18/2010 01:09 PM Michael Niedermayer

Move CAVLC 8x8 DCT special case from ff_h264_filter_mb() to fill_caches
that way it is also available for ff_h264_filter_mb_fast().

Originally committed as revision 21283 to svn://svn.ffmpeg.org/ffmpeg/trunk

6d7e6b26 01/18/2010 05:15 AM Michael Niedermayer

Perform reference remapping at fill_cache() time instead of in the
loop filter. This removes one obstacle of getting ff_h264_filter_mb_fast()
bitexact. code is maybe 0.1% faster

Originally committed as revision 21280 to svn://svn.ffmpeg.org/ffmpeg/trunk

44a5e7b6 01/18/2010 12:20 AM Michael Niedermayer

Move the qp check to skip the loop filter up.

Originally committed as revision 21274 to svn://svn.ffmpeg.org/ffmpeg/trunk

b6303e6d 01/17/2010 11:44 PM Michael Niedermayer

Reorganize how values are stored in h->non_zero_count.
~1% faster

Originally committed as revision 21273 to svn://svn.ffmpeg.org/ffmpeg/trunk

c988f975 01/17/2010 08:35 PM Michael Niedermayer

Rearchitecting the stitched up goose part 1
Run loop filter per row instead of per MB, this also should make it
much easier to switch to per frame filtering and also doing so in a
separate thread in the future if some volunteer wants to try.
Overall decoding speedup of 1.7% (single thread on pentium dual / cathedral sample)...

7931bb2a 01/16/2010 05:41 PM Michael Niedermayer

Comment for() ; out
~200 bytes smaller ff_h264_filter_mb()
please everyone, NEVER add code with the assumption that gcc will remove it
without checking gcc actually does. Chances are it does not.

Originally committed as revision 21251 to svn://svn.ffmpeg.org/ffmpeg/trunk

ed3d7e2f 01/16/2010 05:27 PM Michael Niedermayer

Mark a few functions as noinline, this makes ff_h264_filter_mb() a bit smaller
and 5% faster.
ff_h264_filter_mb_fast() stay the same size as gcc decided not to inline these
functions there in the first place.

Originally committed as revision 21250 to svn://svn.ffmpeg.org/ffmpeg/trunk

183a86c9 01/16/2010 04:21 PM Michael Niedermayer

Apply last 2 optimizations to similar code I forgot.

Originally committed as revision 21249 to svn://svn.ffmpeg.org/ffmpeg/trunk

3f55a651 01/16/2010 04:14 PM Michael Niedermayer

Another microopt, 4 cpu cycles for avoidance of FFABS.

Originally committed as revision 21248 to svn://svn.ffmpeg.org/ffmpeg/trunk

26147d36 01/16/2010 03:19 PM Michael Niedermayer

Minor (2 cpu cycles) optimization ||->|.

Originally committed as revision 21246 to svn://svn.ffmpeg.org/ffmpeg/trunk
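
The `||` to `|` change can be sketched as follows. This is a generic illustration in C, not the code touched by the commit, and the function names `any_nonzero_logical` and `any_nonzero_bitwise` are hypothetical. The rewrite is only safe when every operand is cheap and free of side effects, because `|` always evaluates all of them.

```c
/* Short-circuit form: each || may become a separate conditional jump. */
static int any_nonzero_logical(int a, int b, int c)
{
    return a || b || c;
}

/* Bitwise form: combine all operands first, then test once. */
static int any_nonzero_bitwise(int a, int b, int c)
{
    return (a | b | c) != 0;
}
```

Both functions return the same result for all inputs; only the evaluation strategy differs.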

2e36c931 01/16/2010 11:55 AM Michael Niedermayer

Avoid wasting 4 cpu cycles per MB in redundantly calculating qp_thresh.

Originally committed as revision 21243 to svn://svn.ffmpeg.org/ffmpeg/trunk

082cf971 01/12/2010 06:01 AM Michael Niedermayer

Split h264 loop filter off h264.c.
No measurable speed difference on pentium dual & cathedral sample.

Originally committed as revision 21159 to svn://svn.ffmpeg.org/ffmpeg/trunk