ffmpeg / doc / optimization.txt @ 0a46c933

Optimization Tips (for libavcodec):

What to optimize:
If you plan to do non-x86 architecture-specific optimizations (SIMD normally),
then take a look in the i386/ directory, as most important functions are
already optimized for MMX.

If you want to do x86 optimizations, then you can either try to fine-tune the
stuff in the i386/ directory or find some other functions in the C source to
optimize, but there aren't many left.

Understanding these overoptimized functions:
As many functions tend to be a bit difficult to understand because
of optimizations, it can be hard to optimize them further or to write
architecture-specific versions. It is recommended to look at older
CVS versions of the interesting files (just use ViewCVS at
http://www1.mplayerhq.hu/cgi-bin/cvsweb.cgi/ffmpeg/?cvsroot=FFMpeg).
Alternatively, look into the other architecture-specific versions in
the i386/, ppc/, alpha/ subdirectories. Even if you don't exactly
comprehend the instructions, it could help you understand the functions
and how they can be optimized.

NOTE: If you still don't understand some function, ask on our mailing list!!!
(http://www1.mplayerhq.hu/mailman/listinfo/ffmpeg-devel)


WTF is that function good for ....:
The primary purpose of this list is to avoid wasting time optimizing functions
that are rarely used.

put(_no_rnd)_pixels{,_x2,_y2,_xy2}
    Used in motion compensation (en/decoding).

avg_pixels{,_x2,_y2,_xy2}
    Used in motion compensation of B-frames.
    These are less important than the put*pixels functions.

avg_no_rnd_pixels*
    unused
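As a rough illustration of what the put/avg and rnd/no_rnd variants above
compute, here is a minimal C sketch. The sketch_* names, the fixed 8-pixel
width and the signatures are assumptions made for this example; they are not
the actual dsputil prototypes. Only the full-pel and the horizontal half-pel
(_x2) cases are shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: the real dsputil functions differ in signature.
 * All blocks here are 8 pixels wide and h rows high. */

/* put: overwrite dst with the reference block (full-pel position). */
static void sketch_put_pixels8(uint8_t *dst, const uint8_t *src,
                               int line_size, int h)
{
    assert(h > 0 && line_size >= 8);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++)
            dst[y * line_size + x] = src[y * line_size + x];
}

/* put, _x2: horizontal half-pel position, rounded: (a + b + 1) >> 1.
 * Reads 9 source pixels per row. */
static void sketch_put_pixels8_x2(uint8_t *dst, const uint8_t *src,
                                  int line_size, int h)
{
    assert(h > 0 && line_size >= 9);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++) {
            const uint8_t *p = src + y * line_size + x;
            dst[y * line_size + x] = (uint8_t)((p[0] + p[1] + 1) >> 1);
        }
}

/* put_no_rnd, _x2: same position, but without the rounding term. */
static void sketch_put_no_rnd_pixels8_x2(uint8_t *dst, const uint8_t *src,
                                         int line_size, int h)
{
    assert(h > 0 && line_size >= 9);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++) {
            const uint8_t *p = src + y * line_size + x;
            dst[y * line_size + x] = (uint8_t)((p[0] + p[1]) >> 1);
        }
}

/* avg: average the new prediction with what is already in dst, which is
 * how two predictions get combined for B-frames. */
static void sketch_avg_pixels8(uint8_t *dst, const uint8_t *src,
                               int line_size, int h)
{
    assert(h > 0 && line_size >= 8);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 8; x++) {
            uint8_t *d = dst + y * line_size + x;
            *d = (uint8_t)((*d + src[y * line_size + x] + 1) >> 1);
        }
}
```

The inner loops touch 8 adjacent bytes per row, which is why these functions
map so well onto 64-bit SIMD registers.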

pix_abs16x16{,_x2,_y2,_xy2}
    Used in motion estimation (encoding) with SAD.

pix_abs8x8{,_x2,_y2,_xy2}
    Used in motion estimation (encoding) with SAD of MPEG-4 4MV only.
    These are less important than the pix_abs16x16* functions.
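For readers unfamiliar with SAD (sum of absolute differences), a plain C
sketch of the full-pel case might look like the following. The sketch_* names
and signatures are hypothetical and chosen for this example only; the real
functions may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a w x h block.
 * Both blocks share the same line_size (bytes per row). */
static int sketch_sad(const uint8_t *cur, const uint8_t *ref,
                      int line_size, int w, int h)
{
    int sum = 0;
    assert(w > 0 && h > 0 && line_size >= w);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += abs(cur[y * line_size + x] - ref[y * line_size + x]);
    return sum;
}

/* Hypothetical wrappers mirroring the two block sizes named above. */
static int sketch_pix_abs16x16(const uint8_t *cur, const uint8_t *ref,
                               int line_size)
{
    return sketch_sad(cur, ref, line_size, 16, 16);
}

static int sketch_pix_abs8x8(const uint8_t *cur, const uint8_t *ref,
                             int line_size)
{
    return sketch_sad(cur, ref, line_size, 8, 8);
}
```

Motion estimation calls this once per candidate vector, so the 16x16 case
dominates encoding time far more than the 4MV-only 8x8 case.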

put_mspel8_mc* / wmv2_mspel8*
    Used only in WMV2.
    It is not recommended that you waste your time with these, as WMV2
    is an ugly and relatively useless codec.

mpeg4_qpel* / *qpel_mc*
    Used in MPEG-4 qpel motion compensation (encoding & decoding).
    The qpel8 functions are used only for 4MV,
    the avg_* functions are used only for B-frames.
    Optimizing them should have a significant impact on qpel
    encoding & decoding.

qpel{8,16}_mc??_old_c / *pixels{8,16}_l4
    Just used to work around a bug in an old libavcodec encoder version.
    Don't optimize them.

tpel_mc_func {put,avg}_tpel_pixels_tab
    Used only for SVQ3, so only optimize them if you need fast SVQ3 decoding.

add_bytes/diff_bytes
    For huffyuv only, optimize if you want a faster ffhuffyuv codec.

get_pixels / diff_pixels
    Used for encoding, easy.
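To show why get_pixels / diff_pixels count as easy, here is a hedged C sketch
of what such functions typically do for one 8x8 block. The sketch_* names,
the parameter lists and the 16-bit DCTELEM typedef are assumptions for this
example, not the definitions from dsputil.h.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t DCTELEM;  /* assumption: 16-bit signed coefficient type */

/* Copy an 8x8 block of pixels into a contiguous 64-entry coefficient
 * block, widening each pixel to DCTELEM. */
static void sketch_get_pixels(DCTELEM *block, const uint8_t *pixels,
                              int line_size)
{
    assert(line_size >= 8);
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = pixels[y * line_size + x];
}

/* Store the per-pixel difference s1 - s2 of two 8x8 blocks, e.g. the
 * source block minus its motion-compensated prediction. */
static void sketch_diff_pixels(DCTELEM *block, const uint8_t *s1,
                               const uint8_t *s2, int line_size)
{
    assert(line_size >= 8);
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = (DCTELEM)(s1[y * line_size + x] -
                                         s2[y * line_size + x]);
}
```

Both are simple widen-and-store loops with no data dependencies between
pixels, which makes them straightforward SIMD candidates.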

clear_blocks
    Easiest to optimize.

gmc
    Used for MPEG-4 gmc.
    Optimizing this should have a significant effect on the gmc decoding
    speed, but it's very likely impossible to write in SIMD.

gmc1
    Used for chroma blocks in MPEG-4 gmc with 1 warp point
    (there are 4 luma & 2 chroma blocks per macroblock, so
    only 1/3 of the gmc blocks use this, the other 2/3
    use the normal put_pixel* code, but only if there is
    just 1 warp point).
    Note: DivX5 gmc always uses just 1 warp point.

pix_sum
    Used for encoding.

hadamard8_diff / sse / sad == pix_norm1 / dct_sad / quant_psnr / rd / bit
    Specific compare functions used in encoding; which of these are used
    depends on the command line switches.
    Don't waste your time with dct_sad & quant_psnr, they aren't
    really useful.

put_pixels_clamped / add_pixels_clamped
    Used for en/decoding in the IDCT, easy.
    Note: some optimized IDCTs have the add/put clamped code included, and
    then put_pixels_clamped / add_pixels_clamped will be unused.
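The clamping step after the IDCT can be sketched in plain C as below. This is
an illustration under assumptions: the sketch_* names are made up, DCTELEM is
assumed to be a 16-bit signed type, and the block is assumed to hold 64
contiguous coefficients.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t DCTELEM;  /* assumption: 16-bit signed coefficient type */

/* Saturate an integer to the 0..255 pixel range. */
static uint8_t sketch_clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* put: store the clamped IDCT output as pixels (intra blocks). */
static void sketch_put_pixels_clamped(const DCTELEM *block, uint8_t *pixels,
                                      int line_size)
{
    assert(line_size >= 8);
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            pixels[y * line_size + x] = sketch_clamp_u8(block[y * 8 + x]);
}

/* add: add the IDCT output (a residual) to the existing prediction in
 * pixels and clamp the sum (inter blocks). */
static void sketch_add_pixels_clamped(const DCTELEM *block, uint8_t *pixels,
                                      int line_size)
{
    assert(line_size >= 8);
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            uint8_t *p = pixels + y * line_size + x;
            *p = sketch_clamp_u8(*p + block[y * 8 + x]);
        }
}
```

On x86 the clamp corresponds to a saturating pack instruction, which is one
reason these functions are considered easy.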

idct/fdct
    idct (encoding & decoding)
    fdct (encoding)
    difficult to optimize

dct_quantize_trellis
    Used for encoding with trellis quantization.
    difficult to optimize

dct_quantize
    Used for encoding.

dct_unquantize_mpeg1
    Used in MPEG-1 en/decoding.

dct_unquantize_mpeg2
    Used in MPEG-2 en/decoding.

dct_unquantize_h263
    Used in MPEG-4/H.263 en/decoding.

FIXME remaining functions?
BTW, most of these functions are in dsputil.c/.h, some are in mpegvideo.c/.h.


Alignment:
Some instructions on some architectures have strict alignment restrictions,
for example most SSE/SSE2 instructions on x86.
The minimum guaranteed alignment is written in the .h files, for example:
    void (*put_pixels_clamped)(const DCTELEM *block/*align 16*/, UINT8 *pixels/*align 8*/, int line_size);
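When writing or testing an optimized version, it can help to check these
guarantees explicitly. The following is a small C11 sketch, not code from
libavcodec: the sketch_* names are hypothetical, and DCTELEM is assumed to be
a 16-bit signed type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

typedef int16_t DCTELEM;  /* assumption: 16-bit signed coefficient type */

/* Returns nonzero if p is aligned to n bytes; n must be a power of two. */
static int sketch_is_aligned(const void *p, uintptr_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Buffers declared with the alignment the prototype comments promise. */
static alignas(16) DCTELEM sketch_block[64];
static alignas(8) uint8_t sketch_pixels[8 * 8];

/* Check the "align 16" / "align 8" contract of the example prototype
 * before handing the pointers to a SIMD implementation. */
static int sketch_args_ok(const DCTELEM *block, const uint8_t *pixels)
{
    return sketch_is_aligned(block, 16) && sketch_is_aligned(pixels, 8);
}
```

A debug build of a new SIMD routine could assert sketch_args_ok() on entry, so
that a violated guarantee shows up as a clear failure rather than as a fault
inside an aligned load.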


Links:
http://www.aggregate.org/MAGIC/

x86-specific:
http://developer.intel.com/design/pentium4/manuals/248966.htm

The IA-32 Intel Architecture Software Developer's Manual, Volume 2:
Instruction Set Reference
http://developer.intel.com/design/pentium4/manuals/245471.htm

http://www.agner.org/assem/

AMD Athlon Processor x86 Code Optimization Guide:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf

GCC asm links:
official doc but quite ugly
http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

a bit old (note: "+" is valid for input-output operands, even though this
document says otherwise)
http://www.cs.virginia.edu/~clc5q/gcc-inline-asm.pdf
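
As a small illustration of the "+" constraint mentioned above, here is a
hedged sketch of GCC extended asm with a read-write operand. The function
name sketch_add is made up for this example; the asm path is x86-only and a
plain C fallback is used elsewhere.

```c
/* Adds x to acc. On x86 with a GCC-compatible compiler this uses inline
 * asm; "+r"(acc) marks acc as an operand that is both read and written. */
static int sketch_add(int acc, int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0"   /* AT&T syntax: %0 += %1 */
             : "+r"(acc)     /* input-output operand */
             : "r"(x)        /* input-only operand */
             : "cc");        /* the flags register is modified */
#else
    acc += x;                /* portable fallback */
#endif
    return acc;
}
```

Without "+", the same effect needs a separate output operand plus a matching
input constraint such as "0", which is more verbose and easier to get wrong.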