ffmpeg / doc / optimization.txt @ 8c55915b

1 a552591f Michael Niedermayer
Optimization tips (for libavcodec):

What to optimize:
If you plan to do non-x86 architecture-specific optimizations (normally SIMD), then
take a look in the i386/ directory, as most of the important functions are already
optimized for MMX.

If you want to do x86 optimizations, then you can either try to fine-tune the code in
the i386/ directory or find some other functions in the C source to optimize, but
there aren't many left.

Understanding these overoptimized functions:
Many functions, including the C ones, are currently hard to read because of
optimizations, which makes it difficult to understand them (and to write
architecture-specific versions, or to optimize the C functions further). It is
recommended to look at older CVS versions of the files you are interested in
(just use CVSWEB at
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ffmpeg/ffmpeg/libavcodec/),
or perhaps to look into the other architecture-specific versions in i386/, ppc/,
alpha/, ...; even if you don't understand the instructions exactly, it could help
you understand the functions and how they can be optimized.

NOTE: If you still don't understand some function, then ask on our mailing list!
(http://lists.sourceforge.net/lists/listinfo/ffmpeg-devel)


wtf is that function good for ....:
The primary purpose of this list is to avoid wasting time optimizing functions
that are rarely used.

put(_no_rnd)_pixels{,_x2,_y2,_xy2}
	used in motion compensation (en/decoding)

avg_pixels{,_x2,_y2,_xy2}
	used in motion compensation of B-frames
	these are less important than the put*pixels functions

avg_no_rnd_pixels*
	unused

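To give a rough idea of what the put/avg pixel functions above do, here is a
minimal C sketch of the horizontal half-pel (_x2) case. The function names, the
fixed width of 8 and the rnd parameter are assumptions made for illustration;
they are not the actual dsputil code or signatures.

```c
#include <stdint.h>

/* Hypothetical sketch: write an 8-pixel-wide block where each output pixel is
 * the average of a source pixel and its right neighbour (the "_x2" case).
 * rnd = 1 rounds halves up, rnd = 0 corresponds to the "no_rnd" variant.
 * Note that 9 source pixels are read per row. */
void put_pixels8_x2_sketch(uint8_t *dst, const uint8_t *src,
                           int line_size, int h, int rnd)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + rnd) >> 1);
        dst += line_size;
        src += line_size;
    }
}

/* Hypothetical sketch of the avg_ variant: the interpolated value is averaged
 * with what is already in dst (as needed for B-frames). */
void avg_pixels8_x2_sketch(uint8_t *dst, const uint8_t *src,
                           int line_size, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int p = (src[x] + src[x + 1] + 1) >> 1;
            dst[x] = (uint8_t)((dst[x] + p + 1) >> 1);
        }
        dst += line_size;
        src += line_size;
    }
}
```

The inner loops work on independent bytes, which is why these functions map
well onto SIMD instructions.
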
pix_abs16x16{,_x2,_y2,_xy2}
	used in motion estimation (encoding) with SAD

pix_abs8x8{,_x2,_y2,_xy2}
	used in motion estimation (encoding) with SAD, for MPEG4 4MV only
	these are less important than the pix_abs16x16* functions

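SAD here means the sum of absolute differences between two blocks. A minimal C
sketch of the plain 16x16 case could look like this; the function name and
signature are assumptions for illustration, not the real dsputil prototype.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: sum of absolute differences over a 16x16 block.
 * cur and ref point to the top-left pixels, line_size is the row stride. */
int pix_abs16x16_sketch(const uint8_t *cur, const uint8_t *ref, int line_size)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(cur[x] - ref[x]);
        cur += line_size;
        ref += line_size;
    }
    return sum;
}
```

The motion estimator calls such a function for many candidate positions, so its
speed matters a lot for encoding.
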
put_mspel8_mc* / wmv2_mspel8*
	used only in WMV2
	it is not recommended that you waste your time with these, as WMV2 is an
	ugly and relatively useless codec

mpeg4_qpel* / *qpel_mc*
	used in MPEG4 qpel motion compensation (encoding & decoding)
	the qpel8 functions are used only for 4MV
	the avg_* functions are used only for B-frames
	optimizing them should have a significant impact on qpel encoding & decoding

qpel{8,16}_mc??_old_c / *pixels{8,16}_l4
	just used to work around a bug in the old libavcodec encoder
	don't optimize them

add_bytes/diff_bytes
	for huffyuv only, optimize if you want a faster ff-huffyuv codec

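As an illustration of why add_bytes/diff_bytes are easy SIMD targets, here is a
minimal C sketch of byte-wise addition and subtraction with wraparound. The
names and signatures are assumptions, not the actual dsputil ones.

```c
#include <stdint.h>

/* Hypothetical sketch: dst[i] += src[i], wrapping modulo 256. */
void add_bytes_sketch(uint8_t *dst, const uint8_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Hypothetical sketch: dst[i] = src1[i] - src2[i], wrapping modulo 256. */
void diff_bytes_sketch(uint8_t *dst, const uint8_t *src1,
                       const uint8_t *src2, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}
```

Because the arithmetic wraps, applying diff_bytes_sketch and then
add_bytes_sketch with the same second operand restores the original bytes.
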
get_pixels / diff_pixels
	used for encoding, easy

clear_blocks
	easiest to optimize

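A rough C sketch of what these three functions are assumed to do on 8x8 blocks
follows. The DCTELEM typedef, the block-count parameter n and all names are
assumptions made for this example; check dsputil.c/.h for the real definitions.

```c
#include <stdint.h>
#include <string.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficient type */

/* Hypothetical sketch: load an 8x8 block of pixels into a coefficient block. */
void get_pixels_sketch(DCTELEM *block, const uint8_t *pixels, int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = pixels[x];
        pixels += line_size;
    }
}

/* Hypothetical sketch: store the difference s1 - s2 of two 8x8 pixel blocks. */
void diff_pixels_sketch(DCTELEM *block, const uint8_t *s1,
                        const uint8_t *s2, int stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = (DCTELEM)(s1[x] - s2[x]);
        s1 += stride;
        s2 += stride;
    }
}

/* Hypothetical sketch: zero n consecutive blocks of 64 coefficients. */
void clear_blocks_sketch(DCTELEM *blocks, int n)
{
    memset(blocks, 0, sizeof(DCTELEM) * 64 * (size_t)n);
}
```
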
gmc
	used for MPEG4 GMC
	optimizing this should have a significant effect on the GMC decoding speed, but
	it's very likely impossible to write in SIMD

gmc1
	used for chroma blocks in MPEG4 GMC with 1 warp point
	(there are 4 luma & 2 chroma blocks per macroblock, so
	only 1/3 of the GMC blocks use this, the other 2/3
	use the normal put_pixel* code, but only if there is
	just 1 warp point)
	Note: DivX5 GMC always uses just 1 warp point

pix_sum
	used for encoding

hadamard8_diff / sse / sad == pix_norm1 / dct_sad / quant_psnr / rd / bit
	specific compare functions used in encoding; which of these are used depends
	on the command line switches
	don't waste your time with dct_sad & quant_psnr, they aren't really useful

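As one example of such a compare function, sse is the sum of squared errors
between two blocks. The sketch below assumes a 16-pixel-wide block and a
hypothetical name and signature; the real compare functions may take different
arguments.

```c
#include <stdint.h>

/* Hypothetical sketch: sum of squared differences over a 16-wide, h-high block. */
int sse16_sketch(const uint8_t *a, const uint8_t *b, int line_size, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = a[x] - b[x];
            sum += d * d;
        }
        a += line_size;
        b += line_size;
    }
    return sum;
}
```

Compared with SAD, large differences are penalized more heavily because each
difference is squared.
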
put_pixels_clamped / add_pixels_clamped
	used for en/decoding in the IDCT, easy
	Note: some optimized IDCTs have the add/put clamped code included, and then
	put_pixels_clamped / add_pixels_clamped will be unused

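A minimal C sketch of the clamped put/add step after the IDCT might look like
the following. The DCTELEM typedef and the _sketch names are assumptions; only
the general argument order follows the prototype quoted in the Alignment
section.

```c
#include <stdint.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficient type */

/* Clamp an integer to the 0..255 pixel range. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical sketch: store an 8x8 coefficient block as clamped pixels. */
void put_pixels_clamped_sketch(const DCTELEM *block, uint8_t *pixels,
                               int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = clamp_u8(block[y * 8 + x]);
        pixels += line_size;
    }
}

/* Hypothetical sketch: add an 8x8 coefficient block to existing pixels,
 * clamping the result (used when a residual is added to a prediction). */
void add_pixels_clamped_sketch(const DCTELEM *block, uint8_t *pixels,
                               int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = clamp_u8(pixels[x] + block[y * 8 + x]);
        pixels += line_size;
    }
}
```

SIMD instruction sets with saturating pack/add instructions can do the clamping
for free, which is part of why these are easy to optimize.
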
idct/fdct
	idct (encoding & decoding)
	fdct (encoding)
	difficult to optimize

dct_quantize_trellis
	used for encoding with trellis quantization
	difficult to optimize

dct_quantize
	used for encoding

dct_unquantize_mpeg1
	used in MPEG1 en/decoding

dct_unquantize_mpeg2
	used in MPEG2 en/decoding

dct_unquantize_h263
	used in MPEG4/H263 en/decoding

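To show the kind of per-coefficient work the unquantize functions do, here is a
simplified C sketch of H.263-style inverse quantization, where a nonzero level
L is reconstructed as QUANT*(2*|L|+1), minus 1 if QUANT is even, with the sign
of L. It deliberately omits intra DC handling and clipping; the name, signature
and DCTELEM typedef are assumptions, not the mpegvideo.c code.

```c
#include <stdint.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficient type */

/* Hypothetical, simplified sketch: inverse-quantize n coefficients in place.
 * Zero coefficients stay zero. */
void dct_unquantize_h263_sketch(DCTELEM *block, int n, int qscale)
{
    int qmul = qscale * 2;
    int qadd = (qscale - 1) | 1; /* qscale if odd, qscale - 1 if even */

    for (int i = 0; i < n; i++) {
        int level = block[i];
        if (level > 0)
            block[i] = (DCTELEM)(level * qmul + qadd);
        else if (level < 0)
            block[i] = (DCTELEM)(level * qmul - qadd);
    }
}
```

The branch on the sign is what makes a straightforward loop awkward; SIMD
versions typically replace it with mask operations.
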
FIXME remaining functions?
BTW, most of these are in dsputil.c/.h; some are in mpegvideo.c/.h


Alignment:
Some instructions on some architectures have strict alignment restrictions,
for example most SSE/SSE2 instructions on x86.
The minimum guaranteed alignment is written in the .h files,
for example:
    void (*put_pixels_clamped)(const DCTELEM *block/*align 16*/, UINT8 *pixels/*align 8*/, int line_size);


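When testing an optimized function outside of libavcodec, it can help to create
suitably aligned buffers and to check pointers before calling it. The sketch
below uses C11 alignas; the buffer names and the is_aligned helper are
hypothetical and not part of libavcodec.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is aligned to n bytes (n a power of 2). */
int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Example buffers matching the "align 16" / "align 8" comments above. */
alignas(16) int16_t example_block[64];
alignas(8) uint8_t example_pixels[8 * 8];
```

A debug build could check, for example, is_aligned(example_block, 16) before
handing the buffer to an SSE2 routine that uses aligned loads.
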
Links:
http://www.aggregate.org/MAGIC/

x86-specific:
http://developer.intel.com/design/pentium4/manuals/248966.htm

The IA-32 Intel Architecture Software Developer's Manual, Volume 2:
Instruction Set Reference
http://developer.intel.com/design/pentium4/manuals/245471.htm

http://www.agner.org/assem/

AMD Athlon Processor x86 Code Optimization Guide:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf

GCC asm links:
Official documentation, but quite ugly:
http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

A bit old (note that "+" is valid for input-output operands, even though this
document says it is not):
http://www.cs.virginia.edu/~clc5q/gcc-inline-asm.pdf
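
As a small illustration of the "+" constraint mentioned for the GCC asm links,
the following sketch uses "+r" to mark an operand as both input and output. The
function name is hypothetical, the asm uses the default AT&T syntax and is only
compiled on x86 with a GCC-compatible compiler, and a plain C fallback is used
elsewhere.

```c
/* Hypothetical sketch: returns acc + v, using inline asm on x86. */
int add_in_place(int acc, int v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r"(acc): acc is read and written by the instruction.
     * "cc" tells the compiler that the flags register is modified. */
    __asm__("addl %1, %0" : "+r"(acc) : "r"(v) : "cc");
#else
    acc += v; /* portable fallback for other compilers/architectures */
#endif
    return acc;
}
```

Without "+", the same operand would have to be listed once as an output and
once as a matching input.
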