Revision 38aca760

doc/optimization.txt
10 10
arent many left
11 11

  
12 12
Understanding these overoptimized functions:
13
as many functions, like the c ones tend to be a bit unreadable currently becouse 
14
of optimizations it is difficult to understand them (and write arichtecture 
13
as many functions, like the c ones tend to be a bit unreadable currently becouse
14
of optimizations it is difficult to understand them (and write arichtecture
15 15
specific versions, or optimize the c functions further) it is recommanded to look
16
at older CVS versions of the interresting files (just use CVSWEB at 
16
at older CVS versions of the interresting files (just use CVSWEB at
17 17
http://www1.mplayerhq.hu/cgi-bin/cvsweb.cgi/ffmpeg/libavcodec/?cvsroot=FFmpeg)
18
or perhaps look into the other architecture specific versions in i386/, ppc/, 
18
or perhaps look into the other architecture specific versions in i386/, ppc/,
19 19
alpha/, ...; even if u dont understand the instructions exactly it could help
20 20
understanding the functions & how they can be optimized
21 21

  
......
29 29
which are rarely used
30 30

  
31 31
put(_no_rnd)_pixels{,_x2,_y2,_xy2}
32
	used in motion compensation (en/decoding)
32
    used in motion compensation (en/decoding)
33 33

  
34 34
avg_pixels{,_x2,_y2,_xy2}
35
	used in motion compensation of B Frames 
36
	these are less important then the put*pixels functions
35
    used in motion compensation of B Frames
36
    these are less important then the put*pixels functions
37 37

  
38 38
avg_no_rnd_pixels*
39
	unused
39
    unused
40 40

  
41 41
pix_abs16x16{,_x2,_y2,_xy2}
42
	used in motion estimation (encoding) with SAD
42
    used in motion estimation (encoding) with SAD
43 43

  
44 44
pix_abs8x8{,_x2,_y2,_xy2}
45
	used in motion estimation (encoding) with SAD of MPEG4 4MV only
46
	these are less important then the pix_abs16x16* functions
45
    used in motion estimation (encoding) with SAD of MPEG4 4MV only
46
    these are less important then the pix_abs16x16* functions
47 47

  
48 48
put_mspel8_mc* / wmv2_mspel8*
49
	used only in WMV2
50
	it is not recommanded that u waste ur time with these, as WMV2 is a
51
	ugly and relativly useless codec
49
    used only in WMV2
50
    it is not recommanded that u waste ur time with these, as WMV2 is a
51
    ugly and relativly useless codec
52 52

  
53 53
mpeg4_qpel* / *qpel_mc*
54
	use in MPEG4 qpel Motion compensation (encoding & decoding)
55
	the qpel8 functions are used only for 4mv
56
	the avg_* functions are used only for b frames
57
	optimizing them should have a significant impact on qpel encoding & decoding
58
 
54
    use in MPEG4 qpel Motion compensation (encoding & decoding)
55
    the qpel8 functions are used only for 4mv
56
    the avg_* functions are used only for b frames
57
    optimizing them should have a significant impact on qpel encoding & decoding
58

  
59 59
qpel{8,16}_mc??_old_c / *pixels{8,16}_l4
60
	just used to workaround a bug in old libavcodec encoder
61
        dont optimze them
60
    just used to workaround a bug in old libavcodec encoder
61
    dont optimze them
62 62

  
63 63
tpel_mc_func {put,avg}_tpel_pixels_tab
64
	used only for SVQ3, so only optimze them if u need fast SVQ3 decoding
64
    used only for SVQ3, so only optimze them if u need fast SVQ3 decoding
65 65

  
66 66
add_bytes/diff_bytes
67
	for huffyuv only, optimize if u want a faster ff-huffyuv codec
67
    for huffyuv only, optimize if u want a faster ff-huffyuv codec
68 68

  
69 69
get_pixels / diff_pixels
70
	used for encoding, easy
71
        
70
    used for encoding, easy
71

  
72 72
clear_blocks
73
	easiest, to optimize
74
        
73
    easiest, to optimize
74

  
75 75
gmc
76
	used for mpeg4 gmc
77
        optimizing this should have a significant effect on the gmc decoding speed but
78
        its very likely impossible to write in SIMD
76
    used for mpeg4 gmc
77
    optimizing this should have a significant effect on the gmc decoding speed but
78
    its very likely impossible to write in SIMD
79 79

  
80 80
gmc1
81
	used for chroma blocks in mpeg4 gmc with 1 warp point
82
	(there are 4 luma & 2 chroma blocks per macrobock, so 
83
        only 1/3 of the gmc blocks use this, the other 2/3 
84
        use the normal put_pixel* code, but only if there is 
85
        just 1 warp point)
86
        Note: Divx5 gmc always uses just 1 warp point
81
    used for chroma blocks in mpeg4 gmc with 1 warp point
82
    (there are 4 luma & 2 chroma blocks per macrobock, so
83
    only 1/3 of the gmc blocks use this, the other 2/3
84
    use the normal put_pixel* code, but only if there is
85
    just 1 warp point)
86
    Note: Divx5 gmc always uses just 1 warp point
87 87

  
88 88
pix_sum
89
	used for encoding
90
        
89
    used for encoding
90

  
91 91
hadamard8_diff / sse / sad == pix_norm1 / dct_sad / quant_psnr / rd / bit
92
	specific compare functions used in encoding, it depends upon the command line
93
        switches which of these are used
94
        dont waste ur time with dct_sad & quant_psnr they arent really usefull
92
    specific compare functions used in encoding, it depends upon the command line
93
    switches which of these are used
94
    dont waste ur time with dct_sad & quant_psnr they arent really usefull
95 95

  
96 96
put_pixels_clamped / add_pixels_clamped
97
	used for en/decoding in the IDCT, easy
98
        Note, some optimized IDCTs have the add/put clamped code included and then 
99
        put_pixels_clamped / add_pixels_clamped will be unused
97
    used for en/decoding in the IDCT, easy
98
    Note, some optimized IDCTs have the add/put clamped code included and then
99
    put_pixels_clamped / add_pixels_clamped will be unused
100 100

  
101 101
idct/fdct
102
	idct (encoding & decoding)
103
        fdct (encoding)
104
	difficult to optimize
105
        
102
    idct (encoding & decoding)
103
    fdct (encoding)
104
    difficult to optimize
105

  
106 106
dct_quantize_trellis
107
	used for encoding with trellis quantization
108
	difficult to optimize 
107
    used for encoding with trellis quantization
108
    difficult to optimize
109 109

  
110 110
dct_quantize
111
	used for encoding
112
        
111
    used for encoding
112

  
113 113
dct_unquantize_mpeg1
114
	used in mpeg1 en/decoding
114
    used in mpeg1 en/decoding
115 115

  
116 116
dct_unquantize_mpeg2
117
	used in mpeg2 en/decoding
117
    used in mpeg2 en/decoding
118 118

  
119 119
dct_unquantize_h263
120
	used in mpeg4/h263 en/decoding
120
    used in mpeg4/h263 en/decoding
121 121

  
122 122
FIXME remaining functions?
123 123
btw, most of these are in dsputil.c/.h some are in mpegvideo.c/.h
124 124

  
125 125

  
126
        
126

  
127 127
Alignment:
128 128
some instructions on some architectures have strict alignment restrictions,
129 129
for example most SSE/SSE2 inctructios on X86
130 130
the minimum guranteed alignment is writen in the .h files
131
for example: 
131
for example:
132 132
    void (*put_pixels_clamped)(const DCTELEM *block/*align 16*/, UINT8 *pixels/*align 8*/, int line_size);
133 133

  
134 134

  
......
139 139
X86 specific:
140 140
http://developer.intel.com/design/pentium4/manuals/248966.htm
141 141

  
142
The IA-32 Intel Architecture Software Developer's Manual, Volume 2: 
142
The IA-32 Intel Architecture Software Developer's Manual, Volume 2:
143 143
Instruction Set Reference
144 144
http://developer.intel.com/design/pentium4/manuals/245471.htm
145 145

  

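Note on the Alignment hunk: the text says the minimum guaranteed alignment is written in the .h files and gives put_pixels_clamped as the example. The sketch below shows, in plain C, what a function with that prototype might look like when the stated guarantees are checked at run time. It is an illustration under assumptions, not the libavcodec code: the typedefs for DCTELEM and UINT8, the helper is_aligned, the name put_pixels_clamped_ref, and the 8x8 block size are all assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: DCTELEM is a 16-bit signed coefficient and UINT8 an unsigned byte. */
typedef int16_t DCTELEM;
typedef uint8_t UINT8;

/* Returns nonzero if p is aligned to `align` bytes (align must be a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Hypothetical reference version: writes an 8x8 block of coefficients
 * (block size assumed) to the picture, clamping each value to 0..255.
 * The asserts mirror the "align 16" and "align 8" comments in the prototype.
 */
static void put_pixels_clamped_ref(const DCTELEM *block /* align 16 */,
                                   UINT8 *pixels /* align 8 */,
                                   int line_size)
{
    assert(is_aligned(block, 16));
    assert(is_aligned(pixels, 8));

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int v = block[y * 8 + x];
            pixels[x] = (UINT8)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
        pixels += line_size; /* advance to the next picture row */
    }
}
```

An architecture-specific version could presumably rely on the same guarantees to use aligned loads on block, which is why callers have to honor the alignment stated in the header.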