FFmpeg & evaluating performance on the PowerPC Architecture HOWTO

(c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>



I - Introduction

The PowerPC architecture and its SIMD extension AltiVec offer some
interesting tools to evaluate performance and improve the code.
This document tries to explain how to use those tools with FFmpeg.

The architecture itself offers two ways to evaluate the performance of
a given piece of code:

1) The Time Base Registers (TBL)
2) The Performance Monitor Counter Registers (PMC)

The first ones are always available and always active, but they're not
very accurate: the registers increment by one every four *bus* cycles.
On my 667 MHz tiBook (ppc7450), this means once every twenty *processor*
cycles. So we won't use them.
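
The "twenty processor cycles" figure can be re-derived with a small
sketch like the one below. It is only an illustration: the function
name is made up, and the 133 MHz bus clock used in the usage note is an
assumption inferred from the numbers above (667 MHz, four bus cycles,
twenty processor cycles), not something the hardware reports.

```c
#include <assert.h>

/* Processor cycles that elapse between two time base increments.
 * cpu_hz and bus_hz are the processor and bus clocks in Hz;
 * bus_cycles_per_tick is 4 according to the text above.
 * The function name is hypothetical, for illustration only. */
double cpu_cycles_per_tb_tick(double cpu_hz, double bus_hz,
                              unsigned bus_cycles_per_tick)
{
    return (cpu_hz / bus_hz) * bus_cycles_per_tick;
}
```

Calling cpu_cycles_per_tb_tick(667e6, 133e6, 4) gives a value close to
20, i.e. anything shorter than about twenty processor cycles is
invisible to the time base.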

The PMC are much more useful: not only can they report cycle-accurate
timing, but they can also be used to monitor many other parameters,
such as the number of AltiVec stalls for every kind of instruction,
or instruction cache misses. The downside is that not all processors
support the PMC (all G3, all G4 and the 970 do support them), and
they're inactive by default - you need to activate them with a
dedicated tool. Also, the number of available PMC depends on the
processor: the various 604 have 2, the various 75x (aka G3) have 4,
and the various 74xx (aka G4) have 6.

*WARNING*: The PowerPC 970 is not very well documented, and its PMC
registers are 64 bits wide. To make the code aware of this, you *must*
tune for the 970 (using --tune=970), or the code will assume 32-bit
registers.


II - Enabling FFmpeg PowerPC performance support

This needs to be done by hand. First, you need to configure FFmpeg as
usual, but add the "--powerpc-perf-enable" option. For instance:

#####
./configure --prefix=/usr/local/ffmpeg-cvs --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
#####

This will configure FFmpeg to install inside /usr/local/ffmpeg-cvs,
compiling with gcc-3.3 (you should try to use this one or a newer
gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of
thumb, those at 550 MHz and more). It will also enable the PMC.

You may also edit the file "config.h" to enable (uncomment) the
following line:

#####
// #define ALTIVEC_USE_REFERENCE_C_CODE 1
#####

If you enable this line, then the code will not make use of AltiVec,
but will use the reference C code instead. This is useful to compare
performance between two versions of the code.

Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h":

#####
#define POWERPC_NUM_PMC_ENABLED 4
#####

If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more
PMC than available on your CPU!
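
To double-check the value before building, a small helper along these
lines can encode the per-processor PMC counts given in the
introduction. This is a sketch only: the enum and both function names
are hypothetical and are not part of FFmpeg. The 970 is left out
because this document does not say how many PMC it has.

```c
#include <assert.h>

/* CPU families named in this document; the enum itself is hypothetical. */
enum ppc_family { PPC_604, PPC_75X_G3, PPC_74XX_G4 };

/* Number of PMC per family, as listed in the introduction. */
int ppc_num_pmc(enum ppc_family family)
{
    switch (family) {
    case PPC_604:     return 2;
    case PPC_75X_G3:  return 4;
    case PPC_74XX_G4: return 6;
    }
    return 0; /* unknown family: assume no usable PMC */
}

/* Returns 1 if `enabled` (the value you plan to give to
 * POWERPC_NUM_PMC_ENABLED) does not exceed what the CPU provides,
 * 0 otherwise. */
int pmc_setting_is_safe(enum ppc_family family, int enabled)
{
    return enabled >= 0 && enabled <= ppc_num_pmc(family);
}
```

For instance, pmc_setting_is_safe(PPC_604, 4) returns 0, so the default
of 4 would be too many for a 604, while it is fine for a G3 or a G4.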

Then, simply compile FFmpeg as usual (make && make install).



III - Using FFmpeg PowerPC performance support

The resulting FFmpeg can be used exactly as usual. But before exiting,
FFmpeg will dump a per-function report that looks like this:

#####
PowerPC performance report
 Values are from the PMC registers, and represent whatever the
 registers are set to record.
 Function "gmc1_altivec" (pmc1):
        min: 231
        max: 1339867
        avg: 558.25 (255302)
 Function "gmc1_altivec" (pmc2):
        min: 93
        max: 2164
        avg: 267.31 (255302)
 Function "gmc1_altivec" (pmc3):
        min: 72
        max: 1987
        avg: 276.20 (255302)
(...)
#####

In this example, PMC1 was set to record CPU cycles, PMC2 was set to
record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec
Issue Stalls.

The function "gmc1_altivec" was monitored 255302 times, and the
minimum execution time was 231 processor cycles. The max and average
aren't of much use, as it's very likely the OS interrupted execution
for reasons of its own :-(
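
To see why only the minimum is trustworthy, here is a minimal sketch
of how such min/max/avg values could be accumulated from counter
deltas. It is an assumption made for illustration: the struct and
function names are hypothetical and do not describe FFmpeg's actual
bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-function, per-counter record; not FFmpeg's layout. */
struct perf_stat {
    uint64_t min;
    uint64_t max;
    uint64_t sum;
    uint64_t count;
};

void perf_stat_init(struct perf_stat *s)
{
    s->min = UINT64_MAX;
    s->max = 0;
    s->sum = 0;
    s->count = 0;
}

/* Record one measurement: counter value at function exit minus the
 * counter value at function entry. */
void perf_stat_add(struct perf_stat *s, uint64_t delta)
{
    if (delta < s->min)
        s->min = delta;
    if (delta > s->max)
        s->max = delta;
    s->sum += delta;
    s->count++;
}

/* Average of all recorded measurements, or 0.0 if there are none. */
double perf_stat_avg(const struct perf_stat *s)
{
    return s->count ? (double)s->sum / (double)s->count : 0.0;
}
```

Feed it two ordinary calls (231 and 240 cycles) and one call that got
interrupted (1339867 cycles): the max and the average are dominated by
the interrupted call, while the min stays at 231.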

With the exact same settings and source file, but using the reference C
code, we get:

#####
PowerPC performance report
 Values are from the PMC registers, and represent whatever the
 registers are set to record.
 Function "gmc1_altivec" (pmc1):
        min: 592
        max: 2532235
        avg: 962.88 (255302)
 Function "gmc1_altivec" (pmc2):
        min: 0
        max: 33
        avg: 0.00 (255302)
 Function "gmc1_altivec" (pmc3):
        min: 0
        max: 350
        avg: 0.03 (255302)
(...)
#####

The minimum is now 592 cycles, so the fastest AltiVec execution is
about 2.5x faster than the fastest C execution in this example. It's
not perfect but it's not bad (well I wrote this function so I can't
say otherwise :-).
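
The comparison is just the ratio of the two "min" values for pmc1. As
a tiny sketch (the function name is made up for this example):

```c
#include <assert.h>

/* How many times faster the optimized version is, computed from the
 * "min" cycle counts of the two reports. Returns 0.0 if optimized_min
 * is not positive, to avoid dividing by zero. */
double speedup_from_min(double reference_min, double optimized_min)
{
    if (optimized_min <= 0.0)
        return 0.0;
    return reference_min / optimized_min;
}
```

With the numbers above, speedup_from_min(592.0, 231.0) is a bit more
than 2.5. The min values are used rather than the averages because the
averages include whatever the OS did in the meantime.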

Once you have that kind of report, you can try to improve things by
finding what goes wrong and fixing it; in the example above, one
should try to reduce the number of AltiVec stalls, as this *may*
improve performance.



IV - Enabling the PMC in Mac OS X

This is easy. Use "MONster" and "monster". Those tools come from
Apple's CHUD package, and can be found hidden in the developer web
site & FTP site. "MONster" is the graphical application; use it to
generate a config file specifying what each register should
monitor. Then use the command-line application "monster" to use that
config file, and enjoy the results.

Note that "MONster" can be used for many other things, but since it's
documented by Apple, that's not my subject here.



V - Enabling the PMC on Linux

I don't know how to do it, sorry :-) Any ideas are very much welcome.

--
Romain Dolbeau
<romain@dolbeau.org>