ffmpeg / libswscale / ppc / yuv2rgb_altivec.c @ 5d6e4c16

/*
 * AltiVec acceleration for colorspace conversion
 *
 * copyright (C) 2004 Marc Hoffman <marc.hoffman@analog.com>
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
Convert I420/YV12 to RGB in various formats,
  it rejects images that are not in 420 formats,
  it rejects images that don't have widths of multiples of 16,
  it rejects images that don't have heights of multiples of 2.
Rejected images are deferred to the C simulation code.

Lots of optimizations to be done here.

1. Need to fix saturation code. I just couldn't get it to fly with packs
   and adds, so we currently use max/min to clip.

2. The inefficient use of chroma loading needs a bit of brushing up.

3. Analysis of pipeline stalls needs to be done. Use shark to identify
   pipeline stalls.


MODIFIED to calculate coeffs from currently selected color space.
MODIFIED core to be a macro where you specify the output format.
ADDED UYVY conversion which is never called due to something in swscale.
CORRECTED algorithm selection to be strict on input formats.
ADDED runtime detection of AltiVec.

ADDED altivec_yuv2packedX vertical scl + RGB converter

March 27, 2004
PERFORMANCE ANALYSIS

The C version uses 25% of the processor or ~250 MIPS for D1 video rawvideo
used as test.
The AltiVec version uses 10% of the processor or ~100 MIPS for D1 video
same sequence.

720 * 480 * 30  ~10 MPS (million pixels per second)

so we have roughly 10 clocks per pixel. This is too high, something has
to be wrong.

OPTIMIZED clip codes to utilize vec_max and vec_packs removing the
need for vec_min.

OPTIMIZED DST OUTPUT cache/DMA controls. We are pretty much guaranteed to have
the input video frame, it was just decompressed so it probably resides in L1
caches. However, we are creating the output video stream. This needs to use the
DSTST instruction to optimize for the cache. We couple this with the fact that
we are not going to be visiting the input buffer again so we mark it Least
Recently Used. This shaves 25% of the processor cycles off.

Now memcpy is the largest MIPS consumer in the system, probably due
to the inefficient X11 stuff.

GL libraries seem to be very slow on this machine, a 1.33 GHz PB running
Jaguar; this is not the case for my 1 GHz PB.  I thought it might be
a versioning issue, however I have libGL.1.2.dylib for both
machines. (We need to figure this out now.)

GL2 libraries work now with patch for RGB32.

NOTE: quartz vo driver ARGB32_to_RGB24 consumes 30% of the processor.

Integrated luma prescaling adjustment for saturation/contrast/brightness
adjustment.
*/
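The throughput estimate in the performance notes above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names are hypothetical, and it assumes the comment's "Mips" figure means instructions per second, so "clocks per pixel" is read here as instructions per pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels produced per second for a given frame size and frame rate. */
uint64_t pixels_per_second(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}

/* Instructions spent per output pixel (integer division, rounds down). */
uint64_t instructions_per_pixel(uint64_t instr_per_second, uint64_t pixel_rate)
{
    assert(pixel_rate > 0);
    return instr_per_second / pixel_rate;
}
```

For D1 video at 30 frames per second, `pixels_per_second(720, 480, 30)` is 10,368,000, so the ~100 MIPS AltiVec figure works out to about 9 to 10 instructions per pixel and the ~250 MIPS C figure to about 24.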

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <assert.h>
#include "config.h"
#include "libswscale/rgb2rgb.h"
#include "libswscale/swscale.h"
#include "libswscale/swscale_internal.h"

#undef PROFILE_THE_BEAST
#undef INC_SCALING

typedef unsigned char ubyte;
typedef signed char   sbyte;


/* RGB interleaver, 16 planar pels 8-bit samples per channel in
   homogeneous vector registers x0,x1,x2 are interleaved with the
   following technique:

      o0 = vec_mergeh (x0,x1);
      o1 = vec_perm (o0, x2, perm_rgb_0);
      o2 = vec_perm (o0, x2, perm_rgb_1);
      o3 = vec_mergel (x0,x1);
      o4 = vec_perm (o3,o2,perm_rgb_2);
      o5 = vec_perm (o3,o2,perm_rgb_3);

  perm_rgb_0:   o0(RG).h x2(B) --> o1*
              0   1  2   3   4
             rgbr|gbrg|brgb|rgbr
             0010 0100 1001 0010
             0102 3145 2673 894A

  perm_rgb_1:   o0(RG).h x2(B) --> o2
              0   1  2   3   4
             gbrg|brgb|bbbb|bbbb
             0100 1001 1111 1111
             B5CD 6EF7 89AB CDEF

  perm_rgb_2:   o3(RG).l o2(rgbB.l) --> o4*
              0   1  2   3   4
             gbrg|brgb|rgbr|gbrg
             1111 1111 0010 0100
             89AB CDEF 0182 3945

  perm_rgb_3:   o3(RG).l o2(rgbB.l) --> o5*
              0   1  2   3   4
             brgb|rgbr|gbrg|brgb
             1001 0010 0100 1001
             a67b 89cA BdCD eEFf

*/
static
const vector unsigned char
  perm_rgb_0 = {0x00,0x01,0x10,0x02,0x03,0x11,0x04,0x05,
                0x12,0x06,0x07,0x13,0x08,0x09,0x14,0x0a},
  perm_rgb_1 = {0x0b,0x15,0x0c,0x0d,0x16,0x0e,0x0f,0x17,
                0x18,0x19,0x1a,0x1b,0x1c,0x1d,0x1e,0x1f},
  perm_rgb_2 = {0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,
                0x00,0x01,0x18,0x02,0x03,0x19,0x04,0x05},
  perm_rgb_3 = {0x1a,0x06,0x07,0x1b,0x08,0x09,0x1c,0x0a,
                0x0b,0x1d,0x0c,0x0d,0x1e,0x0e,0x0f,0x1f};

#define vec_merge3(x2,x1,x0,y0,y1,y2)       \
do {                                        \
    __typeof__(x0) o0,o2,o3;                \
    o0 = vec_mergeh (x0,x1);                \
    y0 = vec_perm (o0, x2, perm_rgb_0);     \
    o2 = vec_perm (o0, x2, perm_rgb_1);     \
    o3 = vec_mergel (x0,x1);                \
    y1 = vec_perm (o3,o2,perm_rgb_2);       \
    y2 = vec_perm (o3,o2,perm_rgb_3);       \
} while(0)
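To see that the four permute tables really produce a packed 3-byte interleave, the merge/permute sequence can be modelled in plain C. The sketch below is a scalar stand-in, not AltiVec code: `vec16`, `perm16`, `mergeh16`, `mergel16` and `merge3_model` are hypothetical names, and it assumes the classic big-endian element order for `vec_perm`, `vec_mergeh` and `vec_mergel`. The table values are copied from the definitions above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one 16-byte vector register, element 0 first. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of vec_perm: each index selects a byte from the 32-byte
 * concatenation a:b; only the low 5 bits of the index are used. */
vec16 perm16(vec16 a, vec16 b, const uint8_t idx[16])
{
    vec16 r;
    for (int i = 0; i < 16; i++) {
        uint8_t k = idx[i] & 0x1f;
        r.b[i] = k < 16 ? a.b[k] : b.b[k - 16];
    }
    return r;
}

/* Model of vec_mergeh on bytes: interleave the first 8 bytes of a and b. */
vec16 mergeh16(vec16 a, vec16 b)
{
    vec16 r;
    for (int i = 0; i < 8; i++) {
        r.b[2 * i]     = a.b[i];
        r.b[2 * i + 1] = b.b[i];
    }
    return r;
}

/* Model of vec_mergel on bytes: interleave the last 8 bytes of a and b. */
vec16 mergel16(vec16 a, vec16 b)
{
    vec16 r;
    for (int i = 0; i < 8; i++) {
        r.b[2 * i]     = a.b[8 + i];
        r.b[2 * i + 1] = b.b[8 + i];
    }
    return r;
}

static const uint8_t perm_rgb_0[16] = {0x00,0x01,0x10,0x02,0x03,0x11,0x04,0x05,
                                       0x12,0x06,0x07,0x13,0x08,0x09,0x14,0x0a};
static const uint8_t perm_rgb_1[16] = {0x0b,0x15,0x0c,0x0d,0x16,0x0e,0x0f,0x17,
                                       0x18,0x19,0x1a,0x1b,0x1c,0x1d,0x1e,0x1f};
static const uint8_t perm_rgb_2[16] = {0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,
                                       0x00,0x01,0x18,0x02,0x03,0x19,0x04,0x05};
static const uint8_t perm_rgb_3[16] = {0x1a,0x06,0x07,0x1b,0x08,0x09,0x1c,0x0a,
                                       0x0b,0x1d,0x0c,0x0d,0x1e,0x0e,0x0f,0x1f};

/* Same six steps as vec_merge3; writes the three result vectors to out. */
void merge3_model(vec16 x0, vec16 x1, vec16 x2, uint8_t out[48])
{
    assert(out != NULL);
    vec16 o0 = mergeh16(x0, x1);
    vec16 y0 = perm16(o0, x2, perm_rgb_0);
    vec16 o2 = perm16(o0, x2, perm_rgb_1);
    vec16 o3 = mergel16(x0, x1);
    vec16 y1 = perm16(o3, o2, perm_rgb_2);
    vec16 y2 = perm16(o3, o2, perm_rgb_3);
    memcpy(out,      y0.b, 16);
    memcpy(out + 16, y1.b, 16);
    memcpy(out + 32, y2.b, 16);
}

/* Returns 1 if the output is x0[0],x1[0],x2[0],x0[1],x1[1],x2[1],... */
int merge3_model_is_interleaved(void)
{
    vec16 x0, x1, x2;
    uint8_t out[48];
    for (int i = 0; i < 16; i++) {
        x0.b[i] = (uint8_t)i;
        x1.b[i] = (uint8_t)(0x40 + i);
        x2.b[i] = (uint8_t)(0x80 + i);
    }
    merge3_model(x0, x1, x2, out);
    for (int i = 0; i < 16; i++)
        if (out[3 * i] != x0.b[i] || out[3 * i + 1] != x1.b[i] ||
            out[3 * i + 2] != x2.b[i])
            return 0;
    return 1;
}
```

In this model the first output byte always comes from `x0`. That is consistent with the store macros passing their planes to `vec_merge3` in opposite orders to get RGB or BGR byte order.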

#define vec_mstbgr24(x0,x1,x2,ptr)      \
do {                                    \
    __typeof__(x0) _0,_1,_2;            \
    vec_merge3 (x0,x1,x2,_0,_1,_2);     \
    vec_st (_0, 0, ptr++);              \
    vec_st (_1, 0, ptr++);              \
    vec_st (_2, 0, ptr++);              \
}  while (0)

#define vec_mstrgb24(x0,x1,x2,ptr)      \
do {                                    \
    __typeof__(x0) _0,_1,_2;            \
    vec_merge3 (x2,x1,x0,_0,_1,_2);     \
    vec_st (_0, 0, ptr++);              \
    vec_st (_1, 0, ptr++);              \
    vec_st (_2, 0, ptr++);              \
}  while (0)

/* pack the pixels in rgb0 format
   msb R
   lsb 0
*/
#define vec_mstrgb32(T,x0,x1,x2,x3,ptr)                                       \
do {                                                                          \
    T _0,_1,_2,_3;                                                            \
    _0 = vec_mergeh (x0,x1);                                                  \
    _1 = vec_mergeh (x2,x3);                                                  \
    _2 = (T)vec_mergeh ((vector unsigned short)_0,(vector unsigned short)_1); \
    _3 = (T)vec_mergel ((vector unsigned short)_0,(vector unsigned short)_1); \
    vec_st (_2, 0*16, (T *)ptr);                                              \
    vec_st (_3, 1*16, (T *)ptr);                                              \
    _0 = vec_mergel (x0,x1);                                                  \
    _1 = vec_mergel (x2,x3);                                                  \
    _2 = (T)vec_mergeh ((vector unsigned short)_0,(vector unsigned short)_1); \
    _3 = (T)vec_mergel ((vector unsigned short)_0,(vector unsigned short)_1); \
    vec_st (_2, 2*16, (T *)ptr);                                              \
    vec_st (_3, 3*16, (T *)ptr);                                              \
    ptr += 4;                                                                 \
}  while (0)

/*

  | 1     0       1.4021   | | Y |
  | 1    -0.3441 -0.7142   |x| Cb|
  | 1     1.7718  0        | | Cr|


  Y:      [-128 127]
  Cb/Cr : [-128 127]

  Typical YUV conversions work on Y: 0-255; this version has been
  optimized for JPEG decode.

*/
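As a reference point for the matrix above, here is a minimal floating-point sketch of the same conversion for a single pixel. It is not the fixed-point path used in this file. The names `clip_u8` and `ycbcr_to_rgb_ref` are hypothetical, and the sketch assumes 8-bit full-range Y (0 to 255) with Cb and Cr stored with a +128 bias.

```c
#include <assert.h>
#include <stdint.h>

/* Round to nearest and clamp to the 8-bit range. */
uint8_t clip_u8(double v)
{
    if (v < 0.0)   return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* Apply the 3x3 matrix from the comment to one pixel.
 * rgb[0] = R, rgb[1] = G, rgb[2] = B. */
void ycbcr_to_rgb_ref(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    assert(rgb != 0);
    double Y  = (double)y;
    double Cb = (double)cb - 128.0;   /* remove the assumed +128 bias */
    double Cr = (double)cr - 128.0;

    rgb[0] = clip_u8(Y               + 1.4021 * Cr);
    rgb[1] = clip_u8(Y - 0.3441 * Cb - 0.7142 * Cr);
    rgb[2] = clip_u8(Y + 1.7718 * Cb);
}
```

With neutral chroma (Cb = Cr = 128) every output channel equals Y, which is a quick sanity check for any fixed-point variant of the same matrix.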


#define vec_unh(x) \
    (vector signed short) \
        vec_perm(x,(__typeof__(x)){0}, \
                 ((vector unsigned char){0x10,0x00,0x10,0x01,0x10,0x02,0x10,0x03,\
                                         0x10,0x04,0x10,0x05,0x10,0x06,0x10,0x07}))
#define vec_unl(x) \
    (vector signed short) \
        vec_perm(x,(__typeof__(x)){0}, \
                 ((vector unsigned char){0x10,0x08,0x10,0x09,0x10,0x0A,0x10,0x0B,\
                                         0x10,0x0C,0x10,0x0D,0x10,0x0E,0x10,0x0F}))

#define vec_clip_s16(x) \
    vec_max (vec_min (x, ((vector signed short){235,235,235,235,235,235,235,235})), \
                         ((vector signed short){ 16, 16, 16, 16, 16, 16, 16, 16}))

#define vec_packclp(x,y) \
    (vector unsigned char)vec_packs \
        ((vector unsigned short)vec_max (x,((vector signed short) {0})), \
         (vector unsigned short)vec_max (y,((vector signed short) {0})))
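The clip-and-pack step in `vec_packclp` can be described per lane in scalar C. This is a sketch under stated assumptions: `packclp_lane` and `packclp_model` are hypothetical names, and it assumes that `vec_packs` on unsigned 16-bit inputs saturates each lane to 255, with the preceding `vec_max` against zero handling the negative side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One lane: clamp negatives to 0, then saturate to 255 when narrowing. */
uint8_t packclp_lane(int16_t x)
{
    uint16_t u = (uint16_t)(x < 0 ? 0 : x);   /* vec_max(x, 0), viewed as unsigned */
    return u > 255 ? 255 : (uint8_t)u;        /* unsigned saturating pack */
}

/* Apply the lane model to n values. */
void packclp_model(const int16_t *src, uint8_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = packclp_lane(src[i]);
}
```

Only one explicit comparison against zero is needed because the saturating pack already supplies the upper bound. This matches the note in the header comment about dropping `vec_min`.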

//#define out_pixels(a,b,c,ptr) vec_mstrgb32(__typeof__(a),((__typeof__ (a)){255}),a,a,a,ptr)


static inline void cvtyuvtoRGB (SwsContext *c,
                                vector signed short Y, vector signed short U, vector signed short V,
                                vector signed short *R, vector signed short *G, vector signed short *B)
{
    vector signed   short vx,ux,uvx;

    Y = vec_mradds (Y, c->CY, c->OY);
    U  = vec_sub (U,(vector signed short)
                    vec_splat((vector signed short){128},0));
    V  = vec_sub (V,(vector signed short)
                    vec_splat((vector signed short){128},0));

    //   ux  = (CBU*(u<<c->CSHIFT)+0x4000)>>15;
    ux = vec_sl (U, c->CSHIFT);
    *B = vec_mradds (ux, c->CBU, Y);

    // vx  = (CRV*(v<<c->CSHIFT)+0x4000)>>15;
    vx = vec_sl (V, c->CSHIFT);
    *R = vec_mradds (vx, c->CRV, Y);

    // uvx = ((CGU*u) + (CGV*v))>>15;
    uvx = vec_mradds (U, c->CGU, Y);
    *G  = vec_mradds (V, c->CGV, uvx);
}
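The commented formulas in `cvtyuvtoRGB`, such as `(CBU*(u<<CSHIFT)+0x4000)>>15`, describe what `vec_mradds` computes per lane: a rounded Q15 multiply followed by a saturating add. The scalar model below is a sketch of that behaviour. `sat_s16`, `mradds_lane` and `mradds8_model` are hypothetical names, and the model assumes eight independent signed 16-bit lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
int16_t sat_s16(int32_t v)
{
    if (v >  32767) return  32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* One lane: sat16(((a * b + 0x4000) >> 15) + c).
 * The shift is written as an explicit floor so the result does not depend
 * on how the compiler shifts negative values. */
int16_t mradds_lane(int16_t a, int16_t b, int16_t c)
{
    int32_t p  = (int32_t)a * b + 0x4000;
    int32_t hi = p >= 0 ? p >> 15 : -((-p + 0x7fff) >> 15);
    return sat_s16(hi + c);
}

/* Eight lanes, matching one vector of signed shorts. */
void mradds8_model(const int16_t a[8], const int16_t b[8],
                   const int16_t c[8], int16_t dst[8])
{
    assert(a != NULL && b != NULL && c != NULL && dst != NULL);
    for (int i = 0; i < 8; i++)
        dst[i] = mradds_lane(a[i], b[i], c[i]);
}
```

Reading the coefficients as Q15 fractions, `mradds_lane(16384, 16384, 0)` is 0.5 times 0.5, which gives 8192 (0.25 in Q15).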


/*
  ------------------------------------------------------------------------------
  CS converters
  ------------------------------------------------------------------------------
*/


#define DEFCSP420_CVT(name,out_pixels)                                  \
static int altivec_##name (SwsContext *c,                               \
                           const unsigned char **in, int *instrides,    \
                           int srcSliceY,        int srcSliceH,         \
                           unsigned char **oplanes, int *outstrides)    \
{                                                                       \
    int w = c->srcW;                                                    \
    int h = srcSliceH;                                                  \
    int i,j;                                                            \
    int instrides_scl[3];                                               \
    vector unsigned char y0,y1;                                         \
                                                                        \
    vector signed char  u,v;                                            \
                                                                        \
    vector signed short Y0,Y1,Y2,Y3;                                    \
    vector signed short U,V;                                            \
    vector signed short vx,ux,uvx;                                      \
    vector signed short vx0,ux0,uvx0;                                   \
    vector signed short vx1,ux1,uvx1;                                   \
    vector signed short R0,G0,B0;                                       \
    vector signed short R1,G1,B1;                                       \
    vector unsigned char R,G,B;                                         \
                                                                        \
    vector unsigned char *y1ivP, *y2ivP, *uivP, *vivP;                  \
    vector unsigned char align_perm;                                    \
                                                                        \
    vector signed short                                                 \
        lCY  = c->CY,                                                   \
        lOY  = c->OY,                                                   \
        lCRV = c->CRV,                                                  \
        lCBU = c->CBU,                                                  \
        lCGU = c->CGU,                                                  \
        lCGV = c->CGV;                                                  \
                                                                        \
    vector unsigned short lCSHIFT = c->CSHIFT;                          \
                                                                        \
    const ubyte *y1i   = in[0];                                         \
    const ubyte *y2i   = in[0]+instrides[0];                            \
    const ubyte *ui    = in[1];                                         \
    const ubyte *vi    = in[2];                                         \
                                                                        \
    vector unsigned char *oute                                          \
        = (vector unsigned char *)                                      \
            (oplanes[0]+srcSliceY*outstrides[0]);                       \
    vector unsigned char *outo                                          \
        = (vector unsigned char *)                                      \
            (oplanes[0]+srcSliceY*outstrides[0]+outstrides[0]);         \
                                                                        \
                                                                        \
    instrides_scl[0] = instrides[0]*2-w;  /* the loop moves y{1,2}i by w */ \
    instrides_scl[1] = instrides[1]-w/2;  /* the loop moves ui by w/2 */    \
    instrides_scl[2] = instrides[2]-w/2;  /* the loop moves vi by w/2 */    \
                                                                        \
                                                                        \
    for (i=0;i<h/2;i++) {                                               \
        vec_dstst (outo, (0x02000002|(((w*3+32)/32)<<16)), 0);          \
        vec_dstst (oute, (0x02000002|(((w*3+32)/32)<<16)), 1);          \
                                                                        \
        for (j=0;j<w/16;j++) {                                          \
                                                                        \
            y1ivP = (vector unsigned char *)y1i;                        \
            y2ivP = (vector unsigned char *)y2i;                        \
            uivP  = (vector unsigned char *)ui;                         \
            vivP  = (vector unsigned char *)vi;                         \
                                                                        \
            align_perm = vec_lvsl (0, y1i);                             \
            y0 = (vector unsigned char)                                 \
                 vec_perm (y1ivP[0], y1ivP[1], align_perm);             \
                                                                        \
            align_perm = vec_lvsl (0, y2i);                             \
            y1 = (vector unsigned char)                                 \
                 vec_perm (y2ivP[0], y2ivP[1], align_perm);             \
                                                                        \
            align_perm = vec_lvsl (0, ui);                              \
            u = (vector signed char)                                    \
                vec_perm (uivP[0], uivP[1], align_perm);                \
                                                                        \
            align_perm = vec_lvsl (0, vi);                              \
            v = (vector signed char)                                    \
                vec_perm (vivP[0], vivP[1], align_perm);                \
                                                                        \
            u  = (vector signed char)                                   \
                 vec_sub (u,(vector signed char)                        \
                          vec_splat((vector signed char){128},0));      \
            v  = (vector signed char)                                   \
                 vec_sub (v,(vector signed char)                        \
                          vec_splat((vector signed char){128},0));      \
                                                                        \
            U  = vec_unpackh (u);                                       \
            V  = vec_unpackh (v);                                       \
                                                                        \
                                                                        \
            Y0 = vec_unh (y0);                                          \
            Y1 = vec_unl (y0);                                          \
            Y2 = vec_unh (y1);                                          \
            Y3 = vec_unl (y1);                                          \
                                                                        \
            Y0 = vec_mradds (Y0, lCY, lOY);                             \
            Y1 = vec_mradds (Y1, lCY, lOY);                             \
            Y2 = vec_mradds (Y2, lCY, lOY);                             \
            Y3 = vec_mradds (Y3, lCY, lOY);                             \
                                                                        \
            /*   ux  = (CBU*(u<<CSHIFT)+0x4000)>>15 */                  \
            ux = vec_sl (U, lCSHIFT);                                   \
            ux = vec_mradds (ux, lCBU, (vector signed short){0});       \
            ux0  = vec_mergeh (ux,ux);                                  \
            ux1  = vec_mergel (ux,ux);                                  \
                                                                        \
            /* vx  = (CRV*(v<<CSHIFT)+0x4000)>>15;        */            \
            vx = vec_sl (V, lCSHIFT);                                   \
            vx = vec_mradds (vx, lCRV, (vector signed short){0});       \
            vx0  = vec_mergeh (vx,vx);                                  \
            vx1  = vec_mergel (vx,vx);                                  \
                                                                        \
            /* uvx = ((CGU*u) + (CGV*v))>>15 */                         \
            uvx = vec_mradds (U, lCGU, (vector signed short){0});       \
            uvx = vec_mradds (V, lCGV, uvx);                            \
            uvx0 = vec_mergeh (uvx,uvx);                                \
            uvx1 = vec_mergel (uvx,uvx);                                \
                                                                        \
            R0 = vec_add (Y0,vx0);                                      \
            G0 = vec_add (Y0,uvx0);                                     \
            B0 = vec_add (Y0,ux0);                                      \
            R1 = vec_add (Y1,vx1);                                      \
            G1 = vec_add (Y1,uvx1);                                     \
            B1 = vec_add (Y1,ux1);                                      \
                                                                        \
            R  = vec_packclp (R0,R1);                                   \
            G  = vec_packclp (G0,G1);                                   \
            B  = vec_packclp (B0,B1);                                   \
                                                                        \
            out_pixels(R,G,B,oute);                                     \
                                                                        \
            R0 = vec_add (Y2,vx0);                                      \
            G0 = vec_add (Y2,uvx0);                                     \
            B0 = vec_add (Y2,ux0);                                      \
            R1 = vec_add (Y3,vx1);                                      \
            G1 = vec_add (Y3,uvx1);                                     \
            B1 = vec_add (Y3,ux1);                                      \
            R  = vec_packclp (R0,R1);                                   \
            G  = vec_packclp (G0,G1);                                   \
            B  = vec_packclp (B0,B1);                                   \
                                                                        \
                                                                        \
            out_pixels(R,G,B,outo);                                     \
                                                                        \
            y1i  += 16;                                                 \
            y2i  += 16;                                                 \
            ui   += 8;                                                  \
            vi   += 8;                                                  \
                                                                        \
        }                                                               \
                                                                        \
        outo  += (outstrides[0])>>4;                                    \
        oute  += (outstrides[0])>>4;                                    \
                                                                        \
        ui    += instrides_scl[1];                                      \
        vi    += instrides_scl[2];                                      \
        y1i   += instrides_scl[0];                                      \
        y2i   += instrides_scl[0];                                      \
    }                                                                   \
    return srcSliceH;                                                   \
}
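In the converter macro above, each chroma term is duplicated with `vec_mergeh (ux,ux)` and `vec_mergel (ux,ux)` so that one chroma sample serves two horizontally adjacent luma samples, and the same duplicated vectors are reused for the second row. A scalar sketch of that duplication follows. `merge_dup_model` is a hypothetical name, and the model assumes eight signed 16-bit lanes in big-endian element order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Duplicate 8 chroma-derived values into 16, split over two vectors:
 * hi models vec_mergeh(c, c), lo models vec_mergel(c, c). */
void merge_dup_model(const int16_t c[8], int16_t hi[8], int16_t lo[8])
{
    assert(c != NULL && hi != NULL && lo != NULL);
    for (int i = 0; i < 4; i++) {
        hi[2 * i]     = c[i];       /* pairs with luma pixels 2i and 2i+1 */
        hi[2 * i + 1] = c[i];
        lo[2 * i]     = c[4 + i];   /* pairs with luma pixels 8+2i and 8+2i+1 */
        lo[2 * i + 1] = c[4 + i];
    }
}
```

Under this reading, `hi` lines up with the first eight luma values of a 16-pixel group and `lo` with the last eight. This is how the macro combines `ux0`/`ux1` with `Y0`/`Y1` on the even row and with `Y2`/`Y3` on the odd row.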


#define out_abgr(a,b,c,ptr)  vec_mstrgb32(__typeof__(a),((__typeof__ (a)){255}),c,b,a,ptr)
#define out_bgra(a,b,c,ptr)  vec_mstrgb32(__typeof__(a),c,b,a,((__typeof__ (a)){255}),ptr)
#define out_rgba(a,b,c,ptr)  vec_mstrgb32(__typeof__(a),a,b,c,((__typeof__ (a)){255}),ptr)
#define out_argb(a,b,c,ptr)  vec_mstrgb32(__typeof__(a),((__typeof__ (a)){255}),a,b,c,ptr)
#define out_rgb24(a,b,c,ptr) vec_mstrgb24(a,b,c,ptr)
#define out_bgr24(a,b,c,ptr) vec_mstbgr24(a,b,c,ptr)

DEFCSP420_CVT (yuv2_abgr, out_abgr)
#if 1
DEFCSP420_CVT (yuv2_bgra, out_bgra)
#else
452 6a4970ab Diego Biurrun
static int altivec_yuv2_bgra32 (SwsContext *c,
453 42809816 Diego Biurrun
                                unsigned char **in, int *instrides,
454
                                int srcSliceY,        int srcSliceH,
455
                                unsigned char **oplanes, int *outstrides)
456 6a4970ab Diego Biurrun
{
457 42809816 Diego Biurrun
    int w = c->srcW;
458
    int h = srcSliceH;
459
    int i,j;
460
    int instrides_scl[3];
461
    vector unsigned char y0,y1;
462
463
    vector signed char  u,v;
464
465
    vector signed short Y0,Y1,Y2,Y3;
466
    vector signed short U,V;
467
    vector signed short vx,ux,uvx;
468
    vector signed short vx0,ux0,uvx0;
469
    vector signed short vx1,ux1,uvx1;
470
    vector signed short R0,G0,B0;
471
    vector signed short R1,G1,B1;
472
    vector unsigned char R,G,B;
473
474
    vector unsigned char *uivP, *vivP;
475
    vector unsigned char align_perm;
476
477
    vector signed short
478
        lCY  = c->CY,
479
        lOY  = c->OY,
480
        lCRV = c->CRV,
481
        lCBU = c->CBU,
482
        lCGU = c->CGU,
483
        lCGV = c->CGV;
484
485
    vector unsigned short lCSHIFT = c->CSHIFT;
486
487
    ubyte *y1i   = in[0];
488
    ubyte *y2i   = in[0]+w;
489
    ubyte *ui    = in[1];
490
    ubyte *vi    = in[2];
491
492
    vector unsigned char *oute
493
        = (vector unsigned char *)
494
          (oplanes[0]+srcSliceY*outstrides[0]);
495
    vector unsigned char *outo
496
        = (vector unsigned char *)
497
          (oplanes[0]+srcSliceY*outstrides[0]+outstrides[0]);
498
499
500
    instrides_scl[0] = instrides[0];
501
    instrides_scl[1] = instrides[1]-w/2;  /* the loop moves ui by w/2 */
502
    instrides_scl[2] = instrides[2]-w/2;  /* the loop moves vi by w/2 */
503
504
505
    for (i=0;i<h/2;i++) {
506
        vec_dstst (outo, (0x02000002|(((w*3+32)/32)<<16)), 0);
507
        vec_dstst (oute, (0x02000002|(((w*3+32)/32)<<16)), 1);
508
509
        for (j=0;j<w/16;j++) {
510
511
            y0 = vec_ldl (0,y1i);
512
            y1 = vec_ldl (0,y2i);
513
            uivP = (vector unsigned char *)ui;
514
            vivP = (vector unsigned char *)vi;
515
516
            align_perm = vec_lvsl (0, ui);
517
            u  = (vector signed char)vec_perm (uivP[0], uivP[1], align_perm);
518
519
            align_perm = vec_lvsl (0, vi);
520
            v  = (vector signed char)vec_perm (vivP[0], vivP[1], align_perm);
521
            u  = (vector signed char)
                 vec_sub (u,(vector signed char)
                          vec_splat((vector signed char){128},0));

            v  = (vector signed char)
                 vec_sub (v, (vector signed char)
                          vec_splat((vector signed char){128},0));

            U  = vec_unpackh (u);
            V  = vec_unpackh (v);


            Y0 = vec_unh (y0);
            Y1 = vec_unl (y0);
            Y2 = vec_unh (y1);
            Y3 = vec_unl (y1);

            Y0 = vec_mradds (Y0, lCY, lOY);
            Y1 = vec_mradds (Y1, lCY, lOY);
            Y2 = vec_mradds (Y2, lCY, lOY);
            Y3 = vec_mradds (Y3, lCY, lOY);

            /*   ux  = (CBU*(u<<CSHIFT)+0x4000)>>15 */
            ux = vec_sl (U, lCSHIFT);
            ux = vec_mradds (ux, lCBU, (vector signed short){0});
            ux0  = vec_mergeh (ux,ux);
            ux1  = vec_mergel (ux,ux);

            /* vx  = (CRV*(v<<CSHIFT)+0x4000)>>15;        */
            vx = vec_sl (V, lCSHIFT);
            vx = vec_mradds (vx, lCRV, (vector signed short){0});
            vx0  = vec_mergeh (vx,vx);
            vx1  = vec_mergel (vx,vx);
            /* uvx = ((CGU*u) + (CGV*v))>>15 */
            uvx = vec_mradds (U, lCGU, (vector signed short){0});
            uvx = vec_mradds (V, lCGV, uvx);
            uvx0 = vec_mergeh (uvx,uvx);
            uvx1 = vec_mergel (uvx,uvx);
            R0 = vec_add (Y0,vx0);
            G0 = vec_add (Y0,uvx0);
            B0 = vec_add (Y0,ux0);
            R1 = vec_add (Y1,vx1);
            G1 = vec_add (Y1,uvx1);
            B1 = vec_add (Y1,ux1);
            R  = vec_packclp (R0,R1);
            G  = vec_packclp (G0,G1);
            B  = vec_packclp (B0,B1);

            out_argb(R,G,B,oute);
            R0 = vec_add (Y2,vx0);
            G0 = vec_add (Y2,uvx0);
            B0 = vec_add (Y2,ux0);
            R1 = vec_add (Y3,vx1);
            G1 = vec_add (Y3,uvx1);
            B1 = vec_add (Y3,ux1);
            R  = vec_packclp (R0,R1);
            G  = vec_packclp (G0,G1);
            B  = vec_packclp (B0,B1);

            out_argb(R,G,B,outo);
            y1i  += 16;
            y2i  += 16;
            ui   += 8;
            vi   += 8;

        }

        outo  += (outstrides[0])>>4;
        oute  += (outstrides[0])>>4;

        ui    += instrides_scl[1];
        vi    += instrides_scl[2];
        y1i   += instrides_scl[0];
        y2i   += instrides_scl[0];
    }
    return srcSliceH;
}

#endif


DEFCSP420_CVT (yuv2_rgba, out_rgba)
DEFCSP420_CVT (yuv2_argb, out_argb)
DEFCSP420_CVT (yuv2_rgb24,  out_rgb24)
DEFCSP420_CVT (yuv2_bgr24,  out_bgr24)


// uyvy|uyvy|uyvy|uyvy
// 0123 4567 89ab cdef
static
const vector unsigned char
    demux_u = {0x10,0x00,0x10,0x00,
               0x10,0x04,0x10,0x04,
               0x10,0x08,0x10,0x08,
               0x10,0x0c,0x10,0x0c},
    demux_v = {0x10,0x02,0x10,0x02,
               0x10,0x06,0x10,0x06,
               0x10,0x0A,0x10,0x0A,
               0x10,0x0E,0x10,0x0E},
    demux_y = {0x10,0x01,0x10,0x03,
               0x10,0x05,0x10,0x07,
               0x10,0x09,0x10,0x0B,
               0x10,0x0D,0x10,0x0F};
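
/*
 * Illustrative aside, not part of the original file: a minimal scalar
 * sketch of what the three demux_* permute masks select from one 16-byte
 * UYVY vector. It assumes big-endian 16-bit lanes and a zero second
 * operand to vec_perm, so that index 0x10 supplies a 0 high byte.
 * demux_uyvy8 is a hypothetical helper name, not an FFmpeg function.
 */

```c
#include <stdint.h>

/* Scalar model of the demux_u / demux_v / demux_y permutes.
 * in[] holds 8 pixels as U0 Y0 V0 Y1 U1 Y2 V1 Y3 ...
 * Each chroma sample is shared by two pixels, so U and V come out
 * duplicated (u0 u0 u1 u1 ...), while Y is taken from every odd byte. */
static void demux_uyvy8(const uint8_t in[16],
                        int16_t Y[8], int16_t U[8], int16_t V[8])
{
    for (int k = 0; k < 8; k++) {
        int pair = k / 2;              /* index of the 4-byte UYVY group */
        U[k] = in[4 * pair + 0];       /* mask bytes 0x00,0x04,0x08,0x0c */
        V[k] = in[4 * pair + 2];       /* mask bytes 0x02,0x06,0x0A,0x0E */
        Y[k] = in[2 * k + 1];          /* mask bytes 0x01,0x03,...,0x0F  */
    }
}
```

/*
 * The zero-extension to 16 bits is what lets the result be handed
 * directly to a routine that works on vector signed short.
 */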

/*
  This exists so that I can play live CCIR raw video.
*/
static int altivec_uyvy_rgb32 (SwsContext *c,
                               const unsigned char **in, int *instrides,
                               int srcSliceY,        int srcSliceH,
                               unsigned char **oplanes, int *outstrides)
{
    int w = c->srcW;
    int h = srcSliceH;
    int i,j;
    vector unsigned char uyvy;
    vector signed   short Y,U,V;
    vector signed   short R0,G0,B0,R1,G1,B1;
    vector unsigned char  R,G,B;
    vector unsigned char *out;
    const ubyte *img;

    img = in[0];
    out = (vector unsigned char *)(oplanes[0]+srcSliceY*outstrides[0]);

    for (i=0;i<h;i++) {
        for (j=0;j<w/16;j++) {
            uyvy = vec_ld (0, img);
            U = (vector signed short)
                vec_perm (uyvy, (vector unsigned char){0}, demux_u);

            V = (vector signed short)
                vec_perm (uyvy, (vector unsigned char){0}, demux_v);

            Y = (vector signed short)
                vec_perm (uyvy, (vector unsigned char){0}, demux_y);

            cvtyuvtoRGB (c, Y,U,V,&R0,&G0,&B0);

            uyvy = vec_ld (16, img);
            U = (vector signed short)
                vec_perm (uyvy, (vector unsigned char){0}, demux_u);

            V = (vector signed short)
                vec_perm (uyvy, (vector unsigned char){0}, demux_v);

            Y = (vector signed short)
                vec_perm (uyvy, (vector unsigned char){0}, demux_y);

            cvtyuvtoRGB (c, Y,U,V,&R1,&G1,&B1);

            R  = vec_packclp (R0,R1);
            G  = vec_packclp (G0,G1);
            B  = vec_packclp (B0,B1);

            //      vec_mstbgr24 (R,G,B, out);
            out_rgba (R,G,B,out);

            img += 32;
        }
    }
    return srcSliceH;
}



/* Currently the acceleration routines only support
   input widths that are a multiple of 16
   and heights that are a multiple of 2.

   For anything else we just fall back to the C code.
*/
SwsFunc ff_yuv2rgb_init_altivec(SwsContext *c)
{
    if (!(c->flags & SWS_CPU_CAPS_ALTIVEC))
        return NULL;

    /*
      This check seems not to matter too much: I tried a bunch of
      videos with abnormal widths and MPlayer crashes elsewhere, e.g.
      mplayer -vo x11 -rawvideo on:w=350:h=240 raw-350x240.eyuv
      goes boom with an X11 bad match.

    */
    if ((c->srcW & 0xf) != 0)    return NULL;

    switch (c->srcFormat) {
    case PIX_FMT_YUV410P:
    case PIX_FMT_YUV420P:
    /*case IMGFMT_CLPL:        ??? */
    case PIX_FMT_GRAY8:
    case PIX_FMT_NV12:
    case PIX_FMT_NV21:
        if ((c->srcH & 0x1) != 0)
            return NULL;

        switch(c->dstFormat) {
        case PIX_FMT_RGB24:
            av_log(c, AV_LOG_WARNING, "ALTIVEC: Color Space RGB24\n");
            return altivec_yuv2_rgb24;
        case PIX_FMT_BGR24:
            av_log(c, AV_LOG_WARNING, "ALTIVEC: Color Space BGR24\n");
            return altivec_yuv2_bgr24;
        case PIX_FMT_ARGB:
            av_log(c, AV_LOG_WARNING, "ALTIVEC: Color Space ARGB\n");
            return altivec_yuv2_argb;
        case PIX_FMT_ABGR:
            av_log(c, AV_LOG_WARNING, "ALTIVEC: Color Space ABGR\n");
            return altivec_yuv2_abgr;
        case PIX_FMT_RGBA:
            av_log(c, AV_LOG_WARNING, "ALTIVEC: Color Space RGBA\n");
            return altivec_yuv2_rgba;
        case PIX_FMT_BGRA:
            av_log(c, AV_LOG_WARNING, "ALTIVEC: Color Space BGRA\n");
            return altivec_yuv2_bgra;
        default: return NULL;
        }
        break;

    case PIX_FMT_UYVY422:
        switch(c->dstFormat) {
        case PIX_FMT_BGR32:
            av_log(c, AV_LOG_WARNING, "ALTIVEC: Color Space UYVY -> RGB32\n");
            return altivec_uyvy_rgb32;
        default: return NULL;
        }
        break;

    }
    return NULL;
}

void ff_yuv2rgb_init_tables_altivec(SwsContext *c, const int inv_table[4], int brightness, int contrast, int saturation)
{
    union {
        DECLARE_ALIGNED(16, signed short, tmp)[8];
        vector signed short vec;
    } buf;

    buf.tmp[0] =  ((0xffffLL) * contrast>>8)>>9;                        //cy
    buf.tmp[1] =  -256*brightness;                                      //oy
    buf.tmp[2] =  (inv_table[0]>>3) *(contrast>>16)*(saturation>>16);   //crv
    buf.tmp[3] =  (inv_table[1]>>3) *(contrast>>16)*(saturation>>16);   //cbu
    buf.tmp[4] = -((inv_table[2]>>1)*(contrast>>16)*(saturation>>16));  //cgu
    buf.tmp[5] = -((inv_table[3]>>1)*(contrast>>16)*(saturation>>16));  //cgv


    c->CSHIFT = (vector unsigned short)vec_splat_u16(2);
    c->CY   = vec_splat ((vector signed short)buf.vec, 0);
    c->OY   = vec_splat ((vector signed short)buf.vec, 1);
    c->CRV  = vec_splat ((vector signed short)buf.vec, 2);
    c->CBU  = vec_splat ((vector signed short)buf.vec, 3);
    c->CGU  = vec_splat ((vector signed short)buf.vec, 4);
    c->CGV  = vec_splat ((vector signed short)buf.vec, 5);
    return;
}
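
/*
 * Illustrative aside, not part of the original file: a minimal scalar
 * sketch of the arithmetic that vec_mradds performs per 16-bit lane when
 * it is fed the coefficients set up above. It assumes the usual AltiVec
 * "multiply-high, round, add, saturate" semantics, matching the
 * "(CBU*(u<<CSHIFT)+0x4000)>>15" comments in the conversion code.
 * sat16 and mradds_scalar are hypothetical names used only here.
 */

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t x)
{
    if (x >  32767) return  32767;
    if (x < -32768) return -32768;
    return (int16_t)x;
}

/* One lane of vec_mradds(a, b, c): ((a*b + 0x4000) >> 15) + c, saturated.
 * Note: >> on a negative int32_t is implementation-defined in C; this
 * sketch assumes the common arithmetic-shift behaviour. */
static int16_t mradds_scalar(int16_t a, int16_t b, int16_t c)
{
    int32_t prod = ((int32_t)a * b + 0x4000) >> 15;
    return sat16(prod + c);
}
```

/*
 * Read this way, the coefficients are Q15 fixed-point factors: a value of
 * 16384 multiplies by 0.5, and the 0x4000 term rounds to nearest before
 * the shift.
 */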


void
ff_yuv2packedX_altivec(SwsContext *c,
                       const int16_t *lumFilter, const int16_t **lumSrc, int lumFilterSize,
                       const int16_t *chrFilter, const int16_t **chrSrc, int chrFilterSize,
                       uint8_t *dest, int dstW, int dstY)
{
    int i,j;
    vector signed short X,X0,X1,Y0,U0,V0,Y1,U1,V1,U,V;
    vector signed short R0,G0,B0,R1,G1,B1;

    vector unsigned char R,G,B;
    vector unsigned char *out,*nout;

    vector signed short   RND = vec_splat_s16(1<<3);
    vector unsigned short SCL = vec_splat_u16(4);
    DECLARE_ALIGNED(16, unsigned long, scratch)[16];

    vector signed short *YCoeffs, *CCoeffs;

    YCoeffs = c->vYCoeffsBank+dstY*lumFilterSize;
    CCoeffs = c->vCCoeffsBank+dstY*chrFilterSize;

    out = (vector unsigned char *)dest;

    for (i=0; i<dstW; i+=16) {
        Y0 = RND;
        Y1 = RND;
        /* extract 16 coeffs from lumSrc */
        for (j=0; j<lumFilterSize; j++) {
            X0 = vec_ld (0,  &lumSrc[j][i]);
            X1 = vec_ld (16, &lumSrc[j][i]);
            Y0 = vec_mradds (X0, YCoeffs[j], Y0);
            Y1 = vec_mradds (X1, YCoeffs[j], Y1);
        }

        U = RND;
        V = RND;
        /* extract 8 coeffs from U,V */
        for (j=0; j<chrFilterSize; j++) {
            X  = vec_ld (0, &chrSrc[j][i/2]);
            U  = vec_mradds (X, CCoeffs[j], U);
            X  = vec_ld (0, &chrSrc[j][i/2+2048]);
            V  = vec_mradds (X, CCoeffs[j], V);
        }

        /* scale and clip signals */
        Y0 = vec_sra (Y0, SCL);
        Y1 = vec_sra (Y1, SCL);
        U  = vec_sra (U,  SCL);
        V  = vec_sra (V,  SCL);

        Y0 = vec_clip_s16 (Y0);
        Y1 = vec_clip_s16 (Y1);
        U  = vec_clip_s16 (U);
        V  = vec_clip_s16 (V);

        /* now we have
          Y0= y0 y1 y2 y3 y4 y5 y6 y7     Y1= y8 y9 y10 y11 y12 y13 y14 y15
          U= u0 u1 u2 u3 u4 u5 u6 u7      V= v0 v1 v2 v3 v4 v5 v6 v7

          Y0= y0 y1 y2 y3 y4 y5 y6 y7    Y1= y8 y9 y10 y11 y12 y13 y14 y15
          U0= u0 u0 u1 u1 u2 u2 u3 u3    U1= u4 u4 u5 u5 u6 u6 u7 u7
          V0= v0 v0 v1 v1 v2 v2 v3 v3    V1= v4 v4 v5 v5 v6 v6 v7 v7
        */

        U0 = vec_mergeh (U,U);
        V0 = vec_mergeh (V,V);

        U1 = vec_mergel (U,U);
        V1 = vec_mergel (V,V);

        cvtyuvtoRGB (c, Y0,U0,V0,&R0,&G0,&B0);
        cvtyuvtoRGB (c, Y1,U1,V1,&R1,&G1,&B1);

        R  = vec_packclp (R0,R1);
        G  = vec_packclp (G0,G1);
        B  = vec_packclp (B0,B1);

        switch(c->dstFormat) {
        case PIX_FMT_ABGR:  out_abgr  (R,G,B,out); break;
        case PIX_FMT_BGRA:  out_bgra  (R,G,B,out); break;
        case PIX_FMT_RGBA:  out_rgba  (R,G,B,out); break;
        case PIX_FMT_ARGB:  out_argb  (R,G,B,out); break;
        case PIX_FMT_RGB24: out_rgb24 (R,G,B,out); break;
        case PIX_FMT_BGR24: out_bgr24 (R,G,B,out); break;
        default:
            {
                /* If this is reached, the caller should have called yuv2packedXinC
                   instead. */
                static int printed_error_message;
                if (!printed_error_message) {
                    av_log(c, AV_LOG_ERROR, "altivec_yuv2packedX doesn't support %s output\n",
                           sws_format_name(c->dstFormat));
                    printed_error_message=1;
                }
                return;
            }
        }
    }

    if (i < dstW) {
        i -= 16;

        Y0 = RND;
        Y1 = RND;
        /* extract 16 coeffs from lumSrc */
        for (j=0; j<lumFilterSize; j++) {
            X0 = vec_ld (0,  &lumSrc[j][i]);
            X1 = vec_ld (16, &lumSrc[j][i]);
            Y0 = vec_mradds (X0, YCoeffs[j], Y0);
            Y1 = vec_mradds (X1, YCoeffs[j], Y1);
        }

        U = RND;
        V = RND;
        /* extract 8 coeffs from U,V */
        for (j=0; j<chrFilterSize; j++) {
            X  = vec_ld (0, &chrSrc[j][i/2]);
            U  = vec_mradds (X, CCoeffs[j], U);
            X  = vec_ld (0, &chrSrc[j][i/2+2048]);
            V  = vec_mradds (X, CCoeffs[j], V);
        }

        /* scale and clip signals */
        Y0 = vec_sra (Y0, SCL);
        Y1 = vec_sra (Y1, SCL);
        U  = vec_sra (U,  SCL);
        V  = vec_sra (V,  SCL);

        Y0 = vec_clip_s16 (Y0);
        Y1 = vec_clip_s16 (Y1);
        U  = vec_clip_s16 (U);
        V  = vec_clip_s16 (V);

        /* now we have
           Y0= y0 y1 y2 y3 y4 y5 y6 y7     Y1= y8 y9 y10 y11 y12 y13 y14 y15
           U = u0 u1 u2 u3 u4 u5 u6 u7     V = v0 v1 v2 v3 v4 v5 v6 v7

           Y0= y0 y1 y2 y3 y4 y5 y6 y7    Y1= y8 y9 y10 y11 y12 y13 y14 y15
           U0= u0 u0 u1 u1 u2 u2 u3 u3    U1= u4 u4 u5 u5 u6 u6 u7 u7
           V0= v0 v0 v1 v1 v2 v2 v3 v3    V1= v4 v4 v5 v5 v6 v6 v7 v7
        */

        U0 = vec_mergeh (U,U);
        V0 = vec_mergeh (V,V);

        U1 = vec_mergel (U,U);
        V1 = vec_mergel (V,V);

        cvtyuvtoRGB (c, Y0,U0,V0,&R0,&G0,&B0);
        cvtyuvtoRGB (c, Y1,U1,V1,&R1,&G1,&B1);

        R  = vec_packclp (R0,R1);
        G  = vec_packclp (G0,G1);
        B  = vec_packclp (B0,B1);

        nout = (vector unsigned char *)scratch;
        switch(c->dstFormat) {
        case PIX_FMT_ABGR:  out_abgr  (R,G,B,nout); break;
        case PIX_FMT_BGRA:  out_bgra  (R,G,B,nout); break;
        case PIX_FMT_RGBA:  out_rgba  (R,G,B,nout); break;
        case PIX_FMT_ARGB:  out_argb  (R,G,B,nout); break;
        case PIX_FMT_RGB24: out_rgb24 (R,G,B,nout); break;
        case PIX_FMT_BGR24: out_bgr24 (R,G,B,nout); break;
        default:
            /* Unreachable, I think. */
            av_log(c, AV_LOG_ERROR, "altivec_yuv2packedX doesn't support %s output\n",
                   sws_format_name(c->dstFormat));
            return;
        }

        memcpy (&((uint32_t*)dest)[i], scratch, (dstW-i)/4);
    }

}