Revision 8a322796 libswscale/internal_bfin.S
```diff
@@ -2,8 +2,8 @@
  * Copyright (C) 2007 Marc Hoffman <marc.hoffman@analog.com>
  * April 20, 2007
  *
- * Blackfin Video Color Space Converters Operations
- * convert I420 YV12 to RGB in various formats,
+ * Blackfin video color space converter operations
+ * convert I420 YV12 to RGB in various formats
  *
  * This file is part of FFmpeg.
  *
```
```diff
@@ -24,8 +24,8 @@
 
 
 /*
-YUV420 to RGB565 conversion. This routine takes a YUV 420 planar macroblock
-and converts it to RGB565. R:5 bits, G:6 bits, B:5 bits.. packed into shorts
+YUV420 to RGB565 conversion. This routine takes a YUV 420 planar macroblock
+and converts it to RGB565. R:5 bits, G:6 bits, B:5 bits.. packed into shorts.
 
 
 The following calculation is used for the conversion:
```
```diff
@@ -34,36 +34,36 @@
 g = clipz((y-oy)*cy + cgv*(v-128) + cgu*(u-128))
 b = clipz((y-oy)*cy + cbu*(u-128))
 
-y,u,v are pre scaled by a factor of 4 i.e. left shifted to gain precision.
+y,u,v are prescaled by a factor of 4 i.e. left-shifted to gain precision.
 
 
 New factorization to eliminate the truncation error which was
-occuring due to the byteop3p.
+occurring due to the byteop3p.
 
 
-1) use the bytop16m to subtract quad bytes we use this in U8 this
+1) Use the bytop16m to subtract quad bytes we use this in U8 this
 then so the offsets need to be renormalized to 8bits.
 
-2) scale operands up by a factor of 4 not 8 because Blackfin
+2) Scale operands up by a factor of 4 not 8 because Blackfin
 multiplies include a shift.
 
-3) compute into the accumulators cy*yx0, cy*yx1
+3) Compute into the accumulators cy*yx0, cy*yx1.
 
-4) compute each of the linear equations
+4) Compute each of the linear equations:
 r = clipz((y - oy) * cy + crv * (v - 128))
 
 g = clipz((y - oy) * cy + cgv * (v - 128) + cgu * (u - 128))
 
 b = clipz((y - oy) * cy + cbu * (u - 128))
 
-reuse of the accumulators requires that we actually multiply
-twice once with addition and the second time with a subtaction.
+Reuse of the accumulators requires that we actually multiply
+twice once with addition and the second time with a subtraction.
 
-because of this we need to compute the equations in the order R B
+Because of this we need to compute the equations in the order R B
 then G saving the writes for B in the case of 24/32 bit color
 formats.
 
-api: yuv2rgb_kind (uint8_t *Y, uint8_t *U, uint8_t *V, int *out,
+API: yuv2rgb_kind (uint8_t *Y, uint8_t *U, uint8_t *V, int *out,
 int dW, uint32_t *coeffs);
 
 A B
```
```diff
@@ -77,13 +77,13 @@
 
 coeffs is a pointer to oy.
 
-the {rgb} masks are only utilized by the 565 packing algorithm. Note the data
-replication is used to simplify the internal algorithms for the dual mac architecture
-of BlackFin.
+The {rgb} masks are only utilized by the 565 packing algorithm. Note the data
+replication is used to simplify the internal algorithms for the dual Mac
+architecture of BlackFin.
 
-All routines are exported with _ff_bfin_ as a symbol prefix
+All routines are exported with _ff_bfin_ as a symbol prefix.
 
-rough performance gain compared against -O3:
+Rough performance gain compared against -O3:
 
 2779809/1484290 187.28%
 
```
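The gain figure on the last context line is a plain ratio of two numbers. The comment does not state their unit. If they are costs (for example, cycle counts) for the -O3 C build and the assembly version respectively, the arithmetic is:

$$\frac{2779809}{1484290} \approx 1.8728 = 187.28\%$$

Under that reading, the assembly version runs about 1.87 times as fast. Equivalently, it takes roughly 53% of the reference build's cost.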