Lossless RGB to Y'CbCr transformation

I am trying to losslessly compress an image, and in order to take advantage of regularities, I want to convert the image from RGB to Y'CbCr. (The exact details of what I mean by RGB and Y'CbCr are not important here; the RGB data consists of three bytes, and I have three bytes to store the result in.)
The conversion process itself is pretty straightforward, but there is one problem: although the transformation is mathematically invertible, in practice there will be rounding errors. Of course these errors are small and virtually unnoticeable, but it does mean that the process is not lossless any more.
My question is: does a transformation exist, that converts three eight-bit integers (representing red, green and blue components) into three other eight-bit integers (representing a colour space similar to Y'CbCr, where two components change only slightly with respect to position, or at least less than in an RGB colour space), and that can be inverted without loss of information?

YCoCg24
Here is one color transformation I call "YCoCg24" that converts three eight-bit integers (representing red, green and blue components) into three other eight-bit (signed) integers (representing a colour space similar to Y'CbCr), and is bijective (and therefore can be inverted without loss of information):
G          R          B           Y          Cg         Co
|          |          |           |          |          |
|          |->-(-1)->(+)         (+)<-(-/2)-<|          |
|          |          |           |          |          |
|         (+)<-(/2)-<-|           |->-(+1)->(+)         |
|          |          |           |          |          |
|->-(-1)->(+)         |           |         (+)<-(-/2)-<|
|          |          |           |          |          |
(+)<-(/2)-<|          |           |          |->-(+1)->(+)
|          |          |           |          |          |
Y          Cg         Co          G          R          B

     forward transformation           reverse transformation
or in pseudocode:
function forward_lift( x, y ):
    signed int8 diff = ( y - x ) mod 0x100
    average = ( x + ( diff >> 1 ) ) mod 0x100
    return ( average, diff )

function reverse_lift( average, signed int8 diff ):
    x = ( average - ( diff >> 1 ) ) mod 0x100
    y = ( x + diff ) mod 0x100
    return ( x, y )

function RGB_to_YCoCg24( red, green, blue ):
    (temp, Co) = forward_lift( red, blue )
    (Y, Cg) = forward_lift( green, temp )
    return ( Y, Cg, Co )

function YCoCg24_to_RGB( Y, Cg, Co ):
    (green, temp) = reverse_lift( Y, Cg )
    (red, blue) = reverse_lift( temp, Co )
    return ( red, green, blue )
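For anyone who wants to try it, here is my own C transliteration of the pseudocode above (a sketch, not any reference implementation). It relies on two's-complement wrap-around in the int8_t conversion and on an arithmetic right shift of the signed difference, exactly as the pseudocode does; both hold on mainstream compilers:

#include <stdint.h>
#include <assert.h>

/* One lifting step: (x, y) -> (average, diff), bijective on byte pairs. */
static void forward_lift(uint8_t x, uint8_t y, uint8_t *average, int8_t *diff)
{
    *diff = (int8_t)(y - x);                 /* (y - x) mod 0x100, as signed */
    *average = (uint8_t)(x + (*diff >> 1));  /* arithmetic shift of signed diff */
}

static void reverse_lift(uint8_t average, int8_t diff, uint8_t *x, uint8_t *y)
{
    *x = (uint8_t)(average - (diff >> 1));
    *y = (uint8_t)(*x + diff);
}

static void RGB_to_YCoCg24(uint8_t r, uint8_t g, uint8_t b,
                           uint8_t *Y, int8_t *Cg, int8_t *Co)
{
    uint8_t temp;
    forward_lift(r, b, &temp, Co);
    forward_lift(g, temp, Y, Cg);
}

static void YCoCg24_to_RGB(uint8_t Y, int8_t Cg, int8_t Co,
                           uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t temp;
    reverse_lift(Y, Cg, g, &temp);
    reverse_lift(temp, Co, r, b);
}

/* Exhaustively confirm the round trip is lossless for all 2^24 colors. */
int main(void)
{
    for (unsigned c = 0; c < (1u << 24); c++) {
        uint8_t r = (uint8_t)(c >> 16), g = (uint8_t)(c >> 8), b = (uint8_t)c;
        uint8_t Y, r2, g2, b2; int8_t Cg, Co;
        RGB_to_YCoCg24(r, g, b, &Y, &Cg, &Co);
        YCoCg24_to_RGB(Y, Cg, Co, &r2, &g2, &b2);
        assert(r == r2 && g == g2 && b == b2);
    }
    return 0;
}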
Some example colors:
color        R G B      YCoCg24
white        0xFFFFFF   0xFF0000
light grey   0xEFEFEF   0xEF0000
dark grey    0x111111   0x110000
black        0x000000   0x000000
red          0xFF0000   0xFF01FF
lime         0x00FF00   0xFF0001
blue         0x0000FF   0xFFFFFF
G, R-G, B-G color space
Here is another color transformation that converts three eight-bit integers into three other eight-bit integers:
function RGB_to_GCbCr( red, green, blue ):
    Cb = ( blue - green ) mod 0x100
    Cr = ( red - green ) mod 0x100
    return ( green, Cb, Cr )

function GCbCr_to_RGB( green, Cb, Cr ):
    blue = ( Cb + green ) mod 0x100
    red = ( Cr + green ) mod 0x100
    return ( red, green, blue )
Some example colors:
color        R G B      GCbCr
white        0xFFFFFF   0xFF0000
light grey   0xEFEFEF   0xEF0000
dark grey    0x111111   0x110000
black        0x000000   0x000000
comments
There seem to be quite a few lossless color space transforms.
Several lossless color space transforms are mentioned in Henrique S. Malvar, et al. "Lifting-based reversible color transformations for image compression";
there's the lossless colorspace transformation in JPEG XR;
the original reversible color transform (ORCT) used in several "lossless JPEG" proposals;
G, R-G, B-G color space;
etc.
Malvar et al seem pretty excited about the 26-bit YCoCg-R representation of a 24-bit RGB pixel.
However, nearly all of them require more than 24 bits to store the transformed pixel color.
The "lifting" technique I use in YCoCg24 is similar to the one in Malvar et al and to the lossless colorspace transformation in JPEG XR.
Because addition is reversible (and addition modulo 0x100 is bijective), any transform from (a,b) to (x,y) that can be produced by the following Feistel network is reversible and bijective:
a            b
|            |
|->----F--->(+)
|            |
(+)<---G----<|
|            |
x            y
where (+) indicates 8-bit addition (modulo 0x100), a, b, x, y are all 8-bit values, and F and G are arbitrary functions.
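As a quick illustration (my own sketch; the F and G below are arbitrary, deliberately non-invertible choices), here is that network on bytes in C, where the unsigned casts give the mod-0x100 arithmetic for free:

#include <stdint.h>
#include <assert.h>

/* F and G may be ANY byte -> byte functions; they need not be invertible. */
static uint8_t F(uint8_t v) { return (uint8_t)(v * v + 41); }  /* arbitrary */
static uint8_t G(uint8_t v) { return (uint8_t)(v ^ 0x5A); }    /* arbitrary */

static void feistel_forward(uint8_t a, uint8_t b, uint8_t *x, uint8_t *y)
{
    *y = (uint8_t)(b + F(a));   /* y = b + F(a)  (mod 0x100) */
    *x = (uint8_t)(a + G(*y));  /* x = a + G(y)  (mod 0x100) */
}

static void feistel_reverse(uint8_t x, uint8_t y, uint8_t *a, uint8_t *b)
{
    *a = (uint8_t)(x - G(y));   /* undo the additions in reverse order */
    *b = (uint8_t)(y - F(*a));
}

int main(void)
{
    /* Exhaustively check bijectivity over all 2^16 input pairs. */
    for (unsigned v = 0; v < (1u << 16); v++) {
        uint8_t a = (uint8_t)(v >> 8), b = (uint8_t)v, x, y, a2, b2;
        feistel_forward(a, b, &x, &y);
        feistel_reverse(x, y, &a2, &b2);
        assert(a == a2 && b == b2);
    }
    return 0;
}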
details
Why do you only have 3 bytes to store the result in?
That sounds like a premature optimization.
If your goal is to losslessly compress an image into as small a compressed file as possible in a reasonable amount of time, then the size of the intermediate stages is irrelevant.
It may even be counter-productive --
a "larger" intermediate representation (such as Reversible Colour Transform or the 26-bit YCoCg-R) may result in smaller final compressed file size than a "smaller" intermediate representation (such as RGB or YCoCg24).
EDIT:
Oopsies.
Either of "(x) mod 0x100" or "(x) & 0xff" gives exactly the same results --
the results I wanted.
But somehow I jumbled them together and produced something that wouldn't work.

I did find one such solution, used by JPEG 2000. It is called a Reversible Colour Transform (RCT), and it is described at Wikipedia as well as the JPEG site (though the rounding methods are not consistent). The results are not as good as with the irreversible colour transform, however.
I also found a better method described in the paper Improved Reversible Integer-to-integer Color Transforms by Soo-Chang Pei and Jian-Jiun Ding. However, the methods described in that paper, and the method used by JPEG 2000, require extra bits to store the result. This means that the transformed values do not fit in 24 bits any more.

Related

Colorspace management in Postscript, Ghostscript, GSView

I am attempting to write some Postscript in order to produce artwork in files I can send to a printer to get some signs printed.
The printer has various requirements for PDFs, one of which is that they should use CMYK.
In all my prior use of Postscript I have used setrgbcolor and never really dealt with colorspace management, ICC profiles, etc.
One of the colours I am using is called RAL 5017 (Traffic Blue) (I originally wrote RAL 1507, which is a red), with RGB and CMYK values I obtained by using a search engine for the colour name. I checked by using an online RGB to CMYK converter (with no specified colourspace profile).
I thought I'd try setcmykcolor and created the following:
%!PS-Adobe-3.0
%
% Test use of CMYK in Postscript in preparation for creating a PDF/A-1a file
% for use by a commercial printer.
%
%%Pages: 1
%%Page: One 1
/Helvetica-Bold 20 selectfont
0 90 255 div 140 255 div setrgbcolor
100 100 250 100 rectfill
120 130 moveto 1 setgray (RGB: 0 90 140) show
100 255 div 60 255 div 0 10 255 div setcmykcolor
100 200 250 100 rectfill
120 230 moveto 1 setgray (CMYK: 100 60 0 10) show
100 255 div 36 255 div 0 45 255 div setcmykcolor
100 300 250 100 rectfill
120 330 moveto 1 setgray (CMYK: 100 36 0 45) show
0 0 1 setrgbcolor
100 400 250 100 rectfill
120 430 moveto 1 setgray (RGB: 0 0 255) show
showpage
%%EOF
(Forgive the DSC - it's intended to be just enough to placate GSView)
GSView 5.0 on MS-Windows 10 with Ghostscript 9.05 renders it like this (screenshot omitted):
I had expected at least one of the CMYK colours to be rendered much closer to the bottom RGB colour.
The colour in question is designed for printing road-signs, so I'd be surprised if it is outside the relevant colour gamut used by commercial printers.
What do I need to do to be confident the printer will print my CMYK value with a result close to what I expect from GSView's rendering of the RGB value?
I don't know where you got the CMYK values from, but they are not (IMO) a good representation of the RGB colour. Try 0.74 0.44 0 0.27 setcmykcolor instead.
The numbers you have used would be reasonable if you treated them as percentages, not as values in the range 0-255. 100% cyan, 36% magenta, 0% yellow and 45% black produces quite a respectable match. I wonder if that's your mistake?
That would be:
1 0.36 0 0.45 setcmykcolor
By the way, I think you mean RAL 5017, not 1507 which is red.
On top of that, bear in mind that you are converting an RGB colour to CMYK, then displaying that CMYK value on an RGB monitor, which involves converting it back to RGB, so some loss of precision is to be expected.
The highly simplistic calculation given in the Red Book (PostScript Language Reference Manual) is that cyan = 1 - red, magenta = 1 - green, yellow = 1 - blue. However, equal values of cyan, magenta and yellow do not generally produce black, so we also apply undercolor removal:
take the lowest of c, m, y and make that value k (black), then subtract k from each of c, m, y. The final result is:
c = 1 - red
m = 1 - green
y = 1 - blue
k = min (c, m, y)
cyan = c - k
magenta = m - k
yellow = y - k
black = k
For your values (mapped to the range 0-1, assuming they were given in the range 0-255):
red = 0
green = 0.353
blue = 0.549
c = 1 - 0 = 1
m = 1 - 0.353 = 0.647
y = 1 - 0.549 = 0.451
k = 0.451
cyan = 1 - 0.451 = 0.549
magenta = 0.647 - 0.451 = 0.196
yellow = 0.451 - 0.451 = 0
black = 0.451
so
0.549 0.196 0 0.451 setcmykcolor
That's a cheap and cheerful calculation; it's intended to be done by a PostScript interpreter in a printer, so it's chosen to be quick rather than accurate. But I think you'll see that it's closer than the values you were using.
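In C, that calculation might look like this (a sketch of the Red Book rule with the full undercolor removal described above; there is nothing colour-managed about it):

#include <stdio.h>

/* Naive RGB -> CMYK: complement each channel, then move the common grey
 * component into black (100% undercolor removal). All values in [0, 1]. */
static void rgb_to_cmyk(double r, double g, double b,
                        double *cyan, double *magenta, double *yellow, double *black)
{
    double c = 1.0 - r;
    double m = 1.0 - g;
    double y = 1.0 - b;
    double k = c < m ? (c < y ? c : y) : (m < y ? m : y);  /* min(c, m, y) */

    *cyan = c - k;
    *magenta = m - k;
    *yellow = y - k;
    *black = k;
}

int main(void)
{
    /* The question's colour, RGB (0, 90, 140), scaled to 0..1. */
    double c, m, y, k;
    rgb_to_cmyk(0.0, 90.0 / 255.0, 140.0 / 255.0, &c, &m, &y, &k);
    printf("%.3f %.3f %.3f %.3f setcmykcolor\n", c, m, y, k);
    /* prints: 0.549 0.196 0.000 0.451 setcmykcolor */
    return 0;
}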
For proper colour space management, the RGB colours you are using would be values in a particular RGB space, for example the colour space of your monitor. You would then use the ICC profile associated with that device to turn the RGB values into values in the CIE XYZ space (a device-independent space). Then you would choose a particular destination CMYK space (e.g. the printer you want to use) and use the ICC profile associated with the destination device to go the other way, turning the XYZ values into CMYK values.
In a properly colour managed workflow, where all the devices are characterised by ICC profiles, the result is that the colour on all the devices would be as close as it possible to get to being the same.
Of course, this relies on you having everything characterised, which clearly you don't.
Note that spot colours (/Separation colours in PostScript and PDF) are somewhat 'different'. These are intended to be printed using the specific ink so there's no need to characterise the values, 50% Pantone 1495 is an absolutely accurate value.
However, if your printer isn't equipped to print that colour, because for example you're doing a quick check on your local CMYK printer, these colours are normally defined to have an 'alternate' representation. Ideally these would be CMYK values which will print something which is not entirely unlike the desired colour. Some ink manufacturers specify an alternate representation which is not a particularly good representation of the actual colour, arguably because they have a number of inks which map to the same colour in CMYK, so they use 'off' values to be able to tell the difference. Suspicious users have been known to comment that it's done to make sure you can't do a decent print without using the manufacturer's inks...

Move item in list with sorted numeric-index

Case: Consider a list stored in an SQL database with 2 columns:
------------------------
 Note   | Index_Value
------------------------
 Yellow | 4
 Red    | 7
 Blue   | 3
 Grey   | 30
The user is shown the data of this list simply sorted on the Index_Value column, which displays as:
Blue
Yellow
Red
Grey
How do I move Grey between Red & Blue, changing only Grey's index value and leaving every other row untouched?
I have come up with some simple maths, but it's far from perfect.
Solution: the midpoint of the values between which the object is moved:
( Index_Value(Previous) + Index_Value(After) ) / 2
(7 + 3) / 2 = 5
This approach produces long decimal fractions after just a few repetitions.
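A quick sketch (my own illustration, not from the question) of why the midpoint scheme degrades: with floating-point index values, each insertion between the same two neighbours halves the gap, and after a few dozen insertions a double can no longer represent a distinct midpoint at all:

#include <stdio.h>

int main(void)
{
    /* Repeatedly insert a new row between `low` and the last inserted row. */
    double low = 3.0, high = 7.0;
    int n = 0;
    for (;;) {
        double mid = (low + high) / 2.0;
        if (mid == low || mid == high)  /* gap no longer representable */
            break;
        high = mid;                     /* new row becomes the upper neighbour */
        n++;
    }
    printf("ran out of distinct midpoints after %d insertions\n", n);
    return 0;
}

(With a fixed-precision DECIMAL column the breakdown comes even sooner, which is why schemes like this usually renumber the whole list periodically.)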

How to find out the intersection of two coplanar lines in C

I have two 3D lines which lie on the same plane. line1 is defined by a point (x1, y1, z1) and its direction vector (a1, b1, c1) while line2 is defined by a point (x2, y2, z2) and its direction vector (a2, b2, c2). Then the parametric equations for both lines are
x = x1 + a1*t; x = x2 + a2*s;
y = y1 + b1*t; y = y2 + b2*s;
z = z1 + c1*t; z = z2 + c2*s;
If both direction vectors are nonzero, we can find the location of the intersection point easily by equating the right-hand sides of the equations above and solving for t and s from any two of the three. However, it's possible that a1, b1, c1, a2, b2, c2 are not all nonzero, so I can't solve those equations in the same way. My current thought is to deal with this issue case by case, like
case1: a1 = 0, others are nonzero
case2: a2 = 0, others are nonzero
case3: b1 = 0, others are nonzero
...
However, there are so many cases in total that the implementation would become messy. Are there any good ways to tackle this problem? Any references? Thanks a lot!
It is much more practical to see this as a vector equation. A dot . is a scalar product, and A, n, B, m are vectors describing the lines. Point A is on the first line, whose direction is n. Directions are normalized: n.n = 1 and m.m = 1. The point of intersection C is such that:
C = A + n t = B + m s
where t and s are scalar parameters to be computed.
Therefore, dotting with n:
A.n + t = B.n + (m.n) s
t = (B-A).n + (m.n) s
And dotting with m:
A.m + (n.m) t = B.m + s
A.m + (n.m) (B-A).n + (m.n)^2 s = B.m + s
(n.m) (B-A).n + (A-B).m = (1 - (m.n)^2) s
Since n.n = m.m = 1 and n and m are not aligned, (m.n)^2 < 1:
s = [ (n.m) (B-A).n + (A-B).m ] / [ 1 - (m.n)^2 ]
t = (B-A).n + (m.n) s
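Here is a sketch of that closed form in C (my own transliteration of the formulas above; it assumes n and m are already normalized and not parallel, since it divides by 1 - (m.n)^2):

#include <stdio.h>

typedef struct { double x, y, z; } vec3;

static double dot(vec3 u, vec3 v) { return u.x*v.x + u.y*v.y + u.z*v.z; }
static vec3   sub(vec3 u, vec3 v) { vec3 r = { u.x-v.x, u.y-v.y, u.z-v.z }; return r; }

/* Intersection of lines A + t*n and B + s*m; n and m must be unit vectors. */
static vec3 intersect(vec3 A, vec3 n, vec3 B, vec3 m)
{
    vec3 BA = sub(B, A);
    double mn = dot(m, n);
    double s  = (mn * dot(BA, n) - dot(BA, m)) / (1.0 - mn * mn);
    double t  = dot(BA, n) + mn * s;
    vec3 C = { A.x + t * n.x, A.y + t * n.y, A.z + t * n.z };
    return C;  /* with exact inputs, B + s*m is the same point */
}

int main(void)
{
    vec3 A = {0, 0, 0}, n = {1, 0, 0};   /* the x axis */
    vec3 B = {2, 1, 0}, m = {0, 1, 0};   /* vertical line through (2,1,0) */
    vec3 C = intersect(A, n, B, m);
    printf("%g %g %g\n", C.x, C.y, C.z); /* prints: 2 0 0 */
    return 0;
}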
You can solve this as a linear system:
| 1 0 0 -a1   0 |           | x1 |
| 0 1 0 -b1   0 |   | x |   | y1 |
| 0 0 1 -c1   0 |   | y |   | z1 |
| 1 0 0   0 -a2 | . | z | = | x2 |
| 0 1 0   0 -b2 |   | t |   | y2 |
| 0 0 1   0 -c2 |   | s |   | z2 |
x, y, z is the intersection point, and t and s are the parameters of the two lines. This solves the same equations that @francis wrote, with the advantage that it also obtains the solution that minimizes the error in case your data are not perfect.
This equation is usually expressed as Ax = b, and can be solved by computing x = A^+ * b, where A^+ is the (Moore-Penrose) pseudo-inverse of A (the system is overdetermined: six equations, five unknowns). All the linear algebra libraries implement some function to solve systems like this, so don't worry.
It might be vital to remember that calculations are never exact, and small deviations in your constants and calculations can make your lines not exactly intersect.
Therefore, let's solve a more general problem - find the values of t and s for which the distance between the corresponding points in the lines is minimal. This is clearly a task for calculus, and it's easy (because linear functions are the easiest ones in calculus).
So the points are
[xyz1]+[abc1]*t
and
[xyz2]+[abc2]*s
(here [xyz1] is a 3-vector [x1, y1, z1] and so on)
The (square of) the distance between them:
([abc1]*t - [abc2]*s + [xyz1]-[xyz2])^2
(here ^2 is a scalar product of a 3-vector with itself)
Let's find a derivative of this with respect to t:
[abc1] * ([abc1]*t - [abc2]*s + [xyz1]-[xyz2]) (multiplied by 2, but this doesn't matter)
(here the first * is a scalar product, and the other *s are regular multiplications between a vector and a number)
The derivative should be equal to zero at the minimum point:
[abc1] * ([abc1]*t - [abc2]*s + [xyz1]-[xyz2]) = 0
Let's use the derivative with respect to s too - we want it to be zero too.
[abc1]*[abc1]*t - [abc1]*[abc2]*s = -[abc1]*([xyz1]-[xyz2])
-[abc2]*[abc1]*t + [abc2]*[abc2]*s = [abc2]*([xyz1]-[xyz2])
From here, let's find t and s.
Then, let's find the two points that correspond to these t and s. If all calculations were ideal, these points would coincide. However, at this point you are practically guaranteed to get some small deviations, so take either of these points as your result (the intersection of the two lines).
It might be better to take the average of these points, to make the result symmetrical.
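A sketch of that recipe in C (my own code, not from any particular library; here [xyz1] and [abc1] become p1 and d1, the directions need not be normalized, and nearly parallel lines are rejected by testing the determinant of the 2x2 system):

#include <math.h>

typedef struct { double x, y, z; } vec3;

static double dot(vec3 u, vec3 v) { return u.x*v.x + u.y*v.y + u.z*v.z; }

/* "Intersection" of lines p1 + t*d1 and p2 + s*d2, computed as the midpoint
 * of the two mutually closest points. Returns 0 on success, -1 if the
 * lines are (nearly) parallel. */
static int line_intersection(vec3 p1, vec3 d1, vec3 p2, vec3 d2, vec3 *out)
{
    vec3 w = { p1.x - p2.x, p1.y - p2.y, p1.z - p2.z };  /* [xyz1] - [xyz2] */

    /* The 2x2 system obtained by zeroing both derivatives:
     *    (d1.d1) t - (d1.d2) s = -d1.w
     *   -(d2.d1) t + (d2.d2) s =  d2.w        */
    double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    double r1 = -dot(d1, w), r2 = dot(d2, w);

    double det = a * c - b * b;          /* zero iff d1 and d2 are parallel */
    if (fabs(det) <= 1e-12 * a * c)
        return -1;

    double t = (c * r1 + b * r2) / det;  /* Cramer's rule */
    double s = (b * r1 + a * r2) / det;

    /* Midpoint of the closest points on the two lines. */
    out->x = 0.5 * ((p1.x + t * d1.x) + (p2.x + s * d2.x));
    out->y = 0.5 * ((p1.y + t * d1.y) + (p2.y + s * d2.y));
    out->z = 0.5 * ((p1.z + t * d1.z) + (p2.z + s * d2.z));
    return 0;
}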

Fastest approximative methods to convert YUV to RGBA?

I am looking for the fastest method to convert a YUV array into an RGBA array. For example, given a YCbYCr array, which is a sequence of bytes:
YCbYCr = Luma0, Cb0, Luma1, Cr0, Luma2, ...
where the 8-bit blue-difference and red-difference chroma components are sampled with period 2, but the 8-bit luma is sampled with period 1. For example:
image pixel (0,0) has luma0, Cr0 red-difference chroma, Cb0 blue-difference chroma
image pixel (0,1) has luma1, Cr0 red-difference chroma, Cb0 blue-difference chroma
image pixel (0,2) has luma2, Cr1 red-difference chroma, Cb1 blue-difference chroma
image pixel (0,3) has luma3, Cr1 red-difference chroma, Cb1 blue-difference chroma
etc.
An RGBA array should be produced, which is:
RGBA = R0, G0, B0, A0, R1, G1, ...
where all elements are unsigned chars and all bits of each A# are zero.
The luma component in YCbYCr is a 0..255 unsigned char; Cb and Cr are -127..+126 signed chars. There is a standard approach --- matrix multiplication --- but it's very slow for real-time apps, and it operates on floating-point numbers. I am looking for a fast approximate numerical method.
The biggest single computational saving you're likely to get is simply by doing the computation in fixed-point rather than floating-point. It's likely to be an order of magnitude faster (at a guess).
You can also take advantage of the redundancy in the subsampled chroma contributions. Given that the full matrix multiply is of the form:
| R |   | a b c |   | Y  |
| G | = | d e f | . | Cb |
| B |   | g h i |   | Cr |
You can compute the chroma partial sum half as often:
| R' |   | b c |
| G' | = | e f | . | Cb |
| B' |   | h i |   | Cr |
and then simply add it to the luma contribution at the full output rate:
| R |   | R' |   | a |
| G | = | G' | + | d | . Y
| B |   | B' |   | g |
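Putting both ideas together, here is a fixed-point sketch in C. The coefficients are an assumption on my part (BT.601 full-range constants scaled by 256, since the question doesn't pin down the matrix), and right shifts of negative values are assumed to be arithmetic, as on mainstream compilers:

#include <stdint.h>

static uint8_t clamp8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

/* Converts one YCbYCr group (two pixels sharing Cb/Cr) into two RGBA pixels,
 * integer arithmetic only. Coefficients are BT.601 full-range constants
 * scaled by 256, e.g. 1.402 * 256 ~= 359. */
static void ycbycr_to_rgba(const uint8_t in[4], uint8_t out[8])
{
    int y0 = in[0];
    int cb = (int8_t)in[1];   /* Cb and Cr are signed, centred on zero */
    int y1 = in[2];
    int cr = (int8_t)in[3];

    /* Chroma partial sums, computed once per pair of pixels. */
    int rc = ( 359 * cr) >> 8;               /*  1.402 Cr            */
    int gc = (-88 * cb - 183 * cr) >> 8;     /* -0.344 Cb - 0.714 Cr */
    int bc = ( 454 * cb) >> 8;               /*  1.772 Cb            */

    /* Luma contribution added at the full output rate; A bits all zero. */
    out[0] = clamp8(y0 + rc); out[1] = clamp8(y0 + gc);
    out[2] = clamp8(y0 + bc); out[3] = 0;
    out[4] = clamp8(y1 + rc); out[5] = clamp8(y1 + gc);
    out[6] = clamp8(y1 + bc); out[7] = 0;
}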

Fast Hypotenuse Algorithm for Embedded Processor?

Is there a clever/efficient algorithm for determining the hypotenuse of a right triangle (i.e. sqrt(a² + b²)), using fixed-point math on an embedded processor without hardware multiply?
If the result doesn't have to be particularly accurate, you can get a crude
approximation quite simply:
Take absolute values of a and b, and swap if necessary so that you have a <= b. Then:
h = ((sqrt(2) - 1) * a) + b
To see intuitively how this works, consider the way that a shallow angled line is plotted on a pixel display (e.g. using Bresenham's algorithm). It looks something like this:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | | | | | | | | | | | | | | |*|*|*| ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | | | | | | | | | | |*|*|*|*| | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | | | | | | |*|*|*|*| | | | | | | | a pixels
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | | |*|*|*|*| | | | | | | | | | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|*|*|*|*| | | | | | | | | | | | | | | | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
<-------------- b pixels ----------->
For each step in the b direction, the next pixel to be plotted is either immediately to the right, or one pixel up and to the right.
The ideal line from one end to the other can be approximated by the path which joins the centre of each pixel to the centre of the adjacent one. This is a series of a diagonal segments of length sqrt(2) and (b - a) horizontal segments of length 1 (taking a pixel to be the unit of measurement). Hence the above formula.
This clearly gives an accurate answer for a == 0 and a == b; but gives an over-estimate for values in between.
The error depends on the ratio b/a; the maximum error occurs when b = (1 + sqrt(2)) * a and turns out to be 2/sqrt(2+sqrt(2)), or about 8.24% over the true value. That's not great, but if it's good enough for your application, this method has the advantage of being simple and fast. (The multiplication by a constant can be written as a sequence of shifts and adds.)
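For instance, here is my own sketch of that shifts-and-adds idea in C: (sqrt(2) - 1) ≈ 0.414 is close to 1/4 + 1/8 + 1/32 + 1/128 ≈ 0.4141, so no multiply instruction is needed:

#include <stdlib.h>

/* Crude hypotenuse: h ~= (sqrt(2)-1)*a + b with a <= b; over-estimates by
 * up to ~8.24 %. The constant is built from shifts: 1/4+1/8+1/32+1/128. */
unsigned hypot_crude(int x, int y)
{
    unsigned a = (unsigned)abs(x), b = (unsigned)abs(y);
    if (a > b) { unsigned t = a; a = b; b = t; }
    return b + (a >> 2) + (a >> 3) + (a >> 5) + (a >> 7);
}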
For the record, here are a few more approximations, listed in roughly
increasing order of complexity and accuracy. All these assume 0 ≤ a ≤ b.
h = b + 0.337 * a // max error ≈ 5.5 %
h = max(b, 0.918 * (b + (a>>1))) // max error ≈ 2.6 %
h = b + 0.428 * a * a / b // max error ≈ 1.04 %
Edit: to answer Ecir Hana's question, here is how I derived these
approximations.
First step. Approximating a function of two variables can be a
complex problem. Thus I first transformed this into the problem of
approximating a function of one variable. This can be done by choosing
the longest side as a “scale” factor, as follows:
h = √(b² + a²)
  = b √(1 + (a/b)²)
  = b f(a/b)    where f(x) = √(1 + x²)
Adding the constraint 0 ≤ a ≤ b means we are only concerned with
approximating f(x) in the interval [0, 1].
(Plot omitted: f(x) on the interval [0, 1], together with the approximation
given by Matthew Slattery, namely (√2−1)x + 1.)
Second step. Next step is to stare at this plot, while asking
yourself the question “how can I approximate this function cheaply?”.
Since the curve looks roughly parabolic, my first idea was to use a
quadratic function (third approximation). But since this is still
relatively expensive, I also looked at linear and piecewise linear
approximations; my three solutions are the ones listed above.
The numerical constants (0.337, 0.918 and 0.428) were initially free
parameters. The particular values were chosen in order to minimize the
maximum absolute error of the approximations. The minimization could
certainly be done by some algorithm, but I just did it “by hand”,
plotting the absolute error and tuning the constant until it is
minimized. In practice this works quite fast. Writing the code to
automate this would have taken longer.
Third step is to come back to the initial problem of approximating a
function of two variables:
h ≈ b (1 + 0.337 (a/b))             = b + 0.337 a
h ≈ b max(1, 0.918 (1 + (a/b)/2))   = max(b, 0.918 (b + a/2))
h ≈ b (1 + 0.428 (a/b)²)            = b + 0.428 a²/b
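On a target without hardware multiply, the middle approximation is attractive: 0.918 is close to 235/256, and multiplying by 235 decomposes into shifts and subtractions (235 = 256 - 16 - 4 - 1). A sketch of my own, assuming inputs small enough that the shift cannot overflow:

#include <stdint.h>

/* h ~= max(b, 0.918 * (b + a/2));  max error ~2.6 %.
 * 0.918 is approximated by 235/256; s*235 = s*256 - s*16 - s*4 - s.
 * Assumes a and b are below 2^24 so that s << 8 does not overflow. */
uint32_t hypot_approx(uint32_t a, uint32_t b)
{
    if (a > b) { uint32_t t = a; a = b; b = t; }
    uint32_t s = b + (a >> 1);
    uint32_t h = ((s << 8) - (s << 4) - (s << 2) - s) >> 8;
    return h > b ? h : b;
}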
Consider using CORDIC methods. Dr. Dobb's has an article and associated library source here. Square-root, multiply and divide are dealt with at the end of the article.
One possibility looks like this:
#include <math.h>

/* Iterations  Accuracy
 *  2          6.5 digits
 *  3          20 digits
 *  4          62 digits
 * assuming a numeric type able to maintain that degree of accuracy in
 * the individual operations.
 */
#define ITER 3

double dist(double P, double Q) {
    /* A reasonably robust method of calculating `sqrt(P*P + Q*Q)'
     *
     * Transliterated from _More Programming Pearls, Confessions of a Coder_
     * by Jon Bentley, pg. 156.
     */
    double R;
    int i;

    P = fabs(P);
    Q = fabs(Q);

    if (P < Q) {
        R = P;
        P = Q;
        Q = R;
    }

    /* The book has this as:
     *   if P = 0.0 return Q; # in AWK
     * However, this makes no sense to me - we've just ensured that P>=Q, so
     * P==0 only if Q==0; OTOH, if Q==0, then distance == P...
     */
    if (Q == 0.0)
        return P;

    for (i = 0; i < ITER; i++) {
        R = Q / P;
        R = R * R;
        R = R / (4.0 + R);
        P = P + 2.0 * R * P;
        Q = Q * R;
    }
    return P;
}
This still does a couple of divides and four multiples per iteration, but you rarely need more than three iterations (and two is often adequate) per input. At least with most processors I've seen, that'll generally be faster than the sqrt would be on its own.
For the moment it's written for doubles, but assuming you've implemented the basic operations, converting it to work with fixed point shouldn't be terribly difficult.
Some doubts have been raised by the comment about "reasonably robust". At least as originally written, this was basically a rather backhanded way of saying that "it may not be perfect, but it's still at least quite a bit better than a direct implementation of the Pythagorean theorem."
In particular, when you square each input, you need roughly twice as many bits to represent the squared result as you did to represent the input value. After you add (which needs only one extra bit) you take the square root, which gets you back to needing roughly the same number of bits as the inputs. Unless you have a type with substantially greater precision than the inputs, it's easy for this to produce really poor results.
This algorithm doesn't square either input directly. It is still possible for an intermediate result to underflow, but it's designed so that when it does so, the result still comes out as well as the format in use supports. Basically, the situation in which it happens is that you have an extremely acute triangle (e.g., something like 90 degrees, 0.000001 degrees, and 89.99999 degrees). If it's close enough to 90, 0, 90, we may not be able to represent the difference between the two longer sides, so it'll compute the hypotenuse as being the same length as the other long side.
By contrast, when the Pythagorean theorem fails, the result will often be a NaN (i.e., tells us nothing) or, depending on the floating point format in use, quite possibly something that looks like a reasonable answer, but is actually wildly incorrect.
You can start by reevaluating if you need the sqrt at all. Many times you are calculating the hypotenuse just to compare it to another value - if you square the value you're comparing against you can eliminate the square root altogether.
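For example (a trivial sketch of that idea, with hypothetical names):

#include <stdint.h>

/* Instead of:  sqrt(dx*dx + dy*dy) <= range  */
static int within_range(int32_t dx, int32_t dy, int32_t range)
{
    /* Compare squared values; 64-bit intermediates so the squares can't
     * overflow. No sqrt (and no floating point) required. */
    int64_t d2 = (int64_t)dx * dx + (int64_t)dy * dy;
    return d2 <= (int64_t)range * range;
}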
Unless you're doing this at >1kHz, multiply even on an MCU without hardware MUL isn't terrible. What's much worse is the sqrt. I would try to modify my application so it doesn't need to calculate it at all.
Standard libraries would probably be best if you actually need it, but you could look at using Newton's method as a possible alternative. It would require several multiply/divide cycles to perform, however.
AVR resources
Atmel App note AVR200: Multiply and Divide Routines (pdf)
This sqrt function on AVR Freaks forum
Another AVR Freaks post
Maybe you could use some of Elm Chan's Assembler Libraries and adapt the ihypot function to your ATtiny. You would need to replace the MUL and maybe (I haven't checked) some other instructions.
