Fast, optimized and accurate RGB <-> HSB conversion code in C - c

I'm looking for a fast, accurate implementation of RGB to HSB and HSB to RGB in pure C. Note that I'm specifically looking for Hue, Saturation, Brightness and not HSL (Luminosity).
Of course I have Googled this extensively, but speed is of the utmost importance here and I am looking for any specific recommendations for solid, fast, reliable code.

Here is a straightforward implementation in standard C.
This is - without further context - as good as it can get. Perhaps you care to shed some more light on
how you store your RGB samples (bits/pixel to begin with !?)
how you store your pixel data (do you want to efficiently transform larger buffers, if so, what is the organization)
how you want to represent the output (I assumed floats for now)
I could come up with a further optimized version (perhaps one that utilizes SSE4 instructions nicely...)
All that said, when compiled with optimizations, this doesn't work too badly:
#include <stdio.h>
#include <math.h>
typedef struct RGB_t { unsigned char red, green, blue; } RGB;
typedef struct HSB_t { float hue, saturation, brightness; } HSB;
/*
* Returns the hue, saturation, and brightness of the color.
*/
void RgbToHsb(struct RGB_t rgb, struct HSB_t* outHsb)
{
// TODO check arguments
float r = rgb.red / 255.0f;
float g = rgb.green / 255.0f;
float b = rgb.blue / 255.0f;
float max = fmaxf(fmaxf(r, g), b);
float min = fminf(fminf(r, g), b);
float delta = max - min;
if (delta != 0)
{
float hue;
if (r == max)
{
hue = (g - b) / delta;
}
else
{
if (g == max)
{
hue = 2 + (b - r) / delta;
}
else
{
hue = 4 + (r - g) / delta;
}
}
hue *= 60;
if (hue < 0) hue += 360;
outHsb->hue = hue;
}
else
{
outHsb->hue = 0;
}
outHsb->saturation = max == 0 ? 0 : (max - min) / max;
outHsb->brightness = max;
}
Typical usage and test:
int main()
{
struct RGB_t rgb = { 132, 34, 255 };
struct HSB_t hsb;
RgbToHsb(rgb, &hsb);
printf("RGB(%u,%u,%u) -> HSB(%f,%f,%f)\n", rgb.red, rgb.green, rgb.blue,
hsb.hue, hsb.saturation, hsb.brightness);
// prints: RGB(132,34,255) -> HSB(266.606354,0.866667,1.000000)
return 0;
}
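The question also asks for the reverse direction, which the code above does not cover. A minimal sketch of HSB to RGB under the same conventions (hue in degrees, saturation and brightness in [0, 1], 8-bit output) might look like the following. The names HsbToRgb, RGB8 and HSBf are illustrative only and are not part of the original answer:

```c
typedef struct { unsigned char red, green, blue; } RGB8;   /* hypothetical type */
typedef struct { float hue, saturation, brightness; } HSBf; /* hypothetical type */

/* Converts HSB (hue in degrees, s and b in [0, 1]) to 8-bit RGB. */
static RGB8 HsbToRgb(HSBf hsb)
{
    float h = hsb.hue;
    float s = hsb.saturation;
    float v = hsb.brightness;
    float r, g, b;

    /* Wrap the hue into [0, 360) without needing the math library. */
    while (h >= 360.0f) h -= 360.0f;
    while (h < 0.0f) h += 360.0f;
    if (h >= 360.0f) h = 0.0f;       /* guard against rounding up to 360 */

    h /= 60.0f;                      /* sector index plus fraction, in [0, 6) */
    int sector = (int)h;
    float f = h - (float)sector;     /* position inside the sector */
    float p = v * (1.0f - s);
    float q = v * (1.0f - s * f);
    float t = v * (1.0f - s * (1.0f - f));

    switch (sector) {
    case 0:  r = v; g = t; b = p; break;
    case 1:  r = q; g = v; b = p; break;
    case 2:  r = p; g = v; b = t; break;
    case 3:  r = p; g = q; b = v; break;
    case 4:  r = t; g = p; b = v; break;
    default: r = v; g = p; b = q; break;
    }

    /* Scale to 0..255 and round to nearest. */
    RGB8 out = {
        (unsigned char)(r * 255.0f + 0.5f),
        (unsigned char)(g * 255.0f + 0.5f),
        (unsigned char)(b * 255.0f + 0.5f)
    };
    return out;
}
```

Feeding it the HSB triple printed by the test above should give back the original RGB sample, which makes a convenient round-trip check.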

First off
HSB and HLS were developed to specify numerical Hue, Saturation and Brightness (or Hue, Lightness and Saturation) in an age when users had to specify colors numerically. The usual formulations of HSB and HLS are flawed with respect to the properties of color vision. Now that users can choose colors visually, or choose colors related to other media (such as PANTONE), or use perceptually-based systems like L*u*v* and L*a*b*, HSB and HLS should be abandoned [source]
Look at the opensource Java implementation here
The Boost library (I know, it's C++) seemed to contain a conversion to HSB at one time, but nowadays I can only find a luminance conversion (here)

I would suggest using a lookup table to store the HSB and RGB values. First convert the RGB value (which is presumably 8 bits per component) to a 16-bit value (5 bits per component). The HSB values, also 16 bits wide, can be packed in the same way, except that the hue component should probably get more bits than saturation and brightness: for example 8 bits for hue and 4 bits each for saturation and brightness. Of course, the reverse applies when converting from HSB to RGB.
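As a rough sketch of the packing this suggestion implies (the function names and the exact 5/5/5 and 8/4/4 bit layouts are my assumptions, not something the answer specifies), the table index and the table entries could be built like this:

```c
#include <stdint.h>

/* Packs 8-bit RGB into a 15-bit index (5 bits per component).
   Hypothetical helper; the layout is R in the high bits, B in the low bits. */
static uint16_t pack_rgb555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

/* Packs HSB into 16 bits: 8 bits of hue, 4 of saturation, 4 of brightness.
   Hue is in degrees, saturation and brightness are in [0, 1]. */
static uint16_t pack_hsb844(float hue_deg, float sat, float bri)
{
    /* Hue is circular, so a value that rounds up to 256 wraps to 0. */
    unsigned h = (unsigned)(hue_deg * (256.0f / 360.0f) + 0.5f) & 0xFFu;
    unsigned s = (unsigned)(sat * 15.0f + 0.5f);
    unsigned v = (unsigned)(bri * 15.0f + 0.5f);
    return (uint16_t)((h << 8) | (s << 4) | v);
}
```

A table indexed by pack_rgb555 has 32768 entries of 16 bits each (64 KiB), filled once at start-up with any exact converter. The price is quantization: both the input and the output lose precision, so this trades accuracy for speed.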

A fast RGB to HSV floating point conversion from lolengine.net takes a "common" RGB2HSV implementation and makes these observations:
Only the hue offset K changes. The idea now is the following:
Sort the triplet (r,g,b) using comparisons
Build K while sorting the triplet
Perform the final calculation
We notice that the last swap effectively changes the sign of K and the sign of g - b. Since both are then added and passed to fabs(), the sign reversal can actually be omitted.
The step before that last point looks like C code, but their final form is C++ that's trivially convertible to C:
static void RGB2HSV(float r, float g, float b,
float &h, float &s, float &v)
{
float K = 0.f;
if (g < b)
{
std::swap(g, b);
K = -1.f;
}
if (r < g)
{
std::swap(r, g);
K = -2.f / 6.f - K;
}
float chroma = r - std::min(g, b);
h = fabs(K + (g - b) / (6.f * chroma + 1e-20f));
s = chroma / (r + 1e-20f);
v = r;
}
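For reference, a plain C rendering of that final form could look like the sketch below. The struct return type and the names HSVf and rgb_to_hsv are my own choices (the original uses C++ references), and note that the hue comes out normalized to [0, 1] rather than in degrees:

```c
typedef struct { float h, s, v; } HSVf;  /* hypothetical result type */

/* r, g, b in [0, 1]; returns h, s, v in [0, 1]. */
static HSVf rgb_to_hsv(float r, float g, float b)
{
    float K = 0.f;
    float tmp;

    if (g < b) {                    /* replaces std::swap(g, b) */
        tmp = g; g = b; b = tmp;
        K = -1.f;
    }
    if (r < g) {                    /* replaces std::swap(r, g) */
        tmp = r; r = g; g = tmp;
        K = -2.f / 6.f - K;
    }

    float chroma = r - (g < b ? g : b);   /* replaces std::min(g, b) */
    float hue = K + (g - b) / (6.f * chroma + 1e-20f);

    HSVf out;
    out.h = hue < 0.f ? -hue : hue;       /* replaces fabs() */
    out.s = chroma / (r + 1e-20f);
    out.v = r;
    return out;
}
```

Multiply h by 360 if degrees are needed, as in the first answer.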

Related

How to generate a uniformly distributed random double in C using libsodium in range [-a,a]?

The libsodium library has a function
uint32_t randombytes_uniform(const uint32_t upper_bound);
but obviously this returns an unsigned integer. Can I somehow use this to generate a uniformly distributed random double in range [-a,a] where a is also a double given by the user ? I am especially focused on the result being uniformly distributed/unbiased, so that is why I would like to use the libsodium library.
const uint32_t mybound = 1000000000; // Example
const uint32_t x = randombytes_uniform(mybound);
const double a = 3.5; // Example
const double variate = a * ( (2.0 * x / mybound) - 1);
Let me try to do it step-by-step.
First, you obviously need to combine two calls to get up to 64bit of randomness for one double value output.
Second, you convert it to [0...1] interval. There are several ways to do it, all of them are good in some sense or another. I prefer uniform random dyadic rationals in the form n*2^-53, see here for details. You could try other methods listed above as well. NB: methods in the link produce results in [0...1) range, I've tried to do acceptance/rejection to get closed [0...1] range.
Last, I scale result into desired range.
Sorry, C++ only but it is trivial to convert to C
#include <stdint.h>
#include <math.h>
#include <iostream>
#include <random>
// emulate libsodium RNG, valid for full 32bits result only!
static uint32_t randombytes_uniform(const uint32_t upper_bound) {
static std::mt19937 mt{9876713};
return mt();
}
// get 64bits from two 32bit numbers
static inline uint64_t rng() {
return (uint64_t)randombytes_uniform(UINT32_MAX) << 32 | randombytes_uniform(UINT32_MAX);
}
const int32_t bits_in_mantissa = 53;
const uint64_t max = (1ULL << bits_in_mantissa);
const uint64_t mask = (1ULL << (bits_in_mantissa+1)) - 1;
static double rnd(double a, double b) {
uint64_t r;
do {
r = rng() & mask; // get 54 random bits, need 53 or max
} while (r > max);
double v = ldexp( (double)r, -bits_in_mantissa ); // http://xoshiro.di.unimi.it/random_real.c
return a + (b-a)*v;
}
int main() {
double a = -3.5;
double b = 3.5;
for(int k = 0; k != 100; ++k)
std::cout << rnd(a, b) << '\n';
return 0;
}
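The core of the conversion is easy to express in plain C. The sketch below takes the 64 random bits as a parameter so that it stays independent of the generator; with libsodium one would presumably fill a uint64_t via randombytes_buf(). It uses the simpler half-open variant, n*2^-53 in [0, 1), without the rejection step used above for the closed interval. The names bits_to_unit and scale_range are hypothetical:

```c
#include <stdint.h>

/* Maps 64 random bits to a double in [0, 1): keeps the top 53 bits
   and multiplies by 2^-53, so every result is a dyadic rational n*2^-53. */
static double bits_to_unit(uint64_t r)
{
    return (double)(r >> 11) * 0x1.0p-53;
}

/* Scales the unit variate into [lo, hi), the same way as rnd(a, b) above. */
static double scale_range(uint64_t r, double lo, double hi)
{
    return lo + (hi - lo) * bits_to_unit(r);
}
```

For the range [-a, a] from the question, call scale_range(bits, -a, a).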
First recognizing that finding a random number [0...a] is a sufficient step, followed by a coin flip for +/-.
Step 2. Find the expo such that a < 2**expo or ceil(log2(a)).
int expo;
frexp(a, &expo);
int sign;
double candidate;
do {
Step 3. Form an integral 63-bit random number [0...0x7FFF_FFFF_FFFF_FFFF] and random sign. The 63 should be at least as wide as the precision of a double - which is often 53 bits. At this point r is certainly uniform.
uint64_t r = randombytes_uniform(0xFFFFFFFF);
r <<= 32;
r |= randombytes_uniform(0xFFFFFFFF);
// peel off one bit for sign
sign = r & 1;
r >>= 1;
Step 4. Scale and test if in range. Repeat as needed.
candidate = ldexp(r/pow(2, 63), expo);
} while (candidate > a);
Step 5. Apply the sign.
if (sign) {
candidate = -candidate;
}
return candidate;
Avoid (2.0 * x / a) - 1 as the calculation is not symmetric about 0.0.
Code would benefit with improvements to deal with a near DBL_MAX.
Some rounding issues apply that this answer glosses over, yet the distribution remains uniform - except potentially at the edges.

How to calculate the log2 of integer in C as precisely as possible with bitwise operations

I need to calculate the entropy and due to the limitations of my system I need to use restricted C features (no loops, no floating point support) and I need as much precision as possible. From here I figure out how to estimate the floor log2 of an integer using bitwise operations. Nevertheless, I need to increase the precision of the results. Since no floating point operations are allowed, is there any way to calculate log2(x/y) with x < y so that the result would be something like log2(x/y)*10000, aiming at getting the precision I need through arithmetic integer?
You will base an algorithm on the formula
log2(x/y) = K*(-log(x/y));
where
K = -1.0/log(2.0); // you can precompute this constant before run-time
a = (y-x)/y;
-log(x/y) = a + a^2/2 + a^3/3 + a^4/4 + a^5/5 + ...
If you write the loop correctly—or, if you prefer, unroll the loop to code the same sequence of operations looplessly—then you can handle everything in integer operations:
(y^N*(1*2*3*4*5*...*N)) * (-log(x/y))
= y^(N-1)*(2*3*4*5*...*N)*(y-x) + y^(N-2)*(1*3*4*5*...*N)*(y-x)^2 + ...
Of course, ^, the power operator, binding tighter than *, is not a C operator, but you can implement that efficiently in the context of your (perhaps unrolled) loop as a running product.
The N is an integer large enough to afford desired precision but not so large that it overruns the number of bits you have available. If unsure, then try N = 6 for instance. Regarding K, you might object that that is a floating-point number, but this is not a problem for you because you are going to precompute K, storing it as a ratio of integers.
SAMPLE CODE
This is a toy code but it works for small values of x and y such as 5 and 7, thus sufficing to prove the concept. In the toy code, larger values can silently overflow the default 64-bit registers. More work would be needed to make the code robust.
#include <stddef.h>
#include <stdlib.h>
// Your program will not need the below headers, which are here
// included only for comparison and demonstration.
#include <math.h>
#include <stdio.h>
const size_t N = 6;
const long long Ky = 1 << 10; // denominator of K
// Your code should define a precomputed value for Kx here.
int main(const int argc, const char *const *const argv)
{
// Your program won't include the following library calls but this
// does not matter. You can instead precompute the value of Kx and
// hard-code its value above with Ky.
const long long Kx = lrintl((-1.0/log(2.0))*Ky); // numerator of K
printf("K == %lld/%lld\n", Kx, Ky);
if (argc != 3) exit(1);
// Read x and y from the command line.
const long long x0 = atoll(argv[1]);
const long long y = atoll(argv[2]);
printf("x/y == %lld/%lld\n", x0, y);
if (x0 <= 0 || y <= 0 || x0 > y) exit(1);
// If 2*x <= y, then, to improve accuracy, double x repeatedly
// until 2*x > y. Each doubling offsets the log2 by 1. The offset
// is to be recovered later.
long long x = x0;
int integral_part_of_log2 = 0;
while (1) {
const long long trial_x = x << 1;
if (trial_x > y) break;
x = trial_x;
--integral_part_of_log2;
}
printf("integral_part_of_log2 == %d\n", integral_part_of_log2);
// Calculate the denominator of -log(x/y).
long long yy = 1;
for (size_t j = N; j; --j) yy *= j*y;
// Calculate the numerator of -log(x/y).
long long xx = 0;
{
const long long y_minus_x = y - x;
for (size_t i = N; i; --i) {
long long term = 1;
size_t j = N;
for (; j > i; --j) {
term *= j*y;
}
term *= y_minus_x;
--j;
for (; j; --j) {
term *= j*y_minus_x;
}
xx += term;
}
}
// Convert log to log2.
xx *= Kx;
yy *= Ky;
// Restore the aforementioned offset.
for (; integral_part_of_log2; ++integral_part_of_log2) xx -= yy;
printf("log2(%lld/%lld) == %lld/%lld\n", x0, y, xx, yy);
printf("in floating point, this ratio of integers works out to %g\n",
(1.0*xx)/(1.0*yy));
printf("the CPU's floating-point unit computes the log2 to be %g\n",
log2((1.0*x0)/(1.0*y)));
return 0;
}
Running this on my machine with command-line arguments of 5 7, it outputs:
K == -1477/1024
x/y == 5/7
integral_part_of_log2 == 0
log2(5/7) == -42093223872/86740254720
in floating point, this ratio of integers works out to -0.485279
the CPU's floating-point unit computes the log2 to be -0.485427
Accuracy would be substantially improved by N = 12 and Ky = 1 << 20, but for that you need either thriftier code or more than 64 bits.
THRIFTIER CODE
Thriftier code, wanting more effort to write, might represent numerator and denominator in prime factors. For example, it might represent 500 as [2 0 3], meaning (2^2)(3^0)(5^3).
Yet further improvements might occur to your imagination.
AN ALTERNATE APPROACH
For an alternate approach, though it might not meet your requirements precisely as you have stated them, @phuclv has given the suggestion I would be inclined to follow if your program were mine: work the problem in reverse, guessing a value c/d for the logarithm and then computing 2^(c/d), presumably via a Newton-Raphson iteration. Personally, I like the Newton-Raphson approach better. See sect. 4.8 here (my original).
MATHEMATICAL BACKGROUND
Several sources including mine already linked explain the Taylor series underlying the first approach and the Newton-Raphson iteration of the second approach. The mathematics unfortunately is nontrivial, but there you have it. Good luck.

Mixing 16 bit linear PCM streams and avoiding clipping/overflow

I'm trying to mix together 2 16bit linear PCM audio streams and I can't seem to overcome the noise issues. I think they are coming from overflow when mixing samples together.
I have following function ...
short int mix_sample(short int sample1, short int sample2)
{
return #mixing_algorithm#;
}
... and here's what I have tried as #mixing_algorithm#
sample1/2 + sample2/2
2*(sample1 + sample2) - 2*(sample1*sample2) - 65535
(sample1 + sample2) - sample1*sample2
(sample1 + sample2) - sample1*sample2 - 65535
(sample1 + sample2) - ((sample1*sample2) >> 0x10) // roughly the same as dividing by 65536
Some of them have produced better results than others but even the best result contained quite a lot of noise.
Any ideas how to solve it?
The best solution I have found is given by Viktor Toth. He provides a solution for 8-bit unsigned PCM, and changing that for 16-bit signed PCM, produces this:
int a = 111; // first sample (-32768..32767)
int b = 222; // second sample
int m; // mixed result will go here
// Make both samples unsigned (0..65535)
a += 32768;
b += 32768;
// Pick the equation
if ((a < 32768) && (b < 32768)) {
// Viktor's first equation when both sources are "quiet"
// (i.e. less than middle of the dynamic range)
m = (long long)a * b / 32768;
} else {
// Viktor's second equation when one or both sources are loud
// (the cast keeps a * b from overflowing a 32-bit int when both are large)
m = 2 * (a + b) - (long long)a * b / 32768 - 65536;
}
// Output is unsigned (0..65536) so convert back to signed (-32768..32767)
if (m == 65536) m = 65535;
m -= 32768;
Using this algorithm means there is almost no need to clip the output as it is only one value short of being within range. Unlike straight averaging, the volume of one source is not reduced even when the other source is silent.
Here's a descriptive implementation:
short int mix_sample(short int sample1, short int sample2) {
const int32_t result(static_cast<int32_t>(sample1) + static_cast<int32_t>(sample2));
typedef std::numeric_limits<short int> Range;
if (Range::max() < result)
return Range::max();
else if (Range::min() > result)
return Range::min();
else
return result;
}
To mix, it's just add and clip!
To avoid clipping artifacts, you will want to use saturation or a limiter. Ideally, you will have a small int32_t buffer with a small amount of lookahead. This will introduce latency.
More common than limiting everywhere is to leave a few bits' worth of 'headroom' in your signal.
Here is what I did on my recent synthesizer project.
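// Needs <stdlib.h> for malloc/abs and <limits.h> for SHRT_MAX.
int *unfiltered = malloc(lengthOfLongPcmInShorts * sizeof *unfiltered);
int i;
for(i = 0; i < lengthOfShortPcmInShorts; i++){
unfiltered[i] = shortPcm[i] + longPcm[i];
}
for(; i < lengthOfLongPcmInShorts; i++){
unfiltered[i] = longPcm[i];
}
// Find the largest absolute sample value.
int max = 0;
for(i = 0; i < lengthOfLongPcmInShorts; i++){
int val = abs(unfiltered[i]);
if(val > max)
max = val;
}
short int *newPcm = malloc(lengthOfLongPcmInShorts * sizeof *newPcm);
for(i = 0; i < lengthOfLongPcmInShorts; i++){
// Multiply before dividing so that integer division does not truncate to 0;
// an all-silent buffer (max == 0) is copied as silence.
newPcm[i] = max ? (short int)((long long)unfiltered[i] * SHRT_MAX / max) : 0;
}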
I added all the PCM data into an integer array, so that I get all the data unfiltered.
After doing that I looked for the absolute max value in the integer array.
Finally, I took the integer array and put it into a short int array by taking each element dividing by that max value and then multiplying by the max short int value.
This way you get the minimum amount of 'headroom' needed to fit the data.
You might be able to do some statistics on the integer array and integrate some clipping, but for what I needed the minimum amount of headroom was good enough for me.
There's a discussion here: https://dsp.stackexchange.com/questions/3581/algorithms-to-mix-audio-signals-without-clipping about why the A+B - A*B solution is not ideal. Hidden down in one of the comments on this discussion is the suggestion to sum the values and divide by the square root of the number of signals. And an additional check for clipping couldn't hurt. This seems like a reasonable (simple and fast) middle ground.
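A minimal sketch of that middle ground for two streams, assuming 16-bit signed samples. The gain 1/sqrt(2) is hard-coded so that no math library call is needed, and the names clip16 and mix2_sqrt are made up for illustration:

```c
#include <limits.h>

/* Clamps a wide intermediate value to the 16-bit signed sample range. */
static short clip16(long v)
{
    if (v > SHRT_MAX) return SHRT_MAX;
    if (v < SHRT_MIN) return SHRT_MIN;
    return (short)v;
}

/* Sums two samples, scales by 1/sqrt(2) (two signals), rounds and clips. */
static short mix2_sqrt(short a, short b)
{
    const double inv_sqrt2 = 0.70710678118654752;
    double m = ((double)a + (double)b) * inv_sqrt2;
    long rounded = (long)(m < 0 ? m - 0.5 : m + 0.5);
    return clip16(rounded);
}
```

Note that, unlike plain add-and-clip, this attenuates a single source by about 3 dB even when the other source is silent. For N streams the gain would be 1/sqrt(N), precomputed once.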
I think they should be functions mapping [MIN_SHORT, MAX_SHORT] -> [MIN_SHORT, MAX_SHORT] and they are clearly not (besides the first one), so overflow occurs.
If unwind's proposition won't work you can also try:
((long int)(sample1) + sample2) / 2
Since you are in time domain the frequency info is in the difference between successive samples, when you divide by two you damage that information. That's why adding and clipping works better. Clipping will of course add very high frequency noise which is probably filtered out.

Efficient implementation of natural logarithm (ln) and exponentiation

I'm looking for implementation of log() and exp() functions provided in C library <math.h>. I'm working with 8 bit microcontrollers (OKI 411 and 431). I need to calculate Mean Kinetic Temperature. The requirement is that we should be able to calculate MKT as fast as possible and with as little code memory as possible. The compiler comes with log() and exp() functions in <math.h>. But calling either function and linking with the library causes the code size to increase by 5 Kilobytes, which will not fit in one of the micro we work with (OKI 411), because our code already consumed ~12K of available ~15K code memory.
The implementation I'm looking for should not use any other C library functions (like pow(), sqrt() etc). This is because all library functions are packed in one library and even if one function is called, the linker will bring whole 5K library to code memory.
EDIT
The algorithm should be correct up to 3 decimal places.
Using Taylor series is neither the simplest nor the fastest way of doing this. Most professional implementations use approximating polynomials. I'll show you how to generate one in Maple (it is a computer algebra program), using the Remez algorithm.
For 3 digits of accuracy execute the following commands in Maple:
with(numapprox):
Digits := 8
minimax(ln(x), x = 1 .. 2, 4, 1, 'maxerror')
maxerror
Its response is the following polynomial:
-1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x
With the maximal error of: 0.000061011436
We generated a polynomial which approximates ln(x), but only inside the [1..2] interval. Widening the interval is not wise, because that would increase the maximal error even more. Instead, decompose the input y as follows:
ln(y) = ln(2^n * x) = n*ln(2) + ln(x), where 1 <= x < 2
So first find the highest power of 2 which is not larger than the number (See: What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?). Its exponent n is the integer part of the base-2 logarithm. Divide by that power of 2, and the result falls into the 1..2 interval. At the end we have to add n*ln(2) to get the final result.
An example implementation for numbers >= 1:
float ln(float y) {
int log2;
float divisor, x, result;
log2 = msb((int)y); // See: https://stackoverflow.com/a/4970859/6630230
divisor = (float)(1 << log2);
x = y / divisor; // normalized value between [1.0, 2.0]
result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
result += ((float)log2) * 0.69314718; // ln(2) = 0.69314718
return result;
}
Although if you plan to use it only in the [1.0, 2.0] interval, then the function is like:
float ln(float x) {
return -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x;
}
The Taylor series for e^x converges extremely quickly, and you can tune your implementation to the precision that you need. (http://en.wikipedia.org/wiki/Taylor_series)
The Taylor series for log is not as nice...
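As an illustration of how short a Taylor-based exp can be without calling any library function, here is a sketch. The range reduction (halving the argument, then squaring the result) is my addition and is not part of the answer above. The number of terms is the tuning knob. The name exp_taylor is hypothetical:

```c
/* Approximates e^x with a 6-term Taylor polynomial after range reduction.
   Uses e^x = (e^(x / 2^k))^(2^k) so that the series only sees |x| <= 0.5. */
static float exp_taylor(float x)
{
    int negative = x < 0.0f;
    int k = 0;
    if (negative) x = -x;

    while (x > 0.5f) {          /* halve until the argument is small */
        x *= 0.5f;
        ++k;
    }

    float term = 1.0f;          /* current term x^i / i! */
    float sum = 1.0f;
    for (int i = 1; i <= 6; ++i) {
        term *= x / (float)i;
        sum += term;
    }

    while (k-- > 0)             /* undo the halving by repeated squaring */
        sum *= sum;

    return negative ? 1.0f / sum : sum;
}
```

Each squaring roughly doubles the relative error, so large arguments lose some accuracy. For the three decimal places asked for, this should be adequate over a moderate input range, but that is worth checking against the range your MKT calculation actually needs.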
If you don't need floating-point math for anything else, you may compute an approximate fractional base-2 log pretty easily. Start by shifting your value left until it's 32768 or higher and store the number of times you did that in count. Then, repeat some number of times (depending upon your desired scale factor):
n = (mult(n,n) + 32768u) >> 16; // If a function is available for 16x16->32 multiply
count<<=1;
if (n < 32768) n*=2; else count+=1;
If the above loop is repeated 8 times, then the log base 2 of the number will be count/256. If ten times, count/1024. If eleven, count/2048. Effectively, this function works by computing the integer power-of-two logarithm of n**(2^reps), but with intermediate values scaled to avoid overflow.
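A self-contained sketch of this squaring scheme, using a 32-bit multiply in place of the mult() helper. The function name log2_fixed is hypothetical. One deliberate difference from the description above: count is seeded with 15 minus the number of shifts, so that the result is the logarithm of the original integer rather than of the normalized one:

```c
#include <stdint.h>

/* Returns log2(n) scaled by 2^frac_bits, for n in 1..65535.
   frac_bits is the number of squaring rounds (e.g. 8 gives count/256). */
static uint32_t log2_fixed(uint16_t n, unsigned frac_bits)
{
    uint32_t v = n;
    uint32_t count = 15;            /* integer part if bit 15 is already set */

    while (v < 32768u) {            /* normalize: v/32768 ends up in [1, 2) */
        v <<= 1;
        --count;
    }

    for (unsigned i = 0; i < frac_bits; ++i) {
        v = (v * v + 32768u) >> 16; /* square; result is scaled by 16384 */
        count <<= 1;
        if (v < 32768u)
            v <<= 1;                /* square < 2: next bit is 0, renormalize */
        else
            count += 1;             /* square >= 2: next bit is 1 */
    }
    return count;
}
```

With 8 rounds, log2_fixed(2, 8) gives 256 (that is, 1.0) and log2_fixed(3, 8) gives a value close to 1.585 * 256. The natural logarithm follows by multiplying with a fixed-point ln(2).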
Would a basic table with interpolation between values work? If the range of values is limited (which is likely in your case - I doubt temperature readings have a huge range) and high precision is not required, it may work. It should be easy to test on a normal machine.
Here is one of many topics on table representation of functions: Calculating vs. lookup tables for sine value performance?
Necromancing.
I had to implement logarithms on rational numbers.
This is how I did it:
According to Wikipedia, there is the Halley-Newton approximation method
which can be used for very-high precision.
Using Newton's method, the iteration simplifies to y_{n+1} = y_n + 2 * (x - exp(y_n)) / (x + exp(y_n)), which has cubic convergence to ln(x), which is way better than what the Taylor series offers.
// Using Newton's method, the iteration simplifies to (implementation)
// which has cubic convergence to ln(x).
public static double ln(double x, double epsilon)
{
double yn = x - 1.0d; // using the first term of the taylor series as initial-value
double yn1 = yn;
do
{
yn = yn1;
yn1 = yn + 2 * (x - System.Math.Exp(yn)) / (x + System.Math.Exp(yn));
} while (System.Math.Abs(yn - yn1) > epsilon);
return yn1;
}
This is not C, but C#, but I'm sure anybody capable to program in C will be able to deduce the C-Code from that.
Furthermore, since
logn(x) = ln(x)/ln(n).
You have therefore just implemented logN as well.
public static double log(double x, double n, double epsilon)
{
return ln(x, epsilon) / ln(n, epsilon);
}
where epsilon (error) is the minimum precision.
Now as to speed, you're probably better off using the ln-cast-in-hardware, but as I said, I used this as a base to implement logarithms on a rational numbers class working with arbitrary precision.
Arbitrary precision might be more important than speed, under certain circumstances.
Then, use the logarithmic identities for rational numbers:
logB(x/y) = logB(x) - logB(y)
In addition to Crouching Kitten's answer which gave me inspiration, you can build a pseudo-recursive (at most 1 self-call) logarithm to avoid using polynomials. In pseudo code
ln(x) :=
If (x <= 0)
return NaN
Else if (!(1 <= x < 2))
return LN2 * b + ln(a)
Else
return taylor_expansion(x - 1)
This is pretty efficient and precise since on [1; 2) the taylor series converges A LOT faster, and we get such a number 1 <= a < 2 with the first call to ln if our input is positive but not in this range.
You can find 'b' as your unbiased exponent from the data held in the float x, and 'a' from the mantissa of the float x (a is exactly the same float as x, but now with exponent biased_0 rather than exponent biased_b). LN2 should be kept as a macro in hexadecimal floating point notation IMO. You can also use http://man7.org/linux/man-pages/man3/frexp.3.html for this.
Also, the trick
unsigned long tmp = *(ulong*)(&d);
for "memory-casting" double to unsigned long, rather than "value-casting", is very useful to know when dealing with floats memory-wise, as bitwise operators will cause warnings or errors depending on the compiler.
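A pointer cast like the one above runs into C's strict-aliasing rules, and the commonly recommended way to get at the bits is memcpy. The sketch below uses it to pull 'b' and 'a' out of a float, assuming an IEEE 754 single-precision float and a positive, normal input. The helper names are made up:

```c
#include <stdint.h>
#include <string.h>

/* Returns the unbiased exponent b of x, so that x = a * 2^b with 1 <= a < 2.
   Assumes IEEE 754 binary32 and a positive, normal x. */
static int float_exponent(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);       /* "memory-cast" without aliasing issues */
    return (int)((bits >> 23) & 0xFFu) - 127;
}

/* Returns a: the same mantissa as x, but with the exponent forced to 0. */
static float float_significand(float x)
{
    uint32_t bits;
    float a;
    memcpy(&bits, &x, sizeof bits);
    bits = (bits & 0x007FFFFFu) | 0x3F800000u;  /* keep mantissa, set biased exponent 127 */
    memcpy(&a, &bits, sizeof a);
    return a;
}
```

With these two, ln(x) = LN2 * float_exponent(x) + ln(float_significand(x)), which is the single self-call in the pseudo code above.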
Possible computation of ln(x) and expo(x) in C without <math.h> :
static double expo(double n) {
int a = 0, b = n > 0;
double c = 1, d = 1, e = 1;
for (b || (n = -n); e + .00001 < (e += (d *= n) / (c *= ++a)););
// approximately 15 iterations
return b ? e : 1 / e;
}
static double native_log_computation(const double n) {
// Basic logarithm computation.
static const double euler = 2.7182818284590452354 ;
unsigned a = 0, d;
double b, c, e, f;
if (n > 0) {
for (c = n < 1 ? 1 / n : n; (c /= euler) > 1; ++a);
c = 1 / (c * euler - 1), c = c + c + 1, f = c * c, b = 0;
for (d = 1, c /= 2; e = b, b += 1 / (d * c), b - e/* > 0.0000001 */;)
d += 2, c *= f;
} else b = (n == 0) / 0.;
return n < 1 ? -(a + b) : a + b;
}
static inline double native_ln(const double n) {
// Returns the natural logarithm (base e) of N.
return native_log_computation(n) ;
}
static inline double native_log_base(const double n, const double base) {
// Returns the logarithm (base b) of N.
return native_log_computation(n) / native_log_computation(base) ;
}
Building off @Crouching Kitten's great natural log answer above, if you need it to be accurate for inputs <1 you can add a simple scaling factor. Below is an example in C++ that I've used in microcontrollers. It has a scaling factor of 256 and it's accurate for inputs down to 1/256 = ~0.0039, and up to 2^32/256 = 16777216 (due to overflow of a uint32 variable).
It's interesting to note that even on an STM32F103 Arm M3 with no FPU, the float implementation below is significantly faster (e.g. 3x or better) than the 16 bit fixed-point implementation in libfixmath (that being said, this float implementation still takes a few thousand cycles so it's still not ~fast~)
#include <float.h>
float TempSensor::Ln(float y)
{
// Algo from: https://stackoverflow.com/a/18454010
// Accurate between (1 / scaling factor) < y < (2^32 / scaling factor). Read comments below for more info on how to extend this range
float divisor, x, result;
const float LN_2 = 0.69314718; //pre calculated constant used in calculations
uint32_t log2 = 0;
//handle if input is less than zero
if (y <= 0)
{
return -FLT_MAX;
}
//scaling factor. The polynomial below is accurate when the input y>1, therefore using a scaling factor of 256 (aka 2^8) extends this to 1/256 or ~0.0039. Given use of uint32_t, the input y must stay below 2^24 or 16777216 (aka 2^(32-8)), otherwise uint_y used below will overflow. Increasing the scaling factor will reduce the lower accuracy bound and also reduce the upper overflow bound. If you need the range to be wider, consider changing uint_y to a uint64_t
const uint32_t SCALING_FACTOR = 256;
const float LN_SCALING_FACTOR = 5.545177444; //this is the natural log of the scaling factor and needs to be precalculated
y = y * SCALING_FACTOR;
uint32_t uint_y = (uint32_t)y;
while (uint_y >>= 1) // Convert the number to an integer and then find the location of the MSB. This is the integer portion of Log2(y). See: https://stackoverflow.com/a/4970859/6630230
{
log2++;
}
divisor = (float)(1 << log2);
x = y / divisor; // Find the remainder value between [1.0, 2.0] then calculate the natural log of this remainder using a polynomial approximation
result = -1.7417939 + (2.8212026 + (-1.4699568 + (0.44717955 - 0.056570851 * x) * x) * x) * x; //This polynomial approximates ln(x) between [1,2]
result = result + ((float)log2) * LN_2 - LN_SCALING_FACTOR; // Using the log product rule Log(A) + Log(B) = Log(AB) and the log base change rule log_x(A) = log_y(A)/Log_y(x), calculate all the components in base e and then sum them: = Ln(x_remainder) + (log_2(x_integer) * ln(2)) - ln(SCALING_FACTOR)
return result;
}

What is the correct way to perform alpha blending? (C)

I'm writing a very simple graphics library, and I'm trying to figure out how to do alpha blending. I tried it a few times, but my results were less than satisfactory. According to Wikipedia, I should do:
Value = (1 - alpha) * Value0 + alpha * Value1
This, however is not working at all. Maybe I'm doing something wrong?
The code I've included draws a colorful picture (that's the 'proximity' function), then attempts to draw a partially transparent box at (100,100). However, instead of a white translucent box, I get a weird-looking distortion to the image (I'll try to have them at the bottom of my post). Any suggestions? Here is my code:
#include "hgl.h"
void proximity()
{
int x = 0, y = 0, d1, d2, d3, dcenter;
while(x < WIDTH){
while(y < HEIGHT){
d1 = distance(x, y, (WIDTH/2) - 200, (HEIGHT/2) + 200);
d2 = distance(x, y, (WIDTH/2) + 200, (HEIGHT/2) + 200);
d3 = distance(x, y, (WIDTH/2), (HEIGHT/2) - 150);
dcenter = distance(x, y, WIDTH/2, HEIGHT/2);
putpixel(x, y, d1, d2, d3);
y++;
}
y = 0;
x++;
}
}
int alpha_transparency(float alpha, float value1, float value2)
{
return (1-alpha) * value1 + alpha * value2;
}
void transparent_box(int pos_x, int pos_y, int width, int height, float alpha, char r, char g, char b)
{
int x = 0, y = 0;
while(x < width)
{
while(y < height)
{
int rr, rg, rb;
rr = alpha_transparency(alpha, p.bitmap[x+pos_x][y+pos_y].r, r);
rg = alpha_transparency(alpha, p.bitmap[x+pos_x][y+pos_y].g, g);
rb = alpha_transparency(alpha, p.bitmap[x+pos_x][y+pos_y].b, b);
putpixel(pos_x + x, pos_y + y, rr, rg, rb);
y++;
}
x++;
y = 0;
}
}
int main()
{
fp = fopen("out.bmp","wb");
set_dimensions(1440, 900);
insert_header();
white_screen();
proximity();
transparent_box(100, 100, 500, 500, .9, 255, 255, 255);
insert_image();
fclose(fp);
return 0;
}
Sorry, I couldn't include the output because I'm a new user. However, here are the links:
Original Picture
Picture with "transparent" box
Your alpha blend function is correct; another way to think of alpha blending is that it interpolates between two color values based on alpha, so alpha should be a value in [0, 1].
However, you shouldn't be passing the color components as char, which is signed by default. You should pass them either as unsigned char or as a wider integer type. What is happening is that instead of passing in 255 as you expect, you are passing in -1.
In other words, store your color components as unsigned chars to ensure you don't have signedness shenanigans (see EDIT2).
EDIT: Note that if your alpha is in [0, 255], you should normalize it to [0, 1] to perform the alpha blending operation.
EDIT2: Also, if you are storing your pixels as char instead of unsigned char, this would explain the odd clamping I saw:
alpha_transparency(0.9, (char)255, (char)255)
== alpha_transparency(0.9f, -1.0f, -1.0f)
== -1.0f
== 0xff (cast to char)
alpha_transparency(0.9, (char)128, (char)255)
== alpha_transparency(0.9f, -128.0f, -1.0f)
== -13.7f
== 0xf3
alpha_transparency(0.9, (char)127, (char)255)
== alpha_transparency(0.9f, 127.0f, -1.0f)
== 11.80f
== 0x0b
alpha_transparency(0.9, (char)0, (char)255)
== alpha_transparency(0.9f, 0.0f, -1.0f)
== -0.9f
== 0x00
The issue I think is in the way you're dealing with colors. The method used in Wikipedia assumes that 0 is black and 1 is white, with 0.5 being in the middle. However your code is using ints, so I assume that you're defining 0 as black and 255 as white.
So, with alpha also given in the range 0 to 255, the correct code is:
return ((255-alpha)*value1+alpha*value2)/255;
You may be also suffering from the compiler rounding where you don't think it would. I would change the code in your function to this:
float result = ((255.0f-alpha)*value1+alpha*value2)/255.0f;
return (int) result;
In general it's very common to work with images using floats instead of ints. Many programs today convert the entire image to floats, process it then convert it back. You can avoid several bugs this way.
It is probably best to stick with a single datatype: either all floats, or all integers; this reduces the potential for confusion, and avoids a class of performance pitfalls.
For all ints, you need to remember to re-scale the integer so the result fits back into the original range:
int alpha_transparency(int alpha, int value1, int value2) {
int unscaled= (255-alpha)*value1 + alpha*value2;
return unscaled / 255; /* integer division */
}
For all-floats, you need to remember to normalize integer inputs from a raw value in [0..255] (or whatever) to a float in [0.0 .. 1.0], do all the processing, then convert back to integer only at the end.
float input_raw_value(unsigned char value) { return value/255.0; }
...
float alpha_transparency(float alpha, float value1, float value2) {
return (1.0-alpha)*value1 + alpha*value2;
}
...
unsigned char output_raw_value(float value) { return value*255.0; }
Note that I have ignored rounding issues in each method; once you've got the basic math up and running, you should pay some attention to that. Also, there are various tricks to replace the divisions (which can be relatively slow) by multiplication or bit-fiddling.
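To make the last two points concrete, here is a sketch of an all-integer blend that rounds to nearest and replaces the division by 255 with shifts. The shift-and-add identity is a widely used trick and, as far as I know, exact for the value range that occurs here. The name blend_u8 is hypothetical:

```c
#include <stdint.h>

/* Blends two 8-bit channel values; alpha is 0..255 (0 = all value1, 255 = all value2).
   Computes round(((255 - alpha) * value1 + alpha * value2) / 255) without dividing. */
static uint8_t blend_u8(uint8_t alpha, uint8_t value1, uint8_t value2)
{
    /* Weighted sum, at most 255 * 255, plus 128 for rounding. */
    uint32_t t = (uint32_t)(255u - alpha) * value1 + (uint32_t)alpha * value2 + 128u;
    /* (t + (t >> 8)) >> 8 stands in for t / 255 on this range. */
    return (uint8_t)((t + (t >> 8)) >> 8);
}
```

Because every parameter is unsigned, this also sidesteps the signed char problem described in the first answer.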

Resources