Precisely convert float32 to unsigned short or unsigned char - C

First of all sorry if this is a duplicate, I couldn't find any subject answering my question.
I'm coding a little program that will be used to convert 32-bit floating point values to short int (16 bits) and unsigned char (8 bits) values. This is for HDR image purposes.
From here I could get the following function (without clamping):
static inline uint8_t u8fromfloat(float x)
{
    return (int)(x * 255.0f);
}
I suppose that in the same way I could get a short int by multiplying by (2^16 - 1).
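As a sketch of that 16-bit analogue (the function name and the rounding are mine, and it assumes x is already clamped to [0, 1]):

```c
#include <stdint.h>

/* Hypothetical 16-bit analogue of u8fromfloat: maps [0.0, 1.0] to
   [0, 65535]. Assumes x is already clamped; the +0.5f rounds to
   nearest instead of truncating. */
static inline uint16_t u16fromfloat(float x)
{
    return (uint16_t)(x * 65535.0f + 0.5f);
}
```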
But then I started thinking about ordered dithering, especially Bayer dithering. To convert to uint8_t I suppose I could use a 4x4 matrix, and an 8x8 matrix for unsigned short.
I also thought of a Look-up table to speed-up the process, this way:
uint16_t LUT[0x10000]; // 2^16 values contained
and store 2^16 unsigned short values corresponding to a float.
This same table could then be used for uint8_t as well, because of the implicit conversion between unsigned short and unsigned char.
But wouldn't a look-up table like this be huge in memory? Also how would one fill a table like this?!
Now I'm confused, what would be best according to you?
EDIT after uwind's answer: Let's say now that I also want to do a basic color space conversion at the same time, i.e. before converting to U8/U16, do the color space conversion (in float), and then shrink it to U8/U16. Wouldn't using a LUT be more efficient in that case? And yes, I would still have the problem of how to index the LUT.

The way I see it, the look-up table won't help since in order to index into it, you need to convert the float into some integer type. Catch 22.
The table would require 0x10000 * sizeof (uint16_t) bytes, which is 128 KB. Not a lot by modern standards, but on the other hand cache is precious. But, as I said, the table doesn't add much to the solution since you need to convert float to integer in order to index.
You could do a table indexed by the raw bits of the float re-interpreted as integer, but that would have to be 32 bits which becomes very large (8 GB or so).
Go for the straight-forward runtime conversion you outlined.
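That straightforward runtime conversion might be sketched like this, with the clamping the original snippet omits added back in (my own sketch, not code from the post):

```c
#include <stdint.h>

/* Sketch of the straightforward runtime conversion, with clamping so
   out-of-range HDR values saturate instead of wrapping. */
static inline uint8_t u8fromfloat_clamped(float x)
{
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;
    return (uint8_t)(x * 255.0f + 0.5f);
}
```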

Just stay with the multiplication - it'll work fine.
Practically all modern CPUs have vector instructions (SSE, AVX, ...) suited to this kind of work, so you might look into programming for those. Or use a compiler that automatically vectorizes your code, if possible (Intel C, and also GCC). Even in cases where a table lookup is a possible solution, this can often be faster because you don't suffer from memory latency.

First, it should be noted that float has 24 bits of precision, which cannot fit into a 16-bit int, let alone 8 bits. Second, float has a much larger range, which can't be stored in any int or even long long int.
So your question title is actually incorrect; there is no way to precisely convert any float to short or char. What you want is to map a float value between 0 and 1 to an 8-bit or 16-bit integer range.
The code you use above will work fine. However, the value 255 is extremely unlikely to be returned, because it requires exactly 1.0 as input; anything that produces, say, 254.99999 will be truncated to 254. You should round the value instead:
return (int)(x * 255.0f + .5f);
or better, use the code provided in your link for more balanced distribution
static inline uint8_t u8fromfloat_trick(float x)
{
    union { float f; uint32_t i; } u;
    u.f = 32768.0f + x * (255.0f / 256.0f);
    return (uint8_t)u.i;
}
Using a LUT wouldn't be any faster, because a table for 16-bit values is too large to fit in cache and may in fact reduce your performance greatly. The snippet above needs only 2 floating-point instructions, or only 1 with FMA, and SIMD will improve performance a further 4-32x (or more), so the LUT method would easily be outperformed, as table lookups are much harder to parallelize.
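To illustrate why the arithmetic version vectorizes so well, a plain buffer loop like the following (my own sketch, with a hypothetical name) is exactly the shape compilers can auto-vectorize with SSE/AVX:

```c
#include <stddef.h>
#include <stdint.h>

/* Converting a whole buffer in a plain loop: no branches, no table,
   just a multiply-and-round per element. GCC/Clang can turn this
   into SIMD code automatically at -O2/-O3. Assumes inputs are
   already clamped to [0, 1]. */
void buffer_u8fromfloat(const float *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(in[i] * 255.0f + 0.5f);
}
```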

Related

C: Is using char faster than using int?

Since char is only 1 byte long, is it better to use char when dealing with 8-bit unsigned integers?
Example:
I was trying to create a struct for storing rgb values of a color.
struct color
{
    unsigned int r: 8;
    unsigned int g: 8;
    unsigned int b: 8;
};
Now, since it is int, it allocates 4 bytes of memory in my case. But if I replace them with unsigned char, the struct takes 3 bytes of memory as intended (on my platform).
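For illustration, the two layouts side by side (sizes are platform-dependent; the comments describe typical behavior, not a guarantee):

```c
#include <stdio.h>

/* Bit-field version: three 8-bit fields packed into an unsigned int,
   typically padded out to sizeof(unsigned int) == 4 bytes. */
struct color_bitfield { unsigned int r : 8, g : 8, b : 8; };

/* Plain unsigned char version: alignment 1, so typically exactly 3 bytes. */
struct color_bytes { unsigned char r, g, b; };

void print_color_sizes(void)
{
    printf("bitfield: %zu bytes, chars: %zu bytes\n",
           sizeof(struct color_bitfield), sizeof(struct color_bytes));
}
```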
No. Maybe a tiny bit.
First, this is a very platform dependent question.
However the <stdint.h> header was introduced to help with this.
Some hardware platforms are optimised for a particular size of operand and have an overhead to using smaller (in bit-size) operands even though logically the smaller operand requires less computation.
You should find that uint_fast8_t is the fastest unsigned integer type with at least 8 bits (#include <stdint.h> to use it).
That may be the same as unsigned char or unsigned int, depending on whether the answer to your question is 'yes' or 'no' respectively(*).
So the idea would be that if you're speed focused you'd use uint_fast8_t and the compiler will pick the fastest type fitting your purpose.
There are a couple of downsides to this scheme.
One is that if you create vast quantities of data, performance can be impaired (and limits reached) because you're using an 'oversized' type for the purpose.
On a platform where a 4-byte int is faster than a 1-byte char you're using about 4 times as much memory as you need.
If your platform is small or your scale large that can be a significant overhead.
Also, you need to be careful: if the underlying type isn't the minimum size you expect, some calculations may be confounded.
Arithmetic 'clocks' (wraps) neatly for unsigned operands, but obviously at a different size if uint_fast8_t isn't in fact 8 bits.
It's platform dependent what the following returns:
#include <stdint.h>
//...
int foo(void) {
    uint_fast8_t x = 255;
    ++x;
    if (x == 0) {
        return 1;
    }
    return 0;
}
The overhead of dealing with potentially outsized types can claw back your gains.
I tend to agree with Knuth that "premature optimisation is the root of all evil" and would say you should only get into this sort of cleverness if you need it.
Do a typedef uint8_t color_comp; for now and get the application working before trying to shave fractions of a second off performance later!
I don't know what your application does, but it may be that it's not compute-intensive in the RGB channels and the bottleneck (if any) is elsewhere. Maybe you'll find some high-load calculation where it's worth dealing with uint_fast8_t conversions and issues.
The wisdom of Knuth is that you probably don't know where that is until later anyway.
(*) It could be unsigned long or indeed some other type. The only constraint is that it is an unsigned type and at least 8 bits.
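A quick sketch to see what your own platform picked (function name is mine):

```c
#include <stdint.h>
#include <stdio.h>

/* uint_fast8_t is whatever unsigned type the platform deems fastest
   with at least 8 bits; it may be 1 byte or considerably wider. */
void report_fast_types(void)
{
    printf("uint8_t:      %zu byte(s)\n", sizeof(uint8_t));
    printf("uint_fast8_t: %zu byte(s)\n", sizeof(uint_fast8_t));
}
```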

How to force compiler to promote variable to float when doing maths

I got question about math in C, quick example below:
uint32_t desired_val;
uint16_t target_us = 1500;
uint32_t period_us = 20000;
uint32_t tempmod = 37500;
desired_val = ((target_us / period_us) * tempmod);
At the moment, (target_us / period_us) evaluates to 0, which makes desired_val 0 as well. I don't want to make these variables float unless I really have to. I don't need anything after the decimal point, as the result will be saved into a 32-bit register.
Is it possible to get correct results from this equation without declaring target_us or period_us as float? I want to use fixed-point calculations where possible and floating point only when it's needed.
Working on cortex-M4 if that helps.
Do the multiplication first.
You should split it into two statements with a temporary variable, to ensure the desired order of operations (parentheses ensure proper grouping, but not order).
uint64_t tempprod = (uint64_t)target_us * tempmod;
desired_val = tempprod / period_us;
I've also used uint64_t for the temporary, in case the product overflows. There's still a problem if the desired value doesn't fit into 32 bits; hopefully the data precludes that.
You'll probably have to do some casting in any case, but there are two different methods. First, stick with integers and do the multiplication first:
desired_val = ((uint64_t)target_us * tempmod) / period_us;
or do the calculations in floating point:
desired_val = (uint32_t)(((double)target_us / period_us) * tempmod);
You can do the computation with double quite easily:
desired_val = (double)target_us * tempmod / period_us;
float would be a mistake, since it has far too little precision to be reliable.
You might want to round that off to the nearest integer rather than letting it be truncated:
#include <math.h>
desired_val = round((double)target_us * tempmod / period_us);
See man round
You could, of course, do the computation using a wider integer type (for example, replacing the double with int64_t or long long). That will make rounding slightly trickier.
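A sketch of that integer rounding: for non-negative operands, adding half the divisor before dividing rounds the quotient to nearest (helper name is mine):

```c
#include <stdint.h>

/* Rounded unsigned division: adding half the divisor before dividing
   rounds to nearest instead of truncating toward zero. */
static uint32_t scale_rounded(uint32_t target_us, uint32_t tempmod,
                              uint32_t period_us)
{
    uint64_t prod = (uint64_t)target_us * tempmod;
    return (uint32_t)((prod + period_us / 2) / period_us);
}
```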

Efficiently implementing arrays of ternary data types in C

I need to implement "big" arrays (~1800 elements) of a ternary datatype as runtime-efficiently as possible in C for cryptographic research. I thought of the following:
Using an array of any-sized integers, using 2 Bits to represent one element each
So I'd have
typedef uint32_t block;
const int blocksize = sizeof(block) << 3;
block dataArray[3]; // 3 * 32 bits => 48 elements

uint8_t getElementAt(block *data, int position)
{
    position = position * 2;
    return (data[position / blocksize] >> (position % blocksize)) & 3;
}
which returns 0..2, which I can map to my three values.
Or: using an array of uint8_t, addressing the elements directly.
uint8_t data[48];
Sure, that needs at least four times more RAM, but addressing and setting might be more efficient - is it?
Are there any other good possibilities I'm missing, or special caveats with either of the two solutions?
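For the question's "setting" half, a setter matching the 2-bit getter could look like this sketch (the macro and function names are mine, same packing convention as the question's code):

```c
#include <stdint.h>

typedef uint32_t block;
#define BLOCKSIZE (sizeof(block) * 8)

/* Setter matching the getter: clear the element's two bits, then OR
   in the new value (expected to be 0..2). */
void setElementAt(block *data, int position, uint8_t value)
{
    int bit = position * 2;
    data[bit / BLOCKSIZE] &= ~((block)3 << (bit % BLOCKSIZE));
    data[bit / BLOCKSIZE] |= (block)(value & 3) << (bit % BLOCKSIZE);
}
```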
The answer depends on how big the arrays will get, and how you want to optimize. I sketch some scenarios:
Runtime, small arrays.
Simply use unsigned long arr[N]. Reading only on machine word boundaries is the fastest, but uses a lot of memory. When the memory usage gets too big you actually do not want to do this, because cache performance outweighs the aligned reads.
Runtime, big arrays.
Use unsigned char arr[N]. This gives you reads and writes at a decent speed.
Good memory usage, mediocre speed.
Use unsigned long arr[N] and store each trit in two bits, unpacking using shifts and masks.
Better memory usage, slow.
Use unsigned long arr[N], and store floor(CHAR_BIT * sizeof(long) * log(2) / log(3)) trits per element by storing digits in base 3. You can pack 20 trits into 32 bits this way.
Best memory usage, horrendous.
Store all the numbers as digits in one base-3 number, using a bignum implementation.
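The base-3 packing from the "better memory usage" option can be sketched like this: 20 trits fit in one 32-bit word because 3^20 < 2^32 (function names are mine):

```c
#include <stdint.h>

/* 20 trits per 32-bit word, since 3^20 = 3486784401 < 2^32.
   Packing is Horner's rule in base 3; unpacking costs one divide
   per skipped digit plus a final modulo. */
enum { TRITS_PER_WORD = 20 };

uint32_t pack_trits(const uint8_t trits[TRITS_PER_WORD])
{
    uint32_t w = 0;
    for (int i = TRITS_PER_WORD - 1; i >= 0; i--)
        w = w * 3 + trits[i];
    return w;
}

uint8_t get_trit(uint32_t w, int i)
{
    while (i--)
        w /= 3;
    return (uint8_t)(w % 3);
}
```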

On embedded platforms, is it more efficient to use unsigned int instead of (implicitly signed) int?

I've gotten into the habit of always using unsigned integers where possible in my code, because the processor can do divides by powers of two on unsigned types, which it can't with signed types. Speed is critical for this project. The processor operates at up to 40 MIPS.
My processor has an 18-cycle divide, which takes much longer than the single-cycle barrel shifter. So is it worth using unsigned integers here to speed things up, or do they bring other disadvantages? I'm using a dsPIC33FJ128GP802, a member of Microchip's dsPIC33F series. It has single-cycle multiply for both signed and unsigned ints, plus sign- and zero-extend instructions.
For example, it produces this code when mixing signed and unsigned integers.
026E4 97E80F mov.b [w15-24],w0
026E6 FB0000 se w0,w0
026E8 97E11F mov.b [w15-31],w2
026EA FB8102 ze w2,w2
026EC B98002 mul.ss w0,w2,w0
026EE 400600 add.w w0,w0,w12
026F0 FB8003 ze w3,w0
026F2 100770 subr.w w0,#16,w14
I'm using C (GCC for dsPIC.)
I think we all need to know a lot more about the peculiarities of your processor to answer this question. Why can't it do divides by powers of two on signed integers? As far as I remember the operation is the same for both. I.e.
10/2 = 00001010 goes to 00000101
-10/2 = 11110110 goes to 11111011
Maybe you should write some simple code doing an unsigned divide and a signed divide and compare the compiled output.
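Two minimal functions to feed the compiler and diff, as suggested (the names are mine):

```c
#include <stdint.h>

/* Compile with -O2 -S and compare the generated assembly: on most
   targets the unsigned divide is a single logical shift, while the
   signed divide needs a couple of extra instructions to round
   toward zero for negative inputs. */
uint32_t udiv8(uint32_t x) { return x / 8; }
int32_t  sdiv8(int32_t x)  { return x / 8; }
```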
Also, benchmarking is a good idea. It doesn't need to be precise. Just have an array of a few thousand numbers, start a timer, divide them a few million times, and time how long it takes. Maybe do it a few billion times if your processor is fast. E.g.
int s_numbers[] = { /* etc. etc. */ };
int s_array_size = sizeof(s_numbers) / sizeof(s_numbers[0]);
unsigned int u_numbers[] = { /* etc. etc. */ };
unsigned int u_array_size = sizeof(u_numbers) / sizeof(u_numbers[0]);
int i;
int s_result;
unsigned int u_result;

/* Start timer. */
for (i = 0; i < 100000000; i++)
{
    s_result = s_numbers[i % s_array_size] / s_numbers[(i + 1) % s_array_size];
}
/* Stop timer and print difference. */
/* Repeat for unsigned integers. */
Written in a hurry to show the principle, please forgive any errors.
It won't give precise benchmarking but should give a general idea of which is faster.
I don't know much about the instruction set available on your processor, but a quick look suggests it has instructions usable for both arithmetic and logical shifts. That should mean shifting a signed value costs about the same as shifting an unsigned value, and dividing by powers of 2 with shifts should likewise cost the same for each. (My knowledge of this comes from a quick glance at some intrinsic functions for a C compiler that targets your processor family.)
That being said, if you are working with values which are to be interpreted as unsigned then you might as well declare them as unsigned. For the last few years I've been using the types from stdint.h more and more, and usually I end up using the unsigned versions because my values are either inherently unsigned or I'm just using them as bit arrays.
Generate assembly both ways and count cycles.
I'm going to guess that unsigned divides by powers of two are faster, because the compiler can simply emit a right shift as needed without worrying about sign extension.
As for disadvantages: detecting arithmetic overflow, accidentally wrapping an unsigned value where you expected signed behavior, etc. Nothing blocking, just different things to watch out for.
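One concrete example of those "different things to watch out for": a descending unsigned loop guarded by i >= 0 never terminates, because unsigned arithmetic wraps instead of going negative. A sketch with a correct guard (function name is mine):

```c
/* Counts iterations of a descending unsigned loop. The guard
   i != (unsigned)-1 works; the tempting i >= 0 is always true for
   an unsigned type, because decrementing past 0 wraps to UINT_MAX. */
unsigned count_down(unsigned start)
{
    unsigned count = 0;
    for (unsigned i = start; i != (unsigned)-1; i--)
        count++;
    return count;
}
```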

Fixed-point multiplication in a known range

I'm trying to multiply A*B in 16-bit fixed point while keeping as much accuracy as possible. A is a 16-bit value in the unsigned integer range; B is divided by 1000 and always between 0.001 and 9.999. It's been a while since I dealt with problems like this, so:
I know I can just do A*B/1000 after moving to 32-bit variables, then strip back to 16-bit
I'd like to make it faster than that
I'd like to do all the operations without moving to 32-bit (since I've got 16-bit multiplication only)
Is there any easy way to do that?
Edit: A will be between 0 and 4000, so all possible results are in the 16-bit range too.
Edit: B comes from user, set digit-by-digit in the X.XXX mask, that's why the operation is /1000.
No, you have to go to 32 bits. In general, the product of two 16-bit numbers needs up to 32 bits to represent.
You should check the instruction set of the CPU you're working on, because most multiply instructions on 16-bit machines have an option to return the result as a 32-bit integer directly.
This would help you a lot because:
short testfunction(short a, short b)
{
    int A32 = a;
    int B32 = b;
    return A32 * B32 / 1000;
}
would force the compiler to do a 32-bit * 32-bit multiply. On your machine this could be very slow, or even done in multiple steps using 16-bit multiplies only.
A little bit of inline assembly or even better a compiler intrinsic could speed things up a lot.
Here is an example for the Texas Instruments C64x+ DSP which has such intrinsics:
short test(short a, short b)
{
    int product = _mpy(a, b); // calculates product, returns 32 bit integer
    return product / 1000;
}
Another thought: you're dividing by 1000. Was that constant your choice? It would be much faster to use a power of two as the base for your fixed-point numbers; 1024 is close. Why don't you
return (a * b) / 1024;
instead? The compiler can optimize this into a right shift by 10 bits, which ought to be much faster than reciprocal-multiplication tricks.
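That power-of-two version could be sketched as follows; note it assumes B is rescaled once to a base-1024 representation when read from the user, since the existing X.XXX input is base-1000 (function name is mine):

```c
#include <stdint.h>

/* Fixed-point multiply with a power-of-two scale: the divide becomes
   a shift. Assumes b was converted once at input time from the user's
   base-1000 value, e.g.
   b = (uint16_t)(((uint32_t)b1000 * 1024 + 500) / 1000); */
static uint16_t fixmul1024(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * b) >> 10);
}
```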