Efficient way to store a fixed range float - c

I have a (big) array of floats; each float takes 4 bytes.
Is there a way, given the fact that my floats are ranged between 0 and 255, to store each float in less than 4 bytes?
I can do any amount of computation on the whole array.
I'm using C.

How much precision do you need?
You can store each float in 2 bytes by representing it as an unsigned short (range 0 to 65,535): multiply each value by 2^8 when storing it and divide by 2^8 when you need the actual value back. This is essentially the same as using a fixed-point format instead of floating point.
Your precision is limited to 1.0 / (2^8) = 0.00390625 when you do this, however.
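A minimal sketch of that fixed-point scheme, assuming the values stay within 0..255; the encode/decode helper names are illustrative, not part of the answer:

#include <stdio.h>

/* 8.8 fixed point: 8 integer bits, 8 fraction bits, in an unsigned short.
   Assumes f is in [0, 255], so f * 256 never overflows the short. */
static unsigned short encode(float f)
{
    return (unsigned short)(f * 256.0f);   /* scale up and truncate */
}

static float decode(unsigned short u)
{
    return u / 256.0f;                     /* recover the value, within 1/256 */
}

int main(void)
{
    float original = 123.456f;
    unsigned short packed = encode(original);
    printf("%f -> %u -> %f\n", original, packed, decode(packed));
    return 0;
}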

The absolute range of your data doesn't really matter that much; what matters is the amount of precision you need. If you can get away with, e.g., 6 digits of precision, then you only need as much storage as would be required to store the integers from 1 to 1,000,000, and that's 20 bits. So, supposing this, what you can do is:
1) Shift your data so that the smallest element has value 0. I.e. subtract a single value from every element. Record this shift.
2) Scale (multiply) your data by a number just large enough so that after truncation to an integer, you will not lose any precision you need.
3) Now comes the tricky part, unless your values happen to fit into convenient 8- or 16-bit units: pack the data into successive unsigned integers. Each of your data values needs 20 bits in this example, so value 1 takes up the first 20 bits of integer 1, value 2 takes up the remaining 12 bits of integer 1 and the first 8 bits of integer 2, and so on. In this hypothetical case you end up saving ~40%.
4) Now, 'decrypting'. Unpack the values (you have saved the # of bits in each one), un-scale, and un-shift.
So, this will do it, and might be faster and more compact than standard compression algorithms, as they aren't allowed to make assumptions about how much precision you need, but you are.
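A hedged sketch of steps 1-4 for the 20-bit case; the put_bits/get_bits helpers and the particular choice of scale (spread the data over the full 20-bit range) are my own assumptions, not part of the answer above:

#include <stdint.h>
#include <stddef.h>

#define BITS 20   /* bits kept per value in this example */

/* Write the low `bits` bits of v into buf, starting at bit offset *pos. */
static void put_bits(uint8_t *buf, size_t *pos, uint32_t v, unsigned bits)
{
    for (unsigned b = 0; b < bits; ++b, ++*pos)
        if ((v >> b) & 1u)
            buf[*pos / 8] |= (uint8_t)(1u << (*pos % 8));
}

/* Read them back. */
static uint32_t get_bits(const uint8_t *buf, size_t *pos, unsigned bits)
{
    uint32_t v = 0;
    for (unsigned b = 0; b < bits; ++b, ++*pos)
        v |= (uint32_t)((buf[*pos / 8] >> (*pos % 8)) & 1u) << b;
    return v;
}

/* Steps 1-3: record the shift (the minimum), pick a scale so the largest
   shifted value still fits in BITS bits, and emit each truncated value.
   buf must be zero-initialised and at least (n * BITS + 7) / 8 bytes. */
static void pack(const float *data, size_t n, uint8_t *buf,
                 float *shift, float *scale)
{
    float lo = data[0], hi = data[0];
    for (size_t i = 1; i < n; ++i) {
        if (data[i] < lo) lo = data[i];
        if (data[i] > hi) hi = data[i];
    }
    *shift = lo;
    *scale = (hi > lo) ? ((float)((1u << BITS) - 1) / (hi - lo)) : 1.0f;

    size_t pos = 0;
    for (size_t i = 0; i < n; ++i)
        put_bits(buf, &pos, (uint32_t)((data[i] - *shift) * *scale), BITS);
}

/* Step 4: unpack, un-scale, un-shift. */
static void unpack(const uint8_t *buf, size_t n, float *out,
                   float shift, float scale)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; ++i)
        out[i] = (float)get_bits(buf, &pos, BITS) / scale + shift;
}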

For example you could store integers (floats with .0) in one byte, but the other floats would need more bytes.
You could also use fixed-point if you aren't too worried about precision...

Related

Generating random numbers in ranges from 32 bytes of random data, without bignum library

I have 32 bytes of random data.
I want to generate random numbers within variable ranges between 0-9 and 0-100.
If I used an arbitrary precision arithmetic (bignum) library, and treated the 32 bytes as a big number, I could simply do:
random = random_source % range;
random_source = random_source / range;
as often as I liked (with different ranges) until the product of the ranges nears 2^256.
Is there a way of doing this using only (fixed-size) integer arithmetic?
Certainly you can do this by doing base-256 long division (or push-up multiplication). It is just like the long division you learnt in primary school, but with bytes instead of digits. It involves doing a cascade of divides and remainders for each byte in turn. Note that you also need to be aware of how you are consuming the big number: as you consume it and it becomes smaller, there is an increasing bias against the larger values in the range. E.g. if you only have 110 left and you ask for rnd(100), the values 0-9 will each be 10% more likely than the values 10-99.
But you don't really need the bignum techniques for this; you can use ideas from arithmetic-coding compression, where you build up the single number without ever dealing with the whole thing at once.
If you start by reading 4 bytes into an unsigned 32-bit buffer (uint32_t), it has the range 0..4294967295, a non-inclusive max of 4294967296. I will refer to this synthesised value as the "carry forward", and this exclusive max value is also important to record.
[For simplicity, you might start with reading 3 bytes to your buffer, generating a max of 16M. This avoids ever having to deal with the 4G value that can't be held in a 32 bit integer.]
There are 2 ways to use this, both with accuracy implications:
Stream down:
Take the carry forward modulo your range; the remainder is your random answer. The division result is your new carry forward and has a smaller range.
Say you want 0..99: you take it modulo 100, and your upper part has a range with max 42949672 (4294967296/100), which you carry forward for the next random request.
We can't feed another byte in yet...
Say you now want 0..9: you take it modulo 10, and now your upper part has a range with max 4294967 (42949672/10).
As max is less than 16M, we can now bring in the next byte. Multiply it by the current max 4294967 and add it to the carry forward. The max is also multiplied by 256 -> 1099511552
This method has a slight bias towards small values: about 1 time in "next max", the available range of values is not the full range, because the last value is truncated. By choosing to maintain 3-4 good bytes in max, that bias is minimised; it occurs at most 1 in 16 million times.
The computational cost of this algorithm is the division of both the carry forward and the max by the random range, plus the multiply each time you feed in a new byte. I assume the compiler will optimise the modulo and the division into a single operation.
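For concreteness, here is a minimal sketch of this stream-down method, assuming a 32-byte source and the simplification of keeping at least 3 "good" bytes in max (so nothing ever exceeds 32 bits); the rng_state/rng_init/rng_next names are illustrative, not from the answer above:

#include <stdint.h>
#include <stddef.h>

struct rng_state {
    const unsigned char *bytes;  /* the 32 random source bytes */
    size_t pos;                  /* how many source bytes consumed so far */
    uint32_t carry;              /* the carried-forward randomness */
    uint32_t max;                /* exclusive upper bound of carry */
};

/* Feed in source bytes until max again holds at least 3 good bytes (>= 16M). */
static void rng_fill(struct rng_state *s)
{
    while (s->max < (1u << 24) && s->pos < 32) {
        s->carry += s->bytes[s->pos++] * s->max;  /* byte * current max, added in */
        s->max *= 256;                            /* max also grows by 256 */
    }
}

static void rng_init(struct rng_state *s, const unsigned char *bytes)
{
    s->bytes = bytes;
    s->pos = 0;
    s->carry = 0;
    s->max = 1;
    rng_fill(s);                 /* pulls in the first 3 bytes */
}

/* Return a value in 0..range-1 (with the slight small-value bias noted above). */
static uint32_t rng_next(struct rng_state *s, uint32_t range)
{
    uint32_t result = s->carry % range;  /* the modulo is the random answer */
    s->carry /= range;                   /* the division result is carried forward */
    s->max /= range;                     /* its exclusive bound shrinks the same way */
    rng_fill(s);
    return result;
}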
Stream up:
Say you want 0..99
Divide your max by range, to get the nextmax, and divide carryforward by nextmax. Now, your random number is in the division result, and the remainder forms the value you carry forward to get the next random.
When nextmax becomes less than 16M, simply multiply both nextmax and your carry forward by 256 and add in the next byte.
The downside of this method is that, depending on the division used to generate nextmax, the top value of the result (i.e. 99 or 9) is heavily biased against, OR you will sometimes generate the over-value (100); which one happens depends on whether you round up or down in that first division.
The computational cost here is again 2 divides, presuming the compiler optimiser blends div and mod operations. The multiply by 256 is fast.
In both cases you could choose to say that if the input carry forward value is in this "high bias range" then you will fall back to a different technique. You could even oscillate between the techniques: use the second in preference, but if it generates the over-value, use the first. On its own, though, the likelihood is that both techniques will bias on similar input random streams, when the carry forward value is near max. This bias can be reduced by making the second method generate -1 as the out-of-range result, but each of these fixes adds an extra multiply step.
Note that in arithmetic encoding this overflow zone is effectively discarded as each symbol is extracted. It is guaranteed during decoding that those edge values won't happen, and this contributes to the slight suboptimal compression.
/* The 32 bytes in data are treated as a base-256 numeral following a "." (a
radix point marking where fractional digits start). This routine
multiplies that numeral by range, updates data to contain the fractional
portion of the product, and returns the integer portion.
8-bit bytes are assumed, or "t /= 256" could be changed to
"t >>= CHAR_BIT". But then you have to check the sizes of int
and unsigned char to consider overflow.
*/
int r(int range, unsigned char *data)
{
    // Start with 0 carried from a lower position.
    int t = 0;
    // Iterate through each byte.
    for (int i = 32; 0 < i;)
    {
        --i;
        // Multiply next byte by our multiplier and add the carried data.
        t = data[i] * range + t;
        // Store the low bits of the result.
        data[i] = t;
        // Carry the high bits of the result to the next position.
        t /= 256;
    }
    // Return the bits that carried out of the multiplication.
    return t;
}
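A hypothetical usage of r(), just to show the calling pattern (the data initialiser here is a placeholder, not real entropy):

#include <stdio.h>

int r(int range, unsigned char *data);   /* defined above */

int main(void)
{
    unsigned char data[32] = { 0 };      /* fill with 32 real random bytes in practice */
    int a = r(100, data);                /* a value in 0..99; data keeps the remainder */
    int b = r(10, data);                 /* then a value in 0..9, and so on */
    printf("%d %d\n", a, b);
    return 0;
}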

GMP most significant digits

I'm performing some calculations on arbitrary precision integers using GNU Multiple Precision (GMP) library. Then I need the decimal digits of the result. But not all of them: just, let's say, a hundred of most significant digits (that is, the digits the number starts with) or a selected range of digits from the middle of the number (e.g. digits 100..200 from a 1000-digit number).
Is there any way to do it in GMP?
I couldn't find any functions in the documentation to extract a range of decimal digits as a string. The conversion functions which convert mpz_t to character strings always convert the entire number. One can only specify the radix, but not the starting/ending digit.
Is there any better way to do it other than converting the entire number into a humongous string only to take a small piece of it and throw out the rest?
Edit: What I need is not to control the precision of my numbers or limit it to a particular fixed amount of digits, but selecting a subset of digits from the digit string of the number of arbitrary precision.
Here's an example of what I need:
7^1316831 = 19821203202357042996...2076482743
The actual number has 1112852 digits, which I contracted into the ....
Now, I need only an arbitrarily chosen substring of this humongous string of digits. For example, the ten most significant digits (1982120320 in this case). Or the digits from 1112841th to 1112849th (21203202 in this case). Or just a single digit at the 1112841th position (2 in this case).
If I were to first convert my GMP number to a string of decimal digits with mpz_get_str, I would have to allocate a tremendous amount of memory for these digits only to use a tiny fraction of them and throw out the rest. (Not to mention that the original mpz_t number in binary representation already eats up quite a lot.)
If you know the number of decimal digits of x = 7^1316831 in advance (1112852 here), then you get your lower, say, 10 digits with
x % 10^10, and the upper 20 digits with
x / 10^(1112852 - 20).
Note: I get 19821203202357042995 for the latter; it ends in 5, not 6.
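A rough GMP sketch of that divide/modulo idea; it still builds the full number x, but never converts more than the requested digits to a string. mpz_sizeinbase is used to count the digits (for base 10 it may overestimate by one):

#include <gmp.h>
#include <stdio.h>

int main(void)
{
    mpz_t x, p, high, low;
    mpz_inits(x, p, high, low, NULL);

    mpz_ui_pow_ui(x, 7, 1316831);          /* x = 7^1316831 */
    size_t n = mpz_sizeinbase(x, 10);      /* ~1112852 decimal digits */

    mpz_ui_pow_ui(p, 10, 10);              /* p = 10^10 */
    mpz_tdiv_r(low, x, p);                 /* lowest 10 digits: x % 10^10 */

    mpz_ui_pow_ui(p, 10, n - 20);          /* p = 10^(n-20) */
    mpz_tdiv_q(high, x, p);                /* highest ~20 digits: x / 10^(n-20) */

    gmp_printf("high: %Zd\nlow:  %Zd\n", high, low);  /* low may drop leading zeros */
    mpz_clears(x, p, high, low, NULL);
    return 0;
}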
I don't think you can do that in GMP. However, you can use the Boost Multiprecision library:
Depending upon the number type, precision may be arbitrarily large (limited only by available memory), fixed at compile time (for example 50 or 100 decimal digits), or a variable controlled at run-time by member functions. The types are expression-template-enabled for better performance than naive user-defined types.
Emphasis mine
Another alternative is ttmath, with the type ttmath::Big<e,m> whose precision you can control. Any fixed-precision type will work, provided that you only need the most significant digits, because they all drop the low significant digits, just like float and double do. Those digits don't affect the high digits of the result, hence can be omitted safely. For instance, if you need the high 20 digits, then use a type that can store 20 digits and a little more, in order to provide enough data for correct rounding later.
For demonstration let's take the simple example of 7⁷ = 823543 where you only need the top 2 digits. Using a 4-digit type for the calculation you'll get this:
7⁵ = 16807 => round to 1681×10¹ and store
7⁵×7 = 1681×10¹×7 = 11767×10¹ ≈ 1177×10²
7⁵×7×7 = 1177×10²×7 = 8239×10²
As you can see the top digits are the same even without needing to get the full exact result. Calculating the full precision using GMP not only wastes a lot of time but also memory. Think about the amount of memory you need to store the result of another operation on 2 bigints to get the digits you want. By fixing the precision instead of leaving it at infinite you'll decrease the CPU and memory usage significantly.
If you need the 100th to 200th high-order digits then use a type that has enough room for 201 digits or more, and extract those 101 digits after the calculation. But this will be more wasteful, so you may want to change to an arbitrary-precision (or fixed-precision) type that uses a base that's a power of 10 for its limbs (I'm using GMP terminology here). For example, if the type uses base 10^9 then each limb represents 9 digits in the decimal output, and you can get any decimal digit directly, without any conversion from binary to decimal. That means zero waste for the string. I'm not sure which library uses base 10^n, but you can look at Mini-Pi's implementation, which uses base 10^9, or write it yourself. This way it also works for efficiently getting the high digits.
See
How are extremely large floating-point numbers represented in memory?
What is the simplest way of implementing bigint in C?

Repercussions of storing a 5 digit integer as a 16bit float

I am working with some data that can have large number values and the data itself is important.
The highest number seen is "89,482". So originally I was going to use unsigned int.
However, using these numbers in unsigned int format is causing some headaches, namely manipulating them in OpenGL shaders.
Basically things would be a lot simpler if I could use float instead.
However, I don't fully understand the repercussions of storing a number like this as floating point. Especially as, in the OpenGL case, I don't have the option of a single-channel 32-bit floating point texture, only 16-bit.
For 16-bit floats, Wikipedia states:
Precision limitations on integer values
Integers between 0 and 2048 can be exactly represented
Integers between 2049 and 4096 round to a multiple of 2 (even number)
Integers between 4097 and 8192 round to a multiple of 4
Integers between 8193 and 16384 round to a multiple of 8
Integers between 16385 and 32768 round to a multiple of 16
Integers between 32769 and 65519 round to a multiple of 32
Integers equal to or above 65520 are rounded to "infinity".
So does this quite simply mean that, in the case of the number "89,482", trying to store it in a 16-bit OpenGL texture will round it to infinity? If so, what are my options? Stick with unsigned int? What about when I need to normalize it, can I cast to float?
--- EDIT ---
I need the value to be un-normalised in the shader

Write 9 bits binary data in C

I am trying to write binary data that does not fit in 8 bits to a file. From what I understand, you can write binary data of any length if you can group it into predefined lengths of 8, 16, 32 or 64 bits.
Is there a way to write just 9 bits to a file? Or two values of 9 bits?
I have one value in the range ±32768 and 3 values in the range ±256. What would be the way to save the most space?
Thank you
No, I don't think there's any way, using C's file I/O APIs, to express storing less than 1 char of data, which will typically be 8 bits.
If you're on a 9-bit system, where CHAR_BIT really is 9, then it will be trivial.
If what you're really asking is "how can I store a number that has a limited range using the precise number of bits needed", inside a possibly larger file, then that's of course very possible.
This is often called bitstreaming and is a good way to optimize the space used for some information. Encoding/decoding bitstream formats requires you to keep track of how many bits you have "consumed" of the current input/output byte in the actual file. It's a bit complicated but not very hard.
Basically, you'll need (see the sketch after this list):
A byte stream s, i.e. something you can put bytes into, such as a FILE *.
A bit index i, i.e. an unsigned value that keeps track of how many bits you've emitted.
A current byte x, into which bits can be put, each time incrementing i. When i reaches CHAR_BIT, write it to s and reset i to zero.
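A minimal sketch of that scheme, assuming 8-bit chars; the bitwriter/bw_put/bw_flush names are illustrative:

#include <stdio.h>
#include <limits.h>

struct bitwriter {
    FILE *s;        /* the byte stream */
    unsigned x;     /* the current byte being assembled */
    unsigned i;     /* how many bits have been placed in x so far */
};

/* Append the low `nbits` bits of `value` to the stream (consumed high to low). */
static void bw_put(struct bitwriter *w, unsigned value, unsigned nbits)
{
    while (nbits--) {
        w->x |= ((value >> nbits) & 1u) << w->i;
        if (++w->i == CHAR_BIT) {       /* the byte is full: write it out */
            fputc((int)w->x, w->s);
            w->x = 0;
            w->i = 0;
        }
    }
}

/* Pad the final partial byte with zero bits and write it out. */
static void bw_flush(struct bitwriter *w)
{
    if (w->i != 0) {
        fputc((int)w->x, w->s);
        w->x = 0;
        w->i = 0;
    }
}

For the question's data you could, for instance, offset each value into an unsigned range, emit the 16-bit value followed by the three 9-bit values, and call bw_flush at the end.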
You cannot store values in the range –256 to +256 in nine bits either. That is 513 values, and nine bits can only distinguish 512 values.
If your actual ranges are –32768 to +32767 and –256 to +255, then you can use bit-fields to pack them into a single structure:
struct MyStruct
{
    int a : 16;
    int b : 9;
    int c : 9;
    int d : 9;
};
Objects such as this will still be rounded up to a whole number of bytes, and usually to a multiple of the alignment of the underlying storage units. The fields above use 43 bits total, so the minimum is six eight-bit bytes (48 bits), although typical compilers that place int bit-fields in four-byte units will make the structure eight bytes.
You can either accept this padding or use more complicated code to concatenate bits further before writing to a file. This requires additional code to assemble bits into sequences of bytes. It is rarely worth the effort, since storage space is currently cheap.
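A quick illustrative check of that padding (the exact value printed depends on the compiler's bit-field layout rules):

#include <stdio.h>

struct MyStruct          /* same structure as above */
{
    int a : 16;
    int b : 9;
    int c : 9;
    int d : 9;
};

int main(void)
{
    /* 43 bits of fields, rounded up to whole bytes; commonly prints 8. */
    printf("sizeof(struct MyStruct) = %zu\n", sizeof(struct MyStruct));
    return 0;
}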
You can apply the principle of base64 (just enlarging your base, not making it smaller).
Every value will be written into two bytes and combined with the last/next byte by shift and OR operations.
I hope this very abstract description helps you.

What is the approximate resolution of a single precision floating point number when it's around zero

I am storing many longitudes and latitudes as doubles, and I am wondering if I can get away with storing them as floats.
To answer this question, I need to know the approximate resolution of a single precision floating point number when the stored values are longitudes / latitudes (-180 to +180).
Your question may have several interpretations.
If it is just for angles and for storage on disk or on a device, I would suggest storing your values using a totally different technique: store them as 32-bit integers.
int encodedAngle = (int)(value * (0x7FFFFFFF / 180.0));
To recover it, do the reverse.
double angle = (encodedAngle / (0x7FFFFFFF / 180.0));
In this way you have full 31 bit resolution for 180 degrees and 1 bit for the sign.
You can also use this approach to keep your values in RAM; the cost of the conversion is higher compared to working directly with doubles, but if you want to keep memory usage low and resolution high, this can work quite well.
The cost is not so high, just a conversion to/from integer from/to double and a multiplication; modern processors will do it in very little time, and since less memory is accessed, if the list contains a lot of values your code will be more friendly to the processor cache.
Your resolution will be 180 / ((2^31) - 1) = 8.38190318 × 10^-8 degrees, not bad :)
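A small self-contained round trip of this encoding (the helper names and the sample coordinate are mine, not part of the answer):

#include <stdio.h>

#define SCALE (0x7FFFFFFF / 180.0)   /* 2^31 - 1 counts spread over 180 degrees */

static int encode_angle(double value)   { return (int)(value * SCALE); }
static double decode_angle(int encoded) { return encoded / SCALE; }

int main(void)
{
    double lon = -122.419415500;             /* an arbitrary longitude */
    int e = encode_angle(lon);
    printf("%.9f -> %d -> %.9f\n", lon, e, decode_angle(e));
    /* the round-trip error is bounded by about 8.4e-8 degrees */
    return 0;
}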
The resolution you can count on with single-precision floats is about 360 / (2 ^ 23) or 4 * 10 ^ -5.
More precisely, the largest single-precision float strictly less than 360 (which is itself exactly representable) is about 359.999969. Over the whole range -360 .. 360, you will be able to represent differences at least as small as the difference between these two numbers.
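A quick way to check that spacing (a small illustrative program using nextafterf):

#include <stdio.h>
#include <math.h>

int main(void)
{
    float below = nextafterf(360.0f, 0.0f);           /* largest float below 360 */
    printf("%.9g\n", below);                          /* prints 359.999969 */
    printf("spacing near 360: %g\n", 360.0f - below); /* 2^-15, about 3.05e-05 */
    return 0;
}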
Depends, but rather not.
A 32-bit float stores about 7 significant digits. That is normally too little for storing longitude/latitude at full resolution. For example, openstreetmap.org uses six digits after the decimal point, which means eight or nine significant digits in total.
In short, use float64.
Usually floats are 4 bytes (32 bits) while doubles are double that. However, the exact precision if you're doing calculations is implementation (and hardware) specific. On some systems all floats will be stored as doubles, just to add to the confusion.
