Scramble a floating point number? - c

I need a repeatable pseudo-random function from floats in [0,1] to floats in [0,1]. I.e. given a 32-bit IEEE float, return a "different" one (as random as possible, given the 24 bits of mantissa). It has to be repeatable, so keeping tons of internal state is out. And unfortunately it has to work with only 32-bit int and single-float math (no doubles and not even 32x32=64bit multiply, though I could emulate that if needed -- basically it needs to work on older CUDA hardware). The better the randomness the better, of course, within these rather severe limitations. Anyone have any ideas?
(I've been through Park-Miller, which requires 64-bit int math, and the CUDA version of Park-Miller which requires doubles, Mersenne Twisters which have lots of internal state, and a few other things which didn't work.)

Best I understand the requirements, a hash accomplishes the desired functionality. Reinterpret the float input as an integer, apply the hash function to produce an integer approximately uniformly distributed in [0, 2^32), then multiply by 2^-32 to map the result back to a float roughly uniformly distributed in [0, 1]. One suitable hash function which does not require multiplication is Bob Jenkins's mix(), which can be found here: http://www.burtleburtle.net/bob/hash/doobs.html.
To re-interpret the bits of a float as an integer and vice versa, there are two choices in CUDA. Use intrinsics, or use C++-style reinterpretation casts:
float f;
int i;
/* CUDA device intrinsics */
i = __float_as_int(f);
f = __int_as_float(i);
/* C++-style reinterpretation casts */
i = reinterpret_cast<int&>(f);
f = reinterpret_cast<float&>(i);
So as a self-contained function, the entire process might look something like this:
/* transform float in [0,1] into a different float in [0,1] */
float scramble_float (float f)
{
    unsigned int magic1 = 0x96f563ae; /* number of your choice */
    unsigned int magic2 = 0xb93c7563; /* number of your choice */
    unsigned int j;
    j = reinterpret_cast<unsigned int &>(f);
    mix (magic1, magic2, j);             /* Bob Jenkins's mix(); modifies all three arguments */
    return 2.3283064365386963e-10f * j;  /* scale by 2^-32 into [0,1] */
}
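For completeness, mix() above refers to Bob Jenkins's 96-bit mixing macro from the page linked earlier. Transcribed below as I remember it from his lookup2 code (verify against the linked source); note that it uses only subtraction, XOR, and shifts, so it satisfies the no-multiply constraint:

#define mix(a,b,c) \
{ \
    a -= b; a -= c; a ^= (c>>13); \
    b -= c; b -= a; b ^= (a<<8);  \
    c -= a; c -= b; c ^= (b>>13); \
    a -= b; a -= c; a ^= (c>>12); \
    b -= c; b -= a; b ^= (a<<16); \
    c -= a; c -= b; c ^= (b>>5);  \
    a -= b; a -= c; a ^= (c>>3);  \
    b -= c; b -= a; b ^= (a<<10); \
    c -= a; c -= b; c ^= (b>>15); \
}

After mix(magic1, magic2, j), j holds the most thoroughly mixed of the three words, which is what scramble_float() scales by 2^-32 and returns.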

The NVIDIA CUDA Toolkit includes a library called CURAND that I believe fits your requirements: it produces repeatable results (assuming you start with the same seed), works on the GPU, supports 32-bit floats and ints, and should work on older GPUs. It also supports multiple pseudo- and quasi-random generation algorithms and distributions.
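As a minimal illustration, a host-API sketch (my own, with error checking omitted; link with -lcurand) could look like this:

#include <stdio.h>
#include <cuda_runtime.h>
#include <curand.h>

int main(void)
{
    const size_t n = 1024;
    float *d_out;
    cudaMalloc((void **)&d_out, n * sizeof(float));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL); /* same seed -> same sequence */
    curandGenerateUniform(gen, d_out, n);             /* n floats in (0,1] on the device */

    curandDestroyGenerator(gen);
    cudaFree(d_out);
    return 0;
}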
[Note: a problem with using the C library rand() function (other than that it does not run on the device in CUDA) is that on Windows, RAND_MAX is only 32767, so any float created by division by RAND_MAX has only 15 random bits of precision. What's more, on Linux/macOS RAND_MAX is typically 2^31-1, so code that depends on it is not numerically portable.]

Why not use the standard C library rand() function and divide the result by RAND_MAX?
#include <stdlib.h>

float randf (void)
{
    return rand() / (float) RAND_MAX;
}

Related

Setting one's own type limits in C?

Can I set my own limits for data types in C? I'm solving a problem which involves some mega-great numbers, and I wish to perform many additions and multiplications and take the final result modulo some desired number, say 1537849. So I wonder if it's possible to reset the limits of the data types so that values are automatically taken modulo my chosen number whenever the outcome of an operation exceeds it, just as the processor normally wraps at the type's limits, but with a limit I choose. And if such a thing isn't possible, what is the most efficient way to attack such a problem?
edit:
Suppose one wants to calculate (2^1000) % 1537849 and place the result in the variable monster. Below is my attempt to conquer the problem:
uint64_t monster = 1;
uint32_t power = 1000;
for (uint32_t i = 0; i < power; i++) {
    monster *= 2;
    /* reduce often enough that monster cannot overflow:
       1537849 < 2^21, so 32 doublings stay below 2^53 < 2^64
       (reducing only every 64 doublings would overflow uint64_t) */
    if (i % 32 == 31) monster %= 1537849;
}
monster %= 1537849;
Is there any better way of doing so (different algorithm, using libraries, whatever ...)??
Can I set my own limits for data types in C?
The limits of the basic types are fixed by the implementation (the compiler and its target).
I wish to perform many additions and multiplications and to take the final result modulo some desired number, say 1537849
At any stage in the addition and multiplication, code can repeatedly perform the modulo. If the original numbers are N-bit, then at most N-bit math is needed - although it is easier to do with 2N-bit math. Unlimited-width math is inefficient and not needed for this task.
Example code for +, * and pow() with modulo limitations:
Modular exponentiation without range restriction
uintmax_t addmodmax(uintmax_t a, uintmax_t b, uintmax_t mod);
uintmax_t mulmodmax(uintmax_t a, uintmax_t b, uintmax_t mod);
uintmax_t powmodmax(uintmax_t x, uintmax_t expo, uintmax_t mod);
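In case the link is unavailable, here is my own sketch of what such helpers typically look like (not the linked answer's code): overflow is avoided with conditional subtraction and binary (Russian peasant) multiplication, so no 2N-bit intermediate is needed:

#include <stdint.h>

/* (a + b) % mod without overflow: a wrapped sum is detected via s < a */
uintmax_t addmodmax(uintmax_t a, uintmax_t b, uintmax_t mod)
{
    a %= mod;
    b %= mod;
    uintmax_t s = a + b;
    if (s < a || s >= mod)
        s -= mod;   /* also correct in the wraparound case, since a + b < 2*mod */
    return s;
}

/* (a * b) % mod via repeated modular doubling (Russian peasant multiplication) */
uintmax_t mulmodmax(uintmax_t a, uintmax_t b, uintmax_t mod)
{
    uintmax_t r = 0;
    a %= mod;
    while (b) {
        if (b & 1)
            r = addmodmax(r, a, mod);
        a = addmodmax(a, a, mod);
        b >>= 1;
    }
    return r;
}

/* (x ^ expo) % mod via square-and-multiply */
uintmax_t powmodmax(uintmax_t x, uintmax_t expo, uintmax_t mod)
{
    uintmax_t r = (mod == 1) ? 0 : 1;
    x %= mod;
    while (expo) {
        if (expo & 1)
            r = mulmodmax(r, x, mod);
        x = mulmodmax(x, x, mod);
        expo >>= 1;
    }
    return r;
}

With these, the question's example reduces to powmodmax(2, 1000, 1537849).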
Can I set my own limits for data types in C?
No, short of writing your own compiler and libraries.
I'm solving some problem which involves some mega-great numbers which easily exceed the types' limits
There are algorithms for handling huge numbers in parts ... and there are libraries that already do the work for you, e.g. have a look at the GNU Multiple Precision Arithmetic Library (GMP).
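As a sketch, the question's example (2^1000 mod 1537849) becomes a few lines with GMP (link with -lgmp):

#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t base, mod, result;
    mpz_init_set_ui(base, 2);
    mpz_init_set_ui(mod, 1537849);
    mpz_init(result);
    mpz_powm_ui(result, base, 1000, mod);  /* result = 2^1000 mod 1537849 */
    gmp_printf("monster = %Zd\n", result);
    mpz_clears(base, mod, result, NULL);
    return 0;
}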

looking for snprintf()-replacement

I want to convert a float (e.g. f=1.234) to a char array (e.g. s="1.234"). This is quite easy with snprintf(), but for size and performance reasons I can't use it (it is an embedded platform where snprintf() is too slow because it uses doubles internally).
So: how can I easily convert a float to a char-array (positive and negative floats, no exponential representation, maximum three digits after the dot)?
Thanks!
PS: to clarify: the platform has a NEON FPU which can do 32-bit float operations in hardware but is slow with 64-bit doubles. The C library for this platform unfortunately does not have a NEON/float-specific variant of snprintf, so I need a replacement. Besides that, the complete snprintf/printf machinery increases code size too much.
For many microcontrollers a simplified printf function without float/double support is available. For instance, many platforms ship newlib-nano, and Texas Instruments provides ustdlib.c.
With one of those non-float printf functions you could split the printing into something that uses only integers, like
float a = -1.2339f;
float b = a + ((a > 0) ? 0.0005f : -0.0005f);  /* round away from zero to 3 decimals */
int c = (int)b;                                /* integer part, truncated toward zero */
int d = (int)(b * 1000) % 1000;                /* three digits after the dot */
if (d < 0) d = -d;
printf("%s%d.%03d\n", (b < 0 && c == 0) ? "-" : "", c, d);  /* keep the sign for -0.xxx */
which outputs
-1.234
Do watch out for overflows of the integer on 8 and 16 bit platforms.
-edit-
Furthermore, as noted in the comments, rounding corner cases will give different answers than printf's implementation.
You might check whether your stdlib provides strfromf, the low-level routine that converts a float into a string, normally used by printf and friends. If available, this may be lighter-weight than pulling in the entire stdio machinery (and indeed, that is why it was added in ISO/IEC TS 18661, the IEC 60559 floating-point extensions to C).
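If your libc does ship it, usage looks roughly like this (a sketch; note that strfromf accepts only a single conversion specification):

#include <stdio.h>
#include <stdlib.h>   /* strfromf lives here per TS 18661-1 / C23 */

int main(void)
{
    char buf[32];
    float f = -1.2339f;
    /* the format must be a single conversion spec, e.g. "%.3f" */
    strfromf(buf, sizeof buf, "%.3f", f);
    puts(buf);        /* prints -1.234 */
    return 0;
}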

C long double in golang

I am porting an algorithm from C to Go. And I got a little bit confused. This is the C function:
void gauss_gen_cdf(uint64_t cdf[], long double sigma, int n)
{
    int i;
    long double s, d, e;
    // Calculations ...
    for (i = 1; i < n - 1; i++) {
        cdf[i] = s;
    }
}
And in the for loop the value "s" is assigned to element "i" of the array cdf. How is this possible? As far as I know, a long double corresponds to a float64 (in the Go context). So I shouldn't be able to compile the C code, because I am assigning a long double to an array which contains only uint64 elements. But the C code works fine.
So can someone please explain why this is working?
Thank you very much.
UPDATE:
The original C code of the function can be found here: https://github.com/mjosaarinen/hilabliss/blob/master/distribution.c#L22
The assignment cdf[i] = s performs an implicit conversion to uint64_t. It's hard to tell if this is intended without the calculations you omitted.
In practice, long double as a type varies considerably across architectures. Whether Go's float64 is an appropriate replacement depends on the architecture you are porting from. For example, on x86, long double is an 80-bit extended-precision type, but Windows systems are usually configured so that the x87 FPU computes results with only a 53-bit mantissa, which means that float64 could still be equivalent for your purposes.
EDIT In this particular case, the values computed by the sources appear to be static and independent of the input. I would just use float64 on the Go side and check whether the computed values are identical to those of the C version when run on an x86 machine under real GNU/Linux (virtualization should be okay), to work around the Windows FPU issues. The choice of x86 is just a guess, because it is likely what the original author used. I do not understand the underlying cryptography, so I can't say whether a difference in the computed values impacts the security. (Also note that the C code does not seem to properly seed its PRNG.)
C long double in golang
The title suggests an interest in whether or not Go has an extended-precision floating-point type similar to long double in C.
The answer is:
Not as a primitive, see Basic types.
But arbitrary precision is supported by the math/big library.
Why is this working?
long double s = some_calculation();
uint64_t a = s;
It compiles because, unlike Go, C allows certain implicit type conversions. The integer portion of the floating-point value of s will be copied. Presumably the s value has been scaled such that it can be interpreted as a fixed-point value where, based on the linked library source, 0xFFFFFFFFFFFFFFFF (2^64-1) represents the value 1.0. To make the most of such assignments, it is worth using an extended floating-point type with 64 precision bits.
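A tiny standalone illustration of that kind of assignment (the value 0.75 and the 2^64 scale factor here are made-up, not taken from the library):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* pretend a probability of 0.75 has been scaled into [0, 2^64) */
    long double s = 0.75L * 18446744073709551616.0L;  /* 0.75 * 2^64, exact */
    uint64_t a = s;   /* implicit conversion truncates toward zero */
    printf("%llu\n", (unsigned long long)a);  /* 13835058055282163712 */
    return 0;
}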
If I had to guess, I would say that the (crypto-related) library is using fixed-point here because they want to ensure deterministic results, see: How can floating point calculations be made deterministic?. And since the extended-precision floating point is only being used for initializing a lookup table, using the (presumably slow) math/big library would likely perform perfectly fine in this context.

Why does my own power function, when used for calculating roots, return a wrong result?

I implemented my own power function, which I then used for calculating a root. I wanted to compare its result with the one returned by the pow function from math.h. However, it turned out that when used for calculating roots, my power function yields wrong answers. The square root of 15 is about 3.87, but my code prints 15:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>

double power(int base, double index)
{
    double result = 1;
    int i;
    for (i = 0; i < index; i++)   /* for index = 0.5, this body runs exactly once (0 < 0.5) */
        result *= base;
    return result;
}

int main()
{
    int n = 15, s = 2;
    printf("2^3 = %f\n\n", power(2, 3));
    double result1 = power(n, 1.0/s);
    printf("%d\n", (int)result1);  /* prints 15: one multiplication by the base */
    double result2 = pow(n, 1.0/s);
    printf("%d\n", (int)result2);  /* prints 3: sqrt(15) ~ 3.873, truncated */
    return 0;
}
Your function didn't work because its implementation uses the method that's typically used for explaining powers intuitively: "take the number 1 and multiply it by the base, exponent times". However, that method is only applicable to natural-number exponents; it is not the actual mathematical definition of powers with arbitrary exponents.
If you want a function that works for other number spaces, you need a numerical method that is applicable to those as well. Typically, such methods involve summing a particular series.
First, you need to define a function that handles these:
Power for exponents that are positive integers. (This is what you have achieved.)
Power for exponents that are negative integers. (You could use inversion, abs and your previous step for this.)
Power for exponent equal to zero. (Luckily, this is just a constant for most cases.)
You will also need an already-implemented ln(double x) (or you can implement it yourself by summing a suitable series, which will involve your integer power function) and a factorial(int n) function (which is easy to write, even naively).
Then, you can write a function that takes any real base, any real exponent, and an integer n, and does the following:
Calculate exponent * ln(base).
Use your integer power function to calculate the n-th power of that result.
Divide that result by factorial(n).
Wrap this in a loop that sums the results of this calculation for all values of n from 0 up to the highest value that can be handled validly and efficiently (the higher the maximum n, the better the approximation). That sum is the mathematical result you're looking for, so a function that takes base and exponent as parameters and runs this loop over n is your actual final pow function, the one you expose to external code; a sketch follows below.
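Here is a rough sketch of the whole scheme under the stated assumptions. log() from math.h stands in for a hand-written ln(), the helper names int_power and factorial are mine, and it is only valid for base > 0:

#include <math.h>    /* log() as a stand-in for a hand-written ln() */
#include <stdio.h>

/* power for non-negative integer exponents (what the asker already has) */
static double int_power(double base, int n)
{
    double result = 1.0;
    for (int i = 0; i < n; i++)
        result *= base;
    return result;
}

/* factorial as a double to postpone overflow */
static double factorial(int n)
{
    double result = 1.0;
    for (int i = 2; i <= n; i++)
        result *= i;
    return result;
}

/* pow(base, expo) = exp(expo*ln(base)) = sum over n of (expo*ln(base))^n / n! */
static double my_pow(double base, double expo)
{
    double y = expo * log(base);
    double sum = 0.0;
    for (int n = 0; n < 20; n++)           /* 20 terms is plenty for small |y| */
        sum += int_power(y, n) / factorial(n);
    return sum;
}

int main(void)
{
    printf("%f\n", my_pow(15, 0.5));       /* ~3.872983 */
    return 0;
}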
Alternatively, it wouldn't hurt to just look at real-world implementations and see what methods they've used. Such implementations are often different from the most obvious mathematical ones, because they can be more efficient for computers (often directly taking into account the binary representations of the numbers involved) and also take special care to avoid things like overflows and underflows of the various data types involved.

Storing numbers with higher precision in C

I am writing a program in which I need to store numbers with very high precision (around 10^-10) and then use them as a parameter ( create_bloomfilter ([yet to decide the type] falsePositivity, long expected_num_of_elem) ).
The highest precision I am able to get is with double (something around 10^-6), which is not sufficient.
How can I store numbers with higher precision in C?
You have been misinformed about double.
The smallest positive number you can store in a double is about 2⨯10^-308, not counting denormalized numbers, which can be smaller. Denormals go down to 5⨯10^-324. Doubles have the equivalent of about 15-17 significant decimal digits of precision, which is sufficient to measure the diameter of the Earth to within the size of a red blood cell.
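You can verify these limits on your own system directly from float.h (a quick sketch; DBL_TRUE_MIN requires C11):

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("DBL_MIN      = %g\n", DBL_MIN);      /* smallest normal double, ~2.2e-308 */
    printf("DBL_TRUE_MIN = %g\n", DBL_TRUE_MIN); /* smallest denormal, ~4.9e-324 (C11) */
    printf("DBL_EPSILON  = %g\n", DBL_EPSILON);  /* ~2.2e-16, i.e. 15-17 digits */
    return 0;
}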
If you really need more precision, you need MPFR. (If your algorithms are numerically unstable, MPFR might not help.)
Edit: I figured out what you are doing wrong.
In C, 10^-7 is an integer expression. It should be equal to -13 on most systems. The ^ operator is the bitwise XOR operator, not the exponentiation operator. There is no exponentiation operator in C, because C operators generally correspond to more primitive operations, at least in terms of hardware implementation.
You want 1e-7, or pow(10, -7).
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    printf("2e-308 = %g\n", 2e-308);
    printf("2 * pow(10, -308) = %g\n", 2 * pow(10, -308));
    printf("10^-7 = %d\n", 10^-7);
    return 0;
}
Output:
2e-308 = 2e-308
2 * pow(10, -308) = 2e-308
10^-7 = -13
Note that there are a lot of gotchas with floating point numbers.
Try the GNU MPFR library and the GNU GMP library.
The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.
GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.
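As a small sketch of what using MPFR looks like (the 200-bit precision is an arbitrary choice; link with -lmpfr -lgmp):

#include <stdio.h>
#include <mpfr.h>

int main(void)
{
    mpfr_t x;
    mpfr_init2(x, 200);                        /* 200 bits of mantissa */
    mpfr_set_str(x, "1e-10", 10, MPFR_RNDN);   /* far more precision than double offers */
    mpfr_printf("x = %.40Rf\n", x);
    mpfr_clear(x);
    return 0;
}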
Is long double sufficient? Some implementations use a 128-bit long double, which should easily handle your requirements.
http://en.wikipedia.org/wiki/Quadruple_precision
If you're looking for something extremely strong, check out MPFR
