Converting fractions to floating point

I'm trying to convert a fraction to floating point and use it for comparison, but the values are too small and the Boolean variables always come out true. Is my conversion correct, or should I do it another way that I don't know about?
A test case:
// result is -0.0074
float coilh0re = fr32_to_float(GO_coil_H[0].re)*0.8f;
// result is -0.0092
float coilrefundamental = fr32_to_float(CoilEepromData.coilboardhspule.reFundamental);
// result is -0.01123
float coilh0re2 = fr32_to_float(GO_coil_H[0].re)*1.2f;
-0.0074 > -0.0092 > -0.01123
Here is a snippet of the code:
bool resultA = fr32_to_float(GO_coil_H[0].re)*0.8f < fr32_to_float(CoilEepromData.coilboardhspule.reFundamental) ? 1 : 0;
bool resultB = fr32_to_float(CoilEepromData.coilboardhspule.reFundamental) <= fr32_to_float(GO_coil_H[0].re)*1.2f ? 1 : 0;
bool resultAB = !(resultA & resultB); // always true
bool resultC = fr32_to_float(GO_coil_H[1].re)*0.8f < fr32_to_float(CoilEepromData.coilboardhspule.reHarmonic) ? 1:0;
bool resultD = fr32_to_float(CoilEepromData.coilboardhspule.reHarmonic) <= fr32_to_float(GO_coil_H[1].re)*1.2f ? 1:0;
bool resultCD = !(resultC & resultD); // always true
bool resultE = fr32_to_float(GO_coil_H[0].im)*0.8f < fr32_to_float(CoilEepromData.coilboardhspule.imFundamental)? 1 : 0;
bool resultF = fr32_to_float(CoilEepromData.coilboardhspule.imFundamental) <= fr32_to_float(GO_coil_H[0].im)*1.2f ? 1 : 0;
bool resultEF = !(resultE & resultF);// always true
bool resultG = fr32_to_float(GO_coil_H[1].im)*0.8f < fr32_to_float(CoilEepromData.coilboardhspule.imHarmonic) ? 1 : 0;
bool resultH = fr32_to_float(CoilEepromData.coilboardhspule.imHarmonic) <= fr32_to_float(GO_coil_H[1].im)*1.2f ? 1 : 0;
bool resultGH = !(resultG & resultH);// always true
if(! ((fr32_to_float(GO_coil_H[0].re)*0.8f < fr32_to_float(CoilEepromData.coilboardhspule.reFundamental)) && (fr32_to_float(CoilEepromData.coilboardhspule.reFundamental) <= fr32_to_float(GO_coil_H[0].re)*1.2f) )
|| ! ((fr32_to_float(GO_coil_H[1].re)*0.8f < fr32_to_float(CoilEepromData.coilboardhspule.reHarmonic)) && (fr32_to_float(CoilEepromData.coilboardhspule.reHarmonic) <= fr32_to_float(GO_coil_H[1].re)*1.2f) )
|| ! ((fr32_to_float(GO_coil_H[0].im)*0.8f < fr32_to_float(CoilEepromData.coilboardhspule.imFundamental)) && (fr32_to_float(CoilEepromData.coilboardhspule.imFundamental) <= fr32_to_float(GO_coil_H[0].im)*1.2f) )
|| ! ((fr32_to_float(GO_coil_H[1].im)*0.8f < fr32_to_float(CoilEepromData.coilboardhspule.imHarmonic)) && (fr32_to_float(CoilEepromData.coilboardhspule.imHarmonic) <= fr32_to_float(GO_coil_H[1].im)*1.2f) ) )
{
eUserCode = E_USER_SOIL_FAILED;
eProcessState = E_ERROR_HANDLING;
}
}

It appears the OP wants to test whether the value reFundamental is within +/-20% of re. This is not a float precision issue but a math one.
// Simplified problem
float re = -0.01123f/1.2f;
float reFundamental = -0.0092f;
bool resultA = re*0.8f < reFundamental;
bool resultB = reFundamental <= re*1.2f;
bool resultAB = !(resultA & resultB); // always true
But the values are negative, so the < and <= comparisons should be reversed.
There are various alternatives. Example (adjust to taste):
bool in_range(float x, float limit, float factor) {
float limitp = limit*(1.0f + factor);
float limitm = limit*(1.0f - factor);
if (x > limitm) return x <= limitp;
if (x < limitm) return x >= limitp;
return x == limitp;
}
bool resultAB = !in_range(fr32_to_float(CoilEepromData.coilboardhspule.reFundamental),
fr32_to_float(GO_coil_H[0].re), 0.20);

If you want to compare fractions, do not use floating point at all: convert them to the same denominator and compare the numerators.
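A minimal sketch of that idea, assuming each fraction is available as a numerator/denominator pair of 32-bit integers with a positive denominator (the Fraction type and fraction_cmp are hypothetical names). Cross-multiplying is the same as bringing both fractions to the common denominator a.den*b.den and comparing the numerators; doing it in int64_t keeps the products from overflowing:

```c
#include <stdint.h>

/* A fraction num/den; den is assumed to be > 0. */
typedef struct { int32_t num; int32_t den; } Fraction;

/* Returns -1, 0 or 1 when a < b, a == b or a > b.
   With both denominators positive:
   a.num/a.den < b.num/b.den  <=>  a.num*b.den < b.num*a.den */
int fraction_cmp(Fraction a, Fraction b) {
    int64_t lhs = (int64_t)a.num * b.den;   /* |product| <= 2^62, fits */
    int64_t rhs = (int64_t)b.num * a.den;
    return (lhs > rhs) - (lhs < rhs);
}
```

For example, -74/10000 compares greater than -92/10000, and 1/3 compares equal to 2/6, with no rounding involved.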

Related

How to create a float given an integer and the place of the decimal?

How can you generate a floating point given an integer and the decimal position?
For example:
int decimal = 1000;
int decimal_position = 3;
float value = 1.000;
I have accomplished this by using powers but that is not efficient
decimal/pow(10, decimal_position)

You can do this with a few integer multiplications and one floating point division:
int decimal = 1000;
int decimal_position = 3;
int offset = 1, i;
for (i=0; i<decimal_position; i++) {
offset *= 10;
}
float value = (float)decimal / offset;
Note that this works assuming decimal_position is non-negative and that 10^decimal_position fits in an int.

How can you generate a floating point given an integer and the decimal position?
I have accomplished this by using powers but that is not efficient
float value = decimal/pow(10, decimal_position);
It depends on the range of decimal_position.
With 0 <= decimal_position < 8, code could use a table look-up.
const float tens[8] = { 1.0f, 0.1f, ..., 1.0e-7f };
float value = decimal*tens[decimal_position];
Yet to handle all int decimal and int decimal_position values that result in a finite value, using float powf(float, float), rather than double pow(double, double), should be the first choice.
// float power function
float value = decimal/powf(10.0f, decimal_position);
If the best possible value is not needed, code could multiply (*) instead. This is slightly less precise, as 0.1f is not exactly the mathematical 0.1. Yet * is usually faster than /.
float value = decimal*powf(0.1f, decimal_position);
Looping to avoid powf() could be done for small magnitudes of decimal_position (below some bound N):
if (decimal_position < 0) {
if (decimal_position > -N) {
float ten = 1.0f;
while (decimal_position++ < 0) ten *= 10.0f; // ten = 10^-decimal_position
value = decimal*ten;
} else {
value = decimal*powf(10.0f, -decimal_position);
}
} else {
if (decimal_position < N) {
float ten = 1.0f;
while (decimal_position-- > 0) ten *= 10.0f; // ten = 10^decimal_position
value = decimal/ten;
} else {
value = decimal/powf(10.0f, decimal_position); // alternate: *powf(0.1f, ...
}
}
Some processors may benefit from using pow() rather than powf(), yet I find powf() is more commonly faster.
Of course, if int decimal and int decimal_position are such that an integer answer is possible:
// example, assume 32-bit `int`
if (decimal_position <= 0 && decimal_position >= -9) {
const long long i_ten[10] = {1,10,100,1000,..., 1000000000};
value = decimal*i_ten[-decimal_position];
} else {
// use the code above ...
}
Or, if abs(decimal_position) <= 19 and FP math is expensive, consider:
unsigned long long ipow10(unsigned expo) {
unsigned long long ten = 10; // 10^(2^k) on the k-th pass
unsigned long long y = 1;
while (expo > 0) {
if (expo % 2u) {
y = ten * y;
}
expo /= 2u;
ten *= ten; // square for the next bit; may wrap on the last pass, but is then unused
}
return y;
}
if (decimal_position <= 0) {
value = 1.0f*decimal*ipow10(-decimal_position);
} else {
value = 1.0f*decimal/ipow10(decimal_position);
}
Or if abs(decimal_position) <= 27 ...
if (decimal_position <= 0) {
value = scalbnf(decimal, -decimal_position) * ipow5(-decimal_position);
} else {
value = scalbnf(decimal, -decimal_position) / ipow5(decimal_position);
}
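The snippet above relies on an ipow5() that is not shown. A sketch of it, written by repeated squaring like ipow10(), plus a wrapper (scale_by_pow10 is a hypothetical name). It uses 10^n = 2^n * 5^n: scalbnf() applies the power of two exactly (barring overflow or underflow), and 5^27 is the largest power of five that fits in an unsigned 64-bit integer, which is where the <= 27 bound comes from. For large exponents the conversion of ipow5() to float may round:

```c
#include <math.h>

/* 5^expo by repeated squaring; exact for expo <= 27 (5^27 < 2^64). */
unsigned long long ipow5(unsigned expo) {
    unsigned long long base = 5;
    unsigned long long y = 1;
    while (expo > 0) {
        if (expo % 2u) {
            y *= base;
        }
        expo /= 2u;
        base *= base;   /* may wrap on the last pass; unsigned wrap is defined and the value is unused */
    }
    return y;
}

/* decimal * 10^(-decimal_position), for abs(decimal_position) <= 27. */
float scale_by_pow10(int decimal, int decimal_position) {
    if (decimal_position <= 0) {
        return scalbnf((float)decimal, -decimal_position)
               * (float)ipow5((unsigned)-decimal_position);
    }
    return scalbnf((float)decimal, -decimal_position)
           / (float)ipow5((unsigned)decimal_position);
}
```

For example, scale_by_pow10(12345, 2) computes 3086.25f / 25.0f, both exact operands, so the only rounding is in the final division.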

Pattern for action decision

I am writing a maze generator, and at some point I have to choose a random unvisited neighbour of a cell. My first idea was simply to enumerate the neighbours as left = 0, top = 1, right = 2, bottom = 3, use rand() % 4 to generate a random number, and choose the corresponding cell. However, not all cells have 4 neighbours, so I had to write the following code:
Cell* getRandomNeighbour(const Maze* const maze, const Cell* const currentCell) {
int randomNumb = rand() % 4;
int timer = 1;
while(timer > 0) {
if (randomNumb == 0 && currentCell->x < maze->width-1 && maze->map[currentCell->y][currentCell->x+1].isUnvisited)
return &maze->map[currentCell->y][currentCell->x+1];
if (randomNumb == 1 && currentCell->x > 0 && maze->map[currentCell->y][currentCell->x-1].isUnvisited)
return &maze->map[currentCell->y][currentCell->x-1];
if (randomNumb == 2 && currentCell->y < maze->height-1 && maze->map[currentCell->y+1][currentCell->x].isUnvisited)
return &maze->map[currentCell->y+1][currentCell->x];
if (randomNumb == 3 && currentCell->y > 0 && maze->map[currentCell->y-1][currentCell->x].isUnvisited)
return &maze->map[currentCell->y-1][currentCell->x];
timer--;
randomNumb = rand() % 4;
}
if (currentCell->x < maze->width-1 && maze->map[currentCell->y][currentCell->x+1].isUnvisited)
return &maze->map[currentCell->y][currentCell->x+1];
if (currentCell->x > 0 && maze->map[currentCell->y][currentCell->x-1].isUnvisited)
return &maze->map[currentCell->y][currentCell->x-1];
if (currentCell->y < maze->height-1 && maze->map[currentCell->y+1][currentCell->x].isUnvisited)
return &maze->map[currentCell->y+1][currentCell->x];
if (currentCell->y > 0 && maze->map[currentCell->y-1][currentCell->x].isUnvisited)
return &maze->map[currentCell->y-1][currentCell->x];
return NULL;
}
So, if the right direction isn't chosen within timer random attempts (for example 10), it is picked by brute force. This approach seems good because varying timer changes the complexity of the maze: the smaller timer is, the more straightforward the maze is. Nevertheless, if my only purpose is to generate a completely random maze, it takes a lot of execution time and looks a little ugly. Is there any pattern (in C) or way of refactoring that would let me deal with this situation without long switches and lots of if-else constructs?

As @pat and @Ivan Gritsenko suggested, you can limit your random choice to the valid cells only, like this:
Cell* getRandomNeighbour(const Maze* const maze, const Cell* const currentCell)
{
Cell *neighbours[4] = {NULL};
int count = 0;
// first select the valid neighbours
if ( currentCell->x < maze->width - 1
&& maze->map[currentCell->y][currentCell->x + 1].isUnvisited ) {
neighbours[count++] = &maze->map[currentCell->y][currentCell->x + 1];
}
if ( currentCell->x > 0
&& maze->map[currentCell->y][currentCell->x - 1].isUnvisited ) {
neighbours[count++] = &maze->map[currentCell->y][currentCell->x - 1];
}
if ( currentCell->y < maze->height - 1
&& maze->map[currentCell->y + 1][currentCell->x].isUnvisited ) {
neighbours[count++] = &maze->map[currentCell->y + 1][currentCell->x];
}
if ( currentCell->y > 0
&& maze->map[currentCell->y - 1][currentCell->x].isUnvisited ) {
neighbours[count++] = &maze->map[currentCell->y - 1][currentCell->x];
}
// then choose one of them (if any)
int chosen = 0;
if ( count > 1 )
{
int divisor = RAND_MAX / count;
do {
chosen = rand() / divisor;
} while (chosen >= count);
}
return neighbours[chosen];
}
The rationale behind the random number generation part (as opposed to the more common rand() % count) is well explained in this answer.
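The bias that the divisor loop avoids is easy to see on a toy scale. The sketch below does not call rand(); it enumerates every output of a hypothetical generator whose range is 0..rand_max and counts how often each index comes out of the two methods (hits_modulo and hits_divisor are illustrative names):

```c
/* How many of the values 0..rand_max map to `index` under value % count. */
int hits_modulo(int rand_max, int count, int index) {
    int hits = 0;
    for (int v = 0; v <= rand_max; v++)
        if (v % count == index) hits++;
    return hits;
}

/* Same for the answer's method, value / (rand_max / count). Results >= count
   are rejected (the real loop calls rand() again), so they match no index. */
int hits_divisor(int rand_max, int count, int index) {
    int divisor = rand_max / count;
    int hits = 0;
    for (int v = 0; v <= rand_max; v++)
        if (v / divisor == index) hits++;
    return hits;
}
```

With rand_max = 7 and count = 3, value % 3 yields indices 0 and 1 three times each but index 2 only twice, while the divisor method (divisor 7 / 3 = 2) yields each index exactly twice and rejects the values 6 and 7. With a real RAND_MAX the skew of % is much smaller, but it remains unless RAND_MAX + 1 is a multiple of count.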

Factoring out the repeated code, and picking the order of directions to try in a more disciplined way, yields this:
// in_maze returns whether x, y is a valid maze coordinate.
int in_maze(const Maze* const maze, int x, int y) {
return 0 <= x && x < maze->width && 0 <= y && y < maze->height;
}
Cell *get_random_neighbour(const Maze* const maze, const Cell* const c) {
int dirs[] = {0, 1, 2, 3};
// Randomly shuffle dirs.
for (int i = 0; i < 4; i++) {
int r = i + rand() % (4 - i);
int t = dirs[i];
dirs[i] = dirs[r];
dirs[r] = t;
}
// Iterate through the shuffled dirs, returning the first one that's valid.
for (int trial=0; trial<4; trial++) {
int dx = (dirs[trial] == 0) - (dirs[trial] == 2);
int dy = (dirs[trial] == 1) - (dirs[trial] == 3);
if (in_maze(maze, c->x + dx, c->y + dy)) {
const Cell * const ret = &maze->map[c->y + dy][c->x + dx];
if (ret->isUnvisited) return ret;
}
}
return NULL;
}
(Disclaimer: untested -- it probably has a few minor issues, for example const correctness).
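The direction decoding can also be table-driven. This is a sketch under stated assumptions: the question never shows Cell or Maze, so the definitions below are hypothetical (map is taken to be a Cell ** indexed [y][x]), and random_unvisited_neighbour is an illustrative name. With map as a pointer-to-pointer, &maze->map[y][x] is a plain Cell * even through a const Maze *, which sidesteps the const issue mentioned in the disclaimer:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types; the question does not show its definitions. */
typedef struct { int x, y; bool isUnvisited; } Cell;
typedef struct { int width, height; Cell **map; } Maze;   /* map[y][x] */

/* Offsets for dirs 0..3 as decoded in the answer: +x, +y, -x, -y. */
static const int DX[4] = { 1, 0, -1, 0 };
static const int DY[4] = { 0, 1, 0, -1 };

Cell *random_unvisited_neighbour(const Maze *maze, const Cell *c) {
    int dirs[4] = { 0, 1, 2, 3 };
    /* Fisher-Yates shuffle; the last slot needs no swap. */
    for (int i = 0; i < 3; i++) {
        int r = i + rand() % (4 - i);
        int t = dirs[i];
        dirs[i] = dirs[r];
        dirs[r] = t;
    }
    /* Return the first in-bounds, unvisited neighbour in shuffled order. */
    for (int k = 0; k < 4; k++) {
        int nx = c->x + DX[dirs[k]];
        int ny = c->y + DY[dirs[k]];
        if (0 <= nx && nx < maze->width && 0 <= ny && ny < maze->height
            && maze->map[ny][nx].isUnvisited) {
            return &maze->map[ny][nx];
        }
    }
    return NULL;   /* no unvisited neighbour */
}
```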

Ray Box Intersection in C

I am trying to write a function that calculates a ray-box intersection in C. Most of the procedures I found return a bool (whether or not there is an intersection). However, I need a function that returns a tuple (I know there are no tuples in C, so I made a struct to represent one). Specifically, I need the values of tmin and tmax even when they are negative, and I assign them a negative value when they do not exist. How should I manage the return values so this works properly? My C code is based on the code on this page: https://tavianator.com/fast-branchless-raybounding-box-intersections-part-2-nans/. The actual implementation in my program is as follows:
RectMinMax* Intersection(BoundingBox* b, Ray* r) {
RectMinMax* TMinMax = malloc(sizeof(RectMinMax));
float tmin = -INFINITY, tmax = INFINITY;
if (ray_get_direction(r).X != 0) {
float t1 = (b->x - ray_get_origin(r).X) / ray_get_direction(r).X;
float t2 = ((b->x + b->length) - ray_get_origin(r).X)/ ray_get_direction(r).X;
tmin = fmaxf(tmin, fminf(t1, t2));
tmax = fminf(tmax, fmaxf(t1, t2));
}
else if (ray_get_origin(r).X <= b->x || ray_get_origin(r).X >= (b->x + b->length)) {
TMinMax->min = -55;
TMinMax->max = -55;
return TMinMax;
}
if (ray_get_direction(r).Y != 0) {
float t1 = (b->y - ray_get_origin(r).Y) / ray_get_direction(r).Y;
float t2 = ((b->y + b->width) - ray_get_origin(r).Y)/ ray_get_direction(r).Y;
tmin = fmaxf(tmin, fminf(t1, t2));
tmax = fminf(tmax, fmaxf(t1, t2));
}
else if (ray_get_origin(r).Y <= b->y || ray_get_origin(r).Y >= (b->y + b->width)) {
TMinMax->min = -55;
TMinMax->max = -55;
return TMinMax;
}
if (ray_get_direction(r).Z != 0) {
float t1 = (b->z - ray_get_origin(r).Z) / ray_get_direction(r).Z;
float t2 = ((b->z + b->height) - ray_get_origin(r).Z)/ ray_get_direction(r).Z;
tmin = fmaxf(tmin, fminf(t1, t2));
tmax = fminf(tmax, fmaxf(t1, t2));
}
else if (ray_get_origin(r).Z <= b->z || ray_get_origin(r).Z >= (b->z + b->height)) {
TMinMax->min = -55;
TMinMax->max = -55;
return TMinMax;
}
if (tmax > tmin && tmax > 0) {
TMinMax->min = tmin;
TMinMax->max = tmax;
return TMinMax;
}
else {
TMinMax->min = -55;
TMinMax->max = -55;
return TMinMax;
}
}
RectMinMax is just a struct with two members, max and min. In the code I used -55 to represent the "return false" cases of the code in the link. I understand I am leaving out cases in which, for example, tmax is positive and tmin is negative, but I do not know how to fix this.

Fast way to select the correct face of a cubemap?

Given an axis-aligned cubemap centered at the origin and an arbitrary point in 3D space, the straightforward way to check which face the point lies on is to take the coordinate with the greatest magnitude and select the face corresponding to that component.
The naive code would read as follows:
if (fabs(point.x) >= fabs(point.y) && fabs(point.x) >= fabs(point.z)) {
if (point.x >= 0) {face=0;} else {face=1;}
}
if (fabs(point.y) >= fabs(point.x) && fabs(point.y) >= fabs(point.z)) {
if (point.y >= 0) {face=2;} else {face=3;}
}
if (fabs(point.z) >= fabs(point.x) && fabs(point.z) >= fabs(point.y)) {
if (point.z >= 0) {face=4;} else {face=5;}
}
Is there a way to achieve the same thing that is considered better in C? Would branchless code be more efficient?
Any inline assembly standard of choice can alternatively be used for the purpose. If necessary, all the >= operators can be turned into > operators.

It might not look like much, but the first three if statements eliminate all of the calls to fabs and also replace the inner if statements of the posted code. The final if/else needs at most two compares/branches to determine the answer.
if ( point.x < 0 ) {
x = -point.x;
fx = 1;
} else {
x = point.x;
fx = 0;
}
if ( point.y < 0 ) {
y = -point.y;
fy = 3;
} else {
y = point.y;
fy = 2;
}
if ( point.z < 0 ) {
z = -point.z;
fz = 5;
} else {
z = point.z;
fz = 4;
}
if ( x >= y ) {
if ( x >= z ) { face = fx; } else { face = fz; }
} else {
if ( y >= z ) { face = fy; } else { face = fz; }
}
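Wrapped as a self-contained function so it can be dropped in and checked, assuming a hypothetical Vec3 type with float members (the question doesn't show the type of point); cube_face is an illustrative name. Ties are broken differently from the naive code: here x wins over y and z, and y over z, whereas in the original the later if statements overwrite the earlier ones. Whether a compiler turns the sign tests into conditional moves instead of branches depends on the target and optimization level:

```c
typedef struct { float x, y, z; } Vec3;

/* Returns the cubemap face 0..5 (+x, -x, +y, -y, +z, -z) for point p. */
int cube_face(Vec3 p) {
    float x, y, z;
    int fx, fy, fz;
    /* Absolute values and per-axis face candidates from the signs. */
    if (p.x < 0) { x = -p.x; fx = 1; } else { x = p.x; fx = 0; }
    if (p.y < 0) { y = -p.y; fy = 3; } else { y = p.y; fy = 2; }
    if (p.z < 0) { z = -p.z; fz = 5; } else { z = p.z; fz = 4; }
    /* At most two comparisons pick the dominant axis. */
    if (x >= y) {
        return x >= z ? fx : fz;
    }
    return y >= z ? fy : fz;
}
```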

64 bit mathematical operations without any loss of data or precision

I believe there isn't any portable standard data type for 128 bits of data. So my question is how efficiently 64-bit operations can be carried out without loss of data using the existing standard data types.
For example, I have the following two uint64_t variables:
uint64_t x = -1;
uint64_t y = -1;
Now, how can the results of operations such as x+y, x-y, x*y and x/y be stored, retrieved, and printed?
For the above variables, x+y yields 0xFFFFFFFFFFFFFFFEULL (which reads as -2 when interpreted as signed) with a carry of 1.
#include <stdint.h>
void add (uint64_t a, uint64_t b, uint64_t *result_high, uint64_t *result_low)
{
*result_low = a + b; // wraps modulo 2^64
*result_high = (*result_low < a); // carry out: 1 if the sum wrapped
}
How can the other operations be performed, like add, so that they give the proper final output?
I'd appreciate it if someone could share a generic algorithm that takes care of the overflow/underflow, etc., that might come into the picture with such operations.
Any standard, tested algorithms that might help are welcome.

There are a lot of BigInteger libraries out there for manipulating big numbers.
GMP Library
C++ Big Integer Library
If you want to avoid integrating a library and your requirements are modest, here is a basic BigInteger snippet that I generally use for problems with basic requirements. You can create new methods or overload operators according to your needs. This snippet is widely tested and bug free.
Source
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
class BigInt {
public:
// default constructor
BigInt() {}
// ~BigInt() {} // avoid overloading default destructor. member-wise destruction is okay
BigInt( string b ) {
(*this) = b; // constructor for string
}
// some helpful methods
size_t size() const { // returns number of digits
return a.length();
}
BigInt inverseSign() { // changes the sign
sign *= -1;
return (*this);
}
BigInt normalize( int newSign ) { // removes leading 0, fixes sign
for( int i = a.size() - 1; i > 0 && a[i] == '0'; i-- )
a.erase(a.begin() + i);
sign = ( a.size() == 1 && a[0] == '0' ) ? 1 : newSign;
return (*this);
}
// assignment operator
void operator = ( string b ) { // assigns a string to BigInt
a = b[0] == '-' ? b.substr(1) : b;
reverse( a.begin(), a.end() );
this->normalize( b[0] == '-' ? -1 : 1 );
}
// conditional operators
bool operator < (BigInt const& b) const { // less than operator
if( sign != b.sign ) return sign < b.sign;
if( a.size() != b.a.size() )
return sign == 1 ? a.size() < b.a.size() : a.size() > b.a.size();
for( int i = a.size() - 1; i >= 0; i-- ) if( a[i] != b.a[i] )
return sign == 1 ? a[i] < b.a[i] : a[i] > b.a[i];
return false;
}
bool operator == ( const BigInt &b ) const { // operator for equality
return a == b.a && sign == b.sign;
}
// mathematical operators
BigInt operator + ( BigInt b ) { // addition operator overloading
if( sign != b.sign ) return (*this) - b.inverseSign();
BigInt c;
for(int i = 0, carry = 0; i<a.size() || i<b.size() || carry; i++ ) {
carry+=(i<a.size() ? a[i]-48 : 0)+(i<b.a.size() ? b.a[i]-48 : 0);
c.a += (carry % 10 + 48);
carry /= 10;
}
return c.normalize(sign);
}
BigInt operator - ( BigInt b ) { // subtraction operator overloading
if( sign != b.sign ) return (*this) + b.inverseSign();
int s = sign;
sign = b.sign = 1;
if( (*this) < b ) return ((b - (*this)).inverseSign()).normalize(-s);
BigInt c;
for( int i = 0, borrow = 0; i < a.size(); i++ ) {
borrow = a[i] - borrow - (i < b.size() ? b.a[i] : 48);
c.a += borrow >= 0 ? borrow + 48 : borrow + 58;
borrow = borrow >= 0 ? 0 : 1;
}
return c.normalize(s);
}
BigInt operator * ( BigInt b ) { // multiplication operator overloading
BigInt c("0");
for( int i = 0, k = a[i] - 48; i < a.size(); i++, k = a[i] - 48 ) {
while(k--) c = c + b; // ith digit is k, so, we add k times
b.a.insert(b.a.begin(), '0'); // multiplied by 10
}
return c.normalize(sign * b.sign);
}
BigInt operator / ( BigInt b ) { // division operator overloading
if( b.size() == 1 && b.a[0] == '0' ) b.a[0] /= ( b.a[0] - 48 );
BigInt c("0"), d;
for( int j = 0; j < a.size(); j++ ) d.a += "0";
int dSign = sign * b.sign;
b.sign = 1;
for( int i = a.size() - 1; i >= 0; i-- ) {
c.a.insert( c.a.begin(), '0');
c = c + a.substr( i, 1 );
while( !( c < b ) ) c = c - b, d.a[i]++;
}
return d.normalize(dSign);
}
BigInt operator % ( BigInt b ) { // modulo operator overloading
if( b.size() == 1 && b.a[0] == '0' ) b.a[0] /= ( b.a[0] - 48 );
BigInt c("0");
b.sign = 1;
for( int i = a.size() - 1; i >= 0; i-- ) {
c.a.insert( c.a.begin(), '0');
c = c + a.substr( i, 1 );
while( !( c < b ) ) c = c - b;
}
return c.normalize(sign);
}
// << operator overloading
friend ostream& operator << (ostream&, BigInt const&);
private:
// representations and structures
string a; // to store the digits
int sign; // sign = -1 for negative numbers, sign = 1 otherwise
};
ostream& operator << (ostream& os, BigInt const& obj) {
if( obj.sign == -1 ) os << "-";
for( int i = obj.a.size() - 1; i >= 0; i--) {
os << obj.a[i];
}
return os;
}
Usage
BigInt a, b, c;
a = BigInt("1233423523546745312464532");
b = BigInt("45624565434216345657652454352");
c = a + b;
// c = a * b;
// c = b / a;
// c = b - a;
// c = b % a;
cout << c << endl;
// dynamic memory allocation
BigInt *obj = new BigInt("123");
delete obj;

You can emulate uint128_t if you don't have it:
typedef struct uint128_t { uint64_t lo, hi; } uint128_t;
...
uint128_t add (uint64_t a, uint64_t b) {
uint128_t r; r.lo = a + b; r.hi = + (r.lo < a); return r; }
uint128_t sub (uint64_t a, uint64_t b) {
uint128_t r; r.lo = a - b; r.hi = - (r.lo > a); return r; }
Multiplication without inbuilt compiler or assembler support is a bit more difficult to get right. Essentially, you need to split both multiplicands into hi:lo unsigned 32-bit, and perform 'long multiplication' taking care of carries and 'columns' between the partial 64-bit products.
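A sketch of that long multiplication, using its own two-limb struct (named u128 here to avoid clashing with any compiler-provided type; mul_64x64 is likewise a hypothetical name). Each operand is split into 32-bit halves so every partial product fits in 64 bits, and the middle column collects the low halves of the cross terms plus the carry out of the low product:

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* Full 64 x 64 -> 128-bit unsigned product by schoolbook multiplication. */
u128 mul_64x64(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Middle column: < 3 * 2^32, so it cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128 r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

For the question's x = y = 0xFFFFFFFFFFFFFFFF, this gives hi = 0xFFFFFFFFFFFFFFFE and lo = 1.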
Divide and modulo return 64-bit results given 64-bit arguments, so that's not an issue as you have defined the problem. Dividing a 128-bit value by a 64-bit or 128-bit operand is a much more complicated operation, requiring normalization, etc.
The longlong.h routines umul_ppmm and udiv_qrnnd in GMP give the 'elementary' steps for multiple-precision/limb operations.

Most modern GCC versions support the __int128 type, which can hold 128-bit integers (on targets with a wide enough integer mode, typically 64-bit ones).
Example,
__int128 add(__int128 a, __int128 b){
return a + b;
}
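printf() has no conversion specifier for __int128, so printing one takes a small helper. A sketch for the unsigned case, assuming a GCC or Clang target that provides unsigned __int128; u128_to_dec and print_u128 are hypothetical names:

```c
#include <stdio.h>

/* Formats v in decimal. buf must hold at least 40 chars (2^128 - 1 has
   39 digits). Returns a pointer to the first digit, which lies inside buf. */
char *u128_to_dec(unsigned __int128 v, char buf[40]) {
    char *p = buf + 39;
    *p = '\0';
    do {
        *--p = (char)('0' + (int)(v % 10));   /* digits are produced right to left */
        v /= 10;
    } while (v != 0);
    return p;
}

void print_u128(unsigned __int128 v) {
    char buf[40];
    fputs(u128_to_dec(v, buf), stdout);
}
```

For example, print_u128((unsigned __int128)1 << 64) writes 18446744073709551616.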
