Better precision with Arduino (floats)

Better precision with Arduino (floats) - c

I'm trying to do the Steinhart-Hart temperature calculation on an Arduino. The equation is
I solved a system of 3 equations to obtain the values of A, B and C, which are:
A = 0.0164872
B = -0.00158538
C = 3.3813e-6
When I plug these into WolframAlpha to solve for T I get a value in Kelvins that makes sense:
T=1/(0.0164872-0.00158538*log2(10000)+3.3813E-6*(log2(10000))^3) solve for T
T = 298.145 Kelvins = 77 Fahrenheit
However when I try to use this equation on my Arduino, I get a very wrong answer, I suspect because doubles do not have enough precision. Here's what I'm using:
double temp = (1 / (A + B*log(R_therm) + C*pow(log(R_therm),3)));
This returns 222 Kelvin instead, which is way off.
So, how can I do a calculation like this in Arduino?? Any advice is greatly appreciated, thanks.

Precision is not the main issue. Could even use float and powf(). A thermistor temperature calculation is not that accurate. After all the temperature is certainly not better than ±0.1°C accurate. Self heating of the thermistor is a larger factor.
OP's C code assumes log base 2, use log base e log() as the constants were derived using log base 2. #Martin R
// double temp = (1 / (A + B*log(R_therm) + C*pow(log(R_therm),3)));
double temp = (1 / (A + B*log(R_therm)/log(2) + C*pow(log(R_therm)/log(2),3)));`
Sample implementation, that avoids an unnecessary slow pow() call.
static const inv_ln2 = 1.4426950408889634073599246810019;
double ln2_R = log(R_therm)*inv_ln2;
double temp = 1.0 / (A + ln2_R*(B + C*ln2_R*ln2_R));

Yes, floating point arithmetic has limited precision on most arduinos.
Have you considered using fixed precision? If used correctly, this might give you better results. The requirement for this is to have rather narrow parameters, however, and be careful about unit conversions.
An unsigned long on arduino is 4 bytes too, so it can contain numbers up to 2^32-1. If using fixed point, you might want to replace this 1/T by something like 100000/T, where the numerator constant and T have been scaled according to the desired precision.
You will also need to keep a (mental or paper) model of the number of decimals each variable contains, in order to optimize the operation order not to lose precision.
For the log2 function, I doubt it is available out of the box for integers. You could either cast the result or reimplement it. There is plenty of ressources for this problem, even here on SO.

Related

Equation seems to be outputting wrong value in embedded C (stm32)

Hope you're having a nice day.
I'm encountering a weird issue on my side. I am working on embedded C code on an STM32 F103 C8T6 micro controller on a custom BMS PCB, but I am having some issue with the code that calculates the actual temperature from the thermistor ADC value.
Through excel, we have determined that the equation we need to use to calculate the temperature in Celsius from the ADC value is: y = -0.5022x^5 + 6.665x^4 - 35.123x^3 + 92.559x^2 - 144.22x + 166.76.
So, in my code I have the following lines, with temp[i] being the raw ADC value and realTemp[i] being the converted value:
realTemp[i] = (double)(temp[i] / 10000);
realTemp[i] = -0.5022 * realTemp[i]*realTemp[i]*realTemp[i]*realTemp[i]*realTemp[i] + 6.665 * realTemp[i]*realTemp[i]*realTemp[i]*realTemp[i] - 35.123 * realTemp[i]*realTemp[i]*realTemp[i] + 92.559 * realTemp[i]*realTemp[i] - 144.22 * realTemp[i] + 166.76;
I am not using the pow function from math.h as it has given us issues in the past.
The values we are getting in our temp[i] variable are the following: 35480, 35496, 35393, 35480. When using these values with our function in excel, we are getting the correct output, between 25.3 and 25.5 Celsius, however the C code listed above is outputting 36 in the realTemp array. I am not sure about the decimal values, but I don't care about them because the value is typecast to a uint16 a few lines later to be transmitted over a CAN bus.

Use floating point division, not integer division.
// Integer division ------v-------------v
// realTemp[i] = (double)(temp[i] / 10000);
realTemp[i] = temp[i] / 10000.0;

The answer by Chux is correct, I just wanted to explain more why this works.
temp[i] is uint16, therefore the formula temp[i] / 10000 is integer division, and result will be the floor of (temp[i] / 10000). Thus, the final conversion to double is performed on a value which is floored already.
By converting 10000 to 10000.0, it means that the division of an integer with a float/double will perform floating division. By this, the result will be similar to what you expected.

As others have said, you are doing integer division then casting the result to a double - you need to do the division itself as a double.
Your code will be big and very slow on the micro-controller in question. This might not be an issue, assuming that temperature values don't usually change very often, so slow code could be fine for you.
You also need to be careful with high-degree polynomials - they can easily be unstable, especially if you try to extrapolate them to very high or low temperatures. This is a particularly risky if you decide to make the code faster by switching to a float.
A better method of this kind of thing is usually a lookup table (which can be big but is simpler to implement), or with a linear spline (which has smaller footprint but a bit more complex to implement).

C: Representing a fraction without floating points

I'm writing some code for an embedded system (MSP430) without hardware floating point support. Unfortunately, I will need to work with fractions in my code as I'm doing ranging, and a short-range sensor with a precision of 1m isn't a very good sensor.
I can do as much of the math as I need in ints, but by the end there are two values that I will definitely need to have fractions on; range and speed. Range will be a value between 2-500 (cm), while speed should be no higher than -10 to 10 (ms^-1). I am unsure how to represent them without floating point values, if it is possible. A simple way of rounding the fractions up or down would be best.
Some sample code I have:
voltage_difference_new = ((memval3_new - memval4_new)*3.3/4096);
where memval3_new and memval4_new are ints, but voltage_difference_new is a float.
Please let me know if more information is needed. Or if there is a blindingly easy fix.

You have rather answered your own question with the statement:
Range will be a value between 2-500 (cm),
Work in centimetre (or even millimetre) rather than metre units.
That said you don't need floating-point hardware to do floating point math; the compiler will support "soft" floating point and generate the code to perform floating point operations - it will be slower than hardware floating point or integer operations, but that may not be an issue in your application.
Nonetheless there are many reasons to avoid floating-point even with hardware support and it does not sound like your case for FP is particularly compelling, but it is hard to tell without seeing your code and a specific example. In 32 years of embedded systems development I have seldom resorted to FP even for trig, log, sqrt and digital signal processing.
A general method is to use a fixed point presentation. My earlier suggestion of using centimetres is an example of decimal fixed point, but for greater efficiency you should use binary fixed point. For example you might represent distance in 1/1024 metre units (giving > 1 mm precision). Because the fixed point is binary, all the necessary rescaling can be done with shifts rather than more expensive multiply/divide operations.
For example, say you have an 8 bit sensor generating linear output 0 to 255 corresponding to a real distance 0 to 0.5 metre.
#define Q10_SHIFT = 10 ; // 10 bits fractional (1/1024)
typedef int q10_t ;
#define ONE_METRE = (1 << Q10_SHIFT)
#define SENSOR_MAX = 255
#define RANGE_MAX = (ONE_METRE/2)
q10_t distance = read_sensor() * RANGE_MAX / SENSOR_MAX ;
distance is in Q10 fixed point representation. Performing addition and subtraction on such is normal integer arithmentic, multiply and divide require scaling:
int q10_add( q10_t a, q10_t b )
{
return a + b ;
}
int q10_sub( q10_t a, q10_t b )
{
return a - b ;
}
int q10_mul( q10_t a, q10_t b )
{
return (a * b) >> Q10_SHIFT ;
}
int q10_div( q10_t a, q10_t b )
{
return (a << Q10_SHIFT) / b ;
}
Of course you may want to be able to mix types and say multiply a q10_t by an int - providing a comprehensive library for fixed-point can get complex. Personally for that I use C++ where you have classes, function overloading and operator overloading to support more natural code. But unless your code has a great deal of general fixed point math, it may be simpler to code specific fixed point operations ad-hoc.
To take the one example you have provided:
double voltage_difference_new = ((memval3_new - memval4_new)*3.3/4096);
The floating-point there is trivially removed using millivolts:
int voltage_difference_new_mv = ((memval3_new - memval4_new) * 3300) /4096 ;
The issue then perhaps becomes one of presentation. For example if you have to present or report the value in volts to a user. In that case:
int volt_fract = abs(voltage_difference_new_mv % 1000) ;
int volt_whole = voltage_difference_new_mv / 1000 ;
printf( "%d.%04d", volt_whole, volt_fract ) ;

Power function giving different answer than math.pow function in C

I was trying to write a program to calculate the value of x^n using a while loop:
#include <stdio.h>
#include <math.h>
int main()
{
float x = 3, power = 1, copyx;
int n = 22, copyn;
copyx = x;
copyn = n;
while (n)
{
if ((n % 2) == 1)
{
power = power * x;
}
n = n / 2;
x *= x;
}
printf("%g^%d = %f\n", copyx, copyn, power);
printf("%g^%d = %f\n", copyx, copyn, pow(copyx, copyn));
return 0;
}
Up until the value of 15 for n, the answer from my created function and the pow function (from math.h) gives the same value; but, when the value of n exceeds 15, then it starts giving different answers.
I cannot understand why there is a difference in the answer. Is it that I have written the function in the wrong way or it is something else?

You are mixing up two different types of floating-point data. The pow function uses the double type but your loop uses the float type (which has less precision).
You can make the results coincide by either using the double type for your x, power and copyx variables, or by calling the powf function (which uses the float type) instead of pow.
The latter adjustment (using powf) gives the following output (clang-cl compiler, Windows 10, 64-bit):
3^22 = 31381059584.000000
3^22 = 31381059584.000000
And, changing the first line of your main to double x = 3, power = 1, copyx; gives the following:
3^22 = 31381059609.000000
3^22 = 31381059609.000000
Note that, with larger and larger values of n, you are increasingly likely to get divergence between the results of your loop and the value calculated using the pow or powf library functions. On my platform, the double version gives the same results, right up to the point where the value overflows the range and becomes Infinity. However, the float version starts to diverge around n = 55:
3^55 = 174449198498104595772866560.000000
3^55 = 174449216944848669482418176.000000

When I run your code I get this:
3^22 = 31381059584.000000
3^22 = 31381059609.000000
This would be because pow returns a double but your code uses float. When I changed to powf I got identical results:
3^22 = 31381059584.000000
3^22 = 31381059584.000000
So simply use double everywhere if you need high resolution results.

Floating point math is imprecise (and float is worse than double, having even fewer bits to store the data in; using double might delay the imprecision longer). The pow function (usually) uses an exponentiation algorithm that minimizes precision loss, and/or delegates to a chip-level instruction that may do stuff more efficiently, more precisely, or both. There could be more than one implementation of pow too, depending on whether you tell the compiler to use strictly conformant floating point math, the fastest possible, the hardware instruction, etc.
Your code is fine (though using double would get more precise results), but matching the improved precision of math.h's pow is non-trivial; by the time you've done so, you'll have reinvented it. That's why you use the library function.
That said, for logically integer math as you're using here, precision loss from your algorithm likely doesn't matter, it's purely the float vs. double issue where you lose precision from the type itself. As a rule, default to using double, and only switch to float if you're 100% sure you don't need the precision and can't afford the extra memory/computation cost of double.

Precision
float x = 3, power = 1; ... power = power * x forms a float product.
pow(x, y) forms a double result and good implementations internally use even wider math.
OP's loop method incurs rounded results after the 15th iteration. These roundings slowly compound the inaccuracy of the final result.
316 is a 26 bit odd number.
float encodes all odd numbers exactly until typically 224. Larger values are all even and of only 24 significant binary digits.
double encodes all odd numbers exactly until typically 253.
To do a fair comparison, use:
double objects and pow() or
float objects and powf().
For large powers, the pow(f)() function is certain to provide better answers than a loop at such functions often use internally extended precision and well managed rounding vs. the loop approach.

Possible Overflow C

I have several variables listed below:
int cpu_time_b = 6
float clock_cycles_a = 2 * pow(10, 10));
float cpi_a = 2.0;
int cycle_time_a = 250;
float cpi_b = 1.2;
int cycle_time_b = 500
I am working out the clock rate of b with the following calculation:
(((1.2*clock_cycles_a)/cpu_time_b)/(1 * pow(10, 9)))
Clearly the answer should be 4 however my program is outputting 6000000204800000000.0 as the answer
I think that overflow is possibly happening here. Is this the case and if so, how could I fix the problem?

All calculations should be made to ensure comparable numbers are "reduced" together. in your example, it seems like only
cpu_time_b
is truly variable (undefined in the scope of your snippet. All other variables appears as constants. All constants should be computed before compilation especially if they are susceptible to cause overflow.
clock_cycles_a
cancels the denominator. pow is time consuming (may not be critical here) and not always that precise. You multiply the 2 explicitly when you declare clock_cycles_a and then use 1.2 below. etc. Reducing the whole thing keeping only the actual variable becomes:
24.0/cpu_time_b
which makes me deduce that cpu_time_b should be 6?
Finaly, while you write the equation, we have no idea of what you do with the result. Store it in the wrong variable type? printf with the wrong format? etc?

C equivalent of matlab angle function

I am dying here. So I have a complex number(-4.9991 + 15.2631i). In matlab if I do
angle(-4.9991 + 15.2631i) = 1.8873
I thought that angle basically calculated like
atan(15.2631/-4.9991) = -1.2543
Why are these different? I need to write a c function that calculates the angle of a complex number. I have done so like this:
#define angle(x) (atan((GSL_IMAG(x)/GSL_REAL(x))))
But that way gives me the -1.2543 answer, not the 1.8873 answer. What am I doing wrong?

-1.2543 + Pi(radians) = 1.8873 (with rounding)
As pointed out by others, use atan2()

Although using atan2 solves the problem, the actual question hasn't been answered:
Why are these different?
You are missing that the tangent function is periodic, with period pi = 3.141592... So, when you write z = atan(y/x) you expect a number z such that tan(z) = y/x, but there are infinite such numbers, since tan(z + pi) = tan(z). Of course, you get just one of these infinite values: The closest to zero, which isn't the one you always need.
In particular, note that since you are calculating the quotient Im/Re, you can't tell the difference from -Im/-Re, i.e. a minus sign on both componentes doesn't change the quotient, but it's the opposite complex number (same applies for 2-d vectors). That's what atan2 and angle do: They check for the sign of each component separately, and then determine if +/- pi should be added to the result of atan.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Better precision with Arduino (floats) - c

Related

Equation seems to be outputting wrong value in embedded C (stm32)

C: Representing a fraction without floating points

Power function giving different answer than math.pow function in C

Possible Overflow C

C equivalent of matlab angle function

Categories

Resources