Calculate a third-degree polynomial with C on an STM32 microcontroller - c

I'm trying to implement the following function on my STM32:
y = 0.0006*x^3 - 0.054*x^2 + 2.9094*x - 2.3578
where x is in the range 0 to 215.
To avoid the use of pow or any other library functions, I have written the following code:
static double tmp = 0;
static unsigned short c_m;
static unsigned short c_m_3;
static unsigned short c_m_2;
c_m_3 = c_m*c_m*c_m;
c_m_2 = c_m*c_m;
tmp = (0.0006*c_m_3) - (0.054*c_m_2) + (2.9094*c_m) - 2.3578;
dati->sch_g = tmp;
For some reason the calculation is totally wrong: for instance, if c_m = 1 I should get tmp = 0.4982, but instead I get 13.
Am I missing something?

As noted by Lundin in the comments, your microcontroller type (ARM Cortex-M0) doesn't provide a floating point unit. In consequence this means you cannot rely on native floating point math, but need a floating point software library, e.g. this one (note: I didn't evaluate it; it was just the first I stumbled upon in a quick search!).
Alternatively, and likely preferably, you might want to do the calculations in plain integers; if you additionally convert your calculation from the pattern a*x*x*x + b*x*x + c*x + d to ((a*x + b)*x + c)*x + d you even spare some multiplications:
int32_t c_m = ...;
c_m = ((6 * c_m - 540) * c_m + 29094) * c_m - 23578;
Note: unsigned short would be too small to hold the result on STM32, so you need to switch to at least 32 bits! Additionally you need a signed type to be able to hold the negative result arising from c_m == 0.
Your results are now too large by a factor of 10 000, of course. As the use case is unclear, the question remains open how you want to deal with that: possibly rounding (c_m = (c_m + 5000) / 10000) or evaluating the fractional part by other means.

short is 16 bits on all STM32, so the value 215 * 215 * 215 = 9938375 will not fit inside one. c_m_3 = c_m*c_m*c_m; truncates the result modulo USHRT_MAX+1 (65536), as the C standard specifies:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Use uint32_t instead.

short is only 16 bits; the maximum value it can hold is 65535. Therefore the cube will overflow whenever the input is greater than 40 (41^3 = 68921 > 65535). This means that you must use a larger variable type such as uint32_t.
You can also add explicit overflow checks, which is good programming practice.
As another note, in embedded programming it's better to use uint8_t and uint16_t instead of unsigned char and unsigned short, because they are explicit about the data sizes.

Related

Store temporary 64 bit variable in a 32 bit machine

I'm doing some programming on a 32-bit machine.
As part of an algorithm to calculate collisions between objects in 3D I have to get the result of a dot product:
//Vector3 components are signed int
signed long GaMhVecDotL(const Vector3 *p_a, const Vector3 *p_b)
{
return ((p_a->vx * p_b->vx + p_a->vy * p_b->vy + p_a->vz * p_b->vz));
}
In some cases this result overflows the 32-bit return value (signed long).
I have tried a couple of things:
Bitshift the Vector3 components before sending them to this function
to reduce the size. This works in most cases but I lose precision and
that makes the algorithm fail in some edge cases.
Storing the result of the operation in a long long variable;
although it compiles, it doesn't seem to store the values
correctly (this is for some PSX homebrew; the compiler and tools haven't been updated since the late '90s).
I actually don't need to know the full result of the Dot Product, I would just need to know if the result is positive, negative or 0 and at the same time trying to preserve as much precision as possible.
Is there any way I can store the result of that operation (p_a->vx * p_b->vx + p_a->vy * p_b->vy + p_a->vz * p_b->vz) in a temp 64 bit var (or 2x32 bit) that would allow me later to check if this var is positive, negative or 0?
This uses 32-bit math (given int is 32-bit). Storing the return value in a 64-bit result does not make the equation 64-bit.
// 32-bit math
p_a->vx * p_b->vx + p_a->vy * p_b->vy + p_a->vz * p_b->vz
Instead, use 64 bit math in the equation.
// v-----------------v multiplication now done as 64-bit
long long dot = (1LL*p_a->vx*p_b->vx) + (1LL*p_a->vy*p_b->vy) + (1LL*p_a->vz*p_b->vz);
Then check the sign:
if (dot < 0) return -1;
return dot > 0;

Effective way to calc without float, C

I need to do some math to convert a 16-bit value received from a sensor into a real relative humidity value. It's calculated with the formula RH = 125 * S / 2^16 - 6.
In floating point math that would be:
uint16_t buf = 0x7C80; // Example
float rh = ((float)buf*125 / 65536)-6;
But I want to avoid floating point math as my platform is FPU-less.
What is the most effective way to calculate and store RH in integer math here? Since it's humidity, the actual value should be between 0 and 100%, but approximation error can sometimes push rh slightly below 0 or above 100 (with float I could just clamp: if (rh<0) rh=0; else if (rh>100) rh=100;), and I only care about the first 2 digits after the decimal point (%.2f).
Currently I've solved this like this:
int16_t rhint = ((uint32_t)buf*12500 / 65536)-600;
and working with rhint / 100 and rhint % 100. But probably there is a more effective way?
You could avoid the large intermediate term by writing the right hand side as
-6 + (128 - 4 + 1) * S / 65536
Which becomes
-6 + S / 512 - S / 16384 + S / 65536
You might be able to drop the last term, and possibly the penultimate one too depending on how precise you want the basis point truncation to be.

Notation for fixed point representation

I'm looking for a commonly understandable notation to define a fixed point number representation.
The notation should be able to define both a power-of-two factor (using fractional bits) and a generic factor (sometimes I'm forced to use this, though less efficient). And also an optional offset should be defined.
I already know some possible notations, but all of them seem to be constrained to specific applications.
For example the Simulink notation would perfectly fit my needs, but it's known only in the Simulink world. Furthermore the overloaded usage of the fixdt() function is not so readable.
TI defines really compact Q formats, but the sign is implicit, and they don't manage a generic (i.e. non-power-of-two) factor.
ASAM uses a generic 6-coefficient rational function with 2nd-degree numerator and denominator polynomials (COMPU_METHOD). Very generic, but not so friendly.
See also the Wikipedia discussion.
The question is only about the notation (not the efficiency of the representation nor fixed-point manipulation). So it's a matter of code readability, maintainability and testability.
Ah, yes. Having good naming annotations is absolutely critical to not introducing bugs with fixed point arithmetic. I use an explicit version of the Q notation which handles any split between M integer bits and N fractional bits by appending _Q<M>_<N> to the name of the variable. This also makes it possible to include the signedness. There are no run-time performance penalties for this. Example:
uint8_t length_Q2_6; // unsigned, 2 bit integer, 6 bit fraction
int32_t sensor_calibration_Q10_21; // signed (1 bit), 10 bit integer, 21 bit fraction.
/*
* Calculations with the bc program (with '-l' argument):
*
* sqrt(3)
* 1.73205080756887729352
*
* obase=16
* sqrt(3)
* 1.BB67AE8584CAA73B0
*/
const uint32_t SQRT_3_Q7_25 = 1 << 25 | 0xBB67AE85U >> 7; /* Unsigned shift super important here! */
In case someone has not fully understood why such annotation is extremely important:
Can you spot whether there is a bug in the following two examples?
Example 1:
speed_fraction = fix32_udiv(25, speed_percent << 25, 100 << 25);
squared_speed = fix32_umul(25, speed_fraction, speed_fraction);
tmp1 = fix32_umul(25, squared_speed, SQRT_3);
tmp2 = fix32_umul(12, tmp1 >> (25-12), motor_volt << 12);
Example 2:
speed_fraction_Q7_25 = fix32_udiv(25, speed_percent << 25, 100 << 25);
squared_speed_Q7_25 = fix32_umul(25, speed_fraction_Q7_25, speed_fraction_Q7_25);
tmp1_Q7_25 = fix32_umul(25, squared_speed_Q7_25, SQRT_3_Q1_31);
tmp2_Q20_12 = fix32_umul(12, tmp1_Q7_25 >> (25-12), motor_volt << 12);
Imagine if one file contained #define SQRT_3 (1 << 25 | 0xBB67AE85U >> 7) and another file contained #define SQRT_3 (1 << 31 | 0xBB67AE85U >> 1) and code was moved between those files. For example 1 this has a high chance of going unnoticed and introducing the bug that is present in example 2; there it was introduced deliberately, and with the annotations it has close to zero chance of happening accidentally.
Actually the Q format is the most used representation in commercial applications: you use it when you need to deal with fractional numbers fast and your processor does not have an FPU (floating point unit), as it then cannot use the float and double data types natively; it has to emulate instructions for them, which is very expensive.
Usually you use the Q format to represent only the fractional part; though this is not a must, you get more precision that way. Here's what you need to consider:
the number of bits you use (Q15 uses 16-bit data types, usually short int)
the first bit is the sign bit (out of 16 bits you are left with 15 for the data value)
the rest of the bits are used to store the fractional part of your number
since you are representing fractional numbers, your value is somewhere in [0, 1)
you can choose to use some bits for the integer part as well, but you would lose precision; e.g. if you wanted to represent 3.3 in Q format, you would need 1 bit for the sign, 2 bits for the integer part, and would be left with 13 bits for the fractional part (assuming a 16-bit representation); this format is called 2Q13
Example: Say you want to represent 0.3 in Q15 format; you apply the Rule of Three:
1 = 2^15 = 32768 = 0x8000
0.3 = X
-------------
X = 0.3*32768 = 9830 = 0x2666
You lost precision by doing this but at least the computation is fast now.
In C, you can't use a user defined type like a builtin one. If you want to do that, you need to use C++. In that language you can define a class for your fixed point type, overload all the arithmetic operators (+, -, *, /, %, +=, -=, *=, /=, %=, --, ++, cast to other types), so that usage of the instances of this class really behave like the builtin types.
In C, you need to do what you want explicitly. There are two basic approaches.
Approach 1: Do the fixed point adjustments in the user code.
This is overhead-free, but you need to remember to do the correct adjustments. I believe it is easiest to just add the number of past-point bits to the end of the variable name, because the type system won't do you much good, even if you typedef'd all the point positions you use. Here is an example:
int64_t a_7 = (int64_t)(7.3*(1<<7)); //a variable with 7 past point bits
int64_t b_5 = (int64_t)(3.78*(1<<5)); //a variable with 5 past point bits
int64_t sum_7 = a_7 + (b_5 << 2); //to add those two variables, we need to adjust the point position in b
int64_t product_12 = a_7 * b_5; //the product produces a number with 12 past point bits
Of course, this is a lot of hassle, but at least you can easily check at every point whether the point adjustment is correct.
Approach 2: Define a struct for your fixed point numbers and encapsulate the arithmetic on it in a bunch of functions. Like this:
typedef struct FixedPoint {
int64_t data;
uint8_t pointPosition;
} FixedPoint;
FixedPoint fixed_add(FixedPoint a, FixedPoint b) {
    if(a.pointPosition >= b.pointPosition) {
        return (FixedPoint){
            .data = a.data + (b.data << (a.pointPosition - b.pointPosition)),
            .pointPosition = a.pointPosition
        };
    } else {
        return (FixedPoint){
            .data = (a.data << (b.pointPosition - a.pointPosition)) + b.data,
            .pointPosition = b.pointPosition
        };
    }
}
This approach is a bit cleaner in the usage, however, it introduces significant overhead. That overhead consists of:
The function calls.
The copying of the structs for parameter and result passing, or the pointer dereferences if you use pointers.
The need to calculate the point adjustments at runtime.
This is pretty much similar to the overhead of a C++ class without templates. Using templates would move some decisions back to compile time, at the cost of losing flexibility.
This object based approach is probably the most flexible one, and it allows you to add support for non-binary point positions in a transparent way.

Is Multiplying a decimal number where all results are full integers, considered Floating Point Math?

Sorry for the wordy title. My code is targeting a microcontroller (msp430) with no floating point unit, but this should apply to any similar MCU.
If I am multiplying a large runtime variable with what would normally be considered a floating point decimal number (1.8), is this still treated like floating point math by the MCU or compiler?
My simplified code is:
int multip = 0xf; // Can be from 0-15, not available at compile time
int holder = multip * 625; // 0 - 9375
holder = holder * 1.8; // 0 - 16875
Since the result will always be a positive full, real integer number, is it still floating point math as far as the MCU or compiler are concerned, or is it fixed point?
(I realize I could just multiply by 18, but that would require declaring a 32bit long instead of a 16 bit int then dividing and downcasting for the array it will be put in, trying to skimp on memory here)
The result is not an integer; it rounds to an integer.
9375 * 1.8000000000000000444089209850062616169452667236328125
yields
16875.0000000000004163336342344337026588618755340576171875
which rounds (in double precision floating point) to 16875.
If you write a floating-point multiply, I know of no compiler that will determine that there's a way to do that in fixed-point instead. (That does not mean they do not exist, but it ... seems unlikely.)
I assume you simplified away something important, because it seems like you could just do:
result = multip * 1125;
and get the final result directly.
I'd go for chux's formula if there's some reason you can't just multiply by 1125.
You can be confident that FP code will be generated for
holder = holder * 1.8
To avoid FP and 32-bit math, given the OP values of
int multip = 0xf; // Max 15
unsigned holder = multip * 625; // Max 9375
// holder = holder * 1.8;
// alpha depends on rounding desired, e.g. 2 for round to nearest.
holder += (holder*4u + alpha)/5;
If int x is non-negative, you can compute x *= 1.8 rounded to nearest using only int arithmetic, without overflow unless the final result overflows, with:
x - (x+2)/5 + x
For truncation instead of round-to-nearest, use:
x - (x+4)/5 + x
If x may be negative, some additional work is needed.

How to avoid branching in C for this operation

Is there a way to remove the following if-statement to check if the value is below 0?
int a = 100;
int b = 200;
int c = a - b;
if (c < 0)
{
c += 3600;
}
The value of c should lie between 0 and 3600, and both a and b are signed. The value of a should also lie between 0 and 3600 (yes, it is a counting value in 0.1 degrees). The value gets reset to 3600 by an interrupt, but if that interrupt comes too late it underflows, which is not a problem; the software should still be able to handle it. Which it does.
We do this if (c < 0) check in quite a few places where we are calculating positions (calculating a new position, etc.).
I was used to Python's modulo operator, which takes the sign of the divisor, whereas our compiler (C89) takes the sign of the dividend.
Is there some way to do this calculation differently?
example results:
a - b = c
100 - 200 = 3500
200 - 100 = 100
Good question! How about this?
c += 3600 * (c < 0);
This is one way we preserve branch predictor slots.
What about this (assuming 32-bit ints):
c += 3600 & (c >> 31);
c >> 31 sets all bits to the original MSB, which is 1 for negative numbers and 0 for others in two's complement.
Right-shifting a negative number is formally implementation-defined according to the C standard; however, it's almost always implemented as sign extension (common processors can do it in a single instruction).
This will surely result in no branches, unlike (c < 0), which might be implemented with a branch in some cases.
Why are you worried about the branch? [Reason explained in comments to the question.]
The alternative is something like:
((a - b) + 3600) % 3600
This assumes a and b are in the range 0..3600 already; if they're not under control, the more general solution is the one Drew McGowen suggests:
((a - b) % 3600 + 3600) % 3600
The branch miss has to be very expensive to make that much calculation worthwhile.
@skjaidev showed how to do it without branching. Here's how to automatically avoid the multiplication as well when ints are two's complement:
#if ((3600 & -0) == 0) && ((3600 & -1) == 3600)
c += 3600 & -(c < 0);
#else
c += 3600 * (c < 0);
#endif
What you want to do is modular arithmetic. Your 2's complement machine already does this with integer math. So, by mapping your values into 2's complement arithmetic, you can get the modulo operation for free.
The trick is to represent your angle as a fraction of 360 degrees between 0 and 1-epsilon. Of course, your constant angles would then have to be represented similarly, but that shouldn't be hard; it's just a bit of math we can hide in a conversion function (er, macro).
The value in this idea is that if you add or subtract angles, you'll get a value whose fractional part you want and whose integer part you want to throw away. If we represent the fraction as a 32-bit fixed point number with the binary point at 2^32 (i.e., to the left of what is normally considered the sign bit), any overflows of the fraction simply fall off the top of the 32-bit value for free. So you do all integer math, and "overflow" removal happens for free.
So I'd rewrite your code (preserving the idea of degrees times 10):
typedef uint32_t angle; // angle*3600/(2^32) represents degrees
#define angle_scale_factor 1193046.47111111 // = 2^32/3600
#define make_angle(degrees) ((angle)(((degrees)%3600)*angle_scale_factor))
#define make_degrees(a) ((a)/(angle_scale_factor*10)) // produces a float number
...
angle a = make_angle(100); // compiler presumably does compile-time math to compute 119304647
angle b = make_angle(200); // = 238609294
angle c = a - b; // compiler generates an integer subtract, which computes 4175662649 (mod 2^32)
#if 0 // no need for this at all; other solutions execute real code to do something here
if (c < 0) // this can't happen
{ c += 3600; } // this is the wrong representation for our variant
#endif
// speed doesn't matter here, we're doing output:
printf("final angle = %4.2f\n", make_degrees(c)); // should print 350.00
I have not compiled and run this code.
Changes to make this degrees times 100 or times 1 are pretty easy; modify the angle_scale_factor. If you have a 16 bit machine, switching to 16 bits is similarly easy; if you have 32 bits, and you still want to only do 16 bit math, you will need to mask the value to be printed to 16 bits.
This solution has one other nice property: you've documented which variables are angles (and have a funny representation). The OP's original code just called them ints, but that's not what they represent; a future maintainer will be surprised by the original code, especially if he finds the subtraction isolated from the variables.
