Most of the rounding questions on Stack Overflow are about specific cases and, as I just made a generic rounding function, I thought I would share it.
I have a number f, a starting value a, and an increment b, and I want to “round” f to the nearest element of the set {a + b•i | i is an integer}. For example:
a = 0.0, b = 0.5: round to one of the values 0.0, 0.5, 1.0, 1.5, etc.
a = 0.25, b = 0.5: round to one of the values 0.25, 0.75, 1.25, 1.75, etc.
a = 0.21, b = 0.23: round to one of the values 0.21, 0.44, 0.67, 0.90, etc.
The standard C library has roundf() but it only rounds to the nearest integer.
How do I go about doing this?
#include <stdio.h>
#include <math.h>
float round_float(float x, float inc, float start_val)
{
return ( roundf( (x - start_val) / inc ) * inc + start_val );
}
int main(void)
{
printf("%f\n", round_float(12.522, 0.5, 0));
printf("%f\n", round_float(12.522, -0.5, -0.1));
printf("%f\n", round_float(5.318, 0.23, 125));
printf("%f\n", round_float(-12.522, 12, 5));
printf("%f\n", round_float(-12.522, 3.6, -2));
}
output:
12.500000
12.400000
5.400002
-7.000000
-12.799999
first example: round_float(10.1521, 0.5, 0)
second example: round_float(10.1521, 0.5, 0.25)
final example: round_float(10.1521, 0.23, 0.21)
Extended example
This code shows how random sample values are rounded to the nearest entry in each of three sequences specified by the starting value and the increment — using the values outlined in the question.
#include <stdio.h>
#include <math.h>
static inline float round_float(float x, float inc, float start_val)
{
return roundf((x - start_val) / inc) * inc + start_val;
}
int main(void)
{
printf("%f\n", round_float(12.522, 0.5, 0));
printf("%f\n", round_float(12.522, -0.5, -0.1));
printf("%f\n", round_float(5.318, 0.23, 125));
printf("%f\n", round_float(-12.522, 12, 5));
printf("%f\n", round_float(-12.522, 3.6, -2));
static const float samples[] =
{
-14.2751, -12.3080, -10.5320, -6.4804, -1.0859,
0.1999, 0.2099, 5.2980, 5.7819, 11.7052,
};
enum { NUM_SAMPLES = sizeof(samples) / sizeof(samples[0]) };
static const float control[][2] =
{
{ 0.00, 0.50 },
{ 0.25, 0.50 },
{ 0.21, 0.23 },
};
enum { NUM_CONTROL = sizeof(control) / sizeof(control[0]) };
for (int i = 0; i < NUM_CONTROL; i++)
{
float a = control[i][0];
float b = control[i][1];
printf("Start: %8.4f; increment %8.4f\n", a, b);
for (int j = 0; j < NUM_SAMPLES; j++)
{
printf(" Sample: %8.4f rounds to %8.4f\n",
samples[j], round_float(samples[j], b, a));
}
}
return 0;
}
Example output:
12.500000
12.400000
5.400002
-7.000000
-12.799999
Start: 0.0000; increment 0.5000
Sample: -14.2751 rounds to -14.5000
Sample: -12.3080 rounds to -12.5000
Sample: -10.5320 rounds to -10.5000
Sample: -6.4804 rounds to -6.5000
Sample: -1.0859 rounds to -1.0000
Sample: 0.1999 rounds to 0.0000
Sample: 0.2099 rounds to 0.0000
Sample: 5.2980 rounds to 5.5000
Sample: 5.7819 rounds to 6.0000
Sample: 11.7052 rounds to 11.5000
Start: 0.2500; increment 0.5000
Sample: -14.2751 rounds to -14.2500
Sample: -12.3080 rounds to -12.2500
Sample: -10.5320 rounds to -10.7500
Sample: -6.4804 rounds to -6.2500
Sample: -1.0859 rounds to -1.2500
Sample: 0.1999 rounds to 0.2500
Sample: 0.2099 rounds to 0.2500
Sample: 5.2980 rounds to 5.2500
Sample: 5.7819 rounds to 5.7500
Sample: 11.7052 rounds to 11.7500
Start: 0.2100; increment 0.2300
Sample: -14.2751 rounds to -14.2800
Sample: -12.3080 rounds to -12.2100
Sample: -10.5320 rounds to -10.6000
Sample: -6.4804 rounds to -6.4600
Sample: -1.0859 rounds to -1.1700
Sample: 0.1999 rounds to 0.2100
Sample: 0.2099 rounds to 0.2100
Sample: 5.2980 rounds to 5.2700
Sample: 5.7819 rounds to 5.7300
Sample: 11.7052 rounds to 11.7100
I'm sure a more judicious choice of sample values would make the point even clearer; those are just a collection of random numbers in the range -20 .. +20 in sorted order.
This was a homework problem: find the cosine of an angle without using the built-in cos function.
I wrote the following code:
// Program to find cos x using loop.
#include <stdio.h>
#define _USE_MATH_DEFINES
#include <math.h>
double cosine(double);
int main()
{
double x;
printf("Enter angle in degrees: ");
scanf("%lf", &x);
x = x*M_PI/180.0;
printf("The value of cos(%lf) is %lf", x, cosine(x));
}
double cosine(double x)
{
double previous, current = 1;
double denominator = 1*2, numerator = x*x;
double sign = -1;
while(1)
{
previous = current;
current = current + ((numerator)/(denominator))*sign;
denominator = denominator * (denominator+1) * (denominator+2);
numerator = numerator*x*x;
sign = -sign;
if (fabs(previous - current)<=0.0001)
{
break;
}
}
return current;
}
For x = 180 the answer isn't -1 (which is the correct value).
I have no clue what is going wrong here. Please help; I am relatively new to programming.
I tested out your code and found an error in the derivation of the factorial value for your denominator. The following line of code was actually not providing a proper factorial value.
denominator = denominator * (denominator+1) * (denominator+2);
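It was increasing the denominator value far too fast: starting from 2, the first update gives 2 * 3 * 4 = 24 (which is 4!), but the next one gives 24 * 25 * 26 = 15600 instead of 6! = 720.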
With that in mind, I did a bit of refactoring including revising the "while" loop test for previous and current to a simple "for" loop with enough iterations to provide the precision you most likely need. Following is a refactored version of your program.
// Program to find cos x using loop.
#include <stdio.h>
#define _USE_MATH_DEFINES
#include <math.h>
double cosine(double);
int main()
{
double x;
printf("Enter angle in degrees: ");
scanf("%lf", &x);
x = x * M_PI / 180.0;
printf("The value of cos(%lf) is %lf\n", x, cosine(x));
}
double cosine(double x)
{
double current = 1.00;
double denominator = 2.00, numerator = x*x;
double factor = 2.00;
double sign = -1;
for(int i = 0; i < 16; i++)
{
current = current + ((numerator)/(denominator))*sign;
denominator = denominator * (factor+1.00) * (factor+2.00); /* Derives the proper factorial increase */
numerator = numerator * x * x;
sign = -sign;
factor = factor + 2.00;
}
return current;
}
Some points to note.
Instead of the previous formula for calculating the needed factorial, a work field for keeping track of the ascending factorial values is added and incremented as needed and utilized in the denominator calculation.
Instead of testing for smaller and smaller differences with the "while" loop, a "for" loop is utilized with enough iterations to provide a desired precision for the cosine value.
With those tweaks, the following test runs were produced at the terminal.
#Vera:~/C_Programs/Console/Taylor/bin/Release$ ./Taylor
Enter angle in degrees: 0
The value of cos(0.000000) is 1.000000
#Vera:~/C_Programs/Console/Taylor/bin/Release$ ./Taylor
Enter angle in degrees: 90
The value of cos(1.570796) is 0.000000
#Vera:~/C_Programs/Console/Taylor/bin/Release$ ./Taylor
Enter angle in degrees: 180
The value of cos(3.141593) is -1.000000
#Vera:~/C_Programs/Console/Taylor/bin/Release$ ./Taylor
Enter angle in degrees: 270
The value of cos(4.712389) is 0.000000
#Vera:~/C_Programs/Console/Taylor/bin/Release$ ./Taylor
Enter angle in degrees: 360
The value of cos(6.283185) is 1.000000
Give those tweaks a try and see if it meets the spirit of your project.
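If you would rather keep a convergence test than a fixed iteration count, one possible sketch is to derive each Taylor term from the previous one, so that no factorial is computed at all. The function name cosine_series, the 1e-15 threshold, and the cap of 100 iterations are assumptions chosen for illustration, not part of the program above.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical alternative: each Taylor term is derived from the previous
   one, so no factorial is ever computed explicitly. */
double cosine_series(double x)
{
    assert(isfinite(x));
    double term = 1.0;   /* current term, starting at x^0 / 0! */
    double sum = 1.0;    /* running sum of the series */
    for (int k = 1; k <= 100; k++) {
        /* term_k = -term_(k-1) * x^2 / ((2k - 1) * (2k)) */
        term *= -(x * x) / ((2.0 * k - 1.0) * (2.0 * k));
        sum += term;
        if (fabs(term) < 1e-15)
            break;       /* remaining terms are too small to matter */
    }
    return sum;
}
```

Calling cosine_series with the angle already converted to radians should give a value very close to -1 for 180 degrees. The sketch does no range reduction, so it is only suitable for moderate values of |x|.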
I'm trying to make a simple program which adds two fractions. It takes four numbers as inputs:
numerator and denominator of the first fraction and numerator and denominator of the second fraction. The output should be the sum of the two fractions in decimal form. My program isn't working and I'm stuck on how to fix it. It outputs 0.000 for every example I tried.
# include <stdio.h>
int fracsum(float n1, float d1, float n2, float d2)
{
float n3, d3;
double res;
n3 = (n1*d2) + (d1*n2);
d3 = d1 * d2;
res = n3 / d3;
return res;
}
int main(void)
{
printf("%.3f %.3f %.3f\n",
fracsum(1,2,2,4),
fracsum(1,4,1,8),
fracsum(4,3,5,6));
return 0;
}
The output of this code is: 0.000 0.000 0.000
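You declared the function as
int fracsum
but you clearly want
double fracsum
judging from the %.3f conversions in your printf call. Passing an int where printf expects a double is undefined behavior, which is why the output is not what you expect.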
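A minimal sketch of the corrected function, assuming the arithmetic is otherwise kept as in the question. The double parameters and the assert on the denominators are additions for illustration.

```c
#include <assert.h>

/* Same arithmetic as the question's fracsum, but the result is returned
   as double, so the fractional part is no longer truncated away. */
double fracsum(double n1, double d1, double n2, double d2)
{
    assert(d1 != 0.0 && d2 != 0.0);  /* denominators must be nonzero */
    return (n1 * d2 + d1 * n2) / (d1 * d2);
}
```

With the three calls from the question's main, the "%.3f %.3f %.3f" format then prints 1.000 0.375 2.167.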
The value it gives is 557135813.94455. Will the value remain the same every time? Why is it not showing infinity?
#include <stdio.h>
#include <math.h>
#define PI 3.14159265
int main ()
{
double param, result;
param = 90.0;
result = tan ( param * PI / 180.0 );
printf ("The tangent of %f degrees is %f.\n", param, result );
return 0;
}
You are not passing the value of Pi/2, you are passing 90.0 * 3.14159265 / 180.0, an approximation.
Code is not asking for the tangent of 90°, but the tangent of a number, in radians, close to 90°. The conversion to radians is not exact since π/2 radians is not representable exactly as a double.
The solution is to perform degrees range reduction first and then call tan(d2r(x)).
#include <float.h>  /* DBL_DECIMAL_DIG */
#include <math.h>
#include <stdio.h>  /* printf */
static double d2r(double d) {
return (d / 180.0) * ((double) M_PI);
}
double tand(double x /* degrees */) {
if (!isfinite(x)) {
return tan(x);
} else if (x < 0.0) {
return -tand(-x);
}
int quo;
double x45 = remquo(fabs(x), 90.0, &quo);
//printf("%d %f ", quo & 3, x45);
switch (quo % 4) {
case 0:
return tan(d2r(x45));
case 1:
return 1.0 / tan(d2r(- x45));
case 2:
return -tan(d2r(-x45));
case 3:
return -1.0 / tan(d2r(x45));
}
return 0.0;
}
#define PI 3.14159265
int main(void) {
double param, result;
param = 90.0;
result = tan(param * PI / 180.0);
printf("Angle %.*e radian\n", DBL_DECIMAL_DIG - 1, param * PI / 180.0);
printf("Pi/2 = 1.5707963267948966192313216916398...\n");
printf("The tangent of %f degrees is %f.\n", param, result);
int i;
for (i = -360; i <= 360; i += 30) {
printf("The tangent method 1 of %.1f degrees is %.*e\n",
1.0*i, DBL_DECIMAL_DIG - 1, tan(d2r(-i)));
printf("The tangent method 2 of %.1f degrees is %.*e\n",
1.0*i, DBL_DECIMAL_DIG - 1, tand(-i));
}
return 0;
}
OP's output
Angle 1.5707963250000001e+00 radian
Pi/2 = 1.5707963267948966192313216916398...
The tangent of 90.000000 degrees is 557135183.943528.
Better results
The tangent method 1 of -360.0 degrees is -2.4492935982947064e-16
The tangent method 2 of -360.0 degrees is 0.0000000000000000e+00
The tangent method 1 of -330.0 degrees is -5.7735026918962640e-01
The tangent method 2 of -330.0 degrees is -5.7735026918962573e-01
The tangent method 1 of -300.0 degrees is -1.7320508075688770e+00
The tangent method 2 of -300.0 degrees is -1.7320508075688774e+00
The tangent method 1 of -270.0 degrees is 5.4437464510651230e+15
The tangent method 2 of -270.0 degrees is -inf
The tangent method 1 of -240.0 degrees is 1.7320508075688752e+00
The tangent method 2 of -240.0 degrees is 1.7320508075688774e+00
The tangent method 1 of -210.0 degrees is 5.7735026918962540e-01
The tangent method 2 of -210.0 degrees is 5.7735026918962573e-01
The tangent method 1 of -180.0 degrees is -1.2246467991473532e-16
The tangent method 2 of -180.0 degrees is 0.0000000000000000e+00
...
Floating point arithmetic is not exact arithmetic. You cannot reliably compare two floating point numbers using ==; e.g. 0.6 / 0.2 - 3 == 0 should be true mathematically, but on most systems it will be false. Be careful when you perform floating point calculations and expect exact results; this is doomed to fail. Consider every floating point calculation to return only an approximation; albeit a very good one, sometimes even an exact one, just don't rely on it to be exact.
I'm trying to vectorize some code.
Idea: we have a pixel (__m128 in); if any of its elements is greater than or equal to upper, replace the entire pixel with a different pixel (__m128 upper_color).
Unvectorized code that works:
if(inp[0] >= upper || inp[1] >= upper || inp[2] >= upper)
{
outp[0] = upper_color[0];
outp[1] = upper_color[1];
outp[2] = upper_color[2];
}
So far I have come up with the following, but I believe it replaces not the entire pixel, but only those components that are bigger than upper:
const __m128 pixel = _mm_load_ps(in);
const __m128 isoe = _mm_cmpge_ps(pixel, upper);
__m128 result = _mm_or_ps(_mm_andnot_ps(isoe, pixel),
_mm_and_ps(isoe, upper_color));
_mm_stream_ps(out, result);
Let's assume upper = 1,1,1 and upper_color = 1,0,0.
The fourth channel is alpha, so I do not care about it.
Results:
IN: 0.5 0.3 0.7
OUT: 0.5 0.3 0.7 (Expected)
OUT: 0.5 0.3 0.7 (Received)
IN: 1.5 1.1 0.7
OUT: 1 0 0 (Expected)
OUT: 1 0 0.7 (Received)
Maybe someone could help me? Is this even possible?
You need to compute a horizontal OR. There is no horizontal OR instruction in SSE, but such an operation can be simulated with 2x UNPACK + vertical OR.
const __m128 pixel = _mm_load_ps(in);
/* (p3, p2, p1, p0 ) */
__m128 isoe = _mm_cmpge_ps(pixel, upper);
/* (p3|p1, p3|p1, p2|p0, p2|p0) */
isoe = _mm_or_ps(_mm_unpacklo_ps(isoe, isoe), _mm_unpackhi_ps(isoe, isoe));
/* (p3|p2|p1|p0, p3|p2|p1|p0, p3|p2|p1|p0, p3|p2|p1|p0) */
isoe = _mm_or_ps(_mm_unpacklo_ps(isoe, isoe), _mm_unpackhi_ps(isoe, isoe));
__m128 result = _mm_or_ps(_mm_andnot_ps(isoe, pixel), _mm_and_ps(isoe, upper_color));
_mm_stream_ps(out, result);
You can use _mm_movemask_epi8 to do a horizontal OR.
#include <stdio.h>
#include <emmintrin.h>
void foo(float ina[]) {
//float ina[] = {0.5, 0.3, 0.7, 0};
float uppera[] = {1,1,1,1};
float upper_colora[] = {1,0,0,0};
float out[4];
/* Unaligned loads and stores: these plain float arrays are not guaranteed to be 16-byte aligned. */
__m128 in = _mm_loadu_ps(ina);
__m128 upper = _mm_loadu_ps(uppera);
__m128 upper_color = _mm_loadu_ps(upper_colora);
const __m128 isoe = _mm_cmpge_ps(in, upper);
/* Any set bit in the mask means at least one lane compared >= upper. */
if(_mm_movemask_epi8(_mm_castps_si128(isoe))) {
_mm_storeu_ps(out, upper_color);
}
else {
_mm_storeu_ps(out, in);
}
printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
}
int main() {
float ina1[] = {0.5, 0.3, 0.7, 0}; //output 0.5 0.3 0.7 0.0
float ina2[] = {0.5, 1.1, 0.7, 0}; //output 1.0 0.0 0.0 0.0
foo(ina1);
foo(ina2);
}
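Since the question says the fourth (alpha) channel should not take part in the decision, a variation on the movemask idea is to use _mm_movemask_ps, which yields one bit per float lane, and mask off the alpha bit. This is only a sketch: the function name clamp_pixel, the scalar upper parameter, and the R, G, B, A lane order are assumptions for illustration.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: in, out and upper_color each point to 4 floats
   (R, G, B, A). Unaligned loads/stores are used, so no alignment is assumed. */
void clamp_pixel(const float *in, float *out, float upper,
                 const float *upper_color)
{
    assert(in && out && upper_color);
    __m128 pixel = _mm_loadu_ps(in);
    __m128 mask = _mm_cmpge_ps(pixel, _mm_set1_ps(upper));
    /* One bit per lane: keep bits 0..2 (R, G, B), drop bit 3 (alpha). */
    int any_rgb = _mm_movemask_ps(mask) & 0x7;
    if (any_rgb)
        _mm_storeu_ps(out, _mm_loadu_ps(upper_color));
    else
        _mm_storeu_ps(out, pixel);
}
```

Note that when the pixel is replaced, all four lanes come from upper_color, including alpha; if the original alpha must be preserved, it would have to be blended back in separately.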
I have the following code for finding quartiles:
#include <stdio.h>
#include <stdlib.h>
typedef struct {
double qrt[3];
double *value;
int count;
} t_data;
static void set_qrt(t_data *data, int qrt)
{
int n, e;
double d;
d = qrt * 0.25 * data->count + 0.5;
n = (int)d;
e = n != d;
data->qrt[qrt - 1] = data->value[n - 1];
if (e) {
data->qrt[qrt - 1] += data->value[n];
data->qrt[qrt - 1] *= 0.5;
}
}
static void set_qrts(t_data *data)
{
set_qrt(data, 2);
if (data->count > 1) {
set_qrt(data, 1);
set_qrt(data, 3);
} else {
data->qrt[0] = 0.0;
data->qrt[2] = 0.0;
}
}
static int comp(const void *pa, const void *pb)
{
const double a = *(const double *)pa;
const double b = *(const double *)pb;
return (a > b) ? 1 : (a < b) ? -1 : 0;
}
int main(void)
{
double values[] = {3.7, 8.9, 7.1, 5.4, 1.2, 6.8, 4.3, 2.7};
t_data data;
data.value = values;
data.count = (int)(sizeof(values) / sizeof(double));
qsort(data.value, data.count, sizeof(double), comp);
set_qrts(&data);
printf("Q1 = %.1f\nQ2 = %.1f\nQ3 = %.1f\n", data.qrt[0], data.qrt[1], data.qrt[2]);
}
Is
d = qrt * 0.25 * data->count + 0.5;
n = (int)d;
e = n != d;
guaranteed to work as expected? (e == isinteger(d))
Numbers 0.5, 0.25, 0.125 and so on represent negative powers of two, and therefore are representable exactly in IEEE 754 types. Using these numbers does not result in representation errors.
The values 0.5 and 0.25 themselves will be exact. The intermediate values of your calculation may or may not be, depending on their range. IEEE doubles have a 52-bit stored mantissa, so they will exactly represent, to the nearest 0.25, numbers that need 50 bits or fewer for their integer part, which is about 15 decimal digits.
So if you add 0.25 to 100000000000000 (10^14), you'll get 100000000000000.25. But if you add 0.25 to 10000000000000000 (10^16), you'll lose the fraction.
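The two cases above can be checked directly. The helper below is a hypothetical sketch (the name quarter_survives is made up for illustration); it assumes double is an IEEE 754 binary64 type, as on most current platforms.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: returns 1 when base + 0.25 is stored exactly,
   i.e. the added quarter is not lost to rounding. */
int quarter_survives(double base)
{
    assert(isfinite(base));
    /* volatile forces the sum to be rounded to double before the comparison */
    volatile double sum = base + 0.25;
    return sum - base == 0.25;
}
```

Under that assumption, quarter_survives(1e14) returns 1 and quarter_survives(1e16) returns 0, matching the two examples.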
dasblinkenlight is absolutely correct. Double/float and integer types are stored differently according to IEEE754. Watch this for an easy tutorial if you are curious about it.
The double precision floating point format has 53 bits in its mantissa, of which one is implicit. This means that it can represent all positive and negative integers with magnitudes in the range 2^0 to 2^53-1.
0 (zero) is a special case which has its own format.
When it comes to a 0.25 spacing, the range is straightforwardly calculated to be 2^-2 to 2^51-0.25. This means that quite a few, but by no means all, multiples of 0.25 are exactly representable in the double precision format, just as quite a few but not all integers are exactly representable.
So if you have an exactly representable spacing of 2^x the representable range is 2^x to 2^(53+x)-2^x.