C Statistical Analysis Program having mathematical errors

I wrote a program in C that performs several calculations on numbers passed on the command line. For some reason, the result of Average() tends to be something large (and occasionally negative; I'll include a log) and Std_Dev() tends to print 0. I have the code in a repository on GitHub:
https://github.com/Jordan-Effinger/Data-Analysis
A quick note about the files: the repository contains a file called type.h. That file is not used in my current build, so don't be surprised if nothing defined in it is used anywhere.
Example Results:
main 0 0 0 0 0
Calculating average
Calculating std-deviation
-141545200 1195973704
main 0 0 0 0 0
Calculating average
Calculating std-deviation
-1030105488 1003883182
main 0 0 0 0 0
Calculating average
Calculating std-deviation
1478538976 1111766907
Any thoughts? I think something's going wrong when the functions are returning the result - but I've used these functions before and I didn't have this kind of problem...
Edit #1:
I realized both functions have a problem with zero. That is something I will have to work on. I looked through the comments, implemented a few of the changes suggested there, and found a few changes of my own. I won't include the whole functions; I'll just go over the changes.
file: main.c
I dynamically allocated space for Data[], Sorted[] using malloc:
float *Data = (float *) malloc( (data_count + 3 ) * sizeof(float) );
In all of the functions ( and their prototypes), I have arrays declared as float * and am passing the data_count variable as a size reference (I'm not quite comfortable with sizeof() in most instances).
file: std_dev.c:
In the for loop I changed
sum += pow( Data[data_count] - average, 2 );
to
sum += pow( Data[index] - average, 2 );
I'm going to run some tests, implement the rest of the calculations, and then see what I can do to fix the issue with zero values.
Thank you for your input!
--Jordan.

I am relatively sure I see a few errors:
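When calculating the standard deviation, you index the array with the parameter data_count in the loop body rather than with the loop counter idx. Every iteration then reads the same element, Data[data_count], which lies past the end of the data, so the sum can never be right.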
You are using Data[data_count] as the parameter for your array in both the average and standard deviation functions. If you're using C you probably just want float *Data. I am pretty sure Data[data_count] is simply wrong here. Possibly float Data[] might be correct. EDIT: it has been pointed out in the comments that this syntax can actually be correct if the compiler supports it. Check to make sure your compiler supports this and, if so, no changes should be needed.
When you call the average and standard deviation functions, you are passing Data[data_count]. I am almost sure this must be wrong; Data[data_count] is the (data_count+1)'th element of Data, an array of size data_count; so it's not even defined and, if it were, the type is still wrong. I suggest simply passing Data here.
I typically work in C++ so these comments may be off the mark but if C is like C++ in these respects then these are definitely issues to look at.
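The points above suggest passing the array as a pointer together with its element count, and indexing with the loop counter. A minimal sketch under those assumptions follows. The names average and variance, the use of size_t, and the choice of the population variance (dividing by n) are illustrative and not taken from the repository. The standard deviation is then the square root of the variance (sqrt from <math.h>).

```c
#include <stddef.h>

/* Mean of the first `count` elements; returns 0 for an empty array. */
float average(const float *data, size_t count)
{
    double sum = 0.0;

    if (count == 0)
        return 0.0f;
    for (size_t i = 0; i < count; i++)
        sum += data[i];            /* index with the loop counter */
    return (float)(sum / (double)count);
}

/* Population variance of the first `count` elements. */
float variance(const float *data, size_t count)
{
    double mean, sum = 0.0;

    if (count == 0)
        return 0.0f;
    mean = average(data, count);
    for (size_t i = 0; i < count; i++) {
        double d = data[i] - mean; /* data[i], not data[count] */
        sum += d * d;
    }
    return (float)(sum / (double)count);
}
```

A caller would pass the array name itself, for example average(Data, data_count), rather than a single element such as Data[data_count].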

Related

C for loop, supposed to be able to omit initialization inside loop itself, doesn't work as intended

so I'm slowly trying to learn C from scratch.
I'm at a point in the book I'm using where an exercise is proposed: use nested loops to find Pythagorean triplets. Now I'll show the code.
#include <stdio.h>
int main(void){
    int lato1=1;
    int lato2=1;
    int ipotenusa=1;
    for(;lato1 <= 500; lato1++){
        for(;lato2 <= 500; lato2++){
            for(;ipotenusa <= 500; ipotenusa++){
                if (((lato1 * lato1)+(lato2 * lato2))==(ipotenusa*ipotenusa)){
                    printf("Tripletta %10d %10d %10d\n",lato1,lato2,ipotenusa);
                }
            }
        }
    }
    return 0;
}
Now, apart from the terrible formatting and style, and apart from the shi*t optimization,
the code as shown doesn't work.
It only executes the innermost loop, and then the program ends.
If, however, I initialize each variable/counter inside each loop then it works.
Why?
I read that a for loop is valid even without any initialization clause (;;), but in this case I just wanted to initialize those variables before the loop, let's say because I wanted to access those values after the loop is done, and it just doesn't seem to work like it's supposed to.
English is not my primary language, so apologies in advance for any mistakes.
Can somebody explain what is the problem?
Edit 1: So, I don't know if it was my bad English or something else.
I said that if I declare and initialize the variables before the loop, as in the code I've shown you, it only goes through the innermost loop (ipotenusa), with the following output values: 1 1 1, then 1 1 2, then 1 1 3 and so on, where the only increasing number is the last one (ipotenusa); AFTER we reach 1 1 500 the program abruptly ends.
I then said that if I initialize the variables as normal, meaning inside the for loop statement, then it works as intended.
Even if they are declared earlier, there is NO reason it shouldn't work. The variable's value is supposed to increase. Up until now the only useful answer was to declare a variable outside the loop but assign a value to it in the loop statement, but in the end this is not the answer I need, because I should be able to skip the initialization inside the loop statement altogether.
EDIT 2: I was wrong and You guys were right, language barrier (most likely foolishness, rather) was certainly the cause of the misunderstanding lol. Sorry and Thanks for the help!

You accidentally answered your own question:
...in this case I just wanted to initialize those variables before the loop, let's say because I wanted to access those values after the loop is done...
Think about the value of ipotenusa... or have your program print out the value of each variable at the start of each loop iteration for debugging purposes. With the limit lowered from 500 to 5, it'll look something like the following log.
lato1: 1
lato2: 1
ipotenusa: 1
ipotenusa: 2
ipotenusa: 3
ipotenusa: 4
ipotenusa: 5
lato2: 2
lato2: 3
lato2: 4
lato2: 5
lato1: 2
lato1: 3
lato1: 4
lato1: 5
The inner loops never "reset," because you're saving the value. You're not wrong on either count, but you've set up two requirements that conflict with each other, and the C runtime can only give you one.
Basically, lato2 and ipotenusa need to be initialized (but not necessarily declared) in their for statements, so that they're set up to run again.
int lato1=1;
int lato2;
int ipotenusa;
for(;lato1 <= 5; lato1++){
    for(lato2 = 1;lato2 <= 5; lato2++){
        for(ipotenusa = 1;ipotenusa <= 5; ipotenusa++){
            if (((lato1 * lato1)+(lato2 * lato2))==(ipotenusa*ipotenusa)){
                printf("Tripletta %10d %10d %10d\n",lato1,lato2,ipotenusa);
            }
        }
    }
}
It's worth pointing out that a variable declared inside the for statement itself (for (int i = 1; ...), valid since C99) is scoped to the loop, and a conforming compiler will reject any use of it after the loop ends. If you need the final value afterwards, declare the variable before the loop and only assign it in the for statement, as above.

Your code does exactly what you told it to do.
If you initialise lato2 with a value of 1, what makes you think that the initialisation will be repeated at the start of the inner for loop? Of course it isn't. You told the compiler that you didn't want it to be initialised again. After that loop executed the first time, lato2 has a value of 501, and it will keep that value forever.
Even if it hadn't produced a bug, putting the initialisation and the for loop so far apart is very, very bad style (wouldn't pass a code review in a professional setting and therefore would have to be changed).
for (int lato2 = 1; lato2 <= 500; ++lato2) is clear, obvious, and works.

One approach would be to declare the variables outside of the loops and initialize them inside. This way, you can access the variables after the loops.
#include <stdio.h>
int main(void){
    int lato1, lato2, ipotenusa;
    for(lato1=1; lato1 <= 500; lato1++){
        for(lato2=1; lato2 <= 500; lato2++){
            for(ipotenusa=1; ipotenusa <= 500; ipotenusa++){
                if (((lato1 * lato1)+(lato2 * lato2))==(ipotenusa*ipotenusa)){
                    printf("Tripletta %10d %10d %10d\n",lato1,lato2,ipotenusa);
                }
            }
        }
    }
    return 0;
}

You are using variables that are initialized outside of the for loops. Usually, with nested loops, we intend the inner counter to be reset to its starting value every time the inner loop begins, but in your case that never happens. Since you initialized the variables outside the for loops, each one keeps its value for its whole lifetime, which here is the duration of main.
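The difference can be made concrete by counting how often the inner loop body runs. The sketch below is only an illustration; the function names and the two-level nesting are assumptions chosen to keep it short.

```c
/* Inner counter initialised once, before both loops (as in the question).
   After the first pass it stays at limit + 1, so the inner loop never
   runs again. */
int count_without_reset(int limit)
{
    int count = 0;
    int outer = 1;
    int inner = 1;

    for (; outer <= limit; outer++)
        for (; inner <= limit; inner++)
            count++;
    return count;
}

/* Inner counter assigned in the for statement, so it restarts at 1 on
   every pass of the outer loop. Both variables are still readable after
   the loops because they are declared outside them. */
int count_with_reset(int limit)
{
    int count = 0;
    int outer, inner;

    for (outer = 1; outer <= limit; outer++)
        for (inner = 1; inner <= limit; inner++)
            count++;
    return count;
}
```

With a limit of 5, the first version executes the body 5 times and the second 25 times.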

Optimizing C code, Horner's polynomial evaluation

I'm trying to learn how to optimize code (I'm also learning C), and in one of my books there's a problem about optimizing Horner's method for evaluating polynomials. I'm a little lost on how to approach the problem. I'm not great at recognizing what needs optimizing.
Any advice on how to make this function run faster would be appreciated.
Thanks
double polyh(double a[], double x, int degree) {
    long int i;
    double result = a[degree];
    for (i = degree-1; i >= 0; i--)
        result = a[i] + x*result;
    return result;
}

You really need to profile your code to test whether proposed optimizations really help. For example, it may be the case that declaring i as long int rather than int slows the function on your machine, but on the other hand it may make no difference on your machine but might make a difference on others, etc. Anyway, there's no reason to declare i a long int when degree is an int, so changing it probably won't hurt. (But still profile!)
Horner's rule is supposedly optimal in terms of the number of multiplies and adds required to evaluate a polynomial, so I don't see much you can do with it. One thing that might help (profile!) is changing the test i>=0 to i!=0. Of course, then the loop doesn't run enough times, so you'll have to add a line below the loop to take care of the final case.
Alternatively you could use a do { ... } while (--i) construct. (Or is it do { ... } while (i--)? You figure it out.)
You might not even need i, but using degree instead will likely not save an observable amount of time and will make the code harder to debug, so it's not worth it.
Another thing that might help (I doubt it, but profile!) is breaking up the arithmetic expression inside the loop and playing around with order, like
for (...) {
result *= x;
result += a[i];
}
which may reduce the need for temporary variables/registers. Try it out.
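Put together with the earlier suggestion to use an int index, the split form could look like the sketch below. The name polyh_split is made up for illustration, and whether this is any faster than the original is exactly what profiling would have to show.

```c
/* Evaluates a[0] + a[1]*x + ... + a[degree]*x^degree by Horner's rule,
   with the multiply and the add written as separate statements. */
double polyh_split(const double a[], double x, int degree)
{
    double result = a[degree];

    for (int i = degree - 1; i >= 0; i--) {
        result *= x;
        result += a[i];
    }
    return result;
}
```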

A suggestion:
You may use int instead of long int for the loop index.

Almost certainly the problem is inviting you to conjecture on the values of a. If that vector is mostly zeros, then you'll go faster (by doing fewer double multiplications, which will be the clear bottleneck on most machines) by computing only the values of a[i] * x^i for a[i] != 0. In turn the x^i values can be computed by careful repeated squaring, preserving intermediate terms so that you never compute the same partial power more than once. See the Wikipedia article if you've never implemented repeated squaring.
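A rough sketch of that idea follows, assuming the coefficients are still stored densely in a[] and zero entries are simply skipped. The names power and poly_sparse are hypothetical. The running power of x is carried from one nonzero term to the next, so only the gap between exponents is raised by repeated squaring.

```c
/* x^n by repeated squaring. */
static double power(double x, unsigned n)
{
    double result = 1.0;

    while (n > 0) {
        if (n & 1u)
            result *= x;
        x *= x;
        n >>= 1;
    }
    return result;
}

/* Sums a[i] * x^i over the nonzero coefficients only. */
double poly_sparse(const double a[], double x, int degree)
{
    double sum = 0.0;
    double pw = 1.0;   /* x^last, reused for the next nonzero term */
    int last = 0;

    for (int i = 0; i <= degree; i++) {
        if (a[i] != 0.0) {
            pw *= power(x, (unsigned)(i - last));
            last = i;
            sum += a[i] * pw;
        }
    }
    return sum;
}
```

For a dense coefficient vector this does more work than Horner's rule, so it only pays off when most entries are zero.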

"final" modifier in C and array declaration

I'm working on a homework assignment for my Intro to C course (don't worry, I don't need you guys to solve anything for me!) and I have a question about design. I'm trying to figure out how to safely set the size of an array by reading input from a file.
Initially I wrote it out like this:
fscanf(ifp, "%d", &number_of_pizzas);
float pizza_cost[number_of_pizzas];
I'm pretty sure this will build fine, but I know that it's unwise to declare an array with a variable size. My assignment specifies the array can be no bigger than 100, so I know I can just write "pizza_cost[100]", but I'd rather do it precisely instead of wasting the memory.
Java is the language I'm most familiar with, and I believe the solution to the problem would be written out like this:
Scanner s = new Scanner(System.in);
final int i = s.nextInt();
int[] array = new int[i];
I know C doesn't have a final keyword, so I'm assuming "const" would be the way to go... Is there any way to replicate that code into C?

I'm pretty sure this will build fine, but I know that it's unwise to declare an array with a variable size.
That is true only in situations when there is no upper limit on the size. If you know that number_of_pizzas is 100 or less, your declaration would be safe on all but the most memory-constrained systems.
If you change your code to validate number_of_pizzas before declaring a variable-size array, you would be safe. However, this array would be limited in scope to a function, so you wouldn't be able to return it to your function's caller.
An analogy to Java code would look as follows:
float *pizza_cost = malloc(sizeof(float)*number_of_pizzas);
Now your array can be returned from a function, but you would be responsible for freeing it at some point in your program by calling free(pizza_cost).
As far as making number_of_pizzas a const goes, it is not going to work with scanf: it would be illegal to modify a const through a pointer. It is of very little utility even in Java, because you can get the same value by accessing array's length.

Any dynamic expression can have a limit placed upon its value easily enough:
fscanf(ifp, "%d", &number_of_pizzas);
float pizza_cost[number_of_pizzas > 100 ? 100 : number_of_pizzas];
This is not going to be any less safe than using a constant value as long as the bound(s) is/are constant, and has the potential to be smaller should the required number be less.
Making the variable const/final/anything gains you nothing in this scenario because whether it is modified after being used to create the buffer, doesn't affect the size of the buffer in any way.

Your code will build fine. Running well depends on input values.
I would add just one improvement to this; test the limit of your array initializer before using it:
fscanf(ifp, "%d", &number_of_pizzas);
if((number_of_pizzas > MIN_SIZE) && (number_of_pizzas < MAX_SIZE)) //add this test (or something similar)
{
    float pizza_cost[number_of_pizzas];
    //do stuff
}
Pick values for MIN_SIZE and MAX_SIZE that make sense for your application...
Doing the same thing using dynamic allocation:
fscanf(ifp, "%d", &number_of_pizzas);
if((number_of_pizzas > MIN_SIZE) && (number_of_pizzas < MAX_SIZE)) //add this test (or something similar)
{
    float *pizza_cost = malloc(sizeof(float)*number_of_pizzas); //malloc returns a pointer
    //do stuff
}
Don't forget to use free(pizza_cost); when you are done.
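The range check and the allocation can also be wrapped in one small helper. This is a sketch only: alloc_pizza_costs and MAX_PIZZAS are names invented here, with the bound of 100 taken from the assignment described in the question.

```c
#include <stdlib.h>

#define MAX_PIZZAS 100   /* upper bound stated in the assignment */

/* Returns a heap array of n floats, or NULL if n is out of range or the
   allocation fails. The caller owns the result and must free() it. */
float *alloc_pizza_costs(int n)
{
    if (n < 1 || n > MAX_PIZZAS)
        return NULL;
    return malloc((size_t)n * sizeof(float));
}
```

A caller would read the count first, checking that fscanf(ifp, "%d", &number_of_pizzas) returned 1, and then treat a NULL result from the helper as invalid input.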

Beginner type conversion

I am extremely new to programming in general, so please forgive my noobishness. I am trying to scale down the res[?] array in the function below. I know the problem is that res[?]*(multiplier/100) produces a decimal result instead of the required integer, so I need to convert the result before it is stored in res[?].
I did think of turning the res[] array into a double, but I wasn't sure whether initwindow(?,?) was compatible with integer values.
I am on MinGW with Code::Blocks. The linker and compiler have customized settings made by my professor. I am on Plain\basic C???
I tried to apply the truncating-conversion techniques from this website, but they don't seem to work: http://www.cs.tut.fi/~jkorpela/round.html
The debugger's watch window shows that res[?] is 0.
#include <stdio.h>
#include <graphics_lib.h>
#include <math.h>
int res[2]//contains a array which contains the vertical and horizontal detected resolution from main()
void function(res)
{
printf("Please note the lowest resolution available is 800x600\n");
printf("Please enter percentage ratio % below 100:");
scanf("%d",&multiplier);
res[1]=(int)(res[1]*(multiplier/100));
res[2]=(int)(res[2]*(multiplier/100));
blah blah blah blah.............
initwindow(res[1],res[2]) //from custom header that my professor made which initializes basic graphics window
return;
}

I'm assuming multiplier is an int to match the %d format.
multiplier/100 is guaranteed to be zero (if the user follows directions and provides a number less than 100). You can do (res[x]*multiplier)/100 to make sure the multiply happens first (you're probably okay without the parentheses, but rather than think about order of operations why not be explicit?)
The cast to int is unnecessary, because an int divided by another int is always an int.
Unless your professor has done some very interesting things, you should also note that a two-element array such as res would have elements res[0] and res[1], not res[1] and res[2].
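As a small illustration of the ordering issue, here is a sketch with a hypothetical helper, scale_percent, that multiplies before dividing. It assumes the product value * percent fits in an int, which holds comfortably for screen resolutions.

```c
/* Scales value by percent/100 using integer arithmetic only.
   Multiplying first keeps the fraction from being truncated to 0. */
int scale_percent(int value, int percent)
{
    return (value * percent) / 100;
}
```

With value 800 and percent 50 this gives 400, whereas 800 * (50 / 100) evaluates to 0 because 50 / 100 is an integer division.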

You don't need to cast in this situation, because the conversion is done implicitly. A cast is only needed when you want to force a conversion that would not otherwise happen.

Looks like the multiplier variable is an int, so the result of multiplier/100 expression will be 0 in all cases.
Also, it's a good programming practice to check the validity of user's input.

You must declare multiplier before you use it:
int multiplier;
printf("Please note the lowest resolution available is 800x600\n");
printf("Please enter percentage ratio %% below 100:");
scanf("%d",&multiplier);
Also, array indices start from 0, not 1, and you should initialize res[0] and res[1] before using them. So:
res[0] = 800;
res[1] = 600;
The division of multiplier by 100 truncates to 0, so multiply first. No cast is needed, because the result is already an int:
res[0]=(multiplier*res[0])/100;
res[1]=(multiplier*res[1])/100;

codechef :wrong answer error in smallfactorial

#include<stdio.h>
int fact(int k)
{
    int j,f=1;
    for(j=1;j<=k;j++)
        f*=j;
    return f;
}
int main()
{
    int t,i,n[100],s[100],j;
    scanf("%d",&t);
    for(i=0;i<t;i++)
    {
        scanf("%d",&n[i]);
    }
    for(j=0;j<t;j++)
    {
        s[j]=fact(n[j]);
        printf("%d \n",s[j]);
    }
    return 0;
}
You are asked to calculate factorials of some small positive integers.
Input
An integer t, 1<=t<=100, denoting the number of testcases, followed by t lines, each containing a single integer n, 1<=n<=100.
Output
For each integer n given at input, display a line with the value of n!
Example
Sample input:
4
1
2
5
3
Sample output:
1
2
120
6

Your code will give correct results for the given test cases, but that doesn't prove that your code works. It is wrong because of integer overflow. Try to calculate 100! with your program and you'll see the problem.
My answer lacked details. I'll update this to add details for an answer to the question as it stands now.
C limits the maximum and minimum values that can be stored in a variable of a given type. For arbitrary-precision arithmetic it is usually advisable to use a bignum library, as PHIFounder has suggested.
In the present case however, the use of external libraries is not possible. In this case arrays can be used to store integers exceeding the maximum value of the integers possible. OP has already found this possibility and used it. Her implementation, however, can use many optimizations.
Initially the use of large arrays like that can be reduced. Instead of using an array of 100 variables a single variable can be used to store the test cases. The use of large array and reading in test cases can give optimization only if you are using buffers to read in from stdin otherwise it won't be any better than calling scanf for reading the test cases by adding a scanf in the for loop for going over individual test cases.
It's your choice to either use buffering to get speed improvement or making a single integer instead of an array of 100 integers. In both the cases there will be improvements over the current solution linked to, on codechef, by the OP. For buffering you can refer to this question. If you see the timing results on codechef the result of buffering might not be visible because the number of operations in the rest of the logic is high.
Now second thing about the use of array[200]. The blog tutorial on codechef uses an array of 200 elements for demonstrating the logic. It is a naive approach as the tutorial itself points out. Storing a single digit at each array location is a huge waste of memory. That approach also leads to much more operations leading to a slower solution. An integer can at least store 5 digits (-32768 to 32767) and can generally store more. You can store the intermediate results in a long long int used as your temp and use all 5 digits. That simplification itself would lead to the use of only arr[40] instead of arr[200]. The code would need some additional changes to take care of forward carry and would become a little more complex but both speed and memory improvements would be visible.
You can refer to this for seeing my solutions or you can see this specific solution. I was able to take the use down to 26 elements only and it might be possible to take it further down.
I'll suggest you to put up your code on codereview for getting your code reviewed. There are many more issues that would be best reviewed there.
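To make the multi-digit idea concrete, here is a minimal sketch that stores 5 decimal digits per array element. It is not the solution linked above; the names factorial_string, BASE and LIMBS are invented for this example, and it uses unsigned long (at least 32 bits) for both the elements and the temporary instead of a long long temp.

```c
#include <stdio.h>

#define BASE  100000UL  /* each element holds 5 decimal digits */
#define LIMBS 40        /* 40 * 5 = 200 digits; 100! needs 158 */

/* Writes n! (0 <= n <= 100) as a decimal string into out and returns its
   length. out must have room for LIMBS * 5 + 1 characters. */
int factorial_string(unsigned n, char *out)
{
    unsigned long limb[LIMBS] = {1};   /* least significant element first */
    int used = 1;
    int len;

    for (unsigned k = 2; k <= n; k++) {
        unsigned long carry = 0;

        for (int i = 0; i < used; i++) {
            unsigned long t = limb[i] * k + carry;  /* below 10^7 for k <= 100 */
            limb[i] = t % BASE;
            carry = t / BASE;
        }
        while (carry != 0 && used < LIMBS) {        /* forward carry */
            limb[used++] = carry % BASE;
            carry /= BASE;
        }
    }

    /* Top element unpadded, the rest zero-padded to 5 digits each. */
    len = sprintf(out, "%lu", limb[used - 1]);
    for (int i = used - 2; i >= 0; i--)
        len += sprintf(out + len, "%05lu", limb[i]);
    return len;
}
```

A caller would declare char buf[LIMBS * 5 + 1], call factorial_string(n, buf) for each test case, and print buf.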

Here, your array index should start at 0, not 1; I mean j and i should be initialized to 0 in the for loops.
Besides, try to use a debugger; that will assist you in finding bugs.
If my guess is right, you use Turbo C. If so, my recommendation is to start using MinGW or Cygwin and compile from the command line. It's just a recommendation.
There may be one more problem, which may be why CodeChef is not accepting your code: you have defined the function to accept an integer and then you are passing the array. Maybe this code will work for you:
#include<stdio.h>
int fact(int a[],int n)// here in function prototype I have defined it to take array as argument where n is array size.
{
    int j=0,f=1,k;
    for (k=a[j];k>0;k--)
        f*=k;
    return f;
}
int main()
{
    int t,i,n[100],s[100],j;
    setbuf(stdout,NULL);
    printf("enter the test cases\n");
    scanf("%d",&t); //given t test cases
    for(i=0;i<t;i++)
    {
        scanf("%d",&n[i]); //value of the test cases whose factorial is to be calculated
    }
    for(j=0;j<t;j++)
    {
        s[j]=fact(&n[j],t);// and here I have passed it as required
        printf("\n %d",s[j]); //output
    }
    return 0;
}
NOTE:- After the last edit by OP this implementation has some limitations , it can't calculate factorials for larger numbers say for 100 , again the edit has taken the question on a different track and this answer is fit only for small factorials

The above program works only for small numbers, that is, up to 7!. After that the code does not give correct results, because 8! is 40320.
With a 16-bit int, the range of a signed integer in C is -32768 to +32767,
and the factorials of 8 and above are beyond that range, so an int cannot store them
and the above code cannot give the right results.
To get correct values for somewhat larger inputs we can declare s[100] as long int, but that also works
only up to a certain range.
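One way to see where the limit lies on a given machine is to check for overflow before each multiplication. The sketch below uses a hypothetical function checked_fact and the sentinel value -1, both chosen only for illustration.

```c
#include <limits.h>

/* Returns n!, or -1 if the result would not fit in an int on this
   platform. */
int checked_fact(int n)
{
    int f = 1;

    for (int j = 2; j <= n; j++) {
        if (f > INT_MAX / j)   /* f * j would exceed INT_MAX */
            return -1;
        f *= j;
    }
    return f;
}
```

With a 16-bit int this first fails at 8!, and with a 32-bit int at 13!, which is why the full 1<=n<=100 range needs an array-based representation.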
