Comparison between N data - c

I want to compare N values of type float. The comparison must be done with a tolerance.
That means: if the difference between two values (among the N) is less than or equal to the tolerance, then the two values are considered valid and I keep one of them; otherwise, if the difference is greater than the tolerance, the data is invalid.
Do you have any ideas?
Here is my code:
float mytab[N];
int i,j,index=0;
for (i = 0; i < N-1; i++)
{
for (j = i+1; j < N; j++)
{
if(tab[i].valid && tab[j].valid)
{
if ( ABS(tab[i]-tab[j])<= tolerance)
{
mytab[index] = tab[i];
index++;
}
}
}
}
// Afterwards I search for the minimum value of mytab, which contains the
// values that are valid within the tolerance.
Example:
tolerance = 0.15;
Data: 20.005, 20.017, 21.20, 21.25, 25.75, 25.9, 20.1
In this example, if we go by the tolerance alone, we can choose (20.005 OR 20.017 OR 20.1) OR (21.20 OR 21.25).
But if we go by majority voting, we choose the 20... group instead of the 21... group.

If I understand your basic question, you need to compare two floats. I think you are close with ABS ... but you need the floating-point version, fabs, declared in math.h (C99 also provides fabsf for float arguments).
#include <stdio.h>
#include <math.h>
int main (void)
{
float f1 = 1.00001;
float f2 = 1.00003;
float tol= 0.00010;
if (fabs(f1 - f2) <= tol) {
puts("Test1: f1 and f2 are equal-ish.");
} else {
puts("Test1: f1 and f2 are not equal-ish.");
}
tol= 0.0000001;
if (fabs(f1 - f2) <= tol) {
puts("Test2: f1 and f2 are equal-ish.");
} else {
puts("Test2: f1 and f2 are not equal-ish");
}
}
Testing
$ cc -g -Wall -O0 -std=c99 -pedantic -o Test test.c && ./Test
Test1: f1 and f2 are equal-ish.
Test2: f1 and f2 are not equal-ish

Please make your question clearer. Depending on the set of numbers, you can create multiple (different and non-intersecting) subsets that share this property.
If you intend to create the largest subset of values that are within the tolerance of every single value of the original set, then that subset is unique, but you're building it the wrong way. For each value in the set, you should check whether it is within the tolerance of every other value in the set. Only after checking it against every single number can you include it.
Like this:
float mytab[N];
int marker=1; //marker that tells whether the current number is outside the tolerance range of some other element (in which case marker is set to 0)
int i,j,index=0;
for (i = 0; i < N; i++)
{
marker=1; //for every new number, reset marker
for (j = 0; j < N; j++)
{
if(tab[i].valid && tab[j].valid)
{
if ( fabs(tab[i]-tab[j])> tolerance)
{
marker=0;
}
}
}
if(marker)
{
mytab[index]=tab[i]; index++; //marker will only be 1 if the number is within tolerance range of every element
}
}
Of course, this is very inefficient code. The largest differences will be between your candidate number and the smallest and largest numbers in your set. So what I would do is sort the list (or simply find the largest and the smallest number in the set) and compare each element to those two elements. If it is within the tolerance of both, it is within the tolerance of everything else. That means 2 comparisons for each number instead of n (or n/2 if you were a bit smarter than me in the code above, as you tried to be in the first place).
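The min/max idea can be sketched roughly as follows. This is only an illustration: the function name keep_within_tolerance_of_all is made up, it works on a plain float array, and it ignores the valid flag from the question's code.

```c
#include <stddef.h>

/*
 * Copies into out[] every value of in[0..n-1] that lies within `tol` of
 * both the smallest and the largest value, and returns how many were copied.
 * out[] must have room for n elements. (Hypothetical helper, plain floats.)
 */
size_t keep_within_tolerance_of_all(const float *in, size_t n, float tol, float *out)
{
    if (n == 0)
        return 0;

    /* One pass to find the extremes. */
    float lo = in[0], hi = in[0];
    for (size_t i = 1; i < n; i++) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }

    /* Two comparisons per element; lo <= in[i] <= hi, so no fabs is needed. */
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] - lo <= tol && hi - in[i] <= tol)
            out[kept++] = in[i];
    }
    return kept;
}
```

For the seven example values from the question and a tolerance of 0.15, nothing is kept, because the extremes are almost 6 apart; for 20.005, 20.017 and 20.1 alone, all three are kept.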

Related

Matchmaking program in C?

The problem I am given is the following:
Write a program to discover the answer to this puzzle:"Let's say men and women are paid equally (from the same uniform distribution). If women date randomly and marry the first man with a higher salary, what fraction of the population will get married?"
From this site
My issue is that it seems that the percent married figure I am getting is wrong. Another poster asked this same question on the programmers exchange before, and the percentage getting married should be ~68%. However, I am getting closer to 75% (with a lot of variance). If anyone can take a look and let me know where I went wrong, I would be very grateful.
I realize, looking at the other question that was on the programmers exchange, that this is not the most efficient way to solve the problem. However, I would like to solve the problem in this manner before using more efficient approaches.
My code is below, the bulk of the problem is "solved" in the test function:
#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define ARRAY_SIZE 100
#define MARRIED 1
#define SINGLE 0
#define MAX_SALARY 1000000
bool arrayContains(int* array, int val);
int test();
int main()
{
printf("Trial count: ");
int trials = GetInt();
int sum = 0;
for(int i = 0; i < trials; i++)
{
sum += test();
}
int average = (sum/trials) * 100;
printf("Approximately %d %% of the population will get married\n", average / ARRAY_SIZE);
}
int test()
{
srand(time(NULL));
int femArray[ARRAY_SIZE][2];
int maleArray[ARRAY_SIZE][2];
// load up random numbers
for (int i = 0; i < ARRAY_SIZE; i++)
{
femArray[i][0] = (rand() % MAX_SALARY);
femArray[i][1] = SINGLE;
maleArray[i][0] = (rand() % MAX_SALARY);
maleArray[i][1] = SINGLE;
}
srand(time(NULL));
int singleFemales = 0;
for (int k = 0; k < ARRAY_SIZE; k++)
{
int searches = 0; // count the unsuccessful matches
int checkedMates[ARRAY_SIZE] = {[0 ... ARRAY_SIZE - 1] = ARRAY_SIZE + 1};
while(true)
{
// ARRAY_SIZE - k is number of available people, subtract searches for people left
// checked all possible mates
if(((ARRAY_SIZE - k) - searches) == 0)
{
singleFemales++;
break;
}
int randMale = rand() % ARRAY_SIZE; // find a random male
while(arrayContains(checkedMates, randMale)) // ensure that the male was not checked earlier
{
randMale = rand() % ARRAY_SIZE;
}
checkedMates[searches] = randMale;
// male has a greater income and is single
if((femArray[k][0] < maleArray[randMale][0]) && (maleArray[randMale][1] == SINGLE))
{
femArray[k][1] = MARRIED;
maleArray[randMale][1] = MARRIED;
break;
}
else
{
searches++;
continue;
}
}
}
return ARRAY_SIZE - singleFemales;
}
bool arrayContains(int* array, int val)
{
for(int i = 0; i < ARRAY_SIZE; i++)
{
if (array[i] == val)
return true;
}
return false;
}
In the first place, there is some ambiguity in the problem as to what it means for the women to "date randomly". There are at least two plausible interpretations:
1. You cycle through the unmarried women, with each one randomly drawing one of the unmarried men and deciding, based on salary, whether to marry. On each pass through the available women, this probably results in some available men being dated by multiple women, and others being dated by none.
2. You divide each trial into rounds. In each round, you randomly shuffle the unmarried men among the unmarried women, so that each unmarried man dates exactly one unmarried woman.
In either case, you must repeat the matching until there are no more matches possible, which occurs when the maximum salary among eligible men is less than or equal to the minimum salary among eligible women.
In my tests, the two interpretations produced slightly different statistics: about 69.5% married using interpretation 1, and about 67.6% using interpretation 2. 100 trials of 100 potential couples each was enough to produce fairly low variance between runs, in the common (non-statistical) sense of the term. For example, the results from one set of 10 runs varied between 67.13% and 68.27%.
You appear not to take either of those interpretations, however. If I'm reading your code correctly, you go through the women exactly once, and for each one you keep drawing random men until either you find one that that woman can marry or you have tested every one. It should be clear that this yields a greater chance for women early in the list to be married, and that order-based bias will at minimum increase the variance of your results. I find it plausible that it also exerts a net bias toward more marriages, but I don't have a good argument in support.
Additionally, as I wrote in comments, you introduce some bias through the way you select random integers. The rand() function returns an int between 0 and RAND_MAX, inclusive, for RAND_MAX + 1 possible values. For the sake of argument, let's suppose those values are uniformly distributed over that range. If you use the % operator to shrink the range of the result to N possible values, then that result is still uniformly distributed only if N evenly divides RAND_MAX + 1, because otherwise more rand() results map to some values than map to others. In fact, this applies to any strictly mathematical transformation you might think of to narrow the range of the rand() results.
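To see the effect on a small scale, here is a sketch with a toy generator. The helper count_residues is hypothetical; source_max stands in for RAND_MAX, and the loop simply enumerates every value the generator could return instead of calling rand().

```c
/*
 * Counts how many of the source values 0..source_max land on each residue
 * when reduced with `% n`. counts[] must have room for n entries.
 * (Hypothetical helper; source_max plays the role of RAND_MAX.)
 */
void count_residues(int source_max, int n, int *counts)
{
    for (int i = 0; i < n; i++)
        counts[i] = 0;
    for (int r = 0; r <= source_max; r++)
        counts[r % n]++;
}
```

With source_max = 7 (eight possible values) and n = 3, residues 0 and 1 each receive three source values while residue 2 receives only two, so 2 would be drawn less often. With n = 4, which divides 8 evenly, every residue receives exactly two.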
For the salaries, I don't see why you even bother to map them to a restricted range. RAND_MAX is as good a maximum salary as any other; the statistics gleaned from the simulation don't depend on the range of salaries, but only on their uniform distribution.
For selecting random indices into your arrays, however, either for drawing men or for shuffling, you do need a restricted range, so you do need to take care. The best way to reduce bias in this case is to force the random numbers drawn to come from a range that is evenly divisible by the number of options by re-drawing as many times as necessary to ensure it:
/*
* Returns a random `int` in the half-open interval [0, upper_bound).
* upper_bound must be positive, and must not exceed RAND_MAX (otherwise
* rand_bound below is 0 and the loop never terminates).
*/
int random_draw(int upper_bound) {
/* integer division truncates the remainder: */
int rand_bound = (RAND_MAX / upper_bound) * upper_bound;
for (;;) {
int r = rand();
if (r < rand_bound) {
return r % upper_bound;
}
}
}
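For the shuffling variant (interpretation 2), such a draw function could be combined with a Fisher-Yates shuffle. The following is only a sketch: shuffle is a name introduced here, and random_draw is repeated so that the block compiles on its own.

```c
#include <stdlib.h>

/* Same rejection-sampling helper as above: uniform int in [0, upper_bound). */
static int random_draw(int upper_bound)
{
    int rand_bound = (RAND_MAX / upper_bound) * upper_bound;
    for (;;) {
        int r = rand();
        if (r < rand_bound)
            return r % upper_bound;
    }
}

/*
 * Fisher-Yates shuffle of items[0..n-1]. Each ordering is equally likely,
 * under the assumption that rand() itself is uniform.
 */
void shuffle(int *items, int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = random_draw(i + 1);   /* 0 <= j <= i */
        int tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }
}
```

Shuffling an array of indices of the unmarried men once per round and pairing position k with the k-th unmarried woman gives each man exactly one date per round.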

C Program Runs Surprisingly Slow

A simple program I wrote in C takes upwards of half an hour to run. I am surprised that C would take so long to run, because from what I can find on the internet C ( aside from C++ or Java ) is one of the faster languages.
// this is a program to find the first triangular number that is divisible by 500 factors
int main()
{
int a; // for triangular num loop
int b = 1; // limit for triangular num (1+2+3+......+b)
int c; // factor counter
int d; // divisor
int e = 1; // ends loop
long long int t = 0; // triangular number in use
while( e != 0 )
{
c = 0;
// create triangular number t
t = t + b;
b++;
// printf("%lld\n", t); // in case you want to see where it's at
// counts factors
for( d = 1 ; d != t ; d++ )
{
if( t % d == 0 )
{
c++;
}
}
// test to see if condition is met
if( c > 500 )
{
break;
}
}
printf("%lld is the first triangular number with more than 500 factors\n", t);
getchar();
return 0;
}
Granted the program runs through a lot of data, but none of it is ever saved, just tested and passed over.
I am using the Tiny C Compiler on Windows 8.
Is there a reason this runs so slowly? What would be a faster way of achieving the same result?
Thank you!
You're iterating over a ton of numbers you don't need to. By definition, a positive factor is any whole number that can be multiplied by another to obtain the desired product.
Ex: 12 = 1*12, 2*6, and 3*4
The order of multiplication is NOT considered when deciding factors. In other words,
Ex: 12 = 2*6 = 6*2
The order doesn't matter. 2 and 6 are factors once.
The square root, when it is a whole number, is the one factor that stands alone, because it is paired with itself. All others come in pairs, and I hope that is clear. Given that, you can significantly speed up your code by doing the following:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
// this is a program to find the first triangular number that is divisible by 500 factors
int main()
{
int c = 0; // factor counter
long long int b = 0; // limit for triangular num (1+2+3+......+b)
long long int d; // divisor
long long int t = 0; // triangular number in use
long long int r = 0; // root of current test number
while (c <= 500)
{
c = 0;
// next triangular number
t += ++b;
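// get the square root, rounded down.
r = floor(sqrt(t));
// counts factors
for( d = 1 ; d < r; ++d )
{
if( t % d == 0 )
c += 2; // add the factor *pair* (there are two)
}
// r pairs with t / r; they are the same factor only if t is a perfect square.
// (e.g. t = 6, r = 2: both 2 and 3 must be counted)
if (t % r == 0)
c += (r * r == t) ? 1 : 2;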
}
printf("%lld is the first triangular number with more than 500 factors\n", t);
return 0;
}
Running this on IDEOne.com takes less than two seconds to come up with the following:
Output
76576500 is the first triangular number with more than 500 factors
I hope this helps. (and I think that is the correct answer). There are certainly more efficient ways of doing this (see here for some spoilers if you're interested), but going with your code idea and seeing how far we could take it was the goal of this answer.
Finally, this finds the first number with MORE than 500 factors (i.e. 501 or more) as per your output message. Your comment at the top of the file indicates you're looking for the first number with 500-or-more, which does not match up with your output message.
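As a variation on the same pairing idea, the divisor count can also be written with integer arithmetic only, which avoids relying on sqrt for large values. This is a sketch, and the function name count_divisors is introduced here for illustration.

```c
/*
 * Returns the number of positive divisors of t (t >= 1), testing candidates
 * only up to the square root. For positive integers the condition
 * d <= t / d is equivalent to d * d <= t, but it cannot overflow.
 */
int count_divisors(long long t)
{
    int count = 0;
    for (long long d = 1; d <= t / d; d++) {
        if (t % d == 0) {
            /* d pairs with t / d; a perfect square's root pairs with itself. */
            count += (d == t / d) ? 1 : 2;
        }
    }
    return count;
}
```

For 12 this counts the pairs (1, 12), (2, 6) and (3, 4), giving 6; for 36 the root 6 is counted once, giving 9.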
Without any math analysis:
...
do
{
c = 0;
t += b;
b++;
for (d = 1; d < t; ++d)
{
if (!(t % d))
{
c++;
}
}
} while (c <= 500);
...
You are implementing an O(n^2) algorithm. It would be surprising if the code took less than a half an hour.
Refer to your computer science textbook for a better method compared to this brute force method of: check 1, 1 + 2, 1 + 2 + 3, etc.
You might be able to shorten the inner for loop. Does it really need to check all the way up to t for factors that divide the triangular number? For example, can 10 be evenly divided by any number greater than 5? Or 100 by any number greater than 50?
Thus, given a number N, what is the largest number that can evenly divide N?
Keep reading/researching this problem.
Also, as other people have mentioned, the outer loop could be simply coded as:
while (1)
{
// etc.
}
So there is no need to declare e or a. Note that this doesn't affect the running time, but your coding style indicates you are still learning, and thus a reviewer would question everything your code does!
You are doing some unnecessary operations; those instructions are not required at all if the check is written more simply.
First:
while(e!=0)
Since you declared e=1 and never update its value anywhere, putting a plain 1 in the loop condition works just as well.
Change that and check whether it still works.
One of the beautiful things about triangle numbers, is that if you have a triangle number, with a simple addition operation, you can have the next one.

How to Approximate e in an Infinite Series in C

So I am trying to do this problem:
However, I'm not entirely sure where to start or what exactly I am looking for.
In addition, I was told I should expect to give the program inputs such as: zero (0), very small (0.00001), and not so small (0.1).
I was given this: http://en.wikipedia.org/wiki/E_%28mathematical_constant%29 as a reference, but that formula doesn't look exactly like the one in the problem.
And finally, I was told that the input to the program is a small number Epsilon. You may assume 0.00001f, for example.
You keep adding the infinite series until the current term's value is below the Epsilon.
But all in all, I have no clue what that means. I somewhat understand the equation on the wiki. However, I'm not sure where to start with the problem given. Looking at it, does anyone know what kind of formula I should be looking to use in C and what "E" is and where it comes into play here (i.e. within the formula, I understand it's suppose to be the user input).
Code So Far
#include <stdio.h>
#include <math.h>
//Program that takes in multiple dates and determines the earliest one
int main(void)
{
float e = 0;
float s = 0;
float ct = 1;
float ot= 1;
int n = 0;
float i = 0;
float den = 0;
int count = 0;
printf("Enter a value for E: ");
scanf("%f", &e);
printf("The value of e is: %f", e);
for(n = 0; ct > e; n++)
{
count++;
printf("The value of the current term is: %f", ct);
printf("In here %d\n", count);
den = 0;
for(i = n; i > 0; i--)
{
den *= i;
}
//If the old term is one (meaning the very first term), then just set that to the current term
if (ot= 1)
{
ct = ot - (1.0/den);
}
//If n is even, add the term as per the rules of the formula
else if (n%2 == 0)
{
ct = ot + (1.0/den);
ot = ct;
}
//Else if n is odd, subtract the term as per the rules of the formula
else
{
ct = ot - (1.0/den);
ot = ct;
}
//If the current term becomes less than epsilon (the user input), printout the value and break from the loop
if (ct < epsilon)
{
printf("%f is less than %f",ct ,e);
break;
}
}
return 0;
}
Current Output
Enter a value for E: .00001
The value of e is: 0.000010
The value of the current term is: 1.000000
In here 1
-1.#INF00 is less than 0.000010
So based on everyone's comments, and using the 4th "Derangements" equation from wikipedia like I was told, this is the code I've come up with. The logic in my head seems to be in line with what everyone has been saying. But the output is not at all what I am trying to achieve. Does anyone have any idea from looking at this code what I might be doing wrong?
Σ represents a sum, so your equation means to compute the sum of the terms starting at n=0 and going towards infinity:
The notation n! means "factorial", which is the product of the numbers one through n (with 0! defined as 1).
Each iteration brings the computed sum closer to the actual value. ε is an error tolerance: you stop once the sum changes by less than ε from one iteration to the next.
To start computing an iteration you need some starting conditions:
unsigned int n = 0; // Iteration. Start with n=0;
double fact = 1; // 0! = 1. Keep running product of iteration numbers for factorial.
double sum = 0; // Starting summation. Keep a running sum of terms.
double last; // Sum of previous iteration for computing e
double e; // epsilon value for deciding when done.
Then the algorithm is straightforward:
Store the previous sum.
Compute the next sum.
Update n and compute the next factorial.
Check if the difference in the new vs. old iteration exceeds epsilon.
The code:
do {
last = sum;
sum += 1/fact;
fact *= ++n;
} while(sum-last > e); // strict '>' so that an epsilon of 0 still terminates
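Put together as a function, the loop might look like the sketch below. It assumes the series in question is 1/0! + 1/1! + 1/2! + ..., which is what the loop above sums; the name approximate_e is made up here. The comparison is strict, so an epsilon of 0 stops once adding a term no longer changes the sum.

```c
/*
 * Sums 1/0! + 1/1! + 1/2! + ... and stops after the first term that
 * changes the sum by no more than epsilon. Returns the running sum.
 * (Hypothetical wrapper around the loop shown above.)
 */
double approximate_e(double epsilon)
{
    unsigned int n = 0;  /* index of the term about to be added */
    double fact = 1.0;   /* n!, starting from 0! = 1 */
    double sum = 0.0;    /* running sum of terms */
    double last;         /* sum before the current term was added */

    do {
        last = sum;
        sum += 1.0 / fact;
        fact *= ++n;     /* advance to the next factorial */
    } while (sum - last > epsilon);

    return sum;
}
```

Trying it with 0.1, 0.00001 and 0 shows how a smaller epsilon pulls the result closer to 2.71828...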
You need to write a beginning C program. There are lots of sources on the interwebs for that, including how to get user input from the argc and argv variables. It looks like you are to use 0.00001f for epsilon if it is not entered. (Use that to get the program working before trying to get it to accept input.)
For computing the series, you will use a loop and some variables: sum, current_term, and n. In each loop iteration, compute the current_term using n, increment n, check if the current term is less than epsilon, and if not add the current_term to the sum.
The big pitfall to avoid here is computing integer division by mistake. For example, you will want to avoid expressions like 1/n. If you are going to use such an expression, use 1.0/n instead.
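The difference is easy to demonstrate. In this sketch the two function names are invented for illustration; only the literal 1 versus 1.0 differs between them.

```c
/* Integer division: both operands are int, so the fraction is discarded
 * before the result is converted to double. */
double reciprocal_truncated(int n)
{
    return 1 / n;      /* for n = 4 this evaluates to 0 */
}

/* Floating-point division: 1.0 is a double, so n is converted first. */
double reciprocal(int n)
{
    return 1.0 / n;    /* for n = 4 this evaluates to 0.25 */
}
```

Every term after the second one in the series would silently become 0 with the first form.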
In fact, this program is very similar to the ones given in the Deitel book on learning to program in C. To the point: the error can't be 0, because e is an irrational number and so can't be calculated exactly. Here is some code that may be useful to you.
#include <stdio.h>
/* Function Prototypes*/
long double eulerCalculator( float error, signed long int *iterations );
signed long int factorial( signed long int j ); /* must match the definition below */
/* The main body of the program */
int main( void )
{
/*Variable declaration*/
float error;
signed long int iterations = 1;
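printf( "Max Epsilon admitted: " );
scanf( "%f", &error );
/* %Lf is the conversion for long double */
printf( "\n The Euler calculated is: %Lf\n", eulerCalculator( error, &iterations ) );
/* iterations was incremented once past the last term that was added */
printf( "\n The last calculated fraction is: %f\n", 1.0 / factorial( iterations - 1 ) );
return 0;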
}
long double eulerCalculator( float error, signed long int *iterations )
{
/* We declare the variables*/
long double n, ecalc;
/* We initialize result and e constant*/
ecalc = 1;
/* Keep looping while the last calculated term is larger than the error */
do {
n = ( ( long double ) ( 1.0 / factorial( *iterations ) ) );
ecalc += n;
++*iterations;
} while ( error < n );
return ecalc;
}
signed long int factorial( signed long int j )
{
signed long int b = j - 1;
for (; b > 1; b--){
j *= b;
}
return j;
}
That summation symbol gives you a clue: you need a loop.
What's 0!? 1, of course. So your starting value for e is 1.
Next you'll write a loop for n from 1 to some larger value (infinity might suggest a while loop) where you calculate each successive term, see if its size exceeds your epsilon, and add it to the sum for e.
When your terms get smaller than your epsilon, stop the loop.
Don't worry about user input for now. Get your function working. Hard code an epsilon and see what happens when you change it. Leave the input for the last bit.
You'll need a good factorial function. (Not true - thanks to Mat for reminding me.)
Did you ask where the constant e comes from? And the series? The series is the Taylor series expansion of the exponential function; see any intro calculus text. And the constant e is simply the exponential function evaluated at exponent 1.
I've got a nice Java version working here, but I'm going to refrain from posting it. It looks just like the C function will, so I don't want to give it away.
UPDATE: Since you've shown yours, I'll show you mine:
package cruft;
/**
* MathConstant uses infinite series to calculate constants (e.g. Euler)
* @author Michael
* @link
* @since 10/7/12 12:24 PM
*/
public class MathConstant {
public static void main(String[] args) {
double epsilon = 1.0e-25;
System.out.println(String.format("e = %40.35f", e(epsilon)));
}
// value should be 2.71828182845904523536028747135266249775724709369995
// e = 2.718281828459045
public static double e(double epsilon) {
double euler = 1.0;
double term = 1.0;
int n = 1;
while (term > epsilon) {
term /= n++;
euler += term;
}
return euler;
}
}
But if you ever need a factorial function I'd recommend a table, memoization, and the gamma function over the naive student implementation. Google for those if you don't know what those are. Good luck.
Write a MAIN function and a FUNCTION to compute the approximate sum of the below series.
(n!)/(2n+1)! (from n=1 to infinity)
Within the MAIN function:
Read a variable EPSILON of type DOUBLE (desired accuracy) from
the standard input.
EPSILON is an extremely small positive number which is less than or equal to
10^(-6).
EPSILON value will be passed to the FUNCTION as an argument.
Within the FUNCTION:
In a do-while loop:
Continue adding up the terms until |Sn+1 - Sn| < EPSILON.
Sn is the sum of the first n-terms.
Sn+1 is the sum of the first (n+1)-terms.
When the desired accuracy EPSILON is reached print the SUM and the number
of TERMS added to the sum.
TEST the program with different EPSILON values (from 10^(-6) to 10^(-12))
one at a time.
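One possible shape for the FUNCTION part is sketched below. The name series_sum and the out-parameter terms_added are inventions of this sketch (the assignment asks for the values to be printed, which the caller can do). It relies on the ratio between consecutive terms, (n+1)!/(2n+3)! divided by n!/(2n+1)! = 1/(2(2n+3)), so no factorial is ever computed directly.

```c
/*
 * Approximates the sum over n >= 1 of n! / (2n+1)!, adding terms until the
 * term just added, which equals |S(n+1) - S(n)|, is smaller than epsilon.
 * Stores the number of terms added in *terms_added.
 * (Hypothetical interface; the assignment itself only asks for printing.)
 */
double series_sum(double epsilon, int *terms_added)
{
    double term = 1.0 / 6.0;  /* n = 1: 1! / 3! */
    double sum = 0.0;
    double added;
    int n = 1;
    int count = 0;

    do {
        added = term;
        sum += added;
        count++;
        /* next term: term(n+1) = term(n) / (2 * (2n + 3)) */
        term = added / (2.0 * (2.0 * n + 3.0));
        n++;
    } while (added >= epsilon);

    *terms_added = count;
    return sum;
}
```

The MAIN function would read EPSILON with scanf("%lf", ...), call the function, and print the returned sum together with the term count.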

GNU Scientific Library probability distribution functions in C

I have a set of GSL Histograms, which are used to make a set of probability distribution functions, which according to the documentation are stored in a struct, as follows:
Data Type: gsl_histogram_pdf
size_t n
This is the number of bins used to approximate the probability distribution function.
double * range
The ranges of the bins are stored in an array of n+1 elements pointed to by range.
double * sum
The cumulative probability for the bins is stored in an array of n elements pointed to by sum.
I intend to use a KS test to determine whether the data are similar or not. To do that, I am trying to access the sum of a given bin in this structure in order to calculate the 'distance', and I assumed that I should be able to access that value by using:
((my_type)->pdf->sum+x)
with x being the bin number.
Yet this always returns 0 no matter what I do. Does anyone have any idea what is going wrong?
Thanks in advance
---- EDIT ----
Here is a snippet of my code that deals with the pdf / histogram:
/* GSL Histogram creation */
for (i = 0; i < chrom->hits; i++) {
if ( (chrom+i)->spectra->peaks != 0 ) {
(chrom+i)->hist = gsl_histogram_alloc(bins);
gsl_histogram_set_ranges_uniform((chrom+i)->hist, low_mz, high_mz);
for (j = 0; j < (chrom+i)->spectra->peaks; j++) {
gsl_histogram_increment( (chrom+i)->hist, ((chrom+i)->spectra+j)->mz_value);
}
} else {
printf("0 value encountered!\n");
}
}
/* Histogram probability distribution function creation */
for (i = 0; i < chrom->hits; i++) {
if ( (chrom+i)->spectra->peaks != 0 ) {
(chrom+i)->pdf = gsl_histogram_pdf_alloc(bins);
gsl_histogram_pdf_init( (chrom+i)->pdf, (chrom+i)->hist);
} else {
continue;
}
}
/* Kolmogorov-Smirnov */
float D;
for (i = 0; i < chrom->hits-1; i++) {
printf("%f\n",((chrom+i)->pdf->sum+25));
for (j = i+1; j < chrom->hits; j++) {
D = 0;
diff = 0;
/* Determine max distance */
}
}
You compute a pointer to the value you intend to access, but you never dereference it.
Change your current pointer computation
printf("%f\n",((chrom+i)->pdf->sum+25));
either to a normal array subscript
printf("%f\n",(chrom+i)->pdf->sum[25]);
or to a pointer computation followed by a dereferencing
printf("%f\n",*((chrom+i)->pdf->sum+25));
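See whether that fixes your issue. Passing a pointer where %f expects a double is undefined behavior, so the value shouldn't necessarily be 0 either; it might well get displayed as 0, though, for instance because the bits that printf interprets happen to represent a very small floating-point number, depending on the virtual memory layout.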
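The three forms can be compared side by side without GSL. In this sketch, struct pdf_like is a made-up stand-in that only mirrors the n and sum fields quoted in the question; it is not the real gsl_histogram_pdf type, and the function names are invented.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two gsl_histogram_pdf fields used here. */
struct pdf_like {
    size_t n;       /* number of bins */
    double *sum;    /* cumulative probability per bin */
};

/* Array subscript: reads the value stored for bin x. */
double bin_sum_subscript(const struct pdf_like *p, size_t x)
{
    return p->sum[x];
}

/* Pointer arithmetic followed by a dereference: the same value. */
double bin_sum_deref(const struct pdf_like *p, size_t x)
{
    return *(p->sum + x);
}

/* Pointer arithmetic alone: the address of bin x, not its value. */
const double *bin_sum_address(const struct pdf_like *p, size_t x)
{
    return p->sum + x;
}
```

Only the first two return something that is meaningful to print with %f; the third returns what the original expression produced.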

logsumexp implementation in C?

Does anybody know of an open source numerical C library that provides the logsumexp-function?
The logsumexp(a) function computes the logarithm of the sum of exponentials, log(e^{a_1} + ... + e^{a_n}), of the components of the array a, avoiding numerical overflow.
Here's a very simple implementation from scratch (tested, at least minimally):
#include <math.h>   /* exp, log */
#include <stddef.h> /* size_t */
double logsumexp(double nums[], size_t ct) {
double max_exp = nums[0], sum = 0.0;
size_t i;
for (i = 1 ; i < ct ; i++)
if (nums[i] > max_exp)
max_exp = nums[i];
for (i = 0; i < ct ; i++)
sum += exp(nums[i] - max_exp);
return log(sum) + max_exp;
}
This uses the trick of effectively dividing every exponential by the largest one (by subtracting the largest argument before exponentiating), then adding its log back in at the end to avoid overflow. It is well-behaved for adding a large number of similarly-scaled values, with errors creeping in if some arguments are many orders of magnitude larger than others.
If you want it to run without crashing when given 0 arguments, you'll have to add a case for that :)

Resources