Uniform Distribution: Bug or Paradox?

Imagine 10 cars randomly, uniformly distributed on a round track of length 1. If the positions are represented by a C double in the range [0,1), then they can be sorted, and the gap between two cars should be the position of the car in front minus the position of the car behind. The last gap needs 1 added to it to account for the discontinuity.
In the program output, the last column has very different statistics and distribution from the others. The rows correctly add to 1. What's going on?
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int compare(const void *a, const void *b)
{
    if (*(double *)a > *(double *)b) return 1;
    else if (*(double *)a < *(double *)b) return -1;
    else return 0;
}

double grand_f_0_1()
{
    static FILE *fp = NULL;
    uint64_t bits;
    if (fp == NULL) fp = fopen("/dev/urandom", "r");
    fread(&bits, sizeof(bits), 1, fp);
    return (double)bits * 5.421010862427522170037264004349e-020; // https://stackoverflow.com/a/26867455
}

int main()
{
    const int n = 10;
    double values[n];
    double diffs[n];
    int i, j;

    for (j = 0; j < 10000; j++) {
        for (i = 0; i < n; i++) values[i] = grand_f_0_1();
        qsort(values, n, sizeof(double), compare);
        for (i = 0; i < (n - 1); i++) diffs[i] = values[i + 1] - values[i];
        diffs[n - 1] = 1. + values[0] - values[n - 1];
        for (i = 0; i < n; i++) printf("%.5f%s", diffs[i], i < (n - 1) ? "\t" : "\n");
    }
    return 0;
}
Here is a sample of the output. The first column represents the gap between the first and second car. The last column represents the gap between the 10th car and the first car, across the start/finish line. Large numbers like .33 and .51 are much more common in the last column, and very small numbers are relatively rare.
0.13906 0.14241 0.24139 0.29450 0.01387 0.07906 0.02905 0.03160 0.00945 0.01962
0.01826 0.36875 0.04377 0.05016 0.05939 0.02388 0.10363 0.04640 0.03538 0.25037
0.04496 0.05036 0.00536 0.03645 0.13741 0.00538 0.24632 0.04452 0.07750 0.35176
0.00271 0.15540 0.03399 0.05654 0.00815 0.01700 0.24275 0.25494 0.00206 0.22647
0.34420 0.03226 0.01573 0.08597 0.05616 0.00450 0.05940 0.09492 0.05545 0.25141
0.18968 0.34749 0.07375 0.01481 0.01027 0.00669 0.04306 0.00279 0.08349 0.22796
0.16135 0.02824 0.07965 0.11255 0.05570 0.05550 0.05575 0.05586 0.07156 0.32385
0.12799 0.18870 0.04153 0.16590 0.02079 0.06612 0.08455 0.14696 0.13088 0.02659
0.00810 0.06335 0.13014 0.06803 0.01878 0.10119 0.00199 0.06656 0.20922 0.33263
0.00715 0.03261 0.05779 0.47221 0.13998 0.11044 0.06397 0.00238 0.04157 0.07190
0.33703 0.02945 0.06164 0.01555 0.03444 0.14547 0.02342 0.03804 0.16088 0.15407
0.10912 0.14419 0.04340 0.09204 0.23033 0.09240 0.14530 0.00960 0.03412 0.09950
0.20165 0.09222 0.04268 0.17820 0.19159 0.02074 0.05634 0.00237 0.09559 0.11863
0.09296 0.01148 0.20442 0.07070 0.05221 0.04591 0.08455 0.25799 0.01417 0.16561
0.08846 0.07075 0.03732 0.11721 0.03095 0.24329 0.06630 0.06655 0.08060 0.19857
0.06225 0.10971 0.10978 0.01369 0.13479 0.17539 0.17540 0.02690 0.00464 0.18744
0.09431 0.10851 0.05079 0.07846 0.00162 0.00463 0.06533 0.18752 0.30896 0.09986
0.23214 0.11937 0.10215 0.04040 0.02876 0.00979 0.02443 0.21859 0.15627 0.06811
0.04522 0.07920 0.02432 0.01949 0.03837 0.10967 0.11123 0.01490 0.03846 0.51915
0.13486 0.02961 0.00818 0.11947 0.17204 0.08967 0.09767 0.03349 0.08077 0.23426

Your code is OK. The mean value of the last difference is twice as large as the others.
The paradox comes from the fact that rather than selecting 10 points on a unit interval, one actually divides it into 11 sub-intervals with 10 cuts. Therefore the expected length of each sub-interval is 1/11.
The difference between consecutive points averages 1/11, except for the last pair, because that gap combines the last sub-interval (between the last point and 1) and the first one (between 0 and the first point).
Thus the mean of the last difference is 2/11.
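A quick way to check this numerically is to average each column of differences over many trials: the first nine columns should come out near 1/11 ≈ 0.0909 and the wrap-around column near 2/11 ≈ 0.1818. A minimal sketch of such a check (it uses rand() instead of the /dev/urandom generator purely for brevity; the choice of generator does not matter for the effect):
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    enum { N = 10, TRIALS = 100000 };
    double v[N], mean[N] = { 0 };

    for (int t = 0; t < TRIALS; t++) {
        for (int i = 0; i < N; i++) v[i] = (double)rand() / ((double)RAND_MAX + 1.0);
        qsort(v, N, sizeof v[0], cmp);
        for (int i = 0; i < N - 1; i++) mean[i] += v[i + 1] - v[i];  /* sorted gaps */
        mean[N - 1] += 1.0 + v[0] - v[N - 1];                        /* wrap-around gap */
    }
    for (int i = 0; i < N; i++)
        printf("column %d mean: %.4f\n", i, mean[i] / TRIALS);
    /* Columns 0..8 come out near 1/11 = 0.0909, column 9 near 2/11 = 0.1818. */
    return 0;
}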

"There is no special points on the circle"
The point is that, on a circle, every car looks the same, so there is no need to measure from zero: you can just measure from the first car. This means that you can fix the first car at zero and treat the random positions of the other cars as measured from it.
So the convenient way to think about it is to fix the first car at zero and treat the 9 numbers you still generate as positions relative to that first car.
Hope it's a satisfying answer :-)
IDENTITY (or Which diff is first?)
If 10 cars with labels ("1","2" and so on) are placed randomly on a circle, the difference from the "1" to the next will average 1/10.
While sorting, the first diff "loses its identity": what it refers to changes. It is similar to choosing the 1st diff to be the longest one; that diff would then average more than 1/10. Choosing it based on how the cars relate to zero skews (or, in nicer terms, changes) things in a similar manner.
The first difference (and the 2nd, 3rd, etc.) simply becomes something different. Defining it as the difference from a given car is more intuitive, and it lets you use that car as a reference (which plays nicely with the circle's symmetry); the distribution of the remaining cars with respect to it is uniform. The smallest of a set of random points is not such a simple object.
Summary: define what you're calculating, know your definitions, and remember that probability is non-intuitive.

After 3 months of puzzling over this, I have an explanation that is intuitive, at least to me. It builds on the answers provided by @wojand and @tstanisl.
My original code is correct: it uniformly distributes points on the interval, and the forward differences of all points have the same statistical distribution. The paradox is that the forward difference of the highest-value point, the one that crosses the 0-1 discontinuity, is on average twice as large as the others, and its distribution has a different shape.
The reason this forward difference has a different distribution is that it contains the value 0. Larger forward differences (gaps) are more likely to contain any fixed value, simply because they are larger.
We could search for the gap that contains 1/pi, for example, and it too would have the same atypical distribution.
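To illustrate that last point, here is a small sketch (my own check, not part of the original program) that tracks, on each trial, the length of the circular gap containing a fixed target such as 1/pi; its average also comes out near 2/11 rather than 1/11:
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    enum { N = 10, TRIALS = 100000 };
    const double target = 0.3183098861837907;   /* about 1/pi; any fixed point in (0,1) works */
    double v[N], sum = 0.0;

    for (int t = 0; t < TRIALS; t++) {
        for (int i = 0; i < N; i++) v[i] = (double)rand() / ((double)RAND_MAX + 1.0);
        qsort(v, N, sizeof v[0], cmp);

        /* Find the circular gap that contains the target; by default it is the wrap-around gap. */
        double gap = 1.0 + v[0] - v[N - 1];
        for (int i = 0; i < N - 1; i++)
            if (v[i] <= target && target < v[i + 1]) { gap = v[i + 1] - v[i]; break; }
        sum += gap;
    }
    printf("mean gap containing the target: %.4f (2/11 = %.4f)\n", sum / TRIALS, 2.0 / 11.0);
    return 0;
}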

Related

Blackjack Probabilities

I'm currently working on a C coding question. It's a game called Blackjack, and here is the original question:
In practice, one needs to play the game a large number of times to get an accurate expected
value. Thus, each row of the table should be the results of at least 100,000 experiments. For example, for a particular target point value, say 10 points, two cards are drawn first. If the sum of these two cards exceeds 10 points, then this experiment is a failure. If the sum is exactly 10 points, then it is a success. If it is less than 10 points, then another card is drawn. In case of neither a failure (more than 10 points) nor a success (exactly 10 points), cards are continuously drawn until a conclusive result is obtained. After 100,000 experiments, the probability of getting 10 points should be printed together with the average number of cards needed to get 10 points (the third column of the table).
Below is my current code:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int r1, r2, count, sum, cardsadd, k;
    int aftersum = sum + k;
    unsigned int total, cardsum;
    float percent, cards;

    printf("Points Probability #Cards\n");
    for (int points = 4; points <= 21; points++) {
        count = 0;
        total = 0;
        cardsum = 0;
        do {
            r1 = rand() % 13 + 1;
            r2 = rand() % 13 + 1;
            if (r1 > 10) r1 = 10;
            if (r2 > 10) r2 = 10;
            sum = r1 + r2;
            if (r1 == 1 && r2 == 1) sum = 12;
            else if ((r1 == 1 || r2 == 1) && r1 != r2) sum += 10;
            count++;
            cardsadd = 0;
            if (sum == points) {
                total++;
                cardsum += 2;
            }
            else if (sum < points) {
                while (sum < points) {
                    do {
                        cardsadd += 1;
                        k = rand() % 13 + 1;
                        if (k > 10) k = 10;
                        else if (k == 1) {
                            if (sum <= 10) k = 11;
                        }
                    } while (aftersum > points);
                    sum += k;
                }
                total += 1;
                cardsum += aftersum;
            }
        } while (count < 100000);
        percent = (float)total / 1000;
        cards = (float)cardsum / 100000;
        printf(" %2d %5.2lf%% ", points, percent);
        printf("%.2lf\n", cards);
    }
    return 0;
}
In my code, the variable count is the number of experiments run for each target value (4 to 21), and total is the number of times the sum of the cards ends up exactly equal to the points we want at the start (the for loop). cardsum is the total number of cards used across the 100,000 tests, and cardsadd is used when the first two cards drawn are less than the target: we keep drawing until the sum equals the target points.
I don't have the correct answer yet, but I know my code is surely wrong, as I can clearly see that the average number of cards needed to get 4 points is not 2.00.
I hope someone can tell me how I should correct my code to get the answer. If anything is not clearly explained, I will give a more complete explanation of those parts. Thanks for helping.
With an ace you have 2 possible scores (the soft and the hard);
You cannot compare "points" with only one score when you have an ace, because, for example, with an ace and a 5 you can have 6 or 16;
You need to modify your program to take both scores into consideration (in the case of an ace);
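One common way to structure that, sketched below under my own assumptions about the assignment: count every ace as 1 in a running "hard" total, remember how many aces are in the hand, and treat hard + 10 as the "soft" total whenever an ace is present. A hand then hits the target if either total equals it, and busts only when the hard total exceeds it.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Draw one card: 1 = ace, 2..10 face value, J/Q/K count as 10. */
static int draw(void)
{
    int c = rand() % 13 + 1;
    return c > 10 ? 10 : c;
}

int main(void)
{
    srand((unsigned)time(NULL));
    printf("Points Probability #Cards\n");
    for (int points = 4; points <= 21; points++) {
        long hits = 0, hitCards = 0;
        for (int trial = 0; trial < 100000; trial++) {
            int hard = 0, aces = 0, cards;
            for (cards = 0; cards < 2; cards++) {      /* initial two cards */
                int c = draw();
                hard += c;
                if (c == 1) aces++;
            }
            for (;;) {
                /* Success if the hard total or the soft total (one ace counted as 11) hits the target. */
                if (hard == points || (aces > 0 && hard + 10 == points)) {
                    hits++;
                    hitCards += cards;
                    break;
                }
                if (hard > points) break;              /* bust: even with every ace as 1 we overshot */
                int c = draw();                        /* otherwise draw another card */
                cards++;
                hard += c;
                if (c == 1) aces++;
            }
        }
        printf(" %2d    %6.2f%%   %.2f\n", points, 100.0 * hits / 100000,
               hits ? (double)hitCards / hits : 0.0);
    }
    return 0;
}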

How to solve a runtime error happening when I use a large static array

My development environment: Visual Studio
I have to create an input file and print random numbers from 1 to 500000, without duplicates, to the file. First, I considered that if I use a big size of local array, problems related to heap may happen. So, I tried to declare it as a static array. Then, in the main function, I put random numbers without duplicates into the array and wrote the numbers to the input file by accessing the array elements. However, runtime errors (the continuous blinking of the cursor in the console window) continue to occur.
The source code is as follows.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 500000
int sort[500000];

int main()
{
    FILE* input = NULL;
    input = fopen("input.txt", "w");
    if (sort != NULL)
    {
        srand((unsigned)time(NULL));
        for (int i = 0; i < SIZE; i++)
        {
            sort[i] = (rand() % SIZE) + 1;
            for (int j = 0; j < i; j++)
            {
                if (sort[i] == sort[j])
                {
                    i--;
                    break;
                }
            }
        }
        for (int i = 0; i < SIZE; i++)
        {
            fprintf(input, "%d ", sort[i]);
        }
        fclose(input);
    }
    return 0;
}
When I tried to reduce the array size from 1 to 5000, it has been implemented. So, Carefully, I think it's a memory out phenomenon. Finally, I'd appreciate it if you could comment on how to solve this problem.
“First, I considered that if I use a big size of local array, problems related to heap may happen.”
That does not make any sense. Automatic local objects generally come from the stack, not the heap. (Also, “heap” is the wrong word; a heap is a particular kind of data structure, but the malloc family of routines may use other data structures for managing memory. This can be referred to simply as dynamically allocated memory or allocated memory.)
However, runtime errors (the continuous blinking of the cursor in the console window)…
Continuous blinking of the cursor is normal operation, not a run-time error. Perhaps you are trying to say your program continues executing without ever stopping.
#define SIZE 500000
...
sort[i] = (rand() % SIZE) + 1;
The C standard only requires rand to generate numbers from 0 to 32767. Some implementations may provide more. However, if your implementation does not generate numbers up to 499,999, then it will never generate the numbers required to fill the array using this method.
Also, using % to reduce the rand result skews the distribution. For example, if we were reducing modulo 30,000, and rand generated numbers from 0 to 44,999, then rand() % 30000 would generate the numbers from 0 to 14,999 each two times out of every 45,000 and the numbers from 15,000 to 29,999 each one time out of every 45,000.
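If you do need uniform values over a range wider than RAND_MAX, one common workaround (sketched here under the assumption that RAND_MAX is at least 32767, i.e. rand() yields at least 15 random bits per call) is to combine two rand() calls into a wider value and reject results from the final partial block so the modulo does not skew anything:
#include <stdlib.h>

/* Uniform value in [0, n) for n up to 2^30: glue two 15-bit rand() results
   together and reject raw values in the last, incomplete block of size n. */
static unsigned rand_uniform(unsigned n)
{
    const unsigned span = 1u << 30;          /* number of distinct raw values */
    unsigned limit = span - span % n;        /* largest multiple of n not above span */
    unsigned r;
    do {
        r = ((unsigned)(rand() & 0x7FFF) << 15) | (unsigned)(rand() & 0x7FFF);
    } while (r >= limit);
    return r % n;
}
With such a helper, the line above would become sort[i] = rand_uniform(SIZE) + 1; (though the quadratic duplicate check remains the real problem, as described next).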
for (int j = 0; j < i; j++)
So this algorithm attempts to find new numbers by rejecting those that duplicate previous numbers. When working on the last of n numbers, the average number of tries is n, if the selection of random numbers is uniform. When working on the second-to-last number, the average is n/2. When working on the third-to-last, the average is n/3. So the average number of tries for all the numbers is n + n/2 + n/3 + n/4 + n/5 + … + 1.
For 5000 elements, this sum is around 45,472.5. For 500,000 elements, it is around 6,849,790. So your program will average around 150 times as many tries with 500,000 elements as with 5,000. However, each try also takes longer: For the first try, you check against zero prior elements for duplicates. For the second, you check against one prior element. For try n, you check against n−1 elements. So, for the last of 500,000 elements, you check against 499,999 elements, and, on average, you have to repeat this 500,000 times. So the last try takes around 500,000•499,999 = 249,999,500,000 units of work.
Refining this estimate, for each selection i, a successful attempt that gets completely through the loop of checking requires checking against all i−1 prior numbers. An unsuccessful attempt will average going halfway through the prior numbers. So, for selection i, there is one successful check of i−1 numbers and, on average, n/(n+1−i) unsuccessful checks of an average of (i−1)/2 numbers.
For 5,000 numbers, the average number of checks will be around 107,455,347. For 500,000 numbers, the average will be around 1,649,951,055,183. Thus, your program with 500,000 numbers takes more than 15,000 times as long as with 5,000 numbers.
When I tried to reduce the array size from 1 to 5000, it has been implemented.
I think you mean that with an array size of 5,000, the program completes execution in a short amount of time?
So, Carefully, I think it's a memory out phenomenon.
No, there is no memory issue here. Modern general-purpose computer systems easily handle static arrays of 500,000 int.
Finally, I'd appreciate it if you could comment on how to solve this problem.
Use a Fisher-Yates shuffle: Fill the array A with the integers from 1 to SIZE. Keep a counter, say d, of the number of selections completed so far, initially zero. Then pick a random index r from d to SIZE-1 (one of the SIZE-d positions not yet selected) and swap A[r] with A[d]. Then increment d. Repeat until d reaches SIZE-1.
This will swap a random element of the initial array into A[0], then a random element from those remaining into A[1], then a random element from those remaining into A[2], and so on. (We stop when d reaches SIZE-1 rather than when it reaches SIZE because, once d reaches SIZE-1, there is only one more selection to make, but there is also only one number left, and it is already in the last position in the array.)
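Putting that together, a sketch of the whole program (the random index here uses rand() % for brevity; see the range and modulo-bias caveats above, or substitute something like the rand_uniform() helper sketched earlier):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 500000

int sort[SIZE];

int main(void)
{
    FILE *input = fopen("input.txt", "w");
    if (input == NULL) return 1;

    srand((unsigned)time(NULL));

    /* Fill with 1..SIZE, then Fisher-Yates shuffle. */
    for (int i = 0; i < SIZE; i++) sort[i] = i + 1;
    for (int d = 0; d < SIZE - 1; d++) {
        int r = d + rand() % (SIZE - d);   /* random index among the not-yet-selected positions */
        int tmp = sort[d];
        sort[d] = sort[r];
        sort[r] = tmp;
    }

    for (int i = 0; i < SIZE; i++) fprintf(input, "%d ", sort[i]);
    fclose(input);
    return 0;
}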

recursion linked list? I think

I am very new to this and I have trouble understanding the big-picture steps I need in order to learn how to code effectively. These assignments feel extremely abstract, and I have to learn everything about recursion before I can even do it. Coding up a program is not easy, and I do really well when someone "helps me stay between the mayonnaise and mustard," so to speak. What am I doing wrong, and what direction do I need to continue in?
I was thinking that I needed to sort the list first then have two separate functions for merge sort and insertion sort per the assignment:
You are spending most of your time at home in this pandemic. It is of most importance for people to be aware of where other people, who
are infected with COVID-19 are, and who they've been near. Keeping
track of this information is known as "contact tracing."
You've heard that there might be some very high paying jobs
if you can show your contact tracing skills to the
government, so you've decided to write a little program to highlight
those skills. Your area can be modeled on the Cartesian plane. You are
located at the point (x, y). In addition, you have the Cartesian
coordinates of all people currently infected with COVID-19.
What you would like to do is write a program that sorts
these locations based on their distance from you, followed by
handling queries. The queries are of the form of a point you are
thinking of visiting. Your program should identify if someone who is
infected is at that location, and if so, what their rank is on the
sorted list of infected people. If no one is infected at
that location, you should correctly identify this.
Note: There are many important implementation restrictions for this assignment, so to make sure everyone reads these, the section on
implementation restrictions will be next, changing the order of the
sections as compared to other assignments.
Implementation Restrictions
You must use a specified combination of Merge Sort and Insertion Sort to sort the point data. Specifically, for each input
case, a threshold value, t, will be given. If the subsection of the
array to sort has t or fewer values to sort, Insertion Sort should be
used. Otherwise, Merge Sort should be used. Further details about the
comparison used for the sorting are below.
You must store your coordinates in a struct that contains two integer fields.
You must write a function compareTo which takes in two pointers, ptrPt1 and ptrPt2, to coordinate structs and returns a
negative integer if the point pointed to by ptrPt1 is closer to you
than the point pointed to by ptrPt2, 0 if the two locations pointed to
by both are identical locations, and a positive integer if the point
pointed to by ptrPt1 is farther from you than the point pointed to by
ptrPt2. Exceptions to this will be when the two pointers are pointing
to points that are the same distance from you, but are distinct
points. In these cases, if ptrPt1's x coordinate is lower
than ptrPt2's x coordinate, a negative integer must be
returned. Alternatively, if ptrPt1's x coordinate is greater than
ptrPt2's x coordinate a positive integer must be returned. Finally, if
the x coordinate of both points is the same, if ptrPt1's y coordinate
is lower than ptrPt2's y coordinate, a negative integer must be
returned. If ptrPt1's y coordinate is greater than ptrPt2's y
coordinate, a positive integer must be returned.
Since your location must be used for sorting, please make the variable that stores your x and y coordinates global. Your program
should have no other global variables.
A Binary Search function must
be used when answering queries.
Your sort function should take in
the array to be sorted,the length of the array as well as the
threshold value, t, previously mentioned. This function should NOT
be recursive. It should be a wrapper function.
The recursive sort
function (you can call this mergeSort) should take in the
array, a starting index into the array, an ending index into the
array and the threshold value t. In this function, either recursive
calls should be made OR a call to an insertion sort function should be
made.
The Problem
Given your location, and the location of each person who has COVID-19, sort the list by distance from you from shortest to
longest, breaking ties by x-coordinate (lower comes first),
and then breaking those ties by y coordinate (lower comes
first). After sorting, answer several queries about points in the
coordinate plane. Specifically, determine if a query point contains
someone who is infected or not. If so, determine that person's ranking
on the sorted list in distance from you.
The Input(to be read from standard input)-Your Program Will Be Tested on Multiple Files
The
first line of the input contains 5 integers separated by spaces. The
first two of these values are x and y (|x|, |y| ≤ 10000), representing
your location. The third integer is n (2 ≤ n ≤ 10^6), representing the
number of infected people. The fourth integer is s (1 ≤ s ≤ 2×10^5),
representing the number of points to search for. The last
integer, t (1 ≤ t≤ 30), represents the threshold to be used for
determining whether you run Merge Sort or Insertion Sort. The next
n lines of the input contain x and y coordinate values, respectively,
separated by spaces, representing the locations of infected people.
Each of these values will be integers and the points will be distinct
(and also different from your location) and the absolute value of x
and y for all of these coordinates will not exceed 10,000. Then the
next s lines of the file contains x and y coordinate values for
searching. Both values on each line will be integers with an absolute
value less than or equal to 10,000.
The Output (to be printed to standard out)
The first n lines of output should contain the coordinates of the people infected,
sorted as previously mentioned. These lines should have the
x-coordinate, followed by a space, followed by the y-coordinate. The
last s lines of output will contain the answers to each of the s queries
in the input. The answer for a single query will be on a line by
itself. If the point queried contains an infected person, output a
line with the following format:
x y found at rank R
, where (x, y) is the
query point, and R is the one-based rank of that infected person in
the sorted list. (Thus, R will be 1 more than the array index in which
(x, y) is located, after sorting.) If the point queried does NOT
contain an infected person, output a line with the following format:
x y not found
Sample Input
(Note: the query points are the last five lines.)
0 0 14 5 53
1 -6
-2 -4
3 4
-4 2
4 -1
3 2
2 0
-5 -4
-2 -6
6 4
4 -2
4 0
5 -4
6 2
-13 1
0 -5
my code so far
#include <stdio.h>

int x = 0; // global coordinates
int y = 0;

typedef struct {
    int xInput, yInput;
} coordinates;

void scanPoints(coordinates[], int infectedPeople);
void scanSearchValues(coordinates[], int pointsToSearch);
void SortPoints(coordinates[], int);
int lessThan(coordinates[], int, int);
void printPoints(coordinates[], int);

void
scanPoints(coordinates pts[], int infectedPeople){
    for (int i = 0; i < infectedPeople; i++){
        scanf("%d %d", &pts[i].xInput, &pts[i].yInput);
    }
}

void
scanSearchValues(coordinates pts[], int pointsToSearch){
    for (int i = 0; i < pointsToSearch; i++){
        scanf("%d %d", &pts[i].xInput, &pts[i].yInput);
    }
}

void
sortPoints(coordinates pts[], int infectedPeople){
    int i, start, min_index, temp;
    for (start = 0; start < infectedPeople - 1; start++) {
        min_index = start;
        for (i = start + 1; i < infectedPeople; i++) {
            if (lessThan(pts, i, min_index)) {
                min_index = i;
            }
        }
        if (min_index != start) {
            coordinates temp = pts[start];
            pts[start] = pts[min_index];
            pts[min_index] = temp;
        }
    }
}

int
lessThan(coordinates pts[], int p, int q) {
    if ((pts[p].xInput < pts[q].xInput) || ((pts[p].xInput == pts[q].xInput) && (pts[p].yInput < pts[q].yInput))) {
        return 1;
    }
}

int
main(int argc, const char * argv[]) {
    int infectedPeople;
    int pointsToSearch;
    int threshold;

    scanf("%d%d", &x, &y);
    if (x > 10000 || y > 10000)
        return 0;
    scanf("%d", &infectedPeople);
    if (infectedPeople < 2 || infectedPeople > 1000000)
        return 0;
    scanf("%d", &pointsToSearch);
    if (pointsToSearch < 1 || pointsToSearch > 200000)
        return 0;
    scanf("%d", &threshold);
    if (threshold < 1 || threshold > 30)
        return 0;

    return 0;
}
This is a challenging exercise for someone new to programming, but the first step is to read the problem description carefully. It might help to print it out on paper, so that you can easily mark it up with highlighter and / or pen. Additionally, you may be intimidated by all the details specified in the exercise. Don't be! Although some make work for you, most make decisions for you. The latter kind save you work, and do you exactly the service you asked of us: help you stay on track.
One of the keys to programming is learning to divide a problem into smaller pieces. Sometimes those pieces will also need to be divided into even smaller pieces. Many of these pieces will correspond naturally to functions, and accordingly, a second key to programming is recognizing how to choose the pieces so that they have well-defined inputs and outputs, and, to some extent, so that pieces can be re-used. In your case, the overall problem statement gives you a starting point for performing such an analysis:
Given your location, and the location of each person who has COVID-19,
sort the list by distance from you from shortest to longest, breaking ties by x-coordinate (lower comes first), and then breaking
those ties by y coordinate (lower comes first). After sorting, answer
several queries about points in the coordinate plane. Specifically,
determine if a query point contains someone who is infected or not. If
so, determine that person's ranking on the sorted list in distance
from you.
(Emphasis added.) The three main pieces I see there are
read and store input data
sort the data
analyze the result and produce output
Reading the input
The implementation restrictions in the problem description have a lot to say about how you read and store the data. In particular,
You must store your coordinates in a struct that contains two integer fields.
You've prepared such a structure type.
Since your location must be used for sorting, please make the variable that stores your x and y coordinates global. Your program should have no other global variables.
Reading the restrictions carefully, I think the expectation is that you use the coordinates structure to represent all coordinates appearing in the program, including the (one) global variable representing your own coordinates.
Your sort function should take in the array to be sorted
You mentioned a linked list, but this indicates that you are expected to store the data in an array, not a linked list. From my more experienced vantage point, I have more reasons to believe that that is the expectation.
The detailed description of the input format gives you additional guidance on how to perform the reading, as of course the code needs to be suited to the data. So, read the first line of input to get the main program parameters, and store them appropriately. Among those is the number of infected person records to read; you'll need to store all those in memory in order to sort them and answer multiple queries about them, so allocate an array of structs large enough to hold them, then proceed to read those data.
You could similarly read and store the queries in advance, but I would suggest instead performing the required sorting first, and then processing each query immediately after reading it, without storing the whole list of queries.
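For instance, the reading step might look roughly like this (a sketch only; the struct field names and the variable me are my own choices for illustration, following the restriction that your location is the one global):
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int x, y;
} coordinates;

coordinates me;   /* the single allowed global: your own location */

int main(void)
{
    int n, s, t;
    if (scanf("%d %d %d %d %d", &me.x, &me.y, &n, &s, &t) != 5) return 1;

    /* n can be up to 10^6, so allocate the array of infected locations dynamically. */
    coordinates *infected = malloc((size_t)n * sizeof *infected);
    if (infected == NULL) return 1;
    for (int i = 0; i < n; i++)
        scanf("%d %d", &infected[i].x, &infected[i].y);

    /* ... sort here, then read and answer the s queries one at a time ... */

    free(infected);
    return 0;
}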
Sorting the data
You write,
I was thinking that I needed to sort the list first then have two separate functions for merge sort and insertion sort
Yes, I too read the problem description to be asking for separate merge sort and insertion sort functions -- and that's not what you seem presently to be providing. It also asks for a wrapper function that accepts the input and passes it on to the appropriate sort implementation, either (recursive) merge sort or insertion sort. Note that the wrapper function does not itself sort the list, except inasmuch as it passes the list to one of the other functions for sorting:
void sortCoordinates(coordinates coords[], int count, int threshold) {
    if (/* your condition here */) {
        insertionSortCoordinates(coords, count);
    } else {
        mergeSortCoordinates(coords, count);
    }
}
(The names and most of the details of these particular functions are at your discretion.)
Additionally, the restrictions help you out again here, though you need to read between the lines a bit. Both sorting and searching require that you have a way to compare the objects in the list, and look! The restrictions tell you exactly what form that should take:
You must write a function compareTo which takes in two pointers, ptrPt1 and ptrPt2, to coordinate structs [...]
In other words,
int compareTo(coordinates *ptrPt1, coordinates *ptrPt2) {
    /* your code here */
}
Your insertion and merge sort functions and also your binary search function (see below) will compare structures (when needed) by calling that function.
Do pay careful attention to the restrictions, though, as one of the decisions they make for you is the name for this function: compareTo, not lessThan. Deviating from the restrictions in this regard would likely cost you some marks.
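For what it's worth, one way such a function might look (a sketch using squared distances to stay in integer arithmetic; me is assumed to be a global coordinates struct holding your location, as in the reading sketch above):
int compareTo(coordinates *ptrPt1, coordinates *ptrPt2)
{
    /* Squared distances from the global location; long long rules out overflow. */
    long long dx1 = ptrPt1->x - me.x, dy1 = ptrPt1->y - me.y;
    long long dx2 = ptrPt2->x - me.x, dy2 = ptrPt2->y - me.y;
    long long d1 = dx1 * dx1 + dy1 * dy1;
    long long d2 = dx2 * dx2 + dy2 * dy2;

    if (d1 != d2) return d1 < d2 ? -1 : 1;                              /* closer point first */
    if (ptrPt1->x != ptrPt2->x) return ptrPt1->x < ptrPt2->x ? -1 : 1;  /* tie: lower x first */
    if (ptrPt1->y != ptrPt2->y) return ptrPt1->y < ptrPt2->y ? -1 : 1;  /* tie: lower y first */
    return 0;                                                           /* identical points */
}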
Computing the output
Whether you read and store the query lines in advance or process them as you read them (having first sorted the input), the main computation to be performed is a binary search of the coordinates, per restriction 5. You'll want a function for that, maybe
int binarySearch(coordinates *target, coordinates coords[]) {
    /* your code here: returns 1-based rank if found, zero if not found */
}
Again, this function will use your compareTo function to compare coordinate structures. Note in particular that if implemented correctly according to the restrictions, compareTo() will return zero if and only if the two objects being compared are equal.
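A sketch of how the body might go (note that the function also needs the array length, added here as an extra parameter; it returns the 1-based rank as the problem asks, or zero when the point is absent):
int binarySearch(coordinates *target, coordinates coords[], int count)
{
    int lo = 0, hi = count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = compareTo(target, &coords[mid]);
        if (cmp == 0) return mid + 1;       /* found: 1-based rank */
        if (cmp < 0) hi = mid - 1;          /* target orders before coords[mid] */
        else lo = mid + 1;                  /* target orders after coords[mid] */
    }
    return 0;                               /* not found */
}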
in
int
lessThan(coordinates pts[], int p, int q) {
    if ((pts[p].xInput < pts[q].xInput) || ((pts[p].xInput == pts[q].xInput) && (pts[p].yInput < pts[q].yInput))) {
        return 1;
    }
}
if ((pts[p].xInput < pts[q].xInput) || ((pts[p].xInput == pts[q].xInput) && (pts[p].yInput < pts[q].yInput))) is false, the function does not return a value, which introduces undefined behavior when sortPoints uses the result.
you wanted
int lessThan(coordinates pts[], int p, int q)
{
    return ((pts[p].xInput < pts[q].xInput) || ((pts[p].xInput == pts[q].xInput) && (pts[p].yInput < pts[q].yInput)));
}
In sortPoints, the variable temp in int i, start, min_index, temp; is unused (you declare another temp inside the if), so remove it.
In main you only read the 5 values and nothing more, so the other functions are never used, and you do not compute or print anything.
Not sure my answer is really useful ...

Best (fastest) way to find the number most frequently entered in C?

Well, I think the title basically explains my doubt. I will have n numbers to read; these n numbers go from 1 to x, where x is at most 10^5. What is the fastest way (least possible running time) to find out which number was inserted the most times? We know that the number that appears most often appears more than half of the time.
What I've tried so far:
// for (1 <= x <= 10⁵)
int v[100000+1];

// multiple instances, ends when n = 0
while (scanf("%d", &n) && n > 0) {
    zerofill(v);
    for (i = 0; i < n; i++) {
        scanf("%d", &x);
        v[x]++;
        if (v[x] > n/2)
            i = n;
    }
    printf("%d\n", x);
}
Zero-filling an array of x positions, incrementing v[x] as each number is read, and at the same time checking whether v[x] is greater than n/2 is not fast enough.
Any idea might help, thank you.
Observation: No need to care about amount of memory used.
The trivial solution of keeping a counter array is O(n), and you obviously can't get better than that. The fight is then about the constants, and this is where a lot of details come into play, including the exact values of n and x, the kind of processor, the kind of architecture, and so on.
On the other hand, this really looks like the "knockout" problem, but that algorithm needs two passes over the data and an extra conditional, so in practical terms, on the computers I know, it will most probably be slower than the counter-array solution for a lot of n and x values.
The good point of the knockout solution is that you don't need to put a limit x on the values and you don't need any extra memory.
If you already know that there is a value with the absolute majority (and you simply need to find what that value is), then this could do it (but there are two conditionals in the inner loop):
initialize count = 0
loop over all elements
if count is 0 then set champion = element and count = 1
else if element != champion decrement count
else increment count
at the end of the loop your champion will be the value with the absolute majority of elements, if such a value is present.
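In C, that loop (the Boyer-Moore majority-vote idea) might look like the following sketch, assuming, as the question states, that some value really does occur more than half the time:
/* One pass, O(1) extra memory: returns the majority element of a[0..n-1],
   assuming such an element exists. */
int majority(const int a[], int n)
{
    int champion = 0, count = 0;
    for (int i = 0; i < n; i++) {
        if (count == 0) { champion = a[i]; count = 1; }
        else if (a[i] != champion) count--;
        else count++;
    }
    return champion;
}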
But as said before I'd expect a trivial
for (int i=0, n=size; i<n; i++) {
    if (++count[x[i]] > half) return x[i];
}
to be faster.
EDIT
After your edit, it seems you're really looking for the knockout algorithm, but if speed is the concern, that's probably still the wrong question on modern computers (100,000 elements is nothing, even for a nail-sized single chip today).
I think you can create a max heap of the counts of the numbers you read, and use heap sort to find all the counts which are greater than n/2.

Linear Search Algorithm Optimization

I just finished a homework problem for Computer Science 1 (yes, it's homework, but hear me out!). Now, the assignment is 100% complete and working, so I don't need help on it. My question involves the efficiency of an algorithm I'm using (we aren't graded on algorithmic efficiency yet, I'm just really curious).
The function I'm about to present currently uses a modified version of the linear search algorithm (that I came up with, all by myself!) in order to check how many numbers on a given lottery ticket match the winning numbers, assuming that both the numbers on the ticket and the numbers drawn are in ascending order. I was wondering, is there any way to make this algorithm more efficient?
/*
 * Function: ticketCheck
 *
 * @param struct ticket
 * @param array winningNums[6]
 *
 * Takes in a ticket, counts how many numbers
 * in the ticket match, and returns the number
 * of matches.
 *
 * Uses a modified linear search algorithm,
 * in which the index of the successor to the
 * last matched number is used as the index of
 * the first number tested for the next ticket value.
 *
 * @return int numMatches
 */
int ticketCheck( struct ticket ticket, int winningNums[6] )
{
    int numMatches = 0;
    int offset = 0;
    int i;
    int j;

    for( i = 0; i < 6; i++ )
    {
        for( j = 0 + offset; j < 6; j++ )
        {
            if( ticket.ticketNum[i] == winningNums[j] )
            {
                numMatches++;
                offset = j + 1;
                break;
            }
            if( ticket.ticketNum[i] < winningNums[j] )
            {
                i++;
                j--;
                continue;
            }
        }
    }
    return numMatches;
}
It's more or less there, but not quite. In most situations, it's O(n), but it's O(n^2) if every ticketNum is greater than every winningNum. (This is because the inner j loop doesn't break when j==6 like it should, but runs the next i iteration instead.)
You want your algorithm to increment either i or j at each step, and to terminate when i==6 or j==6. [Your algorithm almost satisfies this, as stated above.] As a result, you only need one loop:
for (i=0, j=0; i<6 && j<6; /* no increment step here */) {
    if (ticketNum[i] == winningNum[j]) {
        numMatches++;
        i++;
        j++;
    }
    else if (ticketNum[i] < winningNum[j]) {
        /* ticketNum[i] won't match any winningNum, discard it */
        i++;
    }
    else { /* ticketNum[i] > winningNum[j] */
        /* discard winningNum[j] similarly */
        j++;
    }
}
Clearly this is O(n); at each stage, it either increments i or j, so the most steps it can do is 2*n-1. This has almost the same behaviour as your algorithm, but is easier to follow and easier to see that it's correct.
You're basically looking for the size of the intersection of two sets. Given that most lottos use around 50 balls (or so), you could store the numbers as bits that are set in an unsigned long long. Finding the common numbers is then a simple matter of ANDing the two together: commonNums = TicketNums & winningNums;.
Finding the size of the intersection is a matter of counting the one bits in the resulting number, a subject that's been covered previously (though in this case, you'd use 64-bit numbers, or a pair of 32-bit numbers, instead of a single 32-bit number).
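A sketch of that idea, assuming the ball numbers fit in the range 1..63 and using the GCC/Clang builtin for the population count (a portable version would count the bits in a small loop):
#include <stdint.h>

/* Represent each 6-number set as a 64-bit mask, AND the masks,
   and count the surviving bits to get the number of matches. */
int countMatches(const int ticketNum[6], const int winningNums[6])
{
    uint64_t ticketMask = 0, winMask = 0;
    for (int i = 0; i < 6; i++) {
        ticketMask |= (uint64_t)1 << ticketNum[i];
        winMask    |= (uint64_t)1 << winningNums[i];
    }
    return __builtin_popcountll(ticketMask & winMask);   /* GCC/Clang builtin */
}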
Yes, there is something faster, but it probably uses more memory. Make an array of zeros the size of the range of possible numbers, and put a 1 at every drawn number. For every ticket number, add the value at that number's index.
int NumsArray[MAX_NUMBER+1];
memset(NumsArray, 0, sizeof NumsArray);

for( i = 0; i < 6; i++ )
    NumsArray[winningNums[i]] = 1;

for( i = 0; i < 6; i++ )
    numMatches += NumsArray[ticket.ticketNum[i]];
12 loop rounds instead of up to 36
The surrounding code left as an exercise.
EDIT: It also has the advantage of not needing to sort both set of values.
This is really only a minor change on a scale like this, but if the second loop reaches a number bigger than the current ticket number, it is already allowed to break. Furthermore, if your second loop traverses numbers lower than your ticket number, it may update the offset even if no match is found within that iteration.
PS:
Not to forget: general results on efficiency make more sense if we take the number of balls or the size of the ticket to be variable. Otherwise it depends too much on the machine.
If instead of comparing the arrays of lottery numbers you were to create two bit arrays of flags -- each flag being set if its index is in that array -- then you could perform a bitwise AND on the two bit arrays (the lottery ticket and the winning number sets) and produce another bit array whose bits are flags for matching numbers only. Then count the bits set.
For many lotteries 64 bits would be enough, so a uint64_t should be big enough to cover this. Also, some architectures have instructions to count the bits set in a register, which some compilers might be able to recognize and optimize for.
The efficiency of this algorithm is based both on the range of lottery numbers (M) and the number of lottery numbers per ticket (N). The setting of the flags is O(N), while the AND-ing of the two bit arrays and counting of the bits could be O(M), depending on whether your M (lotto number range) is larger than the size that the target CPU can perform these operations on directly. Most likely, though, M will be small and its impact will likely be less than that of N on the performance.
