I am very new to this and have trouble understanding the macro steps I need in order to learn how to code effectively. These assignments feel extremely abstract, and I have to learn everything about recursion before I can even attempt them. Coding up a program is not easy, and I do really well when someone "helps me stay between the mayonnaise and mustard," so to speak. What am I doing wrong, and what direction should I continue in?
I was thinking that I needed to sort the list first and then have two separate functions, one for merge sort and one for insertion sort, per the assignment:
You are spending most of your time at home in this pandemic. It is of utmost importance for people to be aware of where other people who are infected with COVID-19 are, and who they've been near. Keeping track of this information is known as "contact tracing."
You've heard that there might be some very high-paying jobs if you can show your contact tracing skills to the government, so you've decided to write a little program to highlight those skills. Your area can be modeled on the Cartesian plane. You are located at the point (x, y). In addition, you have the Cartesian coordinates of all people currently infected with COVID-19.
What you would like to do is write a program that sorts these locations based on their distance from you, followed by handling queries. The queries are of the form of a point you are thinking of visiting. Your program should identify if someone who is infected is at that location, and if so, what their rank is on the sorted list of infected people. If no one is infected at that location, you should correctly identify this.
Note: There are many important implementation restrictions for this assignment, so to make sure everyone reads these, the section on
implementation restrictions will be next, changing the order of the
sections as compared to other assignments.
Implementation Restrictions
You must use a specified combination of Merge Sort and Insertion Sort to sort the point data. Specifically, for each input
case, a threshold value, t, will be given. If the subsection of the
array to sort has t or fewer values to sort, Insertion Sort should be
used. Otherwise, Merge Sort should be used. Further details about the comparison used for the sorting are below.
You must store your coordinates in a struct that contains two integer fields.
You must write a function compareTo which takes in two pointers, ptrPt1 and ptrPt2, to coordinate structs and returns a
negative integer if the point pointed to by ptrPt1 is closer to you
than the point pointed to by ptrPt2, 0 if the two locations pointed to
by both are identical locations, and a positive integer if the point
pointed to by ptrPt1 is farther from you than the point pointed to by
ptrPt2. Exceptions to this will be when the two pointers are pointing
to points that are the same distance from you, but are distinct
points. In these cases, if ptrPt1's x coordinate is lower
than ptrPt2's x coordinate, a negative integer must be
returned. Alternatively, if ptrPt1's x coordinate is greater than ptrPt2's x coordinate, a positive integer must be returned. Finally, if
the x coordinate of both points is the same, if ptrPt1's y coordinate
is lower than ptrPt2's y coordinate, a negative integer must be
returned. If ptrPt1's y coordinate is greater than ptrPt2's y
coordinate, a positive integer must be returned.
Since your location must be used for sorting, please make the variable that stores your x and y coordinates global. Your program
should have no other global variables.
A Binary Search function must
be used when answering queries.
Your sort function should take in
the array to be sorted, the length of the array, as well as the
threshold value, t, previously mentioned. This function should NOT
be recursive. It should be a wrapper function.
The recursive sort
function (you can call this mergeSort) should take in the
array, a starting index into the array, an ending index into the
array and the threshold value t. In this function, either recursive
calls should be made OR a call to an insertion sort function should be
made.
The Problem
Given your location, and the location of each person who has COVID-19, sort the list by distance from you from shortest to
longest, breaking ties by x-coordinate (lower comes first),
and then breaking those ties by y coordinate (lower comes
first). After sorting, answer several queries about points in the
coordinate plane. Specifically, determine if a query point contains
someone who is infected or not. If so, determine that person's ranking
on the sorted list in distance from you.
The Input (to be read from standard input) - Your Program Will Be Tested on Multiple Files
The first line of the input contains 5 integers separated by spaces. The first two of these values are x and y (|x|, |y| ≤ 10000), representing your location. The third integer is n (2 ≤ n ≤ 10^6), representing the number of infected people. The fourth integer is s (1 ≤ s ≤ 2×10^5), representing the number of points to search for. The last integer, t (1 ≤ t ≤ 30), represents the threshold to be used for determining whether you run Merge Sort or Insertion Sort. The next n lines of the input contain x and y coordinate values, respectively, separated by spaces, representing the locations of infected people. Each of these values will be integers, the points will be distinct (and also different from your location), and the absolute value of x and y for all of these coordinates will not exceed 10,000. Then the next s lines of the file contain x and y coordinate values for searching. Both values on each line will be integers with an absolute value less than or equal to 10,000.
The Output (to be printed to standard out)
The first n lines of output should contain the coordinates of the people infected,
sorted as previously mentioned. These lines should have the
x-coordinate, followed by a space, followed by the y-coordinate. The last s lines of output will contain the answers to each of the s queries
in the input. The answer for a single query will be on a line by
itself. If the point queried contains an infected person, output a
line with the following format:
x y found at rank R
, where (x, y) is the
query point, and R is the one-based rank of that infected person in
the sorted list. (Thus, R will be 1 more than the array index in which
(x, y) is located, after sorting.) If the point queried does NOT
contain an infected person, output a line with the following format:
x y not found
Sample Input
(Note: the query points are the last five lines.)
0 0 14 5 53
1 -6
-2 -4
3 4
-4 2
4 -1
3 2
2 0
-5 -4
-2 -6
6 4
4 -2
4 0
5 -4
6 2
-13 1
0 -5
My code so far:
#include <stdio.h>

int x = 0;//global coordinates
int y = 0;

typedef struct {
    int xInput, yInput;
} coordinates;

void scanPoints(coordinates[], int infectedPeople);
void scanSearchValues(coordinates[], int pointsToSearch);
void SortPoints(coordinates[], int);
int lessThan(coordinates[], int, int);
void printPoints(coordinates[], int);

void
scanPoints(coordinates pts[], int infectedPeople){
    for (int i = 0; i < infectedPeople; i++){
        scanf("%d %d", &pts[i].xInput, &pts[i].yInput);
    }
}

void
scanSearchValues(coordinates pts[], int pointsToSearch){
    for (int i = 0; i < pointsToSearch; i++){
        scanf("%d %d", &pts[i].xInput, &pts[i].yInput);
    }
}

void
sortPoints(coordinates pts[], int infectedPeople){
    int i, start, min_index, temp;
    for (start = 0; start < infectedPeople - 1; start++) {
        min_index = start;
        for (i = start + 1; i < infectedPeople; i++) {
            if (lessThan(pts, i, min_index)) {
                min_index = i;
            }
        }
        if (min_index != start) {
            coordinates temp = pts[start];
            pts[start] = pts[min_index];
            pts[min_index] = temp;
        }
    }
}

int
lessThan(coordinates pts[], int p, int q) {
    if ((pts[p].xInput < pts[q].xInput) || ((pts[p].xInput == pts[q].xInput) && (pts[p].yInput < pts[q].yInput))) {
        return 1;
    }
}

int
main(int argc, const char * argv[]) {
    int infectedPeople;
    int pointsToSearch;
    int threshold;

    scanf("%d%d", &x, &y);
    if(x > 10000 || y > 10000 )
        return 0;

    scanf("%d", &infectedPeople);
    if(infectedPeople < 2 || infectedPeople > 1000000)
        return 0;

    scanf("%d", &pointsToSearch);
    if(pointsToSearch < 1 || pointsToSearch > 200000)
        return 0;

    scanf("%d", &threshold);
    if(threshold < 1 || threshold > 30)
        return 0;

    return 0;
}
This is a challenging exercise for someone new to programming, but the first step is to read the problem description carefully. It might help to print it out on paper, so that you can easily mark it up with highlighter and / or pen. Additionally, you may be intimidated by all the details specified in the exercise. Don't be! Although some make work for you, most make decisions for you. The latter kind save you work, and do you exactly the service you asked of us: help you stay on track.
One of the keys to programming is learning to divide a problem into smaller pieces. Sometimes those pieces will also need to be divided into even smaller pieces. Many of these pieces will correspond naturally to functions, and accordingly, a second key to programming is recognizing how to choose the pieces so that they have well-defined inputs and outputs, and, to some extent, so that pieces can be re-used. In your case, the overall problem statement gives you a starting point for performing such an analysis:
Given your location, and the location of each person who has COVID-19,
sort the list by distance from you from shortest to longest, breaking ties by x-coordinate (lower comes first), and then breaking
those ties by y coordinate (lower comes first). After sorting, answer
several queries about points in the coordinate plane. Specifically,
determine if a query point contains someone who is infected or not. If
so, determine that person's ranking on the sorted list in distance
from you.
(Emphasis added.) The three main pieces I see there are
read and store input data
sort the data
analyze the result and produce output
Reading the input
The implementation restrictions in the problem description have a lot to say about how you read and store the data. In particular,
You must store your coordinates in a struct that contains two integer fields.
You've prepared such a structure type.
Since your location must be used for sorting, please make the variable that stores your x and y coordinates global. Your program should have no other global variables.
Reading the restrictions carefully, I think the expectation is that you use the coordinates structure to represent all coordinates appearing in the program, including the (one) global variable representing your own coordinates.
Your sort function should take in the array to be sorted
You mentioned a linked list, but this indicates that you are expected to store the data in an array, not a linked list. From my more experienced vantage point, I have more reasons to believe that that is the expectation.
The detailed description of the input format gives you additional guidance on how to perform the reading, as of course the code needs to be suited to the data. So, read the first line of input to get the main program parameters, and store them appropriately. Among those is the number of infected person records to read; you'll need to store all those in memory in order to sort them and answer multiple queries about them, so allocate an array of structs large enough to hold them, then proceed to read those data.
You could similarly read and store the queries in advance, but I would suggest instead performing the required sorting first, and then processing each query immediately after reading it, without storing the whole list of queries.
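To make that concrete, here is a minimal sketch of the reading phase, reusing the coordinates type and the global location from your own code (the error handling and the name pts are just illustrative; nothing here is mandated beyond what the restrictions say):

#include <stdio.h>
#include <stdlib.h>

int x, y;                         /* your location: the one global the assignment allows */

typedef struct {
    int xInput, yInput;
} coordinates;

int main(void) {
    int n, s, t;
    if (scanf("%d %d %d %d %d", &x, &y, &n, &s, &t) != 5)
        return 1;

    /* one array big enough to hold every infected location */
    coordinates *pts = malloc(n * sizeof *pts);
    if (pts == NULL)
        return 1;

    for (int i = 0; i < n; i++)
        scanf("%d %d", &pts[i].xInput, &pts[i].yInput);

    /* ... sort pts here, then read and answer the s queries one at a time ... */

    free(pts);
    return 0;
}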
Sorting the data
You write,
I was thinking that I needed to sort the list first then have two separate functions for merge sort and insertion sort
Yes, I too read the problem description to be asking for separate merge sort and insertion sort functions -- and that's not what you seem presently to be providing. It also asks for a wrapper function that accepts the input and passes it on to the appropriate sort implementation, either (recursive) merge sort or insertion sort. Note that the wrapper function does not itself sort the list, except inasmuch as it passes the list to one of the other functions for sorting:
void sortCoordinates(coordinates coords[], int count, int threshold) {
    if (/* your condition here */) {
        insertionSortCoordinates(coords, count);
    } else {
        mergeSortCoordinates(coords, count);
    }
}
(The names and most of the details of these particular functions are at your discretion.)
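If you instead follow the signatures spelled out in the restrictions to the letter, the wrapper stays trivial and the threshold decision moves into the recursive function. A rough skeleton, with the insertion sort and the merge step deliberately left for you to fill in (the helper names beyond mergeSort are my own, and coordinates is the two-int struct from your code):

void mergeSort(coordinates pts[], int start, int end, int t);

/* non-recursive wrapper: just hands everything to the recursive sort */
void sortPoints(coordinates pts[], int length, int t) {
    mergeSort(pts, 0, length - 1, t);
}

void mergeSort(coordinates pts[], int start, int end, int t) {
    if (end - start + 1 <= t) {
        /* t or fewer values in this subsection: insertion sort it directly */
    } else {
        int mid = start + (end - start) / 2;
        mergeSort(pts, start, mid, t);
        mergeSort(pts, mid + 1, end, t);
        /* merge the two sorted halves pts[start..mid] and pts[mid+1..end],
           using compareTo for every comparison */
    }
}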
Additionally, the restrictions help you out again here, though you need to read between the lines a bit. Both sorting and searching require that you have a way to compare the objects in the list, and look! The restrictions tell you exactly what form that should take:
You must write a function compareTo which takes in two pointers, ptrPt1 and ptrPt2, to coordinate structs [...]
In other words,
int compareTo(coordinates *ptrPt1, coordinates *ptrPt2) {
    /* your code here */
}
Your insertion and merge sort functions and also your binary search function (see below) will compare structures (when needed) by calling that function.
Do pay careful attention to the restrictions, though, as one of the decisions they make for you is the name for this function: compareTo, not lessThan. Deviating from the restrictions in this regard would likely cost you some marks.
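For what it's worth, the restrictions pin the comparison logic down completely, so compareTo could be sketched as below. I use squared distances so everything stays in integer arithmetic; that is my choice, not something the assignment dictates (x and y are the global location, and coordinates is your two-int struct):

int compareTo(coordinates *ptrPt1, coordinates *ptrPt2) {
    /* squared distances from the global location (x, y) to each point */
    long long d1 = (long long)(ptrPt1->xInput - x) * (ptrPt1->xInput - x)
                 + (long long)(ptrPt1->yInput - y) * (ptrPt1->yInput - y);
    long long d2 = (long long)(ptrPt2->xInput - x) * (ptrPt2->xInput - x)
                 + (long long)(ptrPt2->yInput - y) * (ptrPt2->yInput - y);

    if (d1 != d2)
        return d1 < d2 ? -1 : 1;             /* closer point compares as smaller */
    if (ptrPt1->xInput != ptrPt2->xInput)
        return ptrPt1->xInput < ptrPt2->xInput ? -1 : 1;   /* same distance: lower x first */
    return ptrPt1->yInput - ptrPt2->yInput;  /* same x too: lower y first, 0 if identical */
}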
Computing the output
Whether you read and store the query lines in advance or process them as you read them (having first sorted the input), the main computation to be performed is a binary search of the coordinates, per the restriction that a binary search function must be used when answering queries. You'll want a function for that, maybe
int binarySearch(coordinates *target, coordinates coords[], int count) {
    /* your code here: returns 1-based rank if found, zero if not found */
}
Again, this function will use your compareTo function to compare coordinate structures. Note in particular that if implemented correctly according to the restrictions, compareTo() will return zero if and only if the two objects being compared are equal.
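As a sketch of how that search might lean on compareTo (I've added a count parameter because C arrays don't carry their own length; the rest is the textbook binary search over the already-sorted array):

int binarySearch(coordinates *target, coordinates coords[], int count) {
    int lo = 0, hi = count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = compareTo(target, &coords[mid]);
        if (cmp == 0)
            return mid + 1;      /* found: convert the 0-based index to a 1-based rank */
        else if (cmp < 0)
            hi = mid - 1;        /* target sorts before coords[mid] */
        else
            lo = mid + 1;        /* target sorts after coords[mid] */
    }
    return 0;                    /* not found */
}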
In

int
lessThan(coordinates pts[], int p, int q) {
    if ((pts[p].xInput < pts[q].xInput) || ((pts[p].xInput == pts[q].xInput) && (pts[p].yInput < pts[q].yInput))) {
        return 1;
    }
}

if the condition is false, the function does not return a value, which introduces undefined behavior when sortPoints uses the result.
You wanted:

int lessThan(coordinates pts[], int p, int q)
{
    return ((pts[p].xInput < pts[q].xInput) || ((pts[p].xInput == pts[q].xInput) && (pts[p].yInput < pts[q].yInput)));
}
In sortPoints, the variable temp in int i, start, min_index, temp; is unused (you declare a separate coordinates temp inside the if block), so remove it.
In main you only read the 5 header values, nothing more, so the other functions are never called, and you neither compute nor print anything.
Not sure my answer is really useful...
Related
Imagine 10 cars randomly, uniformly distributed on a round track of length 1. If the positions are represented by a C double in the range [0, 1), then they can be sorted, and the gaps between the cars should be the position of the car in front minus the position of the car behind. The last gap needs 1 added to it to account for the discontinuity.
In the program output, the last column has very different statistics and distribution from the others. The rows correctly add to 1. What's going on?
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int compare (const void * a, const void * b)
{
    if (*(double*)a > *(double*)b) return 1;
    else if (*(double*)a < *(double*)b) return -1;
    else return 0;
}

double grand_f_0_1(){
    static FILE * fp = NULL;
    uint64_t bits;
    if(fp == NULL) fp = fopen("/dev/urandom", "r");
    fread(&bits, sizeof(bits), 1, fp);
    return (double)bits * 5.421010862427522170037264004349e-020; // https://stackoverflow.com/a/26867455
}

int main()
{
    const int n = 10;
    double values[n];
    double diffs[n];
    int i, j;

    for(j=0; j<10000; j++) {
        for(i=0; i<n; i++) values[i] = grand_f_0_1();
        qsort(values, n, sizeof(double), compare);
        for(i=0; i<(n-1); i++) diffs[i] = values[i+1] - values[i];
        diffs[n-1] = 1. + values[0] - values[n-1];
        for(i=0; i<n; i++) printf("%.5f%s", diffs[i], i<(n-1)?"\t":"\n");
    }
    return(0);
}
Here is a sample of the output. The first column represents the gap between the first and second car. The last column represents the gap between 10th car and the first car, across the start/finish line. Large numbers like .33 and .51 are much more common in the last column and very small numbers are relatively rare.
0.13906 0.14241 0.24139 0.29450 0.01387 0.07906 0.02905 0.03160 0.00945 0.01962
0.01826 0.36875 0.04377 0.05016 0.05939 0.02388 0.10363 0.04640 0.03538 0.25037
0.04496 0.05036 0.00536 0.03645 0.13741 0.00538 0.24632 0.04452 0.07750 0.35176
0.00271 0.15540 0.03399 0.05654 0.00815 0.01700 0.24275 0.25494 0.00206 0.22647
0.34420 0.03226 0.01573 0.08597 0.05616 0.00450 0.05940 0.09492 0.05545 0.25141
0.18968 0.34749 0.07375 0.01481 0.01027 0.00669 0.04306 0.00279 0.08349 0.22796
0.16135 0.02824 0.07965 0.11255 0.05570 0.05550 0.05575 0.05586 0.07156 0.32385
0.12799 0.18870 0.04153 0.16590 0.02079 0.06612 0.08455 0.14696 0.13088 0.02659
0.00810 0.06335 0.13014 0.06803 0.01878 0.10119 0.00199 0.06656 0.20922 0.33263
0.00715 0.03261 0.05779 0.47221 0.13998 0.11044 0.06397 0.00238 0.04157 0.07190
0.33703 0.02945 0.06164 0.01555 0.03444 0.14547 0.02342 0.03804 0.16088 0.15407
0.10912 0.14419 0.04340 0.09204 0.23033 0.09240 0.14530 0.00960 0.03412 0.09950
0.20165 0.09222 0.04268 0.17820 0.19159 0.02074 0.05634 0.00237 0.09559 0.11863
0.09296 0.01148 0.20442 0.07070 0.05221 0.04591 0.08455 0.25799 0.01417 0.16561
0.08846 0.07075 0.03732 0.11721 0.03095 0.24329 0.06630 0.06655 0.08060 0.19857
0.06225 0.10971 0.10978 0.01369 0.13479 0.17539 0.17540 0.02690 0.00464 0.18744
0.09431 0.10851 0.05079 0.07846 0.00162 0.00463 0.06533 0.18752 0.30896 0.09986
0.23214 0.11937 0.10215 0.04040 0.02876 0.00979 0.02443 0.21859 0.15627 0.06811
0.04522 0.07920 0.02432 0.01949 0.03837 0.10967 0.11123 0.01490 0.03846 0.51915
0.13486 0.02961 0.00818 0.11947 0.17204 0.08967 0.09767 0.03349 0.08077 0.23426
Your code is OK. The mean value of the last difference is two times larger than that of the others.
The paradox comes from the fact that rather than selecting 10 points on a unit interval, one actually divides it into 11 sub-intervals with 10 cuts. Therefore the expected length of each sub-interval is 1/11.
The difference between consecutive points approaches 1/11, except for the last pair, because that gap contains both the last sub-interval (between the last point and 1) and the first one (between 0 and the first point).
Thus the mean of the last difference is 2/11.
"There is no special points on the circle"
The thing is that, on a circle, one car looks always the same, and so there is no need to relate to zero: you can just relate to the first car. This means that you can fix the first car at zero, and treat the random positions of the other cars as related to it (measured from it).
And so, the convenient solution is to fix the first car at zero and think of the 9 numbers you still generate as positions related to the first one.
Hope it's a satisfying answer :-)
IDENTITY (or Which diff is first?)
If 10 cars with labels ("1","2" and so on) are placed randomly on a circle, the difference from the "1" to the next will average 1/10.
While sorting, the first diff "loses its identity"; what it refers to changes. It is similar to how, if you chose the first diff to be the longest one, it would average more. Choosing it based on the cars' relation to zero skews (or, in nicer terms: changes) things in a similar manner.
The first difference (2nd, 3rd, etc.) just becomes something different. Defining it as the difference from a given car is more intuitive, and gives the option to use that car as a reference (playing nicely with the circle's symmetry); the distribution of the rest of the cars with respect to it is uniform. Dealing with the smallest of the random points is not that simple.
Summary: define what you're calculating, know your definitions, and remember that probability is non-intuitive.
After 3 months of puzzling over this, I have an explanation that is intuitive, at least to me. This is complementary to the answers provided by @wojand and @tstanisl.
My original code is correct: it uniformly distributes points on the interval, and the forward differences of all points have the same statistical distribution. The paradox is that the forward difference of the highest-value point, the one that crosses the 0-1 discontinuity, is on average twice the others, and its distribution has a different shape.
The reason this forward difference has a different distribution is that it contains the value 0. Larger forward differences (gaps) are more likely to contain any fixed value, simply because they are larger.
We could search for the gap that contains 1/pi, for example, and it too would have the same atypical distribution.
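A quick way to see this is to track, over many trials, both the wrap-around gap and the gap that happens to contain some fixed value such as 1/pi: both averages come out near 2/11 rather than 1/11. A small sketch of that check (it reuses the same sort-and-diff idea, with rand() standing in for /dev/urandom just to keep it short):

#include <stdio.h>
#include <stdlib.h>

static int compare(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    const int n = 10, trials = 100000;
    const double target = 0.3183098861837907;   /* 1/pi: any fixed point on the track will do */
    double sum_wrap = 0, sum_containing = 0;

    srand(12345);
    for (int t = 0; t < trials; t++) {
        double v[n];
        for (int i = 0; i < n; i++) v[i] = rand() / (RAND_MAX + 1.0);
        qsort(v, n, sizeof v[0], compare);

        double wrap = 1.0 + v[0] - v[n - 1];     /* the gap across the 0/1 discontinuity */
        sum_wrap += wrap;

        double containing = wrap;                /* default: target falls in the wrap gap */
        for (int i = 0; i + 1 < n; i++)
            if (v[i] <= target && target < v[i + 1])
                containing = v[i + 1] - v[i];
        sum_containing += containing;
    }
    printf("mean wrap-around gap:     %.4f (2/11 = %.4f)\n", sum_wrap / trials, 2.0 / 11);
    printf("mean gap containing 1/pi: %.4f\n", sum_containing / trials);
    return 0;
}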
Is it possible to create arrays based on their index, as in
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y] = someNr;
dynamically/on the fly, without creating foo[0...3][0...4]?
If not, is there a data structure that allow me to do something similar to this in C?
No.
As written, your code makes no sense at all. You need foo to be declared somewhere, and then you can index into it with foo[x][y] = someNr;. But you can't just make foo spring into existence, which is what it looks like you are trying to do.
Either create foo with the correct sizes (only you can say what they are), int foo[16][16]; for example, or use a different data structure.
In C++ you could do a map<pair<int, int>, int>
Variable Length Arrays
Even if x and y were replaced by constants, you could not initialize the array using the notation shown. You'd need to use:
int fixed[3][4] = { someNr };
or similar (extra braces, perhaps; more values perhaps). You can, however, declare/define variable length arrays (VLA), but you cannot initialize them at all. So, you could write:
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y];
for (int i = 0; i < x; i++)
{
    for (int j = 0; j < y; j++)
        foo[i][j] = someNr + i * (x + 1) + j;
}
Obviously, you can't use x and y as indexes without writing (or reading) outside the bounds of the array. The onus is on you to ensure that there is enough space on the stack for the values chosen as the limits on the arrays (it won't be a problem at 3x4; it might be at 300x400 though, and will be at 3000x4000). You can also use dynamic allocation of VLAs to handle bigger matrices.
VLA support is mandatory in C99, optional in C11 and C18, and non-existent in strict C90.
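As a sketch of the dynamically allocated variant mentioned above: a pointer to a VLA row type keeps the foo[i][j] indexing while putting the storage on the heap (the sizes here are only examples):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int x = 300, y = 400;
    int someNr = 123;

    /* x rows of y ints, allocated on the heap instead of the stack */
    int (*foo)[y] = malloc(x * sizeof *foo);
    if (foo == NULL)
        return 1;

    for (int i = 0; i < x; i++)
        for (int j = 0; j < y; j++)
            foo[i][j] = someNr;

    printf("%d\n", foo[x - 1][y - 1]);
    free(foo);
    return 0;
}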
Sparse arrays
If what you want is 'sparse array support', there is no built-in facility in C that will assist you. You have to devise (or find) code that will handle that for you. It can certainly be done; Fortran programmers used to have to do it quite often in the bad old days when megabytes of memory were a luxury and MIPS meant millions of instructions per second and people were happy when their computer could do double-digit MIPS (and the Fortran 90 standard was still years in the future).
You'll need to devise a structure and a set of functions to handle the sparse array. You will probably need to decide whether you have values in every row, or whether you only record the data in some rows. You'll need a function to assign a value to a cell, and another to retrieve the value from a cell. You'll need to think what the value is when there is no explicit entry. (The thinking probably isn't hard. The default value is usually zero, but an infinity or a NaN (not a number) might be appropriate, depending on context.) You'd also need a function to allocate the base structure (would you specify the maximum sizes?) and another to release it.
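To make the shape of such a structure a little more concrete, here is one possible minimal design, storing only the cells that have been assigned and treating everything else as 0 (the names are mine, and a real implementation would want sorting or hashing instead of a linear scan):

#include <stdlib.h>

typedef struct {
    int row, col;
    int value;
} cell;

typedef struct {
    cell *cells;              /* only the cells that were explicitly set */
    size_t count, capacity;
} sparse_array;

/* returns 0 on success, -1 on allocation failure */
int sparse_set(sparse_array *sa, int row, int col, int value)
{
    for (size_t i = 0; i < sa->count; i++)
        if (sa->cells[i].row == row && sa->cells[i].col == col) {
            sa->cells[i].value = value;
            return 0;
        }
    if (sa->count == sa->capacity) {
        size_t newcap = sa->capacity ? 2 * sa->capacity : 16;
        cell *p = realloc(sa->cells, newcap * sizeof *p);
        if (p == NULL)
            return -1;
        sa->cells = p;
        sa->capacity = newcap;
    }
    sa->cells[sa->count++] = (cell){ row, col, value };
    return 0;
}

int sparse_get(const sparse_array *sa, int row, int col)
{
    for (size_t i = 0; i < sa->count; i++)
        if (sa->cells[i].row == row && sa->cells[i].col == col)
            return sa->cells[i].value;
    return 0;                 /* default value for cells never assigned */
}

Usage is then simply sparse_array sa = {0}; sparse_set(&sa, 3, 4, 123); sparse_get(&sa, 3, 4); with free(sa.cells) when you are done.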
The most efficient way to create a dynamic index of an array is to create an empty array of the same data type that the array to index is holding.
Let's imagine we are using integers for the sake of simplicity. You can then stretch the concept to any other data type.
The ideal index depth will depend on the length of the data to index and will be somewhere close to the length of the data.
Let's say you have 1 million 64-bit integers in the array to index.
First of all you should order the data and eliminate duplicates. That's easy to achieve by using qsort() (C's built-in quick sort function) and some duplicate-removal function such as:
/* Copies the unique elements of the sorted array of strings unord_arr
   into ord_arr and returns how many were kept. */
uint64_t remove_dupes(char **unord_arr, char **ord_arr, uint64_t arr_size)
{
    uint64_t i, j = 0;
    for (i = 1; i < arr_size; i++)
    {
        if ( strcmp(unord_arr[i], unord_arr[i-1]) != 0 ){
            strcpy(ord_arr[j], unord_arr[i-1]);
            j++;
        }
        if ( i == arr_size-1 ){
            strcpy(ord_arr[j], unord_arr[i]);
            j++;
        }
    }
    return j;
}
Adapt the code above to your needs; you should free() the unordered array once the function has finished copying it into the ordered array. The function above is very fast, but it will return zero entries when the array to order contains a single element; that's probably something you can live with.
Once the data is ordered and unique, create an index with a length close to that of the data. It does not need to be an exact length, although sticking to powers of 10 will make everything easier in the case of integers.
uint64_t* idx = calloc(pow(10, indexdepth), sizeof(uint64_t));
This will create an empty index array.
Then populate the index. Traverse the array to be indexed just once, and every time you detect a change in the leading significant figures (as many of them as the index depth), record the position at which that new prefix was first detected.
If you choose an indexdepth of 2 you will have 10² = 100 possible values in your index, typically going from 0 to 99.
When you detect that some number starts with 10 (e.g. 103456), you add an entry to the index. Let's say that 103456 was detected at position 733; your index entry would be:
index[10] = 733;
The next entry, beginning with 11, should be added in the next index slot. Let's say the first number beginning with 11 is found at position 2023:
index[11] = 2023;
And so on.
When you later need to find some number in your original array of 1 million entries, you don't have to iterate the whole array; you just need to check where in your index the first number starting with the same two significant digits is stored. The entry index[10] tells you where the first number starting with 10 is stored. You can then iterate forward until you find your match.
In my example I employed a small index, so the average number of iterations you will need to perform is 1000000/100 = 10000.
If you enlarge your index to somewhere close to the length of the data, the number of iterations will tend to 1, making any search blazingly fast.
What I like to do is create a simple rule that tells me the ideal depth of the index, given the type and length of the data to index.
Please note that in the example I have posed, 64-bit numbers are indexed by their first indexdepth significant figures, so 10 and 100001 will be stored in the same index segment. That's not a problem on its own; nonetheless, each master has his small book of secrets. Treating numbers as fixed-length hexadecimal strings can help keep a strict numerical order.
You don't have to change the base, though; you could consider 10 to be 0000010 to keep it in the 00 index segment and keep base-10 numbers ordered. Using different numerical bases is nonetheless trivial in C, which is of great help for this task.
As you make the index depth larger, the number of entries per index segment is reduced.
Please do note that programming, especially at a lower level like C, consists in great part of understanding the trade-off between CPU cycles and memory use.
Creating the proposed index is a way to reduce the number of CPU cycles required to locate a value, at the cost of using more memory as the index becomes larger. This is nonetheless the way to go nowadays, as massive amounts of memory are cheap.
As SSD speeds get closer to that of RAM, using files to store indexes is worth taking into account. Nevertheless, modern OSs tend to load into RAM as much as they can, so using files would end up performing similarly.
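Pulling those pieces together, a rough sketch of the build-and-search steps might look like this, for a sorted, duplicate-free array of uint64_t, bucketing by value / divisor as a stand-in for "the first indexdepth significant figures" (the names and the -1 marker for empty segments are my own choices):

#include <stddef.h>
#include <stdint.h>

/* idx[p] = first position in data whose prefix (value / divisor) is p,
   or -1 if no element has that prefix. segments = 10^indexdepth. */
void build_index(const uint64_t *data, size_t len,
                 int64_t *idx, uint64_t segments, uint64_t divisor)
{
    for (uint64_t p = 0; p < segments; p++)
        idx[p] = -1;
    for (size_t i = 0; i < len; i++) {
        uint64_t p = data[i] / divisor;
        if (idx[p] == -1)
            idx[p] = (int64_t)i;            /* first element of this segment */
    }
}

/* Returns the position of target in data, or -1 if it is absent. */
int64_t indexed_find(const uint64_t *data, size_t len,
                     const int64_t *idx, uint64_t divisor, uint64_t target)
{
    int64_t start = idx[target / divisor];
    if (start == -1)
        return -1;                          /* nothing shares the target's prefix */
    for (size_t i = (size_t)start; i < len && data[i] <= target; i++)
        if (data[i] == target)
            return (int64_t)i;
    return -1;
}

With indexdepth 2 you would pick divisor so that the largest value divided by it stays below 100, call build_index(data, len, idx, 100, divisor), and each lookup then scans only one segment of the array instead of all of it.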
This is an interview question. Say you have an array of four ints named A, and also this function:
int check(int x, int y){
    if (x<=y) return 1;
    return 0;
}
Now, you want to create a function that will sort A, and you can use only the function check for comparisons. How many calls to check do you need?
(It is OK to return a new array as the result.)
I found that I can do this in 5 calls. Is it possible to do it with fewer calls (in the worst case)?
This is how I thought of doing it (pseudo code):
int[4] B=new int[4];
/*
The idea: put the smaller element of each input pair into B[0] and B[1],
and the larger element of each pair into B[2] and B[3], using check.
Then swap (if needed) between the two minimums and also between the two maximums.
And finally, swap the second element (max of the minimums)
and the third element (min of the maximums) if needed.
*/
if (check(A[0],A[1])==1){ //A[0]<=A[1]
    B[0]=A[0];
    B[2]=A[1];
}
else{
    B[0]=A[1];
    B[2]=A[0];
}
if (check(A[2],A[3])==1){ //A[2]<=A[3]
    B[1]=A[2];
    B[3]=A[3];
}
else{
    B[1]=A[3];
    B[3]=A[2];
}
if (check(B[0],B[1])==0){ //B[0]>B[1]
    swap(B[0],B[1]);
}
if (check(B[2],B[3])==0){ //B[2]>B[3]
    swap(B[2],B[3]);
}
if (check(B[1],B[2])==0){ // B[1]>B[2]
    swap(B[1],B[2]);
}
There are 24 (4 factorial) possible orderings of a 4-element list. If you do only 4 comparisons, then you can get only 4 bits of information, which is enough to distinguish between at most 16 different cases; that isn't enough to cover all 24 possible outputs. Therefore, 5 comparisons is optimal in the worst case.
In The Art of Computer Programming, p. 183 (Section 3.5.1), Donald Knuth gives a table of lower and upper bounds on the minimum numbers of comparisons.
The ceil(lg n!) column is the "information theoretic" lower bound, whereas B(n) is the maximum number of comparisons in a binary insertion sort. Since the lower and upper bounds are equal for n=4, 5 comparisons are needed.
The information theoretic bound is derived by recognizing that there are n! possible orderings of n unique items. We distinguish these cases by asking S yes-no questions of the form "is X < Y?". These questions form a tree which has at most 2^S leaves. We need n! <= 2^S; solving for S gives ceil(lg(n!)). For n = 4, lg(4!) = lg 24 ≈ 4.585, so S = 5.
Incidentally, you can use Stirling's approximation to show that this implies that sorting requires O(n log n) time.
The rest of the section goes on to describe a number of approaches to creating these bounds and studying this question, though work is ongoing (see, for instance, Peczarski (2011)).
Goal
I would like to write an algorithm (in C) which returns TRUE or FALSE (1 or 0) depending on whether the array A given as input can “sum and/or sub” to x (see below for clarification). Note that all values of A are integers bounded between [1, x-1] that were randomly (uniformly) sampled.
Clarification and examples
By “sum and/or sub”, I mean placing "+" and "-" in front of each element of array and summing over. Let's call this function SumSub.
int SumSub (int* A,int x)
{
...
}
SumSub({2,7,5},10)
should return TRUE as 7-2+5=10. You will note that the first element of A can also be taken as negative so that the order of elements in A does not matter.
SumSub({2,7,5,2},10)
should return FALSE as there is no way to “sum and/or sub” the elements of the array to reach the value of x. Please note that this means all elements of A must be used.
Complexity
Let n be the length of A. The complexity of the problem is of order O(2^n) if one has to explore all possible combinations of pluses and minuses. However, some combinations are more likely than others and are therefore worth exploring first (hoping the output will be TRUE). Typically, the combination which requires subtracting all other elements from the largest number is impossible (as all elements of A are lower than x). Also, if n > x, it makes no sense to try adding all the elements of A.
Question
How should I go about writing this function?
Unfortunately the subset-sum problem, which is NP-complete, can be reduced to your problem. Thus an exponential solution can't be avoided in general.
The original problem's solution is indeed exponential as you said. BUT with the given range [1, x-1] for the numbers in A[], you can make the solution polynomial. There is a very simple dynamic programming solution.
It has the following bounds:
Time complexity: O(n^2 * x)
Memory complexity: O(n^2 * x)
where n = the number of elements in A[].
You need to use a dynamic programming approach for this.
You know that every value that can be made lies in the range [-n*x, n*x]. Create a 2D array of size (n+1) x (2*n*x + 1); let's call this dp[][], with the second index shifted by n*x so that it is never negative.
dp[i][j] = taking all elements of A[] from [0..i-1], whether it's possible to make the value j
So
dp[10][3] = 1 means taking the first 10 elements of A[] we CAN create the value 3
dp[10][3] = 0 means taking the first 10 elements of A[] we can NOT create the value 3
Here is code along those lines (it needs <stdlib.h>; n is passed in explicitly, and values are shifted by n*x so the array indices stay non-negative):

int SumSub (int* A, int n, int x)           /* n = number of elements in A */
{
    int width = 2 * n * x + 1;              /* holds every value from -n*x to n*x */
    char (*dp)[width] = calloc(n + 1, sizeof *dp);   /* all entries start as 0 (false) */

    dp[0][0 + n * x] = 1;                   /* the empty prefix makes the value 0 */
    for (int i = 1; i <= n; i++) {
        int val = A[i - 1];
        for (int j = -n * x; j <= n * x; j++) {
            int reachable = 0;
            if (j + val <= n * x && dp[i - 1][j + val + n * x])
                reachable = 1;              /* reach j by subtracting val from j + val */
            if (j - val >= -n * x && dp[i - 1][j - val + n * x])
                reachable = 1;              /* reach j by adding val to j - val */
            dp[i][j + n * x] = reachable;
        }
    }

    int answer = dp[n][x + n * x];
    free(dp);
    return answer;
}
Unfortunately this is NP-complete even when x is restricted to the value 0, so don't expect a polynomial-time algorithm. To show this I'll give a simple reduction from the NP-hard Partition Problem, which asks whether a given multiset of positive integers can be partitioned into two parts having equal sums:
Suppose we have an instance of the Partition Problem consisting of n positive integers B_1, ..., B_n. Create from this an instance of your problem in which A_i = B_i for each 1 <= i <= n, and set x = 0.
Clearly if there is a partition of B into two parts C and D having equal sums, then there is also a solution to the instance of your problem: Put a + in front of every number in C, and a - in front of every number in D (or the other way round). Since C and D have equal sums, this expression must equal 0.
OTOH, if the solution to the instance of your problem that we just created is YES (TRUE), then we can easily create a partition of B into two parts having equal sums: just put all the positive terms in one part (say, C), and all the negative terms (without the preceding - of course) in the other (say, D). Since we know that the total value of the expression is 0, it must be that the sum of the (positive) numbers in C is equal to the (negated) sum of the numbers in D.
Thus a YES to either problem instance implies a YES to the other problem instance, which in turn implies that a NO to either problem instance implies a NO to the other problem instance -- that is, the two problem instances have equal solutions. Thus if it were possible to solve your problem in polynomial time, it would be possible to solve the NP-hard Partition Problem in polynomial time too, by constructing the above instance of your problem, solving it with your poly-time algorithm, and reporting the result it gives.
[Description] Given two integer arrays with the same length. Design an algorithm which can judge whether they're the same. The definition of "same" is that, if these two arrays were in sorted order, the elements in corresponding position should be the same.
[Example]
<1 2 3 4> = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>
[Limitation] The algorithm should require constant extra space, and O(n) running time.
(Probably too complex for an interview question.)
(You can use O(N) time to check the min, max, sum, sumsq, etc. are equal first.)
Use no-extra-space radix sort to sort the two arrays in-place. O(N) time complexity, O(1) space.
Then compare them using the usual algorithm. O(N) time complexity, O(1) space.
(Provided (max − min) of the arrays is O(N^k) for some finite k.)
You can try a probabilistic approach - convert each array into a number in some huge base B modulo some prime P; for example, compute the sum of B^a_i over all i, mod some big-ish prime P. If both arrays come out to the same number, try again with as many primes as you want. If the results differ on any attempt, the arrays are not equal. If they pass enough challenges, then they are equal with high probability.
There's a trivial proof for B > N, P > biggest number. So there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on how people view the complexity in terms of the size of the input (as opposed to just the number of elements).
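A sketch of one such challenge in C, assuming the values are non-negative and the prime is kept below 2^32 so the modular products fit in 64 bits (base, prime, and all names are arbitrary illustration choices):

#include <stdbool.h>
#include <stddef.h>

/* B^e mod m by square-and-multiply; m must stay below 2^32 so products fit. */
static unsigned long long pow_mod(unsigned long long b, unsigned long long e,
                                  unsigned long long m)
{
    unsigned long long result = 1 % m;
    b %= m;
    while (e > 0) {
        if (e & 1)
            result = result * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return result;
}

/* One challenge: equal hashes mean "probably the same multiset",
   different hashes mean "definitely different". */
bool probably_same(const unsigned int *a, const unsigned int *b, size_t n,
                   unsigned long long B, unsigned long long P)
{
    unsigned long long ha = 0, hb = 0;
    for (size_t i = 0; i < n; i++) {
        ha = (ha + pow_mod(B, a[i], P)) % P;
        hb = (hb + pow_mod(B, b[i], P)) % P;
    }
    return ha == hb;
}

Repeat with several different primes P (and/or bases B) and only accept the arrays as equal if every challenge passes.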
I claim that unless the range of the input is specified, it is IMPOSSIBLE to solve in constant extra space and O(n) running time.
I will be happy to be proven wrong, so that I can learn something new.
Insert all elements from the first array into a hashtable
Try to insert all elements from the second array into the same hashtable - for each insert, the element should already be there.
OK, this is not constant extra space, but it's the best I could come up with at the moment :-). Are there any other constraints imposed on the question, like, for example, the biggest integer that may be included in the array?
A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by most definitions, that makes the (upper limit on the) space a constant, although the constant may be quite large.
In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:
size_t counts[UCHAR_MAX + 1];   /* one counter per possible unsigned char value; UCHAR_MAX is from <limits.h> */
Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.
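For completeness, a minimal sketch of that counting idea for arrays of unsigned char (the element type and the function name are my own assumptions): count up for one array, count down for the other, and check that nothing is left over.

#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

bool same_multiset(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t counts[UCHAR_MAX + 1] = {0};
    for (size_t i = 0; i < n; i++)
        counts[a[i]]++;                  /* tally the first array */
    for (size_t i = 0; i < n; i++) {
        if (counts[b[i]] == 0)
            return false;                /* b has a value a doesn't have enough of */
        counts[b[i]]--;                  /* cancel one occurrence */
    }
    return true;                         /* every count returned to zero */
}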
Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), where R is the range, so log R approximates the number of bits necessary to represent an item.
Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.
If the integers are unranged, it cannot be done.
You could add each element into a hashmap<Integer, Integer>, with the following rules: Array A is the adder, array B is the remover. When inserting from Array A, if the key does not exist, insert it with a value of 1. If the key exists, increment the value (keep a count). When removing, if the key exists and is greater than 1, reduce it by 1. If the key exists and is 1, remove the element.
Run through array A followed by array B using the rules above. If at any time during the removal phase array B does not find an element, you can immediately return false. If after both the adder and remover are finished the hashmap is empty, the arrays are equivalent.
Edit: The size of the hashtable will be equal to the number of distinct values in the array; does this fit the definition of constant space?
I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.
public static boolean match(int[] array1, int[] array2) {
    int x, y = 0;
    for(x = 0; x < array1.length; x++) {
        y = x;
        while(array1[x] != array2[y]) {
            if (y + 1 == array1.length)
                return false;
            y++;
        }
        int swap = array2[x];
        array2[x] = array2[y];
        array2[y] = swap;
    }
    return true;
}
For each array, use the counting sort technique to build the count of the number of elements less than or equal to a particular element. Then compare the two auxiliary arrays at every index; if they are equal, the arrays are equal, otherwise they are not. Counting sort requires O(n), and the array comparison at every index is again O(n), so in total it's O(n), and the space required is equal to the size of the two arrays. Here is a link to counting sort: http://en.wikipedia.org/wiki/Counting_sort
Given that the ints are in the range -n..+n, a simple way to check for equality may be the following (pseudo code):

// a & b are the arrays
accumulator = 0
arraysize = size(a)
for(i=0 ; i < arraysize; ++i) {
    accumulator = accumulator + a[i] - b[i]
    if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)

accumulator must be able to store integers in the range ± arraysize * n
How 'bout this - XOR all the numbers in both the arrays. If the result is 0, you got a match.