Constant search algorithm for uniformly distributed arrays?

I was looking for a good search algorithm and ran into interpolation search, which has a time complexity of O(log(log(n))) and can only be applied to uniformly distributed arrays.
But I found it is possible to create a search algorithm that requires the same conditions but has a time complexity of O(1). Here's what I've come up with:
(Code in C++)
int search(double Find, double* Array, const int Size) {
    if (Array[0] == Array[1] && Find == Array[0]) { return 0; }
    if (Array[0] == Array[1] && Find != Array[0]) { return -1; }
    const double Index = (Find - Array[0]) / (Array[Size - 1] - Array[0]) * (Size - 1.0);
    if (Index < 0 || Index >= Size || Array[(int)(Index + 0.5)] != Find) { return -1; }
    return Index + 0.5;
}
In this function, we pass the number we want to find, the array pointer, and the size of the array.
This function returns the index where the number to be found is, or -1 if not found.
Explanation:
Well, explaining this without a picture is going to be difficult, but I'll try my best...
Because the array is uniformly distributed, we can represent all its values on a graph, with the index (let's say x) as abscissa and the value contained in the array at that index (let's say Array[x]) as ordinate. All the points on the graph then lie on a line with equation:
Array[x] = tan(A) * x + Array[0], with A the angle formed between the line and the x-axis of the graph.
So now we can rearrange the equation to:
x = (Array[x] - Array[0]) / tan(A)
And that's it: x is the index to find, as a function of Array[x] (the given value to search for).
We can then rewrite the formula as x = (Array[x] - Array[0]) / (Array[Size - 1] - Array[0]) * (Size - 1.0)
because we know that tan(A) = (Array[Size - 1] - Array[0]) / (Size - 1.0).
The question (Finally...)
I guess that this formula is already used in some programs, it was rather easy to find...
So my question is, why would we use the interpolation search instead of this? Am I not understanding something?
Thanks for your patience and help.

A uniform distribution does not mean that once you know the first and last values of a sorted array you also know the values in between, yet that is what your algorithm assumes. A uniform distribution is about the probabilities involved when the array is generated; actual arrays produced under that distribution can still look quite unevenly spread. It is a matter of probability: on average the values will be evenly spread, but this is not guaranteed.
Secondly, your algorithm is in fact an interpolation algorithm, with the difference that it assumes that after the first lookup it has arrived at the place where the value should be. In randomly produced (uniformly distributed) arrays this is not assured, so the operation must be repeated on a smaller interval until the interval closes in on a single index.
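For comparison, here is a minimal sketch of interpolation search in C++ (the function name and layout are illustrative, assuming a sorted array of doubles). Note how the same interpolation formula as in the question is recomputed on a shrinking interval rather than trusted once:

// Repeatedly estimate the position of `target` inside [lo, hi] from the
// endpoint values, then shrink the interval around the estimate.
int interpolationSearch(const double* arr, int size, double target) {
    int lo = 0, hi = size - 1;
    while (lo <= hi && target >= arr[lo] && target <= arr[hi]) {
        if (arr[hi] == arr[lo]) {                 // avoid division by zero
            return (arr[lo] == target) ? lo : -1;
        }
        // The asker's formula, but applied per interval instead of once.
        int pos = lo + (int)((target - arr[lo]) / (arr[hi] - arr[lo]) * (hi - lo));
        if (arr[pos] == target) return pos;
        if (arr[pos] < target)  lo = pos + 1;
        else                    hi = pos - 1;
    }
    return -1;
}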
See also:
CS: What is the meaning of uniform distribution of elements in an array?
Wikipedia: Uniform distribution

Related

Optimal Algorithm for finding peak element in an array

So far I haven't found any algorithm that solves this task: "An element is
considered as a peak if and only if (A[i]>A[i+1])&&(A[i]>A[i-1]), not
taking into account edges of the array(1D)."
I know that the common approach for this problem is divide and conquer, but that is for the case where the edges are also counted as peaks.
The O(..) complexity I need for this exercise is O(log(n)).
When edges count, it is clear to me why the approach is O(log(n)), but without the edges the complexity changes to O(n), because I would have to run the recursive function on each side of the middle element, which makes it run in O(n) (worst case, where the peak is near an edge). In that case, why not use a simple binary search like this:
public static int GetPeak(int[] A)
{
    if (A.length <= 2) // too short for the peak definition to apply
    {
        return -1;
    }
    else {
        int Element = Integer.MAX_VALUE; // the element determined to be a peak
        // First and last elements can't be peaks
        for (int i = 1; i < A.length - 1; i++)
        {
            if (A[i] > A[i + 1] && A[i] > A[i - 1])
            {
                Element = A[i];
                break;
            }
            else
            {
                Element = -1;
            }
        }
        return Element;
    }
}
The common algorithm is written here: http://courses.csail.mit.edu/6.006/spring11/lectures/lec02.pdf, but as I said before it doesn't apply for the terms of this exercise.
Return only one peak, else return -1.
Also, my apologies if the post is worded incorrectly due to the language barrier (I am not a native English speaker).
I think what you're looking for is a dynamic programming approach, utilizing divide-and-conquer. Essentially, you would have a default value for your peak which you would overwrite when you found one. If you could check at the beginning of your method and only run operations if you hadn't found a peak, then your O() notation would look something like O(pn) where p is the probability that any given element of your array is a peak, which is a variable term as it relates to how your data is structured (or not). For instance, if your array only has values between 1 and 5 and they're distributed equally then the probability would be equal to 0.24 so you would expect the algorithm to run in O(0.24n). Note that this still appears to be equivalent to O(n). However, if you require that your data values are unique on the array then your probability is equal to:
p = 2 * sum( [ choose(x - 1, 2) for x in 3:n ] ) / choose(n, 3)
p = 2 * sum( [ ((x - 1)! / (2 * (x - 3)!)) for x in 3:n ] ) / (n! / (n - 3)!)
p = sum( [ (x - 1) * (x - 2) for x in 3:n ] ) / (n * (n - 1) * (n - 2))
p = ((n * (n + 1) * (2 * n + 1)) / 6 - (n * (n + 1)) + 2 * n - 8) / (n * (n - 1) * (n - 2))
p = ((1 / 3) * n^3 - 5.5 * n^2 + 6.5 * n - 8) / (n * (n - 1) * (n - 2))
So, this seems like a lot but if we take the limit as n approaches infinity then we wind up with a value for p that is near 1/3.
So, at the bottom level of your recursion you have roughly a 1/3 probability of finding a peak at any given element, which gives an expected value of about 3 comparisons before you find one, i.e. constant time. However, you still have to get to the bottom level of the recursion before you can do the comparisons, and that requires O(log(n)) time. So a divide-and-conquer approach should run in O(log(n)) time in the average case, with O(n log(n)) in the worst case.
If you cannot make any assumptions about your data (monotonicity of the number sequence, number of peaks), and if edges cannot count as peaks, then you cannot hope for a better average performance than O(n). Your data is randomly distributed, and any value can be a peak. You have to examine them one by one, and there is no correlation between the values.
Accepting edges as potential candidates for peaks changes everything: you know there will always be at least one peak, and a good enough strategy is to always search in the direction of increasing values until you start to go down or you reach an edge (this is the approach in the document you provided). That strategy is O(log(n)) because you use binary search to look for a local max.
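For reference, a minimal C++ sketch of the binary-search peak finder for the case where edges are allowed to count as peaks (the approach in the linked MIT notes); names are illustrative:

// Move toward a strictly larger neighbour; when the midpoint is not smaller
// than its right neighbour, a peak exists at mid or to its left. Runs in O(log n).
int findPeak(const int* a, int n) {
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[mid + 1])
            lo = mid + 1;   // a peak must exist to the right
        else
            hi = mid;       // a peak exists at mid or to the left
    }
    return lo;              // index of some peak
}

The invariant "a peak exists in the half you step into" only holds because edges count; drop that and the O(log(n)) guarantee disappears, as discussed above.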

Binary search for multiple distinct numbers in a large array in minimum number of comparisons

I have a large array of size n (say n = 1000000) with values monotonically non-decreasing. I have a set of 'k' key values (say k = { 1,23,39,55,..}). Assume key values are sorted. I have to find the index of these key values in the large array using minimum number of comparisons. How do I use binary search to search for multiple unique values? Doing it separately for each key value takes lot of comparisons. Can I use reuse some knowledge I learned in one search somehow when I search for another element on the same big array?
1. Sort the needles (the values you will search for).
2. Create an array of the same length as the needles, with each element being a pair of indexes. Initialize each pair to {0, len(haystack)}. These pairs represent all the knowledge we have of the possible locations of the needles.
3. Look at the middle value in the haystack. Now do a binary search for that value in your needles. For all lesser needles, set the upper bound (in the array from step 2) to the current haystack index. For all greater needles, set the lower bound.
4. While you were doing step 3, keep track of which needle now has the largest range remaining. Bisect it and use this as your new middle value to repeat step 3. If the largest range is singular, you're done: all needles have been found (or if not found, their prospective location in the haystack is now known).
There may be some slight complication here when you have duplicate values in the haystack, but I think once you have the rest sorted out this should not be too difficult.
I was curious if NumPy implemented anything like this. The Python name for what you're doing is numpy.searchsorted(), and once you get through the API layers it comes to this:
/*
* Updating only one of the indices based on the previous key
* gives the search a big boost when keys are sorted, but slightly
* slows down things for purely random ones.
*/
if (#TYPE#_LT(last_key_val, key_val)) {
    max_idx = arr_len;
}
else {
    min_idx = 0;
    max_idx = (max_idx < arr_len) ? (max_idx + 1) : arr_len;
}
So they do not do the full-blown optimization I described, but by tracking when the current needle is greater than the last needle, they can avoid searching the haystack below where the last needle was found. This is a simple and elegant improvement over the naive implementation, and as seen from the comment, it must be kept simple and fast because the function does not require the needles to be sorted in the first place.
By the way: my proposed solution aimed for something like theoretical optimality in big-O terms, but if you have a large number of needles, the fastest thing to do is probably to sort the needles then iterate over the entire haystack and all the needles in tandem: linear-search for the first needle, then resume from there to look for the second, etc. You can even skip every second item in the haystack by recognizing that if a needle is greater than A and less than C, it must belong at position B (assuming you don't care about the left/right insertion order for needles not in the haystack). You can then do about len(haystack)/2 comparisons and the entire thing will be very cache-friendly (after sorting the needles, of course).
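A minimal C++ sketch of that tandem scan, assuming both the haystack and the needles are already sorted (names are illustrative; it skips the every-second-item trick):

#include <vector>

// Walk the sorted haystack once, advancing through the sorted needles in tandem.
// Returns, for each needle, the index of its first occurrence or -1 if absent.
std::vector<int> tandemSearch(const std::vector<int>& hay, const std::vector<int>& needles) {
    std::vector<int> result(needles.size(), -1);
    std::size_t h = 0;
    for (std::size_t i = 0; i < needles.size(); ++i) {
        while (h < hay.size() && hay[h] < needles[i]) ++h;   // resume where the last needle left off
        if (h < hay.size() && hay[h] == needles[i]) result[i] = (int)h;
    }
    return result;
}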
One way to reuse knowledge from previous steps is like others suggested: once you have located a key, you can restrict the search ranges for the smaller and larger keys.
Assuming N=2^n, K=2^k and lucky outcomes:
after finding the middle key (n comparisons), you have two subarrays of size N/2. Perform 2 searches for the "quartile" keys (n-1 comparisons each), reducing to N/4 subarrays...
In total, n + 2(n-1) + 4(n-2) + ... + 2^(k-1)(n-k+1) comparisons. After a bit of math, this equals roughly K.n-K.k = K.(n-k).
This is a best case scenario and the savings are not so significant compared to independent searches (K.n comparisons). Anyway, the worst case (all searches resulting in imbalanced partitions) is not worse than independent searches.
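A rough C++ sketch of that recursive scheme (my illustration, not part of the original answer): binary-search the middle key inside the current haystack range, then recurse on the two halves of the keys with correspondingly restricted ranges.

#include <algorithm>
#include <vector>

// Locate sorted keys[kLo..kHi) inside sorted hay[lo..hi): binary-search the
// middle key in the restricted range, then recurse on the smaller keys with
// the left part of the range and on the larger keys with the right part.
void searchKeysRec(const std::vector<int>& hay, const std::vector<int>& keys,
                   int lo, int hi, int kLo, int kHi, std::vector<int>& out) {
    if (kLo >= kHi) return;
    int kMid = kLo + (kHi - kLo) / 2;
    int pos = (int)(std::lower_bound(hay.begin() + lo, hay.begin() + hi, keys[kMid]) - hay.begin());
    out[kMid] = (pos < hi && hay[pos] == keys[kMid]) ? pos : -1;
    searchKeysRec(hay, keys, lo, pos, kLo, kMid, out);        // smaller keys
    searchKeysRec(hay, keys, pos, hi, kMid + 1, kHi, out);    // larger keys
}

Calling searchKeysRec(hay, keys, 0, hay.size(), 0, keys.size(), out) fills out with the index of each key, or -1 when absent.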
UPDATE: this is an instance of the Minimum Comparison Merging problem
Finding the locations of the K keys in the array of N values is the same as merging the two sorted sequences.
From Knuth Vol. 3, Section 5.3.2, we know that at least ceiling(lg(C(N+K,K))) comparisons are required (because there are C(N+K,K) ways to intersperse the keys in the array). When K is much smaller than N, this is close to lg(N^K/K!), or K lg(N) - K lg(K) = K.(n-k).
This bound cannot be beaten by any comparison-based method, so any such algorithm will take time essentially proportional to the number of keys.
1. Sort the needles.
2. Search for the first needle.
3. Update the lower bound of the haystack with the search result.
4. Search for the last needle.
5. Update the upper bound of the haystack with the search result.
6. Go to step 2.
While not optimal, it is much easier to implement.
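A rough C++ sketch of those steps, assuming both arrays are sorted and using std::lower_bound/std::upper_bound for the individual searches (purely illustrative):

#include <algorithm>
#include <vector>

// Alternately resolve the smallest and largest remaining needle, shrinking the
// usable haystack range [lo, hi) from both ends after each search.
std::vector<int> boundedSearch(const std::vector<int>& hay, std::vector<int> needles) {
    std::sort(needles.begin(), needles.end());
    std::vector<int> result(needles.size(), -1);
    std::size_t lo = 0, hi = hay.size();            // haystack range still in play
    std::size_t left = 0, right = needles.size();   // unresolved needles [left, right)
    while (left < right) {
        // Smallest remaining needle: search in [lo, hi), then raise the lower bound.
        std::size_t p = std::lower_bound(hay.begin() + lo, hay.begin() + hi, needles[left]) - hay.begin();
        if (p < hi && hay[p] == needles[left]) result[left] = (int)p;
        lo = p;
        ++left;
        if (left >= right) break;
        // Largest remaining needle: search in [lo, hi), then lower the upper bound.
        std::size_t q = std::upper_bound(hay.begin() + lo, hay.begin() + hi, needles[right - 1]) - hay.begin();
        if (q > lo && hay[q - 1] == needles[right - 1]) result[right - 1] = (int)(q - 1);
        hi = q;
        --right;
    }
    return result;
}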
If you have an array of ints and you want to search with the minimum number of comparisons, I suggest interpolation search, described in Knuth 6.2.1. Where binary search requires log(N) iterations (and comparisons), interpolation search requires only log(log(N)) operations.
For details and code sample see:
http://en.wikipedia.org/wiki/Interpolation_search
http://xlinux.nist.gov/dads//HTML/interpolationSearch.html
I know the question was regarding C, but I just did an implementation of this in JavaScript that I thought I'd share. It is not intended to work if you have duplicate elements in the array; I think it will just return any one of the possible indexes in that case. For an array with 1 million elements where you search for every element, it is about 2.5x faster. If you also search for elements that are not contained in the array, it is even faster; on one data set I threw at it, it was several times faster. For small arrays it is about the same.
singleSearch = function(array, num) {
    return this.singleSearch_(array, num, 0, array.length);
};

singleSearch_ = function(array, num, left, right) {
    while (left < right) {
        var middle = (left + right) >> 1;
        var midValue = array[middle];
        if (num > midValue) {
            left = middle + 1;
        } else {
            right = middle;
        }
    }
    return left;
};

multiSearch = function(array, nums) {
    var numsLength = nums.length;
    var results = new Int32Array(numsLength);
    this.multiSearch_(array, nums, 0, array.length, 0, numsLength, results);
    return results;
};

multiSearch_ = function(array, nums, left, right, numsLeft, numsRight, results) {
    var middle = (left + right) >> 1;
    var midValue = array[middle];
    var numsMiddle = this.singleSearch_(nums, midValue, numsLeft, numsRight);
    if ((numsRight - numsLeft) > 1) {
        if (middle + 1 < right) {
            var newLeft = middle;
            var newRight = middle;
            if ((numsRight - numsMiddle) > 0) {
                this.multiSearch_(array, nums, newLeft, right, numsMiddle, numsRight, results);
            }
            if (numsMiddle - numsLeft > 0) {
                this.multiSearch_(array, nums, left, newRight, numsLeft, numsMiddle, results);
            }
        } else {
            for (var i = numsLeft; i < numsRight; i++) {
                var result = this.singleSearch_(array, nums[i], left, right);
                results[i] = result;
            }
        }
    } else {
        var result = this.singleSearch_(array, nums[numsLeft], left, right);
        results[numsLeft] = result;
    }
};
// A recursive binary-search-based function. It returns the index of x in
// the given array arr[l..r] if x is present, otherwise -1.
int binarySearch(int arr[], int l, int r, int x)
{
    if (r >= l)
    {
        int mid = l + (r - l) / 2;
        // If the element is present at one of the middle 3 positions
        if (arr[mid] == x) return mid;
        if (mid > l && arr[mid - 1] == x) return (mid - 1);
        if (mid < r && arr[mid + 1] == x) return (mid + 1);
        // If the element is smaller than arr[mid], it can only be present
        // in the left subarray
        if (arr[mid] > x) return binarySearch(arr, l, mid - 2, x);
        // Else the element can only be present in the right subarray
        return binarySearch(arr, mid + 2, r, x);
    }
    // We reach here when the element is not present in the array
    return -1;
}

Remove 1000Hz tone from FFT array in C

I have an array of doubles which is the result of the FFT applied to an array that contains the audio data of a WAV audio file to which I have added a 1000Hz tone.
I obtained this array through the drealft routine defined in "Numerical Recipes" (I must use it).
(The original array has a length that is a power of two.)
My array has this structure:
array[0] = first real valued component of the complex transform
array[1] = last real valued component of the complex transform
array[2] = real part of the second element
array[3] = imaginary part of the second element
etc......
Now, I know that this array represents the frequency domain.
I want to determine and kill the 1000Hz frequency.
I have tried this formula for finding the index of the array which should contain the 1000Hz frequency:
index = 1000. * NElements /44100;
Also, since I assume that this index refers to an array with real values only, I have determined the correct(?) position in my array, which contains imaginary values too:
int correctIndex=2;
for(k=0;k<index;k++){
    correctIndex+=2;
}
(I know there is surely an easier way, but it is the first that came to mind.)
Then I find this value: 16275892957.123705, which I suppose to be the real part of the 1000Hz frequency. (Sorry if this is an imprecise statement, but at the moment I do not care to know more about it.)
So I have tried to suppress it:
array[index]=-copy[index]*0.1f;
I don't know exactly why I used this formula, but it is the only one that gives some results; in fact the 1000Hz tone appears to decrease slightly.
This is the part of the code in question:
double *copy = malloc( nCampioni * sizeof(double));
int nSamples;
/*...Fill copy with audio data...*/
/*...Apply ZERO PADDING and reach the length of 8388608 samples,
  or rather 8388608 double values...*/
/*Apply the FFT (sure this works)*/
drealft(copy - 1, nSamples, 1);
/*I determine the REAL(?) array index*/
i = 1000. * nSamples / 44100;
/*I determine MINE(?) array index*/
int j = 2;
for (k = 0; k < i; k++) {
    j += 2;
}
/*I reduce the array value, AND some other values around it as an attempt*/
for (i = -12; i < 12; i += 2) {
    copy[j-i] = -copy[i-j]*0.1f;
    printf("%d\n", j-i);
}
/*Apply the inverse FFT*/
drealft(copy - 1, nSamples, -1);
/*...Write the audio data on the file...*/
NOTE: for simplicity I omitted the part where I get an array of double from an array of int16_t
How can I determine and totally kill the 1000Hz frequency?
Thank you!
As Oli Charlesworth writes, because your target frequency is not exactly one of the FFT bins (your index, TargetFrequency * NumberOfElements / SamplingRate, is not exactly an integer), the energy of the target frequency will be spread across all bins. For a start, you can eliminate some of the frequency by zeroing the bin closest to the target frequency. This will of course affect other frequencies somewhat too, since it is slightly off target. To better suppress the target frequency, you will need to consider a more sophisticated filter.
However, for educational purposes: To suppress the frequency corresponding to a bin, simply set that bin to zero. You must set both the real and the imaginary components of the bin to zero, which you can do with:
copy[index*2 + 0] = 0;
copy[index*2 + 1] = 0;
Some notes about this:
You had this code to calculate the position in the array:
int correctIndex = 2;
for (k = 0; k < index; k++) {
correctIndex += 2;
}
That is equivalent to:
correctIndex = 2*(index+1);
I believe you want 2*index, not 2*(index+1). So you were likely reducing the wrong bin.
At one point in your question, you wrote array[index] = -copy[index]*0.1f;. I do not know what array is. You appeared to be working in place in copy. I also do not know why you multiplied by 1/10. If you want to eliminate a frequency, just set it to zero. Multiplying it by 1/10 only reduces it to 10% of its original magnitude.
I understand that you must pass copy-1 to drealft because the Numerical Recipes code uses one-based indexing. However, the C standard does not support the way you are doing it. The behavior of the expression copy-1 is not defined by the standard. It will work in most C implementations. However, to write supported portable code, you should do this instead:
// Allocate one extra element.
double *memory = malloc((nCampioni+1) * sizeof *memory);
// Make a pointer that is convenient for your work.
double *copy = memory+1;
…
// Pass the necessary base address to drealft.
drealft(memory, nSamples, 1);
// Suppress a frequency.
copy[index*2 + 0] = 0;
copy[index*2 + 1] = 0;
…
// Free the memory.
free(memory);
One experiment I suggest you consider is to initialize an array with just a sine wave at the desired frequency:
for (i = 0; i < nSamples; ++i)
copy[i] = sin(TwoPi * Frequency / SampleRate * i);
(TwoPi is of course 2*3.1415926535897932384626433.) Then apply drealft and look at the results. You will see that much of the energy is at a peak in the closest bin to the target frequency, but much of it has also spread to other bins. Clearly, zeroing a single bin and performing the inverse FFT cannot eliminate all of the frequency. Also, you should see that the peak is in the same bin you calculated for index. If it is not, something is wrong.

Generating a random cubic graph with uniform probability (or less)

While this may look like homework, I assure you it's not. It stems from some homework assignment I did, though.
Let's call an undirected graph without self-edges "cubic" if every vertex has degree exactly three. Given a positive integer N I'd like to generate a random cubic graph on N vertices. I'd like for it to have uniform probability, that is, if there are M cubic graphs on N vertices the probability of generating each one is 1/M. A weaker condition that is still fine is that every cubic graph has non-zero probability.
I feel there's a quick and smart way to do this, but so far I've been unsuccessful.
I am a bad coder, please bear with this awful code:
PRE: edges = (3*nodes)/2, nodes is even, the constants are selected in such a way that the hash works (BIG_PRIME is bigger than edges, SMALL_PRIME is bigger than nodes, LOAD_FACTOR is small).
void random_cubic_graph() {
    int i, j, k, count;
    int *degree;
    char guard;

    count = 0;
    degree = (int*) calloc(nodes, sizeof(int));

    while (count < edges) {
        /* Try a new edge at random */
        guard = 0;
        i = rand() % nodes;
        j = rand() % nodes;

        /* Checks if it is a self-edge */
        if (i == j)
            guard = 1;

        /* Checks that the degrees are 3 or less */
        if (degree[i] > 2 || degree[j] > 2)
            guard = 1;

        /* Checks that the edge was not already selected with a hash */
        k = 0;
        while (A[(j + k*BIG_PRIME) % (LOAD_FACTOR*edges)] != 0) {
            if (A[(j + k*BIG_PRIME) % (LOAD_FACTOR*edges)] % SMALL_PRIME == j)
                if ((A[(j + k*BIG_PRIME) % (LOAD_FACTOR*edges)] - j) / SMALL_PRIME == i)
                    guard = 1;
            k++;
        }
        if (guard == 0)
            A[(j + k*BIG_PRIME) % (LOAD_FACTOR*edges)] = hash(i,j);

        k = 0;
        while (A[(i + k*BIG_PRIME) % (LOAD_FACTOR*edges)] != 0) {
            if (A[(i + k*BIG_PRIME) % (LOAD_FACTOR*edges)] % SMALL_PRIME == i)
                if ((A[(i + k*BIG_PRIME) % (LOAD_FACTOR*edges)] - i) / SMALL_PRIME == j)
                    guard = 1;
            k++;
        }
        if (guard == 0)
            A[(i + k*BIG_PRIME) % (LOAD_FACTOR*edges)] = hash(j,i);

        /* If all checks were passed, increment the count, print the edge, increment the degrees. */
        if (guard == 0) {
            count++;
            printf("%d\t%d\n", i, j);
            degree[i]++;
            degree[j]++;
        }
    }
}
The problem is that the final edge to be selected might have to be a self-edge. That happens when N - 1 vertices already have degree 3 and only one vertex still has degree 1. Thus the algorithm might not terminate. Moreover, I'm not entirely convinced that the probability is uniform.
There's probably much to improve in my code, but can you suggest a better algorithm to implement?
Assume N is even. (Otherwise there cannot be a cubic graph on N vertices).
You can do the following:
Take 3N points and divide them into N groups of 3 points each.
Now pair up these 3N points randomly (note: 3N is even), i.e., marry off two points at random to form 3N/2 marriages.
If there is a pairing between group i and group j, create an edge between i and j. This gives a graph on N vertices.
If this random pairing does not create any multiple edges or loops, you have a cubic graph.
If not try again. This runs in expected linear time and generates a uniform distribution.
Note: all cubic graphs on N vertices are generated by this method (responding to Hamish's comments).
To see this:
Let G be a cubic graph on N vertices.
Let the vertices be, 1, 2, ...N.
Let the three neighbours of j be A(j), B(j) and C(j).
For each j, construct the group of ordered pairs { (j, A(j)), (j, B(j)), (j, C(j)) }.
This gives us 3N ordered pairs. We pair them up: (u,v) is paired with (v,u).
Thus any cubic graph corresponds to a pairing and vice versa...
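For illustration, a minimal C++ sketch of this pairing method with rejection (the names, the rejection loop, and the use of std::set are my choices, not part of the original answer):

#include <algorithm>
#include <random>
#include <set>
#include <utility>
#include <vector>

// Generate a random cubic graph on n vertices (n even) by the pairing method:
// 3n points in n groups of 3, matched up at random; reject and retry whenever
// the resulting multigraph has a loop or a multiple edge.
std::vector<std::pair<int,int>> randomCubicGraph(int n, std::mt19937& rng) {
    while (true) {
        std::vector<int> points(3 * n);
        for (int p = 0; p < 3 * n; ++p) points[p] = p / 3;   // point p belongs to group p/3
        std::shuffle(points.begin(), points.end(), rng);     // random perfect matching

        std::set<std::pair<int,int>> edges;
        bool ok = true;
        for (int p = 0; p < 3 * n; p += 2) {                 // consecutive points form a pair
            int u = points[p], v = points[p + 1];
            if (u == v) { ok = false; break; }               // loop: reject
            if (u > v) std::swap(u, v);
            if (!edges.insert({u, v}).second) { ok = false; break; }  // multiple edge: reject
        }
        if (ok) return std::vector<std::pair<int,int>>(edges.begin(), edges.end());
    }
}

Rejecting and retrying whenever a loop or multiple edge appears is what makes the distribution over simple cubic graphs uniform, as the answer states.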
More information on this algorithm and faster algorithms can be found here: Generating Random Regular Graphs Quickly.
Warning: I make a lot of intuitive-but-maybe-wrong claims in this answer. You should definitely prove them if you intend to use this idea.
Enumerating Cubic Graphs
When dealing with a random choice, a good starting point is to figure out how to enumerate over all of your possible elements. This might reveal some of the structure, and lead you to an algorithm.
Here is my strategy for enumerating cubic graphs: pick the first vertex, and iterate over all possible choices of three adjacent vertices. During those iterations, recurse on the next vertex, with the caveat that you keep track of how many edges are needed for each vertex degree to reach 3. Continue in that fashion until the lowest level is reached. Now you have your first cubic graph. Undo the recently added edges and continue to the next possibility until there are none left. There are a few implementation details you need to consider, but generally straight-forward.
Generalize Enumeration into Choice
Once you can enumerate all the elements, it is trivial to make a random choice. For example, you can scan the list once to compute its size, then pick a random number in [0, size), then scan the sequence again to get the element at that offset. This is incredibly inefficient, taking at LEAST time proportional to the number of cubic graphs, but it works.
Sacrifice Uniform Probability for Efficiency
The obvious speed-up here is to make random edge choices at each level, instead of iterating over each possibility. Unfortunately, this will favor some graphs because of how your early choices affect the availability of later choices. Taking into account the need to track the remaining free vertices, you should be able to achieve O(n log n) time and O(n) space. Significantly better than the enumerating algorithm.
...
It's probably possible to do better. Probably a lot better. But this should get you started.
Another term for cubic graph is 3-regular graph or trivalent graph.
Your problem needs a little more clarification because "the number of cubic graphs" could mean the number of cubic graphs on 2n nodes that are non-isomorphic to one another or the number of (non-isomorphic) cubic graphs on 2n labelled nodes. The former is given by integer sequence A005638, and it is likely a non-trivial problem to uniformly pick a random isomorphism class of cubic graphs efficiently (i.e. not listing them all out and then picking one class). The latter is given by A002829.
There is an article on Wikipedia about random regular graphs that you should take a look at.

Compare two integer arrays with same length

[Description] Given two integer arrays with the same length. Design an algorithm which can judge whether they're the same. The definition of "same" is that, if these two arrays were in sorted order, the elements in corresponding position should be the same.
[Example]
<1 2 3 4> = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>
[Limitation] The algorithm should require constant extra space, and O(n) running time.
(Probably too complex for an interview question.)
(You can use O(N) time to check the min, max, sum, sumsq, etc. are equal first.)
Use no-extra-space radix sort to sort the two arrays in-place. O(N) time complexity, O(1) space.
Then compare them using the usual algorithm. O(N) time complexity, O(1) space.
(Provided (max − min) of the arrays is O(N^k) for some finite k.)
You can try a probabilistic approach - convert the arrays into a number in some huge base B and mod by some prime P, for example sum B^a_i for all i mod some big-ish P. If they both come out to the same number, try again for as many primes as you want. If it's false at any attempts, then they are not correct. If they pass enough challenges, then they are equal, with high probability.
There's a trivial proof for B > N, P > biggest number. So there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on how people view the complexity in terms of the size of the input (as opposed to just the number of elements).
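A rough sketch of that probabilistic check in C++, with one base and one prime shown (both chosen arbitrarily here; repeat with more primes/bases to drive down the false-positive probability):

#include <cstdint>
#include <vector>

// Modular exponentiation: B^e mod P.
static std::uint64_t powMod(std::uint64_t b, std::uint64_t e, std::uint64_t p) {
    std::uint64_t r = 1 % p;
    while (e) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return r;
}

// Fingerprint: sum of B^a[i] mod P over all elements. Equal multisets always
// give equal fingerprints; unequal multisets collide only rarely.
bool probablySameMultiset(const std::vector<std::uint32_t>& a,
                          const std::vector<std::uint32_t>& b) {
    const std::uint64_t P = 2147483647ULL;   // a prime (2^31 - 1), illustrative
    const std::uint64_t B = 1000003ULL;      // an arbitrary base
    std::uint64_t fa = 0, fb = 0;
    for (std::uint32_t x : a) fa = (fa + powMod(B, x, P)) % P;
    for (std::uint32_t x : b) fb = (fb + powMod(B, x, P)) % P;
    return fa == fb;                         // equal fingerprints: probably the same
}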
I claim that unless the range of the input is specified, it is IMPOSSIBLE to solve this in constant extra space and O(n) running time.
I will be happy to be proven wrong, so that I can learn something new.
Insert all elements from the first array into a hashtable.
Try to insert all elements from the second array into the same hashtable; for each insert, the element should already be there.
OK, this is not constant extra space, but it is the best I could come up with at the moment :-). Are there any other constraints imposed in the question, for example on the biggest integer that may be included in the array?
A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by most definitions, that makes the (upper limit on the) space a constant, although the constant may be quite large.
In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:
size_t counts[UCHAR_MAX];
Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.
Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), where R is the range, so log R approximates the number of bits necessary to represent an item.
Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.
If the integers are unranged, it cannot be done.
You could add each element into a hashmap<Integer, Integer>, with the following rules: Array A is the adder, array B is the remover. When inserting from Array A, if the key does not exist, insert it with a value of 1. If the key exists, increment the value (keep a count). When removing, if the key exists and is greater than 1, reduce it by 1. If the key exists and is 1, remove the element.
Run through array A followed by array B using the rules above. If at any time during the removal phase array B does not find an element, you can immediately return false. If after both the adder and remover are finished the hashmap is empty, the arrays are equivalent.
Edit: The size of the hashtable will be equal to the number of distinct values in the array; does this fit the definition of constant space?
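A small C++ sketch of that add/remove bookkeeping, using std::unordered_map in place of the Java HashMap mentioned above (illustrative only):

#include <unordered_map>
#include <vector>

// Count elements of A, then consume the counts with B. Any mismatch means the
// multisets differ; an empty map at the end means they are equivalent.
bool sameMultiset(const std::vector<int>& A, const std::vector<int>& B) {
    if (A.size() != B.size()) return false;
    std::unordered_map<int, int> counts;
    for (int x : A) ++counts[x];                 // adder phase
    for (int x : B) {                            // remover phase
        auto it = counts.find(x);
        if (it == counts.end()) return false;    // B has an element A lacks
        if (--it->second == 0) counts.erase(it);
    }
    return counts.empty();
}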
I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.
public static boolean match(int[] array1, int[] array2) {
    int x, y = 0;
    for (x = 0; x < array1.length; x++) {
        y = x;
        while (array1[x] != array2[y]) {
            if (y + 1 == array1.length)
                return false;
            y++;
        }
        int swap = array2[x];
        array2[x] = array2[y];
        array2[y] = swap;
    }
    return true;
}
For each array, use the counting sort technique to build the count of the number of elements less than or equal to a particular element. Then compare the two auxiliary arrays at every index; if they are equal, the arrays are equal, else they are not. Counting sort requires O(n) and the array comparison at every index is again O(n), so in total it is O(n), and the space required is equal to the size of the two arrays. Here is a link to counting sort: http://en.wikipedia.org/wiki/Counting_sort.
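A tiny C++ sketch of the counting idea, under the added assumption that all values lie in a known range [0, maxVal]; comparing the raw histograms is equivalent to comparing the cumulative less-than-or-equal counts described above:

#include <vector>

// Compare two arrays by value counts, assuming values lie in [0, maxVal].
bool sameByCounting(const std::vector<int>& a, const std::vector<int>& b, int maxVal) {
    if (a.size() != b.size()) return false;
    std::vector<int> ca(maxVal + 1, 0), cb(maxVal + 1, 0);
    for (int x : a) ++ca[x];
    for (int x : b) ++cb[x];
    return ca == cb;   // identical histograms <=> same multiset
}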
Given that the ints are in the range -n..+n, a simple way to check for equality may be the following (pseudocode):
// a & b are the arrays
accumulator = 0
arraysize = size(a)
for (i = 0; i < arraysize; ++i) {
    accumulator = accumulator + a[i] - b[i]
    if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)
The accumulator must be able to store integers in the range ± arraysize * n.
How 'bout this - XOR all the numbers in both the arrays. If the result is 0, you got a match.
