There is an array of size 10,000. It stores the numbers 1 to 10,000 in random order. Each number occurs exactly once.
Now suppose one number is removed from that array and some other number is duplicated in its place.
How can we identify which number is duplicated, with minimum complexity?
NOTE: We cannot use another array.
The fastest way is an O(N) in-place pigeonhole sort.
Start at the first location of the array, a[0]. Say it has the value 5. You know that 5 belongs at a[4], so swap locations 0 and 4. Now a new value is in a[0]. Swap it to where it needs to go.
Repeat until a[0] == 1, then move on to a[1] and swap until a[1] == 2, etc.
If at any point you end up attempting to swap two identical values, then you have found the duplicate!
Runtime: O(N) with a very low coefficient and early exit. Storage required: zero.
Bonus optimization: count how many swaps have occurred and exit early if n_swaps == array_size. This resulted in a 15% improvement when I implemented a similar algorithm for permuting a sequence.
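For concreteness, here is a minimal C sketch of the swap loop described above; the function name and signature are mine, and it assumes a[] holds 1..n with one value replaced by a duplicate of another:

// Returns the duplicated value, or -1 if a[] is a true permutation of 1..n.
int find_duplicate(int a[], int n) {
    for (int i = 0; i < n; i++) {
        while (a[i] != i + 1) {              // a[i] belongs at index a[i] - 1
            int target = a[i] - 1;
            if (a[target] == a[i])
                return a[i];                 // swapping two identical values: duplicate!
            int tmp = a[i];                  // otherwise, swap a[i] into its home slot
            a[i] = a[target];
            a[target] = tmp;
        }
    }
    return -1;
}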
Compute the sum and the sum of the squares of the elements (you will need 64-bit values for the sum of the squares). From these you can recover which element was modified:
Subtract the expected values for the unmodified array.
If x was removed and y duplicated, you get the difference y - x for the sum and y^2 - x^2 = (y + x)(y - x) for the sum of squares.
From that it is easy to recover x and y.
Edit: Note that this may be faster than pigeonhole sort, because it runs linearly over the array and is thus more cache-friendly.
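A minimal C sketch of this recovery, assuming the array originally held 1..n and that one value really was replaced (so y != x and the division below is safe); the function name is mine:

#include <stdint.h>

// Returns the duplicated value y, given that a[] holds 1..n with one
// value x replaced by a second copy of some other value y.
int find_duplicate(const int a[], int64_t n) {
    int64_t sum = 0, sumsq = 0;              // 64-bit: the sum of squares is large
    for (int64_t i = 0; i < n; i++) {
        sum   += a[i];
        sumsq += (int64_t)a[i] * a[i];
    }
    int64_t d1 = sum   - n * (n + 1) / 2;                // y - x
    int64_t d2 = sumsq - n * (n + 1) * (2 * n + 1) / 6;  // y^2 - x^2 = (y + x)(y - x)
    return (int)((d2 / d1 + d1) / 2);                    // ((y + x) + (y - x)) / 2 = y
}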
Why not simply use a second array or another data structure like a hash table (depending on the memory/performance tradeoff)? This second array would simply store the count of each number in the original array. Now just add a +/- update to the access function of the original array and you have your information immediately.
PS: when you wrote "we can not use another array", I assume you mean that you cannot change the ORIGINAL data structure. However, the use of additional data structures is possible....
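A minimal sketch of what that hooked access function could look like (the names, the macro, and the fixed 1..N value range are my assumptions, not part of the answer):

#define N 10000
int counts[N + 1];                  // counts[v] = occurrences of v in the array

// Hypothetical write wrapper: keep the counts in sync on every modification.
void write_slot(int a[], int i, int v) {
    counts[a[i]]--;                 // value leaving slot i
    counts[v]++;                    // value entering; reaching 2 flags the duplicate
    a[i] = v;
}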
Sort the array, then iterate through until you hit two of the same number in a row.
I'm given a set of integers of size N in sorted, ascending order. For simplicity, this array "arr" is as follows: [a0, a1, a2, ..., a(N-1)]. I need the array of the sums of all pairs ai and aj, with duplicates allowed: [a0 + a0, a0 + a1, a0 + a2, ..., a1 + a0, a1 + a1, ..., a(N-1) + a(N-1)], of size N^2. However, I need it in sorted order so that I can binary search across it (in O(log(N^2)) time) without having to generate the entire array, which would take O(N^2 log(N^2)) time.

Since a binary search only needs the values of the array at certain indices, I was wondering if there is a mathematical function that returns the value of the sorted pair-sum array at a given index (e.g. value(3) would return ak + am), allowing me to binary search across the array without generating it in full. I was thinking something like:
int value(int index) {
    return arr[index / N] + arr[index % N];
}
but this doesn't take into account that arr[i] + arr[k] may be greater than arr[i+1] + arr[k-5], for instance, even though arr[i+1] > arr[i]. TL;DR: is there any way I could partition in less than O(N) time for this special case of array? For my own purposes, I could also accept a solution that generates the entire sorted array in less than O(N^2) time.
You will not be able to generate the entire 2D array in time less than O(N^2), because you need to do at least O(1) work to append/insert each value into your array, and there are N^2 elements to do this for. This automatically entails O(N^2) work.
However, you may be able to query without constructing the full array. I highly suspect that you won't be able to beat O(n) time. The reason I say that is because the only reason we can query a sorted list in O(log n) time is because we've put in prior work to sort those elements, which required inspecting each element and doing some work (thus, at least O(n)), and my gut tells me we're in a similar scenario here, only we haven't yet done the necessary precomputation.
Now, you can query it in O(n) time as is. Start by initializing two pointers, base and count, where base points to the first element in the list and count points to the last element. Also initialize an integer total to 0. For each step, do the following:
move count backwards (towards the first element) until the value count points to plus the value base points to is less than the query value (*count + *base < query)
add the count (the index / offset) to total
move base forwards
repeat until just after either pointer hits its far end (inclusive)
The base pointer represents a value for which we wish to know how many other elements can be added to it while staying below the query value. The pointer count indicates the last element that can be added to base and still be less than the query value, so adding count's index for each base gives us the total number of pairs of elements that sum to less than the query. All together, this is O(n), since each pointer traverses the initial array at most once.
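Here's a minimal C sketch of that sweep (naming is mine); it returns how many of the N^2 ordered pair sums fall below a query value, which is exactly the rank function a binary search over query values needs:

// Counts ordered pairs (i, j) with arr[i] + arr[j] < query, for arr sorted
// ascending. O(n): count only ever moves left as base moves right.
long count_below(const int arr[], int n, long query) {
    long total = 0;
    int count = n - 1;
    for (int base = 0; base < n; base++) {
        while (count >= 0 && (long)arr[base] + arr[count] >= query)
            count--;                 // shrink until the pair sum fits
        if (count < 0)
            break;                   // no j works for this or any larger base
        total += count + 1;          // j = 0..count all pair with this base
    }
    return total;
}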
Goal
I would like to write an algorithm (in C) which returns TRUE or FALSE (1 or 0) depending on whether the array A given as input can “sum and/or sub” to x (see below for clarification). Note that all values of A are integers bounded between [1, x-1] that were sampled uniformly at random.
Clarification and examples
By “sum and/or sub”, I mean placing "+" and "-" in front of each element of the array and summing over. Let's call this function SumSub.
int SumSub (int* A,int x)
{
...
}
SumSub({2,7,5},10)
should return TRUE as 7-2+5=10. You will note that the first element of A can also be taken as negative so that the order of elements in A does not matter.
SumSub({2,7,5,2},10)
should return FALSE as there is no way to “sum and/or sub” the elements of the array to reach the value of x. Please note, this means that all elements of A must be used.
Complexity
Let n be the length of A. The complexity of the problem is of order O(2^n) if one has to explore all possible combinations of pluses and minuses. However, some combinations are more likely than others and are therefore worth exploring first (hoping the output will be TRUE). Typically, the combination which requires subtracting all elements from the largest number is impossible (as all elements of A are lower than x). Also, if n > x, it makes no sense to try adding all the elements of A.
Question
How should I go about writing this function?
Unfortunately the subset-sum problem, which is NP-complete, reduces to your problem. Thus an exponential solution can't be avoided in general.
The original problem's solution is indeed exponential, as you said. BUT with the given range [1, x-1] for the numbers in A[] you can make the solution pseudo-polynomial (polynomial in n and x). There is a very simple dynamic programming solution.
With the order:
Time complexity: O(n^2 * x)
Memory complexity: O(n^2 * x)
where n = number of elements in A[]
You need to use a dynamic programming approach for this.
You know the totals that can be made lie in the range [-n*x, n*x]. Create a 2D array of size (n+1) x (2*n*x + 1). Let's call this dp[][].
dp[i][j] = taking all elements of A[] from [0..i-1], whether it is possible to make the value j
so
dp[10][3] = 1 means taking the first 10 elements of A[] we CAN create the value 3
dp[10][3] = 0 means taking the first 10 elements of A[] we can NOT create the value 3
Here is a kind of pseudocode for this, with negative totals shifted by an offset so they can be used as array indices (I pass the length n explicitly):
#include <stdbool.h>
#include <string.h>

int SumSub (int* A, int n, int x)
{
    // Reachable totals lie in [-n*x, n*x]; shift by OFFSET so they can
    // be used as array indices.
    int OFFSET = n * x;
    int WIDTH = 2 * n * x + 1;
    bool dp[n + 1][WIDTH];            // dp[i][j + OFFSET]: can A[0..i-1] make j?
    memset(dp, 0, sizeof dp);         // set all values of this array 0
    dp[0][OFFSET] = true;             // the empty prefix makes 0
    for (int i = 1; i <= n; i++) {
        int val = A[i - 1];
        for (int j = -n * x; j <= n * x; j++) {
            bool plus  = (j - val >= -OFFSET) && dp[i - 1][j - val + OFFSET];
            bool minus = (j + val <=  OFFSET) && dp[i - 1][j + val + OFFSET];
            dp[i][j + OFFSET] = plus || minus;
        }
    }
    return dp[n][x + OFFSET];
}
Unfortunately this is NP-complete even when x is restricted to the value 0, so don't expect a polynomial-time algorithm. To show this I'll give a simple reduction from the NP-hard Partition Problem, which asks whether a given multiset of positive integers can be partitioned into two parts having equal sums:
Suppose we have an instance of the Partition Problem consisting of n positive integers B_1, ..., B_n. Create from this an instance of your problem in which A_i = B_i for each 1 <= i <= n, and set x = 0.
Clearly if there is a partition of B into two parts C and D having equal sums, then there is also a solution to the instance of your problem: Put a + in front of every number in C, and a - in front of every number in D (or the other way round). Since C and D have equal sums, this expression must equal 0.
OTOH, if the solution to the instance of your problem that we just created is YES (TRUE), then we can easily create a partition of B into two parts having equal sums: just put all the positive terms in one part (say, C), and all the negative terms (without the preceding - of course) in the other (say, D). Since we know that the total value of the expression is 0, it must be that the sum of the (positive) numbers in C is equal to the (negated) sum of the numbers in D.
Thus a YES to either problem instance implies a YES to the other problem instance, which in turn implies that a NO to either problem instance implies a NO to the other problem instance -- that is, the two problem instances have equal solutions. Thus if it were possible to solve your problem in polynomial time, it would be possible to solve the NP-hard Partition Problem in polynomial time too, by constructing the above instance of your problem, solving it with your poly-time algorithm, and reporting the result it gives.
In an unsorted array, an element is a local maximum if it is larger than both of the two adjacent elements. The first and last elements of the array are considered local maxima if they are larger than the only adjacent element. If we create an array by randomly permuting the numbers from 1 to n, what is the expected number of local maxima? Prove your answer correct using additivity of expectations.
I'm stuck on this question; I have no clue how to solve this...
You've got an unsorted array with n elements. There are two kinds of positions where a local maximum can occur: at either end of the array, or strictly between the first and last elements.
Case 1:
If you're looking at the element at the first or last index (array[0] or array[n-1]), what's the probability that it is a local maximum? In other words, what's the probability that its value is greater than that of its single neighbor? Since the array is a random permutation of distinct values, either of the two is equally likely to be the larger, so there is a 1/2 chance that array[0] > array[1] (and likewise at the other end).
Case 2:
If you're looking at any element that ISN'T the first or last element of the array (n-2 elements), what's the probability that it is a local max? Similarly to the first case, look at the element together with its two neighbors: each of the three distinct values is equally likely to be the largest, so there is a 1/3 chance that the middle one is greater than the one before it and greater than the one after it.
Putting it all together:
There are 2 positions with a 1/2 probability of being a local maximum and n-2 positions with a 1/3 probability (2 + (n-2) = n, all possible positions). By additivity of expectations: (2)(1/2) + (n-2)(1/3) = (n+1)/3.
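If you want a quick sanity check of the (n+1)/3 answer (my own sketch, not a proof), a small simulation agrees with it:

#include <stdio.h>
#include <stdlib.h>

// Monte Carlo estimate of the expected number of local maxima in a
// random permutation of 1..n; should come out near (n + 1) / 3.
int main(void) {
    enum { n = 10 };
    const int trials = 1000000;
    int a[n];
    long total = 0;
    srand(42);
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < n; i++) a[i] = i + 1;
        for (int i = n - 1; i > 0; i--) {    // Fisher-Yates shuffle
            int j = rand() % (i + 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
        for (int i = 0; i < n; i++) {
            int left  = (i == 0)     || a[i] > a[i - 1];
            int right = (i == n - 1) || a[i] > a[i + 1];
            total += left && right;
        }
    }
    printf("observed %.4f, predicted %.4f\n",
           (double)total / trials, (n + 1) / 3.0);
    return 0;
}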
Solvable of course, but I won't deprive you of the fun of doing it yourself. I will give you a tip: consider this sketch. What do you think it represents? If you figure this out, you will know that a pattern is available to discover for any n, odd and even. Good luck. If you're still stuck, I will tip you more.
I was reading about linear probing in a hash table tutorial and came upon this:
The step size is almost always 1 with linear probing, but it is acceptable to use other step sizes as long as the step size is relatively prime to the table size so that every index is eventually visited. If this restriction isn't met, all of the indices may not be visited...
(The basic problem is: You need to visit every index in an array starting at an arbitrary index and skipping ahead a fixed number of indices [the skip] to the next index, wrapping to the beginning of the array if necessary with modulo.)
I understand why not all indices could be visited if the step size isn't relatively prime to the table size, but I don't understand why the converse is true: that all the indices will be visited if the step size is relatively prime to the array size.
I've observed this relatively prime property working in several examples that I've worked out by hand, but I don't understand why it works in every case.
In short, my question is: Why is every index of an array visited with a step that is relatively prime to the array size? Is there a proof of this?
Thanks!
Wikipedia about Cyclic Groups
The units of the ring Z/nZ are the numbers coprime to n.
Also:
[If two numbers are co-prime] There exist integers x and y such that ax + by = 1
So, if "a" is your step length, and "b" the length of the array, you can reach any index "z" by
axz + byz = z
=>
axz = z (mod b)
i.e stepping "xz" times (and wrapping over the array "yz" times).
The number of distinct indices visited is lcm(A, P)/P, which equals A/gcd(A, P), where A is the array size and P is this magic coprime.
So if gcd(A, P) != 1, then the number of steps will be less than A.
On the contrary, if gcd(A, P) == 1 (coprime), then the number of steps will be A and all indexes will be visited.
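A small C demo of both statements (names are mine): count how many distinct indices a fixed step visits before the cycle returns to its start.

#include <stdio.h>

// Walks the table with the given step, starting at 0, and counts the
// distinct indices visited before returning to the start.
int visited(int size, int step) {
    int idx = 0, count = 0;
    do {
        count++;
        idx = (idx + step) % size;
    } while (idx != 0);
    return count;
}

int main(void) {
    printf("%d\n", visited(10, 3));   // 10: gcd(10,3) == 1, every index visited
    printf("%d\n", visited(10, 4));   // 5:  gcd(10,4) == 2, half the indices missed
    return 0;
}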
[Description] Given two integer arrays with the same length, design an algorithm which can judge whether they're the same. The definition of "same" is that, if these two arrays were in sorted order, the elements in corresponding positions should be the same.
[Example]
<1 2 3 4> = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>
[Limitation] The algorithm should require constant extra space, and O(n) running time.
(Probably too complex for an interview question.)
(You can use O(N) time to check the min, max, sum, sumsq, etc. are equal first.)
Use no-extra-space radix sort to sort the two arrays in-place. O(N) time complexity, O(1) space.
Then compare them using the usual algorithm. O(N) time complexity, O(1) space.
(Provided (max − min) of the arrays is O(N^k) for some finite k.)
You can try a probabilistic approach: convert each array into a number in some huge base B, mod some prime P; for example, compute the sum of B^a_i over all i, mod some big-ish P. If both arrays come out to the same number, try again with as many primes as you want. If the fingerprints differ on any attempt, the arrays are not equal. If they pass enough challenges, they are equal with high probability.
There's a trivial proof for B > N, P > the biggest number, so there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on how people view the complexity in terms of the size of the input (as opposed to just the number of elements).
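Here's a sketch of that fingerprint in C, assuming nonnegative entries and a prime P below 2^32 so the modular products fit in 64 bits; B, P, and the names are arbitrary choices of mine:

#include <stdint.h>

// Computes b^e mod p by binary exponentiation.
static uint64_t modpow(uint64_t b, uint64_t e, uint64_t p) {
    uint64_t r = 1 % p;
    b %= p;
    while (e) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return r;
}

// Order-independent fingerprint: sum of B^a[i] mod P. Equal multisets
// always collide; unequal ones collide only rarely for a random prime P.
uint64_t fingerprint(const int a[], int n, uint64_t B, uint64_t P) {
    uint64_t h = 0;
    for (int i = 0; i < n; i++)
        h = (h + modpow(B, a[i], P)) % P;
    return h;
}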
I claim that, unless the range of the input is specified, it is IMPOSSIBLE to solve in constant extra space and O(n) running time.
I will be happy to be proven wrong, so that I can learn something new.
Insert all elements from the first array into a hashtable.
Try to insert all elements from the second array into the same hashtable; each element should already be there.
OK, this is not constant extra space, but it's the best I could come up with at the moment :-). Are there any other constraints imposed on the question, like, for example, the biggest integer that may be included in the array?
A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by most definitions, that makes the (upper limit on the) space a constant, although the constant may be quite large.
In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:
size_t counts[UCHAR_MAX + 1];  // one slot for each possible unsigned char value
Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.
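For instance, here is a sketch of the whole equality check for char-sized elements (the function name is mine); the counting array's size depends only on the type's range, never on n:

#include <limits.h>
#include <stddef.h>

// Count occurrences from a[], cancel them against b[]; the arrays are
// equal as multisets iff every count returns to zero.
int same_multiset(const unsigned char *a, const unsigned char *b, size_t n) {
    size_t counts[UCHAR_MAX + 1] = {0};    // constant size, independent of n
    for (size_t i = 0; i < n; i++)
        counts[a[i]]++;
    for (size_t i = 0; i < n; i++) {
        if (counts[b[i]] == 0)
            return 0;                      // b contains a value a lacks
        counts[b[i]]--;
    }
    return 1;                              // everything cancelled
}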
Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), where R is the range, so log R approximates the number of bits necessary to represent an item.
Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.
If the integers are unranged, it cannot be done.
You could add each element to a hashmap<Integer, Integer> with the following rules: array A is the adder, array B is the remover. When inserting from array A, if the key does not exist, insert it with a value of 1; if the key exists, increment the value (keep a count). When removing, if the key exists and its value is greater than 1, decrement it by 1; if the key exists and its value is 1, remove the entry.
Run through array A followed by array B using the rules above. If at any time during the removal phase array B does not find an element, you can immediately return false. If, after both the adder and remover are finished, the hashmap is empty, the arrays are equivalent.
Edit: The size of the hashtable will be equal to the number of distinct values in the array; does this fit the definition of constant space?
I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.
public static boolean match(int[] array1, int[] array2) {
    // For each position x, find array1[x] somewhere in array2[x..] and
    // swap it into position x. O(n^2) worst case, but no extra space.
    for (int x = 0; x < array1.length; x++) {
        int y = x;
        while (array1[x] != array2[y]) {
            if (y + 1 == array1.length)
                return false;          // array1[x] not present in array2[x..]
            y++;
        }
        int swap = array2[x];          // move the match into position x
        array2[x] = array2[y];
        array2[y] = swap;
    }
    return true;
}
For each array, use the counting sort technique to build the count of the number of elements less than or equal to a particular element. Then compare the two auxiliary arrays at every index; if they are equal, the arrays are equal, otherwise they are not. Counting sort requires O(n), and the array comparison at every index is again O(n), so in total it's O(n), and the space required is equal to the size of the two arrays. Here is a link to counting sort: http://en.wikipedia.org/wiki/Counting_sort.
Given that the ints are in the range -n..+n, a simple way to check a necessary condition for equality (the sums must match) is the following (pseudocode):
// a & b are the array
accumulator = 0
arraysize = size(a)
for(i=0 ; i < arraysize; ++i) {
accumulator = accumulator + a[i] - b[i]
if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)
accumulator must be able to store integers in the range +- 2 * arraysize * n
How 'bout this: XOR all the numbers in both arrays. If the result is 0, you've got a match. (Note that a zero XOR is only a necessary condition, not a sufficient one: {0, 0} and {1, 1} both XOR to 0.)