Finding the smallest positive missing integer in an array several times

I have this algorithm. How can I do it faster than O(n^2)?
Here is the problem:
We are given an array of size k. In one operation, we choose the smallest positive missing integer from the last k elements of the array and append it to the end of the array.
For example, if k = 4 and the array is 4 7 2 2,
after one operation the array becomes 4 7 2 2 1,
and after 2 operations it becomes 4 7 2 2 1 3 (the smallest positive missing integer among 7 2 2 1 is 3).
After k + 1 operations, what's the final array?

I can't explain it better than by providing the steps.
The code is straightforward:
first points to the start of the array, last is first + k (one past the end).
#include <algorithm>
#include <vector>

auto first = array;        // pointer to the first element
auto last = first + k;     // one past the last element
std::sort(first, last);

unsigned seq = 1;          // next candidate for the smallest missing value
// merge the sorted input with the sequence 1, 2, 3, ...
std::vector<unsigned> dest(k + k + 1);  // k originals plus the k + 1 appended values
auto d_i = dest.begin();
auto d_end = dest.end();
while (d_i != d_end) { // we should fill the destination
    if (first != last && *first <= seq) {
        if (seq == *first) seq++; // value already present, skip it in the sequence
        *d_i++ = *first++;        // copy the input value and advance
    } else {
        *d_i++ = seq++;           // seq is missing, so it gets appended
    }
}
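With the example input (k = 4, array 4 7 2 2), this merge fills dest with 1 2 2 3 4 5 6 7 8: the four sorted original values interleaved with the five values 1 3 5 6 8 generated from seq.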

Algorithm to find k smallest numbers in an array in same order using O(1) auxiliary space

For example if the array is arr[] = {4, 2, 6, 1, 5},
and k = 3, then the output should be 4 2 1.
It can be done in O(nk) steps and O(1) space.
Firstly, find the kth smallest number in kn steps: find the minimum; store it in a local variable min; then find the second smallest number, i.e. the smallest number that is greater than min; store it in min; and so on... repeat the process from i = 1 to k (each time it's a linear search through the array).
Having this value, browse through the array and print all elements that are smaller or equal to min. This final step is linear.
Care has to be taken if there are duplicate values in the array. In such a case we have to increment i several times if duplicate min values are found in one pass. Additionally, besides the min variable, we have to keep a count variable, which is reset to zero in each iteration of the main loop and incremented each time a duplicate min value is found.
In the final scan through the array, we print all values smaller than min, and up to count values exactly min.
The algorithm in C would look like this:
int min = MIN_VALUE, local_min;
int count;
int i, j;

i = 0;
while (i < k) {
    local_min = MAX_VALUE;
    count = 0;
    for (j = 0; j < n; j++) {
        if ((arr[j] > min || min == MIN_VALUE) && arr[j] < local_min) {
            local_min = arr[j];
            count = 1;
        }
        else if ((arr[j] > min || min == MIN_VALUE) && arr[j] == local_min) {
            count++;
        }
    }
    min = local_min;
    i += count;
}

if (i > k) {
    count = count - (i - k);
}

for (i = 0, j = 0; i < n; i++) {
    if (arr[i] < min) {
        printf("%d ", arr[i]);
    }
    else if (arr[i] == min && j < count) {
        printf("%d ", arr[i]);
        j++;
    }
}
where MIN_VALUE and MAX_VALUE can be arbitrary sentinels such as -infinity and +infinity, or MIN_VALUE = arr[0] and MAX_VALUE set to the maximal value in arr (the max can be found in an additional initial loop).
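For instance, one illustrative way to define the sentinels (assuming the array holds ints and never contains INT_MIN itself):

#include <limits.h>

#define MIN_VALUE INT_MIN  /* sentinel: no minimum found yet */
#define MAX_VALUE INT_MAX  /* larger than any element of arr */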
Single pass solution - O(k) space (for O(1) space see below).
The order of the items is preserved (i.e. stable).
// Pseudo code
if (arr.size <= k)
    handle special case

array results[k]
int i = 0;

// init
for ( ; i < k; i++) {  // or use memcpy()
    results[i] = arr[i]
}

int max_val = max of results

for ( ; i < arr.size; i++) {
    if (arr[i] < max_val) {
        remove largest in results     // move the remaining up / memmove()
        add arr[i] at end of results  // i.e. results[k-1] = arr[i]
        max_val = new max of results
    }
}
// for larger k you'd want some optimization to get the new max
// and maybe keep track of the position of max_val in the results array
Example:
4 6 2 3 1 5
4 6 2 // init
4 2 3 // remove 6, add 3 at end
2 3 1 // remove 4, add 1 at end
// or the original:
4 2 6 1 5
4 2 6 // init
4 2 1 // remove 6, add 1 -- if max is last, just replace
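For concreteness, here is a minimal C++ sketch of the pseudocode above (names are illustrative; it assumes arr.size() > k, i.e. the special case is already handled):

#include <algorithm>
#include <vector>

std::vector<int> kSmallestStable(const std::vector<int>& arr, std::size_t k) {
    std::vector<int> results(arr.begin(), arr.begin() + k);       // init
    int max_val = *std::max_element(results.begin(), results.end());
    for (std::size_t i = k; i < arr.size(); ++i) {
        if (arr[i] < max_val) {
            // remove the current largest, keeping the order of the rest
            results.erase(std::find(results.begin(), results.end(), max_val));
            results.push_back(arr[i]);                            // add at end
            max_val = *std::max_element(results.begin(), results.end());
        }
    }
    return results;
}

On the first example above it returns 2 3 1, and on the second 4 2 1. For small k the linear scans for max_val are cheap; for larger k you would apply the optimization described below.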
Optimization:
If a few extra bytes are allowed, you can optimize for larger k:
create an array size k of objects {value, position_in_list}
keep the items sorted on value:
new value: drop last element, insert the new at the right location
new max is the last element
sort the end result on position_in_list
for really large k use binary search to locate the insertion point
O(1) space:
If we're allowed to overwrite the data, the same algorithm can be used, but instead of using a separate array[k], use the first k elements of the list (and you can skip the init).
If the data has to be preserved, see my second answer with good performance for large k and O(1) space.
First find the Kth smallest number in the array.
Look at https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-2-expected-linear-time/
The link above shows how you can use randomized quickselect to find the kth smallest element in an average complexity of O(n) time.
Once you have the kth smallest element, loop through the array and print all elements that are less than or equal to it (duplicates of the kth value need care, as discussed in the first answer).
int small = /* kth smallest number in the array, found as above */;
for (int i = 0; i < array.length; i++) {
    if (array[i] <= small) {
        System.out.print(array[i] + " ");
    }
}
A baseline (complexity at most 3n-2 for k=3):
find the min M1 from the end of the list and its position P1 (store it in out[2])
redo it from P1 to find M2 at P2 (store it in out[1])
redo it from P2 to find M3 (store it in out[0])
It can undoubtedly be improved.
Solution with O(1) space and large k (for example 100,000) with only a few passes through the list.
In my first answer I presented a single pass solution using O(k) space with an option for single pass O(1) space if we are allowed to overwrite the data.
For data that cannot be overwritten, ciamej provided an O(1) solution requiring up to k passes through the data, which works great.
However, for large lists (n) and large k we may want a faster solution. For example, with n=100,000,000 (distinct values) and k=100,000 we would have to check 10 trillion items with a branch on each item + an extra pass to get those items.
To reduce the passes over n we can create a small histogram of ranges. This requires a small storage space for the histogram, but since O(1) means constant space (i.e. not depending on n or k) I think we're allowed to do that. That space could be as small as an array of 2 * uint32. Histogram size should be a power of two, which allows us to use bit masking.
To keep the following example small and simple, we'll use a list containing 16-bit positive integers and a histogram of uint32[256] - but it will work with uint32[2] as well.
First, find the k-th smallest number - only 2 passes required:
uint32 hist[256];

First pass: group (count) by multiples of 256 - no branching besides the loop:

loop:
    hist[(arr[i] & 0xff00) >> 8]++;

(Note the parentheses: in C, >> binds tighter than &.) Now we have a count for each range and can calculate which bucket our k is in. Save the total count up to that bucket and reset the histogram.

Second pass: fill the histogram again, now masking the lower 8 bits and only for the numbers belonging in that range. The range check can also be done with a mask.

After this last pass, all values represented in the histogram are unique, and we can easily calculate where our k-th number is. If the count in that slot (which represents our max value after restoring with the previous mask) is higher than one, we'll have to remember that when printing out the numbers.
This is explained in ciamej's post, so I won't repeat it here.
---
With hist[4] and a list of 32-bit integers we would need 8 passes.
The algorithm can easily be adjusted for signed integers.
Example:
k = 7
uint32_t hist[256]; // can be as small as hist[2]
uint16_t arr[]:
88
258
4
524
620
45
440
112
380
580
88
178
Fill histogram with:
hist[(arr[i] & 0xff00) >> 8]++;
hist count
0 (0-255) 6
1 (256-511) 3 -> k
2 (512-767) 3
...
k is in hist[1] -> (256-511)
Clear histogram and fill with range (256-511):
Fill histogram with:
if ((arr[i] & 0xff00) == 0x0100)
hist[arr[i] & 0xff]++;
Numbers in this range are:
258 & 0xff = 2
440 & 0xff = 184
380 & 0xff = 124
hist count
0 0
1 0
2 1 -> k
... 0
124 1
... 0
184 1
... 0
k - 6 (first pass) = 1
k is in hist[2], which is 2 + 256 = 258
Loop through arr[] to display the numbers <= 258 in their preserved order.
Take care of possible duplicate highest numbers (they would show up as hist[2] > 1 in this case); we can easily calculate how many of those we have to print.
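To make the two passes concrete, here is a small C++ sketch for 16-bit values (1-based k; the function name and the ties out-parameter are illustrative):

#include <cstdint>
#include <vector>

uint16_t kthSmallest16(const std::vector<uint16_t>& arr, std::size_t k,
                       std::size_t& ties) {
    uint32_t hist[256] = {0};
    for (uint16_t v : arr) hist[(v & 0xff00) >> 8]++;        // pass 1: high byte

    std::size_t bucket = 0, below = 0;                       // locate k's bucket
    while (below + hist[bucket] < k) below += hist[bucket++];

    for (auto& h : hist) h = 0;                              // reset the histogram
    for (uint16_t v : arr)                                   // pass 2: low byte,
        if ((v & 0xff00) == (bucket << 8)) hist[v & 0xff]++; // this range only

    std::size_t slot = 0, count = below;
    while (count + hist[slot] < k) count += hist[slot++];
    ties = k - count;             // how many copies of the answer to print
    return static_cast<uint16_t>((bucket << 8) | slot);
}

On the example above, kthSmallest16(arr, 7, ties) returns 258 with ties = 1.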
Further optimization:
If we can expect k to be in the lower ranges, we can even optimize this further by using the log2 values instead of fixed ranges:
There is a single CPU instruction to count the leading zero bits (or one bits)
so we don't have to call a standard log() function
but can call an intrinsic function instead.
This would require hist[65] for a list with 64-bit (positive) integers.
We would then have something like:
hist[ 64 - n_leading_zero_bits ]++;
This way the ranges we have to use in the following passes would be smaller.
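A hedged illustration of that bucketing with GCC/Clang's __builtin_clzll intrinsic (the helper name is made up):

#include <cstdint>

inline unsigned log2Bucket(uint64_t v) {
    // bucket i holds the values whose highest set bit is bit i-1;
    // bucket 0 holds the value 0 (clz is undefined for an input of 0)
    return v ? 64 - __builtin_clzll(v) : 0;  // indexes into hist[65]
}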

Recover original array from all subsets

You are given all subset sums of an array. You are then supposed to recover the original array from the subset sums provided.
Every element in the original array is guaranteed to be non-negative and less than 10^5. There are no more than 20 elements in the original array. The original array is also sorted. The input is guaranteed to be valid.
Example 1
If the subset sums provided are this:
0 1 5 6 6 7 11 12
We can quickly deduce that the size of the original array is 3 since there are 8 (2^3) subsets. The output (i.e original array) for the above input is this:
1 5 6
Example 2
Input:
0 1 1 2 8 9 9 10
Output:
1 1 8
What I Tried
Since all elements are guaranteed to be non-negative, the largest integer in the input must be the sum of the whole array. However, I am not sure how to proceed from there. By logic, I thought that the next (2^2 - 1) largest subset sums must each include all except one element of the array.
However, the above logic does not work when the original array is this:
1 1 8
That's why I am stuck and not sure how to proceed.
Say S is the subset sum array and A is the original array. I'm assuming S is sorted.
|A| = log2(|S|)
S[0] = 0
S[1] = A[0]
S[2] = A[1]
S[3] = EITHER A[2] OR A[0] + A[1].
In general, S[i] for i >= 3 is either an element of A or a combination of the elements of A that you've already encountered. When processing S, skip once per combination of known elements of A that generate a given number, add any remaining numbers to A. Stop when A gets to the right size.
E.g., if A=[1,2,7,8,9] then S will include [1,2,1+2=3,...,1+8=9, 2+7=9,9,...]. When processing S we skip over two 9s because of 1+8 and 2+7, then see a third 9 which we know must belong to A.
E.g., if S=[0,1,1,2,8,9,9,10] then we know A has 3 elements, that the first 2 elements of A are [1,1], when we get to 2 we skip it because 1+1=2, we append 8 and we're done because we have 3 elements.
Here's an easy algorithm that doesn't require finding which subset sums to a given number.
S ← input sequence
X ← empty sequence
While S has a non-zero element:
    d ← second smallest element of S (the smallest one is always zero)
    Insert d in X
    N ← empty sequence
    While S is not empty:
        z ← smallest element of S
        Remove both z and z+d from S (if S does not contain z+d, it's an error;
        remove only one instance of each if there are several)
        Insert z in N
    S ← N
Output X.
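A direct C++ translation of this peeling algorithm might look as follows (a sketch assuming the input is a valid multiset of 2^n subset sums, as the problem guarantees):

#include <set>
#include <utility>
#include <vector>

std::vector<long long> recover(std::multiset<long long> S) {
    std::vector<long long> X;
    while (S.size() > 1) {                   // until only the empty-subset 0 remains
        long long d = *std::next(S.begin()); // second smallest (smallest is 0)
        X.push_back(d);
        std::multiset<long long> N;          // sums of the subsets avoiding this d
        while (!S.empty()) {
            long long z = *S.begin();
            S.erase(S.begin());              // remove one instance of z
            S.erase(S.find(z + d));          // and one instance of z + d (must exist)
            N.insert(z);
        }
        S = std::move(N);
    }
    return X;
}

Each round halves S, so for |S| = 2^n the outer loop runs n times; the multiset operations add a logarithmic factor.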
I revisited this question a few years later and finally managed to solve it! The approach I used is the same one Dave devised earlier. Dave gave a pretty concrete explanation, so I'll just add some details and append my commented C++ code to make it a bit clearer.
Excluding the empty set, the two smallest elements in S have to be the two smallest elements in A. This is because every element is guaranteed to be non-negative. Knowing the values of A[0] and A[1], we have something tangible to work with and build on bottom-up.
Following that, any new element in S is either a sum of elements we have already confirmed to be in A, or an entirely new element of A (i.e. S[3] = A[0] + A[1] or S[3] = A[2]). To keep track of this, we can use a frequency table such as an unordered_map<int, int> in C++. We then repeat this process for S[4], S[5], ... to continue filling up A.
To prune the search space, we can stop the moment the size of A reaches log2(|S|), the size of the original array. This drastically cuts unnecessary computation and runtime.
#include <bits/stdc++.h>
using namespace std;
typedef vector<int> vi;

int main () {
    int n; cin >> n;
    vi S, A, sums;
    unordered_map<int, int> freq;
    for (int i = 0; i < (1 << n); i++) {  // read all 2^n subset sums
        int a; cin >> a;
        S.push_back(a);
    }
    sort(S.begin(), S.end());
    // edge cases
    A.push_back(S[1]);
    if (n == 1) { for (auto v : A) cout << v << "\n"; return 0; }
    A.push_back(S[2]);
    if (n == 2) { for (auto v : A) cout << v << "\n"; return 0; }
    sums.push_back(0); sums.push_back(S[1]); sums.push_back(S[2]);
    sums.push_back(S[1] + S[2]);
    freq[S[1] + S[2]]++; // IMPT: we only need frequencies of composite sums
    for (int i = 3; i < (int) S.size(); i++) {
        if ((int) A.size() == n) break; // IMPT: prune the search space
        if (freq[S[i]] == 0) {
            // has to be a new element of A:
            // compute the new subset sums that include this element
            vi newsums = sums;
            for (int j = 0; j < (int) sums.size(); j++) {
                int y = sums[j] + S[i];
                newsums.push_back(y);
                if (j != 0) freq[y]++; // IMPT: only composite sums need a frequency
            }
            // update A and the subset sums
            sums = newsums;
            A.push_back(S[i]);
        } else {
            // has to be a sum of previously confirmed elements of A
            freq[S[i]]--;
        }
    }
    for (auto v : A) cout << v << "\n";
}

How to find all values that occur more than n/k times in an array?

I'm pursuing the Algorithms, Part I course on Coursera, and one of the interview questions (ungraded) is as follows:
Decimal dominants. Given an array with n keys, design an algorithm to find all values that occur more than n/10 times. The expected running time of your algorithm should be linear.
It has a hint:
determine the (n/10)th largest key using quickselect and check if it occurs more than n/10 times.
I don't understand what the (n/10)th largest key has to do with values repeated more than n/10 times. It won't tell me which values occur more than n/10 times.
There's a paper that finds a more general solution for n/k, but I'm having a hard time understanding the code in the paper.
One way to solve it is to sort the input array and then make another pass counting the occurrences of each distinct value. That takes O(n log n) + O(n) time, which is more than the question asks for.
Ideas?
Finding the n/10th largest key (that is, the key that would be at position n/10 if the array was sorted) takes linear time using QuickSelect. If there are less than n/10 copies of this key, then you know that there are not n/10 copies of anything above it in sorted order, because there isn't room for n/10 copies of anything above the key in sorted order. If there are n/10 or more copies, then you have found something that occurs more than n/10 times, and again there can't be anything larger than it that occurs more than n/10 times, because there isn't room for it.
Now you have an array of at most 9n/10 values smaller than the key you have just found left over from QuickSelect. Use another pass of QuickSelect to find the key n/10 from the top of this left over array. As before, you may find a key that occurs n/10 or more times, and whether you do or not you will eliminate at least n/10 entries from the array.
So you can search the whole array with 10 calls of QuickSelect, each taking linear time. 10 is a number fixed in the problem definition, so the whole operation counts as only linear time.
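A sketch of this repeated-selection loop in C++, using std::nth_element as the (average-linear) QuickSelect; the function name is illustrative:

#include <algorithm>
#include <vector>

std::vector<int> decimalDominants(std::vector<int> a) {
    std::vector<int> out;
    const std::size_t n = a.size(), step = n / 10 + 1;
    std::size_t hi = n;                       // candidates live in a[0, hi)
    while (hi > 0) {
        std::size_t pos = hi > step ? hi - step : 0;
        std::nth_element(a.begin(), a.begin() + pos, a.begin() + hi);
        const int key = a[pos];               // the key n/10 from the top
        if (std::count(a.begin(), a.end(), key) >
                static_cast<std::ptrdiff_t>(n / 10))
            out.push_back(key);
        // discard key and everything above it in sorted order
        auto mid = std::partition(a.begin(), a.begin() + pos,
                                  [key](int v) { return v != key; });
        hi = static_cast<std::size_t>(mid - a.begin());
    }
    return out;
}

Each round eliminates at least n/10 + 1 entries from consideration, so the loop body runs at most ten-odd times, each pass linear on average.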
There is a variation of the Boyer-Moore voting algorithm which can find all the elements that occur more than n/k times in an input. It runs in O(nk), and since k = 10 for your problem, it should run in O(10n) = O(n) time.
From here
Following is an interesting O(nk) solution: We can solve the above problem in O(nk) time using O(k-1) extra space. Note that there can never be more than k-1 elements in the output (why?). There are mainly three steps in this algorithm.
1) Create a temporary array of size (k-1) to store elements and their counts (the output elements are going to be among these k-1 elements). The structure of the temporary array elements is:
struct eleCount {
    int element;
    int count;
};
struct eleCount temp[k - 1];
This step takes O(k) time.
2) Traverse through the input array and update temp[] (add/remove an element or increase/decrease a count) for every traversed element. The array temp[] stores the potential (k-1) candidates at every step. This step takes O(nk) time.
3) Iterate through the final (k-1) potential candidates (stored in temp[]). For every element, check if it actually has a count of more than n/k. This step takes O(nk) time.
The main step is step 2: how do we maintain (k-1) potential candidates at every point? The steps used in step 2 are like the famous game Tetris. We treat each number as a piece in Tetris, which falls down in our temporary array temp[]. Our task is to try to keep the same number stacked on the same column (the count in the temporary array is incremented). A compact implementation sketch follows the example below.
Consider k = 4, n = 9. Given array: 3 1 2 2 2 1 4 3 3
i = 0
3 _ _ temp[] has one element, 3 with count 1
i = 1
3 1 _ temp[] has two elements, 3 and 1 with counts 1 and 1 respectively
i = 2
3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 1 respectively.
i = 3
- - 2
3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively.
i = 4
- - 2
- - 2
3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 3 respectively.
i = 5
- - 2
- 1 2
3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 2 and 3 respectively.
Now the question arises: what do we do when temp[] is full and we see a new element? We remove the bottom row from the stacks of elements, i.e., we decrease the count of every element in temp[] by 1, and we ignore the current element.
i = 6
- - 2
- 1 2 temp[] has two elements, 1 and 2 with counts as 1 and 2 respectively.
i = 7
- 2
3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively.
i = 8
3 - 2
3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 2, 1 and 2 respectively.
Finally, we have at most k-1 numbers in temp[]. The elements in temp are {3, 1, 2}. Note that the counts in temp[] are useless now; the counts were needed only in step 2. Now we need to check whether the actual counts of the elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4, so we print 3 and 2.
For a proper proof of this approach, check out this answer on cs.stackexchange.
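For reference, here is a compact C++ sketch of the candidate scheme (the classic Misra-Gries algorithm; the map plays the role of temp[] and the names are illustrative):

#include <unordered_map>
#include <vector>

std::vector<int> moreThanNOverK(const std::vector<int>& a, std::size_t k) {
    std::unordered_map<int, std::size_t> temp;     // at most k-1 candidates
    for (int x : a) {
        auto it = temp.find(x);
        if (it != temp.end()) {
            ++it->second;                          // stack x on its column
        } else if (temp.size() < k - 1) {
            temp.emplace(x, 1);                    // open a new column
        } else {                                   // full: remove the bottom row
            for (auto j = temp.begin(); j != temp.end(); ) {
                if (--j->second == 0) j = temp.erase(j);
                else ++j;
            }
        }
    }
    std::vector<int> out;                          // step 3: verify real counts
    for (const auto& cand : temp) {
        std::size_t real = 0;
        for (int x : a) real += (x == cand.first);
        if (real > a.size() / k) out.push_back(cand.first);
    }
    return out;
}

On the example above (k = 4, array 3 1 2 2 2 1 4 3 3) the surviving candidates are {3, 1, 2}, and the verification step keeps 3 and 2.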
You are right: checking only the n/10 index is not sufficient. The decimal dominant is among the QuickSelect candidates at indexes n/10, 2*n/10, ..., 9*n/10.
Note that a dominant occupies a long run in the sorted array, and at least one of the elements at the mentioned indexes certainly belongs to that run.
Example for k = 3, N = 11. Let element b occupy at least 1/3 of the array. In this case the sorted array might look like:
b b b b * * * * * * *
* b b b b * * * * * *
* * b b b b * * * * *
* * * b b b b * * * *
* * * * b b b b * * *
* * * * * b b b b * *
* * * * * b b b b * *
* * * * * * b b b b *
* * * * * * * b b b b
^ ^ //positions for quickselect
Note that in any case dominant element (if k-dominant does exist) occupies at least one of marked places. So after two rounds of QuickSelect we have two candidates
To tell the truth, I didn't fully get any of the answers here, but they gave me an idea of how to solve the problem. As mentioned in the answer by MBo, you use quickselect for every 10th element, but not a plain quickselect: one based on 3-way quicksort. It places not just every 10th element at its position, but also all elements that are equal to it.
One iteration is then enough to count all elements that occur more than n/10 times.
I solved this question with Dijkstra's approach to the Dutch National Flag (DNF) problem.
Please see my implementation below. Maybe it will be interesting and useful to somebody.
public void dCount(Comparable[] arr) {
    dCount(arr, 0, arr.length - 1);
}

private void dCount(Comparable[] arr, int lo, int hi) {
    if (lo >= hi) return;
    int curr = lo;
    int lt = lo;
    int rt = hi;
    Comparable pivot = arr[lo];
    while (curr <= rt) { // 3-way quicksort partitioning cycle
        if (less(arr[curr], pivot))
            swap(arr, curr++, lt++);
        else if (less(pivot, arr[curr]))
            swap(arr, curr, rt--);
        else curr++;
    }
    int count = curr - lt; // size of the run equal to the pivot
    if (count > arr.length / 10) { // change 10 to your divisor (n/3, n/2, ...) if needed
        System.out.printf("%s repeats %d times\n", arr[lt], count);
    }
    dCount(arr, lo, lt - 1);  // recur on the left side of the pivot-equal range
    dCount(arr, rt + 1, hi);  // recur on the right side of the pivot-equal range
}

private static boolean less(Comparable a, Comparable b) {
    return a.compareTo(b) < 0;
}

private static void swap(Comparable[] arr, int i, int j) {
    Comparable swap = arr[i];
    arr[i] = arr[j];
    arr[j] = swap;
}

Optimal way to find the number of operations required to convert all K numbers to lie in the range [L,R] (i.e. L≤x≤R)

I am solving this question, which requires some optimized techniques. I can only think of the brute-force method, which requires combinatorics.
Given an array A consisting of n integers. We call an integer "good" if it lies in the range [L,R] (i.e. L≤x≤R). We need to make sure that if we pick any K integers from the array, at least one of them should be a good integer.
For achieving this, in a single operation, we are allowed to increase/decrease any element of the array by one.
What will be the minimum number of operations we will need for a fixed k?
i.e. for each k = 1 to n.
input:
L R
1 2
A=[ 1 3 3 ]
output:
for k=1 : 2
for k=2 : 1
for k=3 : 0
For k=1, you have to convert both the 3s into 2s to make sure that if
you select any one of the 3 integers, the selected integer is good.
For k=2, one of the possible ways is to convert one of the 3s into 2.
For k=3, no operation is needed as 1 is a good integer.
As burnpanck has explained in his answer, to make sure that when you pick any k elements in the array at least one of them is in range [L,R], we need to make sure that there are at least n - k + 1 numbers in range [L,R] in the array.
So first, for each element, we calculate the cost of making it a valid element (one that is in range [L,R]) and store those costs in an array cost.
We notice that:
For k = 1, the minimum cost is the sum of the array cost.
For k = 2, the minimum cost is the sum of cost, minus its largest element.
For k = 3, the minimum cost is the sum of cost, minus its two largest elements.
...
So, we need a prefixSum array whose ith entry is the sum of the sorted cost array from 0 to i.
After calculating prefixSum, we can answer the result for each k in O(1).
So here is the algorithm in Java; notice that the time complexity is O(n log n):
int[] cost = new int[n];
for (int i = 0; i < n; i++)
    cost[i] = Math.max(0, Math.max(L - A[i], A[i] - R)); // min ops to bring A[i] into [L,R]
Arrays.sort(cost);
int[] prefix = new int[n];
for (int i = 0; i < n; i++)
    prefix[i] = cost[i] + (i > 0 ? prefix[i - 1] : 0);
for (int i = n - 1; i >= 0; i--)
    System.out.println("Result for k = " + (n - i) + " is " + prefix[i]);
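On the sample input (L = 1, R = 2, A = [1, 3, 3]) the costs are [0, 1, 1], prefix becomes [0, 1, 2], and the loop prints 2, 1 and 0 for k = 1, 2 and 3, matching the expected output above.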
To be sure that picking any k elements gives at least one valid element, you should have no more than k-1 invalid elements in your set. You therefore need to find the cheapest way to make enough elements valid. I would do this as follows: in a single pass, generate a map that counts how many elements in the set need n operations to be made valid. Then, since you clearly want to take the elements that need the fewest operations, take the required number of elements in ascending order of the required number of operations, and sum the numbers of operations.
In python:
def min_ops(L, R, A_set):
    n_ops = dict()  # maps: ops required -> number of elements needing that many
    for a in A_set:  # loop over all a in the set A_set
        n = max(0, max(a - R, L - a))  # ops required to make a valid
        n_ops[n] = n_ops.get(n, 0) + 1  # increment the entry keyed by n (default 0)
    allret = []  # holds the result for all k
    for k in range(1, len(A_set) + 1):  # k in [1, N]
        n_good_required = len(A_set) - k + 1
        ret = 0
        # iterate over (ops required, elements available) pairs, sorted by key
        for n, nel in sorted(n_ops.items()):
            if n_good_required <= 0:
                break
            ret += n * min(nel, n_good_required)
            n_good_required -= nel
        allret.append(ret)  # append the answer for this k
    return allret
As an example:
A_set = [1,3,3,6,8,5,4,7]
L,R = 4,6
For each A, we find how many operations we need to make it valid:
n = [3,1,1,0,2,0,0,1]
(i.e. 1 needs 3 steps, 3 needs one, and so on)
Then we count them:
n_ops = {
0: 3, # we already have three valid elements
1: 3, # three elements that require one op
2: 1,
3: 1, # and finally one that requires 3 ops
}
Now, for each k, we find out how many valid elements we need in the set,
e.g. for k = 4, we need at most 3 invalid in the set of 8, so we need 5 valid ones.
Thus:
ret = 0
n_good_required = 5
with n=0, we have 3 valid elements, so take all of them:
ret = 0
n_good_required = 2
with n=1, we have 3 elements, but we need just two, so take those:
ret = 2
and we're finished

Find pairs that sum to X in an array of integers of size N having element in the range 0 to N-1

It is an interview question. We have an array of N integers containing elements between 0 and N-1. It is possible for a number to occur more than twice. The goal is to find pairs that sum to a given number X.
I did it using an auxiliary array holding the counts of the elements of the primary array, then rearranging the primary array according to the auxiliary array so that the primary is sorted, and then searching for pairs.
But the interviewer wanted constant space complexity, so I suggested sorting the array, but that is an O(n log n) time solution. He wanted O(n).
Is there any method to do it in O(n) without any extra space?
No, I don't believe so. You either need extra space to be able to "sort" the data in O(n) by assigning to buckets, or you need to sort in-place which will not be O(n).
Of course, there are always tricks if you can make certain assumptions. For example, if N < 64K and your integers are 32 bits wide, you can multiplex the space required for the count array on top of the current array.
In other words, use the lower 16 bits for storing the values in the array and then use the upper 16 bits for your array where you simply store the count of values matching the index.
Let's use a simplified example where N == 8. Hence the array is 8 elements in length and the integers at each element are less than 8, though they're eight bits wide. That means (initially) the top four bits of each element are zero.
0 1 2 3 4 5 6 7 <- index
(0)7 (0)6 (0)2 (0)5 (0)3 (0)3 (0)7 (0)7
The pseudo-code for an O(n) adjustment which stores the count into the upper four bits is:
for idx = 0 to N-1:
    array[array[idx] % 16] += 16  // add 1 to the top four bits
By way of example, consider the first index which stores 7. That assignment statement will therefore add 16 to index 7, upping the count of sevens. The modulo operator is to ensure that values which have already been increased only use the lower four bits to specify the array index.
So the array eventually becomes:
0 1 2 3 4 5 6 7 <- index
(0)7 (0)6 (1)2 (2)5 (0)3 (1)3 (1)7 (3)7
Then you have your new array in constant space and you can just use int (array[X] / 16) to get the count of how many X values there were.
But that's pretty devious and requires the assumptions mentioned before. It may well be that that level of deviousness is what the interviewer was looking for, or they may just want to see how a prospective employee handles the Kobayashi Maru of coding :-)
Once you have the counts, it's a simple matter to find pairs that sum to a given X, still in O(N). The basic approach is to take the Cartesian product of the counts. For example, again consider that N is 8 and you want pairs that sum to 8. Ignoring the lower half of the multiplexed array above (since you're only interested in the counts), you have:
0 1 2 3 4 5 6 7 <- index
(0) (0) (1) (2) (0) (1) (1) (3)
What you basically do is step through the array one by one getting the product of the counts of numbers that sum to 8.
For 0, you would need to add 8 (which doesn't exist).
For 1, you need to add 7. The product of the counts is 0 x 3, so that gives nothing.
For 2, you need to add 6. The product of the counts is 1 x 1, so that gives one occurrence of (2,6).
For 3, you need to add 5. The product of the counts is 2 x 1, so that gives two occurrences of (3,5).
For 4, it's a special case since you can't use the product. Here it doesn't matter since there are no 4s, but a single 4 couldn't form a pair by itself. Where the two numbers you're pairing are the same, the formula (assuming there are m of them) is 1 + 2 + 3 + ... + (m-1). With a bit of mathematical wizardry, that turns out to be m(m-1)/2.
Beyond that, you're pairing with values to the left, which you've already done so you stop.
So what you have ended up with from
a b c d e f g h <- identifiers
7 6 2 5 3 3 7 7
is:
(2,6) (3,5) (3,5)
(c,b) (e,d) (f,d) <- identifiers
No other values add up to 8.
The following program illustrates this in operation:
#include <stdio.h>

int arr[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 4, 4, 4, 4};
#define SZ (sizeof(arr) / sizeof(*arr))

static void dumpArr (char *desc) {
    int i;
    printf ("%s:\n Indexes:", desc);
    for (i = 0; i < SZ; i++) printf (" %2d", i);
    printf ("\n Counts :");
    for (i = 0; i < SZ; i++) printf (" %2d", arr[i] / 100);
    printf ("\n Values :");
    for (i = 0; i < SZ; i++) printf (" %2d", arr[i] % 100);
    puts ("\n=====\n");
}
That bit above is just for debugging. The actual code to do the bucket sort is below:
int main (void) {
    int i, j, find, prod;
    dumpArr ("Initial");

    // Counting pass ("bucket sort"): O(n) time, O(1) extra space.
    for (i = 0; i < SZ; i++) {
        arr[arr[i] % 100] += 100;
    }
And we finish with the code to do the pairings:
    dumpArr ("After bucket sort");

    // Now do pairings.
    find = 8;
    for (i = 0, j = find - i; i <= j; i++, j--) {
        if (i == j) {
            prod = (arr[i] / 100) * (arr[i] / 100 - 1) / 2;
            if (prod > 0) {
                printf ("(%d,%d) %d time(s)\n", i, j, prod);
            }
        } else {
            if ((j >= 0) && (j < SZ)) {
                prod = (arr[i] / 100) * (arr[j] / 100);
                if (prod > 0) {
                    printf ("(%d,%d) %d time(s)\n", i, j, prod);
                }
            }
        }
    }
    return 0;
}
The output is:
Initial:
Indexes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Counts : 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Values : 3 1 4 1 5 9 2 6 5 3 5 8 9 4 4 4 4
=====
After bucket sort:
Indexes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Counts : 0 2 1 2 5 3 1 0 1 2 0 0 0 0 0 0 0
Values : 3 1 4 1 5 9 2 6 5 3 5 8 9 4 4 4 4
=====
(2,6) 1 time(s)
(3,5) 6 time(s)
(4,4) 10 time(s)
and, if you examine the input digits, you'll find the pairs are correct.
This may be done by converting the input array to a list of counters "in place" in O(N) time. Of course, this assumes the input array is not immutable. There is no need for any additional assumptions about unused bits in each array element.
Start with the following pre-processing: try to move each array element to the position determined by its value; move the element already at that position to the position determined by its own value; continue until:
the next element is moved to the position from which this cycle was started, or
the next element cannot be moved because it is already at the position corresponding to its value (in this case, put the current element at the position from which this cycle was started).
After pre-processing, every element either is located at its "proper" position or "points" to its "proper" position. If we had an unused bit in each element, we could convert each properly positioned element into a counter, initialize it with "1", and let each "pointing" element increment the appropriate counter. The additional bit would let us distinguish counters from values. The same thing may be done without any additional bits, but with a less trivial algorithm.
Count how many values in the array are equal to 0 or 1. If there are any such values, reset them to zero and update the counters at positions 0 and/or 1. Set k = 2 (the size of the array's prefix where values less than k have been replaced by counters). Then apply the following procedure for k = 2, 4, 8, ...:
Find the elements at positions k .. 2k-1 which are at their "proper" positions and replace them with counters initialized to "1".
For any element at positions k .. 2k-1 with a value 2 .. k-1, update the corresponding counter at positions 2 .. k-1 and reset the value to zero.
For any element at positions 0 .. 2k-1 with a value k .. 2k-1, update the corresponding counter at positions k .. 2k-1 and reset the value to zero.
All iterations of this procedure together take O(N) time. At the end, the input array is completely converted to an array of counters. The only difficulty is that up to two counters at positions 0 .. 2k-1 may have values greater than k-1. This can be mitigated by storing two additional indexes for each of them and processing the elements at those indexes as counters instead of values.
After the array of counters is produced, we can just multiply pairs of counters (where the corresponding pair of indexes sums to X) to get the required counts of pairs.
Sorting is O(n log n); however, if you can assume the numbers are bounded (and you can, because you're only interested in numbers that sum to a certain value), you can use a radix sort. Radix sort takes O(kN) time, where k is the length of the key. That's a constant in your case, so I think it's fair to say O(N).
Generally, though, I would solve this using a hash, e.g.
http://41j.com/blog/2012/04/find-items-in-an-array-that-sum-to-15/
though that is of course not a linear-time solution.
