Compare two arrays and create new array with equal elements in C - c

The problem is to check two arrays for the same integer value and put matching values in a new array.
Let's say I have two arrays:
a[n] = {2,5,2,7,8,4,2}
b[m] = {1,2,6,2,7,9,4,2,5,7,3}
Each array can be a different size.
I need to check if the arrays have matching elements and put them in a new array. The result in this case should be:
array[] = {2,2,2,5,7,4}
And I need to do it in O(n log n + m log m).
I know there is a way to do it with merge sort or by putting one of the arrays in a hash table, but I really don't know how to implement it.
I will really appreciate your help, thanks!!!

As you have already figured out, you can use merge sort (implementing it is beyond the scope of this answer; you can find an implementation on Wikipedia or by searching Stack Overflow), which gives O(n log n + m log m) complexity, where n is the size of the first array and m is the size of the second.
Let's call the first array a (with size n) and the second one b (with size m). First sort both arrays. Now we have:
a[n] // {2,2,2,4,5,7,8} and b[m] // {1,2,2,2,3,4,5,6,7,7,9}
Supposing n <= m, we can simply iterate over both arrays simultaneously, comparing corresponding values.
But first let's allocate an array int c[n]; to store the results (you can print to the console instead of storing them if you prefer). And now the loop itself:
int k = 0; // store the new size of c array!
for (int i = 0, j = 0; i < n && j < m; )
{
    if (a[i] == b[j])
    {
        // match found, store it
        c[k] = a[i];
        ++i; ++j; ++k;
    }
    else if (a[i] > b[j])
    {
        // current value in a is leading, go to next in b
        ++j;
    }
    else
    {
        // the last possibility is a[i] < b[j] - b is leading
        ++i;
    }
}
Note: the loop itself is O(n + m) at worst (remember the n <= m assumption), which is less than the cost of sorting, so the overall complexity is O(n log n + m log m). Now you can iterate over the c array (its allocated size is n, but the number of elements in it is k) and do what you need with those numbers.
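Putting the sort and the loop together, a minimal sketch in C could look like the following. The names common_elements and cmp_int are made up for this illustration, and qsort from the standard library stands in for merge sort. The C standard does not promise O(n log n) for qsort, so swap in your own merge sort if the bound has to be guaranteed.

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of ints. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts a and b in place, then writes the values common to both
 * (with multiplicity) into c. c needs room for min(n, m) ints.
 * Returns the number of values written (k in the answer above). */
size_t common_elements(int *a, size_t n, int *b, size_t m, int *c)
{
    size_t i = 0, j = 0, k = 0;

    qsort(a, n, sizeof *a, cmp_int);
    qsort(b, m, sizeof *b, cmp_int);

    while (i < n && j < m) {
        if (a[i] == b[j]) {          /* match found, store it */
            c[k++] = a[i];
            ++i;
            ++j;
        } else if (a[i] > b[j]) {    /* a is leading, advance b */
            ++j;
        } else {                     /* b is leading, advance a */
            ++i;
        }
    }
    return k;
}
```

For the arrays in the question this writes 2, 2, 2, 4, 5, 7 into c and returns 6. The matches come out in sorted order rather than in the order shown in the question.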

From the way that you explain it the way to do this would be to loop over the shorter array and check it against the longer array. Let us assume that A is the shorter array and B the longer array. Create a results array C.
Loop over each element in A, call it I
If I is found in B, remove it from B and put it in C, break out of the test loop.
Now go to the next element in A.
This means that if a number I is found twice in A and three times in B, then I will only appear twice in C. Once you finish, then every number found in both arrays will appear in C the number of times that it actually appears in both.
I am carefully not putting in suggested code as your question is about a method that you can use. You should figure out the code yourself.

I would be inclined to take the following approach:
1) Sort array B. There are many well published sort algorithms to do this, as well as several implementations in various generally available libraries.
2) Loop through array A and for each element do a binary search (or other suitable algorithm) on array B for a match. If a match is found, remove the element from array B (to avoid future matches) and add it to the output array.
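A rough sketch of this sort-then-search idea in C follows. Physically removing an element from a sorted array costs O(m) each time, so this sketch swaps in a different bookkeeping trick (an assumption of mine, not part of the answer): the sorted B is compressed into distinct values plus a count per value, and "removing" a match just decrements its count. The names common_by_search and cmp_int are hypothetical.

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of ints. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Writes to out every element of a that can be paired with a distinct
 * element of b. b is sorted and overwritten. out needs room for n ints.
 * Returns the number of values written (0 if allocation fails). */
size_t common_by_search(const int *a, size_t n, int *b, size_t m, int *out)
{
    size_t distinct = 0, k = 0;
    size_t *count;

    if (m == 0) return 0;
    count = calloc(m, sizeof *count);
    if (!count) return 0;

    qsort(b, m, sizeof *b, cmp_int);

    /* Compress runs of equal values: b[0..distinct) holds each value
     * once and count[] holds how many times it occurred. */
    for (size_t i = 0; i < m; ++i) {
        if (distinct > 0 && b[distinct - 1] == b[i]) {
            ++count[distinct - 1];
        } else {
            b[distinct] = b[i];
            count[distinct] = 1;
            ++distinct;
        }
    }

    for (size_t i = 0; i < n; ++i) {
        /* Binary search for a[i] among the distinct values. */
        size_t lo = 0, hi = distinct;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (b[mid] < a[i]) lo = mid + 1;
            else hi = mid;
        }
        if (lo < distinct && b[lo] == a[i] && count[lo] > 0) {
            --count[lo];             /* "remove" one copy from B */
            out[k++] = a[i];
        }
    }
    free(count);
    return k;
}
```

With the arrays from the question the output keeps the order of A: 2, 5, 2, 7, 4, 2.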

Related

Given an array of integers of size n+1 consisting of the elements [1,n]. All elements are unique except one which is duplicated k times

I have been attempting to solve the following problem:
You are given an array of n+1 integers where all the elements lies in [1,n]. You are also given that one of the elements is duplicated a certain number of times, whilst the others are distinct. Develop an algorithm to find both the duplicated number and the number of times it is duplicated.
Here is my solution where I let k = number of duplications:
#include <vector>

struct LatticePoint { // to hold duplicate (a) and k (b)
    int a;
    int b;
    LatticePoint(int a_, int b_) : a(a_), b(b_) {}
};
LatticePoint findDuplicateAndK(const std::vector<int>& A) {
    int n = A.size() - 1;
    std::vector<int> Numbers(n); // Numbers[v - 1] counts occurrences of v
    for (int i = 0; i < n + 1; ++i) {
        ++Numbers[A[i] - 1]; // A[i] in range [1,n] so no out-of-bounds access
    }
    int i = 0;
    while (i < n) {
        if (Numbers[i] > 1) {
            int duplicate = i + 1;
            int k = Numbers[i] - 1;
            return LatticePoint{duplicate, k};
        }
        ++i;
    }
    return LatticePoint{0, 0}; // not reached for valid input
}
So, the basic idea is this: we go along the array and each time we see the number A[i] we increment the value of Numbers[A[i] - 1]. Since only the duplicate appears more than once, the entry of Numbers with a value greater than 1 identifies the duplicate number (its index plus 1), and the value of that entry minus 1 is the number of duplications k. This algorithm is O(n) in time complexity and O(n) in space.
I was wondering if someone had a solution that is better in time and/or space? (or indeed if there are any errors in my solution...)
You can reduce the scratch space to n bits instead of n ints, provided you either have or are willing to write a bitset with run-time specified size (see boost::dynamic_bitset).
You don't need to collect duplicate counts until you know which element is duplicated, and then you only need to keep that count. So all you need to track is whether you have previously seen the value (hence, n bits). Once you find the duplicated value, set count to 2 and run through the rest of the vector, incrementing count each time you hit an instance of the value. (You initialise count to 2, since by the time you get there, you will have seen exactly two of them.)
That's still O(n) space, but the constant factor is a lot smaller.
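Since boost::dynamic_bitset is a C++ facility, here is a rough C sketch of the same idea using a hand-rolled bit array. The names DupResult and find_dup_bits are invented for this example, and the count it reports is the total number of occurrences of the duplicated value. It assumes every value really lies in [1, n].

```c
#include <limits.h>
#include <stdlib.h>

typedef struct {
    int value;   /* the duplicated number, 0 if none was found */
    int count;   /* total number of times it occurs */
} DupResult;

/* a holds len = n + 1 values, each assumed to lie in [1, n]. */
DupResult find_dup_bits(const int *a, size_t len)
{
    DupResult r = {0, 0};
    const size_t bits = sizeof(unsigned) * CHAR_BIT;
    unsigned *seen;
    size_t i;

    if (len < 2) return r;
    /* One bit per possible value; rounded up to whole words. */
    seen = calloc((len - 1) / bits + 1, sizeof *seen);
    if (!seen) return r;

    /* Phase 1: mark values as seen until one shows up a second time. */
    for (i = 0; i < len; ++i) {
        size_t bit = (size_t)a[i] - 1;
        unsigned mask = 1u << (bit % bits);
        if (seen[bit / bits] & mask) {
            r.value = a[i];
            r.count = 2;      /* exactly two seen so far */
            ++i;
            break;
        }
        seen[bit / bits] |= mask;
    }
    /* Phase 2: count the remaining instances of the duplicate. */
    for (; i < len; ++i)
        if (a[i] == r.value) ++r.count;

    free(seen);
    return r;
}
```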
The idea of your code works.
But, thanks to the n+1 elements, we can achieve other tradeoffs of time and space.
If we have some number of buckets we're dividing numbers between, putting n+1 numbers in means that some bucket has to wind up with more than expected. This is a variant on the well-known pigeonhole principle.
So we use 2 buckets, one for the range 1..floor(n/2) and one for floor(n/2)+1..n. After one pass through the array, we know which half the answer is in. We then divide that half into halves, make another pass, and so on. This leads to a binary search which will get the answer with O(1) data, and with ceil(log_2(n)) passes, each taking time O(n). Therefore we get the answer in time O(n log(n)).
Now we don't need to use 2 buckets. If we used 3, we'd take ceil(log_3(n)) passes. So as we increased the fixed number of buckets, we take more space and save time. Are there other tradeoffs?
Well, you showed how to do it in 1 pass with n buckets. How many buckets do you need to do it in 2 passes? The answer turns out to be at least sqrt(n) buckets. And 3 passes is possible with the cube root. And so on.
So you get a whole family of tradeoffs where the more buckets you have, the more space you need, but the fewer passes you make. And your solution is merely at the extreme end, taking the most space and the least time.
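The two-bucket case can be sketched in C roughly as follows. The names DupResult, count_in_range and find_dup_bisect are made up for the illustration. Only the lower bucket is counted on each pass: if it holds more elements than it has distinct values, the duplicate must be in it, otherwise it must be in the upper bucket.

```c
#include <stddef.h>

typedef struct {
    int value;   /* the duplicated number, 0 if the input is too short */
    int count;   /* total number of times it occurs */
} DupResult;

/* One pass over the array: how many elements fall in [lo, hi]? */
static size_t count_in_range(const int *a, size_t len, int lo, int hi)
{
    size_t c = 0;
    for (size_t i = 0; i < len; ++i)
        if (a[i] >= lo && a[i] <= hi) ++c;
    return c;
}

/* a holds len = n + 1 values in [1, n]; exactly one value repeats.
 * Uses O(1) extra space and about log2(n) passes over the array. */
DupResult find_dup_bisect(const int *a, size_t len)
{
    DupResult r = {0, 0};
    int lo = 1, hi;

    if (len < 2) return r;
    hi = (int)(len - 1);

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        /* The range [lo, mid] has mid - lo + 1 distinct values. More
         * elements than that means the duplicate lives in this half. */
        if (count_in_range(a, len, lo, mid) > (size_t)(mid - lo + 1))
            hi = mid;
        else
            lo = mid + 1;
    }
    r.value = lo;
    r.count = (int)count_in_range(a, len, lo, lo);
    return r;
}
```

This also works when some values in [1, n] are missing, because the other elements are distinct and so can never overfill a range on their own.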
Here's a cheekier algorithm, which requires only constant space but rearranges the input vector. (It only reorders; all the original elements are still present at the end.)
It's still O(n) time, although that might not be completely obvious.
The idea is to try to rearrange the array so that A[i] is i, until we find the duplicate. The duplicate will show up when we try to put an element at the right index and it turns out that that index already holds that element. With that, we've found the duplicate; we have a value we want to move to A[j] but the same value is already at A[j]. We then scan through the rest of the array, incrementing the count every time we find another instance.
#include <utility>
#include <vector>
/* A is taken by reference so that the reordering happens in place;
 * passing it by value would copy the vector and cost O(n) space. */
std::pair<int, int> count_dup(std::vector<int>& A) {
    /* Try to put each element in its "home" position (that is,
     * where the value is the same as the index). Since the
     * values start at 1, A[0] isn't home to anyone, so we start
     * the loop at 1.
     */
    int n = A.size();
    for (int i = 1; i < n; ++i) {
        while (A[i] != i) {
            int j = A[i];
            if (A[j] == j) {
                /* j is the duplicate. Now we need to count them.
                 * We have one at i. There's one at j, too, but we only
                 * need to add it if we're not going to run into it in
                 * the scan. And there might be one at position 0. After that,
                 * we just scan through the rest of the array.
                 */
                int count = 1;
                if (A[0] == j) ++count;
                if (j < i) ++count;
                for (++i; i < n; ++i) {
                    if (A[i] == j) ++count;
                }
                return std::make_pair(j, count);
            }
            /* This swap can only happen once per element. */
            std::swap(A[i], A[j]);
        }
    }
    /* If we get here, every element from 1 to n is at home.
     * So the duplicate must be A[0], and the duplicate count
     * must be 2.
     */
    return std::make_pair(A[0], 2);
}
A parallel solution with O(1) complexity is possible.
Introduce an array of atomic booleans and two atomic integers called duplicate and count. First set count to 1. Then access the array in parallel at the index positions of the numbers and perform a test-and-set operation on the boolean. If a boolean is set already, assign the number to duplicate and increment count.
This solution may not always perform better than the suggested sequential alternatives. Certainly not if all numbers are duplicates. Still, it has constant complexity in theory. Or maybe linear complexity in the number of duplicates. I am not quite sure. However, it should perform well when using many cores and especially if the test-and-set and increment operations are lock-free.

How to merge two unsorted Arrays into another considering time complexity

I have two or more arrays. I want to form a new sorted array from them.
I don't want to combine the arrays and sort afterwards. I want to get it on the go.
Time complexity is considered.
Any programming language/algorithm is fine.
So basically you need to find, for each position in the new array, the smallest remaining value.
The inputs are the arrays a and b with n and m elements. Find the min value in each array. Compare the two values and set the smaller one to the max value in its array, to prevent it from being found again. Store that min value in c.
Pseudocode:
merge(a[n], b[m]) {
    c[n+m];
    for (i = 0; i < n+m; i++) {
        aMin = getMinValue(a);
        bMin = getMinValue(b);
        if (aMin < bMin) setMinValueToMax(a, aMin)
        else setMinValueToMax(b, bMin)
        c[i] = min(aMin, bMin)
    }
    return c;
}
Here n is the size of the bigger array.
getMinValue runs in O(n): just iterate over all elements. setMinValueToMax runs in O(n) as well if you don't save the index. The for-loop runs n+m times, which is O(n). The body of the loop is O(3n) (2 x getMinValue + setMinValueToMax). In big-O notation constant factors can be dropped, which leads to a runtime of O(n^2).
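For reference, the pseudocode could be fleshed out in C roughly like this. The names min_index and merge_unsorted are invented here, INT_MAX plays the role of the "max value" marker, and the sketch therefore assumes that INT_MAX never occurs in the input. It remembers the index of each minimum, so marking an element is O(1), but the overall runtime is still O(n^2). Both input arrays are overwritten.

```c
#include <limits.h>
#include <stddef.h>

/* Index of the smallest element of x[0..len); len must be > 0. */
static size_t min_index(const int *x, size_t len)
{
    size_t best = 0;
    for (size_t i = 1; i < len; ++i)
        if (x[i] < x[best]) best = i;
    return best;
}

/* Fills c (room for n + m ints) with the elements of a and b in
 * ascending order. a and b are destroyed: each consumed element is
 * replaced by INT_MAX. Assumes no input element equals INT_MAX. */
void merge_unsorted(int *a, size_t n, int *b, size_t m, int *c)
{
    for (size_t i = 0; i < n + m; ++i) {
        size_t ia = n ? min_index(a, n) : 0;
        size_t ib = m ? min_index(b, m) : 0;
        int a_min = n ? a[ia] : INT_MAX;   /* empty array: never chosen */
        int b_min = m ? b[ib] : INT_MAX;

        if (a_min < b_min) {
            c[i] = a_min;
            a[ia] = INT_MAX;               /* mark as consumed */
        } else {
            c[i] = b_min;
            if (m) b[ib] = INT_MAX;        /* mark as consumed */
        }
    }
}
```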

Merge k sorted arrays using C

I need to merge k (1 <= k <= 16) sorted arrays into one sorted array. This is for a homework assignment and the Professor requires that this be done using an O(n) algorithm. Merging 2 arrays is no problem and I can do it easily using an O(n) algorithm. I feel that what my professor is asking is undoable for n arrays with an O(n) algorithm.
I am using the below algorithm to split the array indices and running InsertionSort on each partition. I could save these start and end indices into a 2D array. I just don't see how the merging can be done in O(n), because this is going to require more than one loop. If it is possible, does anyone have any hints? I'm not looking for actual code, just a hint as to where I should start.
int chunkSize = round(float(arraySize) / numThreads);
for (int i = 0; i < numThreads; i++) {
    int start = i * chunkSize;
    int end = start + chunkSize - 1;
    if (i == numThreads - 1) {
        end = arraySize - 1;
    }
    InsertionSort(&array[start], end - start + 1);
}
EDIT: The requirement is that the algorithm be O(n) where n is the number of elements in the array. Also, I need to solve this without using a min heap.
EDIT #2: Here is an algorithm I came up with. The problem here is that I'm not storing the result of each iteration back into the original array. I could just copy all of it back in a for loop, but that would be expensive. Is there any way I can do this, other than using something like memcpy? In the below code, indices is a 2D array [numThreads][2] where indices[i][0] is the start index and indices[i][1] is the end index of the ith array.
void mergeArrays(int array[], int indices[][2], int threads, int result[]) {
    for (int i = 0; i < threads - 1; i++) {
        int resPos = 0;
        int lhsPos = 0;
        int lhsEnd = indices[i][1];
        int rhsPos = indices[i+1][0];
        int rhsEnd = indices[i+1][1];
        while (lhsPos <= lhsEnd && rhsPos <= rhsEnd) {
            if (array[lhsPos] <= array[rhsPos]) {
                result[resPos] = array[lhsPos];
                lhsPos++;
            } else {
                result[resPos] = array[rhsPos];
                rhsPos++;
            }
            resPos++;
        }
        while (lhsPos <= lhsEnd) {
            result[resPos] = array[lhsPos];
            lhsPos++;
            resPos++;
        }
        while (rhsPos <= rhsEnd) {
            result[resPos] = array[rhsPos];
            rhsPos++;
            resPos++;
        }
    }
}
You can merge K sorted arrays into one sorted array with an O(N*log(K)) algorithm, using a priority queue with K entries, where N is the overall number of elements in all arrays.
If K is considered a constant (it is limited to 16 in your case), then the complexity is O(N).
Note again: N is the number of elements in my post, not the number of arrays.
It is impossible to merge arrays in O(K): a simple copy takes O(N).
Using the facts you provided:
(1) n is the number of arrays to merge;
(2) the arrays to be merged are already sorted;
(3) the merge needs to be of order n, that is linear in the number of arrays
(and NOT linear in the number of elements in each array, as you might mistakenly think at first sight).
Use the analogy of merging 4 sorted piles of cards, low to high, face up. You would pick the card with the lowest face value from one of the piles and put it (face down) on the merged deck, until all piles are exhausted.
For your program: keep a counter for each array for the number of elements you have already transferred to the output. This is at the same time an index to the next element in each array NOT merged in the output. Pick the smallest element that you find at one of these locations. You have to lookup the first waiting element in all the arrays for that, so that is of order n.
Also, I don't understand why the answer from MoB got up-votes, it does not answer the question.
Here is one way to do it (pseudocode)
input array[k][n]
init indices[k] = { 0, 0, 0, ... }
init queue = { empty priority queue }
for i in 0..k:
    insert i into queue with priority (array[i][0])
while queue is not empty:
    let x = pop queue
    output array[x][indices[x]]
    increment indices[x]
    if indices[x] < n:
        insert x into queue with priority (array[x][indices[x]])
This can probably be simplified further in C. You would have to find a suitable queue implementation to use though as there are none in libc.
Complexity for this operation:
"while queue is not empty" => O(n)
"insert x into queue ..." => O(log k)
=> O(n log k)
Which, if you consider k = constant, is O(n).
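Since libc has no priority queue, one way to flesh this out in C is a small hand-written binary min-heap of array indices, keyed by each array's current head element. Everything named here (Run, head, sift_down, merge_k_heap) is invented for this sketch, and unlike the pseudocode it lets the arrays have different lengths.

```c
#include <stddef.h>

typedef struct {
    const int *data;  /* one sorted input array */
    size_t len;       /* number of elements in it */
    size_t pos;       /* index of the next element not yet output */
} Run;

/* Current head element of run number idx. */
static int head(const Run *runs, size_t idx)
{
    return runs[idx].data[runs[idx].pos];
}

/* Restore the min-heap property below position i.
 * heap[] holds run indices ordered by their head elements. */
static void sift_down(size_t *heap, size_t size, size_t i, const Run *runs)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, s = i, tmp;
        if (l < size && head(runs, heap[l]) < head(runs, heap[s])) s = l;
        if (r < size && head(runs, heap[r]) < head(runs, heap[s])) s = r;
        if (s == i) return;
        tmp = heap[i]; heap[i] = heap[s]; heap[s] = tmp;
        i = s;
    }
}

/* Merges k sorted runs into out, which needs room for all elements.
 * heap is caller-provided scratch space for k entries.
 * Returns the number of elements written. */
size_t merge_k_heap(Run *runs, size_t k, size_t *heap, int *out)
{
    size_t size = 0, n = 0;

    for (size_t i = 0; i < k; ++i) {
        runs[i].pos = 0;
        if (runs[i].len > 0) heap[size++] = i;   /* skip empty runs */
    }
    for (size_t i = size / 2; i-- > 0; )         /* build the heap */
        sift_down(heap, size, i, runs);

    while (size > 0) {
        Run *r = &runs[heap[0]];                 /* smallest head */
        out[n++] = r->data[r->pos++];
        if (r->pos == r->len)
            heap[0] = heap[--size];              /* run exhausted */
        sift_down(heap, size, 0, runs);          /* O(log k) */
    }
    return n;
}
```

Instead of popping and re-inserting, the sketch advances the run at the top of the heap and sifts it down, which is the same O(log k) step with less bookkeeping.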
After sorting the k sub-arrays (the method doesn't matter), the code does a k-way merge. The simplest implementation does k-1 compares to determine the smallest leading element of the k arrays, then moves that element from its sub-array to the output array and gets the next element from that array. When the end of an array is reached, the algorithm drops down to a (k-1)-way merge, then a (k-2)-way merge, until finally there's just one sub-array left and it's copied. This will be O(n) time since k-1 is a constant.
The k-1 compares can be sped up by using a minimum heap (which is how some priority queues are implemented), but it's still O(n), with just a smaller constant. The heap needs to be initialized at the start, then updated each time an element is removed and a new one added.
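The simple version without a heap might look like this in C. The function name merge_k_scan and its parameter layout are assumptions made for the sketch: the caller passes the k arrays, their lengths, and a scratch array pos with one position counter per array.

```c
#include <stddef.h>

/* Merges k sorted arrays into out (room for the total length) by
 * scanning the k leading elements for the smallest one each time.
 * pos is caller-provided scratch space for k counters.
 * Returns the number of elements written. Cost: O(n * k). */
size_t merge_k_scan(const int *const *arrays, const size_t *lens,
                    size_t *pos, size_t k, int *out)
{
    size_t n = 0;

    for (size_t i = 0; i < k; ++i)
        pos[i] = 0;

    for (;;) {
        size_t best = k;                     /* k means "none yet" */
        for (size_t i = 0; i < k; ++i) {
            if (pos[i] == lens[i]) continue; /* array i is exhausted */
            if (best == k || arrays[i][pos[i]] < arrays[best][pos[best]])
                best = i;
        }
        if (best == k) break;                /* everything is merged */
        out[n++] = arrays[best][pos[best]++];
    }
    return n;
}
```

Exhausted arrays are simply skipped during the scan rather than dropped from the list, which keeps the code short at the cost of a few wasted checks.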

quicksort code understanding

I have quicksort code that is supposed to run on the text "B A T T A J U S" (ignore blanks), but I don't seem to understand the code that well.
void quicksort (itemType a[], int l, int r)
{
    int i, j; itemType v;
    if (r>l)
    {
        v = a[r]; i = l-1; j = r;
        for (;;)
        {
            while (a[++i] < v);
            while (a[--j] >= v);
            if (i >= j) break;
            swap(a,i,j);
        }
        swap(a,i,r);
        quicksort(a,l,i-1);
        quicksort(a,i+1,r);
    }
}
I can explain what I understand: the first if checks whether l < r, which in this case it is, since S is greater than B. Then I get a little confused: v is set equal to a[r], does this mean S, since S is all the way to the right? Then i is set to outside the "array" since it's l-1 (so it's undefined, I assume). Then j is set equal to r, but is that the position r, as in S?
I kind of don't understand what values are set to what, whether a[r] is the letter at that position or something else. Hopefully someone can explain how the first swap works, so I can learn this.
It is probably better to start with an understanding of the QuickSort algorithm, and then see how the code corresponds to it, than to study the code to try to figure out how QuickSort works. Basic QuickSort (which is what you have) is in fact a pretty simple algorithm. To sort an array A:
1) If the length of A is less than 2 then the array is already sorted. Otherwise,
2) Select any element of A to be a "pivot element".
3) Rearrange the other elements as needed so that all those that are less than the pivot are at the beginning of A, and those that are greater than or equal to the pivot are at the end. (This particular version also puts the pivot itself between the two, which is common but not strictly necessary; it could simply be included in the upper subarray, and the algorithm would still work.)
4) Apply the QuickSort procedure to each of the two sub-arrays produced by (3).
Your particular code chooses the right-most element of each (sub)array as the pivot element, and at step (4) it excludes the pivot from the sub-arrays to be recursively sorted.
Quicksort works by separating your array into a "left" sub-array, which contains only values strictly less than an arbitrarily chosen pivot value, and a "right" sub-array, which contains only elements that are greater than or equal to the pivot. Once the array has been divided like this, each of the two sub-arrays is sorted using the same algorithm. Here is how this applies to your code:
v = a[r] sets the pivot value to the last element in the array. This works well since the array is presumably unsorted to begin with, so a[r] is as good a value as any.
while(a[++i] < v) ; keeps advancing i until it reaches the first element of the left sub-array that is greater than or equal to the pivot, v. When this loop ends, i is the index of an element that should be in the right sub-array rather than the left.
while(a[--j] >= v) ; does the same thing from the other end, except that it stops at the last element of the right sub-array that is strictly less than the pivot, v. When this loop ends, j is the index of an element that should be in the left sub-array rather than the right.
Whenever we find a pair of elements that are in the wrong sub-arrays, we swap them.
When all of the elements in the array are partitioned (i meets j), we swap the pivot with the element at index i (which is now guaranteed to be in the right sub-array).
Since the pivot is guaranteed to be in the right position (left sub-array is strictly less and right sub-array is greater than or equal), we need to sort the sub-arrays but not the pivot. That is why the recursive calls use indices l,i-1 and i+1,r, leaving the pivot at index i.
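To see the first swap on the text from the question, here is a sketch of the same routine in C with the partitioning step pulled out into its own function. It uses char for itemType, and the names partition and swap_items are mine. The j > l guard is also an addition: the code as posted has no such check, so a[--j] can run off the left end of the array when nothing in the range is smaller than the pivot.

```c
/* Exchange a[i] and a[j]. */
static void swap_items(char *a, int i, int j)
{
    char t = a[i];
    a[i] = a[j];
    a[j] = t;
}

/* Partitions a[l..r] around the pivot v = a[r] and returns the
 * index where the pivot ends up. */
int partition(char *a, int l, int r)
{
    char v = a[r];
    int i = l - 1, j = r;

    for (;;) {
        while (a[++i] < v)            /* first element >= pivot */
            ;
        while (j > l && a[--j] >= v)  /* last element < pivot; guard added */
            ;
        if (i >= j) break;            /* the scans have met */
        swap_items(a, i, j);
    }
    swap_items(a, i, r);              /* put the pivot in its place */
    return i;
}

void quicksort(char *a, int l, int r)
{
    if (r > l) {
        int p = partition(a, l, r);
        quicksort(a, l, p - 1);
        quicksort(a, p + 1, r);
    }
}
```

On "BATTAJUS" with l = 0 and r = 7 the pivot v is 'S'. The first scan stops at i = 2 (the first T) and the second at j = 5 (J), so the first swap gives BAJTATUS. The next round stops at i = 3 and j = 4 and gives BAJATTUS. Then i = 4 and j = 3 have crossed, the pivot is swapped into index 4, and the array is BAJASTUT with "BAJA" and "TUT" left to sort recursively.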
I can't offer a solution in that exact form. That code is overly complicated in my thinking.
Also not sure if what I'm proposing is a bubble sort, or modified bubble, but to me just easier. My added comment is that quicksort() is calling itself, therefore it is recursive. Not good in my book for something as simple as a sort. This all depends on what you need for size and efficiency. If you're sorting many terms, then my proposed sort is not the best.
for(i = 0; i < (n - 1); i++) {
    for(j = (i + 1); j < n; j++) {
        if(value[i] > value[j]) {
            tmp = value[i];
            value[i] = value[j];
            value[j] = tmp;
        }
    }
}
Where
n is the number of total elements.
i, j, and tmp are integers
value[] is an array of integers to sort

Merging two sorted arrays into a third one can be done in O(n)?

I'm trying to merge two sorted arrays into a third sorted array, but I can't see
any way to do that in O(n), only in O(n*n). Am I wrong? Is there a way to do that in O(n)?
Edit:
Actually the question is a little different:
I have 2 sorted skip lists and I want to merge them into a new sorted skip list, without changing
the input (i.e. the two skip lists).
I was thinking about:
put the lists in two arrays
merge the two arrays using the merge step of MergeSort (this takes O(n) runtime)
build a new skip list from the sorted array .... // I'm not sure about its runtime
Any ideas?
Regards
You keep two loops going, and flip between each of them as you pull values from each 'side' into the 3rd array. if arr1's values are less than the current arr2, then stuff arr1's values into arr3 until you hit equality or go 'bigger', then you flip the process and start pulling values out of arr2. And then just keep bouncing back/forth until there's nothing left in either source array.
Comes out to O(n+m), aka O(n).
picture two arrays one above the other:
list1=[1,2,6,10]
list2=[3,4,10]
if we start from the left and work our way to the right, comparing the items, each time we take the smallest value and put it in the third array. From the list that we took the smallest item from, we move onto its next item.
i=0,j=0
list1[i] < list2[j]
take 1
i+=1
2<3
take 2
i+=1
3<6
take 3
j+=1
etc..
until we get the final merged array [1,2,3,..]
Because selecting each element for the third array only took one move, it's basically O(N).
You can use one index variable for each of the two already sorted arrays, and another one for the merged array, all initialized to 0.
Now, while you haven't reached the end of either of the sorted arrays, compare the two pointed-to values in each iteration, take the higher (or lower, depending on your sort order) value and increment the index pointing to the value you've just used.
At the end, go through the array you haven't completed and just copy the remaining values into the merged array.
That way, you're going through the values only once, meaning O(n).
Hint: consider only the head elements of both lists (and remove them [virtually] when processed).
If both input lists are already sorted, how could the merge be O(n*n)? The algorithm you gave yourself (the 3 steps) is definitely O(n) rather than O(n*n). Each step is O(n), so overall it is O(n). The big-O is determined by the highest-order term of your algorithm. Be sure to understand the concept of big-O before working on your homework.
Yes, it can be done. Actually it would be O(n + m), where n and m are the lengths of the first and second arrays, respectively.
The algorithm is called a one-pass merge.
Pseudo code:
i, j, k = 0 // k is the index into the resulting array
// the maximum length of the resulting array is n+m,
// so it is always safe to malloc that length if you are in C or C++
while (i < len(array1) and j < len(array2))
    if (array1[i] == array2[j])
        // equal values: only one copy is kept here;
        // to keep both copies, advance only i in this branch
        result[k] = array1[i]
        ++i, ++j, ++k
    else if (array1[i] < array2[j])
        result[k] = array1[i]
        ++i, ++k
    else
        result[k] = array2[j]
        ++j, ++k
// now one array might not be traversed all the way
if (i < len(array1))
    while (i != len(array1))
        result[k] = array1[i]
        ++i, ++k
else if (j < len(array2))
    while (j != len(array2))
        result[k] = array2[j]
        ++j, ++k
Basically, you traverse both arrays at the same time. When the main loop ends, one of the arrays may not have been traversed all the way, so you just append its remaining elements to the result.
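A compact C version of the one-pass merge could look like the sketch below. The function name merge_sorted is made up for the example, and unlike the pseudocode it keeps both copies when the same value appears in both arrays, so the result always has n + m elements.

```c
#include <stddef.h>

/* Merges sorted arrays a (n elements) and b (m elements) into out,
 * which must have room for n + m ints. Returns n + m. */
size_t merge_sorted(const int *a, size_t n, const int *b, size_t m, int *out)
{
    size_t i = 0, j = 0, k = 0;

    /* Take the smaller head element until one array runs out.
     * On ties the element from a goes first. */
    while (i < n && j < m)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];

    /* At most one of these loops copies anything. */
    while (i < n) out[k++] = a[i++];
    while (j < m) out[k++] = b[j++];

    return k;
}
```

Each element is copied exactly once, which is where the O(n + m) bound comes from.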
