Matches in Array Algorithm

I have an array of items and I need to find the matching ones (duplicates). I have the simplest O(n^2) algorithm running for now. The item type doesn't really matter, but if you want to know, it's images.
for (int i = 0; i < myarray.length - 1; i++)
    for (int j = i + 1; j < myarray.length; j++)
        if (myarray[i] == myarray[j])
            output(names of items);
I tried Wikipedia and Google, but couldn't come up with an answer. Any links, algorithms, or code in any language would be great.

Rather than sort and then compare adjacent items, why not add each item to a self-balancing binary tree? That way you get the 'already present' check for free (sort of).
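A minimal sketch of that idea in C, using POSIX tsearch()/tfind(). glibc happens to implement these on a red-black tree, so lookups and inserts are O(log n); POSIX itself doesn't guarantee balancing.

#include <stdio.h>
#include <search.h>

static int cmp(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int items[] = { 3, 1, 4, 1, 5, 9, 2, 6, 5 };
    int n = sizeof items / sizeof items[0];
    void *root = NULL;

    for (int i = 0; i < n; i++) {
        if (tfind(&items[i], &root, cmp) != NULL)
            printf("duplicate: %d\n", items[i]); /* 'already present' check */
        else
            tsearch(&items[i], &root, cmp);      /* first sighting: insert */
    }
    return 0;
}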

If you can define an ordering on the items, sort them. Then it is very simple to find equal items, because they will be next to each other.
This is only O(n log n).
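For example, in C (a minimal sketch; each extra copy of a value is reported once):

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int a[] = { 7, 3, 7, 1, 3, 9 };
    int n = sizeof a / sizeof a[0];

    qsort(a, n, sizeof a[0], cmp_int); /* O(n log n) */
    for (int i = 1; i < n; i++)        /* equal items are now adjacent */
        if (a[i] == a[i - 1])
            printf("duplicate: %d\n", a[i]);
    return 0;
}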

To find duplicates in your array you can sort and scan the list, looking for adjacent identical items, in O(n log n).
If you only want to output duplicates and memory is not an issue, you can keep a hash set of elements you've already seen: go through the array and check whether the current element is in the set. Output it as a duplicate if it is; insert it into the set otherwise. That is O(n).
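C has no built-in hash set, so here is a minimal sketch with a fixed-size open-addressing table; the table size, the hash function, and the -1 empty marker are all illustrative assumptions (a real implementation would resize and wouldn't steal a data value as a sentinel):

#include <stdio.h>
#include <stdbool.h>

#define TABLE_SIZE 1024   /* power of two, comfortably larger than the input */
#define EMPTY      (-1)   /* assumes -1 never occurs in the data */

static int table[TABLE_SIZE];

/* Returns true if v was seen before; records it otherwise. */
static bool seen_before(int v)
{
    unsigned h = (unsigned)v * 2654435761u % TABLE_SIZE; /* multiplicative hash */
    while (table[h] != EMPTY) {
        if (table[h] == v)
            return true;
        h = (h + 1) % TABLE_SIZE; /* linear probing */
    }
    table[h] = v;
    return false;
}

int main(void)
{
    int a[] = { 4, 8, 4, 2, 8, 8 };
    int n = sizeof a / sizeof a[0];

    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
    for (int i = 0; i < n; i++)
        if (seen_before(a[i]))
            printf("duplicate: %d\n", a[i]); /* expected O(1) per element */
    return 0;
}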


Making a new array without duplicates in C

My code works as far as I can tell.
I was wondering if it can be done in a better way (better time complexity), and what the time complexity of my code is, as I'm not really sure how to calculate it.
I can't change the original array in the question, but if there is a faster way to do it by removal I would also like to know. Thanks a lot.
int i = 1, j = 0, count = 1;
int arrNew[SIZE] = { 0 };
arrNew[0] = arr1[0];

while (i < size) {
    if (arr1[i] == arrNew[j]) { // the element of arr1 is already added: reset j and move to the next element
        j = 0;
        i++;
    }
    else {
        if (j == count - 1) { // we reached the end of arrNew without a match, so add the missing element
            arrNew[count] = arr1[i];
            j = 0;
            count++; // this variable makes sure we check only the assigned elements of arrNew
            i++;
        }
        else { // if j < count - 1 we didn't finish checking all of arrNew
            j++;
        }
    }
}
I was wondering if it can be done in a better way (better time complexity)
It's a little hard to tell what's going on at first, but it looks like you're basically using one loop to do two jobs. You're looping on i to step through the original array, but also using j to scan through the new array for each new element. Effectively, you've got nested loops that both potentially have the same size, so you've got O(n^2) complexity.
I'd suggest rewriting your code so that the two loops are explicit. You're not saving any time by making one loop do double duty, and if you come back to this code a month from now you're going to waste a bunch of time trying to remember how it works. Make your code obvious — it's as much about communicating with your future self or your coworkers as with the compiler.
Can you improve on that O(n^2) complexity? Yes, definitely. One way is to sort the array, so that duplicate values end up being adjacent to each other. It's then easy to just not copy any value that is the same as the preceding value. I know you can't modify the original array, but you can copy the whole thing, sort the copy, and then copy that array while removing dupes. That'd give you O(n log n) complexity (if you choose an efficient sorting algorithm). In fact, you could speed that up a bit by combining the sorting and copying, but you'd still end up with O(n log n) complexity. Another way is to use a hash table: check whether each value exists in the table, tossing it if it does, or adding it to the table and copying it to the new array if it doesn't. That'd be close to O(n).
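A minimal sketch of the copy-and-sort approach; removeDupes is a name picked for illustration, and note the output order changes:

#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns the number of unique values written to arrNew.
   arr1 itself is never modified. */
int removeDupes(const int *arr1, int size, int *arrNew)
{
    int *tmp = malloc(size * sizeof *tmp);
    memcpy(tmp, arr1, size * sizeof *tmp);  /* work on a copy */
    qsort(tmp, size, sizeof *tmp, cmp_int); /* O(n log n) */

    int count = 0;
    for (int i = 0; i < size; i++)
        if (i == 0 || tmp[i] != tmp[i - 1]) /* skip values equal to the predecessor */
            arrNew[count++] = tmp[i];

    free(tmp);
    return count;
}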

Convert a given array of integers to a sorted one, by deleting the minimum number of elements

I am working on the following problem:
I have to convert a given array of integers to a sorted one, by deleting the minimum number of elements.
For example: [3,5,2,10,11] will be sorted by deleting '2': [3,5,10,11].
Or [3,6,2,4,5,7,7] will be sorted by deleting '3','6': [2,4,5,7,7], OR by deleting '6','2': [3,4,5,7,7] (both ways I remove 2 elements, which is why they are both correct).
My thought was to keep a counter for each element to see how many conflicts it has with other elements.
What I mean by conflicts: in the first example, numbers '3' and '5' have 1 conflict each (with the number '2') and number '2' has 2 conflicts (with numbers '3' and '5').
So, after calculating the array of conflicts, I remove from the original array the element that has the max number of conflicts, and I repeat for the remaining array until all elements have 0 conflicts.
This is not an efficient way, though (and in some cases I haven't thought of it might also produce wrong results), so I was wondering if anyone could think of a better solution.
I believe this is just a cleverly disguised version of the longest increasing subsequence problem. If you delete the minimum number of elements to have a sorted sequence, what you're left with is the longest increasing subsequence of the original array. Accordingly, you can do the following:
Find the longest increasing subsequence (O(n log n) algorithms exist for this), then
Delete everything not in that subsequence.
Hope this helps!
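A minimal sketch of that reduction in C: an O(n log n) longest nondecreasing subsequence via binary search over "smallest tail" indices, with predecessor links for the reconstruction (nondecreasing rather than strictly increasing, so repeats like 7,7 survive):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int a[] = { 3, 6, 2, 4, 5, 7, 7 };
    int n = sizeof a / sizeof a[0];

    int *tail_idx = malloc(n * sizeof *tail_idx); /* index of smallest tail per length */
    int *prev = malloc(n * sizeof *prev);         /* predecessor in the subsequence */
    int len = 0;

    for (int i = 0; i < n; i++) {
        /* binary search: first tail strictly greater than a[i] */
        int lo = 0, hi = len;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[tail_idx[mid]] <= a[i]) lo = mid + 1;
            else hi = mid;
        }
        prev[i] = lo > 0 ? tail_idx[lo - 1] : -1;
        tail_idx[lo] = i;
        if (lo == len) len++;
    }

    /* walk the predecessor links to recover the kept elements */
    int *keep = malloc(len * sizeof *keep);
    for (int i = tail_idx[len - 1], k = len; i >= 0; i = prev[i])
        keep[--k] = a[i];

    for (int k = 0; k < len; k++)
        printf("%d ", keep[k]);                   /* prints: 2 4 5 7 7 */
    printf("(deleted %d elements)\n", n - len);

    free(tail_idx); free(prev); free(keep);
    return 0;
}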
You can build a DAG based on the elements in the array:
Each element a[n] is a vertex.
For any pair of elements (m,n) where (m < n) and (a[m] <= a[n]) add a directed edge.
Optimization: you can build simple chains for sorted subarrays. For example, if a[m]<=a[m+1]<=a[m+2]<=a[m+3]>a[m+4], you can skip adding the edges (m,m+2) and (m,m+3) for vertex m.
The goal now is to find the longest path in the graph, which has a linear time solution for directed acyclic graphs.
An algorithm is described in the aforementioned Wikipedia page and also here.
I would do it with recursive programming. Here is my pseudocode:
/**
 * sortedArray : an array that is already sorted.
 * leftToSort  : an unsorted array that needs to be sorted/merged with sortedArray.
 * The first call to this function must be sortArrayByRemoval([], arrayToSort).
 **/
public Integer[] sortArrayByRemoval(Integer[] sortedArray, Integer[] leftToSort) {
    if (leftToSort.size == 0) {
        return sortedArray; // end of recursion
    }
    Integer candidate = leftToSort[0];
    if (sortedArray.isEmpty() || candidate >= sortedArray[last]) {
        // append candidate to the sorted array
        return sortArrayByRemoval(sortedArray.append(candidate), leftToSort.removeFirst());
    } else {
        // either we skip the candidate...
        Integer[] result1 = sortArrayByRemoval(sortedArray, leftToSort.removeFirst());
        // ...or we backtrack and drop the last kept element
        Integer[] result2 = sortArrayByRemoval(sortedArray.removeLast(), leftToSort);
        // and finally we return the best choice (i.e. the longest array)
        return biggestArray(result1, result2);
    }
}
Maybe not the most efficient, but I think it gives you the correct answer.
If you are not stuck on the idea of literally deleting things from the original array, then what you want is the longest increasing subsequence of your original sequence of numbers. This is a well-known, classic problem and you can find many examples in the literature or in textbooks.
(If you are stuck on deleting, well... find the LIS and delete everything not in the LIS.)

Merging two sorted arrays into a third one can be done in O(n)?

I'm trying to merge two sorted arrays into a third sorted array, but I can't see any way to do that in O(n), only in O(n^2). Am I wrong? Is there a way to do that in O(n)?
Edit:
Actually the question is a little different:
I have 2 sorted skip lists and I want to merge them into a new sorted skip list, without changing the input (i.e. the two skip lists).
I was thinking about:
put the lists in two arrays
merge the two arrays using the merge step of merge sort (this takes O(n) runtime)
build a new skip list from the sorted array... // I'm not sure about its runtime
Any ideas?
Regards
You keep two indices going and flip between them as you pull values from each 'side' into the third array. If arr1's current value is less than arr2's, you stuff arr1's values into arr3 until you hit equality or go 'bigger', then you flip the process and start pulling values out of arr2. Keep bouncing back and forth until there's nothing left in either source array.
Comes out to O(n+m), aka O(n).
Picture two arrays, one above the other:
list1=[1,2,6,10]
list2=[3,4,10]
If we start from the left and work our way to the right, comparing the items, each time we take the smallest value and put it in the third array. From the list that we took the smallest item from, we move on to its next item.
i=0, j=0
1 < 3  ->  take 1, i += 1
2 < 3  ->  take 2, i += 1
3 < 6  ->  take 3, j += 1
etc.
until we get the final merged array [1,2,3,4,6,10,10].
Because selecting each element for the third array takes only one step, it's basically O(N).
You can use one index variable for each of the two sorted source arrays, and another one for the merged array, all initialized to 0.
Now, while you haven't reached the end of either source array, compare the two pointed-to values in each iteration, take the lower (or higher, depending on your sort order) value, and increment the index pointing to the value you've just used.
At the end, go through the array you haven't exhausted and just paste the remaining values into the merged array.
That way, you're going through the values only once, meaning O(n).
Hint: consider only the head elements of both lists (and remove them [virtually] when processed).
If both input lists are already sorted, how could the merge be O(n*n)? The algorithm you gave yourself (the 3 steps) is definitely O(n) rather than O(n*n). Each step is O(n), so overall it is O(n). The big-O is determined by the highest-order term of your algorithm. Be sure to understand the concept of big-O before working on your homework.
Yes, it can be done; in fact it is O(n + m), where n and m are the lengths of the first and second arrays, respectively.
The algorithm is called a one-pass merge.
In C:
// i and j walk the two inputs, k writes the result.
// The resulting array always has exactly n + m elements,
// so allocate result with length n + m.
void merge(const int *array1, int n, const int *array2, int m, int *result)
{
    int i = 0, j = 0, k = 0;

    while (i < n && j < m) {
        if (array1[i] <= array2[j])   // <= keeps the merge stable
            result[k++] = array1[i++];
        else
            result[k++] = array2[j++];
    }
    // now one array might not be traversed all the way up
    while (i < n)
        result[k++] = array1[i++];
    while (j < m)
        result[k++] = array2[j++];
}
Basically, you traverse both arrays at the same time, and when one of them is exhausted, you just append all the remaining elements of the other to the result.

What's an efficient way to filter an array

I am programming C on Linux and I have a big integer array. How do I filter it, say, find values that fit some condition, e.g. value > 1789 && value < 2031? What's the efficient way to do this? Do I need to sort the array first?
I've read the answers and thank you all, but I need to do such a filtering operation many times on this big array, not only once. So is iterating it one by one every time the best way?
If the only thing you want to do with the array is to get the values that match this criterion, it is faster just to iterate over the array and check each value for the condition (O(n) vs. O(n log n)). If, however, you are going to perform multiple queries on this array, then it's better to sort it.
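A one-pass filter is just this (a minimal sketch, with the bounds from the question):

#include <stdio.h>

int main(void)
{
    int a[] = { 1500, 1790, 2047, 1899, 2031, 2500 };
    int n = sizeof a / sizeof a[0];

    for (int i = 0; i < n; i++)
        if (a[i] > 1789 && a[i] < 2031) /* the condition from the question */
            printf("%d\n", a[i]);
    return 0;
}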
Sort the array first. Then on each query do 2 binary searches. I'm assuming queries will be like:
Find integers x such that a < x < b
The first binary search finds the index i of the element such that Array[i-1] <= a < Array[i], and the second binary search finds the index j such that Array[j] < b <= Array[j+1]. Then your desired range is [i, j].
This algorithm is O(N log N) in preprocessing, and O(N) per query if you want to iterate over all the matching elements, or O(log N) per query if you just want to count them.
Let me know if you need help implementing binary search in C. There is a library function bsearch() in C, and lower_bound() and upper_bound() in the C++ STL.
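A minimal sketch in C: since C's bsearch() doesn't report range boundaries, first_greater() below is a hand-rolled bound (the name is mine, and the "b - 1" trick assumes integer values):

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* index of the first element strictly greater than v */
static int first_greater(const int *a, int n, int v)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= v) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

int main(void)
{
    int a[] = { 2047, 1500, 1900, 2500, 1789, 2031, 1800 };
    int n = sizeof a / sizeof a[0];

    qsort(a, n, sizeof a[0], cmp_int);      /* O(N log N), done once */

    /* query: values x with 1789 < x < 2031 */
    int lo = first_greater(a, n, 1789);     /* first index with a[i] > 1789 */
    int hi = first_greater(a, n, 2031 - 1); /* first index with a[i] >= 2031 */
    printf("%d matching values\n", hi - lo); /* O(log N) per query to count */
    for (int i = lo; i < hi; i++)
        printf("%d\n", a[i]);
    return 0;
}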
You could use a max-heap implemented as an array of the same size as the source array. Initialize it with a min-1 sentinel value and insert values into the max-heap as the numbers come in. The first check is whether the number to be inserted is greater than the first element; if it isn't, discard it, and if it is, insert it into the array. To get the list of numbers back, read the new array up to the first min-1 sentinel.
To filter the array, you'll have to look at each element once. There's no need to look at any element more than once, so a simple linear search of the array for items matching your criteria is going to be as efficient as you can get.
Sorting the array would end up looking at some elements more than once, which is not necessary for your purpose.
If you can spare some more memory, you can scan your array once, get the indices of matching values, and store them in another array. This new array will be significantly shorter, since it holds only the indices of values which match the condition. Something like this:
int original_array[SOME_SIZE];
int new_array[SOME_SIZE]; /* worst case: every value matches */
int count = 0;

for (int i = 0; i < SOME_SIZE; i++)
{
    if (original_array[i] > LOWER_LIMIT && original_array[i] < HIGHER_LIMIT)
    {
        new_array[count++] = i; /* store the index, not the value */
    }
}
You need to do the above once, and from now on:
for (int i = 0; i < count; i++)
{
    printf("Success! Found value %d\n", original_array[new_array[i]]);
}
So at the cost of some memory, you can save a considerable amount of time. Even if you invest some time in sorting, you would have to scan the sorted array on every query. This method minimizes both the scanned length and the per-query work (at the cost of extra memory, of course :) )
Try this library: http://code.google.com/p/boolinq/
It is iterator-based and as fast as can be; there is no overhead. But it needs the C++11 standard. Your code can be written in a declarative way:
int arr[] = {1,2,3,4,5,6,7,8,9};
auto items = boolinq::from(arr).where([](int a){return a>3 && a<6;});
while (!items.empty())
{
int item = items.front();
...
}
The only thing faster than an iterator-based scan would be a multithreaded scan...

What is the bug in this code?

Based on logic given as an answer on SO to a different (similar) question about removing repeated numbers from an array in O(N) time complexity, I implemented that logic in C, as shown below. But my code does not return unique numbers. I tried debugging but could not work out the logic behind it to fix this.
#include <stdio.h>

int remove_repeat(int *a, int n)
{
    int i, k;
    k = 0;
    for (i = 1; i < n; i++)
    {
        if (a[k] != a[i])
        {
            a[k + 1] = a[i];
            k++;
        }
    }
    return (k + 1);
}

int main()
{
    int a[] = { 1, 4, 1, 2, 3, 3, 3, 1, 5 };
    int n;
    int i;

    n = remove_repeat(a, 9);
    for (i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}
1] What is incorrect in the above code for removing duplicates?
2] Is there another O(N) or O(N log N) solution for this problem? What is its logic?
Heap sort in O(n log n) time.
Iterate through in O(n) time, replacing repeated elements with a sentinel value (such as INT_MAX).
Heap sort again in O(n log n) to distill out the repeated elements.
Still bounded by O(n log n).
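A minimal sketch of those three passes (qsort stands in for heap sort, since both are O(n log n); INT_MAX is assumed not to occur in the data):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int a[] = { 1, 4, 1, 2, 3, 3, 3, 1, 5 };
    int n = sizeof a / sizeof a[0];

    qsort(a, n, sizeof a[0], cmp_int);   /* pass 1: sort */
    int dupes = 0;
    for (int i = 1; i < n; i++)
        if (a[i] == a[i - 1]) {          /* pass 2: mark repeats with the sentinel */
            a[i - 1] = INT_MAX;
            dupes++;
        }
    qsort(a, n, sizeof a[0], cmp_int);   /* pass 3: sentinels sink to the end */

    for (int i = 0; i < n - dupes; i++)
        printf("%d ", a[i]);             /* prints: 1 2 3 4 5 */
    printf("\n");
    return 0;
}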
Your code only checks whether an item in the array is the same as its immediate predecessor.
If your array starts out sorted, that will work, because all instances of a particular number will be contiguous.
If your array isn't sorted to start with, that won't work because instances of a particular number may not be contiguous, so you have to look through all the preceding numbers to determine whether one has been seen yet.
To do the job in O(N log N) time, you can sort the array, then use the logic you already have to remove duplicates from the sorted array. Obviously enough, this is only useful if you're all right with rearranging the numbers.
If you want to retain the original order, you can use something like a hash table or bit set to track whether a number has been seen yet or not, and only copy each number to the output when/if it has not yet been seen. To do this, we change your current:
if (a[k] != a[i]) {
    a[k+1] = a[i];
    k++;
}
to something like:
if (!hash_find(hash_table, a[i])) {
    hash_insert(hash_table, a[i]);
    a[k+1] = a[i];
    k++;
}
If your numbers all fall within fairly narrow bounds or you expect the values to be dense (i.e., most values are present) you might want to use a bit-set instead of a hash table. This would be just an array of bits, set to zero or one to indicate whether a particular number has been seen yet.
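A minimal sketch of the bit-set variant, for values assumed to lie in 0..MAX_VALUE (the bound is an illustrative assumption); it keeps the first occurrence of each value, preserves order, and compacts in place like your remove_repeat():

#include <string.h>

#define MAX_VALUE 1023 /* assumed upper bound on the values */

int remove_repeat_bitset(int *a, int n)
{
    unsigned char seen[(MAX_VALUE + 1 + 7) / 8];
    memset(seen, 0, sizeof seen);

    int k = 0;
    for (int i = 0; i < n; i++) {
        int byte = a[i] / 8, bit = a[i] % 8;
        if (!(seen[byte] & (1u << bit))) { /* first time we see a[i] */
            seen[byte] |= 1u << bit;
            a[k++] = a[i];                 /* keep it; order preserved */
        }
    }
    return k; /* new logical length */
}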
On the other hand, if you're more concerned with the upper bound on complexity than the average case, you could use a balanced tree-based collection instead of a hash table. This will typically use more memory and run more slowly, but its expected complexity and worst case complexity are essentially identical (O(N log N)). A typical hash table degenerates from constant complexity to linear complexity in the worst case, which will change your overall complexity from O(N) to O(N^2).
Your code would appear to require that the input is sorted. With unsorted inputs as you are testing with, your code will not remove all duplicates (only adjacent ones).
You are able to get an O(N) solution if the range of the integers is known up front and smaller than the amount of memory you have :). Make one pass to record the unique integers using auxiliary storage, then another to output the unique values.
Code below is in Java, but hopefully you get the idea.
int[] removeRepeats(int[] a) {
    // Assume the integers are in the range 0..999
    Boolean[] v = new Boolean[1000]; // A lazy way of getting a tri-state var (null, true, false)
    for (int i = 0; i < a.length; ++i) {
        v[a[i]] = Boolean.TRUE;
    }
    // v[i] == null => number not seen
    // v[i] == true => number seen
    int[] out = new int[a.length];
    int ptr = 0;
    for (int i = 0; i < a.length; ++i) {
        if (v[a[i]] != null && v[a[i]].equals(Boolean.TRUE)) {
            out[ptr++] = a[i];
            v[a[i]] = Boolean.FALSE; // mark as already emitted
        }
    }
    // out now contains no duplicates, order is preserved, and ptr says how
    // many elements are set.
    return out;
}
You are going to need two loops: one to go through the source and one to check each item in the destination array.
You are not going to get O(N).
[EDIT]
The article you linked to suggests a sorted output array, which means the search for duplicates in the output array can be a binary search... which is O(log N) per item.
Your logic is wrong, so the code is wrong too. Work the logic out by hand before coding it.
I suggest an O(N log N) approach using a modification of heapsort.
With heapsort, at step i we look at a[i] to a[n], find the minimum, and swap it into a[i], right?
Now the modification: if that minimum is the same as a[i-1], swap it with a[n] and reduce the array size by 1 instead.
That should do the trick in O(N log N).
Your code will work only in particular cases. You are only checking adjacent values, but duplicate values can occur anywhere in the array. Hence, it's wrong as written.
