What is the bug in this code? - c

Based on a this logic given as an answer on SO to a different(similar) question, to remove repeated numbers in a array in O(N) time complexity, I implemented that logic in C, as shown below. But the result of my code does not return unique numbers. I tried debugging but could not get the logic behind it to fix this.
int remove_repeat(int *a, int n)
{
int i, k;
k = 0;
for (i = 1; i < n; i++)
{
if (a[k] != a[i])
{
a[k+1] = a[i];
k++;
}
}
return (k+1);
}
main()
{
int a[] = {1, 4, 1, 2, 3, 3, 3, 1, 5};
int n;
int i;
n = remove_repeat(a, 9);
for (i = 0; i < n; i++)
printf("a[%d] = %d\n", i, a[i]);
}
1] What is incorrect in above code to remove duplicates.
2] Any other O(N) or O(NlogN) solution for this problem. Its logic?

Heap sort in O(n log n) time.
Iterate through in O(n) time replacing repeating elements with a sentinel value (such as INT_MAX).
Heap sort again in O(n log n) to distil out the repeating elements.
Still bounded by O(n log n).

Your code only checks whether an item in the array is the same as its immediate predecessor.
If your array starts out sorted, that will work, because all instances of a particular number will be contiguous.
If your array isn't sorted to start with, that won't work because instances of a particular number may not be contiguous, so you have to look through all the preceding numbers to determine whether one has been seen yet.
To do the job in O(N log N) time, you can sort the array, then use the logic you already have to remove duplicates from the sorted array. Obviously enough, this is only useful if you're all right with rearranging the numbers.
If you want to retain the original order, you can use something like a hash table or bit set to track whether a number has been seen yet or not, and only copy each number to the output when/if it has not yet been seen. To do this, we change your current:
if (a[k] != a[i])
a[k+1] = a[i];
to something like:
if (!hash_find(hash_table, a[i])) {
hash_insert(hash_table, a[i]);
a[k+1] = a[i];
}
If your numbers all fall within fairly narrow bounds or you expect the values to be dense (i.e., most values are present) you might want to use a bit-set instead of a hash table. This would be just an array of bits, set to zero or one to indicate whether a particular number has been seen yet.
On the other hand, if you're more concerned with the upper bound on complexity than the average case, you could use a balanced tree-based collection instead of a hash table. This will typically use more memory and run more slowly, but its expected complexity and worst case complexity are essentially identical (O(N log N)). A typical hash table degenerates from constant complexity to linear complexity in the worst case, which will change your overall complexity from O(N) to O(N2).

Your code would appear to require that the input is sorted. With unsorted inputs as you are testing with, your code will not remove all duplicates (only adjacent ones).

You are able to get O(N) solution if the number of integers is known up front and smaller than the amount of memory you have :). Make one pass to determine the unique integers you have using auxillary storage, then another to output the unique values.
Code below is in Java, but hopefully you get the idea.
int[] removeRepeats(int[] a) {
// Assume these are the integers between 0 and 1000
Boolean[] v = new Boolean[1000]; // A lazy way of getting a tri-state var (false, true, null)
for (int i=0;i<a.length;++i) {
v[a[i]] = Boolean.TRUE;
}
// v[i] = null => number not seen
// v[i] = true => number seen
int[] out = new int[a.length];
int ptr = 0;
for (int i=0;i<a.length;++i) {
if (v[a[i]] != null && v[a[i]].equals(Boolean.TRUE)) {
out[ptr++] = a[i];
v[a[i]] = Boolean.FALSE;
}
}
// Out now doesn't contain duplicates, order is preserved and ptr represents how
// many elements are set.
return out;
}

You are going to need two loops, one to go through the source and one to check each item in the destination array.
You are not going to get O(N).
[EDIT]
The article you linked to suggests a sorted output array which means the search for duplicates in the output array can be a binary search...which is O(LogN).

Your logic just wrong, so the code is wrong too. Do your logic by yourself before coding it.
I suggest a O(NlnN) way with a modification of heapsort.
With heapsort, we join from a[i] to a[n], find the minimum and replace it with a[i], right?
So now is the modification, if the minimum is the same with a[i-1] then swap minimum and a[n], reduce your array item's number by 1.
It should do the trick in O(NlnN) way.

Your code will work only on particular cases. Clearly, you're checking adjacent values but duplicate values can occur any where in array. Hence, it's totally wrong.

Related

can someone suggest a better algorithm than this to check if there is at least one duplicate value in an array?

an unsorted integer array nums, and it's size numsSize is given as arguments of function containsDuplicate and we have to return a boolean value true if at least one duplicate value is there otherwise false.
for this task I chose to check if every element, and the elements after that are equal or not until last second element is reached, if equal I will be returning true otherwise false.
bool containsDuplicate(int* nums, int numsSize){
for(int i =0 ;i< numsSize-1;i++)
{
for(int j = i+1;j < numsSize; j++)
{
if(nums[i] == nums[j])
{
return true;
}
}
}
return false;
}
To minimize run time, I've written return value just when the duplicates are found, but still my code is not performing well on large size arrays, I'm expecting an algorithm which has a time complexity O(n) if possible. And is there anyway we can skip the values which are duplicates of previously looked values?
I've seen all other solutions, but I couldn't find a better solution in C.
Your algorithm is O(n^2). But if you sort first, which can be done in less than O(n^2), then determining if there is a duplicate in the array is O(n).
You could maintain a lookup table to determine if each value has been previously seen, which would run in O(n) time, but unless the potential range of values stored in the array are relatively small, this has prohibitive memory usage.
For instance, if you know the values in the array will range from 0-127.
int contains_dupes(int *arr, size_t n) {
char seen[128] = {0};
for (size_t i = 0; i < n; i++) {
if (seen[arr[i]]) return 0;
seen[arr[i]] = 1;
}
return 1;
}
But if we assume int is 4 bytes, and the values in the array can be any int, and we use char for our lookup table, then your lookup table would have to be 4GB in size.
O(n) time, O(n) space: use a set or map. Parse your array, checking each element in turn for membership in your set or map. If it's present then you've found a duplicate; if not, then add it.
If O(n) space is too expensive, you can get away with far less by doing a first pass using a cuckoo hash, which is a space efficient data structure that guarantees no false negatives, but can have false positives. Use the same approach as above but with the cuckoo hash instead of a set or map. Any duplicates you detect may be false positives, so will need to be checked.
Then, parse the array a second time, using the approach described in the first paragraph, but skip past anything that isn't in your set of candidates.
This is still O(n) time.
https://en.wikipedia.org/wiki/Cuckoo_hashing

Best/worst case of finding minimum element in an array

I am trying to find the minimum element in an array, and I've been asked to figure out the number of comparisons needed while using the naïve approach. So the code I've written is:
int findMin(int arr[], int size) {
int comparisons = 0;
int temp = arr[0];
for (int i = 1; i < size; i++) {
comparisons++;
if (arr[i] < temp) {
temp = arr[i];
}
}
return comparisons;
}
Doesn't this require n-1 comparisons in all cases? Because I'll have to check all elements from arr[1] to arr[arr_length-1], and best case/worst case doesn't come into play as I don't know if the element is minimum unless I encounter all the elements of the array atleast once. Thus, it always does n-1 comparisons to get to the minimum element right?
One of my friends says its going to be 2(n-2)+1 worst case and n-1 best case but I don't know how that is? Am I wrong or is something else going on here?
With the unsorted array, since it don't have any information about the order of the unseen element, it have to look at them all, hence the time complexity is n-1.
Your algorithm requires n-1 comparisons in all cases, which means n-1 in the worst case and also n-1 in the best case. There are no optimal inputs or pessimal inputs of your function.
FYI: What is the best, worst and average case running times of an algorithm?

Given an array of integers of size n+1 consisting of the elements [1,n]. All elements are unique except one which is duplicated k times

I have been attempting to solve the following problem:
You are given an array of n+1 integers where all the elements lies in [1,n]. You are also given that one of the elements is duplicated a certain number of times, whilst the others are distinct. Develop an algorithm to find both the duplicated number and the number of times it is duplicated.
Here is my solution where I let k = number of duplications:
struct LatticePoint{ // to hold duplicate and k
int a;
int b;
LatticePoint(int a_, int b_) : a(a_), b(b_) {}
}
LatticePoint findDuplicateAndK(const std::vector<int>& A){
int n = A.size() - 1;
std::vector<int> Numbers (n);
for(int i = 0; i < n + 1; ++i){
++Numbers[A[i] - 1]; // A[i] in range [1,n] so no out-of-access
}
int i = 0;
while(i < n){
if(Numbers[i] > 1) {
int duplicate = i + 1;
int k = Numbers[i] - 1;
LatticePoint result{duplicate, k};
return LatticePoint;
}
So, the basic idea is this: we go along the array and each time we see the number A[i] we increment the value of Numbers[A[i]]. Since only the duplicate appears more than once, the index of the entry of Numbers with value greater than 1 must be the duplicate number with the value of the entry the number of duplications - 1. This algorithm of O(n) in time complexity and O(n) in space.
I was wondering if someone had a solution that is better in time and/or space? (or indeed if there are any errors in my solution...)
You can reduce the scratch space to n bits instead of n ints, provided you either have or are willing to write a bitset with run-time specified size (see boost::dynamic_bitset).
You don't need to collect duplicate counts until you know which element is duplicated, and then you only need to keep that count. So all you need to track is whether you have previously seen the value (hence, n bits). Once you find the duplicated value, set count to 2 and run through the rest of the vector, incrementing count each time you hit an instance of the value. (You initialise count to 2, since by the time you get there, you will have seen exactly two of them.)
That's still O(n) space, but the constant factor is a lot smaller.
The idea of your code works.
But, thanks to the n+1 elements, we can achieve other tradeoffs of time and space.
If we have some number of buckets we're dividing numbers between, putting n+1 numbers in means that some bucket has to wind up with more than expected. This is a variant on the well-known pigeonhole principle.
So we use 2 buckets, one for the range 1..floor(n/2) and one for floor(n/2)+1..n. After one pass through the array, we know which half the answer is in. We then divide that half into halves, make another pass, and so on. This leads to a binary search which will get the answer with O(1) data, and with ceil(log_2(n)) passes, each taking time O(n). Therefore we get the answer in time O(n log(n)).
Now we don't need to use 2 buckets. If we used 3, we'd take ceil(log_3(n)) passes. So as we increased the fixed number of buckets, we take more space and save time. Are there other tradeoffs?
Well you showed how to do it in 1 pass with n buckets. How many buckets do you need to do it in 2 passes? The answer turns out to be at least sqrt(n) bucekts. And 3 passes is possible with the cube root. And so on.
So you get a whole family of tradeoffs where the more buckets you have, the more space you need, but the fewer passes. And your solution is merely at the extreme end, taking the most spaces and the least time.
Here's a cheekier algorithm, which requires only constant space but rearranges the input vector. (It only reorders; all the original elements are still present at the end.)
It's still O(n) time, although that might not be completely obvious.
The idea is to try to rearrange the array so that A[i] is i, until we find the duplicate. The duplicate will show up when we try to put an element at the right index and it turns out that that index already holds that element. With that, we've found the duplicate; we have a value we want to move to A[j] but the same value is already at A[j]. We then scan through the rest of the array, incrementing the count every time we find another instance.
#include <utility>
#include <vector>
std::pair<int, int> count_dup(std::vector<int> A) {
/* Try to put each element in its "home" position (that is,
* where the value is the same as the index). Since the
* values start at 1, A[0] isn't home to anyone, so we start
* the loop at 1.
*/
int n = A.size();
for (int i = 1; i < n; ++i) {
while (A[i] != i) {
int j = A[i];
if (A[j] == j) {
/* j is the duplicate. Now we need to count them.
* We have one at i. There's one at j, too, but we only
* need to add it if we're not going to run into it in
* the scan. And there might be one at position 0. After that,
* we just scan through the rest of the array.
*/
int count = 1;
if (A[0] == j) ++count;
if (j < i) ++count;
for (++i; i < n; ++i) {
if (A[i] == j) ++count;
}
return std::make_pair(j, count);
}
/* This swap can only happen once per element. */
std::swap(A[i], A[j]);
}
}
/* If we get here, every element from 1 to n is at home.
* So the duplicate must be A[0], and the duplicate count
* must be 2.
*/
return std::make_pair(A[0], 2);
}
A parallel solution with O(1) complexity is possible.
Introduce an array of atomic booleans and two atomic integers called duplicate and count. First set count to 1. Then access the array in parallel at the index positions of the numbers and perform a test-and-set operation on the boolean. If a boolean is set already, assign the number to duplicate and increment count.
This solution may not always perform better than the suggested sequential alternatives. Certainly not if all numbers are duplicates. Still, it has constant complexity in theory. Or maybe linear complexity in the number of duplicates. I am not quite sure. However, it should perform well when using many cores and especially if the test-and-set and increment operations are lock-free.

Merge k sorted arrays using C

I need to merge k (1 <= k <= 16) sorted arrays into one sorted array. This is for a homework assignment and the Professor requires that this be done using an O(n) algorithm. Merging 2 arrays is no problem and I can do it easily using an O(n) algorithm. I feel that what my professor is asking is undoable for n arrays with an O(n) algorithm.
I am using the below algorithm to split the array indices and running InsertionSort on each partition. I could save these start and end indices into a 2D array. I just don't see how the merging can be done using O(n) because this is going to require more than one loop. If it is possible, does anyone have any hints. I'm not looking for actual code, just a hint as to where I should start/
int chunkSize = round(float(arraySize) / numThreads);
for (int i = 0; i < numThreads; i++) {
int start = i * chunkSize;
int end = start + chunkSize - 1;
if (i == numThreads - 1) {
end = arraySize - 1;
}
InsertionSort(&array[start], end - start + 1);
}
EDIT: The requirement is that the algorithm be O(n) where n is the number of elements in the array. Also, I need to solve this without using a min heap.
EDIT #2: Here is an algorithm I came up with. The problem here is that I'm not storing the result of each iteration back into the original array. I could just copy all of it back in for a loop but that would be expensive. Is there any way I can do this, other than using something memcpy? In the below code, indices is a 2D array [numThreads][2] where array[i][0] is the start index and array[i][1] is the end index of the ith array.
void mergeArrays(int array[], int indices[][2], int threads, int result[]) {
for (int i = 0; i < threads - 1; i++) {
int resPos = 0;
int lhsPos = 0;
int lhsEnd = indices[i][1];
int rhsPos = indices[i+1][0];
int rhsEnd = indices[i+1][1];
while (lhsPos <= lhsEnd && rhsPos <= rhsEnd) {
if (array[lhsPos] <= array[rhsPos]) {
result[resPos] = array[lhsPos];
lhsPos++;
} else {
result[resPos] = array[rhsPos];
rhsPos++;
}
resPos++;
}
while (lhsPos <= lhsEnd) {
result[resPos] = array[lhsPos];
lhsPos++;
resPos++;
}
while (rhsPos <= rhsEnd) {
result[resPos] = array[rhsPos];
rhsPos++;
resPos++;
}
}
}
You can merge K sorted arrays in one sorted array with O(N*log(K)) algorithm, using priority queue with K entries, where N is overall number of elements in all arrays.
If K is considered as constant value (it is limited by 16 in your case), then complexity is O(N).
Note again: N is number of elements in my post, not number of arrays.
It is impossible to merge arrays in O(K) - simple copy takes O(N)
Using the facts you provided:
(1) n is the number of arrays to to merge;
(2) the arrays to be merged are already sorted;
(3) the merge needs to be of order n, that is linear in the number of arrays
(and NOT linear in the number of elements in each array, as you might mistakenly think at first sight).
Use the analogy of merging 4 sorted piles of cards, low to high, face up. You would pick the card with the lowest face value from one of the piles and put it (face down) on the merged deck, until all piles are exhausted.
For your program: keep a counter for each array for the number of elements you have already transferred to the output. This is at the same time an index to the next element in each array NOT merged in the output. Pick the smallest element that you find at one of these locations. You have to lookup the first waiting element in all the arrays for that, so that is of order n.
Also, I don't understand why the answer from MoB got up-votes, it does not answer the question.
Here is one way to do it (pseudocode)
input array[k][n]
init indices[k] = { 0, 0, 0, ... }
init queue = { empty priority queue }
for i in 0..k:
insert i into queue with priority (array[i][0])
while queue is not empty:
let x = pop queue
output array[x, indices[x]]
increment indices[x]
insert x into queue with priority (array[x][indices[x]])
This can probably be simplified further in C. You would have to find a suitable queue implementation to use though as there are none in libc.
Complexity for this operation:
"while queue is not empty" => O(n)
"insert x into queue ..." => O(log k)
=> O(n log k)
Which, if you consider k = constant, is O(n).
After sorting the k sub-arrays (the method doesn't matter), the code does a k-way merge. The simplest implementation does k-1 compares to determine the smallest leading element of each of the k arrays, then moves that element from it's sub-array to the output array and gets the next element from that array. When the end of an array is reached, the algorithm drops down to a (k-1) way merge, then (k-2) way merge, finally there's just one sub-array left and it's copied. This will be O(n) time since k-1 is a constant.
The k-1 compares can be sped up by using a minimum heap (which is how some priority queues are implemented), but it's still O(n), with just a smaller constant. The heap needs to be initialized at the start, then updated each time an element is removed and a new one added.

Find n-th smallest element in array without sorting?

I want to write a program to find the n-th smallest element without using any sorting technique..
Can we do it recursively, divide and conquer style like quick-sort?
If not, how?
You can find information about that problem here: Selection algorithm.
What you are referring to is the Selection Algorithm, as previously noted. Specifically, your reference to quicksort suggests you are thinking of the partition based selection.
Here's how it works:
Like in Quicksort, you start by picking a good
pivot: something that you think is nearly
half-way through your list. Then you
go through your entire list of items
swapping things back and forth until
all the items less than your pivot
are in the beginning of the list, and
all things greater than your pivot
are at the end. Your pivot goes into the leftover spot in the middle.
Normally in a quicksort you'd recurse
on both sides of the pivot, but for
the Selection Algorithm you'll only
recurse on the side that contains the
index you are interested in. So, if
you want to find the 3rd lowest
value, recurse on whichever side
contains index 2 (because index 0 is
the 1st lowest value).
You can stop recursing when you've
narrowed the region to just the one
index. At the end, you'll have one
unsorted list of the "m-1" smallest
objects, and another unsorted list of the "n-m" largest
objects. The "m"th object will be inbetween.
This algorithm is also good for finding a sorted list of the highest m elements... just select the m'th largest element, and sort the list above it. Or, for an algorithm that is a little bit faster, do the Quicksort algorithm, but decline to recurse into regions not overlapping the region for which you want to find the sorted values.
The really neat thing about this is that it normally runs in O(n) time. The first time through, it sees the entire list. On the first recursion, it sees about half, then one quarter, etc. So, it looks at about 2n elements, therefore it runs in O(n) time. Unfortunately, as in quicksort, if you consistently pick a bad pivot, you'll be running in O(n2) time.
This task is quite possible to complete within roughly O(n) time (n being the length of the list) by using a heap structure (specifically, a priority queue based on a Fibonacci heap), which gives O(1) insertion time and O(log n) removal time).
Consider the task of retrieving the m-th smallest element from the list. By simply looping over the list and adding each item to the priority queue (of size m), you can effectively create a queue of each of the items in the list in O(n) time (or possibly fewer using some optimisations, though I'm not sure this is exceedingly helpful). Then, it is a straightforward matter of removing the element with lowest priority in the queue (highest priority being the smallest item), which only takes O(log m) time in total, and you're finished.
So overall, the time complexity of the algorithm would be O(n + log n), but since log n << n (i.e. n grows a lot faster than log n), this reduces to simply O(n). I don't think you'll be able to get anything significantly more efficient than this in the general case.
You can use Binary heap, if u dont want to use fibonacci heap.
Algo:
Contruct the min binary heap from the array this operation will take O(n) time.
Since this is a min binary heap, the element at the root is the minimum value.
So keep on removing element frm root, till u get ur kth minimum value. o(1) operation
Make sure after every remove you re-store the heap kO(logn) operation.
So running time here is O(klogn) + O(n)............so it is O(klogn)...
Two stacks can be used like this to locate the Nth smallest number in one pass.
Start with empty Stack-A and Stack-B
PUSH the first number into Stack-A
The next number onwards, choose to PUSH into Stack-A only if the number is smaller than its top
When you have to PUSH into Stack-A, run through these steps
While TOP of Stack-A is larger than new number, POP TOP of Stack-A and push it into Stack-B
When Stack-A goes empty or its TOP is smaller than new number, PUSH in the new number and restore the contents of Stack-B over it
At this point you have inserted the new number to its correct (sorted) place in Stack-A and Stack-B is empty again
If Stack-A depth is now sufficient you have reached the end of your search
I generally agree to Noldorins' optimization analysis.
This stack solution is towards a simple scheme that will work (with relatively more data movement -- across the two stacks). The heap scheme reduces the fetch for Nth smallest number to a tree traversal (log m).
If your target is an optimal solution (say for a large set of numbers or maybe for a programming assignment, where optimization and the demonstration of it are critical) you should use the heap technique.
The stack solution can be compressed in space requirements by implementing the two stacks within the same space of K elements (where K is the size of your data set). So, the downside is just extra stack movement as you insert.
Here is the Ans to find Kth smallest element from an array:
#include<stdio.h>
#include<conio.h>
#include<iostream>
using namespace std;
int Nthmin=0,j=0,i;
int GetNthSmall(int numbers[],int NoOfElements,int Nthsmall);
int main()
{
int size;
cout<<"Enter Size of array\n";
cin>>size;
int *arr=(int*)malloc(sizeof(int)*size);
cout<<"\nEnter array elements\n";
for(i=0;i<size;i++)
cin>>*(arr+i);
cout<<"\n";
for(i=0;i<size;i++)
cout<<*(arr+i)<<" ";
cout<<"\n";
int n=sizeof(arr)/sizeof(int);
int result=GetNthSmall(arr,size,3);
printf("Result = %d",result);
getch();
return 0;
}
int GetNthSmall(int numbers[],int NoOfElements,int Nthsmall)
{
int min=numbers[0];
while(j<Nthsmall)
{
Nthmin=numbers[0];
for(i=1;i<NoOfElements;i++)
{
if(j==0)
{
if(numbers[i]<min)
{
min=numbers[i];
}
Nthmin=min;
}
else
{
if(numbers[i]<Nthmin && numbers[i]>min)
Nthmin=numbers[i];
}
}
min=Nthmin;
j++;
}
return Nthmin;
}
The simplest way to find the nth largest element in an array without using any sorting methods.
public static void kthLargestElement() {
int[] a = { 5, 4, 3, 2, 1, 9, 8 };
int n = 3;
int max = a[0], min = a[0];
for (int i = 0; i < a.length; i++) {
if (a[i] < min) {
min = a[i];
}
if (a[i] > max) {
max = a[i];
}
}
int max1 = max, c = 0;
for (int i = 0; i < a.length; i++) {
for (int j = 0; j < a.length; j++) {
if (a[j] > min && a[j] < max) {
max = a[j];
}
}
min = max;
max = max1;
c++;
if (c == (a.length - n)) {
System.out.println(min);
}
}
}

Resources