Making a new array without duplicates in C

My code works as far as I can tell.
I was wondering if it can be done in a better way (better time complexity), and what the time complexity of my code is, as I'm not really sure how to calculate it.
I can't change the original array in this problem, but if there is a faster way to do it by removal I would also like to know. Thanks a lot.
int i = 1, j = 0, count = 1;
int arrNew[SIZE] = { 0 };
arrNew[0] = arr1[0];
while (i < size) {
    if (arr1[i] == arrNew[j]) { // the element of arr1 is already added: reset j for the next iteration and move to the next element.
        j = 0;
        i++;
    }
    else {
        if (j == count - 1) { // we reached the end of arrNew, so add the missing element.
            arrNew[count] = arr1[i];
            j = 0;
            count++; // this variable makes sure we check only the assigned elements of arrNew.
            i++;
        }
        else // if j < count - 1 we haven't finished checking all of arrNew.
            j++;
    }
}

I was wondering if it can be done in a better way (better time complexity)
It's a little hard to tell what's going on at first, but it looks like you're basically using one loop to do two jobs. You're looping on i to step through the original array, but also using j to scan through the new array for each new element. Effectively, you've got nested loops that both potentially have the same size, so you've got O(n²) complexity.
I'd suggest rewriting your code so that the two loops are explicit. You're not saving any time by making one loop do double duty, and if you come back to this code a month from now you're going to waste a bunch of time trying to remember how it works. Make your code obvious — it's as much about communicating with your future self or your coworkers as with the compiler.
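For illustration, here's a minimal sketch of the same O(n²) algorithm with the two loops written out explicitly (same arr1, arrNew and size as in your code; a sketch, not a drop-in replacement):

int i, j, count = 0;
for (i = 0; i < size; i++) {
    for (j = 0; j < count; j++)      // scan the values copied so far
        if (arrNew[j] == arr1[i])    // already present, stop scanning
            break;
    if (j == count)                  // fell off the end: not seen before
        arrNew[count++] = arr1[i];
}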
Can you improve on that O(n²) complexity? Yes, definitely. One way is to sort the array, so that duplicate values end up being adjacent to each other. It's then easy to just not copy any value that is the same as the preceding value. I know you can't modify the original array, but you can copy the whole thing, sort the copy, and then copy that array while removing dupes. That'd give you O(n log n) complexity (if you choose an efficient sorting algorithm). In fact, you could speed that up a bit by combining the sorting and copying, but you'd still end up with O(n log n) complexity.
Another way is to use a hash table: check whether each value already exists in the table, tossing it if it does, or adding it to the table and copying it to the new array if it doesn't. That'd be close to O(n).
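To make the sorting approach concrete, here's a rough sketch using qsort from <stdlib.h> (the function and variable names are mine; note the unique values come back sorted rather than in their original order):

#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);   // avoids the overflow that x - y could cause
}

// copies arr1, sorts the copy, keeps only the first value of each run;
// returns the number of unique values written to arrNew
int dedupe(const int *arr1, int size, int *arrNew)
{
    int *tmp = malloc(size * sizeof *tmp);
    int i, count = 0;
    memcpy(tmp, arr1, size * sizeof *tmp);
    qsort(tmp, size, sizeof *tmp, cmp_int);   // O(n log n)
    for (i = 0; i < size; i++)                // O(n): drop values equal to the preceding one
        if (i == 0 || tmp[i] != tmp[i - 1])
            arrNew[count++] = tmp[i];
    free(tmp);
    return count;
}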

Related

How to remove certain elements from an array using a conditional test in C?

I am writing a program that goes through an array of ints and calculates stdev to identify outliers in the data. From here, I would like to create a new array with the identified outliers removed in order to recalculate the avg and stdev. Is there a way that I can do this?
There is a pretty simple solution to the problem that involves switching your mindset in the if statement (which doesn't actually seem to be inside a for loop... you might want to fix that).
float dataMinusOutliers[n];
int indexTracker = 0;
int i;
for (i = 0; i < n; i++) {
    // keep values within two standard deviations of the mean (avg is the precomputed average)
    if (data[i] >= avg - 2*stdevfinal && data[i] <= avg + 2*stdevfinal) {
        dataMinusOutliers[indexTracker] = data[i];
        indexTracker += 1;
    }
}
Note that this isn't particularly scalable and that the dataMinusOutliers array is going to potentially have quite a few unused indices. You can always use indexTracker to note how many elements the array actually holds, though, and create yet another array into which you copy the important values in dataMinusOutliers. Is there likely a more elegant solution? Yes. Does this work given your requirements though? Yup.
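If you then want a copy with no unused slots, here's a short sketch (assuming <stdlib.h> and <string.h> are included; indexTracker is the element count from the loop above):

float *trimmed = malloc(indexTracker * sizeof *trimmed);   // exactly the right size
if (trimmed != NULL)
    memcpy(trimmed, dataMinusOutliers, indexTracker * sizeof *trimmed);
// ... recalculate avg and stdev over trimmed[0..indexTracker-1], then free(trimmed)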

End condition of an IF loop that depends on a 3D array

I'm writing a program which moves a particle about a cube, either left, right, up, down, back or forward depending on a value randomly generated by the program. The particle is able to move within a cube of dimensions LxLxL. I wish the program to stop when the particle has been to all possible sites, and the number of jumps taken to be output.
Currently I am doing this using an array[i][j][k] and, when the particle has been to a position, changing the value of the array at that corresponding point to 0. However, in my if statement I have to type out every possible combination of i, j and k in order to say that if they are all equal to 0 the program should end. Would there be a better way to do this?
Thanks,
Beth
Yes. I'm assuming the if in question is the one contained within the triple nested loop whose body sets finish = 1;. The better way of doing this is to set your flag before the loop, beginning with a true value, then setting it to false and breaking if you encounter a value other than zero. Your if statement becomes much simpler, like this:
int finish = 1; // start with a true value
// the loops are untouched, so the for i and for j are still above this
for (k = 0; k < 15; k++)
{
    if (list[i][j][k] != 0)
    {
        finish = 0;
        break;
    }
}
// outside all the loops
return finish;
I think this is what you're asking for but if not please edit your question to clarify. I'm not sure if there is some technical name for this concept but the idea is to choose your initial truth value based on what's most efficient. Given you have a 15x15x15 array and a single non-zero value means false, it's much better to begin with true and break as soon as you encounter a value that makes your statement false. Trying to go in the other direction is far more complicated and far less efficient.
Maybe you can add your list[i][j][k] to a collection every time list[i][j][k] = 0. Then, at the end of your program, check the collection's length. If it's the right length, then terminate.
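Building on that idea, here's a hedged sketch of a counter-based variant in C (list, L, i, j and k as in the question; the movement code is elided): instead of rescanning the cube after every jump, count each site the first time it is visited and stop when the count reaches L*L*L.

int visited = 0, jumps = 0;
// (mark the starting site the same way before the loop)
while (visited < L * L * L) {
    // ... pick a random direction and move the particle to (i, j, k) ...
    jumps++;
    if (list[i][j][k] != 0) {   // first visit to this site
        list[i][j][k] = 0;
        visited++;
    }
}
printf("All sites visited after %d jumps\n", jumps);

This makes the end-of-walk test O(1) instead of a scan over all L*L*L entries.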

How do you calculate big O of an algorithm

I have a problem where I have to find the missing numbers within an array and add them to a set.
The question goes like so:
Array of size (n-m) with numbers from 1..n, with m of them missing.
Find all of the missing numbers in O(log n). The array is sorted.
Example:
n = 8
arr = [1,2,4,5,6,8]
m=2
Result has to be a set {3, 7}.
This is my solution so far, and I wanted to know how I can calculate the big O of a solution. Also, most solutions I have seen use the divide and conquer approach. How do I calculate the big O of my algorithm below?
P.S. If I don't meet the requirement, is there any way I can do this without having to do it recursively? I am really not a fan of recursion, I simply can't get my head around it! :(
var arr = [1, 2, 4, 5, 6, 8];
var mySet = [];
findMissingNumbers(arr);

function findMissingNumbers(arr) {
    var temp = 0;
    for (number in arr) { // O(n)
        temp = parseInt(number) + 1;
        if (arr[temp] - arr[number] > 1) {
            addToSet(arr[number], arr[temp]);
        }
    }
}

function addToSet(min, max) {
    while (min != max - 1) {
        mySet.push(++min);
    }
}
There are two things you want to look at. One you have pointed out: how many times is the loop for (number in arr) iterated? If your array contains n-m elements, then this loop is iterated n-m times. Then look at each operation you do inside the loop and try to figure out a worst-case (or typical) cost for each. The temp = ... line is a constant cost (say 1 unit per loop), the conditional is a constant cost (say 1 unit per loop), and then there is addToSet. addToSet is more difficult to analyze because it isn't called every time, and it may vary in how expensive it is each time it is called. So perhaps what you want to think is that for each of the m missing elements, addToSet is going to perform 1 operation: a total of m operations (you don't know when they will occur, but all m must occur at some point). Then add up all of your costs.
n-m loop iterations, with 2 operations in each, gives 2(n-m); then add in the m operations done by addToSet, for a total of something like 2n-m ~ 2n (assuming that m is small compared to n). This could be called O(n-m) or also O(n). (If it is O(n-m) it is also O(n), since n >= n-m.) Hope this helps.
In your code you have a complexity of O(n) in time, because you check all n indices of your array. A faster way to do it is something like this:
Go to the middle of your array.
Is this number in the right place? (If it is, the ones before it are too, because the array is sorted.)
If it's the expected number: go to the middle of the second half.
If not: add the missing number to the set and go to the middle of the first half.
Stop when the number you're looking at is at index size-1.
Note that you can add some optimizations; for example, you can directly check whether the array has the correct size and return an empty set. It depends on your problem.
My algorithm is also O(n), because for big O you always take the worst set of data. In my case, that would be missing one value at the end of the array. So technically it should be O(n-1), but constants are negligible in front of n (assumed to be very large). That's why you have to keep the average complexity in mind too. A rough sketch of this halving idea follows below.
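Here is that sketch in C (my own function name; it is recursive, which you said you'd rather avoid, but it shows where the halving comes in; it assumes the array is sorted with no duplicates, and that gaps before arr[0] or after the last element are handled by the caller):

#include <stdio.h>

// prints every value missing between arr[lo] and arr[hi]; ranges with no
// gap are discarded in O(1), so the total work is roughly O(m log n)
void findMissing(const int arr[], int lo, int hi)
{
    if (arr[hi] - arr[lo] == hi - lo)   // dense range: nothing missing here
        return;
    if (hi - lo == 1) {                 // adjacent entries with a gap between them
        int v;
        for (v = arr[lo] + 1; v < arr[hi]; v++)
            printf("%d\n", v);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    findMissing(arr, lo, mid);
    findMissing(arr, mid, hi);
}

// usage: int arr[] = {1, 2, 4, 5, 6, 8}; findMissing(arr, 0, 5); prints 3 and 7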
For what it's worth, here is a more succinct implementation of the algorithm (JavaScript):
var N = 10;
var arr = [2, 9];
var mySet = [];
var index = 0;
for (var i = 1; i <= N; i++) {
    if (i != arr[index]) {
        mySet.push(i);
    } else {
        index++;
    }
}
Here the big O is trivial, as there is only a single loop which runs exactly N times with constant-cost operations in each iteration.
Big O is the complexity of the algorithm. It is a function giving the number of steps it takes your program to come up with a solution.
This gives a pretty good explanation of how it works:
Big O, how do you calculate/approximate it?

Limit input data to achieve a better Big O complexity

You are given an unsorted array of n integers, and you would like to find if there are any duplicates in the array (i.e. any integer appearing more than once).
Describe an algorithm (implemented with two nested loops) to do this.
The question that I am stuck on is:
How can you limit the input data to achieve a better Big O complexity? Describe an algorithm for handling this limited data to find if there are any duplicates. What is the Big O complexity?
Your help will be greatly appreciated. This is not related to my coursework or assignments; it's from a previous year's exam paper and I am doing some self-study, but I seem to be stuck on this question. The only possible solution that I could come up with is:
If we limit the data, and use nested loops to perform operations to find whether there are duplicates, the complexity would be O(n), simply because the amount of time the operations take to perform is proportional to the data size.
If my answer makes no sense, then please ignore it, and if you could, please suggest possible solutions or workings for this answer.
If someone could help me solve this, I would be grateful, as I have attempted countless possible solutions, all of which seem not to be the correct one.
Edit (again): another possible solution (if effective!):
We could implement a loop that sorts the array (from lowest integer to highest integer), so the duplicates end up right next to each other, making them easier and faster to identify.
The big O complexity would still be O(n^2).
Since this is linear, it would simply use the first loop and iterate n-1 times: we get a value from the array (in the first iteration it could be, for instance, 1) and store it in a variable named 'current'.
The loop updates the current variable by +1 on each iteration. Within that loop, we write another loop to compare the current number to the next number; if it equals the next number, we can print it with a printf statement, else we move back to the outer loop to update the current variable by +1 (the next value in the array) and update the next variable to hold the value of the number after the value in current.
You can do it linearly (O(n)) for any input if you use a hash table (which has constant look-up time).
However, this is not what you are being asked about.
By limiting the possible values in the array, you can achieve linear performance.
E.g., if your integers have range 1..L, you can allocate a bit array of length L, initialize it to 0, and iterate over your input array, checking and flipping the appropriate bit for each input.
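A minimal sketch of that idea (using a byte per flag for clarity; a true bit array would pack eight flags per byte, but the complexity is the same):

#include <stdbool.h>
#include <stdlib.h>

// returns true if arr contains a duplicate; assumes all values are in 1..L
bool has_duplicate(const int *arr, int n, int L)
{
    unsigned char *seen = calloc(L + 1, 1);   // one zero-initialized flag per value
    bool dup = false;
    int i;
    for (i = 0; i < n && !dup; i++) {
        if (seen[arr[i]])                     // flag already set: second occurrence
            dup = true;
        seen[arr[i]] = 1;
    }
    free(seen);
    return dup;
}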
A variant of Bucket Sort will do. This gives you a complexity of O(n), where n is the number of input elements.
But there is one restriction: the maximum value. You should know the maximum value your integer array can take. Let's call it m.
The idea is to create a bool array of size m+1 (all initialized to false). Then iterate over your array. As you find an element, set bucket[elem] to true. If it is already true, then you've encountered a duplicate.
Some Java code:
// alternatively, you can iterate over the array to find maxVal, which again is O(n).
public boolean findDup(int[] arr, int maxVal)
{
    // Java assigns false to all the values by default.
    // maxVal + 1 so that the value maxVal itself is a valid index.
    boolean bucket[] = new boolean[maxVal + 1];
    for (int elem : arr)
    {
        if (bucket[elem])
        {
            return true; // a duplicate found
        }
        bucket[elem] = true;
    }
    return false;
}
But the constraint here is space. You need O(maxVal) space.
Nested loops get you O(N*M) or O(N*log(M)); for O(N) you cannot use nested loops!
I would do it with a histogram instead:
DWORD in[N] = { ... };  // input data ... values are in the range [0, M)
DWORD his[M] = { ... }; // histogram of in[]
int i, j;

// compute histogram O(N)
for (i = 0; i < M; i++) his[i] = 0;   // this can also be done with memset ...
for (i = 0; i < N; i++) his[in[i]]++; // if the range of values does not start at 0 then shift it ...

// remove duplicates O(N)
for (i = 0, j = 0; i < N; i++)
{
    his[in[i]]--;             // count down this value's remaining occurrences
    in[j] = in[i];            // copy item
    if (his[in[i]] <= 0) j++; // advance only at the last occurrence, so each value is kept once
}
// now j holds the new in[] array size
[Notes]
If the value range is too big, with sparse areas, then you need to convert his[] to a dynamic list with two values per item: one is the value from in[] and the second is its occurrence count. But then you need a nested loop -> O(N*M), or with binary search -> O(N*log(M)).

What is the bug in this code?

Based on this logic, given as an answer on SO to a different (similar) question, to remove repeated numbers from an array in O(N) time complexity, I implemented that logic in C, as shown below. But the result of my code does not return unique numbers. I tried debugging, but could not work out the logic behind it in order to fix this.
#include <stdio.h>

int remove_repeat(int *a, int n)
{
    int i, k;
    k = 0;
    for (i = 1; i < n; i++)
    {
        if (a[k] != a[i])
        {
            a[k+1] = a[i];
            k++;
        }
    }
    return (k+1);
}

int main(void)
{
    int a[] = {1, 4, 1, 2, 3, 3, 3, 1, 5};
    int n;
    int i;
    n = remove_repeat(a, 9);
    for (i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}
1] What is incorrect in the above code to remove duplicates?
2] Is there any other O(N) or O(N log N) solution for this problem? What is its logic?
Heap sort in O(n log n) time.
Iterate through in O(n) time replacing repeating elements with a sentinel value (such as INT_MAX).
Heap sort again in O(n log n) to distil out the repeating elements.
Still bounded by O(n log n).
Your code only checks whether an item in the array is the same as its immediate predecessor.
If your array starts out sorted, that will work, because all instances of a particular number will be contiguous.
If your array isn't sorted to start with, that won't work because instances of a particular number may not be contiguous, so you have to look through all the preceding numbers to determine whether one has been seen yet.
To do the job in O(N log N) time, you can sort the array, then use the logic you already have to remove duplicates from the sorted array. Obviously enough, this is only useful if you're all right with rearranging the numbers.
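For instance, a sketch of that (the comparator is mine; your remove_repeat is reused unchanged, it just needs sorted input):

#include <stdlib.h>

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

// in main, before calling remove_repeat:
qsort(a, 9, sizeof a[0], cmp_int);   // duplicates are now adjacent
n = remove_repeat(a, 9);             // so this removes all of them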
If you want to retain the original order, you can use something like a hash table or bit set to track whether a number has been seen yet or not, and only copy each number to the output when/if it has not yet been seen. To do this, we change your current:
if (a[k] != a[i])
    a[k+1] = a[i];
to something like:
if (!hash_find(hash_table, a[i])) {
    hash_insert(hash_table, a[i]);
    a[k+1] = a[i];
}
If your numbers all fall within fairly narrow bounds or you expect the values to be dense (i.e., most values are present) you might want to use a bit-set instead of a hash table. This would be just an array of bits, set to zero or one to indicate whether a particular number has been seen yet.
On the other hand, if you're more concerned with the upper bound on complexity than the average case, you could use a balanced tree-based collection instead of a hash table. This will typically use more memory and run more slowly, but its expected complexity and worst case complexity are essentially identical (O(N log N)). A typical hash table degenerates from constant complexity to linear complexity in the worst case, which will change your overall complexity from O(N) to O(N²).
Your code would appear to require that the input is sorted. With unsorted inputs as you are testing with, your code will not remove all duplicates (only adjacent ones).
You are able to get an O(N) solution if the number of possible integers is known up front and smaller than the amount of memory you have :). Make one pass to determine the unique integers you have using auxiliary storage, then another to output the unique values.
Code below is in Java, but hopefully you get the idea.
int[] removeRepeats(int[] a) {
    // Assume these are the integers between 0 and 1000
    Boolean[] v = new Boolean[1000]; // A lazy way of getting a tri-state var (false, true, null)
    for (int i = 0; i < a.length; ++i) {
        v[a[i]] = Boolean.TRUE;
    }
    // v[i] = null => number not seen
    // v[i] = true => number seen
    int[] out = new int[a.length];
    int ptr = 0;
    for (int i = 0; i < a.length; ++i) {
        if (v[a[i]] != null && v[a[i]].equals(Boolean.TRUE)) {
            out[ptr++] = a[i];
            v[a[i]] = Boolean.FALSE;
        }
    }
    // out now doesn't contain duplicates, order is preserved, and ptr holds
    // how many elements are set.
    return out;
}
You are going to need two loops, one to go through the source and one to check each item in the destination array.
You are not going to get O(N).
[EDIT]
The article you linked to suggests a sorted output array, which means the search for duplicates in the output array can be a binary search... which is O(log N).
Your logic is just wrong, so the code is wrong too. Work your logic out by yourself before coding it.
I suggest an O(N ln N) approach using a modification of heapsort.
With heapsort, we scan from a[i] to a[n], find the minimum, and replace it with a[i], right?
So here is the modification: if the minimum is the same as a[i-1], then swap the minimum with a[n] and reduce the array size by 1.
It should do the trick in O(N ln N).
Your code will work only in particular cases. Clearly, you're checking adjacent values, but duplicate values can occur anywhere in the array. Hence, it's totally wrong.
