Remove duplicates from array in linear time and without extra arrays

We have an array and it is unsorted. We know the range is [0,n].
We want to remove duplicates but we cannot use extra arrays and it must run in linear time.
Any ideas? Just to clarify, this is not for homework!

If the integers are limited 0 to n, you can move through the array, placing numbers by their indices. Every time you replace a number, take the value that used to be there and move it to where it should be. For instance, let's say we have an array of size 8:
-----------------
|3|6|3|4|5|1|7|7|
-----------------
S
Where S is our starting point, and we'll use C to keep track of our "current" index below.
We start with index 0, and move 3 to the 3 index spot, where 4 is. Save 4 in a temp var.
-----------------
|X|6|3|3|5|1|7|7| Saved 4
-----------------
S C
We then put 4 in the index 4, saving what used to be there, 5.
-----------------
|X|6|3|3|4|1|7|7| Saved 5
-----------------
S C
Keep going
-----------------
|X|6|3|3|4|5|7|7| Saved 1
-----------------
S C
-----------------
|X|1|3|3|4|5|7|7| Saved 6
-----------------
S C
-----------------
|X|1|3|3|4|5|6|7| Saved 7
-----------------
S C
When we try to replace 7, we see a conflict, so we simply don't place it. We then continue from the starting index S, increment it by 1:
-----------------
|X|1|3|3|4|5|6|7|
-----------------
S
1 is fine here, 3 needs to move
-----------------
|X|1|X|3|4|5|6|7|
-----------------
S
But 3 is a duplicate, so we throw it away and keep iterating through the rest of the array.
So basically, we move each entry at most 1 time, and iterate through the entire array. That's O(2n) = O(n)
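The placement idea walked through above can be sketched in C as follows. This is only a sketch under a stated assumption: every value lies in [0, n-1], so each value can serve directly as an index. The function name dedup_small_range is hypothetical, and the unique values come out sorted rather than in their original order.

```c
#include <stddef.h>

/* Hypothetical helper: removes duplicates from a[0..n-1], assuming every
 * value lies in [0, n-1]. Returns the number of unique values, which end
 * up sorted in a[0..count-1]. */
size_t dedup_small_range(int *a, size_t n)
{
    /* Phase 1: move each value v to index v. Every swap puts one value
     * at its home index, so there are at most n swaps in total. */
    for (size_t i = 0; i < n; i++) {
        while ((size_t)a[i] != i && a[a[i]] != a[i]) {
            int v = a[i];
            a[i] = a[v];   /* pull the displaced value back to slot i */
            a[v] = v;      /* v is now at its home index */
        }
    }

    /* Phase 2: keep only the entries that sit at their home index. */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if ((size_t)a[i] == i)
            a[count++] = a[i];
    }
    return count;
}
```

Called on the example array {3, 6, 3, 4, 5, 1, 7, 7} with n = 8, it returns 6 and leaves 1 3 4 5 6 7 at the front of the array.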

void printRepeating(int arr[], int size)
{
int i;
printf("The repeating elements are: \n");
for(i = 0; i < size; i++)
{
if(arr[abs(arr[i])] >= 0)
arr[abs(arr[i])] = -arr[abs(arr[i])];
else
printf(" %d ", abs(arr[i]));
}
}

Assume int a[n] is an array of integers in the range [0,n-1]. Note that this differs slightly from the stated problem, but I make this assumption to make clear how the algorithm works. The algorithm can be patched up to work for integers in the range [0,n].
for (int i=0; i<n; i++)
{
    if (a[i] != i)
    {
        int j = a[i];
        int k = a[j];
        a[j] = j; // Move a[i] to its home index j...
        a[i] = k; // ...and bring the displaced value back to i
    }
}
for (int i=0; i<n; i++)
{
    if (a[i] == i)
    {
        printf("%d\n", i);
    }
}

Can you sort? Sort with Radix Sort - http://en.wikipedia.org/wiki/Radix_sort with complexity O(arraySize) for given case and then remove duplicates from sorted array O(arraySize).

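Walk through the array and, for each i, set array[array[i]] = -array[array[i]] if it is not already negative (use the absolute value of array[i] as the index, since an earlier step may have negated it). If it is already negative, then array[i] is a duplicate. This works because all values are between 0 and n.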

Extending @Joel Lee's code for completion.
#include <iostream>
void remove_duplicates(int *a, int size)
{
int i, j, k;
bool swap = true;
while(swap){
swap = false;
for (i=0; i<size; i++){
if(a[i] != i && a[i] != a[a[i]]){
j = a[i];
k = a[j];
a[i] = k;
a[j] = j;
swap = true;
}
}
}
}
int main()
{
int i;
//int array[8] = {3,6,3,4,5,1,7,7};
int array[8] = {7,4,6,3,5,4,6,2};
remove_duplicates(array, sizeof(array)/sizeof(int));
for (int i=0; i<8; i++)
if(array[i] == i)
std::cout << array[i] << " ";
return 0;
}

With ES6 I think this can be solved in only a few lines, by reducing the array into an object and then using Object.keys to get an array without duplicates. This probably takes more memory. I'm not sure.
I did it like this:
var obj = array.reduce(function (acc, elem) {
acc[elem] = true;
return acc;
},{});
var uniqueArray = Object.keys(obj);
This has the added bonus (or disadvantage) of sorting the array. It works with strings too.

Use the array as a container, with the negative sign as an indicator. This will corrupt the input, though.

Related

How to keep while loop in bubble sort function in C

I'm trying to make my own bubble-sort function in C.
As you can see in the code below, I'm trying to use only while loops and if statements to create this function. I put in 5 numbers (1,3,2,5,4), so the size of the array should be 5, and I got 5 (I checked it with Python Tutor in C mode). However, it only works well until tab[j] gets 3. I've been trying to figure out why the loop exits when tab[j] gets 3, but I couldn't.
Could you anybody explain what's wrong to me? I would appreciate it.
Here is my code below:
#include <stdio.h>
void ft_sort_integer_table(int *tab, int size)
{
int i;
int j;
int tem;
i = 0;
j = 0;
while(tab[i] < size)
{
if(tab[j] > tab[j+1])
{
tem = tab[j];
tab[j] = tab[j+1];
tab[j+1] = tem;
printf("%d ", tab[j]);
j++;
}
else if(tab[j] < tab[j+1])
{
printf("%d ",tab[j]);
j++;
}
i++;
}
}
int main(void)
{
int tab[] = {1,3,2,5,4};
int size = sizeof(tab)/sizeof(*tab);
ft_sort_integer_table(tab, size);
return(0);
}

You'll need an inner loop in your bubble sort, which is responsible for moving the largest remaining element to the back by performing swaps (these large elements are "bubbling up"). Start the inner loop at 0 on each iteration and iterate while j < size - i - 1, so that tab[j+1] never reads past the end of the array (we know that the last i elements are sorted and in their final positions).
i controls your outer loop and should be incremented at the end of the loop (just as you would with a for loop). j controls the inner loop and should be incremented at the end of the loop.
While you're at it, it's a good idea to move your printing out of the sort function, which causes an unnecessary side effect and might frustrate your debugging efforts.
Also, it's worth mentioning that (1) for loops are more semantically appropriate here and (2) there is an optimization available by adding a boolean--as soon as you have a pass through the inner loop that performs no swaps, end early!
#include <stdio.h>
void ft_sort_integer_table(int *tab, int size)
{
int i = 0, j, tem;
while (i < size)
{
j = 0;
while (j < size - i - 1) /* stop one short: the body reads tab[j+1] */
{
if (tab[j] > tab[j+1])
{
tem = tab[j];
tab[j] = tab[j+1];
tab[j+1] = tem;
}
j++;
}
i++;
}
}
int main(void)
{
int tab[] = {1,3,2,5,4,6,7,1,5,6,8,9,1,4,5,1,2};
int size = sizeof(tab) / sizeof(*tab);
ft_sort_integer_table(tab, size);
for (int i = 0; i < size; i++)
{
printf("%d ", tab[i]);
}
return(0);
}
Output:
1 1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 9
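The early-exit optimization mentioned in this answer could look roughly like the sketch below. It uses for loops and a boolean flag; the function name bubble_sort_early_exit is made up for illustration and is not part of the original code.

```c
#include <stdbool.h>

/* Bubble sort that stops as soon as a full pass performs no swaps. */
void bubble_sort_early_exit(int *tab, int size)
{
    for (int i = 0; i < size - 1; i++) {
        bool swapped = false;
        /* The last i elements are already in their final positions. */
        for (int j = 0; j < size - i - 1; j++) {
            if (tab[j] > tab[j + 1]) {
                int tem = tab[j];
                tab[j] = tab[j + 1];
                tab[j + 1] = tem;
                swapped = true;
            }
        }
        if (!swapped)
            break; /* no swaps in this pass: the array is sorted */
    }
}
```

On an already sorted input this does a single pass and returns, instead of running all size - 1 passes.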

I'm trying to figure it out, but couldn't figure it out why it keeps
going out when tab[j] get 3.
From your code above, j increments in the same fashion as i. Both variables will have the same value, since j is incremented by one after the if-else statement and i is also incremented by one at the end of each loop iteration. Therefore, tab[j] references the same value as tab[i].
With that being said, the boolean condition in the while loop checks whether the value in tab[i] is less than the value of size.
When i == 3, tab[i] == 5, since in the loop only the values at indices less than i are swapped/changed. Since the size variable holds the value 5, tab[i] < size evaluates to false and the loop exits.
More information on bubble sort can be found here, https://www.geeksforgeeks.org/bubble-sort/

basic thing from a job interview - using linked list,arrays

I got this question in a job interview, and I couldn't solve it.
I think I was just really nervous, because it doesn't look that hard.
Arr is a given integer array of size n. Sol is a given empty array of
size n.
For each i (i goes from 0 to n-1) you have to put in Sol[i] the index
in Arr of the closest element on the left side that is smaller
than Arr[i]. Meaning: Sol[i] = max{ j | j < i; Arr[j] < Arr[i] }. If
there is no such index, put -1.
For example: Arr is [5,7,9,2,8,11,16,10,12], Sol is
[-1,0,1,-1,3,4,5,4,7]
Time complexity: O(n). Space complexity: O(n).
I tried to scan the array from the end to the start, but I didn't know how to continue.
I was asked to use only arrays and linked lists.
I had 10 minutes to solve it, so I guess it is not that hard.
Thanks a lot!

Note that for Arr[] with length < 2 there are trivial solutions. This pseudo code assumes that Arr[] has a length >= 2.
int Arr[] = {5,7,9,2,8,11,16,10,12};
int Sol[] = new int[9];
Stack<int> undecided; // or a stack implemented using a linked list
Sol[0] = -1; // this is a given
for(int i = Arr.length() - 1; i != 0; --i) {
undecided.push(i); // we haven't found a smaller value for this Arr[i] item yet
// note that all the items already on the stack (if any)
// are smaller than the value of Arr[i] or they would have
// been popped off in a previous iteration of the loop
// below
while (!undecided.empty() && (Arr[i-1] < Arr[undecided.peek()])) {
// the value for the item on the undecided stack is
// larger than Arr[i-1], so that's the index for
// the item on the undecided stack
Sol[undecided.peek()] = i-1;
undecided.pop();
}
}
// We've filled in Sol[] for all the items have lesser values to
// the left of them. Whatever is still on the undecided stack
// needs to be set to -1 in Sol
while (!undecided.empty()) {
Sol[undecided.peek()] = -1;
undecided.pop();
}
To be honest, I'm not sure I would have come up with this in an interview situation given a 10 minute time limit.
A C++ version of this can be found on ideone.com: https://ideone.com/VXC0yq
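For comparison, here is a sketch in plain C of a closely related variant: it scans left to right (not right to left as in the pseudocode above) and keeps a stack of candidate indices in an ordinary array. The function name nearest_smaller_left and its return convention are assumptions made for this example. Each index is pushed once and popped at most once, so the whole thing is O(n) time and O(n) space.

```c
#include <stdlib.h>

/* For each i, stores in sol[i] the largest j < i with arr[j] < arr[i],
 * or -1 if there is none. Returns 0 on success, -1 if the scratch
 * stack could not be allocated. */
int nearest_smaller_left(const int *arr, int *sol, size_t n)
{
    if (n == 0)
        return 0;

    size_t *stack = malloc(n * sizeof *stack);
    if (stack == NULL)
        return -1;

    size_t top = 0; /* number of indices currently on the stack */
    for (size_t i = 0; i < n; i++) {
        /* Indices whose values are >= arr[i] can never be the answer
         * for i or for anything to the right of i, so drop them. */
        while (top > 0 && arr[stack[top - 1]] >= arr[i])
            top--;
        sol[i] = (top > 0) ? (int)stack[top - 1] : -1;
        stack[top++] = i;
    }

    free(stack);
    return 0;
}
```

With Arr = {5,7,9,2,8,11,16,10,12} this fills Sol with -1 0 1 -1 3 4 5 4 7, matching the example in the question.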

int Arr[] = {5,7,9,2,8,11,16,10,12};
int Sol[] = new int[9];
for(int i = 0; i < Arr.length; i++) {
int element = Arr[i];
int tmp = -1;
for(int j = 0 ;j < i; j++) {
int other = Arr[j];
if (other < element) {
tmp = j;
}
}
Sol[i] = tmp;
}

Rearranging an array with respect to another array

I have 2 arrays, in parallel:
defenders = {1,5,7,9,12,18};
attackers = {3,10,14,15,17,18};
Both are sorted. What I am trying to do is rearrange the defending array's values so that they win more games (defender[i] > attacker[i]), but I am having trouble working out how to swap the values in the defenders array. So in reality we are only rearranging the defenders array with respect to the attackers.
I have this, but it isn't shifting much and I'm pretty sure I'm not doing it right. It's supposed to be a brute-force method.
void rearrange(int* attackers, int* defenders, int size){
int i, c, j;
int temp;
for(i = 0; i<size; i++){
c = 0;
j = 0;
if(defenders[c]<attackers[j]){
temp = defenders[c+1];
defenders[c+1] = defenders[c];
defenders[c] = temp;
c++;
j++;
}
else
c++;
j++;
}
}
Edit: I did ask this question before, but I feel as if I worded it terribly, and didn't know how to "bump" the older post.

To be honest, I didn't look at your code, since I have to wake up in less than 2.30 hours to go to work, hope you won't have hard feelings for me.. :)
I implemented the algorithm proposed by Eugene Sh. Some links you may want to read first, before digging into the code:
qsort in C
qsort and structs
shortcircuiting
My approach:
1. Create merged array by scanning both att and def.
2. Sort merged array.
3. Refill def with values that satisfy the ad pattern.
4. Complete refilling def with the remaining values (that are defeats)*.
*Steps 3 and 4 require two passes in my approach, maybe it can get better.
#include <stdio.h>
#include <stdlib.h>
typedef struct {
char c; // a for att and d for def
int v;
} pair;
void print(pair* array, int N);
void print_int_array(int* array, int N);
// function to be used by qsort()
int compar(const void* a, const void* b) {
pair *pair_a = (pair *)a;
pair *pair_b = (pair *)b;
if(pair_a->v == pair_b->v)
return pair_b->c - pair_a->c; // d has highest priority
return pair_a->v - pair_b->v;
}
int main(void) {
const int N = 6;
int def[] = {1, 5, 7, 9, 12, 18};
int att[] = {3, 10, 14, 15, 17, 18};
int i, j = 0;
// let's construct the merged array
pair merged_ar[2*N];
// scan the def array
for(i = 0; i < N; ++i) {
merged_ar[i].c = 'd';
merged_ar[i].v = def[i];
}
// scan the att array
for(i = N; i < 2 * N; ++i) {
merged_ar[i].c = 'a';
merged_ar[i].v = att[j++]; // watch out for the pointers
// 'merged_ar' is bigger than 'att'
}
// sort the merged array
qsort(merged_ar, 2 * N, sizeof(pair), compar);
print(merged_ar, 2 * N);
// scan the merged array
// to collect the patterns
j = 0;
// first pass to collect the patterns ad
for(i = 0; i < 2 * N; ++i) {
// if pattern found
if(merged_ar[i].c == 'a' && // first letter of pattern
i < 2 * N - 1 && // check that I am not the last element
merged_ar[i + 1].c == 'd') { // second letter of the pattern
def[j++] = merged_ar[i + 1].v; // fill-in `def` array
merged_ar[i + 1].c = 'u'; // mark that value as used
}
}
// second pass to collect the cases were 'def' loses
for(i = 0; i < 2 * N; ++i) {
// 'a' is for the 'att' and 'u' is already in 'def'
if(merged_ar[i].c == 'd') {
def[j++] = merged_ar[i].v;
}
}
print_int_array(def, N);
return 0;
}
void print_int_array(int* array, int N) {
int i;
for(i = 0; i < N; ++i) {
printf("%d ", array[i]);
}
printf("\n");
}
void print(pair* array, int N) {
int i;
for(i = 0; i < N; ++i) {
printf("%c %d\n", array[i].c, array[i].v);
}
}
Output:
gsamaras@gsamaras:~$ gcc -Wall px.c
gsamaras@gsamaras:~$ ./a.out
d 1
a 3
d 5
d 7
d 9
a 10
d 12
a 14
a 15
a 17
d 18
a 18
5 12 18 1 7 9

The problem is that you are resetting c and j to zero on each iteration of the loop. Consequently, you are only ever comparing the first value in each array.
Another problem is that you will read one past the end of the defenders array in the case that the last value of defenders array is less than last value of attackers array.
Another problem or maybe just oddity is that you are incrementing both c and j in both branches of the if-statement. If this is what you actually want, then c and j are useless and you can just use i.
I would offer you some updated code, but there is not a good enough description of what you are trying to achieve; I can only point out the problems that are apparent.

missing numbers

Given an array of size n. It contains numbers in the range 1 to n. Each number is present at
least once except for 2 numbers. Find the missing numbers.
eg. an array of size 5
elements are suppose 3,1,4,4,3
one approach is
static int k;
for(i=1;i<=n;i++)
{
for(j=0;j<n;j++)
{
if(i==a[j])
break;
}
if(j==n)
{
k++;
printf("missing element is", a[j]);
}
if(k==2)
break;}
another solution can be..
for(i=0;i

Let me First explain the concept:
You know that the sum of the natural numbers 1..n is
n*(n+1)/2. You also know that the sum of the squares of the first n natural numbers 1, 2, ..., n is n*(n+1)*(2n+1)/6. Thus you could solve the above problem in O(n) time using the above concept.
Also if space complexity is not of much consideration you could use count based approach which requires O(n) time and space complexity.
For more detailed solution visit Find the two repeating elements in a given array

I like the "use array elements as indexes" method from Algorithmist's link.
Method 5 (Use array elements as index)
Thanks to Manish K. Aasawat for suggesting this method.
traverse the list for i= 1st to n+2 elements
{
check for sign of A[abs(A[i])] ;
if positive then
make it negative by A[abs(A[i])]=-A[abs(A[i])];
else // i.e., A[abs(A[i])] is negative
this element (ith element of list) is a repetition
}
The only difference is that here it would be traversing 1 to n.
Notice that this is a single-pass solution that uses no extra space (besides storing i)!
Footnote:
Technically it "steals" some extra space -- essentially it is the counter array solution, but instead of allocating its own array of ints, it uses the sign bits of the original array as counters.
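Adapted to the missing-numbers question, the sign-marking idea might look like the sketch below. It assumes the array holds n values, all in the range 1..n, and that the caller supplies an output buffer with room for the missing values; find_missing is a hypothetical name. Unlike the quoted pseudocode, this version makes a second pass that reports the missing numbers and flips the signs back, so the input is left as it was.

```c
#include <stdlib.h>

/* Assumes a[0..n-1] contains only values in 1..n. Writes every value in
 * 1..n that does not occur in a[] into missing[] (ascending order) and
 * returns how many were written. The signs of a[] are used as "seen"
 * flags and restored before returning. */
size_t find_missing(int *a, size_t n, int *missing)
{
    /* Pass 1: mark value v as seen by making a[v-1] negative. */
    for (size_t i = 0; i < n; i++) {
        size_t idx = (size_t)abs(a[i]) - 1;
        if (a[idx] > 0)
            a[idx] = -a[idx];
    }

    /* Pass 2: a slot that is still positive was never marked. */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            missing[count++] = (int)i + 1;
        else
            a[i] = -a[i]; /* undo the mark */
    }
    return count;
}
```

For the array {3, 1, 4, 4, 3} it reports 2 and 5 as missing.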

Use qsort() to sort the array, then loop over it once to find the missing values. Average O(n*log(n)) time because of the sort, and minimal constant additional storage.

I haven't checked or run this code, but you should get the idea.
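#include <stdio.h>
#include <stdlib.h>
int print_missing(int *arr, size_t length) {
    /* calloc takes (count, size); one extra slot so values 1..length can be used as indices */
    int *new_arr = calloc(length + 1, sizeof(int));
    size_t i;
    if (new_arr == NULL) {
        return -1;
    }
    for(i = 0; i < length; i++) {
        new_arr[arr[i]] = 1;
    }
    for(i = 1; i <= length; i++) {
        if(!new_arr[i]) {
            printf("Number %zu is missing\n", i);
        }
    }
    free(new_arr);
    return 0;
}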
Runtime should be O(2n). Correct me if I'm wrong.

It is unclear why the naive approach (you could use a bitfield or an array) of marking the items you have seen isn't just fine. O(2n) CPU, O(n/8) storage.

If you are free to choose the language, then use python's sets.
numbers = [3,1,4,4,3]
print set (range (1 , len (numbers) + 1) ) - set (numbers)
Yields the output
set([2, 5])

Here you go. C# solution:
static IEnumerable<int> FindMissingValuesInRange( int[] numbers )
{
HashSet<int> values = new HashSet<int>( numbers ) ;
for( int value = 1 ; value <= numbers.Length ; ++value )
{
if ( !values.Contains(value) ) yield return value ;
}
}

I see a number of problems with your code. First off, j==n will never happen, and that doesn't give us the missing number. You should also initialize k to 0 before you attempt to increment it. I wrote an algorithm similar to yours, but it works correctly. However, it is not any faster than you expected yours to be:
int k = 0;
int n = 5;
bool found = false;
int a[] = { 3, 1, 4, 4, 3 };
for(int i = 1; i <= n; i++)
{
for(int j = 0; j < n; j++)
{
if(a[j] == i)
{
found = true;
break;
}
}
if(!found)
{
printf("missing element is %d\n", i);
k++;
if(k==2)
break;
}
else
found = false;
}
H2H

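Using a support array you can achieve O(n):
int support[n + 1];
// a variable-length array cannot have an initializer, so zero it explicitly (needs <string.h>)
memset(support, 0, sizeof support);
// this loop fills the support array with the
// number of occurrences of each a[i] (values are in 1..n)
for(int i = 0; i < n; i++)
    support[a[i]] += 1;
// now look which are missing (or duplicates, or whatever)
for(int i = 1; i <= n; i++)
    if(support[i] == 0) printf("%d is missing\n", i);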

for (i = 0; i < n; i++)
{
    // keep swapping until a[i] is at its home slot or its home slot already holds the same value
    while ((a[i] != i + 1) && (a[i] != a[a[i] - 1]))
    {
        swap(a[i], a[a[i] - 1]);
    }
}
for (i = 0; i < n; i++)
{
    if (a[i] != i + 1)
        printf("%d is missing\n", i + 1);
}
This takes O(n) time and O(1) space.

We can use the following code to find duplicate and missing values:
int size = 8;
int arr[] = {1, 2, 3, 5, 1, 3};
int result[] = new int[size];
for(int i =0; i < arr.length; i++)
{
if(result[arr[i]-1] == 1)
{
System.out.println("repeating: " + (arr[i]));
}
result[arr[i]-1]++;
}
for(int i =0; i < result.length; i++)
{
if(result[i] == 0)
{
System.out.println("missing: " + (i+1));
}
}

This is an interview question: Missing Numbers.
condition 1 : The array must not contain any duplicates.
The complete solution is :
public class Solution5 {
public static void main(String[] args) {
int a[] = { 1,8,6,7,10};
Arrays.sort(a);
List<Integer> list = new ArrayList<>();
int start = a[0];
for (int i = 0; i < a.length; i++) {
int ch = a[i];
if(start == ch) {
start++;
}else {
list.add(start);
start++;
//must do this
i--;
}
}//for
System.out.println(list);
}//main
}

Algorithm: efficient way to remove duplicate integers from an array

I got this problem from an interview with Microsoft.
Given an array of random integers,
write an algorithm in C that removes
duplicated numbers and return the unique numbers in the original
array.
E.g Input: {4, 8, 4, 1, 1, 2, 9} Output: {4, 8, 1, 2, 9, ?, ?}
One caveat is that the expected algorithm should not require the array to be sorted first. And when an element has been removed, the following elements must be shifted forward as well. The values left at the tail of the array after the elements have been shifted forward don't matter.
Update: The result must be returned in the original array and helper data structure (e.g. hashtable) should not be used. However, I guess order preservation is not necessary.
Update2: For those who wonder why these impractical constraints, this was an interview question and all these constraints are discussed during the thinking process to see how I can come up with different ideas.

A solution suggested by my girlfriend is a variation of merge sort. The only modification is that during the merge step, just disregard duplicated values. This solution would be as well O(n log n). In this approach, the sorting/duplication removal are combined together. However, I'm not sure if that makes any difference, though.

I've posted this once before on SO, but I'll reproduce it here because it's pretty cool. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) in time. The algorithm is as follows:
1. Take the first element of the array; this will be the sentinel.
2. Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to the sentinel.
3. Move all elements for which the index is equal to the hash to the beginning of the array.
4. Move all elements that are equal to the sentinel, except the first element of the array, to the end of the array.
5. What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided no pathological scenario in the hashing: Even if there are no duplicates, approximately 2/3 of the elements will be eliminated at each recursion. Each level of recursion is O(n) where small n is the amount of elements left. The only problem is that, in practice, it's slower than a quick sort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
uniqueInPlaceImpl(dataIn, 0);
}
void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
if(dataIn.length - start < 2)
return;
invariant T sentinel = dataIn[start];
T[] data = dataIn[start + 1..$];
static hash_t getHash(T elem) {
static if(is(T == uint) || is(T == int)) {
return cast(hash_t) elem;
} else static if(__traits(compiles, elem.toHash)) {
return elem.toHash;
} else {
static auto ti = typeid(typeof(elem));
return ti.getHash(&elem);
}
}
for(size_t index = 0; index < data.length;) {
if(data[index] == sentinel) {
index++;
continue;
}
auto hash = getHash(data[index]) % data.length;
if(index == hash) {
index++;
continue;
}
if(data[index] == data[hash]) {
data[index] = sentinel;
index++;
continue;
}
if(data[hash] == sentinel) {
swap(data[hash], data[index]);
index++;
continue;
}
auto hashHash = getHash(data[hash]) % data.length;
if(hashHash != hash) {
swap(data[index], data[hash]);
if(hash < index)
index++;
} else {
index++;
}
}
size_t swapPos = 0;
foreach(i; 0..data.length) {
if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
swap(data[i], data[swapPos++]);
}
}
size_t sentinelPos = data.length;
for(size_t i = swapPos; i < sentinelPos;) {
if(data[i] == sentinel) {
swap(data[i], data[--sentinelPos]);
} else {
i++;
}
}
dataIn = dataIn[0..sentinelPos + start + 1];
uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}

How about:
void rmdup(int *array, int length)
{
int *current , *end = array + length - 1;
for ( current = array + 1; array < end; array++, current = array + 1 )
{
while ( current <= end )
{
if ( *current == *array )
{
*current = *end--;
}
else
{
current++;
}
}
}
}
Should be O(n^2) or less.

If you are looking for the superior O-notation, then sorting the array with an O(n log n) sort then doing a O(n) traversal may be the best route. Without sorting, you are looking at O(n^2).
Edit: if you are just doing integers, then you can also do radix sort to get O(n).

One more efficient implementation
int i, j;
/* new length of modified array */
int NewLength = 1;
for(i=1; i< Length; i++){
for(j=0; j< NewLength ; j++)
{
if(array[i] == array[j])
break;
}
/* if none of the values in array[0..NewLength-1] is the same as array[i],
then copy the current value to the next free position in array */
if (j==NewLength )
array[NewLength++] = array[i];
}
In this implementation there is no need to sort the array.
Also, if a duplicate element is found, there is no need to shift all the elements after it by one position.
The output of this code is array[] with size NewLength.
Here we start from the 2nd element in the array and compare it with all the elements kept so far, i.e. array[0..NewLength-1].
We hold an extra index variable 'NewLength' for modifying the input array.
The NewLength variable is initialized to 1, because array[0] is always kept.
Element array[1] will be compared with array[0].
If they are different, then array[1] is written to array[NewLength] and NewLength is incremented.
If they are the same, NewLength will not be modified.
So if we have an array [1 2 1 3 1],
then
in the first pass of the 'j' loop, array[1] (2) will be compared with array[0]; then 2 will be written to array[NewLength] = array[1],
so the array will be [1 2] since NewLength = 2.
In the second pass of the 'j' loop, array[2] (1) will be compared with array[0] and array[1]. Since array[2] (1) and array[0] are the same, the loop will break here,
so the array will still be [1 2] since NewLength = 2,
and so on.
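Wrapped up as a function, the same approach could look like this sketch. The name dedup_keep_order is made up for the example, and the kept prefix starts out empty instead of holding array[0], which also makes an empty input safe. It preserves the order of first occurrences and runs in O(n^2) time with O(1) extra space.

```c
#include <stddef.h>

/* Compacts the unique values of array[0..length-1] to the front,
 * keeping first occurrences in their original order.
 * Returns the new length. */
size_t dedup_keep_order(int *array, size_t length)
{
    size_t new_length = 0; /* array[0..new_length-1] holds the kept values */

    for (size_t i = 0; i < length; i++) {
        size_t j;
        /* Look for array[i] among the values kept so far. */
        for (j = 0; j < new_length; j++) {
            if (array[i] == array[j])
                break;
        }
        /* Not found: append it to the kept prefix. */
        if (j == new_length)
            array[new_length++] = array[i];
    }
    return new_length;
}
```

For the input {4, 8, 4, 1, 1, 2, 9} from the question, it returns 5 and the array starts with 4 8 1 2 9.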

1. Using O(1) extra space, in O(n log n) time
This is possible, for instance:
first do an in-place O(n log n) sort
then walk through the list once, writing the first instance of every value back to the beginning of the list
I believe ejel's partner is correct that the best way to do this would be an in-place merge sort with a simplified merge step, and that that is probably the intent of the question, if you were eg. writing a new library function to do this as efficiently as possible with no ability to improve the inputs, and there would be cases it would be useful to do so without a hash-table, depending on the sorts of inputs. But I haven't actually checked this.
2. Using O(lots) extra space, in O(n) time
declare a zero'd array big enough to hold all integers
walk through the array once
set the corresponding array element to 1 for each integer.
If it was already 1, skip that integer.
This only works if several questionable assumptions hold:
it's possible to zero memory cheaply, or the size of the ints are small compared to the number of them
you're happy to ask your OS for 256^sizeof(int) memory
and it will cache it for you really really efficiently if it's gigantic
It's a bad answer, but if you have LOTS of input elements, but they're all 8-bit integers (or maybe even 16-bit integers) it could be the best way.
3. O(little)-ish extra space, O(n)-ish time
As #2, but use a hash table.
4. The clear way
If the number of elements is small, writing an appropriate algorithm is not useful if other code is quicker to write and quicker to read.
Eg. Walk through the array for each unique elements (ie. the first element, the second element (duplicates of the first having been removed) etc) removing all identical elements. O(1) extra space, O(n^2) time.
Eg. Use library functions which do this. efficiency depends which you have easily available.

Well, its basic implementation is quite simple. Go through all elements, check whether there are duplicates among the remaining ones, and shift the rest over them.
It's terribly inefficient, and you could speed it up with a helper array for the output or with sorting/binary trees, but this doesn't seem to be allowed.

If you are allowed to use C++, a call to std::sort followed by a call to std::unique will give you the answer. The time complexity is O(N log N) for the sort and O(N) for the unique traversal.
And if C++ is off the table there isn't anything that keeps these same algorithms from being written in C.
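A rough C equivalent of the sort-then-unique approach is sketched below, using the standard library's qsort for the sorting step. The function name sort_unique is hypothetical. Note that the C standard does not promise any particular complexity for qsort, so the O(N log N) bound is an assumption about the implementation, and the original element order is lost.

```c
#include <stdlib.h>

/* Comparison callback for qsort; avoids the overflow that a plain
 * subtraction could cause. */
static int compare_ints(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts arr[0..len-1], then compacts the unique values to the front.
 * Returns the number of unique values. */
size_t sort_unique(int *arr, size_t len)
{
    if (len == 0)
        return 0;

    qsort(arr, len, sizeof arr[0], compare_ints);

    size_t last = 0; /* index of the last unique value kept */
    for (size_t i = 1; i < len; i++) {
        if (arr[i] != arr[last])
            arr[++last] = arr[i];
    }
    return last + 1;
}
```

For {4, 8, 4, 1, 1, 2, 9} it returns 5 and the array starts with 1 2 4 8 9.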

You could do this in a single traversal, if you are willing to sacrifice memory. You can simply tally whether you have seen an integer or not in a hash/associative array. If you have already seen a number, remove it as you go, or better yet, move numbers you have not seen into a new array, avoiding any shifting in the original array.
In Perl:
foreach $i (@myary) {
    if(!defined $seen{$i}) {
        $seen{$i} = 1;
        push @newary, $i;
    }
}

The return value of the function should be the number of unique elements and they are all stored at the front of the array. Without this additional information, you won't even know if there were any duplicates.
Each iteration of the outer loop processes one element of the array. If it is unique, it stays in the front of the array and if it is a duplicate, it is overwritten by the last unprocessed element in the array. This solution runs in O(n^2) time.
#include <stdio.h>
#include <stdlib.h>
size_t rmdup(int *arr, size_t len)
{
size_t prev = 0;
size_t curr = 1;
size_t last = len - 1;
while (curr <= last) {
for (prev = 0; prev < curr && arr[curr] != arr[prev]; ++prev);
if (prev == curr) {
++curr;
} else {
arr[curr] = arr[last];
--last;
}
}
return curr;
}
void print_array(int *arr, size_t len)
{
printf("{");
size_t curr = 0;
for (curr = 0; curr < len; ++curr) {
if (curr > 0) printf(", ");
printf("%d", arr[curr]);
}
printf("}");
}
int main()
{
int arr[] = {4, 8, 4, 1, 1, 2, 9};
printf("Before: ");
size_t len = sizeof (arr) / sizeof (arr[0]);
print_array(arr, len);
len = rmdup(arr, len);
printf("\nAfter: ");
print_array(arr, len);
printf("\n");
return 0;
}

Here is a Java Version.
int[] removeDuplicate(int[] input){
int arrayLen = input.length;
for(int i=0;i<arrayLen;i++){
for(int j = i+1; j< arrayLen ; j++){
if(((input[i]^input[j]) == 0)){
input[j] = 0;
}
if((input[j]==0) && j<arrayLen-1){
input[j] = input[j+1];
input[j+1] = 0;
}
}
}
return input;
}

Here is my solution.
///// find duplicates in an array and remove them
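// returns the number of unique elements, which are left in input[0..count-1]
int unique(int* input, int n)
{
    if (n == 0)
        return 0;
    merge_sort(input, 0, n) ;   // helper not shown: sorts input[0..n)
    int prev = 0 ;              // index of the last unique value kept
    for(int i = 1 ; i < n ; i++)
    {
        if(input[i] != input[prev])
            input[++prev] = input[i] ;
    }
    return prev + 1 ;
}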

An array should obviously be "traversed" right-to-left to avoid unnecessary copying of values back and forth.
If you have unlimited memory, you can allocate a bit array of 2^(8 * sizeof(type-of-element-in-array)) / 8 bytes, so that each bit signifies whether you've already encountered the corresponding value or not.
If you don't, I can't think of anything better than traversing an array and comparing each value with values that follow it and then if duplicate is found, remove these values altogether. This is somewhere near O(n^2) (or O((n^2-n)/2)).
IBM has an article on kinda close subject.

Let's see:
O(N) pass to find min/max
allocate bit-array for found
O(N) pass swapping duplicates to end.

This can be done in one pass with an O(N log N) algorithm and no extra storage.
Proceed from element a[1] to a[N]. At each stage i, all of the elements to the left of a[i] comprise a sorted heap of elements a[0] through a[j]. Meanwhile, a second index j, initially 0, keeps track of the size of the heap.
Examine a[i] and insert it into the heap, which now occupies elements a[0] to a[j+1]. As the element is inserted, if a duplicate element a[k] is encountered having the same value, do not insert a[i] into the heap (i.e., discard it); otherwise insert it into the heap, which now grows by one element and now comprises a[0] to a[j+1], and increment j.
Continue in this manner, incrementing i until all of the array elements have been examined and inserted into the heap, which ends up occupying a[0] to a[j]. j is the index of the last element of the heap, and the heap contains only unique element values.
int algorithm(int[] a, int n)
{
int i, j;
for (j = 0, i = 1; i < n; i++)
{
// Insert a[i] into the heap a[0...j]
if (heapInsert(a, j, a[i]))
j++;
}
return j;
}
bool heapInsert(a[], int n, int val)
{
// Insert val into heap a[0...n]
...code omitted for brevity...
if (duplicate element a[k] == val)
return false;
a[k] = val;
return true;
}
Looking at the example, this is not exactly what was asked for, since the expected result preserves the original element order and the heap does not. But if this requirement is relaxed, the algorithm above should do the trick.

In Java I would solve it like this. Don't know how to write this in C.
int length = array.length;
for (int i = 0; i < length; i++)
{
for (int j = i + 1; j < length; j++)
{
if (array[i] == array[j])
{
int k, l; // l is the write position, k the read position
for (k = j + 1, l = j; k < length; k++, l++)
{
if (array[k] != array[i])
{
array[l] = array[k];
}
else
{
l--;
}
}
length = l;
}
}
}

How about the following?
int* temp = malloc(sizeof(int)*len);
int count = 0;
int x =0;
int y =0;
for(x=0;x<len;x++)
{
for(y=0;y<count;y++)
{
if(*(temp+y)==*(array+x))
{
break;
}
}
if(y==count)
{
*(temp+count) = *(array+x);
count++;
}
}
memcpy(array, temp, sizeof(int)*count); /* only the first count entries of temp are set */
free(temp);
I declare a temp array, collect the unique elements in it, and then copy them back to the original array; count is the new length.

After reviewing the problem, here is my Delphi way, which may help:
var
A: Array of Integer;
I,J,C,K, P: Integer;
begin
C:=10;
SetLength(A,10);
A[0]:=1; A[1]:=4; A[2]:=2; A[3]:=6; A[4]:=3; A[5]:=4;
A[6]:=3; A[7]:=4; A[8]:=2; A[9]:=5;
for I := 0 to C-1 do
begin
for J := I+1 to C-1 do
if A[I]=A[J] then
begin
for K := C-1 Downto J do
if A[J]<>A[k] then
begin
P:=A[K];
A[K]:=0;
A[J]:=P;
C:=K;
break;
end
else
begin
A[K]:=0;
C:=K;
end;
end;
end;
//truncate array
setlength(A,C);
end;

The following example should solve your problem:
def check_dump(x):
    # Keep x only the first time it is seen.
    if x not in t:
        t.append(x)
        return True
    return False
t = []
output = list(filter(check_dump, input))
print(output)

import java.util.ArrayList;
public class C {
public static void main(String[] args) {
// Only adjacent elements are compared, so the input must already be sorted.
int arr[] = {2,5,5,5,9,11,11,23,34,34,34,45,45};
ArrayList<Integer> arr1 = new ArrayList<Integer>();
for(int i=0;i<arr.length-1;i++){
if(arr[i] == arr[i+1]){
arr[i] = 99999;
}
}
for(int i=0;i<arr.length;i++){
if(arr[i] != 99999){
arr1.add(arr[i]);
}
}
System.out.println(arr1);
}
}

This is the naive solution, with up to N*(N-1)/2 comparisons. It uses constant additional space and maintains the original order. It is similar to the solution by @Byju, but uses no if(){} blocks. It also avoids copying an element onto itself.
#include <stdio.h>
#include <stdlib.h>
int numbers[] = {4, 8, 4, 1, 1, 2, 9};
#define COUNT (sizeof numbers / sizeof numbers[0])
size_t undup_it(int array[], size_t len)
{
size_t src,dst;
/* an array of size=1 cannot contain duplicate values */
if (len <2) return len;
/* an array of size>1 will contain at least one unique value */
for (src=dst=1; src < len; src++) {
size_t cur;
for (cur=0; cur < dst; cur++ ) {
if (array[cur] == array[src]) break;
}
if (cur != dst) continue; /* found a duplicate */
/* array[src] must be new: add it to the list of non-duplicates */
if (dst < src) array[dst] = array[src]; /* avoid copy-to-self */
dst++;
}
return dst; /* number of valid elements in new array */
}
void print_it(int array[], size_t len)
{
size_t idx;
for (idx=0; idx < len; idx++) {
printf("%c %d", (idx) ? ',' :'{' , array[idx] );
}
printf("}\n" );
}
int main(void) {
size_t cnt = COUNT;
printf("Before undup:" );
print_it(numbers, cnt);
cnt = undup_it(numbers,cnt);
printf("After undup:" );
print_it(numbers, cnt);
return 0;
}

This can be done in a single pass, in O(N) time in the number of integers in the input
list, and O(N) storage in the number of unique integers.
Walk through the list from front to back, with two pointers "dst" and
"src" initialized to the first item. Start with an empty hash table
of "integers seen". If the integer at src is not present in the hash,
write it to the slot at dst and increment dst. Add the integer at src
to the hash, then increment src. Repeat until src passes the end of
the input list.
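A minimal C sketch of the dst/src walk described above. The hash table here is an assumption of this example (a small open-addressing set with linear probing, sized to at least twice the input length), and the name dedup_hash is invented for illustration. Lookups are O(1) on average, not in the worst case.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: keeps the first occurrence of each value, in order,
   in a[0..count-1] and returns count, or (size_t)-1 if allocation fails. */
size_t dedup_hash(int a[], size_t n)
{
    if (n < 2)
        return n;

    /* Table capacity: a power of two, at least 2*n, so it never fills up. */
    size_t cap = 1;
    while (cap < 2 * n)
        cap <<= 1;

    int *slots = malloc(cap * sizeof *slots);
    unsigned char *used = calloc(cap, 1);
    if (slots == NULL || used == NULL) {
        free(slots);
        free(used);
        return (size_t)-1;
    }

    size_t dst = 0;
    for (size_t src = 0; src < n; src++) {
        /* Multiplicative hash, then linear probing for a free or equal slot. */
        size_t h = (size_t)((uint32_t)a[src] * 2654435761u) & (cap - 1);
        while (used[h] && slots[h] != a[src])
            h = (h + 1) & (cap - 1);

        if (!used[h]) {              /* not seen before: record and keep it */
            used[h] = 1;
            slots[h] = a[src];
            a[dst++] = a[src];
        }
    }

    free(slots);
    free(used);
    return dst;
}
```

Because dst never gets ahead of src, writing to a[dst] cannot overwrite a value that has not been read yet.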

Insert all the elements into a binary tree that disregards duplicates - O(n log(n)). Then extract all of them back into the array by doing a traversal - O(n). I am assuming that you don't need order preservation.

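Use a Bloom filter for hashing. This will reduce the memory overhead very significantly. Keep in mind that a Bloom filter can report false positives, so an occasional unique value may be treated as a duplicate.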

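In Java:
Integer[] arrayInteger = {1,2,3,4,3,2,4,6,7,8,9,9,10};
// Start with a delimiter so every stored number is wrapped in commas.
String value = ",";
for(Integer i:arrayInteger)
{
// Match ",i," so that e.g. 1 is not mistaken for part of 10.
if(!value.contains("," + i + ",")){
value += i + ",";
}
}
String[] arraySplitToString = value.substring(1).split(",");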
Integer[] arrayIntResult = new Integer[arraySplitToString.length];
for(int i = 0 ; i < arraySplitToString.length ; i++){
arrayIntResult[i] = Integer.parseInt(arraySplitToString[i]);
}
Output:
{ 1, 2, 3, 4, 6, 7, 8, 9, 10}
Hope this helps.

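Create a BinarySearchTree and insert the elements, skipping values that are already present. Note that building the tree takes O(n log n) when it stays balanced, not O(n), and it needs O(n) extra space.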

First, create an array check[n], where n is the number of elements of the array you want to make duplicate-free (for values in the range [0,n] it needs n+1 entries), and set every element of check to 1. Then traverse the array with the duplicates, say arr, with a for loop whose body is:
{
if (check[arr[i]] != 1) {
arr[i] = 0;
}
else {
check[arr[i]] = 0;
}
}
With that, every duplicate is set to zero. All that is left to do is traverse arr and print everything that is not equal to zero. The order is preserved and it takes linear time (3*n). Note that a genuine 0 in the input cannot be told apart from a removed duplicate.
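For illustration, here is a hedged C sketch of the check-array idea. It departs from the description in one respect, which is this example's choice and not the answer's: instead of overwriting duplicates with zero, it compacts the kept values to the front, so a real 0 in the input survives. The function name dedup_with_check and the max_value parameter are assumptions of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: assumes every arr[i] lies in [0, max_value].
   Keeps first occurrences in their original order in arr[0..kept-1] and
   returns kept, or (size_t)-1 if the check array cannot be allocated. */
size_t dedup_with_check(int arr[], size_t n, int max_value)
{
    size_t slots = (size_t)max_value + 1;
    unsigned char *check = malloc(slots);
    if (check == NULL)
        return (size_t)-1;
    memset(check, 1, slots);         /* 1 means "not seen yet" */

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (check[arr[i]] == 1) {
            check[arr[i]] = 0;       /* first occurrence: mark and keep */
            arr[kept++] = arr[i];
        }
        /* otherwise arr[i] is a duplicate and is simply skipped */
    }

    free(check);
    return kept;
}
```

The check array is still an extra array, so this trades the "no extra arrays" constraint for simplicity.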

Given an array of n elements, write an algorithm to remove all duplicates from the array in time O(nlogn)
Algorithm delete_duplicates (a[1....n])
//Remove duplicates from the given array
//input parameters: a[1:n], an array of n elements.
{
temp[1:n]; //an array of n elements.
for i=1 to n
temp[i].value=a[i]
temp[i].key=i
//based on 'value' sort the array temp.
//based on 'value' delete duplicate elements from temp.
//based on 'key' sort the array temp.
//construct an array p using temp.
p[i]=temp[i].value
return p.
}
The order of elements is maintained in the output array using the 'key'. Since temp holds n elements, each of the two sorts takes O(nlogn) and the deletion pass takes O(n). So the time taken to delete all duplicates from the array is O(nlogn).
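The pseudocode above might be turned into C along these lines. This is a sketch under assumptions: struct entry and the comparator names are invented here, ties on value are broken by key so that the first occurrence is the one kept, and the standard qsort is used for both sorts even though the C standard does not promise O(n log n) for it.

```c
#include <stdlib.h>

struct entry { int value; size_t key; };   /* key = original position */

static int by_value_then_key(const void *p, const void *q)
{
    const struct entry *x = p, *y = q;
    if (x->value != y->value)
        return x->value < y->value ? -1 : 1;
    return (x->key > y->key) - (x->key < y->key);
}

static int by_key(const void *p, const void *q)
{
    const struct entry *x = p, *y = q;
    return (x->key > y->key) - (x->key < y->key);
}

/* Returns the number of unique values, stored in original order in
   a[0..m-1], or (size_t)-1 if temp cannot be allocated. */
size_t delete_duplicates(int a[], size_t n)
{
    if (n < 2)
        return n;
    struct entry *temp = malloc(n * sizeof *temp);
    if (temp == NULL)
        return (size_t)-1;

    for (size_t i = 0; i < n; i++) {
        temp[i].value = a[i];
        temp[i].key = i;
    }

    /* Sort by value; equal values become adjacent, earliest key first. */
    qsort(temp, n, sizeof *temp, by_value_then_key);

    /* Delete duplicates: keep the first entry of each run of equal values. */
    size_t m = 1;
    for (size_t i = 1; i < n; i++)
        if (temp[i].value != temp[m - 1].value)
            temp[m++] = temp[i];

    /* Sort the survivors by key to restore the original order. */
    qsort(temp, m, sizeof *temp, by_key);

    for (size_t i = 0; i < m; i++)
        a[i] = temp[i].value;
    free(temp);
    return m;
}
```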

This is what I've got. It does not keep the original order, but we can sort the result in ascending or descending order to fix that up.
#include <stdio.h>
int main(void){
int x,n,myvar=0;
printf("Enter a number: \t");
scanf("%d",&n);
int arr[n],changedarr[n];
for(x=0;x<n;x++){
printf("Enter a number for array[%d]: ",x);
scanf("%d",&arr[x]);
}
printf("\nOriginal Number in an array\n");
for(x=0;x<n;x++){
printf("%d\t",arr[x]);
}
// printf("i\tj\tarr\tchanged\n");
/* 0 is used as the "removed" marker, so the input values must be nonzero */
for (int i = 0; i < n; i++)
{
changedarr[i]=arr[i];
for (int j = i+1; j < n; j++)
{
if(arr[i]==arr[j]){
changedarr[i]=0; /* a later copy exists: keep only the last occurrence */
break;
}
}
myvar+=1;
}
// printf("\n\nmyvar=%d\n",myvar);
int count=0;
printf("\nThe unique items:\n");
for (int i = 0; i < myvar; i++)
{
if(changedarr[i]!=0){
count+=1;
printf("%d\t",changedarr[i]);
}
}
printf("\n");
}

It'd be cool if you had a good DataStructure that could quickly tell if it contains an integer. Perhaps a tree of some sort.
DataStructure elementsSeen = new DataStructure();
int elementsRemoved = 0;
for(int i=0;i<array.Length;i++){
if(elementsSeen.Contains(array[i]))
elementsRemoved++;
else{
elementsSeen.Add(array[i]); // assumes DataStructure also offers an Add method
array[i-elementsRemoved] = array[i];
}
}
array.Length = array.Length - elementsRemoved;

Can you sort? Sort with Radix Sort - http://en.wikipedia.org/wiki/Radix_sort - which has complexity O(arraySize) for the given case, and then remove duplicates from the sorted array in O(arraySize).

Walk through the array and assign array[array[i]] = -array[array[i]] if it is not already negative; if it is already negative, then array[i] is a duplicate. This works since all values are within 0 and n.

Use the array as a container with the negative sign as an indicator; this will corrupt the input though.
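As a rough C sketch of the sign-as-indicator idea, under stated assumptions: every value is in [0, n-1] for an array of length n, the flag is stored as -x - 1 instead of -x so that a stored 0 can be flagged too, and the function name dedup_by_sign is hypothetical. The unique values come out in ascending order, so the original order is lost.

```c
#include <stddef.h>

/* Hypothetical helper: assumes 0 <= a[i] <= n-1 for all i.
   Returns the number of unique values; a[0..count-1] holds them sorted. */
size_t dedup_by_sign(int a[], size_t n)
{
    /* Pass 1: for each value v that occurs, flag index v by storing
       -a[v] - 1 there, which is negative for any a[v] >= 0. */
    for (size_t i = 0; i < n; i++) {
        int v = a[i] < 0 ? -a[i] - 1 : a[i];   /* undo a flag to get the value */
        if (a[v] >= 0)
            a[v] = -a[v] - 1;
    }

    /* Pass 2: every flagged index v is a value that was present.
       count never exceeds v, so a[v] is always read before it can be
       overwritten. */
    size_t count = 0;
    for (size_t v = 0; v < n; v++) {
        if (a[v] < 0)
            a[count++] = (int)v;
    }
    return count;
}
```

Both passes are linear and no second array is allocated, at the price of overwriting the input and losing its order.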
