I came across a question for which I couldn't find the algorithm. Can you help me?
Question: A valid substring is one which contains the letter a or z. You will get a string and you have to calculate the number of valid substrings of that string. For example, the string 'abcd' contains 4 valid substrings. The string 'azazaz' contains 21 valid substrings and similarly 'abbzbba' contains 22 valid substrings.
I just want to know the algorithm.
Define D[i] - number of valid substrings ending at index i.
Assuming you have this D[i], the solution is simply D[0]+D[1]+...+D[n-1].
Calculating D is fairly simple: iterate over the string and for each character:
if it is "valid", all substrings ending with this character are valid;
otherwise, the only valid substrings ending here are the valid substrings that ended at the previous character, extended by one.
C code:
#include <string.h>

int NumValidSubstrings(char* s) {
    int n = strlen(s);
    int D[n]; // VLA; note a VLA cannot take an initializer, so zero it explicitly
    memset(D, 0, n * sizeof(int)); // if VLAs are an issue, just use dynamic allocation
    for (int i = 0; i < n; i++) {
        if (s[i] == 'z' || s[i] == 'a') {
            // if the character is valid, each substring ending with it is also valid
            D[i] = i + 1;
        } else if (i > 0) {
            // else, only valid substrings from the last character, extended by 1, count
            D[i] = D[i-1];
        }
    }
    int count = 0;
    for (int i = 0; i < n; i++) count += D[i];
    return count;
}
Notes:
This technique is called Dynamic Programming.
This solution is O(n) time + space.
You can save some space by not storing the entire D array - but only the last value and calculate count on the fly, making this solution O(1) space and O(n) time.
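For reference, here is a minimal sketch of that O(1)-space variant (the function name is just illustrative):

#include <string.h>

// Keep only the previous D value and accumulate the total on the fly.
int NumValidSubstringsO1(const char* s) {
    int n = strlen(s);
    int prev = 0;   // number of valid substrings ending at the previous index (D[i-1])
    int count = 0;  // running total
    for (int i = 0; i < n; i++) {
        if (s[i] == 'a' || s[i] == 'z')
            prev = i + 1;   // every substring ending here is valid
        // otherwise prev is unchanged, i.e. D[i] = D[i-1]
        count += prev;
    }
    return count;
}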
Given an arbitrary char[] find the number of character pairs in the string. So this would be 3:
aabbcc
If two pairs of the same character are adjacent, both pairs should be counted. So this would be 3:
aaaabb
An interrupting single char must reset the count of consecutive character pairs. So this would be 1:
aabcc
If the interrupting single char is the same as the preceding pair, it does not reset the count of consecutive character pairs. So this would be 3:
aabbbccc
This is an adaptation from this question. The original interpretation of the question was interesting, but the comments kept changing the nature of that question from its original interpretation. This was my answer to that original interpretation, and I was wondering whether it could be improved upon.
Loop control should use the size of the array for the range. Indexing a[i + 1] may be out of bounds if i is the index of the last element, so using a[i - 1] instead and iterating over the range [1 .. sizeof(a) / sizeof(a[0])] is preferable.
The algorithm is best solved with 3 variables:
char* last points to the first element of the current string of consecutive characters
int count1 the number of consecutive pairs in the current count
int count the highest number of recorded consecutive pairs
The algorithm is best illustrated with a state machine. It operates on three states:
State 1 (reset): upon entry, set last to NULL; if count1 is larger than count, assign count1 to count, and reset count1 to 0.
State 2 (run started): upon entry, set last to the first character in this string of consecutive characters (a[i-1]).
State 3 (run ended): upon entry, add the number of consecutive characters pointed to by last, divided by 2, so as to count only the pairs.
This is corrected code with comments inline:
size_t i = 0;
char* last = NULL;
long int count1 = 0;
long int count = 0;
char a[] = {'d', 'p', 'p', 'c', 'c', 'd', 'd', 'd'};

while (++i < sizeof(a) / sizeof(a[0])) {     // This will iterate over the range [1 .. sizeof(a) / sizeof(a[0])]
    if (a[i - 1] == a[i]) {                  // Test previous character to avoid going out of range
        if (last == NULL) {                  // Entry to state 2
            last = a + i - 1;
        }
    } else if (last != NULL) {
        if (a + i - last > 1) {              // Entry to state 3
            count1 += (a + i - last) / 2;
            last = a + i;
        } else {                             // Entry to state 1
            if (count1 > count) {            // If the current count is larger
                count = count1;              // Replace the maximum count
            }
            count1 = 0;                      // Reset the current count
            last = NULL;
        }
    }
}
if (last != NULL) {                          // Entry to state 3
    if (a + (sizeof(a) / sizeof(a[0])) - last > 1) {
        count1 += (a + (sizeof(a) / sizeof(a[0])) - last) / 2;
    }
    if (count1 > count) {                    // If the current count is larger
        count = count1;                      // Replace the maximum count
    }
}
printf("%ld", count);
For example: int A[] = {3,2,1,2,3,2,1,3,1,2,3};
How to sort this array efficiently?
This is for a job interview, I need just a pseudo-code.
A promising way to sort it seems to be counting sort. It's worth having a look at this lecture by Richard Buckland, especially the part from 15:20.
Analogously to counting sort, but even better, would be to create an array representing the domain, initialize all its elements to 0, and then iterate through your array and count those values. Once you know the counts of the domain values, you can rewrite the values of your array accordingly. The complexity of such an algorithm is O(n).
Here's the C++ code with the behaviour as I described it. It makes two passes over the data, roughly 2n operations, which is still O(n):
int A[] = {3,2,1,2,3,2,1,3,1,2,3};
int domain[4] = {0};

// count occurrences of domain values - O(n):
int size = sizeof(A) / sizeof(int);
for (int i = 0; i < size; ++i)
    domain[A[i]]++;

// rewrite values of the array A accordingly - O(n):
for (int k = 0, i = 1; i < 4; ++i)
    for (int j = 0; j < domain[i]; ++j)
        A[k++] = i;
Note that if there is a big difference between domain values, storing the domain as an array is inefficient. In that case it is a much better idea to use a map (thanks abhinav for pointing it out). Here's the C++ code that uses std::map for storing domain value - occurrence count pairs:
int A[] = {2000,10000,7,10000,10000,2000,10000,7,7,10000};
std::map<int, int> domain;

// count occurrences of domain values:
int size = sizeof(A) / sizeof(int);
for (int i = 0; i < size; ++i)
{
    std::map<int, int>::iterator keyItr = domain.lower_bound(A[i]);
    if (keyItr != domain.end() && !domain.key_comp()(A[i], keyItr->first))
        keyItr->second++;                                    // next occurrence
    else
        domain.insert(keyItr, std::pair<int,int>(A[i], 1));  // first occurrence
}

// rewrite values of the array A accordingly:
int k = 0;
for (auto i = domain.begin(); i != domain.end(); ++i)
    for (int j = 0; j < i->second; ++j)
        A[k++] = i->first;
(If there is a more efficient way to use std::map in the above code, let me know.)
It's a standard problem in computer science: the Dutch national flag problem.
See the link.
Count each number and then create the new array based on those counts... time complexity is O(n).
int counts[3] = {0, 0, 0};
for (int a : A)        // C++11 range-for; count occurrences of 1, 2 and 3
    counts[a - 1]++;
for (int i = 0; i < counts[0]; i++)
    A[i] = 1;
for (int i = counts[0]; i < counts[0] + counts[1]; i++)
    A[i] = 2;
for (int i = counts[0] + counts[1]; i < counts[0] + counts[1] + counts[2]; i++)
    A[i] = 3;
Problem description: You have n buckets, each bucket contains one coin, and the value of the coin can be 5, 10 or 20. You have to sort the buckets under these limitations: 1. you can use only these 2 functions: SwitchBaskets(Basket1, Basket2) – switch 2 baskets; GetCoinValue(Basket1) – return the coin value in the selected basket; 2. you can't define an array of size n; 3. use the switch function as little as possible.
My simple pseudo-code solution, which can be implemented in any language with O(n) complexity (a rough sketch in code follows below).
I will pick the coin from a basket:
1) if it is 5, push it to be the first,
2) if it is 20, push it to be the last,
3) if it is 10, leave it where it is,
4) and look at the next bucket in line.
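Here is a minimal sketch of steps 1-4 under the problem's constraints, treating "push to the first/last" as swapping into the front/back region. SwitchBaskets and GetCoinValue are the operations given in the problem; I'm assuming here that baskets are addressed by index.

// Provided by the problem (prototypes assumed, not defined here):
int GetCoinValue(int basket);
void SwitchBaskets(int basket1, int basket2);

void SortBaskets(int n) {
    int low = 0, cur = 0, high = n - 1;
    while (cur <= high) {
        int v = GetCoinValue(cur);
        if (v == 5) {                 // 5's go to the front
            SwitchBaskets(low, cur);
            low++;
            cur++;
        } else if (v == 20) {         // 20's go to the back
            SwitchBaskets(cur, high);
            high--;                   // re-examine the basket just swapped into position cur
        } else {                      // 10's stay where they are
            cur++;
        }
    }
}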
Edit: if you can't push elements to the first or last position, then merge sort would be ideal for a practical implementation. Here is how it works:
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on, until at last two lists are merged into the final sorted list. Of the algorithms described here, this is the first that scales well to very large lists, because its worst-case running time is O(n log n). Merge sort has seen a relatively recent surge in popularity for practical implementations, being used for the standard sort routine in several programming languages.
I think the question intends for you to use bucket sort. In cases where there is a small number of distinct values, bucket sort can be much faster than the more commonly used quicksort or mergesort.
As Robert mentioned, bucket sort is the best in this situation.
I would also add the following algorithm (it's actually very similar to bucket sort):
[pseudocode is Java-style]
Create a HashMap<Integer, Integer> map and cycle through your array:
for (Integer i : array) {
    Integer value = map.get(i);
    if (value == null) {
        map.put(i, 1);
    } else {
        map.put(i, value + 1);
    }
}
I think I understand the question - you can use only O(1) space, and you can change the array only by swapping cells. (So you can use 2 operations on the array - swap and get.)
My solution:
Use 2 index pointers - one for the position of the last 1, and one for the position of the last 2.
In stage i, you assume that the array is already sorted from 1 to i-1,
then you check the i-th cell:
If A[i] == 3
you do nothing.
If A[i] == 2
you swap it with the cell after the last-2 index.
If A[i] == 1
you swap it with the cell after the last-2 index, and then swap the cell
after the last-2 index (which now contains the 1) with the cell after the last-1 index.
This is the main idea, you need to take care of the little details.
Overall O(n) complexity.
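A minimal sketch of that idea (the variable names are mine: last1 and last2 are the positions just after the last 1 and the last 2 of the sorted prefix):

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

void sort123(int A[], int n) {
    int last1 = 0;   // one past the last 1 in the sorted prefix
    int last2 = 0;   // one past the last 2 in the sorted prefix
    for (int i = 0; i < n; i++) {
        if (A[i] == 3) {
            // do nothing
        } else if (A[i] == 2) {
            swap_int(&A[i], &A[last2]);      // put the 2 right after the last 2
            last2++;
        } else {                             // A[i] == 1
            swap_int(&A[i], &A[last2]);      // first move the 1 right after the last 2
            swap_int(&A[last2], &A[last1]);  // then move it right after the last 1
            last2++;
            last1++;
        }
    }
}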
Here is a Groovy solution, based on @ElYusubov's, but instead of pushing Bucket(5) to the beginning and Bucket(15) to the end, it uses sifting so that 5's move toward the beginning and 15's toward the end.
Whenever we swap a bucket from the end into the current position, we decrement end but do not increment the current counter, as we need to check that element again.
array = [15,5,10,5,10,10,15,5,15,10,5]

def swapBucket(int a, int b) {
    if (a == b) return;
    array[a] = array[a] + array[b]
    array[b] = array[a] - array[b]
    array[a] = array[a] - array[b]
}

def getBucketValue(int a) {
    return array[a];
}

def start = 0, end = array.size() - 1, counter = 0;

// we can probably do away with this start,end but it helps when already sorted.
// start - first bucket from left which is not 5
while (start < end) {
    if (getBucketValue(start) != 5) break;
    start++;
}
// end - first bucket from right which is not 15
while (end > start) {
    if (getBucketValue(end) != 15) break;
    end--;
}
// already sorted when end = 1 { 1...size-1 are Buck(15) } or start = end-1
for (counter = start; counter < end;) {
    def value = getBucketValue(counter)
    if (value == 5) { swapBucket(start, counter); start++; counter++; }
    else if (value == 15) { swapBucket(end, counter); end--; }  // do not inc counter
    else { counter++; }
}

for (key in array) { print " ${key} " }
This can be done very easily using the Dutch national flag algorithm:
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Sort/Flag/
Instead of using 1,2,3, take the values as 0,1,2.
Have you tried to look at wiki for example? - http://en.wikipedia.org/wiki/Sorting_algorithm
This code is for c#:
However, you have to consider the algorithms in order to implement this in a non-language/framework-specific way. As suggested, bucket sort might be the efficient one to go with. If you provide detailed information on the problem, I will try to look for the best solution.
Good luck...
Here is a code sample in C# .NET
int[] intArray = new int[9] {3,2,1,2,3,2,1,3,1 };
Array.Sort(intArray);
// write array
foreach (int i in intArray) Console.Write("{0}, ", i.ToString());
Just for fun, here's how you would implement "pushing values to the far edge", as ElYusubov suggested:
sort(array) {
    a = 0
    b = array.length - 1
    # a is the first item which isn't a 1
    while array[a] == 1
        a++
    # b is the last item which isn't a 3
    while array[b] == 3
        b--
    # go over all the items from the first non-1 to the last non-3
    for (i = a; i <= b; i++)
        # the while loop is because a swap could bring in another 3 or 1
        while array[i] != 2
            if array[i] == 1
                swap(i, a)
                while array[a] == 1
                    a++
            else # array[i] == 3
                swap(i, b)
                while array[b] == 3
                    b--
}
This could actually be an optimal solution. I'm not sure.
Let's break the problem down: suppose we have just two distinct numbers in the array: [1,2,1,2,2,2,1,1].
We can sort it in one pass, O(n), with the minimum number of swaps, if:
we start two pointers from left and right and move them until they meet each other,
swapping the left element with the right one if the left element is bigger (to sort ascending).
We can do another pass for three numbers (k-1 passes in general). In pass one we move the 1's to their final position and in pass 2 we move the 2's.
def start = 0, end = array.size() - 1;

// Pass 1: move the lowest-order element (1) to its final position
while (start < end) {
    // first element from left which is not 1
    for ( ; Array[start] == 1 && start < end ; start++);
    // first element from right which IS 1
    for ( ; Array[end] != 1 && start < end ; end--);
    if (start < end) swap(start, end);
}
// In a second pass we can do 10, 15
// We can extend this using recursion: for sorting a domain of k values, we need k-1 recursions
def DNF(input, length):
    high = length - 1
    p = 0
    i = 0
    while i <= high:
        if input[i] == 0:
            input[i], input[p] = input[p], input[i]
            p = p + 1
            i = i + 1
        elif input[i] == 2:
            input[i], input[high] = input[high], input[i]
            high = high - 1
        else:
            i = i + 1

input = [0,1,2,2,1,0]
print "input: ", input
DNF(input, len(input))
print "output: ", input
I would use a recursive approach over here:
fun sortNums(smallestIndex, largestIndex, array, currentIndex) {
    if (currentIndex >= array.size)
        return
    if (array[currentIndex] == 1) {
        // You have found the smallest element, now increase the smallestIndex.
        // You need to put this element on the left side of the array at the smallestIndex position.
        // You can simply swap(smallestIndex, currentIndex).
        // The catch here is that you should not swap it if it's already on the left side.
        // Recursive call:
        sortNums(smallestIndex, largestIndex, array, currentIndex or currentIndex + 1)
        // Whether to increment currentIndex in the recursive call depends on the element at
        // currentIndex: if it's 3, you might want to let the recursion decide the fate of
        // currentIndex; otherwise simply increment by 1 and move further.
    } else if (array[currentIndex] == 3) {
        // same logic, but you need to add it at the end
    }
}
You can start the recursive function by sortNums(smallestIndex=-1,largestIndex=array.size,array,currentIndex=0)
You can find the sample code over here
Code Link
// Bubble sort for an unsorted array - algorithm
public void bubbleSort(int arr[], int n) {   // n is the length of the array
    int temp;
    for (int i = 0; i <= n - 2; i++) {
        for (int j = 0; j <= (n - 2 - i); j++) {
            if (arr[j] > arr[j + 1]) {
                temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}
I recently came across a question somewhere:
Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?
What I am interested in knowing is the second part, i.e., without using auxiliary storage. Do you have any idea?
Just add them all up, and subtract the total you would expect if each number appeared only once.
Eg:
Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10
Input - Expected => 2
Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)
Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)
printf("Answer : %d\n",
array[0] ^
array[1] ^
array[2] ^
// continue typing...
array[999] ^
array[1000] ^
1 ^
2 ^
// continue typing...
999^
1000
);
Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.
And if the array element values are know at compile time as well, the compiler will optimize the whole statement to a constant. :-)
Original solution: this does not meet the strict requirements of the question, even though it works and finds the correct answer. It uses one additional integer to keep the loop counter, and it accesses each array element three times - twice to read and write it at the current iteration and once to read it for the next iteration.
Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.
Aside from that one though, here's a destructive algorithm that can safely scale for any N up to MAX_INT.
for (int i = 1; i < 1001; i++)
{
array[i] = array[i] ^ array[i-1] ^ i;
}
printf("Answer : %d\n", array[1000]);
I will leave the exercise of figuring out why this works to you, with a simple hint :-):
a ^ a = 0
0 ^ a = a
A non-destructive version of the solution by Franci Penov.
This can be done by making use of the XOR operator.
Let's say we have an array of size 5: 4, 3, 1, 2, 2,
whose elements are at indices 0, 1, 2, 3, 4.
Now XOR all the elements and all the indices together. We get 2, which is the duplicate element. This happens because 0 plays no role in the XOR; the remaining n-1 indices pair up with the same n-1 distinct elements in the array, and the only unpaired element in the array is the duplicate.
int i;
int dupe = 0;

for (i = 0; i < N; i++) {
    dupe = dupe ^ arr[i] ^ i;
}
// dupe has the duplicate.
The best feature of this solution is that it does not suffer from the overflow problems seen in the addition-based solution.
Since this is an interview question, it would be best to start with the addition based solution, identify the overflow limitation and then give the XOR based solution :)
This makes use of an additional variable so does not meet the requirements in the question completely.
Add all the numbers together. The final sum will be 1+2+...+1000 plus the duplicate number.
To paraphrase Franci Penov's solution.
The (usual) problem is: given an array of integers of arbitrary length that contains only elements repeated an even number of times, except for one value which is repeated an odd number of times, find that value.
The solution is:
acc = 0
for i in array: acc = acc ^ i
Your current problem is an adaptation. The trick is that you are to find the element that is repeated twice, so you need to adapt the solution to compensate for this quirk.
acc = 0
for i in len(array): acc = acc ^ i ^ array[i]
Which is what Franci's solution does in the end, although it destroys the whole array (by the way, it could have destroyed only the first or last element...).
But since you need extra storage for the index, I think you'll be forgiven if you also use an extra integer... The restriction is most probably there because they want to prevent you from using an array.
It would have been phrased more accurately if they had required O(1) space (1000 can be seen as N, since it's arbitrary here).
Add all numbers. The sum of integers 1..1000 is (1000*1001)/2. The difference from what you get is your number.
One line solution in Python
arr = [1,3,2,4,2]
print reduce(lambda acc, (i, x): acc ^ i ^ x, enumerate(arr), 0)
# -> 2
An explanation of why it works is in @Matthieu M.'s answer.
If you know that we have the exact numbers 1-1000, you can add up the results and subtract 500500 (sum(1, 1000)) from the total. This will give the repeated number because sum(array) = sum(1, 1000) + repeated number.
Well, there is a very simple way to do this... each of the numbers between 1 and 1000 occurs exactly once, except for the number that is repeated... So, the sum from 1 to 1000 is 500500, and the algorithm is:
sum = 0
for each element of the array:
sum += that element of the array
number_that_occurred_twice = sum - 500500
public static void main(String[] args) {
    int start = 1;
    int end = 10;
    int arr[] = {1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10};
    System.out.println(findDuplicate(arr, start, end));
}

static int findDuplicate(int arr[], int start, int end) {
    int sumAll = 0;
    for (int i = start; i <= end; i++) {
        sumAll += i;
    }
    System.out.println(sumAll);
    int sumArrElem = 0;
    for (int e : arr) {
        sumArrElem += e;
    }
    System.out.println(sumArrElem);
    return sumArrElem - sumAll;
}
No extra storage requirement (apart from loop variable).
int length = (sizeof array) / (sizeof array[0]);   // 1001 elements holding the values 1..1000
for (int i = 1; i < length; i++) {
    array[0] += array[i];                          // array[0] ends up holding the total sum
}
printf(
    "Answer : %d\n",
    ( array[0] - ((length - 1) * length) / 2 )     // subtract 1 + 2 + ... + 1000 = 500500
);
Do arguments and callstacks count as auxiliary storage?
int sumRemaining(int* remaining, int count) {
    if (!count) {
        return 0;
    }
    return remaining[0] + sumRemaining(remaining + 1, count - 1);
}
printf("duplicate is %d", sumRemaining(array, 1001) - 500500);
Edit: tail call version
int sumRemaining(int* remaining, int count, int sumSoFar) {
    if (!count) {
        return sumSoFar;
    }
    return sumRemaining(remaining + 1, count - 1, sumSoFar + remaining[0]);
}
printf("duplicate is %d", sumRemaining(array, 1001, 0) - 500500);
public int duplicateNumber(int[] A) {
    int count = 0;
    for (int k = 0; k < A.Length; k++)
        count += A[k];
    return count - (A.Length * (A.Length - 1) >> 1);
}
A triangular number T(n) is the sum of the n natural numbers from 1 to n. It can be represented as n(n+1)/2. Thus, knowing that among the given 1001 natural numbers one and only one number is duplicated, you can simply sum all the given numbers and subtract T(1000). The result will be the duplicate.
For a triangular number T(n), if n is any power of 10, there is also a beautiful method of finding T(n) based on the base-10 representation:
n = 1000
s = sum(GivenList)
r = str(n/2)
duplicate = int( r + r ) - s
I support adding all the elements and then subtracting the sum of all the indices, but this won't work if the number of elements is very large, i.e. it may cause an integer overflow! So I have devised this algorithm, which reduces the chance of an integer overflow to a large extent.
dup = 0
for i = 0 to n-1
begin:
    diff = a[i] - i;
    dup = dup + diff;
end
// where dup is the duplicate element
But with this method I won't be able to find out the index at which the duplicate element is present!
For that I would need to traverse the array one more time, which is not desirable.
Improvement of Franci's answer, based on the property of XORing consecutive values:
int result = xor_sum(N);
for (i = 0; i < N + 1; i++)
{
    result = result ^ array[i];
}
Where:
// Compute (((1 xor 2) xor 3) .. xor value)
int xor_sum(int value)
{
    int modulo = value % 4;
    if (modulo == 0)
        return value;
    else if (modulo == 1)
        return 1;
    else if (modulo == 2)
        return value + 1;
    else
        return 0;
}
Or, in pseudocode/math language, f(n) is defined as (optimized):
if n mod 4 = 0 then X = n
if n mod 4 = 1 then X = 1
if n mod 4 = 2 then X = n+1
if n mod 4 = 3 then X = 0
And in canonical form f(n) is:
f(0) = 0
f(n) = f(n-1) xor n
My answer to question 2:
Find the sum and product of the numbers from 1 to N; call them SUM and PROD.
Find the sum and product of the numbers from 1 to N with x and y missing (assume x, y are the missing ones); call them mySum and myProd.
Thus:
SUM = mySum + x + y;
PROD = myProd * x * y;
Thus:
x * y = PROD / myProd; x + y = SUM - mySum;
We can find x and y by solving these equations.
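For what it's worth, a minimal sketch of that last step: with s = SUM - mySum and p = PROD / myProd, x and y are the roots of t^2 - s*t + p = 0, so they fall out of the quadratic formula.

#include <math.h>

// s = x + y, p = x * y; x and y are the roots of t^2 - s*t + p = 0.
void solveMissingPair(double s, double p, double *x, double *y) {
    double d = sqrt(s * s - 4.0 * p);   // discriminant; non-negative when real x, y exist
    *x = (s + d) / 2.0;
    *y = (s - d) / 2.0;
}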
In the version with aux storage, you first set all the values to -1 and, as you iterate, check whether you have already inserted the value into the aux array. If not (the value there must still be -1), insert it. If you hit a duplicate, here is your solution!
In the version without aux storage, you retrieve an element from the list and check whether the rest of the list contains that value. If it does, you've found it.
private static int findDuplicated(int[] array) {
    if (array == null || array.length < 2) {
        System.out.println("invalid");
        return -1;
    }
    int[] checker = new int[array.length];
    Arrays.fill(checker, -1);
    for (int i = 0; i < array.length; i++) {
        int value = array[i];
        int checked = checker[value];
        if (checked == -1) {
            checker[value] = value;
        } else {
            return value;
        }
    }
    return -1;
}

private static int findDuplicatedWithoutAux(int[] array) {
    if (array == null || array.length < 2) {
        System.out.println("invalid");
        return -1;
    }
    for (int i = 0; i < array.length; i++) {
        int value = array[i];
        for (int j = i + 1; j < array.length; j++) {
            int toCompare = array[j];
            if (value == toCompare) {
                return array[i];
            }
        }
    }
    return -1;
}
I got this problem from an interview with Microsoft.
Given an array of random integers,
write an algorithm in C that removes
duplicated numbers and return the unique numbers in the original
array.
E.g Input: {4, 8, 4, 1, 1, 2, 9} Output: {4, 8, 1, 2, 9, ?, ?}
One caveat is that the expected algorithm should not require the array to be sorted first. And when an element has been removed, the following elements must be shifted forward as well. Anyway, the values at the tail of the array, from which elements were shifted forward, are negligible.
Update: The result must be returned in the original array and a helper data structure (e.g. hashtable) should not be used. However, I guess order preservation is not necessary.
Update 2: For those who wonder why these impractical constraints: this was an interview question, and all these constraints were discussed during the thinking process to see how I could come up with different ideas.
A solution suggested by my girlfriend is a variation of merge sort. The only modification is that during the merge step, duplicated values are simply disregarded. This solution would also be O(n log n). In this approach, the sorting and the duplicate removal are combined. However, I'm not sure whether that makes any difference.
I've posted this once before on SO, but I'll reproduce it here because it's pretty cool. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) in time complexity. The algorithm is as follows:
Take the first element of the array, this will be the sentinel.
Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to sentinel.
Move all elements for which the index is equal to the hash to the beginning of the array.
Move all elements that are equal to sentinel, except the first element of the array, to the end of the array.
What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided there is no pathological scenario in the hashing: even if there are no duplicates, approximately 2/3 of the elements are eliminated at each recursion. Each level of recursion is O(n), where n is the number of elements left. The only problem is that, in practice, it's slower than a quicksort when there are few duplicates, i.e. lots of collisions. However, when there are huge numbers of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
    uniqueInPlaceImpl(dataIn, 0);
}

void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
    if(dataIn.length - start < 2)
        return;

    invariant T sentinel = dataIn[start];
    T[] data = dataIn[start + 1..$];

    static hash_t getHash(T elem) {
        static if(is(T == uint) || is(T == int)) {
            return cast(hash_t) elem;
        } else static if(__traits(compiles, elem.toHash)) {
            return elem.toHash;
        } else {
            static auto ti = typeid(typeof(elem));
            return ti.getHash(&elem);
        }
    }

    for(size_t index = 0; index < data.length;) {
        if(data[index] == sentinel) {
            index++;
            continue;
        }

        auto hash = getHash(data[index]) % data.length;
        if(index == hash) {
            index++;
            continue;
        }

        if(data[index] == data[hash]) {
            data[index] = sentinel;
            index++;
            continue;
        }

        if(data[hash] == sentinel) {
            swap(data[hash], data[index]);
            index++;
            continue;
        }

        auto hashHash = getHash(data[hash]) % data.length;
        if(hashHash != hash) {
            swap(data[index], data[hash]);
            if(hash < index)
                index++;
        } else {
            index++;
        }
    }

    size_t swapPos = 0;
    foreach(i; 0..data.length) {
        if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
            swap(data[i], data[swapPos++]);
        }
    }

    size_t sentinelPos = data.length;
    for(size_t i = swapPos; i < sentinelPos;) {
        if(data[i] == sentinel) {
            swap(data[i], data[--sentinelPos]);
        } else {
            i++;
        }
    }

    dataIn = dataIn[0..sentinelPos + start + 1];
    uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}
How about:
void rmdup(int *array, int length)
{
    int *current, *end = array + length - 1;

    for (current = array + 1; array < end; array++, current = array + 1)
    {
        while (current <= end)
        {
            if (*current == *array)
            {
                *current = *end--;
            }
            else
            {
                current++;
            }
        }
    }
}
Should be O(n^2) or less.
If you are looking for the superior O-notation, then sorting the array with an O(n log n) sort then doing a O(n) traversal may be the best route. Without sorting, you are looking at O(n^2).
Edit: if you are just doing integers, then you can also do radix sort to get O(n).
One more efficient implementation:
int i, j;

/* new length of modified array */
int NewLength = 1;

for (i = 1; i < Length; i++) {
    for (j = 0; j < NewLength; j++)
    {
        if (array[i] == array[j])
            break;
    }

    /* if none of the values in array[0..NewLength-1] is the same as array[i],
       then copy the current value to the corresponding new position in the array */
    if (j == NewLength)
        array[NewLength++] = array[i];
}
In this implementation there is no need to sort the array.
Also, if a duplicate element is found, there is no need to shift all the elements after it by one position.
The output of this code is array[] with size NewLength.
Here we start from the 2nd element in the array and compare it with all the elements in the array before it.
We hold an extra index variable 'NewLength' for modifying the input array.
The NewLength variable is initialized to 1.
The element at array[1] will be compared with array[0].
If they are different, then the value at array[NewLength] will be overwritten with array[1] and NewLength incremented.
If they are the same, NewLength will not be modified.
So if we have an array [1 2 1 3 1],
then:
In the first pass of the 'j' loop, array[1] (2) will be compared with array[0]; since they differ, 2 is written to array[NewLength] = array[1],
so the array will be [1 2 ...] with NewLength = 2.
In the second pass of the 'j' loop, array[2] (1) will be compared with array[0] and array[1]. Here, since array[2] (1) and array[0] are the same, the loop breaks,
so the array stays [1 2 ...] with NewLength = 2,
and so on.
1. Using O(1) extra space, in O(n log n) time
This is possible, for instance:
first do an in-place O(n log n) sort
then walk through the list once, writing the first instance of each element back to the beginning of the list
I believe ejel's partner is correct that the best way to do this would be an in-place merge sort with a simplified merge step, and that this is probably the intent of the question, if you were e.g. writing a new library function to do this as efficiently as possible with no ability to improve the inputs. There would be cases where it would be useful to do so without a hash table, depending on the sorts of inputs. But I haven't actually checked this.
2. Using O(lots) extra space, in O(n) time
declare a zero'd array big enough to hold all integers
walk through the array once
set the corresponding array element to 1 for each integer.
If it was already 1, skip that integer.
This only works if several questionable assumptions hold:
it's possible to zero memory cheaply, or the size of the ints is small compared to the number of them
you're happy to ask your OS for 256^sizeof(int) memory
and it will cache it for you really, really efficiently if it's gigantic
It's a bad answer, but if you have LOTS of input elements that are all 8-bit integers (or maybe even 16-bit integers), it could be the best way.
3. O(little)-ish extra space, O(n)-ish time
As #2, but use a hash table.
4. The clear way
If the number of elements is small, writing an appropriate algorithm is not useful if other code is quicker to write and quicker to read.
E.g. walk through the array for each unique element (i.e. the first element, the second element (duplicates of the first having been removed), etc.), removing all identical elements. O(1) extra space, O(n^2) time.
E.g. use library functions which do this. Efficiency depends on which you have easily available.
Well, its basic implementation is quite simple. Go through all the elements, check whether there are duplicates among the remaining ones, and shift the rest over them.
It's terribly inefficient, and you could speed it up with a helper array for the output or with sorting/binary trees, but that doesn't seem to be allowed.
If you are allowed to use C++, a call to std::sort followed by a call to std::unique will give you the answer. The time complexity is O(N log N) for the sort and O(N) for the unique traversal.
And if C++ is off the table there isn't anything that keeps these same algorithms from being written in C.
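A minimal sketch of that approach (std::unique only collapses adjacent duplicates, which is why the sort comes first; the input here is the example array from the question):

#include <algorithm>
#include <cstdio>

int main() {
    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    int n = sizeof(arr) / sizeof(arr[0]);
    std::sort(arr, arr + n);                                     // O(N log N)
    int unique_count = (int)(std::unique(arr, arr + n) - arr);   // O(N); keeps the first of each run
    for (int i = 0; i < unique_count; i++)
        std::printf("%d ", arr[i]);
    std::printf("\n");
    return 0;
}

Note that this approach gives up the original element order, which the question says is acceptable.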
You could do this in a single traversal, if you are willing to sacrifice memory. You can simply tally whether you have seen an integer or not in a hash/associative array. If you have already seen a number, remove it as you go, or better yet, move numbers you have not seen into a new array, avoiding any shifting in the original array.
In Perl:
foreach $i (@myary) {
    if (!defined $seen{$i}) {
        $seen{$i} = 1;
        push @newary, $i;
    }
}
The return value of the function should be the number of unique elements and they are all stored at the front of the array. Without this additional information, you won't even know if there were any duplicates.
Each iteration of the outer loop processes one element of the array. If it is unique, it stays in the front of the array and if it is a duplicate, it is overwritten by the last unprocessed element in the array. This solution runs in O(n^2) time.
#include <stdio.h>
#include <stdlib.h>
size_t rmdup(int *arr, size_t len)
{
    size_t prev = 0;
    size_t curr = 1;
    size_t last = len - 1;
    while (curr <= last) {
        for (prev = 0; prev < curr && arr[curr] != arr[prev]; ++prev);
        if (prev == curr) {
            ++curr;
        } else {
            arr[curr] = arr[last];
            --last;
        }
    }
    return curr;
}

void print_array(int *arr, size_t len)
{
    printf("{");
    size_t curr = 0;
    for (curr = 0; curr < len; ++curr) {
        if (curr > 0) printf(", ");
        printf("%d", arr[curr]);
    }
    printf("}");
}

int main()
{
    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    printf("Before: ");
    size_t len = sizeof (arr) / sizeof (arr[0]);
    print_array(arr, len);
    len = rmdup(arr, len);
    printf("\nAfter: ");
    print_array(arr, len);
    printf("\n");
    return 0;
}
Here is a Java Version.
int[] removeDuplicate(int[] input) {
    int arrayLen = input.length;
    for (int i = 0; i < arrayLen; i++) {
        for (int j = i + 1; j < arrayLen; j++) {
            if ((input[i] ^ input[j]) == 0) {
                input[j] = 0;
            }
            if (input[j] == 0 && j < arrayLen - 1) {
                input[j] = input[j + 1];
                input[j + 1] = 0;
            }
        }
    }
    return input;
}
Here is my solution.
///// find duplicates in an array and remove them
void unique(int* input, int n)
{
    merge_sort(input, 0, n);
    int prev = 0;
    for (int i = 1; i < n; i++)
    {
        if (input[i] != input[prev])
        {
            ++prev;
            if (prev != i)              // avoid copying an element onto itself
                input[prev] = input[i];
        }
    }
    // unique elements are now input[0..prev]
}
An array should obviously be "traversed" right-to-left to avoid unnecessary copying of values back and forth.
If you have unlimited memory, you can allocate a bit array with one bit per possible value (2^(8*sizeof(type-of-element-in-array)) bits), so each bit signifies whether you've already encountered the corresponding value or not.
If you don't, I can't think of anything better than traversing the array and comparing each value with the values that follow it, and then, if a duplicate is found, removing those values altogether. This is somewhere near O(n^2) (or O((n^2-n)/2)).
IBM has an article on a somewhat related subject.
Let's see:
an O(N) pass to find min/max,
allocate a bit-array for "found",
an O(N) pass swapping duplicates to the end.
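A minimal sketch of that outline in C++ (std::vector<bool> plays the role of the bit-array; the function name is mine, duplicates get swapped to the tail and the new length is returned):

#include <algorithm>
#include <utility>
#include <vector>
#include <cstddef>

std::size_t rmdup_bits(int *a, std::size_t len) {
    if (len < 2) return len;
    // O(N) pass to find min/max, so the bit-array only spans the value range
    // (assumes that range is small enough to allocate).
    int lo = *std::min_element(a, a + len);
    int hi = *std::max_element(a, a + len);
    std::vector<bool> seen(static_cast<std::size_t>(hi - lo) + 1, false);
    // O(N) pass swapping duplicates to the end.
    std::size_t end = len;
    std::size_t i = 0;
    while (i < end) {
        if (seen[a[i] - lo]) {
            std::swap(a[i], a[--end]);   // duplicate: park it at the tail, re-check a[i]
        } else {
            seen[a[i] - lo] = true;
            ++i;
        }
    }
    return end;                          // a[0..end-1] holds the unique values
}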
This can be done in one pass with an O(N log N) algorithm and no extra storage.
Proceed from element a[1] to a[N]. At each stage i, all of the elements to the left of a[i] comprise a sorted heap of elements a[0] through a[j]. Meanwhile, a second index j, initially 0, keeps track of the size of the heap.
Examine a[i] and insert it into the heap, which now occupies elements a[0] to a[j+1]. As the element is inserted, if a duplicate element a[k] is encountered having the same value, do not insert a[i] into the heap (i.e., discard it); otherwise insert it into the heap, which now grows by one element and now comprises a[0] to a[j+1], and increment j.
Continue in this manner, incrementing i until all of the array elements have been examined and inserted into the heap, which ends up occupying a[0] to a[j]. j is the index of the last element of the heap, and the heap contains only unique element values.
int algorithm(int[] a, int n)
{
    int i, j;

    for (j = 0, i = 1; i < n; i++)
    {
        // Insert a[i] into the heap a[0...j]
        if (heapInsert(a, j, a[i]))
            j++;
    }
    return j;
}

bool heapInsert(a[], int n, int val)
{
    // Insert val into heap a[0...n]
    ...code omitted for brevity...
    if (duplicate element a[k] == val)
        return false;
    a[k] = val;
    return true;
}
Looking at the example, this is not exactly what was asked for, since the resulting array in the example preserves the original element order, while this algorithm does not. But if that requirement is relaxed, the algorithm above should do the trick.
In Java I would solve it like this. Don't know how to write this in C.
int length = array.length;
for (int i = 0; i < length; i++)
{
    for (int j = i + 1; j < length; j++)
    {
        if (array[i] == array[j])
        {
            int k, l;
            for (k = j + 1, l = j; k < length; k++, l++)
            {
                if (array[k] != array[i])
                {
                    array[l] = array[k];
                }
                else
                {
                    l--;
                }
            }
            length = l;
        }
    }
}
How about the following?
int* temp = malloc(sizeof(int) * len);
int count = 0;
int x = 0;
int y = 0;

for (x = 0; x < len; x++)
{
    for (y = 0; y < count; y++)
    {
        if (*(temp + y) == *(array + x))
        {
            break;
        }
    }
    if (y == count)
    {
        *(temp + count) = *(array + x);
        count++;
    }
}
memcpy(array, temp, sizeof(int) * len);
I try to declare a temp array and put the elements into that before copying everything back to the original array.
After reviewing the problem, here is my Delphi way; it may help:
var
  A: Array of Integer;
  I, J, C, K, P: Integer;
begin
  C := 10;
  SetLength(A, 10);
  A[0]:=1; A[1]:=4; A[2]:=2; A[3]:=6; A[4]:=3; A[5]:=4;
  A[6]:=3; A[7]:=4; A[8]:=2; A[9]:=5;

  for I := 0 to C-1 do
  begin
    for J := I+1 to C-1 do
      if A[I] = A[J] then
      begin
        for K := C-1 downto J do
          if A[J] <> A[K] then
          begin
            P := A[K];
            A[K] := 0;
            A[J] := P;
            C := K;
            break;
          end
          else
          begin
            A[K] := 0;
            C := K;
          end;
      end;
  end;
  // truncate array
  SetLength(A, C);
end;
The following example should solve your problem:
def check_dump(x):
    if not x in t:
        t.append(x)
        return True

t = []

output = filter(check_dump, input)
print(output)
import java.util.ArrayList;
public class C {

    public static void main(String[] args) {

        int arr[] = {2,5,5,5,9,11,11,23,34,34,34,45,45};
        ArrayList<Integer> arr1 = new ArrayList<Integer>();

        for (int i = 0; i < arr.length - 1; i++) {
            if (arr[i] == arr[i + 1]) {
                arr[i] = 99999;
            }
        }
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] != 99999) {
                arr1.add(arr[i]);
            }
        }
        System.out.println(arr1);
    }
}
This is the naive (N*(N-1)/2) solution. It uses constant additional space and maintains the original order. It is similar to the solution by #Byju, but uses no if(){} blocks. It also avoids copying an element onto itself.
#include <stdio.h>
#include <stdlib.h>
int numbers[] = {4, 8, 4, 1, 1, 2, 9};
#define COUNT (sizeof numbers / sizeof numbers[0])
size_t undup_it(int array[], size_t len)
{
    size_t src, dst;

    /* an array of size<2 cannot contain duplicate values */
    if (len < 2) return len;

    /* an array of size>1 will contain at least one unique value */
    for (src = dst = 1; src < len; src++) {
        size_t cur;
        for (cur = 0; cur < dst; cur++) {
            if (array[cur] == array[src]) break;
        }
        if (cur != dst) continue;                  /* found a duplicate */

        /* array[src] must be new: add it to the list of non-duplicates */
        if (dst < src) array[dst] = array[src];    /* avoid copy-to-self */
        dst++;
    }
    return dst;                                    /* number of valid elements in the new array */
}

void print_it(int array[], size_t len)
{
    size_t idx;

    for (idx = 0; idx < len; idx++) {
        printf("%c %d", (idx) ? ',' : '{', array[idx]);
    }
    printf("}\n");
}

int main(void) {
    size_t cnt = COUNT;

    printf("Before undup:");
    print_it(numbers, cnt);

    cnt = undup_it(numbers, cnt);
    printf("After undup:");
    print_it(numbers, cnt);

    return 0;
}
This can be done in a single pass, in O(N) time in the number of integers in the input
list, and O(N) storage in the number of unique integers.
Walk through the list from front to back, with two pointers "dst" and
"src" initialized to the first item. Start with an empty hash table
of "integers seen". If the integer at src is not present in the hash,
write it to the slot at dst and increment dst. Add the integer at src
to the hash, then increment src. Repeat until src passes the end of
the input list.
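A minimal sketch of that walk in C++ (std::unordered_set stands in for the "integers seen" hash; the function name is mine, and the number of unique values kept at the front is returned):

#include <unordered_set>
#include <cstddef>

std::size_t unique_forward(int *a, std::size_t len) {
    std::unordered_set<int> seen;
    std::size_t dst = 0;
    for (std::size_t src = 0; src < len; ++src) {
        if (seen.insert(a[src]).second)   // true only the first time this value appears
            a[dst++] = a[src];
    }
    return dst;
}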
Insert all the elements into a binary tree that disregards duplicates - O(n log n). Then extract all of them back into the array by doing a traversal - O(n). I am assuming that you don't need order preservation.
Use a Bloom filter for hashing. This will reduce the memory overhead very significantly.
In Java:
Integer[] arrayInteger = {1,2,3,4,3,2,4,6,7,8,9,9,10};

String value = "";

for (Integer i : arrayInteger)
{
    if (!value.contains(Integer.toString(i))) {
        value += Integer.toString(i) + ",";
    }
}

String[] arraySplitToString = value.split(",");
Integer[] arrayIntResult = new Integer[arraySplitToString.length];
for (int i = 0; i < arraySplitToString.length; i++) {
    arrayIntResult[i] = Integer.parseInt(arraySplitToString[i]);
}
output:
{ 1, 2, 3, 4, 6, 7, 8, 9, 10}
Hope this will help.
Create a BinarySearchTree which has O(n) complexity.
First, you should create an array check[n], where n is the number of elements of the array you want to make duplicate-free, and set the value of every element (of the check array) to 1. Using a for loop, traverse the array with the duplicates (say its name is arr), and inside the for loop write this:
{
    if (check[arr[i]] != 1) {
        arr[i] = 0;
    } else {
        check[arr[i]] = 0;
    }
}
With that, you set every duplicate equal to zero. So the only thing left to do is to traverse the arr array and print everything that's not equal to zero. The order stays, and it takes linear time (3*n).
Given an array of n elements, write an algorithm to remove all duplicates from the array in time O(nlogn)
Algorithm delete_duplicates (a[1....n])
// Removes duplicates from the given array
// input parameters: a[1:n], an array of n elements
{
    temp[1:n];                // an array of n elements
    for i = 1 to n:
        temp[i].value = a[i]
        temp[i].key   = i
    // based on 'value', sort the array temp
    // based on 'value', delete duplicate elements from temp
    // based on 'key', sort the array temp
    // construct an array p using temp
    p[i] = temp[i].value
    return p
}
The order of the elements is maintained in the output array by using the 'key'. Considering the key is of length O(n), the time taken for sorting on the key and the value is O(nlogn). So the time taken to delete all duplicates from the array is O(nlogn).
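A concrete C++ sketch of that (value, key) idea: sort by value to bring duplicates together, keep only the first occurrence of each value, then sort the survivors by key to restore the original order. The function name is mine.

#include <algorithm>
#include <utility>
#include <vector>
#include <cstddef>

std::vector<int> deleteDuplicates(const std::vector<int> &a) {
    std::vector<std::pair<int, std::size_t> > temp;   // (value, original index = key)
    for (std::size_t i = 0; i < a.size(); ++i)
        temp.push_back(std::make_pair(a[i], i));

    std::sort(temp.begin(), temp.end());              // sort by value, ties by key

    std::vector<std::pair<std::size_t, int> > kept;   // (key, value) of first occurrences
    for (std::size_t i = 0; i < temp.size(); ++i)
        if (i == 0 || temp[i].first != temp[i - 1].first)
            kept.push_back(std::make_pair(temp[i].second, temp[i].first));

    std::sort(kept.begin(), kept.end());              // sort by key: original order restored

    std::vector<int> p;
    for (std::size_t i = 0; i < kept.size(); ++i)
        p.push_back(kept[i].second);
    return p;                                         // O(n log n) overall
}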
This is what I've got, though it misplaces the order; we can sort in ascending or descending order to fix that.
#include <stdio.h>
int main(void) {
    int x, n, myvar = 0;

    printf("Enter a number: \t");
    scanf("%d", &n);

    int arr[n], changedarr[n];

    for (x = 0; x < n; x++) {
        printf("Enter a number for array[%d]: ", x);
        scanf("%d", &arr[x]);
    }

    printf("\nOriginal Number in an array\n");
    for (x = 0; x < n; x++) {
        printf("%d\t", arr[x]);
    }

    int i = 0, j = 0;
    // printf("i\tj\tarr\tchanged\n");
    for (int i = 0; i < n; i++)
    {
        // printf("%d\t%d\t%d\t%d\n", i, j, arr[i], changedarr[i]);
        for (int j = 0; j < n; j++)
        {
            if (i == j)
            {
                continue;
            }
            else if (arr[i] == arr[j]) {
                changedarr[j] = 0;
            }
            else {
                changedarr[i] = arr[i];
            }
            // printf("%d\t%d\t%d\t%d\n", i, j, arr[i], changedarr[i]);
        }
        myvar += 1;
    }
    // printf("\n\nmyvar=%d\n", myvar);

    int count = 0;
    printf("\nThe unique items:\n");
    for (int i = 0; i < myvar; i++)
    {
        if (changedarr[i] != 0) {
            count += 1;
            printf("%d\t", changedarr[i]);
        }
    }
    printf("\n");
}
It'd be cool if you had a good data structure that could quickly tell whether it contains an integer. Perhaps a tree of some sort.
DataStructure elementsSeen = new DataStructure();
int elementsRemoved = 0;
for (int i = 0; i < array.Length; i++) {
    if (elementsSeen.Contains(array[i]))
        elementsRemoved++;
    else
        array[i - elementsRemoved] = array[i];
}
array.Length = array.Length - elementsRemoved;