Knuth List Insertion Method in C


I've been reading through the sorting and searching algorithms in Volume 3 of The Art of Computer Programming by Donald Knuth, Second Edition. On page 95 I came across an algorithm that Knuth calls "list insertion" (a modification of traditional insertion sort).
On that page, Knuth concludes that the "right data structure for straight insertion is a one-way, linked linear list.", and that "linked allocation (Section 2.2.3) is ideally suited to insertion, since only a few links need to be changed." However, the MIXAL program (Program L) on page 97 does not appear to utilize a traditional linked linear list structure (a series of nodes linked by addresses). Instead, it appears that the key and the link are stored together in a struct-like structure and these structures are stored in an array called INPUT.
I decided to try to implement this algorithm in C. I have provided Knuth's description of the algorithm and his implementation in the MIXAL assembly language as reference. I let the keys be the elements of the data array themselves, and I put the links in a parallel-like array called links. I say 'parallel-like array' because the links array has one more element than the data array. I did this so I could easily determine the index of the smallest element of the data array by storing it as the first element of the links array. Because of this extra index, indices 0 to n - 1 of the data array correspond to indices 1 to n of the links array. Each element of the links array holds the index in the data array of the next element in the sorted list.
My question is, is this how this algorithm is supposed to be implemented based on his description, or am I missing something?
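#include <stdlib.h>

/* Returns a links array of n + 1 ints: links[0] is the index of the
   smallest element, and links[k + 1] is the index of the element that
   follows data[k] in sorted order (-1 marks the end of the list).
   Returns NULL if n < 1 or if the allocation fails. */
int *listInsertion(int data[], int n) {
    if (n > 0) {
        int i, j, entry;
        int *links = (int *) calloc(n + 1, sizeof *links);
        if (links == NULL)
            return NULL;
        links[n] = -1;          /* data[n - 1] ends the list */
        links[n - 1] = n - 1;   /* current head: only data[n - 1] so far */
        for (i = n - 2; i >= 0; i--) {
            entry = data[i];
            /* links[i + 1] holds the current head; walk the list until
               the next element is >= entry or the list ends */
            for (j = i + 1; links[j] >= 0 && entry > data[links[j]];
                 j = links[j] + 1)
                continue;
            if (j == i + 1) {
                /* data[i] becomes the new head; the old head, still in
                   links[i + 1], becomes data[i]'s successor */
                links[i] = i;
            } else {
                links[i] = links[i + 1];   /* move the head down one slot */
                links[i + 1] = links[j];   /* data[i] -> next element */
                links[j] = i;              /* predecessor -> data[i] */
            }
        }
        return links;
    }
    return NULL;
}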
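A links array like this is easiest to check by walking it. Below is a minimal sketch that assumes the layout described in the question. collectSorted is a hypothetical name and is not part of the question's code. For data = {3, 1, 2}, tracing listInsertion() above gives links = {1, -1, 2, 0}, and the walk visits 1, 2, 3.

```c
/* Hypothetical helper: walks a links array in the layout described in
   the question and copies the keys into out[] in ascending order.
     links[0]     : index in data[] of the smallest element
     links[k + 1] : index of the element that follows data[k]
     -1           : end of the list
   Returns the number of elements copied. */
int collectSorted(const int *data, const int *links, int *out) {
    int count = 0;
    for (int k = links[0]; k >= 0; k = links[k + 1])
        out[count++] = data[k];   /* visit elements in sorted order */
    return count;
}
```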

I suggest you first implement the algorithm using the same symbols Knuth uses.
This will help you get a working first version quickly.
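/* K[0..n-1] holds the keys; L must have room for n + 1 links.
   Index n plays the role of Knuth's header L0 and end-of-list marker:
   L[n] is the index of the smallest key, and a link equal to n ends the list. */
void insertSort(const int *K, int *L, int n) {
    if (n < 1) return;
    L[n] = n-1;            /* start with the one-element list {K[n-1]} */
    L[n-1] = n;
    for (int j = n-2; j >= 0; j--) {
        int entry = K[j];
        int p = L[n];      /* current record */
        int q = n;         /* its predecessor (n = header) */
        while (entry > K[p]) {
            q = p;
            p = L[q];
            if (p == n) {
                break;     /* reached the end of the list */
            }
        }
        L[q] = j;          /* insert record j between q and p */
        L[j] = p;
    }
}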
Then you can refactor your first version to enhance it or make it shorter.
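The question notes that Program L keeps each key and its link together in one INPUT record. For comparison, the same algorithm can be written over an array of structs. This is a sketch under a few assumptions. The struct, field and function names are illustrative. Records are 1-based as in Knuth, so r must have n + 1 elements. Record 0 serves only as the list header, so a link of 0 ends the list.

```c
/* A record with its key and link stored side by side (names illustrative). */
struct record {
    int key;
    int link;   /* index of the next record in sorted order, 0 = end */
};

/* Sketch of list insertion on r[1..n]; r[0] is only the list header,
   so afterwards r[0].link holds the index of the smallest key. */
void listInsertionRecords(struct record *r, int n) {
    if (n < 1) {
        r[0].link = 0;          /* empty list */
        return;
    }
    r[0].link = n;              /* start with the one-element list {R_n} */
    r[n].link = 0;
    for (int j = n - 1; j >= 1; j--) {
        int q = 0, p = r[0].link;
        /* Walk the sorted list until a key >= r[j].key or the end. */
        while (p != 0 && r[j].key > r[p].key) {
            q = p;
            p = r[q].link;
        }
        r[q].link = j;          /* splice R_j in between R_q and R_p */
        r[j].link = p;
    }
}
```

To read the result, start at r[0].link and follow .link until it is 0. Equal keys keep their original relative order, because a record is inserted in front of the first key that is not smaller.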

Related

Is there a way to find the most common element in a STRING array?

I have the code below, but it works with integers. I want the same thing working for a string array. In particular, if a string array held names such as {"Sparky", "Mary", "Sparky", "John", "Betsy"}, how do I get "Sparky" selected?
static int mostPopular(int[] a) {
    int count = 1, tempCount;
    int popular = a[0];
    int temp = 0;
    for (int i = 0; i < a.length; i++)
    {
        temp = a[i];
        tempCount = 0;
        for (int j = 0; j < a.length; j++)  // count every occurrence of temp
        {
            if (temp == a[j])
                tempCount++;
        }
        if (tempCount > count)
        {
            popular = temp;
            count = tempCount;
        }
    }
    return popular;
}

A simple way to do this is to sort the array (e.g. using the standard function qsort() with a comparison function that calls strcmp()) and then iterate over it, keeping track of:
(1) the most common string you've seen so far,
(2) how many times you've seen the most common string,
(3) the latest string you've seen,
(4) how many times you've seen the latest string.
If (4) exceeds (2), you update (1) and (2).
Sorting typically takes O(n log n) time on average and O(log n) extra space (the C standard does not specify the complexity of qsort()), and the scan described above takes O(n).
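Below is a minimal C sketch of this approach. It assumes the strings are held in an array of const char * and that the caller does not mind the array being reordered. mostCommon and cmpStr are hypothetical names. The comments mark the four quantities (1) to (4).

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of const char * elements. */
static int cmpStr(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Returns the most frequent string in words[0..n-1] (requires n > 0),
   sorting words in place. Ties go to the string that sorts first. */
const char *mostCommon(const char **words, size_t n) {
    qsort(words, n, sizeof *words, cmpStr);
    const char *best = words[0];     /* (1) most common so far */
    size_t bestCount = 1;            /* (2) its count */
    const char *latest = words[0];   /* (3) latest string seen */
    size_t latestCount = 1;          /* (4) its count */
    for (size_t i = 1; i < n; i++) {
        if (strcmp(words[i], latest) == 0) {
            latestCount++;
        } else {                     /* a new run of equal strings starts */
            latest = words[i];
            latestCount = 1;
        }
        if (latestCount > bestCount) {   /* (4) exceeds (2): update (1), (2) */
            best = latest;
            bestCount = latestCount;
        }
    }
    return best;
}
```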

Can quicksort be implemented in C without stack and recursion?

I found this post, "How to do iterative quicksort without using stack in c?",
but the suggested answer does use an inline stack array! (Only a constant amount of extra space is permitted.)

The code in the page in reference makes a bold claim:
STACK My implementation does not use the stack to store data...
Yet the function definition has many variables with automatic storage, among them 2 arrays with 1000 entries, which will end up using a fixed but substantial amount of stack space:
// quickSort
//
// This public-domain C implementation by Darel Rex Finley.
//
// * Returns YES if sort was successful, or NO if the nested
// pivots went too deep, in which case your array will have
// been re-ordered, but probably not sorted correctly.
//
// * This function assumes it is called with valid parameters.
//
// * Example calls:
// quickSort(&myArray[0],5); // sorts elements 0, 1, 2, 3, and 4
// quickSort(&myArray[3],5); // sorts elements 3, 4, 5, 6, and 7
bool quickSort(int *arr, int elements) {
#define MAX_LEVELS 1000
int piv, beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R ;
beg[0]=0; end[0]=elements;
while (i>=0) {
L=beg[i]; R=end[i]-1;
if (L<R) {
piv=arr[L]; if (i==MAX_LEVELS-1) return NO;
while (L<R) {
while (arr[R]>=piv && L<R) R--; if (L<R) arr[L++]=arr[R];
while (arr[L]<=piv && L<R) L++; if (L<R) arr[R--]=arr[L]; }
arr[L]=piv; beg[i+1]=L+1; end[i+1]=end[i]; end[i++]=L; }
else {
i--; }}
return YES; }
The indentation style is very confusing. Here is a reformatted version:
#include <stdbool.h>

/* The original code uses Objective-C style YES/NO. */
#define YES true
#define NO false
#define MAX_LEVELS 1000

bool quickSort(int *arr, int elements) {
    int piv, beg[MAX_LEVELS], end[MAX_LEVELS], i = 0, L, R;
    beg[0] = 0;
    end[0] = elements;
    while (i >= 0) {
        L = beg[i];
        R = end[i] - 1;
        if (L < R) {
            piv = arr[L];
            if (i == MAX_LEVELS - 1)
                return NO;
            while (L < R) {
                while (arr[R] >= piv && L < R)
                    R--;
                if (L < R)
                    arr[L++] = arr[R];
                while (arr[L] <= piv && L < R)
                    L++;
                if (L < R)
                    arr[R--] = arr[L];
            }
            arr[L] = piv;
            beg[i + 1] = L + 1;
            end[i + 1] = end[i];
            end[i++] = L;
        } else {
            i--;
        }
    }
    return YES;
}
Note that 1000 is large but not sufficient for pathological cases such as already-sorted arrays. On a sorted array each level of nesting peels off only one element, so the function returns NO for a sorted array of only about 1000 elements, which is unacceptable.
A much lower value would suffice with an improved version of the algorithm in which the larger range is pushed onto the array and the loop iterates on the smaller range. Because the range being processed at least halves every time a new level is pushed, arrays of N entries can handle a set of up to 2^N elements. It still has quadratic time complexity on sorted arrays, but at least it sorts arrays of all possible sizes.
Here is a modified and instrumented version:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_LEVELS 64

int quickSort(int *arr, size_t elements) {
    size_t beg[MAX_LEVELS], end[MAX_LEVELS], L, R;
    int i = 0;

    beg[0] = 0;
    end[0] = elements;
    while (i >= 0) {
        L = beg[i];
        R = end[i];
        if (L + 1 < R--) {
            int piv = arr[L];
            if (i == MAX_LEVELS - 1)
                return -1;
            while (L < R) {
                while (arr[R] >= piv && L < R)
                    R--;
                if (L < R)
                    arr[L++] = arr[R];
                while (arr[L] <= piv && L < R)
                    L++;
                if (L < R)
                    arr[R--] = arr[L];
            }
            arr[L] = piv;
            if (L - beg[i] > end[i] - R) {
                beg[i + 1] = L + 1;
                end[i + 1] = end[i];
                end[i++] = L;
            } else {
                beg[i + 1] = beg[i];
                end[i + 1] = L;
                beg[i++] = L + 1;
            }
        } else {
            i--;
        }
    }
    return 0;
}

int testsort(int *a, size_t size, const char *desc) {
    clock_t t = clock();
    size_t i;

    if (quickSort(a, size)) {
        printf("%s: quickSort failure\n", desc);
        return 1;
    }
    for (i = 1; i < size; i++) {
        if (a[i - 1] > a[i]) {
            printf("%s: sorting error: a[%zu]=%d > a[%zu]=%d\n",
                   desc, i - 1, a[i - 1], i, a[i]);
            return 2;
        }
    }
    t = clock() - t;
    printf("%s: %zu elements sorted in %.3fms\n",
           desc, size, t * 1000.0 / CLOCKS_PER_SEC);
    return 0;
}

int main(int argc, char *argv[]) {
    size_t i, size = argc > 1 ? strtoull(argv[1], NULL, 0) : 1000;
    int *a = malloc(sizeof(*a) * size);
    if (a != NULL) {
        for (i = 0; i < size; i++)
            a[i] = rand();
        testsort(a, size, "random");
        for (i = 0; i < size; i++)
            a[i] = i;
        testsort(a, size, "sorted");
        for (i = 0; i < size; i++)
            a[i] = size - i;
        testsort(a, size, "reverse sorted");
        for (i = 0; i < size; i++)
            a[i] = 0;
        testsort(a, size, "constant");
        free(a);
    }
    return 0;
}
Output:
random: 100000 elements sorted in 7.379ms
sorted: 100000 elements sorted in 2799.752ms
reverse sorted: 100000 elements sorted in 2768.844ms
constant: 100000 elements sorted in 2786.612ms
Here is a slightly modified version that is more resistant to pathological cases:
#define MAX_LEVELS 48

int quickSort(int *arr, size_t elements) {
    size_t beg[MAX_LEVELS], end[MAX_LEVELS], L, R;
    int i = 0;

    beg[0] = 0;
    end[0] = elements;
    while (i >= 0) {
        L = beg[i];
        R = end[i];
        if (R - L > 1) {
            /* check before touching the array so a failure loses no data */
            if (i == MAX_LEVELS - 1)
                return -1;
            size_t M = L + ((R - L) >> 1);
            int piv = arr[M];
            arr[M] = arr[L];
            R--;
            while (L < R) {
                while (arr[R] >= piv && L < R)
                    R--;
                if (L < R)
                    arr[L++] = arr[R];
                while (arr[L] <= piv && L < R)
                    L++;
                if (L < R)
                    arr[R--] = arr[L];
            }
            arr[L] = piv;
            M = L + 1;
            while (L > beg[i] && arr[L - 1] == piv)
                L--;
            while (M < end[i] && arr[M] == piv)
                M++;
            if (L - beg[i] > end[i] - M) {
                beg[i + 1] = M;
                end[i + 1] = end[i];
                end[i++] = L;
            } else {
                beg[i + 1] = beg[i];
                end[i + 1] = L;
                beg[i++] = M;
            }
        } else {
            i--;
        }
    }
    return 0;
}
Output:
random: 10000000 elements sorted in 963.973ms
sorted: 10000000 elements sorted in 167.621ms
reverse sorted: 10000000 elements sorted in 167.375ms
constant: 10000000 elements sorted in 9.335ms
As a conclusion:
yes, quicksort can be implemented without recursion,
no, it cannot be implemented without any local automatic storage,
yes, only a constant amount of extra space is necessary, but only because we live in a small world where the maximum size of an array is bounded by available memory. A size of 64 for the local arrays handles arrays larger than the size of the Internet, much larger than current 64-bit systems could address.

Apparently, it is possible to implement a non-recursive quicksort with only a constant amount of extra space, as stated here. This builds on Sedgewick's work on a non-recursive formulation of quicksort. Instead of preserving the boundary values (low and high), it essentially performs a linear scan to determine these bounds.

Quicksort requires two paths be followed forward from each non-trivial partitioning: a new partitioning of each (sub)partition. Information about the previous partitioning (the bounds of one of the resulting partitions) needs to be carried forward to each new partitioning. The question, then, is where does that information live? In particular, where does the information about one partition live while the program is working on the other?
For a serial algorithm, the answer is that the information is stored on a stack or a queue or a functional equivalent of one of those. It always is, because those are simply our names for data structures that serve this purpose. In particular, recursion is a special case, not an alternative: in a recursive quicksort, the data are stored on the call stack. For an iterative implementation you can implement a stack in a formal sense, but you can also use a simple and relatively small array as a makeshift stack.
But stack and queue equivalents can go a lot farther than that. You could append data to a file, for example, for later read-back. You could write it to a pipe. You could transmit it to yourself asynchronously over a communications network.
If you're clever, you can even make the input array itself serve the need for a stack, by encoding the partition bounds using relative element order or some other element property, as described by Ďurian, for example. This involves a space vs speed tradeoff that is probably not a good deal in most cases. However, it has lower space overhead (O(1)) than do typical quicksort implementations (O(log N)), and it does not change the algorithm's O(N log N) asymptotic time complexity.
If you wanted to go crazy, you could even nest iterations in place of recursing. That would impose a hard upper bound on the size of the arrays that could be handled, but not as tight of one as you might think. With some care and a few tricks, you could handle billion-element arrays with a 25-loop nest. Such a deep nest would be ugly and crazy, but nevertheless conceivable. A human could write it by hand. And in that case, the series of nested loop scopes, with their block-scoped variables, serves as a stack equivalent.
So the answer depends on what exactly you mean by "without stack":
yes, you can use a queue instead, though it would need to have about the same capacity as there are elements to sort;
yes, you can use an array or some other kind of sequential data storage, including the input array itself, to emulate a formal stack or queue;
yes, you can encode a suitable stack equivalent directly into the structure of your program;
yes, you can probably come up with other, more esoteric versions of stacks and queues;
but no, you cannot perform a quicksort without something filling the multi-level data-storage role for which a stack or stack-equivalent is conventionally used.

Well, it can, because I implemented a quicksort in FORTRAN IV (it was a long time ago, before the language supported recursion, and it was for a bet). However, you do need somewhere (a large array would do) to remember your state as you do individual bits of work.
It's a lot easier recursively...

Quicksort is by definition a "divide and conquer" sorting algorithm: the idea is that you split the given array into smaller partitions, dividing the problem into subproblems that are easier to solve.
When using quicksort without recursion, you need a data structure of some sort to store the partitions you are not working on at the moment.
That's why the answer in the linked post uses an array to make quicksort non-recursive.

Leetcode: Four Sum

Problem: Given an array S of n integers, are there elements a, b, c, and d in S such that a + b + c + d = target? Find all unique quadruplets in the array that sum to target.
Note:
Elements in a quadruplet (a,b,c,d) must be in non-descending order. (ie, a ≤ b ≤ c ≤ d)
The solution set must not contain duplicate quadruplets.
For example, given array S = {1 0 -1 0 -2 2}, and target = 0.
A solution set is:
(-1, 0, 0, 1)
(-2, -1, 1, 2)
(-2, 0, 0, 2)
I know there's an O(n^3) solution to this problem, but I was wondering if there's a faster algorithm. I googled a lot and found that many people gave an O(n^2 log n) solution, which fails to deal correctly with cases where S contains duplicate pair sums (like here and here). I hope someone can give me a correct version of an O(n^2 log n) algorithm, if one really exists.
Thanks!

The brute-force algorithm takes time O(n^4): Use four nested loops to form all combinations of four items from the input, and keep any that sum to the target.
A simple improvement takes time O(n^3): Use three nested loops to form all combinations of three items from the input, and keep any combination for which the missing fourth value, target minus the sum of the three, occurs among the remaining items (looked up in a hash table of the input values).
The best algorithm I know is a meet-in-the-middle algorithm that operates in expected time O(n^2) when a hash table is used and few pairs share the same total: Use two nested loops to form all pairs of items from the input, storing the pairs and their totals in some kind of dictionary (hash table, balanced tree) indexed by total. Then use two more nested loops to again form all pairs of items from the input. For any pair whose sum equals target minus a total in the dictionary, keep the two items from the nested loops plus the two items from the dictionary, skipping combinations that reuse an element.
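The meet-in-the-middle idea can be sketched in C, with two assumptions flagged up front. First, this version only answers whether some quadruplet exists; it does not list the unique quadruplets. Second, it replaces the dictionary with a sorted array of pairs plus binary search, so it runs in O(n^2 log n) rather than expected O(n^2). The names Pair, cmpPair and fourSumExists are hypothetical. The code only ever splits a quadruple of indices a < b < c < d as (a, b) + (c, d), so no element can be used twice.

```c
#include <stdlib.h>

/* One pair of indices i < j together with s[i] + s[j]. */
typedef struct { long long sum; int i, j; } Pair;

/* Sort by sum; among equal sums, put the largest first index first. */
static int cmpPair(const void *x, const void *y) {
    const Pair *p = x, *q = y;
    if (p->sum != q->sum) return p->sum < q->sum ? -1 : 1;
    return (q->i > p->i) - (q->i < p->i);
}

/* Returns 1 if there are indices a < b < c < d with
   s[a] + s[b] + s[c] + s[d] == target, 0 if not, -1 if malloc fails. */
int fourSumExists(const int *s, int n, long long target) {
    if (n < 4) return 0;
    size_t m = (size_t)n * (size_t)(n - 1) / 2, k = 0;
    Pair *pairs = malloc(m * sizeof *pairs);
    if (pairs == NULL) return -1;
    for (int i = 0; i < n; i++)                 /* first half: all pair sums */
        for (int j = i + 1; j < n; j++)
            pairs[k++] = (Pair){ (long long)s[i] + s[j], i, j };
    qsort(pairs, m, sizeof *pairs, cmpPair);

    int found = 0;
    for (size_t p = 0; p < m && !found; p++) {  /* second half: lookups */
        long long need = target - pairs[p].sum;
        size_t lo = 0, hi = m;                  /* first pair with sum >= need */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (pairs[mid].sum < need) lo = mid + 1; else hi = mid;
        }
        /* pairs[lo] has the largest first index among sum == need, so
           (a, b) = pairs[p] can be completed iff that index is > b. */
        if (lo < m && pairs[lo].sum == need && pairs[lo].i > pairs[p].j)
            found = 1;
    }
    free(pairs);
    return found;
}
```

Listing every distinct quadruplet, as the question asks, would additionally require enumerating all matching pairs and deduplicating the sorted value tuples. That step is where output size can dominate the running time.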
I have code at my blog.

IMHO, for the O(n^2 log n) algorithm, the problem of duplicates can be solved when creating the aux[] array (I'm using the name from the second link you provided). The basic idea is to first sort the input, and then skip duplicates while processing the array.
vector<int> createAuxArray(vector<int> input) {
    int len = input.size();
    vector<int> aux;
    sort(input.begin(), input.end());
    for (int i = 0; i < len; ++i) {
        if (i != 0 && input[i] == input[i - 1]) continue; // skip when encountering a duplicate
        for (int j = i + 1; j < len; ++j) {
            if (j != i + 1 && input[j] == input[j - 1]) continue; // same idea
            // createAuxElement() is the helper from the linked post
            aux.push_back(createAuxElement(input[i], input[j]));
        }
    }
    return aux;
}
Complexity for this module is O(nlgn) + O(n^2) = O(n^2), which doesn't affect the overall performance. Once we have created aux array, we can plug it into the code mentioned in the post and the results will be correct.
Note that a BST or hash table can be used in place of the sorting, but in general that doesn't decrease the complexity, since you have to insert/query (O(log N)) inside the doubly nested loop.

This is a modified version of the GeeksforGeeks solution that also handles duplicate pair sums. I noticed that some of the pairs were missing because the hash table overwrote the old pair whenever it found a new pair with the same sum. The fix is to avoid overwriting by storing all pairs for a sum in a vector of pairs. Hope this helps!
#include <algorithm>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

vector<vector<int> > fourSum(vector<int> &a, int t) {
    // sum -> every index pair (i, j), i < j, with a[i] + a[j] == sum
    unordered_map<int, vector<pair<int,int> > > twoSum;
    set<vector<int> > ans;  // sorted quadruplets; the set removes duplicates
    int n = a.size();
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            twoSum[a[i] + a[j]].push_back(make_pair(i, j));
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            auto it = twoSum.find(t - a[i] - a[j]);
            if (it != twoSum.end()) {
                for (const auto &comp : it->second) {
                    // skip combinations that reuse index i or j
                    if (comp.first != i and comp.first != j and comp.second != i and comp.second != j) {
                        vector<int> row = {a[i], a[j], a[comp.first], a[comp.second]};
                        sort(row.begin(), row.end());
                        ans.insert(row);
                    }
                }
            }
        }
    }
    vector<vector<int> > ret(ans.begin(), ans.end());
    return ret;
}

Sorting: how to sort an array that contains 3 kinds of numbers

For example: int A[] = {3,2,1,2,3,2,1,3,1,2,3};
How to sort this array efficiently?
This is for a job interview, I need just a pseudo-code.

A promising way to sort it is counting sort. This lecture by Richard Buckland is worth a look, especially the part from 15:20 on.
A simple in-place form of counting sort is to create an array representing the domain, initialize all its elements to 0, and then iterate through your array and count the values. Once you know the count of each domain value, you can rewrite the values of your array accordingly. The complexity of such an algorithm is O(n).
Here's C++ code that behaves as I described. It makes two passes over the array, so it takes about 2n steps, which is still O(n):
int A[] = {3,2,1,2,3,2,1,3,1,2,3};
int domain[4] = {0};
// count occurrences of domain values - O(n):
int size = sizeof(A) / sizeof(int);
for (int i = 0; i < size; ++i)
    domain[A[i]]++;
// rewrite values of the array A accordingly - O(n):
for (int k = 0, i = 1; i < 4; ++i)
    for (int j = 0; j < domain[i]; ++j)
        A[k++] = i;
Note that if the domain values are far apart, storing the domain as an array is inefficient. In that case it is a much better idea to use a map (thanks abhinav for pointing it out). Here's C++ code that uses std::map to store (domain value, occurrence count) pairs:
int A[] = {2000,10000,7,10000,10000,2000,10000,7,7,10000};
std::map<int, int> domain;
// count occurrences of domain values:
int size = sizeof(A) / sizeof(int);
for (int i = 0; i < size; ++i)
{
    std::map<int, int>::iterator keyItr = domain.lower_bound(A[i]);
    if (keyItr != domain.end() && !domain.key_comp()(A[i], keyItr->first))
        keyItr->second++; // next occurrence
    else
        domain.insert(keyItr, std::pair<int,int>(A[i],1)); // first occurrence
}
// rewrite values of the array A accordingly:
int k = 0;
for (auto i = domain.begin(); i != domain.end(); ++i)
    for (int j = 0; j < i->second; ++j)
        A[k++] = i->first;
(if there is a way how to use std::map in above code more efficient, let me know)

It's a standard problem in computer science: the Dutch national flag problem.
See the link.
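Below is a minimal C sketch of the Dutch national flag partition applied to the array from the question. It assumes the values are exactly 1, 2 and 3, and sort123 is a hypothetical name. It makes a single pass, uses O(1) extra space, and only swaps elements.

```c
#include <stddef.h>

/* Dutch national flag partition for an array holding only 1, 2 and 3.
   Invariant: a[0..low) == 1, a[low..mid) == 2, a[high..n) == 3,
   and a[mid..high) has not been examined yet. */
void sort123(int *a, size_t n) {
    size_t low = 0, mid = 0, high = n;
    while (mid < high) {
        if (a[mid] == 1) {            /* move the 1 into the low region */
            int t = a[low]; a[low] = a[mid]; a[mid] = t;
            low++;
            mid++;
        } else if (a[mid] == 3) {     /* move the 3 into the high region */
            high--;
            int t = a[high]; a[high] = a[mid]; a[mid] = t;
            /* don't advance mid: the swapped-in value is unexamined */
        } else {                      /* a 2 is already in place */
            mid++;
        }
    }
}
```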

Count each number and then overwrite the array based on the counts; the time complexity is O(n):
int counts[3] = {0,0,0};
for (int k = 0; k < n; k++)   // n is the number of elements in A
    counts[A[k] - 1]++;
for (int i = 0; i < counts[0]; i++)
    A[i] = 1;
for (int i = counts[0]; i < counts[0] + counts[1]; i++)
    A[i] = 2;
for (int i = counts[0] + counts[1]; i < counts[0] + counts[1] + counts[2]; i++)
    A[i] = 3;

Problem description: You have n buckets, each containing one coin; the value of the coin can be 5, 10 or 20. You have to sort the buckets under these limitations:
1. You can use only these 2 functions: SwitchBaskets(Basket1, Basket2), which switches 2 baskets, and GetCoinValue(Basket1), which returns the coin value in the selected basket.
2. You can't define an array of size n.
3. Use the switch function as little as possible.
My simple pseudo-code solution, which can be implemented in any language with O(n) complexity:
Pick the coin from each basket:
1) if it is 5, push it to be the first;
2) if it is 20, push it to be the last;
3) if it is 10, leave it where it is;
4) then look at the next bucket in line.
Edit: if you can't push elements to the first or last position, then merge sort would be ideal for a practical implementation. Here is how it works:
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on; until at last two lists are merged into the final sorted list. Of the algorithms described here, this is the first that scales well to very large lists, because its worst-case running time is O(n log n). Merge sort has seen a relatively recent surge in popularity for practical implementations, being used for the standard sort routine in the programming languages
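The bottom-up scheme described above can be sketched in C: merge runs of width 1 into runs of 2, then 2 into 4, and so on. Note that this sketch uses an O(n) scratch buffer, so it does not meet the "no array of size n" limitation from the question. mergeSortBottomUp is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Bottom-up merge sort: merge runs of width 1, 2, 4, ... until one run
   remains. Returns -1 if the n-element scratch buffer cannot be
   allocated, 0 otherwise. */
int mergeSortBottomUp(int *a, size_t n) {
    if (n < 2) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width < n ? lo + width : n;
            size_t hi = lo + 2 * width < n ? lo + 2 * width : n;
            size_t i = lo, j = mid, k = lo;
            /* take the smaller head; ties come from the left run (stable) */
            while (i < mid && j < hi)
                tmp[k++] = a[j] < a[i] ? a[j++] : a[i++];
            while (i < mid) tmp[k++] = a[i++];
            while (j < hi) tmp[k++] = a[j++];
        }
        memcpy(a, tmp, n * sizeof *a);   /* merged runs become the input */
    }
    free(tmp);
    return 0;
}
```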

I think the question is intending for you to use bucket sort. In cases where there are a small number of values bucket sort can be much faster than the more commonly used quicksort or mergesort.

As robert mentioned, basket sort (or bucket sort) is the best in this situation.
I would also add the following algorithm (it's actually very similar to bucket sort):
[pseudocode is Java-style]
Create a HashMap<Integer, Integer> map and cycle through your array:
for (Integer i : array) {
    Integer value = map.get(i);
    if (value == null) {
        map.put(i, 1);
    } else {
        map.put(i, value + 1);
    }
}

I think I understand the question: you can use only O(1) space, and you can change the array only by swapping cells (so you can use 2 operations on the array: swap and get).
My solution:
Use 2 index pointers: one for the position of the last 1, and one for the position of the last 2.
In stage i, you assume that the array is already sorted from 1 to i-1,
then you check the i-th cell:
If A[i] == 3
you do nothing.
If A[i] == 2
you swap it with the cell after the last 2 index.
If A[i] == 1
you swap it with the cell after the last 2 index, and then swap the cell
after the last 2 index (which now contains the 1) with the cell after the last 1 index.
This is the main idea; you need to take care of the little details.
Overall O(n) complexity.
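Below is a C sketch of the two-index-pointer scheme described above, assuming the values are only 1, 2 and 3. sortBySwaps and swapCells are hypothetical names. Here ones and twos point one past the last 1 and one past the last 2, which is exactly "the cell after the last 1/2" in the description. Advancing them after each swap is one of the little details left open there.

```c
#include <stddef.h>

static void swapCells(int *a, size_t x, size_t y) {
    int t = a[x]; a[x] = a[y]; a[y] = t;
}

/* Single left-to-right pass over 1s, 2s and 3s using only swaps.
   The sorted prefix a[0..i) is 1s in [0, ones), 2s in [ones, twos)
   and 3s in [twos, i). */
void sortBySwaps(int *a, size_t n) {
    size_t ones = 0, twos = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] == 2) {
            swapCells(a, twos, i);     /* 2 goes right after the last 2 */
            twos++;
        } else if (a[i] == 1) {
            swapCells(a, twos, i);     /* 1 goes right after the last 2... */
            swapCells(a, ones, twos);  /* ...then right after the last 1 */
            ones++;
            twos++;
        }
        /* a 3 stays where it is, at the end of the sorted prefix */
    }
}
```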

Here is a Groovy solution, based on ElYusubov's answer, but instead of pushing Bucket(5) to the beginning and Bucket(15) to the end, it uses sifting so that 5s move toward the beginning and 15s toward the end.
Whenever we swap a bucket from the end to the current position, we decrement end but do not increment the current counter, as we need to check the swapped-in element again.
array = [15,5,10,5,10,10,15,5,15,10,5]

def swapBucket(int a, int b) {
    if (a == b) return;
    array[a] = array[a] + array[b]
    array[b] = array[a] - array[b]
    array[a] = array[a] - array[b]
}

def getBucketValue(int a) {
    return array[a];
}

def start = 0, end = array.size() - 1, counter = 0;
// start and end are optional, but they help when the array is already (partly) sorted.
// start - first bucket from the left which is not 5
while (start < end) {
    if (getBucketValue(start) != 5) break;
    start++;
}
// end - first bucket from the right which is not 15
while (end > start) {
    if (getBucketValue(end) != 15) break;
    end--;
}
// invariant: [0, start) are 5s, [start, counter) are 10s, (end, size) are 15s
for (counter = start; counter <= end;) {
    def value = getBucketValue(counter)
    if (value == 5) { swapBucket(start, counter); start++; counter++; }
    else if (value == 15) { swapBucket(end, counter); end--; } // do not inc counter
    else { counter++; }
}
for (key in array) { print " ${key} " }

This can be done very easily using the
Dutch national flag algorithm: http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Sort/Flag/
Instead of using 1, 2, 3, treat the values as 0, 1, 2.

Have you tried looking at Wikipedia, for example? - http://en.wikipedia.org/wiki/Sorting_algorithm

This code is for C#.
However, you should think about the algorithm in a language- and framework-independent way. As suggested, bucket sort might be the efficient one to go with. If you provide detailed information on the problem, I will try to look for the best solution.
Good luck...
Here is a code sample in C# .NET:
int[] intArray = new int[9] {3,2,1,2,3,2,1,3,1 };
Array.Sort(intArray);
// write array
foreach (int i in intArray) Console.Write("{0}, ", i.ToString());

Just for fun, here's how you would implement "pushing values to the far edge", as ElYusubov suggested:
sort(array) {
    a = 0
    b = array.length - 1
    # a is the first item which isn't a 1
    while a <= b and array[a] == 1
        a++
    # b is the last item which isn't a 3
    while b >= a and array[b] == 3
        b--
    # go over the items from the first non-1 to the last non-3;
    # invariant: array[0..a) are 1s, array[a..i) are 2s, array(b..end] are 3s
    i = a
    while i <= b
        if array[i] == 1
            swap(i, a)
            a++
            i++
        else if array[i] == 3
            swap(i, b)
            b--
            # don't advance i: the item swapped in hasn't been examined yet
        else # array[i] == 2
            i++
}
This could actually be an optimal solution. I'm not sure.

Let's break the problem down: suppose the array contains just two distinct numbers, e.g. [1,2,1,2,2,2,1,1].
We can sort it in one O(n) pass with the minimum number of swaps:
Start two pointers from the left and the right and move them until they meet.
Swap the left element with the right one if the left element is bigger (for ascending order).
For three numbers we can do another pass (k-1 passes for k distinct values). In pass one we move the 1s to their final positions, and in pass two we move the 2s.
def start = 0, end = array.size() - 1;
// Pass 1: move the lowest value (1) to its final positions
while (start < end) {
    // first element from the left which is not 1
    for ( ; array[start] == 1 && start < end ; start++);
    // first element from the right which IS 1
    for ( ; array[end] != 1 && start < end ; end--);
    if (start < end) swap(start, end);
}
// Pass 2 does the same for the 2s (10 and 15 in the coin version),
// starting from the first element that is not 1.
// This extends to k distinct values with k-1 passes (or k-1 levels of recursion).
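Here is a C sketch of the k-1 pass idea. It assumes the values are exactly 1..k. Each pass uses the two pointers described above and starts where the previous pass stopped. partitionValue and sortKValues are hypothetical names.

```c
#include <stddef.h>

/* One two-pointer pass: moves every element equal to v in a[start..n)
   to the front of that range and returns the index just past the last v. */
static size_t partitionValue(int *a, size_t start, size_t n, int v) {
    size_t lo = start, hi = n;
    for (;;) {
        while (lo < hi && a[lo] == v) lo++;       /* first non-v from the left */
        while (lo < hi && a[hi - 1] != v) hi--;   /* last v from the right */
        if (lo >= hi) return lo;
        int t = a[lo]; a[lo] = a[hi - 1]; a[hi - 1] = t;
    }
}

/* Sorts an array whose values are all in 1..k using k - 1 passes. */
void sortKValues(int *a, size_t n, int k) {
    size_t start = 0;
    for (int v = 1; v < k; v++)
        start = partitionValue(a, start, n, v);   /* v's are now final */
}
```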

# Dutch national flag: sorts a list of 0s, 1s and 2s in place
def DNF(input, length):
    high = length - 1
    p = 0
    i = 0
    while i <= high:
        if input[i] == 0:
            input[i], input[p] = input[p], input[i]
            p = p + 1
            i = i + 1
        elif input[i] == 2:
            input[i], input[high] = input[high], input[i]
            high = high - 1
        else:
            i = i + 1

input = [0, 1, 2, 2, 1, 0]
print("input: ", input)
DNF(input, len(input))
print("output: ", input)

I would use a recursive approach here:
fun sortNums(smallestIndex, largestIndex, array, currentIndex) {
    // everything at or after largestIndex is already a 3
    if (currentIndex >= largestIndex)
        return
    if (array[currentIndex] == 1) {
        // You have found a smallest element: increase smallestIndex and put the
        // element on the left side of the array, at position smallestIndex.
        // The catch is that you should not swap it if it's already there.
        smallestIndex++
        if (smallestIndex != currentIndex) swap(smallestIndex, currentIndex)
        sortNums(smallestIndex, largestIndex, array, currentIndex + 1)
    } else if (array[currentIndex] == 3) {
        // Same logic, but the element goes to the end: decrease largestIndex and swap.
        // Don't increment currentIndex: the element swapped in still has to be checked.
        largestIndex--
        swap(largestIndex, currentIndex)
        sortNums(smallestIndex, largestIndex, array, currentIndex)
    } else {
        // a 2 stays where it is
        sortNums(smallestIndex, largestIndex, array, currentIndex + 1)
    }
}
You can start the recursive function with sortNums(smallestIndex=-1, largestIndex=array.size, array, currentIndex=0)
You can find the sample code over here
Code Link

//Bubble sort for unsorted array - algorithm
public void bubleSort(int arr[], int n) { //n is the length of an array
    int temp;
    for(int i = 0; i <= n-2; i++){
        for(int j = 0; j <= (n-2-i); j++){
            if(arr[j] > arr[j +1]){
                temp = arr[j];
                arr[j] = arr[j +1];
                arr[j + 1] = temp;
            }
        }
    }
}

Algorithm: efficient way to remove duplicate integers from an array

I got this problem from an interview with Microsoft.
Given an array of random integers, write an algorithm in C that removes duplicated numbers and returns the unique numbers in the original array.
E.g Input: {4, 8, 4, 1, 1, 2, 9} Output: {4, 8, 1, 2, 9, ?, ?}
One caveat is that the expected algorithm should not require the array to be sorted first. And when an element has been removed, the following elements must be shifted forward as well. The values of the elements left at the tail of the array, where elements were shifted forward, don't matter.
Update: The result must be returned in the original array and helper data structure (e.g. hashtable) should not be used. However, I guess order preservation is not necessary.
Update2: For those who wonder why these impractical constraints, this was an interview question and all these constraints are discussed during the thinking process to see how I can come up with different ideas.

A solution suggested by my girlfriend is a variation of merge sort. The only modification is that during the merge step, duplicated values are simply discarded. This solution would also be O(n log n). In this approach, sorting and duplicate removal are combined, though I'm not sure whether that makes any practical difference.
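Here is a sketch of that merge-sort variant in C, under two assumptions. First, the caller supplies a scratch buffer of n ints, so unlike the interview constraints it does use a helper array. Second, the output is sorted rather than kept in the original order. mergeSortUnique is a hypothetical name. Because each half comes back sorted and duplicate-free, the merge only has to drop values that appear in both halves.

```c
#include <string.h>

/* Sorts a[0..n) and removes duplicates, using tmp (at least n ints) as
   scratch space. Returns the number of unique values, which end up in
   a[0..result) in ascending order. */
size_t mergeSortUnique(int *a, int *tmp, size_t n) {
    if (n < 2) return n;
    size_t mid = n / 2;
    size_t n1 = mergeSortUnique(a, tmp, mid);            /* sorted, unique left */
    size_t n2 = mergeSortUnique(a + mid, tmp, n - mid);  /* sorted, unique right */
    size_t i = 0, j = mid, k = 0;
    while (i < n1 && j < mid + n2) {
        if (a[i] < a[j]) tmp[k++] = a[i++];
        else if (a[j] < a[i]) tmp[k++] = a[j++];
        else { tmp[k++] = a[i++]; j++; }   /* in both halves: keep one copy */
    }
    while (i < n1) tmp[k++] = a[i++];
    while (j < mid + n2) tmp[k++] = a[j++];
    memcpy(a, tmp, k * sizeof *a);
    return k;
}
```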

I've posted this once before on SO, but I'll reproduce it here because it's pretty cool. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and its time complexity is typically O(N). The algorithm is as follows:
Take the first element of the array, this will be the sentinel.
Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to sentinel.
Move all elements for which the index is equal to the hash to the beginning of the array.
Move all elements that are equal to sentinel, except the first element of the array, to the end of the array.
What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided there is no pathological scenario in the hashing: even if there are no duplicates, approximately 2/3 of the elements will be eliminated at each recursion. Each level of recursion is O(n), where small n is the number of elements left. The only problem is that, in practice, it's slower than a quicksort when there are few duplicates, i.e. lots of collisions. However, when there are huge numbers of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
    uniqueInPlaceImpl(dataIn, 0);
}

void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
    if(dataIn.length - start < 2)
        return;

    invariant T sentinel = dataIn[start];
    T[] data = dataIn[start + 1..$];

    static hash_t getHash(T elem) {
        static if(is(T == uint) || is(T == int)) {
            return cast(hash_t) elem;
        } else static if(__traits(compiles, elem.toHash)) {
            return elem.toHash;
        } else {
            static auto ti = typeid(typeof(elem));
            return ti.getHash(&elem);
        }
    }

    for(size_t index = 0; index < data.length;) {
        if(data[index] == sentinel) {
            index++;
            continue;
        }

        auto hash = getHash(data[index]) % data.length;
        if(index == hash) {
            index++;
            continue;
        }

        if(data[index] == data[hash]) {
            data[index] = sentinel;
            index++;
            continue;
        }

        if(data[hash] == sentinel) {
            swap(data[hash], data[index]);
            index++;
            continue;
        }

        auto hashHash = getHash(data[hash]) % data.length;
        if(hashHash != hash) {
            swap(data[index], data[hash]);
            if(hash < index)
                index++;
        } else {
            index++;
        }
    }

    size_t swapPos = 0;
    foreach(i; 0..data.length) {
        if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
            swap(data[i], data[swapPos++]);
        }
    }

    size_t sentinelPos = data.length;
    for(size_t i = swapPos; i < sentinelPos;) {
        if(data[i] == sentinel) {
            swap(data[i], data[--sentinelPos]);
        } else {
            i++;
        }
    }

    dataIn = dataIn[0..sentinelPos + start + 1];
    uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}

How about:
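int rmdup(int *array, int length)
{
    int *start = array;               /* remember the front to compute the new length */
    int *current, *end;
    if (length <= 0)
        return 0;
    end = array + length - 1;
    for ( current = array + 1; array < end; array++, current = array + 1 )
    {
        while ( current <= end )
        {
            if ( *current == *array )
            {
                *current = *end--;    /* overwrite the duplicate with the last element */
            }
            else
            {
                current++;
            }
        }
    }
    return (int)(end - start) + 1;    /* number of unique elements left at the front */
}
Should be O(n^2) or less. The order of the elements is not preserved, and the return value tells the caller how many elements remain.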

If you are looking for the best asymptotic complexity, sorting the array with an O(n log n) sort and then doing an O(n) traversal may be the best route. Without sorting (or extra memory such as a hash table), you are looking at O(n^2).
Edit: if you are only dealing with fixed-width integers, you can also use radix sort to get O(n).
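The radix-sort route from the edit can be sketched in C as below. This is a minimal sketch, assuming unsigned 32-bit keys and a caller-supplied scratch buffer of n elements; the names `radix_sort_u32` and `unique_radix` are hypothetical. Four byte-wide counting passes make the sort O(n) for fixed-width keys, and one linear pass then keeps the first value of each run of equal values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* LSD radix sort: one stable counting pass per byte (4 passes for 32-bit keys).
   scratch is a hypothetical caller-supplied buffer with room for n elements. */
static void radix_sort_u32(uint32_t *a, uint32_t *scratch, size_t n)
{
    for (unsigned shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};
        for (size_t i = 0; i < n; i++)            /* histogram of this byte */
            count[((a[i] >> shift) & 0xFFu) + 1]++;
        for (size_t b = 0; b < 256; b++)          /* prefix sums: start offset of each bucket */
            count[b + 1] += count[b];
        for (size_t i = 0; i < n; i++)            /* stable scatter into scratch */
            scratch[count[(a[i] >> shift) & 0xFFu]++] = a[i];
        memcpy(a, scratch, n * sizeof *a);
    }
}

/* Radix-sorts a[0..n-1], then moves each value that differs from the
   previously kept one to the front. Returns the number of unique values;
   they end up in ascending order, not in the original order. */
size_t unique_radix(uint32_t *a, uint32_t *scratch, size_t n)
{
    if (n < 2) return n;
    radix_sort_u32(a, scratch, n);
    size_t w = 1;                     /* a[0..w-1] holds the unique values */
    for (size_t r = 1; r < n; r++)
        if (a[r] != a[w - 1])
            a[w++] = a[r];
    return w;
}
```

The scratch buffer is what buys the linear time; the trade-off against an in-place O(n log n) sort is O(n) extra memory.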

One more efficient implementation
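/* Returns the new length; array[0..NewLength-1] holds the unique values in their original order. */
int removeDuplicates(int array[], int Length)
{
    int i, j;
    /* new length of modified array; array[0] is always unique */
    int NewLength = 1;
    if (Length <= 0)
        return 0;
    for (i = 1; i < Length; i++) {
        for (j = 0; j < NewLength; j++) {
            if (array[i] == array[j])
                break;
        }
        /* if none of array[0..NewLength-1] equals array[i],
           copy array[i] to the next free position in array */
        if (j == NewLength)
            array[NewLength++] = array[i];
    }
    return NewLength;
}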
In this implementation there is no need to sort the array.
Also, if a duplicate element is found, there is no need to shift all the elements after it by one position.
The output of this code is array[] with size NewLength.
Here we start from the 2nd element of the array and compare it with all the unique elements found so far (array[0..NewLength-1]).
We keep an extra index variable, NewLength, for modifying the input array.
NewLength is initialized to 1, because array[0] is always unique.
The element in array[1] will be compared with array[0].
If they are different, array[NewLength] is overwritten with array[1] and NewLength is incremented.
If they are the same, NewLength is not modified.
So if we have an array [1 2 1 3 1],
then
In the first pass of the outer (i) loop, array[1] (2) is compared with array[0] (1); they differ, so 2 is written to array[NewLength] = array[1]
so the unique part of the array is [1 2], since NewLength = 2
In the second pass of the outer loop, array[2] (1) is compared with array[0]; since they are the same, the inner loop breaks immediately
so the unique part is still [1 2], since NewLength = 2
and so on.

1. Using O(1) extra space, in O(n log n) time
This is possible, for instance:
first do an in-place O(n log n) sort
then walk through the list once, writing the first instance of every value back to the beginning of the list
I believe ejel's partner is correct that the best way to do this would be an in-place merge sort with a simplified merge step, and that this is probably the intent of the question if you were, e.g., writing a new library function to do this as efficiently as possible with no ability to improve the inputs. There would be cases where it is useful to do this without a hash table, depending on the sorts of inputs. But I haven't actually checked this.
2. Using O(lots) extra space, in O(n) time
declare a zeroed array big enough to hold every possible integer value
walk through the array once
set the corresponding array element to 1 for each integer.
If it was already 1, skip that integer.
This only works if several questionable assumptions hold:
it's possible to zero memory cheaply, or the ints are small compared to how many of them there are
you're happy to ask your OS for 256^sizeof(int) bytes of memory
and it will cache it for you really efficiently even if it's gigantic
It's a bad answer in general, but if you have LOTS of input elements and they're all 8-bit integers (or maybe even 16-bit integers), it could be the best way.
3. O(little)-ish extra space, O(n)-ish time
As #2, but use a hash table.
4. The clear way
If the number of elements is small, writing a sophisticated algorithm is not worthwhile when other code is quicker to write and quicker to read.
E.g. walk through the array once for each unique element (i.e. the first element, then the second element once duplicates of the first have been removed, etc.), removing all identical elements. O(1) extra space, O(n^2) time.
E.g. use library functions that do this. Efficiency depends on which ones you have easily available.
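Approach 2 is only practical when the value range is tiny. Below is a minimal sketch for 8-bit values, where the "zeroed array big enough to hold all integers" has just 256 entries; the function name `unique_u8` is hypothetical. It keeps the first occurrence of each value and preserves the original order in a single pass.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One-pass dedup for 8-bit values using a 256-entry "seen" table.
   Keeps the first occurrence of each value, in input order.
   Returns the new length. */
size_t unique_u8(uint8_t *a, size_t n)
{
    bool seen[256] = {false};
    size_t w = 0;                 /* next write position */
    for (size_t r = 0; r < n; r++) {
        if (seen[a[r]])
            continue;             /* already kept once: skip it */
        seen[a[r]] = true;
        a[w++] = a[r];
    }
    return w;
}
```

For 16-bit values the same idea needs 65536 entries (or 8 KiB as a bit array); beyond that, approach 3 (a hash table) is usually the better fit.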

Well, its basic implementation is quite simple: go through all elements, check whether there are duplicates among the remaining ones, and shift the rest over them.
It's terribly inefficient, and you could speed it up with a helper array for the output, or with sorting/binary trees, but this doesn't seem to be allowed.
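A sketch of that basic shift-based approach, assuming plain `int` elements; `unique_shift` is a hypothetical name. It keeps the first occurrence of each value in its original position and closes each gap with `memmove`, giving O(n^2) comparisons and moves overall.

```c
#include <stddef.h>
#include <string.h>

/* Naive in-place removal: for each kept element, scan the rest of the
   array and close the gap left by every duplicate with memmove.
   Keeps the first occurrence of each value, in input order.
   Returns the new logical length. */
size_t unique_shift(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t j = i + 1;
        while (j < n) {
            if (a[j] == a[i]) {
                /* shift a[j+1..n-1] one slot left over the duplicate */
                memmove(&a[j], &a[j + 1], (n - j - 1) * sizeof *a);
                n--;              /* do not advance j: a[j] is a new element now */
            } else {
                j++;
            }
        }
    }
    return n;
}
```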

If you are allowed to use C++, a call to std::sort followed by a call to std::unique will give you the answer. The time complexity is O(N log N) for the sort and O(N) for the unique traversal.
And if C++ is off the table there isn't anything that keeps these same algorithms from being written in C.
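In C, `qsort` from `<stdlib.h>` plus a short compaction loop plays the role of `std::sort` followed by `std::unique`. This is a minimal sketch; `sort_unique` and `cmp_int` are hypothetical names. Note that the C standard does not specify `qsort`'s complexity, although common implementations are O(N log N) on typical input.

```c
#include <stdlib.h>

/* qsort comparator for ints; avoids the overflow risk of "return x - y". */
static int cmp_int(const void *pa, const void *pb)
{
    int x = *(const int *)pa, y = *(const int *)pb;
    return (x > y) - (x < y);
}

/* Sort, then copy each value that differs from the last kept one to the
   front (what std::unique does). Returns the number of unique values. */
size_t sort_unique(int *a, size_t n)
{
    if (n < 2) return n;
    qsort(a, n, sizeof *a, cmp_int);
    size_t w = 1;                     /* a[0..w-1] holds the unique values */
    for (size_t r = 1; r < n; r++)
        if (a[r] != a[w - 1])
            a[w++] = a[r];
    return w;
}
```

As with `std::unique`, the elements past the returned length are left in an unspecified but valid state.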

You could do this in a single traversal, if you are willing to sacrifice memory. You can simply tally whether you have seen an integer or not in a hash/associative array. If you have already seen a number, remove it as you go, or better yet, move numbers you have not seen into a new array, avoiding any shifting in the original array.
In Perl:
foreach $i (@myary) {
if(!defined $seen{$i}) {
$seen{$i} = 1;
push @newary, $i;
}
}

The return value of the function should be the number of unique elements and they are all stored at the front of the array. Without this additional information, you won't even know if there were any duplicates.
Each iteration of the outer loop processes one element of the array. If it is unique, it stays in the front of the array and if it is a duplicate, it is overwritten by the last unprocessed element in the array. This solution runs in O(n^2) time.
#include <stdio.h>
#include <stdlib.h>
size_t rmdup(int *arr, size_t len)
{
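size_t prev = 0;
size_t curr = 1;
size_t last;
if (len < 2) return len; /* nothing to remove; also avoids len - 1 wrapping around when len == 0 */
last = len - 1;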
while (curr <= last) {
for (prev = 0; prev < curr && arr[curr] != arr[prev]; ++prev);
if (prev == curr) {
++curr;
} else {
arr[curr] = arr[last];
--last;
}
}
return curr;
}
void print_array(int *arr, size_t len)
{
printf("{");
size_t curr = 0;
for (curr = 0; curr < len; ++curr) {
if (curr > 0) printf(", ");
printf("%d", arr[curr]);
}
printf("}");
}
int main()
{
int arr[] = {4, 8, 4, 1, 1, 2, 9};
printf("Before: ");
size_t len = sizeof (arr) / sizeof (arr[0]);
print_array(arr, len);
len = rmdup(arr, len);
printf("\nAfter: ");
print_array(arr, len);
printf("\n");
return 0;
}

Here is a Java version.
int[] removeDuplicate(int[] input){
    int newLen = 0;                        // input[0..newLen-1] holds the unique values so far
    for (int i = 0; i < input.length; i++) {
        int j = 0;
        while (j < newLen && input[j] != input[i])
            j++;                           // look for input[i] among the kept values
        if (j == newLen)
            input[newLen++] = input[i];    // first occurrence: keep it
    }
    return java.util.Arrays.copyOf(input, newLen);  // trim to the unique prefix
}

Here is my solution.
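///// find duplicates in an array and remove them
/* merge_sort (not shown) is assumed to sort input[0..n-1].
   Returns the number of unique values left at the front of input. */
int unique(int* input, int n)
{
    if (n <= 0) return 0;
    merge_sort(input, 0, n) ;
    int prev = 0 ;                 /* index of the last unique value kept */
    for(int i = 1 ; i < n ; i++)
    {
        if(input[i] != input[prev])
            input[++prev] = input[i] ;
    }
    return prev + 1;
}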

An array should obviously be "traversed" right-to-left to avoid unnecessary copying of values back and forth.
If you have unlimited memory, you can allocate a bit array of 2^(8 * sizeof(element type)) / 8 bytes, so that each bit records whether you've already encountered the corresponding value.
If you don't, I can't think of anything better than traversing the array, comparing each value with the values that follow it, and removing any duplicates found. This is somewhere near O(n^2) (about (n^2 - n)/2 comparisons).
IBM has an article on a closely related subject.

Let's see:
O(N) pass to find min/max
allocate a bit array for "found" values
O(N) pass swapping duplicates to the end
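Those three steps can be sketched as follows, assuming `int` elements whose min-to-max range is small enough for a bit array of (max - min + 1) bits to fit in memory. The name `unique_bitset` is hypothetical, and instead of swapping duplicates to the end, this version compacts first occurrences to the front, which also keeps the original order.

```c
#include <stdint.h>
#include <stdlib.h>

/* Pass 1 finds min and max; a bit array with one bit per value in
   [min, max] then records which values were already kept.
   Pass 2 compacts first occurrences to the front (order preserved).
   Returns the new length, or (size_t)-1 if the bit array can't be allocated. */
size_t unique_bitset(int *a, size_t n)
{
    if (n < 2) return n;
    int lo = a[0], hi = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] < lo) lo = a[i];
        if (a[i] > hi) hi = a[i];
    }
    /* range computed in 64 bits so hi - lo cannot overflow */
    uint64_t range = (uint64_t)((int64_t)hi - (int64_t)lo) + 1;
    unsigned char *bits = calloc((size_t)((range + 7) / 8), 1);
    if (bits == NULL) return (size_t)-1;
    size_t w = 0;
    for (size_t r = 0; r < n; r++) {
        uint64_t off = (uint64_t)((int64_t)a[r] - (int64_t)lo);
        unsigned char mask = (unsigned char)(1u << (off % 8));
        if (bits[off / 8] & mask)
            continue;                 /* seen before: duplicate */
        bits[off / 8] |= mask;
        a[w++] = a[r];
    }
    free(bits);
    return w;
}
```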

This can be done in one pass with an O(N log N) algorithm and no extra storage.
Proceed from element a[1] to a[N-1]. At each stage i, the elements a[0] through a[j] (with j < i) form a sorted heap of the unique values seen so far. A second index j, initially 0, keeps track of the index of the last element of the heap.
Examine a[i] and try to insert it into the heap. As the element is inserted, if a duplicate element a[k] with the same value is encountered, do not insert a[i] (i.e., discard it); otherwise insert it into the heap, which grows by one element to occupy a[0] to a[j+1], and increment j.
Continue in this manner, incrementing i until all of the array elements have been examined and inserted into the heap, which ends up occupying a[0] to a[j]. j is the index of the last element of the heap, and the heap contains only unique element values.
int algorithm(int a[], int n)
{
int i, j;
if (n == 0)
return 0;
for (j = 0, i = 1; i < n; i++)
{
// Insert a[i] into the heap a[0...j]
if (heapInsert(a, j, a[i]))
j++;
}
return j + 1; // the heap occupies a[0...j], i.e. j + 1 unique values
}
bool heapInsert(int a[], int n, int val)
{
// Insert val into heap a[0...n]
...code omitted for brevity...
if (duplicate element a[k] == val)
return false;
a[k] = val;
return true;
}
Looking at the example, this is not exactly what was asked for, since the resulting array does not preserve the original element order. But if this requirement is relaxed, the algorithm above should do the trick.
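The answer leaves `heapInsert` unwritten. A binary heap cannot locate an equal element in O(log N), so the sketch below assumes the "sorted heap" means a sorted, duplicate-free prefix a[0..len-1] searched with binary search; `unique_sorted_prefix` is a hypothetical name. Each element costs O(log N) comparisons, but shifting to make room can cost O(N) moves, so the worst case is O(N^2) element moves.

```c
#include <stddef.h>
#include <string.h>

/* Keeps a[0..len-1] sorted and duplicate-free while scanning a[i].
   Binary search finds where a[i] belongs; an equal element means a[i]
   is a duplicate and is discarded. Returns the number of unique values
   (in ascending order). */
size_t unique_sorted_prefix(int *a, size_t n)
{
    size_t len = 0;                   /* size of the sorted prefix */
    for (size_t i = 0; i < n; i++) {
        int val = a[i];
        size_t lo = 0, hi = len;      /* search in [lo, hi) */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (a[mid] < val) lo = mid + 1;
            else hi = mid;
        }
        if (lo < len && a[lo] == val)
            continue;                 /* duplicate: discard it */
        /* shift a[lo..len-1] right by one; len <= i and val is saved,
           so overwriting a[len] loses nothing */
        memmove(&a[lo + 1], &a[lo], (len - lo) * sizeof *a);
        a[lo] = val;
        len++;
    }
    return len;
}
```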

In Java I would solve it like this. I don't know how to write it in C.
int length = array.length;
for (int i = 0; i < length; i++)
{
for (int j = i + 1; j < length; j++)
{
if (array[i] == array[j])
{
int k, l;
for (k = j + 1, l = j; k < length; k++, l++)
{
if (array[k] != array[i])
{
array[l] = array[k];
}
else
{
l--;
}
}
length = l;
}
}
}

How about the following?
int* temp = malloc(sizeof(int)*len);
int count = 0;
int x =0;
int y =0;
for(x=0;x<len;x++)
{
for(y=0;y<count;y++)
{
if(*(temp+y)==*(array+x))
{
break;
}
}
if(y==count)
{
*(temp+count) = *(array+x);
count++;
}
}
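memcpy(array, temp, sizeof(int)*count); /* copy back only the count unique values */
free(temp);
I declare a temp array, put each value into it the first time it is seen, and then copy the count unique values back to the beginning of the original array; count is the new length. (This needs <stdlib.h> and <string.h>, and real code should check malloc for NULL.)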

After reviewing the problem, here is my Delphi way; it may help:
var
A: Array of Integer;
I,J,C,K, P: Integer;
begin
C:=10;
SetLength(A,10);
A[0]:=1; A[1]:=4; A[2]:=2; A[3]:=6; A[4]:=3; A[5]:=4;
A[6]:=3; A[7]:=4; A[8]:=2; A[9]:=5;
for I := 0 to C-1 do
begin
for J := I+1 to C-1 do
if A[I]=A[J] then
begin
for K := C-1 Downto J do
if A[J]<>A[k] then
begin
P:=A[K];
A[K]:=0;
A[J]:=P;
C:=K;
break;
end
else
begin
A[K]:=0;
C:=K;
end;
end;
end;
// truncate array
setlength(A,C);
end;

The following example should solve your problem:
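def check_dump(x):
    # keep x only the first time it is seen (t holds the values seen so far)
    if x not in t:
        t.append(x)
        return True
    return False
t = []
output = list(filter(check_dump, input))  # input is the list to de-duplicate
print(output)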

import java.util.ArrayList;
public class C {
public static void main(String[] args) {
int arr[] = {2,5,5,5,9,11,11,23,34,34,34,45,45};
ArrayList<Integer> arr1 = new ArrayList<Integer>();
for(int i=0;i<arr.length-1;i++){
if(arr[i] == arr[i+1]){
arr[i] = 99999;
}
}
for(int i=0;i<arr.length;i++){
if(arr[i] != 99999){
arr1.add(arr[i]);
}
}
System.out.println(arr1);
}
}

This is the naive (N*(N-1)/2) solution. It uses constant additional space and maintains the original order. It is similar to the solution by @Byju, but uses no if(){} blocks. It also avoids copying an element onto itself.
#include <stdio.h>
#include <stdlib.h>
int numbers[] = {4, 8, 4, 1, 1, 2, 9};
#define COUNT (sizeof numbers / sizeof numbers[0])
size_t undup_it(int array[], size_t len)
{
size_t src,dst;
/* an array of size < 2 cannot contain duplicate values */
if (len <2) return len;
/* array[0] is always unique, so start with dst = 1 */
for (src=dst=1; src < len; src++) {
size_t cur;
for (cur=0; cur < dst; cur++ ) {
if (array[cur] == array[src]) break;
}
if (cur != dst) continue; /* found a duplicate */
/* array[src] must be new: add it to the list of non-duplicates */
if (dst < src) array[dst] = array[src]; /* avoid copy-to-self */
dst++;
}
return dst; /* number of valid elements in new array */
}
void print_it(int array[], size_t len)
{
size_t idx;
for (idx=0; idx < len; idx++) {
printf("%c %d", (idx) ? ',' :'{' , array[idx] );
}
printf("}\n" );
}
int main(void) {
size_t cnt = COUNT;
printf("Before undup:" );
print_it(numbers, cnt);
cnt = undup_it(numbers,cnt);
printf("After undup:" );
print_it(numbers, cnt);
return 0;
}

This can be done in a single pass, in O(N) time in the number of integers in the input
list, and O(N) storage in the number of unique integers.
Walk through the list from front to back, with two pointers "dst" and
"src" initialized to the first item. Start with an empty hash table
of "integers seen". If the integer at src is not present in the hash,
write it to the slot at dst and increment dst. Add the integer at src
to the hash, then increment src. Repeat until src passes the end of
the input list.
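C has no standard hash table, so this sketch swaps in a small open-addressing set with linear probing; that structure is an assumption, not something the answer specifies. The name `unique_hash` and the multiplicative hash constant are illustrative choices. It follows the dst/src description: each value is copied to dst only the first time it is seen.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One-pass dedup with an open-addressing hash set (linear probing).
   Keeps the first occurrence of each value, in input order.
   Returns the new length, or (size_t)-1 if allocation fails. */
size_t unique_hash(int *a, size_t n)
{
    size_t cap = 16;
    while (cap < 2 * n) cap *= 2;     /* power of two, load factor <= 0.5 */
    int *keys = malloc(cap * sizeof *keys);
    bool *used = calloc(cap, sizeof *used);
    if (keys == NULL || used == NULL) {
        free(keys);
        free(used);
        return (size_t)-1;
    }
    size_t dst = 0;
    for (size_t src = 0; src < n; src++) {
        uint32_t h = (uint32_t)a[src] * 2654435769u;  /* multiplicative hash */
        h ^= h >> 16;                                 /* mix high bits into low bits */
        size_t slot = h & (cap - 1);
        bool found = false;
        while (used[slot]) {          /* probe until the value or an empty slot */
            if (keys[slot] == a[src]) { found = true; break; }
            slot = (slot + 1) & (cap - 1);
        }
        if (!found) {                 /* first time: record it and keep it */
            used[slot] = true;
            keys[slot] = a[src];
            a[dst++] = a[src];
        }
    }
    free(keys);
    free(used);
    return dst;
}
```

The O(N) bound is expected time; a poor hash on adversarial input can still make probing degrade.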

Insert all the elements into a binary tree that disregards duplicates: O(n log n), provided the tree stays balanced. Then extract all of them back into the array with an in-order traversal: O(n). I am assuming that you don't need order preservation.
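A sketch of the tree approach with a plain, unbalanced binary search tree (an assumption: the answer does not name a tree type). Sorted input degrades it to O(n^2), and the recursive traversal then also recurses n levels deep. The names `bst_insert`, `bst_drain`, and `unique_bst` are hypothetical. The output comes back in ascending order.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Inserts key unless it is already present (duplicates are disregarded).
   Returns 1 on success or duplicate, 0 if malloc fails. */
static int bst_insert(struct node **root, int key)
{
    while (*root != NULL) {
        if (key == (*root)->key) return 1;
        root = key < (*root)->key ? &(*root)->left : &(*root)->right;
    }
    *root = malloc(sizeof **root);
    if (*root == NULL) return 0;
    (*root)->key = key;
    (*root)->left = (*root)->right = NULL;
    return 1;
}

/* In-order traversal: writes keys to out (if non-NULL) in ascending
   order, freeing each node after both of its subtrees are done. */
static void bst_drain(struct node *t, int *out, size_t *w)
{
    if (t == NULL) return;
    bst_drain(t->left, out, w);
    if (out != NULL) out[(*w)++] = t->key;
    bst_drain(t->right, out, w);
    free(t);
}

/* Returns the number of unique values, now sorted in a[0..],
   or (size_t)-1 if a node allocation failed (a is left untouched then). */
size_t unique_bst(int *a, size_t n)
{
    struct node *root = NULL;
    int ok = 1;
    for (size_t i = 0; i < n && ok; i++)
        ok = bst_insert(&root, a[i]);
    size_t w = 0;
    bst_drain(root, ok ? a : NULL, &w);  /* every key is already in the tree */
    return ok ? w : (size_t)-1;
}
```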

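Use a Bloom filter for hashing. This will reduce the memory overhead very significantly. Note, however, that a Bloom filter can report false positives, so a value it claims to have seen must be confirmed by an exact check before it is discarded.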

In Java,
Integer[] arrayInteger = {1,2,3,4,3,2,4,6,7,8,9,9,10};
String value ="";
for(Integer i:arrayInteger)
{
if(!("," + value).contains("," + i + ",")){ // match whole numbers only, so 1 is not found inside 10
value +=Integer.toString(i)+",";
}
}
String[] arraySplitToString = value.split(",");
Integer[] arrayIntResult = new Integer[arraySplitToString.length];
for(int i = 0 ; i < arraySplitToString.length ; i++){
arrayIntResult[i] = Integer.parseInt(arraySplitToString[i]);
}
Resulting contents of arrayIntResult:
{ 1, 2, 3, 4, 6, 7, 8, 9, 10}
Hope this helps.

Create a binary search tree that ignores duplicates: each insertion costs O(log n) while the tree stays balanced, so building it is O(n log n) overall.

First, create an array check[m], where m is one more than the largest value in the array you want to make duplicate-free (so this only works for non-negative values), and set every element of check to 1. Then traverse the array with the duplicates, say arr, with a for loop, and in the loop body write this:
{
if (check[arr[i]] != 1) {
arr[i] = 0;
}
else {
check[arr[i]] = 0;
}
}
With that, you set every duplicate to zero. The only thing left to do is to traverse arr and print every element that is not equal to zero (this assumes 0 never occurs as a real value). The order is preserved and it takes linear time (3*n).

Given an array of n elements, write an algorithm to remove all duplicates from the array in time O(nlogn)
Algorithm delete_duplicates (a[1....n])
//Remove duplicates from the given array
//input parameters: a[1:n], an array of n elements.
{
temp[1:n]; //an array of n elements, each with a 'value' and a 'key' field
for i = 1 to n:
    temp[i].value = a[i]
    temp[i].key = i
//based on 'value', sort the array temp.
//based on 'value', delete duplicate elements from temp.
//based on 'key', sort the array temp.
//construct an array p using temp:
p[i] = temp[i].value, for each remaining i
return p.
}
The order of elements is maintained in the output array using the 'key'. Since there are n keys, sorting on the value and on the key each takes O(nlogn) time, so the time taken to delete all duplicates from the array is O(nlogn).
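The pseudocode maps onto C roughly as follows, using `qsort` for both sorts. This is a sketch under stated assumptions: `struct tagged` and the function names are hypothetical, and because `qsort` is not stable, the first sort breaks ties by key so that the first occurrence of each value is the one kept.

```c
#include <stdlib.h>

/* A value tagged with its original index ("key" in the pseudocode). */
struct tagged {
    int value;
    size_t key;
};

static int by_value_then_key(const void *pa, const void *pb)
{
    const struct tagged *x = pa, *y = pb;
    if (x->value != y->value)
        return (x->value > y->value) - (x->value < y->value);
    return (x->key > y->key) - (x->key < y->key);
}

static int by_key(const void *pa, const void *pb)
{
    const struct tagged *x = pa, *y = pb;
    return (x->key > y->key) - (x->key < y->key);
}

/* 1. tag each value with its index, 2. sort by value (ties by index),
   3. drop repeats, 4. sort the survivors back by index.
   Returns the new length, or (size_t)-1 if allocation fails. */
size_t unique_stable(int *a, size_t n)
{
    if (n < 2) return n;
    struct tagged *t = malloc(n * sizeof *t);
    if (t == NULL) return (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        t[i].value = a[i];
        t[i].key = i;
    }
    qsort(t, n, sizeof *t, by_value_then_key);
    size_t w = 1;                     /* keep the first (lowest-key) entry of each run */
    for (size_t r = 1; r < n; r++)
        if (t[r].value != t[w - 1].value)
            t[w++] = t[r];
    qsort(t, w, sizeof *t, by_key);   /* restore the original order */
    for (size_t i = 0; i < w; i++)
        a[i] = t[i].value;
    free(t);
    return w;
}
```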

This is what I've got. It keeps the first occurrence of each value, so the original order is preserved.
#include <stdio.h>
int main(void){
    int x,n;
    printf("Enter a number: \t");
    if (scanf("%d",&n) != 1 || n <= 0)
        return 1;                     /* the arrays below need n > 0 */
    int arr[n],changedarr[n];
    for(x=0;x<n;x++){
        printf("Enter a number for array[%d]: ",x);
        if (scanf("%d",&arr[x]) != 1)
            return 1;
    }
    printf("\nOriginal Number in an array\n");
    for(x=0;x<n;x++){
        printf("%d\t",arr[x]);
    }
    int count=0;
    /* copy each value into changedarr only the first time it appears */
    for (int i = 0; i < n; i++)
    {
        int j;
        for (j = 0; j < i; j++)
        {
            if (arr[i] == arr[j])
                break;                /* arr[i] already appeared at index j */
        }
        if (j == i)
            changedarr[count++] = arr[i];
    }
    printf("\nThe unique items:\n");
    for (int i = 0; i < count; i++)
    {
        printf("%d\t",changedarr[i]);
    }
    printf("\n");
    return 0;
}

It'd be cool if you had a good DataStructure that could quickly tell if it contains an integer. Perhaps a tree of some sort.
DataStructure elementsSeen = new DataStructure();
int elementsRemoved = 0;
for(int i=0;i<array.Length;i++){
if(elementsSeen.Contains(array[i]))
elementsRemoved++;
else {
elementsSeen.Add(array[i]); // remember the value so later copies are removed
array[i-elementsRemoved] = array[i];
}
array.Length = array.Length - elementsRemoved;
