How do I get a lower time complexity with a nested loop - C

Hi everyone, I'm working in C and my code looks like this. Basically there's an array where every array[j] holds a number from 1 to 8, not in order. I'm trying to find where 1 is and do some operations, then find 2 and do other operations, and so on up to 8:
for(i=1; i<9; i++)
    for(j=0; j<8; j++)
        if(array[j]==i)
            // operations to do, but they are not needed now
I'm trying to find another way of doing this that spends less time in the loop, since the complexity can be O(n^2). Someone advised me to use a hashing system to order things, but I don't know if it's good enough.
Thanks to everyone!

Quicksort has an average complexity of O(n log(n)).
Sort (value, index) pairs by value so you can then access each of them in order in O(1).
#include <stdio.h>
#include <stdlib.h>

struct pair
{
    int val;
    int index;
};

int pair_compare(const void* a, const void* b)
{
    const struct pair* x = a;
    const struct pair* y = b;
    /* three-way comparison, so qsort sees less-than as well as greater-than */
    return (x->val > y->val) - (x->val < y->val);
}
int main()
{
    int arr[8] = {4,2,1,3,6,5,8,7};
    struct pair* pairs = calloc(8, sizeof(*pairs));
    int i;
    for(i = 0; i < 8; ++i)
    {
        pairs[i].val = arr[i];
        pairs[i].index = i;
    }
    qsort(pairs, 8, sizeof(*pairs), pair_compare);
    for(i = 0; i < 8; ++i)
    {
        printf("val: %d\tindex: %d\n", pairs[i].val, pairs[i].index);
    }
    free(pairs);
    return 0;
}

Alternatively, the code could have an outer loop going through the arrays of 8 structs.
For each array of 8 structs, loop through it, placing the index of each struct into a separate array of 8 integers according to the value in the struct's first field.
Then a second loop goes through that array of 8 offsets, one by one, performing the desired operation on the specified struct instance. A sketch of this index-array idea is below.
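A minimal C sketch of this suggestion, assuming the values really are a permutation of 1..8 (the sample data is illustrative, not from the original post):

#include <stdio.h>

int main(void)
{
    int array[8] = {4, 2, 1, 3, 6, 5, 8, 7};
    int pos[9]; /* pos[v] = index of value v; index 0 unused */
    int v, j;

    for (j = 0; j < 8; j++)
        pos[array[j]] = j;  /* O(n): remember where each value sits */

    for (v = 1; v <= 8; v++) {
        j = pos[v];         /* O(1) lookup instead of an inner scan */
        printf("value %d is at index %d\n", v, j);
        /* ...operations for value v on array[j] would go here... */
    }
    return 0;
}

This replaces the O(n^2) nested loop with two O(n) passes, at the cost of one small extra array.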

Related

Check repeated numbers on a matrix in C

How can I use loops to check whether there are any repeated numbers in an n x n matrix?
Using two nested for loops twice only lets me compare elements that share at least a row or a column.
Example (in the most simplified way possible):
int matrix[n][n];
/* matrix is filled */
int current, isEqual;
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < n - 1; j++)   /* j+1 must stay in bounds */
    {
        current = matrix[i][j];
        if (current == matrix[i][j+1])
            isEqual = 1;
        else
            isEqual = 0;
    }
}
for (int j = 0; j < n; j++)
{
    for (int i = 0; i < n - 1; i++)   /* i+1 must stay in bounds */
    {
        current = matrix[i][j];
        if (current == matrix[i+1][j])
            isEqual = 1;
        else
            isEqual = 0;
    }
}
I can't check numbers that don't share a row or a column.
First, think of an N x M matrix as if it were an array of length N*M. The only difference is how you access the elements (two for loops instead of one, for example).
Then, a simple algorithm is to iterate over every element (first index) and, for each one, iterate over every other element (second index) to check whether it's the same. It's easier to do with an array; in a matrix it's the same, maybe a bit more verbose and complex, but the algorithm is the same.
As a second phase, after you have implemented the basic algorithm, you can improve its performance by starting the second index at the element after the first index. This way, you avoid checking already-seen pairs multiple times. This improvement is slightly harder to do in a matrix if you iterate it with two for loops, as it's a bit harder to know what the "next index" is (you have a "compound" index, {i,j}).
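For illustration, here is a minimal C sketch of that basic algorithm on a flat view of the matrix; the helper name and sample values are assumptions, not from the original answer:

#include <stdio.h>

/* Treat the n x n matrix as a flat array of n*n elements and compare
   every element against the ones after it (the "second phase" above). */
int has_duplicates(int *flat, int total)
{
    for (int first = 0; first < total; first++)
        for (int second = first + 1; second < total; second++)
            if (flat[first] == flat[second])
                return 1; /* found a repeated number */
    return 0;
}

int main(void)
{
    int matrix[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 1}};
    /* &matrix[0][0] lets us walk the matrix as one array of 9 ints */
    printf("%s\n", has_duplicates(&matrix[0][0], 3 * 3) ? "duplicate" : "all unique");
    return 0;
}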
One simple way to do this is to insert each number into a data structure that makes it easy to check for duplicates. This is sort of fun to do in C, and although the following is certainly not super efficient or production ready, it's (IMO) a nice little toy:
/* Check if any integer on the input stream is a dup */
#include <stdio.h>
#include <stdlib.h>

struct node { int data; struct node *child[2]; };

static struct node *
new_node(int data)
{
    struct node *e = calloc(1, sizeof *e);
    if( e == NULL ){
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    e->data = data;
    return e;
}

/*
 * Insert a value into the tree. Return 1 if already present.
 * Note that this tree needs to be rebalanced. In a real
 * project, we would use existing libraries. For this toy
 * it is not worth the work needed to properly rebalance the
 * tree.
 */
int
insert(struct node **table, int data)
{
    struct node *t = *table;
    if( !t ){
        *table = new_node(data);
        return 0;
    }
    if( data == t->data ){
        return 1;
    }
    return insert(&t->child[data < t->data], data);
}

int
main(void)
{
    int rv, v;
    struct node *table = NULL;
    while( (rv = scanf("%d", &v)) == 1 ){
        if( insert(&table, v) ){
            fprintf(stderr, "%d is duplicated\n", v);
            return EXIT_FAILURE;
        }
    }
    if( rv != EOF ){
        fprintf(stderr, "Invalid input\n");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
The basic approach is to loop through the n x n matrix, keeping a list of the numbers in it along with a count of how many times each number is found.
The following is example source code for a 50 x 50 matrix. Extending this to an n x n matrix is fairly straightforward and I leave that as an exercise for you. You may need to do something such as using malloc() to create an arbitrarily sized matrix; there are posts on that sort of thing.
I also do not specify how the data is put into the matrix in the first place. That is also up to you.
This just shows a brute-force approach for determining whether there are duplicates in the matrix.
I've also taken the liberty of assuming the matrix elements are int, but changing the type to something else should be straightforward. If the matrix elements are something other than a simple value type such as int, long, etc., then the function findAndCount() will need changing for the equality comparison.
Here are the data structures I'm using.
typedef struct {
    int nLength;        // number of list elements in use
    struct {
        int iNumber;    // number from an element of the nxn matrix
        int iCount;     // number of times this element was found in the matrix
    } list[50 * 50];
} elementList;

elementList matrixList = {
    0,
    {{0, 0}}
};

int matrixThing[50][50];
Next we need to loop through the matrix and, for each element, check whether it is in the list. If it's not, then add it; if it does exist, then increment the count.
for (unsigned short i = 0; i < 50; i++) {
    for (unsigned short j = 0; j < 50; j++) {
        findAndCount (matrixThing[i][j], &matrixList);
    }
}
And then we need to define the function used to check matrix values against the list.
void findAndCount (int matrixElement, elementList *matrixList)
{
    for (int i = 0; i < matrixList->nLength; i++) {
        if (matrixElement == matrixList->list[i].iNumber) {
            matrixList->list[i].iCount++;
            return;
        }
    }

    // value not found in the list so we add it and set the initial count
    // to one.
    // we can then determine if there are any duplicates by checking the
    // resulting list once we have processed all matrix elements to see
    // if any count is greater than one.
    // the initial check will be to see if the value of nLength is equal
    // to the number of array elements in the matrix, n times n.
    // so a 50 x 50 matrix should result in an nLength of 2500 if each
    // element is unique.
    matrixList->list[matrixList->nLength].iNumber = matrixElement;
    matrixList->list[matrixList->nLength].iCount = 1;
    matrixList->nLength++;
    return;
}
Search algorithms
The above function, findAndCount(), is a brute-force search that scans an unsorted list element by element until either the thing being searched for is found or the end of the list is reached.
If the list is sorted then you can use a binary search, which is much quicker than a linear search. However, you then incur the overhead of keeping the list sorted in order to use a binary search.
If you change the data structure used to store the list of found values to one that maintains values in an ordered sequence, you can also cut down on the search overhead, though there will be an overhead for inserting new values into the data structure.
One such data structure is a tree, and there are several types and algorithms to build a tree by inserting new items as well as searching one. See search tree, which describes several different kinds of trees and searches.
So there is a balancing act between the effort spent searching and the effort spent adding items to the data structure. A small sketch of the sorted-list variant follows.
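As an illustration of that sorted-list tradeoff (not part of the original answer), here is a minimal sketch using the standard qsort() and bsearch(); the values are placeholders:

#include <stdio.h>
#include <stdlib.h>

/* If the list of seen values is kept sorted, bsearch() replaces the
   linear scan in findAndCount(); the cost moves into sorting/insertion. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int seen[] = {42, 7, 19, 3, 25};
    size_t n = sizeof seen / sizeof seen[0];
    int key = 19;

    qsort(seen, n, sizeof seen[0], cmp_int);                    /* O(n log n), once */
    int *hit = bsearch(&key, seen, n, sizeof seen[0], cmp_int); /* O(log n) per lookup */
    printf("%d %s\n", key, hit ? "found" : "not found");
    return 0;
}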
Here is an example that checks for duplicate values the way you want to do it.
Plain nested loops are slow; a hash set or a tree is a better fit than loops.
I assume you are not using C++, because the C++ standard library has built-in algorithms and data structures to do this efficiently.
#include <stdio.h>

/* Search the 'array' with the specified 'size' for the value 'key'
   starting from 'offset' and return 1 if the value is found, otherwise 0 */
int find(int key, int* array, int size, int offset) {
    for (int x = offset; x < size; ++x)
        if (key == array[x])
            return 1;
    return 0;
}

/* Print duplicate values in a matrix */
int main(int argc, char *argv[]) {
    int matrix[3][3] = { {1, 2, 3}, {4, 3, 6}, {2, 8, 2} };
    int size = sizeof(matrix) / sizeof(matrix[0][0]);
    int *ptr = (int*)matrix;
    for (int x = 0; x < size; ++x) {
        /* If we already checked the number, then don't check it again */
        if (find(ptr[x], ptr, x, 0))
            continue;
        /* Check if the number repeats and show it in the console if it does */
        if (find(ptr[x], ptr, size, x + 1))
            printf("%d\n", ptr[x]);
    }
    return 0;
}
When you become better at C, you should find or implement a "hash set" or a "red-black tree", and use that instead.

What is this sorting algorithm?

I happened to write a simple sorting algorithm, but I am not sure what this algorithm is called.
#include <stdio.h>
#include <stdlib.h>

void IDontKnowWhatThisIs(int* arr, int size){
    int* minuscount = malloc(size * sizeof(int)); // new location chooser array
    int* valarr = malloc(size * sizeof(int));     // value backup array

    // compare all elements: size^2
    for (int i = 0; i < size; i++){
        valarr[i] = arr[i];
        minuscount[i] = 0;
        for (int j = 0; j < size; j++){
            if (i != j){
                // the one with the least amount (0) is the smallest value
                if (arr[i] - arr[j] > 0){
                    minuscount[i] += 1;
                }
            }
        }
    }

    // O(size)
    for (int i = 0; i < size; i++){
        // place everything back in
        arr[minuscount[i]] = valarr[i];
    }
    free(minuscount);
    free(valarr);
    // total time complexity: O(size^2)
}

int main(){
    int arr[10] = { 50, 2, 13, 33, 62, 11, 30, 66, 1, -101 };
    IDontKnowWhatThisIs(arr, 10);
    for (int i = 0; i < 10; i++) printf("%d ", arr[i]);
    return 0;
}
It is a simple algorithm that compares each element with every other and computes a new location for each; each value is then copied back into the original array at its computed position.
I don't think it is one of the generic O(n^2) algorithms (selection, bubble, insertion), but the concept is still very simple, so I am sure this algorithm already exists.
Edit: on second thought, I think this is similar to a selection sort, but unoptimized, as it compares even more.
I am not aware of a name for this algorithm. It's clever, but unfortunately you need to add an extra step if you want to handle possible duplicates in the array.
For instance, if the array is [3;4;4;1;2] then minuscount will be [2;3;3;0;1], and the two 4s will be put into the same cell of arr, resulting in the final array [1;2;3;4;2], where that final 2 is leftover from the original array.
I don't know a name either. I would call it RankSort, because it computes the rank of every element in order to permute each one to its sorted location.
This sort is not very attractive because:
it takes two extra arrays, one for the ranks and one as a buffer for the permutation (the buffer can be avoided by implementing the permutation in place);
as said by others, possible equal elements require special handling, namely a lexicographical comparison on value then index (see the sketch below); this has a cost;
it performs all N² comparisons (this can be reduced to N(N-1)/2 by updating the rank of the larger element of each pair).
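Here is a hedged C sketch of the RankSort idea with the duplicate fix described above; the function name and test values are illustrative, not from the original posts:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Rank sort with ties broken by original index, so equal values get
   distinct ranks instead of colliding in the same destination cell. */
void rank_sort(int *arr, int size)
{
    int *rank = malloc(size * sizeof *rank);
    int *copy = malloc(size * sizeof *copy);
    memcpy(copy, arr, size * sizeof *copy);

    for (int i = 0; i < size; i++) {
        rank[i] = 0;
        for (int j = 0; j < size; j++) {
            /* value-then-index comparison: count elements ordered before arr[i] */
            if (arr[j] < arr[i] || (arr[j] == arr[i] && j < i))
                rank[i]++;
        }
    }
    for (int i = 0; i < size; i++)
        arr[rank[i]] = copy[i]; /* each element goes to its rank */

    free(rank);
    free(copy);
}

int main(void)
{
    int arr[] = {3, 4, 4, 1, 2}; /* the duplicate example from above */
    rank_sort(arr, 5);
    for (int i = 0; i < 5; i++) printf("%d ", arr[i]);
    printf("\n"); /* prints: 1 2 3 4 4 */
    return 0;
}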

Unique pairs with equal sum in C

The question at hand is:
Q8. Given an unsorted array A[]. The task is to print all unique pairs in the unsorted array with equal sum. Consider the Input: A[] = {6, 4, 12, 10, 22, 54, 32, 42, 21, 11}
Explain the approach of solving the above problem, and write the code in any one programming language C/C++/Python/Java. What is the time complexity of the above problem?
Here is my solution to the above problem (in C) :
#include <stdio.h>

int main(){
    int arr[] = {6,4,12,10,22,54,32,42,21,11};
    int len = sizeof(arr)/sizeof(arr[0]);
    for(int i=0; i<len; i++)
        for(int j=i+1; j<len; j++)
            for(int k=i+1; k<len; k++)
                for(int l=k+1; l<len; l++)
                    if(arr[i]+arr[j] == arr[l]+arr[k])
                        printf("(%d,%d),(%d,%d)\n", arr[i], arr[j], arr[k], arr[l]);
    return 0;
}
My logic is to take one element at a time, and take its sum with every other element, and for each such iteration, compare the sum of two other unique pair of elements to it.
For example, when i=0, j=3, arr[i]+arr[j]=16. When k=1,l=2, arr[k]+arr[l]=16. Since the pairs are unique (6,10) and (4,12) and their sum is equal, I print them.
Note that the pairs are assumed to be unordered pairs so that (a,b) is the same as (b,a) and so we don't need to repeat that, as they have to be unique.
My question is : I know that my code is almost O(n^4). How can I improve/optimise it further?
First you precompute the sum of each pair and keep the results in a matrix PAIRSUM:
PAIRSUM(0, 0) = 12
PAIRSUM(0, 1) = 10, and so on.
Next, you loop over PAIRSUM and see where two entries are equal.
So you have reduced the big problem to a smaller one, in which you check the equality of two numbers, not of two sums of numbers.
For this you keep a vector PAIR in which, at index X, you keep the entries of PAIRSUM where the sum was X.
For example, PAIR(10) = { {0, 1} }.
You can also consider in PAIRSUM only the part above the diagonal, i.e. only the indexes (i, j) with i > j.
It would be easier in C++, Python, or Java because those languages provide high-level containers. In Python, you could use a defaultdict(list) where the key is the sum and the value a list of the pairs giving that sum.
Then you only have to process the unique pairs (N²/2):
import collections

result = collections.defaultdict(list)
for i, a in enumerate(A):
    for b in A[i+1:]:
        result[a+b].append((a,b))
It will be slightly more complex in C because you do not have a high-level direct-access dict. If you can waste some memory and only have small numbers like here, you can say that the highest sum will be less than twice the biggest number in the input array, and directly allocate an array of that size. That way you ensure direct access from a sum. From there, you just use a linked list of pairs and that is all. As a bonus you even get a sorted list of sums.
If you cannot assume that the numbers are small, you will have to build a direct-access container yourself. A hash-type container using N*N/2 as its size (N being the length of A) and sum % size as the hash function should be enough.
For completeness, here is possible C code that does not make the small-numbers assumption (this code displays all pairs, not only the ones with duplicated sums):
#include <stdio.h>
#include <stdlib.h>

// a node in a linked list of pairs
struct pair_node {
    int first;
    int second;
    struct pair_node *next;
};

// a slot for a hash-type container of pairs indexed by their sum
struct slot {
    int number;
    struct pair_node *nodes;
    struct slot *next;
};

// utility function to find (or create) a slot for a sum
struct slot* find_slot(struct slot **slots, int size, int sum) {
    struct slot *slt = slots[sum%size];
    while (slt != NULL) {
        if (slt->number == sum) {
            return slt;
        }
        slt = slt->next;
    }
    slt = malloc(sizeof(struct slot));
    slt->number = sum;
    slt->nodes = NULL;
    slt->next = slots[sum%size];
    slots[sum%size] = slt;
    return slt;
}

int main() {
    int A[] = {6,4,12,10,22,54,32,42,21,11}; // the array of numbers
    int N = sizeof(A) / sizeof(A[0]);
    int arr_size = N * N / 2;                // size of the hash table of pairs
    struct slot** result = malloc(arr_size * sizeof(struct slot *));
    for (int i=0; i<arr_size; i++) {
        result[i] = NULL;
    }
    // process unique pairs
    for (int i=0; i<N-1; i++) {
        for (int j=i+1; j<N; j++) {
            int sum = A[i] + A[j];
            // allocate and initialize a node
            struct pair_node *node = malloc(sizeof(*node));
            node->first = A[i];
            node->second = A[j];
            // store the node in the hash container
            struct slot *slt = find_slot(result, arr_size, sum);
            node->next = slt->nodes;
            slt->nodes = node;
        }
    }
    // display the result
    for (int i=0; i<arr_size; i++) {
        for (struct slot* slt=result[i]; slt != NULL;) {
            printf("%d :", slt->number);
            struct pair_node *node = slt->nodes;
            while (node != NULL) {
                printf(" (%d,%d)", node->first, node->second);
                struct pair_node *old_node = node;
                node = node->next;
                free(old_node); // free the node we just printed
            }
            printf("\n");
            struct slot *old = slt;
            slt = slt->next;
            free(old);
        }
    }
    free(result);
    return EXIT_SUCCESS;
}
Here is C code that calculates all the sums, storing each sum with its indexes inside an array of structures. We then sort the structures and print adjacent structure elements with the same sum.
#include <stdlib.h>
#include <stddef.h>
#include <stdio.h>
#include <errno.h>
#include <assert.h>

// for debugging
#define debug(...) ((void)0) // printf(__VA_ARGS__)

// two indexes and a sum
struct is_s {
    // one index inside the array
    size_t i;
    // the other index, also inside the array
    size_t j;
    // the sum arr[i] + arr[j]
    int sum;
};

// used for qsorting the struct is_s by sum
static int is_qsort_compare_sum(const void *a0, const void *b0) {
    const struct is_s * const a = a0;
    const struct is_s * const b = b0;
    // three-way compare; avoids the overflow that a->sum - b->sum can cause
    return (a->sum > b->sum) - (a->sum < b->sum);
}

int unique_pairs(const size_t len, const int arr[len]) {
    if (len <= 1) return 0;
    // The number of unordered combinations is n!/(2!*(n-2)!) = n*(n-1)/2
    const size_t islen = len * (len - 1) / 2; // #MOehm
    debug("%zu\n", islen);
    struct is_s * const is = malloc(islen * sizeof(*is));
    if (is == NULL) {
        return -ENOMEM;
    }
    size_t isidx = 0;
    for (size_t i = 0; i < len; ++i) {
        for (size_t j = i + 1; j < len; ++j) {
            assert(isidx < islen); // just for safety
            struct is_s * const p = &is[isidx++];
            p->i = i;
            p->j = j;
            p->sum = arr[i] + arr[j];
            debug("[%zu]=[%zu]=%d [%zu]=%d %d\n", isidx, p->i, arr[p->i], p->j, arr[p->j], p->sum);
        }
    }
    qsort(is, islen, sizeof(*is), is_qsort_compare_sum);
    for (size_t i = 0; i < islen - 1; ++i) {
        debug("([%zu]=%d,[%zu]=%d)%d = ([%zu]=%d,[%zu]=%d)%d\n",
            is[i].i, arr[is[i].i], is[i].j, arr[is[i].j], is[i].sum,
            is[i+1].i, arr[is[i+1].i], is[i+1].j, arr[is[i+1].j], is[i+1].sum
        );
        if (is[i].sum == is[i + 1].sum) {
            printf("(%d,%d),(%d,%d) = %d\n",
                arr[is[i].i], arr[is[i].j],
                arr[is[i+1].i], arr[is[i+1].j], is[i].sum);
        }
    }
    free(is);
    return 0;
}

int main(void) {
    const int arr[] = {6,4,12,10,22,54,32,42,21,11};
    return unique_pairs(sizeof(arr)/sizeof(*arr), arr);
}
The result I get is:
(6,10),(4,12) = 16
(10,22),(21,11) = 32
(12,21),(22,11) = 33
(22,21),(32,11) = 43
(32,21),(42,11) = 53
(12,42),(22,32) = 54
(10,54),(22,42) = 64
I wonder whether this is optimal; as #Bathsheba noted, I think the worst case is O(n*n).
It can be done in O(N^2 * log(N^2) * M), where M is the maximum number of pairs (i, j) that have the same sum, so in the worst case it is O(N^3 * log(N)).
Let's iterate over every pair 0 <= i, j < N in order (increasing or decreasing). We have to save the sums of all the previous pairs (i, j), to know which previous pairs have a certain sum; this can be done with a map with an integer key and a vector of pairs as the mapped value. Then, for every pair (i, j), you search the map for its sum (key = A[i] + A[j]); all the pairs stored in map[sum] are answers for this pair (i, j).
You don't have to worry about the pairs that come after (i, j) with the same sum, because those pairs will count it when they are processed.
Here is a Java solution (note: it prints pairs adding up to a fixed target sum of 10, a variant of the problem):

import java.util.*;

class Duplicate {
    public static void main(String[] args) {
        int[] a = {5,3,1,4,5,6,3,7,0,10,6,4,9,1};
        List<Integer> li = new ArrayList<Integer>();
        int p1 = 0, p2 = 0;
        for(int i = 0; i < a.length; i++) {
            for(int j = i+1; j < a.length; j++){
                if(a[i] + a[j] == 10) {
                    p1 = a[i];
                    p2 = a[j];
                    // use the pair's difference to skip already-printed pairs
                    if(!li.contains(Math.abs(p2 - p1))) {
                        li.add(Math.abs(p2 - p1));
                        System.out.println("Pairs" + ":" + p1 + "," + p2);
                    }
                }
                p1 = 0;
                p2 = 0;
            }
        }
    }
}

Refactoring Dynamic Approach for Optimal Binary Search Tree

I am very new to the concept of Dynamic Programming and CS in general. I am teaching myself by reading lectures posted online, watching videos, and solving problems posted on websites such as GeeksforGeeks and HackerRank.
Problem
Given input
3 25 30 5
where 3 = #of keys
25 = frequency of key 1
30 = frequency of key 2
5 = frequency of key 3
I am to print the minimum cost if each key is arranged in an optimized manner. This is an optimal binary search tree problem, and I found a solution on GeeksforGeeks that sort of does something similar.
#include <stdio.h>
#include <limits.h>

// A utility function to get sum of array elements freq[i] to freq[j]
int sum(int freq[], int i, int j);

/* A Dynamic Programming based function that calculates minimum cost of
   a Binary Search Tree. */
int optimalSearchTree(int keys[], int freq[], int n)
{
    /* Create an auxiliary 2D matrix to store results of subproblems */
    int cost[n][n];

    /* cost[i][j] = Optimal cost of binary search tree that can be
       formed from keys[i] to keys[j].
       cost[0][n-1] will store the resultant cost */

    // For a single key, cost is equal to frequency of the key
    for (int i = 0; i < n; i++)
        cost[i][i] = freq[i];

    // Now we need to consider chains of length 2, 3, ... .
    // L is chain length.
    for (int L = 2; L <= n; L++)
    {
        // i is row number in cost[][]
        for (int i = 0; i <= n - L; i++)   // i <= n-L keeps j = i+L-1 in bounds
        {
            // Get column number j from row number i and chain length L
            int j = i + L - 1;
            cost[i][j] = INT_MAX;

            // Try making each key in interval keys[i..j] the root
            for (int r = i; r <= j; r++)
            {
                // c = cost when keys[r] becomes root of this subtree
                int c = ((r > i) ? cost[i][r-1] : 0) +
                        ((r < j) ? cost[r+1][j] : 0) +
                        sum(freq, i, j);
                if (c < cost[i][j])
                    cost[i][j] = c;
            }
        }
    }
    return cost[0][n-1];
}

// A utility function to get sum of array elements freq[i] to freq[j]
int sum(int freq[], int i, int j)
{
    int s = 0;
    for (int k = i; k <= j; k++)
        s += freq[k];
    return s;
}

// Driver program to test above functions
int main()
{
    int keys[] = {0,1,2};
    int freq[] = {34, 8, 50};
    int n = sizeof(keys)/sizeof(keys[0]);
    printf("Cost of Optimal BST is %d ", optimalSearchTree(keys, freq, n));
    return 0;
}
However, in this solution they also take the "keys" as input, yet the keys seem to have no impact on the final answer, as they shouldn't; only the frequency with which each key is searched for matters.
For simplicity's sake, and to understand this dynamic approach, I was wondering how I can modify this solution so that it takes its input in the format shown above and prints the result.
The function you presented does have a keys parameter, but it does not use it. You could remove it altogether.
Edit: in particular, since function optimalSearchTree() does not use its keys parameter at all, removing that argument requires changing only the function signature (...
int optimalSearchTree(int freq[], int n)
...) and the one call of that function. Since you don't need the keys for this particular exercise, though, you can altogether remove them from the main program, too, to give you:
int main()
{
    int freq[] = {25, 30, 5};
    int n = sizeof(freq)/sizeof(freq[0]);
    printf("Cost of Optimal BST is %d ", optimalSearchTree(freq, n));
    return 0;
}
(substituting the frequency values you specified for the ones in the original code)
The function does, however, assume that the frequencies are given in order of increasing key. It needs at least the relative key order to do its job, because otherwise you cannot construct a search tree. If you were uncomfortable with the idea that the key values are unknown, you could interpret the code to be using indices into the freq[] array as aliases for the key values. That works because a consequence of the assumption described above is that x -> keys[x] is a 1:1, order-preserving mapping from integers 0 ... n - 1 to whatever the actual keys are.
If the function could not assume the frequencies were initially given in increasing order by key, then it could first use the keys to sort the frequencies into that order, and then proceed as it does now.
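For completeness, here is a hedged sketch of a main() that reads the question's input format ("3 25 30 5", a count followed by that many frequencies); it assumes the keys-free optimalSearchTree(int freq[], int n) variant described above is linked in:

#include <stdio.h>
#include <stdlib.h>

int optimalSearchTree(int freq[], int n); /* defined as in the answer above */

int main(void)
{
    int n;
    if (scanf("%d", &n) != 1 || n <= 0)
        return EXIT_FAILURE;

    int *freq = malloc(n * sizeof *freq);
    for (int i = 0; i < n; i++)
        scanf("%d", &freq[i]); /* frequencies, assumed given in key order */

    printf("Cost of Optimal BST is %d\n", optimalSearchTree(freq, n));
    free(freq);
    return 0;
}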

Algorithm: efficient way to remove duplicate integers from an array

I got this problem from an interview with Microsoft.
Given an array of random integers,
write an algorithm in C that removes
duplicated numbers and return the unique numbers in the original
array.
E.g. Input: {4, 8, 4, 1, 1, 2, 9} Output: {4, 8, 1, 2, 9, ?, ?}
One caveat is that the expected algorithm should not require the array to be sorted first. And when an element has been removed, the following elements must be shifted forward as well. Anyway, the values at the tail of the array, where elements were shifted forward from, are negligible.
Update: The result must be returned in the original array, and a helper data structure (e.g. a hashtable) should not be used. However, I guess order preservation is not necessary.
Update 2: For those who wonder why these impractical constraints: this was an interview question, and all these constraints were discussed during the thinking process to see how I would come up with different ideas.
A solution suggested by my girlfriend is a variation of merge sort. The only modification is that, during the merge step, duplicated values are simply disregarded. This solution would likewise be O(n log n). In this approach, the sorting and the duplicate removal are combined. However, I'm not sure if that makes any difference.
I've posted this once before on SO, but I'll reproduce it here because it's pretty cool. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) in time complexity. The algorithm is as follows:
1. Take the first element of the array; this will be the sentinel.
2. Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to the sentinel.
3. Move all elements for which the index is equal to the hash to the beginning of the array.
4. Move all elements that are equal to the sentinel, except the first element of the array, to the end of the array.
5. What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided there is no pathological scenario in the hashing: even if there are no duplicates, approximately 2/3 of the elements are eliminated at each recursion. Each level of recursion is O(n), where small n is the number of elements left. The only problem is that, in practice, it's slower than a quicksort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in the full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in the full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into a 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
uniqueInPlaceImpl(dataIn, 0);
}
void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
if(dataIn.length - start < 2)
return;
invariant T sentinel = dataIn[start];
T[] data = dataIn[start + 1..$];
static hash_t getHash(T elem) {
static if(is(T == uint) || is(T == int)) {
return cast(hash_t) elem;
} else static if(__traits(compiles, elem.toHash)) {
return elem.toHash;
} else {
static auto ti = typeid(typeof(elem));
return ti.getHash(&elem);
}
}
for(size_t index = 0; index < data.length;) {
if(data[index] == sentinel) {
index++;
continue;
}
auto hash = getHash(data[index]) % data.length;
if(index == hash) {
index++;
continue;
}
if(data[index] == data[hash]) {
data[index] = sentinel;
index++;
continue;
}
if(data[hash] == sentinel) {
swap(data[hash], data[index]);
index++;
continue;
}
auto hashHash = getHash(data[hash]) % data.length;
if(hashHash != hash) {
swap(data[index], data[hash]);
if(hash < index)
index++;
} else {
index++;
}
}
size_t swapPos = 0;
foreach(i; 0..data.length) {
if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
swap(data[i], data[swapPos++]);
}
}
size_t sentinelPos = data.length;
for(size_t i = swapPos; i < sentinelPos;) {
if(data[i] == sentinel) {
swap(data[i], data[--sentinelPos]);
} else {
i++;
}
}
dataIn = dataIn[0..sentinelPos + start + 1];
uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}
How about:
void rmdup(int *array, int length)
{
    int *current, *end = array + length - 1;

    for ( current = array + 1; array < end; array++, current = array + 1 )
    {
        while ( current <= end )
        {
            if ( *current == *array )
            {
                *current = *end--;  /* overwrite the duplicate with the last element */
            }
            else
            {
                current++;
            }
        }
    }
}
Should be O(n^2) or less.
If you are looking for the superior O-notation, then sorting the array with an O(n log n) sort and then doing an O(n) traversal may be the best route. Without sorting, you are looking at O(n^2).
Edit: if you are just doing integers, then you can also use radix sort to get O(n); a sketch is below.
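For what it's worth, here is a minimal sketch of that radix-sort idea (an LSD byte-wise sort over unsigned ints, four counting passes); the names and test data are illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* LSD radix sort: O(n) passes, after which the usual O(n) dedup walk applies */
void radix_sort(unsigned *a, size_t n)
{
    unsigned *tmp = malloc(n * sizeof *tmp);
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};
        for (size_t i = 0; i < n; i++)
            count[((a[i] >> shift) & 0xFF) + 1]++;   /* histogram of this byte */
        for (int b = 0; b < 256; b++)
            count[b + 1] += count[b];                /* prefix sums = start offsets */
        for (size_t i = 0; i < n; i++)
            tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
}

int main(void)
{
    unsigned a[] = {4, 8, 4, 1, 1, 2, 9};
    radix_sort(a, 7);
    for (size_t i = 0; i < 7; i++) printf("%u ", a[i]);
    printf("\n");
    return 0;
}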
One more efficient implementation
int i, j;

/* new length of modified array */
int NewLength = 1;

for(i = 1; i < Length; i++){
    for(j = 0; j < NewLength; j++)
    {
        if(array[i] == array[j])
            break;
    }

    /* if none of the values in array[0..NewLength-1] is the same as array[i],
       then copy the current value to the next free position in array */
    if (j == NewLength)
        array[NewLength++] = array[i];
}
In this implementation there is no need for sorting the array.
Also if a duplicate element is found, there is no need for shifting all elements after this by one position.
The output of this code is array[] with size NewLength
Here we start from the 2nd element in the array and compare it with all the elements in the array before it.
We hold an extra index variable 'NewLength' for modifying the input array.
NewLength is initialized to 1, since the first element is trivially unique.
The element at array[1] is compared with array[0].
If they are different, the value at array[NewLength] is overwritten with array[1] and NewLength is incremented.
If they are the same, NewLength is not modified.
So if we have an array [1 2 1 3 1], then:
In the first pass of the 'j' loop, array[1] (2) is compared with array[0]; 2 is then written to array[NewLength], so the array is [1 2] with NewLength = 2.
In the second pass of the 'j' loop, array[2] (1) is compared with array[0] and array[1]. Since array[2] (1) and array[0] are the same, the loop breaks here, so the array stays [1 2] with NewLength = 2.
And so on.
1. Using O(1) extra space, in O(n log n) time
This is possible, for instance:
first do an in-place O(n log n) sort
then walk through the list once, writing the first instance of every value back to the beginning of the list
I believe ejel's partner is correct that the best way to do this would be an in-place merge sort with a simplified merge step, and that this is probably the intent of the question, if you were e.g. writing a new library function to do this as efficiently as possible with no ability to improve the inputs. There would be cases where it would be useful to do so without a hash table, depending on the sorts of inputs. But I haven't actually checked this. A sketch of the sort-then-walk variant is below.
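A minimal sketch of that sort-then-walk approach, assuming order need not be preserved (the helper names are mine, not from the answer):

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* sort in place, then one pass keeps the first element of each run */
size_t sort_dedup(int *arr, size_t len)
{
    if (len < 2) return len;
    qsort(arr, len, sizeof *arr, cmp_int); /* O(n log n) */
    size_t dst = 0;
    for (size_t src = 1; src < len; src++) /* O(n) walk */
        if (arr[src] != arr[dst])
            arr[++dst] = arr[src];
    return dst + 1; /* number of unique values */
}

int main(void)
{
    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    size_t len = sort_dedup(arr, 7);
    for (size_t i = 0; i < len; i++) printf("%d ", arr[i]);
    printf("\n"); /* prints: 1 2 4 8 9 */
    return 0;
}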
2. Using O(lots) extra space, in O(n) time
declare a zero'd array big enough to hold all integers
walk through the array once
set the corresponding array element to 1 for each integer.
If it was already 1, skip that integer.
This only works if several questionable assumptions hold:
it's possible to zero memory cheaply, or the size of the ints are small compared to the number of them
you're happy to ask your OS for 256^sizeof(int) memory
and it will cache it for you really really efficiently if it's gigantic
It's a bad answer in general, but if you have LOTS of input elements and they're all 8-bit integers (or maybe even 16-bit integers), it could be the best way; see the sketch below.
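A hedged sketch for the 16-bit case (one bit per possible value, i.e. an 8 KB bit array; the names and sample data are illustrative):

#include <stdio.h>
#include <string.h>

size_t bitset_dedup(unsigned short *arr, size_t len)
{
    unsigned char seen[65536 / 8];          /* one bit per 16-bit value */
    memset(seen, 0, sizeof seen);

    size_t dst = 0;
    for (size_t src = 0; src < len; src++) {
        unsigned v = arr[src];
        if (!(seen[v / 8] & (1u << (v % 8)))) { /* first time we see v */
            seen[v / 8] |= 1u << (v % 8);
            arr[dst++] = arr[src];          /* keep it, preserving order */
        }
    }
    return dst;
}

int main(void)
{
    unsigned short arr[] = {4, 8, 4, 1, 1, 2, 9};
    size_t len = bitset_dedup(arr, 7);
    for (size_t i = 0; i < len; i++) printf("%u ", arr[i]);
    printf("\n"); /* prints: 4 8 1 2 9 */
    return 0;
}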
3. O(little)-ish extra space, O(n)-ish time
As #2, but use a hash table.
4. The clear way
If the number of elements is small, writing an appropriate algorithm is not useful if other code is quicker to write and quicker to read.
Eg. Walk through the array for each unique elements (ie. the first element, the second element (duplicates of the first having been removed) etc) removing all identical elements. O(1) extra space, O(n^2) time.
Eg. Use library functions which do this. efficiency depends which you have easily available.
Well, its basic implementation is quite simple: go through all elements, check whether there are duplicates among the remaining ones, and shift the rest over them.
It's terribly inefficient, and you could speed it up with a helper array for the output or with sorting/binary trees, but that doesn't seem to be allowed.
If you are allowed to use C++, a call to std::sort followed by a call to std::unique will give you the answer. The time complexity is O(N log N) for the sort and O(N) for the unique traversal.
And if C++ is off the table there isn't anything that keeps these same algorithms from being written in C.
You could do this in a single traversal, if you are willing to sacrifice memory. You can simply tally whether you have seen an integer or not in a hash/associative array. If you have already seen a number, remove it as you go, or better yet, move numbers you have not seen into a new array, avoiding any shifting in the original array.
In Perl:
foreach $i (@myary) {
    if(!defined $seen{$i}) {
        $seen{$i} = 1;
        push @newary, $i;
    }
}
The return value of the function should be the number of unique elements and they are all stored at the front of the array. Without this additional information, you won't even know if there were any duplicates.
Each iteration of the outer loop processes one element of the array. If it is unique, it stays in the front of the array and if it is a duplicate, it is overwritten by the last unprocessed element in the array. This solution runs in O(n^2) time.
#include <stdio.h>
#include <stdlib.h>

size_t rmdup(int *arr, size_t len)
{
    size_t prev = 0;
    size_t curr = 1;
    size_t last = len - 1;
    while (curr <= last) {
        for (prev = 0; prev < curr && arr[curr] != arr[prev]; ++prev);
        if (prev == curr) {
            ++curr;
        } else {
            arr[curr] = arr[last];
            --last;
        }
    }
    return curr;
}

void print_array(int *arr, size_t len)
{
    printf("{");
    size_t curr = 0;
    for (curr = 0; curr < len; ++curr) {
        if (curr > 0) printf(", ");
        printf("%d", arr[curr]);
    }
    printf("}");
}

int main()
{
    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    printf("Before: ");
    size_t len = sizeof (arr) / sizeof (arr[0]);
    print_array(arr, len);
    len = rmdup(arr, len);
    printf("\nAfter: ");
    print_array(arr, len);
    printf("\n");
    return 0;
}
Here is a Java Version.
int[] removeDuplicate(int[] input){
    int arrayLen = input.length;
    for(int i = 0; i < arrayLen; i++){
        for(int j = i + 1; j < arrayLen; j++){
            if((input[i] ^ input[j]) == 0){  // XOR is 0 iff the two values are equal
                input[j] = 0;                // mark the duplicate (assumes 0 is not a real value)
            }
            if(input[j] == 0 && j < arrayLen - 1){
                input[j] = input[j + 1];     // shift the next element over the marker
                input[j + 1] = 0;
            }
        }
    }
    return input;
}
Here is my solution.
///// find duplicates in an array and remove them
void unique(int* input, int n)
{
    merge_sort(input, 0, n); /* any O(n log n) sort; merge_sort is assumed defined elsewhere */

    /* after sorting, equal values are adjacent: keep the first of each run */
    int prev = 0;
    for(int i = 1; i < n; i++)
    {
        if(input[i] != input[prev])
            input[++prev] = input[i];
    }
    /* the unique values now occupy input[0..prev] */
}
An array should obviously be "traversed" right-to-left to avoid unnecessary copying of values back and forth.
If you have unlimited memory, you can allocate a bit array of 2^(8*sizeof(type-of-element-in-array)) / 8 bytes and have each bit signify whether you've already encountered the corresponding value.
If you don't, I can't think of anything better than traversing the array, comparing each value with the values that follow it, and, if a duplicate is found, removing those values altogether. This is somewhere near O(n^2) (or O((n^2-n)/2)).
IBM has an article on a closely related subject.
Let's see:
1. O(N) pass to find min/max
2. allocate a bit array for "found"
3. O(N) pass swapping duplicates to the end
This can be done in one pass with an O(N log N) algorithm and no extra storage.
Proceed from element a[1] to a[N]. At each stage i, all of the elements to the left of a[i] comprise a sorted heap of elements a[0] through a[j]. Meanwhile, a second index j, initially 0, keeps track of the size of the heap.
Examine a[i] and insert it into the heap, which now occupies elements a[0] to a[j+1]. As the element is inserted, if a duplicate element a[k] is encountered having the same value, do not insert a[i] into the heap (i.e., discard it); otherwise insert it into the heap, which now grows by one element and now comprises a[0] to a[j+1], and increment j.
Continue in this manner, incrementing i until all of the array elements have been examined and inserted into the heap, which ends up occupying a[0] to a[j]. j is the index of the last element of the heap, and the heap contains only unique element values.
int algorithm(int[] a, int n)
{
    int i, j;
    for (j = 0, i = 1; i < n; i++)
    {
        // Insert a[i] into the heap a[0...j]
        if (heapInsert(a, j, a[i]))
            j++;
    }
    return j;
}

bool heapInsert(a[], int n, int val)
{
    // Insert val into heap a[0...n]
    ...code omitted for brevity...
    if (duplicate element a[k] == val)
        return false;
    a[k] = val;
    return true;
}
Looking at the example, this is not exactly what was asked for since the resulting array preserves the original element order. But if this requirement is relaxed, the algorithm above should do the trick.
In Java I would solve it like this. I don't know how to write this in C.

int length = array.length;
for (int i = 0; i < length; i++)
{
    for (int j = i + 1; j < length; j++)
    {
        if (array[i] == array[j])
        {
            int k, l;
            for (k = j + 1, l = j; k < length; k++, l++)
            {
                if (array[k] != array[i])
                {
                    array[l] = array[k];
                }
                else
                {
                    l--;
                }
            }
            length = l;
        }
    }
}
How about the following?
/* 'array' and 'len' are assumed to be defined in the surrounding code */
int* temp = malloc(sizeof(int) * len);
int count = 0;
int x = 0;
int y = 0;

for(x = 0; x < len; x++)
{
    for(y = 0; y < count; y++)
    {
        if(*(temp + y) == *(array + x))
        {
            break;
        }
    }
    if(y == count)
    {
        *(temp + count) = *(array + x);
        count++;
    }
}
memcpy(array, temp, sizeof(int) * count); /* copy back only the unique values */
free(temp);
I declare a temp array, put the unique elements into it, and then copy them back to the original array.
After reviewing the problem, here is my Delphi way; it may help.

var
  A: Array of Integer;
  I, J, C, K, P: Integer;
begin
  C := 10;
  SetLength(A, 10);
  A[0]:=1; A[1]:=4; A[2]:=2; A[3]:=6; A[4]:=3; A[5]:=4;
  A[6]:=3; A[7]:=4; A[8]:=2; A[9]:=5;

  for I := 0 to C-1 do
  begin
    for J := I+1 to C-1 do
      if A[I] = A[J] then
      begin
        for K := C-1 downto J do
          if A[J] <> A[K] then
          begin
            P := A[K];
            A[K] := 0;
            A[J] := P;
            C := K;
            break;
          end
          else
          begin
            A[K] := 0;
            C := K;
          end;
      end;
  end;
  // truncate array
  SetLength(A, C);
end;
The following example should solve your problem (Python; input is assumed to be an existing list):

def check_dump(x):
    if not x in t:
        t.append(x)
        return True

t = []
output = list(filter(check_dump, input))
print(output)
import java.util.ArrayList;

public class C {
    public static void main(String[] args) {
        // assumes the input is sorted, so duplicates are adjacent;
        // 99999 is a marker value assumed absent from the data
        int arr[] = {2,5,5,5,9,11,11,23,34,34,34,45,45};
        ArrayList<Integer> arr1 = new ArrayList<Integer>();
        for(int i = 0; i < arr.length - 1; i++){
            if(arr[i] == arr[i+1]){
                arr[i] = 99999;
            }
        }
        for(int i = 0; i < arr.length; i++){
            if(arr[i] != 99999){
                arr1.add(arr[i]);
            }
        }
        System.out.println(arr1);
    }
}
This is the naive (N*(N-1)/2) solution. It uses constant additional space and maintains the original order. It is similar to the solution by #Byju, but uses no if(){} blocks. It also avoids copying an element onto itself.
#include <stdio.h>
#include <stdlib.h>

int numbers[] = {4, 8, 4, 1, 1, 2, 9};
#define COUNT (sizeof numbers / sizeof numbers[0])

size_t undup_it(int array[], size_t len)
{
    size_t src, dst;

    /* an array of size<2 cannot contain duplicate values */
    if (len < 2) return len;

    /* an array of size>1 contains at least one unique value */
    for (src = dst = 1; src < len; src++) {
        size_t cur;
        for (cur = 0; cur < dst; cur++) {
            if (array[cur] == array[src]) break;
        }
        if (cur != dst) continue; /* found a duplicate */

        /* array[src] must be new: add it to the list of non-duplicates */
        if (dst < src) array[dst] = array[src]; /* avoid copy-to-self */
        dst++;
    }
    return dst; /* number of valid elements in new array */
}

void print_it(int array[], size_t len)
{
    size_t idx;
    for (idx = 0; idx < len; idx++) {
        printf("%c %d", (idx) ? ',' : '{', array[idx]);
    }
    printf("}\n");
}

int main(void) {
    size_t cnt = COUNT;
    printf("Before undup:");
    print_it(numbers, cnt);
    cnt = undup_it(numbers, cnt);
    printf("After undup:");
    print_it(numbers, cnt);
    return 0;
}
This can be done in a single pass, in O(N) time in the number of integers in the input list, and O(N) storage in the number of unique integers.
Walk through the list from front to back, with two pointers "dst" and "src" initialized to the first item. Start with an empty hash table of "integers seen". If the integer at src is not present in the hash, write it to the slot at dst and increment dst. Add the integer at src to the hash, then increment src. Repeat until src passes the end of the input list. A sketch follows.
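Here is a hedged C sketch of that description, substituting a simple open-addressing hash set for the unspecified hash table; the sentinel value and hash constant are assumptions:

#include <stdio.h>
#include <stdlib.h>

#define EMPTY 0x7FFFFFFF /* assumed not to occur in the input */

/* return 1 if v was already in the table, else record it and return 0 */
static int seen_before(int *table, size_t cap, int v)
{
    size_t h = (size_t)(unsigned)v * 2654435761u % cap;
    while (table[h] != EMPTY) {
        if (table[h] == v) return 1;
        h = (h + 1) % cap; /* linear probing */
    }
    table[h] = v;
    return 0;
}

size_t dedup_hash(int *arr, size_t len)
{
    size_t cap = 2 * len + 1; /* keep the load factor low */
    int *table = malloc(cap * sizeof *table);
    for (size_t i = 0; i < cap; i++) table[i] = EMPTY;

    size_t dst = 0;
    for (size_t src = 0; src < len; src++)
        if (!seen_before(table, cap, arr[src]))
            arr[dst++] = arr[src]; /* keep first occurrence, in order */

    free(table);
    return dst;
}

int main(void)
{
    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    size_t len = dedup_hash(arr, 7);
    for (size_t i = 0; i < len; i++) printf("%d ", arr[i]);
    printf("\n"); /* prints: 4 8 1 2 9 */
    return 0;
}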
Insert all the elements into a binary tree that disregards duplicates: O(n log n). Then extract all of them back into the array by doing a traversal: O(n). I am assuming that you don't need order preservation.
Use a bloom filter for hashing. This will reduce the memory overhead very significantly; a sketch with its accuracy caveat follows.
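A minimal sketch of that bloom-filter idea (two illustrative hash functions over one bit array; note that false positives mean a unique value can be wrongly flagged as a duplicate, so this trades accuracy for memory):

#include <stdio.h>
#include <string.h>

#define BITS 1024 /* illustrative size; tune to the expected input */

static unsigned h1(unsigned v) { return v * 2654435761u % BITS; }
static unsigned h2(unsigned v) { return (v ^ 0x9E3779B9u) * 40503u % BITS; }

int main(void)
{
    unsigned char filter[BITS / 8];
    memset(filter, 0, sizeof filter);

    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    for (size_t i = 0; i < 7; i++) {
        unsigned a = h1(arr[i]), b = h2(arr[i]);
        /* both bits set => the value was probably seen before */
        int maybe_seen = (filter[a / 8] & (1u << (a % 8))) &&
                         (filter[b / 8] & (1u << (b % 8)));
        filter[a / 8] |= 1u << (a % 8);
        filter[b / 8] |= 1u << (b % 8);
        printf("%d %s\n", arr[i], maybe_seen ? "probably duplicate" : "new");
    }
    return 0;
}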
In Java:

Integer[] arrayInteger = {1,2,3,4,3,2,4,6,7,8,9,9,10};
String value = "";
for(Integer i : arrayInteger)
{
    // caveat: substring matching can misfire (e.g. "1" is contained in "12,"),
    // so this is only reliable when no value is a substring of another
    if(!value.contains(Integer.toString(i))){
        value += Integer.toString(i) + ",";
    }
}
String[] arraySplitToString = value.split(",");
Integer[] arrayIntResult = new Integer[arraySplitToString.length];
for(int i = 0; i < arraySplitToString.length; i++){
    arrayIntResult[i] = Integer.parseInt(arraySplitToString[i]);
}

Output:
{ 1, 2, 3, 4, 6, 7, 8, 9, 10 }
Hope this helps.
Create a BinarySearchTree that rejects duplicates; inserting all n elements takes O(n log n) on average.
First, you should create an array check[m], where m is larger than the biggest value in the array you want to make duplicate-free (the values are assumed to be positive, with 0 unused), and set every element of the check array to 1. Using a for loop, traverse the array with the duplicates, say its name is arr, and in the for loop write this:

{
    if (check[arr[i]] != 1) {
        arr[i] = 0;        /* seen before: mark the duplicate */
    }
    else {
        check[arr[i]] = 0; /* first occurrence: mark the value as seen */
    }
}

With that, you set every duplicate equal to zero. So the only thing left to do is to traverse the arr array and print everything that is not equal to zero. The order stays and it takes linear time (3*n).
Given an array of n elements, write an algorithm to remove all duplicates from the array in time O(n log n).

Algorithm delete_duplicates (a[1....n])
// Remove duplicates from the given array
// input parameters: a[1:n], an array of n elements
{
    temp[1:n]; // an array of n elements
    for i = 1 to n:
        temp[i].value = a[i]
        temp[i].key = i
    // based on 'value', sort the array temp
    // based on 'value', delete duplicate elements from temp
    // based on 'key', sort the array temp
    // construct an array p using temp
    p[i] = temp[i].value
    return p
}

The order of elements is maintained in the output array by using the 'key'. Considering the key is of length O(n), the time taken for sorting on the key and the value is O(n log n). So the time taken to delete all duplicates from the array is O(n log n).
This is what I've got, though it misplaces the order; we can sort in ascending or descending order to fix that.
#include <stdio.h>

int main(void){
    int x, n, myvar = 0;
    printf("Enter a number: \t");
    scanf("%d", &n);
    int arr[n], changedarr[n];

    for(x = 0; x < n; x++){
        printf("Enter a number for array[%d]: ", x);
        scanf("%d", &arr[x]);
    }
    printf("\nOriginal Number in an array\n");
    for(x = 0; x < n; x++){
        printf("%d\t", arr[x]);
    }

    // printf("i\tj\tarr\tchanged\n");
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            if (i == j)
            {
                continue;
            }
            else if (arr[i] == arr[j]){
                changedarr[j] = 0;
            }
            else{
                changedarr[i] = arr[i];
            }
            // printf("%d\t%d\t%d\t%d\n", i, j, arr[i], changedarr[i]);
        }
        myvar += 1;
    }
    // printf("\n\nmyvar=%d\n", myvar);

    int count = 0;
    printf("\nThe unique items:\n");
    for (int i = 0; i < myvar; i++)
    {
        if (changedarr[i] != 0){
            count += 1;
            printf("%d\t", changedarr[i]);
        }
    }
    printf("\n");
}
It'd be cool if you had a good DataStructure that could quickly tell if it contains an integer. Perhaps a tree of some sort.
DataStructure elementsSeen = new DataStructure();
int elementsRemoved = 0;
for(int i = 0; i < array.Length; i++){
    if(elementsSeen.Contains(array[i]))
        elementsRemoved++;
    else
        array[i - elementsRemoved] = array[i];
}
array.Length = array.Length - elementsRemoved;
