Finding minimum and maximum using recursion in C program - c

I need your help, please.
I need to write a recursive function that finds the minimum and the maximum of any array it receives.
I need to implement the function void minMax(int arr[], int left, int right, int min_max[]), but I don't know how to start. I would appreciate some ideas.
This is an example of what the output should look like:
min_max[1]=100 and min_max[0]=(-4)
For this input:
left = 2, right = 5, arr = [3,-1,3,100,2,-4,3], assumption: left <= right.
Thank you all.

I think what you are looking for is the tournament method:
Divide the array into two parts, find the maximum and minimum of each part recursively, then compare them to get the maximum and the minimum of the whole array.
(The repeated halving is similar to binary search.)
#include<stdio.h>
struct pair
{
int min;
int max;
};
struct pair getMinMax(int arr[], int low, int high)
{
struct pair minmax, mml, mmr;
int mid;
// If there is only one element
if (low == high)
{
minmax.max = arr[low];
minmax.min = arr[low];
return minmax;
}
/* If there are two elements */
if (high == low + 1)
{
if (arr[low] > arr[high])
{
minmax.max = arr[low];
minmax.min = arr[high];
}
else
{
minmax.max = arr[high];
minmax.min = arr[low];
}
return minmax;
}
/* If there are more than 2 elements */
mid = (low + high)/2;
mml = getMinMax(arr, low, mid);
mmr = getMinMax(arr, mid+1, high);
/* compare minimums of two parts*/
if (mml.min < mmr.min)
minmax.min = mml.min;
else
minmax.min = mmr.min;
/* compare maximums of two parts*/
if (mml.max > mmr.max)
minmax.max = mml.max;
else
minmax.max = mmr.max;
return minmax;
}
/* Driver program to test above function */
int main()
{
int arr[] = {1000, 11, 445, 1, 330, 3000};
int arr_size = 6;
struct pair minmax = getMinMax(arr, 0, arr_size-1);
printf("\nMinimum element is %d", minmax.min);
printf("\nMaximum element is %d", minmax.max);
getchar();
}
source : https://www.geeksforgeeks.org/maximum-and-minimum-in-an-array/ (Method 2)

First of all, this question is poorly written. To get better answers you should put more work into your questions; otherwise it is harder for other people to help you, and it also looks as if you haven't done enough prior research.
As for the content of your question, I would solve this with pseudocode first (as with any other "hard" problem). I suppose this is homework, because you already have the function signature.
The left and right parameters aren't needed, or I am not understanding what they are used for.
The pseudocode function could look like this:
function minMax(arr, min_max)
    if arr is empty
        return
    elem = extract last elem from arr
    if min_max[0] is empty or elem < min_max[0]
        min_max[0] = elem
    if min_max[1] is empty or elem > min_max[1]
        min_max[1] = elem
    minMax(arr, min_max)
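For the exact signature given in the question, a minimal sketch might look like the following. It assumes that left and right are inclusive bounds of the range to scan (as in the question's example, left = 2 and right = 5), and that min_max[0] receives the minimum and min_max[1] the maximum. It halves the range as in the tournament method instead of removing one element per call.

```c
#include <assert.h>

/* Writes the minimum of arr[left..right] to min_max[0]
 * and the maximum to min_max[1]. Both bounds are inclusive. */
void minMax(int arr[], int left, int right, int min_max[])
{
    assert(left <= right);              /* precondition stated in the question */

    if (left == right) {                /* one element: it is both min and max */
        min_max[0] = arr[left];
        min_max[1] = arr[left];
        return;
    }

    int mid = left + (right - left) / 2;
    int right_half[2];

    minMax(arr, left, mid, min_max);          /* result of the left half */
    minMax(arr, mid + 1, right, right_half);  /* result of the right half */

    /* combine the two halves */
    if (right_half[0] < min_max[0])
        min_max[0] = right_half[0];
    if (right_half[1] > min_max[1])
        min_max[1] = right_half[1];
}
```

With arr = {3, -1, 3, 100, 2, -4, 3}, calling minMax(arr, 2, 5, min_max) leaves min_max[0] == -4 and min_max[1] == 100, which matches the expected output in the question.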

Related

Binary Search Using Recursive Function in C

So I'm trying to write a binary search function that uses recursion and keep getting a segmentation fault if I go past two values in the array. I've looked at a bunch of other code doing what I'm trying to do and as far as I can see they appear to do the same thing. I'm a very novice programmer and feel like I'm banging my head against the wall with this. Any help would be appreciated.
int search(int value, int array[], int start, int end)
{
//Define new variables for use in recursion
int sizeArray, middleOfArray;
//Get size of array
sizeArray = (end - start) + 1;
//Find midpoint of array based off size
middleOfArray = sizeArray / 2;
//Base Case 1, if array unscannable, return -1
if (start > end) {
return -1;
}
//Recursive Cases
else
{
//If midpoint in array is > target value,
//Search from beginning of array->one below midpoint
if (array[middleOfArray] > value){
return search(value, array, start, middleOfArray - 1);
}
//If midpoint in array is < target value,
//search from one above midpoint->end of array
else if (array[middleOfArray] < value) {
return search(value, array, middleOfArray + 1, end);
}
//If none of the other cases are satisfied, value=midpoint
//Return midpoint
else {
return middleOfArray;
}
}
}
The problem is here:
middleOfArray = sizeArray / 2;
It should be like this:
middleOfArray = start + sizeArray / 2;
You can also compute the middle of the array like this, which saves you the extra variable sizeArray:
middleOfArray = (start + end) / 2;
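Putting the fix together, a corrected version of the function could look like the sketch below. It assumes the array is sorted in ascending order and that start and end are inclusive indices. The midpoint is written as start + (end - start) / 2, which gives the same result as (start + end) / 2 but cannot overflow for large indices.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of value in array[start..end], or -1 if it is absent. */
int search(int value, const int array[], int start, int end)
{
    assert(array != NULL);

    if (start > end)                    /* empty range: not found */
        return -1;

    /* midpoint relative to start, not to the beginning of the array */
    int middle = start + (end - start) / 2;

    if (array[middle] > value)
        return search(value, array, start, middle - 1);
    if (array[middle] < value)
        return search(value, array, middle + 1, end);
    return middle;
}
```

For a six-element array the initial call would be search(value, array, 0, 5).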

Max in array and its frequency

How do you write a function that finds max value in an array as well as the number of times the value appears in the array?
We have to use recursion to solve this problem.
So far I am thinking it should be something like this:
int findMax(int[] a, int head, int last)
{
int max = 0;
if (head == last) {
return a[head];
}
else if (a[head] < a[last]) {
count ++;
return findMax(a, head + 1, last);
}
}
I am not sure if this will return the absolute highest value, though, and I'm not exactly sure how to change what I have.
Setting the initial value of max to INT_MIN solves a number of issues. @Rerito
But the approach OP uses iterates through each member of the array and incurs a recursive call for each element. So if the array had 1000 int there would be about 1000 nested calls.
A divide and conquer approach:
If the array length is 0 or 1, handle it directly. Otherwise find the max answer for the first and second halves and combine the results as appropriate. Because each step halves the length, the stack depth for a 1000-element array stays at about 10 nested calls (roughly log2(1000)).
Note: In either approach, the number of calls is of the same order (proportional to the length). The difference lies in the maximum degree of nesting. Using recursion where a simple for() loop would suffice is questionable. Recursion's strength is conquering more complex problems, hence this approach.
To find the max and its frequency using O(log2(length)) stack depth:
#include <limits.h> // for INT_MIN
#include <stddef.h>
typedef struct {
int value;
size_t frequency; // `size_t` is better to use than `int` for large arrays.
} value_freq;
value_freq findMax(const int *a, size_t length) {
value_freq vf;
if (length <= 1) {
if (length == 0) {
vf.value = INT_MIN; // Degenerate value if the array was size 0.
vf.frequency = 0;
} else {
vf.value = *a;
vf.frequency = 1;
}
} else {
size_t length1sthalf = length / 2;
vf = findMax(a, length1sthalf);
value_freq vf1 = findMax(&a[length1sthalf], length - length1sthalf);
if (vf1.value > vf.value)
return vf1;
if (vf.value == vf1.value)
vf.frequency += vf1.frequency;
}
return vf;
}
You are not that far off.
In order to save the frequency and the max you can keep a pointer to a structure, then just pass the pointer to the start of your array, the length you want to go through, and a pointer to this struct.
Keep in mind that you should use INT_MIN in limits.h as your initial max (see reset(maxfreq *) in the code below), as int can carry negative values.
The following code does the job recursively:
#include <limits.h>
typedef struct {
int max;
int freq;
} maxfreq;
void reset(maxfreq *mfreq){
mfreq->max = INT_MIN;
mfreq->freq = 0;
}
void findMax(int* a, int length, maxfreq *mfreq){
if(length>0){
if(*a == mfreq->max)
mfreq->freq++;
else if(*a > mfreq->max){
mfreq->freq = 1;
mfreq->max = *a;
}
findMax(a+1, length - 1, mfreq);
}
}
findMax keeps calling itself until length reaches 0, so it is invoked length + 1 times in total, each time advancing the provided pointer and processing the corresponding element. This basically just goes through all of the elements of a once, with no splitting.
This works fine for me:
#include <stdio.h>
#include <string.h>
// define a struct that contains the (max, freq) information
struct arrInfo
{
int max;
int count;
};
struct arrInfo maxArr(int * arr, int max, int size, int count)
{
int maxF;
struct arrInfo myArr;
if(size == 0) // to return from recursion we check the size left
{
myArr.max = max; // prepare the struct to output
myArr.count = count;
return(myArr);
}
if(*arr > max) // new maximum found
{
maxF = *arr; // update the max
count = 1; // initialize the frequency
}
else if (*arr == max) // same max encountered another time
{
maxF = max; // keep track of same max
count ++; // increase frequency
}
else // nothing changes
maxF = max; // keep track of max
arr++; // move the pointer to next element
size --; // decrease size by 1
return(maxArr(arr, maxF, size, count)); // recursion
}
int main()
{
struct arrInfo info; // return of the recursive function
// define an array
int arr[] = {8, 4, 8, 3, 7};
info = maxArr(arr, 0, 5, 1); // call with max=0 size=5 freq=1
printf("max = %d count = %d\n", info.max, info.count);
return 0;
}
When run, it outputs:
max = 8 count = 3
Note
In my code example I assumed the numbers to be positive (initializing max to 0). I don't know your requirements, so adapt this as needed.
The requirements in your assignment are at least questionable. Just for reference, here is how this should be done in real code (to solve your assignment, refer to the other answers):
#include <limits.h> // for INT_MIN
int findMax(int length, int* array, int* maxCount) {
int trash;
if(!maxCount) maxCount = &trash; //make sure we ignore it when a NULL pointer is passed in
*maxCount = 0;
int result = INT_MIN;
for(int i = 0; i < length; i++) {
if(array[i] > result) {
*maxCount = 1;
result = array[i];
} else if(array[i] == result) {
(*maxCount)++;
}
}
return result;
}
Always do things as straightforwardly as you can.

Sorting an array of coordinates by their distance from origin

The code should take an array of coordinates from the user, then sort that array, putting the coordinates in order of their distance from the origin. I believe my problem lies in the sorting function (I have used a quicksort).
I am trying to write the function myself to get a better understanding of it, which is why I'm not using qsort().
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define MAX_SIZE 64
typedef struct
{
double x, y;
}POINT;
double distance(POINT p1, POINT p2);
void sortpoints(double distances[MAX_SIZE], int firstindex, int lastindex, POINT data[MAX_SIZE]);
void printpoints(POINT data[], int n_points);
int main()
{
int n_points, i;
POINT data[MAX_SIZE], origin = { 0, 0 };
double distances[MAX_SIZE];
printf("How many values would you like to enter?\n");
scanf("%d", &n_points);
printf("enter your coordinates\n");
for (i = 0; i < n_points; i++)
{
scanf("%lf %lf", &data[i].x, &data[i].y);
distances[i] = distance(data[i], origin); //data and distances is linked by their index number in both arrays
}
sortpoints(distances, 0, i, data);
return 0;
}
double distance(POINT p1, POINT p2)
{
return sqrt(pow((p1.x - p2.x), 2) + pow((p1.y - p2.y), 2));
}
void printpoints(POINT *data, int n_points)
{
int i;
printf("Sorted points (according to distance from the origin):\n");
for (i = 0; i < n_points; i++)
{
printf("%.2lf %.2lf\n", data[i].x, data[i].y);
}
}
//quicksort
void sortpoints(double distances[MAX_SIZE], int firstindex, int lastindex, POINT data[MAX_SIZE])
{
int indexleft = firstindex;
int indexright = lastindex;
int indexpivot = (int)((lastindex + 1) / 2);
int n_points = lastindex + 1;
double left = distances[indexleft];
double right = distances[indexright];
double pivot = distances[indexpivot];
POINT temp;
if (firstindex < lastindex) //this will halt the recursion of the sorting function once all the arrays are 1-size
{
while (indexleft < indexpivot || indexright > indexpivot) //this will stop the sorting once both selectors reach the pivot position
{
//reset the values of left and right for the iterations of this loop
left = distances[indexleft];
right = distances[indexright];
while (left < pivot)
{
indexleft++;
left = distances[indexleft];
}
while (right > pivot)
{
indexright--;
right = distances[indexright];
}
distances[indexright] = left;
distances[indexleft] = right;
temp = data[indexleft];
data[indexleft] = data[indexright];
data[indexright] = temp;
}
//recursive sorting to sort the sublists
sortpoints(distances, firstindex, indexpivot - 1, data);
sortpoints(distances, indexpivot + 1, lastindex, data);
}
printpoints(data, n_points);
}
Thanks for your help, I have been trying to debug this for hours, even using a debugger.
Ouch! You call sortpoints() with i as argument. That argument, according to your prototype and code, should be the last index, and i is not the last index, but the last index + 1.
int indexleft = firstindex;
int indexright = lastindex; // indexright is pointing to a non-existent element.
int indexpivot = (int)((lastindex + 1) / 2);
int n_points = lastindex + 1;
double left = distances[indexleft];
double right = distances[indexright]; // now right is an undefined value, or segfault.
To fix that, call your sortpoints() function as:
sortpoints(distances, 0, n_points - 1, data);
The problem is in your sortpoints function: the first while loop loops forever. To check whether it really is an infinite loop, place a printf statement
printf("Testing first while loop\n");
in your first while loop. You have to fix that.
There are quite a number of problems, but one of them is:
int indexpivot = (int)((lastindex + 1) / 2);
The cast is unnecessary, but that's trivia. Much more fundamental is that if you are sorting a segment from, say, 48..63, you will be pivoting on element 32, which is not in the range you are supposed to be working on. You need to use:
int indexpivot = (lastindex + firstindex) / 2;
or perhaps:
int indexpivot = (lastindex + firstindex + 1) / 2;
For the example range, these will pivot on element 55 or 56, which is at least within the range.
I strongly recommend:
Creating a print function similar to printpoints() but with the following differences:
Takes a 'tag' string to identify what it is printing.
Takes and prints the distance array too.
Takes the arrays and a pair of offsets.
Use this function inside the sort function before recursing.
Use this function inside the sort function before returning.
Use this function in the main function after you've read the data.
Use this function in the main function after the data is sorted.
Print key values — the pivot distance, the pivot index, at appropriate points.
This allows you to check that your partitioning is working correctly (it isn't at the moment).
Then, when you've got the code working, you can remove or disable (comment out) the printing code in the sort function.
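For comparison once the debugging output is in place, here is a minimal sketch of a working recursive quicksort for this problem. It is not the original poster's code: it drops the separate distances array and compares squared distances computed on the fly (which gives the same ordering as the distances themselves and avoids sqrt), and the helper names dist2 and swap_points are made up for this example. The pivot is taken from the middle of the current segment, as recommended above.

```c
#include <assert.h>

typedef struct
{
    double x, y;
} POINT;

/* Squared distance from the origin; enough for ordering. */
static double dist2(POINT p)
{
    return p.x * p.x + p.y * p.y;
}

static void swap_points(POINT *a, POINT *b)
{
    POINT temp = *a;
    *a = *b;
    *b = temp;
}

/* Sorts data[firstindex..lastindex] (inclusive) by distance from the origin. */
void sortpoints(POINT data[], int firstindex, int lastindex)
{
    assert(data != 0);
    if (firstindex >= lastindex)        /* 0 or 1 element: already sorted */
        return;

    /* pivot value from the middle of the current segment */
    double pivot = dist2(data[firstindex + (lastindex - firstindex) / 2]);
    int i = firstindex;
    int j = lastindex;

    while (i <= j) {
        while (dist2(data[i]) < pivot) i++;
        while (dist2(data[j]) > pivot) j--;
        if (i <= j) {
            swap_points(&data[i], &data[j]);
            i++;
            j--;
        }
    }
    /* now everything in [firstindex..j] <= everything in [i..lastindex] */
    sortpoints(data, firstindex, j);
    sortpoints(data, i, lastindex);
}
```

From main it would be called as sortpoints(data, 0, n_points - 1), and printpoints would be called once after the sort returns rather than inside the recursive function.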

given an array, for each element, find out the total number of elements lesser than it, which appear to the right of it

I had previously posted a question, "Given an array, find out the next smaller element for each element".
Now I would like to know whether there is any way to solve this: "given an array, for each element, find out the total number of elements lesser than it, which appear to the right of it".
For example, the array [4 2 1 5 3] should yield [3 1 0 1 0].
[EDIT]
I have worked out a solution; please have a look at it and let me know if there is any mistake.
1 Make a balanced BST, inserting elements while traversing the array from right to left.
2 The BST is built so that each node holds the size of the subtree rooted at that node.
3 While searching for the right position to insert an element, whenever you move right, add the size of the left subtree of the node you are leaving plus 1 (for that node itself).
Since the count is calculated at the time of insertion, and we are moving from right to left, we get the exact count of elements lesser than the given element appearing after it.
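The three steps above can be sketched in C as follows. This is a simplified illustration, not a definitive implementation: the tree is a plain unbalanced BST, so it is O(n log n) only on average and degrades to O(n^2) on sorted input, whereas the description asks for a balanced tree. The names node, insert and count_smaller_right are chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int key;
    int size;                    /* number of nodes in the subtree rooted here */
    struct node *left, *right;
} node;

static int subtree_size(const node *n)
{
    return n ? n->size : 0;
}

/* Inserts key and adds to *smaller the number of keys already in the
 * tree that are strictly less than key. */
static node *insert(node *root, int key, int *smaller)
{
    if (root == NULL) {
        node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->size = 1;
        n->left = n->right = NULL;
        return n;
    }
    root->size++;                /* the new key ends up below this node */
    if (key <= root->key) {
        root->left = insert(root->left, key, smaller);
    } else {
        /* moving right: this node and its whole left subtree are smaller */
        *smaller += subtree_size(root->left) + 1;
        root->right = insert(root->right, key, smaller);
    }
    return root;
}

static void free_tree(node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* out[i] = number of elements to the right of a[i] that are less than a[i]. */
void count_smaller_right(const int a[], int n, int out[])
{
    node *root = NULL;
    for (int i = n - 1; i >= 0; i--) {   /* right to left */
        out[i] = 0;
        root = insert(root, a[i], &out[i]);
    }
    free_tree(root);
}
```

For the array [4 2 1 5 3] from the question this fills out with [3 1 0 1 0]. Equal keys are sent to the left, so duplicates are not counted as smaller.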
It can be solved in O(n log n).
If each BST node stores the number of elements in the subtree rooted at that node, then while searching for a node (reaching it from the root) you can count the number of elements larger or smaller than it along the path:
int count_larger(node *T, int key, int current_larger){
    if (T == NULL)
        return -1; // key not found
    // a missing right child contributes 0 elements
    int right_size = T->right_child ? T->right_child->size : 0;
    if (T->key == key)
        return current_larger + right_size;
    if (T->key > key)
        return count_larger(T->left_child, key, current_larger + right_size + 1);
    return count_larger(T->right_child, key, current_larger);
}
For example, if this is our tree and we're searching for key 3, count_larger will be called for:
-> (node 2, 3, 0)
--> (node 4, 3, 0)
---> (node 3, 3, 2)
and the final answer would be 2 as expected.
Suppose the array is 6,-1,5,10,12,4,1,3,7,50
Steps
1. We start building a BST from the right end of the array, since for any element we are concerned with all the elements to its right.
2. Suppose we have formed the partial solution tree up to the 10.
3. Now when inserting 5 we do a tree traversal and insert it to the right of 4.
Notice that each time we traverse to the right of any node we increment by 1 and add the number of elements in the left subtree of that node.
e.g.:
for 50 it is 0
for 7 it is 0
for 12 it is 1 right traversal + left subtree size of 7 = 1+3 = 4
for 10 same as above.
for 4 it is 1+1 = 2
While building the BST we can easily maintain the left subtree size for each node by keeping a variable for it and incrementing it by 1 each time a new node passes to its left.
Hence the solution is O(n log n) in the average case.
We can use other optimizations, such as checking in advance whether the array is sorted in decreasing order,
or finding groups of elements in decreasing order and treating them as a single unit.
I think it is possible to do it in O(n log(n)) with a modified version of quicksort. Basically, each time you add an element to less, you check whether this element's rank in the original array was greater than the rank of the current pivot. It may look like:
oldrank -> original positions
count -> what you want
function quicksort('array')
if length('array') ≤ 1
return 'array' // an array of zero or one elements is already sorted
select and remove a pivot value 'pivot' from 'array'
create empty lists 'less' and 'greater'
for each 'x' in 'array'
if 'x' ≤ 'pivot'
append 'x' to 'less'
if oldrank(x) > = oldrank(pivot) increment count(pivot)
else
append 'x' to 'greater'
if oldrank(x) < oldrank(pivot) increment count(x) //This was missing
return concatenate(quicksort('less'), 'pivot', quicksort('greater')) // two recursive calls
EDIT:
Actually it can be done using any comparison based sorting algorithm . Every time you compare two elements such that the relative ordering between the two will change, you increment the counter of the bigger element.
Original pseudo-code in wikipedia.
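You can also use a binary indexed tree (Fenwick tree). The code below is C++ and assumes the values are integers between 0 and 1000000:
#include <iostream>
using namespace std;
int tree[1000005];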
void update(int idx,int val)
{
while(idx<=1000000)
{
tree[idx]+=val;
idx+=(idx & -idx);
}
}
int sum(int idx)
{
int sm=0;
while(idx>0)
{
sm+=tree[idx];
idx-=(idx & -idx);
}
return sm;
}
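int main()
{
int a[]={4,2,1,5,3};
int s=0,sz=6; // sz is the array length plus one; s counts the zeros seen so far
int b[10];
b[sz-1]=0;
for(int i=sz-2;i>=0;i--) // scan from right to left
{
if(a[i]!=0)
{
update(a[i],1);
b[i]=sum(a[i]-1)+s; // values in 1..a[i]-1 seen so far, plus the zeros
}
else
{
s++; // index 0 cannot be stored in the tree, so zeros are counted separately
b[i]=0; // no non-negative value is smaller than 0
}
}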
for(int i=0;i<sz-1;i++)
{
cout<<b[i]<<" ";
}
return 0;
}
//some array called newarray
for(int x=0; x < array.length;x++)
{
for(int y=x;y<array.length;y++)
{
if(array[y] < array[x])
{
newarray[x] = newarray[x]+1;
}
}
}
Something like this, where array is your input array and newarray is your output array.
Make sure to initialize everything correctly (0 for newarray's values).
Another approach without using a tree:
Construct another, sorted array. For example, for the input array {12, 1, 2, 3, 0, 11, 4} it will be {0, 1, 2, 3, 4, 11, 12}.
Now compare the position of each element from the input array with its position in the sorted array. For example, 12 is at index 0 in the input array while in the sorted array it is at index 6, so 6 elements to its right are smaller.
Once the comparison is done, remove the element from both arrays, so that the sorted array only holds elements that are still to the right.
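A minimal sketch of this sorted-copy idea in C, under a few assumptions of my own: the input array is left untouched and only the sorted copy shrinks, the position is found with a binary search, and the removal uses memmove. Because each removal shifts elements, this is O(n^2) in the worst case even though every lookup is O(log n). The function name is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* out[i] = number of elements to the right of a[i] that are less than a[i]. */
void count_smaller_by_sorted_copy(const int a[], int n, int out[])
{
    if (n <= 0)
        return;

    int *sorted = malloc((size_t)n * sizeof *sorted);
    assert(sorted != NULL);
    memcpy(sorted, a, (size_t)n * sizeof *sorted);
    qsort(sorted, (size_t)n, sizeof *sorted, cmp_int);

    int remaining = n;           /* sorted[0..remaining) holds a[i..n-1] */
    for (int i = 0; i < n; i++) {
        /* first position whose value is >= a[i] */
        int lo = 0, hi = remaining;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] < a[i])
                lo = mid + 1;
            else
                hi = mid;
        }
        out[i] = lo;             /* everything before lo is smaller than a[i] */

        /* remove a[i] from the sorted copy */
        memmove(&sorted[lo], &sorted[lo + 1],
                (size_t)(remaining - lo - 1) * sizeof *sorted);
        remaining--;
    }
    free(sorted);
}
```

For the input {12, 1, 2, 3, 0, 11, 4} this yields {6, 1, 1, 1, 0, 1, 0}.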
Other than using a BST, we can also solve this problem optimally with a modified merge sort (in O(n*logn) time).
If you look at this problem more carefully, you can see that for each element we need to count the inversions in which it is the larger, left-hand element, right?
So this problem can be solved using the divide and conquer paradigm. Here you need to maintain an auxiliary array for storing the inversion counts (i.e. the number of elements smaller than it on its right side).
Below is a Python program:
def mergeList(arr, pos, res, start, mid, end):
    # pos holds indices into arr; temp keeps the pre-merge order of this range
    temp = [0]*len(arr)
    for i in range(start, end+1):
        temp[i] = pos[i]
    cur = start
    leftcur = start
    rightcur = mid + 1
    while leftcur <= mid and rightcur <= end:
        if arr[temp[leftcur]] <= arr[temp[rightcur]]:
            pos[cur] = temp[leftcur]
            # right-half elements already merged are smaller than this one
            res[pos[cur]] += rightcur - mid - 1
            leftcur += 1
            cur += 1
        else:
            pos[cur] = temp[rightcur]
            cur += 1
            rightcur += 1
    while leftcur <= mid:
        pos[cur] = temp[leftcur]
        # the whole right half is smaller than the remaining left elements
        res[pos[cur]] += end - mid
        cur += 1
        leftcur += 1
    while rightcur <= end:
        pos[cur] = temp[rightcur]
        cur += 1
        rightcur += 1

def mergeSort(arr, pos, res, start, end):
    if start < end:
        mid = (start + end)//2
        mergeSort(arr, pos, res, start, mid)
        mergeSort(arr, pos, res, mid+1, end)
        mergeList(arr, pos, res, start, mid, end)

def printResult(arr, res):
    print()
    for i in range(0, len(arr)):
        print(arr[i], '->', res[i])

if __name__ == '__main__':
    # Python 3: parse the comma-separated input into a list of ints
    inp = [int(x) for x in input('enter elements separated by ,\n').split(',')]
    res = [0]*len(inp)
    pos = [ind for ind, v in enumerate(inp)]
    mergeSort(inp, pos, res, 0, len(inp)-1)
    printResult(inp, res)
Time : O(n*logn)
Space: O(n)
You can also use an array instead of a binary search tree.
def count_next_smaller_elements(xs):
    # prepare list "zs" containing each item's rank among the distinct values
    ys = sorted((x,i) for i,x in enumerate(xs))
    zs = [0] * len(ys)
    for i in range(1, len(ys)):
        zs[ys[i][1]] = zs[ys[i-1][1]]
        if ys[i][0] != ys[i-1][0]: zs[ys[i][1]] += 1
    # use list "ts" as a binary indexed (Fenwick) tree that counts
    # the ranks seen so far while scanning from the right
    ts = [0] * (zs[ys[-1][1]]+1)
    us = [0] * len(xs)
    for i in range(len(xs)-1,-1,-1):
        # query: how many seen elements have a strictly smaller rank
        x = zs[i]
        while x > 0:
            us[i] += ts[x-1]
            x -= (x & (-x))
        # update: record the current element's own rank
        x = zs[i]+1
        while x <= len(ts):
            ts[x-1] += 1
            x += (x & (-x))
    return us
print(count_next_smaller_elements([40, 20, 10, 50, 20, 40, 30]))
# outputs: [4, 1, 0, 3, 0, 1, 0]
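Instead of a BST, you can use an STL map.
Start inserting from the right.
After inserting an element, find its iterator:
auto i = m.find(element);
Then measure the distance from it to m.end(). std::map iterators are bidirectional, so they cannot be subtracted directly; std::distance walks the range in linear time, which makes the whole loop O(n^2) in the worst case. That distance minus one (for the element itself) is the number of distinct keys in the map that are greater than the current element; std::distance(m.begin(), iter) gives the number of smaller ones.
map<int, bool> m;
for (int i = array.size() - 1; i >= 0; --i) {
    m[array[i]] = true;
    auto iter = m.find(array[i]);
    greaterThan[i] = std::distance(iter, m.end()) - 1; // std::distance is declared in <iterator>
}
Hope it helped.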
Hope it helped.
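Modified merge sort (already tested code):
Takes O(n log n) time. The counts are keyed by value in a HashMap, so the array values are assumed to be distinct.
import java.util.Arrays;
import java.util.HashMap;
import org.junit.Assert; // assumption: Assert here is JUnit 4's org.junit.Assert
public class MergeSort {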
static HashMap<Integer, Integer> valueToLowerCount = new HashMap<Integer, Integer>();
public static void main(String[] args) {
int [] arr = new int[] {50, 33, 37, 26, 58, 36, 59};
int [] lowerValuesOnRight = new int[] {4, 1, 2, 0, 1, 0, 0};
HashMap<Integer, Integer> expectedLowerCounts = new HashMap<Integer, Integer>();
int idx = 0;
for (int x: arr) {
expectedLowerCounts.put(x, lowerValuesOnRight[idx++]);
}
for (int x : arr) valueToLowerCount.put(x, 0);
mergeSort(arr, 0, arr.length-1);
//Testing
Assert.assertEquals("Count lower values on right side", expectedLowerCounts, valueToLowerCount);
}
public static void mergeSort(int []arr, int l, int r) {
if (r <= l) return;
int mid = (l+r)/2;
mergeSort(arr, l, mid);
mergeSort(arr, mid+1, r);
mergeDecreasingOrder(arr, l, mid, r);
}
public static void mergeDecreasingOrder(int []arr, int l, int lr, int r) {
int []leftArr = Arrays.copyOfRange(arr, l, lr+1);
int []rightArr = Arrays.copyOfRange(arr, lr+1, r+1);
int indexArr = l;
int i = 0, j = 0;
while (i < leftArr.length && j < rightArr.length) {
if (leftArr[i] > rightArr[j]) {
valueToLowerCount.put(leftArr[i], valueToLowerCount.get(leftArr[i]) + rightArr.length - j);
arr[indexArr++] = leftArr[i++];
}else {
arr[indexArr++] = rightArr[j++];
}
}
while (i < leftArr.length) {
arr[indexArr++] = leftArr[i++];
}
while (j < rightArr.length) {
arr[indexArr++] = rightArr[j++];
}
}
}
To find the total number of values on right-side which are greater than an array element, simply change single line of code:
if (leftArr[i] > rightArr[j])
to
if (leftArr[i] < rightArr[j])

Removing Duplicates in an array in C

The question is a little complex. The problem is to get rid of duplicates and save the unique elements of an array into another array, in their original order.
For example:
If the input is b a c a d t
the result should be b a c d t, in exactly the order in which the input was entered.
So sorting the array and then checking doesn't work, since I lose the original sequence. I was advised to use an array of indices, but I don't know how to do that. What do you advise?
For those who are willing to answer the question, I want to add some specific information.
char** finduni(char *words[100],int limit)
{
//
//Methods here
//
}
is my function. The array whose duplicates should be removed and stored in a different array is words[100], so the processing will be done on it. I first thought about copying all the elements of words into another array and sorting that array, but that didn't work in my tests. Just a reminder for solvers :).
Well, here is a version for char types. Note it doesn't scale.
#include <stdio.h>
#include <string.h>
void removeDuplicates(char *string)
{
unsigned char allCharacters [256] = { 0 };
size_t lookAt;
size_t writeTo = 0;
size_t length = strlen(string); // computed once, before the string is compacted
for(lookAt = 0; lookAt < length; lookAt++)
{
unsigned char c = (unsigned char)string[lookAt]; // array index must not be negative
if(allCharacters[c] == 0)
{
allCharacters[c] = 1; // mark it seen
string[writeTo++] = string[lookAt]; // copy it
}
}
string[writeTo] = '\0';
}
int main()
{
char word[] = "abbbcdefbbbghasdddaiouasdf";
removeDuplicates(word);
printf("Word is now [%s]\n", word);
return 0;
}
The following is the output:
Word is now [abcdefghsiou]
Is that something like what you want? You can modify the method if there are spaces between the letters, but if you use int, float, double or char * as the types, this method won't scale at all.
EDIT
I posted and then saw your clarification that it's an array of char *. I'll update the method.
I hope this isn't too much code. I adapted this QuickSort algorithm and basically added index memory to it. The algorithm is O(n log n) on average, as the 3 steps below are additive and that is the average-case complexity of the 2 sorting steps (quicksort can degrade to O(n^2) in the worst case).
1) Sort the array of strings, but every swap should be reflected in the index array as well. After this stage, the i'th element of originalIndices holds the original index of the i'th element of the sorted array.
2) Remove duplicate elements in the sorted array by setting them to NULL, and setting the index value to elements, which is the highest any can be.
3) Sort the array of original indices, and make sure every swap is reflected in the array of strings. This gives us back the original array of strings, except the duplicates are at the end and they are all NULL.
For good measure, I return the new count of elements.
Code:
#include "stdio.h"
#include "string.h"
#include "stdlib.h"
void sortArrayAndSetCriteria(char **arr, int elements, int *originalIndices)
{
#define MAX_LEVELS 1000
char *piv;
int beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R;
int idx, cidx;
for(idx = 0; idx < elements; idx++)
originalIndices[idx] = idx;
beg[0] = 0;
end[0] = elements;
while (i>=0)
{
L = beg[i];
R = end[i] - 1;
if (L<R)
{
piv = arr[L];
cidx = originalIndices[L];
if (i==MAX_LEVELS-1)
return;
while (L < R)
{
while (strcmp(arr[R], piv) >= 0 && L < R) R--;
if (L < R)
{
arr[L] = arr[R];
originalIndices[L++] = originalIndices[R];
}
while (strcmp(arr[L], piv) <= 0 && L < R) L++;
if (L < R)
{
arr[R] = arr[L];
originalIndices[R--] = originalIndices[L];
}
}
arr[L] = piv;
originalIndices[L] = cidx;
beg[i + 1] = L + 1;
end[i + 1] = end[i];
end[i++] = L;
}
else
{
i--;
}
}
}
int removeDuplicatesFromBoth(char **arr, int elements, int *originalIndices)
{
// now remove duplicates
int i = 1, newLimit = 1;
char *curr = arr[0];
while (i < elements)
{
if(strcmp(curr, arr[i]) == 0)
{
arr[i] = NULL; // free this if it was malloc'd
originalIndices[i] = elements; // place it at the end
}
else
{
curr = arr[i];
newLimit++;
}
i++;
}
return newLimit;
}
void sortArrayBasedOnCriteria(char **arr, int elements, int *originalIndices)
{
#define MAX_LEVELS 1000
int piv;
int beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R;
int idx;
char *cidx;
beg[0] = 0;
end[0] = elements;
while (i>=0)
{
L = beg[i];
R = end[i] - 1;
if (L<R)
{
piv = originalIndices[L];
cidx = arr[L];
if (i==MAX_LEVELS-1)
return;
while (L < R)
{
while (originalIndices[R] >= piv && L < R) R--;
if (L < R)
{
arr[L] = arr[R];
originalIndices[L++] = originalIndices[R];
}
while (originalIndices[L] <= piv && L < R) L++;
if (L < R)
{
arr[R] = arr[L];
originalIndices[R--] = originalIndices[L];
}
}
arr[L] = cidx;
originalIndices[L] = piv;
beg[i + 1] = L + 1;
end[i + 1] = end[i];
end[i++] = L;
}
else
{
i--;
}
}
}
int removeDuplicateStrings(char *words[], int limit)
{
int *indices = (int *)malloc(limit * sizeof(int));
int newLimit;
sortArrayAndSetCriteria(words, limit, indices);
newLimit = removeDuplicatesFromBoth(words, limit, indices);
sortArrayBasedOnCriteria(words, limit, indices);
free(indices);
return newLimit;
}
int main()
{
char *words[] = { "abc", "def", "bad", "hello", "captain", "def", "abc", "goodbye" };
int newLimit = removeDuplicateStrings(words, 8);
int i = 0;
for(i = 0; i < newLimit; i++) printf(" Word # %d = %s\n", i, words[i]);
return 0;
}
Traverse through the items in the array - an O(n) operation
For each item, add it to another sorted array
Before adding it to the sorted array, check if the entry already exists - an O(log n) operation with binary search
Overall, O(n log n) comparisons
I think that in C you can create a second array, then copy an element from the original array only if it is not already in the second array.
This also preserves the order of the elements.
If you read the elements one by one, you can discard a duplicate before inserting it into the original array, which could speed up the process.
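A minimal sketch of this second-array idea for an array of strings follows. The function name copy_unique and its out parameter are made up for this example; out must have room for limit pointers, and only pointers are copied, not the strings themselves. Each word is compared with everything already kept, so it is O(n^2) in the worst case, which is usually fine for a limit of 100 words.

```c
#include <assert.h>
#include <string.h>

/* Copies the first occurrence of each string in words[0..limit) to out,
 * keeping the original order. Returns the number of unique strings. */
int copy_unique(char *words[], int limit, char *out[])
{
    assert(limit >= 0);

    int count = 0;
    for (int i = 0; i < limit; i++) {
        int seen = 0;
        /* linear search through the strings kept so far */
        for (int j = 0; j < count && !seen; j++) {
            if (strcmp(words[i], out[j]) == 0)
                seen = 1;
        }
        if (!seen)
            out[count++] = words[i];
    }
    return count;
}
```

For the input b a c a d t this keeps b a c d t and returns 5.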
As Thomas suggested in a comment, if each element of the array is guaranteed to be from a limited set of values (such as a char) you can achieve this in O(n) time.
Keep an array of 256 bool (or int if your compiler doesn't support bool) or however many different discrete values could possibly be in the array. Initialize all the values to false.
Scan the input array one-by-one.
For each element, if the corresponding value in the bool array is false, add it to the output array and set the bool array value to true. Otherwise, do nothing.
You know how to do it for char type, right?
You can do same thing with strings, but instead of using array of bools (which is technically an implementation of "set" object), you'll have to simulate the "set"(or array of bools) with a linear array of strings you already encountered. I.e. you have an array of strings you already saw, for each new string you check if it is in array of "seen" strings, if it is, then you ignore it (not unique), if it is not in array, you add it to both array of seen strings and output. If you have a small number of different strings (below 1000), you could ignore performance optimizations, and simply compare each new string with everything you already saw before.
With large number of strings (few thousands), however, you'll need to optimize things a bit:
1) Every time you add a new string to an array of strings you already saw, sort the array with insertion sort algorithm. Don't use quickSort, because insertion sort tends to be faster when data is almost sorted.
2) When checking if string is in array, use binary search.
If number of different strings is reasonable (i.e. you don't have billions of unique strings), this approach should be fast enough.
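The optimized variant described above could be sketched like this. The names lower_bound and unique_in_order are made up for this example, and instead of re-sorting after every insertion, the new string is shifted straight into its sorted position, which is what a single insertion-sort step amounts to. Lookups are O(log n); each insertion may still shift O(n) pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Binary search in the sorted array seen[0..n): returns the first index
 * whose string is >= key and sets *found if key is already there. */
static int lower_bound(char *seen[], int n, const char *key, int *found)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (strcmp(seen[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = (lo < n && strcmp(seen[lo], key) == 0);
    return lo;
}

/* Writes the unique strings of words[0..limit) to out in original order.
 * out must have room for limit pointers. Returns the unique count. */
int unique_in_order(char *words[], int limit, char *out[])
{
    assert(limit >= 0);
    if (limit == 0)
        return 0;

    char **seen = malloc((size_t)limit * sizeof *seen);  /* kept sorted */
    assert(seen != NULL);

    int count = 0;               /* size of both seen and out */
    for (int i = 0; i < limit; i++) {
        int found;
        int pos = lower_bound(seen, count, words[i], &found);
        if (found)
            continue;            /* duplicate: skip it */

        /* shift the tail right by one and insert at the sorted position */
        memmove(&seen[pos + 1], &seen[pos],
                (size_t)(count - pos) * sizeof *seen);
        seen[pos] = words[i];
        out[count++] = words[i];
    }
    free(seen);
    return count;
}
```

With the words "abc", "def", "bad", "hello", "captain", "def", "abc", "goodbye" it returns 6 and leaves "abc", "def", "bad", "hello", "captain", "goodbye" in out.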

Resources