How can I assemble a set of the lowest or greatest numbers in an array? For instance, if I wanted to find the lowest 10 numbers in an array of size 1000.
I'm working in C but I don't need a language specific answer. I'm just trying to figure out a way to deal with this sort of task because it's been coming up a lot lately.
The QuickSelect algorithm separates a predefined number of the lowest (or greatest) numbers from the rest without fully sorting the array. It uses the same partition procedure as Quicksort, but stops as soon as the pivot lands at the required position.
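As a rough illustration of the QuickSelect idea, here is a minimal sketch in C. The function names (select_lowest_k, partition, swap_int) are made up for this example, and it assumes a plain int array with 0 < k <= n. It always picks the last element as the pivot, which is simple but degrades to O(n^2) on already-sorted input; a random pivot would be the usual remedy.

```c
#include <stddef.h>

/* Swap two ints in place. */
static void swap_int(int *a, int *b) {
    int t = *a; *a = *b; *b = t;
}

/* Lomuto partition of arr[lo..hi] around arr[hi]; returns the pivot's final index. */
static size_t partition(int *arr, size_t lo, size_t hi) {
    int pivot = arr[hi];
    size_t store = lo;
    for (size_t i = lo; i < hi; i++) {
        if (arr[i] < pivot) {
            swap_int(&arr[i], &arr[store]);
            store++;
        }
    }
    swap_int(&arr[store], &arr[hi]);
    return store;
}

/* Rearranges arr (length n) so that arr[0..k-1] hold the k lowest values,
   in no particular order. Assumes 0 < k <= n. */
void select_lowest_k(int *arr, size_t n, size_t k) {
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        size_t p = partition(arr, lo, hi);
        if (p == k - 1) return;      /* pivot sits exactly at the boundary */
        if (p < k - 1) lo = p + 1;   /* boundary is to the right of the pivot */
        else hi = p - 1;             /* boundary is to the left of the pivot */
    }
}
```

After calling select_lowest_k(array, 1000, 10), the first 10 slots of array would hold the 10 lowest numbers, unsorted. Note that the input array is reordered in place, so copy it first if the original order matters.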
Method 1: Sort the array
You can run something like a quicksort on the array and take the first 10 elements. But this is rather inefficient because you are only interested in the first 10 elements, and sorting the entire array for that is overkill.
Method 2: Do a linear traversal and keep track of 10 elements.
int *lowerTen = malloc(10 * sizeof *lowerTen); // room for the 10 lowest values
//'array' is your array with 1000 elements
for(int i=0; i<size_of_array; i++){
    if(comesUnderLowerTen(array[i], lowerTen)){
        addToLowerTen(array[i], lowerTen);
    }
}
int comesUnderLowerTen(int num, int *lowerTen){
//if there are not yet 10 elements in lowerTen, insert.
//else if 'num' is less than the largest element in lowerTen, insert.
}
void addToLowerTen(int num, int *lowerTen){
//should make sure that num is inserted at the right place in the array
//i.e, after inserting 'num' *lowerTen should remain sorted
}
Needless to say, this is not a working example. Also do this only if the 'lowerTen' array needs to maintain a sorted list of a small number of elements. If you need the first 500 elements in a 1000 element array, this would not be the preferred method.
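For reference, a working version of the Method 2 idea might look like the sketch below. It folds the two helper steps into one hypothetical function, lowest_k_sorted, which is not part of the outline above; the output buffer is assumed to have room for k ints.

```c
#include <stddef.h>

/* Copies the k lowest values of src[0..n-1] into out[0..k-1], sorted ascending.
   Returns how many values were written, i.e. min(n, k). */
size_t lowest_k_sorted(const int *src, size_t n, int *out, size_t k) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t pos;
        if (count < k) {
            pos = count++;               /* still room: take a new slot at the end */
        } else if (k > 0 && src[i] < out[k - 1]) {
            pos = k - 1;                 /* replace the current largest kept value */
        } else {
            continue;                    /* not among the k lowest seen so far */
        }
        /* Shift larger kept values right until src[i] fits, keeping out[] sorted. */
        while (pos > 0 && out[pos - 1] > src[i]) {
            out[pos] = out[pos - 1];
            pos--;
        }
        out[pos] = src[i];
    }
    return count;
}
```

Each element costs at most k shifts, so the whole pass is O(n*k). That is fine for k = 10, and it is the reason this method is a poor fit when k is a large fraction of n.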
Method 3: Do method 2 when you populate the original array
This works only if your original 1000 element array is populated one by one - in that case instead of doing a linear traversal on the 1000 element array you can maintain the 'lowerTen' array as the original array is being populated.
Method 4: Do not use an array
Tasks like these would be easier if you can maintain a data structure like a binary search tree based on your original array. But again, constructing a BST on your array and then finding first 10 elements would be as good as sorting the array and then doing the same. Only do this if your use case demands a search on a really large array and the data needs to be in-memory.
Implement a priority queue.
Loop through all the numbers and add the first 10 to that queue (a max-priority queue, so the highest number is on top).
Once the queue holds 10 numbers, check whether the current number is lower than the highest one in the queue.
If yes, delete that highest number and add the current one.
At the end you will have a priority queue holding the 10 lowest numbers from your array.
(Time needed is O(n log k) for a queue of size k, which is effectively O(n) for a fixed k = 10, where n is the length of your array.)
If you need any more tips, add a comment :)
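The priority queue steps above could be sketched in C with a small binary max-heap stored in a plain array. The names lowest_k_heap and sift_down are invented for this example, and it assumes 0 < k <= n.

```c
#include <stddef.h>

/* Restore the max-heap property below index i in heap[0..size-1]. */
static void sift_down(int *heap, size_t size, size_t i) {
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < size && heap[l] > heap[largest]) largest = l;
        if (r < size && heap[r] > heap[largest]) largest = r;
        if (largest == i) return;
        int t = heap[i]; heap[i] = heap[largest]; heap[largest] = t;
        i = largest;
    }
}

/* Fills heap[0..k-1] with the k lowest values of src[0..n-1], in heap order
   (the largest of them ends up at heap[0]). Assumes 0 < k <= n. */
void lowest_k_heap(const int *src, size_t n, int *heap, size_t k) {
    for (size_t i = 0; i < k; i++) heap[i] = src[i];
    for (size_t i = k / 2; i-- > 0; ) sift_down(heap, k, i);   /* build the heap */
    for (size_t i = k; i < n; i++) {
        if (src[i] < heap[0]) {      /* lower than the highest kept value */
            heap[0] = src[i];        /* replace the highest, then repair the heap */
            sift_down(heap, k, 0);
        }
    }
}
```

The result is not sorted; if a sorted list of the 10 lowest is needed, sorting those 10 values afterwards is cheap.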
the following code
cleanly compiles
performs the desired functionality
might not be the most efficient
handles duplicates
will need to be modified to handle numbers less than 0
and now the code
#include <stdlib.h> // size_t
void selectLowest( int *sourceArray, size_t numItemsInSource, int *lowestDest, size_t numItemsInDest )
{
size_t maxIndex = 0;
int maxValue = 0;
// initially populate lowestDest array
for( size_t i=0; i<numItemsInDest; i++ )
{
lowestDest[i] = sourceArray[i];
if( maxValue < sourceArray[i] )
{
maxValue = sourceArray[i];
maxIndex = i;
}
}
// search rest of sourceArray and
// if lower than max in lowestDest,
// then
// replace
// find new max value
for( size_t i=numItemsInDest; i<numItemsInSource; i++ )
{
if( maxValue > sourceArray[i] )
{
lowestDest[maxIndex] = sourceArray[i];
maxIndex = 0;
maxValue = 0;
for( size_t j=0; j<numItemsInDest; j++ )
{
if( maxValue < lowestDest[j] )
{
maxValue = lowestDest[j];
maxIndex = j;
}
}
}
}
} // end function: selectLowest
I want to create a function that returns the number of distinct values present in a given array. For example, if the array is
array[5] = {1, 3, 4, 1, 3}, the return value should be 3 (3 unique numbers in the array).
I've so far only got this:
int NewFucntion(int values[], int numValues){
for (i=0; i<numValues; i++){
I'm a new coder, new to the C language, and I'm stuck on how to proceed. Any guidance would be much appreciated. Thanks
Add the elements of the array to a std::set<T>. Since a set does not allow duplicate elements, the number of elements in the set is the number of distinct elements in the array.
For example:
#include<set>
int NewFucntion(int values[], int numValues){
std::set<int> set;
for(int i=0; i<numValues; i++){
set.insert(values[i]);
}
return set.size();
}
int distinct(int arr[], int arr_size){
    int count = arr_size;
    int current;
    int i, j;
    for (i = 0; i < arr_size; i++){
        current = arr[i];
        for (j = i+1; j < arr_size; j++){ // checks values after [i]th element.
            if (current == arr[j]){
                --count; // decrease count by 1;
                break; // discount each repeated element only once
            }
        }
    }
    return count;
}
Here's the explanation.
The array with its size is passed as an argument.
current stores the element to compare the others with.
count is the number that we need finally.
count is assigned the size of the array (i.e. we assume that all elements are unique).
(It can also be the other way round.)
A for loop starts, and the first (0th) element is compared with the elements after it.
If the element reoccurs, i.e. if (current == arr[j]), then count is decremented by 1 (we expected all elements to be unique, and because this one is not, the number of unique values is one less than what it was). The break then stops the inner loop, so an element is discounted at most once no matter how many later copies it has.
The loops go on, and count ends up as the number of unique elements: only the last occurrence of each value is never discounted.
For example, if our array is {1,1,1,1}, the first three elements each have a later copy, so count goes from 4 down to 1.
Hope that helps.
Happy coding. :)
I like wdc's answer, but I am going to give an alternative using only arrays and ints, as you seem to be coding in C and wdc's answer is a C++ answer:
To do this, go through your array as you did and store each new number you come across in a different array, let's call it repArray, which will contain no repetitions. So every time you are about to add something to this array, you should check that the number isn't already there.
You need to create it and give it a size, so why not numValues, as it cannot get any longer than that. You also need an integer specifying how many of its indexes are valid, in other words how many you have written to, let's say validIndexes. Every time you add a NEW element to repArray you need to increment validIndexes.
In the end validIndexes will be your result.
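The repArray approach described above might be written roughly as follows. The function name countDistinct and the choice to return -1 on allocation failure are assumptions made for this sketch; repArray and validIndexes are the names used in the description.

```c
#include <stdlib.h>

/* Returns the number of distinct values in values[0..numValues-1],
   or -1 if the scratch array cannot be allocated. */
int countDistinct(const int values[], int numValues) {
    if (numValues <= 0) return 0;
    int *repArray = malloc((size_t)numValues * sizeof *repArray);
    if (repArray == NULL) return -1;
    int validIndexes = 0;             /* how many slots of repArray are in use */
    for (int i = 0; i < numValues; i++) {
        int seen = 0;
        for (int j = 0; j < validIndexes; j++) {
            if (repArray[j] == values[i]) { seen = 1; break; }
        }
        if (!seen) repArray[validIndexes++] = values[i];   /* a NEW value */
    }
    free(repArray);
    return validIndexes;
}
```

This is O(n^2) in the worst case, which is fine for small arrays. For large inputs, sorting a copy and counting value changes would be one way to get O(n log n).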
Assume an array where we start from the element at index 0. We want to go from index 0 to the last index of the array by taking steps of length at most K.
For example, suppose the array is [10,2,-10,5,20] and K is 2, which means the maximum step length is 2 (we can assume K is always valid and less than the length of the array).
Since we start from index 0, our score is initially 10, and we can then go either to 2 or to -10. Suppose we go to 2, so the total score becomes 10+2=12. From 2 we can go to -10 or 5, so we go to 5, making the score 12+5=17. From here we go directly to the last index, as there is no other way, hence the total score is 17+20=37.
For a given array of length N and an integer K, we need to find the maximum score we can get.
I thought of a solution: divide it into subproblems by deciding whether to go to index i or not, and recurse on the remaining array. But I sense some dynamic programming in this problem.
How can this be solved for a given array of size N and integer K?
Constraint : 1<=N<=100000 and 1<=K<=N
Came up with a O(n*k) solution.
Main function call would be findMax(A,N,K,0).
MAX = new Array();
MAX[i] = null. For 0<=i<N
null denoting the particular element has not been filled.
procedure findMax(A,N,K,i)
{
if (MAX[i]!=null)
return MAX[i];
else if (i==N-1)
MAX[i]=A[i];
else
{
MAX[i]=A[i]+findMax(A,N,K,i+1);
for (j=2; j<=K&&(i+j)<N; ++j)
if (A[i]+findMax(A,N,K,i+j)>MAX[i])
MAX[i]=A[i]+findMax(A,N,K,i+j);
}
return MAX[i];
}
The problem has the optimal substructure property. To calculate the optimal solution, all subproblems need to be computed. So at a quick glance, I guess the time complexity won't go below O(n*k).
This can be solved in O(n) time and memory.
Basically: go backwards from i = n-1 to 0. You need to know which index in the range i+1 up to i+k is the best one; the best answer for i is then to jump to the best index in the range [i+1, i+k].
To get that information you can create some sort of queue (but you need to be able to pop from both sides; in C++ you can use std::deque).
In that queue you keep two pieces of information: (time, value), where time is the time at which you pushed the element and value is the best sum you can get if you start from that element.
Now when you are at index i: first pop from the front while the current time (let's name it t) minus que.top.time is > k: while( t-que.top.time > k) que.pop
Then you can take que.top.value + array[i], and that is the best value you can get from index i.
The last part is updating the queue. You create a new element e = (t, que.top.value + array[i]), look at que.back (instead of que.top) and perform
while (que is not empty && que.back.value <= e.value) que.pop_back
Then you can push back
que.push_back(e)
and increase t++
This works because, when your new element has a better value than elements inserted into the queue in the past, it is better to keep the new element instead, because you will be able to use it for longer.
Hope it makes sense :)
Try walking backward; this way you can achieve O(n*logk).
If the array has size 1, the max is that element. Suppose you are at the i-th element: from it you can go to one of the next K elements, so choose the one that maximizes your final result.
Consider the following pseudo code:
Based on #RandomPerfectHashFunction's answer with some changes.
Consider Max as our answer array and Tree as an AVL tree (a self-balancing binary search tree).
findMaxStartingFromIndex(A,N,K,i, Max, Tree)
if Max[i] != null
return Max[i]
max = Tree.Max // log k - just go down all the way to the right
if (i + k > N) // less then k element to end of array
max = max(max,0) // take the maximum only if he positive
Max[i] = A[i] + max
Tree.add(Max[i])
if (i + k < N)
Tree.remove(Max[i+k]) // remove the element from tree because it is out of the rolling-window of k elements
return Max[i]
In Main:
Init Max array at size N
Init Tree as empty AVL tree
Max[N-1] = A[N-1]
Tree.add(MAX[N-1])
for (i = N-2; i >= 0 ; i--)
findMaxStartingFromIndex(A,N,K,i, Max, Tree)
When all is done, look for the max among the first k elements of the Max array (choosing the first element is not always the best option).
Adding, finding and removing an element in a balanced binary search tree is O(log n); in our case the tree holds only k elements, so we achieve O(n*logk) complexity.
This can be done in O(n). I'm assuming you're already familiar with the basic DP algorithm, which runs in O(nk). We have dp[i] = value[i] + (max(dp[j]) for i - k <= j < i). The k factor in the complexity comes from finding the maximum of the last k values in our DP array, which we can optimize to amortized O(1).
One optimization might be to maintain a binary search tree containing the last k values, which would make an O(n log k) solution. But we can do better by using a double-ended queue instead of a binary search tree.
We maintain a deque containing the candidates for the maximum of the last k elements. Before we push the current dp value into the back of the deque, we pop off the value at the back if it is less than or equal to the current value. Because the current value is both better (or at least as good) than the value in the back and will be in the deque for longer, the value at the back will never be the maximum in the deque and can be discarded. We repeat this until the value at the back is no longer less than or equal to the current value.
We can then pop off the front value if its index is less than the current index minus k.
The way we popped off numbers from the back makes our queue always decreasing, so the maximum value is at the front.
Note that even though the loop popping off the values at the back might run as many as n - 1 times in one iteration of the main loop, the total complexity is still O(n) because each element of the DP array is popped off at most once.
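To make the deque bookkeeping concrete, here is one possible C sketch of the approach just described. The function name max_score is hypothetical, and the deque is simply an array of indices with head and tail cursors; it assumes n >= 1 and k >= 1, and returns 0 on allocation failure as a simplification.

```c
#include <stdlib.h>

/* Maximum score walking from index 0 to index n-1 with steps of at most k.
   Assumes n >= 1 and k >= 1. Returns 0 if allocation fails (a simplification). */
long long max_score(const int *value, size_t n, size_t k) {
    long long *dp = malloc(n * sizeof *dp);
    size_t *dq = malloc(n * sizeof *dq);    /* indices; dp values decrease front to back */
    if (dp == NULL || dq == NULL) { free(dp); free(dq); return 0; }
    size_t head = 0, tail = 0;              /* live entries are dq[head..tail-1] */

    dp[0] = value[0];
    dq[tail++] = 0;
    for (size_t i = 1; i < n; i++) {
        /* Drop front entries that are more than k steps behind i. */
        while (dq[head] + k < i) head++;
        /* The front now holds the index of the best reachable dp value. */
        dp[i] = value[i] + dp[dq[head]];
        /* Drop back entries that are no better than dp[i]. */
        while (tail > head && dp[dq[tail - 1]] <= dp[i]) tail--;
        dq[tail++] = i;
    }
    long long result = dp[n - 1];
    free(dp);
    free(dq);
    return result;
}
```

Each index is pushed once and popped at most once, so the two inner while loops add up to O(n) work over the whole run. The front-pruning loop always stops because index i-1 is still in the deque and is within k steps of i.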
This can be solved with dynamic programming. dp[i] is the maximum score we can collect going from nums[0] to nums[i]. The transition is dp[i] = max(dp[i-1], dp[i-2], ..., dp[i-k]) + nums[i]. Time complexity is O(nk).
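A direct transcription of that transition into C could look like the following sketch. The name max_score_nk is made up here; it assumes n >= 1 and k >= 1, and returns 0 on allocation failure as a simplification.

```c
#include <stdlib.h>

/* dp[i] = nums[i] + max(dp[i-1], ..., dp[i-k]); returns dp[n-1].
   Assumes n >= 1 and k >= 1. Returns 0 if allocation fails (a simplification). */
long long max_score_nk(const int *nums, size_t n, size_t k) {
    long long *dp = malloc(n * sizeof *dp);
    if (dp == NULL) return 0;
    dp[0] = nums[0];
    for (size_t i = 1; i < n; i++) {
        size_t first = (i > k) ? i - k : 0;   /* oldest index reachable in one step */
        long long best = dp[first];
        for (size_t j = first + 1; j < i; j++) {
            if (dp[j] > best) best = dp[j];
        }
        dp[i] = nums[i] + best;
    }
    long long result = dp[n - 1];
    free(dp);
    return result;
}
```

With N up to 100000 and K up to N, the inner loop can reach roughly 5 * 10^9 iterations in the worst case, which is why the O(n) and O(n log k) variants are worth the extra bookkeeping.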
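A greedy solution. You might find it easier to understand. Note that always stepping to the largest of the next maxStep values is not guaranteed to give the maximum total: for {0,1,2,3} with maxStep=2 it skips the 1 and prints 5, while visiting every element gives 6.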
class Solution {
public static void main(String[] args) {
//Init
int[]path= {10,2,-10,5,20};
int maxStep=2;
int max=path[0];
if(path.length==0)System.out.println(0);
for(int i=0;i<path.length-1;) {
int index=0,temp=Integer.MIN_VALUE;
//for each step, choose the step that has max value
for(int j=1;(j<=maxStep)&&(i+j<=path.length-1);j++) {
if(i+j>path.length-1)break;
if(path[i+j]>temp) {
temp=path[i+j];
index=j;
}
}
//change the index and the max value
i+=index;max+=temp;
}
System.out.println(max);
}
}
This was asked in my interview today. Two of the answers here describe the best approach; I'm just adding code for it.
Time complexity: O(n), where n is the number of elements in the array.
package main.java;
import java.util.*;
public class Main {
public static int solve(int[] a, int k) {
int ans = Integer.MIN_VALUE;
MaxSlidingWindow maxSlidingWindow = new MaxSlidingWindow(k);
for (int i = 0; i < a.length; i++) {
ans = maxSlidingWindow.getMax() + a[i];
maxSlidingWindow.add(i, ans);
}
return ans;
}
public static void main(String[] args) {
int[] input = {-9, -11, -10, 5, 20};
System.out.print(Main.solve(input, 2));
}
}
// at any point MaxSlidingWindow will have at most k nodes,
// with 'index' increasing and 'val' decreasing from head to tail
class MaxSlidingWindow {
int k;
Deque<Node> q;
class Node {
int index;
int val;
Node(int index, int val) {
this.index = index;
this.val = val;
}
}
MaxSlidingWindow(int k) {
this.k = k;
this.q = new LinkedList<Node>();
}
public void add(int index, int val) {
if (q.isEmpty()) {
q.addLast(new Node(index, val));
} else {
if (index - q.peekFirst().index + 1 > k) {
q.pollFirst(); // removing head as it is out of range
}
while (!q.isEmpty() && q.peekLast().val <= val) {
q.pollLast(); // removing values in last less than current
}
q.addLast(new Node(index, val));
}
}
public int getMax() {
if (q.isEmpty()) {
return 0;
}
return q.peekFirst().val;
}
}
I need to merge k (1 <= k <= 16) sorted arrays into one sorted array. This is for a homework assignment and the Professor requires that this be done using an O(n) algorithm. Merging 2 arrays is no problem and I can do it easily using an O(n) algorithm. I feel that what my professor is asking is undoable for n arrays with an O(n) algorithm.
I am using the algorithm below to split the array indices and run InsertionSort on each partition. I could save these start and end indices into a 2D array. I just don't see how the merging can be done in O(n), because this is going to require more than one loop. If it is possible, does anyone have any hints? I'm not looking for actual code, just a hint as to where I should start.
int chunkSize = round(float(arraySize) / numThreads);
for (int i = 0; i < numThreads; i++) {
int start = i * chunkSize;
int end = start + chunkSize - 1;
if (i == numThreads - 1) {
end = arraySize - 1;
}
InsertionSort(&array[start], end - start + 1);
}
EDIT: The requirement is that the algorithm be O(n) where n is the number of elements in the array. Also, I need to solve this without using a min heap.
EDIT #2: Here is an algorithm I came up with. The problem here is that I'm not storing the result of each iteration back into the original array. I could just copy all of it back in a for loop, but that would be expensive. Is there any way I can do this, other than using something like memcpy? In the code below, indices is a 2D array [numThreads][2] where indices[i][0] is the start index and indices[i][1] is the end index of the ith array.
void mergeArrays(int array[], int indices[][2], int threads, int result[]) {
for (int i = 0; i < threads - 1; i++) {
int resPos = 0;
int lhsPos = 0;
int lhsEnd = indices[i][1];
int rhsPos = indices[i+1][0];
int rhsEnd = indices[i+1][1];
while (lhsPos <= lhsEnd && rhsPos <= rhsEnd) {
if (array[lhsPos] <= array[rhsPos]) {
result[resPos] = array[lhsPos];
lhsPos++;
} else {
result[resPos] = array[rhsPos];
rhsPos++;
}
resPos++;
}
while (lhsPos <= lhsEnd) {
result[resPos] = array[lhsPos];
lhsPos++;
resPos++;
}
while (rhsPos <= rhsEnd) {
result[resPos] = array[rhsPos];
rhsPos++;
resPos++;
}
}
}
You can merge K sorted arrays into one sorted array with an O(N*log(K)) algorithm, using a priority queue with K entries, where N is the overall number of elements in all arrays.
If K is considered a constant (it is limited to 16 in your case), then the complexity is O(N).
Note again: N is the number of elements in my post, not the number of arrays.
It is impossible to merge the arrays in O(K): a simple copy already takes O(N).
Using the facts you provided:
(1) n is the number of arrays to merge;
(2) the arrays to be merged are already sorted;
(3) the merge needs to be of order n, that is linear in the number of arrays
(and NOT linear in the number of elements in each array, as you might mistakenly think at first sight).
Use the analogy of merging 4 sorted piles of cards, low to high, face up. You would pick the card with the lowest face value from one of the piles and put it (face down) on the merged deck, until all piles are exhausted.
For your program: keep a counter for each array for the number of elements you have already transferred to the output. This is at the same time an index to the next element in each array NOT merged in the output. Pick the smallest element that you find at one of these locations. You have to lookup the first waiting element in all the arrays for that, so that is of order n.
Also, I don't understand why the answer from MoB got up-votes, it does not answer the question.
Here is one way to do it (pseudocode)
input array[k][n]
init indices[k] = { 0, 0, 0, ... }
init queue = { empty priority queue, lowest priority first }
for i in 0..k-1:
    insert i into queue with priority (array[i][0])
while queue is not empty:
    let x = pop queue
    output array[x][indices[x]]
    increment indices[x]
    if indices[x] < n:
        insert x into queue with priority (array[x][indices[x]])
This can probably be simplified further in C. You would have to find a suitable queue implementation to use though as there are none in libc.
Complexity for this operation:
"while queue is not empty" => O(n)
"insert x into queue ..." => O(log k)
=> O(n log k)
Which, if you consider k = constant, is O(n).
After sorting the k sub-arrays (the method doesn't matter), the code does a k-way merge. The simplest implementation does k-1 compares to determine the smallest of the leading elements of the k arrays, then moves that element from its sub-array to the output array and gets the next element from that sub-array. When the end of a sub-array is reached, the algorithm drops down to a (k-1)-way merge, then a (k-2)-way merge, until finally there's just one sub-array left and it's copied. This will be O(n) time since k-1 is a constant.
The k-1 compares can be sped up by using a minimum heap (which is how some priority queues are implemented), but it's still O(n), with just a smaller constant. The heap needs to be initialized at the start, then updated each time an element is removed and a new one added.
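The simple k-way merge without a heap could be sketched like this in C. The function name merge_k and the half-open chunk layout (chunk c spans array[pos[c]] up to but not including array[end[c]]) are assumptions of this example; the question's own code uses inclusive end indices instead.

```c
#include <stddef.h>

/* Merges k sorted chunks of `array` into `result`.
   Chunk c occupies array[pos[c]] .. array[end[c] - 1]; pos[] is advanced
   in place as elements are consumed. Returns the number of elements written. */
size_t merge_k(const int *array, size_t *pos, const size_t *end,
               size_t k, int *result) {
    size_t out = 0;
    for (;;) {
        size_t best = k;                      /* k means "no chunk has data left" */
        for (size_t c = 0; c < k; c++) {
            if (pos[c] >= end[c]) continue;   /* chunk c is exhausted */
            if (best == k || array[pos[c]] < array[pos[best]]) best = c;
        }
        if (best == k) break;                 /* every chunk is exhausted */
        result[out++] = array[pos[best]++];   /* move the smallest leading element */
    }
    return out;
}
```

Every output element costs one scan over at most k chunk heads, so the total is O(n*k), which is O(n) when k is bounded by 16. The merge writes into a separate result buffer, so a single copy back into the original array is still needed if the sorted data must end up there.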
I have a large array that contains numbers. Is there a way to find the indices of the top n values? Is there any library function in C?
example:
an array: {1,2,6,5,3}
the indices of the top 2 numbers are: {2,3}
If by top n you mean the n-th highest (or lowest) number in the array, you may want to look at the QuickSelect algorithm. Unfortunately there is no C library function I am aware of that implements it, but Wikipedia should give you a good starting point.
QuickSelect is O(n) on average. If O(n log n) and some overhead is fine as well, you can use qsort and take the n-th element.
Edit (in response to the example): getting all the indices of the top n in a single batch is straightforward with both approaches. QuickSelect partitions them all onto one side of the final pivot.
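For the qsort route, one way to get indices rather than values is to sort an array of indices by the values they point at. The sketch below assumes this setup; top_n_indices, by_value_desc and g_values are names invented for the example. Because the standard qsort comparator takes no context argument, the array is passed through a file-scope pointer, which makes this version unsuitable for concurrent use.

```c
#include <stdlib.h>

static const int *g_values;   /* array being ranked; read by the comparator */

/* Orders indices by descending value, breaking ties by ascending index. */
static int by_value_desc(const void *a, const void *b) {
    size_t ia = *(const size_t *)a, ib = *(const size_t *)b;
    if (g_values[ia] != g_values[ib])
        return (g_values[ia] < g_values[ib]) ? 1 : -1;
    return (ia > ib) - (ia < ib);
}

/* Writes the indices of the n largest values of arr[0..len-1] into top[0..n-1],
   largest first. Returns the number written, or 0 if allocation fails. */
size_t top_n_indices(const int *arr, size_t len, size_t *top, size_t n) {
    if (n > len) n = len;
    size_t *idx = malloc(len * sizeof *idx);
    if (idx == NULL) return 0;
    for (size_t i = 0; i < len; i++) idx[i] = i;
    g_values = arr;
    qsort(idx, len, sizeof *idx, by_value_desc);
    for (size_t i = 0; i < n; i++) top[i] = idx[i];
    free(idx);
    return n;
}
```

For the array {1,2,6,5,3} and n = 2 this yields the indices 2 and 3, matching the example in the question. The original array is left untouched, at the cost of O(len) extra memory.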
So you want the top n numbers in a big array of N numbers. There is a straightforward algorithm which is O(N*n). If n is small (as it seems to be in your case) this is good enough.
size_t top_elems(int *arr, size_t N, size_t *top, size_t n) {
/*
insert into top[0],...,top[n-1] the indices of n largest elements
of arr[0],...,arr[N-1]
*/
size_t top_count = 0;
size_t i;
for (i=0;i<N;++i) {
// invariant: arr[top[0]] >= arr[top[1]] >= .... >= arr[top[top_count-1]]
// are the top_count largest values in arr[0],...,arr[i-1]
// top_count = min(i,n);
size_t k;
for (k=top_count;k>0 && arr[i]>arr[top[k-1]];k--);
// i should be inserted in position k
if (k>=n) continue; // element arr[i] is not in the top n
// shift elements from k to top_count
size_t j=top_count;
if (j>n-1) { // top array is already full
j=n-1;
} else { // increase top array
top_count++;
}
for (;j>k;j--) {
top[j]=top[j-1];
}
// insert i
top[k] = i;
}
return top_count;
}
I have an array of int (its length can range from 11 to 500) and I need to extract the ten largest numbers into another array.
So, my starting code could be this:
arrayNumbers[n]; //array in input with numbers, 11<n<500
int arrayMax[10];
for (int i=0; i<n; i++){
if(arrayNumbers[i] ....
//here, i need the code to save current int in arrayMax correctly
}
//at the end of the loop, I want arrayMax to hold the ten largest numbers (they don't have to be ordered)
What's the most efficient way to do this in C?
Study heaps. Maintain a min-heap of size 10 (so the smallest of the ten kept values is at the root) and ignore all spilling elements. If you face a difficulty please ask.
EDIT:
If the number of elements is less than 20, find the n-10 smallest elements; the rest of the numbers are the top 10.
Visualize a heap here
EDIT2: Based on a comment from Sleepy head, I searched and found this (I have not tested it). You can find the kth largest element (the 10th in this case) in O(n) time. Then, in O(n) time, you can find the first 10 elements which are greater than or equal to this kth largest number. The final complexity is linear.
Here is an algorithm which solves this in linear time:
Use the selection algorithm, which finds the k-th element in an unsorted array in linear time. You can either use a variant of quicksort or more robust algorithms.
Get the top k using the pivot found in step 1.
This is my idea:
Insert the first 10 elements of your arrNum into arrMax.
Sort those 10 elements so that arrMax[0] = min and arrMax[9] = max.
Then check the remaining elements one by one and insert every possible candidate into its right position as follows (draft):
int k, r, p;
for (k = 10; k < n; k++)
{
r = 0;
while(1)
{
if (r == 10) break; // don't exceed length of arrMax (checked first to avoid reading arrMax[10])
else if (arrMax[r] > arrNum[k]) break; // position to insert new comer
else r++; // iteration
}
if (r != 0) // no need to insert number smaller than all members
{
for (p=0; p<r-1; p++) arrMax[p]=arrMax[p+1]; // shift arrMax to make space for new comer
arrMax[r-1] = arrNum[k]; // insert new comer at it's position
}
} // done!
Sort the array and copy the largest 10 elements into another array.
You can use the "select" algorithm, which finds the i-th largest number (you can put any number you like in place of i), and then iterate over the array and collect the numbers that are bigger than it. In your case i=10, of course.
The following example can help you. It arranges the biggest 10 elements of the original array into arrMax, assuming all numbers in the original array arrNum are positive. Based on this you can also handle negative numbers by initializing all elements of arrMax with the smallest possible number.
Anyway, using a heap of 10 elements is a better solution rather than this one.
#include <stdio.h>

int main(void)
{
    int arrNum[500]={1,2,3,21,34,4,5,6,7,87,8,9,10,11,12,13,14,15,16,17,18,19,20};
    int arrMax[10]={0};
    int i,cur,j,nn=23,pos;
    for(cur=0;cur<nn;cur++)
    {
        // find the rightmost slot whose value is smaller than the current number
        for(pos=9;pos>=0;pos--)
            if(arrMax[pos]<arrNum[cur])
                break;
        // shift the smaller values left, dropping arrMax[0]
        for(j=1;j<=pos;j++)
            arrMax[j-1]=arrMax[j];
        if(pos>=0)
            arrMax[pos]=arrNum[cur];
    }
    for(i=0;i<10;i++)
        printf("%d ",arrMax[i]);
    return 0;
}
When improving efficiency of an algorithm, it is often best (and instructive) to start with a naive implementation and improve it. Since in your question you obviously don't even have that, efficiency is perhaps a moot point.
If you start with the simpler question of how to find the largest integer:
Initialise largest_found to INT_MIN
Iterate the array with :
IF value > largest_found THEN largest_found = value
To get the 10 largest, you perform the same algorithm 10 times, but retaining the last_largest and its index from the previous iteration, modify the largest_found test thus:
IF value > largest_found &&
value <= last_largest_found &&
index != last_largest_index
THEN
largest_found = last_largest_found = value
last_largest_index = index
Start with that, then ask yourself (or here) about efficiency.