There are n elements in an array. I need to divide the array into two parts such that the average of both parts is the same.
Say you have the array [1, 2, 3]. Here the elements [1, 3] have an average of 2, and the element [2] also has an average of 2.
Another example: [1, 2, 5, 4]. Here the elements [1, 5] have an average of 3, and the elements [2, 4] also have an average of 3.
So, if such a split exists, I should flag "Yes", otherwise "No". What data structure/algorithm would you recommend for such a problem?
I tried something along the lines of this:
http://www.geeksforgeeks.org/equilibrium-index-of-an-array/
but it did not work.
I'm not an expert on algorithms, and the only solution I can think of right now is a bit brutal:
compute avg(array)
if there is an element equal to the average => done
sort the array
starting from the biggest element, compute its average with the others, starting from the smallest ones, with tail recursion (stopping once the result reaches or exceeds the computed average)
if I find a combination that gives the computed average, the remaining numbers are guaranteed to give the same average (a subset with the overall average leaves a complement with the overall average too)
Unfortunately I don't remember any useful theorem about averages...
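Following up on that sketch: if the two parts have equal averages, both must equal the overall average total/n, so the search reduces to finding a non-empty proper subset of size k whose sum is k*total/n. A minimal brute-force sketch in Python (the function name is mine; the divisibility shortcut assumes integer elements, and for larger inputs a subset-sum DP over (size, sum) pairs would replace the exponential scan over combinations):

    from itertools import combinations

    def can_split_equal_average(arr):
        # If both parts share an average it must equal the overall one:
        # s1/k = s2/(n-k) implies s1/k = (s1+s2)/n. So look for a part
        # of size k whose sum is k*total/n.
        n, total = len(arr), sum(arr)
        for k in range(1, n):                     # size of the first part
            if (total * k) % n != 0:              # k*total/n must be integral
                continue
            target = total * k // n
            if any(sum(c) == target for c in combinations(arr, k)):
                return "Yes"
        return "No"

    print(can_split_equal_average([1, 2, 3]))     # Yes: [1, 3] and [2]
    print(can_split_equal_average([1, 2, 5, 4]))  # Yes: [1, 5] and [2, 4]
    print(can_split_equal_average([1, 4]))        # No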
This was an interview question I had from a tech company. I got it wrong, which I think doomed my chances, but I honestly still cannot figure out the answer... here's the question. Assume that all elements of the sequence are unique.
We have two finite sequences: X = {Xi} and Y = {Yi}, where Y is a sub-sequence of X.
Let's write them as separate arrays: [X1, X2, ..., Xn] and [Y1, Y2, ..., Yk], where n is the length of X and k is the length of Y; obviously, since Y is a sub-sequence of X, we have n >= k.
For instance
X=[1, 10, 5, 7, 11, -4, 9, 5]
Y=[10, 7, -4, 9]
Then, for each element in Y, we want to count the number of elements in X which 1) appear after that element and 2) are greater than that element.
Using the example above
X=[1, 10, 5, 7, 11, -4, 9, 5]
Y=[10, 7, -4, 9]
ans=[1, 2, 2, 0]
Explanation:
The first element of ans is 1 because only 11 appears after 10 and is greater than 10 in X, so there is only 1 such element.
The second element of ans is 2 since 11 and 9 both appear after 7 in X, so there are 2 elements that appear after 7 and are greater than 7.
The third element of ans is also 2 since 9 and 5 appear after -4 and are both greater than -4 in X.
The fourth element is 0 since no element in X appears after 9 and is greater than 9.
The interviewer wanted me to solve it in O(N) time complexity, where N is the length of X. I could not find a way.
Does anybody have an idea?
If you have an algorithm that can solve this problem, then by setting Y = X, you can make it provide enough information to sort X without any further comparisons among the elements of X. Therefore, you can't do this in linear time under the usual assumptions, i.e., arbitrary integers in X that you can do constant-time operations on, but with no constant bound on their size.
You can do it in O(N log N) time pretty easily by walking backwards through X and maintaining an order statistic tree of the elements seen so far. See https://en.wikipedia.org/wiki/Order_statistic_tree
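A minimal sketch of that idea in Python, using a Fenwick (binary indexed) tree over coordinate-compressed values in place of a full order statistic tree (the function names are mine; since the sample X repeats 5 despite the stated uniqueness assumption, each query value is mapped to its first occurrence in X):

    from bisect import bisect_right

    class Fenwick:
        # Binary indexed tree counting how many of each rank were inserted.
        def __init__(self, n):
            self.t = [0] * (n + 1)
        def add(self, i):              # i is a 1-based rank
            while i < len(self.t):
                self.t[i] += 1
                i += i & -i
        def count_le(self, i):         # how many inserted ranks are <= i
            s = 0
            while i > 0:
                s += self.t[i]
                i -= i & -i
            return s

    def count_greater_after(X, Y):
        ranks = sorted(set(X))                     # coordinate compression
        bit = Fenwick(len(ranks))
        greater = [0] * len(X)
        for i in range(len(X) - 1, -1, -1):        # walk backwards through X
            r = bisect_right(ranks, X[i])          # 1-based rank of X[i]
            greater[i] = bit.count_le(len(ranks)) - bit.count_le(r)
            bit.add(r)
        first_pos = {}
        for i, v in enumerate(X):                  # value -> first index in X
            first_pos.setdefault(v, i)
        return [greater[first_pos[y]] for y in Y]

    X = [1, 10, 5, 7, 11, -4, 9, 5]
    Y = [10, 7, -4, 9]
    print(count_greater_after(X, Y))               # [1, 2, 2, 0]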
I think a linear-time solution is impossible, for much the same reason linear-time comparison sorting is impossible, and here is the reason.
To solve this in one pass we would have to compress the state of all previously seen elements into a bounded number of variables, for example running sums, differences, or products.
If there is a big number in X that is not in Y, it is clearly of no use on its own; since the only option is to fold it into that bounded summary, we cannot keep any state tied to individual elements of X.
Nor can we keep state tied to the current element of Y, because we do not know the next element of Y in advance (for example, the current number is -1000, the next number is 3000, and the other numbers in X are 1, 2, 3): no single stored number can serve every possible future query. With neither per-element state nor a sufficient aggregate available, a linear-time algorithm cannot exist.
I'm trying to develop a sort of very simple machine learning example to recognize similarity between arrays.
For this reason I'm trying to calculate the average of 2 arrays of different lengths.
For example if I have:
array_1 = [0, 4, 5];
array_2 = [4, 2, 7];
The average is:
average_array = [2, 3, 6];
But how can I manage to calculate the average if I have the following situation:
array_1 = [0, 4, 5, 10, 7];
array_2 = [4, 2, 7];
As you can see, the arrays have different lengths.
Is there an algorithm that I can apply to solve this problem?
Does anyone have an idea or a suggestion?
Of course I can consider the missing values of the second array as 0 and evaluate the average as, for example:
average_array = [2, 3, 6, 5, 3.5];
or consider the missing values as "null" and get:
average_array = [2, 3, 6, 10, 7];
But are these two approaches any good?
Or is there something smarter?
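For reference, a minimal sketch of the two options described above (the function name and the missing parameter are mine):

    def average_padded(a, b, missing=None):
        # missing=0    -> treat absent values as zeros
        # missing=None -> treat them as null and keep the lone value
        if len(a) < len(b):
            a, b = b, a                    # make `a` the longer array
        out = []
        for i, x in enumerate(a):
            if i < len(b):
                out.append((x + b[i]) / 2)
            elif missing is None:
                out.append(x)              # null: keep the value as-is
            else:
                out.append((x + missing) / 2)
        return out

    print(average_padded([0, 4, 5, 10, 7], [4, 2, 7], missing=0))
    # [2.0, 3.0, 6.0, 5.0, 3.5]
    print(average_padded([0, 4, 5, 10, 7], [4, 2, 7]))
    # [2.0, 3.0, 6.0, 10, 7]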
Thanks for your help!!
To answer your question, we really need more information on what you are trying to achieve.
I'm trying to develop a sort of very simple machine learning example
to recognize similarity between arrays. For this reason I'm trying to
calculate the average between 2 arrays with different length.
Depending on your use case, similarity might be defined completely differently.
For instance:
if the array encodes sound information, you might want to measure similarity as "does this sound clip occur in that one" or "are the main frequencies (which would correspond to chords) the same"
if the array encodes image information (properly DFT-ed and zig-zag-encoded), you might not care about the high frequencies (the end of the array) and only measure the difference between the first few values of the array
if the array encodes some kind of composition of elements (e.g. this essay contains the keyword "matrix" 40 times and the keyword "SVM" 27 times), the differences in values might be very important
General advice:
Think about what you're measuring
Decide what's important
But in general, have a look at smoothing algorithms, for instance Kneser-Ney or Good-Turing smoothing. They explicitly deal with comparing vectors of probabilities that may differ in length (in other words, have explicit zero entries).
https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation
If, after taking the average of the arrays, you intend to take the magnitude of the difference between each array and the average array, then you are probably headed in the right direction, measuring dissimilarity by the magnitude of the difference.
But for arrays of different lengths, I propose that you also take the index of the extra elements into consideration.
For
array_1 = [0, 4, 5, 10, 7];
array_2 = [4, 2, 7];
average should be average_array = [2, 3, 6, 6.5, 5.5];
6.5 = (10 + 3 (index) + 0 (element)) / 2
and
5.5 = (7 + 4 (index) + 0 (element)) / 2
The reason for taking the index into consideration is that the length difference is also dealt with by this approach. However, this is just my 2 cents; maybe there are better algorithms out there.
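If I understood the scheme correctly, here is a small sketch of it in Python (the function name is mine): an elementwise average where both arrays have a value, and (value + index + 0) / 2 for the extra positions of the longer array.

    def index_weighted_average(a, b):
        if len(a) < len(b):
            a, b = b, a                    # make `a` the longer array
        out = []
        for i, x in enumerate(a):
            if i < len(b):
                out.append((x + b[i]) / 2)
            else:
                # missing element: substitute its index plus a zero element
                out.append((x + i + 0) / 2)
        return out

    print(index_weighted_average([0, 4, 5, 10, 7], [4, 2, 7]))
    # [2.0, 3.0, 6.0, 6.5, 5.5]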
You should also take a look at this post
I have an unsorted array of size n and I need to find the k-1 dividers that split it into k subsets of equal size (as they would appear after the array is sorted).
I have seen this question with k-1 = 3. I guess I need median of medians, which takes O(n). But I think we would have to do it k times, so O(nk).
I would like to understand why it takes only O(n log k).
For example: I have an unsorted array of integers and I want to find the k-1 dividers that split the array into k same-sized subarrays according to their values.
If I have [1, 13, 6, 7, 81, 9, 10, 11] and k = 3, the dividers are [7, 11], splitting the array into [1, 6], [9, 10], [13, 81], where every subset has size 2.
You can use a divide-and-conquer approach. First, find the (k-1)/2th divider using the median-of-medians algorithm. Next, use the selected element to partition the list into two sub-lists. Repeat the algorithm on each sub-list to find the remaining dividers.
The maximum recursion depth is O(log k) and the total cost across all sub-lists at each level is O(n), so this is an O(n log k) algorithm.
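A minimal sketch of that recursion in Python, assuming distinct elements (the function names are mine). A randomized quickselect stands in for median-of-medians for brevity; substituting median-of-medians for the random pivot gives the worst-case O(n) selection the analysis relies on:

    import random

    def quickselect(a, k):
        # k-th smallest element of a (0-based); expected O(len(a)).
        while True:
            pivot = random.choice(a)
            lo = [x for x in a if x < pivot]
            eq = [x for x in a if x == pivot]
            if k < len(lo):
                a = lo
            elif k < len(lo) + len(eq):
                return pivot
            else:
                k -= len(lo) + len(eq)
                a = [x for x in a if x > pivot]

    def dividers(a, positions):
        # positions: 0-based sorted-order indices of the wanted dividers.
        # Pick the middle one, partition, recurse on both halves:
        # depth O(log k), O(n) total work per level.
        if not positions:
            return []
        mid = len(positions) // 2
        p = positions[mid]
        d = quickselect(a, p)
        left = [x for x in a if x < d]         # exactly p elements
        right = [x for x in a if x > d]        # assumes distinct values
        return (dividers(left, positions[:mid]) + [d]
                + dividers(right, [q - p - 1 for q in positions[mid + 1:]]))

    a = [1, 13, 6, 7, 81, 9, 10, 11]
    k = 3
    size = (len(a) - (k - 1)) // k             # n = k*size + (k-1)
    print(dividers(a, [i * (size + 1) - 1 for i in range(1, k)]))  # [7, 11]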
Given an array A of size N, we construct a list containing all possible subarrays of A in descending order.
Two subarrays B and C are compared by padding them with zeroes until both have size N. Then, we compare the two subarrays element by element and return as soon as a point of difference is observed.
We are given multiple queries where, given x, we have to find the maximum element of the x-th subarray in the order defined above.
For example, if the array A is [3, 1, 2, 4]; then the sorted subarrays will be:
[4]
[3, 1, 2, 4]
[3, 1, 2]
[3, 1]
[3]
[2, 4]
[2]
[1, 2, 4]
[1, 2]
[1]
A query where x = 3 corresponds to finding the maximum element in the subarray [3, 1, 2]; so here the answer would be 3.
Since the number of queries is large (on the order of 10^5) and the number of elements in the array can also be large (on the order of 10^5), we need some preprocessing to answer each query in O(1), O(log N), or O(sqrt N) time. I can't seem to figure out how to do this. I have solved it for the case where the array contains unique elements; how can we handle an array with repetitions? Is there any data structure that could help store the required information?
Build a suffix array for this array in descending order (treat the array like a string).
For every entry store its length and its cumulative count (the sum of the lengths of the entries before it in the suffix array).
For a query, find the needed entry by binary search over the cumulative counts, then take the needed prefix of the found suffix.
For your example the suffixes with cumulative counts are
4 (0)
3124 (1)
24 (5)
124 (7)
Query x = 3 falls into the entry 3124 (1 < 3 <= 5) and takes its (3 - 1) = 2nd prefix by decreasing length, which is 312.
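A small sketch of this approach in Python, assuming positive elements and, like the answer, the unique-element case (function names are mine). Plain lexicographic order on the suffixes then coincides with the zero-padded comparison, since a longer prefix always compares greater; a real solution would swap the naive sort for an O(N log N) suffix-array construction and add a sparse table for O(1) range maxima:

    from bisect import bisect_right

    def preprocess(A):
        n = len(A)
        # Suffix start indices in descending order (naive O(n^2 log n) sort).
        suffixes = sorted(range(n), key=lambda i: A[i:], reverse=True)
        cum, total = [], 0                     # subarrays before each entry
        for i in suffixes:
            cum.append(total)
            total += n - i                     # suffix i has n-i prefixes
        return suffixes, cum

    def query(A, suffixes, cum, x):
        # Entry j covers queries cum[j] < x <= cum[j] + len(suffix j);
        # within it, x - cum[j] selects the prefix by decreasing length.
        j = bisect_right(cum, x - 1) - 1
        start = suffixes[j]
        length = (len(A) - start) - (x - cum[j] - 1)
        return max(A[start:start + length])

    A = [3, 1, 2, 4]
    suf, cum = preprocess(A)
    print([query(A, suf, cum, x) for x in range(1, 11)])
    # [4, 4, 3, 3, 3, 4, 2, 4, 2, 1]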
Does anyone know an algorithm that sorts an array k-approximately?
We were asked to find an algorithm for k-approximate sorting that runs in O(n log(n/k)), but I can't seem to find any.
K-approximate sorting means that for the array a and any 1 <= i <= n-k:
a[i] + a[i+1] + ... + a[i+k-1] <= a[i+1] + a[i+2] + ... + a[i+k]
(equivalently, after cancelling the common terms, a[i] <= a[i+k] for all i).
I know I'm very late to the question... but under the assumption that k is some approximation value between 0 and 1 (where 0 is completely unsorted and 1 is perfectly sorted), surely the answer to this is quicksort (or mergesort).
Consider the following array:
[4, 6, 9, 1, 10, 8, 2, 7, 5, 3]
Let's say this array is 'unsorted'. Now apply one partitioning step of quicksort to this array with the (length(array)/2)-th element as the pivot: length(array)/2 = 5, so the element at index 5 is our pivot (i.e. 8):
[4, 6, 1, 2, 7, 5, 3, 8, 9, 10]
Now this array is not sorted, but it is more sorted than it was one iteration ago, i.e. it is approximately sorted, but for a low approximation, i.e. a low value of k. Repeat this step on the two halves of the array and it becomes more sorted. As k increases towards 1, i.e. perfectly sorted, the complexity becomes O(N log(N/1)) = O(N log(N)).
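For the O(n log(n/k)) bound in the question, where k is a window size rather than a 0-to-1 score, the standard trick is this same quicksort idea stopped early: stop recursing once a segment has at most k elements. Positions i and i+k then always land in different, correctly ordered blocks, so a[i] <= a[i+k] holds for every i. A minimal randomized sketch (function name mine; the recursion is about log(n/k) levels of O(n) expected work each):

    import random

    def k_sort(a, k):
        # Quicksort that stops once a segment has at most k elements:
        # small blocks stay unsorted but land in the right place.
        if len(a) <= k:
            return a
        p = random.choice(a)
        lo = [x for x in a if x < p]
        eq = [x for x in a if x == p]
        hi = [x for x in a if x > p]
        return k_sort(lo, k) + eq + k_sort(hi, k)

    print(k_sort([4, 6, 9, 1, 10, 8, 2, 7, 5, 3], 2))
    # each element ends within a block of at most 2 positions of its
    # sorted place; the exact output depends on the random pivots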