Absolute distance from various points in O(n) - c

I am stuck on a question. Part of it requires computing, for each point, the sum of its absolute distances from various other points:
|x - x1| + |x - x2| + |x - x3| + |x - x4| ...
I have to calculate this sum in O(n) for every point while iterating over the array. For example:
array = { 3, 5, 4, 7, 5 }
Sum of distances from the previous points:
dis[0] = 0;
dis[1] = |3-5| = 2
dis[2] = |3-4| + |5-4| = 2
dis[3] = |3-7| + |5-7| + |4-7| = 9
dis[4] = |3-5| + |5-5| + |4-5| + |7-5| = 5
Can anyone suggest an algorithm to do this?
Anything faster than O(n^2) will be appreciated (not necessarily O(n)).
Code for O(n^2)
for (int i = 0; i < n; i++) {
    long long ans = 0;
    for (int j = 0; j < i; j++)
        ans += abs(a[i] - a[j]);
    dis[i] = ans;
}

An O(n log n) algorithm is possible.
Assume we had a data structure for a list of integers which supported:
Insert(x) - inserts x into the list.
SumGreater(x) - the sum of all elements in the list that are greater than x.
SumLesser(x) - the sum of all elements less than x.
NumGreater(x) - the number of elements greater than x.
NumLesser(x) - the number of elements less than x.
Using balanced binary trees, with cumulative sub-tree sums and sub-tree counts stored in the nodes, we can implement each operation in O(log n) time.
To use this structure for your question:
Walk the array left to right. When you encounter a new element x,
query the already inserted numbers for SumGreater(x) = G, SumLesser(x) = L, NumGreater(x) = n_G and NumLesser(x) = n_L.
The value for x would be (G - n_G*x) + (n_L*x-L).
Then you insert x and continue.
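The same bookkeeping can be done with two Fenwick (binary indexed) trees over the coordinate-compressed values, one holding counts and one holding sums. The sketch below is my own translation of the idea, not code from the answer; `absDistances` and the helper names are mine:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// dis[i] = sum of |a[i] - a[j]| over all j < i, computed in O(n log n).
// Two Fenwick trees indexed by the rank of each distinct value: one stores
// how many previous elements have each value, the other stores their sum.
std::vector<long long> absDistances(const std::vector<long long>& a) {
    int n = a.size();
    std::vector<long long> sorted(a);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());
    int m = sorted.size();
    std::vector<long long> cnt(m + 1, 0), sum(m + 1, 0);
    auto update = [&](std::vector<long long>& t, int i, long long v) {
        for (; i <= m; i += i & -i) t[i] += v;
    };
    auto query = [&](const std::vector<long long>& t, int i) { // prefix [1..i]
        long long s = 0;
        for (; i > 0; i -= i & -i) s += t[i];
        return s;
    };
    std::vector<long long> dis(n);
    for (int i = 0; i < n; ++i) {
        // 1-based rank of a[i] among the distinct values
        int r = std::lower_bound(sorted.begin(), sorted.end(), a[i]) - sorted.begin() + 1;
        long long nSmallEq = query(cnt, r), sSmallEq = query(sum, r);
        long long nAll = query(cnt, m), sAll = query(sum, m);
        // elements <= a[i] contribute a[i]*count - sum; elements > a[i] the reverse
        dis[i] = nSmallEq * a[i] - sSmallEq + (sAll - sSmallEq) - (nAll - nSmallEq) * a[i];
        update(cnt, r, 1);
        update(sum, r, a[i]);
    }
    return dis;
}
```

On the example array { 3, 5, 4, 7, 5 } this produces the dis values from the question.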

Is O(n) even possible? - Note that the output is only n values (one per index), so an output-size argument does not rule it out; the real obstacle is that each dis[i] depends on the order statistics of the preceding prefix.

Related

Array [1,2,3....n]. A number is missing in that series. What is the optimum way to find out that number? [duplicate]

This question already has answers here:
Quickest way to find missing number in an array of numbers
(31 answers)
Closed 3 years ago.
There will be an array containing 1, 2, 3, ..., n. If one number is removed from the array, what is the optimum way to find the removed number?
The following is just one possible way to find the missing one.
If n is given, then the sum of the series 1 + 2 + 3 + ... + n is,
S = 1 + 2 + 3 + ... + n
= n * (n + 1) / 2
So, ultimately, you know the value of S. Now sum up all the integers you are given; call it S'. The difference S - S' is the answer.
This works even if the given integers are in random order. It does not require a binary search, which would need the integers sorted, at an extra O(n log n) cost.
Assuming that the numbers are in serial order, two solutions:
Check while reading.
Binary search for the missing number.
If the array is sorted then the element can be found in O(log n) time and O(1) space
Using Binary Search.
int search(int *ar, int n) {
    int a = 0, b = n - 1;
    while ((b - a) > 1) {
        int mid = a + ((b - a) >> 1);
        if ((ar[a] - a) != (ar[mid] - mid))
            b = mid;
        else
            a = mid;
    }
    return ar[a] + 1; // the gap starts right after index a
}
If the array is not sorted
n*(n+1)/2 - sum(array)
where n is the length of the array plus one (since one element is missing).
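A minimal sketch of the sum-formula approach (`missingNumber` is my name, not from the answers; long long guards against overflow of n*(n+1)/2 for larger n):

```cpp
#include <vector>

// Missing value from an unsorted array that should contain 1..n.
// The array has n-1 elements, so n is its length plus one.
long long missingNumber(const std::vector<long long>& arr) {
    long long n = (long long)arr.size() + 1;
    long long expected = n * (n + 1) / 2; // sum of 1..n
    long long actual = 0;
    for (long long v : arr) actual += v;  // sum of what we were given
    return expected - actual;
}
```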

Given an unsorted integer array find numbers that are not searchable

Interview question from a friend
Given an unsorted integer array, how many numbers in it cannot be found using binary search?
For example, in [2, 3, 4, 1, 5], only the number 1 can't be found using binary search, hence count = 1.
In [4, 2, 1, 3, 5], the numbers 4 and 2 are not searchable: binarySearch(arr, num) returns a number that is not equal to num.
Expected run time is O(n)
I can't think of an algorithm that achieves O(n) time :(
I thought about building min and max arrays, but that wouldn't work because a subarray can mess things up again.
I already know the obvious O(n log n) approach: call binary search for each number and check.
I believe this code works fine. It does one single walk of each value in the list, so it is O(n).
function CountUnsearchable(list, minValue = -Infinity, maxValue=Infinity) {
if (list is empty) return 0;
let midPoint = mid point of "list"
let lowerCount = CountUnsearchable(left half of list, minValue, min(midPoint, maxValue));
let upperCount = CountUnsearchable(right half of list, max(minValue, midPoint), maxValue);
let midPointUnsearchable = 1 if midPoint less than minValue or greater than maxValue, otherwise 0;
return lowerCount + upperCount + midPointUnsearchable;
}
It works, because we walk the tree a bit like we would in a binary search, except at each node we take both paths, and simply track the maximum value that could have led us to take this path, and the minimum value that could have led us to take this path. That makes it simple to look at the current value and answer the question of whether it can be found via a binary search.
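A concrete version of this recursion, as a C++ sketch of my own (assuming distinct values and the usual mid = lo + (hi - lo) / 2 convention; names are mine):

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Walks the implicit binary-search tree, taking both branches, and tracks
// the value bounds a query must satisfy to reach each midpoint. An element
// whose value falls outside those bounds can never be found by the search.
int countUnsearchable(const std::vector<int>& a, int lo, int hi,
                      long long minVal, long long maxVal) {
    if (lo > hi) return 0;
    int mid = lo + (hi - lo) / 2;
    long long v = a[mid];
    int self = (v < minVal || v > maxVal) ? 1 : 0;
    // Anything found left of mid must compare below a[mid]; right, above.
    return self
         + countUnsearchable(a, lo, mid - 1, minVal, std::min(v, maxVal))
         + countUnsearchable(a, mid + 1, hi, std::max(v, minVal), maxVal);
}

int countUnsearchable(const std::vector<int>& a) {
    return countUnsearchable(a, 0, (int)a.size() - 1, LLONG_MIN, LLONG_MAX);
}
```

Each element is visited exactly once, so this is O(n) total.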
Try to create the following function:
def count_unsearchable(some_list, min_index=None, max_index=None, min_value=None, max_value=None):
"""How many elements of some_list are not searchable in the
range from min_index to max_index, assuming that by the time
we arrive our values are known to be in the range from
min_value to max_value. In all cases None means unbounded."""
pass #implementation TBD
It is possible to implement this function in a way that runs in time O(n). The reason why it is faster than the naive approach is that you are only making the recursive calls once per range, instead of once per element in that range.
Idea: Problem can be reworded as - find the count of numbers in the array which are greater than all numbers to their left and smaller than all numbers to their right. Further simplified, find the count of numbers which are greater than the max number to their left and smaller than the minimum number to their right.
Code: Java 11 | Time/Space: O(n)/O(n)
int binarySearchable(int[] nums) {
    var n = nums.length;
    var count = 0;
    var maxToLeft = new int[n + 1];
    maxToLeft[0] = Integer.MIN_VALUE;
    var minToRight = new int[n + 1];
    minToRight[n] = Integer.MAX_VALUE;
    for (var i = 1; i < n + 1; i++) {
        maxToLeft[i] = Math.max(maxToLeft[i - 1], nums[i - 1]);
        minToRight[n - i] = Math.min(minToRight[n + 1 - i], nums[n - i]);
    }
    for (var i = 0; i < n; i++)
        if (nums[i] >= maxToLeft[i + 1] && nums[i] <= minToRight[i + 1])
            count++;
    return count;
}
TopCoder problem: https://community.topcoder.com/stat?c=problem_statement&pm=5869&rd=8078
Video explanation: https://www.youtube.com/watch?v=blICHR_ocDw
LeetCode discuss: https://leetcode.com/discuss/interview-question/352743/Google-or-Onsite-or-Guaranteed-Binary-Search-Numbers

time complexity of randomized array insertion

So I had to insert N elements in random order into a size-N array, but I am not sure about the time complexity of the program.
The program is basically:
for (i = 0 -> n-1) {
    index = random(0, n);        // n is exclusive
    while (array[index] != null)
        index = random(0, n);
    array[index] = i;
}
Here is my assumption: a normal insertion of N numbers is of course exactly N steps, but how much will the collisions at already-filled random positions cost? For each i, the collision probability grows like 0, 1/n, 2/n, ..., (n-1)/n, so I assumed the expected number of insertion attempts grows like 1, 2, 3, ..., n-1, i.e. O(n) per element, for a total of O(n^2). Is that the average cost? That seems really bad - am I right?
And what would happen if I did a linear search for a free slot instead of repeatedly generating random numbers? Its worst case is obviously O(n^2), but I don't know how to analyze its average case, which depends on the input distribution.
First consider the inner loop. When do we expect to have our first success (find an open position) when there are i values already in the array? For this we use the geometric distribution:
Pr(X = k) = (1-p)^{k-1} p
Where p is the probability of success for an attempt.
Here p is the probability that the array index is not already filled.
There are i filled positions so p = (1 - (i/n)) = ((n - i)/n).
From the wiki, the expectation for the geometric distribution is 1/p = 1 / ((n-i)/n) = n/(n-i).
Therefore, we should expect to make (n / (n - i)) attempts in the inner loop when there are i items in the array.
To fill the array, we insert a new value when the array has i=0..n-1 items in it. The amount of attempts we expect to make overall is the sum:
sum_{i=0,n-1} n/(n-i)
= n * sum_{i=0,n-1} 1/(n-i)
= n * (1/n + 1/(n-1) + ... + 1/1)
= n * sum_{i=1,n} 1/i
Which is n times the nth harmonic number and is approximately ln(n) + gamma, where gamma is a constant. So overall, the number of attempts is approximately n * (ln(n) + gamma), which is O(nlog n). Remember that this is only the expectation and there is no true upper bound since the inner loop is random; it may never find an open spot.
The expected number of failed insertion attempts at step i is
sum_{t=0}^infinity t * (i/n)^t * ((n-i)/n)
= ((n-i)/n) * (i/n) * (1 - i/n)^{-2}
= i/(n-i)
Summing over i you get
sum_{i=0}^{n-1} i/(n-i)
>= sum_{i=n/2}^{n-1} i/(n-i)
>= (n/2) * sum_{x=1}^{n/2} 1/x
= (n/2) * log(n) - O(n)
And
sum_{i=0}^{n-1} i/(n-i)
<= n * sum _{x=1}^n 1/x
<= n * log(n) + O(n)
So you get Θ(n log n) as the exact asymptotic order. Which is not as bad as you feared.
About doing a linear search, I don't know how you would do it while keeping the array random. If you really want an efficient algorithm to shuffle your array, you should check out Fisher-Yates shuffle.

Find the median of the sum of the arrays

Two sorted arrays of length n are given and the question is to find, in O(n) time, the median of their sum array, which contains all n^2 pairwise sums between every element of array A and every element of array B.
For instance: Let A[2,4,6] and B[1,3,5] be the two given arrays.
The sum array is [2+1, 2+3, 2+5, 4+1, 4+3, 4+5, 6+1, 6+3, 6+5]. Find the median of this array in O(n).
Solving the question in O(n^2) is pretty straightforward, but is there an O(n) solution to this problem?
Note: This is an interview question asked to one of my friends and the interviewer was quite sure that it can be solved in O(n) time.
The correct O(n) solution is quite complicated, and takes a significant amount of text, code and skill to explain and prove. More precisely, it takes 3 pages to do so convincingly, as can be seen in details here http://www.cse.yorku.ca/~andy/pubs/X+Y.pdf (found by simonzack in the comments).
It is basically a clever divide-and-conquer algorithm that, among other things, takes advantage of the fact that in a sorted n-by-n matrix, one can find in O(n) the number of elements smaller/greater than a given number k. It recursively breaks the matrix into smaller submatrices (by taking only the odd rows and columns, resulting in a submatrix with n/2 columns and n/2 rows), which, combined with the step above, results in a complexity of O(n) + O(n/2) + O(n/4) + ... = O(2n) = O(n). It is crazy!
I can't explain it better than the paper, which is why I'll explain a simpler, O(n logn) solution instead :).
O(n * logn) solution:
It's an interview! You can't get that O(n) solution in time. So hey, why not provide a solution that, although not optimal, shows you can do better than the other obvious O(n²) candidates?
I'll make use of the O(n) algorithm mentioned above, to find the amount of numbers that are smaller/greater than a given number k in a sorted n-by-n matrix. Keep in mind that we don't need an actual matrix! The Cartesian sum of two arrays of size n, as described by the OP, results in a sorted n-by-n matrix, which we can simulate by considering the elements of the array as follows:
a[3] = {1, 5, 9};
b[3] = {4, 6, 8};
//a + b:
{1+4, 1+6, 1+8,
5+4, 5+6, 5+8,
9+4, 9+6, 9+8}
Thus each row contains non-decreasing numbers, and so does each column. Now, pretend you're given a number k. We want to find in O(n) how many of the numbers in this matrix are smaller than k, and how many are greater. Clearly, if both values are less than (n²+1)/2, that means k is our median!
The algorithm is pretty simple:
int smaller_than_k(int k){
    int x = 0, j = n-1;
    for(int i = 0; i < n; ++i){
        while(j >= 0 && k <= a[i]+b[j]){
            --j;
        }
        x += j+1;
    }
    return x;
}
This basically counts how many elements fit the condition at each row. Since the rows and columns are already sorted as seen above, this will provide the correct result. And as both i and j iterate at most n times each, the algorithm is O(n) [Note that j does not get reset within the for loop]. The greater_than_k algorithm is similar.
Now, how do we choose k? That is the logn part. Binary Search! As has been mentioned in other answers/comments, the median must be a value contained within this array:
candidates[n] = {a[0]+b[n-1], a[1]+b[n-2],... a[n-1]+b[0]};.
Simply sort this array [also O(n*logn)], and run the binary search on it. Since the array is now in non-decreasing order, it is straight-forward to notice that the amount of numbers smaller than each candidate[i] is also a non-decreasing value (monotonic function), which makes it suitable for the binary search. The largest number k = candidate[i] whose result smaller_than_k(k) returns smaller than (n²+1)/2 is the answer, and is obtained in log(n) iterations:
int b_search(){
    int lo = 0, hi = n, mid, n2 = (n*n + 1)/2;
    while(hi-lo > 1){
        mid = (hi+lo)/2;
        if(smaller_than_k(candidate[mid]) < n2)
            lo = mid;
        else
            hi = mid;
    }
    return candidate[lo]; // the median
}
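For completeness, here is one way the two fragments above could be assembled into a self-contained function. This is my own assembly of the answer's pieces: the arrays are passed explicitly instead of through globals, `medianOfSums` is my name, and the lower median is returned:

```cpp
#include <algorithm>
#include <vector>

// Number of pairwise sums a[i]+b[j] strictly less than k; O(n) two-pointer
// walk over the implicitly sorted n-by-n sum matrix (j is never reset).
long long smallerThanK(const std::vector<int>& a, const std::vector<int>& b,
                       long long k) {
    int n = a.size(), j = n - 1;
    long long x = 0;
    for (int i = 0; i < n; ++i) {
        while (j >= 0 && k <= (long long)a[i] + b[j]) --j;
        x += j + 1;
    }
    return x;
}

// Lower median of the n^2 pairwise sums, O(n log n) overall: binary search
// over the sorted anti-diagonal candidates, as described in the answer.
long long medianOfSums(std::vector<int> a, std::vector<int> b) {
    int n = a.size();
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<long long> cand(n);
    for (int i = 0; i < n; ++i) cand[i] = (long long)a[i] + b[n - 1 - i];
    std::sort(cand.begin(), cand.end());
    long long half = ((long long)n * n + 1) / 2;
    int lo = 0, hi = n; // invariant: smallerThanK(cand[lo]) < half
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (smallerThanK(a, b, cand[mid]) < half) lo = mid;
        else hi = mid;
    }
    return cand[lo];
}
```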
Let's say the arrays are A = {A[1] ... A[n]}, and B = {B[1] ... B[n]}, and the pairwise sum array is C = {A[i] + B[j], where 1 <= i <= n, 1 <= j <= n} which has n^2 elements and we need to find its median.
Median of C must be an element of the array D = {A[1] + B[n], A[2] + B[n - 1], ..., A[n] + B[1]}: if you fix A[i] and consider all the sums A[i] + B[j], you will see that only A[i] + B[n + 1 - i] (which is in D) could be the median. That is, it may not be the median, but if it is not, then none of the other A[i] + B[j] are the median either.
This can be proved by considering all B[j] and counting the number of values lower than and greater than A[i] + B[j] (we can do this quite accurately because the two arrays are sorted - the calculation is a bit messy though). You'd see that for A[i] + B[n + 1 - i] these two counts are the most "balanced".
The problem then reduces to finding median of D, which has only n elements. An algorithm such as Hoare's will work.
UPDATE: this answer is wrong. The real conclusion here is that the median is one of D's elements, but D's median is not the same as C's median.
Doesn't this work?:
You can compute the rank of a number in linear time as long as A and B are sorted. The technique used for computing the rank can also be used to find all elements of A+B that are between some lower bound and some upper bound, in time linear in the size of the output plus |A|+|B|.
Randomly sample n things from A+B. Take the median, say foo. Compute the rank of foo. With constant probability, foo's rank is within n of the median's rank. Keep doing this (an expected constant number of times) until you have lower and upper bounds on the median that are within 2n of each other. (This whole process takes expected linear time, but it's obviously slow.)
All you have to do now is enumerate everything between the bounds and do a linear-time selection on a linear-sized list.
(Unrelatedly, I wouldn't excuse the interviewer for asking such an obviously crappy interview question. Stuff like this in no way indicates your ability to code.)
EDIT: You can compute the rank of a number x by doing something like this:
Set i = 0, j = |B| - 1.
While i < |A| {
    While j >= 0 and A[i] + B[j] > x, j--.
    If j < 0, break.
    rank += j + 1.
    i++.
}
FURTHER EDIT: Actually, the above trick only narrows down the candidate space to about n log(n) members of A+B. Then you have a general selection problem within a universe of size n log(n); you can do basically the same trick one more time and find a range of size proportional to sqrt(n) log(n) where you do selection.
Here's why: If you sample k things from an n-set and take the median, then the sample median's order is between the (1/2 - sqrt(log(n) / k))th and the (1/2 + sqrt(log(n) / k))th elements with at least constant probability. When n = |A+B|, we'll want to take k = sqrt(n) and we get a range of about sqrt(n log n) elements --- that's about |A| log |A|. But then you do it again and you get a range on the order of sqrt(n) polylog(n).
You should use a selection algorithm to find the median of an unsorted list in O(n). Look at this: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
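In C++, for instance, such a selection is available directly as std::nth_element, whose average running time is linear:

```cpp
#include <algorithm>
#include <vector>

// Lower median of an unsorted list via introselect; O(n) on average.
int medianUnsorted(std::vector<int> v) {
    size_t k = (v.size() - 1) / 2; // index of the lower median
    std::nth_element(v.begin(), v.begin() + k, v.end());
    return v[k]; // v[k] now holds what sorted order would place there
}
```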

Dividing a graph in three parts such that the maximum of the sums of weights of the three parts is minimized

I want to divide a graph with N weighted vertices and N-1 edges into three parts such that the maximum of the sums of the vertex weights in each part is minimized. This is the actual problem I am trying to solve: http://www.iarcs.org.in/inoi/contests/jan2006/Advanced-1.php
I considered the following method
/*Edges are stored in an array E, and also in an adjacency matrix for depth first search.
Every edge in E has two attributes a and b which are the nodes of the edge*/
min-max = infinity
for i -> 0 to length(E):
for j -> i+1 to length(E):
/*Call depth first search on the nodes of both the edges E[i] and E[j]
the depth first search returns the sum of weights of the vertices it visits,
we keep track of the maximum weight returned by dfs*/
Adjacency-matrix[E[i].a][E[i].b] = 0;
Adjacency-matrix[E[j].a][E[j].b] = 0;
max = 0
temp = dfs(E[i].a)
if temp > max then max = temp
temp = dfs(E[i].b)
if temp > max then max = temp
temp = dfs(E[j].a)
if temp > max then max = temp
temp = dfs(E[j].b)
if temp > max then max = temp
if max < min-max
min-max = max
Adjacency-matrix[E[i].a][E[i].b] = 1;
Adjacency-matrix[E[j].a][E[j].b] = 1;
/*The depth first search is called four times but it will terminate one time
if we keep track of the visited vertices because there are only three components*/
/*After the outer loop terminates what we have in min-max will be the answer*/
The above algorithm takes O(n^3) time: since the number of edges is n-1, the two nested loops run O(n^2) times, and each dfs visits every vertex at most once, which is O(n).
But n can be up to 3000, and O(n^3) is too slow for this problem. Is there another method that solves the question in the link faster than n^3?
EDIT: I implemented #BorisStrandjev's algorithm in C. It gives the correct answer for the test input in the question, but a wrong answer for all other test inputs. Here is a link to my code on ideone: http://ideone.com/67GSa2. The output there should be 390, but the program prints 395.
I am trying to find a mistake in my code but I don't see any. The answers my code gives are very close to the correct ones - is there anything more to the algorithm?
EDIT 2: In the following graph (image not reproduced here) -
#BorisStrandjev, your algorithm will choose i as 1 and j as 2 in one of the iterations, but then the third part (3, 4) is invalid.
EDIT 3
I finally found the mistake in my code: V[i] stored the sum of i and its ancestors instead of i and all its descendants. With that fixed it solves the above example correctly. Thanks to all of you for your help.
Yes, there is a faster method.
I will need a few auxiliary matrices, and I will leave their creation and correct initialization to you.
First of all, plant the tree - that is, make the graph directed. Calculate the array VAL[i] for each vertex - the number of passengers for the vertex and all its descendants (remember we planted the tree, so now this makes sense). Also calculate the boolean matrix desc[i][j], which is true if vertex i is a descendant of vertex j. Then do the following:
best_val = n
for i in 1...n
for j in i + 1...n
val_of_split = 0
val_of_split_i = VAL[i]
val_of_split_j = VAL[j]
if desc[i][j] val_of_split_j -= VAL[i] // subtract all the nodes that go to i
if desc[j][i] val_of_split_i -= VAL[j]
val_of_split = max(val_of_split, val_of_split_i)
val_of_split = max(val_of_split, val_of_split_j)
val_of_split = max(val_of_split, n - val_of_split_i - val_of_split_j)
best_val = min(best_val, val_of_split)
After this loop finishes, the answer will be in best_val. The algorithm is clearly O(n^2); you just need to figure out how to calculate desc[i][j] and VAL[i] within that complexity, but it is not so complex a task - I think you can figure it out yourself.
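One standard way to answer desc[i][j] in O(1) per query after an O(n) preprocessing pass (my suggestion, not from the answer) is an Euler tour: record entry and exit timestamps during the DFS, and i is a descendant of j exactly when j's time interval contains i's. A sketch on an adjacency list of children, rooted as in the answer:

```cpp
#include <vector>

// tin/tout timestamps from one DFS over a rooted tree.
// isDesc(i, j): whether i lies in j's subtree (a node counts as its own
// descendant, matching the desc matrix built in the pseudocode below).
struct EulerTour {
    std::vector<int> tin, tout;
    int timer = 0;
    EulerTour(const std::vector<std::vector<int>>& children, int root)
        : tin(children.size()), tout(children.size()) {
        dfs(children, root);
    }
    void dfs(const std::vector<std::vector<int>>& children, int u) {
        tin[u] = timer++;
        for (int v : children[u]) dfs(children, v);
        tout[u] = timer++;
    }
    bool isDesc(int i, int j) const {
        return tin[j] <= tin[i] && tout[i] <= tout[j];
    }
};
```

The same DFS can accumulate VAL[i] on the way back up, so both tables fit in the O(n^2) budget (the desc matrix itself, if materialized, costs O(n^2) to fill from these intervals).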
EDIT Here I will include the code for the whole problem in pseudocode. I deliberately did not include the code before the OP tried and solved it by himself:
int p[n] := // initialized from the input - price of the node itself
adjacency_list neighbors := // initialized to store the graph adjacency list
int VAL[n] := { 0 } // the price of a node and all its descendants
bool desc[n][n] := { false } // desc[i][j] - whether i is descendant of j
bool visited[n] := { false } // whether the dfs visited the node already
stack parents := {empty-stack}; // the stack of nodes visited during dfs
dfs ( currentVertex ) {
VAL[currentVertex] = p[currentVertex]
parents.push(currentVertex)
visited[currentVertex] = true
for vertex : parents // a bit extended stack definition supporting iteration
desc[currentVertex][vertex] = true
for vertex : adjacency_list[currentVertex]
if visited[vertex] continue
dfs(vertex)
VAL[currentVertex] += VAL[vertex]
parents.pop()
calculate_best ( )
dfs(0)
best_val = n
for i in 0...(n - 1)
for j in i + 1...(n - 1)
val_of_split = 0
val_of_split_i = VAL[i]
val_of_split_j = VAL[j]
if desc[i][j] val_of_split_j -= VAL[i]
if desc[j][i] val_of_split_i -= VAL[j]
val_of_split = max(val_of_split, val_of_split_i)
val_of_split = max(val_of_split, val_of_split_j)
val_of_split = max(val_of_split, n - val_of_split_i - val_of_split_j)
best_val = min(best_val, val_of_split)
return best_val
And the best split will be {descendants of i} \ {descendants of j}, {descendants of j} \ {descendants of i} and {all nodes} \ {descendants of i} U {descendants of j}.
You can use a combination of Binary Search & DFS to solve this problem.
Here's how I would proceed:
Calculate the total weight of the graph, and also find the heaviest edge in the graph. Let them be Sum, MaxEdge resp.
Now we have to run a binary search between this range: [maxEdge, Sum].
In each search iteration, middle = (start + end) / 2. Now, pick a start node and perform a DFS such that the sum of edges traversed in the sub-graph is as close to 'middle' as possible, while keeping this sum less than middle. This will be one sub-graph. In the same iteration, pick another node unmarked by the previous DFS and perform another DFS in the same way. Do it once more, because we need to break the graph into 3 parts.
The min. weight amongst the 3 sub-graphs calculated above is the solution from this iteration.
Keep running this binary search until its end variable exceeds its start variable.
The max of all the mins obtained in step 4 is your answer.
You can do extra book-keeping in order to get the 3-sub-graphs.
Order complexity : N log(Sum) where Sum is the total weight of the graph.
I just noticed that you have talked about weighted vertices, and not edges. In that case, just treat edges as vertices in my solution. It should still work.
EDIT 4: THIS WON'T WORK!!!
If you process the nodes in the link in the order 3,4,5,6,1,2, after processing 6, (I think) you'll have the following sets: {{3,4},{5},{6}}, {{3,4,5},{6}}, {{3,4,5,6}}, with no simple way to split them up again.
I'm just leaving this answer here in case anyone else was thinking of a DP algorithm.
It might work to look at all the already processed neighbours in the DP algorithm.
I'm thinking a Dynamic Programming algorithm, where the matrix is (item x number of sets)
n = number of sets
k = number of vertices
// row 0 represents 0 elements included
A[0, 0] = 0
for (s = 1:n)
A[0, s] = INFINITY
for (i = 1:k)
for (s = 0:n)
B = A[i-1, s] with i inserted into the minimum one of its neighbouring sets
A[i, s] = min(A[i-1, s-1], B) // A[i-1, s-1] = INFINITY if s-1 < 0
EDIT: Explanation of DP:
This is a reasonably basic Dynamic Programming algorithm. If you need a better explanation, you should read up on it some more, it's a very powerful tool.
A is a matrix. The row i represents a graph with all vertices up to i included. The column c represents the solution with number of sets = c.
So A[2,3] would give the best result for a graph containing item 0, item 1 and item 2, with 3 sets, thus each in its own set.
You then start at item 0, calculate the row for each number of sets (the only valid one is number of sets = 1), then do item 1 with the above formula, then item 2, etc.
A[a, b] is then the optimal solution with all vertices up to a included and b number of sets. So you'll just return A[k, n] (the one that has all vertices included and the target number of sets).
EDIT 2: Complexity
O(k*n*b) where b is the branching factor of a node (assuming you use an adjacency list).
Since n = 3, this is O(3*k*b) = O(k*b).
EDIT 3: Deciding which neighbouring set a vertex should be added to
Keep n arrays of k elements each in a union find structure, with each set pointing to the sum for that set. For each new row, to determine which sets a vertex can be added to, we use its adjacency list and look-up the set and value of each of its neighbours. Once we find the best option, we can just add that element to the applicable set and increment its sum by the added element's value.
You'll notice the algorithm only looks down 1 row, so we only need to keep track of the last row (not store the whole matrix), and can modify the previous row's n arrays rather than copying them.
