Consider the following question:
Given a 2D array of unsigned integers and a maximum length n, find a path in that matrix that is not longer than n and which maximises the sum. The output should consist of both the path and the sum.
A path consists of neighbouring integers that are either all in the same row, or in the same column, or down a diagonal in the down-right direction.
For example, consider the following matrix and a given path length limit of 3:
1 2 3 4 5
2 1 2 2 1
3 4 5* 6 5
3 3 5 10* 5
1 2 5 7 15*
The optimal path is 5 + 10 + 15 = 30 (the nodes are marked with *).
Upon seeing this problem, a Dynamic Programming solution immediately seems most appropriate, given its similarity to other problems like Min Cost Path or Maximum Sum Rectangular Submatrix. The issue is that in order to solve this problem correctly, you need to build up paths from every integer (node) in the matrix, not just start the path at the top left and end at the bottom right.
I was initially thinking of an approach similar to the solution for Maximum Sum Rectangular Submatrix, in which I could store each possible path from every node (with path length less than n, only going right/down). But the only way I can envision that approach is by making recursive calls down and right from each node, which seems to defeat the purpose of DP. I also need to be able to store the max path.
Another possible solution I was thinking about was adapting a longest-path search and running it from each integer in the graph, treating each integer like an edge weight.
What would be the most efficient way to find the max path?
The challenge here is to avoid summing the same nodes more than once. For that you could apply the following algorithm:
Algorithm
1. For each of the 3 directions (down, down+right, right) perform steps 2 and 3:
2. Determine the number of lines that exist in this direction. For the downward direction, this is the number of columns. For the rightward direction, this is the number of rows. For the diagonal direction, this is the number of diagonal lines, i.e. the sum of the number of rows and columns minus 1.
3. For each line do:
   - Determine the first node on that line (call it the "head"), and also set the "tail" to that same node. These two references refer to the end points of the "current" path. Also set both the sum and the path length to zero.
   - For each head node on the current line perform the following steps:
     - Add the head node's value to the sum and increase the path length.
     - If the path length is larger than the allowed maximum, subtract the tail's value from the sum, and set the tail to the node that follows it on the current line.
     - Whenever the sum is greater than the greatest sum found so far, remember it together with the path's location.
     - Set the head to the node that follows it on the current line.
4. At the end return the greatest sum and the path that generated this sum.
Code
Here is an implementation in basic JavaScript:
function maxPathSum(matrix, maxLen) {
    var row, rows, col, cols, line, lines, dir, dirs, len,
        headRow, headCol, tailRow, tailCol, sum, maxSum, path;

    rows = matrix.length;
    cols = matrix[0].length;
    maxSum = -1;
    dirs = 3; // Number of directions that paths can follow
    if (maxLen == 1 || cols == 1)
        dirs = 1; // Only need to check the downward direction
    for (dir = 1; dir <= dirs; dir++) {
        // Number of lines in this direction to try paths on
        lines = [cols, rows, rows + cols - 1][dir-1];
        for (line = 0; line < lines; line++) {
            sum = 0;
            len = 0;
            // Set starting point depending on the direction
            headRow = [0, line, line >= rows ? 0 : line][dir-1];
            headCol = [line, 0, line >= rows ? line - rows + 1 : 0][dir-1];
            tailRow = headRow;
            tailCol = headCol;
            // Traverse this line
            while (headRow < rows && headCol < cols) {
                // Lengthen the path at the head
                sum += matrix[headRow][headCol];
                len++;
                if (len > maxLen) {
                    // Shorten the path at the tail
                    sum -= matrix[tailRow][tailCol];
                    tailRow += dir % 2;
                    tailCol += dir >> 1;
                }
                if (sum > maxSum) {
                    // Found a better path
                    maxSum = sum;
                    path = '(' + tailRow + ',' + tailCol + ') - '
                         + '(' + headRow + ',' + headCol + ')';
                }
                headRow += dir % 2;
                headCol += dir >> 1;
            }
        }
    }
    // Return the maximum sum and the string representation of
    // the path that has this sum
    return { maxSum, path };
}
// Sample input
var matrix = [
    [1, 2, 3, 4, 5],
    [2, 1, 2, 2, 1],
    [3, 4, 5, 6, 5],
    [3, 3, 5, 10, 5],
    [1, 2, 5, 7, 15],
];
var best = maxPathSum(matrix, 3);
console.log(best);
Some details about the code
Be aware that row/column indexes start at 0.
The way the head and tail coordinates are incremented is based on the binary representation of the dir variable: it takes these three values (binary notation): 01, 10, 11
You can then take the first bit to indicate whether the next step in the direction is on the next column (1) or not (0), and the second bit to indicate whether it is on the next row (1) or not (0). You can depict it like this, where 00 represents the "current" node:
00 10
01 11
So we have this meaning to the values of dir:
01: walk along the column
10: walk along the row
11: walk diagonally
The code uses >>1 for extracting the first bit, and % 2 for extracting the last bit. That operation will result in a 0 or 1 in both cases, and is the value that needs to be added to either the column or the row.
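A quick check of those bit operations (a tiny Python snippet, purely illustrative):

for d in (0b01, 0b10, 0b11):
    print(f"dir={d:02b}: row step = {d % 2}, col step = {d >> 1}")
# dir=01: row step = 1, col step = 0   (walk along the column)
# dir=10: row step = 0, col step = 1   (walk along the row)
# dir=11: row step = 1, col step = 1   (walk diagonally)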
The following expression creates a 1D array and takes one of its values on-the-fly:
headRow = [0, line, line >= rows ? 0 : line][dir-1];
It is short for:
switch (dir) {
    case 1:
        headRow = 0;
        break;
    case 2:
        headRow = line;
        break;
    case 3:
        if (line >= rows)
            headRow = 0;
        else
            headRow = line;
        break;
}
Time and space complexity
The head will visit each node exactly once per direction. The tail will visit fewer nodes. The number of directions is constant, and the max path length value does not influence the number of head visits, so the time complexity is:
Θ(rows * columns)
There are no additional arrays used in this algorithm, just a few primitive variables. So the additional space complexity is:
Θ(1)
both of which are the best you could hope for.
Is it Dynamic Programming?
In a DP solution you would typically use some kind of tabulation or memoization, possibly in the form of a matrix, where each sub-result found for a particular node is input for determining the result for neighbouring nodes.
Such solutions could need Θ(rows*columns) extra space. But this problem can be solved without such (extensive) space usage. When looking at one line at a time (a row, a column or a diagonal), the algorithm has some similarities with Kadane's algorithm:
One difference is that here the choice to extend or shorten the path/subarray is not dependent on the matrix data itself, but on the given maximum length. This is also related to the fact that here all values are guaranteed to be non-negative, while Kadane's algorithm is suitable for signed numbers.
Just like with Kadane's algorithm the best solution so far is maintained in a separate variable.
Another difference is that here we need to look in three directions. But that just means repeating the same algorithm in those three directions, while carrying over the best solution found so far.
This is a very basic use of Dynamic Programming, since you don't need the tabulation or memoization techniques here. We only keep the best results in the variables sum and maxSum. That cannot be viewed as tabulation or memoization, which typically keep track of several competing results that must be compared at some point.
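For contrast, here is a minimal Kadane's algorithm in Python; note how its extend-or-restart decision is driven by the data itself (negative running sums), whereas the algorithm above shortens the path only when the length limit is exceeded:

def kadane(arr):
    # maximum subarray sum for signed numbers
    best = cur = arr[0]
    for x in arr[1:]:
        cur = max(x, cur + x)  # extend the current run, or restart at x
        best = max(best, cur)
    return best

print(kadane([1, -3, 4, -1, 2, -5]))  # 5, from the subarray [4, -1, 2]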
Use F[i][j][k] as the max path sum where the path has length k and ends at position (i, j).
F[i][j][k] can be computed from F[i-1][j][k-1] and F[i][j-1][k-1].
The answer would be the maximum value of F.
To retrieve the max path, use another table G[i][j][k] to store the last step of F[i][j][k], i.e. it comes from (i-1,j) or (i,j-1).
The constraints are that the path can only be created by going down or to the right in the matrix.
Solution complexity O(N * M * L) where:
N: number of rows
M: number of columns
L: max length of the path
int solve(int x, int y, int l) {
    if(x > N || y > M) { return -INF; }
    if(l == 1) { return matrix[x][y]; }
    if(dp[x][y][l] != -INF) { return dp[x][y][l]; } // if cached before, return the answer
    int option1 = solve(x+1, y, l-1); // take a step down
    int option2 = solve(x, y+1, l-1); // take a step right
    maxPath[x][y][l] = (option1 > option2) ? DOWN : RIGHT; // to trace the path
    return dp[x][y][l] = max(option1, option2) + matrix[x][y];
}
Example: solve(3,3,3) gives the max path sum starting from (3,3) with length 3 (i.e. 2 steps).
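For illustration, a bottom-up tabulation of this recurrence might look as follows (a Python sketch with my own naming; note that the down/right formulation of this answer permits staircase paths, not only the straight lines of the original question):

def max_path_sum(matrix, max_len):
    # F[k][i][j]: best sum of a down/right path of exactly k cells ending at (i, j)
    rows, cols = len(matrix), len(matrix[0])
    NEG = float('-inf')
    F = [[[NEG] * cols for _ in range(rows)] for _ in range(max_len + 1)]
    for i in range(rows):
        for j in range(cols):
            F[1][i][j] = matrix[i][j]
    for k in range(2, max_len + 1):
        for i in range(rows):
            for j in range(cols):
                prev = NEG
                if i > 0:
                    prev = max(prev, F[k - 1][i - 1][j])  # arrived from above
                if j > 0:
                    prev = max(prev, F[k - 1][i][j - 1])  # arrived from the left
                if prev > NEG:
                    F[k][i][j] = prev + matrix[i][j]
    # the answer is the maximum over all lengths and end positions
    return max(F[k][i][j] for k in range(1, max_len + 1)
               for i in range(rows) for j in range(cols))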
Related
We have a strictly increasing array of length n (1 < n < 500). We sum the digits of each element to create a new array, where each element's value is in the range 1 to 500. The task is to rebuild the old array from the new one. Since there might be more than one answer, we want the answer with the minimum value of the last element.
Example:
3 11 23 37 45 123 => 3 2 5 10 9 6
Now, from the second array, we can rebuild the original array in many different ways, for instance:
12 20 23 37 54 60
From all the possible combinations, we need the one with the minimum last element.
My Thoughts so far:
The brute force way is to find all possible permutations to create each number, then create all possible combinations of those numbers from the second array, and find the combination with the minimum last element. It is obvious that this is not a good choice.
Using such an algorithm (with exponential time!) we can create all possible permutations of digits that sum to a number in the second array. Note that we know the original elements were less than 500, so we can limit the depth of the search.
One way I thought of that might find the answer faster is to:
Start from the last element in the new array and find all possible numbers whose digit sum results in this element.
Then try to use the smallest of the numbers found in the last step for this element.
Now try to do the same with the second-to-last element. If the minimum value found for the second-to-last element is bigger than the one found for the last element, backtrack to the last element and try a larger value.
Do this until you get to the first element.
I think this is a greedy solution, but I'm not very sure about the time complexity. Also, I want to know whether there is a better solution for this problem, like using DP?
For simplicity, let's have our sequence 1-based; the input sequence is called x.
We will also use a utility function, which returns the sum of the digits of a given number:
int sum(int x) {
    int result = 0;
    while (x > 0) {
        result += x % 10;
        x /= 10;
    }
    return result;
}
Let's assume that we are at index idx and try to set there some number called value (given that the sum of digits of value is x[idx]). If we do so, then what could we say about the previous number in the sequence? It should be strictly less than value.
So we already have a state for a potential dp approach - [idx, value], where idx is the index where we are currently at and value denotes the value we are trying to set on this index.
If the dp table holds boolean values, we will know we have found an answer if we have found a suitable number for the first number in the sequence. Therefore, if there is a path starting from the last row in the dp table that ends at row 0, then we'll know we have found an answer, and we can then simply restore it.
Our recurrence function will be something like this:
f(idx, value) = OR { f(idx - 1, value') : value' < value }, provided sumOfDigits(value) = x[idx]
f(0, *) = true
Also, in order to restore the answer, we need to track the path. Once we set any dp[idx][value] cell to true, we can save the value' to which we would like to jump in the previous table row.
Now let's code that one. I hope the code is self-explanatory:
boolean[][] dp = new boolean[n + 1][501];
int[][] prev = new int[n + 1][501];
for (int i = 0; i <= 500; i++) {
    dp[0][i] = true;
}
for (int idx = 1; idx <= n; idx++) {
    for (int value = 1; value <= 500; value++) {
        if (sum(value) == x[idx]) {
            for (int smaller = 0; smaller < value; smaller++) {
                dp[idx][value] |= dp[idx - 1][smaller];
                if (dp[idx][value]) {
                    prev[idx][value] = smaller;
                    break;
                }
            }
        }
    }
}
The prev table only keeps information about the smallest value', which we can use as the previous element before our idx in the resulting sequence.
Now, in order to restore the sequence, we can start from the last element. We would like it to be minimal, so we can find the first one that has dp[n][value] = true. Once we have such element, we then use the prev table to track down the values up to the first one:
int[] result = new int[n];
int idx = n - 1;
for (int i = 0; i <= 500; i++) {
    if (dp[n][i]) {
        int row = n, col = i;
        while (row > 0) {
            result[idx--] = col;
            col = prev[row][col];
            row--;
        }
        break;
    }
}
for (int i = 0; i < n; i++) {
    out.print(result[i]);
    out.print(' ');
}
If we apply this on an input sequence:
3 2 5 10 9 6
we get
3 11 14 19 27 33
The time complexity is O(n * m * m), where n is the number of elements we have and m is the maximum possible value that an element could hold.
The space complexity is O(n * m) as this is dominated by the size of the dp and prev tables.
We can use a greedy algorithm: proceed through the array in order, setting each element to the least value that is greater than the previous element and has digits with the appropriate sum. (We can just iterate over the possible values and check the sums of their digits.) There's no need to consider any greater value than that, because increasing a given element will never make it possible to decrease a later element. So we don't need dynamic programming here.
We can calculate the sum of the digits of an integer m in O(log m) time, so the whole solution takes O(b log b) time, where b is the upper bound (500 in your example).
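A minimal Python sketch of this greedy (the function names are mine; 500 is the bound from the question):

def rebuild(digit_sums, bound=500):
    # each element takes the least value above its predecessor
    # whose digits sum to the required target
    def dsum(m):
        s = 0
        while m:
            s += m % 10
            m //= 10
        return s

    result = []
    prev = 0
    for target in digit_sums:
        value = prev + 1
        while value <= bound and dsum(value) != target:
            value += 1
        if value > bound:
            return None  # no valid reconstruction within the bound
        result.append(value)
        prev = value
    return result

print(rebuild([3, 2, 5, 10, 9, 6]))  # [3, 11, 14, 19, 27, 33], matching the DP answer above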
So this problem was asked in a quiz and the problem goes like this:
You are given an array 'a' with elements ranging from 1 to 10^6, and the size of the array can be at most 10^5. Now we are asked to find the number of subarrays with the same 'degree' as the original array. The degree of an array is defined as the frequency of the maximum occurring element in the array. Multiple elements could have the same frequency.
I was stuck on this problem for like an hour but couldn't think of any solution. How do I solve it?
Sample Input:
first input: 1,2,2,3,1
first output: 2
second input: 1,1,2,1,2,2
second output: 4
The element that occurs most frequently is called the mode; this problem defines the degree as that frequency count. Your tasks are:
Identify all of the mode values.
For each mode value, find the index range of that value. For instance, in the array
[1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]
You have three modes (1 2 5) with a degree of 3. The index ranges are
1 - 0:3
2 - 2:8
5 - 10:12
You need to count all index ranges (subarrays) that include at least one of those three ranges.
I've tailored this example to have both basic cases: modes that overlap, and those that do not. Note that containment is a moot point: if you have an array where one mode's range contains another's:
[0, 1, 1, 1, 0, 0]
You can ignore the outer one altogether: any subarray that contains the full range of 0 will also contain the range of 1.
ANALYSIS
A subarray is defined by two numbers, the starting and ending indices. Since we must have 0 <= start <= end < len(array), this is the "handshake" problem between array bounds. We have N(N+1)/2 possible subarrays.
For 10**5 elements, brute force from here is impractical: for each of the roughly N^2/2 pairs of indices, you would check whether that range contains any of the mode ranges. However, you can easily cut that down with interval recognition.
ALGORITHM
Step through the mode ranges, left to right. First, count all subranges that include the first mode range [0:3]. There is only 1 possible start [0] and 10 possible ends [3:12]; that's 10 subarrays.
Now move to the second mode range, [2:8]. You need to count subarrays that include this, but exclude those you've already counted. Since there's an overlap, you need a starting point later than 0, or an ending point before 3. This second clause is not possible with the given range.
Thus, you consider start [1:2], end [8:12]. That's 2 * 5 more subarrays.
For the third range [10:12] (no overlap), you need a starting point that does not include any other subrange. This means that any starting point in [3:10] will do. Since there's only one possible endpoint, you have 8 * 1, or 8 more subarrays.
Can you turn this into something formal?
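One possible formalization, as a rough Python sketch of the counting scheme above (names are mine; it counts subarrays, following this answer's reading of the problem):

from collections import Counter

def count_covering_subarrays(arr):
    n = len(arr)
    freq = Counter(arr)
    degree = max(freq.values())
    # index range [first, last] of every mode value
    first, last = {}, {}
    for i, v in enumerate(arr):
        first.setdefault(v, i)
        last[v] = i
    ranges = [(first[v], last[v]) for v, c in freq.items() if c == degree]
    # drop any range that contains another mode range (the inner one suffices)
    minimal = sorted(r for r in ranges
                     if not any(q != r and r[0] <= q[0] and q[1] <= r[1] for q in ranges))
    # charge each subarray to the leftmost mode range it contains:
    # starts in (prev_start, s], ends in [e, n-1]
    total, prev_start = 0, -1
    for s, e in minimal:
        total += (s - prev_start) * (n - e)
        prev_start = s
    return total

print(count_covering_subarrays([1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]))  # 28 = 10 + 10 + 8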
Taking reference from LeetCode:
https://leetcode.com/problems/degree-of-an-array/solution/
class Solution {
    public int findShortestSubArray(int[] nums) {
        Map<Integer, Integer> left = new HashMap(),
            right = new HashMap(), count = new HashMap();
        for (int i = 0; i < nums.length; i++) {
            int x = nums[i];
            if (left.get(x) == null) left.put(x, i);
            right.put(x, i);
            count.put(x, count.getOrDefault(x, 0) + 1);
        }
        int ans = nums.length;
        int degree = Collections.max(count.values());
        for (int x: count.keySet()) {
            if (count.get(x) == degree) {
                ans = Math.min(ans, right.get(x) - left.get(x) + 1);
            }
        }
        return ans;
    }
}
I want to divide a graph with N weighted vertices and N-1 edges into three parts such that the maximum of the sums of the vertex weights in each of the parts is minimized. This is the actual problem I am trying to solve: http://www.iarcs.org.in/inoi/contests/jan2006/Advanced-1.php
I considered the following method
/* Edges are stored in an array E, and also in an adjacency matrix for depth first search.
   Every edge in E has two attributes a and b which are the nodes of the edge */
min-max = infinity
for i -> 0 to length(E):
    for j -> i+1 to length(E):
        /* Call depth first search on the nodes of both the edges E[i] and E[j];
           the depth first search returns the sum of weights of the vertices it visits;
           we keep track of the maximum weight returned by dfs */
        Adjacency-matrix[E[i].a][E[i].b] = 0;
        Adjacency-matrix[E[j].a][E[j].b] = 0;
        max = 0
        temp = dfs(E[i].a)
        if temp > max then max = temp
        temp = dfs(E[i].b)
        if temp > max then max = temp
        temp = dfs(E[j].a)
        if temp > max then max = temp
        temp = dfs(E[j].b)
        if temp > max then max = temp
        if max < min-max
            min-max = max
        Adjacency-matrix[E[i].a][E[i].b] = 1;
        Adjacency-matrix[E[j].a][E[j].b] = 1;

/* The depth first search is called four times, but one call will terminate immediately
   if we keep track of the visited vertices, because there are only three components */
/* After the outer loop terminates, what we have in min-max will be the answer */
The above algorithm takes O(n^3) time: since the number of edges is n-1, the two nested loops run O(n^2) times, and the dfs visits each vertex only once, which takes O(n) time.
But the problem is that n can be up to 3000, and O(n^3) time is not good enough for this problem. Is there any other method which will solve the question in the link faster than O(n^3)?
EDIT: I implemented @BorisStrandjev's algorithm in C. It gave me a correct answer for the test input in the question, but for all other test inputs it gives a wrong answer. Here is a link to my code on ideone: http://ideone.com/67GSa2. The output here should be 390 but the program prints 395.
I am trying to find out if I have made any mistake in my code, but I don't see any. Can anyone please help me here? The answers my code gave are very close to the correct answer, so is there anything more to the algorithm?
EDIT 2: In the following graph (figure omitted):
@BorisStrandjev, your algorithm will choose i as 1, j as 2 in one of the iterations, but then the third part (3, 4) is invalid.
EDIT 3
I finally found the mistake in my code: instead of V[i] storing the sum of i and all its descendants, it stored the sum of V[i] and its ancestors; otherwise it would have solved the above example correctly. Thanks to all of you for your help.
Yes, there is a faster method.
I will need a few auxiliary matrices, and I will leave their creation and correct initialization to you.
First of all, plant the tree - that is, make the graph directed. Calculate the array VAL[i] for each vertex - the number of passengers for the vertex and all its descendants (remember we planted the tree, so now this makes sense). Also calculate the boolean matrix desc[i][j] that will be true if vertex i is a descendant of vertex j. Then do the following:
best_val = n
for i in 1...n
    for j in i + 1...n
        val_of_split = 0
        val_of_split_i = VAL[i]
        val_of_split_j = VAL[j]
        if desc[i][j] val_of_split_j -= VAL[i] // subtract all the nodes that go to i
        if desc[j][i] val_of_split_i -= VAL[j]
        val_of_split = max(val_of_split, val_of_split_i)
        val_of_split = max(val_of_split, val_of_split_j)
        val_of_split = max(val_of_split, n - val_of_split_i - val_of_split_j)
        best_val = min(best_val, val_of_split)
After the execution of this cycle the answer will be in best_val. The algorithm is clearly O(n^2); you just need to figure out how to calculate desc[i][j] and VAL[i] within this complexity, but it is not so complex a task - I think you can figure it out yourself.
EDIT: Here I will include the code for the whole problem in pseudocode. I deliberately did not include it before the OP had tried and solved the problem by himself:
int p[n] := // initialized from the input - the price of the node itself
adjacency_list neighbors := // initialized to store the graph adjacency list
int VAL[n] := { 0 } // the price of a node and all its descendants
bool desc[n][n] := { false } // desc[i][j] - whether i is a descendant of j
bool visited[n] := { false } // whether the dfs visited the node already
stack parents := { empty-stack } // the stack of nodes visited during dfs

dfs(currentVertex) {
    VAL[currentVertex] = p[currentVertex]
    visited[currentVertex] = true
    for vertex : parents // a bit extended stack definition supporting iteration
        desc[currentVertex][vertex] = true
    parents.push(currentVertex)
    for vertex : neighbors[currentVertex]
        if visited[vertex] continue
        dfs(vertex)
        VAL[currentVertex] += VAL[vertex]
    parents.pop()
}
calculate_best() {
    dfs(0)
    best_val = n
    for i in 0...(n - 1)
        for j in i + 1...(n - 1)
            val_of_split = 0
            val_of_split_i = VAL[i]
            val_of_split_j = VAL[j]
            if desc[i][j] val_of_split_j -= VAL[i]
            if desc[j][i] val_of_split_i -= VAL[j]
            val_of_split = max(val_of_split, val_of_split_i)
            val_of_split = max(val_of_split, val_of_split_j)
            val_of_split = max(val_of_split, n - val_of_split_i - val_of_split_j)
            best_val = min(best_val, val_of_split)
    return best_val
}
And the best split will be {descendants of i} \ {descendants of j}, {descendants of j} \ {descendants of i} and {all nodes} \ {descendants of i} U {descendants of j}.
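For reference, here is a compact Python translation of the pseudocode above (a sketch with my own naming: p holds the vertex weights, adj is an adjacency list, and vertex 0 is taken as the root, so the two cut points range over non-root vertices):

import sys
sys.setrecursionlimit(10000)  # the dfs recurses once per vertex

def best_three_way_split(p, adj):
    n = len(p)
    VAL = [0] * n                           # weight of a vertex plus all its descendants
    desc = [[False] * n for _ in range(n)]  # desc[i][j]: i is a descendant of j
    visited = [False] * n
    parents = []                            # vertices on the current dfs path

    def dfs(v):
        VAL[v] = p[v]
        visited[v] = True
        for anc in parents:
            desc[v][anc] = True
        parents.append(v)
        for w in adj[v]:
            if not visited[w]:
                dfs(w)
                VAL[v] += VAL[w]
        parents.pop()

    dfs(0)
    total = sum(p)
    best = total
    for i in range(1, n):          # cut points are non-root vertices
        for j in range(i + 1, n):
            vi, vj = VAL[i], VAL[j]
            if desc[i][j]:
                vj -= VAL[i]       # i's subtree belongs to part i, not part j
            if desc[j][i]:
                vi -= VAL[j]
            best = min(best, max(vi, vj, total - vi - vj))
    return best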
You can use a combination of Binary Search & DFS to solve this problem.
Here's how I would proceed:
Calculate the total weight of the graph, and also find the heaviest edge in the graph. Let them be Sum and MaxEdge respectively.
Now we have to run a binary search over the range [MaxEdge, Sum].
In each search iteration, middle = (start + end) / 2. Now, pick a start node and perform a DFS such that the sum of edges traversed in the sub-graph is as close to 'middle' as possible, but keep this sum less than middle. This will be one sub-graph. In the same iteration, now pick another node which is unmarked by the previous DFS. Perform another DFS in the same way. Likewise, do it once more, because we need to break the graph into 3 parts.
The min. weight amongst the 3 sub-graphs calculated above is the solution from this iteration.
Keep running this binary search until its start variable exceeds its end variable.
The max of all the mins obtained in step 4 is your answer.
You can do extra book-keeping in order to get the 3-sub-graphs.
Order complexity : N log(Sum) where Sum is the total weight of the graph.
I just noticed that you have talked about weighted vertices, and not edges. In that case, just treat edges as vertices in my solution. It should still work.
EDIT 4: THIS WON'T WORK!!!
If you process the nodes in the link in the order 3,4,5,6,1,2, after processing 6, (I think) you'll have the following sets: {{3,4},{5},{6}}, {{3,4,5},{6}}, {{3,4,5,6}}, with no simple way to split them up again.
I'm just leaving this answer here in case anyone else was thinking of a DP algorithm.
It might work to look at all the already processed neighbours in the DP algorithm.
I'm thinking a Dynamic Programming algorithm, where the matrix is (item x number of sets)
n = number of sets
k = number of vertices

// row 0 represents 0 elements included
A[0, 0] = 0
for (s = 1:n)
    A[0, s] = INFINITY
for (i = 1:k)
    for (s = 0:n)
        B = A[i-1, s] with i inserted into the minimum one of its neighbouring sets
        A[i, s] = min(A[i-1, s-1], B) // A[i-1, s-1] = INFINITY if s-1 < 0
EDIT: Explanation of DP:
This is a reasonably basic Dynamic Programming algorithm. If you need a better explanation, you should read up on it some more, it's a very powerful tool.
A is a matrix. The row i represents a graph with all vertices up to i included. The column c represents the solution with number of sets = c.
So A[2,3] would give the best result for a graph containing item 0, item 1 and item 2, with 3 sets, thus each in its own set.
You then start at item 0, calculate the row for each number of sets (the only valid one is number of sets = 1), then do item 1 with the above formula, then item 2, etc.
A[a, b] is then the optimal solution with all vertices up to a included and b number of sets. So you'll just return A[k, n] (the one that has all vertices included and the target number of sets).
EDIT 2: Complexity
O(k*n*b) where b is the branching factor of a node (assuming you use an adjacency list).
Since n = 3, this is O(3*k*b) = O(k*b).
EDIT 3: Deciding which neighbouring set a vertex should be added to
Keep n arrays of k elements each in a union find structure, with each set pointing to the sum for that set. For each new row, to determine which sets a vertex can be added to, we use its adjacency list and look-up the set and value of each of its neighbours. Once we find the best option, we can just add that element to the applicable set and increment its sum by the added element's value.
You'll notice the algorithm only looks down 1 row, so we only need to keep track of the last row (not store the whole matrix), and can modify the previous row's n arrays rather than copying them.
Given an array A with n integers. In one turn, one can apply the following operation to any consecutive subarray A[l..r]: assign to all A[i] (l <= i <= r) the median of the subarray A[l..r].

Let max be the maximum integer of A. We want to know the minimum number of operations needed to change A into an array of n integers, each with value max.

For example, let A = [1, 2, 3]. We want to change it to [3, 3, 3]. We can do this in two operations: first for subarray A[2..3] (after that A equals [1, 3, 3]), then the operation on A[1..3].

Also, the median is defined for some array A as follows. Let B be the same array A, but sorted in non-decreasing order. The median of A is B[m] (1-based indexing), where m equals (n div 2) + 1. Here 'div' is an integer division operation. So, for a sorted array with 5 elements, the median is the 3rd element, and for a sorted array with 6 elements, it is the 4th element.
Since the maximum value of N is 30, I thought of brute-forcing the result. Could there be a better solution?
You can double the size of the subarray containing the maximum element in each iteration. After the first iteration, there is a subarray of size 2 containing the maximum. Then apply your operation to a subarray of size 4, containing those 2 elements, giving you a subarray of size 4 containing the maximum. Then apply to a size 8 subarray and so on. You fill the array in log2(N) operations, which is optimal. If N is 30, five operations is enough.
This is optimal in the worst case (i.e. when only one element is the maximum), since it sets the highest possible number of elements in each iteration.
Update 1: I noticed I messed up the 4s and 8s a bit. Corrected.
Update 2: here's an example. Array size 10, start state:
[6 1 5 9 3 2 0 7 4 8]
To get two nines, run op on subarray of size two containing the nine. For instance A[4…5] gets you:
[6 1 5 9 9 2 0 7 4 8]
Now run on size four subarray that contains 4…5, for instance on A[2…5] to get:
[6 9 9 9 9 2 0 7 4 8]
Now on subarray of size 8, for instance A[1…8], get:
[9 9 9 9 9 9 9 9 4 8]
Doubling now would get us 16 nines, but we have only 10 positions, so round off with A[1…10] to get:
[9 9 9 9 9 9 9 9 9 9]
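The trace above is easy to check by simulating the operation; a small Python helper (1-based, inclusive bounds, to match the example):

def apply_median(A, l, r):
    # assign the median of A[l..r] to every element of that subarray
    sub = sorted(A[l-1:r])
    m = len(sub) // 2 + 1  # 1-based median index: (length div 2) + 1
    A[l-1:r] = [sub[m-1]] * len(sub)

A = [6, 1, 5, 9, 3, 2, 0, 7, 4, 8]
apply_median(A, 4, 5)   # [6, 1, 5, 9, 9, 2, 0, 7, 4, 8]
apply_median(A, 2, 5)   # [6, 9, 9, 9, 9, 2, 0, 7, 4, 8]
apply_median(A, 1, 8)   # [9, 9, 9, 9, 9, 9, 9, 9, 4, 8]
apply_median(A, 1, 10)  # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]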
Update 3: since this is only optimal in the worst case, it is actually not an answer to the original question, which asks for a way of finding the minimal number of operations for all inputs. I misinterpreted the sentence about brute forcing to be about brute forcing with the median operations, rather than about finding the minimum sequence of operations.
This is a problem from a CodeChef Long Contest. Since the contest is already over, @awkwardiom, I am pasting the problem setter's approach (Source: CC Contest Editorial Page):
"Any state of the array can be represented as a binary mask with each bit 1 meaning that the corresponding number is equal to the max and 0 otherwise. You can run a DP with state R[mask] and O(n) transitions. You can prove (or just believe) that the number of states will not be big, of course if you run a good DP. The state of our DP will be the mask of numbers that are equal to max. Of course, it makes sense to use the operation only on such a subarray [l; r] that the number of 1-bits is at least as large as the number of 0-bits in the submask [l; r], because otherwise nothing will change. Also you should notice that if the left bound of your operation is l, it is good to make the operation only with the maximal possible r (this gives a number of transitions equal to O(n)). It was also useful for C++ coders to use a map structure to represent all states."
The C/C++ code is:
#include <cstdio>
#include <iostream>
using namespace std;

int bc[1<<15];
const int M = (1<<15) - 1;

void setMin(int& ret, int c)
{
    if(c < ret) ret = c;
}

void doit(int n, int mask, int currentSteps, int& currentBest)
{
    int numMax = bc[mask>>15] + bc[mask&M];
    if(numMax == n) {
        setMin(currentBest, currentSteps);
        return;
    }
    if(currentSteps + 1 >= currentBest)
        return;
    if(currentSteps + 2 >= currentBest)
    {
        if(numMax * 2 >= n) {
            setMin(currentBest, 1 + currentSteps);
        }
        return;
    }
    if(numMax < (1<<currentSteps)) return;
    for(int i=0;i<n;i++)
    {
        int a = 0, b = 0;
        int c = mask;
        for(int j=i;j<n;j++)
        {
            c |= (1<<j);
            if(mask&(1<<j)) b++;
            else a++;
            if(b >= a) {
                doit(n, c, currentSteps + 1, currentBest);
            }
        }
    }
}

int v[32];

void solveCase() {
    int n;
    scanf(" %d", &n);
    int maxElement = 0;
    for(int i=0;i<n;i++) {
        scanf(" %d", v+i);
        if(v[i] > maxElement) maxElement = v[i];
    }
    int mask = 0;
    for(int i=0;i<n;i++) if(v[i] == maxElement) mask |= (1<<i);
    int ret = 0, p = 1;
    while(p < n) {
        ret++;
        p *= 2;
    }
    doit(n, mask, 0, ret);
    printf("%d\n", ret);
}

int main() {
    for(int i=0;i<(1<<15);i++) {
        bc[i] = bc[i>>1] + (i&1);
    }
    int cases;
    scanf(" %d", &cases);
    while(cases--) solveCase();
}
The problem setter's approach has exponential complexity. It is pretty good for N=30, but not so for larger sizes. I think it's more interesting to find a polynomial-time solution. And I found one, with O(N^4) complexity.
This approach uses the fact that an optimal solution starts with some group of consecutive maximal elements and extends only this single group until the whole array is filled with maximal values.
To prove this fact, take 2 starting groups of consecutive maximal elements and extend each of them in an optimal way until they merge into one group. Suppose that group 1 needs X turns to grow to size M, group 2 needs Y turns to grow to the same size M, and on turn X + Y + 1 these groups merge. The result is a group of size at least M * 4. Now instead of turn Y for group 2, make an additional turn X + 1 for group 1. In this case the group sizes are at least M * 2 and at most M / 2 (even if we count initially maximal elements that might be included in step Y). After this change, on turn X + Y + 1 the merged group size is at least M * 4 as a result of the first group's extension alone; add to this at least one element from the second group. So extending a single group produces a larger group in the same number of steps (and if Y > 1, it even requires fewer steps). Since this works for equal group sizes (M), it will work even better for non-equal groups. This proof may be extended to the case of several groups (more than two).
To work with a single group of consecutive maximal elements, we need to keep track of only two values: the starting and ending positions of the group. This means it is possible to use a triangular matrix to store all possible groups, which allows the use of a dynamic programming algorithm.
Pseudo-code:
For each group of consecutive maximal elements in the original array:
    Mark the corresponding element in the matrix and clear the other elements
    For each matrix diagonal, starting with the one containing this element:
        For each marked element in this diagonal:
            Retrieve the current number of turns from this matrix element
            (use the indexes of this matrix element to initialize p1 and p2)
            p2 = end of the group
            p1 = start of the group
            Decrease p1 while it is possible to keep the median at the maximum value
            (now all values between p1 and p2 are assumed to be maximal)
            While p2 < N:
                Check if the number of maximal elements in the array is >= N/2
                If this is true, compare the current number of turns with the best
                result and update it if necessary
                (an additional matrix with the number of maximal values between each
                pair of points may be used to count the elements to the left of p1
                and to the right of p2)
                Look at position [p1, p2] in the matrix. Mark it, and if it contains
                a larger number of turns, update it
                Repeat:
                    Increase p1 while it points to a maximal value
                    Increment p1 (to skip one non-maximum value)
                    Increase p2 while it is possible to keep the median at the maximum value
                while the median is not at the maximum value
To keep the algorithm simple, I didn't mention special cases when the group starts at position 0 or ends at position N, skipped initialization, and didn't make any optimizations.
Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing all the numbers with minimal complexity.
The array can have duplicates; let's assume the set of numbers does not. It's not ordered - the subarray may contain the set of numbers in any order.
For example:
Array: 1 2 5 8 7 6 2 6 5 3 8 5
Numbers: 5 7
Then the shortest subarray is obviously Array[2:5] (python notation).
Also, what would you do if you want to avoid sorting the array for some reason (a la online algorithms)?
Proof of a linear-time solution
I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)
The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.
EDIT: Fixed a glitch in the base case.
Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no range [0, k < j'] that contains all interesting numbers, so we don't need to worry about those further. Now find the largest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.
To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.
Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.
In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:
# x[0..m-1] is the array of interesting numbers.
# Load them into a hash/dictionary:
For i from 0 to m-1:
    isInteresting[x[i]] = 1

i = 0
nDistinctInteresting = 0
minRange = infinity
For j from 0 to n-1:
    If count[a[j]] == 0 and isInteresting[a[j]]:
        nDistinctInteresting++
    count[a[j]]++
    If nDistinctInteresting == m:
        # We are in phase 2: contract the left side as far as possible
        While count[a[i]] > 1 or not isInteresting[a[i]]:
            count[a[i]]--
            i++
        If j - i < minRange:
            minRange = j - i
            (minI, minJ) = (i, j)
count[] and isInteresting[] are hashes/dictionaries (or plain arrays if the numbers involved are small).
This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.
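A sketch of that window approach in Python (names are mine; it returns inclusive start/end indices, or None if no window covers the set):

def shortest_window(arr, interesting):
    need = set(interesting)
    counts = {}      # occurrences of each interesting number inside the window
    have = 0         # how many distinct interesting numbers the window covers
    best = None
    left = 0
    for right, val in enumerate(arr):
        if val in need:
            counts[val] = counts.get(val, 0) + 1
            if counts[val] == 1:
                have += 1
        while have == len(need):  # window covers everything: contract from the left
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            if arr[left] in need:
                counts[arr[left]] -= 1
                if counts[arr[left]] == 0:
                    have -= 1
            left += 1
    return best

print(shortest_window([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))  # (2, 4)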
Say the array has n elements, and the set has m elements.
Sort the array, noting the reverse index (position in the original array)
// O(n log n) time
for each element in the given set
    find it in the array
    // O(m log n) time - log n for binary search, m times
    keep track of the minimum and maximum index for each found element
min - max defines your range
Total time complexity: O((m + n) log n)
This solution definitely does not run in O(n) time as suggested by some of the pseudocode above; however, it is real (Python) code that solves the problem, and by my estimates it runs in O(n^2):
def small_sub(A, B):
    len_A = len(A)
    len_B = len(B)
    sub_A = []
    sub_size = -1
    dict_b = {}
    for elem in B:
        if elem in dict_b:
            dict_b[elem] += 1
        else:
            dict_b.update({elem: 1})
    for i in range(0, len_A - len_B + 1):
        if A[i] in dict_b:
            temp_size, temp_sub = find_sub(A[i:], dict_b.copy())
            if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
                sub_A = temp_sub
                sub_size = temp_size
    return sub_size, sub_A

def find_sub(A, dict_b):
    index = 0
    for i in A:
        if len(dict_b) == 0:
            break
        if i in dict_b:
            dict_b[i] -= 1
            if dict_b[i] <= 0:
                del(dict_b[i])
        index += 1
    if len(dict_b) > 0:
        return -1, {}
    else:
        return index, A[0:index]
Here's how I solved this problem in linear time using collections.Counter objects
from collections import Counter

def smallest_subsequence(stream, search):
    if not search:
        return []  # the shortest subsequence containing nothing is nothing
    stream_counts = Counter(stream)
    search_counts = Counter(search)
    minimal_subsequence = None
    start = 0
    end = 0
    subsequence_counts = Counter()
    while True:
        # while subsequence_counts doesn't have enough elements to cancel out every
        # element in search_counts, take the next element from search
        while search_counts - subsequence_counts:
            if end == len(stream):  # if we've reached the end of the list, we're done
                return minimal_subsequence
            subsequence_counts[stream[end]] += 1
            end += 1
        # while subsequence_counts has enough elements to cover search_counts, keep
        # removing from the start of the sequence
        while not search_counts - subsequence_counts:
            if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
                minimal_subsequence = stream[start:end]
            subsequence_counts[stream[start]] -= 1
            start += 1

print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
# [5, 8, 7]
Java solution (Subarray is assumed to be a simple holder class with int fields start and end):
static Subarray smallestSubarrayCovering() {
    List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
    Set<String> keywords = new HashSet<>(Arrays.asList("a", "b"));
    Subarray result = new Subarray(-1, -1);
    Map<String, Integer> keyWordFreq = new HashMap<>();
    int numKeywords = keywords.size();
    // slide the window to contain all the keywords, starting with [0,0]
    for (int left = 0, right = 0; right < paragraph.size(); right++) {
        // expand right to contain all the keywords
        String currRight = paragraph.get(right);
        if (keywords.contains(currRight)) {
            keyWordFreq.put(currRight, keyWordFreq.get(currRight) == null ? 1 : keyWordFreq.get(currRight) + 1);
        }
        // loop enters when all the keywords are present in the current window;
        // contract left while all the keywords are still present
        while (keyWordFreq.size() == numKeywords) {
            String currLeft = paragraph.get(left);
            if (keywords.contains(currLeft)) {
                // remove from the map if it's the last occurrence, so that the loop exits
                if (keyWordFreq.get(currLeft).equals(1)) {
                    // now check if the current subarray is the smallest
                    if ((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)) {
                        result = new Subarray(left, right);
                    }
                    keyWordFreq.remove(currLeft);
                } else {
                    // else reduce the frequency
                    keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
                }
            }
            left++;
        }
    }
    return result;
}