Number of steps to transform X to Y. (Integers) - c

So I am confused on how this recursive function works. I don't understand how this actually produces the answer. The question states:
Write a recursive function that returns the minimum number of steps
necessary to transform X to Y. If X > Y, return 1000000000, to indicate that no solution exists. (For example, if X = 13 and Y = 28, the correct response would be 2 - first you would add 1 to 13 to obtain 14, then multiply 14 by 2 to obtain 28.) Feel free to call the provided function
This is the solution:
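// Assumed definition; the exercise presumably provides this constant.
#define NO_SOLUTION 1000000000
// Returns the smaller of x and y.
int min(int x, int y) {
    if (x < y) return x;
    return y;
}
// Returns the minimum number of steps to transform x into y, or
// 1000000000 to indicate no solution.
int minSteps(int x, int y) {
    if (x > y) return NO_SOLUTION;      // overshot: this path is a dead end
    if (x == y) return 0;               // nothing left to do
    int mult = 1 + minSteps(2*x, y);    // first step is "multiply by 2"
    int add = 1 + minSteps(x+1, y);     // first step is "add 1"
    return min(add, mult);
}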
If someone can please explain the solution that will be great. Thanks!

At the core of many problems that can be solved with recursion lies the principle of reducing the original problem to a smaller one and continuing the process until it shrinks to a known one.
This problem suits such an approach perfectly.
Your answer is a series of arithmetic operations that convert x to y. I.e., something like this:
x ? a ? b ? c ? ... ? y
Where ? denotes either multiplication by 2 or addition of 1; and a,b,c... represent the intermediate results after applying the operation on a previous result. For example, transformation of 5 to 22 can be described this way:
5 (*2) 10 (+1) 11 (*2) 22
Now let's get back to the reduction principle. Starting with a given x, we need to choose the first step. It can be either *2 OR[1] +1; we don't know which yet, so we need to check both. In the case of *2, x becomes 2x, and in the case of +1, x becomes x+1. And voila, we have progressed one step and reduced the problem! Now we have 2 smaller problems to solve, one for 2x and one for x+1, and we take the minimum of their results.
Since we're counting steps, we create 2 distinct counters (one for each type of operation taken) and add 1 to each of them (since we have already performed one step). To compute the actual value of each counter we need to solve the two smaller problems, and to solve them we call the function recursively with the new input (twice, once per input).
The algorithm continues this way, reducing the problem each time until it reaches a stop condition, which is either x == y (a valid transformation) or x > y (an invalid transformation). In the case of x == y exactly 0 steps are required and the recursion stops, causing the call stack to unwind and fill in the actual value of the counter that started that branch. In the case of x > y the result is 1000000000 (which is assumed to be too large to be an actual result, so that sum is discarded because it is larger than the sum from the other branch).
This process is usually easier to understand by visualizing the recursion tree (see @DavidBowling's answer, for example. Err, deleted for some reason...).
[1] In this problem the distinction is very clear, but sometimes the distinction between the operations can be vague. It's very important to dissect the problem into a number of smaller ones without any overlap between them.
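To see how the two counters are filled in as the call stack unwinds, here is a minimal Python sketch of the same recursion that also returns the chosen path. The name min_steps_with_path is made up for this illustration, and the sketch assumes x >= 1 (with x = 0, doubling would never terminate).

```python
NO_SOLUTION = 1_000_000_000  # sentinel value taken from the question


def min_steps_with_path(x, y):
    """Return (steps, path) for the cheapest way to turn x into y.

    Hypothetical helper for illustration; assumes x >= 1.
    """
    if x > y:
        return NO_SOLUTION, None          # dead end: we overshot y
    if x == y:
        return 0, [x]                     # valid transformation, 0 steps left
    mult_steps, mult_path = min_steps_with_path(2 * x, y)
    add_steps, add_path = min_steps_with_path(x + 1, y)
    # Each branch costs one step more than its sub-problem.
    # For x < y the "+1" branch always has a solution, so add_path is a list.
    if add_steps <= mult_steps:
        return 1 + add_steps, [x] + add_path
    return 1 + mult_steps, [x] + mult_path


print(min_steps_with_path(13, 28))   # the example from the question
print(min_steps_with_path(5, 22))    # the 5 -> 22 example above
```

The recursion is exponential in the number of steps, so this is only meant for small inputs like the ones in the question.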

This is actually a nice example to help teach recursion. Not sure if I can explain it, but will take a shot. To be clear, there are only two kinds of steps: either doubling X, or adding 1 to X.
The best way to understand this is to follow an example through the code.
I've deleted the rest of my answer for now. Playing with this in a debugger. It's actually quite elegant, but I don't feel I can yet explain exactly how it's working before playing with it some more.
Still not up to explaining it, but take a look at this:
minSteps(16, 28) = 12
minSteps(15, 28) = 13
minSteps(14, 28) = 1
minSteps(13, 28) = 2
minSteps(12, 28) = 3
minSteps(11, 28) = 4
minSteps(10, 28) = 5
minSteps( 9, 28) = 6
minSteps( 8, 28) = 7
minSteps( 7, 28) = 2
minSteps( 6, 28) = 3
minSteps( 5, 28) = 4
minSteps( 4, 28) = 5
minSteps( 3, 28) = 4
minSteps( 2, 28) = 5
minSteps( 1, 28) = 6
Notice in particular:
minSteps( 5, 28) = 4 // x+1 twice (5->7), then x*2 twice (7->14->28)
minSteps( 4, 28) = 5 // x+1 thrice (4->7), then x*2 twice (7->14->28)
minSteps( 3, 28) = 4 // x*2 (3->6), then x+1 (6->7), then x*2 twice (7->14->28)
minSteps( 2, 28) = 5 // x+1 (2->3), then x*2 (3->6), then x+1 (6->7), then x*2 twice (7->14->28)
To me, it seems relatively easy to see how the algorithm gets the right answer for any case of just multiplying by 2 repeatedly, or any case of first adding 1 some number of times and then multiplying by 2 repeatedly. That pattern gives the minimum number of steps in almost every case above.
But the cases of minSteps(3,28) and minSteps(2,28) are really quite interesting, because the minimum number of steps for those cases involves switching back and forth between x*2 and x+1. And yet the algorithm gets it right.
There is actually nothing special about these cases. The answer is this: the process is always binary. At each step the problem is split into both x*2 and x+1, and the same happens again at each following step. The key is that in this way the algorithm actually tests EVERY POSSIBLE PATH (every possible combination of x+1s and x*2s) and takes the minimum over all of them. It was not obvious to me at first that it was trying every path. Of course, it abandons any path as soon as it exceeds Y.
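The "every possible path" claim can be made visible with a small Python sketch that lists the paths instead of only counting steps. The generator all_paths is a hypothetical helper, not part of the original solution, and it assumes x >= 1.

```python
def all_paths(x, y):
    """Yield every sequence of '+1' / '*2' operations that turns x into y.

    Hypothetical helper; assumes x >= 1. A path is dropped as soon as
    it exceeds y, just like the x > y check in minSteps.
    """
    if x > y:
        return                       # abandoned path: yields nothing
    if x == y:
        yield []                     # reached the target with no more steps
        return
    for rest in all_paths(x + 1, y):
        yield ["+1"] + rest
    for rest in all_paths(2 * x, y):
        yield ["*2"] + rest


for start in (5, 4, 3, 2):
    paths = list(all_paths(start, 28))
    shortest = min(paths, key=len)
    print(start, "paths tried:", len(paths), "shortest:", shortest)
```

Taking the shortest of these paths gives the same step counts as the table above, including the mixed cases for 3 and 2.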


How can I remove rows of a matrix in Matlab when the difference between two consecutive rows is more than a threshold?

Suppose a data like:
X y
1 5
2 6
3 1
4 7
5 3
6 8
I want to remove 3 1 and 5 3 because their difference with the previous row is more than 3. In fact, I want to draw a plot with them and want it to be smooth.
I tried
for qq = 1:size(data,1)
    if data(qq,2) - data(qq-1,2) > 3
        data(qq,:)=[];
    end
end
However, it gives:
Subscript indices must either be real positive integers or logicals.
Moreover, I guess the size of array changes as I remove some elements.
In the end, no two consecutive elements should differ by more than the threshold.
In practice I want to smooth the following picture, which fluctuates heavily.
One very simple filter from mathematical morphology that you could try is the closing with a structuring element of size 2. It changes the value of any sample that is lower than both neighbors to the lowest of its two neighbors. Other values are not changed. Thus, it doesn't use a threshold to determine which samples are wrong; it only checks whether the sample is lower than both neighbors:
y = [5, 6, 1, 7, 3, 8]; % OP's second column
y1 = y;
y1(end+1) = -inf; % enforce boundary condition
y1 = max(y1,circshift(y1,1)); % dilation
y1 = min(y1,circshift(y1,-1)); % erosion
y1 = y1(1:end-1); % undo boundary condition change
This returns y1 = [5 6 6 7 7 8].
If you want to prevent changing your signal for small deviations, you can apply your threshold as a second step:
I = y1 - y < 3;
y1(I) = y(I);
This finds the places where we changed the signal, but the change was less than the threshold of 3. At those places we write back the original value.
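For readers without MATLAB, the closing and the optional threshold step can be sketched in plain Python. The function names closing_size2 and despike are made up for this illustration, and the circular neighbour lookups mimic what circshift does in the code above.

```python
def closing_size2(y):
    """Morphological closing with a 2-sample structuring element."""
    padded = list(y) + [float("-inf")]      # same boundary trick as above
    n = len(padded)
    # Dilation: max of each sample and its left neighbour (circular).
    dilated = [max(padded[i], padded[i - 1]) for i in range(n)]
    # Erosion: min of each sample and its right neighbour (circular).
    eroded = [min(dilated[i], dilated[(i + 1) % n]) for i in range(n)]
    return eroded[:-1]                      # drop the padding sample again


def despike(y, threshold):
    """Keep the closed value only where it differs from y by >= threshold."""
    closed = closing_size2(y)
    return [c if c - v >= threshold else v for v, c in zip(y, closed)]


y = [5, 6, 1, 7, 3, 8]
print(closing_size2(y))
print(despike(y, 5))
```

With a threshold of 5 only the dip at the value 1 is filled in, while the smaller dip at 3 keeps its original value.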
You have a few errors:
Your index needs to start from 2, so that you aren't trying to index 0 for a previous index.
You need to check that the absolute value of the difference is greater than 3.
Since your data matrix is changing sizes, you can't use a for loop with a fixed number of iterations. Use a while loop instead.
This should give you the results you want:
qq = 2;
while qq <= size(data, 1)
    if abs(data(qq, 2) - data(qq-1, 2)) > 3
        data(qq, :) = [];   % row removed: the next row moves into position qq
    else
        qq = qq+1;
    end
end
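The same filtering logic can be sketched in Python without deleting from the list while iterating over it. The function name drop_jumps is an assumption for this illustration; comparing each row with the last kept row matches what the while loop above does after a deletion.

```python
def drop_jumps(rows, threshold):
    """Drop each row whose second column differs by more than `threshold`
    from the second column of the last row that was kept."""
    kept = rows[:1]                      # the first row is always kept
    for row in rows[1:]:
        if abs(row[1] - kept[-1][1]) <= threshold:
            kept.append(row)
    return kept


data = [(1, 5), (2, 6), (3, 1), (4, 7), (5, 3), (6, 8)]
print(drop_jumps(data, 3))
```

Building a new list avoids the changing-size problem altogether, at the cost of a second copy of the kept rows.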

Find a duplicate in array of integers

This was an interview question.
I was given an array of n+1 integers from the range [1,n]. The property of the array is that it has k (k>=1) duplicates, and each duplicate can appear more than twice. The task was to find an element of the array that occurs more than once in the best possible time and space complexity.
After significant struggling, I proudly came up with an O(n log n) solution that takes O(1) space. My idea was to divide the range [1,n] into two halves and determine which of the two halves contains more elements from the input array than it has values (I was using the Pigeonhole principle). The algorithm continues recursively until it reaches an interval [X,X] where X occurs at least twice, and that X is a duplicate.
The interviewer was satisfied, but then he told me that there exists an O(n) solution with constant space. He generously offered a few hints (something related to permutations?), but I had no idea how to come up with such a solution. Assuming that he wasn't lying, can anyone offer guidelines? I have searched SO and found a few (easier) variations of this problem, but not this specific one. Thank you.
EDIT: In order to make things even more complicated, the interviewer mentioned that the input array should not be modified.
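For reference, the O(n log n) pigeonhole idea described in the question could look roughly like this in Python. This is a sketch of one reading of that description, with a made-up function name, written iteratively instead of recursively.

```python
def find_duplicate_by_counting(arr):
    """arr holds n+1 integers from [1, n]; returns one repeated value.

    Sketch of the halving idea from the question; arr is not modified.
    """
    lo, hi = 1, len(arr) - 1             # current value range [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        # Count the entries whose value falls in the lower half [lo, mid].
        count = sum(1 for v in arr if lo <= v <= mid)
        if count > mid - lo + 1:         # more entries than distinct values
            hi = mid                     # a duplicate must be in [lo, mid]
        else:
            lo = mid + 1                 # otherwise it is in [mid+1, hi]
    return lo


print(find_duplicate_by_counting([2, 3, 4, 1, 5, 4, 6, 7, 8]))
```

Each pass over the array halves the value range, which gives the O(n log n) time and O(1) extra space mentioned above.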
1. Take the very last element (x).
2. Save the element at position x (y).
3. If x == y you found a duplicate.
4. Overwrite position x with x.
5. Assign x = y and continue with step 2.
You are basically sorting the array, which is possible because you know where each element has to be inserted. This takes O(1) extra space and O(n) time. You just have to be careful with the indices; for simplicity I assumed the first index is 1 here (not 0) so we don't have to do +1 or -1.
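The five steps above translate almost line by line into Python. This is a sketch with a hypothetical function name; it uses 0-based list indexing, so "position x" becomes index x - 1, and it modifies the list it is given.

```python
def find_duplicate_in_place(arr):
    """arr holds n+1 integers from [1, n]; the list is modified."""
    x = arr[-1]                 # step 1: take the very last element
    while True:
        y = arr[x - 1]          # step 2: save the element at position x
        if x == y:              # step 3: x already sits in its own slot
            return x
        arr[x - 1] = x          # step 4: overwrite position x with x
        x = y                   # step 5: continue with the displaced value


print(find_duplicate_in_place([2, 3, 4, 1, 5, 4, 6, 7, 8]))
```

Every pass through the loop that does not return puts one more value into its own slot, so the loop runs at most n times.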
Edit: without modifying the input array
This algorithm is based on the idea that once we find the entry point of the permutation cycle, we have also found a duplicate (again using a 1-based array for simplicity):
Example:
2 3 4 1 5 4 6 7 8
Entry: 8 7 6
Permutation cycle: 4 1 2 3
As we can see the duplicate (4) is the first number of the cycle.
Finding the permutation cycle
1. x = last element
2. x = element at position x
3. repeat step 2. n times (in total); this guarantees that we have entered the cycle
Measuring the cycle length
1. a = last x from above, b = last x from above, counter c = 0
2. a = element at position a, b = element at position b, b = element at position b, c++ (so we make 2 steps forward with b and 1 step forward in the cycle with a)
3. if a == b the cycle length is c, otherwise continue with step 2.
Finding the entry point to the cycle
1. x = last element
2. x = element at position x
3. repeat step 2. c times (in total)
4. y = last element
5. if x == y then x is a solution (x made one full cycle and y is just about to enter the cycle)
6. x = element at position x, y = element at position y
7. repeat steps 5. and 6. until a solution was found.
The 3 major steps are all O(n) and sequential, therefore the overall complexity is also O(n) and the space complexity is O(1).
Example from above:
x takes the following values: 8 7 6 4 1 2 3 4 1 2
a takes the following values: 2 3 4 1 2
b takes the following values: 2 4 2 4 2
therefore c = 4 (yes there are 5 numbers but c is only increased when making steps, not initially)
x takes the following values: 8 7 6 4 | 1 2 3 4
y takes the following values:         | 8 7 6 4
x == y == 4 in the end and this is a solution!
Example 2 as requested in the comments: 3 1 4 6 1 2 5
Entering cycle: 5 1 3 4 6 2 1 3
Measuring cycle length:
a: 3 4 6 2 1 3
b: 3 6 1 4 2 3
c = 5
Finding the entry point:
x: 5 1 3 4 6 | 2 1
y:           | 5 1
x == y == 1 is a solution
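The three phases described above can be put together as follows. This is a Python sketch with made-up names; it walks len(arr) steps in the first phase, as the worked examples do, and it only reads from the list.

```python
def find_duplicate_readonly(arr):
    """arr holds n+1 integers from [1, n] and is never written to."""

    def at(pos):
        return arr[pos - 1]              # 1-based lookup, as in the text

    # Phase 1: walk far enough from the last element to be inside the cycle.
    x = arr[-1]
    for _ in range(len(arr)):
        x = at(x)

    # Phase 2: measure the cycle length c with a slow and a fast walker.
    a = b = x
    c = 0
    while True:
        a = at(a)                        # one step
        b = at(at(b))                    # two steps
        c += 1
        if a == b:
            break

    # Phase 3: give x a head start of c steps, then move x and y together.
    x = arr[-1]
    for _ in range(c):
        x = at(x)
    y = arr[-1]
    while x != y:
        x, y = at(x), at(y)
    return x                             # entry point of the cycle


print(find_duplicate_readonly([2, 3, 4, 1, 5, 4, 6, 7, 8]))
print(find_duplicate_readonly([3, 1, 4, 6, 1, 2, 5]))
```

The two calls correspond to the two worked examples above.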
Here is a possible implementation:
function checkDuplicate(arr) {
    console.log(arr.join(", "));
    let len = arr.length
        ,pos = 0
        ,done = 0
        ,cur = arr[0]
        ;
    while (done < len) {
        if (pos === cur) {
            cur = arr[++pos];
        } else {
            pos = cur;
            if (arr[pos] === cur) {
                console.log(`> duplicate is ${cur}`);
                return cur;
            }
            cur = arr[pos];
        }
        done++;
    }
    console.log("> no duplicate");
    return -1;
}
for (const t of [
    [0, 1, 2, 3]
    ,[0, 1, 2, 1]
    ,[1, 0, 2, 3]
    ,[1, 1, 0, 2, 4]
]) checkDuplicate(t);
It is basically the solution proposed by @maraca (typed too slowly!). It has constant space requirements (for the local variables), but apart from that it only uses the original array for its storage. It should be O(n) in the worst case, because the loop runs at most n times and terminates as soon as a duplicate is found.
If you are allowed to non-destructively modify the input vector, then it is pretty easy. Suppose we can "flag" an element in the input by negating it (which is obviously reversible). In that case, we can proceed as follows:
Note: The following assumes that the vector is indexed starting at 1. Since it is probably indexed starting at 0 (in most languages), you can implement "Flag item at index i" with "Negate the item at index i-1".
1. Set i to 0 and do the following loop:
   1. Increment i until item i is unflagged.
   2. Set j to i and do the following loop:
      1. Set j to vector[j].
      2. If the item at j is flagged, j is a duplicate. Terminate both loops.
      3. Flag the item at j.
      4. If j != i, continue the inner loop.
2. Traverse the vector setting each element to its absolute value (i.e. unflag everything to restore the vector).
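A possible Python rendering of the flagging procedure is shown below. It is only a sketch with a hypothetical function name; it stores the 1-based positions in a 0-based list and reads values through abs(), since a flagged entry is stored negated.

```python
def find_duplicate_with_flags(vec):
    """vec holds n+1 integers from [1, n]; it is restored before returning."""
    duplicate = None
    for i in range(1, len(vec) + 1):         # i runs over 1-based positions
        if vec[i - 1] < 0:                   # already flagged, skip it
            continue
        j = i
        while True:
            j = abs(vec[j - 1])              # follow the value as a position
            if vec[j - 1] < 0:               # flagged before: second visit
                duplicate = j
                break
            vec[j - 1] = -vec[j - 1]         # flag the item at position j
            if j == i:                       # closed a cycle without repeats
                break
        if duplicate is not None:
            break
    for k in range(len(vec)):                # unflag everything again
        vec[k] = abs(vec[k])
    return duplicate


print(find_duplicate_with_flags([3, 1, 4, 6, 1, 2, 5]))
```

Negation works as a flag here because all values are at least 1, so no entry is its own negative.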
It depends on what tools you (your app) can use. A lot of frameworks/libraries exist nowadays. For example, with the C++ standard library you can use std::map<>, as maraca mentioned.
Or, if you have time, you can write your own implementation of a binary tree, but keep in mind that inserting elements works differently than with a plain array. In that case you can optimise the search for duplicates as far as your particular case allows.
Binary tree explanation, for reference:
https://www.wikiwand.com/en/Binary_tree

finding maximum sum of a disjoint sequence of an array

Problem from :
https://www.hackerrank.com/contests/epiccode/challenges/white-falcon-and-sequence.
Visit link for references.
I have a sequence of integers (-10^6 to 10^6) A. I need to choose two contiguous disjoint subsequences of A, let's say x and y, of the same size, n.
After that you will calculate the sum given by ∑x(i)y(n−i+1) (1-indexed)
And I have to choose x and y such that sum is maximised.
Eg:
Input:
12
1 7 4 0 9 4 0 1 8 8 2 4
Output: 120
Where x = {4,0,9,4}
y = {8,8,2,4}
∑x(i)y(n−i+1)=4×4+0×2+9×8+4×8=120
Now, the approach that I was thinking of for this is something in lines of O(n^2) which is as follows:
Initialise two variables l = 0 and r = N-1. Here, N is the size of the array.
Now, for l=0, I will calculate the sum while (l<r) which basically refers to the subsequences that will start from the 0th position in the array. Then, I will increment l and decrement r in order to come up with subsequences that start from the above position + 1 and on the right hand side, start from right-1.
Is there any better approach that I can use? Anything more efficient? I thought of sorting but we cannot sort numbers since that will change the order of the numbers.
To answer the question we first define S(i, j) to be the max sum obtained by multiplying the items of the two sub-sequences, for sub-array A[i...j], when sub-sequence x starts at position i and sub-sequence y ends at position j.
For example, if A=[1 7 4 0 9 4 0 1 8 8 2 4], then S(1, 2)=1*7=7 and S(2, 5)=7*9+4*0=63.
The recursive rule to compute S is: S(i, j)=max(0, S(i+1, j-1)+A[i]*A[j]), and the end condition is S(i, j)=0 iff i>=j.
The requested final answer is simply the maximum value of S(i, j) for all combinations of i=1..N, j=1..N, since one of the S(i ,j) values will correspond to the max x,y sub-sequences, and thus will be equal the maximum value for the whole array. The complexity of computing all such S(i, j) values is O(N^2) using dynamic programming, since in the course of computing S(i, j) we will also compute the values of up to N other S(i', j') values, but ultimately each combination will be computed only once.
def max_sum(l):
    def _max_sub_sum(i, j):
        # m[i][j] caches S(i, j); None means "not computed yet"
        if m[i][j] is None:
            v = 0
            if i < j:
                v = max(0, _max_sub_sum(i+1, j-1) + l[i]*l[j])
            m[i][j] = v
        return m[i][j]
    n = len(l)
    m = [[None for i in range(n)] for j in range(n)]
    v = 0
    for i in range(n):
        for j in range(i, n):
            v = max(v, _max_sub_sum(i, j))
    return v
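Since S(i, j) only depends on S(i+1, j-1), all values with the same i + j form one chain, and each chain can be walked from the middle outwards with a single running value. The following sketch (the name max_sum_iterative is made up here) uses that observation to apply the same recurrence without recursion and without the N x N table; it is an alternative arrangement of the computation, not the answer's original code.

```python
def max_sum_iterative(a):
    """Same recurrence as S(i, j) above, evaluated chain by chain."""
    n = len(a)
    best = 0
    for s in range(1, 2 * n - 2):        # s = i + j is constant along a chain
        i = (s - 1) // 2                 # innermost pair with i < j
        j = s - i
        running = 0                      # S of the (empty) inner sub-array
        while i >= 0 and j < n:
            running = max(0, running + a[i] * a[j])
            best = max(best, running)
            i -= 1                       # widen the pair by one on each side
            j += 1
    return best


print(max_sum_iterative([1, 7, 4, 0, 9, 4, 0, 1, 8, 8, 2, 4]))
```

This keeps the O(N^2) running time but needs only O(1) extra memory, which also avoids deep recursion on long inputs.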
WARNING:
This method assumes the numbers are non-negative, so this solution does not answer the poster's actual problem now that it has been clarified that negative input values are allowed.
Trick 1
Assuming the numbers are always non-negative, it is always best to make the sequences as wide as possible given the location where they meet.
Trick 2
We can change the sum into a standard convolution by summing over all values of i. This produces twice the desired result (as we get both the product of x with y, and y with x), but we can divide by 2 at the end to get the original answer.
Trick 3
You are now attempting to find the maximum of a convolution of a signal with itself. There is a standard method for doing this, which is to use the fast Fourier transform. Some libraries have this built in, e.g. SciPy has fftconvolve.
Python code
Note that you don't allow the central value to be reused (e.g. for a sequence 1,3,2 we can't make x 1,3 and y 3,1), so we need to examine alternate values of the convolved output.
We can now compute the answer in Python via:
import scipy.signal
A = [1, 7, 4, 0, 9, 4, 0, 1, 8, 8, 2, 4]
# Odd output indices pair up elements without sharing a centre value.
print(max(scipy.signal.fftconvolve(A, A)[1::2]) / 2)
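To see what the convolution is computing, the same quantity can be evaluated directly without SciPy. This is a quadratic-time sketch under the same non-negativity assumption as the tricks above; the function name is hypothetical.

```python
def max_pair_sum_nonneg(a):
    """Direct self-convolution at odd indices; assumes values are >= 0."""
    n = len(a)
    best = 0
    for k in range(1, 2 * n - 2, 2):     # odd k = i + j: no shared centre
        lo = max(0, k - n + 1)
        hi = min(n - 1, k)
        total = sum(a[i] * a[k - i] for i in range(lo, hi + 1))
        best = max(best, total // 2)     # every pair was counted twice
    return best


print(max_pair_sum_nonneg([1, 7, 4, 0, 9, 4, 0, 1, 8, 8, 2, 4]))
```

With integer input the division by 2 is exact, so this avoids the small floating-point error an FFT-based result can carry.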

Making Minimal Changes to Change Range of the Array

Consider having an array filled with elements a0,a1,a2,....,a(n-1).
Consider that this array is sorted already; it will be easier to describe the problem.
Now the range of the array is defined as the biggest element - smallest element.
Say this range is some value x.
Now the problem I have is that I want to change the elements in such a way that the range becomes less than or equal to some target value y.
I also have the additional constraint that I want to change each element by as little as possible. Consider an element a(i) that has value z. If I change it by an amount r, this costs r^2.
Thus, what is an efficient algorithm to update this array so that the range is less than or equal to the target range y while minimizing the cost?
An example:
Array = [ 0, 3, 19, 20, 23 ] Target range is 17.
I would make the new array [ 3, 3, 19, 20, 20 ] . The cost is (3)^2 + (3)^2 = 18.
This is the minimal cost.
If you are adding/removing to some certain element a(i), you must add/remove that quantity q all at once. You can not remove 3 times 1 unit from a certain element, but must remove a quantity of 3 units once.
I think you can build two heaps from the array - one min-heap, one max-heap. Now you will take the top elements of both heaps and peek at the ones right under them and compare the differences. The one that has the bigger difference you will take and if that difference is bigger than you need, you will just take the required size and add the cost.
Now, if you had to take the whole difference and didn't achieve your goal, you will need to repeat this step. However, if you once again choose from the same heap, you have to remember to add the cost for the element you are taking out of the heap in that step AND also for those that have been taken out of the processed heap before.
This yields an O(N*logN) algorithm, I'm not sure if it can be done faster.
Example:
Array [2,5,10,12] , I want difference 4.
The first heap has 2 on top, the second one 12. The 2 is 3 away from 5 and the 12 is 2 away from 10, so I take from the min-heap and the 2 will have to be changed by 3. So now we have a new situation:
[5, 10, 12]
The 12 is 2 away from 10, so we take it, subtract 2 and get a new situation:
[5,10]
Now we can choose either heap; both differences are the same (the same numbers :-) ). We just need to change by 1, so we subtract 1 from 10 and get the right result. Now, because we changed 10 to 9, we also have to change the number that was originally 12 once more, to 9, so the result is:
[2 - changed to 5, 5 - unchanged, 10 - changed to 9, 12 - changed to 9].
Here is a linear-time algorithm that minimizes the piecewise quadratic objective function. Probably it can be simplified.
Let the range be [x, x + y], where x is a variable. For different choices of x, there are at most 2n + 1 possibilities for which points lie in the range, arising from 2n critical values a0 - y, a1 - y, ..., a(n-1) - y, a0, a1, ..., a(n-1). One linear-time merge yields the critical values in sorted order. For each of the 2n - 1 intervals [w, z] between critical values where the range contains at least one point, we can construct and minimize a quadratic function consisting of a sum where every point aj less than w yields a term (x - aj)^2 and every point aj greater than z + y yields a term (x + y - aj)^2. The global minimum lies at the mean of aj (for terms of the first type) or aj - y (for terms of the second type); the endpoints of the interval must be checked as well. Naively, this gives a quadratic-time algorithm.
To get down to linear time, it suffices to update the sum preceding the mean computation incrementally. Each of the critical values has an associated event indicating whether the point responsible for it is entering or leaving the interval, meaning that that point's term should enter or leave the sum.
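As a starting point, the naive quadratic-time version of this idea could be sketched in Python as follows. The function name min_range_cost is made up for this illustration, the sketch assumes y >= 0, it allows non-integer values of x, and it recomputes the sums for every interval instead of updating them incrementally.

```python
def min_range_cost(a, y):
    """Smallest sum of squared changes so that max - min <= y (y >= 0)."""
    critical = sorted(set(a) | {v - y for v in a})
    best = 0.0 if len(critical) < 2 else float("inf")
    for w, z in zip(critical, critical[1:]):
        # For x in [w, z]: points <= w are raised to x, and points >= z + y
        # are lowered to x + y, which is the same as moving (v - y) to x.
        targets = [v for v in a if v <= w] + [v - y for v in a if v - y >= z]
        if targets:
            mean = sum(targets) / len(targets)
            x = min(max(mean, w), z)     # clamp the minimizer to [w, z]
        else:
            x = w                        # nothing moves: cost is 0
        best = min(best, sum((x - t) ** 2 for t in targets))
    return best


print(min_range_cost([0, 3, 19, 20, 23], 17))
```

Replacing the two list comprehensions with running sums that are updated at each critical value would give the linear-time variant described above.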

How can I efficiently remove zeroes from a (non-sparse) matrix?

I have a matrix:
x = [0 0 0 1 1 0 5 0 7 0];
I need to remove all of the zeroes, like so:
x = [1 1 5 7];
The matrices I am using are large (1x15000) and I need to do this multiple times (5000+), so efficiency is key!
One way:
x(x == 0) = [];
A note on timing:
As mentioned by woodchips, this method seems slow compared to the one used by KitsuneYMG. This has also been noted by Loren in one of her MathWorks blog posts. Since you mentioned having to do this thousands of times, you may notice a difference, in which case I would try x = x(x~=0); first.
WARNING: Beware if you are using non-integer numbers. If, for example, you have a very small number that you would like to consider close enough to zero so that it will be removed, the above code won't remove it. Only exact zeroes are removed. The following will help you also remove numbers "close enough" to zero:
tolerance = 0.0001; % Choose a threshold for "close enough to zero"
x(abs(x) <= tolerance) = [];
Just to be different:
x=x(x~=0);
or
x=x(abs(x)>threshold);
This has the bonus of working on complex numbers too
Those are the three common solutions. It helps to see the difference.
x = round(rand(1,15000));
y = x;
tic,y(y==0) = [];toc
Elapsed time is 0.004398 seconds.
y = x;
tic,y = y(y~=0);toc
Elapsed time is 0.001759 seconds.
y = x;
tic,y = y(find(y));toc
Elapsed time is 0.003579 seconds.
As you can see, the cheapest way is the direct logical index, selecting the elements to be retained. The find is more expensive, since MATLAB finds those elements, returns a list of them, and then indexes into the vector.
Here's another way
y = x(find(x))
I'll leave it to you to figure out the relative efficiency of the various approaches you try -- do write and let us all know.
Though my timing results are not conclusive as to whether it is significantly faster, this seems to be the fastest and easiest approach (note that nonzeros returns a column vector):
y = nonzeros(y)
If you have two vectors and only want to remove the positions where both are zero, for example:
x = [0 0 0 1 1 0 5 0 7 0]
y = [0 2 0 1 1 2 5 2 7 0]
Then x2 and y2 can be obtained as:
x2=x(~(x==0 & y==0))
y2=y(~(x==0 & y==0))
x2 = [0 1 1 0 5 0 7]
y2 = [2 1 1 2 5 2 7]
Hope this helps!
