So I'm trying to create a new variable column of "first differences" by subtracting consecutive values in the SAME column, but I have no clue how to do so in SPSS. For example, in this picture:
1st value - 0 = 0 (obviously). 2nd value - 1st value =..., 3rd value - 2nd value =..., 4th value - 3rd value =... and so on.
Also, if there is a negative number, does SPSS allow me to log it/regress it? Once I find the first difference, I'm going to LOG it and then regress it. For context, this is part of a bigger equation to find out how economic growth and a CHANGE in economic growth (hence the first difference and log) will affect the variable I'm studying.
Thanks.
To calculate differences between values in consecutive rows use this:
if $casenum>1 diffs = FinalConsumExp - lag(FinalConsumExp).
execute.
If you need help with additional problems please start a separate question for each problem.
HTH.
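For readers who want to check the logic outside SPSS, here is a minimal Python sketch of the same lagged subtraction. The function names `first_differences` and `safe_log` are made up for illustration. It also touches the follow-up question about negatives: the natural log is undefined for zero or negative values, so a negative first difference cannot be logged directly, and the sketch returns `None` there as a stand-in for a missing value.

```python
import math

def first_differences(values):
    """Return the row-to-row differences; the first row has no predecessor."""
    # None plays the role of a missing value for the first case,
    # mirroring the $casenum>1 condition in the SPSS syntax.
    return [None] + [curr - prev for prev, curr in zip(values, values[1:])]

def safe_log(x):
    """Natural log, or None when the log is undefined (missing, zero or negative)."""
    if x is None or x <= 0:
        return None
    return math.log(x)

# Hypothetical data, not taken from the question's picture.
consumption = [100.0, 104.0, 101.0, 109.0]
diffs = first_differences(consumption)
logged = [safe_log(d) for d in diffs]
```

Here `diffs` contains a negative entry (101 - 104), and its logged counterpart is `None`, which is the case to plan for before regressing on the logged differences.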
I was trying to sharpen my skills by solving the Codility problems. I reached this one: https://codility.com/programmers/lessons/9-maximum_slice_problem/max_double_slice_sum/
I actually theoretically understand the solution:
Use Kadane's Algorithm on the array and store the sum at every index.
Reverse the array and do the same.
Find a point where the sum of both is max by looping over both result sets one at a time.
The max is the max double slice.
My question is not so much about how to solve the problem. My question is about how one is supposed to see that this is the way to solve it. There are at least 3 different concepts that need to be used:
The understanding that if all elements in the array are positive, or negative it is a different case than when there are some positive and negative elements in the array.
Kadane's Algorithm
Going over the array forward and reversed.
Despite all of this, Codility has tagged this problem as "Painless".
My question is: am I missing something? It seems unlikely that I would be able to solve this problem without knowing some of these concepts.
Is there a technique where I can start from scratch with very basic concepts and work my way up to the concepts required to solve this problem? Or am I expected to know these concepts before even starting the problem?
How can I prepare myself to solve such problems in the future when I don't know the required concepts?
I think you are overthinking the problem, that's why you find it more difficult than it is:
The understanding that if all elements in the array are positive, or negative it is a different case than when there are some positive and negative elements in the array.
It doesn't have to be a different case. You might be able to come up with an algorithm that doesn't care about this distinction and works anyway.
You don't need to start by understanding this distinction, so don't think about it until or even if you have to.
Kadane's Algorithm
Don't think of an algorithm, think of what the problem requires. Usually that 10+ paragraph problem statement can be expressed in much less.
So let's see how we can simplify the problem statement.
It first defines a slice as a triplet (x, y, z). Its sum is defined as the sum of the elements starting at x+1 and ending at z-1, excluding y.
Then it asks for the maximum sum slice. If we need the maximum slice, do we need x and z in the definition? We might as well let it start and end anywhere as long as it gets us the maximum sum, no?
So redefine a slice as a subset of the array that starts anywhere, goes up to some y-1, continues from y+1 and ends anywhere. Much simpler, isn't it?
Now you need the maximum such slice.
Now you might be thinking that you need, for each y, the maximum sum subarray that starts at y+1 and the maximum sum subarray that ends at y-1. If you can find these, you can update a global max for each y.
So how do you do this? This should now point you towards Kadane's algorithm, which does half of what you want: it computes the maximum sum subarray ending at some x. So if you compute it from both sides, for each y, you just have to find:
kadane(y - 1) + kadane_reverse(y + 1)
And compare with a global max.
No special cases for negatives and positives. No thinking "Kadane's!" as soon as you see the problem.
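As a rough illustration of that idea, here is a minimal Python sketch. The names `ending` and `starting` are illustrative stand-ins for `kadane` and `kadane_reverse`, and the sketch assumes the Codility conventions that the first and last elements can never be part of a slice and that an empty half contributes 0.

```python
def max_double_slice_sum(a):
    """Maximum sum of a double slice (x, y, z) with 0 <= x < y < z < len(a)."""
    n = len(a)
    # ending[i]: best sum of a (possibly empty) run that ends at index i.
    # starting[i]: best sum of a (possibly empty) run that starts at index i.
    # Indices 0 and n-1 stay 0 because x and z themselves are never summed.
    ending = [0] * n
    starting = [0] * n
    for i in range(1, n - 1):
        ending[i] = max(0, ending[i - 1] + a[i])
    for i in range(n - 2, 0, -1):
        starting[i] = max(0, starting[i + 1] + a[i])

    best = 0
    for y in range(1, n - 1):
        # Drop a[y] and glue the best run on its left to the best run on its right.
        best = max(best, ending[y - 1] + starting[y + 1])
    return best

print(max_double_slice_sum([3, 2, 6, -1, 4, 5, -1, 2]))  # 17
```

Clamping each running sum at 0 is what removes the need for a separate all-negative case: an empty run is always allowed.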
The idea is to simplify the requirement as much as possible without changing its meaning. Then you use your algorithmic and deductive skills to reach a solution. These skills are honed with time and experience.
Yes, this is homework. I am not asking for any easy answers, just help moving in the right direction. Here is the assignment: "Create a function that receives two numbers: a and b. The function calculates and returns the multiplication of all the numbers between a and b. Create three versions of this function."
I created the function using a for loop and a while loop, but I am at a loss as to how to use recursion, which is the final part of the assignment.
Kudos for admitting this is a homework question. As such, while I won't give you the answer, I will give you a few pointers towards it.
When writing a recursive function, there are two key things to consider:
What stops the recursion, and
What happens until the recursion stops
In your case, where you have to calculate the product of a list of numbers, this works out as:
What should the function do when there is only 1 item in the list? (ie: when a and b are the same)
How can I multiply one element by the product of the rest of the list?
For extra credit, look up tail recursion and understand why it can help keep your memory usage down.
Does that give you enough of a start?
It's a simple instance of divide and conquer: you start with one problem and attempt to resolve it by breaking it into problems that are easier to solve and combining the results.
You can then usually attack these problems by working backwards: what's the most trivial case, that you could answer immediately? What would you do if the problem were a notch harder than that?
As you've explicitly been told to find a recursive solution, you can assume that you're looking for a method that can either directly return a result or else must call itself with modified parameters, and do something with that result to get its own.
Failing that, given that the question is slightly artificial, consider looking up how you could literally just implement a for loop using a recursive structure, then directly adapt your existing for loop. No great thought about the nature of breaking problems down, just looking at how to express your existing solution in a different way.
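To illustrate that last suggestion, here is a minimal Python sketch that puts a loop next to its recursive rewrite. The function names and the accumulator parameter `acc` are illustrative, and the sketch assumes "between a and b" means both ends are included.

```python
def product_loop(a, b):
    """Product of the integers a..b inclusive, written as a plain loop."""
    result = 1
    for n in range(a, b + 1):
        result *= n
    return result

def product_recursive(a, b, acc=1):
    """The same loop expressed as recursion: acc carries the running product."""
    if a > b:
        # The loop condition has failed, so the accumulated value is the answer.
        return acc
    # One loop iteration: fold a into the accumulator, then advance a.
    return product_recursive(a + 1, b, acc * a)

print(product_loop(5, 8), product_recursive(5, 8))  # 1680 1680
```

The recursive call is the last thing the function does, which is the tail-recursive shape mentioned earlier. Note that CPython does not eliminate tail calls, so in Python this form still uses one stack frame per step.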
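// Multiplies all integers from num1 up to num2 (assumes num1 <= num2).
function recursiveMultiplication(num1, num2) {
  // Base case: the range holds a single number.
  if (num2 === num1) {
    return num2;
  }
  // Recursive case: peel the top number off the range.
  return num2 * recursiveMultiplication(num1, num2 - 1);
}
console.log(recursiveMultiplication(5, 8)); // 5 * 6 * 7 * 8 = 1680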
My problem is this: I have a large sequence of numbers. I know that, after some point, it becomes periodic - that is, there are k numbers at the beginning of the sequence, and then there are m more numbers that repeat for the rest of the sequence. As an example to make this more clear, the sequence might look like this: [1, 2, 5, 3, 4, 2, 1, 1, 3, 2, 1, 1, 3, 2, 1, 1, 3, ...], where k is 5 and m is 4, and the repeating block is then [2, 1, 1, 3]. As is clear from this example, I can have repeating bits inside of the larger block, so it doesn't help to just look for the first instances of repetition.
However, I do not know what k or m are - my goal is to take the sequence [a_1, a_2, ... , a_n] as an input and output the sequence [a_1, ... , a_k, [a_(k+1), ... , a_(k+m)]] - basically truncating the longer sequence by listing the majority of it as a repeating block.
Is there an efficient way to do this problem? Also, likely harder but more ideal computationally - is it possible to do this as I generate the sequence in question, so that I have to generate a minimal amount? I've looked at other, similar questions on this site, but they all seem to deal with sequences without the beginning non-repeating bit, and often without having to worry about internal repetition.
If it helps/would be useful, I can also get into why I am looking at this and what I will use it for.
Thanks!
EDITS: First, I should have mentioned that I do not know if the input sequence ends at exactly the end of a repeated block.
The real-world problem that I am attempting to work on is writing a nice, closed-form expression for continued fraction expansions (CFEs) of quadratic irrationals (actually, the negative CFE). It is very simple to generate partial quotients* for these CFEs to any degree of accuracy - however, at some point the tail of the CFE for a quadratic irrational becomes a repeating block. I need to work with the partial quotients in this repeating block.
My current thoughts are this: perhaps I can adapt some of the algorithms suggested that work from the right to work with one of these sequences. Alternatively, perhaps there is something in the proof of why quadratic irrationals are periodic that will help me see why they begin to repeat, which will help me come up with some easy criteria to check.
*If I am writing a continued fraction expansion as [a_0, a_1, ...], I refer to the a_i's as partial quotients.
Some background info can be found here for those interested: http://en.wikipedia.org/wiki/Periodic_continued_fraction
You can use a rolling hash to achieve linear time complexity and O(1) space complexity (I think this is the case, since I don't believe you can have an infinite repeating sequence with two frequencies which are not multiples of each other).
Algorithm: You just keep two rolling hashes which expand like this:
_______ _______ _______
/ \/ \/ \
...2038975623895769874883301010883301010883301010
. . . ||
. . . [][]
. . . [ ][ ]
. . .[ ][ ]
. . [. ][ ]
. . [ . ][ ]
. . [ .][ ]
. . [ ][ ]
. [ ][ ]
Keep on doing this for the entire sequence. The first pass will only detect repetitions repeated 2*n times for some value of n. However that's not our goal: our goal in the first pass is to detect all possible periods, which this does. As we go along the sequence performing this process, we also keep track of all relatively prime periods we will need to later check:
periods = Set(int)
periodsToFurthestReach = Map(int -> int)
for hash1,hash2 in expandedPairOfRollingHashes(sequence):
    L = hash1.length
    if hash1==hash2:
        if L is not a multiple of any period:
            periods.add(L)
            periodsToFurthestReach[L] = 2*L
        else:  # L is a multiple of some periods
            for all periods P for which L is a multiple:
                periodsToFurthestReach[P] = 2*L
After this process, we have a list of all periods and how far they've reached. Our answer is probably the one with the furthest reach, but we check all other periods for repetition (fast because we know the periods we're checking for). If this is computationally difficult, we can optimize by pruning away periods (which stop repeating) as we're going through the list, very much like the sieve of Eratosthenes, by keeping a priority queue of when we next expect a period to repeat.
At the end, we double-check the result to make sure there was no hash collision (in the unlikely event there is, blacklist and repeat).
Here I assumed your goal was to minimize non-repeating-length, and not give a repeating element which can be further factored; you can modify this algorithm to find all other compressions, if they exist.
So, ninjagecko provided a good working answer to the question I posed. Thanks very much! However, I ended up finding a more efficient, mathematically based way to do the specific case that I am looking at - that is, writing out a closed form expression for the continued fraction expansion of a quadratic irrational. Obviously this solution will only work for this specific case, rather than the general case that I asked about, but I thought it might be useful to put it here in case others have a similar question.
Basically, I remembered that a quadratic irrational is reduced if and only if its continued fraction expansion is purely periodic - as in, it repeats right from the beginning, without any leading terms.
When you work out the continued fraction expansion of a number x, you basically set x_0 to be x, and then you form your sequence [a_0; a_1, a_2, a_3, ... ] by defining a_n = floor(x_n) and x_(n+1) = 1/(x_n - a_n). Normally you would just continue this until you reach a desired precision. However, for our purposes, we just run this method until x_k is a reduced quadratic irrational (which occurs if it is bigger than 1 and its conjugate is between -1 and 0). Once this happens, we know that a_k is the first term of our repeating block. Then, when we find x_(k+m+1) equal to x_k, we know that a_(k+m) is the last term in our repeating block.
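Here is a minimal Python sketch of this kind of computation for the regular (not the negative) continued fraction of x = (P + sqrt(D)) / Q. It makes two assumptions that go beyond the description above: it requires Q to divide D - P^2 so that every x_n can be stored as an exact integer pair (P, Q), and it detects the repeating block by remembering which pairs have already been seen, rather than by testing whether x_n is reduced. The function name `periodic_cfe` is made up for illustration.

```python
from math import isqrt

def periodic_cfe(P, Q, D):
    """Continued fraction of (P + sqrt(D)) / Q as (leading_terms, repeating_block)."""
    r = isqrt(D)
    if r * r == D or Q == 0 or (D - P * P) % Q != 0:
        raise ValueError("need non-square D, nonzero Q, and Q dividing D - P^2")
    seen = {}        # exact state (P, Q) -> index of the quotient it produced
    quotients = []
    while (P, Q) not in seen:
        seen[(P, Q)] = len(quotients)
        # a_n = floor(x_n), computed with integers only.
        if Q > 0:
            a = (P + r) // Q
        else:
            a = (-P - r - 1) // (-Q)
        quotients.append(a)
        # x_(n+1) = 1 / (x_n - a_n), kept in the form (P + sqrt(D)) / Q.
        P = a * Q - P
        Q = (D - P * P) // Q
    k = seen[(P, Q)]  # the state repeats, so the block starts at index k
    return quotients[:k], quotients[k:]

print(periodic_cfe(0, 1, 7))  # ([2], [1, 1, 1, 4])
```

Because the states are exact integers, no floating-point precision limit applies, and the loop stops as soon as one full repeating block has been generated.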
Search from the right:
does a_n == a_(n-1)
does (a_n, a_(n-1)) == (a_(n-2), a_(n-3))
...
This is clearly O(m^2). The only available bound appears to be m < n/2, so it's O(n^2).
Is this acceptable for your application? (Are we doing your homework for you, or is there an actual real-world problem here?)
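A variation on this right-to-left search can be sketched in Python as follows. It is a quadratic-time sketch, not a tuned implementation. The function name `split_periodic` and the `min_repeats` parameter are illustrative. It assumes the repeating block must be visible at least `min_repeats` times at the end of the input, and it does not require the input to end exactly on a block boundary.

```python
def split_periodic(seq, min_repeats=2):
    """Split seq into (leading_terms, repeating_block) by scanning from the right."""
    n = len(seq)
    best = None  # (k, m): period m holds from index k to the end
    for m in range(1, n // min_repeats + 1):
        # Walk left for as long as each element equals the one m places to its right.
        k = n - m
        while k > 0 and seq[k - 1] == seq[k - 1 + m]:
            k -= 1
        # Keep the period whose repetition reaches furthest left; ties go to the smaller m.
        if n - k >= min_repeats * m and (best is None or k < best[0]):
            best = (k, m)
    if best is None:
        return list(seq), []
    k, m = best
    return list(seq[:k]), list(seq[k:k + m])

example = [1, 2, 5, 3, 4, 2, 1, 1, 3, 2, 1, 1, 3, 2, 1, 1, 3]
print(split_periodic(example))  # ([1, 2, 5, 3, 4], [2, 1, 1, 3])
```

Each candidate period costs O(n) to check and there are up to n/2 candidates, which matches the O(n^2) bound given above.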
This page lists several good cycle-detection algorithms and gives an implementation of an algorithm in C.
Consider the sequence once it has repeated a number of times. It will end e.g. ...12341234123412341234. If you take the repeating part of the string up to just before the last cycle of repeats, and then slide it along by the length of that cycle, you will find that you have a long match between a substring at the end of the sequence and the same substring slid to the left a distance which is small compared with its length.
Conversely, if you have a string where a[x] = a[x + k] for a large number of x, then you also have a[x] = a[x + k] = a[x + 2k] = a[x + 3k]... so a string that matches itself when slid a short distance compared to its length must contain repeats.
If you look at http://en.wikipedia.org/wiki/Suffix_array, you will see that you can build the list of all suffixes of a string, in sorted order, in linear time, and also an array which tells you how many characters each suffix has in common with the previous suffix in sorted order. If you look for the entry with the largest value of this, this would be my candidate for a string going ..1234123412341234, and the distance between the starting points of the two suffixes would tell you the length at which the sequence repeats. (but in practice some sort of rolling hash search like http://en.wikipedia.org/wiki/Rabin-Karp might be quicker and easier, although there are quite codeable linear time Suffix Array algorithms, like "Simple Linear Work Suffix Array Construction" by Karkkainen and Sanders).
Suppose that you apply this algorithm when the number of characters available is 8, 16, 32, 64, .... 2^n, and you finally find a repeat at 2^p. How much time have you wasted in earlier stages? 2^(p-1) + 2^(p-2) + ..., which sums to about 2^p, so the repeated searches are only a constant overhead.
I have a database full of facts such as:
overground(newcrossgate,brockley,2).
overground(brockley,honoroakpark,3).
overground(honoroakpark,foresthill,3).
overground(foresthill,sydenham,2).
overground(sydenham,pengewest,3).
overground(pengewest,anerley,2).
overground(anerley,norwoodjunction,3).
overground(norwoodjunction,westcroydon,8).
overground(sydenham,crystalpalace,5).
overground(highburyandislington,canonbury,2).
overground(canonbury,dalstonjunction,3).
overground(dalstonjunction,haggerston,1).
overground(haggerston,hoxton,2).
overground(hoxton,shoreditchhighstreet,3).
example: newcrossgate to brockley takes 2 minutes.
I then created a rule so that if I enter the query istime(newcrossgate,honoroakpark,Z). then prolog should give me the time it takes to travel between those two stations. (The rule I made is designed to calculate the distance between any two stations at all, not just adjacent ones).
istime(X,Y,Z):- istime(X,Y,0,Z); istime(Y,X,0,Z).
istime(X,Y,T,Z):- overground(X,Y,Z), T1 is T + Z.
istime(X,Y,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
istime(X,Y,Z):- overground(X,B,T), istime(B,X,T1), Z is T + T1.
It seems to work perfectly from newcrossgate to the first couple of stations, e.g. newcrossgate to foresthill or sydenham. However, after testing newcrossgate to westcroydon, which takes 26 mins, I tried newcrossgate to crystalpalace and Prolog said it should take 15 mins, despite the fact that it's the next station after westcroydon. Clearly something's wrong here. It works for most of the stations but comes up with an occasional error in the time. Can anyone tell me what's wrong?
This is essentially the same problem as your previous question, the only difference is that you need to accumulate the time as you go.
One thing I see is that your "public" predicate, istime/3 tries to do too much. All it should do is seed the accumulator and invoke the worker predicate istime/4. Since you're looking for route/time in both directions, the public predicate should be just
istime( X , Y , Z ) :- istime( X , Y , 0 , Z ) .
istime( X , Y , Z ) :- istime( Y , X , 0 , Z ) .
The above is essentially the first clause of your istime/3 predicate
istime(X,Y,Z):- istime(X,Y,0,Z); istime(Y,X,0,Z).
The remaining clauses of istime/3, the recursive ones:
istime(X,Y,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
istime(X,Y,Z):- overground(X,B,T), istime(B,X,T1), Z is T + T1.
should properly be part of istime/4 and have the accumulator present. That's where your problem is.
Give it another shot and edit your question to show the next iteration. If you still can't figure it out, I'll show you some different ways to do it.
Some Hints
Your "worker" predicate will likely look a lot like your earlier "find a route between two stations" exercise, but it will have an extra argument, the accumulator for elapsed time.
There are two special cases. If you use the approach you used in your "find a route between two stations" solution, the special cases are
A and B are directly adjacent.
A and B are connected via at least one intermediate stop.
There's another approach as well, that might be described as using lookahead, in which case the special cases are
A and B are the same, in which case you've arrived.
A and B are not and are connected by zero or more intermediate stops.
FWIW, You shouldn't necessarily expect the route with the shortest elapsed time or the minimal number of hops to be the first solution found. Backtracking will produce all the routes, but the order in which they are found has to do with the order in which the facts are stored in the database. A minimal cost search of the graph is another kettle of fish entirely.
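To make the accumulator idea concrete without giving away the Prolog, here is a minimal Python sketch of the "lookahead" formulation over the same facts. The names `LINKS` and `travel_time` and the `visited` set are illustrative. The set is an addition that stops the search from walking back and forth over the same link, since each link is treated as usable in both directions.

```python
LINKS = [
    ("newcrossgate", "brockley", 2), ("brockley", "honoroakpark", 3),
    ("honoroakpark", "foresthill", 3), ("foresthill", "sydenham", 2),
    ("sydenham", "pengewest", 3), ("pengewest", "anerley", 2),
    ("anerley", "norwoodjunction", 3), ("norwoodjunction", "westcroydon", 8),
    ("sydenham", "crystalpalace", 5), ("highburyandislington", "canonbury", 2),
    ("canonbury", "dalstonjunction", 3), ("dalstonjunction", "haggerston", 1),
    ("haggerston", "hoxton", 2), ("hoxton", "shoreditchhighstreet", 3),
]

def travel_time(start, goal, elapsed=0, visited=None):
    """Minutes along the first route found from start to goal, or None if none exists."""
    visited = (visited or set()) | {start}
    if start == goal:
        # Special case 1: we have arrived, so the accumulator is the answer.
        return elapsed
    # Special case 2: take one hop in either direction and carry the time forward.
    for a, b, minutes in LINKS:
        for here, there in ((a, b), (b, a)):
            if here == start and there not in visited:
                result = travel_time(there, goal, elapsed + minutes, visited)
                if result is not None:
                    return result
    return None

print(travel_time("newcrossgate", "westcroydon"))    # 26
print(travel_time("newcrossgate", "crystalpalace"))  # 15
```

With these facts, crystalpalace hangs off sydenham rather than westcroydon, so 15 minutes (2 + 3 + 3 + 2 + 5) is consistent with the data. The facts listed form two separate lines, so a query such as newcrossgate to hoxton finds no route.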
Have you tried to cycle through answers with ;? 26mins is not the shortest time between newcrossgate and westcroydon...
Edit: my bad! Apparently the shorter results were due to a bug in your code (see my comment about the 4th clause). However, your code's answer is correct here: 15 mins is the shortest route between newcrossgate and crystalpalace. Just because there is a route that goes from newcrossgate to westcroydon and then to crystalpalace doesn't mean that it's the shortest route, or the route your program will yield first.
Update: if you're running into problems to find answers to some routes, I'd suggest changing the 3rd clause to:
istime(X,Y,_,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
The reason is simple: your first clause swaps X with Y, which is good, since with that you're saying the routes are symmetrical. However, the 3rd clause does not benefit from that, because it's never called by the swapped one. Ignoring the 3rd argument (which you're not using anyway) and thus letting the 1st clause call the 3rd might fix your issue, since some valid routes that were not used previously will be now.
(also: I agree with Nicholas Carey's answer, it would be better to use the third argument as an accumulator; but as I said, ignoring it for now might just work)
To make it work you need to do the reverse of both journeys stated in your last clause.
Keep the predicate as it is, istime(X,Y,Z) and just make another clause containing the reverse journeys.
This way it works with all the stations. (Tried and Tested)