I am having trouble understanding the SARSA algorithm:
http://en.wikipedia.org/wiki/SARSA
In particular, when updating the Q value what is gamma? and what values are used for s(t+1) and a(t+1)?
Can someone explain this algorithm to me?
Thanks.
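Gamma is the discount factor: it determines how much weight the estimated value of the next state-action pair, Q(s(t+1),a(t+1)), gets relative to the immediate reward r(t+1). If you set it to 0.0, the update only looks at the immediate reward. If you set it close to 1.0, future rewards count almost as much as immediate ones. How much "memory" your algorithm has is controlled by a separate parameter, the learning rate alpha: with alpha at 0.0 the value function Q is not updated at all, and with alpha at 1.0 the old estimate is replaced entirely by the new experience. The best values lie in between and have to be determined experimentally.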
Here is how it works:
In your first step, you just get a state. Simply store it away as s(t). Also, look up your value function for the best action to take in this state and store it as a(t).
In each subsequent step, you get r(t+1) and s(t+1). Again, use your value function to find the best action, a(t+1). The value of the transition from your previous action to the new one is equal to r(t+1) + gamma*Q(s(t+1),a(t+1)) - Q(s(t),a(t)). Use this, scaled by the learning rate alpha, to update your long-term estimate of the previous action's value Q(s(t),a(t)). Finally, store s(t+1) and a(t+1) as s(t) and a(t) for the next step.
In effect, the value function is just a running average of these update values for each action and every state.
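As a rough illustration of the steps above, here is a minimal Python sketch. It assumes the standard form of the update from the linked Wikipedia article, with a learning rate alpha and a discount factor gamma. The names sarsa_update and epsilon_greedy, the dictionary-based Q table, and the toy states and actions are all made up for this example.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """Apply one SARSA update to Q[(s, a)] and return the TD error."""
    # Value of the transition: reward plus discounted next estimate,
    # minus the current estimate.
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    # Move the old estimate a fraction alpha towards the new information.
    Q[(s, a)] += alpha * td_error
    return td_error


# Hypothetical toy data: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
first_error = sarsa_update(Q, s=0, a="right", r=1.0,
                           s_next=1, a_next="right", alpha=0.5, gamma=0.9)
second_error = sarsa_update(Q, s=0, a="right", r=1.0,
                            s_next=1, a_next="right", alpha=0.5, gamma=0.9)
```

Note that a(t+1) is the action you will actually take next, which is why it is stored and reused in the following step. With epsilon set to 0.0 the helper above always picks the currently best action, as described in the steps.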
The N2 diagram for my full problem is below.
The N2 diagram for the coupled portion of the problem is below.
I have a DirectSolver handling the coupling between LLTForces and ImplicitLiftingLine, and an LNBGS solver handling the coupling between LiftingLineGroup and TestCL.
The gist for the problem is here: https://gist.github.com/eufren/31c0e569ed703b2aea3e2ef5360610f7
I have implemented guess_nonlinear() on ImplicitLiftingLine, which should use various outputs from LLTGeometry to give a good initial guess for the vortex strengths based on a linearised form of the governing equations.
def guess_nonlinear(self, inputs, outputs, resids):
    freestream_unit_vector = inputs['freestream_unit_vector']
    freestream_velocity = inputs['freestream_velocity']
    n = inputs['normal_vectors']
    A = inputs['surface_areas']
    l = inputs['bound_vortices']
    ic_tot = inputs['influence_coefficients_total']
    v_inf = freestream_velocity
    v_inf_vec = v_inf*freestream_unit_vector
    lin_numerator = np.pi * v_inf * A * np.sum(n * v_inf_vec, axis=1)
    lin_denominator = (np.linalg.norm(np.cross(v_inf_vec, l), axis=1) - np.pi * v_inf * A * np.sum(np.sum(n * ic_tot, axis=2), axis=1))
    lin_vtx_str = lin_numerator / lin_denominator
    outputs['vortex_strengths'] = lin_vtx_str
However, when the problem is run for the first time, any inputs not explicitly set with p.set_val() are all 1s. This causes guess_nonlinear() to give a bad output and so the system fails to converge:
As far as I can tell, the execution order for the LLT group is correct, and the geometry components should be being executed before the implicit component. I'm confused as to why this doesn't seem to actually be happening when the code is run, and instead these inputs are taking their default values.
What do I need to change to get this to work properly? Additionally, I've found it difficult to get LNBGS to converge during optimisation (hence adding guess_nonlinear()). Only DirectSolver gets all the way through the optimisation without issues, but it's very slow for large numbers of LLT nodes. How can I improve the linear and nonlinear solver selection, and improve the reliability of the iterative solver?
Note: Thanks for providing a testable example. It made figuring out the answer to your question a lot simpler. Your problem was a bit subtle and I would not have been able to give a good answer without runnable code.
Your first question: "Why are all the inputs 1"
"Short" Answer
You have put the nonlinear solver too high in the model hierarchy, so its scope included a key precursor component that computes your input values. By moving the solver down to a lower level of the model, I was able to ensure that the precursor component (LLTGeometry) ran and had valid outputs before you got to the guess_nonlinear of the implicit component.
Here is what you had (notice that the implicit solver included LLTGeometry even though the data cycle does not require that component):
I moved both the nonlinear solver and the linear solver down into the LTTCycle group, which then allows the LLTGeometry component to execute before getting to the nonlinear solver and guess_nonlinear step:
My fix is only partially correct, since there is a secondary cycle from the TestCL component that also needs a solver and does not have one. However, that cycle still does not involve the LLTGeometry group. So the fully correct fix is to restructure your model to run geometry first, and then put the LTTCycle and TestCL groups together so you can run a solver over just them. That was a bit more hacking than I wanted to do on your test problem, but you can see the general idea from the adjusted N2 above.
Long Answer
The guess_nonlinear sequence in OpenMDAO does NOT run the compute method of explicit components or of groups. It follows the execution hierarchy, and calls any guess_nonlinear that it finds. So that means that any explicit components you have in your model will NOT get executed, their outputs will not get updated with computed values, and those computed values will not get passed to the inputs of downstream components.
Things get a little tricky when you have deep model hierarchies. The guess_nonlinear method is called as the first step in the nonlinear solver process. If you have a NonlinearRunOnce solver at the top level, it will follow the compute chain down the line calling compute or solve_nonlinear on each child and doing a data transfer after each one. If one of those children happens to be a group with a nonlinear solver, then that solver will call guess_nonlinear on its children (grandchildren of the top group with the NonlinearRunOnce solver) as the first step. So any outputs that were computed by the siblings of this group will be valid, but none of the outputs from the grandchild level will have been computed yet.
You may be wondering why guess_nonlinear doesn't just call compute for any explicit components. There is a trade-off here that is difficult to balance. If you assume that all explicit components are very cheap to run, then it might make sense to run the compute methods --- or it might not. A lot depends on the cyclic data structure. If some early component in the group needs guesses from a later one, then running its compute isn't going to help you much at all. Perhaps more importantly though, not all explicit components are cheap to run. You might have a very expensive computation, and calling compute as part of the guess process would be way too costly.
The compromise here, if you need some kind of top level guess process, is that you can implement guess_nonlinear at the group level. It's less common to do, but it gives you total control over what happens. You can call whatever you need to call in whatever sequence.
So the absolute key thing to remember is that the only data you have available to you when a guess_nonlinear is called is any data that was computed before your containing solver was executed. That means anything that was computed before you got to the model scope of the containing solver (not the scope of the component with the guess_nonlinear method itself).
Your second question: "How can I speed this up when the number of nodes gets large?"
This one is not possible to answer generically. I noticed that you have already specified sparse partial derivatives. That is a great start, but if it's still not fast enough for you, then you're reaching the limits of what you can do with a DirectSolver. You note that this solver is the only one that gets you through the optimization without issues, which I will take to mean that ScipyKrylov and PETScKrylov are not converging the linear system well for you --- at least not by themselves. That's not surprising, as Krylov solvers almost always require some kind of preconditioner... and this is why I can't offer a generic answer. Setting up efficient linear solvers for larger-scale compute is a tricky subject. If you look into the literature, you'll find some good suggestions. You can also study open source implementations like VSPAero for some tips.
Effectively, you've reached the limit of what simple linear solvers can offer you. From this point forward, OpenMDAO can help a bit by making it easier to implement some preconditioning, but you'll have to suffer the math side yourself.
How to use a step function for an array in AnyLogic?
The step function is applied to double values, but I want to apply it to the elements of an array at a specific time.
You can't... so this is a solution:
Instead of an array, you should use a LinkedHashMap where the key is the specific time and the value is the step value you want at that time. You define it as follows:
And you put the values like this:
stepsArray.put(3.0,2.3);
where 3.0 is the time in which the step will occur and 2.3 is the value the step will take. You have to put there all the values you need. You are the one who has to fill these values according to your needs.
Then you create a cyclic event that checks whether it's time to apply a step, and a variable of type double that stores the value of the step.
So, the event:
double theTime = round(100*time())/100.0; //it's better to round up the time just in case
if (stepsArray.containsKey(theTime)) {
    variable = stepsArray.get(theTime);
}
Note that I'm using a variable, not a dynamic variable. Then you can connect the variable to wherever your step is needed in the SD model.
This method is a bit complicated, but it's the most general for your completely ambiguous question.
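To show why the time is rounded before the lookup, here is a small language-neutral sketch of the same idea in Python. It is not AnyLogic code: make_step_lookup, the two-decimal precision, and the sample schedule are assumptions made up for this illustration.

```python
def make_step_lookup(schedule, decimals=2):
    """Build a lookup that returns the scheduled step value at a given time.

    schedule maps a time to the value the step should take at that time.
    Keys and query times are rounded to the same precision so that tiny
    floating-point differences in the clock do not cause a missed step.
    """
    rounded = {round(t, decimals): v for t, v in schedule.items()}

    def value_at(now, current):
        # Keep the current value unless a step is scheduled for this time.
        return rounded.get(round(now, decimals), current)

    return value_at


# Hypothetical schedule: step to 2.3 at time 3.0, then to 4.1 at time 7.5.
lookup = make_step_lookup({3.0: 2.3, 7.5: 4.1})

variable = 0.0
for tick in range(1001):
    now = tick * 0.01  # stands in for a cyclic event firing every 0.01 time units
    variable = lookup(now, variable)
```

The recurrence of the event has to match the rounding precision, otherwise the event may never fire at a time that matches one of the keys.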
Not sure Felipe's approach is the best one but maybe I misunderstand the question.
Have you tried using a "table function" object? Define it as below where the horizontal axis represents the time unit and the vertical your step function data:
Then, use a cyclic event that every relevant time unit (depends on your model) pulls the current required value from the table function:
I'm trying to evaluate Apache Flink for the use case we're currently running in production using custom code.
So let's say there's a stream of events, each containing a specific attribute X which is a continuously increasing integer. That is, a bunch of contiguous events have this attribute set to N, then the next batch has it set to N+1, etc.
I want to break the stream into windows of events with the same value of X and then do some computations on each separately.
So I define a GlobalWindow and a custom Trigger where in onElement method I check the attribute of any given element against the saved value of the current X (from state variable) and if they differ I conclude that we've accumulated all the events with X=CURRENT and it's time to do computation and increase the X value in the state.
The problem with this approach is that the element from the next logical batch (with X=CURRENT+1) has been already consumed but it's not a part of the previous batch.
Is there a way to put it back somehow into the stream so that it is properly accounted for the next batch?
Or maybe my approach is entirely wrong and there's an easier way to achieve what I need?
Thank you.
I think you are on the right track.
A Trigger specifies when a window can be processed and when results for a window can be emitted.
The WindowAssigner is the part that decides which window an element is assigned to. So I would say you also need to provide a custom implementation of WindowAssigner that assigns the same window to all elements with an equal value of X.
A more idiomatic way to do this with Flink would be to use stream.keyBy(X).window(...). The keyBy(X) takes care of grouping elements by their particular value for X. You then apply any sort of window you like. In your case a SessionWindow may be a good choice. It will fire for each key after that key hasn't been seen for some configurable period of time.
This approach will be much more robust with regard to unordered data which you must always assume in a stream processing system.
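The difference between the two grouping strategies can be shown without Flink at all. The plain Python sketch below is only an illustration of the semantics, not Flink API: contiguous_batches and keyed_batches are made-up names, and the events are hypothetical dictionaries with an "x" attribute and a payload "v".

```python
from collections import defaultdict
from itertools import groupby


def contiguous_batches(events):
    """Split the stream into runs of consecutive events with the same x.

    The element that reveals a boundary simply starts the next run, so it
    is never lost, but a late event for an old x opens a new, separate run.
    """
    return [(x, [e["v"] for e in run])
            for x, run in groupby(events, key=lambda e: e["x"])]


def keyed_batches(events):
    """Group events by x regardless of arrival order (the keyBy idea)."""
    groups = defaultdict(list)
    for e in events:
        groups[e["x"]].append(e["v"])
    return dict(groups)


# Hypothetical stream in which event "d" for x=1 arrives late.
events = [
    {"x": 1, "v": "a"},
    {"x": 1, "v": "b"},
    {"x": 2, "v": "c"},
    {"x": 1, "v": "d"},
    {"x": 2, "v": "e"},
]

by_run = contiguous_batches(events)
by_key = keyed_batches(events)
```

With in-order data both functions produce the same batches. With the late event above, the run-based split yields four fragments while the key-based grouping still yields one batch per value of X.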
I have a nonlinear equation that uses an initial y'(a) and outputs a value y(b) such that y(b)=f(y'(a)), where f(x) is some function. The idea is that I'd like to be able to maximize y(b).
Typically, if I had a value for y(b), I could use the shooting or secant method. However I don't have that value. I was thinking I could use a loop to find the max value, but that is very inefficient. Anything better I could use?
*Edit: Also I do not have an explicit expression for f(x).
Thanks,
Mike
I have been making a LabVIEW program for kids to monitor energy production from various types of power sources. I have a condition where if they are underproducing a warning will fire, and if they are overproducing by a certain threshold, another warning will fire.
I would like to time how long, throughout the activity, each type of warning is fired so each group will have a score at the end. This is just to simulate how the eventual program will behave.
Currently I have a timer which can derive the amount of time the warning is true, but it will overwrite itself each time the warning goes off and on again.
So basically I need to sum up the total time that the value has been true, even when it has flitted between true and false.
One method of tabulating the total time spent "True" would be exporting the Warning indicator from the While-loop using an indexed tunnel. If you also export from the loop a millisecond counter value of when the indicator was triggered, you can post process what will be an array of True/False values with the corresponding time at which the value transitioned.
The post processing could be a for-loop that keeps a running total of time spent true.
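Since LabVIEW code is graphical, here is the post-processing loop sketched in Python instead. It assumes the two exported arrays have the same length and that each value holds until the next timestamp. The function name total_true_ms and the sample data are made up for this illustration.

```python
def total_true_ms(timestamps_ms, values):
    """Sum the time spent True, given parallel arrays of timestamps and values.

    Each value is assumed to hold from its own timestamp until the next one.
    The last sample has no following timestamp, so it contributes nothing.
    """
    total = 0
    for i in range(len(values) - 1):
        if values[i]:
            total += timestamps_ms[i + 1] - timestamps_ms[i]
    return total


# Hypothetical export: the warning is True from 100 to 250 ms and from 400 to 900 ms.
timestamps_ms = [0, 100, 250, 400, 900]
values = [False, True, False, True, False]
score_ms = total_true_ms(timestamps_ms, values)
```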
P.s. if you export your code as a VI snippet, others will be able to directly examine and modify the code without needing to remake it from scratch. See the NI webpage on the subject:
http://www.ni.com/white-paper/9330/en/
I would suggest going another way. Personally, I found the code you used confusing, since you subtract the tick count from the value in the shift register, which may work, but doesn't make any logical sense.
Instead, I would suggest turning this into a subVI which does the following:
Keep the current boolean value, the running total and the last reset time in shift registers.
Initialize these SRs on the first call using the first call primitive and a case structure.
If the value changes from F to T (compare the input to the SR), update the start time.
If it changes from T to F, subtract the start time from the current time and add that to the total.
I didn't actually code this now, so there may be holes there, but I'm leaving that as an exercise. Also, I would suggest making the VI reentrant. That way, you can simply call it a second time to get the same functionality for the second timer.
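For reference, the logic of that subVI can be written out in text form. The Python sketch below is only an analogy: the attributes stand in for the shift registers, each instance stands in for one reentrant clone of the VI, and the class and method names are made up.

```python
class TrueTimeAccumulator:
    """Accumulate the total time a boolean has been True across many calls."""

    def __init__(self):
        # Stand-ins for the shift registers, initialized on the "first call".
        self.prev = False
        self.start_ms = 0
        self.total_ms = 0

    def update(self, value, now_ms):
        """Feed the current boolean and time, return the running total."""
        if value and not self.prev:
            # F -> T: remember when this True period started.
            self.start_ms = now_ms
        elif self.prev and not value:
            # T -> F: add the finished True period to the total.
            self.total_ms += now_ms - self.start_ms
        self.prev = value
        return self.total_ms


# One instance per warning, like calling the reentrant VI in two places.
under = TrueTimeAccumulator()
over = TrueTimeAccumulator()

samples = [(False, 0), (True, 100), (True, 200), (False, 250), (True, 400), (False, 900)]
for value, now_ms in samples:
    under_total = under.update(value, now_ms)
over_total = over.update(False, 900)
```

In this sketch the total only grows when the value goes back to False, so a warning that is still True at the end of the activity needs one final call with False to be counted.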