Flink trigger on a custom window - apache-flink

I'm trying to evaluate Apache Flink for the use case we're currently running in production using custom code.
So let's say there's a stream of events, each containing a specific attribute X, which is a continuously increasing integer. That is, a run of contiguous events has this attribute set to N, the next run has it set to N+1, etc.
I want to break the stream into windows of events with the same value of X and then do some computations on each separately.
So I define a GlobalWindow and a custom Trigger. In the trigger's onElement method I compare the attribute of each element with the current value of X saved in a state variable. If they differ, I conclude that all the events with X=CURRENT have been accumulated, so it's time to do the computation and increment X in the state.
The problem with this approach is that the element from the next logical batch (with X=CURRENT+1) has been already consumed but it's not a part of the previous batch.
Is there a way to put it back somehow into the stream so that it is properly accounted for the next batch?
Or maybe my approach is entirely wrong and there's an easier way to achieve what I need?
Thank you.

I think you are on the right track.
A Trigger specifies when a window can be processed and when its results can be emitted.
The WindowAssigner is the part that decides which window an element is assigned to. So I would say you also need to provide a custom WindowAssigner implementation that assigns the same window to all elements with an equal value of X.
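The assign-then-trigger idea can be sketched in plain Java, outside Flink. This is not the Flink `WindowAssigner`/`Trigger` API; the `Event` record and the `process` method are hypothetical. Each element is first placed in the window for its own X, and only then are windows with a smaller X fired, so the element that "closes" batch N is not lost: it already sits in batch N+1. It assumes, as in the question, that X never decreases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class WindowByXDemo {
    /** Hypothetical event: the attribute X plus some payload. */
    public record Event(int x, String payload) {}

    /** Plain-Java simulation: assign each element by X first, then fire completed windows. */
    public static List<List<Event>> process(List<Event> stream) {
        SortedMap<Integer, List<Event>> open = new TreeMap<>();
        List<List<Event>> fired = new ArrayList<>();
        for (Event e : stream) {
            // "Assigner" step: the element always goes to the window for its own X.
            open.computeIfAbsent(e.x(), k -> new ArrayList<>()).add(e);
            // "Trigger" step: since X only increases, every window with a smaller X is complete.
            SortedMap<Integer, List<Event>> done = open.headMap(e.x());
            fired.addAll(done.values());
            done.clear(); // removes the fired windows from the backing map
        }
        fired.addAll(open.values()); // flush whatever is still open at end of input
        return fired;
    }

    public static void main(String[] args) {
        List<Event> stream = List.of(
                new Event(1, "a"), new Event(1, "b"),
                new Event(2, "c"), new Event(2, "d"),
                new Event(3, "e"));
        for (List<Event> window : process(stream)) {
            System.out.println(window);
        }
    }
}
```

In a real job the final flush would come from a timer or end-of-input rather than a loop ending.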

A more idiomatic way to do this with Flink would be to use stream.keyBy(X).window(...). The keyBy(X) takes care of grouping elements by their particular value for X. You then apply any sort of window you like. In your case a SessionWindow may be a good choice. It will fire for each key after that key hasn't been seen for some configurable period of time.
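To make the keyBy-plus-session-window semantics concrete, here is a plain-Java batch simulation (not Flink code; the `Event`/`Session` records and the `gap` parameter are hypothetical). Events are grouped by key, each key's events are sorted by timestamp so out-of-order arrival within a key does not matter, and a new session starts whenever the gap to the previous event exceeds `gap`. Flink's exact boundary handling and its watermark-driven firing differ from this sketch, which simply sees all the data at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyedSessionDemo {
    public record Event(int x, long timestamp) {}
    public record Session(int key, long start, long end, int count) {}

    /** Group by key, then split each key's timestamps into sessions separated by gaps > gap. */
    public static List<Session> sessions(List<Event> events, long gap) {
        Map<Integer, List<Long>> byKey = new TreeMap<>();
        for (Event e : events) {
            byKey.computeIfAbsent(e.x(), k -> new ArrayList<>()).add(e.timestamp());
        }
        List<Session> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Long>> entry : byKey.entrySet()) {
            List<Long> ts = entry.getValue();
            Collections.sort(ts); // tolerate out-of-order arrival within a key
            long start = ts.get(0), end = start;
            int count = 1;
            for (int i = 1; i < ts.size(); i++) {
                long t = ts.get(i);
                if (t - end > gap) { // inactivity gap exceeded: close the current session
                    out.add(new Session(entry.getKey(), start, end, count));
                    start = t;
                    count = 0;
                }
                end = t;
                count++;
            }
            out.add(new Session(entry.getKey(), start, end, count));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event(1, 0), new Event(2, 150), new Event(1, 100), new Event(1, 5000));
        sessions(events, 1000).forEach(System.out::println);
    }
}
```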
This approach will be much more robust with regard to unordered data which you must always assume in a stream processing system.

Related

How do I find the event time difference between consecutive events in Flink?

I want to find the event time difference between every two consecutive input events. If the time difference is above a certain threshold then I want to output an event signalling the threshold has been breached. I also want the first event of the stream to always output this breach signal as an indication that it does not have a previous event to calculate a time difference with.
I tried using Flink's CEP library as it ensures that the events are ordered by event time.
The pattern I created is as follows:
Pattern.begin("begin").optional().next("end");
I use the optional() clause to cater for the first event as I figured the first event would be the only event where "begin" would not have a value.
When I input my events a1 a2 a3 a4 a5 I get the following output matches:
{a1} {a1 a2} {a2} {a2 a3} {a3} {a3 a4} {a4} {a4 a5}...
However I want the following as it will allow me to calculate the time difference between each consecutive event.
{a1} {a1 a2} {a2 a3} {a3 a4} {a4 a5}...
I have tried playing around with different AfterMatchSkipStrategy settings as well as IterativeCondition clauses but with no success.
Marking "begin" as optional is what's causing the unwanted matches. I would look for some other way to generate the breach signal for the first event -- e.g., perhaps you could prepend a dummy first event.
Another approach would be to only use CEP or SQL for sorting the stream, and then use a RichFlatMap or stateful process function to implement the business logic: i.e., compute the differences and generate the breach signals.
See Can I use Flink CEP to sort a stream? for how to do this.
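The business logic for the second approach can be sketched in plain Java, assuming the events already arrive sorted by event time. In a Flink stateful function the `previous` field would live in keyed state (e.g. a `ValueState<Long>`); here it is a local variable, and the `Event`/`Breach` records and the `threshold` parameter are hypothetical. The first event always produces a breach with no gap, which replaces the `optional()` trick.

```java
import java.util.ArrayList;
import java.util.List;

public class GapBreachDemo {
    public record Event(String id, long timestamp) {}
    /** gapMs is null for the first event, which has no predecessor. */
    public record Breach(String id, Long gapMs) {}

    /** Assumes 'sorted' is already in event-time order (e.g. sorted via CEP or SQL). */
    public static List<Breach> detect(List<Event> sorted, long threshold) {
        List<Breach> out = new ArrayList<>();
        Long previous = null; // would be keyed state in a real Flink job
        for (Event e : sorted) {
            if (previous == null) {
                out.add(new Breach(e.id(), null)); // first event: always signal
            } else {
                long gap = e.timestamp() - previous;
                if (gap > threshold) {
                    out.add(new Breach(e.id(), gap));
                }
            }
            previous = e.timestamp();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event("a1", 0), new Event("a2", 100),
                new Event("a3", 1500), new Event("a4", 1600));
        detect(events, 1000).forEach(System.out::println);
    }
}
```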

How do I select whether the routine continues based on the participant's response?

I want to create an experiment in PsychoPy Builder that conditionally shows a second routine to participants based on their keyboard response.
In the task, I have a loop that first goes through a routine where participants have three options to respond ('left','right','down') and only if they select 'left', regardless of the correct answer, should they see a second routine that asks a follow-up question to respond to. The loop should then restart with routine 1 each time.
I've tried using bits of code in the "begin experiment" section as such:
if response.key=='left':
    continueRoutine=True
elif response.key!='left':
    continueRoutine=False
But here I get an error saying response.key is not defined.
Assuming your keyboard component is actually called response, the attribute you are looking for is called response.keys. It is pluralised as it returns a list rather than a single value. This is because it is capable of storing multiple keypresses. Even if you only specify a single response, it will still be returned as a list containing just that single response (e.g. ['left'] rather than 'left'). So you either need to extract just one element from that list (e.g. response.keys[0]) and test against that, or use a construction like if 'left' in response.keys to check inside the list.
Secondly, you don't need to have a check that assigns True to continueRoutine, as it defaults to being True at the beginning of a routine. So it is only setting it to False that results in any action. So you could simply do something like this:
if 'left' not in response.keys:
    continueRoutine = False
Lastly, for PsychoPy-specific questions, you might get better support via the dedicated forum at https://discourse.psychopy.org as it allows for more to-and-fro discussion than the single question/answer structure here at SO.

Step Function For Array in Anylogic

How to use an step function for an array in Anylogic?
The step function is applied to double values, but I want to apply it to the elements of an array at a specific time.
You can't... so this is a solution:
Instead of an array you should use a LinkedHashMap where the key is the specific time and the value is the step value you want at that time. So you define it as follows:
And you put the values like this:
stepsArray.put(3.0,2.3);
where 3.0 is the time in which the step will occur and 2.3 is the value the step will take. You have to put there all the values you need. You are the one who has to fill these values according to your needs.
Then you create a cyclic event that checks whether it's time to apply a step, and a variable of type double that stores the current step value.
So, the event:
double theTime = round(100 * time()) / 100.0; // round the time to 2 decimals so floating-point drift doesn't break the lookup
if (stepsArray.containsKey(theTime)) {
    variable = stepsArray.get(theTime);
}
Note that I'm using a plain variable, not a dynamic variable. Then you can connect the variable to wherever your step is needed in the SD model.
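The reason for rounding before the lookup is easiest to see outside AnyLogic. The plain-Java sketch below simulates the cyclic event: the model time is accumulated in steps of 0.01 (so it drifts, e.g. to something like 3.0000000000000027 instead of 3.0), and rounding to two decimals recovers the exact map key. The `valueAfter` method and its parameters are hypothetical stand-ins for `time()`, the event's recurrence, and the variable; the rounding assumes the event period is a multiple of 0.01.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StepScheduleDemo {
    /** Simulates a cyclic event firing every 'step' time units until 'until'; returns the variable's final value. */
    public static double valueAfter(Map<Double, Double> steps, double initial, double step, double until) {
        double variable = initial;
        double now = 0.0;                       // accumulated time, drifts slightly from exact multiples
        long n = Math.round(until / step);
        for (long i = 0; i <= n; i++) {
            double theTime = Math.round(100 * now) / 100.0; // round to 2 decimals before the lookup
            Double v = steps.get(theTime);
            if (v != null) {
                variable = v;                   // apply the step scheduled for this time
            }
            now += step;
        }
        return variable;
    }

    public static void main(String[] args) {
        Map<Double, Double> steps = new LinkedHashMap<>();
        steps.put(3.0, 2.3); // at t = 3.0 the value becomes 2.3
        steps.put(4.5, 7.1);
        System.out.println(valueAfter(steps, 0.0, 0.01, 5.0));
    }
}
```

Without the rounding, `steps.get(now)` would almost never match, because `now` rarely equals 3.0 or 4.5 exactly.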
This method is a bit complicated, but it's the most general for your completely ambiguous question.
Not sure Felipe's approach is the best one but maybe I misunderstand the question.
Have you tried using a "table function" object? Define it as below where the horizontal axis represents the time unit and the vertical your step function data:
Then, use a cyclic event that every relevant time unit (depends on your model) pulls the current required value from the table function:

LabVIEW PID.vi continues when event case is False

I'm looking for a way to disable the PID.vi from running in Labview when the event case container is false.
The program controls motor position to maintain constant tension on a cable using target force and actual force as the input parameters. The output is motor position. Note that reinitialize is set to false since it needs previous instances to spool the motor.
Currently, when the event case is true, the motor spools as expected and maintains the cable tension. But when the event case is toggled to false, the PID.vi seems to keep running in the background, causing the motor to spool sporadically.
Is there a way to freeze the PID controls so that it continues from where it left off?
The PID VI does not run in the background. It only executes when you call it. That said, PID is a time-based calculation. It calculates the difference from the last time you called the VI and uses that to calculate the new values. If a lot of time passed, it will just try to fix it using that data.
If you want to freeze the value and then resume fixing smoothly, you can use the limits input on the top and set the max and min to your desired output. This will cause the PID VI to always output that value. You will probably need a feedback node or shift register to remember the last value output by the PID.
What Yair said is not entirely true - the integral and derivative terms are indeed time dependent, but the proportional is not. A great reference for understanding PIDs and how they are implemented in LabVIEW can be found here (not sure why it is archived). Also, the PID VIs are coded in G so you can simply open them to see how they operate.
If you take a closer look at the PID VI, you can see what is happening and why you might not get the response you expect. In the VI itself, dt will be either 1) what you set it to, or 2) an accumulation of time based on a tick count stored in the VI (the default). Since you have not specified a dt, the PID algorithm uses the accumulated time between calls. If you have "paused" calculation for some time, this will have an impact on the integral and derivative output.
The derivative output will kick in when there is a change in the process variable (using the process variable rather than the error prevents derivative kick). The effect of a large accumulated time between calls will be to reduce the response of this term. The time that you have paused will have a more significant impact on the integral term. Since the response of the integral portion of the controller is proportional to the integral of the error over dt, the longer you pause, the larger the response, simply because the algorithm is performing a trapezoidal integration over dt.
My first suggestion is don't pause the controller - let the PID do what it is supposed to do. If you are using it properly, then you should not have to stop the controller action. But, if you must pause the controller action, consider re-initializing the controller. This will force the controller to reset the accumulated time term and the response in the first iteration will be purely proportional.
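The effect described above can be reproduced with a simplified PID in Java. This is not NI's PID.vi, only a sketch with the same ingredients: dt measured between calls, trapezoidal integration of the error, and a derivative on the process variable. It assumes that reinitializing clears both the integral and the stored time, so the next call is purely proportional. With a constant error of 2, a 10 s gap between calls inflates the integral term from 0.5 to 10.5, while reinitializing first yields the proportional output alone.

```java
public class PidPauseDemo {
    /** Simplified PID (not NI's implementation): trapezoidal integral, derivative on the process variable. */
    public static final class Pid {
        private final double kp, ki, kd;
        private double integral, prevError, prevPv, lastTime;
        private boolean initialized;

        public Pid(double kp, double ki, double kd) {
            this.kp = kp;
            this.ki = ki;
            this.kd = kd;
        }

        /** Assumption: reinitializing forgets the accumulated time and the integral. */
        public void reinitialize() {
            initialized = false;
            integral = 0.0;
        }

        public double update(double setpoint, double pv, double now) {
            double error = setpoint - pv;
            double output;
            if (!initialized) {
                initialized = true;
                output = kp * error; // no previous call, so no dt: proportional term only
            } else {
                double dt = now - lastTime; // includes however long the caller "paused"
                integral += ki * 0.5 * (error + prevError) * dt; // trapezoidal integration over dt
                double derivative = dt > 0 ? -kd * (pv - prevPv) / dt : 0.0;
                output = kp * error + integral + derivative;
            }
            prevError = error;
            prevPv = pv;
            lastTime = now;
            return output;
        }
    }

    public static void main(String[] args) {
        Pid paused = new Pid(1.0, 0.5, 0.0);
        paused.update(10, 8, 0.0);
        paused.update(10, 8, 0.5);
        System.out.println("after a 10 s pause: " + paused.update(10, 8, 10.5)); // 12.5

        Pid reset = new Pid(1.0, 0.5, 0.0);
        reset.update(10, 8, 0.0);
        reset.update(10, 8, 0.5);
        reset.reinitialize();
        System.out.println("after reinitialize: " + reset.update(10, 8, 10.5)); // 2.0
    }
}
```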
Hope this helps.

LabVIEW: How to exchange lots of variables between loops?

I have two loops:
One loop gets data from a device and processes it. Scales received variables, calculates extra data.
Second loop visualizes the data and stores it.
There are lots of different variables that need to be passed between those two loops, about 50 of them. I need the second loop to have access only to the newest values of the data, and it needs to be able to read those variables whenever they have to be visualized.
What is the best way to share such a vector between two loops?
There are various ways of sharing data.
The fastest and simplest is a local variable; however, that is rather uncontrolled, you need to make sure each one is written in only one place, and you need an indicator for each.
One of the most advanced options is to create a class for your data, share one instance of it between the loops (this requires a by-reference class; with a by-value class each wire branch gets its own copy), and give it a public 'GET' method.
In between you have several other options:
queues
semaphores
property nodes
global variables
shared variables
notifiers
events
TCP-IP
In short there is no best way, it all depends on your skills and application.
As long as you're considering loops within the SAME application, there ARE good and bad ideas, though:
queues (OK, has most features)
notifiers (OK)
events (OK)
FGVs (OK, but keep an eye on massively parallel access hindering execution)
semaphores (that's not data comms)
property nodes (very inefficient, prone to race cond.)
global variables (prone to race cond.)
shared variables (badly implemented by NI, prone to race cond.)
TCP-IP (slow, awkward, affected by firewall config)
The quick and dirty way to do this is to write each value to an indicator in the producer loop - these indicators can be hidden offscreen, or in a page of a tab control, if you don't want to see them - and read a local variable of each one in the consumer loop. However if you have 50 different values it may become hard to maintain this code if you need to change or extend it.
As Ton says there are many different options but my suggestion would be:
Create a cluster control, with named elements, containing all your data
Save this cluster as a typedef
Create a notifier using this cluster as the data type
Bundle the data into the cluster (by name) and write this to the notifier in the producer loop
Read the cluster from the notifier in the consumer loop, unbundle it by name and do what you want with each element.
Using a cluster means you can easily pass it to different subVIs to process different elements if you like, and saving as a typedef means you can add, rename or alter the elements and your code will update to match. In your consumer loop you can use the timeout setting of the notifier read to control the loop timing, if you want. You can also use the notifier to tell the loops when to exit, by force-destroying it and trapping the error.
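As a rough text-language analogue of this pattern (Java, not LabVIEW), the typedef'd cluster becomes a record with named fields, and the notifier becomes a "latest value" mailbox that the producer overwrites and the consumer reads as a whole. The `Snapshot` fields and the `LatestValue` class are hypothetical. Unlike a real LabVIEW notifier, this sketch has no wait-with-timeout; it only shows the "newest value wins, older values are dropped" semantics.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LatestValueDemo {
    /** Analogue of the typedef'd cluster; field names are hypothetical. */
    public record Snapshot(double force, double position, double temperature, long sequence) {}

    /** Notifier-like mailbox: each send overwrites the previous value (lossy by design). */
    public static final class LatestValue<T> {
        private final AtomicReference<T> ref = new AtomicReference<>();

        public void send(T value) { ref.set(value); }

        public T latest() { return ref.get(); } // null until the producer has sent something
    }

    public static void main(String[] args) throws InterruptedException {
        LatestValue<Snapshot> mailbox = new LatestValue<>();
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= 1000; i++) {
                // Bundle all values together so the consumer never sees a half-updated set.
                mailbox.send(new Snapshot(i * 0.1, i * 2.0, 20.0, i));
            }
        });
        producer.start();
        producer.join();
        Snapshot s = mailbox.latest(); // consumer: read the whole snapshot, then "unbundle by name"
        System.out.println(s.sequence() + " " + s.position());
    }
}
```

Sending one immutable record per update, rather than 50 separate values, is what keeps the reader from mixing old and new fields.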
Two ways:
Use a display loop with an SEQ (Single Element Queue).
Use an event structure with a User Event (do not put two event structures in the same loop; use another loop). Use an enum with a case structure, and a variant to cast the data to the expected type.
(A notifier isn't reliable for streaming data, because it is lossy. Use it only to trigger small actions.)
If all of your variables can be bundled together in a single cluster to send at once, then you should use a single element queue. If your requirements change later such that the transmission cannot be lossy, then it's a matter of changing the input to the Obtain Queue VI (with a notifier you'd have to swap out all of the VIs). Setting up individual indicators and local variables would be pretty darn tedious. Also, not good style.
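The point about only changing the queue size can be illustrated with a Java analogue (not LabVIEW code). The producer always uses the same "drop the oldest element if full, then enqueue" call, roughly what LabVIEW's Lossy Enqueue Element does; only the queue construction decides whether data is lost. The `run` and `enqueueDroppingOldest` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueModeDemo {
    /** If the queue is full, discard the oldest element so the newest one always gets in. */
    static <T> void enqueueDroppingOldest(BlockingQueue<T> q, T item) {
        while (!q.offer(item)) {
            q.poll();
        }
    }

    /** Producer code is identical in both modes; only the queue's capacity differs. */
    public static List<Integer> run(boolean singleElement) {
        BlockingQueue<Integer> q = singleElement
                ? new ArrayBlockingQueue<>(1)   // SEQ: consumer only ever sees the newest value
                : new LinkedBlockingQueue<>();  // unbounded: nothing is dropped
        for (int i = 1; i <= 5; i++) {
            enqueueDroppingOldest(q, i);
        }
        List<Integer> received = new ArrayList<>();
        q.drainTo(received); // consumer side
        return received;
    }

    public static void main(String[] args) {
        System.out.println("single element: " + run(true));
        System.out.println("unbounded:      " + run(false));
    }
}
```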
If the loops are inside of the same VI then:
The simplest solution would be local variables.
A little better is to use shared variables.
Better is to use functional global variables (FGVs)
The best solution would be using SEQ (Single Element Queue).
Anyway, for a better understanding, please go through this paper.
