Event Time and Watermarks can you explain it


I am new to Flink and am trying to learn the Event Time and Watermarks section.
Can you explain what watermarks are and what problem they solve? The example is not clear to me.
Are they only needed for event time (out-of-order processing)?

The purpose of the watermark is to define when a time-based window should fire.
Watermarks allow for events arriving slightly out of order: the timestamp extracted from an event may differ by some amount from where you would like to draw the "low water" mark for firing the window. This matters, for example, when your data is generated by disparate sources with varying latency before arrival (consider distributed logging). However, you might not need this if your data is guaranteed to have only ascending timestamps, for example if it is generated from sensor readings.
This goes hand in hand with some of the predefined watermark generators that Flink provides, which, not surprisingly, line up with these two cases: ascending timestamps and timestamps that are out of order by a bounded amount.
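
To make the "low water mark" idea concrete, here is a minimal plain-Java simulation of a bounded out-of-orderness watermark. It does not use the Flink API; the class and method names are made up for illustration, and the firing rule is a simplification of what Flink does internally (the watermark trails the highest timestamp seen by a fixed bound, and a window fires once the watermark reaches its end).

```java
// Plain-Java simulation; no Flink dependency. All names here are hypothetical.
class WatermarkSketch {
    // Watermark = highest event timestamp seen so far minus the tolerated disorder.
    static long watermark(long maxTimestampSeen, long maxOutOfOrdernessMs) {
        return maxTimestampSeen - maxOutOfOrdernessMs;
    }

    // Returns the index of the event whose arrival lets the window ending at
    // windowEnd fire, or -1 if the watermark never gets that far.
    static int firingIndex(long[] eventTimestamps, long windowEnd, long maxOutOfOrdernessMs) {
        long maxSeen = Long.MIN_VALUE;
        for (int i = 0; i < eventTimestamps.length; i++) {
            // Out-of-order events never move the maximum backwards.
            maxSeen = Math.max(maxSeen, eventTimestamps[i]);
            if (watermark(maxSeen, maxOutOfOrdernessMs) >= windowEnd) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Timestamps in milliseconds, arriving slightly out of order.
        long[] events = {1000, 3000, 2000, 9000, 4000, 12500};
        // Window [0, 10000) with 2 seconds of tolerated disorder.
        System.out.println(firingIndex(events, 10_000, 2_000));
    }
}
```

The event at 4000 arrives after the one at 9000 but still lands in the window, because the watermark is held 2 seconds behind the highest timestamp and has not yet reached 10000.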

Related

Event time window in Flink does not trigger

When I use flink event time window, the window just doesn't trigger. How can I solve the problem, and are there any ways to debug?

As you are using an event-time window, it is probably a watermark problem. The window only produces output when the watermark advances. There are several reasons why event time may not have advanced:
There is no data from the source.
One of the parallel source instances has no data.
The time field extracted from the record is in seconds, but it should be in milliseconds.
The data does not cover a time span longer than the window size, which is needed to advance event time past the end of the window.
As a check, the window will produce output if you switch from event time to processing time. You can also monitor event time by checking the watermarks in the web dashboard [1], or print-debug it with a ProcessFunction, which can look up the current watermark.
[1] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/debugging_event_time.html#monitoring-current-event-time
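
Two of the causes listed above can be illustrated with a small plain-Java sketch. It is not Flink code and the names are hypothetical; it only models the rule that an operator with several inputs follows the lowest of its input watermarks, and the fact that Flink expects event timestamps in epoch milliseconds.

```java
// Plain-Java illustration; no Flink dependency. All names here are hypothetical.
class StalledWatermarkSketch {
    // An operator with several inputs can only advance to the minimum of the
    // watermarks it has received on each input.
    static long combinedWatermark(long[] inputWatermarks) {
        long min = Long.MAX_VALUE;
        for (long w : inputWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }

    // Converts epoch seconds to the epoch milliseconds used for event timestamps.
    static long secondsToMillis(long epochSeconds) {
        return epochSeconds * 1000L;
    }

    public static void main(String[] args) {
        // The second source instance has seen no data, so its watermark is still
        // at the initial value and holds everything downstream back.
        long[] withIdleSource = {5000L, Long.MIN_VALUE, 7000L};
        System.out.println(combinedWatermark(withIdleSource) == Long.MIN_VALUE);

        // One hour of data stamped in seconds spans only 3600 "milliseconds",
        // far less than a one-minute (60000 ms) window.
        long start = 1_600_000_000L;
        long end = start + 3600L;
        System.out.println(end - start);
        System.out.println(secondsToMillis(end) - secondsToMillis(start));
    }
}
```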

Be sure you're setting environment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime). (On Flink 1.12 and later this call is deprecated, because event time is the default.)

How does Flink handle early events: ignore them or create a separate window?

Watermarks and late event handling are easy to understand, but what about early events? For example, if the original stream contains events that happened from 3:00 to 4:00, and I insert some events that happened from 6:00 to 7:00 into the stream, how does Flink handle them? Would it create separate window(s) for them and handle them too when those windows expire?

Depending on the watermarking strategy, early events can advance the watermark and then cause subsequent "on time" events to be considered late.

Early events are not dropped but put into the corresponding window. The window is processed when the watermark passes the end timestamp of the window. So, Flink is able to maintain several windows at the same time.
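
As a rough illustration of how an early event ends up in its own window, here is a plain-Java sketch of tumbling-window assignment. It is not the Flink API; the names are hypothetical, timestamps are assumed to be non-negative milliseconds, and the sketch covers assignment only (it ignores watermarks and lateness).

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration; no Flink dependency. All names here are hypothetical.
class EarlyEventSketch {
    // Start of the tumbling window that contains the timestamp.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Counts events per window, keyed by window start.
    static Map<Long, Integer> assign(long[] timestamps, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            counts.merge(windowStart(ts, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        // 3:10, 3:30, an early event at 6:15, then 3:50 (milliseconds since midnight).
        long[] events = {11_400_000L, 12_600_000L, 22_500_000L, 13_800_000L};
        // Two windows are open at once: one starting at 3:00, one at 6:00.
        System.out.println(assign(events, hour));
    }
}
```

Whether the 3:50 event in this example is still accepted in a real job depends on how far the 6:15 event pushed the watermark, as noted in the first answer.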

Process lots of small tasks and keep the UI responsive

I have a WPF application that needs to do some processing of many small tasks.
These small tasks are all generated at the same time and added to the Dispatcher Queue with a priority of Normal. At the same time a busy indicator is being displayed. The result is that the busy indicator actually freezes despite the work being broken into tasks.
I tried changing the priority of these tasks to be Background to see if that fixed it, but still the busy indicator froze.
I subscribed to the Dispatcher.Hooks.OperationStarted event to see if any render jobs occurred while my tasks were processing but they didn't.
Any ideas what is going on?
Some technical details:
The tasks are actually just messages coming from an Observable sequence, and they are "queued" into the dispatcher by a call to ReactiveUI's ObserveOn(RxApp.MainThreadScheduler) which should be equivalent to ObserveOn(DispatcherScheduler). The work portion of each of these tasks is the code that is subscribing through the ObserveOn call e.g.
IObservable<TaskMessage> incomingTasks;
incomingTasks.ObserveOn(RxApp.MainThreadScheduler).Subscribe(SomeMethodWhichDoesWork);
In this example, incomingTasks would produce maybe 3000+ messages in short succession; the ObserveOn pushes each call to SomeMethodWhichDoesWork onto the Dispatcher queue so that it will be processed later.

The basic problem
The reason you are seeing the busy indicator stall is that your SomeMethodWhichDoesWork is taking too long. While it is running, it prevents any other work from occurring on the Dispatcher.
Input and Render priority operations generated to handle animations are lower priority than Normal, but higher priority than Background operations. However, operations on the Dispatcher are not interrupted by the enqueuing of higher priority operations. So a Render operation will have to wait for a running operation, even if it is a Background operation.
Caveat regarding observing on the DispatcherScheduler
ObserveOn(DispatcherScheduler) will push everything through at Normal priority by default. More recent versions of Rx have an overload that allows you to specify a priority.
One point to highlight that's often missed is that items will be queued onto the Dispatcher by the DispatcherScheduler as soon as they arrive NOT one after the other.
So if your 3000 items all turn up fairly close together, you will have 3000 operations at Normal priority backed up on the Dispatcher blocking everything of the same or lower priority until they are done - including Render operations. This is almost certainly what you were seeing - and that means you might still see problems even if you do all but the UI update work on a background thread depending on how heavy your UI updates are.
In addition to this, you should check you aren't running the whole subscription on the UI thread - as Lee says. I usually write my code so that I Subscribe on a background thread rather than use SubscribeOn, although this is perfectly fine too.
Recommendations
Whatever you do, do as much work as possible on a background thread. That point has been done to death on StackOverflow, and elsewhere. Here are some good resources covering this:
MSDN Entry on WPF Threading Model
MSDN Magazine "Build More Responsive Apps With The Dispatcher", by Shawn Wildermuth
If you want to keep the UI responsive in the face of lots of small updates you can either:
Schedule items at a lower priority, which is nice and easy, but not so good if you need a certain priority.
Store updates in your own queue, and have each operation you run Invoke the next item from your queue as its last step.
The bigger picture
It's worth stepping back a bit and looking at the bigger picture as well.
If you separately dump 3000 items into the UI in succession, what's that going to do for the user? At best they are going to be running a monitor with a refresh rate of 100Hz, probably lower. I find that frame rates of 10 per second are more than adequate for most purposes.
Not only that, human beings supposedly can't handle more than 5-9 bits of information in one go - so you might find better ways of aggregating and displaying information than updating so many things at once. For example, make use of master/detail views rather than showing everything on screen at once etc. etc.
Another option is to review how much work your UI update is causing. Some controls (I'm looking at you XamDataGrid) can have very lengthy measure/arrange layout operations. Can you simplify your animations? Use a simpler Visual tree? Think about the popular busy spinner that looks like circling dots - but really it's just changing their color. A great effect that is fairly cheap to achieve. It's worth profiling your application to see where time is going.
I would think about the overall approach front-to-back as well. If you are reasonably certain you are going to get that many items to update at once, why not buffer them up and manage them in chunks? That might have advantages all the way back to the source, which perhaps is on a server somewhere. In any case, Rx has some nice operators, like Buffer, that can turn a stream of individual items into larger lists, and it has overloads that can buffer by time and size together.
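
The chunking idea itself is independent of Rx. As a language-neutral illustration (written in Java, not C#, and not the Rx Buffer operator), the sketch below splits a burst of items into fixed-size batches so the UI would handle a dozen updates instead of thousands. The class name and the batch size of 250 are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Size-based batching only; Rx's Buffer can additionally close a batch on a timer.
// All names here are hypothetical.
class ChunkSketch {
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> burst = Collections.nCopies(3000, 0);
        // 3000 individual updates become 12 batched updates.
        System.out.println(chunk(burst, 250).size());
    }
}
```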

Have you tried using .SubscribeOn(TaskPoolScheduler.TaskPool) to subscribe on a different thread?

@Pedro Pombeiro has the right answer.
The reason you are seeing the freezes on the UI is that you are queueing the work on the Dispatcher. This means the work will be done on the UI thread. You can think of the Dispatcher as a message pump that is constantly draining messages from each of its queues (you can think of there being one queue per priority [SystemIdle, ApplicationIdle, ContextIdle, Background, Input, Loaded, Render, DataBind, Normal, Send]).
Putting your work onto a different priority queue does not make it run concurrently, just asynchronously.
To run your work on another thread using Rx, then use SubscribeOn as above. Remember to then schedule any updates to the UI back on to the Dispatcher with ObserveOn.

How to synchronize two MediaPlayers or MediaElements in WPF?

I am trying to implement a system in WPF that plays two synchronized videos on two screens. I thought that if I bundled the two corresponding MediaTimelines into a single ParallelTimeline and controlled them from the clock controller of the ParallelTimeline, the clocks of the media timelines would be driven from the same clock and thus play in sync. That is not the case, however: there is a huge delay between the two. Is there some way of doing this?
Thanks

If your two MediaTimelines are in the same storyboard (and it sounds like they are), you should be able to keep the elements in sync by changing the ParallelTimeline.SlipBehavior to SlipBehavior.Slip. This behavior will "hold back" the progression of the timelines if a media element in the storyboard runs into buffering or loading delays.
You can get more details about this behavior here:
http://msdn.microsoft.com/en-us/library/cc304465.aspx

What is the most intuitive, usable way of entering a time of day or a duration?

I'm building a line-of-business application in Silverlight and need to get the user to edit two .NET TimeSpan values. One is a time of day (relative to midnight) and the other is a duration. Currently I'm using two TextBoxes, formatted as hh:mm. This is pretty straightforward, but it could definitely be improved. I've observed people using the application and while some have no problem quickly entering times, other people struggle.
Given that I'm working in Silverlight 2, what would you see as the perfect custom control that easily lets you visualize and edit these two TimeSpans?
To make things harder, the UI should allow any time of the day to be selected with accuracy down to the minute, but emphasize times within the normal working day (eg: 8:00am - 5:00pm). Some users tend to enter 2:00 (am) when they really mean 2:00pm.
In my app, I'm tending towards aligning the times and durations to 5 minute intervals. As a bit of background, this app is similar to a room booking app where people specify when and how long they want a room for.

In one of my web applications I used a slider with 2 handles.
Example:
|.........Y-----------------Y...|
          5AM               8PM
Of course I didn't need as high precision as you do, but I believe that with slightly longer slider 5min intervals would be possible.
To emphasize the normal workday, you could give that part of the slider a different background colour. Or make the handles "snap" to the start and end of the normal workday.

It probably depends on how accurate you need your data and how varying it can be. If it doesn't need to be perfectly accurate and it doesn't vary a lot, you could do something like
Task was performed at [select start time...] o'clock for [select duration...]
where [select start time...] is a pulldown with every hour and [select duration...] is a pulldown with common scenarios for what you're tracking like "30 mins", "1 hour", "2 hours"
If it needs to be flexible maybe just going with the sentence structure and replacing the pulldowns with textboxes would make it clear for all first time users.

Get the latest Silverlight Toolkit and use one of the new time-oriented controls.

Look at Outlook perhaps: it uses dropdowns that default to sane half hours (to me anyway ;) and the selections can then be edited by hand afterwards if higher precision is wanted. The duration also follows when the start time is changed, and defaults to an hour or something.
I used text boxes in an old web application before, just like you, with the added option of double-clicking them to bring up a quick selection widget like the Outlook example above. Perhaps a button or some other Silverlight magic can enhance that.
A vertical timeline like a calendar day in Outlook, where you can drag the top and bottom of a meeting "box", is to me the most intuitive or at least quickest way to place and adjust a booking. Perhaps it could be prefilled with a box that spans an hour or so, easily draggable to change the start time, with the top and bottom resizable to change the duration.

Expanding on what Anthony said, the Silverlight Toolkit March 2009 release includes TimePicker & TimeUpDown controls.
You can see a live demo of TimeUpDown and TimePicker with 2 popups at:
http://silverlight.codeplex.com/Wiki/View.aspx?title=Silverlight%20Toolkit%20Overview%20Part%201#TimeUpDown
I actually owned the feature set and API for this control, so I'm very familiar with the best forms of time input.
There's a whole list of best practices we can talk about for time input, all of which can currently be found in the controls.
On some concepts we've had to innovate (like the "Time Intellisense" feature) but mostly we were using true time-tested concepts. (no pun intended)
However, as part of the non-goals for these controls for v1 we decided not to support time ranges. If you feel that time ranges are something we should natively support, feel free to suggest this on codeplex:
http://silverlight.codeplex.com/WorkItem/Create.aspx
We actively prioritize items based on amount of votes and user scenarios called on in issues.
For now, I'd suggest you just use 2 TimePickers.
Advanced visualizations (like a multi select ruler or a multi slider) are one way of doing time range input, but you've got to have a solid globalized text input system for a fallback option.

This is a great time to ask what task your users are trying to accomplish. You can craft your system's performance based on this. In Outlook, for example, people usually enter the time because they are trying to schedule a meeting -- so you can easily disambiguate "2" or "2:00" to mean 2pm, because very few users are trying to schedule meetings at 2 am. This sounds similar to your application.
If you look at your users, they will likely also be scheduling for typical times -- these should be easy to specify in your interface. E.g., if most meetings are 50 minutes long, that should be very salient, perhaps a button or other one-click option.
I wouldn't recommend inventing a new input widget. The more standard your input tool, the less your users have to think when using your product. Concentrate on the smarts inside your logic, figuring out (and showing the user) what you think they're asking for.
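
As a sketch of the "smarts inside your logic" (written in Java for illustration; in the asker's .NET code this would operate on TimeSpan), the two rules discussed in this thread could look like the following. The names are hypothetical, and the cut-off that treats typed hours 1 to 7 as afternoon is an assumption, not something from the question.

```java
import java.time.LocalTime;

// Hypothetical helper; the thresholds are assumptions for illustration.
class TimeEntrySketch {
    // Interprets an hour typed without am/pm. Hours 1-7 are assumed to mean the
    // afternoon, since bookings at 2:00am are unlikely in a room booking app.
    static LocalTime disambiguate(int hour, int minute) {
        int h = (hour >= 1 && hour <= 7) ? hour + 12 : hour;
        return LocalTime.of(h, minute);
    }

    // Rounds a time of day to the nearest 5-minute mark, clamped to 23:55 so the
    // result never rolls over into the next day.
    static LocalTime snapToFiveMinutes(LocalTime t) {
        int totalMinutes = t.getHour() * 60 + t.getMinute();
        int snapped = ((totalMinutes + 2) / 5) * 5;
        snapped = Math.min(snapped, 23 * 60 + 55);
        return LocalTime.of(snapped / 60, snapped % 60);
    }

    public static void main(String[] args) {
        System.out.println(disambiguate(2, 0));                       // "2:00" read as afternoon
        System.out.println(disambiguate(9, 30));                      // morning hours unchanged
        System.out.println(snapToFiveMinutes(LocalTime.of(14, 3)));   // rounds up
        System.out.println(snapToFiveMinutes(LocalTime.of(14, 2)));   // rounds down
    }
}
```

Whatever rule is used, showing the interpreted value back to the user (for example "2:00 PM") lets them correct it when the guess is wrong.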
