We have a DB table where every row has a text message and a timestamp. E.g.
Mesg1 09:00
Mesg2 09:01
Mesg3 09:15
Mesg4 09:20
The timings are not at a fixed interval. It is uneven. We would like to read the table as a Source and send the Messages to a Target at the configured timestamps. Components like Quartz do not allow configuring uneven trigger times.
Is there a common pattern that can be followed for such a use case?
Regards,
Yash
Use camel cron component for the trigger events.
from("cron:tab?schedule=0/1+*+*+*+*+?")
.setBody().constant("event")
.log("${body}");
The schedule expression 0/3+10+++*+? can be also written as 0/3 10 * * * ? and triggers an event every three seconds only in the tenth minute of each hour.
Related
I have an AppleScript that runs on loop every two hours to modify a calendar B based on updates from another calendar A.
The script uses the on idle command below to wait 2 hours every loop. What happens if the computer stays idle for 1.5 hours then goes to sleep for 10 hours? Will there be 0.5 hours left when it wakes up? Any other scenarios?
on idle
my_code()
return (120 * minutes)
end idle
The script truly only needs to run if there is an update to calendar A, which is a shared iCloud calendar and can get updates from multiple people. The two hour loop is what I could figure out so far but I feel it is not efficient. Any more robust suggestions? Is there a way I can trigger the script to run only when it detects an update in calendar A? Or, along the same line of thought, is there a way to get the last timestamp the calendar was updated?
Thanks
I can't test following. Not sure it is the best way to solve your problem. Try yourself:
property oldStampDates : {}
on run
tell application "Calendar" to tell calendar "Test Calendar" to set oldStampDates to get stamp date of events
end run
on idle
--> Script retrieves last modified date and time of indicated calendar events.
tell application "Calendar" to tell calendar "Test Calendar" to set newStampDates to get stamp date of events
if newStampDates is not oldStampDates then display notification "The changes was detected"
set oldStampDates to newStampDates
return 30 -- seconds, default setting
end idle
NOTE: 1) you can put instead of display notification call to your handler my_code(), 2) you can put instead of 30 seconds other value, for example, return 10 (checking every 10 seconds).
I have a use case where I have 2 input topics in kafka.
Topic schema:
eventName, ingestion_time(will be used as watermark), orderType, orderCountry
Data for first topic:
{"eventName": "orderCreated", "userId":123, "ingestionTime": "1665042169543", "orderType":"ecommerce","orderCountry": "UK"}
Data for second topic:
{"eventName": "orderSucess", "userId":123, "ingestionTime": "1665042189543", "orderType":"ecommerce","orderCountry": "USA"}
I want to get all the userid for orderType,orderCountry where user does first event but not the second one in a window of 5 minutes for a maximum of 2 events per user for a orderType and orderCountry (i.e. upto 10 mins only).
I have union both topics data and created a view on top of it and trying to use flink cep sql to get my output, but somehow not able to figure it out.
SELECT *
FROM union_event_table
MATCH_RECOGNIZE(
PARTITION BY orderType,orderCountry
ORDER BY ingestion_time
MEASURES
A.userId as userId
A.orderType as orderType
A.orderCountry AS orderCountry
ONE ROW PER MATCH
PATTERN (A not followed B) WITHIN INTERVAL '5' MINUTES
DEFINE
A As A.eventName = 'orderCreated'
B AS B.eventName = 'orderSucess'
)
First thing is not able to figure it out what to use in place of A not followed B in sql, another thing is how can I restrict the output for a userid to maximum of 2 events per orderType and orderCountry, i.e. if a user doesn't perform 2nd event after 1st event in 2 consecutive windows for 5 minutes, the state of that user should be removed, so that I will not get output of that user for same orderType and orderCountry again.
I don't believe this is possible using MATCH_RECOGNIZE. This could, however, be implemented with the DataStream CEP library by using its capability to send timed out patterns to a side output.
This could also be solved at a lower level by using a KeyedProcessFunction. The long ride alerts exercise from the Apache Flink Training repo is an example of that -- you can jump straight away to the solution if you want.
I am new to apache flink and am trying to understand how the concept of EventTime and Windowing is handled by flink.
So here's my scenario :
I have a program that runs as a thread and creates a files with 3
fields every second of which the 3rd field is the timestamp.
There is a little tweak though every 5 seconds I enter an older timestamp (t-5 you could say) into the new file created.
Now I run the stream processing job which reads the 3 fields above
into a tuple.
Now I have defined the following code for watermarking and timestamp generation:
WatermarkStrategy
.<Tuple3<String, Integer, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(4))
.withTimestampAssigner((event, timestamp) -> event.f2);
And then I use the following code for windowing the above and trying to get the aggregation :
withTimestampsAndWatermarks
.keyBy(0)
.window(TumblingEventTimeWindows.of(Time.milliseconds(4000)))
.reduce((x,y) -> new Tuple3<String, Integer, Long>(x.f0, x.f1 + y.f1,y.f2))
It is clear that I am trying to aggregate the numbers within each field.(a little more context, the field(f2) that I am trying to aggregate are all 1s)
Hence I have the following questions :
That is the window is just 4 seconds wide, and every fifth entry is
an older timestamp, so I am expecting that the next window to have
lesser counts. Is my understanding wrong here ?
If my understanding is right - I do not see any aggregation when running both programs in parallel, Is there something wrong with my code ?
Another one that is bothering me is on what fields or on what parameters do the windows start time and end time really dependent ? Is it on the timestamp extracted from Events or is it from processing time
You have to configure the allowed lateness: https://nightlies.apache.org/flink/flink-docs-release-1.2/dev/windows.html#allowed-lateness. If not configured, Flink will drop the late message. So for the next window, there will be less elements than previous window.
Window is assigned by the following calculation:
return timestamp - (timestamp - offset + windowSize) % windowSize
In your case, offset is 0(default). For event time window, the timestamp is the event time. For processing time window, the timestamp is the processing time from Flink operator. E.g. if windowSize=3, timestamp=122, then the element will be assigned to the window [120, 123).
I'm new to Apache Camel. In my application, I need to consume information from files on a folder, make validations and store that resulting information on an object that is stored inside the exchange, on a property. This process must run every 3 hours a day.
But, I need only once a day at a scheduled time to send an email with the information that is stored on that object. How can I achieve this?
Here's some pseudo code:
.1 from("file:C:/SourceFolder?scheduler.cron=* 3 * * * *).aggregate().process().to(a); //every 3 hours;
.2 from(a).process(); //here, the email must be send at 8pm every day;
Component "Direct" doesn't work, as it doesn't accept scheduling. I just need the information on the exchange from .1, and I need the routing .2 to run only once a day. Suggestions please? Thank you.
What is a watermark in Flink with respect to Event time processing? Why is it needed.?
Why is it needed in all cases of event time being used. By all cases I mean if i dont do a window opeation
then why do we still need a water mark.
I come from spark background. In spark we need watermarks only when we use windows on the incoming events.
I have read few articles and it seems to me that watermarks and windows seems same.If there are differences please explain and point it put
Post your reply I did some more reading. Below is a query that is more specific.
Main Question:- Why do we need outoforder when we have acceptedlateness.
Given below example:
Assume you have a BoundedOutOfOrdernessTimestampExtractor with a 2 minute bound and a 10 minute tumbling window that starts at 12:00 and ends at 12:10:
12:01, A
12:04, B
WM, 12:02 // 12:04 - 2 minutes
12:02, C
12:08, D
12:14, E
WM, 12:12
12:16, F
WM, 12:14 // 12:16 - 2 minutes
12:09, G
In the above example [12:02, C] record is not dropped but included into the window 12:00 -12:10 and later evaluated.- Hence the watermark could as well be the event timestamp
The record [12:09, G] is included into the window 12:00 - 12:10 only when there is a acceptedlateness of 5mins configured. This takes care of late and out of order events
So now adding to my previous question above, what is the necessary of outoforder option to be BoundedOutOfOrdernessTimestampExtractor of some value(other than 0) instead of the event timestamp istelf ?
What is that outoforder can achieve which allowedlateness cannot and in what scenario it does?
Watermarks and windows are closely related but they are very different concepts.
Watermarks are needed for any kind of event-based aggregation to cut off late events. Windows can only close when they receive an appropriate watermark and that's when results of aggregations are published.
If you have no out of order events, you can set watermarks to be equivalent to the timestamps of input events. But that's usually a luxury.
edit to address questions in comment.
is it a rule of thumb to keep the watermarks duration equal to window duration because by only doing so the result is calculated and emitted.
No, the durations are independent, but add up the lag on a given event.
Your watermark duration depends on your data and how much lag you can take for your application. Let's say most events are in order, 10% are coming up to 1s late, an additional 5% up to 10s, and 1% up to 1h.
If you set watermark duration to 0, then 16% of your data points are discarded, but Flink will receive no additional lag. If your watermark trails 1s behind your events, you will lose 6% of your data, but the results will have 1s more lag. If you want to retain all data, Flink will need to wait for 1h on each aggregation until Flink can be sure that no data is missing.
But then what is the role of trigger? and how does sliding windows coordinate with water marks and triggers. Can you please explain as how they work with each other ?
Let's say, you have a window of 1 min and a watermark delay of 5 s. A window will only trigger when it is sure that all relevant data has been seen. In this case, it needs to wait 1 min 5 s to trigger, such that the last event of the window has surely arrived.
Btw events later as watermark are discarded by default. You can change that behavior.