My job does the following things:
Consumes events from a Kafka topic based on event time.
Computes a sliding window of 7 days with a slide of 1 day.
Sinks the results to Redis.
I have several issues:
If it consumes Kafka events starting from the latest record, then after the job has been alive for 1 day it closes the window and computes a 7-day window. The problem is that the job only has 1 day of data, so the results are wrong.
If I let it consume the Kafka events starting from a timestamp 7 days ago, then as soon as the job starts it calculates every window from the first day onward, which takes a lot of time. Also, I only want the results of the last window, because that is what matters to me.
Have I missed something? Is there a better way to do that?
Flink aligns time windows to the epoch. So if you have windows that are one hour long, they run from the top of the hour to the top of the hour. Day-long windows run from midnight to midnight UTC. The same principle applies to windows that are seven days long, and since the epoch began on a Thursday (Jan 1, 1970), a window that is seven days long closes at midnight UTC on Wednesday night / Thursday morning.
You can supply an offset to the window constructor if you want to shift the windows to start at a different time.
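To make the alignment concrete, here is a minimal Python sketch of the arithmetic. It is not Flink code: the function name window_start_ms and the helpers are made up for illustration, and it only mirrors the idea of "timestamp minus remainder modulo window size". In Flink itself the offset is passed to the window assigner (for example as the extra argument of SlidingEventTimeWindows.of(size, slide, offset)).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DAY_MS = 24 * 60 * 60 * 1000


def window_start_ms(timestamp_ms: int, size_ms: int, offset_ms: int = 0) -> int:
    """Start of the epoch-aligned window of length size_ms that contains timestamp_ms."""
    # Python's % is non-negative for a positive modulus, so no sign fix-up is needed.
    return timestamp_ms - (timestamp_ms - offset_ms) % size_ms


def to_ms(dt: datetime) -> int:
    """Milliseconds since the epoch for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_ms(ms: int) -> datetime:
    """Inverse of to_ms."""
    return EPOCH + timedelta(milliseconds=ms)


event = datetime(2021, 6, 26, 12, 0, tzinfo=timezone.utc)  # a Saturday

# No offset: the 7-day window containing the event starts on a Thursday at 00:00 UTC.
aligned = from_ms(window_start_ms(to_ms(event), 7 * DAY_MS))

# Offset of 3 days: the window containing the same event starts on a Sunday at 00:00 UTC.
shifted = from_ms(window_start_ms(to_ms(event), 7 * DAY_MS, offset_ms=3 * DAY_MS))

print(aligned.strftime("%Y-%m-%d %H:%M"), shifted.strftime("%Y-%m-%d %H:%M"))
```

With a slide of 1 day the same alignment applies to the slide, so a new 7-day window closes at every midnight UTC unless an offset is supplied.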
Related
I want to run a job on a CRON schedule every 14 days, starting from a specific date and in a specific time zone.
For example: every 14 days starting from June 24th, in the CST time zone.
Run job every fortnight
The easy way
The easiest way to do this is simply to create the task to run every 14 days from when you want it to first run like:
CREATE TASK mytask_fortnightly
WAREHOUSE = general
SCHEDULE = '20160 MINUTE'
AS
SELECT 'Hello world'
How it works
There are 60 minutes in an hour, 24 hours in a day and 14 days in a fortnight, so that is 60 x 24 x 14 = 20,160 minutes.
Caveat
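The above solution does not run the task every fortnight from a given date/time, but rather every fortnight from when the task is created. Also note that Snowflake's CREATE TASK documentation lists a maximum of 11520 minutes (8 days) for a MINUTE interval, so check whether a 20160-minute schedule is accepted on your account; if it is not, use the CRON approach described under the harder way.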
Even though this is the simplest method, it does require you to be nominally present to create the task at the exact desired next scheduled time.
As a workaround however, you can create a one-shot task to do that for you the very first time at the exact correct date/time. This means you don't have to remember to be awake / alert / present to do it manually yourself, and you can clean up the creation task afterwards.
The harder way
Other solutions will require you to create a task which gets run every Thursday (since 2021-06-24 is/was a Thursday, each subsequent Thursday will either be the off-week, or the fortnight week)
e.g. SCHEDULE = 'USING CRON 0 0 * * THU America/Chicago' (the CRON form needs a trailing time zone; America/Chicago is used here for CST)
Then you will add specific logic to it to determine which one the correct fortnight is.
Using this method will also incur execution cost on the off-week, because the task still has to run to determine whether it is the correct week.
JavaScript SP
In JavaScript you can determine whether it's the correct week by subtracting the start date from the current date. If the difference is not a multiple of 14 days, use that as a condition to short-circuit the SP.
const deltaMs = (new Date) - (new Date('2021-06-24'));
const deltaDays = ~~(deltaMs / 86400000);
const run = deltaDays % 14 === 0;
if (!run) return;
// ... continue to do what you want.
SQL
You can also check if it's a fortnight using the following SQL condition in a WHERE clause, or IFF / CASE functions.
DATEDIFF('day', '2021-06-24', CURRENT_DATE) % 14 = 0
I need to generate a report that shows activity on all accounts where the last activity was more than 7 days ago, thus not showing accounts that have had activity in the past 7 days.
I know this can be hard-coded when building the report, but I need it to update for the current day each time it is run. I don't want to have to edit the report every day.
Wouldn't it be a filter like DateField > LAST 7 DAYS, since reports support relative dates such as LAST WEEK, LAST FY, YESTERDAY, etc.?
This should be a 2 hour 50 minute event starting at 5pm PST on a Thursday evening, repeating weekly. When imported into Google Calendar or Evolution, the first occurrence is correct, but in subsequent weeks the event falls on Wednesdays.
DTSTART:20170908T000000Z
DTEND:20170908T025000Z
RRULE:FREQ=WEEKLY;UNTIL=20171201T080000Z;BYDAY=TH
Other events that my application generates occur on multiple days, e.g. BYDAY=TH,TU, so simply removing the BYDAY is not a solution for my problem.
You are mixing up the timezone of the event and the display timezone. As far as recurrence calculation goes, the only thing that counts is the timezone that you declare in the VEVENT. Here you are using UTC. So:
The first instance is on 20170908, which is a Friday. From there, you ask for a recurrence every Thursday. The next Thursday after 20170908 is 20170914, so the next instance starts on 20170914T000000Z.
When viewed with a display timezone of PST, this event does indeed have its first instance on Thursday evening. But the second instance, as calculated above, falls on a Thursday in UTC, and therefore on a Wednesday in PST.
Besides this particular issue, you also need to worry about daylight saving changes. If your event is really taking place in the PST timezone, the event as expressed above will see its local time shift in November, after the DST transition takes place.
Hence it is almost never a good idea to express a recurring event by using UTC (Zulu) time.
You should have your event expressed in local time with timezone, i.e. use:
DTSTART;TZID="America/Los_Angeles":20170907T170000
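The difference between the two anchors can be checked with a small Python sketch. It does not parse iCalendar; the helper next_on_weekday is a made-up name that just picks the next occurrence of a weekday in the datetime's own time zone, which is the part of the recurrence calculation that matters here. It assumes Python 3.9+ with time zone data available for zoneinfo.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

LA = ZoneInfo("America/Los_Angeles")
THURSDAY = 3  # datetime.weekday(): Monday is 0


def next_on_weekday(start: datetime, weekday: int) -> datetime:
    """First moment strictly after start that falls on weekday,
    at the same wall-clock time, evaluated in start's own time zone."""
    days_ahead = (weekday - start.weekday() - 1) % 7 + 1
    return start + timedelta(days=days_ahead)


# Event anchored in UTC, as in DTSTART:20170908T000000Z (a Friday in UTC).
utc_first = datetime(2017, 9, 8, 0, 0, tzinfo=timezone.utc)
utc_second = next_on_weekday(utc_first, THURSDAY)   # Thursday 2017-09-14 00:00 UTC
utc_second_in_la = utc_second.astimezone(LA)        # Wednesday 17:00 in Los Angeles

# Same event anchored in local time, as in DTSTART;TZID=America/Los_Angeles.
local_first = datetime(2017, 9, 7, 17, 0, tzinfo=LA)
local_second = next_on_weekday(local_first, THURSDAY)  # Thursday 17:00 in Los Angeles

# After the November DST change, a 00:00 UTC instance shows up at 16:00 local time.
utc_november_in_la = datetime(2017, 11, 9, 0, 0, tzinfo=timezone.utc).astimezone(LA)

print(utc_second_in_la, local_second, utc_november_in_la, sep="\n")
```

The UTC-anchored series lands on Wednesday evenings locally and drifts by an hour in November, while the locally anchored series stays on Thursday at 17:00.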
Long-time lurker, and now I have my first question:
I'm designing a SQL report. One task is to calculate the number of minutes between two times. They can be on the same day or on different days. In the database there are 4 columns given:
The Start Date (as Datetime e.g. 24.10.2017 00:00:00)
The Start Time (as Datetime e.g. 01.01.1899 11:25:00)
The End Date (formatted as above)
The End Time (formatted as above)
I'm calculating three fields, all in minutes:
Days between: =DateDiff("n", Fields!StartDatum.Value,Fields!QualDatum.Value)
Minutes between: =DateDiff("n",Fields!StartZeit.Value,Fields!QualZeit.Value)
Adding those up: =Fields!QualiZeitTage.Value+Fields!QualiZeitMinuten.Value
All of this is working great and produces the desired output.
My problem is that I don't need the full time between those events. I only want to count minutes that fall between 7:00 am and 8:00 pm. I also want to exclude Saturdays and Sundays. How would I go about limiting the DateDiff function to my desired times?
Second problem: the end time and end date are only written when the event has actually finished. If it's still ongoing, those fields are empty, producing a negative number (-1060764480 for example). Since I'm only using those to produce Boolean output on surpassing a certain length, it's no problem. I would like to handle that more "cleanly" though. Any thoughts?
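One possible way to think about both problems, sketched in Python rather than as a report expression: walk through each calendar day between start and end, skip weekends, and clip the interval to the 7:00 to 20:00 window. The function names combine and business_minutes are invented for this sketch, and returning None for an event with no end yet is an assumption about what "cleanly" should mean.

```python
from datetime import datetime, time, timedelta
from typing import Optional

OPEN = time(7, 0)    # 7:00 am
CLOSE = time(20, 0)  # 8:00 pm


def combine(date_part: datetime, time_part: datetime) -> datetime:
    """Merge a date-only column (24.10.2017 00:00:00) with a time-only column (01.01.1899 11:25:00)."""
    return datetime.combine(date_part.date(), time_part.time())


def business_minutes(start: datetime, end: Optional[datetime]) -> Optional[int]:
    """Minutes between start and end that fall on Mon-Fri between OPEN and CLOSE.

    Returns None while the event is still ongoing (no end recorded yet).
    """
    if end is None:
        return None
    total = 0
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            lo = max(start, datetime.combine(day, OPEN))
            hi = min(end, datetime.combine(day, CLOSE))
            if hi > lo:
                total += int((hi - lo).total_seconds() // 60)
        day += timedelta(days=1)
    return total


start = combine(datetime(2017, 10, 24), datetime(1899, 1, 1, 11, 25))  # Tuesday 11:25
end = combine(datetime(2017, 10, 25), datetime(1899, 1, 1, 9, 0))      # Wednesday 09:00
print(business_minutes(start, end))   # Tuesday 11:25-20:00 plus Wednesday 07:00-09:00
print(business_minutes(start, None))  # ongoing event
```

In the report itself the same logic would have to live in custom code or in the SQL query, since a single DateDiff call cannot express the clipping.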
I'm writing an application that indexes data for our stores, some of which are open late (8 am - 2 am). We need to be able to search this database quickly -- basically, to run a query to find which stores are open at a given point in time (now, Sunday at 1 am, whatever).
In addition, the open/close times can vary day-by-day -- some stores are closed on Sundays, for example.
The obvious solution to me would be to make a table where I have a row with the store ID, day, open time, and close time. For something like Monday, 8 am - 2 am, that would actually be two rows, one for Monday 0800 - 2400, and one for Tuesday 0000 - 0200.
We have a lot of stores, so the search has to perform well (basically, the data has to be index-friendly), but I'll also have to display this data back out in a human-readable format. With my current solution, that'd look something like this:
Monday: 8:00 - Midnight
Tuesday: Midnight - 2:00 am; 8:00 am - Midnight
I'm just wondering if anybody else has alternative solutions before I jump right to an implementation. Thanks!
When PBS (the US Public Broadcasting System) faced this same problem a couple of years ago, they invented the idea of the "30 hour day", where 00:00 is midnight at the start of the day, 24:00 is midnight at the end of the day, 25:00 is 1am the next day, and 30:00 is 6am the next day. That way a Monday closing time of 26:00 is 2am Tuesday morning.
Rather than two records representing a single store's times for a day, it may be more object oriented to think of the "store day" as the object. That way 1 record = 1 store's times for a day. If you want to store the two sets of open/close times, just use four fields in the record instead of two--and adjust your queries appropriately.
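A minimal Python sketch of the "one record per store day" idea with extended hours. The dictionary layout, the store id "store-1" and the function is_open are all hypothetical; times are stored as minutes since the start of the store day, so a closing time may exceed 24 * 60.

```python
MINUTES_PER_DAY = 24 * 60

# (store_id, weekday) -> (open_minute, close_minute); weekday 0 is Monday.
# A close value above 1440 spills into the next calendar day (26:00 = 2am).
HOURS = {
    ("store-1", 0): (8 * 60, 26 * 60),   # Monday 08:00 to 26:00
    ("store-1", 1): (8 * 60, 24 * 60),   # Tuesday 08:00 to midnight
    # no Sunday record: closed on Sundays
}


def is_open(hours: dict, store_id: str, weekday: int, minute_of_day: int) -> bool:
    """True if the store is open at minute_of_day on the given calendar weekday."""
    # Case 1: the moment falls inside today's store day.
    today = hours.get((store_id, weekday))
    if today and today[0] <= minute_of_day < today[1]:
        return True
    # Case 2: the moment falls inside yesterday's store day, past midnight.
    yesterday = hours.get((store_id, (weekday - 1) % 7))
    if yesterday and yesterday[0] <= minute_of_day + MINUTES_PER_DAY < yesterday[1]:
        return True
    return False


print(is_open(HOURS, "store-1", 1, 1 * 60))  # Tuesday 01:00, covered by Monday's record
print(is_open(HOURS, "store-1", 1, 3 * 60))  # Tuesday 03:00
```

The lookup only ever touches two records per store (today and yesterday), and the stored record maps directly to the human-readable "Monday: 8:00 am - 2:00 am" form.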
Remember that your queries should use a library/api that you write and publish. The library will then deal with the data store and its data layout. No one but your library should be looking at the db directly.
Time zones are very important in this sort of app too. (Hopefully) at some point, the store chain will expand to cover more than one time zone. You'll then need to determine the local time of the query, which may not be the same as the time zone of the server handling the queries.
Further thoughts--
I now see that you're standardizing to GMT. Good. You could also use datetime values (vs time values) and standardize to a given week in time. E.g. an open period is Sun Jan 1, 1995 10am to Mon Jan 2, 1995 2am (using Jan 1, 1995 as a base since it was a Sunday).
Then rationalize your "current time and date" to match the same point in the week of Jan 1, 1995. Then query to find open store days.
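A rough Python sketch of that normalization, with made-up names (to_reference_week, is_open) and made-up sample intervals. One detail the sketch has to handle: a Saturday-night interval runs past the end of the reference week, so the query moment is also tested one week later.

```python
from datetime import datetime, timedelta

REFERENCE_SUNDAY = datetime(1995, 1, 1)  # base of the reference week, a Sunday
WEEK = timedelta(days=7)


def to_reference_week(moment: datetime) -> datetime:
    """Map any datetime to the same weekday and time of day in the week of Jan 1, 1995."""
    days_after_sunday = (moment.weekday() + 1) % 7  # Monday=0 -> 1, Sunday=6 -> 0
    return REFERENCE_SUNDAY + timedelta(
        days=days_after_sunday, hours=moment.hour, minutes=moment.minute
    )


def is_open(intervals, moment: datetime) -> bool:
    """True if moment falls inside any (start, end) interval of the reference week."""
    t = to_reference_week(moment)
    # Also test t + 1 week so intervals that spill past Saturday midnight still match.
    return any(start <= t < end or start <= t + WEEK < end for start, end in intervals)


intervals = [
    (datetime(1995, 1, 1, 10), datetime(1995, 1, 2, 2)),  # Sunday 10am to Monday 2am
    (datetime(1995, 1, 7, 22), datetime(1995, 1, 8, 2)),  # Saturday 10pm to Sunday 2am
]

print(is_open(intervals, datetime(2017, 9, 11, 1, 0)))  # a Monday at 1am
print(is_open(intervals, datetime(2017, 9, 12, 1, 0)))  # a Tuesday at 1am
```

Because every interval is a plain start/end datetime pair, the same range comparison can be expressed as an indexed query in the database.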
HTH,
Larry