I'm writing an application that indexes data for our stores, some of which are open late (8 am - 2 am). We need to be able to search this database quickly -- basically, to run a query to find which stores are open at a given point in time (now, Sunday at 1 am, whatever).
In addition, the open/close times can vary day-by-day -- some stores are closed on Sundays, for example.
The obvious solution to me would be to make a table where I have a row with the store ID, day, open time, and close time. For something like Monday, 8 am - 2 am, that would actually be two rows, one for Monday 0800 - 2400, and one for Tuesday 0000 - 0200.
We have a lot of stores, so the search has to perform well (basically, the data has to be index-friendly), but I'll also have to display this data back out in a human-readable format. With my current solution, that'd look something like this:
Monday: 8:00 - Midnight
Tuesday: Midnight - 2:00 am; 8:00 am - Midnight
I'm just wondering if anybody else has alternative solutions before I jump right to an implementation. Thanks!
When PBS (the US Public Broadcasting Service) faced this same problem a couple of years ago, they invented the idea of the "30 hour day" -- where 00:00 is midnight at the start of the day, 24:00 is midnight at the end of the day, 25:00 is 1 am the next day, and 30:00 is 6 am the next day. That way a Monday closing time of 26:00 is 2 am Tuesday morning.
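To make that concrete, here is a rough sketch of a lookup against 30-hour-day values (my own illustration in Python, not PBS's actual scheme), with hours stored as minutes past each day's local midnight and close times allowed to run past 1440:

# Monday 8am - 2am becomes a single Monday row: 480 .. 1560 (i.e. 26:00).
HOURS = {                     # weekday -> (open_minutes, close_minutes)
    "Mon": (480, 1560),
    "Tue": (480, 1560),
}

def is_open(weekday, prev_weekday, minutes_past_midnight):
    # Check today's row, and yesterday's row shifted by 24 hours so that
    # e.g. Tuesday 01:00 matches Monday's 25:00.
    today = HOURS.get(weekday)
    yesterday = HOURS.get(prev_weekday)
    if today and today[0] <= minutes_past_midnight < today[1]:
        return True
    if yesterday and yesterday[0] <= minutes_past_midnight + 1440 < yesterday[1]:
        return True
    return False

print(is_open("Tue", "Mon", 60))    # Tuesday 1 am -> True (Monday's 25:00)
print(is_open("Tue", "Mon", 180))   # Tuesday 3 am -> False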
Rather than two records representing a single store's times for a day, it may be more object-oriented to think of the "store day" as the object. That way one record = one store's times for a day. If you want to store two sets of open/close times, just use four time fields in the record instead of two -- and adjust your queries appropriately.
Remember that your queries should use a library/api that you write and publish. The library will then deal with the data store and its data layout. No one but your library should be looking at the db directly.
Time zones are very important in this sort of app too. (Hopefully) at some point, the store chain will expand to cover more than one time zone. You'll then need to determine the local time of the query -- which may not be the same as the time zone of the server that is handling the queries.
Further thoughts--
I now see that you're standardizing to GMT. Good. You could also use datetime values (vs. time values) and standardize to a given week in time. E.g. open time is Sun Jan 1, 1995 10 am - Mon Jan 2, 1995 2 am (using Jan 1, 1995 as a base since it was a Sunday).
Then rationalize your "current time and date" to match the same point in the week of Jan 1, 1995. Then query to find open store days.
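For what it's worth, here's a small sketch of that rationalization in Python (illustrative only; the same arithmetic works with SQL date functions), using the Sun Jan 1, 1995 anchor:

from datetime import datetime, timedelta, timezone

# Anchor week: Sun Jan 1, 1995 00:00, per the suggestion above.
BASE = datetime(1995, 1, 1, tzinfo=timezone.utc)

def to_reference_week(now_utc):
    # Map an arbitrary UTC datetime onto the same point within the
    # Jan 1-7, 1995 reference week.
    seconds_into_week = (now_utc - BASE).total_seconds() % (7 * 24 * 3600)
    return BASE + timedelta(seconds=seconds_into_week)

# Any Sunday 01:00 UTC maps to Sun Jan 1, 1995 01:00 UTC.
print(to_reference_week(datetime(2024, 3, 10, 1, 0, tzinfo=timezone.utc)))
# 1995-01-01 01:00:00+00:00

Store hours would be recorded against that same reference week, so the "which stores are open now" query becomes a simple range comparison on a single datetime column.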
HTH,
Larry
My job does the following things:
Consumes events from a Kafka topic, based on event time.
Computes windows of 7 days with a slide of 1 day.
Sinks the results to Redis.
I have several issues:
If it consumes Kafka events starting from the latest record, then after the job has been alive for 1 day it closes the window and computes a 7-day window. The problem is that the job only has data for 1 day, and hence the results are wrong.
If I let it consume the Kafka events from a timestamp 7 days ago, then as the job starts it calculates all the windows from the first day, which takes a lot of time. Also, I want just the last window's results, because that is what matters to me.
Have I missed something? Is there a better way to do that?
Flink aligns time windows to the epoch. So if you have windows that are one hour long, they run from the top of the hour to the top of the hour. Day long windows run from midnight to midnight. The same principle applies to windows that are seven days long, and since the epoch began on a Thursday (Jan 1, 1970), a window that is seven days long should close at midnight on Wednesday night / Thursday morning.
You can supply an offset to the window constructor if you want to shift the windows to start at a different time.
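To illustrate the alignment arithmetic (a plain Python sketch, not the Flink API; window_start is just a throwaway name):

DAY = 24 * 60 * 60 * 1000   # milliseconds

def window_start(ts, size, offset=0):
    # Start of the window containing `ts`: the largest instant <= ts that
    # is congruent to `offset` modulo the window size.
    return ts - (ts - offset) % size

some_ts = 1_700_000_000_000

# 7-day windows with no offset are aligned to the epoch, which began on a
# Thursday, so they run Thursday midnight UTC to Thursday midnight UTC.
print(window_start(some_ts, 7 * DAY) % (7 * DAY) == 0)                 # True

# An offset of 4 days shifts every boundary to Monday midnight UTC.
print(window_start(some_ts, 7 * DAY, 4 * DAY) % (7 * DAY) == 4 * DAY)  # True

With a 7-day window sliding by 1 day, each event falls into seven such windows, one per daily boundary, and the offset shifts all of those boundaries together.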
This should be a 2 hour 50 minute event starting at 5 pm PST on a Thursday evening, repeating weekly. When imported into Google Calendar or Evolution, the first occurrence is correct, but subsequent weeks have the event on Wednesdays.
DTSTART:20170908T000000Z
DTEND:20170908T025000Z
RRULE:FREQ=WEEKLY;UNTIL=20171201T080000Z;BYDAY=TH
Other events that my application generates occur on multiple days, e.g. BYDAY=TH,TU, so simply removing the BYDAY is not a solution for my problem.
You are mixing up the timezone of the event and the display timezone. As far as recurrence calculation goes, the only thing that counts is the timezone that you declare in the VEVENT. Here you are using UTC. So:
The first instance is on 20170908, which is a Friday. From there, you ask for a recurrence every Thursday. The next Thursday after 20170908 is 20170914, so the next instance starts on 20170914T000000Z.
When viewed with a display timezone of PST, this event does indeed have its first instance on Thursday evening. But the second instance, as calculated above, is on a Thursday in UTC time, and therefore on a Wednesday in PST time.
Besides this particular issue, you also need to worry about daylight saving changes. If your event really takes place in the PST timezone, then the event, as expressed above, will see its time change in November, after the DST transition takes place.
Hence it is almost never a good idea to express a recurring event by using UTC (Zulu) time.
You should have your event expressed in local time with timezone, i.e. use:
DTSTART;TZID="America/Los_Angeles":20170907T170000
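As a quick sanity check of that fix (a Python sketch using dateutil and zoneinfo, not part of the calendar data itself), a recurrence anchored in America/Los_Angeles stays on Thursday at 5 pm local time, and the UTC offset flips from PDT to PST on its own in November:

from datetime import datetime
from zoneinfo import ZoneInfo
from dateutil.rrule import rrule, WEEKLY, TH

LA = ZoneInfo("America/Los_Angeles")

for dt in rrule(WEEKLY, byweekday=TH,
                dtstart=datetime(2017, 9, 7, 17, 0, tzinfo=LA),
                until=datetime(2017, 12, 1, 0, 0, tzinfo=LA)):
    print(dt.strftime("%a %Y-%m-%d %H:%M %Z"))
# Thu 2017-09-07 17:00 PDT
# ...
# Thu 2017-11-09 17:00 PST   <- still a Thursday after the DST change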
I want to create a database to store process cycle time data. For example:
Suppose a particular process for a certain product, say welding, theoretically takes about 10 seconds (the process cycle time). Due to various issues, the machine's actual cycle time will vary throughout the day. I would like to store the machine's actual cycle time throughout the day and analyze it over time (days, weeks, months). How would I go about designing the database for this?
I considered using a time-series database, but I figured it isn't suitable: cycle time data has a start time and an end time, so basically I'm measuring time performance over time, if that even makes sense. At the same time, I was also worried that using a relational database to store and then display/analyze time-related data would be inefficient.
Any thoughts on a good database structure would be greatly appreciated. Let me know if any more info is needed and I will gladly edit this question.
You are tracking the occurrence of an event. The event (weld) starts at some time and ends at some time. It might be tempting to model the event entity like so:
StationID StartTime StopTime
with each welding station having a unique identifier. Some sample data might look like this:
17 08:00:00 09:00:00
17 09:00:00 10:00:00
For simplicity, I've set the times to large values (1 hour each) and removed the date values. This tells you that welding station #17 started a weld at 8am and finished at 9am, at which time the second weld started; it finished at 10am.
This seems simple enough. Notice, however, that the StopTime of the first entry matches the StartTime of the second entry. Of course it does, the end of one weld signals the start of the next weld. That's how the system was designed.
But this sets up what I call the Row Spanning Dependency antipattern: where the value of one field of a row must be synchronized with the value of another field in a different row.
This can create any number of problems. For example, what if the StartTime of the second entry showed '09:15:00'? Now we have a 15 minute gap between the end of the first weld and the start of the next. The system does not allow for gaps -- the end of each weld also starts the next weld. How should we interpret this gap? Is the StopTime of the first row wrong? Is the StartTime of the second row wrong? Are both wrong? Or was there another row between them that was somehow deleted? There is no way to tell which is the correct interpretation.
What if the StartTime of the second entry showed '08:45'? This is an overlap, where the second cycle supposedly started before the first cycle ended. Again, we can't know which row contains the erroneous data.
A row spanning dependency allows for gaps and overlaps, neither of which is allowed in the data. A large amount of database and application code would be required to prevent such a situation from ever occurring, and when it does occur (as it assuredly will) there is no way to determine which data is correct and which is wrong -- not from within the database, that is.
An easy solution is to do away with the StopTime field altogether:
StationID StartTime
17 08:00:00
17 09:00:00
Each entry signals the start of a weld. The end of the weld is indicated by the start of the next weld. This simplifies the data model, makes it impossible to have a gap or overlap, and more precisely matches the system we are modeling.
But we need the data from two rows to determine the length of a weld.
select w1.StartTime, w2.StartTime as StopTime
from Welds w1
join Welds w2
  on w2.StationID = w1.StationID
 and w1.StartTime = (
      select Max( StartTime )
      from Welds
      where StationID = w2.StationID
        and StartTime < w2.StartTime );
This may seem like a more complicated query than if the start and stop times were in the same row -- and, well, it is -- but think of all that checking code that no longer has to be written and executed at every DML operation. And since the combination of StationID and StartTime would be the obvious PK, the query would use only indexed data.
There is one addition to suggest. What about the first weld of the day or after a break (like lunch), and the last weld of the day or before a break? We must make an effort not to include the break time as a cycle time. We could build the intelligence to detect such a situation into the query, but that would increase the complexity even more.
Another way would be to include a status value in the record.
StationID StartTime Status
17 08:00:00 C
17 09:00:00 C
17 10:00:00 C
17 11:00:00 C
17 12:00:00 B
17 13:00:00 C
17 14:00:00 C
17 15:00:00 C
17 16:00:00 C
17 17:00:00 B
So the first few entries represent the start of a cycle, whereas the entries for noon and 5pm represent the start of a break. Now we just need to append the line
where w1.Status = 'C'
to the end of the query above. Thus the 'B' entries supply the end times of the previous cycle but do not start another cycle.
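If it helps, here is the pairing logic spelled out procedurally (a Python sketch, not tied to any particular DBMS; the sample rows are a subset of the ones above, with arbitrary dates): a 'C' row's cycle runs until the next row's start, and a 'B' row only closes the cycle before it.

from datetime import datetime
from itertools import groupby

# (StationID, StartTime, Status), ordered by station and start time.
rows = [
    (17, datetime(2023, 1, 2,  8, 0), "C"),
    (17, datetime(2023, 1, 2,  9, 0), "C"),
    (17, datetime(2023, 1, 2, 12, 0), "B"),
    (17, datetime(2023, 1, 2, 13, 0), "C"),
    (17, datetime(2023, 1, 2, 17, 0), "B"),
]

def cycles(rows):
    # Pair each row with the next row for the same station; only 'C' rows
    # start a cycle, so 'B' rows end the prior cycle without starting one.
    for station, grp in groupby(rows, key=lambda r: r[0]):
        grp = list(grp)
        for (sid, start, status), (_, stop, _) in zip(grp, grp[1:]):
            if status == "C":
                yield sid, start, stop, stop - start

for sid, start, stop, length in cycles(rows):
    print(sid, start.time(), stop.time(), length)
# 17 08:00:00 09:00:00 1:00:00
# 17 09:00:00 12:00:00 3:00:00
# 17 13:00:00 17:00:00 4:00:00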
I want to load data into one table for, say, 1 month, from 1 April to 30 April, in a successive manner.
That is, after loading the data for 1 April, the date should automatically increment to 2 April, load that data, increment again, and so on until 30 April.
Also, the data for 2 April depends on the data for 1 April, so I cannot just give a date range and load it in an arbitrary order.
How can I do it?
It would be preferable to get the loads done in a single session run, instead of running the session several times.
Sort the source data by date and use a Transaction Control transformation to enforce a commit every time the date changes.
Using SQL Server 2008+. SQLCLR is not an option.
I have a situation where I have data all stored in UTC. I need to do a comparison against that data by determining what a certain local time, let's say 8am, is in UTC. The timezone for the local time will vary on a row by row basis. (The timezone for each row is stored, so that's not an issue.) That certain local time has no date associated with it. It's always just "8am".
I have timezone data in the database, and this tells me the base UTC offset as well as whether the timezone follows daylight savings time.
But now I'm kind of stuck.
My problem is that in order to do a daylight savings time adjustment, I need to know if the current date/time in a particular timezone falls within certain ranges, but I can only convert to the appropriate local time to do that check if I know if it's daylight savings! In other words, how can I check to see if it's daylight savings unless I know whether a UTC offset is off due to daylight savings?
It's a chicken and egg problem.
It seems to me that the only solution is to be able to have a table that calculates daylight-savings aware offsets on a per-timezone basis.
Ideas?
You do have an ambiguity problem here, but it's not a chicken and egg issue.
The piece of information you are missing is, "what defines a day?" I know, it sounds crazy, but a "day" is not a universal concept. It's a local one.
For just a minute, put aside issues of time zones, DST and UTC. If I ask you, "How many hours are we from 8 AM right now?", you could give me two different answers. It's 7 PM right now, so you might say "11 hours" - since that's how much time we are from 8 AM today. But I could also have said "13 hours" - since that's how much time we are from 8 AM tomorrow. Now, in this very simplistic example, you could disambiguate in one of two different ways. You might say "the last 8 AM" or "the next 8 AM". Or you might say "whichever happened today."
Now go back to the concept of UTC. What is a "UTC day?" Well, we know it's 24 hours, since UTC doesn't follow any daylight savings time. But saying that it runs "midnight to midnight UTC" isn't a very meaningful measure. Sure, there are some places that use this definition (for example, StackOverflow's stats engine). But for most people, we think of "today" in our own local time.
So I can't really say "whichever 8AM happened today". The only date measurement you have is a UTC date. You won't know which local date you should be looking at. Let's take a real example:
I live in Phoenix, Arizona, so my time zone offset is UTC-7. We don't have DST here.
It is currently June 14th 2013, 7 PM local time.
So that's June 15th 2013, 2 AM UTC.
Now I record that time in the database, and later I ask:
"How far away are we from 8 AM Arizona time?"
With the information I have, I don't know if I should be looking for 8 AM on June 14th, or 8 AM on June 15th. Only the latter falls on the same UTC date, but I certainly could be interested in either one of them.
If you can decide in your business logic that you want the last time, or the next time, then you can resolve this. Simply convert the UTC datetime to the local time zone. Then roll forward or backward to the desired time. If your time zone has DST and you cross a transition date along the way, you can adjust for that.
You could also pick the nearest time of the two, but of course that all depends on your business logic.
Another approach would be to figure out which local "today" you are in, using the UTC time you are comparing. So in my example above, Arizona's local June 14th runs from June 14th 07:00 UTC to June 15th 07:00 UTC.
So to summarize, you wanted to know "Is 8 AM in DST, surrounding this UTC datetime?", and you can't answer that without more information: either the date of the 8 AM, or some logical rule to follow, of which there are several options available. Pick a strategy that works for your needs.
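To make the "roll forward or backward" strategy concrete, here's a sketch in Python with zoneinfo (illustrative only -- on SQL Server 2008 you'd drive the same logic off your own offset table, and next_local_time is just a name I made up):

from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

def next_local_time(utc_dt, tz_name, local_tod):
    # First occurrence of `local_tod` in the given zone at or after
    # `utc_dt`, expressed back in UTC.  The zone rules supply the offset,
    # DST included, so there's no separate "is it DST?" check.
    tz = ZoneInfo(tz_name)
    local = utc_dt.astimezone(tz)
    candidate = datetime.combine(local.date(), local_tod, tzinfo=tz)
    if candidate < local:
        candidate = datetime.combine(local.date() + timedelta(days=1),
                                     local_tod, tzinfo=tz)
    return candidate.astimezone(timezone.utc)

# June 15th 2013, 2 AM UTC is 7 PM June 14th in Phoenix, so the next
# 8 AM Phoenix time is 8 AM on June 15th, i.e. 15:00 UTC.
now = datetime(2013, 6, 15, 2, 0, tzinfo=timezone.utc)
print(next_local_time(now, "America/Phoenix", time(8, 0)))
# 2013-06-15 15:00:00+00:00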
UPDATE
You asked in comments:
How can I know whether right now (in UTC) falls within DST in time zone X, so I can adjust accordingly?
This is where the datetimeoffset type can be helpful. You don't just want to track "is DST in effect", you want to track the precise offset for the target time zone, including any DST that might be in effect. The difference is subtle, but it comes down to tracking a full offset rather than just a boolean yes/no.
So, let's pretend I live in New York City. Checking this site, we know that EDT went into effect on March 10th 2013 at 2AM local time, and it will go back to EST on November 3rd 2013 at 2AM local time.
So we have the following:
UTC                    Local (datetimeoffset)
2013-03-10T05:00:00Z 2013-03-10T00:00:00-05:00
2013-03-10T06:00:00Z 2013-03-10T01:00:00-05:00
2013-03-10T07:00:00Z 2013-03-10T03:00:00-04:00 <--- transition
2013-03-10T08:00:00Z 2013-03-10T04:00:00-04:00
...
2013-11-03T04:00:00Z 2013-11-03T00:00:00-04:00
2013-11-03T05:00:00Z 2013-11-03T01:00:00-04:00
2013-11-03T06:00:00Z 2013-11-03T01:00:00-05:00 <--- transition
2013-11-03T07:00:00Z 2013-11-03T02:00:00-05:00
Now notice that if you strip off the offset, you only have a one-way function. In other words, you can always determine the correct local time for the UTC time, but you can't go the other direction unless you know the offset during the fall-back transition (or unless you are willing to live with ambiguity).
So the algorithm for going from UTC to local time should be something like this:
Starting with the UTC datetime: 2013-11-03T05:30:00Z
Apply the standard offset (-5) 2013-11-03T00:30:00-05:00
Apply the daylight offset (-4) 2013-11-03T01:30:00-04:00
Which one is valid according to the time zone rules?
In this case, the daylight offset is valid.
Your time zone data should have this information.
If not, then you need to reconsider the source of your time zone tables.
Let's try it again with the other 1:30 time:
Starting with the UTC datetime: 2013-11-03T06:30:00Z
Apply the standard offset (-5) 2013-11-03T01:30:00-05:00
Apply the daylight offset (-4) 2013-11-03T02:30:00-04:00
Which one is valid according to the time zone rules?
In this case, the standard offset is valid.
How do we know? Because -4 is the daylight offset, and DST is supposed to be over at 2:00 local time. We have 2:30 local time associated with that offset, so only the standard one is valid in this time zone.
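For what it's worth, here is that walk-through in executable form (a Python sketch using zoneinfo rather than T-SQL; the point is only that the zone rules pick the offset for you when going from UTC to local):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

for utc in (datetime(2013, 11, 3, 5, 30, tzinfo=timezone.utc),   # before the fall-back
            datetime(2013, 11, 3, 6, 30, tzinfo=timezone.utc)):  # after the fall-back
    local = utc.astimezone(NY)     # the zone rules choose -04:00 or -05:00
    print(utc.isoformat(), "->", local.isoformat())

# 2013-11-03T05:30:00+00:00 -> 2013-11-03T01:30:00-04:00   (daylight offset valid)
# 2013-11-03T06:30:00+00:00 -> 2013-11-03T01:30:00-05:00   (standard offset valid)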
So can you convert from UTC to local? Yes. Always.
But you also said that the local value in the other column is just something like 8AM. So if it was 1:30AM, then certainly you would have an ambiguity during the fall-back transition. There is no way to resolve this, other than just picking one.
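Going the other way, the ambiguity shows up directly (again a Python sketch; PEP 495's fold attribute is the "just pick one" knob):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# The same local wall time, 1:30 AM on Nov 3rd 2013, maps to two UTC instants.
first = datetime(2013, 11, 3, 1, 30, tzinfo=NY, fold=0)    # 1:30 AM EDT
second = datetime(2013, 11, 3, 1, 30, tzinfo=NY, fold=1)   # 1:30 AM EST

print(first.astimezone(timezone.utc))    # 2013-11-03 05:30:00+00:00
print(second.astimezone(timezone.utc))   # 2013-11-03 06:30:00+00:00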
Sometimes, you might want to just pick one or the other, but sometimes you might want to error. And sometimes you might want to let your user pick which of the two they were interested in. It's not unheard of to see a dialog such as the following:
DAYLIGHT SAVING TIME
We're sorry, but there are two different instances of 1:30 AM on this day.
Which did you mean?
[1:30 AM Eastern Daylight Time] [1:30 AM Eastern Standard Time]
...those are buttons, if you couldn't tell. :)