As in the example in the Flink docs:
input
.keyBy(<key selector>)
.window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))
.<windowed transformation>(<window function>);
I want to set a different offset for every key, because each key has a different timezone.
How can I do that?
If I've understood correctly, it seems like you should use a map to transform all the timestamps into UTC. You'll want to do that before your timestamp extractor / watermark generator.
In other words, shift the timestamps around so you don't need to vary the offset by key.
You can implement a custom WindowAssigner that takes the timezone into account when assigning records to windows. If you want to go that route, you might want to fork the TumblingEventTimeWindows assigner and extend it with custom logic to handle timezones.
You can also create windows per timezone with this approach.
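As a rough illustration of the logic such a forked assigner would need, here is a minimal sketch in plain Java with no Flink dependency. It computes the bounds of the local calendar day that contains an event timestamp, for a given zone. The class and method names (ZonedDayWindows, dayWindow) are made up for this example. In a real assigner this calculation would presumably live in assignWindows, and since that method receives the element rather than the key, the zone would have to be derivable from the element itself.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class ZonedDayWindows {

    /** Returns {startMillis, endMillisExclusive} of the local calendar day containing the timestamp. */
    public static long[] dayWindow(long timestampMillis, ZoneId zone) {
        // Find which calendar day the moment falls on in the given zone.
        LocalDate localDay = Instant.ofEpochMilli(timestampMillis).atZone(zone).toLocalDate();
        // Let the zone rules decide when that day starts and when the next one starts.
        long start = localDay.atStartOfDay(zone).toInstant().toEpochMilli();
        long end = localDay.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
        return new long[] { start, end };
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2024-03-10T15:00:00Z").toEpochMilli();
        for (String id : new String[] { "Asia/Shanghai", "America/New_York" }) {
            long[] w = dayWindow(ts, ZoneId.of(id));
            System.out.println(id + ": " + Instant.ofEpochMilli(w[0]) + " .. " + Instant.ofEpochMilli(w[1]));
        }
    }
}
```

Using a ZoneId instead of a fixed offset means a day window is not always 24 hours long: on a DST transition day (such as 2024-03-10 in America/New_York) the window comes out an hour shorter or longer.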
We have a system that serves many distribution centers. A distribution center is a physical space anywhere in the country. One client may have more than one distribution center. Now that we are expanding to more places, we will have to face the problem of different timezones. The same client may also have centers in different timezones.
A lot of events can be created and saved (with date and hour) using our system in the client centers. The ideal behavior for different timezones in the same client is the following:
Consider an event that happens in timezone A at noon. If a supervisor of another distribution center in timezone B goes check this event, he should see the date and hour of that event respecting the timezone where the event was originally created (including DST changes, if existing).
This is because what matters is whether the event was created at noon in the event's timezone. It does not matter to the supervisor that it was 2 PM in his own timezone when the event was created.
We use PostgreSQL as our database and I see that exists two different types to save timestamps. TIMESTAMP and TIMESTAMPTZ. All of our database uses only the type TIMESTAMP.
Another scenario that might (or might not) happen is the case where a distribution center changes geographically. This may result in a change of its timezone.
I have done some research and found that the more “correct” approach (at least that is what it seems) is to save the timezone of every center in the DISTRIBUTION_CENTER table. Change all types from TIMESTAMP to TIMESTAMPTZ in our database and in every insert of an event that saves the timestamp, we should use the timezone of the center where the event was created to save the offset of the timezone of the event (the TIMESTAMPTZ only saves the offset of the timezone, not the timezone itself).
I’m not fully convinced that this is the right (or best) approach to deal with different timezones. As I have never implemented anything like this, I can’t say.
If I have to follow this approach, I will have to change all column types from TIMESTAMP to TIMESTAMPTZ in our database. All views that depend on those columns in some way will also have to be recreated, because we will be changing the column type. I will also have to change all queries that deal with those columns to apply the center timezone using AT TIME ZONE.
The database has now the America/Sao_Paulo timezone setted up and my fear is that I will end up doing something wrong when changing the column type to TIMESTAMPTZ. Could this change break data consistency? Should I first change the column type or change the database timezone to UTC?
Is the solution that I described the best way of doing this?
Does that approach also deal correctly with DST changes?
Extra information: our server uses Java (Jersey).
These issues have been covered many times on Stack Overflow & the sister site https://dba.stackexchange.com/. Please, next time, search thoroughly before posting. So, just a recap here.
First you must understand the difference between offset-from-UTC and time zone. An offset is merely a number of hours, minutes, and seconds displacement from UTC. A time zone is the history of past, present, and future changes in the offset used by the people of a particular region. So using a time zone is always preferable to a mere offset.
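To make the distinction concrete, here is a small sketch using java.time. It asks a zone's rules which offset applies at a given moment. The class and method names (OffsetVersusZone, offsetAt) are invented for this illustration, and the zones and dates are arbitrary picks.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class OffsetVersusZone {

    /** Looks up the offset that the given zone's rules prescribe at the given moment. */
    public static ZoneOffset offsetAt(String zoneName, String isoInstant) {
        return ZoneId.of(zoneName).getRules().getOffset(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        // One zone, two different offsets over the course of a year.
        System.out.println(offsetAt("America/New_York", "2024-01-15T12:00:00Z")); // -05:00
        System.out.println(offsetAt("America/New_York", "2024-07-15T12:00:00Z")); // -04:00
        // A zone without DST keeps the same offset all year.
        System.out.println(offsetAt("Asia/Kolkata", "2024-07-15T12:00:00Z"));     // +05:30
    }
}
```

The zone name carries the rules, while the offset is only what those rules produce at one particular moment.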
exists two different types to save timestamps. TIMESTAMP and TIMESTAMPTZ
Not precisely. The actual types as defined by the SQL standard are TIMESTAMP WITH TIME ZONE and TIMESTAMP WITHOUT TIME ZONE. The other names are Postgres-specific synonyms. I suggest sticking with the standard names for clarity. Date-time handling is confusing enough without the ambiguity of reading/remembering the z on the end.
The SQL spec barely touches on the topic of date-time handling. So behavior varies widely between database implementations.
The way Postgres works is really quite simple.
For a column of type TIMESTAMP WITH TIME ZONE, any input passed with an offset-from-UTC or a time zone is automatically adjusted into UTC. After adjusting, the original value’s offset/zone info is discarded. The “with time zone” really means “with respect for the incoming data’s time zone” rather than “stored with time zone”. If you must know the original offset/zone, you must store that yourself in a separate column. I suggest doing so as text using the ISO 8601 standard formats for the offset, or the proper name of the time zone. If an input lacks any indicator of zone/offset, the session’s current default time zone (the TimeZone setting) is applied and then the adjustment is made to UTC, so you should never pass an input lacking zone/offset!
For a column of TIMESTAMP WITHOUT TIME ZONE, any zone/offset info with an input (if any) is ignored. Similarly, when retrieved this value has no zone/offset. This type has no concept of zone/offset. Do not use this type when your intention is to store a moment, a point on the timeline. This type is for a vague idea about potential moments along a range of about 26-27 hours, such as “Christmas Day starts after midnight on December 25, 2018”. Such a sentence has no real meaning until you append "in Japan", "in India", or "in France" (thereby creating a value in the other type, TIMESTAMP WITH TIME ZONE). This type is also used for future appointments more than several weeks out, when politicians may potentially change their region’s offset (which they surprisingly do often and with little forewarning).
BEWARE: Confusingly, some tools or drivers may apply the session’s current default time zone to values of either type. This includes pgAdmin. A terrible anti-feature in my opinion. Well-intentioned, but such a tool/driver sitting between you and Postgres should not be injecting its “opinion” about the data in transit. Doing so creates the illusion that the data retrieved carries that zone inside the database when precisely the opposite is true (it actually carries either UTC or no zone/offset). If your tool makes such an adjustment, it is likely controlled by a zone/offset setting in your Postgres session, as discussed here.
Best practice in date-time handling is to think, work, store, log, and exchange data using UTC. Think of other zones as mere variations on that theme. Adjust into time zones only when required by business logic or for presentation to users. Forget all about your own parochial time zone. Get a second clock on your desk set to UTC – seriously.
The database has now the America/Sao_Paulo timezone setted up
The default time zone of your server OS should be irrelevant. Never depend on such a default as a programmer as it is well out of your control, and is so easily changed.
In Java, the JVM has its own current default time zone separate from the host OS. The JVM’s current default may be changed during runtime at any moment by any code in any thread of any app within that JVM. So never depend on the current default. Always specify your desired/expected time zone explicitly by passing the optional argument.
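A short sketch of the difference between relying on the default and passing the zone explicitly. ExplicitZone and nowIn are names invented for this example. The Clock parameter is only there so the behavior can be checked against a fixed moment.

```java
import java.time.Clock;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class ExplicitZone {

    /** Current wall-clock time in the given zone, independent of the JVM default zone. */
    public static ZonedDateTime nowIn(Clock clock, ZoneId zone) {
        return ZonedDateTime.now(clock.withZone(zone));
    }

    public static void main(String[] args) {
        ZoneId saoPaulo = ZoneId.of("America/Sao_Paulo");
        // Implicit: silently uses whatever the JVM default happens to be at this instant.
        ZonedDateTime implicit = ZonedDateTime.now();
        // Explicit: the intended zone is stated in the code.
        ZonedDateTime explicit = ZonedDateTime.now(saoPaulo);
        System.out.println(implicit.getZone() + " vs " + explicit.getZone());
        System.out.println(nowIn(Clock.systemUTC(), saoPaulo));
    }
}
```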
If a supervisor of another distribution center in timezone B goes check this event, he should see
As discussed above, on the database side you should be working in UTC. Adjusting into a time zone expected by the user is a user-interface task, not a database task. Just like with internationalization, where you would store some kind of key-lookup value in the database, to be localized to some human language on the user-interface side.
the TIMESTAMPTZ only saves the offset of the timezone, not the timezone itself
No, incorrect. As discussed above, the TIMESTAMP WITH TIME ZONE type discards the offset/zone info after adjusting into UTC for storage. No offset, no zone, just a UTC moment is stored in the column — basically, a count of microseconds since an epoch reference.
Change all types from TIMESTAMP to TIMESTAMPTZ in our database and in every insert of an event that saves the timestamp, we should use the timezone of the center where the event was created to save the offset of the timezone of the event
If you are saying that you already have recorded date-time values from various time zones into a column of TIMESTAMP WITHOUT TIME ZONE, then you have an awful mess. You cannot reliably clean it up, not with full certainty of accuracy, as you do not really know what zone/offset was originally intended for the inputs passed to the database. You can guess the original intent of the stored data, but you can never be sure.
Explain to your boss & stakeholders that this is not a mess of your making. Explain that whoever set up this database & app did the equivalent of storing money amounts in various currencies such as Yen, Canadian dollars, British pounds, and Euros without bothering to record which currency on each amount.
If you want to guess, you would need to know the names of the time zones that were likely used.
In Java, use only the java.time classes built into Java 8 and later. The older date-time classes are a bloody awful mess, now legacy, supplanted by java.time as defined in JSR 310.
Identify your possible zones.
ZoneId zoneSaoPaulo = ZoneId.of( "America/Sao_Paulo" ) ;
ZoneId zoneLisbon = ZoneId.of( "Europe/Lisbon" ) ;
ZoneId zoneKolkata = ZoneId.of( "Asia/Kolkata" ) ;
Extract the date-time value as a LocalDateTime, the Java class for a date-time value lacking any concept of zone/offset. With JDBC 4.2 and later, you may directly exchange java.time objects with the database.
LocalDateTime ldt = myResultSet.getObject( … , LocalDateTime.class ) ;
Perhaps an enum would be an appropriate way to represent your distribution centers. This assumes the list need not change during runtime.
public enum DistributionCenter {
    // List the constants to be constructed automatically when this class loads.
    SAOPAULO( ZoneId.of( "America/Sao_Paulo" ) ) ,
    LISBON( ZoneId.of( "Europe/Lisbon" ) ) ,
    KOLKATA( ZoneId.of( "Asia/Kolkata" ) ) ;

    public final ZoneId zoneId ; // Make public, or add a getter method to access private member.

    // Constructor taking the passed `ZoneId` and storing it in the member variable.
    DistributionCenter ( ZoneId zoneId ) {
        this.zoneId = zoneId ;
    }
}
Apply the zone, to generate a ZonedDateTime object. Now we have an actual moment, a specific point on the timeline.
DistributionCenter dc = … ;
ZonedDateTime zdt = ldt.atZone( dc.zoneId ) ;
Adjust that value into a UTC value. Same moment, same point on the timeline, different wall-clock time. Do not proceed with your project until you understand that concept clearly.
The Instant class represents a moment on the timeline in UTC with a resolution of nanoseconds (up to nine (9) digits of a decimal fraction).
Instant instant = zdt.toInstant() ;
Some JDBC drivers may accept your ZonedDateTime or Instant object directly, but JDBC 4.2 only requires support for OffsetDateTime, so converting to an OffsetDateTime in UTC is the portable choice. I just want to drive home the point that we are ending up with a UTC value in Postgres storage. Plus, I do convert to Instant myself for easy debugging – remember: UTC is The One True Time.
Now that we have determined an actual moment, we can store it in the database.
myPreparedStatement.setObject( … , instant.atOffset( ZoneOffset.UTC ) ) ;
Note how none of this code depends on the current default time zone of your server host OS, your Postgres cluster, or your JVM.
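Pulling the steps together, here is a self-contained sketch of the conversion without any JDBC plumbing. The stored value and the class and method names (LegacyTimestampRepair, toMoment) are hypothetical. In real code the LocalDateTime would come from the result set and the zone from the row's distribution center.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class LegacyTimestampRepair {

    /** Interprets a zone-less stored value as wall-clock time in the center's zone. */
    public static Instant toMoment(LocalDateTime stored, ZoneId centerZone) {
        return stored.atZone(centerZone).toInstant();
    }

    public static void main(String[] args) {
        // Hypothetical value read from a TIMESTAMP WITHOUT TIME ZONE column.
        LocalDateTime stored = LocalDateTime.of(2018, 12, 25, 12, 0);
        // The same stored digits mean different moments depending on the guessed zone.
        System.out.println(toMoment(stored, ZoneId.of("Asia/Kolkata")));  // 2018-12-25T06:30:00Z
        System.out.println(toMoment(stored, ZoneId.of("Europe/Lisbon"))); // 2018-12-25T12:00:00Z
    }
}
```

One caveat when guessing: if the stored wall-clock time falls in a DST gap or overlap for the chosen zone, atZone has to pick a resolution (it shifts forward across a gap and uses the earlier offset in an overlap), so those rows deserve a manual look.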
I will have to change all column types from TIMESTAMP to TIMESTAMPTZ in our database
Yes. Data recording an actual moment, a piece of history, should never have been stored in TIMESTAMP WITHOUT TIME ZONE. Some naïve programmers/DBAs hope that using this data type may somehow exempt them from dealing with time zone issues. But actually this is a “pay now, or pay later” situation. Unfortunately, you are the one stuck paying for their poor choice.
You likely could do this same kind of work within a Postgres procedure. Postgres does have much better support for date-time work than most databases. However, nothing beats the java.time classes for date-time handling. And, personally I would rather debug and practice this particular chore within Java.
distribution center changes geographically
That is confusing and unwise. The business really should identify the new location as a new center, not the same. If you cannot convince management to do so, I would do so within your database and apps behind-the-scenes.
About java.time
The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.
To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.
The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.
You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes. Hibernate 5 & JPA 2.2 support java.time.
Where to obtain the java.time classes?
Java SE 8, Java SE 9, Java SE 10, Java SE 11, and later - Part of the standard Java API with a bundled implementation.
Java 9 brought some minor features and fixes.
Java SE 6 and Java SE 7
Most of the java.time functionality is back-ported to Java 6 & 7 in ThreeTen-Backport.
Android
Later versions of Android (26+) bundle implementations of the java.time classes.
For earlier Android (<26), the process of API desugaring brings a subset of the java.time functionality not originally built into Android.
If the desugaring does not offer what you need, the ThreeTenABP project adapts ThreeTen-Backport (mentioned above) to Android. See How to use ThreeTenABP….
Read Basil's impressive answer to understand the concepts.
You should definitely switch to timestamp with time zone, which in reality is an absolute timestamp (in slight violation of the SQL standard's intention).
One thing you want to consider is whether the time zone of an event should change if the time zone of the center that recorded it changes. If not, you will either have to keep a history of time zones for a center or (better) store the time zone with each event as it is created.
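On the application side, storing the zone with each event could look roughly like the sketch below: the moment comes from the timestamp with time zone column, the zone name from a separate text column, and the two are combined only for display. EventDisplay and displayInEventZone are names made up for this example, and the sample values are arbitrary.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class EventDisplay {

    /** Renders a stored UTC moment as wall-clock time in the zone recorded with the event. */
    public static String displayInEventZone(Instant moment, String zoneName) {
        ZonedDateTime local = moment.atZone(ZoneId.of(zoneName));
        return local.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        // Hypothetical row: the moment in UTC plus the zone name saved when the event was created.
        Instant moment = Instant.parse("2024-07-15T06:30:00Z");
        System.out.println(displayInEventZone(moment, "Asia/Kolkata")); // 2024-07-15T12:00:00+05:30
    }
}
```

Because the zone travels with the event, a supervisor anywhere sees noon for this event, and a later change of the center's zone does not rewrite history.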
To put it short, on input to my Postgres DB I have a timestamp in format "2014-12-10T12:00:14+07:00", and I would like to use 'timestampandtz' Postgres C extension (https://github.com/mweber26/timestampandtz), but I don't know how to approach the question of determining the time zone.
Since the extension compares the input with the full timezone names in zones.c, the datatype won't know what to do with "+07:00" instead of "# Continent/City", which it expects on the input.
The thing is, I need to derive the city somehow from "+07:00", because I need daylight saving time resolution.
Also, I know that these timestamps all come from a single country, so maybe the "# Continent/City" can be worked out based on that.
Any POV on how to approach this challenge would be greatly appreciated, thanks!
I'm afraid that you are out of luck. There is no way to convert a time offset like +07:00 to a time zone like US/Eastern automatically.
The reason is that the same time offset can belong to different time zones. For example, -05:00 currently could be America/Lima or US/Central, but these are different time zones – the former has no daylight saving time.
So you will have to come up with a translation yourself, e.g. if you know what time zone all your data with a certain time offset belong to.
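One way to build such a translation, sketched here in Java, is to start from the list of zones used in the known country and keep those whose rules produce the given offset at the given moment. The class and method names (OffsetToZoneCandidates, candidates) are invented, and the Indonesian zone list is only an assumption for the sake of the example, since the question does not name the country.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetToZoneCandidates {

    /** Returns the zones from the list whose offset at that moment equals the timestamp's offset. */
    public static List<String> candidates(String isoTimestamp, List<String> countryZones) {
        OffsetDateTime odt = OffsetDateTime.parse(isoTimestamp);
        List<String> matches = new ArrayList<>();
        for (String name : countryZones) {
            if (ZoneId.of(name).getRules().getOffset(odt.toInstant()).equals(odt.getOffset())) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed country zone list, purely illustrative.
        List<String> zones = Arrays.asList("Asia/Jakarta", "Asia/Makassar", "Asia/Jayapura");
        System.out.println(candidates("2014-12-10T12:00:14+07:00", zones)); // [Asia/Jakarta]
    }
}
```

If more than one zone survives the filter, the offset alone cannot decide between them and some extra piece of information is needed.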
With Django, can someone explain to me how DateTimes are stored in the Model/database and what is the best way to make sure that it is displayed correctly to the users? My users are all in the same timezone, but we have summer/winter time.
I'm thinking that DateTimes should be saved as GMT+1, but without summer/winter time. If a record has a DateTime that is 1 hour more than another one, then it should also have occurred 1 hour later. But how do I make sure that users see the correct time for this location? I'd prefer not to use the systime on their personal computers; all users should see the same date/times, regardless of where they are. They'll know that it is the time for this home location.
settings.py has TIME_ZONE, which I'm guessing is how the DB stores DateTime, but is this with or without summertime? And then do I use other settings (template tags?) to convert the time? Which ones? Or is this also set in settings.py?
My app is hosted by webfactional, so I guess I'm not restricted by a Windows system, which apparently needs to use the system time.
I see that in the WordPress database, the wp_posts table has 2 columns to store the post date: post_date and post_date_gmt.
post_date_gmt stores the post date in GMT, and post_date stores the post date in the local time of the user who created the post. Am I right?
Is there any benefit to storing both versions of the post date in the database?
Is it faster to store both versions of the post date rather than to calculate the local time from post_date_gmt and the user's timezone when a user wants to view the post date in his/her timezone?
UPDATE:
I also asked on SitePoint and got this answer:
The one benefit it would provide is that if the person moves to a different timezone then their earlier posts still record what time it was where they were when they posted them and not what time it was where they are now (as would be all you could calculate using the GMT time and their current timezone).
http://www.sitepoint.com/forums/showthread.php?p=4671837
Not sure about the speed on the interpreter. One big reason they use GMT is actually that it makes the date-time more portable and not tied to a specific timezone and allows you to write code that is more general and internationalized. Why they save both is curious but the GMT format is definitely preferred for database storage.
That's because some countries use Daylight Saving Time.