In Data Vault, all objects have a load_datetime attribute which can be used to determine the relative order of insertion into the database, regardless of where in the world this took place.
Which Snowflake data type is best suited for such columns and why?
My own feeling is that timestamp_ntz would not work as it just records "wallclock" time.
I would think that timestamp_ltz is the best choice as it stores only UTC.
Also, timestamp_tz should maintain the correct relative order, but the local time information is irrelevant in this case so timestamp_ltz seems a cleaner choice.
Have I missed anything?
I agree that using _ltz (UTC) is the best choice, especially if you have sources coming from different time zones. If all your sources are local to a single time zone, then _tz would be fine, but why risk it, right?
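To make the ordering argument concrete, here is a minimal Java sketch. It is an analogy, not Snowflake code. As the question describes, it assumes TIMESTAMP_NTZ behaves like a zone-less wall-clock value (`LocalDateTime`) and TIMESTAMP_LTZ like a UTC-normalized moment (`Instant`). The class name and the two load times are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Analogy: LocalDateTime ~ wall-clock (NTZ-style) value, Instant ~ UTC-normalized (LTZ-style) value.
final class LoadOrderDemo {

    // Hypothetical loads on the same date: Tokyo at 09:00 local, London at 08:00 local.
    static final LocalDateTime TOKYO_WALL = LocalDateTime.of(2019, 1, 15, 9, 0);
    static final LocalDateTime LONDON_WALL = LocalDateTime.of(2019, 1, 15, 8, 0);

    // Resolve a wall-clock value in the zone where it was recorded, giving a UTC moment.
    static Instant toInstant(LocalDateTime wall, String zone) {
        return wall.atZone(ZoneId.of(zone)).toInstant();
    }

    public static void main(String[] args) {
        Instant tokyo = toInstant(TOKYO_WALL, "Asia/Tokyo");
        Instant london = toInstant(LONDON_WALL, "Europe/London");

        // Comparing wall-clock values claims London loaded first...
        System.out.println("Wall clock: London first? " + LONDON_WALL.isBefore(TOKYO_WALL));
        // ...but the UTC moments show that Tokyo actually loaded first.
        System.out.println("UTC: Tokyo first? " + tokyo.isBefore(london));
    }
}
```

Only the UTC-based comparison gives the true relative order of insertion, which is the property a load_datetime column needs.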
We have a system that serves many distribution centers. A distribution center is a physical space anywhere in the country. One client may have more than one distribution center. Now that we are expanding to more places, we will have to face the problem of different timezones. The same client may also have centers in different timezones.
A lot of events can be created and saved (with date and hour) using our system in the client centers. The ideal behavior for different timezones in the same client is the following:
Consider an event that happens in timezone A at noon. If a supervisor of another distribution center in timezone B goes to check this event, he should see the date and hour of that event in the timezone where the event was originally created (including DST changes, if any).
This is because what matters is whether the event was created at noon in the event's timezone. It does not matter to the supervisor that it was 2 PM in his own timezone when the event was created.
We use PostgreSQL as our database, and I see that there are two different types to save timestamps: TIMESTAMP and TIMESTAMPTZ. Our entire database uses only the TIMESTAMP type.
Another scenario that may (or may not) happen is a distribution center that changes geographically. This may cause a change in its timezone.
I have done some research and found that the more “correct” approach (or so it seems) is to save the timezone of every center in the DISTRIBUTION_CENTER table, change all types from TIMESTAMP to TIMESTAMPTZ in our database, and, in every insert of an event that saves the timestamp, use the timezone of the center where the event was created to save the offset of the event's timezone (the TIMESTAMPTZ only saves the offset of the timezone, not the timezone itself).
I’m not fully convinced that this is the right (or best) approach to deal with different timezones. As I have never implemented anything like this, I can’t say.
If I follow this approach, I will have to change all column types from TIMESTAMP to TIMESTAMPTZ in our database. All views that depend on those columns in any way will also have to be recreated, because we will be changing the column type. I will also have to change all queries that deal with those columns to apply the center's timezone using AT TIME ZONE.
The database currently has the America/Sao_Paulo timezone set, and my fear is ending up doing something wrong when changing the column type to TIMESTAMPTZ. Can this change potentially break data consistency? Should I first change the column type, or change the database timezone to UTC?
Is the solution that I described the best way of doing this?
Does that approach also deal correctly with DST changes?
Extra information: our server uses Java (Jersey).
These issues have been covered many times on Stack Overflow & the sister site https://dba.stackexchange.com/. Please, next time, search thoroughly before posting. So, just a recap here.
First you must understand the difference between offset-from-UTC and time zone. An offset is merely a number of hours, minutes, and seconds displacement from UTC. A time zone is the history of past, present, and future changes in the offset used by the people of a particular region. So using a time zone is always preferable to a mere offset.
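A minimal Java sketch of that distinction, using the java.time classes recommended further down: a `ZoneId` carries a history of offsets (here Europe/Lisbon, chosen only as an example of a zone with DST), while a `ZoneOffset` is one fixed number. The class and method names are illustrative.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// A time zone is a history of offsets; an offset is a single fixed displacement from UTC.
final class OffsetVsZone {

    // ZoneRules holds the past, present, and currently known future offset changes of a zone.
    static ZoneOffset offsetAt(ZoneId zone, String isoInstant) {
        return zone.getRules().getOffset(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        ZoneId lisbon = ZoneId.of("Europe/Lisbon");
        System.out.println("Lisbon in January: " + offsetAt(lisbon, "2019-01-15T12:00:00Z"));
        System.out.println("Lisbon in July:    " + offsetAt(lisbon, "2019-07-15T12:00:00Z"));

        // A bare offset never changes, so on its own it cannot follow DST.
        ZoneOffset fixed = ZoneOffset.ofHours(1);
        System.out.println("Fixed +01:00 in January: " + offsetAt(fixed, "2019-01-15T12:00:00Z"));
    }
}
```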
there are two different types to save timestamps: TIMESTAMP and TIMESTAMPTZ
Not precisely. The actual types as defined by the SQL standard are TIMESTAMP WITH TIME ZONE and TIMESTAMP WITHOUT TIME ZONE. The other names are Postgres-specific synonyms. I suggest sticking with the standard names for clarity. Date-time handling is confusing enough without the ambiguity of reading/remembering the z on the end.
The SQL spec barely touches on the topic of date-time handling. So behavior varies widely between database implementations.
The way Postgres works is really quite simple.
For a column of type TIMESTAMP WITH TIME ZONE, any input passed with an offset-from-UTC or a time zone is automatically adjusted into UTC. After adjusting, the original value’s offset/zone info is discarded. The “with time zone” really means “with respect for the incoming data’s time zone” rather than “stored with time zone”. If you must know the original offset/zone, you must store that yourself in a separate column. I suggest doing so as text, using the ISO 8601 standard format for an offset, or the proper name of the time zone (such as America/Sao_Paulo). If an input lacks any indicator of zone/offset, the session’s current TimeZone setting is applied, and then the adjustment is made to UTC. You should never pass an input lacking zone/offset!
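The normalization described above can be mimicked in Java as a rough analogy (this is not Postgres code): an `OffsetDateTime` input is reduced to an `Instant`, and the offset that came with it does not survive. The class name and input values are hypothetical.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

// Analogy for a TIMESTAMP WITH TIME ZONE column: the offset is used to find the UTC moment, then dropped.
final class NormalizeToUtc {

    static Instant normalize(String isoWithOffset) {
        return OffsetDateTime.parse(isoWithOffset).toInstant();
    }

    public static void main(String[] args) {
        Instant fromMinusThree = normalize("2018-12-25T12:00:00-03:00");
        Instant fromUtc = normalize("2018-12-25T15:00:00Z");

        // Same moment, so the "stored" values are identical; the -03:00 is no longer recoverable.
        System.out.println(fromMinusThree.equals(fromUtc));
        System.out.println(fromMinusThree);
    }
}
```

If the original zone matters, it has to travel alongside the value, for example as a separate text column holding a zone name.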
For a column of TIMESTAMP WITHOUT TIME ZONE, any zone/offset info with an input (if any) is ignored. Similarly, when retrieved this value has no zone/offset. This type has no concept of zone/offset. Do not use this type when your intention is to store a moment, a point on the timeline. This type is for a vague idea about potential moments along a range of about 26-27 hours, such as “Christmas Day starts after midnight on December 25, 2018”. Such a sentence has no real meaning until you append "in Japan", "in India", or "in France" (thereby creating a value in the other type, TIMESTAMP WITH TIME ZONE). This type is also used for future appointments more than several weeks out, when politicians may potentially change their region’s offset (which they surprisingly do often and with little forewarning).
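The Christmas example translates directly into java.time: a `LocalDateTime` is the zone-less value, and applying a zone ("in Japan", "in India", "in France") produces distinct moments. A minimal sketch; the class name is illustrative.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

final class ChristmasStarts {

    // A zone-less value: "Christmas Day starts after midnight on December 25, 2018".
    static final LocalDateTime CHRISTMAS_START = LocalDateTime.of(2018, 12, 25, 0, 0);

    // Appending "in <region>" turns the vague value into an actual moment on the timeline.
    static Instant startIn(String zone) {
        return CHRISTMAS_START.atZone(ZoneId.of(zone)).toInstant();
    }

    public static void main(String[] args) {
        Instant japan = startIn("Asia/Tokyo");
        Instant india = startIn("Asia/Kolkata");
        Instant france = startIn("Europe/Paris");
        System.out.println(japan + " / " + india + " / " + france);
        System.out.println("Japan to France: " + Duration.between(japan, france).toHours() + " hours");
    }
}
```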
BEWARE: Confusingly, some tools or drivers may apply the session’s current default time zone to values of either type. This includes pgAdmin. A terrible anti-feature in my opinion. Well-intentioned, but such a tool/driver sitting between you and Postgres should not be injecting its “opinion” about the data in transit. Doing so creates the illusion that the data retrieved carries that zone inside the database when precisely the opposite is true (it actually carries either UTC or no zone/offset). If your tool makes such an adjustment, it is likely controlled by a zone/offset setting in your Postgres session, as discussed here.
Best practice in date-time handling is to think, work, store, log, and exchange data using UTC. Think of other zones as mere variations on that theme. Adjust into time zones only when required by business logic or for presentation to users. Forget all about your own parochial time zone. Get a second clock on your desk set to UTC – seriously.
The database currently has the America/Sao_Paulo timezone set
The default time zone of your server OS should be irrelevant. Never depend on such a default as a programmer as it is well out of your control, and is so easily changed.
In Java, the JVM has its own current default time zone, separate from the host OS. The JVM’s current default may be changed during runtime at any moment by any code in any thread of any app within that JVM. So never depend on the current default. Always specify your desired/expected time zone explicitly by passing the optional argument.
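A minimal sketch of passing the zone explicitly. The fixed `Clock` is only there to make the example repeatable (in production you might use `Clock.systemUTC()`); the class and method names are illustrative.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class ExplicitZone {

    // The zone is an explicit argument, so the JVM's current default never enters the picture.
    static LocalDate todayIn(Clock clock, ZoneId zone) {
        return LocalDate.now(clock.withZone(zone));
    }

    public static void main(String[] args) {
        Clock clock = Clock.fixed(Instant.parse("2019-01-15T20:00:00Z"), ZoneOffset.UTC);
        System.out.println("Today in UTC:     " + todayIn(clock, ZoneOffset.UTC));
        System.out.println("Today in Kolkata: " + todayIn(clock, ZoneId.of("Asia/Kolkata")));
        // Avoid LocalDate.now() with no argument: it silently uses ZoneId.systemDefault().
    }
}
```

The same moment falls on different dates in the two zones, which is exactly why an implicit default is dangerous.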
If a supervisor of another distribution center in timezone B goes to check this event, he should see
As discussed above, on the database side you should be working in UTC. Adjusting into a time zone expected by the user is a user-interface task, not a database task. It is just like internationalization, where you would store some kind of lookup key in the database, to be localized into some human language on the user-interface side.
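For the requirement in the question, where the supervisor sees the event in the zone where it happened, the presentation layer applies the event's zone rather than the viewer's. A minimal sketch, assuming the stored value is a UTC `Instant` and the event's zone is known; the event time and both zones are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class DisplayInEventZone {

    static final DateTimeFormatter HOUR_MINUTE = DateTimeFormatter.ofPattern("HH:mm");

    // The stored value is a UTC moment; a zone is applied only when presenting it.
    static String render(Instant storedUtc, ZoneId zone) {
        return storedUtc.atZone(zone).format(HOUR_MINUTE);
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2019-07-01T11:00:00Z"); // hypothetical event
        ZoneId eventZone = ZoneId.of("Europe/Lisbon");          // where the event happened
        ZoneId viewerZone = ZoneId.of("America/Sao_Paulo");     // where the supervisor sits
        System.out.println("Event-local:  " + render(stored, eventZone));
        System.out.println("Viewer-local: " + render(stored, viewerZone));
    }
}
```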
the TIMESTAMPTZ only saves the offset of the timezone, not the timezone itself
No, incorrect. As discussed above, the TIMESTAMP WITH TIME ZONE type discards the offset/zone info after adjusting into UTC for storage. No offset, no zone, just a UTC moment is stored in the column — basically, a count of microseconds since an epoch reference.
Change all types from TIMESTAMP to TIMESTAMPTZ in our database and in every insert of an event that saves the timestamp, we should use the timezone of the center where the event was created to save the offset of the timezone of the event
If you are saying that you already have recorded date-time values from various time zones into a column of TIMESTAMP WITHOUT TIME ZONE, then you have an awful mess. You cannot reliably clean it up, not with full certainty of accuracy, as you do not really know what zone/offset was originally intended for the inputs passed to the database. You can guess the original intent of the stored data, but you can never be sure.
Explain to your boss & stakeholders that this is not a mess of your making. Explain that whoever set up this database & app did the equivalent of storing money amounts in various currencies such as Yen, Canadian dollars, British pounds, and Euros without bothering to record which currency on each amount.
If you want to guess, you would need to know the names of the time zones that were likely used.
In Java, use only the java.time classes built into Java 8 and later. The older date-time classes are a bloody awful mess, now legacy, supplanted by java.time as defined in JSR 310.
Identify your possible zones.
ZoneId zoneSaoPaulo = ZoneId.of( "America/Sao_Paulo" ) ;
ZoneId zoneLisbon = ZoneId.of( "Europe/Lisbon" ) ;
ZoneId zoneKolkata = ZoneId.of( "Asia/Kolkata" ) ;
Extract the date-time value as a LocalDateTime, the Java class for a date-time value lacking any concept of zone/offset. With JDBC 4.2 and later, you may directly exchange java.time objects with the database.
LocalDateTime ldt = myResultSet.getObject( … , LocalDateTime.class ) ;
Perhaps an enum would be an appropriate way to represent your distribution centers. This assumes the list need not change during runtime.
import java.time.ZoneId ;
public enum DistributionCenter {
// List the constants to be constructed automatically when this class loads.
SAOPAULO( ZoneId.of( "America/Sao_Paulo" ) ) ,
LISBON( ZoneId.of( "Europe/Lisbon" ) ) ,
KOLKATA( ZoneId.of( "Asia/Kolkata" ) ) ; // Semicolon ends the list of constants.
public final ZoneId zoneId ; // Make public, or add a getter method to access a private member.
// Constructor storing the passed `ZoneId` in the member variable.
DistributionCenter( ZoneId zoneId ) {
this.zoneId = zoneId ;
}
}
Apply the zone, to generate a ZonedDateTime object. Now we have an actual moment, a specific point on the timeline.
DistributionCenter dc = … ;
ZonedDateTime zdt = ldt.atZone( dc.zoneId ) ;
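This step is also where DST bites when guessing. `atZone` never fails: a wall-clock time that fell into a "spring forward" gap is shifted later, and one that occurred twice during a "fall back" overlap takes the earlier offset. A minimal sketch for detecting such ambiguous legacy values with `ZoneRules.getValidOffsets`, using Europe/Lisbon's 2018 transitions as the example; the class name is illustrative.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.List;

final class DstAmbiguity {

    static final ZoneId LISBON = ZoneId.of("Europe/Lisbon");

    // 0 offsets: the wall time never happened (gap). 2 offsets: it happened twice (overlap).
    static List<ZoneOffset> validOffsets(LocalDateTime ldt, ZoneId zone) {
        return zone.getRules().getValidOffsets(ldt);
    }

    public static void main(String[] args) {
        LocalDateTime gap = LocalDateTime.of(2018, 3, 25, 1, 30);      // clocks jumped 01:00 -> 02:00
        LocalDateTime overlap = LocalDateTime.of(2018, 10, 28, 1, 30); // clocks fell back 02:00 -> 01:00

        System.out.println("Gap offsets:     " + validOffsets(gap, LISBON));
        System.out.println("Overlap offsets: " + validOffsets(overlap, LISBON));

        // atZone resolves silently; flag such rows for review instead of trusting the guess.
        ZonedDateTime shifted = gap.atZone(LISBON);
        ZonedDateTime earlier = overlap.atZone(LISBON);
        System.out.println(shifted + " / " + earlier);
    }
}
```

For an overlap, `ZonedDateTime.withLaterOffsetAtOverlap()` selects the other interpretation if your business rules say so.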
Adjust that value into a UTC value. Same moment, same point on the timeline, different wall-clock time. Do not proceed with your project until you understand that concept clearly.
The Instant class represents a moment on the timeline in UTC with a resolution of nanoseconds (up to nine (9) digits of a decimal fraction).
Instant instant = zdt.toInstant() ;
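You might be able to pass your ZonedDateTime object directly to your JDBC driver for adjustment into UTC, but JDBC 4.2 only requires drivers to support OffsetDateTime for TIMESTAMP WITH TIME ZONE; support for ZonedDateTime and Instant is not required and varies by driver. Either way, I want to drive home the point that we are ending up with a UTC value in Postgres storage. Plus, I do convert to Instant myself for easy debugging. Remember: UTC is The One True Time.
Now that we have determined an actual moment, we can store it in the database, expressed as an OffsetDateTime in UTC.
myPreparedStatement.setObject( … , instant.atOffset( ZoneOffset.UTC ) ) ;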
Note how none of this code depends on the current default time zone of your server host OS, your Postgres cluster, or your JVM.
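To check that claim, here is a self-contained sketch of the conversion without the database parts. It interprets a legacy zone-less value in the center's zone and produces the UTC `OffsetDateTime` that JDBC 4.2 maps to TIMESTAMP WITH TIME ZONE. The class name, the sample value, and the deliberate change of the JVM default (done only to demonstrate, then restored) are illustrative.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.TimeZone;

final class ToUtcForStorage {

    // Interpret a zone-less legacy value in the center's zone, then express it in UTC.
    static OffsetDateTime toUtc(LocalDateTime legacy, ZoneId centerZone) {
        return legacy.atZone(centerZone).toInstant().atOffset(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        LocalDateTime legacy = LocalDateTime.of(2019, 7, 1, 12, 0);
        ZoneId kolkata = ZoneId.of("Asia/Kolkata");
        TimeZone original = TimeZone.getDefault();
        try {
            OffsetDateTime before = toUtc(legacy, kolkata);
            // Simulate other code changing the JVM-wide default in the middle of a run.
            TimeZone.setDefault(TimeZone.getTimeZone("Pacific/Auckland"));
            OffsetDateTime after = toUtc(legacy, kolkata);
            System.out.println(before + " / " + after + " equal: " + before.equals(after));
        } finally {
            TimeZone.setDefault(original); // Restore the default for the rest of the JVM.
        }
    }
}
```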
I will have to change all column types from TIMESTAMP to TIMESTAMPTZ in our database
Yes. Data recording an actual moment, a piece of history, should never have been stored in TIMESTAMP WITHOUT TIME ZONE. Some naïve programmers/DBAs hope that using this data type may somehow exempt them from dealing with time zone issues. But actually this is a “pay now, or pay later” situation. Unfortunately, you are the one stuck paying for their poor choice.
You likely could do this same kind of work within a Postgres procedure. Postgres does have much better support for date-time work than most databases. However, nothing beats the java.time classes for date-time handling. And, personally I would rather debug and practice this particular chore within Java.
distribution center changes geographically
That is confusing and unwise. The business really should identify the new location as a new center, not the same one. If you cannot convince management to do so, I would do so within your database and apps behind the scenes.
About java.time
The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.
To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.
The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.
You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes. Hibernate 5 & JPA 2.2 support java.time.
Where to obtain the java.time classes?
Java SE 8, Java SE 9, Java SE 10, Java SE 11, and later - Part of the standard Java API with a bundled implementation.
Java 9 brought some minor features and fixes.
Java SE 6 and Java SE 7
Most of the java.time functionality is back-ported to Java 6 & 7 in ThreeTen-Backport.
Android
Later versions of Android (26+) bundle implementations of the java.time classes.
For earlier Android (<26), the process of API desugaring brings a subset of the java.time functionality not originally built into Android.
If the desugaring does not offer what you need, the ThreeTenABP project adapts ThreeTen-Backport (mentioned above) to Android. See How to use ThreeTenABP….
Read Basil's impressive answer to understand the concepts.
You should definitely switch to timestamp with time zone, which in reality is an absolute timestamp (in slight violation of the SQL standard's intention).
One thing you want to consider is whether the time zone of an event should change if the time zone of the center that recorded it changes. If not, you will either have to keep a history of time zones for a center or (better) store the time zone with each event as it is created.
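A minimal sketch of the "store the time zone with each event" option. `RecordedEvent` is a hypothetical class, and the relocation from America/Sao_Paulo to America/Manaus is only an illustration; the point is that a later change to the center's zone does not alter how past events are shown.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;

final class EventZoneHistory {

    // Hypothetical event record: the UTC moment plus the zone in effect when it was recorded.
    static final class RecordedEvent {
        final Instant moment;
        final ZoneId zoneAtCreation;

        RecordedEvent(Instant moment, ZoneId zoneAtCreation) {
            this.moment = moment;
            this.zoneAtCreation = zoneAtCreation;
        }

        LocalTime localTimeAtCreation() {
            return moment.atZone(zoneAtCreation).toLocalTime();
        }
    }

    public static void main(String[] args) {
        ZoneId centerZone = ZoneId.of("America/Sao_Paulo");
        RecordedEvent event = new RecordedEvent(Instant.parse("2019-07-01T15:00:00Z"), centerZone);

        // Later the center relocates; only the center's current zone changes.
        centerZone = ZoneId.of("America/Manaus");

        System.out.println("Center now in: " + centerZone);
        System.out.println("Event recorded at local time: " + event.localTimeAtCreation());
    }
}
```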
This is more of a thinking question.
I have been working with different time/date formats, and I noticed that it seems to be preferred to store date/time objects as variables with dedicated classes (like ISODate or POSIXct) in databases (like MongoDB, MySQL, PostgreSQL).
I get why one would want to convert to such a format when analyzing data, but I was wondering what the advantage is of storing it in that format in a database.
Do these formats tend to take less space than conventional numbers?
I can't seem to find an answer online.
For argument's sake, let's just talk about a simple date type (just a date, no time or time zone), such as the DATE type in MySQL.
Say we stored a string of 2014-12-31. What's one day later? As a human, it's easy to come up with the answer 2015-01-01, but a computer needs to have those algorithms programmed in.
While these types might expose APIs that implement the algorithms for calendar math, under the hood they most likely store the information as a whole number of days since some starting date (called an "epoch"). With an epoch of 1970-01-01, for example, 2014-12-31 is stored as 16435, and the computer can very efficiently add 1 to get 16436 for the next day.
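Java's `LocalDate` exposes this representation directly through `toEpochDay()` and `ofEpochDay()`, with 1970-01-01 as day 0. A minimal sketch; a given database's internal storage format may differ from this.

```java
import java.time.LocalDate;

final class EpochDays {

    // "One day later" as integer arithmetic on a day count since 1970-01-01.
    static LocalDate nextDay(LocalDate date) {
        long days = date.toEpochDay();
        return LocalDate.ofEpochDay(days + 1);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.parse("2014-12-31");
        System.out.println(date + " is epoch day " + date.toEpochDay());
        System.out.println("Next day: " + nextDay(date));
    }
}
```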
This also makes it much easier to sort. Sure, in YYYY-MM-DD format the lexicographical sort order is preserved, but it still takes much more processing power to sort strings than integers. Also, when represented as a string, the date might be formatted for other cultures, such as MM/DD/YYYY or DD/MM/YYYY, which are not lexicographically sortable. If you throw thousands of dates into a table and then query with a WHERE or ORDER BY clause, the database needs to be able to compare the values efficiently, and integer comparison is much faster than analyzing strings.
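A small Java illustration of the sorting problem: MM/DD/YYYY strings sorted as text put January 2015 before December 2014, while parsed `LocalDate` values sort chronologically. The class name and sample values are illustrative.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SortDates {

    static final DateTimeFormatter US = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Sort the raw strings as text, as a naive string column would.
    static List<String> sortAsText(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // Parse into LocalDate first, so the ordering follows the calendar.
    static List<LocalDate> sortAsDates(List<String> values) {
        List<LocalDate> dates = new ArrayList<>();
        for (String v : values) {
            dates.add(LocalDate.parse(v, US));
        }
        Collections.sort(dates);
        return dates;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("12/31/2014", "01/01/2015");
        System.out.println("As text:  " + sortAsText(input));
        System.out.println("As dates: " + sortAsDates(input));
    }
}
```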
And yes - they tend to take much less physical storage space as well.
The same principles apply when date and time are both present, and you also have to contend with the precision of the time value (seconds, milliseconds, nanoseconds, etc.).
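Precision is worth handling explicitly. For example, Java's `Instant` can carry nanoseconds, PostgreSQL's timestamp types store microseconds, and a BSON date in MongoDB stores milliseconds. A minimal sketch that truncates to the storage resolution yourself, rather than leaving the rounding or truncation to a driver; the class name and sample value are illustrative.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class PrecisionDemo {

    // Drop digits below the storage resolution explicitly.
    static Instant toResolution(Instant value, ChronoUnit unit) {
        return value.truncatedTo(unit);
    }

    public static void main(String[] args) {
        Instant precise = Instant.parse("2019-01-15T12:00:00.123456789Z"); // nanoseconds
        System.out.println("Microseconds: " + toResolution(precise, ChronoUnit.MICROS));
        System.out.println("Milliseconds: " + toResolution(precise, ChronoUnit.MILLIS));
    }
}
```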
My target market is based in a very different time zone from where the web server is based. Therefore, the Created and Modified timestamps set by my save method are a lot less useful than they could be. Is there any way to define a global offset for these two fields in my app, so that whenever they are saved the time matches my target market's time zone? For example, deduct 5 hours from every Created record?
Store your datetimes as UTC and convert them to the appropriate user timezone when you display them, with CakeTime::convert. If you have user accounts, let the users pick their own timezones. If you don't, pick whichever timezone makes sense to you.
Put this in your Config/bootstrap.php:
date_default_timezone_set('UTC'); //or whatever your timezone is
It's just based on the server time and really has nothing to do with CakePHP - so just change the default timezone with PHP, and you should be good to go. 'created' and 'modified' will be based on the specified timezone.
USPS does not have an official listing of time zones by ZIP code. They have up-to-date street names, 2006/7 lat/lng, and up-to-date cities/states.
Also, AT&T (phone) does not have any time zone data from phone area codes (at least they said they don't)
I am looking for the most accurate way to get time zones by either ZIP code, city, or phone number, whichever is more likely to be accurate. I can make sure the results are not drastically off by looking at the state, but that could still be an hour off.
The focus here is accuracy. What is the most reliable way and/or most trusted source(s). Any help would be appreciated. If it costs something, then that is OK. Don't withhold the suggestion, especially if you think you can speak to its accuracy.
The concern about phone numbers is that some people use out-of-state area codes from VoIP services. But if that is the most up-to-date way, we might just use it anyway.
We started with states, and then a co-worker came up with the ZIP codes that were not in the biggest time zone for their state. I don't know how he did it, but it was a manual process.
For fun, look up Time In Indiana on Wikipedia. It is a joke-looking history.
GeoNames looks good for an API: get the lat/lng from the ZIP code, and then the time zone.
A friend used this for an opensource project: http://www.twinsun.com/tz/tz-link.htm - he had no complaints about its accuracy, but that is hearsay...
ZIP Codes are more accurate, and that is becoming truer by the day as more people keep their cell numbers when they relocate to different areas of the country and, as you mentioned, VoIP usage is increasing. Obviously, good data will accurately reflect daylight saving time (including a few areas like Arizona that do not observe it; hard to believe, but true). There is a lot of helpful information at greatdata.com, including a free lookup tool where you can enter a ZIP Code or Area Code and get the time zone (plus a lot more). Here is the commercial data for your situation: Area Code Database by ZIP Code.
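Whatever source maps a ZIP code to a time zone, it helps if the result is a tz database zone ID (such as America/Phoenix) rather than a bare UTC offset, because the zone ID carries the DST rules, including Arizona's exception. A minimal Java sketch; the ZIP-to-zone lookup itself is out of scope here, and the class name is illustrative.

```java
import java.time.Instant;
import java.time.ZoneId;

final class ArizonaDst {

    // Ask the tz database rules whether DST is in effect for a zone at a given moment.
    static boolean inDst(String zoneId, String isoInstant) {
        return ZoneId.of(zoneId).getRules().isDaylightSavings(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        String july = "2019-07-15T18:00:00Z";
        System.out.println("America/Phoenix in DST: " + inDst("America/Phoenix", july));
        System.out.println("America/Denver in DST:  " + inDst("America/Denver", july));
    }
}
```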
Chronomouse.js is a library that helps you get the current time, GMT offset, time zone name, location, capital city, daylight savings laws, or daylight savings status for any US/Canada area code, or any country code (essentially, from any phone number).
For example:
console.log ( getLocalInfo('212',{zone_display: 'area'}).time.zone );
// EST
More documentation and examples are available at www.chronomouse.com
Note: I am the author of this library.