How to convert different date formats to a single one in Snowflake

I have a test table where the column sys_created_on (datatype varchar(15)) holds datetime values, and we receive two different date formats, as shown below.
03-04-2022 12:49
2/28/2022 10:35
Expected Result is:
03-04-2022 12:49
02-28-2022 10:35
Could you please suggest a way to convert all values to one format?
Any suggestions are appreciated. Please also say whether changing the data type would help here.
Thank you!!

The best thing to do here would be to just convert your text timestamp column to a bona fide timestamp column. You could achieve this using the TO_TIMESTAMP() function along with a CASE expression:
SELECT
ts,
CASE WHEN REGEXP_LIKE(ts, '\\d{1,2}-\\d{2}-\\d{4} \\d{1,2}:\\d{2}')
THEN TO_TIMESTAMP(ts, 'mm-dd-yyyy hh24:mi')
ELSE TO_TIMESTAMP(ts, 'mm/dd/yyyy hh24:mi') END AS ts_real
FROM yourTable;
Assuming you had a new timestamp column, you could populate it using the ts text column as follows:
UPDATE yourTable
SET ts_real = CASE WHEN REGEXP_LIKE(ts, '\\d{1,2}-\\d{2}-\\d{4} \\d{1,2}:\\d{2}')
THEN TO_TIMESTAMP(ts, 'mm-dd-yyyy hh24:mi')
ELSE TO_TIMESTAMP(ts, 'mm/dd/yyyy hh24:mi') END;

TRY_TO_DATE returns null if it fails, so you can just chain different formats together with COALESCE or NVL:
SELECT column1,
TRY_TO_DATE(column1, 'dd-mm-yyyy hh:mi') as d1,
TRY_TO_DATE(column1, 'mm/dd/yyyy hh:mi') as d2
,nvl(d1,d2) as answer
FROM VALUES ('03-04-2022 12:49'),('2/28/2022 10:35');
gives:
COLUMN1          | D1         | D2         | ANSWER
03-04-2022 12:49 | 2022-04-03 | null       | 2022-04-03
2/28/2022 10:35  | null       | 2022-02-28 | 2022-02-28
which can be merged as
,nvl(TRY_TO_DATE(column1, 'dd-mm-yyyy hh:mi'),TRY_TO_DATE(column1, 'mm/dd/yyyy hh:mi')) as answer
Ah, I didn't read the question closely enough: to make them all the same, run an UPDATE and write the value back in the "local format" with TO_CHAR, thus:
UPDATE table
SET sys_created_on = to_char(nvl(
TRY_TO_TIMESTAMP(sys_created_on , 'dd-mm-yyyy hh:mi'),
TRY_TO_TIMESTAMP(sys_created_on , 'mm/dd/yyyy hh:mi')
));
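If the goal is the exact layout shown in the question's expected result, TO_CHAR can be given an explicit format instead of relying on the session default. The sketch below assumes the dash-separated values are month-first (mm-dd-yyyy), which the expected result suggests but the question does not confirm. It runs as a SELECT over literal values, so no table is modified.

```sql
-- Sketch only: assumes 'MM-DD-YYYY' for the dashed values (not confirmed by the question).
SELECT column1,
       TO_CHAR(
           COALESCE(
               TRY_TO_TIMESTAMP(column1, 'MM-DD-YYYY HH24:MI'),  -- dashed variant
               TRY_TO_TIMESTAMP(column1, 'MM/DD/YYYY HH24:MI')   -- slashed variant
           ),
           'MM-DD-YYYY HH24:MI'                                   -- single target layout
       ) AS normalized
FROM VALUES ('03-04-2022 12:49'), ('2/28/2022 10:35');
```

Rows that match neither format come back as null rather than raising an error, which makes them easy to find before running an UPDATE. Note that a value in this layout is 16 characters long, so a column declared as varchar(15) may need widening before the result is written back.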

Replace the separator using replace():
update test_table
set sys_created_on = replace(sys_created_on,'/','-');
If you're also dealing with different day and month field order, look into regexp_replace() to swap their places:
update test_table
set sys_created_on = regexp_replace(sys_created_on,
'(.*)/(.*)/(.*)',
'\\2-\\1-\\3');
That's in case your 03-04-2022 is in format dd-mm-yyyy making it April 3rd, not March 4th. It's good to know what exact format you're dealing with. In extreme cases you might even need to make sure whether your hour field is 24-h or 12-h-based but missing an am/pm meridiem indicator.
As suggested by Tim's and Simeon's answers, a matching data type is always encouraged. It takes less space, queries faster, enables type-specific functions and maintains validity of data (varchar doesn't care if you get February 30th or 32nd day of month 13, at 25:60)
If you want to keep the cookie and eat it too, here's how you can add one virtual column where you'll always see a standardised version of your sys_created_on, and another one, which will always interpret it as a proper timestamp. This way you don't need to touch anything in how the table is populated, keep the original, unprocessed data, see how it gets standardised, and also benefit from a timestamp data type, while not using up any additional space:
alter table test_table
add column standardised_sys_created_on varchar(15)
as replace(sys_created_on,'/','-'),
add column timestamp_sys_created_on TIMESTAMP_NTZ
as coalesce(
try_to_timestamp(sys_created_on, 'dd-mm-yyyy hh24:mi'),
try_to_timestamp(sys_created_on, 'dd/mm/yyyy hh24:mi'));
To make it faster at the expense of materializing them, you can turn those virtual columns into generated/computed using default.

Related

How to compare column values to declared variable

I was asked this interview question.
--Without modifying the following code:
DECLARE @StartDateInput SMALLDATETIME = '1/1/2018',
@EndDateInput SMALLDATETIME = '1/1/2018'
--Modify the following query so that it will return contacts modified at any time on January 1st, 2018
SELECT *
FROM dbo.Contacts
I tried the following query but this was not correct. I'm sure that I'm supposed to use the @EndDateInput variable as well but I wasn't sure how to use it. I don't think that this is the right way to approach this in general either.
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate = SMALLDATETIME
It looks like the question is probing your understanding of date and datetime types, namely that a date with a time is after a date without a time (if there is even such a thing; most timeless dates are considered to be midnight on the relevant date, which is a time too.. in the same way that 1.0 is the same thing as 1, and 1.1 is after 1.0)
I'd use a range:
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= @StartDateInput AND ModifiedDate < DATEADD(DAY, 1, @EndDateInput)
Why?
This caters for datetimes that have a time component.
It doesn't modify the row data (always a bad idea, e.g. to cast a million datetimes to a date just to strip the time off, every time you query - precludes using an index on the column and is a massive waste of resources) just to perform the query.
It converts the apparent "end date is inclusive" behavior implied by both @variables being the same into a form that lets the exclusive < work inclusively (it adds a day and then gets rows less than the following day, thereby including 23:59:59.999999 ...)
The only thing I would say is that, strictly, the spec only calls for one day's records, which means it's not mandatory to use @EndDateInput at all. It seems logical to use it, but it could be argued that if the spec is that this query will only ever return one day, the @End variable could be discarded and a DATEADD performed on the @Start instead.
The question says "any time", meaning you must consider the time component. With T-SQL the only reliable way is to use a >= and < range query (exclusive upper bound):
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= @StartDateInput and
ModifiedDate < dateadd(d, 1, @EndDateInput);
PS: The initial declaration of @StartDateInput and @EndDateInput is not robust and probably points to Jan 1st, 2018 only by chance. If it were '1/2/2018' then it would be ambiguous between Jan 2nd and Feb 1st. Better to use ODBC canonical and/or ISO 8601 strings like '20180101'.
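The half-open range can be checked on a handful of boundary values. The following is a minimal T-SQL sketch; the table variable @Contacts and its ContactID column are made up for illustration and stand in for dbo.Contacts, and the literals use the unambiguous 'yyyymmdd' form.

```sql
DECLARE @StartDateInput SMALLDATETIME = '20180101',
        @EndDateInput   SMALLDATETIME = '20180101';

-- Hypothetical stand-in for dbo.Contacts with values around both boundaries.
DECLARE @Contacts TABLE (ContactID int, ModifiedDate datetime);
INSERT INTO @Contacts (ContactID, ModifiedDate) VALUES
    (1, '20171231 23:59:59'),  -- just before the day starts
    (2, '20180101 00:00:00'),  -- first instant of the day
    (3, '20180101 23:59:59'),  -- late on the same day
    (4, '20180102 00:00:00');  -- first instant of the next day

SELECT ContactID, ModifiedDate
FROM @Contacts
WHERE ModifiedDate >= @StartDateInput
  AND ModifiedDate <  DATEADD(DAY, 1, @EndDateInput);
-- Returns ContactID 2 and 3 only.
```

The column is compared as-is on both sides, so an index on ModifiedDate stays usable.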

Have a field that is a time, need to find those outside of HH:MM AM/HH:MM PM format

I am looking at a SQL Server table that holds the from and to times for the hours that a particular business is open, and the data is entered as char, not as a time type. It needs to be in the 12-hour format HH:MM AM or PM. Many values were not entered in the correct format. How can I create a CASE expression or something similar to catch those in the wrong format?
I've not tried anything, I don't know where to begin.
No code to show
I would expect 8am or 23:59 for example to show up in the case statement column with whatever fail message is entered.
My comments from under the question stand on this, but to reiterate them: Don't use a char to store date and time values. Using the wrong data type can (and will) cause you problems. You should be using the appropriate datatype (in this case time) and have your presentation layer handle the formatting.
Firstly, however, to explicitly answer your question, you state you want the format HH:MM AM/PM, which means you could use a LIKE expression:
SELECT TimeColumn
FROM YourTable
WHERE TimeColumn NOT LIKE '[0-1][0-9]:[0-5][0-9] [AP]M';
This does, still, however, have flaws as it'll allow a value like '19:00 AM'. Thus, you could be more specific and do this:
SELECT TimeColumn
FROM YourTable
WHERE TimeColumn NOT LIKE '0[0-9]:[0-5][0-9] [AP]M'
AND TimeColumn NOT LIKE '1[0-2]:[0-5][0-9] [AP]M';
Personally, I would actually add the inverse of the above as a CHECK CONSTRAINT (a constraint has to describe the valid values) to stop the insertion of bad data, but you'll need to fix the data first:
ALTER TABLE YourTable
ADD CONSTRAINT ck_ValidTime
CHECK (TimeColumn LIKE '0[0-9]:[0-5][0-9] [AP]M' OR TimeColumn LIKE '1[0-2]:[0-5][0-9] [AP]M');
But, like I said, you should really be fixing your data type. I would firstly add a new column to store the old data:
ALTER TABLE YourTable ADD TimeStringColumn char(8);
GO
UPDATE YourTable SET TimeStringColumn = TimeColumn;
Then correct the values of your column and then alter the datatype:
UPDATE YourTable SET TimeColumn = TRY_CONVERT(char(8),TRY_CONVERT(time,TimeColumn),114);
ALTER TABLE YourTable ALTER COLUMN TimeColumn time(0);
If you want one good reason why you need to change your data type, according to your data '12:58 AM' is after '10:01 PM'.
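The question asks for a CASE expression that flags bad rows with a message. A minimal T-SQL sketch using the same two LIKE patterns is shown below; the sample values and the FormatCheck column name are illustrative only, and the inline VALUES list stands in for YourTable.

```sql
-- Flag each value instead of filtering; the patterns are the ones from the answer.
SELECT v.TimeColumn,
       CASE
           WHEN v.TimeColumn LIKE '0[0-9]:[0-5][0-9] [AP]M'
             OR v.TimeColumn LIKE '1[0-2]:[0-5][0-9] [AP]M'
           THEN 'OK'
           ELSE 'Invalid format'   -- any fail message can go here
       END AS FormatCheck
FROM (VALUES ('08:30 AM'), ('8am'), ('23:59'), ('19:00 AM'), ('12:58 PM'))
     AS v(TimeColumn);
```

With these samples, '8am', '23:59' and '19:00 AM' fall through to the ELSE branch. The patterns still accept an hour of '00', so a stricter first pattern may be wanted if that is not valid in the data.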

How to get date time from date in SQL Server

I have a table having column order date like this 2019-06-01.
But I need to show date and time also like this 2019-06-01 00:00:00.000
Please suggest
You should be able to do a simple CAST() assuming your type is a date like you mentioned. It is stored as a date, but you want it to be presented as a datetime.
SELECT CAST(your_col AS DATETIME) AS your_col
FROM your_table
There are a few ways to do it. This should be pretty straight forward though:
DECLARE @TEST DATE
SET @TEST = '2019-06-01'
SELECT CONVERT(DATETIME, @TEST)
yields:
2019-06-01 00:00:00.000
This does what you are looking for: it converts your DATE value into a DATETIME.
However as suggested in the comments depending on how this data is used you might just want to append stuff to it on the presentation layer. Really up to you! Good luck. :)

How to Get rows that are added 'x' minutes before in sqlserver?

I want to get all rows that were added 'x' minutes ago.
SELECT [PromoCodeID]
,[CustomerID]
,[DiscountAmount]
,[AddedBy]
,[AddedDate]
,[ModifiedBy]
,[ModifiedDate]
FROM [tbl_PromoCodesNewCustomer]
Where....
E.g. records added 30 minutes before the current date and time.
NOTE: The date a record was added is stored in the field 'AddedDate', which has the DATETIME datatype.
You can use this:
SELECT [PromoCodeID]
,[CustomerID]
,[DiscountAmount]
,[AddedBy]
,[AddedDate] AS added
,[ModifiedBy]
,[ModifiedDate]
FROM [tbl_PromoCodesNewCustomer]
WHERE DATEADD(minute,x,[AddedDate]) > GETDATE()
Where "x" in DATEADD is the number of minutes you want.
I suggest a slight variation on FirstHorizon's answer:
SELECT [PromoCodeID]
,[CustomerID]
,[DiscountAmount]
,[AddedBy]
,[AddedDate] AS added
,[ModifiedBy]
,[ModifiedDate]
FROM [tbl_PromoCodesNewCustomer]
WHERE AddedDate > DATEADD(minute, -x, GETDATE())
The change may look minor, but depending on the number of rows involved, the indexes, and other factors evaluated by the query optimizer, this query may perform much better because no calculation is applied to the column data.
Here is an article that explains the reason (look at the second paragraph, 'Using Functions in Comparisons within the ON or WHERE Clause').
EDIT:
I'd advise you to populate the addedDate field with the UTC date and time, otherwise you could have problems with data managed across different servers / time zones or on days when the clocks change.
So, if we consider this, the where clause will be:
WHERE DATEDIFF(mi,addedDate,GETUTCDATE())<30
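The UTC filter can also be written so that the column stays bare and the arithmetic happens once, on the other side of the comparison. This is a sketch under two assumptions: AddedDate is stored in UTC, and "x minutes before" means "within the last x minutes". The variables @x and @cutoff are introduced here for illustration.

```sql
DECLARE @x int = 30;                                              -- minutes to look back
DECLARE @cutoff datetime = DATEADD(MINUTE, -@x, GETUTCDATE());    -- computed once

SELECT [PromoCodeID], [CustomerID], [AddedDate]
FROM [tbl_PromoCodesNewCustomer]
WHERE [AddedDate] > @cutoff;   -- no function wrapped around the column
```

Flipping the comparison to < would instead return rows older than x minutes, if that is the intended reading.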

Proper way to index date & time columns

I have a table with the following structure:
CREATE TABLE MyTable (
ID int identity,
Whatever varchar(100),
MyTime time(2) NOT NULL,
MyDate date NOT NULL,
MyDateTime AS (DATEADD(DAY, DATEDIFF(DAY, '19000101', [MyDate]),
CAST([MyTime] AS DATETIME2(2))))
)
The computed column adds date and time into a single datetime2 field.
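The expression can be tried on literal values to see what it produces. The sketch below assumes the inner CAST is applied to the time value: casting a time to datetime2 yields that time on 1900-01-01, and DATEADD then shifts it by the number of days between 1900-01-01 and the date. The variable names are illustrative.

```sql
DECLARE @MyDate date    = '20190601',
        @MyTime time(2) = '13:45:30.25';

-- CAST(@MyTime AS DATETIME2(2)) gives 1900-01-01 13:45:30.25;
-- adding the day difference moves it onto @MyDate.
SELECT DATEADD(DAY,
               DATEDIFF(DAY, '19000101', @MyDate),
               CAST(@MyTime AS DATETIME2(2))) AS MyDateTime;
-- 2019-06-01 13:45:30.25
```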
Most queries against the table have one or more of the following clauses:
... WHERE MyDate < @filter1 and MyDate > @filter2
... ORDER BY MyDate, MyTime
... ORDER BY MyDateTime
In a nutshell, date is usually used for filtering, and full datetime is used for sorting.
Now for questions:
What is the best way to set indices on those 3 date-time columns? 2 separate on date and time or maybe 1 on date and 1 on composite datetime, or something else? Quite a lot of inserts and updates occur on this table, and I'd like to avoid over-indexing.
As I wrote this question, I noticed the long and kind of ugly computed column definition. I picked it up from somewhere a while ago and forgot to investigate if there's a simpler way of doing it. Is there any easier way of combining a date and time2 into a datetime2? Simple addition does not work, and I'm not sure if I should avoid casting to varchar, combining and casting back.
Unfortunately, you didn't mention what version of SQL Server you're using ....
But if you're on SQL Server 2008 or newer, you should turn this around:
your table should have
MyDateTime DATETIME
and then define the "only date" column as
MyDate AS CAST(MyDateTime AS DATE) PERSISTED
Since you make it persisted, it's stored alongside the table data (and not calculated every time you query it), and you can easily index it now.
Same applies to the MyTime column.
Having date and time in two separate columns may seem peculiar but if you have queries that use only the date (and/or especially only the time part), I think it's a valid decision. You can create an index on date only or on time or on (date, whatever), etc.
What I don't understand is why you have the computed datetime column as well. There's no reason to store this value, too. It can easily be calculated when needed.
And if you need to order by datetime, you can use ORDER BY MyDate, MyTime. With an index on (MyDate, MyTime) this should be ok. Range datetime queries would also be using that index.
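As a rough sketch of that suggestion, a single composite index keyed on (MyDate, MyTime) lines up with both the date filter and the sort. The index name and the filter values below are hypothetical, and whether the optimizer actually picks the index depends on the data and on which columns the query selects.

```sql
-- Hypothetical index name; key order matches ORDER BY MyDate, MyTime.
CREATE INDEX IX_MyTable_MyDate_MyTime
    ON MyTable (MyDate, MyTime);

DECLARE @filter1 date = '20190630',   -- upper bound (exclusive)
        @filter2 date = '20190601';   -- lower bound (exclusive)

SELECT ID, MyDate, MyTime
FROM MyTable
WHERE MyDate > @filter2 AND MyDate < @filter1
ORDER BY MyDate, MyTime;
```

One index instead of two or three keeps the write overhead lower, which matters given the insert and update volume mentioned in the question.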
The answer isn't in your indexing, it's in your querying.
A single DateTime field should be used, or even SmallDateTime if that provides the range of dates and time resolution required by your application.
Index that column, then use queries like this:
SELECT * FROM MyTable WHERE
MyDate >= @startfilterdate
AND MyDate < DATEADD(d, 1, @endfilterdate);
By using < on the end filter, it only includes results from sometime before midnight of that date, which is the day after the user-selected "end date". This is simpler and more accurate than adding 23:59:59, especially since stored times can include microseconds between 23:59:59 and 00:00:00.
Using persisted columns and indexes on them is a waste of server resources.

Resources