Remove milliseconds from timestamp - snowflake-cloud-data-platform

How can I remove the milliseconds (fractional part) from a timestamp?
For example, to_timestamp_ntz('2021-12-28 14:25:36') returns 2021-12-28 14:25:36.000

It depends on what you want the result to be.
A timestamp value always carries a fractional-seconds part, so you can zero it out, convert the value to text, or change how it is displayed.
Suppose you have such a variable:
set t = to_timestamp_ntz('2021-12-28 14:25:36.300');
You can truncate the milliseconds and keep the same data type. The fraction is still displayed, but it is always zero:
select date_trunc('SECOND', $t);
+--------------------------+
| DATE_TRUNC('SECOND', $T) |
+--------------------------+
| 2021-12-28 14:25:36.000 |
+--------------------------+
You can convert a timestamp value to char or varchar and remove the milliseconds when converting:
select to_varchar($t, 'YYYY-MM-DD HH24:MI:SS');
+-----------------------------------------+
| TO_VARCHAR($T, 'YYYY-MM-DD HH24:MI:SS') |
+-----------------------------------------+
| 2021-12-28 14:25:36 |
+-----------------------------------------+
If you only want to get rid of milliseconds in the displayed result, but not change the type, I suggest changing the session or user settings:
alter session set timestamp_ntz_output_format = 'YYYY-MM-DD HH24:MI:SS';
select $t;
+---------------------+
| $T |
+---------------------+
| 2021-12-28 14:25:36 |
+---------------------+
Reference: DATE_TRUNC, TO_VARCHAR, TIMESTAMP_NTZ_OUTPUT_FORMAT
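If the timestamp is fetched into client code, the same two ideas (zero the fraction but keep the type, or format as text without the fraction) can be sketched in Python. This is only an analogue of the Snowflake functions above, not Snowflake itself; the variable t is a hypothetical stand-in for the $t session variable.

```python
from datetime import datetime

# Hypothetical value standing in for the $t session variable from the answer.
t = datetime(2021, 12, 28, 14, 25, 36, 300000)

# Analogue of date_trunc('SECOND', ...): same type, fraction set to zero.
truncated = t.replace(microsecond=0)

# Analogue of to_varchar(..., 'YYYY-MM-DD HH24:MI:SS'): text without the fraction.
as_text = t.strftime("%Y-%m-%d %H:%M:%S")

print(truncated.isoformat(sep=" ", timespec="milliseconds"))  # 2021-12-28 14:25:36.000
print(as_text)                                                # 2021-12-28 14:25:36
```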

Related

How to cast date time to snowflake timestamp while copying data from a external stage in to a table?

I am trying to copy data from an external stage (Azure) into a table in Snowflake.
The file is a CSV that includes a date column, orderdate (for example '2/24/2003 0:00').
I created the table sales_order with the data type 'timestamp' for the column 'orderdate'.
#csv file for sales_order
| sales | orderdate |
| -------- | --------------|
| 2871 | 2/24/2003 0:00|
| 3211 | 2/25/2003 0:00|
I used the copy command below to copy data from the external stage into the table:
copy into sales_order (sales, orderdate) from (select t.$1, to_timestamp_ntz(t.$2) from @sales_stage t)
But the copy failed with the error below:
Timestamp '2/24/2003 0:00' is not recognized
#Expected
Any solution to load/transform the orderdate to the respective date time format in snowflake?
The values in the orderdate column need to contain complete time information.
If you add the seconds to the orderdate column and then convert, it should work.
Your orderdate column should look like this:
| sales | orderdate |
| -------- | --------------|
| 2871 | 2/24/2003 00:00:00|
| 3211 | 2/25/2003 00:00:00|
If you try to convert the data in its current format, the conversion fails.
You can test this with a simple query:
select to_timestamp_ntz('2/24/2003 0:00', 'mm/dd/yyyy hh24:mi:ss');
This fails with the error: Can't parse '2/24/2003 0:00' as timestamp with format 'mm/dd/yyyy hh24:mi:ss'
If you now try this query instead:
select to_timestamp_ntz('2/24/2003 00:00:00', 'mm/dd/yyyy hh24:mi:ss');
This works.
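If changing the file before loading is an option, one way to add the missing seconds is a small preprocessing step. The sketch below is in Python and assumes every value has the shape 'M/D/YYYY H:MM' shown in the question; the helper name normalize_orderdate is hypothetical.

```python
from datetime import datetime

def normalize_orderdate(raw: str) -> str:
    """Parse 'M/D/YYYY H:MM' and return 'YYYY-MM-DD HH:MM:SS'."""
    # strptime accepts month, day and hour without leading zeros.
    parsed = datetime.strptime(raw, "%m/%d/%Y %H:%M")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")

print(normalize_orderdate("2/24/2003 0:00"))  # 2003-02-24 00:00:00
```

The rewritten values carry full time information, which is the condition this answer describes.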

Results changed after sorting a table

I have the following scenario:
I created a batch job using the SQL API.
final TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());
I load the data from csv files, convert/aggregate it using SQL API.
At some stage I have a table:
CREATE VIEW ohlc_current_day as
SELECT
CAST(transact_time as DATE) as `day`,
instrument_id,
first_value(price) as `open`,
min(price) AS `low`,
max(price) AS `high`,
last_value(price) as `close`,
count(*) AS `count`,
sum(quantity) AS volume,
sum(quantity * price) AS turnover
FROM trades -- table loaded from csv
group by CAST(transact_time as DATE), instrument_id
Now when I check the results:
select * from ohlc_current_day where instrument_id=14
+------------+---------------+---------+---------+--------+---------+-------+-----------+---------------+
| day | instrument_id | open | low | high | close | count | volume | turnover |
+------------+---------------+---------+---------+--------+---------+-------+-----------+---------------+
| 2021-04-11 | 14 | 1723.0 | 1709.0 | 1743.0 | 1728.0 | 679 | 487470.0 | 8.4114803E8 |
+------------+---------------+---------+---------+--------+---------+-------+-----------+---------------+
The results are repeatable and correct (checked with reference).
Then, for further processing, I need the OHLC values from the previous day, which are already stored in a database:
CREATE TABLE ohlc_database (
`day` TIMESTAMP,
instrument_id INT,
`open` float,
`low` FLOAT,
`high` FLOAT,
`close` FLOAT,
`count` BIGINT,
volume FLOAT,
turnover FLOAT
) WITH (
'connector' = 'jdbc',
'url' = 'url',
'table-name' = 'ohlc',
'username' = 'user',
'password' = 'password'
)
Let's now merge ohlc_current_day with ohlc_database:
CREATE VIEW ohlc_raw as
SELECT * from ohlc_current_day
UNION ALL
select
CAST(`day` as DATE) as `day`,
instrument_id,
`open`,
`low`,
`high`,
`close`,
`count`,
volume,
turnover
FROM ohlc_database
WHERE `day` = '2021-04-10' -- hardcoded previous day date
And check the results:
select * from ohlc_raw where instrument_id=14
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
| day | instrument_id | open | low | high | close | count | volume | turnover |
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
| 2021-04-10 | 14 | 1696.0 | 1654.0 | 1703.0 | 1691.0 | 936 | 1040888.0 | 1.74619264E9 |
| 2021-04-11 | 14 | 1723.0 | 1709.0 | 1743.0 | 1728.0 | 679 | 487470.0 | 8.4114829E8 |
+------------+---------------+--------+--------+--------+---------+-------+-----------+---------------+
The results are OK; the values are the same as in the previous select query.
Now let's order by day:
CREATE VIEW ohlc as
SELECT * from ohlc_raw ORDER BY `day`
Check the results:
select * from ohlc where instrument_id=14
+------------+---------------+-----------------+-----------------+-------------+----------------+----------------------+--------------------------------+--------------------------------+
| day | instrument_id | open | low | high | close | count | volume | turnover |
+------------+---------------+-----------------+-----------------+-------------+----------------+----------------------+--------------------------------+--------------------------------+
| 2021-04-10 | 14 | 1696.0 | 1654.0 | 1703.0 | 1691.0 | 936 | 1040888.0 | 1.74619264E9 |
| 2021-04-11 | 14 | 1729.0 | 1709.0 | 1743.0 | 1732.0 | 679 | 487470.0 | 8.4114854E8 |
+------------+---------------+-----------------+-----------------+-------------+----------------+----------------------+--------------------------------+--------------------------------+
open and close are wrong compared to previous values. They are calculated using first_value() and last_value() functions which depend on the order of elements. So my guess is that order by in last query has changed the order and this is why there are different results.
Is my understanding correct? How can I fix it?
I think first_value() and last_value() themselves depend on the order in which the rows are processed. When you add ORDER BY in the last query, that order changes, so the values they return change too. If possible, you could write the result to the database after the UNION ALL, and then apply ORDER BY on the date when you read the data back from the database.
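The order dependence can be illustrated outside Flink with a few lines of Python. This is a sketch with made-up trades, not a statement about how Flink's planner reorders rows: min and max are unaffected by input order, while "first" and "last" are only stable once the rows are sorted by transact_time explicitly.

```python
# Hypothetical trades for one instrument on one day: (transact_time, price).
trades = [
    ("09:00:01", 1723.0),
    ("12:30:00", 1709.0),
    ("15:10:42", 1743.0),
    ("17:59:59", 1728.0),
]

def ohlc_arrival_order(rows):
    """Open/close are taken from whatever order the rows arrive in."""
    prices = [price for _, price in rows]
    return prices[0], min(prices), max(prices), prices[-1]

def ohlc_time_order(rows):
    """Open/close are taken after sorting by transact_time explicitly."""
    return ohlc_arrival_order(sorted(rows, key=lambda row: row[0]))

# The same rows delivered in a different order.
shuffled = [trades[2], trades[0], trades[3], trades[1]]

print(ohlc_arrival_order(shuffled))  # (1743.0, 1709.0, 1743.0, 1709.0)
print(ohlc_time_order(shuffled))     # (1723.0, 1709.0, 1743.0, 1728.0)
```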

UNNEST array and assign to new columns with CASE WHEN

I have the following BigQuery table, which has a nested structure; the example below is one record in my table.
Id | Date | Time | Code
AQ5ME | 120520 | 0950 | 123
---------- | 150520 | 1530 | 456
My goal is to unnest the array to achieve the following structure (given that 123 is the Start Date code and 456 is End Date code):
Id | Start Date | Start Time | End Date | End Time
AQ5ME | 120520 | 0950 | 150520 | 1530
I tried basic UNNEST in BigQuery and my results are as follows:
Id | Start Date | Start Time | End Date | End Time
AQ5ME | 120520 | 0950 | NULL | NULL
AQ5ME | NULL | NULL | 150520 | 1530
Could you please help me unnest it correctly, as described above?
You can calculate the min and max within each row and extract them as new columns.
Since you didn't show the full schema, I assume Date and Time are separate arrays.
In that case, you can use this query:
SELECT Id,
(SELECT MIN(d) FROM UNNEST(Date) AS d) AS StartDate,
(SELECT MIN(t) FROM UNNEST(Time) AS t) AS StartTime,
(SELECT MAX(d) FROM UNNEST(Date) AS d) AS EndDate,
(SELECT MAX(t) FROM UNNEST(Time) AS t) AS EndTime
FROM table
As in Sabri's response, using aggregate functions while unnesting works perfectly. To use these fields later for sorting (in an ORDER BY clause), SAFE_OFFSET(0) can be used, for example:
...
ORDER BY StartDate[SAFE_OFFSET(0)] ASC
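The target shape in the question (one row per Id, with the start and end values chosen by Code) can be modelled in Python as a small pivot. This is a sketch only: the record layout, the "events" field name, and the flatten helper are assumptions, since the full schema was not shown.

```python
# Hypothetical shape of one nested record; the field names are assumptions.
record = {
    "Id": "AQ5ME",
    "events": [
        {"Date": "120520", "Time": "0950", "Code": 123},
        {"Date": "150520", "Time": "1530", "Code": 456},
    ],
}

START_CODE, END_CODE = 123, 456  # codes named in the question

def flatten(rec):
    """Pick the start and end entries by code and return a single flat row."""
    by_code = {event["Code"]: event for event in rec["events"]}
    start = by_code.get(START_CODE, {})
    end = by_code.get(END_CODE, {})
    return {
        "Id": rec["Id"],
        "Start Date": start.get("Date"),
        "Start Time": start.get("Time"),
        "End Date": end.get("Date"),
        "End Time": end.get("Time"),
    }

print(flatten(record))
```

A missing code yields None for its two columns instead of producing a second row.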

SQL to have dates from different tables appear in chronological order by column

I would like to query dates from three different tables and order each date chronologically into columns. Each table Event1, Event2, and Event3 contains event dates that are to occur chronologically for any one individual (i.e. Event1 should occur before Event2 which should then occur before Event3). But it just so happens that a person has a date for Event1 that is after Event2 and Event3. I would like to get a result set that shows two rows. One row that shows the earlier events from Event2 and Event3 and a second row that contains only the newer event date from Event1. Below are the data for this example:
Main
ID
----------
Person001
Person002
Person003
Person004
Person005
Event1
ID | EVENT_DATE
----------+-----------
Person001 | 2019-04-30
Person002 | 2018-02-01
Person004 | 2018-05-01
Event2
ID | EVENT_DATE
----------+-----------
Person001 | 2005-03-03
Person002 | 2018-03-15
Person003 | 2017-10-10
Person005 | 2018-10-01
Event3
ID | EVENT_DATE
----------+-----------
Person001 | 2005-04-15
Person002 | 2019-01-10
Person004 | 2018-12-11
Person005 | 2018-12-15
Person005 | 2019-07-02
I would like the results set to appear like this:
ID | EVENT_DATE_1 | EVENT_DATE_2 | EVENT_DATE_3
----------+--------------+--------------+--------------
Person001 | NULL | 2005-03-03 | 2005-04-15
Person001 | 2019-04-30 | NULL | NULL
Person002 | 2018-02-01 | 2018-03-15 | 2019-01-10
Person003 | NULL | 2017-10-10 | NULL
Person004 | 2018-05-01 | NULL | 2018-12-11
Person005 | NULL | 2018-10-01 | 2018-12-15
Person005 | NULL | NULL | 2019-07-02
I am using Microsoft SQL Server.
Thanks in advance.
I should clarify: Person001 is just an example individual. I would like to query a whole database of people. For most people, the events will fall in the correct order. However, some people will have multiple instances of an event. For example, someone can have two Event1 dates. For Person001 in the example, they are supposed to have an Event1 date that corresponds with Event2 and Event3; it just happens to be missing data.
Edit: I added more example data. I tried the code in the answers and it seems to work only for the case of Person001. If there are other arrangements of data points, it doesn't seem to work. I'm hoping the extra persons will account for other types of scenarios.
That is a bit strange but you can do:
select id, event_date as event_date_1, null as event_date_2, null as event_date_3
from event1
union all
select coalesce(e2.id, e3.id), null, e2.event_date, e3.event_date
from event2 e2 full join
event3 e3
on e2.id = e3.id
order by id, event_date_1;
Here is a db<>fiddle.

SQL Server Current Date compare with specific date

I have one table, dbo.Invoice. My current query selects each "SalesRef" that does not have an invoice for "Mvt_Type" = '122'. However, I need to extend the query to take the PostDate field into account.
My problem is that the current query still displays a SalesRef that has no invoice for "Mvt_Type" = '122' when its PostDate is today (8/8/2017). The row should be displayed only if no invoice was made more than 2 days after the PostDate, so it should appear on 11/8/2017 or later.
Table dbo.Invoice
| PO_NUMBER | TYPE | MVT_TYPE | QUANTITY | SALESREF | DEBIT | POSTDATE |
|----------- |------ |---------- |---------- |---------- |------- |------------ |
| 10001001 | GR | 101 | 1000.00 | 5001 | S | 2017-01-08 |
| 10001001 | GR | 101 | 2000.00 | 5002 | S | 2017-02-08 |
| 10001001 | GR | 122 | 1000.00 | 5001 | H | 2017-01-08 |
| 10001001 | INV | 000 | 1000.00 | 5001 | S | 2017-01-08 |
| 10001001 | INV | 000 | 2000.00 | 5002 | S | 2017-02-08 |
| 10001001 | GR | 122 | 1500.00 | 5002 | H | 2017-02-08 |
| 10001001 | INV | 000 | 1000.00 | 5001 | H | 2017-01-08 |
Below is my current query :
SELECT *
FROM dbo.INVOICE i
WHERE MVT_TYPE = '122' AND SALESREF IS NOT NULL AND POSTDATE > CONVERT(VARCHAR(10), dateadd(day,2,getdate()),101)
AND NOT EXISTS (SELECT 1
FROM dbo.INVOICE
WHERE DEBIT = 'H' AND MVT_TYPE = '000' AND SALESREF = i.SALESREF )
The expected result is the same as below, but this time the PostDate condition needs to be applied.
| PO_NUMBER | TYPE | MVT_TYPE | QUANTITY | SALESREF | DEBIT | POSTDATE |
|----------- |------ |---------- |---------- |---------- |------- |------------ |
| 10001001 | GR | 122 | 1500.00 | 5002 | H | 2017-02-08 |
If PostDate is DATE or DATETIME, instead of casting you could use the DATEDIFF function to get the number of days between the two dates and compare integers:
WHERE DATEDIFF(DAY, PostDate, GETDATE())>2
If PostDate is varchar, stored in the format shown in the OP:
SET LANGUAGE british
SELECT ....
WHERE DATEDIFF(DAY, CAST(PostDate as datetime), GETDATE())>2
EDIT: Apparently DATEDIFF works when PostDate is a VARCHAR as well:
DECLARE @PostDate VARCHAR(50)
SET @PostDate='08-01-2017'
SELECT DATEDIFF(DAY, @PostDate, GETDATE()) -- GETDATE() is 08-08-2017
-- Returns 7
Having said this, it is good practice to store dates and times in proper data types. In your case, you could change the data type to DATE, if possible; this will also speed up lookups.
EDIT 2: Please note, SQL Server works with ISO 8601 Date Format, which is YYYY-MM-DD, but the dates in OP's example, even though as per OP refer to dates in August 2017, are given incorrectly (referring to Jan and Feb 2017) and are stored as varchar. For correct results, these need to be either converted to DATE/DATETIME data type, or reformatted with the correct ISO format.
EDIT 3: Showing an example of casting OP's date format into proper, ISO format before calling DATEDIFF:
SET LANGUAGE british
DECLARE @PostDate VARCHAR(50)
SET @PostDate='2017-01-08'
SELECT DATEDIFF(DAY, CAST(@PostDate AS DATETIME), GETDATE()) -- GETDATE() is 08-08-2017
-- Returns 7
And the WHERE clause would be as follows:
-- At the beginning of the SELECT statement
SET LANGUAGE british
SELECT *
FROM ...
WHERE DATEDIFF(DAY, CAST(PostDate as datetime), GETDATE())>2
Is POSTDATE a date column? If not, you are comparing strings, and the result is expected because '2017-01-08' > '08/10/2017' as text ('2' > '0'). Most probably you just need to cast POSTDATE. See the example:
select
case
when '2017-01-08' > CONVERT(VARCHAR(10), dateadd(day,2,getdate()),101) THEN 1
ELSE 0
end without_cast,
case
when CAST('2017-01-08' AS DATE) > CONVERT(VARCHAR(10), dateadd(day,2,getdate()),101) THEN 1
ELSE 0
end with_cast
So what you need is:
SELECT *
FROM dbo.INVOICE i
WHERE MVT_TYPE = '122' AND SALESREF IS NOT NULL AND CAST(POSTDATE AS DATE) > CONVERT(VARCHAR(10), dateadd(day,2,getdate()),101)
AND NOT EXISTS (SELECT 1
FROM dbo.INVOICE
WHERE DEBIT = 'H' AND MVT_TYPE = '000' AND SALESREF = i.SALESREF )
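The string-versus-date pitfall in this answer can be reproduced in Python. This is a sketch: the threshold text assumes GETDATE() is 8 August 2017, so that style 101 (mm/dd/yyyy) applied to the date two days later gives '08/10/2017', and it reads the stored text as year-month-day.

```python
from datetime import datetime

post_date_text = "2017-01-08"   # POSTDATE stored as text
threshold_text = "08/10/2017"   # assumed output of CONVERT(VARCHAR(10), ..., 101)

# Text is compared character by character: '2' > '0', so this is True
# regardless of which calendar dates the strings represent.
without_cast = post_date_text > threshold_text

# Parsing both sides first compares actual calendar dates.
post_date = datetime.strptime(post_date_text, "%Y-%m-%d").date()
threshold = datetime.strptime(threshold_text, "%m/%d/%Y").date()
with_cast = post_date > threshold

print(without_cast, with_cast)  # True False
```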
Your problem is that you store a date as a varchar.
To compare two dates correctly you should compare their DATE representation, not strings.
So I suggest converting your varchar to a date, i.e. instead of
CAST(POSTDATE AS DATE) > CONVERT(VARCHAR(10), dateadd(day,2,getdate()),101)
you should use
DATEFROMPARTS(left(POSTDATE, 4), right(POSTDATE, 2), substring(POSTDATE, 6, 2)) > dateadd(day, 2, cast(getdate() as date));
The DATEFROMPARTS function is available starting with SQL Server 2012; let me know if you are on an earlier version and I'll rewrite my code.
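The slicing in the DATEFROMPARTS expression reads the stored text as year, then day, then month. A Python sketch of the same slicing makes that explicit; the helper name parse_yyyy_dd_mm and the fixed "today" of 8 August 2017 are assumptions for illustration.

```python
from datetime import date

def parse_yyyy_dd_mm(text: str) -> date:
    """Mirror the slicing used in the DATEFROMPARTS expression."""
    year = int(text[:4])     # left(POSTDATE, 4)
    month = int(text[-2:])   # right(POSTDATE, 2)
    day = int(text[5:7])     # substring(POSTDATE, 6, 2)
    return date(year, month, day)

today = date(2017, 8, 8)  # assumed current date, as in the question
post_date = parse_yyyy_dd_mm("2017-02-08")

print(post_date)                 # 2017-08-02
print((today - post_date).days)  # 6
```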
