CSV file with two different timestamp formats in Snowflake

(Submitting on behalf of a Snowflake User)
I have a CSV file that has two different timestamp formats.
For example:
time_stmp1: 2019-07-01 00:03:17.000 EDT
time_stmp2: 2019-06-30 21:03:17 PDT
In the copy command I am able to specify only one format.
How should I proceed to load both columns in TIMESTAMP_LTZ data type?
Any recommendations?

You could use the SELECT form of COPY INTO, where you transform the date fields individually, something like:
COPY INTO MY_TABLE (NAME, DOB, DOD, HAIR_COLOUR)
FROM (
    SELECT $1, TO_DATE($2, 'YYYYMMDD'), TO_DATE($3, 'MM-DD-YYYY'), $4
    FROM @MY_STAGE/mypeeps (file_format => 'MY_CSV_FORMAT')
)
ON_ERROR = CONTINUE;
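Applied to the two columns from the question, a minimal sketch (the table, stage, file, and format names are hypothetical). Snowflake timestamp format strings accept numeric TZH:TZM offsets but not zone abbreviations, so this assumes EDT and PDT are the only abbreviations in the data and rewrites them as their fixed UTC offsets; the resulting TIMESTAMP_TZ values convert implicitly when loaded into TIMESTAMP_LTZ columns:
COPY INTO MY_TABLE (TIME_STMP1, TIME_STMP2)
FROM (
    SELECT
        -- '2019-07-01 00:03:17.000 EDT' -> '2019-07-01 00:03:17.000 -04:00'
        TO_TIMESTAMP_TZ(REPLACE($1, 'EDT', '-04:00'), 'YYYY-MM-DD HH24:MI:SS.FF3 TZH:TZM'),
        -- '2019-06-30 21:03:17 PDT' -> '2019-06-30 21:03:17 -07:00'
        TO_TIMESTAMP_TZ(REPLACE($2, 'PDT', '-07:00'), 'YYYY-MM-DD HH24:MI:SS TZH:TZM')
    FROM @MY_STAGE/mydata.csv (file_format => 'MY_CSV_FORMAT')
)
ON_ERROR = CONTINUE;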

Currently, Snowflake does not allow loading data with different date formats from one single file.
If the data in the file is just a date, use the DATE data type and define DATE_FORMAT as AUTO in the file format.
If the data includes both date and time, use the TIMESTAMP data type and define the timestamp format in the file format to match the data file, for example:
DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'YY/MM/DD HH24:MI:SS'
If there are multiple date formats in the file, for example MM/DD/YY and MM/DD/YY HH:MI:SS, the data does not load correctly; you may need to split the file and load each part separately, or update all date values to a single common format and load that into the table.
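For example, a file format with the options above set explicitly (a sketch; the format name is hypothetical):
CREATE OR REPLACE FILE FORMAT my_csv_format
    TYPE = CSV
    DATE_FORMAT = 'AUTO'
    TIMESTAMP_FORMAT = 'YY/MM/DD HH24:MI:SS';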

Related

Azure Data Factory copy activity failed with Exception ERROR [22007] Timestamp is not recognized?

I am using Azure Data Factory to copy data from a CSV file to Snowflake. The copy executes fine, but it errors when copying a date from the CSV with the value (14/01/2000); if the date is (12/10/2000) or less, it works very well.
Here is the error message:
ErrorCode=UserErrorOdbcOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ERROR [22007] Timestamp ‘14/01/2000’ is not recognized
I tried to adjust the format of the date in the copy activity to dd/MM/yyyy and to change the Culture to en-UK, but I get the same issue.
I tried to use all the possible types of date in Snowflake as below but I still have the same issue:
DATE
DATETIME
TIMESTAMP
TIMESTAMP_LTZ
Snowflake doesn't support the DD/MM/YYYY format for automatic detection, and even though it supports MM/DD/YYYY, relying on automatic detection can lead to incorrect dates (05/02/2013 could be interpreted as May 2, 2013 instead of February 5, 2013).
So this:
select '14/01/2000'::timestamp;
produces:
Timestamp '14/01/2000' is not recognized
while this:
select '01/14/2000'::timestamp;
produces:
2000-01-14 00:00:00.000
Same for:
select '14/01/2000'::date;
select '01/14/2000'::date;
The guidelines for how to use date/timestamp formats are described in the Snowflake documentation.
In your case one way to get that value as a date is to use the to_date function, like this:
select to_date('14/01/2000', 'DD/MM/YYYY');
gives me:
2000-01-14
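If you cannot inject TO_DATE into the load itself, a hedged alternative is to change the session's input format so that DD/MM/YYYY strings parse directly (note this affects all date parsing in that session):
alter session set DATE_INPUT_FORMAT = 'DD/MM/YYYY';
select '14/01/2000'::date;  -- now returns 2000-01-14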

Date '2017/02/23' not recognized in Snowflake

I have a CSV with example data such as:
61| MXN| Mexican Peso| 2017/02/23
I'm trying to insert this into Snowflake using the following commands:
create or replace stage table_stage file_format = (TYPE=CSV, ENCODING = 'WINDOWS1252');
put file://table.csv @table_stage auto_compress=true
copy into table from @table_stage/table.csv.gz file_format = (TYPE=CSV FIELD_DELIMITER='|' error_on_column_count_mismatch=false, ENCODING = 'WINDOWS1252');
But I get the error as
Date '2017/02/23' not recognized
Using alter session set date_input_format = 'YYYY-DD-MM' to change the date format fixes it.
But what can I add in the create stage or the copy command itself to change the date format?
Snowflake has the session parameter DATE_INPUT_FORMAT, which controls the input format for the DATE data type.
The default value, AUTO, specifies that Snowflake attempts to automatically detect the format of dates stored in the system during the session, meaning the COPY INTO <table> command attempts to match all date strings in the staged data files against one of the formats listed in Supported Formats for AUTO Detection.
To guarantee correct loading of data, Snowflake strongly recommends explicitly setting the file format options for data loading (as explained in the documentation).
To solve your issue, set the DATE_INPUT_FORMAT parameter to the expected format of the dates in your staged files.
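For example, to match the 2017/02/23 values in this question (a session-level sketch):
alter session set DATE_INPUT_FORMAT = 'YYYY/MM/DD';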
Just set the date format in the file format: https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html
Use the DATE_FORMAT parameter in file_format condition.
You can read more in the COPY INTO documentation:
copy into table
from @table_stage/table.csv.gz
file_format = (TYPE=CSV
    FIELD_DELIMITER='|'
    ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
    ENCODING = 'WINDOWS1252'
    DATE_FORMAT = 'YYYY/MM/DD');

How to load and validate timestamp data in multiple formats?

I am populating table data from a file using the COPY command. The table includes timestamp data in multiple formats. I have set alter session set TIMESTAMP_INPUT_FORMAT = 'dd-mon-yyyy hh24.mi.ss.ff6'; which handles values in that format, but there are other timestamp values in the source file that are formatted differently. To cope with this I am doing, e.g.:
copy into <table> (
    <timestamp_column_1>,
    <timestamp_column_2>,
    ...
) from (
    SELECT
        $1,
        TO_TIMESTAMP_TZ(t.$2, 'DD-MON-YY'),
        ...
    FROM @<stage> t
);
This works, but the validate command does not support transformations, so my current validation method is unreliable.
Is there a way I can achieve what I want in my load without using transformations?

how do I convert date format using bteq script?

I'm loading data from a flat file in which the date data are in 20150605 format. However, I need to convert them to yyyy-mm-dd format before loading into Teradata. I tried the following, but it unfortunately failed.
Values
( Format(:a, 'YYYY-MM-DD')
);
How do I perform this type of data conversion? For other types, it would be
(:a (integer))
if I'm not mistaken...
The FORMAT clause describes the external data. Use this Teradata-specific cast syntax:
(:a (DATE, FORMAT 'yyyymmdd'))
For something other than FastLoad / TPT LOAD, you could also use
CAST(:a AS DATE FORMAT'yyyymmdd')
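A minimal BTEQ sketch putting the pieces together (the file, table, and column names are hypothetical; it assumes a pipe-delimited VARTEXT import, where every imported field arrives as VARCHAR):
.IMPORT VARTEXT '|' FILE = mydata.txt;
.REPEAT *
USING (a VARCHAR(8))
INSERT INTO my_table (date_col)
VALUES (CAST(:a AS DATE FORMAT 'yyyymmdd'));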

Unable to retrieve date columns in hive

I am having difficulty retrieving date values in Hive. My query is:
create external table test1(DISPLAYSCALE int, CREATED_DATE date, LAST_EDITED_DATE date)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde' STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
When I try to use select * from test1 limit 5, I get this error:
Failed with exception
java.io.IOException:java.lang.ClassCastException:
org.apache.hadoop.hive.serde2.io.DateWritable cannot be cast to
org.apache.hadoop.io.Text
As per the JSON, the data types for CREATED_DATE and LAST_EDITED_DATE are esriFieldTypeDate, and the values are in a format like 2013-11-20 09:39:25.000001.
So I used the date data type while creating the table, copied the data to HDFS using the unenclosed JSON, and used the select * query to retrieve the columns, but I get the above error. To get the values, we create the same table with the string data type instead of date, and then we are able to retrieve the values.
Can you suggest a solution for this problem? This question may seem silly, but I am pretty new to programming.
The Hive DATE data type supports only the format YYYY-MM-DD.
The TIMESTAMP or VARCHAR data types can be used rather than the DATE data type.
Date data types did not exist in older versions of Hive; dates were treated as strings. Refer to the post below:
http://www.folkstalk.com/2011/11/date-functions-in-hive.html
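A minimal sketch of that string-based workaround, reusing the DDL from the question (only the column types change); values such as 2013-11-20 09:39:25.000001 can then be cast at query time:
create external table test1 (
    DISPLAYSCALE int,
    CREATED_DATE string,
    LAST_EDITED_DATE string
)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- cast the strings when reading:
select DISPLAYSCALE, cast(CREATED_DATE as timestamp) from test1 limit 5;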
EsriJsonSerDe (and GeoJsonSerDe) support for DATE and TIMESTAMP type columns has been added on the Spatial-Framework-for-Hadoop master branch in git.
Alternatively, you can try using org.openx.data.jsonserde.JsonSerDe or org.apache.hive.hcatalog.data.JsonSerDe (instead of EsriJsonSerDe, and with column type string rather than binary) together with UnenclosedEsriJsonInputFormat.
[disclosure: Spatial-Framework-for-Hadoop collaborator]
