Date '2017/02/23' not recognized in Snowflake

I have a csv with example data as:
61| MXN| Mexican Peso| 2017/02/23
I'm trying to load this into Snowflake using the following commands:
create or replace stage table_stage file_format = (TYPE=CSV ENCODING='WINDOWS1252');
put file://table.csv @table_stage auto_compress=true;
copy into table from @table_stage/table.csv.gz file_format = (TYPE=CSV FIELD_DELIMITER='|' ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE ENCODING='WINDOWS1252');
But I get the error as
Date '2017/02/23' not recognized
Using alter session set date_input_format = 'YYYY/MM/DD' to change the date format fixes it.
But what can I add in the create stage or the copy command itself to change the date format?

Snowflake has a session parameter, DATE_INPUT_FORMAT, that controls the input format for the DATE data type.
The default value, AUTO, specifies that Snowflake attempts to automatically detect the format of dates stored in the system during the session, meaning the COPY INTO <table> command attempts to match all date strings in the staged data files against one of the formats listed in Supported Formats for AUTO Detection.
To guarantee correct loading of data, Snowflake strongly recommends explicitly setting the file format options for data loading (as explained in the documentation).
To solve your issue, set the DATE_INPUT_FORMAT parameter to the format of the dates in your staged files.
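For example, a minimal sketch matching the YYYY/MM/DD dates shown in the question:

```sql
-- Set the session-level date parsing format before running COPY INTO
alter session set DATE_INPUT_FORMAT = 'YYYY/MM/DD';
```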

Just set the date format in the file format: https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html

Use the DATE_FORMAT parameter in the FILE_FORMAT clause.
You can read more here: COPY INTO
copy into table
from @table_stage/table.csv.gz
file_format = (TYPE=CSV
FIELD_DELIMITER='|'
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
ENCODING = 'WINDOWS1252'
DATE_FORMAT = 'YYYY/MM/DD');

Related

How to load Parquet/AVRO into multiple columns in Snowflake with schema auto detection?

When trying to load a Parquet/AVRO file into a Snowflake table I get the error:
PARQUET file format can produce one and only one column of type variant or object or array. Use CSV file format if you want to load more than one column.
But I don't want to load these files into a new one column table — I need the COPY command to match the columns of the existing table.
What can I do to get schema auto detection?
Good news: that error message is outdated. Snowflake now supports schema detection and COPY INTO with multiple columns.
To reproduce the error:
create or replace table hits3 (
WatchID BIGINT,
JavaEnable SMALLINT,
Title TEXT
);
copy into hits3
from @temp.public.my_ext_stage/files/
file_format = (type = parquet);
-- PARQUET file format can produce one and only one column of type variant or object or array.
-- Use CSV file format if you want to load more than one column.
To fix the error and have Snowflake match the columns from the table and Parquet/AVRO files just add the option MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE (or MATCH_BY_COLUMN_NAME=CASE_SENSITIVE):
copy into hits3
from @temp.public.my_ext_stage/files/
file_format = (type = parquet)
match_by_column_name = case_insensitive;
Docs:
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
https://docs.snowflake.com/en/user-guide/data-load-overview.html?#detection-of-column-definitions-in-staged-semi-structured-data-files

How to load and validate timestamp data in multiple formats?

I am populating a table from a file using the COPY command. The table includes timestamp data in multiple formats. I have set alter session set TIMESTAMP_INPUT_FORMAT = 'dd-mon-yyyy hh24.mi.ss.ff6';, which handles values in that format, but there are other timestamp values in the source file that are formatted differently. To cope with this I am doing e.g.
copy into <table> (
<timestamp_column_1>,
<timestamp_column_2>
...
) from (
SELECT
$1,
TO_TIMESTAMP_TZ(t.$2, 'DD-MON-YY'),
...
FROM @<stage> t
);
This works, but the VALIDATE command does not support transformations, so my current validation method is unreliable.
Is there a way I can achieve what I want in my load without using transformations?

Snowflake COPY INTO failing when inserting NULL into a timestamp column

I'm trying to load file data into Snowflake using COPY INTO. The table has a timestamp column. The file has only NULLs in that column, represented as the empty string "".
On running COPY INTO with the file format TIMESTAMP option set to AUTO, the statement fails with Can't parse '' as timestamp.
Is there any way to handle this?
Use the NULL_IF option:
NULL_IF = ( 'string1' [ , 'string2' ... ] )
String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
NULL_IF = ('\\N', '')
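A minimal sketch applying this in a COPY command (the table and stage names here are hypothetical):

```sql
-- Treat both \N and the empty string as SQL NULL during the load,
-- so empty timestamp fields load as NULL instead of failing to parse
copy into my_table
from @my_stage/data.csv.gz
file_format = (TYPE=CSV NULL_IF=('\\N', ''));
```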

NULL Value Handling for CSV Files Via External Tables in Snowflake

I am trying to get the NULL_IF parameter of a file format working when applied to an external table.
I have a source CSV file containing NULL values in some columns. NULLS in the source file appear in the format "\N" (all non numeric values in the file are quoted). Here is an example line from the raw csv where the ModifiedOn value is NULL in the source system:
"AirportId" , "IATACode" , "CreatedOn" , "ModifiedOn"
1 , "ACU" , "2015-08-25 16:58:45" , "\N"
I have a file format defined including the parameter NULL_IF = "\\N"
The following select statement successfully interprets the correct rows as holding NULL values.
SELECT $8
FROM @MyS3Bucket
(
file_format => 'CSV_1',
pattern => '.*MyFileType.*.csv.gz'
)
However if I use the same file format with an external table like this:
CREATE OR REPLACE EXTERNAL TABLE MyTable
(
MyColumn varchar as (value:c8::varchar)
)
WITH LOCATION = @MyS3Bucket
FILE_FORMAT = (FORMAT_NAME = 'CSV_1')
PATTERN = '.*MyFileType_.*.csv.gz';
Each row holds \N as a value rather than NULL.
I assume this is caused by external tables providing a single variant output that can then be further split rather than directly presenting individual columns in the csv file.
One solution is to code the NULL handling into the external view like this:
CREATE OR REPLACE EXTERNAL TABLE MyTable
(
MyColumn varchar as (NULLIF(value:c8::varchar, '\\N'))
)
WITH LOCATION = @MyS3Bucket
FILE_FORMAT = (FORMAT_NAME = 'CSV_1')
PATTERN = '.*MyFileType_.*.csv.gz';
However this leaves me at risk of having to rewrite a lot of external table code if the file format changes, whereas the file format could/should centralise that NULL definition. It would also mean the NULL conversion has to be handled column by column rather than file by file, increasing code complexity.
Is there a way that I can have the NULL values appear through an external table without handling them explicitly through column definitions?
Ideally this would be applied through a file format object but changes to the format of the raw file are not impossible.
I am able to reproduce the issue, and it seems like a bug. If you have access to Snowflake support, it would be best to submit a support case regarding this issue, so you can easily follow its progress.

CSV file w/ two different timestamp formats in Snowflake

(Submitting on behalf of a Snowflake User)
I have a csv file that has two different timestamp formats.
For example:
time_stmp1: 2019-07-01 00:03:17.000 EDT
time_stmp2: 2019-06-30 21:03:17 PDT
In the copy command I am able to specify only one format.
How should I proceed to load both columns in TIMESTAMP_LTZ data type?
Any recommendations?
You could use the SELECT form of COPY INTO, where you transform the date fields individually, something like:
COPY INTO MY_TABLE (NAME, DOB, DOD, HAIR_COLOUR)
FROM (
SELECT $1, TO_DATE($2,'YYYYMMDD'), TO_DATE($3,'MM-DD-YYYY'), $4
FROM @MY_STAGE/mypeeps (file_format => 'MY_CSV_FORMAT')
)
ON_ERROR = CONTINUE;
Currently, Snowflake does not allow loading data with different date formats from one single file.
If the data in the file is just a date, use the DATE datatype and, in the FILE FORMAT, define DATE_FORMAT as AUTO.
If the data includes both date and time, use the TIMESTAMP datatype and define TIMESTAMP_FORMAT in the FILE FORMAT to match the data file:
DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'YY/MM/DD HH24:MI:SS'
If there are multiple date formats in the file, for example MM/DD/YY and MM/DD/YY HH:MI:SS, it does not load correctly; you may need to split the file and load the parts separately, or normalise all date values to a single common format before loading.
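As an alternative to splitting the file, a transformed load can try each candidate format per value. A minimal sketch, assuming a hypothetical stage @my_stage and file format MY_CSV_FORMAT:

```sql
-- TRY_TO_TIMESTAMP returns NULL instead of erroring when the format
-- does not match, so COALESCE picks the first format that parses
COPY INTO my_table (event_time)
FROM (
  SELECT COALESCE(
           TRY_TO_TIMESTAMP($1, 'MM/DD/YY HH24:MI:SS'),
           TRY_TO_TIMESTAMP($1, 'MM/DD/YY'))
  FROM @my_stage/data.csv (file_format => 'MY_CSV_FORMAT')
);
```

Because this is a transformed load, the VALIDATE command's limitation on transformations applies.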
