snowflake handle null while reading csv file - snowflake-cloud-data-platform

I am trying to load a CSV file from S3 that has a null value in a field mapped to an integer column in the Snowflake table.
I tried to use the IFNULL function, but I get the error:
Numeric value 'null' is not recognized.
For example, when I run
select IFNULL(null,0)
I get 0 as the answer,
but the same thing does not work when reading the CSV file:
select $1,$2,ifnull($2,0)
from
@stage/path
(file_format => csv)
I get the "null is not recognized" error; the query fails whenever $2 is null.
My csv format is as below.
create FILE FORMAT CSV
COMPRESSION = 'AUTO' FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n' SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = '\134'
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
Basically, I am just trying to convert null to 0, when reading from the stage.

The null string literal could be handled by setting NULL_IF:
CREATE FILE FORMAT CSV
...
NULL_IF = ('null', '\\N');
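One way to check this is to query the staged file again once the file format lists 'null' in NULL_IF. The sketch below reuses the stage path and format name from the question as placeholders and trims the other format options for brevity. TRY_TO_NUMBER is an addition that does not appear in the original post; it is included only as a fallback for values that are not numeric at all.

```sql
-- Recreate the format so the literal string null is loaded as SQL NULL.
-- The remaining options from the question are omitted here for brevity.
CREATE OR REPLACE FILE FORMAT csv
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
  NULL_IF = ('null', '\\N');

-- $2 now arrives as NULL instead of the text 'null', so IFNULL can replace it.
-- @stage/path is the placeholder path used in the question.
SELECT $1, $2, IFNULL(TRY_TO_NUMBER($2), 0) AS col2_or_zero
FROM @stage/path (FILE_FORMAT => 'csv');
```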

I used the second option listed in the Snowflake documentation: specify FIELD_OPTIONALLY_ENCLOSED_BY = NONE and EMPTY_FIELD_AS_NULL = FALSE, in which case you need to provide a value to be used for NULLs (NULL_IF = ('NULL')).
https://docs.snowflake.com/en/user-guide/data-unload-considerations.html
"Leave string fields unenclosed by setting the FIELD_OPTIONALLY_ENCLOSED_BY option to NONE (default), and set the EMPTY_FIELD_AS_NULL value to FALSE to unload empty strings as empty fields.
If you choose this option, make sure to specify a replacement string for NULL data using the NULL_IF option, to distinguish NULL values from empty strings in the output file. If you later choose to load data from the output files, you will specify the same NULL_IF value to identify the NULL values in the data files."
So my query looked something like the following:
COPY INTO @~/unload/table FROM (
SELECT * FROM table
)
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP'
FIELD_DELIMITER = '\u0001'
EMPTY_FIELD_AS_NULL = FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = NONE
NULL_IF=('NULL'))
OVERWRITE = TRUE;
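As the quoted documentation notes, the same NULL_IF value has to be supplied when the files are loaded back. A minimal sketch of that reverse step, assuming a hypothetical target table named my_table whose columns match the unloaded data:

```sql
-- my_table is a hypothetical target table, not part of the original answer.
-- The format options mirror the unload statement so that the string NULL
-- is read back as SQL NULL and empty fields stay empty strings.
COPY INTO my_table
FROM @~/unload/table
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP'
  FIELD_DELIMITER = '\u0001'
  EMPTY_FIELD_AS_NULL = FALSE
  FIELD_OPTIONALLY_ENCLOSED_BY = NONE
  NULL_IF = ('NULL'));
```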

Related

How to create a file format of TSV in snowflake?

I have to create a TSV file in Snowflake.
If anyone knows how, could you please share sample code?
Because commas are so common in delimited files, Snowflake uses the term CSV for any delimited file format. You can create a TSV file format by specifying a type of CSV and a delimiter of tab:
CREATE FILE FORMAT TSV_FILE_FORMAT TYPE = 'CSV' COMPRESSION = 'AUTO'
FIELD_DELIMITER = '\t' RECORD_DELIMITER = '\n' SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE' TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
Your specific parameters may vary depending on the specific way the TSV handles things like escaping tab characters, etc., but this is a good start.
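Once the format exists, it can be referenced by name when querying or loading staged files. The stage name, file name, and target table below are hypothetical and only illustrate where the format name goes:

```sql
-- Preview the first two tab-separated columns of a staged file.
-- @my_stage and data.tsv are placeholders, not from the original answer.
SELECT $1, $2
FROM @my_stage/data.tsv (FILE_FORMAT => 'TSV_FILE_FORMAT');

-- Load the same file into a hypothetical table using the named format.
COPY INTO my_table
FROM @my_stage/data.tsv
FILE_FORMAT = (FORMAT_NAME = 'TSV_FILE_FORMAT');
```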

Found character instead of field delimiter '|' in Snowflake

I have a row in my CSV file like the one below:
"TEXT"|"123584543"||||"Sherly"||"E'Sheryl"|||"DOCT"||"DC"|||||"AC"|||||||||||
I am trying to create a file format using the query below:
Create or Replace file format test_stg
type = CSV
RECORD_DELIMITER = '\n'
FIELD_DELIMITER = '|'
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
SKIP_HEADER=1
empty_field_as_null = true
ESCAPE = '"';
When I run the above query, I get the following error:
SQL compilation error: value [\"] for parameter 'FIELD_OPTIONALLY_ENCLOSED_BY' conflict with parameter 'ESCAPE'
When I try the query below, it executes successfully:
create or replace file format test_stg1
type = csv
record_delimiter = '\n'
field_delimiter = '|'
skip_header = 1
null_if = ('NULL', 'null')
empty_field_as_null = true
FIELD_OPTIONALLY_ENCLOSED_BY = '0x22';
The file format is created, but when I run the COPY command I get an unusual error: Found character instead of field delimiter '|'.
Can anyone help me fix this issue?
Thanks :)
Remove FIELD_OPTIONALLY_ENCLOSED_BY = '0x22', recreate the file format, and run the COPY statement again.
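A sketch of what that suggestion looks like in practice is below. One trade-off to keep in mind: without the enclosure option, the double quotes in the file are treated as ordinary data, so they end up in the loaded values. The second statement shows one possible way to strip them when querying the stage; the stage name my_stage is hypothetical.

```sql
-- Same format as in the question, minus FIELD_OPTIONALLY_ENCLOSED_BY.
CREATE OR REPLACE FILE FORMAT test_stg1
  TYPE = CSV
  RECORD_DELIMITER = '\n'
  FIELD_DELIMITER = '|'
  SKIP_HEADER = 1
  NULL_IF = ('NULL', 'null')
  EMPTY_FIELD_AS_NULL = TRUE;

-- Quotes are now part of each value; TRIM removes them from both ends.
-- @my_stage is a placeholder stage name.
SELECT TRIM($1, '"') AS col1, TRIM($2, '"') AS col2
FROM @my_stage (FILE_FORMAT => 'test_stg1');
```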

Copy command in Snowflake failed to parse \n in data

I have CSV data that contains string values and JSON entries. For example:
message_id | status  | user_detail                                          | date
a123bxe    | Success | {user_name:'jim',full_name:'Jim Mathews'}            | 2021-07-28
b245apl    | Success | {user_name: '\n153674#dewbbe',full_name:'Dev Webbe'} | 2021-07-28
The file uses | as the field delimiter and \n as the record delimiter. Because \n also appears inside a data value, Snowflake treats the rest of the line as a new record and tries to load it into the table, which eventually fails with a data type mismatch error.
Here is the file format I'm using:
FILE_FORMAT = (COMPRESSION = 'AUTO'
FIELD_DELIMITER = '|'
RECORD_DELIMITER = '\n'
SKIP_HEADER = 0
ESCAPE_UNENCLOSED_FIELD='\n'
VALIDATE_UTF8 = TRUE
EMPTY_FIELD_AS_NULL = TRUE
TRIM_SPACE = TRUE
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
NULL_IF = ('NULL', 'null','None','NONE',""""))
How should I escape the \n coming as a part of data value?
Try setting the FIELD_OPTIONALLY_ENCLOSED_BY parameter to a single quote.

Create Snowpipe COPY INTO using a file from a stage fails

When I try to load the data below from a staged file, it fails with an invalid date error. Is there a way to resolve this without changing the source file?
I am trying to set up a Snowpipe.
Orig_Int_Date
04-21-2020
create or replace file format Ally_format
type = csv
field_delimiter = '|'
skip_header = 1
empty_field_as_null = true
REPLACE_INVALID_CHARACTERS = TRUE
DATE_FORMAT = '<MM-DD-YYYY>'
EMPTY_FIELD_AS_NULL = TRUE;
Copy into NAM_FIN_DB.FIN_PUBLIC.ALLY
from @NAM_FIN_DB.PUBLIC.FP_FINANCE
file_format = Ally_format
pattern='ALLY.*';
I think your date format line should be:
DATE_FORMAT = 'MM-DD-YYYY'
not
DATE_FORMAT = '<MM-DD-YYYY>'
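Putting the correction into the full definition gives something like the sketch below. The duplicate EMPTY_FIELD_AS_NULL line from the question is dropped here as a tidy-up, which is an assumption about intent rather than part of the answer. The final query is a quick way to confirm that the sample value parses with this pattern.

```sql
-- File format from the question with the angle brackets removed.
CREATE OR REPLACE FILE FORMAT Ally_format
  TYPE = CSV
  FIELD_DELIMITER = '|'
  SKIP_HEADER = 1
  EMPTY_FIELD_AS_NULL = TRUE
  REPLACE_INVALID_CHARACTERS = TRUE
  DATE_FORMAT = 'MM-DD-YYYY';

-- Sanity check: the sample value from the question parses with this pattern.
SELECT TO_DATE('04-21-2020', 'MM-DD-YYYY') AS orig_int_date;
```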

Snowflake - Escape character parameter not working in Copy statement

Problem statement:
When loading data with the COPY command and the ESCAPE property defined, the statement does not remove the escape character from the data.
Example:
I'm trying to load data from a CSV file. One of the columns contains data in the following format: 'EC F&G BREWER\'S INT\'L BEER'. I expect the escape character (the backslash, \) to be removed from the field after the load.
Here is the COPY statement I'm using:
COPY INTO SANDBOX.TEST_SBX.TEST_20200512
FROM @STAGE_NAME/TEST_FILE/
FILE_FORMAT = (FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = '\047'
TRIM_SPACE = FALSE
ESCAPE ='\134'
ESCAPE_UNENCLOSED_FIELD='\134'
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('NULL', 'null', '') )
PATTERN='.*.gz.*'
PURGE=FALSE
ON_ERROR = ABORT_STATEMENT
FORCE = FALSE
RETURN_FAILED_ONLY = FALSE
;
Many columns in this file have the backslash (\) in character fields, and the escape is ignored in all of them during the load.
I cannot switch to manual column selection and use a regular expression to replace the escape character; I have to use the COPY command without column selection.
I expected the escape character configured in the file format to be handled (removed) automatically, without needing a transformation, as other data processing and loading engines do.
Please suggest what I can do here.
This is the code I have implemented
--Data File
'EC F&G ANSHUL\'s Public COMPANY'
'YB MARTHA\'S VINEYARD LOUNGE'
'EC F&G BREWER\'S INT\'L BEER'
COPY INTO test_so FROM @file_format_stage/Test_File.csv.gz
FILE_FORMAT = (FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n'
SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '\047'
TRIM_SPACE = FALSE
ESCAPE ='\\'
--ESCAPE_UNENCLOSED_FIELD=NONE
NULL_IF = ('NULL', 'null', '')
SKIP_BYTE_ORDER_MARK = False)
force=true
;
And the result
I tried replicating exactly the same steps but still do not get the same result. My understanding is that there is a problem with the file, or that an account-level parameter is causing the problem.
CREATE OR REPLACE TABLE SANDBOX.AAGRA018_SBX.TEXT_FILE_TST
(COL1 VARCHAR(50)
);
TRUNCATE TABLE SANDBOX.AAGRA018_SBX.TEXT_FILE_TST;
COPY INTO SANDBOX.AAGRA018_SBX.TEXT_FILE_TST
FROM @~/Test_File.txt.gz
FILE_FORMAT = (FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n'
SKIP_HEADER = 0 FIELD_OPTIONALLY_ENCLOSED_BY = '\047'
TRIM_SPACE = FALSE ESCAPE ='\\' --ESCAPE_UNENCLOSED_FIELD='\134'
NULL_IF = ('NULL', 'null', '')
)
force=true;
select * from SANDBOX.AAGRA018_SBX.TEXT_FILE_TST;
Data File -
YB MARTHA\'S VINEYARD LOUNGE
EC F&G BREWER\'S INT\'L BEER
This is not giving the desired result
[Snowflake UI Image copy][1]
[1]: https://i.stack.imgur.com/IBgNz.png
