Escape comma in Snowflake COPY INTO

COPY INTO @TMP_STG
FROM my_table
FILE_FORMAT = (
  TYPE = CSV
  EMPTY_FIELD_AS_NULL = FALSE
  FIELD_DELIMITER = ','
)
SINGLE = FALSE
MAX_FILE_SIZE = 4900000000;
I'm generating a file from a Snowflake table using COPY INTO, with the field delimiter set to ','. But one column contains commas in its values, e.g.
col1   col2           col3
CAD    Toronto,ON     10
USD    Dallas,Texas   10
I was thinking of adding ESCAPE = '/' inside FILE_FORMAT, but the documentation mentions using it together with FIELD_OPTIONALLY_ENCLOSED_BY. Do I need to use them together? How do I make sure Toronto,ON stays in col2 and isn't split at the delimiter?
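For an unload you generally don't need ESCAPE at all: FIELD_OPTIONALLY_ENCLOSED_BY makes COPY wrap any value containing the delimiter in quotes, so Toronto,ON survives as one field. A minimal sketch based on the statement above:
COPY INTO @TMP_STG
FROM my_table
FILE_FORMAT = (
  TYPE = CSV
  EMPTY_FIELD_AS_NULL = FALSE
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'  -- values containing the delimiter get quoted
)
SINGLE = FALSE
MAX_FILE_SIZE = 4900000000;
ESCAPE only matters when a quoted value itself contains the quote character; you can set both, but the enclosure alone is enough for embedded commas.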

Related

Cannot insert Array in Snowflake

I have a CSV file with the following data:
eno | phonelist | shots
"1" | "['1112223333','6195551234']" | "[[11,12]]"
The DDL statement I have used to create table in snowflake is as follows:
CREATE TABLE ArrayTable (eno INTEGER, phonelist ARRAY, shots ARRAY);
I need to insert the data from the CSV into the Snowflake table and the method I have used is:
create or replace stage ArrayTable_stage file_format = (TYPE=CSV)
put file://ArrayTable @ArrayTable_stage auto_compress=true
copy into ArrayTable from @ArrayTable_stage/ArrayTable.gz
file_format = (TYPE=CSV FIELD_DELIMITER='|' FIELD_OPTIONALLY_ENCLOSED_BY='\"\')
But when I try to run the code, I get the error:
Copy to table failed: 100069 (22P02): Error parsing JSON:
('1112223333','6195551234')
How to resolve this?
FIELD_OPTIONALLY_ENCLOSED_BY='\"\': based on the row you have, that should just be '\"'.
select parse_json('[\'1112223333\',\'6195551234\']');
works (the backslashes are just to get the literal past the SQL parser),
but your error output has parentheses ( ), which is different.
-- querying a stage directly needs a named file format (the name here is assumed)
SELECT t.$2 AS column2, TRY_PARSE_JSON(t.$2) AS j
FROM @ArrayTable_stage/ArrayTable.gz (FILE_FORMAT => 'my_csv_format') t
WHERE j IS NULL
will show which values are failing to parse.
Failing that, you might want to use TO_ARRAY to parse column2 yourself and insert the SELECTed/transformed data into your table, since it is failing to auto-transform.
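If the values still fail to load, you can transform them during the COPY instead of relying on auto-conversion. A sketch (assuming the corrected enclosure above, and that PARSE_JSON accepts the staged strings as shown):
copy into ArrayTable
from (
  select t.$1::integer,
         parse_json(t.$2)::array,  -- bracketed string -> VARIANT -> ARRAY
         parse_json(t.$3)::array
  from @ArrayTable_stage/ArrayTable.gz t
)
file_format = (TYPE=CSV FIELD_DELIMITER='|' FIELD_OPTIONALLY_ENCLOSED_BY='\"');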

Parse CSV file with string fields that contain double quotes and/or commas using Snowflake COPY INTO

MY QUESTION:
How do I construct my copy into statement so that my file properly parses and loads? Thanks!
THE PROBLEM:
I have a csv file that I need to parse and copy to a table from a named stage in Snowflake.
The file looks similar to below:
ID, Name, Job Title,Company Name, Email Address, Phone Number
5244, Ted Jones, Manager, Quality Comms, tj@email.com,555-630-1277
5246, Talim Jones,""P-Boss"" of the world, Quality Comms, taj@email.com,555-630-127
5247, Jordy Jax,,"M & G Services.",jj@services.com, 616-268-1546
MY CODE:
COPY INTO DB.SCHEMA.TABLE_NAME
(
ID,
FULL_NAME,
JOB_TITLE,
EMAIL_ADDRESS
)
FROM
(
SELECT $1::NUMBER AS ID,
$2 AS FULL_NAME,
$3 AS JOB_TITLE,
$5 AS EMAIL_ADDRESS
FROM @STAGE_NAME)
--SNOWFLAKE DOES NOT SUPPORT UTF 16 OR 32 SO HAVING REPLACE INVALID UTF 8 CHARACTERS
FILE_FORMAT = (TYPE = 'CSV', RECORD_DELIMITER = '\n', FIELD_DELIMITER = ',', SKIP_HEADER = 1,
               FIELD_OPTIONALLY_ENCLOSED_BY = '"', TRIM_SPACE = TRUE, REPLACE_INVALID_CHARACTERS = TRUE)
ON_ERROR = CONTINUE
--COPY A FILE INTO A TABLE EVEN IF IT HAS ALREADY BEEN LOADED INTO THE TABLE
FORCE = TRUE
MY ERROR MESSAGE:
Found character 'P' instead of field delimiter ','
WHAT I HAVE TRIED:
I have tried many things, most notably:
I have tried to escape the double quotes in my select statement for the Job Title.
I have tried removing the FIELD_OPTIONALLY_ENCLOSED_BY = '"' parameter and just using ESCAPE = '"' with no luck.
Try removing the FIELD_OPTIONALLY_ENCLOSED_BY = '"' option and including a REPLACE function in your inner query.
Example:
SELECT
$1::NUMBER AS ID,
$2 AS FULL_NAME,
replace($3,'"','') AS JOB_TITLE,
$5 AS EMAIL_ADDRESS
FROM @STAGE_NAME
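Put together, the suggestion looks something like this (a sketch reusing the names above; note FIELD_OPTIONALLY_ENCLOSED_BY is gone):
COPY INTO DB.SCHEMA.TABLE_NAME (ID, FULL_NAME, JOB_TITLE, EMAIL_ADDRESS)
FROM (
  SELECT $1::NUMBER,
         $2,
         REPLACE($3, '"', ''),  -- strip the stray quotes instead of parsing them as enclosures
         $5
  FROM @STAGE_NAME
)
FILE_FORMAT = (TYPE = 'CSV', RECORD_DELIMITER = '\n', FIELD_DELIMITER = ',',
               SKIP_HEADER = 1, TRIM_SPACE = TRUE, REPLACE_INVALID_CHARACTERS = TRUE)
ON_ERROR = CONTINUE
FORCE = TRUE;
This only works because no field here contains an embedded comma; the quoted company names happen not to.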

Snowflake Copy Into - Multiple column escape handling

I have a unique situation while loading data from a csv file into Snowflake.
I have multiple columns that need some rework:
Columns enclosed in " that contain commas: this is handled properly.
Columns enclosed in " that also contain " within the data, i.e. ("\"DataValue\"").
My File Format is as such:
ALTER FILE FORMAT DB.SCHEMA.FF_CSV_TEST
SET COMPRESSION = 'AUTO'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
ESCAPE = NONE
ESCAPE_UNENCLOSED_FIELD = 'NONE'
DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO'
NULL_IF = ('\\N');
My columns enclosed in " that contain commas are handled fine. However, the remaining columns that resemble ("\"DataValue\"") return errors:
Found character 'V' instead of field delimiter ','
Are there there any ways to handle this?
I have attempted using a select against the stage itself:
select t.$1, t.$2, t.$3, t.$4, t.$5, TRIM(t.$6,'"')
from @STAGE_TEST/file.csv.gz t
LIMIT 1000;
with t.$5 being the column enclosed in " and containing commas,
and t.$6 being the ("\"DataValue\"") column.
Are there any options other than writing Python (or other) code to strip this out before loading into Snowflake?
Add \ as your ESCAPE parameter. It looks like your quote values are properly escaped, so that should take care of those quotes.
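A minimal sketch of that change against the file format above (the backslash is doubled for the SQL parser):
ALTER FILE FORMAT DB.SCHEMA.FF_CSV_TEST
SET ESCAPE = '\\';
With ESCAPE set and FIELD_OPTIONALLY_ENCLOSED_BY = '"' kept, \" inside an enclosed field is read as a literal quote rather than a field terminator.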

How do I unload a CSV file where only non-null values are wrapped in quotes, quotes are optionally enclosed, and null values are not quoted?

(Submitting on behalf of a Snowflake User)
For example - ""NiceOne"" LLC","Robert","GoodRX",,"Maxift","Brian","P,N and B","Jane"
I have been able to create a file format that satisfies each of these conditions individually, but not one that satisfies all three.
I've used the following recommendation:
Your first column is malformed; it is missing the initial ". It should be:
"""NiceOne"" LLC"
After fixing that, you should be able to load your data with almost default settings:
COPY INTO my_table FROM @my_stage/my_file.csv
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');
...but the above format returns:
"""NiceOne"" LLC","Robert","GoodRX","","Maxift","Brian","P,N and B","Jane"
I don't want quotes around empty fields. I'm looking for
"""NiceOne"" LLC","Robert","GoodRX",,"Maxift","Brian","P,N and B","Jane"
Any recommendations?
If you use the following, you will not get quotes around NULL fields, but you will get quotes around '' (empty text). You can always concatenate the fields and format the resulting line manually if this doesn't suit you.
COPY INTO @my_stage/my_file.CSV
FROM (
SELECT
'"NiceOne" LLC' A, 'Robert' B, 'GoodRX' C, NULL D,
'Maxift' E, 'Brian' F, 'P,N and B' G, 'Jane' H
)
FILE_FORMAT = (
TYPE = CSV
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ()
COMPRESSION = NONE
)
OVERWRITE = TRUE
SINGLE = TRUE
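If the empty-string behaviour matters, the manual formatting mentioned above is a fallback: build the whole line yourself and unload it as a single unenclosed column. A sketch (my_table and columns A and B are stand-ins):
COPY INTO @my_stage/my_file.csv
FROM (
  SELECT IFF(A IS NULL, '', '"' || REPLACE(A, '"', '""') || '"') || ',' ||
         IFF(B IS NULL, '', '"' || REPLACE(B, '"', '""') || '"')
  FROM my_table
)
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = NONE COMPRESSION = NONE)
OVERWRITE = TRUE
SINGLE = TRUE;
NULLs become bare empty fields, empty strings become "", and embedded quotes are doubled by hand.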

How to use a file format to decode a CSV column?

I've got some data in a string column that is in a strange csv format. I can write a file format that correctly interprets it. How do I use my file format against data that has already been imported?
create table test_table
(
my_csv_column string
)
How do I split/flatten this column with:
create or replace file format my_csv_file_format
type = 'CSV'
RECORD_DELIMITER = '0x0A'
field_delimiter = ' '
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
VALIDATE_UTF8 = FALSE;
Please assume that I cannot use split, as I want to use the rich functionality of the file format (optional escape characters, date recognition etc.).
What I'm trying to achieve is something like the below (but I cannot find how to do it)
copy into destination_Table
from
(select
s.$1
,s.$2
,s.$3
,s.$4
from test_table s
file_format = (column_name = 'my_csv_column', format_name = 'my_csv_file_format'))
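There is no column_name option in COPY's FILE_FORMAT, so the closest workaround I know of is a round trip: unload the raw column to a stage, then load it back through the rich file format. A sketch (the stage name is assumed):
-- 1. write each value out verbatim, one line per row, with no enclosure
copy into @my_stage/decode/
from (select my_csv_column from test_table)
file_format = (type = 'CSV' field_optionally_enclosed_by = none
               record_delimiter = '0x0A' compression = none);
-- 2. read it back with the format that understands the embedded CSV
copy into destination_Table
from @my_stage/decode/
file_format = (format_name = 'my_csv_file_format');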
