parse csv file with string fields that contain double quotes and/or commas using Snowflake COPY INTO

MY QUESTION:
How do I construct my COPY INTO statement so that my file parses and loads properly? Thanks!
THE PROBLEM:
I have a csv file that I need to parse and copy to a table from a named stage in Snowflake.
The file looks similar to below:
ID, Name, Job Title,Company Name, Email Address, Phone Number
5244, Ted Jones, Manager, Quality Comms, tj@email.com,555-630-1277
5246, Talim Jones,""P-Boss"" of the world, Quality Comms, taj@email.com,555-630-127
5247, Jordy Jax,,"M & G Services.",jj@services.com, 616-268-1546
MY CODE:
COPY INTO DB.SCHEMA.TABLE_NAME
(
ID,
FULL_NAME,
JOB_TITLE,
EMAIL_ADDRESS
)
FROM
(
SELECT $1::NUMBER AS ID,
$2 AS FULL_NAME,
$3 AS JOB_TITLE,
$5 AS EMAIL_ADDRESS
FROM @STAGE_NAME)
--SNOWFLAKE DOES NOT SUPPORT UTF-16 OR UTF-32, SO REPLACE INVALID UTF-8 CHARACTERS
FILE_FORMAT = (TYPE = 'CSV', RECORD_DELIMITER = '\n', FIELD_DELIMITER = ',', SKIP_HEADER = 1, FIELD_OPTIONALLY_ENCLOSED_BY = '"', TRIM_SPACE = TRUE, REPLACE_INVALID_CHARACTERS = TRUE)
ON_ERROR = CONTINUE
--COPY A FILE INTO A TABLE EVEN IF IT HAS ALREADY BEEN LOADED INTO THE TABLE
FORCE = TRUE
MY ERROR MESSAGE:
Found character 'P' instead of field delimiter ','
WHAT I HAVE TRIED:
I have tried many things, most notably:
I have tried to escape the double quotes in my select statement for the Job Title.
I have tried removing the FIELD_OPTIONALLY_ENCLOSED_BY = '"' parameter and just using ESCAPE = '"' with no luck.

Try removing the option FIELD_OPTIONALLY_ENCLOSED_BY = '"' and also include a replace function in your inner query.
Example:
SELECT
$1::NUMBER AS ID,
$2 AS FULL_NAME,
replace($3,'"','') AS JOB_TITLE,
$5 AS EMAIL_ADDRESS
FROM @STAGE_NAME
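For context on why the original row fails: in standard (RFC 4180-style) CSV, a doubled quote only escapes a literal quote inside an enclosed field. In `""P-Boss"" of the world` the field is never consistently enclosed, so the parser sees a closing quote and then the stray `P`. A minimal Python sketch (illustrative only, not Snowflake) of how the row parses once the field is properly enclosed:

```python
import csv
import io

# Properly enclosed version of the problem row: the whole field is wrapped
# in quotes, and each literal quote inside it is doubled (RFC 4180 style).
good = '5246, Talim Jones,"""P-Boss"" of the world", Quality Comms\n'

row = next(csv.reader(io.StringIO(good), skipinitialspace=True))
print(row[2])  # "P-Boss" of the world
```

FIELD_OPTIONALLY_ENCLOSED_BY = '"' tells Snowflake to apply the same rule, which is why the answer above drops it and strips the stray quotes in SQL instead when the source file cannot be fixed.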

Related

Snowflake Copy Into - Multiple column escape handling

I have a unique situation while loading data from a csv file into Snowflake.
I have multiple columns that need some re-work
Columns enclosed in " that contain commas - this is handled properly
Columns that are enclosed in " but also contain " within the data i.e. ( "\"DataValue\"")
My File Format is as such:
ALTER FILE FORMAT DB.SCHEMA.FF_CSV_TEST
SET COMPRESSION = 'AUTO'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
ESCAPE = NONE
ESCAPE_UNENCLOSED_FIELD = 'NONE'
DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO'
NULL_IF = ('\\N');
My columns enclosed in " that contain commas are being handled fine. However the remaining columns that resemble ( "\"DataValue\"") are returning errors:
Found character 'V' instead of field delimiter ','
Are there any ways to handle this?
I have attempted using a select against the stage itself:
select t.$1, t.$2, t.$3, t.$4, t.$5, TRIM(t.$6,'"')
from @STAGE_TEST/file.csv.gz t
LIMIT 1000;
with t.$5 being the column enclosed with " and containing commas
and t.$6 being the ( "\"DataValue\"")
Are there any other options than developing python (or other) code that strips out this before processing into Snowflake?
Add the \ to your escape parameter. It looks like your quote values are properly escaped, so that should take care of those quotes.
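As a sanity check of that rule outside Snowflake, Python's csv module supports the same two quote-escaping modes; with quote doubling disabled and a backslash escape character it reads the "\"DataValue\"" pattern the way ESCAPE = '\\' would (illustrative sketch, not Snowflake syntax):

```python
import csv
import io

# Field 2 is enclosed in quotes AND contains backslash-escaped quotes,
# mirroring the "\"DataValue\"" case from the question.
line = 'abc,"\\"DataValue\\"",xyz\n'

# doublequote=False plus escapechar='\\' is the csv-module analogue of
# FIELD_OPTIONALLY_ENCLOSED_BY = '"' combined with ESCAPE = '\\'.
row = next(csv.reader(io.StringIO(line),
                      quotechar='"', doublequote=False, escapechar='\\'))
print(row)  # ['abc', '"DataValue"', 'xyz']
```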

escape double quotes in snowflake

I'm trying to load data using the COPY INTO command. A field contains the special character sequence \" as part of its value, but with FIELD_OPTIONALLY_ENCLOSED_BY set, the \ escapes the closing quote and the load fails with:
Found character '0' instead of field delimiter ';'
DATA:
"TOL";"AANVR. 1E K ZIE RF.\";"011188"
With the escape applied, the second column is read as AANVR. 1E K ZIE RF.\"; (the closing quote and delimiter are consumed), but it should be AANVR. 1E K ZIE RF.\.
File format
CREATE OR REPLACE FILE FORMAT TEST
FIELD_DELIMITER = ';'
SKIP_HEADER = 1
TIMESTAMP_FORMAT = 'MM/DD/YYYYHH24:MI:SS'
ESCAPE = '\\'
TRIM_SPACE = TRUE
FIELD_OPTIONALLY_ENCLOSED_BY = '\"'
NULL_IF = ('')
ENCODING = "iso-8859-1"
;
If you need to replace double quotes in an existing table, you can use '\"' syntax in replace function. Example provided below.
select replace(column_name,'\"','') as column_name from table_name
Rough example, but the below works for me. Let me know if you're looking for a different output.
CREATE OR REPLACE table DOUBLE_TEST_DATA (
string1 string
, varchar1 varchar
, string2 string
);
COPY INTO DOUBLE_TEST_DATA FROM @TEST/doublequotesforum.csv.gz
FILE_FORMAT = (
TYPE=CSV
, FIELD_DELIMITER = ';'
, FIELD_OPTIONALLY_ENCLOSED_BY='"'
);
select * from DOUBLE_TEST_DATA;

My CSV file with double quotes enclosed fields - numeric value ' "12131" ' not recognized

I staged a csv file in which all fields are enclosed in double quotes (" ") and comma separated, and rows are separated by the newline character. The values in the enclosed fields can also contain newline characters (\n).
I am using the default FILE FORMAT = CSV. When using COPY INTO I got a column mismatch error in this case.
I solved this first error by adding FIELD_OPTIONALLY_ENCLOSED_BY to the file format in the SQL below.
However, when I try to import NUMBER values from the csv file, even with FIELD_OPTIONALLY_ENCLOSED_BY='"' already set, it's not working: I get a "Numeric value '"3922000"' is not recognized" error.
A sample of my .csv file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23
14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000","","tcllVeEFPD"
My COPY INTO statement is below:
COPY INTO '..'
FROM '...'
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT";
I get a feeling that NUMBER is interpreted as STRING.
Does anyone have solution for that one?
Try using a subquery in the FROM clause of the COPY command where each column is listed out and cast the appropriate columns.
Ex.
COPY INTO '...'
FROM (
SELECT $1::INTEGER,
$2::FLOAT,
...
)
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT";
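Conceptually, the subquery does two things for this file: the enclosing quotes are stripped during parsing, and the remaining text is cast. A rough Python sketch of the same two steps on the sample row (illustrative only; it assumes the ,00000000-style values use a comma as the decimal separator):

```python
import csv
import io

# First few fields of the sample row from the question.
line = '"3922000","14733370","1000,00000000"\n'

# The csv reader strips the enclosing quotes, just as
# FIELD_OPTIONALLY_ENCLOSED_BY = '"' does, so the cast sees 3922000
# rather than the literal text "3922000".
row = next(csv.reader(io.StringIO(line)))
id_val = int(row[0])                      # $1::INTEGER in the subquery
amount = float(row[2].replace(',', '.'))  # comma decimals need a swap first
print(id_val, amount)  # 3922000 1000.0
```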

How do I unload a CSV file where only non-null values are wrapped in quotes, quotes are optionally enclosed, and null values are not quoted?

(Submitting on behalf of a Snowflake User)
For example - ""NiceOne"" LLC","Robert","GoodRX",,"Maxift","Brian","P,N and B","Jane"
I have been able to create a file format that satisfies each of these conditions, but not one that satisfies all three.
I've used the following recommendation:
Your first column is malformed, missing the initial ", it should be:
"""NiceOne"" LLC"
After fixing that, you should be able to load your data with almost
default settings,
COPY INTO my_table FROM @my_stage/my_file.csv FILE_FORMAT = (TYPE =
CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');
...but the above format returns:
"""NiceOne"" LLC","Robert","GoodRX","","Maxift","Brian","P,N and B","Jane"
I don't want quotes around empty fields. I'm looking for
"""NiceOne"" LLC","Robert","GoodRX",,"Maxift","Brian","P,N and B","Jane"
Any recommendations?
If you use the following you will not get quotes around NULL fields, but you will get quotes on '' (empty text). You can always concatenate the fields and format the resulting line manually if this doesn't suit you.
COPY INTO @my_stage/my_file.CSV
FROM (
SELECT
'"NiceOne" LLC' A, 'Robert' B, 'GoodRX' C, NULL D,
'Maxift' E, 'Brian' F, 'P,N and B' G, 'Jane' H
)
FILE_FORMAT = (
TYPE = CSV
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ()
COMPRESSION = NONE
)
OVERWRITE = TRUE
SINGLE = TRUE
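The "format the resulting line manually" fallback mentioned above can be sketched in a few lines (illustrative Python, not Snowflake; the fmt helper is hypothetical): enclose every non-NULL field in quotes with literal quotes doubled, and emit NULLs completely bare:

```python
def fmt(field):
    # NULL (None) stays completely bare; everything else is enclosed in
    # quotes, with literal quotes doubled (RFC 4180 style).
    if field is None:
        return ''
    return '"' + field.replace('"', '""') + '"'

row = ['"NiceOne" LLC', 'Robert', 'GoodRX', None,
       'Maxift', 'Brian', 'P,N and B', 'Jane']
line = ','.join(fmt(f) for f in row)
print(line)
# """NiceOne"" LLC","Robert","GoodRX",,"Maxift","Brian","P,N and B","Jane"
```

The printed line matches the desired output from the question: empty (NULL) fields carry no quotes at all.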

snowflake - How to use a file format to decode a csv column?

I've got some data in a string column that is in a strange csv format. I can write a file format that correctly interprets it. How do I use my file format against data that has already been imported?
create table test_table
(
my_csv_column string
)
How do I split/flatten this column with:
create or replace file format my_csv_file_format
type = 'CSV'
RECORD_DELIMITER = '0x0A'
field_delimiter = ' '
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
VALIDATE_UTF8 = FALSE
Please assume that I cannot use split, as I want to use the rich functionality of the file format (optional escape characters, date recognition etc.).
What I'm trying to achieve is something like the below (but I cannot find how to do it)
copy into destination_Table
from
(select
s.$1
,s.$2
,s.$3
,s.$4
from test_table s
file_format = (column_name ='my_csv_column' , format_name = 'my_csv_file_format'))
