Snowflake file format - ' ' is not recognized with Copy Into command

I have a txt file; the screenshot below shows some of the data in it.
I'm running COPY INTO against a new, empty table, but I'm getting this error:
Timestamp ' ' is not recognized File 'my_file' , line 2, character 381
Row 1, column "CLICKS"["TS":7]
In the file format below I've set skip_header = 1 because the file contains a header row, and field_delimiter = '"' because each value appears to be separated by " ".
This is my file format:
CREATE FILE FORMAT IF NOT EXISTS my_file_format
field_delimiter = '"'
date_format = 'YYYY-MM-DD'
error_on_column_count_mismatch = True
skip_header = 1
escape = NONE
NULL_IF=('NULL','',' ','NULL','NULL','//N', '\\N', ' ') ;
I've also created the table:
create table my_table(
PAGE_URL VARCHAR,
NORMALIZED_PAGE_URL VARCHAR,
TARGET_URL VARCHAR,
NORMALIZED_TARGET_URL VARCHAR,
CLICK_ID VARCHAR,
IMPRESSION_ID VARCHAR,
TS TIMESTAMP,
PUBLISHER_DOMAIN_ID INT,
etc...
);

This problem originates in whichever system created the text file. It can't easily be solved in Snowflake.

Related

Converting a table in one form to another using Snowflake

I am trying to load a CSV file into Snowflake. The input CSV in the S3 location has 2 columns (ID, Location_count):
Input csv table
I need to transform it into the format below, with 3 columns (ID, Location, Count):
Output csv table
However, when I try to load the input file with the following queries, after creating the database, external stage, and file format, it returns LOAD_FAILED:
create or replace table table_name
(
id integer,
Location_count variant
);
select parse_json(Location_count) as c;
list @stage_name;
copy into table_name from @stage_name file_format = 'fileformatname' on_error = 'continue';
You will probably need to apply parse_json to that 2nd column as part of a copy transformation. For example:
create file format myformat
type = csv field_delimiter = ','
FIELD_OPTIONALLY_ENCLOSED_BY = '"';
create or replace stage csv_stage file_format = (format_name = myformat);
copy into @csv_stage from
( select '1',
'{"SHS-TRN":654738,"PRN-UTN":78956,"NCT-JHN":96767}') ;
create or replace table blah (id integer, something variant);
copy into blah from (select $1, parse_json($2) from @csv_stage);
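Once the JSON text is stored as a VARIANT, the three-column shape asked for in the question (ID, Location, Count) can probably be produced with FLATTEN. This is a minimal sketch, assuming the table blah and the column something from the example above; the aliases location and cnt are illustrative names, not part of the original post.

```sql
-- Sketch: turn each key/value pair of the VARIANT into its own row.
-- Assumes blah(id integer, something variant) was loaded as shown above.
select b.id,
       f.key            as location,  -- e.g. SHS-TRN
       f.value::integer as cnt        -- e.g. 654738
from blah b,
     lateral flatten(input => b.something) f;
```

For the sample row this should give one output row per JSON key. If the result needs to be persisted, the same select can feed a create table ... as select statement.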

parse csv file with string fields that contain double quotes and/or commas using Snowflake COPY INTO

MY QUESTION:
How do I construct my copy into statement so that my file properly parses and loads? Thanks!
THE PROBLEM:
I have a csv file that I need to parse and copy to a table from a named stage in Snowflake.
The file looks similar to below:
ID, Name, Job Title,Company Name, Email Address, Phone Number
5244, Ted Jones, Manager, Quality Comms, tj@email.com,555-630-1277
5246, Talim Jones,""P-Boss"" of the world, Quality Comms, taj@email.com,555-630-127
5247, Jordy Jax,,"M & G Services.",jj@services.com, 616-268-1546
MY CODE:
COPY INTO DB.SCHEMA.TABLE_NAME
(
ID,
FULL_NAME,
JOB_TITLE,
EMAIL_ADDRESS
)
FROM
(
SELECT $1::NUMBER AS ID,
$2 AS FULL_NAME,
$3 AS JOB_TITLE,
$5 AS EMAIL_ADDRESS
FROM @STAGE_NAME)
--SNOWFLAKE DOES NOT SUPPORT UTF 16 OR 32 SO HAVING REPLACE INVALID UTF 8 CHARACTERS
FILE_FORMAT = (TYPE = 'CSV', RECORD_DELIMITER = '\n', FIELD_DELIMITER = ',', SKIP_HEADER = 1,FIELD_OPTIONALLY_ENCLOSED_BY = '"',TRIM_SPACE = TRUE,REPLACE_INVALID_CHARACTERS = TRUE)
ON_ERROR = CONTINUE
--COPY A FILE INTO A TABLE EVEN IF IT HAS ALREADY BEEN LOADED INTO THE TABLE
FORCE = TRUE
MY ERROR MESSAGE:
Found character 'P' instead of field delimiter ','
WHAT I HAVE TRIED:
I have tried many things, most notably:
I have tried to escape the double quotes in my select statement for the Job Title.
I have tried removing the FIELD_OPTIONALLY_ENCLOSED_BY = '"' parameter and just using ESCAPE = '"' with no luck.
Try removing the option FIELD_OPTIONALLY_ENCLOSED_BY = '"' and also include a replace function in your inner query.
Example:
SELECT
$1::NUMBER AS ID,
$2 AS FULL_NAME,
replace($3,'"','') AS JOB_TITLE,
$5 AS EMAIL_ADDRESS
FROM @STAGE_NAME
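Put together, the suggestion might look like the statement below. It is a sketch that reuses the object names and options from the question, with FIELD_OPTIONALLY_ENCLOSED_BY dropped and the quotes stripped from the job title. One caveat worth checking against the real file: without the enclosure option, a quoted field that contains a comma would be split into two fields, and quotes around other columns are loaded literally.

```sql
-- Sketch: the question's COPY with the enclosure option removed and
-- double quotes stripped from the job title, as the answer suggests.
COPY INTO DB.SCHEMA.TABLE_NAME (ID, FULL_NAME, JOB_TITLE, EMAIL_ADDRESS)
FROM (
    SELECT $1::NUMBER,
           $2,
           REPLACE($3, '"', ''),   -- ""P-Boss"" of the world -> P-Boss of the world
           $5
    FROM @STAGE_NAME
)
FILE_FORMAT = (
    TYPE = 'CSV'
    RECORD_DELIMITER = '\n'
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1
    TRIM_SPACE = TRUE
    REPLACE_INVALID_CHARACTERS = TRUE
)
ON_ERROR = CONTINUE
FORCE = TRUE;
```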

S3 to Snowflake (loading CSV data from S3 into a Snowflake table throws the following error)

I am trying to load .csv file data into a Snowflake table using the following command:
COPY INTO MYTABLE
FROM @S3PATH PATTERN='.*TEST.csv'
FILE_FORMAT = (type = csv skip_header = 1) ON_ERROR = CONTINUE PURGE=TRUE FORCE=TRUE;
I am seeing the following behavior:
1) If even one column of the table is numeric, the load throws the error
Numeric value '""' is not recognized
2) If I change every column's data type to varchar, the data loads, but every value
is populated with its double quotes (instead of 15, I get "15")
Thanks in advance for your response!
You're likely missing FIELD_OPTIONALLY_ENCLOSED_BY = '\042' in your file_format. Add that in and try.
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#type-csv
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
Thanks CodeMonkey!
One issue is solved.
Current scenario:
One column is defined as NUMBER in the Snowflake table, and only the rows where the CSV file has a value for that column were loaded. In other words, rows where the numeric column is null (or blank) in the CSV file are not loaded.
I also tried using
EMPTY_FIELD_AS_NULL = TRUE
but got the same result as above.
"first_error" message: Numeric value '' is not recognized
Here is what I did, and it is working:
FILE_FORMAT = (type = csv field_delimiter = ',' skip_header = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '\042' EMPTY_FIELD_AS_NULL = TRUE NULL_IF = ('NULL','null','')) ON_ERROR = CONTINUE PURGE=TRUE FORCE=TRUE;
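The same options can also be kept in a named file format so they are not repeated in every COPY. The sketch below assumes a hypothetical format name, my_csv_format; the option values are the ones from the working statement above. The likely reason NULL_IF needed the empty string is that a quoted empty value ("") is read as an empty string rather than as an empty field, so EMPTY_FIELD_AS_NULL alone may not convert it; treat that explanation as an assumption to verify against the Snowflake documentation.

```sql
-- Sketch: my_csv_format is a hypothetical name, not from the original post.
CREATE OR REPLACE FILE FORMAT my_csv_format
    TYPE = CSV
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '\042'   -- octal code for the double quote
    EMPTY_FIELD_AS_NULL = TRUE
    NULL_IF = ('NULL', 'null', '');         -- also map empty strings to SQL NULL

COPY INTO MYTABLE
FROM @S3PATH
PATTERN = '.*TEST.csv'
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
ON_ERROR = CONTINUE;
```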

My CSV file with double quotes enclosed fields - numeric value ' "12131" ' not recognized

I staged a CSV file in which every field is enclosed in double quotes (" "), fields are comma separated, and rows are separated by a newline character. The values inside the enclosed fields can also contain newline characters (\n).
I am using the default FILE FORMAT = CSV. When using COPY INTO, I was seeing a column mismatch error in this case.
I solved this first error by specifying the FIELD_OPTIONALLY_ENCLOSED_BY attribute in the file format, as in the SQL below.
However, when I try to import NUMBER values from the CSV file, it does not work even though I already use FIELD_OPTIONALLY_ENCLOSED_BY='"'. I get the error "Numeric value '"3922000"' is not recognized".
A sample of my .csv file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23
14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000","","tcllVeEFPD"
My COPY INTO statement is below:
COPY INTO '..'
FROM '...'
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT";
I have a feeling that NUMBER is being interpreted as STRING.
Does anyone have a solution for this?
Try using a subquery in the FROM clause of the COPY command, listing each column and casting the appropriate ones.
Ex.
COPY INTO '...'
FROM (
SELECT $1::INTEGER,
$2::FLOAT,
...
FROM '...'
)
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT";
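One more thing may matter for this particular file: values such as "1000,00000000" in the sample appear to use a comma as the decimal separator, and a plain cast to FLOAT is unlikely to accept that. The sketch below swaps the comma for a period before casting. It rests on several assumptions: @my_stage, my_table, and the columns id and amount are placeholder names, the comma really is a decimal separator, and positions $1 and $13 are taken from the sample row in the question.

```sql
-- Sketch only: @my_stage, my_table, id and amount are placeholder names.
-- Positions $1 and $13 follow the sample row in the question.
COPY INTO my_table (id, amount)
FROM (
    SELECT $1::INTEGER,
           REPLACE($13, ',', '.')::FLOAT   -- '1000,00000000' becomes '1000.00000000' before the cast
    FROM @my_stage
)
FILE_FORMAT = (TYPE = CSV
               FIELD_DELIMITER = ','
               SKIP_HEADER = 1
               ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
               FIELD_OPTIONALLY_ENCLOSED_BY = '"')
ON_ERROR = 'ABORT_STATEMENT';
```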

snowflake - How to use a file format to decode a csv column?

I've got some data in a string column that is in a strange csv format. I can write a file format that correctly interprets it. How do I use my file format against data that has already been imported?
create table test_table
(
my_csv_column string
)
How do I split/flatten this column with:
create or replace file format my_csv_file_format
type = 'CSV'
RECORD_DELIMITER = '0x0A'
field_delimiter = ' '
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
VALIDATE_UTF8 = FALSE
Please assume that I cannot use split, as I want to use the rich functionality of the file format (optional escape characters, date recognition etc.).
What I'm trying to achieve is something like the below (but I cannot find how to do it)
copy into destination_Table
from
(select
s.$1
,s.$2
,s.$3
,s.$4
from test_table s
file_format = (column_name ='my_csv_column' , format_name = 'my_csv_file_format'))
