I am using the COPY command to populate tables in Snowflake from file data. The file format is CSV and the record delimiter is \n.
This works well for the majority of data, except for a few rows where some columns contain newline characters embedded in the data.
I have tried to set my record delimiter to something like $\n but it didn't seem to work.
Is it possible to load data with embedded newlines with the CSV file format?
If your data may contain newline characters, the text fields should be enclosed in a quoting character, for example double quotes or single quotes, when generating the file.
When importing the data with COPY INTO, that same character must then be specified in the FIELD_OPTIONALLY_ENCLOSED_BY option.
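For example, a minimal sketch (the format, stage, and table names here are placeholders, not from the question):
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  RECORD_DELIMITER = '\n'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';
-- A quoted field containing an embedded \n is then read as a single value
COPY INTO my_table
  FROM @my_stage/data.csv
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');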
Reference: COPY INTO with TYPE = CSV
Related
I unloaded a table from Redshift to S3. The table is 212 columns wide. Some fields in some rows contain Arabic text.
Here's the Redshift UNLOAD command I used:
unload ('select * from dataw.testing')
to 's3://uarchive-live/rpt_all/rpt_all.txt'
iam_role 'arn:aws:iam::12345678988:role/service-role'
GZIP
DELIMITER '\t'
null as ''
;
When I attempt to COPY this file into Snowflake an error occurs.
End of record reached while expected to parse column '"RPT_ALL"["AUTO_TRAF_RETR_CNT":211]' File 'rpt_all_250/rpt_all.txt0000_part_113.gz', line 9684, character 1187 Row 9684, column "RPT_ALL"["AUTO_TRAF_RETR_CNT":211]
The field referenced in the error is not the last field in the record; there are two more after it.
I removed the Arabic text from the fields and left them blank, then attempted the COPY again, and this time it copied with no errors.
Here's the Snowflake File Format I'm using:
CREATE FILE FORMAT IF NOT EXISTS "DEV"."PUBLIC"."ff_noheader"
TYPE = 'CSV'
RECORD_DELIMITER = '\n'
FIELD_DELIMITER = '\t'
SKIP_HEADER = 0
COMPRESSION = 'GZIP'
TIMESTAMP_FORMAT = 'AUTO'
TRIM_SPACE = TRUE
REPLACE_INVALID_CHARACTERS = TRUE;
Here's the Snowflake Copy command I'm using:
COPY INTO "DEV"."PUBLIC"."RPT_ALL" FROM #"stg_All"/snowflk_test.csv FILE_FORMAT="DEV"."PUBLIC"."ff_noheader";
What do I need to configure in Snowflake to accept this Arabic text so that the end of record is not corrupted?
Thanks
I'm not a Snowflake expert, but I have used it and I have debugged a lot of issues like this.
My initial thought as to why you are getting an unexpected EOR, which is \n, is that your data contains \n. If your data has \n then this will look like an EOR when the data is read. I don't believe there is a way to change the EOR in the Redshift UNLOAD command, so you need the ESCAPE option in the Redshift UNLOAD command to add a backslash before characters like \n. You will also need to tell Snowflake what the escape character is - ESCAPE = '\' (I think you need a double backslash in this statement). [There's a chance you may need to quote your fields as well, but you will know that when you hit any issues hidden by this one.]
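A rough sketch of that combination, reusing the commands from the question (the ESCAPE additions are my reading of the suggestion above, so treat them as unverified):
-- Redshift side: ESCAPE backslash-escapes embedded \n and other special characters in the unloaded file
unload ('select * from dataw.testing')
to 's3://uarchive-live/rpt_all/rpt_all.txt'
iam_role 'arn:aws:iam::12345678988:role/service-role'
GZIP
DELIMITER '\t'
ESCAPE
null as ''
;
-- Snowflake side: recreate the file format with the escape character
CREATE OR REPLACE FILE FORMAT "DEV"."PUBLIC"."ff_noheader"
TYPE = 'CSV'
RECORD_DELIMITER = '\n'
FIELD_DELIMITER = '\t'
ESCAPE = '\\'
SKIP_HEADER = 0
COMPRESSION = 'GZIP'
TIMESTAMP_FORMAT = 'AUTO'
TRIM_SPACE = TRUE
REPLACE_INVALID_CHARACTERS = TRUE;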
The other way would be to use a different unload format that doesn't suffer from overloaded character meaning.
There's a chance that the issue is in character encodings related to your Arabic text but I expect not since both Redshift and Snowflake are UTF-8 based systems. Possible but not likely.
I want to load a CSV file into a database.
The CSV content looks like:
"AAAAA","DDDDD","ooooo"\r\n
"AAAAA","DDDDD","contennt exemple"\r\n
"AAAAA","DDDDD","file C:\hghjghj\gfhfhg\ssss\"\r\n
"AAAAA","DDDDD","mistake in Word"\r\n
I insert the content with the LOAD DATA LOCAL INFILE instruction, but the last line is not included because the previous field ends with a backslash right before its closing quote ( \" ). I don't know how I can change my code. Could you help me, please?
My Code :
LOAD DATA LOCAL INFILE 'Import/file.TXT' INTO TABLE `cree_re_import`
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n'
Thanks.
Ristof
A SHOW WARNINGS after executing LOAD DATA LOCAL INFILE should give you more information in case the server couldn't process the content of the file correctly.
The CSV content which you posted has an incorrect format:
Backslash is the escape character within strings in SQL statements. To specify a literal backslash, you must specify two backslashes for the value to be interpreted as a single backslash:
So the 3rd column in the 3rd row should be:
"file C:\\hghjghj\\gfhfhg\\ssss\\"
I am trying to load data from an Excel .csv file into a flat file format to use as a data source in a Data Services job data flow, which then transfers the data to a SQL Server (2012) database table.
I consistently lose 1 in 6 records.
I have tried various parameter values in the file format definition and settled on setting Adaptable file scheme to "Yes", file type "delimited", column delimiter "comma", row delimiter {windows new line}, Text delimiter ", language eng(English), and all else as defaults.
I have also set "write errors to file" to "yes" but it just creates an empty error file (I expected the 6,000-odd unloaded rows to be in there).
If we strip out three of the columns containing special characters (visible in Excel), it loads a treat, so I think these characters are the problem.
The thing is, we need the data in those columns. Unfortunately, this .csv file is as good a data source as we are likely to get, and it is always likely to contain special characters in these three columns, so we need to be able to read it in if possible.
Should I try to specifically strip the columns in the Query source component of the dataflow? Am I missing a data-cleansing trick in the query or file format definition?
OK, so I didn't get the answer I was looking for, but I did get it to work by setting the "Row within Text String" parameter to "Row delimiter".
I am getting the below error while trying to do a bcp load from a flat delimited file into a Sybase IQ table.
Could not execute statement.
Non-space text found after ending quote character for an enclosed field.
I couldn't observe any non-space text in the file, but this error is stopping me from doing the bulk copy. | is the column delimiter, " is the text qualifier, and \n is the row delimiter.
Below is the sample template I am using.
LOAD TABLE TABLE_NAME(a NULL('(null)'),b NULL('(null)'),c NULL('(null)'))
USING CLIENT FILE '/home/...../a.txt' //unix
QUOTES ON
FORMAT bcp
STRIP RTRIM
DELIMITED BY '|'
ROW DELIMITED BY '\n'
When I perform the same query with QUOTES OFF, the load is successful. But the same query fails with QUOTES ON. I would like to get the quotes stripped off as well.
Sample Data
12345|"abcde"|(null)
12346|"abcdf"|"zxf"
12347|(null)|(null)
12348|"abcdg"|"zyf"
Any leads would be helpful!
If IQ bcp is the same as ASE, then I think those '(null)' fields are being interpreted as strings, not fields that are NULL.
You'd need to stream edit out those (null).
You're on unix so use sed or perl -ne.
E.g. pipe the file through | perl -pne 's/\(null\)//g' on its way to the loading command, or run it over the file first (the parentheses need escaping in the regex, otherwise only the word null is removed and the parentheses are left behind).
QUOTES OFF might seem to work, but I wonder whether, when you look at your loaded data, you'll see double quotes inside the 2nd field, and '(null)' where you expect a field to be NULL.
I have data in the csv file similar to this:
Name,Age,Location,Score
"Bob, B",34,Boston,0
"Mike, M",76,Miami,678
"Rachel, R",17,Richmond,"1,234"
While trying to BULK INSERT this data into a SQL Server table, I encountered two problems.
If I use FIELDTERMINATOR=',' then it splits the first (and sometimes the last) column.
The last column is an integer column, but it has quotes and a comma thousands separator whenever the number is greater than 1,000.
Is there a way to import this data (using XML Format File or whatever) without manually parsing the csv file first?
I appreciate any help. Thanks.
You can parse the file with http://filehelpers.sourceforge.net/
And with that result, use the approach from SQL Bulkcopy YYYYMMDD problem, or go straight into SqlBulkCopy.
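For completeness, if you happen to be on SQL Server 2017 or later, BULK INSERT itself can parse quoted CSV. This is a hedged sketch, not part of the answer above; the table and path names are placeholders, and the Score column is staged as text because "1,234" is not a valid integer literal:
CREATE TABLE dbo.ScoresStaging (
    Name varchar(100),
    Age int,
    Location varchar(100),
    Score varchar(20)   -- keep as text so "1,234" loads; convert afterwards
);
BULK INSERT dbo.ScoresStaging
FROM 'C:\data\scores.csv'
WITH (FORMAT = 'CSV', FIELDQUOTE = '"', FIRSTROW = 2);  -- FIRSTROW = 2 skips the header row
-- Convert the thousands-separated text when moving rows onward:
-- SELECT Name, Age, Location, CAST(REPLACE(Score, ',', '') AS int) AS Score FROM dbo.ScoresStaging;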
Use MySQL load data:
LOAD DATA LOCAL INFILE 'path-to-/filename.csv' INTO TABLE `sql_tablename`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
IGNORE 1 LINES;
The OPTIONALLY ENCLOSED BY '\"' part (an escape character plus the quote) is what keeps quoted data like "Bob, B" together in the first field.
IGNORE 1 LINES leaves out the header row with the field names.
The CHARACTER SET 'utf8' line is optional but good to use if names have diacritics, like in José.