Azure Data Factory Managed Instance -> Snowflake text with escape characters

I've got a Copy Data Activity in Data Factory which takes a table from a SQL Server Managed Instance and puts it into a Snowflake Instance.
The activity uses a temporary staging BLOB account.
When debugging the pipeline it fails, and the error comes up as "Found character 't' instead of record delimiter '\r\n'".
It looks like it's caused by escape characters, but there are no options available to deal with escape characters on a temporary stage.
I think I could fix this by having two activities, one moving the Managed Instance table to BLOB and one moving BLOB to Snowflake, but I would prefer to handle it with just the one if possible.
I have tried adding the following to the user properties:
{
"name": "escapeQuoteEscaping",
"value": "true"
}
Is there anything else I could add in here?
Thanks,
Dan

It's the file format where you specify the details of the file being ingested, not the stage.
There are many options including the specification of delimiters and special characters within the data. The message
Found character 't' instead of record delimiter
suggests that you may have a tab-delimited file, so you could set \t as the delimiter in the file format.
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html
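For illustration, a minimal file format sketch along those lines (the format name my_tsv_format and the header setting are assumptions, not details from the question):
CREATE OR REPLACE FILE FORMAT my_tsv_format
  TYPE = CSV
  FIELD_DELIMITER = '\t'      -- treat tab as the column separator
  RECORD_DELIMITER = '\r\n'   -- the record delimiter named in the error message
  SKIP_HEADER = 1;            -- assumption: the staged files carry a header row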

Related

Changing .csv delimiter on ADF

I am trying to load a .csv table to MS SQL Server via Azure Data Factory, but I have a problem with the delimiter (;) since it appears as a character in some of the values included in some columns.
As a result, I get an error whose details say "found more columns than expected column count".
Is there any way to change the delimiter directly on ADF before/while loading the .csv table (ex.: making it from ";" to "|||")?
Thanks in advance!
I have a problem with the delimiter (;) since it appears as a character in some of the values included in some columns.
As you have noted, your delimiter is ; but it also occurs as a character in some of the column values, which means there is no specific pattern to its occurrence. Hence, it is not possible to handle this directly in ADF.
The recommendation is to write a program in any preferred language (such as Python) that iterates over each row of the dataset and applies logic to replace the delimiter with ||| (or to remove the unrequired ; ), writing the changes to a new file; a sketch of this is shown below. You can then ingest this new file in ADF.
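A minimal Python sketch of that approach, assuming the source file is input.csv and the cleaned copy goes to output.csv (both names, and the per-row rule, are placeholders to adapt to your data):
# Placeholder rule: split on ";" and re-join with "|||". Because the stray
# semicolons follow no fixed pattern, extend fix_row with whatever checks
# identify them in your data (expected field counts, quoting, etc.).
def fix_row(line):
    parts = line.rstrip("\n").split(";")
    return "|||".join(parts)

with open("input.csv", "r", encoding="utf-8") as src, \
        open("output.csv", "w", encoding="utf-8") as dst:
    for line in src:                       # iterate over each row of the dataset
        dst.write(fix_row(line) + "\n")    # write the changed row to the new file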

Getting an error due to '\' (escape char) at the end of column values when loading data into Snowflake from SQL Server using the COPY activity in ADF

I am using the ADF Copy activity to load data from SQL Server to Snowflake and am getting an error during the load into Snowflake:
ERROR:
"first error Found character '1' instead of field delimiter ','"
MY DATA:
c1 c2
rajesh\ 1
The character "\" is treated as an escape character while loading data into Snowflake.
On the sink side, I have added the below 2 file format options and tried different values, but it didn't work out.
ESCAPE_UNENCLOSED_FIELD
ESCAPE
Please share any suggestions on how to handle this; I appreciate your support.
Thanks.
If you're using the GUI builder for the file format, scroll to the bottom section. In the drop-down boxes, you want to set the "Escape Unenclosed Field" to None. If there are still problems you can also try setting the "Escape Character" to None.
If you're using SQL to create or replace the file format, you want a section like this: ESCAPE = 'NONE' ESCAPE_UNENCLOSED_FIELD = 'NONE'
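Put together, a minimal sketch of such a file format (the name my_csv_format and the comma delimiter are assumptions based on the error message):
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','               -- matches the delimiter in the error message
  ESCAPE = 'NONE'                     -- no escape character for enclosed fields
  ESCAPE_UNENCLOSED_FIELD = 'NONE';   -- stop '\' acting as an escape in unenclosed fields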

SAP Data Services .csv data file load from Excel with special characters

I am trying to load data from an Excel .csv file into a flat file format to use as a data source in a Data Services job data flow, which then transfers the data to a SQL Server (2012) database table.
I consistently lose 1 in 6 records.
I have tried various parameter values in the file format definition and settled on setting Adaptable file scheme to "Yes", file type "delimited", column delimiter "comma", row delimiter {windows new line}, text delimiter ", language eng (English), and all else as defaults.
I have also set "write errors to file" to "yes", but it just creates an empty error file (I expected the 6,000-odd unloaded rows to be in there).
If we strip out the three columns containing special characters (visible in Excel), it loads a treat, so I think these characters are the problem.
The thing is, we need the data in those columns. Unfortunately, this .csv file is as good a data source as we are likely to get, and it is always likely to contain special characters in these three columns, so we need to be able to read it in if possible.
Should I try to specifically strip the columns in the Query source component of the dataflow? Am I missing a data-cleansing trick in the query or file format definition?
OK, so I didn't get the answer I was looking for, but I did get it to work by setting the "Row within Text String" parameter to "Row delimiter".

Why won't Redshift accept my fixed-width text file

I am reading a varchar(500) column from a SQL Server 2008 R2 database to import into Redshift via a fixedwidth text file.
To pull down the record into a fixed width file, I started out by using a StringBuilder to write out a block of text at a time. I was using AppendFormat and the alignment specifier to align the different records. At certain points, once every 400k lines, I would write the contents of StringBuilder into a StreamWriter to write to disk.
I noticed that there was an issue with the text when I tried loading the files into Redshift: the upload failed due to extra columns (there were more columns than my fixed-width specification accommodated).
When I tested the StringBuilder against a regular string, the widths matched what I intended them to match, 500 characters.
The discrepancy came when I tried writing my records to disk. I kept getting the same issue when I wrote the aforementioned database column to disk using the StreamWriter object's WriteLine with a format string.
The collation on the database is SQL_Latin1_General_CP1_CI_AS. I understand that strings from the database get converted from the database collation to UTF-16. I think there is no problem there, as shown by the test I performed above. I think the issue I'm having comes from taking the strings in UTF-16 form and writing them to disk using StreamWriter.
I can expect any type of character from the database field, except for a newline or carriage return. I'm pretty confident that white space is trimmed before being pushed into the database column using a combination of the T-SQL functions LTRIM and RTRIM.
Edit: The following is the code I use in PowerShell:
$dw = new-object System.Data.SqlClient.SqlConnection("<connection string details>")
$dw.Open()
$reader = (new-object System.Data.SqlClient.SqlCommand("select email from emails", $dw)).ExecuteReader()
# StreamWriter takes (path, append, encoding); there is no (path, encoding) overload
$writer = new-object System.IO.StreamWriter("C:\Emails.txt", $false, [System.Text.Encoding]::UTF8)
while ($reader.Read())
{
    # Left-align each value in a 500-character field via WriteLine's format overload
    $writer.WriteLine("{0,-500}", $reader["email"])
}
$writer.Close()
$reader.Close()
Obviously I'm not going to give you the details of my connection string or my table naming convention.
Edit: I'm including the AWS Redshift article that explains that data can only be imported into Redshift using UTF-8 encoding.
http://docs.aws.amazon.com/redshift/latest/dg/t_preparing-input-data.html
Edit: I was able to get a sample of the outputted file through
get-content -encoding utf8
The content inside the file is definitely proper UTF-8, line endings included. It seems like my main issue is with Redshift accepting multi-byte characters in fixed-width files.
I suspect that the issue is caused by the fact that StreamWriter by default uses UTF-8, so in some instances you will get double-byte characters, as UTF-8 is variable-width.
Try using Unicode (UTF-16), which will match your database encoding; StreamWriter has a constructor overload that accepts an encoding.
Just so that anyone seeing this understands: my problem is really with Redshift. One thing I have noticed is that the service seems to have processing issues with fixed-width files. This seems to be specific to Amazon, since the underlying system that runs Redshift is ParAccel. I have had issues with fixed-width files in the past, and I have been able to confirm that there is an issue with Redshift accepting multi-byte characters in the fixed-width form of the S3 COPY command.

Talend: Write data to PostgreSQL database error

I am trying to write data from a .csv file to my PostgreSQL database. The connection is fine, but when I run my job I get the following error:
Exception in component tPostgresqlOutput_1
org.postgresql.util.PSQLException: ERROR: zero-length delimited identifier at or near """"
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:336)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:328)
at talend_test.exporttoexcel_0_1.exportToExcel.tFileInputDelimited_1Process(exportToExcel.java:568)
at talend_test.exporttoexcel_0_1.exportToExcel.runJobInTOS(exportToExcel.java:1015)
at talend_test.exporttoexcel_0_1.exportToExcel.main(exportToExcel.java:886)
My job is very simple:
tFileInputDelimiter -> PostgreSQL_Output
I think that the error means that the double quotes should be single quotes ("" -> ''), but how can I edit this in Talend?
Or is it another reason?
Can anyone help me on this one?
Thanks!
If you are using the customer.csv file from the repository, then you have to change the properties of the customer file by clicking through Metadata -> File delimited -> customer in the Repository pane.
You should be able to right-click the customer file and then choose Edit file delimited. On the third screen, if the file extension is .csv, then under Escape char settings you have to select CSV options. Typical escape sequences (as used by Excel and other programs) have the escape char as "\"" and the text enclosure also as "\"".
You should also check that the encoding is set to UTF-8 in the file settings. You can then refresh your preview to view a sample of your file in table format. If this matches your expectations of the data, you should be able to save the metadata entry and apply the update to your jobs.
If your file is not in the repository, then click on the component that reads your file and do all of the above CSV configuration steps in the component's basic settings.
