Snowflake schema name starting with a number: how to use it in Azure Data Factory to copy data from Synapse to Snowflake

I am trying to copy data from Synapse and load it into Snowflake. For this I am using Azure Data Factory together with a control table that holds the source and target field names.
My problem is that the Snowflake schema name starts with a number, for example 9289RESIST.Tablename,
and the copy fails in ADF because the schema name starts with a number.
How should I supply the sink table's schema name in the Copy activity?
I tried adding double quotes around the schema name ("9289RESIST"), but that returned errors as well.

I created a schema named 923_rlschema in Snowflake and tried to call it dynamically from ADF, wrapping the schema name in double quotes, and got the same error:
"Message": "ErrorCode=UserErrorOdbcOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ERROR [42000] SQL compilation error:\nsyntax error line 1 at position 26 unexpected '923'.\nsyntax error line 1 at position 26 unexpected '923'.\nsyntax error line 1 at position 38 unexpected '""'.,
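For reference, the quoting itself is accepted when the statement is run directly in Snowflake: an identifier that starts with a digit must be double-quoted and then becomes case-sensitive. A minimal example (the table name is a placeholder):
CREATE SCHEMA "9289RESIST";
SELECT * FROM "9289RESIST"."MY_TABLE";   -- "MY_TABLE" is a placeholder table name
So the error above suggests the quotes are not reaching Snowflake intact, rather than the name itself being invalid.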
Then I removed the double quotes; the control table looks as in the image below.
This control table is used as the dataset in a Lookup activity.
A ForEach activity is added, and the Lookup activity's array output is given as its Items:
@activity('Lookup1').output.value
Inside the ForEach activity, a Copy activity is added and the source dataset is configured.
In the sink dataset, the schema name and table name are given as dynamic content, as sketched below.
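A minimal sketch of that dynamic content, assuming the control table exposes columns named schemaname and tablename (substitute whatever your control table columns are actually called):
Schema name: @item().schemaname
Table name: @item().tablename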
When the pipeline is run, it executes successfully.

Related

Azure Data Factory: Lookup varbinary column in SQL DB for use in a Script activity to write to another SQL DB - ByteArray is not supported

I'm trying to insert into an on-premises SQL database table called PictureBinary:
PictureBinary table
The source of the binary data is a table in another on-premises SQL database called DocumentBinary:
DocumentBinary table
I have a file with all of the Id's of the DocumentBinary rows that need copying. I feed those into a ForEach activity from a Lookup activity. Each of these files has about 180 rows (there are 50 files fed into a new instance of the pipeline in parallel).
Lookup and ForEach Activities
So far everything is working. But then, inside the ForEach I have another Lookup activity that tries to get the binary info to pass into a script that will insert it into the other database.
Lookup Binary column
And then the Script activity would insert the binary data into the table PictureBinary (in the other database).
Script to Insert Binary data
But when I debug the pipeline, I get this error when the binary column Lookup is reached:
ErrorCode=DataTypeNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column: coBinaryData,The data type ByteArray is not supported from the column named coBinaryData.,Source=,'
I know that the accepted way of storing the files would be to store them on the filesystem and just store the file path to the files in the database. But we are using a NOP database that stores the files in varbinary columns.
Also, if there is a better way of doing this, please let me know.
I tried to reproduce your scenario in my environment and got a similar error.
As per the Microsoft documentation, columns with the Byte Array data type are not supported in the Lookup activity, which is most likely the cause of the error.
To work around this, follow the steps below.
As you explained, you have a file that stores the Ids of the DocumentBinary rows that need to be copied to the destination. You can simply use a Copy activity with a query that copies the records whose DocumentBinary Id equals an Id stored in the file.
First, I added a Lookup activity to get the Ids of the DocumentBinary rows stored in the file.
Then I added a ForEach activity and passed the output of the Lookup activity to it.
After this, I placed a Copy activity inside the ForEach activity with the following source query:
Select * from DocumentBinary
where coDocumentBinaryId = '@{item().PictureId}'
In the source of the Copy activity, select Query under Use query and paste the query above with your own object names.
Now go to Mapping, click Import schemas, then delete the unwanted columns and map the remaining columns accordingly.
Note: for this to work, the key columns in both tables must be of compatible data types, e.g. both uniqueidentifier or both int.
Sample input in the file:
Output (only the picture Ids contained in the file were copied from source to destination):

ADF copy activity - ignore the new columns in source without throwing an error

I have a pipeline that copies data from the source (Dynamics) to a SQL Server data warehouse. A ForEach activity iterates over the list of all the tables, and an ADF Copy activity copies the data. The copy is incremental, which is achieved by using a SQL query to load the data incrementally.
However, sometimes new columns are added to the source system that do not yet exist in the destination table. Right now my pipeline stops working and throws an error.
Is there a way to skip the newly added source columns in ADF?
You can use the query option in the source and write a query that lists only the required columns in the select list from the source table (see the sketch below).
Or you can edit the mapping in your copy activity and map only the required columns.
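For the query option, a minimal sketch, assuming the source already accepts a SQL query (as your incremental load does); the entity, columns, and watermark parameter below are placeholder names. Because only the listed columns are selected, any column added to the source later is simply ignored:
-- Placeholder names; keep your existing incremental filter
SELECT accountid, name, modifiedon
FROM account
WHERE modifiedon > '@{pipeline().parameters.LastWatermark}'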

Read single text file and based on a particular value of a column load that record into its respective table

I have been searching on the internet for a solution to my problem but cannot seem to find any info. I have a large single text file (10 million rows), and I need to create an SSIS package to load these records into different tables based on the transaction group assigned to each record: Tx_Grp1 records go into the Tx_Grp1 table, Tx_Grp2 records into the Tx_Grp2 table, and so forth. There are 37 different transaction groups in the single delimited text file, and records are inserted into the file in the order they actually occurred (by time). Also, each transaction group has a different number of fields.
Sample data file
date|tx_grp1|field1|field2|field3
date|tx_grp2|field1|field2|field3|field4
date|tx_grp10|field1|field2
.......
Any suggestion on how to proceed would be greatly appreciated.
This task can be solved with SSIS, given a little experience. Here are the main steps and some discussion:
Define a Flat File data source for your file, describing all columns. A possible problem here is that field data types differ depending on the tx_group value. If that is the case, I would declare all fields as sufficiently long strings and convert their types later in the data flow.
Create an OLE DB connection manager for the database you will use to store the results.
Create the main data flow where you will process the file, and add a Flat File Source.
Add a Conditional Split to the output of the Flat File Source, and define as many conditions and outputs as you have transaction groups (for example, a condition comparing the group column to the literal "tx_grp1" for the first output).
For each transaction group output, add a Data Conversion for the fields if necessary. Note that you cannot change the data type of an existing column; if you need to cast a string to an int, create a new column.
Add an OLE DB Destination for each destination table, connect it to the corresponding transaction group output, and map the fields.
Basically, you are done. Test the package thoroughly on a test DB before using it on a production DB.

Inserting data into the newly created tables in SSIS using a table name variable, not one single table

I have been searching for about a week now and I was wondering if anyone may have a clue. I wrote a package to do the following:
Loop through a parent folder and its subfolders for a CSV with a particular naming structure (works)
Create a table for each .csv based on the enumeration of each file (works).
Import the data into SQL Server, each file into its own table named after the file, through an OLE DB Destination (which does not work). It works if everything goes to a single fixed destination, but when I use the table name variable it does not.
What I did was add an Execute SQL Task to the Foreach container to create a table, using a variable for the file path that is mapped as an expression in the Foreach container and used in a CREATE TABLE query under the SqlStatementSource expression property. The tables are created, but when I use the variable mapped in the Foreach Loop as the table name variable in the OLE DB Destination, I get an error asking me to check whether the table exists. The tables are created, but I cannot get the data inserted into their own tables, even when I bypass the "Destination table has not been provided" error and run the package. I set DelayValidation to true and still nothing. From what I have seen so far, SSIS does some cool things; however, I am stuck right now. What else am I doing wrong?
I forgot to mention that the data is going to SQL Server.
Thanks for everything.
You can't create an OLEDB Destination at design time with a variable for a table name. The OLEDB destination needs to know the table name, and the columns, so that it can pre-map the data flow to the table columns.
You have a couple of other options:
You can use BiML to dynamically create your dataflows and destinations.
You can use an ExecuteSQL transformation as your dataflow destination, and write a dynamic SQL statement that inserts each row in the dataflow into the desired table (a sketch of the dynamic-SQL pattern follows).
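A rough sketch of that dynamic-SQL pattern in T-SQL, with placeholder names throughout; the table name would come from the same ForEach variable that drives the CREATE TABLE step, and the column list must match what that step created:
-- All names and values are placeholders for illustration
DECLARE @TableName sysname = N'MyCsvFileName';   -- supplied per file by the ForEach loop
DECLARE @sql nvarchar(max) =
    N'INSERT INTO ' + QUOTENAME(@TableName) + N' (Col1, Col2) VALUES (@p1, @p2)';
EXEC sys.sp_executesql
    @sql,
    N'@p1 nvarchar(100), @p2 nvarchar(100)',
    @p1 = N'value1', @p2 = N'value2';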

SSIS - How do I use a resultset as input in a SQL task and get data types right?

I am trying to merge records from an Oracle database table to my local SQL table.
I have a variable for the package that is an Object, called OWell.
I have a data flow task that gets the Oracle data with a SQL statement (select well_id, well_name from OWell order by Well_ID), and then a conversion task that converts well_id from a DT_STR of length 15 to a DT_WSTR, and well_name from a DT_STR of length 15 to a DT_WSTR of length 50. The result is then stored in the recordset OWell.
The reason for the conversions is that the table I want to add records to has an identity field: SSIS shows well_id as a DT_WSTR of length 15 and well_name as a DT_WSTR of length 50.
I then have a SQL task that connects to the local database and attempts to add the records that are not there yet. I've tried various things, such as using OWell as a result set and referring to it in my SQL statement. Currently, I have the ResultSet property set to None, and the following SQL statement:
Insert into WELL (WELL_ID, WELL_NAME)
Select OWELL_ID, OWELL_NAME
from OWell
where OWELL_ID not in
(select WELL.WELL_ID from WELL)
For Parameter Mapping, I have Parameter 0, called OWell_ID, from my variable User::OWell, and Parameter 1, called OWell_Name, from the same variable. Both are set to VARCHAR, although I've also tried NVARCHAR. I do not have a Result Set.
I am getting the following error:
Error: 0xC002F210 at Insert records to FLEDG, Execute SQL Task: Executing the query "Insert into WELL (WELL_ID, WELL_NAME)
Select OWELL..." failed with the following error: "An error occurred while extracting the result into a variable of type (DBTYPE_STR)". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.
I don't think it's a data type issue, but rather that I somehow am not using the resultset properly. How, exactly, am I supposed to refer to that recordset in my SQL task, so that I can use the two recordset fields and add records that are missing?
Your problem is that you are trying to read an object variable into a SQL task and refer to that variable directly in the SQL statement.
To do what you are trying to do, you can use a Foreach Loop task. You can set the enumerator of the Foreach Loop to an object (recordset) variable and map its columns to variables that you can then pass as parameters into your SQL task. Your SQL code in the example above has another flaw in that you are trying to reference a variable in your package as if it were a table in your database. You need to change your SQL to a parameterized statement such as INSERT INTO WELL (WELL_ID, WELL_NAME) VALUES (?, ?), as sketched below.
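A sketch of that per-row statement for the Execute SQL Task (OLE DB connection); the two ? markers would map to two string variables (for example User::Well_ID and User::Well_Name, names chosen here for illustration) that you create and populate through the Foreach Loop's variable mappings:
-- The ? placeholders are bound on the Parameter Mapping tab of the Execute SQL Task
INSERT INTO WELL (WELL_ID, WELL_NAME)
VALUES (?, ?);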
This approach, however, leaves out the step where you can check whether a record exists before you insert it. A better overall approach would be to do this all in a data flow.
Do everything you are doing in your select-from-Oracle data flow. At the last step, instead of using a Recordset Destination pointing to the variable User::OWell, add a Lookup against the local SQL table.
Set the SQL statement there to select WELL.WELL_ID from WELL. On the Columns tab of the Lookup, match Well_ID from your data flow (fields on the left) to Well_ID from your lookup (fields on the right) by dragging the well_id field from the left to the right to form a connector between the boxes.
At the bottom of the dialog box, click Configure Error Output and set the error value for the lookup output row to Redirect Row. Click OK to save and close the Lookup.
Next, add an OLE DB Destination to the data flow and connect it to the error output of the Lookup (the red arrow). Point the destination at the SQL table and map the columns from the data flow to the appropriate columns in the output table.
This will pass the rows from the Oracle data flow that do not exist in the SQL table into the bulk insert of the SQL table.
To infer missing rows, we either used a Lookup task and directed the unmatched rows to an ordinary OLE DB destination (you just don't supply the identity column, obviously), or, where we were comparing a whole table, used the SQLBI.com TableDifference component and routed the new rows to a similar OLE DB destination.
Individual INSERTs in a SQL Command task aren't terribly quick.
