Unable to load XML data using Copy data activity - sql-server

I am unable to load XML data into a SQL Server database using the Copy data activity, although I can achieve this with data flows using flatten hierarchy. In the Copy data activity the corresponding array is not mapped properly, and even though the pipeline succeeds, only partial data is loaded into the DB.
Auto-creation of the table is also not allowed when running a Copy activity against an XML file; I have to create the table script first and then load the data.
Since we are using a SHIR, this has to be done using the Copy data activity.

Use the collection reference in the mapping tab of the Copy activity to unroll the array and extract the data. I reproduced this using a Copy activity with sample nested XML data.
img1: source dataset preview.
In the mapping tab, select Import schemas.
Toggle on the Advanced editor.
Give the JSON path of the array from which the data needs to be iterated and extracted (see the sketch below).
img2: Mapping settings.
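For illustration, the mapping that ends up in the Advanced editor looks roughly like the sketch below. The element and column names (Orders/Order, Id, Name) are placeholders for whatever your XML actually contains; the essential part is the collectionReference pointing at the repeating array, with the source paths evaluated relative to it so that each array element becomes one row in the sink.

{
    "type": "TabularTranslator",
    "mappings": [
        { "source": { "path": "['Id']" },   "sink": { "name": "Id" } },
        { "source": { "path": "['Name']" }, "sink": { "name": "Name" } }
    ],
    "collectionReference": "$['Orders']['Order']"
}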
When the pipeline is run, the data is copied successfully to the database.
img3: Sink data after copying.
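If, as in the question, the sink table has to exist beforehand, a table matching the hypothetical mapping above could be created with a script along these lines (names and types are assumptions for the sketch):

-- Sink table created manually before running the pipeline;
-- columns mirror the fields mapped from the XML array.
CREATE TABLE dbo.[Order] (
    Id   int           NULL,
    Name nvarchar(100) NULL
);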
Reference: MS document on hierarchical source to tabular sink.


Azure Data Factory: Lookup varbinary column in SQL DB for use in a Script activity to write to another SQL DB - ByteArray is not supported

I'm trying to insert into an on-premises SQL database table called PictureBinary:
PictureBinary table
The source of the binary data is a table in another on-premises SQL database called DocumentBinary:
DocumentBinary table
I have a file with all of the Id's of the DocumentBinary rows that need copying. I feed those into a ForEach activity from a Lookup activity. Each of these files has about 180 rows (there are 50 files fed into a new instance of the pipeline in parallel).
Lookup and ForEach Activities
So far everything is working. But then, inside the ForEach I have another Lookup activity that tries to get the binary info to pass into a script that will insert it into the other database.
Lookup Binary column
And then the Script activity would insert the binary data into the table PictureBinary (in the other database).
Script to Insert Binary data
But when I debug the pipeline, I get this error when the binary column Lookup is reached:
ErrorCode=DataTypeNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column: coBinaryData,The data type ByteArray is not supported from the column named coBinaryData.,Source=,'
I know that the accepted way of storing the files would be to store them on the filesystem and just store the file path to the files in the database. But we are using a NOP database that stores the files in varbinary columns.
Also, if there is a better way of doing this, please let me know.
I tried to reproduce your scenario in my environment and got a similar error.
As per the Microsoft documentation, columns with the Byte Array data type are not supported in the Lookup activity, which is most likely the cause of the error.
To work around this, follow the steps below:
As you explained, you have a file that stores the Ids of all the DocumentBinary rows that need to be copied to the destination. To achieve this, you can simply use a Copy activity with a query that copies the records whose DocumentBinary Id column is equal to the Id stored in the file.
First, I took a Lookup activity to get the Ids of the DocumentBinary rows stored in the file.
Then I passed the output of the Lookup activity to a ForEach activity.
After this, I added a Copy activity inside the ForEach activity with the following query:
SELECT * FROM DocumentBinary
WHERE coDocumentBinaryId = '#{item().PictureId}'  -- current Id from the ForEach iteration
In the source of the Copy activity, set Use query to Query and pass the above query with your own table and column names.
Now go to Mapping, click Import schemas, then delete the unwanted columns and map the remaining columns accordingly.
Note: For this to work, the Id columns in both tables must be of compatible data types, i.e. both uniqueidentifier or both int.
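As a rough illustration of that note, the two tables might look something like this (the layouts are assumptions for the sketch, not taken from the original post):

-- Source table (assumed layout)
CREATE TABLE DocumentBinary (
    coDocumentBinaryId uniqueidentifier NOT NULL,  -- must match the destination key type
    coBinaryData       varbinary(max)   NULL
);

-- Destination table (assumed layout)
CREATE TABLE PictureBinary (
    PictureId  uniqueidentifier NOT NULL,          -- same type as coDocumentBinaryId
    BinaryData varbinary(max)   NULL
);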
Sample input in file:
Output (only the picture Ids contained in the file are copied from source to destination):

ADF copy activity - ignore the new columns in source without throwing an error

I have a pipeline that copies data from the source (Dynamics) to a SQL Server data warehouse. There is a ForEach activity which iterates over the list of all the tables, and in an ADF Copy activity the data is copied. The data copy is also incremental, which is achieved by using a SQL query to load the data incrementally.
However, sometimes new columns are added to the source system that do not yet exist in the destination table. Right now my pipeline stops working and throws an error.
Is there a way to skip the newly added columns of the source system in ADF?
You can use the query option in the source and write the query to get the required columns in the select list from the source table.
Or you can edit the mapping in your copy activity and map only the required columns.
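For the query option, a sketch might look like the following (the table, columns, and watermark are placeholder names; the same idea applies to whatever query language your source supports, e.g. FetchXML for Dynamics): any column added to the source later is simply never selected, so the copy keeps working.

-- List only the columns that already exist in the destination;
-- a new source column that is not in this list is ignored.
SELECT AccountId,
       AccountName,
       ModifiedOn
FROM   Account
WHERE  ModifiedOn > @LastWatermark;  -- keep whatever incremental filter you already use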

Need to map csv file to target table dynamically

I have several CSV files, and each has a corresponding table in the database (with the same columns as the CSV, with appropriate data types) and the same name as the CSV. So every CSV will have a table in the database.
I somehow need to map them all dynamically. Once I run the mapping, the data from all the CSV files should be transferred to the corresponding tables. I don't want to have a different mapping for every CSV.
Is this possible through Informatica?
Appreciate your help.
PowerCenter does not provide such feature out-of-the-box. Unless the structures of the source files and target tables are the same, you need to define separate source/target definitions and create mappings that use them.
However, you can use Stage Mapping Generator to generate a mapping for each file automatically.
My understanding is that you have many CSV files with different column layouts and you need to load them into the appropriate tables in the database.
Approach 1: If you use any RDBMS, you should have some kind of import option. Explore that route to create tables based on the CSV files. This is a manual task.
Approach 2: Open the CSV file and write formulae using the header to generate a CREATE TABLE statement. Execute the formula result in your DB, so you will have all the tables created. Now use Informatica to read the CSVs, import all the table definitions, and load the data into those tables.
Approach 3: Using Informatica. You need to do a lot of coding to create a dynamic mapping on the fly.
Proposed solution:
Mapping 1:
1. Read the CSV file and pass the header information to a Java transformation.
2. The Java transformation should normalize and split the header column into rows; you can write them to a text file.
3. Now you have all the columns in a text file. Read this text file and use a SQL transformation to create the tables in the database.
Mapping 2
Now that the table is available, read the CSV file (excluding the header) and load the data into the table created by Mapping 1 via a SQL transformation (INSERT statement), as sketched below.
You can follow this approach for all the CSV files. I haven't tried this solution at my end, but I am sure the above approach would work.
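As a rough illustration (the file and column names are made up for the sketch), the statements generated by the two SQL transformations would look something like this:

-- Mapping 1: CREATE TABLE statement built from the header row of customers.csv
CREATE TABLE customers (
    customer_id   VARCHAR(255),
    customer_name VARCHAR(255),
    city          VARCHAR(255)
);

-- Mapping 2: INSERT statement built for each data row of the same file
INSERT INTO customers (customer_id, customer_name, city)
VALUES ('1001', 'Acme Corp', 'Chicago');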
If you're not using any transformations, it's wise to use the import option of the database (e.g. a BTEQ script in Teradata). But if you are doing transformations, then you have to create as many sources and targets as the number of files you have.
On the other hand you can achieve this in one mapping.
1. Create a separate flow for every file (i.e. Source-Transformation-Target) in the single mapping.
2. Use the target load plan to choose which file gets loaded first.
3. Configure the file names and corresponding database table names in the session for that mapping.
If all the mappings (if you have to create them separately) are the same, use the indirect file method. You will find this option in the session properties, under the Mappings tab, in the source options; the default is Direct, change it to Indirect.
I don't have the tool with me right now to explore further and guide you clearly, but look into this indirect file load type in Informatica. I am sure it will solve the requirement.
I have written a workflow in Informatica that does it, but some of the complex steps are handled inside the database. The workflow watches a folder for new files. Once it sees all the files that constitute a feed, it starts to process the feed. It takes a backup in a time stamped folder and then copies all the data from the files in the feed into an Oracle table. An Oracle procedure gets to work and then transfers the data from the Oracle table into their corresponding destination staging tables and finally the Data Warehouse. So if I have to add a new file or a feed, I have to make changes in configuration tables only. No changes are required either to the Informatica Objects or the db objects. So the short answer is yes this is possible but it is not an out of the box feature.

SharePoint List: pass URL param to the SQL where clause

I have added a SQL database as an external content type and created a SharePoint list based on it. I saw that while configuring it, there is an option to set a filter. I want to filter records based on the URL parameter which the list receives. I tried setting up a column_name = parameter filter, but how do I pass the query string parameter from the URL to this filter parameter?
For example, if I have 100 records for Basketball players, and the URL is list.aspx?team=pacers I want the list to load only the 10 pacers records from the SQL database. I don't want to load all the records and then filter the list using UI.
Thanks!
You should know about Data Source Filters:
When you create an external content type filter, which is known as a Data Source Filter, the filter operation occurs within the SQL Server database. This is important when you are working with lots of data, because you can offload processing from SharePoint products to the external database and gain performance improvements. After you create the external list, you can use the Data Source Filter by creating a view that specifies different filter values in the Data Source Filter section of the List View settings page.
Read more: How to: Create external content types for SQL Server in SharePoint 2013.

Move images in physical directory to SQL Server image type

How do I use SSIS to iterate over the image files in a directory and, using the filename, run a query to insert the image into SQL Server?
I realise that with a Foreach File Enumerator I can loop over the files and get the filename into a variable. How do I use this variable to run a query to find the record for that filename in my table and then import the image from the hard drive into my SQL Server image-type column?
Once I have the file in my database, I will delete the file from the hard drive.
If I'm understanding the problem correctly, you would like to sweep all the files in some location into SQL Server using SSIS?
Data Flow Task
Your data flow task will be responsible for the actual import of files into the database. Your approach would be the same as outlined in Import varbinary data, and the pretty-picture version at insert XML file in SQL via SSIS.
Your source will be a Script Transformation Component operating as a source component. Its job will be to add all the file names into the data flow. Change the filter in the second link to *.png (or whatever your filter is) and it should work.
Use the Import Column Component on the generated file names. This will add the file pointer into the data flow so that it can get imported into the database. You will need to ensure your data type is DT_IMAGE. Even if you're using varbinary(max)/varchar(max)/nvarchar(max) it's all going to be DT_IMAGE within the context of the pipeline's metadata.
Route all of that data into your target table and you will have imported your file data.
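For reference, a minimal target table for this flow (the table and column names here are assumptions, not from the original post) could look like:

-- Minimal sketch of a target table: one row per imported file
CREATE TABLE dbo.ImportedImage (
    FileName  nvarchar(260)  NOT NULL,  -- full path captured by the script source
    ImageData varbinary(max) NULL       -- fed by the Import Column component (DT_IMAGE)
);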
File cleanup
At this point, you have imported all this data and now you want to remove the files from disk. Assuming you stored the file name in the database along with the image bits, I'd use an Execute SQL Task to retrieve the list of file names. Change the output type from None to Full Result Set and store that into a variable of type Object.
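Assuming the hypothetical table sketched above, the Execute SQL Task's query can be as simple as:

-- Return the names of files whose contents made it into the table,
-- so only successfully imported files get deleted.
SELECT FileName
FROM   dbo.ImportedImage
WHERE  ImageData IS NOT NULL;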
Connect a Foreach Enumerator to the output of the SQL Task; here you'll want to "shred" the results. Google that term and you'll find a variety of blog posts and previous SO questions on how to do this. The end result is that on each iteration a file name is pulled from the recordset object and assigned to a local variable.
Inside the Foreach Enumerator, use a File System Task to delete the file referenced by the variable set by the Foreach Enumerator.
