SSIS - how to use lookup to add extra columns within data flow - sql-server

I have a csv file containing the following columns:
Email | Date | Location
I want to pretty much throw this straight into a database table. The snag is that the Location values in the file are strings, e.g. "Boston", while the table I want to insert into has an integer column LocationId.
So, midway through my data flow, I need to do a database query to get the LocationId corresponding to the Location, e.g.:
SELECT Id as LocationId FROM Locations WHERE Name = { location string from current csv row }
and add this to my current column set as a new value "LocationId".
I don't know how to do this. I tried a Lookup, but it seemed I had to put it in a separate data flow, and the columns from my CSV file don't seem to be available there.
I want to use caching, as the same Locations are repeated a lot and I don't want to run selects for each row when I don't need to.
In summary:
How can I, part-way through the data flow, stick in a lookup transformation (from a different source, sql), and merge the output with the csv-derived columns?
Is lookup the wrong transformation to use?

The Lookup transformation will work for you: it supports caching, it passes all input columns through to the output, and it lets you add columns from the query you configure in the Lookup.
You can also use a Merge Join here; in some cases it is a better solution, but it brings additional overhead because it requires its inputs to be sorted.
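For reference, a minimal sketch of the query you might configure inside the Lookup transformation, assuming a Locations table with Id and Name columns as in the question. With the default Full cache mode, SSIS runs this query once, keeps the result set in memory, and matches each CSV row's Location value against Name without issuing a query per row:
-- Hypothetical lookup query for the Lookup transformation; map the CSV's
-- Location column to Name on the Columns page and add the Id column
-- (renamed to LocationId) to the lookup output.
SELECT Id AS LocationId, Name
FROM dbo.Locations;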

Check this.
Right-click the Lookup transformation -> Show Advanced Editor -> Input and Output Properties.
Here you can add new columns or change the data type of existing columns.
For more information on how to use the Lookup transformation, see the SSIS documentation.

Open the flat file connection manager, go to the Advanced tab.
Click "New" to add the new column and modify the properties.
Now go back to the Flat File Destination output, right-click > Mappings > map the lookup column to the new one.

Related

How to grab a cell value and write it as a new column. Excel to SQL copy activity Azure Data Factory

I have a copy activity pipeline that simply copies cells A6:C9 into a SQL table.
However, now I need to grab the value in cell A3 (a date) and write it as a new column in the SQL table.
How can I achieve that?
Do I need to use another copy activity, or can it be done in a single one?
UPDATE: used range A3:A3 in the Lookup activity.
You may fetch the desired value with a Lookup activity, and then add it to the Copy Data activity as an additional column. Please see the example below.
Start by creating a range parameter on your Excel dataset, so you may provide it dynamically in the pipeline:
Make sure the Range parameter is added in the Connection tab as well.
Next, create a pipeline with a Lookup activity, followed by a Copy Data activity. The Lookup activity should provide the range for the cell you want to capture (in my case, B1).
Finally, in the Copy Data activity, insert an additional column and provide it the Lookup activity output.
The expression I used is @activity('Lookup1').output.firstRow.Prop_0. This will add a column called Date with the value in B1, for every row.

ADF V2 - SQL source dataset - column structure mapping issue

In a copy activity (SQL dataset to Azure Blob), I'm using dynamic content for the source dataset, the sink dataset, and the mapping between source and sink.
For the SQL source I used a stored procedure output with 3 columns named (col1, col2, col3) in that order, but in the source dataset structure I used dynamic content with the same names in a different order (col2, col1, col3). Because of that, the values are swapped between col1 and col2 in the source dataset itself.
My question is why name-based mapping is not applied in an ADF V2 dataset.
Similarly, another source (an SP output) returns 7 columns; if I want to use only 3 of them, it picks the first 3 columns only. There is no way to choose which columns to use via dynamic content.
The dynamic schema mapping is really useful and saves a ton of work, especially when you don't have a fixed schema. In your case it seems that your schemas are always the same, so why not do the mapping yourself?
Just go to your copy activity, select the Mapping tab, and click the "New Mapping" button. It will show two text boxes joined by a line, indicating a column from the source being mapped to a column in the sink.
Just fill it with the corresponding names, and you should be good to go.
Hope this helped!

Is it possible to pass columns through an SSIS script transformation?

I have a source with 100+ columns.
I need to pass these through a script component transformation, which alters only a handful of the columns.
Is there a simple way to allow any columns I don't modify to simply pass through the transformation?
Currently, I have to pass them all in and assign them to the output with no changes.
This is a lot of work for 100 columns and I'd rather not have to do it if possible!
FYI:
There is no unique key, so I cannot split out the records using a multicast and merge them after the script component.
You actually have to choose which columns you want included in your Script Component, marking each as either read-only or read/write.
Anything you do not select as read/write simply passes through unchanged.
There are other things you can do with a Script Component, like adding an output column to your current data flow or even creating a separate data flow output.
In your case, you will want to select the handful of columns that you want to alter as read/write, modify those columns in the script, and the rest will just pass through.

Read single text file and based on a particular value of a column load that record into its respective table

I have been searching on the internet for a solution to my problem but I cannot seem to find any info. I have a large single text file (10 million rows), and I need to create an SSIS package to load these records into different tables based on the transaction group assigned to each record. That is, Tx_Grp1 records would go into the Tx_Grp1 table, Tx_Grp2 records into the Tx_Grp2 table, and so forth. There are 37 different transaction groups in the single delimited text file, and records are inserted into the file in the order they actually occurred (by time). Also, each transaction group has a different number of fields.
Sample data file
date|tx_grp1|field1|field2|field3
date|tx_grp2|field1|field2|field3|field4
date|tx_grp10|field1|field2
.......
Any suggestion on how to proceed would be greatly appreciated.
This task can be solved with SSIS, just with some experience. Here are the main steps and discussion:
Define a Flat File data source for your file, describing all columns. A possible problem here is that field data types differ depending on the tx_group value. If that is the case, I would declare all fields as strings that are long enough, and convert their types later in the data flow.
Create an OLE DB connection manager for the DB you will use to store the results.
Create a main data flow where you will process the file, and add a Flat File Source.
Add a Conditional Split to the output of the Flat File Source, and define as many filters and outputs as you have transaction groups.
For each transaction group output, add a Data Conversion for fields if necessary. Note that you cannot change the data type of an existing column; if you need to cast a string to an int, create a new column.
Add an OLE DB Destination for each destination table, connect it to the proper transaction group output, and map the fields.
Basically, you are done. Test the package thoroughly on a test DB before using it on a production DB.
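As a rough post-load sanity check on the test DB, you could compare per-table row counts against what you expect from the file. This is only a sketch; the table names are taken from the question and the dbo schema is an assumption:
-- Hypothetical check: row counts per transaction group table after a test run.
SELECT 'Tx_Grp1' AS TableName, COUNT(*) AS RowsLoaded FROM dbo.Tx_Grp1
UNION ALL
SELECT 'Tx_Grp2', COUNT(*) FROM dbo.Tx_Grp2
UNION ALL
SELECT 'Tx_Grp10', COUNT(*) FROM dbo.Tx_Grp10;
-- ...and so on for the remaining transaction group tables.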

How to insert a row into a dataset using SSIS?

I'm trying to create an SSIS package that takes data from an XML data source and for each row inserts another row with some preset values. Any ideas? I'm thinking I could use a DataReader source to generate the preset values by doing the following:
SELECT 'foo' AS attribute1, 'bar' AS attribute2
The question is, how would I insert one row of this type for every row in the XML data source?
I'm not sure if I understand the question... My assumption is that you have n number of records coming into SSIS from your data source, and you want your output to have n * 2 records.
In order to do this, you can do the following:
multicast to create multiple copies of your input data
derived column transforms to set the "preset" values on the copies
sort
merge
Am I on the right track w/ what you're trying to accomplish?
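If it helps, the multicast / derived column / merge pattern above is roughly equivalent to the following T-SQL, shown only to illustrate the intended result; SourceRows, attribute1 and attribute2 are placeholder names:
-- Original rows...
SELECT attribute1, attribute2 FROM SourceRows
UNION ALL
-- ...plus one row of preset values for every source row.
SELECT 'foo', 'bar' FROM SourceRows;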
I've never tried it, but it looks like you might be able to use a Derived Column transformation to do it: set the expression for attribute1 to "foo" and the expression for attribute2 to "bar".
You'd then transform the original data source, then only use the derived columns in your destination. If you still need the original source, you can Multicast it to create a duplicate.
At least I think this will work, based on the documentation. YMMV.
I would probably switch to using a Script Task and place your logic in there. You may still be able to leverage the file reading and other objects in SSIS to save some code.
