I have two flat file sources with only one column in each.
The first flat file has 1,2,3 as its data.
The second flat file has A,B,C as its data.
I want the cross join output of these written to another flat file. I don't have a DB connection to use; basically, I'm not allowed to use one. Which transformations can help with this?
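You can use the Merge Join transformation with Full Outer Join as the join type. Merge Join needs a join key and sorted inputs, so first add a Derived Column with the same constant value (for example 1) to each source, sort both inputs on it, and join on that column; every row then matches every row, which gives the cross join.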
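One common way to get a cross join out of a key-based join is to give both inputs the same constant key. The following is a minimal Python sketch of that idea, not SSIS itself; the lists stand in for the two flat files, and the helper names add_constant_key and join_on_key are hypothetical.

```python
# Hypothetical stand-ins for the two single-column flat files.
numbers = ["1", "2", "3"]
letters = ["A", "B", "C"]

def add_constant_key(rows, key=1):
    """Mimic a Derived Column that adds the same constant key to every row."""
    return [(key, value) for value in rows]

def join_on_key(left, right):
    """Join two keyed row sets; with a single shared key value this is a cross join."""
    return [(lv, rv) for lk, lv in left for rk, rv in right if lk == rk]

cross = join_on_key(add_constant_key(numbers), add_constant_key(letters))
for left_value, right_value in cross:
    print(left_value, right_value)
```

Because every row carries the same key, each row on the left pairs with each row on the right, giving 3 x 3 = 9 output rows.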
I have multiple flat files that I need to transform. I have at least 5 or 6 flat files that I have to combine with Merge Join to get the final output, but after reading other articles I learned that a Merge Join accepts only two sources. Does anyone have experience with, or know how to use, Merge Join when there are multiple flat file sources (5 or 6)?
You can chain multiple Merge Join transformations instead of linking all sources into a single Merge Join transformation. You can refer to the following link for an example:
merging two or more tables in SSIS
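Chaining two-input joins can be pictured as folding a list of sources pairwise. This is only a Python sketch of the idea, with made-up data and a hypothetical inner_join helper; each step of the fold plays the role of one Merge Join whose output feeds the next.

```python
from functools import reduce

# Hypothetical sources: each maps a join key to that file's columns.
sources = [
    {1: {"name": "Ann"}, 2: {"name": "Bob"}},
    {1: {"city": "Oslo"}, 2: {"city": "Rome"}},
    {1: {"dept": "IT"}, 3: {"dept": "HR"}},
]

def inner_join(left, right):
    """Join two keyed sources, keeping only keys present in both."""
    return {k: {**left[k], **right[k]} for k in left if k in right}

# Each reduce step takes the previous join's output and one more source.
joined = reduce(inner_join, sources)
print(joined)
```

With 5 or 6 sources the same pattern applies: N sources need N - 1 two-input joins.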
I am trying to combine two inventory sources with SSIS. The first of which contains inventory information from our new system while the second contains legacy data. I am getting the data from the sources just fine.
Both data sets have the same columns, but I only want to get the results from the second data set if the ItemCode value for that record doesn't exist in the first data set.
Which transform would I need to use to achieve this?
Edit - here is what I have so far in my data flow.
I need to add a transform to the Extract Legacy Item Data source so that it will remove records whose item codes already exist in the Extract New Item Data source.
The two sources are on different servers so I cannot resolve by amending the query. I would also like to avoid running the same query that is run in the Extract New Item Data source.
If both sources are SQL Server databases stored on the same server, you can use a SQL command as the source to achieve that:
SELECT Inventory2.*
FROM Inventory2 LEFT JOIN Inventory1
ON Inventory2.ItemCode = Inventory1.ItemCode
WHERE Inventory1.ItemCode IS NULL
OR
SELECT *
FROM Inventory2
WHERE NOT EXISTS (SELECT 1 FROM Inventory1 WHERE Inventory2.ItemCode = Inventory1.ItemCode)
An example using a Merge Join is below. Using a SQL Server Destination will work fine; however, it only allows loading to a local SQL Server instance, something you may want to consider for the future. While a Lookup typically performs better, Merge Joins can be beneficial in certain circumstances, such as when many additional columns are introduced into the Data Flow, as may be done with your data sets. It looks like @Hadi has covered how to do this with a Lookup, so you may want to test both approaches in a non-production environment that mimics prod, then assess the results to determine the better option.
Start off by creating a staging table which is an exact clone of one of the tables. Either table will work since they have the same definition. Make sure all columns in the staging table allow null values.
Add an Execute SQL Task to clear the staging table before the Data Flow Task by either truncating or dropping and then creating the table.
Since ItemCode is unique, sort on this column in each OLE DB Source. If you aren't already, change the Data Access Mode to SQL command in both OLE DB Sources and add an ORDER BY clause on ItemCode. Then tell SSIS that the output is sorted: right-click the OLE DB Source and go to Show Advanced Editor > Input and Output Properties > OLE DB Source Output and set the IsSorted property to True, then under Output Columns select ItemCode and set the SortKeyPosition property to 1 (assuming an ascending sort in the SQL statement).
Next, add a Merge Join in the Data Flow Task. This requires both inputs to be sorted, which is why the sources were sorted in the previous step. You can do this either way around, but for this example use the OLE DB Source that will only be used when ItemCode does not exist in the other data set as the Merge Join left input. Use a left outer join with ItemCode as the join key, connecting the columns by dragging a line from one to the other in the GUI. Add all the columns from the OLE DB Source that you want to use when the same ItemCode is in both data sets (from what I could tell this is Extract New Item Data; please adjust this if it isn't) by checking the check-box next to them in the Merge Join editor. Use an output alias prefix that will help you distinguish these, for example X_ItemCode for the matching rows.
After the Merge Join, add a Conditional Split. This divides the records based on whether X_ItemCode was found. For the expression of the first output, use the ISNULL function to test whether there was a match from the left outer join. For example, ISNULL(X_ItemCode) != TRUE indicates that the ItemCode exists in both data sets. You can call this output Matching Rows. The default output will contain the non-matches. To make it easier to distinguish, you can rename the default output Non-Matching Rows.
Connect the Matching Rows output to the destination table. In its mappings, map only the columns from the source you want to use when ItemCode exists in both data sets, i.e. the X_ prefixed columns such as X_ItemCode.
Add another SQL Server Destination in the Data Flow and connect the Non-Matching Rows output to it, with all the columns mapped from the rows that did not match, the ones without X_ in this example.
Back on the Control Flow in the package, add another Data Flow Task after this one. Use the staging table as the OLE DB Source and the destination table as the SQL Server Destination. Sorting isn't necessary here.
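The sorted left outer join followed by a split on the X_ItemCode column can be sketched in Python as below. This is an illustration under assumptions, not the package itself: the rows are made up, both inputs are assumed to be sorted ascending on a unique ItemCode and to share the same columns, and merge_left_join is a hypothetical helper.

```python
def merge_left_join(left, right, key):
    """Left outer join of two lists of dicts, both sorted ascending on a unique key."""
    out, j = [], 0
    for row in left:
        # Advance the right side until it catches up with the left key.
        while j < len(right) and right[j][key] < row[key]:
            j += 1
        match = right[j] if j < len(right) and right[j][key] == row[key] else None
        merged = dict(row)
        # Right-hand columns get an X_ prefix; None plays the role of NULL.
        for col in row:
            merged["X_" + col] = match[col] if match else None
        out.append(merged)
    return out

# Hypothetical data: legacy rows are the left input, new rows the right input.
legacy = [{"ItemCode": "A1", "Qty": 5}, {"ItemCode": "B2", "Qty": 7}, {"ItemCode": "C3", "Qty": 1}]
new = [{"ItemCode": "B2", "Qty": 9}, {"ItemCode": "D4", "Qty": 2}]

joined = merge_left_join(legacy, new, "ItemCode")
# Conditional Split: a non-NULL X_ItemCode means the item exists in both sets.
matching = [r for r in joined if r["X_ItemCode"] is not None]
non_matching = [r for r in joined if r["X_ItemCode"] is None]
```

The single forward pass over both inputs is why the join only works on sorted data: the right-hand pointer never moves backwards.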
First of all, since you are using a SQL Server Destination, I suggest reading the following answer from the SSIS guru @billinkc:
Should SSIS packages and SQL database be on same server?
I will provide different methods to achieve that:
(1) Using Lookup transformation
Add a Data Flow Task with the second inventory (legacy) as the source.
Add a Lookup transformation and select the first inventory source as the lookup table.
Map the source to the lookup table on the ItemCode column.
In the Lookup transformation, select Redirect rows to no match output from the drop-down list.
Use the Lookup no match output to get the desired rows (those not found in the first inventory source).
You can refer to the link below; it contains a step-by-step tutorial.
Helpful link
UNDERSTAND SSIS LOOKUP TRANSFORMATION WITH AN EXAMPLE STEP BY STEP
Old Versions of SSIS
If you are using an old version of SSIS, you will not find the Redirect rows to no match output drop-down list. Instead, go to the Lookup error output, select the Redirect Row option for the no-match case, and use the error output to get the desired rows.
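Conceptually, the Lookup approach checks each legacy row against the set of keys from the first source and routes it to one of two outputs. A rough Python sketch with made-up rows (the variable names are illustrative, not SSIS objects):

```python
# Hypothetical rows; in the package these come from the two inventory sources.
new_items = [{"ItemCode": "B2", "Qty": 9}, {"ItemCode": "D4", "Qty": 2}]
legacy_items = [{"ItemCode": "A1", "Qty": 5}, {"ItemCode": "B2", "Qty": 7}]

# Reference keys held in memory, much as a cached lookup table would be.
lookup_keys = {row["ItemCode"] for row in new_items}

match_output, no_match_output = [], []
for row in legacy_items:
    if row["ItemCode"] in lookup_keys:
        match_output.append(row)      # already present in the new system
    else:
        no_match_output.append(row)   # legacy-only row, the one we want

# New rows plus the legacy rows that had no counterpart.
combined = new_items + no_match_output
```

Unlike a Merge Join, this needs no sorting, since each row is checked against the key set independently.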
(2) Using Linked Servers
On the second inventory server, create a linked server so that it can connect to the first server. You will then be able to use a SQL command that selects only the rows not found in the first source:
SELECT *
FROM Inventory2
WHERE NOT EXISTS (SELECT 1 FROM <Linked Server>.<database>.<schema>.Inventory1 Inv1 WHERE Inventory2.ItemCode = Inv1.ItemCode)
(3) Staging table + MERGE, MERGE JOIN, UNION ALL transformations
In each source SQL command, add a fixed-value column that contains the source ID (1 or 2), for example:
SELECT *, 1 as SourceID FROM Inventory
You can combine both sources into one destination (the staging table) using one of the transformations listed above, then add a second Data Flow Task to import distinct data from the staging table into the destination based on the ItemCode column, for example:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ItemCode ORDER BY SourceID) rn
FROM StagingTable ) s
Where s.rn = 1
This returns all rows from SourceID = 1 and only the new rows from SourceID = 2.
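The effect of ROW_NUMBER() partitioned by ItemCode and ordered by SourceID, keeping rn = 1, is to keep the row with the lowest SourceID for each ItemCode. A small Python sketch with made-up staging rows shows the same selection:

```python
# Hypothetical staging rows after both sources were combined.
staging = [
    {"ItemCode": "B2", "Qty": 9, "SourceID": 1},
    {"ItemCode": "D4", "Qty": 2, "SourceID": 1},
    {"ItemCode": "A1", "Qty": 5, "SourceID": 2},
    {"ItemCode": "B2", "Qty": 7, "SourceID": 2},
]

# Keep, per ItemCode, the row with the smallest SourceID (the rn = 1 row).
best = {}
for row in staging:
    current = best.get(row["ItemCode"])
    if current is None or row["SourceID"] < current["SourceID"]:
        best[row["ItemCode"]] = row

result = sorted(best.values(), key=lambda r: r["ItemCode"])
```

Here B2 appears in both sources, so only the SourceID 1 version survives, while A1 is kept because it exists only in source 2.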
To learn more about the Merge, Merge Join and Union All transformations, you can refer to the following links:
Learn SSIS : MERGE, MERGE JOIN and UNION ALL
SSIS Do I Union All or Merge??
Using the SSIS Merge Join
How to get unmatched data between two sources in SSIS Data Flow?
Note: check the answer provided by @userfl89; it contains very detailed information about using the Merge Join transformation and describes another approach that can help. You will need to test which approach fits your needs. Good luck.
I have two inputs. One is a text file and the other is a proprietary DB, which is basically a non-OLE DB connection. I would like to compare these two data sets and get two outputs: matched and non-matched. I can use Merge Join, but it gives only one output. For a Lookup, I may need to load the text file data into SQL Server to get the two outputs I need. I would like to avoid the extra step of loading data into SQL Server. Please let me know the simplest way of doing this.
Thanks.
Do the Merge Join using an Outer Join, so that rows that don't match will have a NULL column in the output.
Then do a conditional split and send the rows with NULLs to the non-matched output, and the rows with NOT NULL to the matched output.
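As a rough illustration of the outer join followed by a conditional split, here is a Python sketch using a full outer join so that unmatched rows from either side are caught. The two dictionaries are made-up stand-ins for the text file and the proprietary source, keyed on the comparison column.

```python
# Hypothetical inputs keyed on the column being compared.
text_rows = {"A1": "from text", "B2": "from text"}
db_rows = {"B2": "from db", "C3": "from db"}

matched, non_matched = [], []
for key in sorted(text_rows.keys() | db_rows.keys()):
    # Full outer join row; None stands in for NULL on the missing side.
    joined = (key, text_rows.get(key), db_rows.get(key))
    if joined[1] is None or joined[2] is None:
        non_matched.append(joined)   # Conditional Split: a NULL on either side
    else:
        matched.append(joined)       # both sides present
```

The single joined stream becomes two outputs only at the split, which is how one Merge Join output can still feed both a matched and a non-matched destination.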
I have two tables on two different servers that have identical schemas. For simplicity, let's say each table has 3 fields. I am trying to create a SSIS package that takes the data from both tables and merges it into one recordset.
I added two OLE DB sources, which get the same three fields from the two tables. I then have a Sort transformation on each, that then flows into a Merge Join. I set the join type to "Full Outer Join" on the Merge Join. I can select all six of the fields and see the output using a Data Viewer coming out of the Merge Join.
Let's say there were 25 records in each source table. In the Data Viewer, I end up getting 50 records: 25 with NULL in the last three fields, and 25 with NULL in the first three fields. I would like the output to be 50 records with data in only three fields. What am I doing wrong here? Should I be using some other sort of merge option?
I would greatly appreciate any suggestions on how to resolve what should be a simple task. Thanks!
The output from the Merge Join you are using is correct, since you're using a full outer join. To fix your problem, use a Merge transformation instead of a Merge Join. This will combine your two sorted data flows into one sorted data flow. From your description, the rest of your data flow is already set up correctly: two OLE DB sources, each followed by a Sort, feeding the single transformation.
Documentation can be found here: https://msdn.microsoft.com/en-us/library/ms141703.aspx
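The difference from a join is that a merge interleaves rows from two sorted inputs instead of placing them side by side. Python's standard heapq.merge behaves similarly and can serve as a small illustration; the tuples below are made-up three-field rows, sorted on the first field.

```python
import heapq

# Hypothetical rows from the two servers, each already sorted on the first field.
server_a = [(1, "x", "y"), (4, "p", "q")]
server_b = [(2, "m", "n"), (3, "r", "s")]

# Interleave the two sorted inputs into one sorted stream of three-field rows.
merged = list(heapq.merge(server_a, server_b, key=lambda row: row[0]))
for row in merged:
    print(row)
```

The output has as many rows as both inputs combined and still only three fields per row, which is the shape asked for in the question.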
I'm creating a warehouse using SQL Server 2008 and Analysis Services. I've managed to create and populate the dimension tables, but I'm having a lot of trouble writing the SQL for loading the fact table. For one thing, I'm not sure how to load the keys of the fact table with the PKs from the dimension table. I tried writing a query that had a series of JOINs to get the keys and the measures I want, but the statement got so complicated that I got lost.
This is the star schema that I have to work from:
http://i.imgur.com/C3DGj.png
What am I doing wrong? I have a feeling that I'm missing something pretty basic, but I'm fairly new to this and most of the information I found online seemed to deal with using SSIS, which I don't have installed.
Any help would be appreciated.
Today's data warehouse developer typically uses SSIS for loading dimensional models, with lookups used to convert each dimensional attribute into a key. Most of the time, the data is going to be on another server, in a flat file, or somewhere else that forces you to use an ETL tool like SSIS, but in your case you can get it done without one. If your enterprise is serious about BI, you should push to get SSIS installed and learn it.
For your situation, assuming you have a table loaded with raw facts locally, you should be able to do an insert/select.
Basically, you'll want to inner join (since you've had no problems populating the dimension tables) each dimension to the raw facts table. Something like:
INSERT trainingcentrefact
(timekey,locationkey,instructorkey,coursekey,paid,notpaid,... etc)
SELECT
t.timekey
,l.locationkey
,i.instructorkey
,c.coursekey
,rf.paid
,rf.notpaid
,... etc
FROM rawfacts rf
INNER JOIN timedimension t ON rf.time = t.time
INNER JOIN locationdimension l on rf.location = l.location
INNER JOIN instructordimension i on rf.instructor = i.instructor
INNER JOIN coursedimension c on rf.course = c.course
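What the joins do is swap each natural value in the raw fact row for the surrogate key of the matching dimension row. A Python sketch of the same key lookup follows; the dimension contents and the raw fact rows are invented for illustration, and only the column names follow the query above.

```python
# Hypothetical dimension tables: natural value -> surrogate key.
time_dim = {"2012-01": 1}
location_dim = {"Leeds": 10}
instructor_dim = {"Kim": 100}
course_dim = {"SQL": 1000}

raw_facts = [
    {"time": "2012-01", "location": "Leeds", "instructor": "Kim",
     "course": "SQL", "paid": 3, "notpaid": 1},
    {"time": "2012-01", "location": "Paris", "instructor": "Kim",
     "course": "SQL", "paid": 2, "notpaid": 0},  # no matching location
]

fact_rows = []
for rf in raw_facts:
    try:
        fact_rows.append({
            "timekey": time_dim[rf["time"]],
            "locationkey": location_dim[rf["location"]],
            "instructorkey": instructor_dim[rf["instructor"]],
            "coursekey": course_dim[rf["course"]],
            "paid": rf["paid"],
            "notpaid": rf["notpaid"],
        })
    except KeyError:
        # Inner join semantics: a row with no matching dimension member is dropped.
        continue
```

The second raw row is dropped because its location has no dimension entry, which mirrors what an INNER JOIN does; a LEFT JOIN would keep it with a NULL key instead.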