OLE DB Source to Multiple Destinations with Different Columns - sql-server

What would be the best way to handle a use case where:
I have an OLE DB source containing 10 columns.
This source data needs to go to three places, each with different fields from the source:
Excel 1 (5 fields from the source)
Excel 2 (different fields than the previous Excel)
A SQL Server table (another combination of fields)
Using a Script Component to choose columns seems to be an option. Multicast does not provide the ability to pick and choose specific columns.
Please see the picture for my solution. I need to know if there is another option to achieve it.

Here are some tips that may help:
Avoid Script Components
Instead of adding Script Components to select specific columns, simply don't map those columns in each OLE DB Destination.
Example:
Image reference: how to assign a constant value to a column in oledb destination in ssis
Select specific columns in the OLEDB Source
If there are some columns in the OLE DB Source that won't be used in any of the destinations, it is better to change the access mode from Table or View to SQL Command and specify only the needed columns in the SELECT query. For example, if the table contains 5 columns [Col1], [Col2], ... [Col5] and you only need [Col1] and [Col2], use the following query:
Select [Col1],[Col2] From [Table]
instead of selecting the table name.
For more information:
SSIS OLE DB Source Editor Data Access Mode: “SQL command” vs “Table or view”

There isn't a better approach than what you have. Instead of adding Script Components, in each OLE DB Destination just don't map the columns that you don't want to use in that destination.

Related

How to add a new column in an SSIS package when the columns are at the max?

I have a package with 600 columns, and now I need to add a new column to the package. But SSIS is telling me that I reached the maximum limit of columns. Is there any way I can add the new column?
Before thinking about adding a new column, you should think about why you have 600 columns in a single package!!
When working with a data source with a considerable number of data columns, you should ignore the useless columns or merge them into one Jumbo column (comma-separated values, JSON, XML...).
Example 1 - Flat File
In case that your data source is a flat file and that only the first few columns are useful, you can go to the Flat File Connection Manager > Advanced Tab and minimize the number of columns.
Example 2 - SQL Server source
If your data source is a SQL database, you can choose which columns to merge using functions like CONCAT_WS() (concatenate with separator). For example:
SELECT [Col1], [Col2], [Col5], CONCAT_WS(';',[Col3],[Col4],[Col6]) as [Jumbo1], CONCAT_WS(';',[Col7],[Col8],[Col9]) as [Jumbo2]
FROM MyTable
Important side note: if you use an OLE DB source, do not ignore columns by unchecking them in the OLE DB Source editor; use a SQL command instead. I highly suggest reading the following articles:
SSIS OLE DB Source: SQL Command vs. Table or View
Data Access Modes in SSIS OLE DB Destination: SQL Command vs. Table or View
Similar Issues
Is there a limit in the number of columns in a dataflow

Ignoring columns from an Excel file while importing to SQL Server

I have multiple Excel files that have the same format. I need to import them into SQL Server.
The issue I currently have is that there are two text columns that I need to ignore completely, as they are free text and the character length for some rows exceeds what the server allows me to import, which results in a truncation error.
Because I don't need these columns for my analysis, the table I'm importing into doesn't include them, but for some reason the SSIS package still picks up those columns and the import job is cut off halfway through.
I tried using the maximum character length for those columns, which still results in the truncation error.
I need to create an SSIS package that ignores the two columns completely without deleting the columns from Excel.
You can specify which columns you need to ignore from the Edit Mappings dialog.
I have added the image for your reference:
If you create the SSIS package in SSDT, the Excel file can be queried to return only the required columns. In the package, create an Excel Connection Manager using the Excel file. Then, on the Control Flow of the package, add a Data Flow Task that has an Excel Source component in it. On this source, change the data access mode to SQL command and the file can then be queried much like SQL. In the following example, TabName is the name of the Excel tab containing the data that will be returned. If either the tab or any column names contain spaces, they will need to be enclosed in square brackets, i.e. TabName would be [Tab Name].
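Since the original example query isn't shown here, a minimal sketch of what it might look like follows (the column names are placeholders; note that with the ACE OLE DB provider the sheet is usually referenced with a trailing $):
SELECT [Col1], [Col2] FROM [TabName$]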
Import/Export Wizard
Since you mentioned in the comments that you are using the SQL Server Import/Export Wizard: you can solve this if there is a fixed range of columns that you are looking to import (for example, the first 10 columns).
In the Import/Export Wizard, after selecting the destination options, you will be asked whether you want to read from tables or from a query:
Select the query option, then use a simple SELECT query and specify the column range after the sheet name. For example:
SELECT * FROM [Sheet1$A:C]
The query above will read the first 3 columns in Sheet1, since A:C represents the range between the first column A and the third column C.
Now, you can check the columns from the Edit Mappings dialog:
SSIS
You can use the same logic within an SSIS package: just write the same SQL command in the Excel Source after changing the access mode to SQL Command.
The solution is simple. I needed to write a query that excludes the columns. So instead of selecting "Copy data from one or more tables", you select "Write a query" and exclude the columns you don't need. This worked 100%.

SSIS - combine results only if key doesn't exist in first dataset

I am trying to combine two inventory sources with SSIS, the first of which contains inventory information from our new system while the second contains legacy data. I am getting the data from the sources just fine.
Both data sets have the same columns, but I only want to get the results from the second data set if the ItemCode value for that record doesn't exist in the first data set.
Which transform would I need to use to achieve this?
Edit - here is what I have so far in my data flow.
I need to add a transform to the Extract Legacy Item Data source so that it will remove records whose item codes already exist in the Extract New Item Data source.
The two sources are on different servers so I cannot resolve by amending the query. I would also like to avoid running the same query that is run in the Extract New Item Data source.
If both sources are SQL databases stored on the same server, you can use a SQL command as the source to achieve that:
SELECT Inventory2.*
FROM Inventory2 LEFT JOIN Inventory1
ON Inventory2.ItemCode = Inventory1.ItemCode
WHERE Inventory1.ItemCode IS NULL
OR
SELECT *
FROM Inventory2
WHERE NOT EXISTS (SELECT 1 FROM Inventory1 WHERE Inventory2.ItemCode = Inventory1.ItemCode)
An example of this is below. Using a SQL Server Destination will work fine; however, it only allows loading to a local SQL Server instance, something you may want to consider for the future. While a Lookup typically performs better, a Merge Join can be beneficial in certain circumstances, such as when many additional columns are introduced into the Data Flow, as may happen with your data sets. It looks like @Hadi has covered how to do this with a Lookup, so you may want to test both approaches in a non-production environment that mimics prod, then assess the results to determine the better option.
Start off by creating a staging table which is an exact clone of one of the tables. Either table will work since they have the same definition. Make sure all columns in the staging table allow null values.
Add an Execute SQL Task to clear the staging table before the Data Flow Task by either truncating or dropping and then creating the table.
Since ItemCode is unique, sort on this column in each OLE DB Source. If you haven't already, change the data access mode to SQL command in both OLE DB Sources and add an ORDER BY clause for ItemCode. Then tell SSIS that the output is sorted: right-click each OLE DB Source, go to Show Advanced Editor > Input and Output Properties > OLE DB Source Output, set the IsSorted property to True, then under Output Columns select ItemCode and set its SortKeyPosition property to 1 (assuming you sort ascending in the SQL statement).
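For example, the source query might look like the following (the table and the columns other than ItemCode are placeholders):
SELECT ItemCode, ItemDescription, Quantity FROM dbo.Inventory ORDER BY ItemCode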
Next, add a Merge Join in the Data Flow Task. This requires both inputs to be sorted, which is why the sources were sorted in the previous step. You can do this either way, but for this example use the OLE DB Source that should only be used when ItemCode does not exist as the Merge Join's left input. Use a left outer join and the ItemCode column as the join key by connecting them via dragging a line from one to the other in the GUI. Add all the columns from the OLE DB Source that you want to use when the same ItemCode is in both data sets (from what I could tell this is Extract New Item Data, please adjust if it isn't) by checking the check-box next to them in the Merge Join editor. Use an output alias prefix that will help you distinguish these, for example X_ItemCode for the matching rows.
After the Merge Join, add a Conditional Split. This divides the records based on whether X_ItemCode was found. For the expression of the first output, use the ISNULL function to test whether there was a match from the left outer join. For example, ISNULL(X_ItemCode) != TRUE indicates that the ItemCode exists in both data sets. You can call this output Matching Rows. The default output will contain the non-matches; to make it easier to distinguish, you can rename the default output Non-Matching Rows.
Connect the Matching Rows output to the destination table. In this destination, map only the columns from the source you want to use when ItemCode exists in both data sets, i.e. the X_ prefixed columns such as X_ItemCode.
Add another SQL Server Destination in the Data Flow and connect the Non-Matching Rows output to it, with the columns mapped from the rows that did not match, i.e. the ones without the X_ prefix in this example.
Back on the Control Flow in the package, add another Data Flow Task after this one. Use the staging table as the OLE DB Source and the destination table as the SQL Server Destination. Sorting isn't necessary here.
First of all, since you are using a SQL Server Destination, I suggest reading the following answer from the SSIS guru @billinkc:
Should SSIS packages and SQL database be on same server?
I will provide different methods to achieve that:
(1) Using Lookup transformation
Add a Data Flow Task, where you add the second inventory (legacy) as the source.
Add a Lookup transformation where you select the first inventory source as the lookup table.
Map the source and the lookup table on the ItemCode column.
In the lookup transformation select Redirect rows to no match output from the drop down list.
Use the Lookup no match output to get the desired rows (not found in the first Inventory source)
You can refer to the link below; it contains a step-by-step tutorial.
Helpful link
UNDERSTAND SSIS LOOKUP TRANSFORMATION WITH AN EXAMPLE STEP BY STEP
Old Versions of SSIS
If you are using an older version of SSIS, you will not find the Redirect rows to no match output drop-down list. Instead, go to the Lookup error output, select the Redirect Row option for the No Match situation, and use the error output to get the desired rows.
(2) Using Linked Servers
On the second inventory server, create a linked server to be able to connect to the first server. Now you are able to use a SQL command that only selects the rows not found in the first source:
SELECT *
FROM Inventory2
WHERE NOT EXISTS (SELECT 1 FROM <Linked Server>.<database>.<schema>.Inventory1 Inv1 WHERE Inventory2.ItemCode = Inv1.ItemCode)
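If a linked server does not already exist, one minimal way to create it is shown below (the server name is a placeholder; when the product is SQL Server, @server must be the network name of the remote instance):
EXEC sp_addlinkedserver @server = N'FirstInventoryServer', @srvproduct = N'SQL Server';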
(3) Staging table + MERGE, MERGE JOIN, UNION ALL transformations
In each source SQL command, add a fixed-value column that contains the source id (1, 2). For example:
SELECT *, 1 as SourceID FROM Inventory
You can combine both sources into one staging destination using one of the transformations listed above, then add a second Data Flow Task to import distinct data from the staging table into the destination based on the ItemCode column. For example:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ItemCode ORDER BY SourceID) rn
FROM StagingTable ) s
WHERE s.rn = 1
This will return all rows from SourceID = 1 and only the new rows from SourceID = 2.
To learn more about Merge, Merge Join and UNION ALL transformation you can refer to one of the following links:
Learn SSIS : MERGE, MERGE JOIN and UNION ALL
SSIS Do I Union All or Merge??
Using the SSIS Merge Join
How to get unmatched data between two sources in SSIS Data Flow?
Note: check the answer provided by @userfl89; it contains very detailed information about using the Merge Join transformation and describes another approach that can help. Now you have to test which approach fits your needs. Good luck!

Import multiple Excel files to create 1 table in SQL Server 2012

I have a little experience using SQL Server 2012.
All I know about importing an Excel file into the database is the following:
open SQL Server Management Studio
right click on the "table" folder -> Tasks -> Import Data
set data source to MS Excel.
It seems that only one Excel file is handled at a time.
But I want to concatenate 6 Excel files (all with the same column layout) to form a single table in SQL Server.
P.S. No need to tell me to concatenate the Excel files manually by copy and paste, because each individual Excel file has about 50,000 records.
Any ideas / solutions by using sql scripts or any other programming methods?
Thanks a lot.
There's a range of ways to do this, but I'll give you the simplest that comes to mind without requiring any deep technical knowledge on your part.
Given that you're using the wizard, firstly on the 'Select Table Sources and Views' page, change the 'Destination' to be the name of the table you've previously created.
Then, under the 'Edit Mappings' menu when selecting your sheets, ensure you have 'Append rows to the destination table' selected, rather than Create/Delete. Within reason, this will achieve your goal.
There is a risk with flat-file loading like this that SQL Server will create your table with unsuitable types (e.g. a column is really a text column but only contained numbers in the first file, so the column was created as an INT and won't accept the other files). You'll need to create the table from scratch with the right structure, or work with the mappings page to get this right.
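As an illustration, a hypothetical destination table created up front with explicit types (the table and column names here are assumptions, adjust them to your layout):
CREATE TABLE dbo.ImportTarget
(
    OrderId INT,
    CustomerName NVARCHAR(255),
    Notes NVARCHAR(MAX)
);
The wizard imports can then append into this pre-created table instead of inferring types from the first file.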
Another way, for the semi-technical type, is that as long as the data is equivalent between files, you can simply do your imports into a series of separate tables:
Table1
Table2
Table3
...
Then do a
INSERT INTO Table1
SELECT * FROM Table2
UNION ALL
SELECT * FROM Table3
... Add tables here
You can then use DROP TABLE to remove the extras.
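For example, assuming Table1 is the one you keep:
DROP TABLE Table2;
DROP TABLE Table3;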

SSIS dynamic columns validation

I'm trying to use Dynamic Column mapping by selecting the destination table using the Variable Name option in the OLEDB destination. I'm getting the error: "OLE DB Destination" failed validation and returned validation status "VS_NEEDSNEWMETADATA".
I understand from what I've read that dynamic column validation is not possible in SSIS. But then, why is it possible to select the destination table in the OLE DB Destination using a variable name? Isn't that dynamic column mapping?
What I'm trying to do is create a Foreach Loop to read a list of tables and import these tables from the source db to the staging area. Using the table-name-from-variable option within the OLE DB Destination seems perfect to me, but it does not work, even after enabling DelayValidation on the Data Flow.
Thanks,
Rodrigo
Why would I use a TableName from Variable for my OLE DB Destination?
I automate the heck out of my SSIS package development. Instead of having to specify each table name, I have a variable called FullyQualifiedName that I populate once and then reuse for my package. Think of a truncate-and-reload pattern: an Execute SQL Task to clear out the target table, a Foreach Loop to load all the files (either because the names are dynamic or because I have multiple days' worth of data to load), and then archive the file. I'd need to reference that table at least twice in that scenario. By having the table name in a variable, I can define it once and reference it in many different locations.
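As a rough sketch of that pattern (the expression itself is my illustration; only the FullyQualifiedName variable comes from the answer), the Execute SQL Task's SqlStatementSource property can be driven by an expression such as:
"TRUNCATE TABLE " + @[User::FullyQualifiedName]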
I have worked in environments where we physically isolate data based on the customer, e.g. Blackstone.Sales, Yampas.Sales, Ranger.Sales, etc. When the customer logs in, their account can only access data in their schema. The tables are identical in structure but they have different names to ensure isolation. For a scenario like that, you could be matching file name to target table and therefore want to use a variable to control which table is written to.
As you've already determined, you cannot accomplish dynamic column mapping in the manner you are attempting. If it's a straight copy from source to your staging environment, I'd just use a technology like Biml to generate the packages and be done with it.
I have faced and worked on such requests. No, SSIS won't allow you dynamic column mappings, so I tried something along the lines below:
You first need to use your knowledge of the system and put together a sort of configuration table that tells you the following things (see the sketch after this list):
-Source table (SourceTable)
-Columns to be extracted from the source table (SourceQuery)
HINT: a SELECT query, e.g. SELECT ID, Name, Salary FROM dbo.tblEmployee
-Destination table (DestinationTable)
-Columns which need to be fed from the source
-A few other details like server name/connection properties, etc.
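A hypothetical sketch of such a configuration table (the name ETLMappings is reused from later in this answer; the exact columns and types are my assumptions):
CREATE TABLE dbo.ETLMappings
(
    MappingID INT IDENTITY(1,1) PRIMARY KEY,
    SourceTable NVARCHAR(256),
    SourceQuery NVARCHAR(MAX),             -- e.g. SELECT ID, Name, Salary FROM dbo.tblEmployee
    DestinationTable NVARCHAR(256),
    DestinationInsertQuery NVARCHAR(MAX),  -- e.g. INSERT INTO DestTable1 SELECT ... FROM StgData
    ServerName NVARCHAR(256)
);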
You would need to later traverse through the rows of this table using a ForEach Loop container.
Next, identify the maximum number of columns, and the maximum data length of those columns, across the sources that might be up for extraction. You will need this information to create a staging table.
Create a sort of staging table, let's say StgData. Here I create this table with 50 columns, all of data type NVARCHAR(MAX). The CREATE statement should look like:
CREATE TABLE StgData
(
Column1 NVARCHAR(MAX),
Column2 NVARCHAR(MAX),
Column3 NVARCHAR(MAX),
....
Column50 NVARCHAR(MAX)
)
The raw data would be loaded into StgData.
Now have a ForEach Loop container traverse the configuration table (ETLMappings).
Inside it, use INSERT statements in an Execute SQL Task to load the data.
The script inside the task would look like:
INSERT INTO dbo.StgData
?
? corresponds to the SourceQuery column (which should be captured by the ForEach Loop container).
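One way to build that statement (an assumption on my part, not necessarily what was done here) is to map the SourceQuery value to a package variable in the ForEach Loop and set the task's SqlStatementSource through an expression, for example:
"INSERT INTO dbo.StgData " + @[User::SourceQuery]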
Once StgData is loaded, it should be used to load the DestinationTable (also captured in the ForEach Loop container).
Again, you need a good understanding of the schema and column mapping. The configuration table should have a column which stores the SQL query in the form:
INSERT INTO DestTable1 SELECT Col1, CAST(Col2 as float) Col2 FROM StgData
Something along those lines.
This is just a basic structure. Of course, a lot of formatting and customization has to be added.
