I have an SSIS package that I want to use to update a column in a data warehouse staging table based on the values of a surrogate key mapping table that contains the surrogate key paired with the natural key. Specifically, I want to use the cache Lookup to update the fact staging table so that it contains the surrogate key for the inventory dimension, in the same way that the following SQL would.
UPDATE A
SET A.DWHSurrogateKey = B.DWHSurrogateKey
FROM SaleStagingTable A INNER JOIN inventoryStagingTable B ON B.OLTPInventoryKey = A.OLTPInventoryKey
Unfortunately the nature of the data flow from Lookup transformation to destination means that it creates a whole new row, rather than updating the existing matched row. Is it possible to manipulate SSIS to do this?
Couple of constraints:
My destination is an ADO.NET destination, and we cannot use OLE DB destinations or sources (we need to be able to use named parameters, and you can't do that with OLE DB connections).
I need to do this for multiple dimensions to link them to the fact table, so I can't just push the mapped data to new tables every time, as that becomes really messy and hard to manage
I'd like to be able to do what these guys have suggested but with ADO connectors rather than OLE DB:
http://redsouljaz.wordpress.com/2009/11/30/ssis-update-data-from-different-table-if-data-is-null/
http://www.rad.pasfu.com/index.php?/archives/46-SSIS-Upsert-With-Lookup-Transform.html
For such a simple update I would use an Execute SQL Task and save the hassle of messing around with a data flow. If you have lots of similar updates with different fields and tables, I would store the column and table names in a Foreach Loop Container using a Foreach Item Enumerator. I would then add a Script Task that takes the item names and generates some dynamic SQL, stored in a variable. Next, add the Execute SQL Task and have it use the SQL variable.
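As a rough illustration of the kind of statement the Script Task would generate, here is a minimal sketch written in T-SQL instead of script code (a swapped-in technique, not necessarily how the package would build it). The table and column names are the ones from the question; the variable names are hypothetical and would be filled from the Foreach Item Enumerator.

```sql
-- Hypothetical inputs; in the package these would come from the Foreach Item Enumerator.
DECLARE @FactTable    sysname = N'SaleStagingTable';
DECLARE @DimTable     sysname = N'inventoryStagingTable';
DECLARE @NaturalKey   sysname = N'OLTPInventoryKey';
DECLARE @SurrogateKey sysname = N'DWHSurrogateKey';

-- QUOTENAME brackets each identifier so an unusual name cannot break the statement.
DECLARE @sql nvarchar(max) =
      N'UPDATE A SET A.' + QUOTENAME(@SurrogateKey) + N' = B.' + QUOTENAME(@SurrogateKey)
    + N' FROM ' + QUOTENAME(@FactTable) + N' AS A'
    + N' INNER JOIN ' + QUOTENAME(@DimTable) + N' AS B'
    + N' ON B.' + QUOTENAME(@NaturalKey) + N' = A.' + QUOTENAME(@NaturalKey) + N';';

PRINT @sql;                    -- inspect the generated statement first
EXEC sys.sp_executesql @sql;   -- then run it
```

Each pass through the loop only changes the four input values, so one Execute SQL Task can serve every dimension.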
Related
I am new to using SSIS and am on my third package. We are taking data from Oracle into SQL Server. On my Oracle table, the unique key is called recnum and is numeric(12,0). In this particular package, I am trying to take the record from Oracle, look it up in a SQL Server table to see if that unique key is found, and if not, add the record to the SQL Server table. My issue is that it wouldn't find a match. After much testing, I came up with the following method that works. But I don't understand why I had to do this.
How I currently have it working:
I get the data from Oracle. In my next step, I added a derived column that uses the Oracle column. (The expression is just that field, no other formatting.) Then in the lookup I use the derived column instead of the column from Oracle.
We had already done this on another table where the unique key was numeric(8,0) and it worked ok without needing a derived column.
SSIS is very fussy about data types; lookups only work nicely if the data types on both sides match.
Double-click the data path lines between data flow objects to check the data types. I use Data Conversion transformations or CAST expressions in the source queries to force matching data types when I use lookups.
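For example, the CAST can go into the query that feeds the lookup, so the reference column is declared with exactly the type the incoming key has. This is only a sketch: dbo.TargetTable is a hypothetical name, and it assumes the SQL Server column is also called recnum.

```sql
-- Lookup reference query on the SQL Server side.
-- dbo.TargetTable is hypothetical; recnum and numeric(12,0) come from the question.
SELECT CAST(recnum AS numeric(12, 0)) AS recnum
FROM dbo.TargetTable;
```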
Hope this helps.
We have a large production MSSQL database (mdf approx. 400 GB) and I have a test database. All the tables, indexes, views, etc. are identical in both. I need to make sure that the data in the tables of these two databases is consistent, so every night I need to insert all the new rows and apply all the updated rows from production into the test DB.
I came up with the idea of using SSIS packages to keep the data consistent by checking for updated rows and new rows in all the tables. My SSIS flow is:
I have a separate package in SSIS for each table, because:
In order:
1. I read the timestamp value in the table so that I fetch only the last day's rows instead of the whole table.
2. I get those rows from the table in production.
3. Then I use the 'Lookup' tool to compare this data with the data in the test database table.
4. Then I use a Conditional Split to tell whether the data is new or updated.
5.1. If the data is new, I insert it into the destination.
5.2. If the data is updated, I update it in the destination table.
Data flow is in the MTRule and STBranch package in the picture
The problem is that I'm repeating this same flow for each table, and I have more than 300 tables like this. It takes hours and hours :(
What I'm asking is:
Is there any way in SSIS to do this dynamically?
PS: Every single table has its own columns and PK values, but my data flow schema is always the same. (Below)
You can look into BimlScript, which lets you create packages dynamically based on metadata.
I believe the best way to achieve this is to use Expressions. They empower you to dynamically set the source and Destination.
One possible solution might be as follows:
Create a table which stores all your table names and PK columns.
Define a package which loops through this table and builds a SQL statement for each row.
Call your main package and pass the statement to it.
Use the statement as the data source for your data flow.
If applicable, pass the destination table as a parameter as well (another column in your config table).
This is how I processed several really huge tables: the data had to be fetched from 20 tables and moved to one single table.
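The first two steps could be sketched as follows. Everything here is an assumption for illustration: the table dbo.SyncConfig, its columns, the key column names, and a datetime change-tracking column called ModifiedAt. Only MTRule and STBranch are taken from the question.

```sql
-- Hypothetical configuration table: one row per table to synchronise.
CREATE TABLE dbo.SyncConfig
(
    SourceTable      sysname NOT NULL,
    DestinationTable sysname NOT NULL,
    PkColumn         sysname NOT NULL,
    TimestampColumn  sysname NOT NULL   -- assumed to be a datetime column
);

INSERT INTO dbo.SyncConfig (SourceTable, DestinationTable, PkColumn, TimestampColumn)
VALUES (N'MTRule',   N'MTRule',   N'RuleId',   N'ModifiedAt'),
       (N'STBranch', N'STBranch', N'BranchId', N'ModifiedAt');

-- One source statement per configured table; the looping package would read
-- this result set and hand each statement to the main package.
SELECT DestinationTable,
       PkColumn,
       N'SELECT * FROM ' + QUOTENAME(SourceTable)
     + N' WHERE ' + QUOTENAME(TimestampColumn)
     + N' >= DATEADD(DAY, -1, GETDATE());' AS SourceStatement
FROM dbo.SyncConfig;
```

Keeping the statement text in a query result means a new table is added by inserting one configuration row, not by building another package.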
Why do you need to use SSIS?
You are better off writing a stored procedure that takes the tablename as parameter and doing your CRUD there. Then call the stored procedure in a FOR EACH component in SSIS.
In fact you might be able to do everything using a Stored Procedure and scheduling it in a SQL Agent Job.
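A minimal sketch of such a procedure is below, covering only the insert of new rows. It rests on several assumptions that are not in the question: the procedure lives in the test database, production is reachable on the same server as a database named ProdDb, every table has a single-column primary key, and no table has identity or rowversion columns (otherwise an explicit column list built from sys.columns would be needed). The procedure and parameter names are hypothetical.

```sql
CREATE PROCEDURE dbo.SyncNewRows
    @TableName sysname,   -- table that exists in both databases
    @PkColumn  sysname    -- single-column primary key (assumption)
AS
BEGIN
    SET NOCOUNT ON;

    -- Refuse names that are not user tables in the current (test) database.
    IF OBJECT_ID(N'dbo.' + QUOTENAME(@TableName), N'U') IS NULL
    BEGIN
        RAISERROR(N'Unknown table: %s', 16, 1, @TableName);
        RETURN;
    END;

    -- Copy rows whose key is not yet present in the test table.
    DECLARE @sql nvarchar(max) =
          N'INSERT INTO dbo.' + QUOTENAME(@TableName)
        + N' SELECT src.* FROM ProdDb.dbo.' + QUOTENAME(@TableName) + N' AS src'
        + N' WHERE NOT EXISTS (SELECT 1 FROM dbo.' + QUOTENAME(@TableName) + N' AS dst'
        + N' WHERE dst.' + QUOTENAME(@PkColumn) + N' = src.' + QUOTENAME(@PkColumn) + N');';

    EXEC sys.sp_executesql @sql;
END;
```

A call such as EXEC dbo.SyncNewRows @TableName = N'MTRule', @PkColumn = N'RuleId'; (the key name is made up) would then run once per table from the loop or from an Agent job step. Updated rows would need a similar dynamically built UPDATE.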
I have been searching for about a week now and I was wondering if anyone may have a clue. I wrote a package to do the following:
loop through a parent folder and its subfolders for a csv with a particular naming structure (works)
Create a table for each .csv based on the enumeration of each file (works).
Import the data into SQL Server, each file into its own table, using the table name that was created from the file name, through an OLE DB Destination (this part does not work). It works if there is a destination folder for everything, but it does not work when I use a table variable.
What I did was add an Execute SQL Task to the Foreach container to create a table, with a variable for the file path that is mapped as an expression in the Foreach container and used in a CREATE TABLE query under the SqlStatementSource property expression. The tables are created, but when I use the variable that was mapped for the Foreach loop as the table name variable in the OLE DB Destination, I get an error asking me to check whether the table exists. So the tables are created, but I cannot get the data inserted into them, even when I bypass the "Destination table has not been provided" error and run the package. I set DelayValidation to true and still nothing. From what I have seen so far, SSIS does some cool things. However, I am stuck right now. What else am I doing wrong?
I forgot to mention that the data is going to SQL Server.
Thanks for everything.
You can't use a variable for the table name in an OLE DB Destination when the target tables have different columns. At design time the OLE DB Destination needs to know the table and its columns, so that it can pre-map the data flow to the table columns.
You have a couple of other options:
You can use Biml to dynamically create your data flows and destinations.
You can use an OLE DB Command transformation as the last step of your data flow, and have it run a dynamic SQL statement that inserts each row in the data flow into the desired table.
I'm trying to use Dynamic Column mapping by selecting the destination table using the Variable Name option in the OLEDB destination. I'm getting the error: "OLE DB Destination" failed validation and returned validation status "VS_NEEDSNEWMETADATA".
I understand from what I've read that Dynamic column validation is not possible in SSIS. But then, why is it possible to select table destination in OLEDB using a variable name? Isn't it dynamic column mapping?
What I'm trying to do is to create a foreach loop to read a list of tables and import these tables from the source db to the staging area. Using the Variable Name destination within OLEDB seems perfect to me, but it does not work, even by enabling DelayValidation in the dataflow.
Thanks,
Rodrigo
Why would I use a TableName from Variable for my OLE DB Destination?
I automate the heck out of my SSIS package development. Instead of having to specify each table name, I have a variable called FullyQualifiedName that I populate once and then reuse throughout my package. Think of a truncate and reload pattern: an Execute SQL Task to clear out the target table, a Foreach loop to load all the files (either because the names are dynamic or because I have multiple days' worth of data to load), and then a step to archive the file. I'd need to reference that table at least twice in that scenario. By having the table name in a variable, I can define it once and reference it in many different locations.
I have worked in environments where we physically isolate data based on the customer, e.g. Blackstone.Sales, Yampas.Sales, Ranger.Sales, etc. When the customer logs in, their account can only access data in their schema. The tables are identical in structure but they have different names to ensure isolation. For a scenario like that, you could be matching file name to target table and therefore want to use a variable to control which table is written to.
As you've already determined, you cannot accomplish dynamic column mapping in the manner you are attempting. If it's a straight copy from source to your staging environment, I'd just use a technology like Biml to generate the packages and be done with it.
I have faced and worked on such requests. No, SSIS won't allow dynamic column mappings. So I tried something along the lines of the following:
You need to first use your knowledge of the system and put together a sort of configuration table that would tell you the following things -
- Source table (SourceTable)
- Columns to be extracted from the source table (SourceQuery)
  HINT: a SELECT query, e.g. SELECT ID, Name, Salary from dbo.tblEmployee
- Destination table (DestinationTable)
- Columns which need to be fed from the source
- A few other details like server name, connection properties, etc.
You would need to later traverse through the rows of this table using a ForEach Loop container.
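A configuration table along those lines might be declared as in the sketch below. ETLMappings is the name this answer uses for the table further down; SourceTable, SourceQuery and DestinationTable come from the list above, while MappingId, DestinationColumns, LoadQuery, ServerName and all data types are assumptions for illustration.

```sql
-- Sketch of the configuration table; types and several column names are assumed.
CREATE TABLE dbo.ETLMappings
(
    MappingId          int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    SourceTable        sysname       NOT NULL,
    SourceQuery        nvarchar(max) NOT NULL,  -- e.g. SELECT ID, Name, Salary FROM dbo.tblEmployee
    DestinationTable   sysname       NOT NULL,
    DestinationColumns nvarchar(max) NULL,      -- columns to be fed from the source
    LoadQuery          nvarchar(max) NULL,      -- INSERT ... SELECT that moves staged rows onward
    ServerName         sysname       NULL       -- connection details, if several servers are involved
);
```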
Next, identify the maximum number of columns you might extract from any source, and the maximum length of the data types in those columns. You will need this information shortly to create a staging table.
Create a sort of staging table, let's say StgData. I will create this table with 50 columns, all of data type NVARCHAR(MAX). The CREATE statement should look like:
CREATE TABLE StgData
(
Column1 NVARCHAR(MAX),
Column2 NVARCHAR(MAX),
Column3 NVARCHAR(MAX),
....
Column50 NVARCHAR(MAX)
)
The raw data would be loaded into StgData.
Now have a ForEach loop container traverse ETLMappings (the configuration table).
Inside this, you would have to use INSERT statements in an Execute SQL Task to load the data.
The script inside the task would look like:
INSERT INTO dbo.StgData
?
Here ? stands for the SourceQuery column (which should be captured by the ForEach container). Because a parameter marker can only supply a value, not statement text, in practice the full statement has to be assembled in an SSIS variable through an expression and used as the task's SQL source.
Once StgData is loaded, it should be used to load the DestinationTable (also captured in the ForEach loop container).
Now again you need to have a good understanding of the schema and column mapping. The configuration table should have a column which stores the SQL query in the form
INSERT INTO DestTable1 SELECT Column1, CAST(Column2 as float) Column2 FROM StgData
Something along those lines.
This is just a basic structure. Of course, a lot of formatting and customization has to be added.
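One iteration of that loop could be sketched in T-SQL as follows. The three variables stand for values read from one configuration row and are hypothetical, as are the column names of DestTable1; the staging column list is an addition of this sketch, needed because StgData has 50 columns while the source query returns only a few.

```sql
-- Values for one configuration row (hypothetical; the ForEach loop would supply them).
DECLARE @SourceQuery    nvarchar(max) = N'SELECT ID, Name, Salary FROM dbo.tblEmployee';
DECLARE @StagingColumns nvarchar(max) = N'Column1, Column2, Column3';
DECLARE @LoadQuery      nvarchar(max) =
    N'INSERT INTO dbo.DestTable1 (ID, Name, Salary) '
  + N'SELECT CAST(Column1 AS int), Column2, CAST(Column3 AS float) FROM dbo.StgData;';

DECLARE @sql nvarchar(max);

-- Start each iteration with an empty staging table.
TRUNCATE TABLE dbo.StgData;

-- Stage the raw rows; values are converted to NVARCHAR(MAX) on insert.
SET @sql = N'INSERT INTO dbo.StgData (' + @StagingColumns + N') ' + @SourceQuery + N';';
EXEC sys.sp_executesql @sql;

-- Move the staged rows to the destination, casting back to the real types.
EXEC sys.sp_executesql @LoadQuery;
```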
I have a desktop application through which data is entered and captured in an MS Access DB. The application is used by multiple users (at different locations). The idea is to download the data entered for a particular day into an Excel sheet and load it into a centralized server, which is an MSSQL Server instance.
That is, data (in the form of Excel sheets) will come from multiple locations and be saved into a shared folder on the server, from which it needs to be loaded into SQL Server.
There is an ID column with IDENTITY in the SQL Server table, which is the primary key column, and no other column in the table contains unique values. Though the data is coming from multiple sources, we need to maintain a single auto-incrementing series (IDENTITY).
Suppose there are 2 sources:
Source1: Has 100 records entered for the day.
Source2: Has 200 records entered for the day.
When they get loaded into the destination (SQL Server), the table should have 300 records, with ID column values from 1 to 300.
Also, the next day, when the data comes from the sources, the destination has to continue loading from ID 301.
The issue is that there may be requests to change data at a source that has already been loaded into the central server. How do I update that row in the central server, given that the ID column value will not be the same in source and destination? As mentioned earlier, ID is the only column in the table with unique values.
Please suggest some ideas to do this, or tell me whether I have to take a different approach to accomplish this task.
Thanks in advance!
Krishna
Okay so first I would suggest .NET and doing it through a File Stream Reader, dumping it to the disconnected layer of ADO.NET in a DataSet with multiple DataTables from the different sources. But... you mentioned SSIS so I will go that route.
Create an SSIS project in Business Intelligence Development Studio(BIDS).
If you know for a fact that you are just importing a bunch of Excel files, I would create either many 'Data Flow Task's or many source-to-destination flows in a single 'Data Flow Task'; it's up to you.
a. Personally I would create a table in a database for each Excel file location and have their columns match up. I will explain why later.
b. In a data flow task, select 'Excel Source' as the source. Double-click the Excel Source and enter the file's location under 'new connection'.
c. Choose an ADO NET Destination and drag the blue line from the Excel Source to this endpoint.
d. Map your destination to the matching SQL table.
e. Repeat as needed for each Excel file.
Set up the SSIS package to run automatically from SQL Server through SQL Server Management Studio. Remember to connect to an Integration Services instance, not a database instance.
Okay, so now you have a bunch of tables instead of one big one, right? I did that for a reason: these should be entry points, and I would leave the logic that determines dupes and import time to another table.
I would set up another two tables for the combination of logic and for auditing later.
a. Create a table like 'Imports' or similar, with the same columns plus three more: 'ExcelFileLocation', 'DateImported', and an 'identity' column as the first column, seeded with the default of (1,1) and assigned as the primary key.
b. Create a second table like 'ImportDupes' or similar, and repeat the process above for the columns.
c. Create a unique constraint on the first table on either a value or a set of values that makes an import unique.
d. Write a 'procedure' in SQL that inserts from the MANY tables that match up to the Excel files into the ONE 'Imports' location. For each of the many inserts, do a process similar to:
BEGIN TRY
    -- INSERT ... SELECT takes no VALUES keyword
    INSERT INTO Imports (datacol1, datacol2, ExcelFileLocation, DateImported)
    SELECT datacol1, datacol2, '(location of file)', GETDATE()  -- placeholder for the file path
    FROM TableExcel1;
END TRY
BEGIN CATCH
    -- if the insert breaks the unique constraint, put the rows into the second table
    INSERT INTO ImportDupes (datacol1, datacol2, ExcelFileLocation, DateImported)
    SELECT datacol1, datacol2, '(location of file)', GETDATE()
    FROM TableExcel1;
END CATCH;
-- repeat above for EACH excel table
-- clean up the individual staging tables for the next import cycle for EACH excel table
TRUNCATE TABLE TableExcel1;
e. Automate the procedure so that it runs on a schedule.
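The two tables from steps a to c might be declared roughly like this. It is only a sketch: the key column names, the constraint name, and every data type are assumptions, and datacol1 and datacol2 stand in for whatever columns the Excel files really have.

```sql
-- Successful imports; the unique constraint defines what counts as a duplicate.
CREATE TABLE dbo.Imports
(
    ImportId          int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    datacol1          nvarchar(100) NOT NULL,   -- assumed type
    datacol2          nvarchar(100) NOT NULL,   -- assumed type
    ExcelFileLocation nvarchar(260) NOT NULL,
    DateImported      datetime      NOT NULL,
    CONSTRAINT UQ_Imports_Data UNIQUE (datacol1, datacol2)
);

-- Same shape, but without the unique constraint, so rejected rows can always land here.
CREATE TABLE dbo.ImportDupes
(
    ImportDupeId      int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    datacol1          nvarchar(100) NOT NULL,
    datacol2          nvarchar(100) NOT NULL,
    ExcelFileLocation nvarchar(260) NOT NULL,
    DateImported      datetime      NOT NULL
);
```

With the constraint in place, a later change request from a source can be matched on (datacol1, datacol2) instead of on the identity value, which differs between source and destination.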
You now have two tables, one for successful imports and one for duplicates.
The reason I did what I did is two fold:
You often need to know more than just the data itself: when it came in, what source it came from, whether it was a duplicate, and, if you do this for millions of rows, whether it can be indexed easily.
This model is easier to take apart and automate. It may be more work to set up but if a piece breaks you can see where and easily stop the import for one location by turning off the code in a section.