SSIS dynamic columns validation - sql-server

I'm trying to use Dynamic Column mapping by selecting the destination table using the Variable Name option in the OLEDB destination. I'm getting the error: "OLE DB Destination" failed validation and returned validation status "VS_NEEDSNEWMETADATA".
I understand from what I've read that Dynamic column validation is not possible in SSIS. But then, why is it possible to select table destination in OLEDB using a variable name? Isn't it dynamic column mapping?
What I'm trying to do is to create a foreach loop to read a list of tables and import these tables from the source db to the staging area. Using the Variable Name destination within OLEDB seems perfect to me, but it does not work, even by enabling DelayValidation in the dataflow.
Thanks,
Rodrigo

Why would I use a TableName from Variable for my OLE DB Destination?
I automate the heck out of my SSIS package development. Instead of having to specify each table name, I have a variable called FullyQualifiedName that I populate once and then reuse for my package. Think of a truncate and reload pattern: Execute SQL Task to clear out the target table, A Foreach loop to load all the files-either because the names are dynamic or I have multiple days worth of data to load, and then Archive the file. I'd need to reference that table at least twice in that scenario. By having the table name in a variable, I can define it once and reference it in many different locations.
I have worked in environments where we physically isolate data based on the customer. i.e Blackstone.Sales, Yampas.Sales, Ranger.Sales, etc. When the customer logs in, their account can only access data in their schema. The tables are identical in structure but they have different names to ensure isolation. For a scenario like that, you could be matching file name to target table and therefore want to use a Variable to control what table is written to.
As you've already determined, you cannot accomplish dynamic column mapping in the manner you are attempting. If it's a straight copy from source to your staging environment, I'd just use a technology like Biml to generate the packages and be done with it.

I have faced and worked on such requests. NO, SSIS won't allow you dynamic column mappings. So I had tried something on the lines of below:
You need to first use your knowledge of the system and put together a sort of configuration table that would tell you the following things -
-Source Table(SourceTable)
-Columns to be extracted from source table(SourceQuery)
HINT: A SELECT query..e.g. SELECT ID, Name, Salary from dbo.tblEmployee
-Destination Table(DestinationTable)
-Columns which need to be fed from the source
-Few other details like server name/connection properties etc..
You would need to later traverse through the rows of this table using a ForEach Loop container.
Next, identify the maximum number of columns and maximum length of data types in these columns, in the source that might be up for extracting. You would need to create a table with information soon.
Create a sort of staging table let's say StgData. I will create this table with 50 columns, all of data type NVARCHAR(MAX). The CREATE statement should look like:
CREATE TABLE StgData
(
Column1 NVARCHAR(MAX),
Column2 NVARCHAR(MAX),
Column3 NVARCHAR(MAX),
....
Column50 NVARCHAR(MAX)
)
The raw data would be loaded onto StgData.
Now have a ForEach loop container traversing through ETLMappings.
Inside this, you would have to use INSERT statements in Execute SQL Task to load the data.
The script inside the task would look like:-
INSERT INTO dbo.StgData
?
? corresponds to the SourceQuery column(which should be captured by ForEach container.
Once the StgData is loaded, it should be used to load the DestinationTable(also captured in ForEach loop container)
Now again you need to have good understanding on schema and column mapping. The configuration table should have a column which stores the SQL query in the form
INSERT INTO DestTable1 SELECT Col1, CAST(Col2 as float) Col2 FROM StgData
Something on those lines.
This is just a basic structure. Ofcourse lot of formatting and customization has to be added.

Related

How to automate export of multiple tables to delimited text in SSIS?

I am trying to create some sort of automation whereby I can generate a series of pipe-delimited text extracts for about 100 different tables each month. Each extract would be based on a simple query like this:
SELECT *
FROM tablename
WHERE AsOfDate = 'currentmonth'
where both tablename and currentmonth would be variables. The tablename variable name would change for each of the tables but currentmonth would remain the same throughout the execution.
I have been attempting to build an SSIS package that uses a ForEach Loop container that runs through a list of all the table names and passes that variable into a SQL string, which is then used by the OLE DB Data source in the data flow.
However, all of these tables have different columns. Based on what I can tell, it would not be feasible to do a simple OLE DB Source to a Flat File Destination within that loop container since the Flat File Connection Manager must be configured to account for the different columns of each table.
Would there be any feasible way to do this outside of configuring the process manually for each of the 100+ tables?
You could look into BiML which programmatically creates your dataflows based on metadata.
Or you could use a Script task that loops through the tables, loops through their columns, and generates text files instead of using any dataflow at all.

How to use Sybase LOAD TABLE correctly?

I have been using ISQL (SQLAnywhere 12) to import data from CSVs into existing tables using INPUT INTO and never ran into a problem. Today I needed to import data into a table containing an auto-increment column, however, and thought I just needed to leave that column blank, so I tried it with a file containing only 1 row of data (to be safe). Turns out it imported with a 0 in the auto-increment field instead of the next integer value.
Looking at the Sybase documentation, it seems like I should be using LOAD TABLE instead, but the examples look a bit complex.
My questions are the following...
The documentation says the CSV file needs to be on the database server and not the client. I do not have access to the database server itself - can I load the file from within ISQL remotely?
How do I define the columns of the table I'm loading into? What if I am only loading data into a few columns and leaving the rest as NULL?
To confirm, this will leave existing data in the table as-is and simply add to it using whatever is in the CSV?
Many thanks in advance.
Yes. Check out the online documentation for LOAD TABLE - you can use the USING CLIENT FILE clause.
You can specify the column names in parens after the table name, i.e. LOAD TABLE mytable (col1, col2, col3) USING CLIENT FILE 'mylocalfile.txt'. Any columns not listed here will be set to NULL if the column is nullable or the equivalent to an empty string if it's not - this is why your autoincrement column was set to 0. You can use the DEFAULTS ON clause to get what you want.
Yes, existing data in the table is not affected.

Inserting data into the newly created tables from table variable SSIS and not one single table

I have been searching for about a week now and I was wondering if anyone may have a clue. I wrote a package to do the following:
loop through a parent folder and its subfolders for a csv with a particular naming structure (works)
Create a table for each .csv based on the enumeration of each file (works).
Import the data into sql server in their own tables with the file name that was created as the table name and not OLE DB Destination (which does not work). It works if it there is destination folder for everything, but when I use table variable that does not work.
What I did was add an Execute SQL task to the for each container to create a table with a variable for the file path that is mapped as an expression in the for each container in a create table query under property sqlstatementsource expression. The tables are created, but when I use the variable that was mapped for the for each loop as the table name or variable in OLE DB Destination I get an error asking for me to check if the table exists. The tables are created, but I cannot get the insertion of the data into their own tables. Even when I bypass the error of "Destination table has not been provided" and run the package. I set delayValidation as true and still nothing. SSIS from what I have seen so far does some cool things. However, I am stuck right now. What else am I doing wrong?
I forgot to mention that the data is going to sql server.
Thanks for everything.
You can't create an OLEDB Destination at design time with a variable for a table name. The OLEDB destination needs to know the table name, and the columns, so that it can pre-map the data flow to the table columns.
You have a couple of other options:
You can use BiML to dynamically create your dataflows and destinations.
You can use an ExecuteSQL Transformation as your dataflow destination, and write a dynamic SQL statement that inserts each row in the dataflow to the desired table.

Load data from multiple source into a destination

I have a desktop application through which data is entered and it is being captured in MS Access DB. The application is being used by multiple users(at different locations). The idea is to download data entered for that particular day into an excel sheet and load it into a centralized server, which is an MSSQL server instance.
i.e. data(in the form of excel sheets) will come from multiple locations and saved into a shared folder in the server, which need to be loaded into SQL Server.
There is a ID column with IDENTITY in the MSSQL server table, which is the primary key column and there are no other columns in the table which contains unique value. Though the data is coming from multiple sources, we need to maintain single auto-updating series(IDENTITY).
Suppose, if there are 2 sources,
Source1: Has 100 records entered for the day.
Source2: Has 200 records entered for the day.
When they get loaded into Destination(SQL Server), table should have 300 records, with ID column values from 1 to 300.
Also, for the next day, when the data comes from the sources, Destination has to load data from 301 ID column.
The issue is, there may be some requests to change the data at Source, which is already loaded in central server. So how to update the data for that row in the central server as the ID column value will not be same in Source and Destination. As mentioned earlier ID is the only unique value column in the table.
Please suggest some ides to do this or I've to take up different approach to accomplish this task.
Thanks in advance!
Krishna
Okay so first I would suggest .NET and doing it through a File Stream Reader, dumping it to the disconnected layer of ADO.NET in a DataSet with multiple DataTables from the different sources. But... you mentioned SSIS so I will go that route.
Create an SSIS project in Business Intelligence Development Studio(BIDS).
If you know for a fact you are just doing a bunch of importing of Excel files I would just create many 'Data Flow Task's or many Source to Destination tasks in a single 'Data Flow Task' up to you.
a. Personally I would create tables in a database for each location of an excel file and have their columns map up. I will explain why later.
b. In a data flow task, select 'Excel Source' as the source file. Put in the appropriate location of 'new connection' by double clicking the Excel Source
c. Choose an ADO Net Destination, drag the blue line from the Excel Source to this endpoint.
d. Map your destination to be the table you map to from SQL.
e. Repeat as needed for each Excel destination
Set up the SSIS task to automate from SQL Server through SQL Management Studio. Remember you to connect to an integration instance, not a database instance.
Okay now you have a bunch of tables right instead of one big one? I did that for a reason as these should be entry points and the logic to determinate dupes and import time I would leave to another table.
I would set up another two tables for the combination of logic and for auditing later.
a. Create a table like 'Imports' or similar, have the columns be the same except add three more columns to it: 'ExcelFileLocation', 'DateImported'. Create an 'identity' column as the first column and have it seed on the default of (1,1), assign it the primary key.
b. Create a second table like 'ImportDupes' or similar, repeat the process above for the columns.
c. Create a unique constraint on the first table of either a value or set of values that make the import unique.
c. Write a 'procedure' in SQL to do inserts from the MANY tables that match up to the excel files to insert into the ONE 'Imports' location. In the many inserts do a process similar to:
Begin try
Insert into Imports (datacol1, datacol2, ExcelFileLocation, DateImported) values
Select datacol1, datacol2, (location of file), getdate()
From TableExcel1
End try
-- if logic breaks unique constraint put it into second table
Begin Catch
Insert into ImportDupes (datacol1, datacol2, ExcelFileLocation, DateImported) values
Select datacol1, datacol2, (location of file), getdate()
From TableExcel1
End Catch
-- repeat above for EACH excel table
-- clean up the individual staging tables for the next import cycle for EACH excel table
truncate TableExcel1
d. Automate the procedure to go off
You now have two tables, one for successful imports and one for duplicates.
The reason I did what I did is two fold:
You need to know more detail than just the detail a lot of times like when it came in, from what source it came from, was it a duplicate, if you do this for millions of rows can it be indexed easily?
This model is easier to take apart and automate. It may be more work to set up but if a piece breaks you can see where and easily stop the import for one location by turning off the code in a section.

How do you get an SSIS package to only insert new records when copying data between servers

I am copying some user data from one SqlServer to another. Call them Alpha and Beta. The SSIS package runs on Beta and it gets the rows on Alpha that meet a certain condition. The package then adds the rows to Beta's table. Pretty simple and that works great.
The problem is that I only want to add new rows into Beta. Normally I would just do something simple like....
INSERT INTO BetaPeople
SELECT * From AlphaPeople
where ID NOT IN (SELECT ID FROM BetaPeople)
But this doesn't work in an SSIS package. At least I don't know how and that is the point of this question. How would one go about doing this across servers?
Your example seems simple, looks like you are adding only new people, not looking for changed data in existing records. In this case, store the last ID in the DB.
CREATE TABLE dbo.LAST (RW int, LastID Int)
go
INSERT INTO dbo.LAST (RW, LastID) VALUES (1,0)
Now you can use this to insert the last ID of the row transferred.
UPDATE dbo.LAST SET LastID = #myLastID WHERE RW = 1
When selecting OLEDB source, set data access mode to SQL Command and use
DECLARE #Last int
SET #Last = (SELECT LastID FROM dbo.LAST WHERE RW = 1)
SELECT * FROM AlphaPeople WHERE ID > #Last;
Note, I do assume that you are using ID int IDENTITY for your PK.
If you have to monitor for data changes of existing records, then have the "last changed" column in every table, and store time of the last transfer.
A different technique would involve setting-up a linked server on Beta to Alpha and running your example without using SSIS. I would expect this to be way slower and more resource intensive than the SSIS solution.
INSERT INTO dbo.BetaPeople
SELECT * FROM [Alpha].[myDB].[dbo].[AlphaPeople]
WHERE ID NOT IN (SELECT ID FROM dbo.BetaPeople)
Add a lookup between your source and destination.
Right click the lookup box to open Lookup Transformation Editor.
Choose [Redirect rows to no match output].
Open columns, map your key columns.
Add an entry with the table key in lookup column , lookup operation as
Connect lookup box to destination, choose [Lookup no Match Output]
Simplest method I have used is as follows:
Query Alpha in a Source task in a Dataflow and bring in records to the data flow.
Perform any needed Transformations.
Before writing to the Destination (Beta) perform a lookup matching the ID column from Alpha to those in Beta. On the first page of the Lookup Transformation editor, make sure you select "Redirect rows to no match output" from the dropdown list "Specify how to handle rows with now matching error"
Link the Lookup task to the Destination. This will give you a prompt where you can specify that it is the unmatched rows that you want to insert.
This is the classical Delta detection issue. The best solution is to use Change Data Capture with/without SSIS. If what you are looking for is a once in a life time activity, no need to go for SSIS. Use other means such as linked server and compare with existing records.
The following should solve issue of loading Changed and New records using SSIS:
Extract Data from Source usint Data flow.
Extract Data from Target.
Match on Primary key Add Unmatch records and split matched and unmatched records from Source and Matched records from Target call them Matched_Source,
Unmatch_Source and Matched_Target.
Compare Matched_Source and Matched_Target and Split Matched_Source to Changed and Unchanged.
Null load TempChanged Table.
Add Changed Records to TempChanged.
Execute SQL script/stored proc to Delete Records from Target for primary key in TempChanged and add records in TempChanged to Target.
Add Unmatched_Source to Target.
Another solution would be to use a temporary table.
In the properties for Beta's connection manager, change RetainSameConnection to true (by default SSIS runs each query in it's own connection, this would mean the temporary table would be killed as soon as it has been created).
Create a SQL Task using Beta's connection and use the following SQL to create your temporary table:
SELECT TOP 0 *
INTO ##beta_temp
FROM Beta
Next create a data flow that pulls data from Alpha and loads into ##beta_temp (you will need to run the SQL statement above on SSMS first so that Visual Studio can see the table at design time and you will also need to set the DelayValidation property to true on the Data Flow task).
Now you have two tables on the same server and you can just use your example SQL modified to use the temporary table.
INSERT INTO Beta
SELECT * FROM ##beta_temp
WHERE ID NOT IN (SELECT ID FROM Beta)

Resources