SSIS Data Validation and Data Loading - sql-server

I need suggestions on the best approach from the options listed below. I need to validate Excel file data and load it into SQL Server.
Validations include:
No duplicate columns
Mandatory fields present
Fields not present in the database
In case of an error, I would write to an error log table in the database.
Below is my approach:
Load the data into a temp table in the database
Run the validations
Log any errors
On success, load the data into the main tables
Please let me know if you have any better ideas for this scenario.

Here are a couple of possible approaches:
1. Using SSIS
Create an Excel connection manager, then use a Data Flow Task with an OLE DB Source, a Lookup transform (to eliminate the records NOT needed), and an OLE DB Destination pointing directly at the main table.
You can also choose to redirect or ignore rows that do not satisfy the transformations.
(You can use a Bulk Insert Task if the Excel file is really large, instead of dealing with RBAR processing.)
2. Using T-SQL
Use BULK INSERT, BCP, or OPENROWSET to load the data into a staging table. Be aware that you need the appropriate drivers installed (JET for 32-bit or ACE for 64-bit SQL Server).
Then do the error handling by logging to the error table (RAISERROR, TRY...CATCH) before loading into the main table.
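A minimal T-SQL sketch of option 2, assuming the Excel file is read through the ACE OLE DB provider; the table, column, and file names (Staging_Customers, ErrorLog, dbo.Customers, Customers.xlsx) are illustrative assumptions, not names from the question:

    -- Sketch only. Requires the ACE provider and 'Ad Hoc Distributed Queries' enabled.
    BEGIN TRY
        -- 1. Load the Excel sheet into a staging table (created on the fly here).
        SELECT *
        INTO   dbo.Staging_Customers
        FROM   OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                          'Excel 12.0;Database=C:\Data\Customers.xlsx;HDR=YES',
                          'SELECT * FROM [Sheet1$]');

        DECLARE @Errors int = 0;

        -- 2. Validation: duplicate keys.
        INSERT INTO dbo.ErrorLog (ErrorMessage, LoggedAt)
        SELECT 'Duplicate CustomerCode: ' + CAST(CustomerCode AS nvarchar(50)), SYSDATETIME()
        FROM   dbo.Staging_Customers
        GROUP BY CustomerCode
        HAVING COUNT(*) > 1;
        SET @Errors += @@ROWCOUNT;

        -- 3. Validation: mandatory fields.
        INSERT INTO dbo.ErrorLog (ErrorMessage, LoggedAt)
        SELECT 'Missing mandatory field(s) for CustomerCode '
               + ISNULL(CAST(CustomerCode AS nvarchar(50)), '<null>'), SYSDATETIME()
        FROM   dbo.Staging_Customers
        WHERE  CustomerCode IS NULL OR CustomerName IS NULL;
        SET @Errors += @@ROWCOUNT;

        -- 4. Load into the main table only when everything passed.
        IF @Errors = 0
            INSERT INTO dbo.Customers (CustomerCode, CustomerName)
            SELECT CustomerCode, CustomerName
            FROM   dbo.Staging_Customers;
    END TRY
    BEGIN CATCH
        INSERT INTO dbo.ErrorLog (ErrorMessage, LoggedAt)
        VALUES (ERROR_MESSAGE(), SYSDATETIME());
    END CATCH;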

Related

Methods to transfer Tables from source database to destination database using SSIS dynamically

I am relatively new to SSIS and have to come up with an SSIS package for work that dynamically moves certain tables from one SQL Server database to another. I have the following constraints that need to be met:
Source and destination table names may differ, so directly copying a table with the Transfer SQL Server Objects Task does not work.
Only certain columns may be transferred from the source table to the destination table.
This package needs to run every 5 minutes, so it has to be relatively fast.
The transfer must be dynamic, such that if there are new source tables, the package does not need to be reconfigured with hard-coded values.
I have the following ideas for now:
Use the Transfer SQL Server Objects Task, but I'm not sure the above requirements can be met, especially the selective transfer of tables and dynamic mapping of columns.
Use SqlBulkCopy in a Script Component to perform the migration.
I would appreciate it if anyone could give some direction on how I can go about meeting these requirements, and whether my existing ideas are possible.

Dynamic column mapping for both Source and destination in data flow tasks from Oracle to SQL Server

We have around 5000 tables in Oracle and the same 5000 tables in SQL Server. Each table's columns change frequently, but at any point in time the source and destination columns are always the same. Creating 5000 Data Flow Tasks is a big pain, and the mappings would need to be redone every time a table definition changes, such as when a column is added or removed.
I tried SSMA (SQL Server Migration Assistant for Oracle), but it is very slow for transferring huge amounts of data, so I moved to SSIS.
I have followed the approach below in SSIS:
1. I created a staging (control) table that holds the table name, the source query (Oracle), and the target query (SQL Server), used that table in an Execute SQL Task, and stored the result set as a full result set (a sketch of this table follows below).
2. I created a Foreach Loop Container over that Execute SQL Task result set, with the object variable and three variables: table name, source query, and destination query.
3. In the Data Flow Task source, I chose an OLE DB Source for the Oracle connection and set the data access mode to "SQL command from variable" (passing the source query from the loop mapping variable).
4. In the Data Flow Task destination, I chose an OLE DB Destination for the SQL Server connection and set the data access mode to "SQL command from variable" (passing the target query from the loop mapping variable).
I am looping this over all 5000 tables, but it is not working. Can you please guide me on how to set this up dynamically for 5000 tables from Oracle to SQL Server using SSIS? Any sample code/help would be greatly appreciated. Thanks in advance.
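For reference, the staging/control table described in step 1 might look roughly like this (the table and column names are illustrative assumptions, not taken from an actual implementation):

    -- Illustrative control table driving the Foreach Loop (names are assumptions).
    CREATE TABLE dbo.TableTransferControl
    (
        TableName        sysname       NOT NULL,
        SourceQuery      nvarchar(max) NOT NULL,  -- query executed against Oracle
        DestinationQuery nvarchar(max) NOT NULL   -- target query on SQL Server
    );

    -- One row per table; the Execute SQL Task returns this as a full result set
    -- that the Foreach Loop shreds into the three package variables.
    INSERT INTO dbo.TableTransferControl (TableName, SourceQuery, DestinationQuery)
    VALUES ('EMPLOYEES',
            'SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME FROM HR.EMPLOYEES',
            'SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME FROM dbo.EMPLOYEES');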
Using SSIS, when thinking about a dynamic source or destination, you have to take into consideration that the only case where you can do that is when the metadata is well defined at run time. In your case:
"Each table's columns vary frequently but at any point in time source and destination columns will always be the same."
You have to think about building packages programmatically rather than looping over tables.
Yes, you can use loops if you can classify the tables into groups based on their metadata (column names, data types, ...). Then you can create a package for each group.
If you are familiar with C#, you can dynamically import tables without the need for SSIS. You can refer to the following project to learn more about reading from Oracle and importing to SQL Server using C#:
GitHub - SchemaMapper
Here are some links you can refer to for more information about creating packages programmatically and dynamic column mapping:
How to manage SSIS script component output columns and its properties programmatically
How to Map Input and Output Columns dynamically in SSIS?
Implementing Foreach Looping Logic in SSIS

What is the equivalent of 'SELECT * INTO' in SSIS

I am building an SSIS package in which I need to transfer some tables from an OData source into SQL Server.
So far, I have implemented an "insert into" query to SQL Server from the tables I read from the OData source. Because there are 10+ tables, is there a way I can do a "select into" query for a faster transfer of those tables in SSIS?
SSIS has no built-in operation to create a table on a destination based on a data set, which is what SELECT ... INTO does.
There is no easy tweak to do this either; SSIS is mostly designed for static-metadata ETL, that is, performing operations between different sources and destinations with consistent structures and data types. You might achieve what you need with custom scripts, but that would also be completely outside of SSIS.
If you already know the data you will be inserting into, create the destination tables first (with CREATE TABLE) and then use SSIS to map the corresponding columns. If your destination tables will be dynamic then you will have a hard time using regular SSIS operations to match the metadata of each table, since this is set at design time.
If the problem isn't the table's column data type but the speed of the operation (SELECT ... INTO has minimal logging), then the fastest option is using the bulk insert operation on the destination component when working with SQL Server. It will be faster than regular inserts, but usually slower than performing a SELECT ... INTO directly from SQL.
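For comparison, a minimal T-SQL sketch of the two patterns discussed above; the table and column names are illustrative:

    -- SELECT ... INTO: creates dbo.Orders_Copy from the query's metadata and,
    -- under the right conditions, is minimally logged.
    SELECT *
    INTO   dbo.Orders_Copy
    FROM   dbo.Orders;

    -- The SSIS-friendly equivalent: pre-create the destination at design time,
    -- then let the package (or a plain INSERT ... SELECT) load it.
    CREATE TABLE dbo.Orders_Copy2
    (
        OrderId    int          NOT NULL,
        CustomerId int          NOT NULL,
        OrderDate  datetime2(0) NOT NULL
    );

    INSERT INTO dbo.Orders_Copy2 (OrderId, CustomerId, OrderDate)
    SELECT OrderId, CustomerId, OrderDate
    FROM   dbo.Orders;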

Dynamically create destination table from source server with SSIS

I need a bit of advice on how to solve the following task:
I have a source system based on IBM DB2 (IBMDA400) which has a lot of tables whose structure changes rapidly, on a daily basis. I must load specified tables from DB2 into a MSSQL 2008 R2 server. Therefore, I thought using SSIS would be the best choice.
My first attempt was just to add both data sources, drop all tables in MSSQL, and recreate them with a "Select * Into #Table From #Table". But I was not able to get this working because I could not connect the two OLE DB connections. I also tried this with an OPENROWSET statement, but the SQL Server does not allow that for security reasons and I am not allowed to change that.
My second try was to manually read the tables from the source, drop and recreate the tables with a Foreach Loop, and then load the data via the Data Flow Task. But I got stuck on getting the metadata from the Execute SQL Task... so I couldn't get the column names and types.
I cannot believe that this is so hard to achieve. Why is there no "create table if not exists" checkbox on the Data Flow Task?
Of course, I searched for the problem here before but could not find a solution.
Thanks in advance,
Pad
This is the solution I ended up with:
1. Create a file/table which is used to select the source tables.
2. Important: create a linked server on your SQL instance, or a working connection string for OPENROWSET (I was not able to do the latter, so I chose the linked server).
3. Query the source file/table.
4. Loop through the result set.
5. Use variables and a Script Task to build your query.
6. Drop the destination table.
7. Build another query string with INSERT INTO ... FROM OPENROWSET (or, if you used a linked server, OPENQUERY); a sketch follows below.
8. Execute this statement.
Done.
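As a rough illustration of what steps 6-8 could generate at run time, here is one possible statement, assuming a linked server named DB2_LINK and illustrative schema/table names; SELECT ... INTO is used here so the dropped table is recreated from the source metadata (the INSERT INTO variant from step 7 would require the table to already exist):

    -- Example of the statement the Script Task might build for one table
    -- (server, library, and table names are assumptions).
    IF OBJECT_ID('dbo.CUSTOMER', 'U') IS NOT NULL
        DROP TABLE dbo.CUSTOMER;

    SELECT *
    INTO   dbo.CUSTOMER
    FROM   OPENQUERY(DB2_LINK, 'SELECT * FROM MYLIB.CUSTOMER');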
As I said above, I am not quite happy with this, but for now it should be OK. I will update this if I find another solution.

MaxDB Data and Schema Export to SQL Server 2005/8

I am tasked with exporting the data contained inside a MaxDB database to SQL Server 200x. I was wondering if anyone has gone through this before and what your process was.
Here is my idea, but it's not automated:
1) Export data from MaxDB for each table as a CSV.
2) Clean the CSV to remove ? (which it uses for nulls) and fix the date strings.
3) Use SSIS to import the data into tables in SQL Server.
I was wondering if anyone has tried linking MaxDB to SQL Server or what other suggestions or ideas you have for automating this.
Thanks.
AboutDev.
I managed to find a solution to this. There is an open-source MaxDB library that allows you to connect to it through .NET, much like the SQL Server provider. You can use it to get schema information and data, then write a little code to generate scripts to run in SQL Server that create the tables and insert the data.
MaxDb Data Provider for ADO.NET
If this is a one-time thing, you don't have to have it all automated.
I'd pull the CSVs into SQL Server tables and keep them forever; it will help with any questions a year from now. You can prefix them all the same way, "Conversion_" or whatever. Put no constraints or FKs on these tables. You might consider using varchar for every column (or just the ones that cause problems, or not at all if the data is clean), just to be sure there are no data type conversion issues.
Then pull the data from these conversion tables into the proper final tables. I'd use a single conversion stored procedure to do everything (but I like T-SQL). If the data isn't that large (millions and millions of rows or less), just loop through and build out all the tables, printing log info as necessary, or inserting into exception/bad-data tables as necessary.
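A minimal sketch of that conversion-table pattern, assuming illustrative file, table, and column names and a dd.mm.yyyy date format in the export (adjust the CONVERT style to whatever the MaxDB export actually produces):

    -- Raw, all-varchar conversion table loaded straight from the exported CSV.
    CREATE TABLE dbo.Conversion_Customer
    (
        CustomerId   varchar(50)  NULL,
        CustomerName varchar(200) NULL,
        CreatedDate  varchar(50)  NULL
    );

    BULK INSERT dbo.Conversion_Customer
    FROM 'C:\Export\Customer.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- Conversion step: turn MaxDB's '?' placeholders into NULLs and cast to real types.
    INSERT INTO dbo.Customer (CustomerId, CustomerName, CreatedDate)
    SELECT CAST(NULLIF(CustomerId, '?') AS int),
           NULLIF(CustomerName, '?'),
           CONVERT(datetime, NULLIF(CreatedDate, '?'), 104)  -- 104 = dd.mm.yyyy (assumption)
    FROM   dbo.Conversion_Customer;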
