SQL Server stored procedure conversion to SSIS Package - sql-server

Problem: we currently have numerous stored procedures (some very long, up to 10,000 lines) that were written by various developers for various requirements over the last 10 years. These long, complex stored procedures have no proper documentation and have become hard to manage.
We plan to move those stored procedures into SSIS ETL packages.
Has anybody done this in the past? If yes, what approach should one take?
I would appreciate any advice on how to convert stored procedures into SSIS ETL packages.
Thanks

I've done this before, and what worked well for my team was to refactor incrementally, starting with the original source, and then iterate the refactoring effort.
The first step was to attempt to modularize the stored procedure logic into Execute SQL tasks that we chained together. Each task was tested and approved, then we'd integrate and ensure that the new process matched the results of the legacy procedures.
After this point, we could divide the individual Execute SQL tasks across the team, and load-balance the analysis of whether we could further refactor the SQL within the Execute SQL tasks to native SSIS tasks.
Each refactoring was individually unit tested and then integration tested to ensure that the overall process output still behaved like the legacy procedures.
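One way to check that a refactored step still matches the legacy procedure is to load both outputs into side-by-side tables and compare them. The following is a minimal sketch, assuming two hypothetical tables, dbo.OrderHistory_Legacy and dbo.OrderHistory_New, with identical column lists; neither name comes from the original post.

```sql
-- Hypothetical tables: dbo.OrderHistory_Legacy holds the output of the
-- legacy stored procedure, dbo.OrderHistory_New the output of the new package.

-- Rows the legacy procedure produced that the new process did not.
SELECT * FROM dbo.OrderHistory_Legacy
EXCEPT
SELECT * FROM dbo.OrderHistory_New;

-- Rows the new process produced that the legacy procedure did not.
SELECT * FROM dbo.OrderHistory_New
EXCEPT
SELECT * FROM dbo.OrderHistory_Legacy;

-- EXCEPT compares distinct rows, so also compare row counts
-- to catch differences in duplicates.
SELECT
    (SELECT COUNT(*) FROM dbo.OrderHistory_Legacy) AS LegacyRowCount,
    (SELECT COUNT(*) FROM dbo.OrderHistory_New)    AS NewRowCount;
```

If both EXCEPT queries return no rows and the counts agree, the two outputs contain the same set of rows. This only works when every column is of a comparable type.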

I would suggest the following steps:
1. Analyze the stored procedures to identify the list of sources and destinations. For example, if the stored procedure dbo.TransferOrders moves data from table dbo.Order to dbo.OrderHistory, then your source is dbo.Order and your destination is dbo.OrderHistory.
2. After you list the sources and destinations, group the stored procedures by source or by destination, whichever you prefer.
3. Find out whether any data transformations happen within the stored procedures. SSIS has good data transformation tasks, and you can evaluate moving some of that functionality from the stored procedures to SSIS. Since SSIS is a workflow kind of tool, I feel that it is easier to understand what is going on inside a package than to scroll through many lines of code. But that's just me. Preferences differ from person to person.
4. Identify the dependencies between the stored procedures and prepare a hierarchy. This will help in placing the tasks inside the package in the appropriate order.
5. If you have a table named dbo.Table1 populating 5 different tables, I would recommend handling them in a single package. Even if the data population is carried out by 5 different stored procedures, you don't need 5 packages. Still, this again depends on your business scenario.
6. An SSIS project can contain multiple packages, and they can re-use data sources. You can use the Execute SQL task on the Control Flow tab to run your existing queries, but I would recommend that you also take a look at some of the nice transformation tasks available in SSIS. I have used them in my project and they work well for ETL operations.
These steps can be done by looking into one stored procedure at a time. You don't have to go through all of them at once.
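As a starting point for the source, destination and dependency analysis, SQL Server's catalog views can list which objects each stored procedure references. This is only a sketch: it does not see references made through dynamic SQL, and it does not say whether a table is read from or written to, so the results still need a manual review.

```sql
-- List the objects referenced by each stored procedure in the current database.
SELECT
    OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
    OBJECT_NAME(d.referencing_id)        AS referencing_procedure,
    d.referenced_schema_name,            -- NULL when the code omits the schema
    d.referenced_entity_name
FROM sys.sql_expression_dependencies AS d
JOIN sys.objects AS o
    ON o.object_id = d.referencing_id
WHERE o.type = 'P'                       -- 'P' = SQL stored procedure
ORDER BY referencing_schema, referencing_procedure, d.referenced_entity_name;
```

Procedures that reference other procedures show up in the same result, which helps when building the hierarchy.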
Please have a look at some of the examples that I have given in other Stack Overflow questions. These should help you give an idea of what you can achieve with SSIS.
Copying data from one SQL table to another
Logging feature available in SSIS
Loading a flat file with 1 million rows into SQL tables using SSIS
Hope that helps.

Related

Looping Through Tables in a DB in Informatica

I am looking for a way in Informatica to pull data from a table in a database, load it into Snowflake, then move on to the next table in that same database and repeat that for the remaining tables.
We currently have this set up in Matillion, where an orchestration grabs the names of all the tables in a database and then loops through each of those tables to send the data into Snowflake.
My team and I have tried to ask Informatica Global Support, but they have not been very helpful for us to figure out how to accomplish this. They have suggested things like Dynamic Mapping, which I do not think will work for our particular case since we are in essence trying to get data from one database to a Snowflake database and do not need to do any other transformations.
Please let me know if any additional clarification is needed.
Dynamic Mapping Task is your answer. You create one mapping, with or without transformations, as you need. Then you set up a Dynamic Mapping Task to execute that mapping across the whole set of your 60+ different sources and targets.
Please note that this is available as part of Cloud Data Integration module of IICS. It's not available in PowerCenter.

Tool To Generate Data Migration Script From Table A To Table B In SQL Server

During our SQL Server database deployments, we create a temporary table which contains the new desired state of data for a particular table. We then merge the temp table into the target table (we actually use individual insert, update and delete statements, but that's probably not relevant). The inserts/updates/deletes performed are captured and written out to a log.
We would like to be able to report on what changes would be applied by a deployment, without actually applying them. This is currently done by rolling back the transaction at the end of the above process. This doesn't feel particularly great though.
Now what we are thinking of doing is, instead of performing the changes and rolling them back, we will generate a migration script for the table (generate some SQL code that performs the necessary inserts, updates and deletes). If we want to do the actual deployment, this code will be dynamically executed. If not, the code will just be printed to a log.
It shouldn't take long to put together some code that can generate a migration script for two specified tables, but I first wanted to verify that there isn't already an existing tool that can do this.
Searching on Google, I can find lots of talk about migrating whole databases, but nothing about generating a data migration script to effectively merge one table into another.
So my question is, does anyone know of such a tool?
There are several data compare tools like:
SQL Data Compare from Red Gate
SQL Server Data Tools
dbForge Data Compare from Devart
Is that what you're looking for?
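If a full compare tool is more than you need, the "report without applying" step can also be sketched in plain T-SQL. The example below assumes a hypothetical target table dbo.Target (Id, Name) and a temp table #Desired with the same columns, where Id is a non-null key; these names are illustrative only and do not come from the question.

```sql
-- Classify each row as an INSERT, UPDATE or DELETE without changing any data.
SELECT
    COALESCE(d.Id, t.Id) AS Id,
    CASE
        WHEN t.Id IS NULL THEN 'INSERT'   -- exists only in the desired state
        WHEN d.Id IS NULL THEN 'DELETE'   -- exists only in the target
        ELSE 'UPDATE'                     -- exists in both, values differ
    END    AS PlannedAction,
    t.Name AS CurrentName,
    d.Name AS DesiredName
FROM #Desired AS d
FULL OUTER JOIN dbo.Target AS t
    ON t.Id = d.Id
WHERE t.Id IS NULL
   OR d.Id IS NULL
   -- NULL-safe inequality: true when the two values differ, NULLs included
   OR EXISTS (SELECT d.Name EXCEPT SELECT t.Name);
```

The result set can be written to the log as the list of planned changes, or used to drive the actual insert, update and delete statements during a real deployment.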

SSIS 14 - Staging Area - Merge two sources is taking a lot of time

I've two tables:
Table A: 631 476 rows
Table B: 12 90 rows
Each table has an ID field that I want to use as the key in the Merge object. The following image shows that the process blocks before the Merge object. I already tested with the Merge Join object and the results are the same...
What other options do I have to perform this operation using SSIS 14?
Thanks!
If both source tables are on the same server, don't do it this way. You should simply write a query on the SQL Server side.
Something like this:
SELECT *
FROM [Table A]
INNER JOIN [Table B] ON [Table A].ID = [Table B].ID
ORDER BY ...
As James Serra said in "When to use T-SQL or SSIS for ETL":
Performance – With T-SQL, everything is processed within the SQL engine. With SSIS, you are bringing all the data over to the SSIS memory space and doing the manipulation there. So if speed is an issue, usually T-SQL is the way to go, especially when dealing with a lot of records. Something like a JOIN statement in T-SQL will go much faster than using lookup tasks in SSIS. Another example is a MERGE statement in T-SQL has much better performance than a SCD task in SSIS for large tasks
Features/capabilities – Some features can only be done in either T-SQL or SSIS. You can shred text in SSIS, but can’t in T-SQL. For example, text files with an inconsistent number of fields per row can only be done in SSIS. So certain tasks may force you into using one or the other
Current skill set – Are the people in your IT department more familiar with SSIS or T-SQL?
Ease of development/maintenance – Of course, whatever one you are most familiar with will be the easiest, but if your skills at both are fairly even, then SSIS is usually easier to use because it is graphical, but sometimes you can develop quicker in T-SQL. For example, having to join a bunch of tables will require a bunch of tasks in SSIS, where in T-SQL it is one statement. So it might be easier to create the tasks to join the tables in SSIS, but it will take longer to build than writing a T-SQL statement
Complexity – SSIS can be more complex because you might need to create many tasks to accomplish your objective, where in T-SQL it might just be one statement, like in the example above for joining tables
Extensibility – SSIS has better extensibility because you can create a script task that uses C# that can do just about anything, especially for non-database related tasks. T-SQL is limited because it is only for database tasks. SSIS also has logging, which T-SQL does not
Likelihood of deprecation/breaking changes – Minor issue, but T-SQL is always removing features in each release that will have to be rewritten
Types/architecture of sources and destinations – SSIS is better if you have multiple types of sources. For example, it works really well with Oracle, XML, flat-files, etc. SSIS was designed from the beginning to work well with other sources, where T-SQL is designed for SQL Server and it requires more steps to access other sources, and there are additional limitations when doing so
Local regulations – Are there some company standards you have to adhere to that would limit which tool you can use?
I have had issues doing joins or merges in SSIS. Instead, I write the T-SQL version and run it in an Execute SQL task. It always runs much faster for me that way.

Convert or output SSIS package/job to SQL script?

I understand this may be a little far-fetched, but is there a way to take an existing SSIS package and get an output of the job it's doing as T-SQL? I mean, that's basically what it is, right? Transferring data from one database to another can be done with T-SQL as well.
I'm wondering this because I'm trying to get away from using SSIS packages for data transfer and instead using EF/linq to do this on the fly in my application. My thought process is that currently I have an SSIS package that transfers and formats data from one database to another in preparation to be spit out to an excel. This SSIS package runs nightly and helps speed up the generation of the excel as once the data is transferred to the second db, it's already nice and formatted correctly.
However, if I could leverage EF and maybe some LINQ to SQL to format the data from the first database on the fly and spit it out to Excel quickly without having to use this second db, that would be great. So can my original question be done? Can I extract the T-SQL representation of an SSIS package somehow?
SSIS packages are not exclusively T-SQL. They can consist of custom back-end code, file system changes, Office document creation steps, etc., to name only a few. As a result, converting the entirety of an SSIS package's work into T-SQL isn't possible, because the full breadth of its work isn't limited to SQL Server.

Database cleanup

I inherited a SQL Server database that is not well formatted. (Some consulting company came in to do the project and left without completing it.)
The main issues I have with this database are:
Data types: a lot of tinyint and text types.
Tables are not normalized: some of the keys are names instead of sequential IDs.
A lot of tables that I am not sure are being used.
A lot of stored procedures that I am not sure are being used.
Badly named tables and stored procs.
I also inherited the ASP.NET application that runs against this database.
I would like to clean this database up. I understand that changing the data types will have to happen table by table. What is the easiest way to get rid of all the extra tables and stored procs?
Any other tips to make it cleaner and smaller are appreciated.
I should also mention that I have RedGate tools installed (if that helps).
Thank you
Check out SQL Server Data Tools, which lets you create a project from a live database. One of the things you can do there is right-click and choose 'Find Usages' for tables, views and functions.
So long as the previous developer used stored procedures and views rather than querying directly, it should find references within your project that way, without killing your project.
Also, to find stored procedures that are not used, put some basic logging at the top of each stored procedure in your application. After X days, those that haven't been logged in your table are likely safe to remove. Otherwise, a tedious search through your .NET code will find them.
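The logging idea could look roughly like the sketch below. The table dbo.ProcUsageLog and the procedure dbo.SomeExistingProc are hypothetical names used only for illustration; adapt them to your own schema.

```sql
-- Hypothetical log table that records every stored procedure call.
CREATE TABLE dbo.ProcUsageLog (
    ProcName nvarchar(257) NOT NULL,
    CalledAt datetime2(0)  NOT NULL
        CONSTRAINT DF_ProcUsageLog_CalledAt DEFAULT (SYSDATETIME())
);
GO

-- Hypothetical procedure showing where the logging statement goes.
CREATE PROCEDURE dbo.SomeExistingProc
AS
BEGIN
    -- @@PROCID is the object ID of the currently executing procedure.
    INSERT INTO dbo.ProcUsageLog (ProcName)
    VALUES (OBJECT_SCHEMA_NAME(@@PROCID) + N'.' + OBJECT_NAME(@@PROCID));

    -- ... original procedure body goes here ...
END;
GO

-- After the observation period: procedures that were never logged.
SELECT s.name AS schema_name, p.name AS procedure_name
FROM sys.procedures AS p
JOIN sys.schemas AS s
    ON s.schema_id = p.schema_id
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.ProcUsageLog AS l
    WHERE l.ProcName = s.name + N'.' + p.name
);
```

Keep in mind that a procedure used only monthly or yearly will look unused if the observation period is too short, and that a log row written inside a transaction that is later rolled back is lost.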

Resources