How to run SSIS packages dynamically? - sql-server

We have a large production MSSQL database (mdf approx. 400 GB) and I have a test database. All the tables, indexes, views, etc. are identical in both. I need to make sure that the data in the tables of these two databases is consistent, so every night I need to insert all new rows and apply all updated rows from production into the test DB.
I came up with the idea of using SSIS packages to keep the data consistent by checking for updated and new rows in all the tables. My SSIS flow is as follows.
I have a separate SSIS package for each table. In order, each package does the following:
1. Gets the timestamp value in the table, so that only the last day's rows are read instead of the whole table.
2. Gets those rows from the table in production.
3. Uses the Lookup transformation to compare this data with the data in the test database table.
4. Uses a Conditional Split to determine whether each row is new or updated.
5_1. If the row is new, inserts it into the destination table.
5_2. If the row is updated, updates it in the destination table.
The data flow is shown in the MTRule and STBranch packages in the picture.
The problem is that I am recreating this same flow for each table, and I have more than 300 tables like this. It takes hours and hours :(
What I'm asking is:
Is there any way in SSIS to do this dynamically?
PS: Every table has its own columns and PK values, but my data flow schema is always the same (below).

You can look into BimlScript, which lets you generate packages dynamically based on metadata.

I believe the best way to achieve this is to use Expressions. They let you set the source and destination dynamically.
One possible solution might be as follows:
Create a table which stores all your table names and PK columns.
Define a package which loops through this table and builds a SQL statement for each row.
Call your main package and pass the statement to it.
Use the statement as the data source for your Data Flow.
If applicable, pass the destination table as a parameter as well (another column in your config table).
This is how I processed several really huge tables: the data had to be fetched from 20 tables and moved to one single table.
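The config-table idea above can be sketched outside SSIS to show what the looping package would produce. This is only an illustration in Python, not the answerer's actual package: the `dbo` schema, the key columns `RuleId` and `BranchId`, and the change-tracking column `ModifiedAt` are hypothetical names, and the sketch assumes each table has a datetime column recording its last modification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableConfig:
    """One row of the hypothetical config table."""
    source_table: str       # table to read from in production
    destination_table: str  # table to write to in the test database
    pk_columns: tuple       # primary key column names
    changed_column: str     # assumed datetime column recording the last change


def quote_name(name: str) -> str:
    """Bracket-quote each part of a (possibly schema-qualified) T-SQL name."""
    return ".".join("[" + part.replace("]", "]]") + "]" for part in name.split("."))


def build_source_query(cfg: TableConfig, days_back: int = 1) -> str:
    """Build the statement that the main package would use as its data source."""
    return (
        f"SELECT * FROM {quote_name(cfg.source_table)} "
        f"WHERE {quote_name(cfg.changed_column)} >= "
        f"DATEADD(day, -{int(days_back)}, SYSDATETIME())"
    )


# Stand-in for the rows of the config table.
CONFIG = [
    TableConfig("dbo.MTRule", "dbo.MTRule", ("RuleId",), "ModifiedAt"),
    TableConfig("dbo.STBranch", "dbo.STBranch", ("BranchId",), "ModifiedAt"),
]

# One statement per destination table, as the looping package would pass them on.
statements = {cfg.destination_table: build_source_query(cfg) for cfg in CONFIG}
```

Keep in mind that an SSIS data flow fixes its column metadata at design time, so passing a statement in at run time fits best when the tables share a column layout, as in the 20-tables-into-one case described above.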

Why do you need to use SSIS?
You are better off writing a stored procedure that takes the table name as a parameter and does your CRUD there. Then call the stored procedure from a Foreach Loop container in SSIS.
In fact, you might be able to do everything with a stored procedure and schedule it as a SQL Agent job.
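A stored procedure that takes a table name has to build its statement dynamically. The following Python sketch shows the shape of the T-SQL such a procedure might generate for one table; it is not taken from the answer. The database names `ProdDb` and `TestDb` and the column names are hypothetical, and three-part names assume both databases are reachable from the same instance (a linked server would need four-part names).

```python
def q(name: str) -> str:
    """Bracket-quote a single T-SQL identifier."""
    return "[" + name.replace("]", "]]") + "]"


def quote_name(name: str) -> str:
    """Bracket-quote each part of a schema-qualified name such as dbo.STBranch."""
    return ".".join(q(part) for part in name.split("."))


def build_merge(table, pk_columns, data_columns,
                source_db="ProdDb", target_db="TestDb"):
    """Return a T-SQL MERGE that upserts `table` from source_db into target_db."""
    if not pk_columns:
        raise ValueError("at least one key column is required")
    all_columns = list(pk_columns) + list(data_columns)
    on_clause = " AND ".join(f"t.{q(c)} = s.{q(c)}" for c in pk_columns)
    lines = [
        f"MERGE {q(target_db)}.{quote_name(table)} AS t",
        f"USING {q(source_db)}.{quote_name(table)} AS s",
        f"    ON {on_clause}",
    ]
    if data_columns:  # a table with only key columns has nothing to update
        set_clause = ", ".join(f"t.{q(c)} = s.{q(c)}" for c in data_columns)
        lines += ["WHEN MATCHED THEN", f"    UPDATE SET {set_clause}"]
    column_list = ", ".join(q(c) for c in all_columns)
    value_list = ", ".join(f"s.{q(c)}" for c in all_columns)
    lines += [
        "WHEN NOT MATCHED BY TARGET THEN",
        f"    INSERT ({column_list}) VALUES ({value_list});",
    ]
    return "\n".join(lines)


merge_sql = build_merge("dbo.STBranch", ["BranchId"], ["BranchName", "City"])
print(merge_sql)
```

As written, the generated statement updates every matched row and reads the whole source table. Restricting the USING clause to recently changed rows, and handling identity columns, are left out of this sketch.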

Related

How to insert data into destination table using flat file source in SSIS

I have an SSIS package in which the flow is:
Get the data from a flat file source and insert it into a staging table.
Transform the staging table data using a SELECT with a WHERE clause, and then insert the filtered data into the destination table.
For the first point, I have used a Data Flow Task to get the data from the source and insert it into the staging table. For the second point, I am confused about how to do it. I am using an Execute SQL Task to run the SELECT ... WHERE query, but I don't see how to insert that query result into the destination table. Which SSIS component should I use here? Or should I change the entire flow for better performance? Kindly suggest. Thanks in advance.
You are on the right track. For a simple data import, I mostly use this flow.
Let's say we have a destination table named FiscalYear.
The first thing I would do is create the staging table. If it already exists, I drop it and recreate it.
The next step is to use the Data Flow Task to load the file into the staging table.
For the last step, I use an Execute SQL Task with a SQL Server MERGE query to insert or update the data. To insert or update, though, you need a unique identifier for each row in the file. This unique identifier keeps you from inserting duplicates if you run the package more than once.
This unique row identifier can be a single column or a combination of columns. In my case, I usually have a column named rowguid of type uniqueidentifier.
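The staging-then-merge step can be tried out in miniature. The sketch below uses Python's built-in sqlite3 as a stand-in for SQL Server, so it spells the merge out as an UPDATE followed by an INSERT rather than a T-SQL MERGE. The staging table name `FiscalYear_Staging` and the columns `YearName` and `StartDate` are assumptions made for illustration; only `FiscalYear` and `rowguid` come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FiscalYear (rowguid TEXT PRIMARY KEY, YearName TEXT, StartDate TEXT);
    CREATE TABLE FiscalYear_Staging (rowguid TEXT, YearName TEXT, StartDate TEXT);
""")


def load_staging(rows):
    """Stand-in for the data flow: empty the staging table and reload it."""
    conn.execute("DELETE FROM FiscalYear_Staging")
    conn.executemany("INSERT INTO FiscalYear_Staging VALUES (?, ?, ?)", rows)


def upsert_from_staging():
    """Stand-in for the MERGE: update rows whose key exists, insert the rest."""
    conn.execute("""
        UPDATE FiscalYear
        SET YearName = (SELECT s.YearName FROM FiscalYear_Staging AS s
                        WHERE s.rowguid = FiscalYear.rowguid),
            StartDate = (SELECT s.StartDate FROM FiscalYear_Staging AS s
                         WHERE s.rowguid = FiscalYear.rowguid)
        WHERE rowguid IN (SELECT rowguid FROM FiscalYear_Staging)
    """)
    conn.execute("""
        INSERT INTO FiscalYear (rowguid, YearName, StartDate)
        SELECT s.rowguid, s.YearName, s.StartDate
        FROM FiscalYear_Staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM FiscalYear AS d WHERE d.rowguid = s.rowguid)
    """)
    conn.commit()


# First run, then an accidental rerun of the same file.
load_staging([("a1", "FY2023", "2023-01-01"), ("b2", "FY2024", "2024-01-01")])
upsert_from_staging()
upsert_from_staging()

# A later file that corrects one existing row.
load_staging([("b2", "FY 2024", "2024-01-01")])
upsert_from_staging()

row_count = conn.execute("SELECT COUNT(*) FROM FiscalYear").fetchone()[0]
b2_name = conn.execute(
    "SELECT YearName FROM FiscalYear WHERE rowguid = 'b2'").fetchone()[0]
```

Because every row is matched on rowguid, running the load twice leaves the destination unchanged, and a corrected row overwrites the old one instead of adding a duplicate.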

Monthly report automated with VBA + SQL Server Stored Procs?

I am trying to completely automate this process, and I'm wondering whether it is viable or efficient to do in VBA.
The report process involves two files: one SQL file and one Excel file.
The SQL file has the algorithm, and its final step is a query whose result is then pasted into the Excel file.
The algorithm is simpler than what the audience might be used to, but it has two "into" commands and several "update" commands.
Of the two "into" commands, the first grabs a small portion (constrained to the first and last day of the previous month) of a table with 500m+ records. The second joins the first table with an eligibility type table.
After the second table is created, there is a series of UPDATE commands that change existing data in existing columns.
Then a series of ALTER and UPDATE commands adds new columns to the [second] table and fills them with the desired data.
The final step is a query whose results are copy-pasted into Excel (as is, no formatting changes necessary).
I'm not too well-versed in VBA/VB.NET, T-SQL stored procedures, or dynamic SQL. If the SQL algorithm were a simple pull query with no table creation, I could build something to automate that. But the SQL has two table creations and about a dozen ALTER and UPDATE commands.
Am I stirring up the wrong nest? Should I run it manually as is?
You can definitely automate this. I created a report that ran two stored procedures and numerous queries with temp tables, including both UPDATE and ALTER commands, and then used VBA to execute these and aggregate the data in the final summary sheet.
There is a ton of documentation out there. You can even pass values to the stored procedure after the user inputs them.

Remove duplicates from a SQL server rows using DISTINCT

I need to remove duplicated rows in SQL Server when importing a file into the database, using the DISTINCT method.
HallGroup is my table in the database. I'm using this
SQL procedure:
SELECT DISTINCT * INTO tempdb.dbo.tmpTable
FROM HallGroup;
DELETE FROM HallGroup;
INSERT INTO HallGroup SELECT * FROM tempdb.dbo.tmpTable;
DROP TABLE tempdb.dbo.tmpTable;
This procedure works fine and the duplicated rows are deleted, but when I import data into SQL Server again, the rows are duplicated again. What am I missing? Any hints?
How do I properly remove duplicated rows in SQL Server when importing a file into the database with the DISTINCT method?
I am just getting back into SQL after being out for a bit, but I would not have solved your problem the way you are trying to (not that I completely understand why you are doing it that way). Even if it were working correctly, I believe your process will take longer each time you run it as the table grows.
It would be much more efficient to insert only the new data, based on the absence of a key (you indicate you are already using a stored proc). If you don't have a key to use (which very recently happened to me), make one. I just solved a similar problem in which I import data into a table from an external source and wanted to eliminate the possibility of duplicates. In my case, I associate the name of the external source data file (which is distinct for each dataset to import) with the data to be imported, and use that to ensure I am not re-importing data that was already imported. I load the external data into a table using a dtsx package and then run a stored proc to merge that data with an existing table. This gives me the added advantage of having an audit trail of where each record came from.
Hope this helps.
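The file-name approach described above can be illustrated with a small runnable sketch. It uses Python's sqlite3 as a stand-in for SQL Server and is not the answerer's stored procedure; the columns `HallName`, `Capacity` and `SourceFile` and the file name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HallGroup (HallName TEXT, Capacity INTEGER, SourceFile TEXT)"
)


def import_file(source_file, rows):
    """Insert rows tagged with their source file; skip files already imported.

    Returns the number of rows inserted.
    """
    already_imported = conn.execute(
        "SELECT 1 FROM HallGroup WHERE SourceFile = ? LIMIT 1", (source_file,)
    ).fetchone()
    if already_imported:
        return 0
    conn.executemany(
        "INSERT INTO HallGroup (HallName, Capacity, SourceFile) VALUES (?, ?, ?)",
        [(name, capacity, source_file) for name, capacity in rows],
    )
    conn.commit()
    return len(rows)


rows = [("North Hall", 120), ("South Hall", 80)]
first_run = import_file("halls_2024_01.csv", rows)   # new file: rows are inserted
second_run = import_file("halls_2024_01.csv", rows)  # same file again: skipped
total_rows = conn.execute("SELECT COUNT(*) FROM HallGroup").fetchone()[0]
```

The SourceFile column doubles as the audit trail mentioned above. This check only guards against re-importing the same file; rows repeated across different files would still need a key on the data itself.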

How can I minimize validation intervals when changing the SQL in ADO NET Source Tasks

Part of an SSIS package is a data import from an external database via a SQL command embedded in an ADO.NET Source in the Data Flow. Whenever I make even the slightest adjustment to the query (such as changing a column name), it takes ages (in that case 1-2 hours) until the program has finished validation. The query itself returns around 30,000 rows with 20 columns each.
Is there any way to cut these long intervals, or is this something I have to live with?
I usually store the source queries in a table. The first part of my package executes a SELECT against that table and stores the returned query in a package variable, which is then used by the ADO.NET Source in the Data Flow. As the default value of the variable in the package, I usually use the query that is stored in the database with a "where 1=2" added at the end. That way, at design time the query is still executed but returns only the column metadata. Let me know if you have any questions.
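The effect of the "where 1=2" default can be seen in a small sketch, here with Python's sqlite3 standing in for the external database. The `Orders` table and its columns are hypothetical. As a variation on the answer, the sketch wraps the stored query in a derived table rather than appending the predicate, so that a query which already has its own WHERE clause stays valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER, Customer TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "Contoso", 10.0), (2, "Fabrikam", 20.0)],
)

# The real source query, as it might be stored in the query table.
source_query = "SELECT OrderId, Customer, Amount FROM Orders WHERE Amount > 5"


def metadata_only(query: str) -> str:
    """Wrap a query so it returns its column layout but never any rows."""
    return f"SELECT * FROM ({query}) AS q WHERE 1 = 2"


cursor = conn.execute(metadata_only(source_query))
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()
```

Here `column_names` carries the same columns as the real query while `rows` stays empty, which is all the designer needs for validation. In T-SQL the derived-table form would not accept an inner query ending in ORDER BY, so appending the predicate as the answer describes may be the simpler choice for such queries.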

Load data from multiple source into a destination

I have a desktop application through which data is entered and captured in an MS Access DB. The application is used by multiple users (at different locations). The idea is to download the data entered on a particular day into an Excel sheet and load it into a centralized server, which is an MSSQL Server instance.
That is, data (in the form of Excel sheets) will come from multiple locations and be saved into a shared folder on the server, from where it needs to be loaded into SQL Server.
There is an ID column with IDENTITY in the SQL Server table, which is the primary key column, and no other column in the table contains unique values. Although the data comes from multiple sources, we need to maintain a single auto-incrementing series (IDENTITY).
Suppose there are 2 sources:
Source1: Has 100 records entered for the day.
Source2: Has 200 records entered for the day.
When they get loaded into the destination (SQL Server), the table should have 300 records, with ID column values from 1 to 300.
Also, on the next day, when the data comes from the sources, the destination has to load data starting from ID 301.
The issue is that there may be requests to change data at the source that has already been loaded into the central server. How do I update that row in the central server, given that the ID column value will not be the same in source and destination? As mentioned earlier, ID is the only column with unique values in the table.
Please suggest some ideas for doing this, or tell me whether I have to take a different approach to accomplish this task.
Thanks in advance!
Krishna
Okay, so first I would suggest .NET and doing it through a file stream reader, dumping the data into the disconnected layer of ADO.NET in a DataSet with multiple DataTables from the different sources. But you mentioned SSIS, so I will go that route.
Create an SSIS project in Business Intelligence Development Studio (BIDS).
If you know for a fact that you are just importing a bunch of Excel files, I would create either many Data Flow Tasks or many source-to-destination flows in a single Data Flow Task; that is up to you.
a. Personally, I would create a table in a database for each Excel file location and have its columns map to the file's columns. I will explain why later.
b. In a Data Flow Task, select 'Excel Source' as the source. Double-click the Excel Source and enter the file location under 'New connection'.
c. Choose an ADO NET Destination and drag the blue line from the Excel Source to this endpoint.
d. Map your destination to the matching table in SQL Server.
e. Repeat as needed for each Excel file.
Set up the SSIS package to run automatically from SQL Server through SQL Server Management Studio. Remember to connect to an Integration Services instance, not a database engine instance.
Okay, so now you have a bunch of tables instead of one big one, right? I did that for a reason: these should be entry points, and I would leave the logic that determines duplicates and import time to another table.
I would set up another two tables, one for the combined data and one for auditing later.
a. Create a table like 'Imports' or similar. Give it the same columns as the entry tables and add these columns: 'ExcelFileLocation' and 'DateImported'. Create an 'identity' column as the first column, seeded with the default of (1,1), and make it the primary key.
b. Create a second table like 'ImportDupes' or similar, and repeat the process above for its columns.
c. Create a unique constraint on the first table on either a single value or a set of values that makes an imported row unique.
d. Write a procedure in SQL that inserts from the MANY tables that match the Excel files into the ONE 'Imports' table. For each of the many inserts, do something similar to:
BEGIN TRY
    INSERT INTO Imports (datacol1, datacol2, ExcelFileLocation, DateImported)
    SELECT datacol1, datacol2, (location of file), GETDATE()
    FROM TableExcel1;
END TRY
BEGIN CATCH
    -- if the insert breaks the unique constraint, put the rows into the second table
    INSERT INTO ImportDupes (datacol1, datacol2, ExcelFileLocation, DateImported)
    SELECT datacol1, datacol2, (location of file), GETDATE()
    FROM TableExcel1;
END CATCH
-- repeat the above for EACH Excel table
-- clean up the individual staging tables for the next import cycle, for EACH Excel table
TRUNCATE TABLE TableExcel1;
e. Schedule the procedure to run automatically.
You now have two tables, one for successful imports and one for duplicates.
The reason I did it this way is twofold:
A lot of the time you need to know more than just the data itself: when it came in, which source it came from, whether it was a duplicate, and, if you do this for millions of rows, whether it can be indexed easily.
This model is easier to take apart and automate. It may be more work to set up, but if a piece breaks you can see where, and you can easily stop the import for one location by turning off the code for that section.
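The Imports / ImportDupes routing can be tried out with a small sketch. It uses Python's sqlite3 instead of SQL Server and catches the constraint error in Python rather than in a T-SQL CATCH block. Unlike the set-based insert above, which sends a whole batch to the duplicates table when any row violates the constraint, this variation works row by row so that only the offending rows are diverted. The file names and the choice of (datacol1, datacol2) as the unique key are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Imports (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        datacol1 TEXT, datacol2 TEXT,
        ExcelFileLocation TEXT, DateImported TEXT,
        UNIQUE (datacol1, datacol2)
    );
    CREATE TABLE ImportDupes (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        datacol1 TEXT, datacol2 TEXT,
        ExcelFileLocation TEXT, DateImported TEXT
    );
""")

INSERT_SQL = (
    "INSERT INTO {table} (datacol1, datacol2, ExcelFileLocation, DateImported) "
    "VALUES (?, ?, ?, ?)"
)


def import_rows(rows, file_location):
    """Insert each row into Imports; divert unique-constraint violations to ImportDupes."""
    imported_at = datetime.now(timezone.utc).isoformat()
    for col1, col2 in rows:
        params = (col1, col2, file_location, imported_at)
        try:
            conn.execute(INSERT_SQL.format(table="Imports"), params)
        except sqlite3.IntegrityError:
            # The row already exists in Imports, so keep it for auditing instead.
            conn.execute(INSERT_SQL.format(table="ImportDupes"), params)
    conn.commit()


import_rows([("a", "1"), ("b", "2")], "site1.xlsx")
import_rows([("b", "2"), ("c", "3")], "site2.xlsx")

imported = conn.execute("SELECT COUNT(*) FROM Imports").fetchone()[0]
dupes = conn.execute(
    "SELECT datacol1, ExcelFileLocation FROM ImportDupes").fetchall()
```

After both calls, `imported` counts the distinct rows and `dupes` records which file the rejected row came from, which is the audit detail the two-table design is meant to preserve.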

Resources