I am creating a process that automates testing the consistency of database tables across servers.
I have a test_master table which contains the following columns:
test_id, test_name, test_status
and a Job_master table which contains the following columns:
jid, test_id, job_name, job_type, job_path, job_status,
server_ip, db, error_description, op_table, test_table,
copy_status, check_status
There can be multiple jobs for a particular test. The jobs are logical jobs (not SQL Agent jobs); each can be a script, a stored procedure or an SSIS package.
So I have made an SSIS package:
In pre-execute, it picks up the tests which aren't done yet.
Each job runs and writes the name of the live table into the op_table field.
In post-execute, the live tables are copied to a test database environment and the table name is put into test_table; the testing is performed there only.
Here the jobs run in a loop. Is there a way to let the jobs run in parallel, since they are independent of each other?
Can I write a SQL procedure for this inside the loop, or is there another way I can do this?
Any new ideas are welcome.
Thank you very much. :)
Very roughly, I would put the approach as below:
SQL bits
Wrap whatever SQL code is part of a "job" into a stored proc. Inside this proc, populate a variable with the SQL to run and execute it using dynamic SQL. Update the job status in the same proc, taking help from the TRY-CATCH-THROW construct.
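A rough skeleton of such a proc might look like this; the Job_master columns come from the question, but the proc name, the status values and the placeholder job body are assumptions:

CREATE PROCEDURE dbo.usp_run_job   -- hypothetical name
    @jid INT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql NVARCHAR(MAX);

    BEGIN TRY
        -- build the SQL this particular job has to run
        SET @sql = N'/* job-specific SQL goes here */';

        EXEC sys.sp_executesql @sql;   -- the dynamic SQL bit

        UPDATE dbo.Job_master
        SET job_status = 'SUCCESS'     -- assumed status value
        WHERE jid = @jid;
    END TRY
    BEGIN CATCH
        UPDATE dbo.Job_master
        SET job_status = 'FAILED',     -- assumed status value
            error_description = ERROR_MESSAGE()
        WHERE jid = @jid;
        THROW;                         -- re-raise so the caller (SSIS) sees the failure
    END CATCH
END;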
Packages
Populate the names of the packages in an SSIS string variable in delimited fashion (or use an object variable, whatever suits you). Then, in a Script Task, iterate through the list of packages and fire them using the dtexec command. For the job status, it's best to have the invoked packages update it themselves. If that is not an option, use a try-catch construct and update the job statuses accordingly.
Do a check on the job_type variable at the top of the SSIS package (using precedence constraints) and route each job into the correct 'block'.
Related
I have a SQL query of more than 200 lines that performs the steps below. I need to run this every day and generate table_A. I have a new requirement to build an SSIS package that follows the same process and creates table_A. The current SQL process is:
drop table table_A;
select ... into table_A
from (select ... from table_B
      union all select ... from table_C
      union all select ... from table_D) as src;
Key factors: table_B, table_C, table_D - I need to pull 20 columns out of the 40 columns from these three tables. The column names vary, so I need to rename and standardise the column names and certain data types so that they map onto a single set of columns in table_A.
This is already set up as a SQL query, but I want to know the best practice for transforming it into SSIS. Should I use an "Execute SQL Task" in the control flow, or a Data Flow Task with an OLE DB source and OLE DB destination?
Execute SQL Task is what you're going to want. The Execute SQL Task is designed to run an arbitrary query that may or may not return a result set. You've already done the hard work of getting your code working correctly so all you need to do is define a Connection Manager (likely an OLE DB) and paste in your code.
In this case, SSIS is going to be nothing more than a coordinator/execution framework for your existing SQL process. And that's perfectly acceptable, coming from someone who's written more than a few SSIS packages.
A Data Flow Task, I find, is more appropriate when you need to move the data from tables B, C, and D into a remote database, or when you need to perform transformation logic on them that isn't easily done in T-SQL.
A Data Flow Task also will not support creating the table at run-time. All SSIS tasks perform a validation check - either on package start, or delayed until the specific task begins. One of the checks a Data Flow Task performs is "does the target table exist (and does its structure match my cached copy)?"
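As a rough illustration, the statement you paste into the Execute SQL Task could have roughly this shape; the column names and types here are invented, since the real ones aren't in the question:

IF OBJECT_ID('dbo.table_A') IS NOT NULL
    DROP TABLE dbo.table_A;

SELECT col1, col2                  -- ...the 20 standardised columns
INTO dbo.table_A
FROM (
    SELECT b_col1 AS col1, CAST(b_col2 AS INT) AS col2 FROM dbo.table_B
    UNION ALL
    SELECT c_col1 AS col1, CAST(c_col2 AS INT) AS col2 FROM dbo.table_C
    UNION ALL
    SELECT d_col1 AS col1, CAST(d_col2 AS INT) AS col2 FROM dbo.table_D
) AS src;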
We have a list of stored procedures (more than 1000) in a table which need to be executed every morning.
The stored procedures do not have any dependencies on each other.
We have tried a WHILE loop and a cursor, but execution used to take a lot of time.
We thought of creating a job for each stored procedure and calling them using sp_start_job (sp_start_job is called in an async manner), which gave us a level of parallelism.
Problems arise when a new stored procedure is added to the list and the number of jobs becomes huge:
sometimes people miss creating the job for the new stored procedure
the DB gets bombarded with a large number of jobs (a manageability issue for the DBA)
Note: the list may be altered any day (stored procedures can be added to or removed from the list).
If the SPs run for a long time, I would categorize the 1000 SPs into 5-10 groups, create one SSIS package for each group and then an Agent job for each package. Then schedule those jobs at the same time.
There are many ways (loops, scripting) and multiple factors involved in achieving it. You can test the different ways and go with the best one.
Note: Performance of the SSIS execution depends on your Memory, Processor and Hardware.
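If the list of procedures lives in a table, splitting it into evenly sized groups can be as simple as something like the query below; the dbo.ProcList table and the group count of 10 are just assumptions:

SELECT proc_name,
       NTILE(10) OVER (ORDER BY proc_name) AS group_no   -- one group per package
FROM dbo.ProcList;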
Adding to @Nick.MacDermaid - you can utilize the MaxConcurrentExecutables property of the package to implement custom parallelism. Of course, you would need multiple containers and corresponding stored proc groups.
Parallel Execution in SSIS
MaxConcurrentExecutables, a property of the package, defines how many tasks (executables) can run simultaneously. It defaults to -1, which is translated to the number of processors plus 2. Please note that if your box has hyperthreading turned on, it is the logical processor rather than the physically present processor that is counted.
Hi, you can use the following piece of code to generate a script that runs all your stored procedures; if you add a new procedure it will automatically be added to the list.
SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE'
After this, take the result set, put it into a tab-delimited text file and save the file somewhere.
Use this link to import the text into an Execute SQL Task (the first answer works well):
SSIS: How do I pull a SQL statement from a file into a string variable?
Execute the task and it should work. If you need to narrow the list of procedures, you can give them a specific prefix in their names and use that in the WHERE clause.
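For instance, something like the following; the usp_daily prefix is purely an assumption:

SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE'
  AND SPECIFIC_NAME LIKE 'usp_daily%';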
It will run in serial. Sorry, I don't have enough rep to comment yet.
I am working in SQL Server 2008 and BIDS. Due to some performance problems, I am re-designing my current architecture. Currently, I have a stored procedure that has many INSERT INTO SELECT statements inside of it. In my new architecture, I am trying to get the performance of SSIS for inserts (instead of INSERT INTO in SSMS). So, my new stored proc will still have all of the SELECT statements (just no INSERT INTO before each of them). I will call this stored proc in SSIS (with a few parameters supplied that are needed by the SELECTs). My goal is to have each SELECT write to a separate flat file. (Actually, certain groups of SELECTs will write to separate flat files, such that I have just a few -- instead of a billion -- flat file connection managers.) I know how to execute a stored proc in SSIS and have it write a multiple-row set to a flat file. But is it possible for the execution of one stored proc in SSIS to write several multiple-row sets to several flat files? If so, how can it be done?
You can have one stored proc write to as many files as you want. Please look at this article by Phil Factor, https://www.simple-talk.com/sql/t-sql-programming/reading-and-writing-files-in-sql-server-using-t-sql/
However, you are losing all the power of SSIS - such as redirection of error rows, logging, and parallel processing. What you need to do sounds like a perfect SSIS task (or series of tasks).
Using a Data Flow for dynamic export is not possible due to the strict metadata architecture of SSIS. But you can do it in the control flow: write a BCP command in an Execute Process Task and call it for each table you want to export.
Steps:
Run select * from information_schema.tables and grab the result set into a variable
Use a Foreach Loop container to loop through the tables
Use an Execute Process Task to call BCP in your loop (a sketch of the generated commands is shown below)
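One rough way to build those BCP command lines straight from the metadata query in step 1; the output folder, server name and the use of a trusted connection (-T) are assumptions:

SELECT 'bcp ' + DB_NAME() + '.' + TABLE_SCHEMA + '.' + TABLE_NAME
     + ' out C:\exports\' + TABLE_NAME + '.txt -S MyServer -T -c' AS bcp_command
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';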
In SSIS 2008 I have a Script Task that checks if a table exists in a database and sets a boolean variable.
In my Data Flow I do a Conditional Split based on that variable, so that I can do the appropriate OLE DB Commands based on whether that table exists or not.
If the table does exist, the package runs correctly. But if the table doesn't exist, SSIS checks the metadata on the OLE DB Command that isn't being run, determines the table isn't there, and fails with an error before doing anything.
There doesn't seem to be any way to catch or ignore that error (e.g. I tried increasing MaximumErrorCount and various different ErrorRowDescription settings), or to stop it ever validating the command (ValidateExternalMetadata only seems to affect the designer, by design).
I don't have access to create stored procedures to wrap this kind of test, and OLE DB Commands do not let you use IF OBJECT_ID('') IS NOT NULL prefixes on any statements you're doing (in this case, a DELETE FROM TableName WHERE X = ?).
Is there any other way around this, short of using a script component to fire off the DELETE command row-by-row manually?
You can use a Script Component to execute a DELETE statement for each row in the input path, but that might be very slow depending on the number of rows to be deleted.
You can:
Store the PKs of the records that should be deleted in a database table (for instance: TBL_TO_DEL)
Add an Execute SQL Task with a SQL query that deletes records by joining TBL_TO_DEL with the table you want to delete records from (see the sketch below)
Put a precedence constraint on the path between your data flow and the Execute SQL Task (a constraint based on your variable)
This solution is much faster than deleting row by row.
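The Execute SQL Task from the second step might run something like this; TableName and the X column are taken from the question's DELETE statement, so adjust them to your real key:

DELETE t
FROM TableName AS t
INNER JOIN TBL_TO_DEL AS d
    ON d.X = t.X;   -- join on whatever key you stored in TBL_TO_DEL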
If for some reason you can't create a new table, check my answer on SSIS Pass Datasource Between Control Flow Tasks to see other ways to pass data to the next data flow, where you can use an OLE DB Source and OLE DB Command. Whichever way you choose, the key is in the constraint that will or will not execute the following task (Execute SQL Task or data flow) depending on the value of the variable.
Note that the Execute SQL Task will not validate the query, so it will only fail at runtime if the constraint is satisfied and the table doesn't exist. If you use another data flow instead of an Execute SQL Task, set its DelayValidation property to true. That means the task is validated just before it executes, not any earlier.
I have data coming in from DataStage that is being put into a table in our SQL Server 2008 database: stg_table_outside_data. The outside source puts the data into that table every morning. I want to move the data from stg_table_outside_data to table_outside_data, where I keep multiple days' worth of data.
I created a stored procedure that inserts the data from stg_table_outside_data into table_outside_data and then truncates stg_table_outside_data. The outside DataStage process is out of my control, so I have to do all of this within SQL Server 2008. I had originally planned on using a simple AFTER INSERT trigger, but DataStage commits after every 100,000 rows. The trigger would run after the first commit and cause a deadlock error for the DataStage process.
Is there a way to set up an AFTER INSERT trigger that waits 30 minutes and then makes sure there wasn't a new commit within that time frame? Is there a better solution to my problem? The goal is to get the data out of the staging table and into the working table without duplication, and then truncate the staging table for the next morning's load.
I appreciate your time and help.
One way you could do this is to take advantage of the new MERGE statement in SQL Server 2008 (see the MSDN docs and this blog post) and just schedule it as a SQL job every 30 minutes or so.
The MERGE statement allows you to easily just define operations (INSERT, UPDATE, DELETE, or nothing at all) depending on whether the source data (your staging table) and the target data (your "real" table) match on some criteria, or not.
So in your case, it would be something like:
MERGE table_outside_data AS target
USING stg_table_outside_data AS source
ON (target.ProductID = source.ProductID) -- whatever join makes sense for you
WHEN NOT MATCHED THEN
    INSERT VALUES(.......);
-- no WHEN MATCHED clause is needed - matched rows are simply left alone
You shouldn't be using a trigger to do this, you should use a scheduled job.
Maybe build a procedure that moves all data from stg_table_outside_data to table_outside_data once a day, or use the job scheduler.
Do a row count in the trigger; if the count is less than 100,000, do nothing. Otherwise, run your process.