Unit testing results between several stored procedures? - sql-server

I need to unit test results between several stored procedures on a single database (certain values between different result sets ought to be the same). Also, I need to be able to copy these unit tests such that several identical databases will perform the unit tests identically when I choose to start the tests.
I want to use OpenRowSet to dump these results to temp tables and then compare these tables, possibly using a stored procedure that I can execute once a week.
Before I configure the servers to allow this, are there any reasons not to use OpenRowSet? If so, what other options might I have?

The main reason to not use OpenRowSet is that you don't need to use it. Since you want to do testing, you should use a testing framework. I am a huge fan of DbFit ( http://dbfit.github.io/dbfit/ ). Your tests are completely isolated from your database. It is very easy to set up and modify. And you can even compare result sets between two Stored Procedures. It is very easy to automate. It is easy to create subsections and only run tests in a particular subsection, or an individual test. You can stage the test with DML statements and everything will get rolled back at the end of the test. You can use variables to grab data from a query or procedure and use that in calls / queries that follow.

Well... Perhaps using a unit testing framework is a step too far for you. If you don't want to go that far, try the below.
From MSDN, OpenRowSet "is an alternative to accessing tables in a linked server and is a one-time, ad hoc method of connecting and accessing remote data by using OLE DB". You have stated that there are several databases on the same server (i.e. no linked server), so OpenRowSet seems to be overkill. You can still get the bulk performance gains by using "Select Into" statements to create your data tables in a new unit-testing database (I wouldn't advocate creating test tables in your prod databases). This database would have a stored proc that queries each individual database using three- or four-part naming. If you really wanted to, you could have a table of database/stored-procedure pairs and use dynamic SQL to execute them all. Once you have all the data, your stored proc just needs to compare the tables.
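A minimal sketch of that approach, assuming a dedicated UnitTest database and two hypothetical source databases DbA and DbB (all names are illustrative only):

USE UnitTest;

-- Capture the data from each database with SELECT ... INTO (three-part naming)
SELECT *
INTO dbo.Results_DbA
FROM DbA.dbo.SomeSourceTable;

SELECT *
INTO dbo.Results_DbB
FROM DbB.dbo.SomeSourceTable;

-- Compare the two captures; any rows returned here are mismatches
SELECT COALESCE(a.KeyColumn, b.KeyColumn) AS KeyColumn,
       a.ValueColumn AS ValueFromDbA,
       b.ValueColumn AS ValueFromDbB
FROM dbo.Results_DbA AS a
FULL OUTER JOIN dbo.Results_DbB AS b
    ON b.KeyColumn = a.KeyColumn
WHERE a.KeyColumn IS NULL
   OR b.KeyColumn IS NULL
   OR a.ValueColumn <> b.ValueColumn;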

Another way would be to introduce a kind of ReturnType parameter in your stored procedures, or use an existing parameter to carry that value. When the ReturnType is set to 'INSERT_RESULTS_TO_TEST' or some such, have the stored procedures insert their final result sets into tables designed for testing, instead of returning them as usual.
If necessary, add columns to the test tables to indicate which server, which database, and which stored procedure produced each result; call this column ResultSetID, say.
Then, for your comparison, self-join the test tables and compare values between different ResultSetIDs.
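A minimal sketch of that comparison, assuming a single test table dbo.TestResults with hypothetical columns (BusinessKey, Amount, ResultSetID):

-- Rows returned here are values that disagree between the two result sets
SELECT a.BusinessKey,
       a.Amount AS AmountFromSet1,
       b.Amount AS AmountFromSet2
FROM dbo.TestResults AS a
JOIN dbo.TestResults AS b
    ON  b.BusinessKey = a.BusinessKey
    AND b.ResultSetID = 'Server2.DbB.usp_GetTotals'
WHERE a.ResultSetID = 'Server1.DbA.usp_GetTotals'
  AND a.Amount <> b.Amount;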

Options for executing SQL commands in parallel

Scenario
Note: I am using SQL Server 2017 Enterprise
I am looping through a list of databases and copying data to them out of one database. This database will only be accessed by the script (no other transactions will be made against it from anything else). The copies range from straight table-to-table transfers to more complex, longer-running queries and stored procedures. All of this is done with SQL Server jobs calling procedures; I'm not using anything like SSIS.
Question
Instead of looping through all the databases and running the statements one at a time, I want to be able to run them in parallel. Is there an easy way to do this?
Options I've thought of:
Run each data transfer as a job and then run all the jobs at once (see the sketch after this list). From my understanding, they would be executed asynchronously, but I'm not 100% sure.
Generate the SQL statements and write a script outside of SQL Server (e.g. Powershell or Python) and run all the commands in parallel
Leverage SSIS
I prefer not to do this, since this would take too much work and I'm not very familiar with it. This may be used down the road though.
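For the first option, here is a minimal sketch of starting several pre-existing SQL Agent jobs at once (the job names are hypothetical). msdb.dbo.sp_start_job returns as soon as the job is queued, so all of the jobs run concurrently:

DECLARE @jobs TABLE (job_name sysname);
INSERT INTO @jobs (job_name)
VALUES (N'Copy_To_Db1'), (N'Copy_To_Db2'), (N'Copy_To_Db3');

DECLARE @job sysname;
DECLARE job_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT job_name FROM @jobs;
OPEN job_cursor;
FETCH NEXT FROM job_cursor INTO @job;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_start_job @job_name = @job;  -- queues the job and returns immediately
    FETCH NEXT FROM job_cursor INTO @job;
END
CLOSE job_cursor;
DEALLOCATE job_cursor;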
Use PowerShell...
Create a table on the central database to house instance / connection string details (remember to obfuscate these for security).
Create another table to house the queries.
Create a third table to map Instance to Query (a T-SQL sketch of these three tables follows after the steps).
In PowerShell, create a collection / list based object, deserialized from your data entries. Each object will be made up of three properties: {Source / Destination / Query}.
Write a method / function to carry out the ETL work: connect to the database, read from the source, write to the destination.
Iterate over the collection using the Foreach -Parallel construct with your function nested within. This will initiate a new SPID for each element in the collection and pass those values into your function, where the work will be carried out.
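A minimal T-SQL sketch of the three control tables and of the query the PowerShell script would deserialize (all table and column names are hypothetical):

CREATE TABLE dbo.TargetInstance
(
    InstanceId       int IDENTITY(1,1) PRIMARY KEY,
    ConnectionString nvarchar(4000) NOT NULL   -- store these secured/obfuscated, per the note above
);

CREATE TABLE dbo.TransferQuery
(
    QueryId   int IDENTITY(1,1) PRIMARY KEY,
    QueryText nvarchar(max) NOT NULL
);

CREATE TABLE dbo.InstanceQueryMap
(
    InstanceId int NOT NULL REFERENCES dbo.TargetInstance (InstanceId),
    QueryId    int NOT NULL REFERENCES dbo.TransferQuery (QueryId),
    PRIMARY KEY (InstanceId, QueryId)
);

-- The central database is the source; each row becomes one {Source / Destination / Query} work item
SELECT DB_NAME()          AS Source,
       i.ConnectionString AS Destination,
       q.QueryText        AS Query
FROM dbo.InstanceQueryMap AS m
JOIN dbo.TargetInstance   AS i ON i.InstanceId = m.InstanceId
JOIN dbo.TransferQuery    AS q ON q.QueryId    = m.QueryId;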

Common function / stored procedures for all databases

We have a database server and it has about 10 databases.
I would like to create some functions / stored procedures which can be used in all databases.
For example, we can use sp_executesql in any database.
We have some requirements like that (getting the current academic year, financial year, etc.).
Is it doable?
As others have suggested, you could put objects into the master database, but Microsoft explicitly recommends that you should not do that. I find that solution to be rather risky anyway, because the master database is 'owned' by the system, not by you, so there are no guarantees that it will continue to behave in the same way in the future.
Instead, I would consider this to be primarily a deployment issue. There are (at least) two strategies you could use:
Deploy the objects to every database
Deploy them to one 'reference' database that is only used for shared objects and create synonyms in the other databases
The second option is perhaps the better one, because if your functions use tables (e.g. you use a calendar table to get the academic year, which is much easier than calculating it) then you would have to create the same tables in every database too. By using synonyms, you only have to maintain one set of tables.
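For example, a minimal sketch of the synonym approach, assuming a shared database named RefDb and an application database named AppDb1 (both names hypothetical):

USE AppDb1;
GO
-- Point a local name at the shared function in the reference database
CREATE SYNONYM dbo.fn_GetAcademicYear
    FOR RefDb.dbo.fn_GetAcademicYear;
GO
-- Callers in AppDb1 can now use it as if it were local
SELECT dbo.fn_GetAcademicYear(GETDATE()) AS AcademicYear;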
For the actual deployment, it's straightforward to use scripting to manage the objects, because you just need a list of databases to connect to and run each DDL script against. You can do that using batch files and SQLCMD (perhaps with SQLCMD variables in your .sql scripts), or drive it from PowerShell or any other language that you prefer.
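A SQLCMD-style deployment script might look like the sketch below (the database name, function name, and academic-year rule are all illustrative assumptions); you would run it once per database from a batch file or PowerShell, passing TargetDb through sqlcmd's variable switch:

:setvar TargetDb AppDb1
USE [$(TargetDb)];
GO
-- CREATE OR ALTER needs SQL Server 2016 SP1 or later; use DROP/CREATE on older versions
CREATE OR ALTER FUNCTION dbo.fn_GetAcademicYear (@AsOf date)
RETURNS int
AS
BEGIN
    -- Illustrative rule only: the academic year rolls over on 1 September
    RETURN CASE WHEN MONTH(@AsOf) >= 9 THEN YEAR(@AsOf) ELSE YEAR(@AsOf) - 1 END;
END;
GO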
Depending upon what the SP actually does, you can create the procedure in master, name it with an sp_ prefix, and mark it as a system procedure:
http://weblogs.sqlteam.com/mladenp/archive/2007/01/18/58287.aspx
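Sketched out, that approach looks roughly like the following (the procedure name and academic-year logic are hypothetical; note the warning about sp_MS_marksystemobject in the next answer):

USE master;
GO
CREATE PROCEDURE dbo.sp_GetAcademicYear
    @AsOf date = NULL
AS
BEGIN
    SET NOCOUNT ON;
    SET @AsOf = ISNULL(@AsOf, GETDATE());
    -- Illustrative rule only: the academic year starts on 1 September
    SELECT CASE WHEN MONTH(@AsOf) >= 9 THEN YEAR(@AsOf) ELSE YEAR(@AsOf) - 1 END AS AcademicYear;
END;
GO
-- Marking it as a system object lets it run in the context of whichever database calls it
EXEC sp_MS_marksystemobject N'dbo.sp_GetAcademicYear';
GO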
A couple of options:
You can use a system stored procedure, as Cade says. I've done this in the past and it works OK. One warning on this is that the sp_MS_marksystemobject procedure is undocumented, which may mean that it could vanish or change without warning in future SQL Server versions. Thinking back, I believe there were other problems using this approach with functions, though.
Another approach is to use standardized procedures and functions, and roll them out across your databases using sp_MSforeachdb to run code against every database. If you need to run against only your 10 databases, you can copy the code from this procedure and modify it to check that a database matches your schema before running the code (or you can write your own version that does a similar thing).
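A rough sketch of that roll-out, with hypothetical database names, a hypothetical schema check, and a deliberately trivial procedure body (sp_MSforeachdb is also undocumented, and CREATE OR ALTER needs SQL Server 2016 SP1 or later):

EXEC sp_MSforeachdb N'
IF ''?'' IN (N''AppDb1'', N''AppDb2'', N''AppDb3'')                              -- only databases you own
   AND EXISTS (SELECT 1 FROM [?].sys.tables WHERE name = N''AcademicCalendar'')  -- schema sanity check
BEGIN
    PRINT N''Deploying to ?'';
    EXEC (N''USE [?]; EXEC (''''CREATE OR ALTER PROCEDURE dbo.usp_GetAcademicYear AS SELECT YEAR(GETDATE());'''')'');
END';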

Migration of the same data in several directions

I need to move data between PROD and DEV SQL Server databases using a stored procedure. Obviously I have to use linked servers to perform this task. But I need the same functionality in both directions.
For example something like:
CREATE PROCEDURE dbo.spMoveData
AS
    INSERT INTO Target.dbo.TestTable
    SELECT * FROM Source.dbo.TestTable;
And just pass the proper connection strings to the linked servers. I think this is better than creating two identical stored procedures. Is it possible? If not, are there any best practices for such tasks?
Maybe it makes sense to drop and re-create the required linked servers before executing the procedure?
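One way to get a single procedure working in both directions (in place of the sketch above) is to pass the linked server names as parameters and build the statement with dynamic SQL. The sketch below assumes linked servers named PRODLINK and DEVLINK and a database named MyDb, all hypothetical:

CREATE PROCEDURE dbo.spMoveData
    @SourceServer sysname,   -- linked server to read from
    @TargetServer sysname    -- linked server to write to
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(max) =
        N'INSERT INTO ' + QUOTENAME(@TargetServer) + N'.MyDb.dbo.TestTable
          SELECT * FROM ' + QUOTENAME(@SourceServer) + N'.MyDb.dbo.TestTable;';

    EXEC sys.sp_executesql @sql;
END;
GO

-- Copy PROD -> DEV, or swap the parameters to go the other way
EXEC dbo.spMoveData @SourceServer = N'PRODLINK', @TargetServer = N'DEVLINK';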

SQL Server stored procedure conversion to SSIS Package

Problem: we currently have numerous stored procedures (very long, up to 10,000 lines) which were written by various developers for various requirements over the last 10 years. It has now become hard to manage those complex, long stored procedures (with no proper documentation).
We plan to move those stored procedures into SSIS ETL packages.
Has anybody done this in the past? If yes, what approach should one take?
I would appreciate any advice on an approach for converting stored procedures into SSIS ETL packages.
Thanks
I've done this before, and what worked well for my team was to refactor incrementally, starting with the original source, and then iterate the refactoring effort.
The first step was to attempt to modularize the stored procedure logic into Execute SQL tasks that we chained together. Each task was tested and approved, then we'd integrate and ensure that the new process matched the results of the legacy procedures.
After this point, we could divide the individual Execute SQL tasks across the team, and load-balance the analysis of whether we could further refactor the SQL within the Execute SQL tasks to native SSIS tasks.
Each refactoring was individually unit tested and then integration tested to ensure that the overall process output still behaved like the legacy procedures.
I would suggest the following steps:
Analyze the stored procedures to identify the list of sources and destinations. For example, if the stored procedure dbo.TransferOrders moves data from table dbo.Order to dbo.OrderHistory, then your source will be dbo.Order and your destination will be dbo.OrderHistory.
After you list out the sources and destinations, try to group the stored procedures by source or destination, according to your preference.
Try to find out if there are any data transformations happening within the stored procedures. There are good data transformation tasks available within SSIS, so you can evaluate and move some of that functionality from the stored procedures to SSIS. Since SSIS is a workflow kind of tool, I feel it is easier to understand what is going on inside a package than having to scroll through many lines of code to understand the functionality. But that's just me; preferences differ from person to person.
Try to identify the dependencies within stored procedures and prepare a hierarchy. This will help in placing the tasks inside the package in appropriate order.
If you have a table named dbo.Table1 populating 5 different tables, I would recommend handling them in a single package. Even if this data population is carried out by 5 different stored procedures, you don't need to go for 5 packages. Still, this again depends on your business scenario.
An SSIS solution can contain multiple packages and re-use data sources. You can use the Execute SQL Task available in the Control Flow to run your existing queries, but I would recommend that you also take a look at some of the nice transformation tasks available in SSIS. I have used them in my project and they function well for ETL operations.
These steps can be done by looking into one stored procedure at a time. You don't have to go through all of them at once.
Please have a look at some of the examples that I have given in other Stack Overflow questions. These should help you give an idea of what you can achieve with SSIS.
Copying data from one SQL table to another
Logging feature available in SSIS
Loading a flat file with 1 million rows into SQL tables using SSIS
Hope that helps.

Stored Procedures MSSQL2005

If you have a lot of Stored Procedures and you change the name of a column of a table, is there a way to check which Stored Procedures won't work any longer?
Update: I've read some of the answers and it's clear to me that there is no easy way to do this. Would it be easier to move away from Stored Procedures?
I'm a big fan of SysComments for this:
SELECT DISTINCT Object_Name(ID)
FROM SysComments
WHERE text LIKE '%Table%'
AND text LIKE '%Column%'
There's a book-style answer to this, and a real-world answer.
First, for the book answer, you can use sp_depends to see what other stored procs reference the table (not the individual column), and then examine those to see whether they reference the column:
http://msdn.microsoft.com/en-us/library/ms189487.aspx
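For example (the table name is hypothetical; on SQL Server 2008 and later, sys.dm_sql_referencing_entities is the documented replacement):

-- Lists the procedures, views, and functions that reference the table
EXEC sp_depends @objname = N'dbo.Orders';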
The real-world answer, though, is that it doesn't work in a lot of cases:
Dynamic SQL strings: if you're building strings dynamically, either in a stored proc or in your application code, and then executing that string, SQL Server has no way of knowing what your code is doing. You may have the column name hard-coded in your code, and that'll break.
Embedded T-SQL code: if you've got code in your application (not in SQL Server) then nothing in the SQL Server side will detect it.
Another option is to use SQL Server Profiler to capture a trace of all activity on the server, then search through the captured queries for the field name you want. It's not a good idea on a production server, because the trace incurs some overhead, but it does work - most of the time. Where it will break is if your application does a "SELECT *" and then expects a specific field name to come back as part of that result set.
You're probably beginning to get the picture that there's no simple, straightforward way to do this.
While this will take the most work, the best way to ensure that everything works is to write integration tests.
Integration tests are just like unit tests, except in this case they would integrate with the database. It would take some effort, but you could easily write tests that exercise each stored procedure to ensure it executes w/o error.
In the simplest case it would just execute the sp, make sure there is no error, and not be concerned about the actual results. If your tests just executed sps without checking results, you could write a lot of this generically.
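A minimal sketch of that generic smoke test, limited to parameterless procedures for simplicity (be aware that executing procedures may have side effects, so run it against a test database):

DECLARE @proc nvarchar(300);

DECLARE proc_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(SCHEMA_NAME(p.schema_id)) + N'.' + QUOTENAME(p.name)
    FROM sys.procedures AS p
    WHERE NOT EXISTS (SELECT 1 FROM sys.parameters AS prm WHERE prm.object_id = p.object_id);

OPEN proc_cursor;
FETCH NEXT FROM proc_cursor INTO @proc;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXEC (@proc);                                   -- only checking that it runs without error
        PRINT @proc + N' ... OK';
    END TRY
    BEGIN CATCH
        PRINT @proc + N' ... FAILED: ' + ERROR_MESSAGE();
    END CATCH;
    FETCH NEXT FROM proc_cursor INTO @proc;
END
CLOSE proc_cursor;
DEALLOCATE proc_cursor;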
To do this you would need a database to execute against. While you could setup the database and deploy your stored procs manually, the best way would be to use continuous integration to automatically get the latest code (database DDL, stored procs, tests) from your source control system, build your database, and execute your tests. This would happen every time you committed changes to source control.
Yes it seems like a lot of work. It's a lot of work, but the payoff is also big. The ability to ensure that your changes don't break anything allows you to move your product forward faster with a better quality.
Take a look at NUnit and NDbUnit
I'm sure there are more elegant ways to address this, but if the database isn't too complex, here's a quick and dirty way:
Select all the sprocs and script to a query window.
Search for the old column name.
If you are only interested in finding the column usage in the stored procedures, probably the best way will be to do a brute-force search for the column name in the definition column of the sys.sql_modules table, which stores the definitions of stored procedures and functions.
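For example (the column name is hypothetical):

-- Brute-force search of module definitions for the old column name
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS SchemaName,
       OBJECT_NAME(m.object_id)        AS ObjectName
FROM sys.sql_modules AS m
WHERE m.definition LIKE N'%OldColumnName%';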
