I'm building my own clone of http://statoverflow.com/sandbox (using the free controls Telerik provided to 10K users). I have a proof of concept I can use locally, but before I open it up to others I need to lock it down some more. Currently I run everything through a stored procedure that looks something like this:
CREATE PROCEDURE WebQuery
@QueryText nvarchar(1000)
AS
BEGIN
-- no writes, so no need to lock on select
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
-- throttles
SET ROWCOUNT 500
SET QUERY_GOVERNOR_COST_LIMIT 500
exec (@QueryText)
END
I still need to do two things:
Replace QUERY_GOVERNOR_COST_LIMIT with an actual rather than estimated timeout, so no query runs longer than, say, 2 minutes.
Right now nothing stops users from just putting their own 'SET ROWCOUNT 50000;' in front of their query text to override my restriction, so I need to somehow limit queries to a single statement or (preferably) disallow SET commands inside the exec.
Any ideas?
You really plan to allow users to run arbitrary ad-hoc SQL? Only then can a user place a SET in there to override your restrictions. If that's the case, your best bet is to do some basic parsing using lex/yacc or flex/bison (or your favorite CLR language tree parser) and detect invalid SET statements. Are you going to allow SET @variable=value though, which syntactically is a SET...
If you impersonate low privileged users via EXECUTE AS make sure you create an irreversible impersonation context, so the user does not simply execute REVERT and regain all the privileges :) You also must really understand the implications of database impersonation, make sure you read Extending Database Impersonation by Using EXECUTE AS.
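A rough, standalone sketch of the two forms mentioned above, shown as separate alternatives; sandbox_user is a hypothetical low-privileged database user:
-- option 1: irreversible impersonation; nothing in the submitted SQL can REVERT out of it
EXECUTE AS USER = 'sandbox_user' WITH NO REVERT;
-- option 2: allow REVERT, but only with a cookie the sandboxed code never sees
DECLARE @cookie varbinary(8000);
EXECUTE AS USER = 'sandbox_user' WITH COOKIE INTO @cookie;
-- ... run the restricted work here ...
REVERT WITH COOKIE = @cookie;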
Another thing to consider is deferring execution of requests to a queue. Since queue readers can be calibrated via MAX_QUEUE_READERS, you get very cheap throttling. See Asynchronous procedure execution for a related article on how to use queues to execute batches. This mechanism is different from resource governance, but I've seen it used to more effect than the governor itself.
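To show where the throttling knob lives, a bare-bones sketch of a queue definition; the activation procedure name is made up and the rest of the Service Broker plumbing (message types, contract, service) is omitted:
CREATE QUEUE dbo.WebQueryQueue
WITH STATUS = ON,
ACTIVATION (
    STATUS = ON,
    PROCEDURE_NAME = dbo.ProcessWebQueryBatch,   -- hypothetical activation proc
    MAX_QUEUE_READERS = 2,                       -- at most 2 batches execute concurrently
    EXECUTE AS OWNER
);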
Throwing this out there:
The EXEC statement appears to support impersonation. See http://msdn.microsoft.com/en-us/library/ms188332.aspx. Perhaps you can impersonate a limited user. I am looking into whether there are limitations that could prevent SET statements and the like.
On a very basic level, how about blocking any statement that doesn't start with SELECT? Or will other query starts be supported, like CTEs or DECLARE statements? 1000 chars isn't too much room to play with, but I'm not too clear what this is in the first place.
UPDATED
Ok, how about prefixing whatever they submit with SELECT TOP 500 * FROM ( and appending ) AS q. If they try to run multiple statements it'll throw an error you can catch. And to prevent denial of service, replace their starting SELECT with another SELECT TOP 500.
Doesn't help if they've appended an ORDER BY to something returning a million rows, though.
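For what it's worth, a rough sketch of that wrapping inside the existing WebQuery procedure (the derived-table alias q is arbitrary):
DECLARE @Wrapped nvarchar(1100);
SET @Wrapped = N'SELECT TOP 500 * FROM (' + @QueryText + N') AS q';
exec (@Wrapped);
A trailing semicolon or a second statement inside the parentheses won't parse, which gives you the catchable error mentioned above.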
Related
The SET STATISTICS TIME statement is only useful while developing, since you can use it to performance-tune an additional statement being added to the query, or the UDF/SP being worked on. However, when you have to performance-tune existing code, e.g. an SP with hundreds or thousands of lines, its output is pretty much useless, because it is not clear which SQL statement the recorded times belong to.
Isn't there an alternative to SET STATISTICS TIME that also shows the statements the recorded times belong to?
I would recommend using an advanced tool. Here is an example of one call of an SP with all of its internal details. On the right you have the history of different runs, which can be commented on and analyzed later. Everything you need for stats/index usage/IO/waits is available on different tabs. Util: SentryOne Plan Explorer (free).
If your Stored Procedures are granular then you could use this DMV to get an idea of times.
SELECT
DB_NAME(qs.database_id) AS DBName
,qs.database_id
,qs.object_id
,OBJECT_NAME(qs.object_id,qs.database_id) AS ObjectName
,qs.cached_time
,qs.last_execution_time
,qs.plan_handle
,qs.execution_count
,total_worker_time
,last_worker_time
,min_worker_time
,max_worker_time
,total_physical_reads
,last_physical_reads
,min_physical_reads
,max_physical_reads
,total_logical_writes
,last_logical_writes
,min_logical_writes
,max_logical_writes
,total_logical_reads
,last_logical_reads
,min_logical_reads
,max_logical_reads
,total_elapsed_time
,last_elapsed_time
,min_elapsed_time
,max_elapsed_time
FROM
sys.dm_exec_procedure_stats qs
I'd create an extended events session similar to the one below:
CREATE EVENT SESSION [proc_statments] ON SERVER
ADD EVENT sqlserver.module_end(
WHERE ([object_name]=N'usp_foobar')
),
ADD EVENT sqlserver.sp_statement_completed(
SET collect_object_name=(1),collect_statement=(1)
WHERE ([object_name]=N'usp_foobar'))
ADD TARGET package0.event_file(SET filename=N'proc_statments')
WITH (TRACK_CAUSALITY=ON)
GO
This tracks both stored procedure and stored procedure statement completion for a procedure called usp_foobar. Within the event itself, there's an identifier that helps you tie together which statements were executed as a result of having executed a specific procedure (that's what the TRACK_CAUSALITY is for).
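To use it, you would start the session and then pull the captured statements back out of the file target; a minimal sketch, assuming the files land in the default log directory:
ALTER EVENT SESSION [proc_statments] ON SERVER STATE = START;
-- run usp_foobar, then read the captured events back as XML
SELECT CAST(event_data AS xml) AS event_data
FROM sys.fn_xe_file_target_read_file(N'proc_statments*.xel', NULL, NULL, NULL);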
I have a stored procedure, and I want to ensure it cannot be executed concurrently.
My (multi-threaded) application does all necessary work on the underlying table via this stored procedure.
IMO, locking the table itself is an unnecessarily drastic action to take, so when I found out about sp_GetAppLock, which essentially enforces a critical section, it sounded ideal.
My plan was to wrap the stored procedure in a transaction and call sp_GetAppLock with transaction scope. The code was written and tested successfully.
The code has now been put forward for review and I have been told that I should not call this function. However when asking the obvious question "why not?", the only reasons I am getting are highly subjective, to do with any form of locking being complicated.
I don't necessarily buy this, but I was wondering whether anyone had any objective reasons why I should avoid this construct. Like I say, given my circumstances a critical section sounds an ideal approach to me.
Further info: An application sits on top of this with 2 threads T1 and T2. Each thread is waiting for a different message M1 and M2. The business logic involved says that processing can only happen once both M1 and M2 have arrived. The stored procedure logs that Mx has arrived (insert) and then checks whether My is present (select). The built-in locking is fine to make sure the inserts happen serially. But the selects need to happen serially too and I think I need to do something over and above the built-in functionality here.
Just for clarity, I want the "processing" to happen exactly once. So I can't afford for the stored procedure to return either false positives or false negatives. I'm worried that if the stored proc runs twice in very quick succession, then both "selects" might return data which indicates that it is appropriate to perform processing.
What is the procedure doing that you cannot rely on SQL Server's built-in concurrency control mechanisms? Often queries can be rewritten to allow real concurrency.
But if this procedure indeed has to be executed "alone", locking the table itself on first access is most likely going to be a lot faster than using the call to sp_GetAppLock. It sounds like this procedure is going to be called often. If that is the case you should look for a way to achieve the goal with minimal impact.
If the table contains no other rows besides M1 and M2, a table lock is still your best bet.
If you have multiple threads sending multiple messages, you can get more fine-grained by using SERIALIZABLE as the transaction isolation level and checking whether the other message is there before you do the insert, within the same transaction. To prevent deadlocks in this case, make sure you check for both messages, for example like this:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
DECLARE @hasM1 bit, @hasM2 bit;
SELECT
    @hasM1 = MAX(CASE WHEN msg_type='M1' THEN 1 ELSE 0 END),
    @hasM2 = MAX(CASE WHEN msg_type='M2' THEN 1 ELSE 0 END)
FROM messages WITH(UPDLOCK)
WHERE msg_type IN ('M1','M2');
INSERT ...
IF(??) EXEC do_other_stuff_and_delete_messages;
COMMIT;
In the IF statement before(!) the COMMIT you can use the information collected before the insert together with the information that you inserted to decide if additional processing is necessary.
In that processing step make sure to either mark those messages as processed or to delete them all still within the same transaction. That will make sure that you will not process those messages twice.
SERIALIZABLE is the only transaction isolation level that allows you to lock rows that do not exist yet, so the first SELECT with the WITH(UPDLOCK) hint effectively prevents the other row from being inserted while the first execution is still running.
Finally, that is a lot of things to be aware of that could go wrong. You might want to have a look at Service Broker instead. You could use three queues with that: one for type M1, one for type M2, and a third for processing tokens. Every time a message arrives in one of the first two queues, a procedure can automatically be called to insert a token into the third queue. The third queue can then activate a process that checks whether both messages exist and does the work. That would make the entire process asynchronous, but it would be easy to restrict the queue-3 activation to only ever do one check at a time.
Service Broker on MSDN; also look at "activation" for the automatic message processing.
sp_GetAppLock is just like many other tools and as such it can be misused, overused, or correctly used. It is an exact match for the type of problem described by the original poster.
This is a good MSSQL Tips post on the usage
Prevent multiple users from running the same SQL Server stored procedure at the same time
http://www.mssqltips.com/sqlservertip/3202/prevent-multiple-users-from-running-the-same-sql-server-stored-procedure-at-the-same-time/
We use sp_getapplock all the time, due to the fact that we support some legacy applications that have been re-worked to use a SQL back-end, and the SQL Server locking model is not an exact match for our application logic.
We tend to go for a 'pessimistic' locking model, where we lock an entity before allowing a user to edit it, and use the (NOLOCK) hint extensively when reading data to bypass any blocking from the native locks on the actual tables. sp_getapplock is a good match for this. We also use it to enforce critical paths in large multi-user systems. You have to be systematic about how you name the locks you place.
We've found no performance problems with large numbers of user/locks via this route, so I see no reason why it wouldn't work well for you. Just be aware that you can get blocking and deadlocks if you have processes that place the same named locks, but not necessarily in the same order.
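For reference, a minimal sketch of the pattern the question describes; the resource name is arbitrary, and the return code is worth checking (0 or 1 means the lock was granted):
BEGIN TRAN;
DECLARE @rc int;
EXEC @rc = sp_getapplock
        @Resource    = 'MessagePairProcessing',   -- arbitrary lock name
        @LockMode    = 'Exclusive',
        @LockOwner   = 'Transaction',             -- released when the transaction ends
        @LockTimeout = 10000;                     -- wait up to 10 seconds
IF @rc < 0
    ROLLBACK;   -- lock not granted (timeout, deadlock, error): skip the work
ELSE
BEGIN
    -- critical section: log Mx, check for My, process exactly once
    COMMIT;     -- releases the app lock
END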
You can create a table with a flag for each set of messages, so whichever thread starts processing first marks the flag as processing.
To make sure the record is blocked properly once one of the threads reaches it, use:
SELECT ... FROM <table> WITH (XLOCK, ROWLOCK, READCOMMITTED) WHERE ...
This piece of code will put an exclusive lock on the record, meaning whoever gets to it first owns the row.
Then you make your changes and update the flag; the other thread will see the updated value because it will be blocked by the exclusive lock until the first thread commits or rolls back its transaction.
For this to work, you always need to select records from the table with XLOCK; that way it will behave as expected.
Hope this helps.
Exclusive lock proof:
USE master
GO
IF OBJECT_ID('dbo.tblTest') IS NOT NULL
DROP TABLE dbo.tblTest
CREATE TABLE tblTest ( id int PRIMARY KEY )
;WITH cteNumbers AS (
SELECT 1 N
UNION ALL
SELECT N + 1 FROM cteNumbers WHERE N<1000
)
INSERT INTO
tblTest
SELECT
N
FROM
cteNumbers
OPTION (MAXRECURSION 0)
BEGIN TRANSACTION
SELECT * FROM dbo.tblTest WITH(XLOCK,ROWLOCK,READCOMMITTED) WHERE id = 1
SELECT * FROM sys.dm_tran_locks WHERE resource_database_id = DB_ID('master')
ROLLBACK TRANSACTION
I have a bunch of users making ad-hoc queries into a database running SQL Server. Occasionally someone will run a query that locks up the system for a long period of time as they retrieve 10MM rows.
Is it possible to set some options when a specific login connects? e.g.:
transaction isolation level
max rowcount
query timeout
If this isn't possible in SQL Server 2000, is it possible in another version? As far as I can tell, the resource governor does not give you control like this (it just lets you manage memory and CPU).
I realize the users could do much of this themselves, but it'd be awesome if I could control it from the server per user.
Obviously I'd like to move the users away from direct table/view access but that's not an option at the moment.
You can certainly limit the query results. I'm doing it for StackQL using a stored procedure that looks something like this:
CREATE PROCEDURE [dbo].[WebQuery]
@QueryText nvarchar(1000)
AS
BEGIN
INSERT INTO QueryLogs(QueryText)
VALUES(@QueryText)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SET QUERY_GOVERNOR_COST_LIMIT 15000
SET ROWCOUNT 500
Begin Try
exec (@QueryText)
End Try
Begin Catch
SELECT ERROR_NUMBER() AS ErrorNumber,
ERROR_MESSAGE() AS ErrorMessage,
ERROR_PROCEDURE() AS ErrorProcedure,
ERROR_SEVERITY() AS ErrorSeverity,
ERROR_LINE() AS ErrorLine,
ERROR_STATE() AS ErrorState
End Catch
END
The important part here is the series of three SET statements after the log insert. This limits the query based on both the number of rows in the results and the estimated cost of the query. The ROWCOUNT and query governor values can be variables, so it shouldn't be hard to modify this to change the restriction based on the current user as well.
However, you should also note that it's pretty easy for users who are "in the know" to bust out of that if they want. In my case I consider the ability to get past the limits from time to time a feature. But it's also why I do the logging: the code to get past the limits sticks out in the logs, and so I can easily catch and ban anyone doing it too often without my permission.
Finally, any user that calls this should be in only the db_denydatawriter and db_datareader roles, and then be given explicit permission to execute just this stored procedure. Then they can't really do anything but SELECT from existing tables.
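A quick sketch of that permission setup, with a hypothetical user name:
-- hypothetical sandbox user: read-only everywhere, plus this one procedure
EXEC sp_addrolemember 'db_datareader',     'sandbox_user';
EXEC sp_addrolemember 'db_denydatawriter', 'sandbox_user';
GRANT EXECUTE ON dbo.WebQuery TO sandbox_user;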
Now, I'll anticipate your next question: can you make this automatic from somewhere like Report Builder or Management Studio? Unfortunately, I don't think that's possible. You'll need to give them some kind of interface that makes it easy to call your stored procedure. But I could be wrong here.
In SQL Server 2005 and 2008 you can add a logon trigger that changes the connection settings based on who has connected. However, you need to be very careful with these, as a mistake or error in the trigger could result in everyone being locked out.
There's nothing like that in SQL Server 2000.
I just wanted to add that if Joel's approach works for you, I would strongly encourage that method over mine, because a logon trigger is one mighty big and dangerous grenade to be throwing at this problem.
If you still really want to do them, here is a nice article that demonstrates how to use them. Even more important, however, is this one, which has the crucial instructions for what to do if your logon trigger locks everyone out.
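To give a sense of the shape (and the blast radius), here is a minimal sketch along the lines of the Books Online example, limiting a hypothetical adhoc_login to a few concurrent sessions; note that an error in a trigger like this affects every connection attempt:
CREATE TRIGGER connection_limit_trigger
ON ALL SERVER WITH EXECUTE AS 'adhoc_login'   -- this login needs VIEW SERVER STATE
FOR LOGON
AS
BEGIN
    IF ORIGINAL_LOGIN() = N'adhoc_login'
       AND (SELECT COUNT(*)
            FROM sys.dm_exec_sessions
            WHERE is_user_process = 1
              AND original_login_name = N'adhoc_login') > 3
        ROLLBACK;   -- refuse the new connection
END;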
I am running a bunch of database migration scripts. I find myself with a rather pressing problem: the business is waking up and expects to see its data, and the data has not finished migrating. I also took the applications offline, and they really need to be started back up. In reality "the business" is a number of companies, and therefore I have a number of scripts running SPs in one query window, like so:
EXEC [dbo].[MigrateCompanyById] 34
GO
EXEC [dbo].[MigrateCompanyById] 75
GO
EXEC [dbo].[MigrateCompanyById] 12
GO
EXEC [dbo].[MigrateCompanyById] 66
GO
Each SP calls a large number of other sub SPs to migrate all of the data required. I am considering cancelling the query, but I'm not sure at what point the execution will be cancelled. If it cancels nicely at the next GO then I'll be happy. If it cancels mid way through one of the company migrations, then I'm screwed.
If I cannot cancel, could I ALTER the MigrateCompanyById SP and comment all the sub SP calls out? Would that also prevent the next one from running, whilst completing the one that is currently running?
Any thoughts?
One way to achieve a controlled cancellation is to add a table containing a cancel flag. You can set this flag when you want to cancel execution, and your SPs can check it at regular intervals and stop executing if appropriate.
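A minimal sketch of that idea (the table and column names are made up):
CREATE TABLE dbo.MigrationControl (CancelRequested bit NOT NULL);
INSERT INTO dbo.MigrationControl (CancelRequested) VALUES (0);
-- inside MigrateCompanyById, between sub-SP calls:
IF EXISTS (SELECT 1 FROM dbo.MigrationControl WHERE CancelRequested = 1)
    RETURN;   -- bail out at a known-safe point
-- from another window, to request cancellation:
UPDATE dbo.MigrationControl SET CancelRequested = 1;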
I was forced to cancel the script anyway.
When doing so, I noted that it cancels after the currently executing statement, regardless of where that statement is in the SP execution chain.
Are you bracketing the code within each migration stored proc with transaction handling (BEGIN, COMMIT, etc.)? That would enable you to roll back the changes relatively easily depending on what you're doing within the procs.
One solution I've seen: you have a table with a single record holding a bit value of 0 or 1. If that record is 0, your production application disallows access by the user population, enabling you to do whatever you need to; you then set the flag to 1 after your task is complete to let production continue. This might not be practical given your environment, but it can give you assurance that no users will be messing with your data through your app until you decide it's ready to be messed with.
You can use this method to report the execution progress of your script.
The way you have it now, every sproc is its own transaction, so if you cancel the script the data will be updated only partially, up to the point of the last successfully executed sproc.
You can, however, put it all in a single transaction if you need an all-or-nothing update.
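For example, assuming RAISERROR ... WITH NOWAIT as the progress-reporting method and one wrapping transaction for all-or-nothing behaviour:
BEGIN TRAN;
RAISERROR('Migrating company 34...', 0, 1) WITH NOWAIT;   -- prints immediately, not at batch end
EXEC [dbo].[MigrateCompanyById] 34;
RAISERROR('Migrating company 75...', 0, 1) WITH NOWAIT;
EXEC [dbo].[MigrateCompanyById] 75;
COMMIT;   -- or ROLLBACK on error, so either every company migrates or none do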
I am still learning SQL Server somewhat and recently came across a SELECT query in a stored procedure which was causing a very slow fill of a DataSet in C#. At first I thought this was to do with .NET, but then I found a suggestion to put this in the stored procedure:
set implicit_transactions off
This seems to cure it, but I would like to know why. I have also seen other options such as:
set nocount off
set arithabort on
set concat_null_yields_null on
set ansi_nulls on
set cursor_close_on_commit off
set ansi_null_dflt_on on
set ansi_padding on
set ansi_warnings on
set quoted_identifier on
Does anyone know where to find good info on what each of these does and which are safe to use when I have stored procedures set up just to query data for viewing?
I should note, just to stop the usual use/don't-use stored procedures debate: these queries are complex SELECT statements used by multiple programs in multiple languages, so a stored procedure is the best place for them.
Edit: Got my answer. I didn't end up fully reviewing all the options, but I did find that
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
sped up the complex queries dramatically. I am not worried about dirty reads in this instance.
This is the page out of SQL Server Books Online (BOL) that you want. It explains all the SET statements that can be used in a session.
http://msdn.microsoft.com/en-us/library/ms190356.aspx
Ouch, someone, somewhere is playing with fire big-time.
I have never had a production scenario where I had to enable implicit transactions. I always open transactions when I need them and commit them when I am done. The problem with implicit transactions is that it's really easy to "leak" an open transaction, which can lead to horrible issues. What this setting means is "please open a transaction for me the first time I run a statement if there is no transaction open; don't worry about committing it".
For example, have a look at the following:
set implicit_transactions on
go
select top 10 * from sysobjects
And
set implicit_transactions off
go
begin tran
select top 10 * from sysobjects
They both do the exact same thing, however in the second version it's pretty clear that someone forgot to commit the transaction. This can get very complicated to track down if the setting is enabled in some obscure place.
The best place to get documentation for all the SET statements is the old trusty SQL Server Books Online. It, together with a bit of experimentation in Query Analyzer, is usually all that's required to get a grasp of most settings.
I would strongly recommend you find out who is setting up implicit transactions, find out why they are doing it, and remove the setting if it's not really required. Also, you must confirm that whoever uses this setting commits their implicitly opened transactions.
What was probably going on is that you had an open transaction blocking a part of your stored proc, and somewhere a timeout was occurring, raising an error that was handled in code; when that timeout happens your stored proc continues running. My guess is that the delay is usually exactly 30 seconds.
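If you want to check whether that is what's happening, something like this shows leaked open transactions and who is blocked by whom (sys.dm_exec_requests is available from SQL Server 2005 on):
-- oldest active transaction in the current database
DBCC OPENTRAN;
-- sessions that are currently blocked, and by whom
SELECT session_id, blocking_session_id, open_transaction_count, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;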
I think you need to look deeper into your stored procedure. I don't think SET IMPLICIT_TRANSACTIONS is really what sped up your procedure; I think it's probably a coincidence.
One thing that may be worth looking at is what is passed from the client to the server, using Profiler.
We had an odd situation where the default SET options for the ADO connection were causing an SP to take ages to run from the client. We resolved it by looking at exactly what the server was receiving from the client, complete with the default SET options, compared to what was sent when executing from SSMS. We then made the client pass the same SET statements as those sent by SSMS.
This may be way off track, but it is a useful method when the SP executes in a timely fashion on the server but not from the client.
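Alongside Profiler, the session-level SET options can also be compared directly; a quick sketch using sys.dm_exec_sessions (2005 and later):
SELECT session_id, program_name,
       arithabort, ansi_nulls, quoted_identifier, concat_null_yields_null
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;
A mismatch on ARITHABORT between the application connection and SSMS is the classic cause of "fast in SSMS, slow from the app", because the two end up with different cached plans.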