SQL Azure Request Limits and a possible connection leak - sql-server

I have an interesting problem going on. I recently moved 2 SQL databases to SQL Azure for a client and all seemed to be going well...at first. Mid-morning I get a spike of error emails for various things, but a few common ones:
-The request limit for the database is 90 and has been reached.
-Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
-A transport-level error has occurred when receiving results from the server.
There are obviously some database-related issues going on, either with the move to Azure or with the existing code in general. The errors that happened the most were request limits and timeouts. Once they started, they never seemed to stop, and I don't think there were many users using the site today. It almost seemed like the connection kept trying to connect on a different thread in the background, if that makes any sense. This is in reference to the "The timeout period elapsed prior to completion of the operation or the server is not responding." error. I would get an error email, check the page it referenced myself, and it would load immediately. I checked with the user who triggered the error and they reported everything was fine. Strange. Yet I continued to get the same error email every few minutes.
I currently have them on the S1 Tier which limits the requests to 90 concurrently. I did some digging and found the following SQL query:
select * from sys.dm_exec_connections
I ran this, and it showed I had over 90 active connections, some of which were opened some time ago. This was strange to me as the site was currently not being used (it's really late at night and I know no one is using the site). I wanted to end all the connections so I came up with the following query:
DECLARE @sessionId int
DECLARE @SQL nvarchar(1000)
DECLARE @clientIP nvarchar(50)
set @clientIP = 'XX.XX.XX.XX'
-- start with the lowest session id opened from this client address
select @sessionId = min( session_id ) from sys.dm_exec_connections where client_net_address = @clientIP
while @sessionId is not null
begin
    SET @SQL = 'KILL ' + CAST(@sessionId as varchar(10))
    EXEC (@SQL)
    -- move on to the next session id from the same client
    select @sessionId = min( session_id ) from sys.dm_exec_connections where session_id > @sessionId and client_net_address = @clientIP
end
I tried running this command, but the connections came right back. I went on the web server and manually stopped the site in IIS, ran the KILL command again but the connections remained. I put up the app_offline file and took the site down for about a half hour to see if any lingering connections would drop, but they didn't. And I still continued to get error emails for pages I KNEW were not accessible because I stopped the Site AND app pool. I went on the server and manually stopped the w3wp process and ran SQL KILL statements to kill the connections. They finally went away! I put the app back online and hit a single page. I kept running the above query to see the active connections and sure enough every time I ran the query the active connection count kept creeping up. It stops around 102 as of right now. And that's me as a user hitting a single page. I'm guessing this isn't normal? Does this indicate connections are lingering out there and not being dropped or closed?
I recently made code changes adding Entity Framework. Wherever I'm grabbing data through EF, I'm doing so with a using statement on the context. The rest of the app is fairly old and uses TableAdapters. In some places it follows the same pattern with using statements; in other places Dispose is called explicitly. I haven't had a chance to track down all the usages yet. Is this a good place to start looking? Does anyone have suggestions on how to track this 'leak' down? I'm not super knowledgeable with SQL, so any help would be greatly appreciated!
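To make the suspicion above concrete, here is a minimal sketch of how connections that are opened but never closed run into a fixed cap such as the 90-request limit. It is a toy model in Python, not ADO.NET, TableAdapters, or Entity Framework: ToyPool, leaky_query, safe_query, and run are hypothetical names invented for illustration, and the only number taken from the question is the limit of 90.

```python
from contextlib import contextmanager

LIMIT = 90  # mirrors the S1 limit from the question; everything else is a toy


class ToyPool:
    """Hypothetical stand-in for a connection pool with a hard cap."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0

    def open(self):
        if self.in_use >= self.limit:
            raise RuntimeError(f"request limit {self.limit} reached")
        self.in_use += 1

    def close(self):
        self.in_use -= 1

    @contextmanager
    def connection(self):
        # Similar in spirit to a C# using block: close runs even on error.
        self.open()
        try:
            yield self
        finally:
            self.close()


def leaky_query(pool):
    pool.open()  # opened, never closed


def safe_query(pool):
    with pool.connection():
        pass  # the query would run here


def run(query, pool, calls):
    """Call query repeatedly; return how many calls succeeded before the cap."""
    done = 0
    for _ in range(calls):
        try:
            query(pool)
        except RuntimeError:
            break
        done += 1
    return done


leaky_pool = ToyPool(LIMIT)
safe_pool = ToyPool(LIMIT)
leaky_done = run(leaky_query, leaky_pool, 200)
safe_done = run(safe_query, safe_pool, 200)
print(leaky_done, leaky_pool.in_use)
print(safe_done, safe_pool.in_use)
```

In this model the leaky path stops making progress once the cap is reached and the count never drops, which resembles the symptom described; whether the real application behaves this way still has to be confirmed by finding the code paths that skip Dispose.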

Related

How to add information to a WARNING message in Postgres

I get the following warning
WARNING: there is already a transaction in progress
in my database and I want to investigate why this happens. However, because the database is accessed by many microservices, I cannot find which service is trying to start a new/parallel connection.
How can I increase the level of information in this message? For example a timestamp, who tried to start the connection (like the client_addr field), or any other information that will reveal the root of the fault.
Thanks in advance
the source - starting transaction twice, example:
t=# begin;
BEGIN
Time: 22.594 ms
t=# begin;
WARNING: there is already a transaction in progress
BEGIN
Time: 1.269 ms
To see who and when, set log_min_messages to at least warning and set log_line_prefix to include %h for the client IP, %m for the timestamp, and %u for the user name (see https://www.postgresql.org/docs/current/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT). Make sure logging_collector is on, then check the logs.
There's plenty you could do to find out what's going on. First, you could check the PostgreSQL logs. If you do not have access to the logs, you can check which sessions are active, idle, or idle in transaction by running the following query:
SELECT
pid,
query,
state
FROM pg_stat_activity
There you can see which transaction is currently running by adding to the query WHERE state='active'
IMPORTANT NOTE:
If you're using services to access the database (especially C# services, in my experience), you have to check how they connect to the database. If the connection is not configured correctly, you'll end up with services that can handle only one user per transaction, and that's really dangerous.
The problem might be that you are sending all your calls to the database through one connection and the 'service' never opens new connections. In that case, a BEGIN issued on that shared connection while another transaction is still open makes PostgreSQL emit the message:
WARNING: there is already a transaction in progress
because the connection is already being used by a transaction.

Error 17886 - The server will drop the connection

We are running a website on a vps server with sql server 2008 x64 r2. We are being bombarded with 17886 errors - namely:
The server will drop the connection, because the client driver has
sent multiple requests while the session is in single-user mode. This
error occurs when a client sends a request to reset the connection
while there are batches still running in the session, or when the
client sends a request while the session is resetting a connection.
Please contact the client driver vendor.
This causes sql statements to return corrupt results. I have tried pretty much all of the suggestions I have found on the net, including:
with mars, and without.
with pooling and without
with async=true and without
we only have one database and it is absolutely multi-user.
Everything has been installed recently so it is up to date. They may be correlated with high cpu (though not exclusively according to the monitors I have seen). Also correlated with high request rates from search engines. However, high cpu/requests shouldn't cause sql connections to reset - at worst we should have high response times or iis refusing to send response.
Any suggestions? I am only a developer, not a DBA. Do I need a DBA to solve this problem?
Not sure but some of your queries might cause deadlocks on the server.
At the point you detect this error again
Open Management Studio (on the server, install it if necessary)
Open a new query window
Run sp_who2
Check the BlkBy column, which is short for Blocked By. If there is any data in that column, sessions are blocking each other and you may have a deadlock problem (normally the column should be completely empty).
If you have a deadlock then we can continue with next steps. But right now please check that.
To fix the error above, "MultipleActiveResultSets=True" needs to be added to the connection string.
via Event ID 17886 MSSQLServer – The server will drop the connection
I would create an eventlog task to email you whenever 17886 is thrown. Then go immediately to the db and execute the sp_who2, get the blkby spid and run a dbcc inputbuffer. Hopefully the eventinfo will give you something a bit more tangible to go on.
sp_who2
DBCC INPUTBUFFER(62)
GO
Use a "Instance Per Request" strategy in your DI-instantiation code and your problem will be solved
Most probably you are using dependency injection. During web development you have to take into account the possibility of concurrent requests. Therefore you have to make sure every request gets new instances during DI, otherwise you will run into concurrency issues. Don't be cheap by using ".SingleInstance" for services and contexts.
Enabling MARS will probably decrease the number of errors, but the errors that do occur will be less clear. Enabling MARS is almost never the solution; do not use it unless you know what you're doing.

Why does SQL Server say "Starting Up Database" in the event log, twice per second?

I have a SQL Server [2012 Express with Advanced Services] database, with not much in it. I'm developing an application using EF Code First, and since my model is still in a state of flux, the database is getting dropped and re-created several times per day.
This morning, my application failed to connect to the database the first time I ran it. On investigation, it seems that the database is in "Recovery Pending" mode.
Looking in the event log, I can see that SQL Server has logged:
Starting up database (my database)
...roughly twice per second all night long. (The event log filled up, so I can't see beyond yesterday evening).
Those "information" log entries stop at about 6am this morning, and are immediately followed by an "error" log entry saying:
There is insufficient memory in resource pool 'internal' to run this query
What the heck happened to my database?
Note: it's just possible that I left my web application running in "debug" mode overnight - although without anyone "driving" it I can't imagine that there would be much database traffic, if any.
It's also worth mentioning that I have a full-text catalog in the database (though as I say, there's hardly any actual content in the DB at present).
I have to say, this is worrying - I would not be happy if this were to happen to my production database!
With AUTO_CLOSE ON the database is closed as soon as there are no connections to it, and re-opened (running recovery, albeit a fast one) every time a connection is established to it. So you were seeing the message because your application was connecting to the database about twice per second. You probably always had this behavior and never noticed before. Now that your database has crashed, you investigated the log and discovered this problem. While it is good that you now know and will likely fix it, this does not address your real problem, namely the availability of the database.
So now you have a database that won't come out of recovery. What do you do? You restore from your last backup and apply your disaster recovery plan. Really, that's all there is to it. There is no alternative.
If you want to understand why the crash happened (it can be any of a myriad of reasons...) then you need to contact CSS (Product Support). They have the means to guide you through the investigation.
If you want to stop this message from appearing in the event log, go to SQL Server Management Studio:
Right-click your database and choose Properties
Select Options (from the left panel)
Look in the "Automatic" section and change "Auto Close" to "False"
Click OK
That's All :)
I had a similar problem with a sql express database stuck in recovery. After investigating the log it transpired that the database was starting up every couple of minutes. Running the script
select name, state_desc, is_auto_close_on from sys.databases where name = 'mydb'
revealed that auto close was set to on.
So the database appears to be always in recovery, but it is actually coming online for a brief second before going offline again because there are no client connections.
I solved this with following script.
Declare @state varchar(20)
-- poll until the database is briefly ONLINE, then switch AUTO_CLOSE off
while 1=1
begin
    Select @state = state_desc from sys.databases where name='mydb';
    If @state = 'ONLINE'
    Begin
        Alter database MyDb
        Set AUTO_CLOSE OFF;
        Print 'Online'
        break;
    End
    waitfor delay '00:00:02'
end

How do I find what code is consuming my SQL Server connection pool?

I have rewritten the below based on the answers.
I have a website that causes HIGH CPU issues on the database server to the point where the server becomes unavailable. Recycling the app pool fixes the issue. According to the server administrator http://www.microsoft.com/downloads/details.aspx?FamilyID=28bd5941-c458-46f1-b24d-f60151d875a3&displaylang=en shows there are threads that are active for about an hour.
The interactions with the database are very simple and worked prior to us adding web forms routing to the application.
They only consist of code like this throughout the application.
Yes, this code is not perfect, but it's not this code that is the issue, as there were no problems before we added routing.
private string GetPublishedParagraphs()
{
    string query, paragraphs = "";
    try
    {
        m_sql_connection = new SqlConnection(m_base_page.ConnectionString());
        query = "select * from PublishedParagraphs where IDDataContent_page='" + m_IDDataContent_page + "'";
        SqlDataAdapter da = new SqlDataAdapter(query, m_sql_connection);
        DataSet ds = new DataSet();
        da.Fill(ds, "paragraph");
        if (ds.Tables["paragraph"].Rows.Count > 0)
            paragraphs = (string)ds.Tables["paragraph"].Rows[0]["paragraphs"];
        ds.Dispose();
        da.Dispose();
    }
    finally
    {
        m_sql_connection.Close();
    }
    paragraphs = paragraphs.Replace("™", "™");
    return paragraphs;
}
The connection string looks like:
server_name; User ID=server_user; Password=server_password
We have meticulously checked that every call to the database Open() is followed by a Close().
We have measured there are no open connections by viewing them as we run the application locally and the connection count does not increase via:
SELECT SPID,
STATUS,
PROGRAM_NAME,
LOGINAME=RTRIM(LOGINAME),
HOSTNAME,
CMD
FROM MASTER.DBO.SYSPROCESSES
WHERE DB_NAME(DBID) = 'TEST' AND DBID != 0
(However, if we don't Close connections, there is a leak)
The difference between our application from when it worked is the addition of asp.net routing via web forms. This calls the database too, but again closes connections after they are open.
We are not sure what else we can check for.
Any ideas fellow programmers?
ANSWER
We found the problem via Query Profiler. This showed us a query with high usage. Tracing the query back to the code showed an infinite loop calling the database over and over. It was difficult to find as the loop was initiated by a bot calling a page on the website that no longer existed.
In the code you are showing, the ds and da .Dispose calls belong in the finally block. Better yet, use the using () {} structure, which ensures object disposal.
The pattern of building your own string as a query isn't just a gaping security hole, it is also very inefficient. Use a stored procedure and a parameter instead.
The query for processes is overly restrictive. If you have a resource issue that is causing connections to be refused, it won't be limited to a single database. About the only thing I would exclude is the current session --> where spid != @@spid
We REALLY need some error messages and context - where are they being seen? Tell us more and we can help!
Good luck!
First, great additional information! Thanks for the followup.
I would suggest that if you're so sure that the code you have posted has nothing to do with the problem that you remove it from the question. However, the problems aren't an issue of merely being "imperfect". Proper disposal of memory intensive objects - ones that the initial developers recognized as intensive enough to include the dispose() method - ones that interact with the database - while you are having unexplained problems with database isn't a small issue, in my opinion anyways.
I did some googling and found this. While I wouldn't go and say that this is the problem, it did get me to thinking. When "threads that are active for about an hour", is that being measured on the db server or on the web server? I'm not familiar with the tool, but are you able to post logs from this tool?
On the web server, are you able to monitor the routing code's actions? Is the routing code written / set up in such a way as to protect against infinite loops - see the question and answers here.
In the earlier version of my answer, I said that looking only at connections for a particular database was too restrictive for your task. The clarifications to your question do not indicate that you have corrected this query. I would suggest:
SELECT
    is_being_blocked = sp.blocked
    , sp.cpu
    , DB_NAME ( dbid )
    , sp.status
    , LOGINAME=RTRIM(sp.LOGINAME)
    , sp.HOSTNAME
    , sp.Hostprocess
    , sp.CMD
FROM SYSPROCESSES sp
WHERE spid != @@SPID
ORDER BY
    sp.blocked ASC
    , sp.cpu DESC
Logs - what are the SQL Server Logs saying in the time span 10 minutes before and 10 minutes after you restart the web app?
Have you tried and is this issue repeatable in development?
Please tell us what the below statement means in terms of your application - an error message or other: "the server becomes unavailable"
I highly suggest that you start a trace of SQL Server using Profiler. Based on what you are saying in this question, this is what I would trace, saving to a table (on another SQL Server) or to a file (on another machine, NOT the SQL Server box). This trace is for finding a problem that is severely hampering production. It's not something that you would want running on a regular basis.
I would capture these events
* Errors and Warnings - all of them
* Security Audit
** Audit Login
** Audit Logout
* Sessions
** Existing Sessions
* TSQL
** SQL: Stmt Starting
** SQL: Stmt Completed
** Prepare SQL
** Exec Prepared SQL
I wouldn't use any filters other than the presets.
Have you tried running the "sp_who2" query in SQL Server Management Studio to see how many active database connections there are? The code looks fine.
You might want to change the scope of the m_sql_connection variable from a class-level field to a local variable inside the method. Perhaps that could be your issue?
what do you mean by "running out of application pool?" Do you mean the connection pool?
If your database seems to be getting overworked, it could also be because a user has free rein over your m_IDDataContent_page variable. This data access code is vulnerable to SQL injection.
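The injection risk raised in several answers can be shown with a small, hedged sketch. It uses Python's built-in sqlite3 module and an in-memory table rather than SQL Server and ADO.NET, so it only illustrates the principle; the table and column names are borrowed from the question, while get_unsafe, get_safe, the sample rows, and the hostile input are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PublishedParagraphs (IDDataContent_page TEXT, paragraphs TEXT)"
)
conn.executemany(
    "INSERT INTO PublishedParagraphs VALUES (?, ?)",
    [("1", "first page"), ("2", "second page")],
)


def get_unsafe(page_id):
    # Concatenation, as in the question: the input becomes part of the SQL text.
    query = (
        "SELECT paragraphs FROM PublishedParagraphs "
        "WHERE IDDataContent_page='" + page_id + "'"
    )
    return [row[0] for row in conn.execute(query)]


def get_safe(page_id):
    # Placeholder: the driver passes the value separately from the SQL text.
    query = "SELECT paragraphs FROM PublishedParagraphs WHERE IDDataContent_page = ?"
    return [row[0] for row in conn.execute(query, (page_id,))]


hostile = "x' OR '1'='1"
unsafe_rows = get_unsafe(hostile)  # the OR clause matches every row
safe_rows = get_safe(hostile)      # treated as a literal id, matches nothing
print(unsafe_rows)
print(safe_rows)
```

In the C# code from the question, the corresponding change would be a parameterized command or a stored procedure, as the answers suggest; the sketch above does not show that API.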

sql server 2005 deadlock times out in production, not in test environment: why?

In my development environment, I seek to recreate a production issue we
face with MSSQL 2005. This issue has two parts:
The Problem
1) A deadlock occurs and MSSQL selects one connection ("Connection X") as the 'victim'.
2) All subsequent attempts to use "Connection X" fail (we use connection pooling). MSSQL says "The server failed to resume the transaction"
Of the two, #2 is more serious: since "connection X" is whacked, every
"round robin" attempt to re-use "connection X" fails, and mysterious
"random" errors appear to the user. We must restart the server.
Why I Write
At this point, however, I wish to recreate problem #1. I can create a
deadlock easily.
But here's my issue: whereas in production, MSSQL chooses one
connection (SPID) as the 'deadlock victim', in my test environment, the deadlock just hangs...and hangs and hangs. Forever? I'm not sure, but I left it hanging overnight and it still hung in the morning.
So here's the question: how can I make sql server "choose a deadlock victim" when a deadlock occurs?
Attempts so Far
I tried setting the "lock_timeout" parameter via the jdbc url ("lockTimeout=5000"), however I got a different message than in production (in test,"Lock request time out period exceeded." instead of in production "Transaction (Process ID 59) was deadlocked on lock resources with another process and has been chosen as the deadlock victim.")
Some details on problem #2
I've researched this "unable to resume the transaction" problem and found a
few things:
bad exception handling may cause this problem. E.g.: the java code does
not close the Statement/PreparedStatement and the driver's implementation
of "Connection" is stuck with a bad/stale/old "transaction ID"
A jdbc driver upgrade may make the problem go away.
For now, however, I just want to recreate a deadlock and make sql server
"choose a deadlock victim".
thanks in advance!
Appendix A. Technical Environment
Development:
sql server 2005 SP3 (9.00.4035.00)
driver: sqljdbc.jar version 1.0
Jboss 3.2.6
jdbc url: jdbc:sqlserver://<>;
Production:
sql server 2005 SP2 (9.00.3042.00)
driver: sqljdbc.jar version 1.0
Jboss 3.2.6
jdbc url: jdbc:sqlserver://<>;
Appendix B. Steps to force a deadlock
get connection A
get connection B
run sql1 with connection A
run sql2 with connection B
run sql1 with connection B
run sql2 with connection A
where
sql1:
update member set name = name + 'x' WHERE member_id = 71
sql2:
update member set name = name + 'x' WHERE member_id = 72
The explanation of why the JDBC connection enters the incorrect state is given here: The server failed to resume the transaction... Why? You should upgrade to JDBC SQL driver v2.0 before anything else. The link also contains advice on how to fix the application processing to avoid this situation, most importantly about avoiding mixing the JDBC transaction API with native Transact-SQL transactions.
As for the deadlock repro: you did not recreate a deadlock in test. You just blocked waiting for a transaction to commit. A deadlock is a different thing and SQL Server will choose a victim, you do not have to set deadlock priority, lock timeouts or anything. Deadlock priorities are a completely different topic and are used to choose the victim in certain scenarios like high priority vs. low priority overnight batch processing.
Any deadlock investigation should start with understanding the deadlock, if you want to eliminate it. The Deadlock Graph Event Class in Profiler is the perfect starting point. With the deadlock graph info you can see which resources the deadlock is occurring on and which statements are involved. Most times the solution is either to fix the order of updates in the application (always follow the same order) or to fix the access path (i.e. add an index).
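The "always follow the same order" advice can be sketched outside the database. The following Python toy uses threading locks as a stand-in for row locks on member_id 71 and 72; it is an illustration under that assumption, not SQL Server's locking behavior, and row_locks, names, and update_rows are hypothetical names. Each worker is asked to update the rows in a different order, but locks are always acquired in sorted order, so the criss-cross wait cannot form.

```python
import threading

# One lock per row id; a toy stand-in for row locks on member_id 71 and 72.
row_locks = {71: threading.Lock(), 72: threading.Lock()}
names = {71: "a", 72: "b"}


def update_rows(ids):
    # Sort the ids so every worker acquires locks in the same order,
    # whatever order the caller asked for.
    ordered = sorted(ids)
    for row_id in ordered:
        row_locks[row_id].acquire()
    try:
        for row_id in ids:
            names[row_id] += "x"
    finally:
        for row_id in reversed(ordered):
            row_locks[row_id].release()


workers = [
    threading.Thread(target=update_rows, args=([71, 72],)),
    threading.Thread(target=update_rows, args=([72, 71],)),
]
for w in workers:
    w.start()
for w in workers:
    w.join(timeout=5)

finished = all(not w.is_alive() for w in workers)
print(finished, names)
```

If the sort were removed and each worker locked in its requested order, the two threads could each hold one lock and wait forever for the other, which is the shape of the deadlock discussed in this thread.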
Update
The UPDATE .. WHERE key IN (SELECT ...) is usually deadlocking because the operation is not atomic. Multiple threads can return the same IN list because the SELECT part does not lock anything. This is just a guess; to properly validate it you must look at the deadlock info.
To validate your hand-made deadlock test, check that the blocking SPIDs form a loop. Look at SELECT session_id, blocking_session_id FROM sys.dm_exec_requests WHERE blocking_session_id <> 0. If the result contains a loop (e.g. A blocked by B and B blocked by A) and the server does not trigger a deadlock, that's a bug. However, what you will most likely find is that the blocking list does not form a loop. It will be something like A blocked by B, B blocked by C, and C not in the list, which means you have done something wrong in the repro test.
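The loop check described above can be sketched as a small helper. This is a minimal Python illustration that assumes the (session_id, blocking_session_id) pairs have already been fetched and that each session appears at most once in the result; find_blocking_cycle and the sample session ids are hypothetical.

```python
def find_blocking_cycle(rows):
    """Return a list of session ids forming a blocking loop, or None.

    rows: iterable of (session_id, blocking_session_id) pairs, shaped like
    the result of the sys.dm_exec_requests query above.
    """
    blocked_by = dict(rows)
    for start in blocked_by:
        path = []
        seen = set()
        current = start
        # Follow the "blocked by" links until they leave the list or repeat.
        while current in blocked_by and current not in seen:
            seen.add(current)
            path.append(current)
            current = blocked_by[current]
        if current in seen:
            # Walked back onto the path: the loop starts where current first appeared.
            return path[path.index(current):]
    return None


deadlock = find_blocking_cycle([(51, 52), (52, 51)])  # A blocked by B, B blocked by A
chain = find_blocking_cycle([(51, 52), (52, 53)])     # A -> B -> C, C not in the list
print(deadlock)
print(chain)
```

A result of None corresponds to the plain blocking chain the answer expects to find in the test environment; a returned list is the set of sessions that wait on each other.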
You can specify a deadlock priority for the session using
SET DEADLOCK_PRIORITY LOW | NORMAL | HIGH
See this MSDN link for details.
You can also use the following command to view the open transactions
DBCC OPENTRAN (db_name)
This command may help you identify what is causing the deadlock. See MSDN for more info.
What are the queries being run? What is actually causing the deadlock?
You say you have two connections A and B. A runs sql1 then sql2, while B runs sql2 then sql1. So, what is the work (queries) being done? More importantly, where are the transactions? What isolation level are you using? What opens/closes the transactions? (Yes, this leads to questioning the exception processing used by your drivers--if they don't detect and properly process a returned "it didn't work" message, then you absolutely need to take them out back and shoot them--bullets or penicillin, your call.)
Understanding the explicit details underlying the deadlock will allow you to recreate it. I'd first try to recreate it "below" your application -- that is, open up two windows in SSMS, and recreate the application's actions step by step, by hand if/as necessary. Once you can do this, step back and replicate that in your application--all on your development servers, of course!
(A thought--are your Dev databases copies of your Production DBs? If Dev DBs are orders of magnitude smaller than Prod ones, your queries may be the same but what SQL does "under the hood" will be vastly different.)
A last thought, SQL will detect and process deadlocks automatically (I really don't think you can disable this), if yours are running overnight then I don't think you have a deadlock, but rather just a conventional locking/blocking issue.
[Posting this now -- going to look something up, will check back later.]
[Later]
Interesting--SQL Server 2005 compact edition does not detect deadlocks, it only does timeouts. You're not using that in Dev, are you?
I see no way to "turn off" or otherwise control the deadlock timeout period. I hit and messed with deadlocks just last week, and some arbitrary testing then indicated that deadlocks are detected and resolved (on our dev server) in under 5 seconds. It truly seems like you don't have deadlocks on your Dev machine, just blocking. But realize that this stuff is hard for "armchair DBAs" to analyze; you'd really need to sit down and do some serious analysis of what's going on within the system when this problem is occurring.
[ This is a response to the answers. The UI does not allow longer 'comments' on answers]
What are the queries being run? What is actually causing the deadlock?
In my test environment, I ran very simple queries:
sql1:
UPDATE principal SET name = name + '.' WHERE principal_id = 71
sql2:
UPDATE principal SET name = name + '.' WHERE principal_id = 72
Then executed them in chiastic/criss-cross order, i.e. w/o any commits.
connectionA
sql1
connectionB
sql2
sql1
sql2
This to me seems like a basic example of a deadlock. If this a "mere lock", however, and not a deadlock, please disabuse me of this notion.
In production, our 'problematic query' ("prodbad") looked like this:
UPDATE post SET lock_flag = ?
WHERE thread_id IN (SELECT thread_id FROM POST WHERE post_id = ?)
Note a few things:
1) This "prod problem query" actually works. AFAIK it had a deadlock this one time.
2) I suspect that the problem lies in page locking, i.e. pessimistic locking due to reads elsewhere in the transaction.
3) I do not know what SQL this transaction executed prior to this query.
4) This query is an example of "I can do that in one sql statement" processing, which seems clever to the programmer but ultimately causes much more IO than running two queries:
queryM:SELECT thread_id FROM POST WHERE post_id = ?
queryN: UPDATE post SET lock_flag = ? WHERE thread_id = <>
> (A thought--are your Dev databases copies of your Production DBs? If Dev DBs are orders of magnitude smaller than Prod ones, your queries may be the same but what SQL does "under the hood" will be vastly different.)
In this case the prod and dev DBs differ. The "prod server" had tons of data. The "dev db" had little data. The queries behaved very differently. All I wanted to do was recreate a deadlock.
> The server failed to resume the transaction... Why? You should upgrade to JDBC SQL driver v2.0 before anything else.
Thanks. We plan on this change. Switching drivers introduces a little bit of risk, so we'll need to run some tests.
To recap:
I had the "bright idea" to force a simple deadlock and see if my connection was "whacked/hosed/borked/etc." The deadlock, however, behaved differently than in production.

Resources