Ok, in an attempt to move my app to the cloud, I've moved a local SQL database to Azure SQL. The problem is that the connection to the new Azure SQL database is so flaky that I'm about to bring it back in house.
The task is to loop through and create a total of about 481K records in the database.
The connection string is
"Server=tcp:xxx,1433;Initial Catalog=xx;Persist Security Info=False;User ID=xx;MultipleActiveResultSets=True;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;ConnectRetryCount=255;"
The SQL query it is running each time is not complicated. Just inserting three values into three columns. (column and values changed to protect some internal workings)
Insert Into TheTable (C1, C2, C3) VALUES ('V1', 'V2', 'V3')
but at random points it throws this.
System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean async, Int32 timeout, Boolean asyncWrite)
at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
at XX in D:\PATHOFTHEFILE:line 420
Note that
1) I'm opening a connection each time I create a record and closing it in the next step.
2) There's no one else hitting the database except me.
3) The database service is set for S1.
4) Yeah - I get the irony that the program is crashing on line 420. I'm trying to find ways to drug test the code.
Questions
1) Is there anything wrong with my connection string? The documentation says that I should use a timeout of 30 when connecting to an Azure SQL database. Frankly, the code ran better (or at least lived longer) when I had the timeout set to 0.
2) At first I tried using a single connection to handle the loop through the entire 481K INSERT statements. Is that a bad design? How long will Azure SQL reliably hold a connection?
3) I'm not getting a warm fuzzy feeling about the ability to build rock-solid apps on Azure SQL. Can someone point me to a good reference about the differences between building for local SQL vs Azure SQL? I've gone through everything I can find and there just doesn't seem to be that much out there.
4) I like the fact that I can connect to Azure SQL with MMC. But there are (generically speaking) all kinds of monitoring info I can't get from MMC anymore. Does anyone have a link to something that can help me really understand what's going on in the database without using that dreadful Azure Portal?
UPDATE #1
Guilty as charged
public static void RunSQL(string SQLString)
{
    int a = 0;
    // TheConnection is a SqlConnection field shared by the calling code
    SqlCommand Command = new SqlCommand(SQLString, TheConnection);
    try
    {
        a = Command.ExecuteNonQuery();
    }
    catch (Exception ex)
    {
        Notifications.EventLogging.ProcessEvent(SQLString + " go boom " + ex.InnerException + ex.Message + ex.StackTrace);
        Notifications.EventLogging.ProcessEvent("Time Of Death" + DateTime.Now);
        Console.ReadKey();
    }
}
Azure SQL instances are hosted on shared infrastructure. Azure will throttle requests to ensure that all instances on a server can meet the minimum SLA. In this life, death and taxes are guaranteed, but Azure SQL Connections are not.
To deal with this, you need to have automatic retry. Microsoft has provided a few options over the years, beginning with the now deprecated ReliableSqlConnection class. The preferred way to talk to Azure SQL these days is with Entity Framework 6.x, which has automatic retry built in.
In practice, an Azure SQL database that sees light and sporadic traffic rarely sees a throttle event. I've had developer friends who have deployed production code hitting an Azure SQL database, used raw SqlConnections and SqlCommands, and seemed genuinely surprised when I told them that connections aren't guaranteed. They'd never run across it! But if you're hitting the hell out of a server (as you are doing), I've seen it happen enough to be noticeable.
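If you want to stay on raw SqlConnection/SqlCommand rather than move to EF6, the core of a retry helper looks something like the sketch below. It is only a sketch: the transient error numbers, try count, and backoff are illustrative, and the connection string comes from wherever you already keep yours. The important parts are opening a fresh connection per attempt and backing off between attempts.
using System;
using System.Data.SqlClient;
using System.Threading;

public static class ResilientSql
{
    // Illustrative subset of transient error numbers; not an official or exhaustive list.
    private static readonly int[] TransientErrors = { -2, 4060, 10928, 10929, 40197, 40501, 40613, 49918, 49919, 49920 };

    public static void RunSQL(string connectionString, string sqlString, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                // A fresh connection per attempt means a broken one is never reused.
                using (var connection = new SqlConnection(connectionString))
                using (var command = new SqlCommand(sqlString, connection))
                {
                    connection.Open();
                    command.ExecuteNonQuery();
                    return;
                }
            }
            catch (SqlException ex) when (attempt < maxAttempts && Array.IndexOf(TransientErrors, ex.Number) >= 0)
            {
                // Exponential backoff so retries don't pile onto an already throttled database.
                Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}
If you move to Entity Framework 6.x instead, enabling SqlAzureExecutionStrategy in a DbConfiguration class gives you equivalent behavior without hand-rolling the loop.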
EDIT 12-6-2018: I have experienced this problem myself in the last few months. After combing through the logs, the culprit was spikes in database traffic that maxed out the DTU limit for my SQL Server. Retry is not necessarily a sufficient solution in this case because automatic retry effectively helps choke your SQL DB. To check and see if you are falling victim to DTU throttling, go to your Azure SQL DB's Overview tab, look at the resource utilization graph, and make SURE you select Max as the metric. It defaults to Avg, and this can hide spikes in DTU traffic.
Edit 8-6-2019: If you're maxing out DTUs and want to know what's causing it, there are a few places to look on the Azure SQL Management blade in the Azure Portal:
Query performance insights. You can use the tools here to find your top resource-consuming queries that have run in a given time period. This is a good place to start. Make sure to check through all metrics.
Performance recommendations. If you're missing an index, it's probably listed here.
Metrics. Check through all listed. Make sure to set your aggregation to Max.
This should give you good (or even great) indicators of what's going wrong.
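If you would rather pull those numbers from code than eyeball the portal graph, Azure SQL also exposes them through the sys.dm_db_resource_stats view (roughly the last hour, in 15-second slices). A quick sketch, with a placeholder connection string pointed at the user database itself:
string connectionString = "Server=tcp:yourserver.database.windows.net,1433;Initial Catalog=yourdb;User ID=...;Password=...;Encrypt=True;";
string sql = @"SELECT TOP (20) end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent
               FROM sys.dm_db_resource_stats
               ORDER BY end_time DESC;";
using (SqlConnection con = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(sql, con))
{
    con.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Any percentage pinned near 100 in a slice is a throttling candidate.
            Console.WriteLine($"{reader["end_time"]}  cpu%={reader["avg_cpu_percent"]}  io%={reader["avg_data_io_percent"]}  log%={reader["avg_log_write_percent"]}");
        }
    }
}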
Upgrade to a higher Premium tier, run the migration, and then downgrade accordingly. This will help you get around the timeout errors.
Related
I'm working on a .NET application migration from Oracle to SQL Server database. The application was developed in the 2000s by a third party, so we intend to modify it as little as possible in order to avoid introducing new bugs.
I replaced the Oracle references with SqlClient ones (OracleConnection to SqlConnection, OracleTransaction to SqlTransaction, etc.) and everything worked fine. However, I'm having trouble with some logic that tries to reconnect to the DB in case of errors.
If a problem occurs when trying to read/write to the database, method TryReconnect is called. This method checks whether the Oracle exception number is 3114 or 12571; if so, it tries to reopen the connection.
I checked these error codes:
ORA-03114: Not Connected to Oracle
ORA-12571: TNS: packet writer failure
I searched for the equivalent error codes for SQL Server but I couldn't find them. I checked the MSSQL and .NET SqlClient documentation but I'm not sure that any of those is equivalent to ORA-3114 and ORA-12571.
Can somebody help me decide which error numbers should be checked in this logic? I thought about checking for codes 0 (I saw it happen when I stopped the database to force an error and test this) and -2 (timeout expired), but I'm not really sure about it.
The behavior is different, so you can't base your SQL Server retry logic on Oracle semantics. For starters, SqlConnection will retry connecting even in the old System.Data.SqlClient library. Its replacement, Microsoft.Data.SqlClient, includes configurable retry logic to handle connections to cloud databases from on-premises applications, e.g. an on-prem application connecting to Azure SQL. This retry logic shipped with the current RTM version, 3.0.0, though depending on the version you may need to enable it explicitly.
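A rough sketch of how that configurable retry logic is wired up in Microsoft.Data.SqlClient; the retry counts and intervals are illustrative, and the AppContext switch reflects my understanding that the feature has to be enabled explicitly in some versions (check the release notes of the package version you use):
using System;
using Microsoft.Data.SqlClient;

class RetryLogicSample
{
    static void Main()
    {
        // In some versions configurable retry is a preview feature that must be switched on first (assumption: verify for your version).
        AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.EnableRetryLogic", true);

        var options = new SqlRetryLogicOption
        {
            NumberOfTries = 5,
            DeltaTime = TimeSpan.FromSeconds(1),
            MaxTimeInterval = TimeSpan.FromSeconds(20)
        };
        var provider = SqlConfigurableRetryFactory.CreateExponentialRetryProvider(options);

        using (var connection = new SqlConnection("<your connection string>"))
        using (var command = new SqlCommand("SELECT 1", connection))
        {
            connection.RetryLogicProvider = provider; // retries transient failures during Open()
            command.RetryLogicProvider = provider;    // retries transient failures during execution
            connection.Open();
            command.ExecuteScalar();
        }
    }
}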
You can also look at high-level resiliency libraries like Polly, a very popular resiliency package that implements recovery strategies like retries with backoff, circuit breakers etc. This article describes Cadru.Polly which contains strategies for handling several SQL Server transient faults. You could use this directly or you can handle the transient error numbers described in that article:
Exception handling strategy - errors handled:
SqlServerTransientExceptionHandlingStrategy: 40501, 49920, 49919, 49918, 41839, 41325, 41305, 41302, 41301, 40613, 40197, 10936, 10929, 10928, 10060, 10054, 10053, 4221, 4060, 12015, 233, 121, 64, 20
SqlServerTransientTransactionExceptionHandlingStrategy: 40549, 40550
SqlServerTimeoutExceptionHandlingStrategy: -2
NetworkConnectivityExceptionHandlingStrategy: 11001
Polly allows you to combine policies and specify different retry strategies for them, e.g.:
Using a cached response in some cases (lookup data?)
Retrying with backoff (even random delays) in other cases (deadlocks?). Random delays can be very useful if you run into timeouts because too many concurrent operations cause deadlocks or timeouts. Without them, all failing requests would retry at the same time, causing yet another failure.
Using a circuit breaker to switch to a different service or server.
You could create an Oracle strategy so you can use Polly throughout your projects and handle all recoverable failures, not just database retries.
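For a concrete picture, here is a minimal Polly sketch; the error numbers, table name, and connection string are placeholders, and the namespaces assume Microsoft.Data.SqlClient and Polly are referenced:
using System;
using System.Collections.Generic;
using Microsoft.Data.SqlClient;
using Polly;

class PollyRetrySample
{
    // Illustrative transient error numbers: -2 = timeout, 1205 = deadlock victim, plus a few Azure throttling codes.
    static readonly HashSet<int> TransientErrors = new HashSet<int> { -2, 1205, 40501, 40613, 49918 };

    static void Main()
    {
        var retry = Policy
            .Handle<SqlException>(ex => TransientErrors.Contains(ex.Number))
            .WaitAndRetry(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))           // exponential backoff
                                        + TimeSpan.FromMilliseconds(new Random().Next(250))); // plus jitter

        retry.Execute(() =>
        {
            using (var connection = new SqlConnection("<your connection string>"))
            using (var command = new SqlCommand("SELECT COUNT(*) FROM dbo.SomeTable", connection))
            {
                connection.Open();
                return command.ExecuteScalar();
            }
        });
    }
}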
In SQL Server Management Studio, I discovered an issue while attempting to disable a trigger on one of our tables in our Azure SQL Database, which is set at one of the highest-available performance tiers (Business Critical Gen 5). I used right-click disable to accomplish this. I receive the following error after the timeout period expires:
Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. (.Net SqlClient Data Provider)
------------------------------
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=12.00.0700&EvtSrc=MSSQLServer&EvtID=-2&LinkId=20476
------------------------------
Server Name: searchfoundry.database.windows.net
Error Number: -2
Severity: 11
State: 0
------------------------------
Program Location:
at Microsoft.SqlServer.Management.Common.ConnectionManager.ExecuteTSql(ExecuteTSqlAction action, Object execObject, DataSet fillDataSet, Boolean catchException)
at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteNonQuery(String sqlCommand, ExecutionTypes executionType, Boolean retry)
The wait operation timed out
The help link goes to a 404 - no surprise there.
EDIT #1:
@DanGuzman kindly assisted by suggesting I run the disable trigger as T-SQL. This worked. However, between the time I originally posted and the time he gave his suggestion, I discovered that this timeout error is occurring in other areas of our infrastructure/services which use this database.
I have run a query to check for blocking sessions on this database. There are none listed. I have also increased the timeout period from 30 seconds to 5 minutes. Items are still timing out.
I am looking for guidance on what other queries I can run to look under the hood of this database to determine what is causing these timeouts to occur.
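Would something along these lines be the right direction? (A rough sketch; sys.dm_exec_requests and sys.dm_exec_sql_text are the standard DMVs, the connection string is a placeholder, and I am not sure this surfaces waits that have no blocking session.)
string sql = @"SELECT r.session_id, r.status, r.blocking_session_id, r.wait_type, r.wait_time,
                      r.total_elapsed_time, t.text
               FROM sys.dm_exec_requests r
               CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
               ORDER BY r.total_elapsed_time DESC;";
using (SqlConnection con = new SqlConnection("<connection string to the user database>"))
using (SqlCommand cmd = new SqlCommand(sql, con))
{
    con.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // A non-zero blocking_session_id or a long-running wait_type here would point at the culprit.
            Console.WriteLine($"{reader["session_id"]} {reader["status"]} blocked_by={reader["blocking_session_id"]} wait={reader["wait_type"]} text={reader["text"]}");
        }
    }
}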
I'd be happy to just restart the SQL Server to resolve this, but as many of us know, there is no restarting Azure SQL Servers, unfortunately.
Increasing the DTUs works for me.
I fixed similar timeout issues by increasing the DTU quota of the DB.
Before reading please note that I've googled this and read a ton of articles on SO and elsewhere, no suggestions have worked as of yet.
I'm randomly getting a timeout error that occurs when logging into my MVC 3 application. It happens well over half of the time - all other times the application logs on just fine. When it doesn't work, it tries for about 10 seconds then errors.
The error:
Exception: "An error occurred while executing the command definition. See the inner exception for details."
Inner Exception: {"Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.\r\nTimeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding."}
This happens inside my repository class that directly interacts with entity framework.
It seems to happen when logging in and simply pulling a quick check from the database, such as:
return entities.Users.SingleOrDefault(user => user.UserName == userName);
return (entities.Users.SingleOrDefault(u => u.UserId == user.UserId || u.UserName == user.UserName) != null);
Things I've tried:
SQL Server validation
Integrated Security (I even gave every possible account full database access)
Running outside of IIS
Setting Connect Timeout extremely high (Connect Timeout=50000) in the connection string. (I do not have Default Command Timeout set there.)
Setting the CommandTimeout to 0, 5000, 100000, whatever, on my entity connection: entities.CommandTimeout = 100000;
Setting the CommandTimeout inside every using statement where I use an instance of the repository.
Flipping SingleOrDefault to FirstOrDefault etc.
Enabling/Disabling Lazy Loading (Why not?)
If it helps:
I am using a custom role and membership provider.
I'm just making calls from my controller inside a using statement (AccountRepository bleh = new AccountRepository()) and the AccountRepository implements IDisposable etc.
The entity model is in a separate project.
I'm running the site in IIS. It's setup with the 4.0 integrated app pool.
All accounts have full database access.
When the error occurs, it doesn't take anywhere near as long as I have set in the web.config (50000 I think) or for the CommandTimeout in the repository.
It's not doing much on login, just validating the user, getting the user's role, then loading up a small amount of data, but the error always occurs when getting the user data on login.
When I try it outside of debugging it repeats the error four or five times (with custom errors off).
Here is the full exception from the event log:
Exception information:
Exception type: SqlException
Exception message: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.EntityClient.EntityCommandDefinition.ExecuteStoreCommands(EntityCommand entityCommand, CommandBehavior behavior)
You could examine the problem from the database server by setting up a SQL Server Profiler.
You can find lots of info about SQL Profiler by just googling around. Here's a site with a video that might help you get started.
Edit: While this did indeed help me, it was not the solution. The problem still exists, for anyone reading in the future.
Just to let everyone know - I believe that I have found the issue. Through the SQL Profiler, I saw that the account being used to access SQL was in fact the local system account. I then realized that, in an attempt to fix a prior issue, I had changed the ASP.NET v4.0 app pool to use the local system account. I went and changed the Identity back to 'ApplicationPoolIdentity', added the IIS APPPOOL\ASP.NET v4.0 user to the database, and so far everything has been working great. @DOK - Thank you very much for the information on SQL Profiler, it helped tremendously! Thanks everyone else also!
I have a multithreaded Windows Service I've developed with VS 2010 (.NET 4.0) which can have anywhere from a few to a few dozen threads, each retrieving data from a slow server over the Internet and then using a local database to record this data (so the process is Internet-bound, not LAN or CPU bound).
With some regularity, I am getting a flood/flurry/burst of the following error from several threads simultaneously:
System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The call stack for this error is typically:
at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
I'm not specifying a Connection Timeout in the connection string, and there are other applications and processes working in this database. Has anyone come across this kind of behavior and if so what was done to prevent it?
The most commonly-called method in my data access layer looks like this, and all my other DAL methods follow the same approach:
using (SqlConnection con = new SqlConnection(GetConnectionString()))
using (SqlCommand cmd = new SqlCommand("AddGdsMonitorLogEntry", con))
{
cmd.CommandType = CommandType.StoredProcedure;
/* setting cmd.Parameters [snipped] */
// We have been getting some timeouts writing to the log; wait a little longer than the default.
cmd.CommandTimeout *= 4;
con.Open();
cmd.ExecuteNonQuery();
}
Thanks very much!
EDIT
Given comments about this occurring in mirrored environments, I should indeed mention that the database in question is mirrored. It's marked in SSMS as "Principal, Synchronized", in "High safety without automatic failover (synchronous)" mode.
EDIT 5/26/11
I am seeing nothing in the SQL Server logs to indicate any problems. (I don't have access to the Windows Event Viewer on that server, but I've asked for someone to look for me.)
According to an MSDN blog post created just today (hooray for Google!):
Microsoft has confirmed that this is a problem in the current release of ADO.NET. This issue will be fixed in the version of ADO.NET that ships with Visual Studio 2011.
In the meantime, we request you to use the following workarounds:
Increase the connection string timeout to 150 sec. This will give the first attempt enough time to connect (150 * 0.08 = 12 sec).
Add Min Pool Size=20 to the connection string. This will always maintain a minimum of 20 connections in the pool, so there will be less chance of creating a new connection, thus reducing the chance of this error.
Improve the network performance. Update your NIC drivers to the latest firmware version. We have seen network latency when your NIC card is not compatible with certain Scalable Networking Pack settings. If you are on Windows Vista SP1 or above you may also consider disabling Receive Window Auto-Tuning. If you have NIC teaming enabled, disabling it would be a good option.
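Putting the first two workarounds together, the connection string would end up looking roughly like this (server, database, and credentials are placeholders):
Data Source=myserver;Initial Catalog=MyDb;User ID=myuser;Password=...;Connection Timeout=150;Min Pool Size=20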
The post itself is an interesting read, talking about a TCP/IP connection retry algorithm. And kudos to all the folks who said "hey this looks like it's related to mirroring..."! And note the comment about this being "because of slow response from SQL Server or due to network delays".
UGH!!!
Thanks to everyone who posted. Now we must all ask for a patch to the .NET Framework (or some other ADO.NET patching mechanism), so we don't have to wait for (and buy) Visual Studio 11...
Connection timeout is a different thing than command timeout. Command timeout applies to the situation where you have a connection established, but due to some internal reason the server cannot return any results within the required time. The default command timeout is 30 seconds.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout.aspx
Try specifying a connection timeout in the connection string. The default value is 15 seconds, which may be the reason for the issue you see.
You can also specify connection timeout in code:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectiontimeout.aspx
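To make the distinction concrete, a minimal sketch (the server, database, and stored procedure names are placeholders):
// Connection Timeout controls how long Open() waits to establish the connection (default 15 s).
using (SqlConnection con = new SqlConnection("Server=myserver;Database=MyDb;Integrated Security=true;Connection Timeout=30"))
using (SqlCommand cmd = new SqlCommand("dbo.SomeLongRunningProc", con))
{
    cmd.CommandType = CommandType.StoredProcedure;
    // CommandTimeout controls how long the command may run once the connection is open (default 30 s).
    cmd.CommandTimeout = 120;
    con.Open();
    cmd.ExecuteNonQuery();
}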
I get this every once in a while on an old database server that we have (coming up on 10 years old now). When it does happen, it's because something is hammering that thing with connections/queries constantly. My guess is that you'll find that, when it happens, the database server is under load (or has a high number of connections, or something along those lines). Anyway, in my experience, optimizing the code, optimizing the database, getting a beefier database server, etc. all help. Another thing you can do, which Piotr suggests, is simply up the timeout for the connection. I'd still go through and optimize some stuff though (it should help in the long run).
I have been able to somewhat reliably reproduce this problem. I have a service that when a processing job is requested it kicks off processing in a new appdomain / thread. This thread will execute 10 to 16 database queries simultaneously. When I run 30 of these jobs one after another then a random one or two of the jobs will crash with the timeout error.
I changed the connection string to turn off connection pooling with Pooling=false, and then the error changed to the following. It gets thrown 3 or 4 times inside an AggregateException, since the connections are happening inside a Parallel.For.
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()
at System.Data.SqlClient.TdsParser.ConsumePreLoginHandshake(Boolean encrypt, Boolean trustServerCert, Boolean& marsCapable)
at System.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity)
at System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, SqlConnection owningObject)
at System.Data.SqlClient.SqlInternalConnectionTds.LoginWithFailover(Boolean useFailoverHost, ServerInfo primaryServerInfo, String failoverHost, String newPassword, Boolean redirectedUserInstance, SqlConnection owningObject, SqlConnectionString connectionOptions, TimeoutTimer timeout)
at System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(SqlConnection owningObject, TimeoutTimer timeout, SqlConnectionString connectionOptions, String newPassword, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, Object providerInfo, String newPassword, SqlConnection owningObject, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
at Tps.PowerTools.CoreEngine.V5.DataAccess.DataContext.ExecuteQuery(PtQuery query, ValueStore`1 store, String readerDescription) in C:\SourceCode\Tps.PowerToolsV1\Trunk\Libraries\CoreEngine\CoreEngine.V5\DataAccess\DataContext.cs:line 326
at Tps.PowerTools.CoreEngine.V5.DataAccess.DataContext.<StockHistoricalData>b__15(PtQuery query) in C:\SourceCode\Tps.PowerToolsV1\Trunk\Libraries\CoreEngine\CoreEngine.V5\DataAccess\DataContext.cs:line 302
at System.Threading.Tasks.Parallel.<>c__DisplayClass32`2.<PartitionerForEachWorker>b__30()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass7.<ExecuteSelfReplicating>b__6(Object )
Optimizing the queries you are executing on the remote server will always help. Time each query and look for long-running ones. If you are just doing reads, then use the (NOLOCK) hint on the SELECT statements. This was a life saver for me; just read up on it to make sure it is appropriate for your application. If you have access to the remote database, make sure the indexes are not too fragmented. Fragmentation will cause a major slowdown in query execution. Make sure indexes are rebuilt/reorganized as part of the SQL maintenance plan. Add new indexes where appropriate.
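For reference, this is roughly what a read with the NOLOCK hint looks like from ADO.NET, reusing the GetConnectionString() helper from your snippet; the table, columns, and parameter value are made up, and remember the hint allows dirty reads:
using (SqlConnection con = new SqlConnection(GetConnectionString()))
using (SqlCommand cmd = new SqlCommand(
    "SELECT Id, Name FROM dbo.Customers WITH (NOLOCK) WHERE Region = @region", con))
{
    cmd.Parameters.AddWithValue("@region", "West");
    con.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // NOLOCK avoids blocking on writers, at the cost of possibly reading uncommitted rows.
            Console.WriteLine(reader["Name"]);
        }
    }
}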
Extending the timeout may make matters worse. If you let queries run longer then, potentially, more queries will time out. The timeout is there to protect the server and the other clients accessing it. Bumping it up a little is not a huge deal, but you don't want long-running queries killing the server.
I have rewritten the below based on the answers.
I have a website that causes HIGH CPU issues on the database server, to the point where the server becomes unavailable. Recycling the app pool fixes the issue. According to the server administrator, the tool at http://www.microsoft.com/downloads/details.aspx?FamilyID=28bd5941-c458-46f1-b24d-f60151d875a3&displaylang=en shows there are threads that are active for about an hour.
The interactions with the database are very simple and worked prior to us adding web forms routing to the application.
They only consists of code like this throughout the application.
Yes, this code is not perfect, but it's not this code that is the issue, as prior to us adding routing there were no problems.
private string GetPublishedParagraphs()
{
string query, paragraphs = "";
try
{
m_sql_connection = new SqlConnection(m_base_page.ConnectionString());
query = "select * from PublishedParagraphs where IDDataContent_page='" + m_IDDataContent_page + "'";
SqlDataAdapter da = new SqlDataAdapter(query, m_sql_connection);
DataSet ds = new DataSet();
da.Fill(ds, "paragraph");
if (ds.Tables["paragraph"].Rows.Count > 0)
paragraphs = (string)ds.Tables["paragraph"].Rows[0]["paragraphs"];
ds.Dispose();
da.Dispose();
}
finally
{
m_sql_connection.Close();
}
paragraphs = paragraphs.Replace("™", "™");
return paragraphs;
}
The connection string looks like:
server_name; User ID=server_user; Password=server_password
We have meticulously checked that every call to the database Open() is followed by a Close().
We have measured there are no open connections by viewing them as we run the application locally and the connection count does not increase via:
SELECT SPID,
STATUS,
PROGRAM_NAME,
LOGINAME=RTRIM(LOGINAME),
HOSTNAME,
CMD
FROM MASTER.DBO.SYSPROCESSES
WHERE DB_NAME(DBID) = 'TEST' AND DBID != 0
(However, if we don't Close connections, there is a leak)
The difference in our application from when it worked is the addition of ASP.NET routing via web forms. This calls the database too, but again closes connections after they are opened.
We are not sure what else we can check for.
Any ideas fellow programmers?
ANSWER
We found the problem via Query Profiler. This showed us a query with high usage. Tracing the query back to the code showed an infinite loop calling the database over and over. It was difficult to find as the loop was initiated by a bot calling a page on the website that no longer existed.
In the code you are showing, the ds and da Dispose calls should go in the finally block. Better yet, use the using () {} construct, which ensures object disposal.
The pattern of building your own query string isn't just a gaping security hole, it is also very inefficient. Use a stored procedure and a parameter instead (a sketch follows at the end of this answer).
The query for processes is overly restrictive. If you have a resource issue that is causing connections to be refused, it won't be limited to a single database. About the only thing I would restrict is the current command --> where spid != @@SPID
We REALLY need some error messages and context - where are they being seen? Tell us more and we can help!
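To illustrate the first two points, here is roughly what the paragraphs lookup looks like with using blocks and a parameter instead of a concatenated string (the table and column names come from your posted code; the rest is a sketch, and a stored procedure would be better still):
private string GetPublishedParagraphs()
{
    string paragraphs = "";
    // using blocks dispose the connection, adapter, and DataSet even if Fill throws.
    using (SqlConnection connection = new SqlConnection(m_base_page.ConnectionString()))
    using (SqlDataAdapter da = new SqlDataAdapter(
        "select paragraphs from PublishedParagraphs where IDDataContent_page = @pageId", connection))
    using (DataSet ds = new DataSet())
    {
        // A parameter instead of string concatenation closes the SQL injection hole.
        da.SelectCommand.Parameters.AddWithValue("@pageId", m_IDDataContent_page);
        da.Fill(ds, "paragraph");
        if (ds.Tables["paragraph"].Rows.Count > 0)
            paragraphs = (string)ds.Tables["paragraph"].Rows[0]["paragraphs"];
    }
    return paragraphs;
}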
Good luck!
First, great additional information! Thanks for the followup.
I would suggest that if you're so sure the code you have posted has nothing to do with the problem, you remove it from the question. However, the problems aren't merely a matter of the code being "imperfect". Proper disposal of memory-intensive objects - ones that the initial developers recognized as intensive enough to warrant a Dispose() method, and ones that interact with the database - while you are having unexplained database problems is not a small issue, in my opinion anyway.
I did some googling and found this. While I wouldn't go so far as to say this is the problem, it did get me thinking. When you say "threads that are active for about an hour", is that being measured on the DB server or on the web server? I'm not familiar with the tool, but are you able to post logs from it?
On the web server, are you able to monitor the routing code's actions? Is the routing code written/set up in such a way as to protect against infinite loops - see the question and answers here.
In the earlier version of my answer, I said that looking only at connections for a particular database was too restrictive for your task. The clarifications to your question do not indicate that you have corrected this query. I would suggest:
SELECT
is_being_blocked = sp.blocked
, sp.cpu
, DB_NAME ( dbid )
, sp.status
, LOGINAME=RTRIM(sp.LOGINAME)
, sp.HOSTNAME
, sp.Hostprocess
, sp.CMD
FROM SYSPROCESSES sp
WHERE spid != @@SPID
ORDER BY
sp.blocked ASC
, sp.cpu DESC
Logs - what are the SQL Server Logs saying in the time span 10 minutes before and 10 minutes after you restart the web app?
Have you tried to reproduce this in development, and is the issue repeatable there?
Please tell us what the below statement means in terms of your application - an error message or other: "the server becomes unavailable"
I highly suggest that you start up a trace of SQL Server using Profiler. Based on what you are saying in this question, this is what I would trace, saving to a table (on another SQL Server) or saving to a file (on another machine, NOT the SQL Server box). This trace is for finding a problem that is severely hampering production; it's not something you would want running on a regular basis.
I would capture these events
* Errors and Warnings - all of them
* Security Audit
** Audit Login
** Audit Logout
* Sessions
** Existing Sessions
* TSQL
** SQL: Stmt Starting
** SQL: Stmt Completed
** Prepare SQL
** Exec Prepared SQL
I wouldn't use any filters other than the presets.
Have you tried running the "sp_who2" query in SQL Server Management Studio to see how many active database connections there are? The code looks fine.
You might want to change the scope of the m_sql_connection variable from class scope to local method scope. Perhaps that could be your issue?
What do you mean by "running out of application pool"? Do you mean the connection pool?
If your database seems to be getting overworked, it could also be because a user has free rein over your m_IDDataContent_page variable. This data access code is vulnerable to SQL injection.