I have 2 websites connecting to the same instance of MSSQL via classic ASP. Both websites are similar in nature and run similar queries.
One website chokes up every once in a while, while the other website is fine. This leads me to believe MSSQL is not the problem, otherwise I would think the bottleneck would occur in both websites simultaneously.
I've been trying to use Performance Monitor in Windows Server 2008 to locate the problem, but since everything is in aggregate form, it's hard to find the offending asp page.
So I am looking for some troubleshooting tips...
Is there a simple way to check all recent ASP pages and see the amount of time they ran for?
Is there a simple way to see live page requests as they happen?
I basically need to track down this offending code, but I am having a hard time seeing what is happening in real time through IIS.
If you use "W3C Extended Logging" as the log mode for your IIS logfiles, then you can switch on a column "time-taken" which will give you the execution time of each ASP in milliseconds (by default, this column is disabled). See here for more details.
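Once "time-taken" is switched on, the slowest pages can be pulled out of the log with a short script. The following is only a minimal sketch, assuming the log also contains the "cs-uri-stem" column and that the script reads the column order from the log's "#Fields:" header line; the function name slowest_pages is made up for illustration.

```python
from collections import defaultdict


def slowest_pages(log_lines, top=5):
    """Return (uri, max_ms, avg_ms, hits) tuples, slowest page first."""
    fields = []                # column names from the latest "#Fields:" line
    times = defaultdict(list)  # uri -> list of time-taken values (ms)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Software, #Version, #Date)
        if "cs-uri-stem" not in fields or "time-taken" not in fields:
            continue  # required columns are not enabled in this log
        values = line.split()
        if len(values) != len(fields):
            continue  # malformed or truncated entry
        row = dict(zip(fields, values))
        try:
            times[row["cs-uri-stem"]].append(int(row["time-taken"]))
        except ValueError:
            continue  # time-taken logged as "-" or otherwise non-numeric
    report = [(uri, max(t), sum(t) / len(t), len(t)) for uri, t in times.items()]
    report.sort(key=lambda r: r[1], reverse=True)
    return report[:top]
```

Pass it an open log file, for example slowest_pages(open(path)), where path points at one of the site's log files. Sorting by the maximum rather than the average is a deliberate choice here, since a page that only chokes occasionally can still have a healthy average.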
You may find that something in one application is taking a lock in the database (e.g. through a transaction) and then not releasing it, which causes the other app to timeout.
Check your code for transactions and make sure they are being closed, and possibly consider setting up tracing on the SQL server to log deadlocks.
Your best bet is to run SQL Server Profiler to see which procedure or SQL statement may be taking a long time to execute. You can also use Process Monitor to see any pages that may be taking a long time to finish execution, and finally, don't forget to check your IIS logs.
Hope that helps
We use InterBase 2020 as our production DB using UTF8 (approx. 250 simultaneous users). With this database we have two main problems that we are not able to solve.
In the past we had a problem with an older UDF that crashed our database because it was not ready for Unicode string operations. As a result we changed to Unicode-compatible versions.
Over the last few years we have sometimes had a hiccup (as we call it). In this case every client loses its connection and the guardian restarts. The clients can connect again without us doing anything.
The second problem is that sometimes InterBase does not crash, but everyone loses the connection and it is not possible to reconnect (from a client, or from IBExpert for example). In this case we have to restart the whole server.
These problems occur irregularly. Most of the time it starts with a hiccup. After a while (maybe two to ten hours later), the second problem appears and we need to restart our database. If we are lucky we need to restart the server 2-3 times; on a bad day we need to restart the server more often, as the second problem returns again and again (for example every 30 minutes).
We have not yet been able to locate this problem. It doesn't matter if a user is connected to the database or just idling on weekends. It also often happens when nobody is connected.
So far even the server logs have not given us any useful hints.
-We reduced UDF usage to a minimum and changed to newer UDFs that support Unicode etc.
-Functions that can crash the server (afaik) are guarded so that they don't receive, for example, invalid datetimes
-We regularly update the database server to the newest version
-Also updated the client DLLs
-Also updated the connection components (IBDAC) + Delphi 11.1
-Wrote an exception tracker in our client software (unfortunately it only shows the connection lost error)
-We regularly check active transactions for anything that hangs or loops, and check snapshot creation
Do you have any information that we could use to solve our problems? Is there any way to get more info out of the log files (are other log levels possible)? We don't want to log every procedure call if it is not necessary, but if there are no other options we will have to.
Thanks for your help!
Matze,
I suggest you log a Case with our Support team at Embarcadero (https://www.embarcadero.com/support). They will work with you to understand the specifics of the crash, get relevant details (and Performance Monitoring information) from you, and help us work on a resolution (if not addressed already in our latest update).
We have addressed a few corner cases (and other crash reports) in many updates over the past couple years in InterBase 2020, and are eager to get to the bottom of this issue as well. You can see some of the resolved crash reports at https://docwiki.embarcadero.com/InterBase/2020/en/Resolved_Defects
Supporting 250 simultaneous users is not the problem, but understanding how the use cases are running into any potential system resource limits is important.
You do mention that you have the latest updates to InterBase 2020, but I do not see a build number in your message. You can get the most recent update build (14.4.0.804) of the server (if on Windows) from https://my.embarcadero.com/#downloadDetail/1383
I got a bug case from the service desk that was caused by different system times on the application server (JBoss) and the DB (Oracle) server. As a result, the timeouts were misleading.
It doesn't happen often, but for the future it would be better if the app server could raise an alarm about a bad time on the DB server before it causes deeper problems.
Of course, I can simply run Select CURRENT_TIMESTAMP and compare the result against the local time. But sending the query and receiving its result will probably take a noticeable amount of time, so I could mistake a good time for a bad one or vice versa.
I can also measure the time from sending the query to receiving the result. But that approach only works correctly on a good network without lag. And if the time on the DB server is wrong, it is quite likely that the network around the DB server is not OK either. Queues on the DB server can also make the send and receive times noticeably unequal.
What is the best way you know to check the time on the DB server?
Limitations: precision of 5 sec
false alarms <10%
To be optimized (minimized): missed alarms.
Maybe I am reinventing the wheel and JBoss and/or Oracle already have a tool for that? (I could not find one.)
Have a program running on the app server record the current local time, query the database time (CURRENT_TIMESTAMP), and then record the local time again after the query returns.
Confirm that the DB time is between the two times on the App Server (with any tolerance you need). You can include a separate check on how long it took to get the response from the DB but it should be trivial.
If the environment is some form of VM, issues are most likely to arise when the VM is started or resumed from a pause. There might be situations where a clock is running fast or slow so recording the times would allow you to look for trends in either direction and allow you to take preemptive action.
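The bracketing check described in this answer could be sketched as follows. This is only an illustration under stated assumptions: the caller has already recorded the app-server time before and after the query and fetched the DB time itself, all three values are in the same time zone, and the function name check_db_clock and the 5-second default tolerance (taken from the question) are made up for the example.

```python
from datetime import datetime, timedelta


def check_db_clock(before, db_time, after, tolerance=timedelta(seconds=5)):
    """Compare the DB time with the app-server window [before, after].

    Returns (status, skew, round_trip), where status is "ok", "behind"
    or "ahead", and skew is how far the DB time lies outside the window.
    """
    round_trip = after - before  # how long the query took, as seen by the app
    if db_time < before - tolerance:
        return ("behind", db_time - before, round_trip)  # negative skew
    if db_time > after + tolerance:
        return ("ahead", db_time - after, round_trip)    # positive skew
    return ("ok", timedelta(0), round_trip)
```

Because the DB time only has to fall inside the window, a slow network widens the window instead of producing a false alarm; the price is that skew smaller than the round trip goes unnoticed. Logging the returned skew and round_trip on every check gives the history needed to spot a clock that is drifting in one direction.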
I have a problem with the Spectre/Meltdown patch for Windows (it was released somewhere around Q1 last year). When it is activated, my Delphi REST service slows down by a factor of about 15 (so if a request takes 1 second, with the patches activated it takes about 15 seconds). I have traced the slowdown to the database connection. Somehow the translation from parameters, after they have all been set, to the SQL text takes really long, and then the execution on the database itself takes a lot longer than usual. At first I helped myself by cutting the SQL statement down to a couple of rows, and it got faster (so more rows mean a lot more time; approximately, each additional row in an update/insert statement adds 0.2-0.3 seconds to the transaction. As far as I can see, select statements work fine).
After I got the same issue on other requests, and since the application is still in development, I turned off the patches, and everything got a lot faster. Now the administrator insists that the patches be turned on, and the problem is there again.
Has anybody experienced something like this, or is there a possibility to exclude an application from being targeted by the patches? The strange thing is, I also have a client/server application that uses the same business logic. The client/server application is also slowed down, but only by a factor of about 2. That's the thing I don't quite understand: with the same functions, it takes a lot longer from within the service than from the client/server application.
Ah yes, I am using Devart for the database connection, and it's an MSSQL server (2016). The service and the client/server application are written in Delphi XE7 (now trying to update to XE10.2, hoping that this will help).
Thanks
Sorry for the long introduction, but before I ask my question, I think some background will make our problem much easier to understand.
We are using SQL Server 2008 as the backend for our web services, and from time to time it takes far too long to respond to requests that are supposed to run really fast, such as taking more than 20 seconds for a select that queries a table with only 22 rows. We went through many areas that could cause the issue, from indexes to stored procedures, triggers, etc., and tried to optimize whatever we could, such as removing indexes that are written frequently but never read, or adding NOLOCK to our select queries to reduce locking on the tables (we are OK with dirty reads).
We also had our DBAs review the server and benchmark the components to look for bottlenecks in the CPU, memory, or disk subsystem, and found that hardware-wise we are OK as well. And since the spikes occur only occasionally, it is really hard to reproduce the error in production or development: most of the time, when we rerun the same query, it yields the short response times we expect, not the long ones experienced earlier.
Having said that, I have been suspicious of I/O all along, although it does not seem to be a bottleneck. But I think I was just able to reproduce the problem by running an index fragmentation report for a specific table on the server, which immediately caused spikes not only in requests run against that table but also in requests that query other tables. And since the DB and the server are shared with other applications we use, and long-running queries are run against the server and database from time to time, my suspicion of an occasional I/O bottleneck is, I believe, becoming a fact.
Therefore I want to find a way to prioritize requests coming from the web services so that they are processed even while other resource-intensive queries are running. I have been looking for this kind of prioritization since the very beginning of the resolution process and found out that SQL Server 2008 has a feature called 'Resource Governor' that allows prioritization of requests.
However, since I am neither an expert on Resource Governor nor a DBA, I would like to ask about the experience of others who have used or are using Resource Governor, and whether I can prioritize I/O for a specific login or a specific stored procedure. (For example, if an I/O-intensive process is running when we receive a web service request, can SQL Server stop, or slow down, I/O activity for that process and give priority to the request we just received?)
Thanks in advance to anyone who takes the time to read this or help out.
Some Hardware Details:
CPU: 2x Quad Core AMD Opteron 8354
Memory: 64GB
Disk Subsystem: Compaq EVA8100 series (I am not sure, but it should be RAID 0+1 across 8 HP HSV210 SCSI drives)
PS: I am almost 100 percent sure that the application servers are not causing the error, and there is no bottleneck we can identify there.
Update 1:
I'll try to answer as much as I can of the following questions that gbn asked below. Please let me know if you are looking for something else.
1) What kind of index and statistics maintenance do you have please?
We have a weekly running job that defrags indexes every Friday. In addition to that, Auto Create Statistics and Auto Update Statistics are enabled. And the spikes are occurring in other times than the fragmentation job as well.
2) What kind of write data volumes do you have?
Hard to answer. In addition to our web services, there is a front-end application that accesses the same database, and to my knowledge resource-intensive queries need to be run periodically. However, I don't know how to get the, let's say weekly or daily, amount written to the DB.
3) Have you profiled Recompilation and statistics update events?
Sorry, I was not able to figure this one out. I didn't understand what you are asking with this question. Can you provide more information, if possible?
My first thought is that statistics are being updated because the data change threshold has been reached, causing execution plans to be rebuilt.
What kind of index and statistics maintenance do you have please? Note: index maintenance updates index stats, not column stats: you may need separate stats updates.
What kind of write data volumes do you have?
Have you profiled Recompilation and statistics update events?
In response to question 3) of your Update to the original question, take a look at the following reference on SQL Server Pedia. It explains what query recompiles are and how you can monitor for these events. What I believe gbn is asking (feel free to correct me sir :-) ) is whether you are seeing recompile events just before the slow execution of the troublesome query. You can look for this using SQL Server Profiler.
Reasons for Recompiling a Query Execution Plan
I have a stored procedure that is called by a website to display data. Today the web page has started timing out so I got profiler going and saw the query that was taking too long. I then ran the same query in management studio, under the same user login, and it takes less than a second to return.
Is there anything obvious that could be causing this? I can't think of a reason why when ASP calls the stored proc it takes 30 secs but when I call it it's fine.
Thanks
I guess there might be two reasons:
Network problem
Parameter sniffing
This is usually because some of the SET-tings differ between the Management Studio connection and the ASP connection, such as SET ARITHABORT. This wouldn't explain why it's only started being problematic today from the website call, but there's a fair chance it's related.
It seemed to be parameter sniffing...
I've stopped the sniffing by assigning the passed in parameters to local variables and it seems to be fine at the moment (i.e. it's running under a second from the website again). It'll be interesting to see if it stays like this or will degrade again.
I had assumed running with the option RECOMPILE would have temporarily 'fixed' the parameter sniffing problem for the query in question but it didn't.
Ah well. Thank you everyone for answering. I'll see what happens
We had a similar issue with our IVR - when I ran a query through SSMS, it returned instantly, but when it was run through a webservice accessed by our IVR, it would time out about 20% of the time - really odd.
I ended up running SQL Profiler to see the queries being submitted and then added some additional indexes per the recommendations of the Index Tuning wizard, which sped up the IVR query to under a second every time. I suspect the problem was also something to do with parameters, and while I didn't compare the execution plan between the two different venues, I suspect they were quite different. SQL Profiler will help you sort this out, though, since you can see the query actually submitted to the engine, as well as the execution plan it uses to fetch the data.
Sounds like a deadlock.