IIS response time high every 10-15 minutes for the same simple request - angularjs

We have a performance issue with an AngularJS website hosted on IIS. The issue only affects users connected via VPN (working from home).
The problem: regularly, a page that usually takes one or two seconds to load can take over 10 seconds.
At first the issue appeared to be random, but we were able to reproduce it in a test environment and found that it arises on a very regular basis (every 10-15 minutes).
What we did: using a tool (ThousandEyes), we send the same simple GET request every minute from 12 clients to the test server. The IIS logs show that this request is processed in less than 50 ms most of the time. However, every 15 minutes or so, the same request takes more than 5 seconds to process for at least one client. Example below: the calls made every minute by client #1 take more than 5 seconds at 21:12, 21:13 and 21:14, then 21:28 and 21:29, then 21:45:
The graph below shows the mean response times for the 12 clients (peak every 10-15 minutes):
For both the test and the production environments, this issue only affects users connected via VPN (but not all users connected via VPN are affected at the same time).
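For anyone reproducing this kind of analysis, here is a minimal sketch (in Python, with a hypothetical log format — not the actual ThousandEyes data) of one way to look for the periodicity: flag the per-minute samples that exceed a threshold and compute the gaps between them:

```python
def slow_minutes(samples, threshold_ms=5000):
    """Return the minute indices whose response time exceeded the threshold.

    `samples` is a list of (minute_index, response_ms) pairs for a single
    client -- a hypothetical log format used for illustration only."""
    return [minute for minute, ms in samples if ms > threshold_ms]

def gaps_between_spikes(minutes):
    """Gaps (in minutes) between consecutive slow samples; a recurring
    gap of 10-15 would confirm the periodic pattern described above."""
    return [later - earlier for earlier, later in zip(minutes, minutes[1:])]
```

With samples where client #1 is slow at minutes 12, 13, 14, 28, 29 and 45, the gaps come out as [1, 1, 14, 1, 16] — the 14- and 16-minute gaps are the periodicity in question.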
Any idea what can cause this behavior ?
All suggestions and questions are welcome.
Notes:
Session state: InProcess. I tried Not Enabled and State Server, but we still get the same results.
Maximum worker processes: 1. I tried 2, no change.
Test server usage: as far as I can tell, nothing special happens every 15 minutes on the server (no unusual events).
Test server configuration: 2 Xeon processors @ 2.6 GHz, 8 GB RAM, 20 GB disk space, Windows Server 2016.
Test server load: almost nothing besides these 12 requests every minute from the 12 test clients.

This issue cost us a lot of time. We finally found out that a VPN server was misconfigured.
Rebuilding this server was the solution.

Related

Sudden spikes in SQL connections causing timeouts

For the last week I've been experiencing intermittent mini-outages lasting between 1 and 3 minutes every few hours. We've been using .NET Framework 4.7.2 and EF6 on top of Azure SQL for years, and it has served us well. Starting about 10 days ago, however, we've been seeing sudden bursts of SQL connections being opened. These bursts cause timeouts on any new requests, making our website inaccessible. For context: our platform sees about 1.1 million unique visitors every single day, and traffic is always very stable and predictable with no sudden bursts; even during the mini-outages, traffic is perfectly normal.
The exception we get during these bursts is:
'Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.'
We use a combination of StructureMap with nested containers to inject our DbContext into controllers and services, and in legacy parts of the codebase we religiously wrap our DbContext in using statements. We never manually open a connection (so there are no SqlConnections floating around).
Azure metrics (successful connections, last 48 hours)
Azure SQL usage charts (24 hours)
The spikes here don't align with the outages so don't seem suspicious to us.
These bursts resolve themselves within minutes. If I'm fast enough when our platform alerts notify us, I can confirm using 'exec sp_who' that there is indeed an excessive number of idle connections (status=sleeping, cmd=AWAITING COMMAND) to our database. We run constantly on 4 similarly specced VMs, and when a burst happens, the idle connections don't originate from one single machine.
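To illustrate that check, here is a minimal sketch (Python, operating on rows that mimic a few sp_who columns; a hypothetical helper, not the asker's actual tooling) that counts idle connections per originating host:

```python
def idle_connections_by_host(rows):
    """Count connections that look idle (status 'sleeping',
    cmd 'AWAITING COMMAND'), grouped by originating hostname.

    `rows` is a list of dicts mimicking a subset of sp_who output."""
    counts = {}
    for row in rows:
        if (row.get("status", "").strip().lower() == "sleeping"
                and row.get("cmd", "").strip().upper() == "AWAITING COMMAND"):
            host = row.get("hostname", "unknown").strip()
            counts[host] = counts.get(host, 0) + 1
    return counts
```

If one host dominated the counts, that would point at a single misbehaving VM; here the idle connections were spread across all four, which is what made the diagnosis harder.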
We've been scratching our heads for the last week, especially since the way we've been using EF6 and SQL Server has been a very smooth ride for several years. We have obviously pored over every single change we've made to the platform codebase over the last 2 weeks to spot anything suspicious, but sadly that hasn't turned up anything yet. We're also diligently tuning all our heavier un-optimized queries in a bid to fix this, but they've been serving the website fine for years, and this really only started about 10-12 days ago.
Can anyone give some insight into what might cause such sudden bursts? Any advice would be greatly appreciated.
Thank you in advance

How can I check if the system time of the DB server is correct?

I got a bug case from the service desk that turned out to be the result of different system times on the application server (JBoss) and the DB (Oracle) server. As a result, timeout calculations were wrong.
It doesn't happen often, but in the future it would be better if the app server could raise an alarm about a bad time on the DB server before it results in deeper problems.
Of course, I can simply run
SELECT CURRENT_TIMESTAMP FROM DUAL
and compare the result against the local time. But sending the query and receiving its result can take a noticeable amount of time, so I might classify a good time as bad, or vice versa.
I can also measure the time from sending the query to receiving the result. But that only works correctly on a good network without lags, and if the time on the DB server is wrong, it is highly probable that the network around the DB server is not OK either. Queues on the DB server can make the send and receive times noticeably unequal.
What is the best way you know to check the time on the DB server?
Limitations: precision of 5 seconds,
false alarms < 10%.
To be optimized (minimized): missed alarms.
Maybe I am reinventing the wheel and JBoss and/or Oracle have some tool for this? (I could not find one.)
Have a program running on the app server get the current time there, then query the database time (CURRENT_TIMESTAMP), and get the app server's current time again after the query returns.
Confirm that the DB time is between the two times recorded on the app server (with whatever tolerance you need). You can include a separate check on how long it took to get the response from the DB, but it should be trivial.
If the environment is some form of VM, issues are most likely to arise when the VM is started or resumed from a pause. There might be situations where a clock is running fast or slow, so recording the times would let you look for trends in either direction and take preemptive action.
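A minimal sketch of that sandwich check in Python (the tolerance value comes from the 5-second precision requirement in the question; the function name and values are illustrative):

```python
from datetime import datetime, timedelta

def db_clock_ok(t_before, db_time, t_after, tolerance=timedelta(seconds=5)):
    """Sandwich check: sample the app-server clock immediately before
    (t_before) and after (t_after) the CURRENT_TIMESTAMP query, and
    accept the DB clock only if the value it returned (db_time) falls
    inside that window, widened by the allowed tolerance."""
    return (t_before - tolerance) <= db_time <= (t_after + tolerance)
```

Note that the round-trip time is accounted for implicitly: a slow network simply widens the [t_before, t_after] window, which reduces sensitivity but never produces a false alarm from latency alone.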

High Response to Server

I have been searching for the reason why the Response to Server (TTFB) has some major delays on:
https://tools.pingdom.com/#!/cu9yoV/https://graphic-cv.com
without finding the real reason. The site resides on a shared server which is in fact running good hardware, and it runs PHP 7.1, APCu cache, Cloudflare, and the PrestaShop 1.6.1.18 platform, configured with the best speed-optimization settings in the back end.
As seen in the metrics test, the site's requests load within seconds, but the first HTTP/HTTPS request to the server can delay the site anywhere from 3 to 20 seconds. If I re-test, it goes down to 2-5 seconds, but if I haven't accessed the site for 30 minutes or more, the issue arises again with a high load time.
How do I find the culprit that is delaying the TTFB? The hosting company, with all their resources for testing and monitoring, hasn't provided me with a clear answer.
It was hardware related. After the hosting company upgraded their hardware (new CPUs, RAID-10, DDR4, LiteSpeed), my site now loads within 3 seconds.

Why does the same query executed twice using ExecuteReader() in VB.NET return two very different response times?

Whenever a user clicks the GetReport button, a request is made to the server, where the SQL is formed in the back end and a connection is established with the database. When ExecuteReader() is executed, it returns data with varying response times.
There are 12 servers in the production environment, and the configuration is such that when there is no response from the back end for more than 60 seconds, the connection is dropped, and hence a blank screen appears on the UI.
In my code the SQL is formed and a connection is established; when ExecuteReader() takes longer than 60 seconds to return data, the connection is dropped per the server settings, leading to the blank screen.
If ExecuteReader() returns data within 60 seconds, the functionality works fine. The problem only occurs when it does not retrieve data within 60 seconds.
The problem is that ExecuteReader() sometimes returns data within 2 seconds for the same SQL, and sometimes takes 2 minutes.
Please suggest why there is such variation in response time for the same query, and how I should proceed, given that we cannot increase the response timeout in production because of security policies.
Code is in vb.net
You said it yourself:
how should I be proceeding in this situation as we are not able to increase the response time in production because of security issues.
There's nothing you can do
If, however, you do suddenly gain the permissions to modify the query that is being run, or reconfigure the resource provision of the production system, post back here with a screenshot of the execution plan and we can tell you any potential performance bottlenecks.
Dan's comment pretty much covers why a database query might be slow. It's usually a similar reason why YouTube is slower to buffer at 7pm: the parents got home from work at 6, the kids screamed at them for an hour wanting to go on YouTube while the parents desperately tried to engage them in something more educational or physically active, and the parents finally gave in, wanting some peace and quiet :) In other words: resource provision, and supply and demand, across the entire chain between you and YouTube.

Deploying to App Engine: Verifying availability

When I deploy my project to App Engine from Eclipse, the time it takes for "Verifying availability" to finish varies a lot. Sometimes it just takes a couple of seconds, but mostly it looks like this:
Verifying availability:
Will check again in 1 seconds.
Will check again in 2 seconds.
Will check again in 4 seconds.
Will check again in 8 seconds.
Will check again in 16 seconds.
Will check again in 32 seconds.
Will check again in 60 seconds.
Will check again in 60 seconds.
Will check again in 60 seconds.
Will check again in 1 seconds.
Will check again in 2 seconds.
Will check again in 4 seconds.
Will check again in 8 seconds.
Closing update: new version is ready to start serving.
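The intervals in the log above follow a capped exponential backoff: the wait doubles each attempt up to a 60-second cap, then resets when the check restarts. A minimal sketch of that polling schedule, purely illustrative of the pattern (the actual App Engine tooling's internals aren't shown here):

```python
def backoff_delays(n, base=1, cap=60):
    """Generate n polling delays: start at `base` seconds and double
    each time, never exceeding `cap` -- the pattern visible in the
    deploy log above."""
    delays = []
    delay = base
    for _ in range(n):
        delays.append(min(delay, cap))
        delay *= 2
    return delays
```

For nine checks this yields 1, 2, 4, 8, 16, 32, 60, 60, 60 — matching the first run in the log before the sequence resets.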
What's happening during this process, and is there anything I can do to make it go away? It's a little frustrating when I've only made small changes and have to wait for ages to test them.
That is some App Engine delay over which we have no control; you just have to live with it. It's a race to deploy on the servers, where the GAE traffic cops are trying to get everyone's apps deployed, updated, and ready. It can be frustrating sometimes, but it's part of the deal.
From this answer (from the Java GAE):
The delay is probably caused by an unacknowledged temporary server condition. It has happened previously and usually improves after a few hours.
This resource suggests the same.
From personal experience it indeed improves after a few hours.
