We are using Azure Functions, running on an App Service plan (not the Consumption plan). My issue is that we are seeing a strange delay on some web calls into our services.
For example, I have one HTTP GET trigger that returns a list of objects from another web service (so there is outbound web traffic from my function). If I call the service 10 times, maybe 6 responses come back in between 400 and 600 ms, but the other 4 calls take between 7000 and 8000 ms. The split is actually quite consistent. It seems bizarre: it's either half a second or 7.5 seconds. I have tested the backend system and it's not the cause, so it's something around the function app itself. Any thoughts or suggestions welcome.
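A minimal sketch for quantifying the split described above, assuming Python 3 on a client machine. `time_calls` and `split_bimodal` are hypothetical helpers (not part of any Azure SDK); `call` would wrap an HTTP GET against the function's URL, and the 2000 ms threshold is an arbitrary cut between the two clusters.

```python
import statistics
import time


def time_calls(call, n=10, clock=time.perf_counter):
    """Invoke `call` n times and return each duration in milliseconds."""
    samples_ms = []
    for _ in range(n):
        start = clock()
        call()  # e.g. a GET against the HTTP-triggered function (hypothetical)
        samples_ms.append((clock() - start) * 1000.0)
    return samples_ms


def split_bimodal(samples_ms, threshold_ms=2000.0):
    """Split latencies into a fast and a slow group around an arbitrary threshold."""
    fast = [s for s in samples_ms if s < threshold_ms]
    slow = [s for s in samples_ms if s >= threshold_ms]
    return {
        "fast_count": len(fast),
        "slow_count": len(slow),
        "fast_median_ms": statistics.median(fast) if fast else None,
        "slow_median_ms": statistics.median(slow) if slow else None,
    }
```

A stable count of slow calls with a tight slow cluster suggests a fixed extra delay on some path rather than random load, which is worth ruling out before blaming the backend.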
(Copying Noel's comment as the answer)
Thanks for the suggestion on the Kudu call; that was a great help. It was actually due to a linked VNet (which I didn't mention in my post). It seems to be swallowing outbound traffic and sending it on a very slow trip, due to what looks like a bad configuration in the DNS settings.
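Since the culprit turned out to be name resolution on the VNet path, one way to confirm that kind of problem is to time the DNS lookup separately from the TCP connect. This is a minimal sketch, assuming Python 3 can run on a machine that uses the same network path and DNS servers as the function (for example a VM in the same VNet); `time_phases` is a hypothetical helper, and the host you pass in would be your own backend's.

```python
import socket
import time


def time_phases(host, port, timeout=10.0):
    """Return DNS-lookup and TCP-connect durations (ms) for one connection attempt."""
    t0 = time.perf_counter()
    # Name resolution alone: a multi-second value here implicates DNS, not the backend.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    t1 = time.perf_counter()
    family, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)  # TCP handshake to the first resolved address
    t2 = time.perf_counter()
    return {"dns_ms": (t1 - t0) * 1000.0, "connect_ms": (t2 - t1) * 1000.0}
```

Running it repeatedly against the backend's host name and seeing `dns_ms` outliers in the multi-second range, while `connect_ms` stays small, would point at the resolver configuration.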
I run an API on App Engine. Sometimes a request takes only ~50 ms to complete and sometimes it takes 10-15 seconds!
Here's what it looks like in the Google Chrome console:
As you can see, some requests are very fast, and some very slow.
Using Stackdriver Trace I can confirm that the API sometimes takes 10 seconds or longer. I tried automatically making a request every second to see if it speeds up after the first request, but it still seems random.
So the next thing I tried was measuring whether the API itself is slow due to my own code. I tested it, but it seems to be very fast and not the cause of the problem. Nor do I make any requests inside my API that could be slowing it down (other than a database request).
I am still trying to figure out what it is that is causing this massive latency, but it seems like it happens in between the request being made on the frontend and the request being received on the backend.
I would highly appreciate any help and suggestions!
EDIT 1
Seems like the 204 No Content responses are also slow sometimes.
Here's more strange behavior. On the frontend I make several requests at once to load a page. For every request there is almost exactly a one second delay:
I still have not figured out the cause of this problem, so help is still appreciated.
EDIT 2
My timeline doesn't seem to break down the way it does for Alex:
I tried adding this to all HTTP headers:
'Cache-Control': 'no-cache',
Pragma: 'no-cache'
Sadly, this does not solve my problem either.
EDIT 3
The 10 second latency is probably caused by 10 requests all being fired at once, each taking 1 second.
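The arithmetic behind that guess can be sketched as a FIFO queue, assuming all requests arrive at the same moment, each takes a fixed service time, and the instance processes `workers` requests at a time. Whether an F1 instance really serves requests one at a time depends on the runtime and its concurrency settings, so `workers=1` is an assumption here; `completion_times` is a hypothetical helper.

```python
def completion_times(n_requests, service_s, workers=1):
    """Finish time of each request when all arrive at t=0 and are served FIFO.

    Request i (0-based) runs in "wave" i // workers, so it finishes at
    (i // workers + 1) * service_s.
    """
    return [(i // workers + 1) * service_s for i in range(n_requests)]


# Ten simultaneous requests of ~1 s each, served one at a time.
print(completion_times(10, 1.0)[-1])  # 10.0
```

With `workers=1` this reproduces a staircase of one-second steps; if the requests were processed concurrently, every request would finish after about one service time.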
So my first question is:
Can a single App Engine F1 instance not handle multiple concurrent requests?
And my second question:
Why does it take over 1 second (sometimes over 2 seconds) to process a single request?
I did another test to find out whether my code is slowing down the requests. I deployed a .NET Core MVC controller with only one action, which just returns "Hello world". Here are the results (using this method):
> curl.exe -s -o NUL -w "@curl-format.txt" --url "http://api.---.com/test"
time_namelookup: 0,000001
time_connect: 0,109000
time_appconnect: 0,000000
time_pretransfer: 0,109000
time_redirect: 0,000000
time_starttransfer: 1,203000
--------
time_total: 1,203000
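To read the numbers above, note that curl's timings are cumulative from the start of the transfer, so the informative values are the differences between them. A minimal sketch, assuming the output is pasted in as text; `parse_curl_timings` and `breakdown` are hypothetical helpers, and the comma decimal separator comes from the locale of the output above.

```python
def parse_curl_timings(text):
    """Parse `name: value` lines from curl -w output into floats (seconds)."""
    timings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().startswith("time_"):
            # The output above uses a comma as decimal separator; normalize it.
            timings[name.strip()] = float(value.strip().replace(",", "."))
    return timings


def breakdown(t):
    """Turn cumulative curl timings into per-phase durations (seconds)."""
    return {
        "tcp_connect": t["time_connect"] - t["time_namelookup"],
        # Time to first byte once the request was ready to send:
        # server processing plus roughly one network round trip.
        "server_wait": t["time_starttransfer"] - t["time_pretransfer"],
        "body_transfer": t["time_total"] - t["time_starttransfer"],
    }
```

For the run above this gives about 0.109 s to connect and about 1.094 s waiting for the first byte, so for this "Hello world" endpoint the time is spent before the response starts, not in transferring it.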
In your fast requests, the responses are 204 (No Content) and they are 45 bytes. The slow requests are responding with 200 and are actually returning something.
Is there some kind of caching that's affecting this?
EDIT 1: Since your server was returning 204s, I was referring more to any caching you implemented on the client side. I see that you found the trace screen (https://console.cloud.google.com/traces/traces); have you tried clicking on one of the traces? It gives you a breakdown like this:
That should tell you where the request is spending its time.
I was checking XHR call timings in Chrome DevTools to improve slow requests, but I found that 99% of the response time is spent on content download, even though the content size is less than 5 KB and the application is running on localhost (I'm working on my local machine, so there are no network issues).
But when replaying the call using the Replay XHR menu, the content download period drops dramatically from 2.13 s to 2.11 ms (as shown in the screenshots below). The data is not cached at the browser level.
Example of Call Timing
Same Example Replayed
Can someone explain why the content download timing is slow and how to improve it?
The application is an ASP.NET MVC 5 solution combined with AngularJS.
The Web Server Details:
- Windows Server 2012 R2
- IIS 8
Thank you in advance for your support!
I can't conclusively tell you the cause of this, but I can offer some variables that you can investigate, which might help you figure out what's going on.
Caching
I know you said that the data is not getting cached at the browser level, but I'd suggest checking that again: an initial request that takes 2 s followed by a repeat request that takes only 2 ms really does sound like caching.
How to check:
Go to the Network panel.
Look at Size column for the request. If you see from memory or from disk cache, it was served from the cache.
Slow development server or machine
My initial thought was that your development machine is doing more work than it can handle. Maybe the server requires more resources than your machine has, or you have a lot of other programs running and your memory or CPU is maxed out.
How to check:
Run your app on a more powerful server and see if the pattern persists.
Frontend app is doing too much work
I'm not sure this last one actually makes sense, but it's worth a check. Perhaps your Angular app is doing a crazy amount of JS work during the initial request, and it's maxing out your CPU. So the entire browser is stalling when you make the initial request.
How to check:
Go to the Performance panel.
Start recording.
Do the action that causes your app to make the initial request.
Stop recording.
Check the CPU chart. If it's completely maxed out, then your app is indeed doing a bunch of work.
Please leave a comment and let me know if any of these helped.
I have also been investigating this issue on Chrome (currently 91.0.4472.164) as the content download times appear to be vastly different based on the context of the download. When going directly to a resource or attempting to update rendered content as the result of a web call, the content download time can take up to 10x the duration when made from other client applications or when simply saving the data off as a variable in Chrome.
I created a quick, hacky Spring Boot web application that demonstrates the problem and made it public on GitHub: https://github.com/zielinskin/h2-training-simple
The steps in the readme should hopefully be sufficient to demonstrate the vast performance differences.
I believe that Chrome will need to resolve this performance issue as it has nothing to do with the webserver or ui framework being used.
The "Content Download" phase includes both the time taken for the client to download the content and the time for the server to send it. You can test the following cases to see which is the cause; usually it is a combination of all of them.
Case 1: server delay
Assume running server and client on localhost with 0 network delay, and small data.
time0 client receives a response with header content-length = 20
time5 server > client: 10 bytes of data
time5 client receives data
Case 2: network delay
Use hard-coded dummy data so the server itself responds instantly
time0 client receives a response with header content-length = 20
time0 server > client: 10 bytes of data
time5 client receives data
Case 3: client is too busy
Isolate the request by running something like curl google.com -v in a terminal to access the URL directly. Both Chrome DevTools and Firefox Developer Tools can copy a request from the Network panel as a cURL command via the request's right-click Copy menu.
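Case 1 can be reproduced locally to see how server-side delay after the headers shows up as "Content Download" rather than as waiting time. This is a minimal sketch using only Python's standard library, not the asker's ASP.NET stack: a hypothetical handler sends its headers immediately, sleeps, then writes the body, and the client records time-to-headers and headers-to-body separately (roughly what DevTools reports as Waiting (TTFB) and Content Download).

```python
import http.client
import http.server
import threading
import time

BODY = b"0123456789" * 2  # 20 bytes, matching content-length = 20 in Case 1


def make_handler(body_delay_s):
    class SlowBodyHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Length", str(len(BODY)))
            self.end_headers()  # headers go out now ("time0")
            self.wfile.flush()
            time.sleep(body_delay_s)  # simulated server work before the body
            self.wfile.write(BODY)

        def log_message(self, *args):
            pass  # keep the demo quiet

    return SlowBodyHandler


def measure(host, port, path="/"):
    """Return (seconds until headers, seconds from headers to full body, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    start = time.perf_counter()
    conn.request("GET", path)
    resp = conn.getresponse()  # returns once the status line and headers are parsed
    headers_at = time.perf_counter()
    body = resp.read()  # blocks until all Content-Length bytes arrive
    body_at = time.perf_counter()
    conn.close()
    return headers_at - start, body_at - headers_at, body


def run_case1(body_delay_s=0.5):
    server = http.server.HTTPServer(("127.0.0.1", 0), make_handler(body_delay_s))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return measure("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    waiting, download, _ = run_case1()
    print(f"waiting {waiting * 1000:.1f} ms, content download {download * 1000:.1f} ms")
```

Even with zero network delay, the sleep lands entirely in the second number, which is why a slow server can look like a slow download.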
I have installed PostgreSQL 8.4. What I want to do is call a web service from a C function, invoked by an insert/update trigger, and pass the NEW values to that web service. How can I do that? I searched the web and couldn't find an example.
Thanks in advance.
Please don't do this. If you do, you will face wonderful questions like how to handle the web service being down. You will also have to address what happens when your transaction rolls back: you can't uncall the web service. And if the connection times out, your procedure will hang for quite a while (retaining all its locks, etc.) while waiting for a response that never comes.
A better approach is to use a queuing solution like pgq or pg_message_queue and queue up the data at trigger time, only to run it against the web service asynchronously.
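The queue-at-trigger-time idea can be sketched without pgq, using Python's built-in sqlite3 as a stand-in for PostgreSQL. This is the general outbox pattern, not pgq's or pg_message_queue's API, and the `orders`/`outbox` tables, the triggers, and the `call_service` callable are all hypothetical: the trigger only records the NEW values inside the transaction, and a separate worker calls the web service later, marking a row as sent only after the call succeeds.

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id INTEGER NOT NULL,
    status TEXT NOT NULL,
    sent INTEGER NOT NULL DEFAULT 0
);
-- The triggers only enqueue the NEW values; no network I/O inside the transaction.
CREATE TRIGGER orders_ai AFTER INSERT ON orders
BEGIN
    INSERT INTO outbox (order_id, status) VALUES (NEW.id, NEW.status);
END;
CREATE TRIGGER orders_au AFTER UPDATE ON orders
BEGIN
    INSERT INTO outbox (order_id, status) VALUES (NEW.id, NEW.status);
END;
"""


def drain_outbox(conn, call_service):
    """Deliver pending rows in order; stop at the first failure so it is retried later."""
    pending = conn.execute(
        "SELECT id, order_id, status FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, order_id, status in pending:
        try:
            call_service({"order_id": order_id, "status": status})
        except Exception:
            break  # service down or timed out: leave the row queued
        conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        conn.commit()
        delivered += 1
    return delivered
```

Because the queue row is written in the same transaction as the data change, a rollback discards it too, and an unavailable service only delays delivery instead of holding locks. In PostgreSQL the same shape would be an AFTER INSERT OR UPDATE trigger writing into a queue, with the delivery loop running in a separate process.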
I coded a simple scraper whose job is to visit several different pages of a site, do some parsing, call some URLs that are otherwise called via AJAX, and store the data in a database.
The trouble is that sometimes my IP is blocked after my scraper runs. What steps can I take so that my IP does not get blocked? Are there any recommended practices? I have added a 5-second gap between requests, to almost no effect. The site is medium-to-large (I need to scrape several URLs) and my internet connection is slow, so the script runs for over an hour. Would a faster connection (like on a hosting service) help?
Basically I want to code a well behaved bot.
Lastly, I am not POSTing or spamming.
Edit: I think I'll break my script into 4-5 parts and run them at different times of the day.
You could use rotating proxies, but that wouldn't be a very well behaved bot. Have you looked at the site's robots.txt?
Write your bot so that it is more polite, i.e. don't sequentially fetch everything, but add delays in strategic places.
Following the guidelines set in robots.txt is a good first step. There are tools such as import.io and morph.io, and there are also packages/plugins for servers. For example, x-ray, a Node.js library, has options to help you quickly write responsible scrapers (e.g. throttling, delays, max connections).
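A minimal sketch of the "well behaved bot" advice, assuming Python 3: honour robots.txt Disallow rules and Crawl-delay through the standard library's urllib.robotparser, and space requests with a base delay plus random jitter. `polite_crawl`, `fetch`, the user agent, and the example robots.txt are hypothetical; a real scraper would download the site's actual robots.txt.

```python
import random
import time
import urllib.robotparser

EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(robots_txt):
    """Parse robots.txt text (already downloaded) into a RobotFileParser."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def polite_crawl(urls, rp, user_agent, fetch, default_delay_s=5.0,
                 jitter_s=2.0, sleep=time.sleep):
    """Fetch allowed URLs one at a time, pausing between requests."""
    delay = rp.crawl_delay(user_agent)
    base = float(delay) if delay is not None else default_delay_s
    results = {}
    first = True
    for url in urls:
        if not rp.can_fetch(user_agent, url):
            continue  # skip paths the site asks bots not to crawl
        if not first:
            # Random jitter avoids a perfectly regular, bot-like request rhythm.
            sleep(base + random.uniform(0, jitter_s))
        first = False
        results[url] = fetch(url)
    return results
```

Fetching sequentially with a delay taken from the site's own Crawl-delay, when it has one, keeps the request rate at whatever the site has said it tolerates.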
I'm working on a rather large Classic ASP / SQL Server application.
A new version was rolled out a few months ago with a lot of new features, and I must have a very nasty bug somewhere: some very basic pages randomly take a very long time to execute.
A few clues:
It isn't the database: when I run the query profiler, it doesn't detect any long-running query
When I launch IIS Diagnostic tools, reqviewer shows that the request is in state "processing"
This can happen on ANY page
I can't reproduce it easily, it's completely random.
To give an idea of "a very long time": this morning a page took more than 5 minutes to execute, when it should normally be returned to the client in less than 100 ms.
The application can handle rather large file uploads and downloads (up to 2 GB in size). This is also handled with a Classic ASP script, using SoftArtisans FileUp. I don't think it can cause the problem though; we've had these uploads for quite a while now.
I've had the problem on two separate servers (in two separate locations, with different sets of data). One is running the application with good ol' SQL Server 2000 and the other runs SQL Server 2005. The web server is IIS 6 in both cases.
Any idea what the problem is, or how to solve this kind of problem?
Thanks.
Sebastien
Edit:
The problem came from memory fragmentation. Some ASP pages were used to download files from the server, with file sizes ranging from a few KB to more than 2 GB. These variations in size induced memory fragmentation. The ASP pages could also take quite some time to execute (the time for the user to download the file, minus whatever is cached at the IIS level), which is not really standard for server pages that should execute quickly.
This is what I did to improve things:
Put all the download logic in a single ASP page with session state turned off
That allowed me to put that ASP page in a dedicated application pool that could be recycled every so often (downloads would no longer disturb the rest of the application)
Turned on the LFH (Low-Fragmentation Heap), which is not enabled by default on Windows Server 2003, in order to reduce memory fragmentation
References for LFH:
http://msdn.microsoft.com/en-us/library/aa366750(v=vs.85).aspx
Link (there is a dll there that you can use to turn on LFH, but the article is in French. You'll have to learn our beautiful language now!)
I noticed the same thing on a Classic ASP + Ajax application that I worked on. Using Timer, I measured the page load at 153 milliseconds, but the Firebug waterfall chart randomly says 3.5 seconds. The Timer output is included in the response, and the waterfall chart claims that the time is Firefox waiting for a response from the server. Because the waterfall chart also shows the response, I can compare it to the Timer output, and there's a huge discrepancy every so often.
Can you establish whether this is a problem for all pages or a common subset of pages?
If it's a subset, examine what these pages have in common; for example, do they all use a specific COM DLL that other pages don't?
Does this problem affect multiple clients or just a few?
In other words, is there an issue with a specific browser or OS version?
Is this public or intranet?
Can you reproduce the problem from a client you own?
Is there any chance that some full-text search queries are running on SQL Server?
If so, and if SQL Server has no access to the internet, it may cause a 45-second delay every few hours or so when it tries to verify certificates (though this does not apply to SQL Server 2000).
For a detailed explanation of what I'm referring to, read this.
Are any other apps running on your web server? If so, is your problematic app in the same app pool as any of them? If so, try creating a dedicated app pool for it. Maybe one of the other apps is having a problem that is adversely affecting yours.
One thing to watch out for: if you have server-side debugging turned on in IIS, the web server will run in single-threaded mode.
So if you try to load a page and someone else has hit that URL at the same time, you will be queued up behind them. It will seem like pages take a long time to load, but it's simply because the server is handing out page requests in single file, and sometimes you aren't at the front of the line.
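The queuing effect is easy to reproduce outside IIS. This sketch uses Python's standard-library servers as a stand-in (it does not touch IIS or its debugging setting): the same hypothetical handler, which sleeps 0.2 s per request, is served first by the single-threaded `http.server.HTTPServer` and then by `ThreadingHTTPServer`, with five requests fired at once.

```python
import http.client
import http.server
import threading
import time
from concurrent.futures import ThreadPoolExecutor

WORK_S = 0.2  # simulated per-request processing time


class SlowHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(WORK_S)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def latencies(server_cls, n=5):
    """Fire n concurrent GETs at a fresh server; return sorted latencies (s)."""
    server = server_cls(("127.0.0.1", 0), SlowHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    def one_request(_):
        start = time.perf_counter()
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=30)
        conn.request("GET", "/")
        conn.getresponse().read()
        conn.close()
        return time.perf_counter() - start

    try:
        with ThreadPoolExecutor(max_workers=n) as pool:
            return sorted(pool.map(one_request, range(n)))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("single-threaded:", [round(t, 2) for t in latencies(http.server.HTTPServer)])
    print("threaded:", [round(t, 2) for t in latencies(http.server.ThreadingHTTPServer)])
```

The single-threaded server should show a staircase of latencies climbing towards about 1 s (each request waits for the ones ahead of it), while the threaded server should finish all five in roughly 0.2 s.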
You may have turned this on for debugging and forgot to turn it off for production.