I have a website that sometimes gets a large number of connections in a short time.
I am working to optimize the website and the server to handle these connections.
I ran a stress test with 200 clients over 30 seconds while watching the output of "top". After 30 seconds the test is finished, but the Apache processes stay open for a long time (maybe 15 seconds).
Maybe this is part of my problem: even after the requests are done, Apache still consumes memory.
Is this normal? Maybe Apache keeps the processes around in order to handle further requests?
We have a performance issue with an AngularJS website hosted on IIS. This issue only affects our users connected via VPN (working from home).
The problem: regularly, a page that usually takes one or two seconds to load can take over 10 seconds.
This issue first appeared to be random, but we were able to reproduce it in a test environment and found out that the problem seems to arise on a very regular basis (every 10-15 minutes).
What we did: using a tool (ThousandEyes), every minute we send the same simple GET request from 12 clients to the Test server. We can see in the IIS logs that this request is processed in less than 50ms most of the time. However, every 15 minutes or so, the same request takes more than 5 seconds to process for at least 1 client. Example below: the calls made every minute by client #1 take more than 5 sec at 21:12, 21:13, 21:14, then 21:28, 21:29, then 21:45:
The graph below shows the mean response times for the 12 clients (peak every 10-15 minutes):
For both the test and the production environments, this issue only affects users connected via VPN (but not all the users connected via VPN are affected at the same time).
Any idea what can cause this behavior?
All suggestions and questions are welcome.
Notes:
Session State. InProcess. I tried Not Enabled and State Server but we still have the same results.
Maximum Worker Process. 1. I tried 2, no change.
Test server usage. As far as I can tell, nothing special happens every 15 minutes on the server (no special events).
Test server configuration: 2 Xeon processors @ 2.6 GHz, 8 GB RAM, 20 GB disk space, Windows 2016.
Test server load: almost nothing beside these 12 requests every minute from the 12 test clients.
This issue cost us a lot of time. We finally found out that a VPN server was misconfigured.
Rebuilding this server was the solution.
My application is doing some juggling of email attachments. So far it's clocking in around 20s and everything works fine. But if I send larger attachments and it passes 60s, is it going to break?
The App Engine doc does not say whether the mail reception servlet has a timeout of 60s or 10 minutes, so it's hard to say.
In any case, I would recommend you do the following in the servlet that handles /_ah/mail:
Store the mail content in Cloud Storage or blob store
Start a task to process this mail
That way you will take advantage of the retry capabilities of tasks, and you'll have 10 minutes to process your mail.
If you believe your task may take more than 10 minutes, you can either break it up into smaller tasks (chained or parallel depending on your use case) or use modules to go beyond the 10-minute limit. Note that modules will not stay up forever and you should not expect to perform 4-hour tasks on modules, for example.
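The store-then-enqueue pattern recommended above can be sketched roughly as follows. This is only an illustration of the shape of the handler, not App Engine code: `MailStore` and `TaskQueue` are hypothetical interfaces standing in for Cloud Storage / the blob store and for the task queue, and `MailHandoff` is an invented name.

```java
final class MailHandoff {
    /** Hypothetical stand-in for Cloud Storage or the blob store. */
    interface MailStore {
        String save(byte[] rawMail); // returns a key for later retrieval
    }

    /** Hypothetical stand-in for the task queue. */
    interface TaskQueue {
        void enqueue(String mailKey);
    }

    private final MailStore store;
    private final TaskQueue queue;

    MailHandoff(MailStore store, TaskQueue queue) {
        this.store = store;
        this.queue = queue;
    }

    /** What the /_ah/mail handler would do: persist, enqueue, return quickly. */
    String onMailReceived(byte[] rawMail) {
        String key = store.save(rawMail); // 1. store the mail content
        queue.enqueue(key);               // 2. start a task to process it later
        return key;
    }
}
```

The point of the design is that the mail servlet itself does almost no work, so its own deadline matters much less; the heavy attachment processing happens in the task, which can be retried.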
I have a low-load application which experienced latency spikes (requests taking up to 10s to return) due to loading requests, as seen in the logs:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time.
Here I assume that "new process" means "new instance".
In order to avoid this, I fixed the number of idle instances to exactly one (max=1 and min=1), so there is always one instance running ("resident instance") and GAE shouldn't start new ones. Billing is enabled.
However, I still experience loading requests. Why? Can anything be done about this?
Idle instances are "reserve" instances - they are meant to handle spikes when traffic increases, not the "normal" traffic. Idle instances are used only during the spin-up of the dynamic instances.
So, when you have one idle instance and no dynamic instances running and you get a request, then the idle instance should handle the request, but a new dynamic instance will still be spun up.
I experienced the same problem with my low-traffic app, and here is the practical setup that almost always keeps my users from hitting a cold start:
- 1 resident F4 instance
- pending latency to 15 sec
- I worked to make my warmup requests as fast as possible (under 10 sec); they are still quite long because I use the Play framework (Java)
- and when I really don't want to have any problems, I create fake traffic by pinging my app.
With this config, the resident instance usually serves around 50 requests; during that time, a dynamic instance receives a warmup request and then starts serving.
I am working on a GAE project, and one requirement we have is that we want to be able to determine in a timely manner whether a user has left the application. Currently we have this working, but it is unreliable, so I am researching alternatives.
The way we do this now is that we have a JS function set up to run on an interval and send a heartbeat signal to the GAE app using an AJAX call. This works relatively well, but it generates a lot of traffic and CPU usage. If we don't hear a heartbeat from a client for several minutes, we determine they have left the application. We also have the unload function wired up to send a part message, again through an AJAX call. This works less than well; most of the time it does not work at all.
We are also making use of the Channels API. One thing I have noticed is that when our app uses an open channel, the client also seems to send a heartbeat signal in the form of a call to http://talkgadget.google.com/talkgadget/dch/bind. I believe this is happening from the iFrame and/or JS that gets loaded when opening a channel in the client.
My question is, can my app on the server side somehow hook into these calls to http://talkgadget.google.com/talkgadget/dch/bind and use them as the heartbeat signal? Is there a better way to detect if a client is still connected even if they aren't actively doing anything in the client?
Google have added this feature:
See https://developers.google.com/appengine/docs/java/channel/overview
Tracking Client Connections and Disconnections
Applications may register to be notified when a client connects to or
disconnects from a channel.
You can enable this inbound service in appengine-web.xml:
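For reference, the Channel API documentation names this inbound service `channel_presence`. A minimal fragment would look something like the following; where exactly it sits among your other settings in appengine-web.xml is assumed here.

```xml
<!-- Inside the <appengine-web-app> element of appengine-web.xml -->
<inbound-services>
  <service>channel_presence</service>
</inbound-services>
```

With the service enabled, App Engine sends POST requests to /_ah/channel/connected/ and /_ah/channel/disconnected/ when a client connects or disconnects, so your app needs handlers mapped to those paths.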
Currently the channel API bills you up-front for all the CPU time the channel will consume for two hours, so it's probably cheaper to send messages to a dead channel than to send a bunch of heartbeat messages to the server.
https://groups.google.com/d/msg/google-appengine/sfPTgfbLR0M/yctHe4uU824J
What I would try is to attach a "please acknowledge" parameter to every Nth message (staggered to avoid every client acknowledging a single message). If 2 of these are ignored, mute the channel until you hear from that client.
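A rough sketch of that idea, under stated assumptions: the value of N (10 here) and the use of the client ID's hash as the stagger offset are my own choices, and `AckPolicy` is an invented name; only the "mute after 2 ignored requests" rule comes from the suggestion above.

```java
final class AckPolicy {
    static final int EVERY_N = 10;    // assumption: the post gives no value for N
    static final int MAX_IGNORED = 2; // from the post: mute after 2 ignored requests

    /** True when this message should carry a "please acknowledge" flag for this client. */
    static boolean needsAck(String clientId, long messageSeq) {
        // Stagger: derive a per-client offset so the acknowledgements are
        // spread over different messages instead of all landing on one.
        int offset = Math.floorMod(clientId.hashCode(), EVERY_N);
        return Math.floorMod(messageSeq, (long) EVERY_N) == offset;
    }

    /** True when the channel should be muted until the client is heard from again. */
    static boolean shouldMute(int ignoredAckRequests) {
        return ignoredAckRequests >= MAX_IGNORED;
    }
}
```

The server would keep a per-client counter of unanswered acknowledgement requests, reset it whenever the client responds, and stop sending once `shouldMute` returns true.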
You can't currently use the Channel API to determine if a user is still online or not. Your best option for now depends on how important it is to know as soon as a user goes offline.
If you simply want to know they're offline so you can stop sending messages, or it's otherwise not vital you know immediately, you can simply piggyback pings on regular interactions. Whenever you send the client an update and you haven't heard anything from them in a while, tag the message with a 'ping request', and have the client send an HTTP ping whenever it gets such a tagged message. This way, you'll know they're gone shortly after you send them a message. You're also not imposing a lot of extra overhead, as they only need to send explicit pings if you're not hearing anything else from them.
If you expect long periods of inactivity and it's important to know promptly when they go offline, you'll have to have them send pings on a schedule, as you suggested. You can still use the trick of piggybacking pings on other requests to minimize them, and you should set the interval between pings as long as you can manage, to reduce load.
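The piggybacking idea could look roughly like this on the server side. It is a minimal sketch, not tied to any App Engine API: `PresenceTracker`, its method names, and the quiet threshold are all hypothetical, and timestamps are passed in explicitly to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

final class PresenceTracker {
    private final long quietMillis; // assumption: how long "a while" is
    private final Map<String, Long> lastHeard = new HashMap<>();

    PresenceTracker(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    /** Call on every request from the client, including explicit pings. */
    void heardFrom(String clientId, long nowMillis) {
        lastHeard.put(clientId, nowMillis);
    }

    /** Call before sending an update: should the message be tagged with a ping request? */
    boolean shouldRequestPing(String clientId, long nowMillis) {
        Long last = lastHeard.get(clientId);
        return last == null || nowMillis - last >= quietMillis;
    }
}
```

A client that is actively making requests never gets a tagged message, so it sends no extra pings; only quiet clients are asked to check in.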
I do not have a good solution to your core problem of "hooking" the client to server. But I do have an interesting thought on your current problem of "traffic and CPU usage" for periodic pings.
I assume you have a predefined heart-beat interval time, say 1 min. So, if there are 120 clients, your server would process heart beats at an average rate of 2 per second. Not good if half of them are "idle clients".
Let's assume a client has already been idle for 15 minutes. Does this client's browser still need to send heart-beats at the constant pre-defined interval of 1 min? Why not make it variable?
My proposal is simple: Vary the heart-beats depending on activity levels of client.
When the client is "active", heart-beats work at 1 per minute. When the client is "inactive" for more than 5 minutes, heart-beat rate slows down to 50% (one after every 2 minutes). Another 10 minutes, and heart-beat rate goes down another 50% (1 after every 4 minutes)... At some threshold point, consider the client as "unhooked".
In this method, "idle clients" would not be troubling the server with frequent heartbeats, allowing your app server to focus on "active clients".
It's a lot of JavaScript to write, but probably worth it if you are having trouble with traffic and CPU usage :-)
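The schedule itself is small. Here is a sketch of the interval calculation, written in Java only to show the logic (in the browser the same function would be a few lines of JavaScript). The 1, 2 and 4 minute steps follow the proposal above; the 30-minute cut-off for treating the client as "unhooked" is an assumed value, since the proposal leaves that threshold open.

```java
import java.time.Duration;
import java.util.Optional;

final class HeartbeatBackoff {
    // Assumption: the proposal only says "at some threshold point".
    static final Duration UNHOOK_AFTER = Duration.ofMinutes(30);

    /**
     * Returns the heartbeat interval for a client that has been idle for the
     * given time, or empty when the client should be considered unhooked.
     */
    static Optional<Duration> nextInterval(Duration idle) {
        if (idle.compareTo(UNHOOK_AFTER) >= 0) {
            return Optional.empty();
        }
        if (idle.compareTo(Duration.ofMinutes(15)) > 0) {
            return Optional.of(Duration.ofMinutes(4)); // idle for 5 + 10 more minutes
        }
        if (idle.compareTo(Duration.ofMinutes(5)) > 0) {
            return Optional.of(Duration.ofMinutes(2)); // idle for more than 5 minutes
        }
        return Optional.of(Duration.ofMinutes(1));     // active client
    }
}
```

Any user activity would reset the idle time to zero and bring the client back to the 1-minute interval.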
Our team is in a spike sprint to choose between ActiveMQ and RabbitMQ. We made 2 little producer/consumer spikes sending an object message with an array of 16 strings, a timestamp, and 2 integers. The spikes are OK on our dev machines (messages are consumed properly).
Then came the benchmarks. We first noticed that sometimes, on our machines, when we were sending a lot of messages, the consumer would hang. It was still there, but the messages were accumulating in the queue.
When we moved to the benchmark platform:
cluster of 2 rabbitmq machines 4 cores/3.2 GHz, 4 GB RAM, load balanced by a VIP
one to 6 consumers running on the rabbitmq machines, saving the messages in a mysql DB (same type of machine for the DB)
12 producers running on 12 AS machines (tomcat), attacked with jmeter running on another machine. The load is about 600 to 700 HTTP requests per second on the servlets, which produce the same load of RabbitMQ messages.
We noticed that consumers sometimes hang (well, they are not blocked, but they don't consume messages anymore). We can see this because each consumer saves around 100 msg/sec in the database, so when one stops consuming, the overall rate of messages saved per second in the DB falls by the same proportion (if, say, 3 consumers stop, we fall from around 600 msg/sec to 300 msg/sec).
During that time, the producers are ok, and still produce at the jmeter rate (around 600 msg/sec). The messages are in the queues and taken by the consumers still "alive".
We load all the servlets with the producers first, then launch all the consumers one by one, checking that the connections are OK, then run jmeter.
We are sending messages to one direct exchange. All consumers are listening to one persistent queue bound to the exchange.
This point is a major factor in our choice. Have you seen this with rabbitmq, and do you have an idea of what is going on?
Thank you for your answers.
It's always worth setting the prefetch count when using basic.consume:
channel.basicQos(100);
before the channel.basicConsume line, in order to ensure you never have more than 100 messages queued up in your QueueingConsumer.
I have seen this behavior when using the RabbitMQ STOMP plugin. I haven't found a solution yet.
Are you using the STOMP plugin?
A channel in RabbitMQ is not thread-safe, so check whether your consumer channel is being used from more than one thread.