Problems with amphp with more than 1k concurrent TCP requests

I have created an application using the https://amphp.org framework. It runs from a cronjob (every 5 minutes) that sends concurrent requests to a number of temperature sensors and saves all the responses.
Up to ~1k sensors everything works fine.
When I increase that to 1.5k, roughly 30% of the requests fail with the following message:
Connecting to tcp://...:502 failed: timeout exceeded (10000 ms)
Any suggestions would be greatly appreciated!

stream_select is usually limited to 1024 file descriptors, so you'll need one of the event-loop extensions; see https://github.com/amphp/amp#requirements.

Related

Apache process not immediately closed

I have a website that sometimes receives a large number of connections in a short time.
I am working on optimizing the website and the server to handle these connections.
I ran a stress test with 200 clients over 30 seconds while watching the output of "top". After the 30 seconds the test is finished, but the Apache processes stay open for quite a while (maybe 15 seconds).
Maybe this is part of my problem: even after the request is done, Apache still consumes memory.
Is this normal? Maybe Apache keeps the process around in order to handle a further request?

IIS response time high every 10-15 minutes for the same simple request

We have a performance issue with an AngularJS website hosted on IIS. This issue only affects our users connected via VPN (working from home).
The problem: regularly, a page that usually takes one or two seconds to load can take over 10 seconds.
This issue first appeared to be random, but we were able to reproduce it in a test environment and found out that the problem seems to arise on a very regular basis (every 10-15 minutes).
What we did: using a tool (ThousandEyes), we send the same simple GET request every minute from 12 clients to the test server. We can see in the IIS logs that this request is processed in less than 50 ms most of the time. However, every 15 minutes or so, the same request takes more than 5 seconds to process for at least one client. Example below: the calls made every minute by client #1 take more than 5 seconds at 21:12, 21:13, 21:14, then 21:28, 21:29, then 21:45:
The graph below shows the mean response times for the 12 clients (peak every 10-15 minutes):
For both the test and the production environments, this issue only affects users connected via VPN (but not all users connected via VPN are affected at the same time).
Any idea what could cause this behavior?
All suggestions and questions are welcome.
Notes:
Session State: InProcess. I tried Not Enabled and State Server, but we still get the same results.
Maximum Worker Processes: 1. I tried 2; no change.
Test server usage: as far as I can tell, nothing special happens on the server every 15 minutes (no special events).
Test server configuration: 2 Xeon processors @ 2.6 GHz, 8 GB RAM, 20 GB disk space, Windows Server 2016.
Test server load: almost nothing besides the 12 requests per minute from the 12 test clients.
This issue cost us a lot of time. We finally found out that a VPN server was misconfigured.
Rebuilding this server was the solution.

Requests to GAE app fail with Connection Reset

Our GAE Python app exposes an API that is hit by an external client system (Java-based, if that matters). The large majority of requests (tens of thousands per day) work fine; however, a few requests (fewer than 10 per day) fail with the client side reporting a 'Connection reset by peer' error. When that happens, the client system has fired multiple API calls that finish successfully, so we rule out connectivity issues on the client side.
The GAE logs show only app-related failures; other kinds of failures (e.g. connection errors) don't appear in the logs, so we can't really tell why these API calls are failing.
Is there any way to better identify such issues other than the logs?
The GAE module that accepts the API calls has the following scaling properties:
instance_class: B2
basic_scaling:
  max_instances: 5
  idle_timeout: 1m
and at the time of failure, only 2 (out of a maximum of 5) instances were running, so the GAE module is below its scaling limits. The API calls are served in less than 500 ms on average, and we have never seen a log error for exceeding the 60-second request limit. Overall, the module doesn't seem overloaded. Could it be something else?
It seems to me that the best way to fix your issue is to add an exponential backoff algorithm to your client code for when the connection gets reset, so that it "fails gracefully" and retries (see the sketch below).
I would also suggest moving to automatic scaling, where tuning your max pending latency and min pending latency might help with this kind of issue. I don't have specifics on what would fix your issue, but fiddling with these settings could yield some results.
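A minimal sketch of the retry-with-backoff idea on the client side (in Java, since the client system is Java-based). The ApiClient interface and its call method are hypothetical placeholders for however the client actually invokes the API; the attempt count and base delay are arbitrary:

import java.io.IOException;

public class RetryingApiCaller {
    private static final int MAX_ATTEMPTS = 5;
    private static final long BASE_DELAY_MS = 500;

    // Hypothetical client interface; replace with the real HTTP call.
    interface ApiClient {
        String call(String payload) throws IOException;
    }

    // Calls the API, retrying with exponential backoff when the connection is reset.
    public String callWithBackoff(ApiClient client, String payload)
            throws IOException, InterruptedException {
        IOException lastFailure = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                return client.call(payload);
            } catch (IOException e) {          // "Connection reset by peer" surfaces as an IOException
                lastFailure = e;
                long delayMs = BASE_DELAY_MS * (1L << attempt);   // 500, 1000, 2000, 4000 ms ...
                Thread.sleep(delayMs);
            }
        }
        throw lastFailure;                     // give up after MAX_ATTEMPTS
    }
}

Since the failures are rare (fewer than 10 per day), a handful of retries like this should absorb them without noticeably affecting throughput.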

Receiving email on GAE: still 60s to complete processing?

My application is doing some juggling of email attachments. So far it's clocking in around 20s and everything works fine. But if I send larger attachments and it passes 60s, is it going to break?
The App Engine docs don't say whether the mail reception servlet has a timeout of 60 seconds or 10 minutes, so it's hard to say.
In any case, I would recommend you do the following in the servlet that handles /_ah/mail:
Store the mail content in Cloud Storage or blob store
Start a task to process this mail
That way you will take advantage of the retry capabilities of tasks, and you'll have 10 minutes to process your mail.
If you believe your task may take more than 10 minutes, you can either break it up into smaller tasks (chained or parallel, depending on your use case) or use modules to go beyond the 10-minute limit. Note that modules will not stay up forever, and you should not expect to run 4-hour tasks on them, for example.
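A rough Java sketch of that hand-off, assuming the App Engine Task Queue API; storeRawMessage and the /tasks/process-mail URL are hypothetical placeholders for your own storage helper and task handler:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// Mapped to /_ah/mail/*: store the raw message, then hand processing off to a task.
public class MailReceiverServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // 1. Persist the raw MIME message somewhere durable (Cloud Storage or the blobstore).
        //    storeRawMessage is a hypothetical helper returning the stored object's name.
        String objectName = storeRawMessage(req.getInputStream());

        // 2. Enqueue a task; the task handler gets 10 minutes and automatic retries.
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder
                .withUrl("/tasks/process-mail")        // hypothetical task handler URL
                .param("object", objectName));
    }

    private String storeRawMessage(java.io.InputStream raw) throws IOException {
        // Write the stream to Cloud Storage and return the object name (storage code omitted).
        return "incoming-mail/" + System.currentTimeMillis();
    }
}

The inbound mail servlet then only does the cheap work (store and enqueue), so it stays well under whatever its timeout is; the heavy attachment juggling happens in the task handler.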

RabbitMQ message consumers stop consuming messages

Our team is in a spike sprint to choose between ActiveMQ and RabbitMQ. We made two small producer/consumer spikes sending an object message with an array of 16 strings, a timestamp, and 2 integers. The spikes work fine on our dev machines (messages are consumed correctly).
Then came the benchmarks. We first noticed that sometimes, on our machines, when we were sending a lot of messages, the consumer would hang. It was still there, but the messages were accumulating in the queue.
When we moved to the bench platform:
a cluster of 2 RabbitMQ machines (4 cores @ 3.2 GHz, 4 GB RAM), load balanced by a VIP
one to six consumers running on the RabbitMQ machines, saving the messages to a MySQL DB (same type of machine for the DB)
12 producers running on 12 application-server machines (Tomcat), driven by JMeter running on another machine. The load is about 600 to 700 HTTP requests per second on the servlets, which produce the same load of RabbitMQ messages.
We noticed that sometimes consumers hang (well, they are not blocked, but they don't consume messages anymore). We can see this because each consumer saves around 100 msg/sec to the database, so when one stops consuming, the overall rate of messages saved per second in the DB falls by the same ratio (if, say, 3 consumers stop, we drop from around 600 msg/sec to 300 msg/sec).
During that time the producers are fine and still produce at the JMeter rate (around 600 msg/sec). The messages stay in the queue and are taken by the consumers that are still "alive".
We load all the servlets with the producers first, then launch all the consumers one by one, checking that the connections are OK, then run JMeter.
We are sending messages to one direct exchange. All consumers are listening to one persistent queue bound to the exchange.
This point is a major factor in our choice. Have you seen this with RabbitMQ? Do you have an idea of what is going on?
Thank you for your answers.
It's always worth setting the prefetch count when using basic.consume:
channel.basicQos(100);
before the channel.basicConsume line, in order to ensure you never have more than 100 messages queued up in your QueueingConsumer.
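In context, a minimal sketch with the RabbitMQ Java client (an older version, since QueueingConsumer was removed in amqp-client 5.x); the host and queue names are placeholders:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.QueueingConsumer;

public class PrefetchConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbit-host");                    // placeholder host
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();      // use one channel per consumer thread

        channel.basicQos(100);                             // prefetch: at most 100 unacked messages buffered
        QueueingConsumer consumer = new QueueingConsumer(channel);
        channel.basicConsume("bench-queue", false, consumer);   // manual acks; placeholder queue name

        while (true) {
            QueueingConsumer.Delivery delivery = consumer.nextDelivery();
            // ... save delivery.getBody() to MySQL here ...
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        }
    }
}

Without basicQos, the broker pushes as many messages as it can to each consumer, so a slow consumer can build up a huge in-memory backlog while appearing to have stopped consuming.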
I have seen this behavior when using the RabbitMQ STOMP plugin. I haven't found a solution yet.
Are you using the STOMP plugin?
A RabbitMQ channel is not thread-safe, so check whether your consumer's channel is being used from more than one thread.
