RabbitMQ message consumers stop consuming messages - benchmarking

Our team is in a spike sprint to choose between ActiveMQ and RabbitMQ. We built two small producer/consumer spikes sending an object message with an array of 16 strings, a timestamp, and 2 integers. The spikes work fine on our dev machines (messages are consumed correctly).
Then came the benchmarks. We first noticed that sometimes, on our machines, when we were sending a lot of messages, the consumer would hang. It was still there, but the messages were accumulating in the queue.
When we moved to the bench platform:
a cluster of 2 RabbitMQ machines (4 cores / 3.2 GHz, 4 GB RAM), load balanced by a VIP
one to 6 consumers running on the RabbitMQ machines, saving the messages to a MySQL DB (same type of machine for the DB)
12 producers running on 12 AS (Tomcat) machines, driven by JMeter running on another machine. The load is about 600 to 700 HTTP requests per second against the servlets, which produce the same load of RabbitMQ messages.
We noticed that sometimes consumers hang (well, they are not blocked, but they don't consume messages anymore). We can see this because each consumer saves around 100 msg/sec to the database, so when one stops consuming, the overall number of messages saved per second in the DB drops by the same ratio (if, say, 3 consumers stop, we fall from around 600 msg/sec to 300 msg/sec).
During that time, the producers are fine and still produce at the JMeter rate (around 600 msg/sec). The messages stay in the queues and are taken by the consumers that are still "alive".
We load all the servlets with the producers first, then launch the consumers one by one, checking that the connections are OK, then run JMeter.
We are sending messages to one direct exchange. All consumers listen on one persistent queue bound to the exchange.
This point is decisive for our choice. Have you seen this with RabbitMQ? Do you have an idea of what is going on?
Thank you for your answers.

It's always worth setting the prefetch count when using basic.consume:
channel.basicQos(100);
before the channel.basicConsume line, in order to ensure you never have more than 100 messages queued up in your QueueingConsumer.
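A minimal sketch of where that call sits, assuming the older Java client (the one that still ships QueueingConsumer); the host and queue names are illustrative:

    import com.rabbitmq.client.*;

    public class BenchConsumer {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rabbit-host"); // illustrative host
            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();

            channel.basicQos(100); // at most 100 unacked messages in flight

            QueueingConsumer consumer = new QueueingConsumer(channel);
            // autoAck = false, otherwise the prefetch limit has no effect
            channel.basicConsume("bench-queue", false, consumer);

            while (true) {
                QueueingConsumer.Delivery delivery = consumer.nextDelivery();
                // ... save the message to MySQL ...
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            }
        }
    }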

I have seen this behavior when using the RabbitMQ STOMP plugin. I haven't found a solution yet.
Are you using the STOMP plugin?

A RabbitMQ channel is not thread-safe, so check whether any of your consumer threads are sharing a channel.
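If you do consume from several threads, the usual pattern is to share the connection but give each thread its own channel; a rough sketch (names are illustrative):

    import com.rabbitmq.client.*;

    public class PerThreadConsumers {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rabbit-host"); // illustrative host
            // Connections are thread-safe and can be shared; channels cannot.
            Connection connection = factory.newConnection();

            for (int i = 0; i < 4; i++) {
                new Thread(() -> {
                    try {
                        // Each worker thread creates and owns its channel.
                        Channel channel = connection.createChannel();
                        channel.basicQos(100);
                        QueueingConsumer consumer = new QueueingConsumer(channel);
                        channel.basicConsume("bench-queue", false, consumer);
                        while (true) {
                            QueueingConsumer.Delivery d = consumer.nextDelivery();
                            // ... process ...
                            channel.basicAck(d.getEnvelope().getDeliveryTag(), false);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }).start();
            }
        }
    }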

Related

Re-distributing messages from busy subscribers

We have the following set up in our project: Two applications are communicating via a GCP Pub/Sub message queue. The first application produces messages that trigger executions (jobs) in the second (i.e. the first is the controller, and the second is the worker). However, the execution time of these jobs can vary drastically. For example, one could take up to 6 hours, and another could finish in less than a minute. Currently, the worker picks up the messages, starts a job for each one, and acknowledges the messages after their jobs are done (which could be after several hours).
Now getting to the problem: The worker application runs on multiple instances, but sometimes we see very uneven message distribution across the different instances. Consider the following graph, for example:
It shows the number of messages processed by each worker instance at any given time. You can see that some are hitting the maximum of 15 (configured via the spring.cloud.gcp.pubsub.subscriber.executor-threads property) while others are idling at 1 or 2. At this point, we also start seeing messages without any started jobs (awaiting execution). We assume that these were pulled by the GCP Pub/Sub client in the busy instances but cannot yet be processed due to a lack of executor threads. The threads are busy because they're processing heavier and more time-consuming jobs.
Finally, the question: Is there any way to do backpressure (i.e. tell GCP Pub/Sub that an instance is busy and have it re-distribute the messages to a different one)? I looked into this article, but as far as I understood, the setMaxOutstandingElementCount method wouldn't help us because it would control how many messages the instance stores in its memory. They would, however, still be "assigned" to this instance/subscriber and would probably not get re-distributed to a different one. Is that correct, or did I misunderstand?
We want to utilize the worker instances optimally and have messages processed as quickly as possible. In theory, we could try to split the more expensive jobs into several different messages, thus minimizing the differences in processing time, but is this the only option?
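For reference, this is roughly where the knob in question sits in the plain Java client (project, subscription, and limit are illustrative). As the question suspects, it only caps how many pulled-but-unacked messages this one subscriber holds locally; it does not hand already-leased messages back for redelivery to another instance:

    import com.google.api.gax.batching.FlowControlSettings;
    import com.google.cloud.pubsub.v1.AckReplyConsumer;
    import com.google.cloud.pubsub.v1.MessageReceiver;
    import com.google.cloud.pubsub.v1.Subscriber;
    import com.google.pubsub.v1.ProjectSubscriptionName;
    import com.google.pubsub.v1.PubsubMessage;

    public class JobWorker {
        public static void main(String[] args) {
            ProjectSubscriptionName subscription =
                    ProjectSubscriptionName.of("my-project", "jobs-sub"); // illustrative

            MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
                // ... run the job; ack only once it has finished ...
                consumer.ack();
            };

            // Cap the number of outstanding (leased, unacked) messages held locally.
            FlowControlSettings flow = FlowControlSettings.newBuilder()
                    .setMaxOutstandingElementCount(15L)
                    .build();

            Subscriber subscriber = Subscriber.newBuilder(subscription, receiver)
                    .setFlowControlSettings(flow)
                    .build();
            subscriber.startAsync().awaitRunning();
        }
    }

Setting the cap close to the number of executor threads should at least stop an instance from pulling messages it cannot start; whether the excess then gets redelivered elsewhere depends on the ack deadline, so this is a mitigation rather than true backpressure.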

maxConcurrency in Mule 4

I am new to Mule and trying to understand how maxConcurrency works. I understand it determines how many simultaneous requests can be received. But I have a question: if the application gets a huge number of incoming requests and I set maxConcurrency, will the subsequent requests be queued, will they error out, or will data be lost?
For example, say I am getting around 10,000 requests to my application and I set maxConcurrency to 1,000. Will the other 9,000 be queued, or is there a chance of data loss?
Also, what would be the ideal maxConcurrency to set for a flow?
Thanks in advance.
I'm not sure which queue you are referring to. In Mule 4, requests are not queued. If a flow sets maxConcurrency, any concurrent request exceeding that number will fail with an error. Whether that causes data loss will probably depend on what your application is doing. maxConcurrency just limits the concurrency, without you needing to think in terms of the implementation or the number of threads.
If your application is reading from a queue (i.e. a JMS or VM queue), it is expected that it will not read more than maxConcurrency simultaneous messages, so the other messages should remain in the queue and not trigger an error.
Note that maxConcurrency doesn't guarantee that an application can actually process that many concurrent requests. If you set it to 1000, you have to ensure that your application has enough resources to handle that load. The only way to size it is to perform load testing in conditions similar to the real expected usage and see at what number you start getting errors due to insufficient resources.
A special case is when you want a specific small number, for example a flow reading from a queue with maxConcurrency = 1 to process only one message at a time, as in the sketch below.
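A minimal sketch of that special case, assuming the VM connector (flow, queue, and config names are illustrative):

    <flow name="processOneAtATime" maxConcurrency="1">
        <vm:listener queueName="ordersQueue" config-ref="VM_Config"/>
        <!-- exactly one message is processed at a time; the rest wait in the queue -->
        <logger level="INFO" message="#[payload]"/>
    </flow>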

Service bus queue in e-commerce application

We have an e-commerce application running on MS SQL.
Every now and then we have a flash sale, and once we start inserting all the orders into the database, our site's performance drops. We have it at the point where we can insert about 1,500 orders in a minute, but the site hangs for a few minutes after that. The site only hangs once the inserts start happening.
I have been looking into using Azure Service Bus queues mixed with SignalR to manage the order process, as this was suggested to me a while back. The way I see it happening is (broad overview):
Client calls a procedure on the server which inserts an order into a queue.
Client gets notified that they are in a queue.
We have a worker process which processes the order from the queue and inserts it into the database.
Server then notifies the client that the order is processed and moves them onto the payment page.
I am new to SignalR and queues in general so my questions are:
Will queues actually have a performance benefit? If so, why?
Are queues even the correct thing to use in this instance?
The overview you mention makes sense. It seems like you should be able to do it without SignalR since ServiceBus will let you know once it successfully inserted the message into the queue.
It is not that queues give you better performance for a single request. Messages placed onto the queue will be stored until you are ready to process them. By doing this you will not suffer "peak" issues, and you will be able to receive from the queue at a speed you know your system can sustain (maybe 500 orders/minute, or whatever number works for you).
So they will give you a much more stable latency per request without bringing down your system.
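A minimal sketch of the worker side with the Azure Service Bus Java SDK, draining the queue at a bounded rate (connection string, queue name, and concurrency are illustrative):

    import com.azure.messaging.servicebus.*;

    public class OrderWorker {
        public static void main(String[] args) throws InterruptedException {
            ServiceBusProcessorClient processor = new ServiceBusClientBuilder()
                    .connectionString("<service-bus-connection-string>") // placeholder
                    .processor()
                    .queueName("orders")
                    .maxConcurrentCalls(8) // bound how many orders hit the DB at once
                    .disableAutoComplete()
                    .processMessage(context -> {
                        ServiceBusReceivedMessage message = context.getMessage();
                        // ... insert the order into the database ...
                        context.complete(); // settle only after the insert succeeds
                    })
                    .processError(context -> context.getException().printStackTrace())
                    .buildProcessorClient();

            processor.start();
            Thread.currentThread().join(); // keep the worker process alive
        }
    }

The maxConcurrentCalls value is the knob that keeps the database at a rate it can sustain, independent of how fast orders arrive during the flash sale.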

Is lease_tasks() in gae pull queues a blocking method?

I have a pull-queue in Google App Engine and a resident backend which processes the tasks in the pull queue. The backend has several worker threads for consuming and processing tasks, as suggested in a post on the Google Cloud Platform blog:
https://cloud.google.com/resources/articles/ios-push-notifications
Workers poll the pull-queue with lease_tasks(). My question is: is lease_tasks() supposed to be a blocking method, i.e. does it block the current thread's execution until either the queue has some tasks or a deadline is exceeded?
According to the GAE documentation
https://developers.google.com/appengine/docs/python/taskqueue/overview-pull#Python_Leasing_tasks
lease_tasks() accepts a 'deadline' parameter and may raise the DeadlineExceededError, so isn't it rational to assume that lease_tasks() blocks for up to 'deadline' seconds?
The problem is that while I'm developing the application on the development server, lease_tasks() returns immediately with an empty list of tasks. The result is that the worker thread's while-loop calls lease_tasks() constantly, consuming 100% of the CPU. If I put in an explicit sleep(), say for 5 seconds, the worker goes to sleep and won't wake up if a task is placed in the queue in the meantime. That makes the worker less responsive (in the worst case it might take up to 5 seconds to handle the next task), and I would consume more CPU (wake-up/sleep cycles) than if the thread simply blocked on a 'queue' (I know the pull-queue is actually an RPC, yet it abstractly remains a producer queue).
Perhaps this happens only with the dev app server, while on GAE lease_tasks() blocks. However, the example code from the blog post mentioned above also suspends thread execution with sleep(). The example code is available on GitHub, and a link is in the blog post (unfortunately I cannot post it here).
lease_tasks does not wait for new tasks to be added. Most task queue calls take up to 5 seconds. The calls to lease tasks and to fetch queue statistics take longer - up to 10 seconds by default.
Most users won't need to set the deadline; it is an optional parameter. If you have very many workers contending on the same queue and often experience transient errors after 10 seconds, consider increasing the lease deadline to 20 seconds (or shard the load over more queues and/or tags). Alternatively, if you only have one worker and it always needs time to perform other work in addition to leasing tasks, a small deadline like 5 seconds could be used, but it is much better to use the async API instead.
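Since lease_tasks() does not wait for new tasks, the usual workaround for the busy-wait on the dev server is a poll loop with a capped backoff. The question uses the Python API; here is the same loop sketched with the Java task queue API, for consistency with the other snippets on this page (queue name and timings are illustrative):

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskHandle;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    public class PullWorker implements Runnable {
        public void run() {
            Queue queue = QueueFactory.getQueue("work-queue"); // illustrative name
            long backoffMs = 50; // grows while the queue stays empty
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    List<TaskHandle> tasks = queue.leaseTasks(60, TimeUnit.SECONDS, 10);
                    if (tasks.isEmpty()) {
                        Thread.sleep(backoffMs);
                        backoffMs = Math.min(backoffMs * 2, 5000); // cap at 5 s
                        continue;
                    }
                    backoffMs = 50; // reset as soon as work shows up
                    for (TaskHandle task : tasks) {
                        // ... process task.getPayload() ...
                        queue.deleteTask(task); // delete only after successful processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

The cap keeps the worst-case pickup latency bounded (5 seconds here) while letting the worker sit nearly idle when the queue is empty.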

Google App Engine Channels API and sending heartbeat signals from client

Working on a GAE project, and one requirement we have is to determine, in a timely manner, whether a user has left the application. We currently have this working, but it is unreliable, so I am researching alternatives.
The way we do it now: a JS function runs on an interval and sends a heartbeat signal to the GAE app via an AJAX call. This works relatively well but generates a lot of traffic and CPU usage. If we don't hear a heartbeat from a client for several minutes, we determine that they have left the application. We also have the unload handler wired up to send a "part" message, again through an AJAX call. This works poorly, and most of the time not at all.
We are also making use of the Channels API. One thing I have noticed is that when our app has an open channel, the client also seems to send a heartbeat signal in the form of a call to http://talkgadget.google.com/talkgadget/dch/bind. I believe this comes from the iframe and/or JS that gets loaded when opening a channel in the client.
My question is: can my app on the server side somehow hook into these calls to http://talkgadget.google.com/talkgadget/dch/bind and use them as the heartbeat signal? Is there a better way to detect whether a client is still connected, even if they aren't actively doing anything in the client?
Google have added this feature:
See https://developers.google.com/appengine/docs/java/channel/overview
Tracking Client Connections and Disconnections
Applications may register to be notified when a client connects to or disconnects from a channel.
You can enable this inbound service in appengine-web.xml:
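(The snippet the docs show at that point is, if I remember correctly, the channel_presence inbound service:)

    <inbound-services>
        <service>channel_presence</service>
    </inbound-services>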
Currently the channel API bills you up-front for all the CPU time the channel will consume for two hours, so it's probably cheaper to send messages to a dead channel than to send a bunch of heartbeat messages to the server.
https://groups.google.com/d/msg/google-appengine/sfPTgfbLR0M/yctHe4uU824J
What I would try is attaching a "please acknowledge" parameter to every Nth message (staggered to avoid every client acknowledging the same message). If two of these are ignored, mute the channel until you hear from that client.
You can't currently use the Channel API to determine if a user is still online or not. Your best option for now depends on how important it is to know as soon as a user goes offline.
If you simply want to know they're offline so you can stop sending messages, or it's otherwise not vital you know immediately, you can simply piggyback pings on regular interactions. Whenever you send the client an update and you haven't heard anything from them in a while, tag the message with a 'ping request', and have the client send an HTTP ping whenever it gets such a tagged message. This way, you'll know they're gone shortly after you send them a message. You're also not imposing a lot of extra overhead, as they only need to send explicit pings if you're not hearing anything else from them.
If you expect long periods of inactivity and it's important to know promptly when they go offline, you'll have to have them send pings on a schedule, as you suggested. You can still use the trick of piggybacking pings on other requests to minimize them, and you should set the interval between pings as long as you can manage, to reduce load.
I do not have a good solution to your core problem of "hooking" the client to the server. But I do have an interesting thought on your current problem of traffic and CPU usage from periodic pings.
I assume you have a predefined heartbeat interval, say 1 minute. So if there are 120 clients, your server processes heartbeats at an average rate of 2 per second. Not good if half of them are idle clients.
Let's assume a client has already been idle for 15 minutes. Does this client's browser still need to send heartbeats at the constant predefined interval of 1 minute? Why not make it variable?
My proposal is simple: vary the heartbeat rate depending on the client's activity level.
When the client is "active", heartbeats are sent once per minute. When the client has been "inactive" for more than 5 minutes, the heartbeat rate slows down by 50% (one every 2 minutes). After another 10 minutes, the rate drops by another 50% (one every 4 minutes)... At some threshold point, consider the client "unhooked".
With this method, idle clients do not trouble the server with frequent heartbeats, allowing your app server to focus on the active clients.
It's a lot of JavaScript to do, but probably worth it if you are having trouble with traffic and CPU usage :-)
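The decay rule itself is tiny. A sketch of the interval calculation, written in Java for consistency with the other snippets on this page (in practice this logic would live in the client-side JavaScript timer; the thresholds are the illustrative ones from above):

    public class HeartbeatSchedule {
        static final long BASE_INTERVAL_MS = 60_000;     // 1 min while active
        static final long UNHOOK_AFTER_MS = 45 * 60_000; // illustrative give-up threshold

        /** Next heartbeat interval for a given idle time, or -1 once the client counts as unhooked. */
        static long nextIntervalMs(long idleMs) {
            if (idleMs >= UNHOOK_AFTER_MS) return -1;    // stop: consider the client gone
            long interval = BASE_INTERVAL_MS;
            if (idleMs > 5 * 60_000) interval *= 2;      // idle > 5 min  -> every 2 min
            if (idleMs > 15 * 60_000) interval *= 2;     // idle > 15 min -> every 4 min
            return interval;
        }

        public static void main(String[] args) {
            // After 20 minutes of inactivity, heartbeat every 4 minutes.
            System.out.println(nextIntervalMs(20 * 60_000)); // prints 240000
        }
    }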
