Google PubSub and ordering messages - google-cloud-pubsub

I'm trying to understand a little bit more about how PubSub ordering messages works with a very basic toy example.
Basically, by using the Python examples from the googleapis repo I'm able to publish ordered messages to a topic, and then read them through a subscription with ordering enabled.
What confuses me is that, if I publish the following set of messages
[
("message1", "key1"),
("message2", "key1"),
("message3", "key1"),
("message1", "key2"),
("message2", "key2"),
]
When I try to read them through either Pull or StreamingPull, PubSub behaves more like a queue, and I'm only able to retrieve the initial messages
[
("message1", "key1"),
("message1", "key2"),
]
Only after I ACK those messages can I move forward, but then again I only get message3 and message2. Does this mean that for a key X, message M+1 won't be available in the subscription until message M is acknowledged?
Is this queue-like behaviour expected, or am I missing something really obvious?
Thank you!

The only guarantee around delivery is that for the same ordering key, a subsequent message will not be delivered unless the previous message has been delivered. For both pull and streaming pull, the messages may come before an ack or they may only come after an ack, but there are not strong guarantees either way.
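To make that guarantee concrete, here is a minimal sketch in plain Python (standard library only, not the Pub/Sub client). It checks whether a sequence of deliveries respects per-key ordering. The function name respects_key_order and the (data, ordering_key) tuple layout are assumptions chosen for illustration, and redeliveries are not modelled.

```python
from collections import defaultdict


def respects_key_order(published, delivered):
    """Return True if, for every ordering key, `delivered` contains a
    prefix of that key's messages in publish order.

    Both arguments are lists of (data, ordering_key) tuples. Messages
    with different keys may interleave freely.
    """
    expected = defaultdict(list)
    for data, key in published:
        expected[key].append(data)

    position = defaultdict(int)  # index of the next expected message per key
    for data, key in delivered:
        idx = position[key]
        if idx >= len(expected[key]) or expected[key][idx] != data:
            return False
        position[key] = idx + 1
    return True


published = [
    ("message1", "key1"),
    ("message2", "key1"),
    ("message3", "key1"),
    ("message1", "key2"),
    ("message2", "key2"),
]
first_batch = [("message1", "key1"), ("message1", "key2")]
print(respects_key_order(published, first_batch))             # True
print(respects_key_order(published, [("message2", "key1")]))  # False
```

Receiving only the first message of each key, as in the question, is consistent with the guarantee. So is receiving all five messages at once, provided each key's messages arrive in publish order.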

Related

How to store messages received from websockets

We are building an application with a chat system as part of our service. For that we are using websockets, since they are readily available on all platforms (iOS, Android, web).
But we need to store all the messages received from the websockets.
We realized websockets are extremely fast, so if we fire a query for each message we receive through the websockets, there is a chance that some messages will not be stored or will be lost.
Let me explain:
Case 1
In one-to-one chat, when we receive a message, we store it in a variable called $msg and simply pass $msg to the intended user. If we add some more logic, such as firing a query to store the message before sending it to the user, that would take some time, say 1 or 2 seconds. With this logic, some messages received through the sockets will be lost, so we have to deliver the message as soon as we receive it.
Case 2
Another approach would be to fire the query after sending the message to the intended user, but in that time the $msg variable may have changed its value many times, in just a fraction of a second.
Let's see an example.
Let's assume the variable $msg holds 'hello' and we pass $msg to the function that stores the message in the database. Since websockets are extremely fast, there is a chance that the value stored in $msg has changed many times by then, and we have lost the message 'hello' that we wanted to store.
Could we implement a message queue (the message queue data structure) in that case, or should we use services like Apache Kafka or RabbitMQ?
Note: we are already aware of some real-time database products provided by tech giants, but due to their high cost we are not able to use such services.
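As a rough illustration of the in-process message queue idea raised above, here is a minimal Python sketch (the question's code is PHP-style; Python is used here only to show the shape). The handler delivers the message first and then puts it on a queue, and a background thread does the slow database write. The names on_message, start_writer, deliver and store are hypothetical placeholders, not part of any websocket library.

```python
import queue
import threading


def start_writer(store, inbox):
    """Drain `inbox` on a background thread, calling `store(msg)` per item."""
    def run():
        while True:
            msg = inbox.get()
            if msg is None:        # sentinel: stop the writer
                break
            store(msg)             # e.g. an INSERT; being slow is fine here

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def on_message(msg, deliver, inbox):
    """Hypothetical websocket handler: deliver first, then enqueue for storage."""
    deliver(msg)
    # Each message becomes its own queue item, so a later arrival
    # cannot overwrite it the way a shared variable could.
    inbox.put(msg)


inbox = queue.Queue()
sent, saved = [], []
writer = start_writer(saved.append, inbox)
for text in ["hello", "how are you", "bye"]:
    on_message(text, sent.append, inbox)
inbox.put(None)   # ask the writer to finish
writer.join()
print(saved)
```

An in-memory queue like this loses whatever is still queued if the process crashes. A broker such as RabbitMQ or Kafka trades extra infrastructure for durability.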

How to mark a message as "in progress" so other workers don't work on it

I'm attempting to use a pull queue to create a queue of image processing tasks that could take longer than the ack timeout limit of 10 minutes. I'm using the Node.js API and I'm wondering how I could have a worker grab a message off the pull queue, mark it as in progress so no other workers attempt to grab it, do its work and acknowledge the message after the processing is done. This processing could take up to an hour per worker. If an exception occurs, I'd like to remove the "in progress" status and allow other workers to pick up this message and attempt to work on it.
I was hoping there was something in Pub/Sub that would allow me to do this. My alternative is, before processing, to store an entity (inProgressMessage) with the message ID, ack ID, status=pending and timestamp=now() in Datastore, and have the worker acknowledge the message immediately after receiving it (this will allow other workers to attempt other messages). The worker can then work on the lengthy task. If it succeeds, mark the entity status as complete. If it fails in a non-permanent way, requeue the task into Pub/Sub. If it fails in a permanent way that won't allow requeueing, I can have a cron job that checks Datastore for pending tasks older than several hours and either deletes or requeues them.
My alternative feels like I'm re-implementing a lot of what Pub/Sub is supposed to help with.
Let me know if you can think of a better way.
To take longer than the ack deadline to process a message, you'll want to use modifyAckDeadline. You can extend the deadline as many times as you need up to 10 minutes per call. Your workflow would be as follows:
Pull the message.
Start to process the message.
While you are not done with the message, if you are close to the 10 minute ack deadline, call modifyAckDeadline to extend the deadline.
Once done processing the message, ack it.
Please note that calling modifyAckDeadline does not guarantee that the message won't be delivered to another task. In certain circumstances, like server restarts, the message may end up being delivered to another of your subscribers. However, in most normal circumstances, as long as you call modifyAckDeadline before the current ack deadline expires, you can prevent a message's redelivery for as long as necessary.
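The workflow above could be sketched roughly as follows. This is a standard-library Python outline rather than Node.js client code. The callables extend_deadline and ack are hypothetical wrappers that the caller would implement around the client's modifyAckDeadline and acknowledge calls for one message, and extend_every is an assumed interval that should stay comfortably below the ack deadline.

```python
import threading


def process_with_lease(work, extend_deadline, ack, extend_every=300.0):
    """Run `work()` while periodically calling `extend_deadline()`.

    The message is acked only if `work()` returns normally. If it
    raises, no ack is sent, so the message becomes available to other
    workers once the current deadline lapses.
    """
    done = threading.Event()

    def keep_alive():
        # Event.wait returns False on timeout and True once `done` is set.
        while not done.wait(extend_every):
            extend_deadline()

    extender = threading.Thread(target=keep_alive, daemon=True)
    extender.start()
    try:
        result = work()
    finally:
        done.set()
        extender.join()
    ack()
    return result
```

Running the extension on a separate thread keeps the long-running task free of timing logic. Because redelivery can still happen occasionally, the work itself should tolerate being run more than once.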
When creating a subscription, you can configure the acknowledgement deadline to be anything up to 10 minutes (https://cloud.google.com/pubsub/subscriber). Once a message has been pulled from the queue, no other worker (of the same subscription) will be able to take it for processing unless the ack deadline is reached, at which point the message is automatically returned to the queue.
Since you need a longer period, you will have to implement something on your own, or seek another queuing solution. I think the design you suggested is fairly simple to implement, and is not really a re-implementation of what pubsub does.

IMAP - JavaMail - How to know which messages to process?

What I want to achieve:
I am coding a Java program that uses IMAP to connect to some Gmail accounts every 5 minutes and extract information from some messages.
I want to check all the messages (incoming and outgoing) and take only the ones I have not processed. By "processed" I do not mean only "read" or "seen" messages. My application does not care whether or not another user has accessed that account and read a message. My application needs to keep track of which was the last message it processed and, the next time it goes through the messages, start with the first non-processed message.
I do not want to change anything in the messages. I do not want to mark them as seen or read.
What I have implemented:
Establish IMAP connection.
Open and access all messages in "[Gmail]/All Mail" folder.
What I have tried:
I have been reading about UID and message number, but I am not sure if any of them could help me achieve what I want. Maybe UID could, but: how do I retrieve it with JavaMail?
I found Folder.getMessages(int start, int end), but I think it refers to the index of the message in a folder, which I believe can easily change.
Can anyone provide some guidance on the best approach to take here?
Thanks!
IMAP UIDs are relative to the folder containing the message. I don't know how Gmail handles UIDs for messages in the "[Gmail]/All Mail" folder, but if it does the right thing you could use the UIDFolder interface to get the UIDs. And as described, once you've processed a certain UID, all the new messages will have larger UIDs, which can make processing more efficient.
The alternative is to use Message-IDs, which has a different set of problems...
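As an illustration of the UID approach, here is a minimal sketch using Python's imaplib instead of JavaMail. In JavaMail the corresponding calls live on the UIDFolder interface, for example getUID and getMessagesByUID. The helper names and the idea of persisting last_uid between runs are assumptions for illustration, and whether Gmail's "[Gmail]/All Mail" UIDs behave as needed is untested here, as the answer cautions.

```python
import imaplib


def uid_search_criterion(last_uid):
    """IMAP search criterion selecting UIDs greater than `last_uid`."""
    return f"UID {last_uid + 1}:*"


def new_uids(search_data, last_uid):
    """Parse the data part of a UID SEARCH response, keeping UIDs > last_uid.

    The filter matters: in IMAP a range "n:*" always matches the
    highest-UID message, even when that UID is lower than n.
    """
    raw = search_data[0] or b""
    return [uid for uid in (int(x) for x in raw.split()) if uid > last_uid]


def fetch_unprocessed(conn, last_uid, folder='"[Gmail]/All Mail"'):
    """Return unprocessed UIDs using an already logged-in imaplib.IMAP4 object."""
    conn.select(folder, readonly=True)  # read-only, so no flags are changed
    status, data = conn.uid("search", None, uid_search_criterion(last_uid))
    if status != "OK":
        raise RuntimeError("UID SEARCH failed")
    return new_uids(data, last_uid)
```

After each run the application would store the largest UID it processed. UIDs are only comparable while the folder's UIDVALIDITY value stays the same, so that value is worth storing and checking too.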

Camel Multicast Subroutes out of order

I have a scenario where I get Message A as input. Message A must then be split into 3 different types of message and forwarded to other routes. It is important that the messages arrive in a precise order, i.e. A-1 must be sent before A-2, which must be sent before A-3.
To do this I have done the following (outline):
from("activemq:queue:somequeue-local")
.multicast().to("direct:a1", "direct:a2", "direct:a3");
from("direct:a1")
// split incoming message and prepare output document for A-1
.to("activemq:queue:otherqueue");
from("direct:a2")
// split incoming message and prepare output document for A-2
.to("activemq:queue:otherqueue");
from("direct:a3")
// split incoming message and prepare output document for A-3
.to("activemq:queue:otherqueue");
And in another context, responsible for sending out the info to the external system, I have
from("activemq:queue:otherqueue?maxMessagesPerTask=1&concurrentConsumers=1&maxConcurrentConsumers=1")
// do different stuff based on which type we are called with, then end with
.beanRef("somebean", "writeToFileAndCallImportbat");
Now, my problem is, that when I get to the receiver, I get the messages in random order. Sometimes A-1,A-3,A-2, sometimes right, A-1,A-2,A-3.
I have tried adding JMSXGroupID and JMSXGroupSeq to the messages, but without any luck.
I have also tried skipping the MQ part entirely and using direct-vm: to call the shared receiver, but then it looks like I have three simultaneous invocations of the receiver, still in random execution order.
I was under the impression that multicast would run sequentially unless told otherwise?
Is there something fundamentally wrong with the approach taken?
I am using Camel version 2.12.
Or, said more plainly:
I would like a route that creates three different output messages, and executes a batch file on them, in order. How do I go about that?
If you use the Splitter pattern, have you checked whether the streaming property is set to false?
If enabled, then Camel will split in a streaming fashion, which means it will split the input message in chunks. This reduces the memory overhead. For example, if you split big messages, it's recommended to enable streaming. If streaming is enabled, then the sub-message replies will be aggregated out of order, i.e. in the order they come back. If disabled, Camel will process sub-message replies in the same order as they were split.
So, it turned out to not be a problem with multicast after all.
Rather, in each of my sub-routes, I did this:
.split(..stax(SpecialClass)).streaming()
.beanRef("transformationBean","somefunction")
.aggregate(constant("1"), new MyAggregator())
.completionTimeout(5000)
.completionSize(1000)
.to(writeToFileAndRunBat)
Which, I assumed meant "Process all elements in the split, and if you aren't finished in 5 seconds or after 1000 elements, break out".
I changed it to
.split(..stax(SpecialClass), new MyAggregator()).streaming()
.beanRef("transformationBean","somefunction")
.end()
.to(writeToFileAndRunBat)
Come to think of it, it makes perfect sense, as the first version couldn't really know when we were done, while the last (I assume) just iterates over all elements in the split and calls the Aggregator for each.
Also, I had to .end() in the first version. So I guess the whole thing was just acting random.

Request Reply and Scatter Gather Using Apache Camel

I am attempting to construct a route which will do the following:
Consume a message from jms:sender-in. I am using an InOut request-reply pattern. The JMSReplyTo = sender-out
The above message will be routed to multiple recipients like jms:consumer1-in, jms:consumer2-in and jms:consumer3-in. All are using a request-reply pattern. The JMSReplyTo is specified per consumer (in this case, the JMSReplyTo values are, in this order, jms:consumer1-out, jms:consumer2-out, jms:consumer3-out).
I need to aggregate all the replies together and send the result back to jms:sender-out.
I constructed a route which will resemble this:
from("jms:sender-in")
.to("jms:consumer1-in?exchangePattern=InOut&replyTo=queue:consumer1-out&preserveMessageQos=true")
.to("jms:consumer2-in?exchangePattern=InOut&replyTo=queue:consumer2-out&preserveMessageQos=true")
.to("jms:consumer3-in?exchangePattern=InOut&replyTo=queue:consumer3-out&preserveMessageQos=true");
I then send the replies back to some queue to gather and aggregate:
from("jms:consumer1-out?preserveMessageQos=true").to("jms:gather");
from("jms:consumer1-out?preserveMessageQos=true").to("jms:gather");
from("jms:consumer1-out?preserveMessageQos=true").to("jms:gather");
from("jms:gather").aggregate(header("TransactionID"), new GatherResponses()).completionSize(3).to("jms:sender-out");
To emulate the behavior of my consumers, I added the following route:
from("jms:consumer1-in").setBody(body());
from("jms:consumer2-in").setBody(body());
from("jms:consumer3-in").setBody(body());
I am getting a couple of issues:
I am getting a timeout error on the replies. If I comment out the gather part, then there are no issues. Why is there a timeout even though the replies are coming back to the queue and then being forwarded to another queue?
How can I store the original JMSReplyTo value so Camel is able to send the aggregated result back to the sender's reply queue?
I have a feeling that I am struggling with some basic concepts. Any help is appreciated.
Thanks.
A good question!
There are two things you need to consider:
Don't mix the exchange patterns, Request Reply (InOut) vs Event message (InOnly), unless you have a good reason.
If you do a scatter-gather, you need to make the requests multicast; otherwise they will be pipelined, which is not really scatter-gather.
I've made two examples which are similar to your case - one with Request Reply and one with (one way) Event messages.
Feel free to replace the activemq component with jms - it's the same thing in these examples.
Example one, using event messages - InOnly:
from("activemq:amq.in")
.multicast()
.to("activemq:amq.q1")
.to("activemq:amq.q2")
.to("activemq:amq.q3");
from("activemq:amq.q1").setBody(constant("q1")).to("activemq:amq.gather");
from("activemq:amq.q2").setBody(constant("q2")).to("activemq:amq.gather");
from("activemq:amq.q3").setBody(constant("q3")).to("activemq:amq.gather");
from("activemq:amq.gather")
.aggregate(new ConcatAggregationStrategy())
.header("breadcrumbId")
.completionSize(3)
.to("activemq:amq.out");
from("activemq:amq.out")
.log("${body}"); // logs "q1q2q3"
Example two, using Request Reply - note that the scattering route has to gather the responses as they come in. The result is the same as the first example, but with fewer routes and less configuration.
from("activemq:amq.in2")
.multicast(new ConcatAggregationStrategy())
.inOut("activemq:amq.q4")
.inOut("activemq:amq.q5")
.inOut("activemq:amq.q6")
.end()
.log("Received replies: ${body}"); // logs "q4q5q6"
from("activemq:amq.q4").setBody(constant("q4"));
from("activemq:amq.q5").setBody(constant("q5"));
from("activemq:amq.q6").setBody(constant("q6"));
As for your question two - of course, it's possible to pass around JMSReplyTo headers and force exchange patterns along the way - but you will create hard-to-debug code. Keep your exchange patterns simple and clean - it keeps bugs away.
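For readers unfamiliar with the gather step in example one (correlate on a header, complete after three messages), the idea can be sketched outside Camel in a few lines of Python. This is a language-neutral illustration, not Camel code. The class name Gatherer and its methods are invented for the sketch, and combining by string concatenation only loosely mirrors what ConcatAggregationStrategy presumably does.

```python
from collections import defaultdict


class Gatherer:
    """Collect messages per correlation id and emit once `size` have arrived."""

    def __init__(self, size, combine):
        self.size = size
        self.combine = combine            # folds a list of bodies into one result
        self.pending = defaultdict(list)  # correlation id -> bodies seen so far

    def on_message(self, correlation_id, body):
        """Return the combined result when a group completes, else None."""
        group = self.pending[correlation_id]
        group.append(body)
        if len(group) < self.size:
            return None
        del self.pending[correlation_id]
        return self.combine(group)


gather = Gatherer(size=3, combine="".join)
print(gather.on_message("tx-1", "q1"))  # None
print(gather.on_message("tx-2", "q1"))  # None (different correlation id)
print(gather.on_message("tx-1", "q2"))  # None
print(gather.on_message("tx-1", "q3"))  # q1q2q3
```

The combined body follows arrival order here. Replies from independent queues are not guaranteed to arrive in any particular order, so a real aggregation strategy should not depend on it.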
