Google Cloud PubSub sends the message to more than one consumer (in the same subscription) - google-cloud-pubsub

I have a Java Spring Boot 2 application (app1) that sends messages to a Google Cloud Pub/Sub topic (it is the publisher).
Another Java Spring Boot 2 application (app2) is subscribed to a subscription to receive those messages. But in this case I have more than one instance (k8s auto-scaling is enabled), so there is more than one pod of this app consuming messages from Pub/Sub.
Some messages are consumed by one instance of app2, but many others are sent to more than one app2 instance, so processing is duplicated for those messages.
Here is the consumer code (app2):
private final static int ACK_DEAD_LINE_IN_SECONDS = 30;
private static final long POLLING_PERIOD_MS = 250L;
private static final int WINDOW_MAX_SIZE = 1000;
private static final Duration WINDOW_MAX_TIME = Duration.ofSeconds(1L);

@Autowired
private PubSubAdmin pubSubAdmin;

@Bean
public ApplicationRunner runner(PubSubReactiveFactory reactiveFactory) {
    return args -> {
        createSubscription("subscription-id", "topic-id", ACK_DEAD_LINE_IN_SECONDS);
        reactiveFactory.poll("subscription-id", POLLING_PERIOD_MS) // Poll Pub/Sub periodically
                .map(msg -> Pair.of(msg, getMessageValue(msg))) // Pair each message with its extracted payload
                .bufferTimeout(WINDOW_MAX_SIZE, WINDOW_MAX_TIME) // Buffer messages for bulk processing
                .flatMap(this::processBuffer) // Process the buffer
                .doOnError(e -> log.error("Error processing event window", e))
                .retry()
                .subscribe();
    };
}
private void createSubscription(String subscriptionName, String topicName, int ackDeadline) {
    pubSubAdmin.createTopic(topicName);
    try {
        pubSubAdmin.createSubscription(subscriptionName, topicName, ackDeadline);
    } catch (AlreadyExistsException e) {
        log.info("Pubsub subscription '{}' already configured for topic '{}': {}", subscriptionName, topicName, e.getMessage());
    }
}
private Flux<Void> processBuffer(List<Pair<AcknowledgeablePubsubMessage, PreparedRecordEvent>> msgsWindow) {
    return Flux.fromStream(
            msgsWindow.stream()
                    .collect(Collectors.groupingBy(msg -> msg.getRight().getData())) // Group messages that share the same data
                    .values()
                    .stream()
    )
            .flatMap(this::processDataBuffer);
}
private Mono<Void> processDataBuffer(List<Pair<AcknowledgeablePubsubMessage, PreparedRecordEvent>> dataMsgsWindow) {
    return processData(
            dataMsgsWindow.get(0).getRight().getData(),
            dataMsgsWindow.stream()
                    .map(Pair::getRight)
                    .map(PreparedRecordEvent::getRecord)
                    .collect(Collectors.toSet())
    )
            .doOnSuccess(it ->
                    dataMsgsWindow.forEach(msg -> {
                        log.info("Mark msg ACK");
                        msg.getLeft().ack();
                    })
            )
            .doOnError(e -> {
                log.error("Error on PreparedRecordEvent event", e);
                dataMsgsWindow.forEach(msg -> {
                    log.error("Mark msg NACK");
                    msg.getLeft().nack();
                });
            })
            .retry();
}
private Mono<Void> processData(Data data, Set<Record> records) {
    // For each message, run calculations over the records associated with the data
    final DataQuality calculated = calculatorService.calculateDataQualityFor(data, records); // Arithmetic calculations
    return this.daasClient.updateMetrics(calculated) // Update the DB record through a DaaS that wraps DB access
            .flatMap(it -> {
                if (it.getProcessedRows() >= it.getValidRows()) {
                    return finish(data);
                }
                return Mono.just(data);
            })
            .then();
}
private Mono<Data> finish(Data data) {
    return dataClient.updateStatus(data.getId(), DataStatus.DONE) // Update the DB record through a DaaS that wraps DB access
            .doOnSuccess(updatedData -> pubSubClient.publish(
                    new Qa0DonedataEvent(updatedData) // Publish a new event to another topic
            ))
            .doOnError(err -> {
                log.error("Error finishing data", err);
            })
            .onErrorReturn(data);
}
I need each message to be consumed by one and only one app2 instance. Does anybody know if this is possible? Any idea how to achieve it?
Maybe the right way is to create one subscription for each app2 instance and configure the topic to send each message to exactly one subscription instead of to every subscription. Is that possible?
According to the official documentation, once a message is sent to a subscriber, Pub/Sub tries not to deliver it to any other subscriber on the same subscription (the app2 instances are subscribers of the same subscription):
Once a message is sent to a subscriber, the subscriber should acknowledge the message. A message is considered outstanding once it has been sent out for delivery and before a subscriber acknowledges it. Pub/Sub will repeatedly attempt to deliver any message that has not been acknowledged. While a message is outstanding to a subscriber, however, Pub/Sub tries not to deliver it to any other subscriber on the same subscription. The subscriber has a configurable, limited amount of time -- known as the ackDeadline -- to acknowledge the outstanding message. Once the deadline passes, the message is no longer considered outstanding, and Pub/Sub will attempt to redeliver the message.

In general, Cloud Pub/Sub has at-least-once delivery semantics. That means it is possible for messages that have already been acked to be redelivered, and for multiple subscribers on the same subscription to receive the same message. These two cases should be relatively rare for a well-behaved subscriber, but without keeping track of the IDs of all messages delivered across all subscribers, it is not possible to guarantee that there won't be duplicates.
If it is happening with some frequency, it would be good to check whether your messages are getting acknowledged within the ack deadline. You are buffering messages for 1s, which should be relatively small compared to your ack deadline of 30s, but it also depends on how long the messages ultimately take to process. For example, if the buffer is being processed in sequential order, it could be that the later messages in your 1000-message buffer aren't being processed in time. You could look at the subscription/expired_ack_deadlines_count metric in Cloud Monitoring to determine whether your acks for messages are indeed late. Note that late acks for even a small number of messages can result in more duplicates. See the "Message Redelivery & Duplication Rate" section of the Fine-tuning Pub/Sub performance with batch and flow control settings post.
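For reference, here is a minimal sketch of what such flow control limits look like with the plain Java client (com.google.cloud.pubsub.v1); the limit values and the receiver are placeholders, and the Spring PubSubReactiveFactory used in the question exposes its own configuration instead:

import com.google.api.gax.batching.FlowControlSettings;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

public class FlowControlExample {
    public static Subscriber build(String projectId, String subscriptionId, MessageReceiver receiver) {
        // Cap how many messages may be outstanding (delivered but unacked) at once,
        // so a large buffer cannot sit unacked past the ack deadline.
        FlowControlSettings flowControl = FlowControlSettings.newBuilder()
                .setMaxOutstandingElementCount(100L)              // example limit, tune for your workload
                .setMaxOutstandingRequestBytes(10L * 1024 * 1024) // example limit: 10 MB
                .build();
        return Subscriber.newBuilder(ProjectSubscriptionName.of(projectId, subscriptionId), receiver)
                .setFlowControlSettings(flowControl)
                .build();
    }
}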

OK, after running tests, reading the documentation, and reviewing the code, I found a "small" error in it.
We had a spurious "retry" in the "processDataBuffer" method: when an error happened, the messages in the buffer were marked as NACK, so they were delivered to another instance; but because of the retry, they were also executed again, correctly this time, so the same messages were marked as ACK as well.
Because of this, some of them were processed twice.
private Mono<Void> processDataBuffer(List<Pair<AcknowledgeablePubsubMessage, PreparedRecordEvent>> dataMsgsWindow) {
    return processData(
            dataMsgsWindow.get(0).getRight().getData(),
            dataMsgsWindow.stream()
                    .map(Pair::getRight)
                    .map(PreparedRecordEvent::getRecord)
                    .collect(Collectors.toSet())
    )
            .doOnSuccess(it ->
                    dataMsgsWindow.forEach(msg -> {
                        log.info("Mark msg ACK");
                        msg.getLeft().ack();
                    })
            )
            .doOnError(e -> {
                log.error("Error on PreparedRecordEvent event", e);
                dataMsgsWindow.forEach(msg -> {
                    log.error("Mark msg NACK");
                    msg.getLeft().nack();
                });
            })
            .retry(); // this retry has been deleted
}
My question is resolved.

Even after correcting the bug mentioned above, I still received duplicated messages. It is accepted that Google Cloud Pub/Sub does not guarantee exactly-once delivery, particularly when you use buffers or windows. This is exactly my scenario, so I have to implement a mechanism to remove duplicates based on the message ID.
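As an illustration, a minimal sketch of that deduplication idea, assuming a Redis instance shared by all app2 pods via Spring Data Redis (the key prefix, TTL, and wiring are hypothetical; any shared store with an atomic set-if-absent and a TTL longer than the observed redelivery window would do):

import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class MessageDeduplicator {
    // The TTL must comfortably exceed the window in which Pub/Sub may redeliver a message.
    private static final Duration DEDUP_TTL = Duration.ofMinutes(10);

    private final StringRedisTemplate redis;

    public MessageDeduplicator(StringRedisTemplate redis) {
        this.redis = redis;
    }

    /** Returns true only for the first consumer (across all instances) to claim this message ID. */
    public boolean firstDelivery(String messageId) {
        Boolean claimed = redis.opsForValue()
                .setIfAbsent("pubsub:dedup:" + messageId, "1", DEDUP_TTL);
        return Boolean.TRUE.equals(claimed);
    }
}

A message filtered out this way should still be acked, since the instance that claimed its ID first is already processing it; for example, one could ack and drop any message for which firstDelivery(msg.getPubsubMessage().getMessageId()) returns false before buffering.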

Related

How to configure consumer-level transactional redelivery with Camel and IBM MQ

I am trying to build a transactional JMS client in Java Spring Boot using Apache Camel, connecting to IBM MQ. Furthermore, the client needs to apply an exponential back-off redelivery behavior when message processing fails. The reason: messages from MQ need to be processed and forwarded to external systems that may be down for maintenance for many hours. Using transactions to get at-least-once processing guarantees seems like the appropriate solution to me.
I have researched this topic for many hours and have not been able to find a solution. I will start with what I currently have:
@Bean
UserCredentialsConnectionFactoryAdapter uccConnectionFactoryAdapter()
        throws IOException {
    MQConnectionFactory factory = new MQConnectionFactory();
    factory.setCCDTURL(tabFilePath);
    UserCredentialsConnectionFactoryAdapter adapter =
            new UserCredentialsConnectionFactoryAdapter();
    adapter.setTargetConnectionFactory(factory);
    adapter.setUsername(userName);
    adapter.setPassword(password);
    return adapter;
}
@Bean
PlatformTransactionManager jmsTransactionManager(@Autowired UserCredentialsConnectionFactoryAdapter uccConnectionFactoryAdapter) {
    JmsTransactionManager txMgr = new JmsTransactionManager(uccConnectionFactoryAdapter);
    return txMgr;
}

@Bean
CamelContextConfiguration contextConfiguration(@Autowired UserCredentialsConnectionFactoryAdapter uccConnectionFactoryAdapter,
                                               @Qualifier("jmsTransactionManager") @Autowired PlatformTransactionManager txMgr) {
    return new CamelContextConfiguration() {
        @Override
        public void beforeApplicationStart(CamelContext context) {
            JmsComponent jmsComponent = JmsComponent.jmsComponentTransacted(uccConnectionFactoryAdapter, txMgr);
            // required for consumer-level redelivery after rollback
            jmsComponent.setCacheLevelName("CACHE_CONSUMER");
            jmsComponent.setTransacted(true);
            jmsComponent.getConfiguration().setConcurrentConsumers(1);
            context.addComponent("jms", jmsComponent);
        }

        @Override
        public void afterApplicationStart(CamelContext camelContext) {
            // Do nothing
        }
    };
}
// in a route builder
...
from("jms:topic:INPUT_TOPIC?clientId=" + CLIENT_ID + "&subscriptionDurable=true&durableSubscriptionName=" + SUBSCRIPTION_NAME)
    .transacted()
    .to("direct:processMessage");
...
I was able to verify the transactional behavior through integration tests. If an unhandled exception occurs during message processing, the transaction gets rolled back and retried. The problem is that it gets retried immediately, several times per second, potentially putting significant load on the IBM MQ manager and the external system.
For ActiveMQ, redelivery policies are easy to configure, with plenty of examples on the net. The ActiveMQConnectionFactory has a setRedeliveryPolicy method, meaning the ActiveMQ client library has redelivery logic built in. This is, from all I can tell, in line with the documentation of Camel's Transactional Client EIP, which states:
The redelivery in transacted mode is not handled by Camel but by the backing system (the transaction manager). In such cases you should resort to the backing system how to configure the redelivery.
What I absolutely can't figure out is how to achieve the same thing for IBM MQ. IBM's MQConnectionFactory does not have any support for redelivery policies. In fact, searching for redeliverypolicy in the MQ Knowledge Center brings up exactly... drumroll... 0 hits. I even looked a bit through the implementation of the MQConnectionFactory and didn't discover anything either.
Another backing system I looked into was the JmsTransactionManager. Searches for "jmstransactionmanager redelivery policy" or "jmstransactionmanager exponential backoff" did not turn up anything useful either. There was some talk about TransactionTemplate and AbstractMessageListenerContainer, but 1) I didn't see any connection to redelivery policies, and 2) I could not figure out how those interact with Camel and JMS.
Sooo, does anybody have any idea how to implement exponential backoff redelivery policies with Apache Camel and IBM MQ?
Closing note: Camel's redelivery policies on errorHandler and onException are not the same as redelivery policies in the transaction/connection backing system. Those handlers retry at the point of failure, using the Exchange object in whatever state it is in, without rolling back and reprocessing the message from the start of the route. The transaction remains active during the entire retry period, and a rollback only occurs when the errorHandler or onException gives up. This is not what I want for retries that may go on for many hours.
Looks like @JoshMc pointed me in the right direction. I managed to implement a RoutePolicy that delays redeliveries with increasing delays. I ran a test session for a few hours, with several thousand redeliveries of the same message, to check for problems such as memory leaks or MQ connection exhaustion. I did not observe any. There were two stable TCP connections to the MQ manager, and the memory usage of the Java process stayed within a narrow range.
import java.util.Timer;
import java.util.TimerTask;
import javax.jms.Session;
import lombok.extern.log4j.Log4j2;
import org.apache.camel.CamelContext;
import org.apache.camel.CamelContextAware;
import org.apache.camel.Exchange;
import org.apache.camel.Message;
import org.apache.camel.Route;
import org.apache.camel.component.jms.JmsMessage;
import org.apache.camel.support.RoutePolicySupport;

@Log4j2
public class ExponentialBackoffPolicy extends RoutePolicySupport implements CamelContextAware {
    final static String JMSX_DELIVERY_COUNT = "JMSXDeliveryCount";
    private CamelContext camelContext;

    @Override
    public void setCamelContext(CamelContext camelContext) {
        this.camelContext = camelContext;
    }

    @Override
    public CamelContext getCamelContext() {
        return this.camelContext;
    }

    @Override
    public void onExchangeDone(Route route, Exchange exchange) {
        try {
            // ideally we would check if the exchange is transacted, but onExchangeDone is called after the
            // transaction is already rolled back, and the transaction context has already been removed.
            if (exchange.getException() == null) {
                log.debug("No exception occurred, skipping route suspension.");
                return;
            }
            int deliveryCount = getRetryCount(exchange);
            int redeliveryDelay = getRedeliveryDelay(deliveryCount);
            log.info("Suspending route {} for {}ms after exception. Current delivery count {}.",
                    route.getId(), redeliveryDelay, deliveryCount);
            super.suspendRoute(route);
            scheduleWakeup(route, redeliveryDelay);
        } catch (Exception ex) {
            // only log the exception and let Camel continue as if this policy didn't exist.
            log.error("Exception while suspending route", ex);
        }
    }

    void scheduleWakeup(Route route, int redeliveryDelay) {
        Timer timer = new Timer();
        timer.schedule(
                new TimerTask() {
                    @Override
                    public void run() {
                        log.info("Resuming route {} after redelivery delay of {}ms.", route.getId(), redeliveryDelay);
                        try {
                            resumeRoute(route);
                        } catch (Exception ex) {
                            // only log the exception and let Camel continue as if this policy didn't exist.
                            log.error("Exception while resuming route", ex);
                        }
                        timer.cancel();
                    }
                },
                redeliveryDelay);
    }

    int getRetryCount(Exchange exchange) {
        Message msg = exchange.getIn();
        return (int) msg.getHeader(JMSX_DELIVERY_COUNT, 1);
    }

    int getRedeliveryDelay(int deliveryCount) {
        // very crude backoff strategy for now, will need to refine later
        if (deliveryCount < 10) return 1000;
        if (deliveryCount < 20) return 5000;
        if (deliveryCount < 30) return 20000;
        return 60000;
    }
}
And this is how it is used in route definitions:
from(mqConnectionString)
    .routePolicy(new ExponentialBackoffPolicy())
    .transacted()
    ...

// and if you want to distinguish between retriable and non-retriable situations, apply the following two exception handlers
onException(NonRetriableProcessingException.class)
    .handled(true)
    .log(LoggingLevel.WARN, "Non-retriable exception occurred, discard message.");

onException(Exception.class)
    .handled(false)
    .log(LoggingLevel.WARN, "Retriable exception occurred, retry message.");
One thing to note is that the JMSXDeliveryCount header comes from the MQ manager, and the redelivery delay is calculated from it. If you restart an application that uses the ExponentialBackoffPolicy while a message is permanently failing, the application will immediately attempt to reprocess that message upon restart, but on the next failure it applies a delay corresponding to the total number of redeliveries rather than starting over with the initial short delay.

How to manually ack/nack a PubSub message in Camel Route

I am setting up a Camel route with ackMode=NONE, meaning acknowledgements are not done automatically. How do I explicitly acknowledge the message in the route?
In my Camel route definition I've set ackMode to NONE. According to the documentation, I should be able to manually acknowledge the message downstream:
https://github.com/apache/camel/blob/master/components/camel-google-pubsub/src/main/docs/google-pubsub-component.adoc
"AUTO = exchange gets ack’ed/nack’ed on completion. NONE = downstream process has to ack/nack explicitly"
However I cannot figure out how to send the ack.
from("google-pubsub:<project>:<subscription>?concurrentConsumers=1&maxMessagesPerPoll=1&ackMode=NONE")
.bean("processingBean");
My Pub/Sub subscription has an acknowledgement deadline of 10 seconds, so my message keeps getting re-sent every 10 seconds due to ackMode=NONE. This is as expected, but I cannot find a way to manually acknowledge the message once processing is complete and stop the redeliveries.
I was able to dig through the Camel components and figure out how it is done. First I created a GooglePubsubConnectionFactory bean:
@Bean
public GooglePubsubConnectionFactory googlePubsubConnectionFactory() {
    GooglePubsubConnectionFactory connectionFactory = new GooglePubsubConnectionFactory();
    connectionFactory.setCredentialsFileLocation(pubsubKey);
    return connectionFactory;
}
Then I was able to reference the ack id of the message from the header:
@Header(GooglePubsubConstants.ACK_ID) String ackId
Then I used the following code to acknowledge the message:
List<String> ackIdList = new ArrayList<>();
ackIdList.add(ackId);
AcknowledgeRequest ackRequest = new AcknowledgeRequest().setAckIds(ackIdList);
Pubsub pubsub = googlePubsubConnectionFactory.getDefaultClient();
pubsub.projects().subscriptions().acknowledge("projects/<my project>/subscriptions/<my subscription>", ackRequest).execute();
I think it is best if you look at how the Camel component does it with ackMode=AUTO. Have a look at this class (method acknowledge).
But why do you want to do this extra work? Camel is your friend; it simplifies integration by abstracting away low-level code.
So when you use ackMode=AUTO, Camel automatically commits your successfully processed messages (once the message has successfully passed through the whole route) and rolls back your unprocessable messages.
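Unless there is a specific reason to manage acknowledgements by hand, the route from the question can simply rely on the default mode; a minimal sketch (the endpoint placeholders are the question's own):

// With ackMode=AUTO (the default), Camel acks the message when the exchange
// completes the route successfully, and nacks it when an exception escapes.
from("google-pubsub:<project>:<subscription>?concurrentConsumers=1&maxMessagesPerPoll=1&ackMode=AUTO")
    .bean("processingBean");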

Can async subscriber example lose messages?

Starting off with Pub/Sub. When reading the Google Cloud documentation, I ran into a snippet of code, and I think I see a flaw in the example.
This is the code I am talking about. It uses the async subscriber.
public class SubscriberExample {
    private static final String PROJECT_ID = ServiceOptions.getDefaultProjectId();
    private static final BlockingQueue<PubsubMessage> messages = new LinkedBlockingDeque<>();

    static class MessageReceiverExample implements MessageReceiver {
        @Override
        public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
            messages.offer(message);
            consumer.ack();
        }
    }

    public static void main(String... args) throws Exception {
        String subscriptionId = args[0];
        ProjectSubscriptionName subscriptionName = ProjectSubscriptionName.of(
                PROJECT_ID, subscriptionId);
        Subscriber subscriber = null;
        try {
            subscriber =
                    Subscriber.newBuilder(subscriptionName, new MessageReceiverExample()).build();
            subscriber.startAsync().awaitRunning();
            while (true) {
                PubsubMessage message = messages.take();
                processMessage(message);
            }
        } finally {
            if (subscriber != null) {
                subscriber.stopAsync();
            }
        }
    }
}
My question is: what if a bunch of messages have been acknowledged, the BlockingQueue is not empty, and the server crashes? Then I would lose some messages, right? (Acknowledged in Pub/Sub, but not actually processed.)
Wouldn't the best implementation be to acknowledge the message only after it has been processed, instead of acknowledging it, leaving it on a queue, and assuming it will be processed? I understand this decouples receiving messages from processing them and can potentially increase throughput, but it still risks losing messages, right?
Yes, one should not acknowledge a message until it has been fully processed. Otherwise, the message may never be processed: if it was acknowledged, it will not be redelivered in the event of a crash or restart. I have filed an issue to update the example.
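One way to restructure the example along those lines is to do the work inside the receiver and acknowledge only afterwards; a minimal sketch, assuming processMessage is synchronous and throws on failure:

static class MessageReceiverExample implements MessageReceiver {
    @Override
    public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
        try {
            processMessage(message); // do the real work first...
            consumer.ack();          // ...and acknowledge only on success
        } catch (Exception e) {
            consumer.nack();         // ask for redelivery on failure
        }
    }
}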

Job Queue using Google PubSub

I want to have a simple task queue. There will be multiple consumers running on different machines, but I only want each task to be consumed once.
If I have multiple subscribers taking messages from a topic using the same subscription ID, is there a chance that the message will be read twice?
I've tested something along these lines successfully but I'm concerned that there could be synchronization issues.
SubscriberClient client = SubscriberClient.create(SubscriberSettings.defaultBuilder().build());
SubscriptionName subName = SubscriptionName.create(projectId, "Queue");
client.createSubscription(subName, topicName, PushConfig.getDefaultInstance(), 0);

Thread subscriber = new Thread() {
    public void run() {
        while (!interrupted()) {
            PullResponse response = client.pull(subName, false, 1);
            List<ReceivedMessage> messages = response.getReceivedMessagesList();
            ReceivedMessage mess = messages.get(0);
            client.acknowledge(subName, ImmutableList.of(mess.getAckId()));
            doSomethingWith(mess.getMessage().getData().toStringUtf8());
        }
    }
};
subscriber.start();
In short, yes, there is a chance that some messages will be duplicated: GCP promises at-least-once delivery. Exactly-once delivery is theoretically impossible in any distributed system. You should design your doSomethingWith code to be idempotent if possible, so duplicate messages are not a problem.
You should also only acknowledge a message once you have finished processing it: what would happen if your machine died after the acknowledge but before doSomethingWith returned? Your message would be lost! (This fundamental issue is why exactly-once delivery is impossible.)
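With the same client API as in the question, that just means swapping the two calls so the acknowledgement happens only after doSomethingWith returns:

PullResponse response = client.pull(subName, false, 1);
for (ReceivedMessage mess : response.getReceivedMessagesList()) {
    doSomethingWith(mess.getMessage().getData().toStringUtf8()); // process first
    client.acknowledge(subName, ImmutableList.of(mess.getAckId())); // ack only after success
}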
If losing messages is preferable to double-processing them, you could add a locking step (write a "processed" token to a consistent database), but this can fail if the write succeeds and the machine then dies before the message is processed. At that point you might be better off looking for a messaging technology designed for at-most-once delivery rather than one optimised for reliability.

Is there a limit on the number of entities you can query from the GAE datastore?

My GCM Endpoint is derived from the code at /github.com/GoogleCloudPlatform/gradle-appengine-templates/tree/master/GcmEndpoints/root/src/main. Each Android client device registers with the endpoint. A message can be sent to the first 10 registered devices using this code:
#Api(name = "messaging", version = "v1", namespace = #ApiNamespace(ownerDomain = "${endpointOwnerDomain}", ownerName = "${endpointOwnerDomain}", packagePath="${endpointPackagePath}"))
public class MessagingEndpoint {
private static final Logger log = Logger.getLogger(MessagingEndpoint.class.getName());
/** Api Keys can be obtained from the google cloud console */
private static final String API_KEY = System.getProperty("gcm.api.key");
/**
* Send to the first 10 devices (You can modify this to send to any number of devices or a specific device)
*
* #param message The message to send
*/
public void sendMessage(#Named("message") String message) throws IOException {
if(message == null || message.trim().length() == 0) {
log.warning("Not sending message because it is empty");
return;
}
// crop longer messages
if (message.length() > 1000) {
message = message.substring(0, 1000) + "[...]";
}
Sender sender = new Sender(API_KEY);
Message msg = new Message.Builder().addData("message", message).build();
List<RegistrationRecord> records = ofy().load().type(RegistrationRecord.class).limit(10).list();
for(RegistrationRecord record : records) {
Result result = sender.send(msg, record.getRegId(), 5);
if (result.getMessageId() != null) {
log.info("Message sent to " + record.getRegId());
String canonicalRegId = result.getCanonicalRegistrationId();
if (canonicalRegId != null) {
// if the regId changed, we have to update the datastore
log.info("Registration Id changed for " + record.getRegId() + " updating to " + canonicalRegId);
record.setRegId(canonicalRegId);
ofy().save().entity(record).now();
}
} else {
String error = result.getErrorCodeName();
if (error.equals(Constants.ERROR_NOT_REGISTERED)) {
log.warning("Registration Id " + record.getRegId() + " no longer registered with GCM, removing from datastore");
// if the device is no longer registered with Gcm, remove it from the datastore
ofy().delete().entity(record).now();
}
else {
log.warning("Error when sending message : " + error);
}
}
}
}
}
The above code sends to the first 10 registered devices. I would like to send to all registered clients. According to http://objectify-appengine.googlecode.com/svn/branches/allow-parent-filtering/javadoc/com/googlecode/objectify/cmd/Query.html#limit(int), setting limit(0) accomplishes this. But I'm not convinced this will be problem-free for very large numbers of registered clients, given memory constraints and the time it takes to execute the query. https://code.google.com/p/objectify-appengine/source/browse/Queries.wiki?repo=wiki states: "Cursors let you take a "checkpoint" in a query result set, store the checkpoint elsewhere, and then resume from where you left off later. This is often used in combination with the Task Queue API to iterate through large datasets that cannot be processed in the 60s limit of a single request".
Note the comment about the 60s limit of a single request.
So my question: if I modified the sample code at /github.com/GoogleCloudPlatform/gradle-appengine-templates/tree/master/GcmEndpoints/root/src/main to request all objects from the datastore, by replacing limit(10) with limit(0), would this ever fail for a large number of objects? And if it would fail, at roughly what number of objects?
This is a poor pattern, even with cursors. At the very least, you'll hit the hard 60s limit for a single request. And since you're doing updates on the RegistrationRecord, you need a transaction, which will slow down the process even more.
This is exactly what the task queue is for. The best way is to do it in two tasks:
Your API endpoint enqueues a "send message to everyone" task and returns immediately.
That first task is the "mapper" which iterates the RegistrationRecords with a keys-only query. For each key, enqueue a "reducer" task for "send X message to this record".
The reducer task sends the message and (in a transaction) performs your record update.
Using Deferred this actually isn't much code at all.
The first task frees your client immediately and gives you 10 minutes to iterate the RegistrationRecord keys rather than the 60s limit of a normal request. If you get your chunking right and batch your queue submissions, you should be able to generate thousands of reducer tasks per second.
This will effortlessly scale to hundreds of thousands of users, and might get you into the millions. If you need to scale higher, you can apply a map/reduce approach to parallelize the mapping. Then it's just a question of how many instances you want to throw at the problem.
I have used this approach to great effect in the past sending out millions of apple push notifications at a time. The task queue is your friend, use it heavily.
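For illustration, a rough sketch of the mapper task using the App Engine task queue's DeferredTask (SendToOneTask is a hypothetical reducer task that loads one record and performs the send plus the transactional update):

import com.google.appengine.api.taskqueue.DeferredTask;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import com.googlecode.objectify.Key;
import static com.googlecode.objectify.ObjectifyService.ofy;

// DeferredTask is Serializable, so keep the fields serializable too.
public class SendToEveryoneTask implements DeferredTask {
    private final String message;

    public SendToEveryoneTask(String message) {
        this.message = message;
    }

    @Override
    public void run() {
        // Keys-only query: cheap to iterate, and each key becomes one reducer task.
        for (Key<RegistrationRecord> key : ofy().load().type(RegistrationRecord.class).keys()) {
            QueueFactory.getDefaultQueue()
                    .add(TaskOptions.Builder.withPayload(new SendToOneTask(message, key)));
        }
    }
}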
Your query will time out if you try to retrieve too many entities. You will need to use cursors in your loop.
No one can say how many entities can be retrieved before this timeout - it depends on the size of your entities, the complexity of your query, and, most importantly, what else happens in your loop. For example, in your case you can dramatically speed up your loop (and thus retrieve many more entities before a timeout) by creating tasks instead of building and sending messages within the loop itself.
Note that by default a query returns entities in chunks of 20 - you will need to increase the chunk size if you have a large number of entities.
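For illustration, a sketch combining both suggestions with Objectify and the datastore cursor API (the batch size of 500, the cursorString parameter, and enqueueSendTask are placeholders; the types involved are com.google.appengine.api.datastore.Cursor and QueryResultIterator):

// Fetch in larger chunks and checkpoint with a cursor so a follow-up task can resume.
Query<RegistrationRecord> query = ofy().load().type(RegistrationRecord.class).chunk(500);
if (cursorString != null) {
    query = query.startAt(Cursor.fromWebSafeString(cursorString));
}
QueryResultIterator<RegistrationRecord> it = query.iterator();
int processed = 0;
while (it.hasNext() && processed++ < 500) {
    enqueueSendTask(it.next()); // hypothetical: enqueue a task instead of sending inline
}
if (it.hasNext()) {
    String checkpoint = it.getCursor().toWebSafeString();
    // enqueue a continuation task that passes `checkpoint` back in as cursorString
}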
