Is there a way to make Camel process grouped messages in a queue only once the group has a complete sequence? For example, if I send message (GroupSeqNumber) 1 and the last message (GroupSeqNumber) 3, I don't want the group to be processed until message (GroupSeqNumber) 2 enters the queue. Right now my implementation processes the messages as soon as the last message is sent, regardless of whether the grouped messages form a complete sequence or not.
from("temp:queue:temp").aggregate({Combines body of grouped messages}).process({Processing class});
I am trying to understand the cardinality between Messages and Exchanges. In the following route, by the time one message makes it to the final log, how many Exchanges are created?
from("timer:one-second-timer")
.bean(messageTransformer, "transformMessage")
.to("log:logging-end-point");
Since an Exchange can hold only one "in" message, I imagine there will be one Exchange for each endpoint the message hops through. Is this true?
You can consider an Exchange as being the envelope containing the Message + some meta-data (the Properties), allowing this message to be transported between endpoints.
The javadoc says:
An Exchange is the message container holding the information during
the entire routing of a Message received by a Consumer.
So, in your example, if the timer fires 10 times, this will result in 10 distinct Exchanges, each carrying one message.
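If it helps to see the "envelope" idea in code, here is a small sketch (not taken from the question above): metadata is stored as an exchange property next to the message body, and the log endpoint's showProperties option prints both.

// Sketch only: the Exchange is the envelope, getIn() is the payload,
// and properties are route-scoped metadata carried alongside it.
from("timer:one-second-timer")
    .process(exchange -> {
        exchange.setProperty("receivedAt", System.currentTimeMillis()); // metadata on the envelope
        exchange.getIn().setBody("tick");                               // the actual message
    })
    .to("log:logging-end-point?showProperties=true");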
I need to retrieve the activity for a certain category of emails sent through our SendGrid account, limited to events that occurred before a specific time.
To do this, I first invoke SendGrid's "filter all messages" endpoint to retrieve the message IDs of the category of messages I am interested in, bounded above using the endpoint's last_event_time parameter, and then retrieve the activity for each individual message using the "Filter message by message ID" endpoint with the retrieved message IDs.
The problem I am faced with is that "filter all messages" has a limit parameter with a maximum value of 1000, while the number of messages whose last_event_time equals a specific timestamp, say '2021-10-10T10:10:10Z', can be more than 1000.
In this case, using the timestamp to iteratively filter messages doesn't work, because the response from "filter all messages" contains data for the same set of 1000 messages each time. I thought of using the message IDs already retrieved to exclude those messages in subsequent calls, and even tried that out, but it errored out because the request URI was too long.
Not sure if I am missing something here.
You can paginate through the responses by adding limit and offset parameters to your request URL.
For example:
https://api.sendgrid.com/v3/resource?limit=1000&offset=1000
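As a rough sketch of what that pagination loop could look like (this is not an official SendGrid sample; the generic resource path from the example above, the "msg_id" field used to count results, and the SENDGRID_API_KEY environment variable are all assumptions for illustration):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PagedFetch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        int limit = 1000;
        for (int offset = 0; ; offset += limit) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://api.sendgrid.com/v3/resource?limit=" + limit + "&offset=" + offset))
                    .header("Authorization", "Bearer " + System.getenv("SENDGRID_API_KEY"))
                    .GET()
                    .build();
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            // Crude page-size check without a JSON library: count occurrences of "msg_id".
            int pageSize = body.split("\"msg_id\"", -1).length - 1;
            System.out.println("offset=" + offset + " returned " + pageSize + " messages");
            if (pageSize < limit) {
                break; // last (possibly empty) page reached
            }
        }
    }
}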
I would like to change my GAE app logic so that emails are sent using a task queue.
Currently I have a cron job which runs every 15 minutes and reads the messages to be sent from the datastore:
class SendMessagesHandler(webapp2.RequestHandler):
    def get(self):
        emails_quota_exceeded = models.get_system_value('emails_quota_exceeded')
        if emails_quota_exceeded == 0 or emails_quota_exceeded == None:
            messages = models.get_emails_queue()
            for message in messages:
                try:
                    ...
                    email.send()
                    models.update_email_status(message.key.id())  # update email status indicating that the mail has been sent
                except apiproxy_errors.OverQuotaError, error_message:
                    models.set_system_value(what='emails_quota_exceeded', val=1)
                    logging.warning('E-mails quota exceeded for today: %s' % error_message)
                    break
        else:
            logging.info('Free quota to send e-mails is exceeded')
If I use task queues, then I'll get something like:
for message in messages:
    taskqueue.add(url='/sendmsg', payload=message)
In this scenario it is possible that the same message will be sent twice (or even more times), for example if it hasn't been sent yet but the cron job runs a second time.
If I update email status immediately after adding the message to the queue:
for message in messages:
    taskqueue.add(url='/sendmsg', payload=message)
    models.update_email_status(message.key.id())  # update email status indicating that the mail has been sent
then it is possible that the message will never be sent, for example if an exception happens while sending the e-mail. I understand that the task will be retried, but if the quota is already exceeded for today, retries will not help.
I think I could also re-read the status of each message in the task handler before trying to send it, but that will cost me additional read operations.
What's the best way to handle it?
Giving your task a name including the key.id() will prevent it from being sent twice:
task_name = ''.join(['myemail-', str(mykey)])
try:
    taskqueue.Task(
        url="/someURL/send-single-email",
        name=task_name,
        method="POST",
        params={
            "subject": subject,
            "body": body,
            "to": to,
            "from": sender}  # the value can't be a variable named 'from' (reserved word in Python); 'sender' is a placeholder
    ).add(queue_name="mail-queue")
except:
    pass  # raises TombstonedTaskError(InvalidTaskError) if a tombstoned task name is reused
There may be times when you want to send follow-up emails for messages with the same key. Therefore, I would recommend adding a date or datetime stamp to the task name. This will allow you to send other messages of the same key at a later time:
task_name = ''.join(['myemail-', str(mykey), str(datetime.utcnow()-timedelta(hours=8))]).translate(string.maketrans('.:_ ', '----'))
I effectively want a flush, or a completionSize but for all the aggregations in the aggregator. Like a global completionSize.
Basically I want to make sure that every message that comes in a batch is aggregated and then have all the aggregations in that aggregator complete at once when the last one has been read.
e.g. 1000 messages arrive (the length is not known beforehand)
aggregate on correlation id into bins
A 300
B 400
C 300 (the size of the bins is not known beforehand)
I want the aggregator not to complete until the 1000th exchange is aggregated, and at that point I want all of the aggregations in the aggregator to complete at once.
Unfortunately, completionSize applies to each aggregation and not to the aggregator as a whole. So if I set completionSize(1000) it will just never finish, since each individual aggregation would have to reach 1000 before it is 'complete'.
I could get around it by building up a single Map object, but that sidesteps the correlation handling in aggregator2, which I would prefer to use.
So, is there a way to do this intelligently, either with a global completion size or with a flush?
One option is to simply add some logic to keep a global counter and set the Exchange.AGGREGATION_COMPLETE_ALL_GROUPS header once it's reached...
Available as of Camel 2.9...You can manually complete all current aggregated exchanges by sending in a message containing the header Exchange.AGGREGATION_COMPLETE_ALL_GROUPS set to true. The message is considered a signal message only, the message headers/contents will not be processed otherwise.
I suggest taking a look at the Camel aggregator EIP docs at http://camel.apache.org/aggregator2 and reading about the different completion conditions. There is also the special message Ben refers to, which you can send to signal that all in-flight aggregates should complete.
If you consume from a batch consumer (http://camel.apache.org/batch-consumer.html) then you can use a special completion that completes when the batch is done, for example if you pick up files or rows from a JPA database table. When all messages from the batch consumer have been processed, the aggregator can signal completion for all of the aggregated messages using the completionFromBatchConsumer option, as sketched below.
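A small sketch of that batch-consumer completion (the endpoint names and the aggregation strategy are placeholders, not taken from the question):

// Sketch only: the file consumer is a batch consumer, so the aggregator can
// complete all groups once every file picked up in the same poll has been aggregated.
from("file:inbox")
    .aggregate(header("myCorrelationKey"), new MyAggregationStrategy())
        .completionFromBatchConsumer()
    .to("direct:processCompletedGroups");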
Also, if you have a copy of the Camel in Action book, read chapter 8, section 8.2, as it covers the aggregate EIP in much more detail.
Using Exchange.AGGREGATION_COMPLETE_ALL_GROUPS_INCLUSIVE worked for me:
from(endpoint)
    .unmarshal(csvFormat)
    .split(body())
        .bean(CsvProcessor())
        .choice()
            // If all messages are processed,
            // flush the aggregation
            .`when`(simple("\${property.CamelSplitComplete}"))
                .setHeader(Exchange.AGGREGATION_COMPLETE_ALL_GROUPS_INCLUSIVE, constant(true))
        .end()
        .aggregate(simple("\${body.trackingKey}"),
                AggregationStrategies.bean(OrderAggregationStrategy()))
            .completionTimeout(10000)
Is it possible to have a list of sources passed to an MPI_Recv (or equivalent) call? Currently, my code looks something like:
do i=nod1,nod2
   call mpi_recv(tmp,n,MPI_REAL,MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,status,ierr)
   ! ... do stuff with tmp here
   call mpi_send(tmp,n,MPI_REAL,status(MPI_SOURCE),tag,MPI_COMM_WORLD,ierr)
enddo
Of course, this doesn't guarantee that it does what I want (if nod1 sends two messages here before nod2 can send one message, then nod2's message is not received during this iteration, which would be bad). In my application that can't happen, since nod1 and nod2 have other constraints which force them to be synchronized (enough) with each other, but it got me wondering whether there is a way to specify a list of procs that are permitted to be received from.
Not as such. However, you can use MPI_Probe() with MPI_ANY_SOURCE, and then compare the MPI_SOURCE field of the status object against your list of processes you want to receive from. If there's a match, you can then proceed to receive from that source with a regular blocking receive.