Mule 4 - Empty VM queue error consuming messages - batch-file

I have a flow in Mule 4 that reads data from a CSV file and inserts it into Salesforce using a Batch:
All Salesforce results are inserted into a non-persistent VM queue (transient by default).
All messages are inserted for each block of records and are consumed without problems at the end of the batch.
However, when I have finished, the following error appears after 10 seconds:
Message : Tried to consume messages from VM queue 'productQueue' but it was empty after timeout of 10 SECONDS.
Error type : VM:EMPTY_QUEUE
Element : testing-threadingSub_Flow/processors/0/processors/0 # testing-threading:testing-threading.xml:95 (Consume)
Element XML : <vm:consume doc:name="Consume" doc:id="6b7b2df6-c986-425c-a6f0-29613a876d37" config-ref="VM_Config" queueName="demoQueue" timeout="10"></vm:consume>
Why does the consumer of the queue run if there are no more messages to process?
I want this component to only read messages when it is his turn. Maybe I'm using the wrong kind of VM?

The VM consume operation will try reading from queue up to a specified timeout which is configurable and then log that error if the queue is empty.
Somehow your foreach block is executing the consume more times than the required amount/messages available. If you share you foreach xml configuration we might be able to see more as to why.
Other than solving why foreach is running the consume more than necessary. There are a few options to modify this behaviour:
Wrap the consume in in a try to supress the error:
<try doc:name="Try" >
<vm:consume ... />
<error-handler >
<on-error-continue enableNotifications="false" logException="false" type=" ">
<logger />
</on-error-continue>
</error-handler>
</try>
Or maybe do not use a consume, and use a different flow with a VM listener to listen for the messages on that VM queue. This might change how your app needs to work.

Related

Apache Flink : Batch Mode failing for Datastream API's with exception `IllegalStateException: Checkpointing is not allowed with sorted inputs.`

A continuation to this : Flink : Handling Keyed Streams with data older than application watermark
based on the suggestion, I have been trying to add support for Batch in the same Flink application which was using the Datastream API's.
The logic is something like this :
streamExecutionEnvironment.setRuntimeMode(RuntimeExecutionMode.BATCH);
streamExecutionEnvironment.readTextFile("fileName")
.process(process function which transforms input)
.assignTimestampsAndWatermarks(WatermarkStrategy
.<DetectionEvent>forBoundedOutOfOrderness(orderness)
.withTimestampAssigner(
(SerializableTimestampAssigner<Event>) (event, l) -> event.getEventTime()))
.keyBy(keyFunction)
.window(TumblingEventWindows(Time.of(x days))
.process(processWindowFunction);
Based on the public docs, my understanding was that i simply needed to change the source to a bounded one. However the above processing keeps on failing at the event trigger after the windowing step with the below exception :
java.lang.IllegalStateException: Checkpointing is not allowed with sorted inputs.
at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.init(OneInputStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:552)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:764)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:571)
at java.base/java.lang.Thread.run(Thread.java:829)
The input file contains the historical events for multiple keys. The data for a given key is sorted, but the overall data is not. I have also added an event at the end of each key with the timestamp = MAX_WATERMARK to indicate end of keyed Stream. I tried it for a single key as well but the processing failed with the same exception.
Note: I have not enabled checkpointing.
I have also tried explicitly disabling checkpointing to no avail.
env.getCheckpointConfig().disableCheckpointing();
EDIT - 1
Adding more details :
I tried changing and using FileSource to read files but still getting the same exception.
environment.fromSource(FileSource.forRecordStreamFormat(new TextLineFormat(), path).build(),
WatermarkStrategy.noWatermarks(),
"Text File")
The first process step and key splitting works. However it fails after that. I tried removing windowing and adding a simple process step but it continues to fail.
There is no explicit Sink. The last process function simply updates a database.
Is there something I'm missing ?
That exception can only be thrown if checkpointing is enabled. Perhaps you can a checkpointing interval configured in flink-conf.yaml?

how to perform parallel processing of gcp pubsub messages in apache camel

I have this code below that takes message from pubsub source topic -> transform it as per a template -> then publish the transformed message to a target topic.
But to improve performance I need to do this task in parallel.That is i need to poll 500 messages,and then transform it in parallel and then publish them to the target topic.
From the camel gcp component documentation I believe maxMessagesPerPoll and concurrentConsumers parameter will do the job.Due to lack of documentation I am not sure how does it internally works.
I mean a) if I poll say 500 message ,will then it create 500 parallel route that will process the messages and publish it to the target topic b)what about ordering of the messages c) should I be looking at parallel processing EIPs as an alternative
etc.
The concept is not clear to me
Was go
// my route
private void addRouteToContext(final PubSub pubSub) throws Exception {
this.camelContext.addRoutes(new RouteBuilder() {
#Override
public void configure() throws Exception {
errorHandler(deadLetterChannel("google-pubsub:{{gcp_project_id}}:{{pubsub.dead.letter.topic}}")
.useOriginalMessage().onPrepareFailure(new FailureProcessor()));
/*
* from topic
*/
from("google-pubsub:{{gcp_project_id}}:" + pubSub.getFromSubscription() + "?"
+ "maxMessagesPerPoll={{consumer.maxMessagesPerPoll}}&"
+ "concurrentConsumers={{consumer.concurrentConsumers}}").
/*
* transform using the velocity
*/
to("velocity:" + pubSub.getToTemplate() + "?contentCache=true").
/*
* attach header to the transform message
*/
setHeader("Header ", simple("${date:now:yyyyMMdd}")).routeId(pubSub.getRouteId()).
/*
* log the transformed event
*/
log("${body}").
/*
* publish the transformed event to the target topic
*/
to("google-pubsub:{{gcp_project_id}}:" + pubSub.getToTopic());
}
});
}
a) if I poll say 500 message ,will then it create 500 parallel route that will process the messages and publish it to the target topic
No, Camel does not create 500 parallel threads in this case. As you suspect, the number of concurrent consumer threads is set with concurrentConsumers. So if you define 5 concurrentConsumers with a maxMessagesPerPoll of 500, every consumer will fetch up to 500 messages and process them one after the other in a single thread. That is, you have 5 messages processed in parallel.
what about ordering of the messages
As soon as you process messages in parallel, the order of messages is messed up. But this already happens with 1 Consumer when you got processing errors and they are detoured to your deadLetterChannel and reprocessed later.
should I be looking at parallel processing EIPs as an alternative
Only if the concurrentConsumers option is not sufficient.
When you mention the concurrentConsumers option(let's say concurrentConsumers=10), you are asking Camel to create a thread pool of 10 threads, and each of those 10 threads will pick up a different message from the pub-sub queue and process them.
The thing to note here is that when you are specifying the concurrentConsumers option, the thread pool uses a fixed size, which means that a fixed number of active threads are waiting at all times to process incoming messages. So 10 threads(since I specified concurrentConsumers=10) will be waiting to process my messages, even if there aren't 10 messages coming in simultaneously.
Obviously, this is not going to guarantee that the incoming messages will be processed in the same order. If you are looking to have the messages in the same order, you can have a look at the Resequencer EIP to order your messages.
As for your third question, I don't think google-pubsub component allows a parallel processing option. You can make your own using the Threads EIP. This would definitely give more control over your concurrency.
Using Threads, your code would look something like this:
from("google-pubsub:project-id:destinationName?maxMessagesPerPoll=20")
// the 2 parameters are 'pool size' and 'max pool size'
.threads(5, 20)
.to("direct:out");

Infinite AMQP Consumer with Alpakka

I'm trying to implement a very simple service connected to an AMQP broker with Alpakka. I just want it to consume messages from its queue as a stream at the moment they are pushed on a given exchange/topic.
Everything seemed to work fine in my tests, but when I tried to start my service, I realized that my stream was only consuming my messages once and then exited.
Basically I'm using the code from Alpakka documentation :
def consume()={
val amqpSource = AmqpSource.committableSource(
TemporaryQueueSourceSettings(connectionProvider, exchangeName)
.withDeclaration(exchangeDeclaration)
.withRoutingKey(topic),
bufferSize = prefetchCount
)
val amqpSink = AmqpSink.replyTo(AmqpReplyToSinkSettings(connectionProvider))
amqpSource.mapAsync(4)(msg => onMessage(msg)).runWith(amqpSink)
}
I tried to schedule the consume() execution every second, but I experienced OutOfMemoryException issues.
Is there any proper way to make this code run as an infinite loop ?
If you want to have a Source restarted when it fails or is cancelled, wrap it with RestartSource.withBackoff.

Using MPI Ibcast with multiple roots

I have an assignment with mpi.
In the assignment a slave must share some data to all the other slaves.
Let tasks = {1,2,3}
Task 1 broadcasts a message to all other tasks using Ibcast with root set to 1.
At the same time, task 2 broadcasts a message to all other slaves using Ibcast with root set to 2.
After a while, all the tasks do
For (i : tasks | where i != myRank)
Use Ibcast to recive message from root I
I Have a code written in c doing the example above.
The code works only if 1 task broadcasts a message,
but if two tasks are broadcasting messages at the same time (before, any other task has received any message)
I’m getting that the request never completes ( using mpi test function on the request handler from ibcast - in the foreach loop).
I’m not sure if I’m doing something wrong or mpi Ibcast does not provide expected behavior in this case.
Thanks

how to manually set a task to run in a gae queue for the second time

I have a task that runs in GAE queue.
according to my logic, I want to determine if the task will run again or not.
I don't want it do be normally executed by the queue and then to put it again in the queue
because I want to have the ability to check the "X-AppEngine-TaskRetryCount"
and quit trying after several attempts.
To my understanding it seems that the only case that a task will re-executed is when an internal GAE error will happen (or If my code will take too long in a "DeadlineExceededException" cases..(And I don't want to hold the code "hostage" for that long :) )
How can I re-enter a task to the queue in a manner that GAE will set X-AppEngine-TaskRetryCount ++ ??
You can programmatically retry / restart a task using a self.error() in python.
From the docs: App engine retries a task by returning any HTTP status code outside of the range 200–299
And at the beginning of the task you can test for the number of retries using:
retries = int(self.request.headers['X-Appengine-Taskretrycount'])
if retries < 10 :
self.error(409)
return

Resources