Apache Camel: complete exchanges when an aggregated exchange is completed - apache-camel

In my Apache Camel application, I have a very simple route:
from("aws-sqs://...")
.aggregate(constant(true), new AggregationStrategy())
.completionSize(100)
.to("SEND_AGGREGATE_VIA_HTTP");
That is, it takes messages from AWS SQS, groups them in batches of 100, and sends them via HTTP somewhere.
Exchanges with messages from SQS are completed successfully on getting into the aggregate stage, and SqsConsumer deletes them from the queue at this point.
The problem is that something might happen with an aggregated exchange (it might be delivered with an error), and messages will be lost. I would really like these original exchanges to be completed successfully (messages to be deleted from a queue) only when an aggregated exchange they're in is also completed successfully (a batch of messages is delivered). Is there a way to do this?
Thank you.

You could set deleteAfterRead to false and manually delete the messages after you've sent them to you HTTP endpoint; You could use a bean or a processor and send the proper SQS delete requests through the AWS SDK library. It's a workaround, granted, but I don't see a better way of doing it.

Related

Apache Flink - how to stop and resume stream processing on downstream failure

I have a Flink application that consumes incoming messages on a Kafka topic with multiple partitions, does some processing then sends them to a sink that sends them over HTTP to an external service. Sometimes the downstream service is down the stream processing needs to stop until it is back in action.
There are two approaches I am considering.
Throw an exception when the Http sink fails to send the output message. This will cause the task and job to restart according to the configured restart strategy. Eventually the downstream service will be back and the system will continue where it left off.
Have the Sink sleep and retry on failure; it can do this continually until the downstream service is back.
From what I understand and from my PoC, with 1. I will lose exactly-least once guarantees since the sink itself is external state. As far as I can see, you cannot make a simple HTTP endpoint transactional, as it needs to be to implement TwoPhaseCommitSinkFunction.
With 2. this is less of an issue since pipeline will not proceed until the sink makes a successful write, and I can rely on back pressure throughout the system to pause the retrieval of messages from the Kafka source.
The main questions I have are:
Is it a correct assumption that you can't make a TwoPhaseCommitSinkFunction for a simple HTTP endpoint?
Which of the two strategies, or neither, makes the most sense?
Am I missing simpler obvious solutions?
I think you can try AsyncIO in Flink - https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/.
Try to make the HTTP endpoint send a response once all operation has been done for the request, e.g. In http server, the process for the request has been done and the result has been committed to DB. Then use a http async client in AsyncIO operator. The AsyncIO operator will wait until the response is received by the operator. If any error happened, the Flink streaming pipeline will fail and restart the pipeline based on recovery strategy.
All requests to HTTP endpoint without receiving response will be in the internal buffer of AsyncIO operator, and once streaming pipeline failed, the requests pending in the buffer will be saved in the checkpoint state. It will also trigger back pressure when the internal buffer is full.

For Cloud Run triggered from PubSub, when is the right time to send ACK for the request message?

I was building a service that runs on Cloud Run that is triggered by PubSub through EventArc.
'PubSub' guarantees delivery at least one time and it would retry for every acknowledgement deadline. This deadline is set in the queue subscription details.
We could send an acknowledgement back at two points when a service receives a pub-sub request (which is received as a POST request in the service).
At the beginning of the request as soon as the request was received. The service would then continue to process the request at its own pace. However, this article points out that
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So sending a response at the beginning may not be an option
After the request has been processed by the service. So this would mean that, depending on what the service would do, we cannot always predict how long it would take to process the request. Hence we cannot set the Acknowledgement deadline correctly, resulting in PubSub retries and duplicate requests.
So what is the best practice here? Is there a better way to handle this?
Best practice is generally to ack a message once the processing is complete. In addition to the Cloud Run limitation you linked, consider that if the endpoint acked a message immediately upon receipt and then an error occurred in processing it, your application could lose that message.
To minimize duplicates, you can set the ack deadline to an upper bound of the processing time. (If your endpoint ends up processing messages faster than this, the ack deadline won’t rate-limit incoming messages.) If the 600s deadline is not sufficient, you could consider writing the message to some persistent storage and then acking it. Then, a separate worker can asynchronously process the messages from persistent storage.
Since you are concerned that you might not be able to set the correct "Acknowledgement Deadline", you can use modify_ack_deadline() in your code where you can dynamically extend your deadline if the process is still running. You can refer to this document for sample code implementations.
Be wary that the maximum acknowledgement deadline is 600 seconds. Just make sure that your processing in cloud run does not exceed the said limit.
Acknowledgements do not apply to Cloud Run, because acks are for "pull subscriptions" where a process is continuously pulling the Cloud PubSub API.
To get events from PubSub into Cloud Run, you use "push subscriptions" where PubSub makes an HTTP request to Cloud Run, and waits for it to finish.
In this push scenario, PubSub already knows it made you a request (you received the event) so it does not need an acknowledgement about the receipt of the message. However, if your request sends a faulty response code (e.g. http 500) PubSub will make another request to retry (and this is configurable on the Push Subscription itself).

How to ensure ordering in camel context polling weblogic jms queue

I am trying to setup a camel context that polls a weblogic jms queue, consumes the message and sends it to a webservice endpoint. Incase any error occurs in the transaction or the target system is unavailable, I need to redeliver the same message without losing sequence/ordering.
I have set up a camel jms route with single consumer and enabled transacted attribute as per https://camel.apache.org/transactional-client.html and set the redelivery as unlimited.
When the transaction fails with messageA, the jms message consumption from weblogic queue is rollbacked and the messageA is marked for redelivery (state string is marked delayed) in weblogic. But during this time if another message reaches the weblogic queue, the camel route picks the messageB and forwards it to the target endpoint although the messageA is still in retrying mode. This distorts the whole ordering of the messages.
The transaction client is used to ensure that messages are not lost while the application is shutdown during redelivery.
I expect that there are no message loss and the messages are always sent in the correct order as per generated into the weblogic queue to the target endpoint.
That a newly arrived message outpaces an existing message that has to be redelivered, sounds like a broker (Weblogic) issue or feature.
I never saw this behavior with ActiveMQ. A failed message that is redelivered, is redelivered immediately by the Apache broker.
It sounds like the message is "put aside" internally to redeliver it later. This can absolutely make sense to avoid blocking message processing.
Is there something like a "redelivery delay" set on Weblogic you can configure? I could imagine that a delayed redelivery is like an internal error queue with a scheduled consumer.

JMS, how to implement request-reply pattern using two message brokers

Is it possible to implement asynchronous request-reply pattern using two JMS message broker instances? Such that service A sends message to queue A, and gets the response from queue B (different broker instance)?
Does JMS API (or Apache Camel) provide some complete mechanism to achieve this transparently with correlation identifiers? What is the necessary configuration?
Bonus question:
To shuffle deck even more, I would like to cluster the queues. Could this be achieved transparently as per specification? Basically I have multiple Spring boot applications (services) with ActiveMQ broker embedded in the Spring context. Each broker acts as an one-way channel for the service, and each service excepts a response for a specific message to other service in it's own broker. Now, I would like run multiple instances of each services and retain the correlation between messages.
Since your request sender (A) and response receiver (B) are two different services - it becomes your responsiblity to store context of original request somewhere.
Probably the easiest is to use a dedicated queue for this purpose (let's call it echo).
(A) could store original request message with some specific correlation Id, for example taken from message Id of request sent. The same correlation Id is supposed to be set for response message, which will be taken by (B). So once response is recieved by (B), it is able to get the original request from echo queue, using selector with the same correlation Id.

Asynchronous delivery support using Apache Camel routes and ActiveMQ

Please help me in finalizing architecture for Asynchronous delivery support using Apache Camel and ActiveMQ and below I have explained point by point basis about my requirement.
I have Jetty Server receiving incoming messages and ActiveMQ to store it in disk using Kaha DB.
Active MQ sends ack once it stores in kaha DB back to client.
I have Spring AbstractPollingMessageListenerContainer JMS Message listener which picks up the message from activemq queue every 1 second and dispatch to Camel HTTP endpoints and then finally sent to actual remote receivers.
Once Dispatcher thread gets response from remote receivers it deletes message from ActiveMQ.
Assume that I have many slow remote receivers in that case my Dispatcher thread created by AbstractPollingMessageListenerContainer remains blocked until I get response from remote receivers. This results to creation of new Dispatcher threads since already created Dispatcher threads are not able to dispatch new messages from ActiveMQ queue.
Now creation of many Dispatcher threads result into more CPU usage which impacts overall performance.
Now my requirement is I want Dispatcher thread only to dispatch messages from ActiveMQ queue to HTTP endpoint and forget and also not do acknowledgement so that message is still in queue.
Also I will not let Dispatcher thread to wait till I get response so I have thought to handle response using separate thread and this same thread will only delete message from ActiveMQ queue.
So my current architecture is like below:
Camel Jetty Server ----> ActiveMQ queue ----> Dispatcher Thread ---> Camel Direct endpoint ----> Camel HTTP endpoint ---> remote receivers sending response back ---> response ---> Dispatcher Thread (sends ack to delete messages from ActiveMQ queue) ----> ActiveMQ Queue.
Here I feel since we are using Direct endpoint which is synchronous so Dispatcher thread remains active till it gets response and so same dispatcher thread is not able to process further new message from ActiveMQ queue.
Please suggest if some thing else I can use here to avoid Direct endpoint.
I used SEDA endpoint but drawback is it processes 1 message using 1 thread and also gets blocked till it gets response from receivers.
In this approach previously Dispatcher thread gets blocked but now Seda consumer threads gets blocked and could not dispatch new messages from in memory queue of SEDA towards remote receivers.
I feel some kind of design which helps me in keep on sending message to remote receivers and only when response comes back some daemeon thread gets notified and it will handle acknowledgement towards activeMQ. Also I thought to use NIO framework implementation like Camel netty/netty4-http component but could not find exact usage and how to fit it in current architecture.
Modified architecture should be like below:
Camel Jetty Server ----> ActiveMQ queue ----> Dispatcher Thread--->Unknown Stuff ----> Camel HTTP Endpoint ---> remote receivers sending response back--->Unknown Stuff (sends ack to delete messages from ActiveMQ queue) ----> ActiveMQ Queue
Please help me in finalising Unknown Stuff and I am posting my query after doing enough R & D.
Also new ideas are welcome and please give me idea with a restriction that I must persist the message and delete it only after getting success response from remote receiver. Also I have to design architecture only using Apache Camel routes.
Route Definitions:
1. Dispatcher Route:
from(fromUri)to(toUris);
fromUri:
[ActiveMQueue.http1270018081testEndpoint1:queue:ActiveMQueue?maxConcurrentConsumers=15&concurrentConsumers=3&maxMessagesPerTask=10&messageListenerContainerFactoryRef=AbstractPollingMessageListenerContainer ]
ToUris:
[ActiveMQ.DLQ:queue:ActiveMQ.DLQ, direct:http1270018081testEndpoint1]
2.Remote Receiver Proxy Route:
fromUri:direct:http1270018081testEndpoint1
from(fromUri).to(toUri).process(responseProcessor)
toUri:http://127.0.0.1:8081/testEndpoint1?bridgeEndpoint=true
responseProcessor: To process response received by remote receiver.
Overall Route looks like below:
Dispatcher Route---> Remote Receiver Route---> Remote Server
JMS message acknowledgement is done under the covers, so your only way to really "send the acknowledgement back to the queue" is to use a JMS transaction (doesn't need to be XA)
It sounds like a LLR-style transaction would be useful and drastically simplify things for you. If you consume the message from the queue using a JMS-local transaction, and only have one other endpoint, the message will only be acknowledged and removed from the queue when the http send is completed-- even though HTTP doesn't support transactions. You can then have a number of concurrent consumers to run in parallel and combine with throttling to help with rate limiting.
from: amq:queue:INPUT.REQUESTS?.. concurrentConsumers.. and transacted enabled
throttle
to: http://url

Resources