Can you configure Camel Saga compensation retries? - apache-camel

Given this saga definition:
from("direct:bookTrip")
.saga(
.setHeader("id", header(Exchange.TIMER_COUNTER))
.setHeader(Exchange.HTTP_METHOD, constant("POST"))
.log("Executing saga #${header.id}")
.to("http4://camel-saga-train-service:8080/api/train/buy/seat")
.to("http4://camel-saga-flight-service:8080/api/flight/buy");
rest().post("/train/buy/seat")
.param().type(RestParamType.header).name("id").required(true).endParam()
.route()
.saga()
.propagation(SagaPropagation.MANDATORY)
.option("id", header("id"))
.compensation("http4://camel-saga-payment-service:8080/api/cancel&type=train")
.log("Buying flight #${header.id}")
.to("http4://camel-saga-payment-service:8080/api/pay?bridgeEndpoint=true&type=train")
.log("Payment for train #${header.id} done");
rest().post("/flight/buy")
.param().type(RestParamType.header).name("id").required(true).endParam()
.route()
.saga()
.propagation(SagaPropagation.MANDATORY)
.option("id", header("id"))
.compensation("http4://camel-saga-payment-service:8080/api/cancel&type=flight")
.log("Buying flight #${header.id}")
.to("http4://camel-saga-payment-service:8080/api/pay?bridgeEndpoint=true&type=flight")
.log("Payment for flight #${header.id} done");
Imagine the /flight/buy call failed, then the train compensation would be executed -> .compensation("http4://camel-saga-payment-service:8080/api/cancel&type=train")
Given that this is an HTTP call, the compensation invocation could fail. Is there a way to configure the number of retries and backoff period of the compensation process?
Currently it tries to compensate 5 times:
Exception thrown during compensation at xxx. Attempt 1 of 5

Related

Google Cloud Run pubsub pull listener app fails to start

I'm testing pubsub "pull" subscriber on Cloud Run using just listener part of this sample java code (SubscribeAsyncExample...reworked slightly to fit in my SpringBoot app):
https://cloud.google.com/pubsub/docs/quickstart-client-libraries#java_1
It fails to startup during deploy...but while it's trying to start, it does pull items from the pubsub queue. Originally, I had an HTTP "push" receiver (a #RestController) on a different pubsub topic and that worked fine. Any suggestions? I'm new to Cloud Run. Thanks.
Deploying...
Creating Revision... Cloud Run error: Container failed to start. Failed to start and then listen on the port defined
by the PORT environment variable. Logs for this revision might contain more information....failed
Deployment failed
In logs:
2020-08-11 18:43:22.688 INFO 1 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 4606 ms
2020-08-11T18:43:25.287759Z Listening for messages on projects/ce-cxmo-dev/subscriptions/AndySubscriptionPull:
2020-08-11T18:43:25.351650801Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x31,0x3eca02dfd974,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11T18:43:25.351770555Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x12,0x3eca02dfd97c,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11 18:43:25.680 WARN 1 --- [ault-executor-0] i.g.n.s.i.n.u.internal.MacAddressUtil : Failed to find a usable hardware address from the network interfaces; using random bytes: ae:2c:fb:e7:92:9c:2b:24
2020-08-11T18:45:36.282714Z Id: 1421389098497572
2020-08-11T18:45:36.282763Z Data: We be pub-sub'n in pull mode2!!
Nothing else after this and the app stops running.
#Component
public class AndyTopicPullRecv {
public AndyTopicPullRecv()
{
subscribeAsyncExample("ce-cxmo-dev", "AndySubscriptionPull");
}
public static void subscribeAsyncExample(String projectId, String subscriptionId) {
ProjectSubscriptionName subscriptionName =
ProjectSubscriptionName.of(projectId, subscriptionId);
// Instantiate an asynchronous message receiver.
MessageReceiver receiver =
(PubsubMessage message, AckReplyConsumer consumer) -> {
// Handle incoming message, then ack the received message.
System.out.println("Id: " + message.getMessageId());
System.out.println("Data: " + message.getData().toStringUtf8());
consumer.ack();
};
Subscriber subscriber = null;
try {
subscriber = Subscriber.newBuilder(subscriptionName, receiver).build();
// Start the subscriber.
subscriber.startAsync().awaitRunning();
System.out.printf("Listening for messages on %s:\n", subscriptionName.toString());
// Allow the subscriber to run for 30s unless an unrecoverable error occurs.
// subscriber.awaitTerminated(30, TimeUnit.SECONDS);
subscriber.awaitTerminated();
System.out.printf("Async subscribe terminated on %s:\n", subscriptionName.toString());
// } catch (TimeoutException timeoutException) {
} catch (Exception e) {
// Shut down the subscriber after 30s. Stop receiving messages.
subscriber.stopAsync();
System.out.printf("Async subscriber exception: " + e);
}
}
}
Kolban question is very important!! With the shared code, I would like to say "No". The Cloud Run contract is clear:
Your service must answer to HTTP request. Out of request, you pay nothing and no CPU is dedicated to your instance (the instance is like a daemon when no request is processing)
Your service must be stateless (not your case here, I won't take time on this)
If you want to pull your PubSub subscription, create an endpoint in your code with a Rest controller. While you are processing this request, run your pull mechanism and process messages.
This endpoint can be called by Cloud Scheduler regularly to keep the process up.
Be careful, you have a max request processing timeout at 15 minutes (today, subject to change in a near future). So, you can't run your process more than 15 minutes. Make it resilient to fail and set your scheduler to call your service every 15 minutes

AggregationStrategy warns on timeout all the time

Why does AggregationStrategy implementation always log a warning when it times out? I do not see any exchange/data loss in the aggregation, when this happens.
AggregateProcessor calls this timeout method when the completionTimeout requirement has been met. Any logging of that event could be debug or informational, but shouldn't rise to a warning.
2020-06-25 16:06:54.454 WARN 1 --- [eTimeoutChecker] o.e.s.e.a.ElasticBulkAggregationStrategy : Parallel processing timed out after 1000 millis for number -1. This task will be cancelled and will not be aggregated.
Here is the aggregate portion of my route.
from (direct:...)
...
.aggregate(constant(true)).id("aggregator"+id)
.aggregationStrategyRef("elasticAggregationStrategy")
.completionSize(aggregatorbatchSize)
.completionTimeout(aggregatorbatchTimeout)
.to("seda:aggregatedPayload")
.end()
It is harmless. There is open Jira CAMEL-15244 to remove/reduce severity of the message.
You have following options:
Ignore the warning
Submit PR - please add Jira comment before working on task
Wait until someone resolves that

Apache Camel - Parallel Routes Inflight Exchanges

I have a Camel context with many routes that starts every 15m with Timer Component.
These routes set some properties in exchange (Target host, Query and Current Date that I use a Processor to get date, -12 hours and transform to GMT).
After set these properties, using Direct, another route is called to execute the HTTP Get. When the Request finished, another Route is called to Post the return on Artemis ActiveMQ.
The project is deployed on Wildfly 13.
The problem is:
Sometimes the routes simply freeze. Don't start after 15 minutes.
When I try to stop/start the route, I got the follow log:
[0m[0m08:27:45,230 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) There are 1 inflight exchanges: InflightExchange: [exchangeId=ID-exchange-ID, fromRouteId=Route1, routeId=GetDataAutoBySinceTime, nodeId=toD7, elapsed=0, duration=216958569]
[0m[0m08:27:46,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 299 seconds. Inflights per route: [Route1 = 1]
[0m[0m08:27:46,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) There are 1 inflight exchanges: InflightExchange: [exchangeId=ID-exchange-ID, fromRouteId=Route1, routeId=GetDataAutoBySinceTime, nodeId=toD7, elapsed=0, duration=216959570]
[0m[0m08:27:47,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 298 seconds. Inflights per route: [Route1 = 1]
I don't know if some processes are stuck making it impossible another processes to start.
I thought to remove the generic routes (PostMessageInActiveMQ and
GetDataAutomaticallyBySinceTime and to implements the same code in another routes (Route1, Route2 and Route3) but I don't think this is the best approach.
Routes:
Route1 (Route2 and Route3 are almost the same, just change properties values)
from("timer:Route1Timer?period=15m")
.routeId("Route1")
.autoStartup(false)
.setProperty("targetAddress", simple("hostname.route1"))
.process(new GetCurrentDate())
.setProperty("query",
simple("DataQuery%26URI=Route1%26format=xml%26Mode=since-time%26p1=${header.currentDate}"))
.to("direct:GetDataAutoBySinceTime");
GetDataAutomaticallyBySinceTime
from("direct:GetDataAutoBySinceTime")
.routeId("GetDataAutoBySinceTime")
.autoStartup(true)
.removeHeaders("*")
.setHeader(Exchange.HTTP_METHOD, constant("GET"))
.toD("http4:${header.targetAddress}/command=${header.query}%26httpClient.socketTimeout=3000")
.convertBodyTo(String.class, "utf-8")
.to("direct:PostMessageInActiveMQ");
PostMessageInActiveMQ
CamelArtemisComponent components = new CamelArtemisComponent();
getContext().addComponent("artemis", components.getArtemisComponent());
from("direct:PostMessageInActiveMQ")
.routeId("PostMessageInActiveMQ")
.autoStartup(true)
.convertBodyTo(String.class, "utf-8")
.inOnly("artemis:ARTEMIS.QUEUE");
Entire code: https://github.com/vitorvr/camel-example
EDIT:
Camel Version: 2.22.0

CamelExchangeException Invalid Correlation Key on exchange header

Apache Camel is throwing Invalid Correlation Key exception when trying to aggregate messages from my AWS SQS queue.
The messages were placed in the queue using ZipSplitter and they all appear in the queue with matching "parentId" values (which I added using a random uuid as part of the splitting -I've tried CamelSourceFile as well). I get the Exception repeatedly until the retries are exhausted.
My aggregate expression:
from(--queue--).aggregate(header("parentId"), customAggregationStrategy).completionTimeout(3000).processor(new Processor() {...}.to(--next queue--);
There is no logging emitted from my customAggregationStrategy nor from any of the subsequent processors. It fails to aggregate:
... DeadLetterChannel - Failed delivery for (MessageId: ...). On Delivery attempt: 0 caught ...CamelExchangeException: Invalid correlation key. Exchange[ID...]
The delivery attempt is 0 through 9 for my retry attempts.
The infuriating thing is that the code works everywhere but locally...which you think would narrow things down, but neither the exception nor anything else logged sheds any light onto what is going on here.
You could try to use Camel simple language when expressing the correlation key, ie:
.aggregate(simple("${headers.parentId}", customAggregationStrategy)
This way, the exceptions might be silently ignored ?
Did you activate Camel tracer (http://camel.apache.org/tracer.html) to analyse your exchanges and ease the debugging ?
I suspect you have an Exchange which does NOT have the "parentId" header. If you want to skip them, just activate the ignoreInvalidCorrelationKeys option (see http://camel.apache.org/aggregator2.html)

Handle exception from load balancer only once (when failover is exhausted)

My code consumes jms queue and through lb redirects to external http client.
I need to log original message for every failed delivery to local directory.
Problem is that onException is caught by each failover.
Is there any way how to achieve this?
Pseudo code:
onException(Exception.class).useOriginalMessage()
.setHeader(...)
.to("file...")
.setHeader(...)
.to("file...")
from("activemq...")
.process(...)
.loadBalance().failover(...)
.to("lb-route1")
.to("lb-route2")
.end()
.process()
.to("file...")
from("lb-route1")
.recipientList("dynamic url")
.end()
from("lb-route2")
.recipientList("dynamic url")
.end()
I have not tested this logic with multiple to statements, but when I have my list of endpoints in an array this logic functions just fine. I will attempt to deliver to the first endpoint, if I cannot I will attempt to deliver this to the second endpoint. If an exception occurs at the second endpoint it will propagate back to the route's error handler. If you need it to round robin instead, change the last false on the failover statement to a true.
String[] endpointList = {"direct:end1", "direct:end2"};
from("direct:start")
.loadbalance().failover(endpointList.length-1, false, false)
.to(endpointList);

Resources