Camel idempotentConsumer always uses PUT instead of GET - apache-camel

I am using the Camel idempotent consumer. Can someone please explain the logic behind the idempotentConsumer XML tag?
I received a file for the first time. All good: the idempotentConsumer block executed, and on the Infinispan server I see a PUT in the log.
I then dropped a duplicate file. The idempotentConsumer identifies the duplicate, but on the Infinispan server I again see a PUT in the log instead of a GET. Is this an issue on the server side or in the Camel client?
<idempotentConsumer messageIdRepositoryRef="infinispanRepo">
    <header>CamelFileAbsolutePath</header>
</idempotentConsumer>

No, this is working as designed. The Idempotent Consumer EIP attempts to put the key into the cache with a fixed value of true, which is an atomic operation on Infinispan. The result of that put operation is then used to determine whether the message was a duplicate.
If you did two operations, a GET followed by a PUT, it would no longer be atomic and you could end up with race conditions.
See the code at:
https://github.com/apache/camel/blob/master/components/camel-infinispan/src/main/java/org/apache/camel/component/infinispan/processor/idempotent/InfinispanIdempotentRepository.java#L68
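
To illustrate the pattern (this is not Camel's actual source, just a sketch of the idea with a plain ConcurrentMap standing in for the Infinispan cache):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of an idempotent "add" built on a single atomic operation.
// The map stands in for the Infinispan cache; Camel's real repository
// is in the class linked above.
public class AtomicIdempotentCheck {
    private final ConcurrentMap<String, Boolean> cache = new ConcurrentHashMap<>();

    // Returns true if the key was new (process the message),
    // false if it was already present (duplicate).
    public boolean add(String key) {
        // putIfAbsent is one atomic write; no separate GET is needed,
        // which is why the server log only ever shows a PUT.
        return cache.putIfAbsent(key, Boolean.TRUE) == null;
    }
}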

Related

AsyncIO Exceptions in Apache Flink

In Apache Flink, I'm using the RichAsyncFunction for data enrichment. In the case of errors/exceptions, I want to funnel those error records into an error stream. I can see that other functions have a "side output" for this sort of scenario, but how is it handled in RichAsyncFunction? I also see use of ResultFuture<>.completeExceptionally, but what does this do or mean when it occurs? Does the stream stop, is it just logged, what is the state with regards to the output element of the stream? All the docs seem to just point out how to handle the happy path or to call completeExceptionally with no explanation of what happens next. What is the proper way to handle/capture errors in RichAsyncFunction?
Thanks!

Flink: what's the best way to handle exceptions inside Flink jobs

I have a Flink job that reads from Kafka topics and goes through a bunch of operators. I'm wondering what's the best way to deal with exceptions that happen in the middle.
My goal is to have a centralized place to handle those exceptions that may be thrown from different operators and here is my current solution:
Use a ProcessFunction and, in the catch block when an exception occurs, send the record to a side output via the context; then have a separate sink function for the side output at the end, where it calls an external service to update the status of another related job.
However, my question is that by doing so it seems I still need to call collector.collect() and pass in a null value in order to proceed to the following operators and hit the last stage, where the side output flows into the separate sink function. Is this the right way to do it?
Also, I'm not sure what actually happens if I don't call collector.collect() inside an operator: would it hang there and cause a memory leak?
It's fine to not call collector.collect(). And you don't need to call collect() with a null value when you use the side output to capture the exception - each operator can have its own side output. Finally, if you have multiple such operators with a side output for exceptions, you can union() the side outputs together before sending that stream to a sink.
If for some reason the downstream operator(s) need to know that there was an exception, then one approach is to output an Either<good result, Exception>, but then each downstream operator would of course need to have code to check what it's receiving.
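A rough sketch of the side-output approach (the String payloads, the enrich() step and the "errors" tag name are placeholders, not from the original job):
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Hypothetical enrichment step: good records go to the main output,
// failed records go to a side output. Types and names are illustrative.
public class EnrichFunction extends ProcessFunction<String, String> {

    // anonymous subclass so Flink can extract the type information
    public static final OutputTag<String> ERRORS = new OutputTag<String>("errors") {};

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        try {
            out.collect(enrich(value));                       // happy path
        } catch (Exception e) {
            // no collect(null) needed: route the bad record to the side output
            ctx.output(ERRORS, value + " failed: " + e.getMessage());
        }
    }

    private String enrich(String value) {
        return value.toUpperCase();                           // stand-in for real enrichment
    }
}
Downstream you would read the error records with stream.getSideOutput(EnrichFunction.ERRORS), and several such side outputs can be union()-ed into one stream before the error sink.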

DynamoDB ConditionalCheckFailedException thrown but succeeds

I think I have seen on many occasions that a DynamoDB conditional put throws ConditionalCheckFailedException but still succeeds. Usually in this scenario the request takes quite long (~10s) to finish, but I can see that the item is updated despite the fact that a ConditionalCheckFailedException is thrown (and that it took a few seconds).
By the way, I don't force any timeout on the DDB request.
Is this a bug, or some DDB conditional-put contract that I misunderstand? Has anyone experienced this issue?
Answering this late to inform others:
ConditionalCheckFailedException but the item is persisted:
This typically happens when you save an item to DynamoDB and DynamoDB acknowledges the write request, but the response gets lost on the return path, which can happen for multiple reasons; keep in mind that DynamoDB is one of the largest distributed systems in the cloud.
This causes the SDK to exceed its timeout while awaiting a response, which then triggers an SDK retry. When the write request is retried, the condition now evaluates to false because the item already exists, which in turn throws a ConditionalCheckFailedException and can cause confusion.
When I receive a ConditionalCheckFailedException, I typically do a strongly consistent GetItem request for the item to ensure it exists with the values I expect, and move on.
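With the AWS SDK for Java v2 that follow-up read could look roughly like this (table name, key and attribute names are placeholders):
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import software.amazon.awssdk.services.dynamodb.model.GetItemResponse;

// Sketch: after a ConditionalCheckFailedException, verify the item with a
// strongly consistent read before treating the write as a real failure.
public class ConditionalPutCheck {
    public static boolean itemLooksAsExpected(DynamoDbClient ddb, String id, String expectedStatus) {
        GetItemResponse resp = ddb.getItem(GetItemRequest.builder()
                .tableName("my-table")                                        // placeholder table
                .key(Map.of("id", AttributeValue.builder().s(id).build()))    // placeholder key
                .consistentRead(true)                                         // strongly consistent read
                .build());
        return resp.hasItem()
                && resp.item().get("status") != null
                && expectedStatus.equals(resp.item().get("status").s());
    }
}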

apache flink - the correct way of error handling

I wonder if there is an option for built-in error handling in Flink.
There may be 2 cases:
the current message from Kafka (in my case) is invalid; continue to the next one
an uncaught exception - from what I saw, it can stop the stream aggregation completely.
How can I handle these 2 cases? (Java code)
1) This is done idiomatically with a flatMap: if your message is valid, you emit a list containing your valid element (possibly already processed in the same step). If it's not valid, you simply return an empty list so that no elements are produced by that step. I could provide Scala code, but I'm not familiar with the Java APIs and I don't want to put you off track; just check the flatMap call.
2) This depends on the type of exception: if it's raised by your own code, just catch it and handle it inside the operator, or simply log it and move on. Without any further information about a specific case, this is the best I know of, but again, coming from Scala I haven't experienced runtime exceptions.
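Since the question asks for Java, here is a minimal sketch covering both cases with a flatMap; Integer.parseInt stands in for whatever validation or parsing your messages need:
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Case 1: invalid messages are simply dropped by emitting nothing.
// Integer.parseInt is a placeholder for your real validation/parsing logic.
public class ParseOrSkip implements FlatMapFunction<String, Integer> {

    @Override
    public void flatMap(String raw, Collector<Integer> out) {
        try {
            out.collect(Integer.parseInt(raw.trim()));   // valid: emit one element
        } catch (NumberFormatException e) {
            // Case 2, for exceptions raised by your own code: catch, log and
            // move on. Emitting nothing here just drops the bad record and
            // keeps the job running.
        }
    }
}
You would plug it in with stream.flatMap(new ParseOrSkip()).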

Camel condition on aggregate of messages

I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll from the directory every x days and fetch all the files (file://...). I route this into an aggregation that aggregates the files into a single size (directorySize). I then check if this size passes a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. Problem is that you can't make a consumer fetch on demand as far as I know. I tried using pollEnrich, but that will only fetch a single file and not all files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I'm considering doing:
Rewrite the aggregation strategy to not only aggregate the size, but also the files themselves into a groupedExchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot of boilerplate, both in code and at runtime.
Move the file size calculator to a processor instead of the aggregator. This would defeat the purpose of using Camel in the first place: I would manually be fetching the files and adding up the sizes, and that for every single file.
Use a ControlBus to dynamically start the delete route on that directory. Once again a lot of workaround to achieve something that I feel should be able to be done in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved.
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)
Camel is really awesome, but sometimes it sure is difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
// from GroupedExchangeAggregationStrategy: the grouped exchanges live in a
// list stored as an exchange property
List<Exchange> list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
// every incoming exchange gets added to that list
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
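A minimal sketch of that manual variant, assuming a Camel 2.x-style AggregationStrategy; the property names ("myGroupedFiles", "myDirectorySize") are invented for the example:
import java.util.ArrayList;
import java.util.List;
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;

// Sketch only: keep every incoming file exchange in a list property and
// track the running total size in another property.
public class FileSizeAndListStrategy implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        Exchange result = oldExchange == null ? newExchange : oldExchange;

        List<Exchange> files = result.getProperty("myGroupedFiles", List.class);
        if (files == null) {
            files = new ArrayList<>();
            result.setProperty("myGroupedFiles", files);
        }
        files.add(newExchange);

        long total = result.getProperty("myDirectorySize", 0L, Long.class);
        long fileSize = newExchange.getIn().getHeader(Exchange.FILE_LENGTH, 0L, Long.class);
        result.setProperty("myDirectorySize", total + fileSize);
        return result;
    }
}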
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz"), and you define the route that consumes from this queue with .noAutoStartup(). Then you can send messages to it and they will queue up on an internal queue managed by Camel. When you want to process the messages, you simply start the route via the control bus and stop it again afterwards.
Generally, messing around with starting and stopping routes should be avoided unless absolutely necessary, but that's certainly another way to do it.
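In the Java DSL the idea looks roughly like this (the route id and endpoint names are made up for the example):
import org.apache.camel.builder.RouteBuilder;

// Sketch of alternative 2: copies are parked on a SEDA queue whose consuming
// route is not started; the control bus starts it on demand.
public class ParkedFilesRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // defined but not started, so messages sent to seda:parkedFiles just queue up
        from("seda:parkedFiles")
            .routeId("drainParkedFiles")
            .noAutoStartup()
            .to("log:drained");

        // elsewhere, once the threshold is passed:
        // .to("controlbus:route?routeId=drainParkedFiles&action=start")
        // ... and stop it again afterwards with action=stop
    }
}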
Suggested solution
I suggest you do as you have done (i.e. alternative 1):
aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} < 123"))
"Unwind" your aggregation via a splitter(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
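Put together, a rough sketch of that route could look like this, assuming the Camel version this answer is based on, where GroupedExchangeAggregationStrategy stores the grouped exchanges in the CamelGroupedExchange property; the endpoints, the completion timeout and the 123-byte threshold are placeholders:
import java.util.List;
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.GroupedExchangeAggregationStrategy;

// Rough sketch of the steps above in the Camel 2.x Java DSL.
public class CleanDirectoryRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:inbox?noop=true")
            // 1. group all file exchanges into a list (CamelGroupedExchange property)
            .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
                .completionTimeout(1000)
            // 2. compute the total size of the grouped files and put it in the body
            .process(exchange -> {
                List<Exchange> grouped = exchange.getProperty(Exchange.GROUPED_EXCHANGE, List.class);
                long total = 0;
                for (Exchange e : grouped) {
                    total += e.getIn().getHeader(Exchange.FILE_LENGTH, 0L, Long.class);
                }
                exchange.getIn().setBody(total);
            })
            // 3. only continue when the size condition from the list above holds
            .filter(simple("${body} < 123"))
            // 4. unwind the aggregation back into the individual file exchanges
            .split(simple("${property.CamelGroupedExchange}"))
                // 5. delete each file here, e.g. in a Processor
                .to("log:deleting");
    }
}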
Please let me know if this doesn't make sense, or if I have misunderstood your problem in any way.
