I am executing some external processes using camel-exec component.
These might be long-running (or need to be stopped for some other business reason), and I would like to be able to kill some inflight exchanges based on header values.
So far so good. I can filter the exchange(s) I want to kill; I mark the exchange with e.setProperty(Exchange.ROUTE_STOP, Boolean.TRUE); and remove it from the InflightRepository.
But what I need is to forcibly stop the execution of the current task and stop the exchange from further routing.
What would you propose? Any ideas?
What I did to solve it is the following:
Created two caches: an execCommandCache<aKey, org.apache.camel.component.exec.ExecCommand> and a processCache<org.apache.camel.component.exec.ExecCommand, java.lang.Process>. Whenever a new executable is started, I add it to these caches.
When I need to cancel an executable, I browse the InflightRepository and filter on the aKey Exchange header. Then I retrieve the ExecCommand from the execCommandCache, and with this I can find the actual Process in the processCache.
Finally, I use the ZeroTurnaround zt-process-killer to kill the Process and mark the exchange as done via exchange.setProperty(Exchange.ROUTE_STOP, Boolean.TRUE);:
ProcessUtil.destroyGracefullyOrForcefullyAndWait(process, 5, TimeUnit.SECONDS, 5, TimeUnit.SECONDS);
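Condensed, the whole cancel path looks roughly like this (a sketch: the cache wiring and the aKey lookup are from my setup, not anything camel-exec provides):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import org.apache.camel.Exchange;
import org.apache.camel.component.exec.ExecCommand;
import org.zeroturnaround.process.ProcessUtil;
import org.zeroturnaround.process.Processes;
import org.zeroturnaround.process.SystemProcess;

public class ExecCanceller {

    // populated whenever a new executable is started
    private final Map<String, ExecCommand> execCommandCache = new ConcurrentHashMap<>();
    private final Map<ExecCommand, Process> processCache = new ConcurrentHashMap<>();

    /** Kills the external process behind aKey and stops the exchange's routing. */
    public void cancel(String aKey, Exchange exchange) throws Exception {
        ExecCommand command = execCommandCache.get(aKey);
        Process process = processCache.get(command);

        // zt-process-killer: try a graceful destroy first, then a forceful one,
        // waiting up to 5 seconds for each
        SystemProcess sp = Processes.newStandardProcess(process);
        ProcessUtil.destroyGracefullyOrForcefullyAndWait(sp, 5, TimeUnit.SECONDS, 5, TimeUnit.SECONDS);

        // prevent the exchange from being routed any further
        exchange.setProperty(Exchange.ROUTE_STOP, Boolean.TRUE);
    }
}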
It's not very clean but it does the trick. Hope it helps.
I have implemented a source for our message queue (which Flink doesn't support out of the box) by extending RichSourceFunction.
My run method currently looks like this:
override def run(sc: SourceFunction.SourceContext[String]): Unit = {
  val msg = read_from_mq // pseudocode: read one message from our queue
  sc.collect(msg)
}
When the run method is called and there is no new message in the message queue, should I
return without calling sc.collect, or
wait until new data arrives (in which case the run method blocks)?
I would prefer the 2nd one, but I'm not sure if this is the correct usage.
The run method of a Flink source should loop, endlessly producing output until its cancel method is called. When there's nothing to produce, it's best if you can find a way to do a blocking wait.
The Apache NiFi source connector is another reasonable example to use as a model. You will note that it sleeps for a configurable interval when there's nothing for it to do.
As you probably know, both options are functionally correct and will yield correct results.
This being said, the second one is preferred because you're not holding the thread busy. In fact, if you take a look at the RabbitMQ connector implementation, you'll notice that this is exactly how it is implemented: inside its run it indirectly waits for messages to be placed on a BlockingQueue.
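To make the pattern concrete, here is a minimal Java sketch of such a source (the BlockingQueue standing in for the MQ client is an assumption of the sketch):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext;

public class MqSource extends RichSourceFunction<String> {

    private volatile boolean isRunning = true;

    // stand-in for the real MQ client: the client would push messages here
    private transient BlockingQueue<String> queue;

    @Override
    public void open(Configuration parameters) {
        queue = new LinkedBlockingQueue<>();
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (isRunning) {
            // blocking wait with a timeout, so cancellation is still noticed
            // when no messages arrive for a while
            String msg = queue.poll(1, TimeUnit.SECONDS);
            if (msg != null) {
                // emit under the checkpoint lock to stay checkpoint-safe
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(msg);
                }
            }
        }
    }

    @Override
    public void cancel() {
        isRunning = false;
    }
}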
I'm consuming messages from SQS FIFO queue with maxMessagesPerPoll=5 set.
Currently I'm processing each message individually which is a total waste of resources.
In my case, as we are using a FIFO queue and all of those 5 messages are related to the same object, I could process them all together.
I thought this might be done by using the aggregate pattern, but I wasn't able to get any results.
My consumer route looks like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.process(exchange -> {
// process the message
})
I believe it should be possible to do something like this
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.aggregate(const(true), new GroupedExchangeAggregationStrategy())
.completionFromBatchConsumer()
.process(exchange -> {
// process ALL messages together as I now have a list of all exchanges
})
but the processor is never invoked.
Second thing:
If I'm able to make this work, when is the ACK sent to SQS? When each individual message is processed, or when the aggregate finishes? I hope the latter.
When the processor is not called, the aggregator is probably still waiting for new messages to aggregate.
You could try using completionSize(5) instead of completionFromBatchConsumer() as a test. If this works, the batch completion definition is the problem.
For the ACK against the broker: unfortunately no. I think the message is committed when it arrives at the aggregator.
The Camel aggregator component is a "stateful" component and therefore it must end the current transaction.
For this reason you can equip such components with persistent repositories to avoid data loss when the process is killed. In such a scenario the already aggregated messages would obviously be lost if you don't have a persistent repository attached.
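As a sketch of what attaching such a repository could look like (camel-leveldb is one option; the repository name and file are illustrative):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.leveldb.LevelDBAggregationRepository;
import org.apache.camel.processor.aggregate.GroupedExchangeAggregationStrategy;

public class PersistentAggregationRoute extends RouteBuilder {
    @Override
    public void configure() {
        // pending (not yet completed) aggregations are persisted to disk,
        // so they survive a killed process
        LevelDBAggregationRepository repo =
                new LevelDBAggregationRepository("sqsBatches", "data/aggregation.dat");

        from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
            .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
                .aggregationRepository(repo)
                .completionSize(5)
            .to("log:batch");
    }
}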
The problem lies in GroupedExchangeAggregationStrategy.
When I use this strategy, the output is an "array" of all exchanges. This means that the exchange that reaches the completion predicate no longer has the initial properties. Instead it has CamelGroupedExchange and CamelAggregatedSize, which are of no use to completionFromBatchConsumer().
As I don't actually need all the exchanges to be aggregated, it's enough to use GroupedBodyAggregationStrategy. Then the exchange properties remain as in the original exchange, and just the body contains an "array" of the message bodies.
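A sketch of the resulting route (the processing logic is a placeholder):

import java.util.List;

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.GroupedBodyAggregationStrategy;

public class SqsBatchRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
            // only the bodies are grouped into a List; the exchange properties
            // stay intact, so completionFromBatchConsumer() still works
            .aggregate(constant(true), new GroupedBodyAggregationStrategy())
                .completionFromBatchConsumer()
            .process(exchange -> {
                List<?> bodies = exchange.getIn().getBody(List.class);
                // process all messages of the batch together
            });
    }
}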
Another solution would be to use completionPredicate(Predicate) with a custom predicate that extracts the necessary value from the grouped exchanges.
To some extent this is a bit of a shot in the dark, but we have a process that dramatically slows down over the course of a day. We've found everything running on Fuse begins to drag, but only when we've been running a specific process. Running JProfiler, I found a memory usage increase over time attributed to org.apache.camel.ProducerTemplate.send.
So my main question is, is there something I'm missing with the way we are using the ProducerTemplate here that is incorrect/could be causing this issue?
Exchange foo = new DefaultExchange(getCamelContext(), ExchangePattern.InOnly);
foo.getIn().setBody(obj);
Route r = exchange.getContext().getRoute("do_something_fun");
ProducerTemplate template = exchange.getContext().createProducerTemplate();
template.send(r.getEndpoint(), foo);
Normally you shouldn't create a ProducerTemplate on each request, as described here: http://camel.apache.org/why-does-camel-use-too-many-threads-with-producertemplate.html
However, because I don't have the complete picture of your application, you could have a situation where you cannot reuse it; in that case you must remember to stop it when you're done with it.
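A minimal sketch of the reuse pattern (class and endpoint names are illustrative):

import org.apache.camel.CamelContext;
import org.apache.camel.Exchange;
import org.apache.camel.ExchangePattern;
import org.apache.camel.Processor;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.impl.DefaultExchange;

public class FunSender implements Processor {

    // created once and reused; every createProducerTemplate() call otherwise
    // brings its own producer cache and threads that are never released
    private final ProducerTemplate template;

    public FunSender(CamelContext context) {
        this.template = context.createProducerTemplate();
    }

    @Override
    public void process(Exchange exchange) throws Exception {
        Exchange foo = new DefaultExchange(exchange.getContext(), ExchangePattern.InOnly);
        foo.getIn().setBody(exchange.getIn().getBody());
        template.send("direct:do_something_fun", foo);
    }

    // call this on shutdown to release the template's resources
    public void close() throws Exception {
        template.stop();
    }
}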
I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll from the directory every x days and fetch all the files (file://...). I route this into an aggregator that sums the file sizes into a single value (directorySize). I then check whether this size passes a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. The problem is that you can't make a consumer fetch on demand, as far as I know. I tried using pollEnrich, but that will only fetch a single file and not all files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I consider doing:
Rewrite the aggregation strategy to not only aggregate the size, but also the files themselves into a grouped exchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot of boilerplate, both in code and at runtime.
Move the file size calculation to a processor instead of the aggregator. This would defeat the purpose of using Camel in the first place... I would manually be fetching the files and adding the sizes, and that for every single file.
Use a ControlBus to dynamically start the delete route on that directory. Once again a lot of workaround to achieve something that I feel should be able to be done in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved.
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)
Camel is really awesome, but sometimes it sure is difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz"). You define this route with .noAutoStartup(). Then you can send messages to it and they will pile up on an internal queue, managed by Camel. When you want to process the messages, you simply start the route via the control bus and stop it again afterwards.
Generally, messing around with starting and stopping routes should be avoided unless absolutely necessary, but that's certainly another way to do it.
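A minimal sketch of this alternative (route id and endpoint URIs are illustrative):

import org.apache.camel.builder.RouteBuilder;

public class StoppedSedaRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // the consumer starts stopped; messages sent to seda:xyz simply pile up
        from("seda:xyz").routeId("fileDeleter")
            .noAutoStartup()
            .process(exchange -> {
                // delete the file carried by this exchange
            });

        // once the threshold is reached, start the consumer via the control bus
        from("direct:thresholdReached")
            .to("controlbus:route?routeId=fileDeleter&action=start");
    }
}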
Suggested solution
I suggest you do as you have done (i.e. alternative 1); a rough end-to-end sketch follows this list:
Aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} < 123"))
"Unwind" your aggregation via a splitter(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
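Putting those steps together, a rough sketch (threshold, directory URI and header name are illustrative):

import java.util.List;

import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.GroupedExchangeAggregationStrategy;

public class DirectoryCleanupRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:inbox?noop=true")
            // 1. group all files of one poll into a single exchange
            .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
                .completionFromBatchConsumer()
            // 2. sum the individual file sizes into a header
            .process(exchange -> {
                List<Exchange> grouped =
                        exchange.getProperty(Exchange.GROUPED_EXCHANGE, List.class);
                long total = 0;
                for (Exchange e : grouped) {
                    total += e.getIn().getHeader(Exchange.FILE_LENGTH, Long.class);
                }
                exchange.getIn().setHeader("directorySize", total);
            })
            // 3. only continue when the threshold is exceeded
            .filter(simple("${header.directorySize} > 123"))
            // 4. unwind the aggregation into the individual file exchanges
            .split(simple("${property.CamelGroupedExchange}"))
            // 5. delete the files one by one
            .process(exchange -> {
                // decide per file whether to delete it, then delete
            });
    }
}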
Please let me know if this doesn't make sense, or if I have misunderstood your problem in any way.
I've been trying to work out how to cancel a long-running AD search in System.DirectoryServices.Protocols. Can anyone help?
I've looked at the supportedControl/supportedCapabilities attributes on RootDSE and they don't contain the 1.3.6.1.1.8 OID, so I think that means it doesn't support the LDAP CANCEL extended operation as defined here: https://www.rfc-editor.org/rfc/rfc3909
That leaves the original LDAP ABANDON operation (see here for a list). But there doesn't seem to be a matching DirectoryRequest class.
Anyone have any ideas?
I think I've found my answer: whilst I was reading around your suggestion, Martin, I came across the Abort method on the LdapConnection class. I didn't expect to find it there: starting out from the LDAP documentation, I'd expected to find it as just another LDAPMessage, but the MS guys seem to have treated it as a special case. If anyone familiar with a non-MS implementation of LDAP can comment on whether the MS approach is typical, I'd appreciate it, to improve my understanding.
I think, but I'm not positive, that there is no async query with a cancel. There is an async property, but it's to allow a collection to be filled, nothing to do with cancelling. The best I can offer is to run your query in a background worker thread with an async callback that deals with the answer when it comes back. If the user decides to cancel, you can just cancel the background worker thread. You'll free your app up, even if you haven't freed the LDAP server up until it finishes its query. You can find info on background worker threads at http://www.c-sharpcorner.com/UploadFile/LivMic/BGWorker07032007000515AM/BGWorker.aspx
Don't forget to call .Dispose() when cleaning up your active directory objects to prevent memory leaks.
If the query produces a lot of data, you can also abandon it through paging. Specify a PageResultRequestControl option in the query, giving a fairly low page size (IIUC, 1000 is the default page size). IIUC, you send a new request every time you get a page back (passing the cookie from one response into the next request). When you choose to cancel the query, send another request with zero expected results.