I am using Camel to consume a huge (multi-GB) file, process/modify some data in the file, and finally forward the modified file via an AWS-S3, FTP or SFTP producer endpoint to its target. In the actual usage scenario, using an intermediate (temporary) file to hold the processed data is not allowed.
In case of the AWS producer, the configure method of the corresponding RouteBuilder specifies the route as follows:
from("file:/...")
.streamCaching()
.process(new CustomFileProcessor())
.to("aws-s3://...");
In its process(Exchange exchange) method, the CustomFileProcessor reads the input data from exchange.getIn().getBody(InputStream.class) and writes the processed and modified data into a PipedOutputStream.
Now the PipedInputStream connected with this PipedOutputStream should be used as the source for the producer sending the data to AWS-S3.
I tried exchange.getOut().setBody(thePipedInputStream) in the process method, but this doesn't work and seems to create a deadlock.
So what is the correct way, if it is possible at all, of piping the processed output data of the CustomFileProcessor to the producer endpoint so that the entire data is sent over?
Many thanks in advance.
After further digging into this the solution was quite simple. I only needed to place the pipe reader into a separate thread and the problem was solved.
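A minimal sketch of that idea in plain Java, without Camel: the writing side stays on the calling thread and the pipe reader runs on its own thread, so a small pipe buffer does not block the writer forever. The class name, the method name and the upper-casing "processing" step are placeholders I made up for illustration; they are not the CustomFileProcessor from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PipeReaderThreadSketch {

    // Pushes text through a pipe, upper-casing ASCII letters on the way
    // (a stand-in for the real processing step).
    static String upperCaseThroughPipe(String text) {
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        try (PipedOutputStream out = new PipedOutputStream();
             PipedInputStream in = new PipedInputStream(out, 1024)) {

            // The reader gets its own thread, so it drains the pipe while we write.
            Thread reader = new Thread(() -> {
                try {
                    byte[] buffer = new byte[512];
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        received.write(buffer, 0, n);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, "pipe-reader");
            reader.start();

            // Writer side: blocks whenever the 1024-byte pipe buffer is full.
            for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
                out.write(b >= 'a' && b <= 'z' ? b - 32 : b);
            }
            out.close();   // signals end of stream to the reader
            reader.join(); // wait until the reader has consumed everything
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new String(received.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(upperCaseThroughPipe("hello pipe"));
    }
}
```

If both ends run on the same thread, the write blocks as soon as the pipe buffer fills up and nothing ever reads it, which would explain the deadlock described in the question.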
I have the following service.
Spring boot 2.5.13
Camel 3.18.0
JMS
I want to use an embedded ActiveMQ Artemis, standalone ActiveMQ Artemis, and IBM MQ.
I've managed to get all 3 running and connecting, but one thing I can't figure out is the JMSReplyTo option.
Running locally with embedded broker:
This runs fine. I can write a message to the queue and a response is sent to the JMSReplyTo:
public void sendRequest(){
ActiveMQQueue activeMQQueue = new ActiveMQQueue("RESPONSE_QUEUE");
jmsTemplate.convertAndSend("REQUEST_QUEUE", "Hello", pp -> {
pp.setJMSReplyTo(activeMQQueue);
return pp;
});
}
Via ActiveMQ Artemis console:
This is where the inconsistency comes in, as the object received is an ActiveMQDestination, which makes setting the CamelJmsDestination much more involved.
Am I wasting my time here? Should I just grab the queue name and construct the URI manually? Or am I missing some logic as to how this works? Or maybe I'm not using the Artemis console in the correct way?
.setExchangePattern(ExchangePattern.InOut)
.setHeader("CamelJmsDestination", header("JMSReplyTo"))
When using javax.jms.Message#setJMSReplyTo(Destination) you have to pass a javax.jms.Destination which must implement one of the following:
javax.jms.Queue
javax.jms.TemporaryQueue
javax.jms.Topic
javax.jms.TemporaryTopic
In order to reproduce this semantic via text in the web console of ActiveMQ Artemis you need to prefix your destination's name with one of the following respectively:
queue://
temp-queue://
topic://
temp-topic://
So when you set the JMSReplyTo header try using queue://RESPONSE_QUEUE.
When your application then receives this message and invokes getJMSReplyTo() it will receive a javax.jms.Queue implementation (i.e. ActiveMQQueue) and then you can use getQueueName() to get the String name of the queue if necessary.
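To make the prefix mapping concrete, here is a small plain-Java sketch that splits a prefixed destination string into its kind and its bare name. It only does string handling and does not touch the JMS or Artemis APIs; the class, enum and method names are made up for illustration.

```java
import java.util.Optional;

class ReplyToPrefixSketch {

    // The four prefixes listed above, one per javax.jms.Destination subtype.
    enum Kind {
        QUEUE("queue://"),
        TEMP_QUEUE("temp-queue://"),
        TOPIC("topic://"),
        TEMP_TOPIC("temp-topic://");

        final String prefix;

        Kind(String prefix) {
            this.prefix = prefix;
        }
    }

    // Returns the destination kind, or empty if the text carries no known prefix.
    static Optional<Kind> kindOf(String text) {
        for (Kind kind : Kind.values()) {
            if (text.startsWith(kind.prefix)) {
                return Optional.of(kind);
            }
        }
        return Optional.empty();
    }

    // Returns the name without its prefix; unprefixed text is returned unchanged.
    static String nameOf(String text) {
        return kindOf(text)
                .map(kind -> text.substring(kind.prefix.length()))
                .orElse(text);
    }

    public static void main(String[] args) {
        System.out.println(kindOf("queue://RESPONSE_QUEUE") + " " + nameOf("queue://RESPONSE_QUEUE"));
    }
}
```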
I need to consume JMS messages with Camel everyday at 9pm (or from 9pm to 10pm to give it the time to consume all the messages).
I can't see any "scheduler" option for URIs "cMQConnectionFactory:queue:myQueue" while it exists for "file://" or "ftp://" URIs.
If I put a cTimer before it will send an empty message to the queue, not schedule the consumer.
You can use a route policy where you can set up, for example, a cron expression to tell when the route is started and when it's stopped.
http://camel.apache.org/scheduledroutepolicy.html
Another alternative is to start/stop the route via the Java API or JMX etc. and have some other logic that knows when to do that according to the clock.
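As a rough illustration of the "logic that knows the clock" part only, the check below decides whether the consumer should be running in the 9pm to 10pm window from the question. The class and method names are hypothetical, and the actual start and stop calls against the route are deliberately left out.

```java
import java.time.LocalTime;

class ConsumeWindowSketch {

    // Window taken from the question: start at 21:00, stop at 22:00.
    static final LocalTime START = LocalTime.of(21, 0);
    static final LocalTime END = LocalTime.of(22, 0);

    // True from 21:00 (inclusive) up to 22:00 (exclusive).
    static boolean shouldRun(LocalTime now) {
        return !now.isBefore(START) && now.isBefore(END);
    }

    public static void main(String[] args) {
        // A periodic task could call this and start or stop the route accordingly.
        System.out.println("consume now? " + shouldRun(LocalTime.now()));
    }
}
```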
This is something that has caused me a significant amount of trouble. There are a number of ways of skinning this cat, and none of them are great as far as I can see.
One is to set the route not to start automatically, and use a schedule to start the route and then stop it again after a short time using the controlbus EIP. http://camel.apache.org/controlbus.html
I didn't like this approach because I didn't trust that it would drain the queue completely once and only once per trigger.
Another is to use a pollEnrich to query the queue, but that only seems to pick up one item from the queue, whereas I wanted to drain it completely (only once).
I wrote a custom bean that uses consumer and producer templates to read all the entries in a queue with a specified time-out.
I found an example on the internet somewhere, but it took me a long time to find, and quickly searching again I can't find it now.
So what I have is:
from("timer:myTimer...")
.beanRef( "myConsumerBean", "pollConsumer" );
from("direct:myProcessingRoute")
.to("whatever");
And a simple pollConsumer method:
public void pollConsumer() throws Exception {
if ( consumerEndpoint == null ) consumerEndpoint = consumer.getCamelContext().getEndpoint( endpointUri );
consumer.start();
producer.start();
while ( true ) {
Exchange exchange = consumer.receive( consumerEndpoint, 1000 );
if ( exchange == null ) break;
producer.send( exchange );
consumer.doneUoW( exchange );
}
producer.stop();
consumer.stop();
}
where the producer is a DefaultProducerTemplate, consumer is a DefaultConsumerTemplate, and these are configured in the bean configuration.
This seems to work for me, but if anyone gives you a better answer I'll be very interested to see how it can be done better.
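The core of that loop is "receive with a time-out until nothing comes back". Stripped of Camel, the same pattern can be sketched with a java.util.concurrent.BlockingQueue standing in for the JMS queue; the class and method names here are made up, and the list of drained items stands in for forwarding each exchange to the processing route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class DrainQueueSketch {

    // Polls until no item arrives within the time-out, then returns what was read.
    static <T> List<T> drain(BlockingQueue<T> queue, long timeoutMillis) {
        List<T> drained = new ArrayList<>();
        try {
            T item;
            while ((item = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS)) != null) {
                drained.add(item); // stand-in for forwarding to the processing route
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop draining, keep the interrupt flag
        }
        return drained;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("a");
        queue.add("b");
        System.out.println(drain(queue, 100));
    }
}
```

The time-out is what ends the run: a message arriving after the last poll has timed out waits for the next trigger.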
Hi, I am using Apache Camel + Spring and have defined a configure method like this:
public class MyOrderConsumerRouterBuilder extends RouteBuilder implements InitializingBean, ApplicationContextAware{
@Override
public void configure() throws Exception {
from("seda:asyncChannel?concurrentConsumers=20").id("asyncProcessChannelFromId")
.to("bean:OrderProcessManager?method=processOrders").id("asyncProcessChannelToId");
}
}
Is this producer multithreaded? I see that there are multiple consumers. In my case it is: concurrentConsumers=20
I checked below URL
How do I configure the default maximum cache size for ProducerCache or ProducerTemplate
As per the source code of DefaultCamelContext.createProducerTemplate(), a DefaultProducerTemplate is created with maximumCacheSize (default 1000).
From this I understand that there can be multiple producers, which are held in an LRU cache bounded by maximumCacheSize. In my case I have only one endpoint, i.e. SEDA, so there will be only one producer.
So I think there will always be one single-threaded producer. Please help me to understand it better.
The producer is not multithreaded, but there are multiple producers.
In your case 20 consumers (threads) are waiting for messages. If a message arrives, it is processed according to the route definition by one of these threads.
If another message arrives, the thread that is processing the first message is probably still occupied, but one of the other 19 free threads can process the message.
As long as there are no Splitters, Aggregators and similar EIPs, a single thread "walks" the message through your route and in your case finally sends the message to the OrderProcessManager bean. So this producing step (calling the bean method) is obviously done by a single thread for a single message.
BUT since you can have up to 20 threads processing messages in parallel, the OrderProcessManager bean can be called by up to 20 producers (threads) in parallel.
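The practical consequence is that the bean has to be thread-safe. A small plain-Java sketch of the situation, with a fixed pool of 20 threads standing in for the SEDA consumers and a made-up stand-in class for the OrderProcessManager bean (neither is Camel code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedBeanSketch {

    // Hypothetical stand-in for the bean; its state must tolerate parallel calls.
    static class OrderProcessManagerStandIn {
        private final AtomicInteger processed = new AtomicInteger();

        void processOrders(String order) {
            processed.incrementAndGet(); // atomic, so parallel calls lose no updates
        }

        int processedCount() {
            return processed.get();
        }
    }

    // Sends `messages` orders through `consumers` threads that share one bean.
    static int run(int consumers, int messages) {
        OrderProcessManagerStandIn bean = new OrderProcessManagerStandIn();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < messages; i++) {
            final String order = "order-" + i;
            pool.execute(() -> bean.processOrders(order));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for workers");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bean.processedCount();
    }

    public static void main(String[] args) {
        System.out.println(run(20, 1000));
    }
}
```

With a plain int counter instead of the AtomicInteger, some increments could be lost under load, which is the kind of bug to watch for in a bean called from 20 consumers.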
I am new to Spring Batch. I am trying to use FlatFileItemWriter to write data into a file. The challenge is that the application creates the file at the given path but does not write the actual content into it.
Following are details related to code:
List<String> dataFileList : This list contains the data that I want to write to a file
FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource("C:\\Desktop\\test"));
writer.open(new ExecutionContext());
writer.setLineAggregator(new PassThroughLineAggregator<>());
writer.setAppendAllowed(true);
writer.write(dataFileList);
writer.close();
This just generates the file at the proper place, but the contents are not written into the file.
Am I missing something? Help is highly appreciated.
Thanks!
This is not a proper way to use a Spring Batch writer to write data. You need to declare a bean for the writer first.
Define Job Bean
Define Step Bean
Use your Writer bean in Step
Have a look at following examples:
https://github.com/pkainulainen/spring-batch-examples/blob/master/spring-boot/src/main/java/net/petrikainulainen/springbatch/csv/in/CsvFileToDatabaseJobConfig.java
https://spring.io/guides/gs/batch-processing/
You probably need to force a sync to disk. From the docs at https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/FlatFileItemWriter.html,
setForceSync
public void setForceSync(boolean forceSync)
Flag to indicate that changes should be force-synced to disk on flush. Defaults to false, which means that even with a local disk changes could be lost if the OS crashes in between a write and a cache flush. Setting to true may result in slower performance for usage patterns involving many frequent writes.
Parameters:
forceSync - the flag value to set
Take this example:
from("seda:data").log("data added to queue")
.setHeader("CamelHttpMethod", constant("POST"))
.setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
.process(new Processor() {
public void process(Exchange exchange) throws Exception {
exchange.setProperty(Exchange.CHARSET_NAME, "UTF-8");
}
})
.recipientList(header(RECIPIENT_LIST))
.ignoreInvalidEndpoints().parallelProcessing();
Assume the RECIPIENT_LIST header contains only one http endpoint. For a given http endpoint, messages should be processed in order, but two messages for different endpoints can be processed in parallel.
Basically, I want to know if there is anything that can be done to improve performance. For example, would using concurrentConsumers help?
SEDA with concurrentConsumers > 1 would absolutely help with throughput because it would allow multiple threads to run in parallel, but you'll need to implement your own locking mechanism to make sure only a single thread is hitting a given http endpoint at a given time.
Otherwise, here is an overview of your options: http://camel.apache.org/parallel-processing-and-ordering.html
In short, if you can use JMS, then consider using ActiveMQ message groups, as it's trivial to use and is designed for exactly this use case (parallel processing, but single-threaded by groups of messages, etc).
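If JMS is not an option, one way to get "ordered per endpoint, parallel across endpoints" without explicit locks is to give each endpoint its own single-threaded lane. The sketch below is plain Java, not Camel or ActiveMQ API; the class name, the method names and the example.invalid URLs are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerEndpointOrderingSketch {

    // One single-threaded executor per endpoint keeps that endpoint's tasks in order.
    private final Map<String, ExecutorService> lanes = new ConcurrentHashMap<>();

    void submit(String endpoint, Runnable task) {
        lanes.computeIfAbsent(endpoint, key -> Executors.newSingleThreadExecutor())
             .execute(task);
    }

    void shutdownAndWait() {
        for (ExecutorService lane : lanes.values()) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes.values()) {
                if (!lane.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("lane did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Sends messages 0..99 to two endpoints and records the order each one saw.
    static Map<String, List<Integer>> demo() {
        PerEndpointOrderingSketch dispatcher = new PerEndpointOrderingSketch();
        Map<String, List<Integer>> seen = new ConcurrentHashMap<>();
        String[] endpoints = {"http://a.example.invalid", "http://b.example.invalid"};
        for (int i = 0; i < 100; i++) {
            final int message = i;
            for (String endpoint : endpoints) {
                dispatcher.submit(endpoint, () ->
                        seen.computeIfAbsent(endpoint, key -> new CopyOnWriteArrayList<>())
                            .add(message));
            }
        }
        dispatcher.shutdownAndWait();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo().keySet());
    }
}
```

The trade-off is one thread per distinct endpoint, so this only makes sense while the number of endpoints stays small.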