Unable to process large files in camel - apache-camel

I am trying to do a simple transformation on a CSV file, but my program gets stuck and produces no output. The console prints something like the following:
22:38:02.001 [main] INFO o.a.camel.impl.DefaultCamelContext - Apache Camel 2.15.2 (CamelContext: camel-1) is shutting down
22:38:02.135 [main] INFO o.a.c.impl.DefaultShutdownStrategy - Starting to graceful shutdown 1 routes (timeout 300 seconds)
22:38:02.167 [main] DEBUG o.a.c.i.DefaultExecutorServiceManager - Created new ThreadPool for source: org.apache.camel.impl.DefaultShutdownStrategy#65ead16a with name: ShutdownTask. -> org.apache.camel.util.concurrent.RejectableThreadPoolExecutor#52c0a65f[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0][ShutdownTask]
22:38:02.173 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - There are 1 routes to shutdown
22:38:02.177 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - Route: route1 suspended and shutdown deferred, was consuming from: Endpoint[file:///home/cloudera/Desktop/camelinput/?delay=15m&noop=true]
22:38:02.177 [Camel (camel-1) thread #1 - ShutdownTask] INFO o.a.c.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 300 seconds.
22:38:02.179 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - There are 1 inflight exchanges:
InflightExchange: [exchangeId=ID-quickstart-cloudera-40574-1441345060577-0-2, fromRouteId=route1, routeId=route1, nodeId=unmarshal1, elapsed=10787, duration=10791]
22:38:05.436 [Camel (camel-1) thread #1 - ShutdownTask] INFO o.a.c.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 299 seconds.
22:38:05.437 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - There are 1 inflight exchanges:
InflightExchange: [exchangeId=ID-quickstart-cloudera-40574-1441345060577-0-2, fromRouteId=route1, routeId=route1, nodeId=unmarshal1, elapsed=14045, duration=14049]
22:38:08.201 [Camel (camel-1) thread #1 - ShutdownTask] INFO o.a.c.impl.DefaultShutdownStrategy - Waiting as there are still 2 inflight and pending exchanges to complete, timeout in 298 seconds.
22:38:08.202 [Camel (camel-1) thread #1 - ShutdownTask] DEBUG o.a.c.impl.DefaultShutdownStrategy - There are 1 inflight exchanges:
InflightExchange: [exchangeId=ID-quickstart-cloudera-40574-1441345060577-0-2, fromRouteId=route1, routeId=route1, nodeId=unmarshal1, elapsed=16810, duration=16814]
The same program works for a small file, but with a large file I get this issue. I think it may be a problem with threads. Please help me figure out the issue.
The following is my program.
Main class:
TestRouter myRoute = new TestRouter();
HDFSTransfer hdfsTransfer = new HDFSTransfer();
String copy = hdfsTransfer.copyToLocal(
        "hdfs://localhost:8020",
        "/user/cloudera/input/CamelTestIn.csv",
        "/home/cloudera/Desktop/camelinput/");
boolean flag = false;
if ("SUCCESS".equals(copy)) {
    myContext.addRoutes(myRoute);
    // Launching the context
    myContext.start();
    // Pausing to let the route do its work
    Thread.sleep(10000);
    myContext.stop();
    flag = true;
}
if (flag) {
    hdfsTransfer.moveFile(
            "hdfs://localhost:8020",
            "file:/home/cloudera/Desktop/camelout/out.csv",
            "/user/cloudera/output/");
}
RouteBuilder class:
{
    CsvDataFormat csv = new CsvDataFormat();
    from("file:/home/cloudera/Desktop/camelinput/?noop=true&delay=15m")
        .unmarshal(csv)
        .convertBodyTo(List.class)
        .process(new Processor() {
            @Override
            public void process(Exchange msg) throws Exception {
                List<List<String>> data = (List<List<String>>) msg.getIn().getBody();
                for (List<String> line : data) {
                    // Checks if column two contains text STANDARD
                    // and alters its value to DELUXE.
                    // System.out.println("line "+line);
                    /*
                    if("Aug-04".equalsIgnoreCase(line.get(6))){
                    line.set(6, "04-August");}
                    */
                }
            }
        }).marshal(csv)
        .to("file:/home/cloudera/Desktop/camelout/?fileName=out.csv")
        .log("done.").end();
}

If you have a bigger file, then you need to sleep for longer than 10 seconds to give the route time to process the file.
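Rather than guessing a sleep duration, the main thread could wait until the work reports that it is done, with a timeout as a safety net. The following is only a minimal sketch in plain Java: WaitForCompletion and runAndWait are hypothetical names, and the Runnable stands in for the route's work. In a real Camel application the completion signal would have to come from the route itself (for example from a final processor step), which is not shown here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: waits for background work to finish instead of sleeping a fixed time.
final class WaitForCompletion {

    // Runs the work on a background thread and waits up to timeoutSeconds for it.
    // Returns true if the work finished in time, false on timeout or interruption.
    static boolean runAndWait(Runnable work, long timeoutSeconds) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                done.countDown(); // signal completion even if the work throws
            }
        }, "worker");
        worker.setDaemon(true); // do not keep the JVM alive if the work hangs
        worker.start();
        try {
            return done.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // The lambda is a placeholder for "process the file".
        boolean finished = runAndWait(() -> System.out.println("processing file..."), 300);
        System.out.println("finished = " + finished);
    }
}
```

With this shape, a small file returns almost immediately and a large file gets as long as it needs, up to the timeout.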
Also mind that your current route reads the whole file into memory, which means you can run out of memory if the file is very big.
See the lazyLoad option at http://camel.apache.org/csv.html
Also, if all your route does is change some text in a big file, there are better and faster ways of doing that than a Camel route.
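As one illustration of doing this without Camel, the file can be streamed line by line so that only one line is held in memory at a time. This is a rough sketch, not a drop-in replacement: StreamingCsvRewrite is a hypothetical class, the "Aug-04" to "04-August" replacement in column index 6 is taken from the commented-out code in the question, and the naive split on commas assumes there are no quoted fields containing commas.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: rewrites a CSV file one line at a time.
final class StreamingCsvRewrite {

    // Replaces "Aug-04" (case-insensitive) in column index 6 with "04-August".
    // Assumption: simple CSV, no quoted fields that contain commas.
    static String rewriteLine(String line) {
        String[] cols = line.split(",", -1); // -1 keeps trailing empty columns
        if (cols.length > 6 && "Aug-04".equalsIgnoreCase(cols[6])) {
            cols[6] = "04-August";
        }
        return String.join(",", cols);
    }

    // Streams the input to the output; memory use does not grow with file size.
    static void rewriteFile(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(rewriteLine(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            rewriteFile(Paths.get(args[0]), Paths.get(args[1]));
        } else {
            System.out.println(rewriteLine("a,b,c,d,e,f,Aug-04,h"));
        }
    }
}
```

For CSV with quoted fields, a real CSV parser would be needed in place of the split call.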

Apache Camel - split and aggregation bug

I tried to report a bug in Camel's issue tracker, but it's not easy to get access there now, so maybe someone can help me here.
I'm gradually migrating to the newest Camel version. Currently I'm going from 3.7.3 to 3.11.7, but I checked that this bug also happens on 3.20.1.
OK, to the point.
When I have a pipeline like this:
.to(SPLIT_WORKER_ROUTE_ID, OTHER_ROUTE_ID)
it should execute in sequence. But when somewhere inside SPLIT_WORKER_ROUTE_ID I have an aggregation code like this:
.split(body())
.process(splitWorkerProcessor)
.aggregate(exchangeProperty(CORRELATION_ID), new SplitAggregator()).completionSize(exchangeProperty(SPLIT_SIZE))
.to(AFTER_SPLIT_ROUTE_ID)
before it goes to AFTER_SPLIT_ROUTE_ID the OTHER_ROUTE_ID kicks in and starts to run in parallel with SPLIT_WORKER_ROUTE_ID.
When I rewrite the code like this (or go back to Camel 3.7.3):
.split(body(), new SplitAggregator()).parallelProcessing()
.process(splitWorkerProcessor)
.end()
.to(AFTER_SPLIT_ROUTE_ID)
everything runs sequentially, as it should. Unfortunately, I have to use more complex aggregation conditions, so I'm afraid I cannot use this workaround, because the aggregation cannot be configured in this approach.
I guess that according to
https://camel.apache.org/manual/camel-3x-upgrade-guide-3_11.html#_aggregate_eip
something has changed in this area. (EDIT: I've just checked Camel 3.10 and it works properly so I'm 99.99% sure this change introduced this bug)
The problem is that the order of execution is disturbed. When we have this:
.to(SPLIT_WORKER_ROUTE_ID, OTHER_ROUTE_ID)
OTHER_ROUTE_ID can complete before the sequence SPLIT_WORKER_ROUTE_ID -> AFTER_SPLIT_ROUTE_ID has finished.
Here is the log presenting the problem:
2023-02-02T18:45:41,229 [main] INFO direct://Main [...] [] [] [] [] - MAIN START
2023-02-02T18:45:41,230 [Camel (camel-1) thread #1 - Threads] INFO direct://splitWorker [...] [] [] [] [] - SPLIT_WORKER_ROUTE_ID START
2023-02-02T18:45:41,399 [Camel (camel-1) thread #1 - Threads] INFO direct://other [...] [] [] [] [] - OTHER_ROUTE_ID START
2023-02-02T18:45:41,399 [Camel (camel-1) thread #3 - Aggregator] INFO direct://splitWorker [...] [] [] [] [] - Aggregation just finished inside SPLIT_WORKER_ROUTE_ID START!
2023-02-02T18:45:41,399 [Camel (camel-1) thread #3 - Aggregator] INFO direct://splitWorker [...] [] [] [] [] - SPLIT_WORKER_ROUTE_ID FINISH
2023-02-02T18:45:41,400 [Camel (camel-1) thread #3 - Aggregator] INFO direct://afterSplit [...] [] [] [] [] - AFTER_SPLIT_ROUTE_ID START
2023-02-02T18:45:42,404 [Camel (camel-1) thread #1 - Threads] INFO direct://other [...] [] [] [] [] - OTHER_ROUTE_ID FINISH
2023-02-02T18:45:43,406 [Camel (camel-1) thread #3 - Aggregator] INFO direct://afterSplit [...] [] [] [] [] - AFTER_SPLIT_ROUTE_ID FINISH
2023-02-02T18:45:47,417 [Camel (camel-1) thread #6 - Delay] INFO direct://Main [...] [] [] [] [] - MAIN FINISH
I would appreciate any help, thanks a lot!
By default, the output of the aggregator is executed on a thread from the aggregator's thread pool. However, you can have the aggregator output run in the same thread as the calling route:
.aggregate(exchangeProperty(CORRELATION_ID), new SplitAggregator())
.completionSize(exchangeProperty(SPLIT_SIZE))
.executorService(new SynchronousExecutorService())
This technique is briefly described here.
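To see why the choice of executor changes the ordering, here is a small plain-Java sketch. It does not use Camel and is not how Camel is implemented internally; SameThreadExecutorDemo and runPipeline are hypothetical names, and the same-thread Executor is only a stand-in for the idea behind SynchronousExecutorService.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical sketch: the executor decides whether "after split" runs before "other route".
final class SameThreadExecutorDemo {

    // Runs every task immediately on the calling thread.
    static final Executor SAME_THREAD = Runnable::run;

    // Simulates a pipeline whose first step hands its aggregated output to an executor
    // before the second step runs. Returns the events in the order they were recorded.
    static List<String> runPipeline(Executor aggregatorExecutor) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        events.add("split worker start");
        aggregatorExecutor.execute(() -> events.add("after split"));
        events.add("other route");
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        // With the same-thread executor, "after split" is always recorded before "other route".
        System.out.println(runPipeline(SAME_THREAD));
    }
}
```

If a thread pool is passed instead, the "after split" task runs on another thread and may be recorded before or after "other route", which mirrors the interleaving seen in the log above.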

concurrentConsumers not created right away from beginning

I am using Camel in a Spring-Boot application to route from AMQ-Queue. Messages from this queue will be sent to a REST-Webservice. It is already working with this code line:
from("amq:queue:MyQueue").process("jmsToHttpProcessor").to(uri);
My uri looks like this:
http4://localhost:28010/application/createCustomer
Now I have the requirement that the routing to the web service should be done in parallel.
In order to achieve that, I configured concurrentConsumers in JmsConfiguration as follows:
@Bean
public JmsComponent amq(@Qualifier("amqConnectionFactory") ConnectionFactory amqConnectionFactory, AMQProperties amqProperties) {
    JmsConfiguration jmsConfiguration = new JmsConfiguration(amqConnectionFactory);
    jmsConfiguration.setConcurrentConsumers(50);
    jmsConfiguration.setMaxConcurrentConsumers(50);
    return new JmsComponent(jmsConfiguration);
}
@Bean
public ConnectionFactory amqConnectionFactory(AMQProperties amqProperties) throws Exception {
    ConnectionFactoryParser parser = new ConnectionFactoryParser();
    ConnectionFactory returnValue = parser.newObject(parser.expandURI(amqProperties.getUrl()), "amqConnectionFactory");
    return returnValue;
}
It works as expected, BUT not right from the beginning. This is what I observe:
1. I have 100 messages in the ActiveMQ queue.
2. I start my Spring application.
3. Camel creates only 1 thread, which consumes one message at a time, each after the previous one gets its response.
4. I observe the number of messages in the queue decreasing only slowly (99... 98... 97... 96...).
5. I fill the queue with 100 new messages.
6. NOW the concurrent consumers are created, as I can see the number of messages decreasing rapidly.
Does anyone have an idea why concurrentConsumers is not working right from the beginning?
I tried the suggestions. Unfortunately, they don't change the behaviour. I found out that the problem is that Camel already starts consuming messages from the queue before the Spring Boot application has started. I can observe this in the log:
2021-04-01T20:26:33,901 INFO (Camel (CamelBridgeContext) thread #592 - JmsConsumer[MyQueue]) [message]; ...
2021-04-01T20:26:33,902 INFO (Camel (CamelBridgeContext) thread #592 - JmsConsumer[MyQueue]) [message]; ...
2021-04-01T20:26:33,915 INFO (main) [AbstractConnector]; _; Started ServerConnector#5833f5cd{HTTP/1.1,[http/1.1]}{0.0.0.0:23500}
2021-04-01T20:26:33,920 INFO (main) [BridgeWsApplication]; _; Started BridgeWsApplication in 12.53 seconds (JVM running for 13.429)
In this case, only one consumer with thread #592 is consuming all the messages.
In fact, if I start my Spring application first, and then fill the queue with messages, then concurrentConsumers will be used:
2021-04-01T20:30:20,159 INFO (Camel (CamelBridgeContext) thread #594 - JmsConsumer[MyQueue])
2021-04-01T20:30:20,159 INFO (Camel (CamelBridgeContext) thread #599 - JmsConsumer[MyQueue])
2021-04-01T20:30:20,178 INFO (Camel (CamelBridgeContext) thread #593 - JmsConsumer[MyQueue])
2021-04-01T20:30:20,204 INFO (Camel (CamelBridgeContext) thread #564 - JmsConsumer[MyQueue])
In this case, messages are consumed by the concurrent consumers in parallel.
In order to solve the problem, I tried setting autoStartup to false in my RouteBuilder component:
@Override
public void configure() {
    CamelContext context = getContext();
    context.setAutoStartup(false);
    // My Route
}
My naive idea was to start Camel only after Spring Boot has started and is running:
public static void main(String[] args) {
    ConfigurableApplicationContext context = SpringApplication.run(BridgeWsApplication.class, args);
    SpringCamelContext camel = (SpringCamelContext) context.getBean("camelContext");
    camel.start();
    try {
        camel.startAllRoutes();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Unfortunately, this does not change the behaviour. There must be a configuration that lets Camel start after Spring has started.

Apache Camel - Parallel Routes Inflight Exchanges

I have a Camel context with many routes that start every 15 minutes via the Timer component.
These routes set some properties on the exchange (target host, query, and current date; a Processor gets the date, subtracts 12 hours, and converts it to GMT).
After these properties are set, another route is called via Direct to execute the HTTP GET. When the request finishes, another route is called to post the response to ActiveMQ Artemis.
The project is deployed on WildFly 13.
The problem is:
Sometimes the routes simply freeze and don't start after 15 minutes.
When I try to stop/start the route, I get the following log:
08:27:45,230 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) There are 1 inflight exchanges: InflightExchange: [exchangeId=ID-exchange-ID, fromRouteId=Route1, routeId=GetDataAutoBySinceTime, nodeId=toD7, elapsed=0, duration=216958569]
08:27:46,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 299 seconds. Inflights per route: [Route1 = 1]
08:27:46,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) There are 1 inflight exchanges: InflightExchange: [exchangeId=ID-exchange-ID, fromRouteId=Route1, routeId=GetDataAutoBySinceTime, nodeId=toD7, elapsed=0, duration=216959570]
08:27:47,231 INFO [org.apache.camel.impl.DefaultShutdownStrategy] (Camel (camel-example) thread #70 - ShutdownTask) Waiting as there are still 1 inflight and pending exchanges to complete, timeout in 298 seconds. Inflights per route: [Route1 = 1]
I don't know whether some process is stuck and preventing other processes from starting.
I thought about removing the generic routes (PostMessageInActiveMQ and GetDataAutomaticallyBySinceTime) and implementing the same code in the other routes (Route1, Route2 and Route3), but I don't think this is the best approach.
Routes:
Route1 (Route2 and Route3 are almost the same, just change properties values)
from("timer:Route1Timer?period=15m")
.routeId("Route1")
.autoStartup(false)
.setProperty("targetAddress", simple("hostname.route1"))
.process(new GetCurrentDate())
.setProperty("query",
simple("DataQuery%26URI=Route1%26format=xml%26Mode=since-time%26p1=${header.currentDate}"))
.to("direct:GetDataAutoBySinceTime");
GetDataAutomaticallyBySinceTime
from("direct:GetDataAutoBySinceTime")
.routeId("GetDataAutoBySinceTime")
.autoStartup(true)
.removeHeaders("*")
.setHeader(Exchange.HTTP_METHOD, constant("GET"))
.toD("http4:${header.targetAddress}/command=${header.query}%26httpClient.socketTimeout=3000")
.convertBodyTo(String.class, "utf-8")
.to("direct:PostMessageInActiveMQ");
PostMessageInActiveMQ
CamelArtemisComponent components = new CamelArtemisComponent();
getContext().addComponent("artemis", components.getArtemisComponent());
from("direct:PostMessageInActiveMQ")
.routeId("PostMessageInActiveMQ")
.autoStartup(true)
.convertBodyTo(String.class, "utf-8")
.inOnly("artemis:ARTEMIS.QUEUE");
Entire code: https://github.com/vitorvr/camel-example
EDIT:
Camel Version: 2.22.0

How to consume messages from a Topic ActiveMQ Artemis

I'm trying to work with topics on ActiveMQ Artemis.
I have created a Multicast Address and a Multicast Queue inside this Address.
I created 2 routes with Apache Camel to connect to this topic, but when I post a message only one route consumes it, and when I post another message, the second route consumes that one.
The code and the output are below.
public class CamelRoutes {
    public static void main(String[] args) throws Exception {
        ActiveMQJMSConnectionFactory connection = new ActiveMQJMSConnectionFactory("tcp://localhost:61616", "admin", "admin");
        CamelContext camel = new DefaultCamelContext();
        camel.addComponent("amq", JmsComponent.jmsComponent(connection));
        camel.addRoutes(new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                from("amq:TEST.TOPIC")
                    .routeId("Route1")
                    .log("ROUTE1: ${body}");
            }
        });
        camel.addRoutes(new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                from("amq:TEST.TOPIC")
                    .routeId("Route2")
                    .log("ROUTE2: ${body}");
            }
        });
        camel.start();
        Thread.sleep(20000000);
    }
}
2019-02-11 16:35:42 [Camel (camel-1) thread #1 - JmsConsumer[TEST.TOPIC]] INFO Route1:159 - ROUTE1: {"message":1}
2019-02-11 16:35:45 [Camel (camel-1) thread #2 - JmsConsumer[TEST.TOPIC]] INFO Route2:159 - ROUTE2: {"message":2}
2019-02-11 16:35:48 [Camel (camel-1) thread #1 - JmsConsumer[TEST.TOPIC]] INFO Route1:159 - ROUTE1: {"message":3}
2019-02-11 16:35:51 [Camel (camel-1) thread #2 - JmsConsumer[TEST.TOPIC]] INFO Route2:159 - ROUTE2: {"message":4}
2019-02-11 16:35:54 [Camel (camel-1) thread #1 - JmsConsumer[TEST.TOPIC]] INFO Route1:159 - ROUTE1: {"message":5}
You are consuming from the queue, not from the topic.
You need to correct your consumer's URI scheme.
Change your consumer to:
from("amq:topic:TEST.TOPIC");
This is how you can create a queue consumer:
from("amq:queue:YOUR.QUEUE.NAME");
// or, as queue is the default:
from("amq:YOUR.QUEUE.NAME");
This is how you can create a topic consumer:
from("amq:topic:YOUR.TOPIC.NAME");
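The alternating output in the question is what queue semantics look like with two consumers: each message goes to exactly one of them. With topic semantics every subscriber receives its own copy. The plain-Java sketch below only models that difference; QueueVsTopicDemo is a hypothetical class, there is no broker involved, and the strict round-robin is a simplification of how a real broker distributes queue messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two delivery styles; no JMS involved.
final class QueueVsTopicDemo {

    private static List<List<String>> newInboxes(int consumers) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    // Queue: each message is delivered to exactly one consumer (round-robin here).
    static List<List<String>> deliverAsQueue(List<String> messages, int consumers) {
        List<List<String>> inboxes = newInboxes(consumers);
        for (int i = 0; i < messages.size(); i++) {
            inboxes.get(i % consumers).add(messages.get(i));
        }
        return inboxes;
    }

    // Topic: every consumer receives a copy of every message.
    static List<List<String>> deliverAsTopic(List<String> messages, int consumers) {
        List<List<String>> inboxes = newInboxes(consumers);
        for (String message : messages) {
            for (List<String> inbox : inboxes) {
                inbox.add(message);
            }
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("1", "2", "3", "4", "5");
        System.out.println(deliverAsQueue(messages, 2)); // prints [[1, 3, 5], [2, 4]]
        System.out.println(deliverAsTopic(messages, 2)); // prints [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
    }
}
```

The queue result matches the log in the question, where Route1 sees messages 1, 3, 5 and Route2 sees 2 and 4.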

Camel route shutdown during Day light Savings time

Below are my Camel routes to send periodic messages and forward messages in a queue to an endpoint.
Event Route:
from("activemq:Queue.External?cacheLevelName=CACHE_CONSUMER&transacted=true")
.routeId("EventRoute")
.autoStartup(false)
.filter(messageFilter)
.process(eventTransformer)
.setHeader(Exchange.HTTP_METHOD, constant(HttpMethods.POST))
.setHeader(Exchange.ACCEPT_CONTENT_TYPE, constant("application/xml"))
.setProperty("eventEndpoint", constant(eventEndpoint))
.to(eventUri)
.process(eventResponse);
Periodic message route:
from("timer:monitor?fixedRate=true&period=" + (periodicMessageInterval() * 1000))
.routeId("periodicMessageRoute")
.autoStartup(false)
.process(periodicMessageTransformer)
.setHeader(Exchange.HTTP_METHOD, constant(HttpMethods.POST))
.setHeader(Exchange.ACCEPT_CONTENT_TYPE, constant("application/xml"))
.doTry()
.to(periodicMessageUri)
.process(periodicMessageResponse)
.doCatch(Exception.class)
.log(LoggingLevel.DEBUG, "Error response received: ${body}");
Route initialization for periodic message detection
builder.from("activemq:topic:Topic.Heartbeat?concurrentConsumers=1&maxConcurrentConsumers=1")
.routeId("periodic")
.log(LoggingLevel.DEBUG, "Periodic Message Received: ${id}")
.process(this);
builder.from("timer:monitor?fixedRate=true&period=" + monitoringIntervalInMilliseconds)
.routeId("timer")
.log(LoggingLevel.DEBUG, "Checking periodic message reception")
.process(exchange -> exchange.getIn().setBody(check()))
.choice().when(builder.body(Boolean.class)).to("direct:stopActiveMqRoutes")
.otherwise().to("direct:startActiveMqRoutes").end();
ProcessorDefinition routeDefinitionStop = builder.from("direct:stopActiveMqRoutes");
for (final String routeId : routeIds) {
routeDefinitionStop = routeDefinitionStop
.to("controlbus:route?routeId=" + routeId + "&action=status")
.choice().when(builder.body().isNotEqualTo("Stopped"))
.log(LoggingLevel.INFO, "Stopping route execution: " + routeId)
.to("controlbus:route?routeId=" + routeId + "&action=stop&async=true")
.end();
}
routeDefinitionStop.end();
ProcessorDefinition routeDefinitionStart = builder.from("direct:startActiveMqRoutes");
for (final String routeId : routeIds) {
routeDefinitionStart = routeDefinitionStart
.to("controlbus:route?routeId=" + routeId + "&action=status")
.choice().when(builder.body().isNotEqualTo("Started"))
.log(LoggingLevel.INFO, "Starting route execution: " + routeId)
.to("controlbus:route?routeId=" + routeId + "&action=start&async=true")
.end();
}
routeDefinitionStart.end();
Both routes stopped for 1 hour during the daylight saving time change and then started again automatically. Was this caused by the Timer component in the JDK?
Error log:
2017-11-05 01:01:13,921 INFO [route2] Stopping route execution: EventRoute
2017-11-05 01:01:13,921 INFO [route2] Stopping route execution: periodicMessageRoute
2017-11-05 01:01:26,671 INFO [org.apache.camel.impl.DefaultShutdownStrategy] Route: EventRoute shutdown complete, was consuming from: activemq://Queue.External?cacheLevelName=CACHE_CONSUMER&transacted=true
2017-11-05 01:01:26,672 INFO [org.apache.camel.impl.DefaultShutdownStrategy] Graceful shutdown of 1 routes completed in 22 seconds
2017-11-05 01:01:26,673 INFO [org.apache.camel.spring.SpringCamelContext] Route: EventRoute is stopped, was consuming from: activemq://Queue.External?cacheLevelName=CACHE_CONSUMER&transacted=true
2017-11-05 01:01:26,675 INFO [org.apache.camel.impl.DefaultShutdownStrategy] Starting to graceful shutdown 1 routes (timeout 300 seconds)
2017-11-05 01:01:26,676 INFO [org.apache.camel.impl.DefaultShutdownStrategy] Route: periodicMessageRoute shutdown complete, was consuming from: timer://monitor?fixedRate=true&period=30000
2017-11-05 01:01:26,680 INFO [org.apache.camel.impl.DefaultShutdownStrategy] Graceful shutdown of 1 routes completed in 0 seconds
2017-11-05 01:01:26,681 INFO [org.apache.camel.spring.SpringCamelContext] Route: periodicMessageRoute is stopped, was consuming from: timer://monitor?fixedRate=true&period=30000
2017-11-05 01:01:26,681 INFO [org.apache.camel.impl.DefaultShutdownStrategy] Starting to graceful shutdown 1 routes (timeout 300 seconds)
2017-11-05 01:01:26,681 INFO [org.apache.camel.impl.DefaultShutdownStrategy] Route: EventRoute shutdown complete, was consuming from: activemq://Queue.External?cacheLevelName=CACHE_CONSUMER&transacted=true
2017-11-05 01:01:26,681 INFO [org.apache.camel.impl.DefaultShutdownStrategy] Graceful shutdown of 1 routes completed in 0 seconds
2017-11-05 01:00:03,921 INFO [route3] Starting route execution: EventRoute
2017-11-05 01:00:03,924 INFO [route3] Starting route execution: periodicMessageRoute
2017-11-05 01:00:03,943 INFO [org.apache.camel.spring.SpringCamelContext] Route: EventRoute started and consuming from: activemq://Queue.External?cacheLevelName=CACHE_CONSUMER&transacted=true
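The timestamps in the log jump back from 01:01 to 01:00, which is what a local wall clock does when daylight saving time ends. The sketch below shows how a wall-clock difference can disagree with the real elapsed time across that change. It rests on assumptions that are not in the question: the zone America/New_York is a guess, DstOverlapDemo is a hypothetical class, and nothing is known about how check() measures time, so this illustrates one possible cause rather than a diagnosis.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical sketch: wall-clock time repeats an hour when DST ends.
final class DstOverlapDemo {

    // Assumption: a US zone where DST ended on 2017-11-05; the real zone is not stated.
    static final ZoneId ZONE = ZoneId.of("America/New_York");

    // Local wall-clock time for a given instant.
    static LocalDateTime wallClock(Instant instant) {
        return LocalDateTime.ofInstant(instant, ZONE);
    }

    // Minutes between two instants as seen on the wall clock (can be negative across DST end).
    static long wallClockMinutesBetween(Instant from, Instant to) {
        return Duration.between(wallClock(from), wallClock(to)).toMinutes();
    }

    public static void main(String[] args) {
        Instant before = Instant.parse("2017-11-05T05:59:00Z"); // one minute before the change
        Instant after = before.plus(Duration.ofMinutes(2));     // one minute after the change

        System.out.println("wall clock before: " + wallClock(before));
        System.out.println("wall clock after:  " + wallClock(after));
        System.out.println("real elapsed minutes: " + Duration.between(before, after).toMinutes());
        System.out.println("wall-clock minutes:   " + wallClockMinutesBetween(before, after));
    }
}
```

If a monitoring check compares local timestamps, a heartbeat can appear to come from the future or the past for up to an hour; comparing Instant values or System.nanoTime() differences avoids that.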