Apache Camel / Spring Boot application in PCF results in message failures

We have developed an Apache Camel / Spring Boot based application that reads data from an Oracle table, transforms each record, and publishes it to a Kafka topic. We are using the Camel SQL component for DB integration and have implemented the Split and parallel-processing pattern to parallelize processing and achieve high throughput. The Oracle table is partitioned, so we create multiple routes, one route per table partition, to speed up the processing. The table has 30 partitions, so we created 30 routes.
from(buildSelectSqlEndpoint(routeId))
    .process(new GenericEventProcessor("MessagesReceived"))
    .split(body())           // one exchange per row in the polled result set
    .parallelProcessing()    // split exchanges are processed concurrently
    .process(new GenericEventProcessor("SplitAndParallelProcessing"))
    .process(new InventoryProcessor())
    .process(new GenericEventProcessor("ConvertedToAvroFormat"))
    .to(buildKafkaProducerEndPoint(routeId));
I tested the application on my local laptop with an adequate load and it processes as expected. But when we deployed the application to PCF, we see some of the threads failing. I enabled the Camel debug log and see the debug log line below:
Pipeline - Message exchange has failed: so breaking out of pipeline for exchange
Because of this, thousands of messages are not published to Kafka.
From initial analysis, I figured out that Camel creates one thread per route, and that, based on the maxMessagesPerPoll configuration, it creates a number of threads equal to maxMessagesPerPoll. With split and parallel processing enabled, Camel creates additional threads, again up to maxMessagesPerPoll, for the Kafka publish. With this approach we end up with hundreds of threads, and I am not sure whether that is an issue on PCF. I also tried running with only 2 routes to check the message delivery failures: I still see hundreds of failures, and with only 2 routes the total processing time for millions of records increases.
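One way to bound those thread counts is to give the split an explicitly sized thread pool instead of its default pool. A minimal sketch, assuming Camel's executorService option on the Split EIP; the pool sizes are illustrative:

// Bounded pool created via Camel's ExecutorServiceManager (5 core / 10 max threads)
ExecutorService splitPool = getContext().getExecutorServiceManager()
        .newThreadPool(this, "SplitPool", 5, 10);

from(buildSelectSqlEndpoint(routeId))
    .process(new GenericEventProcessor("MessagesReceived"))
    .split(body())
    .parallelProcessing()
    .executorService(splitPool)   // the split now draws from the bounded pool
    .process(new InventoryProcessor())
    .to(buildKafkaProducerEndPoint(routeId));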
Could you please let me know how to use Apache Camel on container platforms like PCF? Do we need any additional configuration in PCF or Camel?

Related

Apache NiFi Site-to-Site Data Partitioning

I have a single output port in a NiFi flow, and a Flink job that consumes data from this port using the NiFi Site-to-Site protocol (Flink provides an appropriate connector). The consumption is parallel, i.e. there are multiple Flink sources reading from the same NiFi port.
What I would like to achieve is a kind of partitioned load balancing between the running Flink sources, i.e. to ensure that data with the same key is always delivered to the same Flink source (similar to ActiveMQ message groups or Kafka partitioning). This is needed for ordering purposes.
Unfortunately, I was unable to find any documentation on how to accomplish that.
Any suggestions really appreciated.
Thanks in advance,
Site-to-Site wasn't really made to do what you are asking for. The best way to achieve it would be for NiFi to publish to Kafka, and then have Flink consume from Kafka.
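A minimal sketch of the Flink side of that setup, assuming NiFi's PublishKafka processor is configured to set the Kafka message key (so Kafka's default partitioner keeps same-key events in the same partition, and therefore in the same Flink source instance). The topic, group id, bootstrap servers and connector version are placeholders:

Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092");
props.setProperty("group.id", "flink-consumers");

// Each parallel source instance is assigned a subset of the topic's partitions,
// so all events sharing a key arrive at the same source instance, in order.
DataStream<String> events = env.addSource(
        new FlinkKafkaConsumer011<>("nifi-events", new SimpleStringSchema(), props));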

Does Apache Camel create multiple threads for multiple from() calls inside configure() across 3 server nodes?

If we deploy the following Camel code on three Wildfly nodes:
configure() {
    from("sftp").to("MyQueue");
    from("MyQueue").to("database");
}
How does the program execute across the three nodes?
Does it create 6 threads in total, i.e. on each node one thread for the from("sftp") polling consumer and one thread for the from("MyQueue") consumer?
I'm not sure about the SFTP consumer, but 1 sounds plausible, because an SFTP endpoint does not support concurrency.
That said, you should take care when you run multiple Camel applications that consume from the same FTP endpoint; otherwise you will get lots of errors because they compete with each other over the same files.
For a JMS consumer you can configure the number of concurrent consumers when you configure the broker connection.
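With the Camel JMS component this is an endpoint option. A minimal sketch; the endpoint and bean names are placeholders:

// Start with 5 consumer threads, scale up to 10 under load
from("jms:queue:MyQueue?concurrentConsumers=5&maxConcurrentConsumers=10")
    .to("bean:databaseWriter");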

Transfer events from Apache Flink to Apache NiFi - poor performance

I have integrated two tools, Apache NiFi and Apache Flink. NiFi takes events and sends them to Flink; after some processing, Flink returns these events to the same NiFi.
I built a source and a sink for NiFi in Flink. The whole process works, but the performance of the sink is very poor (about 10 events per second).
If I remove the sink (and print the output only), the processing speed is much higher.
I figured out that I can change the parallelism of the sink process using setParallelism(); of course that helps, but the base throughput is too low.
I also tried to use requestBatchCount(1000), but nothing changed.
My problem is probably related to transactions: after each event the sink waits for the transaction to close. But I'm not sure, and I don't know how to change it, e.g. to send hundreds of events in one transaction.
What can I do to increase the sink's performance?
Here is my sink definition:
SiteToSiteClientConfig sinkConfig = new SiteToSiteClient.Builder()
        .url("http://" + host + ":" + port + "/nifi")
        .portName("Data from Flink")
        .buildConfig();

outStream.addSink(new NiFiSink<String>(sinkConfig, new NiFiDataPacketBuilder<String>() {
    @Override
    public NiFiDataPacket createNiFiDataPacket(String s, RuntimeContext ctx) {
        // wrap each string in a NiFi data packet with no attributes
        return new StandardNiFiDataPacket(s.getBytes(), new HashMap<>());
    }
}));
I'm currently using the latest versions of Flink (1.5.1) and NiFi (1.7.1).
The NiFiSink provided by Apache Flink creates a transaction for each event:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-nifi/src/main/java/org/apache/flink/streaming/connectors/nifi/NiFiSink.java#L60-L67
It was done this way to make the error handling clear so that if the transaction fails to commit, it will throw an exception out of the invoke method in the context of the specific event that failed.
You can implement your own custom version of this that sends many events before calling commit on the transaction (sketched below), but I don't know enough about Flink to understand what happens if the transaction later fails to commit and you no longer have the events.
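A rough sketch of such a batching sink, assuming the same site-to-site client API that the stock NiFiSink uses; the batch size is illustrative and, per the caveat above, buffered events are lost if the job fails before a batch commits:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.connectors.nifi.NiFiDataPacket;
import org.apache.nifi.remote.Transaction;
import org.apache.nifi.remote.TransferDirection;
import org.apache.nifi.remote.client.SiteToSiteClient;
import org.apache.nifi.remote.client.SiteToSiteClientConfig;

public class BatchingNiFiSink extends RichSinkFunction<NiFiDataPacket> {

    private final SiteToSiteClientConfig config;
    private final int batchSize;
    private transient SiteToSiteClient client;
    private transient List<NiFiDataPacket> buffer;

    public BatchingNiFiSink(SiteToSiteClientConfig config, int batchSize) {
        this.config = config;
        this.batchSize = batchSize;
    }

    @Override
    public void open(Configuration parameters) {
        client = new SiteToSiteClient.Builder().fromConfig(config).build();
        buffer = new ArrayList<>(batchSize);
    }

    @Override
    public void invoke(NiFiDataPacket value) throws Exception {
        buffer.add(value);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // One transaction per batch instead of one per event
    private void flush() throws IOException {
        Transaction transaction = client.createTransaction(TransferDirection.SEND);
        for (NiFiDataPacket packet : buffer) {
            transaction.send(packet.getContent(), packet.getAttributes());
        }
        transaction.confirm();   // verify the checksum with NiFi
        transaction.complete();  // commit the whole batch at once
        buffer.clear();
    }

    @Override
    public void close() throws Exception {
        flush();                 // push out any remaining buffered events
        client.close();
    }
}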

How can I integrate my own XA transaction manager with Apache Camel?

I'm trying to create a router to integrate a number of JMS topics and queues. I am constrained by the fact that the client I am working for can't change the JMS implementation (TIBCO EMS with some custom client libraries) and the fact that they have written their own XA transaction manager, which doesn't quite conform to the JTA spec. It is very important that message delivery is guaranteed.
I've done a lot of reading and experimenting with Camel, and I've realised that I probably need to write my own JMS component, as the standard JMS component is not going to integrate with the JMS client libraries or the TM I have.
I need to be able to put hooks into the route lifecycle at the following points:
During the route startup, I need to identify all JMS connections and enlist them as XA resources with the TM implementation
When a message is received at the consumer, I need to start a transaction including all the JMS connections in the route
When a routing decision is made, I need to send the message to the producer and commit the transaction
Given the above, I think I can implement a very simplified version of the camel-jms component which strips out all the Spring parts and only contains the bare minimum required to interact with my JMS libraries.
Where would be the best place to initialise the transaction manager? I've been looking at DefaultCamelContext, RoutePolicy and RouteContext but I can't find a place where all the endpoints are resolved and initialised.
I solved this problem by implementing the UserTransaction and TransactionManager interfaces and creating a PlatformTransactionManager which the Camel JMS component uses to create the DefaultMessageListenerContainer.
One important point to note is that the transacted property on the Camel JmsComponent refers to local transactions, not XA transactions. If you set this property to true after passing a PlatformTransactionManager to the component, the DMLC will effectively try to commit your transaction twice, which won't work.
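A rough sketch of that wiring; MyUserTransaction and MyTransactionManager stand in for the custom JTA implementations described above, and connectionFactory for the EMS XA connection factory:

// Adapt the custom transaction manager to Spring's PlatformTransactionManager
JtaTransactionManager platformTm =
        new JtaTransactionManager(new MyUserTransaction(), new MyTransactionManager());

JmsComponent jms = new JmsComponent();
jms.setConnectionFactory(connectionFactory);
jms.setTransactionManager(platformTm);  // the DMLC uses this to commit XA transactions
jms.setTransacted(false);               // keep local transactions off, per the note above
camelContext.addComponent("jms", jms);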
This leaves me with a nice working example consuming from one JMS broker and producing to another, but it is very slow, around 5 messages per second. Unfortunately, Spring JMS does not support batching, so it seems the best solution here is to adjust the JMS topic configurations so that routing only takes place between topics on the same broker.

Resuming Camel Processing after power failure

I'm currently developing a Camel integration app in which resuming from a previous state of processing is important. When there's a power outage, for instance, it's important that all previously processed messages are not re-processed. The processing should resume from where it left off before the outage.
I've gone through a number of possible solutions, including Terracotta and Apache Shiro. I'm not sure how to use either, as documentation on their integration with Apache Camel is scarce. I've not settled on the two, however.
I'm looking for suggestions on the potential alternatives I can use or a pointer to some tutorial to get me started.
The difficulty in surviving outages lies primarily in state, and in what to do with in-flight messages.
Usually, when you're talking about state within routes, the solution is to flush it to disk or to other nodes in the cluster. Taking the aggregator pattern as an example, aggregated state is persisted in an aggregation repository. The default implementation is in memory, so if the power goes out, all the state is lost. However, there are other implementations, including one for JDBC and another using Hazelcast (a lightweight in-memory data grid). I haven't used Hazelcast myself, but the JDBC one does a synchronous write to disk. The aggregator pattern then allows you to resume from where you left off. A similar solution exists for idempotent consumption.
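As a sketch, persisting aggregator state with the JDBC-backed repository looks roughly like this (camel-sql's JdbcAggregationRepository; the datasource, transaction manager, endpoints and MyAggregationStrategy are placeholders):

// State is written synchronously to the my_aggregation table
JdbcAggregationRepository repo =
        new JdbcAggregationRepository(txManager, "my_aggregation", dataSource);
repo.setUseRecovery(true);  // replay incomplete exchanges after a restart

from("jms:queue:orders")
    .aggregate(header("orderId"), new MyAggregationStrategy())
        .completionSize(100)
        .aggregationRepository(repo)
    .to("jms:queue:orders-aggregated");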
The second question, around in-flight messages, is a little more complicated and largely depends on where you are consuming from. If you're in the middle of handling a web service request and the power goes out, does it matter if you have lost the message? The user can simply retry. Any effects on external systems can be wrapped in a transaction, or handled by an idempotent consumer with a JDBC idempotent repository.
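A minimal sketch of that JDBC-backed idempotent consumer (camel-sql's JdbcMessageIdRepository; the endpoint, header and bean names are placeholders):

from("jms:queue:payments")
    // processed message IDs survive a restart because they are stored in the database
    .idempotentConsumer(header("paymentId"),
            new JdbcMessageIdRepository(dataSource, "paymentProcessor"))
    .to("bean:paymentService");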
If you are building out integrations based on messaging, you should consume within a transaction, so that if your server goes down, the messages go back into the broker and can be replayed to another consumer.
Be careful when using seda: or threads blocks; these use an in-memory queue to pass exchanges between threads, and any messages flowing down these sorts of routes will be lost if someone trips over the power cable. If you can't afford message loss and need this sort of processing model, consider using a JMS queue as the endpoint between the two routes (with transactions to ensure you pick up where you left off).
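For instance, a sketch of replacing an in-memory seda: hand-off with a transacted JMS queue; the component, endpoint and bean names are placeholders:

// Before: from("file:inbox").to("seda:stage") would lose queued exchanges on a crash.
from("file:inbox")
    .to("activemq:queue:stage");  // persisted by the broker instead of held in memory

from("activemq:queue:stage?transacted=true")  // redelivered if processing fails mid-way
    .process(new MyProcessor())
    .to("bean:orderHandler");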
