We have a clustered environment where same camel ftp endpoint is installed on multiple fuse instances. I want message(file) to be consumed by only one fuse instance. I am planning to achieve this by implementing IdempotentRepository interface with database persistence. Want to make sure that this approach will work or there is a better way to do it?
If you don't want to depend on a database for doing this you could try with Hazelcast. Hazelcast is a distributed data cache that can be used as a idempotent repository without external dependencies. Also, Apache Camel provides a class for doing this. There is a nice tutorial explaining how to do it: Hazelcast Idempotent Repository Tutorial
Related
For integration purposes, we're using Apache Camel, Karaf with OSGi, so we are creating OSGi bundles. However, what Best Practices exist when it comes to structuring the bundles?
The integrations are fairly straightforward, with an incoming document type (via some protocol like HTTPS, SFTP, JMS), transformation to another document type, and again transportation via some protocol. The basic setup is always the same and follows the VETO Pattern: validate, enrich, transform, operate. Each unique combination of the mentioned protocol/docType defines an integration.
We decouple the connectivity (which includes validation) from the other steps via JMS. When we look at the ETO steps we separate those into their own Java classes and their corresponding XSLT. However, what's the added value of the OSGi framework and how should we divide the integrations between the OSGi bundles?
Take into account performing changes, maintenance and deployments? Consider 2 dozen integration points (unique endpoints) with 50 different integrations running in between, in other words 50 unique transformations between two different docTypes. We can put all code & XSLT's of all 50 integrations in 1 bundle (the other bundle handling connectivity), or 50 bundles with 1 integration each. What are best practices, if any, when it comes to deployment strategy? What to take into account?
You can check out examples from Apache Karaf github repository to see how bundles for OSGi applications are structured there. Christian Schneider has also done talk about OSGi best practices and has some examples in his repository as well.
Generally you want to keep your bundles small with least amount of dependencies as possible. Due to this I would recommend having only one integration per bundle. This makes installing integrations and their dependencies easier and offers some flexibility if you ever decide to split integrations between multiple Karaf instances.
For connectivity stuff your best bet is usually to use/create/publish OSGi services. For example with pax-jdbc-config you can use config files to create new DataSource type services which you can then use to connect to different databases from your integration bundles.
Publishing own custom services is pretty easy with declarative services and could easily be used to share connections to internal systems, blob storages, data access objects, etc while maintaining loose coupling by keeping actual implementations hidden with interfaces. For services the recommended approach is to have separate API and implementation bundle so bundles that use the service can just add dependency to the API bundle.
For deployment you can create your own custom Karaf distribution with bundles pre-installed, deploy bundles using Karaf features or use the hot deploy folder. For the two latter ones you might want to configure Karaf to use external folder for bundle configurations and hot deploy as process of updating Karaf is basically replacing it with new installation.
[Edit]
If we have 50+ transformations and put each in its own bundle, then I would have more than 50 bundles to manage. This is just a single instance. Other customers would have their own instances, again running 50+, 100+ bundles
Would say that the key thing here is to try to simplify and identify repetition in bundles. Often these can be converted to something more generic and re-usable.
With OSGi you can use Declarative services and OSGiDefaultCamelContext to instantiate Camel integration instances per configuration file which can be useful if you have multiple integrations that work pretty much the same but only have minor variations. Don't know if camel has support for this with blueprints.
With many bundles efficient use of Karaf features or OSGi features (R8) can be vital for handling the added complexity. Features make it considerably easier to install and update OSGi applications which consist from multiple bundles, configuration files and other features.
Still there's really no rule on how big is too big for single OSGi bundle. Grouping closely related things in to single bundle can make a lot of sense and help avoid splintering things too much.
There is a requirement where we get a stream of data from Kafka Stream and our objective is to push this data to SOLR.
We did some reading but we could find there are lot of Kafka Connect solutions available in the market, but the problem is we do not know which is the best solution and how to achieve.
The options are:
Use Solr connector to connect with Kafka.
Use Apache Storm as it directly provides support for integrating with Solr.
There is no much documentation or in depth information provided for the above mentioned options.
Will anyone be kind enough to let me know
How we can use a Solr connector and integrate with Kafka stream without using Confluent?
Solr-Kafka Connector: https://github.com/MSurendra/kafka-connect-solr
Also, With regard to Apache Storm,
will it be possible for Apache Storm to accept the Kafka Stream and push it to Solr, though we would need some sanitization of data before pushing it to Solr?
I am avoiding Storm here, because the question is mostly about Kafka Connect
CAVEAT - This Solr Connector in the question is using Kakfa 0.9.0.1 dependencies, therefore, it is very unlikely to work with the newest Kafka API's.
This connector is untested by me. Follow at your own risk
The following is an excerpt from Confluent's documentation on using community connectors, with some emphasis and adaptations. In other words, written for Kafka Connects not included in Confluent Platform.
1) Clone the GitHub repo for the connector
$ git clone https://github.com/MSurendra/kafka-connect-solr
2) Build the jar with maven
Change into the newly cloned repo, and checkout the version you want. (This Solr connector has no releases like the Confluent ones).
You will typically want to checkout a released version.
$ cd kafka-connect-solr; mvn package
From here, see Installing Plugins
3) Locate the connector’s uber JAR or plugin directory
We copy the resulting Maven output in the target directory into one of the directories on the Kafka Connect worker’s plugin path (the plugin.path property).
For example, if the plugin path includes the /usr/local/share/kafka/plugins directory, we can use one of the following techniques to make the connector available as a plugin.
As mentioned in the Confluent docs, the export CLASSPATH=<some path>/kafka-connect-solr-1.0.jar option would work, though plugin.path will be the way moving forward (Kafka 1.0+)
You should know which option to use based on the result of mvn package
Option 1) A single, uber JAR file
With this Solr Connector, we get a single file named kafka-connect-solr-1.0.jar.
We copy that file into the /usr/local/share/kafka/plugins directory:
$ cp target/kafka-connect-solr-1.0.jar /usr/local/share/kafka/plugins/
Option 2) A directory of dependencies
(This does not apply to the Solr Connector)
If the connector’s JARs are collected into a subdirectory of the build’s target directories, we can copy all of these JARs into a plugin directory within the /usr/local/share/kafka/plugins, for example
$ mkdir -p /usr/local/share/kafka/plugins/kafka-connect-solr
$ cp target/kafka-connect-solr-1.0.0/share/java/kafka-connect-solr/* /usr/local/share/kafka/plugins/kafka-connect-solr/
Note
Be sure to install the plugin on all of the machines where you’re running Kafka Connect distributed worker processes. It is important that every connector you use is available on all workers, since Kafka Connect will distribute the connector tasks to any of the worker
4) Running Kafka Connect
If you have properly set plugin.path or did export CLASSPATH, then you can use connect-standalone or connect-distributed with the appropriate config file for that Connect project.
Regarding,
we would need some sanitization of data before pushing it to Solr
You would need to do that with a separate process like Kafka Streams, Storm, or other process prior to Kafka Connect. Write your transformed output to a secondary topic. Or write your own Kafka Connect Transform process. Kafka Connect has very limited transformations out of the box.
Also worth mentioning - JSON seems to be the only supported Kafka message format for this Solr connector
I have a multi-module maven based project which has a number of Spring Boot applications, a couple of which (lets call them A and B) connect to a database (I have a separate module with the database related code on which both applications depend.) I am also using Flyway to maintain the database versioning and maintain the database structure.
What is the best approach to maintain the database properties? At the moment I have 3 places where I am repeating the same thing. I have the application.yml of module A and application.yml of module B, since both are separate Spring Boot applications. Then I have the Flyway plugin configuration again which needs the properties in the pom.xml to be able to perform its tasks, like clean, repair and migrate.
What is the proper approach to centralise and externalise this information, like the database URL, username and password? I am also facing the issue that each time I pull the new code onto the test system I have to update the same data again because it gets overwritten, and the database configuration on the test system is different from my local development environment.
What is the best strategy to manage this?
Externalize your configuration into a configuration module. This, of course, depends on how flexible Flyway / Spring Boot are from using classpath based properties.
Look at something like archaius and make your configuration truly externalized, centralized and dynamic by having it backed by, say, an external datastore. More work involved here but gives you additional benefits, like being able to change config in one place and have them dynamically picked up in running applications everywhere.
It's not an easy problem to solve and definitely involves some work to make your tools cooperate by hooking into their lifecycle.
For your flyway you could use the maven-properties-plugin. In that way you can externalize the credential to a properties file. An example is described here.
For the spring-boot application I will recommend the spring cloud config . With the spring cloud config you can externalize your config to a git repository which can be discovered over an Eureka Service, e.g. like here. I will consider to restructure the modules to independent microservices. A good infrastructure for a microservice based architecture provides the JHipster project.
What is the difference between Apache camel-jbpm and jboss jbpm ?
Since Apache camel(2.16.3) is having one component as camel-jbpm.
I am confused which one I should use ? I am integrating with karaf. please suggest.
JBoss BPM (business process - a.k.a human workflow) is a project you can find and read more about here:
http://www.jbpm.org/
Apache Camel is an integration library that allows to integrate with a lot of different system. Doing so by using Camel components. One of these components is camel-jbpm that makes it possible/easier to use JBPM from Camel users.
http://camel.apache.org/jbpm
So if you have an existing BPM system and need to integrate with that from a Camel application or Java application, then using the camel-jpmn can make that (much) easier.
I have an OSGi-based, server side application that uses the filesystem to store scripts and configuration data.
In time, I'd like to move that application to 'the cloud', and that's not going to work well with its current dependency to file system access.
What I'd like to do is insert a JCR layer into this application, so it will still work in the current situation (regular files on the local filesystem), but will pave the way to a cloud situation.
I did find a file connector in modeshape, but I ran into a pretty severe incompatibility with OSGi, which hasn't been fixed. Besides, ModeShape pulls in LOTS of dependencies (about 6 MB, I think), which is a problem for me.
So I don't see any options besides starting to hack my own JCR implementation, which I am reluctant to do.
Any ideas?
Although you wouldn't be using JCR directly, using the Apache Sling ResourceProvider mechanism should allow you to move easily from filesystem to something else later, and it's OSGi-friendly as Sling is 100% based on OSGi.
You could start now by using Sling's Filesystem resource provider ( http://sling.apache.org/site/accessing-filesystem-resources-extensionsfsresource.html ) and later move to your own custom ResourceProvider, as needed.
The source code of the filesystem provider is at https://svn.apache.org/repos/asf/sling/trunk/bundles/extensions/fsresource - it's quite simple code that can be used as an example for creating your own ResourceProvider.
For your custom system the question would be how many Sling bundles you need to get that working - I don't know off the top of my head but would suggest using the Sling Launchpad to find out, it launches a vanilla Sling system with lots of bundles that you won't need, but you could try reducing it to the minimum that still allows the ResourceProvider mechanism to work.
You can also use Apache Commons VFS2, there is for example a JCR connector, or you can use webdav or a JDBC table. I use this in a commercial project on top of an atomic (git like) tree on top of a shared JDBC table.