Do Apache Camel and Apache Airflow overlap and how do they compare?

Do Apache Camel and Apache Airflow overlap and how do they compare? - apache-camel

We are currently using Apache-Camel for ETL, that is, we take daily/weekly/monthly exports from various databases, perform needed actions and then publish the results somewhere for other databases to ingest.
Recently i saw a talk on Apache-Airflow, and it seems to me that it can do the work Camel is doing only easier. By easier i mean it looks like it would be more self-documenting and therefore easier to maintain. Am i correct? And why are there no comparisons between the two, like there are between Camel and Mule?

Apache Camel and Apache Airflow were written for different purposes. The former as a Enterprise Integration Framework, the latter as a platform to programmatically author, schedule and monitor workflows, this is why they are not generally compared side-by-side.
Apache Camel can be used for ETL: think of ETL as a process integrating the operational DB and the datawarehouse, and think of each step in the ETL data-processing process as a message.
Would it be easier to perform the task we are doing now, if we changed to Airflow? Well, generally how well suited a framework is for a specific company's needs depends on how things are set up on site. In our case we have chosen for Java and we want our processes to run on windows machines and on linux. The comparison then becomes:
Camel's main advantages are that it we are already using it, it's Java, and there is even a Spring boot auto-configuration.
The main disadvantages are that it is hard to maintain: understanding what exactly happens when and why, is hard. This is not directly caused by the features Camel has as a Enterprise Integration Framework, but because it is not tailored to simplify workflows.
Airflow is specifically written with scheduling interdependent jobs in mind, it even has a GUI to simplify this task.
For us it would require additional installations and it may not work with our Java-witten jobs out-of-the-box (i know that it is possible to call java from python, but this just adds more complexity).
For my needs i'm going to explore other options and maybe just leave things the way they are.

It depends on the type of problem(s) you are looking to solve. Apache Camel is an enterprise integration framework that implements well-known, accepted Enterprise Integration Patterns to provide specific solutions to types of well known problems.
Apache Airflow does not implement these integration patterns and therefore would be less useful in solving these specific types of problems.
From my experience with Camel, it is often misused as a generic platform to solve non enterprise-integration problems, which leads to dealing with the unnecessary overhead and constraints of the framework.
Using your ETL problem as an example, I would think that Apache Camel would be unnecessary unless you were doing some form of Message Routing or Message Transformation of the data that would warrant/benefit from using an integration solution such as Camel. The solutions that Apache Camel offers for these well-known integration problems are the real benefit to using Apache Camel over another tool or doing it by hand.
TLDR; To answer your question, Apache Camel is an Enterprise Integration Framework for solving specific types of integration problems and Apache Airflow is not. That is likely why there is no comparison between the two - they are apples and oranges, in a sense.
While you may be able to do some of the same things in both, Apache Camel will also have complex integration solutions out of the box that Airflow won't.

Related

Apache Camel vs Apache Nifi

I am using Apache camel for quite long time and found it to be a fantastic solution for all kind of system integration related business need. But couple of years back I came accross the Apache Nifi solution. After some googleing I found that though Nifi can work as ETL tool but it is actually meant for stream processing.
In my opinion, "Which is better" is very bad question to ask as that depend on different things. But it will be nice if somebody can describe more about the basic comparison between the two and also the obvious question, when to use what.
It will help to take decision as per my current requirement, which will be the good option in my context or should I use both of them together.

The biggest and most obvious distinction is that NiFi is a no-code approach - 99% of NiFi users will never see a line of code. It is a web based GUI with a drag and drop interface to build pipelines.
NiFi can perform ETL, and can be used in batch use cases, but it is geared towards data streams. It is not just about moving data from A to B, it can do complex (and performant) transformations, enrichments and normalisations. It comes out of the box with support for many specific sources and endpoints (e.g. Kafka, Elastic, HDFS, S3, Postgres, Mongo, etc.) as well as generic sources and endpoints (e.g. TCP, HTTP, IMAP, etc.).
NiFi is not just about messages - it can work natively with a wide array of different formats, but can also be used for binary data and large files (e.g. moving multi-GB video files).
NiFi is deployed as a standalone application - it's not a framework or api or library or something that you integrate in to something else. It is a fully self-contained, realised application that is fully featured out of the box with no additional development. Though it can be extended with custom development if required.
NiFi is natively clustered - it expects (but isn't required) to be deployed on multiple hosts that work together as a cluster for performance, availability and redundancy.
So, the two tools are used quite differently - hopefully that helps highlight some of the key differences

It's true that there is some functional overlap between NiFi and Camel, but they were designed very differently:
Apache NiFi is a data processing and integration platform that is mostly used centrally. It has a low-code approach and prefers configuration.
Apache Camel is an integration framework which is mostly used in distributed solutions. Solutions are coded in Java. Example solutions are adapters, flows, API's, connectors, cloud functions and so on.
They can be used very well together. Especially when using a message broker like Apache ActiveMQ or Apache Kafka.
An example: A java application is enhanced with Camel so that it can send messages to Kafka. In NiFi the first step is consuming those messages from Kafka. Then in the NiFi flow the message is changed in various steps. In the middle the message is put on another Kafka topic. A Camel function (CamelK) in the cloud does various operations on the message, when it's finished it put the message on a Kafka topic. The message goes through a NiFi flow which at the end calls an API created with Camel.
In a blog I wrote in detail on the various ways to combine Camel and Nifi:
https://raymondmeester.medium.com/using-camel-and-nifi-in-one-solution-c7668fafe451

Apache Flink Stateful Functions python vs java performance

What are the advantages and disadvantages of using python or java when developing apache flink stateful function.
Is there any performance difference? which one is more efficient for the same operation?
Can we develop the application completely on python?
What are the features that one supports and the other does not.

StateFun support embedded functions and remote functions.
Embedded functions are bundled and deployed within the JVM processes that run Flink. Therefore they must be implemented in a JVM language (like Java) and they would be the most performant. The downside is that any change to the function code requires a restart of the Flink cluster.
Remote functions are functions that are executing in a separate process, and are invoked by the Flink cluster for every incoming message addressed to them. Therefore they are expected to be less performant than the embedded functions, but they provide a great flexibility in:
Choosing an implementation language
Fast scaling up and down
Fast restart in case of a failure.
Rolling upgrades
Can we develop the application completely on python?
Is it is possible to develop an application completely in Python, see the python greeter example.
What are the features that one supports and the other does not.
The current features are currently supported only in the Java SDK:
Richer routing logic from an ingress to a function. Any routing logic that you can describe via code.
Few more state types like a table and a buffer.
Exposing existing Flink sources and Sinks as ingresses and egresses.

When would you use an enterprise service bus and an integration framework like Apache Camel?

I was trying to get a fix on when using Apache Camel would be appropriate and inappropriate from reading this article -- https://dzone.com/articles/when-use-apache-camel . What the article mentions is that when the number of services is low, using an integration framework like Camel might be overkill, which makes sense. But I was confused by this sentence
Although FuseSource offers commercial support, I would not use Apache
Camel for very large integration projects. An ESB is the right tool
for this job in most cases. It offers many additional features such as
BPM or BAM. Of course, you could also use several single frameworks or
products and „create“ your own ESB, but this is a waste of time and
money (in my opinion).
Is this because the integration framework lacks components that an ESB provides? If so, what are those?

At a functional level, Apache Camel does everything that all the other ESB's do and is a fine choice for pretty much any integration work. It has connectors (components in camel-speak) for every transport you can think of, deals with clustering, can provide a JMS broker and whatever else you need to integrate.
It doesn't have the nice UI's and IDEs that other tools (Tibco, webMethods, Boomi etc) which is a big advantage. The developers might actually write unit tests if you use Camel :) I'm joking of course, integration devs never write unit tests.
In terms of "weight" Camel itself is not too bloated. It can be used as a standalone runtime or you can simply leverage the integration capabilities as a library in another app. It heavily utilises spring and can run a large number of threads, so requires a reasonable amount of memory (~512Mb JVM Heap would be the lower bound) but is not difficult to use. It wont fight you.
JBoss Fuse is the full blown Red Hat supported Enterprise ESB. It is based around Camel, but uses apache karaf as the runtime which is a OSGi container. This is heavy weight but gives you a very powerful ESB runtime, deployment model and management interface and you can buy commercial support for Fuse and ActiveMQ from Red Hat. This is the more traditional "ESB" platform, but deep down inside all the integration functionality comes from Camel.

I have been working with various products that are used to implement ESBs in the past 10 years: Apache Camel, Mule, Oracle Service Bus, IBM Integration Bus (WebSphere Message Broker), Spring Integration, FUSE, etc. I can tell you what are the differences from various perspectives, but most importantly, what you have to understand is that ESB is an abstract concept - basically a "bus" in terms of integration patterns that connects various services in an entreprise. For this reason, you can build an ESB with a product that targets this specific need (e.g. IIB) or you can build it with something much more generic (e.g. SpringIntegration).
Management point of view:
The top-end commercial products (the ones you refer to "ESBs") like the ones from Oracle or IBM are well suited for this job. They have nice user interfaces that are made for implementing a bus pattern. On the workforce market you will need to find (may be simple or not) certified specialists to operate these products. Behind these products you will get support of big companies with which the enterprise may already have existing contracts. Often these products will have very specific connectors that may (or not, see developer point of view) ease your life. To give you an example, IBM's product will have very efficient connectors for IBM MQ or Mainframe CICS. They also offer out-of-the box integrations with other products like BPM. When building bigger implementations of ESBs, the projects are structured by the fact that not everything can be done in such a product - this means that you have smaller risk but also less flexibility. In terms of problems management, you are relatively safe because you blame the big company that supports you.
Towards the more "open" products/frameworks like FUSE, Mule, Camel, etc., you will start getting very flexible software that may be also cheaper as a start price. For working with it you will need generic profiles, but you will also need software architects to design your product because nothing forces you build the message flows in a particular way. At a certain point you will need at least some very skilled developers because you may not get efficient support, depending on the product you choose. In terms of problems management, you are fully responsible for this choice.
Developer point of view:
Top-end commercial products will come with in-built connectors and other operators - quick to build a POC. However, any customization will be a pain (for instance you may have an HTTP connector, but you may need to switch to some new SHA algorithm that has not been standard at the time you installed). The introduction of specific custom connectors will be possible, but much more complicated than with "open" products. You'll get nice user interfaces that will hide the code. This will mean a series of drawbacks for you, for instance code repository usage will not be efficient (can't diff two versions), unit tests are difficult to write, etc. The integrations with other products like BPM will force you to use a pre-chosen product, anything else will be doable but expensive.
The open-end products/frameworks will be flexible, everybody in the team will understand the product quickly (e.g. provided everyone knows Spring). Changing some core functionality might be very cheap. However, things can quickly degenerate if there is no supervision of experienced members because you can code pretty much anythig.
Architect point of view:
That said, there is something about the ESB that you should know in advance. You should be very clear in terms of capabilities of what you need to do (let's say: routing, service discovery, protocol conversion, etc.). Should you need to implement something complex in an ESB, for instance transformation of the business message, means that the business analyst will need to interact with your ESB. So for instance the expert in banking payments will need to write an XSLT for you. This XSLT might be simple, but it may also cause big trouble, for which you will be responsible. Now if this person also needs to know the product of your choice, this complicates things. Therefore the added value of integrating an ESB with BPM, BAM, etc. is very questionable.
Another thing to note is that for most enterprises nowadays, the ESB is not something that would create some competitive advantage, like for instance a web page. So investing in a super complex development or out-of-the-box product needs to be challenged.
Conclusion
Not sure if it was overwhelming or understandable, but in the end it boils down to the resources you have and the complexity of the task.
Just as a comment, contrary to another answer on this question, if you are building something complex:
Pay attention to the Architect's point of view above.
You may seriously consider the open end of products because by experience I can tell you that no matter what kind of support you have, when the things get nasty you better have very skilled people in-house and no official support will really save you (apart from your face).

A recent article by codecentric states as follows:
Apache Camel is a framework full of tools for routing data within an application. It is a framework you use when a full-blown Enterprise Server Bus is not (yet) needed.
You can find the full article here

Don't get too stuck in words like ESB and frameworks. The things that should decide on what kind of platform you need should depend on:
Your current and future system landscape
Your current and future requirements (technical and non-functional)
Your in house competence
Your budget
After going through those steps then you can evaluate and come to a conclusion on what you need .For some apache camel is the best fit for some other platforms may be a better fit. Don't get stuck in words like ESB etc.

How to integrate various application and provide common interface to access their data?

we have a few various applications that stores its data and we need one common service which provides access to these data.
With the applications I mean for example Atlassian Jira, Confluence, SVN, Git, LDAP, few internal mysql databases, etc. Some of them offers you SOAP API, REST API, various command line clients, for some you have to directly access database to get data.
What we want is a common REST API interface, to access all possible data sources. Of course, we have to solve authentication and authorization, caching and many more tasks.
It seems that something like ESB - Enterprise service bus and EIP - Enterprise integration patterns is answer to our needs.
For start, we are playing and actually dig in to Apache Camel - it's not full EIP stack, it's "just" a integration framework. But I guess it's good enough for us right now.
My question is - what you mean about the solution? Are we on the good way?
Thanks!

Camel has a lot of connectors, so that would be a great start.
If you are afraid it is too thin, then take a look at Apache ServiceMix, which provides a deployment (OSGi) container for camel routes (and other things). Camel comes bundled within the standard service mix release out of the box.
The hard task is probably to design the generic API good enough to cover your use cases.
A GIT repo and a Database are very different, is this very generic? Do you only want to access "text" data or something?
I like the approach with camel non the less, since it's rather generic and flexible in these kind of scenarios. That you will need

Apache SOA vs. Mule

I'm looking for a high level technical gap analysis of the Apache ESB/SOA stack (Servicemix, Camel, ActiveMQ, CXF) vs. comparable Mule technologies.
As well, I'm trying to better understand how these frameworks are viewed amongst developers in terms of learning curve, stability, scalability and overall ability to meet client requirements...

It's not really an answer, but too long to be added as a comment.
Gartner does such comparisons (example), so does Forrester (example1; example2), but their papers are:
expensive to obtain
focusing more on the market share and the hype, less on the technical capability to deliver a solution
mainly about commercial products - maybe because market share for open source is difficult to measure (no licenses sold)
I personally have experience with Oracle Fusion (bad), Tibco (better) and Vitria (outdated), but I'm not up to the challenge to do a detailed comparison...

Camel uses a Java Domain Specific Language in addition to Spring XML for configuring the routing rules and providing Enterprise Integration Patterns
Camel's API is smaller & cleaner (IMHO) and is closely aligned with the APIs of JBI, CXF and JMS; based around message exchanges (with in and optional out messages) which more closely maps to REST, WS, WSDL & JBI than the UMO model Mule is based on
Camel allows the underlying transport details to be easily exposed (e.g. the JmsExchange, JbiExchange, HttpExchange objects expose all the underlying transport information & behaviour if its required). See How does the Camel API compare to
Camel supports an implicit Type Converter in the core API to make it simpler to connect components together requiring different types of payload & headers
Camel uses the Apache 2 License rather than Mule's more restrictive commercial license

Mulesoft Anypoint is a ready to use full-stack integration platform. Apache components functionally provide similar capabilities but generally take more time to implement and support. Both allow dropping down to Spring / Java level therefore no true technical gaps in either. The choice would depend on the business goals, available budget, and the scope and number of the integration projects. Mule offers better time to market and is easier to operate, but ain't particularly cheap. Apache stack is free but developers' time (generally) is not.

Camel is a EAI Framework and It doesn't have it's own runtime but other side Mule is full ESB product having it's own run-time. Mule has lot of connector to integrate with other system and stand itself as light weight ESB. Developers have full liberty to write own connector or invoke existing Java library to avoid rework.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight