Sling vs CMIS as a REST interface for Apache Jackrabbit

If I understand correctly, Apache Sling acts as a REST CRUD interface for a Jackrabbit JCR repository.
As there already exists a RESTful protocol (CMIS in its AtomPub implementation) to work with JCR repositories, is there, apart from the view/templating layer, any advantage in using Apache Sling vs CMIS (i.e. via Apache Chemistry)?
Is there anything that can be done using JCR (Sling) that CMIS does not support?

A few characteristics of both protocols might help you choose one to work with.
CMIS
Started as a means to federate content across different (document) content repositories, its core business is letting diverse repositories talk to each other over a web interface (REST / WS). While the latest edition of the protocol improved browser interaction with content through the JSON-based browser binding, CMIS is often a rather chatty protocol and does not always shine at content delivery.
PROS: standard, supported by a multitude of vendors, supports a slightly richer data model (Renditions, Policies)
CONS: chatty, lots of XML parsing if using the AtomPub or WS bindings, can't create custom service APIs
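To give a concrete feel for the AtomPub binding, here is a minimal client sketch using OpenCMIS from Apache Chemistry; the repository URL and credentials are placeholders, not a real server:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.chemistry.opencmis.client.api.Session;
    import org.apache.chemistry.opencmis.client.api.SessionFactory;
    import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
    import org.apache.chemistry.opencmis.commons.SessionParameter;
    import org.apache.chemistry.opencmis.commons.enums.BindingType;

    // Minimal OpenCMIS client sketch using the AtomPub binding.
    // The URL and credentials below are placeholders.
    public class CmisAtomPubExample {
        public static void main(String[] args) {
            Map<String, String> params = new HashMap<>();
            params.put(SessionParameter.ATOMPUB_URL, "http://localhost:8080/cmis/atom");
            params.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
            params.put(SessionParameter.USER, "admin");
            params.put(SessionParameter.PASSWORD, "admin");

            SessionFactory factory = SessionFactoryImpl.newInstance();
            // connect to the first repository the endpoint exposes
            Session session = factory.getRepositories(params).get(0).createSession();
            System.out.println("Root folder: " + session.getRootFolder().getName());
        }
    }

Even this small round trip negotiates several AtomPub service documents under the hood, which is part of why the protocol feels chatty.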
Sling
I am no expert on this, but as far as I can gather it's a lightweight, extensible HTTP layer on top of JCR. Data processing logic is tied to the content you request via HTTP as components, giving you the ability to process and, if needed, optimize content before delivery.
PROS: adds a data processing layer on top of content retrieval, works on plain HTTP without complex payloads to describe each action
CONS: non standard, can't easily swap content repository
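To illustrate the plain-HTTP point: Sling's default POST servlet turns a simple form POST into node creation or update at the request path. A rough sketch with Java's built-in HTTP client follows; the host, path and properties are made up, and authentication is omitted:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Rough sketch: Sling's default POST servlet creates/updates a node at
    // the request path from form parameters. Host, path and the two
    // properties are made up; authentication is omitted for brevity.
    public class SlingPostExample {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/content/myarticle"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    // each form field becomes a property on the node
                    .POST(HttpRequest.BodyPublishers.ofString("title=Hello&text=World"))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
        }
    }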

Related

Apache Camel vs Apache NiFi

I have been using Apache Camel for quite a long time and have found it to be a fantastic solution for all kinds of system-integration needs. But a couple of years back I came across Apache NiFi. After some googling I found that although NiFi can work as an ETL tool, it is actually meant for stream processing.
In my opinion, "which is better" is a bad question to ask, as the answer depends on many things. But it would be nice if somebody could describe the basic comparison between the two, and also answer the obvious question: when to use which?
It would help me decide which is the better option for my current requirement, or whether I should use both of them together.
The biggest and most obvious distinction is that NiFi takes a no-code approach - 99% of NiFi users will never see a line of code. It is a web-based GUI with a drag-and-drop interface for building pipelines.
NiFi can perform ETL, and can be used in batch use cases, but it is geared towards data streams. It is not just about moving data from A to B, it can do complex (and performant) transformations, enrichments and normalisations. It comes out of the box with support for many specific sources and endpoints (e.g. Kafka, Elastic, HDFS, S3, Postgres, Mongo, etc.) as well as generic sources and endpoints (e.g. TCP, HTTP, IMAP, etc.).
NiFi is not just about messages - it can work natively with a wide array of different formats, but can also be used for binary data and large files (e.g. moving multi-GB video files).
NiFi is deployed as a standalone application - it's not a framework or API or library or something that you integrate into something else. It is a fully self-contained, realised application that is fully featured out of the box with no additional development, though it can be extended with custom development if required.
NiFi is natively clustered - it expects (though doesn't require) to be deployed on multiple hosts that work together as a cluster for performance, availability and redundancy.
So, the two tools are used quite differently - hopefully that helps highlight some of the key differences.
It's true that there is some functional overlap between NiFi and Camel, but they were designed very differently:
Apache NiFi is a data processing and integration platform that is mostly used centrally. It has a low-code approach and prefers configuration.
Apache Camel is an integration framework which is mostly used in distributed solutions. Solutions are coded in Java. Example solutions are adapters, flows, APIs, connectors, cloud functions and so on.
They can be used very well together. Especially when using a message broker like Apache ActiveMQ or Apache Kafka.
An example: a Java application is enhanced with Camel so that it can send messages to Kafka. In NiFi, the first step is consuming those messages from Kafka. Then, in the NiFi flow, the message is changed in various steps. In the middle, the message is put on another Kafka topic. A Camel function (Camel K) in the cloud performs various operations on the message and, when finished, puts it on a Kafka topic. The message then goes through a NiFi flow which, at the end, calls an API created with Camel.
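As a hedged sketch of the first step in that example - a Java application enhanced with Camel so it can send messages to Kafka - the broker address and topic name below are made up:

    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.impl.DefaultCamelContext;

    // Sketch of the first step of the example above: a Java application
    // enhanced with Camel that sends messages to Kafka. Broker address
    // and topic name are made up.
    public class SendToKafka {
        public static void main(String[] args) throws Exception {
            DefaultCamelContext context = new DefaultCamelContext();
            context.addRoutes(new RouteBuilder() {
                @Override
                public void configure() {
                    from("timer:tick?period=5000")        // stand-in for application events
                        .setBody(constant("hello from Camel"))
                        .to("kafka:events?brokers=localhost:9092");
                }
            });
            context.start();
            Thread.sleep(30_000);                          // let a few messages flow
            context.stop();
        }
    }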
In a blog post I describe in detail the various ways to combine Camel and NiFi:
https://raymondmeester.medium.com/using-camel-and-nifi-in-one-solution-c7668fafe451

What is a mediation engine?

What is a mediation engine as referred to in Camel's documentation (link below)?
https://camel.apache.org/manual/latest/faq/what-is-camel.html
A use-case example too would be greatly appreciated.
The mediation engine referenced in this context originates from the field of Enterprise Application Integration and is closely related to what the GoF Mediator pattern does - that is, encapsulating communication between entities. In the case of EAI, a mediator/mediation engine sits between multiple disparate systems and acts as a broker between them, instead of letting the systems communicate directly.
The mediation approach in EAI offers capabilities like:
Reduced coupling between systems. For instance, you do not have to learn and implement a legacy mainframe protocol in modern systems just because you want to get some data from the mainframes. A mediation engine like Apache Camel could talk REST over HTTPS at one end and some archaic mainframe protocol at the other.
Ease of migration: once the mainframes are replaced with something else, you just change the mediation layer accordingly, instead of modifying the multiple impacted systems that used to talk to the mainframes.
Access to a single resource/service via multiple channels: let's say you have an old system that currently does SOAP over HTTP, but you would like to offer REST with JSON payloads to some of your new customers. Instead of building completely new systems up front for this purpose, you could drop in Apache Camel as a mediator: it would accept JSON payloads at one end and speak SOAP at the other. Whoever wants to talk JSON goes through Camel; whoever wants to do SOAP may continue with a direct connection to the legacy system. Someday, if some hypothetical FooBar protocol becomes popular, and if Apache Camel provides a FooBar component, the users who demand FooBar support could be routed through Camel to the system that still speaks SOAP.
All of this is discussed in detail on the Enterprise Integration Patterns site and in the accompanying book. Apache Camel implements truckloads of the patterns described in the EIP book. I hope this answer helps you understand the mediation role Apache Camel can play in enterprise IT ecosystems.
From Camel in Action:
The core feature of Camel is its routing and mediation engine. A routing engine selectively moves a message around, based on the route's configuration. In Camel's case, routes are configured with a combination of enterprise integration patterns and a domain-specific language.
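To make the quote concrete, here is a minimal route in Camel's Java DSL using the Content-Based Router pattern, one of the EIPs; the queue names are made up:

    import org.apache.camel.builder.RouteBuilder;

    // Minimal sketch: a Content-Based Router (one of the EIPs) expressed
    // in Camel's Java DSL. Queue names are made up for illustration.
    public class OrderRouter extends RouteBuilder {
        @Override
        public void configure() {
            from("activemq:queue:incoming")                 // consume from a queue
                .choice()                                   // Content-Based Router EIP
                    .when(header("type").isEqualTo("order"))
                        .to("activemq:queue:orders")        // orders go here
                    .otherwise()
                        .to("activemq:queue:other");        // everything else
        }
    }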
The page linked above (https://camel.apache.org/manual/latest/faq/what-is-camel.html) also points to projects that can serve as components in a Camel route, to and from which messages can be sent and consumed (e.g. https://camel.apache.org/components/latest/activemq-component.html, https://camel.apache.org/components/latest/cxf-component.html).
Apache Camel is a kind of ESB middleware. Mediation with respect to Camel means the following:
Data format transformation: if Application A speaks JSON and Application B only understands CSV, you can use Apache Camel to transform JSON to CSV (see the sketch after this list).
Protocol transformation: if Application A only knows how to call web services but Application B prefers reading data from a message queue, you can use Apache Camel to receive the data by exposing a web service and then push it to a queue for Application B to consume.
Content transformation - filtering or enriching data: during this transformation process you can also filter or enrich data fields based on what Application B needs. This way no change is required in A, which sends what it has, and no change is required in B, which gets what it needs.
Connectors: many ESBs now have built-in connectors to connect directly to ERP or SaaS applications - for example, a Kafka connector: https://camel.apache.org/blog/Camel-Kafka-connector-intro/
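Here is the sketch promised above, combining the first two points: Application A POSTs JSON over plain HTTP, Camel reshapes it into a CSV line and drops it on a queue for Application B. The endpoint URIs and the field names are hypothetical:

    import java.util.Map;

    import org.apache.camel.builder.RouteBuilder;
    import com.fasterxml.jackson.databind.ObjectMapper;

    // Sketch: receive JSON over HTTP (protocol side), turn it into a CSV
    // line (data format side) and hand it to a queue. URIs and the
    // "id" / "amount" fields are hypothetical.
    public class JsonToCsvRoute extends RouteBuilder {
        private final ObjectMapper mapper = new ObjectMapper();

        @Override
        public void configure() {
            from("jetty:http://0.0.0.0:8080/orders")        // Application A calls HTTP
                .process(exchange -> {
                    Map<?, ?> json = mapper.readValue(
                            exchange.getIn().getBody(String.class), Map.class);
                    // naive JSON -> CSV conversion for two assumed fields
                    exchange.getIn().setBody(json.get("id") + "," + json.get("amount"));
                })
                .to("activemq:queue:orders.csv");           // Application B reads the queue
        }
    }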

Pros and cons of using Lucidworks Fusion instead of regular Solr

I would like to know the pros and cons of using Fusion instead of regular Solr. Can you give some examples (like a problem that can be solved more easily using Fusion)?
First of all, I should disclose that I am the Product Manager for Lucidworks Fusion.
You seem to already be aware that Fusion works with Solr (or one or more Solr clusters or instances), using Solr for data storage and querying. The purpose of Fusion is to make it easier to use Solr, integrate Solr, and to build complex solutions that make use of Solr. Some of the things that Fusion provides that many people find helpful for this include:
Connectors and a connector framework. Bare Solr gives you a good API and the ability to push certain types of files at the command line. Fusion comes with several pre-built data source connectors that fetch data from various types of systems, process it as appropriate (including parsing, transformation, and field mapping), and send the results to Solr. These connectors include common document stores (cloud and on-premise), relational databases, NoSQL data stores, HDFS, enterprise applications, and a very powerful and configurable web crawler.
Security integration. Solr does not have any authentication or authorization (though as of version 5.2 it has a pluggable API and a basic Kerberos implementation for authentication). Fusion wraps the Solr APIs with a secured version. Fusion has clean integrations with LDAP, Active Directory, and Kerberos for authentication. It also has a fine-grained authorization model for managing and configuring Fusion and Solr. And the Fusion authorization model can automatically link group memberships from LDAP/AD with access control lists from the Fusion connector data sources, so that you get document-level access control mirrored from your source systems when you run search queries.
Pipelines processing model. Fusion provides a pipeline model with modular stages (in both API and GUI form) to make it easier to define and edit transformations of data and documents. It is analogous to unix shell pipes. For example, while indexing you can include stages to define mappings of fields, compute new fields, aggregate documents, pull in data from other sources, etc. before writing to Solr. When querying, you could do the same, along with transforming the query, running and returning the results of other analytics, and applying security filtering.
Admin GUI. Fusion has a web UI for viewing and configuring the above (as well as the base Solr config). We think this is convenient for people who want to use Solr, but don't use it regularly enough to remember how to use the APIs, config files, and command line tools.
Sophisticated search-based features: using the pipelines model described above, Fusion includes (and makes easy to use) some richer search-based components, including natural language processing and entity extraction modules, and real-time signals-driven relevancy adjustment. We intend to provide more of these in the future.
Analytics processing: Fusion includes and integrates Apache Spark for running deep analytics against data stored in Solr (or on its way in to Solr). While Solr implicitly includes certain data analytics capabilities, that is not its main purpose. We use Apache Spark to drive Fusion's signals extraction and relevancy tuning, and expect to expose APIs so users can easily run other processing there.
Other: many useful miscellaneous features like: dashboarding UI; basic search UI with manual relevancy tuning; easier monitoring; job management and scheduling; real-time alerting with email integration, and more.
A lot of the above can of course be built or written against Solr, without Fusion, but we think that providing these kinds of enterprise integrations will be valuable to many people.
Pros:
Connectors: Lucidworks provides a wide range of connectors with which you can connect to data sources and pull data from them.
Reusability: in Lucidworks you can create pipelines for data ingestion and data retrieval, and pipelines with common logic can be reused in other pipelines.
Security: you can apply restrictions over data, i.e. security trimming. Lucidworks provides built-in query-pipeline stages for security trimming, or you can write a custom pipeline stage for your use case.
Troubleshooting: Lucidworks runs as discrete services (API, connectors, Solr). You can troubleshoot any issue per service, as each service has its own logs. You can also configure JVM properties for each service.
Support: Lucidworks support is available 24/7. You can create a support case according to severity and they will schedule a call with you.
Cons:
Not much, except that it keeps you away from your normal development; you don't get much chance to open your IDE and start coding.

How to integrate various applications and provide a common interface to access their data?

We have several different applications that store their data, and we need one common service that provides access to that data.
By applications I mean, for example, Atlassian Jira, Confluence, SVN, Git, LDAP, a few internal MySQL databases, etc. Some of them offer a SOAP API or a REST API or various command-line clients; for some you have to access the database directly to get the data.
What we want is one common REST API to access all possible data sources. Of course, we have to solve authentication and authorization, caching and many more tasks.
It seems that something like an ESB (Enterprise Service Bus) together with EIP (Enterprise Integration Patterns) is the answer to our needs.
For a start, we are playing with, and actually digging into, Apache Camel - it's not a full EIP stack, it's "just" an integration framework, but I guess it's good enough for us right now.
My question is: what do you think of this solution? Are we on the right track?
Thanks!
Camel has a lot of connectors, so that would be a great start.
If you are afraid it is too thin, then take a look at Apache ServiceMix, which provides a deployment (OSGi) container for Camel routes (and other things). Camel comes bundled within the standard ServiceMix release out of the box.
The hard task is probably to design a generic API that is good enough to cover your use cases.
A Git repo and a database are very different - can this really be made generic? Do you only want to access "text" data, or something more?
I like the approach with Camel nonetheless, since it's rather generic and flexible in this kind of scenario - and that flexibility is something you will need.
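As a rough sketch of what such a thin common HTTP facade could look like with Camel - the host, path and header below are hypothetical, and a real solution would fan out to Jira, Confluence, Git and the databases:

    import org.apache.camel.builder.RouteBuilder;

    // Rough sketch of a thin HTTP facade: expose one endpoint and forward
    // to a backend's REST API. Host, path and the "key" header are
    // hypothetical; authentication, caching etc. are left out.
    public class FacadeRoute extends RouteBuilder {
        @Override
        public void configure() {
            from("jetty:http://0.0.0.0:8080/api/issues")
                // toD resolves the target URI per message (dynamic endpoint)
                .toD("https://jira.example.com/rest/api/2/issue/${header.key}"
                     + "?bridgeEndpoint=true");
        }
    }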

Apache SOA vs. Mule

I'm looking for a high level technical gap analysis of the Apache ESB/SOA stack (Servicemix, Camel, ActiveMQ, CXF) vs. comparable Mule technologies.
As well, I'm trying to better understand how these frameworks are viewed amongst developers in terms of learning curve, stability, scalability and overall ability to meet client requirements...
It's not really an answer, but too long to be added as a comment.
Gartner does such comparisons (example), so does Forrester (example1; example2), but their papers are:
expensive to obtain
focusing more on the market share and the hype, less on the technical capability to deliver a solution
mainly about commercial products - maybe because market share for open source is difficult to measure (no licenses sold)
I personally have experience with Oracle Fusion (bad), Tibco (better) and Vitria (outdated), but I'm not up to the challenge to do a detailed comparison...
Camel uses a Java domain-specific language in addition to Spring XML for configuring the routing rules and providing Enterprise Integration Patterns.
Camel's API is smaller & cleaner (IMHO) and is closely aligned with the APIs of JBI, CXF and JMS; it is based around message exchanges (with in and optional out messages), which maps more closely to REST, WS, WSDL & JBI than the UMO model Mule is based on.
Camel allows the underlying transport details to be easily exposed (e.g. the JmsExchange, JbiExchange and HttpExchange objects expose all the underlying transport information & behaviour if it's required). See the Camel FAQ entry comparing the Camel and Mule APIs.
Camel supports an implicit type converter in the core API to make it simpler to connect components together that require different types of payload & headers (see the sketch after this list).
Camel uses the Apache 2 license rather than Mule's more restrictive commercial license.
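The sketch mentioned in the list: a route in the Java DSL where the implicit type converter turns the file payload into a String before it is sent on; the endpoint URIs are made up:

    import org.apache.camel.builder.RouteBuilder;

    // Small illustration of the Java DSL and Camel's core type converters.
    // The endpoint URIs are made up.
    public class FileToJmsRoute extends RouteBuilder {
        @Override
        public void configure() {
            from("file:inbox")                      // body arrives as a file-backed type
                .convertBodyTo(String.class)        // implicit type converter at work
                .to("jms:queue:inbox");             // send the text to a JMS queue
        }
    }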
MuleSoft Anypoint is a ready-to-use, full-stack integration platform. The Apache components provide functionally similar capabilities but generally take more time to implement and support. Both allow dropping down to the Spring / Java level, so there are no true technical gaps in either. The choice depends on the business goals, available budget, and the scope and number of the integration projects. Mule offers better time to market and is easier to operate, but isn't particularly cheap. The Apache stack is free, but developers' time (generally) is not.
Camel is an EAI framework and doesn't have its own runtime, while Mule is a full ESB product with its own runtime. Mule has lots of connectors to integrate with other systems and positions itself as a lightweight ESB. Developers have full liberty to write their own connectors or invoke existing Java libraries to avoid rework.
