What are the best alternatives to Jackrabbit/Oak?

We are building services for a CMS and went ahead with Jackrabbit. But as we move towards production, we are running into many issues with authentication/authorization, scaling, etc.
Our implementation uses Jackrabbit 2.18 with a MySQL database.
Because of the lack of community support, we are exploring other options.
Are there any better alternatives to Jackrabbit or Oak?
Or might Oak with MongoDB or Cassandra give better performance?

Related

Which community edition graph database supports high-available cluster and has good online query performance?

I am currently building a knowledge graph for an e-commerce company, and it mainly consists of the product category hierarchies, properties, and relations among them. Besides the common relational queries, we care about the following points very much:
Master-slave cluster support. This graph database will be used for online search query processing, so high availability is crucial to us. The data volume won't be as big as millions of nodes, so we don't need a distributed cluster that can spread data across multiple machines. Rather, we need multiple machines that can be read simultaneously, so that the service won't go down even if one of the machines is offline.
Fast online query performance. Reasoning about relations can be done offline, so its performance is not that important. But we need to do a lot of online queries like "find the nodes whose property P equals value V", so we need good performance for online query processing. This database will be read-intensive and won't change very much after its initialization.
Community and documentation. Since our team is new to the field of graph databases, we expect user-friendly documentation for deployment and development and an active community for solving problems.
Based on the requirements above, I investigated some candidates:
Neo4j. We first tried Neo4j since it's the most popular one in the field. Actually, I liked it, especially the Cypher query language. But we are about to abandon it because the community edition does not support clustering, and currently, we don't have the budget to pay for the enterprise edition.
OrientDB. OrientDB is like the second most popular one on the market, and it seems to support clustering in its community edition. I use the word "seems" because it is not clearly stated on its website. Can anyone clear this up? Besides, I found a negative article about OrientDB which makes me hesitate: http://orientdbleaks.blogspot.jp/2015/06/the-orientdb-issues-that-made-us-give-up.html
Titan. Titan is also great, but since its original company has been acquired and its original developers are developing a different product, its future development and maintenance are in doubt.
ArangoDB. This one seems to be very fast, according to the performance report (https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/), but I don't know if its online query processing ability is good enough, and I don't know how well it supports clustering.
As for documentation and community, I really have no idea since these are the kind of things that you only get to know after you start doing it.
To sum up, based on my requirements, I think OrientDB and ArangoDB may be my candidates, but I don't know which one to choose because of the points I stated above. Or is there perhaps another suitable candidate that I'm missing?
Thanks.
Max working for ArangoDB here. ArangoDB does not only do online queries for graphs, but due to its multi-model nature you can mix graph queries with document queries (using secondary indexes), key lookups and joins. It has a sophisticated query engine with an optimizer that is fully aware of the ArangoDB cluster structure and can optimize and distribute query executions across all instances.
In a cluster, sharding, synchronous replication and self-healing are all fully automatic with configurable parameters. Deployment of an ArangoDB cluster is particularly simple (literally two clicks) on Apache Mesos or DC/OS, but is also relatively straightforward with other orchestration frameworks. ArangoDB on DC/OS additionally allows you to scale up and down via the graphical user interface or REST API calls, and failed tasks are automatically replaced.
As for performance, all our benchmarks show very good results. The just-released version 3.1 even has vertex-centric indexes, which are particularly important for graph queries.
We do our best to provide extensive documentation, which you can find at https://www.arangodb.com/documentation/ . We have a user manual, a manual for our query language AQL, and one for the HTTP/REST API. Furthermore, we have tutorials, frequently asked questions, and a "Cookbook" for standard tasks, and we try to answer questions on Stack Overflow and GitHub issues in a timely manner.
All of this is included in the Community Edition, which is available with the Apache 2.0 open source license.
If you have more questions, do not hesitate to reach out to our team or to me personally.
OrientDB Community Edition is free open source software, built by a community of developers and constantly improving. Features such as horizontal scaling, fault tolerance, clustering, sharding and replication aren't disabled in the OrientDB Community Edition.
For more information about clustering, take a look at the official OrientDB guide: http://orientdb.com/docs/last/Tutorial-Clusters.html
Hope it helps.
Regards
Neo4j Enterprise Edition can be used under the AGPL license. I am surprised that a lot of people aren't aware of this. If you are using Neo4j Enterprise as a server and communicating with it via REST or the Bolt protocol (Apache licensed), then you don't have to worry about releasing the code of the system connecting to it under the AGPL.
If you are using it embedded, then you have to adhere to the AGPL, but then why would you need Neo4j Enterprise in that situation?
Remember to clone and compile Neo4j Enterprise from GitHub if you plan on using it under the AGPL; don't download the trial.
Neo Technology gives great support, and that is essentially what you are paying for with the enterprise subscription.

Cassandra and Solr on the same node

I am architecting a POC DataStax Enterprise Cassandra cluster environment. We are going to use Solr in combination with Cassandra. Would it be a valid configuration to host both Solr and Cassandra on the same physical server?
If you're evaluating DSE, Solr is built into the packages you're using. It's an extremely tight integration that would be tough to replicate on your own. Here's the documentation: https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchIntro.html
It's also worth noting that Solr, in this case, does run co-located with Cassandra for data locality, and to take advantage of C* replication, availability, and some other C* specific benefits.
But most importantly, I suggest checking out this hands-on training: https://academy.datastax.com/courses/ds310-datastax-enterprise-search-apache-solr
If you have any specific questions about the integration, update your question and I'd be happy to help.

Need help in deciding java framework, libraries

I am planning to develop an open source Java application that runs on Google App Engine as well as on a normal RDBMS, so please help me choose:
MVC Framework -
Struts / Spring MVC ?
ORM -
JDO / JPA ?
I am considering performance as a key factor.
For App Engine you will want a lightweight framework, both for persistence and for the application side. Google is changing their pricing model, so you might want to consider how this will impact your plans as well. There is an interesting discussion on the App Engine group about this: https://groups.google.com/forum/#!topic/google-appengine/ob-kMuDAAqc/discussion
Aside from that, I can only comment on the choice of persistence framework:
JDO on App Engine is a pain. The version they (Google) support in App Engine is 1.x, which is ancient, I believe. I have had more trouble getting things to work than I care to remember. If you have previous experience with JDO, this still might be a good choice. If I were to start over again, I would choose a persistence framework that was specifically written for App Engine, like Objectify or Twig. They require less overhead and are easier to use (from my point of view). One giant plus of Objectify: it gives you memcache support out of the box with no extra work. How great is that?
However, you also want to support an RDBMS. If you have the time, you could roll your own abstraction layer on top of Objectify and the RDBMS persistence layer of your choice. That would give you the edge in performance that you are looking for. ;)
If that is not an option I would suggest JPA (not because I have used it myself, but because I had so much trouble with JDO).
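A minimal sketch of the abstraction-layer idea mentioned above, in plain Java. Everything here is made up for illustration: `ProductStore`, `InMemoryProductStore` and `ProductService` are hypothetical names, and the in-memory class only stands in for the two real backends (one written against Objectify, one against your RDBMS layer), which are not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage-neutral contract; the name and methods are illustrative only.
interface ProductStore {
    void save(String id, String name);
    Optional<String> findName(String id);
}

// Stand-in backend so the sketch runs without App Engine or a database.
// A real project would provide one implementation per target platform.
class InMemoryProductStore implements ProductStore {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public void save(String id, String name) {
        rows.put(id, name);
    }

    @Override
    public Optional<String> findName(String id) {
        return Optional.ofNullable(rows.get(id));
    }
}

// Application code depends only on the interface, never on a concrete backend.
class ProductService {
    private final ProductStore store;

    ProductService(ProductStore store) {
        this.store = store;
    }

    String describe(String id) {
        return store.findName(id)
                .map(name -> "Product: " + name)
                .orElse("Unknown product");
    }
}
```

The point of the design is that switching between App Engine and an RDBMS then only changes which `ProductStore` gets constructed at startup.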
Hope this helps!
What do you mean by performance? For instance, if you're not using the always-on feature, you might consider the server cold start-up time as the single thing affecting performance the most. All the frameworks will make it slower; there's even a blog post about optimizing the cold start-up time.
For the MVC, Spring 3 is quite good at it, but it's more a matter of personal preference and what you're comfortable with. If you want something designed for App Engine, give Gaelyk a try, though it's Groovy. For the data storage, JPA is perhaps more widely used, but I think the JDO support on App Engine is better. Both of those provide some level of platform independence if you ever need to move off App Engine. There are also Objectify-Appengine and Twig, which are more tied to the platform and thus might provide a better interface for managing the datastore.
I would personally go with the Java EE 6 stack. So:
MVC: JSF -> very lightweight and easy to develop with. JSF 2.x fixed many shortcomings of JSF 1.2.
ORM: JPA 2.0 -> since it is standard and comes bundled with Java EE 6. You can swap in Hibernate if you prefer. Each has its own advantages; I would not say one is better than the other.
Don't forget that Java EE 6 comes with EJB 3.1. EJB has had a bad reputation for being heavyweight, but since EJB 3.0 it is a very different story. EJB 3.1 has become much more lightweight and easier to develop with. The GlassFish Web Profile provides EJB Lite (Hehehehe :D :D much lighter :D)
In terms of development complexity, I have to say that Spring is a bit more complex than Java EE 6, but then again I have only touched Spring minimally, so I will leave that discussion to more experienced developers.
I would go for JSF + JPA and I'd use Spring Framework for dependency injection.
My 5¢. :P

What is the best database to use with Grails in an enterprise application?

I realize this has flame potential, please refrain. That being said, I'm interested in what databases people have used with Grails. What positive experiences and what horror stories are out there?
I love MySQL, but there are a few significant bugs that are impacting me between Hibernate and MySQL, particularly as it pertains to index creation. So I guess my question is really, what is the most stable database for integration with Grails? Or what database has the fewest bugs with respect to Grails?
Or what database has the widest use in conjunction with Grails? I also realize that these questions are somewhat orthogonal and opposing. Anyway, I'd like to open it up to discussion.
I have used Hibernate in enterprise apps since version 1. My personal ranking is:
Oracle: fast and stable; lots of DBAs know it and know how to tune performance, handle backups and so on.
SQL Server: same as above (but not as fast as Oracle).
DB2: not so easy to use with Hibernate (I had several issues with date and char data types).
MySQL: not so easy to manage or to find professional support for (may be different for you), but the Hibernate side works great.
As Gregg said, this is a Hibernate question: Grails does all its DB interaction via Hibernate (except for any custom SQL you write).
The only problem you might hit is with the GORM DSL not correctly creating any tricky Hibernate mappings you require for a particular DB (especially if it's a legacy one). But GORM is pretty mature these days and I personally haven't hit any issues lately.
We run MySQL in production on a public web application and it has been fine. We've also deployed 'enterprisey' apps on top of Oracle, which also went well except for a couple of issues with id generator configuration, if I recall correctly. But I think those have been fixed in the latest Grails version.
In summary, go with your gut feel based on previous experience with Hibernate.
cheers
Lee

How to best search against a DB with Lucene?

I am looking into mechanisms for better search capabilities against our database. It is currently a huge bottleneck (causing long-lasting queries that are hurting our database performance).
My boss wanted me to look into Solr, but on closer inspection, it seems we actually want some kind of DB integration mechanism with Lucene itself.
From the Lucene FAQ, they recommend Hibernate Search, Compass, and DBSight.
As a background of our current technology stack, we are using straight JSPs on Tomcat, no Hibernate, no other frameworks on top of it... just straight Java, JSP, and JDBC against a DB2 database.
Given that, it seems Hibernate Search might be a bit more difficult to integrate into our system, though it might be nice to have the option of using Hibernate after such an integration.
Does anyone have any experiences they can share with using one of these tools (or other similar Lucene based solutions) that might help in picking the right tool?
It needs to be a FOSS solution, and ideally it will manage updating Lucene with changes from the database automagically (though efficiently), without extra effort to notify the tool when changes have been made (otherwise, it seems rolling my own Lucene solution would be just as good). Also, we have multiple application servers with just 1 database (+failover), so it would be nice if the solution is easy to use from all application servers seamlessly.
I am continuing to inspect the options now, but it would be really helpful to utilize other people's experiences.
When you say "search against a DB", what do you mean?
Relational databases and information retrieval systems use very different approaches for good reason. What kind of data are you searching? What kind of queries do you perform?
If I were going to implement an inverted index on top of a database, as Compass does, I would not use their approach, which is to implement Lucene's Directory abstraction with BLOBs. Rather, I'd implement Lucene's IndexReader abstraction.
Relational databases are quite capable of maintaining indexes. The value that Lucene brings in this context is its analysis capabilities, which are most useful for unstructured text records. A good approach would leverage the strengths of each tool.
As updates are made to the index, Lucene creates more segments (additional files or BLOBs), which degrade performance until a costly "optimize" procedure is used. Most databases will amortize this cost over each index update, giving you more stable performance.
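To make the point about analysis concrete, here is a toy inverted index in plain Java. It deliberately does not use Lucene's API: `TinyInvertedIndex` and its methods are made-up names, and the "analysis" step is only lowercasing plus splitting on non-alphanumeric characters, which is far cruder than what a real analyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; not Lucene and not production code.
class TinyInvertedIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Crude "analysis": lowercase, then split on anything that is not a-z or 0-9.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents containing every query term (AND semantics).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }
}
```

Because queries and documents go through the same `analyze` step, a search for "STEEL kettle" matches a record stored as "Red kettle, stainless steel", which an exact-match column index would not do on its own.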
I have had good experiences with Compass. It has really good integration with Hibernate and can mirror data changes made through Hibernate and JDBC directly to the Lucene indexes through its GPS devices (http://www.compass-project.org/docs/1.2.2/reference/html/gps-jdbc.html).
Maintaining the Lucene indexes on all your application servers may be an issue. If you have multiple App servers updating the db, then you may hit some issues with keeping the index in sync with all the changes. Compass may have an alternate mechanism for handling this now.
The Alfresco Project (CMS) also uses Lucene and has a mechanism for replicating Lucene index changes between servers that may be useful in handling these issues.
We started using Compass before Hibernate Search was really off the ground so I cannot offer any comparison with it.
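One generic way to approach the sync problem mentioned above is to poll for changed rows using a "last modified" watermark. The sketch below is a plain-Java illustration under stated assumptions, not a description of how Compass or Alfresco work: it assumes every row carries a monotonically increasing `modified` value, the names `SourceRow` and `IncrementalIndexer` are hypothetical, a `Map` stands in for the real search index, and deletions are not handled.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical row shape: assumes each row has a 'modified' counter or timestamp.
class SourceRow {
    final int id;
    final String text;
    final long modified;

    SourceRow(int id, String text, long modified) {
        this.id = id;
        this.text = text;
        this.modified = modified;
    }
}

class IncrementalIndexer {
    // Highest 'modified' value that has already been indexed.
    private long watermark = Long.MIN_VALUE;

    // Stand-in for a real search index (document id -> indexed text).
    final Map<Integer, String> index = new HashMap<>();

    // Re-indexes only rows changed since the previous call and returns how many
    // were processed. Against a real database the filter would be pushed into
    // the query (for example a WHERE clause on the modified column) instead of
    // scanning every row in memory as done here.
    int sync(Collection<SourceRow> table) {
        int processed = 0;
        long newest = watermark;
        for (SourceRow row : table) {
            if (row.modified > watermark) {
                index.put(row.id, row.text);
                newest = Math.max(newest, row.modified);
                processed++;
            }
        }
        watermark = newest;
        return processed;
    }
}
```

With several application servers, each one could run its own poller against the shared database, at the cost of each index lagging by up to one polling interval.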
LuSql (http://code.google.com/p/lusql/) allows you to load the contents of a JDBC-accessible database into Lucene, making it searchable. It is highly optimized and multi-threaded. I am the author of LuSql and will be coming out with a new version (re-architected with a new pluggable architecture) in the next month.
For a pure performance boost with searching, Lucene will certainly help out a lot. Only index what you care about/need and you should be good. You could use Hibernate or some other piece if you like, but I don't think it is required.
Well, it seems DBSight doesn't meet the FOSS requirement, so unless it is an absolutely stellar solution, it is not an option for me right now...

Resources