Which NoSQL databases are production ready?

With the rise of NoSQL databases on high-traffic websites, I'm interested in using one for my project. I've heard several names, such as Voldemort, MongoDB and CouchDB, but which of these NoSQL databases are production ready? Judging by the download pages, none of them seems to be, because none has reached version 1.0 yet. Are there any others besides these three that you would recommend for production use?

What do you mean by production ready? As far as I know, all of them are being used on live systems.
You should make your choice based on how the features they provide fit your needs.
You can also add Tokyo Cabinet to the list, as well as the Mnesia database that ships with Erlang/OTP.

I think you need to start from your project requirements to see what kind of database you really need. There are many non-relational DBMSs out there, and they differ a lot in the kinds of problems they are good at solving. I think the article Should you go Beyond Relational Databases? by Martin Kleppmann is a good starting point for finding out what you need. There are also a lot of Stack Overflow threads on similar topics; these are my favorites:
The Next-gen Databases
Non-Relational Database Design
When shouldn't you use a relational database?
Good reasons NOT to use a relational database?
When you have narrowed down what you actually need, you can take a deeper look into the alternatives to see which DBMSs are production ready for your use case. Production readiness isn't a yes/no thing: people may successfully deploy a solution that, for example, lacks tool support, while in another project that could be a no-go.
As for version numbers, different projects have a different take on this, so you can't just compare them. I'm involved in the graph database project Neo4j, and even though it has been in production use for 5+ years by now, we still haven't released a version 1.0 final.

I'm tempted to answer "use SIRA_PRISE".
It's definitely non-SQL.
And its current version is 1.2, meaning that someone like you must definitely assume it's "production-ready".
But perhaps I shouldn't be answering at all.

Nice article comparing rdbms with 'next gen' and listing some providers:
Is the Relational Database Doomed?
http://readwrite.com/2009/02/12/is-the-relational-database-doomed

I would suggest ArangoDB.
ArangoDB is a multi-model mostly-memory database with a flexible data model for documents and graphs. It is designed as a “general purpose database”, offering all the features you typically need for modern web applications.
ArangoDB is supposed to grow with the application—the project may start as a simple single-server prototype, nothing you couldn’t do with a relational database equally well. After some time, some geo-location features are needed and a shopping cart requires transactions. ArangoDB’s graph data model is useful for the recommendation system. The smartphone app needs a lean API to the back-end—this is where Foxx, ArangoDB’s integrated Javascript application framework, comes into play.
Another unique feature is ArangoDB’s query language AQL — it makes querying powerful and convenient. AQL enables you to describe complex filter conditions and joins in a readable format, much in the same way as SQL.
You can model your data in several ways:
in key/value pairs
as collections of documents
as graphs with nodes, edges, and properties for both
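As a rough illustration of the three models listed above, here is a minimal sketch in plain Python. It does not use ArangoDB or its API; the names `kv_store`, `users`, `edges` and `neighbours` are made up for this example.

```python
# Plain-Python illustration of three data models; this is NOT ArangoDB's API.

# 1. Key/value: an opaque value looked up by its key.
kv_store = {"session:42": "alice"}

# 2. Documents: a collection of schemaless records keyed by id.
users = {
    "alice": {"name": "Alice", "city": "Berlin"},
    "bob": {"name": "Bob", "tags": ["admin"]},
}

# 3. Graph: nodes (the documents above) plus edges; edges carry properties too.
edges = [
    {"from": "alice", "to": "bob", "type": "follows", "since": 2014},
]


def neighbours(node_id, edge_list):
    """Return the ids reachable from node_id over one outgoing edge."""
    return [edge["to"] for edge in edge_list if edge["from"] == node_id]
```

The same `users` records serve as documents and as graph nodes, which is roughly the idea behind a multi-model store.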
You can access data in ArangoDB:
using the general HTTP REST API via curl/wget, or your browser
via the ArangoDB shell (“arangosh”)
using a programming language specific client library
Server requirements for ArangoDB:
ArangoDB runs on Linux, OS X and Microsoft Windows.
It runs on 32bit and 64bit systems, though using a 32bit system will limit you to using only approximately 2 to 3 GB of data with ArangoDB.

Related

What is difference between Titan and Neo4j graph database?

I have worked with relational databases, but now I want to learn about graph databases. I have come across these two. What is the difference between them, and which one should we prefer?
One approach is to simply try to choose one database over the other. For example, you might quickly search around to find that Titan has been forked to JanusGraph, where it is more actively maintained. In your research you may find that there are other open source graph databases as well, like OrientDB, ChronoGraph, or Sqlg, along with commercial alternatives like Microsoft's Cosmos DB, DSE Graph or IBM Graph. How do you decide now?
There is a graph framework that ties together all of these graphs including Neo4j/Titan (and more than those listed here): Apache TinkerPop. TinkerPop provides an abstraction over different graph databases and graph processors allowing the same code to be used with different configurable backends. This pattern is quite similar to the one you find in SQL with JDBC which helps make your code vendor agnostic.
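The vendor-agnostic pattern described above can be sketched in a few lines of Python. This is only an analogy: `GraphBackend`, `DictBackend`, `EdgeListBackend` and `two_hops` are hypothetical names invented here, not part of TinkerPop or Gremlin.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical minimal interface that every backend must implement."""

    @abstractmethod
    def add_edge(self, src, dst):
        ...

    @abstractmethod
    def out(self, node):
        """Return the nodes reachable over one outgoing edge."""


class DictBackend(GraphBackend):
    """Stores adjacency lists in a dict."""

    def __init__(self):
        self._adjacent = {}

    def add_edge(self, src, dst):
        self._adjacent.setdefault(src, []).append(dst)

    def out(self, node):
        return list(self._adjacent.get(node, []))


class EdgeListBackend(GraphBackend):
    """Stores a flat list of (src, dst) pairs."""

    def __init__(self):
        self._edges = []

    def add_edge(self, src, dst):
        self._edges.append((src, dst))

    def out(self, node):
        return [dst for src, dst in self._edges if src == node]


def two_hops(backend, start):
    """Query written once against the interface, usable with any backend."""
    found = set()
    for middle in backend.out(start):
        found.update(backend.out(middle))
    return found
```

Swapping `DictBackend()` for `EdgeListBackend()` changes the storage but not the calling code, which is the property that makes prototyping against several backends cheap.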
You can try all of the different supported graph databases before you make a choice, and you can do this type of prototyping/benchmarking fairly quickly with the Gremlin Console. You will then be able to make an informed choice about the best way to go for your project.
It occurs to me as I come to the end of this post that I haven't directly answered your question. If you are just getting started and are just interested in learning about graph databases, then I likely wouldn't recommend starting with Titan/JanusGraph as it requires a bit of configuration to get started (schemas, backend selection, etc). Start with TinkerGraph or Neo4j using the Gremlin Console to try out some simple graph traversals and go from there.
Titan was originally backed by Aurelius, which was bought by DataStax in 2015. This move was designed to give DataStax a jump-start into the Graph DB world, as they now offer their own "DSE Graph" enterprise product. Titan has since been forked (as previously mentioned) into JanusGraph.
The nice thing about Titan/Janus (IMO) is that it is "pluggable" with other existing back-end and search technologies. So it will "play nice" with things like Cassandra, HBase, Hadoop, Solr, and ElasticSearch.
The drawback is that the community support is tough. The Titan project has been effectively killed, and Janus scores a whopping 0.23 on DBEngines. That makes it the 16th most-popular Graph DB (231st overall), which is pretty low.
Neo4j is backed by Neo Technology, and is regarded as the front-runner in the Graph DB community (score of 38.52 right now, 1st graph DB and 21st overall). It is open source, but controlled by Neo Technology, so they can dictate a difference in feature set between open source and enterprise.
The nice thing about Neo4j is that they have a lot of tutorials and learning aids built right-in to the Neo4j Browser, which is a nice, user-friendly web interface. Their documentation is top-notch, easy to read and search through, and they have a pretty good following here on Stack Overflow.
[Image: Neo4j Browser screenshot]
The drawback of Neo4j is that some features (like clustering) are only available in the enterprise version. But if you work for a big company that doesn't mind shelling out $ for an enterprise license, that may not be a big deal.
Consistency: Titan/Janus is part of the "eventual consistency" crowd, while Neo4j aims to be strongly consistent (especially in a causal clustering scenario). Although consistency can be tuned with configuration in both, with Titan/Janus it can depend on your choice of pluggable backend (e.g. typically strongly consistent with HBase, but eventually consistent with Cassandra).
Recommendation:
If you're just starting to learn graph databases and modeling, you can't go wrong with Neo4j. Simply download/install the community edition, run it, and execute :play movies as your first command (tutorial that walks you through loading, modeling, and querying movie relationships).
If you have some experience with graph, and you don't mind troubleshooting/googling to figure out things (like how to set the max frame size for Thrift), then you could probably do some really cool things with Titan.
Try each out, and see which one works for you.
There are far more than two graph databases - there are dozens. That said, there are two with real market share: Neo4j and Titan/JanusGraph. The other graph databases each have interesting strengths for specific application spaces, but I wouldn't dig into all of the niche players to start with - learning the basic idea of graph databases can be done with one of the two lead players.
Neo4j is the most mature, with the most nicely packaged install and documentation, tons of reference code, and support from a wide range of partners.
Titan/JanusGraph is the next most popular, as it's free/open source and has very strong support (e.g. IBM, Google, Hortonworks, AWS, ...). There's a recent complexity in that the leaders of the Titan project were acquired, freezing the Titan project. But the community forked the project into JanusGraph. So while JanusGraph is a new project, it's literally the same Titan code, with even broader industry support than Titan had.
Related to the two is the language used to work with the graphs. Neo4j uses its proprietary language, Cypher, while nearly everyone else uses Gremlin, and the TinkerPop open source tool set (which is a part of the Apache set of open source projects). Nearly all graph databases, including Neo4j, support Gremlin and TinkerPop. So, for example, you can use either Cypher or Gremlin to query Neo4j, though Neo (and some other proprietary graph database vendors) support Gremlin as a second-class citizen, so to speak. For example, you can connect to Neo using Gremlin from the (external) Gremlin console, but you can't use Gremlin in the (very nice) Neo4j console.
Note that there are many graph databases that support Gremlin other than Titan/JanusGraph. One new entrant that's very interesting is Microsoft's Azure Cosmos DB, which is a managed graph database that's "cheap and easy" if you use Azure already. And there are several vendors that provide managed JanusGraph.
For personal learning, I'd say that Neo4j is the easiest to set up and learn - you download and run it, and open a web browser onto their web-based console, which only takes a few minutes. That being said, if you're comfortable on a command line, JanusGraph only took a half hour to install and get running for me, so it's not too hard.
For learning the concepts, Neo4j is great. Neo4j's query language, Cypher, and JanusGraph's query language, Gremlin, express the same underlying graph concepts in different styles (Cypher is declarative pattern matching, Gremlin is a traversal language), so you'll learn the concepts either way.
For building a real system, either could work (and there are many successful following both approaches).
As for which you choose, you'll want to think about whether you want to be strategically tied to a single vendor (Neo4j) or to be part of a broader standards-based community. There's a comfort level in picking the market leader with the most mature product - Neo4j. And there's a comfort level in picking open standards with strong industry support - JanusGraph. So IMO there's no "wrong" answer - people using either one are happy and successful. But since you have to pick, you'll need to think about which you're more comfortable with long-term.
Neo4j uses native graph technology.
Native graph technology stores data efficiently by writing nodes and relationships close to each other, which optimizes the database for graph workloads.
With native graph technology, processing becomes faster because it uses index-free adjacency: each node directly references its adjacent nodes.
Titan (now JanusGraph) uses non-native graph technology.
A non-native graph database sits on top of a separate storage backend such as Cassandra or HBase.
With non-native technology, processing tends to be slower than with native because the database relies on several kinds of indexes to link nodes together.
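To make the difference concrete, here is a small Python sketch of the two lookup styles. It is a toy model, not how Neo4j or JanusGraph are implemented: `Node`, `native_neighbours`, `indexed_neighbours` and `edge_index` are invented for illustration, and a dict stands in for a storage-backend index.

```python
class Node:
    """A node holding direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []


def native_neighbours(node):
    # One reference hop per neighbour; no lookup structure is consulted.
    return [neighbour.name for neighbour in node.neighbours]


def indexed_neighbours(name, edge_index):
    # Non-native style: adjacency is resolved through a separate index
    # (here a dict standing in for a lookup in the storage backend).
    return list(edge_index.get(name, []))


# Build the same tiny graph both ways: alice -> bob, alice -> carol.
alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.neighbours.extend([bob, carol])

edge_index = {"alice": ["bob", "carol"]}
```

Both functions return the same answer; the difference is that the first follows stored references while the second needs an extra lookup per step, which is where the cost builds up on deep traversals.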

What factors to consider when choosing a Multi-model DBMS? (OrientDB vs ArangoDB)

I am looking to dip my hands into the world of Multi-Model DBMS, I have no particular use cases, just want to start learning.
I find that there are two prominent ones, OrientDB and ArangoDB, but I was unable to find any meaningful, unopinionated comparison between them. Can someone shed some light on the difference in features between the two, and any caveats in using one over the other? If I learn one, would I be able to easily transition to the other?
(I tagged FoundationDB as well, but it is proprietary and I probably won't consider it)
This question asks for a general comparison between OrientDB vs ArangoDB for someone looking to learn about Multi-model DBMS, and not an opinionated answer about which is better.
Disclaimer: I would no longer recommend OrientDB, see my comments below.
I can provide a slightly less biased opinion, having used both ArangoDB and OrientDB. It's still biased as I'm the author of OrientDB's node.js driver - oriento but I don't have a vested interest in either company or product, I've just necessarily used OrientDB more.
ArangoDB and OrientDB are both targeting a similar market and have a lot of similarities:
Both are multi-model, you can use them to store documents, graphs and simple key / values.
Both have support for Gremlin, but it's firmly a second class citizen compared to their own preferred query languages.
Both support server-side "stored procedures" in JavaScript. In both systems this comes via a slightly less than idiomatic JavaScript API, although ArangoDB's is a lot better. This is getting fixed in a forthcoming version of OrientDB.
Both offer REST APIs, both aim to be usable as an "API Server" via JavaScript request handlers. This is a lot more practical in ArangoDB than OrientDB.
Both are distributed under a permissive license.
Both are ACID and have transaction support, but in both the transactions are server-side operations - they're more like atomic batches of commands rather than the kinds of transactions you might be used to in a traditional RDBMS.
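The "atomic batch of commands" idea from the last point can be sketched as follows. This is a generic Python illustration under the assumption that a batch either applies completely or not at all; `run_batch`, `debit` and `credit` are hypothetical helpers, not the API of either database.

```python
import copy


def run_batch(store, operations):
    """Apply every operation or none of them.

    The operations run against a deep copy; the original store is only
    replaced if all of them finish without raising.
    """
    working = copy.deepcopy(store)
    for operation in operations:
        operation(working)  # an exception here leaves `store` untouched
    store.clear()
    store.update(working)


def debit(account, amount):
    def operation(data):
        if data[account] < amount:
            raise ValueError("insufficient funds")
        data[account] -= amount
    return operation


def credit(account, amount):
    def operation(data):
        data[account] += amount
    return operation
```

Unlike an interactive RDBMS transaction, the caller cannot read intermediate results and decide what to do next; the whole batch is handed over and executed in one go.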
However, there are a lot of differences:
ArangoDB has no concept of "links", which are a very useful feature in OrientDB. They allow unidirectional relationships (just like a hyperlink on the web), without the overhead of edges.
ArangoDB is written in C++ (and JavaScript), whereas OrientDB is written in Java. Both have their advantages:
Being written in C++ means ArangoDB uses V8, the same high performance JavaScript engine that powers node.js and Google Chrome. Whereas being written in Java means OrientDB uses Nashorn, which is still fast but not the fastest. This means that ArangoDB can offer a greater level of compatibility with the node.js ecosystem compared to OrientDB.
Being written in Java means that OrientDB runs on more platforms, including e.g. Raspberry PI. It also means that OrientDB can leverage a lot of other technologies written in Java, e.g. OrientDB has superb full text / geospatial search support via Lucene, which is not available to ArangoDB.
OrientDB uses a dialect of SQL as its query language, whereas ArangoDB uses its own custom language called AQL. In theory, AQL is better because it's designed explicitly for the problem. In practice, though, it feels quite similar to SQL but with different keywords, and it is yet another language to learn, while OrientDB's implementation feels a lot more comfortable if you're used to SQL. SQL reads declaratively, whereas AQL's FOR / FILTER / RETURN style reads more imperatively - YMMV here.
ArangoDB is a "mostly-memory" database, it works best when most of your data fits in RAM. This may or may not be suitable for your needs. OrientDB doesn't have this restriction (but also loves RAM).
OrientDB is fully object oriented - it supports classes with properties and inheritance. This is exceptionally useful because it means that your database structure can map 1-1 to your application structure, with no need for ugly hacks like ActiveRecord. ArangoDB supports something fairly similar via models in Foxx, but it's more like an optional addon rather than a core part of how the database works.
ArangoDB offers a lot of flexibility via Foxx, but it has not been designed by people with strong server-side JS backgrounds and reinvents the wheel a lot of the time. Rather than leveraging frameworks like express for their request handling, they created their own clone of Sinatra, which of course makes it almost the same as express (express is also a Sinatra clone), but subtly different, and means that none of express's middleware or plugins can be reused. Similarly, they embed V8, but not libuv, which means they do not offer the same non blocking APIs as node.js and therefore users cannot be sure about whether a given npm module will work there. This means that non trivial applications cannot use ArangoDB as a replacement for the backend, which negates a lot of the potential usefulness of Foxx.
OrientDB supports first class property level and database level indices. You can query and insert into specific indexes directly for maximum efficiency. I've not seen support for this in ArangoDB.
OrientDB is the more established option, with many high profile users. ArangoDB is newer, less well known, but growing fast.
ArangoDB's documentation is excellent, and they offer official drivers for many different programming languages. OrientDB's documentation is not quite as good, and while there are drivers for most platforms, they're community powered and therefore not always kept up to date with bleeding edge OrientDB features.
If you're using Java (or a Java bridge), you can embed OrientDB directly within your application, as a library. This use case is not possible in ArangoDB.
OrientDB has the concept of users and roles, as well as Record Level Security. This may be a killer feature for you; it is for me. It also supports token based authentication, so it's possible to use OrientDB as your primary means of authorizing/authenticating users. OrientDB also has LDAP integration. In contrast, ArangoDB supports only a very simple auth option.
Both systems have their own advantages, so choosing between them comes down to your own situation:
If you're building a small application, and you're a web developer optimizing for developer productivity, it will probably be easier to get up and running quickly with ArangoDB.
If you're building a larger application, which could potentially store many gigabytes or terabytes of data, or have many thousands of concurrent users, or have "enterprise" use cases, or need fine grained security controls, OrientDB is the one for you.
If you're storing RDF or similarly structured linked data, choose OrientDB.
If you're using Java, just choose OrientDB.
Note: This is (my opinion of) the state of play today, things change quickly and I would not underestimate the ruthless efficiency of the awesome team behind ArangoDB, I just think that it's not quite there yet :)
Charles Pick (codemix.com)

CUBRID database

I have received a message about the CUBRID database saying that it performs better than MySQL. Has anyone heard of it?
Is that correct?
Regards
I use CUBRID in most of my projects. The idea of being "better than MySQL", I think, depends on the situation, on the needs of your application. For some CUBRID is really nice, for some MySQL, or some other one. For example, CUBRID has very nice features optimized for Social Networking Services where you have heavy traffic often on one page, use lots of indexes, and take advantage of covering index. They provide some nice examples how to design your database schema and how to tune queries to obtain the best performance (link).
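For readers unfamiliar with the term, a covering index is one that contains every column a query needs, so the engine never has to touch the table itself. The sketch below demonstrates the idea with Python's built-in `sqlite3` module as a stand-in; it is not CUBRID, and the `posts` table and index name are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, author_id INTEGER, created_at TEXT, body TEXT)"
)
# The index holds both columns that the first query below asks for.
conn.execute(
    "CREATE INDEX idx_posts_author_created ON posts (author_id, created_at)"
)
conn.executemany(
    "INSERT INTO posts (author_id, created_at, body) VALUES (?, ?, ?)",
    [(1, "2010-01-01", "hello"), (1, "2010-01-02", "again"), (2, "2010-01-03", "hi")],
)


def query_plan(sql, params=()):
    """Return SQLite's plan description (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[-1] for row in rows)


# Needs only indexed columns, so the index alone can answer it.
covered = query_plan(
    "SELECT author_id, created_at FROM posts WHERE author_id = ?", (1,)
)
# Needs `body`, which is not in the index, so the table must be read too.
not_covered = query_plan("SELECT body FROM posts WHERE author_id = ?", (1,))
```

Printing `covered` and `not_covered` shows whether the planner reports a covering index for each query; the same design question (which columns to put in the index) applies to any engine that supports the feature.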
What's your case? If you expect several hundred simultaneous users who generate some thousands of new records every day, CUBRID can easily handle all of that. This is what database systems are created for.
You should also consider the environment you are developing in. Is your app developed in PHP, Python, or something else? We use PHP and Java on our sites. CUBRID has many drivers. I believe you can find the necessary driver on their site.
You should also look at the community support. If you have some questions or issues with their database, it's often faster to directly write on their Q&A site or forum.

Real World Experience of db4o and/or Eloquera Database

I am evaluating two object databases, db4o (http://www.db4o.com) and Eloquera Database (http://eloquera.com) for a coming project. I have to choose one. My basic requirement is scalability, multi user support and easy type evolution for RAD.
Please share your real world experience.
If you have both, can you compare these two? Which do you prefer?
For the last 2 years I've been using DB4O, and I'm now switching to Eloquera.
My reasons, in order:
I'm building a commercial product, and the royalty-based licensing on DB4O is WAY too high; DB4O said we could "talk about it", but I'm a very small development shop and giving away a huge chunk of each sale I make just doesn't make any sense when there's a perfectly good alternative.
I'm using Db4oTool.exe to modify my assemblies in a post-build step, and it really slows down the build process. Eloquera doesn't need to modify my assemblies.
I found a bug in the DB4O code, and it took many, many months before the fix was integrated into their codebase. I have found bugs in Eloquera and they fixed them in a day or two.
DB4O is not yet on .NET 4 (although they finally have an early beta). DB4O is the ONLY thing holding me back from using VS2010 (and .NET 4). I tried migrating to VS2010 but VS2010 automatically converts all unit tests to .NET 4, so all of my persistence related unit tests immediately failed.
DB4O is not really designed to be thread-safe.
DB4O has many features and API conventions that are obviously ported from Java.
Robert
Eloquera ( www.eloquera.com ) was originally designed and developed for use in the Web environment, and it is a native .NET application written in C#.
Unlike many other databases, Eloquera wasn't ported from Java.
As part of its architecture, Eloquera natively supports:
Simultaneous user access
Security settings
Has a genuine client/server architecture, with a desktop mode available.
Max database size 1TB+; at large data scale Eloquera maintains fast query response. It has patent-pending technologies including a virtual file system, indexing, and adaptive cache. Eloquera has state-of-the-art reflection written in MSIL that allows it to outperform many databases that use Microsoft's standard reflection.
Supports in-memory database for the fast data processing
Since most of the users in the Web come from relational database world it was natural for Eloquera to support SQL and LINQ
EF support is due next month
Unlike some databases, Eloquera does not blindly put objects in the database. If you change a type's fields from two ints (int; int;) to a single long (long;), it will not keep returning wrong query results because it still sees the two ints; it will notify the user to update the definition.
Eloquera provides a native indexing for properties and fields. Most of the databases do not provide properties indexing.
I might argue with Carl regarding DB4O being the easiest database on the market, since Eloquera can do the same things from an API perspective.
Eloquera is younger than Versant and still has some enterprise features coming.
Last month the Eloquera R&D department started work on Eloquera Parallel Server to provide horizontal scaling, which arguably will be an order of magnitude cheaper than Versant's VOD.
Some distinguishing points:
Eloquera is FREE for commercial use. You are not required to pay any royalties. All features above you have for FREE.
Eloquera has a commercial support available.
Eloquera is designed for the modern world with a modern architecture. It was not adapted piecemeal to market needs over time; these features are a natural part of Eloquera's architecture.
If you are interested to hear user experiences with db4o, I suggest you also ask in our db4o user forums.
While db4o was originally developed for embedded use in applications with limited resources (and now runs very well on constrained platforms like Android, CompactFramework and Silverlight) I know that we do have many users that are happily using db4o for web applications.
Indeed there is some correctness to the db4o-bashing-post by leatrop: The db4o server core currently only allows one thread to enter for storing and querying tasks in a particular database.
However there are a couple of ways to make db4o applications scale very well:
Since the setup costs for db4o databases are very low (one single API call), it is possible to work with multiple databases. You can use the db4o replication system (dRS) to distribute objects between multiple databases. It is also possible to create backups of db4o databases while they are running and to replicate these backups to multiple machines. The approach of using multiple databases (for timeslices of data or for different use cases in your application) can be very nice for backup and debugging purposes. You don't need to copy the entire database if you want to test only some aspects of your live app.
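The "one database per timeslice" approach can be pictured with a short Python sketch. It does not use db4o; `TimeSlicedStore` and its methods are hypothetical, and in-memory lists stand in for separate database files.

```python
from collections import defaultdict
from datetime import date


class TimeSlicedStore:
    """Routes each object to a per-month store (a stand-in for one database file)."""

    def __init__(self):
        self._slices = defaultdict(list)  # "YYYY-MM" -> stored objects

    @staticmethod
    def slice_key(day):
        return f"{day.year:04d}-{day.month:02d}"

    def store(self, day, obj):
        self._slices[self.slice_key(day)].append(obj)

    def query(self, day):
        # Only the slice for that month is touched; other slices stay closed.
        return list(self._slices.get(self.slice_key(day), []))

    def slice_names(self):
        return sorted(self._slices)
```

Because each slice is independent, a single month can be copied out for debugging or backed up without touching the rest, which is the benefit described above.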
If you still find that db4o does not scale well enough for concurrent users or database sizes, you can later switch to our high-end object database Versant VOD. It was built to run in the cloud and it has a proven track record of working for thousands of concurrent users with multi-terabyte databases. VOD for .NET also comes with a LINQ provider, so the interfaces of db4o and VOD are compatible.
My recommendation: Start with db4o. It is the easiest object database to get started with and to develop with. Just store any object with one line of code, without setting up schemas or mapping files. Use LINQ to query (or native queries, if you work with Java).
db4o is open source and it's free (under the GPL).
I'm creating a 2nd generation Social Media Platform completely based on Javafx and Db4o. We are able to do things with db4o that would be impossible with any other database. Semantic OWL Ontologies and Complex relationships with Objects and Our User Definable Canvas make Db4o an amazing fit for us. We have no worries about scaling either and have found several solutions. Carl is one of the most intelligent people in software. This fact is obvious when you learn about his product.
Mike Tallent
CEO
Objectwheel

Social Networking backend architecture

Ideally, where would an application like Facebook store its "Friends" data?
In a database table? In an XML file?
From Facebook's engineering page:
"Already, we are the second most-trafficked PHP site in the world (Yahoo is #1), and one of the largest MySQL installations anywhere, running thousands of databases."
and
"We've built a lightweight but powerful multi-language RPC framework that allows us to seamlessly and easily tie together subsystems written in any language, running on any platform. Facebook is built in PHP, C++, Perl, Python, Erlang, Java, and even a little bit of ML—and it all works together.
* We are the largest user in the world of memcached, an open-source caching system. Originally developed by LiveJournal, we've since made so many scalability improvements and performance upgrades that we will be the primary contributor of features in the next major release.
* We've created a custom-built search engine serving millions of queries a day, completely distributed and entirely in-memory, with real-time updates."
Relational databases?
Check out this blog: http://highscalability.com/ It has many real-world examples of system architectures to learn from.
"Friends" data is well-described in a graph database. Neo4j is an example, though I know it's not the way Facebook stores this information.
Facebook uses a number of database technologies that may be involved:
a patched version of MySQL
Cassandra
Hadoop
... others
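To illustrate why friends data maps naturally onto a graph, as noted above, here is a minimal Python sketch using adjacency sets. It says nothing about how Facebook actually stores this data; `friends`, `add_friendship` and `mutual_friends` are names invented for the example.

```python
from collections import defaultdict

# Each user maps to the set of users they are friends with.
friends = defaultdict(set)


def add_friendship(a, b):
    """Friendship is symmetric, so record the edge in both directions."""
    friends[a].add(b)
    friends[b].add(a)


def mutual_friends(a, b):
    """Users who are friends with both a and b."""
    return friends[a] & friends[b]


add_friendship("ann", "bob")
add_friendship("ann", "cat")
add_friendship("bob", "cat")
add_friendship("bob", "dan")
```

A question like "who do Ann and Bob both know?" becomes a set intersection on the graph, whereas in a single relational friendship table it would be a self-join.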
Most probably it uses some other mechanism. As an example, a search engine does not keep its index in a database or an XML file. To get maximum performance, search engines generally keep some tree structure (a binary search tree or something more complicated) and store it on disk in a performance-efficient manner. So I would guess Facebook uses a similar mechanism.
Certainly not in an XML file.
Yes, in a database, in one or several tables. And in the specific example of Facebook, on several servers.

Resources