What NoSQL database to use? - database

I am developing an app that has many-to-many relationship between its entities and also need to store files (pdfs, emails, images, etc). I will be using Java code. I would like to know which NoSQL database to use. I was thinking of using Neo4J for the many-to-many relationship with gridfs to store the files. Has anyone done something like this before? Need to store the complex relationship between entities and also files (which can be huge.. 16MB-1ooMB).

Not sure that i help you, but i try to use mongo for model many-to-many.
And my experience is terrible...
because, official mongo documentation says: for model many-to-many use DBRef.
And when i use DBRef i need dereference my documents references.
And i don't found any working example how to do it....
I wrote own - but it provides many bugs...
Perhaps i'm looser, and don't understand how works with it, but i think when you will be look to mondo think about it.

Related

How do I link models across different Datasources?

At first glance I am assuming the answer is no simply because when I look at the queries Cake creates, there is no way for one datasource to know how to build queries for the associated tables. At least to me. What I have is a UserModel that hasOne TwitterProfileModel via User.twitter_profile_id. My UserModel uses MySQL and my TwitterProfileModel uses MongoDB. The only solution I can think of is fetching the TwitterProfile data via my UserModel's afterFind callback. I am curious to know if there is a solution more native to CakePHP and if the way I am planning of approaching this issue is the best way. I've looked at the documentation and I see no mentioning of a situation like mine.
EDIT:
I am aware that datasources do not talk to each other. My question is what steps can I follow in order to be able to retrive an associated model that is from a different datasource
This is not related to CakePHP at all but to databases: You can not associate different db systems tables by joins. Use the same db system for both tables if you really have to use sql joins if not you'll need to fetch the records in the afterFind() as you already do it.

Can I use RavenDB (NoSQL) or should I just be using MySQL(RDBMS)?

I am starting on a ASP.NET MVC 3 General Management System (Project Management being the first component). Now I have been reading up a bit on RavenDB and it sounds pretty interesting. One of the biggest things that I like about it is the fact I would not need any type on ORM to handle the data from the DB. This will make my code a lot cleaner and quicker. However coming from a background working exclusively with MySQL for the past 6+ years, I tend to think very relationally with my data. There are a few things that seems like NoSQL would not be good for. I want to throw these things out there and maybe these issues can be handle in a NoSQL solution and I am just think too relationally (then again, maybe this project should be done with MySQL). These are the issues I am thinking of:
Unique Idenifiers: I am going to want to be able to have unique identifiers for a lot of things. For stuff like projects, the name should be unique and could use that however when it come to tasks under a project, the title may not be unique and this is where I would use a quto-increment field but I can do that in RavenDB (from what I can tell)
Linking: Using for fields like status and type I would just use a linking with a foreign key. Now for one-to-many relationships, I can just use the text instead of trying to link a foreign key (which you don't have in NoSQL) but with many-to-many linking, that because a problem. For example, I intend to have a tagging system (like on here) where most items can have 1 to many tags attached to it and then I can perform searches on those tag for the items. Is there a way to do this in NoSQL?
Is a RDBMS really the best tool for the job here or am I just not properly think the "NoSQL" way and I can accomplish this with NoSQL (RavenDB)?
I know this is an old post. Perhaps the docs weren't as good when originally written. But for reference in case other stumble here:
Raven comes with a HiLo document id generation strategy by default. Storing a new document without specifying an id yourself will get an auto incrementing id such as "projects/1", "projects/2", etc. Read more here.
The best guidance on the different ways to handle document relationships is here in the documentation. For the situation you described, you don't really need a separate document at all. You can simply embed a string array of tag names into each item. Documents are not flat, they can be structured. And yes, you can still query on them.
Hopefully you've discovered this on your own since the original post.
Ayende wrote a post "Modeling reference data in RavenDB" which answers some of your questions re Linking. You will have copies of the data between the reference document and your other documents and that redundancy is "ok" for document databases. You can still build indexes or query based on the on either Id or text that you store.
I would favor SQL for a transaction system such as Accounts Receivable application where you need to perform ad hoc queries. With document database you really need to think through how you will be fetching your data and build indexes up front to answers those questions. With RavenDB there is also a dynamic indexing function that learns from and caches the queries that are fired at the database.
For project management where the majority of items would be tasks I would think a RavenDB would fit your needs.

Is there a nosql store that also allows for relationships between stored entities?

I am looking for nosql key value stores that also provide for storing/maintaining relationships between stored entities. I know Google App Engine's datastore allows for owned and unowned relationships between entities. Does any of the popular nosql store's provide something similar?
Even though most of them are schema less, are there methods to appropriate relationships onto a key value store?
It belongs to the core features of graph databases to provide support for relationships between entities. Typically, you model your entities as nodes and the relationships as relationships/edges in the graph. Unlike RDBMS you don't have to define relationships in advance -- just add them to the graph as needed (schema-free). I created a domain modeling gallery giving a few examples of how this can look in practice. The examples use the Neo4j graphdb, a project I'm involved in. The mailing list of this project use to prove very helpful for graph modeling questions.
The document-oriented database Riak has support for links between documents.
You can add support for relationships on top of any database engine (like key/value), but it doesn't come whithout work. It all comes down to your use case. If you provide more details it's easier to come up with a useful answer.
Oops, now I saw that the title says "nosql store" and then your actual question narrows this down to "nosql key value store". As key/value stores have no semantics for defining relationships between entities I'll still post my answer.
MongoDB is a document database, not a key/value store. It does provide, however, a simple form of inter-document references. These work more-or-less like SQL foreign keys that are automatically nulled when the referenced object is deleted.
This is adequate for the same sorts of things for which you'd use foreign keys, but it isn't optimized for serious graph traversal.
The relationships in the Google App Engine are only keys to entities that are automatically de-referenced when accessed in code. And are only values when used to filter against. Its a function of the DB Api rather than anything explicit, so the access to the ReferenceProperty will simply perform a query against the referenced model to get access to the object.
If you look at something like MongoDB, the relationships are stored in-object (from what I remeber), but they can also be stored however you want in the sense that you would create an API that would search the joined table for your item in the relationship in a similar manner to who the App Engine works.
Paul.

Is there a database like this?

Background: Okay, so I'm looking for what I guess is an object database. However, the (admittedly few) object databases that I've looked at have been simple persistence layers, and not full-blown DBMSs. I don't know if what I'm looking for is even considered an object database, so really any help in pointing me in the right direction would be very appreciated.
I don't want to give you two pages describing what I'm looking for so I'll use an example to illustrate my point. Let's say I have a "BlogPost" object that I need to store. Something like this, in pseudocode:
class BlogPost
title:String
body:String
author:User
tags:List<String>
comments:List<Comment>
(Assume Comment is its own class.)
Now, in a relational database, author would be stored as a foreign key pointing to a User.id, and the tags and comments would be stored as one-to-many or many-to-many relationships using a separate table to store the relationships. What I'd like is a database engine that does the following:
Stores related objects (author, tags, etc.) with a direct reference instead of using foreign keys, which require an additional lookup; in other words, objects on top of each other should be natively supported by the database
Allows me to add a comment or a tag to the blog post without retrieving the entire object, updating it, and then putting it back into the database (like a document-oriented database -- CouchDB being an example)
I guess what I'm looking for is a navigational database, but I don't know. Is there anything even remotely similar to what I'm thinking of? If so, what is it called? (Or better yet, give me an actual working database.) Or am I being too picky?
Edit:
Just to clarify, I am NOT looking for an ORM or an abstraction layer or anything like that. I am looking for an actual database that does this internally. Sorry if I'm being difficult, but I've searched and I couldn't find anything.
Edit:
Also, something for the JVM would be excellent, but at this point I really don't care what platform it runs on.
I think what you are describing could easily be modeled in a graph database. Then you get the benefit of navigating to the nodes/edges where you want to make changes without any need to retrieve anything else. For the JVM there's the Neo4j open source graph database (where I'm part of the team). You can read about it over at High Scalability, as part of an overview at thinkvitamin or in this stackoverflow thread. As for the tags, I think storing them in a graph database can give you some extra advantages if you want to find related tags and similar stuff. Just drop a line on the mailing list, and I'm sure the community will help you out.
You could try out db4o which is available in C# and Java.
I think our looking for this: http://www.odbms.org/. This site has some good info on Object Databases, including Objectivity, which is a pretty good object database.
Elephant does this: http://common-lisp.net/project/elephant/
Exactly what you've described can be done with (N)Hibernate running on an ordinary RDBMS.
The advantage of using such a persistence layer with an ordinary database is that you have a standard database system combined with convenient programming. You declare your classes in a very natural way, and (N)Hibernate provides a way to translate betweeen references/lists and foreign key relationships.
Java tutorial: http://docs.jboss.org/hibernate/stable/core/reference/en/html/tutorial-firstapp.html
.NET tutorial: https://web.archive.org/web/20081212181310/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/04/01/your-first-nhibernate-based-application.aspx
If you insist that you don't want to use a well-supported standard RDBMS and would rather trust your data to something more exotic and less heavily tested, you're looking for an Object Relational Database.
However, such a product would probably be best implemented by making it be a layer over a standard RDBMS anyway. This is probably why ORMs like (N)Hibernate are the most popular solution - they allow standard RDBMS software (and widely available management/user skills) to be applied, and yet the programming experience is 99% object-based.
This is exactly what LINQ was designed for.
Microsoft LINQ defines a set of proprietary query operators that can be used to query, project and filter data in arrays, enumerable classes, XML (XLINQ), relational database, and third party data sources. While it allows any data source to be queried, it requires that the data be encapsulated as objects. So, if the data source does not natively store data as objects, the data must be mapped to the object domain. Queries written using the query operators are executed either by the LINQ query processing engine or, via an extension mechanism, handed over to LINQ providers which either implement a separate query processing engine or translate to a different format to be executed on a separate data store (such as on a database server as SQL queries (DLINQ)). The results of a query are returned as a collection of in-memory objects that can be enumerated using a standard iterator function such as C#'s foreach.
There's a variety of terms, all linked to Object-Relational Mapping, aka ORM, which is probably going to be the most useful one for you to look up. ORM libraries exist for many programming languages.
Oracle's nested tables provide some part of that functionality, though in updates, you cannot just add a row to the nested table - you have to replace the whole nested table.
I guess you're looking for an ORM with "EntityFirst" approach.
In EntityFirst approach the developer is least[not-at-all] concerned with Database. You just have to build your entities or objects. The ORM then takes care of storing the entities in Database and retrieving them at your will.
The only EntityFirst ORM witihn my knowledge "Signum". It's a wonderful framework built on top of .net. I recommend you to go thrgouh some videos on the SignumFramework website and I'm sure you'll find it useful.
Link Text: http://www.signumframework.com
Thanks.
ZODB perhaps?
good introduction find here:
http://www.ibm.com/developerworks/aix/library/au-zodb/
You could try out STSdb, DB4O, Perst ... which is available in C# and Java.

ActiveRecord for Erlang

I am continuing to delve into Erlang. I am thinking of starting my next web project using Erlang, and at this stage the only thing I will really miss from Ruby on Rails is ActiveRecord.
Is there a good alternative technology for Erlang?
Update:
The closest I have come to a solution is to ErlyDB, a component of ErlyWeb.
ErlyDB is a database abstraction layer
generator for Erlang. ErlyDB combines
database metadata and user-provided
metadata to generate functions that
let you perform common data access
operations in an intuitive manner. It
also provides a single API for working
with different database engines
(although currently, only MySQL is
supported), letting you write portable
data access code.
Well, the major advantages of ActiveRecord (as I see it) are:
You can persist your objects in a relational database nearly transparently.
You can search the database by any attribute of your objects.
You can validate objects when persisting them.
You can have callbacks on deleting, updating, or inserting objects.
With Mnesia:
You can persist any Erlang data absolutely transparently.
Using pattern matching, you can search the database by any attribute of your data or their combination.
QLC gives you a nice query interface for cases when pattern matching isn't enough.
No solutions for validating and callbacks, however...
So, what else do you have in ActiveRecord that is lacking in Mnesia?
I don't think there really is at the time of this writing. That may be because the kinds of systems being written in erlang and the type of people writing them don't really call for Relational Databases. I see much more code using mnesia, CouchDB, Tokyo Cabinet and other such alternative database technologies.
That's not to say someone might not want to create something like active record. It's just hasn't really been a need yet. Maybe you will be the first? :-)
You might be interested in Chicago Boss's "BossRecords":
http://www.chicagoboss.org/api-record.html
They are quite explicitly modeled on the Active Record pattern, and use a lot of compiler magic to make the syntax squeaky clean. BossRecords support save/validate as well as has_many/belongs_to associations. Attributes in your data model are made available through generated functions (e.g. "Employee:first_name()").
Some googling reveals libs / clients / wrappers for Couchdb described "ActiveRecord like libraries like CouchFoo", and advise to steer clear:
http://upstream-berlin.com/2009/03/31/the-case-of-activerecord-vs-couchdb/
http://debasishg.blogspot.com/2009/04/framework-inertia-couchdb-and-case-of.html#
as to your comment on "not suited for web apps yet", I think the pieces are there: mochiweb, couch, yaws, nitrogen, erlyweb. There's some powerful tools, very different paradigm, certainly, from rails, django and PHP.

Resources