We are in the process of a database redesign and have heard about many different options: Hadoop, Cassandra, Oracle, etc. Is there a good article that compares the major databases side by side on performance and features?
Comparison of relational database management systems
NoSQL
Related
It now seems that Google is betting on NewSQL solutions for big data storage.
I'm wondering whether a NoSQL solution still has any advantages over a NewSQL solution (memory management, for example)?
NewSQL databases are a new "strain" of databases, if you will, that attempt to take the long-established benefits of the traditional Relational Database Management System (RDBMS) and make them compete with the highlights of NoSQL data stores. They are not updates or improvements to an RDBMS but, more often, rewrites that include middleware abstracting the practice of database "sharding", that is, the ability to distribute a database over a grid of computers the way NoSQL does.
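To make "sharding" a bit more concrete, here is a minimal sketch of hash-based key routing. It is only an illustration: `shard_for` and `ShardedStore` are made-up names, the "shards" are plain in-memory dicts, and real NewSQL or NoSQL systems add rebalancing, replication and failure handling on top of this idea.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory shards."""

    def __init__(self, num_shards: int):
        # Each dict stands in for a separate database node.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "Ann"})
store.put("user:2", {"name": "Bob"})
```

The caller only ever talks to `store`; which node holds a given key is hidden behind the routing function, which is roughly what the middleware layer abstracts away.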
The power of the RDBMS comes mostly from queryability via the Structured Query Language (SQL), from transactionality and adherence to the ACID principles (Atomicity, Consistency, Isolation, Durability), and from the powerful tools developed over time to manage them. A lesser benefit is that the relational model eliminates repetitive storage of the same information in multiple places.
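As a small illustration of the atomicity part of ACID, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table and the `transfer` and `balances` helpers are invented for this example; the point is only that a failed statement rolls back the whole transaction, including the statement that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` credits bob first and then fails on alice's debit, so the credit is rolled back and both balances stay as they were.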
The benefits of NoSQL are high speed, the ability to scale laterally across a computing grid, and the lack of a schema to maintain. This makes these stores highly performant even against huge data sets. But they lack the benefits you get from a traditional RDBMS: the query language to manipulate data isn't really there (yet), they can't be transactional across a computing grid, and they lack management tools comparable to MS SQL Server Management Studio.
NewSQL is attempting to take the best parts of both worlds, and I think it eventually will. Here is a great write-up of RDBMS vs. NoSQL vs. NewSQL on bananagunprogramming.com.
I am a newbie to NoSQL databases, and this may sound a bit stupid, but I was wondering whether NoSQL databases use or need indexes.
If yes, how do I create or manage them? Any links?
Thanks
CouchDB and MongoDB definitely yes. I mentioned that in my book:
http://use-the-index-luke.com/sql/testing-scalability/response-time-throughput-scaling-horizontal
Here are the respective docs:
http://guide.couchdb.org/draft/btree.html
http://www.mongodb.org/display/DOCS/Indexes
NoSQL is, however, too fragmented to give a definite "yes, all NoSQL systems need indexes", I believe. Most systems require and provide indexes, but not at the level most SQL databases do. Recently, the Cassandra people were proudly introducing secondary indexes, i.e., more than a single clustered index.
http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes (well, not so recently as I remember)
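To show what a secondary index buys you in a key-value or document store, here is a toy sketch. It is not how CouchDB, MongoDB or Cassandra implement indexes (those use B-trees and other on-disk structures); `IndexedStore` and its methods are hypothetical names, and it assumes indexed field values are hashable.

```python
from collections import defaultdict


class IndexedStore:
    """Toy document store: primary lookup by key plus optional secondary indexes."""

    def __init__(self):
        self.docs = {}  # primary key -> document
        # field name -> field value -> set of primary keys
        self.by_field = defaultdict(lambda: defaultdict(set))
        self.indexed_fields = set()

    def create_index(self, field):
        """Start indexing `field` and index the documents already stored."""
        self.indexed_fields.add(field)
        for key, doc in self.docs.items():
            if field in doc:
                self.by_field[field][doc[field]].add(key)

    def put(self, key, doc):
        old = self.docs.get(key)
        if old is not None:
            # Remove stale index entries before overwriting the document.
            for field in self.indexed_fields:
                if field in old:
                    self.by_field[field][old[field]].discard(key)
        self.docs[key] = doc
        for field in self.indexed_fields:
            if field in doc:
                self.by_field[field][doc[field]].add(key)

    def find(self, field, value):
        """Return matching keys: index lookup if available, else a full scan."""
        if field in self.indexed_fields:
            return sorted(self.by_field[field].get(value, set()))
        return sorted(k for k, d in self.docs.items() if d.get(field) == value)
```

Without the index, `find` has to look at every document; with it, the store pays a little extra on every `put` to keep the index current. That trade-off is the same one you manage in SQL databases.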
NoSQL databases definitely need indexes.
In the most popular databases, however, you do not need to maintain the indexes yourself: the communities behind these databases keep developing new features in response to current needs, in the spirit of "Code Less, Get More".
"Moving to a NoSQL DB" is a hot topic in the architect community. But my question is: is a NoSQL DB capable of handling a huge database?
Does it support indexing?
Does it support backups (data backup and log backup)?
Are there management tools available?
Does it support replication?
Does it support transactions?
A developer can work independently on a relational database. Does NoSQL have such tools? Can we use it in a data center?
Thanks
Nirajan,
The answer to all of your questions with regards to RavenDB is that it absolutely does support all of those.
Regarding RavenDB:
yes
yes
yes (it's self-explanatory and missing from the documentation, but it will be re-written anyway)
yes
yes
I am currently examining different NoSQL and RDBMSes regarding their replication abilities in order to build distributed systems.
Reading through several papers and books, I get the feeling that some vendors or authors use their own definitions of the terms
Master-Master Replication (Replication between two servers)
Master-Slave Replication (replication between multiple servers in order to increase read speed; writes are only possible on the master server)
Multi-Master Replication (= Peer-To-Peer?)
Peer-To-Peer Replication (replication between n nodes, each can read/write)
Merge Replication (?)
E.g.: some treat the terms Master-Master and Peer-to-Peer as the same, while in the MySQL docs, for instance, I found a distinction between Master-Master and Multi-Master (= Peer-to-Peer???) replication.
What is the difference between Multi-Master and Peer-to-Peer replication?
Is Multi-Master replication's use case more oriented towards clustering, while Peer-to-Peer targets distributing content to distributed applications?
I would like to sort things out and be sure that I understand these terms correctly, so maybe a discussion here would help pool some knowledge.
Regards, Chris
Edit: added merge replication to the list and some explanations as I understand them...
Regarding CouchDB, the story is simple. Here it is:
There is only one replication mode for CouchDB. The source copies all its data to the target, subject to an optional yes/no filter. I described CouchDB replication in another question. The key point is that "replication" is simply a DB client. It connects to both couches, reads from the source, and writes to the target.
Any other big-picture architecture (peer-to-peer, multi-master, master-slave) is just something the developers or the system administrators build on top of that. For example, if GETs are distributed to many couches, but POSTs go to one central couch which replicates to the others, that is effectively master-slave. If you put a CouchDB in every major city for performance, and they replicate directly with each other, that is multi-master replication.
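Here is a rough sketch of that idea in Python. It is not the CouchDB replication protocol (which works from a changes feed and revision trees and handles conflicts); databases are modelled as plain dicts of `doc_id -> (revision number, body)`, and `replicate` is a hypothetical helper. It only shows how one-way copies combine into the topologies above.

```python
def replicate(source, target, doc_filter=None):
    """One-way copy from source to target, subject to an optional yes/no filter.

    Returns the number of documents written to the target.
    """
    copied = 0
    for doc_id, (rev, body) in source.items():
        if doc_filter is not None and not doc_filter(body):
            continue  # the filter said "no" for this document
        current = target.get(doc_id)
        # Write only if the target lacks the document or holds an older revision.
        if current is None or current[0] < rev:
            target[doc_id] = (rev, dict(body))
            copied += 1
    return copied


# Master-slave: all writes hit `master`, which pushes to the read replicas.
master = {"doc1": (1, {"city": "Berlin"})}
replica_a, replica_b = {}, {}
for replica in (replica_a, replica_b):
    replicate(master, replica)

# Multi-master: two couches accept writes and replicate in both directions.
berlin = {"doc1": (1, {"city": "Berlin"})}
tokyo = {"doc2": (1, {"city": "Tokyo"})}
replicate(berlin, tokyo)
replicate(tokyo, berlin)
```

The same function is used in both cases; only the direction and number of `replicate` calls differ. This toy version ignores conflicting edits to the same document, which a real system has to deal with.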
Within the CouchDB community, and especially from Chris Anderson's projects and presentations, "peer-to-peer" replication is a concept where CouchDB is everywhere: mobile phones, data centers, telephone poles. And replication happens directly between couches in a decentralized way, without a central authority or architecture, like the web itself.
Is there any difference between these two kinds of database? If yes, what is the difference? Thank you.
The question isn't really answerable because "RDBMS" and "column-oriented" refer to very different aspects of a DBMS and are not mutually exclusive.
An RDBMS is any DBMS that implements the relational model.
A column-oriented DBMS is any DBMS that uses columnar storage for its data. That could be an RDBMS or it could be something else.
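To illustrate that the difference is about physical layout rather than the data model, here is a small sketch holding the same table in both forms. The sample data and the `to_columns` helper are made up, and it assumes every row has the same fields.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 50},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: each column is stored together.
columns = to_columns(rows)

# The aggregate gives the same answer either way, but the columnar version
# touches only the `amount` values instead of every field of every row.
total_from_rows = sum(row["amount"] for row in rows)
total_from_columns = sum(columns["amount"])
```

Both layouts describe the same relation, which is why a column-oriented DBMS can still be an RDBMS.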
A column-oriented database is typically used for data warehouses and where you need to aggregate large amounts of data. It can be substantially different than a 'typical' transactional database.
Is this what you are looking to build (a data warehouse)?
When the column-oriented DBMS supports SQL, it replaces the SQL schema internally with a fully normalised version. Therefore performance considerations in the design of the schema that usually apply to traditional RDBMS no longer apply.