Neo4j + Redis? Good or bad? - database

I making a music app with social networking features. I was hoping to power my database with Neo4j and Redis. In Neo4j I will store user info and all other information ( post, reviews, etc.) in redis. Does anyone have any advice or insight on this?

Short answer: it depends.
Longer answer:
I'm assuming that you are just starting with the app and want to have quick feedback if it is a thing you want to invest (time/money) in.
If you want to run queries like "which users reviewed the same song" you need to put this data into Neo4J. In general, the more connected data you have there, the more interesting the questions you can answer. So I would err on the side of putting data into Neo4j. Also, only querying one database is easier to implement than aggregating data over multiple ones.
If you get enough users that the amount of data they produce starts to impact Neo4j, you can put the actual review text or post into redis and reference it by an id from Neo4j. But by then you already know it is worth doing and this is a fairly manageable refactoring and data migration.

Neo4j is a graph database. However it does not support sharding (horizontal partitioning). The good thing about using Neo4j is that you can store a graph data structure and run graph algorithms easily with Neo4j query language. This may be useful for analyzing some social network properties. The bad thing, is, because Neo4j does not support sharding, the capacity of the database is limited to a single node. When the data size increases, its performance may be impacted.
Redis is always useful for caching data, which can be a good choice.

IMHO, I will try to store all in neo4j in the same case.

Related

Scalable database technology and architecture

I've been trying to learn more about database scaling in a distributed system, and I am stuck in between RDBMS and NoSQL.
Some articles online suggest that NoSQL is the solution to modern Big Data. Others say NoSQL is just a hype and RDBMS can be just as scalable with good design, and it provides good data structure.
Instead of reading others' opinions, I'd love to judge the two myself, but I do not understand exactly what is required for a scalable RDBMS and a scalable NoSQL.
I've done a bit more readings on RDBMS, and it seems that the solution requires leveraging memcache and sharding to reduce database size and the number of DB queries. Are there other tricks? Can you still use tables with many columns? Or use less columns and more joins?
As for NoSQL, I've read a little about MongoDB. I understand that it encourages data aggregation. But how does that make it more scalable? I'm also starting to learn Cassandra because I read that it scales much better than MongoDB, but I have no idea how it is more scalable.
I would very much appreciate a basic (or advanced, if you have the patience to type it out) condensed and down-to-the-core explanation on scaling RDBMS and NoSQL, or good articles online or books that explain the topic. :)
I won't cover ways you can scale by implementing things on your own and putting a memcache server in between, ... I'll just cover what comes right out of the box...
Let's start first with RDBMS:
I think setting up an RDBMS cluster is more complicated than a NoSQL cluster, but that's just my opinion. Usually what you have is one Master and multiple Slaves. You have to send all your writes to the master and can read from any slave you want. Since you have RDBMS and ACID, the system should somehow guarantee you, that you won't read old data. So the thing here is, that you assume that your application writes once and reads often (as it's usually the case). For those purposes, one Server for read/write and multiple servers for read is great. The problem is if you'r writes are so often that you can't keep up with them anymore on the one machine. That is your bottleneck. Additionally to the build in solutions from Oracle for instance - which are huge - there is also http://www.scalearc.com/ which can cache queries, ... and handle the scaling for you.
NoSQL:
There is no 1 NoSQL schema which is implemented by all the DBs. Every system is a bit different. MongoDB for instance is quite similar to RDBMS, it also has only one Master and several slaves to which it can replicate data, but additionally you can also create shards. Data is split between shards, and replicated to slaves. So you could have multiple different masters which are responsible for smaller parts. Afterwards when you read, you can choose if you want to read from multiple slaves, from the master or from any slave - depending how urgently you need the latest data.
Cassandra on the other hand works totally differently. I'm not sure if you can write to multiple servers or how it works, but basically the servers keep a log of all the writes. So even if they can't process the writes immediately, they are stored in a log, to still give you a fast response. Afterwards when you read, you can say again how urgently you want to have the new data, and if you really want the latest latest data, Cassandra will need to check the log, if there are any updates written, and it will cost you a lot of time.
Key-Value stores like ElasticSearch, CouchDB, CouchBase work again differently. Here the of the item is hashed, and based on the hash, sent to one node which will be responsible for it. This way, when you read after the key was written, you get again up to date information, because you'll read from the same node. The idea of this design is, that no one single key will be of everyone's interest, but the load will be distributed. These are also the DBs which I think scale the best, and make it the easiest to add more servers to the cluster, but you loose the power of complex queries, like you have it in MongoDB and Cassandra - and of course RDBMS. ElasticSearch has some simple search queries, and CouchDB and CouchBase have only Views which are produced by MapReduce, where you can get data which you want, if it fits the view. Otherwise you can only access it by the key.
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis - is a very comprehensive summary of the most common NoSQL DBs, what are their strengths and weaknesses, and the most common usage scenarios.
In the end, the question is also, why do you want to scale? how many records are you going to have in the database? Few millions is not a problem at all. Few hundred millions is also not a problem for most of the RDBMS on a powerful enough server. And if designed the DB and it's indices properly even a billion records per year should be still fine.

Ideal database for a minimalist blog engine

So I'm designing this blog engine and I'm trying to just keep my blog data without considering comments or membership system or any other type of multi-user data.
The blog itself is surrounded around 2 types of data, the first is the actual blog post entry which consists of: title, post body, meta data (mostly dates and statistics), so it's really simple and can be represented by simple json object. The second type of data is the blog admin configuration and personal information. Comment system and other will be implemented using disqus.
My main concern here is the ability of such engine to scale with spiked visits (I know you might argue this but lets take it for granted). So since I've started this project I'm moving well with the rest of my stack except the data layer. Now I've been having this dilemma choosing the database, I've considered MongoDB but some reviews and articles/benchmarking were suggesting slow reads after collections read certain size. Next I was looking at Redis and using its persistence features RDB and AOF, while Redis is good at both fast reading/writing I'm afraid of using it because I'm not familiar with it. And this whole search keeps going on to things like "PostgreSQL 9.4 is now faster than MongoDB for storing JSON documents" etc.
So is there any way I can settle this issue for good? considering that I only need to represent my data in key,value structure and only require fast reading but not writing and the ability to be fault tolerant.
Thank you
If I were you I would start small and not try to optimize for big data just yet. A lot of blogs you read about the downsides of a NoSQL solution are around large data sets - or people that are trying to do relational things with a database designed for de-normalized data.
My list of databases to consider:
Mongo. It has huge community support and based on recent funding - it's going to be around for a while. It runs very well on a single instance and a basic replica set. It's easy to set up and free, so it's worth spending a day or two running your own tests to settle the issue once and for all. Don't trust a blog.
Couchbase. Supports key/value storage and also has persistence to disk. http://www.couchbase.com/couchbase-server/features Also has had some recent funding so hopefully that means stability. =)
CouchDB/PouchDB. You can use PouchDB purely on the client side and it can connect to a server side CouchDB. CouchDB might not have the same momentum as Mongo or Couchbase, but it's an actively supported product and does key/value with persistence to disk.
Riak. http://basho.com/riak/. Another NoSQL that scales and is a key/value store.
You can install and run a proof-of-concept on all of the above products in a few hours. I would recommend this for the following reasons:
A given database might scale and hit your points, but be unpleasant to use. Consider picking a database that feels fun! Sort of akin to picking Ruby/Python over Java because the syntax is nicer.
Your use case and domain will be fairly unique. Worth testing various products to see what fits best.
Each database has quirks and you won't find those until you actually try one. One might have quirks that are passable, one will have quirks that are a show stopper.
The benefit of trying all of them is that they all support schemaless data, so if you write JSON, you can use all of them! No need to create objects in your code for each database.
If you abstract the database correctly in code, swapping out data stores won't be that painful. In other words, your code will be happier if you make it easy to swap out data stores.
This is only an option for really simple CMSes, but it sounds like that's what you're building.
If your blog is super-simple as you describe and your main concern is very high traffic then the best option might be to avoid a database entirely and have your CMS generate static files instead. By doing this, you eliminate all your database concerns completely.
It's not the best option if you're doing anything dynamic or complex, but in this small use case it might fit the bill.

When NOT to use Cassandra? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
There has been a lot of talk related to Cassandra lately.
Twitter, Digg, Facebook, etc all use it.
When does it make sense to:
use Cassandra,
not use Cassandra, and
use a RDMS instead of Cassandra.
There is nothing like a silver bullet, everything is built to solve specific problems and has its own pros and cons. It is up to you, what problem statement you have and what is the best fitting solution for that problem.
I will try to answer your questions one by one in the same order you asked them. Since Cassandra is based on the NoSQL family of databases, it's important you understand why use a NoSQL database before I answer your questions.
Why use NoSQL
In the case of RDBMS, making a choice is quite easy because all the databases like MySQL, Oracle, MS SQL, PostgreSQL in this category offer almost the same kind of solutions oriented toward ACID properties. When it comes to NoSQL, the decision becomes difficult because every NoSQL database offers different solutions and you have to understand which one is best suited for your app/system requirements. For example, MongoDB is fit for use cases where your system demands a schema-less document store. HBase might be fit for search engines, analyzing log data, or any place where scanning huge, two-dimensional join-less tables is a requirement. Redis is built to provide In-Memory search for varieties of data structures like trees, queues, linked lists, etc and can be a good fit for making real-time leaderboards, pub-sub kind of system. Similarly there are other databases in this category (Including Cassandra) which are fit for different problem statements. Now lets move to the original questions, and answer them one by one.
When to use Cassandra
Being a part of the NoSQL family, Cassandra offers a solution for problems where one of your requirements is to have a very heavy write system and you want to have a quite responsive reporting system on top of that stored data. Consider the use case of Web analytics where log data is stored for each request and you want to built an analytical platform around it to count hits per hour, by browser, by IP, etc in a real time manner. You can refer to this blog post to understand more about the use cases where Cassandra fits in.
When to Use a RDMS instead of Cassandra
Cassandra is based on a NoSQL database and does not provide ACID and relational data properties. If you have a strong requirement for ACID properties (for example Financial data), Cassandra would not be a fit in that case. Obviously, you can make a workaround for that, however you will end up writing lots of application code to simulate ACID properties and will lose on time to market badly. Also managing that kind of system with Cassandra would be complex and tedious for you.
When not to use Cassandra
I don't think it needs to be answered if the above explanation makes sense.
When evaluating distributed data systems, you have to consider the CAP theorem - you can pick two of the following: consistency, availability, and partition tolerance.
Cassandra is an available, partition-tolerant system that supports eventual consistency. For more information see this blog post I wrote: Visual Guide to NoSQL Systems.
Cassandra is the answer to a particular problem: What do you do when you have so much data that it does not fit on one server ? How do you store all your data on many servers and do not break your bank account and not make your developers insane ? Facebook gets 4 Terabyte of new compressed data EVERY DAY. And this number most likely will grow more than twice within a year.
If you do not have this much data or if you have millions to pay for Enterprise Oracle/DB2 cluster installation and specialists required to set it up and maintain it, then you are fine with SQL database.
However Facebook no longer uses cassandra and now uses MySQL almost exclusively moving the partitioning up in the application stack for faster performance and better control.
The general idea of NoSQL is that you should use whichever data store is the best fit for your application. If you have a table of financial data, use SQL. If you have objects that would require complex/slow queries to map to a relational schema, use an object or key/value store.
Of course just about any real world problem you run into is somewhere in between those two extremes and neither solution will be perfect. You need to consider the capabilities of each store and the consequences of using one over the other, which will be very much specific to the problem you are trying to solve.
Besides the answers given above about when to use and when not to use Cassandra, if you do decide to use Cassandra you may want to consider not using Cassandra itself, but one of the its many cousins out there.
Some answers above already pointed to various "NoSQL" systems which share many properties with Cassandra, with some small or large differences, and may be better than Cassandra itself for your specific needs.
Additionally, recently (several years after this question was originally asked), a Cassandra clone called Scylla (see https://en.wikipedia.org/wiki/Scylla_(database)) was released. Scylla is an open-source re-implementation of Cassandra in C++, which claims to have significantly higher throughput and lower latencies than the original Java Cassandra, while being mostly compatible with it (in features, APIs, and file formats). So if you're already considering Cassandra, you may want to consider Scylla as well.
I will focus here on some of the important aspects which can help you to decide if you really need Cassandra. The list is not exhaustive, just some of the points which I have at top of my mind-
Don't consider Cassandra as the first choice when you have a strict requirement on the relationship (across your dataset).
Cassandra by default is AP system (of CAP). But, it supports tunable consistency which means it can be configured to support as CP as well. So don't ignore it just because you read somewhere that it's AP and you are looking for CP systems. Cassandra is more accurately termed “tuneably consistent,” which means it allows you to easily decide the level of consistency you require, in balance with the level of availability.
Don't use Cassandra if your scale is not much or if you can deal with a non-distributed DB.
Think harder if your team thinks that all your problems will be solved if you use distributed DBs like Cassandra. To start with these DBs is very simple as it comes with many defaults but optimizing and mastering it for solving a specific problem would require a good (if not a lot) amount of engineering effort.
Cassandra is column-oriented but at the same time each row also has a unique key. So, it might be helpful to think of it as an indexed, row-oriented store. You can even use it as a document store.
Cassandra doesn't force you to define the fields beforehand. So, if you are in a startup mode or your features are evolving (as in agile) - Cassandra embraces it. So better, first think about queries and then think about data to answer them.
Cassandra is optimized for really high throughput on writes. If your use case is read-heavy (like cache) then Cassandra might not be an ideal choice.
Right. It makes sense to use Cassandra when you have a huge amount of data, a huge number of queries but very little variety of queries. Cassandra basically works by partitioning and replicating. If all your queries will be based on the same partition key, Cassandra is your best bet. If you get a query on an attribute that is not the partition key, Cassandra allows you to replicate the whole data with a new partition key. So now you have 2 replicas of the same data with 2 different partition keys.
Which brings me to your next question. When not to use Cassandra. As I mentioned, Cassandra scales by replicating the complete database for every new partitioning key. But you can't keep making new copies again and again. So when you have a high variety in queries i.e. each query has a different column in the where clause, Cassandra is not a good option.
Now for the third question. The whole point of using RDBMS is when you want the ACID properties. If you are building something like a payment service and want each transaction to be isolated, each transaction to either complete or not happen at all, changes to be persistent despite system failure, and the money to be consistent across bank accounts before and after the transaction completes, an RDBMS is the only option that will help you achieve this.
This article actually explains the whole thing, especially when to use Cassandra or not (as opposed to some other NoSQL option) part of the question -> Choosing the best Database. Do check it out.
EDIT: To answer the question in the comments by proximab, when we think of banking systems we immidiately think "ACID is the best solution". But even banking systems are made up of several subsystems that might not even be dealing with any transaction related data like account holder's personal information, account statements, credit card details, credit histories, etc.
All of this information needs to be stored in some database or the another. Now if you store the account related information like account balance, that is something that needs to be consistent at all times. For example, if you try to send money from account A to account B, then the money that disappears from account A should instantaneousy show up in account B, and it cannot be present in both accounts at the same time. This system cannot be inconsistant at any point. This is where ACID is of utmost importance.
On the other hand if you are saving credit card details or credit histories, that should not get into the wrong hands, then you need something that allows access only to authorised users. That I believe is supported by Cassandra. That said, data like credit history and credit card transactions, I think that is an ever increasing data. Also there is only so much yo can query on this data i.e. it has a very finite number of queries. These two conditions make Cassandra a perfect solution.
Talking with someone in the midst of deploying Cassandra, it doesn't handle the many-to-many well. They are doing a hack job to do their initial testing. I spoke with a Cassandra consultant about this and he said he wouldn't recommend it if you had this problem set.
You should ask your self the following questions:
(Volume, Velocity) Will you be writing and reading TONS of information , so much information that no one computer could handle the writes.
(Global) Will you need this writing and reading capability around the world so that the writes in one part of the world are accessible in another part of the world?
(Reliability) Do you need this database to be up and running all the time and never go down regardless of which Cloud, which country, whether it's VM , Container, or Bare metal?
(Scale-ability) Do you need this database to be able to continue to grow easily and scale linearly
(Consistency) Do you need TUNABLE consistency where some writes can happen asynchronously where as others need to be certified?
(Skill) Are you willing to do what it takes to learn this technology and the data modeling that goes with creating a globally distributed database that can be fast for everyone, everywhere?
If for any of these questions you thought "maybe" or "no," you should use something else. If you had "hell yes" as an answer to all of them, then you should use Cassandra.
Use RDBMS when you can do everything on one box. It's probably easier than most and anyone can work with it.
Heavy single query vs. gazillion light query load is another point to consider, in addition to other answers here. It's inherently harder to automatically optimize a single query in a NoSql-style DB. I've used MongoDB and ran into performance issues when trying to calculate a complex query. I haven't used Cassandra but I expect it to have the same issue.
On the other hand, if your load is expected to be that of very many small queries, and you want to be able to easily scale out, you could take advantage of eventual consistency that is offered by most NoSql DBs. Note that eventual consistency is not really a feature of a non-relational data model, but it is much easier to implement and to set up in a NoSql-based system.
For a single, very heavy query, any modern RDBMS engine can do a decent job parallelizing parts of the query and take advantage of as much CPU and memory you throw at it (on a single machine). NoSql databases don't have enough information about the structure of the data to be able to make assumptions that will allow truly intelligent parallelization of a big query. They do allow you to easily scale out more servers (or cores) but once the query hits a complexity level you are basically forced to split it apart manually to parts that the NoSql engine knows how to deal with intelligently.
In my experience with MongoDB, in the end because of the complexity of the query there wasn't much Mongo could do to optimize it and run parts of it on multiple data. Mongo parallelizes multiple queries but isn't so good at optimizing a single one.
Let's read some real world cases:
http://planetcassandra.org/apache-cassandra-use-cases/
In this article: http://planetcassandra.org/blog/post/agentis-energy-stores-over-15-billion-records-of-time-series-usage-data-in-apache-cassandra
They elaborated the reason why they didn't choose MySql is because db synchronization is too slow.
(Also due to 2-phrase commit, FK, PK)
Cassandra is based on Amazon Dynamo paper
Features:
Stability
High availability
Backup performs well
Read and Write is better than HBase, (BigTable clone in java).
wiki http://en.wikipedia.org/wiki/Apache_Cassandra
Their Conclusion is:
We looked at HBase, Dynamo, Mongo and Cassandra.
Cassandra was simply the best storage solution for the majority of our data.
As of 2018,
I would recommend using ScyllaDB to replace classic cassandra, if you need back support.
Postgres kv plugin is also quick than cassandra. How ever won't have multi-instance scalability.
another situation that makes the choice easier is when you want to use aggregate function like sum, min, max, etcetera and complex queries (like in the financial system mentioned above) then a relational database is probably more convenient then a nosql database since both are not possible on a nosql databse unless you use really a lot of Inverted indexes. When you do use nosql you would have to do the aggregate functions in code or store them seperatly in its own columnfamily but this makes it all quite complex and reduces the performance that you gained by using nosql.
Cassandra is a good choice if:
You don't require the ACID properties from your DB.
There would be massive and huge number of writes on the DB.
There is a requirement to integrate with Big Data, Hadoop, Hive and Spark.
There is a need of real time data analytics and report generations.
There is a requirement of impressive fault tolerant mechanism.
There is a requirement of homogenous system.
There is a requirement of lots of customisation for tuning.
If you need a fully consistent database with SQL semantics, Cassandra is NOT the solution for you. Cassandra supports key-value lookups. It does not support SQL queries. Data in Cassandra is "eventually consistent". Concurrent lookups of data may be inconsistent, but eventually lookups are consistent.
If you need strict semantics and need support for SQL queries, choose another solution such as MySQL, PostGres, or combine use of Cassandra with Solr.
Apache cassandra is a distributed database for managing large amounts of structured data across many commodity servers, while providing highly available service and no single point of failure.
The archichecture is purely based on the cap theorem, which is availability , and partition tolerance, and interestingly eventual consistently.
Dont Use it, if your not storing volumes of data across racks of clusters,
Dont use if you are not storing Time series data,
Dont Use if you not patitioning your servers,
Dont use if you require strong Consistency.
Mongodb has very powerful aggregate functions and an expressive aggregate framework. It has many of the features developers are accustomed to using from the relational database world. It's document data/storage structure allows for more complex data models than Cassandra, for example.
All this comes with trade-offs of course. So when you select your database (NoSQL, NewSQL, or RDBMS) look at what problem you are trying to solve and at your scalability needs. No one database does it all.
According to DataStax, Cassandra is not the best use case when there is a need for
1- High end hardware devices.
2- ACID compliant with no roll back (bank transaction)
It does not support complete transaction management across the
tables.
Secondary Index not supported.
Have to rely on Elastic search /Solr for Secondary index and the custom sync component has to be written.
Not ACID compliant system.
Query support is limited.

Should I choose relational or non-relation database for social-network like app

I'm in the process of choosing database for my application. I have been using MySQL for the longest time but for my current application Performance and Scalability is important and I know MySQL has its limitation and I have been hearing a lot about key-value stores, column-based DBs and document-based DBs and others. I have looked into:
Cassandra
MongoDB
Redis
CouchDB
They all seem (or claim) to be faster than relational DBs such as MySQL.
I'm using Ruby on Rails and there are clients for all the above so it shouldn't be a problem.
My data model is simple for the most part which is centered on a user object(with rich profile and preferences) related to different items such as photos, videos, posts...etc and each one of these has one tag or more.
The fact that these databases are new there doesn't seem to be a lot of resources for them online. Plus they are in a way structurally different so it will not be trivial to switch from one to another later.
I wish you can give me your input on what DB you think would be most suit my application that will have good performance and scale.
Thanks,
Tam
Step 1) Create your design using whatever technology you are strongest with.
Step 2) Release your social network, begin on researching non-relational databases and master whichever you feel most comfortable with.
Step 3) Refactor your data tier so you could potentially replace MySQL quickly and easily with your newly learned DB technology.
Step 4) Wait for your website to become so big that the need to replace MySQL comes around and begin to plug the holes.
I know this seems kind of cheeky, but really my point is just release your software and start to worry about scale etc. when it actually becomes a concern.
The primary benefit of something like a document database, at least for your app, is that you can treat the entire User glob of info as a single document. You don't have to worry about adding table for properties, or new features, or whatever, rather you can keep the bulk of it in the user document and update it dynamically.
For read often, write rarely, this works a treat.
Now you don't need a "document database" to do something like this. MySQL et al will work just fine with a primary key and a CLOB (text) / BLOB field to hold the document.
Where something like CouchDB (the one that I'm most familiar with in this space) can help is that it has well supported replication, and it's straightforward to create views on specific attributes of the documents (for example, you want all "premiere" members, or whatever).
Plus, since CouchDB is HTTP, it works well with the modern caches and such that are available, which can help you in scaling, especially in, again, read heavy operations.
A lot of this is more about overall architecture than actual tools, so make sure you consider that first.
There is also Tokyo Cabinet which is used by some large sites.
I have not yet used on but my understanding is that when site like Twitter need to turn large numbers of messages round very quickly the overhead of the RDBMS is just to great and starts to slow the response times down significantly.
What you would need to do is look at the advantages you get from an RDBMS and weigh that against it's speed then do the same in reverse for a nosql type database.
RDBMS's give you a standard, they give you security, integrity and a general purpose language based on sets to make data manipulation easier. However if you do not need all or any of that structure you are loosing out on speed.
Prior to SQL was CODASYL and network databases. SQL took ove because of portability and transferability of skills etc. But i think the mobile wired world is changing this and it would be worth investigating.

What are the advantages of CouchDB vs an RDBMS

I've heard a lot about couchdb lately, and am confused about what it offers.
It's hard to explain all the differences in strict advantage/disadvantage form.
I would suggest playing with CouchDB a little yourself. The first thing you'll notice is that the learning curve during initial usage is totally inverted from RDBMS.
With RDBMS you spend a lot of up front time modeling your real world data to get it in to the Database. Once you've dealt with the modeling you can do all kinds of queries.
With CouchDB you just get all your data in JSON and stored in the DB in, literally, minutes. You don't need to do any normalization or anything like that, and the transport is HTTP so you have plenty of client options.
Then you'll notice a big learning curve when writing map functions and learning how the key collation works and the queries against the views you write. Once you learn them, you'll start to see how views allow you to normalize the indexes while leaving the data un-normalized and "natural".
CouchDB is a document-oriented database.
Wikipedia:
As opposed to Relational Databases, document-based databases do not store data in tables with uniform sized fields for each record. Instead, each record is stored as a document that has certain characteristics. Any number of fields of any length can be added to a document. Fields can also contain multiple pieces of data.
Advantages:
You don't waste space by leaving empty fields in documents (because they're not necessarily needed)
By providing a simple frontend for editing it is possible to quickly set up an application for maintaining data.
Fast and agile schema updates/changes
Map Reduce queries in a turing complete language of your choice. (no more sql)
Flexible Schema designs
Freeform Object Storage
Really really easy replication
Really Really easy Load-Balancing (soon)
Take a look here.
I think what better answers you is:
Just as CouchDB is not always the
right tool for the job, RDBMS's are
also not always the right answer.
CouchDB is a disk hog because it doesn't update documents -- it creates a new revision each time you update so the not-wasting-space-part because you don't have empty fields is trumped by the revisions.

Resources