I got two opinions about NoSQL from my friend.
First: Use NoSQL to boost performance and to store occasionally updated data. Still use SQL to store all important and transactional data.
Second: Don't use NoSQL if you don't really need it. Use it only if you really store big data.
I've used NoSQL and it's really fast when selecting data.
I want to know: is the first opinion alone enough reason to implement NoSQL? What do you all think about this?
NOTE: In my case, everything is still running well with SQL. I want to add NoSQL to improve data reading speed, so it would work alongside SQL.
Is it worth it to use NoSQL this early?
Thanks in advance
It depends on what you are designing.
From my experience scaling out data collection, I have found traditional relational storage to be a bottleneck because it cannot easily scale out over multiple nodes when a database gets very large. Sure, it scales up, but this becomes cost prohibitive at some point. In this scenario it would therefore depend on your medium to long term data storage projections. The solution for me was a mixture of relational storage for data that may be updated frequently and NoSQL (document storage) for data that has a fast rate of growth and is generally not updated after it is written.
Things to take into account:
Queries
SQL relational storage supports a growing set of query languages, as well as a wide range of filters, sorting options, projections, and indexed queries. NoSQL does much of this as well, but SQL can often go beyond it, allowing powerful aggregations of your data beyond what NoSQL can do.
Transactions
Transactions are important because they ensure that you have atomically made changes to your database. Many NoSQL platforms don’t support transactions, so be aware of this feature when you’re figuring out which to use, and what your own needs are.
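To make "atomically" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the "no negative balance" rule are assumptions made up for illustration; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        # Hypothetical business rule: no account may go negative.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds and is committed
try:
    transfer(conn, "alice", "bob", 500)  # fails and is rolled back as a whole
except ValueError:
    pass
print(balances(conn))
```

On a store without transactions, the application itself would have to detect and undo the half-applied first update.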
Consistency
MySQL platforms often use a single master to guarantee strong consistency in your database. These use synchronous replication to ensure you don’t lose important changes queued up to the master. Some NoSQL platforms, by contrast, replicate entity groups without a master, so that data is strongly consistent within an entity group and eventually consistent across all groups. The better option depends on the constraints and needs of your database.
Scalability
For years, database administrators relied on scaling up, buying bigger servers as database load increased. However, as transaction rates and demands on the databases continue to expand immensely, emphasis is on scaling out instead. Scaling out is distributing databases across multiple hosts, and that’s something NoSQL does better than standard SQL. They’re designed for optimal use on scaled out databases.
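As a rough illustration of what "distributing databases across multiple hosts" involves, here is a sketch of hash-based routing of keys to nodes. The node names and key format are hypothetical, and real systems usually use something more elaborate (for example consistent hashing), because plain modulo routing remaps most keys whenever the node count changes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical host names

def node_for(key, nodes=NODES):
    """Pick the node responsible for `key` using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

def distribute(keys, nodes=NODES):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement

placement = distribute([f"user:{i}" for i in range(1000)])
for node, keys in placement.items():
    print(node, len(keys))
```

Every read and write for a given key goes to one node, which is why lookups by key scale out well while queries that span many keys do not.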
Management
NoSQL databases are generally designed to require less management overall. Repairs are often automatic, and data distribution and simpler data models contribute to less administration required overall. However, you’ve also got less support when there’s a problem. SQL platforms often have vendors waiting to supply support to enterprises.
Schema
Regular SQL platforms often have strictly enforced rules for a schema change, to stave off user-created typos that can put faults in your query. NoSQL platforms will have their own mechanisms for combating this.
Hope that helps.
NoSQL scores over SQL in the areas below:
It supports semi-structured and volatile data. You can change the structure at any time
It does not have a schema
Read/write throughput is very high
Horizontal scalability is easily achieved: add cheaper hardware and set the right replication factor
It supports big data in volumes of terabytes and petabytes on cheaper hardware
Good support for Analytic tools on top of Bigdata, especially Hadoop/Hbase family
In memory caching option is available to increase the performance of queries
Faster development life cycles for developers
When you should not use NoSQL and go for SQL
If you require business-critical transactions with ACID properties, i.e. where consistency is key and eventual consistency is not an option
If you have heavy aggregation queries spanning multiple entities
In summary, you have to use the right technology for the right business use case, i.e. a combination of SQL and NoSQL.
Regarding your queries:
Use SQL for business critical transactions. If your SQL is scaling for your business requirements, use SQL.
Use NoSQL for huge volumes of data, on the order of terabytes or petabytes, with a variety of data, where SQL can't handle that volume and variety.
As others pointed out both SQL and NoSQL (Not only SQL) have their advantages.
There is often a temptation to use both side by side and get the maximum out of them, something referred to as polyglot persistence.
Is it a good idea? Sometimes, yes.
Should I do it?
While it may have benefits, the trade-off is the maintenance of multiple stores (note: each has its own way of database management).
Data synchronization is an even bigger issue if you are planning this for the same transactional system.
If the data you are going to store (in SQL and NoSQL databases) can be logically separated, then you might be OK. But if they are closely related, you are going to have a tough time keeping them consistent.
Overall, when I evaluated this option, I concluded that it works only when you can logically partition the data. Another use case may be using NoSQL for analytics while continuing with SQL for the transactional system.
Going back to your use case, did you try JSON storage within your SQL database? It may give you the performance benefit without many trade-offs.
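A minimal sketch of the "JSON inside SQL" idea, using SQLite from Python's standard library as a stand-in. The profiles table and its fields are hypothetical, and here the JSON is stored as plain text and decoded in the application; databases such as PostgreSQL and MySQL have native JSON column types that can also query inside the document.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, doc TEXT NOT NULL)"
)

def save_profile(conn, user_id, profile):
    """Store a schema-less document next to ordinary relational data."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO profiles (user_id, doc) VALUES (?, ?)",
            (user_id, json.dumps(profile)),
        )

def load_profile(conn, user_id):
    """Fetch and decode one document by primary key, or None if absent."""
    row = conn.execute(
        "SELECT doc FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

save_profile(conn, 1, {"name": "Ana", "tags": ["admin"], "settings": {"theme": "dark"}})
save_profile(conn, 2, {"name": "Budi"})  # different shape, same table
print(load_profile(conn, 1)["settings"]["theme"])
```

This keeps one database to operate and back up, while the frequently read, loosely structured part of the data comes back in a single primary-key lookup.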
I'm intrigued in the database service Datomic, but I'm not sure if it fits the needs of the projects I work on. When is Datomic a good choice, and when should it be avoided?
With the proviso that I haven't used Datomic in production, thought I'd give you an answer.
Advantages
Datalog queries are powerful (more so than non-recursive SQL) and very expressive.
Queries can be written with Clojure data structures, and it's NOT a weak DSL like many SQL libraries that allow you to query with data structures.
It's immutable, so you get the advantages that immutability gives you in Clojure/other languages as well
a. This also allows you to store, while saving structures, all past facts in your database—this is VERY useful for auditing & more
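To illustrate the "keep all past facts" point, here is a toy append-only fact log in plain Python. This is NOT Datomic's API; the class, method names, and data are invented for the sketch. It only shows why never overwriting a value makes "what did we know at transaction N?" a simple query.

```python
from collections import namedtuple

# One immutable fact: entity, attribute, value, and the transaction that added it.
Fact = namedtuple("Fact", "entity attribute value tx")

class FactLog:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self):
        self._facts = []
        self._tx = 0

    def assert_fact(self, entity, attribute, value):
        """Record a new fact and return its transaction number."""
        self._tx += 1
        self._facts.append(Fact(entity, attribute, value, self._tx))
        return self._tx

    def value_as_of(self, entity, attribute, tx=None):
        """Latest value of an attribute as of transaction `tx` (default: now)."""
        limit = self._tx if tx is None else tx
        result = None
        for fact in self._facts:  # facts are stored in transaction order
            if fact.tx > limit:
                break
            if fact.entity == entity and fact.attribute == attribute:
                result = fact.value
        return result

log = FactLog()
t1 = log.assert_fact("user-1", "email", "old@example.com")
t2 = log.assert_fact("user-1", "email", "new@example.com")
print(log.value_as_of("user-1", "email"))         # current view
print(log.value_as_of("user-1", "email", tx=t1))  # view as of the first transaction
```

In a mutable-cell database the second write would have destroyed the first value unless you had built an audit table yourself.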
Disadvantages
It can be slow, as Datalog is just going to be slower than equivalent SQL (assuming an equivalent SQL statement can be written).
If you are writing a LOT, you could maybe need to worry about the single transactor getting overwhelmed. This seems unlikely for most cases, but it's something to think about (you could do a sort of shard, though, and probably save yourself; but this isn't a DB for e.g. storing stock tick data).
It's a bit tricky to get up and running with, and it's expensive, and the licensing and price makes it difficult to use a hosted instance with it: you'll need to be dealing with sysadminning this yourself instead of using something like Postgres on Heroku or Mongo at MongoHQ
I'm sure I'm missing some on each side, and though I have 3 listed under disadvantages, I think that the advantages outweigh them in more circumstances where disadvantages don't preclude its use. Price is probably the one that will prevent its being used in most small projects (that you expect to outlast the 1 year free trial).
Cf. this short post describing Datomic simply for some more information.
Expressivity (cf. Datalog) and immutability are awesome. It's SO much fun to work with Datomic in that regard, and you can tell it's powerful just by using it a bit.
One important thing when considering whether Datomic is the right fit for your application is to think about the shape of the data you are going to store and query. Datomic facts are actually very similar to RDF triples (plus a first-class notion of time), so it lends itself very well to modeling complex relationships (linked graph data), something which is often cumbersome with traditional SQL databases.
I found this aspect to be one of the most appealing and important for me, and it worked really well. It is of course not exclusive to Datomic: there are many other high-quality graph databases, and Neo4J deserves a mention when talking about JVM-based solutions.
Regarding the Datomic schema, I think it's just the right balance between flexibility and stability.
To complete the above answers, I'd like to emphasize that immutability and the ability to remember the past are not 'wizardry features' suited to a few special cases like auditing. It is an approach which has several deep benefits compared to 'mutable cells' databases (which are 99% of databases today). Stuart Halloway demonstrates this nicely in this video: the Impedance Mismatch is our fault.
In my personal opinion, this approach is fundamentally more sane conceptually. Having used it for several months, I don't see Datomic as having crazy magical sophisticated powers, but rather as a more natural paradigm without some of the big problems the others have.
Here are some features of Datomic I find valuable, most of which are enabled by immutability:
because reading is not remote, you don't have to design your queries like an expedition over the wire. In particular, you can separate concerns into several queries (e.g find the entities which are the input to my query - answer some business question about these entities - fetch associated data for presenting the result)
the schema is very flexible, without sacrificing query power
it's comfortable to have your queries integrated in your application programming language
the Entity API brings you the good parts of ORMs
the query language is programmable and has primitives for abstraction and reuse (rules, predicates, database functions)
performance: writers impede only other writers, and no one impedes readers. Plus, lots of caching.
... and yes, a few superpowers like travelling to the past, speculative writes or branching reality.
Regarding when not to use Datomic, here are the current constraints and limitations I see:
you have to be on the JVM (there is also a REST API, but you lose most of the benefits IMO)
not suited for write scale, nor huge data volumes
won't be especially integrated into frameworks, e.g you won't currently find a library which generates CRUD REST endpoints from a Datomic schema
it's a commercial database
since reading happens in the application process (the 'Peer'), you have to make sure that the Peer has enough memory to hold all the data it needs to traverse in a query.
So my very vague and informal answer would be that Datomic is a good fit for most non-trivial applications whose write load is reasonable, provided you don't have a problem with the license and with being on the JVM.
As an analogy, you can ask yourself the same question for Git as compared to other version control systems which are not based on immutability.
Just to tentatively add over the other answers:
It is probably fair to say Datomic presents the best conceptual framework for a queryable data store among the current options out there, while being only partially scalable and not exceptionally performant.
I say only partially scalable because queries need to fit in the peer's RAM or fail. And not exceptionally performant because top-notch SQL engines can optimize queries to fit in memory through sophisticated execution plans, something I've not yet seen mentioned as a feature in Datomic; Datomic's decoupling of transacting and querying might, overall, offset this.
Unlike many NoSQL engines though, transactions are a first-class citizen, which puts it at par with RDBMS systems in that key regard.
For applications where data is read more than being written, transactions are needed, queries always fit in memory or memory is very cheap, and the overall size of accumulated data isn't too large, it might be a win where a commercial-only product can be afforded ― for those who are willing to embrace its novel conceptual framework implied in the API.
I'm a newcomer to memcached, but fairly familiar with database internals and systems programming, so this seemed odd to me. It's obvious that a memory-based solution is faster than a disk-based solution, but since any database backing the cache will know more about the structure of data, shouldn't it have a better idea of how to cache it effectively?
I see three possibilities:
"Machines deployed with memcached have more RAM than database servers typically do." Would adding the same amount of memory make the solutions perform similarly?
"Ensuring ACID transactional properties in the database make this speedup difficult to match." Is it possible to get similar-scale speedups by relaxing the transactional guarantees of your database to match those of the cache?
"Distributing the database queries across multiple cache machines equally is what allows the speedup." Would sharding the database do the same thing?
If it's not a combination of these, what more does adding a caching layer bring to the table which databases cannot, and why don't/can't database vendors implement a better caching layer themselves?
Successful use of memcached isn't about "database-level caching." That's almost never a good idea.
Instead, you think about all the things you're going to get from the DB to build a "thing" and you cache that thing so you don't have to do it next time.
Also, you can cache lots of things that aren't database. Anything that's expensive in your app to build or retrieve. Cache that.
If a DB query is sufficiently fast, don't cache that.
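The pattern described above (cache the built "thing", not the individual queries) can be sketched like this. A plain dict stands in for memcached, and `build_profile_page` is a hypothetical stand-in for "several DB queries plus rendering"; neither is a real memcached client call.

```python
class CacheAside:
    """Check the cache first; build and store the value only on a miss."""

    def __init__(self, build, cache=None):
        self._build = build  # expensive function: key -> value
        self._cache = {} if cache is None else cache  # dict stands in for memcached
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        value = self._build(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached value, e.g. after the underlying data changes."""
        self._cache.pop(key, None)

def build_profile_page(user_id):
    # Stand-in for several database queries plus template rendering.
    return {"user_id": user_id, "html": f"<h1>User {user_id}</h1>"}

pages = CacheAside(build_profile_page)
first = pages.get(7)   # miss: builds the page
second = pages.get(7)  # hit: served from the cache
print(pages.misses)
```

The database never sees the second request at all, which is something no amount of database-internal caching can achieve, since the query parsing, network round trips, and page assembly are skipped too.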
There has been a lot of talk related to Cassandra lately.
Twitter, Digg, Facebook, etc all use it.
When does it make sense to:
use Cassandra,
not use Cassandra, and
use an RDBMS instead of Cassandra.
There is nothing like a silver bullet, everything is built to solve specific problems and has its own pros and cons. It is up to you, what problem statement you have and what is the best fitting solution for that problem.
I will try to answer your questions one by one in the same order you asked them. Since Cassandra belongs to the NoSQL family of databases, it's important that you understand why you would use a NoSQL database before I answer your questions.
Why use NoSQL
In the case of RDBMS, making a choice is quite easy because all the databases in this category, like MySQL, Oracle, MS SQL, and PostgreSQL, offer almost the same kind of solutions oriented toward ACID properties. When it comes to NoSQL, the decision becomes difficult because every NoSQL database offers different solutions and you have to understand which one is best suited for your app/system requirements. For example, MongoDB is fit for use cases where your system demands a schema-less document store. HBase might be fit for search engines, analyzing log data, or any place where scanning huge, two-dimensional join-less tables is a requirement. Redis is built to provide in-memory access to a variety of data structures and can be a good fit for real-time leaderboards or pub-sub kinds of systems. Similarly, there are other databases in this category (including Cassandra) which are fit for different problem statements. Now let's move to the original questions and answer them one by one.
When to use Cassandra
Being a part of the NoSQL family, Cassandra offers a solution for problems where one of your requirements is a very write-heavy system and you want a quite responsive reporting system on top of that stored data. Consider the use case of web analytics where log data is stored for each request and you want to build an analytical platform around it to count hits per hour, by browser, by IP, etc. in real time. You can refer to this blog post to understand more about the use cases where Cassandra fits in.
When to Use an RDBMS instead of Cassandra
Cassandra is a NoSQL database and does not provide ACID and relational data properties. If you have a strong requirement for ACID properties (for example, financial data), Cassandra would not be a fit in that case. Obviously, you can build a workaround, but you will end up writing lots of application code to simulate ACID properties and will lose badly on time to market. Also, managing that kind of system with Cassandra would be complex and tedious for you.
When not to use Cassandra
I don't think it needs to be answered if the above explanation makes sense.
When evaluating distributed data systems, you have to consider the CAP theorem - you can pick two of the following: consistency, availability, and partition tolerance.
Cassandra is an available, partition-tolerant system that supports eventual consistency. For more information see this blog post I wrote: Visual Guide to NoSQL Systems.
Cassandra is the answer to a particular problem: What do you do when you have so much data that it does not fit on one server? How do you store all your data on many servers without breaking your bank account or driving your developers insane? Facebook gets 4 terabytes of new compressed data EVERY DAY. And this number will most likely more than double within a year.
If you do not have this much data or if you have millions to pay for Enterprise Oracle/DB2 cluster installation and specialists required to set it up and maintain it, then you are fine with SQL database.
However Facebook no longer uses cassandra and now uses MySQL almost exclusively moving the partitioning up in the application stack for faster performance and better control.
The general idea of NoSQL is that you should use whichever data store is the best fit for your application. If you have a table of financial data, use SQL. If you have objects that would require complex/slow queries to map to a relational schema, use an object or key/value store.
Of course just about any real world problem you run into is somewhere in between those two extremes and neither solution will be perfect. You need to consider the capabilities of each store and the consequences of using one over the other, which will be very much specific to the problem you are trying to solve.
Besides the answers given above about when to use and when not to use Cassandra, if you do decide to use Cassandra you may want to consider not using Cassandra itself, but one of its many cousins out there.
Some answers above already pointed to various "NoSQL" systems which share many properties with Cassandra, with some small or large differences, and may be better than Cassandra itself for your specific needs.
Additionally, recently (several years after this question was originally asked), a Cassandra clone called Scylla (see https://en.wikipedia.org/wiki/Scylla_(database)) was released. Scylla is an open-source re-implementation of Cassandra in C++, which claims to have significantly higher throughput and lower latencies than the original Java Cassandra, while being mostly compatible with it (in features, APIs, and file formats). So if you're already considering Cassandra, you may want to consider Scylla as well.
I will focus here on some of the important aspects which can help you decide whether you really need Cassandra. The list is not exhaustive, just some of the points at the top of my mind:
Don't consider Cassandra as the first choice when you have strict requirements on relationships (across your dataset).
Cassandra by default is an AP system (in CAP terms). But it supports tunable consistency, which means it can be configured to behave as a CP system as well. So don't ignore it just because you read somewhere that it's AP and you are looking for CP systems. Cassandra is more accurately termed “tuneably consistent,” which means it allows you to easily decide the level of consistency you require, in balance with the level of availability.
Don't use Cassandra if your scale is modest or if you can get by with a non-distributed DB.
Think harder if your team believes that all your problems will be solved by using a distributed DB like Cassandra. Getting started with these DBs is very simple because they come with many defaults, but optimizing and mastering one to solve a specific problem requires a good (if not a large) amount of engineering effort.
Cassandra is column-oriented but at the same time each row also has a unique key. So, it might be helpful to think of it as an indexed, row-oriented store. You can even use it as a document store.
Cassandra doesn't force you to define the fields beforehand. So, if you are in a startup mode or your features are evolving (as in agile) - Cassandra embraces it. So better, first think about queries and then think about data to answer them.
Cassandra is optimized for really high throughput on writes. If your use case is read-heavy (like cache) then Cassandra might not be an ideal choice.
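The "tunable consistency" point in the list above comes down to simple arithmetic: with replication factor N, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. Here is a small sketch of that rule for the ONE, QUORUM, and ALL consistency levels. It is a simplified model that ignores failures, repair, and multi-datacenter levels; the function names are made up.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")

def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read set and write set must share at least one replica."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor

for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, read_sees_latest_write(write_level, read_level, 3))
```

Lower levels answer faster and tolerate more unavailable nodes; higher levels trade that availability for stronger consistency, which is the balance the list describes.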
Right. It makes sense to use Cassandra when you have a huge amount of data, a huge number of queries but very little variety of queries. Cassandra basically works by partitioning and replicating. If all your queries will be based on the same partition key, Cassandra is your best bet. If you get a query on an attribute that is not the partition key, Cassandra allows you to replicate the whole data with a new partition key. So now you have 2 replicas of the same data with 2 different partition keys.
Which brings me to your next question. When not to use Cassandra. As I mentioned, Cassandra scales by replicating the complete database for every new partitioning key. But you can't keep making new copies again and again. So when you have a high variety in queries i.e. each query has a different column in the where clause, Cassandra is not a good option.
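The "one copy of the data per partition key" idea can be sketched with plain dictionaries standing in for tables. The order fields and class name are hypothetical; the point is that every write goes to each copy, and each query reads exactly one copy by its key.

```python
from collections import defaultdict

class DenormalizedOrders:
    """One full copy of the data per query pattern, each keyed differently."""

    def __init__(self):
        self.by_customer = defaultdict(list)  # partition key: customer
        self.by_product = defaultdict(list)   # partition key: product

    def insert(self, order):
        # Every write is duplicated into each copy.
        self.by_customer[order["customer"]].append(order)
        self.by_product[order["product"]].append(order)

    def orders_for_customer(self, customer):
        return list(self.by_customer.get(customer, []))

    def orders_for_product(self, product):
        return list(self.by_product.get(product, []))

store = DenormalizedOrders()
store.insert({"id": 1, "customer": "ana", "product": "book"})
store.insert({"id": 2, "customer": "ana", "product": "pen"})
store.insert({"id": 3, "customer": "budi", "product": "book"})
print([order["id"] for order in store.orders_for_product("book")])
```

A third query pattern (say, by date) would need a third copy and a third write per insert, which is why a wide variety of queries becomes expensive in this model.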
Now for the third question. The whole point of using RDBMS is when you want the ACID properties. If you are building something like a payment service and want each transaction to be isolated, each transaction to either complete or not happen at all, changes to be persistent despite system failure, and the money to be consistent across bank accounts before and after the transaction completes, an RDBMS is the only option that will help you achieve this.
This article actually explains the whole thing, especially when to use Cassandra or not (as opposed to some other NoSQL option) part of the question -> Choosing the best Database. Do check it out.
EDIT: To answer the question in the comments by proximab, when we think of banking systems we immediately think "ACID is the best solution". But even banking systems are made up of several subsystems that might not even be dealing with any transaction-related data, such as account holders' personal information, account statements, credit card details, credit histories, etc.
All of this information needs to be stored in some database or another. Now if you store account-related information like the account balance, that is something that needs to be consistent at all times. For example, if you try to send money from account A to account B, then the money that disappears from account A should instantaneously show up in account B, and it cannot be present in both accounts at the same time. This system cannot be inconsistent at any point. This is where ACID is of utmost importance.
On the other hand, if you are saving credit card details or credit histories, which should not get into the wrong hands, then you need something that allows access only to authorised users. That, I believe, is supported by Cassandra. That said, data like credit history and credit card transactions is, I think, ever-increasing data. Also, there is only so much you can query on this data, i.e. it has a very finite number of queries. These two conditions make Cassandra a perfect solution.
Talking with someone in the midst of deploying Cassandra, it doesn't handle the many-to-many well. They are doing a hack job to do their initial testing. I spoke with a Cassandra consultant about this and he said he wouldn't recommend it if you had this problem set.
You should ask yourself the following questions:
(Volume, Velocity) Will you be writing and reading TONS of information, so much information that no one computer could handle the writes?
(Global) Will you need this writing and reading capability around the world so that the writes in one part of the world are accessible in another part of the world?
(Reliability) Do you need this database to be up and running all the time and never go down regardless of which cloud, which country, or whether it's a VM, container, or bare metal?
(Scalability) Do you need this database to be able to continue to grow easily and scale linearly?
(Consistency) Do you need TUNABLE consistency where some writes can happen asynchronously whereas others need to be certified?
(Skill) Are you willing to do what it takes to learn this technology and the data modeling that goes with creating a globally distributed database that can be fast for everyone, everywhere?
If for any of these questions you thought "maybe" or "no," you should use something else. If you had "hell yes" as an answer to all of them, then you should use Cassandra.
Use RDBMS when you can do everything on one box. It's probably easier than most and anyone can work with it.
Heavy single query vs. gazillion light query load is another point to consider, in addition to other answers here. It's inherently harder to automatically optimize a single query in a NoSql-style DB. I've used MongoDB and ran into performance issues when trying to calculate a complex query. I haven't used Cassandra but I expect it to have the same issue.
On the other hand, if your load is expected to be that of very many small queries, and you want to be able to easily scale out, you could take advantage of eventual consistency that is offered by most NoSql DBs. Note that eventual consistency is not really a feature of a non-relational data model, but it is much easier to implement and to set up in a NoSql-based system.
For a single, very heavy query, any modern RDBMS engine can do a decent job parallelizing parts of the query and take advantage of as much CPU and memory you throw at it (on a single machine). NoSql databases don't have enough information about the structure of the data to be able to make assumptions that will allow truly intelligent parallelization of a big query. They do allow you to easily scale out more servers (or cores) but once the query hits a complexity level you are basically forced to split it apart manually to parts that the NoSql engine knows how to deal with intelligently.
In my experience with MongoDB, in the end because of the complexity of the query there wasn't much Mongo could do to optimize it and run parts of it on multiple data. Mongo parallelizes multiple queries but isn't so good at optimizing a single one.
Let's read some real world cases:
http://planetcassandra.org/apache-cassandra-use-cases/
In this article: http://planetcassandra.org/blog/post/agentis-energy-stores-over-15-billion-records-of-time-series-usage-data-in-apache-cassandra
They explained that they didn't choose MySQL because database synchronization was too slow.
(Also due to two-phase commit, FK, PK)
Cassandra is based on Amazon Dynamo paper
Features:
Stability
High availability
Backup performs well
Read and write performance is better than HBase (a BigTable clone in Java).
wiki http://en.wikipedia.org/wiki/Apache_Cassandra
Their Conclusion is:
We looked at HBase, Dynamo, Mongo and Cassandra.
Cassandra was simply the best storage solution for the majority of our data.
As of 2018,
I would recommend using ScyllaDB to replace classic Cassandra, if you need back support.
The Postgres key-value plugin is also quicker than Cassandra. However, it won't have multi-instance scalability.
Another situation that makes the choice easier is when you want to use aggregate functions like sum, min, max, etc. and complex queries (like in the financial system mentioned above). Then a relational database is probably more convenient than a NoSQL database, since neither is possible on a NoSQL database unless you use a lot of inverted indexes. When you do use NoSQL, you have to do the aggregation in code or store the aggregates separately in their own column family, but this makes it all quite complex and reduces the performance that you gained by using NoSQL.
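To show the difference in effort, here is a small sketch comparing an aggregate computed by the database with the same aggregate computed in application code. SQLite stands in for the relational side, a plain list of rows stands in for records fetched from a store without aggregation, and the payments data is made up.

```python
import sqlite3
from collections import defaultdict

rows = [("alice", 30), ("bob", 20), ("alice", 50), ("bob", 5)]

# Relational side: the database computes the aggregate in one declarative query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (account TEXT, amount INTEGER)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute("SELECT account, SUM(amount) FROM payments GROUP BY account")
)

# Store without aggregation: fetch every record and sum in the application.
def totals_in_code(records):
    totals = defaultdict(int)
    for account, amount in records:
        totals[account] += amount
    return dict(totals)

code_totals = totals_in_code(rows)
print(sql_totals == code_totals)
```

Both give the same totals, but the second approach moves every record over the network to the application, or requires maintaining precomputed totals on every write.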
Cassandra is a good choice if:
You don't require the ACID properties from your DB.
There would be massive and huge number of writes on the DB.
There is a requirement to integrate with Big Data, Hadoop, Hive and Spark.
There is a need of real time data analytics and report generations.
There is a requirement of impressive fault tolerant mechanism.
There is a requirement of homogenous system.
There is a requirement of lots of customisation for tuning.
If you need a fully consistent database with SQL semantics, Cassandra is NOT the solution for you. Cassandra supports key-value lookups. It does not support SQL queries. Data in Cassandra is "eventually consistent". Concurrent lookups of data may be inconsistent, but eventually lookups are consistent.
If you need strict semantics and need support for SQL queries, choose another solution such as MySQL or PostgreSQL, or combine use of Cassandra with Solr.
Apache Cassandra is a distributed database for managing large amounts of structured data across many commodity servers, while providing highly available service and no single point of failure.
The architecture is based on the CAP theorem: it favors availability and partition tolerance, with eventual consistency.
Don't use it if you're not storing volumes of data across racks of clusters,
Don't use it if you are not storing time series data,
Don't use it if you are not partitioning your servers,
Don't use it if you require strong consistency.
MongoDB has very powerful aggregate functions and an expressive aggregation framework. It has many of the features developers are accustomed to using from the relational database world. Its document data/storage structure allows for more complex data models than Cassandra, for example.
All this comes with trade-offs of course. So when you select your database (NoSQL, NewSQL, or RDBMS) look at what problem you are trying to solve and at your scalability needs. No one database does it all.
According to DataStax, Cassandra is not the best use case when there is a need for
1- High end hardware devices.
2- ACID compliant with no roll back (bank transaction)
It does not support complete transaction management across tables.
Secondary Index not supported.
Have to rely on Elastic search /Solr for Secondary index and the custom sync component has to be written.
Not ACID compliant system.
Query support is limited.
While computer programming evangelists are predicting the future of cloud computing to be very bright, is there a chance that relational databases are on their way out?
What are the DBs that are more suitable for Cloud Computing?
Here's a good article that may answer some of your questions. It features a good comparison between RDBMS systems and the ones usually used for cloud storage infrastructure:
http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php
The relational database model has a firm mathematical basis in relational algebra. This makes it easy to reason about, to extend, and to use properly (in theory). Even if database access patterns change significantly as a result of these new APIs and uses, it's likely that a relational database will form the underlying implementation for this reason.
No, RDBMSs will always have a place because of their functionality. Not just on their own, but also as backbones to other systems (like OODBMSs).
Relational databases are still relevant, both for localized storage (such as application-specific storage) and for server storage.
The cloud computing platforms that I've seen each have a relational database offering. So, I don't see cloud computing really changing the picture in reference to database types being used.
However, something will eventually replace the databases that we're all used to. The question is whether that will be a higher-level version of RDBs or something different. Another aspect of that question is how long will it take for the current crop of RDBs to fade out? (I don't have an answer for either.)
Clouds still go poof these days, so I don't think so, at least not anytime soon.
I don't think that cloud computing will kill RDBMSs. Something else might though.
First, what type of storage engine a given application uses does not (or should not) depend on where it is running (the cloud or a specific server), but rather on how it needs to store the data.
Second, as far as I can tell the only reason people think RDBMSs are on their way out is because they don't scale as well as non-relational DBMSs (such as document-oriented DBMSs like CouchDB) which can more easily be distributed into the cloud. However, there is no reason that RDBMSs cannot become more cloud-friendly in the future. As an early example, look at Drizzle:
The Drizzle project is building a database optimized for Cloud and Net applications. It is being designed for massive concurrency on modern multi-cpu/core architecture.
So no, I don't think that cloud computing will kill RDBMSs. They will just be forced to adapt. What might kill them, however, is if an existing alternative, or a new one, becomes as robust and easy to use as RDBMSs. What I mean is a solution that has both completely solid software (betas not allowed) and is easy for programmers to switch to. They give out degrees to people who understand RDBMSs. Because of all the assisting software (such as ORMs like ActiveRecord, SQLAlchemy, and whatever the .NET folk use I'm assuming), using RDBMSs has become easy even for people who don't know what the first normal form is. So I think that until there is a way for people to use (for instance) a DODBMS just as easily, RDBMSs will continue to dominate. I'm also not saying that is necessarily bad. Again, which DBMS you use should depend on your data, not what people say is cool and better.
A quote from the article:
"The inherent constraints of a relational database ensure that data at the lowest level have integrity. Data that violate integrity constraints cannot physically be entered into the database. These constraints don't exist in a key/value database, so the responsibility for ensuring data integrity falls entirely to the application. But application code often carries bugs. Bugs in a properly designed relational database usually don't lead to data integrity issues; bugs in a key/value database, however, quite easily lead to data integrity issues."
What this means to me is that RDBMS's are doomed, and hotshot new technologies are facing a great and brilliant future, to the same extent that users aren't anywhere near interested in the correctness of their data.
IMHO.
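To make the quoted point concrete, here is a small sketch using Python's built-in `sqlite3` module. The `customers`/`orders` schema is hypothetical, and a plain dict stands in for a key/value store; a real key/value database may offer some validation of its own.

```python
import sqlite3

# Hypothetical schema, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL CHECK (amount > 0))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.commit()


def try_insert(customer_id, amount):
    """Return True if the database accepts the order, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


relational_results = [
    try_insert(1, 120.0),   # valid order
    try_insert(99, 120.0),  # unknown customer: foreign key violation
    try_insert(1, -5.0),    # negative amount: CHECK violation
]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# A dict standing in for a key/value store: the same bad record is stored as-is,
# so any validation has to live in application code.
kv_store = {}
kv_store["order:1"] = {"customer_id": 99, "amount": -5.0}

print(relational_results, order_count, kv_store)
```

The buggy writes never reach the relational table, while the key/value stand-in keeps whatever the application hands it.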
There's nothing wrong with relational databases for applications that need to query more structured data (e.g., "How many people bought product XYZ, on this date, paid more than $100, but less than $150?"). There are potentially significant architectural issues that will need to be addressed as these systems scale and grow. Once your DB outgrows the one machine you started on and/or traffic/requests begin to overload available resources, then (if you still want to keep your relational database) you have to start adding layers.
Thankfully today, there are many more options available than in previous years, including caching, map and reduce, and other functionality, but these add-on layers do add complexity and maintenance overhead. In one sense I'd consider these engineered "band-aids" which will most likely solve the scalability and distribution problems with a relational DB today, but longer term? Who knows. I also see these popular layers today, all of which are basically trying to emulate functionality already available in object DBs, giving developers a "virtual object DB" layer that they can use with their object languages to do things faster and more efficiently, and get past the growth and performance obstacles.
So I guess my overall opinion is that relational DBs became the de facto DB probably mostly due to how (relatively) easy it was to query a database and get results back to the one client/app using it. As volumes have grown, though, and application complexity is far greater today, I think more developers will decide to bite the bullet, learn the syntax for object DBs (which is actually about as standardized today as for relational DBs), and just skip all the middleware and layers that only emulate functionality that one could get natively in an OODBMS. I've seen OODBs that simply get installed on any number of servers, automatically distribute data as needed, and give the developer a single view of any size federation of databases...
It seems to me that the best solution, as systems become more distributed, is to get a DB that has a native distributed architecture. Anyway, just a thought.
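For reference, the structured query quoted in the answer above ("How many people bought product XYZ, on this date, paid more than $100, but less than $150?") might look like this in SQL. This is only a sketch using Python's built-in `sqlite3`; the `purchases` table, its columns, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per purchase.
conn.execute(
    "CREATE TABLE purchases ("
    " customer TEXT, product TEXT, purchase_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("alice", "XYZ", "2009-02-12", 120.0),  # matches
        ("bob",   "XYZ", "2009-02-12", 99.0),   # too cheap
        ("carol", "XYZ", "2009-02-12", 149.5),  # matches
        ("dave",  "XYZ", "2009-02-13", 130.0),  # wrong date
        ("erin",  "ABC", "2009-02-12", 125.0),  # wrong product
        ("alice", "XYZ", "2009-02-12", 110.0),  # same person again, counted once
    ],
)

query = """
    SELECT COUNT(DISTINCT customer)
    FROM purchases
    WHERE product = ?
      AND purchase_date = ?
      AND amount > ?
      AND amount < ?
"""
(buyer_count,) = conn.execute(query, ("XYZ", "2009-02-12", 100, 150)).fetchone()
print(buyer_count)  # prints 2
```

Ad hoc questions like this are where a relational engine is convenient: the filter is declared once and the database works out how to evaluate it.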