Google's Bigtable vs. A Relational Database [duplicate] - database

This question already has answers here:
Closed 13 years ago.
Duplicates
Why should I use document based database instead of relational database?
Pros/Cons of document based database vs relational database
I don't know much about Google's Bigtable but am wondering what the difference between Google's Bigtable and relational databases like MySQL is. What are the limitations of both?

Bigtable is Google's invention to deal with the massive amounts of information that the company regularly deals in. A Bigtable dataset can grow to immense size (many petabytes) with storage distributed across a large number of servers. The systems using Bigtable include projects like Google's web index and Google Earth.
According to Google whitepaper on the subject:
A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
The internal mechanics of Bigtable versus, say, MySQL are so dissimilar as to make comparison difficult, and the intended goals don't overlap much either. But you can think of Bigtable a bit like a single-table database. Imagine, for example, the difficulties you would run into if you tried to implement Google's entire web search system with a MySQL database -- Bigtable was built around solving those problems.
Bigtable datasets can be queried from services like AppEngine using a language called GQL ("gee-kwal") which is a based on a subset of SQL. Conspicuously missing from GQL is any sort of JOIN command. Because of the distributed nature of a Bigtable database, performing a join between two tables would be terribly inefficient. Instead, the programmer has to implement such logic in his application, or design his application so as to not need it.

Google's BigTable and other similar projects (ex: CouchDB, HBase) are database systems that are oriented so that data is mostly denormalized (ie, duplicated and grouped).
The main advantages are:
- Join operations are less costly because of the denormalization
- Replication/distribution of data is less costly because of data independence (ie, if you want to distribute data across two nodes, you probably won't have the problem of having an entity in one node and other related entity in another node because similar data is grouped)
This kind of systems are indicated for applications that need to achieve optimal scale (ie, you add more nodes to the system and performance increases proportionally). In an RDBMS like MySQL or Oracle, when you start adding more nodes if you join two tables that are not in the same node, the join cost is higher. This becomes important when you are dealing with high volumes.
RDBMS' are nice because of the richness of the storage model (tables, joins, fks). Distributed databases are nice because of the ease of scale.

Related

Graph database vs. RDB with link/bridge tables

I work in the fraud/AML (anti-money laundering) field, and we are exploring using a graph database to unearth hidden connections and links. I've read a fair amount abut graph databases lately (mostly neo4j, but I think the concepts are similar across different products?), and from what I can tell, they seem to be well-suited to this domain. The issue is that I'm having a hard time getting buy-in from tech management, as they seem to think that we can do the same things with our existing data reporting model, which is in Hadoop, and is essentially a data warehouse which has specific tables that provide many-to-many link tables between the core tables (I believe Kimball calls them 'bridge' tables?).
In a way, they seem to provide the same functionality as the relationship tables in a graph DB. Given that we have already constructed the link tablesin Hadoop, would a graph database provide any performance advantage for the kinds of things we may want to do (e.g. How is Customer A connected to Customer B), or have we largely negated any performance advantage of a graph DB by building all of the link tables?
On similar hardware platforms, a relational database will never be able to keep up with a well constructed graph database when performing "path-between" queries. Never.
Every graph database product has its own internal storage representation, but they are all fundamentally designed to store nodes and edges and support navigational queries across those nodes and edges. Without the addition of new graph-support features, relational database will struggle to provide graph-like capabilities.
The other advantage of using a native graph database is that the graph query languages are specifically designed to support path-between queries. In Objectivity/DB, a massively scalable and distributable object/graph database, we can use the DO query language to find all of the paths between two entities up to a specified number of degrees apart in milliseconds or seconds. A DO query might look like the following:
Match p = (:Account { accountId = "1234"})
-[*..100]->
(:Account { accountId = "5678"})
return p;
Here, we are saying: Find all paths (p) from Account 1234 to Account 5678, where they are between 1 and 100 degrees apart.
To create and execute this same query in a relational database would be much more complicated (without the addition of graph features to the database) and the execution of a query like this in a relational database would be much more resource intensive (memory, cpu, I/O).
If you have the opportunity to explore graph database for your project, make sure you understand your scalability and data distribution requirements. That information will be key to selecting the correct product.
Disclaimer: I am the Director of Field Operations for Objectivity.

Practical example for each type of database (real cases) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
There are several types of database for different purposes, however normally MySQL is used to everything, because is the most well know Database. Just to give an example in my company an application of big data has a MySQL database at an initial stage, what is unbelievable and will bring serious consequences to the company. Why MySQL? Just because no one know how (and when) should use another DBMS.
So, my question is not about vendors, but type of databases. Can you give me an practical example of specific situations (or apps) for each type of database where is highly recommended to use it?
Example:
• A social network should use the type X because of Y.
• MongoDB or couch DB can't support transactions, so Document DB is not good to an app for a bank or auctions site.
And so on...
Relational: MySQL, PostgreSQL, SQLite, Firebird, MariaDB, Oracle DB, SQL server, IBM DB2, IBM Informix, Teradata
Object: ZODB, DB4O, Eloquera, Versant , Objectivity DB, VelocityDB
Graph databases: AllegroGraph, Neo4j, OrientDB, InfiniteGraph, graphbase, sparkledb, flockdb, BrightstarDB
Key value-stores: Amazon DynamoDB, Redis, Riak, Voldemort, FoundationDB, leveldb, BangDB, KAI, hamsterdb, Tarantool, Maxtable, HyperDex, Genomu, Memcachedb
Column family: Big table, Hbase, hyper table, Cassandra, Apache Accumulo
RDF Stores: Apache Jena, Sesame
Multimodel Databases: arangodb, Datomic, Orient DB, FatDB, AlchemyDB
Document: Mongo DB, Couch DB, Rethink DB, Raven DB, terrastore, Jas DB, Raptor DB, djon DB, EJDB, denso DB, Couchbase
XML Databases: BaseX, Sedna, eXist
Hierarchical: InterSystems Caché, GT.M thanks to #Laurent Parenteau
I found two impressive articles about this subject. All credits to highscalability.com. The information in this answer is transcribed from these articles:
35+ Use Cases For Choosing Your Next NoSQL Database
What The Heck Are You Actually Using NoSQL For?
If Your Application Needs...
• complex transactions because you can't afford to lose data or if you would like a simple transaction programming model then look at a Relational or Grid database.
• Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!
• to scale then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
• to always be able to write to a database because you need high availability then look at Bigtable Clones which feature eventual consistency.
• to handle lots of small continuous reads and writes, that may be volatile, then look at Document or Key-value or databases offering fast in-memory access. Also, consider SSD.
• to implement social network operations then you first may want a Graph database or second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.
• to operate over a wide variety of access patterns and data types then look at a Document database, they generally are flexible and perform well.
• powerful offline reporting with large datasets then look at Hadoop first and second, products that support MapReduce. Supporting MapReduce isn't the same as being good at it.
• to span multiple data-centers then look at Bigtable Clones and other products that offer a distributed option that can handle the long latencies and are partition tolerant.
• to build CRUD apps then look at a Document database, they make it easy to access complex data without joins.
• built-in search then look at Riak.
• to operate on data structures like lists, sets, queues, publish-subscribe then look at Redis. Useful for distributed locking, capped logs, and a lot more.
• programmer friendliness in the form of programmer-friendly data types like JSON, HTTP, REST, Javascript then first look at Document databases and then Key-value Databases.
• transactions combined with materialized views for real-time data feeds then look at VoltDB. Great for data-rollups and time windowing.
• enterprise-level support and SLAs then look for a product that makes a point of catering to that market. Membase is an example.
• to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable Clones because they generally work on distributed file systems that can handle a lot of writes.
• to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.
• to be sold to enterprise customers then consider a Relational Database because they are used to relational technology.
• to dynamically build relationships between objects that have dynamic properties then consider a Graph Database because often they will not require a schema and models can be built incrementally through programming.
• to support large media then look storage services like S3. NoSQL systems tend not to handle large BLOBS, though MongoDB has a file service.
• to bulk upload lots of data quickly and efficiently then look for a product that supports that scenario. Most will not because they don't support bulk operations.
• an easier upgrade path then use a fluid schema system like a Document Database or a Key-value Database because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.
• to implement integrity constraints then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
• a very deep join depth then use a Graph Database because they support blisteringly fast navigation between entities.
• to move behavior close to the data so the data doesn't have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.
• to cache or store BLOB data then look at a Key-value store. Caching can for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.
• a proven track record like not corrupting data and just generally working then pick an established product and when you hit scaling (or other issues) use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc).
• fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in their data types.
• other business units to run quick relational queries so you don't have to reimplement everything then use a database that supports SQL.
• to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.
• support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra's new secondary index support.
• create an ever-growing set of data (really BigData) that rarely gets accessed then look at Bigtable Clone which will spread the data over a distributed file system.
• to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.
• fault tolerance check how durable writes are in the face power failures, partitions, and other failure scenarios.
• to push the technological envelope in a direction nobody seems to be going then build it yourself because that's what it takes to be great sometimes.
• to work on a mobile platform then look at CouchDB/Mobile couchbase.
General Use Cases (NoSQL)
• Bigness. NoSQL is seen as a key part of a new data stack supporting: big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so massive that it must become massively distributed, NoSQL is there, though not all NoSQL systems are targeting big. Bigness can be across many different dimensions, not just using a lot of disk space.
• Massive write performance. This is probably the canonical usage based on Google's influence. High volume. Facebook needs to store 135 billion messages a month (in 2010). Twitter, for example, has the problem of storing 7 TB/data per day (in 2010) with the prospect of this requirement doubling multiple times per year. This is the data is too big to fit on one node problem. At 80 MB/s it takes a day to store 7TB so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes in-memory systems can be used.
• Fast key-value access. This is probably the second most cited virtue of NoSQL in the general mind set. When latency is important it's hard to beat hashing on a key and reading the value directly from memory or in as little as one disk seek. Not every NoSQL product is about fast access, some are more about reliability, for example. but what people have wanted for a long time was a better memcached and many NoSQL systems offer that.
• Flexible schema and flexible datatypes. NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have: column-oriented, graph, advanced data structures, document-oriented, and key-value. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly compatible datatypes like JSON.
• Schema migration. Schemalessness makes it easier to deal with schema migrations without so much worrying. Schemas are in a sense dynamic because they are imposed by the application at run-time, so different parts of an application can have a different view of the schema.
• Write availability. Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency and all that jazz.
• Easier maintainability, administration and operations. This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to adopt them. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs as special code doesn't have to be written to scale a system that was never intended to be used that way.
• No single point of failure. Not every product is delivering on this, but we are seeing a definite convergence on relatively easy to configure and manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.
• Generally available parallel computing. We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.
• Programmer ease of use. Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, Javascript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer-led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc., programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue. If it costs a lot to scale a product then won't you go with the cheaper product, that you control, that's easier to use, and that's easier to scale?
• Use the right data model for the right problem. Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and solution.
• Avoid hitting the wall. Many projects hit some type of wall in their project. They've exhausted all options to make their system scale or perform properly and are wondering what next? It's comforting to select a product and an approach that can jump over the wall by linearly scaling using incrementally added resources. At one time this wasn't possible. It took custom built everything, but that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.
• Distributed systems support. Not everyone is worried about scale or performance over and above that which can be achieved by non-NoSQL systems. What they need is a distributed system that can span datacenters while handling failure scenarios without a hiccup. NoSQL systems, because they have focussed on scale, tend to exploit partitions, tend not use heavy strict consistency protocols, and so are well positioned to operate in distributed scenarios.
• Tunable CAP tradeoffs. NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency which means they can't tolerate a partition failure. In the end, this is a business decision and should be decided on a case by case basis. Does your app even care about consistency? Are a few drops OK? Does your app need strong or weak consistency? Is availability more important or is consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
• More Specific Use Cases
• Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
• Syncing online and offline data. This is a niche CouchDB has targeted.
• Fast response times under all loads.
• Avoiding heavy joins for when the query load for complex joins become too large for an RDBMS.
• Soft real-time systems where low latency is critical. Games are one example.
• Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads 50% writes, 95% writes, or 95% reads. Read-only applications needing extreme speed and resiliency, simple queries, and can tolerate slightly stale data. Applications requiring moderate performance, read/write access, simple queries, completely authoritative data. A read-only application which complex query requirements.
• Load balance to accommodate data and usage concentrations and to help keep microprocessors busy.
• Real-time inserts, updates, and queries.
• Hierarchical data like threaded discussions and parts explosion.
• Dynamic table creation.
• Two-tier applications where low latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high latency Hadoop apps or other low priority apps.
• Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.
• Slicing off part of service that may need better performance/scalability onto its own system. For example, user logins may need to be high performance and this feature could use a dedicated service to meet those goals.
• Caching. A high performance caching tier for websites and other applications. Example is a cache for the Data Aggregation System used by the Large Hadron Collider.
Voting.
• Real-time page view counters.
• User registration, profile, and session data.
• Document, catalog management and content management systems. These are facilitated by the ability to store complex documents has a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.
• Archiving. Storing a large continual stream of data that is still accessible on-line. Document-oriented databases with a flexible schema that can handle schema changes over time.
• Analytics. Use MapReduce, Hive, or Pig to perform analytical queries and scale-out systems that support high write loads.
• Working with heterogeneous types of data, for example, different media types at a generic level.
• Embedded systems. They don’t want the overhead of SQL and servers, so they use something simpler for storage.
• A "market" game, where you own buildings in a town. You want the building list of someone to pop up quickly, so you partition on the owner column of the building table, so that the select is single-partitioned. But when someone buys the building of someone else you update the owner column along with price.
• JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3. (source)
• Federal law enforcement agencies tracking Americans in real-time using credit cards, loyalty cards and travel reservations.
• Fraud detection by comparing transactions to known patterns in real-time.
• Helping diagnose the typology of tumors by integrating the history of every patient.
• In-memory database for high update situations, like a website that displays everyone's "last active" time (for chat maybe). If users are performing some activity once every 30 sec, then you will be pretty much be at your limit with about 5000 simultaneous users.
• Handling lower-frequency multi-partition queries using materialized views while continuing to process high-frequency streaming data.
• Priority queues.
• Running calculations on cached data, using a program friendly interface, without having to go through an ORM.
• Uniq a large dataset using simple key-value columns.
• To keep querying fast, values can be rolled-up into different time slices.
• Computing the intersection of two massive sets, where a join would be too slow.
• A timeline ala Twitter.
Redis use cases, VoltDB use cases and more find here.
This question is almost impossible to answer because of the generality. I think you are looking for some sort of easy answer problem = solution. The problem is that each "problem" becomes more and more unique as it becomes a business.
What do you call a social network? Twitter? Facebook? LinkedIn? Stack Overflow? They all use different solutions for different parts, and many solutions can exist that use polyglot approach. Twitter has a graph like concept, but there are only 1 degree connections, followers and following. LinkedIn on the other hand thrives on showing how people are connected beyond first degree. These are two different processing and data needs, but both are "social networks".
If you have a "social network" but don't do any discovery mechanisms, then you can easily use any basic key-value store most likely. If you need high performance, horizontal scale, and will have secondary indexes or full-text search, you could use Couchbase.
If you are doing machine learning on top of the log data you are gathering, you can integrate Hadoop with Hive or Pig, or Spark/Shark. Or you can do a lambda architecture and use many different systems with Storm.
If you are doing discovery via graph like queries that go beyond 2nd degree vertexes and also filter on edge properties you likely will consider graph databases on top of your primary store. However graph databases aren't good choices for session store, or as general purpose stores, so you will need a polyglot solution to be efficient.
What is the data velocity? scale? how do you want to manage it. What are the expertise you have available in the company or startup. There are a number of reasons this is not a simple question and answer.
A short useful read specific to database selection: How to choose a NoSQL Database?. I will highlight keypoints in this answer.
Key-Value vs Document-oriented
Key-value stores
If you have clear data structure defined such that all the data would have exactly one key, go for a key-value store. It’s like you have a big Hashtable, and people mostly use it for Cache stores or clearly key based data. However, things start going a little nasty when you need query the same data on basis of multiple keys!
Some key value stores are: memcached, Redis, Aerospike.
Two important things about designing your data model around key-value store are:
You need to know all use cases in advance and you could not change the query-able fields in your data without a redesign.
Remember, if you are going to maintain multiple keys around same data in a key-value store, updates to multiple tables/buckets/collection/whatever are NOT atomic. You need to deal with this yourself.
Document-oriented
If you are just moving away from RDBMS and want to keep your data in as object way and as close to table-like structure as possible, document-structure is the way to go! Particularly useful when you are creating an app and don’t want to deal with RDBMS table design early-on (in prototyping stage) and your schema could change drastically over time. However note:
Secondary indexes may not perform as well.
Transactions are not available.
Popular document-oriented databases are: MongoDB, Couchbase.
Comparing Key-value NoSQL databases
memcached
In-memory cache
No persistence
TTL supported
client-side clustering only (client stores value at multiple nodes). Horizontally scalable through client.
Not good for large-size values/documents
Redis
In-memory cache
Disk supported – backup and rebuild from disk
TTL supported
Super-fast (see benchmarks)
Data structure support in addition to key-value
Clustering support not mature enough yet. Vertically scalable (see Redis Cluster specification)
Horizontal scaling could be tricky.
Supports Secondary indexes
Aerospike
Both in-memory & on-disk
Extremely fast (could support >1 Million TPS on a single node)
Horizontally scalable. Server side clustering. Sharded & replicated data
Automatic failovers
Supports Secondary indexes
CAS (safe read-modify-write) operations, TTL support
Enterprise class
Comparing document-oriented NoSQL databases
MongoDB
Fast
Mature & stable – feature rich
Supports failovers
Horizontally scalable reads – read from replica/secondary
Writes not scalable horizontally unless you use mongo shards
Supports advanced querying
Supports multiple secondary indexes
Shards architecture becomes tricky, not scalable beyond a point where you need secondary indexes. Elementary shard deployment need 9 nodes at minimum.
Document-level locks are a problem if you have a very high write-rate
Couchbase Server
Fast
Sharded cluster instead of master-slave of mongodb
Hot failover support
Horizontally scalable
Supports secondary indexes through views
Learning curve bigger than MongoDB
Claims to be faster

How do the newer database models achieve better scalability and performance as compared to a traditional RDBMS implementation?

We have
BigTable from Google,
Hadoop, actively contributed by Yahoo,
Dynamo from Amazon
all aiming towards one common goal - making data management as scalable as possible.
By scalability what I understand is that the cost of the usage should not go up drastically when the size of data increases.
RDBMS's are slow when the amount of data is large as the number of indirections invariable increases leading to more IO's.
How do these custom scalable friendly data management systems solve the problem?
This is a figure from this document explaining Google BigTable:
Looks the same to me. How is the ultra-scalability achieved?
The "traditional" SQL DBMS market really means a very small number of products, which have traditionally targeted business applications in a corporate setting. Massive shared-nothing scalability has not historically been a priority for those products or their customers. So it is natural that alternative products have emerged to support internet scale database applications.
This has nothing to do with the fact that these new products are not "Relational" DBMSs. The relational model can scale just as well as any other model. Arguably the relational model suits these types of massively scalable applications better than say, network (graph based) models. It's just that the SQL language has a lot of disadvantages and no-one has yet come up with suitable relational NOSQL (non-SQL) alternatives.
Speaking specifically to your question about Bigtable, the difference is that the heirarchy in the diagram above is all there is. Each Bigtable tabletserver is responsible for a set of tablets (contiguous row ranges from a table); the mapping from row range to tablet is maintained in the metadata table, while the mapping from tablet to tabletserver is maintained in the memory of the Bigtable master. Looking up a row, or range of rows, requires looking up the metadata entry (which will almost certainly be in memory on the server that hosts it), then using that to look up the actual row on the server responsible for it - resulting in only one, or a few disk seeks.
In a nutshell, the reason this scales well is because it's possible to throw more hardware at it: given enough resources, the metadata is always in memory, and thus there's no need to go to disk for it, only for the data (and not always for that, either!).
It's about using cheap comodity hardware to build a network/grid/cloud and spread the data and load (for example using map/reduce).
RDBMS databases seem to me like software being (originaly) designed to run on one supercomputer. You can use various hard drive arrays, DB clusters, but still..
The amount of data increased so there's one more reason to design new data storages with this in mind - scalability, high availability, terabytes of data.
Another thing - if you build a grid/cloud from cheap servers, it's fault tolerant because you store all data at three (?) different locations and at the same time it's cheap.
Back to your pictures - the first one is from one computer (typically), the second one from a network of computers.
One theoretical answer on scalability is at http://queue.acm.org/detail.cfm?id=1394128 - the ACID guarantees are expensive. See http://database.cs.brown.edu/papers/stonebraker-cacm2010.pdf for a counter-argument.
In fact just surviving power failures is expensive. Years ago now I compared MySQL against Oracle. MySQL was almost unbelieveably faster than Oracle, but we couldn't use it. MySQL of those days was built on top of Berkeley
DB, which was miles faster than Oracle's full blown log-based database, but if the power went off while Berkely DB based MySQL was running, it was a manual process to get the database consistent again when the power went back on, and you'ld probably lose recent updates for good.

What sort of Database system should I use?

I'm planning to write an address book that stored contact information.
Each contact could have an unlimited number of fields.
Mostly strings and integers.
But perhaps references to other Objects.
What are the advantages and disadvantages of using an RDBMS with ORM vs OODBMS vs Document DBMS (like CouchDB).
Thanks.
Most of the problems with relational databases is that if you have vast amounts of tables that have joins to one or many tables, and if you require to pull off data once off, you will have to optimise your SQL query to make joins efficient.
In NoSQL databases, the main objective was to be able to be fast and scale horizontally. Some do avoid data joins so you would have to do this yourself (by pulling data in memory and do match joins). Facebook's own Cassandra (now an Apache Project) is basically a NoSQL database system which guarantees no single point of failure.
Also, RDBMS indexing is relatively faster (but that can be debatable) compared to NoSQL databases when it comes to indexing large documents.
I haven't played with CouchDB or MongoDB so I can't compare them. All I know is some do joins in memory (like Redis) which in effect means, pulling all data from database to memory (RAM) and doing joins.
I don't know if that's what you're looking for.
Consider writing the data to a custom text file.
People's address books rarely get past a few hundred entries, so it's easy enough to scan through the entire list for whatever action you need to do.

What applications/IDEs are out there to manage NoSQL database systems?

NoSQL are an alternative to RDBMS, that work well with simple data models holding vast volumes of data. MongoDB, Google's BigTable, Dojo's Persevere, Amazon's Dynamo, Facebook's Cassandra are some examples. Could you state your favorite application/IDE that can manage (i.e. insert or query data) such databases?
NoSQL database are at their childhood, there are no real managing applications i am aware of, but most nosql database provide some form of an cli-tool to interact with the database.
Developing such managing applications is also non trivial, in SQL based systems with indexes you can show efficient a listing of rows beginning with the first and ending with the n-th row of a table containing m rows (n < m)
But NoSQL database are mainly Key-Value-Stores with a hashtable-lookup, so you can't really look efficient at the first n data sets.
Sure there's are exceptions like Redis and MongoDB which provide some form of sorted sets and indexes, but they are really the exception.
If you need such a tool write a minimalistic version like Try Redis
or MongoDB Browser Shell, will not be much effort and provides everything you can do to query your database.
For MongoDB you can use RoboMongo, MonguVUE etc. CouchDB has it's own interface but there are plenty of GUI's available but also depending on the type of NoSQL you will use as there are four categories.

Resources