NoSQL Database Design for FSA - database

I need to design a NoSQL database for a system that is using an FSA(Fare Service Aggregator) which is having a very heavy load including major scenarios, database aggregates, and queries.
Are any references on how to design a NoSQL database of about 10-15 pages?
Video tutorials or examples would do. Thank you.

There is no database design in NoSQL, it's literally a document dump. The major problem with NoSQL is how to query all the garbage in it. Mostly they all are Key-Value stores, totally unsuitable for business requirements. If document is too big maybe split it. I would suggest Couchbase to play with, because it's almost got SQL for querying objects (must have called it BackSQL) which it seems done right, so there is a chance you will be able to implement something using it. Will be 30x slower than RDBMS, but it's a trade off for NoSQL "scalability" (when you can horizontally add a dozen of servers to compensate slow index building and scan).
https://docs.couchbase.com/tutorials/todo-app/design/data-modeling.html
https://resources.couchbase.com/c/relational-no-sql-wp

Related

Using NoSQL and RDBS together

Using NoSQL & RDBMS in same application is advisable? Managing relational/transaction data into RDBMS and dynamic data ito NoSQL and latter that will be used for Analytic?
Yes
You can use Both NoSQL and RDBMS in same application, In fact many project uses the same approach.
In a working environment you have a Relation-Mapping between different entities of your business and mostly Normalized. So its advisable to use an RDBMS there.
Although some NoSQL solutions are out there in which you can handle these BUT on your own but you're never going to need it, as in NoSQL you can place the related data in one Big Table, but then here you have to think much about the structure of that.
In case of RDBMS the structure is straight forward (Normalized mostly).
So its better to use RDBMS here.
Now in case of Analytics you should use NoSQL, as the content is mostly linear and you will require a "flat structure". Only one table having all required fields. This approach in GOOD as compared to different tables because as the data grows JOINS are going to make the Query so slow that retrieving Data will be pain in ass. Although there are so many solution in RDBMS provided by some of the big names in Database Technologies, I'm not in favor of using them.
Also Analytics is something in which the parameters you are analyzing are changing frequently a Non-Schema Based DB can increase your productivity to large extent.
Telling from personal experience : We were using MSSQL as main DB for the site and for the Analysis but as the data grew we tried many ways, finally settling on using Mongo as the Analytic Database and MSSQL as main DB.
SO the project uses MSSQL for transaction (functional) working and Mongo for Analytics.

why everyone wants NOSQL other than large-scale Oracle Clusters?

oracle has a good reputation for handling large-scale applications and it's also flexible to extend to cluster environment. Why everyone wants NOSQL?
because nosql db is much cheaper?
why not swith object-oriented db?
Firstly, not everyone does want NoSQL. Packaged software (eg ERP) is all pretty much mainstream RDBMS stuff. Don't confuse the amount of development effort with the usage.
What has happened is that NoSQL has opened up a whole range of applications that simply didn't suit relational technologies, and so there's been a rush of applications that CAN be developed. Most of the stuff that could be developed on RDBMS platforms already has been and is in more of a maintenance/upgrade phase. Probably with less upgrading than usual because of the whole global financial climate.
So in ten/fifteen years, as those NoSQL apps move into the same level of maturity, the frenzy will have died down and there's be less excitement.
Oracle also have NoSQL solution: http://www.oracle.com/technetwork/database/nosqldb/overview/index.html
A SQL solution isn't necessarily what you want regardless of scale.
There are situations where you can't easily predict the schema of your model, or worst still your data is schemaless. In those situations you want a data model which doesn't limit you but rather allows you the flexibilty you need to evolve your data while still maintaining core abilities such as fast indexes.
Another reason is that SQL doesn't represent the natural way in which you want to look at your data, predominantly graph DBs such as Neo4J or GraphDB allow developers or users to approach a linked graph model in a more intuitive way.
Of course there is a way to address all these issues in Oracle RDBMS, but it feels more like hacking the DB to fit your needs as opposed to using a DB that fits you. This sounds like a perk but it actually goes a long way into the ease of development and analysis of your application.
Now if we are talking about scale, Oracle can probably beat column based DBs such as HBase or Hypertable, but it is important to note that Oracle RDBMS isn't just more expensive it's way way more expensive. In today's world even small time startups have Terabytes of data they need to analyze daily. Even small companies can use clusters of 100 machines in the cloud to store their data, in such a company Oracle isn't a viable option the annual licensing cost and the hiring of DBAs will prevent startups from using it.
Finally the last reason why you would start off with NoSQL is speed, bringing up a MongoDB and starting development can be done in 5 minutes and sometimes you wanna handle problems as they come up and avoid premature optimization
If you are willing to let go consistency, there are no theorical limits to how mouch you can scale out some NoSQL solutions.
Some RDBMS can scale a lot, and Oracle is among the best of them, but no RDBMS let you cut in consistency, and therefor even the best has a pretty clear theorical limit to how much it can scale, not to mention the real world limits.
Some big names on the net just can't rely only on RDBMS anymore, and many others follows just to be like the big ones. Finally, some solutions are realy best fitted on a scheam-less structure, but those don't account for the most part of NoSQL users I would guess. The main point is to scale the web 2.0.
This is a very general question you are asking. Are you comparing the relational database to NOSQL database, or are you comparing commercial or open source database? We need to figure out as it seems we are comparing apples to oranges, and you will not get a straight answer.
Here are the breakdown from my perspective.
DB type: If you are comparing relational db vs NOSQL db, you should refer to this link
instead.
Cost: If you are comparing from a cost perspective, each has its own cost. Oracle will charge for the license, while the NOSQL db (using MongoDB as an example) are open source, and you don't have to pay the license. But you will need someone to have a good understanding of NOSQL to administer and maintain the site, which is a lot harder to find.
Application: What kind of application are you writing? You need to understand the database requirement for your application. If you need to store a lot of unstructured data as a key value entry, you are better off using NOSQL. On the other side of coin, you would prefer SQL db if you will join a lot of tables and execute complex SQL.
You mentioned Object Oriented DB, and that is another type of database from NoSQL or relational DB. At the end, it depends on your need. To expand the choice of horizon, some might heard of hierarchical database, which mainly live in mainframe environment. :-)

'e-Commerce' scalable database model

I would like to understand database scalability so I've just heard a talk about Habits of Highly Scalable Web Applications
http://techportal.inviqa.com/2010/03/02/habits-of-highly-scalable-web-applications/
On it, the presenter mainly talk about relational database scalability.
I also have read something about MapReduce and Column oriented tables, big tables, hypertable etc... trying to understand which are the most up to date methods to scale web application data. But the second group, to me, is being hard to understand where it fits.
It serves as transactional, reliable data store? or not, its just for large access and processing and to handle fine graned operations we will ever need to rely on RDBMSs?
Could someone give a comprehensive landscape for those new technologies and how to use it?
Basically it's about using the right tool for the job. Relational databases have been around for decades, which means they are very good at solving the problems that haven't changed in that time - things like keeping track of sales for example. Although they have become the default data store for just about everything, they are not so good at handling the problems that didn't exist twenty years ago - particularly scalability and data without a clearly defined, unchanging schema.
NOSQL is a class of tools designed to solve the problems that are not perfectly suited to relational databases. Scalability is the best known, though unlikely to be a relevant to most developers. I think the other key use case that we don't see so much of yet is for small projects that don't need to worry about the data storage characteristics at all, and can just use the default - being able to skip database design, ORM and database maintenance is quite attractive.
For Ecommerce specifically you're probably better off using sql at least in part - You might use NOSQL for product details or a recommendation engine, but you want your sales data in an easily queried sql table.

What is NoSQL, how does it work, and what benefits does it provide? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.

voldemort vs. couchdb

I am trying to decide whether to use voldemort or couchdb for an upcoming healthcare project. I want a storage system that has high availability , fault tolerance, and can scale for the massive amounts of data being thrown at it.
What is the pros/cons of each?
Thanks
Project Voldemort looks nice, but I haven't looked deeply into it so far.
In it current state CouchDB might not be the right thing for "massive amounts of data". Distributing data between nodes and routing queries accordingly is on the roadmap but not implemented so far. The biggest known production setups of CouchDB use "tables" ("databases" in couch-speak) of about 200G.
HA is not natively supported by CouchDB but can build easily: All CouchDB nodes are replicating the database nodes between each other in a multi-master setup. We put two Varnish proxies in front of the CouchDB machines and the Varnish boxes are made redundant with CARP. CouchDBs "build from the Web" design makes such things very easy.
The most pressing issue in our setup is the fact that there are still issues with the replication of large (multi MB) attachments to CouchDB documents.
I suggest you also check the traditional RDBMS route. There are huge issues with available talent outside the RDBMS approach and there are very capable offerings available from Oracle & Co.
Not knowing enough from your question, I would nevertheless say Project Voldemort or distributed hash tables (DHTs) like CouchDB in general are a solution to your problem of HA.
Those DHTs are very nice for high availability but harder to write code for than traditional relational databases (RDBMS) concerning consistency.
They are quite good to store document type information, which may fit nicely with your healthcare project but make development harder for data.
The biggest limitation of most stores is that they are not transactionally safe (See Scalaris for an transactionally safe store) and you need to ensure data consistency by yourself - most use read time consistency by merging conflicting data). RDBMS are much easier to use for consistency of data (ACID)
Joining data is much harder too. In RDBMs you can easily query data over several tables, you need to write code in CouchDB to aggregate data. For other stores Hadoop may be a good choice for aggregating information.
Read about BASE and the CAP theorem on consistency vs. availability.
See
http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
http://queue.acm.org/detail.cfm?id=1394128
Is memcacheDB an option? I've heard that's how Digg handled HA issues.

Resources