Schema-free/flexible ACID database for a SaaS application? - database

I am looking at rewriting a VB-based on-premise (locally installed) application (invoicing + inventory) as a web-based Clojure application for small enterprise customers. I intend to offer it as a SaaS application for customers in a similar trade.
I was looking at database options. My choice was an RDBMS: PostgreSQL or MySQL. I might scale up to 400 users in the first year, with typically 20-40 page views per day per user, mostly for transactions rather than static views. Each view will involve fetching and updating data. ACID compliance is necessary (or so I think). So the transaction volume is not huge.
It would have been a no-brainer to pick either of these based on my preference, but for one requirement, which I believe is typical of a SaaS app: the schema will be changing as I add more customers/users and as each customer's business requirements change (I will be offering only limited flexibility to start with). As I am not a DB expert, based on what I can think of and have read, I can handle that in a number of ways:
Have a traditional RDBMS schema design in MySQL/PostgreSQL with a single DB hosting multiple tenants, and add enough "free-floating" columns to each table to allow for future changes as I add more customers or as an existing customer's needs change. The downside might be having to propagate changes to the DB every time a small change is made to the schema. I remember reading that in PostgreSQL schema updates can be done in real time without locking, but I am not sure how painful or practical that is in this use case, especially as schema changes might also require new or slightly changed SQL.
Have an RDBMS, but design the database schema in a flexible manner: close to entity-attribute-value, or just as a key-value store (Workday and FriendFeed, for example).
Have the entire thing in memory as objects and periodically store them in log files (e.g., edval, LMAX).
Go for a NoSQL DB like MongoDB or Redis. But from what I can gather, they are not suitable for this use case and are not fully ACID compliant.
Go for a NewSQL DB like VoltDB or JustOneDB (cloud based), which retain SQL and ACID-compliant behaviour and are "new-gen" RDBMSs.
I looked at Neo4j (a graph DB), but I am not sure whether it fits this use case.
In my use case, more than scalability or distributed computing, I am looking for a better way to achieve "flexibility in schema + ACID + reasonable performance". Most of the articles I could find on the net speak of schema flexibility as a route to performance (in the case of NoSQL DBs) and scalability, while leaving out the ACID/transactions side.
Is this an "either/or" case of schema flexibility vs. ACID transactions, or is there a better way out?
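To make the entity-attribute-value option above concrete, here is a minimal sketch of how tenant-specific fields and ACID transactions can coexist in an ordinary RDBMS. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL/MySQL; the tables (inventory, invoice, invoice_attr), the sell helper and the po_number attribute are hypothetical names made up for illustration, not part of any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (
        tenant_id INTEGER NOT NULL,
        sku       TEXT    NOT NULL,
        qty       INTEGER NOT NULL CHECK (qty >= 0),
        PRIMARY KEY (tenant_id, sku)
    );
    CREATE TABLE invoice (
        id        INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL,
        sku       TEXT    NOT NULL,
        qty       INTEGER NOT NULL
    );
    -- Entity-attribute-value side table for tenant-specific fields (hypothetical).
    CREATE TABLE invoice_attr (
        invoice_id INTEGER NOT NULL REFERENCES invoice(id),
        name       TEXT    NOT NULL,
        value      TEXT,
        PRIMARY KEY (invoice_id, name)
    );
    INSERT INTO inventory VALUES (1, 'WIDGET', 10);
""")


def sell(conn, tenant_id, sku, qty, attrs=None):
    """Insert an invoice with its custom attributes and decrement stock atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO invoice (tenant_id, sku, qty) VALUES (?, ?, ?)",
            (tenant_id, sku, qty),
        )
        conn.executemany(
            "INSERT INTO invoice_attr (invoice_id, name, value) VALUES (?, ?, ?)",
            [(cur.lastrowid, k, v) for k, v in (attrs or {}).items()],
        )
        conn.execute(
            "UPDATE inventory SET qty = qty - ? WHERE tenant_id = ? AND sku = ?",
            (qty, tenant_id, sku),
        )


sell(conn, 1, "WIDGET", 3, attrs={"po_number": "PO-17"})
try:
    # Would drive the stock negative, so the CHECK constraint rejects the UPDATE.
    sell(conn, 1, "WIDGET", 50, attrs={"po_number": "PO-18"})
except sqlite3.IntegrityError:
    pass  # the invoice and attribute rows inserted just before are rolled back too
```

The second call fails as a whole: the stock stays at 7 and no half-written invoice is left behind. A new custom field for a tenant is just another row in invoice_attr, so no ALTER TABLE is needed. The usual price of EAV is that the attribute values are untyped text and queries across many attributes get clumsy.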

I think Tarantool can help you. It has transactions, Lua, msgpack, etc. Also see that video.

Related

Relational or NoSQL database for application

I will be working on an application which should manage intracompany documentation. It should work like this: a user can upload a document with mandatory fields such as a description, the time when the document becomes accessible to "readers", the group of people who should approve the document, and the group of people who should read the document after successful approval. Then there is the history of documents, which decision each user made (agreed vs. disagreed), some basic management of users/groups/documents/roles, etc. This application is meant for a single company (and it should run on the local network).
Should I use a relational database with an ORM or a NoSQL database? And why? What would be the benefits of a relational DB or of NoSQL for the application described above?
Thanks
If you have a strictly defined schema (seems to be the case) and predictable traffic (which is also very likely in a corporate environment) and want ACID transactions and data recovery guarantees which have been tested and polished for many years (you surely do), then an RDBMS is your choice. It doesn't matter what is used on the application side: ORM, plain JDBC or whatever.
One slippery point might be the document storage, however, provided that documents are not huge, relational databases (e.g. PostgreSQL) will do the job just fine.
This assumes that you do not expect hundreds of thousands of requests per second and thus don't need any sharding. Even if you do expect such load, an RDBMS may be okay.

Persisting User Preferences - Relational or Document-Oriented Database

I am looking into persisting user preferences past session expiration for an application and was curious whether, based on people's previous experiences, a Relational Database (e.g. Oracle, MySQL) or a Document-Oriented Database (e.g. MongoDB, Redis) is better suited for this task. To help clarify the meaning of user preferences, my web application would be storing pretty detailed information on a per-user basis including but not limited to: window size and position, grid column width and order, various widget states (collapsed/un-collapsed panels). All persistence in my application is currently handled by a Relational Database, but I have a feeling that something like user preferences may lend itself better to a Document-Oriented Database because it may be hard to represent this data in a strictly structured way and a semi-structured approach may be better.
If you are already using a relational database for your application, it makes little sense to separate out just user preferences to a document-oriented DB; it would only increase complexity. When starting a new app, it's worth considering.
For an existing application you may consider using a semi-structured data store inside the relational database, like PostgreSQL's hstore.
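A minimal sketch of the semi-structured approach, assuming Python's built-in sqlite3 with a JSON text column as a stand-in for an hstore or JSON column in PostgreSQL. The user_prefs table, the save_prefs/load_prefs helpers and the preference keys are hypothetical names chosen for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_prefs (user_id INTEGER PRIMARY KEY, prefs TEXT NOT NULL)"
)


def load_prefs(conn, user_id):
    """Return the stored preference document, or {} for an unknown user."""
    row = conn.execute(
        "SELECT prefs FROM user_prefs WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


def save_prefs(conn, user_id, changes):
    """Merge the changed top-level keys into the user's stored document."""
    with conn:  # one transaction: read, merge, write back
        prefs = load_prefs(conn, user_id)
        prefs.update(changes)
        conn.execute(
            "INSERT OR REPLACE INTO user_prefs (user_id, prefs) VALUES (?, ?)",
            (user_id, json.dumps(prefs)),
        )


save_prefs(conn, 42, {"window": {"w": 1280, "h": 800}, "grid.columns": ["id", "name"]})
save_prefs(conn, 42, {"panel.sidebar": "collapsed"})
```

The preferences stay in the same database (and the same backups and transactions) as everything else, while new widgets can add keys without any schema change.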
The question being asked is about suitability, not the practicality of installing a new DB.
Which DB is better suited for non-relational data like user preferences?
Clearly the answer should be a non-relational DB. Document-oriented NoSQL databases are suitable for storing these.
The OP mentioned widget preferences and the like, which are most likely JSON documents/objects. This is another reason MongoDB or another JSON document-oriented DB is more suitable.
There is also a fear of "installing a new database", which comes from the experience/pains of older relational databases and which none of these NoSQL databases will have. But all this is beside the "suitability" question. Many factors will go into the "practicality" decision besides just the dependency.

What can an RDBMS do that Neo4j (and graph databases) can't?

“A Graph Database –transforms a–> RDBMS”
The Neo4j site seems to imply that whatever you can do in RDBMS, you can do in Neo4j.
Before choosing Neo4j as a replacement for an RDBMS, I need some doubts answered.
I am interested in Neo4j for
ability to quickly modify the data "schema"
ability to express entities naturally instead of relations and normalizations
...which leads to highly expressive code (better than ORM)
This is a NoSQL solution I am interested in for its features, not high performance.
Question: Does Neo4j present any issues that may make it unsuitable as a RDBMS replacement?
I am particularly concerned about these:
Is there any DB feature I must implement in application logic? (For example, you must implement joins at the application layer for a few NoSQL DBs.)
Are the fields "indexed" to allow a lookup faster than O(n)?
How do I handle hot backups and replication?
Are there any issues with "altering" the schema or letting entities with different versions of the schema live together?
This is an extremely broad topic covering everything from modeling and implementation to IT and support. It's impossible to really answer all those questions here, especially without details on your situation. However, you seem to be exploring options and avenues. So, I'll just pass on some general food for thought as someone that's implemented a number of systems.
Everybody seems to think their new database paradigm is a replacement for relational databases. So, take those claims with a grain of salt.
I like to think in terms of 3 fundamental models: Relational, Document, and Graphing. Depending on your problem space one or even more of these is the right answer. I would not do financial transactions in anything but relational (SQL Based). If you are building a CMS, then a Document DB is the way to go. If my application is modeling networks (roads, people, connections, networks etc.) I use Neo4J.
As far as production quality, there are solid options in each category. Relational has a bunch. For document databases I'd go MongoDB or a higher level JCR system like Apache Jackrabbit. For graphing, I only have experience with Neo4j and it is rock solid for me.
Whatever you do, don't buy into the hype that "We have the one technology that solves all your problems." It's not there and it narrows your thinking.
I'm convinced Neo4j is a good replacement for relational databases by now.
It is ACID compliant
Though the community version lacks some features like hot backups, the enterprise edition has them
You can get support for it
At first sight (and in the new releases where you don't need a START clause) its query language Cypher can do almost anything SQL can
but
it's harder to find a Cypher developer than a SQL one
and it does not have an equivalent optimizer: how you write the query matters more than it does with SQL
Though it supports replication and Neo explicitly markets it as a big data product, I can't confirm it is scalable enough and I did not study security aspects.
In recent releases (newer than the question above), one can define indexes on labels, which work like indexes on tables in a relational DB, allowing for O(log(n)) lookups.
(FYI: Neo4j has no tables, but each node (~= row) can have several labels, comparable to Gmail labels. This is more flexible: you don't have to choose whether or not to put cars and bicycles in one vehicles table; a bicycle would have both a :vehicle and a :bicycle label.)
To answer the original question: Neo4j has hardly any support for schema enforcement. Neo advises implementing automated consistency tests on your database, which you run on your acceptance test instance as part of your release cycle.
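To illustrate the idea of labels plus an automated consistency test, here is a minimal sketch in plain Python. It does not use Neo4j or Cypher at all: nodes are ordinary dicts, the label index is a dict of sets, and the REQUIRED rules, the property names and the violations helper are hypothetical names made up for this example.

```python
from collections import defaultdict

# Hypothetical rules: properties every node with a given label must carry.
REQUIRED = {"vehicle": {"wheels"}, "bicycle": {"frame_size"}}

# Each node can carry several labels, like the :vehicle/:bicycle example above.
nodes = {
    1: {"labels": {"vehicle", "car"}, "props": {"wheels": 4}},
    2: {"labels": {"vehicle", "bicycle"}, "props": {"wheels": 2, "frame_size": 54}},
    3: {"labels": {"vehicle", "bicycle"}, "props": {"wheels": 2}},  # missing frame_size
}

# Label index: label -> ids of the nodes carrying it, so a lookup avoids a full scan.
by_label = defaultdict(set)
for node_id, node in nodes.items():
    for label in node["labels"]:
        by_label[label].add(node_id)


def violations(nodes, by_label, required):
    """Return (node_id, label, missing_property) for every broken rule."""
    found = []
    for label, props in required.items():
        for node_id in sorted(by_label.get(label, ())):
            for prop in sorted(props - set(nodes[node_id]["props"])):
                found.append((node_id, label, prop))
    return found


problems = violations(nodes, by_label, REQUIRED)
```

Here problems contains one entry, for node 3, which is labelled as a bicycle but has no frame_size. In a release pipeline, a non-empty list would fail the acceptance run, which is the role the database's own schema enforcement plays in an RDBMS.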
Using an enterprise DB such as Oracle will give you many, many features which may or may not be part of Neo4j. These include:
ACID transactions
High availability / backups / standby
ability to use SQL to get data in the most efficient way using a cost-based optimizer: the DB determines the best way to retrieve the data based on your latest statistics
Scalability, partitioning
support
security
If you are going to implement most of the functionality of your application in code yourself and don't require the structure and advanced features offered by an RDBMS, or if your data structures are better suited to a graph-based DB, then by all means trial Neo4j. There is a reason that most corporate apps use one of the traditional RDBMS servers, but this may not always be the case in the future.

'e-Commerce' scalable database model

I would like to understand database scalability, so I've just listened to a talk about Habits of Highly Scalable Web Applications:
http://techportal.inviqa.com/2010/03/02/habits-of-highly-scalable-web-applications/
In it, the presenter mainly talks about relational database scalability.
I have also read something about MapReduce, column-oriented tables, Bigtable, Hypertable, etc., trying to understand which are the most up-to-date methods to scale web application data. But I find it hard to understand where this second group fits.
Does it serve as a transactional, reliable data store? Or is it just for large-scale access and processing, so that we will always need to rely on RDBMSs to handle fine-grained operations?
Could someone give a comprehensive overview of the landscape of these new technologies and how to use them?
Basically it's about using the right tool for the job. Relational databases have been around for decades, which means they are very good at solving the problems that haven't changed in that time - things like keeping track of sales for example. Although they have become the default data store for just about everything, they are not so good at handling the problems that didn't exist twenty years ago - particularly scalability and data without a clearly defined, unchanging schema.
NoSQL is a class of tools designed to solve the problems that are not perfectly suited to relational databases. Scalability is the best known, though it is unlikely to be relevant to most developers. I think the other key use case, which we don't see so much of yet, is small projects that don't need to worry about the data storage characteristics at all and can just use the default: being able to skip database design, ORM and database maintenance is quite attractive.
For e-commerce specifically you're probably better off using SQL, at least in part. You might use NoSQL for product details or a recommendation engine, but you want your sales data in an easily queried SQL table.

Applications for using couchDB and a RDBMS together

Wondering if there was a scenario where one would use a document-based DB and a relational DB together in a best-of-both-worlds scenario?
In my view, until I see an actual (open source or otherwise transparent) application successfully doing this, I will remain skeptical that it's worthwhile for projects with fewer than a dozen developers.
I suspect that by choosing one database over another and sticking with it--in good times and in bad--developers will reduce both the complexity of the data model and the maintenance cost of the code. Also, by choosing two databases, one runs the risk of a worst-of-both-worlds scenario, with data which is both difficult to manipulate and report on (CouchDB) and also not scalable (RDBMS).
One idea is to use a relational database as the main data store and a document-based db as a data distribution mechanism from the back end to the front end(s).
We use a mix of RDBMS and CouchDB. The RDBMS (IBM DB2) is used for "exact" data where transactions make things easier. Examples are bookkeeping of money and inventory. CouchDB is used for archival of "finished" records from the RDBMS, digital assets (JPEGs, scanned documents) and badly structured information, e.g. information acquired via shipping companies' track and trace systems.
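A minimal sketch of what such an archival step could look like, under loud assumptions: Python's sqlite3 stands in for the RDBMS, a plain dict stands in for the document store (no CouchDB API is used), and the orders table, the "order:<id>" document ids and the archive_finished helper are hypothetical.

```python
import json
import sqlite3

rdbms = sqlite3.connect(":memory:")
rdbms.row_factory = sqlite3.Row  # rows behave like mappings, easy to turn into documents
rdbms.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER, status TEXT
    );
    INSERT INTO orders VALUES (1, 'acme', 12500, 'finished');
    INSERT INTO orders VALUES (2, 'globex', 990, 'open');
""")
archive = {}  # stand-in for the document store: document id -> JSON text


def archive_finished(rdbms, archive):
    """Copy finished orders into the archive, then delete them from the RDBMS."""
    rows = rdbms.execute("SELECT * FROM orders WHERE status = 'finished'").fetchall()
    for row in rows:
        archive[f"order:{row['id']}"] = json.dumps(dict(row))
    with rdbms:  # delete only after every document has been written
        rdbms.executemany("DELETE FROM orders WHERE id = ?", [(r["id"],) for r in rows])
    return len(rows)


moved = archive_finished(rdbms, archive)
```

The documents are written before the rows are deleted, so a crash in between leaves a duplicate that a rerun overwrites, rather than a lost record. Open orders stay in the transactional store.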
