NoSQL databases are an alternative to RDBMSs that work well with simple data models holding vast volumes of data. MongoDB, Google's BigTable, Dojo's Persevere, Amazon's Dynamo, and Facebook's Cassandra are some examples. Could you name your favorite application/IDE for managing (i.e. inserting or querying data in) such databases?
NoSQL databases are still in their infancy. There are no real management applications that I am aware of, but most NoSQL databases provide some form of CLI tool to interact with the database.
Developing such management applications is also non-trivial. In SQL-based systems with indexes you can efficiently list the rows from the first to the n-th row of a table containing m rows (n < m).
But NoSQL databases are mainly key-value stores with a hashtable lookup, so you can't really look at the first n data sets efficiently.
Sure, there are exceptions like Redis and MongoDB, which provide some form of sorted sets and indexes, but they really are the exception.
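To illustrate the difference, here is a minimal sketch in plain Python. The dict stands in for a hash-based key-value store and the `OrderedIndex` class (a made-up name for this example) stands in for an ordered index such as a B-tree; neither is the API of any real database.

```python
import bisect

# A plain key-value store: lookups by key are fast, but keys have no order to page through.
kv_store = {"user:42": "Ann", "user:7": "Bob", "user:19": "Cy", "user:3": "Dee"}

def first_n_unindexed(store, n):
    """Without an ordered index, every key has to be read and sorted first."""
    return sorted(store)[:n]

class OrderedIndex:
    """A sorted list of keys kept up to date on insert (hypothetical stand-in for a B-tree)."""

    def __init__(self):
        self.keys = []

    def add(self, key):
        bisect.insort(self.keys, key)  # keep the list sorted as keys arrive

    def first_n(self, n):
        return self.keys[:n]  # no full scan or sort needed at read time

index = OrderedIndex()
for key in kv_store:
    index.add(key)

page = index.first_n(2)
```

Both approaches return the same first page here; the point is only that the unindexed version touches every key on each listing, while the index pays that cost once per insert.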
If you need such a tool, write a minimalistic version like Try Redis or MongoDB Browser Shell. It will not be much effort and provides everything you need to query your database.
For MongoDB you can use Robomongo, MongoVUE, etc. CouchDB has its own interface, and there are plenty of GUIs available, but which one fits depends on the type of NoSQL database you use, as there are four categories (key-value, document, column-family, and graph).
Is using NoSQL and an RDBMS in the same application advisable? That is, keeping relational/transactional data in the RDBMS and dynamic data in NoSQL, with the latter used later for analytics?
Yes
You can use both NoSQL and an RDBMS in the same application. In fact, many projects use this approach.
In a working environment you have relational mappings between the different entities of your business, and the data is mostly normalized. So it's advisable to use an RDBMS there.
There are NoSQL solutions that can handle this, but you have to manage the relationships on your own, and you will rarely need to go that way. In NoSQL you can place the related data in one big table, but then you have to think hard about its structure.
In an RDBMS the structure is straightforward (mostly normalized).
So it's better to use an RDBMS here.
Now, for analytics you should use NoSQL, as the content is mostly linear and you will need a "flat structure": a single table holding all required fields. This approach is better than using several tables because, as the data grows, JOINs will make queries so slow that retrieving data becomes a real pain. Although some of the big names in database technology offer RDBMS solutions for this, I'm not in favor of using them.
Also, in analytics the parameters you analyze change frequently, so a schema-less DB can increase your productivity to a large extent.
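As a rough sketch of what "flat and schema-less" means in practice, the snippet below keeps one self-contained record per event in plain Python dicts. The field names (`page`, `referrer`, `experiment`) are made up for illustration; nothing here is MongoDB-specific.

```python
from collections import Counter

# Hypothetical analytics events: one flat record per event, so no joins are needed.
# Newer events carry extra fields without any schema migration.
events = [
    {"user": "ann", "page": "/home", "ms": 120},
    {"user": "bob", "page": "/home", "ms": 95, "referrer": "search"},
    {"user": "ann", "page": "/cart", "ms": 210, "experiment": "B"},
]

def count_by(records, field):
    """Count records per value of `field`, skipping records that lack the field."""
    return Counter(r[field] for r in records if field in r)

views_per_page = count_by(events, "page")
referrers = count_by(events, "referrer")
```

Adding a new parameter to analyze only means writing it into new records; older records simply do not take part in queries on that field.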
Speaking from personal experience: we were using MSSQL as the main DB for the site and for the analysis, but as the data grew we tried many approaches and finally settled on Mongo as the analytics database and MSSQL as the main DB.
So the project uses MSSQL for transactional (functional) work and Mongo for analytics.
I am looking into persisting user preferences past session expiration for an application and was curious whether, based on people's previous experiences, a relational database (e.g. Oracle, MySQL) or a document-oriented database (e.g. MongoDB, Redis) is better suited for this task. To help clarify the meaning of user preferences, my web application would be storing pretty detailed information on a per-user basis, including but not limited to: window size and position, grid column width and order, and various widget states (collapsed/un-collapsed panels). All persistence in my application is currently handled by a relational database, but I have a feeling that something like user preferences may lend itself better to a document-oriented database, because it may be hard to represent this data in a strictly structured way and a semi-structured approach may be better.
If you are already using a relational database for your application, it makes little sense to move just the user preferences out to a document-oriented DB; it would only increase complexity. If you are starting a new app, it's worth considering.
For an existing application you may consider a semi-structured data store, like PostgreSQL's hstore.
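A minimal sketch of the "stay in the relational database" option: the whole preference document is serialized to JSON and stored in one text column. It uses SQLite from the Python standard library purely as a stand-in (it is not hstore), and the table and function names are made up for this example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_prefs (user_id INTEGER PRIMARY KEY, prefs TEXT NOT NULL)")

def save_prefs(user_id, prefs):
    """Serialize the whole preference document and upsert it for the user."""
    conn.execute(
        "INSERT OR REPLACE INTO user_prefs (user_id, prefs) VALUES (?, ?)",
        (user_id, json.dumps(prefs)),
    )
    conn.commit()

def load_prefs(user_id):
    """Return the stored preferences, or an empty dict for an unknown user."""
    row = conn.execute(
        "SELECT prefs FROM user_prefs WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}

save_prefs(1, {
    "window": {"w": 1024, "h": 768, "x": 10, "y": 20},
    "grid": {"columns": [["name", 200], ["date", 120]]},
    "panels": {"sidebar": "collapsed"},
})
prefs = load_prefs(1)
```

The trade-off is that the database cannot query inside the blob; that is fine as long as preferences are always read and written per user as a whole.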
The question being asked is about suitability, not the practicality of installing a new DB.
Which DB is better suited for non-relational data like user preferences?
Clearly the answer is a non-relational DB. Document-oriented NoSQL databases are well suited to storing these.
The OP mentioned preferences for widgets etc., which are most likely JSON documents/objects. This is another reason MongoDB or another JSON document-oriented DB is more suitable.
There is also a fear of "installing a new database", which comes from painful experience with older relational databases and which none of these NoSQL databases will cause. But all this is beside the "suitability" question. Many factors go into the "practicality" decision besides just the dependency.
Is either HBase or Hive a suitable replacement for your traditional (non-)relational database? Will they be able to serve web requests from web clients and respond in a timely manner? Are HBase/Hive only suitable for large dataset analysis? Sorry, I'm a noob at this subject. Thanks in advance!
Hive is not at all suitable for real-time needs such as timely web responses. You can use HBase, though. But don't think of either HBase or Hive as a replacement for traditional RDBMSs. They were meant to serve different needs. If your data is not huge, you are better off with an RDBMS. RDBMSs are still the best choice (if they fit your requirements). Technically speaking, HBase is really more a data store than a database because it lacks many of the features you find in an RDBMS, such as typed columns, secondary indexes, triggers, and advanced query languages.
And the most important thing that could strike a newbie is HBase's lack of SQL support, since it belongs to the NoSQL family of stores.
And HBase/Hive are not the only options for handling large datasets. You have several options, like Cassandra, Hypertable, MongoDB, Accumulo, etc. But each one is meant to solve a specific problem. For example, MongoDB is used for handling document data. So you need to analyze your use case first and, based on that, choose the datastore that suits your requirements.
You might find this list, which compares different NoSQL datastores, useful.
HTH
Hive is a data warehouse tool, and it is mainly used for batch processing.
HBase is a NoSQL database which allows random access based on the rowkey (primary key). It is used for transactional access. It has no indexing support beyond the rowkey, which could be a limitation for your needs.
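To make the rowkey point concrete, here is a toy in-memory model in Python. It is not the HBase API; `RowKeyTable` and its methods are hypothetical names. It only mimics the access pattern: rows are kept sorted by rowkey, so point lookups and prefix scans are cheap, while a lookup by any other column has to inspect every row.

```python
import bisect

class RowKeyTable:
    """Toy model of a store whose only index is the sorted rowkey."""

    def __init__(self):
        self._keys = []   # rowkeys in sorted order
        self._rows = {}   # rowkey -> dict of column values

    def put(self, rowkey, row):
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
        self._rows[rowkey] = row

    def get(self, rowkey):
        """Random access by rowkey."""
        return self._rows.get(rowkey)

    def scan_prefix(self, prefix):
        """Range scan: jump to the first matching key, stop at the first non-match."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result

    def find_by_value(self, column, value):
        """No secondary index: every row has to be checked."""
        return [k for k in self._keys if self._rows[k].get(column) == value]

table = RowKeyTable()
table.put("u1#001", {"status": "open"})
table.put("u2#001", {"status": "done"})
table.put("u1#002", {"status": "done"})
```

In this model the rowkey design (here a made-up `user#sequence` layout) decides which queries are cheap, which is why the missing secondary indexes matter.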
Thanks,
Dino
What type of NoSQL database is best suited to store hierarchical data?
Say for example I want to store posts of a forum with a tree structure:
original post
 + re: original post
 + re: original post
   + re2: original post
     + re3: original post
   + re2: original post
MongoDB and CouchDB offer solutions, but not built-in functionality. See this SO question on representing hierarchy in a relational database, as most other NoSQL solutions I've seen are similar in this regard: you have to write your own algorithms for recalculating that information as nodes are added, deleted and moved. Generally speaking, you're choosing between fast read times (e.g. nested set) and fast write times (adjacency list). See the aforementioned SO question for more options along these lines; the flat table approach appears most aligned with your question.
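The read/write trade-off can be sketched in a few lines of Python. The post ids and the helper names (`children_of`, `nested_set`, `descendants`) are made up for this example. The adjacency list stores only a parent id per post, and the nested-set numbers are derived from it by a depth-first walk that would have to be repeated whenever the tree changes.

```python
# Adjacency list: each post stores only its parent id, so adding a reply is one cheap write.
posts = {
    1: None,  # original post
    2: 1,     # reply to 1
    3: 1,     # reply to 1
    4: 3,     # reply to 3
    5: 4,     # reply to 4
    6: 3,     # reply to 3
}

def children_of(parent_id):
    return [pid for pid, parent in posts.items() if parent == parent_id]

def nested_set(root_id):
    """Assign (left, right) numbers by a depth-first walk over the adjacency list."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in children_of(node):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root_id)
    return bounds

bounds = nested_set(1)

def descendants(node):
    """With nested sets, a whole subtree is a simple range comparison, no recursion."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)
```

Reading the whole thread under a post is a single range check with the nested-set numbers, but every insert shifts those numbers; the adjacency list is the reverse.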
One standard that does abstract away these considerations is the Java Content Repository (JCR); both Apache Jackrabbit and JBoss eXo are implementations. Note that behind the scenes both still do some sort of algorithmic calculation to maintain the hierarchy as described above. In addition, the JCR also handles permissions, file storage, and several other aspects, so it may be overkill for your project.
What you possibly need is a document-oriented database like MongoDB or CouchDB.
See examples of different techniques which allow you to store hierarchical data in MongoDB:
http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
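One of the techniques described in the MongoDB documentation is the materialized path, where each document stores the chain of its ancestors as a string. The sketch below shows only the document shape and the matching logic in plain Python, without a driver or a running server; the `_id` values and the helper names are made up for illustration.

```python
import re

# Each document keeps the ids of all its ancestors in a comma-delimited path.
posts = [
    {"_id": "op", "path": None},
    {"_id": "re_a", "path": ",op,"},
    {"_id": "re_b", "path": ",op,"},
    {"_id": "re2_a", "path": ",op,re_b,"},
    {"_id": "re3_a", "path": ",op,re_b,re2_a,"},
    {"_id": "re2_b", "path": ",op,re_b,"},
]

def subtree(docs, node_id):
    """All descendants of node_id: documents whose path contains ',node_id,'."""
    pattern = re.compile("," + re.escape(node_id) + ",")
    return [d["_id"] for d in docs if d["path"] and pattern.search(d["path"])]

def depth(doc):
    """Nesting level, derived from the number of ancestors in the path."""
    return 0 if doc["path"] is None else doc["path"].count(",") - 1

replies_under_re_b = subtree(posts, "re_b")
```

In a real collection the same match would be expressed as a regular-expression query on the `path` field; moving a post means rewriting the path of every document below it.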
The most common one is IBM's IMS. There is also the Caché database.
See this question posted on the DBA section of Stack Exchange.
Faced with the same issue, I decided to create my own (very simple) solution using Lua + Redis https://github.com/qbolec/Redis-Tree/
eXist-db implements a hierarchical data model for XML persistence.
Graph databases would probably also solve this problem. If neo4j is not enough for you in terms of scaling, consider Titan, which is based on various storage back-ends including HBase and should scale very well. It is not as mature as neo4j, but it is a very promising project.
LDAP, obviously. OpenLDAP would make short work of it.
In mathematics, and more specifically in graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path. So any graph DB will do the job for sure. BTW, an ordinary graph like a tree can easily be mapped to any relational or non-relational DB. To store hierarchical data in a relational DB, take a look at this awesome presentation by Bill Karwin. There are also ORMs with facilities for storing trees. For example, TypeORM supports the adjacency list and closure table patterns for storing hierarchical structures.
TypeORM is used in TypeScript/JavaScript development. Check the popular ORMs for your environment to find one that supports trees.
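For readers who have not met the closure table pattern, here is a small sketch using SQLite from the Python standard library. It is hand-written SQL, not what TypeORM generates, and the table and function names are made up for this example. The idea is to store one row for every ancestor/descendant pair, so a subtree query needs no recursion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE post_closure (
    ancestor INTEGER, descendant INTEGER, depth INTEGER,
    PRIMARY KEY (ancestor, descendant)
);
""")

def add_post(post_id, title, parent_id=None):
    conn.execute("INSERT INTO post VALUES (?, ?)", (post_id, title))
    # Every node is its own ancestor at depth 0.
    conn.execute("INSERT INTO post_closure VALUES (?, ?, 0)", (post_id, post_id))
    if parent_id is not None:
        # Link the new node to each ancestor of its parent, one level deeper.
        conn.execute(
            "INSERT INTO post_closure (ancestor, descendant, depth) "
            "SELECT ancestor, ?, depth + 1 FROM post_closure WHERE descendant = ?",
            (post_id, parent_id),
        )

def descendants(post_id):
    rows = conn.execute(
        "SELECT descendant FROM post_closure "
        "WHERE ancestor = ? AND depth > 0 ORDER BY descendant",
        (post_id,),
    )
    return [r[0] for r in rows]

add_post(1, "original post")
add_post(2, "re: original post", parent_id=1)
add_post(3, "re: original post", parent_id=1)
add_post(4, "re2: original post", parent_id=3)
add_post(5, "re3: original post", parent_id=4)
```

The cost is storage: a post at depth d adds d + 1 rows to the closure table, in exchange for flat, index-friendly reads.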
The king of non-relational DBs [IMHO] is MongoDB. Check out its documentation to find out how it stores trees. Trees are the most common kind of graph and they are used everywhere. Any well-established DB solution should have a way to deal with trees.
Here's a non-answer for you: SQL Server 2008! It's great for recursive queries. Or you can go the old-fashioned route and store hierarchy data in a separate table to avoid recursion.
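A recursive query over a parent/child table looks roughly like the sketch below. To keep it runnable without a server it uses SQLite from the Python standard library, so the syntax is SQLite's (`WITH RECURSIVE`); SQL Server expresses the same common table expression without the `RECURSIVE` keyword. The table and column names are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO post VALUES (?, ?, ?)", [
    (1, None, "original post"),
    (2, 1, "re: original post"),
    (3, 1, "re: original post"),
    (4, 3, "re2: original post"),
    (5, 4, "re3: original post"),
])

# Start from one post (the anchor row), then repeatedly join in its direct replies.
THREAD_QUERY = """
WITH RECURSIVE thread(id, title, depth) AS (
    SELECT id, title, 0 FROM post WHERE id = ?
    UNION ALL
    SELECT p.id, p.title, t.depth + 1
    FROM post p JOIN thread t ON p.parent_id = t.id
)
SELECT id, depth FROM thread ORDER BY id
"""

thread = conn.execute(THREAD_QUERY, (3,)).fetchall()
```

The table itself is a plain adjacency list, so inserts stay cheap and the recursion happens only at read time.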
I think relational databases lend themselves very well to tree data, both in query performance and ease of use. With one caveat: you will be inserting into an indexed table, and probably several other indexed tables, every time someone makes a post. Insert performance could be an issue on a Facebook-caliber forum.
Check out MarkLogic. You can download a demo copy from the website. It is a database for unstructured data and falls under the NoSQL classification of databases. I know unstructured data is a pretty loaded term but just think of it as data that does not fit well in the rows and columns of a RDBMS (like hierarchical data).
Just spent the weekend at a training course using a MUMPS DB as a back end for a full-stack JavaScript browser application development framework. Great stuff! I'd recommend the GT.M distro of MUMPS under the GPL. Or try http://sourceforge.net/projects/mumps/?source=recommended for vanilla MUMPS. Check out http://robtweed.wordpress.com/ for the ewd.js JS framework and more info on MUMPS.
A NoSQL storage service with native support for hierarchical data is Amazon Web Services' Simple Storage Service (AWS S3). The path-based keys are hierarchical by nature, and the blob values may be typed using attributes (MIME type, e.g. application/json, text/csv, etc.). Advantages of S3 include versioning and the ability to scale to extremely large overall capacity and nearly unlimited concurrent writes. Disadvantages include no support for conditional writes (optimistic concurrency) or consistent reads (only read-after-write), and no support for references/relationships. It is also purely usage-based, so wide variations in demand do not require complex scaling infrastructure or over-provisioned capacity.
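How path-based keys give you a hierarchy can be sketched without touching AWS at all. The function below is a toy model in plain Python of a prefix-plus-delimiter listing over a flat set of keys; the key layout and the name `list_children` are assumptions made for this example, not the S3 API.

```python
def list_children(keys, prefix, delimiter="/"):
    """Return the immediate 'folders' and objects directly under `prefix`."""
    folders, objects = set(), []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Deeper keys collapse into one entry per next path segment.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        elif rest:
            objects.append(key)
    return sorted(folders), objects

# Hypothetical key layout for a forum thread: replies live under their parent's path.
keys = [
    "forum/post-1/body.json",
    "forum/post-1/replies/post-2/body.json",
    "forum/post-1/replies/post-3/body.json",
    "forum/post-1/replies/post-3/replies/post-4/body.json",
]

folders, objects = list_children(keys, "forum/post-1/replies/")
```

The store itself stays flat; the tree exists only in how the keys are named and listed, which is also why there are no references between objects.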
ClickHouse has explicit support for hierarchical data.
Wondering if there was a scenario where one would use a document-based DB and a relational DB together in a best-of-both-worlds scenario?
In my view, until I see an actual (open source or otherwise transparent) application successfully doing this, I will remain skeptical that it's worthwhile for projects with fewer than a dozen developers.
I suspect that by choosing one database over another and sticking with it--in good times and in bad--developers will reduce both the complexity of the data model and the maintenance cost of the code. Also, by choosing two databases, one runs the risk of a worst-of-both-worlds scenario, with data which is both difficult to manipulate and report on (CouchDB) and also not scalable (RDBMS).
One idea is to use a relational database as the main data store and a document-based db as a data distribution mechanism from the back end to the front end(s).
We use a mix of an RDBMS and CouchDB. The RDBMS (IBM DB2) is used for "exact" data where transactions make things easier. Examples are bookkeeping of money and inventory. CouchDB is used for archival of "finished" records from the RDBMS, digital assets (JPEGs, scanned documents) and badly structured information, e.g. information acquired via shipping companies' track and trace systems.
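A rough sketch of that split, with stand-ins for both sides: SQLite (from the Python standard library) plays the transactional RDBMS and a plain dict plays the document store. The table, the `archive` dict and the function name are made up for illustration; this is not DB2 or CouchDB code.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total_cents INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 'finished', 1999), (2, 'open', 500)")
conn.commit()

archive = {}  # stand-in for a document store, keyed by document id

def archive_finished():
    """Copy finished orders into the archive as JSON documents, then delete them from the RDBMS."""
    rows = conn.execute(
        "SELECT id, status, total_cents FROM orders WHERE status = 'finished'"
    ).fetchall()
    for order_id, status, total in rows:
        doc = {"id": order_id, "status": status, "total_cents": total}
        archive[f"order-{order_id}"] = json.dumps(doc)
    with conn:  # the deletes commit together or not at all
        conn.executemany("DELETE FROM orders WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)

moved = archive_finished()
```

Writing the archive copy before deleting means a failure in between leaves a duplicate rather than a lost record, which is usually the safer direction for this kind of hand-off.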