I am about to work on an app that will show a lot of visualizations. It is a read-only data application with negligible write operations. We have a lot of data (JSON, CSV); depending on the use case, we will have to filter it down to a subset and send that to the UI for visualization.
What kind of NoSQL database would you recommend, and what are the reasons? Thanks!
P.S.: Some of the devs are recommending Elasticsearch. I am not sure whether we should go for a document store or a key-value store in the first place.
If you're visualizing log data, I'd use Logstash in combination with Elasticsearch and Kibana. There are also commercial ways to protect your data, with more coming. I'm working on k3bana, which will visualize data with X3DOM and D3.js. Good luck!
I used Redis (with Jedis) to store key-value pairs in one case.
I'm beginning my first online project, which I expect will need to scale, so I have opted for a NoSQL DB. After some reading and modeling of what my queries would look like, there are two databases I am considering. Cassandra seems like the right choice for item lookups by keyword, but MongoDB sounds like the right choice for initially entering the data, as it can retain the account structure in document form.
This split decision has left me wondering: are there any major companies that use multiple database types to store different items, for example using both Cassandra and Mongo together?
I would think scaling up would be more difficult but are the added benefits (if there are any) worth the trouble? I'm not the expert on this. I'm hoping you are. Thanks in advance for sharing your experience.
Cassandra can handle both use cases so you can use the same database for your purposes.
Stargate (https://stargate.io/) is an open-source API platform which provides a data gateway to Cassandra with REST API, GraphQL API, Document API and even native CQL access.
The Document API lets you save and search schemaless JSON documents to/from Cassandra directly from your app.
You can try it out for free on Astra with no credit card required. In just a few clicks, you'll be able to launch a Cassandra cluster with Stargate pre-configured so you can use the Document API straight out-of-the box and build a proof-of-concept app immediately without having to worry about downloading/installing/configuring a Cassandra cluster.
There are even sample apps you can access straight from the Astra dashboard so you can see Stargate in action. For more info, see Using the Document API on Astra. Cheers!
Using multiple database technologies in the same project is somewhat common nowadays and it is called "Polyglot persistence".
Many people use this method to take advantage of multiple systems. As you mentioned, Cassandra is right for some things and something else (maybe MongoDB) is best for others, so using a combination can give you the best of both worlds.
Scaling, replication and support can be more costly when you use multiple technologies, because you need expertise in each of them.
So if you really have use cases where Cassandra won't be a good choice, and some primary use cases where Cassandra is the best choice, then yes, going with two databases can be the best option, provided you are ready to take on the trouble of supporting two systems.
I'm making a music app with social networking features. I was hoping to power my database with Neo4j and Redis. I will store user info in Neo4j and all other information (posts, reviews, etc.) in Redis. Does anyone have any advice or insight on this?
Short answer: it depends.
Longer answer:
I'm assuming that you are just starting with the app and want to have quick feedback if it is a thing you want to invest (time/money) in.
If you want to run queries like "which users reviewed the same song", you need to put this data into Neo4j. In general, the more connected data you have there, the more interesting the questions you can answer. So I would err on the side of putting data into Neo4j. Also, querying only one database is easier to implement than aggregating data over multiple ones.
If you get enough users that the amount of data they produce starts to impact Neo4j, you can put the actual review text or post into Redis and reference it by an id from Neo4j. But by then you already know it is worth doing, and this is a fairly manageable refactoring and data migration.
Neo4j is a graph database. However, it does not support sharding (horizontal partitioning). The good thing about using Neo4j is that you can store a graph data structure and run graph algorithms easily with the Neo4j query language. This may be useful for analyzing some social network properties. The bad thing is that, because Neo4j does not support sharding, the capacity of the database is limited to a single node. When the data size increases, its performance may be impacted.
Redis is always useful for caching data, which can be a good choice.
IMHO, in your case I would try to store everything in Neo4j.
So I'm designing this blog engine and I'm trying to just keep my blog data without considering comments or membership system or any other type of multi-user data.
The blog itself revolves around two types of data. The first is the actual blog post entry, which consists of a title, a post body and metadata (mostly dates and statistics), so it's really simple and can be represented by a simple JSON object. The second type of data is the blog admin configuration and personal information. The comment system and everything else will be implemented using Disqus.
My main concern here is the ability of such an engine to scale with spiked visits (I know you might argue about this, but let's take it for granted). Since I started this project I've been moving well with the rest of my stack, except for the data layer. I've been having a dilemma choosing the database. I considered MongoDB, but some reviews and articles/benchmarks suggested slow reads after collections reach a certain size. Next I looked at Redis and its persistence features, RDB and AOF. While Redis is good at both fast reading and writing, I'm afraid of using it because I'm not familiar with it. And this whole search keeps leading to things like "PostgreSQL 9.4 is now faster than MongoDB for storing JSON documents" etc.
So is there any way I can settle this issue for good, considering that I only need to represent my data in a key/value structure, require fast reading but not fast writing, and need fault tolerance?
Thank you
If I were you I would start small and not try to optimize for big data just yet. A lot of blogs you read about the downsides of a NoSQL solution are around large data sets - or people that are trying to do relational things with a database designed for de-normalized data.
My list of databases to consider:
Mongo. It has huge community support and based on recent funding - it's going to be around for a while. It runs very well on a single instance and a basic replica set. It's easy to set up and free, so it's worth spending a day or two running your own tests to settle the issue once and for all. Don't trust a blog.
Couchbase. Supports key/value storage and also has persistence to disk. http://www.couchbase.com/couchbase-server/features Also has had some recent funding so hopefully that means stability. =)
CouchDB/PouchDB. You can use PouchDB purely on the client side and it can connect to a server side CouchDB. CouchDB might not have the same momentum as Mongo or Couchbase, but it's an actively supported product and does key/value with persistence to disk.
Riak. http://basho.com/riak/. Another NoSQL that scales and is a key/value store.
You can install and run a proof-of-concept on all of the above products in a few hours. I would recommend this for the following reasons:
A given database might scale and hit your points, but be unpleasant to use. Consider picking a database that feels fun! Sort of akin to picking Ruby/Python over Java because the syntax is nicer.
Your use case and domain will be fairly unique. Worth testing various products to see what fits best.
Each database has quirks and you won't find those until you actually try one. One might have quirks that are passable, one will have quirks that are a show stopper.
The benefit of trying all of them is that they all support schemaless data, so if you write JSON, you can use all of them! No need to create objects in your code for each database.
If you abstract the database correctly in code, swapping out data stores won't be that painful. In other words, your code will be happier if you make it easy to swap out data stores.
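To make the "abstract the database" point a bit more concrete, here is a minimal sketch in Python. Everything in it (the PostStore interface, InMemoryPostStore, render_title) is a hypothetical name made up for illustration, not part of any of the products listed above. The idea is just that the blog code talks to a small interface, and each database you try out gets its own class behind it.

```python
from typing import Dict, Optional, Protocol


class PostStore(Protocol):
    """Hypothetical interface the blog engine codes against."""

    def get(self, slug: str) -> Optional[dict]: ...

    def put(self, slug: str, post: dict) -> None: ...


class InMemoryPostStore:
    """Stand-in backend; a MongoDB or Redis backed class would expose the same two methods."""

    def __init__(self) -> None:
        self._posts: Dict[str, dict] = {}

    def get(self, slug: str) -> Optional[dict]:
        return self._posts.get(slug)

    def put(self, slug: str, post: dict) -> None:
        # Copy the dict so later changes by the caller do not leak into the store.
        self._posts[slug] = dict(post)


def render_title(store: PostStore, slug: str) -> str:
    """Application code only sees the interface, never a concrete database."""
    post = store.get(slug)
    return post["title"] if post else "Not found"


store = InMemoryPostStore()
store.put("hello-world", {"title": "Hello, world", "body": "First post."})
print(render_title(store, "hello-world"))
```

Swapping data stores then means writing one more class with get and put, while render_title and the rest of the engine stay untouched.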
This is only an option for really simple CMSes, but it sounds like that's what you're building.
If your blog is super-simple as you describe and your main concern is very high traffic then the best option might be to avoid a database entirely and have your CMS generate static files instead. By doing this, you eliminate all your database concerns completely.
It's not the best option if you're doing anything dynamic or complex, but in this small use case it might fit the bill.
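A rough sketch of what "generate static files" could look like, assuming posts are plain dicts with slug, title and body fields (those field names and the build_site helper are my own assumptions, not something from the question). Each post is written out once as an HTML file, and the web server then serves the files without touching any database.

```python
import html
import tempfile
from pathlib import Path


def render_post(post: dict) -> str:
    """Turn one post dict into a tiny HTML page, escaping user-supplied text."""
    title = html.escape(post["title"])
    body = html.escape(post["body"])
    return (
        "<!doctype html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n<p>{body}</p></body></html>\n"
    )


def build_site(posts, out_dir: Path) -> list:
    """Write one <slug>.html file per post and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in posts:
        target = out_dir / f"{post['slug']}.html"
        target.write_text(render_post(post), encoding="utf-8")
        written.append(target)
    return written


# Example run against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_posts = [{"slug": "hello", "title": "Hello", "body": "First <post>"}]
    for path in build_site(demo_posts, Path(tmp)):
        print(path.name)
```

You would rerun build_site whenever a post is added or edited, which is cheap when writes are rare.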
I have to implement caching for a function that processes strings of varying lengths (a couple of bytes up to a few kilobytes). My intention is to use a database for this: basically one big table with input and output columns and an index on the input column. The cache would try to find the string in the input column and get the output column, probably one of the simplest database applications imaginable.
What database would be best for this application? A fully-featured database like MySQL or a simple one like sqlite3? Or is there an even better way that does not use a database?
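For reference, the table described above could be sketched with Python's built-in sqlite3 module roughly like this. The table name, the column names and the cached_call helper are assumptions for illustration; making the input column the primary key gives the index on it for free.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the cache table; the primary key doubles as the index on input."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "input TEXT PRIMARY KEY, output TEXT NOT NULL)"
    )
    return conn


def cached_call(conn, func, text):
    """Return the stored output for text, computing and storing it on a miss."""
    row = conn.execute(
        "SELECT output FROM cache WHERE input = ?", (text,)
    ).fetchone()
    if row is not None:
        return row[0]
    result = func(text)
    conn.execute(
        "INSERT OR REPLACE INTO cache (input, output) VALUES (?, ?)",
        (text, result),
    )
    conn.commit()
    return result


conn = open_cache()
print(cached_call(conn, str.upper, "hello"))  # computed on first call
print(cached_call(conn, str.upper, "hello"))  # served from the table
```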
Key-value and document stores are made for this. I highly recommend Redis for this specific problem. It is a "key-value" store, meaning it does not have relations or schemas; all it does is map keys to values, which sounds like just what you need.
Alternatives are MongoDB and CouchDB. Look around and see what suits you best. My recommendation stays with Redis though.
Reading: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Joe has some good recommendations for data stores that are commonly used for caching. I would say Redis, Couchbase (not CouchDB though, as it goes to disk fairly frequently and is not that fast in my experience) and just plain Memcached.
MongoDB can be used for caching, but I don't think it's quite as tuned for pure caching as something like Redis is. Mongo can hit the disk quite a bit.
Also I highly recommend using time to live (TTL) as your main caching strategy. Just give a value some time to expire and then re-populate it later. It is a very hard problem to pro-actively find all instances of some data in a cache and refresh it.
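Here is a small in-process sketch of the TTL idea, just to show the mechanics. The TTLCache class and its method names are made up for this example; with Redis or Memcached you would normally set the expiry on the key itself instead of tracking it by hand.

```python
import time


class TTLCache:
    """Keep each value for ttl_seconds, then recompute it on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = compute(key)  # missing or expired: re-populate
        self._entries[key] = (now + self._ttl, value)
        return value


cache = TTLCache(ttl_seconds=60)
print(cache.get_or_compute("hello", str.upper))
```

Nothing ever goes hunting for stale entries; they simply age out and get refreshed the next time someone asks for them.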
What type of NoSQL database is best suited to store hierarchical data?
Say for example I want to store posts of a forum with a tree structure:
original post
+ re: original post
+ re: original post
  + re2: original post
    + re3: original post
  + re2: original post
MongoDB and CouchDB offer solutions, but not built-in functionality. See this SO question on representing hierarchy in a relational database, as most other NoSQL solutions I've seen are similar in this regard: you have to write your own algorithms for recalculating that information as nodes are added, deleted and moved. Generally speaking, you're choosing between fast read times (e.g. nested set) and fast write times (adjacency list). See the aforementioned SO question for more options along these lines; the flat table approach appears most aligned with your question.
One standard that does abstract away these considerations is the Java Content Repository (JCR); both Apache Jackrabbit and JBoss eXo are implementations. Note that behind the scenes both are still doing some sort of algorithmic calculations to maintain hierarchy as described above. In addition, the JCR also handles permissions, file storage, and several other aspects, so it may be overkill for your project.
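To illustrate the adjacency list side of that trade-off, here is a minimal Python sketch. The records and their nesting are assumed sample data loosely modelled on the forum example in the question. Each post only stores its parent's id, so a write is a single insert, and the tree has to be reassembled at read time.

```python
from collections import defaultdict

# Assumed sample data: each record points at its parent (None marks the root).
posts = [
    {"id": 1, "parent_id": None, "title": "original post"},
    {"id": 2, "parent_id": 1, "title": "re: original post"},
    {"id": 3, "parent_id": 1, "title": "re: original post"},
    {"id": 4, "parent_id": 3, "title": "re2: original post"},
    {"id": 5, "parent_id": 4, "title": "re3: original post"},
    {"id": 6, "parent_id": 3, "title": "re2: original post"},
]


def render_thread(records):
    """Rebuild the tree from parent pointers and return indented lines."""
    children = defaultdict(list)
    for record in records:
        children[record["parent_id"]].append(record)

    lines = []

    def walk(parent_id, depth):
        for record in children.get(parent_id, []):
            lines.append("  " * depth + record["title"])
            walk(record["id"], depth + 1)

    walk(None, 0)
    return lines


print("\n".join(render_thread(posts)))
```

A nested set layout moves that work to write time instead: reads become a simple range query, but inserting a reply means renumbering part of the tree.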
What you possibly need is a document-oriented database like MongoDB or CouchDB.
See examples of different techniques which allow you to store hierarchical data in MongoDB:
http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
The most common hierarchical database is IBM's IMS. There is also the Caché database.
See this question posted on the DBA section of Stack Exchange.
Faced with the same issue, I decided to create my own (very simple) solution using Lua + Redis https://github.com/qbolec/Redis-Tree/
eXist-db implements a hierarchical data model for XML persistence.
Graph databases would probably also solve this problem. If neo4j is not enough for you in terms of scaling, consider Titan, which is based on various storage back-ends including HBase and should scale very well. It is not as mature as neo4j, but it is a very promising project.
LDAP, obviously. OpenLDAP would make short work of it.
In mathematics, and, more specifically, in graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path. So any graph db will do the job for sure. BTW an ordinary graph like a tree can be simply mapped to any relational or non-relational DB. To store hierarchical data into a relational db take a look at this awesome presentation by Bill Karwin. There are also ORMs with facilities to store trees. For example TypeORM supports the Adjacency list and Closure table patterns for storing hierarchical structures.
TypeORM is used in TypeScript/JavaScript development. Check popular ORMs to find one that supports trees in your environment.
The king of non-relational DBs [IMHO] is MongoDB. Check out its documentation to find out how it stores trees. Trees are the most common kind of graph and they are used everywhere. Any well-established DB solution should have a way to deal with trees.
Here's a non-answer for you: SQL Server 2008! It's great for recursive queries. Or you can go the old-fashioned route and store hierarchy data in a separate table to avoid recursion.
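As a rough idea of what such a recursive query looks like, here is a sketch using a recursive common table expression. It runs against SQLite through Python's sqlite3 module purely as a stand-in so the example is self-contained; the posts table and its columns are assumptions. On SQL Server the same pattern is written with a plain WITH, without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES posts(id), "
    "title TEXT NOT NULL)"
)
# Assumed sample thread: post 1 is the root, the others are replies.
conn.executemany(
    "INSERT INTO posts (id, parent_id, title) VALUES (?, ?, ?)",
    [
        (1, None, "original post"),
        (2, 1, "re: original post"),
        (3, 1, "re: original post"),
        (4, 3, "re2: original post"),
        (5, 4, "re3: original post"),
        (6, 3, "re2: original post"),
    ],
)


def fetch_subtree(conn, root_id):
    """Return (id, title, depth) for root_id and all of its descendants."""
    return conn.execute(
        """
        WITH RECURSIVE thread(id, title, depth) AS (
            SELECT id, title, 0 FROM posts WHERE id = ?
            UNION ALL
            SELECT p.id, p.title, t.depth + 1
            FROM posts AS p
            JOIN thread AS t ON p.parent_id = t.id
        )
        SELECT id, title, depth FROM thread ORDER BY depth, id
        """,
        (root_id,),
    ).fetchall()


for row in fetch_subtree(conn, 3):
    print(row)
```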
I think relational databases lend themselves very well to tree data, both in query performance and ease of use. With one caveat: you will be inserting into an indexed table, and probably several other indexed tables, every time someone makes a post. Insert performance could be an issue on a Facebook-caliber forum.
Check out MarkLogic. You can download a demo copy from the website. It is a database for unstructured data and falls under the NoSQL classification of databases. I know unstructured data is a pretty loaded term, but just think of it as data that does not fit well in the rows and columns of an RDBMS (like hierarchical data).
Just spent the weekend at a training course using a MUMPS DB as the back end for a full-stack JavaScript browser application development framework. Great stuff! I'd recommend the GT.M distro of MUMPS under GPL. Or try http://sourceforge.net/projects/mumps/?source=recommended for vanilla MUMPS. Check out http://robtweed.wordpress.com/ for the ewd.js JS framework and more info on MUMPS.
A NoSQL storage service with native support for hierarchical data is Amazon Web Services' Simple Storage Service (AWS S3). The path-based keys are hierarchical by nature, and the blob values may be typed using attributes (MIME type, e.g. application/json, text/csv, etc.). Advantages of S3 include versioning, the ability to scale to extremely large overall capacity, and nearly infinite concurrent writes. It is also purely usage-based, so wide variations in demand do not require complex scaling infrastructure or over-provisioned capacity. Disadvantages include no support for conditional writes (optimistic concurrency), consistent reads only for read-after-write, and no support for references/relationships.
ClickHouse has explicit support for hierarchical data.