Has anyone tried or even thought about using a Terracotta solution (i.e. Ehcache) to run/store Neo4j?
I understand Neo4j has High Availability, but that's really just replication. What I really want is a distributed graph solution, hence Neo4j on something like Ehcache.
Any thoughts/suggestions?
Thanks!
Running graph databases efficiently across multiple hosts is very challenging.
This article on Cache Sharding explains one solution:
http://jim.webber.name/2011/02/23/abe72f61-27fb-4c1b-8ce1-d0db7583497b.aspx
Note particularly the statement: "partitioning graphs across physical instances is a notoriously difficult way to scale graph data", referring to a previous article that discusses the details of why this is a hard problem:
http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx
I am making a music app with social networking features. I was hoping to power my database with Neo4j and Redis. In Neo4j I will store user info, and all other information (posts, reviews, etc.) will go in Redis. Does anyone have any advice or insight on this?
Short answer: it depends.
Longer answer:
I'm assuming that you are just starting with the app and want to have quick feedback if it is a thing you want to invest (time/money) in.
If you want to run queries like "which users reviewed the same song" you need to put this data into Neo4J. In general, the more connected data you have there, the more interesting the questions you can answer. So I would err on the side of putting data into Neo4j. Also, only querying one database is easier to implement than aggregating data over multiple ones.
If you get enough users that the amount of data they produce starts to impact Neo4j, you can put the actual review text or post into redis and reference it by an id from Neo4j. But by then you already know it is worth doing and this is a fairly manageable refactoring and data migration.
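A minimal sketch of that split, assuming plain Python containers as stand-ins for both stores: `graph_edges` plays the role of the Neo4j relationships and `kv_store` the role of Redis. All names here are hypothetical and no real client API is used; the point is only that the graph side keeps the connections plus an id, while the bulky text lives elsewhere.

```python
from collections import defaultdict

# Stand-ins for the two stores (assumptions, not real client APIs):
# graph_edges plays the role of Neo4j relationships, kv_store the role of Redis.
graph_edges = []   # (user, song, review_id) triples
kv_store = {}      # "review:<id>" -> review text


def add_review(user, song, review_id, text):
    """Keep the connection on the graph side and the bulky text on the key-value side."""
    graph_edges.append((user, song, review_id))
    kv_store[f"review:{review_id}"] = text


def users_who_reviewed_same_song():
    """Group reviewers by song and keep only songs with more than one reviewer."""
    by_song = defaultdict(set)
    for user, song, _ in graph_edges:
        by_song[song].add(user)
    return {song: users for song, users in by_song.items() if len(users) > 1}


def review_text(review_id):
    """Resolve the id stored on the graph side to the text on the key-value side."""
    return kv_store[f"review:{review_id}"]


add_review("alice", "song-1", 1, "Great chorus")
add_review("bob", "song-1", 2, "Too long")
add_review("carol", "song-2", 3, "Nice intro")
shared = users_who_reviewed_same_song()
```

The "same song" question only touches the graph side; the key-value side is consulted once a specific review has to be displayed.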
Neo4j is a graph database. However, it does not support sharding (horizontal partitioning). The good thing about using Neo4j is that you can store a graph data structure and run graph algorithms easily with the Neo4j query language. This may be useful for analyzing some social network properties. The bad thing is that, because Neo4j does not support sharding, the capacity of the database is limited to a single node. When the data size increases, its performance may be impacted.
Redis is always useful for caching data, which can be a good choice.
IMHO, in the same situation I would try to store everything in Neo4j.
I'm building a website that will rely on heavy computations to make guesses and suggestions about objects (considering the user's preferences and those of users with similar profiles). Right now I'm using MongoDB for my projects, but I suppose that I'll have to go back to SQL for this one.
Unfortunately my knowledge of the subject is at high school level. I know that there are a lot of relational databases, and I was wondering which of them would be most appropriate for this kind of heavily dynamic cluster analysis. I would also really appreciate suggestions for reading material (free and online would be really nice, but I won't mind reading a book, just maybe not a 1,000-page one if possible).
Thanks for your help, extremely appreciated.
Recommendations are typically a graph-like problem, so you should also consider looking into graph databases, e.g. Neo4j.
What type of NoSQL database is best suited to store hierarchical data?
Say for example I want to store posts of a forum with a tree structure:
original post
+ re: original post
+ re: original post
+ re2: original post
+ re3: original post
+ re2: original post
MongoDB and CouchDB offer solutions, but not built-in functionality. See this SO question on representing hierarchy in a relational database; most other NoSQL solutions I've seen are similar in this regard, in that you have to write your own algorithms for recalculating that information as nodes are added, deleted and moved. Generally speaking you're making a decision between fast read times (e.g. nested set) or fast write times (adjacency list). See the aforementioned SO question for more options along these lines; the flat table approach appears most aligned with your question.
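The read/write trade-off between the two layouts can be sketched in plain Python. This is only an illustration with hypothetical post ids, not tied to any particular database: the adjacency list stores one parent pointer per post, while the nested set stores a (left, right) pair that has to be recomputed when the tree changes.

```python
# Adjacency list: each post stores only its parent id. Inserts are cheap,
# but reading a whole subtree means walking the tree.
parent_of = {1: None, 2: 1, 3: 1, 4: 3, 5: 4}  # hypothetical post ids


def children_index():
    """Invert the parent pointers into a parent -> [children] mapping."""
    children = {}
    for node, parent in parent_of.items():
        children.setdefault(parent, []).append(node)
    return children


def subtree_adjacency(root):
    """Collect root and all descendants by repeatedly following child links."""
    children = children_index()
    result, stack = [], [root]
    while stack:
        node = stack.pop()
        result.append(node)
        stack.extend(children.get(node, []))
    return sorted(result)


def build_nested_set(root):
    """Assign (left, right) numbers with a depth-first walk; redone after writes."""
    children = children_index()
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root)
    return bounds


def subtree_nested(bounds, root):
    """A subtree read is a single range comparison, with no tree walk."""
    left, right = bounds[root]
    return sorted(n for n, (l, r) in bounds.items() if left <= l and r <= right)


bounds = build_nested_set(1)
```

Adding a reply under the adjacency list is one new entry in `parent_of`; under the nested set it means renumbering, which is the write cost mentioned above.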
One standard that does abstract away these considerations is the Java Content Repository (JCR); both Apache Jackrabbit and JBoss eXo are implementations. Note that behind the scenes both are still doing some sort of algorithmic calculations to maintain hierarchy as described above. In addition, the JCR also handles permissions, file storage, and several other aspects, so it may be overkill for your project.
What you possibly need is a document-oriented database like MongoDB or CouchDB.
See examples of different techniques which allow you to store hierarchical data in MongoDB:
http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
The most common one is IBM's IMS. There is also the Caché database.
See this question posted on the DBA section of Stack Exchange.
Faced with the same issue, I decided to create my own (very simple) solution using Lua + Redis https://github.com/qbolec/Redis-Tree/
eXist-db implements a hierarchical data model for XML persistence.
Graph databases would probably also solve this problem. If Neo4j is not enough for you in terms of scaling, consider Titan, which is based on various storage back-ends including HBase and should scale very well. It is not as mature as Neo4j, but it is a very promising project.
LDAP, obviously. OpenLDAP would make short work of it.
In mathematics, and, more specifically, in graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path. So any graph db will do the job for sure. BTW an ordinary graph like a tree can be simply mapped to any relational or non-relational DB. To store hierarchical data into a relational db take a look at this awesome presentation by Bill Karwin. There are also ORMs with facilities to store trees. For example TypeORM supports the Adjacency list and Closure table patterns for storing hierarchical structures.
TypeORM is used in TypeScript/JavaScript development. Check popular ORMs to find one that supports trees in your environment.
The king of non-relational DBs [IMHO] is MongoDB. Check out its documentation to find out how it stores trees. Trees are the most common kind of graphs and they are used everywhere. Any well-established DB solution should have a way to deal with trees.
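For readers unfamiliar with the closure table pattern, here is a rough sketch using Python's built-in `sqlite3` module rather than TypeORM. The table and column names (`post`, `post_closure`, `depth`) are made up for illustration; the idea is that every (ancestor, descendant) pair gets its own row, so a whole subtree can be read with one join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    -- One row per (ancestor, descendant) pair, including each node paired with itself.
    CREATE TABLE post_closure (ancestor INTEGER, descendant INTEGER, depth INTEGER);
""")


def add_post(post_id, title, parent_id=None):
    """Insert a post and link it to itself and to every ancestor of its parent."""
    conn.execute("INSERT INTO post VALUES (?, ?)", (post_id, title))
    conn.execute("INSERT INTO post_closure VALUES (?, ?, 0)", (post_id, post_id))
    if parent_id is not None:
        conn.execute(
            """INSERT INTO post_closure (ancestor, descendant, depth)
               SELECT ancestor, ?, depth + 1
               FROM post_closure WHERE descendant = ?""",
            (post_id, parent_id),
        )


def descendants(post_id):
    """Return titles of all replies below a post, nearest level first."""
    rows = conn.execute(
        """SELECT p.title
           FROM post_closure c JOIN post p ON p.id = c.descendant
           WHERE c.ancestor = ? AND c.depth > 0
           ORDER BY c.depth, p.id""",
        (post_id,),
    )
    return [title for (title,) in rows]


add_post(1, "original post")
add_post(2, "re: original post", parent_id=1)
add_post(3, "re2: original post", parent_id=2)
add_post(4, "second re: original post", parent_id=1)
```

Reads are a single join, at the price of extra rows written on every insert.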
Here's a non-answer for you. SQL Server 2008!!!! It's great for recursive queries. Or you can go the old-fashioned route and store hierarchy data in a separate table to avoid recursion.
I think relational databases lend themselves very well to tree data, both in query performance and ease of use. With one caveat... you will be inserting into an indexed table, and probably several other indexed tables, every time someone makes a post. Insert performance could be an issue on a Facebook-caliber forum.
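As an illustration of the recursive-query route, here is a small sketch that uses SQLite (through Python's `sqlite3` module) as a stand-in for SQL Server, so the details are an assumption rather than the setup described above. SQLite spells the construct `WITH RECURSIVE`, whereas SQL Server writes the same kind of common table expression with a plain `WITH`. The `post` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT);
    INSERT INTO post VALUES
        (1, NULL, 'original post'),
        (2, 1, 're: original post'),
        (3, 2, 're2: original post'),
        (4, 3, 're3: original post');
""")

# Start from the root post, then repeatedly join replies onto rows found so far.
thread = conn.execute("""
    WITH RECURSIVE thread(id, title, depth) AS (
        SELECT id, title, 0 FROM post WHERE parent_id IS NULL
        UNION ALL
        SELECT p.id, p.title, t.depth + 1
        FROM post p JOIN thread t ON p.parent_id = t.id
    )
    SELECT id, depth FROM thread ORDER BY depth
""").fetchall()
```

Each post only stores its `parent_id`, so inserts stay cheap and the recursion happens at read time.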
Check out MarkLogic. You can download a demo copy from the website. It is a database for unstructured data and falls under the NoSQL classification of databases. I know unstructured data is a pretty loaded term but just think of it as data that does not fit well in the rows and columns of a RDBMS (like hierarchical data).
Just spent the weekend at a training course using a MUMPS db as a back-end for a full-stack JavaScript browser application development framework. Great stuff! I'd recommend the GT.M distro of MUMPS under GPL. Or try http://sourceforge.net/projects/mumps/?source=recommended for vanilla MUMPS. Check out http://robtweed.wordpress.com/ for the ewd.js JS framework and more info on MUMPS.
A NoSQL storage service with native support for hierarchical data is Amazon Web Services' Simple Storage Service (AWS S3). The path-based keys are hierarchical by nature, and the blob values may be typed using attributes (MIME type, e.g. application/json, text/csv, etc.). Advantages of S3 include the ability to scale to extremely large overall capacity, versioning, and nearly unlimited concurrent writes. Disadvantages include no support for conditional writes (optimistic concurrency) or consistent reads (only read-after-write), and no support for references/relationships. It is also purely usage-based, so wide variations in demand do not require complex scaling infrastructure or over-provisioned capacity.
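To show what "hierarchical by nature" means for path-based keys, here is a rough in-memory sketch. It does not call S3 or any AWS SDK; `list_children` is a hypothetical helper loosely modeled on listing keys by prefix and delimiter, and the key names are invented.

```python
def list_children(keys, prefix, delimiter="/"):
    """Return the immediate 'children' under a prefix in a flat list of keys.

    Keys that continue past the next delimiter are collapsed into a single
    folder-like entry ending with the delimiter.
    """
    children = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        children.add(head + sep)
    return sorted(children)


# Flat keys whose paths encode a forum thread (hypothetical layout).
keys = [
    "forum/post-1/body.json",
    "forum/post-1/re-1/body.json",
    "forum/post-2/body.json",
]

top_level = list_children(keys, "forum/")
under_post_1 = list_children(keys, "forum/post-1/")
```

The store itself stays flat; the hierarchy exists only in how the keys are named and listed.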
ClickHouse has explicit support for hierarchical data.
I am interested in knowing a little bit more about databases than I currently know. I know how to set up a database backend for any web app that I happen to be creating, but that is all. For example, if I were creating three different apps I would simply create three different databases and then configure each database for the particular app. This is all simple knowledge, and I would now like to have a deeper understanding of how databases actually work.
Let's say that I developed an application that needed a lot of space and processing power. This database would then have to be spread over numerous machines. How exactly would a database be spread across numerous machines and still be able to write records and then retrieve them? Would each table get its own machine, and what software is needed to make sure that the different machines have all performed their transactions successfully?
As you can see, I am quite a database ignoramus lol.
Any help in clearing this up would be greatly appreciated.
I don't know what RDBMS you're using but I have two book suggestions.
For theory (which should come first, in my opinion): Database in Depth: Relational Theory for Practitioners
For implementation: High Performance MySQL: Optimization, Backups, Replication, and More
I own both these books and they are both pretty great, especially the first one.
That's quite a broad topic... You might want to start with Multi-master replication, High-availability clustering and Massively parallel processing.
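To make the "spread across machines" part of the question a bit more concrete, here is a toy sketch of one common approach, hash partitioning, where each record key decides which machine holds it. This is an assumption chosen for illustration, not a description of any particular product; each "machine" is just a Python dict and `ShardedStore` is a made-up name. It ignores replication and cross-machine transactions entirely.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that splits records across several 'machines'."""

    def __init__(self, shard_count):
        # Each dict stands in for the storage of one machine.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        """Pick a shard from a stable hash of the key, so reads find earlier writes."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, record):
        self.shards[self._shard_for(key)][key] = record

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)


store = ShardedStore(shard_count=3)
for user_id in range(100):
    store.put(f"user:{user_id}", {"id": user_id})
```

Because the shard is computed from the key alone, a read goes straight to the machine that received the write; it is rows, not whole tables, that get spread out.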
If you want to know about how to keep databases running with ever increasing load, then it's not a basic question. Several well known web companies are struggling to find the right way to make their database scalable.
Using memcached to cache database information is one way to decrease load on your database if your application is read-intensive. If your application is write-intensive, then you may want to consider using a NoSQL datastore like MongoDB or Redis.
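The read-side caching idea can be sketched as follows. A plain dict stands in for memcached and a callable stands in for the database query, so none of this is a real memcached client API; `CacheAside` and its methods are hypothetical names.

```python
class CacheAside:
    """Check the cache first and fall back to the database only on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable standing in for a DB query
        self._cache = {}                   # dict standing in for memcached
        self.db_reads = 0                  # counts how often the database was hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying row changes."""
        self._cache.pop(key, None)


database = {"user:1": "alice"}
cache = CacheAside(database.get)
first = cache.get("user:1")
second = cache.get("user:1")
```

Repeated reads of the same key reach the database once, which is where the load reduction for read-heavy applications comes from; writes still need to invalidate or update the cached entry.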
Database Design for Mere Mortals
This is the best book about the subject if you don't have any experience with databases. It's got historical background and practical examples. Most books often skip the historical stuff because they assume you know what a db is, or it doesn't matter, and jump right to the practical. This book gives you the complete picture.
Just came across the FlockDB graph database. Details at github /flockDB. Twitter claims it uses FlockDB for the following:
Twitter runs FlockDB on a large cluster of machines. We use it to store social graphs (who follows whom, who blocks whom) and secondary indices at Twitter.
At first glance, setting it up and trying it out doesn't look straightforward. Has anyone already set it up or used it? If so, please answer the following general questions.
What kind of applications is it better suited for? (Twitter claims it is simple and very rough; it remains to be seen what that means, though.)
How is FlockDB better than other graph DBs / NoSQL DBs? Have you set up FlockDB and used it for an application?
Any early advice?
Note: I am evaluating FlockDB and other graph databases mainly to learn them. Perhaps I will build an application for that purpose.
FlockDB is still yet to be released by Twitter, which means the current version you are seeing won't run properly. Going by the commit history, I guess that within a couple of days you will see a stable version that you can build and test.
Compared to something like Neo4j, you could say FlockDB is not even a graph database. The toughest part of a graph database is how many levels of depth it can handle. From the little documentation FlockDB has, it seems it can't handle more than one level of depth. Where FlockDB wins compared to databases like Neo4j is its low latency, high throughput, and inherently distributed nature.
Regarding applications, I guess it will be a great fit whenever you need social networking or Twitter-like behavior. I don't think many will find such use cases though (who gets 20k friend requests per second?).
I just started looking into FlockDB. Right now I am planning to use it in my forum software. Instead of a "user1 follows user2" relationship, I am planning to use it for "user1 read post1", "user1 favorites post1", etc. Being one of the highly active online communities, we get a lot of such traffic (read/favorite). I can't think of any other use cases now.
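A toy picture of the "one level of depth" model described above: edges such as "user1 read post1" are stored so they can be looked up from either end, with no multi-hop traversal. This is a plain Python sketch under that assumption, not FlockDB's actual API or storage layout; `EdgeStore` and its method names are hypothetical.

```python
from collections import defaultdict


class EdgeStore:
    """Stores typed edges and answers one-hop lookups in both directions."""

    def __init__(self):
        self._forward = defaultdict(set)   # (source, kind) -> destinations
        self._backward = defaultdict(set)  # (destination, kind) -> sources

    def add(self, source, kind, destination):
        self._forward[(source, kind)].add(destination)
        self._backward[(destination, kind)].add(source)

    def outgoing(self, source, kind):
        """For example: which posts has this user favorited?"""
        return sorted(self._forward[(source, kind)])

    def incoming(self, destination, kind):
        """For example: which users have read this post?"""
        return sorted(self._backward[(destination, kind)])


edges = EdgeStore()
edges.add("user1", "read", "post1")
edges.add("user2", "read", "post1")
edges.add("user1", "favorites", "post1")
edges.add("user1", "favorites", "post2")
```

Anything deeper, such as "posts favorited by users who read post1", would have to be assembled by the application from several one-hop lookups.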
Don't miss OrientDB. It's a document-graph DBMS with a special operator for traversing relationships: http://code.google.com/p/orient/wiki/GraphDatabase