How to best model authorization of access to nodes in a graph database for a CMS - graph-databases

I'm trying to figure out how to best model User -> Role -> Permission -> Content in a graph database for a CMS I'm attempting to build.
My current approach is to have User and Role as nodes and then each type of permission as a separate edge between Role and Content.
I did, however, also read an article, https://medium.com/@gadberger/role-based-access-control-using-a-graph-database-2573debb685e, that suggests a single edge between Role and Content with a property for each permission.
Finally, there is http://www.charlesnurse.com/post/Look-Mom-NoSQL-10-An-Introduction-to-Graph-Databases, which suggests having User, Role, Permission and Content as nodes. That basically only differs from my current solution by having Permission as a node rather than an edge.
I'm a little confused whether to model permissions as:
a node
an edge with all separate permissions as properties on that edge
a separate edge for each individual permission
What are the pros / cons of each and what would best fit in the context of a CMS?
I'm using ArangoDB.

I'm not sure I can fully answer these questions for you, aside from some guidance based on node/edge differences.
The main feature of a node is that you can link other nodes to it with an edge. You cannot create links between edges.
Given that, I think the implementation is up to you, based on your data, understanding of the graph, and how much management of individual permissions you want to do (changing a single node vs. lots of edges).
ArangoDB allows you to define multiple "named" graphs over the same nodes, which could be handy. For instance, you could have one graph that defines correlated documents and another that defines permissions. The two graphs could be completely unrelated and limiting the "named graph" to a specific purpose might speed traversal significantly (YMMV).
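To make the node/edge trade-off concrete, here is a minimal in-memory sketch of the three options from the question. It is not ArangoDB code: the dictionaries only mimic the `_from`/`_to` shape of edge documents, and every identifier (users/alice, roles/editor, the function names, and so on) is a made-up example rather than anything from the question or the linked articles.

```python
# Hypothetical sample data: which roles each user holds.
user_roles = {
    "users/alice": {"roles/editor"},
    "users/bob": {"roles/viewer"},
}

# Option 1: Permission is a node. Role -> Permission -> Content.
permission_nodes = {
    "permissions/read_page1": {"action": "read", "content": "content/page1"},
    "permissions/write_page1": {"action": "write", "content": "content/page1"},
}
role_has_permission = {
    "roles/editor": {"permissions/read_page1", "permissions/write_page1"},
    "roles/viewer": {"permissions/read_page1"},
}

def allowed_with_permission_nodes(user, action, content):
    # Walk user -> role -> permission node and compare its target.
    for role in user_roles.get(user, set()):
        for perm_id in role_has_permission.get(role, set()):
            perm = permission_nodes[perm_id]
            if perm["action"] == action and perm["content"] == content:
                return True
    return False

# Option 2: one Role -> Content edge, permissions stored as edge properties.
role_content_edges = [
    {"_from": "roles/editor", "_to": "content/page1", "read": True, "write": True},
    {"_from": "roles/viewer", "_to": "content/page1", "read": True, "write": False},
]

def allowed_with_edge_properties(user, action, content):
    roles = user_roles.get(user, set())
    return any(
        edge["_from"] in roles and edge["_to"] == content and edge.get(action, False)
        for edge in role_content_edges
    )

# Option 3: a separate edge for each individual permission.
permission_edges = {
    ("roles/editor", "read", "content/page1"),
    ("roles/editor", "write", "content/page1"),
    ("roles/viewer", "read", "content/page1"),
}

def allowed_with_typed_edges(user, action, content):
    roles = user_roles.get(user, set())
    return any((role, action, content) in permission_edges for role in roles)
```

In this sketch, option 1 lets several roles share one permission node, so a permission can be changed in one place. Options 2 and 3 keep the traversal one hop shorter, but a change has to touch every affected edge.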

Related

What is the best way to implement permission levels in the MEAN stack?

I've been following the examples in the book "MEAN Machine", and I've implemented a simple token-based authentication system that makes the contents of a certain model only available to authenticated users.
I'd like to take this to a more complex level: I need three different user types.
I am building an app where some users (let's say, vendors) can upload certain data that could only be accessible to certain authenticated users (let's say, consumers), but vendors also need to be able to see, but not edit data uploaded by other vendors. Then, there would be a third type of user, the admin, who would be able to edit and see everything, including the details of other, lower level users.
How should I proceed in constructing this?
Thanks in advance for your help.
You mentioned that the authentication system is already working, so what you need to implement now is an access control list (ACL). The final ACL implementation depends a lot on your database model and requirements. There are also Node modules that support more advanced models, such as the acl module https://www.npmjs.com/package/acl, which also supports MongoDB.

Graph database to return a list of common friends between 2 people in a social network

Are there any graph databases that have a built-in feature to return a list of common friends among 2 or more people, just like in a social network such as Facebook? The result should be returned as fast as possible without the need to perform complex calculations or to traverse the database. If not, what are the ways to implement it? What about OrientDB? What about using a combination of a graph database and Redis?
Not sure about specific graph databases (I come at this from building my own graph database on top of Redis), but assuming that friend means a direct connection, finding common friends is fairly simple: just get the full friends list from each person and calculate the intersection.
Redis has a command to do this natively and very fast. The SQL query for it is also fairly simple. Getting all connections for a single node should be available on any graph database, and even if you need to retrieve the full lists and calculate the intersection in app code, performance will probably be adequate as long as you don't have to deal with people who have thousands/millions of friends.
Where it gets more complex is dealing with indirect relationships - the intersection operation is the same, but the sets don't exist in the form needed without traversing the graph, so before calculating the intersection you need to build a set of all second level connections for each user. You can either do this as the first step of your query or maintain permanent sets updated when connections change - the appropriate method depends on whether you need to optimize for data usage and write performance or read performance.
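The set-intersection approach described above can be sketched in plain Python. The friendship data and function names here are invented for illustration, and excluding direct friends from the second-level set is one possible definition, not the only one. In Redis the direct-friend case maps onto the SINTER command over two sets.

```python
from collections import defaultdict

# Hypothetical undirected friendships.
friendships = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("erin", "carol"), ("erin", "dave"), ("erin", "frank"),
    ("bob", "grace"),
]

# Build one friend set per person (the "full friends list").
friends = defaultdict(set)
for a, b in friendships:
    friends[a].add(b)
    friends[b].add(a)

def common_friends(a, b):
    """Direct common friends: intersect the two friend sets."""
    return friends.get(a, set()) & friends.get(b, set())

def second_level(user):
    """Friends of friends, excluding the user and their direct friends.

    This set has to be built by traversing one extra hop before any
    intersection can be calculated on it.
    """
    direct = friends.get(user, set())
    reachable = set()
    for friend in direct:
        reachable |= friends.get(friend, set())
    return reachable - direct - {user}

def common_second_level(a, b):
    """Intersection of the two second-level sets."""
    return second_level(a) & second_level(b)
```

Computing second_level at query time favors write performance and storage, while storing those sets and updating them whenever a connection changes favors read performance.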

Graph Database to Count Direct Relations

I'm trying to graph the linking structure of a web site so I can model how pages on a given domain link to each other. Note I'm not graphing links to sites not on the root domain.
Obviously this graph could be considerable in size. One of the main queries I want to perform is to count how many pages directly link into a given url. I want to run this against the whole graph (shudder) such that I end up with a list of urls and the count of incoming links to that url.
I know one popular way of doing this would be via some kind of map reduce - and I may still end up going that way - however I have a requirement to be able to view this report in (near) realtime which isn't generally map reduce friendly.
I've had a quick look at Neo4j and OrientDB. While both of these could model the relationship I want, it's not clear if I could query them to generate the report I want. At this point I'm not committed to any particular technology.
Any help would be greatly appreciated.
Thanks,
Paul
Both OrientDB and Neo4J support Blueprints as a common API for graph operations like traversal, counting, etc.
If I've understood your use case correctly, your graph seems pretty simple: you have "URL" vertices that link to each other with one type of edge, "Links".
To execute operations against graphs, take a look at Gremlin.
You might have a look at structr. It is an open source CMS running on top of Neo4j and has exactly those types of inter-page links.
For getting the number of links pointing to the page you just have to iterate the incoming LINKS_TO links for the current page-node.
What is the use case for your query? A popular page list? Would it just contain the top-n pages? You might then try starting at random places in the graph, traversing incoming LINKS_TO relationships to your current node(s) in parallel and putting them into a sorting structure, so that you always start/continue with the first 20 or so top page-nodes that already have the highest number of incoming links (until they're finished).
Marko Rodriguez has some similar "page-rank" examples in the Gremlin documentation. He's also got several blog posts where he talks about this.
Well, with Neo4J you won't be able to split the graph across servers to distribute the load. You could replicate the database to distribute the computation, but then updating will be slow (as you have to replicate the updates). I would attack the problem by keeping a count of inbound links as a property of each node and updating it as new relationships are added. Neo4J has excellent write performance. Of course, you don't strictly need to persist this information, because direct relationships are cheap to retrieve (you don't get a collection of all related nodes, just an iterator).
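The idea of updating an inbound-link count on write can be sketched without any database. The class below is a made-up in-memory stand-in (LinkGraph, add_link and top_pages are hypothetical names, not a Neo4J API); in a real graph database the counter would be a node property.

```python
from collections import Counter

class LinkGraph:
    """In-memory stand-in for a page graph with a per-page inbound counter."""

    def __init__(self):
        self.outgoing = {}              # source url -> set of target urls
        self.inbound_count = Counter()  # target url -> number of pages linking in

    def add_link(self, source, target):
        """Record source -> target and bump the counter only for new edges."""
        targets = self.outgoing.setdefault(source, set())
        if target not in targets:
            targets.add(target)
            self.inbound_count[target] += 1

    def top_pages(self, n):
        """Return the n pages with the most inbound links, highest first."""
        return self.inbound_count.most_common(n)


graph = LinkGraph()
graph.add_link("/home", "/about")
graph.add_link("/blog", "/about")
graph.add_link("/about", "/home")
graph.add_link("/blog", "/about")  # duplicate link, not counted twice
```

Because the count is maintained at write time, the report is a plain read of the counters rather than a scan over the whole graph.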
You should also take a look at a highly scalable graph database product, such as InfiniteGraph. If you email their technical support I think they will be able to point you at some sample code that does a large part of what you've described here.

How to grant permission to view a page based on a hierarchy of users

I'm working on a user facing django application for an enterprise solution. Currently, users are able to categorize data on the site into private collections, visible only to themselves. A feature request is for managers to be able to view the private collections of their subordinates.
My issue is, what is the best solution for implementing this hierarchy? I've thought of a few solutions:
A foreign key from user to user named manager. Create a @user_passes_test test that recurses through the manager relation, starting from the owner of the collection, until a) the requesting user is found to be a manager, or b) manager is Null, indicating the requesting user is not authorized to access this page.
Benefits: simple hierarchy is accurately represented with minimum data
Drawbacks: A large hierarchy results in many queries
Create a many to many relation between users and users called managers. Create all the relationships in this table.
Benefits: Only one query necessary, and users can have multiple managers.
Drawbacks: difficult to change the hierarchy when someone leaves.
I'm open to any other suggestions people have, as well.
A tree of uniform data (where the data referred to by nodes and leaves is of the same class) can often easily be maintained by an SQL-based tree structure. While you can always write one by hand, Django MPTT and Treebeard have both dealt with the issue. I've used Treebeard, but MPTT seems to be more popular.
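For comparison, the first option from the question (walking up the manager relation) can be sketched in plain Python. The manager_of dictionary stands in for the foreign key and all names are hypothetical; in Django each lookup in the loop would be a database query, which is the drawback the question mentions.

```python
# Hypothetical hierarchy: employee -> direct manager (None at the top).
manager_of = {
    "dana": "carl",
    "carl": "beth",
    "beth": None,
    "eve": "beth",
}

def is_manager_of(requester, owner):
    """Return True if requester appears anywhere above owner in the chain."""
    seen = set()  # guards against accidental cycles in the data
    current = manager_of.get(owner)
    while current is not None and current not in seen:
        if current == requester:
            return True
        seen.add(current)
        current = manager_of.get(current)
    return False

def can_view_collection(requester, owner):
    """Owners see their own collections; managers see their subordinates'."""
    return requester == owner or is_manager_of(requester, owner)
```

A tree library such as the ones named above would replace this loop with a single ancestor query.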

Users and roles in context

I'm trying to get a sense of how to implement the user/role relationships for an application I'm writing. The persistence layer is Google App Engine's datastore, which places some interesting (but generally beneficial) constraints on what can be done. Any thoughts are appreciated.
It might be helpful to keep things very concrete. I would like there to be organizations, users, test content and test administrations (records of tests that have been taken). A user can have the role of participant (test-taker), contributor of test material or both. A user can also be a member of zero or more organizations. In the role of participant, the user can see the previous administrations of tests he or she has taken. The user can also see a test administration of another participant if that participant has given the user authorization. The user can see test material that has been made public, and he or she can see restricted content as a participant during a specific administration of a test for which that user has been authorized by an organization. As a member of an organization, the user can see restricted content in the role of contributor, and he or she might or might not also be able to edit the content. Each organization should have one or more administrators that can determine whether a member can see and edit content and determine who has admin privileges. There should also be one or more application-wide superusers that can troubleshoot and solve problems. Members of organizations can see the administrations of tests that the participants concerned have authorized them to see, and they can see anonymous data if no authorization has been given. A user cannot see the test results of another user in any other circumstances.
Since there are no joins in the App Engine datastore, it might be necessary to have things less normalized than usual for the typical SQL database in order to ensure that queries that check permissions are fast (e.g., ones that determine whether a link is to be displayed).
My questions are:
How do I move forward on this? Should I spend a lot of time up front in order to get the model right, or can I iterate several times and gradually roll in additional complexity?
Does anyone have some general ideas about how to break things up in this instance?
Are there any GAE libraries that handle roles in a way that is compatible with this arrangement?
I'm not quite sure I'm understanding your questions correctly, but I'll try my best to answer:
I always find iterative programming easier to test and write, so that's my recommendation.
I think you have the necessary entities already divided correctly, but I think you need an additional entity: Permission, that defines what each role can do, each Role having zero or more Permission links. Just remember that for each many-to-many relationship in GAE you need to either define a list of keys, or a separate entity to be the intermediary.
Not that I know of, but you may want to investigate Django-based role systems and try to adapt a Django-based solution (since Django's been around longer). You can hack Django onto GAE rather nicely with App Engine Patch.
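The suggested Permission entity, with the many-to-many relationship held as a list of keys on the Role, might look roughly like this. It uses plain Python dataclasses rather than any App Engine API, and every class, field and key name is an assumption made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Permission:
    key: str          # e.g. "content.edit"
    description: str

@dataclass
class Role:
    key: str
    # Many-to-many held as a list of Permission keys instead of a join.
    permission_keys: list = field(default_factory=list)

@dataclass
class User:
    name: str
    role_keys: list = field(default_factory=list)

# Hypothetical sample data, keyed the way a key-value lookup would be.
permissions = {
    "content.view": Permission("content.view", "See restricted content"),
    "content.edit": Permission("content.edit", "Edit restricted content"),
}
roles = {
    "participant": Role("participant"),
    "contributor": Role("contributor", ["content.view", "content.edit"]),
}

def has_permission(user, permission_key):
    """Check each of the user's roles for the permission key."""
    return any(
        permission_key in roles[role_key].permission_keys
        for role_key in user.role_keys
        if role_key in roles
    )

alice = User("alice", ["participant", "contributor"])
bob = User("bob", ["participant"])
```

Keeping the keys in a list means a permission check needs only the user's roles, with no join-style lookup through an intermediary entity.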
