I create an edge like so:
CREATE CLASS PumpUpE ABSTRACT EXTENDS E
CREATE CLASS Posted EXTENDS PumpUpE
CREATE EDGE Posted FROM (SELECT FROM User WHERE objectId="vjuQDNCOX4") to #13:491
It all looks fine in OrientDB Studio,
but my Posted class is still empty.
This is the answer taken from https://github.com/orientechnologies/orientdb/wiki/Troubleshooting#why-cant-i-see-all-the-edges:
Why can't I see all the edges?
OrientDB, by default, manages edges as "lightweight" edges if they have no properties. This means that if an edge has no properties, it's not stored as a physical record. But don't worry, your edge is still there, just encoded in a separate data structure. For this reason, if you execute a select from E, no edges or fewer edges than expected are returned. It's extremely rare to need the list of edges, but if this is your case you can disable this feature by issuing this command once (at the cost of a slowdown and a bigger database size):
alter database custom useLightweightEdges=false
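To make the difference visible, here is a toy Python model of the idea. It is only a sketch and not OrientDB's actual storage format: ToyGraph, out_links and edge_records are hypothetical names, and the only behaviour it imitates is that a property-less edge is kept as a link on the vertex instead of as its own record.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    rid: str
    # Lightweight edges live here: edge label -> list of target vertex ids.
    out_links: dict = field(default_factory=dict)


class ToyGraph:
    def __init__(self, use_lightweight_edges=True):
        self.use_lightweight_edges = use_lightweight_edges
        self.vertices = {}
        # Edge records are what a "select from <edge class>" would scan.
        self.edge_records = []

    def add_vertex(self, rid):
        self.vertices[rid] = Vertex(rid)

    def add_edge(self, label, src, dst, **props):
        # The link on the source vertex is always written.
        self.vertices[src].out_links.setdefault(label, []).append(dst)
        # A separate record is written only if the edge has properties
        # or lightweight edges are switched off.
        if props or not self.use_lightweight_edges:
            self.edge_records.append({"label": label, "out": src, "in": dst, **props})

    def select_from(self, label):
        return [e for e in self.edge_records if e["label"] == label]

    def out(self, src, label):
        return list(self.vertices[src].out_links.get(label, []))
```

With the default setting, select_from("Posted") comes back empty while out(...) still finds the target, which matches the symptom in the question. Constructing ToyGraph(use_lightweight_edges=False) makes the edge show up in both.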
Related
I see in the docs that UserSuppliedIds is mentioned as supported for edges, but any edge I create ignores my .property assignment for id and is assigned a GUID. I don't want to add duplicate edges between two vertices, so I was going to assign my own id to each edge (and then I can also query for it quickly and efficiently using regular SQL syntax). How can I use my own 'id' for an edge?
Which docs are you referring to? Is it the gremlin docs?
Note that we currently control the ids of the edges ourselves so that edges can be collocated with their source vertices for query efficiency. It is a bug that we don't throw an exception when an edge id is provided.
We are changing this behavior and will allow users to specify id while creating an edge. I will check with the team, and get you an ETA for this.
Thanks again for reporting this. Please, let us know if we can help in any other way.
Jayanta
I have tested the following gremlin expression which adds an edge and assigns a custom id.
g.V('1231234').addE('postedTo').property('id', '1231234:post_4').from(g.V('post_4'))
This worked using the latest and previous versions of the Microsoft.Azure.Graph NuGet package (0.2.4-preview and 0.2.2-preview).
Note: Edge or vertex id property can only be assigned when creating the element via addV or addE operations. After the element is written, the id property is read-only.
I did not have time to test this on a graph server instance; however, version 0.2.2-preview of the package should be comparable to what is deployed, so I'm expecting the same results.
Since it doesn't appear to be supported today through the Gremlin APIs I'd suggest you take a look at using the Document APIs for CRUD operations against your graph elements. This is the approach I've taken at work and we've had great success with it. Basically, if you insert a few vertices and edges through Gremlin and then inspect the resulting documents using SQL in the portal you'll be able to see the format that is expected in the underlying storage.
Building on that, we've designed some libraries that take the POCOs for our various Vertex and Edge types and translate them into the graph format expected in the backend by Cosmos. This will allow you to completely control the selection of Ids for your edges. A very important and common use case that you've pointed out, which is also important to our system, is the ability to prevent more than one edge for a particular vertex by restricting its Id.
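The duplicate-prevention idea can be sketched without any database at all. The Python below is a minimal in-memory illustration, not the Cosmos or Gremlin API: edge_id, EdgeStore and DuplicateEdgeError are hypothetical names, and the "source:label:target" id format is just one assumed convention. The point is that an id derived from the endpoints makes a second insert of the same edge collide.

```python
class DuplicateEdgeError(Exception):
    """Raised when an edge with the same derived id already exists."""


def edge_id(source_id, label, target_id):
    # Deterministic id: the same endpoints and label always give the same id.
    return f"{source_id}:{label}:{target_id}"


class EdgeStore:
    """In-memory stand-in for a store that enforces unique ids."""

    def __init__(self):
        self._edges = {}

    def add_edge(self, source_id, label, target_id, **props):
        eid = edge_id(source_id, label, target_id)
        if eid in self._edges:
            raise DuplicateEdgeError(eid)
        self._edges[eid] = {
            "id": eid,
            "label": label,
            "from": source_id,
            "to": target_id,
            **props,
        }
        return eid

    def get(self, eid):
        # Direct lookup by id, no traversal needed.
        return self._edges.get(eid)
```

Because the id is computed rather than random, checking whether two vertices are already connected is a single keyed lookup on edge_id(...).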
Is it possible to create edges by specifying documents that may or may not exist, and create them when they don't?
For instance, if I run a query like:
INSERT {_to: 'docs/something', _from: 'docs/other'} IN edges
If either docs/something or docs/other doesn't exist already, I'll get an error. Is there an option I could pass that would create docs/something and docs/other (as an empty object, perhaps) if they didn't exist?
Note: I can do a bulk import and create edges without documents (_to and/or _from just lead to nowhere), but I'd rather create a blank document.
One of the features of managed graphs is that they ensure graph integrity. Thus, if you use the edge management facility, ArangoDB will not permit the insertion of dangling edges.
However, ArangoDB's graph functionality is layered on top of its document functionality. The document functionality does not guarantee graph integrity; inserting edges that reference non-existent vertices is therefore possible this way, and your example query will work if the edge collection exists.
However, quoting the insert documentation:
Each INSERT operation is restricted to a single collection,
and the collection name must not be dynamic.
Only a single INSERT statement per collection is allowed per AQL query,
and it cannot be followed by read operations that access
the same collection, by traversal operations,
or AQL functions that can read documents.
So you won't be able to create vertices dynamically with AQL in the same query.
With ArangoDB 2.8 the vertex collection would have to exist first.
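Since this cannot be done in one query, the usual workaround is to do it in separate steps from the client: make sure both endpoint documents exist, then insert the edge. The Python below only sketches that order of operations against plain dictionaries. It does not use any ArangoDB driver, and ensure_document and insert_edge are hypothetical helper names.

```python
def ensure_document(collections, doc_id):
    """Create an empty document for doc_id ("collection/key") if it is missing.

    Returns True if a blank document was created, False if one already existed.
    """
    collection_name, key = doc_id.split("/", 1)
    collection = collections.setdefault(collection_name, {})
    if key in collection:
        return False
    collection[key] = {"_key": key}
    return True


def insert_edge(collections, edges, from_id, to_id):
    # Step 1 and 2: create blank endpoints where needed (separate operations).
    ensure_document(collections, from_id)
    ensure_document(collections, to_id)
    # Step 3: only now insert the edge, so it never dangles.
    edge = {"_from": from_id, "_to": to_id}
    edges.append(edge)
    return edge
```

Existing documents are left untouched, so running this repeatedly only ever adds the blank placeholders that were missing.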
This is for a project that will map metadata. There are many more nodes but this particular one became a debate in the team.
Which model would yield the best query performance? Or does it not matter?
Option 1
Permission metadata is explicit as edges between nodes.
Option 2
Permission metadata is inside the properties of the edge.
Option 3
???
Let me comment for ArangoDB here, being one of its developers.
There is a third possibility, namely to have a single vertex collection and multiple edge collections for the different access methods. You would then "officially" have 3 graphs that share the same vertex set.
I would expect that this is better in performance, because each access type would only have to deal with a single type of edge and access would be fast.
Obviously it all depends on your queries. My statement holds for queries like "what are all the Entities a Person can update?" or "who can select this Entity?".
I could imagine that your standard query is more "Can this person delete that Entity?" or "Which access rights does this person have for that Entity?".
These two questions are probably not efficient with any of the approaches suggested, because as far as I see, all of them would then require a search, either in the outgoing edges of the Person or in the incoming edges of the Entity.
What would be needed here is a kind of "vertex-centric index", that is, an index that can be used for the set of outgoing or incoming edges of a given vertex. If you, for example, use your option 2 (or indeed option 1, this does not matter much) and have a sorted index on all edges that is sorted first by Person and then by Entity, then finding the (probably singleton) set of edges from a given Person to a given Entity is a lookup with time complexity O(log(#edges)).
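The lookup described here can be illustrated with a sorted list and binary search. This is a plain Python sketch of the idea, not ArangoDB's index implementation: EdgeIndex and its methods are hypothetical names, and edges are assumed to be (person, entity, permission) tuples.

```python
import bisect


class EdgeIndex:
    """Edges kept sorted by (person, entity, permission)."""

    def __init__(self, edges):
        self._sorted = sorted(edges)

    def permissions(self, person, entity):
        # Binary search for the first edge with this (person, entity) prefix.
        # A 2-tuple sorts before any 3-tuple that starts with the same values.
        i = bisect.bisect_left(self._sorted, (person, entity))
        found = []
        # Walk only the (usually tiny) run of matching edges.
        while i < len(self._sorted) and self._sorted[i][:2] == (person, entity):
            found.append(self._sorted[i][2])
            i += 1
        return found

    def can(self, person, entity, permission):
        return permission in self.permissions(person, entity)
```

The binary search costs O(log(#edges)), and the scan afterwards touches only the edges between that one Person and that one Entity, so "Can this person delete that Entity?" no longer depends on how many edges either vertex has in total.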
We at ArangoDB are currently busy adding this feature, which will appear in one of the next two releases.
I can only speak for Neo4j here:
I don't know that it would matter much, but definitely benchmark! Both relationships and properties are stored as linked lists, so it will still need to traverse them. But if you have more relationships between Person and Entity nodes then putting them in properties starts to become more attractive.
I recommend checking out the free O'Reilly book Graph Databases to learn more about the internals of Neo4j. But benchmarks will always be the gold standard.
I'm using OrientDB for a fairly large amount of data, large enough that importing takes some weeks.
Now, when I'm almost done, I get:
Database could contain broken vertices
Can I test the database for problems? Or does it just 'act' as if all is right?
I've had a previous iteration where I found out, later, not all vertices and edges were imported correctly.
One of the reasons why I presume something is out of the ordinary is the error message
..ODatabaseException: RecordId cannot support negative cluster id
My current approach is to try to print each vertex (by type), since a broken vertex appears to throw errors when all of its properties are read, but NOT on fetch alone. This seems suboptimal for over 100M vertices. And how would I check the edges?
"Database could contain broken vertices" appears on the drop class command to warn you against dropping classes instead of using delete vertex. Follow the suggestions if you don't want broken edges.
As for the negative RID, you could also use the repair database console command.
How did you insert the graph? With or without tx? Did you ever stop the process while it was running? Are you importing through the plocal or the remote protocol?
Graph databases store data as nodes, properties and relations. If I need to retrieve some specific data from an object based upon a query, then I would need to retrieve multiple objects (as the query might have a lot of results).
Consider this simple scenario in object oriented programming in graph-databases:
I have a (graph) database of users, where each user is stored as an object. I need to retrieve a list of users living in a specific place (the place property is stored in the user object). So, how would I do it? I mean unnecessary data will be retrieved every time I need to do something (in this case, the entire user object might need to be retrieved). Isn't functional programming better in graph databases?
This example is just a simple analogy of the above stated question that came to my mind. Don't take it as a benchmark. So, the question remains, How great is object oriented programming in graph-databases?
A graph database is more than just vertices and edges. In most graph databases, such as Neo4j, vertices and edges carry a list of properties in addition to vertices having an id and edges having a label. Typically, in Java-based graph databases these properties are limited to Java primitives; everything else (e.g. dates) needs to be serialized to a string. This mapping to vertex/edge properties can either be done by hand, using methods such as getProperty and setProperty, or you can use something like Frames, an object mapper that uses the TinkerPop stack.
Each node has attributes that can be mapped to object fields. You can do that manually, or you can use spring-data to do the mapping.
Most graph databases have at least one kind of index for vertices/edges. InfiniteGraph, for instance, supports B-Trees, Lucene (for text) and a distributed, scalable index type. If you don't have an index on the field that you're trying to use as a filter, you'd need to traverse the graph and apply predicates yourself at each step. Hopefully, that would reduce the number of nodes to be traversed.
"I need to retrieve a list of users living in a specific place (the place property is stored in the user object)."
There is a better way. Separate location from user. Instead of having a location as a property, create a node for locations.
So you can have (u:User)-[:LIVES_IN]->(l:Location) type of relationship.
It becomes easier to retrieve a list of users living in a specific place with a simple query:
MATCH (u:User)-[:LIVES_IN]->(l:Location) WHERE l.name = 'New York'
RETURN u, l
This will return all users living in New York without having to scan all the properties of each node. It's a faster approach.
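Why this helps can be shown with a small in-memory model. The Python below is only an illustration of the two access paths, not how Neo4j stores anything: TinyGraph and its method names are hypothetical. One method filters every user on a property, the other follows the relationships attached to a single location node.

```python
from collections import defaultdict


class TinyGraph:
    def __init__(self):
        # user name -> properties (location kept as a property too, for comparison)
        self.users = {}
        # location name -> names of users with a LIVES_IN edge to it
        self.lives_in = defaultdict(list)

    def add_user(self, name, location):
        self.users[name] = {"name": name, "place": location}
        self.lives_in[location].append(name)

    def users_by_property_scan(self, location):
        # Property model: every user has to be inspected.
        return [n for n, props in self.users.items() if props["place"] == location]

    def users_by_relationship(self, location):
        # Node model: start at the location and follow its edges only.
        return list(self.lives_in.get(location, []))
```

Both methods return the same users, but the second one does work proportional to the number of people in that one location rather than to the total number of users.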
Why not use an object-oriented graph database?
InfiniteGraph is a graph database built on top of Objectivity/DB, which is a massively scalable, distributed object-oriented database.
InfiniteGraph allows you to define your vertices and edges using a standard object-oriented approach, including inheritance. You can also embed a defined data type as an attribute in another data type definition.
Because InfiniteGraph is object-oriented, it gives you access to query capabilities on complex data structures that are not available in the popular graph databases. Consider the following diagram:
In this diagram I create a query that determines the inclusion of the edge based on an evaluation of the set of CallDetail nodes hanging off the Call edge. I might only include the edge in my results if there exists a CallDetail with a particular date, or if the sum of the callDurations of all of the CallDetails that occurred between two dates is over some threshold. This is the real power of an object-oriented database in solving graph problems: you can support a much more complex data model.
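As a rough illustration of that kind of edge predicate, here is a small Python sketch. It is not InfiniteGraph's API: the class and function names are hypothetical, and the field names (day, duration_seconds) are assumptions standing in for the CallDetail attributes in the diagram.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CallDetail:
    day: date
    duration_seconds: int


@dataclass
class CallEdge:
    caller: str
    callee: str
    # The set of CallDetail objects hanging off this edge.
    details: list = field(default_factory=list)


def total_duration(edge, start, end):
    """Sum the durations of the details that fall within [start, end]."""
    return sum(d.duration_seconds for d in edge.details if start <= d.day <= end)


def edges_over_threshold(edges, start, end, threshold_seconds):
    """Keep only edges whose in-range call time exceeds the threshold."""
    return [e for e in edges if total_duration(e, start, end) > threshold_seconds]


def edges_with_call_on(edges, day):
    """Keep only edges that have at least one detail on the given day."""
    return [e for e in edges if any(d.day == day for d in e.details)]
```

The edge is included or excluded based on an aggregate over its attached objects, which is the part that a flat key/value property on the edge cannot express directly.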
I'm not sure why people have commingled the terms graph database and property graph. A property graph is but one way to implement a graph database, and not a particularly efficient one. InfiniteGraph is a schema-based database, and the schema provides several distinct advantages, one of which is object placement.
Disclaimer: I am the Director of Field Operation for Objectivity, Inc., maker of InfiniteGraph.