OrientDB - Trouble storing data from a Java application

I am trying to store triplets in OrientDB as vertex-edge-vertex relationships from a Java application that I am working on. My understanding is that I can use the TinkerPop API and instantiate a graph like this:
OrientGraph graph = new OrientGraph("local:/tmp/orient/test_db");
That is all I do to instantiate the graph. Then I connect vertices with edges in a loop like this (a Statement is a triplet consisting of subject-relationship-object):
for (Statement s : statements) {
    Vertex a = graph.addVertex(null);
    Vertex b = graph.addVertex(null);
    a.setProperty("Subject", s.getSubject().toBELShortForm());
    RelationshipType r = s.getRelationshipType();
    if (s.getObject() != null) {
        b.setProperty("Object", s.getObject().toBELShortForm());
        Edge e = graph.addEdge(null, a, b, r.toString());
    }
    else {
        b.setProperty("Object", "null");
        Edge e = graph.addEdge(null, a, b, "no-relationship");
    }
}
I then loop through the vertices of the graph and print them out like this:
for (Vertex v : graph.getVertices()) {
    out.println("Vertex: " + v.toString());
}
It does print a lot of vertices, but when I log into the server from the command line, using server.sh, all I see are the 3 records for ORole and the 4 records for OUser. What am I missing here? It seems that although my Java program runs to completion, the data is never written to the database.

The answer, at least for now, seems to be to use the OrientDB API directly instead of the TinkerPop API. The following does the same thing as my TinkerPop code, but with the OrientDB API, and it actually stores my data in the database:
for (Statement s : statements) {
    ODocument sNode = db.createVertex();
    sNode.field("Subject", s.getSubject().toBELShortForm());
    sNode.save();

    ODocument oNode = db.createVertex();
    // store the literal string "null" when the statement has no object
    oNode.field("Object", s.getObject() != null ? s.getObject().toBELShortForm() : "null");
    oNode.save();

    RelationshipType r = s.getRelationshipType();
    ODocument edge = db.createEdge(sNode, oNode);
    // field(name) with a single argument only reads a value; field(name, value) stores one.
    // "relationship" is an illustrative field name.
    edge.field("relationship", r != null ? r.toString() : "no relationship");
    edge.save();
}

Create the graph under the server's databases directory. Below is an example, assuming OrientDB has been installed under "/usr/local/orient":
OrientGraph graph = new OrientGraph("local:/usr/local/orient/databases/test_db");
When you start the server with server.sh, you should find this database correctly populated.
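To make the path rule concrete, here is a minimal sketch of a hypothetical helper (not an OrientDB API) that builds the embedded `local:` URL inside the server's `databases` directory, assuming you know where OrientDB is installed. The only point it illustrates is that the application and server.sh must resolve to the same directory; a database created under /tmp is invisible to the server.

```java
final class OrientPaths {
    // Hypothetical helper (not an OrientDB API): builds an embedded "local:" URL that
    // points inside the server's databases directory, so server.sh sees the same files.
    static String serverLocalUrl(String orientHome, String dbName) {
        // drop a trailing slash so the result never contains "//databases"
        String home = orientHome.endsWith("/")
                ? orientHome.substring(0, orientHome.length() - 1)
                : orientHome;
        return "local:" + home + "/databases/" + dbName;
    }
}
```

For example, `serverLocalUrl("/usr/local/orient", "test_db")` yields the same URL as the one passed to `new OrientGraph(...)` in the answer.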
Lvc#

Related

What is the best way to set up Spring JPA to handle searching for items based on tags?

I am trying to set up a search system for a database where each element (a code) in one table has tags mapped through a many-to-many relationship. I am trying to write a controller, "search", where I can search with a set of tags that act like keywords and get back a list of elements that all have the specified tags. My current function is incredibly naive: it retrieves all the codes mapped to each matching tag, adds those to a set, and then sorts the codes by how many times their tags are found in the query string.
public List<Code> naiveSearch(String queryText) {
    String[] tagMatchers = queryText.split(" ");
    Set<Code> retained = new HashSet<>();
    // only the first four query words are used to collect candidate codes
    for (int i = 0; i < Math.min(tagMatchers.length, 4); i++) {
        tagRepository.findAllByValueContaining(tagMatchers[i])
                .ifPresent(tags -> tags.forEach(tag -> retained.addAll(tag.getCodes())));
    }
    // score each candidate by how many (query word, tag) pairs match
    SortedMap<Integer, List<Code>> matches = new TreeMap<>();
    for (Code code : retained) {
        int sum = 0;
        for (String tagMatcher : tagMatchers) {
            for (Tag tag : code.getTags()) {
                if (tag.getValue().contains(tagMatcher)) {
                    sum += 1;
                }
            }
        }
        matches.computeIfAbsent(sum, k -> new ArrayList<>()).add(code);
    }
    // flatten in ascending score order, then reverse so the best matches come first
    List<Code> c = new ArrayList<>();
    matches.values().forEach(c::addAll);
    Collections.reverse(c);
    return c;
}
This is quite slow, and the overhead is unacceptable. My previous approach was basically a containment query on each code's description in the CrudRepository:
public interface CodeRepository extends CrudRepository<Code, Long> {
    Optional<Code> findByCode(String codeId);
    Optional<Iterable<Code>> findAllByDescriptionContaining(String query);
}
However, this is brittle, because the order of the words in the query determines whether a result is found. For example, I want "tall ... dog" to match the same results as "dog ... tall".
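For comparison, the order-independent matching the question asks for can be sketched in plain Java as below. `Item` is a hypothetical stand-in for the `Code` entity, and the scoring is a simplification of `naiveSearch()` (each distinct query word counts once, matched case-insensitively by substring). This only makes word order irrelevant; it still scans candidates in memory, so it does not address the performance problem.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class TagSearchSketch {
    // Hypothetical stand-in for the Code entity: an id plus its tag values.
    static final class Item {
        final String id;
        final List<String> tags;

        Item(String id, String... tags) {
            this.id = id;
            this.tags = Arrays.asList(tags);
        }
    }

    // Counts how many distinct query words occur in at least one tag.
    // Word order in the query never matters, so "tall dog" scores like "dog tall".
    static int score(Item item, Set<String> words) {
        int matched = 0;
        for (String word : words) {
            if (item.tags.stream().anyMatch(tag -> tag.toLowerCase().contains(word))) {
                matched++;
            }
        }
        return matched;
    }

    // Returns ids of items matching at least one word, best score first, ties by id.
    static List<String> search(String query, Item... items) {
        Set<String> words = new LinkedHashSet<>(
                Arrays.asList(query.toLowerCase().trim().split("\\s+")));
        words.remove(""); // an empty query would otherwise match every tag
        return Arrays.stream(items)
                .filter(item -> score(item, words) > 0)
                .sorted(Comparator.comparingInt((Item item) -> score(item, words))
                        .reversed()
                        .thenComparing(item -> item.id))
                .map(item -> item.id)
                .collect(Collectors.toList());
    }
}
```

Because `contains` is a substring test, a word like "dog" also matches a tag such as "hotdog"; comparing whole tags with `equals` would be the stricter alternative.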
So, I'm back several days later with how I actually solved this problem. I used Hibernate's built-in search library (Hibernate Search), which is very easy to integrate with Spring: I just pasted the required Maven coordinates into my pom.xml and it was ready to roll.
First, I removed the @ManyToMany mapping between tags and codes and concatenated all of each code's tags into a single string field. Next, I added @Field to the tags field and wrote a basic search method. It takes a set of keywords (tags) and performs a boolean search of fuzzy terms against the indexed tags of each code. So far it works pretty well. My database is fairly small (100k), so I'm not sure how this will scale, but currently each search returns in about 20-50 ms, which is fast enough for my purposes.
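The fuzzy boolean search described above is performed by Hibernate Search and its underlying Lucene index. Purely to illustrate the idea, the sketch below uses plain Levenshtein edit distance as a stand-in (Lucene's own fuzzy matching differs in details) over a single concatenated tag string; `FuzzyTagMatch`, `matchesAll` and `maxEdits` are hypothetical names, not Hibernate Search API. Every query keyword must be within `maxEdits` edits of at least one tag (AND semantics), and keyword order does not matter.

```java
final class FuzzyTagMatch {
    // Classic dynamic-programming Levenshtein distance
    // (insertion, deletion and substitution each cost 1), using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Boolean AND over keywords: each keyword must be within maxEdits of some tag.
    // tagField is the concatenated, space-separated tag string of one code.
    static boolean matchesAll(String tagField, String query, int maxEdits) {
        String[] tags = tagField.toLowerCase().trim().split("\\s+");
        for (String word : query.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            boolean found = false;
            for (String tag : tags) {
                if (levenshtein(word, tag) <= maxEdits) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }
}
```

With `maxEdits` set to 2, a misspelled query such as "dgo tall" still matches a code tagged "tall brown dog".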

Firebase Unity - Get all children into an array/generic list after GetValueAsync?

How can I get data back as an array or generic list from a Firebase database in Unity3D without knowing ahead of time what the name (key) of the children are?
I have been trying out the new Unity Firebase plugin, and I am having trouble figuring out how to get all the children at a specific location and put the names (keys) and values into arrays or generic lists so that I can work on the data locally. Forgive me for being new to Firebase and probably using bad techniques; since this plugin is so new, it's pretty hard to get outside help, as there are not many docs and tutorials on Firebase Unity yet.
In this particular case I am trying to create "instant messaging" like functionality, without the use of Firebase messaging, and just using regular Firebase database stuff instead. It might have been easier to use Firebase messaging, but mostly for the sake of learning and customization I want to do this on my own with just the Firebase database.
I insert data into the database like this:
public void SendMessage(string toUser, string msg)
{
    Debug.Log(String.Format("Attempting to send message from {0} to {1}", username, toUser));
    DatabaseReference reference = FirebaseDatabase.DefaultInstance.GetReference("Msgs");
    string date = Magnet.M.GetCurrentDate();
    // send data to the DB
    reference.Child(toUser).Child(username).Child(date).SetValueAsync(msg);
    // user receiving message / user sending message > VALUE = "hello dude|20170119111325"
    UpdateUsers();
}
And then I try and get it back like this:
public string[] GetConversation(string userA, string userB)
{
    // get a conversation between two users
    string[] convo = new string[0];
    FirebaseDatabase.DefaultInstance.GetReference("Msgs").GetValueAsync().ContinueWith(task =>
    {
        Debug.Log("Getting Conversation...");
        if (task.IsFaulted || task.IsCanceled)
        {
            Debug.LogError("ERROR: Task error in GetConversation(): " + task.Exception);
        }
        else if (task.IsCompleted)
        {
            DataSnapshot snapshot = task.Result;
            string[] messagesA = new string[0], messagesB = new string[0];
            if (snapshot.HasChild(userA))
            {
                // userA has a record of a conversation with other users
                if (snapshot.Child(userA).HasChild(userB)) // userB has sent messages to userA before
                {
                    Debug.Log("Found childA");
                    long count = snapshot.Child(userA).Child(userB).ChildrenCount;
                    messagesA = new string[count];
                    var kids = snapshot.Child(userA).Child(userB).Children;
                    Debug.Log(kids);
                    for (int i = 0; i < count; i++)
                    {
                        // this won't work, but is how I would like to access the data
                        messagesA[i] = kids[i].Value.ToString(); // AGAIN.... will not work...
                    }
                }
            }
            if (snapshot.HasChild(userB))
            {
                if (snapshot.Child(userB).HasChild(userA)) // userA sent a message to userB before
                {
                    Debug.Log("Found childB");
                    long count = snapshot.Child(userB).Child(userA).ChildrenCount;
                    messagesB = new string[count];
                    var kids = snapshot.Child(userB).Child(userA).Children;
                    Debug.Log(kids);
                    // messy incomplete testing code...
                }
            }
            // HERE I WOULD ASSIGN ALL THE MESSAGES BETWEEN A AND B AS 'convo'...
        }
        Debug.Log("Done Getting Conversation.");
    });
    return convo;
}
But obviously this won't work, because DataSnapshot won't let me access its children like an array or generic list using indices. I can't figure out how to handle the data when I don't know the names (keys) of the children and just want to get them out one by one, in any order. Since they are named by the date/time they are entered into the DB, I won't know ahead of time what the children's names (keys) are, and I can't just say "GetChild("20170101010101")", because that number is generated when the message is sent to the DB from any client.
FYI here is what the DB looks like:
Figured out the answer to your question. Here's my code snippet. Hope this helps!
void InitializeFirebase() {
    FirebaseApp app = FirebaseApp.DefaultInstance;
    app.SetEditorDatabaseUrl("https://slol.firebaseio.com/");
    FirebaseDatabase.DefaultInstance
        .GetReference("Products").OrderByChild("category").EqualTo("livingroom")
        .ValueChanged += (object sender2, ValueChangedEventArgs e2) => {
            if (e2.DatabaseError != null) {
                Debug.LogError(e2.DatabaseError.Message);
                return;
            }
            if (e2.Snapshot != null && e2.Snapshot.ChildrenCount > 0) {
                // Children can be enumerated with foreach without knowing the child keys
                foreach (var childSnapshot in e2.Snapshot.Children) {
                    var name = childSnapshot.Child("name").Value.ToString();
                    text.text = name;
                    Debug.Log(name);
                    //text.text = childSnapshot.ToString();
                }
            }
        };
}
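The step the question leaves as a TODO is merging both directions into one conversation. The sketch below shows that logic in Java, with nested `Map`s standing in for the snapshot data (`Msgs/toUser/fromUser/date = message`); `send`, `child` and `conversation` are hypothetical helpers, not Firebase API. Because the keys are fixed-width yyyyMMddHHmmss timestamps, sorting them as strings orders the messages chronologically, and no key has to be known in advance. Note also that in the question's `GetConversation()`, `return convo;` runs before the `ContinueWith` callback has finished, so the merged result has to be consumed inside the callback (or the method made asynchronous).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConversationSketch {
    // Hypothetical helper mirroring reference.Child(toUser).Child(fromUser).Child(date).SetValueAsync(msg).
    @SuppressWarnings("unchecked")
    static void send(Map<String, Object> msgs, String toUser, String fromUser, String date, String msg) {
        Map<String, Object> inbox =
                (Map<String, Object>) msgs.computeIfAbsent(toUser, k -> new HashMap<String, Object>());
        Map<String, Object> thread =
                (Map<String, Object>) inbox.computeIfAbsent(fromUser, k -> new HashMap<String, Object>());
        thread.put(date, msg);
    }

    // Returns the child map under key, or an empty map if it is missing or not a map.
    @SuppressWarnings("unchecked")
    static Map<String, Object> child(Map<String, Object> node, String key) {
        Object value = node.get(key);
        return value instanceof Map ? (Map<String, Object>) value : Collections.<String, Object>emptyMap();
    }

    // Merges both directions by iterating entries, so no date key is needed up front.
    static List<String> conversation(Map<String, Object> msgs, String userA, String userB) {
        // Appending the sender to the date key keeps two messages sent in the same second apart.
        TreeMap<String, String> byDate = new TreeMap<>();
        for (Map.Entry<String, Object> e : child(child(msgs, userA), userB).entrySet()) {
            byDate.put(e.getKey() + " " + userB, userB + ": " + e.getValue());
        }
        for (Map.Entry<String, Object> e : child(child(msgs, userB), userA).entrySet()) {
            byDate.put(e.getKey() + " " + userA, userA + ": " + e.getValue());
        }
        return new ArrayList<>(byDate.values());
    }
}
```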
Firebase developer here.
Have you tried to use Value on the top-level Snapshot? It should return an IDictionary whose values can also be lists or nested dictionaries. You will have to use some dynamic inspection to figure out what the values are.
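The dynamic inspection suggested above could look roughly like the following Java sketch, where nested `Map` and `List` values stand in for the dictionaries and lists that `Value` returns in C#; `walk` is a hypothetical helper. It visits every leaf by type-checking each value, without knowing any key in advance.

```java
import java.util.List;
import java.util.Map;

final class SnapshotWalk {
    // Recursively visits a value tree made of maps, lists and leaf values,
    // appending one "path = value" line per leaf.
    static void walk(String path, Object value, List<String> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                walk(path + "/" + entry.getKey(), entry.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> items = (List<?>) value;
            for (int i = 0; i < items.size(); i++) {
                walk(path + "/" + i, items.get(i), out);
            }
        } else {
            out.add(path + " = " + value);
        }
    }
}
```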

Creating batch documents using PouchDB slows down the web app

I am trying to save documents using PouchDB's bulkSave() function.
However, once these documents are saved, the app starts syncing with the master database through Sync Gateway, and while it does, the web app slows down: when I navigate to a different tab, no content is displayed on it.
Below is an example of how the documents are being created:
for (var i = 0; i <= instances; i++) {
    if (i > 0) {
        advTask.startDate = new Date(new Date(advTask.startDate).setHours(new Date(advTask.startDate).getHours() + offset));
    }
    if (advTask.estimatedDurationUnit == 'Minutes') {
        advTask = $Date.getAdvTaskEndTimeIfMinutes(advTask);
    } else if (advTask.estimatedDurationUnit == 'Hours') {
        advTask = $Date.getAdvTaskEndTimeIfHours(advTask);
    } else if (advTask.estimatedDurationUnit == 'Days') {
        advTask = $Date.getAdvTaskEndTimeIfDays(advTask);
    }
    if (new Date(advTask.endDate).getTime() >= new Date($scope.advTask.endDate).getTime()) {
        // here save the task array using bulkSave() function
        $db.bulkSave(tasks).then(function (res) {
            $db.sync();
        });
        break;
    }
    advTask.startDate = $Date.toGMT(advTask.startDate);
    advTask.endDate = $Date.toGMT(advTask.endDate);
    var adv = angular.copy(advTask);
    tasks.push(adv); // here pushing the documents to an array
    offset = advTask.every;
}
Thanks in advance!
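One way to keep the persistence side simple is to separate generating the occurrences from saving them, so the whole array is saved with a single batch call and a single sync whether or not the end-date check ever fires (in the loop above, `bulkSave` only runs when that check is hit). The Java sketch below models just the generation step with `java.time`; `Occurrence`, `build` and the parameter names are hypothetical, and a `Duration` stands in for the Minutes/Hours/Days branches.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

final class RecurrenceSketch {
    // One generated occurrence of the recurring task.
    static final class Occurrence {
        final LocalDateTime start;
        final LocalDateTime end;

        Occurrence(LocalDateTime start, LocalDateTime end) {
            this.start = start;
            this.end = end;
        }
    }

    // Builds every occurrence first; nothing is persisted here.
    // Mirrors the loop in the question: step the start by 'every', compute the end from
    // the task length, and stop before an occurrence whose end reaches seriesEnd.
    static List<Occurrence> build(LocalDateTime firstStart, Duration length, Duration every,
                                  LocalDateTime seriesEnd, int instances) {
        List<Occurrence> result = new ArrayList<>();
        LocalDateTime start = firstStart;
        for (int i = 0; i <= instances; i++) {
            LocalDateTime end = start.plus(length);
            if (!end.isBefore(seriesEnd)) {
                break;
            }
            result.add(new Occurrence(start, end));
            start = start.plus(every);
        }
        return result;
    }
}
```

The caller would then save the returned list once and start syncing once, after the loop has finished.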
bulkSave is not a core PouchDB API; are you using a plugin?
Also one piece of advice I'd give is that Couchbase Sync Gateway does not have 100% support for PouchDB and is known to be problematic in some cases.
Another piece of advice is that running PouchDB in a web worker can prevent your UI thread from getting overloaded, which would fix the problem of tabs not showing up.
Do you have a live test case to demonstrate?

Good graph database for finding intersections (Neo4j? Pegasus? Allegro?...)

I'm looking for a good graph database for finding set intersections: taking any two nodes and checking whether their edge endpoints "overlap." A social-network analogy would be to look at two people and see whether they are connected to the same people.
I've tried to get FlockDB (from the folks at Twitter) working, because intersection functions are built in, but found there wasn't much in terms of user community/support. So, any recommendations for other graph databases, especially ones where the kind of intersection functionality I'm looking for already exists?
Isn't that just the shortest paths of length 2 between the two nodes?
In Neo4j you can use the shortestPath() Finder from the GraphAlgoFactory for that.
This would tell you if there is a connection:
Node from_node = index.get("guid", "user_a").getSingle();
Node to_node = index.get("guid", "user_b").getSingle();
if (from_node != null && to_node != null) {
    RelationshipExpander expander = Traversal.expanderForAllTypes(Direction.BOTH);
    PathFinder<Path> finder = GraphAlgoFactory.shortestPath(expander, 2);
    if (finder.findSinglePath(from_node, to_node) != null) {
        //Connected directly or through at least 1 common friend
    } else {
        //Too far apart or not connected at all
    }
}
This would tell you who the common friends are:
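Node from_node = index.get("guid", "user_a").getSingle();
Node to_node = index.get("guid", "user_b").getSingle();
if (from_node != null && to_node != null) {
    RelationshipExpander expander = Traversal.expanderForAllTypes(Direction.BOTH);
    PathFinder<Path> finder = GraphAlgoFactory.shortestPath(expander, 2);
    // findAllPaths() returns an Iterable (possibly empty), never null
    boolean found = false;
    for (Path path : finder.findAllPaths(from_node, to_node)) {
        // a direct user_a-user_b relationship gives a length-1 path with no friend in between
        if (path.length() == 2) {
            found = true;
            // getOtherNode() returns the middle node regardless of relationship direction
            Relationship relationship = path.relationships().iterator().next();
            Node common_friend = relationship.getOtherNode(from_node);
        }
    }
    if (!found) {
        //No common friend found (directly connected, too far apart, or not connected)
    }
}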
This code is a little rough; it is much easier to express in Cypher. The following is taken from the cheat sheet in the Neo4j Server console, which is a great way to play with Neo4j after you populate a database:
START a = (user, name, "user_a")
MATCH (a)-[:FRIEND]->(friend)-[:FRIEND]->(friend_of_friend)
RETURN friend_of_friend
This gives you the friends of user_a's friends, i.e. the nodes reachable from user_a through a shared intermediate node. You can pass this query to an embedded server through the CypherParser class.
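Independent of any particular database, the overlap the question describes reduces to intersecting two adjacency sets. The in-memory Java sketch below (class and method names are hypothetical; no Neo4j API is involved) shows that core operation with `retainAll`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CommonNeighbors {
    // Undirected adjacency sets: node id -> ids of directly connected nodes.
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    void connect(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Nodes adjacent to both a and b (the "overlap" of their edge endpoints), sorted.
    Set<String> common(String a, String b) {
        Set<String> overlap = new TreeSet<>(adjacency.getOrDefault(a, Collections.emptySet()));
        overlap.retainAll(adjacency.getOrDefault(b, Collections.emptySet()));
        return overlap;
    }
}
```

For example, if alice and bob are both connected to carol and dave, `common("alice", "bob")` returns `[carol, dave]`; an empty result means the two nodes share no neighbour.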

Identify documents from the results of Mahout clustering

I am using Mahout to cluster text documents indexed using Solr.
I used the "text" field of the documents to form vectors, then used the k-means driver in Mahout for clustering and the clusterdumper utility to dump the results.
I am having difficulty understanding the output of the dumper. I can see the clusters that were formed, with term vectors in those clusters.
But how do I extract the documents from these clusters? I want the result to be the input documents, grouped by the cluster they appear in.
I also had this problem. The cluster dumper dumps all of your cluster data, including points, terms and weights. You have two choices:
Modify the ClusterDumper.printClusters() method so it does not print all the terms and weights. I have code like:
String clusterInfo = String.format("Cluster %d (%d) with %d points.\n", value.getId(), clusterCount, value.getNumPoints());
writer.write(clusterInfo);
writer.write('\n');
// list all top terms
if (dictionary != null) {
    String topTerms = getTopFeatures(value.getCenter(), dictionary, numTopFeatures);
    writer.write("\tTop Terms: ");
    writer.write(topTerms);
    writer.write('\n');
}
// list all the points in the cluster
List<WeightedVectorWritable> points = clusterIdToPoints.get(value.getId());
if (points != null) {
    writer.write("\tCluster points:\n\t");
    for (WeightedVectorWritable point : points) {
        writer.write(String.valueOf(point.getWeight()));
        writer.write(": ");
        // a document identifier is only available if the vectors are NamedVectors
        if (point.getVector() instanceof NamedVector) {
            writer.write(((NamedVector) point.getVector()).getName() + " ");
        }
    }
    writer.write('\n');
}
Alternatively, do some grep magic, if possible, to eliminate all the information about terms and weights.
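Once each document's name and cluster id have been extracted (for instance from the NamedVector names that the modified dumper prints), grouping them per cluster is straightforward. The sketch below assumes those pairs are already available as two parallel arrays; `groupByCluster` is a hypothetical helper, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ClusterMembership {
    // Groups document names by the cluster id they were assigned to.
    // docNames[i] belongs to clusterIds[i]; clusters come out sorted by id.
    static Map<Integer, List<String>> groupByCluster(String[] docNames, int[] clusterIds) {
        if (docNames.length != clusterIds.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        Map<Integer, List<String>> clusters = new TreeMap<>();
        for (int i = 0; i < docNames.length; i++) {
            clusters.computeIfAbsent(clusterIds[i], k -> new ArrayList<>()).add(docNames[i]);
        }
        return clusters;
    }
}
```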
