How to remove output limit in FollowRecursive? - graph-databases

I'm using a FollowRecursive query to traverse a graph where every node is connected with the predicate "next". The problem is that I can never get more than 99 source => target mappings.
Why is the output limited to only 100 {source: N, target: M} objects?
The query looks as follows (all variables are of course defined):
var chain_pred = "next";
var c1 = g.M().Out(chain_pred);
var start_node = "begin";
g.V(start_node).FollowRecursive(c1).ForEach(function(v) {
    g.V(v.id).Out(chain_pred).ForEach(function(t) {
        var node = {
            source: v.id,
            target: t.id
        };
        g.Emit(node);
    });
});
I wrote the same query with JavaScript recursive calls (in DepthFirstSearch), and it turns out that I can't get more than 100 objects. I get the expected output up to depth 3. At depth 4, I start to lose entire tree branches from the start node. This implies that there is definitely a cap on recursion that kills the query after 100 results.
How to remove this limitation?

A bit late, but I'll answer it anyway:
FollowRecursive(c1, -1)
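The second argument to FollowRecursive appears to be the traversal depth limit; per this answer, passing -1 removes the cap. Applied to the original query, a minimal sketch (same variables as in the question):

var chain_pred = "next";
var c1 = g.M().Out(chain_pred);
g.V("begin").FollowRecursive(c1, -1).ForEach(function(v) {
    g.V(v.id).Out(chain_pred).ForEach(function(t) {
        // with -1 as the second argument, no branches should be dropped
        g.Emit({ source: v.id, target: t.id });
    });
});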

Related

Why does Flink emit duplicate records on a DataStream join + Global window?

I'm learning/experimenting with Flink, and I'm observing some unexpected behavior with the DataStream join, and would like to understand what is happening...
Let's say I have two streams with 10 records each, which I want to join on an id field. Let's assume that each record in one stream has a matching one in the other, and that the IDs are unique within each stream. Let's also say I have to use a global window (requirement).
Join using DataStream API (my simplified code in Scala):
val stream1 = ... // from a Kafka topic on my local machine (I tried with and without .keyBy)
val stream2 = ...

stream1
  .join(stream2)
  .where(_.id).equalTo(_.id)
  .window(GlobalWindows.create()) // assume this is a requirement
  .trigger(CountTrigger.of(1))
  .apply {
    (row1, row2) => // ...
  }
  .print()
Result:
Everything is printed as expected, each record from the first stream joined with a record from the second one.
However:
If I re-send one of the records (say, with an updated field) from one of the streams to that stream, two duplicate join events get emitted 😞
If I repeat that operation (with or without an updated field), I get 3 emitted events, then 4, 5, etc... 😞
Could someone in the Flink community explain why this is happening? I would have expected only 1 event emitted each time. Is it possible to achieve this with a global window?
In comparison, the Flink Table API behaves as expected in that same scenario, but for my project I'm more interested in the DataStream API.
Example with Table API, which worked as expected:
tableEnv
  .sqlQuery(
    """
      |SELECT *
      | FROM stream1
      | JOIN stream2
      | ON stream1.id = stream2.id
    """.stripMargin)
  .toRetractStream[Row]
  .filter(_._1) // just keep the inserts
  .map(...)
  .print() // works as expected, after re-sending updated records
Thank you,
Nicolas
The issue is that records are never removed from your global window. So the join operation on the global window is triggered whenever a new record arrives, but the old records are still present.
Thus, to get it running in your case, you'd need to implement a custom evictor. I expanded your example into a minimal working example and added the evictor, which I will explain after the snippet.
val data1 = List(
  (1L, "myId-1"),
  (2L, "myId-2"),
  (5L, "myId-1"),
  (9L, "myId-1"))
val data2 = List(
  (3L, "myId-1", "myValue-A"))

val stream1 = env.fromCollection(data1)
val stream2 = env.fromCollection(data2)

stream1.join(stream2)
  .where(_._2).equalTo(_._2)
  .window(GlobalWindows.create()) // assume this is a requirement
  .trigger(CountTrigger.of(1))
  .evictor(new Evictor[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)], GlobalWindow]() {
    override def evictBefore(elements: lang.Iterable[TimestampedValue[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)]]], size: Int, window: GlobalWindow, evictorContext: Evictor.EvictorContext): Unit = {}

    override def evictAfter(elements: lang.Iterable[TimestampedValue[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)]]], size: Int, window: GlobalWindow, evictorContext: Evictor.EvictorContext): Unit = {
      import scala.collection.JavaConverters._
      val lastInputTwoIndex = elements.asScala.zipWithIndex.filter(e => e._1.getValue.isTwo).lastOption.map(_._2).getOrElse(-1)
      if (lastInputTwoIndex == -1) {
        println("Waiting for the lookup value before evicting")
        return
      }
      val iterator = elements.iterator()
      for (index <- 0 until size) {
        val cur = iterator.next()
        if (index != lastInputTwoIndex) {
          println(s"evicting ${cur.getValue.getOne}/${cur.getValue.getTwo}")
          iterator.remove()
        }
      }
    }
  })
  .apply((r, l) => (r, l))
  .print()
The evictor will be applied after the window function (the join in this case) has been applied. It's not entirely clear how exactly your use case should work if you have multiple entries in the second input, but for now, the evictor only works with single entries.
Whenever a new element comes into the window, the window function is immediately triggered (count = 1). Then the join is evaluated with all elements having the same key. Afterwards, to avoid duplicate outputs, we remove all entries from the first input in the current window. Since the second input may arrive after the first inputs, no eviction is performed when the second input is empty. Note that my Scala is quite rusty; you will be able to write it in a much nicer way. The output of a run is:
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
4> ((1,myId-1),(3,myId-1,myValue-A))
4> ((5,myId-1),(3,myId-1,myValue-A))
4> ((9,myId-1),(3,myId-1,myValue-A))
evicting (1,myId-1)/null
evicting (5,myId-1)/null
evicting (9,myId-1)/null
A final remark: if the Table API already offers a concise way of doing what you want, I'd stick with it and then convert the result to a DataStream when needed.

Node: fast way to find an object in an array

I have a problem.
My script was working fine and fast when there were only up to 5,000 objects in my array.
Now there are over 20,000 objects and it runs slower and slower...
This is how I called it:
for (var h in ItemsCases) {
    if (itmID == ItemsCases[h].sku) {
        // ... use the matching item
    }
}
With "for" for every object and check where the sku is my itmID, cause i dont want every ItemsCases. Only few of it each time.
But what is the fastest and best way to get the items with the sku i need out of it?
I think mine is not the fastest...
I get multiple items now with this code:
var skus = res.response.cases[x].skus;
for (var j in skus) {
    var itmID = skus[j];
    for (var h in ItemsCases) {
        if (itmID == ItemsCases[h].sku) {
            // ... collect the matching item
        }
    }
}
skus is also an array.
ItemsCases.find(item => item.sku === itmID) (or a for loop like yours, depending on the implementation) is the fastest you can do with an array (if you can have multiple items returned, use filter instead of find).
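For illustration, the two variants side by side (a sketch using the names from the question):

// find returns the first matching item (or undefined)
var firstMatch = ItemsCases.find(function (item) { return item.sku === itmID; });

// filter returns all matching items (possibly an empty array)
var allMatches = ItemsCases.filter(function (item) { return item.sku === itmID; });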
Use a Map or an object lookup if you need to be faster than that. It does need preparation and memory, but if you are searching a lot it may well be worth it. For example, using a Map:
// preparation of the lookup
const ItemsCasesLookup = new Map();
ItemsCases.forEach(item => {
    const list = ItemsCasesLookup.get(item.sku);
    if (list) {
        list.push(item);
    } else {
        ItemsCasesLookup.set(item.sku, [item]);
    }
});
then later you can get all items for the same sku like this:
ItemsCasesLookup.get(itmID);
A compromise (no extra memory, but some speedup) can be achieved by pre-sorting your array and then using a binary search on it, which is much faster than the linear search you have to do on an unprepared array.
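A minimal sketch of that approach, assuming the same ItemsCases array and sku field as in the question (if several items can share a sku, you would still need to scan outwards from the found index to collect them all):

// sort once by sku
ItemsCases.sort(function (a, b) {
    return a.sku < b.sku ? -1 : a.sku > b.sku ? 1 : 0;
});

// binary search for one item with the given sku, or undefined if absent
function findBySku(arr, sku) {
    var lo = 0, hi = arr.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        if (arr[mid].sku < sku) {
            lo = mid + 1;
        } else if (arr[mid].sku > sku) {
            hi = mid - 1;
        } else {
            return arr[mid];
        }
    }
    return undefined;
}

var item = findBySku(ItemsCases, itmID);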

Count the elements in arrays in MongoDB documents: efficiency and performance

I have a collection in MongoDB. The documents contain an array. I want to know how many elements exist across all arrays in all documents. There are a couple of ways of doing this (covered in this question: MongoDB: count the number of items in an array), and I want to know which method is more efficient, and how to tell which method is more efficient.
The documents look like this:
{"foo":[1,2,3,4]}
{"foo":[5,6,7]}
I can get the number I want by running either:
db.countTest.aggregate([{$unwind:"$foo"},{$group:{_id:null,"total foo":{$sum:1}}}])
or
db.countTest.aggregate([{$project:{count:{$size:"$foo"}}},{$group:{_id:null,"total foo":{$sum:"$count"}}}])
So my question is: which method is more efficient, and how do I tell which is more efficient? I have run the aggregate queries with {explain:true}, however I didn't find anything especially useful in the output. My feeling is that the second method is more efficient, since it doesn't require the unwind, but I was hoping to be able to prove that. The reason for asking is that this example is a scaled-down version of a real-life scenario with many more, much larger documents.
Also, what are the potential performance impacts of running these aggregations on the MongoDB instance? Could the aggregation affect the database performance in general, and the response times of other users?
Thanks for your help.
I created a collection with over 1 million documents, each looking like this:
{ "_id" : ObjectId("56f155f6006ec8ceeec407ec"), "foo" : [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 ] }
I then wrote a JavaScript script which runs the different aggregate queries three times each, calculates the deltas, and prints out the results. The script looks like this:
var start = Date.now();
db.countTest.aggregate([{$unwind:"$foo"},{$group:{_id:null,"total foo":{$sum:1}}}])
var done = Date.now();
var delta1 = done - start;
print(delta1);
var start = Date.now();
db.countTest.aggregate([{$unwind:"$foo"},{$group:{_id:null,"total foo":{$sum:1}}}])
var done = Date.now();
var delta2 = done - start;
print(delta2);
var start = Date.now();
db.countTest.aggregate([{$unwind:"$foo"},{$group:{_id:null,"total foo":{$sum:1}}}])
var done = Date.now();
var delta3 = done - start;
print(delta3);
var avDelta = (delta1 + delta2 + delta3) / 3;
print ("$Unwind query: average query duration (millis):" + avDelta);
var start = Date.now();
db.countTest.aggregate([{$project:{count:{$size:"$foo"}}},{$group:{_id:null,"total foo":{$sum:"$count"}}}])
var done = Date.now();
var delta1 = done - start;
print(delta1);
var start = Date.now();
db.countTest.aggregate([{$project:{count:{$size:"$foo"}}},{$group:{_id:null,"total foo":{$sum:"$count"}}}])
var done = Date.now();
var delta2 = done - start;
print(delta2);
var start = Date.now();
db.countTest.aggregate([{$project:{count:{$size:"$foo"}}},{$group:{_id:null,"total foo":{$sum:"$count"}}}])
var done = Date.now();
var delta3 = done - start;
print(delta3);
var avDelta = (delta1 + delta2 + delta3) / 3;
print ("$Size query: average query duration (millis):" + avDelta);
This returns the following results:
MongoDB shell version: 3.2.0
connecting to: test
2630
2573
2563
$Unwind query: average query duration (millis):2588.6666666666665
2026
2107
2003
$Size query: average query duration (millis):2045.3333333333333
I ran this several times and the unwind query was consistently slower. It would be interesting to see what difference larger documents make, as running a similar query against a production collection with ~250k larger, more complex documents takes significantly longer than 2-3 seconds.
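For what it's worth, the same measurement can be written with a small helper so each pipeline only appears once (a sketch for the mongo shell, using the same collection and pipelines as above):

// run a pipeline three times and print the average duration
function timePipeline(label, pipeline) {
    var deltas = [];
    for (var i = 0; i < 3; i++) {
        var start = Date.now();
        db.countTest.aggregate(pipeline).toArray(); // consume the cursor so the work is actually done
        deltas.push(Date.now() - start);
    }
    var avg = deltas.reduce(function (a, b) { return a + b; }, 0) / deltas.length;
    print(label + ": average query duration (millis): " + avg);
}

timePipeline("$unwind query", [
    {$unwind: "$foo"},
    {$group: {_id: null, "total foo": {$sum: 1}}}
]);
timePipeline("$size query", [
    {$project: {count: {$size: "$foo"}}},
    {$group: {_id: null, "total foo": {$sum: "$count"}}}
]);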

In Firebase, is there a way to get the number of children of a node without loading all the node data?

You can get the child count via
firebase_node.once('value', function(snapshot) { alert('Count: ' + snapshot.numChildren()); });
But I believe this fetches the entire sub-tree of that node from the server. For huge lists, that seems RAM and latency intensive. Is there a way of getting the count (and/or a list of child names) without fetching the whole thing?
The code snippet you gave does indeed load the entire set of data and then counts it client-side, which can be very slow for large amounts of data.
Firebase doesn't currently have a way to count children without loading data, but we do plan to add it.
For now, one solution would be to maintain a counter of the number of children and update it every time you add a new child. You could use a transaction to count items, like in this code tracking upvotes:
var upvotesRef = new Firebase('https://docs-examples.firebaseio.com/android/saving-data/fireblog/posts/-JRHTHaIs-jNPLXOQivY/upvotes');
upvotesRef.transaction(function (current_value) {
    return (current_value || 0) + 1;
});
For more info, see https://www.firebase.com/docs/transactions.html
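Applied to the original question, a minimal sketch of keeping such a counter next to the list, using the same legacy API as the snippet above (the URL and node names are placeholders):

var listRef = new Firebase('https://yourapp.firebaseio.com/items');
var countRef = new Firebase('https://yourapp.firebaseio.com/items_count');

// add a child, then bump the counter once the write has succeeded
listRef.push({ name: 'example' }, function (error) {
    if (!error) {
        countRef.transaction(function (current) {
            return (current || 0) + 1;
        });
    }
});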
UPDATE:
Firebase recently released Cloud Functions. With Cloud Functions, you don't need to create your own server. You can simply write JavaScript functions and upload them to Firebase. Firebase will be responsible for triggering the functions whenever an event occurs.
If you want to count upvotes for example, you should create a structure similar to this one:
{
    "posts" : {
        "-JRHTHaIs-jNPLXOQivY" : {
            "upvotes_count" : 5,
            "upvotes" : {
                "userX" : true,
                "userY" : true,
                "userZ" : true,
                ...
            }
        }
    }
}
And then write a JavaScript function to increase the upvotes_count when there is a new write to the upvotes node.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);

exports.countlikes = functions.database.ref('/posts/{postid}/upvotes').onWrite(event => {
    return event.data.ref.parent.child('upvotes_count').set(event.data.numChildren());
});
You can read the documentation to learn how to get started with Cloud Functions.
Also, another example of counting posts is here:
https://github.com/firebase/functions-samples/blob/master/child-count/functions/index.js
Update January 2018
The Firebase docs have changed, so instead of event we now have change and context.
The given example throws an error complaining that event.data is undefined. This pattern seems to work better:
exports.countPrescriptions = functions.database.ref(`/prescriptions`).onWrite((change, context) => {
    const data = change.after.val();
    const count = Object.keys(data).length;
    return change.after.ref.child('_count').set(count);
});
This is a little late in the game as several others have already answered nicely, but I'll share how I might implement it.
This hinges on the fact that the Firebase REST API offers a shallow=true parameter.
Assume you have a post object and each one can have a number of comments:
{
    "posts": {
        "$postKey": {
            "comments": {
                ...
            }
        }
    }
}
You obviously don't want to fetch all of the comments, just the number of comments.
Assuming you have the key for a post, you can send a GET request to
https://yourapp.firebaseio.com/posts/[the post key]/comments.json?shallow=true.
This will return an object of key-value pairs, where each key is the key of a comment and its value is true:
{
    "comment1key": true,
    "comment2key": true,
    ...,
    "comment9999key": true
}
The size of this response is much smaller than requesting the equivalent data, and now you can calculate the number of keys in the response to find your value (e.g. commentCount = Object.keys(result).length).
This may not completely solve your problem, as you are still calculating the number of keys returned, and you can't necessarily subscribe to the value as it changes, but it does greatly reduce the size of the returned data without requiring any changes to your schema.
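As an illustration, the request from browser JavaScript might look like this (a sketch; the URL and post key are placeholders, and the null check covers posts without any comments):

fetch('https://yourapp.firebaseio.com/posts/somePostKey/comments.json?shallow=true')
    .then(function (res) { return res.json(); })
    .then(function (result) {
        // result looks like { "comment1key": true, "comment2key": true, ... } or null
        var commentCount = result ? Object.keys(result).length : 0;
        console.log('commentCount:', commentCount);
    });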
Save the count as you go - and use validation to enforce it. I hacked this together for keeping a count of unique votes, a question which keeps coming up. But this time I have tested my suggestion! (notwithstanding cut/paste errors!)
The 'trick' here is to use the node priority as the vote count...
The data is:
vote/$issueBeingVotedOn/user/$uniqueIdOfVoter = thisVotesCount, priority=thisVotesCount
vote/$issueBeingVotedOn/count = 'user/'+$idOfLastVoter, priority=CountofLastVote
,"vote": {
".read" : true
,".write" : true
,"$issue" : {
"user" : {
"$user" : {
".validate" : "!data.exists() &&
newData.val()==data.parent().parent().child('count').getPriority()+1 &&
newData.val()==newData.GetPriority()"
user can only vote once && count must be one higher than current count && data value must be same as priority.
}
}
,"count" : {
".validate" : "data.parent().child(newData.val()).val()==newData.getPriority() &&
newData.getPriority()==data.getPriority()+1 "
}
count (last voter really) - vote must exist and its count equal newcount, && newcount (priority) can only go up by one.
}
}
Test script to add 10 votes by different users (for this example the IDs are faked; you should use auth.uid in production). Count down (i--) from 10 to see the validation fail.
<script src='https://cdn.firebase.com/v0/firebase.js'></script>
<script>
window.fb = new Firebase('https:...vote/iss1/');

window.fb.child('count').once('value', function (dss) {
    votes = dss.getPriority();
    for (var i = 1; i < 10; i++) vote(dss, i + votes);
});

function vote(dss, count)
{
    var user = 'user/zz' + count; // replace with auth.id or whatever
    window.fb.child(user).setWithPriority(count, count);
    window.fb.child('count').setWithPriority(user, count);
}
</script>
The 'risk' here is that a vote is cast but the count is not updated (hacking or script failure). This is why the votes have a unique 'priority' - the script should really start by ensuring that there is no vote with a priority higher than the current count; if there is, it should complete that transaction before doing its own - get your clients to clean up for you :)
The count needs to be initialised with a priority before you start - Forge doesn't let you do this, so a stub script is needed (before the validation is active!).
Write a Cloud Function to update the node count.
// below function to get the given node count
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);

exports.userscount = functions.database.ref('/users/')
    .onWrite(event => {
        console.log('users number : ', event.data.numChildren());
        return event.data.ref.parent.child('count/users').set(event.data.numChildren());
    });
Refer to: https://firebase.google.com/docs/functions/database-events
root
 |- users        (this node contains the full list of users)
 |- count
      |- userscount   (this node is added dynamically by the Cloud Function with the user count)
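The function above uses the older event-based signature. With the (change, context) signature described in the January 2018 update earlier in this thread, the same counter might look like this (a sketch, not tested):

exports.userscount = functions.database.ref('/users/')
    .onWrite((change, context) => {
        // change.after is a snapshot of /users after the write
        const count = change.after.numChildren();
        return change.after.ref.parent.child('count/users').set(count);
    });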

Sorted array: how to get position before and after using name? as3

I have been working on a project and Stack Overflow has helped me with a few problems so far, so I am very thankful!
My question is this:
I have an array like this:
var records:Object = {};
var arr:Array = [
    records["nh"] = { medinc:66303, statename:"New Hampshire"},
    records["ct"] = { medinc:65958, statename:"Connecticut"},
    records["nj"] = { medinc:65173, statename:"New Jersey"},
    records["md"] = { medinc:64596, statename:"Maryland"},
etc... for all 50 states. And then I have the array sorted reverse numerically (descending) like this:
arr.sortOn("medinc", Array.NUMERIC);
arr.reverse();
Can I call the name of the record (i.e. "nj" for New Jersey) and then get the value from the numeric position above and below that record in the array?
Basically, medinc is the median income of US states, and I am trying to show a ranking system... a user would click Texas, for example, and it would show the medinc value for Texas, along with the state that ranks one position below and the state that ranks one position above in the array.
Thanks for your help!
If you know the object, you can use Array.indexOf().
var index:int = arr.indexOf(records["nj"]);
var above:Object;
var below:Object;
if (index + 1 < arr.length) { // make sure you're not already at the top
    above = arr[index + 1];
}
if (index > 0) { // make sure you're not already at the bottom
    below = arr[index - 1];
}
I think this is the answer based on my understanding of your data.
var index:int = arr.indexOf(records["nh"]);
That will get you the index of the record that was clicked on, and then to find the ones below and above, just:
var clickedRecord:Object = arr[index];
var higherRecord:Object = arr[index + 1];
var lowerRecord:Object = arr[index - 1];
Hope that answers your question
Do you really need records to be a hash?
If not, you can simply move the key into a field of the record and change records to a simple array:
var records:Array = new Array();
records.push({ short: "nh", medinc: 66303, statename: "New Hampshire" });
records.push({ short: "ct", medinc: 65958, statename: "Connecticut" });
....
This gives you the opportunity to create a class for State, change the Array to a Vector, and make all of this type-safe, which is always good.
If you really need those keys, you can add objects like the ones above (with the "short" field) in the same way you are doing it now (maybe using some helper function to avoid typing the short name twice, like addState(records, data) { records[data.short] = data }).
Finally, you can also keep those records in two objects (or an object and an array, or whatever you need). This will not be expensive if you create each state object once and keep references in the array/object/vector. It would be a nice idea if you often need states sorted on different keys.
This is not really a good way to have your data set up - too much typing (you are repeating "records", "medinc", "statename" over and over again), while you definitely could've avoided it, for example:
var records:Array = [];
var states:Array = ["nh", "ct", "nj" ... ];
var statenames:Array = ["New Hampshire", "Connecticut", "New Jersey" ... ];
var medincs:Array = [66303, 65958, 65173 ... ];
var hash:Object = { };

function addState(state:String, medinc:int, statename:String, hash:Object):Object
{
    return hash[state] = { medinc: medinc, statename: statename };
}

for (var i:int; i < 50; i++)
{
    records[i] = addState(states[i], medincs[i], statenames[i], hash);
}
Since you have already done it the way you did, that's not essential, but this could've saved you some keystrokes if you hadn't...
Now, onto your search problem - first of all, true, it would be worth sorting the array before you search, but if you need to search an array by the value of the parameter it was sorted on, there is a better algorithm for that. That is, if, given the data in your example, your specific task was to find out in which state the income is 65958, then, knowing that the array is sorted on income, you could employ binary search.
Now, for the example with 50 states the difference will not be noticeable unless you do it some hundreds of thousands of times per second, but in general, binary search would be the way to go.
If the article on Wikipedia looks too long to read ;) the idea behind binary search is that at first you guess that the searched value is exactly in the middle of the array - you try that assumption, and if you guessed correctly, you return the index you just found; otherwise you select the half of the array that contains the searched value and repeat, until you either find the value or run out of interval (which would mean that the value is not found). This reduces the asymptotic complexity of the algorithm from O(n) to O(log n).
Now, if your goal was to find the correspondence between the income and the state, but it wasn't important how that ranks against other states (i.e. the index in the array is not important), you could have another hash table, where the income would be the key and the state information object would be the value, using my example above:
function addState(state:String, medinc:int, statename:String,
                  hash:Object, incomeHash:Object):Object
{
    return incomeHash[medinc] =
        hash[state] = { medinc: medinc, statename: statename };
}
Then incomeHash[medinc] would give you the state by income in O(1) time.
