customizable PageRank algorithm in Gremlin? - graph-databases

I'm looking for a Gremlin version of a customizable PageRank algorithm. There are a few old versions out there, one (from: http://www.infoq.com/articles/graph-nosql-neo4j) is pasted below. I'm having trouble fitting the flow into the current GremlinGroovyPipeline-based structure. What is the modernized equivalent of this or something like it?
$_g := tg:open()
g:load('data/graph-example-2.xml')
$m := g:map()
$_ := g:key('type', 'song')[g:rand-nat()]
repeat 2500
$_ := ./outE[#label='followed_by'][g:rand-nat()]/inV
if count($_) > 0
g:op-value('+',$m,$_[1]/#name, 1.0)
end
if g:rand-real() > 0.85 or count($_) = 0
$_ := g:key('type', 'song')[g:rand-nat()]
end
end
g:sort($m,'value',true())
Another version is available on slide 55 of http://www.slideshare.net/slidarko/gremlin-a-graphbased-programming-language-3876581. The ability to use the if statements and change the traversal based on them is valuable for customization.
many thanks

I guess I'll answer it myself in case somebody else needs it. Be warned that this is not a very efficient PageRank calculation. It should only be viewed as a learning example.
g = new TinkerGraph()
g.loadGraphML('graph-example-2.xml')
m = [:]
g.V('type','song').sideEffect{m[it.name] = 0}
// pick a random song node that has 'followed_by' edge
def randnode(g) {
return(g.V('type','song').filter{it.outE('followed_by').hasNext()}.shuffle[0].next())
}
v = randnode(g)
for(i in 0..2500) {
v = v.outE('followed_by').shuffle[0].inV
v = v.hasNext()?v.next():null
if (v != null) {
m[v.name] += 1
}
if ((Math.random() > 0.85) || (v == null)) {
v = randnode(g)
}
}
msum = m.values().sum()
m.each{k,v -> m[k] = v / msum}
println "top 10 songs: (normalized PageRank)"
m.sort {-it.value }[0..10]
Here's a good reference for a simplified one-liner:
https://groups.google.com/forum/m/#!msg/gremlin-users/CRIlDpmBT7g/-tRgszCTOKwJ
(as well as the Gremlin wiki: https://github.com/tinkerpop/gremlin/wiki)

Related

Loop mistake on Pine Script

I am trying a calculate golden cross with for loop but the code does not work (actually even I really dunno if it is written correctly). There is 2 gold cross but it says 0. I beg you to help me I am new at coding and I gonna lose my mind.
indicator("Cross count", overlay = true)
v1 = input.int(8, "Short WMA", minval=1)
v2 = input.int(21, "Long WMA", minval=1)
v3 = input.int(500, "Look Back Input", minval=10, step=10)
cp = 0
var label lbl = label.new(na, na, " ", style = label.style_label_left)
wmaS= ta.wma(close, v1)
wmaL = ta.wma(close, v2)
for i = 0 to v3
if ta.crossover(wmaS,wmaL)
cp := cp + 1
cp
plot(wmaS, color=color.purple)
plot(wmaL, color=color.orange)
label.set_xy(lbl, bar_index, high)
label.set_text(lbl, str.tostring(cp, "Golden Cross = "))`
This part does not make sense:
for i = 0 to v3
if ta.crossover(wmaS,wmaL)
cp := cp + 1
cp
You create a loop, but inside the loop, you check the same condition. So by default, you just check whether the current bar has a crossover 500 times in a row. If you want to iterate, you should use the [i] operator to check the status of the crossover in the past:
crossoverCond = ta.crossover(wmaS,wmaL)
for i = 0 to v3
if crossoverCond[i]
cp := cp + 1
cp

Binary search sort, Time out

Climbing the Leaderboard. Terminated due to timeout :(
A HackerRank challenge of algorithm category.
My approach: To reduce the overhead when large arrays as input, I used map() and a Binar search function with it. No luck. I got 7/11 test cases passed. Can I ask for some help to improve my code. The code below is my solution so far. I need help to improve it.
Problem description
class Score_board:
def climbingLeaderboard(self, scores, alice):
self.boardScores = sorted(set(scores), reverse=True)
alice_rank = list(map(self.rankBoard,alice))
return alice_rank
def rankBoard(self, current):
score_bank = self.boardScores
midIndex = len(score_bank)//2
while midIndex > 0:
if current in score_bank:
return(self.boardScores.index(current)+1)
elif current < score_bank[midIndex]:
score_bank = score_bank[midIndex:]
midIndex = len(score_bank)//2
elif current > score_bank[midIndex]:
score_bank = score_bank[:midIndex]
midIndex = len(score_bank)//2
else:
boardIndex = self.boardScores.index(score_bank[0])
if current in score_bank:
return(self.boardScores.index(current)+1)
elif current > score_bank[0] and boardIndex == 0:
return(1)
elif current < score_bank[0] and boardIndex == (len(self.boardScores)-1):
return(boardIndex + 2)
elif current > score_bank[0]:
return(boardIndex)
elif current < score_bank[0]:
return(boardIndex + 2)
if __name__ == '__main__':
scores_count = int(input())
scores = list(map(int, input().rstrip().split()))
alice_count = int(input())
alice = list(map(int, input().rstrip().split()))
coord = Score_board()
result = coord.climbingLeaderboard(scores, alice)
print('\n'.join(map(str, result)))
print('\n')

Repeat current poly reduce function on multiple objects that are selected?

I'm looping through multiple objects, but the loop stops before going to the next object.
Created a loop with condition. If condition is met, it calls a ReduceEdge() function. Problem is it will only iterate once and not go to the next object and repeat the procedure.
global proc ReduceEdge()
{
polySelectEdgesEveryN "edgeRing" 2;
polySelectEdgesEveryN "edgeLoop" 1;
polyDelEdge -cv on;
}
string $newSel[] = `ls -sl`;
for($i = 0; $i < size($newSel); $i++)
{
select $newSel[$i];
int $polyEval[] = `polyEvaluate -e $newSel[$i]`;
int $temp = $polyEval[0];
for($k = 0; $k < $temp; $k++)
{
string $polyInfo[] = `polyInfo -fn ($newSel[$i] + ".f[" + $k + "]")`;
$polyInfo = stringToStringArray($polyInfo[$i]," ");
float $vPosX = $polyInfo[2];
float $vPosY = $polyInfo[3];
float $vPosZ = $polyInfo[4];
if($vPosX == 0 && $vPosY == 0 && $vPosZ == 1.0)
{
select ($newSel[$i] + ".e[" + $k + "]");
ReduceEdge();
}
}
}
Expected results:
If I select 4 cylinders, all their edges will reduce by half the current amount.
Actual results:
When 4 cylinders are selected, only one reduces down to half the edges. The rest stay the same.
Since my comment did help you out, I'll try and give a more thorough explanation.
Your first loop (with $i) iterates over each object in your selection. This is fine.
Your second loop (with $k) iterates over the number of edges for the current object in the loop. So far, so good. Though, I'm wondering if it would be more correct to loop of the number of faces...
Now you ask for an array of all face normals of the face at index $k at object $i, with string $polyInfo[] = `polyInfo -fn ($newSel[$i] + ".f[" + $k + "]")`;.
If you try and print the size and values in $polyInfo, you'll realize you have an array with one element, which is the face normal of the particular face you queried just before. Therefore, it will always be element 0, and not $i, which would increases with every iteration.
I have made a Python/PyMEL version of the script, which may be nice for you to see.
import pymel.core as pm
import maya.mel as mel
def reduceEdge():
mel.eval('polySelectEdgesEveryN "edgeRing" 2;')
mel.eval('polySelectEdgesEveryN "edgeLoop" 1;')
pm.polyDelEdge(cv=True)
def reducePoly():
selection = pm.ls(sl=True)
for obj in selection:
for i, face in enumerate(obj.f):
normal = face.getNormal()
if (normal.x == 0.0 and normal.y == 0.0 and normal.z == 1.0):
pm.select(obj + '.e[' + str(i) + ']')
reduceEdge()
reducePoly()

Text mining Clustering Analysis in R - Error :Two dimensional array

I'm trying to follow a document that has some code on text mining clustering analysis.
I'm fairly new to R and the concept of text mining/clustering so please bear with me if i sound illiterate.
I create a simple matrix called dtm and then run kmeans to produce 3 clusters. The code im having issues is where a function has been defined to get "five most common words of the documents in the cluster"
dtm0.75 = as.matrix(dt0.75)
dim(dtm0.75)
kmeans.result = kmeans(dtm0.75, 3)
perClusterCounts = function(df, clusters, n)
{
v = sort(colSums(df[clusters == n, ]),
decreasing = TRUE)
d = data.frame(word = names(v), freq = v)
d[1:5, ]
}
perClusterCounts(dtm0.75, kmeans.result$cluster, 1)
Upon running this code i get the following error:
Error in colSums(df[clusters == n, ]) :
'x' must be an array of at least two dimensions
Could someone help me fix this please?
Thank you.
I can't reproduce your error, it works fine for me. Update your question with a reproducible example and you might get a more useful answer. Perhaps your input data object is empty, what do you get with dim(dtm0.75)?
Here it is working fine on the data that comes with the tm package:
library(tm)
data(crude)
dt0.75 <- DocumentTermMatrix(crude)
dtm0.75 = as.matrix(dt0.75)
dim(dtm0.75)
kmeans.result = kmeans(dtm0.75, 3)
perClusterCounts = function(df, clusters, n)
{
v = sort(colSums(df[clusters == n, ]),
decreasing = TRUE)
d = data.frame(word = names(v), freq = v)
d[1:5, ]
}
perClusterCounts(dtm0.75, kmeans.result$cluster, 1)
word freq
the the 69
and and 25
for for 12
government government 11
oil oil 10

Fastest and most effective way of comparing two array of hashes of different format

I have two arrays of hashes with the format:
hash1
[{:root => root_value, :child1 => child1_value, :subchild1 => subchild1_value, bases => hit1,hit2,hit3}...]
hash2
[{:path => root_value/child1_value/subchild1_value, :hit1_exist => t ,hit2_exist => t,hit3_exist => f}...]
IF I do this
Def sample
results = nil
project = Project.find(params[:project_id])
testrun_query = "SELECT root_name, suite_name, case_name, ic_name, executed_platforms FROM testrun_caches WHERE start_date >= '#{params[:start_date]}' AND start_date < '#{params[:end_date]}' AND project_id = #{params[:project_id]} AND result <> 'SKIP' AND result <> 'N/A'"
if !params[:platform].nil? && params[:platform] != [""]
#yell_and_log "platform not nil"
platform_query = nil
params[:platform].each do |platform|
if platform_query.nil?
platform_query = " AND (executed_platforms LIKE '%#{platform.to_s},%'"
else
platform_query += " OR executed_platforms LIKE '%#{platform.to_s},%'"
end
end
testrun_query += ")" + platform_query
end
if !params[:location].nil? &&!params[:location].empty?
#yell_and_log "location not nil"
testrun_query += "AND location LIKE '#{params[:location].to_s}%'"
end
testrun_query += " GROUP BY root_name, suite_name, case_name, ic_name, executed_platforms ORDER BY root_name, suite_name, case_name, ic_name"
ic_query = "SELECT ics.path, memberships.pts8210, memberships.sv6, memberships.sv7, memberships.pts14k, memberships.pts22k, memberships.pts24k, memberships.spb32, memberships.spb64, memberships.sde, projects.name FROM ics INNER JOIN memberships on memberships.ic_id = ics.id INNER JOIN test_groups ON test_groups.id = memberships.test_group_id INNER JOIN projects ON test_groups.project_id = projects.id WHERE deleted = 'false' AND (memberships.pts8210 = true OR memberships.sv6 = true OR memberships.sv7 = true OR memberships.pts14k = true OR memberships.pts22k = true OR memberships.pts24k = true OR memberships.spb32 = true OR memberships.spb64 = true OR memberships.sde = true) AND projects.name = '#{project.name}' GROUP BY path, memberships.pts8210, memberships.sv6, memberships.sv7, memberships.pts14k, memberships.pts22k, memberships.pts24k, memberships.spb32, memberships.spb64, memberships.sde, projects.name ORDER BY ics.path"
if params[:ic_type] == "never_run"
runtest = TestrunCache.connection.select_all(testrun_query)
alltest = TrsIc.connection.select_all(ic_query)
(alltest.length).times do |i|
#exec_pltfrm = test['executed_platforms'].split(",")
unfinishedtest = comparison(runtest[i],alltest[i])
yell_and_log("test = #{unfinishedtest}")
yell_and_log("#{runtest[i]}")
yell_and_log("#{alltest[i]}")
end
end
end
I get in my log:
test = true
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"cli", "case_name"=>"functional", "ic_name"=>"cli_sanity_test", "executed_platforms"=>"pts22k,pts24k,sv7,"}
array of hash 2 = {"path"=>"BSDPLATFORM/cli/functional/cli_sanity_test", "pts8210"=>"f", "sv6"=>"f", "sv7"=>"t", "pts14k"=>nil, "pts22k"=>"t", "pts24k"=>"t", "spb32"=>nil, "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_packet_9", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/copyrights", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"t", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_status_1", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/ic_1", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"t", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_status_2", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/ic_files", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"f", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
SO I get only the first to match but rest becomes different and I get result of one instead of 4230
I would like some way to match by path and root/suite/case/ic and then compare the executed platforms passed in array of hashes 1 vs platforms set to true in array of hash2
Not sure if this is fastest, and I wrote this based on your original question that didn't provide sample code, but:
def compare(h1, h2)
(h2[:path] == "#{h1[:root]}/#{h1[:child1]}/#{h1[:subchild1]}") && \
(h2[:hit1_exist] == ((h1[:bases][0] == nil) ? 'f' : 't')) && \
(h2[:hit2_exist] == ((h1[:bases][1] == nil) ? 'f' : 't')) && \
(h2[:hit3_exist] == ((h1[:bases][2] == nil) ? 'f' : 't'))
end
def compare_arr(h1a, h2a)
(h1a.length).times do |i|
compare(h1a[i],h2a[i])
end
end
Test:
require "benchmark"
h1a = []
h2a = []
def rstr
# from http://stackoverflow.com/a/88341/178651
(0...2).map{65.+(rand(26)).chr}.join
end
def rnil
rand(2) > 0 ? '' : nil
end
10000.times do
h1a << {:root => rstr(), :child1 => rstr(), :subchild1 => rstr(), :bases => [rnil,rnil,rnil]}
h2a << {:path => '#{rstr()}/#{rstr()}/#{rstr()}', :hit1_exist => 't', :hit2_exist => 't', :hit3_exist => 'f'}
end
Benchmark.measure do
compare_arr(h1a,h2a)
end
Results:
=> 0.020000 0.000000 0.020000 ( 0.024039)
Now that I'm looking at your code, I think it could be optimized by removing array creations, and splits and joins which are creating arrays and strings that need to be garbage collected which also will slow things down, but not by as much as you mention.
Your database queries may be slow. Run explain/analyze or similar on them to see why each is slow, optimize/reduce your queries, add indexes where needed, etc. Also, check cpu and memory utilization, etc. It might not just be the code.
But, there are some definite things that need to be fixed. You also have several risks of SQL injection attack, e.g.:
... start_date >= '#{params[:start_date]}' AND start_date < '#{params[:end_date]}' AND project_id = #{params[:project_id]} ...
Anywhere that params and variables are put directly into the SQL may be a danger. You'll want to make sure to use prepared statements or at least SQL escape the values. Read this all the way through: http://guides.rubyonrails.org/active_record_querying.html
([element_being_tested].each do |el|
[hash_array_1, hash_array_2].reject do |x, y|
x[el] == y[el]
end
end).each {|x, y| puts (x[bases] | y[bases])}
Enumerate the hash elements to test.
[element_being_tested].each do |el|
Then iterate through the hash arrays themselves, comparing the given hashes by the elements of the given comparison defined by the outer loop, rejecting those not appropriately equal. (The == may actually need to be != but you can figure that much out)
[hash_array_1, hash_array_2].reject do |x, y|
x[el] == y[el]
end
Finally, you again compare the hashes taking the set union of their elements.
.each {|x, y| puts (x[bases] | y[bases])}
You may need to test the code. It's not meant for production so much as demonstration because I wasn't sure I read your code right. Please post a larger sample of the source including the data structures in question if this answer is unsatisfactory.
Regarding speed: if you're iterating through a large data set and comparing multiple there's probably nothing you can do. Perhaps you can invert the loops I presented and make the hash arrays the outer loop. You're not going to get lightning speed here in Ruby (really any language) if the data structure is large.

Resources