String functions in the TinkerPop query language - graph-databases

Looking in the TinkerPop documentation, I found a list of String functions:
TextP.startingWith(string) - Does the incoming String start with the provided String?
TextP.endingWith(string) - Does the incoming String end with the provided String?
TextP.containing(string) - Does the incoming String contain the provided String?
TextP.notStartingWith(string) - Does the incoming String not start with the provided String?
TextP.notEndingWith(string) - Does the incoming String not end with the provided String?
TextP.notContaining(string) - Does the incoming String not contain the provided String?
However, I could not find a way to use them. I also tried looking at the Javadoc for TextP at http://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/process/traversal/TextP.html, but could not find any useful information there either.
Query filters like the one below work fine:
g.V().has( label, within( 'cake', 'coffee' ) ).limit(3)
Some examples of queries that I tested and that did not work:
g.V().label().startingWith('c')
g.V().label().fold().startingWith('c')
g.V().label().fold().has(__.startingWith('c'))
g.V().has(label, startingWith('c'))
g.V().has(label, TextP.startingWith('c'))
g.V().has(label.startingWith('c'))

TextP is meant to work like any other Predicate and some of the usage you listed is correct:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has(label, startingWith("p")).label()
==>person
==>person
==>person
==>person
gremlin> g.V().has('name', endingWith("o")).values('name')
==>marko
As a side note, I'm a bit surprised to see no examples in the documentation; I intend to get some added.

Considering the current limitations of Gremlin on AWS Neptune, the best way I could find to do it was:
g.V().has(label, between("c","cz"))
This also works for any particular prefix:
g.V().has(label, between("foo","fooz"))

Use the content of a tuple as a session variable

I extracted tuples from a previous response with the following regex:
.check(regex(""""idSc":(.{1,8}),"pasTemps":."codePasTemps":(.),"""").ofType[(String,String)].findAll.saveAs ("OBJECTS1"))
So I get my object:
OBJECTS1 -> List((1657751,2), (1658105,2), (4557378,2), (1657750,1), (916,1), (917,2), (1658068,1), (1658069,2), (4557379,2), (1658082,1), (4557367,1), (4557368,1), (1660865,2), (1660866,2), (1658122,1), (921,1), (922,2), (923,2), (1660875,1), (1660876,2), (1660877,2), (1658300,1), (1658301,1), (1658302,1), (1658309,1), (1658310,1), (2996562,1), (4638455,1))
After that I did a foreach and need to extract each couple to add it to the next requests, so we tried:
.foreach("${OBJECTS1}", "couple") {
  exec(http("request_foreach47")
    .get("/ctr/web/api/seriegraph/bydates/${couple(0)}/${couple(1)}/1552863600000/1554191743799")
    .headers(headers_27))
}
But I get the message: named 'couple' does not support index access
I also thought that using two regexes on the couple to extract both parts could work, but I haven't found any way to use a regex on a session variable. (Even if it's not needed for this case, I'm really interested to learn how, as it could be useful.)
I would be really thankful if you could provide me with some help. (I'm using Gatling 2 and can't use a more recent version, as it's for work and other scripts have been developed with Gatling 2.)
each "couple" is a scala tuple which can't be indexed into like a collection. Fortunately the gatling EL has a function that handles tuples.
so instead of
.get("/ctr/web/api/seriegraph/bydates/${couple(0)}/${couple(1)}/1552863600000/1554191743799")
you can use
.get("/ctr/web/api/seriegraph/bydates/${couple._1}/${couple._2}/1552863600000/1554191743799")

Tone analyser only returns analysis for 1 sentence

When using Tone Analyzer, I am only able to retrieve one result. For example, if I use the following input text:
string m_StringToAnalyse = "The World Rocks ! I Love Everything !! Bananas are awesome! Old King Cole was a merry old soul!";
The results only return the analysis for the document level and sentence_id = 0, i.e. "The World Rocks !". The analysis for the next three sentences is not returned.
Any idea what I am doing wrong, or am I missing anything? This is the case when running the provided sample code as well.
string m_StringToAnalyse = "This service enables people to discover and understand, and revise the impact of tone in their content. It uses linguistic analysis to detect and interpret emotional, social, and language cues found in text.";
Running tone analysis using the sample code on the sample sentence provided above also returns results for the document and the first sentence only.
I have tried with versions "2016-02-19" as well as "2017-03-15" with same results.
I believe that if you want sentence-by-sentence analysis you need to send every separate sentence as a JSON object. It will then return the analysis in an array where id=SENTENCE_NUM.
Here is an example of one I did using multiple YouTube comments (using Python):
# Assumed imports for the snippet below (requests, the json module, and the
# Watson Developer Cloud Python SDK that provides ToneAnalyzerV3)
import json
import requests
from watson_developer_cloud import ToneAnalyzerV3

def get_comments(video):
    # Get the comments from the YouTube API using requests
    # (youtube_credentials is assumed to be a dict holding the API key)
    url = 'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&maxResults=100&videoId='+ video +'&key=' + youtube_credentials['api_key']
    r = requests.get(url)
    comment_dict = list()
    # for each item in the comments, add an object to the list with the text of the comment
    for item in r.json()['items']:
        the_comment = {"text": item['snippet']['topLevelComment']['snippet']['textOriginal']}
        comment_dict.append(the_comment)
    # return the list as JSON to the sentiment_analysis function
    return json.dumps(comment_dict)

def sentiment_analysis(words):
    # Load Watson credentials using the Python SDK
    # (watson_credentials is assumed to be a dict holding username/password)
    tone_analyzer = ToneAnalyzerV3(
        username=watson_credentials['username'], password=watson_credentials['password'], version='2016-02-11')
    # Get the tone, based on the JSON object that is passed to sentiment_analysis
    return_sentiment = json.dumps(tone_analyzer.tone(text=words), indent=2)
    return_sentiment = json.loads(return_sentiment)
    return return_sentiment
Afterwards you can do whatever you want with the JSON object. I would also like to note, for anyone else looking at this: if you want to do an analysis of many objects, you can add sentences=False in the tone_analyzer.tone function.
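For example (a minimal sketch, assuming the same SDK version as the snippet above and that tone() accepts the sentences keyword as described):
# Same call as in sentiment_analysis above, but with per-sentence analysis
# switched off, so only the document-level tone is returned.
return_sentiment = json.dumps(tone_analyzer.tone(text=words, sentences=False), indent=2)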

How to query atom field with unicode value in Google App Engine production search?

I wrote some text search using Google App Engine search.
In the SDK I tested the following query on an atom field:
u'tag:"wartości"'
In production I run the same query, but it does not work on the same data.
How can I do unicode query on atom field?
Is it possible to use unicode in Google App Engine search?
We are aware of this issue and plan to fix ASAP. The fix that we're currently planning will require that the atom field value include exactly the same accent characters in order to match. Matches will continue to be case-insensitive. We expect that at least initially, values that use combining diacritical marks will be treated as different values than those using precomposed characters. We may revisit that decision depending on feedback, but it's the most straightforward fix on our end.
For more on the precomposed characters vs. combining diacritical marks, see this Wikipedia article:
http://en.wikipedia.org/wiki/Precomposed_character
Chris
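As an application-side workaround (not part of the answer above, just standard Python), you can normalize both the AtomField value and the query string to the same Unicode form, e.g. NFC, so that precomposed and combining-mark spellings cannot diverge; a minimal sketch:
# coding=utf-8
# Sketch only: normalize both sides to precomposed (NFC) form before
# indexing and before querying, so the accent representation always matches.
import unicodedata

def to_nfc(s):
    return unicodedata.normalize('NFC', s)

field_value = to_nfc(u'wartości')           # value stored in the AtomField
query = u'tag:"%s"' % to_nfc(u'wartości')   # query built with the same form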
It looks like I need to translate the AtomField values into a new string, and I need to translate the queries too. This workaround will only allow Polish Unicode search. I do not know the tokenization rules, so I use 'q' and 'x' to expand the alphabet, since they are not used in Polish.
# coding=utf-8
translate = {
    u'ą': u'aq',
    u'Ą': u'Aq',
    u'ć': u'cq',
    u'Ć': u'Cq',
    u'ę': u'eq',
    u'Ę': u'Eq',
    u'ł': u'lq',
    u'Ł': u'Lq',
    u'ń': u'nq',
    u'Ń': u'Nq',
    u'ó': u'oq',
    u'Ó': u'Oq',
    u'ś': u'sq',
    u'Ś': u'Sq',
    u'ż': u'zx',
    u'Ż': u'Zx',
    u'ź': u'zq',
    u'Ź': u'Zq',
}
import re
reTranslate = re.compile(u'(%s)' % u'|'.join(translate))
print reTranslate.pattern
test = u"""\
Właściwie prowadzona komunikacja wewnętrzna w firmie,\
zwłaszcza dużej czy posiadającej rozproszoną sieć oddziałów,\
może przynieść oszczędność czasu, a co za tym idzie, również pieniędzy."""
print reTranslate.sub(lambda match: translate[match.group(0)], test)
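The queries need the same substitution before they are sent to the Search API; for example (reusing translate and reTranslate from above, with the tag value being just an illustration):
query = u'tag:"%s"' % reTranslate.sub(lambda match: translate[match.group(0)], u'wartości')
print query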

Get the id + the map of a vertex on Gremlin?

g.v(1).id
gives me vertex 1 id,
g.v(1).map
gives me vertex 1 properties.
But how can I get a hash with the id and properties at the same time?
I know that it's an old question, so the answers below will work on older versions of TinkerPop (pre-3.x); but if anyone (like me) stumbles upon this question looking for a solution that works on TinkerPop 3, the same result can be achieved by calling valueMap with the 'true' argument, like this:
gremlin> g.V(1).valueMap(true)
A reference may be found in the TinkerPop docs.
As of Gremlin 2.4.0 you can also do something like:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.v(1).out.map('name','age','id')
==>{id=2, age=27, name=vadas}
==>{id=4, age=32, name=josh}
==>{id=3, age=null, name=lop}
Another alternative using transform():
gremlin> g.v(1).out.transform{[it.id,it.map()]}
==>[2, {age=27, name=vadas}]
==>[4, {age=32, name=josh}]
==>[3, {name=lop, lang=java}]
If implementing with Java, use:
g.V(1).valueMap().with(WithOptions.tokens).toList()
I've found a solution:
tab = new Table()
g.v(1).as('properties').as('id').table(tab){it.id}{it.map}
tab
Just extending on @Stephen's answer: to get the id and the map() output in a nice single Map for each Vertex, just use the plus or leftShift Map operations in the transform method.
Disclaimer: I'm using Groovy; I haven't been able to test it in Gremlin (I imagine it's exactly the same).
Groovy Code
println "==>" + g.v(1).out.transform{[id: it.id] + it.map()}.asList()
or
println "==>" + g.v(1).out.transform{[id: it.id] << it.map()}.asList()
Gives
==>[[id:2, age:27, name:vadas], [id:4, age:32, name:josh], [id:3, name:lop, lang:java]]

Plotting a word-cloud by date for a twitter search result? (using R)

I wish to search Twitter for a word (let's say #google), and then be able to generate a tag cloud of the words used in the tweets, but according to dates (for example, having a moving window of an hour that moves by 10 minutes each time and shows me how different words got used more often throughout the day).
I would appreciate any help on how to go about doing this regarding: resources for the information, code for the programming (R is the only language I am adept at using) and ideas on visualization. Questions:
How do I get the information?
In R, I found that the twitteR package has the searchTwitter command. But I don't know how big an "n" I can get from it. Also, it doesn't return the dates on which the tweets originated.
I see here that I could get up to 1500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes, I would need tens of thousands of tweets. Is it even possible to get them retrospectively? (for example, requesting older posts each time through the API URL?) If not, there is the more general question of how to create a personal storage of tweets on your home computer (a question which might be better left to another SO thread, although any insights from people here would be very interesting for me to read).
How do I parse the information (in R)? I know that R has functions from the RCurl and twitteR packages that could help, but I don't know which, or how to use them. Any suggestions would be of help.
How do I analyse the text? How do I remove all the "not interesting" words? I found that the "tm" package in R has this example:
reuters <- tm_map(reuters, removeWords, stopwords("english"))
Would this do the trick? Should I do something else/more?
Also, I imagine I would like to do that after cutting my dataset according to time, which will require some POSIX-like functions (I am not exactly sure which would be needed here, or how to use them).
And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here; any other suggestions/recommendations?
I believe I am asking a huge question here, but I tried to break it into as many straightforward questions as possible. Any help will be welcomed!
Best,
Tal
Word/Tag cloud in R using "snippets" package
www.wordle.net
Using the openNLP package you could POS-tag the tweets (POS = part of speech) and then extract just the nouns, verbs, or adjectives for visualization in a word cloud.
Maybe you can query Twitter and use the current system time as a timestamp, write to a local database, and query again in increments of x secs/mins, etc.
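A rough sketch of that polling idea (shown in Python for brevity, although the question asks about R; fetch_tweets() is a hypothetical helper standing in for whatever Twitter client you use), storing each tweet with the system time so the data can later be sliced into moving windows:
import sqlite3
import time

def fetch_tweets(query):
    # Hypothetical helper: return a list of tweet texts for 'query'
    # using whatever Twitter client or API access you have.
    raise NotImplementedError

def poll(query, interval_secs=600, rounds=6):
    db = sqlite3.connect('tweets.db')
    db.execute('CREATE TABLE IF NOT EXISTS tweets (fetched_at REAL, text TEXT)')
    for _ in range(rounds):
        now = time.time()  # current system time as the timestamp
        for text in fetch_tweets(query):
            db.execute('INSERT INTO tweets VALUES (?, ?)', (now, text))
        db.commit()
        time.sleep(interval_secs)  # query again after the chosen increment

# e.g. poll('#google') for an hour of data in 10-minute increments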
There is historical data available at http://www.readwriteweb.com/archives/twitter_data_dump_infochimp_puts_1b_connections_up.php and http://www.wired.com/epicenter/2010/04/loc-google-twitter/
As for the plotting piece: I did a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package, my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.
I note that this is an old question, and there are several solutions available via web search, but here's one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):
require(twitteR)
searchTerm='#dev8d'
#Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)
#Use a handy helper function to put the tweets into a dataframe
tw.df=twListToDF(rdmTweets)
##Note: there are some handy, basic Twitter related functions here:
##https://github.com/matteoredaelli/twitter-r-utils
#For example:
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
#Then for example, remove @'d names
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))
##Wordcloud - scripts available from various sources; I used:
#http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
#Call with eg: tw.c=generateCorpus(tw.df$text)
generateCorpus = function(df, my.stopwords=c()){
  #Install the textmining library
  require(tm)
  #The following is cribbed and seems to do what it says on the can
  tw.corpus = Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  #normalise case
  tw.corpus = tm_map(tw.corpus, tolower)
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
  tw.corpus
}
wordcloud.generate = function(corpus, min.freq=3){
  require(wordcloud)
  doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing=TRUE)
  d = data.frame(word=names(v), freq=v)
  #Generate the wordcloud
  wc = wordcloud(d$word, d$freq, min.freq=min.freq)
  wc
}
print(wordcloud.generate(generateCorpus(tweets,'dev8d'),7))
##Generate an image file of the wordcloud
png('test.png', width=600,height=600)
wordcloud.generate(generateCorpus(tweets,'dev8d'),7)
dev.off()
#We could make it even easier if we hide away the tweet grabbing code. eg:
tweets.grabber = function(searchTerm, num=500){
  require(twitteR)
  rdmTweets = searchTwitter(searchTerm, n=num)
  tw.df = twListToDF(rdmTweets)
  as.vector(sapply(tw.df$text, RemoveAtPeople))
}
#Then we could do something like:
tweets=tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets),3)
I would like to answer your question about making a big word cloud.
What I did was:
Run s0.tweet <- searchTwitter(KEYWORD, n=1500) for 7 days or more.
Combine them with this command:
rdmTweets = c(s0.tweet,s1.tweet,s2.tweet,s3.tweet,s4.tweet,s5.tweet,s6.tweet,s7.tweet)
The result:
This Square Cloud consists of about 9000 tweets.
Source: People voice about Lynas Malaysia through Twitter Analysis with R CloudStat
Hope it helps!
