I have encountered functions DATE_ADDDAYSTODATE, DATE_ADDMONTHSTODATE, and DATE_ADDYEARSTODATE that work but I cannot locate any documentation:
SELECT DATE_ADDDAYSTODATE(1, '2022-01-01'),
DATE_ADDMONTHSTODATE(1, '2022-01-01'),
DATE_ADDYEARSTODATE(1, '2022-01-01');
Output: the date advanced by one day, one month, and one year, respectively.
In general:
SELECT DATE_ADD<time_part>STODATE(...),
DATE_ADD<time_part>STOTIMESTAMP(...)
They seem to be aliases of, or equivalent to, DATEADD: DATEADD(<time_part>, 1, '2022-01-01'). They are not discoverable using SHOW FUNCTIONS:
SHOW BUILTIN FUNCTIONS LIKE 'DATE%';
I am searching for a documentation entry, or an explanation of why they cannot be listed with the SHOW FUNCTIONS command.
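For comparison, here is the documented DATEADD form, which I believe produces the same results (an assumption based on the observed behaviour, since the functions above are undocumented):
SELECT DATEADD(day, 1, '2022-01-01'::DATE),
       DATEADD(month, 1, '2022-01-01'::DATE),
       DATEADD(year, 1, '2022-01-01'::DATE);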
I have problems using jOOQ with Kotlin and the ANY clause.
Given the following:
I have a database field in a PostgreSQL database which is an array
I have search parameters which are a List of Strings
I want to use the jOOQ any operator to search in the array
I have the following code which is not working:
fun findAll(
    someArrayListOfStrings: List<String>?
): List<SomeDTO> {
    val filters = ArrayList<Condition>()
    // This is the line that fails to compile:
    filters.add(TABLE.SOME_FIELD.eq(DSL.any(someArrayListOfStrings)))
    // ... building and executing the query is omitted
}
Here I want to dynamically create filters (jOOQ Conditions) to be added to an SQL statement. The idea is to check whether SOME_FIELD (a PostgreSQL array type) contains one of the given strings, using the ANY clause (the PostgreSQL jOOQ binding). However, I get the following compile-time error:
None of the following functions can be called with the arguments supplied:
public abstract fun eq(p0: Array<(out) String!>!): Condition defined in org.jooq.TableField
public abstract fun eq(p0: Field<Array<(out) String!>!>!): Condition defined in org.jooq.TableField
public abstract fun eq(p0: QuantifiedSelect<out Record1<Array<(out) String!>!>!>!): Condition defined in org.jooq.TableField
public abstract fun eq(p0: Select<out Record1<Array<(out) String!>!>!>!): Condition defined in org.jooq.TableField
But my function call should match the third signature, where QuantifiedSelect is used.
I have looked for hours on the internet but was not able to find a solution; every site I found suggested the approach I already have. Does anyone have an idea what I could try, and why this does not work?
Thank you!
The method you're calling here is DSL.any(T...), which takes a generic varargs array (in Java). You're passing a List<String>, so this binds T = List<String>, which doesn't satisfy the type constraint on the eq() method.
But even if you changed that to an Array<String>, it wouldn't work, because the jOOQ ANY operator doesn't do exactly the same thing as the PostgreSQL any(array) operator. So resort to either plain SQL templating:
condition("{0} = any({1})", TABLE.SOME_FIELD,
DSL.value(someArrayListOfStrings.toTypedArray()))
Or just use the IN predicate (note that in is a keyword in Kotlin, so it has to be escaped with backticks):
TABLE.SOME_FIELD.`in`(someArrayListOfStrings)
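For completeness, a minimal sketch of the whole dynamic-filter function using the first option (hedged: TABLE, SOME_FIELD, and SomeDTO are the placeholders from your question, and SOME_FIELD is assumed to be a text[] column):
import org.jooq.Condition
import org.jooq.DSLContext
import org.jooq.impl.DSL

fun findAll(dsl: DSLContext, someArrayListOfStrings: List<String>?): List<SomeDTO> {
    val filters = ArrayList<Condition>()
    if (!someArrayListOfStrings.isNullOrEmpty()) {
        // Plain SQL templating: {0} and {1} are placeholders for jOOQ query parts
        filters.add(
            DSL.condition(
                "{0} = any({1})",
                TABLE.SOME_FIELD,
                DSL.value(someArrayListOfStrings.toTypedArray())
            )
        )
    }
    return dsl.selectFrom(TABLE)
        .where(filters)
        .fetchInto(SomeDTO::class.java)
}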
When I follow this documentation:
https://learn.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.relationalqueryableextensions.fromsqlraw?view=efcore-5.0 I get an error message:
FormatException: Input string was not in a correct format.
Example:
Context.Entity.FromSqlRaw("SELECT * FROM dbo.SomeFunction({@MyParameter})", new SqlParameter("@MyParameter", "some value"))
However, when I follow this documentation:
https://learn.microsoft.com/en-us/ef/core/querying/raw-sql
the SQL that is created is sound and the query executes correctly:
Example:
Context.Entity.FromSqlRaw("SELECT * FROM dbo.SomeFunction(@MyParameter)", new SqlParameter("@MyParameter", "some value"))
Here the parameters are specified without curly braces - which I think is logical, since there is no string interpolation going on here.
It seems to me that parts of the documentation are wrong. Or am I missing something?
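For reference, here is a sketch of the two parameter styles that do appear to be supported ({0} is a valid composite-format index, which would explain why a named token inside braces throws FormatException; dbo.SomeFunction is the placeholder from above):
// Indexed placeholder: EF Core creates the DbParameter for you.
var byIndex = Context.Entity
    .FromSqlRaw("SELECT * FROM dbo.SomeFunction({0})", "some value")
    .ToList();

// Named parameter, no braces: the name in the SQL must match the SqlParameter.
var byName = Context.Entity
    .FromSqlRaw(
        "SELECT * FROM dbo.SomeFunction(@MyParameter)",
        new SqlParameter("@MyParameter", "some value"))
    .ToList();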
g.V()
.has('atom', '_value', 'red').fold()
.coalesce(unfold(), addV('atom').property('_value', 'red')).as('atom')
.out('view').has('view', '_name', 'color').fold()
.coalesce(unfold(), addE('view').from('atom').to(addV('view').property('_name', 'color')))
Gives me an error:
The provided traverser does not map to a value: []->[SelectOneStep(last,atom)] (597)
What does it mean?
Adding to this in case someone else comes across this.
This specific error occurs when you use the id as a string in from() instead of the vertex object.
To see what I mean, as a simple test run the following gremlin query:
g.addE('view').from('atom').to(addV('view').property('_name', 'color'))
then run this query:
g.addE('view').from(V('atom')).to(addV('view').property('_name', 'color'))
The first query will give you the error stated above, the second one will not.
So it looks like when as() is followed by fold(), the label set in the as() step is lost. I used aggregate() instead, as follows:
g.V()
.has('atom', '_value', 'red')
.fold().coalesce(
unfold(),
addV('atom').property('_value', 'red')
)
.aggregate('atom')
.out('view').has('view', '_name', 'color')
.fold().coalesce(
unfold(),
addE('view')
.from(select('atom').unfold())
.to(addV('view').property('_name', 'color'))
.inV()
)
The fold() step is what is known as a reducing barrier step. With reducing barrier steps, any path history of a traversal (such as a label applied via as()) is lost: many traversers are reduced down to a single traverser, and after that step there is no way to know which of the original labeled vertices would be the correct one to retrieve. aggregate() works because it stores the vertices in a side-effect, which survives the barrier.
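A minimal way to see the difference (a sketch against the TinkerPop "modern" toy graph, where 'marko' is a vertex name, run in the Gremlin console):
// Fails with "The provided traverser does not map to a value":
// the as('a') label is path history, and fold() discards it.
g.V().has('name', 'marko').as('a').fold().unfold().select('a')

// Works: aggregate('a') stores the vertex in a side-effect,
// which select('a') can still read after the barrier.
g.V().has('name', 'marko').aggregate('a').fold().unfold().select('a')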
I have a data frame DF which contains numerous variables. Each variable is present twice because I am conducting an analysis of "couples".
Among others, DF has a series of indicators of diversity:
DF$div1.1, DF$div2.1, ..., DF$divN.1, DF$div1.2, ..., DF$divN.2
Similarly, it has a series of indicators of another characteristic:
DF$char1.1, DF$char2.1, ..., DF$charM.1, DF$char1.2, ..., DF$charM.2
Here's a link to an example of DF: http://shorttext.com/5d90dd64
In each case, ".1" and ".2" indicate which member of the couple is considered.
My goal:
For each indicator divI and charJ, I want to create another variable DF$divchar that takes the value DF$divI.1 when DF$charJ.1>DF$charJ.2; and DF$divI.2 when DF$charJ.1<DF$charJ.2.
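In other words, for a single pair (divI, charJ) the rule would be the following one-liner (an illustration; divI/charJ stand for concrete names like div1/char1):
DF$divchar <- ifelse(DF$charJ.1 > DF$charJ.2, DF$divI.1, DF$divI.2)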
Here is the solution I came up with; it seems very intricate and sometimes behaves in strange ways:
I created a series of binary variables that take the value one if DF$charJ.1 > DF$charJ.2. They are stored under DF$CharMax.1.
Here's how I created it:
# (nam, names.1 and names.2 are defined elsewhere: the common stems and the
# ".1"/".2" column names of the char indicators, respectively)
DF$CharMax.1 <- as.data.frame(
  sapply(1:length(nam),
         function(n)
           as.numeric(DF[names(DF) == names.1[n]]
                      > DF[names(DF) == names.2[n]])
  ))
I created the function BinaryExtract:
BinaryExtract <- function(var1, var2, extract) {var1*extract + var2*(1-extract)}
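For instance, a hypothetical call picking div1.1 wherever the first CharMax flag is 1:
DF$divchar <- BinaryExtract(DF$div1.1, DF$div1.2, DF$CharMax.1[, 1])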
I created the matrix NameFull that contains all the possible combinations of div and char, separated by "YY":
NameFull <- sapply(c("div1", ..., "divN"),
                   function(nam) paste(nam, names(DF$YMax.1), sep="YY"))
And then I create all my variables:
DF[, as.vector(NameFull)] <- lapply(as.vector(NameFull), function(e)
  BinaryExtract(DF[, paste0(unlist(strsplit(e, "YY"))[1], ".1")]  # the .1 value
              , DF[, paste0(unlist(strsplit(e, "YY"))[1], ".2")]  # the .2 value
              , DF$CharMax.1[unlist(strsplit(e, "YY"))[2]]))      # the extract flag
My Problem
A. It looks like a very complicated solution for something that simple. What am I missing?
B. Moreover, when I print DF (just typing DF in the command window), I do not see the NameFull variables; they seem to appear with the names of char.
Here's what I get: http://shorttext.com/5d9102c
Similarly, I have tried to change all their names to get rid of the "YY" and it does not seem to work:
names(DF[, as.vector(NameFull)]) <- as.vector(sapply(c("div1", ..., "divN"), function(nam)
  paste(nam, names(DF$YMax.1), sep=".")))
When I look at names(DF), I keep getting the old names with the "YY"
However, I do get a result if I explicitly call for them
> DF[,"divIYYcharJ"]
I would really appreciate any suggestions, comments and explanations. I am quite new to R and was more used to Stata. I feel there is something deeply inefficient here. Thanks!
I wish to search Twitter for a word (let's say #google) and then generate a tag cloud of the words used in tweets, broken down by date (for example, with a moving window of an hour that shifts by 10 minutes each time, showing me how different words became more frequent throughout the day).
I would appreciate any help on how to go about this, regarding: resources for the information, code for the programming (R is the only language I am apt at using), and ideas on visualization. Questions:
How do I get the information?
In R, I found that the twitteR package has the searchTwitter command, but I don't know how big an "n" I can get from it. Also, it doesn't return the dates on which the tweets originated.
I see here that I could get up to 1500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes I would need tens of thousands of tweets. Is it even possible to get them retrospectively (for example, requesting older posts each time through the API URL)? If not, there is the more general question of how to create a personal store of tweets on your home computer (a question which might be better left to another SO thread, although any insights from people here would be very interesting to read).
How do I parse the information (in R)? I know that R has functions that could help, in the RCurl and twitteR packages, but I don't know which ones, or how to use them. Any suggestions would be of help.
How do I analyse it? How do I remove all the "not interesting" words? I found that the tm package in R has this example:
reuters <- tm_map(reuters, removeWords, stopwords("english"))
Would this do the trick? Should I do something else/more?
Also, I imagine I would like to do that after cutting my dataset according to time, which will require some POSIX-style date functions (I am not exactly sure which would be needed here, or how to use them).
And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here; any other suggestions/recommendations?
I believe I am asking a huge question here, but I tried to break it into as many straightforward questions as possible. Any help will be welcomed!
Best,
Tal
Word/Tag cloud in R using "snippets" package
www.wordle.net
Using the openNLP package you could POS-tag the tweets (POS = part of speech) and then extract just the nouns, verbs or adjectives for visualization in a word cloud.
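A rough, untested sketch of that idea, assuming the openNLP annotator API together with the NLP package (the English models come from the separate openNLPmodels.en package):
library(NLP)
library(openNLP)  # plus openNLPmodels.en for the English models

# Keep only words whose POS tag is in `keep` (NN/NNS = singular/plural nouns)
pos_filter <- function(txt, keep = c("NN", "NNS")) {
  s <- as.String(txt)
  a <- NLP::annotate(s, list(Maxent_Sent_Token_Annotator(),
                             Maxent_Word_Token_Annotator(),
                             Maxent_POS_Tag_Annotator()))
  words <- subset(a, type == "word")
  tags <- sapply(words$features, `[[`, "POS")
  s[words][tags %in% keep]
}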
Maybe you can query Twitter and use the current system time as a timestamp, write to a local database, and query again in increments of x secs/mins, etc.
There is historical data available at http://www.readwriteweb.com/archives/twitter_data_dump_infochimp_puts_1b_connections_up.php and http://www.wired.com/epicenter/2010/04/loc-google-twitter/
As for the plotting piece: I did a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package, my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.
I note that this is an old question, and there are several solutions available via web search, but here's one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):
require(twitteR)
searchTerm='#dev8d'
#Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)
#Use a handy helper function to put the tweets into a dataframe
tw.df=twListToDF(rdmTweets)
##Note: there are some handy, basic Twitter related functions here:
##https://github.com/matteoredaelli/twitter-r-utils
#For example:
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
#Then, for example, remove @ names:
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))
##Wordcloud - scripts available from various sources; I used:
#http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
#Call with eg: tw.c=generateCorpus(tw.df$text)
generateCorpus= function(df,my.stopwords=c()){
#Install the textmining library
require(tm)
#The following is cribbed and seems to do what it says on the can
tw.corpus= Corpus(VectorSource(df))
# remove punctuation
tw.corpus = tm_map(tw.corpus, removePunctuation)
#normalise case (newer versions of tm need content_transformer(tolower) here)
tw.corpus = tm_map(tw.corpus, tolower)
# remove stopwords
tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
tw.corpus
}
wordcloud.generate=function(corpus,min.freq=3){
require(wordcloud)
doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
dm = as.matrix(doc.m)
# calculate the frequency of words
v = sort(rowSums(dm), decreasing=TRUE)
d = data.frame(word=names(v), freq=v)
#Generate the wordcloud
wc=wordcloud(d$word, d$freq, min.freq=min.freq)
wc
}
print(wordcloud.generate(generateCorpus(tweets,'dev8d'),7))
##Generate an image file of the wordcloud
png('test.png', width=600,height=600)
wordcloud.generate(generateCorpus(tweets,'dev8d'),7)
dev.off()
#We could make it even easier if we hide away the tweet grabbing code. eg:
tweets.grabber=function(searchTerm,num=500){
require(twitteR)
rdmTweets = searchTwitter(searchTerm, n=num)
tw.df=twListToDF(rdmTweets)
as.vector(sapply(tw.df$text, RemoveAtPeople))
}
#Then we could do something like:
tweets=tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets),3)
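The question also asked about a moving time window; here is a small sketch of that part, assuming tw.df came from twListToDF() above (it includes a POSIXct created column), with the hour-wide window and 10-minute step from the question:
# Text of tweets created in [start, start + width.mins):
window.tweets <- function(tw.df, start, width.mins=60) {
  end <- start + width.mins * 60
  tw.df$text[tw.df$created >= start & tw.df$created < end]
}

# Slide the window in 10-minute steps and build one corpus per window:
starts <- seq(min(tw.df$created), max(tw.df$created), by=10 * 60)
corpora <- lapply(starts, function(s) generateCorpus(window.tweets(tw.df, s)))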
I would like to answer your question about making a big word cloud.
What I did is:
Run s0.tweet <- searchTwitter(KEYWORD, n=1500) once a day for 7 days or more.
Combine them with this command:
rdmTweets = c(s0.tweet,s1.tweet,s2.tweet,s3.tweet,s4.tweet,s5.tweet,s6.tweet,s7.tweet)
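Since searches run on consecutive days can overlap, it may help to drop duplicate statuses before building the cloud (a small sketch; twListToDF() exposes each status id):
tw.df <- twListToDF(rdmTweets)
tw.df <- tw.df[!duplicated(tw.df$id), ]  # keep each tweet only once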
The result: a square word cloud consisting of about 9000 tweets.
Source: People voice about Lynas Malaysia through Twitter Analysis with R CloudStat
Hope it helps!