How can I use Solr handlers internally?

Assume I want to implement a custom MoreLikeThis (or autosuggest) experience by merging the output of two different MoreLikeThis handler configurations.
The pseudocode could look like this:
class MyMoreLikeThis extends SearchHandler {
  def process(reqBuilder) {
    val mlt1 = reqBuilder.getComponent("/mlt1")
    val mlt2 = reqBuilder.getComponent("/mlt2")
    val rb1 = reqBuilder.copy()
    val rb2 = reqBuilder.copy()
    reqBuilder.results = mlt1.process(rb1).getResults ++ mlt2.process(rb2).getResults
  }
}
Alternatively, perhaps I could use the SolrJ API to access Solr from the inside.
How can I do this? Is there a better way to do it?

You can refer to the blog posts below, which explain in detail how to merge results from different queries, similar to the problem you describe:
Custom Search Handler, Idea #1
Custom Search Handler, Idea #2
The blog is authored by a former colleague of mine who has many years of expertise in search and information retrieval.
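For a rough idea of how the delegation sketched in the question could look on the Java side, here is a minimal, untested sketch (not the implementation from the linked posts). It assumes two MoreLikeThis handlers registered as "/mlt1" and "/mlt2" in solrconfig.xml; the package of SolrQueryResponse and the exact abstract methods of RequestHandlerBase vary a little between Solr versions, so adjust for yours:
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrRequestHandler;
import org.apache.solr.response.SolrQueryResponse;

public class MergingMltHandler extends RequestHandlerBase {

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        // Look up the two handlers that are already configured in solrconfig.xml
        SolrRequestHandler mlt1 = req.getCore().getRequestHandler("/mlt1");
        SolrRequestHandler mlt2 = req.getCore().getRequestHandler("/mlt2");

        // Run each handler into its own response object
        SolrQueryResponse rsp1 = new SolrQueryResponse();
        SolrQueryResponse rsp2 = new SolrQueryResponse();
        mlt1.handleRequest(req, rsp1);
        mlt2.handleRequest(req, rsp2);

        // Expose both result sets; a real implementation would merge and
        // de-duplicate the two result lists into a single "response" entry
        rsp.add("mlt1", rsp1.getValues().get("response"));
        rsp.add("mlt2", rsp2.getValues().get("response"));
    }

    @Override
    public String getDescription() {
        return "Merges the output of two MoreLikeThis handlers";
    }
}
The SolrJ route mentioned in the question (for example an EmbeddedSolrServer) would also work, but delegating to the already registered handlers keeps everything inside the same core.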

Related

Karate: can we make multiple calls from scenario outline using an array variable? [duplicate]

I currently use JUnit 5, WireMock and REST Assured for my integration tests. Karate looks very promising, yet I am struggling a bit with the setup of data-driven tests, as I need to prepare nested data structures which, in the current setup, look like the following:
import java.time.LocalDateTime
import org.junit.jupiter.api.extension.ExtensionContext
import org.junit.jupiter.params.provider.Arguments
import org.junit.jupiter.params.provider.ArgumentsProvider

// Subscription, Device and Stream.Protocol are project-specific enums
abstract class StationRequests(val stations: Collection<String>) : ArgumentsProvider {
    override fun provideArguments(context: ExtensionContext): java.util.stream.Stream<out Arguments> {
        val now = LocalDateTime.now()
        val samples = mutableListOf<Arguments>()
        stations.forEach { station ->
            Subscription.values().forEach { subscription ->
                listOf(*Device.values(), null).forEach { device ->
                    Stream.Protocol.values().forEach { protocol ->
                        listOf(null, now.minusMinutes(5), now.minusHours(2), now.minusDays(1)).forEach { startTime ->
                            samples.add(Arguments.of(subscription, device, station, protocol, startTime))
                        }
                    }
                }
            }
        }
        return java.util.stream.Stream.of(*samples.toTypedArray())
    }
}
Is there a preferred way to set up such nested data structures with Karate? I initially thought about defining 5 different arrays with sample values for subscription, device, station, protocol and startTime and then combining and merging them into a single array which would be used in the Examples: section.
I have not succeeded so far, though, and I am wondering whether there is a better way to prepare such nested data-driven tests.
I don't recommend nesting unless absolutely necessary. You may be able to "flatten" your permutations into a single table, something like this: https://github.com/intuit/karate/issues/661#issue-402624580
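For illustration only, a flattened table could look something like this (the column names come from your code; the values are invented):
Feature:
Scenario Outline: one row per permutation
* print subscription, device, station, protocol, startTime

Examples:
| subscription | device | station | protocol | startTime           |
| BASIC        | MOBILE | S1      | HLS      | 2020-01-01T10:00:00 |
| PREMIUM      | TV     | S2      | DASH     | 2020-01-02T08:30:00 |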
That said, look out for the alternate option to Examples: which just might work for your case: https://github.com/intuit/karate#data-driven-features
EDIT: In version 1.3.0, a new @setup life cycle was introduced that changes the example below a bit.
Here's a simple example:
Feature:
Scenario:
* def data = [{ rows: [{a: 1},{a: 2}] }, { rows: [{a: 3},{a: 4}] }]
* call read('called.feature@one') data
and this is called.feature:
@ignore
Feature:
@one
Scenario:
* print 'one:', __loop
* call read('called.feature@two') rows
@two
Scenario:
* print 'two:', __loop
* print 'value of a:', a
This is how it looks in the new HTML report (which is in 0.9.6.RC2 and may need more fine-tuning), and it shows how Karate can support "nesting" even in the report, which Cucumber cannot do. Maybe you can provide feedback and let us know if it is ready for release :)

How can I achieve dictionary-type data access in Chromium Embedded (CEF1)?

I would like to achieve a dictionary-like data pattern that can be accessed from JavaScript. Something like this:
Pseudocode:
for all records:
{
    rec = // get the record
    rec["Name"]
    rec["Address"]
}
I am trying to achieve this with CefV8Accessor, but I am not getting near the solution.
Kindly provide a few links for reference, as the documentation for Chromium Embedded seems rather sparse.
If I understand correctly, you're trying to create a JS "dictionary" object for CEF using C++. If so, here's a code snippet that does that:
CefRefPtr<CefV8Value> GetDictionary(__in const wstring& sName, __in const wstring& sAddress)
{
    CefRefPtr<CefV8Value> objectJS = CefV8Value::CreateObject(NULL);
    objectJS->SetValue(L"Name", CefV8Value::CreateString(sName), V8_PROPERTY_ATTRIBUTE_NONE);
    objectJS->SetValue(L"Address", CefV8Value::CreateString(sAddress), V8_PROPERTY_ATTRIBUTE_NONE);
    return objectJS;
}
The CefV8Accessor can also be used for this, but only if you want specific control over the Set and Get methods in order to create a new type of object.
In that case you should create a class that inherits CefV8Accessor, implement the Set and Get methods (in a similar way to what appears in the code above), and pass it to the CreateObject method. The return value would be an instance of that new type of object.
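For instance, a bare-bones accessor-backed record might look roughly like this. This is only an illustrative sketch: the std::map backing store and the names are my own assumptions, and the exact virtual signatures and CreateObject overloads should be checked against your CEF1 headers:
#include <map>
#include <string>
#include "include/cef_v8.h"

class RecordAccessor : public CefV8Accessor
{
public:
    // Called when JS reads a property such as rec["Name"]
    virtual bool Get(const CefString& name, const CefRefPtr<CefV8Value> object,
                     CefRefPtr<CefV8Value>& retval, CefString& exception)
    {
        std::map<std::wstring, std::wstring>::const_iterator it = m_values.find(name.ToWString());
        if (it == m_values.end())
            return false; // property not handled here
        retval = CefV8Value::CreateString(it->second);
        return true;
    }

    // Called when JS writes a property such as rec["Name"] = "foo"
    virtual bool Set(const CefString& name, const CefRefPtr<CefV8Value> object,
                     const CefRefPtr<CefV8Value> value, CefString& exception)
    {
        if (value->IsString())
            m_values[name.ToWString()] = value->GetStringValue().ToWString();
        return true;
    }

private:
    std::map<std::wstring, std::wstring> m_values;

    IMPLEMENT_REFCOUNTING(RecordAccessor);
};

// Usage (check the CreateObject overloads in your CEF1 version; the accessor is
// passed either directly or after a user_data argument), then register the keys
// that should be routed through Get/Set:
// CefRefPtr<CefV8Value> rec = CefV8Value::CreateObject(NULL, new RecordAccessor());
// rec->SetValue(L"Name", V8_ACCESS_CONTROL_DEFAULT, V8_PROPERTY_ATTRIBUTE_NONE);
// rec->SetValue(L"Address", V8_ACCESS_CONTROL_DEFAULT, V8_PROPERTY_ATTRIBUTE_NONE);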
I strongly suggest browsing through this link, if you haven't already.

Retrieving db.Property choices

Searching the documentation provided by Google and browsing SO, I haven't found a way to retrieve the choices set on a db.Property object (I want to retrieve them in order to create forms based on the model).
I'm using the following recipe to do what I need. Is this correct? Is there another way of doing it (simpler, more elegant, more Pythonic, etc.)?
For a model like this:
class PhoneNumber(db.Model):
    contact = db.ReferenceProperty(Contact,
                                   collection_name='phone_numbers')
    phone_type = db.StringProperty(choices=('home', 'work'))
    number = db.PhoneNumberProperty()
I make the following modification:
class PhoneNumber(db.Model):
    _phone_types = ('home', 'work')
    contact = db.ReferenceProperty(Contact,
                                   collection_name='phone_numbers')
    phone_type = db.StringProperty(choices=_phone_types)
    number = db.PhoneNumberProperty()

    @classmethod
    def get_phone_types(cls):
        return cls._phone_types
You should be able to use PhoneNumber.phone_type.choices. If you want you could make that into a class method too:
@classmethod
def get_phone_types(class_):
    return class_.phone_type.choices
You can decide if you prefer the class method approach or not.
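For example, a quick illustration against the model above (the form_options line is just made up to show the form-building use case mentioned in the question):
# the choices are available straight off the class attribute
choices = PhoneNumber.phone_type.choices          # ('home', 'work')

# e.g. turn them into (value, label) pairs for a select widget
form_options = [(c, c.title()) for c in choices]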
Don't forget about Python's dir built-in! It is very useful when exploring objects.

How to access NHibernate.IQueryOver<T,T> within ActiveRecord?

I primarily use DetachedCriteria; it has a static method For to construct an instance. But it seems IQueryOver is more favourable, and I want to use it.
The normal way to get an instance of IQueryOver is ISession.QueryOver, and I want to get it with ActiveRecord gracefully. Does anyone know the way? Thanks.
First, QueryOver.Of<T>() returns an instance of QueryOver<T, T>; you then build conditions with the QueryOver API.
After that, query.DetachedCriteria returns the equivalent DetachedCriteria, which can be used with ActiveRecord gracefully.
var query = QueryOver.Of<PaidProduct>()
    .Where(paid =>
        paid.Account.OrderNumber == orderNumber
        && paid.ProductDelivery.Product == product)
    .OrderBy(paid => paid.ProductDelivery.DeliveredDate).Desc;

return ActiveRecordMediator<PaidProduct>.FindAll(query.DetachedCriteria);
As far as I know, there is no direct support for QueryOver. I encourage you to create an item in the issue tracker, then fork the repository and implement it. I'd start by looking at the implementation of ActiveRecordLinqBase; it should be similar. But instead of a separate class, you could just implement this in ActiveRecordBase, then wrap it in ActiveRecordMediator so that it can also be used in classes that don't inherit ActiveRecordBase.

Plotting a word cloud by date for a Twitter search result (using R)

I wish to search Twitter for a word (let's say #google), and then be able to generate a tag cloud of the words used in tweets, but according to dates (for example, having a moving window of an hour that moves by 10 minutes each time, and shows me how different words became more frequently used throughout the day).
I would appreciate any help on how to go about doing this regarding: resources for the information, code for the programming (R is the only language I am apt at using) and ideas for visualization. Questions:
How do I get the information?
In R, I found that the twitteR package has the searchTwitter command, but I don't know how big an "n" I can get from it. Also, it doesn't return the dates on which the tweets originated.
I see here that I could get up to 1,500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes, I would need tens of thousands of tweets. Is it even possible to get them retrospectively (for example, requesting older posts each time through the API URL)? If not, there is the more general question of how to create a personal store of tweets on your home computer (a question which might be better left to another SO thread, although any insights from people here would be very interesting for me to read).
How to parse the information (in R)? I know that R has functions from the RCurl and twitteR packages that could help, but I don't know which, or how to use them. Any suggestions would help.
How to analyse? How do I remove all the "not interesting" words? I found that the "tm" package in R has this example:
reuters <- tm_map(reuters, removeWords, stopwords("english"))
Would this do the trick? Should I do something else or more?
Also, I imagine I would like to do that after cutting my dataset according to time, which will require some POSIX-like functions (I am not exactly sure which would be needed here, or how to use them).
And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here; any other suggestions or recommendations?
I believe I am asking a huge question here, but I tried to break it into as many straightforward questions as possible. Any help will be welcomed!
Best,
Tal
Word/Tag cloud in R using "snippets" package
www.wordle.net
Using the openNLP package you could POS-tag the tweets (POS = part of speech) and then extract just the nouns, verbs or adjectives for visualization in a word cloud.
Maybe you can query Twitter and use the current system time as a timestamp, write to a local database, and query again in increments of x secs/mins, etc.
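For example, a very rough sketch of that polling idea (the file name, search term and interval are made up; searchTwitter and twListToDF are the same twitteR functions used in the answers below):
require(twitteR)
repeat {
  batch <- twListToDF(searchTwitter("#google", n = 100))
  batch$fetched_at <- Sys.time()   # local timestamp, used later for the moving window
  write.table(batch, "tweets.csv", sep = ",", row.names = FALSE,
              col.names = !file.exists("tweets.csv"), append = TRUE)
  Sys.sleep(600)                   # wait ten minutes, then poll again
}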
There is historical data available at http://www.readwriteweb.com/archives/twitter_data_dump_infochimp_puts_1b_connections_up.php and http://www.wired.com/epicenter/2010/04/loc-google-twitter/
As for the plotting piece: I did a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package; my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.
I note that this is an old question, and there are several solutions available via web search, but here's one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):
require(twitteR)
searchTerm='#dev8d'
#Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)
#Use a handy helper function to put the tweets into a dataframe
tw.df=twListToDF(rdmTweets)
##Note: there are some handy, basic Twitter related functions here:
##https://github.com/matteoredaelli/twitter-r-utils
#For example:
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
#Then for example, remove @'d names
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))
##Wordcloud - scripts available from various sources; I used:
#http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
#Call with eg: tw.c=generateCorpus(tw.df$text)
generateCorpus = function(df, my.stopwords=c()){
  #Install the textmining library
  require(tm)
  #The following is cribbed and seems to do what it says on the can
  tw.corpus = Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  #normalise case
  tw.corpus = tm_map(tw.corpus, tolower)
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
  tw.corpus
}
wordcloud.generate = function(corpus, min.freq=3){
  require(wordcloud)
  doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing=TRUE)
  d = data.frame(word=names(v), freq=v)
  #Generate the wordcloud
  wc = wordcloud(d$word, d$freq, min.freq=min.freq)
  wc
}
print(wordcloud.generate(generateCorpus(tweets,'dev8d'),7))
##Generate an image file of the wordcloud
png('test.png', width=600,height=600)
wordcloud.generate(generateCorpus(tweets,'dev8d'),7)
dev.off()
#We could make it even easier if we hide away the tweet grabbing code. eg:
tweets.grabber = function(searchTerm, num=500){
  require(twitteR)
  rdmTweets = searchTwitter(searchTerm, n=num)
  tw.df = twListToDF(rdmTweets)
  as.vector(sapply(tw.df$text, RemoveAtPeople))
}
#Then we could do something like:
tweets=tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets),3)
I would like to answer your question about making a big word cloud.
What I did is:
Use s0.tweet <- searchTwitter(KEYWORD, n=1500) for 7 days or more, such as THIS.
Combine them with this command:
rdmTweets = c(s0.tweet, s1.tweet, s2.tweet, s3.tweet, s4.tweet, s5.tweet, s6.tweet, s7.tweet)
The result:
This square cloud consists of about 9,000 tweets.
Source: People voice about Lynas Malaysia through Twitter Analysis with R CloudStat
Hope it helps!
