Solr More Like This results narrowed down by facet field

I have a single Solr instance that indexes content from multiple sites.
While indexing, I populate a Website field so that I can run faceted searches on that field for each website, and that works fine.
However, when I use the Solr MLT feature, I get results from all websites, and I want to narrow the MLT results down to a single website.
Is it possible to define a facet (filter) for Solr MLT, or is there a better way to achieve the same thing?
If Solr supports that, is it also available in SolrNet?

Solr 3.1 doesn't support filters on the MoreLikeThis component (issue here). You have to use the MoreLikeThis handler instead, which does accept filter queries (fq). At the time of the original answer this handler was not implemented in SolrNet (issue here); it is available as of SolrNet 0.4.0 beta 1.
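Here is a minimal Java sketch of such a request. It only builds the HTTP query string. It assumes the handler is registered at /mlt in solrconfig.xml (for example with class solr.MoreLikeThisHandler). It also assumes the schema has fields named id, content and website; these names are hypothetical. The fq parameter is what restricts the similar documents to one site. It is the same kind of filter you would use for a facet drill-down.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class MltHandlerRequest {
    // Builds a GET URL for a MoreLikeThis *handler*, assumed to be registered at /mlt.
    // The field names "id", "content" and "website" are hypothetical; adjust to your schema.
    static String buildUrl(String solrBase, String docId, String website) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "id:" + docId);                   // seed document (the handler uses the first match)
        params.put("mlt.fl", "content");                  // field(s) used to measure similarity
        params.put("fq", "website:\"" + website + "\"");  // keep similar docs from one site only
        params.put("mlt.mintf", "1");                     // relax the term/doc frequency cut-offs
        params.put("mlt.mindf", "1");
        params.put("rows", "10");

        // URL-encode each value and join the pairs with '&'.
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return solrBase + "/mlt?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr", "42", "example.com"));
    }
}
```

The site value is wrapped in quotes so that values containing spaces or query syntax characters are still treated as a single term.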

Related

Solr: Stemming words using Solr

I am learning Solr and want to use it for stemming words. I'll pass a word to Solr, and it should send the stemmed word back. I know how to configure a Solr core for different stemming patterns, and I can view the stemmed words in the analyzer (Solr admin UI), but I am not sure how to do this from Java code. I am able to index and query using the Java API.
I am using Solr 5.3.0.
If you just need to stem words, I would recommend not using all of Solr. Just use the code Solr uses for stemming, or something similar. For example, you can use
org.apache.lucene.analysis.en.PorterStemmer.stem(String)
Unfortunately, PorterStemmer has package-level access, so I would just copy it from the sources, or you can search the Internet for other stemmer implementations. I hope that helps.
Good luck!
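To show what the copied class does, here is a sketch of only the first step (1a) of the Porter algorithm, which strips plural suffixes. It is not a replacement for the full stemmer, which has further steps. If Lucene stays on the classpath, there is an alternative to copying the class. The same org.apache.lucene.analysis.en package has a public PorterStemFilter (and EnglishAnalyzer), which you can apply through the normal TokenStream API.

```java
class PorterStep1a {
    // Step 1a of Porter's algorithm only: strips plural endings.
    //   SSES -> SS, IES -> I, SS -> SS, S -> (nothing)
    // The remaining steps (1b through 5) are not implemented here.
    static String step1a(String word) {
        if (word.endsWith("sses")) return word.substring(0, word.length() - 2); // caresses -> caress
        if (word.endsWith("ies"))  return word.substring(0, word.length() - 2); // ponies -> poni
        if (word.endsWith("ss"))   return word;                                 // caress -> caress
        if (word.endsWith("s"))    return word.substring(0, word.length() - 1); // cats -> cat
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

The rules must be checked in this order: "sses" before "ss", and "ss" before "s". Otherwise "caresses" would lose only its final "s".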

In the Solr MLT component, can we find out why Solr MLT returns a particular document?

I want to use Solr MLT, and I want to know why a particular document is returned by Solr MLT for my search. For example, if some documents are returned, there should be some word or phrase that matches the parent document. Is there a way in Solr to retrieve the words/phrases that caused MLT to return those similar documents of the parent?
According to SOLR-860, this information is included in the debugQuery output from Solr 3.1 onwards.

Solr field collapsing

I read
http://wiki.apache.org/solr/FieldCollapsing
and I tried the query
http://192.168.0.1:8080/solr/append/select?q=mobile&group=true&group.field=brand
and I don't see any field collapsing: I see the results, but not the grouping. My understanding is that it should just work, with nothing to change in solrconfig.xml? In my schema, all my fields are stored/indexed. My index is Lucene 2.9 and my Solr is 1.4.1. I don't see what I am doing wrong...
Field collapsing is not available in Solr 1.4.1. You need Solr 3.3 or 4.0 (currently unreleased).
The wiki page about field collapsing also explains "If you haven't already, get a recent nightly build of Solr4.0 or Solr3.3..."
Look for the "warning tags" in the Solr wiki; they show the Solr version from which a particular feature is available.
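To make clear what group=true&group.field=brand is meant to return, here is a hedged client-side sketch. It buckets an already score-sorted result list by brand. On Solr 1.4.1 something like this only approximates field collapsing, because it sees just the rows you fetched, not the whole result set. The Doc class and the brand values are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClientSideGrouping {
    // Hypothetical minimal view of a search hit: unique key plus the grouping field.
    static class Doc {
        final String id;
        final String brand;
        Doc(String id, String brand) { this.id = id; this.brand = brand; }
        @Override public String toString() { return id; }
    }

    // Buckets an already score-sorted result list by brand, keeping at most `limit`
    // docs per group (Solr's group.limit defaults to 1). Groups appear in the order
    // of their best-ranked document.
    static Map<String, List<Doc>> groupByBrand(List<Doc> sortedResults, int limit) {
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc d : sortedResults) {
            List<Doc> bucket = groups.computeIfAbsent(d.brand, k -> new ArrayList<>());
            if (bucket.size() < limit) {
                bucket.add(d);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Doc> results = List.of(
                new Doc("1", "nokia"), new Doc("2", "apple"),
                new Doc("3", "nokia"), new Doc("4", "samsung"));
        System.out.println(groupByBrand(results, 1));
    }
}
```

With a real grouping-capable Solr (3.3+), the server does this over the full result set and also reports how many documents matched in each group.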

How does Solr's MoreLikeThis component internally work to get results?

I'm new to Apache Solr and am currently trying to use MoreLikeThis as a search component (instead of the dedicated request handler). I'm finding it difficult to understand clearly how it works internally to produce more-like-this results.
For example, I'm trying to search for the word java in a document field named mytextcontentfield:
http://localhost/solr/core0/select/?q=mytextcontentfield:java&version=2.2&start=0&rows=10&indent=on&debugQuery=on&mlt=true&mlt.fl=mytextcontentfield
and I can see a moreLikeThis section in the XML response, with the unique keys of the documents in the name attribute.
My question is: how does Solr internally find more-like-this documents based on the search keyword java? Any explanation with a good example is appreciated.
It looks like there's no Solr documentation that explains this feature in detail.
However, after some googling, I managed to find a write-up on How MoreLikeThis Works in Lucene.
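With the component, the keyword java only selects the source documents (the normal q results). For each of them, MoreLikeThis picks the most "interesting" terms from that document's mlt.fl fields, weights them by tf-idf, and runs them as a new OR query.

Below is a simplified, self-contained Java sketch of that term-selection step. It is not Lucene's code. It uses whitespace tokenization and an in-memory corpus instead of the index, and it ignores stop words and word-length limits. It does use the classic Lucene idf formula, 1 + ln(numDocs / (docFreq + 1)). In Lucene itself the defaults are mlt.mintf=2, mlt.mindf=5 and mlt.maxqt=25. That is why small test indexes often return no similar documents until these are lowered.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MoreLikeThisSketch {
    // Simplified model of how MoreLikeThis chooses query terms for one source document:
    // score(term) = tf(term in source) * idf(term in corpus); keep the best maxQueryTerms.
    static List<String> interestingTerms(String source, List<String> corpus,
                                         int minTermFreq, int minDocFreq, int maxQueryTerms) {
        // Term frequencies inside the source document (whitespace tokenization only).
        Map<String, Integer> tf = new HashMap<>();
        for (String term : source.split("\\s+")) {
            tf.merge(term, 1, Integer::sum);
        }
        // Document frequencies over the corpus (each document counts a term once).
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            for (String term : new HashSet<>(Arrays.asList(doc.split("\\s+")))) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int numDocs = corpus.size();
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 0);
            if (e.getValue() < minTermFreq || docFreq < minDocFreq) {
                continue; // too rare in the source document or in the corpus
            }
            double idf = 1.0 + Math.log(numDocs / (double) (docFreq + 1)); // classic Lucene idf
            scores.put(e.getKey(), e.getValue() * idf);
        }
        // Highest-scoring terms first, truncated to maxQueryTerms.
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(maxQueryTerms)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "java solr java search lucene",  // source document
                "java coffee beans",
                "solr search engine",
                "python search");
        List<String> terms = interestingTerms(corpus.get(0), corpus, 1, 1, 3);
        // The similar-documents query is an OR over the selected terms.
        System.out.println(String.join(" OR ", terms));
    }
}
```

Here "java" wins because it is frequent in the source but not everywhere. "search", which appears in most documents, scores lowest and is dropped by the limit of three terms. The resulting query is java OR lucene OR solr.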

Identifying strings in documents with Nutch + Solr?

I'm looking into a search solution that will identify strings (company names) and use these strings for search and facets in Solr.
I'm new to Nutch and Solr, so I wonder whether this is best done in Nutch or in Solr. One solution would be to write a parser in Nutch that identifies the strings in question and then indexes the company name, later mapped to a Solr value. I'm not sure how, but I guess this could also be done inside Solr directly from the text?
Does it make sense to do this string identification in Nutch or in Solr and is there some functionality in Solr or Nutch that could help me here?
Thanks.
You could embed an NER library (see OpenNLP, LingPipe, GATE) into a custom parser, generate new fields, and create an IndexingFilter accordingly. This is not particularly difficult, and the advantage over doing it on the Solr side is that you'd benefit from the scalability of MapReduce (NLP tasks are often CPU-hungry).
See Behemoth for an example of how to embed GATE in MapReduce.
Nutch works with Solr by sending the crawled data to Solr via the Solr HTTP API. You trigger indexing by running the solrindex command. See this page for details on how to set this up.
To extract the company names, I would add the necessary code in Solr, using an UpdateRequestProcessor. It lets you add an extra step to the indexing process that adds fields to the document being indexed. Your UpdateRequestProcessor would examine the document sent to Solr by Nutch, extract the company names from the text, and add them to the document as new fields. Solr would then index the document plus the fields you added.
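Here is a minimal plain-Java sketch of the extraction step. It uses a simple dictionary (gazetteer) match rather than a statistical NER model. The class name and company list are hypothetical.

Inside an UpdateRequestProcessor, this logic would run in processAdd(AddUpdateCommand cmd):

* read the text from cmd.getSolrInputDocument();
* add each match with addField (to a hypothetical "company" field);
* then call super.processAdd(cmd).

The same matcher could also live in a Nutch IndexingFilter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class CompanyNameExtractor {
    // One precompiled pattern per known company name, kept in dictionary order.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    CompanyNameExtractor(List<String> knownCompanies) {
        for (String name : knownCompanies) {
            // Whole-word, case-insensitive match, so "Apple" does not match "pineapple".
            patterns.put(name, Pattern.compile("\\b" + Pattern.quote(name) + "\\b",
                    Pattern.CASE_INSENSITIVE));
        }
    }

    // Returns every known company name found in the text, in dictionary order.
    // In Solr, each result would become a value of a new (hypothetical) "company" field.
    List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            if (e.getValue().matcher(text).find()) {
                found.add(e.getKey());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        CompanyNameExtractor extractor =
                new CompanyNameExtractor(List.of("Acme Corp", "Apple", "Globex"));
        String page = "Acme Corp and Globex both sell pineapple juice.";
        System.out.println(extractor.extract(page));
    }
}
```

A dictionary match is cheap but only finds names you already know. An NER library, as suggested in the other answer, can find unseen names at a higher CPU cost.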
