Developing custom facet calculations in Solr

I'm looking into using Solr for a project where we have some specific faceting requirements. From what I've learned, Solr provides range facets, which can group numeric values or dates into ranges, i.e. field values are "grouped" and aggregated into different bins.
I would like to do something similar, but I want to create a custom function that maps field values to my specific facets, so that each field value is evaluated using a function to see which facet it belongs to.
myFacet = myFacetMapper(fieldValue)
It's sort of a more advanced version of range facets, where values are mapped by a custom function rather than just into fixed bins.
Does anyone know if this is possible and where to start?

I would look into using SimpleFacets to implement your logic, then embed it inside a SearchComponent that you register in your solrconfig.xml. Look at the code of FacetComponent for an example.

Create another field whose value is myFacetMapper(field), computed when the document is indexed, then do normal faceting on that field.
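A minimal Python sketch of that idea, assuming documents are enriched on the client before they are sent to Solr. The mapper `my_facet_mapper`, the `price` field and the `price_bucket_s` field name are hypothetical (a `*_s` dynamic string field as in Solr's example schema is assumed). The `Counter` at the end only previews locally what a plain `facet.field` request on the derived field would count.

```python
from collections import Counter


def my_facet_mapper(price: float) -> str:
    """Hypothetical stand-in for myFacetMapper: buckets by the cents part, not by range."""
    cents = round(price * 100) % 100
    if cents == 99:
        return "charm_price"
    if cents == 0:
        return "whole_amount"
    return "other"


def add_facet_field(doc: dict) -> dict:
    """Return a copy of the document with the derived facet field set."""
    enriched = dict(doc)
    # Hypothetical field name; assumes a *_s dynamic string field exists in the schema.
    enriched["price_bucket_s"] = my_facet_mapper(doc["price"])
    return enriched


docs = [
    {"id": "1", "price": 9.99},
    {"id": "2", "price": 10.0},
    {"id": "3", "price": 12.5},
    {"id": "4", "price": 19.99},
]
enriched_docs = [add_facet_field(d) for d in docs]

# After indexing enriched_docs, an ordinary field facet does the grouping.
facet_params = {"q": "*:*", "rows": "0", "facet": "true", "facet.field": "price_bucket_s"}

# Local preview of the counts that facet would report for these documents.
preview = Counter(d["price_bucket_s"] for d in enriched_docs)
```

The trade-off is that changing the mapping means reindexing, whereas the SearchComponent approach from the other answer computes it at query time.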

Related

Word proximity not working in Apache Solr

I am using the dismax parser to boost phrase queries as follows:
qf=story_title^5.0+tax_payer_name+judgement_text^1.0+story_description^1.0+tax_payer_name+nature_of_the_issues+decision_summary+additional_comments+facts_of_the_case+section_number
pf=story_title^5.0+judgement_text+story_description^1+nature_of_the_issues+decision_summary+additional_comments+facts_of_the_case+section_number
qs=3
ps=3
But whenever I search for something like 54F beed registration, some results come up that contain the word registration many times but not the phrase 54F beed registration.
I read somewhere that the Solr score depends on how often a word is repeated in a document.
How can we override this behavior to get the desired results in Solr?
Thanks in advance.
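For reference, the same dismax parameters built in Python. Each of `qf` and `pf` is a single space-separated list of `field^boost` entries (the duplicated `tax_payer_name` is dropped here); `ps` is the slop applied to the `pf` phrase boost, while `qs` only applies to phrases written explicitly in `q`. The query text comes from the question; actually sending the request is left out of this sketch.

```python
from urllib.parse import urlencode

# Fields from the question (duplicate tax_payer_name removed).
qf_fields = [
    "story_title^5.0", "tax_payer_name", "judgement_text^1.0",
    "story_description^1.0", "nature_of_the_issues", "decision_summary",
    "additional_comments", "facts_of_the_case", "section_number",
]
pf_fields = [
    "story_title^5.0", "judgement_text", "story_description^1",
    "nature_of_the_issues", "decision_summary", "additional_comments",
    "facts_of_the_case", "section_number",
]

params = {
    "defType": "dismax",
    "q": "54F beed registration",
    # qf and pf each take ONE space-separated list of field^boost entries.
    "qf": " ".join(qf_fields),
    "pf": " ".join(pf_fields),
    "qs": "3",  # slop for phrases written explicitly in q
    "ps": "3",  # slop for the pf phrase boost
}
# urlencode turns spaces into '+' and '^' into '%5E'.
query_string = urlencode(params)
```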
I don't think there's an omitTermFreq setting yet (the existing omitTermFreqAndPositions also drops positions, which phrase queries need), even though it has been mentioned many times.
A possible solution is to create your own similarity class by subclassing DefaultSimilarity and returning 1.0f as the tf value.
See Solr Custom Similarity for an example of how to implement a custom similarity class. Recent versions of Solr (4.0+) support a custom similarity class per field.
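To see why repetitions hurt, here is a toy Python comparison. It only sums tf values over the query terms and ignores idf, length norms, coord and boosts, so it is not Lucene's full score. `default_tf` mirrors the `sqrt(freq)` term frequency of Lucene's DefaultSimilarity, `flat_tf` mirrors the suggested override returning 1.0, and the two documents are made up.

```python
import math
from collections import Counter


def default_tf(freq: int) -> float:
    # Lucene's DefaultSimilarity uses tf = sqrt(freq).
    return math.sqrt(freq)


def flat_tf(freq: int) -> float:
    # Suggested override: any number of occurrences counts as 1.0.
    return 1.0 if freq > 0 else 0.0


def tf_only_score(tokens, query_terms, tf) -> float:
    """Toy score: sum of tf over query terms (no idf, norms, coord or boosts)."""
    counts = Counter(tokens)
    # Terms absent from the document contribute nothing.
    return sum(tf(counts[t]) for t in query_terms if counts[t] > 0)


query = ["54f", "beed", "registration"]
doc_partial = "form 54f registration".split()
doc_repeats = ("registration " * 9).split()

default_scores = (tf_only_score(doc_partial, query, default_tf),
                  tf_only_score(doc_repeats, query, default_tf))
flat_scores = (tf_only_score(doc_partial, query, flat_tf),
               tf_only_score(doc_repeats, query, flat_tf))
```

With `sqrt(freq)`, the document repeating registration nine times outscores the one containing both 54f and registration; with the flat tf the order flips.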

Select multiple values of same facet using IBM WCS v7 and Apache Solr

We use IBM WCS v7 with embedded Apache Solr. Solr is used as the search engine for our e-commerce application.
As per a recent requirement, we want to use multi-select facet functionality, where the user can check multiple facet values and the corresponding values are OR'ed in the search results.
For example, I want to check Color:RED, Color:BLUE and Color:BLACK in my default search results, so that each attribute value is OR'ed in the displayed search results.
We use the out-of-the-box SearchDisplayCmd for our search functionality, where the "metaData=" parameter keeps the history of the applied facets and "facet=" applies a facet field. The "metaData" parameter encodes the multiple facets in base64, using a special delimiter to AND the different facet fields and restrict the search results, e.g.:
brand:"POLO" color:"RED" shape:"Oval"
I want to know whether there is such a delimiter, or any alternative, that lets me perform an OR operation on different values of the same facet attribute while still using the "metaData" parameter to keep a history of the applied facets.
Any help is highly appreciated. Other approaches to applying multiple values of the same facet attribute are also welcome.
Thanks in advance.
Regards,
Jitendriya Dash
I recently worked on this (selecting multiple values of the same facet) and was able to get it working.
Try to find where it hits the tag. The expression builder I used, getCatalogNavigationView, comes out of the box (OOB). Make sure you use the appropriate searchProfile.
Pass the facet param like this:
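<%-- Add one facet param per selected value; paramValues.facet holds every submitted facet value --%>
<c:forEach var="facetSelect" items="${paramValues.facet}">
    <wcf:param name="facet" value="${facetSelect}" />
</c:forEach>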
With this method, however, you cannot also select values from other attributes. If someone knows how to select values from both the same facet and different facets, please share.
Update the SELECTION column of the FACET table to 1 to mark the facetable attribute as multi-selectable.
In WCS 7+, to enable multi-select facet functionality, go to the FACET table and set the SELECTION column to 1 instead of 0.
To make an attribute a multi-select facet, you can also make the change in CMC: go to the attribute dictionary, select the attribute, and in its facetable properties check 'Allow multiple facet value'.
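In plain Solr terms, multi-select faceting is usually expressed as one OR'ed filter query per facet field, with filters on different fields AND'ed together, plus Solr's tag/exclude local params so each facet still reports counts for its unselected values. The Python sketch below only builds those parameter strings; the field names come from the question's example, and whether WCS lets you pass raw `fq` values alongside the base64 `metaData` parameter is an assumption not covered by the answers above.

```python
from typing import Dict, List


def build_filter_queries(selections: Dict[str, List[str]]) -> List[str]:
    """One fq per facet field: values OR'ed inside, fields AND'ed across fqs."""
    fqs = []
    for field, values in selections.items():
        if not values:
            continue
        ored = " OR ".join(f'"{value}"' for value in values)
        # Tag the filter so the matching facet.field can exclude it.
        fqs.append(f"{{!tag={field}}}{field}:({ored})")
    return fqs


def build_facet_fields(fields: List[str]) -> List[str]:
    """Exclude each field's own filter so its other values keep their counts."""
    return [f"{{!ex={field}}}{field}" for field in fields]


selections = {"color": ["RED", "BLUE", "BLACK"], "brand": ["POLO"]}
fq_params = build_filter_queries(selections)
facet_field_params = build_facet_fields(["color", "brand", "shape"])
```

Solr intersects multiple `fq` parameters, which provides the AND across attributes, while the OR stays inside each field's own filter.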

Solr Spell Check result based filter query

I implemented the Solr SpellCheckComponent based on the documentation at http://wiki.apache.org/solr/SpellCheckComponent, and it works well. But I am trying to filter the spell check results based on another field. Consider the following schema:
product_name
product_text
product_category
product_spell -> copied from product_name and product_text, tokenized with a whitespace analyzer
For the above schema, I am trying to filter the spell check results by a given category. I tried querying like http://127.0.0.1:8080/solr/colr1/myspellcheck/?q=product_category:160%20appl&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true . The spellcheck results do not take product_category:160 into account.
Is it because the dictionary was built for all the categories? If so, is it a good idea to create a dictionary for every category?
Is it not possible to add another filter condition in the spellcheck component?
I am using Solr 3.5.
I previously understood from the SOLR-2010 issue that filtering through the fq parameter should be possible using collation, but it isn't; I think I misunderstood.
In fact, the SpellCheckComponent most likely uses a separate index, except for the DirectSolrSpellChecker implementation. This means the field you select is indexed in a different index, which contains only the information about the specific field you chose for spelling corrections.
If you're curious, you can look at what that additional index looks like using Luke, since it's an ordinary Lucene index. Unfortunately, filtering on other fields isn't an option there, simply because that index has only one field: the one you use for spelling corrections.
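A toy Python model of that point, assuming the spelling dictionary is simply the set of whitespace-separated terms copied into `product_spell` (lowercased here for simplicity). `difflib.get_close_matches` stands in for Solr's own suggestion algorithm, which works differently, and the sample products are made up. Because the category is dropped when the term set is built, a term that only occurs in category 200 is still suggested for a search meant for category 160.

```python
import difflib

# Made-up sample products following the question's schema.
products = [
    {"product_name": "Apple iPhone", "product_text": "smartphone", "product_category": "160"},
    {"product_name": "Apply Kit", "product_text": "craft glue", "product_category": "200"},
]


def build_spell_dictionary(docs):
    """Collect the terms copied into product_spell; the category is not kept."""
    terms = set()
    for doc in docs:
        for field in ("product_name", "product_text"):
            terms.update(doc[field].lower().split())
    return sorted(terms)


dictionary = build_spell_dictionary(products)
# The dictionary is just a list of terms, so nothing here can apply product_category:160.
suggestions = difflib.get_close_matches("appl", dictionary, n=3, cutoff=0.6)
```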

Is it possible to have SOLR MoreLikeThis use different fields for model and matches?

Let's say I have documents with two fields, A and B.
I'd like to use SOLR's MoreLikeThis, but with a twist: I'm most interested in boosting documents whose A field is like my model document's B field. (That is, extract MLT's 'interesting terms' from the model B field, but only collect MLT results based on the A field.)
I don't see a way to use the mlt.fl fields or mlt.qf boosts to achieve this effect in a single query. (It seems mlt.fl specifies fields used for both discovery of 'interesting terms' and matching to those terms.) Am I missing some option?
Or will I have to extract the 'interesting terms' myself and swap the 'field:term' details?
(Other ideas in this same vein appreciated as well.)
Two options I see are:
Use a copyField - index your original document with a copy of field A named B, and then query using B.
Extend MoreLikeThisHandler and change the fields you query.
The first option costs a bit of programming (mostly configuration changes) and some memory consumption. The second involves more programming but no memory footprint increase. Hope one of them suits your needs.
I now think there are two ways to achieve the desired effect (without customizing the MLT source code).
First option: Do an initial MLT query with the MLT handler, adding the parameter &mlt.interestingTerms=details. This includes the list of terms that were deemed interesting, ranked with their relative boosts. The usual behavior uses those discovered terms against the same mlt.fl fields to find similar documents. For example, the response will include something like:
"interestingTerms":
["field_b:foo",5.0,"field_b:bar",2.9085307,"field_b:baz",1.67070794]
(Since the only thing about this initial query that's interesting is the interestingTerms, throwing in an fq that rules out all docs could help it skip unnecessary scoring work.)
Explicitly re-composing that interestingTerms info into a new OR query field_a:foo^5.0 field_a:bar^2.9085307 field_a:baz^1.67070794 amounts to using the B field example text to find documents that are similar in field A, and may be mimicking exactly the kind of query default MLT does on its usual model field.
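The re-composition step can be sketched in Python as follows, assuming the flat `[term, boost, term, boost, ...]` layout shown in the response fragment above. The function name is hypothetical, and terms containing query-syntax characters would still need escaping before use.

```python
def recompose_interesting_terms(interesting: list, target_field: str) -> str:
    """Turn MLT interestingTerms details into a boosted query on another field."""
    clauses = []
    # Pair up the flat list: even positions are "field:term", odd positions are boosts.
    for field_term, boost in zip(interesting[0::2], interesting[1::2]):
        _, _, term = field_term.partition(":")
        clauses.append(f"{target_field}:{term}^{boost}")
    return " ".join(clauses)


interesting_terms = ["field_b:foo", 5.0, "field_b:bar", 2.9085307, "field_b:baz", 1.67070794]
retargeted_q = recompose_interesting_terms(interesting_terms, "field_a")
```

Sent as `q` to the standard query parser, whose default operator is OR unless `q.op` has been changed, the clauses form the OR query described above.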
Second option: Grab the model document's actual field B text, and feed it directly as a ContentStream body, to be used in lieu of a query, for specifying the model document. Then target mlt.fl at field A for the sake of collecting similar results. For example, a fragment of the parameters might be …&stream.body=foo bar baz&mlt.fl=field_a&…. Again, the net effect being that model text originally from field_b is finding documents similar only in field_a.
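A sketch of the parameters for that second option in Python. Besides the parameter names from the answer (`stream.body`, `mlt.fl`), it uses the standard MLT parameters `mlt.interestingTerms`, `mlt.mintf` and `mlt.mindf`; the last two are lowered from their defaults (2 and 5) so that a short model text still yields terms, which you would tune for real data. The request path of the MoreLikeThis handler is not shown, and newer Solr versions disable `stream.body` unless it is enabled in solrconfig.xml.

```python
from urllib.parse import urlencode


def mlt_stream_body_params(model_text: str, target_field: str) -> dict:
    """Build MoreLikeThis handler params that use raw text as the model."""
    return {
        # Text copied from the model document's field_b.
        "stream.body": model_text,
        # Only this field is used to find similar documents.
        "mlt.fl": target_field,
        "mlt.interestingTerms": "details",
        # Lowered so that a short example text still produces terms.
        "mlt.mintf": "1",
        "mlt.mindf": "1",
    }


params = mlt_stream_body_params("foo bar baz", "field_a")
query_string = urlencode(params)
```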

Use different Solr Similarity algo for every search

Is it possible in Solr 1.4 to specify which similarity class to use for each search within a single index?
Let's say I have 2 types of search (keyword and brand). For keyword search, I want to use the DefaultSimilarity class, but for brand search, I want to use my CustomSimilarity class.
So far I've specified a single similarity class in schema.xml, but now I have a requirement to use 2 different similarity classes.
I'll be glad to hear your thoughts on this.
Thanks in advance.
AFAIK the Similarity can only be defined at the schema/index level and can't be overridden per fieldType or per query (see this and this).
However you can customize your result ordering using other methods: boosting, function queries, a custom analyzer per field, or even sorting.
The Solr Relevancy Cookbook wiki is a good reference.
