Randomize result set between the brands in solr

I have a collection of 2.5 lakh products that belong to only 3 brands, split in a ratio of about 75%, 20%, 10%.
Whenever I query Solr, I always get products from the 75% brand. I want to randomize the result set so that the other brands' products also appear at the top of the list.

There are 2 kinds of boosts:
Index-time boosts
Index-time boosts are applied when adding documents, and apply to the entire document or to specific fields.
Query-time boosts
Query-time boosts are applied when constructing a search query, and apply to specific fields.
Suppose your brands are brand_a, brand_b and brand_c, and the brand field is named brand.
Boost brand_a by 1, brand_b by 2 and brand_c by 3.
Query-time boosts:
brand:"brand_a"^1 OR brand:"brand_b"^2 OR brand:"brand_c"^3
Index-time boosts:
<add>
  <doc boost="1">
    <field name="brand">brand_a</field>
    ...
  </doc>
  <doc boost="2">
    <field name="brand">brand_b</field>
    ...
  </doc>
  <doc boost="3">
    <field name="brand">brand_c</field>
    ...
  </doc>
</add>
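You can use either of the above depending on your requirement. Note that index-time boosts only exist in older Solr versions; they were removed in Solr 7.0, so on newer versions only the query-time form applies.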
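As a rough illustration, the query-time variant can be assembled from a table of per-brand weights instead of being written by hand. This is only a sketch: the helper name brand_boost_query is made up for the example, and the brand names and weights are the ones assumed in this answer.

```python
def brand_boost_query(boosts, field="brand"):
    """Build a Lucene-syntax query with one boosted clause per brand.

    `boosts` maps a brand name to its boost factor. The helper name and
    the default field name are illustrative assumptions.
    """
    clauses = [f'{field}:"{name}"^{weight}' for name, weight in boosts.items()]
    return " OR ".join(clauses)


# Smaller brands get a larger boost so they can compete with the 75% brand.
query = brand_boost_query({"brand_a": 1, "brand_b": 2, "brand_c": 3})
print(query)
```

The resulting string can be sent as the q parameter, or combined with the user's own search terms.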

Are you sure you really want to ignore relevance completely and just return random products that (100%?) match the user's search? You can add a random field to do so, but I suspect that's a short-term solution.
Would you prefer instead to, for example, group the results by brand and then return the top X results from each brand? Solr has Collapse and Expand functionality for that.
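Both ideas can be sketched as request parameters. The sketch below assumes a single-valued field named brand and a dynamic field random_* of type solr.RandomSortField in the schema (Solr's example schemas ship one, but yours may not); the function names are hypothetical.

```python
from urllib.parse import urlencode


def collapse_by_brand_params(user_query, per_brand=3):
    """Collapse results to one group per brand and expand each group.

    Assumes a single-valued `brand` field. `expand.rows` limits how many
    extra documents are returned for each brand in the expanded section.
    """
    return [
        ("q", user_query),
        ("fq", "{!collapse field=brand}"),
        ("expand", "true"),
        ("expand.rows", str(per_brand)),
    ]


def random_order_params(user_query, seed):
    """Sort matching documents pseudo-randomly.

    Assumes a dynamic field `random_*` backed by solr.RandomSortField;
    changing the seed in the field name changes the ordering.
    """
    return [("q", user_query), ("sort", f"random_{seed} asc")]


print(urlencode(collapse_by_brand_params("shoes")))
print(urlencode(random_order_params("shoes", 42)))
```

The collapse variant keeps relevance ordering inside each brand, while the random sort ignores relevance entirely, which is the trade-off raised above.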

Related

Solr filter on facets

Each of my documents can have one or more entries of a field called Classes, describing some properties of the document, always of the form:
<field name="Classes">"<Description> - <TypeLabel> - <OriginLabel>"</field>
So for instance a document about food might have the two fields:
<field name="Classes">"Yellow orange - Fruit - California"</field>
<field name="Classes">"Small broccoli - Vegetable - Florida"</field>
I am using Solr 5.0 and a schema.xml file, where I have a multiValued "text_en" field Classes that I copy to a "string" field Classes_asString so that I can do faceting on the whole field and treat it as one big label.
With facet.field on Classes_asString I am getting the facet counts that I want, but now I would like to additionally filter these results.
For example, how do I only get facet results that end with "California"?
Or, in another example, how do I only get facet results that have "Vegetable" between the two "-"?
I have seen the option facet.prefix, but this is not applicable in my case. I would appreciate any help or suggestions.

Maybe this scenario is a good place to index the Classes info as child documents. Each value carries at least 3 pieces of information, so it may be worth giving each one its own document.
Then you should be able to facet on the specific child field, either with a current Solr version if it is supported (not sure), or with the work in this ticket that is not merged yet.
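A minimal sketch of what splitting the values into child documents could look like when building the JSON update payload. The child field names (description, type_label, origin_label) and the id scheme are assumptions for illustration; _childDocuments_ is the key Solr's JSON update format has used for nested documents.

```python
def classes_to_children(parent_id, classes_values):
    """Turn 'Description - TypeLabel - OriginLabel' strings into child docs.

    Splits from the right so that a description containing ' - ' still
    yields exactly three parts. Field names are illustrative assumptions.
    """
    children = []
    for i, raw in enumerate(classes_values):
        description, type_label, origin_label = [
            part.strip() for part in raw.strip('"').rsplit(" - ", 2)
        ]
        children.append({
            "id": f"{parent_id}-class-{i}",
            "description": description,
            "type_label": type_label,
            "origin_label": origin_label,
        })
    return children


doc = {
    "id": "food-1",
    "_childDocuments_": classes_to_children("food-1", [
        '"Yellow orange - Fruit - California"',
        '"Small broccoli - Vegetable - Florida"',
    ]),
}
print(doc)
```

With separate fields, "ends with California" becomes a plain filter on origin_label, and "Vegetable in the middle" becomes a filter on type_label.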

How to do index time field boost in solr using solrconfig.xml

I am new to solr. Currently we are using solr 4.8.0. We have already done indexing, created fields, field types. Now I want to boost some of the fields using index time boosting. Does this have to be done in solrconfig.xml? What are the tags and syntax?

Index time boosts are applied to each field when the document is submitted to the index, so they are set in the update request rather than in solrconfig.xml. You can either apply the boost to hits in a specific field or to the document as a whole (which applies the same boost to all fields).
Example from the wiki:
<add>
  <doc boost="2.5">
    <field name="employeeId">05991</field>
    <field name="office" boost="2.0">Bridgewater</field>
  </doc>
</add>

How to enforce an exact match to get the highest priority?

I am indexing and searching 5 fields, which are tokenized/filtered in various ways.
However, when I search, if the query I entered matches a value in field 1, I would like that document to be the top result I get back.
How would I define:
The field
The query, in such a way that this field gets priority IF there is a 100% match
In my schema, I have the field
<field name="na_title" type="text_names" indexed="true" stored="false" required="true" />
text_names is: <fieldType name="text_names" class="solr.StrField" />
I have ONLY one entry with na_title="somthing is going on here". But when I search
text_names:somthing is going on here
I get many results.
Just to point it out, there are no analyzers nor filters on that field, both for query and index actions.

From the manual:
Lucene allows influencing search results by "boosting" in more than one level:
Document level boosting - while indexing - by calling document.setBoost() before a document is added to the index.
Document's Field level boosting - while indexing - by calling field.setBoost() before adding a field to the document (and before adding the document to the index).
Query level boosting - during search, by setting a boost on a query clause, calling Query.setBoost().

You'll need to index the field twice: once analyzed and once not. Then you can boost the matches in the non-analyzed fields over the others.
A shortcut could be to index all those fields as strings and use copyField to copy them as text into a catch-all field. That would simplify the query a little and decrease the number of duplicate fields.
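A sketch of the resulting query, assuming the title is indexed twice as na_title_exact (a string field) and na_title_text (an analyzed text field). Both field names and the boost value are hypothetical. Quoting the value matters: with the standard query parser, an unquoted multi-word value applies only the first word to the named field and searches the remaining words in the default field, which would explain getting many results.

```python
def exact_first_query(user_input,
                      exact_field="na_title_exact",
                      text_field="na_title_text",
                      boost=10):
    """Combine a boosted exact-match clause with an analyzed-text clause.

    Field names and the boost factor are illustrative assumptions.
    Backslashes and double quotes are escaped so the input stays inside
    the quoted phrase.
    """
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    exact = f'{exact_field}:"{escaped}"^{boost}'   # whole-value match, boosted
    analyzed = f'{text_field}:"{escaped}"'         # phrase match on analyzed text
    return f"{exact} OR {analyzed}"


print(exact_first_query("somthing is going on here"))
```

A document whose string field equals the input matches both clauses and picks up the boost, so it should rank above documents that only match the analyzed field.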

Solr query (q) or filter query (fq)

I have a Solr index of ~1 million product documents. I also have a whole bunch of UI filters such as categories, tabs, price ranges, sizes, colors, and some others.
Is it right to have q select everything (q=*:*) and put all the other filters in fq? For example:
fq=(catid:90 OR catid:81) AND priceEng:[38 TO 40] AND (size:39 OR size:40 OR size:41 OR size:50 OR size:72) AND (colorGroup:Yellow OR colorGroup:Violet OR colorGroup:Orange ... AND (companyId:81 OR companyId:691 OR companyId:671 OR companyId:628 OR companyId:185 OR companyId:602 OR ... AND endShipDays:[* TO 7])
To me, everything from categories to companyIds, colors, sizes, etc. is just a filter. Will this approach cause performance problems as the index grows? Should I put some of these clauses in q, and if so, which ones?
Thank you,

It's preferable to use a filter query (fq) over the main query (q) wherever possible.
A filter query can take advantage of the filterCache, which can be a huge performance boost compared to putting the same conditions in q.

I would look at the following points about a field in order to decide:
1. Does your field have a fixed boost, or do you need scoring for this field at all? If yes, put it in the query because, as mentioned above, filter queries do not affect scores.
2. Is a condition on this field used frequently? If yes, then, as said before, the filter cache may give a huge advantage; if not, a filter query may even be slower.
3. Is your index constant? This is similar to #2. If your index is updated frequently, filter queries may become a bottleneck instead of giving a performance boost.
Some notes about #3: In my experience, I had a big index that was populated with new docs every few seconds, and autoSoftCommit was set to a few seconds as well. Each soft commit opened a new searcher, which invalidated the caches. As a result, the filter cache hit ratio was almost always 0.
There is more: I found that the first run of a filter query is more expensive than running a query with all those filter conditions moved to "q" instead of "fq". For example, my query took 1 second with 5 filter queries (no cache hit) and 147ms when I moved all the "fq" conditions into the main query with "AND". But of course, when I stopped index updates, the same filter queries took 0ms because the cache was used. So this is something to consider.
Also, a few other points for your question:
Try to never use wildcards in your query. It significantly affects performance. Therefore, instead of "*:*" I would suggest putting in "q" the one condition that varies most between requests (the conditions that stay most constant between requests and don't need scoring are the ones to put in "fq").
Range searches are also better avoided (if possible), and range searches with wildcards even more so. This applies to your "endShipDays:[* TO 7]". For example, "endShipDays:(1 2 3 4 5 6 7)" would be more effective, but it's just an example; there are many ways.
Hope it helps.

The way I use q and fq:
I apply full-text search on q and all the filters on fq.
Let's say you have a field keyword that you are going to use for full-text search, populated from the fields defined in your schema with copyField:
<copyField source="id" dest="keyword"/>
<copyField source="category" dest="keyword"/>
<copyField source="product_name" dest="keyword"/>
<copyField source="color" dest="keyword"/>
<copyField source="location" dest="keyword"/>
<copyField source="price" dest="keyword"/>
<copyField source="title" dest="keyword"/>
<copyField source="description" dest="keyword"/>
My query would look like
/select?q={keyword}&fq=category:fashion&fq=location:nyc
/select?q=jeans&fq=category:fashion&fq=location:nyc
As digitaljoel suggested, if you have to filter on multiple fields, it is better to use multiple fq parameters (see the query above) instead of combining them with AND and OR in q.
Note: in my case q defaults to the keyword field, as defined in solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
  -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">keyword</str>
  </lst>
</requestHandler>

Think about your query and put everything that doesn't have to be scored and is repeatable in the fq parameter. That way, consecutive queries that hit the Solr node before a new searcher is opened can reuse the information stored in the filterCache.
The filter cache stores each unique filter as a key; the value is an array of bits where each entry says whether a given document matches that filter or not. That way it is very easy to re-apply the filter for the next query. But you, of course, lose the scoring capabilities.
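To make the idea concrete, here is a toy model of such a cache. It is not Solr's implementation, only a sketch of the description above: each filter string maps to a bitset (a Python int here), and several filters are combined with a bitwise AND. The class name, the sample documents and the predicates are invented for the example.

```python
class ToyFilterCache:
    """Toy model: filter string -> bitset of matching document ids."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.misses = 0  # counts how often a filter had to be evaluated

    def bits_for(self, key, predicate):
        if key not in self.cache:
            self.misses += 1
            bits = 0
            for doc_id, doc in enumerate(self.docs):
                if predicate(doc):
                    bits |= 1 << doc_id  # set the bit for a matching doc
            self.cache[key] = bits
        return self.cache[key]

    def matching_ids(self, filters):
        combined = (1 << len(self.docs)) - 1  # start with every doc set
        for key, predicate in filters:
            combined &= self.bits_for(key, predicate)
        return [i for i in range(len(self.docs)) if (combined >> i) & 1]


docs = [
    {"catid": 90, "size": 39},
    {"catid": 81, "size": 50},
    {"catid": 12, "size": 39},
]
filters = [
    ("catid:(90 OR 81)", lambda d: d["catid"] in (90, 81)),
    ("size:39", lambda d: d["size"] == 39),
]
cache = ToyFilterCache(docs)
first = cache.matching_ids(filters)   # evaluates both filters
second = cache.matching_ids(filters)  # served entirely from the cache
print(first, second, cache.misses)
```

In this model, opening a new searcher would correspond to throwing the cache dictionary away, which is why frequent commits reduce the benefit.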
Looking at your query, I would simplify it a bit by using multiple fq values, something along these lines:
fq=(catid:90 OR catid:81)
fq=priceEng:[38 TO 40]
fq=(size:39 OR size:40 OR size:41 OR size:50 OR size:72)
fq=(colorGroup:Yellow OR colorGroup:Violet OR colorGroup:Orange ... )
fq=(companyId:81 OR companyId:691 OR companyId:671 OR companyId:628 OR companyId:185 OR companyId:602 OR ... )
fq=endShipDays:[* TO 7]
Filters are additive, so the query would return the same results, but at least to me it is easier to manage :)
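When the request is built in code, each filter becomes its own fq parameter. The sketch below uses Python's standard urllib.parse and only the filters from the list above that are complete; the colorGroup and companyId filters are left out because their values are elided in the question. The function name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# Only the filters whose values are fully known from the question.
filters = [
    "(catid:90 OR catid:81)",
    "priceEng:[38 TO 40]",
    "(size:39 OR size:40 OR size:41 OR size:50 OR size:72)",
    "endShipDays:[* TO 7]",
]


def build_select_query(q, filters):
    """Return a URL query string with one fq parameter per filter."""
    params = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(params)


query_string = build_select_query("*:*", filters)
print("/select?" + query_string)
```

Because each fq is cached separately, a request that changes only the price range can still reuse the cached entries for the category and size filters.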

solr displayed some results first when they are part of the results

Consider this Solr pseudo-doc:
<doc>
<field name="title"/>
<field name="name"/>
<field name="keywords"/>
</doc>
Some docs will have the keyword "up", which means that they should appear first (regardless of their initial position) when, and only when, they are part of the search results.
So let's say I have:
doc1('title1','Bob, Alice','people, up, couple')
doc2('title2','Smart Phone, Laptop, Bob','devices, electronics')
If I query with "title:title2 name:Bob" then I should get doc1 first (it has the 'up' keyword).
If I query with "name:Bob" I should still get doc1 first for the same reason.
If I query with "name:Laptop" then I should only get doc2 in my results. doc1 should not be included since it doesn't match my search query.
Any suggestions on how to do this?

You have several options to do something like that:
a function query / boost query (in the dismax handler)
boosting documents at index time
extracting the 'up' keyword into an additional field and sorting by this field, then by score
For example (with the dismax handler):
/select?defType=dismax&q=...&bq=keywords:"up"^1000

This can be solved with Solr's query-time boosting. Following the guidance from the Solr Relevancy FAQ, you could add an additional boosted search term to all queries, e.g. title:title2 name:Bob keywords:up^2. Note that with the default OR operator this extra term would also match documents that contain only the up keyword, so the original terms should be made required, e.g. +(title:title2 name:Bob) keywords:up^2.
You could also determine at index time, for each document, whether the up keyword is present, store that in an additional field (a boolean, for example) in your schema, and boost the query results based on that field.
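The additional-field idea can also be used for sorting instead of boosting. A minimal sketch, assuming a boolean field named is_up (a hypothetical name) that is set to true at index time for documents carrying the up keyword:

```python
from urllib.parse import urlencode


def up_first_params(user_query, flag_field="is_up"):
    """Sort flagged documents first, then by relevance score.

    `is_up` is an assumed boolean field populated at index time. Only
    documents matching `user_query` are returned; the sort merely
    reorders them, so unmatched 'up' documents never appear.
    """
    return [
        ("q", user_query),
        ("sort", f"{flag_field} desc, score desc"),
    ]


print(urlencode(up_first_params("name:Bob")))
```

Unlike a boost, the sort guarantees that flagged documents come first no matter how their relevance scores compare.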
