How to do index-time field boosting in Solr using solrconfig.xml

I am new to Solr. We are currently using Solr 4.8.0. We have already created the fields and field types and indexed our documents. Now I want to boost some of the fields using index-time boosting. Does this have to be done in solrconfig.xml? What are the tags and syntax?

Index-time boosts are applied when the document is submitted to the index, as attributes in the update message rather than as settings in solrconfig.xml. You can apply a boost either to a specific field or to the document as a whole (which applies the same boost to all of its fields).
Example from the wiki:
<add>
<doc boost="2.5">
<field name="employeeId">05991</field>
<field name="office" boost="2.0">Bridgewater</field>
</doc>
</add>
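As a rough illustration, the same update message can be generated programmatically. The sketch below uses only the Python standard library; the helper name build_add_message is hypothetical, and it only builds the XML string (posting it to the update handler is left out). Note that index-time boosts are stored in field norms, so a field declared with omitNorms="true" would not keep them, and that this feature belongs to older releases such as 4.x (it was later removed from Solr).

```python
import xml.etree.ElementTree as ET

def build_add_message(fields, doc_boost=None, field_boosts=None):
    """Build a Solr XML <add> message with optional index-time boosts.

    fields       -- dict of field name -> value
    doc_boost    -- optional boost for the whole document
    field_boosts -- optional dict of field name -> boost
    (Hypothetical helper for illustration, not part of any Solr client.)
    """
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc.set("boost", str(doc_boost))
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        if name in field_boosts:
            field.set("boost", str(field_boosts[name]))
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Reproduces the wiki example above.
message = build_add_message(
    {"employeeId": "05991", "office": "Bridgewater"},
    doc_boost=2.5,
    field_boosts={"office": 2.0},
)
print(message)
```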

Related

how does Solr store documents

I know Solr uses Lucene and Lucene uses an inverted index. But from the Lucene examples I have seen so far, I am not sure I understand how it works in combination with Solr.
Given the following document:
<doc>
<field name="id">9885A004</field>
<field name="name">Canon PowerShot SD500</field>
<field name="manu">Canon Inc.</field>
<field name="inStock">true</field>
</doc>
From the examples I have seen so far, I would think that Lucene has to treat each field as a document. It would then say: the word Canon appears in field name and field manu.
Is the index broken down this much? Or does the index only say: "the word Canon appears in the document with id such and such"?
How does this work exactly when using Lucene with Solr?
What would this document look like in the index? (supposing each field has indexed="true")
I wrote a blog post a few years ago that explains this in detail [1].
Short answer to this question:
"From the examples I have seen so far, I would think that Lucene has to treat each field as a document."
Absolutely NOT.
Lucene's unit of information is the document, which is composed of a map field -> value[s].
A Solr document is just a slightly different representation, because Solr incorporates a schema in which the fields are described.
So in Solr you can just add fields to a document without having to describe their type and other properties (which are stored in the schema), while in Lucene you need to define them explicitly when creating the document.
[1] https://sease.io/2015/07/exploring-solr-internals-lucene.html
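To make the "is the index broken down this much?" part concrete: in Lucene a term is scoped by its field, so the index records that a term occurs in a given field of a given document, without turning each field into a document. The toy sketch below only imitates that idea in plain Python. The tokenizer is a crude stand-in for an analyzer, the integer document numbers are an assumption for illustration, and a real schema would typically not tokenize a field such as id.

```python
from collections import defaultdict

def tokenize(text):
    # Crude stand-in for an analyzer: split on whitespace,
    # strip trailing punctuation, lowercase.
    tokens = (t.strip(".,").lower() for t in text.split())
    return [t for t in tokens if t]

def build_index(docs):
    # Postings are keyed by (field, term), not by term alone.
    index = defaultdict(set)
    for doc_number, doc in enumerate(docs):
        for field, value in doc.items():
            for term in tokenize(value):
                index[(field, term)].add(doc_number)
    return index

docs = [
    {
        "id": "9885A004",
        "name": "Canon PowerShot SD500",
        "manu": "Canon Inc.",
        "inStock": "true",
    },
]

index = build_index(docs)
for key in sorted(index):
    print(key, sorted(index[key]))
```

Here ("name", "canon") and ("manu", "canon") are two separate entries that both point at the same document number, which is why a query such as manu:canon can be answered without looking at the name field.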

Randomize result set between the brands in solr

I have a collection of 2.5 lakh (250,000) products that belong to only 3 brands, split roughly in the ratio 75%, 20%, 10%.
Whenever I query Solr, I always get products from the 75% brand at the top. I want to randomize the result set so that the other brands' products also appear near the top of the list.
There are 2 kinds of boosts.
Index-time boosts
Index-time boosts are applied when adding documents, and apply to the entire document or to specific fields.
Query-time boosts
Query-time boosts are applied when constructing a search query, and apply to specific fields.
Suppose your brands are brand_a, brand_b and brand_c, and the brand field is named brand.
Boost brand_a by 1, brand_b by 2 and brand_c by 3.
Query-time boosts :
brand:"brand_a"^1 OR brand:"brand_b"^2 OR brand:"brand_c"^3
Index-time boosts :
<add>
<doc boost="1">
<field name="brand">brand_a</field>
...
</doc>
<doc boost="2">
<field name="brand">brand_b</field>
...
</doc>
<doc boost="3">
<field name="brand">brand_c</field>
...
</doc>
</add>
You can use either of the above, depending on your requirements.
Are you sure you really want to ignore relevance completely and just return random products that (100% ?) match user search? You can have a random field to do so, but I suspect that's a short term solution.
Would you prefer instead to - for example - group the results by brand and then return top X results from each brand? Solr has Collapse and Expand functionality for that.
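The two options mentioned in this answer can be sketched as query parameters. This is only an illustration: the core URL is hypothetical, the collapse example assumes brand is an indexed single-valued field, and the random sort assumes the schema defines a dynamic field such as random_* backed by solr.RandomSortField (as the stock example schema does).

```python
from urllib.parse import urlencode

# Hypothetical core URL, adjust to your installation.
SOLR_SELECT = "http://localhost:8983/solr/products/select"

def collapse_by_brand_url(user_query, rows_per_brand=5):
    # Collapse keeps one top document per brand; expand returns
    # further documents for each collapsed brand.
    params = {
        "q": user_query,
        "fq": "{!collapse field=brand}",
        "expand": "true",
        "expand.rows": rows_per_brand,
    }
    return SOLR_SELECT + "?" + urlencode(params)

def random_order_url(user_query, seed):
    # Sorting on a RandomSortField gives a pseudo-random order;
    # changing the seed in the field name changes the order.
    params = {"q": user_query, "sort": f"random_{seed} desc"}
    return SOLR_SELECT + "?" + urlencode(params)

collapsed = collapse_by_brand_url("shoes")
shuffled = random_order_url("shoes", seed=42)
print(collapsed)
print(shuffled)
```

Collapsing keeps relevance ordering inside each brand, whereas the random sort discards relevance among the matching documents, which is the trade-off raised above.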

Sort on field completeness of Solr Documents

I have this Solr field
<field name="listing_thumbnail" type="string" indexed="false" stored="true"/>
When the results are shown, documents without a value in this field should appear last. Is this possible in Solr? To generalise, is it possible to sort documents by field completeness?
You can make use of the bq (Boost Query) parameter of the dismax/edismax query parser. It lets you test whether a field has a value and adjust the score accordingly, but for that the field needs to be indexed="true".
If your field were indexed, you could add bq=(listing_thumbnail:*), which would give a push to all documents with a value in that field.
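A minimal sketch of such a request, assuming listing_thumbnail has been changed to indexed="true" and reindexed. The helper name and the boost weight of 10 are hypothetical. Because bq adds to the score rather than defining a sort, documents without a thumbnail are pushed down but are not strictly guaranteed to come last.

```python
from urllib.parse import urlencode

def thumbnail_first_params(user_query, boost=10):
    # bq adds an extra scoring clause: documents that have any value in
    # listing_thumbnail get a higher score than those that do not.
    return {
        "defType": "edismax",
        "q": user_query,
        "bq": f"(listing_thumbnail:*)^{boost}",
    }

params = thumbnail_first_params("apartment")
print(urlencode(params))
```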

how to search within a polygon in solr 4.10

Below is the XML data that I have inserted into Solr.
<add>
<doc>
<field name="id">3007</field>
<field name="name">Autauga</field>
<field name="coord">POLYGON((-10 30,-40 40,-10 -20,40 20,0 0,-10 30))</field>
</doc>
</add>
There will be many documents of this type, each denoting a separate region.
How can I search for the documents whose polygon contains a given point?
Your Solr version must be 4 or higher, and you have to add the JTS jar file to Solr's classpath. You also have to define the field with a fieldType of "solr.SpatialRecursivePrefixTreeFieldType". Then you can query using a filter query like fq=geo:"Intersects(10.12 50.02)", where geo is the name of the spatial field (coord in your example).
But please see my previous post or http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 for more detailed information.
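Applied to the question's schema, the filter might be built as follows. This is a sketch under assumptions: the field is called coord as in the question, it is already declared with the spatial field type named above, the point is written in "x y" order as in the example filter, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

def point_in_region_params(x, y, field="coord"):
    # Match every document whose indexed shape intersects the point.
    return {
        "q": "*:*",
        "fq": f'{field}:"Intersects({x} {y})"',
    }

params = point_in_region_params(-20, 30)
print(params["fq"])
print(urlencode(params))
```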

Solr click scoring implementation

After a lot of searching on the net, I found a possible open-source solution for click-count popularity in Solr (i.e. one that does not require a paid version of LucidWorks Search).
In my next two answers I will try to solve the problem first in an easy way and then in a slightly more complex way.
But first, some prerequisites.
We assume a Google-like scenario:
1. the user enters some terms in a text field and pushes the search button
2. the system (a custom web app coupled with Solr) produces a web page with clickable results
3. the user selects one of the results (e.g. to access its details), which informs the system to change the 'popularity' of the selected result
The very easy way.
We define a field called 'popularity' in the Solr schema.xml:
<field name="popularity" type="long" indexed="true" stored="true"/>
Suppose the user clicks on the document with id 1234. We (i.e. the webapp) then have to call Solr to update the popularity field of the document with id 1234, using the URL
http://mysolrappserver/solr/update?commit=true
and posting in the body
<add>
<doc>
<field name="id">1234</field>
<field name="popularity" update="inc">1</field>
</doc>
</add>
So each time the webapp queries Solr (combining or ordering by the Solr 'boost' field together with our custom 'popularity' field), we obtain a list that is also ordered by popularity.
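The click handler on the webapp side could generate that update body roughly as follows. This is a sketch, not a Solr client: the function name atomic_update is hypothetical, and it only builds the XML to be posted to the update URL above. Atomic updates such as update="inc" rely on Solr being able to rebuild the rest of the document, so the other fields are assumed to be stored.

```python
import xml.etree.ElementTree as ET

def atomic_update(doc_id, field, modifier, value):
    """Build an atomic-update <add> message for a single field.

    modifier is the value of the update attribute, e.g. "inc", "add" or "set".
    (Hypothetical helper for illustration.)
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", {"name": "id"}).text = str(doc_id)
    target = ET.SubElement(doc, "field", {"name": field, "update": modifier})
    target.text = str(value)
    return ET.tostring(add, encoding="unicode")

# One click on document 1234 increments its popularity by 1.
click = atomic_update("1234", "popularity", "inc", 1)
print(click)
```

The same helper also covers the searchHistory variant described next, by passing "searchHistory", "add" and the search term.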
The more complex idea is to update the Solr index by tracking not only the user's selection but also the search terms used to obtain the result list.
First of all, we have to define a history field in which to store the search terms used:
<field name="searchHistory" type="text_general" stored="true" indexed="true" multiValued="true"/>
Then suppose the user searched for 'something' and selected the document with id 1234 from the result list. The webapp will call the Solr instance at the URL
http://mysolrappserver/solr/update?commit=true
adding a new value to the field searchHistory:
<add>
<doc>
<field name="id">1234</field>
<field name="searchHistory" update="add">something</field>
</doc>
</add>
Finally, by using the Solr termfreq function in every subsequent query, we obtain a 'score' that, combined with the 'boost' field, can produce a list sorted by click-count popularity (and by the history of search terms).
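One way this might look in a request is to pass termfreq as an additive boost function. The sketch below makes several assumptions: it uses the edismax bf parameter as the way to combine the value with the score (the answer does not say which mechanism it has in mind), it handles only single-word queries, and it lowercases the term on the assumption that the text_general analysis chain lowercased it at index time. The helper name is hypothetical.

```python
from urllib.parse import urlencode

def history_boost_params(user_query):
    # termfreq(field, term) counts how often the term occurs in that field
    # of each document, i.e. how many recorded clicks followed this search.
    term = user_query.strip().lower()
    return {
        "defType": "edismax",
        "q": user_query,
        "bf": f"termfreq(searchHistory,'{term}')",
    }

params = history_boost_params("Something")
print(urlencode(params))
```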
This is an interesting approach, but I see some disadvantages in it:
The overall index size will grow dramatically with each and every search.
You are assuming that every click on an item is a correct choice and was not made by mistake or only for a brief look. Over time this can skew the search results.
I suggest only incrementing the counter, or even maintaining a relative counter that also takes into account the other results that the user did not click.
