How to make solr phrase search differentiate relative locations of words

How to make solr phrase search differentiate relative locations of words - solr

Let's say we have a query "Homepage Content".
And there are two records whose title field are (1) "Homepage content" and (2) "content in homepage", respectively.
How to configure solr so that (1) has a higher matching score than (2) for the given query.
(I know it does not make sense to use edismax in this simplified example. But I would like the problem solvers to be aware of the fact that I am using edismax in the real situation.)
Here is my current (extremely simplified) configuration:
schema.xml:
<field name="title" type="text_general" indexed="true" stored="true"/>
solrconfig.xml:
defType='dismax'
qf="title^2"

An alternative is to use the pf argument. It will rank the document higher if the terms are closer together.
Moreover you could use the ps parameter
(Phrase Slop) in order to specify the number of positions that two terms can be apart to match the relevant phrase.
Here is the link to the documentation.

Did you try with elevate component.
For some query string you can specify which doc to be in the top of result by using its docID in elevate.xml file, which will be inside conf folder.
example :
<query text="Homepage">
<doc id="docID" /> <!-- put the first doc ID-->
<doc id="docID" exclude="true" /> <!-- exclude this doc -->
</query>
even thought if doc is matched specifying exclude=true can eliminate that doc in the result.

Related

Solr click scoring implementation

after searching and searching over the net, i've found a possible open-source solution for the click-count-popularity in solr (=does not require a payd version of lucid work search).
In my next two answers i will try to solve the problem in a easy way and in a way a little bit complex...
But first some pre-requisites.
We suppose to google-like scenario:
1. the user will introduce some terms in a textfield and push the search button
2. the system (a custom web-app coupled with solr) will produce a web page with results that are clickable
3. the user will select one of the results (e.g. to access to the details) and will inform the system to change the 'popularity' of the selected result

The very easy way.
We define a field called 'popularity' in solr schema.xml
<field name="popularity" type="long" indexed="true" stored="true"/>
We suppose the user will click on the document with id 1234, so we (=the webapp) have to call solr to update the popularity field of the document with id 1234 using the url
http://mysolrappserver/solr/update?commit=true
and posting in the body
<add>
<doc>
<field name="id">**1234**</field>
<field name="popularity" update="inc">1</field>
</doc>
</add>
So, each time the webapp will query something to solr (combining/ordering the solr 'boost' field with our custom 'popularity' field) we will obtain a list ordered also by popularity

The more complex idea is to update the solr index tracing not only the user selection but also the search terms used to obtain the list.
First of all we have to define a history field where to store the search terms used:
<field name="searchHistory" type="text_general" stored="true" indexed="true" multiValued="true"/>
Then we suppose the user searched 'something' and selected from the result list the document with id 1234. The webapp will call the solr instance at the url
http://mysolrappserver/solr/update?commit=true
adding a new value to the field searchHistory
<add>
<doc>
<field name="id">**1234**</field>
<field name="searchHistory" update="add">**something**</field>
</doc>
</add>
finally, using the solr termfreq function in every following query we will obtain a 'score' that combined with 'boost' field can produce a sorted list based of click-count-popularity (and the history of search terms).

This is interesting approach however I see some disadvantages in it:
Overall items storage will grow dramatically with each and every search.
You're assuming that choosing specific item is 100% correct and it wasn't done by mistake or for brief only. In this way you might get wrong search results along the way.
I suggest only to increment the counter or even to maintain relative counter based on the other results that the user didn't click it.

How to view non-stored fields per document?

I have a field like this:
<field name="status" type="string" indexed="true" stored="false" required="false" />
Using LukeRequestHandler I can view only statistics of the indexed terms, I can view indexed terms per document if stored="true". TermsComponent can show only frequencies of terms, I cannot view terms per document.
Is it possibly to look inside the inverted index without setting stored="true" and reindexing Solr?

In order to view the indexed terms for a single document, you need to use the full Luke application, not the LukeRequestHandler. You would need to copy the index folder from your Solr data directory to another location, then open it in Luke.
There is however a workaround within solr itself - do a search that will return just the one document, and facet on the field you want to examine. Every term in the index for that field on that document will be an entry in the facet output. Here is a full sample URL for this kind of search:
http://localhost:8983/solr/core/select?q=id:1234&facet.field=status&facet.limit=-1&facet.mincount=1&facet=true&facet.method=enum
If you decide to go the Luke route, you can step through your index (or search for an individual document) and view just one document.
The official Luke page is here, but it only supports up through 4.0-ALPHA:
http://code.google.com/p/luke/
You can find Luke for versions beyond 4.0-ALPHA here:
https://java.net/projects/opengrok/downloads
There is an effort underway to absorb Luke into the Lucene/Solr source code as a module, so it will always be current and released at the same time as each Lucene/Solr version.

SOLR - Use single text field in schema for full text search

I am getting familiar with SOLR.
I would like to use SOLR for full text search for many kind of entities. I don't want to create a Document for every different type of entity. I don't want to be able to search for specific fields. I am only interested in that if a specified string is anywhere in any item.
In database terms for example I have a table News and a table Employee and I want to search for the word 'apple', I don't mind in which field it is, I only want to get back the database ID from the records which contain it.
Could it be a solution, that I use a SOLR schema something like this:
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="false"/>
</fields>
So, I only need an ID and the contents. I put all the data, in which I want to be able search into one 'content' field. When I search for some words it looks for it in the 'id' and int the 'content'.
Is this a good idea? Any performance or design problem?
Thanks,
Tamas

See https://wiki.apache.org/solr/SchemaXml#Copy_Fields. It says:
A common requirement is to copy or merge all input fields into a single solr field. This can be done as follows:-
<copyField source="*" dest="text"/>
That's typically what is done to search across multiple fields.
But if you don't even want your original fields, just concatenate all your fields into one big field content and index in Solr. There should be no problems with that.

You can either copyField to text (see example in the distribution) and have that set as default field ("df" parameter in solrconfig.xml for the select handler).
Or, if you anticipate more complex requirements down the line and/or non-text searches, I would recommend looking at eDismax with qf parameter and it will handle searching all those fields itself.

solr displayed some results first when they are part of the results

I consider this solr psedo-doc
<doc>
<field name="title"/>
<field name="name"/>
<field name="keywords"/>
</doc>
Some doc's will have the keyword "up" which means that they should appear first (despite of their initial order position) when and only when they are part of the search results.
So lets say I have:
doc1('title1','Bob, Alice','people, up, couple')
doc2('title2','Smart Phone, Laptop, Bob','devices, electronics')
if I query with "title:title2 name:Bob" then I should get doc1 first (it has the 'up' keyword).
if I query with "name:Bob" I still get doc1 first for the same reason.
if I query with "name:Laptop" then I should only get doc2 in my results. doc1 should not be included since it doesnt match my search query.
Any suggestion to do this?

You have several options to do something like that:
function query / boost query (in dismax handler)
during index time (boost documents)
extract 'up' keyword to additional field and sort by this field, than score
For example (with dismax handler):
/select?defType=dismax&q=...&bq=keywords:"up"^1000

This can be solved with Solr's query time boosting. So following the guidance from the Solr Relevancy FAQ - you could add an additional boosted search term to all queries, e.g. title:title2 name:Bob keywords:up^2
You could also at index time for each document, determine if the up keyword is present then store that in an additional field (boolean for example) in your schema and boost the query results based on that boolean field.

Querying Solr without specifying field names

I'm new to using Solr, and I must be missing something.
I didn't touch much in the example schema yet, and I imported some sample data. I also set up LocalSolr, and that seems to be working well.
My issue is just with querying Solr in general. I have a document where the name field is set to tom. I keep looking at the config files, and I just can't figure out where I'm going awry. A bunch of fields are indexed and stored, and I can see the values in the admin, but I can't get querying to work properly. I've tried various queries (http://server.com/solr/select/?q=value), and here are the results:
**Query:** ?q=tom
**Result:** No results
**Query:** q=\*:\*
**Result:** 10 docs returned
**Query:** ?q=*:tom
**Result:** No results
**Query:** ?q=name:tom
**Result:** 1 result (the doc with name : tom)
I want to get the first case (?q=tom) working. Any input on what might be going wrong, and how I can correct it, would be appreciated.

Set <defaultSearchField> to name in your schema.xml
The <defaultSearchField> Is used by
Solr when parsing queries to identify
which field name should be searched in
queries where an explicit field name
has not been used.
You might also want to check out (e)dismax instead.

I just came across to a similar problem... Namely I have defined multiple fields (that did not exist in the schema.xml) to describe my documents, and want to search/query on the multiple fields of the document, not only one of them (like the "name" in the above mentioned example).
In order to achieve this, I have created a new field ("compoundfield"), where I then put/copyField my defined fields (just like the "text" field on the schema.xml document that comes with Solr distribution). This results in something like this:
coumpoundfield definition:
<field name="compoundfield" type="text_general" indexed="true" stored="false" multiValued="true"/>
defaultSearchField:
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>compoundfield</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
<!-- copyField commands copy one field to another at the time a document
is added to the index. It's used either to index the same field differently,
or to add multiple fields to the same field for easier/faster searching. -->
<!-- ADDED Fields -->
<copyField source="field1" dest="compoundfield"/>
<copyField source="field2" dest="compoundfield"/>
<copyField source="field3" dest="compoundfield"/>
This works fine for me, but I am not sure if this is the best way to make such a "multiple field" search...
Cheers!

It seems that a DisMax parser
is the right thing to use for this end.
Related stackoverflow thread here.

The current solution is deprecated in newer versions of lucene/solr. To change the default search field either use the df parameter or change the field that is in:
<initParams
path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">default_field</str>
</lst>
</initParams>
inside the solrconfig.xml
Note I am using a non-managed schema and solr 7.0.0 at the time of writing

Going through the solr tutorial is definitely worth your time:
http://lucene.apache.org/solr/tutorial.html
My guess is that the "name" field is not indexed, so you can't search on it. You'd need to change your schema to make it indexed.
Also make sure that your XML actually lines up with the schema. So if you are adding a field named "name" in the xml, but the schema doesn't know about it, then Solr will just ignore that field (ie it won't be "stored" or "indexed").
Good luck

Well, despite of setting a default search field is quite usefull i don't understand why don't you just use the solr query syntax:
......./?q=name:tom
or
......./?q=:&fq=name:tom

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How to make solr phrase search differentiate relative locations of words - solr

Related

Solr click scoring implementation

How to view non-stored fields per document?

SOLR - Use single text field in schema for full text search

solr displayed some results first when they are part of the results

Querying Solr without specifying field names

Categories

Resources