Enabling Solr highlighting on a field

I am trying to enable Solr highlighting. It works on certain fields but doesn't on others.
The Solr documentation says: "A '*' can be used to match field globs, such as 'text_*' or even '*' to highlight on all fields where highlighting is possible."
I would like to know what decides whether a field is one where highlighting is possible.

In addition to @MatsLindh's comment above about the field type having to be "text", I found the matrix at https://solr.apache.org/guide/6_6/field-properties-by-use-case.html helpful.
Basically, a field should be indexed and stored for highlighting to be possible.
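To apply that rule to a live core, one option is to filter the field list returned by the Schema API (GET /solr/<core>/schema/fields?showDefaults=true) down to fields that are stored, indexed and of a text-like type. This is a minimal sketch: the sample response is made up, and the "type name starts with text" test is a crude stand-in (an assumption) for "the field type is analyzed"; a thorough check would look at the field type's analyzer instead.

```python
import json

# Hypothetical sample of a /schema/fields?showDefaults=true response,
# trimmed to the properties this check uses.
SAMPLE_RESPONSE = json.loads("""
{
  "fields": [
    {"name": "id",           "type": "string",       "indexed": true, "stored": true},
    {"name": "full_title",   "type": "text_general", "indexed": true, "stored": true},
    {"name": "company_name", "type": "text_general", "indexed": true, "stored": false},
    {"name": "price",        "type": "pfloat",       "indexed": true, "stored": true}
  ]
}
""")


def highlightable_fields(schema_fields_response):
    """Return names of fields that look highlightable.

    Heuristic (an assumption, not Solr's own logic): the field must be
    stored and indexed, and its type name must start with "text" as a
    proxy for "the field type is tokenized/analyzed".
    """
    names = []
    for field in schema_fields_response.get("fields", []):
        if (
            field.get("stored", False)
            and field.get("indexed", False)
            and field.get("type", "").startswith("text")
        ):
            names.append(field["name"])
    return names


print(highlightable_fields(SAMPLE_RESPONSE))  # ['full_title']
```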

Related

Apache Solr text search (among multiple fields)

I am studying and getting familiar with the Apache Solr database.
I created a simple document via the admin UI:
{
  "company_name":["Rikotech inc"],
  "id":"12345",
  "full_title":["ft rikotech marinov"],
  "_version_":1681062832169287680
}
Here is the document fetched:
But when I type rikotech in the standard query field, I get no results:
Both full_title and company_name are of type text_general.
I watched a YouTube video tutorial, and it worked for the presenter.
What am I missing here?
Solr will not search all fields (under any configuration, really) without specifying the fields. However, the tutorial you watched probably had the default copyField rule enabled where everything is copied into a field named _text_, and then that field is configured as the default search field. This effectively means that everything is being copied into a specific field, and then that (single) field is being searched by default.
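If you do want the catch-all behaviour the tutorial relied on, one way to set it up without hand-editing the schema file is the Schema API's add-copy-field command. The sketch below only builds the JSON body; posting it to /solr/<core>/schema, the chosen source fields and the presence of a _text_ field (as in the default configset) are assumptions. Existing documents must be reindexed before the copied content becomes searchable, and queries then need to target _text_, for example via df=_text_.

```python
import json


def copy_to_catch_all(source_fields, dest="_text_"):
    """Build a Schema API request body copying each source field into dest.

    Uses one add-copy-field command per source field, sent as a list
    under a single command name.
    """
    commands = [{"source": name, "dest": dest} for name in source_fields]
    return json.dumps({"add-copy-field": commands})


body = copy_to_catch_all(["full_title", "company_name"])
print(body)
```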
In your case it's probably better to use the edismax query parser (check the box in front of edismax in the user interface), and then give full_title company_name as the query fields (qf). That will allow you to adjust the weights between the fields as well. full_title company_name^5 will give 5x as much weight to any hits in company_name compared to those in full_title.
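The same edismax setup expressed as request parameters, as a minimal sketch: the helper name is hypothetical, and the resulting string would be appended to the /select URL of your core. Note that urlencode writes the space in qf as + and the ^ as %5E, both of which Solr decodes.

```python
from urllib.parse import urlencode


def edismax_params(user_query, field_boosts):
    """Build query parameters for an edismax search over several fields.

    field_boosts maps field name -> boost; a boost of 1 is written without ^.
    """
    qf = " ".join(
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    )
    return {"q": user_query, "defType": "edismax", "qf": qf}


params = edismax_params("rikotech", {"full_title": 1, "company_name": 5})
print(params["qf"])       # full_title company_name^5
print(urlencode(params))  # q=rikotech&defType=edismax&qf=full_title+company_name%5E5
```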
I found the problem.
The fields I wanted to search through by default were being copied into some strange fields like full_title_str, instead of _text_. This is the correct schema setting:

Solr is highlighting wrong words

I have a document which is
{'remaining': 'planet holly wood las vegas','id': 'c6d8e7e5-7ba9-4b68-ae0b-bb31ec4872f3'}
my query is remaining:(((mirage OR *holly* OR (planet AND hollywood)) AND (las AND vegas)) AND id:c6d8e7e5-7ba9-4b68-ae0b-bb31ec4872f3)
in the highlighting result:
{'c6d8e7e5-7ba9-4b68-ae0b-bb31ec4872f3': {'remaining': ['<em>planet</em> <em>holly</em> wood <em>las</em> <em>vegas</em>']}}
Why is planet being highlighted? I'm using Solr 8.4.
Thanks.
It's being highlighted because it's part of your query.
In general highlighting works across all the tokens being matched by the query, not just those giving a hit (and by default, a hit isn't even necessary in the field you're highlighting on - just that the token from the query is present in the highlighted value).
You can tweak this slightly by using hl.requireFieldMatch, but I'm not sure if that'll work with the optional clauses in your OR statement.
hl.requireFieldMatch
By default, false, all query terms will be highlighted for each field to be highlighted (hl.fl) no matter what fields the parsed query refer to. If set to true, only query terms aligning with the field being highlighted will in turn be highlighted.
The full list of highlighting options is available in the reference guide.
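To see which query terms the highlighter actually marked, and to compare runs with and without hl.requireFieldMatch, it can help to pull the emphasized fragments out of the highlighting section. A minimal sketch, assuming the default <em>...</em> markup and using the response from the question as sample data; the parameter set is a plausible starting point, not a tested fix for the OR case.

```python
import re

# Highlighting section from the question.
HIGHLIGHTING = {
    "c6d8e7e5-7ba9-4b68-ae0b-bb31ec4872f3": {
        "remaining": [
            "<em>planet</em> <em>holly</em> wood <em>las</em> <em>vegas</em>"
        ]
    }
}

EM_RE = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(highlighting, doc_id, field):
    """Return every emphasized fragment for one document and field."""
    snippets = highlighting.get(doc_id, {}).get(field, [])
    return [term for snippet in snippets for term in EM_RE.findall(snippet)]


# Parameters for a re-run that only highlights query terms aligned
# with the field being highlighted.
PARAMS = {
    "hl": "true",
    "hl.fl": "remaining",
    "hl.requireFieldMatch": "true",
}

terms = highlighted_terms(
    HIGHLIGHTING, "c6d8e7e5-7ba9-4b68-ae0b-bb31ec4872f3", "remaining"
)
print(terms)  # ['planet', 'holly', 'las', 'vegas']
```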

Solr - How to highlight specific terms in specific fields

How can I highlight a specific term in a specific field?
For example, imagine the following query:
foo TITLE("bar")
What I want to achieve is to highlight foo in all fields and bar only in the field TITLE.
So far, the following has not worked:
q=<TITLE_field_internal_name>:"bar"hl.fl=&&hl.requireFieldMatch=true
Note: In the above example TITLE is re-mapped correctly to a solr field.
Most highlighting parameters support the per-field parameter syntax:
f.TITLE.hl.<parameter>
Seeing as your syntax isn't valid Solr syntax and there is no way it'll know that TITLE("bar") refers to a field named TITLE, you'll have to extract (or provide) that metadata yourself.
If you're querying different fields and only want to highlight the terms hit in those fields (i.e. if your query had been title:bar to only search for bar in the field title), you don't have to use per field settings, but can set hl.requireFieldMatch to true instead.
By default, false, all query terms will be highlighted for each field to be highlighted (hl.fl) no matter what fields the parsed query refer to. If set to true, only query terms aligning with the field being highlighted will in turn be highlighted.
Note: if the query references fields different from the field being highlighted and they have different text analysis, the query may not highlight query terms it should have and vice versa. The analysis used is that of the field being highlighted (hl.fl), not the query fields.
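Both suggestions can be combined in one request: field-qualified clauses so that hl.requireFieldMatch knows which term belongs to which field, plus f.<field>.hl.<parameter> overrides for per-field tuning. A minimal sketch; title and body are hypothetical field names, and hl.snippets and hl.fragsize are simply examples of per-field-capable parameters.

```python
def per_field_hl(field, **overrides):
    """Build f.<field>.hl.<param> keys for per-field highlighting overrides."""
    return {f"f.{field}.hl.{name}": str(value) for name, value in overrides.items()}


params = {
    # bar is only searched (and so only highlighted) in title, foo only in body.
    "q": 'title:"bar" OR body:foo',
    "hl": "true",
    "hl.fl": "title,body",
    "hl.requireFieldMatch": "true",
}
params.update(per_field_hl("title", snippets=3, fragsize=200))
print(params)
```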
Finally, after much trial and error, I managed to get this working. Here are a few snippets:
fq=(_query_:"{!edismax+qf%3D'container_title_en'+v%3D'hormones'}"+OR+_query_:"{!edismax+qf%3D$fqf+v%3D'cancer'}")
fqf=authors_tnss+etc+etc+...
hl.q=(_query_:"{!edismax+qf%3D'container_title_en'+v%3D'hormones'}"+OR+_query_:"{!edismax+qf%3D$fqf+v%3D'cancer'}")
hl.fl=id,external_id_s,etc,etc,...
hl.requireFieldMatch=true
Notice that the hl.fl fields are separated by commas, while in my custom fqf parameter used in fq they are separated by + (an encoded space).
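Hand-encoding %3D and + is easy to get wrong; letting a URL library do it is an alternative. This sketch rebuilds the same parameters from raw values, keeping only the field names the answer actually lists (its "etc" entries are elided and left out here). urlencode also turns commas into %2C and quotes into %27/%22, which Solr decodes.

```python
from urllib.parse import parse_qs, urlencode

# Raw, unencoded nested queries from the answer.
combined = (
    "_query_:\"{!edismax qf='container_title_en' v='hormones'}\""
    " OR _query_:\"{!edismax qf=$fqf v='cancer'}\""
)

params = {
    "fq": combined,
    "fqf": "authors_tnss",        # further fields would be space-separated
    "hl.q": combined,
    "hl.fl": "id,external_id_s",  # further fields would be comma-separated
    "hl.requireFieldMatch": "true",
}

query_string = urlencode(params)  # spaces -> +, '=' -> %3D, ',' -> %2C
# Round-trip check: decoding yields exactly the raw values again.
assert {k: v[0] for k, v in parse_qs(query_string).items()} == params
print(query_string)
```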

Solr 3.6.2 spellcheck multi-word phrase: how to get collations without ignored stopwords?

I'm having a problem with the Solr 3.6.2 default (field based) spellchecker configured with query time parameters
spellcheck.onlyMorePopular=true
spellcheck.count=5
spellcheck.collate=true
spellcheck.maxCollations=5
spellcheck.maxCollationTries=5
on a field type which has a solr.StopFilterFactory filter on its analyzers.
The suggestion phase works as intended:
the indexed field does not contain any stopword
no suggestion is provided for a given stopword
But the resulting collation always contains the ignored stopwords, which I don't want: I'd prefer a raw suggestion of combined terms over something which looks like a "sort of" natural language answer.
For instance, searching for "handfum of perries", I'd prefer "handful berry" over "handful of berry".
I don't think that the stopwords excluded from spellchecking suggestions by the field's query analyzer are "marked" for preservation the way the official documentation describes for other query elements:
Note that the non-spellcheckable terms such as those for range
queries, prefix queries etc. are detected and excluded for
spellchecking. Such non-spellcheckable terms are preserved in the
collated output so that the original query can be run again, as is.
It seems two solutions would be
either having a custom query converter so the stopwords are ignored right from the start: not sure it is possible in 3.6.2
or having a custom spellchecker that would not try to find any suggestion for a stopword (or would always suggest an "empty" string), without messing up the collation process
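A third, purely client-side workaround (not one proposed in the question, and it does not change what Solr returns) is to post-process each collation and drop tokens that appear in the same stopword list the field's StopFilterFactory uses. A minimal sketch; the stopword set below is hypothetical, and collations containing quoted phrases or field syntax would need a real query parser rather than a whitespace split.

```python
# Hypothetical stopword set; in practice it would mirror the stopwords
# file configured on the field's StopFilterFactory.
STOPWORDS = {"a", "an", "and", "of", "the"}


def strip_stopwords(collation, stopwords=STOPWORDS):
    """Drop stopword tokens from a collation (whitespace split, case-insensitive)."""
    return " ".join(t for t in collation.split() if t.lower() not in stopwords)


print(strip_stopwords("handful of berry"))  # handful berry
```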
Am I missing something?
Regards

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and written a whole bunch of docs into it. I can see from the Solr admin page that the docs exist and that the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering *:* into the query (in the Solr admin page), I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is indexed and contains the values I am searching for.
So I am confused about what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
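A toy model of that difference, as a sketch only: it is not Solr's analysis chain, just a word split plus lowercasing standing in for a text field versus exact whole-value comparison for a string field. The sample value is made up.

```python
import re


def string_field_matches(stored_value, query_term):
    """A string field indexes the whole value as one token: only exact matches hit."""
    return stored_value == query_term


def text_field_matches(stored_value, query_term):
    """Rough stand-in for a text field: split into word tokens and lowercase."""
    tokens = {t.lower() for t in re.findall(r"\w+", stored_value)}
    return query_term.lower() in tokens


value = "Video card with 256MB memory"
print(string_field_matches(value, "video"))  # False
print(text_field_matches(value, "video"))    # True
```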
I had the same issue with a new setup of Solr 8. The accepted answer is no longer valid, because the <defaultSearchField> configuration has been deprecated and is not supported in recent versions.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene (standard) query parser searches a single default field unless the query names fields explicitly. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify exactly which fields to search, to prevent unwanted side effects. Multiple fields are separated by spaces, which are encoded as + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
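import pysolr

# Core URL is an example; point it at your own core.
solr = pysolr.Solr('http://localhost:8983/solr/techproducts')
results = solr.search('search term', **{
    'defType': 'dismax',
    'qf': 'features text'
})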
In my case the problem was the format of the query. It seems that my setup, by default, was looking for an exact match on the entire value of the field. So, if I was searching for sit, I had to query *sit*, i.e. use wildcards to get the expected result.
With Solr 4, I solved this as per Mauricio's answer, by setting type="text_en" on the field.
With Solr 6, use text_general.
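On a managed schema, the same type change can also be made with the Schema API's replace-field command, which expects the complete new field definition. A minimal sketch that only builds the JSON body: the field name title is hypothetical, the body would be POSTed to /solr/<core>/schema, and existing documents must be reindexed because they were indexed with the old string type.

```python
import json


def switch_to_text_type(field_name, field_type="text_general", multi_valued=False):
    """Build a Schema API replace-field body that retypes a field as analyzed text."""
    return json.dumps({
        "replace-field": {
            "name": field_name,
            "type": field_type,
            "indexed": True,
            "stored": True,
            "multiValued": multi_valued,
        }
    })


body = switch_to_text_type("title")
print(body)
```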
