Solr: How do I search for a phrase in a field? - solr

I was reading the documentation and saw this:
For example, suppose an index contains two fields, title and text, and that text is the default field. If you want to find a document called "The Right Way" which contains the text "don't go this way," you could include ... the following terms in your search query:
title:"Do it right" AND go
Since text is the default field, the field indicator is not required; hence the ... query above omits it.
The field is only valid for the term that it directly precedes, so the query title:Do it right will find only "Do" in the title field. It will find "it" and "right" in the default field (in this case the text field).
This seems strange to me. How would I search for the phrase "Do it right" in the title field?

I continued reading and found the answer in the section titled "Grouping Clauses within a Field". You can use this:
title:("Do it right") AND go
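A minimal Python sketch of building that query string and URL-encoding it for a request to Solr. The helper name phrase_in_field is hypothetical, and its escaping only handles backslashes and double quotes inside the phrase, not every Lucene special character.

```python
from urllib.parse import urlencode


def phrase_in_field(field: str, phrase: str) -> str:
    """Return a clause that matches `phrase` as a whole phrase in `field` (hypothetical helper)."""
    # Escape backslashes first, then double quotes, so the phrase stays one quoted token.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    # Parentheses group the quoted phrase under the field prefix.
    return f'{field}:("{escaped}")'


# The field prefix covers the whole phrase; "go" is searched in the default field.
q = f'{phrase_in_field("title", "Do it right")} AND go'
query_string = urlencode({"q": q})
print(query_string)
```

Putting the field prefix directly in front of the quotes (title:"Do it right") applies it to the whole phrase as well. Grouping with parentheses becomes useful when several terms or phrases should share one field prefix.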

Related

Is there a way to execute a query on SOLR where I have a list of words that need to be in different fields?

Hello everybody. I'm trying to build a query that satisfies the following:
Find a set of words that appear in a group of fields. For example, I want to find the documents that have the words soccer, ball and goalkeeper in one or both of the fields 'sport_name' and 'description'.
The problem I'm having is that I need to treat both fields as a single field, so that I get results like:
{
"sport_name":"soccer",
"description": "...played with a ball... positions are goalkeeper"
}
Each word may appear in either field, but all the words must appear in the "concatenated bigger field" (the two fields taken together).
Is there a way to do this during query time?
Thanks!!
You can do this by using the edismax query parser (defType=edismax), setting q.op=AND (since all the terms have to be present) and using qf=sport_name description to tell Solr to search for the given terms in both fields.
You can also use qf=sport_name^2 description to weight hits in the sport_name field twice as much as hits in the description field. So if there were a sport with "ball" in its name, that hit would contribute more to the score than the same term in the description field.
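The parameters from this answer can be assembled in Python, as sketched below. The host, port and core name in SELECT_URL are placeholders, and edismax_all_terms is a hypothetical helper. The parameter names (defType, q, q.op, qf and the ^ boost syntax) are the ones the answer describes.

```python
from urllib.parse import urlencode

# Placeholder Solr core URL; adjust host, port and core name for your install.
SELECT_URL = "http://localhost:8983/solr/mycore/select"


def edismax_all_terms(words, fields, boosts=None):
    """Build edismax parameters requiring every word to match somewhere in `fields`."""
    boosts = boosts or {}
    # qf is a space-separated field list; each field may carry a ^boost suffix.
    qf = " ".join(f"{f}^{boosts[f]}" if f in boosts else f for f in fields)
    return {
        "defType": "edismax",
        "q": " ".join(words),
        "q.op": "AND",  # every term must be present, in any of the qf fields
        "qf": qf,
    }


params = edismax_all_terms(
    ["soccer", "ball", "goalkeeper"],
    ["sport_name", "description"],
    boosts={"sport_name": 2},
)
url = SELECT_URL + "?" + urlencode(params)
print(url)
```

Because qf lists both fields, a document with "soccer" in sport_name and "ball" and "goalkeeper" in description satisfies q.op=AND. This effectively treats the two fields as one combined field.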

Solr - How to highlight specific terms in specific fields

How can I highlight a specific term in a specific field?
For example, imagine the following query:
foo TITLE("bar")
What I want to achieve is to highlight foo in all fields and bar only in the TITLE field.
So far, the following has not worked:
q=<TITLE_field_internal_name>:"bar"hl.fl=&&hl.requireFieldMatch=true
Note: In the above example TITLE is re-mapped correctly to a solr field.
Most highlighting parameters support the per-field parameter syntax:
f.TITLE.hl.<parameter>
Since your syntax isn't valid Solr syntax, Solr has no way of knowing that TITLE("bar") refers to a field named TITLE, so you'll have to extract (or provide) that metadata yourself.
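Extracting that metadata could look roughly like the sketch below. It assumes the application's custom ALIAS("value") syntax shown in the question. FIELD_MAP, the Solr field name title_t and the translate helper are all hypothetical. The returned targets map records which values belong to which Solr field, which is the information you would need for per-field highlighting settings.

```python
import re

# Hypothetical mapping from the application's aliases to real Solr field names.
FIELD_MAP = {"TITLE": "title_t"}

# Matches the custom ALIAS("value") syntax from the question.
ALIAS_RE = re.compile(r'\b([A-Z_]+)\("([^"]*)"\)')


def translate(query: str, field_map: dict):
    """Rewrite ALIAS("v") as solr_field:"v" and record which values target which field."""
    targets = {}

    def repl(m):
        solr_field = field_map[m.group(1)]  # raises KeyError for unknown aliases
        targets.setdefault(solr_field, []).append(m.group(2))
        return f'{solr_field}:"{m.group(2)}"'

    return ALIAS_RE.sub(repl, query), targets


solr_q, targets = translate('foo TITLE("bar")', FIELD_MAP)
print(solr_q, targets)
```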
If you're querying different fields and only want to highlight the terms that hit in those fields (e.g. if your query had been title:bar, searching for bar only in the title field), you don't have to use per-field settings; you can set hl.requireFieldMatch to true instead.
By default (false), all query terms will be highlighted in each field being highlighted (hl.fl), no matter which fields the parsed query refers to. If set to true, only query terms aligning with the field being highlighted will be highlighted.
Note: if the query references fields different from the field being highlighted and they have different text analysis, the query may not highlight query terms it should have and vice versa. The analysis used is that of the field being highlighted (hl.fl), not the query fields.
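The effect of hl.requireFieldMatch can be pictured with the toy model below. It is not Solr's implementation. It represents the parsed query as (field, term) pairs and ignores the analysis differences mentioned in the note above. The function name terms_to_highlight is hypothetical.

```python
def terms_to_highlight(query_terms, highlight_field, require_field_match=False):
    """Toy model: which query terms get highlighted in `highlight_field`.

    query_terms is a list of (field, term) pairs taken from the parsed query.
    """
    if not require_field_match:
        # Default (false): every query term is highlighted, whatever field it targeted.
        return {term for _, term in query_terms}
    # true: only terms whose query field is the field being highlighted.
    return {term for field, term in query_terms if field == highlight_field}


# Parsed form of a query like: foo title:bar  (foo goes to the default field "text")
parsed = [("text", "foo"), ("title", "bar")]
print(terms_to_highlight(parsed, "title"))
print(terms_to_highlight(parsed, "title", require_field_match=True))
```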
Finally, after much trial and error, I managed to get this working. Here are a few snippets:
fq=(_query_:"{!edismax+qf%3D'container_title_en'+v%3D'hormones'}"+OR+_query_:"{!edismax+qf%3D$fqf+v%3D'cancer'}")
fqf=authors_tnss+etc+etc+...
hl.q=(_query_:"{!edismax+qf%3D'container_title_en'+v%3D'hormones'}"+OR+_query_:"{!edismax+qf%3D$fqf+v%3D'cancer'}")
hl.fl=id,external_id_s,etc,etc,...
hl.requireFieldMatch=true
Notice that the hl.fl fields are separated by commas, whereas in my custom fqf parameter (referenced from fq) they are separated by + (a URL-encoded space).
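To read the URL-encoded fq above more easily, decode it with Python's standard library, as sketched below. In this encoding + stands for a space and %3D for =. The decoded form shows the two nested edismax queries: the second takes its qf from the $fqf parameter reference.

```python
from urllib.parse import unquote_plus

# The fq value from the snippet above, exactly as it appears in the request.
encoded_fq = (
    '(_query_:"{!edismax+qf%3D\'container_title_en\'+v%3D\'hormones\'}"'
    '+OR+_query_:"{!edismax+qf%3D$fqf+v%3D\'cancer\'}")'
)

# unquote_plus turns "+" into a space and "%3D" into "=".
decoded_fq = unquote_plus(encoded_fq)
print(decoded_fq)
```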

Solr multilingual stemming

I'm using Solr to index documents like .pdf or .docx. These documents are in French or in English, and I want to use stemming for both languages.
For example, if I search "chevaux" I want to find "cheval" (French), and if I search "raise" I want to find "raising" (English).
Is there a way to do this without creating two cores (one in English and one in French)?
Have two fields, one with the field definition you want for French, and one with the field definition you want for English. Then use the Language Detection feature to submit the content to the correct field.
When searching, query the field whose language matches the user's language, or, if you don't know it, search both, or use language detection on the query to make a better guess.
You can also index the same content into both fields, but my initial guess is that it'll give you weird results down the road: someone enters a French word, but due to the processing rules for English, you get hits that wouldn't have happened if you had only indexed into the correct field.
By enabling langid.map, you can tell Solr to index the content into fields named fieldname_langcode (where fieldname is picked up from langid.fl).
langid.map: Enables field name mapping. If true, Solr will map field names for all fields listed in langid.fl.
You can use langid.map.replace or langid.map.pattern if you want to change the default fieldname_langcode naming, but I'd leave those alone for now.
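The sketch below models the default fieldname_langcode naming and the query-side choice described above. It assumes a hypothetical source field called content listed in langid.fl. The mapped fields content_en and content_fr would still need definitions in the schema, for example with language-specific types such as text_en and text_fr. Both helper names are hypothetical.

```python
def langid_field(fieldname: str, langcode: str) -> str:
    """Field name produced by the default langid.map naming: fieldname_langcode."""
    return f"{fieldname}_{langcode}"


def query_fields(fieldname: str, user_lang=None, supported=("en", "fr")) -> str:
    """Space-separated qf value: the user's language field if known, else all of them."""
    langs = [user_lang] if user_lang in supported else list(supported)
    return " ".join(langid_field(fieldname, lang) for lang in langs)


print(langid_field("content", "fr"))  # where a French document's text would go
print(query_fields("content", "fr"))  # user language known
print(query_fields("content"))        # unknown: search both language fields
```

The resulting string can be passed as qf to the edismax parser, so one query covers either the matching language field or both.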

Solr highlighting gives field/snippets with ANY term, instead of those that satisfy the query fully

I'm using Solr 5.x with the standard highlighter, and I'm getting snippets that match only one of the search terms, even when I specify q.op=AND.
I need ONLY the fields and snippets that match ALL the terms (unless I set q.op=OR or omit it), i.e. the field/snippet must satisfy the query. Solr does return the field/snippet that has all the terms, but it also returns many others.
I'm using hl.fl=* to get only the fields that contain the terms, and I search against the default field ('text', containing the full document). I need to use * because I have multiple dynamic fields. Most fields are of type 'text_general' (for search and highlighting), and some are of type 'string' for faceting.
If it's not possible for snippets to contain all the terms, I MUST at least get only the fields that satisfy the query fully. The question talks about matching all the terms, but the search query can become arbitrarily complex, so the fields/snippets should match the query as a whole.
Next, I also need snippets highlighted according to proximity-based searches/terms. What should I use for this? The fields returned in the highlighting in this scenario should also satisfy the proximity query (currently I get any field that contains any of the terms, without regard to proximity constraints and the other query terms).
Thanks for your help.
I've also encountered the same problem with highlighting. In my case, the query like
(foo AND bar) OR eggs
highlighted eggs and foo even though bar was not present in the document. I didn't manage to come up with a proper solution, but I devised a dirty workaround.
I use the following query:
id:highlighted_document_id AND text:(my_original_query)
with debugQuery set to true. Then I parse the explain text for highlighted_document_id. It contains the query terms that contributed to the score; terms that should not be highlighted do not appear in the explanation.
The Python regex expressions I use to extract the terms (valid for Solr 5.2.1):
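import re
# Terms that contributed to the score, and wildcard terms, as they appear in the explain text
term_regex = re.compile(r'weight\(text:(.+) in')
wildcard_term_regex = re.compile(r'text:(.+), product')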
Then I simply find the highlight markup in the highlighted text and remove it wherever the marked term doesn't match any of the terms extracted with term_regex and wildcard_term_regex.
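A self-contained sketch of this workaround, under a few stated assumptions. Only plain terms are handled (wildcard terms would need pattern matching and are left out). The term regex is a non-greedy variant of the one above. The highlighter is assumed to use its default <em>...</em> markup. The explain string in the usage example is synthetic. Terms are compared lower-cased, which only works if the field's analysis does not stem them. All function names are hypothetical.

```python
import re

# Non-greedy variant of the answer's term regex, so it stops at the first " in".
TERM_RE = re.compile(r"weight\(text:(.+?) in")
# Assumes the standard highlighter's default <em>...</em> markup.
EM_RE = re.compile(r"<em>(.*?)</em>")


def debug_params(doc_id: str, original_query: str) -> dict:
    """Parameters for the per-document debug query (doc_id assumed to need no escaping)."""
    return {"q": f"id:{doc_id} AND text:({original_query})", "debugQuery": "true"}


def scoring_terms(explain: str) -> set:
    """Lower-cased plain terms that contributed to the score in an explain string."""
    return {t.lower() for t in TERM_RE.findall(explain)}


def prune_highlights(snippet: str, allowed: set) -> str:
    """Drop the <em> markup around words that did not contribute to the score."""

    def keep_or_strip(m):
        word = m.group(1)
        return m.group(0) if word.lower() in allowed else word

    return EM_RE.sub(keep_or_strip, snippet)


# Synthetic explain line for (foo AND bar) OR eggs on a document without "bar".
explain = "0.4 = weight(text:eggs in 7) [DefaultSimilarity], result of:"
print(prune_highlights("<em>Foo</em> with <em>eggs</em>", scoring_terms(explain)))
```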
The solution is probably pretty limited, but works for me.

Solr - Results that contain all terms, in any order

In a SOLR install, when I search against a field with a multi-word search term, I want SOLR to return documents that have all of the terms in the search, but they do not need to be in the exact order.
For example, if I search the title field for Brown Chicken Brown Cow, I want to find all documents that contain all of the terms Brown, Chicken and Cow in the title field, irrespective of order. So, for example, the title "The chicken and the cow have brown poop" should match the query. AFAIK, this is how Google executes searches as well.
I have experimented with the following query formats:
1. Title:Brown AND Title:Chicken
2. Title:Brown AND Chicken
3. Title:Brown+Chicken
I am very confused by the results. In some instances, the first two queries return the same exact set of results. In other instances, the first version will return many results and the second version will return none. The third version seems to meet my needs, but I am confused by the different meaning of the queries.
All of my tests have been run against a field of type text_en.
<field name="Title" multiValued="false" type="text_en" indexed="true" stored="true"/>
So, what's the best SOLR query/setup for this type of search? Also, is there an easy way to make Solr.NET take a user-entered search term and convert it to this format?
Also, will SOLR by default give documents that match the order of the search phrase a higher relevancy score? If not, what are the right levers to pull to make that happen?
Edit:
Some of my confusion was caused by searching against non-default fields versus the default field. Knowing this, the only format that works consistently is the first one.
If I were you I would try to use:
Title:(Brown Chicken)
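Parentheses apply the field to every term inside them, so with q.op=AND this is equivalent to your query no. 1; if the default operator is OR (the usual default), write Title:(Brown AND Chicken) instead. Quotation marks force an exact phrase match, so the terms must appear next to each other and in that order.
Please try Title:"Brown Chicken" or use the DisMax query parser to handle your queries.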
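To turn a user-entered search string into the field:(term AND term ...) form, the string building could look like the Python sketch below. It is only an illustration of the logic, not SolrNet's API. The helper names are hypothetical. The escaped character set follows the standard Lucene query syntax. Operator words such as AND/OR typed by the user are not handled.

```python
# Characters that the standard Lucene query parser treats as syntax.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a single term."""
    return "".join("\\" + c if c in SPECIAL_CHARS else c for c in term)


def all_terms_query(field: str, user_input: str) -> str:
    """Turn free text into field:(t1 AND t2 ...), dropping case-insensitive duplicates."""
    terms, seen = [], set()
    for word in user_input.split():
        if word.lower() not in seen:
            seen.add(word.lower())
            terms.append(escape_term(word))
    if not terms:
        raise ValueError("empty search input")
    return f"{field}:(" + " AND ".join(terms) + ")"


print(all_terms_query("Title", "Brown Chicken Brown Cow"))
```

The explicit AND makes the result independent of the default operator, so it behaves like query no. 1 whichever q.op is configured.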
The wiki page for the Lucene query parser says (emphasis mine):
....Since text is the default field, the field indicator is not required.
Note: The field is only valid for the term that it directly precedes, so the query
title:Do it right
will only find "Do" in the title field. It will find "it" and "right" in the default field (in this case the text field).
Do you have only the title field in your data model?
Please run your queries with debugQuery=on to see how they are parsed and scored; see it in action: https://stackoverflow.com/a/9262300/604511
