Returning documents using multi-valued field - solr

I'm quite new to Solr and I'm supporting an existing Solr search engine which was written by someone else. I've been reading on Solr for the last couple of weeks so I'd consider myself beyond the basics.
A particular field, let's say name, is multi-valued. For example, a document has a field "name" with values "Alice, Trudy". We want that the document is returned when "Alice" or "Trudy" is input and not when "Alice Trudy" is entered. Currently the document is even with "Alice Trudy". How could this be done?
Thanks a lot!
Krt_Malta

If the field value is "Alice, Trudy", normally solr/lucene should match for "alice" or "trudy". If not, there could be special "Text Analysis" or stemming options active for this field.
Take a look at the part "text analysis" at the solr documentation: http://lucene.apache.org/solr/tutorial.html#Text+Analysis
and: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Related

Solr multilingual stemisation

I'm using Solr to index documents like .pdf or .docx. These documents are in french or in english and I want to use the stemisation for both languages.
For exemple, if I search "chevaux" I want to find "cheval" (french) and if I search "raise" I want to find "raising" (english).
Is there a way to do this without createting 2 core (one in english and one in french) ?
Have two fields, one with the field definition you want for French, and one with the field definition you want for English. Then use the Language Detection feature to submit the content to the correct field.
When searching, query the field that has the correct language as the user, or if you don't know, search both - or use language detection to try to do a better guess.
You can also index the same content into both fields, but my initial guess is that it'll give you weird results down the road, where someone enters a French word, but due to the processing rules for English, you get hit that wouldn't have happened if you only indexed to the correct field.
By enabling langid.map, you can tell Solr to index the content into fields named fieldname_langcode (where fieldname is picked up from langid.fl).
langid.map: Enables field name mapping. If true, Solr will map field names for all fields listed in langid.fl.
You can use langid.map.replace or langid.map.pattern if you want to change the default fieldname_langcode naming, but I'd leave those alone for now.

Manipulating and Removing Facets in Apache Solr

I am creating a front end application which queries through a database using the Apache Solr engine, but I have two issues that I just cannot find the answer to.
When Solr is processing a Facet query, how do I get the facet to be a single phrase ("Department of the Navy (160)") instead of a broken up facet of 4 terms ("Department (160)" "of (200)" "the (200)" "Navy(160)").
Also, how do I remove certain facets from being queried, for example "and" "to" "the" etc.
Thank you.
Looks like your phrase is being indexed into a Text field which, among many things, splits by whitespace. This is very good for full text search but not for faceting.
You can have a duplicate field for this, of type string (and not Text), which is not splitted. You can still use the original field for searching but the new string field for faceting.

Can Solr search key words precisely?

For example:
I want to search "support", I hope it will only return the results containing "support", and do NOT return the result containing "supports" or any other relevant matches.
Is it possible to implement like this?
Thanks.
Yes, if you search against an unanalyzed field type, matches are exact. In the default Solr schema the unanalyzed field type is named "string" (of class "solr.StrField")
EDIT: it depends on what you mean by "precisely". If your field value is "support desk" and your query is "support", should it match?
If your answer is yes, then you should look into configuring stemming.
If your answer is no, i.e. the query must match the field value and nothing else, then you should use a string (i.e. unanalyzed) field type.
Furthermore, if your query is "supports" and the field value is "Supports", should it match?
If you answer yes, then you should use a LowerCaseFilterFactory (you can't do this on a string field type, you'll have to switch to a text field type).
If you answer no, then it's ok to use a string field type.
In summary, the Lucene/Solr text analysis pipeline is very configurable, take a look at the analyzer docs for a reference of all available options.
What you are describing is called stemming. There is another almost identical question on stack overflow, check it out : Solr exact word search
You will need to re-index and disable stemming in your configuration. I don't believe it's possible to do that at query time since what is stored in your index is the stemmed version of the word. In your case "support" is stored in the index even is "supports" is displayed.
This should get you started How to configure stemming in Solr?

Return stemmed word in Solr

We have stemming in our Solr search and we need to retrieve the word/phrase after stemming. That is if I search for "oranges", through stemming a search for "orange" is carried out. If I turn on debugQuery I would be able to see this, however we'd like to access it through the result if possible. Basically, we need this, because we pass the searched word as a parameter to a 3rd party application which highlights the word in an online PDF reader. Currently, if a user searches for "oranges" and a document contains "orange", then the PDF wouldn't highlight anything since it tries to highlight "oranges" not "orange".
Thanks all in advance,
Krt_Malta
I've no experience with Solr but if you need it just for presentation to users you could stem their queries using the same stemmer Solr uses yourself. This would probably be faster since it would avoid a trip to Solr's index. For English this would presumably be http://tartarus.org/~martin/PorterStemmer/ - or you could check Solr's implementation.
However, a word of caution, most stemming algorithms do not guarantee that stemmed words will be actual words. Check here http://snowball.tartarus.org/algorithms/english/stemmer.html for examples.
You could use the implicit analysis request handler to get the stemmed word.
For your example, if you are using the text_en field and the Snowball Stemmer, the URL
<YOUR SOLR HOST>/solr/<YOUR COLLECTION>/analysis/field?analysis.query=oranges&analysis.fieldtype=text_en&verbose_output=1
would give you a json response, including the following:
"org.apache.lucene.analysis.snowball.SnowballFilter",
[
{
"text": "orang",
...

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and write a whole bunch of docs into it. I can see
from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering * : *
into the query (in Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration will be deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
results = solr.search('search term', **{
'defType': 'dismax',
'qf': 'features text'
})
In my case the problem was the format of the query. It seems that my setup, by default, was looking and an exact match to the entire value of the field. So, in order to get results if I was searching for the sit I had to query *sit*, i.e. use wildcards to get the expected result.
With solr 4, I had to solve this as per Mauricio's answer by defining type="text_en" to the field.
With solr 6, use text_general.

Resources