Solr stop words question - solr

My Solr installation is set up with the default stop words plus a few extra ones that I added.
Once in a while a user types a query string that consists of all stopwords. The result is that solr returns no documents at all.
What I would like to happen instead is that Solr returns all documents. Is this possible?
Frank

Off the top of my head, you could fetch the stopwords in your application (e.g. http://localhost:8983/solr/admin/file/?file=stopwords.txt), and use that list to detect stopwords in your user query before sending the query to Solr. If you detect that they're all stopwords, replace with *:* to fetch all documents.

Related

Synonyms in Solr Query Results

Whenever I query a string in solr, i want to get the synonyms of field value if there exists any as a part of query result, is it possible to do that in solr
There is no direct way to fetch the synonyms used in the search results. You can get close by looking at how Solr parsed your query via the debugQuery=true parameter and looking at the parsedQuery value in the response but it would not be straightforward. For example, if you search for "tv" on a text field that uses synonyms you will get something like this:
$ curl "localhost:8983/solr/your-core/select?q=tv&debugQuery=true"
{
...
"parsedquery":"SynonymQuery(Synonym(_text_:television _text_:televisions _text_:tv _text_:tvs))",
...
Another approach would be to load in your application the synonyms.txt file that Solr uses and do the mapping yourself. Again, not straightforward,

Solr: How to search records ignoring case in field type "string"?

I have indexed the following record in my collection
{
"app_name":"atm inspection",
"appversion":1,
"id":"app_1427_version_2449",
"icon":"/images/media/default_icons/app.png",
"type":"app",
"app_id":1427,
"account_id":556,
"app_description":"inspection",
"_version_":1599625614495580160}]
}
and It's working fine unless an until i search records case sensitively i.e if i write following Solr query to search records whose app_name contains atm then Solr is returning above response which is a correct behaviour.
http://localhost:8983/solr/NewAxoSolrCollectionLocal/select?fq=app_name:*atm\ *&q=*:*
However, If i execute following Solr query to search records whose app_name contains ATM
http://localhost:8983/solr/NewAxoSolrCollectionLocal/select?fq=app_name:*ATM\ *&q=*:*
Solr is not returning above response because ATM!=atm.
Can someone please help me with the Solr query to search records case insensitively.
Your help is greatly appreciated.
You can't. The field type string requires an exact match (it's a single, unprocessed token being stored for the field value).
The way to do it is to use a TextField with an associated Tokenizer and a LowercaseFilter. If you use a KeywordTokenizer, the whole token will be kept intact (so it won't get split as you'd usually assume with a tokenizer), and since it's a TextField it can have a analysis chain associated - allowing you to add a LowercaseFilter.
The LowerCaseFilter is multiterm aware as far as I remember, but remember that wildcard queries will usually not have any filters applied. You should therefor lowercase the value before creating your query yourself (even if it probably will work in this simple case).

Manipulating and Removing Facets in Apache Solr

I am creating a front end application which queries through a database using the Apache Solr engine, but I have two issues that I just cannot find the answer to.
When Solr is processing a Facet query, how do I get the facet to be a single phrase ("Department of the Navy (160)") instead of a broken up facet of 4 terms ("Department (160)" "of (200)" "the (200)" "Navy(160)").
Also, how do I remove certain facets from being queried, for example "and" "to" "the" etc.
Thank you.
Looks like your phrase is being indexed into a Text field which, among many things, splits by whitespace. This is very good for full text search but not for faceting.
You can have a duplicate field for this, of type string (and not Text), which is not splitted. You can still use the original field for searching but the new string field for faceting.

Solr query looking for IN, IT, or IS in field

When performing these queries (separately):
country:IN
-or-
country:IT
-or-
country:IS
... I get all items in the index returned. I want to get only the items whose country field matches those params. I've tried every combination of escaping with single/double quotes and single/double slashes. When doing so, no items are returned at all.
I've verified that items exist in the index for these params by dumping the whole index (with a loose query) and identifying them. I'm on django-haystack in case that matters, but the issue is there for both the Django python shell and the Solr web admin interfaces.
Thanks for any help!
Filter queries return a subset of documents that match them.
fq=country:(IN OR IT OR IS)
fq=country:IN
Those are standard noise / stop words. You can either remove the terms from the stopwords file (stopwords_en.txt) and redindex your documents. Or set the type to string and use fq like aitchnyu mentions above.

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and write a whole bunch of docs into it. I can see
from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering * : *
into the query (in Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration will be deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
results = solr.search('search term', **{
'defType': 'dismax',
'qf': 'features text'
})
In my case the problem was the format of the query. It seems that my setup, by default, was looking and an exact match to the entire value of the field. So, in order to get results if I was searching for the sit I had to query *sit*, i.e. use wildcards to get the expected result.
With solr 4, I had to solve this as per Mauricio's answer by defining type="text_en" to the field.
With solr 6, use text_general.

Resources