Solr search on a field with ReverseStringFilterFactory returns 0 records for reversed input

I have a requirement that a user should get the same results whether the search string is entered forwards or reversed. For example, q="F44" and q="44F" should return the same results.
I created a new field "text_rev" that uses the field type below, and added a copy field from the original field "retailId":
<copyField source="retailId" dest="text_rev"/>
<fieldType name="text_rvsstr" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ReverseStringFilterFactory"/>
  </analyzer>
</fieldType>
When I search with q=text_rev:F44 I get the result, but when I search with q=text_rev:44F I get 0 results.
Please advise.

Those searches are on the same field. Searching the reverse direction is only going to work on the reversed field, and searching the forward direction is only going to work on the original field.
By searching both fields for the same information, you can check both directions in one query.
q=retailId:F44 OR text_rev:F44

You need to search both fields. Also, if you actually expect to search in reverse, you need asymmetric index-time and query-time analyzer definitions. Otherwise the term gets reversed both during indexing and during querying, and you effectively lose any benefit of reversing it.
You can test this on the Analysis screen of the Admin UI by providing content in both boxes. It will then show how the terms are processed and matched during indexing and querying.
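
To see why the symmetric definition fails, the analysis can be simulated outside Solr. This is only a minimal Python sketch, not Solr code: `analyze` and `matches` are hypothetical helper names, and a plain whitespace split stands in for StandardTokenizerFactory.

```python
def analyze(text, reverse):
    """Split on whitespace and optionally reverse each token.

    Simplified stand-in for a tokenizer followed by ReverseStringFilterFactory.
    """
    tokens = text.split()
    return [t[::-1] for t in tokens] if reverse else tokens


def matches(indexed_value, query, reverse_at_index, reverse_at_query):
    """Return True if any analyzed query term equals an indexed term."""
    indexed_terms = set(analyze(indexed_value, reverse_at_index))
    query_terms = analyze(query, reverse_at_query)
    return any(term in indexed_terms for term in query_terms)


# Symmetric chain (the fieldType from the question): both sides reverse,
# so the field behaves like an ordinary forward field.
symmetric_forward = matches("F44", "F44", True, True)
symmetric_reversed = matches("F44", "44F", True, True)

# Asymmetric chain (assumption): reverse at index time only.
asymmetric_reversed = matches("F44", "44F", True, False)
```

Under these assumptions only the asymmetric chain matches the reversed input, so a query such as retailId:44F OR text_rev:44F would cover both directions once text_rev is reversed at index time only.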

Related

SOLR: Fuzzy search on a text field with spaces

Here's my problem: I have a single text field indexed by Solr, which holds the usernames from our database. I'd like the search to be fuzzy rather than an exact match. E.g., if the username is "krishnarayaprolu" and I search with a spelling mistake, "krishnIrayaprolu", it should still return the record.
This is working fine for me except when the usernames have a space in them. For the username "krishna rayaprolu", the search string "krishnI rayaprolu~0.5" does not return the record. It does return the record if the spelling mistake is at the end, as in "krishna rayaprolI~0.5". Any ideas?
For my config, I tried WhitespaceTokenizerFactory and StandardTokenizerFactory. On the search side, I tried quotes and escaping the space. None of them helped with my space+fuzziness problem. I'm using the admin interface for searching. I'd appreciate any pointers.

There is a solution for your problem; you only need to add a field to your schema.
Create a new ngram field and copy all your names into it.
When a query for a misspelled word returns no results, split the word into prefixes and run the query again; you should then get the expected results.
Example: suppose the user is searching for "krishna rayaprolu" but types "krishnI rayaprolu~0.5". Build the query as below:
(ngram:"krishnI rayaprolu~0.5" OR ngram:"kri" OR ngram:"kris" OR ngram:"krish" OR ngram:"krishn" OR ngram:"krishnI" OR ngram:"ray" OR ngram:"raya" OR ngram:"rayap" ..... )
Here each word is split into successive prefixes, and the query is run against the ngram field.
Hope this helps.
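
The prefix clauses in that query can be generated rather than typed by hand. A minimal Python sketch, assuming a minimum prefix length of 3 (taken from the shortest clause in the example, "kri"); `prefixes` and `build_prefix_query` are hypothetical helper names, and quotes inside the input are not escaped here.

```python
def prefixes(word, min_len=3):
    """Return the prefixes of word from min_len characters up to the full word."""
    return [word[:n] for n in range(min_len, len(word) + 1)]


def build_prefix_query(field, text, min_len=3):
    """OR together one quoted clause per prefix of every whitespace-separated word."""
    clauses = [
        f'{field}:"{p}"'
        for word in text.split()
        for p in prefixes(word, min_len)
    ]
    return "(" + " OR ".join(clauses) + ")"


# Field name "ngram" is taken from the example query above.
query = build_prefix_query("ngram", "krishnI rayaprolu")
```

Words shorter than the minimum length produce no clauses in this sketch, so very short names would need a smaller min_len.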

Solr: How to search records ignoring case in field type "string"?

I have indexed the following record in my collection
{
"app_name":"atm inspection",
"appversion":1,
"id":"app_1427_version_2449",
"icon":"/images/media/default_icons/app.png",
"type":"app",
"app_id":1427,
"account_id":556,
"app_description":"inspection",
"_version_":1599625614495580160
}
It works fine as long as I search case-sensitively. For example, the following Solr query, which searches for records whose app_name contains atm, returns the record above, which is the correct behaviour.
http://localhost:8983/solr/NewAxoSolrCollectionLocal/select?fq=app_name:*atm\ *&q=*:*
However, if I execute the following Solr query to search for records whose app_name contains ATM,
http://localhost:8983/solr/NewAxoSolrCollectionLocal/select?fq=app_name:*ATM\ *&q=*:*
Solr does not return the record above, because ATM != atm.
Can someone please help me with a Solr query that searches records case-insensitively?
Your help is greatly appreciated.

You can't. The field type string requires an exact match (the field value is stored as a single, unprocessed token).
The way to do it is to use a TextField with an associated tokenizer and a LowerCaseFilter. If you use a KeywordTokenizer, the whole value is kept intact as one token (so it won't get split as you'd usually expect from a tokenizer), and since it's a TextField it can have an analysis chain associated with it, which allows you to add a LowerCaseFilter.
The LowerCaseFilter is multiterm aware as far as I remember, but keep in mind that wildcard queries will usually not have any filters applied. You should therefore lowercase the value yourself before creating your query (even if it will probably work in this simple case).
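
Lowercasing on the client side could look roughly like this. It is a minimal Python sketch that assumes app_name has been re-indexed through a KeywordTokenizer plus lowercase chain as described above; `build_contains_filter` is a hypothetical helper name, and only spaces are escaped.

```python
def build_contains_filter(field, term):
    """Build a `field:*term*` wildcard filter with the term lowercased.

    Spaces are backslash-escaped, as in the query from the question.
    Other query-parser special characters are not handled here.
    """
    escaped = term.lower().replace(" ", "\\ ")
    return f"{field}:*{escaped}*"


# "ATM " (with the trailing space) mirrors the fq value used in the question.
fq = build_contains_filter("app_name", "ATM ")
```

The resulting value still has to be URL-encoded before it is placed in the fq parameter of a request.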

How to save value with wildcard in Solr?

Hi all,
I have the following problem with Solr. I need to implement a "reverse" search with wildcards: I want to store a value like "auto*", and this item should be found by requests like "autocar", "autoplan" or "automate". Could someone help me with this, please? Thanks.

If you want to match a shorter indexed value (auto) against a longer searched value (autobus), you want a custom analysis chain that includes EdgeNGramFilter on the query side only. The incoming search word is then split into its possible prefixes, and each prefix is matched against the indexed term.
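
The matching idea can be illustrated without Solr. A minimal Python sketch: `edge_ngrams` and `query_matches` are hypothetical names that only imitate what a query-side edge n-gram filter produces, the gram sizes 1 to 15 are arbitrary, and the stored value "auto*" is assumed to be indexed as the bare term "auto".

```python
def edge_ngrams(term, min_gram=1, max_gram=15):
    """Return the leading substrings of term between min_gram and max_gram characters."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


def query_matches(indexed_term, search_word):
    """True if any prefix of the search word equals the indexed term."""
    return indexed_term in edge_ngrams(search_word)


# Assumption: the trailing "*" was stripped before indexing, leaving "auto".
candidates = ["autocar", "autoplan", "automate", "bus"]
hits = [w for w in candidates if query_matches("auto", w)]
```

Because every prefix of the search word is tried, a short query such as "au" would not match "auto" here, which is the intended direction of this "reverse" wildcard.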

search most frequently used word in a selected set of documents

I need to find the most frequently used words in a given field across a selected set of documents. I tried the Luke handler:
http://localhost:8983/solr/admin/luke?fl=my_field&numTerms=1
But this query computes its results over the whole index rather than the selected documents.

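Assuming your field tokenizes to your definition of a word, you can just use faceting for that. Facet counts are computed over the indexed tokens, which is why facet fields are usually strings: the whole value is kept as a single token.
In your case you want the opposite effect: facet on the tokenized field, so that each word is counted separately, and restrict the query to your selected documents.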
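
A request along those lines might be assembled as follows. This is a minimal Python sketch: `facet_params` is a hypothetical helper, the selection query category:books is a made-up placeholder for whatever selects your documents, and the resulting query string would be appended to your core's select handler URL.

```python
from urllib.parse import urlencode


def facet_params(selection_query, field, limit=10):
    """Parameters for counting the most frequent terms of `field` in the selected documents."""
    return {
        "q": selection_query,   # restricts counting to the selected set
        "rows": 0,              # only facet counts are needed, not documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,   # number of top terms to return
        "facet.mincount": 1,    # skip terms absent from the selected set
    }


# "category:books" is a placeholder selection query, not from the question.
params = facet_params("category:books", "my_field")
query_string = urlencode(params)
```

Setting the limit to 1 would return only the single most frequent term, similar to numTerms=1 in the Luke request.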

Different indexing and search strategies on same field without doubling index size?

For a phrase search, we want to bring up results only if there's an exact match (without ignoring stopwords). If it's a non-phrase search, we are fine displaying results even if the root form of the word matches etc.
We currently pass our data through StandardTokenizer, StopFilter, PorterStemFilter and LowerCaseFilter. Because of this, when a user searches for the phrase "password management", the search also brings up results containing "password manager".
If I remove StemFilter, then I will not be able to match for the root form of the word for non-phrase queries. I was thinking if I should index the same data as part of two fields in document.
For the first field (to be used for phrase searches), following tokenizers/filters will be used:
StandardTokenizer, LowerCaseFilter
For the second field (Non-phrase searches)
StandardTokenizer, StopFilter, PorterStemFilter, LowerCaseFilter
Now, based on whether it's a phrase search or not, I need to rewrite user's query to search in the appropriate field.
Is this the right way to address this issue? Is there any other way to achieve this without doubling index size?
Let's say the user's query is
summary:"Furthermore, we should also fix this"
Internally this will be translated to
summary_field1:"Furthermore, we should also fix this"
If user's query is
summary:(Furthermore, we should also fix this)
Internally this will be translated to
+summary_field2:furthermor +summary_field2:we +summary_field2:should +summary_field2:also +summary_field2:fix
Both summary_field1 and summary_field2 index the same data. summary_field1 passes through only StandardTokenizer and LowerCaseFilter, whereas summary_field2 passes through StandardTokenizer, StopFilter, PorterStemFilter and LowerCaseFilter.
Please let me know if I'm missing something here.
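
The rewrite step described above could be sketched roughly like this in Python. `route_query` is a hypothetical helper; it only chooses the field, and stemming and stopword removal are left to the field's own analyzer in Solr.

```python
def route_query(field_prefix, user_input):
    """Send phrase queries to the unstemmed field and everything else to the stemmed one."""
    text = user_input.strip()
    is_phrase = len(text) >= 2 and text.startswith('"') and text.endswith('"')
    if is_phrase:
        # Quoted input: exact-match field (StandardTokenizer, LowerCaseFilter).
        return f"{field_prefix}_field1:{text}"
    # Unquoted input: stemmed field, grouped so every word targets that field.
    return f"{field_prefix}_field2:({text})"


phrase = route_query("summary", '"Furthermore, we should also fix this"')
loose = route_query("summary", "Furthermore, we should also fix this")
```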

Yes, defining two different fields lets you search for exact matches.
Using boosts, you can also get both kinds of results from a single query, with exact matches ranked higher. For example:
(firstField:"password management")^5 OR (secondField:"password management")^1
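
Building that combined query in code might look like the following minimal Python sketch. `boosted_query` is a hypothetical helper, and the boost values 5 and 1 are simply the ones from the example above, not tuned recommendations.

```python
def boosted_query(text, exact_field, stemmed_field, exact_boost=5, stemmed_boost=1):
    """Search both fields with the same phrase, boosting the exact-match field."""
    # Escape embedded double quotes so the phrase stays a single quoted clause.
    phrase = '"' + text.replace('"', '\\"') + '"'
    return (
        f"({exact_field}:{phrase})^{exact_boost}"
        f" OR ({stemmed_field}:{phrase})^{stemmed_boost}"
    )


# Field names firstField and secondField are taken from the example above.
q = boosted_query("password management", "firstField", "secondField")
```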
