When I pass queries to Solr I pass them as strings (“blah blah”). I do this because I have encoding problems with Greek (my input field accepts Greek characters only as a string). But Solr treats the characters inside the quotes as an “exact match” term. Is there a way to make Solr ignore the double quotes?
Thanks
If you use solr.StrField in your schema, it makes sense that you get exact matches, see:
http://azeckoski.blogspot.com/2009/06/tricky-solr-schema-issue-with-strfield.html
You should really use solr.TextField, which would allow you to use Greek analyzers. I don't quite understand why your input field accepts Greek characters only as strings. Can you explain?
About Greek lower case and stemming:
http://wiki.apache.org/solr/LanguageAnalysis#Greek
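For illustration, a minimal Greek-aware field type might look like the sketch below (the type name text_el is made up, and the exact filter list should be checked against the wiki page above for your Solr version):

<fieldType name="text_el" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on standard word boundaries -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Greek-aware lowercasing (handles final sigma and diacritics) -->
    <filter class="solr.GreekLowerCaseFilterFactory"/>
    <!-- Greek stemming; note the caveat below about exact matches -->
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
</fieldType>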
On the other hand, please note that if you use stemming, you won't be able to do exact matches anymore...
Related
We have used StandardTokenizerFactory in Solr, but we face an issue when we search without the special character.
For example, we search for "What's the Score?", whose content has a special character (the apostrophe). When we search with "Whats the Score" instead, we don't get the proper result. That means searching the title both with and without the special character should work.
Please suggest which filter we need to use to satisfy both conditions.
If you have a recent version of Solr, try adding solr.WordDelimiterGraphFilterFactory with catenateWords=1 to your analyzer chain.
Starting from What's, this should create three tokens: What, s and Whats.
I'm not sure whether ' is in the list of characters the filter uses to concatenate words; in any case, you can add it using the types="characters.txt" parameter.
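A sketch of such an index-time chain, assuming a recent Solr (the field type name is illustrative; WhitespaceTokenizerFactory is used so the apostrophe reaches the filter intact):

<fieldType name="text_wdgf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts yields What and s; catenateWords adds Whats -->
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" catenateWords="1"/>
    <!-- graph filters must be flattened at index time -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>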
I'm a newbie in Solr and I would like to search with special characters.
for example
id:123
data:it's
q=it'
then it should return the result data:it's
thanks
Donquixote
With the special character ' you have used, the query q=data:it'* will give you the result.
But there are some special characters like ~^*(){}[]:\" and white space. For those you have to use the escape character \ in the query. Other special characters available on the keyboard can be searched as-is.
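For example (the field name data comes from the question above; the other values are made up):

q=data:it'*        the apostrophe needs no escaping
q=data:foo\:bar    the colon must be escaped with a backslash
q=data:a\ b        white space is escaped the same way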
I am using Solr 4.1. Using LukeRequest, I want to get the number of documents with data for a specific field. The name of the field is something like http://foo.org/bar/ baz (note the space between bar/ and baz). When I visit http://127.0.0.1:8983/root/admin/luke I get a list of all of my fields, including the aforementioned one. When I visit
http://127.0.0.1:8983/root/admin/luke?fl=http://foo.org/bar/ baz
I get no hits. I have tried URL-encoding the string, escaping slashes, escaping the colon, escaping the space, using + instead of the space, and every combination of backslashes I can think of. The solution posted in another StackOverflow question (field listing in solr with "fl" parameter for a field having space in between) didn't work for me.
I am really only looking for a yes-no answer to whether any documents have a value for this particular field, so if there is a better way to do this than LukeRequest, I'm all ears for that too.
AFAIK, escaping special characters using a backslash works for values, not for parameters like fl or sort.
This answer on the Lucene mailing list also confirms my thoughts. I guess you shouldn't have spaces in field names.
I believe you could accomplish the same thing using the TermsComponent, as it can tell you whether there are any terms associated with a field in the index. However, you will need to specify the field name in the query, so you will run into a similar issue. As Srikanth answered, you are better off not using spaces or special characters in field names.
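For a field without spaces in its name, that check is a one-liner (this assumes the stock /terms request handler from the example solrconfig.xml; the core name root comes from the question and the field name myfield is a placeholder):

http://127.0.0.1:8983/root/terms?terms.fl=myfield&terms.limit=1

If the response lists at least one term, some document has a value for that field.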
I want to use the Solr KeepWordFilterFactory but I can't find the appropriate tokenizer for it. The use case is: I have a string, say hi i am coming, bla-bla go out. From this string I want to keep words like hi i, coming, bla-bla, etc. So which tokenizer should I use with the filter factory so that I can get any such combination in facets? I have tried different tokenizers but do not get the exact result. I am using Solr 4.0. Is there any tokenizer that tokenizes based on the keepwords used?
What are your 'rules' for tokenization (splitting long text into individual tokens)? The example above seems to imply that sometimes you have single-word tokens and sometimes a multi-word one ("hi i"). The multi-word case is problematic here, but you might be able to handle it by combining ShingleFilterFactory, which gives you multi-word tokens as well as the original ones, with a filter that keeps only the items you want.
I am not sure whether the KeepWord filter deals correctly with multi-word strings. If it does not, you may want to use a special separator character during the shingle process and then regex-filter it back to a space as the last step.
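A rough sketch of that chain, on the assumption that KeepWordFilterFactory matches the whole shingle token (the type name, shingle size, and keepwords.txt are placeholders):

<fieldType name="text_keep" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit two-word shingles like "hi i" in addition to the single tokens -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <!-- keepwords.txt lists the tokens/shingles to keep, one per line -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>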
I'm trying to find documents containing asterisks/question marks in a Solr text field using the Edismax parser. Everything works perfectly when I search for ordinary text (fq={!edismax}textfield:*sometext*) or even for any other special Lucene character using escaping (fq={!edismax}textfield:*\~*).
However, when searching for * (fq={!edismax}textfield:*\**) or ? (fq={!edismax}textfield:*\?*) these characters do not seem to be escaped, since all documents are returned. I also tried URL-encoding the escaped characters (like \%2A instead of \*), but the result is the same.
The problem appears to concern leading wildcards only, since fq={!edismax}textfield:\** and fq={!edismax}textfield:\?* return correct results, but fq={!edismax}textfield:*\* and fq={!edismax}textfield:*\? do not (nor does fq={!edismax}textfield:*sometext\* etc.).
How is it possible to search for * or ? using Edismax with a leading asterisk wildcard?
Quoting the asterisk works for me. This query finds two books in my index with a standalone asterisk in the title:
title:"*"
Here is the title of one of them: "Be * Know * Do, Adapted from the Official Army Leadership Manual".
I'm using edismax with Solr 3.3.
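Applied to the syntax from the question, that would be something like the following (textfield is the field name from the question; whether a standalone asterisk survives indexing as a token depends on your analyzer chain):

fq={!edismax}textfield:"*"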