Escape LukeRequest with spaces, slashes, and colon - solr

I am using Solr 4.1. Using LukeRequest, I want to get the number of documents with data for a specific field. The name of the field is something like http://foo.org/bar/ baz (note the space between bar/ and baz). When I visit http://127.0.0.1:8983/root/admin/luke I get a list of all of my fields, including the aforementioned one. When I visit
http://127.0.0.1:8983/root/admin/luke?fl=http://foo.org/bar/ baz
I get no hits. I have tried url-encoding the string, escaping slashes, escaping the colon, escaping the space, using + instead of space, and every possible combination of backslashes I can think of. The solution posted in another StackOverflow question, "field listing in solr with "fl" parameter for a field having space in between", didn't work for me.
I am really only looking for a yes-no answer to whether any documents have a value for this particular field, so if there is a better way to do this than LukeRequest, I'm all ears for that too.

AFAIK, escaping special characters using a backslash works for values, not for parameters like fl or sort.
This answer on the Lucene mailing list also confirms my thoughts. I guess you shouldn't have spaces in field names.

I believe you could accomplish the same thing using the TermsComponent as it can tell you if there are any terms associated with a field in the index. However, you will need to specify the field name in the query, so you will run into a similar issue. As Srikanth answered, you are better off not using spaces or special characters in field names.
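As a rough sketch of the TermsComponent idea, the request URL could be built like this. The `/terms` handler path, the `build_terms_url` helper, and the choice of `terms.limit=1` are assumptions for illustration. The snippet only shows how the field name would be percent-encoded on the client side; it does not show that Solr accepts such a field name once decoded.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_terms_url(base_url, field_name):
    """Build a TermsComponent request URL (hypothetical helper).

    Assumes a /terms request handler is configured on the core.
    """
    params = {
        "terms": "true",
        "terms.fl": field_name,   # field whose indexed terms we ask for
        "terms.limit": 1,         # one term is enough for a yes/no answer
        "wt": "json",
    }
    # urlencode percent-encodes ':' and '/' and turns the space into '+'
    return base_url.rstrip("/") + "/terms?" + urlencode(params)

url = build_terms_url("http://127.0.0.1:8983/root", "http://foo.org/bar/ baz")
print(url)
```

If the response lists at least one term for the field, some document has a value for it.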

Related

What Solr field type provides basic wildcard searches?

I posted a document with the field value "Pineapple upside down cake." I want to get hits for pineapple, pine*, *side, pi?????le, upside down, etc. I chose text_en which does not find *side nor pi?????le.
What out of the box field type will give me hits for all the above?
I'm using Solr 7.6.
If you want to retain all the tokens as is (as I commented on your previous question about this, the text_en type contains a stemmer), use a field type with just a WhitespaceTokenizer and a LowercaseFilter. You'll have to define this field yourself.
I'm guessing you can use text_general to get a decent enough answer (it uses the StandardTokenizer, so it'll split on a few more cases than just whitespace).
The reason is that wildcard searches happen without most processing taking place (as it's impossible to do proper handling of stemming, splitting, etc. when you don't have the complete token), so any wildcard search will be against the generated list of tokens after processing.
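To illustrate why keeping the tokens as-is matters for wildcards, here is a small Python imitation of a whitespace tokenizer plus lowercase filter, with the wildcard patterns matched against the resulting tokens. This is only a sketch: `analyze` and `wildcard_hits` are made-up helper names, and Python's `fnmatch` stands in for Solr's wildcard matching.

```python
from fnmatch import fnmatchcase

def analyze(text):
    """Split on whitespace and lowercase each token (simplified analysis chain)."""
    return [token.lower() for token in text.split()]

def wildcard_hits(pattern, tokens):
    """Return the tokens that match a wildcard pattern using * and ?."""
    return [token for token in tokens if fnmatchcase(token, pattern)]

tokens = analyze("Pineapple upside down cake.")
for pattern in ["pineapple", "pine*", "*side", "pi?????le"]:
    print(pattern, wildcard_hits(pattern, tokens))
```

Because no stemmer has altered the tokens, each pattern still finds the token it was written for. Note that in this sketch the last token keeps its trailing period, since only whitespace is used for splitting.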

Solr OR query on a text field

How to perform a simple query on a text field with an OR condition? Something like name:ABC OR name:XYZ so the resulting set would contain only those docs where name is exactly "XYZ" or "ABC"
Dug tons of manuals, cannot figure this out.
I use Solr 5.5.0
Update: Upgraded to Solr 6.6.0, still cannot figure it out. Below are illustrations to demonstrate my issue:
This works:
This works too:
This still works:
But this does not! Omg why!?
There are many ways to perform an OR query. I have listed some of them below; you can use any of them.
[Simple Query]
q=name:(XYZ OR ABC)
[Lucene Query Parser]
q={!lucene q.op=OR df=name v="XYZ ABC"}
Your syntax is right, but what you're asking for isn't what text fields are made for. A text field is tokenized (split into multiple tokens), and each token is searched by itself. So if the text inserted is "ABC DEF GHI", it will be split into three separate tokens, namely "ABC", "DEF" and "GHI". So when you're searching field:ABC, you're really asking for any document that has the token "ABC" somewhere.
Since you want to perform an exact match, you want to query against a field that is defined as a string field, as this will keep the value verbatim (including casing, so the matching will be case sensitive). You can tell Solr to index the same content into multiple fields by adding a copyField instruction, telling it to take the content submitted for field foo and also copy it into field bar, allowing you to perform both an exact match if needed and a more general search if necessary.
If you need to perform exact, but case insensitive, searches, you can use a KeywordTokenizer - the KeywordTokenizer does nothing, keeping the whole string as a single token, before allowing you to add filters to the analysis chain. By adding a LowercaseFilter you tell Solr to lowercase the string as well before storing it (or querying for it).
You can use the "Analysis" page under the Solr admin page to experiment and see how content for your field is being processed for each step.
After that, querying as string_field:ABC OR string_field:XYZ should do what you want (or string_field:(ABC OR XYZ), or a few other ways to express the same thing).
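The difference between the three kinds of field can be imitated in a few lines of Python. This is a simplified model, not Solr's implementation: the helper names are invented, the text field is reduced to whitespace splitting with no lowercasing, and the sample documents are made up.

```python
def text_tokens(value):
    # Tokenized text field, simplified here to whitespace splitting.
    return set(value.split())

def string_tokens(value):
    # String field: the whole value is kept verbatim as a single term.
    return {value}

def keyword_lowercase_tokens(value):
    # KeywordTokenizer followed by a LowercaseFilter: one lowercased term.
    return {value.lower()}

def or_query(docs, analyze, terms):
    """Return the docs whose indexed terms contain at least one query term."""
    return [doc for doc in docs if analyze(doc) & set(terms)]

docs = ["ABC", "XYZ", "ABC DEF GHI", "abc"]

print(or_query(docs, text_tokens, ["ABC", "XYZ"]))
print(or_query(docs, string_tokens, ["ABC", "XYZ"]))
# The query terms go through the same lowercasing as the indexed values.
print(or_query(docs, keyword_lowercase_tokens, ["abc", "xyz"]))
```

In this model the tokenized field also returns "ABC DEF GHI", the string field returns only the exact values, and the keyword-plus-lowercase field returns the exact values regardless of case.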
A wacky workaround I've just come up with:

How to search word with and without special characters in Solr

We use StandardTokenizerFactory in Solr, but we run into a problem when searching without special characters.
For example, the indexed title is "What’s the Score?", which contains special characters. When we search for "Whats the Score" we don't get the expected result. Searching for a title should work both with and without the special characters.
Please suggest which filter we need to use to satisfy both conditions.
If you have a recent version of Solr, try adding solr.WordDelimiterGraphFilterFactory with catenateWords=1 to your analyzer chain.
Starting from What's, this should create three tokens: What, s and Whats.
I'm not sure whether ' is in the list of characters the filter uses to split and concatenate words; in any case, you can add it using the parameter types="characters.txt".
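The token expansion described above can be imitated roughly in Python. This is not the filter's actual algorithm: `split_and_catenate` is a hypothetical helper that treats every non-alphanumeric ASCII character as a delimiter (an assumption) and ignores the filter's other rules, such as splitting on case changes.

```python
import re

def split_and_catenate(token):
    """Split a token on non-alphanumeric characters and, if it was split,
    also emit the parts joined together (rough imitation of catenateWords=1)."""
    parts = [part for part in re.split(r"[^0-9A-Za-z]+", token) if part]
    if len(parts) > 1:
        return parts + ["".join(parts)]
    return parts

print(split_and_catenate("What's"))   # straight apostrophe
print(split_and_catenate("What’s"))   # typographic apostrophe
print(split_and_catenate("Score?"))
```

With the catenated token in the index, a query for Whats has something to match, while the parts still serve queries that include the apostrophe.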

What fieldtype to choose and how to look my query

The problem is this: I've got a column (named name) which consists of names, for example "Иван Кирилов Петров", "Нина Семова Мариножа" and so on.
I want to make a query that returns all names with first name 'Иван' and last name 'Петров'. The second name doesn't matter, so I put a * wildcard character in its place.
There is also a bigger problem: if the user writes "Иван Кирилов Петров", I should be able to find this exact person.
What I have tried:
I made the field of type text_ws
and tested the following query:
q=name:Иван*Петров
Perfect: it finds what I want, all the names with first name Иван and last name Петров.
But when I then search for Иван Кирилов Петров I get no results, because that is an exact search and for that my type should be string.
How can I solve this?
Try adding the autoGeneratePhraseQueries="true" flag to your text_ws type definition, and use the debugQuery=true flag to see how it matches against the field. If the basic thing works, you can then look at the pf3 parameter in the eDismax configuration to boost the query matches.
Solr also comes with dedicated Token Filters for Russian, but you probably don't care about that for the people's names.
I don't think you need a wild-card query. If you are only splitting on white-space during index time (text_ws) and you get complete first, last and/or middle names for query, you can do an AND query like
q=name:(Иван AND Петров)
or
q=name:(ИВАН AND МИНЧЕВ AND ПЕТРОВ)
Update: After your comment, I see that this will do a bag-of-words search and won't preserve the order. I guess you need to keep a string copy field of name, say name_str, which will give you more search options. For example, if there are 2 spaces in the query, meaning you get the first, middle and last names, then you can do an exact match on name_str like
q=name_str:"ИВАН%20МИНЧЕВ%20ПЕТРОВ"
If you are using Solr 4.0 and above, then regex query on the string field can help you. You can do
q=name_str:/ИВАН.*ПЕТРОВ/
This will match any value that begins with ИВАН and ends with ПЕТРОВ.
or even
q=name_str:/Иван.*?Кирилов.*?Петров/
Unfortunately, there is no Solr wiki page on regex search yet, but you can google around.
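A regex query on a string field is matched against the whole stored value, which Python's `re.fullmatch` can imitate. This is only an illustration under assumptions: Lucene's regex syntax differs from Python's in places, `regex_hits` is a made-up helper, and the third sample name is invented to show a non-match.

```python
import re

def regex_hits(pattern, values):
    """Return the values that the pattern matches in full (anchored at both ends)."""
    compiled = re.compile(pattern)
    return [value for value in values if compiled.fullmatch(value)]

names = [
    "Иван Кирилов Петров",
    "Нина Семова Мариножа",
    "Иван Петров Кирилов",   # invented sample: same words, different order
]

print(regex_hits("Иван.*Петров", names))
print(regex_hits("Иван.*?Кирилов.*?Петров", names))
```

Unlike the AND query on the tokenized field, the word order matters here, so the reordered name is not returned.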
You need to distinguish between the different types of queries you want to do and do different searches. Maybe give a check-box to your users asking if they want an exact match or not.

Escaped asterisk/question mark are not escaped when using leading wildcard in Edismax

I'm trying to find documents containing asterisks/question marks in a Solr text field using the Edismax parser. Everything works perfectly when I search for ordinary text (fq={!edismax}textfield:*sometext*) or even for any other special Lucene character using escaping (fq={!edismax}textfield:*\~*).
However, when searching for * (fq={!edismax}textfield:*\**) or ? (fq={!edismax}textfield:*\?*), these characters do not seem to be escaped, since all documents are returned. I also tried URL encoding the escaped characters (like \%2A instead of \*), but the result is the same.
The problem appears to concern leading wildcards only, since fq={!edismax}textfield:\** and fq={!edismax}textfield:\?* return correct results, but fq={!edismax}textfield:*\* and fq={!edismax}textfield:*\? do not (nor does fq={!edismax}textfield:*sometext\* etc.).
How is it possible to search for */? using Edismax with a leading asterisk wildcard?
Quoting the asterisk works for me. This query finds two books in my index with a standalone asterisk in the title:
title:"*"
Here is the title of one of them: "Be * Know * Do, Adapted from the Official Army Leadership Manual".
I'm using edismax with Solr 3.3.
