Weird Solr behavior with AND operator

I'm performing 2 queries:
name:"\#"
name:"\#" AND userId:101
The first query returns nothing, which is expected. But the second query, for some reason, returns all records belonging to the user.
What is wrong?

When you look at a query and don't know what is going on, your best bet is the debugQuery option (debugQuery=true). It shows you the query string as it was entered as well as what it was parsed into.
There could be a lot of things going wrong. "#" could be one of your stop words. You could also try sending %23, since that is the URL encoding for the # sign.
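
As a rough illustration of both suggestions, here is a minimal Python sketch that percent-encodes the query and switches on debugQuery. The host and the core name "mycore" are assumptions made for the example, not something taken from the question.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to your installation.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def debug_url(query):
    """Return a select URL with the query percent-encoded and debugQuery on."""
    params = {"q": query, "debugQuery": "true"}
    return SOLR_SELECT + "?" + urlencode(params)


# The second query from the question; "#" is sent as %23.
url = debug_url('name:"\\#" AND userId:101')
print(url)
```

Opening the printed URL in a browser should include a debug section in the response, where the parsed form of the query can be compared with what was typed.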

Related

Solr query wildcard problem, mismatch in results number vs real document count

Ok, so the problem is that I get some crazy results using solr query from the admin console.
I try to search for some documents which have an alfresco property with a specific name.
The field name is "edm:uid"
So if I try to pass to "q" the following:
edm:uid:FULL_NAME_OF_THE_DOCUMENT
everything works perfectly.
But if I try to use wildcards, everything breaks.
If I query, for example, "edm:uid:DOC_01_20190202*", I get, let's say, 5000 results, which might be right. If I query "edm:uid:DOC_01*", I get around 1000 results, which I find crazy: since I am removing characters from the pattern, the number of results should increase. If I query "edm:uid:DOC*", I still get around 1000 results, when I should have millions.
I really don't know how Solr works. Does anyone know why this happens?
I tried several variations too, and they don't change the results. Variations like:
edm:uid:"DOC*"
edm\:uid:DOC*
edm\:uid:"DOC*"
So I tried putting quotes around the value, escaping the ":", or both, and none of it changed anything.
Also, I found the schema with the fields, and that "edm:uid" is indexed and tokenized.
I also ticked the "debugQuery" option, but I don't understand anything there, just some scores.
Thanks in advance for any suggestions.

AND in solr query returns more results

First of all, I'm not very experienced in using Solr, so I hope this isn't a stupid question ..
I am experiencing some unexpected behavior with a Solr query. Suppose the query is q="Foo:"Bar" . Now make it q="Foo:"Bar" AND() and we get more results back, which seem random and certainly do not meet the condition "Foo" = "Bar".
Am I missing something here? It doesn't seem logical that an extra condition would return more results instead of less.

Your example queries are not valid Solr queries - if you want to query the field "Foo" for the value "Bar", do Foo:Bar. The AND clause is used between several terms to combine the result for all the terms, i.e. Foo:Bar AND Spam:Eggs.
Your example probably just got parsed to be either Foo:Bar or the value AND somewhere in the default search field.

What means "*" after "#" in solr query

I have the query: http://localhost:8983/solr/user/select?q=username%3Auser%40site*&wt=json&indent=true for username:user#site*, but the number of results is 0. If I remove the *, the number of results is correct. And the queries username:user and username:user* return the same results.
What's happening when I use * after # in queries?

The only thing happening is that * makes the query a wildcard query. When the query contains a wildcard, analysis is skipped (not entirely, as certain multi-term-aware filters are still applied). So if your field is split on #, for example, into the tokens user and site when indexing, that same split does not happen when you're querying.
So Lucene tries to find a token that starts with user#site, but since the only tokens available in the index are user and site, neither of them matches. If you tried searching for user*, you'd probably get a hit.
You can use the Analysis page under Solr's Admin to see how a field is being processed for your input.
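
The mismatch described above can be mimicked with a toy model in Python. This is only a sketch: the tokenizer below simply splits on non-alphanumeric characters, which is an assumption standing in for whatever analysis chain the field really uses.

```python
import re


def tokenize(value):
    """Toy index-time analysis: split on non-alphanumerics and lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]


def matches_prefix(tokens, prefix):
    """Toy wildcard query: the prefix is compared as-is, without tokenizing."""
    return any(token.startswith(prefix) for token in tokens)


indexed_tokens = tokenize("user#site")  # what ends up in the index

print(indexed_tokens)
print(matches_prefix(indexed_tokens, "user#site"))  # no token starts with this
print(matches_prefix(indexed_tokens, "user"))       # the token "user" matches
```

The stored tokens never contain the separator, so a prefix that still contains it has nothing to match, while a prefix of a single token does.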

Solr OR query on a text field

How to perform a simple query on a text field with an OR condition? Something like name:ABC OR name:XYZ so the resulting set would contain only those docs where name is exactly "XYZ" or "ABC"
Dug tons of manuals, cannot figure this out.
I use Solr 5.5.0
Update: Upgraded to Solr 6.6.0, still cannot figure it out. Below are illustrations to demonstrate my issue:
This works:
This works too:
This still works:
But this does not! Omg why!?

There are many ways to perform an OR query. I have listed some of them below; you can pick any of them.
[Simple Query]
q=name:(XYZ OR ABC)
[Lucene Query Parser]
q={!lucene q.op=OR df=name v="XYZ ABC"}

Your syntax is right, but what you're asking for isn't what text fields are made for. A text field is tokenized (split into multiple tokens), and each token is searched by itself. So if the text inserted is "ABC DEF GHI", it will be split into three separate tokens, namely "ABC", "DEF" and "GHI". So when you're searching field:ABC, you're really asking for any document that has the token "ABC" somewhere.
Since you want to perform an exact match, you want to query against a field that is defined as a string field, as this will keep the value verbatim (including casing, so the matching will be case sensitive). You can tell Solr to index the same content into multiple fields by adding a copyField instruction, telling it to take the content submitted for field foo and also copy it into field bar, allowing you to perform both an exact match if needed and a more general search if necessary.
If you need to perform exact, but case insensitive, searches, you can use a KeywordTokenizer. The KeywordTokenizer does no splitting and keeps the whole string as a single token, and you can then add filters to the analysis chain. By adding a LowerCaseFilter you tell Solr to lowercase the string as well before storing it (or querying for it).
You can use the "Analysis" page under the Solr admin page to experiment and see how content for your field is being processed for each step.
After that, querying as string_field:ABC OR string_field:XYZ should do what you want (or string_field:(ABC OR XYZ), or a few other ways to express the same thing).
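
If the query is assembled in client code, the grouped form can be built with a small helper. This is a minimal Python sketch; the field name name_exact is a hypothetical string field (for example, one filled by a copyField from name), and the values are wrapped in double quotes so that spaces inside a value do not split it into separate clauses.

```python
def quote_value(value):
    """Wrap a value in double quotes, escaping backslashes and quotes inside it."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def or_query(field, values):
    """Build field:("v1" OR "v2" ...) for an exact-match string field."""
    return field + ":(" + " OR ".join(quote_value(v) for v in values) + ")"


# "name_exact" is an assumed field name, used only for illustration.
query = or_query("name_exact", ["ABC", "XYZ"])
print(query)
```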

A wacky workaround I've just come up with:

Escape colon character in Solr wildcard query

I'm trying to query a text_general field named body for times like 9:15, 9:15pm, 9:15p, etc. I tried both of the following queries via the REST API without success:
q=body:9\:15* gives me no hits, missing docs that mention 9:15
q=body:"9:15"* gives me all docs, including docs that have nothing resembling 9:15
Debugging in Chrome, I enter these directly in the browser. I've also tried encodeURIComponent on the values to make sure the content isn't lost in HTTP translation. Same outcome either way.
I'm guessing there's a simple answer here and my mental model of how Solr queries work is just broken.

In cases like that I often do 2 things:
Turn Solr query debug on, so I can see what really goes into the query. You will see an extra node at the end of the response.
&debug=query
Examine field analyser with Analysis tool. (url based on Solr's example core)
http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=9%3A30pm&analysis.query=9%3A30&analysis.fieldtype=text_general&verbose_output=0
Both methods should tell you exactly what is going wrong with your query. With the second one you can check how matching works without reindexing anything.

Your time string gets tokenized following the Unicode standard annex UAX#29.
So the colon should be stripped out.
I think if you check you will see that your results should contain either 9 or 15.
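
For reference, the escaped form from the question (body:9\:15*) can be produced programmatically. The Python sketch below backslash-escapes a set of characters commonly treated as special by the Lucene query syntax; the exact set is an assumption and may need adjusting for your parser. Escaping only gets the colon past the query parser; it does not change how the field was tokenized at index time.

```python
# Characters escaped here are an assumed set of query-syntax specials.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_term(term):
    """Prefix each special character in the term with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


# The wildcard is appended after escaping so it keeps its special meaning.
query = "body:" + escape_term("9:15") + "*"
print(query)
```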
