Solr weird search behaviour - solr

I have many Solr documents indexed with the field
uri = nntp://msnews.microsoft.com/microsoft.public.windows.server.sbs
but when I search with the query
uri:nntp\://msnews.microsoft.com/microsoft.public.windows.server.sbs
it returns zero results. The same kind of query works for other, similar URIs (for example nntp://msnews.microsoft.com/microsoft.public.windows.windowsxp.general).
What am I missing here?

If your search request looks like
/select?q=uri%3Anntp*&rows=0
you can still see how many documents have a value in that field beginning with nntp without returning any rows: the numFound attribute of the result tag tells you.
If it is zero, check your log file. It is quite likely that you are adding documents without committing. I would use the command-line scripts to force a commit and refresh the readers:
sync
bin/commit
sync
bin/readercycle
Then I would issue that search again and see if you can see your data again.
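If the data is there and the query still finds nothing, the escaping of the URI is worth checking. Here is a minimal sketch of escaping the value before searching. It assumes the uri field is an untokenized string type and a Solr version (4.0 or later) where the forward slash is also query syntax. The helper name escape_solr_term and the character list are illustrative, not part of Solr:

```python
from urllib.parse import urlencode

# Characters the Lucene/Solr standard query parser treats as syntax.
# Illustrative list (an assumption); check the docs for your Solr version.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_solr_term(term: str) -> str:
    """Backslash-escape every query-syntax character in a literal term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


uri = "nntp://msnews.microsoft.com/microsoft.public.windows.server.sbs"
escaped = escape_solr_term(uri)

# urlencode percent-encodes the backslashes and colons for the HTTP request.
query_string = urlencode({"q": "uri:" + escaped, "rows": 0})
print(escaped)
print("/select?" + query_string)
```

Note that the query in the question escapes only the colon; on versions that treat / as the start of a regular expression, the slashes would need escaping (or the value quoting) as well.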

Related

Solr query wildcard problem, mismatch in results number vs real document count

OK, so the problem is that I get some crazy results when querying Solr from the admin console.
I am trying to search for documents that have an Alfresco property with a specific name.
The field name is "edm:uid".
If I pass the following to "q":
edm:uid:FULL_NAME_OF_THE_DOCUMENT
everything works perfectly.
But if I use wildcards, everything breaks.
If I query, for example, "edm:uid:DOC_01_20190202*", I get, let's say, 5000 results, which might be right. If I query "edm:uid:DOC_01*", I get around 1000 results, which makes no sense: since I removed characters from the pattern, the number of matches should go up. If I query "edm:uid:DOC*" I still get around 1000 results, when I should have millions.
I really don't know how Solr works. Does anyone know why this happens?
I also tried several variants, which did not change the results:
edm:uid:"DOC*"
edm\:uid:DOC*
edm\:uid:"DOC*"
So quoting the value, escaping the ":", or doing both did not change anything.
I also found the schema, and the "edm:uid" field is indexed and tokenized.
I also ticked the "debugQuery" option, but I don't understand the output; it just shows some scores.
Thanks in advance for any suggestions.

/select with 'q' parameter does not work

Whenever I query with q=*:* it returns all the documents, but when I query with q=programmer, 0 documents are found (contents is the default search field).
My schema has the fields id (unique), author, title, and contents.
Queries also work fine for:
q=author:"Value" or q=title:"my book", etc.; only the contents field gives no results.
Also, when I query the spell checker (/spell?q=programmer), the output shows spelling suggestions for this word, even though 'programmer' is spelled correctly and is present in many documents.
I followed the example docs for the configuration.
This started happening all of a sudden; initially it worked fine.
I guess there is a problem with the contents field only, but I cannot figure it out.
Is it because the index was not built properly for the contents field?
(I am using Solr 4.2 on Windows 7 with Tomcat as the web server.)
Please help. Thanks a lot in advance.
Are you sure you set the default search field? The problem might be that you didn't set the <defaultSearchField> element in your schema.xml file. That would explain why "q=author:value" works while q=WHATEVER doesn't.
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used.
But also consider this:
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used. It is preferable to not use or
rely on this setting; instead the request handler or query LocalParams
for a search should specify the default field(s) to search on. This
setting here can be omitted and it is being considered for
deprecation.
Do you have any data in your instance? Try q=*:* and see what it returns. "for" is a stop word, so maybe it was filtered out. Try a different value to test.
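Following the quoted advice to name the default field per request rather than rely on schema.xml, here is a minimal sketch of passing it with the df request parameter. It assumes a Solr version that supports df (it is available in Solr 4.x); the helper name build_select_query is hypothetical:

```python
from urllib.parse import urlencode


def build_select_query(q: str, default_field: str) -> str:
    """Return a /select query string that names the default field per request."""
    # "df" tells the standard query parser which field to search when the
    # query text carries no explicit field prefix.
    return "/select?" + urlencode({"q": q, "df": default_field})


print(build_select_query("programmer", "contents"))
```

If this request returns documents while the plain q=programmer request does not, the default field configuration is the likely culprit.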

Escape colon character in Solr wildcard query

I'm trying to query a text_general field named body for times like 9:15, 9:15pm, 9:15p, etc. I tried both of the following queries via the REST API without success:
q=body:9\:15* gives me no hits, missing docs that mention 9:15
q=body:"9:15"* gives me all docs, including docs that have nothing resembling 9:15
Debugging in Chrome, I enter these directly in the browser. I've also tried encodeURIComponent on the values to make sure the content isn't lost in HTTP translation. Same outcome either way.
I'm guessing there's a simple answer here and my mental model of how Solr queries work is just broken.
In cases like this I usually do two things:
Turn Solr query debugging on, so I can see what really goes into the query. You will see an extra node at the end of the response.
&debug=query
Examine the field analyzer with the Analysis tool (URL based on Solr's example core).
http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=9%3A30pm&analysis.query=9%3A30&analysis.fieldtype=text_general&verbose_output=0
Both methods should tell you exactly what is going wrong with your query. With the second one you can check how matching works without reindexing anything.
Your time string gets tokenized following Unicode Standard Annex #29 (UAX #29).
So the colon is stripped out.
I think if you check, you will see that your results contain either 9 or 15.
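To make the effect concrete, here is a rough sketch of why the prefix query finds nothing. The rough_tokens function is only a crude stand-in for the text_general analyzer (it lowercases and splits on anything that is not a letter or digit); it is not the UAX #29 algorithm, and the Analysis tool remains the authority on what is actually indexed:

```python
import re
from typing import List


def rough_tokens(text: str) -> List[str]:
    """Crude stand-in for the analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[0-9a-z]+", text.lower())


indexed = rough_tokens("9:15pm meeting")
print(indexed)

# Wildcard terms are typically not analyzed, so the prefix keeps its colon
# and is compared against indexed tokens that no longer contain one.
prefix = "9:15"
matches = [token for token in indexed if token.startswith(prefix)]
print(matches)
```

Under this approximation no indexed token starts with 9:15, which is consistent with q=body:9\:15* returning no hits.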

Lucene OR query not working

I am trying to query Solr with the following requirements:
I would like to get all documents which do not have a particular field
-exclusivity:[* TO *]
I would like to get all documents which have this field with a specific value
exclusivity:(None)
So when I query Solr 4 with:
fq=(-exclusivity:[* TO *]) OR exclusivity:(None)
I only get results where the field exists in the document and the value is None; the results from the first query are missing.
I cannot understand why it is not working.
To explain your results, the query (-exclusivity:[* TO *]) will always get no results, because you haven't specified any result to retrieve. By default, Lucene doesn't retrieve any results, unless you tell it to get them. exclusivity:(None) isn't a limitation placed on the full result set, it is the key used to find the documents to retrieve. This differs from a database, which by default returns all records in a table, and allows you to limit the set.
(-exclusivity:[* TO *]) only specifies what NOT to get, but doesn't tell it to GET anything at all.
Solr has logic to handle pure negative queries (I believe in much the same way as below, by implicitly retrieving all documents first), but from what I gather only as the top-level query; it does not handle queries like term1 OR -term2, as documented here.
I believe with Solr you should be able to use the query *:* to get all docs (though that would not be available in raw Lucene), so you could use the query:
(*:* -exclusivity:[* TO *]) exclusivity:(None)
which would mean: get (all docs except those with a value in exclusivity) or docs where exclusivity = "None".
I have found the answer to this problem. I made a bad assumption about how "-" works in Solr. I thought that
-exclusivity:[* TO *]
adds everything without the exclusivity field to the result set, but that is not the case. The "-" can only exclude things from the result set. BTW femtoRgon, you are right, but I am using it as an fq (filter query), not as the main query; I forgot to mention that.
So the solution is
-exclusivity:([* TO *] AND -(None))
and the full query looks like
/?q=*:*&fq=-exclusivity:([* TO *] AND -(None))
which means I get every document that either does not have the exclusivity field or has it populated with the value None.
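The behaviour discussed in this thread can be modelled with plain sets. This is a toy sketch, not how Lucene evaluates queries internally: the five documents are made up, and it assumes (as the answers above suggest) that a nested purely negative clause starts from an empty set while a top-level negative filter is evaluated against all documents:

```python
# Toy index (hypothetical data): document id -> stored fields.
docs = {
    1: {"exclusivity": "None"},
    2: {"exclusivity": "Gold"},
    3: {},
    4: {"exclusivity": "None"},
    5: {},
}

all_docs = set(docs)                                                   # *:*
has_field = {d for d, f in docs.items() if "exclusivity" in f}         # exclusivity:[* TO *]
is_none = {d for d, f in docs.items() if f.get("exclusivity") == "None"}  # exclusivity:(None)

# A nested, purely negative clause has nothing to subtract from.
nested_negative = set() - has_field

# fq=(-exclusivity:[* TO *]) OR exclusivity:(None)
original_fq = nested_negative | is_none

# (*:* -exclusivity:[* TO *]) exclusivity:(None)
with_match_all = (all_docs - has_field) | is_none

# fq=-exclusivity:([* TO *] AND -(None)), a top-level negative filter.
asker_fq = all_docs - (has_field - is_none)

print(sorted(original_fq))
print(sorted(with_match_all))
print(sorted(asker_fq))
```

In this model the original filter returns only the documents whose value is None, while the *:* form and the asker's final filter select the same set: documents without the field plus documents whose value is None.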

field listing in solr with "fl" parameter for a field having space in between

I have a field in my Solr schema named "Post Date" (without the quotes). When I run a query with the "fl" (field list) parameter to return only the Post Date of the search results, I get nothing in the docs of the response because the field name contains a space. I tried using + and %20, but I still get no results. Please help.
I have found a solution to this. After some experimenting, I ended up using \+ as the substitute for the white space in the query, so the query should be Post\+Date:[ranges]
I couldn't afford to change my schema, as many teams depend on it and we are upgrading our system to a new search engine.
You can specify fields whose names Solr considers unusual by wrapping them like this:
field(Post Date)
This also changes the field name in the returned results, so you'll get back something like:
"field(Post Date)" : "2010-01-01"
and not just the bare field name, as you might expect.
As a possible workaround, you might be able to use a wildcard. According to the Solr wiki (http://wiki.apache.org/solr/CommonQueryParameters#glob), you may be able to specify fl=Post*Date, which might get around your problem. I have not verified this, but it might work.
Update: This doesn't seem to work on either version of Solr I tried (1.4.0 and 3.6.1). It looks like this may have been discussed at http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams, but it does not appear to be implemented.
