Exact phrase search in Solr (no substring)

I am using plain Solr queries over HTTP from the rsolr Ruby gem, so it is practically the same as running a query from the Solr admin interface. I need to search for exact phrases like "SFO 10+" in Solr, but the query is treated as a substring search, and the "+" keeps some special meaning even when escaped with \\ as noted in the docs. The query must not match substrings, and the plus should be treated literally.
q: col0_t:"SFO 10+" returns 1 correct hit and 1 wrong hit
q: col0_t:"SFO 10\\+" same result
q: col0_t:"SFO 10" same result
The unexpected hit is:
"col0_t":"SFO 10-15"
It's a third-party database, so I would rather not modify anything in it unless strictly necessary, since that could damage the original system. We have built our own module on top of this system. I hope to solve this directly in the query; otherwise I must clean the data afterwards in the app.

Related

SOLR Solarium can we use filter-queries with dismax-queries?

I just built a search form backed by Solr; we are using the Solarium library to construct our requests.
We built a "huge" collection of filter queries like this one:
$query = $client->createQuery($client::QUERY_SELECT);
$query->setStart(0)->setRows(1000);
$query->addFilterQuery($query->createFilterQuery("foo")->setQuery("bar:true"));
$query->addFilterQuery($query->createFilterQuery("fo")->setQuery("ba:false"));
....
But we realized that the search only hits the individual fields we specify in the filter queries, whereas we actually need to query multiple fields. While reading the docs I realized our approach may have been wrong. Would the correct approach be to use DisMax queries (in combination with facets?)? I'm wondering: can we use DisMax in combination with filter queries to "expand" our search to multiple fields (with boosts), or do we have to rework everything?
I'm kind of missing the big picture needed to decide what the best working solution would be.
Help is much appreciated.
edit:
solr:
solr-spec 7.6.0
solarium:
solarium/solarium 6.0.1 PHP Solr client
You can specify a query parser as part of the fq argument:
fq={!dismax qf="firstfield secondfield^5"}this is my query
The syntax is known as Local Parameters. Since dismax (or edismax, which you should normally use now) doesn't have an identifier in front of it, it is implicitly parsed as the type:
If a local parameter value appears without a name, it is given the implicit name of "type". This allows short-form representation for the type of query parser to use when parsing a query string.
You'll have to make sure that Solarium doesn't escape the value you give to setQuery, but seeing as you're already giving a field:value combination, it doesn't seem to get escaped. Double check the Solr log to see exactly what query is being sent to Solr (or ask Solarium to give you the exact query string being sent if possible).
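As a rough illustration of what such a filter looks like on the wire, here is a minimal Python sketch using only the standard library. The field names come from the example above; the main query `*:*` and the helper name `build_params` are assumptions made for illustration, not part of Solarium or Solr.

```python
from urllib.parse import urlencode, parse_qs

def build_params(user_text, qf="firstfield secondfield^5"):
    """Hypothetical helper: URL-encode a request whose fq uses the dismax parser."""
    # The {!dismax ...} prefix (local parameters) selects the parser for this fq only.
    fq = '{!dismax qf="%s"}%s' % (qf, user_text)
    # q=*:* is an assumed main query that matches everything.
    return urlencode({"q": "*:*", "fq": fq})

encoded = build_params("this is my query")
# Decoding shows the value Solr receives after HTTP decoding.
print(parse_qs(encoded)["fq"][0])
```

Comparing the decoded value with what appears in the Solr log is one way to check that the client library did not escape the local-parameters prefix.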

Solr OR query on a text field

How do I perform a simple query on a text field with an OR condition? Something like name:ABC OR name:XYZ, so that the result set contains only those docs where name is exactly "XYZ" or "ABC".
I have dug through tons of manuals and cannot figure this out.
I use Solr 5.5.0
Update: Upgraded to Solr 6.6.0, still cannot figure it out. Below are illustrations to demonstrate my issue:
This works:
This works too:
This still works:
But this does not! Omg why!?
There are many ways to perform an OR query. I have listed some of them below; you can use any of them.
[Simple Query]
q=name:(XYZ OR ABC)
[Lucene Query Parser]
q={!lucene q.op=OR df=name v="XYZ ABC"}
Your syntax is right, but what you're asking for isn't what text fields are made for. A text field is tokenized (split into multiple tokens), and each token is searched by itself. So if the text inserted is "ABC DEF GHI", it will be split into three separate tokens, namely "ABC", "DEF" and "GHI". So when you're searching field:ABC, you're really asking for any document that has the token "ABC" somewhere.
Since you want to perform an exact match, you want to query against a field that is defined as a string field, as this will keep the value verbatim (including casing, so the matching will be case sensitive). You can tell Solr to index the same content into multiple fields by adding a copyField instruction, telling it to take the content submitted for field foo and also copy it into field bar, allowing you to perform both an exact match and a more general search as needed.
If you need to perform exact, but case insensitive, searches, you can use a KeywordTokenizer - the KeywordTokenizer does nothing, keeping the whole string as a single token, before allowing you to add filters to the analysis chain. By adding a LowerCaseFilter you tell Solr to lowercase the string as well before storing it (or querying for it).
You can use the "Analysis" page under the Solr admin page to experiment and see how content for your field is being processed for each step.
After that, querying as string_field:ABC OR string_field:XYZ should do what you want (or string_field:(ABC OR XYZ), or a few other ways to express the same thing).
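The difference between the three field setups described above can be imitated in a few lines of Python. This is only a simplified simulation under stated assumptions: whitespace splitting stands in for a real tokenizer, no lowercasing is applied to the simulated text field, and all function names are hypothetical. It is not how Solr implements analysis.

```python
def matches_text(stored, term):
    # Tokenized text field (simplified): any whitespace-separated token may match.
    return term in stored.split()

def matches_string(stored, term):
    # String field: the value is kept verbatim, so the match is exact and case sensitive.
    return stored == term

def matches_keyword_lowercase(stored, term):
    # KeywordTokenizer plus a lowercase filter: one token, compared case-insensitively.
    return stored.lower() == term.lower()

docs = ["ABC", "ABC DEF GHI", "XYZ", "abc"]
wanted = ["ABC", "XYZ"]  # corresponds to name:(ABC OR XYZ)

text_hits = [d for d in docs if any(matches_text(d, t) for t in wanted)]
string_hits = [d for d in docs if any(matches_string(d, t) for t in wanted)]
ci_hits = [d for d in docs if any(matches_keyword_lowercase(d, t) for t in wanted)]

print(text_hits)    # includes "ABC DEF GHI" because one of its tokens matches
print(string_hits)  # only the exact values
print(ci_hits)      # exact values, ignoring case
```

The simulated text field also returns "ABC DEF GHI", which mirrors the unwanted hits in the question, while the string-style comparison returns only the exact values.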
A wacky workaround I've just come up with:

How to perform an exact search in Solr

I am implementing Solr search using an API. When I call it with the parameter "Chillout Lounge", it returns the collection of items that are the same as or similar to the string "Chillout Lounge".
But when I search for "Chillout Lounge Box", it returns results that don't contain any of these three words (in the DB there are values that contain these three words, but they are not returned).
As far as I understand, Solr uses fuzzy search, but in that case it should return some values that contain at least one of these words.
Alternatively, what changes should I make to my schema.xml so that it returns the proper values?
First of all - "Fuzzy search" is a feature you'll have to ask for (by using ~ in standard Lucene query syntax).
If you're talking about regular searches, you can use q.op to select which operator to use. q.op=AND will make sure that all the terms match, while q.op=OR will return any document that contains at least one of the terms. As long as you aren't using fq for this, the documents that match more terms should be scored higher (as the score will add up across multiple terms), and thus be shown higher in the result set.
You can use the debug query feature in the web interface to see scores for each term for a document, and find out why the document was returned at all. If the document doesn't match any terms, it shouldn't be returned, unless you're asking for all documents to be returned.
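To make the q.op and debug options above concrete, here is a small Python sketch that builds the request parameters with the standard library. The helper name `search_params` is hypothetical, and the search text is taken from the question; which field is searched depends on the handler's defaults, which are not shown here.

```python
from urllib.parse import urlencode

def search_params(text, operator="OR", debug=False):
    """Hypothetical helper: build the query string for a Solr select request."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    # q.op decides whether all terms (AND) or at least one term (OR) must match.
    params = {"q": text, "q.op": operator}
    if debug:
        # debugQuery asks Solr to include per-document score explanations.
        params["debugQuery"] = "true"
    return urlencode(params)

print(search_params("Chillout Lounge Box", "AND"))
print(search_params("Chillout Lounge Box", "OR", debug=True))
```

Running the same text once with AND and once with OR, with debugging enabled, is a quick way to see which terms contribute to each hit.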
Be aware that the analyzer chain defined for the field you're searching might affect what's considered a match and not.
You'll have to add a proper example to get a more detailed answer.

Solr Exact Query

I am using SolrNet to try to perform an exact query search.
I have a document with the URL stored in Solr as: file://C:/Users/me/docs/X%20Item3
I want to match all documents that contain "X Item", so I will be searching for "X Item".
I have
new SolrQueryByField("url", "*\"X Item\"*");
But this does not return the document. I also do not want to have to convert space characters to %20, but I may have to if Solr will not do it for me when it parses the query.
Help appreciated
Solr does not support wildcard searches by default at the beginning of a term. You can work around this by adding a ReversedWildcardFilterFactory to the field's index-time analysis definition.
Depending on the kind of searches performed, you could also split on / to index each part of the path separately, or just the file name.
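The two ideas above (indexing reversed terms so that a leading wildcard becomes a prefix match, and splitting the path on /) can be sketched in plain Python. This is only an illustration of the principle, with hypothetical function names; it does not reproduce Solr's actual filter behavior.

```python
def index_reversed(terms):
    # Store each term reversed: the core trick behind reversed-wildcard indexing.
    return [t[::-1] for t in terms]

def leading_wildcard_match(reversed_index, suffix):
    # "*suffix" on the original term is the same as "reversed(suffix)*" on the reversed term.
    prefix = suffix[::-1]
    return [r[::-1] for r in reversed_index if r.startswith(prefix)]

path = "file://C:/Users/me/docs/X%20Item3"
# Split on "/" so each path segment becomes its own term; drop empty segments.
parts = [p for p in path.split("/") if p]

reversed_index = index_reversed(parts)
print(parts)
print(leading_wildcard_match(reversed_index, "Item3"))
```

A prefix check on the reversed term is cheap, which is why this trick makes queries that start with a wildcard practical.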
You shouldn't have to convert spaces to %20, as that should be performed by the client library (I'm not familiar with how SolrNet does it, but it really should abstract that away from you) when making an HTTP request.
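For comparison, this is what that transport-level encoding looks like when done with Python's standard library (shown only as an illustration; SolrNet's own behavior is not verified here). The field name `url` and the phrase come from the question.

```python
from urllib.parse import quote, urlencode, parse_qs

phrase_query = 'url:"X Item"'

# quote_via=quote encodes the space as %20; the default (quote_plus) would use "+".
encoded_pct = urlencode({"q": phrase_query}, quote_via=quote)
encoded_plus = urlencode({"q": phrase_query})

print(encoded_pct)
print(encoded_plus)
# Both forms decode back to the same query text on the receiving side.
print(parse_qs(encoded_pct)["q"][0] == parse_qs(encoded_plus)["q"][0])
```

This encoding only protects the query while it travels over HTTP. It does not change how the value stored in the index (which here contains a literal %20) is analyzed or matched.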

Solr questions regarding handler resolution and escaping

I have a couple of questions regarding Solr usage:
Certain requests can be sent to different paths (handlers?). For example, MoreLikeThis requests can be sent to either /select or /mlt.
I have found these two links in the Solr wiki:
http://localhost:8983/solr/mlt?q=id:UTF8TEST&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&mlt.match.include=false
http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score
What is the reasoning behind this setup? If I decide to send my MoreLikeThis requests to /mlt, does this mean I cannot use any /select-specific features (if there is even such a thing), such as facets? If not, can a /select path be configured to handle all requests, from Spellcheck to Clustering?
How do you escape double character special strings (&&, ||) in Lucene?
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping+Special+Characters
Do I escape the first character only (\&&) or do I escape both? And when do I need to escape them? A couple of tests that I performed on the example server provided in the Solr package were inconclusive:
http://localhost:8983/solr/select/?q=manu:%22apple%20%26%26%22%20AND%20manu:%22computer%22
Still returns results.
1) The rationale behind MoreLikeThisHandler is explained in the Solr wiki:
When you specifically want information about similar documents, you can use the MoreLikeThisHandler.
If you want to filter the similar results given by MoreLikeThis you have to use the MoreLikeThisHandler. It will consider the similar document result set as the main one so will apply the specified filters (fq) on it. If you use the MoreLikeThisComponent and apply query filters it will be applied to the result set returned by the main query (QueryComponent) and not to the one returned by the MoreLikeThisComponent.
2) You need to escape every single character, so && becomes \&\& and || becomes \|\|.
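A minimal Python sketch of character-by-character escaping follows. The set of characters is taken from the special characters listed in the Lucene query syntax documentation (plus `/`, which newer versions also treat as special); the helper name `escape_term` is hypothetical and not part of any Solr client.

```python
# Characters the Lucene query parser treats as special (assumed list, see lead-in).
SPECIAL = set('\\+-!():^[]"{}~*?|&/')

def escape_term(text):
    """Hypothetical helper: put a backslash in front of every special character."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

print(escape_term("apple &&"))   # each & gets its own backslash
print(escape_term("SFO 10+"))
```

Escaping only affects how the query string is parsed. Whether a character such as + survives to be matched still depends on the analyzer configured for the field.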
