Connecting multiple Filter Queries (fq) in Apache Solr

Let's assume I have the following documents in my index:
title: Entry #1
myfield: 5
---
title: Entry #2
myfield: 2
---
title: Entry #3
As you can see, myfield is optional and not present in all documents.
Now I want to select all documents where myfield is greater than 3 or the field does not exist. Of course, there is also a search term, let's say entry, which matches all documents.
So the query should return Entry #1 and Entry #3.
Currently I am querying like this:
q=entry
defType=dismax
qf=title
fl=*
fq=-myfield:* myfield:[3 TO *]
This does not return any documents. Each filter query on its own (-myfield:* and myfield:[3 TO *]) works as expected.
How can I connect these two filter queries?

It's usually helpful to think of each term of the query as a set of documents, and the boolean operators as operations performed between these sets (i.e. AND finds the intersection between two sets, while OR finds the union). A negative set would then be the difference between sets.
When you use a negative match, you have to subtract it from something. When your query has no other clauses, Solr helpfully adds the complete set of documents (*:*) in front of it. But as soon as you add a second boolean term, Solr can't do that any longer, since it doesn't know what you actually mean with your query.
So your negative clause needs to start with a set that the other set (i.e. the documents that have the field) can be subtracted from:
fq=(*:* -myfield:*) OR myfield:[3 TO *]
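To make the set reasoning concrete, here is a toy model in Python using plain sets for the three example documents. It is not Solr itself; it assumes the standard parser's rule that a prohibited (-) clause is removed only from what the positive clauses of the same boolean query matched.

```python
import math

# The three example documents; Entry #3 has no myfield at all.
docs = {
    "Entry #1": {"myfield": 5},
    "Entry #2": {"myfield": 2},
    "Entry #3": {},
}

all_docs = set(docs)                                          # *:*
has_myfield = {t for t, d in docs.items() if "myfield" in d}  # myfield:*
myfield_ge_3 = {t for t, d in docs.items()
                if d.get("myfield", -math.inf) >= 3}          # myfield:[3 TO *]

# fq=-myfield:* myfield:[3 TO *]
# The prohibited clause is subtracted from what the only positive clause
# matched; those two sets do not overlap usefully, so nothing is left.
broken = myfield_ge_3 - has_myfield

# fq=(*:* -myfield:*) OR myfield:[3 TO *]
# The sub-query starts from all documents, so the subtraction yields
# "documents without myfield", which is then unioned with the range match.
fixed = (all_docs - has_myfield) | myfield_ge_3

print(sorted(broken))  # []
print(sorted(fixed))   # ['Entry #1', 'Entry #3']
```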

Related

SOLR: Need to perform a filter query for documents with a field value and documents with out the field itself

I need to filter for documents that have a specific value for a field and all documents which do not have the field:
fq=((state:"CA") OR NOT(state:*))
When I execute each subcomponent separately, each one returns results; however, when I execute them together, I get 0 documents.
You have to subtract the NOT from something.
fq=state:CA OR (*:* NOT state:[* TO *])
So either it's in the set denoted by state:CA, or it's in the set denoted by all documents, minus those that have a value in the field. You can't just say "minus those that have a value in the field", since there is nothing to subtract from (small nitpick: Solr automagically adds the whole set to purely negative queries, so q=-state:[* TO *] would work, but not once you add boolean operators).
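When sending such a filter over HTTP, the spaces, brackets and quotes have to be URL-encoded. The sketch below uses only Python's standard urllib.parse; the endpoint URL and the build_select_url helper are hypothetical. It keeps the whole OR inside a single fq string, because Solr intersects separate fq parameters rather than ORing them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: change host, port and core name for your installation.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def build_select_url(q, filters):
    """Return a /select URL with one fq parameter per entry in `filters`.

    Solr intersects separate fq parameters, so any OR between clauses
    must stay inside a single fq string.
    """
    params = [("q", q)] + [("fq", f) for f in filters]
    return SOLR_SELECT + "?" + urlencode(params)

fq = "state:CA OR (*:* NOT state:[* TO *])"
url = build_select_url("*:*", [fq])
print(url)

# Round-trip check: the encoded fq decodes back to the original string.
print(parse_qs(urlsplit(url).query)["fq"] == [fq])  # True
```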

Nested document searches in Solr complex parentFilter syntax

We are adding nested documents to our Solr index. For this purpose, we've added a solr_record_type field to each record, but there will be an interval while we are updating the index where the original documents will have null in this field. We would like to treat all of the original documents as root documents.
In our Solr index, root documents have solr_record_type equal to 1, and the child types are represented by 2-4. So, to keep backward compatibility with what queries currently return, I added this fq parameter:
-solr_record_type:[2 TO 4]
However, I am having trouble composing the parentFilter in the child transformer. For the fl parameter I've tried:
*,[child parentFilter="-solr_record_type:[2 TO 4]"]
This doesn't work: for some reason it then omits the _childDocuments_ section from the results, and I don't know why. I need some way to specify that the parent filter is either "null or 1" or "anything but 2, 3, and 4". How can I do this?
I was unable to find a definitive reference for the parentFilter syntax, only very simple examples.
A negative query needs to be prefixed with the set it's going to remove documents from. Think of it as the difference between two sets: if you only have the set of "documents that should be removed", you have nothing to remove them from.
The regular query parser (and the edismax handlers) automagically prepend the set of all documents, *:*, to purely negative queries for you, so it appears to work, until you start writing longer AND and OR statements involving negative queries, where you suddenly need to prefix *:* yourself.
The same applies to the parentFilter syntax: no set of all documents is automagically prefixed internally, so if you have a negative query, you'll have to add it yourself.
*,[child parentFilter="*:* -solr_record_type:[2 TO 4]"]
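If you build the fl value programmatically, a small helper can add the *:* prefix for you. Both with_match_all and child_fl below are hypothetical helpers, not part of any Solr client; with_match_all is deliberately naive and only checks whether the query starts with a negation, so it is meant for single negated clauses like the one above.

```python
def with_match_all(query):
    """Prefix *:* to a query that starts with a negation.

    Naive on purpose: it only looks at the start of the string, so it is
    intended for a single negated clause, not arbitrary boolean queries.
    """
    q = query.strip()
    if q.startswith("-") or q.upper().startswith("NOT "):
        return "*:* " + q
    return q

def child_fl(parent_filter, base_fields="*"):
    """Build an fl value using the [child] transformer with a parentFilter.

    Assumes parent_filter contains no double quotes, since it is embedded
    in a double-quoted attribute.
    """
    return f'{base_fields},[child parentFilter="{with_match_all(parent_filter)}"]'

print(child_fl("-solr_record_type:[2 TO 4]"))
# *,[child parentFilter="*:* -solr_record_type:[2 TO 4]"]
```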

How to tune Solr query with field priority?

I have to make a query to Apache Solr like this:
name:ABCXYZ AND address:Ame*
I find that the query with just name:ABCXYZ receives a quick response, with only 1 result. However, the response time is much higher for the query above, which includes a second field, address.
How can I tune Solr, or my query, so that it prioritizes the search for each field? In my case, this would mean searching name before address.
"name:ABCXYZ" gives an answer straight away, because the query is very specific and yields one distinct result. Normally you would expect a query like "name:ABCXYZ AND address:Ame*" to find all entries where name is ABCXYZ, then further find entries where address starts with "Ame".
The thing about Solr is that it can be configured so that, even though "name:ABCXYZ AND address:Ame*" only yields one result, Solr will continue to search for other entries that don't match the entire query string but match part of it.
Perhaps your search is too "kind"?
We use a query parameter named "mm" (Minimum Match), which you can set on your search handler in solrconfig.xml. This parameter specifies the minimum number of clauses that the query must match. So if mm=1, for instance, your query will find entries where name is "ABCXYZ", but also entries whose address starts with "Ame". Maybe you should look into this mm parameter? I imagine setting mm=100% should force your search handler to require every clause to match.
Edit: "mm" applies to the DisMax query parsers (dismax and edismax), by the way.
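The effect of mm on a two-clause query can be sketched as follows. This is a simplified model, not Solr's implementation: it handles only the plain integer and positive percentage forms of mm (Solr also accepts negative values and conditional expressions), rounds percentages down, and caps the requirement at the number of clauses.

```python
def required_clauses(mm, n_clauses):
    """Minimum number of optional clauses that must match (simplified)."""
    if mm.endswith("%"):
        # Positive percentage of the clause count, rounded down.
        return n_clauses * int(mm[:-1]) // 100
    # Plain integer, capped at the number of clauses in this sketch.
    return min(int(mm), n_clauses)

def matches(clause_hits, mm):
    """True if enough clauses matched to satisfy mm."""
    return sum(clause_hits) >= required_clauses(mm, len(clause_hits))

# A document whose name is ABCXYZ but whose address does not start with "Ame".
hits = [True, False]  # [name:ABCXYZ, address:Ame*]
print(matches(hits, "1"))     # True
print(matches(hits, "100%"))  # False
```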
To "search name before address", boost the name field more heavily:
name:ABCXYZ^10 OR address:Ame^6
This assigns name a boost of 10 and address a boost of 6.
These boost factors make matches in name much more significant than matches in address.

Solr negative boost

I'm looking into the possibility of de-boosting a set of documents at query time. In my application, when I search for e.g. "preferences", I want to de-boost content tagged with ContentGroup:"Developer", or in other words, push that content back in the order. Here's the catch: I have the following weights on query fields, plus a boost query on source:
qf=text^6 title^15 IndexTerm^8
As you can see, title has a higher weight.
Now, a bunch of content tagged with ContentGroup:"Developer" has titles like "Preferences.material", "Preferences Property" or "Preferences.graphics". The boost on title pushes these documents to the top.
What I'm looking for is a way to de-boost all documents tagged with ContentGroup:"Developer", irrespective of whether the term occurs in text or title. I tried something like the following, but it didn't make any difference:
Source:simplecontent^10 Source:Help^20 (-ContentGroup-local:("Developer"))^99
I'm using edismax query parser.
Any pointers will be appreciated.
Thanks,
Shamik
You're onto something with your last attempt, but you have to start with *:*, so that you actually have something to subtract the documents from. The resulting set of documents (those not matching your query) can then be boosted.
From the Solr Relevancy FAQ
How do I give a negative (or very low) boost to documents that match a query?
True negative boosts are not supported, but you can use a very "low" numeric boost value on query clauses. In general the problem that confuses people is that a "low" boost is still a boost, it can only improve the score of documents that match. For example, if you want to find all docs matching "foo" or "bar" but penalize the scores of documents matching "xxx" you might be tempted to try...
q = foo^100 bar^100 xxx^0.00001 # NOT WHAT YOU WANT
...but this will still help a document matching all three clauses score higher than a document matching only the first two. One way to fake a "negative boost" is to give a large boost to everything that does not match. For example...
q = foo^100 bar^100 (*:* -xxx)^999
NOTE: When using (e)dismax, people sometimes expect that specifying a pure negative query with a large boost in the "bq" param will work (since Solr automatically turns top-level purely negative queries into positive ones by adding an implicit "*:*"), but this doesn't work with "bq", because queries specified via "bq" are added directly to the main query. You need to be explicit...
?defType=dismax&q=foo bar&bq=(*:* -xxx)^999
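The FAQ's argument can be checked with a toy additive scorer that simply sums the boosts of matching clauses. This deliberately ignores real Lucene scoring (term statistics, length norms), so only the ordering is meaningful, not the numbers; the score function and the negated flag are illustrative assumptions.

```python
def score(doc_terms, clauses):
    """Toy additive scorer: sum the boosts of the clauses a document matches.

    Each clause is (term, boost, negated). A negated clause stands in for
    (*:* -term): it matches every document that does NOT contain the term.
    """
    total = 0.0
    for term, boost, negated in clauses:
        matched = (term not in doc_terms) if negated else (term in doc_terms)
        if matched:
            total += boost
    return total

doc_all_three = {"foo", "bar", "xxx"}
doc_two = {"foo", "bar"}

# q = foo^100 bar^100 xxx^0.00001   (a "low" boost is still a boost)
low_boost = [("foo", 100, False), ("bar", 100, False), ("xxx", 0.00001, False)]
# q = foo^100 bar^100 (*:* -xxx)^999   (boost everything that lacks xxx)
complement_boost = [("foo", 100, False), ("bar", 100, False), ("xxx", 999, True)]

print(score(doc_all_three, low_boost) > score(doc_two, low_boost))                # True
print(score(doc_two, complement_boost) > score(doc_all_three, complement_boost))  # True
```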

How do I create a Solr query that returns results even if one field in my query has no matches?

Suppose I want to create a recommendation system to suggest people you should connect with based off of certain attributes that I know about you and attributes I have about other people that are stored in a Solr index. Is it possible to query the index with a list of attributes (along with boosts for each attribute) and have Solr return scored results even if some of my fields return no matches? The way that I understand that Solr works is that if one of your fields doesn't contain a match in any documents found in your index, you get zero results for the entire query (even if other fields in the query matched) - is that right? What I would hope is that I could query the index and get a list of results back in order of a score given based on how many (and which) fields matched to something, even if some fields have no matches, for example:
Say that there are 2 people documents stored in the index as follows (figuratively):
Person 1:
Industry: Manufacturing
City: Oakland
Person 2:
Industry: Manufacturing
City: San Jose
And say that I perform a pseudo-Solr query that basically says "Search for everyone whose industry is equal to manufacturing and whose city is equal to Oakland". What I would like is to receive both results back in the result set, even though one of the "Persons" does not reside in Oakland. I just want that person to come back as a result with a lower score than Person 1. Is this possible? What might a Solr query look like to handle this? Assume that I have many more than 2 attributes for each person (so saying that I can use "And" and "Or" in my Solr query isn't really feasible... or is it?) Thanks in advance for your helpful input! (PS I'm using Solr 3.6)
You mention using the AND operator, which is likely your problem.
The default behavior of Lucene, and Solr, query syntax is exactly what you are asking for. A query like:
industry:manufacturing city:oakland
will match documents that match either clause, scoring those that match both higher. See the Lucene query syntax documentation.
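The difference between the default OR behavior and an explicit AND can be sketched with the two example people. This is a toy model: each matching optional clause just adds its boost, which is far simpler than Lucene's actual scoring, and it lowercases field names for illustration (real Solr field names are case-sensitive).

```python
people = {
    "Person 1": {"industry": "manufacturing", "city": "oakland"},
    "Person 2": {"industry": "manufacturing", "city": "san jose"},
}

# q=industry:manufacturing city:oakland with the default OR operator:
# each clause is optional, so a document matches if ANY clause matches.
query = [("industry", "manufacturing", 1.0), ("city", "oakland", 1.0)]

def score(doc, clauses):
    """Toy scorer: add the boost of every clause the document matches."""
    return sum(boost for field, value, boost in clauses if doc.get(field) == value)

scores = {name: score(doc, query) for name, doc in people.items()}
matching = {name: s for name, s in scores.items() if s > 0}
ranking = sorted(matching, key=matching.get, reverse=True)
print(ranking)  # ['Person 1', 'Person 2']

# With AND, every clause is required, so Person 2 drops out entirely.
and_matches = [name for name, doc in people.items()
               if all(doc.get(f) == v for f, v, _ in query)]
print(and_matches)  # ['Person 1']
```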
You can also use the bq (boost query) parameter of the (e)dismax query parsers, which does not affect matching, only the scores:
http://localhost:8983/solr/persons/select?defType=edismax&q=industry:manufacturing&bq=City:Oakland^2
Play with the boost factor at the end to get the right balance between the matching score and the boosting score.

Resources