Query Solr for a unique value in a multivalued field - solr

I'm looking for a Solr query that selects a value in a field only if it is the field's only value.
For example, here are some documents:
<doc>
<id>1</id>
<folder_id>abc;def;ghi</folder_id>
</doc>
<doc>
<id>2</id>
<folder_id>def</folder_id>
</doc>
If I query Solr for folder_id:"def", it returns both documents, but I want only the one with id 2.
What I want is to retrieve all the documents that have the key def and only that key.
Unfortunately, I can't retrieve all the other keys, so I can't build a query like folder_id:"def" AND NOT folder_id:("abc", "ghi").
Let me know if you need more information.

Use String as the field type for your field instead of Text.
A String field stores a word or sentence as one exact value, without tokenization; it is commonly used for exact matches, e.g. for faceting. A Text field typically performs tokenization and further processing (such as lowercasing).
Your field is currently a Text type, so abc;def;ghi is split into separate tokens (abc, def, ghi). That is why you get 2 results.
If you switch your field to the String type, folder_id:"def" becomes an exact match on the whole value.
Alternatively, use a field type whose analyzer is KeywordTokenizerFactory followed by LowerCaseFilterFactory: it keeps the whole value as a single token but matches case-insensitively.
If you also need tokenized search, you can have 2 fields, one of String type and the other of Text type. It all depends on your requirements.
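To see why the field type matters, here is a toy Python model, not Solr's real analysis chain. The "text" analyzer is assumed to split on any non-alphanumeric character and lowercase (roughly what happens to abc;def;ghi), while the "string" analyzer keeps the whole value as one term. DOCS mirrors the two example documents.

```python
import re

# The two example documents: id -> folder_id value.
DOCS = {1: "abc;def;ghi", 2: "def"}


def text_terms(value):
    """Toy stand-in for a text field: split on non-alphanumerics, lowercase."""
    return {t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t}


def string_terms(value):
    """A string field indexes the whole value as a single exact term."""
    return {value}


def search(term, analyzer):
    """Return the ids of documents whose indexed terms contain `term`."""
    return sorted(doc_id for doc_id, value in DOCS.items() if term in analyzer(value))


print(search("def", text_terms))    # [1, 2]: "def" is one of doc 1's tokens
print(search("def", string_terms))  # [2]: only doc 2's whole value is "def"
```

The trade-off shows up with search("abc", string_terms), which returns no documents: with a pure string field you can no longer find doc 1 by a single folder, which is why keeping a second, tokenized field can make sense.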

Related

Incorrect results for Solr search with multiple terms

Perhaps someone can enlighten me on how Solr matches terms. I have a string attribute named assignedBy, and I query this attribute with the value "Aaron Mason" (no quotes). Solr returns more matches than I expected, because the term "Mason" also matches documents whose other fields contain the word "Mason". With the debug option turned on (in the Solr admin UI), I see that Solr breaks the query into two field queries: "aaron" against assignedBy and "mason" against the catch-all _text_ field (see below). Is this the correct behavior? How do I ensure that it only finds matches in the field I specify? Thanks.
"debug":{
"rawquerystring":"assignedBy:Aaron Mason",
"querystring":"assignedBy:Aaron Mason",
"parsedquery":"assignedBy:aaron _text_:mason",
"parsedquery_toString":"assignedBy:aaron _text_:mason",
Yes, that is the expected behavior. When you send q=assignedBy:Aaron Mason,
the query is parsed with the query-time analyzers defined in your schema and becomes
assignedBy:aaron and _text_:mason.
A field prefix applies only to the term right after it. A term without a field name is searched in the default field, which is set in solrconfig.xml: look for <str name="df">text</str> under the /select handler. In your case it is probably _text_.
So, with the default OR operator, Solr returns the combined results of all documents whose assignedBy field contains the term "aaron" and all documents whose _text_ field contains the term "mason".
You have probably used copyField to copy other field values into the _text_ field; check for it.
You can use dismax/edismax, where you specify in qf the fields in which all of your terms are searched.
Example:
q=Aaron Mason&wt=json&debugQuery=on&defType=dismax&qf=assignedBy
This only finds matches against the field "assignedBy" specified in qf
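A minimal Python sketch of building such a request, assuming a local Solr at http://localhost:8983 and a core named mycore (both hypothetical). It only builds the URL and does not send it. The second parameter set shows another option in standard Lucene syntax, assignedBy:(Aaron Mason), where grouping the terms in parentheses applies the field prefix to each of them.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL and core name; adjust to your installation.
SELECT_URL = "http://localhost:8983/solr/mycore/select"


def select_url(params):
    """Build a /select URL with properly percent-encoded parameters."""
    return SELECT_URL + "?" + urlencode(params)


# dismax: every term in q is searched only in the fields listed in qf.
dismax_params = {
    "q": "Aaron Mason",
    "defType": "dismax",
    "qf": "assignedBy",
    "debugQuery": "on",
    "wt": "json",
}

# Standard (lucene) parser: group the terms so the field prefix applies to both.
lucene_params = {"q": "assignedBy:(Aaron Mason)", "debugQuery": "on", "wt": "json"}

print(select_url(dismax_params))
print(select_url(lucene_params))
```

Either way, the parsedquery shown in the debug output should then list assignedBy for both terms.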

Query for solr empty field

I have a list of profiles in Solr, indexed with some empty fields (e.g. country, category).
<arr name="country">
<str>185</str>
</arr>
<arr name="category">
<int>38</int>
</arr>
I want to search for profiles with no country. I used -country:['' TO *] because country is a string field.
How can I do the same for an integer field? The field may be empty (no field) or have the value 0. I tried category:0, but it does not give me the correct output; the result is empty.
For a string field you could query:
-country:* OR country:""
It should give you all the Solr documents that have no country field at all, or whose country is an empty string.
For an integer field:
-category:* OR category:0
It should give you the documents with no category field (empty) or with the value 0.
Be aware that combining OR with a purely negative clause has pitfalls in Solr. Please refer to "using OR and NOT in solr query" for a better understanding of OR and NOT queries.
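Because of that OR/NOT pitfall, a commonly used form anchors the negative clause to *:* so that it has a set of documents to subtract from. Below is a small hypothetical Python helper (string building only, nothing is sent to Solr) that produces such a "missing or equal to X" clause. As an assumption of this sketch, string values are quoted as a phrase with backslashes and quotes escaped, and other values (such as the integer 0) are written as-is.

```python
def term(value):
    """Quote strings as a Lucene phrase (escaping \\ and "); leave numbers as-is."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)


def missing_or_equal(field, value):
    """Match docs that lack `field` entirely, or whose `field` equals `value`.

    (*:* -field:[* TO *]) subtracts every doc that has the field from the
    set of all docs, which avoids a purely negative clause inside an OR.
    """
    return f"(*:* -{field}:[* TO *]) OR {field}:{term(value)}"


print(missing_or_equal("country", ""))  # (*:* -country:[* TO *]) OR country:""
print(missing_or_equal("category", 0))  # (*:* -category:[* TO *]) OR category:0
```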

solr - multivalued string search field, all words must match

Currently I have sample data like this:
<doc>
<str name="name">Nice Dress</str>
<arr name="keyword">
<str>best cocktail dress</str>
<str>platform complete pumps</str>
<str>platform pumps</str>
<str>slip dress</str>
</arr>
</doc>
I use a multivalued "keyword" field.
Case 1:
defType:edismax
qf:keyword
q:cocktail dress
Solr returns the document.
Case 2:
defType:edismax
qf:keyword
q:coctail dress pump
It still returns the document, even though, looking at the sample data, no single keyword value contains all 3 words ('coctail', 'dress', 'pump').
How can I make Solr not return this result?
Thanks.
Check two parameters:
positionIncrementGap - for multivalued fields, this sets the position gap between consecutive values. If it is 100, an extra 100 positions separate the last token of one value from the first token of the next. It is applied at index time, so changing it requires reindexing.
Note - if the fieldType does not set it, positionIncrementGap defaults to 0.
Also check the qs (query phrase slop) parameter of dismax/edismax, which sets the slop for phrase queries written explicitly in q.
Try this phrase query:
q:"coctail dress pump"~100
with your positionIncrementGap set to something like 300.
Adjust both values to the length of your data: the slop must stay smaller than the gap, so that a match cannot span two values.
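The interaction between the gap and the slop can be illustrated with a simplified Python model. This is an assumption-laden sketch, not Lucene's exact sloppy-phrase algorithm (which also charges for reordered terms): tokenization is plain whitespace splitting with no stemming, so the query uses the indexed forms "cocktail" and "pumps".

```python
from itertools import product


def index_positions(values, gap):
    """Assign token positions to a multivalued field, roughly as Lucene does:
    positions keep increasing across values, and `gap` extra positions
    (positionIncrementGap) are skipped before each value after the first."""
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():  # toy whitespace tokenizer
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def sloppy_match(positions, terms, slop):
    """Simplified phrase-with-slop check (ignores term order).

    Every term must occur, and the tightest window holding one occurrence
    of each term may be at most `slop` positions wider than a contiguous run.
    """
    if any(t not in positions for t in terms):
        return False
    tightest = min(max(c) - min(c) for c in product(*(positions[t] for t in terms)))
    return tightest - (len(terms) - 1) <= slop


keywords = ["best cocktail dress", "platform complete pumps", "platform pumps", "slip dress"]
query = ["cocktail", "dress", "pumps"]

print(sloppy_match(index_positions(keywords, 0), query, 100))    # True: values run together
print(sloppy_match(index_positions(keywords, 300), query, 100))  # False: gap exceeds slop
```

With a gap of 0 the three terms sit within a few positions of each other across values, so a slop of 100 matches; with a gap of 300, any window that reaches "pumps" from "cocktail" is wider than the slop, while a phrase inside a single value such as "slip dress" still matches.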

Solr copyField mixed with RegexTransformer

Scenario:
In the database I have a field called Categories, which is of type string and contains pipe-delimited numbers such as 1|8|90|130|
What I want:
In Solr index, I want to have 2 fields:
Field Categories_pipe, which would contain the exact string as in the DB, i.e. 1|8|90|130|
Field Categories which would be a multi-valued field of type INT containing values 1, 8, 90 and 130
For the latter, I can use a RegexTransformer in the entity specification and declare the following field in data-config.xml:
<field column="Categories" name="Navigation" splitBy="\|"/>
and then declare the field as multivalued in schema.xml.
What I don't know is how to 'copy' the same field twice and apply the regex split to only one copy. I know schema.xml has the copyField facility, but I can't find a way to transform the copied field because, as far as I know (and I may be wrong here), transformers are only available in the entity specification.
As a workaround I could also select the same field twice in the entity query, but in reality Categories is a computed field (nested selects) that is somewhat expensive, so I would like to avoid that.
Any help is appreciated, thanks.
Instead of splitting it in data-config.xml, you could do that in your schema.xml. Here is what you could do:
Create a fieldType whose tokenizer is PatternTokenizerFactory with a regex that splits on | (pattern="\|").
FieldSplit: create a field using this new fieldType; it will be indexed as the terms 1, 8, 90 and 130 (text terms, not INT values, because numeric field types cannot take a custom tokenizer).
FieldOriginal: create a String field (if you need no analysis on it) that preserves the original value 1|8|90|130|
Then send the value to FieldOriginal only and add a copyField from FieldOriginal to FieldSplit. copyField copies the raw input before analysis, so FieldSplit still receives 1|8|90|130| and tokenizes it.
Check this Question, it is similar.
You can create two columns from the same data and treat them separately.
SELECT categories, categories as categories_pipe FROM category_table
Then you can split the "categories" column, but index the other one as-is.
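If you post documents yourself rather than through the DataImportHandler, the same "read once, index twice" idea can be done on the client side. A hypothetical Python sketch that turns one Categories value into both fields (field names taken from the question); the trailing pipe produces an empty piece, which this sketch drops explicitly before converting to integers.

```python
def category_fields(raw):
    """Build both index fields from one pipe-delimited DB value, e.g. "1|8|90|130|"."""
    return {
        "Categories_pipe": raw,  # exact original string
        # multivalued INT field: split on "|", skip empty pieces, convert to int
        "Categories": [int(part) for part in raw.split("|") if part.strip()],
    }


print(category_fields("1|8|90|130|"))
# {'Categories_pipe': '1|8|90|130|', 'Categories': [1, 8, 90, 130]}
```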

How would I search for blank values and a specific value in a multivalued facet field at the same time in Solr?

I have an application where users can pick car parts. They pick their vehicle and then pick vehicle attributes as facets. After they select their vehicle, they can pick facets like engine size, for example, to narrow down the list of results. The problem was, not all documents have an engine size (it's an empty value in Solr), as it doesn't matter for all parts. For example, an engine size rarely matters for an air filter. So even if a user picked 3.5L for their engine size, I still wanted to show the air filters on the screen as a possible part the user could pick.
I did some searching and the following facet query works perfectly:
enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])
This query would match either 3.5 or would match records where there was no value for the engine size field (no value meant it didn't matter, and it fit the car). Perfect...
THE PROBLEM: I recently made the vehicle attribute fields multivalued fields, so I could store attributes for each part as a list. I then applied faceting to it, and it worked fine. However, the problem came up when I applied the query previously mentioned above. While selecting the enginesize facet narrowed down the number of documents displayed to only documents that have that engine size, records (I also use the word record to mean document) that had empty values (i.e. "") for enginesize were not appearing. The same query above does not work for multivalued facets the same way it did when enginesize was a single valued field.
Example:
<doc>
<str name="part">engine mount</str>
<arr name="enginesize">
<str/>
<str/>
<str>3.5</str>
<str>3.5</str>
<str>3.5</str>
<str>3.5</str>
<str>3.5</str>
</arr>
</doc>
<doc>
<str name="part">engine bolt</str>
<arr name="enginesize">
<str>6</str>
<str>6</str>
<str>6</str>
<str>6</str>
<str>6</str>
</arr>
</doc>
<doc>
<str name="part">air filter</str>
<arr name="enginesize">
<str/>
<str/>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
</arr>
</doc>
What I am looking for is a query that will pull back documents 1 and 3 above when I do a facet search for engine size 3.5. The first document (the engine mount) matches because one of the values of its multivalued enginesize field is the one I am looking for (3.5). However, the third document (the air filter) doesn't get returned because of its empty <str> values. I do not want to return the second document at all because it doesn't match the facet value.
I basically want a query that will match empty string values for a given facet and also match the actual value, so I get both documents returned.
Does someone have a query that would return document 1 and document 3 (the engine mount and the air filter), but not the engine bolt document?
I tried the following without success (including the one at the very top of this question):
// returns everything
enginesize:"3.5" OR (enginesize:[* TO *] )
// only returns document 1
enginesize:"3.5" OR (enginesize:["" TO ""] AND -enginesize:"3.5")
// only returns document 1
enginesize:"3.5" OR (enginesize:"")
I imported the data above from a CSV file with keepEmpty=true. I then tried manually inserting a space into the field when generating the CSV file (which gives <str> </str> instead of the previous <str/>) and retried the queries. Doing that, I got the following results:
// returns document 1
enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])
// returns all documents
enginesize:"3.5" OR (enginesize:["" TO ""] AND -enginesize:"3.5")
// returns all documents
enginesize:"3.5" OR (enginesize:"")
Does anyone have a query that would work for either situation, whether I have a space as the blank value or simply no value at all?
How about changing how you index, instead of how you query?
Instead of trying to index "engine size doesn't matter" as an empty record, index it as "ANY".
Then your query simply becomes enginesize:"3.5" OR (enginesize:ANY)
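A minimal Python sketch of the "ANY" approach, assuming you can preprocess values before generating the CSV. MARKER, normalize and matches are hypothetical names. normalize turns blank or whitespace-only values (and a document with no values) into the marker and removes duplicates; matches models what the filter does against exact string-field terms, using the three example documents.

```python
MARKER = "ANY"  # hypothetical sentinel meaning "engine size doesn't matter"


def normalize(values):
    """Replace blank/whitespace-only values with MARKER and drop duplicates,
    keeping first-seen order. A document with no values at all gets MARKER."""
    out = []
    for v in values or [""]:
        v = v.strip() or MARKER
        if v not in out:
            out.append(v)
    return out


def filter_query(field, value):
    """Facet filter: the selected value, or documents marked as fitting any value."""
    return f'{field}:"{value}" OR {field}:{MARKER}'


def matches(values, wanted):
    """What that filter does against exact (string-field) terms."""
    return wanted in values or MARKER in values


docs = {
    "engine mount": normalize(["", "", "3.5", "3.5"]),
    "engine bolt": normalize(["6", "6", "6"]),
    "air filter": normalize(["", " ", ""]),
}
print(filter_query("enginesize", "3.5"))  # enginesize:"3.5" OR enginesize:ANY
print([part for part, vals in docs.items() if matches(vals, "3.5")])
```

Because the marker is an ordinary indexed term, the query needs no range or negation tricks, and it behaves the same for single-valued and multivalued fields.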
I've just been playing with this and found a hint that seems to do the trick for me. Translated to your query, it should be:
enginesize:"3.5" OR (-enginesize:["" TO *])
hth,
andi
Update: after some more testing I don't think this works reliably. For some indexes it had to be the other way round and without the minus sign, i.e. enginesize:[* TO ""]. This might depend on the index type, on whether the field is multivalued, or even on the actual values.
In any case it seems too much of a hack. I'll probably resort to substituting the empty value with a special marker...
I had the same problem, but solved it in https://stackoverflow.com/a/35633038/13365:
enginesize:"3.5" OR (*:* NOT enginesize:["" TO *])
The -enginesize solution didn't work for me.
