Solr: string search on a multivalued field where all words must match in one value

Currently I have sample data like this:
<doc>
<str name="name">Nice Dress</str>
<arr name="keyword">
<str>best cocktail dress</str>
<str>platform complete pumps</str>
<str>platform pumps</str>
<str>slip dress</str>
</arr>
</doc>
The "keyword" field is multivalued.
case 1
defType:edismax
qf:keyword
q:cocktail dress
Solr returns the document.
case 2
defType:edismax
qf:keyword
q:coctail dress pump
Solr still returns the document, even though no single keyword value in the sample data contains all three words ('coctail', 'dress', 'pump').
How can I make Solr not return this result?
Thanks.

Check two parameters:
positionIncrementGap - For multivalued fields, this parameter sets the position gap between consecutive values of the field. If it is 100, the distance between two values of the multivalued field is 100 positions.
Note - The default positionIncrementGap is 0.
qs - The query slop parameter for dismax, which decides how much slop is allowed between the terms of a phrase match.

Try this query:
q:"coctail dress pump"~100
with your positionIncrementGap set to something like 300.
Those values will need to be adjusted depending on how long your field values are; the slop should stay smaller than the gap.
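To see why a large gap combined with a smaller slop keeps a match inside a single value, here is a minimal Python sketch that mimics how token positions are assigned across the values of a multivalued field. It is a simplified model and not Solr's implementation: tokenization is plain whitespace splitting with no stemming (so it searches for "pumps" rather than "pump"), slop is approximated as a position window, and `assign_positions` and `within_slop` are hypothetical helper names.

```python
from itertools import product


def assign_positions(values, gap):
    """Return (term, position) pairs for one multivalued field.

    Simplified model: every new value starts `gap` extra positions
    after the last token of the previous value.
    """
    positions = []
    pos = -1
    for index, value in enumerate(values):
        if index > 0:
            pos += gap
        for term in value.lower().split():
            pos += 1
            positions.append((term, pos))
    return positions


def within_slop(positions, terms, slop):
    """True if each term occurs and some set of occurrences spans <= slop."""
    hits_per_term = []
    for term in terms:
        hits = [p for t, p in positions if t == term]
        if not hits:
            return False
        hits_per_term.append(hits)
    return any(max(combo) - min(combo) <= slop
               for combo in product(*hits_per_term))


keywords = ["best cocktail dress", "platform complete pumps",
            "platform pumps", "slip dress"]
with_gap = assign_positions(keywords, gap=300)
no_gap = assign_positions(keywords, gap=0)

print(within_slop(with_gap, ["cocktail", "dress"], 100))           # True
print(within_slop(with_gap, ["cocktail", "dress", "pumps"], 100))  # False
print(within_slop(no_gap, ["cocktail", "dress", "pumps"], 100))    # True
```

With a gap of 0 the three words look adjacent even though they come from different values, which is why the unwanted document matches.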

Related

Query for solr empty field

I have a list of profiles in Solr, indexed with some empty fields (e.g. country, category).
<arr name="country">
<str>185</str>
</arr>
<arr name="category">
<int>38</int>
</arr>
I want to search for profiles with no country. I used -country:['' TO *], as country is a string field.
How can I do the same check for an integer field? The field may be empty (no field at all) or have the value 0. I tried category:0, but it does not give me the correct output; the result is empty in this case.
For a string field you could query like this:
-country:* OR country:""
This returns all Solr documents that have no country field at all, or whose country is an empty string.
For an integer field:
-category:* OR category:0
This returns documents where the field is missing (no field) or has the value 0.
Note that there is an issue with combining OR and NOT in Solr. Please refer to "using OR and NOT in solr query" for a better understanding of OR and NOT queries.
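The OR/NOT issue can be illustrated with a small set model in Python. This is only a sketch under an assumption: that the negative clause ends up as a "must not" clause in the same boolean query as category:0, rather than as its own sub-query. The three documents and their ids are made up for illustration. Anchoring the negation to all documents, as in (*:* -category:*) OR category:0, avoids the problem in this model.

```python
# Hypothetical documents: 1 has category 38, 2 has category 0, 3 has no category.
docs = {
    1: {"category": [38]},
    2: {"category": [0]},
    3: {},
}

all_ids = set(docs)
has_category = {i for i, d in docs.items() if d.get("category")}
is_zero = {i for i, d in docs.items() if 0 in d.get("category", [])}

# Assumed reading of "-category:* OR category:0":
# SHOULD category:0 combined with MUST_NOT category:* in one boolean query.
flat = is_zero - has_category

# "(*:* -category:*) OR category:0": the negation is applied to all documents
# first, then the result is OR-ed with category:0.
anchored = (all_ids - has_category) | is_zero

print(sorted(flat))      # []
print(sorted(anchored))  # [2, 3]
```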

Filtering multiValued Field in Solr

How can I search inside the array of a multivalued field?
My data looks like this:
<str name="Key">8</str>
<arr name="city">
<str>Achabal (NAC)</str>
<str>Addi Gam</str>
<str>Adeh Hall</str>
<str>Aho Paisan</str>
<str>Akin Gam</str>
<str>Akura</str>
.......
</arr>
<str name="state">Chandigarh</str>
I want to search inside the city field. I am trying the query below:
q=city:*Ak* AND state:Chandigarh
but it is not working.
The data shown above is a single document.
Multivalued fields are no different from single-valued fields from a query perspective. Note that there is an error in your query: Solr doesn't support using a * symbol as the first character of a search.
See the links below:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory
http://solr.pl/en/2010/12/20/wildcard-queries-and-how-solr-handles-them
http://www.solrtutorial.com/solr-query-syntax.html
You should not use & in your query; use + or %20 (space) instead. & separates URL query parameters, so the state part will not be passed as part of the value of q.
Try q=city:*Ak*+state:Chandigarh or q=city:*Ak* state:Chandigarh.
You would probably also want to use a filter query here instead of putting everything into the query parameter:
q=city:*Ak*&fq=state:Chandigarh
This queries for all cities containing 'Ak' and limits the results to those with state='Chandigarh'.
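The effect of & on the query string can be checked with Python's standard urllib.parse module. This is a small sketch: the first part shows how a raw & splits the request into separate parameters, the second builds the q and fq parameters with proper encoding. No Solr host or handler path is assumed, only the query string itself.

```python
from urllib.parse import parse_qs, urlencode

# A raw "&" ends the q parameter; the state clause becomes its own,
# valueless parameter and never reaches the query parser as part of q.
broken = parse_qs("q=city:*Ak*&state:Chandigarh", keep_blank_values=True)
print(broken["q"])                   # ['city:*Ak*']
print("state:Chandigarh" in broken)  # True

# Building the parameters from a dict lets urlencode escape each value.
params = {"q": "city:*Ak*", "fq": "state:Chandigarh"}
query_string = urlencode(params)
print(query_string)

# Decoding the string again gives back exactly the two parameters.
print(parse_qs(query_string))
```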

solr facet field values appear to be generated by solr

I want to facet on a specific field. The field is defined as
<field name="specials_de" type="textgen" indexed="true" stored="true" multiValued="true"/>
Two of the values in specials_de are "Cunard Hamburg Specials" and "Cunard the New Yorker". I want to use these two values as facets, but the solr query returns facet fields with values like
<int name="cunard">11</int>
<int name="new">9</int>
<int name="yorker">9</int>
<int name="hamburg">5</int>
<int name="hamburgspecialscunard">3</int>
<int name="hamburgspecials">2</int>
What am I doing wrong?
Just to clarify: I'm not referring to the counts (11, 9, etc.), but to the names, i.e. "cunard", "new", etc.
Text fields are not recommended for faceting.
You won't get the desired behavior, because text fields are tokenized and filtered, which generates the multiple tokens you see in the facets returned in the response.
From SolrFacetingOverview:
Because faceting fields are often specified to serve two purposes, human-readable text and drill-down query value, they are frequently indexed differently from fields used for searching and sorting:
They are often not tokenized into separate words
They are often not mapped into lower case
Human-readable punctuation is often not removed (other than double-quotes)
There is often no need to store them, since stored values would look much like indexed values and the faceting mechanism is used for value retrieval.
Try using string fields instead; that should be enough, without any overhead.
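The difference between faceting on a tokenized field and on a string field can be sketched in a few lines of Python. This is a toy model, not Solr's faceting code: `text_analyzer` is a rough stand-in for a text type such as textgen (lowercasing and whitespace splitting only), `string_analyzer` keeps the value as one term, and the three sample documents are invented.

```python
from collections import Counter


def text_analyzer(value):
    """Rough stand-in for a tokenized text field: lowercase, split on spaces."""
    return value.lower().split()


def string_analyzer(value):
    """A string field indexes the whole value as a single term."""
    return [value]


def facet_counts(docs, analyze):
    """Count, per indexed term, how many documents contain it."""
    counts = Counter()
    for values in docs:
        terms = set()
        for value in values:
            terms.update(analyze(value))
        counts.update(terms)
    return counts


# Each inner list holds the specials_de values of one hypothetical document.
docs = [
    ["Cunard Hamburg Specials"],
    ["Cunard the New Yorker"],
    ["Cunard Hamburg Specials", "Cunard the New Yorker"],
]

text_facets = facet_counts(docs, text_analyzer)
string_facets = facet_counts(docs, string_analyzer)

print(text_facets["cunard"], text_facets["yorker"])  # 3 2
print(dict(string_facets))
```

The string variant yields exactly two facet values, the full phrases, which is what the question asks for.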

Searching multi-valued fields in the same position

Let's say we have a document like this:
<arr name="pvt_rate_type">
<str>CORPORATE</str>
<str>AGENCY</str>
</arr>
<arr name="pvt_rate_set_id">
<str>1</str>
<str>2</str>
</arr>
Now I do a search where I want to return the document only if it contains pvt_rate_set_id = 1 AND pvt_rate_type = AGENCY at the same position in their multivalued fields, so the above document should NOT be returned (because pvt_rate_set_id 1 has a pvt_rate_type of CORPORATE).
Is this possible at all in Solr, or is my schema badly designed? How else would you design that schema to allow the searching I want?
This may not be available out of the box.
You would need to modify the schema so that the pvt rate type becomes the field name and the id becomes its value,
e.g.
CORPORATE=1
AGENCY=2
This can be achieved by defining dynamic fields,
e.g.
<dynamicField name="*_pvt_rate_type" type="string" indexed="true" stored="true"/>
So you can index data as corporate_pvt_rate_type or agency_pvt_rate_type with the respective values.
Filter queries will then be able to match the exact mappings, e.g. fq=corporate_pvt_rate_type:1
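As a rough illustration of the dynamic-field approach, the following Python sketch turns the two parallel lists into one field per rate type before indexing. The helper name `to_dynamic_fields` is hypothetical, and it assumes each rate type appears at most once per document, since the dynamicField shown above is not multivalued.

```python
def to_dynamic_fields(rate_types, rate_set_ids, suffix="_pvt_rate_type"):
    """Map parallel lists to fields such as corporate_pvt_rate_type = "1".

    Assumes each rate type occurs at most once per document.
    """
    fields = {}
    for rate_type, set_id in zip(rate_types, rate_set_ids):
        fields[rate_type.lower() + suffix] = set_id
    return fields


doc = to_dynamic_fields(["CORPORATE", "AGENCY"], ["1", "2"])
print(doc)

# fq=agency_pvt_rate_type:1 would not match this document,
# while fq=corporate_pvt_rate_type:1 would.
print(doc.get("agency_pvt_rate_type") == "1")     # False
print(doc.get("corporate_pvt_rate_type") == "1")  # True
```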
Unfortunately Solr does not seem to support this.
Another way to do this in Solr would be to store a concatenated string field, type_and_id, with a delimiter (say, a comma) separating the type and the id, and query like:
q=type_and_id:AGENCY%2C1
(where %2C is the URL encoding for comma).
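The concatenation idea can be sketched in Python as follows. `pair_values` is a hypothetical helper that would run before indexing; the field name type_and_id and the comma delimiter come from the answer above, and quote from the standard library shows the URL encoding of the comma.

```python
from urllib.parse import quote


def pair_values(rate_types, rate_set_ids, delimiter=","):
    """Join values at the same position into one string per pair."""
    return [f"{rate_type}{delimiter}{set_id}"
            for rate_type, set_id in zip(rate_types, rate_set_ids)]


type_and_id = pair_values(["CORPORATE", "AGENCY"], ["1", "2"])
print(type_and_id)                  # ['CORPORATE,1', 'AGENCY,2']

# AGENCY with id 1 is not among the indexed pairs, so the document
# from the question would not match q=type_and_id:AGENCY%2C1.
print("AGENCY,1" in type_and_id)    # False
print(quote("AGENCY,1", safe=""))   # AGENCY%2C1
```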

How would I search for blank facets and an actual value in a multivalued facet field at the same time in Solr?

I have an application where users can pick car parts. They pick their vehicle and then pick vehicle attributes as facets. After they select their vehicle, they can pick facets like engine size, for example, to narrow down the list of results. The problem was, not all documents have an engine size (it's an empty value in Solr), as it doesn't matter for all parts. For example, an engine size rarely matters for an air filter. So even if a user picked 3.5L for their engine size, I still wanted to show the air filters on the screen as a possible part the user could pick.
I did some searching and the following facet query works perfectly:
enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])
This query would match either 3.5 or would match records where there was no value for the engine size field (no value meant it didn't matter, and it fit the car). Perfect...
THE PROBLEM: I recently made the vehicle attribute fields multivalued fields, so I could store attributes for each part as a list. I then applied faceting to it, and it worked fine. However, the problem came up when I applied the query previously mentioned above. While selecting the enginesize facet narrowed down the number of documents displayed to only documents that have that engine size, records (I also use the word record to mean document) that had empty values (i.e. "") for enginesize were not appearing. The same query above does not work for multivalued facets the same way it did when enginesize was a single valued field.
Example:
<doc>
<str name="part">engine mount</str>
<arr name="enginesize">
<str/>
<str/>
<str>3.5</str>
<str>3.5</str>
<str>3.5</str>
<str>3.5</str>
<str>3.5</str>
</arr>
</doc>
<doc>
<str name="part">engine bolt</str>
<arr name="enginesize">
<str>6</str>
<str>6</str>
<str>6</str>
<str>6</str>
<str>6</str>
</arr>
</doc>
<doc>
<str name="part">air filter</str>
<arr name="enginesize">
<str/>
<str/>
<str></str>
<str></str>
<str></str>
<str></str>
<str></str>
</arr>
</doc>
What I am looking for is a query that will pull back documents 1 and 3 above when I do a facet search for engine size 3.5. The first document (the engine mount) matches because one of the values of its multivalued "enginesize" field is the value I am looking for (3.5). However, the third document, the air filter, doesn't get returned because of the empty <str> values. I do not want to return the second document at all, because it doesn't match the facet value.
I basically want a query that matches empty string values for a given facet and also matches the actual value, so that I get both documents returned.
Does someone have a query that would return document 1 and document 3 (the engine mount and the air filter), but not the engine bolt document?
I tried the following without success (including the one at the very top of this question):
// returns everything
enginesize:"3.5" OR (enginesize:[* TO *] )
// only returns document 1
enginesize:"3.5" OR (enginesize:["" TO ""] AND -enginesize:"3.5")
// only returns document 1
enginesize:"3.5" OR (enginesize:"")
I imported the data above using a CSV file, with keepEmpty=true set for the field. I then tried manually inserting a space into the field when generating the CSV file (which gives you <str> </str> instead of the previous <str/>) and retried the queries. Doing that, I got the following results:
// returns document 1
enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])
// returns all documents
enginesize:"3.5" OR (enginesize:["" TO ""] AND -enginesize:"3.5")
// returns all documents
enginesize:"3.5" OR (enginesize:"")
Does anyone have a query that would work for either situation, whether I have a space as the blank value or simply no value at all?
How about changing how you index, instead of how you query?
Instead of trying to index "engine size doesn't matter" as an empty record, index it as "ANY".
Then your query simply becomes enginesize:"3.5" OR (enginesize:ANY)
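A minimal Python sketch of that index-time substitution, assuming the documents are preprocessed before being sent to Solr. The helper names `normalize_enginesize` and `matches` are hypothetical, ANY is the marker proposed above, and `matches` only models the query enginesize:"3.5" OR (enginesize:ANY) on in-memory lists.

```python
def normalize_enginesize(values, marker="ANY"):
    """Replace blank or whitespace-only entries with an explicit marker."""
    return [(value or "").strip() or marker for value in values]


def matches(values, wanted, marker="ANY"):
    """Model of the query: enginesize:"<wanted>" OR enginesize:ANY."""
    return wanted in values or marker in values


parts = {
    "engine mount": ["", "", "3.5", "3.5", "3.5", "3.5", "3.5"],
    "engine bolt": ["6", "6", "6", "6", "6"],
    "air filter": ["", "", "", "", "", "", ""],
}

indexed = {part: normalize_enginesize(values) for part, values in parts.items()}
selected = [part for part, values in indexed.items() if matches(values, "3.5")]
print(selected)  # ['engine mount', 'air filter']
```

The same normalization covers both import variants from the question, the empty string and the single space.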
I've just been playing with this and found a hint that seems to do the trick for me. Translated to your query, it should be:
enginesize:"3.5" OR (-enginesize:["" TO *])
hth,
andi
Update: after some more testing I don't think this works reliably; for some indexes it had to be the other way round and without the minus sign, i.e. enginesize:[* TO ""]. This might depend on the index type, on whether the field is multivalued, or even on the actual values.
In any case it seems too much of a hack. I'll probably resort to substituting the empty value with a special marker...
I had the same problem, but solved it in https://stackoverflow.com/a/35633038/13365:
enginesize:"3.5" OR (*:* NOT enginesize:["" TO *])
The -enginesize solution didn't work for me.

Resources