Searching multi-valued fields in the same position - solr

Let's say we have a document like this:
<arr name="pvt_rate_type">
<str>CORPORATE</str>
<str>AGENCY</str>
</arr>
<arr name="pvt_rate_set_id">
<str>1</str>
<str>2</str>
</arr>
Now I want a search to return the document only if it contains pvt_rate_set_id = 1 AND pvt_rate_type = AGENCY at the same position in their multi-valued fields. The above document should NOT be returned, because pvt_rate_set_id 1 is paired with a pvt_rate_type of CORPORATE.
Is this possible at all in Solr, or is my schema badly designed? How else would you design that schema to allow the searching I want?

This may not be available out of the box.
You would need to modify the schema so that the rate type becomes part of the field name and the id is its value
e.g.
CORPORATE=1
AGENCY=2
This can be achieved by having dynamic fields defined.
e.g.
<dynamicField name="*_pvt_rate_type" type="string" indexed="true" stored="true"/>
So you can input data as corporate_pvt_rate_type or agency_pvt_rate_type with the respective values.
The filter queries will be able to match the exact mappings fq=corporate_pvt_rate_type:1
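As a rough illustration of the indexing side, the sketch below pairs the two parallel arrays by position and emits one field per rate type. The helper name to_dynamic_fields is hypothetical, and storing the ids as lists assumes the dynamic field would be declared multiValued="true" if one rate type can carry several ids.

```python
def to_dynamic_fields(rate_types, rate_set_ids, suffix="_pvt_rate_type"):
    """Pair values by position and emit one field per rate type.

    Hypothetical helper: field names follow the *_pvt_rate_type
    dynamic field pattern suggested in the answer.
    """
    if len(rate_types) != len(rate_set_ids):
        raise ValueError("parallel arrays must have the same length")
    doc = {}
    for rate_type, set_id in zip(rate_types, rate_set_ids):
        field = rate_type.lower() + suffix
        # A list is used so that one rate type may hold several ids.
        doc.setdefault(field, []).append(set_id)
    return doc


doc = to_dynamic_fields(["CORPORATE", "AGENCY"], ["1", "2"])
print(doc)
```

With the document from the question, id 1 ends up only under corporate_pvt_rate_type, so a filter on agency_pvt_rate_type:1 has nothing to match.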

Unfortunately Solr does not seem to support this.
Another way to do this in Solr would be to store a concatenated string field type_and_id with a delimiter (say comma) separating the type and the id and query like:
q=type_and_id:AGENCY%2C1
(where %2C is the URL encoding for comma).
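A minimal sketch of this approach, assuming the type_and_id field is an untokenized (string) multi-valued field so that each pair is matched as a whole. The helper names pair_values and pair_query are made up for illustration.

```python
from urllib.parse import quote


def pair_values(rate_types, rate_set_ids, delimiter=","):
    """Build the concatenated values to index in type_and_id."""
    return [f"{t}{delimiter}{i}" for t, i in zip(rate_types, rate_set_ids)]


def pair_query(rate_type, set_id, field="type_and_id", delimiter=","):
    """Build the q parameter, URL-encoding the delimiter."""
    value = quote(f"{rate_type}{delimiter}{set_id}", safe="")
    return f"q={field}:{value}"


values = pair_values(["CORPORATE", "AGENCY"], ["1", "2"])
query = pair_query("AGENCY", "1")
print(values)
print(query)
```

The indexed values for the question's document are CORPORATE,1 and AGENCY,2, so a query for AGENCY,1 would not match it.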

Related

Using logic AND in a text field

I'm using a schema that has a text field containing ids separated by spaces. The field definition in schema is below:
<field name="aux_identifiers" type="text" indexed="true" stored="true"/>
A query that fetches a single document returns the field like this:
<str name="aux_identifiers">1 2 3 4</str>
Is there any possibility to apply a logical AND operator to this field? I need to find the documents that have, for example, the ids 2 and 3 in the field.
FYI, we can't modify those fields to multivalued or array and reindex right now. That's why I'm trying an alternative solution.
It would depend on what kind of processing you have on that field, but this should work:
q=aux_identifiers:2 AND aux_identifiers:3
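If the list of ids varies per request, the query string can be assembled programmatically. This is only a sketch: the helper name all_ids_query is hypothetical, and the field name is the one from the schema definition in the question.

```python
def all_ids_query(ids, field="aux_identifiers"):
    """Join one clause per id with AND so every id must be present."""
    return " AND ".join(f"{field}:{i}" for i in ids)


query = all_ids_query([2, 3])
print(query)
```

Whether each id matches as a separate term still depends on how the text field type tokenizes the space-separated value.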

preserve association in multivalued in solr

I have multivalued fields in my Solr datasource. A sample document is:
<doc>
<str name="id">23606</str>
<arr name="institution">
<str>Harvard University</str>
<str>Yale Universety</str>
<str>Cornell University</str>
<str>TUFTS University</str>
<str>University of Arizona</str>
</arr>
<arr name="degree_level">
<str>Bachelors</str>
<str>Diploma</str>
<str>Master</str>
<str>Master</str>
<str>PhD</str>
</arr>
</doc>
In the example above, this user has a Bachelors degree from Harvard, a Diploma from Yale, a Master from Cornell, a Master from TUFTS, and a PhD from Arizona.
Now if I search for users who have a Bachelors degree and graduated from Harvard, I get this user, which is correct.
MyDomain:8888/solr/mycol/select?facet=true&q=*:*&fq=degree_level:Bachelors&fq=institution:Harvard+University
But if I search for those who have a Bachelors from Cornell, I get this user as well, which is incorrect!
MyDomain:8888/solr/mycol/select?facet=true&q=*:*&fq=degree_level:Bachelors&fq=institution:Cornell+University
The question is: how can I preserve the ordering/mapping between multivalued fields in Solr?
Edit:
By the way, I know that I can solve my problem by creating a new field that contains the concatenation of the degree with the university (i.e. "Bachelors_Harvard University", "Diploma_Yale Universety", and so on), but I need a solution based on the Solr core itself, as I have a lot of multivalued fields with a lot of combinations.
Below is a list of some suggestions.
Try using dynamic fields:
<dynamicField name="degree_level_*" type="string" indexed="true" stored="true" />
and create fields dynamically while indexing, e.g. degree_level_Bachelors with the value Harvard University, and so on. When you want to filter on a Bachelors degree, filter on the field degree_level_Bachelors. Similarly, if you want to allow filtering on institutions, create a dynamic field for institutions.
Alternatively, you can predefine a format for storing the data in a single field:
<year><separator><degree><separator><institution><separator><major> etc.
and then filter on the required pattern, e.g.:
fq=educationDetails:2009#Bachelors#Harvard#*
This will give you all records with a Bachelors from Harvard in 2009.
You will have to come up with the patterns for all the different filters.
Use two collections to correctly model the one-to-many relationship between user and degree, queried using {!join}.
Or use one collection at a "user-degree" level of granularity that gets deduplicated via Solr's field collapsing support.
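The "user-degree" granularity idea can be sketched as a flattening step before indexing: each (institution, degree) pair becomes its own document, so a filter on both fields can only match a real pair. The field name user_id, the id scheme, and the helper names are assumptions made for this example, and has_pair is just a local stand-in for two fq filters.

```python
def to_user_degree_docs(user):
    """Emit one document per (institution, degree_level) pair."""
    institutions = user["institution"]
    degrees = user["degree_level"]
    if len(institutions) != len(degrees):
        raise ValueError("parallel arrays must have the same length")
    return [
        {
            "id": f"{user['id']}_{n}",   # hypothetical unique key scheme
            "user_id": user["id"],       # hypothetical field to collapse on
            "institution": inst,
            "degree_level": deg,
        }
        for n, (inst, deg) in enumerate(zip(institutions, degrees))
    ]


def has_pair(docs, degree, institution):
    """Local stand-in for fq=degree_level:... and fq=institution:..."""
    return any(
        d["degree_level"] == degree and d["institution"] == institution
        for d in docs
    )


user = {
    "id": "23606",
    "institution": ["Harvard University", "Yale Universety",
                    "Cornell University", "TUFTS University",
                    "University of Arizona"],
    "degree_level": ["Bachelors", "Diploma", "Master", "Master", "PhD"],
}
docs = to_user_degree_docs(user)
print(has_pair(docs, "Bachelors", "Harvard University"))
print(has_pair(docs, "Bachelors", "Cornell University"))
```

Because one user now yields several documents, results would need to be collapsed back to one per user, for example with result grouping on the user_id field.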

Query for solr empty field

I have a list of profiles in Solr, indexed with some empty fields (e.g. country, category).
<arr name="country">
<str>185</str>
</arr>
<arr name="category">
<int>38</int>
</arr>
I want to search profiles with no country. I used -country:['' TO *] as country is a string field.
Now how can I check it for an integer field? The field may be empty (no field) or with value 0. I tried category:0 but it is not giving me the correct output, output is empty in this case.
For a string field you could query like
-country:* OR country:""
This gives you all Solr documents that have no country field at all, or that have country set to an empty string.
For an integer field
-category:* OR category:0
This gives you documents where the field is missing or has the value 0.
There is a known issue with combining OR and NOT in Solr queries. Please refer to "using OR and NOT in solr query" for a better understanding of how the two interact.
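The OR/NOT issue mentioned above usually comes down to a purely negative clause matching nothing when it is combined with other clauses. A common workaround, sketched here with a hypothetical helper, is to anchor the negative clause on *:* so it starts from all documents before excluding those that have the field.

```python
def missing_or_value(field, value):
    """Match documents where the field is absent OR equals the value.

    The negative clause is anchored on *:* so it is not a purely
    negative sub-query.
    """
    return f"(*:* -{field}:*) OR {field}:{value}"


int_query = missing_or_value("category", 0)
str_query = missing_or_value("country", '""')
print(int_query)
print(str_query)
```

This is an assumption about the cause of the empty result in the question, not something confirmed by the original answer.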

solr facet field values appear to be generated by solr

I want to facet on a specific field. The field is defined as
<field name="specials_de" type="textgen" indexed="true" stored="true" multiValued="true"/>
Two of the values in specials_de are "Cunard Hamburg Specials" and "Cunard the New Yorker". I want to use these two values as facets, but the solr query returns facet fields with values like
<int name="cunard">11</int>
<int name="new">9</int>
<int name="yorker">9</int>
<int name="hamburg">5</int>
<int name="hamburgspecialscunard">3</int>
<int name="hamburgspecials">2</int>
What am I doing wrong?
Just to clarify: I'm not referring to the counts (11, 9, etc.), but to the names, i.e. "cunard", "new", etc.
Text fields are not recommended for faceting.
You won't get the desired behavior, because text fields are tokenized and filtered. That produces multiple tokens per value, and those tokens are what you see in the facets returned in the response.
From SolrFacetingOverview:
Because faceting fields are often specified to serve two purposes,
human-readable text and drill-down query value, they are frequently
indexed differently from fields used for searching and sorting:
They are often not tokenized into separate words
They are often not mapped into lower case
Human-readable punctuation is often not removed (other than double-quotes)
There is often no need to store them, since stored values would look much like indexed values and the faceting mechanism is used for
value retrieval.
Try using string fields instead; they should be good enough for faceting and avoid the analysis overhead.
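To see why the facet names change, the toy model below counts facet values under two analysis functions. The lowercase-and-split function is only a stand-in for the textgen analyzer (which applies more filters), and the three sample documents are invented for illustration, not taken from the asker's index.

```python
from collections import Counter


def facet_counts(docs, analyze):
    """Count, per indexed term, how many documents contain it."""
    counts = Counter()
    for values in docs:
        terms = set()
        for value in values:
            terms.update(analyze(value))
        counts.update(terms)
    return counts


def as_string(value):
    """String field: the whole value is a single term."""
    return [value]


def as_text(value):
    """Toy text analysis: lowercase and split on whitespace."""
    return value.lower().split()


# Invented sample data: each inner list is one document's specials_de values.
docs = [
    ["Cunard Hamburg Specials"],
    ["Cunard the New Yorker"],
    ["Cunard Hamburg Specials", "Cunard the New Yorker"],
]
text_facets = facet_counts(docs, as_text)
string_facets = facet_counts(docs, as_string)
print(text_facets.most_common(3))
print(string_facets.most_common())
```

In the string case the facet names are the original values, which is what the question is after.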

Solr query must match all words/tokens in a field

I have a text-field called name in my schema.xml. A query q=name:(organic) returns the following documents:
<doc>
<str name="id">ontology.category.1483</str>
<str name="name">Organic Products</str>
</doc>
<doc>
<str name="id">ontology.keyword.4896</str>
<str name="name">Organic Stores</str>
</doc>
This is perfectly right in a normal Solr Search, however I would like to construct the query so that it doesn't return anything because 'organic' only matches 1 of the 2 words available in the field.
A better way to say it could be this: Only return results if all tokens in the field are matched. So if there are two words (tokens) in a field and I only match 1 ('organic', 'organics','organ' etc.) I shouldn't get a match because only 50% of the field has been searched on.
Is this possible in Solr? How do I construct the query?
You are probably using StandardTokenizerFactory (or something similar). One solution is to use KeywordTokenizerFactory and issue a phrase query, so that only exact matches on the whole field value will work. Remember to add any other filters you might want (such as LowerCaseFilterFactory). Note that "stores organic" will not match your doc either.
Due to time constraints, I had to resort to the following (hacky) solution.
I added the term count to the index via a dynamic field called tc_i.
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
Now at query time I count the terms and append it to the query, so q=name:(organic) becomes q=name:(organic) AND tc_i:(1) and this won't return documents for "organic stores" / "organic products" obviously because their tc_i fields are set at 2 (two words).
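A small sketch of both halves of this workaround follows. Counting terms by splitting on whitespace is an assumption; for the trick to work, the count must agree with however the name field is actually tokenized. The helper names are hypothetical.

```python
def term_count(text):
    """Count whitespace-separated terms (assumed to mirror the analyzer)."""
    return len(text.split())


def index_doc(doc_id, name):
    """Build the document to index, including the tc_i term count."""
    return {"id": doc_id, "name": name, "tc_i": term_count(name)}


def exact_length_query(user_input, field="name"):
    """Append the term count so only fields of the same length match."""
    return f"{field}:({user_input}) AND tc_i:({term_count(user_input)})"


doc = index_doc("ontology.keyword.4896", "Organic Stores")
query = exact_length_query("organic")
print(doc)
print(query)
```

Here the indexed document has tc_i set to 2 while the query asks for tc_i:(1), so "Organic Stores" is excluded.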

Resources