Is the order of multi-value fields reliable in Solr? - solr

I have an indexed multi-value field. If I add a parallel multi-value field, is it reliable to have the same order?
Consider this CSV and separator |:
ID,Name,Number
988,Sixth|Second|Third,6|2|3
989,Fifth|Fourth|First,5|4|1
If I get the records by id (not search), can I be sure that the arrays of two fields are always in the original matching order?
{
"doc":
{
"ID":988,
"Name":["Sixth",
"Second",
"Third"]},
"Number":[6,
2,
3]}}

Yes, the order is deterministic and stable. You can safely assume that the sequence of multivalued fields is kept intact. The post detailing this on the mailing list has since disappeared.
There is no guarantee about the ordering of the fields - i.e. "Number" can come before "Name", but internally in the field (the mulivalued part) the values will be returned in the same order as they were indexed.
We've been running applications on Solr since 2008 that depend on this behavior and it has never been an issue.
If there's many fields where you need to know that [0] in one field corresponds to [0] in another field, etc., it might be more useful to add a stored only (not indexed, etc.) JSON representation of the structure as a field and just index the other fields (and not store them) to make the application level code simpler.

Related

Is there ever a reason for copying fields in to a facet field in the index?

I'm looking at a very old solr instance (4-6 years since last touched), and I am seeing these extra dynamic fields, 'f_' and 'fs_' for multi and single valued facet fields.
My understanding, though, is that facets only happen in query-time.
Also, it's just a copy over - the fields dont change type.
So before I nuke these fields to kingdom come; is there a reason for facet fields in an index that is just a copied field?
Thanks
Facets only happening query time is a bit of a misnomer - the content (the tokens) that the facet represents from is generated when indexing. The facet gives the distinct number of documents that has a specific token present.
That means that if the field type is identical and there is only one field being copied into the other named field, the behaviour between the source and the destination field should be identical.
However, if there are multiple fields copying content into the same field, the results will differ. Also be aware that the type is given from the schema for the field, it's not changed by the copyField instruction in any way. A copy field operation happens before any content runs through the indexing chain for the field.
Usually you want facets to be generated on string fields so that the indexed values are kept as-is, while you want to use a text field or similar for searching (with tokenization), since a string field would only give exact (including matching case) hits.

In Solr, how can I do a range query on a multi-valued field with prefix?

This question has some older discussion but I am hoping newer versions might have an answer. My data contains prefixed multi-valued fields.
ex. CustomProperties:["Age:50", "BMI:25"].
I would like to be able to query BMI:[* TO 26].
Then index them as actual fields. Solr won't be able to do proper, integer based range searches when they're just multivalued strings like that.
Add a CustomProperties_* dynamic field type mapped as an integer or long (if all your values are integers), then add the values as CustomProperties_Age and CustomProperties_BMI. Querying the values will then be the same as for any other field:
q=CustomProperties_BMI:[* TO 26]

Hybris: Combine different solr facet under one

I have applied solr facet on properties of products.
Eg: The product can be either Medicine(0/1) or Drug(0/1) or Poison(0/1).
0 means NO, 1 means YES.
These are different features of a product hence appear as different facets. It is possible to display them under one facet instead eg: "Type", under which these three solr facet "Medicine", "Drug", "Poison" should display like:
Type
-----
Medicine (50)
Drug (100)
Poison (75)
Not sure about Hybris, but you should be able to do so with facet queries. You would have one facet query per each of your three conditions. In the UI, you can organize the counts anyway you want.
However, I am not sure why you can't just have a category field that contains a multi-valued field that contains Medicine and/or Drug and/or Poison value. Then faceting on that field would give you the breakdowns. If your values do not come in that way, you can probably manipulate them either with copyField or with a custom Update Request Processor chain to merge into one field.
This is super easy. Just make an IndexedProperty "Type" and a new custom ValueProvider for it. Then extract these values based on the boolean flags - just hard code if necessary. No need for anything more complex.
I tried the solutions posted here but they were not fitting my requirement. I did changes through facet navigation tag files to bring all classification attribute facets (Medicine, Drug, Poison) under a single facet (Type).

Solrnet facet returning spaces

I'm using Solrnet to return search results and am also requesting the facets, in particular categories which is a multi-valued field.
The problem I'm coming up against is that the category "house products" is being returned as two seperate facets because of the space.
Is there a way of ensuring this is returned as a single facet value, or should I be escaping the value when it is added to the index?
Thanks in advance
Al
If the tokens are generated for house products then you are using text analysis for the field.
Text fields are not suggested to be used for Faceting.
You won't get the desired behavior as the text fields would be tokenized and filtered leading to the generation of multiple tokens which you see from the facets returned as response.
Use a copy field to copy the field to a String field to be able to facet on it without splitting the words.
SolrFacetingOverview :-
Because faceting fields are often specified to serve two purposes,
human-readable text and drill-down query value, they are frequently
indexed differently from fields used for searching and sorting:
They are often not tokenized into separate words
They are often not mapped into lower case
Human-readable punctuation is often not removed (other than double-quotes)
There is often no need to store them, since stored values would look much like indexed values and the faceting mechanism is used for
value retrieval.
Try to use String fields and it would be good enough without any overheads.
The faceting works on tokens, so if you have a field that is tokenized in many words it will split the facet too.
I suggest you create another field of type string used only for faceting.

SOLR: Is it it possible to index multiple timestamp:value pairs per document?

Is it possible in solr to index key-value pairs for a single document, like:
Document ID: 100
2011-05-01,20
2011-08-23,200
2011-08-30,1000
Document ID: 200
2011-04-23,10
2011-04-24,100
and then querying for documents with a specific value aggregation in a specific time range, i.e. "give me documents with sum(value) > 0 between 2011-08-01 and 2011-09-01" would return the document with id 100 in the example data above.
Here is a post from the Solr User Mailing List where a couple of approaches for dealing with fields as key/value pairs are discussed.
1) encode the "id" and the "label" in the field value; facet on it;
require clients to know how to decode. This works really well for simple
things where the the id=>label mappings don't ever change, and are
easy to encode (ie "01234:Chris Hostetter"). This is a horrible approach
when id=>label mappings do change with any frequency.
2) have a seperate type of "metadata" document, one per "thing" that you
are faceting on containing fields for id and the label (and probably a
doc_type field so you can tell it apart from your main docs) then once
you've done your main query and gotten the results back facetied on id,
you can query for those ids to get the corrisponding labels. this works
realy well if the labels ever change (just reindex the corrisponding
metadata document) and has the added bonus that you can store additional
metadata in each of those docs, and in many use cases for presenting an
initial "browse" interface, you can sometimes get away with a cheap
search for all metadata docs (or all metadata docs meeting a certain
criteria) instead of an expensive facet query across all of your main
documents.

Resources