I've a string field defined in SOLR that is populated with values such as "020001" and "50002" etc...
I require it to be a string field since I tokenize it for filtering purposes. Now, when I try to sort on this field it shows wrong order (not even ASCII). Is there a way to sort this field in the asc./desc. order? Thanks.
If you sort on a tokenized field, you're probably sorting on a multivalued field, which is something which will probably not give you the result you're expecting.
You can solve this by adding a dedicated field for sorting that contain the value you want to sort by, which will also allow you to use a more proper field type (such as tint) if you want to parse the value into an integer or still use it as a String value.
Related
I'm looking at a very old solr instance (4-6 years since last touched), and I am seeing these extra dynamic fields, 'f_' and 'fs_' for multi and single valued facet fields.
My understanding, though, is that facets only happen in query-time.
Also, it's just a copy over - the fields dont change type.
So before I nuke these fields to kingdom come; is there a reason for facet fields in an index that is just a copied field?
Thanks
Facets only happening query time is a bit of a misnomer - the content (the tokens) that the facet represents from is generated when indexing. The facet gives the distinct number of documents that has a specific token present.
That means that if the field type is identical and there is only one field being copied into the other named field, the behaviour between the source and the destination field should be identical.
However, if there are multiple fields copying content into the same field, the results will differ. Also be aware that the type is given from the schema for the field, it's not changed by the copyField instruction in any way. A copy field operation happens before any content runs through the indexing chain for the field.
Usually you want facets to be generated on string fields so that the indexed values are kept as-is, while you want to use a text field or similar for searching (with tokenization), since a string field would only give exact (including matching case) hits.
I have a field named "disabled" which is multivalued. I will be running a filter query on this field which will basically search for a specific value on this field i.e. fq=disabled:I
I can map the value "I" to an integer and store the corresponding integers into solr and do filter query based on integers.
So I wanted to know if it is better to store the field as solr.trieInt or solr.strField type is better from a performance point of view?
You will not notice any difference. If you were going to run range queries, then it might be more efficient to store it in a Int field, but for a simple lookup, it will not make a difference.
Note too, that 'trie' versions of numeric are not the latest ones, there are Point based numeric types that seem to be even better.
I need to find the most frequently used words in a given field from a selected set of documents. I tried luke handler,
http://localhost:8983/solr/admin/luke?fl=my_field&numTerms=1
But this query gives results considering whole content.
Assuming your field tokenizes to your definition of the word, you can just use faceting for that. That's why faceting fields are usually strings, because the algorithm looks at the tokens generated.
So, in your case, you want the opposite effect.
I'm using Solrnet to return search results and am also requesting the facets, in particular categories which is a multi-valued field.
The problem I'm coming up against is that the category "house products" is being returned as two seperate facets because of the space.
Is there a way of ensuring this is returned as a single facet value, or should I be escaping the value when it is added to the index?
Thanks in advance
Al
If the tokens are generated for house products then you are using text analysis for the field.
Text fields are not suggested to be used for Faceting.
You won't get the desired behavior as the text fields would be tokenized and filtered leading to the generation of multiple tokens which you see from the facets returned as response.
Use a copy field to copy the field to a String field to be able to facet on it without splitting the words.
SolrFacetingOverview :-
Because faceting fields are often specified to serve two purposes,
human-readable text and drill-down query value, they are frequently
indexed differently from fields used for searching and sorting:
They are often not tokenized into separate words
They are often not mapped into lower case
Human-readable punctuation is often not removed (other than double-quotes)
There is often no need to store them, since stored values would look much like indexed values and the faceting mechanism is used for
value retrieval.
Try to use String fields and it would be good enough without any overheads.
The faceting works on tokens, so if you have a field that is tokenized in many words it will split the facet too.
I suggest you create another field of type string used only for faceting.
In apache Solr why do we always need to prefer string field over text field if both solves purposes?
How string or text affects the parameters like index size, index read, index creation?
The fields as default defined in the solr schema are vastly different.
String stores a word/sentence as an exact string without performing tokenization etc. Commonly useful for storing exact matches, e.g, for facetting.
Text typically performs tokenization, and secondary processing (such as lower-casing etc.). Useful for all scenarios when we want to match part of a sentence.
If the following sample, "This is a sample sentence", is indexed to both fields we must search for exactly the text This is a sample sentence to get a hit from the string field, while it may suffice to search for sample (or even samples with stemmning enabled) to get a hit from the text field.
Adding to Johans Sjöbergs good answer:
You can sort a String but not a Text.