I'm using Sunspot Solr on Rails for search.
In my class definition I have something like this (simplified from my real one):
searchable do
  text :name
  integer :count
  boolean :priority
end
Is there any way for me to access the integer and boolean field for their values directly in the hit results from a search?
I see there is the option to set up attribute fields with stored => true, which makes them available on the hit objects. However, the integer and boolean fields are clearly already kept somewhere as-is, since I can sort and filter on them; I just can't access them through the hit object's stored method. Is there any way I can get them out for display?
If the answer to this is no, what exactly is stored => true doing when passed to an integer or boolean field?
I have a fairly large index on Websolr, and reindexing over this with stored => true would be a bit prohibitive.
Common field options
indexed=true|false
True if this field should be "indexed".
If (and only if) a field is indexed, then it is searchable, sortable, and facetable.
stored=true|false
True if the value of the field should be retrievable during a search.
Indexed and stored data are maintained separately. Indexed data is used internally by Solr for its operations (searching, sorting, faceting).
If you want a value to be available for display, you have to index the document with stored set to true for that field.
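To make the distinction concrete, here is a toy model in Python. It is only a sketch of the idea, not how Solr is implemented, and the class and method names (ToyIndex, filter, fetch) are made up for illustration. Every field goes into the inverted index, so it can be filtered on, but only fields marked as stored can be read back from a hit.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model of indexed vs. stored fields; not Solr's real data structures."""

    def __init__(self, stored_fields):
        self.stored_fields = set(stored_fields)
        self.inverted = defaultdict(set)  # (field, value) -> set of doc ids
        self.stored = {}                  # doc id -> {field: original value}

    def add(self, doc_id, doc):
        # Every field is "indexed": it can be used for lookups.
        for field, value in doc.items():
            self.inverted[(field, value)].add(doc_id)
        # Only fields flagged as stored keep a retrievable copy.
        self.stored[doc_id] = {
            f: v for f, v in doc.items() if f in self.stored_fields
        }

    def filter(self, field, value):
        return sorted(self.inverted.get((field, value), set()))

    def fetch(self, doc_id):
        return self.stored[doc_id]


index = ToyIndex(stored_fields=["name"])
index.add(1, {"name": "alpha", "count": 3, "priority": True})
index.add(2, {"name": "beta", "count": 7, "priority": False})

matching = index.filter("priority", True)  # filtering works on any indexed field
hit = index.fetch(matching[0])             # but only stored fields come back
```

In this model, filtering on priority finds document 1, yet the fetched hit contains only name, which mirrors the situation described in the question.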
Related
I have an indexed multi-value field. If I add a parallel multi-value field, can I rely on both fields keeping the same order?
Consider this CSV, with | as the multi-value separator:
ID,Name,Number
988,Sixth|Second|Third,6|2|3
989,Fifth|Fourth|First,5|4|1
If I get the records by id (not search), can I be sure that the arrays of two fields are always in the original matching order?
{
  "doc": {
    "ID": 988,
    "Name": ["Sixth", "Second", "Third"],
    "Number": [6, 2, 3]
  }
}
Yes, the order is deterministic and stable. You can safely assume that the sequence of multivalued fields is kept intact. The post detailing this on the mailing list has since disappeared.
There is no guarantee about the ordering of the fields themselves, i.e. "Number" can come before "Name", but within a multivalued field the values will be returned in the same order as they were indexed.
We've been running applications on Solr since 2008 that depend on this behavior and it has never been an issue.
If there are many fields where you need to know that [0] in one field corresponds to [0] in another field, etc., it might be more useful to add a stored-only (not indexed) field holding a JSON representation of the structure, and to index the other fields without storing them. That keeps the application-level code simpler.
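A minimal sketch of that last suggestion in Python, assuming a hypothetical stored-only field called pairs_json (the field name and both helper functions are made up for illustration). The parallel lists are still sent for indexing, while the application reads the pairing back from the single JSON field.

```python
import json


def build_doc(doc_id, names, numbers):
    """Build a document with parallel fields plus one JSON field for display."""
    if len(names) != len(numbers):
        raise ValueError("parallel lists must have the same length")
    pairs = [{"name": n, "number": k} for n, k in zip(names, numbers)]
    return {
        "ID": doc_id,
        "Name": names,                    # would be indexed, not stored
        "Number": numbers,                # would be indexed, not stored
        "pairs_json": json.dumps(pairs),  # hypothetical stored-only field
    }


def read_pairs(stored_doc):
    """Recover the (name, number) pairs from the stored JSON field."""
    return [(p["name"], p["number"]) for p in json.loads(stored_doc["pairs_json"])]


doc = build_doc(988, ["Sixth", "Second", "Third"], [6, 2, 3])
pairs = read_pairs(doc)
```

With this layout the application never has to line up two arrays by position; the pairing is explicit in the stored value.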
I'm looking at a very old Solr instance (4-6 years since it was last touched), and I am seeing these extra dynamic fields, 'f_' and 'fs_', for multi- and single-valued facet fields.
My understanding, though, is that facets only happen at query time.
Also, it's just a copy: the fields don't change type.
So before I nuke these fields to kingdom come, is there a reason to have facet fields in an index when they are just copies of another field?
Thanks
Saying that facets only happen at query time is a bit misleading: the content (the tokens) that the facet is computed from is generated at index time. A facet gives the number of distinct documents that contain a specific token.
That means that if the field type is identical and there is only one field being copied into the other named field, the behaviour between the source and the destination field should be identical.
However, if multiple fields are copied into the same destination field, the results will differ. Also be aware that the type is determined by the schema definition of the destination field; the copyField instruction does not change it in any way. The copy happens before any content runs through the indexing chain for the field.
Usually you want facets to be generated on string fields so that the indexed values are kept as-is, while you want to use a text field or similar for searching (with tokenization), since a string field would only give exact (including matching case) hits.
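To illustrate why the field type matters for facets, here is a small Python sketch. The two "analyzers" are simplified stand-ins (an exact-value string field and a lowercasing whitespace tokenizer), and the documents are made up; it only shows how facet counts follow from whatever tokens were produced at index time.

```python
from collections import Counter

# Made-up documents: doc id -> field value
docs = {1: "Red Apple", 2: "red apple", 3: "Green Apple"}


def string_tokens(value):
    # Stand-in for a string field: the value is kept as one token, as-is.
    return [value]


def text_tokens(value):
    # Stand-in for a tokenized text field: lowercase, split on whitespace.
    return value.lower().split()


def facet_counts(docs, analyzer):
    """Count how many documents contain each indexed token."""
    counts = Counter()
    for value in docs.values():
        for token in set(analyzer(value)):  # count each doc once per token
            counts[token] += 1
    return dict(counts)


string_facets = facet_counts(docs, string_tokens)
text_facets = facet_counts(docs, text_tokens)
```

The string variant yields one bucket per exact value (with "Red Apple" and "red apple" kept apart), while the tokenized variant yields buckets for individual lowercase words, which is rarely what you want to show as facet labels.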
When Solr returns a document, the field values match those that were passed to the Solr indexer.
However, especially for TextFields, Solr typically indexes a modified value where (depending on the definition in schema.xml) various filters are applied, typically:
conversion to lower case
replacing of synonyms
removal of stopwords
application of stemming
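As a rough illustration of the difference between the returned value and the indexed terms, here is a toy filter chain in Python. It is not Solr's analysis code: the stopword set, the synonym map and the suffix-stripping "stemmer" are all made-up simplifications.

```python
STOPWORDS = {"the", "a", "of"}   # made-up stopword list
SYNONYMS = {"tv": "television"}  # made-up synonym map


def naive_stem(token):
    # Very crude suffix stripping, only to mimic what a stemmer does.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    tokens = text.lower().split()                       # lower case + tokenize
    tokens = [SYNONYMS.get(t, t) for t in tokens]       # replace synonyms
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    return [naive_stem(t) for t in tokens]              # apply stemming


stored_value = "The TV is Showing Movies"  # what a search result would return
indexed_terms = analyze(stored_value)      # what would be searched against
# indexed_terms == ["television", "is", "show", "movi"]
```

The document comes back with the original sentence, while matching happens against the transformed terms.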
One can see the result of the conversion for specific texts by using Solr Admin > Some core > Analysis. There is also a tool called Luke, and the LukeRequestHandler, but it seems they only let me view the values passed to Solr, not the transformed variant. One can also look at the index data on disk, but it seems to be stored in a binary format.
However, none of these seems to let me see the actual value as stored in the index.
The reason for asking is that I've created a text field based on a certain filter chain which, according to Solr Admin > Analysis, transforms the text correctly. However, when I search for a specific word from the transformed text, the search doesn't find it.
I have a multivalued field named "disabled". I will be running a filter query on this field that searches for a specific value, e.g. fq=disabled:I
I could instead map the value "I" to an integer, store the corresponding integers in Solr, and run the filter query on integers.
So I wanted to know: from a performance point of view, is it better to define the field as solr.TrieIntField or as solr.StrField?
You will not notice any difference. If you were going to run range queries, it might be more efficient to store the value in an int field, but for a simple lookup it will not make a difference.
Note too that the Trie numeric types are no longer the latest ones; there are Point-based numeric types (for example IntPointField) that seem to be even better.
I have a string field 'tags' and I want to list all indexed values for 'tag' from Solr.
Is there some introspection API in order to get hold of all values as JSON or XML?
You can use TermsComponent.
The TermsComponent SearchComponent is a simple component that provides access to the indexed terms in a field and the number of documents that match each term.
This will return all the indexed terms. You can specify the field whose terms you want to retrieve.
http://localhost:8983/solr/terms?terms.fl=tag&terms.sort=index
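If you want to do this from code, here is a small Python sketch that builds the request URL and turns a JSON response into a dictionary. Some assumptions: the helper names are made up, terms.limit=-1 is added to lift the default cap on the number of returned terms, and the sample response below is invented data that assumes the default JSON layout, where terms and counts alternate in one flat list. Check the layout your Solr version actually returns before relying on it.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, limit=-1):
    """Build a TermsComponent request URL for one field."""
    params = {
        "terms.fl": field,
        "terms.sort": "index",
        "terms.limit": limit,  # -1 asks for all terms
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def parse_terms(response, field):
    """Turn a flat [term, count, term, count, ...] list into a dict."""
    flat = response["terms"][field]
    return dict(zip(flat[0::2], flat[1::2]))


url = terms_url("http://localhost:8983/solr/terms", "tag")

# Invented sample response, assuming the flat alternating layout.
sample = {"terms": {"tag": ["java", 3, "ruby", 5, "solr", 8]}}
counts = parse_terms(sample, "tag")
```

The resulting URL can be fetched with any HTTP client, and the keys of counts are the indexed values of the field.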