Introspecting stored values of a field - solr

I have a string field 'tags' and I want to list all indexed values for 'tag' from Solr.
Is there some introspection API in order to get hold of all values as JSON or XML?

You can use TermsComponent.
The TermsComponent SearchComponent is a simple component that provides access to the indexed terms in a field and the number of documents that match each term.
Set terms.fl to the field whose terms you want to retrieve. Note that only the first 10 terms are returned by default; add terms.limit=-1 to get all of them.
http://localhost:8983/solr/terms?terms.fl=tag&terms.sort=index
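As a rough sketch of how a client could build this request and read the answer: the helper names build_terms_url and parse_terms are hypothetical, the sample response in the usage note is made up, and the parsing assumes Solr's default JSON layout, where terms and counts alternate in one flat list. terms.limit and wt=json are additions to the URL above.

```python
from urllib.parse import urlencode


def build_terms_url(base_url, field, sort="index", limit=-1):
    """Build a TermsComponent request URL for one field (hypothetical helper)."""
    params = {
        "terms": "true",
        "terms.fl": field,      # field whose indexed terms are listed
        "terms.sort": sort,     # "index" = term order, "count" = by frequency
        "terms.limit": limit,   # a negative value lifts the default limit of 10
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def parse_terms(response, field):
    """Turn the flat [term, count, term, count, ...] list into a dict."""
    flat = response["terms"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```

For instance, parse_terms({"terms": {"tag": ["java", 3, "solr", 7]}}, "tag") gives a mapping from each term to its document count. Fetching the URL itself (with urllib.request or any HTTP client) is left out so the sketch does not depend on a running Solr instance.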

Related

Solr: facet.field counts each word in a field separately; how to apply facet.field to the whole field value?

I added the "MerchantName" field to facet.field and got the result below:
"facet_fields":{
  "MerchantName":[
    "amazon",133281,
    "factory",99566,
    "club",99566,
    "fashion",4905,
    "swish",4905,
    "store",1001,
    "swank",1001,
    "the",1001
  ]
}
In the above array, "club factory", "swish fashion" and "the swank store" are each a single field value, but as you can see, every word is treated as a separate facet value.
So how do I apply a facet query on the whole field, so that it returns an array of whole field values?
MerchantName is the field used for faceting. As you are using a text-based field with the field type text_general, each value is split into multiple tokens at index time, and the facet counts are computed per token, according to the way the value has been tokenized.
For the facet to use the whole text, define the field in schema.xml as a string (type="string"), which is not tokenized. After changing the field type you need to reindex.
You can also add docValues="true" to the MerchantName field; DocValues will then be used automatically any time the field is used for sorting, faceting or function queries.
DocValues are a special way of recording field values internally that is more efficient than traditional indexing for some purposes, such as sorting and faceting.
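The difference between the two field types can be imitated outside Solr. The following is only a toy model, not Solr's analysis chain: it assumes lowercasing and whitespace splitting stand in for text_general, and the function names are hypothetical.

```python
from collections import Counter


def facet_tokenized(values):
    """Count documents per token, roughly what a tokenized text field yields."""
    counts = Counter()
    for value in values:
        # Each document counts once per distinct token it contains.
        counts.update(set(value.lower().split()))
    return counts


def facet_whole(values):
    """Count documents per complete value, as an untokenized string field would."""
    return Counter(values)
```

With the values "club factory", "club factory" and "amazon", facet_tokenized counts "club" and "factory" separately, while facet_whole keeps "club factory" as one facet value.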

Index map values

I have data in which the fields have the following Java data types:
field_a map<string,string>
field_b map<string,array<string>>
What would be the best way to index this kind of data?
How should I define schema.xml for it?
Thanks.
Currently Solr doesn't support a map field type, so you cannot query for a particular key inside the map and retrieve its value. I don't know whether it will help you, but I can suggest a way to keep this data in Solr.
You can store the map in a field as a JSON-formatted string. Say document1 has map1 in field_a and document2 has map2 in field_a. Alongside each map, keep some distinguishing data about it in separate fields of the same document. When you want to search, query on those fields instead of the maps. Then, when you retrieve the JSON-formatted string from the search result, parse it in your application to get the values.
Hope this helps.
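A minimal sketch of that round trip on the application side. The field names field_a_json, field_b_json and field_a_keys are hypothetical, and indexing the map keys as the extra searchable data is just one possible choice.

```python
import json


def to_solr_doc(doc_id, field_a, field_b):
    """Flatten the two maps into plain fields that Solr can hold."""
    return {
        "id": doc_id,
        # The maps themselves go in as JSON-formatted strings.
        "field_a_json": json.dumps(field_a, sort_keys=True),
        "field_b_json": json.dumps(field_b, sort_keys=True),
        # Extra data to query on instead of the map (assumption: its keys).
        "field_a_keys": sorted(field_a),
    }


def from_solr_doc(doc):
    """Parse the JSON strings of a returned document back into maps."""
    return json.loads(doc["field_a_json"]), json.loads(doc["field_b_json"])
```

The application would query on field_a_keys (or whatever searchable fields it chooses) and call from_solr_doc on each hit to recover the original maps.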

Sunspot Solr accessing non-stored attribute fields in search results

I'm using Sunspot Solr on Rails for search.
In my class definition I have something like this (simplified from my real one):
searchable do
  text :name
  integer :count
  boolean :priority
end
Is there any way for me to access the integer and boolean field for their values directly in the hit results from a search?
I see there is the option to set up attribute fields with stored => true, which makes them available to the hit objects. However, the integer and boolean fields are clearly already stored as-is somewhere, since I am able to sort and filter on them; I just can't access them through the hit object's stored method. Is there any way I can get them out for display?
If the answer to this is no, what exactly is stored => true doing when passed to an integer or boolean field?
I have a fairly large index on Websolr, and reindexing over this with stored => true would be a bit prohibitive.
Common field options
indexed=true|false
True if this field should be "indexed".
If (and only if) a field is indexed, then it is searchable, sortable, and facetable.
stored=true|false
True if the value of the field should be retrievable during a search.
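Indexed and stored data are maintained differently. Indexed data is used internally by Solr for its operations, such as searching, sorting and filtering; it is not what is returned in the results.
If you want the data to be available for display, you have to index it with the stored attribute set to true.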
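The distinction can be pictured with a toy model. This is only an illustration of the idea, not how Solr or Lucene is implemented, and the class ToyIndex is hypothetical: the lookup structure used for filtering is kept apart from the per-document values that can be returned.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model: 'indexed' data answers filters, 'stored' data is returned."""

    def __init__(self):
        self.inverted = defaultdict(set)  # (field, value) -> ids of matching docs
        self.stored = {}                  # doc id -> {field: original value}

    def add(self, doc_id, fields, stored_fields=()):
        # Every field is indexed; only the named ones are also stored.
        for name, value in fields.items():
            self.inverted[(name, value)].add(doc_id)
        self.stored[doc_id] = {name: fields[name] for name in stored_fields}

    def filter(self, name, value):
        """Find documents by field value using only the indexed data."""
        return sorted(self.inverted.get((name, value), set()))

    def retrieve(self, doc_id, name):
        """Return the stored value of a field, or None if it was not stored."""
        return self.stored[doc_id].get(name)
```

A document added with stored_fields=("count",) can still be filtered by priority, but retrieve only returns its count, which mirrors the behaviour described in the question.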

Solr copyField mixed with RegexTransformer

Scenario:
In the database I have a field called Categories, which is of type string and contains pipe-delimited numbers such as 1|8|90|130|
What I want:
In Solr index, I want to have 2 fields:
Field Categories_pipe which would contain the exact string as in the DB i.e. 1|8|90|130|
Field Categories which would be a multi-valued field of type INT containing values 1, 8, 90 and 130
For the latter, in the entity specification I can use a regexTransformer then I specify the following field in data-config.xml:
<field column="Categories" name="Navigation" splitBy="\|"/> and then specify the field as multi-valued in schema.xml
What I do not know is how I can 'copy' the same field twice and perform regex splitting on only one copy. I know there is the copyField facility that can be defined in schema.xml; however, I can't find a way to transform the copied field because, from what I know (and I may be wrong here), transformers are only available in the entity specification.
As a workaround I can also send the same field twice from the entity query but in reality, the field Categories is a computed field (selects nested) which is somewhat expensive so I would like to avoid it.
Any help is appreciated, thanks.
Instead of splitting in data-config.xml, you could do it in your schema.xml. Here is what you could do:
Create a fieldType whose tokenizer is PatternTokenizerFactory, using a regex that splits on |.
FieldSplit: create a multivalued field using this new fieldType; it will eventually hold 1, 8, 90, 130.
FieldOriginal: create a string field (if you need no analysis on it) that preserves the original value 1|8|90|130|
Now you can use copyField to copy the value from one of these fields to the other, as you need; copyField copies the original value before any analysis is applied.
Check this question; it is similar.
You can create two columns from the same data and treat them separately.
SELECT categories, categories as categories_pipe FROM category_table
Then you can split the "categories" column, but index the other one as-is.
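What the two resulting fields should contain can be sketched in a few lines. This only illustrates the target values, not the data import handler itself; the function names are hypothetical, and the field names follow the question.

```python
def split_categories(raw):
    """Split a pipe-delimited string such as '1|8|90|130|' into integers."""
    # The trailing pipe produces an empty last part, which is skipped.
    return [int(part) for part in raw.split("|") if part]


def to_solr_fields(raw):
    """Build both fields from one database value."""
    return {
        "Categories_pipe": raw,               # exact string as in the DB
        "Categories": split_categories(raw),  # multi-valued integer field
    }
```

Calling to_solr_fields on the value from the scenario keeps the original string in Categories_pipe and puts the four integers in Categories.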

What are the advantages of the multivalued option in Solr

What are the advantages of the multivalued field option in Solr?
I have a field with comma-separated keywords.
I can do one of two things:
make a non-multivalued text field
make a multivalued text field which contains each keyword
I can still query in both cases. So what are the advantages of multivalued over non-multivalued?
Advantages of multivalued: you don't need to change the document design. If a document contains multiple values in one field, Solr/Lucene can handle this field.
Another advantage: multiple values can describe a document more precisely (think of the tags of a blog post, for example).
Advantages of non-multivalued: you can use specific features that require a single term (word) in a field, like spell checking. It is also a benefit for clustering (Carrot2) or grouping, which mostly work better on non-multivalued fields.
Querying the multivalued field will return exactly what you want.
Example: doc1 has the keyword 'abc' and doc2 has the keyword 'abcd'. If you query by the keyword 'abc', only doc1 should match.
With the non-multivalued approach both documents would match, because you would have to use LIKE-style (wildcard) syntax.
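The 'abc' versus 'abcd' example can be reproduced with plain Python values. This is a sketch of the matching logic only, not of Solr's query parser; the substring test stands in for a LIKE-style query over a single comma-separated string, and the function names are hypothetical.

```python
def match_multivalued(keywords, query):
    """Exact match against one of the separate values."""
    return query in keywords  # list membership compares whole values


def match_substring(joined, query):
    """LIKE-style match against one comma-separated string."""
    return query in joined    # substring test, so 'abc' also hits 'abcd'


doc1 = ["abc", "xyz"]
doc2 = ["abcd", "xyz"]
```

With the multivalued lists only doc1 matches 'abc', whereas against the joined strings "abc,xyz" and "abcd,xyz" the substring test matches both documents.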
Multivalued fields can be very handy. Say you have many fields and you wish to search several of them, but not all. You can create a multivalued field that includes all the fields you want to search, and search in it.
For example, say your fields may hold either string or numeric values, and you wish to search all the string values found in the document. You can create a multivalued field for all the string values and search in it.
