how to implement solr index partitioning - solr

I want Solr to create indexes based on a specific field. For example, I have a field in schema.xml, createDate (which might have a value of 2012, 2013, etc.). Now, while indexing, if the value of that field is 2013, the document should be indexed in the /data/2013/index folder (or some other logically separated folder). I tried to provide the following in my solrconfig.xml just before the <config> tag ends:
<partition>
  <partitionField name="creationYear">
    <value>2004</value>
    <value>2005</value>
    <value>2006</value>
    <value>2007</value>
    <value>2008</value>
    <value>2009</value>
    <value>2010</value>
    <value>2011</value>
    <value>2012</value>
    <value>2013</value>
  </partitionField>
</partition>
While indexing it's not working, and it seems this was just an idea that was never actually implemented in Solr. Am I assuming correctly? Or is there a way I can get Solr to create dynamic index folders based on the year (as in this example)?
Any help would be appreciated!

Related

Spell checking with Solr

I use Solr to index documents (pdf, word, .txt, etc.). I need to use the spell checker (in French) but I don't know how to do this. I only need this function on the field "content"; the type of this field is text_general.
The spellchecker uses the content of your index to build the terms that are used for suggestions - there is no language configuration: as long as the content that has been indexed is French, the suggestions returned to the user will be based on those terms.
The exception is if you're using the FileBasedSpellChecker, where you provide a dictionary of terms with their correct spelling.
# spellcheck.q is only necessary if you want to use a different query than your actual query
&spellcheck=true&spellcheck.q=foo
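For reference, a minimal sketch of hooking an index-based spellchecker to the content field in solrconfig.xml; the handler name, index directory and buildOnCommit setting are illustrative assumptions rather than something from the answer above:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- analyze incoming queries with the same field type as the source field -->
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- suggestions are built from the indexed terms of the "content" field -->
    <str name="field">content</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- sidecar index holding the dictionary (path is an assumption) -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>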

Solr fields mapping?

I am indexing documents into Solr from a source. At the source, each document has some associated properties which I am indexing and fetching into Solr.
What I am doing is mapping some fields from the source properties to Solr schema fields. But I can see a couple of extra fields in the Solr logs which I am not mapping. While querying in the Solr admin UI, I can see only the mapped fields.
E.g., in the logs below I am using only content_name and content_modifier, but I can see the Template field as well.
INFO - 2014-09-18 12:07:47.185; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update/extract params={literal.content_name=1_.000&literal.content_modifier=System&literal.Template={8ad4d8f0-93a7-4941-9657-cf3706f00409} {add=[1_.000 (1479581071766978560)]} 0 0
So what's happening here? Will Solr index only the mapped fields and skip the unmapped ones? Or will Solr index all fields, mapped and unmapped, but show only the mapped fields in the admin UI?
Please suggest.
Your question is determined by what your solrconfig and schema say, because you can configure this any way you want. Here is how it works for the example schema for Solr 4.10:
1) In solrconfig.xml, the /update/extract handler uses the "uprefix" parameter to map every field NOT in the schema to a dynamic field ignored_*
2) In schema.xml, that dynamic field has type ignored
3) The type ignored (in the same file) is defined with stored=false and indexed=false, which means: do not complain if you get a field matching that pattern, but do nothing with it - literally ignore it.
So, if you don't like that, you can modify any part of that pipeline (the relevant pieces are sketched below). The easiest test would be to change the dynamic field to use type string and reindex. Then you should see the rest of the fields.
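For orientation, these are roughly the pieces of the Solr 4.10 example configuration that the steps above refer to, trimmed to the relevant attributes (other defaults omitted):

<!-- solrconfig.xml: fields not in the schema get the uprefix and land in ignored_* -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml: the catch-all dynamic field and the "ignored" type that drops its content -->
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
<fieldType name="ignored" class="solr.StrField" stored="false" indexed="false" multiValued="true"/>

Changing type="ignored" to type="string" on the dynamicField (and reindexing) is the quick test mentioned above.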

Error during indexing with Apache Solr: multiple values encountered for non multiValued field keywords

I'm trying to quickly index a large collection of html files for a one-off information retrieval experiment with Apache Lucene Solr. I'm using the example Solr instance distributed with the latest release (solr-4.9.0/example/solr), and in the spirit of a quick and dirty solution I'm just submitting the documents with curl:
curl "http://localhost:8983/solr/update/extract?literal.id=001" -F "myfile=@blah.html"
When I look at the logs in the Solr panel during indexing I see a lot of errors of the form:
org.apache.solr.common.SolrException: ERROR: [doc=BLOG06-20060103-014-0011844415] multiple values encountered for non multiValued field keywords: [hair care, shampoo, hair styles, hair styles, ...]
It looks like the component doing the keyword extraction is pulling out multiple values when perhaps it should only be a list of words separated by whitespace. Do I need to do anything to force this, or does this look like some kind of bug?
Turns out the solution was as simple as ensuring that the keywords field in schema.xml has multiValued="true" specified. I then had to do this for a couple of other fields. I had foolishly assumed that the schema would be set up to match the default document parser in the demo instance.
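For anyone hitting the same error, the fix amounts to a single attribute on the field definition in schema.xml; the field type shown here is just a guess at what the example schema uses:

<!-- let the extraction handler store several keyword values per document -->
<field name="keywords" type="text_general" indexed="true" stored="true" multiValued="true"/>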

SOLR index searching across word boundaries

I am trying to configure a SOLR index of business names to be able to do business name lookups. Here is a use case that I'm trying to solve for:
My Solr index contains "WHOLE FOODS MARKET". I have a string that I'm trying to look up which contains some relevant information and some irrelevant information: "WHOLEFDS TRB 10245".
Any help/pointers would be appreciated -- I'm a SOLR novice.
Take a look at the NGram filter in the example schema.xml shipped with the Solr zip distribution (a sketch of such a field type follows after the links below).
Further links:
How to use n-grams approximate matching with Solr?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
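As a rough illustration (not taken from the answer above), a field type along these lines in schema.xml indexes each token as overlapping character n-grams, so a garbled query like "WHOLEFDS" still shares grams with "WHOLE FOODS"; the type and field names and gram sizes are arbitrary choices, and EdgeNGramFilterFactory from the second link is the prefix-only variant:

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- split every token into 3- to 5-character grams at index and query time -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="5"/>
  </analyzer>
</fieldType>

<field name="business_name" type="text_ngram" indexed="true" stored="true"/>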

Solr Spell Check result based filter query

I implemented the Solr SpellCheck component based on the documentation at http://wiki.apache.org/solr/SpellCheckComponent , and it works well. But I am trying to filter the spell check results using some other filter. Consider the following schema:
product_name
product_text
product_category
product_spell -> a string copied from product_name and product_text, tokenized using a whitespace analyzer
For the above schema, I am trying to filter the spell check results by the provided category. I tried querying like http://127.0.0.1:8080/solr/colr1/myspellcheck/?q=product_category:160%20appl&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true . The spellcheck results do not take product_category:160 into account.
Is it because the dictionary was built over all the categories? If so, is it a good idea to create a dictionary for every category?
Is it not possible to have another filter condition in the spellcheck component?
I am using Solr 3.5.
From the SOLR-2010 issue I had previously understood that filtering through the fq parameter should be possible using collation, but it isn't; I think I misunderstood.
In fact, the SpellCheckComponent most likely has a separate index, except for the DirectSolrSpellChecker implementation. That means the field you select is indexed into a different index, which contains only the information from the specific field you chose for making spelling corrections.
If you're curious, you can also have a look at what that additional index looks like using Luke, since it is of course a Lucene index. Unfortunately, filtering on other fields isn't an option there, simply because there is only one field in that index: the one you use to make spelling corrections.
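To make the single-field point concrete, a spellchecker definition in solrconfig.xml names exactly one source field, so the dictionary index built from it knows nothing about product_category (sketch only; the directory path and buildOnCommit setting are made up for illustration):

<lst name="spellchecker">
  <str name="name">default</str>
  <!-- only this field's terms end up in the separate spellcheck index -->
  <str name="field">product_spell</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
  <str name="buildOnCommit">true</str>
</lst>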
