Solr indexing fails on media_black_point

First of all, I should say that I don't have much experience with Solr.
Problem 1: We only want to index the content of files and do not want to add dynamic fields. Is this possible, and if so, how?
Problem 2: If the answer to Problem 1 is no, how would we exclude media_black_point and
media_white_point from indexing?
The error response where Solr trips:
{"responseHeader":{"status":400,"QTime":149},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"incompatible dimension (2) and values (313/1000 329/1000). Only 0 values specified","code":400}}

Dynamic fields and schemaless mode are both there to catch fields you did not declare explicitly. If neither is used, the assumption is that every field you send to Solr (including the output of the extract handler, which generates a Solr document internally) needs to be explicitly mapped. This helps to avoid spelling errors and other unexpected edge cases.
If you want to ignore all the fields you did not define explicitly, you can use a dynamic field with stored, indexed and docValues all set to false. Solr ships with one such example out of the box; you just need to uncomment it.
The other option is to ignore specific fields. You can do that by defining a custom UpdateRequestProcessor chain (or an individual URP in the latest Solr) and using IgnoreFieldUpdateProcessorFactory with your specific field name or a name pattern.
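Before putting a name pattern into the processor configuration, it can help to check which field names it would actually catch. The following is only a sketch in plain Java, not Solr code: the class name, the pattern and the sample field names other than the two from the question are made up for illustration. The pattern is anchored with ^ and $ so that it gives the same result whether it is applied as a partial or a full match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class IgnorePatternCheck {
    // Returns the field names in which the (assumed) ignore pattern is found.
    static List<String> ignoredFields(String regex, List<String> fieldNames) {
        Pattern pattern = Pattern.compile(regex);
        List<String> ignored = new ArrayList<>();
        for (String name : fieldNames) {
            if (pattern.matcher(name).find()) {
                ignored.add(name);
            }
        }
        return ignored;
    }

    public static void main(String[] args) {
        // Hypothetical list of field names produced by the extraction step.
        List<String> extracted = Arrays.asList(
                "content", "media_black_point", "media_white_point", "title");
        System.out.println(ignoredFields("^media_(black|white)_point$", extracted));
    }
}
```

Only the two media_*_point names should come back; content and title are left alone.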

Related

How To intercept Document in Solr

I want to manipulate a document and change the token values of some field(s) by prepending a value to each token. I am doing bulk updates through DIH and also posting documents through SolrJ. I have a replication factor of 2, so replication should also work. The value that I want to prepend is present in the document as a separate field. I would like to know where I can intercept the document before indexing so that I can manipulate it. One option I can think of is overriding DirectUpdateHandler2. Is this the right place?
I could do it by processing the document externally and passing it to Solr, but I want to do it inside Solr.
Document fields are :
city:mumbai
RestaurantName:Talk About
Keywords:Cofee, Chines, South Indian, Bar
I want to index keywords as
mumbai_cofee
mumbai_Chines
mumbai_South Indian
mumbai_Bar
The right place is an Update Request Processor (URP). Make sure you plug it in solrconfig.xml into all the update handlers you are using (including DIH), and the single URP will cover all updates.
In your Java code in the URP you can easily get the value of a field and then prepend it to all the values in another field, etc. This happens before the doc is indexed.
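The per-document logic such a processor would run can be sketched in plain Java, independent of the Solr API. The class and method names here are hypothetical, and it assumes the keywords arrive as one comma-separated string as in the question; inside a real URP the same transformation would be applied to the document's field values before the document is passed on.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordPrefixer {
    // Splits the comma-separated keywords and prepends "<city>_" to each one.
    static List<String> prefixKeywords(String city, String keywords) {
        List<String> result = new ArrayList<>();
        for (String keyword : keywords.split(",")) {
            String trimmed = keyword.trim();
            if (!trimmed.isEmpty()) {
                result.add(city + "_" + trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prefixKeywords("mumbai", "Cofee, Chines, South Indian, Bar"));
    }
}
```

The sketch keeps each keyword's original case and inner spaces, so "South Indian" becomes "mumbai_South Indian"; lowercasing, if wanted, would be an extra step.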

Solr for Typo3: not searching in dynamic fields

I'm using the Typo3 Solr Extension 2.8.3 and added some dynamic fields to the TypoScript definition.
For example, there is a dynamic field defined as plugin.tx_solr.index.queue.tx_news.author_stringS = author
among the other typical definitions.
It seems that the dynamic fields are not put into the index automatically.
Is there a way to tell Solr to index dynamic fields too, using the TypoScript config ONLY? I would rather not touch the schema definition.
Did you initialize the index queue after adding your dynamic field? If so, check in your Solr Admin if the field was added and if there is any content in it.
Adding fields without changing the schema definitely works out of the box.
EDIT:
Adding fields to the Solr index doesn't mean that they are used for the search query. You must also include your dynamic field in the query fields using TypoScript.
See the official documentation:
http://forge.typo3.org/projects/extension-solr/wiki/Tx_solrsearch#queryfields
That should do the trick:
plugin.tx_solr.search.query.queryFields = author_stringS^10.0
I suggest you use author_textS to make it more flexible in the search. *_stringS is exact and case sensitive.
If you set queryFields, don't forget to add every other field you want to be included in the search:
plugin.tx_solr.search.query.queryFields = content^40.0, title^5.0, keywords^2.0, tagsH1^5.0, tagsH2H3^3.0, tagsH4H5H6^2.0, tagsInline^1.0, description^4.0, author_textS^10.0

/select with 'q' parameter does not work

Whenever I query with q=*:* it shows all the documents, but when I query with q=programmer, 0 docs are found (contents is the default search field).
My schema has the fields id (unique), author, title and contents.
The query also works fine for:
q=author:"Value" or q=title:"my book" etc.; only the contents field gives no results.
Also, when I query using the spell checker (/spell?q=programmer), the output shows spelling suggestions for this word, even though 'programmer' is spelled correctly and is present in many documents.
I followed the example docs for the configuration.
This started happening all of a sudden; initially it worked fine.
I guess there is some problem only with the contents field, but I cannot figure it out.
Is it because the index was not created properly for the contents field?
(I am using Solr 4.2 on Windows 7 with Tomcat as the web server.)
Please help. Thanks a lot in advance.
Are you sure you set the default search field? The reason you have this problem might be because you didn't set the <defaultSearchField> field in your schema.xml file. This is why "q=author:value" works while q=WHATEVER doesn't.
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used.
But also consider this:
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used. It is preferable to not use or
rely on this setting; instead the request handler or query LocalParams
for a search should specify the default field(s) to search on. This
setting here can be omitted and it is being considered for
deprecation.
Do you have any data in your instance? Try q=*:* and see what it returns. "for" is a stop word, so maybe it was filtered out. Look for some other value to test with.
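If you follow the quoted advice and name the default field in the request instead of relying on schema.xml, the standard query parser accepts it through the df parameter. A minimal sketch that only builds the request URL; the class name and the base URL (host, port and path) are assumptions, so adjust them to your setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SelectUrlBuilder {
    // Builds a /select URL that names the default search field via df.
    static String selectUrl(String baseUrl, String query, String defaultField) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&df=" + URLEncoder.encode(defaultField, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical base URL for a Solr instance running under Tomcat.
        System.out.println(selectUrl("http://localhost:8080/solr", "programmer", "contents"));
    }
}
```

With df=contents in the request, q=programmer is searched against contents even when no default search field is configured in the schema.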

Adding and Updating Solr and lucene field

I am new to Solr. Can someone address the questions below?
1. Currently I have an index with 1.5 million records. I need to update the value of a field to a new value. How do I do it? Will it be a re-indexing? Sample code would be helpful.
2. I have another need where I want to add an index field but don't want to reindex the entire content. I have the document ids with me. For this requirement I can use Lucene if that helps.
Currently I have an index with 1.5 million records. I need to update the value of a field to a new value. How do I do it? Will it be a re-indexing? Sample code would be helpful.
Well, the good news is that recent versions of Solr (starting with 4.0) allow you to do what they call Atomic Updates. See here:
http://wiki.apache.org/solr/Atomic_Updates
From the coding point of view, it is as if you were only updating the desired field. Using the Java SolrJ API it's something like this:
Let's say you have a document with a multi-valued field called "stuffedAnimals". The field already contains "teddy bear" and "stuffed turtle" as values. You want to update it and add a new value like "pink fluffy flamingo". What you can do is:
import java.util.Collections;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument updateDocument = new SolrInputDocument();
// The id field must be set to the id of the document you want to update:
updateDocument.addField("id", 2312312);
// "add" appends the new value to the existing ones rather than replacing them:
updateDocument.addField("stuffedAnimals", Collections.singletonMap("add", "pink fluffy flamingo"));
// Send updateDocument with your SolrJ client's add(...) method and commit as usual.
The problem with this is performance: what actually happens when you do this is that the document is removed and re-added entirely (not just the field). This is something you need to take into consideration if you plan on doing a lot of such operations.
I have another need where I want to add an index field but don't want to reindex the entire content. I have the document ids with me. For this requirement I can use Lucene if that helps.
Well, as I was saying above: when you update a field, the document is actually rewritten entirely, so that means it's re-indexed with the new field as well. If you're using Solr 4.4 or earlier you need to declare the new fields in the schema.xml file. If you're using Solr 4.5 or newer you don't need to worry about the schema.xml any more.
Finally, as a remark for both questions: if you want to update a Solr document, make sure all its fields are marked as "stored" (stored=true in schema.xml). Since a partial update on a field translates into Solr removing and re-adding the document (with the update applied), if certain fields are not stored, Solr won't know what value to put in them after the update.
Take a look at the atomic update feature added in 4.0.
It allows you to change the value of a particular field without resending the whole document.
Remember that all fields in your schema have to be stored (except copyField destinations). If you need further assistance, please write a more detailed description.

Solr Spell Check result based filter query

I implemented the Solr SpellCheckComponent based on the documentation at http://wiki.apache.org/solr/SpellCheckComponent and it works well. But I am trying to filter the spell check results based on some other filter. Consider the following schema:
product_name
product_text
product_category
product_spell -> copies the strings from product_name and product_text, tokenized using a whitespace analyzer
For the above schema, I am trying to filter the spell check results based on a provided category. I tried querying like http://127.0.0.1:8080/solr/colr1/myspellcheck/?q=product_category:160%20appl&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true . The spellcheck results do not take product_category:160 into account.
Is it because the dictionary was built for all the categories? If so, is it a good idea to create a dictionary for every category?
Is it not possible to have another filter condition in the spellcheck component?
I am using solr 3.5
I previously understood from the SOLR-2010 issue that filtering through the fq parameter should be possible using collation, but it isn't. I think I misunderstood.
In fact, the SpellCheckComponent most likely has a separate index, except for the DirectSolrSpellChecker implementation. That means the field you select is indexed in a different index, which contains only the information about the specific field you chose for spelling corrections.
If you're curious, you can also have a look at what that additional index looks like using Luke, since it is of course a Lucene index. Unfortunately, filtering using other fields isn't an option there, simply because there is only one field in it: the one you use for spelling corrections.

Resources