Solr indexing with space and without space for same field - solr

Folks,
We have a requirement to index data in Solr both with and without spaces.
Since the application is already in production, we do not want to add a new field to schema.xml.
E.g. if the phrase is "Institute of Excellence",
we want to index it as "Institute of Excellence" and also as "InstituteofExcellence", with all the inner spaces removed.
Is there an easy way to achieve this? (As mentioned above, we do not want to create multiple fields.)

While indexing, you need to avoid the standard tokenizer, which splits the text into separate words.
OR
You may need to write a new filter class (to merge the words) and include it in the fieldType definition in schema.xml.
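The merging step such a filter would perform can be sketched in plain Java. This is only the string transformation, not a Lucene TokenFilter; the class and method names are hypothetical, and sending both values to one field assumes that field is multiValued.

```java
import java.util.List;

public class SpaceVariants {
    // Returns the value as typed plus a copy with all whitespace removed.
    public static List<String> variants(String value) {
        String trimmed = value.trim();
        String collapsed = trimmed.replaceAll("\\s+", "");
        // If there was no inner whitespace, both forms are identical; emit one.
        return collapsed.equals(trimmed)
                ? List.of(trimmed)
                : List.of(trimmed, collapsed);
    }

    public static void main(String[] args) {
        // Both variants would be indexed into the same field.
        System.out.println(variants("Institute of Excellence"));
    }
}
```

A real filter would do the same concatenation on the token stream inside the analysis chain, so no client change would be needed.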

Related

Solr indexing fails over media_black_point

Up front, I should say that I don't have much experience with Solr.
Problem 1: we only want to index the content of files and do not want to add dynamic fields. Is this possible, and if so, how?
Problem 2: if the answer to Problem 1 is no, how would we exclude media_black_point and media_white_point from indexing?
Error response where Solr trips:
{"responseHeader":{"status":400,"QTime":149},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"incompatible dimension (2) and values (313/1000 329/1000). Only 0 values specified","code":400}}
Dynamic fields and schemaless mode both exist to catch fields you did not declare explicitly. If neither is used, the assumption is that every field you send to Solr (including output from the extract handler, which generates a Solr document internally) needs to be explicitly mapped. This helps to avoid spelling errors and other unexpected edge cases.
If you want to ignore all the fields you did not define explicitly, you can use a dynamic field with stored, indexed, and docValues all set to false. Solr ships with one example out of the box; you just need to uncomment it.
The other option is to ignore specific fields. You can do that by defining a custom UpdateRequestProcessor chain (or individual URP in the latest Solr) and using IgnoreFieldUpdateProcessorFactory with your specific field name or a name pattern.
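The effect of ignoring fields by name pattern can be illustrated with plain Java. This only mimics the matching logic; it is not IgnoreFieldUpdateProcessorFactory itself, which runs inside Solr, and the class name and regular expression are assumptions chosen to match the two fields from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class FieldFilter {
    // Hypothetical pattern: matches the two fields named in the question.
    private static final Pattern IGNORED =
            Pattern.compile("media_(black|white)_point");

    // Returns a copy of the document without fields whose names match IGNORED.
    public static Map<String, Object> dropIgnored(Map<String, Object> doc) {
        Map<String, Object> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (!IGNORED.matcher(e.getKey()).matches()) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "file-1");
        doc.put("content", "extracted text");
        doc.put("media_black_point", "0 0 0");
        System.out.println(dropIgnored(doc).keySet());
    }
}
```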

Solr range query in text field

I have a multi-valued field. The content looks like this:
multi_field:"type:type1; YEAR:2008"
I want to be able to make range requests based on the YEAR substring, but I cannot work out whether this kind of range query is possible. I want something like this: q=multi_field:"type:type1;*" AND multi_field:"*YEAR:[2005 TO 2010]*"
Is it possible? I know it looks horrible, but is there any way to get it?
No, it is not possible (at least not without a hell of a lot of coding). The easiest way is to fix the indexing code so that it splits the field into two separate fields. If you need to keep the original multi_field available (e.g. it is used in processing the search results), create two new fields (e.g. multi_field_part1 and multi_field_part2), run the search over the new fields (q=multi_field_part1:type1 AND multi_field_part2:[2005 TO 2010]), but use the old one in the results.
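A minimal sketch of the split at indexing time, assuming the raw value always has the "key:value; key:value" shape shown in the question. The class and method names are hypothetical. For the range query to compare years numerically, multi_field_part2 would also need a numeric field type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiFieldSplitter {
    // Parses "type:type1; YEAR:2008" into {type=type1, YEAR=2008}.
    public static Map<String, String> split(String raw) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String pair : raw.split(";")) {
            // Split each pair on the first colon only.
            String[] kv = pair.trim().split(":", 2);
            if (kv.length == 2) {
                parts.put(kv[0].trim(), kv[1].trim());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, String> parts = split("type:type1; YEAR:2008");
        String part1 = parts.get("type");                // for multi_field_part1
        int part2 = Integer.parseInt(parts.get("YEAR")); // for multi_field_part2
        System.out.println(part1 + " " + part2);
    }
}
```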

Solr for Typo3: not searching in dynamic fields

I'm using the TYPO3 Solr extension 2.8.3 and added some dynamic fields to the TypoScript definition.
For example, there is a dynamic field defined as plugin.tx_solr.index.queue.tx_news.author_stringS = author
among the other typical definitions.
It seems that the dynamic fields are not put into the index automatically.
Is there a way to tell Solr to index dynamic fields too, but using the TypoScript config ONLY? I would rather not touch the schema definition.
Did you initialize the index queue after adding your dynamic field? If so, check in your Solr Admin if the field was added and if there is any content in it.
Adding fields without changing the schema definitely works out of the box.
EDIT:
Adding fields to the Solr index doesn't mean that they are used for the search query. You must also add your dynamic field to the query using TypoScript.
See the official documentation:
http://forge.typo3.org/projects/extension-solr/wiki/Tx_solrsearch#queryfields
That should do the trick:
plugin.tx_solr.search.query.queryFields = author_stringS^10.0
I suggest you use author_textS to make it more flexible in the search. *_stringS is exact and case sensitive.
If you set queryFields, don't forget to add every other field you want included in the search:
plugin.tx_solr.search.query.queryFields = content^40.0, title^5.0, keywords^2.0, tagsH1^5.0, tagsH2H3^3.0, tagsH4H5H6^2.0, tagsInline^1.0, description^4.0, author_textS^10.0

Adding and Updating Solr and lucene field

I am new to Solr. Can someone address the questions below?
1. Currently I have an index with 1.5 million records. I need to update the value of a field to a new value. How do I do it? Will it require re-indexing? Sample code would be helpful.
2. I have another need where I want to add an index field but don't want to reindex the entire content. I have the document ids with me. For this requirement I can use Lucene if that helps.
Currently I have an index with 1.5 million records. I need to update the value of a field to a new value. How do I do it? Will it require re-indexing? Sample code would be helpful.
Well, the good news is that recent versions of Solr (starting with 4.3 or 4.4, I think) allow you to do what they call Atomic Updates. See here:
http://wiki.apache.org/solr/Atomic_Updates
From the coding point of view, it is as if you were only updating the desired field. Using the Java SolrJ API it's something like this:
Let's say you have a document with a multi-valued field called "stuffedAnimals". The field already contains "teddy bear" and "stuffed turtle" as values. You want to update it and add a new value like "pink fluffy flamingo". What you can do is:
import java.util.Collections;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument updateDocument = new SolrInputDocument();
// The id field must carry the id of the document you want to update:
updateDocument.addField("id", 2312312);
// "add" appends the new value to the existing ones rather than replacing them:
updateDocument.addField("stuffedAnimals", Collections.singletonMap("add", "pink fluffy flamingo"));
The problem with this is performance: what actually happens when you do this is that the document is removed and re-added entirely (not just the field). This is something you need to take into consideration if you plan on doing a lot of such operations.
I have another need where I want to add an index field but don't want to reindex the entire content. I have the document ids with me. For this requirement I can use Lucene if that helps.
Well, as I was saying above: when you update a field, the document is actually re-written entirely, so that means it's re-indexed with the new field as well. If you're using Solr 4.4 or earlier you need to declare the new fields in the schema.xml file. If you're using Solr 4.5 or newer you don't need to worry about the schema.xml any more.
Finally, as a remark for both questions: if you want to update a Solr document, make sure all its fields are marked as "stored" (stored=true in schema.xml). Since a partial update on a field translates into Solr removing and re-adding the document (with the update applied), if certain fields are not stored, Solr won't know what value to put in them after the update.
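Why non-stored fields get lost can be shown with a small simulation. This is a plain-Java model of the remove-and-re-add behaviour described above, not Solr code; the class name, the method name, and the "description" field are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AtomicUpdateSimulation {
    // Simulates "add": rebuild the document from its stored fields only,
    // then append the new value to the target field.
    public static Map<String, List<String>> applyAdd(
            Map<String, List<String>> indexedDoc, Set<String> storedFields,
            String field, String newValue) {
        Map<String, List<String>> rebuilt = new HashMap<>();
        for (Map.Entry<String, List<String>> e : indexedDoc.entrySet()) {
            // Only stored fields can be read back; the rest are dropped.
            if (storedFields.contains(e.getKey())) {
                rebuilt.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        rebuilt.computeIfAbsent(field, k -> new ArrayList<>()).add(newValue);
        return rebuilt;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("id", List.of("2312312"));
        doc.put("stuffedAnimals", List.of("teddy bear", "stuffed turtle"));
        doc.put("description", List.of("indexed but not stored"));
        Map<String, List<String>> updated = applyAdd(
                doc, Set.of("id", "stuffedAnimals"),
                "stuffedAnimals", "pink fluffy flamingo");
        // The non-stored "description" field is missing after the update.
        System.out.println(updated.containsKey("description"));
        System.out.println(updated.get("stuffedAnimals"));
    }
}
```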
Take a look at the atomic update feature added in 4.0.
It allows you to change the value of a particular field without resubmitting the whole document.
Remember that all fields in your schema have to be stored (except copyField destinations). If you need further assistance, please write a more detailed description.

Solr Spell Check result based filter query

I implemented the Solr SpellCheckComponent based on the documentation at http://wiki.apache.org/solr/SpellCheckComponent , and it works well. But I am trying to filter the spellcheck results based on some other filter. Consider the following schema:
product_name
product_text
product_category
product_spell -> copied from product_name and product_text, and tokenized using a whitespace analyzer
For the above schema, I am trying to filter the spellcheck results by a given category. I tried a query like http://127.0.0.1:8080/solr/colr1/myspellcheck/?q=product_category:160%20appl&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true . The spellcheck results do not take product_category:160 into account.
Is it because the dictionary was built for all the categories? If so, is it a good idea to create a dictionary for every category?
Is it not possible to have another filter condition in the spellcheck component?
I am using Solr 3.5.
I previously understood from the SOLR-2010 issue that filtering through the fq parameter should be possible using collation, but it isn't; I think I misunderstood.
In fact, the SpellCheckComponent most likely has a separate index, except with the DirectSolrSpellChecker implementation. This means the field you select is indexed in a different index, which contains only the information about the specific field you chose for spelling corrections.
If you're curious, you can also look at what that additional index looks like using Luke, since it is of course a Lucene index. Unfortunately, filtering on other fields isn't an option there, simply because there is only one field in it: the one you use for spelling corrections.

Resources