Add new field to SOLR with default value. Populate existing documents


I have added a new field with a default value to a SOLR 3.6.1 schema.xml. Is it possible to populate / index existing documents in the SOLR repository with this default value without having to re-load all the data? I have been looking at re-indexing and re-optimizing but haven't been able to get this to work.

Any change in schema.xml that adds or changes fields requires re-indexing the data, so you have to reload your data.
If you know which documents are affected (their unique keys), you can do a partial update of all those documents with just that field.
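Here is a minimal sketch of that partial update in Python. It makes several assumptions. Atomic updates need Solr 4.0 or later, so this route is not available on 3.6.1 as-is. The unique key field is assumed to be `id`. The core URL, field name and default value are hypothetical. The sketch builds one atomic-update document per known ID with the JSON `set` operation and posts them to the `/update` handler.

```python
import json
import urllib.request

# Hypothetical core URL; adjust host, port and core name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"

def build_default_updates(doc_ids, field, default_value):
    """One atomic-update doc per ID that sets `field` to `default_value`.

    Assumes the unique key field is called "id".
    """
    return [{"id": doc_id, field: {"set": default_value}} for doc_id in doc_ids]

def post_updates(url, updates):
    """POST the partial documents as a JSON array to Solr's update handler."""
    request = urllib.request.Request(
        url,
        data=json.dumps(updates).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    updates = build_default_updates(["doc1", "doc2"], "newField", "defaultValue")
    print(json.dumps(updates))
    # post_updates(SOLR_UPDATE_URL, updates)  # run only against a live Solr
```

For a large number of IDs, send the updates in batches rather than as one request.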

See the post "Solr: Add new fields with Default Value for Existing Documents".
If we only need to search and display the new field, we can do the following:
1. Add the new field definition in schema.xml.
2. Update the search query: when searching for the default value of newField, also match documents that have no value in that field:
-(-newField:defaultValue AND newField:[* TO *])
3. Use a DocTransformer to add the default value to old documents that have no value in that field.
Some features, such as sorting and stats, may not work with this approach.
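Steps 2 and 3 can be sketched in Python as follows. The query helper reproduces the query above for any field name. It assumes the default value needs no query-syntax escaping.

`fill_default` is a hypothetical client-side stand-in for the DocTransformer, shown only to make the intended behaviour concrete. A real DocTransformer is a Java plugin that runs inside Solr. Field names and values are placeholders.

```python
def default_aware_query(field, default_value):
    """Match docs where `field` equals `default_value` OR has no value at all."""
    return f"-(-{field}:{default_value} AND {field}:[* TO *])"

def fill_default(docs, field, default_value):
    """Client-side substitute for the DocTransformer: give old docs the default."""
    return [doc if field in doc else {**doc, field: default_value} for doc in docs]

if __name__ == "__main__":
    print(default_aware_query("newField", "defaultValue"))
    print(fill_default([{"id": "1"}, {"id": "2", "newField": "x"}], "newField", "defaultValue"))
```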

Related

Copy Solr Field Values via Script

I would like to copy the data from one field to another field for all documents in Solr.
A title field that is already populated needs to be copied into another field I just created. I'd like to do them all at once if possible via PuTTY or the Solr Admin console.
Thank you for any help.

If the data is already indexed, the only option is to re-index it after adding the second field. With Solr atomic updates, you can set only the new field on each document instead of sending all the fields again. https://solr.apache.org/guide/8_6/updating-parts-of-documents.html#atomic-updates
solr.add({'id':1, 'newField': {'set': 'sample value'}})
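Here is a sketch of copying title into the new field for every document. It uses the same `{'set': ...}` document shape as above, built with the stdlib only, under these assumptions:
- a hypothetical collection URL;
- a unique key called `id`;
- a stored `title` field;
- a hypothetical target field `newField`;
- Solr 4.7 or later, for `cursorMark` deep paging.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical collection URL; "title" is the source and "newField" the target.
BASE_URL = "http://localhost:8983/solr/mycollection"

def iter_docs(base_url, fields, rows=500):
    """Yield every document's requested fields, paging with cursorMark (Solr 4.7+)."""
    cursor = "*"
    while True:
        params = urllib.parse.urlencode({
            "q": "*:*", "fl": ",".join(fields), "sort": "id asc",
            "rows": rows, "cursorMark": cursor, "wt": "json",
        })
        with urllib.request.urlopen(f"{base_url}/select?{params}") as resp:
            data = json.loads(resp.read().decode("utf-8"))
        yield from data["response"]["docs"]
        if data["nextCursorMark"] == cursor:  # cursor did not move: last page
            break
        cursor = data["nextCursorMark"]

def build_copy_updates(docs, source, dest):
    """Atomic updates that set `dest` to the value already stored in `source`."""
    return [{"id": d["id"], dest: {"set": d[source]}} for d in docs if source in d]

def post_updates(base_url, updates):
    """Send the partial documents as a JSON array and commit."""
    req = urllib.request.Request(
        f"{base_url}/update?commit=true",
        data=json.dumps(updates).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Against a live Solr (not run here):
# post_updates(BASE_URL, build_copy_updates(iter_docs(BASE_URL, ["id", "title"]), "title", "newField"))
```

On a big index, post the updates in batches of a few thousand instead of building one list for everything.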
For future insertions, if you want the second field to be filled automatically, you can use a Solr copyField with the first field as its source. https://solr.apache.org/guide/8_6/copying-fields.html
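If the collection uses a managed (mutable) schema, the copyField can be added through the Schema API's `add-copy-field` command. The sketch below assumes a hypothetical collection URL. With a classic schema.xml you would instead add `<copyField source="title" dest="newField"/>` and reload. Either way, the copy only happens for documents indexed afterwards.

```python
import json
import urllib.request

SCHEMA_URL = "http://localhost:8983/solr/mycollection/schema"  # hypothetical

def add_copy_field_command(source, dest):
    """Schema API command that copies `source` into `dest` at index time."""
    return {"add-copy-field": {"source": source, "dest": dest}}

def send_schema_command(url, command):
    """POST one Schema API command as JSON and return Solr's response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# send_schema_command(SCHEMA_URL, add_copy_field_command("title", "newField"))
```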

How to update Field Name in Solr Collection?

I have Solr-indexed data as shown below.
My requirement is to rename the field MATERIAL_DOCUMENT_YEAR, which actually holds a date, to MATERIAL_DOCUMENT_DATE.
There are millions of documents, so re-indexing will take a long time.
Is there any way to rename the field from the Solr UI without re-indexing all the data?
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":39,
"params":{
"q":"SOLR_DATA",
"_":"1607925693065"}},
"response":{"numFound":129500000,"start":0,"maxScore":5.632038,"docs":[
{
"PLANT":["HYD"],
"STOCK_TYPE":[""],
"Table_Name":["TBL_MATERIAL_DOC_DISPLAY"],
"MATERIAL_DOCUMENT_YEAR":["20140312"],
"MATERIAL_DESCRIPTION":["T-SHIRT-XXL"],
"MATERIAL_DOCUMENT_NUMBER":["12345678"],
"MOVEMENT_TYPE":["123"],
"COST_CENTER":[""],

There is no way to rename the field without re-indexing and modifying the schema.xml.
Maybe you can add another field with the correct name.
Once it is populated for all the documents, you can remove the earlier, incorrect field.
A second option would be to create another collection with the correct field names.
Once all the data is up to date in the new collection, you can create an alias to it with the earlier collection name.
Once all of the above is done, you can remove the older index.
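The first option (add a correctly named field, then drop the old one) could be sketched as atomic updates. The sketch assumes:
- Solr 4.0+;
- every field is stored;
- MATERIAL_DOCUMENT_DATE has already been declared as a multivalued date field;
- the unique key is called `id`. This is hypothetical, because the sample response does not show the key field.

Each dict fills the new field from the old one, converting values like "20140312" to Solr's date format, and uses `set` with null to clear the old field. Solr still rewrites every document internally, so this avoids re-reading the source data rather than avoiding indexing work.

```python
from datetime import datetime

OLD_FIELD = "MATERIAL_DOCUMENT_YEAR"
NEW_FIELD = "MATERIAL_DOCUMENT_DATE"

def to_solr_date(yyyymmdd):
    """Convert a value like '20140312' to Solr's date format '2014-03-12T00:00:00Z'."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").strftime("%Y-%m-%dT%H:%M:%SZ")

def build_rename_update(doc, key_field="id"):
    """Atomic update that copies OLD_FIELD into NEW_FIELD and clears OLD_FIELD.

    `key_field` is a hypothetical unique key name; adjust it to your schema.
    Assumes every value of OLD_FIELD is an 8-digit yyyymmdd string.
    """
    values = doc[OLD_FIELD]
    if not isinstance(values, list):  # single-valued fields come back as scalars
        values = [values]
    return {
        key_field: doc[key_field],
        NEW_FIELD: {"set": [to_solr_date(v) for v in values]},
        OLD_FIELD: {"set": None},  # "set" to null removes the old values
    }
```

The resulting dicts would be posted in batches to `/update` as JSON. Once every document is done, the old field definition can be removed from the schema.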

Solr full text search for dynamically added data?

I'm trying to index data without defining a schema.xml. Is there any way to apply full-text search without adding a schema.xml or updating the managed schema?

By default, Solr operates in schemaless mode. In this mode, Solr guesses a field's type from the pattern its data matches the first time the field is included. If the value is numeric the first time, Solr will treat that field as numeric every time.
If the field contains text, it'll be indexed as a text field, with the processing defined in the default schema.
As long as you're using the default configuration, you can submit documents with just the field name and the associated text, then search against that field name as necessary.
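Here is a sketch of both sides with the stdlib only. It assumes a collection created with the default (schemaless) configuration and a hypothetical URL. It also assumes a hypothetical field `description` that was first sent with text, so Solr guessed a text type for it. The index request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8983/solr/mycollection"  # hypothetical

def build_index_request(base_url, docs):
    """POST request that adds documents with arbitrary, undeclared field names."""
    return urllib.request.Request(
        f"{base_url}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def build_search_url(base_url, field, text):
    """/select URL that searches `field` for the words in `text`."""
    query = urllib.parse.urlencode({"q": f"{field}:({text})", "wt": "json"})
    return f"{base_url}/select?{query}"

# urllib.request.urlopen(build_index_request(BASE_URL, [{"id": "1", "description": "Red cotton shirt"}]))
# urllib.request.urlopen(build_search_url(BASE_URL, "description", "red shirt"))
```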

Using default value for Solr field boosting when field does not exist

My existing Solr 4.x instance has about 650k documents indexed. I just added a new field to the schema that will hold the number of votes given to the document, which will be used in boosting the score. Until the first user upvotes (or downvotes) a given document, that document will not have the field defined. You can see this when viewing the document using the Solr Admin tool.
The field was defined with a default value, but I think this only applies to new (or maybe re-indexed) documents that do not specify the field.
When I try out different boosting functions, I get the following exception:
"error": {
"msg": "can not use FieldCache on a field which is neither indexed nor has doc values: votes",
"code": 400
}
Is it possible to specify a default value to be used for boosting when the field does not (yet) exist in the document? My logic would be
field exists -- use field value
field does not exist -- use default value

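This seems to be related to your earlier question. Perhaps you can try a FunctionQuery as well:
q={!boost b=map(votes,0,0,default_value)}your_query
Function queries read a missing numeric value as 0, so map(votes,0,0,default_value) returns default_value for documents without the field (and for documents whose vote count really is 0), and the field value otherwise. The five-argument form map(x,min,max,target,default) would instead return default for every value outside the range, which is the opposite of what you want. Note that the error above also means the votes field must be indexed (or have docValues) before any function query can read it.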
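To see why the four-argument `map()` is the one you want, here is a plain-Python emulation of its documented semantics. It is for illustration only and is not Solr code. It assumes a missing `votes` value is read as 0 and uses a hypothetical default_value of 5.

```python
def solr_map(x, lo, hi, target, default=None):
    """Plain-Python emulation of Solr's map() function, for illustration only.

    map(x,min,max,target)          -> target if min <= x <= max, else x
    map(x,min,max,target,default)  -> target if min <= x <= max, else default
    """
    if lo <= x <= hi:
        return target
    return x if default is None else default

MISSING_VOTES = 0   # assumption: a doc without the field is read as 0
DEFAULT_VOTES = 5   # hypothetical default_value

for votes in (MISSING_VOTES, 12):
    fixed = solr_map(votes, 0, 0, DEFAULT_VOTES)        # map(votes,0,0,5)
    original = solr_map(votes, 0, 0, 0, DEFAULT_VOTES)  # map(votes,0,0,0,5)
    print(votes, fixed, original)
```

This prints `0 5 0` and then `12 12 5`. The four-argument form keeps real vote counts and substitutes the default only for missing values, while the five-argument form does the reverse.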

Adding and Updating Solr and lucene field

I am new to Solr. Can someone address the questions below?
1. I currently have an index with 1.5 million records. I need to update the value of a field to a new value. How do I do it? Will it require re-indexing? Sample code would be helpful.
2. I have another need where I want to add an index field but don't want to re-index the entire content. I have the document IDs. For this requirement I can use Lucene if that helps.

I currently have an index with 1.5 million records. I need to update the value of a field to a new value. How do I do it? Will it require re-indexing? Sample code would be helpful.
Well, the good news is that recent versions of Solr (starting with 4.0) allow you to do what they call Atomic Updates. See here:
http://wiki.apache.org/solr/Atomic_Updates
From the coding point of view, it is as if you were only updating the desired field. Using the Java SolrJ API it's something like this:
Let's say you have a document with a multivalued field called "stuffedAnimals". The field already contains "teddy bear" and "stuffed turtle" as values. You want to update it and add a new value like "pink fluffy flamingo". What you can do is:
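import java.util.Collections;
import org.apache.solr.common.SolrInputDocument;
SolrInputDocument updateDocument = new SolrInputDocument();
// the id field must hold the unique key of the document you want to update:
updateDocument.addField("id", 2312312);
// "add" appends the new value to the existing ones, rather than replacing them:
updateDocument.addField("stuffedAnimals", Collections.singletonMap("add", "pink fluffy flamingo"));
// send the partial document and commit; solrServer is your existing SolrServer (SolrClient in 5.x+):
solrServer.add(updateDocument);
solrServer.commit();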
Problem with this is performance: what actually happens when you do this is that the document is removed and re-added entirely (not just the field). This is something you need to take into consideration if you plan on doing a lot of such operations.
I have another need where I want to add an index field but don't want to re-index the entire content. I have the document IDs. For this requirement I can use Lucene if that helps.
Well, as I was saying above: when you update a field, the document is actually re-written entirely, so that means it's re-indexed with the new field as well. If you're using Solr 4.4 or earlier you need to declare the new fields in the schema.xml file. If you're using Solr 4.5 or newer you don't need to worry about the schema.xml any more.
Finally, as a remark for both questions: if you want to update a Solr document, make sure all its fields are marked as "stored" (stored=true in schema.xml). Since a partial update on a field translates into Solr removing and re-adding the document (with the update applied), if certain fields are not stored, Solr won't know what value to put in them after the update.
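Here is a sketch of that check: listing the fields whose values a partial update would silently lose. It reads the field list from the Schema API's `/schema/fields` endpoint. It assumes the Solr version supports `showDefaults=true`, so that every field reports its `stored` flag, and it uses a hypothetical base URL. copyField destinations are excluded because Solr refills them from their sources.

```python
import json
import urllib.request

def fields_lost_on_atomic_update(fields, copy_field_dests=()):
    """Names of non-stored fields that an atomic update would drop.

    `fields` is a list of dicts like {"name": ..., "stored": bool}.
    """
    dests = set(copy_field_dests)
    return [f["name"] for f in fields
            if not f.get("stored", False) and f["name"] not in dests]

def fetch_schema_fields(base_url):
    """GET /schema/fields with showDefaults so every field reports 'stored'."""
    url = f"{base_url}/schema/fields?showDefaults=true&wt=json"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))["fields"]

# fields_lost_on_atomic_update(fetch_schema_fields("http://localhost:8983/solr/mycollection"))
```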

Take a look at the atomic update feature added in Solr 4.0.
It allows you to change the value of a particular field without resending the whole document (Solr still rewrites the document internally).
Remember that all fields in your schema have to be stored (except copyField destinations). If you need further assistance, please post a more detailed description.
