How to change the unique key to a new field and update all documents without re-indexing? - solr

I want to change the unique key field of my documents to a new field and update the documents without re-indexing. What options do we have to achieve this? Solr version 8.2, running SolrCloud.

If you change anything on the schema side, you need to reindex the data; there is no real alternative to that.
But SolrCloud has a nice feature that can be used here without restarting the Solr server:
Create a new configset and make all your changes in it.
Upload the new configset; ZooKeeper maintains all your configsets.
Create a new collection using the new configset.
Index your data into the new collection.
Once all the data is indexed, create an alias to the new collection and give the alias the same name as the old collection.
All new requests will then be routed to the new collection.
Once everything is in place, you can delete the old collection.
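To make those steps concrete, here is a minimal SolrJ sketch. The ZooKeeper address, the collection names (products, products_v2) and the configset name (products_v2_config) are placeholders for illustration, not anything taken from the question:

```java
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class SwitchCollection {
    public static void main(String[] args) throws Exception {
        // Connect to the SolrCloud cluster via ZooKeeper (host/port are assumptions).
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                List.of("localhost:2181"), Optional.empty()).build()) {

            // 1. Create a new collection backed by the new configset
            //    (shard/replica counts are placeholders).
            CollectionAdminRequest
                .createCollection("products_v2", "products_v2_config", 1, 1)
                .process(client);

            // 2. ...reindex all documents into products_v2 here...

            // 3. Point an alias with the old collection's name at the new collection,
            //    so existing clients keep querying the same name.
            CollectionAdminRequest
                .createAlias("products", "products_v2")
                .process(client);

            // 4. Once the alias is serving all traffic, the old collection can be
            //    deleted (per the steps above), e.g. with
            //    CollectionAdminRequest.deleteCollection(...).
        }
    }
}
```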

Related

How can I have multiple collections in Solr

Hi, I am new to Solr and I'm trying to get my bearings.
Using Solr in my case might not be the best idea, or might be a bit overkill, but this is just for testing to see how to use it.
I would like to create a database which handles users, posts and pages; in MongoDB I would have created a collection for users, a collection for posts and a collection for pages, which would obviously contain the individual documents.
I don't know how I would be able to replicate that in Solr. I have created a core for users, which I thought is like a collection in MongoDB. To add posts and pages, do I then create a new core for each, or is there another way to separate the data?
Thank you for the advice.
Yes, you can have separate collections in Solr as well.
With recent versions of Solr you can use SolrCloud and create multiple collections.
Each collection can handle a separate entity.
Please refer to the links below for more details:
Solr Collection API
Solr Collection Management
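As a rough illustration of one collection per entity (the ZooKeeper address, collection names and the use of the built-in _default configset are assumptions), creating the collections with SolrJ could look like this:

```java
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateEntityCollections {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                List.of("localhost:2181"), Optional.empty()).build()) {

            // One collection per entity, analogous to MongoDB collections.
            for (String entity : List.of("users", "posts", "pages")) {
                CollectionAdminRequest
                    .createCollection(entity, "_default", 1, 1)
                    .process(client);
            }
        }
    }
}
```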

Is there a transient kind of annotation for Spring-data-mongodb

I just changed my Spring Boot application from using PostgreSQL to MongoDB. I am trying to get MongoDB to leave out a property when storing an object, but the @Transient annotation doesn't work. Is there a way to do this?
@Transient is supposed to work with Spring Data MongoDB.
From the Spring Data MongoDB documentation:
@Transient: By default all private fields are mapped to the document; this annotation excludes the field where it is applied from being stored in the database.
If it isn't working for you, I suggest including your code in your question.
Note that if you forgot to add @Transient, added some data to your database, and only later added @Transient to a field, Spring Data MongoDB isn't going to go through the database collection and delete that field from all the documents that currently have it; it just won't include that field in any new documents it saves to the collection.
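A minimal sketch of how the annotation is typically used; the User class and its fields are made up for illustration. Make sure you import Spring Data's org.springframework.data.annotation.Transient (the JPA javax.persistence.Transient is not recognized by Spring Data MongoDB's mapping):

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.annotation.Transient;
import org.springframework.data.mongodb.core.mapping.Document;

// Hypothetical document class used only to illustrate the annotation.
@Document(collection = "users")
public class User {

    @Id
    private String id;

    private String name;

    // Excluded from the stored MongoDB document; only kept in memory.
    @Transient
    private int loginAttempts;

    // getters/setters omitted for brevity
}
```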

Changes to Solr schema.xml do not update after stopping and restarting Solr

I am a new learner of Solr. Now I want to make my own schema.xml, so I added some fields. I stopped Solr and restarted it. In the Solr admin UI I can see the changes in the schema choice, but the content of the schema browser doesn't change. And when I try to index a document, there is an error saying the field I just added does not exist in the schema. The content of the schema browser is not the same as the schema file.
Changing the schema of a core doesn't change the documents you already have there, which is why they look the same even after you restart the Solr service. You need to re-upload the documents with the new fields specified (if they are required fields) after you make a schema change to get these new fields for existing documents.
From here I went to the path of my core instance to make the changes:
/usr/local/Cellar/solr@7.7/7.7.3_1/server/solr/drupal
Then I was able to confirm the changes by clicking on Files and scrolling to where I made the change.
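As the first answer says, existing documents only pick up a new schema field when they are sent again. A minimal SolrJ sketch of re-uploading a document with the new field; the core URL, document id and field names are placeholders:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReindexWithNewField {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {

            // Re-send the document including the field added to schema.xml.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title", "Existing document");
            doc.addField("new_field", "value for the newly added field");

            client.add(doc);
            client.commit();
        }
    }
}
```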

Add metadata from database to Solr Index created by Nutch

I have a bespoke CMS that needs to be searchable in Solr. Currently, I am using Nutch to crawl the pages based on a seed list generated from the CMS itself.
I need to be able to add metadata stored in the CMS database to the document indexed in Solr. So, the thought here is that the page text (html generated by the CMS) is crawled via Nutch and the metadata is added to the Solr document where the unique ID (in this instance, the URL) is the same.
As such, the metadata from the DB can be used for facets / filtering etc while full-text search and ranking is handled via the document added by Nutch.
Is this pattern possible? Is there any way to update the fields expected from the CMS DB after Nutch has added it to Solr?
Solr has the ability to partially update a document, provided that all your document fields are stored. See this. This way, you can define several fields for your document that are not originally filled by Nutch; after the document is added to Solr by Nutch, you can update those fields with your database values.
In spite of this, I think there is one major problem to be solved. Whenever Nutch recrawls a page, it updates the entire document in Solr, so your updated fields are lost. Even the first time around, you must be sure that Nutch adds the document first and the fields are updated afterwards. To solve this, I think you need to write a plugin for Nutch or a special request handler for Solr to know when updates are happening.
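A minimal SolrJ sketch of such a partial (atomic) update, assuming the page URL is the unique key and that the metadata fields used here (category, author) exist in the schema with all fields stored; the collection URL and values are placeholders:

```java
import java.util.Collections;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class EnrichNutchDocument {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/nutch").build()) {

            // "id" matches the unique key (the URL) Nutch used when indexing the page.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "https://cms.example.com/page/42");

            // "set" replaces only these fields, leaving the fields Nutch wrote intact.
            doc.addField("category", Collections.singletonMap("set", "news"));
            doc.addField("author", Collections.singletonMap("set", "Jane Doe"));

            client.add(doc);
            client.commit();
        }
    }
}
```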

Solr - Unique Key Exception

Is there any way Solr can throw an exception back, either in the status or in an exception message, for an update request that has an existing unique key? Right now, Solr just sends back a successful update message with status 0 while it is not adding the document. I need the ability to tell from the client side that a document was not added because of a duplicate unique key.
Thanks!
If a document with the same unique id exists, Solr just updates (overwrites) the doc. It is by design and, as far as I know, there is no way to change it.
You can run a Solr query before you update/add a doc, so that you are not adding it again... but that is not really transactional (Solr is not a database). It would work if you are the only one updating Solr and the changes are serialized, etc.
If you have this stringent requirement of not adding existing ids, you could use an intermediary database, load it, and reindex Solr from that?
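A minimal SolrJ sketch of the query-before-add approach; as noted above it is not transactional, so it only helps if updates are serialized. The core URL, id and field names are placeholders:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddIfAbsent {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {

            String id = "doc-123";

            // Check whether a document with this unique key already exists.
            long found = client.query(new SolrQuery("id:\"" + id + "\""))
                               .getResults().getNumFound();

            if (found == 0) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", id);
                doc.addField("title", "New document");
                client.add(doc);
                client.commit();
            } else {
                // Treat this as the "duplicate key" case on the client side.
                System.err.println("Document with id " + id + " already exists; skipping add.");
            }
        }
    }
}
```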