Update a column in all Solr records to "hello" - solr

There are many records in a Solr collection, and we need to update a particular column in all of them to "hello".
I have executed the JSON below using the update request handler, but it creates a new record with primary key * and sets its column to "hello".
{
"Primary_key":"*",
"Column1":{"set":"hello"}
}
Is there any way to update Column1 to "hello" in all records?

There is no way to update documents in Solr using a wildcard query like '*'.
In my opinion, the best way to speed up this column update is to submit multiple documents in a single update request and use atomic updates.
Atomic updates allow changing individual fields of a document without having to reindex the entire document.
You can send multiple updates in one request, like:
[{"id":"1",
"column1":{"set":"hello"},
{"id":"2",
"column1":{"set":"hello"}]
There is a very old JIRA issue about this.
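If you drive this from a client, here is a minimal SolrJ sketch of the same idea (assuming a core at http://localhost:8983/solr/mycollection, a uniqueKey field named id, and that all fields are stored or docValues, which atomic updates require). It pages through every id with a cursor and sends batched atomic "set" updates:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class SetColumnEverywhere {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/mycollection").build();

        SolrQuery q = new SolrQuery("*:*");
        q.setFields("id");                         // only the unique key is needed
        q.setRows(1000);
        q.setSort(SolrQuery.SortClause.asc("id")); // cursors require a uniqueKey sort

        String cursor = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = solr.query(q);

            List<SolrInputDocument> batch = new ArrayList<>();
            for (SolrDocument d : rsp.getResults()) {
                SolrInputDocument update = new SolrInputDocument();
                update.addField("id", d.getFieldValue("id"));
                // atomic update: only column1 changes, the rest of the doc is kept
                update.addField("column1", Collections.singletonMap("set", "hello"));
                batch.add(update);
            }
            if (!batch.isEmpty()) {
                solr.add(batch);
            }

            String next = rsp.getNextCursorMark();
            done = cursor.equals(next);
            cursor = next;
        }
        solr.commit();
        solr.close();
    }
}

A single commit at the end avoids per-batch commit overhead.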

Related

How to filter out softly deleted records?

For data integrity reasons, all records in a database (PostgreSQL) table have a flag defining whether that particular record is (softly) deleted, e.g. table_name.is_deleted = TRUE/FALSE. I would like to filter these records out of all methods that return data, without having to include is_deleted = FALSE in every condition array.
Is there some functionality / setting in Cake model to ignore such records?
Don't do it like that. Use the history table pattern: copy the row to a history table on delete, using a trigger.
Use something similar to this: https://wiki.postgresql.org/wiki/Audit_trigger_91plus
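A minimal sketch of that trigger in PostgreSQL (assuming a live table table_name; the linked wiki page shows a much more complete audit version):

CREATE TABLE table_name_history (LIKE table_name);

CREATE OR REPLACE FUNCTION archive_table_name_row() RETURNS trigger AS $$
BEGIN
    -- keep a copy of the outgoing row before it disappears
    INSERT INTO table_name_history VALUES (OLD.*);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_name_archive
    BEFORE DELETE ON table_name
    FOR EACH ROW EXECUTE PROCEDURE archive_table_name_row();

Reads then go against the live table only, so no is_deleted condition is needed anywhere.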

Solr generate key

I'm working with Solr, indexing data from a DB.
When I import the data using an SQL query, I get some rows with the same key.
I need a way for Solr to generate a new field with a unique key.
How can I do that?
Thanks
I am not sure if this is possible or not, but maybe you need to reconsider your logic here...
Indexing into Solr should be re-runnable. Imagine that one day you decide to change the schema of your core: you will need to reindex everything.
If you generate a new key every time you import a document, you will end up creating duplicate items when you re-run your data import.
Maybe you need to revisit your DB design to have a unique key, or maybe in the select query you can create a derived or calculated column value based on multiple columns. But I am sure that pushing this problem to Solr is not the solution.
Ideally the unique key should come from the DB (are you sure you cannot get one by composing some columns, etc.?).
But if you cannot, Solr supports UUID generation for this; look here to see how it works depending on your Solr version.
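If you build the documents yourself rather than via DIH, one way to follow the composing-columns advice is to derive a deterministic id, so a re-run of the import overwrites instead of duplicating. A sketch in Java with hypothetical column names (orderNo, lineNo):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

import org.apache.solr.common.SolrInputDocument;

public class CompositeKey {

    // Hashes the identifying columns into a stable id; the same row always
    // produces the same id, so re-imports update rather than duplicate.
    static String compositeId(String... columns) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (String c : columns) {
            sha.update(c.getBytes(StandardCharsets.UTF_8));
            sha.update((byte) 0); // separator, so ("ab","c") != ("a","bc")
        }
        return HexFormat.of().formatHex(sha.digest()); // HexFormat needs Java 17+
    }

    public static void main(String[] args) throws Exception {
        String orderNo = "1042", lineNo = "3"; // hypothetical DB columns
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", compositeId(orderNo, lineNo));
        doc.addField("order_no", orderNo);
        doc.addField("line_no", lineNo);
        System.out.println(doc.getFieldValue("id"));
    }
}

Unlike java.util.UUID.randomUUID(), this stays re-runnable in the sense the answer above describes.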

Outbound message on merging records

When I merge accounts, I want to trigger an outbound message. When I merge accounts, one record is updated and the other record goes to the recycle bin. I want to fetch the record Id of the one that was merged away and of the one that was updated. Is this possible with any conditions, or do I need to write code?
Yes, you need to write a trigger for this, although a rather simple one.
As stated in the documentation, merge doesn't fire its own event; instead, delete and update events are fired.
From the documentation:
To determine which records were deleted as a result of a merge operation use the
MasterRecordId field in Trigger.old. When a record is deleted after losing a merge
operation, its MasterRecordId field is set to the ID of the winning record.
Link to full page
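A minimal sketch of such a trigger in Apex (DetectAccountMerge is an illustrative name; the outbound message itself would still be wired up separately, e.g. through a workflow rule or a queueable callout):

trigger DetectAccountMerge on Account (after delete) {
    for (Account losing : Trigger.old) {
        // MasterRecordId is only populated when the delete came from a merge;
        // it points at the winning (updated) record.
        if (losing.MasterRecordId != null) {
            System.debug('Losing record: ' + losing.Id +
                         ' merged into: ' + losing.MasterRecordId);
            // enqueue whatever sends your outbound message here
        }
    }
}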

Updating Solr index with deleted records

I was trying to figure out how to update the index for the deleted records. I'm indexing from the database. I search for documents in the database, put them in an array and index them by creating a SolrInputDocument.
So, I couldn't figure out how to update the index for the deleted records (because they don't exist in the database now).
I'm using the php-solr-pecl extension.
You need to handle the deletion of the documents separately from Solr.
Solr won't handle it for you.
For incremental indexing, you need to track the documents deleted from the database and then fire a delete query to clean them out of the index.
For this, you have to maintain a timestamp and a delete flag to identify those documents.
For a full reindex, you can just clean the index and reindex everything.
However, in case of failures you may lose all the data.
Solr DIH (DataImportHandler) provides some handling for this (the deletedPkQuery option for delta imports).
Create a delete trigger on the database table which inserts the deleted record id into another table (or have a boolean field "deleted" and mark the record instead of actually deleting it; considering the trade-offs, I would choose the trigger).
Once in a while, do a batch delete on the index based on that "deleted" table, also removing the processed ids from the table itself, as in the sketch below.
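A sketch of that batch job in Java (assuming the trigger fills a deleted_records table, a JDBC-accessible database, and a core at http://localhost:8983/solr/mycollection; in production you would delete from the queue table only the ids you actually processed):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class PurgeDeletedDocs {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/mydb", "user", "pass");
             HttpSolrClient solr = new HttpSolrClient.Builder(
                 "http://localhost:8983/solr/mycollection").build()) {

            // 1. Collect the ids the delete trigger has queued up.
            List<String> ids = new ArrayList<>();
            try (Statement st = db.createStatement();
                 ResultSet rs = st.executeQuery("SELECT record_id FROM deleted_records")) {
                while (rs.next()) {
                    ids.add(rs.getString(1));
                }
            }
            if (ids.isEmpty()) return;

            // 2. Remove them from the index in one batch.
            solr.deleteById(ids);
            solr.commit();

            // 3. Empty the queue only after Solr confirmed the delete.
            try (Statement st = db.createStatement()) {
                st.executeUpdate("DELETE FROM deleted_records");
            }
        }
    }
}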
We faced the same issue and came up with a batch deletion approach.
We created a program that deletes documents from Solr based on the unique id: if a unique id is present in Solr but not in the database, that document can be deleted from Solr.
(unique id list from Solr) minus (unique id list from the database)
You can use SQL MINUS to get the list of unique ids belonging to the documents that need to be deleted.
Alternatively, you can do everything on the Java side: get the list from the database, get the list from Solr, compare the two lists, and delete based on that. This would be a lot faster for a huge number of documents. You can use binary search to do the comparison.
Something like (the database id list must be sorted first for binary search to work):
Collections.binarySearch(databaseUniqueIdList, solrUniqueId);
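A compact sketch of that comparison (hypothetical id lists; the result is what you would pass to deleteById):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OrphanIds {

    // Returns ids present in Solr but missing from the database,
    // i.e. the documents that should be removed from the index.
    static List<String> orphans(List<String> solrIds, List<String> dbIds) {
        Collections.sort(dbIds); // binarySearch requires a sorted list
        List<String> toDelete = new ArrayList<>();
        for (String id : solrIds) {
            if (Collections.binarySearch(dbIds, id) < 0) { // negative = not found
                toDelete.add(id);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<String> solrIds = new ArrayList<>(List.of("1", "2", "3", "4"));
        List<String> dbIds = new ArrayList<>(List.of("4", "2", "1"));
        System.out.println(orphans(solrIds, dbIds)); // prints [3]
    }
}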

Drop, not overwrite, on unique id field

When using a unique id field, Solr will overwrite old documents with newly indexed documents. Is there any way to prevent this, so that the old documents are kept and the new ones are dropped?
Thanks.
Nope, Solr will delete the existing record and insert a new one by default.
You can check Deduplication and UpdateXmlMessages#Optional_attributes, which may serve the purpose.
You can write your own update processor that detects an existing document and skips the add, by extending UpdateRequestProcessorFactory/UpdateRequestProcessor.
Otherwise, you can check whether the id already exists and, if so, not insert the new record. That puts the overhead on the client side, as in the sketch below.
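A sketch of that client-side check with SolrJ (assuming the default /get realtime-get handler is enabled; note there is still a race if two indexers check concurrently):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddIfAbsent {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/mycollection").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("title", "first version wins");

        // Realtime get sees uncommitted adds too, so the check is not limited
        // to documents a commit has already made visible.
        if (solr.getById("42") == null) {
            solr.add(doc);
            solr.commit();
        } // else: an old document exists, so the new one is dropped

        solr.close();
    }
}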
