Drop, not overwrite, on unique id field - solr

When using a unique id field, Solr will overwrite old documents with newly indexed documents. Is there any way to prevent this, so that the old documents are kept and the new ones are dropped?
Thanks.

Nope. Solr will delete the existing record and insert a new one by default.
You can check Deduplication and UpdateXmlMessages#Optional_attributes, which may serve the purpose.
You can write your own update request processor that extends UpdateRequestProcessorFactory/UpdateRequestProcessor and detects the duplicate before the add is processed.
Else, you can check whether the id already exists and then not insert the new record. That puts the overhead on the client side.
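For that last option, a minimal client-side sketch using SolrJ; the core name, id value, and field names are illustrative assumptions, not part of the question. It queries for the id first and only adds the document when nothing is found.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddIfAbsent {
    public static void main(String[] args) throws Exception {
        // core name "mycore" and field names are assumptions for this sketch
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        String id = "42";
        SolrQuery q = new SolrQuery("id:" + id);
        q.setRows(0); // only the hit count is needed

        if (solr.query(q).getResults().getNumFound() == 0) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("title", "first version, kept forever");
            solr.add(doc);
            solr.commit();
        } // otherwise drop the new document and keep the existing one

        solr.close();
    }
}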

Related

Update all records in a solr column to "hello"

There are many records in a Solr collection. We need to update a particular column to "hello".
I have executed the JSON below using the update request handler, but it creates a new record with primary key * and sets its column to hello.
{
"Primary_key":"*",
"Column1":{"set":"hello"}
}
Is there any way to update column1 in all records to hello?
There is no way to update documents in Solr using a query like '*'.
In my view, the best way to speed up this column update is to submit multiple documents in a single update request and use atomic updates.
Atomic updates allow changing only selected fields of a document without having to reindex the entire document.
You can send multiple update documents in one request, like:
[{"id":"1",
"column1":{"set":"hello"},
{"id":"2",
"column1":{"set":"hello"}]
There is a very old JIRA issue about this.
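A rough SolrJ sketch of that approach, paging over the ids and sending atomic "set" updates in batches; the core name, field names, and page size are assumptions for the example.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SetColumnForAll {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrQuery q = new SolrQuery("*:*");
        q.setFields("id");   // only the unique key is needed
        q.setRows(1000);     // page size; consider cursorMark for very large collections

        int start = 0;
        while (true) {
            q.setStart(start);
            List<SolrDocument> page = solr.query(q).getResults();
            if (page.isEmpty()) break;

            List<SolrInputDocument> batch = new ArrayList<>();
            for (SolrDocument d : page) {
                SolrInputDocument update = new SolrInputDocument();
                update.addField("id", d.getFieldValue("id"));
                // atomic update: only column1 is changed, the rest of the document is kept
                update.addField("column1", Collections.singletonMap("set", "hello"));
                batch.add(update);
            }
            solr.add(batch);
            start += page.size();
        }
        solr.commit();
        solr.close();
    }
}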

Solr Indexing duplicate documents

I am using Solr to store file paths and my 'id' (uniqueKey), and to index the file content. When I change the file contents and re-index it, it replaces the contents of the file in the index. Is there any way I can retain the old version of the file under the same id? I tried adding the overwrite=false parameter with no luck. I am using Solr 6.1.0.
I think you cannot do that under the same id, since id is the uniqueKey.
It is not even possible in an RDBMS.
There, it could be achieved by giving the changed content a new id (treat it as a new document with a new id) and then maintaining a relation between the new id and the old id.
You can use a similar concept for Solr as well: every document gets another field besides id, say older_id.
In older_id you store the id of the document that holds the older version of the content.
With this, your older documents will not be deleted from Solr: the new content is indexed as a new document with a new id, and its older_id holds the previous document's id.
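As an illustration of that scheme, a small SolrJ sketch; the core name, ids, and field names (older_id, filepath, content) are made up for the example. The changed file content is indexed as a new document whose older_id points at the previous version.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexNewVersion {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/files").build();

        // The file was first indexed under id "doc-1". Its content changed,
        // so the new content is indexed under a fresh id that points back to the old one.
        SolrInputDocument newVersion = new SolrInputDocument();
        newVersion.addField("id", "doc-2");
        newVersion.addField("older_id", "doc-1");           // link to the previous version
        newVersion.addField("filepath", "/data/report.txt");
        newVersion.addField("content", "the updated file contents");

        solr.add(newVersion);
        solr.commit();
        solr.close();
    }
}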

Dynamically adding a sharding key in ArangoDB

I'm installing a clustered database with ArangoDB, and I need to use indexes on collections.
Suppose we have one collection named myCollection that was created with the shard key _key.
Let myVariable be the unique key of myCollection, so I have a unique constraint on myVariable.
myCollection is already created, and data are inside.
I don't want to erase everything, create myCollection again with myVariable as an additional shard key, and restore the data, so I need to add a new shard key dynamically while myCollection already exists.
Is this possible? Can I somehow add a new shard key?
I mean, add a key to the _shardBy label without recreating the collection.
Thanks for the help.
No, changing the shard key after creation is not supported. If you take a look at the consequences this would have, it's easy to understand why:
The shard key tells the coordinator which documents should end up on which cluster node. Vice versa, it can therefore predict where to search for documents based on the shard key. This assumption would fail if you changed that condition to an arbitrary new one, so documents no longer matching the condition would have to be moved to the correct new shard.
As you see, you need to work with all documents anyways. So if you don't want to download all data to the client, some javascript on the coordinator like a Foxx Service could fill the gap:
create the new collection with the proper shard key
fetch all _keys into memory
issue repetitive AQL queries that select a range from the old collection and insert it into the new one (see the sketch below)
You may want to start an additional coordinator if you don't want to use your existing setup for this.
Hint: An upgrade to ArangoDB 3.0 will require a dump/restore cycle anyways - so if you can postpone your problem a little you may solve it then.
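A rough sketch of steps 2 and 3 using the ArangoDB Java driver. The database name, collection names (oldColl, newColl), and batch size are assumptions, and newColl is assumed to have already been created with myVariable in its shardKeys.

import com.arangodb.ArangoDB;
import com.arangodb.ArangoDatabase;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CopyToResharded {
    public static void main(String[] args) {
        ArangoDB arango = new ArangoDB.Builder().host("127.0.0.1", 8529).build();
        ArangoDatabase db = arango.db("mydb");

        // step 2: fetch all _keys of the old collection into memory
        List<String> keys = db
            .query("FOR d IN oldColl RETURN d._key", null, null, String.class)
            .asListRemaining();

        // step 3: copy the documents over in ranges so no single query gets too large
        int batch = 1000;
        for (int i = 0; i < keys.size(); i += batch) {
            Map<String, Object> bind = new HashMap<>();
            bind.put("keys", keys.subList(i, Math.min(i + batch, keys.size())));
            db.query(
                "FOR d IN oldColl FILTER d._key IN @keys "
                + "INSERT UNSET(d, '_id', '_rev') INTO newColl",
                bind, null, Void.class);
        }
        arango.shutdown();
    }
}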

Outbound message on merging records

When I merge accounts, I want to trigger an outbound message. When I merge accounts, one record is updated and the other record goes to the Recycle Bin. I want to fetch the record Id of the one that was merged and of the one that was updated. Is it possible with any conditions, or do I need to write code?
Yes, you need to write a trigger for this, although a rather simple one.
As stated in the documentation, merge doesn't fire its own event; instead, delete and update events are fired.
From the documentation:
To determine which records were deleted as a result of a merge operation use the
MasterRecordId field in Trigger.old. When a record is deleted after losing a merge
operation, its MasterRecordId field is set to the ID of the winning record.
Link to full page

Updating solr index with deleted records

I was trying to figure out how to update the index for the deleted records. I'm indexing from the database. I search for documents in the database, put them in an array and index them by creating a SolrInputDocument.
So, I couldn't figure out how to update the index for the deleted records (because they don't exist in the database now).
I'm using the php-solr-pecl extension.
You need to handle the deletion of the documents separately from Solr.
Solr won't handle it for you.
In the case of incremental indexing, you need to keep track of the documents deleted from the database and then fire a delete query for them to clean up the index.
For this you have to maintain a timestamp and a delete flag to identify the documents.
In the case of a full reindex, you can just clean up the index and reindex everything.
However, in case of failures you may lose all the data.
Solr DIH provides a bit of handling for this.
Create a delete trigger on the database table which inserts the deleted record id into another table (or have a boolean "deleted" field and mark the record instead of actually deleting it; considering the trade-offs, I would choose the trigger).
Once in a while, do a batch delete on the index based on that "deleted" table, also removing the entries from the table itself.
We faced the same issue and came up with batch deletion approach.
We created a program that deletes documents from Solr based on the unique id: if a unique id is present in Solr but not in the database, that document can be deleted from Solr.
(Get the uniqueid list from SOLR) minus (uniqueid list from database)
You can just use SQL MINUS to get the list of unique ids belonging to the documents that need to be deleted.
Else you can do everything on the Java side: get the list from the database, get the list from Solr, do a comparison between the two lists, and delete based on that. This would be a lot faster for a huge number of documents. You can use a binary search to do the comparison.
Something like
Collections.binarySearch(DatabaseUniqueidArray, "SOLRuniqueid");
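For completeness, a rough Java sketch of that comparison, using a HashSet lookup instead of the binary search mentioned above (same idea, simpler code). The JDBC URL, table name, core name, and field names are all assumptions for the example.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PurgeDeletedRecords {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        // collect the ids that still exist in the database
        Set<String> dbIds = new HashSet<>();
        try (Connection con = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id FROM documents")) {
            while (rs.next()) dbIds.add(rs.getString("id"));
        }

        // collect the ids currently in the index (paged; use cursorMark for very large indexes)
        List<String> toDelete = new ArrayList<>();
        SolrQuery q = new SolrQuery("*:*");
        q.setFields("id");
        q.setRows(1000);
        int start = 0;
        while (true) {
            q.setStart(start);
            List<SolrDocument> page = solr.query(q).getResults();
            if (page.isEmpty()) break;
            for (SolrDocument d : page) {
                String id = (String) d.getFieldValue("id");
                if (!dbIds.contains(id)) toDelete.add(id); // in Solr but no longer in the DB
            }
            start += page.size();
        }

        if (!toDelete.isEmpty()) {
            solr.deleteById(toDelete);
            solr.commit();
        }
        solr.close();
    }
}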
