I see the _rev in every document created in ArangoDB, but I have yet to see any information about using those revisions to access the change history for a document. More specifically, how do I query the revision history for a specific document to see the previous versions or even a specific version in time?
My understanding is that the revision (_rev) attribute is just there as a marker so you can know when a field was updated. You can't change it directly, but every time you UPDATE a document in a collection, the _rev value is updated.
To store historical values you would need to implement a process to archive the old values of a document when they get updated.
The _rev attribute can be very helpful when you need to check whether a document has changed. Rather than doing a deep compare between the document and what you expect to see, you can just compare the _rev attribute with the value you expect. If the database returns a different _rev value than the one you were checking for, your code can respond to the change however it needs to.
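As a rough sketch of that check, assuming you keep a dictionary of the last _rev you saw for each _key (the function name and the shape of `known_revs` are made up for illustration, not part of any ArangoDB driver):

```python
def changed_keys(known_revs, fetched_docs):
    """Return the _key of every fetched document whose _rev differs from the remembered one."""
    changed = []
    for doc in fetched_docs:
        # A key we have never seen has no remembered _rev, so it also counts as changed.
        if known_revs.get(doc["_key"]) != doc["_rev"]:
            changed.append(doc["_key"])
    return changed
```

Only the revision strings are compared for equality; no document bodies are inspected.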
Remember, you have access to the old version of a document when you execute an UPDATE or UPSERT command (see the documentation), and you can choose to return the OLD document contents to push off to an archive location, or process them as you wish. The updated document will receive a new _rev value after that update.
The OLD value does not persist after the return of the UPDATE or UPSERT command, so you'll have to archive it right away or the older document will be lost.
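Here is a minimal sketch of that archive-on-update idea. The `execute` callable is a hypothetical stand-in for whatever driver call runs an AQL query with bind variables and returns the results as a list; the `archived_from` and `archived_rev` attribute names are also assumptions. The only ArangoDB features relied on are the AQL statements `UPDATE ... RETURN OLD` and `INSERT ... INTO`.

```python
# AQL statements; @@name is a collection bind parameter, passed as "@name".
UPDATE_RETURN_OLD = "UPDATE @key WITH @patch IN @@collection RETURN OLD"
INSERT_ARCHIVE = "INSERT @doc INTO @@archive"


def update_and_archive(execute, collection, archive, key, patch):
    """Update a document and copy its previous version into an archive collection.

    execute(aql, bind_vars) is a hypothetical callable returning a list of results.
    """
    old_docs = execute(
        UPDATE_RETURN_OLD,
        {"key": key, "patch": patch, "@collection": collection},
    )
    old = old_docs[0]
    # Drop the system attributes so the archive entry gets its own _key/_id/_rev,
    # and keep the original key and revision under assumed attribute names.
    archived = {k: v for k, v in old.items() if k not in ("_id", "_key", "_rev")}
    archived["archived_from"] = old["_key"]
    archived["archived_rev"] = old["_rev"]
    execute(INSERT_ARCHIVE, {"doc": archived, "@archive": archive})
    return archived
```

Running the two statements as separate queries is not atomic; wrapping them in a transaction is left out of this sketch.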
I'm using Google Datastore to get a user's data.
This is what I'm trying to do:
When data is updated, its updated_at [indexed] property gets set to the current timestamp.
I query data on updated_at in ascending order and store the returned cursor for later use.
Now the user has updated the last entity (the one the cursor currently points to), and no other data has been added or updated.
I expect that last entity to be returned by the next query (using that old cursor), because it was updated and now has a new updated_at timestamp.
But that is not the case: I do not see it (my result is an empty list). I have now lost that update completely, because the query will return every other object except the last entity that was updated.
Am I doing something wrong or is this the way it is? If this is natural behavior then what is a preferred way to get the last entity that was updated?
Disclosure: This answer represents just my understanding of how the GAE datastore works. Reality may differ, but the solution should work anyway.
You can think of a cursor as a pointer to a node in linked list.
Basically it stores just the query used to get it and a key to the "last/current" entity. When entities are updated in the datastore, the datastore has no way to update a cursor.
When you change an entity's updated_at field, it does not change the key stored in the cursor. So if you update filtered/ordered properties, the old cursor points to the same node but in a different "chain".
Solution: Instead of storing cursor you can store the last (max) updated_at and query your data with .filter('updated_at >', last_updated_at). This way you will:
Get your entity in the results if updated_at changed (increased)
Have smaller & more readable "cursor" to pass around.
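To illustrate the idea, here is a small in-memory sketch. It uses a plain list of dictionaries as a stand-in for Datastore (it does not call the Datastore API), and the function name is made up; the point is only to show that a timestamp "watermark" picks up an entity again after it is edited.

```python
from datetime import datetime, timedelta


def fetch_updated_since(entities, last_updated_at):
    """Return entities with updated_at > last_updated_at (oldest first) and the new watermark."""
    results = sorted(
        (e for e in entities if e["updated_at"] > last_updated_at),
        key=lambda e: e["updated_at"],
    )
    # Keep the old watermark when nothing new came back.
    watermark = results[-1]["updated_at"] if results else last_updated_at
    return results, watermark


t0 = datetime(2020, 1, 1, 12, 0)
entities = [
    {"id": 1, "updated_at": t0},
    {"id": 2, "updated_at": t0 + timedelta(minutes=1)},
]

first_page, watermark = fetch_updated_since(entities, datetime.min)

# The user edits the last entity, so its timestamp moves forward.
entities[1]["updated_at"] = t0 + timedelta(minutes=5)

# Entity 2 is returned again because its updated_at is now past the watermark.
second_page, watermark = fetch_updated_since(entities, watermark)
```

One caveat: with a strict `>` comparison, an entity written later with exactly the same timestamp as the watermark would be missed. That edge case is not handled here.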
Think of datastore cursors as pointing "between" entities, as in "Here's where to continue the scan."
The documentation says "... a cursor, which is an opaque base64-encoded string marking the index position of the last result retrieved," but that last result won't be re-retrieved.
It is the expected behaviour, because the data was modified between the moment when the cursor was created and when it is used. From Cursors and data updates:
... If the results for a query change between uses of a cursor, the
query notices only changes that occur in results after the cursor.
...
I am trying to update existing documents in a (Sentry-secured) Solr collection. The updates are accepted by Solr, but when I query, the document seems to have disappeared from the collection.
What is going on?
I am using Cloudera (CDH) 5.8.3, and Sentry with document-level access control enabled.
When using document-level access control, Sentry uses a field (whose name is defined in solrconfig.secure.xml, but the default is sentry_auth) to determine which roles can see that document.
If you update a document, but forget to supply a sentry_auth field, then the updated document doesn't belong to any roles, so nobody can see it - it becomes essentially invisible! This is easily done, because the sentry_auth field is typically not a stored field, so won't be returned by any queries.
You therefore cannot just retrieve a document, modify a field, then update the document - you need to know which roles that document belongs to, so you can supply a properly-populated sentry_auth field.
You can make the sentry_auth field a "required" field, in the Solr schema, which will prevent you from accidentally omitting it.
However, this won't prevent you from supplying a blank sentry_auth field (or supplying incorrect roles), either of which will also make the document "disappear".
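A client-side guard can catch the missing and blank cases before the update is sent. This is only a sketch: the function name is made up, and it assumes the auth field is multi-valued so the roles can be sent as a list. It cannot tell whether the roles you pass are the right ones.

```python
def build_secured_update(doc, roles, auth_field="sentry_auth"):
    """Return a copy of doc carrying a non-empty auth field, or raise ValueError."""
    # Discard empty or whitespace-only role names.
    cleaned = [r.strip() for r in roles if r and r.strip()]
    if not cleaned:
        raise ValueError(
            f"refusing to update {doc.get('id')!r}: no roles given for {auth_field}"
        )
    updated = dict(doc)  # leave the caller's document untouched
    updated[auth_field] = cleaned
    return updated
```

You still need to know the document's roles from somewhere other than Solr, since the field is typically not stored.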
Also note that you can update a document that you do not have document-level access to, provided you have write-access to the collection as a whole, and you have the ID of the document. This means that users can (deliberately or accidentally) over-write or delete documents that they cannot see. This is a design choice, made so that users cannot find out whether a particular document ID exists, when they do not have document-level access to it.
See the Cloudera documentation:
http://blog.cloudera.com/blog/2014/07/new-in-cdh-5-1-document-level-security-for-cloudera-search/
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/search_sentry_doc_level.html
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_sentry.html
We need to update the index of Solr 4 but are getting some unexpected results. We run a C# program that uses SolrNet to do an AddRange(). In this process, we're adding new documents and also trying to update existing ones.
We're noticing that some records' fields get updated with the latest data, while others still show the old information. Should we be using the information indicated in the documentation?
The documentation indicates we can set an update="set|add|inc" on the field. If we'd like the existing record to be updated, should we use set? Also, when we delete a field, to have it removed, do we need to shut down Solr and restart? Or set null="true"?
Can you point us to some good information on doing updates to Solr data? Thank you.
The documentation reference that you list describes the parameters for Atomic Updates in Solr 4, which are currently not supported in SolrNet - see issue 199 for more details.
Until this support has been added to SolrNet, your only option for updating documents in the index is to resend the entire document (object in C#) with the required updated/deleted fields set appropriately. Internally Solr will re-add the document to the index with the updated fields.
Also, when you are adding/updating documents in the index, these changes will not be visible to queries against the index until a commit has been issued. I would recommend using the CommitWithin option of AddParameters to allow Solr to handle this internally. This is described in detail in the SolrWiki - CommitWithin.
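For reference, this is roughly what such a request looks like at the HTTP level. The sketch below is plain Python, not SolrNet; the function name and the default of 10000 ms are assumptions. It only builds the request (whole documents plus a commitWithin parameter) and does not send it.

```python
import json
from urllib.parse import urlencode


def build_update_request(base_url, docs, commit_within_ms=10000):
    """Return (url, headers, body) for posting whole documents to Solr's JSON update handler."""
    # commitWithin asks Solr to commit within the given number of milliseconds.
    url = f"{base_url}/update?{urlencode({'commitWithin': commit_within_ms})}"
    headers = {"Content-type": "application/json"}
    # Each document is sent in full; Solr replaces the existing one with the same id.
    body = json.dumps(docs)
    return url, headers, body
```

For example, `build_update_request("http://localhost:8983/solr", [{"id": "1", "name": "steve"}])` produces a URL ending in `/update?commitWithin=10000`.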
I've been attempting to do the equivalent of an UPSERT (insert or update if already exists) in solr. I only know what does not work and the solr/lucene documentation I have read has not been helpful. Here's what I have tried:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","name":{"set":"steve"}}]'
{"responseHeader":{"status":409,"QTime":2},"error":{"msg":"Document not found for update. id=1","code":409}}
I do up to 50 updates in one request and request may contain the same id with exclusive fields (title_en and title_es for example). If there was a way of querying whether or not a list of id's exist, I could split the data and perform separate insert and update commands... This would be an acceptable alternative but is there already a handler that does this? I would like to avoid doing any in house routines at this point.
Thanks.
With Solr 4.0 you can do a partial update of all those documents, sending just the fields that have changed while keeping the rest of each document the same. The id should match.
Solr does not support UPSERT mechanics out of the box. You can create a record or you can update a record and syntax is different.
And if you update the record, you must make sure all your other pre-inserted fields are stored (not just indexed). Under the covers, an update creates a completely new record, just pre-populated with previously stored values. But that functionality is very deep in (probably in Lucene itself).
Have you looked at DataImportHandler? You reverse the control flow (start from Solr), but it does have support for checking which records need to be updated and which records need to be created.
Or you can just run a solr query like http://solr.example.com:8983/solr/select?q=id%3A(ID1+ID2+ID3)&fl=id&wt=csv where you ask Solr to look for your ID records and return only ID of records it does find. Then, you could post-process that to segment your Updates and Inserts.
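That post-processing step might look something like this. It is a sketch under a few assumptions: the function names are made up, ids are plain tokens that need no query escaping, and the CSV response starts with a header line. The `rows` parameter is added so that Solr's default page size does not cut the result short.

```python
import csv
import io
from urllib.parse import urlencode


def build_id_lookup_url(base_url, ids):
    """Build a select URL that returns only the ids Solr already has, as CSV."""
    params = {
        "q": "id:(" + " ".join(ids) + ")",
        "fl": "id",
        "wt": "csv",
        "rows": len(ids),  # otherwise only the default number of rows comes back
    }
    return f"{base_url}/select?{urlencode(params)}"


def split_by_existence(docs, csv_response):
    """Partition docs into (to_update, to_insert) using the CSV body returned by Solr."""
    rows = csv.reader(io.StringIO(csv_response))
    next(rows, None)  # skip the header line ("id")
    existing = {row[0] for row in rows if row}
    to_update = [d for d in docs if d["id"] in existing]
    to_insert = [d for d in docs if d["id"] not in existing]
    return to_update, to_insert
```

If the same new id appears twice in one batch, both copies land in `to_insert`; merging them first is left to the caller.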
I haven't been able to find any information regarding the best way to handle record editing with approval in CakePHP.
Specifically, I need to allow users to edit data in a record, but the edited data should not overwrite the original record data until administrators have approved the change. I could put the edited records in a new table and then overwrite the originals when I approve them but I wonder if there is an easier way since this idea doesn't seem to play well with the cake philosophy so to speak.
You are going to need somewhere to store that data until an administrator can approve it.
I'm not sure how this can be easier than creating another table with the new edits and the original post id. Then when an administrator approves the edit, the script overwrites the old record with the edited version.
I'm working on a similar setup and I'm going with storing the draft record in the same table but with a flag set on the record called "draft". Also, the original record has a "draft_id" field that has the id of the draft record stored in it.
Then in the model when the original record is loaded by the display engine it shows it normally. But when the edit or preview actions try to load the record, it checks the "draft_id" field and then loads the other record if it's set.
The "draft" flag is used to keep list and other group find type actions from grabbing the draft records too. This might also be solved by a more advanced SQL query but I'm not quite that good with SQL.
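To make the flow concrete, here is a rough sketch of the same-table idea using plain Python dictionaries rather than CakePHP models. The function names are made up; only the "draft" flag and "draft_id" field come from the description above, and the approval step is an assumption about how the overwrite could work.

```python
def load_record(records, record_id, for_edit=False):
    """Load a record; edit/preview views get the pending draft when one exists."""
    record = records[record_id]
    if for_edit and record.get("draft_id") is not None:
        return records[record["draft_id"]]
    return record


def list_published(records):
    """List records for index-style views, leaving draft rows out."""
    return [r for r in records.values() if not r.get("draft")]


def approve_draft(records, record_id):
    """Overwrite the original with its draft's data and remove the draft row."""
    original = records[record_id]
    if original.get("draft_id") is None:
        return original  # nothing pending
    draft = records.pop(original["draft_id"])
    for field, value in draft.items():
        # Copy content fields only; keep the original's identity and flags.
        if field not in ("id", "draft", "draft_id"):
            original[field] = value
    original["draft_id"] = None
    return original
```

In a real application the same three operations would live in the model, with `records` replaced by table lookups.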