Use Objectify to retrieve entity group's metadata - google-app-engine

How can I use Objectify to retrieve the entity group's metadata, specifically the __version__ property?
I would like to use the __version__ property to determine whether a transaction should retry. That's because "it is possible to receive a DatastoreTimeoutException or DatastoreFailureException even when a transaction has been committed and will eventually be applied successfully."

There is nothing special in Objectify to manage this field, but if you create a field like this, it should get populated with the version number:
@IgnoreSave long __version__;
That's two underscores on each side.
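
For context, a minimal sketch of how that might look on an Objectify entity (Objectify 4+ annotations; the Page entity and its id field are just illustrative):

import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.IgnoreSave;

@Entity
public class Page {
    @Id Long id;

    // Per the answer above, this should be populated with the entity's
    // version number on load; @IgnoreSave keeps it from being written back.
    @IgnoreSave long __version__;
}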

Related

Solr document disappears when I update it

I am trying to update existing documents in a (Sentry-secured) Solr collection. The updates are accepted by Solr, but when I query, the document seems to have disappeared from the collection.
What is going on?
I am using Cloudera (CDH) 5.8.3, and Sentry with document-level access control enabled.
When using document-level access control, Sentry uses a field (whose name is defined in solrconfig.secure.xml, but the default is sentry_auth) to determine which roles can see that document.
If you update a document, but forget to supply a sentry_auth field, then the updated document doesn't belong to any roles, so nobody can see it - it becomes essentially invisible! This is easily done, because the sentry_auth field is typically not a stored field, so won't be returned by any queries.
You therefore cannot just retrieve a document, modify a field, then update the document - you need to know which roles that document belongs to, so you can supply a properly-populated sentry_auth field.
You can make the sentry_auth field a "required" field, in the Solr schema, which will prevent you from accidentally omitting it.
However, this won't prevent you from supplying a blank sentry_auth field (or supplying incorrect roles), either of which will also make the document "disappear".
Also note that you can update a document that you do not have document-level access to, provided you have write-access to the collection as a whole, and you have the ID of the document. This means that users can (deliberately or accidentally) overwrite or delete documents that they cannot see. This is a design choice, made so that users cannot find out whether a particular document ID exists, when they do not have document-level access to it.
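
As a rough illustration, here is a SolrJ-style sketch of an update that keeps the document visible by re-sending its authorization roles along with the changed fields. The collection name, ZooKeeper address, field values and roles are made up, and the client construction shown (SolrJ 5.x/6.x-style) differs across SolrJ versions:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

CloudSolrClient solr = new CloudSolrClient("zkhost1:2181,zkhost2:2181/solr");
solr.setDefaultCollection("my_collection");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-123");
doc.addField("title", "Updated title");
// Re-supply every role this document belongs to; omitting or blanking
// sentry_auth makes the updated document invisible to everyone.
doc.addField("sentry_auth", "role_finance");
doc.addField("sentry_auth", "role_audit");

solr.add(doc);
solr.commit();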
See the Cloudera documentation:
http://blog.cloudera.com/blog/2014/07/new-in-cdh-5-1-document-level-security-for-cloudera-search/
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/search_sentry_doc_level.html
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_sentry.html

GAE Datastore (Golang): Filter Query When Adding New DB Field

I'm running a GAE Golang application that's working with datastore. I have a struct which translates to a DB model on datastore, and I have added a new field to the struct, call it NewField (type string).
Existing instances ("rows" in the DB) for this struct are of course missing this NewField, which is expected.
I'm looking to create a query that will return all instances where this NewField is missing (the existing instances).
This is what I tried:
q := datastore.NewQuery("MyModel")
q = q.Filter("NewField =", "")
However this doesn't seem to work.
Any ideas on how to achieve this?
The bad news is that you can't.
Every query on GAE Datastore operates on an index. Since you just added the new property, existing entities will not appear in any index that includes that property. What you would need is a way to loop over entities that have no index record for it, but that is not possible.
Your best bet is to query all entities and do the filtering / updating manually in Go code, for entities where the NewField field holds its zero value. Once you re-save existing entities, the new property will get indexed, and you will be able to search / filter by that property in the future.
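
A rough sketch of that backfill, assuming the google.golang.org/appengine/datastore package and an illustrative kind/struct named MyModel (adjust the names and the default value to your model):

import (
    "golang.org/x/net/context" // or the standard library "context", depending on your SDK version
    "google.golang.org/appengine/datastore"
)

type MyModel struct {
    ExistingField string
    NewField      string
}

func backfillNewField(ctx context.Context) error {
    // Load every entity of this kind; a property missing from a stored
    // entity simply comes back as the field's zero value ("").
    var items []MyModel
    keys, err := datastore.NewQuery("MyModel").GetAll(ctx, &items)
    if err != nil {
        return err
    }
    for i := range items {
        if items[i].NewField == "" {
            items[i].NewField = "some sensible default"
            // Re-saving writes (and indexes) the new property.
            if _, err := datastore.Put(ctx, keys[i], &items[i]); err != nil {
                return err
            }
        }
    }
    return nil
}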
If by any chance your entities store the creation time or last updated time (in a property), then you may use that: filter by last updated time to list only entities where the timestamp is less than the time when you added the new property to your Go model.
Another option (for future changes) is to add a "version" property to your entities. Whenever you perform a model update, increment the version for new entities. And you can always query entities with old versions (or with a specific version).

Error "Non-repeated field already set." when loading from Datastore into BigQuery

[EDIT 20160426: This bug appears to have been solved now!]
[EDIT 20160219: Updated this question again, to reflect different error messages. See also the bug report I filed.]
We have a datastore table that contains a field category, of type Category, which is a custom class. The problem arises when we try to load this table into BigQuery (from a datastore backup). The resulting table should contain (simplified):
category.subfield1
,category.subfield2
,category.subfield3.subsubfield1
,category.subfield4
,category.subfield5
Instead, BigQuery wreaks havoc on the category field:
category_1.record.subfield1
,category_1.record.subfield2
,category_1.record.subfield3.subsubfield1
,category_1.entity.subfield1
,category_1.entity.subfield1
,category_1.entity.subfield3.subsubfield1
,category_1.entity.subfield4
,category_1.entity.subfield5
,category_1.provided
(Omitting a dozen __key__ subfields for reasons of exposition.)
Before 20160219, the garbled output of the category-field was even worse, but there was a workaround: explicitly listing all the fields, including category, through the option projection_fields. Now that is no longer possible, since it results in a different error message: Field:category [...] Entity was of unexpected kind "__record__"
Original job-ids:
project id: 711939958575
without projection_fields: job_Qw6-ygtZNFJ-Y7W0uLEqdvOrO_8
with projection_fields: job_lzzXo92lud9r5kvW7Z1kuzFLxS4
We came across the same problem when loading backups from datastore into BigQuery. We had an 'Order' entity containing a nested entity 'Customer'. Ever since we added an index on one of the fields in the nested 'Customer' entity, we would get the "Non-repeated field already set" error from BigQuery.
The reason was that setting an index on a field of the nested entity (e.g. an index on the email field of Customer) created an indexed property on the Order entity called customer.email. When loading the data into BigQuery this results in two fields called customer.email: one from the nested entity and one from the index.
The solution for us was to remove the indexes on nested entities, in order to avoid these conflicts while loading datastore backups into BigQuery. Unfortunately we did have to remove all existing records in the database, which for us wasn't a big problem; alternatively you would have to make sure the index is properly removed.

Set up relation on two existing Salesforce objects

I have a custom object in Salesforce for which I need to set up a Master-Detail relationship from Accounts, with Account as the Master and CompHist as the Detail. The problem I am running into is that I need the relation to work off of custom fields within the objects. Example:
1.) Accounts has a custom field called CustomerId.
2.) CompHist also has custom field called CustomerId.
3.) I need to be able to have these linked together by the CustomerId field for report generation.
About 2,000 records are inserted into CompHist around the 8th of each month. This is done from a .NET application that kicks off at the scheduled time, collects info from our databases and then uploads that data to salesforce via the SOAP API.
Maybe I'm misunderstanding how Salesforce relationships work as I am fairly new (couple months) to salesforce development.
Thanks,
Randy
There is a way to get this to work without triggers that link the records, and without pre-querying Salesforce from .NET to learn the Account Ids before you push the CompHist records.
Setup
On Account: set the "External ID" checkbox on your CustomerId field. I'd recommend setting "Unique" too.
On CompHist: you'll need to decide whether it's acceptable for these records to be moved to a different Account later, or whether, once the relation to Account is set, it stays like that forever. When you've made that decision, tick / untick "reparentable master-detail" in the definition of your lookup / master-detail to Account.
And if you have some Id on these details, something like "line item number" - consider making an Ext. Id for them too. It might save your bacon some time in the future when an end user questions the report, or when you have to make some kind of "flush" and push all lines from .NET again (it will help you figure out what's to insert and what's to update).
At this point it's useful to think about how you are going to fill in the missing data (all the nulls in the Ext. Id field).
Actually establishing the relationship
If you have the external ids set, it's pretty easy to tell Salesforce to figure out the linking for you. The operation is called upsert (a mix between update and insert) and comes in two flavours.
"Basic" upsert solves the create-or-update decision; it means "dear Salesforce, please save this CompHist record with MyId=1234. I don't know what its Id is in your database and frankly I don't care, go figure this out will ya?"
If there was no such record - 1 will be created.
If there was exactly 1 match - it will be updated.
If there was more than 1 match - SF won't know which one to update and will throw an error back at you (that's why marking the field as "unique" is a good idea; there's a chance you'll spot errors sooner).
"Advanced" upsert is for maintaining foreign keys, establishing lookups. "Dear SF, please hook this CompHist up to Account which is marked as "ABZ123" in my DB. Did I mention I don't care about your Ids and I can't be bothered to query your database first prior to me uploading my stuff?"
Again - exact match - works as expected.
0 or 2+ Accounts with the same ext. id value = error.
Code plz
I'd recommend you play with Data Loader or a similar tool first to get a grasp of what exactly happens, how to map fields and how not to get confused (these 2 flavours of upsert can be used at the same time). Once you manage to push the changes the way you want, you can modify your integration a bit.
SOAP API upsert: http://www.salesforce.com/us/developer/docs/api/Content/sforce_api_calls_upsert.htm (C# example at the bottom)
REST API: http://www.salesforce.com/us/developer/docs/api_rest/Content/dome_upsert.htm
If you'd prefer a Salesforce Apex example: Can I insert deserialized JSON SObjects from another Salesforce org into my org?
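
For orientation only, here is a rough Java sketch of the "advanced" upsert against stubs generated from an enterprise WSDL. Every API name below (CompHist__c, LineNumber__c, CustomerId__c, the Account__r relationship) is hypothetical and depends on how the objects and the relationship are actually defined in your org; connection is assumed to be an already-established EnterpriseConnection:

// Detail record, identified by its own external id (the "line item number" idea above).
CompHist__c hist = new CompHist__c();
hist.setLineNumber__c("ABZ123-2016-05");

// Reference the parent Account purely by its external id;
// Salesforce resolves it to the real Account Id during the upsert.
Account parentRef = new Account();
parentRef.setCustomerId__c("ABZ123");
hist.setAccount__r(parentRef);

// Upsert on the detail's external id field.
UpsertResult[] results = connection.upsert("LineNumber__c", new SObject[] { hist });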

App Engine: create object inside a transaction

I'm writing a site on GAE-Java + Objectify which lets users create their own pages, each with a unique URL. I haven't been able to figure out a clear way to ensure that when two users try to claim the same URL at the same time, only one of them gets it.
This is what I'm trying to avoid:
User 1 does a check - it's available
User 2 does a check - it's available
Meanwhile, User 1 creates the page and stores it.
User 2 creates a page and overwrites User 1's.
Any ideas on how to solve this on GAE?
Why not just run your code in a transaction? I don't see where the issue is. Do you have a sample of something you've tried and had problems with?
Found a clearer explanation in the Python docs:
Attempts to get the entity of the model's kind with the given key name. If it exists, get_or_insert() simply returns it. If it doesn't exist, a new entity with the given kind, name, and parameters in kwds is created, stored, and returned.
The get and subsequent (possible) put are wrapped in a transaction to ensure atomicity. This means that get_or_insert() will never overwrite an existing entity, and will insert a new entity if and only if no entity with the given kind and name exists.
In other words, get_or_insert() is equivalent to this Python code:
def txn():
    entity = MyModel.get_by_key_name(key_name, parent=kwds.get('parent'))
    if entity is None:
        entity = MyModel(key_name=key_name, **kwds)
        entity.put()
    return entity
return db.run_in_transaction(txn)
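
The same get-or-create pattern in GAE-Java with Objectify might look roughly like this (a sketch assuming Objectify 4+-style syntax; Page, url and currentUserId are illustrative, with Page being an @Entity whose @Id is the URL string):

// Assumes: import static com.googlecode.objectify.ObjectifyService.ofy;
// and import com.googlecode.objectify.Work;
Page page = ofy().transact(new Work<Page>() {
    public Page run() {
        Page existing = ofy().load().type(Page.class).id(url).now();
        if (existing != null) {
            return existing;   // the URL has already been claimed
        }
        Page created = new Page(url, currentUserId);
        ofy().save().entity(created).now();
        return created;
    }
});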
