Updating Model Schema in Google App Engine? - google-app-engine

Google is proposing changing one entry at a time to the default values ....
http://code.google.com/appengine/articles/update_schema.html
I have a model with a million rows and doing this with a web browser will take me ages. Another option is to run this using task queues but this will cost me a lot of cpu time
any easy way to do this?

Because the datastore is schema-less, you do literally have to add or remove properties on each instance of the Model. Using Task Queues should use the exact same amount of CPU as doing it any other way, so go with that.
Before you go through all of that work, make sure that you really need to do it. As noted in the article that you link to, it is not the case that all entities of a particular model need to have the same set of properties. Why not change your Model class to check for the existence of new or removed properties and update the entity whenever you happen to be writing to it anyhow.

Instead of what the docs suggest, I would suggest to use low level GAE API to migrate.
The following code will migrate all the items of type DbMyModel:
new_attribute will be added if does not exits.
old_attribute will be deleted if exists.
changed_attribute will be converted from boolean to string (True to Priority 1, False to Priority 3)
Please note that query.Run returns iterator returning Entity objects. Entity objects behave simply like dicts:
from google.appengine.api.datastore import Query, Put
query = Query("DbMyModel")
for item in query.Run():
if not 'new_attribute' in item:
item['attribute'] = some_value
if 'old_attribute' in item:
del item['old_attribute']
if ['changed_attribute'] is True:
item['changed_attribute'] = 'Priority 1'
elif ['changed_attribute'] is False:
item['changed_attribute'] = 'Priority 3'
#and so on...
#Put the item to the db:
Put(item)
In case you need to select only some records, see the google.appengine.api.datastore module's source code for extensive documentation and examples how to create filtered query.
Using this approach it is simpler to remove/add properties and avoid issues when you have already updated your application model than in GAE's suggested approach.
For example, now-required fields might not exist (yet) causing errors while migrating. And deleting fields does not work for static properties.

This doesn't help OP but may help googlers with a tiny app: I did what Alex suggested, but simpler. Obviously this isn't appropriate for production apps.
deploy App Engine Console
write code right inside the web interpreter against your live datastore
like so:
from models import BlogPost
for item in BlogPost.all():
item.attr="defaultvalue"
item.put()

Related

Adding Prefix to Firestore Default Document Id

Regarding Firebase document Ids, I am trying to set a prefix before the default Firebase docId generation. For example, if the default document Id generated is 23492drf94fl, then I would write it as, somePrefix:23492drf94fl.
Currently, I understand that there are two ways to do this: one with generating your own custom UUID on the client side with custom prefix, and another is to rewrite the document Id after initially writing it in Firestore.
Is there any shorthand method or function I could use in React (Node.js) to just use the default Firestore docId generation w/ a specified prefix?
Thanks in advance.
There isn't a shorter/easier method or function that would allow you to set a prefix to the auto-generated id. You will need to do it manually as you mentioned it and even doing this manually, it's not a very good option, as you will be impacting more of your application and, of course, spending part of your quotas on each read and write every time a new doc is created.
However, if you really would like or need, to have the document id with a prefix, I would recommend you to use a second field, where you would copy the value of the document id and then, add the prefix. This way, you won't affect the default field created - which can impact the uniform distribution of it, since it's automatically created - and you would still be able to have a MATH:235E23 or SCI:2309F4 your database, that you can use as a default field for you.
Besides that, in case you feel this could be a good improvement to the system, please, consider raising a Feature Request in Google's Issue Tracker, so they can check about the possibility of implementing it in the future.
Let me know if the information helped you!

MongoDB - Update subdocuments using javascript

The documents in my apps collection each contain a subcollection of users. Now I need to update a single user per app given a set of _ids for the apps collection using javascript. I cannot use a regular call to update() for this, as the data inserted will be encrypted using a public key stored within the app document. Therefore the data written into the user-subdocument is dependant on the app-document it is contained in. Pseudo-code of what I need to do:
foreach app in apps:
app.users.$.encryptedData = encrypt(data, app.publicKey)
One way to do it would be to find all the apps and then use forEach() to update every single app. However, this seems to be quite inefficient to me, as all the app-documents would have to be found twice in the database, one time to gather all of them and then another time to update every single document. There has to be a more efficient way.
The short answer is that no, you can not update a document in mongoDB with a value from that document.
Have a look at
https://stackoverflow.com/a/37280419/5293110
for ideas other that doing the iteration yourself.

How to temporarily disable sitecore indexing while editing items

I am developing a Sitecore project that has several data import jobs running on daily basis. Every time a job is executed, it may update a large amount of Sitecore items (thousands) and I've noticed that all these editings trigger Solr index updates.
My concern is, I don't really sure if this is better or update everything at the end of the job is. So, I would love to try both options. Could anyone tell me how can I use code to temporarily disable Lucene/Solr indexing and enable it later when I finish editing all items?
This is a common requirement, and you're right to have such concerns. In general it's considered good practice to disable indexing during big import jobs, then rebuild afterwards.
Assuming you're using Sitecore 7 or above, this is pretty much what you need:
IndexCustodian.PauseIndexing();
IndexCustodian.ResumeIndexing();
Here's a comprehensive article discussing this:
http://blog.krusen.dk/disable-indexing-temporarily-in-sitecore-7/
In addition to #Martin answer, you can pass (silent=true) when you finish the editing of the item, Something like:
item.Editing.BeginEdit();
//Change fields values
item.Editing.EndEdit(true,true);
The second parameter in EndEdit() method force a silent update of the item, which means no Events/Indexing will be triggered on item save.
I feel this is safer than pausing indexing on the whole application level during import process, you just skip indexing of the items you are updating.
EDIT:
In case you need to rebuild the index for the updated items after the import process is done, you can use the following code, It will index the content tree starting from RootItemInTree and below:
var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("Your_Index_Name")
index.Refresh(new SitecoreIndexableItem(RootItemInTree));
To disable indexing during large import/update tasks you should wrap your logic inside a BulkUpdateContext block. You can also use other wrappers like the EventDisabler to stop events from being fired if that is appropriate in your context. Alternatively you could wrap your code in an EditContext and set it to silent. So your code could end up something like this:
using (new BulkUpdateContext())
using (new EditContext(targetItem, false, true))
{
// insert update logic here...
}
here is a older question that discusses this topic: Optimisation tips when migrating data into Sitecore CMS

Objects not saving using Objectify and GAE

I'm trying to save an object and verify that it is saved right after, and it doesn't seem to be working.
Here is my object
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
#Entity
public class PlayerGroup {
#Id public String n;//sharks
public ArrayList<String> m;//members [39393,23932932,3223]
}
Here is the code for saving then trying to load right after.
playerGroup = new PlayerGroup();
playerGroup.n = reqPlayerGroup.n;
playerGroup.m = reqPlayerGroup.m;
ofy().save().entity(playerGroup).now();
response.i = playerGroup;
PlayerGroup newOne = ofy().load().type(PlayerGroup.class).id(reqPlayerGroup.n).get();
But the "newOne" object is null. Even though I just got done saving it. What am I doing wrong?
--Update--
If I try later (like minutes later) sometimes I do see the object, but not right after saving. Does this have to do with the high replication storage?
Had the same behavior some time ago and asked a question on google groups - objectify
Here the answer I got :
You are seeing the eventual consistency of the High-Replication
Datastore. There has been a lot of discussion of this exact subject
on the Objecify list in google groups , including several links to the
Google documentation on the subject.
Basically, any kind of query which does not include an ancestor() may
return results from a stale view of the datastore.
Jeff
I also got another good answer to deal with the behavior
For deletes, query for keys and then batch-get the entities. Make sure
your gets are set to strong consistency (though I believe this is the
default). The batch-get should return null for the deleted entities.
When adding, it gets a little trickier. Index updates can take a few
seconds. AFAIK, there are three ways out of this: 1; Use precomputed
results (avoiding the query entirely). If your next view is the user's
recently created entities, keep a list of those keys in the user
entity, and update that list when a new entity is created. That list
will always be fresh, no query required. Besides avoiding stale
indexes, this also speeds up your app. The more you result sets you
can reliably manage, the more queries you can avoid.
2; Hide the latency by "enhancing" the query results with the recently
added entities. Depending on the rate at which you're adding entities,
either inject only the most recent key, or combine this with the
solution in 1.
3; Hide the latency by taking the user through some unaffected views
before landing on your query-based view. This strategy definitely has
a smell over it. You need to make sure those extra steps are relevant
to the user, or you'll give a poor experience.
Butterflies, Joakim
You can read it all here:
How come If I dont use async api after I'm deleting an object i still get it in a query that is being done right after the delete or not getting it right after I add one
Another good answer to a similar question : Objectify doesn't store synchronously, even with now

Is it possible to have 2 Models loaded for one entity kind to support data migration?

I am writing a datastore migration for our current production App Engine application.
We made some fairly extensive changes to the data model so I am trying to put in place an architecture to allow easier migrations in the future. This includes test suites for the migrations and common class structures for migration scripts.
I am running into a problem with my current strategy. For both the migrations and the test scripts I need a way to load the Model classes from the old schema and the Model classes for the new data schema into memory at the same time and load entities using either.
Here is an example set of schemas.
rev1.py
class Account(db.Model):
_version = db.IntegerProperty(default = 1)
user = db.UserProperty(auto_current_user_add = True, required = True)
name = db.StringProperty()
contact_email = db.EmailProperty()
rev2.py
class Account(db.Model):
_version = db.IntegerProperty(default = 2)
auth_id = db.StringProperty()
name = db.StringProperty()
pwd_hash = db.StringProperty(required = True, indexed = False)
A migration script may look something like:
import rev1
import rev2
class MyMigration(...):
def isNeeded(self):
num_accounts = num_entities_with_version(rev1.Account, 1)
return num_accounts > 0
def run(self):
rev1_accounts = rev1.Account.all()
for account in [a for a in rev1_accounts if account._version == 1]:
auth_id = account.contact_email
if auth_id is None or auth_id == '':
auth_id = account.user.email()
new_account = rev2.Account.create(auth_id = auth_id,
name = account.name)
And a test suite would look something like this:
import rev1
import rev2
class MyTest(...):
def testIt(self):
# Setup data
act1 = rev1.Account(name = '..', contact_email = '..')
act1.put()
act2 = rev1.Account(name = '..', contact_email = '..')
act2.put()
# Run migration
migration.run()
# Check results
accounts = rev2.Account.all().fetch(99)
So as you can see I am using the old revision in two ways. I am using it in the migration as a way to read data in the old format and convert it into the new format. (note: I can't read it in the new format because of things like the required pwd_hash field and other field changes). I am using it in the test suite to setup test data in the old format before running the migration.
It all seems great in theory, but in practice it falls apart because GAE doesn't allow multiple models to be loaded for the same kind, or more specifically, queries only return for the most recently defined model.
In the development server this seems to be due to the fact that the process of calling get() on a query for an entity (ex: Account.get(my_key)) calls a result hook that builds the result Model object by calling class_for_kind on the entity kind name from the data. So even though I may call rev2.Account.get(), it may build up rev1.Account Model objects because the kind 'Account' maps to rev1.Account in the _kind_map dictionary.
This has made me rethink my migration strategy a bit and I wanted to ask if anyone has thoughts. Specifically:
Would it be safe to manually override google.appengine.ext.db._kind_map at runtime in test and on the production servers to allow this migration method to work?
Is there some better way to keep two versions of a Model in memory at the same time?
Is there a different migration method that may be a smarter way to go about this work?
Other methods I have thought of trying include:
Change the entity kind when the version changes. (use kind() to change it) Then when we migrate we move all classes to the new kind name.
Find a way to query the entities and get back a 'raw' object (proto buffers??) that has not been built into a full object. (would not work with tests)
'Just Do It Live': Don't write tests for any of this and just try to migrate using the latest schema loading the older data working around issues as the come up.
I think there are actually several questions within the greater question. There seem to be two key questions here though, one is how to test and the other is how to really do it.
I wouldn't define the kind multiple times; as you've noted there are nuances to doing this, and, if you wind up with the wrong model loaded, you'll get all sorts of headaches. That said, it is completely possible for you to manipulate the kind_map. I've done this in some special cases, but I try to avoid it when possible.
For a live migration where you've got significant schema changes, you've got two choices: use Expando or use the lower level API. When adding required fields, you might find it easier to use Expando, then run a migration to add the new information, then switch back to a plain db.Model. The lower-level API sits right under the ext.db stuff, and it presents the entity as a Python dict. This can be very convenient for manipulating an entity. Use whichever method you're more comfortable with. I prefer Expando when posible, since it is a higher level interface, but it is a two-step process.
For testing, I'd personally suggest you focus on the actual conversion routines. So instead of testing the method from the point of querying down, test to ensure your conversion routines themselves function correctly. You might even choose to pass in the old entity as a Python dict, then return the new entity.
I'd make one other adjustment here as well. I'd rather use a query to find all my rev 1 accounts. That's the great thing about having an indexed _version on your models. You can trivially find things that need migrated.
Also, check out Google's article on updating schemas. It is old, but still good.
Another approach is to simply do the migration on version 2, leaving the old attributes on the model and setting them to None after you update the version. This will clear out the space they use but will still leave them defined. Then in a following release you can just remove them from the model.
This method is pretty simple, but does require two releases to remove old attribute completely, so is more akin to deprecating the existing attributes.

Resources