Avoid auto-persist when deleting a document from an embedded association - Mongoid

Mongoid currently auto-persists the parent document along with the embedded association when anything is removed from an embeds_many association. Is there a way to avoid this auto-persist?
Document model: User has many embedded hobbies.
Both of the methods below auto-persist, meaning they make a call to the DB:
    user.hobbies = [hobby1]
I need to avoid that because I am doing a bulk operation on a list of users and want to save the extra DB calls that are made for embedded associations.
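One possible workaround (a sketch, not a Mongoid feature) is to bypass the association API for the bulk job and collect the raw MongoDB updates yourself, then send them in a single bulk write. The example below is in Python for consistency with the other examples here; it groups hypothetical `(user_id, hobby_id)` removals into one `$pull` per user. The request dicts mirror the `update_one` shape used by driver bulk-write APIs such as the Ruby driver's `Mongo::Collection#bulk_write` (reachable from Mongoid via `User.collection`). Keep in mind that raw collection writes skip Mongoid callbacks and validations.

```python
from collections import defaultdict


def build_pull_requests(removals):
    """Group (user_id, hobby_id) removals into one $pull update per user.

    The returned dicts mirror the update_one request shape of MongoDB
    bulk-write APIs; nothing here talks to a database.
    """
    hobby_ids_by_user = defaultdict(list)
    for user_id, hobby_id in removals:
        hobby_ids_by_user[user_id].append(hobby_id)

    requests = []
    for user_id, hobby_ids in hobby_ids_by_user.items():
        requests.append({
            "update_one": {
                "filter": {"_id": user_id},
                # Remove every embedded hobby whose _id is in the list.
                "update": {"$pull": {"hobbies": {"_id": {"$in": hobby_ids}}}},
            }
        })
    return requests
```

For example, three removals spread over two users produce two requests, so the whole batch costs one round trip instead of one save per removal.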


Syncing Javascript State across UI

There are a lot of questions on syncing state between devices or from external storage to/from the UI. This question is about state within the UI.
The UI may have multiple state objects that can point to one entity.
E.g., multiple User models that have the same ID and are essentially the same user in the database.
The second option is to have a pattern that prevents duplicate models and ensures a single entity is never duplicated.
E.g., retrieving the User model with ID=1 always returns the same model instance.
So the options I currently face are:
Have multiple Models point to the same DB entity
Enforce a single Instance of a Model reflects a DB entity
Both of these have their tradeoffs:
Have multiple Models point to the same DB entity
This requires syncing the models with the same ID whenever one copy is updated.
This is non-trivial to implement.
Our current implementation is an EntityManager that keeps copies of each model and propagates writes to all copies.
However, syncing is complicated by async writes to the remote copies, reads from other devices, and remote fetches; in addition, reactions (MobX) within models need to ensure they are reacting to a consistent state of the model.
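A minimal sketch of the EntityManager idea, assuming synchronous writes: every live copy of an entity is registered by ID, and each write is applied to all of them. `Model` and `EntityManager` here are hypothetical names, the example is in Python rather than JavaScript, and it deliberately ignores the async ordering and MobX reaction issues described above, which is where the real complexity lies.

```python
import weakref
from collections import defaultdict


class Model:
    """Hypothetical client-side model: an ID plus a dict of fields."""

    def __init__(self, entity_id, **fields):
        self.id = entity_id
        self.fields = dict(fields)


class EntityManager:
    """Tracks every live copy of an entity and fans each write out to all of them."""

    def __init__(self):
        # WeakSet so copies the UI has dropped are forgotten automatically.
        self._copies = defaultdict(weakref.WeakSet)

    def track(self, model):
        self._copies[model.id].add(model)
        return model

    def write(self, entity_id, **changes):
        # Synchronous fan-out: every copy sees the change before write() returns.
        for copy in list(self._copies[entity_id]):
            copy.fields.update(changes)
```

Using weak references avoids the manager itself keeping discarded copies alive, which is one of the leaks this design tends to accumulate.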
Enforce a single Instance of a Model reflects a DB entity
This requires no work to sync. However, we have the complexity of ensuring there are never two copies of a model pointing to the same DB entity.
This ends up depending on coding conventions:
    model.fromJSON({ title: 'foo' })
    model = model.fromJSON({ title: 'foo' })
    model = model.fetch()
This is hard for new developers to understand and can be overlooked over time, creating hard-to-debug errors.
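One way to take the convention out of the call sites is an identity map whose deserialization always merges into, and returns, the single canonical instance. In this sketch (Python, with hypothetical `User` and `IdentityMap` classes), `from_json` resolves the ID first, so whether or not the caller reassigns the result, it ends up holding the same object everyone else holds.

```python
import weakref


class User:
    """Hypothetical client-side model."""

    def __init__(self, user_id):
        self.id = user_id
        self.title = None

    def apply_json(self, data):
        # Merge in place so every holder of this instance sees the new state.
        for key, value in data.items():
            setattr(self, key, value)
        return self


class IdentityMap:
    """Guarantees at most one live User instance per ID."""

    def __init__(self):
        # Weak values: an entry disappears once nothing else references the model.
        self._instances = weakref.WeakValueDictionary()

    def get(self, user_id):
        instance = self._instances.get(user_id)
        if instance is None:
            instance = User(user_id)
            self._instances[user_id] = instance
        return instance

    def from_json(self, data):
        # Always resolves to the canonical instance, then merges the payload into it.
        return self.get(data["id"]).apply_json(data)
```

The trade-off is that all construction must go through the map; a stray `User(1)` created directly would reintroduce a duplicate.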
The question is: how do you generally solve this scenario in a way that is consistent and least prone to bugs?

Data Modeling - modeling an Append-only list in NDB

I'm trying to make a general purpose data structure. Essentially, it will be an append-only list of updates that clients can subscribe to. Clients can also send updates.
I'm curious for suggestions on how to implement this. I could have an ndb.Model, 'Update', that contains the data and an index, or I could use a StructuredProperty with repeated=True on the main entity. I could also just store a list of keys somehow and keep the actual update data in a loosely linked structure.
I'm not sure how repeated properties work: does appending to the list (via the Python API) require rewriting all of them?
I'm also worried about consistency. Since multiple clients might be sending updates, I don't want them to overwrite each other and lose an update, or somehow end up with two updates with the same index.
The problem is that there is a maximum total size for each entity in the datastore (about 1 MB).
So any single entity that accumulates updates (whether it stores the data directly or collects keys) will eventually run out of space (I'm not sure how the limit applies to structured properties, however).
Why not have an "Update" model, as you say? A simple version would have each incoming update create and save a new entity. If you store the save date as a field in the model, you can sort the updates by time when you query for them (presumably there is an upper limit on how many you fetch anyway).
That way you also don't have to worry about simultaneous client updates overwriting each other; the datastore takes care of that for you. And you don't need to worry about which "index" each update has been assigned; that is done automatically.
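To make the "one entity per update" idea concrete, here is an in-memory Python stand-in (not NDB code). Each append is an independent insert rather than a read-modify-write of a shared list, so concurrent writers cannot clobber each other; readers ask for everything created after a timestamp. In real NDB the timestamp would typically be an `ndb.DateTimeProperty(auto_now_add=True)` and IDs would be allocated by the datastore (not necessarily sequentially); the lock and counter below only simulate that.

```python
import itertools
import threading
import time


class UpdateLog:
    """In-memory stand-in for a kind where every update is its own entity."""

    def __init__(self):
        # The lock stands in for the datastore's atomic per-entity put and ID allocation.
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._rows = []

    def append(self, data):
        # Each append is an independent insert: no read-modify-write of shared state.
        with self._lock:
            row = {"id": next(self._ids), "created": time.time(), "data": data}
            self._rows.append(row)
            return row["id"]

    def since(self, created_after, limit=100):
        # Equivalent of "created > X ORDER BY created LIMIT n"; id breaks timestamp ties.
        with self._lock:
            matches = [r for r in self._rows if r["created"] > created_after]
        matches.sort(key=lambda r: (r["created"], r["id"]))
        return matches[:limit]
```

A subscribing client would remember the `created` value of the last update it saw and poll `since()` with it.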
As that might be costly in datastore reads, you could also implement a version that uses repeated properties in a single entity, moving to a new entity after N keys are stored, but then you'd have to wrap each append in a transaction to be sure multiple updates don't clash, and so on.
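The bucketed variant can be sketched as follows, again as a Python simulation rather than NDB code. Each inner list plays the role of one entity's repeated property, a new "entity" is started once the current one holds `bucket_size` items, and the lock stands in for the transaction (for example an `@ndb.transactional` function) that makes the check-and-append atomic. Because every full bucket holds exactly `bucket_size` entries, the global index can be derived from the bucket number and position.

```python
import threading


class BucketedLog:
    """Appends go into fixed-size buckets; a new bucket starts after bucket_size entries."""

    def __init__(self, bucket_size=3):
        self.bucket_size = bucket_size
        self._lock = threading.Lock()  # stands in for a datastore transaction
        self.buckets = [[]]            # each inner list ~ one entity's repeated property

    def append(self, data):
        with self._lock:
            current = self.buckets[-1]
            if len(current) >= self.bucket_size:
                # Current entity is full: start a new one instead of growing past the limit.
                current = []
                self.buckets.append(current)
            current.append(data)
            # Global index = full buckets before this one, plus position within it.
            return (len(self.buckets) - 1) * self.bucket_size + len(current) - 1
```

Without the lock (transaction), two clients could both read the same bucket length and produce the duplicate indexes the question worries about.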
You can also cache the query results and invalidate the cache only when a new update is saved. Look at NDB as well, since it provides some automatic caching (though not for queries).
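The invalidate-on-write idea is small enough to show directly. This Python sketch wraps a hypothetical `load_updates` callable (standing in for the real datastore query) and only re-runs it after `on_update_saved()` has been called; `query_count` exists just to make the saving visible.

```python
class CachedUpdates:
    """Caches a query's results until a new update is saved."""

    def __init__(self, load_updates):
        self._load = load_updates  # hypothetical function that runs the real query
        self._cached = None
        self.query_count = 0

    def get_all(self):
        if self._cached is None:
            self.query_count += 1
            self._cached = self._load()
        return self._cached

    def on_update_saved(self):
        # Call this from the code path that saves a new Update.
        self._cached = None
```

In a multi-instance deployment the cached value would need to live in a shared cache (such as memcache) for the invalidation to reach every instance.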

GAE Transaction in root entity

I'm new to GAE and I have some questions about transactions with the Datastore.
For example, I have a User entity, which is created when the user adds my app on Facebook. I get some properties from the Facebook API, but I want to add a username for the user, and it needs to be unique. So, inside the transaction, I call this method:
    def ExistsUsernameToDiferentUser(self, user, username):
        query = User.all()
        query.filter("username", username)
        query.filter("idFacebook != ", user.idFacebook)
        userReturned = query.get()
        return True if userReturned else False
But GAE gives me this error:
BadRequestError: queries inside transactions must have ancestors
OK, I understand, but the user doesn't have an ancestor; it's a root entity. What do I have to do?
I see what you're trying to do now.
By forcing the use of ancestors, the datastore forces you to lock down a portion of the datastore (everything under the given ancestor) so you can guarantee consistency on that portion. However, to do what you want, you essentially need to lock down all User entities to query whether a certain one exists, and then create a new one, and then unlock them.
You CAN do this, just create an entity, it can be an empty entity, but make sure it has a unique key (like "user-ancestor"), save it, and make it the ancestor of every User entity.
THIS IS PROBABLY A BAD IDEA, since it limits your performance on User entities, particularly on writes. Every time a new user is created, all User entities are prevented from being updated.
I'm trying to illustrate how you need to think about transactions a bit differently in the HRD world. It's up to you to structure your data (using ancestors) so that you get good performance characteristics for your particular application. In fact, you might disagree with me and say that User entities will be updated so infrequently that it's ok to lock them all.
For illustrative purposes, another short-sighted possibility is to create multiple ancestors based on the username, i.e., one for each letter of the alphabet. Then, when you need to create a new User, you can search under the appropriate ancestor. While this is an improvement over having a single ancestor (it's 26 times better), it still limits your future performance up front. This may be OK if you know right now the total number of users you will eventually have, but I suspect you want hundreds of millions of users.
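Here is a Python simulation (not App Engine code) of the per-letter ancestor scheme. `ancestor_for` picks the entity group from the first letter, and `claim_username` does the check-and-insert while holding only that group's lock, which stands in for a transaction scoped to that ancestor. The extra "other" group for names that don't start with a letter is an assumption the answer doesn't cover.

```python
import string
import threading

LETTERS = set(string.ascii_lowercase)
# One entity group per first letter, plus a hypothetical catch-all group.
GROUPS = sorted(LETTERS) + ["other"]
_group_locks = {g: threading.Lock() for g in GROUPS}  # stands in for a per-group transaction
_taken_by_group = {g: set() for g in GROUPS}


def ancestor_for(username):
    first = username[:1].lower()
    return first if first in LETTERS else "other"


def claim_username(username):
    """Check-and-insert inside a single group; other groups are never locked."""
    group = ancestor_for(username)
    with _group_locks[group]:
        if username in _taken_by_group[group]:
            return False
        _taken_by_group[group].add(username)
        return True
```

Every signup starting with "s" still serializes behind the same group, which is exactly the up-front ceiling described above.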
The best way is to go back to the other suggestion and make the username the key. This gives you the best scalability, since getting/setting the User entity by key can be transactional without locking down other entities.
You'll need to find a way to work your application around this. For example, whatever information you get before the username can be stored in another entity that has a reference (e.g. a ReferenceProperty) to the User entity, which is created later. Or you can copy that data into the User entity after creating it by key, and then remove the original entity.
If usernames are unique, why don't you make the username the key?
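    from google.appengine.ext import db

    class User(db.Model):
        def username(self):
            # The username is stored as the entity's key name, so it is unique by construction.
            return self.key().name()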
Note: you won't need an explicit transaction if you use get_or_insert, since it performs the get and the possible put inside its own transaction.
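The semantics of `get_or_insert` with the username as the key name can be simulated like this (Python, no App Engine imports). `KeyedStore` and `register_username` are hypothetical; the single lock only keeps the sketch short, whereas in the datastore the transaction covers just the one entity being fetched or created.

```python
import threading


class KeyedStore:
    """Toy stand-in for a kind whose key name is the username."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of get_or_insert's internal transaction
        self._entities = {}

    def get_or_insert(self, key_name, **props):
        with self._lock:
            entity = self._entities.get(key_name)
            if entity is None:
                entity = dict(props, key_name=key_name)
                self._entities[key_name] = entity
            return entity


def register_username(store, username, facebook_id):
    user = store.get_or_insert(username, idFacebook=facebook_id)
    # If someone else already owned the key, their idFacebook comes back instead of ours.
    return user["idFacebook"] == facebook_id
```

No query is needed to test uniqueness: the key itself is the uniqueness constraint.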

What is the best strategy for mirroring a remote DB in Core Data?

Let's say that I have two tables in a DB: Expenses and Account. Expenses is the data that I'm interested in, and that table has a foreign key to Account. This DB is remote, accessed via RESTful-style commands, and I want to mirror just the data I need for my app in a Core Data store on the iPhone. The actual DB I'm working with is much bigger than this example: ~30 tables, and the Expenses table has ~7 FKs. I'm working closely with the person doing the API design, so I can modify the way I make my requests or the data returned, if necessary.
What is the best strategy for loading this data into Core Data?
My first thought was to have the request for an expense bring back the IDs for the FKs.
This works fine if I already have an account with ID '123' in my data store. If I don't, then I have to make additional web requests every time I encounter an ID I don't have, which is going to be incredibly slow. I can get around this by making requests in a specific order, i.e. requesting all new accounts before requesting expenses, so that I know all the FK rows exist. I feel this would become much too cumbersome once the DB reaches moderate complexity.
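The "request in a specific order" approach generalizes to a topological sort of the FK graph: fetch every referenced table before the tables that point at it. This Python sketch uses the standard library's `graphlib` (Python 3.9+); the table names and dependencies are hypothetical. It also shows where the approach breaks down, since circular FKs make `static_order()` raise `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter

# Hypothetical FK graph: each table maps to the tables its foreign keys point to.
fk_dependencies = {
    "Expense": {"Account", "Category"},
    "Account": {"Owner"},
    "Category": set(),
    "Owner": set(),
}


def sync_order(dependencies):
    """Return tables ordered so every referenced table is fetched before its dependents."""
    return list(TopologicalSorter(dependencies).static_order())
```

With ~30 tables the order is computed rather than maintained by hand, but every sync still has to walk the whole chain before the first Expense can be stored.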
My second thought was to have the data returned from the request follow the FKs and include the data from the referenced rows.
<name>Bob's Big Boy</name>
<address>1234 Main Street</address>
This looks better and guarantees that I'll have all the data I need when I need it. If I don't already have an account '123' I can create a new account object from that XML. My concern with this method, though, is that as the database grows in complexity, these XML files could become excessively large. The Expenses table has ~7 foreign keys, each of those tables has multiple FKs. It feels like a simple request for just a single Expense could end up returning a huge chunk of data.
How have other people solved this issue?
I am assuming that at any given time you only want to cache part of the server DB in the local app, and that the cached data may change over time.
You probably want to use "stub" entities to represent related objects that you haven't actually downloaded yet. You would set up the entities like this:
The AccountStub entity has the bare minimum info needed to identify the Account in the server DB based on info provided from the Expense table. It serves as a placeholder in the object graph for the full fledged Account object (you can think of it as a type of fault if you like.)
Since Expenses has the relationship with AccountStub and Account inherits from AccountStub you can swap out an Account for an AccountStub (and vice versa) as needed.
You will need to provide a custom subclass for AccountStub and Account such that AccountStub can trigger the downloading of account data and the creation of an Account object when that data is actually required. Then the new Account object should be swapped out for AccountStub in all its relationships (that may take rather a lot of code.)
To use this, you would first obtain the data for an Expense object and create that object. You would then fetch for an AccountStub with the ID provided in the Expense table data, setting the fetch request to include subentities. If an AccountStub or Account object exists with that ID, you add the Expense object to the relationship. If not, you create an AccountStub object with that ID and add it to the relationship. Now you have a basic object graph showing the relationship of an Expense object to an AccountStub object. To access the account data of an Expense, you would first check whether the related account is a stub or a full account. If it is a stub, you need to load the full account data before proceeding.
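The stub workflow can be sketched outside Core Data to show the moving parts. In this Python sketch, `Account` subclasses `AccountStub` (mirroring the entity inheritance), `account_for` plays the role of a fetch that includes subentities, and `resolve` downloads the full account on first access and swaps it into every relationship that pointed at the stub. `download_account` is a hypothetical stand-in for the web request.

```python
class AccountStub:
    """Placeholder holding only the server-side ID."""
    is_stub = True

    def __init__(self, account_id):
        self.account_id = account_id


class Account(AccountStub):
    """Full account; subclasses the stub so either can fill the relationship."""
    is_stub = False

    def __init__(self, account_id, name, address):
        super().__init__(account_id)
        self.name = name
        self.address = address


class Expense:
    def __init__(self, expense_id, account):
        self.expense_id = expense_id
        self.account = account


class LocalStore:
    def __init__(self, download_account):
        self._download = download_account  # hypothetical call returning (name, address)
        self._accounts = {}                # account_id -> AccountStub or Account
        self._expenses = []

    def account_for(self, account_id):
        # Like a fetch that includes subentities: finds a stub or a full Account.
        if account_id not in self._accounts:
            self._accounts[account_id] = AccountStub(account_id)
        return self._accounts[account_id]

    def add_expense(self, expense_id, account_id):
        expense = Expense(expense_id, self.account_for(account_id))
        self._expenses.append(expense)
        return expense

    def resolve(self, expense):
        stub = expense.account
        if not stub.is_stub:
            return stub
        name, address = self._download(stub.account_id)
        full = Account(stub.account_id, name, address)
        self._accounts[stub.account_id] = full
        # Swap the full object in everywhere the stub was used.
        for other in self._expenses:
            if other.account is stub:
                other.account = full
        return full
```

Only the first access to an account triggers a download; later expenses that share the same account already point at the full object.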
The advantage of this system is that you can maintain a fairly complex object graph without having to actually have all the data locally. E.g. you can maintain several relationships and walk them. For example, you could expand your model like this:
If you wanted to find the name of an Expense object's account owner, you would just walk the relationship across the stubs with account.owner.name; the Account object itself would remain just a stub.
If you need to conserve room locally, you can revert an object back to a stub without compromising the graph.
This would take some work and you would have to keep an eye on the stubs but it would let you mirror a complex external DB without having to keep all the data on hand.

Unit of Work - What is the best approach to temporary object storage on a web farm?

I need to design and implement something similar to what Martin Fowler calls the "Unit of Work" pattern. I have heard others refer to it as a "Shopping Cart" pattern, but I'm not convinced the needs are the same.
The specific problem is that users (and our UI team) want to be able to create and assign child objects (with referential integrity constraints in the database) before the parent object is created. I met with another of our designers today and we came up with two alternative approaches.
a) First, create a dummy parent object in the database, and then create dummy children and dummy assignments. We could use negative keys (our normal keys are all positive) to distinguish between the sheep and the goats in the database. Then, when the user submits the entire transaction, we have to update the data and get the real keys assigned and aligned.
I see several drawbacks to this one.
It causes perturbations to the indexes.
We still need to come up with something to satisfy unique constraints on columns that have them.
We have to modify a lot of existing SQL and code that generates SQL to add yet another predicate to a lot of WHERE clauses.
Altering primary keys in Oracle can be done, but it's a challenge.
b) Create transient tables for objects and assignments that need to participate in these reverse transactions. When the user hits Submit, we generate the real entries and purge the old ones.
I think this is cleaner than the first alternative, but still involves increased levels of database activity.
Both methods require that I have some way to expire transient data if the session is lost before the user executes submit or cancel requests.
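Whichever storage is used, expiry usually comes down to a "last touched" timestamp plus a periodic purge. This Python sketch keeps transient work keyed by a session ID and treats anything idle longer than `ttl_seconds` as abandoned; the same rule would apply to a timestamp column on a transient table. `TransientStore` is a hypothetical name, and the injectable clock exists only so the behavior can be tested.

```python
import time


class TransientStore:
    """Holds unsubmitted work per session; entries idle longer than ttl_seconds expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # session_id -> (last_touched, payload)

    def put(self, session_id, payload):
        self._entries[session_id] = (self._clock(), payload)

    def get(self, session_id):
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        now = self._clock()
        if now - entry[0] > self.ttl:
            del self._entries[session_id]  # expired but not yet purged
            return None
        self._entries[session_id] = (now, entry[1])  # touching extends the lease
        return entry[1]

    def purge_expired(self):
        # Run periodically (e.g. from a scheduled job) to drop abandoned sessions.
        now = self._clock()
        expired = [sid for sid, (t, _) in self._entries.items() if now - t > self.ttl]
        for sid in expired:
            del self._entries[sid]
        return expired
```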
Has anyone solved this problem in a different way?
Thanks in advance for your help.
I don't understand why these objects need to be created in the database before the transaction is committed, so you might want to clarify with your UI team before proceeding with a solution. You may find that all they want to do is read information previously saved by the user on another page.
So, assuming that the objects don't need to be stored in the database before the commit, I give you plan C:
Store initialized business objects in the session. You can then create all the children you want, and only touch the database (and set up references) when the transaction needs to be committed. If the session data is going to be large (either individually or collectively), store the session information in the database (you may already be doing this).
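Plan C can be sketched as a small unit of work kept in the session. Objects get temporary negative IDs in memory, children can be registered and assigned to a parent in any order, and `commit` inserts parents before children, replacing each temporary parent reference with the real key. `UnitOfWork`, `insert_row`, and the `parent_id` column are hypothetical, and the whole commit would be wrapped in one database transaction. A plain integer counter is used so the object stays picklable for session storage.

```python
class UnitOfWork:
    """Collects new objects in memory and writes them, parents first, at commit."""

    def __init__(self):
        self._last_temp_id = 0
        self._objects = {}  # temp_id -> {"table": str, "fields": dict, "parent": temp_id or None}

    def register_new(self, table, **fields):
        # Negative temporary IDs never collide with real (positive) keys.
        self._last_temp_id -= 1
        self._objects[self._last_temp_id] = {"table": table, "fields": fields, "parent": None}
        return self._last_temp_id

    def assign(self, child_temp_id, parent_temp_id):
        self._objects[child_temp_id]["parent"] = parent_temp_id

    def commit(self, insert_row):
        """insert_row(table, fields) -> real key; hypothetical DB call inside one transaction."""
        real_ids = {}
        pending = dict(self._objects)
        while pending:
            # Insert whatever has no parent, or whose parent already has a real key.
            ready = [t for t, o in pending.items()
                     if o["parent"] is None or o["parent"] in real_ids]
            if not ready:
                raise ValueError("unresolvable parent reference")
            for temp_id in ready:
                obj = pending.pop(temp_id)
                fields = dict(obj["fields"])
                if obj["parent"] is not None:
                    fields["parent_id"] = real_ids[obj["parent"]]
                real_ids[temp_id] = insert_row(obj["table"], fields)
        self._objects.clear()
        return real_ids
```

Nothing touches the database until `commit`, so a lost or cancelled session leaves no dummy rows or negative keys behind to clean up.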
