In the App Engine Documentation I found an interesting strategy for keeping up to date with changes in the datastore by using Cursors:
An interesting application of cursors is to monitor entities for unseen changes. If the app sets a timestamp property with the current date and time every time an entity changes, the app can use a query sorted by the timestamp property, ascending, with a Datastore cursor to check when entities are moved to the end of the result list. If an entity's timestamp is updated, the query with the cursor returns the updated entity. If no entities were updated since the last time the query was performed, no results are returned, and the cursor does not move.
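In ndb terms, the technique the docs describe reads roughly like this minimal sketch (the Record model and property names are illustrative, not from the docs):

    from google.appengine.ext import ndb

    class Record(ndb.Model):  # hypothetical model for illustration
        data = ndb.StringProperty()
        updated = ndb.DateTimeProperty(auto_now=True)  # refreshed on every put()

    def poll_changes(cursor=None):
        """Return entities changed since the last poll, plus the cursor to reuse."""
        query = Record.query().order(Record.updated)
        entities, next_cursor, more = query.fetch_page(100, start_cursor=cursor)
        # If nothing changed since the last poll, `entities` is empty and the
        # cursor stays where it was.
        return entities, next_cursor or cursor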
However, I'm not quite sure how this can always work. After all, when using the High Replication Datastore, queries are only eventually consistent. So if two entities are put and only the later of the two is seen by the query, the cursor will move past both of them, which means the first of the two new entities will remain unseen.
So is this an actual issue? Or is there some other way that cursors work around this?
Having an index, built-in or composite, on a property that contains a monotonically increasing value (such as the current timestamp) may not perform as well as you'd like at high write rates. This type of workload generates a hotspot: the tail of the index is constantly being updated, instead of the load being distributed throughout the sorted index. For low write rates, however, this works fine.
The rest of the answer will depend on whether you are in the same entity group or separate entity groups.
If your query is an ancestor query (and thus confined to a single entity group), it can be strongly consistent (by default it is), and the described method should always be accurate. The query will immediately see any write to an entity inside that entity group.
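For example, a minimal ndb sketch, assuming all entities share a single hypothetical parent key (which also subjects the group to the roughly one-write-per-second entity-group limit):

    from google.appengine.ext import ndb

    class Record(ndb.Model):  # hypothetical model, as above
        data = ndb.StringProperty()

    # Putting every Record under one parent key confines them to a single
    # entity group, so ancestor queries over them are strongly consistent.
    root = ndb.Key('RecordRoot', 'singleton')
    Record(parent=root, data='x').put()

    # This sees the put() above immediately, with no eventual-consistency lag,
    # so the cursor-monitoring trick is exact within the entity group.
    fresh = Record.query(ancestor=root).fetch(100)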
If you are querying over many entity groups, which is always eventually consistent, then there is no guarantee of the order in which the writes are applied or become visible. For example:
- Time1 - Write EntityA
- Time2 - Write EntityB
- Time3 - Query only sees EntityB
- Time4 - Query sees EntityA and EntityB
So the method of using a cursor to detect a change is correct, but it may "skip" over some changes.
For more information on eventual/strong consistency, see Balancing Strong and Eventual consistency with Google Cloud Datastore
You'd be best informed by asking someone who's worked on it, but after thinking about it a bit and re-reading the Paxos paper a bit, I think it should not be a problem, though it depends on how the datastore is actually implemented.
A cursor is essentially a position in the index. In theory you can re-read the same cursor over and over, and see new entities start appearing after it. In the real world case, you'll generally move on to the newest cursor position and forget about the old cursor position.
Eventual consistency "problems" appear because there's multiple copies of the index spread across multiple machines. Depending on which index you read from, you may get stale results.
You describe a problem case where there are two (exact) copies of an index I, and two new entities are created, E1 and E2. Say I1 = I + E1 and I2 = I + E2; depending on the index you read from, you might get E1 or E2 as the new entity, move your cursor, and miss an entity when the index gets "patched" with the other index, i.e., I2 eventually gets patched to I + E1 + E2.
If the datastore actually works that way, then I suspect, yes, you can get a problem. However, it sounds very difficult to operate that way, and I suspect the datastore indexes only get updated after the Paxos voting comes to an agreement. So you'll never see an out-of-order index, you'll only see entities show up late: i.e., you'll never see I + E2; you'll only ever see (I), (I + E1), or (I + E1 + E2).
I suspect, though, that you might hit a different problem: a cursor that's too new for a replica of the index that hasn't caught up yet.
Summary
I have an issue where the database writes from my task queue (approximately 60 tasks, at 10/s) are somehow being overwritten/discarded during a concurrent database read of the same data. Let me explain how it works: each task in the task queue assigns a unique ID to a specific datastore entity of a model.
If I run an indexed datastore query on the model and loop through the entities while the task queue is in progress, I would expect that some of the entities will have been operated on by the task queue (i.e., assigned an ID) and others are still yet to be affected. Unfortunately, what seems to be happening is that during the loop through the query, entities that were already operated on (i.e., successfully assigned an ID) are being overwritten or discarded, appearing as if they were never operated on, even though, according to my logs, they were.
Why is this happening? I need to be able to read the status of my data without affecting the taskqueue write operation in the background. I thought maybe it was a caching issue so I tried enforcing use_cache=False and use_memcache=False on the query, but that did not solve the issue. Any help would be appreciated.
Other interesting notes:
If I allow the task queue to complete fully before doing a datastore query, and then do a datastore query, it acts as expected and nothing is overwritten/discarded.
This is typically an indication that the write operations to the entities are not performed in transactions. Transactions can detect such concurrent write (and read!) operations and re-try them, ensuring that the data remains consistent.
You also need to be aware that queries (if they are not ancestor queries) are eventually consistent, meaning their results are a bit "behind" the actual datastore information: it takes some time from the moment the datastore information is updated until the corresponding indexes the queries use are updated accordingly. So when processing entities from query results, you should also transactionally verify their content. Personally, I prefer to make keys_only queries and then obtain the entities via key lookups, which are always consistent (in transactions too, of course, if I intend to update the entities or need consistent reads).
For example, if you query for entities which don't have a unique ID, you may get entities which were in fact recently operated on and do have an ID. So you should (transactionally) check whether the entity actually has an ID and, if it does, skip its update.
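A hedged sketch of that pattern in ndb, with a hypothetical Item model and make_id() helper standing in for the asker's actual code:

    from google.appengine.ext import ndb

    class Item(ndb.Model):  # hypothetical stand-in for the asker's model
        unique_id = ndb.StringProperty()  # assigned by the task queue

    @ndb.transactional
    def assign_id_if_needed(key, new_id):
        item = key.get()        # lookup by key: always strongly consistent
        if item.unique_id:      # a task already assigned an ID; skip the update
            return False
        item.unique_id = new_id
        item.put()
        return True

    # The (eventually consistent) query only supplies candidate keys; the
    # authoritative check happens inside the transaction above.
    keys = Item.query(Item.unique_id == None).fetch(100, keys_only=True)
    for key in keys:
        assign_id_if_needed(key, make_id(key))  # make_id() is a hypothetical helper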
Also make sure you're not updating entities obtained from projection queries - results obtained from such queries may not represent the entire entities, writing them back will wipe out properties not included in the projection.
Datastore documentation is very clear that there is an issue with "hotspots" if you include 'monotonically increasing values' (like the current unix time); however, there isn't a good alternative mentioned, nor is it addressed whether storing the exact same value (rather than increasing values) would create "hotspots":
"Do not index properties with monotonically increasing values (such as a NOW() timestamp). Maintaining such an index could lead to hotspots that impact Cloud Datastore latency for applications with high read and write rates."
https://cloud.google.com/datastore/docs/best-practices
I would like to store the time when each particular entity is inserted into the datastore, if that's not possible though, storing just the date would also work.
That almost seems more likely to cause "hotspots" though, since for 24 hours every new entity would get added at the same index value (that's my understanding anyway).
Perhaps there's something more going on with how indexes work (I am having trouble finding great explanations of exactly how they work) and having the same value indexed over and over again is fine, while incrementing values are not.
I would appreciate if anyone has an answer to this question, or else better documentation for how datastore indexes work.
Is your application actually planning on querying the date? If not, consider simply not indexing that property. If you only need to read that property infrequently, consider writing a mapreduce rather than indexing.
That advice is given due to the way BigTable tablets work, which is described here: https://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
To the best of my knowledge, it's more important to have the primary key of an entity not be a monotonically increasing number. It would be better to have a string key, so the entity can be stored with better distribution.
But speaking as a non-expert, I can't imagine that indexes on individual properties with monotonic values would be as problematic, if the index is legitimately needed. With the Nomulus codebase, for example, we had a legitimate need for an index on time, because we wanted to delete commit logs older than a specific time.
One cool thing I think happens with these monotonic indexes is that, when tablet splits aren't happening, fetching the leftmost or rightmost element in the index actually has better latency properties than fetching something in the middle of the index. For example, a query that just grabs the first result in the index can actually be faster than a key lookup.
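For illustration, a small ndb sketch of that kind of index-edge fetch (the CommitLog model is hypothetical):

    from google.appengine.ext import ndb

    class CommitLog(ndb.Model):  # hypothetical model
        written_at = ndb.DateTimeProperty(auto_now_add=True)

    # Both of these scan one contiguous end of the index:
    oldest = CommitLog.query().order(CommitLog.written_at).fetch(1)
    newest = CommitLog.query().order(-CommitLog.written_at).fetch(1)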
There is a key quote in the page that Justine linked to that is very helpful:
As a developer, what can you do to avoid this situation? ... Lower your write rate, or figure out how to better distribute values.
It is OK to store an indexed timestamp as long as that entity has a low write rate.
If you have an entity where you want to store an indexed timestamp and the entity has a high write rate, then the solution is to split the entity into two entities. Entity A will have the properties that need to be updated frequently, and entity B will have the timestamp and the properties that don't get updated often.
When I do this, I have a common ID for the two entities to make it really easy to get from one to the other.
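A minimal ndb sketch of that split, assuming hypothetical GameStateHot/GameStateCold kinds sharing a string ID:

    from google.appengine.ext import ndb

    class GameStateHot(ndb.Model):   # hypothetical: written frequently
        score = ndb.IntegerProperty(indexed=False)

    class GameStateCold(ndb.Model):  # hypothetical: written rarely
        created = ndb.DateTimeProperty(auto_now_add=True)  # the indexed timestamp

    def create_pair(game_id):
        # The same string ID is used for both kinds, so either entity's key
        # leads straight to its sibling.
        GameStateHot(id=game_id, score=0).put()
        GameStateCold(id=game_id).put()

    def cold_for(hot):
        return ndb.Key(GameStateCold, hot.key.id()).get()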
You could try storing just the date and putting random hours, minutes, and seconds into the timestamp, then throwing away that extra data later. (Or keep the hours and minutes and use random seconds, for example.) I'm not 100% sure this would work, but if you need to index the date it's worth trying.
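A rough sketch of that idea (the scattered_date() helper is hypothetical, and, as noted, the approach is untested):

    import datetime
    import random

    # Hypothetical helper for the idea above: keep the real date but fill the
    # time-of-day with random values, so new index entries are scattered across
    # the day's range instead of clustering at one ever-growing tail.
    def scattered_date(now=None):
        d = (now or datetime.datetime.utcnow()).date()
        return datetime.datetime(d.year, d.month, d.day,
                                 random.randrange(24),    # hour
                                 random.randrange(60),    # minute
                                 random.randrange(60))    # second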
In my application I have a datastore query with a filter, such as:
datastore.NewQuery("sometype").Filter("SomeField<", 10)
I'm using a cursor to iterate batches of the result (e.g., in different tasks). If the value of SomeField is changed while iterating, the cursor no longer works on Google App Engine (it works fine on the devappserver).
I have a test project here: https://github.com/fredr/appenginetest
In my test I ran /db that will setup the db with 10 items with their values set to 0, then ran /run/2 that will iterate over all items where the value is less than 2, in batches of 5, and update the value of each item to 2.
The result on my local devappserver: all 10 items are updated.
The result on App Engine: only five items are updated.
Am I doing something wrong? Is this a bug? Or is this the expected result?
In the documentation it states:
Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values.
The problem is the nature and implementation of the cursors. The cursor contains the key of the last processed entity (encoded), and so if you set a cursor to your query before executing it, the Datastore will jump to the entity specified by the key encoded in the cursor, and will start listing entities from that point.
Let's examine your case
Your query filter is Value<2. You iterate over the entities of the query result, and you change (and save) the Value property to 2. Note that Value=2 does not satisfy the filter Value<2.
In the next iteration (next batch) a cursor is present, which you apply properly. So when the Datastore executes the query, it jumps to the last entity processed in the previous iteration and wants to list entities that come after it. But the entity the cursor points to may no longer satisfy the filter, because the index entry for its new Value of 2 has most likely already been updated. (This is non-deterministic: eventual consistency applies here because you did not use an ancestor query, which would guarantee strongly consistent results; the time.Sleep() delay just increases the probability that the index has been updated.)
So the Datastore sees that the last processed entity does not satisfy the filter, and rather than searching all the entities again, it reports that no more entities match the filter. Hence no more entities are updated (and no errors will be reported).
Suggestion: don't use cursors while filtering or sorting by the same property you're updating.
By the way:
The part from the Appengine docs you quoted:
Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values.
This is not what you think. It means: cursors may not work properly on a property which has multiple values AND that same property is either included in an inequality filter or used to sort the results.
By the way #2
In the screenshot you are using SDK 1.9.17. The latest SDK version is 1.9.21. You should update it and always use the latest available version.
Alternatives to achieve your goal
1) Don't use cursors
If you have many records, you won't be able to update all your entities in one step (in one loop), but let's say you update 300 entities per pass. If you repeat the query, the already-updated entities will not be in the results of the same query again, because the updated Value=2 does not satisfy the filter Value<2. So just redo the query+update until the query returns no results (see the sketch after the pros/cons below). Since your change is idempotent, it would cause no harm if the update of an entity's index entry were delayed and the entity were returned by the query multiple times. It is best to delay the execution of the next query to minimize the chance of this (e.g., wait a few seconds between queries).
Pros: Simple. You already have the solution, just exclude the cursor handling part.
Cons: Some entities might get updated multiple times (therefore the change must be idempotent). Also the change performed on entities must be something which will exclude the entity from the next query.
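A minimal sketch of this alternative, written here in Python ndb for brevity (the question uses the Go SDK, but the loop translates directly; SomeType and its value property are stand-ins):

    from google.appengine.ext import ndb
    import time

    class SomeType(ndb.Model):  # hypothetical Python stand-in for the Go model
        value = ndb.IntegerProperty()

    def update_all():
        while True:
            batch = SomeType.query(SomeType.value < 2).fetch(5)
            if not batch:
                break                # the query has no results left: done
            for entity in batch:
                entity.value = 2     # idempotent, so re-processing is harmless
            ndb.put_multi(batch)
            time.sleep(2)            # give the index a moment to catch up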
2) Using Task Queue
You could first execute a keys-only query and defer the updates to tasks. You could create tasks passing, say, 100 keys to each, and the tasks could load the entities by key and do the update (see the sketch after the pros/cons below). This ensures each entity gets updated only once. This solution has a slightly bigger delay due to involving the task queue, but that is not a problem in most cases.
Pros: No duplicated updates (therefore change may be non-idempotent). Works even if the change to be performed would not exclude the entity from the next query (more general).
Cons: Higher complexity. Bigger lag/delay.
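A hedged sketch of this alternative, again in Python ndb with stand-in names (it assumes the deferred library builtin is enabled):

    from google.appengine.ext import deferred, ndb

    class SomeType(ndb.Model):  # hypothetical Python stand-in for the Go model
        value = ndb.IntegerProperty()

    def update_batch(keys):
        # Runs as a task: loading by key is strongly consistent, and each key
        # appears in exactly one task, so every entity is updated exactly once.
        entities = ndb.get_multi(keys)
        for entity in entities:
            entity.value = 2
        ndb.put_multi(entities)

    def kick_off():
        keys = SomeType.query(SomeType.value < 2).fetch(keys_only=True)
        for i in range(0, len(keys), 100):
            deferred.defer(update_batch, keys[i:i + 100])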
3) Using Map-Reduce
You could use the map-reduce framework/utility to do massively parallel processing of many entities. Not sure if it has been implemented in Go.
Pros: Parallel execution, can handle even millions or billions of entities. Much faster in case of large entity number. Plus pros listed at 2) Using Task Queue.
Cons: Higher complexity. Might not be available in Go yet.
I have read the documentation at the link below, but have this important question still open. Do composite indexes become consistent in the same order as the original entity updates? For example, let's say the same indexed property that is part of a composite index gets updated for rec1, rec2, and rec3. The recs get updated one second apart (rec1=T0, rec2=T0+1, rec3=T0+2). As the index updates get fanned out, can one assume that the indexes become eventually consistent in the same order as the updates? In other words, the index consistency for rec1 precedes the consistency for rec2, which precedes the consistency for rec3. I'm not asking whether the entries become consistent exactly one second apart (that is not important), but simply whether the order of becoming consistent stays the same. Or is it possible that rec3's index will become consistent before rec2's or rec1's? Many thanks. -stevep
Link: http://code.google.com/appengine/articles/life_of_write.html
Only Google can reliably answer this question.
OTOH, if you look at the link you provided, under the Apply Steps it says:
Since each index can live in a separate location in Bigtable, these writes can be fanned out in parallel to multiple tablet servers.
Since the indexes are written in parallel on multiple servers, I'd say there is no guarantee that they are written in any particular order.
I'm trying to understand the data pipelines talk presented at google i/o:
http://www.youtube.com/watch?v=zSDC_TU7rtc
I don't see why fan-in work indexes are necessary if I'm just going to batch through input-sequence markers.
Can't the optimistically-enqueued task grab all unapplied markers, churn through as many of them as possible (repeatedly fetching a batch of, say, 10, then transactionally updating the materialized view entity), and re-enqueue itself if it times out before working through all markers?
Do the work indexes have something to do with the efficiency of querying for all unapplied markers? I.e., is it better to query for "markers with work_index = " than for "markers with applied = False"? If so, why is that?
For reference, the question+answer which led me to the data pipelines talk is here:
app engine datastore: model for progressively updated terrain height map
A few things:
My approach assumes multiple workers (see ShardedForkJoinQueue here: http://code.google.com/p/pubsubhubbub/source/browse/trunk/hub/fork_join_queue.py), where the inbound rate of tasks exceeds the amount of work a single thread can do. With that in mind, how would you use a simple "applied = False" to split work across N threads? Probably assign another field on your model to a worker's shard_number at random; then your query would be on "shard_number=N AND applied=False" (requiring another composite index). Okay that should work.
But then how do you know how many worker shards/threads you need? With the approach above you need to statically configure them so your shard_number parameter is between 1 and N. You can only have one thread querying for each shard_number at a time or else you have contention. I want the system to figure out the shard/thread count at runtime. My approach batches work together into reasonably sized chunks (like the 10 items) and then enqueues a continuation task to take care of the rest. Using query cursors I know that each continuation will not overlap the last thread's, so there's no contention. This gives me a dynamic number of threads working in parallel on the same shard's work items.
Now say your queue backs up. How do you ensure the oldest work items are processed first? Put another way: how do you prevent starvation? You could assign another field on your model to the time of insertion, call it add_time. Now your query would be "shard_number=N AND applied=False ORDER BY add_time ASC". This works fine for low-throughput queues.
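The naive scheme described so far would look roughly like this hypothetical ndb model (ordering ascending so the oldest items come back first):

    from google.appengine.ext import ndb

    class WorkItem(ndb.Model):  # hypothetical marker model for the naive scheme
        shard_number = ndb.IntegerProperty()
        applied = ndb.BooleanProperty(default=False)
        add_time = ndb.DateTimeProperty(auto_now_add=True)

    # Requires a composite index on (shard_number, applied, add_time).
    pending = (WorkItem.query(WorkItem.shard_number == 3,
                              WorkItem.applied == False)
               .order(WorkItem.add_time)
               .fetch(10))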
What if your work item write-rate goes up a ton? You're going to be writing many, many rows with roughly the same add_time. This gives your entities a Bigtable index row prefix like "shard_number=1|applied=False|add_time=2010-06-24T9:15:22". That means every work item insert is hitting the same Bigtable tablet server: the server that currently owns the growing end of that index. So fundamentally you're limited to the throughput of a single machine for each work shard's Datastore writes.
With my approach, your only Bigtable index row is prefixed by the hash of the incrementing work sequence number. This work_index value is scattered across the lexical rowspace of Bigtable each time the sequence number is incremented. Thus, each sequential work item enqueue will likely go to a different tablet server (given enough data), spreading the load of my queue beyond a single machine. With this approach the write-rate should effectively be bound only by the number of physical Bigtable machines in a cluster.
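A rough sketch of such a work_index value (the exact format here is hypothetical, not the fork_join_queue implementation):

    import hashlib

    # Prefixing with a hash of the monotonically increasing sequence number
    # scatters consecutive enqueues across Bigtable's lexical rowspace
    # instead of piling them onto one tablet server.
    def work_index(sequence_number):
        digest = hashlib.sha1(str(sequence_number)).hexdigest()
        return '%s-%d' % (digest[:8], sequence_number)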
One disadvantage of this approach is that it requires an extra write: you have to flip the flag on the original marker entity when you've completed the update, which is something Brett's original approach doesn't require.
You still need some sort of work index, too, or you encounter the race conditions Brett talked about, where the task that should apply an update runs before the update transaction has committed. In your system, the update would still get applied - but it could be an arbitrary amount of time before the next update runs and applies it.
Still, I'm not the expert on this (yet ;). I've forwarded your question to Brett, and I'll let you know what he says - I'm curious as to his answer, too!