Is there an Entity Group Max Size? - google-app-engine

I have an Entity that represents a Payment Method. I want to have an entity group for all the payment attempts performed with that payment method.
The 1 write-per-second limitation is fine and actually good for my use case, as there is no good reason to charge a specific credit card more frequently than that, but I could not find any specifications on the max size of an entity group.
My concern is would a very active corporate account hit any limitations in terms of number of records within an entity group (when they perform their 1 millionth transaction with us)?

No, there isn't a limit for the entity group size, all datastore-related limits are documented at Limits.
But be aware that the entity group size matters when it comes to data contention, see Keep entity groups small. Please note that contention is not only happening when writing entities, but also when reading them inside transaction (see Contention problems in Google App Engine) or, occasionally, maybe even outside transactions (see TransactionFailedError on GAE when no transaction).
IMHO your usage case is not worth the risk of dealing with these issues (fairly difficult to debug and address), I wouldn't use a single entity group in this case.

Related

Entity group in transaction (contention)

I'm reading a book on GAE. In a chapter about transactions, it says:
Updating an entity in a group can potentially cancel updates to any other entity in the group by another process. You should design your data model so that entity groups do not need to be updated by many users simultaneously.
Be especially careful if the number of simultaneous updates to a
single group grows as your application gets more users. In this case,
you usually want to spread the load across multiple entity groups, and
increase the number of entity groups automatically as the user base
grows. Scalable division of a data resource like this is known as
sharding.
An often used example for an entity group is the message board, where the board is the ancestor of messages belonging to that board.
However, If updating a message (i.e. editing it) causes contention, and more often so as the userbase grows, isn't it bad design to create a huge group of messages with the board as its ancestor? The write rate of an entity group is limited to 1 per second. Does that mean any message within the board can be updated at most once per second?
Also, does merely adding an entity to a group (i.e. posting a new message) also count as "updating" and cause contention?
Yes, such design may be considered a bad one as it doesn't scale well with the number of users. I can't see a good reason for which messages would need the board as ancestor.
Yes, creating a new entity in a group counts as an entity group update and all updates can contribute to contention.
Side note: you might find this clarification useful: https://stackoverflow.com/a/39309022/4495081 (but for designs which have good reasons for using multi-entity groups).

Entity Group - deciding on how to group

I've read throughout the Internet that the Datastore has a limit of 1 write per second for an Entity Group. Most of what I read indicate a "write to an entity", which I would understand as an update. Does the 1 write per second also apply to adding entities into the group?
A simple case would be a Thread where multiple posts can be added by different users. The way I see it, it's logical to have the Thread be the ancestor of the Posts. Thus, forming a wide entity group. If the answer to my question above is yes, a "trending" thread would be devastated by the write limit.
That said, would it make sense to get rid of the ancestry altogether or should I switch to the user as the ancestor? What I'd like to avoid is having the user be confused when they don't see the post due to eventual consistency.
A quick clarification to start with
1 write per second doesn't mean 1 entity per second. You can batch writes together, up to a maximum of 500 entities (transactions also have a 10 MiB limit). So if you can patch posts, you can improve your write rate.
Note: you can technically go higher than 1 per second, although your risk of contention errors increases the longer you exceed that limit as well as the eventual consistency of the system.
You can read more on the limits here.
Client-side sharding
If you need to use ancestor queries for strong consistency AND 1 write per second is not enough, you could implement client-side sharding. This essentially means that you write the posts to a up to N different entity-groups using a known key scheme, For example:
Primary parent: "AncestorA"
Optional shard 1: "AncestorA-1"
Optional shard N: "AncestorA-(N-1)"
To query for your posts, issue N ancestor queries. Naturally, you'll need to merge these results on the client-side to display it in the correct order.
This will allow you to do N writes per second.

Google Datastore - Not Seeing 1 Write per Second per Entity Group Limitation

I've read a lot about strong vs eventual consistency, using ancestor / entity groups, and the 1 write per second per entity group limitation of Google Datastore.
However, in my testing I have never hit the exception Too much contention on these datastore entities. please try again. and am trying to understand whether I'm misunderstanding these concepts or missing a piece of the puzzle.
I'm creating entities like so:
func usersKey(c appengine.Context) *datastore.Key {
return datastore.NewKey(c, "User", "default_users", 0, nil)
}
func (a *UserDS) UserCreateOrUpdate(c appengine.Context, user models.User) error {
key := datastore.NewKey(c, "User", user.UserId, 0, usersKey(c))
_, err := datastore.Put(c, key, &user)
return err
}
And then reading them with datastore.Get. I know I won't have issues reading since I'm doing a lookup by key, but if I have a high volume of users creating and updating their information, I would theoretically hit the max of 1 write per second constantly.
To test this, I attempted to create 25 users at once (using the above methods, no batching), yet I don't log any exceptions, which this post implies I should: Google App Engine HRD - what if I exceed the 1 write per second limit for writing to the entity group?
What am I missing? Does the contention only apply to querying, is 25 not a high enough volume, or am I missing something else entirely?
From the documentation:
Writes to a single entity group are serialized by the App Engine
datastore, and thus there's a limit on how quickly you can update one
entity group. In general, this works out to somewhere between 1 and 5
updates per second; a good guideline is that you should consider
rearchitecting if you expect an entity group to have to sustain more
than one update per second for an extended period.
Note the words "extended period". 1 update per second is basically a minimum guaranteed throughput. At any given moment you may be able to achieve a significantly higher levels, but Google is warning you not to architect for those levels to be always available.
The limitation is per entity group, that means you could create as many users as you need without limitation (that's where scaling shines), as long as they don't share the same ancestor.
Things change once you start using the user key as the ancestor of other entities, making them part of the same group and thus having a limit on how many changes you can make to it per second.
Btw this is a generalization, most likely you will be able to make ~5 changes per second, this limitation exist because of the transactional properties of an entity group, so there's some kind of table with changes that must be executed sequentially, so you have to lock, and thus there's limited throughput.
Still, rule of thumb is thinking you can only do 1 per second to force yourself think about how to work under this conditions.
And like mentioned, this is only relevant when you update the database, gets and queries should scale as needed.
I don't think you're missing anything here. Previously, I had seen the same limitations when writing to the same entity group but recently (this week, in fact) I have not seen the delays. I'm willing to suggest that Google has solved this problem, and I'm hoping that someone will prove me correct.

NDB: What happens when 1/s write is exceeded?

I am evauating how to use GAE + NDB for a new project, and got concerned with the limit of 1 write per second for ancestor writes. I might be missing information, so I'm happy to ask for help.
Say several users work with orders. If all new "order" entities have the same unique ancestor, what would happen if say 5 users each create a new order and all 5 hit "save" at the same time?
Do you know what the consecuences could be?
Thanks!
In your use case, nothing bad would happen - all of your writes will succeed. Some of them may be retried internally by the App Engine, but you should not worry about that. You should only get concerned when you expect this rate to be exceeded for a substantial period of time. Then retries would come on top of previous retries and commits may start failing. Giving your example, you will probably need a few million people working on those orders like crazy before it becomes an issue.
From the documentation (emphasis mine):
The first type of timeout occurs when you attempt to write to a single
entity group too quickly. Writes to a single entity group are
serialized by the App Engine datastore, and thus there's a limit on
how quickly you can update one entity group. In general, this works
out to somewhere between 1 and 5 updates per second; a good guideline
is that you should consider rearchitecting if you expect an entity
group to have to sustain more than one update per second for an
extended period.

AppEngine's entity group write limit is very limiting in real life application?

I'm thinking about introducing entity groups in my application to enable strong consistency. Propose I have an Order entity and a OrderRow entity with each Order as a parent for it's OrderRows. Then it would be normal to update the Order with the sum of all OrderRows when adding an OrderRow.
But because the datastore is limited to 1 write per second, each time I edit/add an OrderRow it would take at least one second because of the updating of the Order.
Is this correct? If so, the one second limit is extremely limiting because it's very often you update two entities within the same entity group in one user request?
If it is within a single request, then you can run them all within the same transaction, (which is the purpose of the entity group).

Resources