I am new to Google Cloud Datastore. I have read that there is a write limit of 1 write per second on an entity group. Does it mean that the main "guestbook" tutorial on App Engine cannot scale to thousands of very active users?
Indeed.
The tutorial is just a showcase. The writes-per-second limitation is due to the strong consistency guarantee for entities in the same group (i.e. with the same ancestor). You can exceed this limit at the price of trading strong consistency for eventual consistency, meaning all datastore queries will show the same information at some point, but not necessarily immediately. This is a consequence of App Engine's distributed design.
Please have a look at https://cloud.google.com/appengine/articles/scaling/contention to avoid datastore contention issues. Hope it helps.
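The standard contention-avoidance technique from that article is sharded counters: instead of updating a single entity (and hence a single entity group) on every hit, you spread writes across N shard entities and sum them on read. Here is a minimal pure-Python sketch of the idea — the datastore is replaced by an in-memory dict and the names are illustrative, but the write pattern is the one the article describes:

```python
import random

NUM_SHARDS = 20  # more shards -> more write throughput, slower reads

# Stand-in for the datastore: key -> count. In a real app each shard
# would be its own root entity, so writes land in different entity groups.
shards = {}

def increment(counter_name):
    """Pick a random shard and update only that one entity."""
    shard_key = "%s-%d" % (counter_name, random.randint(0, NUM_SHARDS - 1))
    shards[shard_key] = shards.get(shard_key, 0) + 1

def get_count(counter_name):
    """Sum all shards; reads touch every shard, but writes never collide."""
    return sum(v for k, v in shards.items() if k.startswith(counter_name + "-"))

for _ in range(100):
    increment("guestbook-hits")
print(get_count("guestbook-hits"))  # -> 100
```

With 20 shards, roughly 20 writes per second become sustainable instead of 1, at the cost of reading 20 entities to get the total.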
Yes, I think it does mean that.
It might not be a problem if the greetings are all added to different guestbooks, but quickly adding Greetings to the same Guestbook is definitely not going to scale. However, in practice it's often much faster than 1 write per second.
Perhaps you could work around this by using a task queue to add Greetings, but that might be overkill.
That guestbook tutorial is not a very good example in general. You shouldn't put logic in your JSPs like that example does (you probably shouldn't use JSPs at all). It's also not very practical to use the datastore at such a low level. Just use Objectify.
While starting to learn the Google App Engine datastore API, I noticed the tutorial said:
"the rate at which you can write to the same entity group is limited to 1 write to the entity group per second".
I can't seem to understand how you can store a lot of users' information that needs to be written more than once per second, e.g. a simple app that lets a user change a value on their profile, or something like a comment or chat app that must write rapidly to the datastore.
How can this be achieved? What have I missed here?
If there are any samples or tutorials for a real application using the datastore, they would be a great reference for me (preferably in Go, but anything will do).
Thanks!
The key part of the bit you quoted is "entity group". Have you properly understood what that means?
A user updating their own profile is unlikely to happen more than once per second - that would require some very fast typing and clicking. As long as the profiles aren't in the same entity group then there's no reason, say, 10k users can't all update their profiles at the same time. For a chat, storing an entire conversation/room as a single entity would be tricky, but storing each message as its own entity wouldn't hit this limit.
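To make the entity-group point concrete, here is a toy sketch (plain Python, no GAE SDK; the kind names are made up) contrasting the two key layouts for a chat. Only writes that share a root ancestor contend with each other, so giving each message its own root key sidesteps the 1-write/sec limit:

```python
# Layout A: one entity group per chat room -> every message write shares
# the ancestor "Room" key and is serialized (~1 write/sec per room).
def room_ancestor_key(room_id, message_id):
    return ("Room", room_id, "Message", message_id)  # ancestor key path

# Layout B: each message is its own root entity -> no shared ancestor,
# so concurrent writes never contend (but queries over messages
# become eventually consistent).
def root_message_key(message_id):
    return ("Message", message_id)

a1 = room_ancestor_key("lobby", 1)
a2 = room_ancestor_key("lobby", 2)
b1 = root_message_key(1)
b2 = root_message_key(2)

# The entity group is identified by the root (first kind/id pair) of the path:
assert a1[:2] == a2[:2]  # same root -> same group -> serialized writes
assert b1[:2] != b2[:2]  # distinct roots -> independent entity groups
```

The trade-off is exactly the one described above: layout A buys you strongly consistent ancestor queries over a room, layout B buys you write throughput.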
You have not missed anything; it's the way the High Replication Datastore (HRD) works. It may seem strange at first, but if you read about the benefits of HRD, you may find that it makes sense.
If it's not suitable for you, you can use Google Cloud SQL (native MySQL) with GAE.
Also, there is a closed question about this subject.
I'm thinking of porting an application from RoR to Python App Engine that is heavily geo search centric. I've been using one of the open source GeoModel (i.e. geohashing) libraries to allow the application to handle queries that answer questions like "what restaurants are near this point (lat/lng pair)" and things of that nature.
GeoModel uses a ListProperty, which creates a heavy index. That had me concerned about pricing, as I have about 10 million entities that would need to be loaded into production.
This article that I found this morning seems pretty scary in terms of costs:
https://groups.google.com/forum/?fromgroups#!topic/google-appengine/-FqljlTruK4
So my question is - is geohashing a moot concept now that Google has released its full-text search, which has support for geo searching? It's not clear what's going on behind the scenes with this new API, though, and I'm concerned the index sizes might be just as big as if I used the GeoModel approach.
The other problem with the search API is that it appears I'd have to create not only my models in the datastore but then replicate some of that data (GeoPtProperty and entity_key for the model it represents at a minimum) into Documents which greatly increases my data set.
Any thoughts on this? At the moment I'm contemplating scrapping this port as being too expensive, although I've really enjoyed working in the App Engine environment so far and would love to get away from EC2 for some of my applications.
You're asking many questions here:
Is geohashing a moot concept? Probably not. I suspect the Search API uses geohashing, or something similar, for its location search.
Can you use the Search API instead of implementing it yourself? Yes, but I don't know the cost one way or the other.
Is geohashing expensive on App Engine? In the message thread the cost is bad due to high index write costs. You'll have to engineer your geohashing data to minimize the indexing. If GeoModel puts a lot of indexed values in the list, you may be in trouble - I wouldn't use it directly without knowing how the indexing works. My guess is that if you reduce the location accuracy you can reduce the number of indexed entries, and that could save you a lot of cost.
As mentioned in the thread, you could have the geohashing run in Cloud SQL.
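To illustrate the accuracy/index-size trade-off: GeoModel-style libraries typically store one list entry per precision level of a geocell id, and every list entry is an indexed property value you pay write costs for. A small sketch of that pattern (a simplified interleaved-bit hash, not GeoModel's actual cell format):

```python
def geocell(lat, lng, precision):
    """Encode a point as a string of interleaved lng/lat bits;
    a longer string means a smaller (more precise) cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits = []
    for i in range(precision):
        if i % 2 == 0:  # even bits refine longitude
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits.append("1"); lng_lo = mid
            else:
                bits.append("0"); lng_hi = mid
        else:           # odd bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append("1"); lat_lo = mid
            else:
                bits.append("0"); lat_hi = mid
    return "".join(bits)

def indexed_cells(lat, lng, max_precision):
    """One list entry per precision level, as GeoModel-style libraries
    store them; each entry becomes an indexed value on write."""
    return [geocell(lat, lng, p) for p in range(1, max_precision + 1)]

# Halving the maximum precision halves the indexed values per entity:
assert len(indexed_cells(40.7, -74.0, 24)) == 24
assert len(indexed_cells(40.7, -74.0, 12)) == 12
```

Across 10 million entities, cutting the precision list from 24 entries to 12 removes tens of millions of index writes, which is where the cost in that thread comes from.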
I am going to create a web site - or you may call it a web application - and I intend to use the GAE datastore. This website would be used by many people to search for companies and create profiles (accounts). I am not sure how much it is going to cost, as there would be many requests to fetch company profiles and to create new profiles. So I need some advice: is it going to cost a lot? Does the GAE datastore fit this kind of website and application?
Thanks in advance for your reply.
First of all, to estimate the cost you need to answer some questions. What number of read and write operations do you expect? How many entities will you be storing? What is the approximate size of each entity? With those values you can estimate the cost based on the current pricing.
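As a back-of-the-envelope sketch of that estimate - note the unit prices below are placeholders, not current figures, and real write costs also scale with the number of indexed properties per entity; substitute the numbers from the current pricing page:

```python
# Hypothetical unit prices -- replace with the current pricing page's figures.
PRICE_PER_100K_READS = 0.06    # $ per 100k read ops
PRICE_PER_100K_WRITES = 0.18   # $ per 100k write ops (indexes multiply this)
PRICE_PER_GB_MONTH = 0.18      # $ per GB stored per month

def monthly_cost(reads, writes, entities, avg_entity_kb):
    """Rough monthly datastore cost from op counts and storage size."""
    storage_gb = entities * avg_entity_kb / (1024.0 * 1024.0)
    return (reads / 100_000 * PRICE_PER_100K_READS
            + writes / 100_000 * PRICE_PER_100K_WRITES
            + storage_gb * PRICE_PER_GB_MONTH)

# e.g. 10M reads, 1M writes, 500k entities of ~2 KB each:
print(round(monthly_cost(10_000_000, 1_000_000, 500_000, 2), 2))  # -> 7.97
```

The point of the exercise is that for a profile-search site the read volume usually dominates, so caching (e.g. memcache) is the first lever to pull.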
You didn't write what your requirements are, but I think one of those would be scalability. Take a look at this excerpt from the docs.
"The App Engine Datastore is a schemaless object datastore providing robust, scalable storage for your web application, with no planned downtime, atomic transactions, high availability of reads and writes, strong consistency for reads and ancestor queries, and eventual consistency for all other queries."
If this doesn't fit your needs, you can also use Google Cloud SQL.
I am using djangoappengine, and I think I have run into some problems with the way it handles eventual consistency on the High Replication Datastore.
First, entity groups are not even implemented in djangoappengine.
Second, I think that when you do a djangoappengine get, the underlying App Engine system is doing an App Engine query, which is only eventually consistent. Therefore, you cannot assume consistency even when using keys.
Assuming those two statements are true (and I think they are), how does one build an app of any complexity using djangoappengine on the high replication datastore? Every time you save a value and then try to get the same value, there is no guarantee that it will be the same.
Take a look in djangoappengine/db/compiler.py:get_matching_pk()
If you do a djangomodel.get() by the pk, it'll translate to a Google App Engine Get().
Otherwise it'll translate to a query. There's room for improvement here. Submit a fix?
I don't really know about djangoappengine, but if an App Engine query includes only the key, it is considered a keys-only query and you will always get consistent results.
No matter what the system you put on top of the AppEngine models, it's still true that when you save it to the datastore you get a key. When you look up an entity via its key in the HR datastore, you are guaranteed to get the most recent results.
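The get-by-key vs. query distinction can be sketched with a toy model (pure Python, no GAE involved; all names are made up): a get reads the authoritative copy, while a query reads a replicated index that catches up asynchronously.

```python
class ToyHRDatastore:
    """Toy model of HRD consistency: put() updates the primary copy
    immediately; the query index (a replica) only sees the write
    after apply_pending() simulates replication catching up."""
    def __init__(self):
        self.primary = {}   # key -> entity, always current
        self.index = {}     # replica that queries read from
        self.pending = []   # writes not yet applied to the index

    def put(self, key, entity):
        self.primary[key] = entity          # get-by-key sees this at once
        self.pending.append((key, entity))  # index update is asynchronous

    def get(self, key):
        return self.primary.get(key)        # strongly consistent

    def query(self, **filters):
        return [e for e in self.index.values()
                if all(e.get(k) == v for k, v in filters.items())]

    def apply_pending(self):
        for key, entity in self.pending:
            self.index[key] = entity
        self.pending = []

ds = ToyHRDatastore()
ds.put("greeting1", {"author": "bob"})
assert ds.get("greeting1") == {"author": "bob"}  # key lookup: current at once
assert ds.query(author="bob") == []              # query: index not caught up
ds.apply_pending()
assert ds.query(author="bob") == [{"author": "bob"}]
```

This is why the djangoappengine issue above matters: if the framework translates a by-key lookup into a query, you get the weaker behavior even though a true Get() would have been consistent.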
I'm thinking about writing an application that will have to store a small number of records per user (<300) but will hopefully have a lot of users (>>1000).
I did some research for a platform that allows starting small and scaling if there is a need to do so, and ended up at App Engine, but I'm not sure if it is the right tool for it, especially the datastore.
How will I get it to scale if I have a User entity and a Message entity and store all users and messages in those entities? I think the number of records will grow really big, and filtering, e.g. for all messages of a user, will get expensive. Is that a problem, or will Google handle it? Do I have to introduce multitenancy and create a namespace for each user, so I only see the records that relate to that user? Is there a limit on the number of namespaces? What would be the right approach for modeling the data in the datastore?
I do not really have a clue how to handle the App Engine datastore or whether it's the right tool for me.
The App Engine datastore is explicitly designed to handle this kind of scalability. Queries execute in time proportional to the number of records returned, so fetching all a user's messages will take the same time regardless of how many users there are in the system.
I think with those kinds of numbers you are probably OK in terms of scalability. Anywhere from 300,000 to millions of records is easily handled by any serious datastore.
It is not advisable to think about scaling during the infancy of your project. Your first step should always be to build an app/product and launch it; scaling comes afterwards. Most of the apps/products that are launched these days never make it to the level where they need to scale. If you do launch a website/product/app that gets hit by a large amount of traffic and you need to scale, then rejoice, because you've made it to that level! But how to get to that level should always be the first question.
I'm not trying to demoralise you, rather trying to help you focus on where you should be. Thanks for reading, and good luck with your app! May you indeed need to scale - and as Toby said, even the most basic App Engine configuration is good enough to handle a couple of hundred thousand records.