Google app engine excessive small datastore operations - google-app-engine

I'm having some trouble with the google app engine datastore. Ever since the new pricing model was introduced, the cost of running my app has increased massively.
The culprit appears to be "Datastore small operations", which come in at more than 20 Million ops per day!
Has anyone had this problem, I don't think I'm doing an excessive amount of key lookups, and I only have 5000 users, with roughly 10 - 20 requests per minute.
Thanks in advance!
Edit
Ok got some stats, these are after abut 3 hours. Here is what I am seeing in my dashboard, in the billing section:
And here are some of the stats:
Obviously there are quite a lot of calls to datastore.get. I am starting to think that it is my design that is causing the problem. Those gets correspond to accounts. Every user has an account, but an account can be one of two types, for this I use composition. So each account entity has a link to its sub account entity.
As a result when I do a search for nearby users it involves fetching the accounts using the query, and then doing a get on each account to get its sub account. The top request in the stats picture is a call that gets 100 accounts, and then has to do a get on each one. I would have thought that this was a very light query, but I guess not. And I am still confused by the number of datastore small ops being recorded in my dashboard.

Definitely use appstats as Drew suggests; regardless of what library you're using, it will tell you what operations your handlers are doing. The most likely culprits are keys-only queries and count operations.

My advice would be to use AppStats (Python / Java) to profile your traffic and figure out which handler is generating the most datastore ops. If you post the code here we can potentially suggest optimizations.

Don't scan your datastore, use get(key) or get_by_id(id) or get_by_key_name(keyname) as much as you can.

Do you have lots of ReferenceProperty properties in your models? Accessing them will trigger db.get for each property unless you prefetch them. This would trigger 101 db.get requests.
class Foo(db.Model):
user = db.ReferenceProperty(User)
foos = Foo.all().fetch(100)
for f in foos:
print f.user.name # this triggers db.get(parent=f, key=f.user)

Related

Passing on goog-app-engine costs to the client

I have a google app engine application that uses the cloud datastore. My application usage is increasing, I cannot maintain it as “free” since I am paying
for the read/write operations to the datastore
the storage of data in the cloud
instance hours
I plan to charge “by the drink.” In other words, as I am charged for application usage, I will pass on that charge on to my clients. Before developing my own solution to do this, I realize that there must be countless others who have solved this problem. If so, what technique(s) have you employed?
Thank you in advance for your suggestions.
Unfortunately there is no silver bullet for this situation. Be advised, unless you really think this through keeping track of your users consumption might end up costing as much as the actual usage.
Without knowing anything else about your app its hard to help but my advice would be to use appstats to figure out the actual cost of a given service and charge the user per time accessed.
Most users do not like (actually, hate) to be charged for something they (a) do not understand, and (b) have no control over. If a phone company tells its customers to go ahead and use their service without telling them how much - exactly - they are going to pay, it will lose all customers in no time. How are you going to explain read/write datastore costs to your users, with indexed properties and all? How about query costs and instance hours? It's a difficult task even if all of your customers are software engineers.
I recommend charging users for something they understand, like creating a free and a premium version of your app with additional features, or charging per game, or per document processed (you did not tell us what your app is doing), etc.

Memcache System - Keys getting dropped frequently

We currently have our application hosted in Google App Engine. Billing is enabled to that application. This application is still in beta that we are using for testing purpose. We have a logic of serving data from the Memcache if present, if not then we get the data from the datastore and update the memcache and serve the data. We are encountering strange behaviour related to Memcache. The data related to some keys in Memcache is getting dropped after few minutes after being set. We tried setting expiration time for the keys in the memcache, even that does not seem to work. Since the data is getting dropped from the memcache the data is again from the datastore which is increasing the billing for our application.
Currently nearly 80% of the billing is related to datastore read. The datastore read is high as the memcache is not working efficiently as it should be. Any insight why we are facing this issue would be really helpful.
Just an FYI, we are having around 75000 keys in the memcache with total size of 100 MB data. Our structure demands keeping such large number of keys in memcache, which I think should not be an issue.
Our application is being by 10 users and the billing amount per day is coming to around $40.
Thanks,
Krish
Unfortunately memcache will evict keys as and when it requires. Setting the time to expire only means the item will be in memcache for up to the expiry time.
Take a look at the docs regarding eviction.
Also, take a look at this for some more insight into ways around memcache issues.
Regarding your data structure, perhaps you could post a new question and we can see if others have advice for you.

any number on Google App Engine free quota in terms of total number of request and unique visitors

Does anyone have any number on Google App Engine (free quota) in terms of total number of request and unique visitors it allows per day?
Maybe someone who has live production code can tell us this?
Rough number is enough, just to get the idea.
I can not get this information from the pricing model.
Thanks
I had this question when I first started using App Engine, but it's impossible to answer with the information in your question.
You must have an estimate on the individual API quota usages, then calculate based on that.
You might be able to simplify it by trying to figure out which API quota you're likely to hit first, and then figuring out the number of requests you can serve before that quota runs out. ie:
Storing photos or other large data for users? You'll probably hit the blobstore quota first. Daily/unique visitor counts probably won't matter.
Serving lots of photos or large data? You'll probably hit the bandwidth quota first.
Need to start a channel for every view? You'll probably hit the channel quota first and get 100 views per day.
Need to send an email for every view? You'll probably hit the mail quota first.
Need to query the datastore a lot? You'll probably hit the datastore limit first.
The datastore limit is the hardest to calculate. You get 50k read and 50k write ops. Most likely you'd read more than write.
If you need 2 read ops per page, you might could do 25k views per day.
If you need 2 read ops per page, but you're smart and you memcache them, and memcache is effective 80% of the time, you could get 125k views per day.
If you need 500 read ops per page and you can't cache it, you can do 100 views per day. That's provided you don't run out of one of the other quotas.
Do your own math.
The quotas and rates (for free and paid apps) are listed on https://developers.google.com/appengine/docs/quotas.

Logs show CPU usage and total cost - how does that relate to datastore operations cost?

The logs tell me how much a certain request costs
api_cpu_ms=278 cpm_usd=0.009244
Is this cost still accurate with the new billing model?
Can I deduce anything about the number of datastore operations the request used?
Is there any way to know the exact number of read, write and small datastore operations used during a single request?
I was told about 2 months ago that they do not reflect the new billing model, and I haven't seen any indications that they've been fixed to reflect the new billing model.
You can get the number of operations per request by enabling Appstats in your application.

Google App Engine - How to implement the activity stream in a social network

I want some ideas on the best practice to implement an activity stream for a social network im building in app engine (PYTHON)
I first want to keep a log for all activities of each user - so that we have a history. i.e. someone became a friend, added a picture, changed their address etc. This way we have a users history available should we need it. Also mean we can remove friendship joins, change user data but have a historical log.
I also want to stream a users activity to their friends. for this only the last X activities need to be kept - that is in the scenario that messages are sent to friends when an activity occurs.
Its pretty straight forward designing a history log - ie: when, what, where. The complication comes as to how we notify friends of a user as to their activity.
In our app friendships are not mutual - ie they are based on the twitter following model. Some accounts could have thousands of followers.
What is the best approach to model this.
using a many to many join table and doing a costly query -
using a feed class that fired a copy of the activity to all the subscribers - maybe into mcache? As their maybe a need to fire thousands of messages i would imagine a cron job would need to be used.
Any help ideas thoughts on this
Thx
There's a great talk by Brett Slatkin called Building Scalable, Complex Apps on App Engine from last year's Google I/O, in which the example is a Twitter-like application, where users' updates are pushed to their followers. Basically exactly what you're trying to do.
I highly recommend the video for anyone writing an App Engine app, it's really helpful.
Don't do joins. They're too expensive, you'll burn through your quota in no time.
You can use a task queue, it's a bit like a cron job (i.e. stuff happens outside of the original request) but you can start them at will. memcache would be good if you're ok with loosing some activity at times the cache is flushed...

Resources