Interpreting cost data in Google Cloud Platform

I host a basic web app on Google Cloud Platform, and I've noticed my costs creeping up over the last couple of months. They've really accelerated over the last 30 days (fortunately on a tiny base; I'm still ticking along at under $2 a day). I haven't added any new functionality or clients in months, so this was a bit surprising.
My first instinct was an increase in traffic. I couldn't see anything like that in the App Engine dashboard, but I put in a heap of optimizations and dramatically decreased QPS just in case. No change.
The number of instances hasn't moved around much either; it looked like the most likely culprit, but it's just flat, not growing.
My next guess was that data was accumulating in Datastore (even though the cost chart is filtered to App Engine only, I figured a fuller datastore -> a slower datastore -> more instance time in GAE). There's no chart for this, annoyingly, but I determined the datastore size was more or less flat (I have a blunt-instrument TTL job that runs daily, sketched below) and culled it by dropping my retention threshold by 20% just to be safe.
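For concreteness, here's a minimal sketch of the kind of blunt-instrument TTL job I mean, assuming a Python runtime with ndb (the model, field, and route names are made up):

    # Daily cron job: delete entities older than the retention threshold.
    from datetime import datetime, timedelta

    import webapp2
    from google.appengine.ext import ndb


    class Event(ndb.Model):  # hypothetical model
        created = ndb.DateTimeProperty(auto_now_add=True)


    class TtlHandler(webapp2.RequestHandler):
        def get(self):
            cutoff = datetime.utcnow() - timedelta(days=30)
            # keys_only keeps the query cheap; we only need keys to delete.
            keys = Event.query(Event.created < cutoff).fetch(keys_only=True)
            ndb.delete_multi(keys)


    app = webapp2.WSGIApplication([('/tasks/ttl', TtlHandler)])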
These optimizations were on the 17th, but my cost hasn't moved at all. I considered forex fluctuations (I'm billed in Aussie dollars, all my charges are for frontend instances in Japan) but they haven't been anywhere near big enough to explain this.
Any ideas what's going on? I've clicked through all the graphs and reports in billing but can't reconcile the ~100% growth in cost with flat or dropping QPS, instance count, and database size.

Yes! I've seen the same thing on a simple App Engine website running Python 3.7! I've had a ticket open since April 29th and they're not helpful. I saw a step change in frontend instance hours on March 24th with no corresponding increase in traffic. I have screenshots that are really telling but I can't upload them since I don't have 10 reputation points.
There's no corresponding increase in traffic, either in the cloud console or in Google analytics.
What's worse, each day the daily estimate shows I'll be under the 28-hour quota. For example, I took a screenshot showing that after 15 hours I was on pace for 24.352 frontend instance hours for the day (I didn't take one at the end of the quota day, since it resets at 3 AM).
When I woke up the next morning, the billing report showed I was charged $0.00 for frontend instance hours for the previous day, but 3 hours later it shot up to $0.48, which means I actually used 38.6 frontend instance hours.
Somehow, the estimated cost calculation was off by 14 hours. Why have the estimate at all if it has an error that large? When I looked at the minute-by-minute billed instance hours for the hours after taking the screenshot through the end of the quota-day, there's nothing that indicates I would have used 23 additional hours from the time I took the screenshot to the time of the quota reset.
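Here's the back-of-the-envelope arithmetic (my own, not Google's formula) showing how badly the two numbers disagree:

    # Projection at the 15-hour mark vs. what was actually billed.
    hours_elapsed = 15.0
    projected_for_day = 24.352                             # dashboard estimate
    used_so_far = projected_for_day * hours_elapsed / 24.0 # ~15.2 hours
    actual_billed = 38.6                                   # implied by the $0.48 charge
    print(actual_billed - used_so_far)  # ~23.4 extra hours in the remaining 9 hours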
This behavior has been happening every day since March 24th for me with no explanation from Google besides "it looks like you exceeded your instances..." I wish I could share the screenshots so you can compare what you're seeing.

Related

AWS Lightsail Metric graphs "No data available"

We're using an AWS Lightsail PostgreSQL database. We've been experiencing errors with our C# application timing out on its database connections. While trying to debug the issue, I went to look at the metric graphs in AWS and noticed that many of them have frequent gaps in the data, labeled "No data available".
The CPU utilization graph (and most of the other metrics) shows these frequent gaps. I'm trying to understand whether this is normal or could be a symptom of the problem. Going back to a 2-week timescale, there does not appear to be any other strange behavior in the metric data; for example, I do not see a point in the past where CPU or memory usage went crazy. The issue started happening about a week ago, so I was hoping the metrics would help explain why the connections to the PostgreSQL database are failing from C#.
So I guess my question is: are those frequent gaps of "No data available" normal for an AWS Lightsail PostgreSQL database?
Other Data about the machine:
1 GB RAM, 1 vCPU, 40 GB SSD
PostgreSQL database (12.11)
In the last two weeks, the average metrics show:
CPU utilization has never gone over 20%
Database connections have never gone over 35 (usually fewer than 5, and often 0)
Disk queue depth never goes over 0.2
Free storage space hovers around 36.5 GB
Network receive throughput is mostly less than 1 kB/s (with one spike to 141 kB/s)
Network transmit throughput is mostly less than 11 kB/s, with all spikes under 11.5 kB/s
I would love to dig through the AWS logs, but they start a month back and are filled with checkpoint starting/complete entries. Each page update advances only 2 hours (and takes ~6 seconds to fetch), so reaching the relevant period would require ~360 page updates; when I tried, my auth timed out. 😢
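In hindsight, pulling the logs through the API instead of the console pager might have worked better. A sketch using boto3's Lightsail calls, assuming default AWS credentials (the database name is made up):

    from datetime import datetime, timedelta

    import boto3

    client = boto3.client("lightsail")

    # Log stream names differ by engine, so discover them first.
    streams = client.get_relational_database_log_streams(
        relationalDatabaseName="my-postgres-db")["logStreams"]

    start = datetime.utcnow() - timedelta(days=7)
    end = datetime.utcnow()
    token = None
    while True:
        kwargs = dict(relationalDatabaseName="my-postgres-db",
                      logStreamName=streams[0],
                      startTime=start, endTime=end,
                      startFromHead=True)
        if token:
            kwargs["pageToken"] = token
        page = client.get_relational_database_log_events(**kwargs)
        for event in page["resourceLogEvents"]:
            print(event["createdAt"], event["message"])
        # Stop once the forward token stops advancing.
        next_token = page.get("nextForwardToken")
        if not next_token or next_token == token:
            break
        token = next_token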
So we never figured out the reason why, but this seems like it was a problem with the AWS Lightsail DB itself. We ended up using a snapshot to create a new clone of the DB and pointing the C# servers at the new one. The latency issues we were having disappeared, and the metric graphs now look normal (without the strange gaps).
I wish we had been able to figure out the root of the problem. For now, we are just hoping it does not return.
When in doubt, clone everything! 🙃

How are frontend instance hours calculated on App Engine?

I have a simple online ordering application I have built. It probably handles 25 orders a week, most of those on Mondays and Tuesdays.
Looking at the dashboard I see:
Billing Status: Free. Quotas reset every 24 hours. Next reset: 7 hrs
Resource Usage
Frontend Instance Hours 16% 4.53 of 28.00 Instance Hours
4.53 hours seems insanely high for the number of users I have.
Some of my pages make calls to a FileMaker database hosted on another service and have latencies like:
    URI        Reqs   MCycles   Latency
    /profile     50        74   1241 ms
    /order       49       130   3157 ms
My authentication pages also have high latencies, as they call out to third parties:
    /auth/google/callback    9    51   2399 ms
I still don't see how they could add up to 4.53 hours though?
Can anyone explain?
You're charged a minimum of 15 minutes every time an instance spins up.
If you have few requests, but they are spaced out, your instance will shut down between them, and you'll incur the 15-minute minimum again the next time it spins up.
You could easily rack up 4.5 instance hours with 18 HTTP requests.
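The arithmetic behind that claim, plus why the dashboard latencies alone can't explain it:

    # 18 spaced-out requests, each waking a dead instance:
    print(18 * 15 / 60.0)   # 18 spin-ups x 15 min minimum = 4.5 instance hours

    # Actual request time from the dashboard numbers above is tiny by comparison:
    busy_seconds = 50 * 1.241 + 49 * 3.157 + 9 * 2.399   # ~238 s
    print(busy_seconds / 3600.0)                          # ~0.07 instance hours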
In addition to the previous answer, I thought I'd add a bit more about the billing, which might be confusing you. Google gives you 28 hours of free instance time for each 24-hour billing period.
Ideally you always have one instance running so that calls to your app never have to wait for an instance to spin up. One instance can handle a pretty decent volume of calls each minute, so a lot can be accomplished with those free 28 hours.
You have a lot of zero-instance time (you consumed less than 5 instance hours in seventeen hours of potential billing). You should worry more about getting this number higher, not lower, because right now most of the calls to your app are undoubtedly waiting for both spin-up latency and actual execution latency. If you are running a Go app, spin-up is likely not an issue; with Python, it's likely a small-to-moderate issue; Java...
So think instead about keeping your instance alive, and consume 100% of your free instance quota. Alternatively, be sure to use Go, or Python (with good design). Do not use Java.
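One concrete piece of that, assuming the Python runtime: enable warmup requests so App Engine loads your code before routing real traffic to a fresh instance (this reduces the spin-up latency users see; it does not by itself keep an instance alive). It requires inbound_services: warmup in app.yaml:

    import webapp2


    class WarmupHandler(webapp2.RequestHandler):
        def get(self):
            # Do expensive one-time initialization here (import libraries,
            # prime caches) so user-facing requests don't pay for it.
            self.response.write('warmed up')


    app = webapp2.WSGIApplication([('/_ah/warmup', WarmupHandler)])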

Dramatic increase in startup time

1-2 days ago, startup time increased from 1-3 seconds to 10-30 seconds with the same code.
I use Python 2.7 with multithreading enabled.
The code for this request reads one value from memcache and returns it to the user. If memcache is empty, it reads and renders a simple HTML template from the local filesystem. Both paths use about the same cpu_ms.
The same code works fine in a test application, where startup time is about 1-2 seconds.
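For reference, a minimal sketch of what the handler does (names are made up; this isn't my exact code):

    import os

    import webapp2
    from google.appengine.api import memcache
    from google.appengine.ext.webapp import template


    class PageHandler(webapp2.RequestHandler):
        def get(self):
            value = memcache.get('page_value')
            if value is None:
                # Cache miss: render the simple HTML template from the local
                # filesystem and repopulate the cache.
                path = os.path.join(os.path.dirname(__file__), 'page.html')
                value = template.render(path, {})
                memcache.set('page_value', value)
            self.response.write(value)


    app = webapp2.WSGIApplication([('/', PageHandler)])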
I filed a production issue report last night but haven't received an answer.
I tried changing the instance class from F1 to F4; startup time on F4 is still 8-10 seconds.
AppID of my app: f1f2ru
Log records (before the problem, at its start, now, and in the test app) were attached as screenshots, omitted here.
Without logs etc. it's hard to know. But who knows; perhaps Google is running low on resources and your free apps are paying the price.
https://developers.google.com/appengine/docs/adminconsole/instances#Loading_Requests
Currently I get about a 5 second startup time from cold on one of my very simple apps. 30 seconds does seem a long time however.
https://developers.google.com/appengine/docs/python/config/appconfig#Warmup_Requests
App Engine frequently needs to load application code into a fresh instance. This happens when you redeploy the application, when the load pattern has increased beyond the capacity of the current instances, or simply due to maintenance or repairs of the underlying infrastructure or physical hardware.
Or just pay:
https://developers.google.com/appengine/docs/adminconsole/performancesettings?hl=en
The Idle Instances sliders control the minimum and maximum number of idle instances available to your application at any given time.
The upper slider sets the minimum number of idle instances:
Note: In order to specify the minimum number of idle instances, you must have a paid app.
As usual, the more you pay the better the service.
Are the libraries the same as before? Adding libraries to the source code can and often does increase startup time (App Engine needs to load them). It's something easily overlooked, and some libraries are really, really bloated.
I don't know the reason for the initial startup-time increase.
The increased operation time, though, was my mistake. My application does this query:
    object.date < datetime.datetime.now()
My datastore has no objects with date < datetime.now(), but it does have objects with date = None (null). I forgot that null is a normal, indexed value, and null < any value.
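In ndb terms (model and property names made up), the gotcha and one way to fix it:

    import datetime

    from google.appengine.ext import ndb


    class Item(ndb.Model):
        date = ndb.DateTimeProperty()

    now = datetime.datetime.now()

    # Buggy: also matches entities whose date is None, because None is an
    # indexed value that sorts before everything else.
    buggy = Item.query(Item.date < now)

    # Fixed: add a lower bound on the same property (multiple inequality
    # filters on one property are allowed). The 1970 floor assumes no
    # legitimate dates predate it.
    fixed = Item.query(Item.date > datetime.datetime(1970, 1, 1),
                       Item.date < now)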

AppEngine cpm_usd change

App Engine logs include a field called cpm_usd. As far as I understand, this is the approximate cost of 1,000 such requests in US dollars.
Since 08/16/2012 these numbers have been SIGNIFICANTLY smaller (by a factor of 500) for my app, even though I did not change it. I was wondering what this is about.
Did Google change the way they calculate those costs?
Are frontend hours included or does this only include calls to services like the datastore?
The only explanation I have is that they stopped including the frontend hours in the calculation (I am currently still in dev mode and am thus accumulating a lot of idle time that could have distorted the original result).
I am not sure why your numbers have changed but my understanding is that, as of the pricing changes last year, this number is not relevant anymore.
Around the time your numbers changed, Google was adding cost-tracking functionality to the AppStats tool. What you can do now is turn on pricing metrics in AppStats and get an accurate picture of the RPC costs of your requests (which covers pretty much all costs except instance hours).
A quick test of a few requests on one of my apps shows that the cpm_usd and the cost reported by AppStats are not in line at all. Based on the number reported by cpm_usd for the requests I was just testing there is no way that number could contain datastore costs which means it is basically useless to me.
Check out the cost tracking that AppStats can provide and see how your own numbers line up:
https://developers.google.com/appengine/docs/java/tools/appstats#cost
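On the Python side, the equivalent (as I understand it) is a small change to appengine_config.py; a sketch, to be adjusted to your setup:

    # appengine_config.py -- enable Appstats and its RPC cost estimates.
    appstats_CALC_RPC_COSTS = True

    def webapp_add_wsgi_middleware(app):
        # Wrap the app so Appstats records (and now prices) each RPC.
        from google.appengine.ext.appstats import recording
        return recording.appstats_wsgi_middleware(app)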
Update Sept 5, 2012:
I asked about the current relevance of cpm_usd in a recent App Engine office hours hangout and while they could not give an exact answer they indicated they think it is still a relevant number. It would be nice to have more insight into what cpm_usd currently represents. Here is a recording of Amy answering the question:
https://www.youtube.com/watch?v=W-YBnWdllfI&feature=player_detailpage#t=3127s

Queuing Emails on App Engine

I need to send out emails at a rate exceeding App Engine's free email quota (8 emails/minute). I'm planning to use a TaskQueue to queue the emails, but I wondered: is there already a library or Python module I could use to automate this? It seems like the kind of problem someone might have run into before.
If it's an option, why not just enable billing? It'll jump the max rate from 8 recipients/minute to 5,100 recipients/minute.
The first 2,000 recipients are free each day; as long as you aren't going over the daily free quotas, my understanding is that it will not cost you anything (and if you need to email more than 2,000 people per day, you're going to have to enable billing anyway).
The deferred library is designed for exactly this sort of thing. Simply use deferred.defer(message.send), and make sure the queue you're using has the appropriate execution rate.
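A sketch of that approach (queue name and mail fields are made up):

    from google.appengine.api import mail
    from google.appengine.ext import deferred


    def send_email(to, subject, body):
        mail.send_mail(sender='noreply@your-app-id.appspotmail.com',
                       to=to, subject=subject, body=body)


    # Enqueue instead of sending inline; failed tasks retry automatically.
    deferred.defer(send_email, 'user@example.com', 'Hello', 'Body text',
                   _queue='emails')

    # queue.yaml caps the queue at the free quota's rate:
    # queue:
    # - name: emails
    #   rate: 8/m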
It's cheaper to just pay for it for a year than to engineer a workaround.
The easiest way, in my opinion, would be to use a queue, e.g. Amazon SQS, and pull 8 records per minute in a cron job running every minute.
Counting each message being pushed into the queue and then taken out, the math works out to an extremely cheap service.
See below: $0.000002 is the rate for 2 requests (add and view).
8 requests per minute, 60 minutes in an hour, 24 hours in a day, and 30 days in the average month still keeps you under $1:
0.000002 * 8 * 60 * 24 * 30 = $0.6912
This might not be exactly what you were looking for, but it should be a pretty simple solution.
EDIT:
See here for a Python SQS & S3 lib (SQS is all you should be looking for):
http://pypi.python.org/pypi/Python-Amazon/0.5
I'm not familiar with any canned solutions to this problem, but it should be very easy to solve. Write the emails to a datastore table, with an auto_now_add date field to record the order in which they arrived. A cron job that runs every minute pulls the eight oldest records off, mails them, and deletes them, as in the sketch below.
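A sketch of that design (model, sender, and route are made up; cron.yaml would run it with schedule: every 1 minutes):

    import webapp2
    from google.appengine.api import mail
    from google.appengine.ext import ndb


    class QueuedEmail(ndb.Model):
        to = ndb.StringProperty()
        subject = ndb.StringProperty()
        body = ndb.TextProperty()
        created = ndb.DateTimeProperty(auto_now_add=True)  # ordering field


    class SendMailCron(webapp2.RequestHandler):
        def get(self):
            # Oldest eight first, to stay under the 8 emails/minute quota.
            batch = QueuedEmail.query().order(QueuedEmail.created).fetch(8)
            for msg in batch:
                mail.send_mail(sender='noreply@your-app-id.appspotmail.com',
                               to=msg.to, subject=msg.subject, body=msg.body)
                msg.key.delete()


    app = webapp2.WSGIApplication([('/tasks/send_mail', SendMailCron)])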
Certainly, if you can solve this in a reasonably generic manner, you can be the person who solves this problem for everyone with a nice open source module.
