High Database Instance Uptime for no queries

I have set up a WordPress site on Google App Engine for PHP according to the instructions at https://developers.google.com/appengine/articles/wordpress
I also have a CDN sitting in front of my site, so the load on the Google App Engine instance is tiny: really just cron jobs and the CDN updating its cache. Here are the access logs from the last 7 hours as an example.
2013-08-29 06:09:12.829 /post-sitemap.xml 200 6793ms 0kb Amazon CloudFront
2013-08-29 06:09:05.727 /robots.txt 200 4ms 0kb Amazon CloudFront
2013-08-29 04:55:07.937 /wp-cron.php 200 7206ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
2013-08-29 04:33:59.915 /tag/javascript/ 200 8822ms 37kb Amazon CloudFront
2013-08-29 04:33:59.914 This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
2013-08-29 01:12:03.214 / 200 8751ms 39kb Amazon CloudFront
2013-08-29 01:12:03.214 This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
2013-08-29 01:11:50.755 /robots.txt 200 64ms 0kb Amazon CloudFront
2013-08-29 00:05:27.592 /sitemap_index.xml 200 7316ms 1kb Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2013-08-29 00:05:20.217 /robots.txt 200 4ms 0kb Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2013-08-28 23:30:45.596 /system/feeds/sitemap 301 179ms 0kb Amazon CloudFront
My problem is that despite this tiny load, my Cloud SQL instance is online far more than I would expect, and even though usage charges under this scenario should be tiny, they are not; I am looking at having to commit to a package to keep them under control rather than paying by usage.
See the following graph of instance uptime, which ends at 6 am (log time, 10 pm UTC).
Then look at the query load for the same period
My guess at what is happening is that a database connection is being opened as soon as the App Engine instance starts (whether it is serving static or dynamic objects).
Any ideas on how I can resolve this?

The Cloud SQL query graph was misleading.
Your application was getting queries throughout the day, which resulted in a hit on the database roughly every hour. This caused the database to be up for about half the time. The problem will be solved in a forthcoming version of the Cloud Console (cloud.google.com/console).
Sorry for the confusion.
Joe Faith,
Product Manager,
Google Cloud SQL
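As a back-of-the-envelope check on that explanation: assuming, purely for illustration, that a per-use Cloud SQL instance stays active for roughly 30 minutes after each query (the window is an assumption, not a documented figure), hourly hits keep it up about half the time:
hits_per_day = 24                  # roughly one database hit per hour
active_minutes_per_hit = 30.0      # assumed stay-up window per hit
uptime_fraction = min(1.0, hits_per_day * active_minutes_per_hit / (24 * 60))
print(uptime_fraction)             # 0.5 -- "up for about half the time"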

Related

Clients getting random ERR_CONNECTION_CLOSED error when connecting to Google App Engine

Our app is running on Google App Engine python2.7 standard environment. Since 8 March 2021, there have been hundreds of user reports of getting ERR_CONNECTION_CLOSED when firing API requests to the app from browsers.
Affected browsers include the latest versions of Chrome, Safari, Firefox and Edge, so it is unlikely to be a problem with specific browsers or extensions. Also, similar problems are occurring in our native Android/iOS apps connecting to the same APIs on App Engine.
When we check the log viewer on Google Cloud console, we can see that although the browser is reporting ERR_CONNECTION_CLOSED, the request has actually been passed to an instance and a 200 OK response has been produced successfully without errors. This shows that the particular client is able to connect to the server and send the request every time this issue occurs, but it cannot receive any response.
Furthermore, from the application logs, we can see that sometimes a request is sent twice within 2-4 seconds. Since our client application itself doesn't send the same request twice, this is most likely the client retrying after the connection is closed. Our API handlers are not highly optimized and can take up to 10 seconds to produce a response, but this is well within the 60 s limit on App Engine. Our API handlers also do not write partial responses; the response is written only after all processing is done.
Seemingly, the issue only affects a small proportion of our users, and it doesn't affect all of their requests to the same service. When a user is affected, changing browsers or even devices doesn't help. When a user like myself is not affected, I can keep refreshing for days without issues. I have tried logging into an affected user's account but the issue doesn't occur on my end, so it is unlikely to be an account-specific issue at the application level (the application logs also confirm this).
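While the root cause remains unclear, one way to make the duplicated deliveries harmless is a server-side idempotency check. A minimal sketch for the python27 runtime (the X-Request-Id header and the handler are hypothetical; memcache.add is atomic, so only the first delivery of a given id wins):
import webapp2
from google.appengine.api import memcache

class ApiHandler(webapp2.RequestHandler):
    def post(self):
        # The client attaches a unique id; a retry reuses the same id.
        request_id = self.request.headers.get('X-Request-Id')
        if request_id:
            # add() fails if the key already exists, flagging a duplicate.
            if not memcache.add('req:' + request_id, 1, time=60):
                self.response.set_status(409)  # or replay a cached response
                return
        self.response.write('ok')  # normal processing goes here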
Relevant parts of app.yaml:
runtime: python27
service: apis2
api_version: 1
threadsafe: true
instance_class: F2
inbound_services:
- warmup
builtins:
- appstats: on
- remote_api: on
- deferred: on
automatic_scaling:
  min_instances: 0
  min_idle_instances: 0
  max_instances: 600

App Engine for cloud monitoring metrics throwing 500 error when writing to big query

I want to export metrics from Cloud Monitoring to BigQuery, and Google has provided a solution for how to do this; I am following this article.
I have downloaded the code from GitHub and I am able to successfully deploy and run the application (python2.7).
I have set the aggregation alignment period to 86400s (I want to aggregate metrics per day, starting from 1st July).
One of the App Engine services, the write-metrics app engine that writes the metrics to BigQuery by receiving the API response as a Pub/Sub message, is always throwing these errors:
> Exceeded soft memory limit of 256 MB with 270 MB after servicing 5 requests total. Consider setting a larger instance class in app.yaml.
> While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.
The above is a 500 error and occurs very frequently, and I find that duplicate records are still getting inserted into the BigQuery table,
and also this one below:
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded.
The App Engine logs frequently show POSTs with codes 500 and 200.
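The duplicate rows are a predictable side effect of the task being retried after each 500. BigQuery's streaming insertAll accepts a per-row insertId that it uses for best-effort de-duplication, so deriving a stable id from the row content makes retries harmless. A minimal sketch, assuming the google-api-python-client used in Google's sample (credential setup omitted; insert_rows is a hypothetical helper):
import hashlib
from googleapiclient import discovery

def insert_rows(bigquery, project_id, dataset_id, table_id, rows):
    body = {
        'rows': [{
            # Same row content -> same insertId, so a retried task that
            # re-sends the row is dropped by BigQuery (best effort).
            'insertId': hashlib.sha1(repr(sorted(row.items()))).hexdigest(),
            'json': row,
        } for row in rows]
    }
    return bigquery.tabledata().insertAll(
        projectId=project_id, datasetId=dataset_id,
        tableId=table_id, body=body).execute()

# bigquery = discovery.build('bigquery', 'v2')  # plus credentials on GAE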
In App Engine (standard) I have set scaling to automatic in app.yaml as below:
automatic_scaling:
  target_cpu_utilization: 0.65
  min_instances: 5
  max_instances: 25
  min_pending_latency: 30ms
  max_pending_latency: automatic
  max_concurrent_requests: 50
but this seems to have no effect. I am very new to App Engine, Google Cloud, and its Stackdriver metrics.
This change makes it work:
instance_class: F4_1G
This needs to be an independent tag; previously I had made the mistake of putting it under automatic_scaling, which produced an illegal-modifier error.
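For reference, the working layout keeps instance_class at the top level of app.yaml, as a sibling of automatic_scaling rather than a key inside it:
instance_class: F4_1G
automatic_scaling:
  target_cpu_utilization: 0.65
  min_instances: 5
  max_instances: 25
  min_pending_latency: 30ms
  max_pending_latency: automatic
  max_concurrent_requests: 50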

GAE datastore admin copy failing on MapReduce model to JSON conversion

I am trying to copy my app's datastore to another app using the datastore admin console, according to this documentation. Since my app uses the Java runtime, I installed the datastore admin Python sample as instructed and set it up to whitelist the other app's ID. I used this same method to copy the datastore a couple of months ago, and while the process didn't go entirely smoothly, it did end up working.
The tasks created by the datastore admin copy operation are not completing. There are 9 tasks in the default queue (one for each of the entity types I'm trying to copy). The tasks' method/URL is POST /_ah/mapreduce/kickoffjob_callback. They continuously attempt to retry their operations and continuously fail. The tasks' headers are each something like:
X-AppEngine-Current-Namespace
content-type application/x-www-form-urlencoded
Referer https://ah-builtin-python-bundle-dot-mysourceappid.appspot.com/_ah/datastore_admin/copy.do
Content-Length 970
Host ah-builtin-python-bundle-dot-mysourceappid.appspot.com
User-Agent AppEngine-Google; (+http://code.google.com/appengine)
The tasks' previous run results are each something like:
Dispatched time (UTC) 2013/05/26 08:02:47
Seconds late 0.00
Seconds to process task 0.50
Last http response code 500
Reason to retry App Error
Under the destination app, the only indication I'm getting of there being any incoming copy operation is the log:
2013-05-26 01:55:37.798 /_ah/remote_api?rtok=66767762443
200 1832ms 0kb AppEngine-Google; (+http://code.google.com/appengine; appid: s~mysourceappid)
0.1.0.40 - - [26/May/2013:00:55:37 -0700] "GET /_ah/remote_api?rtok=66767762443 HTTP/1.1" 200 137 - "AppEngine-Google;
(+http://code.google.com/appengine; appid: s~mysourceappid)" "datastore-admin.mydestinationappid.appspot.com" ms=1833
cpu_ms=1120 cpm_usd=0.000015 loading_request=1 app_engine_release=1.8.0 instance=00c61b117c9beacd101ff92c542598f549f755cc
I 2013-05-26 01:55:37.797
This request caused a new process to be started for your application, and thus caused your application code to be loaded
for the first time. This request may thus take longer and use more CPU than a typical request for your application.
So the requests are at least causing an app instance to be spun up, but other than that, nothing is happening and the source app is just getting 500 server errors.
I've tried with writes enabled and disabled on both the source and destination datastores. I've double-, triple- and quadruple-checked that the correct app IDs are registered in the Python datastore admin sample and uploaded the code to both app servers, even though it is only necessary on the destination server (they each whitelist the other's ID). I've tried with both HTTPS and HTTP URLs.
ah-builtin-python-bundle-dot-mysourceappid.appspot.com/_ah/mapreduce/status doesn't give any relevant information other than that there isn't any progress or activity on any of the tasks. If I try to abort the jobs from there, they fail to abort as well. To stop the jobs, I have to delete the tasks from the queue directly. I then have to manually clean up the entities left behind: the _AE_DatastoreAdmin_Operation entity, which causes the datastore admin to still show the copy job as active, plus a bunch of _GAE_MR_MapreduceControl, _GAE_MR_MapreduceState and _GAE_MR_ShardState entities.
What is going wrong? I can't believe there isn't more relevant log data or information about where the process is failing.
UPDATE:
I must have been tired last night, because I didn't think to look in the logs under the source app's ah-builtin-python-bundle instance version, which is where the datastore admin operations actually run. This is the log output I'm getting there:
2013-05-27 00:49:11.967 /_ah/mapreduce/kickoffjob_callback 500 320ms 1kb AppEngine-Google; (+http://code.google.com/appengine)
0.1.0.2 - - [26/May/2013:23:49:11 -0700] "POST /_ah/mapreduce/kickoffjob_callback HTTP/1.1" 500 1608 "https://ah-builtin-
python-bundle-dot-mysourceappid.appspot.com/_ah/datastore_admin/copy.do" "AppEngine-Google;
(+http://code.google.com/appengine)" "ah-builtin-python-bundle-dot-mysourceappid.appspot.com" ms=320 cpu_ms=80
cpm_usd=0.000180 queue_name=default task_name=706762757133111420 app_engine_release=1.8.0
instance=00c61b117c5825670de2531f27693bdc2ffb71
E 2013-05-27 00:49:11.966
super(type, obj): obj must be an instance or subtype of type
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 716, in __call__
handler.post(*groups)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/base_handler.py", line 83, in post
self.handle()
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 1087, in handle
spec, input_readers, queue_name, self.base_path(), state)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 1159, in _schedule_shards
output_writer=output_writer))
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 718, in _state_to_task
params=tstate.to_dict(),
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/model.py", line 805, in to_dict
"input_reader_state": self.input_reader.to_json_str(),
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/model.py", line 165, in to_json_str
json = self.to_json()
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/input_readers.py", line 2148, in to_json
json_dict = super(DatastoreKeyInputReader, self).to_json()
TypeError: super(type, obj): obj must be an instance or subtype of type
Looks like the copy task is crashing while trying to convert the MapReduce data model to JSON because the input reader isn't an instance or subtype of DatastoreKeyInputReader. This must be a bug introduced in version 1.8.0 or another release since 1.7.5, which was the current SDK version the last time I ran a datastore copy operation.
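For anyone puzzled by the message: Python raises this TypeError whenever the class passed to super() is not in the method resolution order of the object's actual class. A minimal standalone reproduction, unrelated to the SDK code:
class A(object):
    pass

class B(A):
    pass

a = A()
super(B, a)  # TypeError: super(type, obj): obj must be an instance or subtype of type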
For reference, this has been fixed and will be out soon.
https://code.google.com/p/googleappengine/issues/detail?id=9388

GAE: Queues, Quotas and backend instances

I have a queue with a lot of tasks in it. I would like to use one backend instance to process this queue. My quota info tells me I have blown my budget on hundreds of frontend instance hours and have not used any backend instance hours. As I had configured only one backend instance, I was expecting to be charged no more than 1 (backend) instance hour per hour. Here is my configuration:
backends.yaml
backends:
- name: worker
  class: B8
  instances: 1
  options: dynamic
queue.yaml
queue:
- name: import
  rate: 20/s
  bucket_size: 40
Adding tasks to the queue in my script:
deferred.defer(importFunction, _target='worker', _queue="import")
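For context, a minimal sketch of the surrounding module (importFunction and its argument are placeholders; _target and _queue are the documented deferred.defer keyword arguments):
from google.appengine.ext import deferred

def importFunction(start_row):
    # Long-running import work, executed later by a task-queue worker.
    pass

# Route the task to the 'worker' backend via the 'import' queue:
deferred.defer(importFunction, 0, _target='worker', _queue='import')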
Bill status:
Resource Usage
Frontend Instance Hours 198.70 Instance Hours
Backend Instance Hours 0.00 Instance Hours
Task Headers
X-AppEngine-Current-Namespace
Content-Type application/octet-stream
Referer http://worker.appname.appspot.com/_ah/queue/deferred
Content-Length 1619
Host worker.appname.appspot.com
User-Agent AppEngine-Google; (+http://code.google.com/appengine)
I needed to deploy my backend code:
appcfg.py backends update dir instance_name

Google App Engine - does instance hours include upload / download time by HTTP client to server?

I have an application hosted on the Google App Engine platform. The application is mostly I/O intensive and involves a high number of upload and download operations to the app engine server by an HTTP client.
My question is: what does an instance hour comprise in this case? Does it include the total time taken by the HTTP client to upload the request data? Or does the instance-hour calculation begin once the entire request data has been uploaded and processing of the request starts?
Example results from the application:
An HTTP client sent an upload request to the app engine server, request data size 1.1 MB
Time taken for request to complete on the client side - 78311 ms
Corresponding server log entry:
- - [Time] "POST / HTTP/1.1" 200 127 - "Apache-HttpClient/UNAVAILABLE (java 1.4)" "" ms=3952 cpu_ms=1529 api_cpu_ms=283 cpm_usd=0.154248 instance=
An HTTP client sent a download request to the app engine server.
Time taken for request to complete on the client side - 8632 ms
Corresponding server log entry:
- - [Time] "POST / HTTP/1.1" 200 297910 - "Apache-HttpClient/UNAVAILABLE (java 1.4)" "" ms=909 cpu_ms=612 api_cpu_ms=43 cpm_usd=0.050377 instance=
Which of these figures contributes towards the instance hour utilization - is it a) ms, b) cpu_ms or c) the time taken for request to complete on the client side ?
Please note that the HTTP client uses a FileEntity while uploading data, so I assume the data is sent by the client to the server in a single part.
Incoming requests are buffered by the App Engine infrastructure, and the request is only passed to an instance of your app once the entire request has been received. Likewise, outgoing requests are buffered, and your app does not have to wait for the user to finish downloading the response. As a result, upload and download time are not charged against your app.
To understand the numbers in the log, look at the log breakdown, a bit more readable here.
None of the figures you listed (a, b, c) is directly billed. It used to be that GAE counted CPU time as the unit of cost, but that changed in November 2011. Now you pay for instance uptime, even if the instance is not handling any requests. Instances stop being billed after 15 minutes of inactivity.
(This does not mean that GAE actually shuts instances down once it stops billing for them; see the "Instances" graph in your dashboard.)
How many instances are up depends on your app's performance settings.
Since your app is IO-intensive, it will help to enable concurrent requests (Java, Python 2.7). This way one instance can serve multiple parallel requests that are mainly waiting on IO; in our app I see about 15-20 requests being served in parallel on one instance.
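In the Python 2.7 runtime this is a single switch in app.yaml (minimal sketch):
runtime: python27
api_version: 1
threadsafe: true  # lets one instance serve many requests in parallel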
Update:
This is what the first link says about the ms=xyz log entry:
This is the actual time (hence 'wallclock' time) taken to return a response to
the user, not including the time it took for the user to send their request or
the time it takes to send the response back - that is, just the time spent
processing by your app.
Note that Nick Johnson is an engineer on the GAE team, so this can be taken as an authoritative answer.
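Applying that to the figures in the question: for the 1.1 MB upload, the client waited 78311 ms but the app was busy for only ms=3952; the remaining ~74 seconds were upload time absorbed by App Engine's buffering, which is not billed as instance time.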
