Strange "Over quota: " error with no further diagnostics - google-app-engine

I'm running a parallel job to copy data from Heroku into Google Cloud Storage and eventually into BigQuery. The way I'm doing it right now is to split the job of querying IDs in the range [61500000, 62000000) into, say, 40 taskqueue tasks, where each task handler is responsible for a subrange, say [61500000, 61512500). Inside each taskqueue task handler, I spawn 3 goroutines to query our Heroku API in parallel, plus an additional goroutine doing an Insert to Google Cloud Storage. The 3 HTTP API input goroutines pump data to the GCS insert goroutine through io.Pipe().
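The overall shape of each task handler is roughly the sketch below; copyShard, fetchShard and gcsInsert are placeholder names standing in for the real handler, the urlfetch code and the storage.ObjectsInsertCall wrapper, so treat this as a simplified outline rather than the actual code:

package shards // hypothetical package name

import (
	"io"
	"sync"

	"appengine"
)

// copyShard streams JSON for the ID subrange [lo, hi) into one GCS object.
func copyShard(c appengine.Context, lo, hi int64,
	fetchShard func(c appengine.Context, lo, hi int64, w io.Writer) error,
	gcsInsert func(c appengine.Context, r io.Reader) error) error {

	pr, pw := io.Pipe()
	errc := make(chan error, 3)
	var wg sync.WaitGroup

	// Three producers, one per third of the subrange, query the Heroku API
	// via urlfetch and write records into the pipe.
	step := (hi - lo) / 3
	for i := int64(0); i < 3; i++ {
		start, end := lo+i*step, lo+(i+1)*step
		if i == 2 {
			end = hi // last shard absorbs the rounding remainder
		}
		wg.Add(1)
		go func(start, end int64) {
			defer wg.Done()
			errc <- fetchShard(c, start, end, pw)
		}(start, end)
	}

	// Once all producers finish, close the write end so the insert side
	// sees EOF (or the first producer error instead of EOF).
	go func() {
		wg.Wait()
		close(errc)
		var first error
		for err := range errc {
			if err != nil && first == nil {
				first = err
			}
		}
		pw.CloseWithError(first) // nil behaves like a normal Close
	}()

	// Consumer side: a single multipart insert into Cloud Storage that
	// reads everything the producers wrote.
	if err := gcsInsert(c, pr); err != nil {
		c.Errorf("GCS insert for [%d, %d) failed: %v", lo, hi, err)
		return err
	}
	return nil
}

Each producer writes one complete record per Write call, relying on io.Pipe gating concurrent writers sequentially.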
However, I can't get this to work except for toy workloads. Virtually every time, some shards fail with the error:
"Post https://www.googleapis.com/upload/storage/v1beta2/b/ethereal-fort-637.appspot.com/o?alt=json&uploadType=multipart: Over quota: "
returned from storage.ObjectsInsertCall.Do().
I checked the possible places where a billed app could be hitting a quota:
* urlfetch total limits developers.google.com/appengine/docs/quotas#UrlFetch
* instance memory developers.google.com/appengine/docs/go/modules/#Go_Instance_scaling_and_class
but still couldn't find the cause.
Below I explain why I ruled out the above possibilities:
urlfetch total limits
urlfetch is used in the 3 goroutines to query our API server for JSON data. These 3 goroutines then process the data and send it to the GCS goroutine through io.Pipe(). The code looks something like:
cl := urlfetch.Client(c)
resp, err := cl.Get("pic-collage.com/...")
if err != nil {
	if appengine.IsOverQuota(err) {
		c.Errorf("collageJSONByID over quota: %v", err)
	}
	return err
}
However, while we see numerous "POST www.googleapis.com/upload/storage/v1beta2...: Over quota: " errors, we never see the "collageJSONByID ..." log entries related to urlfetches to our Heroku server.
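The storage side, where the quoted error actually comes back, looks roughly like the following; svc, bucket, objName and pr are simplified stand-ins for the *storage.Service built on an urlfetch-backed client, the bucket name, the object name and the pipe's read end:

obj := &storage.Object{Name: objName}
if _, err := svc.Objects.Insert(bucket, obj).Media(pr).Do(); err != nil {
	// This is where the "Post https://www.googleapis.com/upload/...: Over quota: "
	// message comes back from.
	c.Errorf("GCS insert failed: %v", err)
	return err
}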
instance memory
We are using the B1 instance class for our jobs, which has 128MB of RAM. Throughout our runs, the App Engine console shows memory usage constantly well below 30MB for each and every instance.
I also applied the fix for the caching inside serviceaccounts described in "Over quota" when using GCS json-api from App Engine, but the problem persists.
Is it possible for us to get more information about the specific App Engine quota we are exceeding? Or are there other hidden quotas for Google Cloud Storage that are not mentioned in the docs?

The "Over quota: " message can happen when you've reached your daily billing limit. If you've enabled billing for your application, make sure the daily budget is high enough to cover your usage.

Related

How to run Google App Engine app indefinitely

I successfully deployed a twitter screenshot bot on Google App Engine.
This is my first time deploying.
First thing I noticed was that the app didn't start running until I clicked the link.
When I did, the app worked successfully (replied to tweets with screenshots) as long as the tab was loading and open.
When I closed the tab, the bot stopped working.
Also, in the cloud shell log, I saw:
Handling signal: term
[INFO] Worker exiting (pid 18)
This behaviour surprises me, as I expect it to keep running on Google's servers indefinitely.
My bot works by streaming with the Twitter API. The "Worker exiting" line above also surprises me.
Here is the relevant code:
def get_stream(set):
    global servecount
    with requests.get(f"https://api.twitter.com/2/tweets/search/stream?tweet.fields=id,author_id&user.fields=id,username&expansions=author_id,referenced_tweets.id", auth=bearer_oauth, stream=True) as response:
        print(response.status_code)
        if response.status_code == 429:
            print(f"returned code 429, waiting for 60 seconds to try again")
            print(response.text)
            time.sleep(60)
            return
        if response.status_code != 200:
            raise Exception(
                f"Cannot get stream (HTTP {response.status_code}): {response.text}"
            )
        for response_line in response.iter_lines():
            if response_line:
                json_response = json.loads(response_line)
                print(json.dumps(json_response, indent=4))
                if json_response['data']['referenced_tweets'][0]['type'] != "replied_to":
                    print(f"that was a {json_response['data']['referenced_tweets'][0]['type']} tweet not a reply. Moving on.")
                    continue
                uname = json_response['includes']['users'][0]['username']
                tid = json_response['data']['id']
                reply_tid = json_response['includes']['tweets'][0]['id']
                or_uid = json_response['includes']['tweets'][0]['author_id']
                print(uname, tid, reply_tid, or_uid)
                followers = api.get_follower_ids(user_id='1509540822815055881')
                uid = int(json_response['data']['author_id'])
                if uid not in followers:
                    try:
                        client.create_tweet(text=f"{uname}, you need to follow me first :)\nPlease follow and retry. \n\n\nIf there is a problem, please speak with my creator, #JoIyke_", in_reply_to_tweet_id=tid, media_ids=[mid])
                    except:
                        print("tweet failed")
                    continue
                mid = getmedia(uname, reply_tid)
                #try:
                client.create_tweet(text=f"{uname}, here is your screenshot: \n\n\nIf there is a problem, please speak with my creator, #JoIyke_", in_reply_to_tweet_id=tid, media_ids=[mid])
                #print(f"served {servecount} users with screenshot")
                #servecount += 1
                #except:
                #    print("tweet failed")
                editlogger()

def main():
    servecount, tries = 1, 1
    rules = get_rules()
    delete = delete_all_rules(rules)
    set = set_rules(delete)
    while True:
        print(f"starting try: {tries}")
        get_stream(set)
        tries += 1
If this is important, my app.yaml file has only one line:
runtime: python38
and I deployed the app from cloud shell with gcloud app deploy app.yaml
What can I do?
I have searched and can't seem to find a solution. Also, this is my first time deploying an app successfully.
Thank you.
Google App Engine works on demand, i.e. when it receives an HTTP(S) request.
Neither warmup requests nor min_instances > 0 will meet your needs. A warmup tries to 'start up' an instance before your requests come in. min_instances > 0 simply says not to kill the instance, but you still need an HTTP request to invoke the service (which is what you did by opening a browser tab and entering your app's URL).
You may ask: since you've 'started up' the instance by opening a browser tab, why doesn't it keep running afterwards? The answer is that every request to a Google App Engine (Standard) app must complete within 1 to 10 minutes (depending on the type of scaling your app is using; see the documentation). For Google App Engine Flexible, the timeout goes up to 60 minutes. This tells you that your service will time out after at most 10 minutes on GAE Standard or 60 minutes on GAE Flexible.
I think the best solution for you on GCP is Google Compute Engine (GCE): spin up a virtual machine (pick the lowest configuration so you can stay within the free tier), deploy your code to it, and kick it off. Your code then runs continuously.
App Engine works on demand, i.e. it will only be up if there are requests to the app (which is why the app works when you click the URL). While you can set 1 instance to be "running all the time" (min_instances), that would be an anti-pattern for what you want to accomplish with App Engine. Please read How Instances are Managed.
Looking at your code, you're pulling data from Twitter every minute, so the best option for you is Cloud Scheduler + Cloud Functions.
Cloud Scheduler will call your function, which checks whether there is data to process; if not, the process terminates. This helps you save costs because instead of having something running all the time, the function only runs for the time it needs.
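As a rough illustration of that shape (written here in Go, which Cloud Functions also supports; fetchPendingMentions and replyWithScreenshot are hypothetical helpers standing in for the Twitter calls the bot already makes, and the same structure applies in Python):

package bot

import (
	"fmt"
	"net/http"
)

// Mention is a placeholder for whatever the Twitter search call returns.
type Mention struct{ TweetID, Username string }

// Hypothetical helpers wrapping the existing Twitter API logic.
func fetchPendingMentions() ([]Mention, error) { return nil, nil }
func replyWithScreenshot(m Mention) error      { return nil }

// CheckMentions is the HTTP entry point Cloud Scheduler invokes on a schedule.
func CheckMentions(w http.ResponseWriter, r *http.Request) {
	mentions, err := fetchPendingMentions()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	if len(mentions) == 0 {
		// Nothing to do: return immediately so the function stops running
		// until the next scheduled invocation.
		fmt.Fprintln(w, "no new mentions")
		return
	}
	for _, m := range mentions {
		if err := replyWithScreenshot(m); err != nil {
			fmt.Fprintf(w, "reply to %s failed: %v\n", m.TweetID, err)
		}
	}
}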
On the other hand, I'm not an expert with the Twitter API, but if there is a way for Twitter to call your function directly instead of you pulling data from Twitter, that would be even better: you optimize costs further and the function only runs when there is data to process instead of checking every n minutes.
As a piece of advice, first review all the options you have in GCP (or whichever provider you use), then choose the best one for your use case. Just selecting one that works with your programming language will not necessarily work as you expect, as in this case.

Google App Engine: debugging Dashboard > Traffic > Sent

I have a GAE app (PHP72, env: standard) which is hanging intermittently (once or twice a day, for about 5 minutes).
When this occurs I see a large spike in GAE dashboard's Traffic Sent graph.
I've reviewed all uses of file_get_contents and curl_exec within the app's scripts, not including those in /vendor/, and don't believe these to be the cause.
Is there a simple way in which I can review more info on these outbound requests?
There is no way to get more details in that dashboard. You're going to need to check your logs at the corresponding times. Obscure things to check for:
* Cron jobs coming in at the same times
* Task Queues spinning up

Google App Engine app has periods when it returns empty response instead of actual data

I have a small service-discovery service running on the Google App Engine free tier. It queries Google Cloud Datastore with the simplest of queries on data that is virtually static. Recently we had incidents where the service was returning empty results. That went on for 12 hours, after which results came back to normal. We only recently noticed it; in the logs I see at least 3 incidents like that.
I logged into the console and saw:
* 0 app errors
* 0 server errors
* all green GCE status
* max used quota is just 5%
* intervals of time where every response is 204
I see absolutely no reason to receive an empty response, and yet responses are sometimes empty. I also see no way to notify Google that there is a cloud-side problem, since this is a free-tier account with no support.
So, is there anything I might have missed?
UPD: Looking into the code, the only way to get a 204 with an empty body is an exception while getting an instance of javax.jdo.PersistenceManager or a new instance of javax.jdo.Query. So it is less likely that Cloud Datastore is at fault, because if the result were empty, the app would answer 200+[], and if there were an error during a query, the app would answer 204+<h1>Exception</h1>....
But again, I don't see how a request could work now but not have worked 5 minutes ago.
UPD2: The app was stable for more than two years.

Cannot delete data from Appengine Datastore due to error: "API call datastore_v3.Put() required more quota than is available"

A Google App Engine application has reached the free resource limits for Datastore Stored Data (all other quotas are OK). Hence I'm trying to delete data from the Datastore (on the Datastore Admin page).
The only problem is, I cannot delete data because I get this error:
Delete Job Status
There was a problem kicking off the jobs. The error was:
The API call datastore_v3.Put() required more quota than is available.
How do I break out of this vicious circle?
You need to wait until the current billing day is over for your Datastore operations quotas to be reset, and then you will be able to delete entities.
If you're getting this error after enabling billing, you need to set a daily budget.
Check out this answer: https://stackoverflow.com/a/31693372/1942593

Google App Engine RemoteApiServlet/remote_api handler errors

Recently, I have come across an error (quite frequently) with the RemoteApiServlet as well as the remote_api handler.
While bulk loading large amounts of data using the Bulk Loader, I start seeing random HTTP 500 errors with the following details (in the log file):
Request was aborted after waiting too long to attempt to service your request.
This may happen sporadically when the App Engine serving cluster is under
unexpectedly high or uneven load. If you see this message frequently, please
contact the App Engine team.
Can someone explain what I might be doing wrong? This error prevents the Bulk Loader from uploading any further data, and I have to start all over again.
A related thread in the Google App Engine forums is at http://groups.google.com/group/google-appengine-python/browse_thread/thread/bee08a70d9fd89cd
This isn't specific to remote_api. What's happening is that your app is getting a lot of requests that take a long time to execute, and App Engine will not scale up the number of instances your app runs on if the request latency is too high. As a result, requests are being queued until a handler is available to serve them; if none become available, a 500 is returned and this message is logged.
Simply reduce the rate at which you're bulkloading data, or decrease the batch size so remote_api requests execute faster.
