App Engine: Alternatives to urlfetch? Seems very unreliable - google-app-engine

I'm using urlfetch in my app and while everything works perfectly fine in the development environment, i'm finding urlfetch to be VERY unreliable when it's actually deployed. Sometimes it works as it should (retrieving data), but then a few minutes later it might return nothing, then it'll be working fine again a few minutes after that. This is very unacceptable. I've checked to make sure it's NOT the source URL that's the problem (YQL) and, again, everything works as it should in the development environment.
Are there any third-party libraries I could try?
Example code:
url = "http://query.yahooapis.com/v1/public/yql?q=%s&format=json" % urllib.quote_plus(query)
result = urlfetch.fetch(url, deadline=10)
if result.status_code == 200:
r = json.loads(result.content)
else:
return
a = r['query']['results']
# Do stuff with 'a'
Sometimes it'll work as it should, but other times - completely randomly with no code changes - i'll get this this error:
a = r['query']['results']
TypeError: 'NoneType' object is unsubscriptable

Sometimes it'll work as it should,
but other times completely randomly with no code changes
This is a common symptom that your application's requests have exceeded the Yahoo API calls rate limit.
Quoting Yahoo developer documentations rate limit:
IP Based Limits
Our service rate limits are imposed as
a limit on the number of API calls
made per IP address during a specific
time window. If your IP address
changes during that time period, you
may find yourself with more "credit"
available. However, if someone else
had been using the address and hit the
limit, you'll need to wait until the
end of the time period to be allowed
to make more API calls.
Google App Engine uses a pool of IP addresses for outgoing urlfetch requests and your application is sharing these IP addresses with other applications that are calling the same Yahoo endpoint; when the rate limit is exceeded, the endpoint replies with a limit exceeded error causing UrlFetch to fail.
Here another case using the Twitter search API.
When you mix Google App Engine+Third party web APIs, you need to be sure that the API provides authenticated calls allowing your application to have its own quota (StackApps API for example).

import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()

This isn't an error in URLFetch - it's an issue with the JSON being returned. Either json.loads is returning None, or r['query'] is - I'm guessing it's probably the latter. Try logging result.content to see what the service is returning. You probably also want to cehck result.status.
One possibility is that your request is being denied or ratelimited by Yahoo in production, but not on your development machine.

Related

How to run Google App Engine app indefinitely

I successfully deployed a twitter screenshot bot on Google App Engine.
This is my first time deploying.
First thing I noticed was that the app didn't start running until I clicked the link.
When I did, the app worked successfully (replied to tweets with screenshots) as long as the tab was loading and open.
When I closed the tab, the bot stopped working.
Also, in the cloud shell log, I saw:
Handling signal: term
[INFO] Worker exiting (pid 18)
This behaviour surprises me as I expect it to keep running on google server indefinitely.
My bot works by streaming with Twitter api. Also the "worker exiting" line above surprises me.
Here is the relevant code:
def get_stream(set):
global servecount
with requests.get(f"https://api.twitter.com/2/tweets/search/stream?tweet.fields=id,author_id&user.fields=id,username&expansions=author_id,referenced_tweets.id", auth=bearer_oauth, stream=True) as response:
print(response.status_code)
if response.status_code == 429:
print(f"returned code 429, waiting for 60 seconds to try again")
print(response.text)
time.sleep(60)
return
if response.status_code != 200:
raise Exception(
f"Cannot get stream (HTTP {response.status_code}): {response.text}"
)
for response_line in response.iter_lines():
if response_line:
json_response = json.loads(response_line)
print(json.dumps(json_response, indent=4))
if json_response['data']['referenced_tweets'][0]['type'] != "replied_to":
print(f"that was a {json_response['data']['referenced_tweets'][0]['type']} tweet not a reply. Moving on.")
continue
uname = json_response['includes']['users'][0]['username']
tid = json_response['data']['id']
reply_tid = json_response['includes']['tweets'][0]['id']
or_uid = json_response['includes']['tweets'][0]['author_id']
print(uname, tid, reply_tid, or_uid)
followers = api.get_follower_ids(user_id='1509540822815055881')
uid = int(json_response['data']['author_id'])
if uid not in followers:
try:
client.create_tweet(text=f"{uname}, you need to follow me first :)\nPlease follow and retry. \n\n\nIf there is a problem, please speak with my creator, #JoIyke_", in_reply_to_tweet_id=tid, media_ids=[mid])
except:
print("tweet failed")
continue
mid = getmedia(uname, reply_tid)
#try:
client.create_tweet(text=f"{uname}, here is your screenshot: \n\n\nIf there is a problem, please speak with my creator, #JoIyke_", in_reply_to_tweet_id=tid, media_ids=[mid])
#print(f"served {servecount} users with screenshot")
#servecount += 1
#except:
# print("tweet failed")
editlogger()
def main():
servecount, tries = 1, 1
rules = get_rules()
delete = delete_all_rules(rules)
set = set_rules(delete)
while True:
print(f"starting try: {tries}")
get_stream(set)
tries += 1
If this is important, my app.yaml file has only one line:
runtime: python38
and I deployed the app from cloud shell with gcloud app deploy app.yaml
What can I do?
I have searched and can't seem to find a solution. Also, this is my first time deploying an app sucessfully.
Thank you.
Google App Engine works on demand i.e. when it receives an HTTP(s) request.
Neither Warmup requests nor min_instances > 0 will meet your needs. A warmup tries to 'start up' an instance before your requests come in. A min_instance > 0 simply says not to kill the instance but you still need an http request to invoke the service (which is what you did by opening a browser tab and entering your Apps url).
You may ask - since you've 'started up' the instance by opening a browser tab, why doesn't it keep running afterwards? The answer is that every request to a Google App Engine (Standard) app must complete within 1 - 10 minutes (depending on the type of scaling) your App is using (see documentation). For Google App Engine Flexible, the timeout goes up to 60 minutes. This tells you that your service will timeout after at most 10 minutes on GAE standard or 60 minutes on GAE Flexible.
I think the best solution for you on GCP is to use Google Compute Engine (GCE). Spin up a virtual server (pick the lowest configuration so you can stick within the free tier). If you use GCE, it means you spin up a Virtual Machine (VM), deploy your code to it and kick off your code. Your code then runs continuously.
App Engine works on demand, i.e, only will be up if there are requests to the app (this is why when you click on the URL the app works). As well you can set 1 instance to be "running all the time" (min_instances) it will be an anti-pattern for what you want to accomplish and App Engine. Please read How Instances are Managed
Looking at your code you're pulling data every minute from Twitter, so the best option for you is using Cloud Scheduler + Cloud Functions.
Cloud Scheduler will call your Function and it will check if there is data to process, if not the process is terminated. This will help you to save costs because instead of have something running all the time, the function will only work the needed time.
On the other hand I'm not an expert with the Twitter API, but if there is a way that instead of pulling data from Twitter and Twitter calls directly your function it will be better since you can optimize your costs and the function will only run when there is data to process instead of checking every n minutes.
As an advice, first review all the options you have in GCP or the provider you'll use, then choose the best one for your use case. Just selecting one that works with your programming language does not necessarily will work as you expect like in this case.

Google Monitoring API : Get Values

I'm trying to use the Google Monitoring API to retrieve metrics about my cloud usage. I'm using the Google Client Library for Python.
The API advertises the ability to access over 900 Stackdriver Monitoring Metrics. I am interested in accessing some Google App Engine metrics, such as Instance count, total memory, etc. The Google API Metrics page has a list of all the metrics I should be able to access.
I've followed the guides on the Google Client Library page , but my script making the API calls is not printing the metrics, it is just printing the metric descriptions.
How do I use the Google Monitoring API to access the metrics, rather than the descriptions?
My Code:
from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build
...
response = monitor.projects().metricDescriptors().get(name='projects/{my-project-name}/metricDescriptors/appengine.googleapis.com/system/instance_count').execute()
print(json.dumps(response, sort_keys=True, indent=4))
My Output
I expect to see the actual instance count. How can I achieve this?
For anyone reading this, I figured out the problem. I was assuming the values would come from the 'metric descriptors' class in the api, but that was a poor assumption.
For values, you need to use a 'timeSeries' call. For this call, you need to specify the project you want to monitor, start time, end time, and a filter (the metric you want, such as cpu, memory, etc.)
So, to retrieve the app engine project memory, the above code becomes
request = monitor.projects().timeSeries().list(name='projects/my-appengine-project',
interval_startTime='2016-05-02T15:01:23.045123456Z',
interval_endTime='2016-06-02T15:01:23.045123456Z',
filter='metric.type="appengine.googleapis.com/system/memory/usage"')
response = request.execute()
This example has the start time and end time to cover a month of data.

Long-running script on Google App Engine

I'm attempting to create a microservice on Google App Engine that is not intended to handle HTTP requests.
Instead, I was hoping to have a continuously running Python script that monitors a remote queue--RabbitMQ, to be precise--and sends out an api-call to another service as tasks are pushed to the queue.
I was wondering, firstly, is it possible to run a script upon deployment--one that did not originate with a user action/request?
Secondly, how would I accomplish this?
Thanks in advance for your time!
You can deploy your "script" as a manually scaled module -- see https://cloud.google.com/appengine/docs/python/modules/ -- with exactly one instance. As the docs say, "When you start a manual scaling instance, App Engine immediately sends a /_ah/start request to each instance"; so, just set that module's handler for /_ah/start to the handler you want to run (in the module's yaml file and the WSGI app in the Python code, using whatever lightweight framework you like -- webapp2, falcon, flask, bottle, or whatever else... the framework won't be doing much for you in this case save the one-off routing).
Note that the number of free machine hours for manual scaling modules is limited to 8 hours per day (for the smaller, B1 instance class; proportionally fewer for larger instance classes), so you may need to upgrade to paid-app status if you need to run for more than 8 hours.
Like #brant said, App Engine is designed to handle HTTP requests. It's not a perfect fit for background jobs, unless you try to wrap your logic into one http request.
Further, App Engine will emit an error when the response timeout, depending on your scaling settings. If you want to try it, consider basic or manual scaling.
For this type of workload, I would suggest you use a VM.
I think there are a few problems with this design.
First, App Engine is designed to be an HTTP request processor, not a RabbitMQ message processor. GAE is intended for many small requests, not one long-running process.
Second, "RabbitMQ should not be exposed to the public internet, it wasn't created for such use case."
I would recommend that you keep the RabbitMQ clients on the same internal network as the RabbitMQ broker, and have the clients send HTTP requests to App Engine.

Using 1 intance of google-app-engine to monitor external service

I planning to create a NodeJS program, that work 24/7, that ping and make requests to an external server (outside of google cloud) every minute. Just to see that it the external services are are live.
If there is any error it will notify me by SMS & Email.
I don't need any front-end for this app, and no one needs to connect to it. Just simple NodeJS program.
The monitoring and configuration will be by texts files.
Now the questions:
It looks like it will cost me just $1.64. It sounds very cheap. Am I missing something?
It needs to work around the clock, I will request it to start it once, and it need to continue working, (by using setInterval). Is it will be aborted?
What it is exactly mean buy 1 instance. What an instance can do? Only respond to one request or what?
I tried to search in Google: appengine timeout, but didn't found anything that helps.
Free Quota
If you write your application in Python, PHP, Go or Java it can fit in free usage quota:
https://cloud.google.com/appengine/docs/quotas
So there will be absolutely no costs to run it on Google App Engine platform.
There are limit of 657,000 UrlFetch API Calls per day (more than 450 calls per minute in 24/7 mode) for free apps. 4GB traffic may also be sufficient for this kind of work.
Keep in mind there is no SMS sending services provided by Google App Engine and you will need to spend additional UrlFetch API calls to use external SMS services.
Email sending is also limited to 100 Emails per day (or 5000 Emails to admin address), so try not so send repeated notifications about same monitored server every minute, or you'll deplete your Email quote in 1.5 hours.
Scheduled Tasks
There is no way to run single process indefinitely without interruption on App Engine. But you don't have to!
You'll need to encapsulate all the work you're planning to execute in every iteration into single task and then schedule it to run every minute with Cron. See this documentation for Python: https://cloud.google.com/appengine/docs/python/config/cron
It is recommended to have some configuration page where you can set some internal configuration or see monitoring statistics, at least manage flag to temporarily pause tasks execution without redeploying your app.

getLastAccessedTime() in GWT sessions returns 0

I'm trying to add timeouts to GWT sessions, by using the following code to check if a session is alive:
public boolean isSessionAlive() {
return System.currentTimeMillis() - getThreadLocalRequest().getSession()
.getLastAccessedTime() < timeout;
}
I based this code on many examples I saw on web for GWT sessions, such as this.
The above code works great while running on a local web server, but after deploying the project to App Engine it doesn't. The following always returns 0 on App Engine:
getThreadLocalRequest().getSession().getLastAccessedTime()
As far as I understand, the last accessed time is updated on each RPC call.
I made several calls, but this value still remains zero and incorrect result is returned.
Does anybody know how to fix this issue?
Things will change after deployed on GAE
Just today attended the session on app engine by #roman irani .
remember that App Engine is a distributed architecture so a difference from Java EE is that you are never guaranteed the same application server instance during request processing as the previous request. While the object is being serialized correctly in memcache, you still have to call setAttribute() every time due to the fact that memory is not shared.
Clear cut picture here to handle the session
I have found a workaround. Adding the following code in war/WEB-INF/web.xml will cause the session to expire after 30 minutes:
<!-- timeout in minutes -->
<session-config>
<session-timeout>30</session-timeout>
</session-config>
Reference: Session Timeouts with GWT RPC calls.

Resources