Fetching url in python with google app engine - google-app-engine

I'm using this code to send an HTTP request from my app and then show the result:
def get(self):
    url = "http://www.google.com/"
    try:
        result = urllib2.urlopen(url)
        self.response.out.write(result)
    except urllib2.URLError, e:
        pass  # error handling omitted in the original post
I expect to get the HTML of the google.com page, but I only get the character ">". What is wrong with that?

Try using the urlfetch service instead of urllib2:
Import urlfetch:
from google.appengine.api import urlfetch
And this in your request handler:
def get(self):
    try:
        url = "http://www.google.com/"
        result = urlfetch.fetch(url)
        if result.status_code == 200:
            self.response.out.write(result.content)
        else:
            self.response.out.write("Error: " + str(result.status_code))
    except urlfetch.InvalidURLError:
        self.response.out.write("URL is an empty string or obviously invalid")
    except urlfetch.DownloadError:
        self.response.out.write("Server cannot be contacted")
See this document for more detail.

You need to call the read() method to read the response body. It is also good practice to check the HTTP status and to close the response when you're done.
Example:
url = "http://www.google.com/"
try:
    response = urllib2.urlopen(url)
    try:
        if response.code == 200:
            html = response.read()
            self.response.out.write(html)
        else:
            pass  # handle the non-200 status here
    finally:
        # close the response whether or not the read succeeded
        response.close()
except urllib2.URLError, e:
    pass  # handle connection errors here
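For anyone reading this on Python 3, where urllib2 became urllib.request, the same read/check/close pattern might look roughly like the sketch below. The helper name fetch_html and its opener parameter are my own assumptions for illustration (the opener argument only exists so the function can be exercised without network access), and none of this is App Engine-specific.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_html(url, opener=urlopen, timeout=10):
    """Return the decoded response body, or None if the fetch fails.

    `opener` is a hypothetical injection point; by default it is the
    standard urllib.request.urlopen.
    """
    try:
        # The with-block closes the response even if read() raises.
        with opener(url, timeout=timeout) as response:
            if response.status != 200:
                return None
            # read() returns bytes; writing the response object itself
            # (as in the question) never touches the body.
            return response.read().decode("utf-8", errors="replace")
    except URLError:
        # Covers connection failures and HTTP error statuses raised by urlopen.
        return None
```

Calling fetch_html("http://www.google.com/") would return the page source as a string, or None when the server cannot be reached.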

Related

Flask app is being run locally instead of on heroku

So I've deployed my Flask app with a React front end to Heroku, but there seems to be a problem where Flask is running on my localhost instead of on the Heroku server.
I've read tons of Stack Overflow posts on this, but found no resolution. Here is my Flask code:
from flask import Flask, request
import flask
from flask_sqlalchemy import SQLAlchemy
from datetime import datetime
from flask_cors import CORS
app = Flask(__name__,static_folder="./build",static_url_path="/")
app.config['SQLALCHEMY_DATABASE_URI'] = 'my database url'
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
app.secret_key = 'secret string'
CORS(app)
db = SQLAlchemy(app)
class Feature_votes(db.Model):
    feature = db.Column(db.String(500), primary_key=True)
    votes = db.Column(db.Integer, nullable=False)
    date = db.Column(db.DateTime, nullable=False)
    def __init__(self, feature, votes, date):
        self.feature = feature
        self.votes = votes
        self.date = date
# Serve the react app
@app.route("/")
def index():
    return app.send_static_file("index.html")
# Retrieve currently polled features from Feature_votes
@app.route("/getVotes", methods=['GET'])
def getVotes():
    rows = Feature_votes.query.filter().order_by(Feature_votes.date)
    response = []
    for row in rows:
        response.append(
            {"feature": row.feature,
             "votes": row.votes
            })
    return flask.jsonify(response)
# Add a new feature to the db with votes set to 0
@app.route("/featureAdd", methods=['POST'])
def featureAdd():
    feature = request.get_json()["feature"]
    featureEntry = Feature_votes(feature, 0, datetime.utcnow())
    db.session.add(featureEntry)
    db.session.commit()
    response = {"feature": featureEntry.feature,
                "votes": 0,
                "date": featureEntry.date
               }
    return response
@app.route("/featureModifyVotes", methods=['POST'])
def featureUnvote():
    feature = request.get_json()["feature"]
    direction = request.get_json()["direction"]
    featureEntry = Feature_votes.query.filter_by(feature=feature).first()
    if (direction == "increase"):
        featureEntry.votes += 1
    else:
        featureEntry.votes -= 1
    db.session.commit()
    response = {featureEntry.feature: featureEntry.votes}
    return response
if __name__ == '__main__':
    app.run()
And here is my Procfile:
web: gunicorn --bind 0.0.0.0:$PORT server:app
Also, here is a snip I took from inspect element to show that this request is being served locally.
I am relatively new to web development, so it is possible I made a lot of mistakes. Please let me know if you can help or need any more info from me. Thanks.
So apparently the screenshot I posted in the question didn't mean that my server was running on localhost, but rather that my request was being made to localhost. It turns out I had fetch("http://localhost...) in my build files. After switching to a relative path, rebuilding, and pushing to Heroku, everything is working.
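To see why the relative path fixes this, it may help to look at how a URL is resolved against the origin that served the page. The sketch below uses Python's urljoin as a stand-in for what the browser does with fetch(); the two origins are hypothetical placeholders (port 5000 is just Flask's usual development default), not values from the question.

```python
from urllib.parse import urljoin

# Hypothetical origins, for illustration only.
LOCAL_ORIGIN = "http://localhost:5000/"
DEPLOYED_ORIGIN = "https://example-app.herokuapp.com/"


def resolve(page_origin, request_path):
    """Resolve a request path the way a browser resolves it against the page."""
    return urljoin(page_origin, request_path)


# A relative path follows whichever server delivered the page.
print(resolve(LOCAL_ORIGIN, "/getVotes"))
print(resolve(DEPLOYED_ORIGIN, "/getVotes"))

# A hard-coded absolute URL ignores the page origin entirely, so the
# deployed front end keeps calling the developer's own machine.
print(resolve(DEPLOYED_ORIGIN, "http://localhost:5000/getVotes"))
```

With the relative path, the same build works both locally and on the deployed host without changes.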

How do you get the current request in aiohttp outside of a handler?

I have an aiohttp app that has some endpoints created using nested apps.
My use case: once the request is processed, I want to return the result not as a web.Response but in whatever format the client asked for in the request header (CSV, JSON, HTML, etc.).
So I was using a decorator and, from the decorator's wrapper, trying to get the current request in order to read the requested format and convert the response to that type.
My question is: how can I get the current request's context? I know there isn't anything like Flask's current_app, so what's the best way of doing what I want?
The code below illustrates the above:
@subapp_routes.get('')
@subapp_routes.get('/{c_id}')
@format_output
async def index(request):
    print(request)
    c_id = request.match_info.get('c_id', None)
    return await get_index(c_id)
def format_data_object(data):
    status = 200
    # HOW TO PASS THE CURRENT request HERE? CURRENTLY IT ISN'T RECOGNIZED.
    # I tried aiohttp.request and aiohttp.web.request, but neither is recognized, so I'm not sure now.
    mime = _most_acceptable_format(request, data)
    if mime == MIME_DATAFRAME:
        return _render_dataframe(data, status)
    elif mime == MIME_CSV:
        return _render_csv(data, status)
    elif mime == MIME_JSON:
        return _render_json(data, status)
    elif mime == MIME_HTML:
        return _render_html(data, status)
    raise InvalidRequest('unrecognized format: "%s"' % mime)
def format_output(function):
    """
    Output format decorator.
    """
    @wraps(function)
    def wrapper(*args, **kwargs):
        try:
            data = function(*args, **kwargs)
            return format_data_object(data)
        except Exception as ex:
            return handle_error(ex)
    return wrapper
The Flask way spoils people.
If you need the entire request, a DB connection, or some other resource, explicitly pass it into the called function.
This is an obvious and elegant approach that doesn't require any implicit context-namespace magic.
Please leave things like thread-local variables to system tools; user code should avoid them for the sake of simplicity and readability.
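A minimal sketch of that advice, written without aiohttp so it runs on its own: the wrapper receives the request as its first argument (as aiohttp handlers do) and simply hands it on to the formatter. Note that the wrapper also has to be async and await the handler, which the decorator in the question does not do. FakeRequest and the simplified format_data_object are stand-ins I made up for illustration, not the asker's real helpers.

```python
import asyncio
import json
from functools import wraps


def format_data_object(request, data):
    """Pick an output format from the request's Accept header (simplified)."""
    accept = request.headers.get("Accept", "application/json")
    if "text/csv" in accept:
        return "\n".join(",".join(str(value) for value in row) for row in data)
    return json.dumps(data)


def format_output(handler):
    """Decorator that formats the handler's result for the current request."""
    @wraps(handler)
    async def wrapper(request, *args, **kwargs):
        # The handler is a coroutine function, so it must be awaited.
        data = await handler(request, *args, **kwargs)
        # The request is passed explicitly; no global context is needed.
        return format_data_object(request, data)
    return wrapper


class FakeRequest:
    """Hypothetical stand-in for aiohttp's request, exposing only headers."""
    def __init__(self, headers):
        self.headers = headers


@format_output
async def index(request):
    return [[1, 2], [3, 4]]


print(asyncio.run(index(FakeRequest({"Accept": "text/csv"}))))
```

In a real aiohttp app the formatter would return a web.Response with the matching content type rather than a bare string.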

GAE webapp2 Access-Control-Allow-Origin Error

I get this error while using webapp2 user authentication: No 'Access-Control-Allow-Origin' header is present on the requested resource.
How do I add the access-control header before the redirect?
Redirect code:
self.redirect(users.create_login_url(self.request.uri))
Code:
class Authenticate(webapp2.RequestHandler):
    def get(self):
        user = users.get_current_user()
        cookie_value = self.request.cookies.get('user')
        if user==cookie_value and user!=None:
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.write('Success')
        else:
            self.request.headers['Access-Control-Allow-Origin'] = '*'
            self.redirect(users.create_login_url(self.request.uri))
webapp2 has two redirect functions, the RequestHandler method:
def redirect(self, uri, permanent=False, abort=False, code=None, body=None)
and the module-level function:
def redirect(uri, permanent=False, abort=False, code=None,
             body=None, request=None, response=None)
So, I think you should call the second one if you want to pass the response object:
return webapp2.redirect(users.create_login_url(self.request.uri), True, False, None,
                        None, None, self.response)
Refer to the webapp2 source code.
Before the self.redirect call, do
self.response.headers['Access-Control-Allow-Origin'] = '*'
or whatever value you wish to have for that header. Note that it is set on self.response, not on self.request as in the question. The module-level redirect makes a new response object by default, but you can change that by passing the existing one:
return webapp2.redirect(users.create_login_url(self.request.uri),
                        response=self.response)

url fetch too many repeated redirects

I am trying to load a url and I get this error:
DownloadError: ApplicationError: 2 Too many repeated redirects
This is the code I am using:
headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; de-at) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1' }
url = "http://www.cafebonappetit.com/menu/your-cafe/collins-cmc/cafes/details/50/collins-bistro"
cmcHTM = urlfetch.fetch(url=url)
cmcHTML = str(cmcHTM.content)
I checked this website's redirects at http://www.internetofficer.com/seo-tool/redirect-check/
and found that the site redirects to itself! So URL Fetch seems to be going in circles trying to load this page.
Meanwhile, this page loads just fine in my browser.
So I tried using this code:
cmcHTM = urlfetch.fetch(url=url,
                        follow_redirects=False,
                        deadline=100
                        )
This just returns nothing, though. Is there any way of getting this HTML?
Sorry for the delayed response. I found this that worked:
import urllib, urllib2, Cookie
from google.appengine.api import urlfetch
class URLOpener:
    def __init__(self):
        self.cookie = Cookie.SimpleCookie()
    def open(self, url, data = None):
        if data is None:
            method = urlfetch.GET
        else:
            method = urlfetch.POST
        while url is not None:
            response = urlfetch.fetch(url=url,
                                      payload=data,
                                      method=method,
                                      headers=self._getHeaders(self.cookie),
                                      allow_truncated=False,
                                      follow_redirects=False,
                                      deadline=10
                                      )
            data = None # Next request will be a get, so no need to send the data again.
            method = urlfetch.GET
            self.cookie.load(response.headers.get('set-cookie', '')) # Load the cookies from the response
            url = response.headers.get('location')
        return response
    def _getHeaders(self, cookie):
        headers = {
            'Host' : 'www.google.com',
            'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)',
            'Cookie' : self._makeCookieHeader(cookie)
        }
        return headers
    def _makeCookieHeader(self, cookie):
        cookieHeader = ""
        for value in cookie.values():
            cookieHeader += "%s=%s; " % (value.key, value.value)
        return cookieHeader
I guess the key is the while loop, which follows the redirects based on the Location header of each response while sending back the cookies the site sets.
I think this is a problem in the site, not in your code. The site seems designed so it does a redirect to itself when it doesn't detect some header that is customarily sent by a browser. E.g. when I try accessing it with curl I get an empty body with a 302 redirect to itself, but in the browser I get a page. You'd have to ask the site owner what they are checking for...
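Since the site apparently redirects until it sees something it expects (a cookie, in the case of the working answer), the manual loop is worth capping so it cannot spin forever. Below is a rough Python 3 sketch of that idea with the HTTP call abstracted away: the fetch argument is a hypothetical callable returning (status, headers, body) with lowercase header names, standing in for urlfetch.fetch with follow_redirects=False. The function name and the max_hops parameter are my own.

```python
from http.cookies import SimpleCookie
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url, fetch, max_hops=5):
    """Follow redirects by hand, replaying cookies set along the way.

    `fetch(url, headers)` is an assumed callable returning
    (status, headers, body); header names are assumed to be lowercase.
    """
    cookies = SimpleCookie()
    for _ in range(max_hops):
        cookie_header = "; ".join(
            "%s=%s" % (morsel.key, morsel.value) for morsel in cookies.values()
        )
        status, headers, body = fetch(url, {"Cookie": cookie_header})
        if "set-cookie" in headers:
            # Remember cookies so the next hop looks like a returning browser.
            cookies.load(headers["set-cookie"])
        location = headers.get("location")
        if status not in REDIRECT_CODES or not location:
            return status, body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("gave up after %d redirects" % max_hops)
```

The hop limit is the main difference from the while loop above: a site that never stops redirecting produces an error instead of burning the request deadline.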

Google App Engine - Secure Cookies

I'd been searching for a way to do cookie based authentication/sessions in Google App Engine because I don't like the idea of memcache based sessions, and I also don't like the idea of forcing users to create google accounts just to use a website. I stumbled across someone's posting that mentioned some signed cookie functions from the Tornado framework and it looks like what I need. What I have in mind is storing a user's id in a tamper proof cookie, and maybe using a decorator for the request handlers to test the authentication status of the user, and as a side benefit the user id will be available to the request handler for datastore work and such. The concept would be similar to forms authentication in ASP.NET. This code comes from the web.py module of the Tornado framework.
According to the docstrings, it "Signs and timestamps a cookie so it cannot be forged" and
"Returns the given signed cookie if it validates, or None."
I've tried to use it in an App Engine Project, but I don't understand the nuances of trying to get these methods to work in the context of the request handler. Can someone show me the right way to do this without losing the functionality that the FriendFeed developers put into it? The set_secure_cookie, and get_secure_cookie portions are the most important, but it would be nice to be able to use the other methods as well.
#!/usr/bin/env python
import Cookie
import base64
import time
import hashlib
import hmac
import datetime
import re
import calendar
import email.utils
import logging
def _utf8(s):
    if isinstance(s, unicode):
        return s.encode("utf-8")
    assert isinstance(s, str)
    return s
def _unicode(s):
    if isinstance(s, str):
        try:
            return s.decode("utf-8")
        except UnicodeDecodeError:
            raise HTTPError(400, "Non-utf8 argument")
    assert isinstance(s, unicode)
    return s
def _time_independent_equals(a, b):
    if len(a) != len(b):
        return False
    result = 0
    for x, y in zip(a, b):
        result |= ord(x) ^ ord(y)
    return result == 0
def cookies(self):
    """A dictionary of Cookie.Morsel objects."""
    if not hasattr(self,"_cookies"):
        self._cookies = Cookie.BaseCookie()
        if "Cookie" in self.request.headers:
            try:
                self._cookies.load(self.request.headers["Cookie"])
            except:
                self.clear_all_cookies()
    return self._cookies
def _cookie_signature(self,*parts):
    self.require_setting("cookie_secret","secure cookies")
    hash = hmac.new(self.application.settings["cookie_secret"],
                    digestmod=hashlib.sha1)
    for part in parts:
        hash.update(part)
    return hash.hexdigest()
def get_cookie(self,name,default=None):
    """Gets the value of the cookie with the given name,else default."""
    if name in self.cookies:
        return self.cookies[name].value
    return default
def set_cookie(self,name,value,domain=None,expires=None,path="/",
               expires_days=None):
    """Sets the given cookie name/value with the given options."""
    name = _utf8(name)
    value = _utf8(value)
    if re.search(r"[\x00-\x20]",name + value):
        # Don't let us accidentally inject bad stuff
        raise ValueError("Invalid cookie %r:%r" % (name,value))
    if not hasattr(self,"_new_cookies"):
        self._new_cookies = []
    new_cookie = Cookie.BaseCookie()
    self._new_cookies.append(new_cookie)
    new_cookie[name] = value
    if domain:
        new_cookie[name]["domain"] = domain
    if expires_days is not None and not expires:
        expires = datetime.datetime.utcnow() + datetime.timedelta(
            days=expires_days)
    if expires:
        timestamp = calendar.timegm(expires.utctimetuple())
        new_cookie[name]["expires"] = email.utils.formatdate(
            timestamp,localtime=False,usegmt=True)
    if path:
        new_cookie[name]["path"] = path
def clear_cookie(self,name,path="/",domain=None):
    """Deletes the cookie with the given name."""
    expires = datetime.datetime.utcnow() - datetime.timedelta(days=365)
    self.set_cookie(name,value="",path=path,expires=expires,
                    domain=domain)
def clear_all_cookies(self):
    """Deletes all the cookies the user sent with this request."""
    for name in self.cookies.iterkeys():
        self.clear_cookie(name)
def set_secure_cookie(self,name,value,expires_days=30,**kwargs):
    """Signs and timestamps a cookie so it cannot be forged"""
    timestamp = str(int(time.time()))
    value = base64.b64encode(value)
    signature = self._cookie_signature(name,value,timestamp)
    value = "|".join([value,timestamp,signature])
    self.set_cookie(name,value,expires_days=expires_days,**kwargs)
def get_secure_cookie(self,name,include_name=True,value=None):
    """Returns the given signed cookie if it validates,or None"""
    if value is None:
        value = self.get_cookie(name)
    if not value:
        return None
    parts = value.split("|")
    if len(parts) != 3:
        return None
    if include_name:
        signature = self._cookie_signature(name,parts[0],parts[1])
    else:
        signature = self._cookie_signature(parts[0],parts[1])
    if not _time_independent_equals(parts[2],signature):
        logging.warning("Invalid cookie signature %r",value)
        return None
    timestamp = int(parts[1])
    if timestamp < time.time() - 31 * 86400:
        logging.warning("Expired cookie %r",value)
        return None
    try:
        return base64.b64decode(parts[0])
    except:
        return None
uid=1234|1234567890|d32b9e9c67274fa062e2599fd659cc14
Parts:
1. uid is the name of the key
2. 1234 is your value (shown in clear here; the code above base64-encodes it before signing)
3. 1234567890 is the timestamp
4. d32b9e9c67274fa062e2599fd659cc14 is the signature made from the name, the value and the timestamp
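To make the value|timestamp|signature layout concrete, here is a small standalone Python 3 sketch of signing and verifying such a cookie value. It follows the structure of the Tornado excerpt but is not Tornado's code: the function names are mine, the secret is passed in explicitly, the now parameter exists only to keep the example deterministic, and I swapped SHA-1 for SHA-256 and the hand-rolled comparison for hmac.compare_digest.

```python
import base64
import hashlib
import hmac
import time


def _signature(secret, *parts):
    """HMAC over the given string parts, keyed with the bytes secret."""
    mac = hmac.new(secret, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part.encode("utf-8"))
    return mac.hexdigest()


def sign_cookie(secret, name, value, now=None):
    """Return 'base64(value)|timestamp|signature' for the given cookie."""
    timestamp = str(int(time.time() if now is None else now))
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return "|".join([encoded, timestamp, _signature(secret, name, encoded, timestamp)])


def verify_cookie(secret, name, cookie, max_age_days=31, now=None):
    """Return the original value if the cookie validates, else None."""
    parts = cookie.split("|")
    if len(parts) != 3:
        return None
    encoded, timestamp, signature = parts
    expected = _signature(secret, name, encoded, timestamp)
    # Constant-time comparison, so timing does not leak the signature.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    current = time.time() if now is None else now
    if int(timestamp) < current - max_age_days * 86400:
        return None
    try:
        return base64.b64decode(encoded).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
```

Because the cookie name is part of the signed data, a value signed for one cookie cannot be replayed under another name.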
Tornado was never meant to work with App Engine (it's "its own server" through and through). Why don't you pick instead some framework that was meant for App Engine from the word "go" and is lightweight and dandy, such as tipfy? It gives you authentication using its own user system or any of App Engine's own users, OpenID, OAuth, and Facebook; sessions with secure cookies or GAE datastore; and much more besides, all in a superbly lightweight "non-framework" approach based on WSGI and Werkzeug. What's not to like?!
For those who are still looking, we've extracted just the Tornado cookie implementation that you can use with App Engine at ThriveSmart. We're using it successfully on App Engine and will continue to keep it updated.
The cookie library itself is at:
http://github.com/thrivesmart/prayls/blob/master/prayls/lilcookies.py
You can see it in action in our example app that's included. If the structure of our repository ever changes, you can look for lilcookies.py within github.com/thrivesmart/prayls
I hope that's helpful to someone out there!
This works if anyone is interested:
from google.appengine.ext import webapp
import Cookie
import base64
import time
import hashlib
import hmac
import datetime
import re
import calendar
import email.utils
import logging
def _utf8(s):
    if isinstance(s, unicode):
        return s.encode("utf-8")
    assert isinstance(s, str)
    return s
def _unicode(s):
    if isinstance(s, str):
        try:
            return s.decode("utf-8")
        except UnicodeDecodeError:
            raise HTTPError(400, "Non-utf8 argument")
    assert isinstance(s, unicode)
    return s
def _time_independent_equals(a, b):
    if len(a) != len(b):
        return False
    result = 0
    for x, y in zip(a, b):
        result |= ord(x) ^ ord(y)
    return result == 0
class ExtendedRequestHandler(webapp.RequestHandler):
    """Extends the Google App Engine webapp.RequestHandler."""
    def clear_cookie(self,name,path="/",domain=None):
        """Deletes the cookie with the given name."""
        expires = datetime.datetime.utcnow() - datetime.timedelta(days=365)
        self.set_cookie(name,value="",path=path,expires=expires,
                        domain=domain)
    def clear_all_cookies(self):
        """Deletes all the cookies the user sent with this request."""
        for name in self.cookies.iterkeys():
            self.clear_cookie(name)
    # Tornado declares this as a property; clear_all_cookies relies on that.
    @property
    def cookies(self):
        """A dictionary of Cookie.Morsel objects."""
        if not hasattr(self,"_cookies"):
            self._cookies = Cookie.BaseCookie()
            if "Cookie" in self.request.headers:
                try:
                    self._cookies.load(self.request.headers["Cookie"])
                except:
                    self.clear_all_cookies()
        return self._cookies
    def _cookie_signature(self,*parts):
        """Hashes a string based on a pass-phrase."""
        hash = hmac.new("MySecretPhrase",digestmod=hashlib.sha1)
        for part in parts:
            hash.update(part)
        return hash.hexdigest()
    def get_cookie(self,name,default=None):
        """Gets the value of the cookie with the given name,else default."""
        if name in self.request.cookies:
            return self.request.cookies[name]
        return default
    def set_cookie(self,name,value,domain=None,expires=None,path="/",expires_days=None):
        """Sets the given cookie name/value with the given options."""
        name = _utf8(name)
        value = _utf8(value)
        if re.search(r"[\x00-\x20]",name + value): # Don't let us accidentally inject bad stuff
            raise ValueError("Invalid cookie %r:%r" % (name,value))
        new_cookie = Cookie.BaseCookie()
        new_cookie[name] = value
        if domain:
            new_cookie[name]["domain"] = domain
        if expires_days is not None and not expires:
            expires = datetime.datetime.utcnow() + datetime.timedelta(days=expires_days)
        if expires:
            timestamp = calendar.timegm(expires.utctimetuple())
            new_cookie[name]["expires"] = email.utils.formatdate(timestamp,localtime=False,usegmt=True)
        if path:
            new_cookie[name]["path"] = path
        for morsel in new_cookie.values():
            self.response.headers.add_header('Set-Cookie',morsel.OutputString(None))
    def set_secure_cookie(self,name,value,expires_days=30,**kwargs):
        """Signs and timestamps a cookie so it cannot be forged"""
        timestamp = str(int(time.time()))
        value = base64.b64encode(value)
        signature = self._cookie_signature(name,value,timestamp)
        value = "|".join([value,timestamp,signature])
        self.set_cookie(name,value,expires_days=expires_days,**kwargs)
    def get_secure_cookie(self,name,include_name=True,value=None):
        """Returns the given signed cookie if it validates,or None"""
        if value is None:
            value = self.get_cookie(name)
        if not value:
            return None
        parts = value.split("|")
        if len(parts) != 3:
            return None
        if include_name:
            signature = self._cookie_signature(name,parts[0],parts[1])
        else:
            signature = self._cookie_signature(parts[0],parts[1])
        if not _time_independent_equals(parts[2],signature):
            logging.warning("Invalid cookie signature %r",value)
            return None
        timestamp = int(parts[1])
        if timestamp < time.time() - 31 * 86400:
            logging.warning("Expired cookie %r",value)
            return None
        try:
            return base64.b64decode(parts[0])
        except:
            return None
It can be used like this:
class MyHandler(ExtendedRequestHandler):
    def get(self):
        self.set_cookie(name="MyCookie",value="NewValue",expires_days=10)
        self.set_secure_cookie(name="MySecureCookie",value="SecureValue",expires_days=10)
        value1 = self.get_cookie('MyCookie')
        value2 = self.get_secure_cookie('MySecureCookie')
If you only want to store the user's user ID in the cookie (presumably so you can look their record up in the datastore), you don't need 'secure' or tamper-proof cookies - you just need a namespace that's big enough to make guessing user IDs impractical - eg, GUIDs, or other random data.
One pre-made option for this, which uses the datastore for session storage, is Beaker. Alternately, you could handle this yourself with set-cookie/cookie headers, if you really just need to store their user ID.
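As a rough illustration of the "big enough namespace" idea, the sketch below hands out an unguessable random token and keeps the token-to-user mapping on the server. It is plain Python 3; the in-memory dict is only a stand-in for the datastore, and the function names are made up for this example.

```python
import secrets

# In-memory stand-in for server-side storage such as the datastore.
_sessions = {}


def create_session(user_id):
    """Create a random session token for the user and remember the mapping."""
    # 32 random bytes gives far too many possibilities to guess in practice.
    token = secrets.token_urlsafe(32)
    _sessions[token] = user_id
    return token


def lookup_session(token):
    """Return the user ID for a token, or None if the token is unknown."""
    return _sessions.get(token)
```

The token goes into an ordinary cookie; since it carries no meaning by itself, there is nothing to tamper with, only something to guess.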
Someone recently extracted the authentication and session code from Tornado and created a new library specifically for GAE.
Perhaps this is more than you need, but since they did it specifically for GAE you shouldn't have to worry about adapting it yourself.
Their library is called gaema. Here is their announcement in the GAE Python group on 4 Mar 2010:
http://groups.google.com/group/google-appengine-python/browse_thread/thread/d2d6c597d66ecad3/06c6dc49cb8eca0c?lnk=gst&q=tornado#06c6dc49cb8eca0c

Resources