I have used lxml on Google App Engine to scrape some basic data.
It works fine with the SDK, but when I try to use it on the App Engine servers I get:
IOError: Error reading file 'http://www.google.com': failed to load external entity "http://www.google.com"
My code looks like:
import lxml.html
url = "http://www.google.com"
t = lxml.html.parse(url)
pagetitle = t.find(".//title").text
self.response.out.write(pagetitle)
Edit: I ended up having to make a small change to handle this, as outlined in the answer below.
from google.appengine.api import urlfetch
result = urlfetch.fetch(url)
t = lxml.html.fromstring(result.content)
GAE does not support opening sockets; use urlfetch.fetch() to get the page contents, then feed them to the parser.
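A minimal sketch of that pattern (the helper name `extract_title` is my own; the urlfetch call is shown as a comment because it only runs inside the App Engine sandbox):

```python
import lxml.html

def extract_title(html):
    # Parse from a string/bytes rather than a URL; lxml.html.parse(url)
    # needs to open a socket, which the GAE sandbox blocks.
    tree = lxml.html.fromstring(html)
    return tree.findtext('.//title')

# Inside a GAE request handler, fetch the page with urlfetch first:
#   from google.appengine.api import urlfetch
#   result = urlfetch.fetch('http://www.google.com')
#   pagetitle = extract_title(result.content)
```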
I am trying to add a custom domain mapping to my App Engine app using the Google API (not through the console). However, I keep getting a 403 Forbidden error when the HTTP request is made with the Discovery API Client. I have obtained a credentials JSON file from App Engine with owner permissions and I point to it with the GOOGLE_APPLICATION_CREDENTIALS env variable. Since I have full permissions, I'm guessing the problem is that I'm not using the API correctly, but I haven't been able to see what is missing.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.dirname(__file__) + str('/json_cred_file')
apps_client = googleapiclient.discovery.build('appengine', 'v1beta')
response = apps_client.apps().domainMappings().create(
    appsId='apps/myappname',
    body=json.loads(
        '{"id": "newsubdomain.mydomain.com", "sslSettings": '
        '{"sslManagementType": "AUTOMATIC"}}')).execute()
Here is the error:
WARNING 2018-07-06 23:51:09,331 http.py:119] Encountered 403 Forbidden with reason "forbidden"
I contacted Google support; the issue is that when using the domain mapping function, the service account needs to be added as an owner in the Search Console: https://www.google.com/webmasters/tools/home
They have a special page in their docs for using this library on App Engine: https://developers.google.com/api-client-library/python/guide/google_app_engine
This is how I use the googleapiclient library. One difference I see is this line:
credentials = GoogleCredentials.get_application_default()
from oauth2client.client import GoogleCredentials
from google.appengine.api import app_identity
from lib.googleapiclient import discovery

class DataFlowJobsListHandler(AdminResourceHandler):
    def get(self, resource_id=None):
        """
        Wrapper to this:
        https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/list
        """
        if resource_id:
            self.abort(405)
        else:
            credentials = GoogleCredentials.get_application_default()
            service = discovery.build('dataflow', 'v1b3', credentials=credentials)
            project_id = app_identity.get_application_id()
            _filter = self.request.GET.pop('filter', 'UNKNOWN').upper()
            jobs_list_request = service.projects().jobs().list(
                projectId=project_id,
                filter=_filter)  # e.g. 'ACTIVE'
            jobs_list = jobs_list_request.execute()
            return {
                '$cursor': None,
                'results': jobs_list.get('jobs', []),
            }
I am getting the following error when I run "goapp serve myapp/" from the MyProject folder inside src.
go-app-builder: Failed parsing input: parser: bad import "unsafe" in
github.com/gorilla/websocket/client.go from GOPATH
My file structure is something like this:
$GOPATH/src
 |- github.com/gorilla/websocket
 |- MyProject
     |- myapp
         |- app.yaml
         |- mainApp.go (which contains the init function and is part of the app package)
Please let me know how to correct this.
On a related subject, I read that Google App Engine provides websocket support only for paid apps. Is there a way for me to test my website before enabling billing? Or is there a better option than Google App Engine?
Could someone help me access Big Query from an App Engine application?
I have completed the following steps -
Created an App Engine project.
Installed google-api-client, oauth2client dependencies (etc) into /lib.
Enabled the Big Query API for the App Engine project via the cloud console.
Created some 'Application Default Credentials' (a 'Service Account Key') [JSON] and saved it/them to the root of the App Engine application.
Created a 'Big Query Service Resource' as per the following -
def get_bigquery_service():
    from googleapiclient.discovery import build
    from oauth2client.client import GoogleCredentials
    credentials = GoogleCredentials.get_application_default()
    bigquery_service = build('bigquery', 'v2', credentials=credentials)
    return bigquery_service
Verified that the resource exists -
<googleapiclient.discovery.Resource object at 0x7fe758496090>
Tried to query the resource with the following (ProjectId is the short name of the App Engine application) -
bigquery = get_bigquery_service()
bigquery.tables().list(projectId=#{ProjectId},
                       datasetId=#{DatasetId}).execute()
Returns the following -
<HttpError 401 when requesting https://www.googleapis.com/bigquery/v2/projects/#{ProjectId}/datasets/#{DatasetId}/tables?alt=json returned "Invalid Credentials">
Any ideas as to which steps I might have wrong or be missing here? The whole auth process seems a nightmare, quite at odds with the App Engine/PaaS ease-of-use ethos :-(
Thank you.
OK so despite being a Google Cloud fan in general, this is definitely the worst thing I have been unfortunate enough to have to work on in a while. Poor/inconsistent/nonexistent documentation, complexity, bugs etc. Avoid if you can!
1) Ensure your App Engine 'Default Service Account' exists
https://console.cloud.google.com/apis/dashboard?project=XXX&duration=PTH1
You get the option to create the Default Service Account only if it doesn't already exist. If you've deleted it by accident you will need a new project; you can't recreate it.
How to recover Google App Engine's "default service account"
You should probably create the default set of JSON credentials, but you won't need to include them as part of your project.
You shouldn't need to create any other Service Accounts, for Big Query or otherwise.
2) Install google-api-python-client and apply fix
pip install -t lib google-api-python-client
Assuming this installs oauth2client 3.0.x, then on testing you'll get the following complaint:
File "~/oauth2client/client.py", line 1392, in _get_well_known_file
default_config_dir = os.path.join(os.path.expanduser('~'),
File "/usr/lib/python2.7/posixpath.py", line 268, in expanduser
import pwd
File "~/google_appengine-1.9.40/google/appengine/tools/devappserver2/python/sandbox.py", line 963, in load_module
raise ImportError('No module named %s' % fullname)
ImportError: No module named pwd
which you can fix by changing ~/oauth2client/client.py [line 1392] from:
os.path.expanduser('~')
to:
os.getenv('HOME')
and adding the following to app.yaml:
env_variables:
  HOME: '/tmp'
Ugly but works.
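If you'd rather not edit a file under lib/, a monkey-patch at app startup achieves the same effect. This is my own workaround, not part of the original fix: it shadows os.path.expanduser so that '~' resolves via the HOME env variable (set in app.yaml) instead of the pwd module, which the sandbox blocks.

```python
import os
import os.path

# Hypothetical alternative (my own workaround, not the original answer's):
# monkey-patch os.path.expanduser at startup instead of editing the
# oauth2client source under lib/.
_real_expanduser = os.path.expanduser

def _sandbox_expanduser(path):
    # The dev sandbox blocks the pwd module that expanduser('~') needs,
    # so fall back to the HOME env variable set in app.yaml.
    # (Note: this simple version also catches '~user' paths.)
    if path.startswith('~'):
        return os.environ.get('HOME', '/tmp') + path[1:]
    return _real_expanduser(path)

os.path.expanduser = _sandbox_expanduser
```

Run this once at import time (e.g. in appengine_config.py) before oauth2client is loaded.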
3) Download GCloud SDK and login from console
https://cloud.google.com/sdk/
gcloud auth login
The issue here is that App Engine's dev_appserver.py doesn't include any Big Query replication (natch), so when you're interacting with Big Query tables it's the production data you're playing with; you need to log in to get access.
Obvious in retrospect, but poorly documented.
4) Enable Big Query API in App Engine console; create a Big Query ProjectID
https://console.cloud.google.com/apis/dashboard?project=XXX&duration=PTH1
https://bigquery.cloud.google.com/welcome/XXX
5) Test
from oauth2client.client import GoogleCredentials
credentials=GoogleCredentials.get_application_default()
from googleapiclient.discovery import build
bigquery=build('bigquery', 'v2', credentials=credentials)
print bigquery.datasets().list(projectId=#{ProjectId}).execute()
[or similar]
Good luck!
I need to serve PDF files stored in Google Cloud Storage.
I tried:
from google.appengine.api import blobstore
from google.appengine.api import images
bkey = blobstore.create_gs_key('/gs/bucket/pdfobject')
url = images.get_serving_url(bkey)
This works well in the dev server, but when deployed to GAE, gives this error:
get_serving_url_hook\n raise _ToImagesError(e, readable_blob_key)\n', 'TransformationError\n']
I also tried:
url = images.get_serving_url(blob_key=None, secure_url=True, filename='/gs/bucket/pdfobject')
Same problem. Works in dev but TransformationError when deployed.
I'm new to Google App Engine and Python. I've almost completed a project, but can't get the get_serving_url() function to work. I've stripped everything down to the most basic functionality, following the documentation. And yet I still get a 500 error from the server. Any thoughts? Here is the code:
from google.appengine.api import images
....
class Team(db.Model):
    avatar = db.BlobProperty()
    ....
    def to_dict(self):
        ....
        image_url = images.get_serving_url(self.avatar.key())
The last line is the problem...commenting it out makes the app run fine. But it is copied almost directly from the documentation. I should note that I can download the avatar blob directly with:
class GetTeamAvatar(webapp2.RequestHandler):
    def post(self):
        team_id = self.request.get('team_id')
        team = Team.get_by_id(long(team_id))
        self.response.write(team.avatar)
So I know it is stored correctly. I do not have PIL on my machine... is that the issue? The Images API docs say PIL is available locally, so if I'm deploying my app it shouldn't matter, right? I have Python 3.3, and apparently PIL stopped at 2.6.
The Python App Engine runtime is 2.7 (OK, and 2.5), so don't even try to work with 3.x.
Secondly, get_serving_url() is a method you call with a Blobstore entity key, not a BlobProperty.
You are confusing two different things here.
I would concentrate on getting your code to run locally correctly under 2.7 first, and PIL is available for 2.7.
I'm very impressed if you're trying to deploy your app without even testing it locally.
One thing you'll need to do is make PIL available in your app.yaml via the libraries attribute.
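For reference, the app.yaml declaration would look something like this (a sketch; check the runtime docs for the currently supported version string):

```yaml
libraries:
- name: PIL
  version: latest
```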