PyCharm breakpoints on Thread for Google App Engine

The PyCharm Professional debugger has an issue when working with Google App Engine: breakpoints do not work in threads launched manually by my code.
This affects the ability to debug code running on dev_appserver that uses threading.Thread or the concurrent.futures 2.7 backport's ThreadPoolExecutor (which is now officially supported by GAE).
The issue occurs both in PyCharm 2017.2 and 2017.3.4.
Observed on GAE SDK 1.9.66 on Ubuntu Linux.
Here is repro code; call it from any request handler:
import logging
import time
from concurrent.futures import ThreadPoolExecutor, wait
from threading import Thread

def worker():
    logging.info("Worker")  # set breakpoint here
    time.sleep(3)

def call_this():  # call this from your request handler
    # pool threads: the breakpoint in worker is never hit
    tpe = ThreadPoolExecutor(max_workers=5)
    futures = [tpe.submit(worker) for i in range(10)]
    wait(futures)

    # plain threads: same problem
    threads = [Thread(target=worker) for i in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

A quick fix is to patch the patch_threads function in <pycharm-folder>/helpers/pydev/pydevd.py (around line 978) and add a settrace call for GAE's separate threading module:
def patch_threads(self):
    try:
        # not available in jython!
        import threading
        threading.settrace(self.trace_dispatch)  # for all future threads

        # dev_appserver hands application code its own bundled threading
        # module, so install the trace function on that one as well
        from google.appengine.dist27 import threading as gae_threading
        gae_threading.settrace(self.trace_dispatch)  # for all future threads
    except Exception:
        pass

    from _pydev_bundle.pydev_monkey import patch_thread_modules
    patch_thread_modules()
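To see why the extra settrace call is needed, it helps to check which threading module the sandbox actually hands application code. A small diagnostic sketch (run inside any request handler under dev_appserver; not part of the fix itself):

import logging
import threading

# Under dev_appserver this typically points into google/appengine/dist27/
# rather than the standard library, which is why calling settrace on the
# stdlib threading module alone never reaches these threads.
logging.info("threading module in use: %s", getattr(threading, '__file__', '?'))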

Related

Possibly memory leak in Pyramid on Appengine Flexible with MemoryStore

We're working on migrating our backend from App Engine Standard (and the webapp2 framework) to Flexible using Pyramid. We have a proof of concept of sorts running, seemingly without many issues. All it does in this early phase is take requests from a third party ("pings") and then kick off a task to another internal service to fetch some data. It connects to Google's MemoryStore to cache a user ID, indicating that we've already fetched that user's data (or attempted to) within the last 6 hours.
Speaking of 6 hours: it seems that every 6 hours or so, memory usage on the Flexible instance reaches a tipping point, then probably flushes, and all is fine again. The instance is set to have 512MB of memory, yet like clockwork this happens at around 800MB (some kind of grace usage? or maybe memory can't be set to under 1GB).
It's clear from how gradually it climbs that memory isn't being cleared as often as it perhaps should be.
When this happens, latency on the instance also spikes.
I'm not sure what's useful in debugging something like this so I'll try to show what I can.
Appengine YAML file:
runtime: custom
env: flex
service: ping

runtime_config:
  python_version: 3.7

manual_scaling:
  instances: 1

resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
Dockerfile (as custom-runtime Flexible needs it):
FROM gcr.io/google-appengine/python
RUN virtualenv /env
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
ADD . /app
RUN pip install -e .
CMD gunicorn -b :$PORT main:app
Why custom? I couldn't get this working under the default Python runtime; the pip install -e . was what appeared to be needed.
Then, in the root __init__.py I have:
from pyramid.config import Configurator
from externalping.memcache import CacheStore

cachestore = CacheStore()

def main(global_config, **settings):
    """This function returns a Pyramid WSGI application."""
    with Configurator(settings=settings) as config:
        config.include('.routes')
        config.scan()
    return config.make_wsgi_app()
Maybe having the connection to MemoryStore defined so early is the issue? Cachestore:
import json
import os

import redis

class CacheStore(object):
    redis_host = os.environ.get('REDISHOST', 'localhost')
    redis_port = int(os.environ.get('REDISPORT', 6379))
    client = None

    def __init__(self):
        self.client = redis.StrictRedis(host=self.redis_host, port=self.redis_port)

    def set_json(self, key, value):
        self.client.set(key, json.dumps(value))
        return True

    def get_json(self, key):
        return json.loads(self.client.get(key))
On the actual request itself, after importing it with from externalping import cachestore, I'm simply calling the methods shown above, e.g. cachestore.client.get(user['ownerId']).
As best I can tell, this does appear to be how Google's documentation says to implement it; the only real difference is that I put a wrapper around it.
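For context, a request handler using it looks roughly like this. This is a sketch, not code from the original post; the route name, payload shape, and cached value are illustrative assumptions:

from pyramid.view import view_config

from externalping import cachestore

@view_config(route_name='ping', renderer='json')  # hypothetical route
def handle_ping(request):
    user = request.json_body  # assumed to carry an 'ownerId' field
    if cachestore.client.get(user['ownerId']) is not None:
        # fetched (or attempted) within the last 6 hours; skip the task
        return {'status': 'cached'}
    cachestore.set_json(user['ownerId'], {'seen': True})
    # ... kick off the internal fetch task here ...
    return {'status': 'queued'}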

Consume SQS tasks from App Engine

I'm attempting to integrate with a third party that is posting messages on an Amazon SQS queue. I need my GAE backend to receive these messages.
Essentially, I want the following script to launch and always be running:
import boto3

sqs_client = boto3.client('sqs',
                          aws_access_key_id=KEY,
                          aws_secret_access_key=SECRET,
                          region_name=REGION)

while True:
    msgs_response = sqs_client.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=60)
    for message in msgs_response.get('Messages', []):
        deferred.defer(process_and_delete_message, message)
My main App Engine web app is on Automatic Scaling (with the 60-second and 10-minute task timeouts), but I'm thinking of setting up a micro-service on either Manual Scaling or Basic Scaling because:
Requests can run indefinitely. A manually-scaled instance can choose to handle /_ah/start and execute a program or script for many hours without returning an HTTP response code. Task queue tasks can run up to 24 hours.
https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine
Apparently both Manual and Basic Scaling also allow "Background Threads", but I am having a hard time finding documentation for them, and I'm thinking this may be a relic from the days before Backends were deprecated in favor of Modules (although I did find this: https://cloud.google.com/appengine/docs/standard/python/refdocs/modules/google/appengine/api/background_thread/background_thread#BackgroundThread).
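For reference, that API looks roughly like this; a sketch based on the linked refdocs (untested here, and only available on manual- or basic-scaling instances):

from google.appengine.api import background_thread

def poll_sqs():
    # long-lived loop; the thread outlives the request that started it
    while True:
        pass  # receive_message / dispatch logic would go here

# Launches a thread that is not tied to the lifetime of the current request
background_thread.start_new_background_thread(poll_sqs, [])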
Is Manual or Basic Scaling suited for this? If so, what should I use to listen on sqs_client.receive_message()? One thing I'm concerned about is this task/background thread dying and not relaunching itself.
This may be a possible solution:
Try to use a Google Compute Engine micro instance to run that script continuously and send a REST call to your app engine app. Easy Python Example For Compute Engine
OR:
I have used modules running instance type B2/B1 for long-running jobs and have never had any trouble, but those jobs do start and stop. I use basic scaling with max_instances set to 1; the jobs I run take around 6 hours to complete.
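A minimal sketch of that configuration (the module name and instance class here are illustrative):

module: longjobs
runtime: python27
api_version: 1
threadsafe: true
instance_class: B2

basic_scaling:
  max_instances: 1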
I ended up creating a manually-scaled App Engine standard micro-service for this. The micro-service has a handler for /_ah/start that never returns and runs indefinitely (many days at a time), and when it does get stopped, App Engine restarts it immediately.
Requests can run indefinitely. A manually-scaled instance can choose to handle /_ah/start and execute a program or script for many hours without returning an HTTP response code. Task queue tasks can run up to 24 hours.
https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine
My /_ah/start handler listens to the SQS queue and creates Push Queue tasks that my default service is set up to listen for.
I was looking into the Compute Engine route as well as the App Engine Flex route (which is essentially Compute Engine managed by App Engine), but there were other complexities, like not getting access to ndb and the taskqueue SDK, and I didn't have time to dive into that.
Below are all of the files for this micro-service; not included is my lib folder, which contains the source code for boto3 and some other libraries I needed.
I hope this is helpful for someone.
gaesqs.yaml:
application: my-project-id
module: gaesqs
version: dev
runtime: python27
api_version: 1
threadsafe: true

manual_scaling:
  instances: 1

env_variables:
  theme: 'default'
  GAE_USE_SOCKETS_HTTPLIB: 'true'

builtins:
- appstats: on #/_ah/stats/
- remote_api: on #/_ah/remote_api/
- deferred: on

handlers:
- url: /.*
  script: gaesqs_main.app

libraries:
- name: jinja2
  version: "2.6"
- name: webapp2
  version: "2.5.2"
- name: markupsafe
  version: "0.15"
- name: ssl
  version: "2.7.11"
- name: pycrypto
  version: "2.6"
- name: lxml
  version: latest
gaesqs_main.py:
#!/usr/bin/env python
import json
import logging

import appengine_config

try:
    # This is needed to make local development work with SSL.
    # See http://stackoverflow.com/a/24066819/500584
    # and https://code.google.com/p/googleappengine/issues/detail?id=9246 for more information.
    from google.appengine.tools.devappserver2.python import sandbox
    sandbox._WHITE_LIST_C_MODULES += ['_ssl', '_socket']
    import sys
    # this is socket.py copied from a standard python install
    from lib import stdlib_socket
    socket = sys.modules['socket'] = stdlib_socket
except ImportError:
    pass

import boto3
import os
import webapp2
from webapp2_extras.routes import RedirectRoute
from google.appengine.api import taskqueue

app = webapp2.WSGIApplication(debug=os.environ['SERVER_SOFTWARE'].startswith('Dev'))  #, config=webapp2_config)

KEY = "<MY-KEY>"
SECRET = "<MY-SECRET>"
REGION = "<MY-REGION>"
QUEUE_URL = "<MY-QUEUE_URL>"


def process_message(message_body):
    # Hand the SQS message off to the default service as a push-queue task
    queue = taskqueue.Queue('default')
    task = taskqueue.Task(
        url='/task/sqs-process/',
        countdown=0,
        target='default',
        params={'message': message_body})
    queue.add(task)


class Start(webapp2.RequestHandler):
    def get(self):
        logging.info("Start")
        for logger_name in ['boto3', 'botocore', 'nose', 's3transfer']:
            logger = logging.getLogger(logger_name)
            if logger:
                logger.setLevel(logging.WARNING)
        logging.info("boto3 loggers suppressed")
        sqs_client = boto3.client('sqs',
                                  aws_access_key_id=KEY,
                                  aws_secret_access_key=SECRET,
                                  region_name=REGION)
        # Long-poll SQS forever; this /_ah/start handler never returns
        while True:
            msgs_response = sqs_client.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)
            logging.info("msgs_response: %s" % msgs_response)
            for message in msgs_response.get('Messages', []):
                logging.info("message: %s" % message)
                process_message(message['Body'])
                sqs_client.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])


_routes = [
    RedirectRoute('/_ah/start', Start, name='start'),
]
for r in _routes:
    app.router.add(r)
appengine_config.py:
import os

from google.appengine.ext import vendor
from google.appengine.ext.appstats import recording

appstats_CALC_RPC_COSTS = True

# Add any libraries installed in the "lib" folder.
# Use pip with the -t lib flag to install libraries in this directory:
# $ pip install -t lib gcloud
# https://cloud.google.com/appengine/docs/python/tools/libraries27
try:
    vendor.add('lib')
except:
    print "Unable to add 'lib'"


def webapp_add_wsgi_middleware(app):
    app = recording.appstats_wsgi_middleware(app)
    return app


if os.environ.get('SERVER_SOFTWARE', '').startswith('Development'):
    print "gaesqs development"
    import imp
    import os.path
    import inspect
    from google.appengine.tools.devappserver2.python import sandbox
    sandbox._WHITE_LIST_C_MODULES += ['_ssl', '_socket']
    # Use the system socket.
    real_os_src_path = os.path.realpath(inspect.getsourcefile(os))
    psocket = os.path.join(os.path.dirname(real_os_src_path), 'socket.py')
    imp.load_source('socket', psocket)
    os.environ['HTTP_HOST'] = "my-project-id.appspot.com"
else:
    print "gaesqs prod"
    # Doing this on dev_appserver/localhost seems to cause outbound https requests to fail
    from lib import requests
    from lib.requests_toolbelt.adapters import appengine as requests_toolbelt_appengine
    # Use the App Engine Requests adapter. This makes sure that Requests uses
    # URLFetch.
    requests_toolbelt_appengine.monkeypatch()
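For completeness, the default service also needs a handler mapped to the /task/sqs-process/ URL that these tasks target. The original files don't include it; a minimal sketch might look like this (the class name and processing logic are assumptions):

import logging

import webapp2

class SqsProcessTask(webapp2.RequestHandler):
    def post(self):
        # push-queue tasks arrive as POST requests carrying the params
        # set on taskqueue.Task in the gaesqs service
        message_body = self.request.get('message')
        logging.info("processing SQS message: %s", message_body)
        # ... actual processing goes here ...

app = webapp2.WSGIApplication([
    ('/task/sqs-process/', SqsProcessTask),
])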

"ImportError: No module named _ssl" with dev_appserver.py from Google App Engine

Background
"In the Python runtime, we've added support for the Python SSL
Library, so you can now open secure connections to remote services
such as Apple's Push Notification service."
This quote is taken from a recent post on the Google App Engine blog.
Implementation
If you want to use native python ssl, you must enable it using the libraries configuration in your application's app.yaml file where you specify the library name "ssl" . . .
These instructions are provided for developers through the Google App Engine documentation.
The following lines have been added to the app.yaml file:
libraries:
- name: ssl
  version: latest
This much is in line with the advice provided through the Google App Engine documentation.
Problem
I have tried running my project in three different configurations. Two are working, and one is not.
Working ...
After I upload my application to Google App Engine and run my project through the live server, everything works fine.
Working ...
When I run my project with manage.py runserver and include the Google App Engine SDK in my PYTHONPATH, everything works fine.
Not Working ...
However, when I run my project with dev_appserver.py, I get the following error:
ImportError at /
No module named _ssl
Request Method: GET
Request URL: http://localhost:8080/
Django Version: 1.4.3
Exception Type: ImportError
Exception Value:
No module named _ssl
Exception Location: /usr/local/lib/google_appengine_1.7.7/google/appengine/tools/devappserver2/python/sandbox.py in load_module, line 856
Python Executable: /home/rbose85/Code/venvs/appserver/bin/python
Python Version: 2.7.3
Python Path:
['/home/rbose85/Code/product/site',
'/usr/local/lib/google_appengine_1.7.7',
'/usr/local/lib/google_appengine_1.7.7/lib/protorpc',
'/usr/local/lib/google_appengine_1.7.7',
'/usr/local/lib/google_appengine_1.7.7',
'/usr/local/lib/google_appengine_1.7.7/lib/protorpc',
'/usr/local/lib/google_appengine_1.7.7',
'/usr/local/lib/google_appengine_1.7.7/lib/protorpc',
'/home/rbose85/Code/venvs/appserver/lib/python2.7',
'/home/rbose85/Code/venvs/appserver/lib/python2.7/lib-dynload',
'/usr/lib/python2.7',
'/usr/local/lib/google_appengine',
u'/usr/local/lib/google_appengine_1.7.7/lib/django-1.4',
u'/usr/local/lib/google_appengine_1.7.7/lib/ssl-2.7',
u'/usr/local/lib/google_appengine_1.7.7/lib/webapp2-2.3',
u'/usr/local/lib/google_appengine_1.7.7/lib/webob-1.1.1',
u'/usr/local/lib/google_appengine_1.7.7/lib/yaml-3.10']
Server time: Wed, 24 Apr 2013 11:23:49 +0000
For the current GAE version (1.8.0, at least until 1.8.3), if you want to be able to debug SSL connections in your development environment, you will need to tweak the GAE sandbox a little:
Add the "_ssl" and "_socket" keys to the _WHITE_LIST_C_MODULES dictionary in /path-to-gae-sdk/google/appengine/tools/devappserver2/python/sandbox.py.
Replace the socket.py file provided by Google in /path-to-gae-sdk/google/appengine/dist27 with the socket.py file from your Python installation.
IMPORTANT: Tweaking the sandbox environment might leave you with functionality that works on your local machine but not in production (for example, GAE only supports outbound sockets in production). I recommend restoring your sandbox when you are done developing that specific part of your app.
The solution by jmg works, but instead of changing the SDK files, you could monkey-patch the relevant modules. Just put something like this at the beginning of your project setup:
# Just taking Flask as an example
app = Flask('myapp')

if environment == 'DEV':
    import sys
    from google.appengine.tools.devappserver2.python import sandbox
    sandbox._WHITE_LIST_C_MODULES += ['_ssl', '_socket']
    # copy_of_stdlib_socket is socket.py copied from a standard Python
    # install and vendored under lib/
    from lib import copy_of_stdlib_socket as patched_socket
    sys.modules['socket'] = patched_socket
    socket = patched_socket
I had to use a slightly different approach to get this working in CircleCI (I'm unsure what peculiarity of their venv config caused this):
appengine_config.py
import os

if os.environ.get('SERVER_SOFTWARE', '').startswith('Development'):
    import imp
    import os.path
    import inspect
    from google.appengine.tools.devappserver2.python import sandbox
    sandbox._WHITE_LIST_C_MODULES += ['_ssl', '_socket']
    # Use the system socket.
    real_os_src_path = os.path.realpath(inspect.getsourcefile(os))
    psocket = os.path.join(os.path.dirname(real_os_src_path), 'socket.py')
    imp.load_source('socket', psocket)
I had this problem because I wasn't vendoring ssl in my app.yaml file. I know the OP did that, but for those landing here with the OP's error, it's worth making sure lines like the following are in your app.yaml file:
libraries:
- name: ssl
  version: latest
Stumbled upon this thread while trying to work with Apple's Push Notification service and App Engine... I was able to get this working without any monkey patching by adding the SSL library in my app.yaml, as recommended in the official docs. Hope that helps someone else :)
I added the code to appengine_config.py as listed by Spain Train, but also had to add the following lines (inside the same Development-only block) to get this to work:
    phttplib = os.path.join(os.path.dirname(real_os_src_path), 'httplib.py')
    imp.load_source('httplib', phttplib)
You can test whether ssl is available on your local system by opening a Python shell and typing import ssl. If no error appears, the problem is something else; otherwise you don't have the relevant libraries installed on your system. On Debian/Ubuntu, try sudo apt-get install openssl libssl-dev (on RPM-based systems the package is openssl-devel), or follow the relevant instructions for your operating system. If you are using Windows, these are the instructions.
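A slightly more informative check than a bare import (a small sketch; works in a plain Python 2.7 shell):

import ssl

# If the import above fails, the Python build lacks OpenSSL bindings and
# the GAE ssl library cannot work locally either.
print ssl.OPENSSL_VERSION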

app-engine error due to import _multiprocessing

I created an App Engine Endpoints API, which I am loading using GoogleAppEngineLauncher. The API launches fine, but when I try to load the API Explorer for testing, I get an error due to the line from multiprocessing import Process. My research led me to this site, but that's not working for me. Does anyone know how to fix this?
from multiprocessing import Process
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/__init__.py", line 84, in <module>
    import _multiprocessing
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/devappserver2/python/sandbox.py", line 861, in load_module
    raise ImportError
ImportError
INFO 2013-03-25 23:46:32,229 server.py:528] "POST /_ah/spi/BackendService.getApiConfigs HTTP/1.1" 500 -
INFO 2013-03-25 23:46:32,229 server.py:528] "GET /_ah/api/discovery/v1/apis HTTP/1.1" 500 60
In this group thread, one of the Python 2.7 App Engine runtime engineers points to an alternative (namely the futures package) that should work with the new Python 2.7 threading support.
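A minimal sketch of that alternative, assuming the futures backport is installed (pip install futures on Python 2.7) or vendored into the app:

from concurrent.futures import ThreadPoolExecutor

def work(item):
    # I/O-bound work that might otherwise have been pushed to a Process
    return item * 2

# Thread pools avoid the blocked _multiprocessing C module entirely
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(work, [1, 2, 3]))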
Alternatively, you could use the fetch_data_async functions to read from a blob without blocking:
fetch_data_rpc = blobstore.fetch_data_async(...)
other_processing()
data = fetch_data_rpc.get_result()

Writing to filesystem in App Engine development server

I'm trying out Scala and the Scalate templating system on an App Engine application. By default, Scalate tries to write the compiled template to the filesystem. Obviously this won't work on App Engine, and there is a way to precompile the templates, but I was wondering if it is possible to switch off this restriction just during development; it slows down the compile/test cycle quite a bit.
In the Python dev server you can; I use it to log to a file when using the dev server:
if os.environ.get('SERVER_SOFTWARE', '').startswith('Dev'):
    from google.appengine.tools.dev_appserver import FakeFile
    FakeFile.ALLOWED_MODES = frozenset(['a', 'r', 'w', 'rb', 'U', 'rU'])
If you want to write binary files or unicode, you might need to add 'wb' or 'wU' to that list. Maybe there is something equivalent in the Java dev server.
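With the modes patched, ordinary file writes work on the dev server. A hypothetical example of the logging use mentioned above (the path is illustrative):

import os

if os.environ.get('SERVER_SOFTWARE', '').startswith('Dev'):
    # append-mode write, permitted by the patched ALLOWED_MODES above
    with open('/tmp/dev_debug.log', 'a') as f:
        f.write('handler invoked\n')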
I'm currently using web.py, which has the same limitation: its templating system can't access the parser module (blocked) and can't write to the filesystem on Google App Engine, so you need to precompile the templates up front.
I resolved this annoying issue with a Python script that triggers precompilation of a file every time a file in a given directory changes.
I'm on OSX using FSEvents, but I believe you can find equivalent solutions/libraries on any other platform (incron on Linux, FileSystemWatcher on Windows):
from fsevents import Observer
from fsevents import Stream
from datetime import datetime
import subprocess
import os
import time

PROJECT_PATH = '/Users/.../Project/GoogleAppEngine/stackprinter/'
TEMPLATE_COMPILE_PATH = os.path.join(PROJECT_PATH, 'web', 'template.py')
VIEWS_PATH = os.path.join(PROJECT_PATH, 'app', 'views')

def callback(event):
    # recompile templates whenever an .html file under the views dir changes
    if event.name.endswith('.html'):
        subprocess.Popen('python2.5 %s %s %s' % (TEMPLATE_COMPILE_PATH, '--compile', VIEWS_PATH), shell=True)
        print '%s - %s compiled!' % (datetime.now(), event.name.split('/')[-1])

observer = Observer()
observer.start()
stream = Stream(callback, VIEWS_PATH, file_events=True)
observer.schedule(stream)
while not observer.isAlive():
    time.sleep(0.1)
I'd strongly advise against using App Engine...
If you're just looking for free JVM/webapp hosting, then Stax.net offers a better alternative. Amongst other features, it allows you to write to the filesystem and to spawn threads.
They also use Scala internally, so they're very accommodating towards other Scala developers :)
Stax.net: http://www.stax.net/
(Note: I'm in no way affiliated with Stax.)
