Google App Engine with Flask: Memorystore/redis produces [Errno 104] Connection reset by peer - google-app-engine

My Flask-based GAE app had been running for a few weeks without issue. Today I noticed the root URL produces a 500 Internal Server Error most of the time. From the logs, this appears to be related to session handling in Flask (using Flask-Session). Before transitioning to GAE, this app ran on a VM with a local Redis instance for well over a year without any problems.
The Memorystore instance has only about 1500 keys at this time and 3–4 MB of data, so it is not heavily loaded. The server itself receives very little traffic (just me and the occasional robot). I am looking for insight into what has produced this change in behavior, or what diagnostic procedures I should pursue, since I am new to GAE and the Google Cloud environment.
A typical traceback of the failure looks like this:
Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1969, in finalize_request
    response = self.process_response(response)
  File "/env/lib/python3.7/site-packages/flask/app.py", line 2268, in process_response
    self.session_interface.save_session(self, ctx.session, response)
  File "/env/lib/python3.7/site-packages/flask_session/sessions.py", line 166, in save_session
    time=total_seconds(app.permanent_session_lifetime))
  File "/env/lib/python3.7/site-packages/redis/client.py", line 1540, in setex
    return self.execute_command('SETEX', name, time, value)
  File "/env/lib/python3.7/site-packages/redis/client.py", line 836, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/env/lib/python3.7/site-packages/redis/connection.py", line 1065, in get_connection
    if connection.can_read():
  File "/env/lib/python3.7/site-packages/redis/connection.py", line 682, in can_read
    return self._parser.can_read(timeout)
  File "/env/lib/python3.7/site-packages/redis/connection.py", line 295, in can_read
    return self._buffer and self._buffer.can_read(timeout)
  File "/env/lib/python3.7/site-packages/redis/connection.py", line 205, in can_read
    raise_on_timeout=False)
  File "/env/lib/python3.7/site-packages/redis/connection.py", line 173, in _read_from_socket
    data = recv(self._sock, socket_read_size)
  File "/env/lib/python3.7/site-packages/redis/_compat.py", line 58, in recv
    return sock.recv(*args, **kwargs)
ConnectionResetError: [Errno 104] Connection reset by peer
Again, this is new behavior. The server worked flawlessly for a couple of weeks. What might have changed and where should I look?
Possible related issue: https://github.com/andymccurdy/redis-py/issues/1186

Using health_check_interval eliminated most, but not all, of these "Connection reset by peer" errors for us (GAE Python 2.7):
self._redis = Redis(
    environ.get("REDISHOST", "localhost"),
    int(environ.get("REDISPORT", 6379)),
    health_check_interval=30,
)
Perhaps a value lower than 30 would eliminate the remaining occurrences.
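Besides lowering the health-check interval, the few resets that still get through can be absorbed by retrying the failed command. The retry helper below is only a sketch of that idea; `retry_on_reset` and `save_session_value` are my own names, not part of redis-py or Flask-Session, and the `client` argument is assumed to be a `redis.Redis` instance:

```python
import time
from functools import wraps


def retry_on_reset(attempts=3, delay=0.1):
    """Retry a callable when the peer resets the connection (hypothetical helper)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except ConnectionResetError:
                    if attempt == attempts - 1:
                        raise  # out of retries: surface the original error
                    time.sleep(delay * (2 ** attempt))  # simple exponential backoff
        return wrapper
    return decorator


@retry_on_reset(attempts=3)
def save_session_value(client, key, value, ttl=3600):
    # SETEX mirrors what Flask-Session does in the traceback above;
    # a stale pooled connection raising ConnectionResetError here
    # is retried on a fresh attempt instead of becoming a 500.
    return client.setex(key, ttl, value)
```

A retry only papers over the underlying idle-connection timeout, but combined with health_check_interval it should make the symptom disappear from user-facing requests.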

Related

NeuralProphet ValueError without any message

I will try to be as short as possible.
I ran a NeuralProphet forecasting job on multiple products.
Task 'model_selection': Exception encountered during task execution!
Traceback (most recent call last):
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 880, in get_task_run_state
value = prefect.utilities.executors.run_task_with_timeout(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/prefect/utilities/executors.py", line 468, in run_task_with_timeout
return task.run(*args, **kwargs) # type: ignore
File "/builds/-/--prefect-workflows/workflows/worker_flow.py", line 108, in model_selection
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/python_translation/model_selection_master.py", line 483, in run_model_selection
) = cross_validate_neuralprophet(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/python_translation/models/NeuralProphet.py", line 169, in cross_validate_neuralprophet
train = NeuralProphet_model.fit(df=df_train, freq="W-MON")
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/forecaster.py", line 592, in fit
metrics_df = self._train(df_dict, progress=progress)
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/forecaster.py", line 1806, in _train
loader = self._init_train_loader(df_dict)
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/forecaster.py", line 1572, in _init_train_loader
self.config_normalization.init_data_params(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/configure.py", line 41, in init_data_params
self.local_data_params, self.global_data_params = df_utils.init_data_params(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/df_utils.py", line 260, in init_data_params
global_data_params = data_params_definition(
File "/root/.cache/pypoetry/virtualenvs/--py3.8/lib/python3.8/site-packages/neuralprophet/df_utils.py", line 176, in data_params_definition
data_params[covar] = get_normalization_params(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/df_utils.py", line 300, in get_normalization_params
norm_type = auto_normalization_setting(array)
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/df_utils.py", line 290, in auto_normalization_setting
raise ValueError
ValueError
Describe the bug
Ran a forecasting job ... and it raised a ValueError without any accompanying message.
To Reproduce
I really do not know. It was a Prefect job run over 200 products, and I have no idea why it failed.
Expected behavior
I expected it to forecast without returning an error.
What actually happens
It crashes with a ValueError
Screenshots
Printouts are above.
Environment (please complete the following information):
Python environment: 3.8.10
NeuralProphet version: neuralprophet 0.3.2, installed from PyPI with pip install neuralprophet
Additional context
These jobs are scheduled as a Prefect workflow, so I do not run anything manually. Around 150 products ran without any issues, and then this one returned a ValueError.
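For what it's worth, the traceback ends in df_utils.auto_normalization_setting, which raises a bare ValueError when a series cannot be auto-normalized; one commonly reported trigger is a column that is constant over the training window, which can easily happen for one product out of 200. A pre-flight check like the sketch below (the helper is my own, not part of NeuralProphet) could flag such products before fitting:

```python
def degenerate_columns(frame):
    """Return names of columns holding fewer than two distinct values.

    `frame` may be a pandas DataFrame or a plain dict of name -> values;
    this is a hypothetical helper, not a NeuralProphet API.
    """
    bad = []
    for name, values in frame.items():
        if len(set(values)) < 2:  # constant (or empty) series
            bad.append(name)
    return bad


# Before calling NeuralProphet_model.fit(df=df_train, ...), something like:
#   bad = degenerate_columns(df_train)
#   if bad: log and skip this product instead of letting fit() raise
```

Running this on the training slice of the failing product would confirm or rule out the constant-series theory.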

VOLTTRON Simple Web agent

On release 8.1.1 I am trying to experiment with the simple web agent.
Running through the setup process
volttron -vv -l volttron.log --bind-web-address http://0.0.0.0:8080 &
Everything seems to install OK for the HTTP protocol via vcfg, and the agent starts fine, but when I go to the browser I get an empty page response.
In the terminal I see an error; here is the full traceback:
.do_close of <WSGIServer, (<gevent._socket3.socket [closed] at 0x7f64342242c)> failed with SSLError
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/baseserver.py", line 34, in _handle_and_close_when_done
return handle(*args_tuple)
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/server.py", line 233, in wrap_socket_and_handle
with _closing_socket(self.wrap_socket(client_socket, **self.ssl_args)) as ssl_socket:
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 793, in wrap_socket
return SSLSocket(sock=sock, keyfile=keyfile, certfile=certfile,
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 311, in init
raise x
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 307, in init
self.do_handshake()
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 663, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: HTTP_REQUEST] http request (_ssl.c:1131)
2021-09-29T13:38:34Z <Greenlet at 0x7f64341fc480: _handle_and_close_when_done(<bound method StreamServer.wrap_socket_and_handle , <bound method StreamServer.do_close of <WSGIServer, (<gevent._socket3.socket [closed] at 0x7f643419195)> failed with SSLError
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/baseserver.py", line 34, in _handle_and_close_when_done
return handle(*args_tuple)
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/server.py", line 233, in wrap_socket_and_handle
with _closing_socket(self.wrap_socket(client_socket, **self.ssl_args)) as ssl_socket:
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 793, in wrap_socket
return SSLSocket(sock=sock, keyfile=keyfile, certfile=certfile,
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 311, in init
raise x
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 307, in init
self.do_handshake()
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 663, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: HTTP_REQUEST] http request (_ssl.c:1131)
2021-09-29T13:38:34Z <Greenlet at 0x7f643423c6a0: _handle_and_close_when_done(<bound method StreamServer.wrap_socket_and_handle , <bound method StreamServer.do_close of <WSGIServer, (<gevent._socket3.socket [closed] at 0x7f64342242c)> failed with SSLError
EDIT
So if I open ~/.volttron/config in nano, it looks like the listing below. I did change the bind-web-address to the IP address of my test-bench instance; hopefully that wasn't a mistake, since the initial bind-web-address looked like the name of the computer: --bind-web-address http://ben-hp-probook-6550b:8080
message-bus = zmq
vip-address = tcp://127.0.0.1:22916
instance-name = benshome
bind-web-address = http://192.168.0.105:8080
web-ssl-cert = /home/ben/.volttron/certificates/certs/platform_web-server.crt
web-ssl-key = /home/ben/.volttron/certificates/private/platform_web-server.pem
web-secret-key = 0e3b19770c0a8c0a08f274fcdabaf939fecc16601283266934c5ab258a1ed20cf440fde2c83cb8660dac569d31b5cdaf3ab7354a39b0640f355f9c5407c5fce619
I think I first tried HTTPS and then resorted to HTTP. Anyway, when I start VOLTTRON, do I still need the --bind-web-address argument if ~/.volttron/config is already set up with one?
I've tried starting VOLTTRON both with and without the --bind-web-address flag, but I am still unable to bring up a web page at the IP address of the machine running VOLTTRON (192.168.0.105). This would be the simple web agent, right?
I was able to reproduce this when I ran through vcfg and specified https, but then did what you did and passed the bind-web-address to the volttron command itself.
However, you shouldn't do this. The instructions assume you haven't gone through the vcfg process, in which case you would have to specify the bind web address on the command line.
Since you went through the vcfg process, your config file (~/.volttron/config) will already have your hostname:port as the bind-web-address. If it has https in it, that is the reason it is not working for you.
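A related detail, given the config you posted: the web-ssl-cert and web-ssl-key lines left over from the earlier HTTPS attempt could cause the platform to wrap the web socket in TLS while the browser speaks plain HTTP, which would match the [SSL: HTTP_REQUEST] errors in the log. One way to test that theory (a guess, not an official procedure) is to comment those lines out so the config matches the http:// bind address, then restart VOLTTRON without any --bind-web-address flag:

```
message-bus = zmq
vip-address = tcp://127.0.0.1:22916
instance-name = benshome
bind-web-address = http://192.168.0.105:8080
# web-ssl-cert and web-ssl-key commented out so the web server
# serves plain HTTP, matching the http:// bind-web-address above
```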

S3 permission error when running sagemaker python sdk sklearn in local mode

I created a training script with hard-coded input. It works as expected as a training job, but I couldn't make it work in local mode.
It brings up a container in my local Docker and exits with code 1.
Code:
estimator = SKLearn(
    entry_point="train_model.py",
    train_instance_type="local",
)
estimator.fit()
Here is the exception:
2020-02-22 06:21:05,470 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
2020-02-22 06:21:05,480 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)
2020-02-22 06:21:05,504 sagemaker_sklearn_container.training INFO Invoking user training script.
2020-02-22 06:21:06,407 sagemaker-containers ERROR Reporting training FAILURE
2020-02-22 06:21:06,407 sagemaker-containers ERROR framework error:
Traceback (most recent call last):
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 81, in train
entrypoint()
File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 36, in main
train(framework.training_env())
File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 32, in train
training_environment.to_env_vars(), training_environment.module_name)
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_modules.py", line 301, in run_module
_files.download_and_extract(uri, _env.code_dir)
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_files.py", line 129, in download_and_extract
s3_download(uri, dst)
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_files.py", line 164, in s3_download
s3.Bucket(bucket).download_file(key, dst)
File "/miniconda3/lib/python3.7/site-packages/boto3/s3/inject.py", line 246, in bucket_download_file
ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
File "/miniconda3/lib/python3.7/site-packages/boto3/s3/inject.py", line 172, in download_file
extra_args=ExtraArgs, callback=Callback)
File "/miniconda3/lib/python3.7/site-packages/boto3/s3/transfer.py", line 307, in download_file
future.result()
File "/miniconda3/lib/python3.7/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/miniconda3/lib/python3.7/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/miniconda3/lib/python3.7/site-packages/s3transfer/tasks.py", line 255, in _main
self._submit(transfer_future=transfer_future, **kwargs)
File "/miniconda3/lib/python3.7/site-packages/s3transfer/download.py", line 345, in _submit
**transfer_future.meta.call_args.extra_args
File "/miniconda3/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/miniconda3/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
An error occurred (403) when calling the HeadObject operation: Forbidden
tmpe_msr8pi_algo-1-kt1vh_1 exited with code 1
I found out that restarting Docker solved the issue.
After a while it happened again, and a restart solved it again.
I'm using Docker for Windows, and the issue is probably related to the created container's configuration.

VoidMessage response failing on Google Cloud Endpoints

I've created a small Google Cloud Endpoints Frameworks application running on App Engine - Standard Environment. Overall, things have been working very nicely.
I need to create a method that simply returns HTTP 204 when called via HTTP GET. As I understand it, this can be accomplished by simply returning message_types.VoidMessage().
However, when I use this approach, the endpoint returns an HTTP 500 error. From the error console, the error reported is:
ValueError: response_size should be a positive int/long
My method is essentially empty right now (just logging) - the error does not originate until the response is attempted.
Traceback:
(/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py:279)
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in Handle
result = handler(dict(self._environ), self._StartResponse)
File "/base/data/home/apps/..../lib/google/api/control/wsgi.py", line 194, in call
return self._application(environ, start_response)
File "/base/data/home/apps/..../lib/google/api/control/wsgi.py", line 574, in call
return self._application(environ, start_response)
File "/base/data/home/apps/..../lib/google/api/control/wsgi.py", line 323, in call
rules)
File "/base/data/home/apps/..../lib/google/api/control/wsgi.py", line 356, in _create_report_request
url=app_info.url
File "/base/data/home/apps/..../lib/google/api/control/report_request.py", line 241, in new
_validate_int_arg('response_size', response_size)
File "/base/data/home/apps/..../lib/google/api/control/report_request.py", line 45, in _validate_int_arg
raise ValueError('%s should be a positive int/long' % (name,))
ValueError: response_size should be a positive int/long
Any thoughts would be appreciated.
I'm on App Engine Standard edition.
google-endpoints is v2.0.0b5
google-endpoints-api-management is v1.0.0b6

Exception in IDLE (Python 2.7) - possible bug in IDLE?

I'm trying to run a meta-analysis on a database of fMRI data, using the neurosynth Python library through IDLE. When I try to run even some of the most basic functions, I get an error: not an error in my own code or in the neurosynth modules; it seems to be a bug in IDLE itself.
I uninstalled and reinstalled Python 2.7, reinstalled neurosynth and its dependencies, and ran into the same error. I've pasted my code below, followed by the error message, which appears in the Unix shell (not in the IDLE shell).
Has anybody come across this error before when using IDLE with Python 2.7?
The script:
from neurosynth.base.dataset import Dataset
from neurosynth.analysis import meta, decode, network
import neurosynth
neurosynth.set_logging_level('info')
dataset = Dataset('data/database.txt')
dataset.add_features('data/features.txt')
dataset.save('dataset.pkl')
print 'done'
The error message which appeared in the unix shell:
----------------------------------------
Unhandled server exception!
Thread: SockThread
Client Address: ('127.0.0.1', 46779)
Request: <socket._socketobject object at 0xcb8d7c0>
Traceback (most recent call last):
File "/usr/global/python/2.7.3/lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/global/python/2.7.3/lib/python2.7/SocketServer.py", line 310, in process_request
self.finish_request(request, client_address)
File "/usr/global/python/2.7.3/lib/python2.7/SocketServer.py", line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 503, in __init__
SocketServer.BaseRequestHandler.__init__(self, sock, addr, svr)
File "/usr/global/python/2.7.3/lib/python2.7/SocketServer.py", line 638, in __init__
self.handle()
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/run.py", line 265, in handle
rpc.RPCHandler.getresponse(self, myseq=None, wait=0.05)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 280, in getresponse
response = self._getresponse(myseq, wait)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 300, in _getresponse
response = self.pollresponse(myseq, wait)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 424, in pollresponse
message = self.pollmessage(wait)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 376, in pollmessage
packet = self.pollpacket(wait)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 347, in pollpacket
r, w, x = select.select([self.sock.fileno()], [], [], wait)
error: (4, 'Interrupted system call')
*** Unrecoverable, server exiting!
----------------------------------------
Thanks in advance!
IDLE is meant for interactive exploration in the shell, for editing in an editor, and for testing programs by running them from an editor. It is not meant for production runs of programs once developed. If there is a problem, one should separate the IDLE part from the running-with-Python part. So in the Unix shell, run python -m idlelib (for instance) to see whether IDLE starts correctly. Then, in an appropriate directory, run python path-to-my-file.py. Which of these does not work?
The error message is definitely odd, as it contains more than just the Python traceback. On the other hand, it does not start with a line of your code. I have no idea why the select call would be interrupted.