S3 permission error when running SageMaker Python SDK SKLearn in local mode - amazon-sagemaker

I created a training script with hard-coded input. It works as expected as a SageMaker training job, but I couldn't make it work in local mode.
It brings up a container in my local Docker and exits with code 1.
Code:
from sagemaker.sklearn.estimator import SKLearn
estimator = SKLearn(entry_point="train_model.py",
                    role=role,  # SageMaker execution role ARN, defined elsewhere
                    train_instance_type="local")
estimator.fit()
Here is the exception:
2020-02-22 06:21:05,470 sagemaker-containers INFO Imported framework sagemaker_sklearn_container.training
2020-02-22 06:21:05,480 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)
2020-02-22 06:21:05,504 sagemaker_sklearn_container.training INFO Invoking user training script.
2020-02-22 06:21:06,407 sagemaker-containers ERROR Reporting training FAILURE
2020-02-22 06:21:06,407 sagemaker-containers ERROR framework error:
Traceback (most recent call last):
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 81, in train
entrypoint()
File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 36, in main
train(framework.training_env())
File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 32, in train
training_environment.to_env_vars(), training_environment.module_name)
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_modules.py", line 301, in run_module
_files.download_and_extract(uri, _env.code_dir)
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_files.py", line 129, in download_and_extract
s3_download(uri, dst)
File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_files.py", line 164, in s3_download
s3.Bucket(bucket).download_file(key, dst)
File "/miniconda3/lib/python3.7/site-packages/boto3/s3/inject.py", line 246, in bucket_download_file
ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
File "/miniconda3/lib/python3.7/site-packages/boto3/s3/inject.py", line 172, in download_file
extra_args=ExtraArgs, callback=Callback)
File "/miniconda3/lib/python3.7/site-packages/boto3/s3/transfer.py", line 307, in download_file
future.result()
File "/miniconda3/lib/python3.7/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/miniconda3/lib/python3.7/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/miniconda3/lib/python3.7/site-packages/s3transfer/tasks.py", line 255, in _main
self._submit(transfer_future=transfer_future, **kwargs)
File "/miniconda3/lib/python3.7/site-packages/s3transfer/download.py", line 345, in _submit
**transfer_future.meta.call_args.extra_args
File "/miniconda3/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/miniconda3/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
An error occurred (403) when calling the HeadObject operation: Forbidden
tmpe_msr8pi_algo-1-kt1vh_1 exited with code 1

I found that restarting Docker resolved the issue.
After a while it happened again, and another restart fixed it again.
I'm using Docker for Windows, and the issue is probably related to the configuration of the created container.
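When the 403 comes back, it can help to first confirm that the credentials on the host can read the object the container is trying to download. A minimal sketch using boto3; the bucket and key below are placeholders, not values from the original post:
import boto3

s3 = boto3.client("s3")
# HeadObject is exactly the call that fails inside the container.
# If this also returns 403 on the host, the problem is the credentials
# or the bucket policy rather than the local-mode container itself.
s3.head_object(Bucket="my-sagemaker-bucket", Key="path/to/sourcedir.tar.gz")
If the host-side call succeeds while the container still gets a 403, one possible explanation (consistent with a Docker restart fixing it) is clock drift in the Docker for Windows VM after the host sleeps, which makes the signed S3 requests from inside the container invalid.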

Related

NeuralProphet ValueError without any message

I will try to be as brief as possible.
I ran a NeuralProphet forecasting job on multiple products.
Task 'model_selection': Exception encountered during task execution!
Traceback (most recent call last):
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 880, in get_task_run_state
value = prefect.utilities.executors.run_task_with_timeout(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/prefect/utilities/executors.py", line 468, in run_task_with_timeout
return task.run(*args, **kwargs) # type: ignore
File "/builds/-/--prefect-workflows/workflows/worker_flow.py", line 108, in model_selection
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/python_translation/model_selection_master.py", line 483, in run_model_selection
) = cross_validate_neuralprophet(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/python_translation/models/NeuralProphet.py", line 169, in cross_validate_neuralprophet
train = NeuralProphet_model.fit(df=df_train, freq="W-MON")
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/forecaster.py", line 592, in fit
metrics_df = self._train(df_dict, progress=progress)
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/forecaster.py", line 1806, in _train
loader = self._init_train_loader(df_dict)
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/forecaster.py", line 1572, in _init_train_loader
self.config_normalization.init_data_params(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/configure.py", line 41, in init_data_params
self.local_data_params, self.global_data_params = df_utils.init_data_params(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/df_utils.py", line 260, in init_data_params
global_data_params = data_params_definition(
File "/root/.cache/pypoetry/virtualenvs/--py3.8/lib/python3.8/site-packages/neuralprophet/df_utils.py", line 176, in data_params_definition
data_params[covar] = get_normalization_params(
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/df_utils.py", line 300, in get_normalization_params
norm_type = auto_normalization_setting(array)
File "/root/.cache/pypoetry/virtualenvs/--prefect-workflows-9TtSrW0h-py3.8/lib/python3.8/site-packages/neuralprophet/df_utils.py", line 290, in auto_normalization_setting
raise ValueError
ValueError
Describe the bug
Ran a forecasting job ... and it raised a ValueError without any additional message.
To Reproduce
I really do not know. It was a Prefect job that ran over 200 products, and I have no idea why it failed.
Expected behavior
I expected it to forecast without returning an error.
What actually happens
It crashes with a ValueError
Screenshots
Printouts are above.
Environment (please complete the following information):
Python environment: 3.8.10
NeuralProphet version: neuralprophet 0.3.2, installed from PyPI with pip install neuralprophet
Additional context
These runs are scheduled as a Prefect workflow, so I do not run things manually. Around 150 products ran without any issues, and then this one returned a ValueError.
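The ValueError is raised while NeuralProphet computes normalization parameters for the training data, so one practical step is to screen each product's training frame before calling fit and log which one is degenerate. A minimal sketch, assuming the per-product frames use the usual ds/y layout; the check_trainable helper is hypothetical, not part of NeuralProphet:
import pandas as pd

def check_trainable(df: pd.DataFrame) -> list:
    """Return reasons why this frame may break normalization."""
    problems = []
    for col in [c for c in df.columns if c != "ds"]:
        series = df[col]
        if series.isna().any():
            problems.append("%s contains NaNs" % col)
        if series.nunique(dropna=True) < 2:
            problems.append("%s is constant or empty" % col)
    return problems

# Usage sketch: skip (or at least log) products that look degenerate.
# for product, df_train in product_frames.items():
#     issues = check_trainable(df_train)
#     if issues:
#         print(product, issues)
#         continue
#     NeuralProphet().fit(df_train, freq="W-MON")
Logging the offending product this way at least turns an anonymous ValueError into something reproducible outside the Prefect run.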

VOLTTRON Simple Web agent

On release 8.1.1 I am trying to experiment with the simple web agent.
Running through the setup process:
volttron -vv -l volttron.log --bind-web-address http://0.0.0.0:8080 &
Everything seems to install OK for the HTTP protocol in vcfg, and the agent starts fine, but when I go to the browser I get an empty page response.
In the terminal there is an error; here's the full traceback:
.do_close of <WSGIServer, (<gevent._socket3.socket [closed] at 0x7f64342242c)> failed with SSLError
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/baseserver.py", line 34, in _handle_and_close_when_done
return handle(*args_tuple)
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/server.py", line 233, in wrap_socket_and_handle
with _closing_socket(self.wrap_socket(client_socket, **self.ssl_args)) as ssl_socket:
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 793, in wrap_socket
return SSLSocket(sock=sock, keyfile=keyfile, certfile=certfile,
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 311, in init
raise x
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 307, in init
self.do_handshake()
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 663, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: HTTP_REQUEST] http request (_ssl.c:1131)
2021-09-29T13:38:34Z <Greenlet at 0x7f64341fc480: _handle_and_close_when_done(<bound method StreamServer.wrap_socket_and_handle , <bound method StreamServer.do_close of <WSGIServer, (<gevent._socket3.socket [closed] at 0x7f643419195)> failed with SSLError
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/baseserver.py", line 34, in _handle_and_close_when_done
return handle(*args_tuple)
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/server.py", line 233, in wrap_socket_and_handle
with _closing_socket(self.wrap_socket(client_socket, **self.ssl_args)) as ssl_socket:
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 793, in wrap_socket
return SSLSocket(sock=sock, keyfile=keyfile, certfile=certfile,
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 311, in init
raise x
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 307, in init
self.do_handshake()
File "/home/ben/Desktop/volttron/env/lib/python3.8/site-packages/gevent/_ssl3.py", line 663, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: HTTP_REQUEST] http request (_ssl.c:1131)
2021-09-29T13:38:34Z <Greenlet at 0x7f643423c6a0: _handle_and_close_when_done(<bound method StreamServer.wrap_socket_and_handle , <bound method StreamServer.do_close of <WSGIServer, (<gevent._socket3.socket [closed] at 0x7f64342242c)> failed with SSLError
EDIT
So if I do a nano ~/.volttron/config it looks like the snippet below. I did change the bind-web-address to the IP address of my test bench instance; hopefully that wasn't a mistake, since the initial bind-web-address looked like it was the name of the computer: --bind-web-address http://ben-hp-probook-6550b:8080
message-bus = zmq
vip-address = tcp://127.0.0.1:22916
instance-name = benshome
bind-web-address = http://192.168.0.105:8080
web-ssl-cert = /home/ben/.volttron/certificates/certs/platform_web-server.crt
web-ssl-key = /home/ben/.volttron/certificates/private/platform_web-server.pem
web-secret-key = 0e3b19770c0a8c0a08f274fcdabaf939fecc16601283266934c5ab258a1ed20cf440fde2c83cb8660dac569d31b5cdaf3ab7354a39b0640f355f9c5407c5fce619
I think I first tried HTTPS and then resorted to HTTP. Anyway, when I start VOLTTRON, do I still need a --bind-web-address arg if ~/.volttron/config is already set up with one?
I've tried starting VOLTTRON both with and without the --bind-web-address flag, but I'm still unable to bring up a webpage on the IP address of the machine running VOLTTRON (192.168.0.105). This would be the simple web agent, right?
I was able to reproduce this when I ran through vcfg and specified https, but then did what you did and passed the bind-web-address to the volttron command itself.
However, you shouldn't do this. The instructions assume you haven't gone through the vcfg process and therefore you would have to specify the bind web address on the command line.
Since you went through the vcfg process, your config file (~/.volttron/config) will have your hostname:port as the bind-web-address. If it has https in it, that is the reason it is not working for you.
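For reference, and this is an inference from the traceback rather than something confirmed in the thread: the [SSL: HTTP_REQUEST] errors mean the platform is answering with TLS while the browser sends plain HTTP, which matches a config that still carries the web-ssl-cert and web-ssl-key entries from the earlier HTTPS attempt. Re-running vcfg and choosing http should leave ~/.volttron/config looking roughly like this:
message-bus = zmq
vip-address = tcp://127.0.0.1:22916
instance-name = benshome
bind-web-address = http://192.168.0.105:8080
Alternatively, keeping the cert/key entries and browsing to https://192.168.0.105:8080 should also stop the handshake failures.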

PloneIDE installation error

Running buildout. This might take a while...
While:
Installing ploneide.
An internal error occurred due to a bug in either zc.buildout or in a
recipe being used:
Traceback (most recent call last):
File "/Plone/buildout-cache/eggs/zc.buildout-2.2.1-py2.7.egg/zc/buildout/buildout.py", line 1942, in main
getattr(buildout, command)(args)
File "/Plone/buildout-cache/eggs/zc.buildout-2.2.1-py2.7.egg/zc/buildout/buildout.py", line 622, in install
installed_files = self[part]._call(recipe.install)
File "/Plone/buildout-cache/eggs/zc.buildout-2.2.1-py2.7.egg/zc/buildout/buildout.py", line 1366, in _call
return f()
File "/Plone/zinstance/src/collective.recipe.ploneide/collective/recipe/ploneide/__init__.py", line 200, in install
self.install_developer_manual()
File "/Plone/zinstance/src/collective.recipe.ploneide/collective/recipe/ploneide/__init__.py", line 107, in install_developer_manual
res = subprocess.Popen(cmd, **kwargs)
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Buildout.cfg
develop =
src/collective.ploneide
src/collective.recipe.ploneide
parts=
ploneide
[Instance]
[ploneide]
recipe = collective.recipe.ploneide
ploneide may not be very well maintained at this point; the last commit is from 2012: https://github.com/collective/collective.ploneide
It would help if you posted your buildout.cfg and mentioned which OS you're using.
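For what it's worth, OSError: [Errno 2] No such file or directory from subprocess.Popen usually means the first element of the command the recipe runs (here, inside install_developer_manual) does not exist on the PATH. A small sketch of how one could check which executable is missing; the command below is a placeholder, since the recipe's actual command isn't shown in the traceback:
from distutils.spawn import find_executable
import subprocess

cmd = ["make", "html"]  # placeholder: whatever the recipe passes to Popen
if find_executable(cmd[0]) is None:
    print("%r is not on PATH; Popen would raise OSError(2)" % cmd[0])
else:
    subprocess.Popen(cmd).wait()
Running that kind of check against the command the recipe builds would tell you which tool needs to be installed on your OS before buildout can finish.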

Exception in IDLE (Python 2.7) - possible bug in IDLE?

I'm trying to run a meta-analysis on a database of fMRI data, using the neurosynth Python library through IDLE. When I try to run even some of the most basic functions, I get an error: not an error in my own code or in the neurosynth modules; the error seems to be a bug in IDLE itself.
I uninstalled and reinstalled Python 2.7, reinstalled neurosynth and its dependencies, and ran into the same error. I've pasted my code below, followed by the error message, which appears in the Unix shell (not in the IDLE shell).
Has anybody come across this error before when using IDLE and Python 2.7?
The script:
from neurosynth.base.dataset import Dataset
from neurosynth.analysis import meta, decode, network
import neurosynth
neurosynth.set_logging_level('info')
dataset = Dataset('data/database.txt')
dataset.add_features('data/features.txt')
dataset.save('dataset.pkl')
print 'done'
The error message, which appeared in the Unix shell:
----------------------------------------
Unhandled server exception!
Thread: SockThread
Client Address: ('127.0.0.1', 46779)
Request: <socket._socketobject object at 0xcb8d7c0>
Traceback (most recent call last):
File "/usr/global/python/2.7.3/lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/global/python/2.7.3/lib/python2.7/SocketServer.py", line 310, in process_request
self.finish_request(request, client_address)
File "/usr/global/python/2.7.3/lib/python2.7/SocketServer.py", line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 503, in __init__
SocketServer.BaseRequestHandler.__init__(self, sock, addr, svr)
File "/usr/global/python/2.7.3/lib/python2.7/SocketServer.py", line 638, in __init__
self.handle()
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/run.py", line 265, in handle
rpc.RPCHandler.getresponse(self, myseq=None, wait=0.05)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 280, in getresponse
response = self._getresponse(myseq, wait)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 300, in _getresponse
response = self.pollresponse(myseq, wait)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 424, in pollresponse
message = self.pollmessage(wait)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 376, in pollmessage
packet = self.pollpacket(wait)
File "/usr/global/python/2.7.3/lib/python2.7/idlelib/rpc.py", line 347, in pollpacket
r, w, x = select.select([self.sock.fileno()], [], [], wait)
error: (4, 'Interrupted system call')
*** Unrecoverable, server exiting!
----------------------------------------
Thanks in advance!
IDLE is meant for interactive exploration in the shell, for editing in an editor, and for testing programs by running them from an editor. It is not meant for production running of programs once developed. If there is a problem, one should separate the IDLE part from the running-with-Python part. So in the Unix shell, run python -m idlelib (for instance) to see if IDLE starts correctly. Then, in an appropriate directory, run python path-to-my-file.py. Which of these does not work?
The error message is definitely odd, as it has more than just the Python traceback. On the other hand, it does not start with a line of your code. I have no idea why the select call would be interrupted.
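As background (my addition, not part of the answer above): error: (4, 'Interrupted system call') is errno EINTR, meaning the select() call inside IDLE's RPC loop was interrupted by a signal. Python 2 does not retry such calls automatically (Python 3.5+ does, per PEP 475), so the usual workaround in one's own code is an explicit retry loop, roughly:
import errno
import select

def select_retry(rlist, wlist, xlist, timeout):
    """Call select(), retrying if a signal interrupts it (EINTR)."""
    while True:
        try:
            return select.select(rlist, wlist, xlist, timeout)
        except select.error as e:
            if e.args[0] != errno.EINTR:
                raise
That doesn't fix IDLE itself, but it supports the suggestion above: running the script directly with python takes IDLE's RPC layer out of the picture entirely.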

Go AppEngine remote_api sample not working

Are the Go AppEngine samples up to date?
I'm running into issues getting example/remote_api/datastore_info.go working against my test AppEngine app running on localhost.
I've changed the client.PostForm from:
resp, err := client.PostForm("https://www.google.com/accounts/ClientLogin", v)
to:
resp, err := client.PostForm("http://localhost:35058/_ah/remote_api", v)
(35058 is the port reported for api_server during startup).
I've tried both the 1.9.3 and the latest 1.9.4 SDK versions.
The API server reports:
ERROR 2014-05-06 20:57:56,378 api_server.py:215] Exception while handling
Traceback (most recent call last):
File "/root/go_appengine/google/appengine/tools/devappserver2/api_server.py", line 194, in _handle_POST
request.ParseFromString(wsgi_input)
File "/root/go_appengine/google/net/proto/ProtocolBuffer.py", line 88, in ParseFromString
self.MergeFromString(s)
File "/root/go_appengine/google/net/proto/ProtocolBuffer.py", line 95, in MergeFromString
self.MergePartialFromString(s)
File "/root/go_appengine/google/net/proto/ProtocolBuffer.py", line 109, in MergePartialFromString
self.TryMerge(d)
File "/root/go_appengine/google/appengine/ext/remote_api/remote_api_pb.py", line 210, in TryMerge
d.skipData(tt)
File "/root/go_appengine/google/net/proto/ProtocolBuffer.py", line 529, in skipData
self.skipData(t)
File "/root/go_appengine/google/net/proto/ProtocolBuffer.py", line 529, in skipData
self.skipData(t)
File "/root/go_appengine/google/net/proto/ProtocolBuffer.py", line 537, in skipData
raise ProtocolBufferDecodeError, "corrupted"
ProtocolBufferDecodeError: corrupted
There were some bug fixes in 1.9.6; can you try with the latest SDK?
I've had exactly the same problem with every call to my development server:
Traceback (most recent call last):
File "/home/mike/go_appengine/google/appengine/tools/devappserver2/api_server.py", line 238, in _handle_POST
request.ParseFromString(wsgi_input)
File "/home/mike/go_appengine/google/net/proto/ProtocolBuffer.py", line 140, in ParseFromString
self.MergeFromString(s)
File "/home/mike/go_appengine/google/net/proto/ProtocolBuffer.py", line 152, in MergeFromString
self.MergePartialFromString(s)
File "/home/mike/go_appengine/google/net/proto/ProtocolBuffer.py", line 168, in MergePartialFromString
self.TryMerge(d)
File "/home/mike/go_appengine/google/appengine/ext/remote_api/remote_api_pb.py", line 210, in TryMerge
d.skipData(tt)
File "/home/mike/go_appengine/google/net/proto/ProtocolBuffer.py", line 677, in skipData
raise ProtocolBufferDecodeError, "corrupted"
ProtocolBufferDecodeError: corrupted
I have go version go1.4.2 (appengine-1.9.24) linux/amd64.
The problem was that I was using the address of the API server instead of the address of the default module:
INFO 2015-08-13 19:42:03,901 devappserver2.py:763] Skipping SDK update check.
INFO 2015-08-13 19:42:03,947 api_server.py:205] Starting API server at: http://localhost:60852
INFO 2015-08-13 19:42:03,971 dispatcher.py:197] Starting module "default" running at: http://localhost:49333
INFO 2015-08-13 19:42:03,972 admin_server.py:118] Starting admin server at: http://localhost:8000
You must use the module host/port (here http://localhost:49333) to place calls to your Go application; the API server address is for the remote API, I think.
