How to run the mlflow web-based user interface from Amazon SageMaker? - amazon-sagemaker

I want to use the mlflow web-based user interface from a notebook on Amazon SageMaker. But the given address http://127.0.0.1:5000 doesn't seem to work.
I have installed mlflow on a SageMaker notebook.
This code runs nicely:
import mlflow
mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()
And then if I run
! mlflow ui
I get the expected result:
[2019-04-09 11:15:52 +0000] [17980] [INFO] Starting gunicorn 19.9.0
[2019-04-09 11:15:52 +0000] [17980] [INFO] Listening at: http://127.0.0.1:5000 (17980)
[2019-04-09 11:15:52 +0000] [17980] [INFO] Using worker: sync
[2019-04-09 11:15:52 +0000] [17983] [INFO] Booting worker with pid: 17983
However, after that when going to http://127.0.0.1:5000 in my browser, nothing loads.
My guess is that 127.0.0.1 is not the right address, but how can I find out which address to use instead?

Hi and thank you for using SageMaker!
Unfortunately, it appears that mlflow isn't compatible with SageMaker at this time. We do, however, provide a feature that can support these scenarios: SageMaker includes a plugin called jupyter-server-proxy, which allows other web applications, such as TensorBoard, to be hosted on your SageMaker Notebook Instance.
In the case of mlflow, I was almost able to get your example working by visiting https://mynotebookinstance.notebook.us-east-1.sagemaker.aws/proxy/5000/ (notice how the port number is moved to the end of the path), but unfortunately mlflow displays an error, since it currently assumes that it is running at the root URL path.
I've created an issue in the mlflow GitHub repo: https://github.com/mlflow/mlflow/issues/1120 Please "Star" this issue to keep up-to-date on its status.
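In the meantime, one possible workaround (a hedged sketch, and only if your installed mlflow release supports the flag): mlflow server accepts a --static-prefix option that tells the UI it is being served under a URL prefix, so starting the tracking server like
! mlflow server --static-prefix /proxy/5000
and then visiting the /proxy/5000/ URL above may work.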
Best,
Kevin

Related

Setting up a Flink cluster with Podman for a beampipeline with flinkrunner

My goal is to create a streaming pipeline that reads data from Apache Kafka, processes the data, and writes back to it.
For security reasons, I want to avoid Docker and use Podman.
I have set up a minimal cluster via a docker-compose.yml with a jobmanager, a taskmanager and a Python SDK harness worker. The SDK harness worker seems to get stuck when I try to execute a pipeline.
When I run the pipeline (reading a multi-line .txt file and writing it back to a file), it gets transferred to the jobmanager and taskmanager correctly, but then goes idle. When I look in the Python SDK container, the logs show the following message repeatedly:
2022/12/04 16:13:02 Starting worker pool 1: python -m
apache_beam.runners.worker.worker_pool_main --service_port=50000
--container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1',
'--logging_endpoint=localhost:45087',
'--artifact_endpoint=localhost:35323',
'--provision_endpoint=localhost:36435',
'--control_endpoint=localhost:33237']
2022/12/04 16:16:31 Failed to obtain provisioning information: failed to
dial server at localhost:36435
caused by:
context deadline exceeded
Here is a link to a test pipeline that was created:
Example on github
Environment:
Debian 11
Podman
Python 3.2.9
apache-beam==2.38.0
podman-compose
The setup of the cluster is defined in docker-compose.yml:
1x flink-jobmanager (Flink version 1.14)
1x flink-taskmanager
1x Python SDK harness
I chose to create the SDK container manually because I don't have Docker installed, and Flink fails when it tries to create a container via Docker.
I suspect that I have made a mistake in the network setup, or that some configuration is missing for the harness worker, but I could not figure out the problem. Any thoughts?
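For reference, the worker pool in the log above implements Beam's EXTERNAL environment, so the pipeline has to be pointed at it explicitly. A minimal sketch of the options (the endpoint addresses are assumptions for this setup; with rootless Podman, host networking may also be needed so that the localhost endpoints the harness dials back to actually resolve):
from apache_beam.options.pipeline_options import PipelineOptions

# Hedged sketch: assumes the Flink job service listens on localhost:8099
# and the SDK worker pool on localhost:50000 (the service_port in the log).
options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=EXTERNAL",
    "--environment_config=localhost:50000",
])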
Crossposted to the user mailing list of beam.apache.org.

[AWS Glue]: org.apache.thrift.TApplicationException: Internal error processing createInterpreter

I'm trying to use zeppelin-0.8.0 to connect to an AWS Glue development endpoint, and when executing a cell the error below occurs.
There is no helpful message to understand what the problem could be. Any leads appreciated.
172318_1906434757 is finished, status: ERROR, exception: java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing createInterpreter, result: %text org.apache.thrift.TApplicationException: Internal error processing createInterpreter
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_createInterpreter(RemoteInterpreterService.java:209)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.createInterpreter(RemoteInterpreterService.java:192)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:169)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
UPDATE: So, as in the answer below, it looks like 0.8.0 doesn't work with Glue yet. I had problems running 0.7.x as well, with the javax.ws.rs package throwing a bunch of MethodNotFoundExceptions when running with Java 8 (switching to Java 7 via update-alternatives did not help either). But when running inside a JDK 7 Docker container it worked with no problems, and I was able to connect to my dev endpoint. I'd highly appreciate it if anyone can clarify the root cause.
Could you please provide more information, such as the Zeppelin instance location? Is it running on your desktop/laptop, or is it running as an AWS Notebook server? Also, did you try connecting with Zeppelin version 0.7.3, as mentioned in this AWS forum link:
https://forums.aws.amazon.com/thread.jspa?threadID=285128
As per the above link (dated Jul 2018), it appears AWS Glue doesn't yet support Zeppelin 0.8.
I am assuming all other configuration and environment settings are done as needed. I can help more if you provide additional info.
UPDATE:
Anyway, please refer here and to setting up zeppelin on windows for help with setting up a local development environment and Zeppelin notebook.
Once you have set up the Zeppelin notebook, establish an SSH connection (using the AWS Glue DevEndpoint URL) so you have access to the data catalog, crawlers, etc., and also to the S3 bucket where your data resides. Then you can create your Python scripts in the Zeppelin notebook and run them from Zeppelin.
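For example, the port-forwarding command from the AWS Glue docs looks roughly like this (a hedged sketch; the key path and endpoint address are placeholders, and 9007 is the port the Zeppelin interpreter is typically configured to use):
ssh -i /path/to/private-key.pem -NTL 9007:169.254.76.1:9007 glue@<dev-endpoint-public-address>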
You can use the dev instance provided by Glue, but you may incur additional costs for it (EC2 instance charges).
Environment settings (updated in response to comments):
JAVA_HOME=E:\Java7\jre7
Path=E:\Python27;E:\Python27\Lib;E:\Python27\Scripts;
PYTHONPATH=E:\spark-2.1.0-bin-hadoop2.7\python;E:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;E:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip
SPARK_HOME=E:\spark-2.1.0-bin-hadoop2.7
Change the drive name/folders accordingly. Let me know if any help is needed.

Collecting Metrics with Graphite Plugin leads to "A metric named [..] already exists" error

When I configure flink-conf.yaml to collect metrics with the Graphite plugin, most of the time only incomplete metrics are sent. On the TaskManager output, multiple errors occur like:
2018-08-15 00:58:59,016 WARN org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while registering metric.
java.lang.IllegalArgumentException: A metric named mycomputer.taskmanager.8ceab4c3dfbf9fc5fa2af0447f1373a1.State machine job.Source: Custom Source.0.numRecordsOut already exists
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:329)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:379)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:312)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:302)
at org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.<init>(OperatorIOMetricGroup.java:41)
at org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.<init>(OperatorMetricGroup.java:48)
at org.apache.flink.runtime.metrics.groups.TaskMetricGroup.addOperator(TaskMetricGroup.java:146)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:174)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:143)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
I've tried this on a completely freshly prepared flink-1.6.0 release with the following config and the precompiled "State machine job" from the examples folder:
metrics.reporters: grph
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.interval: 1 SECONDS
metrics.reporter.grph.protocol: TCP
I use the official graphite docker image (https://hub.docker.com/r/graphiteapp/docker-graphite-statsd/) that is running on the default configuration.
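For context, Flink assembles each metric's name from the scope formats in flink-conf.yaml, and in the trace above the task-level and operator-level numRecordsOut counters appear to flatten to the same name (the task and the operator are both called "Source: Custom Source"). A hedged workaround sketch, not a verified fix: the metrics.scope.* keys below are standard Flink options, but the extra "op" segment is an arbitrary choice to disambiguate the two scopes:
metrics.scope.task: <host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>
metrics.scope.operator: <host>.taskmanager.<tm_id>.<job_name>.<operator_name>.op.<subtask_index>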
Does anybody have an idea how I can fix this issue?
Thanks and best regards
UPDATE:
To rule out that a specific local setting is responsible for this behaviour, I repeated the process on a clean EC2 instance, and I get exactly the same error there.
How to reproduce:
start an EC2 t2.xlarge instance
install Java
download Flink from https://www.apache.org/dyn/closer.lua/flink/flink-1.6.0/flink-1.6.0-bin-scala_2.11.tgz
add flink-metrics-graphite-1.6.0.jar to lib
configure flink-conf.yaml as mentioned in my previous post
./bin/start-cluster.sh
./bin/flink run examples/streaming/StateMachineExample.jar
I have not set up Graphite in this case, because the error obviously occurs before that point.
After the job has been started, you can view the error in the Flink dashboard under Task Manager -> Logs.

GAE: AssertionError: No api proxy found for service "datastore_v3"

I'm writing simple code to access the dev server. Both the dev server and the datastore emulator have been started locally.
from google.appengine.ext import ndb

class Account(ndb.Model):
    name = ndb.StringProperty()

acc = Account(name=u"test").put()
print(acc)
Error:
AssertionError: No api proxy found for service "datastore_v3"
I tried to set export DATASTORE_EMULATOR_HOST=localhost:8760. It did not help.
$ dev_appserver.py ./app.yaml
WARNING 2017-02-20 06:40:23,130 application_configuration.py:176] The "python" runtime specified in "./app.yaml" is not supported - the "python27" runtime will be used instead. A description of the differences between the two can be found here:
https://developers.google.com/appengine/docs/python/python25/diff27
INFO 2017-02-20 06:40:23,131 devappserver2.py:764] Skipping SDK update check.
INFO 2017-02-20 06:40:23,508 api_server.py:268] Starting API server at: http://localhost:53755
INFO 2017-02-20 06:40:23,514 dispatcher.py:199] Starting module "default" running at: http://localhost:8080
INFO 2017-02-20 06:40:23,517 admin_server.py:116] Starting admin server at: http://localhost:8000
GAE app code cannot run as a standalone Python app; it can only run inside a GAE app, which runs inside your dev server, typically as part of handler code triggered via HTTP requests to the dev server.
You need to put that code inside one of your app handlers, for example inside the get() method of the MainPage handler from main.py in the Quickstart for Python App Engine Standard Environment (actually it'd be better in a post() method, since your code writes to the datastore).
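A minimal sketch of what that could look like (assuming the webapp2 framework used by the Quickstart; the route and response format here are arbitrary):
import webapp2
from google.appengine.ext import ndb

class Account(ndb.Model):
    name = ndb.StringProperty()

class MainPage(webapp2.RequestHandler):
    def post(self):
        # Runs inside the dev server, so the datastore API proxy is available.
        key = Account(name=u"test").put()
        self.response.write(str(key))

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)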

Generated assets not found play / scala framework

I'm new to Scala. I'm trying to get this really interesting example from Matthias Nehlsen to work - github source.
I confirmed that I'm getting Twitter data correctly (activator run). However, I'm not getting the generated public assets:
http://hostname:9000/angular/
Gives me this 404:
http://hostname:9000/assets/build/angular/birdwatch.js
http://hostname:9000/#
Gives me a 404 for this:
http://hostname:9000/assets/build/react-js/birdwatch.js
I am running typesafe-activator 1.2.3, via "activator run":
BirdWatch% activator run
[info] Loading project definition from ~/git/BirdWatch/project
[info] Set current project to BirdWatch (in build file:~/git/BirdWatch/)
--- (Running the application from SBT, auto-reloading is enabled) ---
[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
(Server started, use Ctrl+D to stop and go back to the console...)
[info] a.e.s.Slf4jLogger - Slf4jLogger started
[info] application - Starting new client
On start (forgot to add: this is after a clean):
Compiling 13 Scala sources and 1 Java source to ~/git/BirdWatch/target/scala-2.10/classes...
[warn] /Users/NathanDunn/git/BirdWatch/conf/routes:6: unreachable code
[warn] GET /cljs-dev/ controllers.BirdWatch.indexCljs
[warn] /Users/NathanDunn/git/BirdWatch/conf/routes:9: unreachable code
[warn] GET /cljs/ controllers.BirdWatch.indexCljsOpt
[warn] two warnings found
I think this is just indicating that the earlier pattern subsumes the later one.
You need to build the clients.
Quoting the author:
The different clients are located in the clients/ folder, each of them needs to be built in order to work. Client build artifacts are no longer checked into the git repository in order to avoid bloating the repository size.
Have a look at their README files:
You need Node.js, Grunt and Bower to build the angular client.
https://github.com/matthiasn/BirdWatch/tree/master/clients/angular
https://github.com/matthiasn/BirdWatch/blob/master/clients/react-js
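A typical sequence from inside one of the client folders would be along these lines (hedged; it assumes Node.js, Grunt and Bower are installed, and the exact Grunt task depends on each client's Gruntfile):
cd clients/angular
npm install     # install build dependencies
bower install   # fetch front-end packages
grunt           # run the default build task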
