`ray up` only launches Docker container on head - distributed

I am trying to use the Ray library to launch runs on multiple remote machines with Docker. Per the docs, I use ray up CONFIG_YAML to set up my cluster and ray submit [OPTIONS] CLUSTER_CONFIG_FILE SCRIPT to run a script on them. The problem is that the process/container only launches on the head node and nothing runs on the workers.
Examining the source, ray up CONFIG_YAML calls the function create_or_update_cluster and ray submit [OPTIONS] CLUSTER_CONFIG_FILE SCRIPT calls submit. Neither of these appear to interact with any node except the head.
Here is my Dockerfile: https://github.com/lobachevzky/ppo/blob/debug/Dockerfile
Here is my cluster config file: https://github.com/lobachevzky/ppo/blob/debug/tune.yaml
Here is my script: https://github.com/lobachevzky/ppo/blob/debug/tune_script.py

Related

Setting up a Flink cluster with Podman for a beampipeline with flinkrunner

My goal is to create a streaming pipeline to read data from Apache Kafka, process the data, and write back to it.
Because of security reasons, I want to avoid Docker and use Podman.
I have set up a minimal cluster via a docker-compose.yml with a jobmanager, taskmanager and a Python SDK harness worker. The SDK harness worker seems to get stuck when i try to execute a pipeline.
Running the pipeline (reading a multi-line .txt file and writing it back in a file) it gets transferred to the jobmanager and taskmanager correctly, but then goes idle. When I look in the pythonsdk container, the logs show the following message repeatedly:
2022/12/04 16:13:02 Starting worker pool 1: python -m
apache_beam.runners.worker.worker_pool_main --service_port=50000
--container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1',
'--logging_endpoint=localhost:45087',
'--artifact_endpoint=localhost:35323',
'--provision_endpoint=localhost:36435',
'--control_endpoint=localhost:33237']
2022/12/04 16:16:31 Failed to obtain provisioning information: failed to
dial server at localhost:36435
caused by:
context deadline exceeded
Here is a link to a test pipeline that was created:
Example on github
Environment:
Debian 11;
Podman;
Python 3.2.9;
apache-beam==2.38.0; and
podman-compose
The setup of the cluster defined in:
docker-compose.yml
1x flink-jobmanager (flink version 1.14)
1x flink-taskmanager
1x Python Harness SDK
I chose to create a SDK container manually because I don't have Docker installed and Flink fails when it tries to create a container
over Docker.
I suspect that I have made a mistake in the network setup or there are some configurations missing for the harness worker, but I could not figure out the problem. Any thoughts?
Crossposted in user mailing list of beam.apache.org

Deploying Haskell yesod docker container on google app engine

I am trying to upload a yesod Docker container on Google App Engine. The source code is here and the Docker image is here.
I followed the documentation in the Custom runtime quickstart, and when invoking gcloud app deploy the app builds fine after increasing the build timeout, but the container either the readiness check when trying to start or shows the following timeout message:
ERROR: (gcloud.app.deploy) Operation [apps/meeshkan-github-webhook-router/operations/xxxx-xxxx-xxxx] timed out. This operation may still be underway.
I have tried experimenting with several things, including a manual readiness check, creating an /_ah/health endpoint, and increasing the timeout of the readiness check all the way to 1799 seconds, but none of these actions seem to work.
One issue may be the size of the container (it is 3.2gb), and I could try to prune it down, but I'd only do that if someone could confirm that container size is a contributing factor to deployment problems. Other than that, I'm not sure what could be causing this failure. The docker image starts fine on our local machines.
Thanks in advance for your help and suggestions!
The issue turned out to be that, because I was building on Windows, images built using Docker Desktop on Windows gave all shell scripts executable permission automatically, whereas Docker on Linux needs shell scripts to be given the executable permission. By adding this line to my Dockerfile:
RUN chmod +x /usr/src/app/run.sh
Everything worked fine!

How to change the date of a docker container without using libfaketime? [duplicate]

I have a container with a running program inside tomcat. I need to change date only in this container and test my program behaviour. I have time sensitive logic, and sometimes need to see what happens in a few days or months later.
Is it possible in docker? I read that if I change date in container, date will get changed on the host system. But it is a bad idea for me. I need to have a few instances of this application on one server and have possibilities of setting up different time for each instance.
But when I try to change date inside the container I get the error:
sudo date 04101812
date: cannot set date: Operation not permitted
Fri Apr 10 18:12:00 UTC 2015
It is very much possible to dynamically change the time in a Docker container, without effecting the host OS.
The solution is to fake it. This lib intercepts all system call programs use to retrieve the current time and date.
The implementation is easy. Add functionality to your Dockerfile as appropriate:
WORKDIR /
RUN git clone https://github.com/wolfcw/libfaketime.git
WORKDIR /libfaketime/src
RUN make install
Remember to set the environment variables LD_PRELOAD before you run the application you want the faked time applied to.
Example:
CMD ["/bin/sh", "-c", "LD_PRELOAD=/usr/local/lib/faketime/libfaketime.so.1 FAKETIME_NO_CACHE=1 python /srv/intercept/manage.py runserver 0.0.0.0:3000]
You can now dynamically change the servers time:
Example:
def set_time(request):
import os
import datetime
print(datetime.datetime.today())
os.environ["FAKETIME"] = "2020-01-01" # string must be "YYYY-MM-DD hh:mm:ss" or "+15d"
print(datetime.today())
That's not possible with Docker. Docker uses the same clock as the outside kernel. What you need is full virtualization which emulates a complete PC.
The sudo fails because it only makes you root of the virtual environment inside of the container. This user is not related to the real root of the host system (except by name and UID) and it can't do what the real root could do.
If you use a high level language like Python or Java, you often have hooks where you can simulate a certain system time for tests or you can write code which wraps "get current time from system" and returns what your test requires.
Specifically for Java, use joda-time. There you can inject your own time source using DateTimeUtils.setCurrentMillis*().
I created a Docker image containing libfaketime for use with Alpine but the process can be done in other distributions.
Here's an example of using it Java using Groovy as an example. But Tomcat can be used as well.
FROM groovy:alpine
COPY --from=trajano/alpine-libfaketime /faketime.so /lib/faketime.so
ENV LD_PRELOAD=/lib/faketime.so \
DONT_FAKE_MONOTONIC=1
Then build and pass the FAKETIME environment variable when doing a docker run for example
docker build -f fakedemo-java.Dockerfile . -t fakedemo
docker run --rm -e FAKETIME=+15d fakedemo groovy -e "print new Date();"
Source is in trajano
/
alpine-libfaketime | Github and the docker image is in trajano/alpine-libfaketime | dockerhub
I also created a variant of it based on Ubuntu: trajano
/
ubuntu-faketime | Github
This worked for me, maybe you could try it:
dpkg-reconfigure tzdata
Edit: Execute it inside the container you are having problems. An interface will appear. There you can edit the timezone and localtime for example, and set it correctly, that fixed my problem, that was the same as yours.
Good luck!
For me, I actually needed to set the actual date for testing. I found the following options work on Mac, but you have to realize you'll be changing the date for all of your containers because you're changing the date of the underlying Alpine VM that Docker uses for all of its containers.
OPTION 1: Change the date of your host machine & restart docker
Use this when:
You can restart docker.
You can change the date of your host machine
Steps:
Stop your containers.
Change the date of your machine via the Date & Time Preferences
Restart docker.
Start your containers.
Run this sequence again to get back to the right date & time.
OPTION 2: Change the date of the Alpine VM
Use this when:
You can't restart docker.
You can't set the date of your host machine
Steps:
screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
The screen starts blank, hit enter a few times.
date -s [hh:mm]
All of your docker containers will now have your new time. You can
also use other formats, look for documentation on “busybox date” as
it’s not quite the same as other date implementations.
To exit hit control-a : and type d
This detaches the screen session, but leaves the tty running.
To reset the time:
screen -r
This resumes your tty.
ntpd -q
This uses the server defined in /etc/ntp.conf (this looks like a magic bridge back to the host clock)
To exit hit control-a : and type quit
This terminates your screen and tty session.
docker exec -it [Container Id] /bin/bash (exec into container)
rm /etc/localtime (see time zone)
ln -s /usr/share/zoneinfo/Asia/Karachi /etc/localtime (set new time zone)
My solution: I run docker on top of a virtualbox vm via vagrant.
This is not as complex as it sounds. The Vagrantfile below resets the datetime of the vm to a specific one, before the machine goes up. The corresponding docker-compose provisioner takes care of automatically running the docker-compose.yml.
Vagrantfile:
Vagrant.configure("2") do |config|
config.vm.define "ubuntuvm" do |ubuntuvm|
config.vm.box = "ubuntu/focal64"
config.vm.provision :docker
ubuntuvm.trigger.before :up do |trigger|
trigger.info = "changing time"
trigger.ruby do |env,machine|
require 'time'
machineTime = "2022/07/02 00:20:00"
offset = DateTime.parse(machineTime).strftime("%Q").to_i - DateTime.now.strftime("%Q").to_i
puts "Updating machine time to: #{machineTime} by shifting biossystemtime #{offset} ms"
puts `VBoxManage setextradata #{machine.id} "VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled" 1`
puts `VBoxManage modifyvm #{machine.id} --biossystemtimeoffset #{offset}`
end
end
end
config.vm.hostname = "ubuntuvm"
# config.vm.network "forwarded_port", guest: 80 host: 8080
# config.vm.synched_folter "../data"", "/vagrant_data"
config.vm.provider "virtualbox" do |vb|
# vb.memory = "1024"
end
config.vm.provision :docker_compose, yml: "/vagrant/docker-compose.yml", rebuild: true, run: "always"
end
Then I just need to do a the following:
# start the machine and run the docker-compose
vagrant up
Everytime I start the machine I see the messages:
ubuntuvm: Updating machine time to: 2022/07/02 00:20:00 by shifting biossystemtime -212710744 ms
ubuntuvm: Running docker-compose up...
other commands:
# get inside the machine with ssh to check its time and also check the container's time (docker commands already exist inside there due to the provisioner).
vagrant ssh ubuntuvm
# stop the machine
vagrant halt
# destroy the machine
vagrant destroy
prerequisites
install virtualbox
install vagrant
Install docker-compose provisioner (optional for using docker-compose instead of just docker)
Known issues:
On the first run only, vagrant up will fail to update the time because for some reason the before-up trigger is executed even before the machine is created. On all other executions the machine is already there, so the time is reset as expected.
I was having the same problem with my jenkins docker instance following steps fixed my problem
exec into container
docker exec -it 9d41c699a8f4 /bin/bash
See time zone
cat /etc/timezone : out put Etc/UTC
set new time zone, with nano : Asia/Colombo (your timezone here)
Restart the container

How to see console.logs of a running nodeJs application on ubuntu 18 EC2 instance?

I am new to the node world. I created a node js rest API. When I run npm start in my local machine or in the terminal for the first time, I can see console.log() in my terminal. Now, I am running the same application on an AWS Ec2 instance with Ubuntu as os. I run npm start and serve my app on port 80. I do this via ssh and after running my server I close the ssh connection. But when I reconnect via ssh, I want to see those console.log() messages in my terminal for some purpose.
I completely understand that logging messages in the terminal is not a good idea and there can be so many alternatives. Just want to know how to access the same terminal window/result as we see when I start my application.
if you are using pm2, you can try "pm2 logs"
So Nodemon won't work well in a production server or in any instance where you need to have the app going by itself.
Nodemon is a dev tool to enable you to restart your server during development. In a "real" vps you need to place the process in the background or it will be automatically killed when the connection times out.
Check out this YT series for a correct deploy architecture in an NGINX server with the use of pm2 and NGINX on a Red Hat server, I've personally used it more than once:
https://www.youtube.com/playlist?list=PLQlWzK5tU-gDyxC1JTpyC2avvJlt3hrIh

Can't connect to localhost:8080 when trying to run Google App Engine program

I'm trying to run the Google App Engine Python 2.7 Hello World program and view it in a browser via Google App Engine Launcher. I followed the install and program instructions to the letter. I copied and pasted the code in the instructions to the helloworld.py file and app.yam1 and verified that they are correct and in the directory listed as the application directory. I hit run on the launcher and it runs with no errors, although I get no sign that is has completed (orange clock symbol next to app name). I get the following from the logs:
Running dev_appserver with the following flags: --skip_sdk_update_check=yes --port=8080 --admin_port=8000 Python command: /opt/local/bin/python2.7
When I try to open in the browser via the GAE Launcher, the 'browse' icon is grayed out and the browser won't open. I tried opening localhost:8080 in Firefox and Chrome as the tutorial suggests, but I get unable to connect errors from both.
How can I view Hello World in a browser? Is there some configuration I need to make on my machine?
I had the same problem. This seemed to fix it:
cd to google_appengine, run
python dev_appserver.py --port=8080 --host=127.0.0.1 /path/to/application
at this point there is a prompt to allow updates on running, I said Yes.
At this point the app was running as it should, also when I quit this and went in using the launcher again, that worked too.
I have to manually start python and make it point to my app folder, for instance in a command line window on Windows I am using python. I installed python in C:\Python27 and my sample app is in c:\GoogleApps\guestbook
C:\Python27>dev_appserver.py c:\GoogleApps\guestbook
and then I can start my app in the Google App Engine Launcher and hit localhost 8080
How about specifying --host argument? You can find it at the bottom of following doc.
https://developers.google.com/appengine/docs/python/tools/devserver
This might be a little late. But still someone might find it useful.
When ever you go and try changing the port number from 8080 to something else, it will not get updated. So the best option is:
Go to your user directory: eg: C:\Username
There will be a Google folder. Go inside
Open the file google_appengine_projects.ini
Change your port number from 8080 to whatever you like 8081
Save it and close the file.
Launch the GAE Launcher again and you will find the changes reflected and the app runs without issues.
7: Access the application using: http://localhost:NewPort/
This can be used to change ports both run port and admin port for your individual projects running locally.
Hope this helps!
The 8080 portion of your url is a port number. Firefox disables visiting url's of other ports by default. You have to enable them by doing the following: http://blog.christoffer.me/post/2012-02-20-how-to-remove-firefoxs-this-address-is-restricted/
Paraphrasing that website:
Open firefox and visit about:conf
In the Filter box, type in network.security.ports.banned.override
If you can't find such a preference, right click to open up the pop-up menu and pick New and then String
As preference name type network.security.ports.banned.override and 8080 as the value.
Done!
It's likely if this continues to not work that your browser is behaving properly (8080 is a fairly standard port). That means that its a problem with the server and we'd have to do some more debugging.

Resources