This tutorial says we can run Flink with start-local.bat, but Flink 1.1x no longer ships such .bat files. According to more recent tutorials, you have to run Flink via WSL or Cygwin.
Flink itself runs fine on Windows. The only issue is with the scripts used to manage the cluster and to submit jobs; for those you need some way to run Linux shell scripts.
This needn't have much impact; many Flink developers never install a local cluster anyway. You can develop in your IDE just fine, and then use Docker whenever you want to bring up a local environment that resembles production.
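The Docker route can be sketched roughly as follows. This is a minimal, hypothetical example using the official flink image (adjust the tag to your Flink version; the network and container names are placeholders):

```shell
# Start a local Flink session cluster: one JobManager and one TaskManager.
docker network create flink-net

docker run -d --name jobmanager --network flink-net -p 8081:8081 \
  -e FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" \
  flink:1.14-scala_2.12 jobmanager

docker run -d --name taskmanager --network flink-net \
  -e FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager" \
  flink:1.14-scala_2.12 taskmanager
```

The web UI is then reachable at http://localhost:8081, and you can submit jobs from your IDE or via the REST API without any Windows shell-script workarounds.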
Related
I'd like to run an instance of Vespa outside of a container (e.g. Docker). The Docker path is definitely quite convenient and works great, but I would like to go through the process of setting up an instance on macOS by hand and see more of the 'nuts and bolts' of Vespa.
It appears there are nice docs outlining a path to building RPMs for CentOS, etc. Would walking through that process and adapting it to macOS be my best bet?
Unfortunately, running Vespa directly on macOS is not yet supported. I'd suggest instead running a CentOS VM or cloud instance and experimenting there.
I have developed a hybrid framework using a Maven project (POM), TestNG, etc. It's running fine. Now I want to copy the entire project from one laptop to another, so that I can continue my work on the first laptop and use the second laptop just to execute the scripts, which will save me a lot of time.
I take a daily backup to OneDrive. I have some questions:
Can anybody guide me on how to copy the entire project? Do I need the same versions of Java and Eclipse on the second laptop? Does anything else need to be installed?
On a daily basis, how do I get the backup data from OneDrive onto the second laptop?
This sounds like you want a repository. Use GitHub, GitLab, Bitbucket, or just git in general; that's exactly what it is for.
As for your Java and Eclipse versions, you need to look at the version of Selenium you're running, which packages you are using, etc., and determine for yourself which Java version you should be on. The latest JDK is going to have everything the earlier ones had, so it's usually a safe bet to use the latest stable version. Your Eclipse version should always be the latest as well, since it is just an IDE and shouldn't have any impact on how your program runs.
Another option is to keep a project-local copy of the Java runtime inside the project (analogous to a Python virtual environment) and upload it to your git repository, so it can be carried along with the project, although this bloats your repository massively.
Try using git and GitHub; then you don't have to take backups or be tied to a specific laptop.
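The git workflow above can be sketched like this (the repository URL and names are placeholders; substitute your own):

```shell
# On the first laptop: put the project under version control.
cd my-hybrid-framework
git init
git add .
git commit -m "Initial commit of test framework"
git remote add origin https://github.com/yourname/my-hybrid-framework.git
git push -u origin main

# On the second laptop: clone once, then pull whenever you want the latest.
git clone https://github.com/yourname/my-hybrid-framework.git
cd my-hybrid-framework
git pull
```

This replaces the OneDrive copy step entirely: every `git pull` on the second laptop brings it up to date with whatever you last pushed from the first.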
I am hoping for guidance on how to set --environment_config when running the Beam wordcount.py demo.
It runs fine with the DirectRunner. Flink's wordcount also runs fine (i.e. running Flink via flink run).
I would like to run Beam with the Flink runner against a "separate Flink cluster" as described in the Beam documentation. I can't use Docker, so I plan to use --environment_type=PROCESS.
I am using the following inside the python code to set environment_config:
environment_config = dict()
environment_config['os'] = platform.system().lower()
environment_config['arch'] = platform.machine()
environment_config['command'] = 'ls'
ec = "--environment_config={}".format(json.dumps(environment_config))
Obviously the command is incorrect. When I run this, Flink does receive and successfully process the DataSource sub-tasks, but it eventually times out on the CHAIN MapPartition sub-tasks.
Could someone provide guidance (or links) as to how to set environment_config? I am running Beam within a Singularity container.
For environment_type=DOCKER, almost everything is taken care of for you, but in PROCESS mode you have to do a lot of the setup yourself. The command you're looking for is sdks/python/container/build/target/launcher/linux_amd64/boot. You will need both that executable (which you can build from source using ./gradlew :sdks:python:container:build) and a Python installation including Beam and the other dependencies on all of your worker machines.
The best example I know of is here: https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165
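Putting that together, a submission might look roughly like this. This is a sketch, not a tested invocation: the job endpoint, paths, and input/output locations are placeholders, and the boot path must point at wherever you installed the executable on the workers:

```shell
# Submit wordcount to a separate Flink cluster in PROCESS mode.
# --environment_config carries a JSON object whose "command" is the
# boot executable built by ./gradlew :sdks:python:container:build.
python -m apache_beam.examples.wordcount \
  --input /tmp/input.txt \
  --output /tmp/counts \
  --runner PortableRunner \
  --job_endpoint localhost:8099 \
  --environment_type PROCESS \
  --environment_config '{"command": "/opt/beam/sdks/python/container/build/target/launcher/linux_amd64/boot"}'
```

In other words, "command" should name the boot launcher rather than an arbitrary shell command like ls; the os/arch keys in your snippet are not what times out here, the missing worker launcher is.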
The task: Run a tensorflow train.py script I wrote in the cloud with at least 32GB of memory.
Requirements: The script has some dependencies like numpy, scipy, and mkt, and I need to be able to install them. I just want a no-nonsense, SSH-shell-like experience: put all my files, including the training data, in a directory; pip install the packages if necessary; then just run python train.py and let it go. I'm not looking to run a web app or have Google's machine-learning platform do it for me.
All the tutorials around seem needlessly complicated, like they're meant for scaled deployments with http requests and all that. I'm looking for a simple way to run code on a server since my computer is too weak for machine learning.
Don't use App Engine; use Compute Engine instead. Almost the same thing, but much simpler, and you are completely in control of what you run, what you install, etc.
Simple steps that should work for you:
-Create a Compute Engine instance
-Choose an operating system (Ubuntu xx, but you can choose others instead)
-Choose how many CPUs and how much memory you want (select Customize in order to set the CPU/memory ratio yourself rather than taking the default options)
-Enable HTTP/HTTPS in order to be able to use TensorBoard later
-Once created, SSH into the machine. Python comes pre-installed (2.7 by default, but 3.x is also available as python3)
-Install TensorFlow, NumPy, Pandas, and whatever else you want with plain pip
-You can also install Bazel if you want to build TensorFlow from source and speed up CPU operations
-Install gcsfuse if you want to copy stuff quickly to and from Cloud Storage buckets
-Use tmux if you want to run several TensorFlow sessions in parallel (i.e. to try different hyperparameters, etc.)
This is all very clean and simple and works really well. Don't forget to shut the instance down when finished. You can also create a preemptible instance to make it super cheap (it can be shut down at any time without warning, but that happens rarely).
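The steps above can be sketched from the command line with gcloud instead of the web console. Instance name, zone, and machine shape are placeholders; adjust them to your project:

```shell
# Create a custom-shape, preemptible Ubuntu VM with 32 GB of RAM.
gcloud compute instances create train-vm \
  --zone us-central1-a \
  --custom-cpu 8 --custom-memory 32GB \
  --image-family ubuntu-2004-lts --image-project ubuntu-os-cloud \
  --preemptible

# SSH in, install dependencies, and run the training script.
gcloud compute ssh train-vm --zone us-central1-a
pip install tensorflow numpy scipy
python train.py

# When finished, stop paying for it.
gcloud compute instances delete train-vm --zone us-central1-a
```

Running the training inside tmux (as suggested above) also means the job survives if your SSH session drops.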
I am currently working on an automated build/CI system for various embedded firmware projects which have been developed in Rowley Associates CrossStudio. The projects can each be built at the command line using CrossBuild.
Now, on to the Docker part:
We need a way of guaranteeing consistent build environments. A build must run identically on any engineer workstation or the build server. As all of the build steps, including running CrossBuild can be executed in a command line Linux environment, I opted to use Docker containers to guarantee environmental consistency.
My intention is to use Docker containers as disposable 'build bots' in the following way. When a build is initiated (either manually by the engineer or by an automated build process), a container is created from the appropriate image, the process runs to completion, outputs are copied to persistent storage and then the container is thrown away.
At the moment, I'm in the process of walking through the build steps manually to prove that everything works as I expected. It doesn't!
I have a Docker container with the appropriate tools installed and can manually invoke CrossBuild and successfully build my project. Unfortunately, the build takes about 30 minutes to complete. This compares to a build time of ~1.5 minutes if I use the same tool directly on my Windows workstation.
I have a Windows 7 (x64) workstation and so to run the Docker container, I'm using Boot2Docker on VirtualBox.
If I observe the CPU and memory usage of the Docker container (either by running ps -aux inside the Boot2Docker VM or observing the resource usage of the Boot2Docker VM in Windows Task Manager), barely any resources are being used (<5% CPU usage, tens of megabytes of RAM). If I build the project using CrossBuild directly on Windows, the CPU usage fluctuates but peaks at 25% (i.e. maxing out one of my 4 threads).
I have proved that, in principle, processes inside the Docker container can occupy all available CPU resources by writing a simple infinite loop in Python, running it and observing CPU usage in Task Manager on the host PC. As expected, a single core was fully utilised.
Further information
Behind the scenes, CrossBuild is driving GCC-ARM
In order to get data in to and out of the Docker container, I'm using VirtualBox shared folders and then creating the container using the -v argument for each share.
Current lines of enquiry
I just had a moment of inspiration and started to wonder whether there might be a read/write bandwidth constraint caused by the way that I'm getting data in and out of the container (i.e. the CPU is never being fully utilised as most of the time is spent waiting for reads and writes). I will investigate this possibility.
Sharing drives from Windows to VirtualBox is notoriously slow. If you want to build from your local machine, use Docker for Windows instead. If you want to replicate a cloud CI environment, you can create a volume:
docker volume create --name mydata
upload data to it
docker run -v mydata:/data --name temp alpine top
docker cp /my/local/dir temp:/data
docker rm -f temp
Then mount that docker volume as needed in your other CI container (that step can be included in the above).
Note that for a real CI setup, your data could come from other sources such as GitHub. In that case, you can create a container just to download the data into the Docker volume.
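That download-into-volume step can be sketched like this, using a throwaway container with git installed (the repository URL and image names are placeholders):

```shell
# Populate the named volume directly from a git repository, bypassing
# the slow Windows/VirtualBox shared folder entirely.
docker volume create --name mydata
docker run --rm -v mydata:/data alpine/git \
  clone https://github.com/yourorg/yourproject.git /data/project

# Mount the same volume in the build container; the sources now live
# on the Linux filesystem, so CrossBuild reads them at native speed.
docker run --rm -v mydata:/data my-crossbuild-image \
  crossbuild /data/project/yourproject.hzp
```

Only the final build outputs then need to be copied back out (e.g. with docker cp), so the shared-folder bottleneck is limited to a single small transfer instead of every file access during the build.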