Why does my automated build run so slowly inside a Docker container?

Why does my automated build run so slowly inside a Docker container? - c

I am currently working on an automated build/CI system for various embedded firmware projects which have been developed in Rowley Associates CrossStudio. The projects can each be built at the command line using CrossBuild.
Now, on to the Docker part:
We need a way of guaranteeing consistent build environments. A build must run identically on any engineer workstation or the build server. As all of the build steps, including running CrossBuild can be executed in a command line Linux environment, I opted to use Docker containers to guarantee environmental consistency.
My intention is to use Docker containers as disposable 'build bots' in the following way. When a build is initiated (either manually by the engineer or by an automated build process), a container is created from the appropriate image, the process runs to completion, outputs are copied to persistent storage and then the container is thrown away.
At the moment, I'm in the process of walking through the build steps manually to prove that everything works as I expected. It doesn't!
I have a Docker container with the appropriate tools installed and can manually invoke CrossBuild and successfully build my project. Unfortunately, the build takes about 30 minutes to complete. This compares to a build time of ~1.5 minutes if I use the same tool directly on my Windows workstation.
I have a Windows 7 (x64) workstation and so to run the Docker container, I'm using Boot2Docker on VirtualBox.
If I observe the CPU and memory usage of the Docker container (either by running ps -aux inside the Boot2Docker VM or observing the resource usage of the Boot2Docker VM in Windows Task Manager), barely any resources are being used (<5% CPU usage, tens of megabytes of RAM). If I build the project using CrossBuild directly on Windows, the CPU usage fluctuates but peaks at 25% (i.e. maxing out one of my 4 threads).
I have proved that, in principle, processes inside the Docker container can occupy all available CPU resources by writing a simple infinite loop in Python, running it and observing CPU usage in Task Manager on the host PC. As expected, a single core was fully utilised.
Further information
Behind the scenes, CrossBuild is driving GCC-ARM
In order to get data in to and out of the Docker container, I'm using VirtualBox shared folders and then creating the container using the -v argument for each share.
Current lines of enquiry
I just had a moment of inspiration and started to wonder whether there might be a read/write bandwidth constraint caused by the way that I'm getting data in and out of the container (i.e. the CPU is never being fully utilised as most of the time is spent waiting for reads and writes). I will investigate this possibility.

Sharing drives from Windows to VirtualBox is notoriously slow. If you want to build from your local machine, use Docker for Windows instead. If you want to replicate a cloud CI environment, you create a volume
docker volume create --name mydata
upload data to it
docker run -v mydata:/data --name temp alpine top
docker cp /my/local/dir temp:/data
docker rm -f temp
Then mount that docker volume as needed in your other CI container (that step can be included in the above).
Note that for a real CI, your data could come from other sources like github. In that case, you can create a container just to download the data into the docker volume.

Related

Why might Apache Flink write files on a Windows box, but not write files on a Linux container using simple FileSink and SimpleStringEncoder?

I'm working with the examples provided in 'flink-training' in the GitHub repository here. Specifically, I'm working on the 'ride-cleansing' example.
I've replaced the PrintSinkFunction with a simple FileSink configured as follows:
FileSink fileSink =
FileSink.forRowFormat(new Path(args[0]),
new SimpleStringEncoder<String>("UTF-8"))
.withRollingPolicy(DefaultRollingPolicy.builder()
.withRolloverInterval(Duration.ofMinutes(1))
.withInactivityInterval(Duration.ofSeconds(30))
.withMaxPartSize(512 * 512 * 512)
.build())
.build();
When I run this example on my local machine in Intellij, the expected directory are created and files are created to reflect the data streamed to the sink.
However, when I run this same example on a Linux box (on Google Colab), the directory is created, but no files are created, regardless of how long I leave it running (I've tried 10+ minutes).
On the Linux Container, I'm running the example using the gradle setup and the following command:
./gradlew :ride-cleansing:runJavaSolution --args="/content/datastream"
On the Windows box, I'm just executing the RideCleansingSolution 'main' with a simple 'Application' run configuration.
What might be different about my setup on the two systems that would decide whether data is written?

it might not work, but if you set up mono develop on whatever nx your using and write it all in c# via Xamarin in VS.NET23 it MIGHT work seamlessly across all platforms and arches... but I'm just spitballing here so `_o_/'

Running Vespa outside Docker

I'd like to run an instance of Vespa outside of a container (e.g. Docker). The Docker path is definitely quite convenient and works great. But I would like to go thru the process by hand of setting up an instance on macOS and seeing more of the 'nuts and bolts' of Vespa.
It appears there are nice docs which outline a path to building RPM's for Centos, etc. Would walking thru that process and adapting to macOS be my best bet?

Unfortunately, running Vespa on MacOS directly is not yet supported. I'd suggest instead running a CentOS VM or cloud instance and experimenting there.

Does Flink support the Windows OS?

From this tutorial, it says we can run flink by start-local.bat. But flink 1.1x has no such .bat files any more. From recent tuturial, you have to run flink by WSL or Cygwin.

Flink itself runs fine on windows. The only issue is with the scripts used to manage the cluster, and to submit jobs. For that you need some way to run linux shell scripts.
This needn't have much impact; many Flink developers never install a local cluster anyway. You can develop in your IDE just fine, and then use Docker whenever you want to bring up a local environment that resembles production.

Simple Tensorflow with Custom Packages on Google Cloud

The task: Run a tensorflow train.py script I wrote in the cloud with at least 32GB of memory.
Requirements: The script has some dependencies like numpy, scipy, and mkt. I need to be able to install these. I just want a no-nonsense ssh shell like experience. I want to put all my files including the training data in a directory, pip install the packages if necessary, then just hit python train.py and let it run. I'm not looking to run a web app or have Google's machine learning platform do it for me.
All the tutorials around seem needlessly complicated, like they're meant for scaled deployments with http requests and all that. I'm looking for a simple way to run code on a server since my computer is too weak for machine learning.

Don't use AppEngine -- use Compute Engine instead. Almost the same thing, but very simple and you are completely in control of what you run, what you install etc.
Simple steps that should work for you:
-Create a Compute Engine instance
-Chose operating system (Ubuntu xx, but you can choose others instead)
-Chose how many CPUs and how much memory you want (select Customize in order to set yourself the CPU/memory ratio rather than getting default options)
-Enable HTTP/HTTPs in order to be able to use Tensorboard later
-Once created, SSH into the machine. Python is already pre-installed (2.7 default, but 3.x also available as Python3 alias)
-Install Tensorflow, Numpy, Pandas, and whatever you want with simple PIP
-You can also install Bazel if you want to build Tensorflow from source and to speed up the CPU operations
-Install gcsfuse if you want to copy/paste stuff quickly from cloud storage buckets
-Use tmux if you want to run several Tensorflow sessions in parallel (i.e.to try different hyperparameters/etc.)
This is all very clean and simple and works really well. Don't forget to shut it down after finished. You can also create a Preemptable instance to make it super-cheap (but it can be shut down at any time without warning, but happens rarely).

Work on a remote project with Eclipse via SSH

I have the following boxes:
a) A Windows box with Eclipse CDT,
b) A Linux box, accessible for me only via SSH.
Both the compiler and the hardware required to build and run my project is only on machine B.
I'd like to work "transparently" from a Windows box on that project using Eclipse CDT and be able to build, run and debug the project remotely from within the IDE.
How do I set up that:
The building will work? Any simpler solutions than writing a local makefile which would rsync the project and then call a remote makefile to initiate the actual build? Does Eclipse managed build have a feature for that?
The debugging will work?
Preferably - the Eclipse CDT code indexing will work? Do I have to copy all required header files from machine B to machine A and add them to include path manually?

Try the Remote System Explorer (RSE). It's a set of plug-ins to do exactly what you want.
RSE may already be included in your current Eclipse installation. To check in Eclipse Indigo go to Window > Open Perspective > Other... and choose Remote System Explorer from the Open Perspective dialog to open the RSE perspective.
To create an SSH remote project from the RSE perspective in Eclipse:
Define a new connection and choose SSH Only from the Select Remote System Type screen in the New Connection dialog.
Enter the connection information then choose Finish.
Connect to the new host. (Assumes SSH keys are already setup.)
Once connected, drill down into the host's Sftp Files, choose a folder and select Create Remote Project from the item's context menu. (Wait as the remote project is created.)
If done correctly, there should now be a new remote project accessible from the Project Explorer and other perspectives within eclipse. With the SSH connection set-up correctly passwords can be made an optional part of the normal SSH authentication process. A remote project with Eclipse via SSH is now created.

The very simplest way would be to run Eclipse CDT on the Linux Box and use either X11-Forwarding or remote desktop software such as VNC.
This, of course, is only possible when you Eclipse is present on the Linux box and your network connection to the box is sufficiently fast.
The advantage is that, due to everything being local, you won't have synchronization issues, and you don't get any awkward cross-platform issues.
If you have no eclipse on the box, you could thinking of sharing your linux working directory via SMB (or SSHFS) and access it from your windows machine, but that would require quite some setup.
Both would be better than having two copies, especially when it's cross-platform.

I'm in the same spot myself (or was), FWIW I ended up checking out to a samba share on the Linux host and editing that share locally on the Windows machine with notepad++, then I compiled on the Linux box via PuTTY. (We weren't allowed to update the ten y/o versions of the editors on the Linux host and it didn't have Java, so I gave up on X11 forwarding)
Now... I run modern Linux in a VM on my Windows host, add all the tools I want (e.g. CDT) to the VM and then I checkout and build in a chroot jail that closely resembles the RTE.
It's a clunky solution but I thought I'd throw it in to the mix.

My solution is similar to the SAMBA one except using sshfs. Mount my remote server with sshfs, open my makefile project on the remote machine. Go from there.
It seems I can run a GUI frontend to mercurial this way as well.
Building my remote code is as simple as: ssh address remote_make_command
I am looking for a decent way to debug though. Possibly via gdbserver?

I tried ssh -X but it was unbearably slow.
I also tried RSE, but it didn't even support building the project with a Makefile (I'm being told that this has changed since I posted my answer, but I haven't tried that out)
I read that NX is faster than X11 forwarding, but I couldn't get it to work.
Finally, I found out that my server supports X2Go (the link has install instructions if yours does not). Now I only had to:
download and unpack Eclipse on the server,
install X2Go on my local machine (sudo apt-get install x2goclient on Ubuntu),
configure the connection (host, auto-login with ssh key, choose to run Eclipse).
Everything is just as if I was working on a local machine, including building, debugging, and code indexing. And there are no noticeable lags.

I had the same problem 2 years ago and I solved it in the following way:
1) I build my projects with makefiles, not managed by eclipse
2) I use a SAMBA connection to edit the files inside Eclipse
3) Building the project:
Eclipse calles a "local" make with a makefile which opens a SSH connection
to the Linux Host. On the SSH command line you can give parameters which
are executed on the Linux host. I use for that parameter a makeit.sh shell script
which call the "real" make on the linux host.
The different targets for building you can give also by parameters from
the local makefile --> makeit.sh --> makefile on linux host.

The way I solved that one was:
For windows:
Export the 'workspace' directory from the Linux machine using samba.
Mount it locally in windows.
Run Eclipse, using the mounted 'workspace' directory as the eclipse workspace.
Import the project you want and work on it.
For Linux:
Mount the 'workspace' directory using sshfs
Run Eclipse.
Run Eclipse, using the mounted 'workspace' directory as the eclipse workspace.
Import the project you want and work on it.
In both cases you can either build and run through Eclipse, or build on the remote machine via ssh.

For this case you can use ptp eclipse https://eclipse.org/ptp/ for source browsing and building.
You can use this pluging to debug your application
http://marketplace.eclipse.org/content/direct-remote-c-debugging

How to edit in Eclipse locally, but use a git-based script I wrote (sync_git_repo_from_pc1_to_pc2.sh) to synchronize and build remotely
The script I wrote to do this is sync_git_repo_from_pc1_to_pc2.sh.
Readme: README_git-sync_repo_from_pc1_to_pc2.md
Update: see also this alternative/competitor: GitSync:
How to use Sublime over SSH
https://github.com/jachin/GitSync
This answer currently only applies to using two Linux computers [or maybe works on Mac too?--untested on Mac] (syncing from one to the other) because I wrote this synchronization script in bash. It is simply a wrapper around git, however, so feel free to take it and convert it into a cross-platform Python solution or something if you wish
This doesn't directly answer the OP's question, but it is so close I guarantee it will answer many other peoples' question who land on this page (mine included, actually, as I came here first before writing my own solution), so I'm posting it here anyway.
I want to:
develop code using a powerful IDE like Eclipse on a light-weight Linux computer, then
build that code via ssh on a different, more powerful Linux computer (from the command-line, NOT from inside Eclipse)
Let's call the first computer where I write the code "PC1" (Personal Computer 1), and the 2nd computer where I build the code "PC2". I need a tool to easily synchronize from PC1 to PC2. I tried rsync, but it was insanely slow for large repos and took tons of bandwidth and data.
So, how do I do it? What workflow should I use? If you have this question too, here's the workflow that I decided upon. I wrote a bash script to automate the process by using git to automatically push changes from PC1 to PC2 via a remote repository, such as github. So far it works very well and I'm very pleased with it. It is far far far faster than rsync, more trustworthy in my opinion because each PC maintains a functional git repo, and uses far less bandwidth to do the whole sync, so it's easily doable over a cell phone hot spot without using tons of your data.
Setup:
Install the script on PC1 (this solution assumes ~/bin is in your $PATH):
git clone https://github.com/ElectricRCAircraftGuy/eRCaGuy_dotfiles.git
cd eRCaGuy_dotfiles/useful_scripts
mkdir -p ~/bin
ln -s "${PWD}/sync_git_repo_from_pc1_to_pc2.sh" ~/bin/sync_git_repo_from_pc1_to_pc2
cd ..
cp -i .sync_git_repo ~/.sync_git_repo
Now edit the "~/.sync_git_repo" file you just copied above, and update its parameters to fit your case. Here are the parameters it contains:
# The git repo root directory on PC2 where you are syncing your files TO; this dir must *already exist*
# and you must have *already `git clone`d* a copy of your git repo into it!
# - Do NOT use variables such as `$HOME`. Be explicit instead. This is because the variable expansion will
# happen on the local machine when what we need is the variable expansion from the remote machine. Being
# explicit instead just avoids this problem.
PC2_GIT_REPO_TARGET_DIR="/home/gabriel/dev/eRCaGuy_dotfiles" # explicitly type this out; don't use variables
PC2_SSH_USERNAME="my_username" # explicitly type this out; don't use variables
PC2_SSH_HOST="my_hostname" # explicitly type this out; don't use variables
Git clone your repo you want to sync on both PC1 and PC2.
Ensure your ssh keys are all set up to be able to push and pull to the remote repo from both PC1 and PC2. Here's some helpful links:
https://help.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh
https://help.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
Ensure your ssh keys are all set up to ssh from PC1 to PC2.
Now cd into any directory within the git repo on PC1, and run:
sync_git_repo_from_pc1_to_pc2
That's it! About 30 seconds later everything will be magically synced from PC1 to PC2, and it will be printing output the whole time to tell you what it's doing and where it's doing it on your disk and on which computer. It's safe too, because it doesn't overwrite or delete anything that is uncommitted. It backs it up first instead! Read more below for how that works.
Here's the process this script uses (ie: what it's actually doing)
From PC1: It checks to see if any uncommitted changes are on PC1. If so, it commits them to a temporary commit on the current branch. It then force pushes them to a remote SYNC branch. Then it uncommits its temporary commit it just did on the local branch, then it puts the local git repo back to exactly how it was by staging any files that were previously staged at the time you called the script. Next, it rsyncs a copy of the script over to PC2, and does an ssh call to tell PC2 to run the script with a special option to just do PC2 stuff.
Here's what PC2 does: it cds into the repo, and checks to see if any local uncommitted changes exist. If so, it creates a new backup branch forked off of the current branch (sample name: my_branch_SYNC_BAK_20200220-0028hrs-15sec <-- notice that's YYYYMMDD-HHMMhrs--SSsec), and commits any uncommitted changes to that branch with a commit message such as DO BACKUP OF ALL UNCOMMITTED CHANGES ON PC2 (TARGET PC/BUILD MACHINE). Now, it checks out the SYNC branch, pulling it from the remote repository if it is not already on the local machine. Then, it fetches the latest changes on the remote repository, and does a hard reset to force the local SYNC repository to match the remote SYNC repository. You might call this a "hard pull". It is safe, however, because we already backed up any uncommitted changes we had locally on PC2, so nothing is lost!
That's it! You now have produced a perfect copy from PC1 to PC2 without even having to ensure clean working directories, as the script handled all of the automatic committing and stuff for you! It is fast and works very well on huge repositories. Now you have an easy mechanism to use any IDE of your choice on one machine while building or testing on another machine, easily, over a wifi hot spot from your cell phone if needed, even if the repository is dozens of gigabytes and you are time and resource-constrained.
Resources:
The whole project: https://github.com/ElectricRCAircraftGuy/eRCaGuy_dotfiles
See tons more links and references in the source code itself within this project.
How to do a "hard pull", as I call it: How do I force "git pull" to overwrite local files?
Related:
git repository sync between computers, when moving around?

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight