I am trying to create a very simple Docker application that includes a persistent SQL database and an R script that adds data to the database at regular intervals (but can be run more often, if needed). I am new to multi-container applications, but it's simple enough to run on demand: I have an SQL database container and an R-based script container that connects to it. But what is the best way to schedule calls to the script?
I could create a cron job inside the container housing the script and run it repeatedly that way, but this feels like it might violate the "one process per container" principle, and it wouldn't scale easily if I made things more complex. I could also run cron on my host, but that feels wrong. Other sites suggest creating a separate, persistent container just to coordinate cron jobs.
But if the job that I want run is IN a dockerized container itself, what's the best way to accomplish this? Can a cron container issue a docker run command on the "sleeping" script container? If possible, I assume it's best to only have containers running when you actually need them.
Lastly, would this all be able to be written into a docker-compose file?
I've successfully used a cron scheduler INSIDE the container housing the R script and OUTSIDE of all the containers as part of my host's crontab, but my research suggests these are bad ways to do it.
I also think that scheduled jobs should run in their own containers if possible.
You can make a super simple container scheduler using docker-in-docker and cron like this:
Dockerfile
FROM docker:dind
COPY crontab .
CMD crontab crontab && crond -f
example crontab file
* * * * * docker run -d hello-world
build and run with
docker build -t scheduler .
docker run -d -v /var/run/docker.sock:/var/run/docker.sock scheduler
It'll then run the hello-world image once every minute.
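To answer the docker-compose part of the question: yes, the database and this cron scheduler can both be declared as services in one compose file. Here is a rough sketch, in which the image names (postgres, my-r-script), the service names, and the volume name are placeholders assumed for illustration:

version: "3"
services:
  db:
    image: postgres:15
    volumes:
      - dbdata:/var/lib/postgresql/data
  scheduler:
    build: .
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
volumes:
  dbdata:

The crontab inside the scheduler image would then start the script container on demand, e.g. * * * * * docker run --rm --network <project>_default my-r-script, where the exact network name depends on your compose project name. That way the script container only exists while a job is actually running.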
Like many AngularJS developers, I have a test suite of Protractor e2e tests. The test suite takes about half an hour to an hour to run. I would like to be able to run the tests nightly using some kind of cloud-based setup, if possible. I'm having trouble figuring out how to host and run the Protractor tests.
Is there a common cloud setup or some easy setup for running protractor e2e tests either on check-in or for a nightly build?
The easiest way (I'm not saying the best way), and the one I'm currently using, is to set up a Task Scheduler job that runs on a remote machine. The job is triggered at 2 AM and runs a Windows batch file with a few commands: the first pulls the latest version from git, the second changes to the directory where my automated tests can be run, and the last runs the tests (more precisely, the smoke suite):
git pull origin develop
cd C:\arkcase-test\protractor
protractor protractor.chrome.conf.js --suite=smoke
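For completeness, the 2 AM trigger itself can also be created from the command line with Windows' schtasks, assuming the three commands above are wrapped in a batch file (the task name and batch file path below are made up):

schtasks /Create /TN "NightlyProtractorSmoke" /TR "C:\arkcase-test\run-nightly-smoke.bat" /SC DAILY /ST 02:00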
A Jenkins job is fine for this; you can trigger an email reporting build success or failure after the nightly run.
You can even attach your HTML report to the email.
However, the slave PC has to stay online, connected, and running the Jenkins client.
My team is trying to set up an Apache Flink (v1.4) cluster on Mesos/Marathon. We are using the Docker image provided by Mesosphere. It works really well!
Because of a new requirement, the task managers have to be launched with extended runtime privileges. We can easily enable these runtime privileges for the app manager via the Marathon web UI. However, we cannot find a way to enable the privileges for the task managers.
In Apache Spark, we can set spark.mesos.executor.docker.parameters privileged=true in Spark's configuration file, so Spark can pass this parameter to the docker run command. I am wondering if Apache Flink allows us to pass a custom parameter to docker run when launching task managers. If not, how can we start task managers with extended runtime privileges?
Thanks
There is a new parameter, mesos.resourcemanager.tasks.container.docker.parameters, introduced in this commit, which allows passing arbitrary parameters to Docker.
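Assuming it accepts the same key=value form that Spark's spark.mesos.executor.docker.parameters uses, the flink-conf.yaml entry for the original question would presumably look like:

mesos.resourcemanager.tasks.container.docker.parameters: privileged=true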
Unfortunately, this is not possible as of right now (or only for the framework scheduler as Tobi pointed out).
I went ahead and created a Jira for this feature so you can keep track/add details/contribute it yourself: https://issues.apache.org/jira/browse/FLINK-8490
You should be able to tweak the setting for the parameters in the ContainerInfo of https://github.com/mesoshq/flink-framework/blob/master/index.js to support this. I’ll eventually update the Flink version in the Docker image...
I love using docker & docker-compose for both development and production environments.
But in my workflow, I keep treating containers as disposable:
if I need to add a feature to my image, I edit my Dockerfile, then run docker-compose build and docker-compose up -d, and I'm done.
But this time, the production DB is also in a Docker container.
I still need to make some changes to my environment (e.g. configuring backups), but now I can't just rerun docker-compose build because that would mean losing all the data... So I have to enter the container (docker-compose run web /bin/bash) and run the commands inside it, while still copying them into my local Dockerfile to keep track of my changes.
Are there any best practices regarding this situation?
I thought of setting up a process that would dump the DB to an S3 bucket before container destruction, but that doesn't really scale to large DBs...
I thought of making a container non-destructible (how?), though that means losing the disposability of containers.
I thought of having a special partition in charge of storing only the data, one that would not get destroyed when rebuilding the image, though that feels hard to set up and insecure.
So what should I do?
Thanks
This is what data volumes are for. There is a whole page on the docker documentation site covering this.
The idea is that when you destroy the container, the data volume persists with data on it, and when you restart it the data hasn't gone anywhere.
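As a minimal sketch with docker-compose (the image, service, and volume names are placeholders, and the MySQL data path is just one example), a named volume keeps the data across docker-compose build and up cycles:

version: "3"
services:
  db:
    image: mysql:8
    volumes:
      - dbdata:/var/lib/mysql
volumes:
  dbdata:

Rebuilding the image or recreating the container leaves dbdata in place; only an explicit docker-compose down -v (or docker volume rm) would delete it.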
I will say, though, that putting databases in Docker containers is hard. People have done it and suffered severe data loss, and severe job loss.
I would recommend reading extensively on this topic before trusting your production data to docker containers. This is a great article explaining the perils of doing this.
I'm looking at implementing Team City and Octopus Deploy for CI and deployment on demand. However, database deployment is going to be tricky, as many of these are old .NET applications with messy databases.
Redgate seems to have a nice plug-in for Team City, but the price will probably be a stumbling block.
What do you use? I'm happy to execute scripts, but it's the comparison aspect (i.e. what has changed) I'm struggling with.
We utilize a free tool called RoundhousE for handling database changes with our project, and it was rather easy to use it with Octopus Deploy.
We created a new project in our solution called DatabaseMigration, included the RoundhousE exe in the project along with a folder where we keep the db change scripts for RoundhousE, and then took advantage of how Octopus can call PowerShell scripts before, during, and after deployment (PreDeploy.ps1, Deploy.ps1, and PostDeploy.ps1 respectively). We added a Deploy.ps1 to the project as well, with the following in it:
$roundhouse_exe_path = ".\rh.exe"
$scripts_dir = ".\Databases\DatabaseName"
$roundhouse_output_dir = ".\output"
if ($OctopusParameters) {
    $env = $OctopusParameters["RoundhousE.ENV"]
    $db_server = $OctopusParameters["SqlServerInstance"]
    $db_name = $OctopusParameters["DatabaseName"]
} else {
    $env = "LOCAL"
    $db_server = ".\SqlExpress"
    $db_name = "DatabaseName"
}
&$roundhouse_exe_path -s $db_server -d $db_name -f $scripts_dir --env $env --silent -o $roundhouse_output_dir
There you can see that we check for any Octopus variables (parameters) passed in when Octopus runs the deploy script; otherwise we fall back to some default values, and then we simply call the RoundhousE executable.
Then you just need to have that project as part of what gets packaged for Octopus, and add a step in Octopus to deploy that package; the script will then run as part of each deployment.
We've looked at the Redgate solution and pretty much reached the same conclusion you have; unfortunately, it's the cost that is putting us off that route.
The only thing I can think of is to generate version-controlled DB migration scripts based on your existing database, and then execute these as part of your build process. If you're looking at .NET projects in the future (ones that don't use a CMS), you could potentially consider using Entity Framework Code First migrations.
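For what it's worth, with Entity Framework Code First the Package Manager Console commands for generating and applying those version-controlled migrations are along these lines (the migration name is a made-up example):

Add-Migration AddCustomerEmailColumn
Update-Database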
I remember looking into this a while back, and for me it seems there's a whole lot of trust you'd have to put into this sort of process. Auto-deploying to a development or testing server isn't so bad, as the data is probably replaceable... but the idea of auto-updating a UAT or production server might send the willies up the backs of an operations team, who might be responsible for the database, or at least for restoring it if it wasn't quite right.
Having said that, I do think it's the way to go, as it's far too easy to be scared of database deployment scripts, and that's when things get forgotten or missed.
I seem to remember looking at using Red Gate's SQL Compare and SQL Data Compare tools, as (I think) there was a command-line way into it, which would work well with scripted deployment processes, like Team City, CruiseControl.Net, etc.
The risk and complexity come in more when using relational databases. In a NoSQL database where everything is a "document", I guess continuous deployment is not such a concern. Some objects will keep the "old" data structure until they are updated by the newly released code, so your code would potentially need to support different data structures. Missing properties, or properties with a different type, should probably be handled by a well-written, defensively coded application anyway.
I can see the risk in running scripts against the production database, however the point of CI and Continuous Delivery is that these scripts will be run and tested in other environments first to iron out any "gotchas" :-)
This doesn't reduce the amount of finger crossing and wincing when you actually push the button to deploy though!
Database deployment automation is a real challenge, especially when trying to follow the build-once-deploy-many approach used for native application code.
With build once, deploy many, you compile the code, create binaries, and then copy them across environments. From the database point of view, the equivalent is to generate the scripts once and execute them in all environments. This approach doesn't handle merges from different branches, out-of-process changes (a critical fix in production), etc.
What I know works for database deployment automation (disclaimer - I work at DBmaestro), as I hear from my customers, is the build-and-deploy-on-demand approach. With this method you build the database delta script as part of the deploy (execute) process. Using baseline-aware analysis, the solution knows whether to generate the deploy script for the change, protect the target and not revert it, or pause and allow you to merge changes and resolve the conflict.
Consider a simple solution we have tried successfully, described in this thread: How to continuously deliver a SQL-based app?
Disclaimer - I work at CloudMunch
We use Octopus Deploy and database projects in our Visual Studio solution.
The build agent creates a NuGet package using OctoPack, with a dacpac file and publish profiles inside, and pushes it to a NuGet server.
The release process then uses the SqlPackage.exe utility to generate the update script for the release environment and adds it as an artifact to the release.
The previously created script is executed in the next step with the SQLCMD.exe utility.
This separation of the create and execute steps gives us the possibility of having a manual step in between, so that someone can verify the script before it is executed on the live environment. Not to mention that a script saved as an artifact in the release can always be referred to at any later point.
If there is demand, I can provide more details and the step scripts.
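For reference, the generate-then-execute pair described above could look roughly like this; the server, database, dacpac, profile, and script names are placeholders, and the exact SqlPackage.exe arguments depend on your publish profile setup:

SqlPackage.exe /Action:Script /SourceFile:MyDb.dacpac /Profile:Production.publish.xml /OutputPath:upgrade.sql
sqlcmd -S PRODSQL01 -d MyDb -i upgrade.sql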
What I want is to run my WPF automation tests (integration tests) in the continuous integration process when possible. That means every time something is pushed to source control, I want to trigger an event that starts the WPF automation tests. However, integration tests are slower than unit tests, which is why I would like to execute them in parallel, on several virtual machines. Is there any framework or tool that allows me to run my WPF automation tests in parallel?
We use Jenkins. Our system tests are built on top of a proprietary framework written in C#.
Jenkins allows jobs to be triggered by SCM changes (SVN, Git, and Mercurial are all supported via plugins). It also allows jobs to be run on remote slaves (in parallel, if needed). You do need to configure your jobs and slaves by hand. Configuring jobs can be done with build parameters: say you have only one job that accepts test IDs as parameters and can run on several slaves; you can then configure one trigger job that starts several test jobs on different slaves, passing test IDs to them as parameters.
Configuring slaves is made much easier when your slaves are virtual machines. You configure one VM and then copy it (make sure that node-specific information, such as the node name, is not hard-coded and is easily configurable).
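As a concrete illustration of the trigger idea (the Jenkins URL, job name, parameter name, and credentials below are all made up), a parameterized test job can also be kicked off through the Jenkins HTTP API:

curl -X POST "http://jenkins.example.com/job/wpf-tests/buildWithParameters?TEST_ID=smoke" --user alice:apitoken

Depending on your security settings you may also need to pass a CSRF crumb, and in practice the trigger job would usually start its downstream jobs through something like the Parameterized Trigger plugin rather than curl.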
The main advantages of Jenkins:
It's free
It has an extensible architecture that allows people to extend it via plugins. As a matter of fact, at this stage (unlike, say, a year and a half ago) everything I need to do can be done either via plugins or via the Jenkins HTTP API.
It can be rapidly deployed. Jenkins runs in its own servlet container. You can deploy it and start playing with it in less than an hour.
Active community. Check, for example, the [jenkins], [hudson], and [jenkins-plugins] tags on SO.
I propose that you give it a try: play with it for a couple of days, and chances are you'll like it.
Update: here's an old answer I've liked that recommends Jenkins.