How to properly initialize a DB container in Kubernetes

I will have one container in the registry with the app. Then I would like to start a DB container and initialize the DB with scripts.
How do Kubernetes Jobs and init containers fit into this flow? My init container will invoke `flyway migrate`, which of course must run only once the DB is ready. I am also not sure where to place the scripts themselves (I don't want to refer to any hostPath) and how to choose which set should be invoked (i.e. create test data only for development).
Is it overcomplicated to create a single set of Kubernetes files for all environments (DEV, FAT, PROD) with such a setup?
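One common pattern can be sketched as below. This is only a sketch under assumptions: the `db` service name, image tags, credentials, and the `flyway-sql` ConfigMap are all placeholders, not anything from the question. An initContainer blocks until the database answers, then the Job container runs the Flyway migration, with the SQL scripts mounted from a ConfigMap rather than a hostPath:

```yaml
# Sketch only: a Job that waits for the DB, then runs Flyway migrations.
# Service name "db", credentials, and the "flyway-sql" ConfigMap are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: flyway-migrate
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: wait-for-db
          image: busybox:1.36
          command: ["sh", "-c", "until nc -z db 5432; do echo waiting for db; sleep 2; done"]
      containers:
        - name: flyway
          image: flyway/flyway:10
          args: ["-url=jdbc:postgresql://db:5432/app", "-user=app", "-password=secret", "migrate"]
          volumeMounts:
            - name: sql
              mountPath: /flyway/sql
      volumes:
        - name: sql
          configMap:
            name: flyway-sql   # per-environment ConfigMap chooses which scripts run
```

Choosing the script set per environment (e.g. test data only in DEV) can then be done by pointing the ConfigMap, or a Kustomize/Helm value, at a different set of files, which keeps a single set of manifests workable across DEV, FAT, and PROD.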

Related

How do you manage static data for microservices?

For a database-per-service architecture, how do you guys manage your static data for each microservice? I want to make it easy for a new developer to jump in and get everything up and running easily on their local machine. I'm thinking of checking the entire database with static data into source control with Docker bind mounts so people can just docker-compose up the database service locally (along with whatever other infrastructure services they might need to run and test their microservice).
I know each microservice might need to handle this in their own way, but I'd like to provide a good default template for people to start with.
Making a standard for how to do this sort of goes against the reason for making microservices, i.e. that you can adapt each microservice to the context it exists in.
That being said, Postgres, Mongo and MySQL all run scripts in /docker-entrypoint-initdb.d when initializing a fresh database instance. The scripts have to fit the database obviously, but it's a fairly standardized way of doing it.
They all have descriptions of how to do it on the image page on docker hub.
You can either get your scripts into the container by making a custom image that contains the scripts or you can map them into the directory using a docker-compose volume mapping.
There are some databases that don't have an easy way to initialize a new database. MSSQL comes to mind. In that case, you might have to handle it programmatically.
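As a concrete sketch of the volume-mapping variant described above (the service name, image tag, and `./initdb` directory are assumptions), the official Postgres image runs anything mounted into `/docker-entrypoint-initdb.d` the first time it starts with an empty data directory:

```yaml
# docker-compose.yml sketch: seed a fresh Postgres with checked-in scripts.
# Scripts in ./initdb (e.g. 01-schema.sql, 02-static-data.sql) run once,
# in lexical order, only when the data volume is still empty.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: devonly   # local development only
    volumes:
      - ./initdb:/docker-entrypoint-initdb.d:ro
      - dbdata:/var/lib/postgresql/data
volumes:
  dbdata:
```

Checking `./initdb` into source control then gives new developers a one-command `docker-compose up` with schema and static data in place.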

airflow initdb in directory other than AIRFLOW_HOME?

Question for Apache Airflow / Docker users. I have a Docker airflow image I've built and I'm trying to use a simple SequentialExecutor / sqlite metadata database, but I'd like to persist the metadata database every time a new container is run. I'd like to do this by mounting to a drive on the local machine, and having it so initdb initializes the database somewhere other than AIRFLOW_HOME. Is this possible / configurable somehow or does anyone have a better solution?
Basically the desired state is:
AIRFLOW_HOME: contains airflow.cfg, dags, scripts, logs whatever
some_other_dir: airflow.db
I know this is possible with logs, so why not the database?
Thanks!
I think the best option is to use docker-compose with a container as metadata database, like this:
https://github.com/apache/airflow/issues/8605#issuecomment-623182960
I use this approach along with git branches, and it works very well. The data persists unless you explicitly remove the containers with make rm.
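The setup in the linked comment boils down to something like the following sketch (image tags, credentials, and the volume name here are assumptions, not the exact contents of that issue): a Postgres container holds the metadata database, and a named volume persists it across container restarts:

```yaml
# docker-compose.yml sketch: Postgres as the Airflow metadata DB.
# Image tags, credentials, and the volume name are assumptions.
services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - airflow-db:/var/lib/postgresql/data   # metadata survives container restarts
  webserver:
    image: my-airflow-image   # your custom-built image
    environment:
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    depends_on:
      - postgres
volumes:
  airflow-db:
```

This sidesteps the sqlite-location question entirely: the metadata lives in the named volume rather than under AIRFLOW_HOME.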

How does Docker handle concurrency on database when many containers are using the same database basic image?

I mean, if it works to save time: I read that you can run tests in parallel in several containers over the same base image of an app, with each container saving only its own changes, so how does that work with DB concurrency?
When multiple containers are run from a given base image, they each work on their own copy of the image's filesystem, so there is no concurrency issue: each container is its own database.
See the docs for more information.

Change database location of docker couchdb instance?

I have a server with two disks: one is an SSD used by the operating system and the other is a normal 2.5TB HDD.
Now on this server I'm running Fedora Server 22 with Docker, and there is one image currently running: Fedora/couchdb.
The problem is that this container is saving the database to the much smaller SSD when it should really be stored on the much bigger HDD.
How can I set up this image to store the database on the HDD?
Other Information:
You can specify the database and index location for CouchDB in the config file with:
[couchdb]
database_dir = /path/to/dir
view_index_dir = /path/to/dir
How to add a custom configuration at startup is explained here:
http://docs.couchdb.org/en/1.6.1/config/intro.html
Of course, in order to use the desired path for your DBs in the container, you need to make sure it is accessible inside the Docker image.
This explains everything you need to do:
https://docs.docker.com/engine/userguide/dockervolumes/
If that does not help, please explain in more detail what your setup is and what you want to achieve.
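Putting both pieces together, a minimal sketch might look like this. The `/mnt/hdd` mount point and the in-container paths are assumptions for a CouchDB 1.x image; check the image's documentation for the exact data and config directories:

```yaml
# docker-compose.yml sketch: keep CouchDB data on the big HDD.
# Host paths and in-container paths are assumptions for a 1.x image.
services:
  couchdb:
    image: couchdb:1.6
    volumes:
      - /mnt/hdd/couchdb/data:/usr/local/var/lib/couchdb        # database_dir
      - ./local.ini:/usr/local/etc/couchdb/local.d/local.ini:ro # custom config
```

With the data directory bind-mounted from the HDD, the `database_dir` setting inside the container can stay at its default while the bytes land on the larger disk.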

What's the best way to set up Git staging and production environments with separate databases?

I have 2 branches in Git - staging and production. They are deployed to the same VPS where there is one production database and a separate staging database. This allows us to stage new features without affecting the production environment. Then, when we're ready, we replicate the database changes from staging to production.
What is the best way to set this up so that the staging branch has separate database credentials from production? At the moment, the database creds are stored in a single file. I've been thinking about using gitignore to ignore this file in both branches and editing it manually so that it remains different on each branch. Is this the best thing to do, or is there a better way?
We use a cascading approach:
Default settings are in a common "config" file.
Each stage of development has its own configuration file; for example, we have a config_prod and a config_dev.
Each stage runs as a different (system) user, and for that user we set an environment variable PROJ_SETTINGS and point it to the file that we need to load.
The code then reads the defaults and overrides them with whatever is available from the resource pointed to by the environment variable (if it exists).
Setting this variable is taken care of by our normal devops/automation scripts. This gives us a few advantages:
Keeps all configuration under version control.
Easy to switch settings without modifying the source.
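The cascading load described above can be sketched in a few lines of Python. The file names, the PROJ_SETTINGS variable, and the INI format are assumptions here; the answer does not specify a language or config format:

```python
# Sketch: load default settings, then override them from the file named by
# PROJ_SETTINGS if that environment variable is set and the file exists.
# File names and the env-var name are assumptions, not the answer's exact setup.
import configparser
import os

def load_settings(defaults_path="config"):
    settings = configparser.ConfigParser()
    settings.read(defaults_path)  # defaults; a missing file is silently skipped
    override = os.environ.get("PROJ_SETTINGS")
    if override and os.path.exists(override):
        settings.read(override)   # per-stage values win over the defaults
    return settings
```

Each stage's user only differs in the value of PROJ_SETTINGS, so the same code and the same version-controlled files serve every environment.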
Yes, gitignoring the database.yml file is an approach I've used in a few organizations.
We usually keep a database.yml.sample in source control to make it easier. Users just copy that to database.yml and modify it as appropriate.
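A minimal database.yml.sample might look like the sketch below. The keys follow the common Rails convention; the actual structure depends on your framework and is an assumption here:

```yaml
# database.yml.sample -- copy to database.yml (gitignored) and fill in real values
staging:
  adapter: mysql2
  host: localhost
  database: myapp_staging
  username: CHANGE_ME
  password: CHANGE_ME
production:
  adapter: mysql2
  host: localhost
  database: myapp_production
  username: CHANGE_ME
  password: CHANGE_ME
```

Because the sample is checked in while database.yml is not, each branch (and each developer) keeps its own credentials without ever committing them.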
