When to rebuild an image in docker? - database

I'm just starting with Docker and I'm a bit confused about volumes.
Suppose I have created a container with a volume that maps a folder on my computer to a folder in the container. Later I create a new SQLite database in my PC folder (so it is also created in the container, right?). When the container dies and I create a new container FROM THE SAME IMAGE, the database will still be there. Am I right so far? But has the image been updated with this new database, or what exactly has happened? If not, is it better in these cases to rebuild the image and then build a container? (Or when should I rebuild the image? When I change environment variables?)
Thank you very much, I hope you can help me!

Perhaps a simpler case is if the database is in a separate container. Consider this docker-compose.yml file:
version: '3.8'
services:
  app:
    build: .
    ports: ['8000:80']
    environment: { PGHOST: db, ... }
  db:
    image: 'postgres:15.1'
    environment: { ... }
    volumes:
      - 'dbdata:/var/lib/postgresql/data'
volumes:
  dbdata:
First, notice that nothing at all is mounted on the app container. If the code changes, you must rebuild the image, and if the image changes you must recreate the container. (Compare to compiled languages like C++, Go, Rust, Java, or Elixir: if you change your code you generally must recompile the executable and then launch a new process.)
Hopefully the basic lifecycle of the db container and the dbdata named volume is clear.
Now assume that PostgreSQL 15.2 is released, and you change the image: line accordingly. The official image build can't know about your specific data, so your data can't be included in the image. Instead, the existing volume will get mounted into a new container running the new image.
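A minimal sketch of that upgrade, assuming the Compose file above: only the image: line changes, and re-running docker-compose up -d then recreates the db container with the existing dbdata volume reattached.
  db:
    image: 'postgres:15.2'   # was 'postgres:15.1'; the dbdata volume is untouched
    environment: { ... }
    volumes:
      - 'dbdata:/var/lib/postgresql/data'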
[I'd recommend a setup like this if possible, with data stored in a database separate from your application container. This is helpful if you ever want to run multiple replicas of your application, or if you're considering running the database on dedicated hardware or a hosted public-cloud offering.]
You should be able to generalize this to the setup you describe in the question. If you have something more like:
version: '3.8'
services:
  app:
    build: .
    ports: ['8000:80']
    volumes:
      - ./data:/app/data # containing the SQLite file
The bind-mounted /app/data directory actually has its storage outside the container. If the container is deleted and recreated, the data is still there. You do need to rebuild the image if the code changes, but the data is not stored in the image at all (and while it's accessible from the container, it's not stored in the container filesystem as such).

Related

How do you manage static data for microservices?

For a database-per-service architecture, how do you guys manage your static data for each microservice? I want to make it easy for a new developer to jump in and get everything up and running easily on their local machine. I'm thinking of checking the entire database with static data into source control with Docker bind mounts so people can just docker-compose up the database service locally (along with whatever other infrastructure services they might need to run and test their microservice).
I know each microservice might need to handle this in their own way, but I'd like to provide a good default template for people to start with.
Making a standard for how to do this sort of goes against the reason for making microservices, i.e. that you can adapt each microservice to the context it exists in.
That being said, Postgres, Mongo, and MySQL all run scripts from /docker-entrypoint-initdb.d when initializing a fresh database instance. The scripts obviously have to fit the database, but it's a fairly standardized way of doing it.
They all describe how to do this on their image pages on Docker Hub.
You can either get your scripts into the container by making a custom image that contains the scripts or you can map them into the directory using a docker-compose volume mapping.
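For the Postgres case, a minimal docker-compose sketch of the volume-mapping approach might look like the following (the ./initdb directory name and the password are illustrative; scripts in /docker-entrypoint-initdb.d run only when the data directory is first initialized):
version: '3.8'
services:
  db:
    image: 'postgres:15'
    environment:
      POSTGRES_PASSWORD: example               # dev-only placeholder
    volumes:
      # *.sql / *.sh scripts checked into the repo run once, on first init
      - ./initdb:/docker-entrypoint-initdb.d:ro
      - dbdata:/var/lib/postgresql/data
volumes:
  dbdata: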
There are some databases that don't have an easy way to initialize a new database. MSSQL comes to mind. In that case, you might have to handle it programmatically.

airflow initdb in directory other than AIRFLOW_HOME?

Question for Apache Airflow / Docker users. I have a Docker airflow image I've built and I'm trying to use a simple SequentialExecutor / SQLite metadata database, but I'd like the metadata database to persist across container runs. I'd like to do this by mounting a directory from the local machine and having initdb initialize the database somewhere other than AIRFLOW_HOME. Is this possible / configurable somehow, or does anyone have a better solution?
Basically the desired state is:
AIRFLOW_HOME: contains airflow.cfg, dags, scripts, logs whatever
some_other_dir: airflow.db
I know this is possible with logs, so why not the database?
Thanks!
I think the best option is to use docker-compose with a container as the metadata database, like this:
https://github.com/apache/airflow/issues/8605#issuecomment-623182960
I use this approach along with git branches, and it works very well. The data persists unless you explicitly remove the containers with make rm.
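A minimal sketch of that kind of setup, assuming the stock apache/airflow image and a Postgres container for the metadata database (the service names, image tags, and credentials here are illustrative, not taken from the linked issue):
version: '3.8'
services:
  airflow:
    image: apache/airflow:2.7.3                   # illustrative tag
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      # Metadata lives in the Postgres service instead of a SQLite file under AIRFLOW_HOME
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@metadata-db/airflow
    volumes:
      - ./dags:/opt/airflow/dags
  metadata-db:
    image: 'postgres:15'
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow                  # dev-only placeholder
      POSTGRES_DB: airflow
    volumes:
      - airflow-metadata:/var/lib/postgresql/data # survives container removal
volumes:
  airflow-metadata: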

How to properly initialize DB container in Kubernetes

I will have one container image in the registry with the app. Then I would like to start a DB container and somehow init the DB with scripts.
How do Kubernetes Jobs and init containers fit into my flow? Given that my init container will somehow invoke flyway update, it of course needs to run only once the DB is ready. I'm also not sure about the placement of the scripts themselves (I don't want to refer to any hostPath) or how to choose the set that should be invoked (i.e. only create test data for development).
Is it overcomplicated to create only one set of Kubernetes files for all environments (DEV, FAT, PROD) with such a setup?
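As a rough sketch of the init-container approach (the image tags, names, and the db-credentials Secret are assumptions; keeping the migration SQL in a ConfigMap avoids any hostPath, and a different ConfigMap per environment could carry test data for development):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  selector:
    matchLabels: { app: app }
  template:
    metadata:
      labels: { app: app }
    spec:
      initContainers:
        - name: flyway-migrate
          image: flyway/flyway:9                       # illustrative tag
          # -connectRetries makes Flyway wait until the DB is actually reachable
          args: ['migrate', '-connectRetries=60']
          env:
            - name: FLYWAY_URL
              value: jdbc:postgresql://db:5432/appdb
            - name: FLYWAY_USER
              value: app
            - name: FLYWAY_PASSWORD
              valueFrom:
                secretKeyRef: { name: db-credentials, key: password }
          volumeMounts:
            - name: migrations
              mountPath: /flyway/sql
      containers:
        - name: app
          image: registry.example.com/app:latest       # your app image
      volumes:
        - name: migrations
          configMap:
            name: flyway-migrations                    # SQL scripts, no hostPath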

How does Docker handle concurrency on database when many containers are using the same database basic image?

I mean, I read that to save time you can run tests in parallel in several containers over the same base image of an app, with each container saving only its own changes, so how does that work with DB concurrency?
When multiple containers are run from a given base image, they each work on their own "copy" of the image, so there is no concurrency issue: each container is its own database.
See the docs for more information.
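As an illustrative sketch (the image and service names are arbitrary), two services started from the same image end up with completely separate data:
version: '3.8'
services:
  # Both containers start from the same image, but each gets its own writable
  # layer (and its own anonymous volume for the data directory), so writes in
  # one are never visible in the other.
  test-db-1:
    image: 'postgres:15'
    environment: { POSTGRES_PASSWORD: test }
  test-db-2:
    image: 'postgres:15'
    environment: { POSTGRES_PASSWORD: test }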

Change database location of docker couchdb instance?

I have a server with two disks: one is an SSD used by the operating system and the other is a normal 2.5 TB HDD.
On this server I'm running Fedora Server 22 with Docker, and there is one image currently running: Fedora/couchdb.
The problem is that this container is saving the database to the much smaller SSD when it should really be stored on the much bigger HDD.
How can I set up this image to store the database on the HDD?
Other Information:
You can specify the database and view index locations for CouchDB in the config file with:
[couchdb]
database_dir = /path/to/dir
view_index_dir = /path/to/dir
How to add a custom configuration at startup is explained here:
http://docs.couchdb.org/en/1.6.1/config/intro.html
Of course, in order to use the desired path for your DBs in the container, you need to make sure it is accessible inside the Docker container. This explains everything you need to do so:
https://docs.docker.com/engine/userguide/dockervolumes/
If that does not help, please explain in more detail what your setup is and what you want to achieve.
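As a rough sketch, assuming the HDD is mounted at /mnt/hdd on the host and using the official couchdb image for illustration (the in-container paths vary by CouchDB version), the volume part could look something like this in a docker-compose file:
version: '2'
services:
  couchdb:
    image: couchdb:1.6
    ports: ['5984:5984']
    volumes:
      # Make a directory on the big HDD visible inside the container
      - /mnt/hdd/couchdb:/data
      # local.ini sets database_dir / view_index_dir to /data
      - ./local.ini:/usr/local/etc/couchdb/local.d/local.ini:ro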
