Change database location of Docker CouchDB instance?

I have a server with two disks: one is an SSD used by the operating system and the other is a normal 2.5 TB HDD.
Now on this server I'm running Fedora Server 22 with Docker, and there is one image currently running: Fedora/couchdb.
The problem is that this container is saving the database to the much smaller SSD, when it should really be stored on the much bigger HDD.
How can I set up this image to store the database on the HDD?
Other Information:

You can specify the database and view index locations for CouchDB in the config file with
[couchdb]
database_dir = /path/to/dir
view_index_dir = /path/to/dir
How to add a custom configuration at startup is explained here:
http://docs.couchdb.org/en/1.6.1/config/intro.html
Of course, in order to use the desired path for your databases inside the container, you need to make sure it is accessible from within the Docker container.
This explains everything you need to do:
https://docs.docker.com/engine/userguide/dockervolumes/
If that does not help, please explain in more detail what your setup is and what you want to achieve.
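For example, a rough sketch (not tested against the Fedora/couchdb image; the container paths /var/lib/couchdb and /etc/couchdb/local.d are assumptions, so check the image's Dockerfile, and the host paths under /mnt/hdd are illustrative):

# bind-mount a directory on the HDD over the container's data directory
docker run -d --name couchdb \
  -v /mnt/hdd/couchdb/data:/var/lib/couchdb \
  fedora/couchdb

# or bind-mount a custom config that points the data at a mounted path
docker run -d --name couchdb \
  -v /mnt/hdd/couchdb/data:/data \
  -v /mnt/hdd/couchdb/10-storage.ini:/etc/couchdb/local.d/10-storage.ini:ro \
  fedora/couchdb

Here 10-storage.ini would contain the [couchdb] section shown above with database_dir and view_index_dir both set to /data.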

Related

How do you manage static data for microservices?

For a database-per-service architecture, how do you guys manage your static data for each microservice? I want to make it easy for a new developer to jump in and get everything up and running easily on their local machine. I'm thinking of checking the entire database with static data into source control with Docker bind mounts so people can just docker-compose up the database service locally (along with whatever other infrastructure services they might need to run and test their microservice).
I know each microservice might need to handle this in their own way, but I'd like to provide a good default template for people to start with.
Making a standard for how to do this sort of goes against the reason for making microservices, i.e. that you can adapt each microservice to the context it exists in.
That being said, Postgres, Mongo and MySQL all run scripts in /docker-entrypoint-initdb.d when initializing a fresh database instance. The scripts obviously have to fit the database, but it's a fairly standardized way of doing it.
They all have descriptions of how to do it on the image page on Docker Hub.
You can either get your scripts into the container by making a custom image that contains the scripts, or you can map them into that directory using a docker-compose volume mapping.
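As a rough sketch with plain docker run (the docker-compose volume mapping is equivalent; the ./initdb directory, image tag and password are illustrative):

# scripts in ./initdb run only when the data directory is empty, i.e. on a fresh instance
docker run -d --name seeded-postgres \
  -e POSTGRES_PASSWORD=dev \
  -v "$(pwd)/initdb":/docker-entrypoint-initdb.d:ro \
  postgres:16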
There are some databases that don't have an easy way to initialize a new database. MSSQL comes to mind. In that case, you might have to handle it programmatically.
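For SQL Server, one workaround is to start the container and then run the seed scripts yourself with sqlcmd. A rough sketch (image tag, password, script name and the sqlcmd path inside the container are assumptions and vary by image version):

docker run -d --name dev-mssql \
  -e ACCEPT_EULA=Y -e SA_PASSWORD='Dev_Passw0rd!' \
  -v "$(pwd)/initdb":/initdb:ro \
  mcr.microsoft.com/mssql/server:2019-latest

# crude wait for SQL Server to come up; a retry loop or healthcheck is nicer
sleep 30
docker exec dev-mssql /opt/mssql-tools/bin/sqlcmd \
  -S localhost -U sa -P 'Dev_Passw0rd!' -i /initdb/seed.sql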

airflow initdb in directory other than AIRFLOW_HOME?

Question for Apache Airflow / Docker users. I have a Docker Airflow image I've built, and I'm trying to use a simple SequentialExecutor / SQLite metadata database, but I'd like to persist the metadata database every time a new container is run. I'd like to do this by mounting a directory from the local machine, and having initdb initialize the database somewhere other than AIRFLOW_HOME. Is this possible / configurable somehow, or does anyone have a better solution?
Basically the desired state is:
AIRFLOW_HOME: contains airflow.cfg, dags, scripts, logs whatever
some_other_dir: airflow.db
I know this is possible with logs, so why not the database?
Thanks!
I think the best option is to use docker-compose with a container as the metadata database, like this:
https://github.com/apache/airflow/issues/8605#issuecomment-623182960
I use this approach along with git branches, and it works very well. The data persists unless you explicitly remove the containers with make rm.
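Roughly the same idea without compose, as a sketch (assuming Airflow 1.10-style config and that the psycopg2 driver is available in your Airflow image; the container name, credentials and the host path /srv/airflow-db are illustrative):

# run Postgres as the metadata DB, persisting its data on the host
docker run -d --name airflow-metadb \
  -e POSTGRES_USER=airflow -e POSTGRES_PASSWORD=airflow -e POSTGRES_DB=airflow \
  -v /srv/airflow-db:/var/lib/postgresql/data \
  -p 5432:5432 postgres:13

# point Airflow at it instead of the SQLite file in AIRFLOW_HOME
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
airflow initdb

# if you'd rather keep SQLite, the same setting accepts a path outside AIRFLOW_HOME,
# e.g. sqlite:////some_other_dir/airflow.db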

Docker-Like DB Deployment

I've just finished setting up a dev environment where every developer's feature/*, bugfix/* and hotfix/* git branches are automatically built and deployed to a freshly provisioned Windows Container which hosts the web app and services, creating a test environment for each branch to be validated before merging into master.
While this is working quite nicely, I've still only got one dev DB per developer, which is used by all of their branches.
In an ideal world, I would like each of these test containers to use their own isolated DB instance; however, the DB is currently about 50 GB at the smallest I can get it without going in and tearing out historical data, which is sometimes useful.
What I would really like to do is create a Docker-like image for this DB and then spawn a new "container" from this image which only keeps track of the diff between its changes and the original, without ever altering the original DB.
Is something like this even possible, or does anyone have any ideas how I might achieve this DB isolation per container without having to create a full 50 GB DB for each?
OK, so after much flailing around in the dark, I think I've finally come up with a solution. @ErikEJ, thanks, you started me off in the right direction. After looking into DB snapshots on MSSQL, I found that the only way to do writable snapshots seemed to be using VSS and actually creating writable disk snapshots. That led me down a long path of first trying locally and failing, then trying to implement iSCSI and still getting nowhere. Then I stumbled upon Hyper-V snapshots, had a look at what was happening under the hood there, and finally came across creating differencing VHDs. So basically my solution is as below.
Create a VHD with a sanitised copy of my production DB MDF and LDF files in it.
Then I create differencing VHDs for each environment I need, mount each differencing VHD in its own folder, e.g. c:\db\Issue-1234, and create a new DB, DB_ISSUE-1234, by attaching to the files in these folders. Only the diffs are then stored, so instead of having several 50-60 GB copies of the DB, I only have one, and the differencing VHDs only store the differences.
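A rough sketch of those steps in PowerShell, assuming the Hyper-V and Storage modules plus sqlcmd are available, the base VHD already holds the sanitised MDF/LDF on a single data partition, and all paths, names and the SQL instance are illustrative:

# create a differencing child of the sanitised base VHD
New-VHD -Path 'C:\db\Issue-1234.vhdx' -ParentPath 'C:\db\base.vhdx' -Differencing

# mount the child into a folder rather than a drive letter
New-Item -ItemType Directory -Path 'C:\db\Issue-1234' | Out-Null
Mount-VHD -Path 'C:\db\Issue-1234.vhdx' -NoDriveLetter -Passthru |
  Get-Disk | Get-Partition |
  Add-PartitionAccessPath -AccessPath 'C:\db\Issue-1234'

# attach a new database pointing at the files inside the differencing disk
sqlcmd -S . -Q "CREATE DATABASE [DB_ISSUE-1234] ON (FILENAME='C:\db\Issue-1234\prod.mdf'), (FILENAME='C:\db\Issue-1234\prod_log.ldf') FOR ATTACH"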
I've just got this working with two or three, so I'm not sure how robust it is or how fast these differencing VHDs are going to grow, but it's looking very promising so far and is allowing me to spin up multiple environments for testing purposes extremely quickly (in fact, all automated by scripts in deployment).
Hope this helps someone else save some time one day, and please let me know if anyone has figured out a more efficient/quicker/better way to do this :)

PostgreSQL: Find PostgreSQL database file on another drive and restore it

I am working on a PostgreSQL database, and recently we had a server upgrade during which we changed our drive from a 2 TB RAID hard disk to an SSD. I have now mounted the RAID drive on a partition and can even access it.
Next, what I would like to do is get the database out of the mounted drive and restore it on the currently running PostgreSQL. How can I achieve this?
root@check03:/mnt/var/lib/postgresql/9.1/main/global# ls
11672 11674 11805 11809 11811 11813_fsm 11816 11820 11822 11824_fsm 11828 11916 11920 pg_internal.init
11672_fsm 11675 11807 11809_fsm 11812 11813_vm 11818 11820_fsm 11823 11824_vm 11829 11918 pg_control pgstat.stat
11672_vm 11803 11808 11809_vm 11813 11815 11819 11820_vm 11824 11826 11914 11919 pg_filenode.map
root@check03:/mnt/var/lib/postgresql/9.1/main/global# cd ..
As you can see I am able to access the drives and the folders, but I don't know what to do next. Kindly let me know. Thanks a lot.
You need the same major version of PostgreSQL (9.1), with the same or a later minor version. Copy main/ and everything below it to the new location. Copy the configuration of the old instance and adapt the paths to fit the new location (main/ is the "data directory", also sometimes called PGDATA). Start the new instance and look carefully at the logs. You should probably rebuild any indexes.
Also read about the database file layout in the fine documentation.
EDIT: If you have any chance to run the old configuration, read about backup and restore; that is a much safer way to transfer the data.
the Postgres binaries must be the same version
make sure that postgres is not running
copy using cp -rfp, tar | tar, or cpio, or whatever you like. Make sure you preserve the file owners and modes (the top-level directory must be 0700, owned by postgres)
make sure that the postgres startup script (in /etc/init.d/postxxx) refers to the new directory; sometimes there is an environment variable $PGDATA containing the name of the postgres data directory; maybe you need to make changes to new_directory/postgresql.conf, too (pg_log et al.)
for safety, rename the old data directory
restart Postgres
try to connect to it; check the logs.
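A rough shell sketch of the steps above, assuming a Debian/Ubuntu-style layout, PostgreSQL 9.1 already installed on the new system, and the old cluster mounted under /mnt (paths are illustrative):

sudo service postgresql stop                 # make sure postgres is not running
sudo mv /var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main.orig   # set the existing data directory aside
sudo cp -rfp /mnt/var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main
sudo chown -R postgres:postgres /var/lib/postgresql/9.1/main   # belt and braces; -p should already preserve this
sudo chmod 0700 /var/lib/postgresql/9.1/main
# check that data_directory in postgresql.conf (or $PGDATA in the init script) points at the new path
sudo service postgresql start
sudo -u postgres psql -c 'SELECT version();' # try to connect, then check the logs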
Extra:
Seasoned unix-administrators (like the BOFH ;-) might want to juggle with mountpoints and/or symlinks (instead of copying). Be my guest. YMMV
Seasoned DBAs might want to create a tablespace, point it at the new location and (selectively) move databases, schemas or tables to the new location.
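For the tablespace route, a hedged sketch (tablespace name, path and object names are illustrative; the target directory must be empty and owned by postgres, and ALTER DATABASE ... SET TABLESPACE needs no active connections to that database):

sudo mkdir -p /mnt/hdd/pg_tblspc && sudo chown postgres:postgres /mnt/hdd/pg_tblspc
sudo -u postgres psql -c "CREATE TABLESPACE hdd_space LOCATION '/mnt/hdd/pg_tblspc';"
sudo -u postgres psql -c "ALTER DATABASE mydb SET TABLESPACE hdd_space;"             # move a whole database
sudo -u postgres psql -d mydb -c "ALTER TABLE big_table SET TABLESPACE hdd_space;"   # or just one table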

Should data files be stored on the same computer (server) the database is stored in?

Currently in our research group, we have many "data files" stored on three servers and a couple of personal computers running different operating systems.
We want to build a database which would store some information in addition to the URLs of those various "data files". My question is: do we have to copy all the data files and put them in a directory on the same server the database is on? Or can they be left as they are on the different computers? If the second case is OK, what would be the format of the URL of the "data files"?
It really depends on what your intended goal is and what your current setup is like.
If the files are currently sitting somewhere on the network, and you need a path that the application can use to access them, you just need to store the network path (\\server\share\file for Windows environments) in the database, then read it and access that path to access the files. You'll need to make sure everyone has read access to them.
If the files are currently accessible through a website URL, internal or external, then again, you just need to store that URL (or some portion thereof) (http://mywebsite.com/myfile or http://servername/myfile) and access that.
If either of the above are not currently true, but you want them to be, then you'll need to set up a new share/webserver and put the files there. There's no requirement that this be the same server as the database, but it'd make for better backups if it was.
If you want the files themselves to be in the database, you should check out Bob Fanger's link.
Not sure what you're asking here but...
If you want your database engine to read files filled with data, it probably doesn't matter where they are stored - though this may depend on the database you are using. Are you using MySQL? MS-SQL Server? Oracle?
Many database vendors provide relatively easy-to-use admin tools that would let you choose a file to be loaded, and usually the file chooser dialog lets you browse the network, so you could load a file over the network. Details on how to do this vary, so consult the manual for your database engine for loading data from a pre-existing file.
Be aware that if the database is on Computer A and the data is being loaded from Computer B over the network, it will probably be slower than if the data was on the same computer as the database.
It doesn't really matter if the files are stored outside the database anyway.
See Storing Images in DB - Yea or Nay? for more thoughts on that one.
If the files are accessible by a URL, you can store that with the metadata, like
http://server1/folder/file.ext, file://\\server1\folder\file.ext or "file://P:\folder\file.ext"
Things to consider:
Backups
Performance
Synchronisation between the metadata and the data
