airflow initdb in directory other than AIRFLOW_HOME?

Question for Apache Airflow / Docker users. I have a Docker airflow image I've built and I'm trying to use a simple SequentialExecutor / sqlite metadata database, but I'd like to persist the metadata database every time a new container is run. I'd like to do this by mounting to a drive on the local machine, and having it so initdb initializes the database somewhere other than AIRFLOW_HOME. Is this possible / configurable somehow or does anyone have a better solution?
Basically the desired state is:
AIRFLOW_HOME: contains airflow.cfg, dags, scripts, logs whatever
some_other_dir: airflow.db
I know this is possible with logs, so why not the database?
Thanks!

I think the best option is to use docker-compose with a container as metadata database, like this:
https://github.com/apache/airflow/issues/8605#issuecomment-623182960
I use this approach, along with git branches and it works very well. The data persists unless you explicitly remove the containers with make rm
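On the original question: as far as I know, Airflow takes the metadata database location from the sql_alchemy_conn setting rather than from AIRFLOW_HOME directly (the SQLite default just happens to point inside AIRFLOW_HOME), and settings can be overridden with AIRFLOW__SECTION__KEY environment variables. Below is a minimal Python sketch of building that value, assuming the mounted directory is /some_other_dir inside the container. The helper names are made up for illustration, and which section applies ([core] on older releases, [database] on newer ones) depends on your Airflow version, so check the docs for yours.

```python
from pathlib import PurePosixPath


def sqlite_conn_uri(db_dir: str, db_name: str = "airflow.db") -> str:
    """Build a SQLAlchemy SQLite URI for a directory inside the container."""
    path = PurePosixPath(db_dir) / db_name
    if not path.is_absolute():
        # A relative path would depend on the working directory of the process.
        raise ValueError("use an absolute path for the SQLite file")
    # "sqlite:///" plus an absolute path yields four slashes in total.
    return f"sqlite:///{path}"


def airflow_db_env(db_dir: str) -> dict:
    """Environment variables to pass to the container (hypothetical helper)."""
    uri = sqlite_conn_uri(db_dir)
    return {
        "AIRFLOW__CORE__SQL_ALCHEMY_CONN": uri,      # older releases
        "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN": uri,  # newer releases
    }


if __name__ == "__main__":
    for key, value in airflow_db_env("/some_other_dir").items():
        print(f"{key}={value}")
```

The printed pairs could be passed with docker run -e, together with a volume mount for /some_other_dir, so that initdb creates airflow.db on the host drive instead of inside the container.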

Related

How do you manage static data for microservices?

For a database-per-service architecture, how do you guys manage your static data for each microservice? I want to make it easy for a new developer to jump in and get everything up and running easily on their local machine. I'm thinking of checking the entire database with static data into source control with Docker bind mounts so people can just docker-compose up the database service locally (along with whatever other infrastructure services they might need to run and test their microservice).
I know each microservice might need to handle this in their own way, but I'd like to provide a good default template for people to start with.
Making a standard for how to do this sort of goes against the reason for making microservices, i.e. that you can adapt each microservice to the context it exists in.
That being said, Postgres, Mongo and MySQL all run scripts in /docker-entrypoint-initdb.d when initializing a fresh database instance. The scripts have to fit the database obviously, but it's a fairly standardized way of doing it.
They all have descriptions of how to do it on the image page on docker hub.
You can either get your scripts into the container by making a custom image that contains the scripts or you can map them into the directory using a docker-compose volume mapping.
There are some databases that don't have an easy way to initialize a new database. MSSQL comes to mind. In that case, you might have to handle it programmatically.
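For the "handle it programmatically" case, one option is an idempotent seed step that the service runs at startup. Here is a minimal sketch in Python, using the standard-library sqlite3 module purely as a stand-in for whatever database the service really uses. The countries table and its rows are hypothetical, and the INSERT OR IGNORE syntax is SQLite-specific, so other databases need their own upsert form.

```python
import sqlite3

# Hypothetical static data; in practice this might live in a versioned file.
STATIC_COUNTRIES = [("DE", "Germany"), ("FR", "France"), ("JP", "Japan")]


def seed_static_data(conn: sqlite3.Connection) -> int:
    """Create the table if needed and insert any missing static rows.

    Safe to call on every startup: existing rows are left untouched.
    Returns the number of rows in the table afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS countries ("
        "code TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO countries (code, name) VALUES (?, ?)",
        STATIC_COUNTRIES,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM countries").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(seed_static_data(connection))
    print(seed_static_data(connection))  # second run adds nothing
```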

Change database location of docker couchdb instance?

I have a server with two disks: one is an SSD used by the operating system and the other is a normal 2.5TB HDD.
On this server I'm running Fedora Server 22 with Docker, and there is one image currently running: Fedora/couchdb.
The problem is that this container is saving the database to the much smaller SSD when it should really be stored on the much bigger HDD.
How can I set up this image to store the database on the HDD?
Other Information:
You can specify the database and view index locations for CouchDB in the config file with:
[couchdb]
database_dir = /path/to/dir
view_index_dir = /path/to/dir
How to add a custom configuration at startup is explained here:
http://docs.couchdb.org/en/1.6.1/config/intro.html
Of course, to use the desired path for your databases, you need to make sure it is accessible inside the Docker container, for example by mounting a directory from the HDD as a volume.
This explains everything you need to do so:
https://docs.docker.com/engine/userguide/dockervolumes/
If that does not help, please explain in more detail what your setup is and what you want to achieve.
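As a small illustration of the config fragment above, here is a Python sketch that renders the [couchdb] section with the standard-library configparser module. The path /mnt/hdd/couchdb is an assumption standing in for wherever the HDD directory ends up mounted inside the container, and where the resulting file has to be placed depends on the image.

```python
import configparser
import io


def render_couchdb_ini(database_dir: str, view_index_dir: str) -> str:
    """Return the text of a local config file with the two storage paths."""
    config = configparser.ConfigParser()
    config["couchdb"] = {
        "database_dir": database_dir,
        "view_index_dir": view_index_dir,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical mount point of the HDD directory inside the container.
    print(render_couchdb_ini("/mnt/hdd/couchdb", "/mnt/hdd/couchdb"))
```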

What's the correct way to deal with databases in Git?

I am hosting a website on Heroku, and using an SQLite database with it.
The problem is that I want to be able to pull the database from the repository (mostly for backups), but whenever I commit & push changes to the repository, the database should never be altered. This is because the database on my local computer will probably have completely different (and irrelevant) data in it; it's a test database.
What's the best way to go about this? I have tried adding my database to the .gitignore file, but that leaves the database completely unversioned, so I can't pull it when I need to.
While git (like most other version control systems) supports tracking binary files such as databases, it works best with text files. In other words, you should not use a version control system to track constantly changing binary database files (unless they are created once and almost never change).
One popular way to still track databases in git is to track text dumps of them. For example, an SQLite database can be dumped into a *.sql file using the sqlite3 utility (the .dump command). However, even with dumps, it is only appropriate to track template databases that do not change very often, and to create the binary database from such dumps with scripts as part of a standard deployment.
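The dump-and-rebuild round trip can also be scripted. Here is a minimal sketch using Python's standard-library sqlite3 module, whose Connection.iterdump() produces SQL text similar to the sqlite3 utility's .dump command. The notes table is only an example, and an in-memory database stands in for a real file.

```python
import sqlite3


def dump_to_sql(conn: sqlite3.Connection) -> str:
    """Return the whole database (schema and data) as SQL text."""
    return "\n".join(conn.iterdump()) + "\n"


def restore_from_sql(sql_text: str) -> sqlite3.Connection:
    """Build a fresh in-memory database from a text dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    source.execute("INSERT INTO notes (body) VALUES ('hello')")
    source.commit()

    dump = dump_to_sql(source)       # this text is what would be committed
    rebuilt = restore_from_sql(dump)
    print(rebuilt.execute("SELECT body FROM notes").fetchall())
```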
You could add a pre-commit hook to your local repository that unstages any files you don't want to push.
E.g. add the following to .git/hooks/pre-commit:
git reset ./file/to/database.db
When working on your code (potentially modifying your database), you will at some point end up with:
$ git status --porcelain
M file/to/database.db
M src/foo.cc
$ git add .
$ git commit -m "fixing BUG in foo.cc"
M file/to/database.db
.
[master 12345] fixing BUG in foo.cc
1 file changed, 1 deletion(-)
$ git status --porcelain
M file/to/database.db
So you can never accidentally commit changes made to your database.db.
Is it the schema of your database you're interested in versioning? But making sure you don't version the data within it?
I'd exclude your database from git (using the .gitignore file).
If you're using an ORM and migrations (e.g. Active Record) then your schema is already tracked in your code and can be recreated.
However if you're not then you may want to take a copy of your database, then save out the create statements and version them.
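For an SQLite database, the create statements can be read from the sqlite_master table, which stores the SQL text of every table, index, view and trigger. Here is a minimal Python sketch of a schema-only export. The function name is made up, and the output is meant to be written to a text file and versioned instead of the database itself.

```python
import sqlite3


def schema_statements(conn: sqlite3.Connection) -> list:
    """Return the CREATE statements of a database, tables first, without data."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE sql IS NOT NULL "                   # auto-created indexes have no SQL
        "AND substr(name, 1, 7) != 'sqlite_' "     # skip SQLite's internal objects
        "ORDER BY CASE type WHEN 'table' THEN 0 ELSE 1 END, name"
    )
    return [row[0] + ";" for row in rows]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE INDEX idx_users_name ON users (name)")
    db.execute("INSERT INTO users (name) VALUES ('test user')")
    print("\n".join(schema_statements(db)))
```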
Heroku recommends against using SQLite in production and suggests using their Postgres service instead. That also lets you run tasks such as backups, imports and exports against the remote DB.
If you want to pull the live database from Heroku the instructions for Postgres backups might be helpful.
https://devcenter.heroku.com/articles/pgbackups
https://devcenter.heroku.com/articles/heroku-postgres-import-export

EpiServer CMS - get all changed properties

I have an EPIServer CMS. I have a staging instance and a production instance. I want to be able to edit properties/texts in the staging instance, and then in one operation migrate all the new values to production. What is the easiest way to do this?
I suppose I should do something like programmatically enumerate all properties changed since a given timestamp, save the keys/values to a file, and then update production from that file. Or is there a better, built-in way to achieve the same?
Not built in. If your staging db is a copy of production when you start, you can export the pages from staging (including page types) and then import them into production, but they will get new ids and you'd have to delete the originals. You would also lose all updates made to production during development. I think you're better off writing that xml exporter/importer.
Looks like Episerver Mirroring will solve your problem. You can use mirroring to move content from staging to prod with the help of scheduled job or by running the job manually.

Empty my Sqlite3 database in RoR

I am working on a Ruby on Rails 3 web application using sqlite3. I have been testing my application on-the-fly creating and destroying things in the Database, sometimes through the new and edit actions and sometimes through the Rails console.
I am interested in emptying my Database totally and having only the empty tables left. How can I achieve this? I am working with a team so I am interested in two answers:
1) How do I empty the database just for myself?
2) How can the others empty theirs, if possible (some of them are not using sqlite3 but MySQL)? (We are all working on the same project through an SVN repository.)
To reset your database, you can run:
rake db:schema:load
Which will recreate your database from your schema.rb file (maintained by your migrations). This will additionally protect you from migrations that may later fail due to code changes.
Your dev database should be specific to your environment. If you need certain data, add it to your db/seeds.rb file. Don't share a dev database, as you'll quickly get into situations where other people's changes make your version incompatible.
Download sqlitebrowser here: http://sqlitebrowser.org/
Install it, run it, click open database (top left) to locationOfYourRailsApp/db/development.sqlite3
Then switch to Browse data tab, there you can delete or add data.
I found that deleting the development.sqlite3 file from the db folder and then running rake db:migrate on the command line solves the problem for everyone on my team who works with sqlite3.
As far as I know, there is no user GRANT management in sqlite, so it is difficult to control access.
You can only protect the database through file access.
If you want to use an empty database for test purposes, generate it once and copy the file somewhere, then use a copy of this file just before testing.
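The copy-a-template approach can be sketched in a few lines of Python. The file names (empty_template.sqlite3, test.sqlite3) and the items table are hypothetical, and the template here is created on the spot only to keep the example self-contained.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def fresh_test_db(template: Path, workdir: Path) -> Path:
    """Copy the empty template database and return the path of the copy."""
    target = workdir / "test.sqlite3"
    shutil.copyfile(template, target)  # overwrites any previous test database
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        template = workdir / "empty_template.sqlite3"

        # Generate the empty template once (tables only, no rows).
        conn = sqlite3.connect(template)
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
        conn.commit()
        conn.close()

        # Before each test run, work on a throwaway copy.
        test_db = fresh_test_db(template, workdir)
        conn = sqlite3.connect(test_db)
        conn.execute("INSERT INTO items (name) VALUES ('scratch')")
        conn.commit()
        conn.close()
```

Because the tests only ever touch the copy, the template stays empty and can be reused for every run.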
