Error in custom database built with Kraken2

I need to run some classifications with Kraken2 for my master's thesis. To do that, I need to build a custom database with this software.
In this database I need bacteria and fungi, for example.
I followed the steps in the documentation. I started by downloading the taxonomy and adding the libraries as follows:
kraken2-build --download-taxonomy --db $DBNAME
kraken2-build --download-library bacteria --db $DBNAME
kraken2-build --download-library fungi --db $DBNAME
Afterwards, I built my database with the following line:
kraken2-build --build --db $DBNAME
Eventually, I obtained an empty database: if I run the inspect command on it, the table size is zero.
I also found another approach in a GitHub thread: if I add the flag "--no-masking" when downloading the libraries, I get no errors. However, I don't think this approach is the best one, so I need help resolving the problem.
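From what I can tell (this is an assumption on my part, not something from the docs I followed), the masking step shells out to dustmasker from NCBI BLAST+, so a missing dustmasker would explain why --no-masking avoids the error. A quick check:

```shell
# Assumption: kraken2 masks low-complexity regions with dustmasker
# (NCBI BLAST+). If it is not on PATH, masking can fail and leave the
# library unprocessed, which would match the empty database I see.
command -v dustmasker || echo "dustmasker not found - install NCBI BLAST+"
```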
Tell me if you need more information.
Thanks!

Related

airflow initdb in directory other than AIRFLOW_HOME?

Question for Apache Airflow / Docker users. I have a Docker Airflow image I've built, and I'm trying to use a simple SequentialExecutor / SQLite metadata database, but I'd like to persist the metadata database every time a new container is run. I'd like to do this by mounting a drive on the local machine and having initdb initialize the database somewhere other than AIRFLOW_HOME. Is this possible or configurable somehow, or does anyone have a better solution?
Basically the desired state is:
AIRFLOW_HOME: contains airflow.cfg, dags, scripts, logs whatever
some_other_dir: airflow.db
I know this is possible with logs, so why not the database?
Thanks!
I think the best option is to use docker-compose with a separate container as the metadata database, like this:
https://github.com/apache/airflow/issues/8605#issuecomment-623182960
I use this approach along with git branches and it works very well. The data persists unless you explicitly remove the containers with make rm.
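If you'd rather stick with SQLite, another option: the metadata DB location comes from sql_alchemy_conn in airflow.cfg, and the environment variable AIRFLOW__CORE__SQL_ALCHEMY_CONN overrides it, so the file can live on a volume mounted outside AIRFLOW_HOME. A sketch (the directory is a placeholder):

```shell
# Mount a host directory into the container at /some_other_dir, then
# point the metadata DB there before running initdb:
DB_DIR=/some_other_dir
export AIRFLOW__CORE__SQL_ALCHEMY_CONN="sqlite:///$DB_DIR/airflow.db"
echo "$AIRFLOW__CORE__SQL_ALCHEMY_CONN"
# airflow initdb   # run this next; the db file lands in $DB_DIR
```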

A way to check Oracle finished sql

I've got a Docker Compose cluster, and one of the containers is Oracle 12c. There is a schema.sql file that initializes the DB. I would like my application to wait until the database has executed all the SQL. How can I do this automatically with bash?
Thank you very much for any suggestions!
There's a lot to explain here, but I'll link one of my previous answers to a similar problem - the steps are the same because only the database service and background differ.
1)
First, you need a script that waits until a service responds on its TCP port. For databases, this usually means the DB is ready to go and all initialization is done. A common choice is the wait-for-it.sh script written by vishnubob in his wait-for-it repo on GitHub.
2)
Second, you have to get that script into each container that depends on your DB.
3)
Third, you specify an entrypoint in your compose file that executes the waiting script before the actual command running your service is triggered.
An example entrypoint (following the answer I link to):
docker-entrypoint.sh:
#!/bin/bash
set -e
# Oracle's default listener port is 1521 (3306 is MySQL's default)
./wait-for-it.sh oracle:1521 -t 30
exec "$@"
All these steps are explained in detail here in scenario 2; be aware of a reference to another of my answers inside the answer I'm pointing at. This is a very common problem for beginners and takes quite a lot of explanation, so I cannot post it all here.
A note concerning depends_on, which you might think is docker's native solution to this problem: as the docs state, it only waits until the container is running, not until it has finished its internal jobs - docker is not aware of how much work remains.
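One caveat with the port check: the listener can come up before schema.sql has finished. A small generic retry helper can poll for something the init script creates; a sketch (the sqlplus line in the comment is a placeholder with made-up credentials):

```shell
#!/bin/sh
# wait_for N CMD...: run CMD up to N times, one second apart,
# returning 0 on the first success and 1 if all attempts fail.
wait_for() {
  attempts=$1; shift
  while [ "$attempts" -gt 0 ]; do
    "$@" && return 0
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Hypothetical usage - succeed only once a table created at the END of
# schema.sql is queryable (credentials/service name are placeholders):
#   wait_for 30 sh -c "echo 'SELECT 1 FROM last_table;' \
#     | sqlplus -S app/secret@//oracle:1521/ORCL | grep -q 1"
```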

What's the correct way to deal with databases in Git?

I am hosting a website on Heroku, and using an SQLite database with it.
The problem is that I want to be able to pull the database from the repository (mostly for backups), but whenever I commit & push changes to the repository, the database should never be altered. This is because the database on my local computer will probably have completely different (and irrelevant) data in it; it's a test database.
What's the best way to go about this? I have tried adding my database to the .gitignore file, but that leaves the database completely unversioned, preventing me from pulling it when I need to.
While git (like most other version control systems) supports tracking binary files such as databases, it works best with text files. In other words, you should never use a version control system to track constantly changing binary database files (unless they are created once and almost never change).
One popular way to still track databases in git is to track text database dumps. For example, an SQLite database can be dumped into a *.sql file using the sqlite3 utility (the .dump subcommand). However, even when using dumps, it is only appropriate to track template databases that do not change very often, and to create the binary database from such dumps using scripts as part of a standard deployment.
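For instance, a runnable sketch with a throwaway demo database (only the .sql dump would be committed; the file names are made up):

```shell
rm -f demo.db demo.sql restored.db   # start clean so the sketch is repeatable

# A tiny demo database standing in for the real one:
sqlite3 demo.db "CREATE TABLE users(id INTEGER, name TEXT);"
sqlite3 demo.db "INSERT INTO users VALUES (1, 'ada');"

# Dump to text - this is the file you would track in git:
sqlite3 demo.db .dump > demo.sql

# Recreate a fresh binary database from the dump (e.g. at deploy time):
sqlite3 restored.db < demo.sql
sqlite3 restored.db "SELECT name FROM users;"
```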
You could add a pre-commit hook to your local repository that unstages any files you don't want to commit.
e.g. add the following to .git/hooks/pre-commit
git reset ./file/to/database.db
While working on your code (potentially modifying your database) you will at some point end up with:
$ git status --porcelain
M file/to/database.db
M src/foo.cc
$ git add .
$ git commit -m "fixing BUG in foo.cc"
M file/to/database.db
[master 12345] fixing BUG in foo.cc
1 file changed, 1 deletion(-)
$ git status --porcelain
M file/to/database.db
This way you can never accidentally commit changes made to your database.db.
Is it the schema of your database you're interested in versioning, while making sure you don't version the data within it?
I'd exclude your database from git (using the .gitignore file).
If you're using an ORM and migrations (e.g. Active Record), then your schema is already tracked in your code and can be recreated.
However, if you're not, you may want to take a copy of your database, save out the create statements, and version those.
Heroku doesn't recommend using SQLite in production, suggesting their Postgres service instead. That lets you perform many tasks on the remote DB.
If you want to pull the live database from Heroku, the instructions for Postgres backups might be helpful:
https://devcenter.heroku.com/articles/pgbackups
https://devcenter.heroku.com/articles/heroku-postgres-import-export

How to make a copy of schema in Amazon RDS (Oracle)?

For all the developers in a team, I am trying to automate creating a dedicated schema for each of them from a 'master' schema. How can I achieve this?
In the end, all I want is: given the schema name 'developer_1', 'developer_1' will have all the tables, views, and sequences from schema 'master', along with the indexes and constraints. Online searching pointed to Data Pump, but the AWS documentation seemed pretty light. I am looking to set this up in such a way that it can be invoked every week to get the latest snapshot of the master schema (blowing away whatever existed for developer_1).
Thanks,
You will likely want to use Oracle's Data Pump utility. With this, you can create a schema dump (export) and then import that dump. On import, you can use the handy REMAP_SCHEMA command-line parameter to change the schema name.
The links below should help get you started.
Export: http://docs.oracle.com/cd/B28359_01/server.111/b28319/dp_export.htm
Import: http://docs.oracle.com/cd/B28359_01/server.111/b28319/dp_import.htm
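The end-to-end flow looks roughly like this (a sketch, not runnable as-is: connect strings, schema names, and file names are placeholders, and the directory object must be one your RDS instance can use, such as DATA_PUMP_DIR):

```shell
# Export the master schema to a dump file:
expdp admin/password@mydb schemas=MASTER \
      directory=DATA_PUMP_DIR dumpfile=master.dmp logfile=exp.log

# Import it as developer_1, remapping ownership of every object:
impdp admin/password@mydb remap_schema=MASTER:DEVELOPER_1 \
      directory=DATA_PUMP_DIR dumpfile=master.dmp logfile=imp.log
```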
I was looking for something similar and found this (and then your question during the google-trawling):
http://learnwithme11g.wordpress.com/2012/06/07/copy-schema-into-same-database-with-impdp/
I think you can get around the CLI access by running:
!mkdir /path/to/dir
and then invoking:
CREATE DIRECTORY temp_dir AS '/path/to/dir';
in SQL*Plus.
Also, be aware that the syntax shown on the site above is wrong.

Empty my Sqlite3 database in RoR

I am working on a Ruby on Rails 3 web application using sqlite3. I have been testing my application on the fly, creating and destroying things in the database, sometimes through the new and edit actions and sometimes through the Rails console.
I would like to empty my database completely, leaving only the empty tables. How can I achieve this? I am working in a team, so I am interested in two answers:
1) How do I empty the database just for myself?
2) How can the others empty it, if possible (some of them are using MySQL rather than sqlite3)? (We are all working on the same project through an SVN repository.)
To reset your database, you can run:
rake db:schema:load
which will recreate your database from your schema.rb file (maintained by your migrations). This also protects you from migrations that may later fail due to code changes.
Your dev database should be distinct to your environment - if you need certain data, add it to your db/seeds.rb file. Don't share a dev database, as you'll quickly get into situations where other people's changes make your version incompatible.
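Putting that together, a typical full reset from the app root looks like this (a sketch; it requires a Rails app, and db:seed assumes you keep baseline data in db/seeds.rb):

```shell
# Drop, recreate, load the schema, then reseed the development database:
rake db:drop db:create db:schema:load db:seed
```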
Download DB Browser for SQLite here: http://sqlitebrowser.org/
Install it, run it, and click Open Database (top left), pointing it at locationOfYourRailsApp/db/development.sqlite3.
Then switch to the Browse Data tab, where you can delete or add data.
I found that deleting the development.sqlite3 file from the db folder and then running rake db:migrate solves the problem for everyone on my team working with sqlite3.
As far as I know there is no user/GRANT management in sqlite, so it is difficult to control access; you can only protect the database through file permissions.
If you want an empty database for test purposes, generate it once, copy the file somewhere, and use a copy of that file just before each test run.
