Can a Redis instance be configured to NOT overwrite old snapshots?

Currently I am using the redis.conf file to provide the fixed directory and filename to my instance to save the redis dump.rdb snapshot.
My intention is to compare two redis snapshots taken at different times.
But Redis overwrites the old dump file when it creates the new one.
I checked the Redis repo on GitHub and found the rdb.c file, which contains the code that executes the SAVE command and overwrites old snapshots.
Before messing with the code (since I'm not an experienced developer), I wanted to ask: is there a better way to save snapshots taken at different times, or could I just keep the last 2 snapshots at a time?

You can use incron to watch the dump directory and execute a script
sudo apt-get install incron
echo "redis" >> /etc/incron.allow
export EDITOR=vi
incrontab -e
/path/where/you/dump/files IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /bin/copy_snapshot
Then create a /bin/copy_snapshot script that renames the dump with a date (or similar) and keeps only the last X copies.
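A minimal sketch of such a script, assuming the dump is written to /var/lib/redis/dump.rdb and you want to keep the last two copies (all paths here are illustrative, not taken from the question):
#!/bin/sh
# /bin/copy_snapshot -- archive the latest dump and prune old copies (illustrative paths)
DUMP_DIR=/var/lib/redis          # assumed location of dump.rdb
ARCHIVE_DIR=/var/backups/redis   # assumed archive directory
KEEP=2                           # how many timestamped copies to keep
cp "$DUMP_DIR/dump.rdb" "$ARCHIVE_DIR/dump-$(date +%Y%m%d%H%M%S).rdb"
ls -1t "$ARCHIVE_DIR"/dump-*.rdb | tail -n +$((KEEP + 1)) | xargs -r rm --
Remember to make the script executable (chmod +x /bin/copy_snapshot) so incron can run it.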

Related

PostgreSQL: Find PostgreSQL database files on another drive and restore them

I am working on a PostgreSQL database, and we recently had a server upgrade during which we moved from a 2 TB RAID hard disk to an SSD. I have now mounted the old RAID drive on a partition and can access it.
What I would like to do next is get the database off the mounted drive and restore it into the currently running PostgreSQL. How can I achieve this?
root@check03:/mnt/var/lib/postgresql/9.1/main/global# ls
11672 11674 11805 11809 11811 11813_fsm 11816 11820 11822 11824_fsm 11828 11916 11920 pg_internal.init
11672_fsm 11675 11807 11809_fsm 11812 11813_vm 11818 11820_fsm 11823 11824_vm 11829 11918 pg_control pgstat.stat
11672_vm 11803 11808 11809_vm 11813 11815 11819 11820_vm 11824 11826 11914 11919 pg_filenode.map
root@check03:/mnt/var/lib/postgresql/9.1/main/global# cd ..
As you can see, I am able to access the drive and the folders, but I don't know what to do next. Kindly let me know. Thanks a lot.
You need the same major version of PostgreSQL (9.1) and the same or a later minor version. Copy main/ and everything below it to the new location. Copy the configuration of the old instance and adapt the paths to the new location (main/ is the "data directory", also sometimes called PGDATA). Start the new instance and look carefully at the logs. You should probably rebuild any indexes.
Also read about the file layout in the fine documentation.
EDIT: If you have any chance of running the old instance, read about backup and restore; that is a much safer way to transfer the data.
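For reference, the dump-and-restore route might look roughly like this (the database name is a placeholder):
pg_dump -Fc mydatabase > mydatabase.dump          # custom-format dump, taken while the old instance is running
pg_restore --create -d postgres mydatabase.dump   # recreate the database on the new instance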
the Postgres binaries must be the same version
make sure that postgres is not running
copy using cp -rfp, tar | tar, cpio, or whatever you like. Make sure you preserve the file owners and modes (the top-level directory must be 0700, owned by postgres)
make sure that the postgres startup script (in /etc/init.d/postxxx) refers to the new directory; sometimes there is an environment variable $PGDATA containing the name of the postgres data directory; you may also need to make changes to new_directory/postgresql.conf (pg_log et al.)
for safety, rename the old data directory
restart Postgres
try to connect to it; check the logs.
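Putting the steps above together, here is a rough sketch, assuming a Debian-style layout where the old cluster is mounted under /mnt/var/lib/postgresql/9.1/main and the new data directory lives at /var/lib/postgresql/9.1/main (adjust paths, service name and log location to your system):
sudo service postgresql stop                                            # make sure postgres is not running
sudo mv /var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main.old   # keep the existing data directory for safety
sudo cp -rfp /mnt/var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main
sudo chown -R postgres:postgres /var/lib/postgresql/9.1/main            # owners must be postgres
sudo chmod 0700 /var/lib/postgresql/9.1/main                            # required mode on the data directory
sudo service postgresql start
sudo tail -f /var/log/postgresql/postgresql-9.1-main.log                # watch the logs, then try to connect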
Extra:
Seasoned unix-administrators (like the BOFH ;-) might want to juggle with mountpoints and/or symlinks (instead of copying). Be my guest. YMMV
Seasoned DBAs might want to create a tablespace, point it at the new location and (selectively) move databases, schemas or tables to the new location.

Use rpm -V against a backed-up database

rpm(1) provides a -V option to verify installed files against the installation database, which can be used to detect modified or missing files.
This might be used as a form of intrusion detection (or at least as part of an audit). However, it is of course possible that the installed rpm database has been modified by a hacker to hide their tracks (see http://www.sans.org/security-resources/idfaq/rpm.php, last sentence).
It looks like it should be possible to back up the rpm database /var/lib/rpm after every install (to some external medium) and to use that during an audit via --dbpath. Such a backup would of course have to be updated after every install or upgrade.
Is this feasible? Are there any resources that detail methods, pitfalls, suggestions etc for this?
Yes, it's feasible. Use "rpm -Va --dbpath /some/where/else" to point to some saved database directory.
Copy /var/lib/rpm/Packages to the saved /some/where/else directory, and run "rpm --rebuilddb --dbpath /some/where/else" to regenerate the indices.
Note that you can also verify files using the original packaging, as in "rpm -Vp some*.rpm", which is often less hassle (and more secure, with read-only offline media storing the packages) than saving copies of the installed /var/lib/rpm/Packages rpmdb.
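Putting that together, a minimal sketch (the /mnt/usb path is purely an example of an external medium):
mkdir -p /mnt/usb/rpmdb-backup
cp /var/lib/rpm/Packages /mnt/usb/rpmdb-backup/   # back up the rpmdb after each install/upgrade
rpm --rebuilddb --dbpath /mnt/usb/rpmdb-backup    # regenerate the indices in the saved copy
rpm -Va --dbpath /mnt/usb/rpmdb-backup            # during an audit: verify against the saved database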

What's the correct way to deal with databases in Git?

I am hosting a website on Heroku, and using an SQLite database with it.
The problem is that I want to be able to pull the database from the repository (mostly for backups), but whenever I commit & push changes to the repository, the database should never be altered. This is because the database on my local computer will probably have completely different (and irrelevant) data in it; it's a test database.
What's the best way to go about this? I have tried adding my database to the .gitignore file, but that results in the database being completely unversioned, which makes it impossible to pull it when I need to.
While Git (like most other version control systems) can track binary files such as databases, it works best with text files. In other words, you should not use a version control system to track constantly changing binary database files (unless they are created once and almost never change).
One popular way to still keep databases in Git is to track text database dumps. For example, an SQLite database can be dumped into a *.sql file using the sqlite3 utility (the .dump command). However, even when using dumps, it is only appropriate to track template databases which do not change very often, and to recreate the binary database from such dumps with scripts as part of the standard deployment.
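For example, such a diff-friendly dump might be produced along these lines (the file names are illustrative):
sqlite3 mydatabase.db .dump > mydatabase.sql   # plain-text dump that Git can diff
git add mydatabase.sql                         # version the dump, not the binary .db file
sqlite3 restored.db < mydatabase.sql           # recreate a binary database during deployment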
You could add a pre-commit hook to your local repository that unstages any files you don't want to commit.
e.g. add the following to .git/hooks/pre-commit (and make the hook executable):
#!/bin/sh
git reset ./file/to/database.db
When working on your code (and potentially modifying your database) you will at some point end up with:
$ git status --porcelain
M file/to/database.db
M src/foo.cc
$ git add .
$ git commit -m "fixing BUG in foo.cc"
M file/to/database.db
.
[master 12345] fixing BUG in foo.cc
1 file changed, 1 deletion(-)
$ git status --porcelain
M file/to/database.db
So you can never accidentally commit changes made to your database.db.
Is it the schema of your database that you're interested in versioning, while making sure you don't version the data within it?
I'd exclude your database from git (using the .gitignore file).
If you're using an ORM and migrations (e.g. Active Record) then your schema is already tracked in your code and can be recreated.
However, if you're not, you may want to take a copy of your database, save out the CREATE statements, and version those.
Heroku doesn't recommend using SQLite in production; they suggest using their Postgres service instead, which lets you run many tasks against the remote DB.
If you want to pull the live database from Heroku the instructions for Postgres backups might be helpful.
https://devcenter.heroku.com/articles/pgbackups
https://devcenter.heroku.com/articles/heroku-postgres-import-export
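For example, with the Heroku CLI a capture-and-download of the remote database might look roughly like this (command names have changed across CLI versions, so check the linked docs; the app and database names are placeholders):
heroku pg:backups:capture --app my-app                              # create a backup of the remote Postgres
heroku pg:backups:download --app my-app                             # downloads it as latest.dump
pg_restore --clean --no-acl --no-owner -d my_local_db latest.dump   # load it into a local database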

I'm using MongoDB with a custom data folder, but that data is not synced with GitHub because the files don't change

I use GitHub to version my files and I would like to version my database too; in this case it is only for testing purposes.
But the database files created by MongoDB are not changing; their modification date is weeks old, so GitHub has old data.
I can't really understand why, if I'm changing data in the database, MongoDB doesn't save it to a file... or at least the file should have changed somehow.
MongoDB preallocates data files, which then get filled gradually. Perhaps that is why the changes are not being picked up.
As an aside, of all the possible ways of versioning a MongoDB database, I'm not sure that keeping the datadir itself in a Git repository is the best way to go.
Alternatives: running mongodump will produce a BSON dump of your database or collection, while mongoexport will produce JSON or CSV. These can be read back in with mongorestore and mongoimport respectively; see the documentation.
These dumps can then be versioned using your favourite tool. Personally, when using Git, I would version the JSON dump, e.g.
mongoexport --db mydatabase --collection mycollection > mycollection.json
will result in a JSON file, containing the contents of the chosen collection (you can dump the entire database if you want).
As an extra: if you append --csv and --fields fieldname1,fieldname2, you can dump a nice CSV file to read into another program.
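If you prefer the BSON route mentioned above, a minimal round trip might look like this (names are illustrative, and option spellings vary between versions of the MongoDB tools):
mongodump --db mydatabase --out ./dump            # BSON dump of the whole database
mongorestore --db mydatabase ./dump/mydatabase    # read it back in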

Sync web directories between load balanced servers

I have two load balanced servers each running exactly the same copy of various PHP based websites.
An admin user (or multiple admin users) might therefore hit one or the other of the servers when changing content, e.g. uploading an image, deleting a file from a media library, etc.
These operations mean that one or the other (or both) of the servers goes out of sync and they need to be brought back into line.
Currently I'm looking at rsync for this with the --delete option but am unsure how it reacts to files being deleted vs. new files being created between servers.
I.e. if I delete a file on Server A and rsync with Server B, the file should also be deleted from Server B (as it no longer exists on A). But if, before running the sync, I upload a file to Server B as well as deleting a file from Server A, will the file that was uploaded to Server B also get removed because it doesn't exist on Server A?
A number of tutorials on the web deal with a Master-Slave type scenario where Server B is a Mirror of Server A and this process just works, but in my situation both servers are effectively Masters mirroring each other.
I think rsync keeps a local history of the files it deals with and so may handle this gracefully, but I'm not sure whether that is really the case, or whether it's dangerous to rely on it alone.
Is there a better way of dealing with this issue?
I wasn't happy with my previous answer; this sounds too much like a problem somebody must already have invented a way to solve.
It turns out there is one: check out Unison. There's a GUI for it and everything.
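A bidirectional sync with Unison might look roughly like this (paths and hostname are placeholders):
unison /var/www/mysite ssh://serverB//var/www/mysite -batch -prefer newer   # sync both ways, newer file wins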
First, if you're doing a bidirectional rsync (i.e. running it first one way, and then the other) then you need to be using --update, and you need to have the clocks on both servers precisely aligned. If both servers write to the same file, the last write wins, and the earlier write is lost.
Second, I don't think you can use --delete, not directly anyway. The only state rsync keeps is the state of the filesystem itself, and if that's a moving target it will get confused.
I suggest that when you delete a file, you write its name to a list file. Then, instead of using rsync --delete, do the deletions manually with, for example, cat deleted-files | ssh serverB xargs rm -v (a sketch of one way to maintain that list follows the commands below).
So, your process would look something like this:
ServerA:
rsync -a --update mydir serverB:mydir
cat deleted-files | ssh serverB xargs rm -v
ServerB:
rsync -a --update mydir serverA:mydir
cat deleted-files | ssh serverA xargs rm -v
Obviously, the two syncs can't run at the same time, and I've left off other important rsync options: you probably want to consider --delay-updates, --partial-dir and others.
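One way to populate the deleted-files list is to route deletions through a small wrapper instead of calling rm directly (purely illustrative; the script name and list location are made up):
#!/bin/sh
# rm-and-log: delete files and record their names for the next sync run
for f in "$@"; do
    rm -v -- "$f" && echo "$f" >> /var/www/deleted-files
done
The deleted-files list would then need to be truncated after each successful sync so the same deletions are not replayed.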
