Sync web directories between load balanced servers - filesystems

I have two load balanced servers each running exactly the same copy of various PHP based websites.
An admin user (or several admin users) hitting their site might therefore land on either server when wanting to change content, e.g. upload an image or delete a file from a media library.
These operations push one or other (or both) of the servers out of sync, and they need to be brought back into line.
Currently I'm looking at rsync for this with the --delete option, but I'm unsure how it handles files being deleted versus new files being created on different servers.
That is, if I delete a file on Server A and rsync with Server B, the file should also be deleted from Server B (as it no longer exists on A). BUT if I separately upload a file to Server B as well as deleting a file from Server A before running the sync, will the file uploaded to Server B also get removed because it doesn't exist on Server A?
A number of tutorials on the web deal with a Master-Slave type scenario where Server B is a Mirror of Server A and this process just works, but in my situation both servers are effectively Masters mirroring each other.
I think rsync keeps a local history of the files it's dealing with and so may be able to handle this gracefully, but I'm not sure if that's really the case, or if it's dangerous to rely on it alone.
Is there a better way of dealing with this issue?

I wasn't happy with my previous answer. This sounds too much like a problem somebody must already have invented a way to solve.
It seems that there is! Check out Unison. There's a GUI for it and everything.
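For example, something like this might do it (assuming Unison is installed on both servers and the web root is /var/www; the paths and hostname here are just placeholders):

# two-way, non-interactive sync; Unison keeps its own archive of previous
# state, so a deletion on one side is propagated rather than undone
unison /var/www ssh://serverB//var/www -batch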

First, if you're doing a bidirectional rsync (i.e. running it first one way, and then the other) then you need to be using --update, and you need to have the clocks on both servers precisely aligned. If both servers write to the same file, the last write wins, and the earlier write is lost.
Second, I don't think you can use --delete. Not directly, anyway. The only state that rsync keeps is the state of the filesystem itself, and if that's a moving target then it'll get confused.
I suggest that when you delete a file you write its name to a file. Then, instead of using rsync --delete, do it manually with, for example, cat deleted-files | ssh serverB xargs rm -v.
So, your process would look something like this:
ServerA:
rsync -a --update mydir serverB:mydir
cat deleted-files | ssh serverB xargs rm -v
ServerB:
rsync -a --update mydir serverA:mydir
cat deleted-files | ssh serverA xargs rm -v
Obviously, the two syncs can't run at the same time, and I've left off other important rsync options: you probably want to consider --delay-updates, --partial-dir and others.
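Sketching that out as a script run on Server A (the log file location and web paths are placeholders; whatever handles deletes on the server would need to append the removed paths to the log, one per line):

#!/bin/bash
# Run on Server A; run the mirror-image script on Server B,
# but never both at the same time.
set -e

# push local changes without clobbering newer files on serverB
rsync -a --update /var/www/mydir/ serverB:/var/www/mydir/

# replay local deletions on serverB, then empty the log
if [ -s /var/log/deleted-files ]; then
    ssh serverB 'xargs rm -vf' < /var/log/deleted-files
    : > /var/log/deleted-files
fi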

Related

Can a Redis instance be configured to NOT overwrite old snapshots?

Currently I am using the redis.conf file to provide the fixed directory and filename to my instance to save the redis dump.rdb snapshot.
My intention is to compare two redis snapshots taken at different times.
But Redis overwrites the old dump file after creating the new one.
I checked the redis repo on GitHub and found the rdb.c file, which contains the code that executes the SAVE commands and overwrites old snapshots.
Before messing with the code (since I'm not an experienced developer), I wanted to ask whether there is a better way to save snapshots taken at different times, or whether I could just keep the last 2 snapshots at a time.
You can use incron to watch the dump directory and execute a script:
sudo apt-get install incron
echo "redis" >> /etc/incron.allow
export EDITOR=vi
incrontab -e
/path/where/you/dump/files IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /bin/copy_snapshot
Then create a /bin/copy_snapshot script that copies the dump file, renames it with a date or something, and makes sure only X copies are kept.
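One possible shape for that script, assuming the dump is called dump.rdb and you want to keep the last two copies (paths are placeholders):

#!/bin/bash
# /bin/copy_snapshot - archive the Redis dump with a timestamp
# and prune everything but the newest $KEEP copies
DUMP_DIR=/path/where/you/dump/files
ARCHIVE_DIR=/var/backups/redis
KEEP=2

mkdir -p "$ARCHIVE_DIR"
cp "$DUMP_DIR/dump.rdb" "$ARCHIVE_DIR/dump-$(date +%Y%m%d-%H%M%S).rdb"
ls -1t "$ARCHIVE_DIR"/dump-*.rdb | tail -n +$((KEEP + 1)) | xargs -r rm -f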

PostgreSQL : Find Postgresql database file on another drive and restore it

I am working on a PostgreSQL database, and recently we had a server upgrade during which we changed our drive from a 2 TB RAID hard disk to an SSD. I have now mounted the RAID drive on a partition and can access it.
What I would like to do next is get the database off the mounted drive and restore it into the currently running PostgreSQL. How can I achieve this?
root@check03:/mnt/var/lib/postgresql/9.1/main/global# ls
11672 11674 11805 11809 11811 11813_fsm 11816 11820 11822 11824_fsm 11828 11916 11920 pg_internal.init
11672_fsm 11675 11807 11809_fsm 11812 11813_vm 11818 11820_fsm 11823 11824_vm 11829 11918 pg_control pgstat.stat
11672_vm 11803 11808 11809_vm 11813 11815 11819 11820_vm 11824 11826 11914 11919 pg_filenode.map
root@check03:/mnt/var/lib/postgresql/9.1/main/global# cd ..
As you can see I am able to access the drives and the folders, but I don't know what to do next. Kindly let me know. Thanks a lot.
You need the same major version of PostgreSQL (9.1) and the same or a later minor version. Copy main/ and everything below it to the new location. Copy the configuration of the old instance and adapt the paths to the new location (main/ is the "data directory", also sometimes called PGDATA). Start the new instance and look carefully at the logs. You should probably rebuild any indexes.
Also read about the file layout in the fine documentation.
EDIT: If you have any chance of running the old instance, read about backup and restore; that is a much safer way to transfer data.
the Postgres binaries must be the same version
make sure that postgres is not running
copy using cp -rfp, tar | tar, or cpio, or whatever you like; make sure you preserve the file owners and mode (the top-level directory must be 0700, owned by postgres) - see the sketch after these steps
make sure that the postgres startup script (in /etc/init.d/postxxx) refers to the new directory; sometimes there is an environment variable $PGDATA containing the name of the postgres data directory; you may also need to make changes to new_directory/postgresql.conf (pg_log et al)
for safety, rename the old data directory
restart Postgres
try to connect to it; check the logs.
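For example, for the layout shown in the question, the copy might look roughly like this (the target path and service name are assumptions for a Debian-style install):

sudo service postgresql stop
# copy the old data directory, preserving owners and permissions
sudo tar -C /mnt/var/lib/postgresql/9.1 -cf - main | \
    sudo tar -C /var/lib/postgresql/9.1 -xpf -
# the data directory must be 0700 and owned by postgres
sudo chown -R postgres:postgres /var/lib/postgresql/9.1/main
sudo chmod 0700 /var/lib/postgresql/9.1/main
sudo service postgresql start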
Extra:
Seasoned unix-administrators (like the BOFH ;-) might want to juggle with mountpoints and/or symlinks (instead of copying). Be my guest. YMMV
Seasoned DBAs might want to create a tablespace, point it at the new location and (selectively) move databases, schemas or tables to the new location.

What's the correct way to deal with databases in Git?

I am hosting a website on Heroku, and using an SQLite database with it.
The problem is that I want to be able to pull the database from the repository (mostly for backups), but whenever I commit & push changes to the repository, the database should never be altered. This is because the database on my local computer will probably have completely different (and irrelevant) data in it; it's a test database.
What's the best way to go about this? I have tried adding my database to the .gitignore file, but that results in the database being completely unversioned, preventing me from pulling it when I need to.
While git (like most other version control systems) can track binary files such as databases, it only really works well for text files. In other words, you should not use a version control system to track constantly changing binary database files (unless they are created once and almost never change).
One popular way to still track databases in git is to track text database dumps. For example, an SQLite database can be dumped into a *.sql file using the sqlite3 utility (the .dump subcommand). However, even with dumps it is only appropriate to track template databases that do not change very often, and to create the binary database from such dumps with scripts as part of standard deployment.
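For example, the dump/restore round trip with sqlite3 looks roughly like this (the file names are placeholders):

# dump the database to a text file that git can diff sensibly
sqlite3 app.db .dump > app.sql
# later, rebuild a fresh binary database from the tracked dump
sqlite3 fresh.db < app.sql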
You could add a pre-commit hook to your local repository that will unstage any files you don't want to push.
For example, add the following to .git/hooks/pre-commit:
git reset ./file/to/database.db
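As a slightly fuller sketch of the hook file (the shebang line is an addition, not part of the original snippet):

#!/bin/sh
# .git/hooks/pre-commit - unstage the local test database before every commit
git reset ./file/to/database.db

Remember to make it executable with chmod +x .git/hooks/pre-commit, or git will ignore it.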
When working on your code (potentially modifying your database) you will at some point end up with something like this:
$ git status --porcelain
M file/to/database.db
M src/foo.cc
$ git add .
$ git commit -m "fixing BUG in foo.cc"
M file/to/database.db
.
[master 12345] fixing BUG in foo.cc
1 file changed, 1 deletion(-)
$ git status --porcelain
M file/to/database.db
so you can never accidentally commit changes made to your database.db
Is it the schema of your database you're interested in versioning? But making sure you don't version the data within it?
I'd exclude your database from git (using the .gitignore file).
If you're using an ORM and migrations (e.g. Active Record) then your schema is already tracked in your code and can be recreated.
However if you're not then you may want to take a copy of your database, then save out the create statements and version them.
Heroku doesn't recommend using SQLite in production, and suggests using their Postgres system instead. That lets you perform many tasks on the remote DB.
If you want to pull the live database from Heroku, the instructions for Postgres backups might be helpful.
https://devcenter.heroku.com/articles/pgbackups
https://devcenter.heroku.com/articles/heroku-postgres-import-export
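With the current Heroku CLI that boils down to something like this (the app name is a placeholder, and the exact subcommands have changed between CLI versions):

heroku pg:backups:capture --app your-app-name
heroku pg:backups:download --app your-app-name   # writes latest.dump locally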

How to delete/create databases in Neo4j?

Is it possible to create/delete different databases in the graph database Neo4j like in MySQL? Or, at least, how to delete all nodes and relationships of an existing graph to get a clean setup for tests, e.g., using shell commands similar to rmrel or rm?
You can just remove the entire graph directory with rm -rf, because Neo4j is not storing anything outside that:
rm -rf data/*
Also, you can of course iterate through all nodes and delete their relationships and the nodes themselves, but that might be too costly just for testing ...
An even simpler command to delete all nodes and relationships:
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
From Neo4j 2.3, we can delete all nodes together with their relationships:
MATCH (n)
DETACH DELETE n
Currently there is no option to create multiple databases in Neo4j. You need to create multiple Neo4j data stores. See the reference.
Creating a new database in Neo4j:
Before starting Neo4j Community, click the browse option, choose a different directory, and click the start button. A new database is created in that directory.
A quick and dirty way that works fine:
bin/neo4j stop
rm -rf data/
mkdir data
bin/neo4j start
For anyone else who needs a clean graph to run a test suite - https://github.com/jexp/neo4j-clean-remote-db-addon is a great extension to allow clearing the db through a REST call. Obviously, though, don't use it in production!
Run your test code on a different neo4j instance.
Copy your neo4j directory into a new location. Use this for testing. cd into the new directory.
Change the port so that you can run your tests and use it normally simultaneously. To change the port open conf/neo4j-server.properties and set org.neo4j.server.webserver.port to an unused one.
Start the test server on setup. Do ./neo4j stop and rm -rf data/graph.db on teardown.
For more details see neo4j: How to Switch Database? and the docs.
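Roughly, the setup and teardown could look like this (the install path and port number are placeholders for an old 2.x tarball layout):

# one-time setup: clone the install so tests never touch the real data
cp -r /opt/neo4j /opt/neo4j-test
# then edit /opt/neo4j-test/conf/neo4j-server.properties and set
#   org.neo4j.server.webserver.port=7475

# before each test run
/opt/neo4j-test/bin/neo4j start

# teardown: stop the test instance and wipe its graph
/opt/neo4j-test/bin/neo4j stop
rm -rf /opt/neo4j-test/data/graph.db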
In Neo4j 2.0.0 the ? syntax for optional relationships is no longer supported. Use OPTIONAL MATCH instead:
START n=node(*)
OPTIONAL MATCH (n)-[r]-()
delete n,r;
The easiest answer is: NO.
The best way to "start over" is to
move to another empty data folder
or
close Neo4j completely
empty the old data folder
restart Neo4j and set the empty folder as the data folder
There is a way to delete all nodes and relationships (as described here)
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
As of version 3, I believe it is now possible to create separate database instances, and so their location is slightly different.
Referring to: https://neo4j.com/developer/guide-import-csv/
The --into retail.db is obviously the target database, which must not contain an existing database.
On my Ubuntu box the location is in:
/var/lib/neo4j/data/databases where I currently see only graph.db which I believe must be the default.
In 2.0.0-M6 you can execute the following Cypher script to delete all nodes and relationships:
start n=node(*)
match (n)-[r?]-()
delete n,r
If you have a very large database,
`MATCH (n) DETACH DELETE n`
would take a lot of time and the database may even get stuck (I tried it, and it does not work on a very large database). So here is how I deleted a large Neo4j database on a Linux server.
First stop the running Neo4j database.
sudo neo4j stop
Second, delete the databases and transactions folders inside the data folder of the Neo4j installation. Where do you find the Neo4j folder? You can find the neo4j executable path by running which neo4j; follow that path to the installation and you will find the data folder inside it, containing the databases and transactions folders.
rm -rf databases/
rm -rf transactions/
Restart the Neo4j server
sudo neo4j start
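Put together, and assuming the default package layout with the data folder under /var/lib/neo4j (yours may differ):

sudo neo4j stop
sudo rm -rf /var/lib/neo4j/data/databases /var/lib/neo4j/data/transactions
sudo neo4j start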
You can delete your data files, and if you want to go this way I would recommend deleting just your graph.db, for example. Otherwise you are going to mess up your authentication info.

Making Postgres SQL minimal size. How?

I want to cut Postgres down to its minimal size so that I can bundle just the database functionality with my application. I'm using Portable Postgres, found on the internet.
Any suggestions what I can delete from Postgres installation which is not needed for normal database use?
You can delete all the standalone tools in /bin - it can all be done with psql. Keep anything that starts with pg_, plus postgres and initdb.
You can probably delete a bunch of the conversions in lib/ (the some_and_some.so files), but probably not until after you've run initdb. And be careful not to delete one you'll be using at some point - they are loaded dynamically, so you won't notice until, for example, a client connects with a different encoding.
But note that this probably won't get you much - on my system with debug enabled etc., the binaries take 17 MB. A clean data directory with no data at all in it takes 33 MB, about twice as much, which you will need if you're going to be able to use your database.
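As a sketch of that kind of clean-up (the install path is a placeholder; it only prints what it would remove, so you can check before deleting anything for real):

cd /path/to/portable-postgres/bin
# keep psql, initdb, postgres and the pg_* tools; flag everything else
for f in *; do
    case "$f" in
        psql*|initdb*|postgres*|pg_*) ;;    # keep
        *) echo "would remove $f" ;;        # swap echo for rm -f once verified
    esac
done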
