How to delete/create databases in Neo4j? - database

Is it possible to create/delete different databases in the graph database Neo4j like in MySQL? Or, at least, how to delete all nodes and relationships of an existing graph to get a clean setup for tests, e.g., using shell commands similar to rmrel or rm?

You can just remove the entire graph directory with rm -rf, since Neo4j does not store anything outside it:
rm -rf data/*
You can, of course, also iterate through all nodes and delete their relationships and the nodes themselves, but that might be too costly just for testing.

An even simpler command to delete all nodes and relationships:
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r

As of Neo4j 2.3, we can delete all nodes together with their relationships:
MATCH (n)
DETACH DELETE n
Currently there is no option to create multiple databases in Neo4j. You need to make multiple stores of Neo4j data. See the reference.

Creating a new database in Neo4j
Before starting Neo4j Community, click the Browse option, choose a different directory, and click the Start button. A new database is created in that directory.

A quick and dirty way that works fine:
bin/neo4j stop
rm -rf data/
mkdir data
bin/neo4j start

For anyone else who needs a clean graph to run a test suite - https://github.com/jexp/neo4j-clean-remote-db-addon is a great extension to allow clearing the db through a REST call. Obviously, though, don't use it in production!

Run your test code on a different Neo4j instance.
Copy your Neo4j directory to a new location, use that copy for testing, and cd into the new directory.
Change the port so that you can run your tests and use the original instance simultaneously. To change the port, open conf/neo4j-server.properties and set org.neo4j.server.webserver.port to an unused one.
Start the test server on setup. Do ./neo4j stop and rm -rf data/graph.db on teardown.
For more details see neo4j: How to Switch Database? and the docs.
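For example, the test copy's conf/neo4j-server.properties might look like this (7475 is just an arbitrary unused port chosen for illustration; the property name applies to the 2.x server this answer targets):

```
# conf/neo4j-server.properties in the test copy
org.neo4j.server.webserver.port=7475
```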

In Neo4j 2.0.0 the ? relationship syntax is no longer supported. Use OPTIONAL MATCH instead:
START n=node(*)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r;

The easiest answer is: NO.
The best way to "start over" is to
move to another empty data folder
or
close Neo4j completely
empty the old data folder
restart Neo4j and set the empty folder as the data folder
There is a way to delete all nodes and relationships (as described here)
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r

As of version 3, I believe it is now possible to create separate database instances, and their location is slightly different.
Referring to: https://neo4j.com/developer/guide-import-csv/
The --into retail.db is obviously the target database, which must not contain an existing database.
On my Ubuntu box the location is in:
/var/lib/neo4j/data/databases, where I currently see only graph.db, which I believe must be the default.

In 2.0.0-M6 you can execute the following Cypher script to delete all nodes and relationships:
start n=node(*)
match (n)-[r?]-()
delete n,r

If you have a very large database,
`MATCH (n) DETACH DELETE n`
will take a lot of time, and the database may even get stuck (I tried it, and it did not work on a very large database). So here is how I deleted a large Neo4j database on a Linux server.
First, stop the running Neo4j database:
sudo neo4j stop
Second, delete the databases and transactions folders inside the data folder of the Neo4j installation. Where is the neo4j folder? You can find the Neo4j executable path by running which neo4j, then look for the data folder along that path (it is located inside the neo4j folder). Inside the data folder you will see the databases and transactions folders.
rm -rf databases/ transactions/
Restart the Neo4j server:
sudo neo4j start

You can delete your data files, but if you want to go this way, I would recommend deleting just your graph.db, for example. Otherwise you are going to mess up your authentication info.

Related

How do I completely delete a MarkLogic database along with its servers and forests?

I have multiple databases in my local environment which I do not need. Can I run a curl script or a REST API command to delete a database, its servers, and all of its forests, so that I can use Gradle to just deploy them again?
I have tried to manually delete the server first, then the database and then the forests. This is a lengthy process.
I want a single command to do the whole job for me instead of manually having to delete the components one by one which is possible through the admin interface.
Wagner Michael has a fair point in his comment. If you already used (ml-)Gradle to create servers and databases, why not use its mlUndeploy -Pconfirm=true task to get rid of them? You could potentially even use a fake project, with stub configs to get rid of a fairly random set of databases and servers, though that still takes some manual work.
By far the quickest way to reset your entire MarkLogic, is to stop it, and wipe its data directory. This SO question gives instructions on how to do it, as part of a solution to recover when you lost your admin password:
https://stackoverflow.com/a/27803923/918496
HTH!

What's the correct way to deal with databases in Git?

I am hosting a website on Heroku, and using an SQLite database with it.
The problem is that I want to be able to pull the database from the repository (mostly for backups), but whenever I commit & push changes to the repository, the database should never be altered. This is because the database on my local computer will probably have completely different (and irrelevant) data in it; it's a test database.
What's the best way to go about this? I have tried adding my database to the .gitignore file, but that results in the database being completely unversioned, preventing me from pulling it when I need to.
While git (just like most other version control systems) supports tracking binary files like databases, it works best with text files. In other words, you should never use a version control system to track constantly changing binary database files (unless they are created once and almost never change).
One popular method to still track databases in git is to track text database dumps. For example, an SQLite database can be dumped into a *.sql file using the sqlite3 utility (the .dump subcommand). However, even when using dumps, it is only appropriate to track template databases which do not change very often, and to create the binary database from such dumps using scripts as part of standard deployment.
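As a minimal sketch of the dump approach, using Python's built-in sqlite3 module (the file names here are placeholders): Connection.iterdump() produces the same kind of SQL text as the CLI's .dump, which is what you would commit instead of the binary file.

```python
import sqlite3

# Placeholder paths: adapt to your project layout.
DB_FILE = "app.db"
DUMP_FILE = "app.sql"

# Create a tiny example database so the sketch is self-contained.
con = sqlite3.connect(DB_FILE)
con.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO users (name) VALUES ('alice')")
con.commit()

# Write a text dump -- this is what you would track in git
# instead of the binary .db file.
with open(DUMP_FILE, "w") as f:
    for line in con.iterdump():
        f.write(line + "\n")
con.close()

# Rebuilding a binary database from the tracked dump:
restored = sqlite3.connect(":memory:")
restored.executescript(open(DUMP_FILE).read())
print(restored.execute("SELECT name FROM users").fetchone()[0])  # prints "alice"
```

The dump diffs line by line like any other text file, which is exactly what git handles well.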
You could add a pre-commit hook to your local repository that will unstage any files that you don't want to push.
e.g. add the following to .git/hooks/pre-commit
git reset ./file/to/database.db
When working on your code (potentially modifying your database) you will at some point end up with:
$ git status --porcelain
M file/to/database.db
M src/foo.cc
$ git add .
$ git commit -m "fixing BUG in foo.cc"
M file/to/database.db
.
[master 12345] fixing BUG in foo.cc
1 file changed, 1 deletion(-)
$ git status --porcelain
M file/to/database.db
so you can never accidentally commit changes made to your database.db.
Is it the schema of your database you're interested in versioning, while making sure you don't version the data within it?
I'd exclude your database from git (using the .gitignore file).
If you're using an ORM and migrations (e.g. Active Record) then your schema is already tracked in your code and can be recreated.
However if you're not then you may want to take a copy of your database, then save out the create statements and version them.
Heroku doesn't recommend using SQLite in production; they suggest their Postgres system instead. That lets you perform many tasks on the remote DB.
If you want to pull the live database from Heroku the instructions for Postgres backups might be helpful.
https://devcenter.heroku.com/articles/pgbackups
https://devcenter.heroku.com/articles/heroku-postgres-import-export

Sync web directories between load balanced servers

I have two load balanced servers each running exactly the same copy of various PHP based websites.
An admin user (or multiple admin users) hitting their site might therefore hit one or other of the servers when wanting to change content e.g. upload an image, delete a file from a media library, etc.
These operations mean that one or the other or both servers go out of sync and need to be brought back into line.
Currently I'm looking at rsync for this with the --delete option but am unsure how it reacts to files being deleted vs. new files being created between servers.
i.e., if I delete a file on Server A and rsync with Server B, the file should also be deleted from Server B (as it no longer exists on A). BUT: if I separately upload a file to Server B as well as deleting a file from Server A before running the sync, will the file that was uploaded to Server B also get removed, since it doesn't exist on Server A?
A number of tutorials on the web deal with a Master-Slave type scenario where Server B is a Mirror of Server A and this process just works, but in my situation both servers are effectively Masters mirroring each other.
I think rsync keeps a local history of the files it's dealing with and as such may be able to handle this problem gracefully, but I'm not sure if that is really the case, or whether it's dangerous to rely on it alone.
Is there a better way of dealing with this issue?
I wasn't happy with my previous answer. Sounds too much like somebody must have invented a way to do this already.
It seems that there is! Check out Unison. There's a GUI for it and everything.
First, if you're doing a bidirectional rsync (i.e. running it first one way, and then the other) then you need to be using --update, and you need to have the clocks on both servers precisely aligned. If both servers write to the same file, the last write wins, and the earlier write is lost.
Second, I don't think you can use --delete. Not directly, anyway. The only state rsync keeps is the state of the filesystem itself, and if that's a moving target then it'll get confused.
I suggest that when you delete a file you write its name to a file. Then, instead of using rsync --delete, do it manually with, for example, cat deleted-files | ssh serverb xargs rm -v.
So, your process would look something like this:
ServerA:
rsync -a --update mydir serverB:mydir
cat deleted-files | ssh serverB xargs rm -v
ServerB:
rsync -a --update mydir serverA:mydir
cat deleted-files | ssh serverA xargs rm -v
Obviously, the two syncs can't run at the same time, and I've left off other important rsync options: you probably want to consider --delay-updates, --partial-dir and others.

Empty my Sqlite3 database in RoR

I am working on a Ruby on Rails 3 web application using sqlite3. I have been testing my application by creating and destroying things in the database on the fly, sometimes through the new and edit actions and sometimes through the Rails console.
I want to empty my database completely and have only the empty tables left. How can I achieve this? I am working with a team, so I am interested in two answers:
1) How do I empty the database only for myself?
2) How can I (if possible) empty it for the others (some of whom are not using sqlite3 but MySQL)? (We are all working on the same project through an SVN repository.)
To reset your database, you can run:
rake db:schema:load
Which will recreate your database from your schema.rb file (maintained by your migrations). This will additionally protect you from migrations that may later fail due to code changes.
Your dev database should be specific to your environment - if you need certain data, add it to your seeds.rb file. Don't share a dev database, as you'll quickly get into situations where other people's changes make your version incompatible.
Download DB Browser for SQLite here: http://sqlitebrowser.org/
Install it, run it, click Open Database (top left), and select locationOfYourRailsApp/db/development.sqlite3.
Then switch to the Browse Data tab, where you can delete or add data.
I found that deleting the development.sqlite3 file from the db folder and running rake db:migrate on the command line solves the problem for everyone on my team working with sqlite3.
As far as I know there is no USER/GRANT management in sqlite, so it is difficult to control access.
You can only protect the database through file access.
If you want to use an empty database for test purposes, generate it once and copy the file somewhere, then use a copy of that file just before testing.
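That template-copy idea can be sketched in Python (file names here are placeholders): generate the empty database once, keep it as a pristine template, and copy it into place before each test run.

```python
import shutil
import sqlite3

TEMPLATE = "template.sqlite3"   # empty schema, generated once
TEST_DB = "test.sqlite3"        # working copy used by the tests

# One-time setup: create the empty template containing only the schema.
con = sqlite3.connect(TEMPLATE)
con.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, label TEXT)")
con.commit()
con.close()

def fresh_test_db():
    """Overwrite the working copy with the pristine template."""
    shutil.copyfile(TEMPLATE, TEST_DB)
    return sqlite3.connect(TEST_DB)

# Each test starts from a clean database, no matter what ran before.
db = fresh_test_db()
db.execute("INSERT INTO items (label) VALUES ('scratch data')")
db.commit()
db.close()

db = fresh_test_db()
print(db.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # prints 0
```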

What are the best practices for database scripts under code control

We are currently reviewing how we store our database scripts (tables, procs, functions, views, data fixes) in subversion and I was wondering if there is any consensus as to what is the best approach?
Some of the factors we'd need to consider include:
Should we check in 'Create' scripts, or check in incremental changes as 'Alter' scripts?
How do we keep track of the state of the database for a given release
It should be easy to build a database from scratch for any given release version
Should a table exist in the database listing the scripts that have run against it, or the version of the database etc.
Obviously it's a pretty open ended question, so I'm keen to hear what people's experience has taught them.
After a few iterations, the approach we took was roughly like this:
One file per table and per stored procedure. Also separate files for other things like setting up database users, populating look-up tables with their data.
The file for a table starts with the CREATE command and a succession of ALTER commands added as the schema evolves. Each of these commands is bracketed in tests for whether the table or column already exists. This means each script can be run in an up-to-date database and won't change anything. It also means that for any old database, the script updates it to the latest schema. And for an empty database the CREATE script creates the table and the ALTER scripts are all skipped.
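A sketch of this guarded-script idea, using Python and SQLite as a stand-in (the table and column names are invented for illustration): each step first tests whether its change has already been applied, so the script can be run repeatedly against any version of the database.

```python
import sqlite3

con = sqlite3.connect(":memory:")

def table_exists(con, name):
    return con.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone() is not None

def column_exists(con, table, column):
    # PRAGMA table_info returns one row per column; column name is field 1.
    return any(row[1] == column for row in
               con.execute(f"PRAGMA table_info({table})"))

def apply_schema(con):
    # CREATE bracketed in an existence test ...
    if not table_exists(con, "customer"):
        con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    # ... followed by guarded ALTERs added as the schema evolves.
    if not column_exists(con, "customer", "email"):
        con.execute("ALTER TABLE customer ADD COLUMN email TEXT")

apply_schema(con)
apply_schema(con)  # a second run changes nothing -- the script is idempotent
print(column_exists(con, "customer", "email"))  # prints True
```

An empty database gets the CREATE and all ALTERs; an old database gets only the ALTERs it is missing; an up-to-date database is left untouched.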
We also have a program (written in Python) that scans the directory full of scripts and assembles them into one big script. It parses the SQL just enough to deduce dependencies between tables (based on foreign-key references) and orders them appropriately. The result is a monster SQL script that gets the database up to spec in one go. The script-assembling program also calculates the MD5 hash of the input files, and uses that to update a version number that is written into a special table in the last script in the list.
Barring accidents, the result is that the database script for a given version of the source code creates the schema this code was designed to interoperate with. It also means that there is a single (somewhat large) SQL script to give to the customer to build new databases or update existing ones. (This was important in this case because there would be many instances of the database, one for each of their customers.)
There is an interesting article at this link:
https://blog.codinghorror.com/get-your-database-under-version-control/
It advocates a baseline 'create' script followed by checking in 'alter' scripts and keeping a version table in the database.
The upgrade script option
Store each change in the database as a separate sql script. Store each group of changes in a numbered folder. Use a script to apply changes a folder at a time and record in the database which folders have been applied.
Pros:
Fully automated, testable upgrade path
Cons:
Hard to see full history of each individual element
Have to build a new database from scratch, going through all the versions
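A possible sketch of such a folder-based applier in Python with SQLite (the folder layout, file names, and table names are assumptions for illustration): applied folders are recorded in the database, so each group of scripts runs exactly once, in numeric order.

```python
import sqlite3
from pathlib import Path

# Assumed layout (created here so the sketch is self-contained):
#   migrations/001/create_user.sql
#   migrations/002/add_email.sql
root = Path("migrations")
(root / "001").mkdir(parents=True, exist_ok=True)
(root / "001" / "create_user.sql").write_text(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);")
(root / "002").mkdir(exist_ok=True)
(root / "002" / "add_email.sql").write_text(
    "ALTER TABLE user ADD COLUMN email TEXT;")

def apply_pending(con, root):
    # Record which folders have run, so each is applied exactly once.
    con.execute("CREATE TABLE IF NOT EXISTS applied_folder (name TEXT PRIMARY KEY)")
    done = {row[0] for row in con.execute("SELECT name FROM applied_folder")}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        if folder.name in done:
            continue
        for script in sorted(folder.glob("*.sql")):
            con.executescript(script.read_text())
        con.execute("INSERT INTO applied_folder VALUES (?)", (folder.name,))
        con.commit()

con = sqlite3.connect(":memory:")
apply_pending(con, root)
apply_pending(con, root)  # re-running applies nothing new
print(sorted(row[0] for row in con.execute("SELECT name FROM applied_folder")))
# prints ['001', '002']
```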
I tend to check in the initial create script. I then have a DbVersion table in my database and my code uses that to upgrade the database on initial connection if necessary. For example, if my database is at version 1 and my code is at version 3, my code will apply the ALTER statements to bring it to version 2, then to version 3. I use a simple fallthrough switch statement for this.
This has the advantage that when you deploy a new version of your application, it will automatically upgrade old databases and you never have to worry about the database being out of sync with the software. It also maintains a very visible change history.
This isn't a good idea for all software, but variations can be applied.
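A minimal sketch of that fallthrough upgrade in Python with SQLite (version numbers, table names, and statements are invented): the code reads the stored version and applies each remaining upgrade step in order until the database matches the code.

```python
import sqlite3

CODE_VERSION = 3

# Upgrade steps keyed by the version they upgrade *from*.
UPGRADES = {
    1: ["ALTER TABLE widget ADD COLUMN color TEXT"],
    2: ["ALTER TABLE widget ADD COLUMN weight REAL"],
}

def upgrade(con):
    (db_version,) = con.execute("SELECT version FROM db_version").fetchone()
    # Fall through each step until the database matches the code.
    while db_version < CODE_VERSION:
        for stmt in UPGRADES[db_version]:
            con.execute(stmt)
        db_version += 1
        con.execute("UPDATE db_version SET version = ?", (db_version,))
    con.commit()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE widget (id INTEGER PRIMARY KEY)")  # version-1 schema
con.execute("CREATE TABLE db_version (version INTEGER)")
con.execute("INSERT INTO db_version VALUES (1)")

upgrade(con)
print(con.execute("SELECT version FROM db_version").fetchone()[0])  # prints 3
```

A version-1 database gets both steps, a version-2 database gets only the second, and a current database is untouched, which is what keeps old deployments from drifting out of sync with the software.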
You could get some hints by reading how this is done with Ruby On Rails' migrations.
The best way to understand this is probably to just try it out yourself, and then inspect the database manually.
Answers to each of your factors:
Store CREATE scripts. If you want to check out version x.y.z, then it'd be nice to simply run your create script to set up the database immediately. You could add ALTER scripts as well to go from the previous version to the next (e.g., you commit version 3, which contains a version 3 CREATE script and a version 2 → 3 ALTER script).
See the Rails migration solution. Basically they keep the table version number in the database, so you always know.
Use CREATE scripts.
Using version numbers would probably be the most generic solution — script names and paths can change over time.
My two cents!
We create a branch in Subversion and all of the database changes for the next release are scripted out and checked in. All scripts are repeatable so you can run them multiple times without error.
We also link the change scripts to issue items or bug ids so we can hold back a change set if needed. We then have an automated build process that looks at the issue items we are releasing and pulls the change scripts from Subversion and creates a single SQL script file with all of the changes sorted appropriately.
This single file is then used to promote the changes to the Test, QA, and Production environments. The automated build process also creates database entries documenting the version (branch plus build id). We think this is the best approach for enterprise developers.
The create script option:
Use create scripts that will build the latest version of the database from scratch, empty except for the default lookup data.
Use standard version control techniques to store, branch, and tag versions and to view the histories of your objects.
When upgrading a live database (where you don't want to lose data), create a blank second copy of the database at the new version and use a schema-comparison tool like Red Gate's to migrate the live database.
Pros:
Changes to files are tracked in a standard source-code like manner
Cons:
Reliance on manual use of a 3rd party tool to do actual upgrades (no/little automation)
Our company checks them in simply because someone decided to put it in some SOX document that we do. It makes no sense to me at all, except possibly as a reference document. I can't see a time we'd pull them out and try to use them again, and if we did, we'd have to know which one ran first and which to run after which. Backing up the database is much more important than keeping the Alter scripts.
For every release we need to provide one update.sql file which contains all the new table scripts, alter statements, new/modified packages, roles, etc. This file is used to upgrade the database from version 1 to version 2.
Whatever we include in the update.sql file also has to go into the individual respective files: an alter statement adding a column means the table script has to be modified (rather than appending the Alter statement after the create-table script in the file), and the same goes for new tables, roles, etc.
So whenever a user wants to upgrade, he uses the update.sql file.
If he wants to build from scratch, he uses build.sql, which already contains all the above statements; this keeps the database in sync.
In my case, I built a shell script for this work: https://github.com/reduardo7/db-version-updater
How to do this is an open question. In my case I am trying to create something simple that is easy to use for developers, and I did it under the following scheme.
Things I tested:
File-based script handling in git using GitLab CI: it does not work; collisions are created, the administration part has to be done by hand in case of disaster, and the development part is too complicated.
Use of permissions and access via MySQL clients: there is no traceability of changes to the database, and the transition to production is manual.
Use of the programs mentioned here: they require uploading the structures and many adaptations, and usually you end up doing change control by hand anyway.
Repository usage: I could not control the DRP part, and I could not properly control the backups; I don't think it is a good idea to have the backups on the same server, and it generates high lags in the process.
This is what worked best:
Manage permissions per user and generate traceability of everything that is sent to the database
Multi-platform
Use of development, production, and QA databases
Always back up before each modification
Manage an open repository for change control
Multi-server
Deactivate/activate access to the web page or app through endpoints
The initial project is at:
https://hub.docker.com/r/arelis/gitdb
There is an interesting article with the new URL at: https://blog.codinghorror.com/get-your-database-under-version-control/
It's a bit old, but the concepts are still there. Good read!
