Neo4j database size / shrinking

We have a Neo4j database with ~10 million nodes and ~300 million relationships. The database has grown to about 80 GB. There are daily jobs that delete old data and add new data, so the approximate number of nodes and relationships stays fairly constant. However, the physical size of the database files keeps growing (for example, the relationship store file is currently at 50 GB).
I found the following link that might explain why the size might not go down when deleting (the space is left reserved and gets reused by new relationships and nodes); however, it still does not explain why our database keeps growing!
neostore.* file size after deleting millions node
Questions:
a) What can we check to find out why our DB is still growing?
b) Most relational DBs I've worked with have a "shrink files" or "optimize free space" function. Does something like this exist for Neo4j? (A Google search was unsuccessful!)
Thank you,
-A
P.S. We're running Neo4j 2.1.5 Community Edition on Ubuntu 14

There is a tool called store-utils that lets you effectively compact the store. You'd have to shut down the database before running it, though.
As always, take a backup before you try out any tool that does anything to the store.

Related

Use Import / Export from SQL Server Management Studio to reduce database size

I have a SQL Server database on a production server. The size of the database is now above 30 GB. When I export its data using the Import/Export tool in SQL Server Management Studio, the target database gets filled with the complete data.
But the size of the target database is reduced dramatically, by around 50% (i.e. it goes down to 14-15 GB). I tried the shrink command to reduce the size, but it shrank the database by no more than 2 percent.
My process for this import/export is as follows:
Generate scripts for tables/functions/procedures creation from source database
Use those scripts on blank target database to create same tables (with constraints), functions and procedures
Then disable all constraints on the target database (roughly as in the sketch after this list)
Use SQL Server's Import/Export tool to import data from the source to the target database
After all data has been imported, re-enable all constraints on the target database
Done
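For steps 3 and 5, the constraint toggling was done with something along these lines (sp_MSforeachtable is an undocumented SQL Server helper, so treat this as a rough sketch rather than my exact script):
    -- Step 3: disable all foreign key / check constraints on every table
    EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';

    -- Step 5: re-enable and re-validate all constraints after the import
    EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';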
This process makes an exact replica of the source database. The target database, however, comes out much smaller than the source database.
My question is, can I fully trust this new target database? Can I replace the source database with this target database and use it on the production server?
Whether 30 GB is large or small obviously depends on your specific business context, so I'm not sure what Preben is trying to say there. Maybe just having a bit of a boast?
However, I agree that using the wizard to spin up a new copy of the database is an odd approach. It would be much faster and easier to backup the existing production database, and restore it as a new copy.
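If you go the backup/restore route, a minimal T-SQL sketch looks something like this (MyDB, the copy name, the paths, and the logical file names are all placeholders you'd need to adjust for your server):
    -- Back up the existing production database
    BACKUP DATABASE MyDB
    TO DISK = N'D:\Backups\MyDB.bak'
    WITH INIT;

    -- Restore it as a new copy under a different name
    RESTORE DATABASE MyDB_Copy
    FROM DISK = N'D:\Backups\MyDB.bak'
    WITH MOVE 'MyDB' TO N'D:\Data\MyDB_Copy.mdf',
         MOVE 'MyDB_log' TO N'D:\Data\MyDB_Copy_log.ldf';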
The target database will be smaller because it will have no fragmentation, and possibly because the recovery setting is different. As the first database is constantly in use over a long period of time, SQL has to find somewhere to put all the new data being created.
As an analogy, think about all the things you have in your home. Things are on shelves, things are in cabinets, some things are just on the floor. Now you go and buy something new, and you have to figure out where to put it. Maybe you have to move some of your old things to find space for the new things. Everything gets spread out.
Then one day you decide to move house. You take all your things, put them in boxes, and move the boxes to the new house. Everything in your new home is organised into a small amount of space, inside the boxes, because you haven't yet started using any of it.
Once you start actually using those things you will have to spread them all around your new home, on shelves, in cabinets, and so on, and they will suddenly be taking up more space. The more things you need to use, the more boxes get unpacked and spread around the room.
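If you want to see how much of the source database is real data versus reserved or fragmented space, a quick sketch (nothing here is specific to your schema, and the fragmentation scan can take a while on a 30 GB database):
    -- Allocated vs. used space for the current database
    EXEC sp_spaceused;

    -- Fragmentation per index
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    ORDER BY ips.avg_fragmentation_in_percent DESC;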

Compression not available on SQL Server Standard? Options?

So, as they say, every day is a school day. Today I learned that my workplace runs SQL Server Standard edition, where I would have assumed Enterprise was in place. Although in reality I shouldn't be surprised!
For some context, we have a very large database that houses our warehouse data. As the database has grown, it has been causing issues with space on the server, along with some application performance problems. So, looking at it from my perspective, I suggested we archive and purge the PROD database so that it holds only 18 months of data.
I wrote my scripts and tested them and all was fine. I then went to compress the tables I had deleted data from, only to find error messages saying that compression is not available in SQL Server Standard and requires Enterprise edition.
Wondering what my next steps are here? My assumption is that even though I am deleting a lot of data, we won't actually benefit in terms of performance and space reclamation until the tables get compressed.
Shrinking is something I guess I've always shied away from; many articles and posts here advise against using it.
Wondering, what sort of options do I have here?
Is my assumption correct, in that without compressing, we won't regain space from the trimmed database?
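For reference, this is roughly what I was attempting and the fallback I'm considering; MyBigTable and MyDB_Data are placeholders for my real table and data file names, so take it as a sketch:
    -- What I tried, and what errors out on Standard edition (pre-2016 SP1):
    ALTER TABLE dbo.MyBigTable REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Rebuilding without compression still repacks pages after large deletes:
    ALTER INDEX ALL ON dbo.MyBigTable REBUILD;

    -- Shrinking the data file is what would hand space back to the OS
    -- (the part I've been hesitant about); target size is in MB:
    DBCC SHRINKFILE (N'MyDB_Data', 20480);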
Marking this as resolved, as I'm opening the query in the DBA-specific section of the site.

MySQL Database Dump Restores to Smaller Size

I just started using HeidiSQL to manage my databases at work. I was previously using NaviCat, and I was able to simply drag and drop a database from one server to a database on another server and copy all the tables and data--piece of cake, backed up. Similar functionality isn't obvious in Heidi, but I decided using mysqldump is a standard way of backing up a database and I should get to know it.
I used Heidi to perform the dump--creating databases, creating tables, inserting data, and making one single SQL file. This was performed on one single database that is 801.7 MB (checksum: 1755734665). When I executed the dump file on my local doppelganger database it appeared to work flawlessly, however the database size is 794.0 MB (checksum: 2937674450).
So, by creating a new database from a mysqldump of a database I lost 7.7 MB of something (and the databases have different checksums--not sure what that means though). I searched and found a post saying that performing a mysqldump can optimize a database, but could not find any conclusive information.
Am I being a helicopter parent? Is there any easy way to make sure I didn't lose data/structure somehow?
Thank you,
Kai
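For what it's worth, the kind of check I had in mind is something like the following, run against both servers and compared; my_db and my_table are placeholders for the real names:
    -- Exact row counts (information_schema row counts are only estimates for InnoDB)
    SELECT COUNT(*) FROM my_db.my_table;

    -- Per-table checksum to compare between the two servers
    CHECKSUM TABLE my_db.my_table EXTENDED;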

PostgreSQL database size is less after backup/load on Heroku

Recently I created a new Heroku app for production and populated its database with a backup that I took from the staging database.
The problem is that the database size, as shown on Heroku's Postgres web page, is different for the two databases!
The first database, which I took the backup from, was 360 MB, and the new database that was populated was only 290 MB.
No errors showed up during the backup/load process, and taking a backup of each of the two databases results in the same backup file size (around 40 MB).
The project is working fine, and the two apps look exactly the same, but I'm concerned that I might have lost some data that would cause troubles in the future.
More info: I'm using the same production database plan on both Apps.
Also, the first database is not attached to the first instance (because it was added from the Postgres management page, not from the App's resources page) and the new database is attached to the new App.
Thanks in advance
It is OK for a PostgreSQL DB to consume more space when in use.
The reason for this is its MVCC system. Every time you UPDATE a record, the database creates another "version" of that record instead of rewriting the previous one. These "outdated" records are deleted by the VACUUM process once they are no longer needed.
So, when you restored your DB from the backup, it didn't have any "dead" records and its size was smaller.
Details here http://www.postgresql.org/docs/current/static/mvcc.html and http://www.postgresql.org/docs/current/static/sql-vacuum.html.
P.S. You do not need to worry about it. PostgreSQL will handle VACUUM automatically.
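If you want to see the effect yourself, a rough sketch to run on the source database:
    -- Dead (not yet vacuumed) rows per table
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;

    -- Reclaim the space held by dead rows inside the table files
    -- (autovacuum does this for you in the background)
    VACUUM VERBOSE;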
See if this helps: PostgreSQL database size increasing
Also try to measure the size of each table individually, and for those tables where you see differences, compare record counts: postgresql total database size not matching sum of individual table sizes
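A minimal query for the per-table comparison, to run on both databases and diff the output (then do a plain count(*) on any table that differs):
    SELECT relname AS table_name,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size
    FROM pg_catalog.pg_statio_user_tables
    ORDER BY pg_total_relation_size(relid) DESC;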

Why does an empty database take megabytes of space?

When experimenting with (embedded) Apache Derby DB, I noticed that a fresh database, with no tables in it, takes about 1.7 MB of disk space. That's quite a bit more than I would have expected.
Why is that? Are there significant differences between database engines in this respect? Can this be controlled with some "block size"-like setting?
There will be differences between different database engines.
Generally, there will be all the metadata tables needed to track the real tables/views/whatever else can appear in the database once they're created, plus possibly some pre-allocated space ready for when tables are added or transactions start occurring.
E.g. the model database for SQL Server (2000) occupies ~1.25 MB of space, of which 0.5 MB is empty. This DB is the basis for all other databases in SQL Server.
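You can look at that pre-allocated space directly; on a reasonably modern SQL Server (2005 onwards), a small sketch (sizes are stored in 8 KB pages):
    -- Initial data and log file sizes of the model database,
    -- which every newly created database inherits
    SELECT name, type_desc, size * 8 AS size_kb
    FROM sys.master_files
    WHERE database_id = DB_ID('model');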
Why does an empty folder occupy 4 KB of space, e.g. in Windows?
I have this wild guess out of nowhere...
You said that it's embedded... So, since it's embedded, the database itself has to contain all the information it needs to manage itself properly, maybe user account information and so on, which in most server/network versions of databases is usually handled by built-in system databases. It's EMBEDDED! Just a thought!
