Recently I created a new Heroku app for production and populated its database with a backup that I took from the staging database.
The problem is that the database sizes, as shown on Heroku's Postgres web page, are different for the two databases!
The first database, which I took the backup from, was 360 MB, while the new database that was populated from it is only 290 MB.
No errors showed up during the backup/load process, and taking a backup from either database results in the same backup file size (around 40 MB).
The project is working fine, and the two apps look exactly the same, but I'm concerned that I might have lost some data that would cause trouble in the future.
More info: I'm using the same production database plan on both apps.
Also, the first database is not attached to the first app (because it was added from the Postgres management page, not from the app's Resources page), while the new database is attached to the new app.
Thanks in advance
It is OK for a PostgreSQL database to consume more space while it is in use.
The reason is its MVCC system. Every time you UPDATE a record, the database creates another "version" of that record instead of rewriting the previous one. These "outdated" records are later deleted by the VACUUM process once they are no longer needed.
So, when you restored your database from the backup, it didn't contain any "dead" records, and its size was smaller.
Details here: http://www.postgresql.org/docs/current/static/mvcc.html and http://www.postgresql.org/docs/current/static/sql-vacuum.html.
P.S. You do not need to worry about it; PostgreSQL handles VACUUM automatically (autovacuum).
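If you want to confirm that the difference is just dead tuples, a quick check like this (a minimal sketch, assuming you can open a psql session against each database, e.g. with heroku pg:psql) compares the reported size with per-table dead-tuple counts; on the freshly restored database the dead counts should be close to zero:

    -- Overall database size, as Heroku reports it
    SELECT pg_size_pretty(pg_database_size(current_database()));

    -- Live vs. dead tuples per table; dead tuples are space VACUUM can reclaim
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;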
See if this helps: PostgreSQL database size increasing
Also try measuring the size of each table individually, and for the tables where you see differences, compare the record counts: postgresql total database size not matching sum of individual table sizes
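Something like the following (a sketch using the standard statistics views; run it on both databases and compare the output) gives per-table sizes and approximate row counts in one pass:

    -- Per-table total size (including indexes and TOAST) plus approximate row count
    SELECT relname AS table_name,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
           n_live_tup AS approx_row_count
    FROM pg_stat_user_tables
    ORDER BY pg_total_relation_size(relid) DESC;
    -- For tables that differ, compare exact counts with SELECT count(*) FROM <table>;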
I have a SQL Server database on a production server. The size of the database is above 30 GB now. When I export its data using the Import/Export tool in SQL Server Management Studio, the target database is filled with the complete data.
But the size of the target database is reduced dramatically, by around 50% (i.e. it goes down to 14-15 GB). I tried the shrink command to reduce the size, but it shrinks the database by no more than 2 percent.
My process for this import/export is as follows:
Generate scripts for table/function/procedure creation from the source database
Use those scripts on the blank target database to create the same tables (with constraints), functions and procedures
Then disable all constraints on the target database (see the sketch after this list)
Use SQL Server's Import/Export tool to import the data from the source to the target database
After all data has been imported, enable all constraints on the target database
Done
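For the constraint steps above, the usual bulk approach looks roughly like this (a sketch relying on the undocumented but widely used sp_MSforeachtable helper; re-enabling WITH CHECK matters because it leaves the constraints trusted again):

    -- Disable all foreign key and check constraints before the bulk import
    EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';

    -- ... run the Import/Export data transfer here ...

    -- Re-enable and re-validate all constraints (WITH CHECK makes them trusted)
    EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';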
This process makes an exact replica of the source database. The target database, however, ends up much smaller than the source database.
My question is, can I fully trust this new target database? Can I replace the source database with this target database and use it on production server?
Whether 30 GB is large or small obviously depends on your specific business context, so I'm not sure what Preben is trying to say there. Maybe just having a bit of a boast?
However, I agree that using the wizard to spin up a new copy of the database is an odd approach. It would be much faster and easier to back up the existing production database and restore it as a new copy.
The target database will be smaller because it has no fragmentation, and possibly because its recovery setting is different. As the first database is constantly in use over a long period of time, SQL Server has to find somewhere to put all the new data being created.
As an analogy, think about all the things you have in your home. Things are on shelves, things are in cabinets, some things are just on the floor. Now you go and buy something new, and you have to figure out where to put it. Maybe you have to move some of your old things to find space for the new things. Everything gets spread out.
Then one day you decide to move house. You take all your things, put them in boxes, and move the boxes to the new house. Everything in your new home is organised into a small amount of space, inside the boxes, because you haven't yet started using any of it.
Once you start actually using those things you will have to spread them all around your new home, on shelves, in cabinets, and so on, and they will suddenly be taking up more space. The more things you need to use, the more boxes get unpacked and spread around the room.
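If you want to put a number on how much of the source database's extra size comes from fragmentation, a query along these lines (a sketch against the standard index DMVs; what counts as "too fragmented" is a judgment call) shows fragmentation per index:

    -- Average fragmentation per index; LIMITED mode keeps the scan cheap
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    ORDER BY ips.avg_fragmentation_in_percent DESC;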
I have a 400 GB TFS database (tfs_DefaultCollection). I have run the Attachment Cleaner tool, which reported that it deleted 200 GB of data. After this, querying the largest tables shows that the row counts are the same and the sizes haven't changed. The .mdf file size remains the same, and so do the top four tables (tbl_FunctionCoverage, tbl_TestResult, tbl_BuildInformation and tbl_Content). I am assuming there is some form of tidy-up script I need to run? I have executed prc_DeleteUnusedContent and prc_DeleteUnUsedFiles, but I believe they are more for version control and workspaces, as they made no changes.
I will shrink the database and reindex the tables, but as the table row counts and sizes haven't changed, I can't see it making much difference.
Any advice is appreciated.
So I think I may have solved my problem...
I wrote a small app to enumerate our build definitions, finding all those marked as deleted. For each deleted build I then deleted all associated test runs and finally destroyed the build. This took around 16 hours to run, removing 25,000 builds and about 60,000 test runs. It didn't appear to change a great deal in the database straight away, but some of the tables did reduce in row count.
I then left the database for a few days (which turned into about 10), and the background cleanup jobs that ran appeared to clean up a large amount of the data. They did, however, run for several days to do this, and in a live system I'm not sure what the performance impact would be.
Shrinking the database after this reclaimed about 160 GB (and took around 10-12 hours); also disabling reporting and removing the warehouse database (70 GB) has brought the total database size down to 175 GB.
We used the Attachment Cleaner tool and it worked for us. I do believe we had to shrink the DB as well before we saw the DB size drop.
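For the shrink itself, the plain T-SQL form is roughly this (a sketch; on a database this size DBCC SHRINKDATABASE runs for a long time and fragments indexes, so plan an index rebuild afterwards):

    -- Reclaim unused space, leaving 5% free; run in a maintenance window
    DBCC SHRINKDATABASE (Tfs_DefaultCollection, 5);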
We have a Neo4j database with ~10 million nodes and ~300 million relationships. The database has grown to about 80 GB. There are daily jobs that remove old data and add new data, so the approximate number of nodes and relationships stays fairly constant. However, the physical size of the database files keeps growing (for example, the relationship store file is currently at 50 GB).
I have found the following link that might explain why the size might not go down when deleting (the space is left reserved and is reused by new relationships and nodes), but it still does not explain why our database keeps growing:
neostore.* file size after deleting millions node
Questions:
a) What can we check to find out why our db is still growing?
b) Most relational databases I've worked with have a "shrink files" or "optimize free space" function. Does something like this exist for Neo4j? (A Google search was unsuccessful!)
Thank you,
-A
P.S. We're running Neo4j 2.1.5 Community Edition on Ubuntu 14
There is a tool called store-utils which will allow you to effectively compact the store. You'd have to shut down the database before running it, though.
As always, take a backup before you try out any tool that does anything to the store.
What is the fastest way to backup/restore Azure SQL database?
The background: we have a database of ~40 GB, and restoring it from a .bacpac file (~4 GB of compressed data) in the native way, via the Azure SQL Database Import/Export Service, takes up to 6-8 hours. Creating the .bacpac is also very slow and takes ~2 hours.
UPD. Creating a database copy (which is, by the way, transactionally consistent) using CREATE DATABASE [DBBackup] AS COPY OF [DB] takes only 15 minutes for the 40 GB database, and the restore is a simple database rename.
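For reference, the whole copy-then-restore cycle in T-SQL looks roughly like this (a sketch; the copy statement is issued while connected to the master database of the logical server, and [DB]/[DBBackup] stand in for your own database names):

    -- Start the transactionally consistent copy (run against master)
    CREATE DATABASE [DBBackup] AS COPY OF [DB];

    -- Check copy progress; state_desc is 'COPYING' until it finishes
    SELECT name, state_desc FROM sys.databases WHERE name = 'DBBackup';

    -- To "restore": drop the damaged [DB], then rename the finished copy
    ALTER DATABASE [DBBackup] MODIFY NAME = [DB];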
UPD. Dec 2014. Let me share our experience with the fastest DB migration scheme we ended up with.
First of all, the approach with a data-tier application (.bacpac) turned out not to be viable for us once the DB became slightly bigger, and it also will not work for you if you have at least one non-clustered index with a total size > 2 GB, unless you disable the non-clustered indexes before the export - this is due to the Azure SQL transaction log limit.
We stuck with the Azure Migration Wizard, which for the data transfer just runs BCP for each table (the BCP parameters are configurable), and it is ~20% faster than the .bacpac approach.
Here are some pitfalls we encountered with the Migration Wizard:
We ran into encoding trouble with non-Unicode strings. Make sure that the BCP import and export run with the same collation. It's the -C ... configuration switch; you can find the parameters BCP is called with in the .config file of the MW application.
Take into account that MW (at least the version that is current at the time of this writing) runs BCP with parameters that leave the constraints in a non-trusted state, so do not forget to check all non-trusted constraints after the BCP import.
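Checking which constraints were left non-trusted can be done with something like this (a sketch over the standard catalog views):

    -- List foreign keys and check constraints left non-trusted by the bulk load
    SELECT OBJECT_NAME(parent_object_id) AS table_name, name AS constraint_name
    FROM sys.foreign_keys
    WHERE is_not_trusted = 1
    UNION ALL
    SELECT OBJECT_NAME(parent_object_id), name
    FROM sys.check_constraints
    WHERE is_not_trusted = 1;

    -- Each one can then be re-validated with:
    --   ALTER TABLE <table> WITH CHECK CHECK CONSTRAINT <constraint>;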
If your database is 40 GB, it's long past time to consider having a redundant database server that's ready to go as soon as the main one becomes faulty.
You should have a second server running alongside the main DB server that has no actual duties except to sync with the main server on an hourly/daily basis (depending on how often your data changes and how long this process takes to run). You can also consider creating backups from this database server instead of from the main one.
If your main DB server goes down - for whatever reason - you can change the host address in your application to point at the backup database, and spend the 8 hours debugging your other server instead of twiddling your thumbs waiting for the Azure Portal to do its thing while your clients complain.
Your database shouldn't be taking 6-8 hours to restore from backup, though. If you are including upload/download time in that estimate, then you should consider storing your data in the Azure datacenter as well as locally.
For more info see this article on Business Continuity on MSDN:
http://msdn.microsoft.com/en-us/library/windowsazure/hh852669.aspx
You'll want to specifically look at the Database Copies section, but the article is worth reading in full if your DB is so large.
Azure now supports Point-in-time restore / Geo restore and GeoDR features. You can use a combination of these to get quick backup/restore. PiTR and Geo restore come at no additional cost, while you have to pay for a Geo replica.
There are multiple ways to do backup, restore and copy jobs on Azure.
Point-in-time restore
The Azure service takes full backups, multiple differential backups, and t-log backups every 5 minutes.
Geo restore
Same as point-in-time restore; the only difference is that it picks up a redundant copy from a blob storage account in a different region.
Geo-replication
Same as SQL Availability Groups: 4 async replicas with read capabilities. Select a region to become a hot standby.
More on Microsoft Site here. Blog here.
Azure SQL Database already has these local replicas that Liam is referring to. You can find more details on these three local replicas here http://social.technet.microsoft.com/wiki/contents/articles/1695.inside-windows-azure-sql-database.aspx#High_Availability_with_SQL_Azure
Also, SQL Database recently introduced new service tiers that include new point-in-time-restore. Full details at http://msdn.microsoft.com/en-us/library/azure/hh852669.aspx
The key is also to use the right data-management strategy, one that actually serves your objective. The wrong architecture, and an approach of putting everything in the cloud, can prove disastrous... here's more reading on it - http://archdipesh.blogspot.com/2014/03/windows-azure-data-strategies-and.html
There are 2 databases: "temp" and "production". Each night the production database should be "synchronized" so that it has exactly the same data as "temp". The databases are several GB in size, and just copying all the data is not an option, but the changes are usually quite small: ~100 rows added, ~1000 rows updated and some removed - about 5-50 MB per day.
I was thinking maybe there is some tool (preferably free) that could go through both databases and create a patch that could be applied to the production database. Or, as an option, just "synchronize" both databases. And it should be quite fast. In other words, something like rsync for data in databases.
If there is a solution for a specific database (MySQL, H2, DB2, etc.), that will also be fine.
PS: the structure is guaranteed to be the same, so this question is only about transferring data.
Finally I found a way to do it in Kettle (PDI):
http://wiki.pentaho.com/display/EAI/Synchronize+after+merge
Only one con: I need to create such a transformation for each table separately.
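For anyone who prefers plain SQL over a separate tool, the per-table work that such a transformation does can be sketched directly in MySQL syntax (the schema names temp/production, the table customers and its columns are only placeholders here, and both schemas are assumed to live on the same server):

    -- "customers", "name" and "email" are placeholders for your real table/columns
    -- Insert new rows and update changed ones from temp into production
    INSERT INTO production.customers (id, name, email)
    SELECT id, name, email
    FROM temp.customers AS t
    ON DUPLICATE KEY UPDATE
        name  = t.name,
        email = t.email;

    -- Remove rows that no longer exist in temp
    DELETE p
    FROM production.customers AS p
    LEFT JOIN temp.customers AS t ON t.id = p.id
    WHERE t.id IS NULL;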
Why not set up database replication from the temp database to your production database, with the temp database acting as the master and production acting as a slave? Here is a link for setting up replication in MySQL. MSSQL supports database replication as well; Google should turn up many tutorials.